[Pvfs2-developers] patch: namei bug fixes

Sam Lang slang at mcs.anl.gov
Thu Mar 29 13:51:40 EST 2007


Hi Phil,

I managed to try your remove dir test on a 2.4 kernel, and was able  
to get a segfault with unpatched HEAD.  It looks like returning the  
actual dentry is unexpected for 2.4.  If I just return NULL all the  
time in the successful cases, it seems to work.  This is what your  
patch essentially does I think.  I've attached a patch that does  
pretty much the same thing.  Seems to work on 2.4 and 2.6 (as yours  
does).


On Mar 27, 2007, at 4:33 PM, Phil Carns wrote:

> Just to clarify a little bit, there is actually a problem with the  
> 2.6 code path here as well.  I went back and ran some tests with  
> and without the namei.patch on a RHEL4 box (2.6.9-something) to  
> confirm.  If I leave the patch out, then the following two LTP  
> tests (open08, statfs02) fail:
>

On 2.6.20 these tests seem to pass with unpatched HEAD.  But yeah,  
with your patch and the one attached, they pass as well.

Its odd that esp. the 2.6 kernel expects NULL instead of the actual  
dentry.  The particular semantics (and how its changed over the  
different kernel versions) of lookup's expected return value aren't  
well documented.

In any case, let me know what you think of this patch.

-sam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: namei3.patch
Type: application/octet-stream
Size: 5503 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20070329/35101775/namei3.obj
-------------- next part --------------


> <<<test_start>>>
> tag=open08 stime=1175025961
> cmdline="open08"
> contacts=""
> analysis=exit
> initiation_status="ok"
> <<<test_output>>>
> open08      1  PASS  :  expected failure - errno = 17 : File exists
> open08      2  PASS  :  expected failure - errno = 21 : Is a directory
> open08      3  PASS  :  expected failure - errno = 20 : Not a  
> directory
> open08      4  FAIL  :  unexpected error - 2 : No such file or  
> directory - expected 36
> open08      5  PASS  :  expected failure - errno = 13 : Permission  
> denied
> open08      6  PASS  :  expected failure - errno = 14 : Bad address
> <<<execution_status>>>
> duration=0 termination_type=exited termination_id=1 corefile=no
> cutime=0 cstime=1
> <<<test_end>>>
>
> <<<test_start>>>
> tag=statfs02 stime=1175026039
> cmdline="statfs02"
> contacts=""
> analysis=exit
> initiation_status="ok"
> <<<test_output>>>
> statfs02    1  PASS  :  expected failure - errno = 20 : Not a  
> directory
> statfs02    2  PASS  :  expected failure - errno = 2 : No such file  
> or directory
> statfs02    3  FAIL  :  unexpected error - 2 : No such file or  
> directory - expected 36
> statfs02    4  PASS  :  expected failure - errno = 14 : Bad address
> statfs02    5  PASS  :  expected failure - errno = 14 : Bad address
> <<<execution_status>>>
> duration=0 termination_type=exited termination_id=1 corefile=no
> cutime=0 cstime=0
> <<<test_end>>>
>
> They pass without any trouble if I apply namei.patch.


>
> I haven't dug into your technical comments below yet (inodes and  
> dentries make my head hurt), but I'll try and catch up on them  
> soon. From a high level, it does sound like there are some odd  
> things in the patch, but I don't know which part of it is  
> relevant.  I just narrowed down cvsps patchsets until I found a  
> code snippet that seemed to be the source of my problems and  
> generated a patch to revert it :)
>
> I wish that I could say that we could stop using 2.4 soon, but we  
> can't quite seem to get to that point yet :(
>
> -Phil
>
> Sam Lang wrote:
>> Hi Phil,
>> I have comments about the patch inline.  Two general comments   
>> though.  Reading this code (at least for me) has been hard because  
>> of  all the feature check #ifdefs.  I think trying to support all  
>> these  different kernel versions gives us a lot of headache that  
>> maybe we  don't need (I've seen this recently with  
>> HAVE_AIO_VFS_SUPPORT).  Any  chance we could pair it down to more  
>> recent versions?  Do you guys  expect to discontinue supporting  
>> 2.4 in the near future?  Would it be  possible to say that future  
>> pvfs releases only support 2.6 (maybe  even > 2.6.x)?  Anyone that  
>> has an older kernel has to use an older  pvfs version?
>> -sam
>> On Mar 20, 2007, at 9:36 AM, Phil Carns wrote:
>>> I am sending this patch in a separate email because it may need   
>>> some discussion to hash out.  Sometime in the past several  
>>> months,  the pvfs2_lookup() function in namei.c changed (I think  
>>> along with  something not directly related, but I don't recall  
>>> exactly what  happened now).
>>>
>>> This change caused several directory related bugs to show up for  
>>> us  on 2.4 and 2.6 kernels.  The 2.4 one was more severe,  
>>> though,  because it caused a kernel panic.  It could be triggered  
>>> by the  "rename01" test in LTP, or by the following manual steps:
>>>
>>> [root at rhel3 pvfs2]# mkdir testdir
>>> [root at rhel3 pvfs2]# cd testdir
>>> [root at rhel3 testdir]# mkdir dir1
>>> [root at rhel3 testdir]# mv dir1 dir2
>>> [root at rhel3 testdir]# ls -alh
>>> total 12K
>>> drwxr-xr-x    1 root     root         4.0K Dec  1 12:18 .
>>> drwxrwxrwt    1 root     root         4.0K Dec  1 12:17 ..
>>> drwxr-xr-x    1 root     root         4.0K Dec  1 12:17 dir2
>>> [root at centos-rhel3 testdir]# rm dir2
>>> <crash>
>>>
>>> ... so it had something to do with removing a directory that had   
>>> previously been renamed.
>>>
>>> At any rate, I don't know enough about dentries and inodes  
>>> anymore  to truly understand the old logic that used to work or  
>>> the newer  logic that causes us problems.  This patch just  
>>> naively reverts  some of the logic in namei.c to the point that  
>>> it works again for  us (without changing anything else that was  
>>> in that set of  commits).  With this in place, we don't see any  
>>> more test case  failures or kernel panics on 2.6 or 2.4.
>>>
>>> We have been using this patch for several months with success,  
>>> but  it would probably be a good idea for someone more familiar  
>>> with  this code to look at the change more carefully.
>>>
>> I don't have access to a 2.4 kernel with root at the moment, but   
>> following the code paths I have some comments.  Hopefully Murali  
>> can  chime in at some point and correct me where I'm wrong about  
>> stuff.
>>> -Phil
>>> Index: pvfs2_src/src/kernel/linux-2.6/namei.c
>>> ===================================================================
>>> --- pvfs2_src/src/kernel/linux-2.6/namei.c    (revision 2909)
>>> +++ pvfs2_src/src/kernel/linux-2.6/namei.c    (revision 2910)
>>> @@ -164,19 +164,24 @@
>>>      inode = pvfs2_iget(sb, &new_op->downcall.resp.lookup.refn);
>>>      if (inode && !is_bad_inode(inode))
>>>      {
>>> -            struct dentry *res;
>>> +            found_pvfs2_inode = PVFS2_I(inode);
>>>
>>> +            /* store the retrieved handle and fs_id */
>>> +            found_pvfs2_inode->refn = new_op-  
>>> >downcall.resp.lookup.refn;
>>> +
>> I think this is redundant.  Unless you're using a _very_ old 2.4   
>> kernel, the pvfs2_iget call sets the fsid and handle in the pvfs2   
>> inode pointer of the inode.  Basically, pvfs2_iget translates to  
>> a  pvfs2_iget_common (with keep_locked == 0), which means that   
>> pvfs2_set_inode will get called, which will do the same thing as   
>> above.  The only way I can see that this wouldn't happen is if   
>> iget4_locked isn't supported by your kernel version, but it  
>> appears  to have been in 2.4.25 and up, so you'd have to be  
>> running with  something pretty old.
>> I think that Murali added pvfs2_iget to abstract out this manual   
>> setting, which will still appear to do in some places.
>>>          /* update dentry/inode pair into dcache */
>>>          dentry->d_op = &pvfs2_dentry_operations;
>>>
>>> -            res = pvfs2_d_splice_alias(dentry, inode);
>>> +            pvfs2_d_splice_alias(dentry, inode);
>>>
>>>              gossip_debug(GOSSIP_NAME_DEBUG, "Lookup success  
>>> (inode  ct = %d)\n",
>>>                          (int)atomic_read(&inode->i_count));
>>> +#if 0
>>>              op_release(new_op);
>>>              if (res)
>>>                  res->d_op = &pvfs2_dentry_operations;
>>>              return res;
>>> +#endif
>> Here too, if you're running 2.4, then res is guaranteed to equal   
>> dentry, so while setting the dentry_operations struct to d_op is   
>> redundant for 2.4, its needed for 2.6 in the case where we found  
>> a  disconnected dentry and returned that instead.  The only  
>> difference  here in the 2.4 code path that I can see is that if a  
>> new entry _was_  added to the dcache, you return NULL now, instead  
>> of returning the  new entry.  Since the new entry is in the  
>> dcache, maybe that's ok  (and expected for 2.4).  For 2.6 it seems  
>> clear that the new dentry  is supposed to be returned.
>>>      }
>>>          else if (inode && is_bad_inode(inode))
>>>          {
>>> @@ -227,7 +232,14 @@
>>>      }
>>>
>>>      op_release(new_op);
>>> -    return NULL;
>>> +    if(ret != -ENOENT)
>>> +    {
>>> +        return ERR_PTR(ret);
>>> +    }
>>> +    else
>>> +    {
>>> +        return NULL;
>>> +    }
>> This might be what's causing your failures with 2.4.  It expects  
>> NULL  if a dentry was added to the dcache (which happens with  
>> ENOENT), or a  non-null error pointer for any other error.  As  
>> mentioned above, I  think the semantics of the return from lookup  
>> have changed in 2.6, so  we might want to change this up a bit.
>>>  }
>>>
>>>  /* return 0 on success; non-zero otherwise */
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>



More information about the Pvfs2-developers mailing list