[Pvfs2-developers] patch: namei bug fixes

Phil Carns pcarns at wastedcycles.org
Mon Apr 2 12:16:01 EDT 2007


Hi Sam,

I haven't looked at the technical details yet, but I just wanted to 
report that the patch that you attached does pass the test cases that I 
have on 2.6 and 2.4.

thanks,
-Phil

Sam Lang wrote:
> 
> Hi Phil,
> 
> I managed to try your remove dir test on a 2.4 kernel, and was able  to 
> get a segfault with unpatched HEAD.  It looks like returning the  actual 
> dentry is unexpected for 2.4.  If I just return NULL all the  time in 
> the successful cases, it seems to work.  This is what your  patch 
> essentially does I think.  I've attached a patch that does  pretty much 
> the same thing.  Seems to work on 2.4 and 2.6 (as yours  does).
> 
> 
> On Mar 27, 2007, at 4:33 PM, Phil Carns wrote:
> 
>> Just to clarify a little bit, there is actually a problem with the  
>> 2.6 code path here as well.  I went back and ran some tests with  and 
>> without the namei.patch on a RHEL4 box (2.6.9-something) to  confirm.  
>> If I leave the patch out, then the following two LTP  tests (open08, 
>> statfs02) fail:
>>
> 
> On 2.6.20 these tests seem to pass with unpatched HEAD.  But yeah,  with 
> your patch and the one attached, they pass as well.
> 
> Its odd that esp. the 2.6 kernel expects NULL instead of the actual  
> dentry.  The particular semantics (and how its changed over the  
> different kernel versions) of lookup's expected return value aren't  
> well documented.
> 
> In any case, let me know what you think of this patch.
> 
> -sam
> 
> 
> 
>> <<<test_start>>>
>> tag=open08 stime=1175025961
>> cmdline="open08"
>> contacts=""
>> analysis=exit
>> initiation_status="ok"
>> <<<test_output>>>
>> open08      1  PASS  :  expected failure - errno = 17 : File exists
>> open08      2  PASS  :  expected failure - errno = 21 : Is a directory
>> open08      3  PASS  :  expected failure - errno = 20 : Not a  directory
>> open08      4  FAIL  :  unexpected error - 2 : No such file or  
>> directory - expected 36
>> open08      5  PASS  :  expected failure - errno = 13 : Permission  
>> denied
>> open08      6  PASS  :  expected failure - errno = 14 : Bad address
>> <<<execution_status>>>
>> duration=0 termination_type=exited termination_id=1 corefile=no
>> cutime=0 cstime=1
>> <<<test_end>>>
>>
>> <<<test_start>>>
>> tag=statfs02 stime=1175026039
>> cmdline="statfs02"
>> contacts=""
>> analysis=exit
>> initiation_status="ok"
>> <<<test_output>>>
>> statfs02    1  PASS  :  expected failure - errno = 20 : Not a  directory
>> statfs02    2  PASS  :  expected failure - errno = 2 : No such file  
>> or directory
>> statfs02    3  FAIL  :  unexpected error - 2 : No such file or  
>> directory - expected 36
>> statfs02    4  PASS  :  expected failure - errno = 14 : Bad address
>> statfs02    5  PASS  :  expected failure - errno = 14 : Bad address
>> <<<execution_status>>>
>> duration=0 termination_type=exited termination_id=1 corefile=no
>> cutime=0 cstime=0
>> <<<test_end>>>
>>
>> They pass without any trouble if I apply namei.patch.
> 
> 
> 
>>
>> I haven't dug into your technical comments below yet (inodes and  
>> dentries make my head hurt), but I'll try and catch up on them  soon. 
>> From a high level, it does sound like there are some odd  things in 
>> the patch, but I don't know which part of it is  relevant.  I just 
>> narrowed down cvsps patchsets until I found a  code snippet that 
>> seemed to be the source of my problems and  generated a patch to 
>> revert it :)
>>
>> I wish that I could say that we could stop using 2.4 soon, but we  
>> can't quite seem to get to that point yet :(
>>
>> -Phil
>>
>> Sam Lang wrote:
>>
>>> Hi Phil,
>>> I have comments about the patch inline.  Two general comments   
>>> though.  Reading this code (at least for me) has been hard because  
>>> of  all the feature check #ifdefs.  I think trying to support all  
>>> these  different kernel versions gives us a lot of headache that  
>>> maybe we  don't need (I've seen this recently with  
>>> HAVE_AIO_VFS_SUPPORT).  Any  chance we could pair it down to more  
>>> recent versions?  Do you guys  expect to discontinue supporting  2.4 
>>> in the near future?  Would it be  possible to say that future  pvfs 
>>> releases only support 2.6 (maybe  even > 2.6.x)?  Anyone that  has an 
>>> older kernel has to use an older  pvfs version?
>>> -sam
>>> On Mar 20, 2007, at 9:36 AM, Phil Carns wrote:
>>>
>>>> I am sending this patch in a separate email because it may need   
>>>> some discussion to hash out.  Sometime in the past several  months,  
>>>> the pvfs2_lookup() function in namei.c changed (I think  along with  
>>>> something not directly related, but I don't recall  exactly what  
>>>> happened now).
>>>>
>>>> This change caused several directory related bugs to show up for  
>>>> us  on 2.4 and 2.6 kernels.  The 2.4 one was more severe,  though,  
>>>> because it caused a kernel panic.  It could be triggered  by the  
>>>> "rename01" test in LTP, or by the following manual steps:
>>>>
>>>> [root at rhel3 pvfs2]# mkdir testdir
>>>> [root at rhel3 pvfs2]# cd testdir
>>>> [root at rhel3 testdir]# mkdir dir1
>>>> [root at rhel3 testdir]# mv dir1 dir2
>>>> [root at rhel3 testdir]# ls -alh
>>>> total 12K
>>>> drwxr-xr-x    1 root     root         4.0K Dec  1 12:18 .
>>>> drwxrwxrwt    1 root     root         4.0K Dec  1 12:17 ..
>>>> drwxr-xr-x    1 root     root         4.0K Dec  1 12:17 dir2
>>>> [root at centos-rhel3 testdir]# rm dir2
>>>> <crash>
>>>>
>>>> ... so it had something to do with removing a directory that had   
>>>> previously been renamed.
>>>>
>>>> At any rate, I don't know enough about dentries and inodes  anymore  
>>>> to truly understand the old logic that used to work or  the newer  
>>>> logic that causes us problems.  This patch just  naively reverts  
>>>> some of the logic in namei.c to the point that  it works again for  
>>>> us (without changing anything else that was  in that set of  
>>>> commits).  With this in place, we don't see any  more test case  
>>>> failures or kernel panics on 2.6 or 2.4.
>>>>
>>>> We have been using this patch for several months with success,  but  
>>>> it would probably be a good idea for someone more familiar  with  
>>>> this code to look at the change more carefully.
>>>>
>>> I don't have access to a 2.4 kernel with root at the moment, but   
>>> following the code paths I have some comments.  Hopefully Murali  
>>> can  chime in at some point and correct me where I'm wrong about  stuff.
>>>
>>>> -Phil
>>>> Index: pvfs2_src/src/kernel/linux-2.6/namei.c
>>>> ===================================================================
>>>> --- pvfs2_src/src/kernel/linux-2.6/namei.c    (revision 2909)
>>>> +++ pvfs2_src/src/kernel/linux-2.6/namei.c    (revision 2910)
>>>> @@ -164,19 +164,24 @@
>>>>      inode = pvfs2_iget(sb, &new_op->downcall.resp.lookup.refn);
>>>>      if (inode && !is_bad_inode(inode))
>>>>      {
>>>> -            struct dentry *res;
>>>> +            found_pvfs2_inode = PVFS2_I(inode);
>>>>
>>>> +            /* store the retrieved handle and fs_id */
>>>> +            found_pvfs2_inode->refn = new_op-  
>>>> >downcall.resp.lookup.refn;
>>>> +
>>>
>>> I think this is redundant.  Unless you're using a _very_ old 2.4   
>>> kernel, the pvfs2_iget call sets the fsid and handle in the pvfs2   
>>> inode pointer of the inode.  Basically, pvfs2_iget translates to  a  
>>> pvfs2_iget_common (with keep_locked == 0), which means that   
>>> pvfs2_set_inode will get called, which will do the same thing as   
>>> above.  The only way I can see that this wouldn't happen is if   
>>> iget4_locked isn't supported by your kernel version, but it  appears  
>>> to have been in 2.4.25 and up, so you'd have to be  running with  
>>> something pretty old.
>>> I think that Murali added pvfs2_iget to abstract out this manual   
>>> setting, which will still appear to do in some places.
>>>
>>>>          /* update dentry/inode pair into dcache */
>>>>          dentry->d_op = &pvfs2_dentry_operations;
>>>>
>>>> -            res = pvfs2_d_splice_alias(dentry, inode);
>>>> +            pvfs2_d_splice_alias(dentry, inode);
>>>>
>>>>              gossip_debug(GOSSIP_NAME_DEBUG, "Lookup success  
>>>> (inode  ct = %d)\n",
>>>>                          (int)atomic_read(&inode->i_count));
>>>> +#if 0
>>>>              op_release(new_op);
>>>>              if (res)
>>>>                  res->d_op = &pvfs2_dentry_operations;
>>>>              return res;
>>>> +#endif
>>>
>>> Here too, if you're running 2.4, then res is guaranteed to equal   
>>> dentry, so while setting the dentry_operations struct to d_op is   
>>> redundant for 2.4, its needed for 2.6 in the case where we found  a  
>>> disconnected dentry and returned that instead.  The only  difference  
>>> here in the 2.4 code path that I can see is that if a  new entry 
>>> _was_  added to the dcache, you return NULL now, instead  of 
>>> returning the  new entry.  Since the new entry is in the  dcache, 
>>> maybe that's ok  (and expected for 2.4).  For 2.6 it seems  clear 
>>> that the new dentry  is supposed to be returned.
>>>
>>>>      }
>>>>          else if (inode && is_bad_inode(inode))
>>>>          {
>>>> @@ -227,7 +232,14 @@
>>>>      }
>>>>
>>>>      op_release(new_op);
>>>> -    return NULL;
>>>> +    if(ret != -ENOENT)
>>>> +    {
>>>> +        return ERR_PTR(ret);
>>>> +    }
>>>> +    else
>>>> +    {
>>>> +        return NULL;
>>>> +    }
>>>
>>> This might be what's causing your failures with 2.4.  It expects  
>>> NULL  if a dentry was added to the dcache (which happens with  
>>> ENOENT), or a  non-null error pointer for any other error.  As  
>>> mentioned above, I  think the semantics of the return from lookup  
>>> have changed in 2.6, so  we might want to change this up a bit.
>>>
>>>>  }
>>>>
>>>>  /* return 0 on success; non-zero otherwise */
>>>> _______________________________________________
>>>> Pvfs2-developers mailing list
>>>> Pvfs2-developers at beowulf-underground.org
>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>>
> 



More information about the Pvfs2-developers mailing list