[Pvfs2-developers] patch: namei bug fixes
Phil Carns
pcarns at wastedcycles.org
Mon Apr 2 12:16:01 EDT 2007
Hi Sam,
I haven't looked at the technical details yet, but I just wanted to
report that the patch that you attached does pass the test cases that I
have on 2.6 and 2.4.
thanks,
-Phil
Sam Lang wrote:
>
> Hi Phil,
>
> I managed to try your remove dir test on a 2.4 kernel, and was able to
> get a segfault with unpatched HEAD. It looks like returning the actual
> dentry is unexpected for 2.4. If I just return NULL all the time in
> the successful cases, it seems to work. This is what your patch
> essentially does I think. I've attached a patch that does pretty much
> the same thing. Seems to work on 2.4 and 2.6 (as yours does).
>
>
> On Mar 27, 2007, at 4:33 PM, Phil Carns wrote:
>
>> Just to clarify a little bit, there is actually a problem with the
>> 2.6 code path here as well. I went back and ran some tests with and
>> without the namei.patch on a RHEL4 box (2.6.9-something) to confirm.
>> If I leave the patch out, then the following two LTP tests (open08,
>> statfs02) fail:
>>
>
> On 2.6.20 these tests seem to pass with unpatched HEAD. But yeah, with
> your patch and the one attached, they pass as well.
>
> Its odd that esp. the 2.6 kernel expects NULL instead of the actual
> dentry. The particular semantics (and how its changed over the
> different kernel versions) of lookup's expected return value aren't
> well documented.
>
> In any case, let me know what you think of this patch.
>
> -sam
>
>
>
>> <<<test_start>>>
>> tag=open08 stime=1175025961
>> cmdline="open08"
>> contacts=""
>> analysis=exit
>> initiation_status="ok"
>> <<<test_output>>>
>> open08 1 PASS : expected failure - errno = 17 : File exists
>> open08 2 PASS : expected failure - errno = 21 : Is a directory
>> open08 3 PASS : expected failure - errno = 20 : Not a directory
>> open08 4 FAIL : unexpected error - 2 : No such file or
>> directory - expected 36
>> open08 5 PASS : expected failure - errno = 13 : Permission
>> denied
>> open08 6 PASS : expected failure - errno = 14 : Bad address
>> <<<execution_status>>>
>> duration=0 termination_type=exited termination_id=1 corefile=no
>> cutime=0 cstime=1
>> <<<test_end>>>
>>
>> <<<test_start>>>
>> tag=statfs02 stime=1175026039
>> cmdline="statfs02"
>> contacts=""
>> analysis=exit
>> initiation_status="ok"
>> <<<test_output>>>
>> statfs02 1 PASS : expected failure - errno = 20 : Not a directory
>> statfs02 2 PASS : expected failure - errno = 2 : No such file
>> or directory
>> statfs02 3 FAIL : unexpected error - 2 : No such file or
>> directory - expected 36
>> statfs02 4 PASS : expected failure - errno = 14 : Bad address
>> statfs02 5 PASS : expected failure - errno = 14 : Bad address
>> <<<execution_status>>>
>> duration=0 termination_type=exited termination_id=1 corefile=no
>> cutime=0 cstime=0
>> <<<test_end>>>
>>
>> They pass without any trouble if I apply namei.patch.
>
>
>
>>
>> I haven't dug into your technical comments below yet (inodes and
>> dentries make my head hurt), but I'll try and catch up on them soon.
>> From a high level, it does sound like there are some odd things in
>> the patch, but I don't know which part of it is relevant. I just
>> narrowed down cvsps patchsets until I found a code snippet that
>> seemed to be the source of my problems and generated a patch to
>> revert it :)
>>
>> I wish that I could say that we could stop using 2.4 soon, but we
>> can't quite seem to get to that point yet :(
>>
>> -Phil
>>
>> Sam Lang wrote:
>>
>>> Hi Phil,
>>> I have comments about the patch inline. Two general comments
>>> though. Reading this code (at least for me) has been hard because
>>> of all the feature check #ifdefs. I think trying to support all
>>> these different kernel versions gives us a lot of headache that
>>> maybe we don't need (I've seen this recently with
>>> HAVE_AIO_VFS_SUPPORT). Any chance we could pair it down to more
>>> recent versions? Do you guys expect to discontinue supporting 2.4
>>> in the near future? Would it be possible to say that future pvfs
>>> releases only support 2.6 (maybe even > 2.6.x)? Anyone that has an
>>> older kernel has to use an older pvfs version?
>>> -sam
>>> On Mar 20, 2007, at 9:36 AM, Phil Carns wrote:
>>>
>>>> I am sending this patch in a separate email because it may need
>>>> some discussion to hash out. Sometime in the past several months,
>>>> the pvfs2_lookup() function in namei.c changed (I think along with
>>>> something not directly related, but I don't recall exactly what
>>>> happened now).
>>>>
>>>> This change caused several directory related bugs to show up for
>>>> us on 2.4 and 2.6 kernels. The 2.4 one was more severe, though,
>>>> because it caused a kernel panic. It could be triggered by the
>>>> "rename01" test in LTP, or by the following manual steps:
>>>>
>>>> [root at rhel3 pvfs2]# mkdir testdir
>>>> [root at rhel3 pvfs2]# cd testdir
>>>> [root at rhel3 testdir]# mkdir dir1
>>>> [root at rhel3 testdir]# mv dir1 dir2
>>>> [root at rhel3 testdir]# ls -alh
>>>> total 12K
>>>> drwxr-xr-x 1 root root 4.0K Dec 1 12:18 .
>>>> drwxrwxrwt 1 root root 4.0K Dec 1 12:17 ..
>>>> drwxr-xr-x 1 root root 4.0K Dec 1 12:17 dir2
>>>> [root at centos-rhel3 testdir]# rm dir2
>>>> <crash>
>>>>
>>>> ... so it had something to do with removing a directory that had
>>>> previously been renamed.
>>>>
>>>> At any rate, I don't know enough about dentries and inodes anymore
>>>> to truly understand the old logic that used to work or the newer
>>>> logic that causes us problems. This patch just naively reverts
>>>> some of the logic in namei.c to the point that it works again for
>>>> us (without changing anything else that was in that set of
>>>> commits). With this in place, we don't see any more test case
>>>> failures or kernel panics on 2.6 or 2.4.
>>>>
>>>> We have been using this patch for several months with success, but
>>>> it would probably be a good idea for someone more familiar with
>>>> this code to look at the change more carefully.
>>>>
>>> I don't have access to a 2.4 kernel with root at the moment, but
>>> following the code paths I have some comments. Hopefully Murali
>>> can chime in at some point and correct me where I'm wrong about stuff.
>>>
>>>> -Phil
>>>> Index: pvfs2_src/src/kernel/linux-2.6/namei.c
>>>> ===================================================================
>>>> --- pvfs2_src/src/kernel/linux-2.6/namei.c (revision 2909)
>>>> +++ pvfs2_src/src/kernel/linux-2.6/namei.c (revision 2910)
>>>> @@ -164,19 +164,24 @@
>>>> inode = pvfs2_iget(sb, &new_op->downcall.resp.lookup.refn);
>>>> if (inode && !is_bad_inode(inode))
>>>> {
>>>> - struct dentry *res;
>>>> + found_pvfs2_inode = PVFS2_I(inode);
>>>>
>>>> + /* store the retrieved handle and fs_id */
>>>> + found_pvfs2_inode->refn = new_op-
>>>> >downcall.resp.lookup.refn;
>>>> +
>>>
>>> I think this is redundant. Unless you're using a _very_ old 2.4
>>> kernel, the pvfs2_iget call sets the fsid and handle in the pvfs2
>>> inode pointer of the inode. Basically, pvfs2_iget translates to a
>>> pvfs2_iget_common (with keep_locked == 0), which means that
>>> pvfs2_set_inode will get called, which will do the same thing as
>>> above. The only way I can see that this wouldn't happen is if
>>> iget4_locked isn't supported by your kernel version, but it appears
>>> to have been in 2.4.25 and up, so you'd have to be running with
>>> something pretty old.
>>> I think that Murali added pvfs2_iget to abstract out this manual
>>> setting, which will still appear to do in some places.
>>>
>>>> /* update dentry/inode pair into dcache */
>>>> dentry->d_op = &pvfs2_dentry_operations;
>>>>
>>>> - res = pvfs2_d_splice_alias(dentry, inode);
>>>> + pvfs2_d_splice_alias(dentry, inode);
>>>>
>>>> gossip_debug(GOSSIP_NAME_DEBUG, "Lookup success
>>>> (inode ct = %d)\n",
>>>> (int)atomic_read(&inode->i_count));
>>>> +#if 0
>>>> op_release(new_op);
>>>> if (res)
>>>> res->d_op = &pvfs2_dentry_operations;
>>>> return res;
>>>> +#endif
>>>
>>> Here too, if you're running 2.4, then res is guaranteed to equal
>>> dentry, so while setting the dentry_operations struct to d_op is
>>> redundant for 2.4, its needed for 2.6 in the case where we found a
>>> disconnected dentry and returned that instead. The only difference
>>> here in the 2.4 code path that I can see is that if a new entry
>>> _was_ added to the dcache, you return NULL now, instead of
>>> returning the new entry. Since the new entry is in the dcache,
>>> maybe that's ok (and expected for 2.4). For 2.6 it seems clear
>>> that the new dentry is supposed to be returned.
>>>
>>>> }
>>>> else if (inode && is_bad_inode(inode))
>>>> {
>>>> @@ -227,7 +232,14 @@
>>>> }
>>>>
>>>> op_release(new_op);
>>>> - return NULL;
>>>> + if(ret != -ENOENT)
>>>> + {
>>>> + return ERR_PTR(ret);
>>>> + }
>>>> + else
>>>> + {
>>>> + return NULL;
>>>> + }
>>>
>>> This might be what's causing your failures with 2.4. It expects
>>> NULL if a dentry was added to the dcache (which happens with
>>> ENOENT), or a non-null error pointer for any other error. As
>>> mentioned above, I think the semantics of the return from lookup
>>> have changed in 2.6, so we might want to change this up a bit.
>>>
>>>> }
>>>>
>>>> /* return 0 on success; non-zero otherwise */
>>>> _______________________________________________
>>>> Pvfs2-developers mailing list
>>>> Pvfs2-developers at beowulf-underground.org
>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>>
>
More information about the Pvfs2-developers
mailing list