[Pvfs2-developers] patch: namei bug fixes
Sam Lang
slang at mcs.anl.gov
Mon Mar 26 11:47:08 EST 2007
Hi Phil,
I have comments about the patch inline. Two general comments
though. Reading this code (at least for me) has been hard because of
all the feature check #ifdefs. I think trying to support all these
different kernel versions gives us a lot of headache that maybe we
don't need (I've seen this recently with HAVE_AIO_VFS_SUPPORT). Any
chance we could pair it down to more recent versions? Do you guys
expect to discontinue supporting 2.4 in the near future? Would it be
possible to say that future pvfs releases only support 2.6 (maybe
even > 2.6.x)? Anyone that has an older kernel has to use an older
pvfs version?
-sam
On Mar 20, 2007, at 9:36 AM, Phil Carns wrote:
> I am sending this patch in a separate email because it may need
> some discussion to hash out. Sometime in the past several months,
> the pvfs2_lookup() function in namei.c changed (I think along with
> something not directly related, but I don't recall exactly what
> happened now).
>
> This change caused several directory related bugs to show up for us
> on 2.4 and 2.6 kernels. The 2.4 one was more severe, though,
> because it caused a kernel panic. It could be triggered by the
> "rename01" test in LTP, or by the following manual steps:
>
> [root at rhel3 pvfs2]# mkdir testdir
> [root at rhel3 pvfs2]# cd testdir
> [root at rhel3 testdir]# mkdir dir1
> [root at rhel3 testdir]# mv dir1 dir2
> [root at rhel3 testdir]# ls -alh
> total 12K
> drwxr-xr-x 1 root root 4.0K Dec 1 12:18 .
> drwxrwxrwt 1 root root 4.0K Dec 1 12:17 ..
> drwxr-xr-x 1 root root 4.0K Dec 1 12:17 dir2
> [root at centos-rhel3 testdir]# rm dir2
> <crash>
>
> ... so it had something to do with removing a directory that had
> previously been renamed.
>
> At any rate, I don't know enough about dentries and inodes anymore
> to truly understand the old logic that used to work or the newer
> logic that causes us problems. This patch just naively reverts
> some of the logic in namei.c to the point that it works again for
> us (without changing anything else that was in that set of
> commits). With this in place, we don't see any more test case
> failures or kernel panics on 2.6 or 2.4.
>
> We have been using this patch for several months with success, but
> it would probably be a good idea for someone more familiar with
> this code to look at the change more carefully.
>
I don't have access to a 2.4 kernel with root at the moment, but
following the code paths I have some comments. Hopefully Murali can
chime in at some point and correct me where I'm wrong about stuff.
> -Phil
> Index: pvfs2_src/src/kernel/linux-2.6/namei.c
> ===================================================================
> --- pvfs2_src/src/kernel/linux-2.6/namei.c (revision 2909)
> +++ pvfs2_src/src/kernel/linux-2.6/namei.c (revision 2910)
> @@ -164,19 +164,24 @@
> inode = pvfs2_iget(sb, &new_op->downcall.resp.lookup.refn);
> if (inode && !is_bad_inode(inode))
> {
> - struct dentry *res;
> + found_pvfs2_inode = PVFS2_I(inode);
>
> + /* store the retrieved handle and fs_id */
> + found_pvfs2_inode->refn = new_op-
> >downcall.resp.lookup.refn;
> +
I think this is redundant. Unless you're using a _very_ old 2.4
kernel, the pvfs2_iget call sets the fsid and handle in the pvfs2
inode pointer of the inode. Basically, pvfs2_iget translates to a
pvfs2_iget_common (with keep_locked == 0), which means that
pvfs2_set_inode will get called, which will do the same thing as
above. The only way I can see that this wouldn't happen is if
iget4_locked isn't supported by your kernel version, but it appears
to have been in 2.4.25 and up, so you'd have to be running with
something pretty old.
I think that Murali added pvfs2_iget to abstract out this manual
setting, which will still appear to do in some places.
> /* update dentry/inode pair into dcache */
> dentry->d_op = &pvfs2_dentry_operations;
>
> - res = pvfs2_d_splice_alias(dentry, inode);
> + pvfs2_d_splice_alias(dentry, inode);
>
> gossip_debug(GOSSIP_NAME_DEBUG, "Lookup success (inode
> ct = %d)\n",
> (int)atomic_read(&inode->i_count));
> +#if 0
> op_release(new_op);
> if (res)
> res->d_op = &pvfs2_dentry_operations;
> return res;
> +#endif
Here too, if you're running 2.4, then res is guaranteed to equal
dentry, so while setting the dentry_operations struct to d_op is
redundant for 2.4, its needed for 2.6 in the case where we found a
disconnected dentry and returned that instead. The only difference
here in the 2.4 code path that I can see is that if a new entry _was_
added to the dcache, you return NULL now, instead of returning the
new entry. Since the new entry is in the dcache, maybe that's ok
(and expected for 2.4). For 2.6 it seems clear that the new dentry
is supposed to be returned.
> }
> else if (inode && is_bad_inode(inode))
> {
> @@ -227,7 +232,14 @@
> }
>
> op_release(new_op);
> - return NULL;
> + if(ret != -ENOENT)
> + {
> + return ERR_PTR(ret);
> + }
> + else
> + {
> + return NULL;
> + }
This might be what's causing your failures with 2.4. It expects NULL
if a dentry was added to the dcache (which happens with ENOENT), or a
non-null error pointer for any other error. As mentioned above, I
think the semantics of the return from lookup have changed in 2.6, so
we might want to change this up a bit.
> }
>
> /* return 0 on success; non-zero otherwise */
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
More information about the Pvfs2-developers
mailing list