[Pvfs2-developers] help with kernel dentry revalidate problem

Sam Lang slang at mcs.anl.gov
Wed Aug 15 23:02:38 EDT 2007


On Aug 15, 2007, at 7:00 PM, Murali Vilayannur wrote:

> Oops: Forget to do a reply-to-all..
>
> Pete,
> Weird..
> d_revalidate is the FS callback to tell the VFS that the dentry is no
> longer valid
> and in this case it looks like it rightfully returns 0 indicating that
> the entry is longer to be trusted.
>
>        int status = dentry->d_op->d_revalidate(dentry, nd);
>         if (unlikely(status <= 0)) {
>                 /*
>                  * The dentry failed validation.
>                  * If d_revalidate returned 0 attempt to invalidate
>                  * the dentry otherwise d_revalidate is asking us
>                  * to return a fail status.
>                  */
>                 if (!status) {
>                         if (!d_invalidate(dentry)) { <-- will try to
> invalidate the dentry
>                                 dput(dentry);  <-- drop the reference
>                                 dentry = NULL; <-- set it to NULL
>                         }
>                 } else {
>                         dput(dentry);
>                         dentry = ERR_PTR(status);
>                 }
> return dentry <-- gets returned.
>
> This percolates upwards
>
>         dentry = cached_lookup(base, name, nd);
>         if (!dentry) {
>                 struct dentry *new = d_alloc(base, name); <-- would
> have allocated a new dentry
>                 dentry = ERR_PTR(-ENOMEM);
>                 if (!new)
>                         goto out;
>                 dentry = inode->i_op->lookup(inode, new, nd); <--
> callsinto our lookup
>                 if (!dentry)
>                         dentry = new;
>                 else
>                         dput(new);
>         }
>
> I am confused on why we are doing a getattr on the stale handle unless
> the dentry was somehow not being invalidated i.e. d_invalidate()
> failed perhaps?
> In any case d_drop() is the API to drop the dentry which is what
> d_invalidate() should have invoked..
> btw, Were there are any files under the "foo" directory?
>
> Can you bump up the loglevels (include lookup GOSSIP_NAME_DEBUG,
> getattr GOSSIP_GETATTR_DEBUG) and send it to me?
> I will run this workload tonight and see if I can repro this..
> This needs more than 1 machine, right?

Yep.  When Kevin discovered this problem, I think it was with two  
client nodes and four MPI processes.  I think Pete was using the same  
setup.  This shouldn't need more than one metadata server to reproduce.

-sam


> thanks,
> Murali
>
> On 8/15/07, Pete Wyckoff <pw at osc.edu> wrote:
>> Sam and I have been tracking down a pvfs bug when using the VFS
>> interface.  Kevin discovered it.
>>
>> The code is test #7 in simul.  It runs in parallel, four tasks,
>> tasks 0 and 1 on node1 and tasks 2 and 3 on node2.  It does:
>>
>>     if (task == 0)
>>         mkdir("foo");
>>
>>     MPI_Barrier();
>>     sleep(3);
>>
>>     stat("foo");
>>
>>     if (task == 0)
>>         rmdir("foo");
>>
>> On a freshly initialized pvfs (1 server for both md + io), it works.
>> Task 0 creates the directory, and all four tasks stat it
>> successfully.  When the process exits, the directory is indeed gone.
>>
>> The second time you run it, tasks 2 and 3 (on node2) get -ENOENT
>> from the stat, but tasks 0 and 1 work fine as before and the
>> directory was indeed created properly.
>>
>> Looking down a bit further, the server sees lookup requests from
>> tasks 2 and 3, and returns the proper handle Id.  Then it sees
>> getattr requests from tasks 2 and 3 for the handle ID that the
>> directory had on the first run, not the handle ID for this run.
>>
>> We may have traced this to the kernel module.  Some of the log looks
>> like this:
>>
>>     pvfs2_d_revalidate_common: called on dentry ffff81003df86970.
>>     pvfs2_d_revalidate_common: parent found.
>>     pvfs2_d_revalidate_common: attempting lookup.
>>     Alloced OP (ffff81003d98a1f8: 121 OP_LOOKUP)
>>     pvfs2: service_operation: pvfs2_lookup ffff81003d98a1f8
>>     client-core: reading op tag 120 OP_LOOKUP
>>     client-core: reading op tag 121 OP_LOOKUP
>>     (get) Alloced OP (ffff81003df561b8:120)
>>     (get) Alloced OP (ffff81003d98a1f8:121)
>>     pvfs2: service_operation pvfs2_lookup returning: 0 for
>>     ffff81003df561b8.
>>     pvfs2_d_revalidate_common: lookup failure or no match.
>>     Releasing OP (ffff81003df561b8: 120)
>>     pvfs2_getattr: called on simul_dir_stat.0
>>     pvfs2_inode_getattr: called on inode 1048471
>>
>> Something calls revalidate on the dentry.  The lookup returns
>> successful from userspace.  The kernel sees that the handles are
>> different:
>>
>>             if((new_op->downcall.status != 0) ||
>>                     !match_handle(new_op- 
>> >downcall.resp.lookup.refn.handle, inode))
>>             {
>>                 gossip_debug(GOSSIP_DCACHE_DEBUG,  
>> "pvfs2_d_revalidate_common: lookup failure or no match.\n");
>>                 op_release(new_op);
>>                 return(0);
>>             }
>>
>> But then immediatly issues a getattr for the old handle ID.  Anybody
>> know how to fix or destroy the bad dentry?  (Looks at Murali...)
>>
>>                 -- Pete
>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>



More information about the Pvfs2-developers mailing list