[Pvfs2-developers] help with kernel dentry revalidate problem
Sam Lang
slang at mcs.anl.gov
Wed Aug 15 23:00:18 EDT 2007
On Aug 15, 2007, at 7:50 PM, Phil Carns wrote:
> You might want to try repeating the test with the pvfs2-client set
> to disable the ncache and acache (set the timeout to zero either in
> proc or with command line arguments). I don't know if they are
> playing any role or not, but it may at least simplify the debugging
> a little.
The ncache was my guess too, but it turned out to still occur with -n 0.
-sam
>
> -Phil
>
> Pete Wyckoff wrote:
>> Sam and I have been tracking down a pvfs bug when using the VFS
>> interface. Kevin discovered it.
>> The code is test #7 in simul. It runs in parallel, four tasks,
>> tasks 0 and 1 on node1 and tasks 2 and 3 on node2. It does:
>>     if (task == 0)
>>         mkdir("foo");
>>     MPI_Barrier();
>>     sleep(3);
>>     stat("foo");
>>     if (task == 0)
>>         rmdir("foo");
>> On a freshly initialized pvfs (1 server for both md + io), it works.
>> Task 0 creates the directory, and all four tasks stat it
>> successfully. When the process exits, the directory is indeed gone.
>> The second time you run it, tasks 2 and 3 (on node2) get -ENOENT
>> from the stat, but tasks 0 and 1 work fine as before and the
>> directory was indeed created properly.
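For reference, the per-task sequence described above can be sketched in plain C without MPI. This is only an illustration under stated assumptions: the function name `stat_shared_dir` and the `is_creator` flag are hypothetical (they stand in for the `task == 0` check), and a single process on one client cannot reproduce the cross-node failure. It just shows the mkdir/stat/rmdir order and where the -ENOENT would surface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * One task's view of the test: the creator makes the directory, every
 * task stats it, and the creator removes it again.
 * Returns 0 on success, or -errno from the first call that failed.
 * (Hypothetical helper, not part of simul or PVFS2.)
 */
static int stat_shared_dir(const char *path, int is_creator)
{
    struct stat sb;
    int rc = 0;

    assert(path != NULL);

    if (is_creator && mkdir(path, 0755) != 0)
        return -errno;

    /* The real test runs MPI_Barrier() and sleep(3) at this point. */

    if (stat(path, &sb) != 0)
        rc = -errno;            /* tasks 2 and 3 saw -ENOENT here */
    else if (!S_ISDIR(sb.st_mode))
        rc = -ENOTDIR;

    /* Clean up even if stat failed, but keep the first error. */
    if (is_creator && rmdir(path) != 0 && rc == 0)
        rc = -errno;

    return rc;
}
```

Calling `stat_shared_dir(path, 1)` twice in a row against the same path mimics running the test twice from the creating task; on a local file system both calls should return 0.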
>> Looking down a bit further, the server sees lookup requests from
>> tasks 2 and 3, and returns the proper handle ID. Then it sees
>> getattr requests from tasks 2 and 3 for the handle ID that the
>> directory had on the first run, not the handle ID for this run.
>> We may have traced this to the kernel module. Some of the log looks
>> like this:
>> pvfs2_d_revalidate_common: called on dentry ffff81003df86970.
>> pvfs2_d_revalidate_common: parent found.
>> pvfs2_d_revalidate_common: attempting lookup.
>> Alloced OP (ffff81003d98a1f8: 121 OP_LOOKUP)
>> pvfs2: service_operation: pvfs2_lookup ffff81003d98a1f8
>> client-core: reading op tag 120 OP_LOOKUP
>> client-core: reading op tag 121 OP_LOOKUP
>> (get) Alloced OP (ffff81003df561b8:120)
>> (get) Alloced OP (ffff81003d98a1f8:121)
>> pvfs2: service_operation pvfs2_lookup returning: 0 for ffff81003df561b8.
>> pvfs2_d_revalidate_common: lookup failure or no match.
>> Releasing OP (ffff81003df561b8: 120)
>> pvfs2_getattr: called on simul_dir_stat.0
>> pvfs2_inode_getattr: called on inode 1048471
>> Something calls revalidate on the dentry. The lookup returns
>> successful from userspace. The kernel sees that the handles are
>> different:
>>     if ((new_op->downcall.status != 0) ||
>>         !match_handle(new_op->downcall.resp.lookup.refn.handle, inode))
>>     {
>>         gossip_debug(GOSSIP_DCACHE_DEBUG,
>>             "pvfs2_d_revalidate_common: lookup failure or no match.\n");
>>         op_release(new_op);
>>         return(0);
>>     }
>> But then it immediately issues a getattr for the old handle ID.
>> Anybody know how to fix or destroy the bad dentry? (Looks at Murali...)
>> -- Pete
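The symptom described above can be modelled in a few lines of user-space C. This is a toy sketch, not kernel or PVFS2 code: every name (`toy_server`, `toy_dentry`, `toy_stat`, the `drop_on_invalid` flag) and the handle values used with it are made up for illustration. It assumes the failure comes from the stale cached handle still being used after revalidate returns 0, which is one reading of the log rather than a confirmed diagnosis.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins only; these are not PVFS2 or Linux VFS types. */
typedef unsigned long toy_handle;

struct toy_server { toy_handle current; };  /* handle bound to "foo"; 0 = none */
struct toy_dentry { toy_handle cached; };   /* handle remembered by the client */

/* Server-side lookup of the name: yields the handle valid right now. */
static int toy_lookup(const struct toy_server *s, toy_handle *out)
{
    if (s->current == 0)
        return -ENOENT;
    *out = s->current;
    return 0;
}

/* Server-side getattr: only the current handle still exists. */
static int toy_getattr(const struct toy_server *s, toy_handle h)
{
    return (h != 0 && h == s->current) ? 0 : -ENOENT;
}

/* Same shape as the quoted check: 0 on lookup failure or handle mismatch. */
static int toy_revalidate(const struct toy_server *s,
                          const struct toy_dentry *d)
{
    toy_handle fresh;

    assert(s != NULL && d != NULL);
    if (toy_lookup(s, &fresh) != 0 || fresh != d->cached)
        return 0;
    return 1;
}

/*
 * stat path. With drop_on_invalid == 0 the stale handle is used anyway
 * (the observed symptom); with 1 the cached handle is replaced by a
 * fresh lookup before getattr is issued.
 */
static int toy_stat(const struct toy_server *s, struct toy_dentry *d,
                    int drop_on_invalid)
{
    if (!toy_revalidate(s, d) && drop_on_invalid) {
        toy_handle fresh;
        int rc = toy_lookup(s, &fresh);

        if (rc != 0)
            return rc;
        d->cached = fresh;
    }
    return toy_getattr(s, d->cached);
}
```

In this model, a dentry that cached the first run's handle gets -ENOENT from `toy_stat(..., 0)` once the server has handed out a new handle for the same name, while `toy_stat(..., 1)` picks up the new handle and succeeds. How the real dentry should be invalidated in the kernel module is exactly the open question in the quoted message.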
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers