[Pvfs2-developers] Re: d_revalidate
Sam Lang
slang at mcs.anl.gov
Fri Dec 7 14:44:10 EST 2007
On Dec 7, 2007, at 1:04 PM, Murali Vilayannur wrote:
> Hi Sam,
>>
>> I can test it on older kernels. :-)
>
> Okay.. sounds good!
>
>> We do this with the ncache in the client daemon. Sure, it still
>> requires invalidating an entry and doing the lookup through the VFS
>> to the client daemon, but that seems tiny by comparison to the
>> network roundtrip.
>
> Right.. But we don't do it in the kmod, for some reason I can't remember.
I think we try to keep the kmod simple so that it's only forwarding
requests to the daemon. Caching fits better in the daemon if that's
the case.
>
> If we are going to go with the dentry cache timeout, then your patch
> will need
> some modifications..
I'm confused now. Why do we need a dentry cache timeout?
>
> Basically, we would have to check whether the dentry is still valid
> based on the timeout, and if it has expired
> we can then return 0..
>
>>
>>> The downside is some strange errors such as the one that Emmanuel
>>> is seeing with the NFS server workload..
>>
>> I think that's probably something else. His error is specific to
>> readdir being done in chunks of 26, not with dentry revalidate/
>> lookups, right?
>
> It is quite possible that this is what is causing his error. I did not
> try it out yet..
> Have you looked at it or shall I take a stab?
I think this bug might be on the server, so if you want to look into
it keep that in mind.
-sam
>
> thanks,
> Murali
>
>> -sam
>>
>>
>>>
>>> Thanks
>>> Murali
>>>
>>>
>>> On Dec 5, 2007 12:35 PM, Sam Lang <slang at mcs.anl.gov> wrote:
>>>>
>>>> Hi Murali,
>>>>
>>>> I'm trying to figure out a bug in pvfs_revalidate_common. My
>>>> understanding is that the revalidate code tries to handle the cases
>>>> where a dentry might not be valid by doing a PVFS lookup operation,
>>>> and comparing the results with the handle specified in the
>>>> inode. So
>>>> except for a couple edge cases (the root dir), we have to (or
>>>> should
>>>> be doing) a PVFS lookup and inode/handle comparison when
>>>> d_revalidate
>>>> is called by the VFS. Is this an accurate view of what's going on?
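
The lookup-and-compare behaviour described above can be sketched in user space as follows. This is only a model under stated assumptions: `model_handle`, `struct model_entry`, `model_lookup`, and `model_revalidate_by_lookup` are hypothetical names, the "server" is a plain array, and handle 0 is assumed to mean "name not found". It is not the body of pvfs2_d_revalidate_common.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a PVFS handle; 0 is assumed to mean "none". */
typedef uint64_t model_handle;

/* Hypothetical directory entry as the server would report it. */
struct model_entry {
    const char  *name;
    model_handle handle;
};

/* Models the PVFS lookup: returns the handle for name, or 0 if absent. */
model_handle model_lookup(const struct model_entry *dir, size_t n,
                          const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(dir[i].name, name) == 0)
            return dir[i].handle;
    return 0;
}

/* Returns 1 when the name still resolves to the handle cached in the
 * inode, and 0 when the name is gone or now refers to another object. */
int model_revalidate_by_lookup(const struct model_entry *dir, size_t n,
                               const char *name, model_handle cached)
{
    assert(name != NULL);
    model_handle current = model_lookup(dir, n, name);
    return current != 0 && current == cached;
}
```

Assuming that removing and recreating a file gives it a new handle, a cached handle of 100 for f1 would no longer match once another node has recreated f1 with handle 101, so the check returns 0.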
>>>>
>>>> Also, while it might be slightly more optimal to do the PVFS lookup
>>>> in
>>>> the revalidate, since it requires a network operation, my guess is
>>>> it's
>>>> not much of one. Why not just return 0 from revalidate all the
>>>> time
>>>> (indicating an invalid dentry), in which case, the VFS destroys the
>>>> dentry and creates a new one by doing the lookup itself. This
>>>> leaves
>>>> the d_revalidate code fairly simple, and it doesn't seem like we're
>>>> able to optimize-out the expensive lookup for most (all?) dentries
>>>> anyway...
>>>>
>>>> I've attached a patch that does more or less what I've just
>>>> described. It seems to fix the errors I was seeing. You can
>>>> reproduce them with the following sequence:
>>>>
>>>> nodea> touch f1
>>>> nodeb> rm f1
>>>> nodeb> touch f1
>>>> nodea> ls -lrt f1
>>>>
>>>> The result is either an ENOENT error for the file, or in some
>>>> cases, a
>>>> bug in the kernel (I've included the trace below). Honestly, I
>>>> didn't
>>>> dig into this bug too much -- it looks like the new inode given to
>>>> d_splice_alias is corrupted in some way -- but I think there
>>>> might be
>>>> a simpler fix, so I'll wait to hear from you on the above before
>>>> digging further.
>>>>
>>>> Thanks,
>>>> -sam
>>>>
>>>>
>>>>
>>>> Index: src/kernel/linux-2.6/dcache.c
>>>> ===================================================================
>>>> RCS file: /projects/cvsroot/pvfs2/src/kernel/linux-2.6/dcache.c,v
>>>> retrieving revision 1.32
>>>> diff -u -a -p -r1.32 dcache.c
>>>> --- src/kernel/linux-2.6/dcache.c 9 Oct 2007 00:05:39 -0000 1.32
>>>> +++ src/kernel/linux-2.6/dcache.c 5 Dec 2007 20:20:08 -0000
>>>> @@ -36,7 +36,7 @@ static int pvfs2_d_revalidate_common(str
>>>> }
>>>>
>>>> if (inode == NULL) {
>>>> - return 1;
>>>> + return 0;
>>>> }
>>>> if (inode && parent_inode)
>>>> {
>>>> @@ -49,6 +49,7 @@ static int pvfs2_d_revalidate_common(str
>>>> if (!is_root_handle(inode))
>>>> {
>>>> gossip_debug(GOSSIP_DCACHE_DEBUG,
>>>> "pvfs2_d_revalidate_common: attempting lookup.\n");
>>>> + return 0;
>>>> new_op = op_alloc(PVFS2_VFS_OP_LOOKUP);
>>>> if (!new_op)
>>>> {
>>>> @@ -110,6 +111,7 @@ static int pvfs2_d_revalidate_common(str
>>>> gossip_debug(GOSSIP_DCACHE_DEBUG,
>>>> "pvfs2_d_revalidate_common: root handle, lookup skipped.\n");
>>>> }
>>>>
>>>> + return 0;
>>>> /* now perform revalidation */
>>>> gossip_debug(GOSSIP_DCACHE_DEBUG, " (inode %llu)\n",
>>>> llu(get_handle_from_ino(inode)));
>>>>
>>>>
>>>>
>>>>
>>>> [154457.551332] ------------[ cut here ]------------
>>>> [154457.560704] kernel BUG at fs/dcache.c:952!
>>>> [154457.569044] invalid opcode: 0000 [1] SMP
>>>> [154457.577277] CPU 3
>>>> [154457.581498] Modules linked in: pvfs2 raid0 nfs lockd sunrpc ipv6
>>>> sony_acpi pcc_acpi dev_acpi tc1100_wmi video sbs i2c_ec dock
>>>> container button battery ac asus_acpi backlight xfs sbp2 lp
>>>> af_packet snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss
>>>> snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi
>>>> snd_seq_midi_event snd_seq serio_raw snd_timer psmouse
>>>> snd_seq_device ib_ipath ib_core pcspkr parport_pc parport k8temp
>>>> snd soundcore snd_page_alloc i2c_nforce2 i2c_core shpchp
>>>> pci_hotplug tsdev evdev ext3 jbd mbcache sg sd_mod ata_generic
>>>> ohci1394 sata_nv amd74xx ohci_hcd libata ehci_hcd ieee1394
>>>> scsi_mod tg3 usbcore generic raid456 xor raid1 md_mod thermal
>>>> processor fan fbcon tileblit font bitblit softcursor vesafb
>>>> cfbcopyarea cfbimgblt cfbfillrect capability commoncap
>>>> [154457.721315] Pid: 32045, comm: ls Not tainted 2.6.20-16-generic #2
>>>> [154457.733629] RIP: 0010:[<ffffffff8023cd74>] [<ffffffff8023cd74>] d_instantiate+0x14/0x90
>>>> [154457.749962] RSP: 0018:ffff8100740b3bb8 EFLAGS: 00010216
>>>> [154457.760721] RAX: 0000000000008000 RBX: ffff81011f97ec28 RCX: 0000000000000036
>>>> [154457.775115] RDX: 0000000000000003 RSI: ffff81011f97ec28 RDI: ffff8101216acb10
>>>> [154457.789507] RBP: ffff8101216acb80 R08: ffff8100740b2000 R09: 0000000000000000
>>>> [154457.803901] R10: 000000000000000a R11: 0000000000000202 R12: ffff8101216acb10
>>>> [154457.818293] R13: ffff81011f9520f8 R14: ffff81007462a740 R15: 0000000000000000
>>>> [154457.832688] FS: 00002aab82463b00(0000) GS:ffff810121e60740(0000) knlGS:0000000000000000
>>>> [154457.848983] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [154457.860607] CR2: 0000000000404131 CR3: 00000001229d6000 CR4: 00000000000006e0
>>>> [154457.875002] Process ls (pid: 32045, threadinfo ffff8100740b2000, task ffff81007dea00c0)
>>>> [154457.891124] Stack: ffff81011f97ec28 0000000000000000 ffff8101216acb10 ffffffff8023215d
>>>> [154457.907419] ffff8101216acb10 ffff81007462abe8 ffff81011f97ec28 ffffffff8850f706
>>>> [154457.922471] ffff8101216acb10 ffff8100740b3ca8 ffff8100740b3e48 ffff8100740b3ca8
>>>> [154457.937140] Call Trace:
>>>>
>>>> [154457.942578] [<ffffffff8023215d>] d_splice_alias+0x11d/0x140
>>>> [154457.954037] [<ffffffff8850f706>] :pvfs2:pvfs2_d_revalidate+0x206/0x300
>>>> [154457.967403] [<ffffffff8020ca58>] do_lookup+0x198/0x210
>>>> [154457.977990] [<ffffffff802099cb>] __link_path_walk+0x90b/0xdc0
>>>> [154457.989795] [<ffffffff8020e6db>] link_path_walk+0x5b/0xf0
>>>> [154458.000900] [<ffffffff8020b37e>] touch_atime+0xde/0x130
>>>> [154458.011657] [<ffffffff8020c770>] do_path_lookup+0x1b0/0x1e0
>>>> [154458.023104] [<ffffffff802120d7>] getname+0x167/0x1d0
>>>> [154458.033346] [<ffffffff802248fb>] __user_walk_fd+0x4b/0x80
>>>> [154458.044454] [<ffffffff80241e5c>] vfs_lstat_fd+0x2c/0x70
>>>> [154458.055220] [<ffffffff8020b37e>] touch_atime+0xde/0x130
>>>> [154458.065973] [<ffffffff8022c027>] sys_newlstat+0x27/0x50
>>>> [154458.076738] [<ffffffff8026806d>] error_exit+0x0/0x84
>>>> [154458.086977] [<ffffffff8026111e>] system_call+0x7e/0x83
>>>> [154458.097567]
>>>> [154458.100708]
>>>> [154458.100709] Code: 0f 0b eb fe 48 c7 c7 00 f2 55 80 e8 9c ac 02 00 48 85 db 74
>>>> [154458.119148] RIP [<ffffffff8023cd74>] d_instantiate+0x14/0x90
>>>> [154458.130808] RSP <ffff8100740b3bb8>
>>>> [154458.138115]
>>>>
>>>> RCS file: /projects/cvsroot/pvfs2/src/kernel/linux-2.6/dcache.c,v
>>>> retrieving revision 1.32
>>>> diff -u -a -p -r1.32 dcache.c
>>>> --- src/kernel/linux-2.6/dcache.c 9 Oct 2007 00:05:39 -0000 1.32
>>>> +++ src/kernel/linux-2.6/dcache.c 5 Dec 2007 19:59:26 -0000
>>>> @@ -49,6 +49,7 @@ static int pvfs2_d_revalidate_common(str
>>>> if (!is_root_handle(inode))
>>>> {
>>>> gossip_debug(GOSSIP_DCACHE_DEBUG,
>>>> "pvfs2_d_revalidate_common: attempting lookup.\n");
>>>> + return 0;
>>>> new_op = op_alloc(PVFS2_VFS_OP_LOOKUP);
>>>> if (!new_op)
>>>> {
>>>> @@ -110,6 +111,7 @@ static int pvfs2_d_revalidate_common(str
>>>> gossip_debug(GOSSIP_DCACHE_DEBUG,
>>>> "pvfs2_d_revalidate_common: root handle, lookup skipped.\n");
>>>> }
>>>>
>>>> + return 0;
>>>> /* now perform revalidation */
>>>> gossip_debug(GOSSIP_DCACHE_DEBUG, " (inode %llu)\n",
>>>> llu(get_handle_from_ino(inode)));
>>>>
>>>>
>>>>
>>>
>>
>>
>