[Pvfs2-developers] Re: d_revalidate

Sam Lang slang at mcs.anl.gov
Fri Dec 7 14:57:56 EST 2007


On Dec 7, 2007, at 1:54 PM, Murali Vilayannur wrote:

> Sam,
>>
>> I think we try to keep the kmod simple so that it's only forwarding
>> requests to the daemon.  Caching fits better in the daemon if that's
>> the case.
>
> Sure.. Okay.
>
>> I'm confused now.  Why do we need a dentry cache timeout?
> Only if we wish to take advantage of the kernel-provided dcache.
> Right now, it is as if the timeout is 0, i.e. hits in the dcache are
> treated like misses for all practical purposes, since we do a
> full-blown revalidate.

Agreed, but any timeout is going to have to be small, and we already  
have our own dcache in place.  I would vote for leaving the revalidate  
function as simple as possible.
-sam

>
> thanks,
> Murali
>>
>>>
>>> Basically, we would have to check whether the dentry is still valid
>>> based on the timeout and, if it has expired, return 0.
>>>
>>>>
>>>>> The downside is some strange errors such as the one that Emmanuel
>>>>> is seeing with the NFS server workload..
>>>>
>>>> I think that's probably something else.  His error is specific to
>>>> readdir being done in chunks of 26, not with dentry revalidate/
>>>> lookups, right?
>>>
>>> It is quite possible that this is what is causing his error. I have
>>> not tried it out yet.
>>> Have you looked at it, or shall I take a stab?
>>
>> I think this bug might be on the server, so if you want to look into
>> it keep that in mind.
>> -sam
>>
>>
>>>
>>> thanks,
>>> Murali
>>>
>>>> -sam
>>>>
>>>>
>>>>>
>>>>> Thanks
>>>>> Murali
>>>>>
>>>>>
>>>>> On Dec 5, 2007 12:35 PM, Sam Lang <slang at mcs.anl.gov> wrote:
>>>>>>
>>>>>> Hi Murali,
>>>>>>
>>>>>> I'm trying to figure out a bug in pvfs2_d_revalidate_common.  My
>>>>>> understanding is that the revalidate code handles the case where a
>>>>>> dentry might no longer be valid by doing a PVFS lookup operation and
>>>>>> comparing the result with the handle stored in the inode.  So except
>>>>>> for a couple of edge cases (the root dir), we have to do (or should
>>>>>> be doing) a PVFS lookup and inode/handle comparison whenever
>>>>>> d_revalidate is called by the VFS.  Is this an accurate view of
>>>>>> what's going on?
>>>>>>
>>>>>> Also, while it might be slightly more optimal to do the PVFS lookup
>>>>>> in the revalidate itself, the lookup requires a network operation
>>>>>> either way, so my guess is that it's not much of an optimization.
>>>>>> Why not just return 0 from revalidate all the time (indicating an
>>>>>> invalid dentry)?  In that case the VFS destroys the dentry and
>>>>>> creates a new one by doing the lookup itself.  This leaves the
>>>>>> d_revalidate code fairly simple, and it doesn't seem like we're able
>>>>>> to optimize out the expensive lookup for most (all?) dentries
>>>>>> anyway...
>>>>>>
>>>>>> I've attached a patch that does more or less what I've just
>>>>>> described.  It seems to fix the errors I was seeing.  You can
>>>>>> reproduce them by doing:
>>>>>>
>>>>>> nodea> touch f1
>>>>>> nodeb> rm f1
>>>>>> nodeb> touch f1
>>>>>> nodea> ls -lrt f1
>>>>>>
>>>>>> The result is either an ENOENT error for the file or, in some cases,
>>>>>> a kernel BUG (I've included the trace below).  Honestly, I didn't
>>>>>> dig into this bug too much -- it looks like the new inode given to
>>>>>> d_splice_alias is corrupted in some way -- but I think there might
>>>>>> be a simpler fix, so I'll wait to hear from you on the above before
>>>>>> digging further.
>>>>>>
>>>>>> Thanks,
>>>>>> -sam
>>>>>>
>>>>>>
>>>>>>
>>>>>> Index: src/kernel/linux-2.6/dcache.c
>>>>>> ===================================================================
>>>>>> RCS file: /projects/cvsroot/pvfs2/src/kernel/linux-2.6/dcache.c,v
>>>>>> retrieving revision 1.32
>>>>>> diff -u -a -p -r1.32 dcache.c
>>>>>> --- src/kernel/linux-2.6/dcache.c       9 Oct 2007 00:05:39 -0000       1.32
>>>>>> +++ src/kernel/linux-2.6/dcache.c       5 Dec 2007 20:20:08 -0000
>>>>>> @@ -36,7 +36,7 @@ static int pvfs2_d_revalidate_common(str
>>>>>>    }
>>>>>>
>>>>>>    if (inode == NULL) {
>>>>>> -        return 1;
>>>>>> +        return 0;
>>>>>>    }
>>>>>>    if (inode && parent_inode)
>>>>>>    {
>>>>>> @@ -49,6 +49,7 @@ static int pvfs2_d_revalidate_common(str
>>>>>>        if (!is_root_handle(inode))
>>>>>>        {
>>>>>>            gossip_debug(GOSSIP_DCACHE_DEBUG,
>>>>>> "pvfs2_d_revalidate_common: attempting lookup.\n");
>>>>>> +            return 0;
>>>>>>            new_op = op_alloc(PVFS2_VFS_OP_LOOKUP);
>>>>>>            if (!new_op)
>>>>>>            {
>>>>>> @@ -110,6 +111,7 @@ static int pvfs2_d_revalidate_common(str
>>>>>>            gossip_debug(GOSSIP_DCACHE_DEBUG,
>>>>>> "pvfs2_d_revalidate_common: root handle, lookup skipped.\n");
>>>>>>        }
>>>>>>
>>>>>> +        return 0;
>>>>>>        /* now perform revalidation */
>>>>>>        gossip_debug(GOSSIP_DCACHE_DEBUG, " (inode %llu)\n",
>>>>>>                    llu(get_handle_from_ino(inode)));
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> [154457.551332] ------------[ cut here ]------------
>>>>>> [154457.560704] kernel BUG at fs/dcache.c:952!
>>>>>> [154457.569044] invalid opcode: 0000 [1] SMP
>>>>>> [154457.577277] CPU 3
>>>>>> [154457.581498] Modules linked in: pvfs2 raid0 nfs lockd sunrpc ipv6
>>>>>> sony_acpi pcc_acpi dev_acpi tc1100_wmi video sbs i2c_ec dock container
>>>>>> button battery ac asus_acpi backlight xfs sbp2 lp af_packet
>>>>>> snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm
>>>>>> snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event
>>>>>> snd_seq serio_raw snd_timer psmouse snd_seq_device ib_ipath ib_core
>>>>>> pcspkr parport_pc parport k8temp snd soundcore snd_page_alloc
>>>>>> i2c_nforce2 i2c_core shpchp pci_hotplug tsdev evdev ext3 jbd mbcache sg
>>>>>> sd_mod ata_generic ohci1394 sata_nv amd74xx ohci_hcd libata ehci_hcd
>>>>>> ieee1394 scsi_mod tg3 usbcore generic raid456 xor raid1 md_mod thermal
>>>>>> processor fan fbcon tileblit font bitblit softcursor vesafb cfbcopyarea
>>>>>> cfbimgblt cfbfillrect capability commoncap
>>>>>> [154457.721315] Pid: 32045, comm: ls Not tainted 2.6.20-16-generic #2
>>>>>> [154457.733629] RIP: 0010:[<ffffffff8023cd74>] [<ffffffff8023cd74>] d_instantiate+0x14/0x90
>>>>>> [154457.749962] RSP: 0018:ffff8100740b3bb8  EFLAGS: 00010216
>>>>>> [154457.760721] RAX: 0000000000008000 RBX: ffff81011f97ec28 RCX: 0000000000000036
>>>>>> [154457.775115] RDX: 0000000000000003 RSI: ffff81011f97ec28 RDI: ffff8101216acb10
>>>>>> [154457.789507] RBP: ffff8101216acb80 R08: ffff8100740b2000 R09: 0000000000000000
>>>>>> [154457.803901] R10: 000000000000000a R11: 0000000000000202 R12: ffff8101216acb10
>>>>>> [154457.818293] R13: ffff81011f9520f8 R14: ffff81007462a740 R15: 0000000000000000
>>>>>> [154457.832688] FS:  00002aab82463b00(0000) GS:ffff810121e60740(0000) knlGS:0000000000000000
>>>>>> [154457.848983] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>> [154457.860607] CR2: 0000000000404131 CR3: 00000001229d6000 CR4: 00000000000006e0
>>>>>> [154457.875002] Process ls (pid: 32045, threadinfo ffff8100740b2000, task ffff81007dea00c0)
>>>>>> [154457.891124] Stack:  ffff81011f97ec28 0000000000000000 ffff8101216acb10 ffffffff8023215d
>>>>>> [154457.907419]  ffff8101216acb10 ffff81007462abe8 ffff81011f97ec28 ffffffff8850f706
>>>>>> [154457.922471]  ffff8101216acb10 ffff8100740b3ca8 ffff8100740b3e48 ffff8100740b3ca8
>>>>>> [154457.937140] Call Trace:
>>>>>>
>>>>>> [154457.942578]  [<ffffffff8023215d>] d_splice_alias+0x11d/0x140
>>>>>> [154457.954037]  [<ffffffff8850f706>] :pvfs2:pvfs2_d_revalidate+0x206/0x300
>>>>>> [154457.967403]  [<ffffffff8020ca58>] do_lookup+0x198/0x210
>>>>>> [154457.977990]  [<ffffffff802099cb>] __link_path_walk+0x90b/0xdc0
>>>>>> [154457.989795]  [<ffffffff8020e6db>] link_path_walk+0x5b/0xf0
>>>>>> [154458.000900]  [<ffffffff8020b37e>] touch_atime+0xde/0x130
>>>>>> [154458.011657]  [<ffffffff8020c770>] do_path_lookup+0x1b0/0x1e0
>>>>>> [154458.023104]  [<ffffffff802120d7>] getname+0x167/0x1d0
>>>>>> [154458.033346]  [<ffffffff802248fb>] __user_walk_fd+0x4b/0x80
>>>>>> [154458.044454]  [<ffffffff80241e5c>] vfs_lstat_fd+0x2c/0x70
>>>>>> [154458.055220]  [<ffffffff8020b37e>] touch_atime+0xde/0x130
>>>>>> [154458.065973]  [<ffffffff8022c027>] sys_newlstat+0x27/0x50
>>>>>> [154458.076738]  [<ffffffff8026806d>] error_exit+0x0/0x84
>>>>>> [154458.086977]  [<ffffffff8026111e>] system_call+0x7e/0x83
>>>>>> [154458.097567]
>>>>>> [154458.100708]
>>>>>> [154458.100709] Code: 0f 0b eb fe 48 c7 c7 00 f2 55 80 e8 9c ac 02 00 48 85 db 74
>>>>>> [154458.119148] RIP  [<ffffffff8023cd74>] d_instantiate+0x14/0x90
>>>>>> [154458.130808]  RSP <ffff8100740b3bb8>
>>>>>> [154458.138115]
>>>>>>
>>>>>> RCS file: /projects/cvsroot/pvfs2/src/kernel/linux-2.6/dcache.c,v
>>>>>> retrieving revision 1.32
>>>>>> diff -u -a -p -r1.32 dcache.c
>>>>>> --- src/kernel/linux-2.6/dcache.c       9 Oct 2007 00:05:39 -0000       1.32
>>>>>> +++ src/kernel/linux-2.6/dcache.c       5 Dec 2007 19:59:26 -0000
>>>>>> @@ -49,6 +49,7 @@ static int pvfs2_d_revalidate_common(str
>>>>>>        if (!is_root_handle(inode))
>>>>>>        {
>>>>>>            gossip_debug(GOSSIP_DCACHE_DEBUG,
>>>>>> "pvfs2_d_revalidate_common: attempting lookup.\n");
>>>>>> +            return 0;
>>>>>>            new_op = op_alloc(PVFS2_VFS_OP_LOOKUP);
>>>>>>            if (!new_op)
>>>>>>            {
>>>>>> @@ -110,6 +111,7 @@ static int pvfs2_d_revalidate_common(str
>>>>>>            gossip_debug(GOSSIP_DCACHE_DEBUG,
>>>>>> "pvfs2_d_revalidate_common: root handle, lookup skipped.\n");
>>>>>>        }
>>>>>>
>>>>>> +        return 0;
>>>>>>        /* now perform revalidation */
>>>>>>        gossip_debug(GOSSIP_DCACHE_DEBUG, " (inode %llu)\n",
>>>>>>                    llu(get_handle_from_ino(inode)));
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>


