[Pvfs2-developers] Listing performance patch

Phil Carns carns at mcs.anl.gov
Tue Nov 4 10:54:49 EST 2008


Just a quick update for anyone following this bug report.  So far I 
haven't been able to reproduce this on a single-server file system 
using either 2.6.3 (patched to increase the dirent count) or trunk 
code.  We are still trying to narrow down what the key factor is.

-Phil

Phil Carns wrote:
> Hi David,
> 
> Thanks for the bug report.  I don't know offhand how the dirent count 
> could cause the ls utility to consume so much memory, but I'll see if I 
> can reproduce it here.
> 
> thanks,
> -Phil
> 
> David Metheny wrote:
>> It appears that increasing "MAX_DIRENT_COUNT" in
>> src/kernel/linux2.6/pvfs2-dev-proto.h has turned out to be a bad thing
>> for us. We had also set it to 96, and found some issues in some stress
>> testing.
>>
>> We've hit a scenario where a single directory on our file system
>> contained more than 800,000 files/directories, with many directories
>> containing 10,000+ files each. When we executed 'ls -Rl' on the
>> top-level directory, after about 8 hours the 'ls' command was consuming
>> 800MB+ of memory and eventually exited with a "memory exhausted" error.
>> We definitely have some paths that are long enough that 96 of them
>> won't fit into a single 4K page.
>> We backed out only the "MAX_DIRENT_COUNT" change in
>> src/kernel/linux2.6/pvfs2-dev-proto.h, put it back at 0x00000020 (32),
>> and reran the test. The 'ls -Rl' now consistently runs in about an hour
>> and finishes correctly.
>>
>> -----Original Message-----
>> From: pvfs2-developers-bounces at beowulf-underground.org
>> [mailto:pvfs2-developers-bounces at beowulf-underground.org] On Behalf Of
>> Phil Carns
>> Sent: Thursday, September 11, 2008 9:33 AM
>> To: Bart Taylor
>> Cc: pvfs2-developers at beowulf-underground.org
>> Subject: Re: [Pvfs2-developers] Listing performance patch
>>
>> Hi Bart,
>>
>> I fixed a silly bug in our readdir logic just now, and now your patch 
>> works fine for the case I was looking at.  I applied the dirent 
>> increase patch to trunk.
>>
>> I now get the correct number of getdents calls (using ext3 for 
>> comparison) on PVFS:
>>
>> getdents64(3, /* 170 entries */, 4096)  = 4080
>> getdents64(3, /* 132 entries */, 4096)  = 3168
>> getdents64(3, /* 0 entries */, 4096)    = 0
>>
>> So even with just 300 entries your patch takes us from 11 getdents 
>> system calls down to 3 to do an ls.
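
The call counts above can be reproduced with a little arithmetic. The sketch below is a simplified model, not PVFS or kernel code. It assumes every record takes 24 bytes (4080 / 170 in the trace), that the listing holds 302 entries (170 + 132 in the trace, presumably the 300 files plus "." and ".."), and that a file system may cap the entries it returns per call. The function name and its parameters are made up for illustration.

```python
def getdents_calls(total_entries, reclen=24, bufsize=4096, cap=None):
    """Return the number of entries delivered by each simulated getdents call.

    The final 0 models the call that reports end of directory.
    """
    per_call = bufsize // reclen          # records that fit in the user buffer
    if cap is not None:
        per_call = min(per_call, cap)     # artificial per-call limit in the file system
    calls = []
    remaining = total_entries
    while remaining > 0:
        n = min(per_call, remaining)
        calls.append(n)
        remaining -= n
    calls.append(0)
    return calls


uncapped = getdents_calls(302)
capped = getdents_calls(302, cap=32)
print(uncapped, len(uncapped))
print(len(capped))
```

Under these assumptions the uncapped case gives [170, 132, 0] (3 calls) and a cap of 32 gives 11 calls, which matches the counts in the traces. The model does not reproduce the 34-entry first call seen on PVFS.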
>>
>> Thanks!
>> -Phil
>>
>> Phil Carns wrote:
>>> I looked at the code a little just now.  The getdents system call 
>>> passes a filldir() callback function into the file system readdir() 
>>> implementation that lets it fill entries until the user's dentry 
>>> buffer is full.  The dentries at this level use variable length 
>>> strings.  The only remaining cap at this point is the size of the 
>>> dentry buffer passed in from user space (and any artificial cap 
>>> introduced by the file system implementation).
>>>
>>> http://lxr.linux.no/linux+v2.6.26.5/fs/readdir.c#L270
>>> http://lxr.linux.no/linux+v2.6.26.5/fs/readdir.c#L232
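
As a rough illustration of the callback pattern described above, here is a small user-space model in Python. It is not the kernel code: DirentBuffer, readdir() and the entry names are hypothetical, and the record size formula (a 19-byte header plus the NUL-terminated name, rounded up to 8 bytes) is an assumption based on the layout of struct linux_dirent64 that should be checked against the linked source.

```python
HEADER = 19  # assumed: d_ino (8) + d_off (8) + d_reclen (2) + d_type (1)


def reclen(name):
    """Size of one record: header + name + NUL, rounded up to 8 bytes."""
    raw = HEADER + len(name.encode()) + 1
    return (raw + 7) // 8 * 8


class DirentBuffer:
    """Stand-in for the user's dentry buffer handed to getdents."""

    def __init__(self, size):
        self.size = size
        self.used = 0
        self.names = []

    def filldir(self, name):
        """Append one entry; return False when it no longer fits."""
        need = reclen(name)
        if self.used + need > self.size:
            return False
        self.used += need
        self.names.append(name)
        return True


def readdir(entries, pos, buf):
    """Feed entries starting at pos until filldir refuses one.

    Returns the position to resume from on the next call.
    """
    while pos < len(entries) and buf.filldir(entries[pos]):
        pos += 1
    return pos


entries = ["f%03d" % i for i in range(302)]  # hypothetical short names
pos = 0
while True:
    buf = DirentBuffer(4096)
    pos = readdir(entries, pos, buf)
    print(len(buf.names), buf.used)
    if not buf.names:
        break
```

With four-character names each record comes to 24 bytes, so the loop prints 170 entries / 4080 bytes, then 132 / 3168, then 0 / 0. The resume position only advances past entries that filldir() accepted, so nothing is skipped when the buffer fills before the file system runs out of entries.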
>>>
>>> If I do an strace on a directory with 300 entries on ext3, this is 
>>> what happens:
>>>
>>> getdents64(3, /* 170 entries */, 4096)  = 4080
>>> getdents64(3, /* 132 entries */, 4096)  = 3168
>>> getdents64(3, /* 0 entries */, 4096)    = 0
>>>
>>> If I do the same thing on a PVFS volume, this is what happens:
>>>
>>> getdents64(3, /* 34 entries */, 4096)   = 816
>>> getdents64(3, /* 32 entries */, 4096)   = 768
>>> getdents64(3, /* 32 entries */, 4096)   = 768
>>> getdents64(3, /* 32 entries */, 4096)   = 768
>>> getdents64(3, /* 32 entries */, 4096)   = 768
>>> getdents64(3, /* 32 entries */, 4096)   = 768
>>> getdents64(3, /* 32 entries */, 4096)   = 768
>>> getdents64(3, /* 32 entries */, 4096)   = 768
>>> getdents64(3, /* 32 entries */, 4096)   = 768
>>> getdents64(3, /* 12 entries */, 4096)   = 288
>>> getdents64(3, /* 0 entries */, 4096)    = 0
>>>
>>> The latter is not filling up the getdents buffer because our code is 
>>> stopping at 32 entries per iteration.  If I then apply Bart's patch, 
>>> things improve in terms of how much fits into one getdents system 
>>> call, but on my box at least (2.6.24-19, 32-bit, current PVFS trunk) 
>>> something new breaks:
>>>
>>> getdents64(3, /* 170 entries */, 4096)  = 4080
>>> getdents64(3, /* 0 entries */, 4096)    = 0
>>>
>>> It looks like it stopped after one getdents (the actual output from 
>>> ls only shows 170 entries).
>>>
>>> So... I would like to apply this patch, but first I need to dig a 
>>> little more and find out what the bug is on my system that is making 
>>> it stop at the first getdents call.  It must not be handling the 
>>> token right in the case where PVFS returns more entries than 
>>> filldir() can consume.
>>>
>>> -Phil
>>>
>>>
>>> Rob Ross wrote:
>>>> Has the internal kernel value changed since we last looked?
>>>>
>>>> Rob
>>>>
>>>> On Sep 4, 2008, at 4:16 PM, Phil Carns wrote:
>>>>
>>>>> Sam Lang wrote:
>>>>>> Hi Bart,
>>>>>> Thanks for the patch.  For users with that many files in a 
>>>>>> directory, using pvfs2-ls is probably a good alternative.
>>>>>> The kernel does readdir requests 32 entries at a time, so 
>>>>>> increasing MAX_NUM_DIRENTS won't help for ls.  Long listings 
>>>>>> require getting the size of files, which in PVFS is fairly 
>>>>>> expensive.
>>>>>> Unfortunately, we haven't kept up with the readdirplus 
>>>>>> implementation, and some bugs have probably crept in since Murali 
>>>>>> added that tool.  If you were motivated to look at where the 
>>>>>> servers were crashing, we'd certainly be interested in helping 
>>>>>> with the debugging there.
>>>>>> Thanks again,
>>>>>> -sam
>>>>> It does look like ls improved with the patches for some reason, 
>>>>> though.
>>>>>
>>>>> The 256 and 512 results are also just about close enough to be 
>>>>> noise. It looks like most of the benefit came from the jump from 
>>>>> 32/64 to 256.
>>>>>
>>>>> -Phil
>>>>> _______________________________________________
>>>>> Pvfs2-developers mailing list
>>>>> Pvfs2-developers at beowulf-underground.org
>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers


