[Pvfs2-developers] server crash on startup with millions of files

Phil Carns pcarns at wastedcycles.org
Wed Feb 28 07:54:12 EST 2007


I know that you guys still have some ongoing discussion about the long-range
design for tracking handles, but I have another item about the
current implementation that might be of interest.

Most of the remaining startup performance problem (after Sam's
optimization patches) appears to be a result of how the db is ordered.
If I modify the attr db's comparison function so that it uses "<"
rather than ">", then all of the preads during startup go forward
through the db rather than backwards.  This cuts the startup time on a
cold db to just 34 seconds, down from 2 minutes 22 seconds.
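To make the ordering difference concrete, here is a minimal sketch of the two comparison styles. It assumes, hypothetically, that an attr db key is a bare 64-bit handle; the real key layout may differ. The type and function names are made up for illustration, and they use the qsort() signature so the sketch compiles on its own. The callback that Berkeley DB takes through DB->set_bt_compare receives DBT pointers instead.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key type: assume the key bytes are just a 64-bit handle. */
typedef uint64_t handle_t;

/* Current ">" style: larger handles sort first (descending order). */
static int handle_cmp_desc(const void *a, const void *b)
{
    handle_t ha, hb;
    /* memcpy avoids unaligned access if the key bytes are not aligned */
    memcpy(&ha, a, sizeof ha);
    memcpy(&hb, b, sizeof hb);
    if (ha > hb) return -1;
    if (ha < hb) return 1;
    return 0;
}

/* Proposed "<" style: smaller handles sort first (ascending order).
 * Swapping the arguments reverses the descending comparison. */
static int handle_cmp_asc(const void *a, const void *b)
{
    return handle_cmp_desc(b, a);
}
```

Because a Berkeley DB btree keeps its records in comparator order, a db built under one of these functions is not valid when opened with the other, which is the compatibility problem with this change.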

It could still be faster, but that seems to be the biggest part of the
time. I imagine the rest of it is just the access size (4 KB at a time),
which might be tunable through some Berkeley DB settings.

The downside of making that particular change to the comparison method 
is that it breaks storage space compatibility.

I wonder if it might be possible to accomplish the same thing in the
current db format by modifying iterate_handles() to just run the cursor
backwards (using DB_PREV instead of DB_NEXT)?  That wouldn't hurt
storage space compatibility (if it works), but I don't know whether
callers of that function care what order the handles come out in.
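Why reversing the cursor might help can be shown with an idealized model. The model assumes, hypothetically, that leaf pages holding larger handles sit later in the file. Under the ">" comparator, a DB_NEXT walk would then start near the end of the file and pread toward the front, while a DB_PREV walk would visit the same pages front to back. Real Berkeley DB page allocation is not this tidy, so treat this as a sketch of the intuition rather than a prediction. All names here are hypothetical. In the actual code the change would be the flag passed to the cursor call, e.g. dbc->c_get(dbc, &key, &data, DB_PREV) in the Berkeley DB 4.x C API.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* matches the 4 KB preads seen in the strace */

/* Hypothetical layout: with the ">" comparator, the logically first
 * leaf page (where a DB_NEXT scan starts) sits at the END of the file. */
static uint64_t page_offset(size_t logical_index, size_t n_pages)
{
    return (uint64_t)(n_pages - 1 - logical_index) * PAGE_SIZE;
}

/* Fill offsets[] with the file offsets a full scan would pread, in order.
 * forward != 0 models DB_NEXT (logical first to last);
 * forward == 0 models DB_PREV (logical last to first). */
static void scan_offsets(size_t n_pages, int forward, uint64_t *offsets)
{
    for (size_t i = 0; i < n_pages; i++) {
        size_t logical = forward ? i : n_pages - 1 - i;
        offsets[i] = page_offset(logical, n_pages);
    }
}

/* Returns 1 if every pread lands at a higher offset than the one before. */
static int is_ascending(const uint64_t *offsets, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (offsets[i] <= offsets[i - 1])
            return 0;
    return 1;
}
```

In this model only the scan direction changes; the on-disk format and the comparator stay the same, which is the point of trying DB_PREV instead of changing the comparison function.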

-Phil


Phil Carns wrote:
> Phil Carns wrote:
> 
>>> Yeah that is odd.  Setting the cursor for each call to  
>>> iterate_handles may be the reason for it starting over.  Do you know  
>>> how many times it starts over?  The number of times iterate_handles  
>>> is called will be (# of files / 4096).
>>
>> It only goes through the file twice if I am looking at the log 
>> correctly.  Also, I just realized that on both passes (the one jumping 
>> backwards 40KB at a time and the one jumping backwards 4KB at a time) 
>> it is only reading 4KB per pread.  I don't know what it is doing from 
>> a db point of view, but from an access point of view it looks like it 
>> goes backwards with a strided pattern and then goes backwards reading 
>> the entire thing.  There are some other reads scattered here and 
>> there, but those two cycles represent the overwhelming majority of the 
>> total preads in the strace file.  By spot checking I don't really see 
>> any significant divergence from the patterns.
>>
>> It also just occurred to me that maybe I should repeat the strace and 
>> try to capture it with timestamps; I'm not really sure if both of 
>> these pread cycles are actually during the scan or not.
>>
> 
> I just double-checked: both of those big pread cycles are happening 
> after this message is logged:
> 
> [D 13:06:53.916769] dbpf collection 752900094 - Setting collection 
> handle ranges to 4-536870914,4294967292-4831838202
> 
> ... but before the next message.  So they do appear to both be a result 
> of the handle scanning on startup.
> 
> -Phil



