[Pvfs2-developers] server crash on startup with millions of files

Sam Lang slang at mcs.anl.gov
Thu Mar 1 09:16:58 EST 2007


On Feb 28, 2007, at 6:54 AM, Phil Carns wrote:

> I know that you guys still have some ongoing discussion about the long
> range design for tracking handles, but I have another item about the
> current implementation that might be of interest.
>
> Most of the remaining startup performance problem (after Sam's
> optimization patches) appears to be a result of how the db is ordered.
> If I modify the attr db's comparison function so that it has a "<"
> rather than ">", then all of the preads during startup go in order
> through the db rather than backwards.  This takes the startup time on a
> cold db down to just 34 seconds.  Previously it was 2 minutes 22 seconds.
>
> It still could be faster, but that seems to be the biggest part of the
> time.  I imagine the rest of it is just the access size (4 KB at a time)
> that might be tunable through some Berkeley DB settings.
>
> The downside of making that particular change to the comparison method
> is that it breaks storage space compatibility.
>
> I wonder if it might be possible to accomplish the same thing in the
> current db format by modifying iterate_handles() to just run the cursor
> backwards (using DB_PREV instead of DB_NEXT)?  That wouldn't hurt
> storage space compatibility (if it works), but I don't know if it makes
> any difference to callers of that function what order the handles come
> out in.

It doesn't matter to the caller.  You'll also need to set the cursor
to the last position in the db with DB_LAST.  Does DB_PREV work with
DB_MULTIPLE though?  It's not clear from the above: does the
improvement to 34 seconds occur with MULTIPLE or without?
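
To make the DB_LAST/DB_PREV idea concrete, here is a rough sketch in plain
C.  It does not call Berkeley DB at all: a sorted array stands in for the
db, cmp_desc is a hypothetical stand-in for the current ">" comparison, and
store_order/walk_backward are names made up for this illustration.  It also
assumes that key order matches on-disk order, which is only what the strace
output suggests.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ">" style comparison: larger handles
 * sort first, so a forward walk sees descending handles. */
static int cmp_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x < y) - (x > y);
}

/* Put the handles in the order the stand-in "db" would keep them. */
void store_order(uint64_t *handles, size_t n)
{
    qsort(handles, n, sizeof *handles, cmp_desc);
}

/* Start at the last entry and step toward the first, which is the
 * array analogue of positioning with DB_LAST and then using DB_PREV.
 * Visited handles are copied to out; the count is returned. */
size_t walk_backward(const uint64_t *stored, size_t n, uint64_t *out)
{
    size_t count = 0;
    for (size_t i = n; i > 0; i--)
        out[count++] = stored[i - 1];
    return count;
}
```

With handles 7, 3, 9, 4 the stored order is 9, 7, 4, 3 and the backward
walk visits 3, 4, 7, 9.  The caller still sees every handle exactly once,
just in the opposite order, and the stored format is untouched.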

I mentioned previously that the dspace db gets opened with the RECNUM
flag.  I don't think that's necessary, and removing it will
invariably improve performance, but we need a way to return the
position for iterate_handles.  The easiest thing to do is turn
PVFS_ds_position into a uint64_t (currently it's only a uint32_t).  That
breaks interfaces and protocols though.
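
A small sketch of why 32 bits would not be enough, assuming the position
returned without RECNUM would be the last handle seen rather than a record
number (that is only one possible approach).  The typedefs and function
names are hypothetical stand-ins, not the real PVFS definitions; the
handle values come from the ranges in the log message quoted below.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the current and proposed position types. */
typedef uint32_t position32_t;
typedef uint64_t position64_t;

/* Nonzero if the handle survives storage in a 32-bit position. */
int fits_in_32(uint64_t handle)
{
    return handle <= UINT32_MAX;
}

/* What comes back after squeezing a handle through a 32-bit position. */
uint64_t roundtrip32(uint64_t handle)
{
    return (uint64_t)(position32_t)handle;
}

/* A 64-bit position carries the handle unchanged. */
uint64_t roundtrip64(uint64_t handle)
{
    return (uint64_t)(position64_t)handle;
}
```

The top of the second range, 4831838202, comes back from roundtrip32 as
536870906, which falls inside the first range (4-536870914).  A truncated
position could therefore resume the scan at a different, valid handle.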

-sam


>
> -Phil
>
>
> Phil Carns wrote:
>> Phil Carns wrote:
>>>> Yeah that is odd.  Setting the cursor for each call to
>>>> iterate_handles may be the reason for it starting over.  Do you
>>>> know how many times it starts over?  The number of times
>>>> iterate_handles is called will be (# of files / 4096).
>>>
>>>
>>>
>>> It only goes through the file twice if I am looking at the log  
>>> correctly.  Also, I just realized that on both passes (the one  
>>> jumping backwards 40KB at a time and the one jumping backwards  
>>> 4KB at a time) it is only reading 4KB per pread.  I don't know  
>>> what it is doing from a db point of view, but from an access  
>>> point of view it looks like it goes backwards with a strided  
>>> pattern and then goes backwards reading the entire thing.  There  
>>> are some other reads scattered here and there, but those two  
>>> cycles represent the overwhelming majority of the total preads in  
>>> the strace file.  By spot checking I don't really see any  
>>> significant divergence from the patterns.
>>>
>>> It also just occurred to me that maybe I should repeat the strace  
>>> and try to capture it with timestamps; I'm not really sure if  
>>> both of these pread cycles are actually during the scan or not.
>>>
>> I just double checked: both of those big pread cycles are
>> happening after this message is logged:
>> [D 13:06:53.916769] dbpf collection 752900094 - Setting collection handle ranges to 4-536870914,4294967292-4831838202
>> ... but before the next message.  So they do appear to both be a  
>> result of the handle scanning on startup.
>> -Phil
>
>


