[Pvfs2-developers] server crash on startup with millions of files

Sam Lang slang at mcs.anl.gov
Wed Mar 7 15:02:11 EST 2007


On Mar 1, 2007, at 10:00 AM, Sam Lang wrote:

>
> On Mar 1, 2007, at 9:52 AM, Phil Carns wrote:
>
>> Sam Lang wrote:
>>> On Feb 28, 2007, at 6:54 AM, Phil Carns wrote:
>>>> I know that you guys still have some ongoing discussion about the
>>>> long range design for tracking handles, but I have another item about
>>>> the current implementation that might be of interest.
>>>>
>>>> Most of the remaining startup performance problem (after Sam's
>>>> optimization patches) appears to be a result of how the db is ordered.
>>>> If I modify the attr db's comparison function so that it has a "<"
>>>> rather than ">", then all of the preads during startup go in order
>>>> through the db rather than backwards.  This takes the startup time on
>>>> a cold db down to just 34 seconds.  Previously it was 2 minutes 22
>>>> seconds.
>>>>
>>>> It still could be faster, but that seems to be the biggest part of the
>>>> time.  I imagine the rest of it is just the access size (4 KB at a
>>>> time), which might be tunable through some Berkeley DB settings.
>>>>
>>>> The downside of making that particular change to the comparison
>>>> method is that it breaks storage space compatibility.
>>>>
>>>> I wonder if it might be possible to accomplish the same thing in the
>>>> current db format by modifying iterate_handles() to just run the cursor
>>>> backwards (using DB_PREV instead of DB_NEXT)?  That wouldn't hurt
>>>> storage space compatibility (if it works), but I don't know if it
>>>> makes any difference to callers of that function what order the
>>>> handles come out in.
>>> It doesn't matter to the caller.  You'll also need to set the cursor
>>> to the last position in the db with DB_LAST.  Does DB_PREV work with
>>> DB_MULTIPLE though?  It's not clear from the above: does the
>>> improvement to 34 seconds occur with MULTIPLE or without?
>>> I mentioned previously that the dspace db gets opened with the RECNUM
>>> flag.  I don't think that's necessary, and removing it will invariably
>>> improve performance, but we need a way to return the position for
>>> iterate_handles.  The easiest thing to do is turn PVFS_ds_position
>>> into a uint64_t (currently it's only uint32_t).  That breaks
>>> interfaces and protocols though.
>>
>> I don't know if the PREV approach would work with MULTIPLE or not.
>> The 34 second times (with inverted comparison function) were run with
>> your MULTIPLE patches applied.  I didn't try it without the patches.
>
> I couldn't find anything in the Berkeley DB documentation about
> DB_MULTIPLE_KEY and DB_PREV not being allowed, but when I tried it, it
> returned an error about illegal flag combinations.  So our options are
> to either use DB_PREV without DB_MULTIPLE (no storage format changes),
> or change the comparison function and storage format so that we can
> use DB_NEXT with DB_MULTIPLE_KEY.
>
> Checking the storage format version and providing the appropriate
> comparison function wouldn't be hard though, and wouldn't require any
> "migration" from the old to the new format.  Older formats wouldn't
> benefit from the performance improvements though.
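
To make the version check concrete, here is a rough sketch of what
selecting the comparison function could look like.  It is not the actual
PVFS2 code: handle_t, FORMAT_VERSION_ASCENDING, and select_handle_cmp()
are made-up names, the version number is arbitrary, and I'm assuming the
legacy order is simply the inverse of the new one.  Hooking the chosen
function into Berkeley DB (via DB->set_bt_compare) is left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for a PVFS handle (assumed to be 64 bits). */
typedef uint64_t handle_t;

/* Hypothetical storage format version at which the key order flips. */
#define FORMAT_VERSION_ASCENDING 2

typedef int (*handle_cmp_fn)(const handle_t *, const handle_t *);

/* New order: smaller handles sort first. */
int cmp_ascending(const handle_t *a, const handle_t *b)
{
    return (*a > *b) - (*a < *b);
}

/* Legacy order (assumed): exactly the inverse of the new order. */
int cmp_descending(const handle_t *a, const handle_t *b)
{
    return cmp_ascending(b, a);
}

/* Pick the comparison that matches the format the db was written with,
 * so old storage spaces stay readable without any migration. */
handle_cmp_fn select_handle_cmp(int storage_format_version)
{
    if (storage_format_version >= FORMAT_VERSION_ASCENDING)
        return cmp_ascending;
    return cmp_descending;
}
```

With something like this, a storage space created under the old format
keeps its existing ordering (and its slow iteration), and only newly
created storage spaces get the faster order.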

Can we conclude this discussion?  In summary:

* The current comparison function causes bad I/O patterns when iterating
over the dspace db.  We can change it, but the disk format will change
in new releases.

	- If we change it, either we check a version number and provide the
right comparison function, or we perform migration to the new storage
format.

	- If we don't change it, we can still improve performance by
iterating from the last entry to the first, but then we can't use
DB_MULTIPLE_KEY, which also improves performance for big filesystems.

* If we change PVFS_ds_position from uint32_t to uint64_t, we can use
the handle as the position, and avoid opening the dspace db with the
RECNUM flag, which is killing our performance on writes.
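
For the last point, a small sketch of why the position type has to be
widened.  This is illustrative only: handle_t, position32_t, and
position64_t are stand-in names (position32_t plays the role of today's
PVFS_ds_position), and I'm assuming handles are 64 bits wide.

```c
#include <stdint.h>

typedef uint64_t handle_t;      /* assumed 64-bit handle */
typedef uint32_t position32_t;  /* stands in for the current position type */
typedef uint64_t position64_t;  /* the proposed widened position type */

/* A handle can only be used as a 32-bit position if it fits in 32 bits. */
int fits_in_position32(handle_t h)
{
    return h <= UINT32_MAX;
}

/* With a 64-bit position, the last handle returned can be handed back
 * to the caller directly as the resume point for the next iteration. */
position64_t position_from_handle(handle_t last_returned)
{
    return (position64_t)last_returned;
}

/* The next call turns the position back into the handle to resume from. */
handle_t handle_from_position(position64_t pos)
{
    return (handle_t)pos;
}
```

Any handle above UINT32_MAX gets truncated if it is squeezed into the
32-bit type, so resuming from it would land on the wrong key.  The 64-bit
round trip is lossless, which is what would let us stop relying on record
numbers for the position.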

-sam


> -sam
>
>>
>> -Phil
>>
>


