[Pvfs2-developers] server crash on startup with millions of files
Sam Lang
slang at mcs.anl.gov
Thu Mar 1 11:00:19 EST 2007
On Mar 1, 2007, at 9:52 AM, Phil Carns wrote:
> Sam Lang wrote:
>> On Feb 28, 2007, at 6:54 AM, Phil Carns wrote:
>>> I know that you guys still have some ongoing discussion about the
>>> long range design for tracking handles, but I have another item about
>>> the current implementation that might be of interest.
>>>
>>> Most of the remaining startup performance problem (after Sam's
>>> optimization patches) appears to be a result of how the db is ordered.
>>> If I modify the attr db's comparison function so that it has a "<"
>>> rather than ">", then all of the preads during startup go in order
>>> through the db rather than backwards. This takes the startup time on
>>> a cold db down to just 34 seconds. Previously it was 2 minutes 22
>>> seconds.
>>>
>>> It still could be faster, but that seems to be the biggest part of
>>> the time. I imagine the rest of it is just the access size (4 KB at
>>> a time) that might be tunable through some berkeley db settings.
>>>
>>> The downside of making that particular change to the comparison
>>> method is that it breaks storage space compatibility.
>>>
>>> I wonder if it might be possible to accomplish the same thing in the
>>> current db format by modifying iterate_handles() to just run the
>>> cursor backwards (using DB_PREV instead of DB_NEXT)? That wouldn't
>>> hurt storage space compatibility (if it works), but I don't know if
>>> it makes any difference to callers of that function what order the
>>> handles come out in.
>> It doesn't matter to the caller. You'll also need to set the
>> cursor to the last position in the db with DB_LAST. Does DB_PREV
>> work with DB_MULTIPLE though? It's not clear from the above: does
>> the improvement to 34 seconds occur with MULTIPLE or without?
>>
>> I mentioned previously that the dspace db gets opened with the
>> RECNUM flag. I don't think that's necessary, and removing it
>> will invariably improve performance, but we need a way to return
>> the position for iterate_handles. The easiest thing to do is
>> turn PVFS_ds_position into a uint64_t (currently it's only
>> uint32_t). That breaks interfaces and protocols though.
>
> I don't know if the PREV approach would work with MULTIPLE or not.
> The 34 second times (with inverted comparison function) were run
> with your MULTIPLE patches applied. I didn't try it without the
> patches.
I couldn't find anything in the Berkeley DB documentation about
DB_MULTIPLE_KEY and DB_PREV not being allowed together, but when I tried
it, the call returned an error about illegal flag combinations. So our
options are either to use DB_PREV without DB_MULTIPLE (no storage format
changes), or to change the comparison function and storage format so
that we can use DB_NEXT with DB_MULTIPLE_KEY.
Checking the storage format version and providing the appropriate
comparison function wouldn't be hard, though, and wouldn't require any
"migration" from the old format to the new one. Storage spaces in the
older format wouldn't benefit from the performance improvement, however.
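
A rough sketch of what that version check might look like. All names here
(FORMAT_VERSION_NEW_ORDER, select_handle_compare, the two compare
functions) are made up for illustration and are not the actual trove
code. The sketch compares plain uint64_t handles with a qsort-style
signature rather than Berkeley DB DBTs, and it assumes the existing
ordering is descending. In the real server the selected function would
be the one handed to DB->set_bt_compare before the db is opened.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: first storage format version that uses the new ordering. */
#define FORMAT_VERSION_NEW_ORDER 2

typedef int (*handle_compare_fn)(const void *, const void *);

/* Assumed existing on-disk ordering: larger handles sort first. */
static int compare_handles_old(const void *a, const void *b)
{
    uint64_t ha = *(const uint64_t *)a;
    uint64_t hb = *(const uint64_t *)b;

    /* Negative when ha > hb, so ha is placed before hb. */
    return (ha < hb) - (ha > hb);
}

/* New ordering: the exact inverse of the old one (smaller handles first). */
static int compare_handles_new(const void *a, const void *b)
{
    return compare_handles_old(b, a);
}

/*
 * Pick the comparison function from the storage format version, so that
 * an old storage space keeps its ordering and needs no migration.
 */
static handle_compare_fn select_handle_compare(int format_version)
{
    assert(format_version >= 1);

    return (format_version >= FORMAT_VERSION_NEW_ORDER)
        ? compare_handles_new
        : compare_handles_old;
}
```

Sorting the same handles with select_handle_compare(1) and then with
select_handle_compare(2) gives opposite orders, which is the point: only
storage spaces created at the newer version get the forward-friendly
layout.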
-sam
>
> -Phil
>