[Pvfs2-developers] server crash on startup with millions of files
Sam Lang
slang at mcs.anl.gov
Thu Mar 1 09:16:58 EST 2007
On Feb 28, 2007, at 6:54 AM, Phil Carns wrote:
> I know that you guys still have some ongoing discussion about the long
> range design for tracking handles, but I have another item about the
> current implementation that might be of interest.
>
> Most of the remaining startup performance problem (after Sam's
> optimization patches) appears to be a result of how the db is ordered.
> If I modify the attr db's comparison function so that it has a "<"
> rather than ">", then all of the preads during startup go in order
> through the db rather than backwards. This takes the startup time on a
> cold db down to just 34 seconds. Previously it was 2 minutes 22
> seconds.
>
> It still could be faster, but that seems to be the biggest part of the
> time. I imagine the rest of it is just the access size (4 KB at a
> time) that might be tunable through some berkeley db settings.
>
> The downside of making that particular change to the comparison
> method is that it breaks storage space compatibility.
>
> I wonder if it might be possible to accomplish the same thing in the
> current db format by modifying iterate_handles() to just run the cursor
> backwards (using DB_PREV instead of DB_NEXT)? That wouldn't hurt
> storage space compatibility (if it works), but I don't know if it
> makes any difference to callers of that function what order the
> handles come out in.
It doesn't matter to the caller. You'll also need to set the cursor
to the last position in the db with DB_LAST. Does DB_PREV work with
DB_MULTIPLE though? It's not clear from the above: does the
improvement to 34 seconds occur with DB_MULTIPLE or without?

I mentioned previously that the dspace db gets opened with the DB_RECNUM
flag. I don't think that's necessary, and removing it will
invariably improve performance, but we need a way to return the
position for iterate_handles. The easiest thing to do is turn
PVFS_ds_position into a uint64_t (it's currently only a uint32_t). That
breaks interfaces and protocols though.
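
To show why the width matters, here is a small sketch. It assumes that, without record numbers, the position handed back by iterate_handles would be derived from the handle itself; that is my reading of the idea, not something the current code does. The toy_* type names and helper functions are made up for illustration. The two values checked come from the handle ranges in the log message quoted further down in this thread.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_handle;       /* hypothetical 64-bit handle */
typedef uint32_t toy_position32;   /* stand-in for today's 32-bit position */
typedef uint64_t toy_position64;   /* stand-in for a widened position */

/* Storing a handle in a 32-bit position keeps only the low 32 bits. */
static toy_position32 position32_from_handle(toy_handle h)
{
    return (toy_position32)h;
}

/* A 64-bit position holds any 64-bit handle unchanged. */
static toy_position64 position64_from_handle(toy_handle h)
{
    return (toy_position64)h;
}

/* Nonzero if the handle survives a trip through a 32-bit position. */
static int position32_roundtrips(toy_handle h)
{
    return (toy_handle)position32_from_handle(h) == h;
}

/* Checks against the logged ranges 4-536870914,4294967292-4831838202. */
static void check_logged_ranges(void)
{
    /* The start of the second range still fits in 32 bits. */
    assert(position32_roundtrips(4294967292ULL));
    /* The end of the second range does not; it wraps modulo 2^32. */
    assert(!position32_roundtrips(4831838202ULL));
    assert(position32_from_handle(4831838202ULL) == 536870906U);
    /* The widened position keeps the handle intact. */
    assert(position64_from_handle(4831838202ULL) == 4831838202ULL);
}
```

Note that the truncated value 536870906 lands inside the first logged range (4-536870914), so under this assumption a 32-bit position could be mistaken for a different, valid handle.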
-sam
>
> -Phil
>
>
> Phil Carns wrote:
>> Phil Carns wrote:
>>>> Yeah that is odd. Setting the cursor for each call to
>>>> iterate_handles may be the reason for it starting over. Do you
>>>> know how many times it starts over? The number of times
>>>> iterate_handles is called will be (# of files / 4096).
>>>
>>>
>>>
>>> It only goes through the file twice if I am looking at the log
>>> correctly. Also, I just realized that on both passes (the one
>>> jumping backwards 40KB at a time and the one jumping backwards
>>> 4KB at a time) it is only reading 4KB per pread. I don't know
>>> what it is doing from a db point of view, but from an access
>>> point of view it looks like it goes backwards with a strided
>>> pattern and then goes backwards reading the entire thing. There
>>> are some other reads scattered here and there, but those two
>>> cycles represent the overwhelming majority of the total preads in
>>> the strace file. By spot checking I don't really see any
>>> significant divergence from the patterns.
>>>
>>> It also just occurred to me that maybe I should repeat the strace
>>> and try to capture it with timestamps; I'm not really sure if
>>> both of these pread cycles are actually during the scan or not.
>>>
>> I just double checked- both of those big pread cycles are
>> happening after this message is logged:
>> [D 13:06:53.916769] dbpf collection 752900094 - Setting collection
>> handle ranges to 4-536870914,4294967292-4831838202
>> ... but before the next message. So they do appear to both be a
>> result of the handle scanning on startup.
>> -Phil
>
>