[Pvfs2-developers] server crash on startup with millions of files
Phil Carns
pcarns at wastedcycles.org
Wed Feb 28 07:54:12 EST 2007
I know that you guys still have some ongoing discussion about the long
range design for tracking handles, but I have another item about the
current implementation that might be of interest.
Most of the remaining startup performance problem (after Sam's
optimization patches) appears to be a result of how the db is ordered.
If I modify the attr db's comparison function so that it has a "<"
rather than ">", then all of the preads during startup go in order
through the db rather than backwards. This takes the startup time on a
cold db down to just 34 seconds. Previously it was 2 minutes 22 seconds.
It still could be faster, but that seems to be the biggest part of the
time. I imagine the rest of it is just the access size (4 KB at a time),
which might be tunable through some Berkeley DB settings.
The downside of making that particular change to the comparison method
is that it breaks storage space compatibility.
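
To illustrate why the direction of the comparison could matter for the pread pattern, here is a toy model in Python. It is only a sketch under a strong assumption: handles are created in ascending order and a record's position in the file follows creation order. The real Berkeley DB btree layout is more involved. KEYS_PER_PAGE, page_offsets(), cmp_descending() and cmp_ascending() are made-up names for illustration, not PVFS2 or Berkeley DB code.

```python
from functools import cmp_to_key

PAGE_SIZE = 4096      # matches the 4 KB preads seen in the strace
KEYS_PER_PAGE = 4     # hypothetical, kept tiny so the output is readable


def page_offsets(handles, compare):
    """Return the file offsets touched, in order, by a forward scan under `compare`."""
    # Toy assumption: file position follows creation order, and handles
    # were created in ascending order.
    offset_of = {
        h: (i // KEYS_PER_PAGE) * PAGE_SIZE
        for i, h in enumerate(sorted(handles))
    }
    # A forward cursor walk visits keys in the order the comparison defines.
    scan_order = sorted(handles, key=cmp_to_key(compare))
    offsets = []
    for h in scan_order:
        off = offset_of[h]
        if not offsets or offsets[-1] != off:
            offsets.append(off)   # record each page once per consecutive run
    return offsets


def cmp_descending(a, b):
    """Stand-in for the current ">" comparison: larger handles sort first."""
    return (a < b) - (a > b)


def cmp_ascending(a, b):
    """Stand-in for the modified "<" comparison: smaller handles sort first."""
    return (a > b) - (a < b)


handles = list(range(4, 16))                     # 12 made-up handles, 3 pages
print(page_offsets(handles, cmp_descending))     # offsets decrease
print(page_offsets(handles, cmp_ascending))      # offsets increase
```

In this model the descending comparison makes a forward scan walk the file from the end toward the start, while the ascending one walks it front to back, which is the pattern a cold cache and readahead tend to favor.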
I wonder if it might be possible to accomplish the same thing in the
current db format by modifying iterate_handles() to just run the cursor
backwards (using DB_PREV instead of DB_NEXT)? That wouldn't hurt
storage space compatibility (if it works), but I don't know if it makes
any difference to callers of that function what order the handles come
out in.
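
A small Python sketch of what running the cursor backwards would mean for callers. This is a toy stand-in, not the real iterate_handles() or the Berkeley DB cursor API: iterate_batches() and the handle values are hypothetical, and the batch size of 4096 is taken from the call count mentioned in the quoted message below.

```python
def iterate_batches(stored_order, reverse=False, batch_size=4096):
    """Yield lists of handles, walking the stored order forwards or backwards.

    reverse=False plays the role of a DB_NEXT walk, reverse=True of DB_PREV.
    """
    keys = stored_order[::-1] if reverse else stored_order
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


# With the current comparison, larger handles come first in the db's order.
stored = sorted(range(4, 10004), reverse=True)   # 10000 made-up handles

forward = [h for batch in iterate_batches(stored) for h in batch]
backward = [h for batch in iterate_batches(stored, reverse=True) for h in batch]

print(forward[:3])    # largest handles first
print(backward[:3])   # smallest handles first
print(set(forward) == set(backward))
```

Both walks return the same set of handles in the same number of batches; only the order differs. So the open question is whether any caller depends on that order.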
-Phil
Phil Carns wrote:
> Phil Carns wrote:
>
>>> Yeah that is odd. Setting the cursor for each call to
>>> iterate_handles may be the reason for it starting over. Do you know
>>> how many times it starts over? The number of times iterate_handles
>>> is called will be (# of files / 4096).
>>
>> It only goes through the file twice if I am looking at the log
>> correctly. Also, I just realized that on both passes (the one jumping
>> backwards 40KB at a time and the one jumping backwards 4KB at a time)
>> it is only reading 4KB per pread. I don't know what it is doing from
>> a db point of view, but from an access point of view it looks like it
>> goes backwards with a strided pattern and then goes backwards reading
>> the entire thing. There are some other reads scattered here and
>> there, but those two cycles represent the overwhelming majority of the
>> total preads in the strace file. By spot checking I don't really see
>> any significant divergence from the patterns.
>>
>> It also just occurred to me that maybe I should repeat the strace and
>> try to capture it with timestamps; I'm not really sure if both of
>> these pread cycles are actually during the scan or not.
>>
>
> I just double checked: both of those big pread cycles are happening
> after this message is logged:
>
> [D 13:06:53.916769] dbpf collection 752900094 - Setting collection
> handle ranges to 4-536870914,4294967292-4831838202
>
> ... but before the next message. So they do appear to both be a result
> of the handle scanning on startup.
>
> -Phil