[Pvfs2-developers] server crash on startup with millions of files
Phil Carns
pcarns at wastedcycles.org
Fri Feb 23 09:16:43 EST 2007
Thanks Sam! We will give these patches a try and report back.
-Phil
Sam Lang wrote:
>
> Hi Phil,
>
> Attached mult.patch implements iterating over the dspace db using
> DB_MULTIPLE_KEY. This may allow for the db get call to do larger reads
> from your SAN. I was seeing slightly better performance with local
> disk after creating 20K files in a fresh storage space. Doing strace
> doesn't show fewer mmaps or larger reads though, so I'm not sure how
> berkeley db pulls in its pages. Anyway, if it helps improve
> performance for you guys, I can clean it up a bit and commit it. I
> don't think anything uses dspace_iterate_handles besides that ledger
> handle management code.
>
> You can fiddle the MAX_NUM_VERIFY_HANDLE_COUNT value to set how many
> handles to get at a time. Right now it's set to 4096. Keep in mind
> that this requires a much larger buffer allocated in
> dbpf_dspace_iterate_handles_op_svc, since we have to get keys and
> values, so essentially we do a get with a buffer that's 4096*(sizeof
> (handle) + sizeof(stored_attr)), which ends up being about 300K.
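
The buffer arithmetic above can be sketched as follows. This is only an illustration: the 8-byte handle and 64-byte stored attribute are assumed sizes chosen to land near the "about 300K" figure, not values taken from the PVFS2 source, and bulk_buffer_size is a hypothetical helper. The rounding step reflects the Berkeley DB requirement that a bulk-retrieval buffer be a multiple of 1024 bytes.

```python
def bulk_buffer_size(count, key_size, value_size, align=1024):
    """Bytes needed to hold `count` key/value pairs, rounded up to `align`.

    This ignores the small per-item bookkeeping Berkeley DB keeps inside a
    bulk buffer, so treat the result as a lower bound.
    """
    raw = count * (key_size + value_size)
    # Round up to the next multiple of `align` using integer ceiling division.
    return -(-raw // align) * align


# Assumed sizes, for illustration only.
HANDLE_SIZE = 8        # assumption: a 64-bit handle
STORED_ATTR_SIZE = 64  # assumption: the real stored attribute size may differ

if __name__ == "__main__":
    size = bulk_buffer_size(4096, HANDLE_SIZE, STORED_ATTR_SIZE)
    print(size, "bytes =", size // 1024, "KiB")
```

Under these assumptions, doubling the handle count doubles the buffer, so the count trades memory for fewer get calls.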
>
> I also attached a patch (server-start.patch) that prints out a start
> message as well as a ready message after server initialization has
> completed. If you set the Logstamp to usec, you'll be able to see the
> time it takes to initialize the server. Also, this might help in
> knowing when you can mount the clients, although, hopefully at some
> point we'll be able to add the zero-conf stuff and then we can return
> EAGAIN or something.
>
> I'm not sure it's time to replace the ledger code. It seems to work ok,
> and to fix the slowness you're seeing would mean switching to some kind
> of range tree that could be serialized to disk so that we wouldn't have
> to iterate through the entire dspace db on startup. That opens up the
> possibility of the dspace db and the ledger-on-disk getting out of
> sync, which I'd rather avoid.
>
> We could hand out new handles by choosing one randomly, and then
> checking if it's in the DB, getting rid of the need for a ledger
> entirely, but I assume this idea was already scratched to avoid the
> potential costs at creation time, especially as the filesystem grows.
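
The random-handle idea above can be sketched roughly as below. This is not PVFS2 code: a Python set stands in for the dspace db lookup, and allocate_handle, the handle range, and max_tries are all hypothetical names and values chosen for illustration.

```python
import random


def allocate_handle(in_use, low, high, rng, max_tries=1000):
    """Pick a random handle in [low, high] that is not already in `in_use`.

    `in_use` is a set standing in for a "does this handle exist in the DB?"
    lookup. Returns (handle, tries) so the retry cost is visible.
    """
    for tries in range(1, max_tries + 1):
        candidate = rng.randint(low, high)
        if candidate not in in_use:  # in the real system this is a db lookup
            in_use.add(candidate)
            return candidate, tries
    raise RuntimeError("no free handle found after %d tries" % max_tries)


if __name__ == "__main__":
    rng = random.Random(0)
    used = set()
    for _ in range(5):
        print(allocate_handle(used, 1, 1000, rng))
```

If a fraction p of the handle range is already in use, each try fails with probability p, so the expected number of lookups per create is 1/(1-p). That is the creation-time cost that grows as the file system fills.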
>
> -sam
>
>
>
> On Feb 20, 2007, at 11:23 AM, Phil Carns wrote:
>
>> Robert Latham wrote:
>>
>>> On Tue, Feb 20, 2007 at 07:29:16AM -0500, Phil Carns wrote:
>>>
>>>> Oh, and one other detail; the memory usage of the servers looks
>>>> fine during startup, so this doesn't appear to be a memory leak.
>>>> There is quite a bit of CPU work, but I am guessing that is just
>>>> berkeley db keeping busy in the iteration function.
>>>
>>> How long does it take to scan 1.4 million files on startup?
>>> ==rob
>>
>>
>> That's an interesting issue :)
>>
>> A few observations:
>>
>> - we were looking at this on a SAN; the results may be different on
>> local disks
>>
>> - the db files are on the order of 500 MB for this particular setup
>>
>> - the time to scan varies depending on whether the db files are hot in the
>> Linux buffer cache
>>
>> If we start the daemon right after killing another one that just did
>> the same scan, then the process is CPU intensive, but fast (about 5
>> seconds). If we unmount/mount the SAN between the two runs so that
>> the buffer cache is cleared, then it is very slow (about 5 minutes).
>>
>> An interesting trick is to use dd with a healthy buffer size to read
>> the .db files and throw the output into /dev/null before starting the
>> servers. This only takes a few seconds, and makes it so that the
>> scan consistently finishes in just a few seconds as well. I think
>> the reason is just that it forces the db data into the Linux buffer
>> cache using an efficient access pattern so that berkeley db doesn't
>> have to wait on disk latency for whatever small accesses it is
>> performing.
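
The dd trick above amounts to one sequential pass over each .db file with a large read size, discarding the data. A minimal sketch of the same idea follows. The 4 MiB block size is an assumed stand-in for "a healthy buffer size", and warm_cache is a hypothetical helper, not part of PVFS2.

```python
def warm_cache(path, block_size=4 * 1024 * 1024):
    """Read `path` sequentially in large blocks and discard the data.

    The only goal is to pull the file into the OS buffer cache with a
    disk-friendly access pattern. Returns the number of bytes read.
    """
    total = 0
    # buffering=0 gives a raw file object, so each read() maps to one
    # large read request instead of Python's smaller default buffer size.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    return total
```

Calling warm_cache on each .db file in the storage space before starting the server would play the same role as the dd run, assuming the files fit in the buffer cache.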
>>
>> This seems to indicate that berkeley db's access pattern generated by
>> PVFS2 for this case isn't very friendly, at least to SANs that aren't
>> specifically tuned for it.
>>
>> The 5 minute scan time is a problem, because it makes it hard to tell
>> when you will actually be able to mount the file system after the
>> daemons appear to have started. We would be happy to try out any
>> optimizations here :)
>>
>> -Phil
>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>