[Pvfs2-developers] server crash on startup with millions of files

Phil Carns pcarns at wastedcycles.org
Fri Feb 23 09:16:43 EST 2007


Thanks Sam!  We will give these patches a try and report back.

-Phil

Sam Lang wrote:
> 
> Hi Phil,
> 
> Attached mult.patch implements iterating over the dspace db using
> DB_MULTIPLE_KEY.  This may allow the db get call to do larger reads
> from your SAN.  I was seeing slightly better performance with local
> disk after creating 20K files in a fresh storage space.  Running strace
> doesn't show fewer mmaps or larger reads, though, so I'm not sure how
> Berkeley DB pulls in its pages.  Anyway, if it helps improve
> performance for you guys, I can clean it up a bit and commit it.  I
> don't think anything uses dspace_iterate_handles besides the ledger
> handle management code.
> 
> You can fiddle with the MAX_NUM_VERIFY_HANDLE_COUNT value to set how
> many handles to get at a time.  Right now it's set to 4096.  Keep in
> mind that this requires a much larger buffer allocated in
> dbpf_dspace_iterate_handles_op_svc, since we have to get keys and
> values, so essentially we do a get with a buffer that's
> 4096*(sizeof(handle) + sizeof(stored_attr)), which ends up being
> about 300K.
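
As a rough check of the buffer size quoted above, the arithmetic can be sketched as follows.  The 8-byte handle and 64-byte stored_attr sizes are assumptions chosen to land near the "about 300K" figure, not values read from the PVFS2 headers.  The rounding to a multiple of 1024 bytes reflects what the Berkeley DB documentation asks of bulk-get buffers, as far as I know, and the sketch ignores any per-record bookkeeping overhead.

```python
# Rough sizing of the bulk-get buffer described above.
# ASSUMPTION: the sizes below are illustrative, not read from the PVFS2 headers.
HANDLE_SIZE = 8          # assumed: a 64-bit handle
STORED_ATTR_SIZE = 64    # assumed: chosen so the total lands near 300K


def bulk_buffer_size(count, key_size=HANDLE_SIZE, value_size=STORED_ATTR_SIZE):
    """Return count * (key_size + value_size), rounded up to a multiple of 1024."""
    raw = count * (key_size + value_size)
    return -(-raw // 1024) * 1024   # ceiling division, then scale back up


size = bulk_buffer_size(4096)
print(size, "bytes =", size // 1024, "KiB")
```

Under these assumed sizes, a count of 4096 gives 288 KiB, which is in line with the estimate in the message.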
> 
> I also attached a patch (server-start.patch) that prints out the start
> message as well as a ready message after server initialization has
> completed.  If you set the Logstamp to usec, you'll be able to see the
> time it takes to initialize the server.  Also, this might help in
> knowing when you can mount the clients, although hopefully at some
> point we'll be able to add the zero-conf stuff and then we can return
> EAGAIN or something.
> 
> I'm not sure it's time to replace the ledger code.  It seems to work
> ok, and fixing the slowness you're seeing would mean switching to some
> kind of range tree that could be serialized to disk so that we wouldn't
> have to iterate through the entire dspace db on startup.  That opens up
> the possibility of the dspace db and the ledger-on-disk getting out of
> sync, which I'd rather avoid.
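
To illustrate why rebuilding free-handle state at startup means visiting every record, here is a minimal sketch that derives free handle extents from a set of used handles.  The function name, the tuple representation, and the tiny handle range are all hypothetical; the actual ledger code is not shown in this thread and may work quite differently.

```python
def free_extents(used_handles, first, last):
    """Return the free (start, end) inclusive extents within [first, last].

    HYPOTHETICAL stand-in for a ledger rebuild: every used handle has to be
    visited once, which is why the scan cost grows with the number of files.
    """
    extents = []
    next_free = first
    for h in sorted(used_handles):
        if h < first or h > last:
            continue                      # ignore handles outside our range
        if h > next_free:
            extents.append((next_free, h - 1))
        next_free = max(next_free, h + 1)
    if next_free <= last:
        extents.append((next_free, last))
    return extents


print(free_extents({3, 4, 9}, 1, 12))   # [(1, 2), (5, 8), (10, 12)]
```

A structure like this kept on disk would avoid the scan, at the price of the consistency concern raised above.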
> 
> We could hand out new handles by choosing one randomly and then
> checking if it's in the DB, getting rid of the need for a ledger
> entirely, but I assume this idea was already scratched to avoid the
> potential costs at creation time, especially as the filesystem grows.
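
The random-allocation idea above can be sketched as below.  This is only an illustration: a Python set stands in for the dspace db lookup, and the function name, the handle range, and the max_tries limit are assumptions rather than anything from the PVFS2 source.

```python
import random


def allocate_random_handle(used, first, last, rng, max_tries=1000):
    """Pick handles at random until one is not in `used`; return (handle, tries)."""
    for tries in range(1, max_tries + 1):
        candidate = rng.randint(first, last)
        if candidate not in used:       # stands in for a dspace db lookup
            used.add(candidate)
            return candidate, tries
    raise RuntimeError("no free handle found within max_tries")


rng = random.Random(0)                  # seeded so the run is repeatable
used = set(range(1, 91))                # 90 of 100 handles already taken
handle, tries = allocate_random_handle(used, 1, 100, rng)
print(handle, tries)
```

Each probe succeeds with probability equal to the free fraction of the range, so the expected number of lookups is 1/(1 - fill ratio): about 10 when the range is 90% full.  That is the creation-time cost that grows with the filesystem.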
> 
> -sam
> 
> 
> 
> On Feb 20, 2007, at 11:23 AM, Phil Carns wrote:
> 
>> Robert Latham wrote:
>>
>>> On Tue, Feb 20, 2007 at 07:29:16AM -0500, Phil Carns wrote:
>>>
>>>> Oh, and one other detail: the memory usage of the servers looks
>>>> fine during startup, so this doesn't appear to be a memory leak.
>>>> There is quite a bit of CPU work, but I am guessing that is just
>>>> Berkeley DB keeping busy in the iteration function.
>>>
>>> How long does it take to scan 1.4 million files on startup?
>>> ==rob
>>
>>
>> That's an interesting issue :)
>>
>> A few observations:
>>
>> - we were looking at this on SAN; the results may be different on
>> local disks
>>
>> - the db files are on the order of 500 MB for this particular setup
>>
>> - the time to scan varies depending on whether the db files are hot in
>> the Linux buffer cache
>>
>> If we start the daemon right after killing another one that just did
>> the same scan, then the process is CPU intensive, but fast (about 5
>> seconds).  If we unmount/mount the SAN between the two runs so that
>> the buffer cache is cleared, then it is very slow (about 5 minutes).
>>
>> An interesting trick is to use dd with a healthy buffer size to read
>> the .db files and throw the output into /dev/null before starting the
>> servers.  This only takes a few seconds, and makes it so that the
>> scan consistently finishes in just a few seconds as well.  I think
>> the reason is just that it forces the db data into the Linux buffer
>> cache using an efficient access pattern so that Berkeley DB doesn't
>> have to wait on disk latency for whatever small accesses it is
>> performing.
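
For anyone who wants to script the dd trick, a rough equivalent is sketched below: read the file sequentially in large chunks and discard the data.  The function name and the 4 MB chunk size are illustrative choices, not values from this thread, and the demonstration runs on a temporary file rather than real .db files.

```python
import os
import tempfile


def warm_cache(path, chunk_size=4 * 1024 * 1024):
    """Read `path` sequentially in large chunks, discarding the data.

    Returns the number of bytes read.  The reads pull the file through the
    OS page cache, much like `dd ... of=/dev/null` with a large block size.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Small self-contained demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * 1024 * 1024 + 17))
    demo_path = tmp.name
bytes_read = warm_cache(demo_path)
os.remove(demo_path)
print(bytes_read)
```

In practice this would be run over each .db file in the storage space before starting the servers.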
>>
>> This seems to indicate that the Berkeley DB access pattern generated
>> by PVFS2 for this case isn't very friendly, at least to SANs that
>> aren't specifically tuned for it.
>>
>> The 5 minute scan time is a problem, because it makes it hard to tell
>> when you will actually be able to mount the file system after the
>> daemons appear to have started.  We would be happy to try out any
>> optimizations here :)
>>
>> -Phil
>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
> 


