[Pvfs2-developers] server crash on startup with millions of files

Phil Carns pcarns at wastedcycles.org
Thu Mar 15 21:49:23 EST 2007


Thanks Sam!

I tried your patches, and here are the performance results for startup 
of a file system that was created with a 356 MB dataspace_attributes.db 
file:

- old format storage space, stock server: 7 minutes, 10 seconds
- old format storage space, patched server: 6 minutes, 49 seconds
- new format storage space, patched server: 1 minute, 13 seconds

As expected, much better performance if the file system is generated 
with the new storage format!

All of the functionality seemed fine regardless of whether the patched 
server was run on the new or old storage format.  In all three cases I 
created 960,000 files on a single meta server to get to that db size. 
There was also a slight improvement (about 1%) in file creation 
performance using the new format and patched server.

Being able to get the servers started more quickly is definitely a big help.

-Phil


Sam Lang wrote:
> 
> Here's a patch with the suggested changes.  Handling the comparison
> function with a different storage format ended up being a bit uglier
> than I expected.  Removing the DB_RECNUM flag from our dbs was also
> not as easy as expected.  I did the following:
> 
> * Changed the dspace comparison function to use > instead of <.  This
> should allow the iterate_handles function to get Berkeley DB to read
> in pages from front to back instead of back to front.
> 
> * Modified the semantics of our storage format version a bit.  The
> previous version was 0.1.2, and unless I'm mistaken, the individual
> components of the version didn't carry much meaning.  Any version
> change meant that the new code would abort on older storage versions.
> I've given the components names: major.minor.incremental, and allowed
> the incremental value to be changed (0.1.3) so that new code can
> support older formats, but all major and minor value changes are _not_
> backward compatible.
> 
> * With the storage version changes, we now accept 0.1.3 and 0.1.2, and
> call the appropriate comparison function based on the version.
> 
> * Changed the PVFS_ds_position from int32_t to uint64_t.  Note that
> this required changing many of the request encoding/decoding functions
> that pass a position field, and incrementing the protocol major version
> (do we zero the minor version when we increment the major version?).
> It required getting the alignment right for device requests as well.
> 
> * It turns out that once a db is created with DB_RECNUM, it always has
> to be opened with DB_RECNUM, so that's another storage format change.
> For now, I try to open without DB_RECNUM, and if that returns EINVAL I
> retry with DB_RECNUM.  Newly created dbs don't get the DB_RECNUM flag,
> so hopefully that will improve performance (the doc says it can really
> slow things down).
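
The open-then-retry behaviour described in the last item can be sketched
roughly as follows.  This is only an illustration, not the code from the
patch: open_db_fn is a hypothetical callback standing in for the real
Berkeley DB open sequence, and fake_open is a stub that simulates a
database created with or without DB_RECNUM so the control flow can be
exercised without linking against Berkeley DB.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback standing in for the real Berkeley DB open
 * sequence.  Returns 0 on success or an errno-style code (e.g. EINVAL). */
typedef int (*open_db_fn)(const char *path, bool with_recnum, void *ctx);

/* Try to open without DB_RECNUM first; retry with it only on EINVAL.
 * On success, *used_recnum reports which attempt worked. */
int open_with_recnum_fallback(open_db_fn open_db, const char *path,
                              void *ctx, bool *used_recnum)
{
    assert(open_db != NULL && used_recnum != NULL);

    int ret = open_db(path, false, ctx);
    if (ret == 0) {
        *used_recnum = false;   /* new-style db, no record numbers */
        return 0;
    }
    if (ret != EINVAL) {
        return ret;             /* unrelated failure: do not retry */
    }

    ret = open_db(path, true, ctx);
    if (ret == 0) {
        *used_recnum = true;    /* db was created with DB_RECNUM */
    }
    return ret;
}

/* Stub for illustration only: ctx points to a bool saying whether the
 * simulated db was created with DB_RECNUM.  A mismatch yields EINVAL. */
int fake_open(const char *path, bool with_recnum, void *ctx)
{
    (void)path;
    bool created_with_recnum = *(const bool *)ctx;
    return (with_recnum == created_with_recnum) ? 0 : EINVAL;
}
```

Only EINVAL triggers the retry in this sketch, so other open failures
are reported as they are rather than being masked by a second attempt.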
> 
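
The version handling in the list above (accept 0.1.2 and 0.1.3, pick the
comparison function by version, reject any other major or minor value)
might look something like the sketch below.  All names here are
hypothetical, and it assumes the old comparison function is simply the
reverse of the new one; which of the two is the ascending order is a
guess on my part, not something stated in the patch description.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct storage_version { int major, minor, incremental; };

typedef int (*handle_cmp_fn)(uint64_t a, uint64_t b);

/* Assumed new (0.1.3) ordering: ascending by handle value. */
int cmp_new_format(uint64_t a, uint64_t b) { return (a > b) - (a < b); }

/* Assumed old (0.1.2) ordering: the reverse of the new one. */
int cmp_old_format(uint64_t a, uint64_t b) { return (a < b) - (a > b); }

/* Parse "major.minor.incremental"; returns 0 on success, -1 otherwise. */
int parse_version(const char *s, struct storage_version *v)
{
    assert(s != NULL && v != NULL);
    char trailing;
    int n = sscanf(s, "%d.%d.%d%c",
                   &v->major, &v->minor, &v->incremental, &trailing);
    return (n == 3) ? 0 : -1;   /* exactly three fields, nothing after */
}

/* Return the comparison function for an on-disk version, or NULL if the
 * format is not supported.  Only the incremental component may differ. */
handle_cmp_fn select_comparator(const struct storage_version *on_disk)
{
    if (on_disk->major != 0 || on_disk->minor != 1) {
        return NULL;            /* major/minor changes are incompatible */
    }
    switch (on_disk->incremental) {
    case 2:  return cmp_old_format;
    case 3:  return cmp_new_format;
    default: return NULL;
    }
}
```

A caller would parse the version stored with the storage space, and
abort startup if select_comparator returns NULL.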
> Let me know how these changes look, and if someone gets a chance to
> look at performance differences, that would be great.
> 
> Thanks,
> 
> -sam
> 
> 
> 
> On Mar 7, 2007, at 2:39 PM, Phil Carns wrote:
> 
>>
>>> Can we conclude this discussion?  In summary:
>>> * The current comparison function causes bad IO patterns for
>>> iterate on the dspace db.  We can change it but the disk format
>>> will change in new releases.
>>>     - If we change it, either we check a version number and provide
>>> the right comparison function, or we perform migration to the new
>>> storage format.
>>>     - If we don't change it, we can still improve performance by
>>> iterating from the last entry to the first, but we can't use
>>> DB_MULTIPLE_KEY, which also improves performance for big filesystems.
>>
>>
>> I don't really have a preference either way.
>>
>>> * If we change PVFS_ds_position from uint32_t to uint64_t, we can
>>> use the handle as the position, and avoid opening the dspace db
>>> with the RECNO flag, which is killing our performance on writes.
>>
>>
>> I think this sounds good too.  We would be happy to help test any
>> combination of the options you list.
>>
>> -Phil
>>
> 
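
On the idea quoted above of using the handle as the position: a minimal
sketch of what resuming an iteration from a 64-bit position could look
like is below.  It is not the PVFS2 implementation.  A sorted array
stands in for the dspace db, iterate_handles_sketch and POSITION_START
are made-up names, and it assumes that 0 is never a valid handle so it
can mark the start of iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 0 is never a valid handle, so it can mean "from the start". */
#define POSITION_START UINT64_C(0)

/* Index of the first handle strictly greater than position, in an
 * ascending sorted array (binary search, like a keyed range seek). */
static size_t first_after(const uint64_t *handles, size_t n,
                          uint64_t position)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (handles[mid] <= position) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Copy up to max handles that follow *position into out, then advance
 * *position to the last handle returned.  Returns the number copied. */
size_t iterate_handles_sketch(const uint64_t *handles, size_t n,
                              uint64_t *position, uint64_t *out,
                              size_t max)
{
    assert(position != NULL && out != NULL);

    size_t start = first_after(handles, n, *position);
    size_t count = 0;
    while (count < max && start + count < n) {
        out[count] = handles[start + count];
        count++;
    }
    if (count > 0) {
        *position = out[count - 1];  /* the position is itself a handle */
    }
    return count;
}
```

Because the position is a key rather than a record number, the next
call can seek straight to it, which is why the position type has to be
wide enough to hold any handle value.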


