[Pvfs2-developers] server crash on startup with millions of files

Rob Ross rross at mcs.anl.gov
Wed Mar 14 17:08:50 EST 2007


Yeah, zero the minor when changing the major version. Thanks for putting 
this together! -- rob

Sam Lang wrote:
> 
> Here's a patch with the suggested changes.  Handling the comparison 
> function with a different storage format ended up being a bit uglier 
> than I expected.  Removing the DB_RECNUM flag from our dbs was also not 
> as easy as expected.  I did the following:
> 
> * changed the dspace comparison function to use > instead of <.  This 
> should allow the iterate_handles function to get Berkeley DB to read in 
> pages from front to back instead of back to front.
> 
> * Modified the semantics of our storage format version a bit.  The 
> previous version was 0.1.2, and unless I'm mistaken, the individual 
> components of the version didn't carry much meaning.  Any version change 
> meant that the new code would abort on older storage versions.  I've 
> given the components names: major.minor.incremental, and allowed the 
> incremental value to be changed (0.1.3) so that new code can support 
> older formats, but all major and minor value changes are _not_ backward 
> compatible.
> 
> * With the storage version changes, we now accept 0.1.3 and 0.1.2, and 
> call the appropriate comparison function based on the version.
> 
> * Changed the PVFS_ds_position from int32_t to uint64_t.  Note that this 
> required changing many of the request encoding/decoding functions that 
> pass a position field, and incrementing the protocol major version (do 
> we zero the minor version when we increment the major version?).  It 
> required getting the alignment right for device requests as well.
> 
> * It turns out that once a db is created with DB_RECNUM, it always has 
> to be opened with DB_RECNUM, so that's another storage format change.  
> For now, I try to open without DB_RECNUM, and if that returns EINVAL I 
> retry with DB_RECNUM.  Newly created dbs don't get the DB_RECNUM flag, 
> so hopefully that will improve performance (the doc says it can really 
> slow things down).
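
The open-then-retry fallback described in the last item might look roughly like the sketch below. This is not the code from the patch: `open_fn` is a hypothetical callback standing in for the Berkeley DB handle setup and open sequence, and `open_dspace_db` and the two stubs are illustrative names. Only the EINVAL-driven retry comes from the description above.

```c
#include <errno.h>

/* Hypothetical callback: tries to open the db with or without record
 * numbers enabled.  Returns 0 on success or an errno-style code. */
typedef int (*open_fn)(const char *path, int use_recnum);

/* Try the cheaper mode first; fall back only when the existing db was
 * created with record numbers, which is reported as EINVAL. */
static int open_dspace_db(open_fn do_open, const char *path, int *used_recnum)
{
    int ret = do_open(path, 0);

    *used_recnum = 0;
    if (ret == EINVAL)
    {
        ret = do_open(path, 1);
        *used_recnum = (ret == 0);
    }
    return ret;
}

/* Stub standing in for an older db that was created with record numbers. */
static int stub_open_legacy(const char *path, int use_recnum)
{
    (void)path;
    return use_recnum ? 0 : EINVAL;
}

/* Stub standing in for a newly created db without record numbers. */
static int stub_open_new(const char *path, int use_recnum)
{
    (void)path;
    return use_recnum ? EINVAL : 0;
}
```

With the stubs, a legacy db ends up opened with record numbers after one retry, while a new db opens on the first attempt. Any error other than EINVAL is passed straight back to the caller.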
> 
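
As a rough illustration of the version rule and the per-version comparison function described in the list above, here is a minimal sketch. The struct, the function names, and the reduction of keys to plain `uint64_t` handles are all hypothetical, and which ordering belongs to which version is an assumption made for the example, not something stated in the message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical version triple; field names follow the message above. */
struct storage_version { int major, minor, incremental; };

typedef int (*handle_cmp_fn)(uint64_t a, uint64_t b);

/* Ordering assumed here for 0.1.2 stores: larger handles sort first. */
static int cmp_handles_descending(uint64_t a, uint64_t b)
{
    return (a < b) - (a > b);
}

/* Ordering assumed here for 0.1.3 stores: smaller handles sort first. */
static int cmp_handles_ascending(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}

/* Returns the comparator for an on-disk version, or NULL if the store
 * cannot be opened by code whose own storage version is 0.1.3. */
static handle_cmp_fn select_handle_cmp(struct storage_version on_disk)
{
    const struct storage_version current = {0, 1, 3};

    /* Any major or minor difference is not backward compatible. */
    if (on_disk.major != current.major || on_disk.minor != current.minor)
        return NULL;

    switch (on_disk.incremental)
    {
        case 2:  return cmp_handles_descending;
        case 3:  return cmp_handles_ascending;
        default: return NULL;
    }
}
```

A caller would select the comparator once, before opening the db, and abort when it gets NULL back, which matches the old behavior for unsupported versions.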
> Let me know how these changes look, and if someone gets a chance to look 
> at performance differences, that would be great.
> 
> Thanks,
> 
> -sam
> 
> 
> On Mar 7, 2007, at 2:39 PM, Phil Carns wrote:
> 
>>
>>> Can we conclude this discussion?  In summary:
>>> * The current comparison function causes bad IO patterns for iterate
>>> on the dspace db.  We can change it but the disk format will change
>>> in new releases.
>>>     - If we change it, either we check a version number and provide
>>> the right comparison function, or we perform migration to the new
>>> storage format.
>>>     - If we don't change it, we can still improve performance by
>>> iterating from the last entry to the first, but we can't use
>>> DB_MULTIPLE_KEY, which also improves performance for big filesystems.
>>
>> I don't really have a preference either way.
>>
>>> * If we change PVFS_ds_position from uint32_t to uint64_t, we can
>>> use the handle as the position, and avoid opening the dspace db with
>>> the DB_RECNUM flag, which is killing our performance on writes.
>>
>> I think this sounds good too.  We would be happy to help test any 
>> combination of the options you list.
>>
>> -Phil
>>
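
To make the "handle as the position" idea concrete: with a 64-bit position, an iteration call can hand back the last handle it returned, and the next call can resume by seeking to the first key greater than it, so no record numbers are needed. The sketch below models the db as a sorted array. `ds_position` and `iterate_handles_from` are hypothetical names, and treating 0 as "start from the beginning" assumes 0 is never a valid handle.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ds_position;   /* hypothetical stand-in for PVFS_ds_position */

/* Copies up to max handles greater than *pos from a sorted array into out,
 * then stores the last handle returned in *pos.
 * Assumption: 0 is never a valid handle, so *pos == 0 means "start". */
static size_t iterate_handles_from(const uint64_t *sorted, size_t n,
                                   ds_position *pos,
                                   uint64_t *out, size_t max)
{
    size_t lo = 0, hi = n, count = 0;

    /* Binary search for the first handle strictly greater than *pos. */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (sorted[mid] <= *pos)
            lo = mid + 1;
        else
            hi = mid;
    }

    while (lo < n && count < max)
        out[count++] = sorted[lo++];

    /* Leave *pos unchanged when nothing was returned (end of iteration). */
    if (count > 0)
        *pos = out[count - 1];
    return count;
}
```

A handle above 2^32 cannot be stored in a 32-bit position, which is why this approach depends on widening the type.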
> 