[Pvfs2-developers] server crash on startup with millions of files

Sam Lang slang at mcs.anl.gov
Wed Mar 14 16:11:12 EST 2007


Here's a patch with the suggested changes.  Handling the comparison  
function with a different storage format ended up being a bit uglier  
than I expected.  Removing the DB_RECNUM flag from our dbs was also
not as easy as expected.  I did the following:

* Changed the dspace comparison function to use > instead of <.  This
should allow the iterate_handles function to get Berkeley DB to read
in pages from front to back instead of back to front.

* Modified the semantics of our storage format version a bit.  The
previous version was 0.1.2, and unless I'm mistaken, the individual  
components of the version didn't carry much meaning.  Any version  
change meant that the new code would abort on older storage  
versions.  I've given the components names: major.minor.incremental,  
and allowed the incremental value to be changed (0.1.3) so that new  
code can support older formats, but all major and minor value changes  
are _not_ backward compatible.

* With the storage version changes, we now accept 0.1.3 and 0.1.2,  
and call the appropriate comparison function based on the version.

* Changed the PVFS_ds_position from int32_t to uint64_t.  Note that  
this required changing many of the request encoding/decoding  
functions that pass a position field, and incrementing the protocol  
major version (do we zero the minor version when we increment the  
major version?).  It required getting the alignment right for device  
requests as well.

* It turns out that once a db is created with DB_RECNUM, it always  
has to be opened with DB_RECNUM, so that's another storage format  
change.  For now, I try to open without DB_RECNUM, and if that  
returns EINVAL, I retry with DB_RECNUM.  Newly created dbs don't get
the DB_RECNUM flag, so hopefully that will improve performance (the
Berkeley DB docs say the flag can really slow things down).
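
To make the DB_RECNUM fallback in the last item concrete, here is a
simplified sketch of the retry logic.  It is not the code from the
patch: db_open_fn, open_with_recnum_fallback and the fake_open_*
helpers are made-up names, and the callback only stands in for the
real database open call so the control flow can be shown on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback standing in for the real database open call.
 * Returns 0 on success or an errno-style value on failure. */
typedef int (*db_open_fn)(const char *path, int use_recnum);

/* Try to open without DB_RECNUM first.  Only if that attempt fails
 * with EINVAL, retry with the flag set.  *used_recnum reports which
 * mode ended up succeeding (0 if neither did). */
static int open_with_recnum_fallback(db_open_fn do_open, const char *path,
                                     int *used_recnum)
{
    int ret;

    assert(do_open != NULL && used_recnum != NULL);

    *used_recnum = 0;
    ret = do_open(path, 0);
    if (ret == EINVAL) {
        ret = do_open(path, 1);
        if (ret == 0) {
            *used_recnum = 1;
        }
    }
    return ret;
}

/* Stand-in openers, used only to exercise the logic above. */
static int fake_open_legacy_db(const char *path, int use_recnum)
{
    (void)path;
    return use_recnum ? 0 : EINVAL;   /* db was created with DB_RECNUM */
}

static int fake_open_new_db(const char *path, int use_recnum)
{
    (void)path;
    return use_recnum ? EINVAL : 0;   /* db was created without it */
}

static int fake_open_missing_db(const char *path, int use_recnum)
{
    (void)path;
    (void)use_recnum;
    return ENOENT;                    /* unrelated errors are not retried */
}
```

With Berkeley DB itself, DB_RECNUM is set through DB->set_flags before
DB->open, and a handle whose open failed has to be closed and created
again before the retry, so the real version carries a bit more
bookkeeping than this sketch.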

Let me know how these changes look, and if someone gets a chance to  
look at performance differences, that would be great.
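
For anyone reviewing the version handling, the idea of picking the
comparison function from the storage version is roughly the following.
This is an illustrative sketch rather than the patch itself: the
struct, the function names, and the choice of which sort direction
belongs to 0.1.2 versus 0.1.3 are all assumptions, and handles are
compared as plain uint64_t values instead of going through the db's
key type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of major.minor.incremental. */
struct storage_version {
    int ver_major;
    int ver_minor;
    int ver_incremental;
};

typedef int (*handle_cmp_fn)(uint64_t a, uint64_t b);

/* Sorts smaller handles first. */
static int cmp_ascending(uint64_t a, uint64_t b)
{
    if (a == b) {
        return 0;
    }
    return (a < b) ? -1 : 1;
}

/* Sorts larger handles first (the opposite order). */
static int cmp_descending(uint64_t a, uint64_t b)
{
    if (a == b) {
        return 0;
    }
    return (a > b) ? -1 : 1;
}

/* Returns the comparator matching the on-disk format, or NULL when
 * the format is not supported.  Any difference in major or minor is
 * rejected; only the incremental component may vary. */
static handle_cmp_fn select_comparator(const struct storage_version *v)
{
    assert(v != NULL);

    if (v->ver_major != 0 || v->ver_minor != 1) {
        return NULL;
    }
    switch (v->ver_incremental) {
    case 2:
        return cmp_descending;  /* assumed ordering of the 0.1.2 format */
    case 3:
        return cmp_ascending;   /* assumed ordering of the 0.1.3 format */
    default:
        return NULL;
    }
}
```

The point is just that the comparator has to be chosen from the
version recorded on disk before the db is opened, since an existing
btree is only valid under the ordering it was built with.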

Thanks,

-sam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mult.patch
Type: application/octet-stream
Size: 27918 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20070314/8cc9578d/mult-0001.obj
-------------- next part --------------


On Mar 7, 2007, at 2:39 PM, Phil Carns wrote:

>
>> Can we conclude this discussion?  In summary:
>> * The current comparison function causes bad IO patterns for iterate
>>   on the dspace db.  We can change it but the disk format will change
>>   in new releases.
>>     - If we change it, either we check a version number and provide
>>       the right comparison function, or we perform migration to the
>>       new storage format.
>>     - If we don't change it, we can still improve performance by
>>       iterating from the last entry to the first, but we can't use
>>       DB_MULTIPLE_KEY, which also improves performance for big
>>       filesystems.
>
> I don't really have a preference either way.
>
>> * If we change PVFS_ds_position from uint32_t to uint64_t, we can use
>>   the handle as the position, and avoid opening the dspace db with the
>>   RECNO flag, which is killing our performance on writes.
>
> I think this sounds good too.  We would be happy to help test any  
> combination of the options you list.
>
> -Phil
>


