[Pvfs2-developers] server crash on startup with millions of files
Rob Ross
rross at mcs.anl.gov
Wed Mar 14 17:08:50 EST 2007
Yeah, zero the minor version when changing the major version. Thanks for putting
this together! -- rob
Sam Lang wrote:
>
> Here's a patch with the suggested changes. Handling the comparison
> function with a different storage format ended up being a bit uglier
> than I expected. Removing the DB_RECNUM flag from our dbs was also not
> as easy as expected. I did the following:
>
> * Changed the dspace comparison function to use > instead of <. This
> should allow the iterate_handles function to get Berkeley DB to read in
> pages from front to back instead of back to front.
>
> * Modified the semantics of our storage format version a bit. The
> previous version was 0.1.2, and unless I'm mistaken, the individual
> components of the version didn't carry much meaning: any version change
> meant that the new code would abort on older storage versions. I've
> given the components names (major.minor.incremental) and allowed the
> incremental value to change (0.1.3) so that new code can support
> older formats, but all major and minor value changes are _not_ backward
> compatible.
>
> * With the storage version changes, we now accept 0.1.3 and 0.1.2, and
> call the appropriate comparison function based on the version.
>
> * Changed the PVFS_ds_position from int32_t to uint64_t. Note that this
> required changing many of the request encoding/decoding functions that
> pass a position field, and incrementing the protocol major version (do
> we zero the minor version when we increment the major version?). It
> required getting the alignment right for device requests as well.
>
> * It turns out that once a db is created with DB_RECNUM, it always has
> to be opened with DB_RECNUM, so that's another storage format change.
> For now, I try to open without DB_RECNUM, and if that returns EINVAL I
> retry with DB_RECNUM. Newly created dbs don't get the DB_RECNUM flag,
> so hopefully that will improve performance (the doc says it can really
> slow things down).
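The first three bullets (flipped comparison, major.minor.incremental versioning, and picking a comparator by on-disk version) can be sketched together in C. This is a minimal illustration, not the patch itself: `select_dspace_compare`, `compare_handles_v012`, `compare_handles_v013`, and `parse_storage_version` are hypothetical names, and it is an assumption that the 0.1.2 comparator sorted handles in descending order while the 0.1.3 one sorts ascending. Real Berkeley DB comparators take `DBT` arguments and are installed with `DB->set_bt_compare`; qsort-style signatures are used here only to keep the sketch self-contained.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Only the incremental component may differ between code and disk. */
#define STORAGE_MAJOR 0u
#define STORAGE_MINOR 1u

/* Hypothetical qsort-style comparator type over raw key bytes. */
typedef int (*handle_cmp_fn)(const void *a, const void *b);

/* Keys are raw bytes and may be unaligned, so copy instead of casting. */
static uint64_t key_to_handle(const void *key)
{
    uint64_t h;
    memcpy(&h, key, sizeof(h));
    return h;
}

/* ASSUMPTION: the 0.1.2 format ordered handles largest-first. */
static int compare_handles_v012(const void *a, const void *b)
{
    uint64_t ha = key_to_handle(a), hb = key_to_handle(b);
    return (ha < hb) - (ha > hb);
}

/* ASSUMPTION: the 0.1.3 format orders handles smallest-first. */
static int compare_handles_v013(const void *a, const void *b)
{
    uint64_t ha = key_to_handle(a), hb = key_to_handle(b);
    return (ha > hb) - (ha < hb);
}

/* Parses "major.minor.incremental"; rejects trailing characters. */
static int parse_storage_version(const char *s, unsigned *major,
                                 unsigned *minor, unsigned *incr)
{
    char extra;
    return sscanf(s, "%u.%u.%u%c", major, minor, incr, &extra) == 3 ? 0 : -1;
}

/* Returns the comparator for an on-disk version, or NULL if unsupported. */
static handle_cmp_fn select_dspace_compare(const char *on_disk)
{
    unsigned major, minor, incr;
    if (parse_storage_version(on_disk, &major, &minor, &incr) != 0)
        return NULL;
    if (major != STORAGE_MAJOR || minor != STORAGE_MINOR)
        return NULL;                         /* never backward compatible */
    switch (incr) {
    case 2:  return compare_handles_v012;   /* older 0.1.2 format */
    case 3:  return compare_handles_v013;   /* current 0.1.3 format */
    default: return NULL;                   /* unknown incremental */
    }
}
```

Keeping the version check and the comparator choice in one function means an unsupported format is rejected before any comparator is installed on the db handle.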
>
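Widening PVFS_ds_position to 64 bits means every encoder that writes a position has to place an 8-byte field, and the bullet above notes that alignment had to be handled for device requests. The sketch below shows one way to pad to an 8-byte boundary before writing the field. It assumes a little-endian wire format; `align8`, `encode_position`, and `decode_position` are hypothetical helpers, not the actual PVFS2 encoding macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t PVFS_ds_position;   /* widened from a 32-bit type */

/* Rounds an offset up to the next multiple of 8. */
static size_t align8(size_t off)
{
    return (off + 7) & ~(size_t)7;
}

/* Hypothetical encoder: zero-pads to 8-byte alignment, writes pos
 * little-endian, and returns the offset just past the field. */
static size_t encode_position(unsigned char *buf, size_t off,
                              PVFS_ds_position pos)
{
    size_t start = align8(off);
    memset(buf + off, 0, start - off);          /* padding bytes */
    for (int i = 0; i < 8; i++)
        buf[start + i] = (unsigned char)(pos >> (8 * i));
    return start + 8;
}

/* Matching decoder: skips the same padding and reads 8 bytes. */
static size_t decode_position(const unsigned char *buf, size_t off,
                              PVFS_ds_position *pos)
{
    size_t start = align8(off);
    PVFS_ds_position v = 0;
    for (int i = 0; i < 8; i++)
        v |= (PVFS_ds_position)buf[start + i] << (8 * i);
    *pos = v;
    return start + 8;
}
```

Because encoder and decoder both derive the padding from the same offset, a request that previously carried a 4-byte position now grows by the field width plus up to 7 padding bytes, which is why the protocol major version had to change.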
> Let me know how these changes look, and if someone gets a chance to look
> at performance differences, that would be great.
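The open-then-retry fallback from the last bullet can be sketched with the db open abstracted behind a function pointer, so the control flow is testable without linking Berkeley DB. `open_dspace_db`, `dspace_open_fn`, `OPEN_WITH_RECNUM`, and the `fake_db` test double are all hypothetical. As we read the Berkeley DB documentation, DB_RECNUM must be set with `DB->set_flags` before `DB->open`, and a handle whose open failed should be closed and recreated, so a real `open_fn` would likely wrap create, set_flags, open, and close-on-failure.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the DB_RECNUM flag. */
#define OPEN_WITH_RECNUM 0x1u

/* Hypothetical opener: returns 0 on success or an errno value. */
typedef int (*dspace_open_fn)(uint32_t flags, void *ctx);

/* Opens without the record-number flag first; an existing db that was
 * created with it reports EINVAL, so retry with the flag set. */
static int open_dspace_db(dspace_open_fn open_fn, void *ctx,
                          uint32_t *used_flags)
{
    int ret = open_fn(0, ctx);
    if (ret == 0) {
        *used_flags = 0;              /* new-style db, no DB_RECNUM */
        return 0;
    }
    if (ret != EINVAL)
        return ret;                   /* unrelated failure: give up */
    ret = open_fn(OPEN_WITH_RECNUM, ctx);
    if (ret == 0)
        *used_flags = OPEN_WITH_RECNUM;   /* legacy db */
    return ret;
}

/* Illustrative test double: a db created with the flag rejects an open
 * that omits it; every other open succeeds. */
struct fake_db {
    int created_with_recnum;
    int open_calls;
};

static int fake_open(uint32_t flags, void *ctx)
{
    struct fake_db *db = ctx;
    db->open_calls++;
    if (db->created_with_recnum && !(flags & OPEN_WITH_RECNUM))
        return EINVAL;
    return 0;
}
```

Newly created dbs take the first branch and never pay for the flag; only legacy dbs incur the extra open attempt.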
>
> Thanks,
>
> -sam
>
>
> On Mar 7, 2007, at 2:39 PM, Phil Carns wrote:
>
>>
>>> Can we conclude this discussion? In summary:
>>> * The current comparison function causes bad IO patterns for iterate
>>> on the dspace db. We can change it but the disk format will change
>>> in new releases.
>>> - If we change it, either we check a version number and provide
>>> the right comparison function, or we perform migration to the new
>>> storage format.
>>> - If we don't change it, we can still improve performance by
>>> iterating from the last entry to the first, but we can't use
>>> DB_MULTIPLE_KEY, which also improves performance for big filesystems.
>>
>> I don't really have a preference either way.
>>
>>> * If we change PVFS_ds_position from uint32_t to uint64_t, we can
>>> use the handle as the position, and avoid opening the dspace db with
>>> the DB_RECNUM flag, which is killing our performance on writes.
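Using the handle itself as the iteration position can be sketched over a sorted array standing in for the dspace db: each batch resumes at the first handle greater than or equal to the saved position, which is roughly what a cursor seek to a key range would do. `iterate_handles_from` and `lower_bound_u64` are hypothetical names. The position must hold a full handle, hence the 64-bit type; unlike a record number, a handle key does not shift when other entries are inserted or removed between batches.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the first element >= key in a sorted array. */
static size_t lower_bound_u64(const uint64_t *a, size_t n, uint64_t key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Hypothetical batch iterator: copies up to max handles >= *position
 * into out. If handles remain, *position becomes the next handle to
 * return; otherwise *done is set. Returns the number copied. */
static size_t iterate_handles_from(const uint64_t *sorted, size_t n,
                                   uint64_t *position, uint64_t *out,
                                   size_t max, int *done)
{
    size_t i = lower_bound_u64(sorted, n, *position);
    size_t count = 0;
    while (i < n && count < max)
        out[count++] = sorted[i++];
    *done = (i >= n);
    if (!*done)
        *position = sorted[i];   /* resume point is itself a handle */
    return count;
}
```

Storing the next handle rather than "last handle + 1" avoids any overflow at the top of the 64-bit range.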
>>
>> I think this sounds good too. We would be happy to help test any
>> combination of the options you list.
>>
>> -Phil
>>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers