[Pvfs2-developers] server crash on startup with millions of files
Sam Lang
slang at mcs.anl.gov
Wed Mar 14 16:11:12 EST 2007
Here's a patch with the suggested changes. Handling the comparison
function with a different storage format ended up being a bit uglier
than I expected. Removing the DB_RECNUM flag from our dbs was also
not as easy as expected. I did the following:
* Changed the dspace comparison function to use > instead of <. This
should let the iterate_handles function get Berkeley DB to read
in pages from front to back instead of back to front.
* Modified the semantics of our storage format version a bit. The
previous version was 0.1.2, and unless I'm mistaken, the individual
components of the version didn't carry much meaning. Any version
change meant that the new code would abort on older storage
versions. I've given the components names: major.minor.incremental,
and allowed the incremental value to be changed (0.1.3) so that new
code can support older formats, but all major and minor value changes
are _not_ backward compatible.
* With the storage version changes, we now accept 0.1.3 and 0.1.2,
and call the appropriate comparison function based on the version.
* Changed the PVFS_ds_position from int32_t to uint64_t. Note that
this required changing many of the request encoding/decoding
functions that pass a position field, and incrementing the protocol
major version (do we zero the minor version when we increment the
major version?). It required getting the alignment right for device
requests as well.
* It turns out that once a db is created with DB_RECNUM, it always
has to be opened with DB_RECNUM, so that's another storage format
change. For now, I try to open without DB_RECNUM, and if that
returns EINVAL I retry with DB_RECNUM. Newly created dbs don't get
the DB_RECNUM flag, so hopefully that will improve performance (the
Berkeley DB doc says DB_RECNUM can really slow things down).
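
A rough sketch of how the version check and comparison function
selection described above could fit together. Every name here
(storage_version, parse_version, select_cmp, the two comparators) is
made up for illustration and is not taken from the patch. Which sort
direction belongs to which format version is also an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical types and names; none of these come from the PVFS2 tree. */
typedef uint64_t handle_t;
typedef int (*handle_cmp_fn)(handle_t a, handle_t b);

struct storage_version {
    int major;
    int minor;
    int incremental;
};

/* Version written by the new code in this sketch: 0.1.3 */
#define CUR_MAJOR 0
#define CUR_MINOR 1

/* Parse "major.minor.incremental". Returns 0 on success, -1 otherwise. */
int parse_version(const char *s, struct storage_version *v)
{
    char trailing;

    assert(s != NULL && v != NULL);
    /* Exactly three integers, with nothing left over after them. */
    if (sscanf(s, "%d.%d.%d%c", &v->major, &v->minor,
               &v->incremental, &trailing) != 3)
        return -1;
    if (v->major < 0 || v->minor < 0 || v->incremental < 0)
        return -1;
    return 0;
}

/* Assumed ordering for the new format (0.1.3): ascending by handle. */
int cmp_ascending(handle_t a, handle_t b)
{
    return (a > b) - (a < b);
}

/* Assumed ordering for the old format (0.1.2): the reverse. */
int cmp_descending(handle_t a, handle_t b)
{
    return (a < b) - (a > b);
}

/*
 * Pick the comparison function for an on-disk version.
 * Any major or minor mismatch is rejected (NULL); only the
 * incremental values this code knows about are accepted.
 */
handle_cmp_fn select_cmp(const struct storage_version *on_disk)
{
    assert(on_disk != NULL);
    if (on_disk->major != CUR_MAJOR || on_disk->minor != CUR_MINOR)
        return NULL;

    switch (on_disk->incremental) {
    case 3:
        return cmp_ascending;
    case 2:
        return cmp_descending;
    default:
        return NULL;
    }
}
```

The point of returning NULL for anything unrecognized is that the
caller can still abort on an unsupported format, as before, while
0.1.2 and 0.1.3 collections both open with a matching comparator.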
Let me know how these changes look, and if someone gets a chance to
look at performance differences, that would be great.
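
For anyone reviewing the DB_RECNUM fallback, the retry logic amounts
to roughly the following. This is only a sketch: the db_open_fn
callback and open_db_compat are hypothetical stand-ins, and the real
Berkeley DB calls (creating the handle, setting flags, opening, and
cleaning up a handle after a failed open) are hidden behind the
callback rather than shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical callback: open the db at `path`, with record numbers
 * enabled if want_recnum is nonzero. Returns 0 or an errno value.
 */
typedef int (*db_open_fn)(const char *path, int want_recnum, void *ctx);

/*
 * Try without record numbers first; if that fails with EINVAL,
 * assume the db was created with them and retry once.
 * On success, *opened_with_recnum says which attempt worked.
 */
int open_db_compat(db_open_fn do_open, const char *path, void *ctx,
                   int *opened_with_recnum)
{
    int ret;

    assert(do_open != NULL && opened_with_recnum != NULL);

    ret = do_open(path, 0, ctx);
    if (ret == 0) {
        *opened_with_recnum = 0;
        return 0;
    }
    if (ret != EINVAL)
        return ret;  /* unrelated failure, don't mask it */

    ret = do_open(path, 1, ctx);
    if (ret == 0)
        *opened_with_recnum = 1;
    return ret;
}

/*
 * Fake opener for experimenting: ctx points to an int saying whether
 * the "db" was created with record numbers. A mismatch gives EINVAL.
 */
int fake_open(const char *path, int want_recnum, void *ctx)
{
    int created_with_recnum = *(const int *)ctx;

    (void)path;
    return (want_recnum == created_with_recnum) ? 0 : EINVAL;
}
```

Only EINVAL triggers the retry, so other open errors are still
reported to the caller unchanged.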
Thanks,
-sam
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mult.patch
Type: application/octet-stream
Size: 27918 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20070314/8cc9578d/mult-0001.obj
-------------- next part --------------
On Mar 7, 2007, at 2:39 PM, Phil Carns wrote:
>
>> Can we conclude this discussion? In summary:
>> * The current comparison function causes bad IO patterns for
>> iterate on the dspace db. We can change it but the disk format
>> will change in new releases.
>> - If we change it, either we check a version number and
>> provide the right comparison function, or we perform migration to
>> the new storage format.
>> - If we don't change it, we can still improve performance by
>> iterating from the last entry to the first, but we can't use
>> DB_MULTIPLE_KEY, which also improves performance for big filesystems.
>
> I don't really have a preference either way.
>
>> * If we change PVFS_ds_position from uint32_t to uint64_t, we can
>> use the handle as the position, and avoid opening the dspace db
>> with the RECNO flag, which is killing our performance on writes.
>
> I think this sounds good too. We would be happy to help test any
> combination of the options you list.
>
> -Phil
>