[Pvfs2-developers] server crash on startup with millions of files

Sam Lang slang at mcs.anl.gov
Fri Feb 23 12:39:01 EST 2007


On Feb 23, 2007, at 10:45 AM, Pete Wyckoff wrote:

> slang at mcs.anl.gov wrote on Fri, 23 Feb 2007 10:03 -0600:
>> On Feb 22, 2007, at 5:54 PM, Sam Lang wrote:
>>> We could hand out new handles by choosing one randomly, and then
>>> checking if it's in the DB, getting rid of the need for a ledger
>>> entirely, but I assume this idea was already scratched to avoid the
>>> potential costs at creation time, especially as the filesystem
>>> grows.
>>
>> Actually, as I think about this some more, maybe it's worth
>> considering.  Right now genconfig only uses the first 2^32 handles,
>> dividing them up equally among the servers.  That's
>> obviously not anywhere near the possible limit.  If genconfig
>> allocated even half of the 2^64 handles to the servers, that would
>> really decrease the likelihood of selecting an already used handle at
>> random, even for a filesystem with millions of files.
>>
>> Also, the ledger could still be used to keep track of the handles
>> that are created during the lifetime of that particular server
>> process, as well as the ones that already exist if a randomly chosen
>> handle gets a hit.  If genconfig allocates over the 1 to 2^63 range,
>> with 64 servers the chance of randomly picking an already used handle
>> is 1 in 2^56.  With 16 million files it's still 1 in 4 billion.
>>
>> The interfaces do allow the client to specify the specific handle or
>> a range of handles when doing the create, but we always just get the
>> range directly from the config file.  Are there use cases out there
>> where more limited ranges (or specific handles) are requested by the
>> client?
>
> I like the idea of ditching the ledger.  What's the reason to keep
> track of handles that are created during the lifetime of a
> particular server process?
>
> Some old design notes say the servers will track recently freed
> handles to avoid the reuse problem.  But I'm not sure if we do this
> or if it is really a good idea.
>
> Now for some crazy comments.
>
> For create scalability, you may want the client to pick handle IDs
> and offer those to the server, so that you can optimistically create
> a metafile assuming there are no collisions on the server.  These
> guessed handle IDs can be random though.  We did not implement this
> as it would be quite expensive if implemented in terms of the
> existing extent/extentlist/ledger data structures.
>
> In the OSD work, we have to do painful things to return a handle ID
> in a particular range.  I would much rather have the server pick a
> random ID and give it to the client.  Or for the client to try to
> pick a particular ID and hope there is no collision at the server.

Rob and I have talked about this a little bit.  At the least, an IO
server's handle range could be partitioned among the metadata
servers in the config file.  Then it's up to each metadata server to
allocate datafile handles for the IO servers.  This still seems reasonable
with randomly chosen handles if the range is big enough.
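
To put rough numbers on that, here is a minimal Python sketch (not PVFS
code; split_range, collision_probability, and allocate_handle are
hypothetical names, and a plain set stands in for the DB lookup).  It
splits a handle space evenly across servers, computes the chance that one
random pick lands on a used handle, and retries on a hit.  If I have the
arithmetic right, 2^63 handles over 64 servers is 2^57 per server, a bit
more than the 2^56 figure quoted above.

```python
import random
from fractions import Fraction


def split_range(lo, hi, n_servers):
    """Divide [lo, hi) evenly among servers; returns (start, end) pairs."""
    size = (hi - lo) // n_servers
    return [(lo + i * size, lo + (i + 1) * size) for i in range(n_servers)]


def collision_probability(range_size, used_count):
    """Chance that one uniformly random pick hits an already used handle."""
    return Fraction(used_count, range_size)


def allocate_handle(lo, hi, in_use, rng=random):
    """Pick random handles in [lo, hi) until one is not in use.

    `in_use` is a set standing in for the on-disk DB (an assumption).
    """
    while True:
        handle = rng.randrange(lo, hi)
        if handle not in in_use:
            in_use.add(handle)
            return handle


ranges = split_range(0, 2**63, 64)
per_server = ranges[0][1] - ranges[0][0]      # 2**57 handles per server
p_one_file = collision_probability(per_server, 1)
p_16m_files = collision_probability(per_server, 2**24)  # all on one server
```

The retry loop only costs extra lookups when a pick collides, which should
be rare as long as the range is much larger than the number of handles in
use.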

>
> So I'd like to discard the idea of pre-assigned per-server handle
> ranges and augment our notion of PVFS_handle to include some sort of
> "server identifier" as well as the 64-bit ID that is private to the
> particular device on which the object sits.

Are you talking about increasing the size of PVFS_handle (object,
whatever) to 128 bits, and using the upper half for the server/object
namespace?  I hate to sound like Bill Gates, but surely no one will
ever need 2^64 servers.  We actually sort of already have a server
identifier built into the handle, although I agree it's thinking about
the problem differently than ranges.  It seems like including the
actual server id in the handle/object thingy reduces the transparency
of the object.  What would it do to migration, for example?
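
For concreteness, the 128-bit layout I'm asking about could look like the
following Python sketch.  The names pack_handle and unpack_handle, and the
split into a 64-bit server id plus a 64-bit object id, are assumptions for
illustration, not an existing PVFS interface.

```python
MASK64 = (1 << 64) - 1


def pack_handle(server_id, object_id):
    """Build a 128-bit handle: server id in the upper half, object id below."""
    if not (0 <= server_id <= MASK64 and 0 <= object_id <= MASK64):
        raise ValueError("server_id and object_id must each fit in 64 bits")
    return (server_id << 64) | object_id


def unpack_handle(handle):
    """Split a 128-bit handle back into (server_id, object_id)."""
    return handle >> 64, handle & MASK64


handle = pack_handle(3, 42)
server_id, object_id = unpack_handle(handle)
```

Under this layout the same object id on a different server yields a
different handle, which is the migration concern: moving an object would
change its handle unless some indirection is added.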

>
> Various distributed FS implementations for wide-area use seem to be
> happy with 128-bit handles and assume collisions will never happen.
> This always struck me as wildly reckless, but maybe it is time to
> accept the fact that these number spaces are really big.

I thought they used something like UUIDs and embedded host and
timestamp info into the actual handle to guarantee uniqueness.  It
does seem odd that they would just assume that collisions of random
numbers wouldn't occur.
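
As an illustration of what I mean, Python's standard uuid module can build
version 1 UUIDs, which pack a 60-bit timestamp and a 48-bit node id into
the 128-bit value.  The sketch below is only an assumption about how such
handles might be generated (make_handle and the node value are made up),
not a description of what any particular wide-area filesystem does.

```python
import uuid

NODE_ID = 0x0123456789AB  # hypothetical 48-bit host identifier


def make_handle(node=NODE_ID):
    """Return a version 1 UUID: timestamp + clock sequence + node id."""
    return uuid.uuid1(node=node)


h = make_handle()
print(h.version)    # UUID version field
print(hex(h.node))  # the embedded node id
print(h.time)       # 60-bit timestamp in 100 ns ticks
```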

-sam


>
> 		-- Pete
>


