[Pvfs2-developers] handle ledger

Sam Lang slang at mcs.anl.gov
Tue Jan 29 16:53:02 EST 2008


On Jan 29, 2008, at 1:42 PM, Pete Wyckoff wrote:

> slang at mcs.anl.gov wrote on Tue, 29 Jan 2008 13:32 -0600:
>> On Jan 28, 2008, at 6:43 PM, Pete Wyckoff wrote:
>>> slang at mcs.anl.gov wrote on Mon, 28 Jan 2008 16:38 -0600:
>>>> Attached patch disables the handle ledger.  For those not familiar,
>>>> the handle ledger is an in-memory structure that maintains allocated
>>>> handles for a given server.  I'm disabling it because reading the
>>>> entire database each time the server loads is extremely expensive
>>>> for large filesystems.  Instead of choosing a handle from the
>>>> ledger, the patch picks one randomly.  This means we have to deal
>>>> with collisions now, but because of our large handle space, they
>>>> only occur every 100 billion times or so.
>>>>
>>>> I didn't blow away the handle allocation code entirely...I just
>>>> disabled the calls that we had been using to invoke the handle
>>>> ledger, and added some functionality that picks a random handle
>>>> from a given range.  In the dspace code, I modified the create
>>>> function to continue up to 32 times if a collision with an already
>>>> existing handle occurs.
>>>
>>> Great change.  Never liked that myself either.  Some comments.
>>>
>>>> diff -u -a -p -r1.152 dbpf-dspace.c
>>>> --- src/io/trove/trove-dbpf/dbpf-dspace.c	8 Nov 2007 21:48:22 -0000	1.152
>>>> +++ src/io/trove/trove-dbpf/dbpf-dspace.c	28 Jan 2008 21:55:49 -0000
>>> [..]
>>>> +    } while(ret != DB_NOTFOUND && ++attempts > MAX_HANDLE_ALLOC_ATTEMPTS);
>>>
>>> Uh, maybe <.
>>
>> Are you arguing for increasing the max number of attempts, or just
>> retrying forever?
>
> Maybe I misunderstand the termination condition for the loop.  You
> want it to keep trying until attempts gets up to a certain value.
> Just the < is backwards.  If I'm spacing and you're sure this is
> right, ignore me.

Oh, no, you're right.  Heh, I was the one spacing, and I thought you
were doing some kind of weird winking smiley. <-%  Nice catch!
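
For reference, a minimal sketch of the retry loop with the comparison
flipped to '<'.  This is not the patch itself: handle_exists_fn,
below_threshold, pick_handle and alloc_handle are hypothetical names,
the predicate stands in for the dspace lookup that returns DB_NOTFOUND,
and plain random() is used instead of random_r to keep it short.

```c
#define _XOPEN_SOURCE 700   /* exposes random() on strict compilers */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_HANDLE_ALLOC_ATTEMPTS 32

/* Hypothetical stand-in for the dspace lookup: nonzero means "taken". */
typedef int (*handle_exists_fn)(uint64_t handle, void *ctx);

/* Example predicate: every handle below the threshold in ctx is taken. */
static int below_threshold(uint64_t handle, void *ctx)
{
    return handle < *(const uint64_t *)ctx;
}

/* Pick a pseudo-random handle in [first, last].  random() yields 31
 * bits, so three calls are mixed; the modulo adds a slight bias. */
static uint64_t pick_handle(uint64_t first, uint64_t last)
{
    uint64_t span, r;

    assert(first <= last);
    span = last - first + 1;        /* 0 means the full 64-bit range */
    r = ((uint64_t)random() << 42) ^ ((uint64_t)random() << 21)
        ^ (uint64_t)random();
    return span ? first + r % span : r;
}

/* Try up to MAX_HANDLE_ALLOC_ATTEMPTS random handles.
 * Returns 0 and stores the handle in *out, or -1 if all collided. */
static int alloc_handle(uint64_t first, uint64_t last,
                        handle_exists_fn exists, void *ctx, uint64_t *out)
{
    int attempts = 0;
    int taken;
    uint64_t h;

    do {
        h = pick_handle(first, last);
        taken = exists(h, ctx);
        /* keep going only while we collide AND still have tries left */
    } while (taken && ++attempts < MAX_HANDLE_ALLOC_ATTEMPTS);

    if (taken)
        return -1;
    *out = h;
    return 0;
}
```

With '>' in place of '<', the loop condition is false after the first
collision, so no retry ever happens.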

>
>
>>>> +    rfd = open("/dev/urandom", O_RDONLY, 0);
>>>> +    if(rfd < 0)
>>>> +    {
>>>> +        return -PVFS_EINVAL;
>>>> +    }
>>>
>>> Painted ourselves into a linux-specific corner here.  Maybe have the
>>> usual time() etc. srand option here too if open fails.
>>>
>>>> +    random_r(&trove_handle_random_data, &r1);
>>>> +    i = r1 % extent_array->extent_count;
>>>
>>> May want a feature test for this.  Not sure if POSIX has gotten
>>> itself into all the OSes on which people may run servers.
>>
>> Right, I was concerned with making sure I got a good seed here.  It
>> needs to generate both a very large random sequence from the seed,
>> as well as not pick the same seed over and over on server startup.
>> Using initstate_r with an array size of 256 makes the values returned
>> by random_r much more random, and passing the current time ensures
>> that the seed will be different on each server startup.
>>
>> If we use the more primitive forms of getting a random number, it's
>> just more likely to get repeated values for handles.  Is that
>> acceptable?  Does it become the user's problem if his random handle
>> values aren't so random?
>
> Yeah.  It will just run through the same set of allocated handles,
> taking a long time that first time for people with lousy RNGs.  Then
> it will fall into an unallocated space and continue normally.  As
> long as there is a configure test for random_r, we can fall back to
> lrand48() and friends or even ancient srand/rand.  /dev/urandom
> test must be at runtime, with graceful fallback to a seed made up of
> hostname[0:255] | time() << 29 | coll_id << 63 | ... any other
> random stuff you can get your hands on in that routine easily.

Not sure hostname is actually useful in this case, since the handles  
are allocated (and only need to be unique) per-server.  Same with the  
fs_id.  I could use process id I suppose...
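
A rough sketch of the runtime fallback being discussed: try the random
device first and, if it can't be opened or read, build a seed from the
time and process id.  The names handle_rng_seed and seed_handle_rng are
made up for illustration, and srandom() is used here only for brevity
(the patch uses initstate_r/random_r).

```c
#define _XOPEN_SOURCE 700   /* exposes srandom() on strict compilers */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Return a seed read from the device at 'path'.  If the device is
 * missing or the read comes up short, fall back to time() mixed with
 * the pid.  *from_device reports which source was used. */
static unsigned int handle_rng_seed(const char *path, int *from_device)
{
    unsigned int seed = 0;
    int fd;

    assert(from_device != NULL);

    fd = open(path, O_RDONLY);
    if (fd >= 0) {
        ssize_t n = read(fd, &seed, sizeof(seed));
        close(fd);
        if (n == (ssize_t)sizeof(seed)) {
            *from_device = 1;
            return seed;
        }
    }

    /* Fallback: differs across restarts (time) and across processes
     * started within the same second (pid). */
    *from_device = 0;
    return (unsigned int)time(NULL) ^ ((unsigned int)getpid() << 16);
}

/* Seed the process-wide generator once at server startup. */
static void seed_handle_rng(void)
{
    int from_device;

    srandom(handle_rng_seed("/dev/urandom", &from_device));
}
```

Since the open() is attempted on the running server, this part needs no
configure-time test; only the choice of generator would.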

Can I check for /dev/urandom with a runtime check in configure, or are  
those verboten for cross-compiles?
-sam

>
>
> I thought about proposing just doing linear allocation.  Find the
> highest handle, allocate +1 on that.  That's what we do with the
> OSDs, using a 1-element cache to remember the last handle allocated.
> This works nicely until you first fill up your handle space and have
> to wrap, then can go bad if you hit a run of undeleted old handles.
>
> I've no idea what the cost is to run the RNG.  Presumably it is very
> fast.  In which case just doing it all the time like you have it is
> perfect.
>
> 		-- Pete
>
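
For comparison, the linear scheme described above might look roughly
like the sketch below.  It is not the OSD code: struct linear_alloc,
handle_in_run and linear_next are hypothetical names, the predicate
stands in for a database lookup, and the probe limit is an added
assumption so that a long run of old handles cannot stall the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the database lookup: nonzero means "taken". */
typedef int (*handle_exists_fn)(uint64_t handle, void *ctx);

/* Example predicate: a contiguous run [lo, hi] of undeleted handles. */
struct handle_run { uint64_t lo, hi; };

static int handle_in_run(uint64_t handle, void *ctx)
{
    const struct handle_run *run = ctx;
    return handle >= run->lo && handle <= run->hi;
}

struct linear_alloc {
    uint64_t first, last;   /* inclusive handle range */
    uint64_t last_alloc;    /* one-element cache: last handle handed out */
};

/* Allocate last_alloc + 1, wrapping to 'first' past the end of the
 * range and skipping handles that still exist.  Gives up after
 * max_probes candidates.  Returns 0 on success, -1 otherwise. */
static int linear_next(struct linear_alloc *a, handle_exists_fn exists,
                       void *ctx, unsigned max_probes, uint64_t *out)
{
    uint64_t h = a->last_alloc;
    unsigned i;

    assert(a->first <= a->last);

    for (i = 0; i < max_probes; i++) {
        h = (h >= a->last) ? a->first : h + 1;
        if (!exists(h, ctx)) {
            a->last_alloc = h;
            *out = h;
            return 0;
        }
    }
    return -1;   /* hit a run of old handles longer than max_probes */
}
```

Before the first wrap every call succeeds on the first probe; after
the wrap the cost depends on how long the run of surviving handles is.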


