[Pvfs2-developers] server crash on startup with millions of files

Pete Wyckoff pw at osc.edu
Fri Feb 23 17:12:32 EST 2007


rross at mcs.anl.gov wrote on Fri, 23 Feb 2007 14:38 -0600:
> The current implementation (a) tracks what is free and (b) what is 
> recently used, (c) lets the server choose the handle to return, and (d) 
> keeps a global handle space (a handle is unique across all servers).

No disputes with these properties and most of what they buy us.

> walt's idea seems to allow us to map a collection of objects (a 
> "segment") to a given server, then a client could pick values in that 
> segment. my feeling is that this hamstrings our ability to move objects 
> around, because we would then need to move entire segments around, as at 
> the very least it could take a very long time to reach a consistent 
> state again (think of many large objects needing to be moved; how do 
> clients know where to contact?). this idea is a generalization of pete's 
> idea to have a server id be part of the object handle; pete's approach 
> makes it impossible to migrate without changing file metadata. more on 
> this below.

Actually, we may do something like this for the OSD implementation.
Suppose the bottom 64 bits of the new 128-bit handle ID are
server-assigned, or random, whatever.  The top 64 bits are chosen by
the config manager; call that the segment if you like.  At file system
creation time, servers A, B, and C are assigned ranges 1.1-9, 2.1-9,
and 3.1-9, respectively.  Each has its own 64-bit space underneath a
particular server ID (where "9" stands for the largest number that
fits in 64 bits).
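To make the layout concrete, here is a minimal Python sketch. It assumes the segment simply occupies the upper 64 bits of a 128-bit integer handle and the server-chosen object ID occupies the lower 64. `make_handle` and `split_handle` are hypothetical names for illustration, not PVFS functions.

```python
# Hypothetical layout: upper 64 bits = segment ID chosen by the config
# manager, lower 64 bits = object ID chosen by the server (or at random).
LOW_BITS = 64
LOW_MASK = (1 << LOW_BITS) - 1  # the "9" in the text: largest 64-bit value


def make_handle(segment: int, object_id: int) -> int:
    """Pack a segment ID and a per-segment object ID into one 128-bit handle."""
    if not 0 <= segment <= LOW_MASK:
        raise ValueError("segment must fit in 64 bits")
    if not 0 <= object_id <= LOW_MASK:
        raise ValueError("object_id must fit in 64 bits")
    return (segment << LOW_BITS) | object_id


def split_handle(handle: int) -> tuple[int, int]:
    """Return (segment, object_id) for a 128-bit handle."""
    return handle >> LOW_BITS, handle & LOW_MASK


# Assignment at file system creation: one whole segment per server.
initial_segments = {1: "A", 2: "B", 3: "C"}

h = make_handle(2, 12345)
seg, oid = split_handle(h)
print(seg, oid, initial_segments[seg])  # -> 2 12345 B
```

Because the segment sits in the high bits, every handle in segment 1 sorts below every handle in segment 2, so each server's initial range is one contiguous block of the 128-bit space.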

Now add a new server D and rebalance by moving objects off the
existing servers and onto D, trying to keep the handle ID space
contiguous.  The new handle map might be:

    1.1-6 A
    1.7-9 D

    2.1-6 B
    2.7-9 D

    3.1-6 C
    3.7-9 D

    4.1-9 A
    5.1-9 B
    6.1-9 C
    7.1-9 D

To create a new datafile on A, the client specifies the handle range
4.1-9, i.e., the entire 64-bit space with upper bits 4.  The old "1"
space is closed to new additions.  You move only as much of a space
as you need to for performance reasons.
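The rebalanced map can be sketched as a sorted list of ranges that clients binary-search for the handle -> server lookup. This is a minimal Python illustration that uses the toy numbers from the example and represents a handle as a (segment, low) pair, which orders the same way as the packed 128-bit value. The `open_for_create` flag is an assumption that models "closed to new additions"; `HANDLE_MAP`, `lookup`, and `create_range` are hypothetical names.

```python
import bisect

# The rebalanced map from the example, with the email's toy numbers:
# (segment, first low value, last low value, server, open_for_create).
# Only segments 4-7 accept new objects; 1-3 are closed after rebalancing.
HANDLE_MAP = [
    (1, 1, 6, "A", False),
    (1, 7, 9, "D", False),
    (2, 1, 6, "B", False),
    (2, 7, 9, "D", False),
    (3, 1, 6, "C", False),
    (3, 7, 9, "D", False),
    (4, 1, 9, "A", True),
    (5, 1, 9, "B", True),
    (6, 1, 9, "C", True),
    (7, 1, 9, "D", True),
]

# Entries are sorted by (segment, low start), so a binary search finds
# the only range that could contain a given handle.
_STARTS = [(seg, lo) for seg, lo, _hi, _srv, _open in HANDLE_MAP]


def lookup(segment: int, low: int) -> str:
    """Map a handle (segment, low) to the server that currently holds it."""
    i = bisect.bisect_right(_STARTS, (segment, low)) - 1
    if i >= 0:
        seg, lo, hi, server, _open = HANDLE_MAP[i]
        if seg == segment and lo <= low <= hi:
            return server
    raise KeyError(f"handle {segment}.{low} is not mapped")


def create_range(server: str) -> tuple[int, int, int]:
    """Return the (segment, low start, low end) a client passes when
    creating a new datafile on `server`: its open, whole segment."""
    for seg, lo, hi, srv, is_open in HANDLE_MAP:
        if srv == server and is_open:
            return seg, lo, hi
    raise KeyError(f"no open segment for server {server}")


print(lookup(1, 8))       # -> D  (moved during rebalancing)
print(lookup(1, 3))       # -> A
print(create_range("A"))  # -> (4, 1, 9)
```

Each lookup costs a logarithmic search, but every client has to hold the whole list, and the list grows by an entry each time a range is split.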

Similarly, to consolidate, you break up an existing space on the
server that is going away and assign contiguous parts of it to the
remaining servers.  One issue is that collisions may occur in the
lower 64-bit space during consolidation, which the OSD will not like,
at which point you need to do a metafile remapping.  I'm not sure
if this kills the entire algorithm from a PVFS point of view.
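Here is a minimal Python sketch of both steps. It splits a departing server's low range into contiguous pieces and then checks for clashes. It assumes the destination OSD keys objects by the lower 64 bits alone, so a moved object clashes when its low ID is already in use there. `split_segment` and `collisions` are hypothetical helpers.

```python
def split_segment(lo: int, hi: int, servers: list[str]) -> list[tuple[int, int, str]]:
    """Divide the low range [lo, hi] of a departing server's segment into
    contiguous, nearly equal pieces, one per remaining server."""
    n = len(servers)
    total = hi - lo + 1
    pieces = []
    start = lo
    for k, server in enumerate(servers):
        # The first (total % n) servers get one extra ID each.
        size = total // n + (1 if k < total % n else 0)
        if size == 0:
            continue
        pieces.append((start, start + size - 1, server))
        start += size
    return pieces


def collisions(moving: set[int], resident: set[int]) -> set[int]:
    """Low 64-bit object IDs that would clash on the destination OSD,
    assuming the OSD keys objects by the low 64 bits alone."""
    return moving & resident


# Suppose D leaves: its open segment 7 (low range 1-9) is split among A, B, C.
print(split_segment(1, 9, ["A", "B", "C"]))
# -> [(1, 3, 'A'), (4, 6, 'B'), (7, 9, 'C')]

# D's objects headed to A have low IDs {2, 3}; A's OSD already holds
# low IDs {3, 5} from its own segment 4, so ID 3 clashes and that object
# would need a new handle (a metafile remapping).
print(collisions({2, 3}, {3, 5}))  # -> {3}
```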

The size of the handle map grows at about the same rate as our
current fixed-range map, as servers are added and removed.

> i don't understand why it is difficult to get a value in a particular 
> range in the OSD work. can you clarify this pete? can't you just "guess" 
> a value in the range until you get one?

Certainly one can guess, and retry until it works out.  We likely
won't be able to keep the recently-used list on OSDs, but we could use
lazy algorithms that hide old handles on the OSD until a cleaner comes
through and wipes them based on timestamp.
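Guess-and-retry with lazy hiding can be sketched as follows. This is a minimal Python illustration against an in-memory stand-in. It assumes an OSD create with a requested ID fails when that ID is taken, and that deleted IDs are only tombstoned until a cleaner runs. `ToyOSD` and `guess_create` are hypothetical, not a real OSD interface.

```python
import random
import time
from typing import Optional


class ToyOSD:
    """Stand-in for an OSD: object IDs are unique; create fails on a clash.
    Deleted objects are hidden (tombstoned) rather than removed, so their
    IDs are not reused until a cleaner wipes them."""

    def __init__(self) -> None:
        self.live: set[int] = set()
        self.hidden: dict[int, float] = {}  # object ID -> deletion time

    def create(self, oid: int) -> bool:
        if oid in self.live or oid in self.hidden:
            return False  # ID taken or recently freed: caller must retry
        self.live.add(oid)
        return True

    def remove(self, oid: int) -> None:
        self.live.discard(oid)
        self.hidden[oid] = time.monotonic()

    def clean(self, max_age: float) -> None:
        """Wipe hidden IDs older than max_age seconds, making them reusable."""
        cutoff = time.monotonic() - max_age
        self.hidden = {o: t for o, t in self.hidden.items() if t > cutoff}


def guess_create(osd: ToyOSD, lo: int, hi: int, tries: int = 100,
                 rng: Optional[random.Random] = None) -> int:
    """Pick random IDs in [lo, hi] until the OSD accepts one."""
    rng = rng or random.Random()
    for _ in range(tries):
        oid = rng.randint(lo, hi)
        if osd.create(oid):
            return oid
    raise RuntimeError("range looks full; giving up")


osd = ToyOSD()
rng = random.Random(42)
ids = [guess_create(osd, 1, 9, tries=1000, rng=rng) for _ in range(9)]
print(sorted(ids))    # -> [1, 2, 3, 4, 5, 6, 7, 8, 9]: range now full
osd.remove(5)
print(osd.create(5))  # -> False: 5 is hidden, not yet reusable
osd.clean(max_age=0.0)
print(osd.create(5))  # -> True: the cleaner wiped the old entry
```

The tombstone plays the role of the recently-used list. Freed handles are not handed out again until the cleaner decides they are old enough.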

> one thing that we could discuss is the relative merit of migration using 
> this sort of approach. maybe in fact this idea that i have that we want 
> to keep a FS-wide object handle space is flawed, that changing file 
> metadata can be addressed in a reasonable way that simplifies the 
> overall system, allows for migration, and doesn't have a negative impact 
> on our caching of metadata.

This may in fact be something we will be forced to give up as things
get too large.  The handle map is fairly small when you start with N
ranges, one for each server in your new FS.  If someone actually adds
and removes servers with some regularity, and moves existing handles
to balance load in terms of either size or utilization, that map will
fragment until it becomes too painful for each client to search it
for the handle -> server lookup.  The only way out is to rewrite
metafile entries.  I'm not worried about this in the short term,
though.

> overall i think that changing how we reference objects, with the 
> exception of perhaps redoing how we keep up with free/recently-freed 
> objects, is something that should perhaps wait until we have 
> server-to-server working. we're likely to want to make some changes at 
> that point anyway, once the system has more control over the 
> construction of files and directories. maybe we can discuss how we'd 
> like things to work in that context and concentrate on getting there, 
> rather than torquing things now and then perhaps messing with things again?

Of course, I want to make sure everything works without
server-to-server communication, as it 1) is not possible with OSDs,
and 2) limits server scalability if servers always have to talk to
each other.  Don't even start with server-collective algorithms.  :)
Nor will I be eager to make big changes to the handle map without
some concrete problem to fix.

		-- Pete


