[Pvfs2-developers] Distribution by hostname

Rob Ross rross at mcs.anl.gov
Mon Oct 9 11:57:04 EDT 2006


I'm warming to the idea of doing everything related to distributions 
through the hints mechanism, even though it sort of undoes Pete's 
attempt to simplify the discussion.

The cost of string processing is minimal compared to everything else 
that we do, so that isn't a big deal IMO. We could still transport 
distribution info the same way that we do now, and in fact this 
extension to allow for enumerating servers (via aliases) wouldn't impact 
that either (or doesn't have to right now at least).

I think everyone is ok with:
- storing handles for datafiles, not aliases
- not creating three new distributions, but instead just changing how we 
get the list of servers to them (Sam believes that we can do this nicely)
- keeping the other existing functionality of simple_stripe/varstrip
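
To make those three points concrete, here is a rough sketch (Python for brevity; this is not PVFS code). Everything in it is hypothetical, including `pick_servers`, `create_file`, and the handle counter. It only shows the shape: the optional alias list changes which servers are chosen, the distribution name and parameters pass through untouched, and only handles end up in what gets stored.

```python
# Hypothetical sketch, not the PVFS API: all names here are invented.
from itertools import count

_next_handle = count(1000)  # stand-in for server-side handle allocation


def pick_servers(known_aliases, requested=None, num_dfiles=None):
    """Return the ordered list of server aliases to place datafiles on."""
    if requested is None:
        chosen = list(known_aliases)          # default: use every server
    else:
        unknown = [a for a in requested if a not in known_aliases]
        if unknown:
            raise ValueError("unknown server alias: " + ", ".join(unknown))
        chosen = list(requested)              # caller's order is kept
    if num_dfiles is not None:
        chosen = chosen[:num_dfiles]
    return chosen


def create_file(known_aliases, dist_name="simple_stripe", dist_params=None,
                server_list=None, num_dfiles=None):
    """Create one datafile per chosen server and return the stored metadata."""
    servers = pick_servers(known_aliases, server_list, num_dfiles)
    handles = [next(_next_handle) for _ in servers]
    # Aliases are used for placement only; the metadata keeps handles.
    return {
        "dist": {"name": dist_name, "params": dict(dist_params or {})},
        "datafile_handles": handles,
    }
```

Note that the server list is an argument next to the distribution rather than a field inside it, which is what keeps it independent of the distribution type.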

I personally don't think the environment variable approach is all that 
useful for PVFS. There are only a *very* limited number of cases where 
it would apply cleanly (e.g. pvfs2-cp) and where there isn't already 
another mechanism for doing the same thing (e.g. MPI-IO hints). I think 
that MPI-IO hints should have some mechanism along those lines, and I 
wouldn't mind discussing that separately, but that's a different topic.

So I think we're mostly trying to work out what our API should really 
be, whether we should extend the distro functionality vs. going totally 
to hints, and if we go to hints what that API should look like, right?
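
For discussion only, here is one shape a string-based hint interface could take (again Python, not PVFS code). The `key=value;key=value` syntax and the key names `dist`, `servers`, and `num_dfiles` are made up for this sketch and are not an existing format. The point is just that the distribution becomes one optional hint among several, with simple_stripe as the default when it is absent.

```python
# Hypothetical hint syntax: "key=value;key=value". Not an existing PVFS format.
DEFAULT_DIST = "simple_stripe"


def parse_create_hints(hint_string):
    """Parse a hint string into a dict, filling in defaults for absent keys."""
    hints = {"dist": DEFAULT_DIST, "servers": None, "num_dfiles": None}
    for item in filter(None, (p.strip() for p in hint_string.split(";"))):
        key, sep, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not value:
            raise ValueError(f"malformed hint: {item!r}")
        if key == "dist":
            hints["dist"] = value
        elif key == "servers":
            # Comma-separated host aliases, order preserved.
            hints["servers"] = [s.strip() for s in value.split(",") if s.strip()]
        elif key == "num_dfiles":
            hints["num_dfiles"] = int(value)
        # Unknown keys are ignored here, on the assumption hints are advisory.
    return hints
```

An empty string yields the defaults, so a caller that passes no hints at all still gets simple_stripe across the default servers.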

Julian and others have done examples of passing in distro parameters to 
MPI-IO, and BradS I think has done this too. What did they look like 
(since they would be other examples of string passing of values)? What 
did we like/dislike about them?

Regards,

Rob

Walter B. Ligon III wrote:
> I tend to agree with Pete and Sam ... Given the existing distro 
> interface this is the right place to set a list of specific servers - 
> though I tend to think it should be independent of the distro type (just 
> as the number of servers is currently independent of which distro type 
> you use).  The list of servers shouldn't be saved as part of the 
> metadata - I think Sam's example of setting such a list on a directory 
> is a distinct case.  In this case you set an extended attribute which is 
> used as the default during file creation - but that list still isn't 
> saved with the files (but might be saved as an extended attrib on a 
> subdir ... but I digress).
> 
> On the other hand, if we had a good generic hint mechanism we COULD 
> replace the entire distro interface and just use hints to do all of 
> that.  I'm not sure this is the best approach because it is awkward. 
> Especially if everything is specified using strings.
> 
> Maybe we should do it all with XML!   ;-)
> 
> Walt
> 
> Sam Lang wrote:
>>
>> On Oct 7, 2006, at 2:09 PM, Rob Ross wrote:
>>
>>> I agree completely with Pete. I think we might consider just adjusting
>>> the input parameters on the client side to allow for inclusion of the
>>> list of aliases. There is no reason for these to be stored as part of
>>> the distribution information on the server, as once the objects are
>>> created we already know where they are.
>>
>>
>> I agree storing them in a file's distribution is redundant.  The  
>> distribution stored on a directory in the extended attribute is the  
>> only reason I can think of for storing the list in the distro.
>>
>> In terms of interfaces, the distribution parameter we pass into create
>> is really a sort of hint:  We don't require it (defaulting to simple
>> stripe if it's not specified), it's opaque to the caller (specified by
>> a key string: "varstrip" and arbitrary parameters), and it functions
>> as a 'hint' in the way we seem to have a need for them, changing the
>> distribution of that file.
>>
>> It might be possible to design a hints interface that allows us to
>> express a distribution as one of possibly many generic hints being
>> passed into create, without losing the convenience of just passing in
>> a distribution, or a list of servers.
>>
>> -sam
>>
>>>
>>> We have what, 3 distributions that will need to be adjusted to this
>>> new scheme? That shouldn't be too bad, and they overall cover a very
>>> wide range of options.
>>>
>>> Regards,
>>>
>>> Rob
>>>
>>> Pete Wyckoff wrote:
>>>
>>>> slang at mcs.anl.gov wrote on Fri, 06 Oct 2006 16:33 -0500:
>>>>
>>>>> On Oct 6, 2006, at 1:48 PM, Julian Martin Kunkel wrote:
>>>>>
>>>>>> Also it will not
>>>>>> allow setting the servers for all distributions...
>>>>>
>>>>> Yeah I can't imagine wanting to ever do that.  It would mean passing
>>>>> in a distribution different from the default simple-stripe, as well
>>>>> as a hint saying you want a specific set of servers in the same
>>>>> call.  Seems sort of yucky to me.  I'd rather have all the
>>>>> information about the distribution in the distribution.  You're even
>>>>> able to use the distribution field in the directory hints structure
>>>>> to specify per-directory IO server lists.  Not that you would ever
>>>>> want to do that either...
>>>>
>>>> I agree with Sam that this is yucky.  I'm hijacking this thread.
>>>> Let's forget about hints for a moment and decide how we want to
>>>> extend the concept of distributions, as seen by users, in such a way
>>>> that they can specify particular IO servers by name.  If this is an
>>>> interface people want, we should design it properly, not just
>>>> implement it with hints because we (might) have them.
>>>>
>>>> Some issues, please suggest approaches and other issues.  (I'm using
>>>> "name" here to mean host alias.)
>>>>
>>>> 1.  What kind of control do users want?
>>>>     - all data on one server by name?
>>>>     - arbitrary control of stripe sizes and host names?
>>>> 2.  New distribution name, or extension to existing ones?
>>>>     - dist-varstrip has a lot of flexibility, but no hostnames
>>>>     - maybe a new "dist-single-host-by-name" is all that is desired
>>>> 3.  Store hostnames in on-disk distribution?
>>>>     - guessing no for the single-stripe distro, but perhaps somebody
>>>>     can really think of a use case for this?
>>>> 4.  User API
>>>>     - through PVFS_dist_create
>>>>     - (please not through both PVFS_dist_create + some hint)
>>>>     - via environment variable too?
>>>>
>>>> If our design happens to end up as something that would be
>>>> implemented well by hints, then we can think about using them.  For
>>>> now, let's just get the design correct.
>>>>
>>>> We can come back and argue the merits of a generic hint interface in
>>>> a different thread.
>>>>         -- Pete
>>>> _______________________________________________
>>>> Pvfs2-developers mailing list
>>>> Pvfs2-developers at beowulf-underground.org
>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
> 

