[Pvfs2-developers] Distribution by hostname

Walter B. Ligon III walt at clemson.edu
Mon Oct 9 12:36:18 EDT 2006


I'm down with that.  For the record, my concern over strings isn't about 
our processing them; it's about users having to muck with them, 
especially when passing numeric data.  I just worry about lots of 
mallocing and string copying (blech) when all you really want to do is 
set the stripe size to the variable x.
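
A minimal C sketch of the concern above. It contrasts a typed setter with a string-keyed setter of the kind a generic hints mechanism might require. The `stripe_params` struct, the "strip_size" key and all three functions are hypothetical illustrations. None of them are existing PVFS calls.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameter block for a striping distribution (not a PVFS type). */
struct stripe_params {
    unsigned long long strip_size;
};

/* Hypothetical typed setter: the caller passes the number directly. */
static void set_strip_size(struct stripe_params *p, unsigned long long x)
{
    p->strip_size = x;
}

/* Hypothetical string-keyed setter, as a generic hints mechanism might use.
 * Returns 0 on success, -1 on an unknown key or a malformed value. */
static int set_param_string(struct stripe_params *p, const char *key,
                            const char *value)
{
    char *end;
    unsigned long long v;

    if (strcmp(key, "strip_size") != 0)
        return -1;
    errno = 0;
    v = strtoull(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0')
        return -1;              /* overflow, empty, or trailing junk */
    p->strip_size = v;
    return 0;
}

/* What a caller holding an integer x must do on the string path:
 * format it into a buffer, then hand it over to be parsed back. */
static int set_strip_size_via_string(struct stripe_params *p,
                                     unsigned long long x)
{
    char buf[32];               /* large enough for any unsigned long long */
    snprintf(buf, sizeof(buf), "%llu", x);
    return set_param_string(p, "strip_size", buf);
}
```

Both paths end with the same value stored. On the string path, though, the integer is formatted and then parsed back, and the parser has to reject malformed input such as "64k". The typed setter needs neither step.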

If we can work that out I feel like we could probably design a nice 
generic interface that will let us do what we already do AND add new 
stuff with less interface mucking.

I think we're all moving in a good direction.

Walt

Rob Ross wrote:
> I'm warming to the idea of doing everything related to distributions 
> through the hints mechanism, even though it sort of undoes Pete's 
> attempt to simplify the discussion.
> 
> The cost of string processing is minimal compared to everything else 
> that we do, so that isn't a big deal IMO. We could still transport 
> distribution info the same way that we do now, and in fact this 
> extension for enumerating servers (via aliases) wouldn't impact 
> that either (or at least doesn't have to right now).
> 
> I think everyone is ok with:
> - storing handles for datafiles, not aliases
> - not creating three new distributions, but instead just changing how we 
> get the list of servers to them (Sam believes that we can do this nicely)
> - keeping the other existing functionality of simple_stripe/varstrip
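
The "change how we get the list of servers" idea can be sketched in C. This assumes a hypothetical client-side helper that resolves optional host aliases to server indices before the datafiles are created. The distribution code never sees the aliases, and only the resulting datafile handles would be stored. `fs_servers`, `pick_servers` and the alias names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the servers in a file system config. */
static const char *fs_servers[] = { "io0", "io1", "io2", "io3" };
#define NUM_FS_SERVERS (sizeof(fs_servers) / sizeof(fs_servers[0]))

/* Hypothetical helper: fill out[] with the server indices that a new file's
 * datafiles go to. If aliases were supplied, resolve each one. Otherwise
 * fall back to round-robin starting at 'start', as default striping might.
 * Returns the number of servers chosen, or -1 on an unknown alias or
 * insufficient space in out[]. */
static int pick_servers(const char **aliases, size_t n_aliases,
                        size_t n_default, size_t start,
                        size_t *out, size_t out_cap)
{
    size_t i, j;

    if (aliases != NULL && n_aliases > 0) {
        if (n_aliases > out_cap)
            return -1;
        for (i = 0; i < n_aliases; i++) {
            for (j = 0; j < NUM_FS_SERVERS; j++)
                if (strcmp(aliases[i], fs_servers[j]) == 0)
                    break;
            if (j == NUM_FS_SERVERS)
                return -1;      /* alias not in the config */
            out[i] = j;
        }
        return (int)n_aliases;
    }
    if (n_default > NUM_FS_SERVERS || n_default > out_cap)
        return -1;
    for (i = 0; i < n_default; i++)
        out[i] = (start + i) % NUM_FS_SERVERS;
    return (int)n_default;
}
```

Because the alias lookup happens before creation, simple_stripe and varstrip could consume the resulting index list unchanged. No new distributions are needed.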
> 
> I personally don't think the environment variable approach is all that 
> useful for PVFS. I believe this because there are only a *very* limited 
> number of cases where it would apply cleanly (e.g. pvfs2-cp) for which 
> there isn't another mechanism for doing the same thing (e.g. MPI-IO hints). I 
> think that MPI-IO hints should have some mechanism along those lines, 
> and I wouldn't mind discussing that separately, but that's a different 
> topic.
> 
> So I think we're mostly trying to work out what our API should really 
> be, whether we should extend the distro functionality vs. going totally 
> to hints, and if we go to hints what that API should look like, right?
> 
> Julian and others have written examples of passing distro parameters 
> to MPI-IO, and I think BradS has done this too. What did they look like 
> (since they would be other examples of passing values as strings)? What 
> did we like/dislike about them?
> 
> Regards,
> 
> Rob
> 
> Walter B. Ligon III wrote:
> 
>> I tend to agree with Pete and Sam ... Given the existing distro 
>> interface, this is the right place to set a list of specific servers - 
>> though I tend to think it should be independent of the distro type 
>> (just as the number of servers is currently independent of which 
>> distro type you use).  The list of servers shouldn't be saved as part 
>> of the metadata - I think Sam's example of setting such a list on a 
>> directory is a distinct case.  In this case you set an extended 
>> attribute which is used as the default during file creation - but that 
>> list still isn't saved with the files (but might be saved as an 
>> extended attrib on a subdir ... but I digress).
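
The precedence described above could be expressed roughly as follows. This is a C sketch with a hypothetical `server_list` type and helper, not existing PVFS code:

- An explicit list given at create time wins.
- Otherwise a per-directory default applies, such as one kept in an extended attribute.
- Otherwise there is no list, and the distribution picks servers as it does today.

```c
#include <stddef.h>

/* Hypothetical server-list holder; count == 0 means "not specified". */
struct server_list {
    const char **aliases;
    size_t count;
};

/* Hypothetical helper: choose the list that file creation should use.
 * The result is only consulted at create time; it is not stored with
 * the file's metadata. Returns NULL when neither source supplies a list. */
static const struct server_list *
effective_server_list(const struct server_list *explicit_list,
                      const struct server_list *dir_default)
{
    if (explicit_list != NULL && explicit_list->count > 0)
        return explicit_list;
    if (dir_default != NULL && dir_default->count > 0)
        return dir_default;
    return NULL;
}
```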
>>
>> On the other hand, if we had a good generic hint mechanism, we COULD 
>> replace the entire distro interface and just use hints to do all of 
>> that.  I'm not sure this is the best approach because it is awkward. 
>> Especially if everything is specified using strings.
>>
>> Maybe we should do it all with XML!   ;-)
>>
>> Walt
>>
>> Sam Lang wrote:
>>
>>>
>>> On Oct 7, 2006, at 2:09 PM, Rob Ross wrote:
>>>
>>>> I agree completely with Pete. I think we might consider just 
>>>> adjusting the input parameters on the client side to allow for 
>>>> inclusion of the list of aliases. There is no reason for these to 
>>>> be stored as part of the distribution information on the server, as 
>>>> once the objects are created we already know where they are.
>>>
>>>
>>>
>>> I agree storing them in a file's distribution is redundant. The 
>>> distribution stored on a directory in an extended attribute is the 
>>> only reason I can think of for storing the list in the distro.
>>>
>>> In terms of interfaces, the distribution parameter we pass into 
>>> create is really a sort of hint: we don't require it (defaulting to 
>>> simple stripe if it's not specified), it's opaque to the caller 
>>> (specified by a key string such as "varstrip" plus arbitrary 
>>> parameters), and it functions as a 'hint' in the way we seem to need 
>>> them, changing the distribution of that file.
>>>
>>> It might be possible to design a hints interface that allows us 
>>> to express a distribution as one of possibly many generic hints 
>>> passed into create, without losing the convenience of just 
>>> passing in a distribution or a list of servers.
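
One way to read this suggestion, sketched in C with a hypothetical key/value `hint` struct (not an existing PVFS type). The distribution becomes one optional hint among others, and create falls back to simple striping when that hint is absent. The "dist" and "servers" keys and the "simple_stripe" default name are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value hint passed into create. */
struct hint {
    const char *key;
    const char *value;
};

/* Return the value for key, or NULL if the caller did not set it. */
static const char *hint_get(const struct hint *hints, size_t n,
                            const char *key)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(hints[i].key, key) == 0)
            return hints[i].value;
    return NULL;
}

/* The distribution is just one optional hint; when it is absent, fall
 * back to a simple-stripe default, mirroring today's create behaviour. */
static const char *dist_name_from_hints(const struct hint *hints, size_t n)
{
    const char *d = hint_get(hints, n, "dist");
    return d != NULL ? d : "simple_stripe";
}
```

Passing only a server list (or only a distribution) stays a one-element hint array, which keeps the "just pass a distribution" case about as convenient as it is now.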
>>>
>>> -sam
>>>
>>>>
>>>> We have what, 3 distributions that will need to be adjusted to this 
>>>> new scheme? That shouldn't be too bad, and overall they cover a 
>>>> very wide range of options.
>>>>
>>>> Regards,
>>>>
>>>> Rob
>>>>
>>>> Pete Wyckoff wrote:
>>>>
>>>>> slang at mcs.anl.gov wrote on Fri, 06 Oct 2006 16:33 -0500:
>>>>>
>>>>>> On Oct 6, 2006, at 1:48 PM, Julian Martin Kunkel wrote:
>>>>>>
>>>>>>> Also, it will not
>>>>>>> allow setting the servers for all distributions...
>>>>>>
>>>>>>
>>>>>> Yeah, I can't imagine ever wanting to do that. It would mean 
>>>>>> passing in a distribution different from the default simple-stripe, 
>>>>>> as well as a hint saying you want a specific set of servers, in the 
>>>>>> same call. Seems sort of yucky to me. I'd rather have all the 
>>>>>> information about the distribution in the distribution. You're even 
>>>>>> able to use the distribution field in the directory hints structure 
>>>>>> to specify per-directory IO server lists. Not that you would ever 
>>>>>> want to do that either...
>>>>>
>>>>>
>>>>> I agree with Sam that this is yucky.  I'm hijacking this thread.
>>>>> Let's forget about hints for a moment and decide how we want to
>>>>> extend the concept of distributions, as seen by users, in such a way
>>>>> that they can specify particular IO servers by name.  If this is an
>>>>> interface people want, we should design it properly, not just
>>>>> implement it with hints because we (might) have them.
>>>>> Here are some issues; please suggest approaches and other issues.
>>>>> (I'm using "name" here to mean host alias.)
>>>>> 1.  What kind of control do users want?
>>>>>     - all data on one server by name?
>>>>>     - arbitrary control of stripe sizes and host names?
>>>>> 2.  New distribution name, or extension to existing ones?
>>>>>     - dist-varstrip has a lot of flexibility, but no hostnames
>>>>>     - maybe a new "dist-single-host-by-name" is all that is desired
>>>>> 3.  Store hostnames in on-disk distribution?
>>>>>     - guessing no for the single-stripe distro, but perhaps somebody
>>>>>     can really think of a use case for this?
>>>>> 4.  User API
>>>>>     - through PVFS_dist_create
>>>>>     - (please not through both PVFS_dist_create + some hint)
>>>>>     - via environment variable too?
>>>>> If our design happens to end up as something that would be
>>>>> implemented well by hints, then we can think about using them.  For
>>>>> now, let's just get the design correct.
>>>>> We can come back and argue the merits of a generic hint interface in
>>>>> a different thread.
>>>>>         -- Pete
>>>>> _______________________________________________
>>>>> Pvfs2-developers mailing list
>>>>> Pvfs2-developers at beowulf-underground.org
>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

-- 
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University

