[Pvfs2-developers] Distribution by hostname
Walter B. Ligon III
walt at clemson.edu
Mon Oct 9 12:36:18 EDT 2006
I'm down with that. For the record, my concern over strings isn't in
our processing them; it's the users having to muck with them,
especially when passing numeric data. I just worry about lots of
mallocing/string copying blech when all you really want to do is set the
stripe size to the variable x.
If we can work that out I feel like we could probably design a nice
generic interface that will let us do what we already do AND add new
stuff with less interface mucking.
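To make the numeric-data concern concrete, here is a minimal C sketch of a hint list whose values carry a type tag. A caller can set a strip size straight from an integer variable, and string-only sources can still be accepted and parsed. Everything here (struct hint, hint_set_int, hint_get_int, the "strip_size" key, the fixed MAX_HINTS capacity) is hypothetical and not part of the PVFS2 API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tagged hint value: integer or string. */
enum hint_type { HINT_INT, HINT_STR };

struct hint {
    const char *key;            /* e.g. "strip_size" (hypothetical key) */
    enum hint_type type;
    union {
        int64_t i;
        const char *s;          /* borrowed; caller keeps it alive */
    } val;
};

#define MAX_HINTS 8             /* fixed capacity: no malloc needed */

struct hint_list {
    struct hint h[MAX_HINTS];
    int n;
};

/* Append a hint; returns 0, or -1 if the list is full. */
static int hint_add(struct hint_list *l, struct hint h)
{
    assert(l != NULL && h.key != NULL);
    if (l->n >= MAX_HINTS)
        return -1;
    l->h[l->n++] = h;
    return 0;
}

/* Numeric values go in as numbers: no snprintf, no string copies. */
static int hint_set_int(struct hint_list *l, const char *key, int64_t v)
{
    struct hint h = { .key = key, .type = HINT_INT, .val.i = v };
    return hint_add(l, h);
}

/* Values that really are text (e.g. a host alias) stay strings. */
static int hint_set_str(struct hint_list *l, const char *key, const char *v)
{
    struct hint h = { .key = key, .type = HINT_STR, .val.s = v };
    return hint_add(l, h);
}

/* Read a hint as an integer. String values are parsed, so callers that
 * only have strings (such as MPI-IO info values) still work.
 * Returns 0 on success, -1 if missing or not a base-10 integer. */
static int hint_get_int(const struct hint_list *l, const char *key,
                        int64_t *out)
{
    for (int k = 0; k < l->n; k++) {
        const struct hint *h = &l->h[k];
        if (strcmp(h->key, key) != 0)
            continue;
        if (h->type == HINT_INT) {
            *out = h->val.i;
            return 0;
        }
        char *end;
        long long v = strtoll(h->val.s, &end, 10);  /* overflow not checked */
        if (end == h->val.s || *end != '\0')
            return -1;
        *out = v;
        return 0;
    }
    return -1;
}
```

Borrowed pointers and a fixed-size array keep the numeric path free of malloc and string copies; a real interface would still have to decide who owns string values.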
I think we're all moving in a good direction.
Walt
Rob Ross wrote:
> I'm warming to the idea of doing everything related to distributions
> through the hints mechanism, even though it sort of undoes Pete's
> attempt to simplify the discussion.
>
> The cost of string processing is minimal compared to everything else
> that we do, so that isn't a big deal IMO. We could still transport
> distribution info the same way that we do now, and in fact this
> extension to allow for enumerating servers (via aliases) wouldn't impact
> that either (or doesn't have to right now at least).
>
> I think everyone is ok with:
> - storing handles for datafiles, not aliases
> - not creating three new distributions, but instead just changing how we
> get the list of servers to them (Sam believes that we can do this nicely)
> - keeping other existing functionality of simple_stripe/varstrip
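A rough sketch of the first two points, assuming a made-up server_cfg table that maps host aliases to servers: the alias list is consumed at create time, every alias is checked before any handle is taken, and only the resulting datafile handles end up in file_meta. The types, the handle counter standing in for real handle allocation, and create_with_aliases are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t handle_t;       /* hypothetical handle type */

struct server_cfg {              /* hypothetical config entry */
    const char *alias;           /* host alias, e.g. "io3" */
    handle_t next_handle;        /* stand-in for real handle allocation */
};

#define MAX_DFILES 16

struct file_meta {               /* what would be stored: handles only */
    int dfile_count;
    handle_t dfiles[MAX_DFILES];
};

/* Index of alias in cfg, or -1 if it is not a known server. */
static int find_server(const struct server_cfg *cfg, int ncfg,
                       const char *alias)
{
    for (int i = 0; i < ncfg; i++)
        if (strcmp(cfg[i].alias, alias) == 0)
            return i;
    return -1;
}

/* The alias list is create-time input only. All aliases are resolved
 * first, so an unknown alias fails before any handle is consumed; then
 * one datafile handle is taken per alias and only handles are kept. */
static int create_with_aliases(struct server_cfg *cfg, int ncfg,
                               const char *const *aliases, int naliases,
                               struct file_meta *meta)
{
    int idx[MAX_DFILES];
    assert(naliases > 0 && naliases <= MAX_DFILES);
    for (int i = 0; i < naliases; i++) {
        idx[i] = find_server(cfg, ncfg, aliases[i]);
        if (idx[i] < 0)
            return -1;
    }
    for (int i = 0; i < naliases; i++)
        meta->dfiles[i] = cfg[idx[i]].next_handle++;
    meta->dfile_count = naliases;
    return 0;
}
```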
>
> I personally don't think the environment variable approach is all that
> useful for PVFS. There are only a *very* limited number of cases where
> it would apply cleanly (e.g. pvfs2-cp) and where there isn't already
> another mechanism for doing the same thing (e.g. MPI-IO hints). I
> think that MPI-IO hints should have some mechanism along those lines,
> and I wouldn't mind discussing that separately, but that's a different
> topic.
>
> So I think we're mostly trying to work out what our API should really
> be, whether we should extend the distro functionality vs. going totally
> to hints, and if we go to hints what that API should look like, right?
>
> Julian and others have done examples of passing in distro parameters to
> MPI-IO, and BradS I think has done this too. What did they look like
> (since they would be other examples of string passing of values)? What
> did we like/dislike about them?
>
> Regards,
>
> Rob
>
> Walter B. Ligon III wrote:
>
>> I tend to agree with Pete and Sam ... Given the existing distro
>> interface this is the right place to set a list of specific servers -
>> though I tend to think it should be independent of the distro type
>> (just as the number of servers is currently independent of which
>> distro type you use). The list of servers shouldn't be saved as part
>> of the metadata - I think Sam's example of setting such a list on a
>> directory is a distinct case. In this case you set an extended
>> attribute which is used as the default during file creation - but that
>> list still isn't saved with the files (but might be saved as an
>> extended attrib on a subdir ... but I digress).
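The precedence Walt describes can be sketched as a small selection function. This assumes a per-call list beats a directory default, which beats the full set of servers; the server_list type and choose_servers are hypothetical. The distribution type is deliberately not an input, matching the idea that the server list is independent of the distro, and the chosen list is returned rather than stored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list of server aliases; count == 0 means "not given". */
struct server_list {
    const char *const *aliases;
    int count;
};

/* Choose the servers for a new file. The distribution type is not a
 * parameter, and nothing here is written into the file's metadata:
 * the directory default is only consulted at create time. */
static struct server_list choose_servers(struct server_list requested,
                                         struct server_list dir_default,
                                         struct server_list all_servers)
{
    assert(all_servers.count > 0);
    if (requested.count > 0)
        return requested;        /* explicit list on this create */
    if (dir_default.count > 0)
        return dir_default;      /* default set on the parent directory */
    return all_servers;          /* otherwise every I/O server */
}
```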
>>
>> On the other hand, if we had a good generic hint mechanism we COULD
>> replace the entire distro interface and just use hints to do all of
>> that. I'm not sure this is the best approach because it is awkward.
>> Especially if everything is specified using strings.
>>
>> Maybe we should do it all with XML! ;-)
>>
>> Walt
>>
>> Sam Lang wrote:
>>
>>>
>>> On Oct 7, 2006, at 2:09 PM, Rob Ross wrote:
>>>
>>>> I agree completely with Pete. I think we might consider just
>>>> adjusting the input parameters on the client side to allow for
>>>> inclusion of the list of aliases. There is no reason for these to
>>>> be stored as part of the distribution information on the server, as
>>>> once the objects are created we already know where they are.
>>>
>>>
>>>
>>> I agree storing them in a file's distribution is redundant. The
>>> distribution stored on a directory in the extended attribute is the
>>> only reason I can think of for storing the list in the distro.
>>>
>>> In terms of interfaces, the distribution parameter we pass into
>>> create is really a sort of hint: we don't require it (defaulting to
>>> simple stripe if it's not specified), it's opaque to the caller
>>> (specified by a key string: "varstrip" and arbitrary parameters),
>>> and it functions as a 'hint' in the way we seem to need them,
>>> changing the distribution of that file.
>>>
>>> It might be possible to design a hints interface that lets us
>>> express a distribution as one of possibly many generic hints
>>> passed into create, without losing the convenience of just
>>> passing in a distribution or a list of servers.
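One way to read Sam's point in C: treat the distribution as just another key in the create hints, keep the current fallback to simple stripe when it is absent, and leave all other keys alone. The kv struct, the "distribution" key and dist_from_hints are hypothetical; only the two distribution names come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value hint as it might arrive at create. */
struct kv {
    const char *key;
    const char *value;
};

/* Distribution names used in this thread; entry 0 is the default. */
static const char *const known_dists[] = { "simple_stripe", "varstrip" };
#define N_DISTS (sizeof known_dists / sizeof known_dists[0])

/* Scan the create hints for a "distribution" entry (hypothetical key).
 * Absent: fall back to simple_stripe.
 * Present but unknown: return NULL so the caller can report an error.
 * Every other key is ignored here and left for whoever understands it. */
static const char *dist_from_hints(const struct kv *hints, size_t n)
{
    assert(hints != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(hints[i].key, "distribution") != 0)
            continue;
        for (size_t d = 0; d < N_DISTS; d++)
            if (strcmp(known_dists[d], hints[i].value) == 0)
                return known_dists[d];
        return NULL;
    }
    return known_dists[0];
}
```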
>>>
>>> -sam
>>>
>>>>
>>>> We have, what, 3 distributions that will need to be adjusted to this
>>>> new scheme? That shouldn't be too bad, and overall they cover a
>>>> very wide range of options.
>>>>
>>>> Regards,
>>>>
>>>> Rob
>>>>
>>>> Pete Wyckoff wrote:
>>>>
>>>>> slang at mcs.anl.gov wrote on Fri, 06 Oct 2006 16:33 -0500:
>>>>>
>>>>>> On Oct 6, 2006, at 1:48 PM, Julian Martin Kunkel wrote:
>>>>>>
>>>>>>> Also, it will not
>>>>>>> allow setting the servers for all distributions...
>>>>>>
>>>>>>
>>>>>> Yeah I can't imagine wanting to ever do that. It would mean
>>>>>> passing in a distribution different from the default simple-
>>>>>> stripe, as well as a hint saying you want a specific set of
>>>>>> servers in the same call. Seems sort of yucky to me. I'd
>>>>>> rather have all the information about the distribution in the
>>>>>> distribution. You're even able to use the distribution field in
>>>>>> the directory hints structure to specify per-directory IO server
>>>>>> lists. Not that you would ever want to do that either...
>>>>>
>>>>>
>>>>> I agree with Sam that this is yucky. I'm hijacking this thread.
>>>>> Let's forget about hints for a moment and decide how we want to
>>>>> extend the concept of distributions, as seen by users, in such a way
>>>>> that they can specify particular IO servers by name. If this is an
>>>>> interface people want, we should design it properly, not just
>>>>> implement it with hints because we (might) have them.
>>>>> Some issues, please suggest approaches and other issues. (I'm using
>>>>> "name" here to mean host alias.)
>>>>> 1. What kind of control do users want?
>>>>> - all data on one server by name?
>>>>> - arbitrary control of stripe sizes and host names?
>>>>> 2. New distribution name, or extension to existing ones?
>>>>> - dist-varstrip has a lot of flexibility, but no hostnames
>>>>> - maybe a new "dist-single-host-by-name" is all that is desired
>>>>> 3. Store hostnames in on-disk distribution?
>>>>> - guessing no for the single-stripe distro, but perhaps somebody
>>>>> can really think of a use case for this?
>>>>> 4. User API
>>>>> - through PVFS_dist_create
>>>>> - (please not through both PVFS_dist_create + some hint)
>>>>> - via environment variable too?
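For question 1, the "arbitrary control of stripe sizes and host names" option needs a mapping from a logical file offset to a named server. The sketch below assumes strips are placed round-robin in list order, each host with its own strip size; it is an illustration, not how dist-varstrip actually computes placement. A one-entry list covers the "all data on one server by name" case. struct strip, struct location and map_offset are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical strip spec: one entry per named I/O server. */
struct strip {
    const char *host;     /* host alias */
    uint64_t size;        /* strip size on this host, in bytes */
};

struct location {
    int index;            /* which entry, i.e. which datafile */
    uint64_t dfile_off;   /* byte offset inside that datafile */
};

/* Map a logical file offset to (datafile, offset within datafile).
 * One full pass over the list is a stripe of length sum(size); each
 * full stripe contributes s[k].size bytes to datafile k. */
static struct location map_offset(const struct strip *s, int n, uint64_t off)
{
    assert(n > 0);
    uint64_t stripe = 0;
    for (int k = 0; k < n; k++) {
        assert(s[k].size > 0);
        stripe += s[k].size;
    }
    uint64_t full = off / stripe;    /* complete stripes before off */
    uint64_t rem = off % stripe;     /* position inside current stripe */
    int k = 0;
    while (rem >= s[k].size) {       /* terminates because rem < stripe */
        rem -= s[k].size;
        k++;
    }
    struct location loc = { k, full * s[k].size + rem };
    return loc;
}
```

With io1 at 64 KiB and io2 at 128 KiB, offset 65536 lands at the start of io2's datafile, and a single-host list maps every offset to itself.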
>>>>> If our design happens to end up as something that would be
>>>>> implemented well by hints, then we can think about using them. For
>>>>> now, let's just get the design correct.
>>>>> We can come back and argue the merits of a generic hint interface in
>>>>> a different thread.
>>>>> -- Pete
>>>>> _______________________________________________
>>>>> Pvfs2-developers mailing list
>>>>> Pvfs2-developers at beowulf-underground.org
>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>>
>>>>
>>>>
>>>
>>
>>
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University