[Pvfs2-developers] bmi multiple address endpoints

Sam Lang slang at mcs.anl.gov
Tue Nov 6 19:18:24 EST 2007


Here's take2.  Hopefully a little cleaner.

The more I think about the issue with multiple protocols and primary/ 
secondary addresses, the more complicated it gets.  I've added an  
option to the server to specify the index in the set of primary/ 
secondary addresses to listen on, allowing a server to be started and  
listen on the 3rd set of addresses in the endpoint string.  This  
doesn't fix the problem on the client though, as we don't really want  
to iterate over both the primary/secondary endpoints and the  
different protocols.  I guess maybe the protocol on the client has to  
be chosen based on the protocol specified in the mntent.

-sam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bmi-maddrs-take2.patch
Type: application/octet-stream
Size: 111158 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071106/9f19118e/bmi-maddrs-take2-0001.obj
-------------- next part --------------

On Nov 6, 2007, at 2:27 PM, Sam Lang wrote:

>
> On Nov 6, 2007, at 1:41 PM, Pete Wyckoff wrote:
>
>> slang at mcs.anl.gov wrote on Tue, 06 Nov 2007 10:50 -0600:
>>> The attached patch implements BMI multiple address endpoints that we
>>> talked about some time ago.  To refresh everyone's memory, this
>>> allows a set of addresses to be specified:
>>>
>>> tcp://host1:3334/pvfs2-fs,tcp://host2:3335/pvfs2-fs,tcp::// 
>>> host3:3336/
>>> pvfs2-fs
>>>
>>> In the config file for a given storage endpoint.  The BMI code
>>> manages the endpoint, setting the currently used address to the  
>>> first
>>> one in the list.  On message failure, the endpoint is  
>>> transitioned to
>>> point to the next address in the list.  This continues in a round-
>>> robin fashion.
>>
>> This is good stuff.  I'd like to help review it.  Can you do some
>> trivial things first to make that easier?
>>
>> 1.
>>> +    struct bmi_endpoint_ref_s *newref;
>> [..]
>>> +    /* haven't seen any of these addresses before, add a new  
>>> endpoint */
>>> +    newref = malloc(sizeof(struct bmi_addr_ref_s));
>>
>> Please change all malloc(sizeof()) and memset(,,sizeof()) to use the
>> variable name, not its type.  The bug you did above (and other
>> places) is way too common.  We just have to stop doing that.  Like
>> this instead:
>>
>> 	newref = malloc(sizeof(*newref));
>>
>> 2.
>> There's a bunch of stuff that seems out of place.  Can you check
>> that in or push it to the side so we can concentrate on the core?
>> 179 kB is a big patch.  :)
>
> The bmi-addr.[ch] files are new files, and the critical part of the  
> patch.  They obsolete the ref_st calls, so the reference-list.[ch]  
> files have been removed in the patch.  That may be adding to the  
> number of lines.
>
>>
>> Renaming all the method_ops is a big uninteresting part.
>
> Yeah, I got tired of calling them over and over with the  
> BMI_method_ tagged on the front.  I can try to pull that stuff out.
>
>>
>> Adding bmi_ in front of everything too, but it's too hard to rip
>> that out mid-patch now.  Random whitespace fixes too.  Good changes,
>> just hard to read.
>>
>> 3.
>> Some trivial bugs.
>>
>>> +    ref->current = ref->current + 1 % ref->count;
>>
>> Check your precedence table.
>>
>>> + * (C) 2001 Clemson University and The University of Chicago
>>
>> New files go back in time.
>>
>> Can you put some comments above each of the three new structs in
>> bmi-addr.c?  I keep getting confused on which is the old-style
>> addr and which is the comma-separated list.  And what the various
>> "link" and "refs" fields point to.
>
> Sure thing.
>
>>
>>> I've done some basic testing, but there's still more to do.  The
>>> client IO state machine is a bear, and testing all the cases where
>>> things could failover (requests, flows, acks, etc.) is going to take
>>> some more work.  I wanted to get the patch out there to allow others
>>> to provide feedback.
>>
>> Yeah, totally.  But it can be made to work.
>>
>> What did we decide do with mixed method usage?  The old semantic
>> was "ib://foo:2345/pvfs2-fs,tcp://foo:2347/pvfs2-fs" means try to
>> use IB, but if you don't have an IB nic, switch to TCP.  I agree we
>> decided that was less interesting.  Do we just add docs that say
>> that this comma is now for multi-pathing?  If people try this
>> example with the new code, it will flip from IB to TCP at every
>> timeout.  The old behavior was to stick with the first one where
>> you had the hardware.  In other words, probably some docs somewhere
>> should be added to this patch.
>
> Right, I thought we had decided to go the multi-path route.  I  
> guess there could be a config option that would set the flipping  
> from round-robin to try-once, giving the old behavior.
>
>>
>> On the server side, the ,-separated addresses mean "listen on all
>> these interfaces".  What do servers do now when they see your
>> tcp://host1,tcp://host2 example string above?  Looks like they would
>> fail to listen on anything (host1 can't bind to host2 address?).
>> This has to be in the fs.conf as the Alias string for each server so
>> that clients can find the IO servers, not just in the pvfs2tab to
>> find the config server.
>
> Yeah that formatting isn't going to work.  I wanted to keep it  
> simple, but I guess that's not possible.  Should we separate  
> fallback addresses with ';' instead?
>
> tcp://host1:3331,ib://host1:3335;tcp://host2:3332,ib://host2:3336
>
> Something like that?
> -sam
>
>>
>> 		-- Pete
>>
>



More information about the Pvfs2-developers mailing list