[Pvfs2-developers] Re: the halloween bug fixed
Scott Atchley
atchley at myri.com
Tue Oct 9 09:02:56 EDT 2007
On Oct 8, 2007, at 3:56 PM, Sam Lang wrote:
> On Oct 8, 2007, at 1:34 PM, Pete Wyckoff wrote:
>
>> Tell me which of these things are true/false, and perhaps add more
>> so we Scott and I can understand.
>>
>> 1. Address types
>>
>> - core PVFS refers to peers using a 64-bit opaque BMI_addr_t
>
> True.
>
>> - BMI has a list that maps BMI_addr_to to a method (e.g. mx or tcp)
>> and a struct method_addr *
>
> True. I've been referring to this part of the code as the bmi
> control layer (there's references to that name in the code). It
> includes the 'address reference list', which holds the ref_st_p
> instances.
>
>> - BMI methods do not see BMI_addr_t. They see struct method_addr *.
>> That struct has a void * that is filled with method-specific
>> items, like a connection structure.
>
> True, although my patch changes that, because the address reference
> list is accessed based on the PVFS_BMI_addr_t. The
> bmi_method_addr_reg_callback function returns a PVFS_BMI_addr_t,
> which the method is meant to store, and when it comes time to call
> bmi_method_addr_forget_callback, it passes that PVFS_BMI_addr_t it
> stored.
>
>> 2. Address creation
>>
>> - apps ask for peers by name: tcp://myhost:1223/pvfs
>>
>> - these names are parsed by the appropriate method to return
>> an existing or new struct method_addr *. Methods are expected
>> to keep a list of known peers? IB does.
>
> I think they all do, but I'm not sure that they have to. The
> reference list stores the method_addr* as well, so the particular
> method could just throw everything in there before calling
> bmi_method_addr_reg_callback. If the method wants to iterate
> through its known addresses though, it needs to manage them itself
> as well.
>
>> - servers can also receive new connections, in which case they
>> must build a struct method_addr, and register it with core
>> BMI, through bmi_method_addr_reg_callback()
>
> True.
>
>> 3. Address destruction
>>
>> - currently never happens. In theory, core BMI can request
>> that a method drop an address, perhaps when out of resources.
>
> Within the particular method (tcp being the one I'm looking at),
> the address is destroyed (tcp_forget_addr is called). But the
> address reference is never being removed from the reference list.
> In the case of tcp, this is a big problem (the bug in question),
> because tcp calls bmi_method_addr_reg_callback for each new
> connection, not just each new peer that connects.
Why not have tcp simply not register on each connection, but on each
new client. The handle_new_connection() function is only used on the
server.
> The patch implements a complementary callback function:
> bmi_method_addr_forget_callback, to be called by the individual
> methods when the connection is closed or the address is removed
> from the method's own management structures. In the case of tcp, a
> connection is closed when poll or epoll returns an IN event on a
> socket (through testcontext) and the following read returns zero
> (EOF). Its at this point the method now (with my patch) calls
> bmi_method_addr_forget_callback to remove the address reference
> from the list.
I think it is stretching a bit for the server's BMI method to make an
assumption about the fate of a client due to a loss of communication.
The loss could be permanent or a re-connect attempt may already be
initiated.
Is the determination about when to give up on a client (e.g. after
some time period, via some userspace utility, etc.) better handled at
the PVFS2 layer and then use BMI_get_info(DROP_ADDR) to handle it?
>> - BMI_addr_t and struct method_addr and method-internal state are
>> kept synchronized how? Through which calls? Who frees which
>> things when?
>
> On the server, PVFS_BMI_addr_t gets added to the reference list by
> bmi_method_addr_reg_callback. It gets removed from the list (with
> my patch) by bmi_method_addr_forget_callback, before the method
> frees up the address structures it was managing internally.
>
>> I'm obviously wrong and confused about #3. And maybe 2b makes
>> a bad assumption.
>
> What you've described sounds right.
>
>> Then how does your new forget callback fit in with all this?
>> Somehow I thought the endless list growth was purely a TCP problem,
>> not something that needed core plumbing.
>
> I think its only a tcp problem because the tcp method calls
> bmi_method_addr_reg_callback for each new connection. It looks
> like the other methods only call it for each new peer? The list
> just doesn't get larger over time, so unless we plan to allow for
> millions of clients, its not an issue for other methods. And in
> that case, they don't have to worry about calling the
> forget_callback. I was assuming that they should in the interest
> of completeness.
>
> -sam
More information about the Pvfs2-developers
mailing list