[Pvfs2-developers] Re: the halloween bug fixed

Scott Atchley atchley at myri.com
Tue Oct 9 09:02:56 EDT 2007


On Oct 8, 2007, at 3:56 PM, Sam Lang wrote:

> On Oct 8, 2007, at 1:34 PM, Pete Wyckoff wrote:
>
>> Tell me which of these things are true/false, and perhaps add more
>> so we Scott and I can understand.
>>
>> 1. Address types
>>
>> - core PVFS refers to peers using a 64-bit opaque BMI_addr_t
>
> True.
>
>> - BMI has a list that maps BMI_addr_to to a method (e.g. mx or tcp)
>>     and a struct method_addr *
>
> True.  I've been referring to this part of the code as the bmi  
> control layer (there's references to that name in the code).  It  
> includes the 'address reference list', which holds the ref_st_p  
> instances.
>
>> - BMI methods do not see BMI_addr_t.  They see struct method_addr *.
>>     That struct has a void * that is filled with method-specific
>>     items, like a connection structure.
>
> True, although my patch changes that, because the address reference  
> list is accessed based on the PVFS_BMI_addr_t.  The  
> bmi_method_addr_reg_callback function returns a PVFS_BMI_addr_t,  
> which the method is meant to store, and when it comes time to call  
> bmi_method_addr_forget_callback, it passes that PVFS_BMI_addr_t it  
> stored.
>
>> 2. Address creation
>>
>> - apps ask for peers by name:  tcp://myhost:1223/pvfs
>>
>> - these names are parsed by the appropriate method to return
>>     an existing or new struct method_addr *.  Methods are expected
>>     to keep a list of known peers?  IB does.
>
> I think they all do, but I'm not sure that they have to.  The  
> reference list stores the method_addr* as well, so the particular  
> method could just throw everything in there before calling  
> bmi_method_addr_reg_callback.  If the method wants to iterate  
> through its known addresses though, it needs to manage them itself  
> as well.
>
>> - servers can also receive new connections, in which case they
>>     must build a struct method_addr, and register it with core
>>     BMI, through bmi_method_addr_reg_callback()
>
> True.
>
>> 3. Address destruction
>>
>> - currently never happens.  In theory, core BMI can request
>>     that a method drop an address, perhaps when out of resources.
>
> Within the particular method (tcp being the one I'm looking at),  
> the address is destroyed (tcp_forget_addr is called).  But the  
> address reference is never being removed from the reference list.   
> In the case of tcp, this is a big problem (the bug in question),  
> because tcp calls bmi_method_addr_reg_callback for each new  
> connection, not just each new peer that connects.

Why not have tcp simply not register on each connection, but on each  
new client. The handle_new_connection() function is only used on the  
server.

> The patch implements a complementary callback function:  
> bmi_method_addr_forget_callback, to be called by the individual  
> methods when the connection is closed or the address is removed  
> from the method's own management structures.  In the case of tcp, a  
> connection is closed when poll or epoll returns an IN event on a  
> socket (through testcontext) and the following read returns zero  
> (EOF).  Its at this point the method now (with my patch) calls  
> bmi_method_addr_forget_callback to remove the address reference  
> from the list.

I think it is stretching a bit for the server's BMI method to make an  
assumption about the fate of a client due to a loss of communication.  
The loss could be permanent or a re-connect attempt may already be  
initiated.

Is the determination about when to give up on a client (e.g. after  
some time period, via some userspace utility, etc.) better handled at  
the PVFS2 layer and then use BMI_get_info(DROP_ADDR) to handle it?

>> - BMI_addr_t and struct method_addr and method-internal state are
>>     kept synchronized how?  Through which calls?  Who frees which
>>     things when?
>
> On the server, PVFS_BMI_addr_t gets added to the reference list by  
> bmi_method_addr_reg_callback.  It gets removed from the list (with  
> my patch) by bmi_method_addr_forget_callback, before the method  
> frees up the address structures it was managing internally.
>
>> I'm obviously wrong and confused about #3.  And maybe 2b makes
>> a bad assumption.
>
> What you've described sounds right.
>
>> Then how does your new forget callback fit in with all this?
>> Somehow I thought the endless list growth was purely a TCP problem,
>> not something that needed core plumbing.
>
> I think its only a tcp problem because the tcp method calls  
> bmi_method_addr_reg_callback for each new connection.  It looks  
> like the other methods only call it for each new peer?  The list  
> just doesn't get larger over time, so unless we plan to allow for  
> millions of clients, its not an issue for other methods.  And in  
> that case, they don't have to worry about calling the  
> forget_callback.  I was assuming that they should in the interest  
> of completeness.
>
> -sam



More information about the Pvfs2-developers mailing list