[Pvfs2-developers] Re: the halloween bug fixed

Sam Lang slang at mcs.anl.gov
Tue Oct 9 09:39:12 EDT 2007


On Oct 9, 2007, at 8:27 AM, Pete Wyckoff wrote:

> slang at mcs.anl.gov wrote on Mon, 08 Oct 2007 14:56 -0500:
>> True, although my patch changes that, because the address reference
>> list is accessed based on the PVFS_BMI_addr_t.  The
>> bmi_method_addr_reg_callback function returns a PVFS_BMI_addr_t,
>> which the method is meant to store, and when it comes time to call
>> bmi_method_addr_forget_callback, it passes that PVFS_BMI_addr_t it
>> stored.
> [..]
>> Within the particular method (tcp being the one I'm looking at), the
>> address is destroyed (tcp_forget_addr is called).  But the address
>> reference is never being removed from the reference list.  In the
>> case of tcp, this is a big problem (the bug in question), because tcp
>> calls bmi_method_addr_reg_callback for each new connection, not just
>> each new peer that connects.
>
> Thanks for all the explanation.  The crucial difference with TCP
> that I wasn't grokking was that it doesn't search its own internal
> peer list---it always registers each new connection.
>
> Thus your "forget" approach seems good.  Except for one aspect.  Why
> force the method to store the PVFS_BMI_addr_t just so it can hand it
> back to BMI core, which then converts it into a struct method_addr?
> Can you just pass the struct method_addr directly?  If not, no big
> deal.

It's not converting it to a method_addr. I'm doing a lookup into the
reference list based on the PVFS_BMI_addr_t, and getting back a
ref_st_p.

>
> More generally, it bugs me that both core BMI and each method must
> keep separate lists of addresses.  It's probably time to expose the
> data structure to BMI methods so we have just one list.  But this is
> certainly more than you set out to do.

Yeah, but I agree it's a mess. As we head toward having multiple
methods enabled at once, though, it seems like we will want to let an
individual method get at its own peer/connected addresses easily,
without having to iterate through a list where another method has a
bunch of addresses already.

One alternative might be to throw out the address management (this  
reference list) in the bmi control layer, in favor of forcing methods  
to manage their own (since most of them do anyway), and instead of  
creating PVFS_BMI_addr_t values from id_gen_fast_register (a hash of  
the reference pointer), we could come up with a scheme that splits  
the 64-bit value into a method type and an address value that the
method returns.  I think that would allow us to keep with the  
interface layering that we have now, although it would require some  
address management in the tcp method (and possibly others).

-sam

>
> 		-- Pete
>


