[Pvfs2-developers] bmi testcontext/testunexpected

Rob Ross rross at mcs.anl.gov
Tue Jan 6 15:52:20 EST 2009


Yeah, a specially named context for unexpected messages would have been a  
clean way to do things... -- Rob

On Jan 6, 2009, at 2:49 PM, Phil Carns wrote:

> Yeah, I don't particularly like adding special cases either.
>
> I feel like making the consumer play with timeouts or use an extra  
> thread would be just as much of a hack/workaround, though.  It's just  
> moving the problem elsewhere.
>
> Fundamentally it seems more like a BMI API flaw.  It would have made  
> more sense (for example) if unexpected messages were assigned to a  
> specific context and the testunexpected() and testcontext()  
> functions were combined.  The consumer could then use a single test  
> call to retrieve both unexpected and normal messages at once if they  
> are in the same context (as in the pvfs2-server use case).  Testing  
> on a different context would ignore the presence of unexpected  
> messages (as in the problem-triggering use case here).
>
> There are other ways to deal with it, that's just an example.  We  
> just need the API to better express the intention of the caller  
> (preferably in one function) so that BMI doesn't have to optimize by  
> guessing about what else is going on.
>
> That is more work than just adding a flag, though :)  It probably  
> depends on whether we think the use case is going to be around long  
> enough to justify tweaking the API.
>
> -Phil
>
> Sam Lang wrote:
>> I've committed the set_info fix for this.  I'm not crazy about it,  
>> but it should work for now.  In the long term, we should probably  
>> move away from method-specific hacks like this.  I.e. it should be  
>> up to the API consumer (our server) to adjust timeouts or call  
>> testunexpected in a separate thread.
>>
>> Nawab, in the zoidfs init code after initializing BMI you need to  
>> call:
>>
>>     int check = 0;
>>     BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);
>>
>> -sam
>>
>> On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
>>> Sam Lang wrote:
>>>> Hi All,
>>>>
>>>> I think Nawab has found a bug (or untested code path) in the BMI  
>>>> tcp method.  He's running a daemon that both receives unexpected  
>>>> requests (as a server), and receives expected responses (as a  
>>>> client).
>>>>
>>>> In the BMI_testcontext call, if there aren't any completed  
>>>> (expected) operations, and there are completed unexpected  
>>>> receives, we return immediately, assuming that BMI_testunexpected  
>>>> will be called in turn.  I think the idea here is that we want to  
>>>> keep our latency down for unexpected messages, instead of doing  
>>>> work on expected messages while unexpected messages are waiting  
>>>> in the hopper.  But the daemon is single-threaded and makes  
>>>> blocking PVFS_sys_* calls, so we essentially spin forever calling  
>>>> BMI_testcontext over and over.
>>>>
>>>> I'm not sure of the best way to fix this.  Easy fixes would be to  
>>>> remove the check for completed unexpected receives, and/or call  
>>>> tcp_do_work with a shorter timeout.
>>>>
>>>> It seems like we have a special case for blocking PVFS_sys_*  
>>>> calls.  We want to ignore unexpected receives just in that case,  
>>>> and actually call tcp_do_work.  In other contexts, I think we  
>>>> want the behavior that we have now, where we assume that a  
>>>> BMI_testunexpected call will follow a BMI_testcontext call.  We  
>>>> could modify the testcontext call to take a separate parameter,  
>>>> but that seems messy.  We might also be able to handle this with  
>>>> separate BMI contexts somehow...
>>>
>>> I haven't dug into the code yet to see if there is a more elegant way  
>>> to handle it, but I wanted to mention that if you want to add a  
>>> special flag to toggle the behavior, it might be better to just  
>>> set it globally with the set_info() function rather than modifying  
>>> the testcontext() API.  That way you don't have to change any of  
>>> the other BMI methods. There are already a couple of similar  
>>> set_info() calls to toggle BMI behavior for different use cases.
>>>
>>> -Phil
>
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers


