[Pvfs2-developers] bmi testcontext/testunexpected

Rob Ross rross at mcs.anl.gov
Tue Jan 6 20:51:09 EST 2009


Hi Sam,

My take on your email was that you were combining the two issues, so I  
wanted to make sure that we were in agreement that the alternative API  
was preferred (not that I think we should necessarily do anything  
about it at the moment). I'm glad we are in agreement.
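Below is a toy sketch of the alternative Phil described further down the thread. In it, unexpected messages are delivered to a specific context, and one test call replaces the testunexpected()/testcontext() pair. The names (toy_context, toy_post(), toy_test()) are hypothetical and are not part of BMI. The sketch only illustrates the intended semantics: a test on the server's context returns both kinds of completions, while a test on a separate client context never sees the unexpected ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the alternative API, not BMI code: every
 * completion, expected or unexpected, is delivered to a context, and a
 * single test call drains that context. */
enum toy_kind { TOY_EXPECTED, TOY_UNEXPECTED };

struct toy_event {
    enum toy_kind kind;
    int id;
};

#define TOY_MAX_EVENTS 16

struct toy_context {
    struct toy_event events[TOY_MAX_EVENTS];
    size_t count;
};

/* Deliver a completion of the given kind to ctx; -1 if ctx is full. */
static int toy_post(struct toy_context *ctx, enum toy_kind kind, int id)
{
    assert(ctx != NULL);
    if (ctx->count == TOY_MAX_EVENTS)
        return -1;
    ctx->events[ctx->count].kind = kind;
    ctx->events[ctx->count].id = id;
    ctx->count++;
    return 0;
}

/* Combined test: copies up to max completions of either kind out of ctx
 * in arrival order and returns how many were copied. */
static size_t toy_test(struct toy_context *ctx, struct toy_event *out,
                       size_t max)
{
    size_t n = ctx->count < max ? ctx->count : max;
    for (size_t i = 0; i < n; i++)
        out[i] = ctx->events[i];
    /* Shift any events that did not fit to the front of the queue. */
    for (size_t i = n; i < ctx->count; i++)
        ctx->events[i - n] = ctx->events[i];
    ctx->count -= n;
    return n;
}
```

With this structure, a daemon like Nawab's would test only its client context while it waits on its own operations, and pending server requests in the other context would not stall it.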

The terms "scheduling" and "priority" are being tossed around here in  
a way that I don't think is appropriate. The current testcontext does
neither prioritization nor scheduling, and neither would the proposed  
modified API (as described thus far). The current BMI behavior is more  
like a bug than anything else, although changing the behavior at this  
point would require some significant regression testing.
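To make the "more like a bug" point concrete, here is a toy model of the tcp early return Sam described in December, along with the BMI_TCP_CHECK_UNEXPECTED workaround. The struct and functions are hypothetical stand-ins, not BMI code. The model assumes that setting the flag to 0 simply disables the early return, which is what Sam's instructions to Nawab imply. With the check on, a caller that only ever tests for its own expected operation (like a blocking PVFS_sys_* call) never makes progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-in for the tcp method's state. */
struct toy_tcp_state {
    int completed_expected;  /* finished expected ops ready to report */
    int pending_unexpected;  /* unexpected receives waiting in the hopper */
    int unfinished_expected; /* expected ops that still need network work */
    bool check_unexpected;   /* models the BMI_TCP_CHECK_UNEXPECTED setting */
};

/* One network progress step: completes one outstanding expected op. */
static void toy_do_work(struct toy_tcp_state *s)
{
    if (s->unfinished_expected > 0) {
        s->unfinished_expected--;
        s->completed_expected++;
    }
}

/* Models testcontext: if nothing expected has completed, either return
 * early (check on and unexpected receives pending) or do one step of
 * work; then report and clear the completed expected ops. */
static int toy_testcontext(struct toy_tcp_state *s)
{
    assert(s != NULL);
    if (s->completed_expected == 0 && s->check_unexpected &&
        s->pending_unexpected > 0)
        return 0; /* assumes a testunexpected call will follow */
    if (s->completed_expected == 0)
        toy_do_work(s);
    int done = s->completed_expected;
    s->completed_expected = 0;
    return done;
}

/* A single-threaded caller blocked in a PVFS_sys_* style wait: it only
 * ever calls testcontext. Returns the number of calls it took, or -1 if
 * the operation did not complete within max_calls. */
static int toy_blocking_wait(struct toy_tcp_state *s, int max_calls)
{
    for (int i = 1; i <= max_calls; i++) {
        if (toy_testcontext(s) > 0)
            return i;
    }
    return -1;
}
```

With the check enabled, toy_blocking_wait() gives up after max_calls and the operation never completes. With the check disabled, the first call does the network work and returns the completion, while the unexpected receive simply stays queued. Neither setting performs any real prioritization.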

The I/O forwarding system probably ought to use the non-blocking PVFS  
calls so that it can better deal with this scenario anyway, right?
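Here is a toy sketch of that structure. It assumes the daemon drives its own loop over non-blocking operations instead of sitting inside a blocking call. The names are hypothetical (they are not PVFS or BMI functions), and the two poll functions stand in for whatever testunexpected-style and test-style calls the real code would make. The point is that the consumer, not the BMI method, decides what gets checked first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: the daemon tracks incoming requests (its server role)
 * and its own outstanding non-blocking operations (its client role). */
struct toy_daemon {
    int pending_unexpected; /* requests waiting to be picked up */
    int outstanding_ops;    /* non-blocking client ops still in flight */
    int requests_handled;
    int ops_completed;
};

/* Stand-in for a testunexpected-style poll: takes one waiting request. */
static bool toy_poll_unexpected(struct toy_daemon *d)
{
    if (d->pending_unexpected == 0)
        return false;
    d->pending_unexpected--;
    d->requests_handled++;
    return true;
}

/* Stand-in for testing the daemon's own operations: completes one. */
static bool toy_poll_ops(struct toy_daemon *d)
{
    if (d->outstanding_ops == 0)
        return false;
    d->outstanding_ops--;
    d->ops_completed++;
    return true;
}

/* The consumer's loop sets the policy: on every pass it checks for
 * unexpected requests first, then makes progress on its own ops. Another
 * consumer could reorder or rate-limit these calls. Returns the number of
 * passes taken, capped at max_passes. */
static int toy_event_loop(struct toy_daemon *d, int max_passes)
{
    int passes = 0;
    assert(d != NULL);
    while ((d->pending_unexpected > 0 || d->outstanding_ops > 0) &&
           passes < max_passes) {
        (void)toy_poll_unexpected(d);
        (void)toy_poll_ops(d);
        passes++;
    }
    return passes;
}
```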

Rob

On Jan 6, 2009, at 5:54 PM, Sam Lang wrote:

>
> On Jan 6, 2009, at 5:03 PM, Rob Ross wrote:
>
>> I think if we had this alternative design and one wanted to have  
>> different priorities, one would look for messages under different  
>> contexts as you say. But when you don't care about priority, it  
>> would be nice to be able to get everything in one call.
>
> I think you're arguing for a single testcontext function, instead of  
> the testcontext/testunexpected split.  I agree with that, but Phil  
> and I are arguing about something else.  Where should scheduling  
> decisions be made?  Within a BMI method, or by the API consumer?   
> I'm arguing for the latter.  Changing the API to be more consistent  
> or user-friendly doesn't affect where we choose to set the priority.
>
> -sam
>
>>
>>
>> Rob
>>
>> On Jan 6, 2009, at 4:57 PM, Sam Lang wrote:
>>
>>>
>>> Changing the API as you describe would actually bring back the  
>>> original problem.  As is, the BMI_tcp_testcontext call knows that  
>>> there are unexpected messages waiting, so it returns immediately  
>>> (expecting a call to testunexpected to follow).  This is a  
>>> specific policy hard-coded in the tcp method.
>>>
>>> With just a single testcontext call and all expected and  
>>> unexpected messages going to that context, the tcp code would have  
>>> to put all the unexpected messages at the top of the context to  
>>> give them priority.  This would fix the particular problem that  
>>> Nawab has, but it's still dictating policy (which messages get
>>> priority) from within the particular BMI method.
>>>
>>> I agree that forcing the application to define the policy (with  
>>> threads or timeouts) is moving the problem elsewhere, but it's
>>> moving the problem to where it belongs.  It's our pvfs server that
>>> wants unexpected messages to have priority, the bmi code itself  
>>> shouldn't dictate that priority.  We could define interfaces to  
>>> BMI that allow the policy to be set, but that's even further from  
>>> where we are now.
>>>
>>> -sam
>>>
>>> On Jan 6, 2009, at 2:52 PM, Rob Ross wrote:
>>>
>>>> Yeah a special named context for unexpected message would be a  
>>>> clean way to have done things... -- Rob
>>>>
>>>> On Jan 6, 2009, at 2:49 PM, Phil Carns wrote:
>>>>
>>>>> Yeah, I don't particularly like adding special cases either.
>>>>>
>>>>> I feel like making the consumer play with timeouts or use an  
>>>>> extra thread would be just as much of a hack/workaround,  
>>>>> though.  It's just moving the problem elsewhere.
>>>>>
>>>>> Fundamentally it seems more like a BMI API flaw.  It would have  
>>>>> made more sense (for example) if unexpected messages were  
>>>>> assigned to a specific context and the testunexpected() and  
>>>>> testcontext() functions were combined.  The consumer could then  
>>>>> use a single test call to retrieve both unexpected and normal  
>>>>> messages at once if they are in the same context (as in the  
>>>>> pvfs2-server use case).  Testing on a different context would  
>>>>> ignore the presence of unexpected messages (as in the problem  
>>>>> triggering use case here).
>>>>>
>>>>> There are other ways to deal with it, that's just an example.   
>>>>> We just need the API to better express the intention of the  
>>>>> caller (preferably in one function) so that BMI doesn't have to  
>>>>> optimize by guessing about what else is going on.
>>>>>
>>>>> That is more work than just adding a flag, though :)  It  
>>>>> probably depends on if we think the use case is going to be  
>>>>> around long enough to justify tweaking the API.
>>>>>
>>>>> -Phil
>>>>>
>>>>> Sam Lang wrote:
>>>>>> I've committed the set_info fix for this.  I'm not crazy about  
>>>>>> it, but it should work for now.  In the long term, we should  
>>>>>> probably move away from method specific hacks like this.  I.e.  
>>>>>> it should be up to the API consumer (our server) to adjust  
>>>>>> timeouts or call testunexpected in a separate thread.
>>>>>> Nawab, in the zoidfs init code after initializing BMI you need  
>>>>>> to call:
>>>>>> int check = 0;
>>>>>> BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);
>>>>>> -sam
>>>>>> On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
>>>>>>> Sam Lang wrote:
>>>>>>>> Hi All,
>>>>>>>> I think Nawab has found a bug (or untested code path) in the  
>>>>>>>> BMI tcp method.  He's running a daemon that both receives  
>>>>>>>> unexpected requests (as a server), and receives expected  
>>>>>>>> responses (as a client).
>>>>>>>> In the BMI_testcontext call, if there aren't any completed  
>>>>>>>> (expected) operations, and there are completed unexpected  
>>>>>>>> receives, we return immediately, assuming that  
>>>>>>>> BMI_testunexpected will be called in turn.  I think the idea  
>>>>>>>> here is that we want to keep our latency down for unexpected  
>>>>>>>> messages, instead of doing work on expected messages while  
>>>>>>>> unexpected messages are waiting in the hopper.  But the  
>>>>>>>> daemon is single threaded, and making blocking PVFS_sys_*  
>>>>>>>> calls, so we essentially spin forever calling BMI_testcontext  
>>>>>>>> over and over.
>>>>>>>> I'm not sure of the best way to fix this.  Easy fixes would  
>>>>>>>> be to remove the check for completed unexpected receives and/or
>>>>>>>> do tcp_do_work for a shorter timeout.
>>>>>>>> It seems like we have a special case for blocking PVFS_sys_*  
>>>>>>>> calls.  We want to ignore unexpected receives just in that  
>>>>>>>> case, and actually call tcp_do_work.  In other contexts, I  
>>>>>>>> think we want the behavior that we have now, where we assume  
>>>>>>>> that a BMI_testunexpected call will follow a BMI_testcontext  
>>>>>>>> call.  We could modify the testcontext call to take a  
>>>>>>>> separate parameter, but that seems messy.  We might also be  
>>>>>>>> able to handle this with separate BMI contexts somehow...
>>>>>>>
>>>>>>> I haven't dug in the code yet to see if I see any more elegant  
>>>>>>> way to handle it, but I wanted to mention that if you want to  
>>>>>>> add a special flag to toggle the behavior, it might be better  
>>>>>>> to just set it globally with the set_info() function rather  
>>>>>>> than modifying the testcontext() API.  That way you don't have
>>>>>>> to change any of the other BMI methods. There are already a  
>>>>>>> couple of similar set_info() calls to toggle BMI behavior for  
>>>>>>> different use cases.
>>>>>>>
>>>>>>> -Phil
>>>>>
>>>>> _______________________________________________
>>>>> Pvfs2-developers mailing list
>>>>> Pvfs2-developers at beowulf-underground.org
>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>>
>>>
>>
>


