[Pvfs2-developers] bmi testcontext/testunexpected

Rob Ross rross at mcs.anl.gov
Tue Jan 6 22:40:33 EST 2009


the fact that zoidfs is blocking is irrelevant to how the server  
implements servicing the calls. -- rob

On Jan 6, 2009, at 9:15 PM, Sam Lang wrote:

>
> On Jan 6, 2009, at 7:51 PM, Rob Ross <rross at mcs.anl.gov> wrote:
>
>> Hi Sam,
>>
>> My take on your email was that you were combining the two issues,  
>> so I wanted to make sure that we were in agreement that the  
>> alternative API was preferred (not that I think we should  
>> necessarily do anything about it at the moment). I'm glad we are in  
>> agreement.
>>
>> The terms "scheduling" and "priority" are being tossed around here  
>> in a way that I don't think is appropriate. The current testcontext
>> does neither prioritization nor scheduling, and neither would the  
>> proposed modified API (as described thus far). The current BMI  
>> behavior is more like a bug than anything else, although changing  
>> the behavior at this point would require some significant  
>> regression testing.
>>
> testcontext is setting a priority; I can only assume it's a desired
> priority for our servers. A separate thread in our server that
> called testunexpected and fired off the state machines would be  
> fairly straightforward and prevent any starvation that might occur.
>
> In other words, we want the behavior, I just disagree with the  
> notion that the behavior should be set by bmi tcp.
>
> The API is an orthogonal issue.
>
>
>> The I/O forwarding system probably ought to use the non-blocking  
>> PVFS calls so that it can better deal with this scenario anyway,  
>> right?
>>
> zoidfs is a blocking API.
> -sam
>
>> Rob
>>
>> On Jan 6, 2009, at 5:54 PM, Sam Lang wrote:
>>
>>>
>>> On Jan 6, 2009, at 5:03 PM, Rob Ross wrote:
>>>
>>>> I think if we had this alternative design and one wanted to have  
>>>> different priorities, one would look for messages under different  
>>>> contexts as you say. But when you don't care about priority, it  
>>>> would be nice to be able to get everything in one call.
>>>
>>> I think you're arguing for a single testcontext function, instead  
>>> of the testcontext/testunexpected split.  I agree with that, but  
>>> Phil and I are arguing about something else.  Where should  
>>> scheduling decisions be made?  Within a BMI method, or by the API  
>>> consumer?  I'm arguing for the latter.  Changing the API to be  
>>> more consistent or user friendly doesn't affect where we choose to  
>>> set the priority.
>>>
>>> -sam
>>>
>>>>
>>>>
>>>> Rob
>>>>
>>>> On Jan 6, 2009, at 4:57 PM, Sam Lang wrote:
>>>>
>>>>>
>>>>> Changing the API as you describe would actually bring back the  
>>>>> original problem.  As is, the BMI_tcp_testcontext call knows  
>>>>> that there are unexpected messages waiting, so it returns  
>>>>> immediately (expecting a call to testunexpected to follow).   
>>>>> This is a specific policy hard-coded in the tcp method.
>>>>>
>>>>> With just a single testcontext call and all expected and  
>>>>> unexpected messages going to that context, the tcp code would  
>>>>> have to put all the unexpected messages at the top of the  
>>>>> context to give them priority.  This would fix the particular  
>>>>> problem that Nawab has, but it's still dictating policy (which
>>>>> messages get priority) from within the particular BMI method.
>>>>>
>>>>> I agree that forcing the application to define the policy (with  
>>>>> threads or timeouts) is moving the problem elsewhere, but it's
>>>>> moving the problem to where it belongs. It's our pvfs server
>>>>> that wants unexpected messages to have priority, the bmi code  
>>>>> itself shouldn't dictate that priority.  We could define  
>>>>> interfaces to BMI that allow the policy to be set, but that's  
>>>>> even further from where we are now.
>>>>>
>>>>> -sam
>>>>>
>>>>> On Jan 6, 2009, at 2:52 PM, Rob Ross wrote:
>>>>>
>>>>>> Yeah a special named context for unexpected messages would be a
>>>>>> clean way to have done things... -- Rob
>>>>>>
>>>>>> On Jan 6, 2009, at 2:49 PM, Phil Carns wrote:
>>>>>>
>>>>>>> Yeah, I don't particularly like adding special cases either.
>>>>>>>
>>>>>>> I feel like making the consumer play with timeouts or use an  
>>>>>>> extra thread would be just as much of a hack/workaround,  
>>>>>>> though. It's just moving the problem elsewhere.
>>>>>>>
>>>>>>> Fundamentally it seems more like a BMI API flaw.  It would  
>>>>>>> have made more sense (for example) if unexpected messages were  
>>>>>>> assigned to a specific context and the testunexpected() and  
>>>>>>> testcontext() functions were combined.  The consumer could  
>>>>>>> then use a single test call to retrieve both unexpected and  
>>>>>>> normal messages at once if they are in the same context (as in  
>>>>>>> the pvfs2-server use case).  Testing on a different context  
>>>>>>> would ignore the presence of unexpected messages (as in the  
>>>>>>> problem triggering use case here).
>>>>>>>
>>>>>>> There are other ways to deal with it; that's just an example.
>>>>>>> We just need the API to better express the intention of the  
>>>>>>> caller (preferably in one function) so that BMI doesn't have  
>>>>>>> to optimize by guessing about what else is going on.
>>>>>>>
>>>>>>> That is more work than just adding a flag, though :)  It  
>>>>>>> probably depends on if we think the use case is going to be  
>>>>>>> around long enough to justify tweaking the API.
>>>>>>>
>>>>>>> -Phil
>>>>>>>
>>>>>>> Sam Lang wrote:
>>>>>>>> I've committed the set_info fix for this.  I'm not crazy  
>>>>>>>> about it, but it should work for now.  In the long term, we  
>>>>>>>> should probably move away from method specific hacks like  
>>>>>>>> this.  I.e. it should be up to the API consumer (our server)  
>>>>>>>> to adjust timeouts or call testunexpected in a separate thread.
>>>>>>>> Nawab, in the zoidfs init code after initializing BMI you  
>>>>>>>> need to call:
>>>>>>>> int check = 0;
>>>>>>>> BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);
>>>>>>>> -sam
>>>>>>>> On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
>>>>>>>>> Sam Lang wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>> I think Nawab has found a bug (or untested code path) in  
>>>>>>>>>> the BMI tcp method.  He's running a daemon that both  
>>>>>>>>>> receives unexpected requests (as a server), and receives  
>>>>>>>>>> expected responses (as a client).
>>>>>>>>>> In the BMI_testcontext call, if there aren't any completed  
>>>>>>>>>> (expected) operations, and there are completed unexpected  
>>>>>>>>>> receives, we return immediately, assuming that  
>>>>>>>>>> BMI_testunexpected will be called in turn.  I think the  
>>>>>>>>>> idea here is that we want to keep our latency down for  
>>>>>>>>>> unexpected messages, instead of doing work on expected  
>>>>>>>>>> messages while unexpected messages are waiting in the  
>>>>>>>>>> hopper.  But the daemon is single threaded, and making  
>>>>>>>>>> blocking PVFS_sys_* calls, so we essentially spin forever  
>>>>>>>>>> calling BMI_testcontext over and over.
>>>>>>>>>> I'm not sure of the best way to fix this.  Easy fixes would  
>>>>>>>>>> be to remove the check for completed unexpected receives,  
>>>>>>>>>> and/or do tcp_do_work for a shorter timeout.
>>>>>>>>>> It seems like we have a special case for blocking  
>>>>>>>>>> PVFS_sys_* calls.  We want to ignore unexpected receives  
>>>>>>>>>> just in that case, and actually call tcp_do_work.  In other  
>>>>>>>>>> contexts, I think we want the behavior that we have now,  
>>>>>>>>>> where we assume that a BMI_testunexpected call will follow  
>>>>>>>>>> a BMI_testcontext call.  We could modify the testcontext  
>>>>>>>>>> call to take a separate parameter, but that seems messy.   
>>>>>>>>>> We might also be able to handle this with separate BMI  
>>>>>>>>>> contexts somehow...
>>>>>>>>>
>>>>>>>>> I haven't dug in the code yet to see if I see any more  
>>>>>>>>> elegant way to handle it, but I wanted to mention that if  
>>>>>>>>> you want to add a special flag to toggle the behavior, it  
>>>>>>>>> might be better to just set it globally with the set_info()  
>>>>>>>>> function rather than modifying the testcontext() api.  That  
>>>>>>>>> way you don't have to change any of the other BMI methods.  
>>>>>>>>> There are already a couple of similar set_info() calls to  
>>>>>>>>> toggle BMI behavior for different use cases.
>>>>>>>>>
>>>>>>>>> -Phil
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pvfs2-developers mailing list
>>>>>>> Pvfs2-developers at beowulf-underground.org
>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>>>>
>>>>>
>>>>
>>>
>>


