[Pvfs2-developers] bmi testcontext/testunexpected

Sam Lang slang at mcs.anl.gov
Tue Jan 6 22:51:11 EST 2009



On Jan 6, 2009, at 9:40 PM, Rob Ross <rross at mcs.anl.gov> wrote:

> the fact that zoidfs is blocking is irrelevant to how the server  
> implements servicing the calls. -- rob
>

The server implements servicing the calls by calling zoidfs functions,  
which in turn call PVFS.
-sam

> On Jan 6, 2009, at 9:15 PM, Sam Lang wrote:
>
>>
>> On Jan 6, 2009, at 7:51 PM, Rob Ross <rross at mcs.anl.gov> wrote:
>>
>>> Hi Sam,
>>>
>>> My take on your email was that you were combining the two issues,  
>>> so I wanted to make sure that we were in agreement that the  
>>> alternative API was preferred (not that I think we should  
>>> necessarily do anything about it at the moment). I'm glad we are  
>>> in agreement.
>>>
>>> The terms "scheduling" and "priority" are being tossed around here  
>>> in a way that I don't think is appropriate. The current  
>>> testcontext does neither prioritization nor scheduling, and  
>>> neither would the proposed modified API (as described thus far).  
>>> The current BMI behavior is more like a bug than anything else,  
>>> although changing the behavior at this point would require some  
>>> significant regression testing.
>>>
>> testcontext is setting a priority; I can only assume it's a desired  
>> priority for our servers.  A separate thread in our server that  
>> called testunexpected and fired off the state machines would be  
>> fairly straightforward and prevent any starvation that might occur.
>>
>> In other words, we want the behavior; I just disagree with the  
>> notion that the behavior should be set by BMI tcp.
>>
>> The API is an orthogonal issue.
>>
>>
>>> The I/O forwarding system probably ought to use the non-blocking  
>>> PVFS calls so that it can better deal with this scenario anyway,  
>>> right?
>>>
>> zoidfs is a blocking API.
>> -sam
>>
>>> Rob
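A minimal sketch of the consumer-side approach Sam suggests above: a dedicated thread drains unexpected messages, so a thread stuck in a blocking call only has to drive its own expected operations. Everything here is hypothetical: `struct unexpected_queue`, `test_unexpected()`, and `unexpected_loop()` model the idea with a mutex-protected counter and are not BMI or PVFS APIs. A real server would call BMI_testunexpected with an idle timeout instead of spinning on `sched_yield()`.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of unexpected messages waiting in the network layer.
   Not a BMI structure; test_unexpected() below only stands in for a
   testunexpected call. */
struct unexpected_queue {
    pthread_mutex_t lock;
    int pending;    /* unexpected messages not yet picked up */
    int dispatched; /* messages handed off to request handlers */
    bool shutdown;  /* set by the owner to stop the service thread */
};

static void queue_init(struct unexpected_queue *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->pending = 0;
    q->dispatched = 0;
    q->shutdown = false;
}

/* Simulates an unexpected request arriving from the network. */
static void queue_arrive(struct unexpected_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->pending++;
    pthread_mutex_unlock(&q->lock);
}

static void queue_stop(struct unexpected_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->shutdown = true;
    pthread_mutex_unlock(&q->lock);
}

/* Takes everything pending and "fires off a state machine" for each. */
static int test_unexpected(struct unexpected_queue *q) {
    pthread_mutex_lock(&q->lock);
    int n = q->pending;
    q->pending = 0;
    q->dispatched += n;
    pthread_mutex_unlock(&q->lock);
    return n;
}

/* Service thread: the consumer, not the BMI method, guarantees that
   unexpected messages keep moving while other threads sit in blocking
   calls. Exits once stopped and fully drained. */
static void *unexpected_loop(void *arg) {
    struct unexpected_queue *q = arg;
    assert(q != NULL);
    for (;;) {
        test_unexpected(q);
        pthread_mutex_lock(&q->lock);
        bool done = q->shutdown && q->pending == 0;
        pthread_mutex_unlock(&q->lock);
        if (done)
            return NULL;
        sched_yield(); /* a real loop would wait with an idle timeout */
    }
}
```

With this split, the priority given to unexpected messages lives in the server's threading structure rather than in the tcp method, which is the placement Sam argues for.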
>>>
>>> On Jan 6, 2009, at 5:54 PM, Sam Lang wrote:
>>>
>>>>
>>>> On Jan 6, 2009, at 5:03 PM, Rob Ross wrote:
>>>>
>>>>> I think if we had this alternative design and one wanted to have  
>>>>> different priorities, one would look for messages under  
>>>>> different contexts as you say. But when you don't care about  
>>>>> priority, it would be nice to be able to get everything in one  
>>>>> call.
>>>>
>>>> I think you're arguing for a single testcontext function, instead  
>>>> of the testcontext/testunexpected split.  I agree with that, but  
>>>> Phil and I are arguing about something else.  Where should  
>>>> scheduling decisions be made?  Within a BMI method, or by the API  
>>>> consumer?  I'm arguing for the latter.  Changing the API to be  
>>>> more consistent or user-friendly doesn't affect where we choose  
>>>> to set the priority.
>>>>
>>>> -sam
>>>>
>>>>>
>>>>>
>>>>> Rob
>>>>>
>>>>> On Jan 6, 2009, at 4:57 PM, Sam Lang wrote:
>>>>>
>>>>>>
>>>>>> Changing the API as you describe would actually bring back the  
>>>>>> original problem.  As is, the BMI_tcp_testcontext call knows  
>>>>>> that there are unexpected messages waiting, so it returns  
>>>>>> immediately (expecting a call to testunexpected to follow).   
>>>>>> This is a specific policy hard-coded in the tcp method.
>>>>>>
>>>>>> With just a single testcontext call and all expected and  
>>>>>> unexpected messages going to that context, the tcp code would  
>>>>>> have to put all the unexpected messages at the top of the  
>>>>>> context to give them priority.  This would fix the particular  
>>>>>> problem that Nawab has, but it's still dictating policy (which  
>>>>>> messages get priority) from within the particular BMI method.
>>>>>>
>>>>>> I agree that forcing the application to define the policy (with  
>>>>>> threads or timeouts) is moving the problem elsewhere, but it's  
>>>>>> moving the problem to where it belongs.  It's our PVFS server  
>>>>>> that wants unexpected messages to have priority, the bmi code  
>>>>>> itself shouldn't dictate that priority.  We could define  
>>>>>> interfaces to BMI that allow the policy to be set, but that's  
>>>>>> even further from where we are now.
>>>>>>
>>>>>> -sam
>>>>>>
>>>>>> On Jan 6, 2009, at 2:52 PM, Rob Ross wrote:
>>>>>>
>>>>>>> Yeah, a special named context for unexpected messages would be a  
>>>>>>> clean way to have done things... -- Rob
>>>>>>>
>>>>>>> On Jan 6, 2009, at 2:49 PM, Phil Carns wrote:
>>>>>>>
>>>>>>>> Yeah, I don't particularly like adding special cases either.
>>>>>>>>
>>>>>>>> I feel like making the consumer play with timeouts or use an  
>>>>>>>> extra thread would be just as much of a hack/workaround,  
>>>>>>>> though.  It's just moving the problem elsewhere.
>>>>>>>>
>>>>>>>> Fundamentally it seems more like a BMI API flaw.  It would  
>>>>>>>> have made more sense (for example) if unexpected messages  
>>>>>>>> were assigned to a specific context and the testunexpected()  
>>>>>>>> and testcontext() functions were combined.  The consumer  
>>>>>>>> could then use a single test call to retrieve both unexpected  
>>>>>>>> and normal messages at once if they are in the same context  
>>>>>>>> (as in the pvfs2-server use case).  Testing on a different  
>>>>>>>> context would ignore the presence of unexpected messages (as  
>>>>>>>> in the problem triggering use case here).
>>>>>>>>
>>>>>>>> There are other ways to deal with it, that's just an  
>>>>>>>> example.  We just need the API to better express the  
>>>>>>>> intention of the caller (preferably in one function) so that  
>>>>>>>> BMI doesn't have to optimize by guessing about what else is  
>>>>>>>> going on.
>>>>>>>>
>>>>>>>> That is more work than just adding a flag, though :)  It  
>>>>>>>> probably depends on whether we think the use case is going to be  
>>>>>>>> around long enough to justify tweaking the API.
>>>>>>>>
>>>>>>>> -Phil
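To make Phil's alternative above concrete, here is a sketch of a combined test call in which unexpected messages are delivered to a named context like any other completion. `test_combined()`, `struct completion`, and the context numbers are hypothetical; BMI does not provide this call. Testing the server context returns both kinds of message at once, while testing a client-only context never sees unexpected messages, so there is nothing to return early for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion record; not a BMI type. */
enum msg_kind { MSG_EXPECTED, MSG_UNEXPECTED };

struct completion {
    int context;        /* context the message was delivered to */
    enum msg_kind kind; /* expected response or unexpected request */
    int id;
    int consumed;       /* set once a test call has returned it */
};

/* Returns up to max completions that belong to ctx, in arrival order and
   regardless of kind. Completions in other contexts are left untouched, so
   a caller waiting on its own context is never diverted by unexpected
   messages meant for another part of the program. */
static size_t test_combined(struct completion *all, size_t n, int ctx,
                            struct completion *out, size_t max) {
    assert(all != NULL || n == 0);
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++) {
        if (!all[i].consumed && all[i].context == ctx) {
            all[i].consumed = 1;
            out[found++] = all[i];
        }
    }
    return found;
}
```

In this sketch completions come back in arrival order. Giving unexpected messages priority within one context would require the method to reorder them, which is the policy question Sam raises in his reply.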
>>>>>>>>
>>>>>>>> Sam Lang wrote:
>>>>>>>>> I've committed the set_info fix for this.  I'm not crazy  
>>>>>>>>> about it, but it should work for now.  In the long term, we  
>>>>>>>>> should probably move away from method specific hacks like  
>>>>>>>>> this.  I.e. it should be up to the API consumer (our server)  
>>>>>>>>> to adjust timeouts or call testunexpected in a separate  
>>>>>>>>> thread.
>>>>>>>>> Nawab, in the zoidfs init code after initializing BMI you  
>>>>>>>>> need to call:
>>>>>>>>> int check = 0;
>>>>>>>>> BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);
>>>>>>>>> -sam
>>>>>>>>> On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
>>>>>>>>>> Sam Lang wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>> I think Nawab has found a bug (or untested code path) in  
>>>>>>>>>>> the BMI tcp method.  He's running a daemon that both  
>>>>>>>>>>> receives unexpected requests (as a server), and receives  
>>>>>>>>>>> expected responses (as a client).
>>>>>>>>>>> In the BMI_testcontext call, if there aren't any completed  
>>>>>>>>>>> (expected) operations, and there are completed unexpected  
>>>>>>>>>>> receives, we return immediately, assuming that  
>>>>>>>>>>> BMI_testunexpected will be called in turn.  I think the  
>>>>>>>>>>> idea here is that we want to keep our latency down for  
>>>>>>>>>>> unexpected messages, instead of doing work on expected  
>>>>>>>>>>> messages while unexpected messages are waiting in the  
>>>>>>>>>>> hopper.  But the daemon is single threaded, and making  
>>>>>>>>>>> blocking PVFS_sys_* calls, so we essentially spin forever  
>>>>>>>>>>> calling BMI_testcontext over and over.
>>>>>>>>>>> I'm not sure of the best way to fix this.  Easy fixes  
>>>>>>>>>>> would be to remove the check for completed unexpected  
>>>>>>>>>>> receives, and/or run tcp_do_work with a shorter timeout.
>>>>>>>>>>> It seems like we have a special case for blocking  
>>>>>>>>>>> PVFS_sys_* calls.  We want to ignore unexpected receives  
>>>>>>>>>>> just in that case, and actually call tcp_do_work.  In  
>>>>>>>>>>> other contexts, I think we want the behavior that we have  
>>>>>>>>>>> now, where we assume that a BMI_testunexpected call will  
>>>>>>>>>>> follow a BMI_testcontext call.  We could modify the  
>>>>>>>>>>> testcontext call to take a separate parameter, but that  
>>>>>>>>>>> seems messy.  We might also be able to handle this with  
>>>>>>>>>>> separate BMI contexts somehow...
>>>>>>>>>>
>>>>>>>>>> I haven't dug in the code yet to see if I see any more  
>>>>>>>>>> elegant way to handle it, but I wanted to mention that if  
>>>>>>>>>> you want to add a special flag to toggle the behavior, it  
>>>>>>>>>> might be better to just set it globally with the set_info()  
>>>>>>>>>> function rather than modifying the testcontext() api.  That  
>>>>>>>>>> way you don't have to change any of the other BMI methods.  
>>>>>>>>>> There are already a couple of similar set_info() calls to  
>>>>>>>>>> toggle BMI behavior for different use cases.
>>>>>>>>>>
>>>>>>>>>> -Phil
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Pvfs2-developers mailing list
>>>>>>>> Pvfs2-developers at beowulf-underground.org
>>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

