[Pvfs2-developers] bmi testcontext/testunexpected
Rob Ross
rross at mcs.anl.gov
Tue Jan 6 22:40:33 EST 2009
The fact that zoidfs is blocking is irrelevant to how the server
implements servicing the calls. -- rob
On Jan 6, 2009, at 9:15 PM, Sam Lang wrote:
>
> On Jan 6, 2009, at 7:51 PM, Rob Ross <rross at mcs.anl.gov> wrote:
>
>> Hi Sam,
>>
>> My take on your email was that you were combining the two issues,
>> so I wanted to make sure that we were in agreement that the
>> alternative API was preferred (not that I think we should
>> necessarily do anything about it at the moment). I'm glad we are in
>> agreement.
>>
>> The terms "scheduling" and "priority" are being tossed around here
>> in a way that I don't think is appropriate. The current testcontext
>> does neither prioritization nor scheduling, and neither would the
>> proposed modified API (as described thus far). The current BMI
>> behavior is more like a bug than anything else, although changing
>> the behavior at this point would require some significant
>> regression testing.
>>
> testcontext is setting a priority; I can only assume it's a desired
> priority for our servers. A separate thread in our server that
> called testunexpected and fired off the state machines would be
> fairly straightforward and prevent any starvation that might occur.
>
> In other words, we want the behavior, I just disagree with the
> notion that the behavior should be set by bmi tcp.
>
> The API is an orthogonal issue.
>
>
>> The I/O forwarding system probably ought to use the non-blocking
>> PVFS calls so that it can better deal with this scenario anyway,
>> right?
>>
> zoidfs is a blocking API.
> -sam
>
>> Rob
>>
>> On Jan 6, 2009, at 5:54 PM, Sam Lang wrote:
>>
>>>
>>> On Jan 6, 2009, at 5:03 PM, Rob Ross wrote:
>>>
>>>> I think if we had this alternative design and one wanted to have
>>>> different priorities, one would look for messages under different
>>>> contexts as you say. But when you don't care about priority, it
>>>> would be nice to be able to get everything in one call.
>>>
>>> I think you're arguing for a single testcontext function, instead
>>> of the testcontext/testunexpected split. I agree with that, but
>>> Phil and I are arguing about something else. Where should
>>> scheduling decisions be made? Within a BMI method, or by the API
>>> consumer? I'm arguing for the latter. Changing the API to be
>>> more consistent or user friendly doesn't affect where we choose to
>>> set the priority.
>>>
>>> -sam
>>>
>>>>
>>>>
>>>> Rob
>>>>
>>>> On Jan 6, 2009, at 4:57 PM, Sam Lang wrote:
>>>>
>>>>>
>>>>> Changing the API as you describe would actually bring back the
>>>>> original problem. As is, the BMI_tcp_testcontext call knows
>>>>> that there are unexpected messages waiting, so it returns
>>>>> immediately (expecting a call to testunexpected to follow).
>>>>> This is a specific policy hard-coded in the tcp method.
>>>>>
>>>>> With just a single testcontext call and all expected and
>>>>> unexpected messages going to that context, the tcp code would
>>>>> have to put all the unexpected messages at the top of the
>>>>> context to give them priority. This would fix the particular
>>>>> problem that Nawab has, but it's still dictating policy (which
>>>>> messages get priority) from within the particular BMI method.
>>>>>
>>>>> I agree that forcing the application to define the policy (with
>>>>> threads or timeouts) is moving the problem elsewhere, but it's
>>>>> moving the problem to where it belongs. It's our pvfs server
>>>>> that wants unexpected messages to have priority, the bmi code
>>>>> itself shouldn't dictate that priority. We could define
>>>>> interfaces to BMI that allow the policy to be set, but that's
>>>>> even further from where we are now.
>>>>>
>>>>> -sam
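
A minimal sketch of the point argued above, that the caller rather than the BMI method picks which kind of message is serviced first. Everything here is hypothetical (toy_queues, toy_policy, toy_next are illustration names, not BMI calls), and the two counters only stand in for whatever the real test calls would report.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in BMI. */
enum toy_policy { TOY_UNEXPECTED_FIRST, TOY_EXPECTED_FIRST };
enum toy_source { TOY_NONE, TOY_FROM_UNEXPECTED, TOY_FROM_EXPECTED };

struct toy_queues {
    int unexpected; /* completed unexpected receives waiting to be collected */
    int expected;   /* completed expected operations waiting to be collected */
};

/* Take one completion from a counter if any is available. */
static int toy_take(int *count)
{
    if (*count > 0) {
        (*count)--;
        return 1;
    }
    return 0;
}

/* The consumer, not the transport, decides the service order. */
static enum toy_source toy_next(struct toy_queues *q, enum toy_policy p)
{
    assert(q != NULL);
    if (p == TOY_UNEXPECTED_FIRST) {
        if (toy_take(&q->unexpected)) return TOY_FROM_UNEXPECTED;
        if (toy_take(&q->expected))   return TOY_FROM_EXPECTED;
    } else {
        if (toy_take(&q->expected))   return TOY_FROM_EXPECTED;
        if (toy_take(&q->unexpected)) return TOY_FROM_UNEXPECTED;
    }
    return TOY_NONE;
}
```

In this model a server-like caller would pass TOY_UNEXPECTED_FIRST, while a caller blocked waiting on a reply would pass TOY_EXPECTED_FIRST; the lower layer stays policy-free either way.
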
>>>>>
>>>>> On Jan 6, 2009, at 2:52 PM, Rob Ross wrote:
>>>>>
>>>>>> Yeah, a special named context for unexpected messages would be a
>>>>>> clean way to have done things... -- Rob
>>>>>>
>>>>>> On Jan 6, 2009, at 2:49 PM, Phil Carns wrote:
>>>>>>
>>>>>>> Yeah, I don't particularly like adding special cases either.
>>>>>>>
>>>>>>> I feel like making the consumer play with timeouts or use an
>>>>>>> extra thread would be just as much of a hack/workaround,
>>>>>>> though. It's just moving the problem elsewhere.
>>>>>>>
>>>>>>> Fundamentally it seems more like a BMI API flaw. It would
>>>>>>> have made more sense (for example) if unexpected messages were
>>>>>>> assigned to a specific context and the testunexpected() and
>>>>>>> testcontext() functions were combined. The consumer could
>>>>>>> then use a single test call to retrieve both unexpected and
>>>>>>> normal messages at once if they are in the same context (as in
>>>>>>> the pvfs2-server use case). Testing on a different context
>>>>>>> would ignore the presence of unexpected messages (as in the
>>>>>>> problem-triggering use case here).
>>>>>>>
>>>>>>> There are other ways to deal with it, that's just an example.
>>>>>>> We just need the API to better express the intention of the
>>>>>>> caller (preferably in one function) so that BMI doesn't have
>>>>>>> to optimize by guessing about what else is going on.
>>>>>>>
>>>>>>> That is more work than just adding a flag, though :) It
>>>>>>> probably depends on whether we think the use case is going to be
>>>>>>> around long enough to justify tweaking the API.
>>>>>>>
>>>>>>> -Phil
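
One way to picture the combined design Phil describes, as a rough sketch only: every completion, expected or unexpected, is assigned to a context, and a single test call returns whatever has completed in the context being tested. The names toy_msg, toy_kind and toy_test are made up for illustration and do not correspond to any existing BMI function or structure.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the BMI API. */
enum toy_kind { TOY_EXPECTED, TOY_UNEXPECTED };

struct toy_msg {
    int           context;   /* context this completion was assigned to */
    enum toy_kind kind;      /* expected or unexpected */
    int           collected; /* nonzero once handed to a caller */
};

/*
 * Single test call: collect up to max completions assigned to `context`,
 * regardless of kind. Writes the queue indices of the collected entries
 * to out[] and returns how many were collected.
 */
static size_t toy_test(struct toy_msg *q, size_t n, int context,
                       size_t *out, size_t max)
{
    size_t found = 0;
    assert(q != NULL && out != NULL);
    for (size_t i = 0; i < n && found < max; i++) {
        if (!q[i].collected && q[i].context == context) {
            q[i].collected = 1;
            out[found++] = i;
        }
    }
    return found;
}
```

Under these assumptions, a server that assigns unexpected messages to its main context gets both kinds from one call, while a caller testing some other context never sees (and is never held up by) pending unexpected messages.
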
>>>>>>>
>>>>>>> Sam Lang wrote:
>>>>>>>> I've committed the set_info fix for this. I'm not crazy
>>>>>>>> about it, but it should work for now. In the long term, we
>>>>>>>> should probably move away from method-specific hacks like
>>>>>>>> this. I.e. it should be up to the API consumer (our server)
>>>>>>>> to adjust timeouts or call testunexpected in a separate thread.
>>>>>>>> Nawab, in the zoidfs init code after initializing BMI you
>>>>>>>> need to call:
>>>>>>>>
>>>>>>>>     int check = 0;
>>>>>>>>     BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);
>>>>>>>>
>>>>>>>> -sam
>>>>>>>> On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
>>>>>>>>> Sam Lang wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>> I think Nawab has found a bug (or untested code path) in
>>>>>>>>>> the BMI tcp method. He's running a daemon that both
>>>>>>>>>> receives unexpected requests (as a server), and receives
>>>>>>>>>> expected responses (as a client).
>>>>>>>>>> In the BMI_testcontext call, if there aren't any completed
>>>>>>>>>> (expected) operations, and there are completed unexpected
>>>>>>>>>> receives, we return immediately, assuming that
>>>>>>>>>> BMI_testunexpected will be called in turn. I think the
>>>>>>>>>> idea here is that we want to keep our latency down for
>>>>>>>>>> unexpected messages, instead of doing work on expected
>>>>>>>>>> messages while unexpected messages are waiting in the
>>>>>>>>>> hopper. But the daemon is single-threaded and makes
>>>>>>>>>> blocking PVFS_sys_* calls, so we essentially spin forever
>>>>>>>>>> calling BMI_testcontext over and over.
>>>>>>>>>> I'm not sure of the best way to fix this. Easy fixes would
>>>>>>>>>> be to remove the check for completed unexpected receives,
>>>>>>>>>> and/or do tcp_do_work for a shorter timeout.
>>>>>>>>>> It seems like we have a special case for blocking
>>>>>>>>>> PVFS_sys_* calls. We want to ignore unexpected receives
>>>>>>>>>> just in that case, and actually call tcp_do_work. In other
>>>>>>>>>> contexts, I think we want the behavior that we have now,
>>>>>>>>>> where we assume that a BMI_testunexpected call will follow
>>>>>>>>>> a BMI_testcontext call. We could modify the testcontext
>>>>>>>>>> call to take a separate parameter, but that seems messy.
>>>>>>>>>> We might also be able to handle this with separate BMI
>>>>>>>>>> contexts somehow...
>>>>>>>>>
>>>>>>>>> I haven't dug in the code yet to see if I see any more
>>>>>>>>> elegant way to handle it, but I wanted to mention that if
>>>>>>>>> you want to add a special flag to toggle the behavior, it
>>>>>>>>> might be better to just set it globally with the set_info()
>>>>>>>>> function rather than modifying the testcontext() api. That
>>>>>>>>> way you don't have to change any of the other BMI methods.
>>>>>>>>> There are already a couple of similar set_info() calls to
>>>>>>>>> toggle BMI behavior for different use cases.
>>>>>>>>>
>>>>>>>>> -Phil
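
The spin Sam describes, and the effect of a global toggle like the one Phil suggests, can be modeled with a small self-contained sketch. This is a toy model under stated assumptions, not the bmi tcp code: toy_bmi, toy_do_work, toy_testcontext and toy_blocking_wait are hypothetical names, and the check_unexpected field merely plays the role that BMI_TCP_CHECK_UNEXPECTED plays in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the BMI API. */
struct toy_bmi {
    bool check_unexpected;   /* models the BMI_TCP_CHECK_UNEXPECTED toggle */
    int  unexpected_ready;   /* completed unexpected receives, not yet collected */
    int  expected_in_flight; /* expected operations still waiting on the network */
    int  expected_done;      /* completed expected operations, not yet reported */
};

/* Stand-in for the method's work function: completes one in-flight op per call. */
static void toy_do_work(struct toy_bmi *b)
{
    if (b->expected_in_flight > 0) {
        b->expected_in_flight--;
        b->expected_done++;
    }
}

/* Models the policy in the thread; returns the number of completed expected ops. */
static int toy_testcontext(struct toy_bmi *b)
{
    if (b->expected_done == 0) {
        if (b->check_unexpected && b->unexpected_ready > 0)
            return 0; /* early return: assumes a testunexpected call follows */
        toy_do_work(b);
    }
    int n = b->expected_done;
    b->expected_done = 0;
    return n;
}

/*
 * A blocking caller that only ever polls testcontext. Returns the number of
 * polls it needed, or -1 if nothing completed within max_polls.
 */
static int toy_blocking_wait(struct toy_bmi *b, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 1; i <= max_polls; i++) {
        if (toy_testcontext(b) > 0)
            return i;
    }
    return -1;
}
```

With check_unexpected set and one unexpected message pending, toy_blocking_wait never completes because nobody ever drains the unexpected queue; with the flag cleared, the same wait finishes on the first poll and the unexpected message is simply left for later.
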
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pvfs2-developers mailing list
>>>>>>> Pvfs2-developers at beowulf-underground.org
>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>>>>
>>>>>
>>>>
>>>
>>