[Pvfs2-developers] bmi testcontext/testunexpected
Rob Ross
rross at mcs.anl.gov
Tue Jan 6 20:51:09 EST 2009
Hi Sam,
My take on your email was that you were combining the two issues, so I
wanted to make sure that we were in agreement that the alternative API
was preferred (not that I think we should necessarily do anything
about it at the moment). I'm glad we are in agreement.
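For concreteness, here is a minimal sketch of what the alternative design from the quoted thread could look like: unexpected messages are assigned to a context, and a single test call returns whatever has completed in that context. Everything here (toy_completion, toy_testcontext, the context ids) is hypothetical and invented for illustration; it is not the BMI API or its implementation.

```c
/* Hypothetical toy model, NOT the real BMI API: every completion,
 * expected or unexpected, is tagged with the context it belongs to. */
enum toy_kind { TOY_EXPECTED, TOY_UNEXPECTED };

struct toy_completion {
    int context_id;      /* context the message was assigned to */
    enum toy_kind kind;  /* expected or unexpected */
    int done;            /* nonzero once the caller has retrieved it */
};

/* Single test call: hand back up to max completions for one context,
 * regardless of kind. Completions in other contexts are ignored. */
static int toy_testcontext(struct toy_completion *q, int qlen,
                           int context_id,
                           struct toy_completion **out, int max)
{
    int n = 0;
    for (int i = 0; i < qlen && n < max; i++) {
        if (!q[i].done && q[i].context_id == context_id) {
            q[i].done = 1;
            out[n++] = &q[i];
        }
    }
    return n;
}
```

In this sketch a server-style caller that put unexpected messages in its own context gets both kinds from one call, while a caller testing a different context never sees the unexpected ones. Note that queue order is the only ordering here, so no priority is implied.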
The terms "scheduling" and "priority" are being tossed around here in
a way that I don't think is appropriate. The current testcontext does
neither prioritization nor scheduling, and neither would the proposed
modified API (as described thus far). The current BMI behavior is more
like a bug than anything else, although changing the behavior at this
point would require some significant regression testing.
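To make the behavior concrete, here is a toy model of the early return described in the quoted thread, and of why a caller that only ever polls testcontext never makes progress. This is a sketch under assumptions, not the BMI tcp source: the struct, its fields, and the toy_* functions are all hypothetical, and check_unexpected merely stands in for the BMI_TCP_CHECK_UNEXPECTED toggle.

```c
/* Hypothetical toy model of the behavior described in this thread;
 * it is NOT the BMI tcp source. */
struct toy_method {
    int check_unexpected;   /* stand-in for the BMI_TCP_CHECK_UNEXPECTED toggle */
    int unexpected_pending; /* completed unexpected receives nobody collected */
    int expected_done;      /* completed expected operations ready to return */
    int work_until_done;    /* progress calls needed before the reply lands */
};

/* Stand-in for the method's progress routine (tcp_do_work in the thread). */
static void toy_do_work(struct toy_method *m)
{
    if (m->work_until_done > 0 && --m->work_until_done == 0)
        m->expected_done++;
}

/* Returns the number of expected completions handed to the caller. */
static int toy_testcontext(struct toy_method *m)
{
    /* Early return: assumes a testunexpected call will follow. */
    if (m->expected_done == 0 && m->check_unexpected &&
        m->unexpected_pending > 0)
        return 0;
    if (m->expected_done == 0)
        toy_do_work(m);
    int n = m->expected_done;
    m->expected_done = 0;
    return n;
}

/* A blocking-style wait that only ever calls toy_testcontext. Returns the
 * number of polls used, or -1 if it gives up after max_polls. */
static int toy_blocking_wait(struct toy_method *m, int max_polls)
{
    for (int i = 1; i <= max_polls; i++)
        if (toy_testcontext(m) > 0)
            return i;
    return -1;
}
```

With check_unexpected set and one unexpected receive pending, toy_blocking_wait never reaches toy_do_work and gives up. With the toggle cleared, the same wait completes once enough progress calls have run.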
The I/O forwarding system probably ought to use the non-blocking PVFS
calls so that it can better deal with this scenario anyway, right?
Rob
On Jan 6, 2009, at 5:54 PM, Sam Lang wrote:
>
> On Jan 6, 2009, at 5:03 PM, Rob Ross wrote:
>
>> I think if we had this alternative design and one wanted to have
>> different priorities, one would look for messages under different
>> contexts as you say. But when you don't care about priority, it
>> would be nice to be able to get everything in one call.
>
> I think you're arguing for a single testcontext function, instead of
> the testcontext/testunexpected split. I agree with that, but Phil
> and I are arguing about something else. Where should scheduling
> decisions be made? Within a BMI method, or by the API consumer?
> I'm arguing for the latter. Changing the API to be more consistent
> or user friendly doesn't affect where we choose to set the priority.
>
> -sam
>
>>
>>
>> Rob
>>
>> On Jan 6, 2009, at 4:57 PM, Sam Lang wrote:
>>
>>>
>>> Changing the API as you describe would actually bring back the
>>> original problem. As is, the BMI_tcp_testcontext call knows that
>>> there are unexpected messages waiting, so it returns immediately
>>> (expecting a call to testunexpected to follow). This is a
>>> specific policy hard-coded in the tcp method.
>>>
>>> With just a single testcontext call and all expected and
>>> unexpected messages going to that context, the tcp code would have
>>> to put all the unexpected messages at the top of the context to
>>> give them priority. This would fix the particular problem that
>>> Nawab has, but it's still dictating policy (which messages get
>>> priority) from within the particular BMI method.
>>>
>>> I agree that forcing the application to define the policy (with
>>> threads or timeouts) is moving the problem elsewhere, but it's
>>> moving the problem to where it belongs. It's our pvfs server that
>>> wants unexpected messages to have priority, the bmi code itself
>>> shouldn't dictate that priority. We could define interfaces to
>>> BMI that allow the policy to be set, but that's even further from
>>> where we are now.
>>>
>>> -sam
>>>
>>> On Jan 6, 2009, at 2:52 PM, Rob Ross wrote:
>>>
>>>> Yeah a special named context for unexpected messages would be a
>>>> clean way to have done things... -- Rob
>>>>
>>>> On Jan 6, 2009, at 2:49 PM, Phil Carns wrote:
>>>>
>>>>> Yeah, I don't particularly like adding special cases either.
>>>>>
>>>>> I feel like making the consumer play with timeouts or use an
>>>>> extra thread would be just as much of a hack/workaround,
>>>>> though. It's just moving the problem elsewhere.
>>>>>
>>>>> Fundamentally it seems more like a BMI API flaw. It would have
>>>>> made more sense (for example) if unexpected messages were
>>>>> assigned to a specific context and the testunexpected() and
>>>>> testcontext() functions were combined. The consumer could then
>>>>> use a single test call to retrieve both unexpected and normal
>>>>> messages at once if they are in the same context (as in the
>>>>> pvfs2-server use case). Testing on a different context would
>>>>> ignore the presence of unexpected messages (as in the problem
>>>>> triggering use case here).
>>>>>
>>>>> There are other ways to deal with it, that's just an example.
>>>>> We just need the API to better express the intention of the
>>>>> caller (preferably in one function) so that BMI doesn't have to
>>>>> optimize by guessing about what else is going on.
>>>>>
>>>>> That is more work than just adding a flag, though :) It
>>>>> probably depends on if we think the use case is going to be
>>>>> around long enough to justify tweaking the API.
>>>>>
>>>>> -Phil
>>>>>
>>>>> Sam Lang wrote:
>>>>>> I've committed the set_info fix for this. I'm not crazy about
>>>>>> it, but it should work for now. In the long term, we should
>>>>>> probably move away from method-specific hacks like this. I.e.
>>>>>> it should be up to the API consumer (our server) to adjust
>>>>>> timeouts or call testunexpected in a separate thread.
>>>>>> Nawab, in the zoidfs init code after initializing BMI you need
>>>>>> to call:
>>>>>> int check = 0;
>>>>>> BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);
>>>>>> -sam
>>>>>> On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
>>>>>>> Sam Lang wrote:
>>>>>>>> Hi All,
>>>>>>>> I think Nawab has found a bug (or untested code path) in the
>>>>>>>> BMI tcp method. He's running a daemon that both receives
>>>>>>>> unexpected requests (as a server), and receives expected
>>>>>>>> responses (as a client).
>>>>>>>> In the BMI_testcontext call, if there aren't any completed
>>>>>>>> (expected) operations, and there are completed unexpected
>>>>>>>> receives, we return immediately, assuming that
>>>>>>>> BMI_testunexpected will be called in turn. I think the idea
>>>>>>>> here is that we want to keep our latency down for unexpected
>>>>>>>> messages, instead of doing work on expected messages while
>>>>>>>> unexpected messages are waiting in the hopper. But the
>>>>>>>> daemon is single-threaded, and making blocking PVFS_sys_*
>>>>>>>> calls, so we essentially spin forever calling BMI_testcontext
>>>>>>>> over and over.
>>>>>>>> I'm not sure of the best way to fix this. Easy fixes would
>>>>>>>> be to remove the check for completed unexpected receives, and/
>>>>>>>> or do tcp_do_work for a shorter timeout.
>>>>>>>> It seems like we have a special case for blocking PVFS_sys_*
>>>>>>>> calls. We want to ignore unexpected receives just in that
>>>>>>>> case, and actually call tcp_do_work. In other contexts, I
>>>>>>>> think we want the behavior that we have now, where we assume
>>>>>>>> that a BMI_testunexpected call will follow a BMI_testcontext
>>>>>>>> call. We could modify the testcontext call to take a
>>>>>>>> separate parameter, but that seems messy. We might also be
>>>>>>>> able to handle this with separate BMI contexts somehow...
>>>>>>>
>>>>>>> I haven't dug in the code yet to see if I see any more elegant
>>>>>>> way to handle it, but I wanted to mention that if you want to
>>>>>>> add a special flag to toggle the behavior, it might be better
>>>>>>> to just set it globally with the set_info() function rather
>>>>>>> than modifying the testcontext() api. That way you don't have
>>>>>>> to change any of the other BMI methods. There are already a
>>>>>>> couple of similar set_info() calls to toggle BMI behavior for
>>>>>>> different use cases.
>>>>>>>
>>>>>>> -Phil
>>>>>
>>>>> _______________________________________________
>>>>> Pvfs2-developers mailing list
>>>>> Pvfs2-developers at beowulf-underground.org
>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>>
>>>
>>
>