[Pvfs2-developers] bmi testcontext/testunexpected
Sam Lang
slang at mcs.anl.gov
Tue Jan 6 22:51:11 EST 2009
On Jan 6, 2009, at 9:40 PM, Rob Ross <rross at mcs.anl.gov> wrote:
> the fact that zoidfs is blocking is irrelevant to how the server
> implements servicing the calls. -- rob
>
The server implements servicing the calls by calling zoidfs functions,
which in turn call PVFS.
-sam
> On Jan 6, 2009, at 9:15 PM, Sam Lang wrote:
>
>>
>> On Jan 6, 2009, at 7:51 PM, Rob Ross <rross at mcs.anl.gov> wrote:
>>
>>> Hi Sam,
>>>
>>> My take on your email was that you were combining the two issues,
>>> so I wanted to make sure that we were in agreement that the
>>> alternative API was preferred (not that I think we should
>>> necessarily do anything about it at the moment). I'm glad we are
>>> in agreement.
>>>
>>> The terms "scheduling" and "priority" are being tossed around here
>>> in a way that I don't think is appropriate. The current
>>> testcontext does neither prioritization nor scheduling, and
>>> neither would the proposed modified API (as described thus far).
>>> The current BMI behavior is more like a bug than anything else,
>>> although changing the behavior at this point would require some
>>> significant regression testing.
>>>
>> testcontext is setting a priority; I can only assume it's a desired
>> priority for our servers. A separate thread in our server that
>> called testunexpected and fired off the state machines would be
>> fairly straightforward and prevent any starvation that might occur.
>>
>> In other words, we want the behavior, I just disagree with the
>> notion that the behavior should be set by bmi tcp.
>>
>> The API is an orthogonal issue.
>>
>>
>>> The I/O forwarding system probably ought to use the non-blocking
>>> PVFS calls so that it can better deal with this scenario anyway,
>>> right?
>>>
>> zoidfs is a blocking API.
>> -sam
>>
>>> Rob
>>>
>>> On Jan 6, 2009, at 5:54 PM, Sam Lang wrote:
>>>
>>>>
>>>> On Jan 6, 2009, at 5:03 PM, Rob Ross wrote:
>>>>
>>>>> I think if we had this alternative design and one wanted to have
>>>>> different priorities, one would look for messages under
>>>>> different contexts as you say. But when you don't care about
>>>>> priority, it would be nice to be able to get everything in one
>>>>> call.
>>>>
>>>> I think you're arguing for a single testcontext function, instead
>>>> of the testcontext/testunexpected split. I agree with that, but
>>>> Phil and I are arguing about something else. Where should
>>>> scheduling decisions be made? Within a BMI method, or by the API
>>>> consumer? I'm arguing for the latter. Changing the API to be
>>>> more consistent or user friendly doesn't affect where we choose
>>>> to set the priority.
>>>>
>>>> -sam
>>>>
>>>>>
>>>>>
>>>>> Rob
>>>>>
>>>>> On Jan 6, 2009, at 4:57 PM, Sam Lang wrote:
>>>>>
>>>>>>
>>>>>> Changing the API as you describe would actually bring back the
>>>>>> original problem. As is, the BMI_tcp_testcontext call knows
>>>>>> that there are unexpected messages waiting, so it returns
>>>>>> immediately (expecting a call to testunexpected to follow).
>>>>>> This is a specific policy hard-coded in the tcp method.
>>>>>>
>>>>>> With just a single testcontext call and all expected and
>>>>>> unexpected messages going to that context, the tcp code would
>>>>>> have to put all the unexpected messages at the top of the
>>>>>> context to give them priority. This would fix the particular
>>>>>> problem that Nawab has, but it's still dictating policy (which
>>>>>> messages get priority) from within the particular BMI method.
>>>>>>
>>>>>> I agree that forcing the application to define the policy (with
>>>>>> threads or timeouts) is moving the problem elsewhere, but it's
>>>>>> moving the problem to where it belongs. It's our pvfs server
>>>>>> that wants unexpected messages to have priority, the bmi code
>>>>>> itself shouldn't dictate that priority. We could define
>>>>>> interfaces to BMI that allow the policy to be set, but that's
>>>>>> even further from where we are now.
>>>>>>
>>>>>> -sam
>>>>>>
>>>>>> On Jan 6, 2009, at 2:52 PM, Rob Ross wrote:
>>>>>>
>>>>>>> Yeah a special named context for unexpected messages would be a
>>>>>>> clean way to have done things... -- Rob
>>>>>>>
>>>>>>> On Jan 6, 2009, at 2:49 PM, Phil Carns wrote:
>>>>>>>
>>>>>>>> Yeah, I don't particularly like adding special cases either.
>>>>>>>>
>>>>>>>> I feel like making the consumer play with timeouts or use an
>>>>>>>> extra thread would be just as much of a hack/workaround,
>>>>>>>> though. It's just moving the problem elsewhere.
>>>>>>>>
>>>>>>>> Fundamentally it seems more like a BMI API flaw. It would
>>>>>>>> have made more sense (for example) if unexpected messages
>>>>>>>> were assigned to a specific context and the testunexpected()
>>>>>>>> and testcontext() functions were combined. The consumer
>>>>>>>> could then use a single test call to retrieve both unexpected
>>>>>>>> and normal messages at once if they are in the same context
>>>>>>>> (as in the pvfs2-server use case). Testing on a different
>>>>>>>> context would ignore the presence of unexpected messages (as
>>>>>>>> in the problem triggering use case here).
>>>>>>>>
>>>>>>>> There are other ways to deal with it, that's just an
>>>>>>>> example. We just need the API to better express the
>>>>>>>> intention of the caller (preferably in one function) so that
>>>>>>>> BMI doesn't have to optimize by guessing about what else is
>>>>>>>> going on.
>>>>>>>>
>>>>>>>> That is more work than just adding a flag, though :) It
>>>>>>>> probably depends on if we think the use case is going to be
>>>>>>>> around long enough to justify tweaking the API.
>>>>>>>>
>>>>>>>> -Phil
>>>>>>>>
>>>>>>>> Sam Lang wrote:
>>>>>>>>> I've committed the set_info fix for this. I'm not crazy
>>>>>>>>> about it, but it should work for now. In the long term, we
>>>>>>>>> should probably move away from method specific hacks like
>>>>>>>>> this. I.e. it should be up to the API consumer (our server)
>>>>>>>>> to adjust timeouts or call testunexpected in a separate
>>>>>>>>> thread.
>>>>>>>>> Nawab, in the zoidfs init code after initializing BMI you
>>>>>>>>> need to call:
>>>>>>>>> int check = 0;
>>>>>>>>> BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);
>>>>>>>>> -sam
>>>>>>>>> On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
>>>>>>>>>> Sam Lang wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>> I think Nawab has found a bug (or untested code path) in
>>>>>>>>>>> the BMI tcp method. He's running a daemon that both
>>>>>>>>>>> receives unexpected requests (as a server), and receives
>>>>>>>>>>> expected responses (as a client).
>>>>>>>>>>> In the BMI_testcontext call, if there aren't any completed
>>>>>>>>>>> (expected) operations, and there are completed unexpected
>>>>>>>>>>> receives, we return immediately, assuming that
>>>>>>>>>>> BMI_testunexpected will be called in turn. I think the
>>>>>>>>>>> idea here is that we want to keep our latency down for
>>>>>>>>>>> unexpected messages, instead of doing work on expected
>>>>>>>>>>> messages while unexpected messages are waiting in the
>>>>>>>>>>> hopper. But the daemon is single threaded, and making
>>>>>>>>>>> blocking PVFS_sys_* calls, so we essentially spin forever
>>>>>>>>>>> calling BMI_testcontext over and over.
>>>>>>>>>>> I'm not sure of the best way to fix this. Easy fixes
>>>>>>>>>>> would be to remove the check for completed unexpected
>>>>>>>>>>> receives, and/or do tcp_do_work for a shorter timeout.
>>>>>>>>>>> It seems like we have a special case for blocking
>>>>>>>>>>> PVFS_sys_* calls. We want to ignore unexpected receives
>>>>>>>>>>> just in that case, and actually call tcp_do_work. In
>>>>>>>>>>> other contexts, I think we want the behavior that we have
>>>>>>>>>>> now, where we assume that a BMI_testunexpected call will
>>>>>>>>>>> follow a BMI_testcontext call. We could modify the
>>>>>>>>>>> testcontext call to take a separate parameter, but that
>>>>>>>>>>> seems messy. We might also be able to handle this with
>>>>>>>>>>> separate BMI contexts somehow...
>>>>>>>>>>
>>>>>>>>>> I haven't dug in the code yet to see if I see any more
>>>>>>>>>> elegant way to handle it, but I wanted to mention that if
>>>>>>>>>> you want to add a special flag to toggle the behavior, it
>>>>>>>>>> might be better to just set it globally with the set_info()
>>>>>>>>>> function rather than modifying the testcontext() api. That
>>>>>>>>>> way you don't have to change any of the other BMI methods.
>>>>>>>>>> There are already a couple of similar set_info() calls to
>>>>>>>>>> toggle BMI behavior for different use cases.
>>>>>>>>>>
>>>>>>>>>> -Phil
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Pvfs2-developers mailing list
>>>>>>>> Pvfs2-developers at beowulf-underground.org
>>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
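
The spin described at the bottom of the thread, and the effect of a
toggle like the one set through set_info, can be illustrated with a
small mock in C. This is only a sketch under assumptions, not the BMI
tcp code: struct mock_tcp, mock_do_work, mock_testcontext and
mock_blocking_wait are hypothetical names that do not mirror the real
BMI signatures, and the mock completes one expected operation per pass
purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the tcp method's state (not the real BMI types). */
struct mock_tcp {
    int pending_unexpected;  /* completed unexpected receives, not yet collected */
    int pending_expected;    /* expected operations still in flight */
    int completed_expected;  /* expected operations ready to hand back */
    bool check_unexpected;   /* models the toggle discussed in the thread */
};

/* Models one pass of network work: completes one in-flight expected op. */
static void mock_do_work(struct mock_tcp *m)
{
    if (m->pending_expected > 0) {
        m->pending_expected--;
        m->completed_expected++;
    }
}

/* Models testcontext: returns how many expected completions were handed back. */
static int mock_testcontext(struct mock_tcp *m)
{
    if (m->completed_expected == 0) {
        if (m->check_unexpected && m->pending_unexpected > 0) {
            /* Early return, assuming a testunexpected call will follow. */
            return 0;
        }
        mock_do_work(m);
    }
    int n = m->completed_expected;
    m->completed_expected = 0;
    return n;
}

/* Models a blocking call in a single-threaded daemon: it polls testcontext
 * only and never collects unexpected messages. Returns the number of polls
 * used, or -1 if no expected completion arrived within max_polls. */
static int mock_blocking_wait(struct mock_tcp *m, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 1; i <= max_polls; i++) {
        if (mock_testcontext(m) > 0) {
            return i;
        }
    }
    return -1;
}
```

With check_unexpected set and one unexpected message queued,
mock_blocking_wait never makes progress, which is the spin reported
above. With the check cleared, the same wait completes on the first
poll and the unexpected message stays queued for whoever calls
testunexpected later.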