[Pvfs2-developers] bmi questions

Phil Carns pcarns at wastedcycles.org
Fri Aug 18 10:09:40 EDT 2006


> I have some questions related to the design semantics of BMI.
> 
> * timeouts.  It looks like the timeout for bmi test calls is the max
> amount of time spent _idling_ in the test call (as opposed to the max
> time spent in the test call).

This is correct.  The name of the argument is max_idle_time_ms.  The 
main reason it is there is to keep BMI from busy spinning while it 
polls for completion.  The more traditional timeout semantics (where you 
wait up to N seconds for something specific to finish before giving up, 
whether busy or not) are implemented at the job level.  When the job 
level doesn't want BMI to block, it sets max_idle_time_ms to 0, but when 
it doesn't really have much else to do it sets it to a few 
milliseconds.  This is enough to prevent high CPU usage, but still low 
enough for us to pop out and do other occasional bookkeeping at the job 
level.
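
To make the split between the two kinds of timeout concrete, here is a 
rough sketch.  It is not the actual job code: every name in it 
(sketch_job_wait, sketch_test_fn, sketch_fake_test) is made up for 
illustration, and the clock is simulated so the example is 
self-contained.  The outer loop owns the overall deadline, while each 
test call is only allowed to idle for a short max_idle_time_ms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test hook: returns the number of completed operations and
 * advances *now_ms by however long the call took (simulated clock). */
typedef int (*sketch_test_fn)(void *state, int max_idle_time_ms, long *now_ms);

/* Job-level wait: gives up after timeout_ms overall, but never lets a
 * single test call idle for more than idle_ms. */
int sketch_job_wait(sketch_test_fn test, void *state,
                    long timeout_ms, int idle_ms, long *now_ms)
{
    assert(test != NULL && now_ms != NULL && idle_ms >= 0);
    long deadline = *now_ms + timeout_ms;

    while (*now_ms < deadline) {
        int completed = test(state, idle_ms, now_ms);
        if (completed != 0)
            return completed;
        /* occasional job-level bookkeeping would go here */
    }
    return 0; /* overall timeout expired with nothing completed */
}

/* Fake test call for the example: *state holds how many empty polls
 * happen before something completes.  An empty poll idles for the full
 * max_idle_time_ms (or 1 ms of overhead when told not to idle). */
int sketch_fake_test(void *state, int max_idle_time_ms, long *now_ms)
{
    int *empty_polls_left = state;

    if (*empty_polls_left > 0) {
        (*empty_polls_left)--;
        *now_ms += (max_idle_time_ms > 0) ? max_idle_time_ms : 1;
        return 0;
    }
    return 1;
}
```

With idle_ms set to 5 and a 100 ms overall timeout, an operation that 
shows up after three empty polls is returned at the 15 ms mark, long 
before the deadline.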

> In other words, if operations are being completed continuously, then
> the timeout is never triggered, and the call can block for much longer
> than the actual timeout specified.

I don't think this is true in practice, because we never loop (within 
bmi) over a function that can idle.  The bmi_tcp and bmi_gm methods take 
this approach to implementing the max idle time:

- check completion queue: if anything is found, return immediately
- call a generic progress function that may idle for as long as 
max_idle_time_ms but will exit as soon as it gets any work done (the 
work may or may not be related to what the caller tested for)
- check completion queue again: if anything is found, return immediately
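
The three steps above can be sketched roughly as follows.  This is an 
illustration only, not the bmi_tcp or bmi_gm source: the struct, the 
function names, and the fake progress function (which records how long 
it would have idled instead of actually sleeping) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QLEN 16

/* Hypothetical per-method state: a completion queue plus some work that
 * is still in flight. */
struct sketch_method {
    int completed[SKETCH_QLEN]; int n_completed;
    int pending[SKETCH_QLEN];   int n_pending;
    int progress_calls;         /* how often we had to make progress */
    int idled_ms;               /* time we would have spent idle */
};

/* Take the oldest completed id off the queue, if there is one. */
static int sketch_pop_completed(struct sketch_method *m, int *out_id)
{
    if (m->n_completed == 0)
        return 0;
    *out_id = m->completed[0];
    for (int i = 1; i < m->n_completed; i++)
        m->completed[i - 1] = m->completed[i];
    m->n_completed--;
    return 1;
}

/* Stand-in for the generic progress function: finishes one pending
 * operation if there is any, otherwise "idles" for max_idle_time_ms. */
static void sketch_progress(struct sketch_method *m, int max_idle_time_ms)
{
    m->progress_calls++;
    if (m->n_pending > 0) {
        assert(m->n_completed < SKETCH_QLEN);
        m->completed[m->n_completed++] = m->pending[--m->n_pending];
    } else {
        m->idled_ms += max_idle_time_ms;
    }
}

/* Check, make progress once, check again.  There is no loop, so the call
 * idles at most once no matter how busy the method is. */
int sketch_test(struct sketch_method *m, int max_idle_time_ms, int *out_id)
{
    assert(m != NULL && out_id != NULL);
    if (sketch_pop_completed(m, out_id))
        return 1;
    sketch_progress(m, max_idle_time_ms);
    return sketch_pop_completed(m, out_id);
}
```

Because the progress function is called at most once per test call, the 
idle time per call is bounded by max_idle_time_ms even when operations 
keep completing.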

So the only way that this function can block much longer than 
max_idle_time_ms is if checking the completion queue takes a long time.  
Completion checking is typically very fast though; testsome() and 
test() map ids directly to operations so there is no data structure 
searching, while testcontext() just takes the first N available items 
from the completion queue.
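
As an illustration of why an id-based check is cheap, here is a minimal 
sketch in which the id is simply an index into a table of operations.  
The table, the names, and the id scheme are all assumptions made for 
the example; they are not how BMI actually generates ids.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_OPS 64

/* Hypothetical operation record. */
struct sketch_op {
    int in_use;
    int complete;
    int error_code;
};

static struct sketch_op sketch_ops[SKETCH_MAX_OPS];

/* Post an operation and return its id (here, just the table index).
 * Posting scans for a free slot; testing below does not scan at all. */
int sketch_post(void)
{
    for (int i = 0; i < SKETCH_MAX_OPS; i++) {
        if (!sketch_ops[i].in_use) {
            sketch_ops[i].in_use = 1;
            sketch_ops[i].complete = 0;
            sketch_ops[i].error_code = 0;
            return i;
        }
    }
    return -1; /* table full */
}

/* Called by the method when the operation finishes. */
void sketch_mark_complete(int id, int error_code)
{
    assert(id >= 0 && id < SKETCH_MAX_OPS && sketch_ops[id].in_use);
    sketch_ops[id].complete = 1;
    sketch_ops[id].error_code = error_code;
}

/* Constant-time check: the id leads straight to the operation.
 * Returns 1 and releases the slot if the operation has completed. */
int sketch_test_id(int id, int *error_code)
{
    assert(id >= 0 && id < SKETCH_MAX_OPS && error_code != NULL);
    struct sketch_op *op = &sketch_ops[id];

    if (!op->in_use || !op->complete)
        return 0;
    *error_code = op->error_code;
    op->in_use = 0;
    return 1;
}
```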

> Is this the desired behavior?  The concern would be that the bmi
> operations would be completed at a constant rate, causing a bursty
> behavior of completed bmi jobs.

I don't think it is particularly bursty, but the test functions will 
always return as many completed operations as they can from the 
completion queue when they check, on the theory that the caller can do a 
better job of figuring out what to do with them.  There isn't much 
reason for the BMI layer to throttle completed operations.
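
A rough sketch of that "hand back everything available" behavior, with 
the caller's incount as the only cap.  The queue layout and the name 
sketch_testcontext are assumptions for illustration, not the real 
testcontext() implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical completion queue holding ids of finished operations. */
struct sketch_cq {
    int ids[32];
    int count;
};

/* Copy out the first min(incount, count) completed ids in queue order
 * and remove them from the queue.  Returns how many were copied. */
int sketch_testcontext(struct sketch_cq *cq, int incount, int *out_ids)
{
    assert(cq != NULL && out_ids != NULL && incount >= 0);
    int n = (cq->count < incount) ? cq->count : incount;

    memcpy(out_ids, cq->ids, (size_t)n * sizeof(int));
    /* shift whatever is left to the front of the queue */
    memmove(cq->ids, cq->ids + n, (size_t)(cq->count - n) * sizeof(int));
    cq->count -= n;
    return n;
}
```

There is no throttling here: if five operations are done and the caller 
asks for up to eight, all five come back in one call.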

> The incount constrains this, but for both bmi api users and bmi
> method implementors we should probably document all those semantics.

This stuff could definitely stand to have much better documentation.

-Phil

