[Pvfs2-developers] bmi testcontext/testunexpected

Sam Lang slang at mcs.anl.gov
Mon Dec 22 16:06:09 EST 2008


Hi All,

I think Nawab has found a bug (or untested code path) in the BMI tcp  
method.  He's running a daemon that both receives unexpected requests  
(as a server), and receives expected responses (as a client).

In the BMI_testcontext call, if there aren't any completed (expected)
operations but there are completed unexpected receives, we return
immediately, assuming that BMI_testunexpected will be called next.
I think the idea here is to keep our latency down for unexpected
messages, instead of doing work on expected messages while unexpected
messages are waiting in the hopper.  But the daemon is single-threaded
and makes blocking PVFS_sys_* calls, so nothing calls
BMI_testunexpected while it waits for a response.  The unexpected
receive is never drained, tcp_do_work never runs, and we essentially
spin forever calling BMI_testcontext over and over.
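
To make the failure mode concrete, here is a rough toy model of the
control flow as I understand it.  This is not the BMI tcp code: the
struct, every toy_* name, and the check_unexpected flag are made up for
illustration, and toy_do_work only stands in for tcp_do_work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; not the real BMI tcp data structures. */
struct toy_method {
    int completed_expected;   /* expected ops ready to hand back */
    int completed_unexpected; /* unexpected receives awaiting testunexpected */
    int pending_on_wire;      /* expected responses that network work would finish */
};

/* Stand-in for tcp_do_work: completes one pending expected op, if any. */
void toy_do_work(struct toy_method *m)
{
    if (m->pending_on_wire > 0) {
        m->pending_on_wire--;
        m->completed_expected++;
    }
}

/* Toy testcontext: returns the number of completed expected ops.
 * check_unexpected is a hypothetical switch that turns the early
 * return on or off so the two behaviors can be compared. */
int toy_testcontext(struct toy_method *m, bool check_unexpected)
{
    if (m->completed_expected == 0) {
        if (check_unexpected && m->completed_unexpected > 0) {
            /* Early return: assume the caller drains unexpected next. */
            return 0;
        }
        toy_do_work(m);
    }
    int n = m->completed_expected;
    m->completed_expected = 0;
    return n;
}

/* Toy blocking caller: polls only testcontext, never testunexpected.
 * Returns the number of polls used, or -1 if nothing completed. */
int toy_blocking_wait(struct toy_method *m, bool check_unexpected,
                      int max_polls)
{
    assert(max_polls > 0);
    for (int i = 1; i <= max_polls; i++) {
        if (toy_testcontext(m, check_unexpected) > 0) {
            return i;
        }
    }
    return -1;
}
```

With one queued unexpected receive and one expected response still on
the wire, toy_blocking_wait with the check enabled uses up its whole
poll budget without completing anything.  With the check disabled, the
same call completes on the first poll and leaves the unexpected receive
queued.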

I'm not sure of the best way to fix this.  Easy fixes would be to
remove the check for completed unexpected receives, and/or call
tcp_do_work with a shorter timeout.

It seems like we have a special case for blocking PVFS_sys_* calls.   
We want to ignore unexpected receives just in that case, and actually  
call tcp_do_work.  In other contexts, I think we want the behavior  
that we have now, where we assume that a BMI_testunexpected call will  
follow a BMI_testcontext call.  We could modify the testcontext call  
to take a separate parameter, but that seems messy.  We might also be  
able to handle this with separate BMI contexts somehow...

-sam

