[PVFS2-developers] Trusted ports/network patch

Sam Lang slang at mcs.anl.gov
Fri Oct 21 12:15:42 EDT 2005


Rob Ross wrote:
> Sam Lang wrote:
> 
>> Murali Vilayannur wrote:
> 
> 
>>> Sam and I discussed this a bit. I guess the thing that we are trying to
>>> answer here is what sort of errors should cause the bmi_thread_function
>>> thread to die and what should cause it to continue as much as possible?
>>
>>
>> It looks like the bmi thread function was implemented to allow
>> failures from unexpected messages to be recoverable (with
>> BMI_testunexpected), while failures from other bmi operations
>> (flows?) are supposed to cause the thread function to exit
>> (BMI_testcontext).  Except it doesn't sound like this is actually
>> how it works, since Murali's test was with pvfs2-ls, and the
>> server's bmi thread exited anyway.
> 
>>
> 
>>> Currently the way it looks if accept() (possibly other paths as well,
>>> but I haven't looked carefully) fails for some reason,
>>> pvfs2-server is going to sit unresponsive because the bmi thread just
>>> died!
>>> What is the right thing to do in this case?
>>
>>
>> The server should shutdown, right?  What good is a server that can't
>> do I/O?
> 
> 
> I'm not sure that it should shutdown.  It's possible that the server
> might have other operations that need to complete (and thus immediate
> shutdown would be inappropriate), it's possible that the network is
> having a temporary problem (in which case retrying would be in order), etc.
> 
> We should definitely think about this one more.
> 

Yeah, it makes sense that trove operations could still need to
complete, so immediate shutdown would be bad in that case.  If the bmi
thread has already exited, though, it won't matter whether the network
problem is temporary, since nothing is left to retry.  It's not
entirely clear to me in which cases you would want bmi failures to
stop the bmi thread, but for the ones that do, it seems like you have
to at least schedule a shutdown for when all jobs have completed.
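
To make the "schedule shutdown once all jobs have completed" idea
concrete, here is a rough sketch.  None of this is actual PVFS2 code:
the struct, the function names, and the choice of which errno values
count as transient are all assumptions made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of inspecting a network-layer error. */
enum err_action { ERR_RETRY, ERR_FATAL };

/* Hypothetical server bookkeeping; not the real pvfs2-server state. */
struct server_state {
    int  pending_jobs;        /* jobs (e.g. storage ops) still in flight */
    bool net_thread_alive;    /* is the network thread still polling?    */
    bool shutdown_scheduled;  /* fatal error seen, waiting for jobs      */
    bool shutdown_done;       /* all jobs drained, server may exit       */
};

/* Assumption: these errno values are treated as transient. */
static enum err_action classify_error(int err)
{
    switch (err) {
    case EINTR:
    case EAGAIN:
    case ECONNABORTED:
        return ERR_RETRY;
    default:
        return ERR_FATAL;
    }
}

/* Finish the shutdown only once it is scheduled and nothing is pending. */
static void maybe_finish_shutdown(struct server_state *s)
{
    if (s->shutdown_scheduled && s->pending_jobs == 0)
        s->shutdown_done = true;
}

/* Called by the network thread when an operation fails. */
static void on_network_error(struct server_state *s, int err)
{
    if (classify_error(err) == ERR_RETRY)
        return;                      /* keep the thread running */

    s->net_thread_alive = false;     /* thread is about to exit */
    s->shutdown_scheduled = true;    /* defer exit until jobs drain */
    maybe_finish_shutdown(s);
}

/* Called whenever an outstanding job completes. */
static void on_job_complete(struct server_state *s)
{
    assert(s->pending_jobs > 0);
    s->pending_jobs--;
    maybe_finish_shutdown(s);
}
```

The point is only that a fatal error marks the server for shutdown
rather than exiting on the spot, and the last completing job is what
actually triggers the exit.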

-sam

> Rob
> 


