[Pvfs2-developers] encoding negative responses

Sam Lang slang at mcs.anl.gov
Thu Oct 19 18:57:02 EDT 2006


It doesn't actually exist right now, but the counterexample I can
come up with has to do with file open, where the O_CREAT flag is set
but O_EXCL is not.  If a parallel job tries to create a file (calls
PVFS_sys_create) and it already exists, it gets back an EEXIST
error, and then has to do a lookup to get the actual PVFS_handle of
that file.  This means that if a bunch of jobs all do a create of the
same file (stupid, sure, but possible), all the ones that got EEXIST
back have to do a lookup.  We could potentially optimize this by
sticking the handle into the error response.

Maybe not worth it for this case, but it shows that some error  
responses might be treated as success (or at least non-fatal) by the  
caller.
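
A rough sketch of the idea in C, under stated assumptions: every toy_
name below is hypothetical and invented for illustration, and none of
this is the real PVFS2 response struct, encoder, or wire format.  It
only shows the rule being discussed: the payload is encoded on success
and on one whitelisted error (the file already exists), and the client
treats that error as non-fatal because the handle comes back with it.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the real PVFS2 types. */
typedef uint64_t toy_handle;

struct toy_create_resp {
    int32_t    status;      /* 0 on success, negative errno-style code on error */
    int        has_handle;  /* set by the encoder when the handle field is valid */
    toy_handle handle;
};

/* Encoder-side rule (an assumption of this sketch): the payload is
 * trustworthy on success and on the one error we choose to whitelist. */
static int toy_status_carries_payload(int32_t status)
{
    return status == 0 || status == -EEXIST;
}

/* Toy "server": a single slot that remembers whether the file exists. */
static int        toy_file_exists;
static toy_handle toy_file_handle;

static struct toy_create_resp toy_server_create(void)
{
    struct toy_create_resp resp = { 0, 0, 0 };

    if (toy_file_exists) {
        resp.status = -EEXIST;
    } else {
        toy_file_exists = 1;
        toy_file_handle = 42;   /* arbitrary handle value for the demo */
    }
    /* Only fill in the handle when the status says it can be trusted. */
    if (toy_status_carries_payload(resp.status)) {
        resp.has_handle = 1;
        resp.handle = toy_file_handle;
    }
    return resp;
}

/* Client side of open with O_CREAT but not O_EXCL: -EEXIST is not fatal,
 * and since the handle rides along in the response no lookup is needed.
 * Returns 0 and fills *out on success, or a negative error code. */
static int toy_client_open_creat(toy_handle *out)
{
    struct toy_create_resp resp = toy_server_create();

    if (!toy_status_carries_payload(resp.status) || !resp.has_handle)
        return resp.status ? resp.status : -EIO;
    *out = resp.handle;
    return 0;
}
```

Any other negative status (an I/O error, say) still means the rest of
the response is not looked at, which matches the stricter rule proposed
further down the thread.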

-sam

On Oct 19, 2006, at 2:26 PM, Rob Ross wrote:

> Sounds like formalizing this would be a good idea. Is there even  
> one counter-example that we can think of or that exists today?
>
> Rob
>
> Pete Wyckoff wrote:
>> pcarns at wastedcycles.org wrote on Thu, 19 Oct 2006 16:34 +0200:
>>> We've plugged some similar bugs before by just fixing the  
>>> specific case (being more careful about cleaning up masks,  
>>> pointers, sizes, etc. after an error), and the same could easily  
>>> be done here.  Should there be a more general solution, though?
>>>
>>> What I am wondering is if the response encoder should even bother  
>>> to encode the whole message if the status is non-zero.  It seems  
>>> like if there is an error code it should just stick to encoding  
>>> the basic PVFS_server_resp struct.  The rest of the fields can't  
>>> really be trusted  since we hit an error condition.  Likewise on  
>>> the decode side of things.
>>>
>>> The lebf_encode_resp() function already sort of catches one case  
>>> of this:
>>>
>>>     /* we stand a good chance of segfaulting if we try to encode the response
>>>      * after something bad happened reading data from disk. */
>>>     if (resp->status != -PVFS_EIO)
>>>     {
>>>
>>> ... but that only handles one specific error code.
>> I'm sort of in agreement with your analysis.  We had a similar issue
>> on the decoding side on the client:  an EIO came back from the
>> server but the client proceeded to try to decode particular fields.
>> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2006-June/002202.html
>> Maybe both of the lebf_{en,de}code_resp functions should encode or
>> decode the rest of the message only if status == 0.  The comparison
>> for EIO alone is not sufficient.
>> Are there any reasons the client should be looking at more fields
>> once it sees a negative status?  Maybe we should just formalize
>> this.  And fixup lebf_decode_rel so it only frees things if status
>> was zero.
>> 		-- Pete
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers


