[Pvfs2-developers] encoding negative responses
Sam Lang
slang at mcs.anl.gov
Thu Oct 19 18:57:02 EDT 2006
It doesn't actually exist right now, but the counterexample I can
come up with has to do with file open, where the O_CREAT flag is set
but O_EXCL is not. If a parallel job tries to create a file (calling
PVFS_sys_create) and it already exists, it gets back an EEXIST
error and then has to do a lookup to get the actual PVFS_handle of
that file. This means that if a bunch of jobs all do a create of the
same file (stupid, sure, but possible), all the ones that got EEXIST
back have to do a lookup. We could potentially optimize this by
sticking the handle into the error response.
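
A rough sketch of what that could look like from the caller's side.
Everything named demo_* here is hypothetical and stands in for the real
PVFS2 structures and calls; the only assumption that matters is that the
server fills in the handle when it returns -EEXIST.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types; not the real PVFS2 structures. */
typedef uint64_t demo_handle;

struct demo_create_resp {
    int status;          /* 0 on success, negative errno on failure */
    demo_handle handle;  /* valid on success and (proposed) on -EEXIST */
};

/* Stand-in for the server: pretend the file already exists with handle 42. */
static struct demo_create_resp demo_server_create(const char *name)
{
    struct demo_create_resp r = { -EEXIST, 42 };
    (void)name;
    return r;
}

/* O_CREAT without O_EXCL: an existing file is not an error for the caller. */
static int demo_open_or_create(const char *name, demo_handle *out)
{
    struct demo_create_resp r;

    assert(out != NULL);
    r = demo_server_create(name);
    if (r.status == 0 || r.status == -EEXIST) {
        *out = r.handle;   /* handle came back with the response, no lookup */
        return 0;
    }
    return r.status;       /* any other error is still fatal */
}
```

With this, the losers of the create race skip the extra lookup round trip.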
Maybe not worth it for this case, but it shows that some error
responses might be treated as success (or at least non-fatal) by the
caller.
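
If we did formalize it, one option would be a small per-status check
shared by the encoder and decoder, rather than a bare status == 0 test.
A sketch only; demo_body_is_valid and the whitelist are made-up names,
and which codes belong in the list is exactly the open question.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical whitelist: error codes whose response body is still
 * meaningful to the caller. */
static const int demo_nonfatal_status[] = { -EEXIST };

/* Returns 1 if the operation-specific part of the response should be
 * encoded or decoded, 0 if only the basic response header is trusted. */
static int demo_body_is_valid(int status)
{
    size_t i;
    size_t n = sizeof(demo_nonfatal_status) / sizeof(demo_nonfatal_status[0]);

    if (status == 0)
        return 1;
    for (i = 0; i < n; i++) {
        if (status == demo_nonfatal_status[i])
            return 1;
    }
    return 0;
}
```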
-sam
On Oct 19, 2006, at 2:26 PM, Rob Ross wrote:
> Sounds like formalizing this would be a good idea. Is there even
> one counter-example that we can think of or that exists today?
>
> Rob
>
> Pete Wyckoff wrote:
>> pcarns at wastedcycles.org wrote on Thu, 19 Oct 2006 16:34 +0200:
>>> We've plugged some similar bugs before by just fixing the
>>> specific case (being more careful about cleaning up masks,
>>> pointers, sizes, etc. after an error), and the same could easily
>>> be done here. Should there be a more general solution, though?
>>>
>>> What I am wondering is if the response encoder should even bother
>>> to encode the whole message if the status is non-zero. It seems
>>> like if there is an error code it should just stick to encoding
>>> the basic PVFS_server_resp struct. The rest of the fields can't
>>> really be trusted since we hit an error condition. Likewise on
>>> the decode side of things.
>>>
>>> The lebf_encode_resp() function already sort of catches one case
>>> of this:
>>>
>>> /* we stand a good chance of segfaulting if we try to encode
>>>  * the response after something bad happened reading data
>>>  * from disk. */
>>> if (resp->status != -PVFS_EIO)
>>> {
>>>
>>> ... but that only handles one specific error code.
>> I'm sort of in agreement with your analysis. We had a similar issue
>> on the decoding side on the client: an EIO came back from the
>> server but the client proceeded to try to decode particular fields.
>> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2006-June/002202.html
>> Maybe both lebf_{en,de}code_resp should encode or decode the
>> rest of the message only if status == 0. The comparison for EIO
>> alone is not sufficient.
>> Are there any reasons the client should be looking at more fields
>> once it sees a negative status? Maybe we should just formalize
>> this, and fix up lebf_decode_rel so it only frees things if status
>> was zero.
>> -- Pete
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers