[Pvfs2-developers] encoding negative responses
Phil Carns
pcarns at wastedcycles.org
Fri Nov 10 11:34:13 EST 2006
I have done some more work on this, but unfortunately I don't have a
patch that I can share yet (it might be faster for you guys to just make
the change if you want it; I can point out one little gotcha that I
found in sys-rename.sm but the rest is simple).
As far as I can tell after a good bit of testing with the modification
in place, there are no _current_ operations that rely on decoding fields
from negative responses.
Sam brought up a hypothetical scenario where you might want this ability
(an O_EXCL create returns a handle on failure), but I would vote for
making that a special case if it happens rather than having all negative
responses get fully encoded by default. We can always add a check to
let things like that drop through. Alternatively for that kind of
scenario we could send a postitive response, and just have a flag in the
response structure that indicates what really happened (ie, I didn't
actually do the creation, but here is your handle anyway) rather than
using a negative error code for that purpose.
-Phil
Rob Ross wrote:
> Did we reach any sort of consensus on this idea?
>
> Rob
>
> Phil Carns wrote:
>
>> We've run into a couple of scenarios lately where a broken metadata
>> file can cause the server to crash. The latest one looks like this:
>>
>> ------------------
>> - a metadata file exists, but the db entry for its datafile handles
>> (the "dh" key is missing for some reason
>>
>> - a client requests a getattr on the file
>>
>> - the server reads the basic attributes successfully, indicating that
>> there is a dfile array of size x and a distribution of size y
>>
>> - the server tries to read the dfile array but fails
>>
>> - at this point the dfile array pointer in the attributes structure is
>> non-zero but has garbage in it
>>
>> - at this point the distribution pointer in the attributes structure
>> is NULL
>>
>> - the server state machine follows a failure path to send a response
>> with an error status
>>
>> - the encoder tries to encode the response, but segfaults because it
>> thinks the distribution exists (the mask is set and the size is set to
>> y) when really it is a NULL pointer
>> ------------------
>>
>> We've plugged some similar bugs before by just fixing the specific
>> case (being more careful about cleaning up masks, pointers, sizes,
>> etc. after an error), and the same could easily be done here. Should
>> there be a more general solution, though?
>>
>> What I am wondering is if the response encoder should even bother to
>> encode the whole message if the status is non-zero. It seems like if
>> there is an error code it should just stick to encoding the basic
>> PVFS_server_resp struct. The rest of the fields can't really be
>> trusted since we hit an error condition. Likewise on the decode side
>> of things.
>>
>> The lebf_encode_resp() function already sort of catches one case of this:
>>
>> /* we stand a good chance of segfaulting if we try to encode the
>> response
>> * after something bad happened reading data from disk. */
>> if (resp->status != -PVFS_EIO)
>> {
>>
>> ... but that only handles one specific error code.
>>
>> I don't think it would be a real big change to do this in a more
>> general manner. The biggest problem is that it throws off the size
>> checking logic that checks buffer sizes after encoding, but that
>> shouldn't be hard to adjust.
>>
>> Any thoughts? Is this the right approach, or should we continue to
>> encode/decode all of the request-specific regardless of error?
>>
>> -Phil
>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
More information about the Pvfs2-developers
mailing list