[Pvfs2-developers] encoding negative responses

Phil Carns pcarns at wastedcycles.org
Fri Nov 10 11:34:13 EST 2006


I have done some more work on this, but unfortunately I don't have a 
patch that I can share yet (it might be faster for you guys to just make 
the change if you want it; I can point out one little gotcha that I 
found in sys-rename.sm but the rest is simple).

As far as I can tell after a good bit of testing with the modification 
in place, there are no _current_ operations that rely on decoding fields 
from negative responses.

Sam brought up a hypothetical scenario where you might want this ability 
(an O_EXCL create returns a handle on failure), but I would vote for 
making that a special case if it happens rather than having all negative 
responses get fully encoded by default.  We can always add a check to 
let things like that drop through.  Alternatively for that kind of 
scenario we could send a postitive response, and just have a flag in the 
response structure that indicates what really happened (ie, I didn't 
actually do the creation, but here is your handle anyway) rather than 
using a negative error code for that purpose.

-Phil

Rob Ross wrote:
> Did we reach any sort of consensus on this idea?
> 
> Rob
> 
> Phil Carns wrote:
> 
>> We've run into a couple of scenarios lately where a broken metadata 
>> file can cause the server to crash.  The latest one looks like this:
>>
>> ------------------
>> - a metadata file exists, but the db entry for its datafile handles 
>> (the "dh" key is missing for some reason
>>
>> - a client requests a getattr on the file
>>
>> - the server reads the basic attributes successfully, indicating that 
>> there is a dfile array of size x and a distribution of size y
>>
>> - the server tries to read the dfile array but fails
>>
>> - at this point the dfile array pointer in the attributes structure is 
>> non-zero but has garbage in it
>>
>> - at this point the distribution pointer in the attributes structure 
>> is NULL
>>
>> - the server state machine follows a failure path to send a response 
>> with an error status
>>
>> - the encoder tries to encode the response, but segfaults because it 
>> thinks the distribution exists (the mask is set and the size is set to 
>> y) when really it is a NULL pointer
>> ------------------
>>
>> We've plugged some similar bugs before by just fixing the specific 
>> case (being more careful about cleaning up masks, pointers, sizes, 
>> etc. after an error), and the same could easily be done here.  Should 
>> there be a more general solution, though?
>>
>> What I am wondering is if the response encoder should even bother to 
>> encode the whole message if the status is non-zero.  It seems like if 
>> there is an error code it should just stick to encoding the basic 
>> PVFS_server_resp struct.  The rest of the fields can't really be 
>> trusted  since we hit an error condition.  Likewise on the decode side 
>> of things.
>>
>> The lebf_encode_resp() function already sort of catches one case of this:
>>
>>     /* we stand a good chance of segfaulting if we try to encode the 
>> response
>>      * after something bad happened reading data from disk. */
>>     if (resp->status != -PVFS_EIO)
>>     {
>>
>> ... but that only handles one specific error code.
>>
>> I don't think it would be a real big change to do this in a more 
>> general manner.  The biggest problem is that it throws off the size 
>> checking logic that checks buffer sizes after encoding, but that 
>> shouldn't be hard to adjust.
>>
>> Any thoughts?  Is this the right approach, or should we continue to 
>> encode/decode all of the request-specific regardless of error?
>>
>> -Phil
>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>



More information about the Pvfs2-developers mailing list