[Pvfs2-developers] encoding negative responses

Phil Carns pcarns at wastedcycles.org
Tue Mar 20 09:03:02 EST 2007


I have attached a patch that implements the proposed bug fixes.  With 
this applies, the encoder does not bother to encode or decode any 
trailing data for error responses.  We haven't seen any ill effects, 
although a minor adjustment was needed to the sys-rename.sm state 
machine which modifying the response structure after decoding.

-Phil

Phil Carns wrote:
> I have done some more work on this, but unfortunately I don't have a 
> patch that I can share yet (it might be faster for you guys to just make 
> the change if you want it; I can point out one little gotcha that I 
> found in sys-rename.sm but the rest is simple).
> 
> As far as I can tell after a good bit of testing with the modification 
> in place, there are no _current_ operations that rely on decoding fields 
> from negative responses.
> 
> Sam brought up a hypothetical scenario where you might want this ability 
> (an O_EXCL create returns a handle on failure), but I would vote for 
> making that a special case if it happens rather than having all negative 
> responses get fully encoded by default.  We can always add a check to 
> let things like that drop through.  Alternatively for that kind of 
> scenario we could send a postitive response, and just have a flag in the 
> response structure that indicates what really happened (ie, I didn't 
> actually do the creation, but here is your handle anyway) rather than 
> using a negative error code for that purpose.
> 
> -Phil
> 
> Rob Ross wrote:
> 
>> Did we reach any sort of consensus on this idea?
>>
>> Rob
>>
>> Phil Carns wrote:
>>
>>> We've run into a couple of scenarios lately where a broken metadata 
>>> file can cause the server to crash.  The latest one looks like this:
>>>
>>> ------------------
>>> - a metadata file exists, but the db entry for its datafile handles 
>>> (the "dh" key is missing for some reason
>>>
>>> - a client requests a getattr on the file
>>>
>>> - the server reads the basic attributes successfully, indicating that 
>>> there is a dfile array of size x and a distribution of size y
>>>
>>> - the server tries to read the dfile array but fails
>>>
>>> - at this point the dfile array pointer in the attributes structure 
>>> is non-zero but has garbage in it
>>>
>>> - at this point the distribution pointer in the attributes structure 
>>> is NULL
>>>
>>> - the server state machine follows a failure path to send a response 
>>> with an error status
>>>
>>> - the encoder tries to encode the response, but segfaults because it 
>>> thinks the distribution exists (the mask is set and the size is set 
>>> to y) when really it is a NULL pointer
>>> ------------------
>>>
>>> We've plugged some similar bugs before by just fixing the specific 
>>> case (being more careful about cleaning up masks, pointers, sizes, 
>>> etc. after an error), and the same could easily be done here.  Should 
>>> there be a more general solution, though?
>>>
>>> What I am wondering is if the response encoder should even bother to 
>>> encode the whole message if the status is non-zero.  It seems like if 
>>> there is an error code it should just stick to encoding the basic 
>>> PVFS_server_resp struct.  The rest of the fields can't really be 
>>> trusted  since we hit an error condition.  Likewise on the decode 
>>> side of things.
>>>
>>> The lebf_encode_resp() function already sort of catches one case of 
>>> this:
>>>
>>>     /* we stand a good chance of segfaulting if we try to encode the 
>>> response
>>>      * after something bad happened reading data from disk. */
>>>     if (resp->status != -PVFS_EIO)
>>>     {
>>>
>>> ... but that only handles one specific error code.
>>>
>>> I don't think it would be a real big change to do this in a more 
>>> general manner.  The biggest problem is that it throws off the size 
>>> checking logic that checks buffer sizes after encoding, but that 
>>> shouldn't be hard to adjust.
>>>
>>> Any thoughts?  Is this the right approach, or should we continue to 
>>> encode/decode all of the request-specific regardless of error?
>>>
>>> -Phil
>>>
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>
> 
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: negative-response-encode.patch
Type: text/x-patch
Size: 7715 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20070320/55c0a288/negative-response-encode-0001.bin


More information about the Pvfs2-developers mailing list