[Pvfs2-developers] encoding negative responses

Rob Ross rross at mcs.anl.gov
Fri Nov 10 10:07:47 EST 2006


Did we reach any sort of consensus on this idea?

Rob

Phil Carns wrote:
> We've run into a couple of scenarios lately where a broken metadata file 
> can cause the server to crash.  The latest one looks like this:
> 
> ------------------
> - a metadata file exists, but the db entry for its datafile handles (the 
> "dh" key is missing for some reason
> 
> - a client requests a getattr on the file
> 
> - the server reads the basic attributes successfully, indicating that 
> there is a dfile array of size x and a distribution of size y
> 
> - the server tries to read the dfile array but fails
> 
> - at this point the dfile array pointer in the attributes structure is 
> non-zero but has garbage in it
> 
> - at this point the distribution pointer in the attributes structure is 
> NULL
> 
> - the server state machine follows a failure path to send a response 
> with an error status
> 
> - the encoder tries to encode the response, but segfaults because it 
> thinks the distribution exists (the mask is set and the size is set to 
> y) when really it is a NULL pointer
> ------------------
> 
> We've plugged some similar bugs before by just fixing the specific case 
> (being more careful about cleaning up masks, pointers, sizes, etc. after 
> an error), and the same could easily be done here.  Should there be a 
> more general solution, though?
> 
> What I am wondering is if the response encoder should even bother to 
> encode the whole message if the status is non-zero.  It seems like if 
> there is an error code it should just stick to encoding the basic 
> PVFS_server_resp struct.  The rest of the fields can't really be trusted 
>  since we hit an error condition.  Likewise on the decode side of things.
> 
> The lebf_encode_resp() function already sort of catches one case of this:
> 
>     /* we stand a good chance of segfaulting if we try to encode the 
> response
>      * after something bad happened reading data from disk. */
>     if (resp->status != -PVFS_EIO)
>     {
> 
> ... but that only handles one specific error code.
> 
> I don't think it would be a real big change to do this in a more general 
> manner.  The biggest problem is that it throws off the size checking 
> logic that checks buffer sizes after encoding, but that shouldn't be 
> hard to adjust.
> 
> Any thoughts?  Is this the right approach, or should we continue to 
> encode/decode all of the request-specific regardless of error?
> 
> -Phil
> 
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
> 


More information about the Pvfs2-developers mailing list