[Pvfs2-developers] encoding negative responses
Rob Ross
rross at mcs.anl.gov
Fri Nov 10 10:07:47 EST 2006
Did we reach any sort of consensus on this idea?
Rob
Phil Carns wrote:
> We've run into a couple of scenarios lately where a broken metadata file
> can cause the server to crash. The latest one looks like this:
>
> ------------------
> - a metadata file exists, but the db entry for its datafile handles (the
> "dh" key is missing for some reason
>
> - a client requests a getattr on the file
>
> - the server reads the basic attributes successfully, indicating that
> there is a dfile array of size x and a distribution of size y
>
> - the server tries to read the dfile array but fails
>
> - at this point the dfile array pointer in the attributes structure is
> non-zero but has garbage in it
>
> - at this point the distribution pointer in the attributes structure is
> NULL
>
> - the server state machine follows a failure path to send a response
> with an error status
>
> - the encoder tries to encode the response, but segfaults because it
> thinks the distribution exists (the mask is set and the size is set to
> y) when really it is a NULL pointer
> ------------------
>
> We've plugged some similar bugs before by just fixing the specific case
> (being more careful about cleaning up masks, pointers, sizes, etc. after
> an error), and the same could easily be done here. Should there be a
> more general solution, though?
>
> What I am wondering is if the response encoder should even bother to
> encode the whole message if the status is non-zero. It seems like if
> there is an error code it should just stick to encoding the basic
> PVFS_server_resp struct. The rest of the fields can't really be trusted
> since we hit an error condition. Likewise on the decode side of things.
>
> The lebf_encode_resp() function already sort of catches one case of this:
>
> /* we stand a good chance of segfaulting if we try to encode the
> response
> * after something bad happened reading data from disk. */
> if (resp->status != -PVFS_EIO)
> {
>
> ... but that only handles one specific error code.
>
> I don't think it would be a real big change to do this in a more general
> manner. The biggest problem is that it throws off the size checking
> logic that checks buffer sizes after encoding, but that shouldn't be
> hard to adjust.
>
> Any thoughts? Is this the right approach, or should we continue to
> encode/decode all of the request-specific regardless of error?
>
> -Phil
>
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>
More information about the Pvfs2-developers
mailing list