[Pvfs2-developers] encoding negative responses

Phil Carns pcarns at wastedcycles.org
Thu Oct 19 10:34:04 EDT 2006


We've run into a couple of scenarios lately where a broken metadata file 
can cause the server to crash.  The latest one looks like this:

------------------
- a metadata file exists, but the db entry for its datafile handles (the 
"dh" key is missing for some reason

- a client requests a getattr on the file

- the server reads the basic attributes successfully, indicating that 
there is a dfile array of size x and a distribution of size y

- the server tries to read the dfile array but fails

- at this point the dfile array pointer in the attributes structure is 
non-zero but has garbage in it

- at this point the distribution pointer in the attributes structure is NULL

- the server state machine follows a failure path to send a response 
with an error status

- the encoder tries to encode the response, but segfaults because it 
thinks the distribution exists (the mask is set and the size is set to 
y) when really it is a NULL pointer
------------------

We've plugged some similar bugs before by just fixing the specific case 
(being more careful about cleaning up masks, pointers, sizes, etc. after 
an error), and the same could easily be done here.  Should there be a 
more general solution, though?

What I am wondering is if the response encoder should even bother to 
encode the whole message if the status is non-zero.  It seems like if 
there is an error code it should just stick to encoding the basic 
PVFS_server_resp struct.  The rest of the fields can't really be trusted 
  since we hit an error condition.  Likewise on the decode side of things.

The lebf_encode_resp() function already sort of catches one case of this:

     /* we stand a good chance of segfaulting if we try to encode the 
response
      * after something bad happened reading data from disk. */
     if (resp->status != -PVFS_EIO)
     {

... but that only handles one specific error code.

I don't think it would be a real big change to do this in a more general 
manner.  The biggest problem is that it throws off the size checking 
logic that checks buffer sizes after encoding, but that shouldn't be 
hard to adjust.

Any thoughts?  Is this the right approach, or should we continue to 
encode/decode all of the request-specific regardless of error?

-Phil



More information about the Pvfs2-developers mailing list