[Pvfs2-developers] encoding negative responses

Sam Lang slang at mcs.anl.gov
Tue Mar 20 17:30:15 EST 2007


Hi Phil,

In the special remove case, could we make the error checking even  
tighter?  Something like:

if(resp_p->status == -PVFS_ENOENT && index == 1)
	return 0;
if(resp_p->status != 0)
	return resp_p->status;

Other than that this patch looks good.

-sam

On Mar 20, 2007, at 9:03 AM, Phil Carns wrote:

> I have attached a patch that implements the proposed bug fixes.   
> With this applies, the encoder does not bother to encode or decode  
> any trailing data for error responses.  We haven't seen any ill  
> effects, although a minor adjustment was needed to the sys- 
> rename.sm state machine which modifying the response structure  
> after decoding.
>
> -Phil
>
> Phil Carns wrote:
>> I have done some more work on this, but unfortunately I don't have  
>> a patch that I can share yet (it might be faster for you guys to  
>> just make the change if you want it; I can point out one little  
>> gotcha that I found in sys-rename.sm but the rest is simple).
>> As far as I can tell after a good bit of testing with the  
>> modification in place, there are no _current_ operations that rely  
>> on decoding fields from negative responses.
>> Sam brought up a hypothetical scenario where you might want this  
>> ability (an O_EXCL create returns a handle on failure), but I  
>> would vote for making that a special case if it happens rather  
>> than having all negative responses get fully encoded by default.   
>> We can always add a check to let things like that drop through.   
>> Alternatively for that kind of scenario we could send a postitive  
>> response, and just have a flag in the response structure that  
>> indicates what really happened (ie, I didn't actually do the  
>> creation, but here is your handle anyway) rather than using a  
>> negative error code for that purpose.
>> -Phil
>> Rob Ross wrote:
>>> Did we reach any sort of consensus on this idea?
>>>
>>> Rob
>>>
>>> Phil Carns wrote:
>>>
>>>> We've run into a couple of scenarios lately where a broken  
>>>> metadata file can cause the server to crash.  The latest one  
>>>> looks like this:
>>>>
>>>> ------------------
>>>> - a metadata file exists, but the db entry for its datafile  
>>>> handles (the "dh" key is missing for some reason
>>>>
>>>> - a client requests a getattr on the file
>>>>
>>>> - the server reads the basic attributes successfully, indicating  
>>>> that there is a dfile array of size x and a distribution of size y
>>>>
>>>> - the server tries to read the dfile array but fails
>>>>
>>>> - at this point the dfile array pointer in the attributes  
>>>> structure is non-zero but has garbage in it
>>>>
>>>> - at this point the distribution pointer in the attributes  
>>>> structure is NULL
>>>>
>>>> - the server state machine follows a failure path to send a  
>>>> response with an error status
>>>>
>>>> - the encoder tries to encode the response, but segfaults  
>>>> because it thinks the distribution exists (the mask is set and  
>>>> the size is set to y) when really it is a NULL pointer
>>>> ------------------
>>>>
>>>> We've plugged some similar bugs before by just fixing the  
>>>> specific case (being more careful about cleaning up masks,  
>>>> pointers, sizes, etc. after an error), and the same could easily  
>>>> be done here.  Should there be a more general solution, though?
>>>>
>>>> What I am wondering is if the response encoder should even  
>>>> bother to encode the whole message if the status is non-zero.   
>>>> It seems like if there is an error code it should just stick to  
>>>> encoding the basic PVFS_server_resp struct.  The rest of the  
>>>> fields can't really be trusted  since we hit an error  
>>>> condition.  Likewise on the decode side of things.
>>>>
>>>> The lebf_encode_resp() function already sort of catches one case  
>>>> of this:
>>>>
>>>>     /* we stand a good chance of segfaulting if we try to encode  
>>>> the response
>>>>      * after something bad happened reading data from disk. */
>>>>     if (resp->status != -PVFS_EIO)
>>>>     {
>>>>
>>>> ... but that only handles one specific error code.
>>>>
>>>> I don't think it would be a real big change to do this in a more  
>>>> general manner.  The biggest problem is that it throws off the  
>>>> size checking logic that checks buffer sizes after encoding, but  
>>>> that shouldn't be hard to adjust.
>>>>
>>>> Any thoughts?  Is this the right approach, or should we continue  
>>>> to encode/decode all of the request-specific regardless of error?
>>>>
>>>> -Phil
>>>>
>>>> _______________________________________________
>>>> Pvfs2-developers mailing list
>>>> Pvfs2-developers at beowulf-underground.org
>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2- 
>>>> developers
>>>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>
> Index: pvfs2_src/src/proto/PINT-le-bytefield.c
> ===================================================================
> --- pvfs2_src/src/proto/PINT-le-bytefield.c	(revision 2774)
> +++ pvfs2_src/src/proto/PINT-le-bytefield.c	(revision 2775)
> @@ -423,7 +423,7 @@
>
>      /* we stand a good chance of segfaulting if we try to encode  
> the response
>       * after something bad happened reading data from disk. */
> -    if (resp->status != -PVFS_EIO)
> +    if (resp->status == 0)
>      {
>
>          /* extra encoding rules for particular responses */
> @@ -606,7 +606,7 @@
>      decode_PVFS_server_resp(p, resp);
>      gossip_debug(GOSSIP_ENDECODE_DEBUG,"lebf_decode_resp\n");
>
> -    if (resp->status == -PVFS_EIO)
> +    if (resp->status != 0)
>          goto out;
>
>  #define CASE(tag,var) \
> @@ -778,87 +778,90 @@
>  	}
>      } else if (input_type == PINT_DECODE_RESP) {
>  	struct PVFS_server_resp *resp = &msg->stub_dec.resp;
> -	switch (resp->op) {
> +        if(resp->status == 0)
> +        {
> +            switch (resp->op) {
>
> -	    case PVFS_SERV_LOOKUP_PATH:
> -            {
> -		struct PVFS_servresp_lookup_path *lookup =
> -                    &resp->u.lookup_path;
> -		decode_free(lookup->handle_array);
> -		decode_free(lookup->attr_array);
> -		break;
> -	    }
> +                case PVFS_SERV_LOOKUP_PATH:
> +                {
> +                    struct PVFS_servresp_lookup_path *lookup =
> +                        &resp->u.lookup_path;
> +                    decode_free(lookup->handle_array);
> +                    decode_free(lookup->attr_array);
> +                    break;
> +                }
>
> -	    case PVFS_SERV_READDIR:
> -		decode_free(resp->u.readdir.dirent_array);
> -		break;
> +                case PVFS_SERV_READDIR:
> +                    decode_free(resp->u.readdir.dirent_array);
> +                    break;
>
> -	    case PVFS_SERV_MGMT_PERF_MON:
> -		decode_free(resp->u.mgmt_perf_mon.perf_array);
> -		break;
> +                case PVFS_SERV_MGMT_PERF_MON:
> +                    decode_free(resp->u.mgmt_perf_mon.perf_array);
> +                    break;
>
> -	    case PVFS_SERV_MGMT_ITERATE_HANDLES:
> -		decode_free(resp->u.mgmt_iterate_handles.handle_array);
> -		break;
> +                case PVFS_SERV_MGMT_ITERATE_HANDLES:
> +                    decode_free(resp- 
> >u.mgmt_iterate_handles.handle_array);
> +                    break;
>
> -	    case PVFS_SERV_MGMT_DSPACE_INFO_LIST:
> -		decode_free(resp->u.mgmt_dspace_info_list.dspace_info_array);
> -		break;
> +                case PVFS_SERV_MGMT_DSPACE_INFO_LIST:
> +                    decode_free(resp- 
> >u.mgmt_dspace_info_list.dspace_info_array);
> +                    break;
>
> -	    case PVFS_SERV_GETATTR:
> -		if (resp->u.getattr.attr.mask & PVFS_ATTR_META_DIST)
> -		    decode_free(resp->u.getattr.attr.u.meta.dist);
> -		if (resp->u.getattr.attr.mask & PVFS_ATTR_META_DFILES)
> -		    decode_free(resp->u.getattr.attr.u.meta.dfile_array);
> -		break;
> +                case PVFS_SERV_GETATTR:
> +                    if (resp->u.getattr.attr.mask &  
> PVFS_ATTR_META_DIST)
> +                        decode_free(resp- 
> >u.getattr.attr.u.meta.dist);
> +                    if (resp->u.getattr.attr.mask &  
> PVFS_ATTR_META_DFILES)
> +                        decode_free(resp- 
> >u.getattr.attr.u.meta.dfile_array);
> +                    break;
>
> -	    case PVFS_SERV_MGMT_EVENT_MON:
> -		decode_free(resp->u.mgmt_event_mon.event_array);
> -		break;
> +                case PVFS_SERV_MGMT_EVENT_MON:
> +                    decode_free(resp->u.mgmt_event_mon.event_array);
> +                    break;
>
> -	    case PVFS_SERV_GETEATTR:
> -                /* need a loop here?  WBL */
> -		if (resp->u.geteattr.val)
> -		    decode_free(resp->u.geteattr.val);
> -		break;
> -            case PVFS_SERV_LISTEATTR:
> -                if (resp->u.listeattr.key)
> -                    decode_free(resp->u.listeattr.key);
> -                break;
> +                case PVFS_SERV_GETEATTR:
> +                    /* need a loop here?  WBL */
> +                    if (resp->u.geteattr.val)
> +                        decode_free(resp->u.geteattr.val);
> +                    break;
> +                case PVFS_SERV_LISTEATTR:
> +                    if (resp->u.listeattr.key)
> +                        decode_free(resp->u.listeattr.key);
> +                    break;
>
> -	    case PVFS_SERV_GETCONFIG:
> -	    case PVFS_SERV_CREATE:
> -	    case PVFS_SERV_REMOVE:
> -	    case PVFS_SERV_MGMT_REMOVE_OBJECT:
> -	    case PVFS_SERV_MGMT_REMOVE_DIRENT:
> -	    case PVFS_SERV_MGMT_GET_DIRDATA_HANDLE:
> -	    case PVFS_SERV_IO:
> -            case PVFS_SERV_SMALL_IO:
> -	    case PVFS_SERV_SETATTR:
> -	    case PVFS_SERV_SETEATTR:
> -	    case PVFS_SERV_DELEATTR:
> -	    case PVFS_SERV_CRDIRENT:
> -	    case PVFS_SERV_RMDIRENT:
> -	    case PVFS_SERV_CHDIRENT:
> -	    case PVFS_SERV_TRUNCATE:
> -	    case PVFS_SERV_MKDIR:
> -	    case PVFS_SERV_FLUSH:
> -	    case PVFS_SERV_MGMT_SETPARAM:
> -	    case PVFS_SERV_MGMT_NOOP:
> -	    case PVFS_SERV_STATFS:
> -	    case PVFS_SERV_WRITE_COMPLETION:
> -	    case PVFS_SERV_PROTO_ERROR:
> -		/* nothing to free */
> -		break;
> +                case PVFS_SERV_GETCONFIG:
> +                case PVFS_SERV_CREATE:
> +                case PVFS_SERV_REMOVE:
> +                case PVFS_SERV_MGMT_REMOVE_OBJECT:
> +                case PVFS_SERV_MGMT_REMOVE_DIRENT:
> +                case PVFS_SERV_MGMT_GET_DIRDATA_HANDLE:
> +                case PVFS_SERV_IO:
> +                case PVFS_SERV_SMALL_IO:
> +                case PVFS_SERV_SETATTR:
> +                case PVFS_SERV_SETEATTR:
> +                case PVFS_SERV_DELEATTR:
> +                case PVFS_SERV_CRDIRENT:
> +                case PVFS_SERV_RMDIRENT:
> +                case PVFS_SERV_CHDIRENT:
> +                case PVFS_SERV_TRUNCATE:
> +                case PVFS_SERV_MKDIR:
> +                case PVFS_SERV_FLUSH:
> +                case PVFS_SERV_MGMT_SETPARAM:
> +                case PVFS_SERV_MGMT_NOOP:
> +                case PVFS_SERV_STATFS:
> +                case PVFS_SERV_WRITE_COMPLETION:
> +                case PVFS_SERV_PROTO_ERROR:
> +                    /* nothing to free */
> +                    break;
>
> -	    case PVFS_SERV_INVALID:
> -	    case PVFS_SERV_PERF_UPDATE:
> -	    case PVFS_SERV_JOB_TIMER:
> -	    case PVFS_SERV_TROVE_SYNC_TIMER:
> -		gossip_lerr("%s: invalid response operation %d.\n",
> -		  __func__, resp->op);
> -		break;
> -	}
> +                case PVFS_SERV_INVALID:
> +                case PVFS_SERV_PERF_UPDATE:
> +                case PVFS_SERV_JOB_TIMER:
> +                case PVFS_SERV_TROVE_SYNC_TIMER:
> +                    gossip_lerr("%s: invalid response operation %d. 
> \n",
> +                      __func__, resp->op);
> +                    break;
> +            }
> +        }
>      }
>  }
>
> Index: pvfs2_src/src/server/pvfs2-server.h
> ===================================================================
> --- pvfs2_src/src/server/pvfs2-server.h	(revision 2774)
> +++ pvfs2_src/src/server/pvfs2-server.h	(revision 2775)
> @@ -313,6 +313,7 @@
>      PVFS_error* err_array;
>      PVFS_ds_keyval_handle_info keyval_handle_info;
>      PVFS_handle dirent_handle;
> +    PVFS_ds_keyval dist_val;
>  };
>
>  /* this is used in both set_eattr, get_eattr and list_eattr */
> Index: pvfs2_src/src/client/sysint/sys-rename.sm
> ===================================================================
> --- pvfs2_src/src/client/sysint/sys-rename.sm	(revision 2774)
> +++ pvfs2_src/src/client/sysint/sys-rename.sm	(revision 2775)
> @@ -434,7 +434,13 @@
>                the target entry does not already exist, so it's
>                expected and is not an error.
>              */
> +            /* note: don't change the actual resp structure,  
> because the
> +             * decoder counts on this to know how to release data  
> structures
> +             */
> +#if 0
>              resp_p->status = 0;
> +#endif
> +            return(0);
>          }
>  	return resp_p->status;
>      }
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers



More information about the Pvfs2-developers mailing list