[Pvfs2-developers] encoding negative responses
Sam Lang
slang at mcs.anl.gov
Tue Mar 20 17:30:15 EST 2007
Hi Phil,
In the special remove case, could we make the error checking even
tighter? Something like:
if(resp_p->status == -PVFS_ENOENT && index == 1)
return 0;
if(resp_p->status != 0)
return resp_p->status;
Other than that this patch looks good.
-sam
On Mar 20, 2007, at 9:03 AM, Phil Carns wrote:
> I have attached a patch that implements the proposed bug fixes.
> With this applies, the encoder does not bother to encode or decode
> any trailing data for error responses. We haven't seen any ill
> effects, although a minor adjustment was needed to the sys-
> rename.sm state machine which modifying the response structure
> after decoding.
>
> -Phil
>
> Phil Carns wrote:
>> I have done some more work on this, but unfortunately I don't have
>> a patch that I can share yet (it might be faster for you guys to
>> just make the change if you want it; I can point out one little
>> gotcha that I found in sys-rename.sm but the rest is simple).
>> As far as I can tell after a good bit of testing with the
>> modification in place, there are no _current_ operations that rely
>> on decoding fields from negative responses.
>> Sam brought up a hypothetical scenario where you might want this
>> ability (an O_EXCL create returns a handle on failure), but I
>> would vote for making that a special case if it happens rather
>> than having all negative responses get fully encoded by default.
>> We can always add a check to let things like that drop through.
>> Alternatively for that kind of scenario we could send a postitive
>> response, and just have a flag in the response structure that
>> indicates what really happened (ie, I didn't actually do the
>> creation, but here is your handle anyway) rather than using a
>> negative error code for that purpose.
>> -Phil
>> Rob Ross wrote:
>>> Did we reach any sort of consensus on this idea?
>>>
>>> Rob
>>>
>>> Phil Carns wrote:
>>>
>>>> We've run into a couple of scenarios lately where a broken
>>>> metadata file can cause the server to crash. The latest one
>>>> looks like this:
>>>>
>>>> ------------------
>>>> - a metadata file exists, but the db entry for its datafile
>>>> handles (the "dh" key is missing for some reason
>>>>
>>>> - a client requests a getattr on the file
>>>>
>>>> - the server reads the basic attributes successfully, indicating
>>>> that there is a dfile array of size x and a distribution of size y
>>>>
>>>> - the server tries to read the dfile array but fails
>>>>
>>>> - at this point the dfile array pointer in the attributes
>>>> structure is non-zero but has garbage in it
>>>>
>>>> - at this point the distribution pointer in the attributes
>>>> structure is NULL
>>>>
>>>> - the server state machine follows a failure path to send a
>>>> response with an error status
>>>>
>>>> - the encoder tries to encode the response, but segfaults
>>>> because it thinks the distribution exists (the mask is set and
>>>> the size is set to y) when really it is a NULL pointer
>>>> ------------------
>>>>
>>>> We've plugged some similar bugs before by just fixing the
>>>> specific case (being more careful about cleaning up masks,
>>>> pointers, sizes, etc. after an error), and the same could easily
>>>> be done here. Should there be a more general solution, though?
>>>>
>>>> What I am wondering is if the response encoder should even
>>>> bother to encode the whole message if the status is non-zero.
>>>> It seems like if there is an error code it should just stick to
>>>> encoding the basic PVFS_server_resp struct. The rest of the
>>>> fields can't really be trusted since we hit an error
>>>> condition. Likewise on the decode side of things.
>>>>
>>>> The lebf_encode_resp() function already sort of catches one case
>>>> of this:
>>>>
>>>> /* we stand a good chance of segfaulting if we try to encode
>>>> the response
>>>> * after something bad happened reading data from disk. */
>>>> if (resp->status != -PVFS_EIO)
>>>> {
>>>>
>>>> ... but that only handles one specific error code.
>>>>
>>>> I don't think it would be a real big change to do this in a more
>>>> general manner. The biggest problem is that it throws off the
>>>> size checking logic that checks buffer sizes after encoding, but
>>>> that shouldn't be hard to adjust.
>>>>
>>>> Any thoughts? Is this the right approach, or should we continue
>>>> to encode/decode all of the request-specific regardless of error?
>>>>
>>>> -Phil
>>>>
>>>> _______________________________________________
>>>> Pvfs2-developers mailing list
>>>> Pvfs2-developers at beowulf-underground.org
>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-
>>>> developers
>>>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>
> Index: pvfs2_src/src/proto/PINT-le-bytefield.c
> ===================================================================
> --- pvfs2_src/src/proto/PINT-le-bytefield.c (revision 2774)
> +++ pvfs2_src/src/proto/PINT-le-bytefield.c (revision 2775)
> @@ -423,7 +423,7 @@
>
> /* we stand a good chance of segfaulting if we try to encode
> the response
> * after something bad happened reading data from disk. */
> - if (resp->status != -PVFS_EIO)
> + if (resp->status == 0)
> {
>
> /* extra encoding rules for particular responses */
> @@ -606,7 +606,7 @@
> decode_PVFS_server_resp(p, resp);
> gossip_debug(GOSSIP_ENDECODE_DEBUG,"lebf_decode_resp\n");
>
> - if (resp->status == -PVFS_EIO)
> + if (resp->status != 0)
> goto out;
>
> #define CASE(tag,var) \
> @@ -778,87 +778,90 @@
> }
> } else if (input_type == PINT_DECODE_RESP) {
> struct PVFS_server_resp *resp = &msg->stub_dec.resp;
> - switch (resp->op) {
> + if(resp->status == 0)
> + {
> + switch (resp->op) {
>
> - case PVFS_SERV_LOOKUP_PATH:
> - {
> - struct PVFS_servresp_lookup_path *lookup =
> - &resp->u.lookup_path;
> - decode_free(lookup->handle_array);
> - decode_free(lookup->attr_array);
> - break;
> - }
> + case PVFS_SERV_LOOKUP_PATH:
> + {
> + struct PVFS_servresp_lookup_path *lookup =
> + &resp->u.lookup_path;
> + decode_free(lookup->handle_array);
> + decode_free(lookup->attr_array);
> + break;
> + }
>
> - case PVFS_SERV_READDIR:
> - decode_free(resp->u.readdir.dirent_array);
> - break;
> + case PVFS_SERV_READDIR:
> + decode_free(resp->u.readdir.dirent_array);
> + break;
>
> - case PVFS_SERV_MGMT_PERF_MON:
> - decode_free(resp->u.mgmt_perf_mon.perf_array);
> - break;
> + case PVFS_SERV_MGMT_PERF_MON:
> + decode_free(resp->u.mgmt_perf_mon.perf_array);
> + break;
>
> - case PVFS_SERV_MGMT_ITERATE_HANDLES:
> - decode_free(resp->u.mgmt_iterate_handles.handle_array);
> - break;
> + case PVFS_SERV_MGMT_ITERATE_HANDLES:
> + decode_free(resp-
> >u.mgmt_iterate_handles.handle_array);
> + break;
>
> - case PVFS_SERV_MGMT_DSPACE_INFO_LIST:
> - decode_free(resp->u.mgmt_dspace_info_list.dspace_info_array);
> - break;
> + case PVFS_SERV_MGMT_DSPACE_INFO_LIST:
> + decode_free(resp-
> >u.mgmt_dspace_info_list.dspace_info_array);
> + break;
>
> - case PVFS_SERV_GETATTR:
> - if (resp->u.getattr.attr.mask & PVFS_ATTR_META_DIST)
> - decode_free(resp->u.getattr.attr.u.meta.dist);
> - if (resp->u.getattr.attr.mask & PVFS_ATTR_META_DFILES)
> - decode_free(resp->u.getattr.attr.u.meta.dfile_array);
> - break;
> + case PVFS_SERV_GETATTR:
> + if (resp->u.getattr.attr.mask &
> PVFS_ATTR_META_DIST)
> + decode_free(resp-
> >u.getattr.attr.u.meta.dist);
> + if (resp->u.getattr.attr.mask &
> PVFS_ATTR_META_DFILES)
> + decode_free(resp-
> >u.getattr.attr.u.meta.dfile_array);
> + break;
>
> - case PVFS_SERV_MGMT_EVENT_MON:
> - decode_free(resp->u.mgmt_event_mon.event_array);
> - break;
> + case PVFS_SERV_MGMT_EVENT_MON:
> + decode_free(resp->u.mgmt_event_mon.event_array);
> + break;
>
> - case PVFS_SERV_GETEATTR:
> - /* need a loop here? WBL */
> - if (resp->u.geteattr.val)
> - decode_free(resp->u.geteattr.val);
> - break;
> - case PVFS_SERV_LISTEATTR:
> - if (resp->u.listeattr.key)
> - decode_free(resp->u.listeattr.key);
> - break;
> + case PVFS_SERV_GETEATTR:
> + /* need a loop here? WBL */
> + if (resp->u.geteattr.val)
> + decode_free(resp->u.geteattr.val);
> + break;
> + case PVFS_SERV_LISTEATTR:
> + if (resp->u.listeattr.key)
> + decode_free(resp->u.listeattr.key);
> + break;
>
> - case PVFS_SERV_GETCONFIG:
> - case PVFS_SERV_CREATE:
> - case PVFS_SERV_REMOVE:
> - case PVFS_SERV_MGMT_REMOVE_OBJECT:
> - case PVFS_SERV_MGMT_REMOVE_DIRENT:
> - case PVFS_SERV_MGMT_GET_DIRDATA_HANDLE:
> - case PVFS_SERV_IO:
> - case PVFS_SERV_SMALL_IO:
> - case PVFS_SERV_SETATTR:
> - case PVFS_SERV_SETEATTR:
> - case PVFS_SERV_DELEATTR:
> - case PVFS_SERV_CRDIRENT:
> - case PVFS_SERV_RMDIRENT:
> - case PVFS_SERV_CHDIRENT:
> - case PVFS_SERV_TRUNCATE:
> - case PVFS_SERV_MKDIR:
> - case PVFS_SERV_FLUSH:
> - case PVFS_SERV_MGMT_SETPARAM:
> - case PVFS_SERV_MGMT_NOOP:
> - case PVFS_SERV_STATFS:
> - case PVFS_SERV_WRITE_COMPLETION:
> - case PVFS_SERV_PROTO_ERROR:
> - /* nothing to free */
> - break;
> + case PVFS_SERV_GETCONFIG:
> + case PVFS_SERV_CREATE:
> + case PVFS_SERV_REMOVE:
> + case PVFS_SERV_MGMT_REMOVE_OBJECT:
> + case PVFS_SERV_MGMT_REMOVE_DIRENT:
> + case PVFS_SERV_MGMT_GET_DIRDATA_HANDLE:
> + case PVFS_SERV_IO:
> + case PVFS_SERV_SMALL_IO:
> + case PVFS_SERV_SETATTR:
> + case PVFS_SERV_SETEATTR:
> + case PVFS_SERV_DELEATTR:
> + case PVFS_SERV_CRDIRENT:
> + case PVFS_SERV_RMDIRENT:
> + case PVFS_SERV_CHDIRENT:
> + case PVFS_SERV_TRUNCATE:
> + case PVFS_SERV_MKDIR:
> + case PVFS_SERV_FLUSH:
> + case PVFS_SERV_MGMT_SETPARAM:
> + case PVFS_SERV_MGMT_NOOP:
> + case PVFS_SERV_STATFS:
> + case PVFS_SERV_WRITE_COMPLETION:
> + case PVFS_SERV_PROTO_ERROR:
> + /* nothing to free */
> + break;
>
> - case PVFS_SERV_INVALID:
> - case PVFS_SERV_PERF_UPDATE:
> - case PVFS_SERV_JOB_TIMER:
> - case PVFS_SERV_TROVE_SYNC_TIMER:
> - gossip_lerr("%s: invalid response operation %d.\n",
> - __func__, resp->op);
> - break;
> - }
> + case PVFS_SERV_INVALID:
> + case PVFS_SERV_PERF_UPDATE:
> + case PVFS_SERV_JOB_TIMER:
> + case PVFS_SERV_TROVE_SYNC_TIMER:
> + gossip_lerr("%s: invalid response operation %d.
> \n",
> + __func__, resp->op);
> + break;
> + }
> + }
> }
> }
>
> Index: pvfs2_src/src/server/pvfs2-server.h
> ===================================================================
> --- pvfs2_src/src/server/pvfs2-server.h (revision 2774)
> +++ pvfs2_src/src/server/pvfs2-server.h (revision 2775)
> @@ -313,6 +313,7 @@
> PVFS_error* err_array;
> PVFS_ds_keyval_handle_info keyval_handle_info;
> PVFS_handle dirent_handle;
> + PVFS_ds_keyval dist_val;
> };
>
> /* this is used in both set_eattr, get_eattr and list_eattr */
> Index: pvfs2_src/src/client/sysint/sys-rename.sm
> ===================================================================
> --- pvfs2_src/src/client/sysint/sys-rename.sm (revision 2774)
> +++ pvfs2_src/src/client/sysint/sys-rename.sm (revision 2775)
> @@ -434,7 +434,13 @@
> the target entry does not already exist, so it's
> expected and is not an error.
> */
> + /* note: don't change the actual resp structure,
> because the
> + * decoder counts on this to know how to release data
> structures
> + */
> +#if 0
> resp_p->status = 0;
> +#endif
> + return(0);
> }
> return resp_p->status;
> }
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
More information about the Pvfs2-developers
mailing list