[Pvfs2-developers] Re: pvfs-client segfault

Sam Lang slang at mcs.anl.gov
Thu Feb 14 18:25:45 EST 2008


I committed a fix to the segfault in the client daemon.  The problem  
that my patch introduced was to allow a state machine to complete  
immediately if possible (none of its actions needed to block).   
Unfortunately, the client code didn't handle this case -- we were  
adding the smcb to the completion list without checking for immediate  
completion, and then free-ing the smcb.  Later (as Pete pointed out),  
we would pull the (now freed) smcb off the list and try to do  
something with it.  Invalid read!

So the commits I just made check if the state machine completed  
immediately and prevent the smcb from ever getting added to the  
completion list.  The upside of this patch is that now the client  
daemon should be sending the response downcall directly on completion  
of the state machine, instead of just "posting" a state machine that  
completes immediately, and handling the response later.

With this fix, I was able to copy 100s of files and list them without  
the client daemon falling over.

Troy, Pete:  Thanks for the debugging help on this one.
-sam

On Feb 14, 2008, at 9:54 AM, Samuel Lang wrote:

>
>
> On Thu, 14 Feb 2008, Pete Wyckoff wrote:
>
>> pw at osc.edu wrote on Thu, 14 Feb 2008 00:19 -0500:
>>> slang at mcs.anl.gov wrote on Wed, 13 Feb 2008 21:25 -0600:
>>>> What happens when you restart the client daemon?  Does the  
>>>> segfault occur
>>>> with bmi_tcp?
>>>
>>> Yeah, I'm getting the same sort of thing, with TCP.  1 client, 1
>>> md+data server.  2.6.24-rc6.  A few "ls -la /pvfs" will crash
>>> client-core, and it automatically is restarted.  Similar sort of
>>> backtrace.  Valgrind doesn't show anything before where it all goes
>>> bad in Troy's traces.
>>>
>>> ==7517== Invalid read of size 8
>>> ==7517==    at 0x4C6989E: qlist_empty (quicklist.h:117)
>>> ==7517==    by 0x4C697DD: PINT_sm_frame (state-machine-fns.c:595)
>>> ==7517==    by 0x4C270AB: completion_list_retrieve_completed  
>>> (client-state-machine.c:140)
>>> ==7517==    by 0x4C281DB: PINT_client_state_machine_testsome  
>>> (client-state-machine.c:694)
>>> ==7517==    by 0x4C285EB: PVFS_sys_testsome (client-state- 
>>> machine.c:907)
>>> ==7517==    by 0x407BED: process_vfs_requests (pvfs2-client-core.c: 
>>> 2943)
>>> ==7517==    by 0x40A284: main (pvfs2-client-core.c:3379)
>>> ==7517==  Address 0x53BD870 is 80 bytes inside a block of size 176  
>>> free'd
>>> ==7517==    at 0x4A0560B: free (vg_replace_malloc.c:233)
>>> ==7517==    by 0x4C696D7: PINT_smcb_free (state-machine-fns.c:551)
>>> ==7517==    by 0x4C2768A: PINT_client_state_machine_post (client- 
>>> state-machine.c:395)
>>> ==7517==    by 0x4C29FE5: PVFS_isys_getattr (sys-getattr.sm:211)
>>> ==7517==    by 0x403D94: post_getattr_request (pvfs2-client-core.c: 
>>> 558)
>>> ==7517==    by 0x408648: handle_unexp_vfs_request (pvfs2-client- 
>>> core.c:2708)
>>> ==7517==    by 0x407D70: process_vfs_requests (pvfs2-client-core.c: 
>>> 2990)
>>> ==7517==    by 0x40A284: main (pvfs2-client-core.c:3379)
>>>
>>> (Parse the second half of this first:)
>>>
>>> handle_unexp_vfs_request goes off to post a getattr.
>>> PVFS_isys_getattr allocs a new smcb.  PINT_client_state_machine_post
>>> starts the SM and it must have finished immediately through a
>>> successful acache lookup.  PINT_client_state_machine_post frees the
>>> smcb.
>>>
>>> (Now the top half:)
>>>
>>> Later testsome decides it has a completed smcb.  The same one that
>>> had been freed as above.  Although maybe not related.
>>>
>>> This is the CVS head _before_ the big cleanup Sam did today.  Are
>>> we forgetting to initialize smcb->frames somewhere related?  Looking
>>> back for suspicious changes.
>>>
>>> There's the memmove() fix on s_completion_list[] by Phil back on 15
>>> jan, but that's obviously a big fix, and it's probably not getting
>>> triggered either.  And a bunch of locking changes that are harmless.
>>>
>>> Needs more debugging.
>>
>> It's definitely this commit that causes the problem.  Gotta love
>> git-bisect.  Does it offer any clues?
>>
>
> Yep, I'm looking into that.  Thanks Pete!
> -sam
>
>> 		-- Pete
>>
>> commit 3d0ba5f2ee107c9cec913d6ba1a56b571c87c2b7
>> Author: slang <slang>
>> Date:   Thu Feb 7 16:16:35 2008 +0000
>>
>>    fixes to sm code for parallel jumps.
>>
>> diff --git a/src/common/misc/state-machine-fns.c b/src/common/misc/ 
>> state-machine-fns.c
>> index 6098ea0..c512db1 100644
>> --- a/src/common/misc/state-machine-fns.c
>> +++ b/src/common/misc/state-machine-fns.c
>> @@ -70,30 +70,40 @@ int PINT_state_machine_halt(void)
>>  */
>> int PINT_state_machine_terminate(struct PINT_smcb *smcb,  
>> job_status_s *r)
>> {
>> -    struct PINT_frame_s *current_frame;
>> +    struct PINT_frame_s *my_frame, *f;
>>     job_id_t id;
>>
>>     /* notify parent */
>>     if (smcb->parent_smcb)
>>     {
>> -        assert(smcb->parent_smcb->children_running > 0);
>> -
>> -        current_frame = qlist_entry(
>> -            &smcb->frames.next, struct PINT_frame_s, link);
>> -        current_frame->error = r->error_code;
>> -
>> -        if (--smcb->parent_smcb->children_running > 0)
>> -        {
>> -            /* SM is still deferred */
>> -            return SM_ACTION_DEFERRED;
>> -        }
>> -        else
>> +        gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> +                     "[SM Terminating Child]: (%p) %s:%s  
>> (error_code: %d)\n",
>> +                     smcb,
>> +                     /* skip pvfs2_ */
>> +                     PINT_state_machine_current_machine_name(smcb),
>> +                     PINT_state_machine_current_state_name(smcb),
>> +                     (int32_t)r->error_code);
>> +         assert(smcb->parent_smcb->children_running > 0);
>> +
>> +         my_frame = qlist_entry(
>> +            smcb->frames.next, struct PINT_frame_s, link);
>> +         qlist_for_each_entry(f, &smcb->parent_smcb->frames, link)
>> +         {
>> +             if(my_frame->frame == f->frame)
>> +             {
>> +                 f->error = r->error_code;
>> +                 break;
>> +             }
>> +         }
>> +
>> +        if (--smcb->parent_smcb->children_running <= 0)
>>         {
>>             /* no more child state machines running, so we can
>>              * start up the parent state machine again
>>              */
>>             job_null(0, smcb->parent_smcb, 0, r, &id, smcb->context);
>>         }
>> +        return SM_ACTION_DEFERRED;
>>     }
>>     /* call state machine completion function */
>>     if (smcb->terminate_fn)
>> @@ -127,10 +137,6 @@ PINT_sm_action  
>> PINT_state_machine_invoke(struct PINT_smcb *smcb,
>>         return SM_ERROR;
>>     }
>>
>> -    /* print pre-call debugging info */
>> -    gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -            "SM invoke smcb %p op %d\n",smcb,(smcb)->op);
>> -
>>     state_name = PINT_state_machine_current_state_name(smcb);
>>     machine_name = PINT_state_machine_current_machine_name(smcb);
>>
>> @@ -148,17 +154,10 @@ PINT_sm_action  
>> PINT_state_machine_invoke(struct PINT_smcb *smcb,
>>     switch (retval)
>>     {
>>     case SM_ACTION_TERMINATE :
>> -            gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -                    "SM Terminates (%p)\n", smcb);
>>             smcb->op_terminate = 1;
>>             break;
>>     case SM_ACTION_COMPLETE :
>> -            gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -                    "SM Returns Complete (%p)\n", smcb);
>> -            break;
>>     case SM_ACTION_DEFERRED :
>> -            gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -                    "SM Returns Deferred (%p)\n", smcb);
>>             break;
>>     default :
>>             /* error */
>> @@ -169,7 +168,7 @@ PINT_sm_action PINT_state_machine_invoke(struct  
>> PINT_smcb *smcb,
>>
>>     /* print post-call debugging info */
>>     gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -                 "[SM Exiting]: (%p) %s:%s (error code: %d), (sm  
>> action: %s)\n",
>> +                 "[SM Exiting]: (%p) %s:%s (error code: %d),  
>> (action: %s)\n",
>>                  smcb,
>>                  /* skip pvfs2_ */
>>                  machine_name,
>> @@ -181,9 +180,6 @@ PINT_sm_action PINT_state_machine_invoke(struct  
>> PINT_smcb *smcb,
>>     {
>>         /* start child SMs */
>>         PINT_sm_start_child_frames(smcb);
>> -        gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -                "SM (%p) started %d child frames\n",
>> -                smcb, smcb->children_running);
>>         if (smcb->children_running > 0)
>>             retval = SM_ACTION_DEFERRED;
>>         else
>> @@ -210,7 +206,7 @@ PINT_sm_action PINT_state_machine_start(struct  
>> PINT_smcb *smcb, job_status_s *r)
>>     if (ret == SM_ACTION_COMPLETE || ret == SM_ACTION_TERMINATE)
>>     {
>>         /* keep running until state machine deferrs or terminates */
>> -        ret = PINT_state_machine_next(smcb, r);
>> +        ret = PINT_state_machine_continue(smcb, r);
>>
>>         /* note that if ret == SM_ACTION_TERMINATE, we _don't_ call
>>          * PINT_state_machine_terminate here because that adds the  
>> smcb
>> @@ -239,8 +235,6 @@ PINT_sm_action PINT_state_machine_next(struct  
>> PINT_smcb *smcb, job_status_s *r)
>>         gossip_err("SM next called on invald smcb\n");
>>         return -1;
>>     }
>> -    gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -            "SM next smcb %p op %d\n",smcb,(smcb)->op);
>>     /* loop while invoke of new state returns COMPLETED */
>>     do {
>>         /* loop while returning from nested SM */
>> @@ -254,12 +248,12 @@ PINT_sm_action PINT_state_machine_next(struct  
>> PINT_smcb *smcb, job_status_s *r)
>>                 return -1;
>>             }
>>             transtbl = smcb->current_state->trtbl;
>> -
>> -	    /* for each entry in the transition table there is a return
>> -	    * code followed by a next state pointer to the new state.
>> -	    * This loops through each entry, checking for a match on the
>> -	    * return address, and then sets the new current_state and calls
>> -	    * the new state action function */
>> +
>> +            /* for each entry in the transition table there is a  
>> return
>> +             * code followed by a next state pointer to the new  
>> state.
>> +             * This loops through each entry, checking for a match  
>> on the
>> +             * return address, and then sets the new current_state  
>> and calls
>> +             * the new state action function */
>>             for (i = 0; transtbl[i].return_value != DEFAULT_ERROR; i 
>> ++)
>>             {
>>                 if (transtbl[i].return_value == r->error_code)
>> @@ -296,10 +290,8 @@ PINT_sm_action PINT_state_machine_next(struct  
>> PINT_smcb *smcb, job_status_s *r)
>>                 if(!smcb->current_state ||
>>                    smcb->current_state->trtbl[0].flag == SM_TERM)
>>                 {
>> -                    /* assume nested machine was invoked without
>> -                     * a parent, or nested machine completion  
>> results
>> -                     * in immediate termination
>> -                     */
>> +                    /* assume nested state machine was invoked  
>> without
>> +                     * a parent */
>>                     return SM_ACTION_TERMINATE;
>>                 }
>> 	    }
>> @@ -355,6 +347,8 @@ int PINT_state_machine_locate(struct PINT_smcb  
>> *smcb)
>> {
>>     struct PINT_state_s *current_tmp;
>>     struct PINT_state_machine_s *op_sm;
>> +    const char *state_name;
>> +    const char *machine_name;
>>
>>     /* check for valid inputs */
>>     if (!smcb || smcb->op < 0 || !smcb->op_get_state_machine)
>> @@ -363,7 +357,7 @@ int PINT_state_machine_locate(struct PINT_smcb  
>> *smcb)
>> 	return -PVFS_EINVAL;
>>     }
>>     gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -            "SM locate smcb %p op %d\n",smcb,(smcb)->op);
>> +            "[SM Locating]: (%p) op-id: %d\n",smcb,(smcb)->op);
>>     /* this is a the usage dependant routine to look up the SM */
>>     op_sm = (*smcb->op_get_state_machine)(smcb->op);
>>     if (op_sm != NULL)
>> @@ -379,6 +373,14 @@ int PINT_state_machine_locate(struct PINT_smcb  
>> *smcb)
>>                            current_tmp->action.nested)->first_state;
>> 	}
>>         smcb->current_state = current_tmp;
>> +
>> +        state_name = PINT_state_machine_current_state_name(smcb);
>> +        machine_name =  
>> PINT_state_machine_current_machine_name(smcb);
>> +
>> +        gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> +                     "[SM Locating]: (%p) located: %s:%s\n",
>> +                     smcb, machine_name, state_name);
>> +
>> 	return 1; /* indicates successful locate */
>>     }
>>     gossip_err("State machine not found for operation %d\n",smcb- 
>> >op);
>> @@ -393,8 +395,6 @@ int PINT_state_machine_locate(struct PINT_smcb  
>> *smcb)
>>  */
>> int PINT_smcb_set_op(struct PINT_smcb *smcb, int op)
>> {
>> -    gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -            "SM set op smcb %p op %d\n",smcb,op);
>>     smcb->op = op;
>>     return PINT_state_machine_locate(smcb);
>> }
>> @@ -499,8 +499,6 @@ int PINT_smcb_alloc(
>>     {
>>         return -PVFS_ENOMEM;
>>     }
>> -    gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -            "SM allocate smcb %p op %d\n",*smcb,op);
>>     /* zero out all members */
>>     memset(*smcb, 0, sizeof(struct PINT_smcb));
>>
>> @@ -540,8 +538,6 @@ void PINT_smcb_free(struct PINT_smcb *smcb)
>> {
>>     struct PINT_frame_s *frame_entry, *tmp;
>>     assert(smcb);
>> -    gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -             "SM free smcb %p op %d\n", smcb, smcb->op);
>>     qlist_for_each_entry_safe(frame_entry, tmp, &smcb->frames, link)
>>     {
>>         if (frame_entry->frame && frame_entry->task_id == 0)
>> @@ -564,8 +560,10 @@ void PINT_smcb_free(struct PINT_smcb *smcb)
>>  */
>> static struct PINT_state_s *PINT_pop_state(struct PINT_smcb *smcb)
>> {
>> -    assert(smcb->stackptr > 0);
>> -
>> +    if(smcb->stackptr == 0)
>> +    {
>> +        return NULL;
>> +    }
>>     return smcb->state_stack[--smcb->stackptr];
>> }
>>
>> @@ -611,9 +609,6 @@ void *PINT_sm_frame(struct PINT_smcb *smcb, int  
>> index)
>>             next = next->next;
>>         }
>>         frame_entry = qlist_entry(next, struct PINT_frame_s, link);
>> -        gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -                     "FRAME GET smcb %p index %d -> frame: %p\n",
>> -                     smcb, index, frame_entry->frame);
>>         return frame_entry->frame;
>>     }
>> }
>> @@ -627,9 +622,8 @@ int PINT_sm_push_frame(struct PINT_smcb *smcb,  
>> int task_id, void *frame_p)
>> {
>>     struct PINT_frame_s *newframe;
>>     gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -                 "PUSH FRAME %p onto smcb %p\n",
>> -                 frame_p, smcb);
>> -
>> +                 "[SM Frame PUSH]: (%p) frame: %p\n",
>> +                 smcb, frame_p);
>>     newframe = malloc(sizeof(struct PINT_frame_s));
>>     if(!newframe)
>>     {
>> @@ -638,15 +632,22 @@ int PINT_sm_push_frame(struct PINT_smcb  
>> *smcb, int task_id, void *frame_p)
>>     newframe->task_id = task_id;
>>     newframe->frame = frame_p;
>>     qlist_add(&newframe->link, &smcb->frames);
>> +    smcb->frame_count++;
>>     return 0;
>> }
>>
>> /* Function: PINT_sm_pop_frame
>> - * Params: pointer to an smcb pointer
>> + * Params: smcb - pointer to an smcb pointer
>> + *         task_id - the task id of this frame
>> + *         error_code - the frame's error if there was one.
>> + *         remaining - count of remaining frames on the smcb.
>>  * Returns: frame pointer
>>  * Synopsis: pops a frame pointer from the frame_stack and returns it
>>  */
>> -void *PINT_sm_pop_frame(struct PINT_smcb *smcb, int *error_code)
>> +void *PINT_sm_pop_frame(struct PINT_smcb *smcb,
>> +                        int *task_id,
>> +                        int *error_code,
>> +                        int *remaining)
>> {
>>     struct PINT_frame_s *frame_entry;
>>     void *frame;
>> @@ -656,17 +657,24 @@ void *PINT_sm_pop_frame(struct PINT_smcb  
>> *smcb, int *error_code)
>>         return NULL;
>>     }
>>
>> -    frame_entry = qlist_entry(&smcb->frames.next, struct  
>> PINT_frame_s, link);
>> +    frame_entry = qlist_entry(smcb->frames.next, struct  
>> PINT_frame_s, link);
>>     qlist_del(smcb->frames.next);
>> +    smcb->frame_count--;
>> +
>> +    if(remaining)
>> +    {
>> +        *remaining = smcb->frame_count;
>> +    }
>>
>>     frame = frame_entry->frame;
>>     *error_code = frame_entry->error;
>> +    *task_id = frame_entry->task_id;
>>
>>     free(frame_entry);
>>
>>     gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> -            "POP FRAME %p from smcb %p\n",
>> -            frame, smcb);
>> +            "[SM Frame POP]: (%p) frame: %p\n",
>> +            smcb, frame);
>>     return frame;
>> }
>>
>> @@ -710,11 +718,22 @@ static void PINT_sm_start_child_frames(struct  
>> PINT_smcb *smcb)
>>     struct PINT_smcb *new_sm;
>>     struct PINT_frame_s *frame_entry;
>>     job_status_s r;
>> +    struct qlist_head *f;
>>
>>     assert(smcb);
>>
>> -    qlist_for_each_entry(frame_entry, &smcb->frames, link)
>> +    memset(&r, 0, sizeof(job_status_s));
>> +
>> +    qlist_for_each(f, &smcb->frames)
>>     {
>> +        /* skip the last since its the parent frame */
>> +        if(f->next == &smcb->frames)
>> +        {
>> +            break;
>> +        }
>> +
>> +        frame_entry = qlist_entry(f, struct PINT_frame_s, link);
>> +
>>         /* allocate smcb */
>>         PINT_smcb_alloc(&new_sm, smcb->op, 0, NULL,
>>                 child_sm_frame_terminate, smcb->context);
>> diff --git a/src/common/misc/state-machine.h b/src/common/misc/ 
>> state-machine.h
>> index 86f3220..cddaead 100644
>> --- a/src/common/misc/state-machine.h
>> +++ b/src/common/misc/state-machine.h
>> @@ -68,6 +68,7 @@ typedef struct PINT_smcb
>>     struct PINT_state_s *state_stack[PINT_STATE_STACK_SIZE];
>>
>>     struct qlist_head frames;
>> +    int frame_count;
>>
>>     /* usage specific routinet to look up SM from OP */
>>     struct PINT_state_machine_s *(*op_get_state_machine)(int);
>> @@ -152,14 +153,6 @@ enum {
>> #define SM_STATE_RETURN -1
>> #define SM_NESTED_STATE 1
>>
>> -#define SM_NONE   0
>> -#define SM_NEXT   1
>> -#define SM_RETURN 2
>> -#define SM_EXTERN 3
>> -#define SM_NESTED 5
>> -#define SM_JUMP   6
>> -#define SM_TERMINATE 7
>> -
>> /* Prototypes for functions provided by user */
>> int PINT_state_machine_complete(void *);
>>
>> @@ -196,9 +189,10 @@ int PINT_smcb_alloc(struct PINT_smcb **, int,  
>> int,
>> void PINT_smcb_free(struct PINT_smcb *);
>> void *PINT_sm_frame(struct PINT_smcb *, int);
>> int PINT_sm_push_frame(struct PINT_smcb *smcb, int task_id, void  
>> *frame_p);
>> -void *PINT_sm_pop_frame(struct PINT_smcb *smcb, int *error_code);
>> -
>> -int PINT_sm_pop_error(PINT_smcb *smcb, PVFS_error ret);
>> +void *PINT_sm_pop_frame(struct PINT_smcb *smcb,
>> +                        int *task_id,
>> +                        int *error_code,
>> +                        int *remaining);
>>
>> /* This macro is used in calls to PINT_sm_fram() */
>> #define PINT_FRAME_CURRENT 0
>>
>>
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>



More information about the Pvfs2-developers mailing list