[Pvfs2-developers] Re: pvfs-client segfault

Walter B. Ligon III walt at CLEMSON.EDU
Mon Feb 18 17:16:52 EST 2008


You found my bug!  I was having exactly that bug in the client a year 
ago and never could figure out why!  Cool!

- Walt

Sam Lang wrote:
> 
> I committed a fix to the segfault in the client daemon.  The problem 
> that my patch introduced was to allow a state machine to complete 
> immediately if possible (none of its actions needed to block).  
> Unfortunately, the client code didn't handle this case -- we were adding 
> the smcb to the completion list without checking for immediate 
> completion, and then free-ing the smcb.  Later (as Pete pointed out), we 
> would pull the (now freed) smcb off the list and try to do something 
> with it.  Invalid read!
> 
> So the commits I just made check if the state machine completed 
> immediately and prevent the smcb from ever getting added to the 
> completion list.  The upside of this patch is that now the client daemon 
> should be sending the response downcall directly on completion of the 
> state machine, instead of just "posting" a state machine that completes 
> immediately, and handling the response later.
> 
> With this fix, I was able to copy 100s of files and list them without 
> the client daemon falling over.
> 
> Troy, Pete:  Thanks for the debugging help on this one.
> -sam
> 
> On Feb 14, 2008, at 9:54 AM, Samuel Lang wrote:
> 
>>
>>
>> On Thu, 14 Feb 2008, Pete Wyckoff wrote:
>>
>>> pw at osc.edu wrote on Thu, 14 Feb 2008 00:19 -0500:
>>>> slang at mcs.anl.gov wrote on Wed, 13 Feb 2008 21:25 -0600:
>>>>> What happens when you restart the client daemon?  Does the segfault 
>>>>> occur
>>>>> with bmi_tcp?
>>>>
>>>> Yeah, I'm getting the same sort of thing, with TCP.  1 client, 1
>>>> md+data server.  2.6.24-rc6.  A few "ls -la /pvfs" will crash
>>>> client-core, and it automatically is restarted.  Similar sort of
>>>> backtrace.  Valgrind doesn't show anything before where it all goes
>>>> bad in Troy's traces.
>>>>
>>>> ==7517== Invalid read of size 8
>>>> ==7517==    at 0x4C6989E: qlist_empty (quicklist.h:117)
>>>> ==7517==    by 0x4C697DD: PINT_sm_frame (state-machine-fns.c:595)
>>>> ==7517==    by 0x4C270AB: completion_list_retrieve_completed 
>>>> (client-state-machine.c:140)
>>>> ==7517==    by 0x4C281DB: PINT_client_state_machine_testsome 
>>>> (client-state-machine.c:694)
>>>> ==7517==    by 0x4C285EB: PVFS_sys_testsome 
>>>> (client-state-machine.c:907)
>>>> ==7517==    by 0x407BED: process_vfs_requests 
>>>> (pvfs2-client-core.c:2943)
>>>> ==7517==    by 0x40A284: main (pvfs2-client-core.c:3379)
>>>> ==7517==  Address 0x53BD870 is 80 bytes inside a block of size 176 
>>>> free'd
>>>> ==7517==    at 0x4A0560B: free (vg_replace_malloc.c:233)
>>>> ==7517==    by 0x4C696D7: PINT_smcb_free (state-machine-fns.c:551)
>>>> ==7517==    by 0x4C2768A: PINT_client_state_machine_post 
>>>> (client-state-machine.c:395)
>>>> ==7517==    by 0x4C29FE5: PVFS_isys_getattr (sys-getattr.sm:211)
>>>> ==7517==    by 0x403D94: post_getattr_request (pvfs2-client-core.c:558)
>>>> ==7517==    by 0x408648: handle_unexp_vfs_request 
>>>> (pvfs2-client-core.c:2708)
>>>> ==7517==    by 0x407D70: process_vfs_requests 
>>>> (pvfs2-client-core.c:2990)
>>>> ==7517==    by 0x40A284: main (pvfs2-client-core.c:3379)
>>>>
>>>> (Parse the second half of this first:)
>>>>
>>>> handle_unexp_vfs_request goes off to post a getattr.
>>>> PVFS_isys_getattr allocs a new smcb.  PINT_client_state_machine_post
>>>> starts the SM and it must have finished immediately through a
>>>> successful acache lookup.  PINT_client_state_machine_post frees the
>>>> smcb.
>>>>
>>>> (Now the top half:)
>>>>
>>>> Later testsome decides it has a completed smcb.  The same one that
>>>> had been freed as above.  Although maybe not related.
>>>>
>>>> This is the CVS head _before_ the big cleanup Sam did today.  Are
>>>> we forgetting to initialize smcb->frames somewhere related?  Looking
>>>> back for suspicious changes.
>>>>
>>>> There's the memmove() fix on s_completion_list[] by Phil back on 15
>>>> jan, but that's obviously a big fix, and it's probably not getting
>>>> triggered either.  And a bunch of locking changes that are harmless.
>>>>
>>>> Needs more debugging.
>>>
>>> It's definitely this commit that causes the problem.  Gotta love
>>> git-bisect.  Does it offer any clues?
>>>
>>
>> Yep, I'm looking into that.  Thanks Pete!
>> -sam
>>
>>>         -- Pete
>>>
>>> commit 3d0ba5f2ee107c9cec913d6ba1a56b571c87c2b7
>>> Author: slang <slang>
>>> Date:   Thu Feb 7 16:16:35 2008 +0000
>>>
>>>    fixes to sm code for parallel jumps.
>>>
>>> diff --git a/src/common/misc/state-machine-fns.c 
>>> b/src/common/misc/state-machine-fns.c
>>> index 6098ea0..c512db1 100644
>>> --- a/src/common/misc/state-machine-fns.c
>>> +++ b/src/common/misc/state-machine-fns.c
>>> @@ -70,30 +70,40 @@ int PINT_state_machine_halt(void)
>>>  */
>>> int PINT_state_machine_terminate(struct PINT_smcb *smcb, job_status_s 
>>> *r)
>>> {
>>> -    struct PINT_frame_s *current_frame;
>>> +    struct PINT_frame_s *my_frame, *f;
>>>     job_id_t id;
>>>
>>>     /* notify parent */
>>>     if (smcb->parent_smcb)
>>>     {
>>> -        assert(smcb->parent_smcb->children_running > 0);
>>> -
>>> -        current_frame = qlist_entry(
>>> -            &smcb->frames.next, struct PINT_frame_s, link);
>>> -        current_frame->error = r->error_code;
>>> -
>>> -        if (--smcb->parent_smcb->children_running > 0)
>>> -        {
>>> -            /* SM is still deferred */
>>> -            return SM_ACTION_DEFERRED;
>>> -        }
>>> -        else
>>> +        gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> +                     "[SM Terminating Child]: (%p) %s:%s 
>>> (error_code: %d)\n",
>>> +                     smcb,
>>> +                     /* skip pvfs2_ */
>>> +                     PINT_state_machine_current_machine_name(smcb),
>>> +                     PINT_state_machine_current_state_name(smcb),
>>> +                     (int32_t)r->error_code);
>>> +         assert(smcb->parent_smcb->children_running > 0);
>>> +
>>> +         my_frame = qlist_entry(
>>> +            smcb->frames.next, struct PINT_frame_s, link);
>>> +         qlist_for_each_entry(f, &smcb->parent_smcb->frames, link)
>>> +         {
>>> +             if(my_frame->frame == f->frame)
>>> +             {
>>> +                 f->error = r->error_code;
>>> +                 break;
>>> +             }
>>> +         }
>>> +
>>> +        if (--smcb->parent_smcb->children_running <= 0)
>>>         {
>>>             /* no more child state machines running, so we can
>>>              * start up the parent state machine again
>>>              */
>>>             job_null(0, smcb->parent_smcb, 0, r, &id, smcb->context);
>>>         }
>>> +        return SM_ACTION_DEFERRED;
>>>     }
>>>     /* call state machine completion function */
>>>     if (smcb->terminate_fn)
>>> @@ -127,10 +137,6 @@ PINT_sm_action PINT_state_machine_invoke(struct 
>>> PINT_smcb *smcb,
>>>         return SM_ERROR;
>>>     }
>>>
>>> -    /* print pre-call debugging info */
>>> -    gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -            "SM invoke smcb %p op %d\n",smcb,(smcb)->op);
>>> -
>>>     state_name = PINT_state_machine_current_state_name(smcb);
>>>     machine_name = PINT_state_machine_current_machine_name(smcb);
>>>
>>> @@ -148,17 +154,10 @@ PINT_sm_action PINT_state_machine_invoke(struct 
>>> PINT_smcb *smcb,
>>>     switch (retval)
>>>     {
>>>     case SM_ACTION_TERMINATE :
>>> -            gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -                    "SM Terminates (%p)\n", smcb);
>>>             smcb->op_terminate = 1;
>>>             break;
>>>     case SM_ACTION_COMPLETE :
>>> -            gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -                    "SM Returns Complete (%p)\n", smcb);
>>> -            break;
>>>     case SM_ACTION_DEFERRED :
>>> -            gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -                    "SM Returns Deferred (%p)\n", smcb);
>>>             break;
>>>     default :
>>>             /* error */
>>> @@ -169,7 +168,7 @@ PINT_sm_action PINT_state_machine_invoke(struct 
>>> PINT_smcb *smcb,
>>>
>>>     /* print post-call debugging info */
>>>     gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -                 "[SM Exiting]: (%p) %s:%s (error code: %d), (sm 
>>> action: %s)\n",
>>> +                 "[SM Exiting]: (%p) %s:%s (error code: %d), 
>>> (action: %s)\n",
>>>                  smcb,
>>>                  /* skip pvfs2_ */
>>>                  machine_name,
>>> @@ -181,9 +180,6 @@ PINT_sm_action PINT_state_machine_invoke(struct 
>>> PINT_smcb *smcb,
>>>     {
>>>         /* start child SMs */
>>>         PINT_sm_start_child_frames(smcb);
>>> -        gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -                "SM (%p) started %d child frames\n",
>>> -                smcb, smcb->children_running);
>>>         if (smcb->children_running > 0)
>>>             retval = SM_ACTION_DEFERRED;
>>>         else
>>> @@ -210,7 +206,7 @@ PINT_sm_action PINT_state_machine_start(struct 
>>> PINT_smcb *smcb, job_status_s *r)
>>>     if (ret == SM_ACTION_COMPLETE || ret == SM_ACTION_TERMINATE)
>>>     {
>>>         /* keep running until state machine deferrs or terminates */
>>> -        ret = PINT_state_machine_next(smcb, r);
>>> +        ret = PINT_state_machine_continue(smcb, r);
>>>
>>>         /* note that if ret == SM_ACTION_TERMINATE, we _don't_ call
>>>          * PINT_state_machine_terminate here because that adds the smcb
>>> @@ -239,8 +235,6 @@ PINT_sm_action PINT_state_machine_next(struct 
>>> PINT_smcb *smcb, job_status_s *r)
>>>         gossip_err("SM next called on invald smcb\n");
>>>         return -1;
>>>     }
>>> -    gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -            "SM next smcb %p op %d\n",smcb,(smcb)->op);
>>>     /* loop while invoke of new state returns COMPLETED */
>>>     do {
>>>         /* loop while returning from nested SM */
>>> @@ -254,12 +248,12 @@ PINT_sm_action PINT_state_machine_next(struct 
>>> PINT_smcb *smcb, job_status_s *r)
>>>                 return -1;
>>>             }
>>>             transtbl = smcb->current_state->trtbl;
>>> -
>>> -        /* for each entry in the transition table there is a return
>>> -        * code followed by a next state pointer to the new state.
>>> -        * This loops through each entry, checking for a match on the
>>> -        * return address, and then sets the new current_state and calls
>>> -        * the new state action function */
>>> +
>>> +            /* for each entry in the transition table there is a return
>>> +             * code followed by a next state pointer to the new state.
>>> +             * This loops through each entry, checking for a match 
>>> on the
>>> +             * return address, and then sets the new current_state 
>>> and calls
>>> +             * the new state action function */
>>>             for (i = 0; transtbl[i].return_value != DEFAULT_ERROR; i++)
>>>             {
>>>                 if (transtbl[i].return_value == r->error_code)
>>> @@ -296,10 +290,8 @@ PINT_sm_action PINT_state_machine_next(struct 
>>> PINT_smcb *smcb, job_status_s *r)
>>>                 if(!smcb->current_state ||
>>>                    smcb->current_state->trtbl[0].flag == SM_TERM)
>>>                 {
>>> -                    /* assume nested machine was invoked without
>>> -                     * a parent, or nested machine completion results
>>> -                     * in immediate termination
>>> -                     */
>>> +                    /* assume nested state machine was invoked without
>>> +                     * a parent */
>>>                     return SM_ACTION_TERMINATE;
>>>                 }
>>>         }
>>> @@ -355,6 +347,8 @@ int PINT_state_machine_locate(struct PINT_smcb 
>>> *smcb)
>>> {
>>>     struct PINT_state_s *current_tmp;
>>>     struct PINT_state_machine_s *op_sm;
>>> +    const char *state_name;
>>> +    const char *machine_name;
>>>
>>>     /* check for valid inputs */
>>>     if (!smcb || smcb->op < 0 || !smcb->op_get_state_machine)
>>> @@ -363,7 +357,7 @@ int PINT_state_machine_locate(struct PINT_smcb 
>>> *smcb)
>>>     return -PVFS_EINVAL;
>>>     }
>>>     gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -            "SM locate smcb %p op %d\n",smcb,(smcb)->op);
>>> +            "[SM Locating]: (%p) op-id: %d\n",smcb,(smcb)->op);
>>>     /* this is a the usage dependant routine to look up the SM */
>>>     op_sm = (*smcb->op_get_state_machine)(smcb->op);
>>>     if (op_sm != NULL)
>>> @@ -379,6 +373,14 @@ int PINT_state_machine_locate(struct PINT_smcb 
>>> *smcb)
>>>                            current_tmp->action.nested)->first_state;
>>>     }
>>>         smcb->current_state = current_tmp;
>>> +
>>> +        state_name = PINT_state_machine_current_state_name(smcb);
>>> +        machine_name = PINT_state_machine_current_machine_name(smcb);
>>> +
>>> +        gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> +                     "[SM Locating]: (%p) located: %s:%s\n",
>>> +                     smcb, machine_name, state_name);
>>> +
>>>     return 1; /* indicates successful locate */
>>>     }
>>>     gossip_err("State machine not found for operation %d\n",smcb->op);
>>> @@ -393,8 +395,6 @@ int PINT_state_machine_locate(struct PINT_smcb 
>>> *smcb)
>>>  */
>>> int PINT_smcb_set_op(struct PINT_smcb *smcb, int op)
>>> {
>>> -    gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -            "SM set op smcb %p op %d\n",smcb,op);
>>>     smcb->op = op;
>>>     return PINT_state_machine_locate(smcb);
>>> }
>>> @@ -499,8 +499,6 @@ int PINT_smcb_alloc(
>>>     {
>>>         return -PVFS_ENOMEM;
>>>     }
>>> -    gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -            "SM allocate smcb %p op %d\n",*smcb,op);
>>>     /* zero out all members */
>>>     memset(*smcb, 0, sizeof(struct PINT_smcb));
>>>
>>> @@ -540,8 +538,6 @@ void PINT_smcb_free(struct PINT_smcb *smcb)
>>> {
>>>     struct PINT_frame_s *frame_entry, *tmp;
>>>     assert(smcb);
>>> -    gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -             "SM free smcb %p op %d\n", smcb, smcb->op);
>>>     qlist_for_each_entry_safe(frame_entry, tmp, &smcb->frames, link)
>>>     {
>>>         if (frame_entry->frame && frame_entry->task_id == 0)
>>> @@ -564,8 +560,10 @@ void PINT_smcb_free(struct PINT_smcb *smcb)
>>>  */
>>> static struct PINT_state_s *PINT_pop_state(struct PINT_smcb *smcb)
>>> {
>>> -    assert(smcb->stackptr > 0);
>>> -
>>> +    if(smcb->stackptr == 0)
>>> +    {
>>> +        return NULL;
>>> +    }
>>>     return smcb->state_stack[--smcb->stackptr];
>>> }
>>>
>>> @@ -611,9 +609,6 @@ void *PINT_sm_frame(struct PINT_smcb *smcb, int 
>>> index)
>>>             next = next->next;
>>>         }
>>>         frame_entry = qlist_entry(next, struct PINT_frame_s, link);
>>> -        gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -                     "FRAME GET smcb %p index %d -> frame: %p\n",
>>> -                     smcb, index, frame_entry->frame);
>>>         return frame_entry->frame;
>>>     }
>>> }
>>> @@ -627,9 +622,8 @@ int PINT_sm_push_frame(struct PINT_smcb *smcb, 
>>> int task_id, void *frame_p)
>>> {
>>>     struct PINT_frame_s *newframe;
>>>     gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -                 "PUSH FRAME %p onto smcb %p\n",
>>> -                 frame_p, smcb);
>>> -
>>> +                 "[SM Frame PUSH]: (%p) frame: %p\n",
>>> +                 smcb, frame_p);
>>>     newframe = malloc(sizeof(struct PINT_frame_s));
>>>     if(!newframe)
>>>     {
>>> @@ -638,15 +632,22 @@ int PINT_sm_push_frame(struct PINT_smcb *smcb, 
>>> int task_id, void *frame_p)
>>>     newframe->task_id = task_id;
>>>     newframe->frame = frame_p;
>>>     qlist_add(&newframe->link, &smcb->frames);
>>> +    smcb->frame_count++;
>>>     return 0;
>>> }
>>>
>>> /* Function: PINT_sm_pop_frame
>>> - * Params: pointer to an smcb pointer
>>> + * Params: smcb - pointer to an smcb pointer
>>> + *         task_id - the task id of this frame
>>> + *         error_code - the frame's error if there was one.
>>> + *         remaining - count of remaining frames on the smcb.
>>>  * Returns: frame pointer
>>>  * Synopsis: pops a frame pointer from the frame_stack and returns it
>>>  */
>>> -void *PINT_sm_pop_frame(struct PINT_smcb *smcb, int *error_code)
>>> +void *PINT_sm_pop_frame(struct PINT_smcb *smcb,
>>> +                        int *task_id,
>>> +                        int *error_code,
>>> +                        int *remaining)
>>> {
>>>     struct PINT_frame_s *frame_entry;
>>>     void *frame;
>>> @@ -656,17 +657,24 @@ void *PINT_sm_pop_frame(struct PINT_smcb *smcb, 
>>> int *error_code)
>>>         return NULL;
>>>     }
>>>
>>> -    frame_entry = qlist_entry(&smcb->frames.next, struct 
>>> PINT_frame_s, link);
>>> +    frame_entry = qlist_entry(smcb->frames.next, struct 
>>> PINT_frame_s, link);
>>>     qlist_del(smcb->frames.next);
>>> +    smcb->frame_count--;
>>> +
>>> +    if(remaining)
>>> +    {
>>> +        *remaining = smcb->frame_count;
>>> +    }
>>>
>>>     frame = frame_entry->frame;
>>>     *error_code = frame_entry->error;
>>> +    *task_id = frame_entry->task_id;
>>>
>>>     free(frame_entry);
>>>
>>>     gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> -            "POP FRAME %p from smcb %p\n",
>>> -            frame, smcb);
>>> +            "[SM Frame POP]: (%p) frame: %p\n",
>>> +            smcb, frame);
>>>     return frame;
>>> }
>>>
>>> @@ -710,11 +718,22 @@ static void PINT_sm_start_child_frames(struct 
>>> PINT_smcb *smcb)
>>>     struct PINT_smcb *new_sm;
>>>     struct PINT_frame_s *frame_entry;
>>>     job_status_s r;
>>> +    struct qlist_head *f;
>>>
>>>     assert(smcb);
>>>
>>> -    qlist_for_each_entry(frame_entry, &smcb->frames, link)
>>> +    memset(&r, 0, sizeof(job_status_s));
>>> +
>>> +    qlist_for_each(f, &smcb->frames)
>>>     {
>>> +        /* skip the last since its the parent frame */
>>> +        if(f->next == &smcb->frames)
>>> +        {
>>> +            break;
>>> +        }
>>> +
>>> +        frame_entry = qlist_entry(f, struct PINT_frame_s, link);
>>> +
>>>         /* allocate smcb */
>>>         PINT_smcb_alloc(&new_sm, smcb->op, 0, NULL,
>>>                 child_sm_frame_terminate, smcb->context);
>>> diff --git a/src/common/misc/state-machine.h 
>>> b/src/common/misc/state-machine.h
>>> index 86f3220..cddaead 100644
>>> --- a/src/common/misc/state-machine.h
>>> +++ b/src/common/misc/state-machine.h
>>> @@ -68,6 +68,7 @@ typedef struct PINT_smcb
>>>     struct PINT_state_s *state_stack[PINT_STATE_STACK_SIZE];
>>>
>>>     struct qlist_head frames;
>>> +    int frame_count;
>>>
>>>     /* usage specific routinet to look up SM from OP */
>>>     struct PINT_state_machine_s *(*op_get_state_machine)(int);
>>> @@ -152,14 +153,6 @@ enum {
>>> #define SM_STATE_RETURN -1
>>> #define SM_NESTED_STATE 1
>>>
>>> -#define SM_NONE   0
>>> -#define SM_NEXT   1
>>> -#define SM_RETURN 2
>>> -#define SM_EXTERN 3
>>> -#define SM_NESTED 5
>>> -#define SM_JUMP   6
>>> -#define SM_TERMINATE 7
>>> -
>>> /* Prototypes for functions provided by user */
>>> int PINT_state_machine_complete(void *);
>>>
>>> @@ -196,9 +189,10 @@ int PINT_smcb_alloc(struct PINT_smcb **, int, int,
>>> void PINT_smcb_free(struct PINT_smcb *);
>>> void *PINT_sm_frame(struct PINT_smcb *, int);
>>> int PINT_sm_push_frame(struct PINT_smcb *smcb, int task_id, void 
>>> *frame_p);
>>> -void *PINT_sm_pop_frame(struct PINT_smcb *smcb, int *error_code);
>>> -
>>> -int PINT_sm_pop_error(PINT_smcb *smcb, PVFS_error ret);
>>> +void *PINT_sm_pop_frame(struct PINT_smcb *smcb,
>>> +                        int *task_id,
>>> +                        int *error_code,
>>> +                        int *remaining);
>>>
>>> /* This macro is used in calls to PINT_sm_fram() */
>>> #define PINT_FRAME_CURRENT 0
>>>
>>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
> 
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

-- 
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University


More information about the Pvfs2-developers mailing list