[Pvfs2-developers] Re: pvfs-client segfault
Walter B. Ligon III
walt at CLEMSON.EDU
Mon Feb 18 17:16:52 EST 2008
You found my bug! I was having exactly that bug in the client a year
ago and never could figure out why! Cool!
- Walt
Sam Lang wrote:
>
> I committed a fix to the segfault in the client daemon. The problem
> that my patch introduced was to allow a state machine to complete
> immediately if possible (none of its actions needed to block).
> Unfortunately, the client code didn't handle this case -- we were adding
> the smcb to the completion list without checking for immediate
> completion, and then free-ing the smcb. Later (as Pete pointed out), we
> would pull the (now freed) smcb off the list and try to do something
> with it. Invalid read!
>
> So the commits I just made check if the state machine completed
> immediately and prevent the smcb from ever getting added to the
> completion list. The upside of this patch is that now the client daemon
> should be sending the response downcall directly on completion of the
> state machine, instead of just "posting" a state machine that completes
> immediately, and handling the response later.
>
> With this fix, I was able to copy 100s of files and list them without
> the client daemon falling over.
>
> Troy, Pete: Thanks for the debugging help on this one.
> -sam
>
> On Feb 14, 2008, at 9:54 AM, Samuel Lang wrote:
>
>>
>>
>> On Thu, 14 Feb 2008, Pete Wyckoff wrote:
>>
>>> pw at osc.edu wrote on Thu, 14 Feb 2008 00:19 -0500:
>>>> slang at mcs.anl.gov wrote on Wed, 13 Feb 2008 21:25 -0600:
>>>>> What happens when you restart the client daemon? Does the segfault
>>>>> occur
>>>>> with bmi_tcp?
>>>>
>>>> Yeah, I'm getting the same sort of thing, with TCP. 1 client, 1
>>>> md+data server. 2.6.24-rc6. A few "ls -la /pvfs" will crash
>>>> client-core, and it automatically is restarted. Similar sort of
>>>> backtrace. Valgrind doesn't show anything before where it all goes
>>>> bad in Troy's traces.
>>>>
>>>> ==7517== Invalid read of size 8
>>>> ==7517== at 0x4C6989E: qlist_empty (quicklist.h:117)
>>>> ==7517== by 0x4C697DD: PINT_sm_frame (state-machine-fns.c:595)
>>>> ==7517== by 0x4C270AB: completion_list_retrieve_completed
>>>> (client-state-machine.c:140)
>>>> ==7517== by 0x4C281DB: PINT_client_state_machine_testsome
>>>> (client-state-machine.c:694)
>>>> ==7517== by 0x4C285EB: PVFS_sys_testsome
>>>> (client-state-machine.c:907)
>>>> ==7517== by 0x407BED: process_vfs_requests
>>>> (pvfs2-client-core.c:2943)
>>>> ==7517== by 0x40A284: main (pvfs2-client-core.c:3379)
>>>> ==7517== Address 0x53BD870 is 80 bytes inside a block of size 176
>>>> free'd
>>>> ==7517== at 0x4A0560B: free (vg_replace_malloc.c:233)
>>>> ==7517== by 0x4C696D7: PINT_smcb_free (state-machine-fns.c:551)
>>>> ==7517== by 0x4C2768A: PINT_client_state_machine_post
>>>> (client-state-machine.c:395)
>>>> ==7517== by 0x4C29FE5: PVFS_isys_getattr (sys-getattr.sm:211)
>>>> ==7517== by 0x403D94: post_getattr_request (pvfs2-client-core.c:558)
>>>> ==7517== by 0x408648: handle_unexp_vfs_request
>>>> (pvfs2-client-core.c:2708)
>>>> ==7517== by 0x407D70: process_vfs_requests
>>>> (pvfs2-client-core.c:2990)
>>>> ==7517== by 0x40A284: main (pvfs2-client-core.c:3379)
>>>>
>>>> (Parse the second half of this first:)
>>>>
>>>> handle_unexp_vfs_request goes off to post a getattr.
>>>> PVFS_isys_getattr allocs a new smcb. PINT_client_state_machine_post
>>>> starts the SM and it must have finished immediately through a
>>>> successful acache lookup. PINT_client_state_machine_post frees the
>>>> smcb.
>>>>
>>>> (Now the top half:)
>>>>
>>>> Later testsome decides it has a completed smcb. The same one that
>>>> had been freed as above. Although maybe not related.
>>>>
>>>> This is the CVS head _before_ the big cleanup Sam did today. Are
>>>> we forgetting to initialize smcb->frames somewhere related? Looking
>>>> back for suspicious changes.
>>>>
>>>> There's the memmove() fix on s_completion_list[] by Phil back on 15
>>>> jan, but that's obviously a big fix, and it's probably not getting
>>>> triggered either. And a bunch of locking changes that are harmless.
>>>>
>>>> Needs more debugging.
>>>
>>> It's definitely this commit that causes the problem. Gotta love
>>> git-bisect. Does it offer any clues?
>>>
>>
>> Yep, I'm looking into that. Thanks Pete!
>> -sam
>>
>>> -- Pete
>>>
>>> commit 3d0ba5f2ee107c9cec913d6ba1a56b571c87c2b7
>>> Author: slang <slang>
>>> Date: Thu Feb 7 16:16:35 2008 +0000
>>>
>>> fixes to sm code for parallel jumps.
>>>
>>> diff --git a/src/common/misc/state-machine-fns.c
>>> b/src/common/misc/state-machine-fns.c
>>> index 6098ea0..c512db1 100644
>>> --- a/src/common/misc/state-machine-fns.c
>>> +++ b/src/common/misc/state-machine-fns.c
>>> @@ -70,30 +70,40 @@ int PINT_state_machine_halt(void)
>>> */
>>> int PINT_state_machine_terminate(struct PINT_smcb *smcb, job_status_s
>>> *r)
>>> {
>>> - struct PINT_frame_s *current_frame;
>>> + struct PINT_frame_s *my_frame, *f;
>>> job_id_t id;
>>>
>>> /* notify parent */
>>> if (smcb->parent_smcb)
>>> {
>>> - assert(smcb->parent_smcb->children_running > 0);
>>> -
>>> - current_frame = qlist_entry(
>>> - &smcb->frames.next, struct PINT_frame_s, link);
>>> - current_frame->error = r->error_code;
>>> -
>>> - if (--smcb->parent_smcb->children_running > 0)
>>> - {
>>> - /* SM is still deferred */
>>> - return SM_ACTION_DEFERRED;
>>> - }
>>> - else
>>> + gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> + "[SM Terminating Child]: (%p) %s:%s
>>> (error_code: %d)\n",
>>> + smcb,
>>> + /* skip pvfs2_ */
>>> + PINT_state_machine_current_machine_name(smcb),
>>> + PINT_state_machine_current_state_name(smcb),
>>> + (int32_t)r->error_code);
>>> + assert(smcb->parent_smcb->children_running > 0);
>>> +
>>> + my_frame = qlist_entry(
>>> + smcb->frames.next, struct PINT_frame_s, link);
>>> + qlist_for_each_entry(f, &smcb->parent_smcb->frames, link)
>>> + {
>>> + if(my_frame->frame == f->frame)
>>> + {
>>> + f->error = r->error_code;
>>> + break;
>>> + }
>>> + }
>>> +
>>> + if (--smcb->parent_smcb->children_running <= 0)
>>> {
>>> /* no more child state machines running, so we can
>>> * start up the parent state machine again
>>> */
>>> job_null(0, smcb->parent_smcb, 0, r, &id, smcb->context);
>>> }
>>> + return SM_ACTION_DEFERRED;
>>> }
>>> /* call state machine completion function */
>>> if (smcb->terminate_fn)
>>> @@ -127,10 +137,6 @@ PINT_sm_action PINT_state_machine_invoke(struct
>>> PINT_smcb *smcb,
>>> return SM_ERROR;
>>> }
>>>
>>> - /* print pre-call debugging info */
>>> - gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "SM invoke smcb %p op %d\n",smcb,(smcb)->op);
>>> -
>>> state_name = PINT_state_machine_current_state_name(smcb);
>>> machine_name = PINT_state_machine_current_machine_name(smcb);
>>>
>>> @@ -148,17 +154,10 @@ PINT_sm_action PINT_state_machine_invoke(struct
>>> PINT_smcb *smcb,
>>> switch (retval)
>>> {
>>> case SM_ACTION_TERMINATE :
>>> - gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "SM Terminates (%p)\n", smcb);
>>> smcb->op_terminate = 1;
>>> break;
>>> case SM_ACTION_COMPLETE :
>>> - gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "SM Returns Complete (%p)\n", smcb);
>>> - break;
>>> case SM_ACTION_DEFERRED :
>>> - gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "SM Returns Deferred (%p)\n", smcb);
>>> break;
>>> default :
>>> /* error */
>>> @@ -169,7 +168,7 @@ PINT_sm_action PINT_state_machine_invoke(struct
>>> PINT_smcb *smcb,
>>>
>>> /* print post-call debugging info */
>>> gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "[SM Exiting]: (%p) %s:%s (error code: %d), (sm
>>> action: %s)\n",
>>> + "[SM Exiting]: (%p) %s:%s (error code: %d),
>>> (action: %s)\n",
>>> smcb,
>>> /* skip pvfs2_ */
>>> machine_name,
>>> @@ -181,9 +180,6 @@ PINT_sm_action PINT_state_machine_invoke(struct
>>> PINT_smcb *smcb,
>>> {
>>> /* start child SMs */
>>> PINT_sm_start_child_frames(smcb);
>>> - gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "SM (%p) started %d child frames\n",
>>> - smcb, smcb->children_running);
>>> if (smcb->children_running > 0)
>>> retval = SM_ACTION_DEFERRED;
>>> else
>>> @@ -210,7 +206,7 @@ PINT_sm_action PINT_state_machine_start(struct
>>> PINT_smcb *smcb, job_status_s *r)
>>> if (ret == SM_ACTION_COMPLETE || ret == SM_ACTION_TERMINATE)
>>> {
>>> /* keep running until state machine deferrs or terminates */
>>> - ret = PINT_state_machine_next(smcb, r);
>>> + ret = PINT_state_machine_continue(smcb, r);
>>>
>>> /* note that if ret == SM_ACTION_TERMINATE, we _don't_ call
>>> * PINT_state_machine_terminate here because that adds the smcb
>>> @@ -239,8 +235,6 @@ PINT_sm_action PINT_state_machine_next(struct
>>> PINT_smcb *smcb, job_status_s *r)
>>> gossip_err("SM next called on invald smcb\n");
>>> return -1;
>>> }
>>> - gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "SM next smcb %p op %d\n",smcb,(smcb)->op);
>>> /* loop while invoke of new state returns COMPLETED */
>>> do {
>>> /* loop while returning from nested SM */
>>> @@ -254,12 +248,12 @@ PINT_sm_action PINT_state_machine_next(struct
>>> PINT_smcb *smcb, job_status_s *r)
>>> return -1;
>>> }
>>> transtbl = smcb->current_state->trtbl;
>>> -
>>> - /* for each entry in the transition table there is a return
>>> - * code followed by a next state pointer to the new state.
>>> - * This loops through each entry, checking for a match on the
>>> - * return address, and then sets the new current_state and calls
>>> - * the new state action function */
>>> +
>>> + /* for each entry in the transition table there is a return
>>> + * code followed by a next state pointer to the new state.
>>> + * This loops through each entry, checking for a match
>>> on the
>>> + * return address, and then sets the new current_state
>>> and calls
>>> + * the new state action function */
>>> for (i = 0; transtbl[i].return_value != DEFAULT_ERROR; i++)
>>> {
>>> if (transtbl[i].return_value == r->error_code)
>>> @@ -296,10 +290,8 @@ PINT_sm_action PINT_state_machine_next(struct
>>> PINT_smcb *smcb, job_status_s *r)
>>> if(!smcb->current_state ||
>>> smcb->current_state->trtbl[0].flag == SM_TERM)
>>> {
>>> - /* assume nested machine was invoked without
>>> - * a parent, or nested machine completion results
>>> - * in immediate termination
>>> - */
>>> + /* assume nested state machine was invoked without
>>> + * a parent */
>>> return SM_ACTION_TERMINATE;
>>> }
>>> }
>>> @@ -355,6 +347,8 @@ int PINT_state_machine_locate(struct PINT_smcb
>>> *smcb)
>>> {
>>> struct PINT_state_s *current_tmp;
>>> struct PINT_state_machine_s *op_sm;
>>> + const char *state_name;
>>> + const char *machine_name;
>>>
>>> /* check for valid inputs */
>>> if (!smcb || smcb->op < 0 || !smcb->op_get_state_machine)
>>> @@ -363,7 +357,7 @@ int PINT_state_machine_locate(struct PINT_smcb
>>> *smcb)
>>> return -PVFS_EINVAL;
>>> }
>>> gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "SM locate smcb %p op %d\n",smcb,(smcb)->op);
>>> + "[SM Locating]: (%p) op-id: %d\n",smcb,(smcb)->op);
>>> /* this is a the usage dependant routine to look up the SM */
>>> op_sm = (*smcb->op_get_state_machine)(smcb->op);
>>> if (op_sm != NULL)
>>> @@ -379,6 +373,14 @@ int PINT_state_machine_locate(struct PINT_smcb
>>> *smcb)
>>> current_tmp->action.nested)->first_state;
>>> }
>>> smcb->current_state = current_tmp;
>>> +
>>> + state_name = PINT_state_machine_current_state_name(smcb);
>>> + machine_name = PINT_state_machine_current_machine_name(smcb);
>>> +
>>> + gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> + "[SM Locating]: (%p) located: %s:%s\n",
>>> + smcb, machine_name, state_name);
>>> +
>>> return 1; /* indicates successful locate */
>>> }
>>> gossip_err("State machine not found for operation %d\n",smcb->op);
>>> @@ -393,8 +395,6 @@ int PINT_state_machine_locate(struct PINT_smcb
>>> *smcb)
>>> */
>>> int PINT_smcb_set_op(struct PINT_smcb *smcb, int op)
>>> {
>>> - gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "SM set op smcb %p op %d\n",smcb,op);
>>> smcb->op = op;
>>> return PINT_state_machine_locate(smcb);
>>> }
>>> @@ -499,8 +499,6 @@ int PINT_smcb_alloc(
>>> {
>>> return -PVFS_ENOMEM;
>>> }
>>> - gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "SM allocate smcb %p op %d\n",*smcb,op);
>>> /* zero out all members */
>>> memset(*smcb, 0, sizeof(struct PINT_smcb));
>>>
>>> @@ -540,8 +538,6 @@ void PINT_smcb_free(struct PINT_smcb *smcb)
>>> {
>>> struct PINT_frame_s *frame_entry, *tmp;
>>> assert(smcb);
>>> - gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "SM free smcb %p op %d\n", smcb, smcb->op);
>>> qlist_for_each_entry_safe(frame_entry, tmp, &smcb->frames, link)
>>> {
>>> if (frame_entry->frame && frame_entry->task_id == 0)
>>> @@ -564,8 +560,10 @@ void PINT_smcb_free(struct PINT_smcb *smcb)
>>> */
>>> static struct PINT_state_s *PINT_pop_state(struct PINT_smcb *smcb)
>>> {
>>> - assert(smcb->stackptr > 0);
>>> -
>>> + if(smcb->stackptr == 0)
>>> + {
>>> + return NULL;
>>> + }
>>> return smcb->state_stack[--smcb->stackptr];
>>> }
>>>
>>> @@ -611,9 +609,6 @@ void *PINT_sm_frame(struct PINT_smcb *smcb, int
>>> index)
>>> next = next->next;
>>> }
>>> frame_entry = qlist_entry(next, struct PINT_frame_s, link);
>>> - gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "FRAME GET smcb %p index %d -> frame: %p\n",
>>> - smcb, index, frame_entry->frame);
>>> return frame_entry->frame;
>>> }
>>> }
>>> @@ -627,9 +622,8 @@ int PINT_sm_push_frame(struct PINT_smcb *smcb,
>>> int task_id, void *frame_p)
>>> {
>>> struct PINT_frame_s *newframe;
>>> gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "PUSH FRAME %p onto smcb %p\n",
>>> - frame_p, smcb);
>>> -
>>> + "[SM Frame PUSH]: (%p) frame: %p\n",
>>> + smcb, frame_p);
>>> newframe = malloc(sizeof(struct PINT_frame_s));
>>> if(!newframe)
>>> {
>>> @@ -638,15 +632,22 @@ int PINT_sm_push_frame(struct PINT_smcb *smcb,
>>> int task_id, void *frame_p)
>>> newframe->task_id = task_id;
>>> newframe->frame = frame_p;
>>> qlist_add(&newframe->link, &smcb->frames);
>>> + smcb->frame_count++;
>>> return 0;
>>> }
>>>
>>> /* Function: PINT_sm_pop_frame
>>> - * Params: pointer to an smcb pointer
>>> + * Params: smcb - pointer to an smcb pointer
>>> + * task_id - the task id of this frame
>>> + * error_code - the frame's error if there was one.
>>> + * remaining - count of remaining frames on the smcb.
>>> * Returns: frame pointer
>>> * Synopsis: pops a frame pointer from the frame_stack and returns it
>>> */
>>> -void *PINT_sm_pop_frame(struct PINT_smcb *smcb, int *error_code)
>>> +void *PINT_sm_pop_frame(struct PINT_smcb *smcb,
>>> + int *task_id,
>>> + int *error_code,
>>> + int *remaining)
>>> {
>>> struct PINT_frame_s *frame_entry;
>>> void *frame;
>>> @@ -656,17 +657,24 @@ void *PINT_sm_pop_frame(struct PINT_smcb *smcb,
>>> int *error_code)
>>> return NULL;
>>> }
>>>
>>> - frame_entry = qlist_entry(&smcb->frames.next, struct
>>> PINT_frame_s, link);
>>> + frame_entry = qlist_entry(smcb->frames.next, struct
>>> PINT_frame_s, link);
>>> qlist_del(smcb->frames.next);
>>> + smcb->frame_count--;
>>> +
>>> + if(remaining)
>>> + {
>>> + *remaining = smcb->frame_count;
>>> + }
>>>
>>> frame = frame_entry->frame;
>>> *error_code = frame_entry->error;
>>> + *task_id = frame_entry->task_id;
>>>
>>> free(frame_entry);
>>>
>>> gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> - "POP FRAME %p from smcb %p\n",
>>> - frame, smcb);
>>> + "[SM Frame POP]: (%p) frame: %p\n",
>>> + smcb, frame);
>>> return frame;
>>> }
>>>
>>> @@ -710,11 +718,22 @@ static void PINT_sm_start_child_frames(struct
>>> PINT_smcb *smcb)
>>> struct PINT_smcb *new_sm;
>>> struct PINT_frame_s *frame_entry;
>>> job_status_s r;
>>> + struct qlist_head *f;
>>>
>>> assert(smcb);
>>>
>>> - qlist_for_each_entry(frame_entry, &smcb->frames, link)
>>> + memset(&r, 0, sizeof(job_status_s));
>>> +
>>> + qlist_for_each(f, &smcb->frames)
>>> {
>>> + /* skip the last since its the parent frame */
>>> + if(f->next == &smcb->frames)
>>> + {
>>> + break;
>>> + }
>>> +
>>> + frame_entry = qlist_entry(f, struct PINT_frame_s, link);
>>> +
>>> /* allocate smcb */
>>> PINT_smcb_alloc(&new_sm, smcb->op, 0, NULL,
>>> child_sm_frame_terminate, smcb->context);
>>> diff --git a/src/common/misc/state-machine.h
>>> b/src/common/misc/state-machine.h
>>> index 86f3220..cddaead 100644
>>> --- a/src/common/misc/state-machine.h
>>> +++ b/src/common/misc/state-machine.h
>>> @@ -68,6 +68,7 @@ typedef struct PINT_smcb
>>> struct PINT_state_s *state_stack[PINT_STATE_STACK_SIZE];
>>>
>>> struct qlist_head frames;
>>> + int frame_count;
>>>
>>> /* usage specific routinet to look up SM from OP */
>>> struct PINT_state_machine_s *(*op_get_state_machine)(int);
>>> @@ -152,14 +153,6 @@ enum {
>>> #define SM_STATE_RETURN -1
>>> #define SM_NESTED_STATE 1
>>>
>>> -#define SM_NONE 0
>>> -#define SM_NEXT 1
>>> -#define SM_RETURN 2
>>> -#define SM_EXTERN 3
>>> -#define SM_NESTED 5
>>> -#define SM_JUMP 6
>>> -#define SM_TERMINATE 7
>>> -
>>> /* Prototypes for functions provided by user */
>>> int PINT_state_machine_complete(void *);
>>>
>>> @@ -196,9 +189,10 @@ int PINT_smcb_alloc(struct PINT_smcb **, int, int,
>>> void PINT_smcb_free(struct PINT_smcb *);
>>> void *PINT_sm_frame(struct PINT_smcb *, int);
>>> int PINT_sm_push_frame(struct PINT_smcb *smcb, int task_id, void
>>> *frame_p);
>>> -void *PINT_sm_pop_frame(struct PINT_smcb *smcb, int *error_code);
>>> -
>>> -int PINT_sm_pop_error(PINT_smcb *smcb, PVFS_error ret);
>>> +void *PINT_sm_pop_frame(struct PINT_smcb *smcb,
>>> + int *task_id,
>>> + int *error_code,
>>> + int *remaining);
>>>
>>> /* This macro is used in calls to PINT_sm_fram() */
>>> #define PINT_FRAME_CURRENT 0
>>>
>>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
More information about the Pvfs2-developers
mailing list