[Pvfs2-developers] Re: pvfs-client segfault
Sam Lang
slang at mcs.anl.gov
Wed Feb 13 18:28:08 EST 2008
Troy,
Could you also send the stack trace from gdb where the segfault
occurs? That's going to be the most useful info for us.
Thanks,
-sam
On Feb 13, 2008, at 4:24 PM, Troy Benjegerdes wrote:
> http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash
> or
> http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash.gz
>
> This looks pretty bad:
>
> [D 16:10:47.741903] [SM Entering]: (0x10103cf0) sysdev_unexp_sm:post (status: 0)
> [D 16:10:47.741929] [SM Exiting]: (0x10103cf0) sysdev_unexp_sm:post (error code: 0), (action: DEFERRED)
> [D 16:10:47.741953] Posted PVFS_DEV_UNEXPECTED (375) (waiting for test)(-1073741839)
> [D 16:10:47.741977] [-] reposted unexp req [0x10117fc0] due to inlined completion
> [D 16:10:47.742001] FRAME GET smcb 0x10119bd8 index 0 -> frame: NULL
>
> Maybe we should have an assert in this code??
>
> .. more info from a gdb trace..
>
> (gdb) print smcb
> $1 = (PINT_smcb *) 0x10119bd8
> (gdb) print *smcb
> $2 = {stackptr = 0, current_state = 0x100ea3e4, state_stack = {0x100ea3c0,
>   0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, frames = {next = 0x10119c00,
>   prev = 0x10119c00}, frame_count = 1,
>   op_get_state_machine = 0x1005cce0 <client_op_state_get_machine>,
>   op = 5, op_id = 0, parent_smcb = 0x0, op_terminate = 1,
>   op_cancelled = 0, children_running = 0, op_completed = 1, context = 0,
>   terminate_fn = 0x1005ce74 <client_state_machine_terminate>,
>   user_ptr = 0x0}
> (gdb) print limit
> $3 = 64
> (gdb) print i
> $4 = 0
> (gdb) list PINT_sm_frame
> 586      * Params: pointer to smcb, stack index
> 587      * Returns: pointer to frame
> 588      * Synopsis: returns a frame off of the frame stack
> 589      */
> 590     void *PINT_sm_frame(struct PINT_smcb *smcb, int index)
> 591     {
> 592         struct PINT_frame_s *frame_entry;
> 593         struct qlist_head *next;
> 594
> 595         if(qlist_empty(&smcb->frames))
> (gdb)
> 596         {
> 597             gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
> 598                          "FRAME GET smcb %p index %d -> frame: NULL\n",
> 599                          smcb, index);
> 600             return NULL;
> 601         }
> 602         else
> 603         {
> 604             int i = 0;
> 605
> (gdb)
> 606             next = smcb->frames.next;
> 607             while(i < index)
> 608             {
> 609                 next = next->next;
> 610             }
> 611             frame_entry = qlist_entry(next, struct PINT_frame_s, link);
> 612             return frame_entry->frame;
> 613         }
> 614     }
> 615
> (gdb) print smcb->frames
> $5 = {next = 0x10119c00, prev = 0x10119c00}
>
>
>>
>> All I get from this is that the frames qlist has a single entry,
>> state_stack[4]. Not sure how it got so deep into there. Likely
>> some sort of memory corruption, or we have a fairly major
>> undiscovered SM bug on our hands.
>>
>> If you can repeat this at will, doing a -g build and running with
>> all debugging would be especially nice. Maybe the debug log would
>> show something curious.
>>
>> The other approach is to run under valgrind and cross our fingers
>> that it finds something interesting.
>>
>> -- Pete
>>
>>> (gdb) info locals
>>> i = 0
>>> new_list_index = 0
>>> tmp_completion_list = {0x0 <repeats 256 times>}
>>> sm_p = (PINT_client_sm *) 0x0
>>> __PRETTY_FUNCTION__ = "completion_list_retrieve_completed"
>>> (gdb) print op_id_array
>>> $5 = (PVFS_sys_op_id *) 0xfff7d710
>>> (gdb) print op_id_array[0]
>>> $7 = 34
>>>
>
>
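For reference, here is a minimal sketch of PINT_sm_frame with the assert Troy asks about. Two things stand out in the gdb session above. First, as listed, the while loop never increments i, so any index greater than 0 would spin forever; that is not what fired here, since index was 0 and the qlist_empty branch returned NULL (matching the "frame: NULL" debug line). Second, gdb reports frame_count = 1 for the same smcb whose frame list was found empty. The qlist types, the trimmed-down PINT_smcb, and the assumption that frame_count is meant to track the number of entries on the frames list are illustrative stand-ins, not the real PVFS2 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the PVFS2 qlist (assumption: a Linux-style
 * circular doubly linked list whose empty head points at itself). */
struct qlist_head {
    struct qlist_head *next, *prev;
};

/* Recover the containing struct from a pointer to its embedded link. */
#define qlist_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static int qlist_empty(const struct qlist_head *head)
{
    return head->next == head;
}

/* Hypothetical frame entry: payload pointer plus list link. */
struct PINT_frame_s {
    void *frame;
    struct qlist_head link;
};

/* Hypothetical, trimmed-down smcb: only the fields this sketch uses. */
struct PINT_smcb {
    struct qlist_head frames;
    int frame_count;
};

void *PINT_sm_frame(struct PINT_smcb *smcb, int index)
{
    struct qlist_head *next;
    int i;

    assert(smcb != NULL);
    if (qlist_empty(&smcb->frames)) {
        /* Assumes frame_count mirrors the list length; this would trip
         * on the smcb in the gdb dump (empty list, frame_count = 1). */
        assert(smcb->frame_count == 0);
        return NULL;
    }

    /* Refuse to walk past the last frame back onto the list head. */
    assert(index >= 0 && index < smcb->frame_count);

    next = smcb->frames.next;
    for (i = 0; i < index; i++) /* listed loop never advanced i */
        next = next->next;

    return qlist_entry(next, struct PINT_frame_s, link)->frame;
}
```

With NDEBUG defined the asserts compile away, so callers still need to check for a NULL return from PINT_sm_frame.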