[Pvfs2-developers] Re: pvfs-client segfault
Troy Benjegerdes
troy at scl.ameslab.gov
Wed Feb 13 17:24:13 EST 2008
http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash
or
http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash.gz
This looks pretty bad:
[D 16:10:47.741903] [SM Entering]: (0x10103cf0) sysdev_unexp_sm:post (status: 0)
[D 16:10:47.741929] [SM Exiting]: (0x10103cf0) sysdev_unexp_sm:post (error code: 0), (action: DEFERRED)
[D 16:10:47.741953] Posted PVFS_DEV_UNEXPECTED (375) (waiting for test)(-1073741839)
[D 16:10:47.741977] [-] reposted unexp req [0x10117fc0] due to inlined completion
[D 16:10:47.742001] FRAME GET smcb 0x10119bd8 index 0 -> frame: NULL
Maybe we should have an assert in this code?
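For illustration, a minimal sketch of what such an assert could look like. Everything here is a stand-in: list_head, frame_entry, smcb and frame_get_checked are hypothetical names invented for the example, not the real PVFS2 definitions, and the list helpers only mimic a generic circular doubly linked list.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in circular doubly linked list head (hypothetical, not the PVFS2 qlist). */
struct list_head { struct list_head *next, *prev; };

/* Stand-in frame entry. The link is the first member, so a pointer to the
 * link is also a pointer to the entry. */
struct frame_entry {
    struct list_head link;
    void *frame;
};

/* Stand-in control block holding only the fields the lookup needs. */
struct smcb {
    struct list_head frames;
    int frame_count;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Return the frame at `index`, asserting instead of quietly returning NULL. */
static void *frame_get_checked(struct smcb *s, int index)
{
    struct list_head *pos;
    int i;

    assert(s != NULL);
    assert(index >= 0 && index < s->frame_count);
    assert(!list_empty(&s->frames));   /* count and list must agree */

    pos = s->frames.next;
    for (i = 0; i < index; i++) {      /* advance the counter on every step */
        pos = pos->next;
        assert(pos != &s->frames);     /* never wrap around to the head */
    }
    return ((struct frame_entry *)pos)->frame;
}
```

With checks like these, a lookup on a control block whose frame list is empty stops at the assert rather than handing a NULL frame back to the caller.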
More info from a gdb trace:
(gdb) print smcb
$1 = (PINT_smcb *) 0x10119bd8
(gdb) print *smcb
$2 = {stackptr = 0, current_state = 0x100ea3e4,
  state_stack = {0x100ea3c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
  frames = {next = 0x10119c00, prev = 0x10119c00}, frame_count = 1,
  op_get_state_machine = 0x1005cce0 <client_op_state_get_machine>,
  op = 5, op_id = 0, parent_smcb = 0x0,
  op_terminate = 1, op_cancelled = 0, children_running = 0,
  op_completed = 1, context = 0,
  terminate_fn = 0x1005ce74 <client_state_machine_terminate>,
  user_ptr = 0x0}
(gdb) print limit
$3 = 64
(gdb) print i
$4 = 0
(gdb) list PINT_sm_frame
586 * Params: pointer to smcb, stack index
587 * Returns: pointer to frame
588 * Synopsis: returns a frame off of the frame stack
589 */
590 void *PINT_sm_frame(struct PINT_smcb *smcb, int index)
591 {
592 struct PINT_frame_s *frame_entry;
593 struct qlist_head *next;
594
595 if(qlist_empty(&smcb->frames))
(gdb)
596 {
597 gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
598 "FRAME GET smcb %p index %d -> frame: NULL\n",
599 smcb, index);
600 return NULL;
601 }
602 else
603 {
604 int i = 0;
605
(gdb)
606 next = smcb->frames.next;
607 while(i < index)
608 {
609 next = next->next;
610 }
611 frame_entry = qlist_entry(next, struct PINT_frame_s, link);
612 return frame_entry->frame;
613 }
614 }
615
(gdb) print smcb->frames
$5 = {next = 0x10119c00, prev = 0x10119c00}
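The "frame: NULL" debug line is only printed from the qlist_empty() branch of the listing, so the list tested as empty at that point. A rough way to cross-check that against the printed addresses is sketched below. It rests on assumptions: a 32-bit build where int and every pointer are 4 bytes, fields laid out in the order gdb prints them with no padding, and state_stack being a plain array of 8 pointers. The smcb32 and list_head32 types and both helper functions are hypothetical, written only for this check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit stand-ins: every pointer and int is modeled as 4 bytes (assumption). */
struct list_head32 { uint32_t next, prev; };

/* Leading fields of the control block, in the order gdb printed them. */
struct smcb32 {
    int32_t  stackptr;
    uint32_t current_state;
    uint32_t state_stack[8];
    struct list_head32 frames;
    int32_t  frame_count;
};

/* Address the list head would have inside an smcb located at smcb_addr. */
static uint32_t frames_head_addr(uint32_t smcb_addr)
{
    return smcb_addr + (uint32_t)offsetof(struct smcb32, frames);
}

/* Empty circular list: next and prev both point back at the head itself. */
static int head_looks_empty(uint32_t head_addr, struct list_head32 h)
{
    return h.next == head_addr && h.prev == head_addr;
}
```

Under those assumptions the head sits 40 bytes into the structure, so for smcb at 0x10119bd8 the head address works out to 0x10119c00. That is the same value as frames.next and frames.prev in the print above, which would mean the list points back at itself (empty) even though frame_count is 1.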
>
> All I get from this is that the frames qlist has a single entry,
> state_stack[4]. Not sure how it got so deep into there. Likely
> some sort of memory corruption, or we have a fairly major
> undiscovered SM bug on our hands.
>
> If you can repeat this at will, doing a -g build and running with
> all debugging would be especially nice. Maybe the debug log would
> show something curious.
>
> The other approach is to run under valgrind and cross fingers it
> finds something interesting.
>
> -- Pete
>
>> (gdb) info locals
>> i = 0
>> new_list_index = 0
>> tmp_completion_list = {0x0 <repeats 256 times>}
>> sm_p = (PINT_client_sm *) 0x0
>> __PRETTY_FUNCTION__ = "completion_list_retrieve_completed"
>> (gdb) print op_id_array
>> $5 = (PVFS_sys_op_id *) 0xfff7d710
>> (gdb) print op_id_array[0]
>> $7 = 34
>>