[Pvfs2-developers] Re: pvfs-client segfault
Troy Benjegerdes
troy at scl.ameslab.gov
Wed Feb 13 22:17:20 EST 2008
An earlier instance of this got me this backtrace:
(gdb) bt
#0 completion_list_retrieve_completed (op_id_array=0xfff7d710,
user_ptr_array=0xfff7d310, error_code_array=0xfff7d410, limit=64,
out_count=0xfff7d2f0) at ../src/client/sysint/client-state-machine.c:141
#1 0x100441b4 in PINT_client_state_machine_testsome
(op_id_array=0xfff7d710,
op_count=0xfff7d2f0, user_ptr_array=0xfff7d310,
error_code_array=0xfff7d410, timeout_ms=10)
at ../src/client/sysint/client-state-machine.c:694
#2 0x10010c00 in process_vfs_requests ()
at ../src/apps/kernel/linux/pvfs2-client-core.c:2943
#3 0x100120f4 in main (argc=<value optimized out>, argv=0xfff7dc74)
at ../src/apps/kernel/linux/pvfs2-client-core.c:3379
(gdb) print sm_p
Sam Lang wrote:
>
> Troy,
>
> Could you also send the stacktrace from gdb where the segfault
> occurs? That's going to be the most useful info for us.
>
> Thanks,
> -sam
>
> On Feb 13, 2008, at 4:24 PM, Troy Benjegerdes wrote:
>
>> http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash
>> or
>> http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash.gz
>>
>> This looks pretty bad:
>>
>> [D 16:10:47.741903] [SM Entering]: (0x10103cf0) sysdev_unexp_sm:post
>> (status: 0)
>> [D 16:10:47.741929] [SM Exiting]: (0x10103cf0) sysdev_unexp_sm:post
>> (error code: 0), (action: DEFERRED)
>> [D 16:10:47.741953] Posted PVFS_DEV_UNEXPECTED (375) (waiting for
>> test)(-1073741839)
>> [D 16:10:47.741977] [-] reposted unexp req [0x10117fc0] due to
>> inlined completion
>> [D 16:10:47.742001] FRAME GET smcb 0x10119bd8 index 0 -> frame: NULL
>>
>> Maybe we should have an assert in this code??
>>
>> .. more info from a gdb trace..
>>
>> (gdb) print smcb
>> $1 = (PINT_smcb *) 0x10119bd8
>> (gdb) print *smcb
>> $2 = {stackptr = 0, current_state = 0x100ea3e4, state_stack =
>> {0x100ea3c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0}, frames = {next = 0x10119c00, prev = 0x10119c00}, frame_count
>> = 1,
>> op_get_state_machine = 0x1005cce0 <client_op_state_get_machine>, op =
>> 5, op_id = 0, parent_smcb = 0x0,
>> op_terminate = 1, op_cancelled = 0, children_running = 0,
>> op_completed = 1, context = 0,
>> terminate_fn = 0x1005ce74 <client_state_machine_terminate>, user_ptr
>> = 0x0}
>> (gdb) print limit
>> $3 = 64
>> (gdb) print i
>> $4 = 0
>> (gdb) list PINT_sm_frame
>> 586 * Params: pointer to smcb, stack index
>> 587 * Returns: pointer to frame
>> 588 * Synopsis: returns a frame off of the frame stack
>> 589 */
>> 590 void *PINT_sm_frame(struct PINT_smcb *smcb, int index)
>> 591 {
>> 592 struct PINT_frame_s *frame_entry;
>> 593 struct qlist_head *next;
>> 594
>> 595 if(qlist_empty(&smcb->frames))
>> (gdb)
>> 596 {
>> 597 gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> 598 "FRAME GET smcb %p index %d -> frame: NULL\n",
>> 599 smcb, index);
>> 600 return NULL;
>> 601 }
>> 602 else
>> 603 {
>> 604 int i = 0;
>> 605
>> (gdb)
>> 606 next = smcb->frames.next;
>> 607 while(i < index)
>> 608 {
>> 609 next = next->next;
>> 610 }
>> 611 frame_entry = qlist_entry(next, struct PINT_frame_s, link);
>> 612 return frame_entry->frame;
>> 613 }
>> 614 }
>> 615
>> (gdb) print smcb->frames
>> $5 = {next = 0x10119c00, prev = 0x10119c00}
>>
>>
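A note on reading the `print smcb->frames` output above. If qlist follows the usual Linux-style circular list convention (an assumption here, since the quicklist header is not shown in this thread), then next == prev holds both for an empty list, where both point back at the head itself, and for a single-entry list, where both point at the entry's link. The two cases can only be told apart by comparing against `print &smcb->frames`. A minimal standalone sketch, using hypothetical names (mini_list, mini_frame) rather than the PVFS quicklist code:

```c
#include <assert.h>

/* Hypothetical stand-in for a Linux-style circular list head. */
struct mini_list {
    struct mini_list *next, *prev;
};

/* Hypothetical stand-in for a frame entry that embeds a list link. */
struct mini_frame {
    struct mini_list link;
    void *frame;
};

/* An empty list is a head whose next and prev point at itself. */
static void mini_list_init(struct mini_list *head)
{
    head->next = head;
    head->prev = head;
}

/* Emptiness is decided by comparing head->next with the head address. */
static int mini_list_empty(const struct mini_list *head)
{
    return head->next == head;
}

/* Insert entry right after head. */
static void mini_list_add(struct mini_list *entry, struct mini_list *head)
{
    assert(entry != 0 && head != 0);
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}
```

After mini_list_init() the head has next == prev == &head; after a single mini_list_add() it still has next == prev, but both now hold the address of the entry's link member.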
>>>
>>> All I get from this is that the frames qlist has a single entry,
>>> state_stack[4]. Not sure how it got so deep into there. Likely
>>> some sort of memory corruption, or we have a fairly major
>>> undiscovered SM bug on our hands.
>>>
>>> If you can repeat this at will, doing a -g build and running with
>>> all debugging would be especially nice. Maybe the debug log would
>>> show something curious.
>>>
>>> The other approach is to run under valgrind and cross fingers it
>>> finds something interesting.
>>>
>>> -- Pete
>>>
>>>> (gdb) info locals
>>>> i = 0
>>>> new_list_index = 0
>>>> tmp_completion_list = {0x0 <repeats 256 times>}
>>>> sm_p = (PINT_client_sm *) 0x0
>>>> __PRETTY_FUNCTION__ = "completion_list_retrieve_completed"
>>>> (gdb) print op_id_array
>>>> $5 = (PVFS_sys_op_id *) 0xfff7d710
>>>> (gdb) print op_id_array[0]
>>>> $7 = 34
>>>>
>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>
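On the "maybe we should have an assert" question quoted above: one possible shape is to check the pointer returned by the frame lookup before it is used. The sketch below is only an illustration with hypothetical types and names (sketch_smcb, sketch_frame_get, sketch_retrieve); it is not the PVFS code path, and it returns an error code so that the caller can decide what to do. Replacing the NULL branch with assert(sm_p != NULL) would give the fail-fast variant for debug builds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified control block: a fixed array of frames. */
struct sketch_smcb {
    void *frames[4];
    int frame_count;
};

/* Returns the frame at index, or NULL when there is no such frame. */
static void *sketch_frame_get(struct sketch_smcb *smcb, int index)
{
    if (index < 0 || index >= smcb->frame_count)
        return NULL;
    return smcb->frames[index];
}

/*
 * Fetches frame 0 and hands it back through out.
 * Returns 0 on success, -1 when the frame is missing, so the caller
 * never dereferences a NULL frame pointer.
 */
static int sketch_retrieve(struct sketch_smcb *smcb, void **out)
{
    void *sm_p;

    /* Programming errors in the caller are caught by assert. */
    assert(smcb != NULL && out != NULL);

    sm_p = sketch_frame_get(smcb, 0);
    if (sm_p == NULL)
        return -1; /* a debug build could assert(sm_p != NULL) here */

    *out = sm_p;
    return 0;
}
```

Whether an assert or an error return is the better fit depends on whether a missing frame is considered a bug in the state machine code or a condition the client should survive.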