[Pvfs2-developers] Re: pvfs-client segfault
Sam Lang
slang at mcs.anl.gov
Wed Feb 13 22:25:02 EST 2008
What happens when you restart the client daemon? Does the segfault
occur with bmi_tcp?
-sam
On Feb 13, 2008, at 9:17 PM, Troy Benjegerdes wrote:
> An earlier instance of this got me this backtrace:
>
> (gdb) bt
> #0 completion_list_retrieve_completed (op_id_array=0xfff7d710,
> user_ptr_array=0xfff7d310, error_code_array=0xfff7d410, limit=64,
> out_count=0xfff7d2f0)
> at ../src/client/sysint/client-state-machine.c:141
> #1 0x100441b4 in PINT_client_state_machine_testsome
> (op_id_array=0xfff7d710,
> op_count=0xfff7d2f0, user_ptr_array=0xfff7d310,
> error_code_array=0xfff7d410, timeout_ms=10)
> at ../src/client/sysint/client-state-machine.c:694
> #2 0x10010c00 in process_vfs_requests ()
> at ../src/apps/kernel/linux/pvfs2-client-core.c:2943
> #3 0x100120f4 in main (argc=<value optimized out>, argv=0xfff7dc74)
> at ../src/apps/kernel/linux/pvfs2-client-core.c:3379
> (gdb) print sm_p
>
> Sam Lang wrote:
>>
>> Troy,
>>
>> Could you also send the stack trace from gdb where the segfault
>> occurs? That's going to be the most useful info for us.
>>
>> Thanks,
>> -sam
>>
>> On Feb 13, 2008, at 4:24 PM, Troy Benjegerdes wrote:
>>
>>> http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash
>>> or
>>> http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash.gz
>>>
>>> This looks pretty bad:
>>>
>>> [D 16:10:47.741903] [SM Entering]: (0x10103cf0)
>>> sysdev_unexp_sm:post (status: 0)
>>> [D 16:10:47.741929] [SM Exiting]: (0x10103cf0)
>>> sysdev_unexp_sm:post (error code: 0), (action: DEFERRED)
>>> [D 16:10:47.741953] Posted PVFS_DEV_UNEXPECTED (375) (waiting for
>>> test)(-1073741839)
>>> [D 16:10:47.741977] [-] reposted unexp req [0x10117fc0] due to
>>> inlined completion
>>> [D 16:10:47.742001] FRAME GET smcb 0x10119bd8 index 0 -> frame: NULL
>>>
>>> Maybe we should have an assert in this code??
>>>
>>> .. more info from a gdb trace..
>>>
>>> (gdb) print smcb
>>> $1 = (PINT_smcb *) 0x10119bd8
>>> (gdb) print *smcb
>>> $2 = {stackptr = 0, current_state = 0x100ea3e4, state_stack =
>>> {0x100ea3c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>>> 0x0}, frames = {next = 0x10119c00, prev = 0x10119c00},
>>> frame_count = 1,
>>> op_get_state_machine = 0x1005cce0 <client_op_state_get_machine>,
>>> op = 5, op_id = 0, parent_smcb = 0x0,
>>> op_terminate = 1, op_cancelled = 0, children_running = 0,
>>> op_completed = 1, context = 0,
>>> terminate_fn = 0x1005ce74 <client_state_machine_terminate>,
>>> user_ptr = 0x0}
>>> (gdb) print limit
>>> $3 = 64
>>> (gdb) print i
>>> $4 = 0
>>> (gdb) list PINT_sm_frame
>>> 586 * Params: pointer to smcb, stack index
>>> 587 * Returns: pointer to frame
>>> 588 * Synopsis: returns a frame off of the frame stack
>>> 589 */
>>> 590 void *PINT_sm_frame(struct PINT_smcb *smcb, int index)
>>> 591 {
>>> 592 struct PINT_frame_s *frame_entry;
>>> 593 struct qlist_head *next;
>>> 594
>>> 595 if(qlist_empty(&smcb->frames))
>>> (gdb)
>>> 596 {
>>> 597 gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>>> 598 "FRAME GET smcb %p index %d -> frame: NULL\n",
>>> 599 smcb, index);
>>> 600 return NULL;
>>> 601 }
>>> 602 else
>>> 603 {
>>> 604 int i = 0;
>>> 605
>>> (gdb)
>>> 606 next = smcb->frames.next;
>>> 607 while(i < index)
>>> 608 {
>>> 609 next = next->next;
>>> 610 }
>>> 611 frame_entry = qlist_entry(next, struct PINT_frame_s, link);
>>> 612 return frame_entry->frame;
>>> 613 }
>>> 614 }
>>> 615
>>> (gdb) print smcb->frames
>>> $5 = {next = 0x10119c00, prev = 0x10119c00}
>>>
>>>
>>>>
>>>> All I get from this is that the frames qlist has a single entry,
>>>> state_stack[4]. Not sure how it got so deep into there. Likely
>>>> some sort of memory corruption, or we have a fairly major
>>>> undiscovered SM bug on our hands.
>>>>
>>>> If you can repeat this at will, doing a -g build and running with
>>>> all debugging would be especially nice. Maybe the debug log would
>>>> show something curious.
>>>>
>>>> The other approach is to run under valgrind and cross fingers it
>>>> finds something interesting.
>>>>
>>>> -- Pete
>>>>
>>>>> (gdb) info locals
>>>>> i = 0
>>>>> new_list_index = 0
>>>>> tmp_completion_list = {0x0 <repeats 256 times>}
>>>>> sm_p = (PINT_client_sm *) 0x0
>>>>> __PRETTY_FUNCTION__ = "completion_list_retrieve_completed"
>>>>> (gdb) print op_id_array
>>>>> $5 = (PVFS_sys_op_id *) 0xfff7d710
>>>>> (gdb) print op_id_array[0]
>>>>> $7 = 34
>>>>>
>>>
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>
>>
>
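For reference, Troy's question in the quoted thread (whether the code should assert when the frame lookup comes back NULL) could be sketched roughly as below. This is only an illustration, not the PVFS2 code: `struct list_head`, `struct frame_entry`, `struct smcb`, `list_init`, `list_add_head`, `frame_get` and `frame_get_checked` are hypothetical stand-ins for the qlist and `PINT_sm_frame` pieces shown in the gdb listing. The idea is that the lookup returns NULL for an empty list or an out-of-range index, and the caller checks the result before dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PVFS2 types; not the real definitions. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

struct frame_entry {
    struct list_head link;
    void *frame;
};

struct smcb {
    struct list_head frames;   /* circular list; empty when it points at itself */
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_head(struct list_head *node, struct list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Return the frame at the given index, or NULL if the list is empty,
 * the index is negative, or the index runs past the last entry. */
static void *frame_get(struct smcb *s, int index)
{
    struct list_head *pos = s->frames.next;
    struct frame_entry *entry;
    int i;

    if (index < 0)
        return NULL;

    /* Stop early if we wrap around to the list head. */
    for (i = 0; i < index && pos != &s->frames; i++)
        pos = pos->next;

    if (pos == &s->frames)
        return NULL;

    entry = (struct frame_entry *)((char *)pos -
                                   offsetof(struct frame_entry, link));
    return entry->frame;
}

/* Caller-side guard: report a missing frame as an error code (-1)
 * instead of handing a NULL pointer to code that will dereference it. */
static int frame_get_checked(struct smcb *s, int index, void **out)
{
    void *frame = frame_get(s, index);

    if (frame == NULL)
        return -1;

    *out = frame;
    return 0;
}
```

A debug build could instead put `assert(frame != NULL)` right after the lookup, which would stop the client at the point where the frame goes missing rather than later at the dereference.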