[Pvfs2-developers] Re: pvfs-client segfault

Troy Benjegerdes troy at scl.ameslab.gov
Wed Feb 13 22:17:20 EST 2008


An earlier instance of this got me this backtrace:

(gdb) bt
#0  completion_list_retrieve_completed (op_id_array=0xfff7d710,
   user_ptr_array=0xfff7d310, error_code_array=0xfff7d410, limit=64,
   out_count=0xfff7d2f0) at ../src/client/sysint/client-state-machine.c:141
#1  0x100441b4 in PINT_client_state_machine_testsome (op_id_array=0xfff7d710,
   op_count=0xfff7d2f0, user_ptr_array=0xfff7d310,
   error_code_array=0xfff7d410, timeout_ms=10)
   at ../src/client/sysint/client-state-machine.c:694
#2  0x10010c00 in process_vfs_requests ()
   at ../src/apps/kernel/linux/pvfs2-client-core.c:2943
#3  0x100120f4 in main (argc=<value optimized out>, argv=0xfff7dc74)
   at ../src/apps/kernel/linux/pvfs2-client-core.c:3379
(gdb) print sm_p

Sam Lang wrote:
>
> Troy,
>
> Could you also send the stacktrace from gdb where the segfault 
> occurs?  That's going to be the most useful info for us.
>
> Thanks,
> -sam
>
> On Feb 13, 2008, at 4:24 PM, Troy Benjegerdes wrote:
>
>> http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash
>> or
>> http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash.gz
>>
>> This looks pretty bad:
>>
>> [D 16:10:47.741903] [SM Entering]: (0x10103cf0) sysdev_unexp_sm:post (status: 0)
>> [D 16:10:47.741929] [SM Exiting]: (0x10103cf0) sysdev_unexp_sm:post (error code: 0), (action: DEFERRED)
>> [D 16:10:47.741953] Posted PVFS_DEV_UNEXPECTED (375) (waiting for test)(-1073741839)
>> [D 16:10:47.741977] [-] reposted unexp req [0x10117fc0] due to inlined completion
>> [D 16:10:47.742001] FRAME GET smcb 0x10119bd8 index 0 -> frame: NULL
>>
>> Maybe we should have an assert in this code??
>>
>> .. more info from a gdb trace..
>>
>> (gdb) print smcb
>> $1 = (PINT_smcb *) 0x10119bd8
>> (gdb) print *smcb
>> $2 = {stackptr = 0, current_state = 0x100ea3e4, state_stack = {0x100ea3c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
>>   frames = {next = 0x10119c00, prev = 0x10119c00}, frame_count = 1,
>>   op_get_state_machine = 0x1005cce0 <client_op_state_get_machine>, op = 5, op_id = 0, parent_smcb = 0x0,
>>   op_terminate = 1, op_cancelled = 0, children_running = 0, op_completed = 1, context = 0,
>>   terminate_fn = 0x1005ce74 <client_state_machine_terminate>, user_ptr = 0x0}
>> (gdb) print limit
>> $3 = 64
>> (gdb) print i
>> $4 = 0
>> (gdb) list PINT_sm_frame
>> 586      * Params: pointer to smcb, stack index
>> 587      * Returns: pointer to frame
>> 588      * Synopsis: returns a frame off of the frame stack
>> 589      */
>> 590     void *PINT_sm_frame(struct PINT_smcb *smcb, int index)
>> 591     {
>> 592         struct PINT_frame_s *frame_entry;
>> 593         struct qlist_head *next;
>> 594
>> 595         if(qlist_empty(&smcb->frames))
>> (gdb)
>> 596         {
>> 597             gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
>> 598                          "FRAME GET smcb %p index %d -> frame: NULL\n",
>> 599                          smcb, index);
>> 600             return NULL;
>> 601         }
>> 602         else
>> 603         {
>> 604             int i = 0;
>> 605
>> (gdb)
>> 606             next = smcb->frames.next;
>> 607             while(i < index)
>> 608             {
>> 609                 next = next->next;
>> 610             }
>> 611             frame_entry = qlist_entry(next, struct PINT_frame_s, link);
>> 612             return frame_entry->frame;
>> 613         }
>> 614     }
>> 615
>> (gdb) print smcb->frames
>> $5 = {next = 0x10119c00, prev = 0x10119c00}
>>
>>
>>>
>>> All I get from this is that the frames qlist has a single entry,
>>> state_stack[4].  Not sure how it got so deep into there.  Likely
>>> some sort of memory corruption, or we have a fairly major
>>> undiscovered SM bug on our hands.
>>>
>>> If you can repeat this at will, doing a -g build and running with
>>> all debugging would be especially nice.  Maybe the debug log would
>>> show something curious.
>>>
>>> The other approach is to run under valgrind and cross fingers it
>>> finds something interesting.
>>>
>>>         -- Pete
>>>
>>>> (gdb) info locals
>>>> i = 0
>>>> new_list_index = 0
>>>> tmp_completion_list = {0x0 <repeats 256 times>}
>>>> sm_p = (PINT_client_sm *) 0x0
>>>> __PRETTY_FUNCTION__ = "completion_list_retrieve_completed"
>>>> (gdb) print op_id_array
>>>> $5 = (PVFS_sys_op_id *) 0xfff7d710
>>>> (gdb) print op_id_array[0]
>>>> $7 = 34
>>>>
>>
>>
>
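
On the "maybe we should have an assert" question: in the PINT_sm_frame listing quoted above, the `while(i < index)` loop never advances `i` as shown, so any index above 0 would spin forever, and nothing stops the walk from running past the end of the list. Below is a minimal, self-contained sketch of a bounded lookup that returns NULL when the index is out of range. It is not the PVFS2 code: `struct list_head`, `struct frame_entry`, `list_init`, `list_add_front` and `frame_at` are hypothetical stand-ins for qlist and struct PINT_frame_s, whose real definitions may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, a stand-in for PVFS2's qlist. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Hypothetical frame entry; the real struct PINT_frame_s may differ. */
struct frame_entry {
    struct list_head link;
    void *frame;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Push a node at the front, as a frame stack would. */
static void list_add_front(struct list_head *node, struct list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/*
 * Return the frame at position `index` (0 is the front), or NULL when
 * the index is negative or the list holds fewer than index + 1 entries.
 * The walk stops when it wraps back to the head, so it always terminates.
 */
static void *frame_at(struct list_head *head, int index)
{
    struct list_head *pos;
    int i = 0;

    assert(head != NULL);
    if (index < 0)
        return NULL;

    for (pos = head->next; pos != head; pos = pos->next, i++) {
        if (i == index) {
            /* Recover the containing entry from its embedded link. */
            struct frame_entry *entry = (struct frame_entry *)
                ((char *)pos - offsetof(struct frame_entry, link));
            return entry->frame;
        }
    }
    return NULL;
}
```

A caller would still have to check the result (or assert on it) before dereferencing; the "FRAME GET ... frame: NULL" line in the log suggests the NULL return is what the crashing path ran into.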


