[Pvfs2-developers] Re: pvfs-client segfault

Troy Benjegerdes troy at scl.ameslab.gov
Wed Feb 13 17:24:13 EST 2008


http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash
or
http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash.gz

This looks pretty bad; right after the unexpected request is reposted, a frame lookup comes back NULL:

[D 16:10:47.741903] [SM Entering]: (0x10103cf0) sysdev_unexp_sm:post (status: 0)
[D 16:10:47.741929] [SM Exiting]: (0x10103cf0) sysdev_unexp_sm:post (error code: 0), (action: DEFERRED)
[D 16:10:47.741953] Posted PVFS_DEV_UNEXPECTED (375) (waiting for test)(-1073741839)
[D 16:10:47.741977] [-] reposted unexp req [0x10117fc0] due to inlined completion
[D 16:10:47.742001] FRAME GET smcb 0x10119bd8 index 0 -> frame: NULL

Maybe we should have an assert in this code instead of silently returning a NULL frame?

More info from a gdb trace:

(gdb) print smcb
$1 = (PINT_smcb *) 0x10119bd8
(gdb) print *smcb
$2 = {stackptr = 0, current_state = 0x100ea3e4, state_stack = {0x100ea3c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x0}, frames = {next = 0x10119c00, prev = 0x10119c00}, frame_count = 1,
  op_get_state_machine = 0x1005cce0 <client_op_state_get_machine>, op = 5, op_id = 0, parent_smcb = 0x0,
  op_terminate = 1, op_cancelled = 0, children_running = 0, op_completed = 1, context = 0,
  terminate_fn = 0x1005ce74 <client_state_machine_terminate>, user_ptr = 0x0}
(gdb) print limit
$3 = 64
(gdb) print i
$4 = 0
(gdb) list PINT_sm_frame
586      * Params: pointer to smcb, stack index
587      * Returns: pointer to frame
588      * Synopsis: returns a frame off of the frame stack
589      */
590     void *PINT_sm_frame(struct PINT_smcb *smcb, int index)
591     {
592         struct PINT_frame_s *frame_entry;
593         struct qlist_head *next;
594
595         if(qlist_empty(&smcb->frames))
(gdb)
596         {
597             gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
598                          "FRAME GET smcb %p index %d -> frame: NULL\n",
599                          smcb, index);
600             return NULL;
601         }
602         else
603         {
604             int i = 0;
605
(gdb)
606             next = smcb->frames.next;
607             while(i < index)
608             {
609                 next = next->next;
610             }
611             frame_entry = qlist_entry(next, struct PINT_frame_s, link);
612             return frame_entry->frame;
613         }
614     }
615
(gdb) print smcb->frames
$5 = {next = 0x10119c00, prev = 0x10119c00}
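
Separately from the NULL return, the while loop in PINT_sm_frame as listed never increments i, so any call with index > 0 would spin forever. With index 0 here that is not what bit us, but it is worth fixing alongside the assert suggested above. Below is a minimal sketch of a bounded lookup. It assumes a simplified stand-in for qlist; the names list_head, frame_entry, LIST_ENTRY and sm_frame_lookup are hypothetical and not the PVFS2 API, and passing smcb->frame_count in as count is an assumption about how the check would be wired up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PVFS2's qlist_head: a circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct PINT_frame_s. */
struct frame_entry {
    void *frame;
    struct list_head link;
};

/* Recover the containing struct from its embedded list link. */
#define LIST_ENTRY(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Return the frame at position 'index', or NULL if the list is too short.
 * 'count' is the caller's bookkeeping (like smcb->frame_count); in a debug
 * build the assert fires if the list is shorter than that count claims. */
static void *sm_frame_lookup(struct list_head *head, int count, int index)
{
    struct list_head *next = head->next;
    int i = 0;

    /* Stop at the requested index or when we wrap back to the head. */
    while (i < index && next != head) {
        next = next->next;
        i++;                    /* the increment missing from the listing */
    }
    if (next == head) {
        assert(index >= count && "frame list shorter than frame_count");
        return NULL;
    }
    return LIST_ENTRY(next, struct frame_entry, link)->frame;
}
```

With a three-entry list and count = 3, sm_frame_lookup returns the frames at indexes 0 to 2 and NULL for any larger index. With an empty list but count = 1, a lookup of index 0 trips the assert in a debug build instead of handing a NULL frame back to the state machine.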


>
> All I get from this is that the frames qlist has a single entry,
> state_stack[4].  Not sure how it got so deep into there.  Likely
> some sort of memory corruption, or we have a fairly major
> undiscovered SM bug on our hands.
>
> If you can repeat this at will, doing a -g build and running with
> all debugging would be especially nice.  Maybe the debug log would
> show something curious.
>
> The other approach is to run under valgrind and cross our fingers
> that it finds something interesting.
>
> 		-- Pete
>
>> (gdb) info locals
>> i = 0
>> new_list_index = 0
>> tmp_completion_list = {0x0 <repeats 256 times>}
>> sm_p = (PINT_client_sm *) 0x0
>> __PRETTY_FUNCTION__ = "completion_list_retrieve_completed"
>> (gdb) print op_id_array
>> $5 = (PVFS_sys_op_id *) 0xfff7d710
>> (gdb) print op_id_array[0]
>> $7 = 34
>>
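
On the single-entry reading quoted above: in a circular qlist, next == prev holds both for an empty list (both point back at the head) and for a one-entry list (both point at that entry), so it depends on where the head lives. If PINT_smcb is laid out in the order gdb prints it and this is a 32-bit build (the 0xfff7d710 stack address suggests so, but that is an assumption), frames sits at offset 4 + 4 + 8*4 = 40 = 0x28, and 0x10119bd8 + 0x28 = 0x10119c00. That would mean the head points at itself, i.e. an empty list sitting next to frame_count = 1, which would also match the "frame: NULL" debug line. Running print &smcb->frames in gdb would confirm or refute this. The sketch below uses a hypothetical list_head stand-in rather than the real qlist code to show the two shapes, plus a hypothetical check_frame_count() assert that would flag such a count/list mismatch.

```c
#include <assert.h>

/* Hypothetical stand-in for PVFS2's qlist_head: a circular doubly linked list
 * whose head is a sentinel that points to itself when the list is empty. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* next == prev is true for an empty list (both point at the head) and for a
 * one-entry list (both point at that entry), so emptiness has to be tested
 * against the head's own address. */
static int list_is_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Count entries by walking until we arrive back at the head sentinel. */
static int list_length(const struct list_head *head)
{
    int n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Debug-build consistency check: the cached count (like smcb->frame_count)
 * must match what is actually linked on the list. */
static void check_frame_count(const struct list_head *head, int frame_count)
{
    assert(list_length(head) == frame_count &&
           "frame_count disagrees with frames list");
}
```

Calling something like check_frame_count() wherever frames are pushed or popped would catch the count and the list drifting apart at the point it happens, rather than later when a lookup returns NULL.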


