[Pvfs2-developers] Re: bmi_ib failure on memfull HCA's

Troy Benjegerdes troy at scl.ameslab.gov
Tue Feb 19 17:59:34 EST 2008


Here's a little more info:

[E 02/19 17:00] max send/recv sge 29 30
[E 02/19 17:01] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 2437.
[E 02/19 17:01] fp_multiqueue_cancel: flow proto cancel called on 0x6216c0
[E 02/19 17:01] handle_io_error: flow proto error cleanup started on 0x6216c0, error_code: -1610613121

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47602573279216 (LWP 4049)]
memcache_deregister (md=<value optimized out>, buflist=0x664ff8) at ../src/io/bmi/bmi_ib/mem.c:317
317             --c->count;
(gdb) list
312
313         gen_mutex_lock(&memcache_device->mutex);
314         for (i=0; i<buflist->num; i++) {
315     #if ENABLE_MEMCACHE
316             memcache_entry_t *c = buflist->memcache[i];
317             --c->count;
318             debug(2,
319                "%s: dec refcount [%d] %p len %lld (via %p len %lld) refcnt now %d",
320                __func__, i, buflist->buf.send[i], lld(buflist->len[i]),
321                c->buf, lld(c->len), c->count);
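
The listing stops at the decrement on line 317, so one plausible reading is that `buflist->memcache[i]` is NULL when the cancel path reaches it; the "faulty address is 0x20" in the earlier server log would be consistent with that if `count` sits at a small offset inside the entry, but that is an inference, not something the trace proves. A minimal sketch of a guarded decrement, assuming simplified stand-in types (the struct layouts and the name `deregister_guarded` are hypothetical, and the `memcache_device->mutex` locking from the real function is omitted):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the bmi_ib types; only the fields used here. */
typedef struct {
    void *buf;
    long long len;
    int count;                    /* reference count */
} memcache_entry_t;

typedef struct {
    int num;
    memcache_entry_t **memcache;  /* one slot per buffer, may be NULL */
} ib_buflist_t;

/*
 * Drop one reference on every cached entry in the list, skipping slots
 * that were never registered or were already released by a cancel.
 * Returns the number of slots that were skipped.
 */
static int deregister_guarded(ib_buflist_t *buflist)
{
    int i, skipped = 0;

    assert(buflist != NULL);
    for (i = 0; i < buflist->num; i++) {
        memcache_entry_t *c = buflist->memcache[i];
        if (c == NULL) {
            ++skipped;
            continue;
        }
        --c->count;
        buflist->memcache[i] = NULL;  /* a second call becomes harmless */
    }
    return skipped;
}
```

A guard like this would only hide the symptom; the open question is still why the cancel path reaches deregistration with an unregistered or already released buflist.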


Troy Benjegerdes wrote:
> Now this is interesting..  This is the memfree hardware, but it looks 
> like the 'bmi_ib: cancel memcache deregister states' fix didn't quite 
> do the right thing.
>
> da0:/scratch/1# tail -f pvfs2-server.log
> [D 02/19 16:25] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
> [E 02/19 16:25] max send/recv sge 29 30
> [E 02/19 16:25] max send/recv sge 29 30
> [E 02/19 16:26] max send/recv sge 29 30
> [E 02/19 16:27] max send/recv sge 29 30
>
>
>
> [E 02/19 16:30] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 9572.
> [E 02/19 16:30] fp_multiqueue_cancel: flow proto cancel called on 0x66cbf0
> [E 02/19 16:30] handle_io_error: flow proto error cleanup started on 0x66cbf0, error_code: -1610613121
> [E 02/19 16:30] PVFS2 server: signal 11, faulty address is 0x20, from 0x429efa
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x4295a7]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_thread_mgr_bmi_cancel+0xa6) [0x43aac6]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42ddb0]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e035]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e4e7]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_flow_cancel+0x42) [0x42d4d2]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_flow_cancel+0x3d) [0x4388cd]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_time_mgr_expire+0x1ca) [0x43b18a]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x45b339]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_invoke+0xb5) [0x446f35]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_next+0xab) [0x4471cb]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_continue+0x1e) [0x446dee]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(main+0xe00) [0x410b90]
>
>
> Pete Wyckoff wrote:
>> kschoche at gmail.com wrote on Tue, 19 Feb 2008 14:53 -0600:
>>> When migrating some of our machines to the production network, going
>>> from memfree Mellanox IB cards to older PCI-X memfull cards, we came
>>> across this error on the server. We're running Debian 2.6.18-5-amd64
>>> on the server, with a powerpc node as the client, which ran the same
>>> test flawlessly against our memfree cards.
>>>
>>>
>>> [D 12:39:44.257923] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
>>> [E 13:53:07.910247] Error: openib_post_sr_rdmaw: ibv_post_send (-1).
>> [..]
>>> Looking at the infiniband driver code, since we don't have a check for
>>> this type of error in pvfs2, and I don't have the IB spec with me right
>>> now, I noticed this error code occurs when:
>>>
>>>                 if (wq_overflow(&qp->sq, nreq, to_mcq(qp->ibv_qp.send_cq))) {
>>>                         ret = -1;
>>>                         *bad_wr = wr;
>>>                         goto out;
>>>                 }
>>>
>>> Looks like a WQ overflow?
>>> I suppose we could debug inside pvfs2 and decode the bad_wr structure
>>> to get more useful information for the future. Do you know if the IB
>>> spec states this as a fatal error?
>>> If it's not a fatal error (though it looks fatal to me), can we
>>> attempt to repost the send with a backed-off/smaller sge list?
>>
>> Sounds like a WQ overflow to me too.  This is in the RDMA code where
>> it is necessary to do multiple RDMAs to satisfy one request,
>> sometimes.  There is not any global checking against the number of
>> available WQs.  I'm not sure why this hasn't come up before.  It
>> would be possible to phase this with an extra couple of states.
>> Just not a very fun thing to have to do!
>>
>> I'm curious what you have for SG support in that hardware.  Can
>> you add some printf around here to show the max_send_sge and
>> max_recv_sge?  I added an example line; maybe it will work:
>>
>>     /* compare the caps that came back against what we already have */
>>     gossip_err("max send/recv sge %d %d\n", att.cap.max_send_sge,
>>                att.cap.max_recv_sge);
>>     if (od->sg_max_len == 0) {
>>         od->sg_max_len = att.cap.max_send_sge;
>>         if (att.cap.max_recv_sge < od->sg_max_len)
>>             od->sg_max_len = att.cap.max_recv_sge;
>>         od->sg_tmp_array = Malloc(od->sg_max_len * sizeof(*od->sg_tmp_array));
>>     } else {
>>         if (att.cap.max_send_sge < od->sg_max_len)
>>             error("%s: new conn has smaller send SG array size %d vs %d",
>>                   __func__, att.cap.max_send_sge, od->sg_max_len);
>>         if (att.cap.max_recv_sge < od->sg_max_len)
>>             error("%s: new conn has smaller recv SG array size %d vs %d",
>>                   __func__, att.cap.max_recv_sge, od->sg_max_len);
>>     }
>>
>>
>
>


