[Pvfs2-developers] Re: bmi_ib failure on memfull HCA's
Troy Benjegerdes
troy at scl.ameslab.gov
Tue Feb 19 17:59:34 EST 2008
Here's a little more info:
[E 02/19 17:00] max send/recv sge 29 30
[E 02/19 17:01] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 2437.
[E 02/19 17:01] fp_multiqueue_cancel: flow proto cancel called on 0x6216c0
[E 02/19 17:01] handle_io_error: flow proto error cleanup started on 0x6216c0, error_code: -1610613121
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47602573279216 (LWP 4049)]
memcache_deregister (md=<value optimized out>, buflist=0x664ff8) at ../src/io/bmi/bmi_ib/mem.c:317
317 --c->count;
(gdb) list
312
313 gen_mutex_lock(&memcache_device->mutex);
314 for (i=0; i<buflist->num; i++) {
315 #if ENABLE_MEMCACHE
316 memcache_entry_t *c = buflist->memcache[i];
317 --c->count;
318 debug(2,
319 "%s: dec refcount [%d] %p len %lld (via %p len %lld) refcnt now %d",
320 __func__, i, buflist->buf.send[i], lld(buflist->len[i]),
321 c->buf, lld(c->len), c->count);
Troy Benjegerdes wrote:
> Now this is interesting. This is the memfree hardware, but it looks
> like the 'bmi_ib: cancel memcache deregister states' fix didn't quite
> do the right thing.
>
> da0:/scratch/1# tail -f pvfs2-server.log
> [D 02/19 16:25] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
> [E 02/19 16:25] max send/recv sge 29 30
> [E 02/19 16:25] max send/recv sge 29 30
> [E 02/19 16:26] max send/recv sge 29 30
> [E 02/19 16:27] max send/recv sge 29 30
>
>
>
> [E 02/19 16:30] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 9572.
> [E 02/19 16:30] fp_multiqueue_cancel: flow proto cancel called on 0x66cbf0
> [E 02/19 16:30] handle_io_error: flow proto error cleanup started on 0x66cbf0, error_code: -1610613121
> [E 02/19 16:30] PVFS2 server: signal 11, faulty address is 0x20, from 0x429efa
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x4295a7]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_thread_mgr_bmi_cancel+0xa6) [0x43aac6]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42ddb0]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e035]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e4e7]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_flow_cancel+0x42) [0x42d4d2]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_flow_cancel+0x3d) [0x4388cd]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_time_mgr_expire+0x1ca) [0x43b18a]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x45b339]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_invoke+0xb5) [0x446f35]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_next+0xab) [0x4471cb]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_continue+0x1e) [0x446dee]
> [E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(main+0xe00) [0x410b90]
>
>
> Pete Wyckoff wrote:
>> kschoche at gmail.com wrote on Tue, 19 Feb 2008 14:53 -0600:
>>> When migrating some of our machines to the production network, going
>>> from memfree Mellanox IB cards to older PCI-X memfull cards, we came
>>> across this error on the server. We're running Debian 2.6.18-5-amd64
>>> on the server and a PowerPC node as the client; the same test ran
>>> flawlessly against our memfree cards.
>>>
>>>
>>> [D 12:39:44.257923] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
>>> [E 13:53:07.910247] Error: openib_post_sr_rdmaw: ibv_post_send (-1).
>> [..]
>>> Since we don't have a check for this type of error in pvfs2, and I
>>> don't have the IB spec with me right now, I looked at the InfiniBand
>>> driver code and noticed this error code occurs when:
>>>
>>>     if (wq_overflow(&qp->sq, nreq,
>>>                     to_mcq(qp->ibv_qp.send_cq))) {
>>>         ret = -1;
>>>         *bad_wr = wr;
>>>         goto out;
>>>     }
>>>
>>> Looks like a WQ overflow?
>>> I suppose we could debug inside pvfs2 and decode the bad_wr structure
>>> to get more useful information in the future. Do you know whether the
>>> IB spec treats this as a fatal error?
>>> If it's not a fatal error (though it looks fatal to me), can we
>>> attempt to repost the send with a backed-off, smaller SGE list?
>>
>> Sounds like a WQ overflow to me too. This is in the RDMA code, where
>> it is sometimes necessary to do multiple RDMAs to satisfy one request.
>> There is no global check against the number of available WQ entries.
>> I'm not sure why this hasn't come up before. It would be possible to
>> split this into phases with a couple of extra states; just not a very
>> fun thing to have to do!
>>
>> I'm curious what SG support you have in that hardware. Can you add
>> a printf around here to show max_send_sge and max_recv_sge? I added
>> an example line; maybe it will work:
>>
>> /* compare the caps that came back against what we already have */
>> gossip_err("max send/recv sge %d %d\n", att.cap.max_send_sge,
>>            att.cap.max_recv_sge);
>> if (od->sg_max_len == 0) {
>>     od->sg_max_len = att.cap.max_send_sge;
>>     if (att.cap.max_recv_sge < od->sg_max_len)
>>         od->sg_max_len = att.cap.max_recv_sge;
>>     od->sg_tmp_array = Malloc(od->sg_max_len *
>>                               sizeof(*od->sg_tmp_array));
>> } else {
>>     if (att.cap.max_send_sge < od->sg_max_len)
>>         error("%s: new conn has smaller send SG array size %d vs %d",
>>               __func__, att.cap.max_send_sge, od->sg_max_len);
>>     if (att.cap.max_recv_sge < od->sg_max_len)
>>         error("%s: new conn has smaller recv SG array size %d vs %d",
>>               __func__, att.cap.max_recv_sge, od->sg_max_len);
>> }
>>
>>
>
>