[Pvfs2-developers] Re: bmi_ib failure on memfull HCA's
Troy Benjegerdes
troy at scl.ameslab.gov
Tue Feb 19 17:32:02 EST 2008
Now this is interesting. This is the memfree hardware, but it looks
like the 'bmi_ib: cancel memcache deregister states' fix didn't quite do
the right thing.
da0:/scratch/1# tail -f pvfs2-server.log
[D 02/19 16:25] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
[E 02/19 16:25] max send/recv sge 29 30
[E 02/19 16:25] max send/recv sge 29 30
[E 02/19 16:26] max send/recv sge 29 30
[E 02/19 16:27] max send/recv sge 29 30
[E 02/19 16:30] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 9572.
[E 02/19 16:30] fp_multiqueue_cancel: flow proto cancel called on 0x66cbf0
[E 02/19 16:30] handle_io_error: flow proto error cleanup started on 0x66cbf0, error_code: -1610613121
[E 02/19 16:30] PVFS2 server: signal 11, faulty address is 0x20, from 0x429efa
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x4295a7]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_thread_mgr_bmi_cancel+0xa6) [0x43aac6]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42ddb0]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e035]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e4e7]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_flow_cancel+0x42) [0x42d4d2]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_flow_cancel+0x3d) [0x4388cd]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_time_mgr_expire+0x1ca) [0x43b18a]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x45b339]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_invoke+0xb5) [0x446f35]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_next+0xab) [0x4471cb]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_continue+0x1e) [0x446dee]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(main+0xe00) [0x410b90]
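
A faulty address of 0x20 is what one would expect if memcache_deregister
read a member 0x20 bytes into a structure through a NULL pointer, for
example when the cancel path reaches a buffer whose cache entry was
never set up or has already been released. That is only a guess from the
trace, not something confirmed against the source. A minimal sketch of
that failure mode and a guard, where struct fake_entry, struct fake_slot
and guarded_deregister are hypothetical names and not the real bmi_ib
code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, chosen only so that one member sits at offset 0x20;
 * this is not the real bmi_ib memcache entry. */
struct fake_entry {
    uint64_t a, b, c, d;   /* 4 * 8 bytes = 0x20 */
    int      count;        /* lives at offset 0x20 */
};

/* Hypothetical per-buffer slot holding a pointer to its cache entry. */
struct fake_slot {
    struct fake_entry *entry;   /* NULL if never registered or already dropped */
};

/* Drop one reference, tolerating a slot whose entry is already gone, as
 * could happen if a cancel repeats the normal deregister path.
 * Returns 1 if a reference was dropped, 0 if there was nothing to do. */
static int guarded_deregister(struct fake_slot *s)
{
    if (s == NULL || s->entry == NULL)
        return 0;          /* reading s->entry->count here would touch 0x20 */
    assert(s->entry->count > 0);
    s->entry->count--;
    s->entry = NULL;       /* make a second call harmless */
    return 1;
}
```

A guard like this would only hide the symptom; the cancel states would
still need to avoid deregistering the same buffer list twice.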
Pete Wyckoff wrote:
> kschoche at gmail.com wrote on Tue, 19 Feb 2008 14:53 -0600:
>> When migrating some of our machines to the production network, going
>> from memfree Mellanox IB cards to older PCI-X memfull cards, we came
>> across this error on the server. We're running Debian 2.6.18-5-amd64
>> on the server, with a powerpc node as the client, which ran the same
>> test flawlessly against our memfree cards.
>>
>>
>> [D 12:39:44.257923] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
>> [E 13:53:07.910247] Error: openib_post_sr_rdmaw: ibv_post_send (-1).
> [..]
>> Looking at the InfiniBand driver code, since we don't have a check for
>> this type of error in pvfs2, and I don't have the IB spec with me right
>> now, I noticed this error code occurs when:
>>
>>     if (wq_overflow(&qp->sq, nreq, to_mcq(qp->ibv_qp.send_cq))) {
>>         ret = -1;
>>         *bad_wr = wr;
>>         goto out;
>>     }
>>
>> Looks like a WQ overflow?
>> I suppose we could debug inside pvfs2 and decode the bad_wr structure
>> to get more useful information for the future. Do you know if the IB
>> spec states this as a fatal error?
>> If it's not a fatal error - though it looks fatal to me - can we
>> attempt to repost the send with a backed-off/smaller sge list?
>
> Sounds like a WQ overflow to me too. This is in the RDMA code where
> it is necessary to do multiple RDMAs to satisfy one request,
> sometimes. There is not any global checking against the number of
> available WQs. I'm not sure why this hasn't come up before. It
> would be possible to phase this with an extra couple of states.
> Just not a very fun thing to have to do!
>
> I'm curious what you have for SG support in that hardware. Can
> you add some printf around here to show the max_send_sge and
> max_recv_sge? I added an example line; maybe it will work:
>
>     /* compare the caps that came back against what we already have */
>     gossip_err("max send/recv sge %d %d\n", att.cap.max_send_sge,
>                att.cap.max_recv_sge);
>     if (od->sg_max_len == 0) {
>         od->sg_max_len = att.cap.max_send_sge;
>         if (att.cap.max_recv_sge < od->sg_max_len)
>             od->sg_max_len = att.cap.max_recv_sge;
>         od->sg_tmp_array = Malloc(od->sg_max_len * sizeof(*od->sg_tmp_array));
>     } else {
>         if (att.cap.max_send_sge < od->sg_max_len)
>             error("%s: new conn has smaller send SG array size %d vs %d",
>                   __func__, att.cap.max_send_sge, od->sg_max_len);
>         if (att.cap.max_recv_sge < od->sg_max_len)
>             error("%s: new conn has smaller recv SG array size %d vs %d",
>                   __func__, att.cap.max_recv_sge, od->sg_max_len);
>     }
>
>
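
The missing global check against available WQs that Pete describes
could be sketched roughly as below: count how many RDMA work requests a
transfer needs given sg_max_len (29 here, the smaller of the 29/30 in
the log), then only post as many as the send queue has room for and
leave the rest for a later phase. This is a rough sketch under
assumptions, not bmi_ib code: struct sq_credits, wr_needed, sq_reserve
and sq_release are hypothetical names, and max_send_wr is assumed to be
the send queue depth requested when the QP was created.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one queue pair's send queue; these names
 * are illustrative and do not come from bmi_ib. */
struct sq_credits {
    int max_send_wr;   /* assumed send queue depth of the QP */
    int outstanding;   /* work requests posted but not yet completed */
};

/* Number of RDMA work requests needed to cover num_bufs scatter/gather
 * entries when each work request carries at most sg_max_len of them. */
static int wr_needed(int num_bufs, int sg_max_len)
{
    assert(num_bufs >= 0 && sg_max_len > 0);
    return (num_bufs + sg_max_len - 1) / sg_max_len;
}

/* Reserve up to 'want' send slots; returns how many may be posted now.
 * The caller posts that many and keeps the rest for a later phase. */
static int sq_reserve(struct sq_credits *c, int want)
{
    int avail = c->max_send_wr - c->outstanding;
    int grant = want < avail ? want : avail;

    if (grant < 0)
        grant = 0;
    c->outstanding += grant;
    return grant;
}

/* Return slots once their completions have been reaped from the CQ. */
static void sq_release(struct sq_credits *c, int done)
{
    assert(done >= 0 && done <= c->outstanding);
    c->outstanding -= done;
}
```

The remaining work requests would be posted from whatever extra states
handle send completions, after sq_release has freed slots, so that
ibv_post_send is never called on a full queue.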