[Pvfs2-developers] Re: bmi_ib failure on memfull HCA's

Troy Benjegerdes troy at scl.ameslab.gov
Tue Feb 19 17:32:02 EST 2008


Now this is interesting. This is on the memfree hardware, but it looks
like the 'bmi_ib: cancel memcache deregister states' fix didn't quite do
the right thing: the server segfaults in memcache_deregister when a
timed-out flow is cancelled.

da0:/scratch/1# tail -f pvfs2-server.log
[D 02/19 16:25] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
[E 02/19 16:25] max send/recv sge 29 30
[E 02/19 16:25] max send/recv sge 29 30
[E 02/19 16:26] max send/recv sge 29 30
[E 02/19 16:27] max send/recv sge 29 30



[E 02/19 16:30] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 9572.
[E 02/19 16:30] fp_multiqueue_cancel: flow proto cancel called on 0x66cbf0
[E 02/19 16:30] handle_io_error: flow proto error cleanup started on 0x66cbf0, error_code: -1610613121
[E 02/19 16:30] PVFS2 server: signal 11, faulty address is 0x20, from 0x429efa
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(memcache_deregister+0x3a) [0x429efa]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x4295a7]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_thread_mgr_bmi_cancel+0xa6) [0x43aac6]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42ddb0]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e035]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x42e4e7]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_flow_cancel+0x42) [0x42d4d2]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_flow_cancel+0x3d) [0x4388cd]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(job_time_mgr_expire+0x1ca) [0x43b18a]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server [0x45b339]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_invoke+0xb5) [0x446f35]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_next+0xab) [0x4471cb]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(PINT_state_machine_continue+0x1e) [0x446dee]
[E 02/19 16:30] [bt] /usr/src/pvfs2-hg/Bamd64/6ba964b95427_tip/sbin/pvfs2-server(main+0xe00) [0x410b90]
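
For reference, the error_code in the handle_io_error line can be pulled
apart with a few lines of C. This is only a sketch, not code from the
tree: the ASSUMED_* masks are an assumption about the bit layout in
pvfs2-types.h (error bit 30, non-errno bit 29, a class field in bits 7-9,
the error number in the low 7 bits) and should be checked against the
headers. decode_err and struct decoded_err are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout; verify against pvfs2-types.h before relying on it. */
#define ASSUMED_ERROR_BIT           (UINT32_C(1) << 30)
#define ASSUMED_NON_ERRNO_ERROR_BIT (UINT32_C(1) << 29)
#define ASSUMED_CLASS_SHIFT         7
#define ASSUMED_CLASS_MASK          (UINT32_C(7) << ASSUMED_CLASS_SHIFT)
#define ASSUMED_NUMBER_MASK         UINT32_C(0x7f)

/* Hypothetical container for the decoded fields. */
struct decoded_err {
    uint32_t magnitude;     /* absolute value of the logged code */
    int      error_bit;     /* 1 if the assumed error bit is set */
    int      non_errno_bit; /* 1 if the assumed non-errno bit is set */
    unsigned class_id;      /* assumed subsystem class field */
    unsigned number;        /* assumed error number within the class */
};

/* Split a negative error code, as printed in the log, into its fields. */
static struct decoded_err decode_err(int32_t code)
{
    struct decoded_err d;
    int64_t wide = code;    /* widen so negating INT32_MIN cannot overflow */

    assert(code <= 0);      /* the log prints errors as negative values */
    d.magnitude     = (uint32_t)(-wide);
    d.error_bit     = (d.magnitude & ASSUMED_ERROR_BIT) != 0;
    d.non_errno_bit = (d.magnitude & ASSUMED_NON_ERRNO_ERROR_BIT) != 0;
    d.class_id      = (unsigned)((d.magnitude & ASSUMED_CLASS_MASK)
                                 >> ASSUMED_CLASS_SHIFT);
    d.number        = (unsigned)(d.magnitude & ASSUMED_NUMBER_MASK);
    return d;
}
```

Under that assumed layout, decode_err(-1610613121) gives a magnitude of
0x60000181 with both high bits set, class 3 and number 1. If class 3 is
the flow class and number 1 is the cancel code (also an assumption), the
value is just what the timeout cancelling the flow would produce, and the
segfault in memcache_deregister is the real problem.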


Pete Wyckoff wrote:
> kschoche at gmail.com wrote on Tue, 19 Feb 2008 14:53 -0600:
>> When migrating some of our machines to the production network, going
>> from memfree Mellanox IB cards to older PCI-X memfull cards, we came
>> across this error on the server. We're running Debian 2.6.18-5-amd64
>> on the server, and a PowerPC node as the client, which performed the
>> same test flawlessly against our memfree cards.
>>
>>
>> [D 12:39:44.257923] PVFS2 Server version 2.7.1pre1-2008-02-19-171553 starting.
>> [E 13:53:07.910247] Error: openib_post_sr_rdmaw: ibv_post_send (-1).
> [..]
>> Since we don't have a check for this type of error in pvfs2, and I
>> don't have the IB spec with me right now, I looked at the InfiniBand
>> driver code and noticed that this error code occurs when:
>>
>>                 if (wq_overflow(&qp->sq, nreq, to_mcq(qp->ibv_qp.send_cq))) {
>>                         ret = -1;
>>                         *bad_wr = wr;
>>                         goto out;
>>                 }
>>
>> Looks like a WQ overflow?
>> I suppose we could debug inside pvfs2 and decode the bad_wr structure
>> to get more useful information for the future. Do you know if the IB
>> spec states this as a fatal error?
>> If it's not a fatal error (though it looks fatal to me), can we
>> attempt to repost the send with a backed-off/smaller sge list?
>
> Sounds like a WQ overflow to me too.  This is in the RDMA code, where
> it is sometimes necessary to post multiple RDMAs to satisfy one
> request.  There is no global check against the number of available
> WQ entries.  I'm not sure why this hasn't come up before.  It would
> be possible to phase the RDMAs with a couple of extra states.
> Just not a very fun thing to have to do!
>
> I'm curious what you have for SG support in that hardware.  Can
> you add a printf around here to show max_send_sge and
> max_recv_sge?  I added an example line; maybe it will work:
>
>     /* compare the caps that came back against what we already have */
>     gossip_err("max send/recv sge %d %d\n", att.cap.max_send_sge,
>                att.cap.max_recv_sge);
>     if (od->sg_max_len == 0) {
>         od->sg_max_len = att.cap.max_send_sge;
>         if (att.cap.max_recv_sge < od->sg_max_len)
>             od->sg_max_len = att.cap.max_recv_sge;
>         od->sg_tmp_array = Malloc(od->sg_max_len * sizeof(*od->sg_tmp_array));
>     } else {
>         if (att.cap.max_send_sge < od->sg_max_len)
>             error("%s: new conn has smaller send SG array size %d vs %d",
>                   __func__, att.cap.max_send_sge, od->sg_max_len);
>         if (att.cap.max_recv_sge < od->sg_max_len)
>             error("%s: new conn has smaller recv SG array size %d vs %d",
>                   __func__, att.cap.max_recv_sge, od->sg_max_len);
>     }
>
>
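
The phasing Pete describes comes down to accounting for send-queue slots
before posting. Below is a minimal sketch of that bookkeeping, with no
verbs calls in it. wr_count_for_sges, struct sq_credits, sq_reserve and
sq_complete are hypothetical names, not bmi_ib functions, and depth is
assumed to be the max_send_wr the QP was created with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of RDMA work requests needed to cover
 * num_sge scatter/gather entries when one request carries at most
 * max_sge of them (ceiling division). */
static size_t wr_count_for_sges(size_t num_sge, size_t max_sge)
{
    assert(max_sge > 0);
    return (num_sge + max_sge - 1) / max_sge;
}

/* Hypothetical send-queue credit tracker. */
struct sq_credits {
    size_t depth;        /* assumed: max_send_wr of the QP */
    size_t outstanding;  /* posted but not yet completed */
};

/* Reserve up to `wanted` slots; returns how many may be posted now
 * without overflowing the queue.  The caller posts only that many and
 * keeps the remainder for a later phase. */
static size_t sq_reserve(struct sq_credits *sq, size_t wanted)
{
    size_t avail = sq->depth - sq->outstanding;
    size_t grant = wanted < avail ? wanted : avail;

    sq->outstanding += grant;
    return grant;
}

/* Return `n` slots after their completions have been reaped. */
static void sq_complete(struct sq_credits *sq, size_t n)
{
    assert(n <= sq->outstanding);
    sq->outstanding -= n;
}
```

With the 29-entry limit from the log, a 64-entry scatter list needs 3
work requests. If only one slot is free, sq_reserve grants 1 and the
other two wait until completions come back and sq_complete frees slots,
which is where the extra states would come in.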


