[Pvfs2-users] OpenIB/kernel interface: null pointer dereference in put_back_slot

Pete Wyckoff pw at osc.edu
Tue Mar 20 11:36:25 EST 2007


kschoche at scl.ameslab.gov wrote on Tue, 20 Mar 2007 11:12 -0600:
> According to the log, you're getting IBV_WC_WR_FLUSH_ERR returned by the 
> check_cq function, which does all the polling for openIB.
> The IB spec says this about the error:
> "Work Request Flushed Error - A Work Request was in process or 
> outstanding when the QP transitioned into the Error State."
> 
> It doesn't go any further into the details of this error, but generally 
> whenever the QP is sent into an error state,
> it is considered to be a fatal error by most of the IB community. 
> (correct me if I'm wrong, please)
> This leads me to believe that you may still have underlying network 
> problems.
> Have you been able to successfully run the various openIB test programs 
> like ibv_rc_pingpong, or possibly tried the latest NetPIPE release, 
> which has openIB support? (It may not give a pretty answer other than 
> crashing if you have network problems, though :-/ )
> 
> If the network ends up not being the problem, we've got a serious 
> problem here in the code, as we should never be putting the QP into 
> erroneous states.
> 
> Also, Pete, the spec doesn't say anything about async errors being 
> flagged for an error like this. Is this a case where we might be able to 
> get useful information about the QP before or as it goes into an error 
> state via async events?

Concur.  Network problems.  Server had a network error, the client
noticed and flushed the pending receive.  Then told you about it and
exited.
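
To illustrate the distinction above: a flush status only says that a work
request was still outstanding when the QP went into the error state, so it
is a symptom and not the cause. A minimal sketch of how a batch of polled
completions could be sorted on that basis follows. It is not the PVFS2
check_cq code. The enum and struct are hypothetical stand-ins so the sketch
compiles without <infiniband/verbs.h>; real code would use struct ibv_wc
and the IBV_WC_* status values, and classify() is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the few completion statuses this sketch needs.  Real code
 * would use enum ibv_wc_status from <infiniband/verbs.h>. */
enum wc_status {
    WC_SUCCESS,        /* stands in for IBV_WC_SUCCESS */
    WC_RETRY_EXC_ERR,  /* stands in for IBV_WC_RETRY_EXC_ERR */
    WC_WR_FLUSH_ERR    /* stands in for IBV_WC_WR_FLUSH_ERR */
};

/* Hypothetical, much smaller than struct ibv_wc. */
struct completion {
    unsigned long wr_id;
    enum wc_status status;
};

enum verdict {
    ALL_OK,           /* every completion succeeded */
    ROOT_CAUSE_SEEN,  /* a non-flush error is present in this batch */
    ONLY_FLUSHED      /* only flush errors: the cause lies elsewhere */
};

/* Scan a batch of completions.  If a non-flush error is found, store its
 * index in *culprit and report ROOT_CAUSE_SEEN.  If the only failures are
 * flushes, report ONLY_FLUSHED and leave *culprit untouched. */
static enum verdict classify(const struct completion *wc, size_t n,
                             size_t *culprit)
{
    enum verdict v = ALL_OK;

    assert(culprit != NULL);
    for (size_t i = 0; i < n; i++) {
        if (wc[i].status == WC_SUCCESS)
            continue;
        if (wc[i].status != WC_WR_FLUSH_ERR) {
            *culprit = i;
            return ROOT_CAUSE_SEEN;
        }
        v = ONLY_FLUSHED;
    }
    return v;
}
```

A batch that comes back ONLY_FLUSHED, as the client log here suggests, gives
no local culprit to inspect, which is why the next place to look is the
server log and the fabric itself.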

Take a look at the server log and see if it registered any complaints.
And try some long-running network-level tests to see if you can find
any problems there, as Kyle suggests.

		-- Pete


