[Pvfs2-developers] openib-vfs failure

kschoche at scl.ameslab.gov
Tue Jan 29 17:09:07 EST 2008


I've been running GAMESS tests with about 160 GB on the file system, trying
to stress the network a bit, and can reproducibly get pvfs2-client to die
on an assertion failure in

I haven't been able to figure out exactly what is causing this assertion
failure, but from the code it really looks as if this should never happen,
obviously (assertion) ;)  Maybe we're getting duplicate messages or testing
the same message twice somehow.

I'm running CVS HEAD on Debian (kernel 2.6.18), using the bmi_ib module under the VFS.

[E 15:54:20.197159] Error: encourage_recv_incoming: RTS_DONE to rq wrong
[E 15:54:20.200927]     [bt] pvfs2-client-core(error+0xca) [0x41a2ba]
[E 15:54:20.200940]     [bt] pvfs2-client-core [0x41779f]
[E 15:54:20.200948]     [bt] pvfs2-client-core [0x417e3a]
[E 15:54:20.200955]     [bt] pvfs2-client-core [0x4181fd]
[E 15:54:20.200963]     [bt] pvfs2-client-core(job_bmi_recv+0xea) [0x422f0a]
[E 15:54:20.200971]     [bt] pvfs2-client-core [0x441a18]
[E 15:54:20.200978]     [bt] pvfs2-client-core(PINT_state_machine_invoke+0xd2) [
[E 15:54:20.200986]     [bt] pvfs2-client-core(PINT_state_machine_next+0xcc) [0x
[E 15:54:20.200994]     [bt] 99) [0x4383e9]
[E 15:54:20.201001]     [bt] pvfs2-client-core(PVFS_isys_io+0x324) [0x4430a4]
[E 15:54:20.201009]     [bt] pvfs2-client-core [0x4117a6]
[E 15:54:20.205453] pvfs2-client-core with pid 6251 exited with value 1

What's the best way to start debugging this?

