[Pvfs2-developers] pvfs2-server failures openib

Kyle Schochenmaier kschoche at scl.ameslab.gov
Wed Apr 4 13:21:17 EDT 2007


I can reproducibly trigger this error on the server by running multiple 
concurrent instances of pvfs2-cp over various IB hardware.

For this one, I did:

pvfs2-cp -t /pvfs2/1node/test2 /dev/null &
pvfs2-cp -t /pvfs2/1node/test2 /dev/null &
pvfs2-cp -t /pvfs2/1node/test2 /dev/null &
pvfs2-cp -t /pvfs2/1node/test2 /dev/null &
pvfs2-cp -t /pvfs2/1node/test2 /dev/null &
pvfs2-cp -t /pvfs2/1node/test2 /dev/null

That should be 6 of them. Everything worked fine until all 6 processes 
started hammering the server.  I can reproduce this with only 3 
processes when using faster/lower-latency hardware on the client.
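
To make it easier to vary the number of concurrent clients (3 vs. 6, 
for example), something like the following could be used. This is only 
a minimal sketch: run_parallel is a hypothetical helper name, not part 
of PVFS2, and it simply starts N background copies of whatever command 
it is given and waits for all of them.

```shell
#!/bin/sh
# run_parallel N CMD [ARGS...]
# Hypothetical helper (not part of PVFS2): start N concurrent copies of
# CMD in the background, wait for all of them, and return the number of
# copies that exited with a non-zero status.
run_parallel() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &                  # launch one copy in the background
        pids="$pids $!"         # remember its PID so we can wait on it
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

# Example, equivalent to the six-way command above:
# run_parallel 6 pvfs2-cp -t /pvfs2/1node/test2 /dev/null
```

The return value is the count of failed copies, so a zero status means 
every instance finished cleanly.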

Any ideas where to start tracking this one down?

Kyle



[D 21:25:03.375378] PVFS2 Server version 2.6.2pre1-2007-02-23-150254 starting.
[E 17:00:09.488945] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 4608742.
[E 17:00:09.571458] fp_multiqueue_cancel: flow proto cancel called on 0x60a620
[E 17:00:09.572098] handle_io_error: flow proto error cleanup started on 0x60a620, error_code: -1610612737
[E 17:00:09.573040] handle_io_error: flow proto 0x60a620 canceled 8 operations, will clean up.
[E 17:00:09.573595] handle_io_error: flow proto 0x60a620 error cleanup finished, error_code: -1610612737
[E 17:00:09.573630] Error: memcache_memfree: buf 0x2aaaab272010 len 262144 count = 2, expected 1.
[E 17:00:09.630067]     [bt] /usr/local/sbin/pvfs2-server(error+0xca) [0x42e40a]
[E 17:00:09.630112]     [bt] /usr/local/sbin/pvfs2-server [0x42e72a]
[E 17:00:09.630120]     [bt] /usr/local/sbin/pvfs2-server [0x431fa2]
[E 17:00:09.630128]     [bt] /usr/local/sbin/pvfs2-server [0x433a69]
[E 17:00:09.630136]     [bt] /usr/local/sbin/pvfs2-server [0x43df38]
[E 17:00:09.630143]     [bt] /lib/libpthread.so.0 [0x2b3168be0b1c]
[E 17:00:09.630151]     [bt] /lib/libc.so.6(__clone+0x72) [0x2b3168fcd9c2]


