[Pvfs2-developers] pvfs2-client issues over IB

Kyle Schochenmaier kschoche at scl.ameslab.gov
Mon Sep 25 11:50:13 EDT 2006


Can anyone make any sense of this?
I have a feeling these are related to the hangups I'm having without the 
client interface in openib.
This is built off the latest CVS head: 6 server nodes, 1 client node, 
mounted via pvfs2-client over openib.

I did a `killall pvfs2-client-core`, and that's where the process exit 
status message comes from. This appears to have kicked the process out 
of the hang state it was in, and I'm able to do fs operations on the 
mount again, but it will lock up eventually (every other time I do an op 
on the mount?).

Any other debian+openib+pvfs2 users out there?

Log message from the client:

[E 10:19:33.127182] fp_multiqueue_cancel: flow proto cancel called on 0x10151cf0
[E 10:19:33.127283] handle_io_error: flow proto error cleanup started on 0x10151cf0, error_code: -1610612737
[E 10:19:33.127394] handle_io_error: flow proto 0x10151cf0 canceled 1 operations, will clean up.
[E 10:19:33.127420] fp_multiqueue_cancel: flow proto cancel called on 0x10152bc0
[E 10:19:33.127442] handle_io_error: flow proto error cleanup started on 0x10152bc0, error_code: -1610612737
[E 10:19:33.127529] handle_io_error: flow proto 0x10152bc0 canceled 1 operations, will clean up.
[E 10:19:33.127553] fp_multiqueue_cancel: flow proto cancel called on 0x10153328
[E 10:19:33.127576] handle_io_error: flow proto error cleanup started on 0x10153328, error_code: -1610612737
[E 10:19:33.127664] handle_io_error: flow proto 0x10153328 canceled 1 operations, will clean up.
[E 10:19:33.129177] handle_io_error: flow proto 0x10151cf0 error cleanup finished, error_code: -1610612737
[E 10:19:33.129220] handle_io_error: flow proto 0x10152bc0 error cleanup finished, error_code: -1610612737
[E 10:19:33.129254] handle_io_error: flow proto 0x10153328 error cleanup finished, error_code: -1610612737
[E 10:20:42.852432] fp_multiqueue_cancel: flow proto cancel called on 0x104a1f58
[E 10:20:42.852485] handle_io_error: flow proto error cleanup started on 0x104a1f58, error_code: -1610612737
[E 10:20:42.852602] handle_io_error: flow proto 0x104a1f58 canceled 1 operations, will clean up.
[E 10:20:42.853082] handle_io_error: flow proto 0x104a1f58 error cleanup finished, error_code: -1610612737
[E 10:21:23.104434] fp_multiqueue_cancel: flow proto cancel called on 0x104a17f0
[E 10:21:23.104487] handle_io_error: flow proto error cleanup started on 0x104a17f0, error_code: -1610612737
[E 10:21:23.104600] handle_io_error: flow proto 0x104a17f0 canceled 1 operations, will clean up.
[E 10:21:23.104665] handle_io_error: flow proto 0x104a17f0 error cleanup finished, error_code: -1610612737
[E 10:23:35.657241] pvfs2-client-core with pid 22269 exited with value 0
[E 10:26:44.473556] fp_multiqueue_cancel: flow proto cancel called on 0x10152150
[E 10:26:44.473654] handle_io_error: flow proto error cleanup started on 0x10152150, error_code: -1610612737
[E 10:26:44.473765] handle_io_error: flow proto 0x10152150 canceled 1 operations, will clean up.
[E 10:26:44.473820] handle_io_error: flow proto 0x10152150 error cleanup finished, error_code: -1610612737
[E 10:31:45.812544] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 115663.
[E 10:31:45.813455] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 115665.
[E 10:32:12.253541] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 115695.
[E 10:32:12.253590] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 115697.
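
For anyone who wants to boil a log like this down, here is a rough Python 
sketch. It is only an illustration: the regular expressions are guessed 
from the lines pasted above, not taken from any documented PVFS2 log 
format, and `summarize` is a made-up helper name. It assumes each log 
entry sits on a single line, so wrapped entries would need to be 
re-joined first. It counts cancel calls per flow address, collects the 
error codes, and lists the job ids that timed out. Printing the code in 
hex at least exposes the raw bit pattern (1610612737 is 0x60000001).

```python
import re
from collections import Counter

# Patterns inferred from the log excerpt in this mail (an assumption,
# not a documented PVFS2 log format).
CANCEL_RE = re.compile(
    r"fp_multiqueue_cancel: flow proto cancel called on\s+(0x[0-9a-f]+)")
ERRCODE_RE = re.compile(r"error_code:\s*(-?\d+)")
TIMEOUT_RE = re.compile(r"job_time_mgr_expire: job time out.*?job_id:\s*(\d+)")


def summarize(log_text):
    """Return (cancel count per flow address, error codes seen, timed-out job ids)."""
    cancels = Counter(CANCEL_RE.findall(log_text))
    codes = {int(c) for c in ERRCODE_RE.findall(log_text)}
    jobs = [int(j) for j in TIMEOUT_RE.findall(log_text)]
    return cancels, codes, jobs


# Three entries copied from the log, each on one line.
sample = """\
[E 10:19:33.127182] fp_multiqueue_cancel: flow proto cancel called on 0x10151cf0
[E 10:19:33.127283] handle_io_error: flow proto error cleanup started on 0x10151cf0, error_code: -1610612737
[E 10:31:45.812544] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 115663.
"""

cancels, codes, jobs = summarize(sample)
print(dict(cancels))                         # cancel calls per flow address
print([hex(abs(c)) for c in sorted(codes)])  # error codes as hex magnitudes
print(jobs)                                  # job ids that hit the timeout
```

Feeding it the whole client log instead of `sample` gives a quick view of 
which flows were cancelled and whether the same error code shows up 
everywhere.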

Hopefully this makes sense to someone else; it's not English to me :(

    -- Kyle

-- 
Kyle Schochenmaier
kschoche at scl.ameslab.gov
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory 