[PVFS2-developers] pvfs2 failover almost there

Robert Latham robl at mcs.anl.gov
Thu Jun 17 16:19:06 EDT 2004


Hey guys

I've been playing around with pvfs2 high availability lately, and I've
almost got it working really well.  Active-passive already seems to
work, but active-active has some issues.

This is with pvfs2-0.5.1, tcp, AIO callbacks, Debian unstable.

I've got two servers, both acting as metadata and I/O nodes.  heartbeat
fires up the pvfs2-servers on both nodes, and a client (a third node)
runs 'pvfs2-cp testfile /pvfs-ha/testfile' (a 1 GB file, something
that will take significant time to run).

When one node goes down, the client (pvfs2-cp) reports:

Error: bmi_tcp: Connection reset by peer
Warning: BMI attempting reconnect.
Error: bmi_tcp: Connection refused
Error: poorly formatted protocol message received.
   Protocol version mismatch: received version 0 when expecting version 501.
   Please verify your PVFS2 installation and make sure that the version is
   consistent.
msgpairarray decode error: Protocol not supported
PVFS_sys_write: Protocol not supported
Error: short write


Or sometimes I get this:
Warning: BMI attempting reconnect.
Error: bmi_tcp: Connection refused
Error: poorly formatted protocol message received.
   Too small: message only 0 bytes.
msgpairarray decode error: Protocol error
PVFS_sys_write: Protocol error
Error: short write       
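
In both traces the reconnect is refused and the client then tries to
decode an empty message.  For illustration only, here is a minimal
Python sketch of the behaviour one might hope for during the failover
window: keep retrying the connect, with backoff, until the takeover
node is listening.  This is not PVFS2 or BMI code; connect_with_retry
and its parameters are hypothetical names, and the retry counts are
arbitrary assumptions.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=0.5,
                       connect=socket.create_connection):
    """Connect to (host, port), retrying while the peer refuses or resets.

    Hypothetical helper, not part of PVFS2.  `connect` is injectable so
    the retry logic can be exercised without a real network.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect((host, port))
        except (ConnectionRefusedError, ConnectionResetError) as exc:
            # The failover target may not be listening yet: wait, then retry.
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
                delay *= 2  # exponential backoff between attempts
    # Every attempt failed: surface the last error instead of
    # pretending that a zero-byte message was received.
    raise last_error
```

The point of the sketch is the error handling: a refused connection is
reported as a connection failure after the retries run out, rather than
being handed on to the protocol decoder.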

Any suggestions? 

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B

