[PVFS2-users] pvfs2-client crashes...

Robert Latham robl at mcs.anl.gov
Wed Jun 2 11:29:00 EDT 2004


Justin Binns has run into some cases where the pvfs2-client dies.
Since he's not subscribed, here's his message. Please CC him on
replies. 

==rob



I've been chatting with Rob Latham and he suggested that I e-mail this
list.  We have two identical clusters, each running a pvfs2 volume
(approx. 1TB in size) that we are using to store hdf5 data.  We're
reading this data via the kernel interface using the hdf5 libraries,
and/or using rsync to copy to the local file system (to speed up reads -
the code we're using does rather small seek/read combinations, which
really hurts performance with the larger files).
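
To illustrate why many small seek/read pairs are costly over a network
file system, here is a minimal sketch that fetches the same records with
one seek and one larger read, then slices them out in memory. It is not
what the hdf5 libraries do internally; the function name
read_records_coalesced and the fixed record size are assumptions made
only for this example.

```python
import io


def read_records_coalesced(f, offsets, size):
    """Fetch fixed-size records at `offsets` using one seek and one read.

    Hypothetical helper: instead of issuing a seek/read pair per record,
    read the smallest contiguous span covering all records and slice it.
    """
    if not offsets:
        return []
    start = min(offsets)
    end = max(offsets) + size
    f.seek(start)
    block = f.read(end - start)
    # Slice each record out of the in-memory block, preserving the
    # caller's original ordering of offsets.
    return [block[o - start:o - start + size] for o in offsets]


if __name__ == "__main__":
    # Stand-in for a file on the pvfs2 mount; any seekable binary file works.
    data = io.BytesIO(bytes(range(256)))
    for record in read_records_coalesced(data, [10, 200, 50], 4):
        print(record.hex())
```

The trade-off is that the span between the first and last record is read
in full, so this only pays off when the records are reasonably close
together; copying the file to local disk with rsync, as described above,
sidesteps the question entirely.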

So far, we're *very* happy with pvfs2 - it's much faster than pvfs1 in
the vast majority of cases, and it's a lot cleaner and easier to work
with.  Here's the problem - we regularly see failures of the
pvfs2-client component.  The kernel module seems happy, and doing a
restart (essentially a killall followed by re-running the pvfs2-client
component) brings it back.  The only error message is in dmesg and is
pasted below, followed by a couple of timeout errors (which I believe
are caused by the failure of the client).  There is no information in
any of the logs.

If we run our job across 4 or 8 machines, invariably, one or two fail
in a relatively short time.  rsyncing a large file usually does it.

What I'd like to do is provide whatever assistance I can to help debug
this - what kinds of options should we use to get helpful information? 
Thanks!

Justin Binns

*****************************************************
PVFS2 Device Error:  You cannot open the device file
/dev/pvfs2-req more than once.  Please make sure that
there are no instances of a program using this device
currently running. (You must verify this!)
For example, you can use the lsof program as follows:
'lsof | grep pvfs2-req' (run this as root)
*****************************************************
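
The lsof check suggested in the message can also be scripted, which may
help when looking at several nodes. A minimal sketch in Python, assuming
a Linux /proc layout and enough privileges (root) to read other
processes' fd directories; the helper name find_openers is hypothetical
and not part of PVFS2.

```python
import os


def find_openers(target, proc_root="/proc"):
    """Return sorted PIDs that have `target` open, by scanning /proc/<pid>/fd.

    Hypothetical helper, roughly equivalent to 'lsof | grep pvfs2-req'.
    """
    pids = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return pids  # no /proc available (non-Linux system)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if link == target:
                pids.append(int(entry))
                break
    return sorted(pids)


if __name__ == "__main__":
    holders = find_openers("/dev/pvfs2-req")
    if len(holders) > 1:
        print("more than one process holds /dev/pvfs2-req:", holders)
    elif holders:
        print("/dev/pvfs2-req is held by pid", holders[0])
    else:
        print("no process currently holds /dev/pvfs2-req")
```

Run as root on a node just before restarting the client; a PID still
listed at that point would be consistent with the "more than once" error
above.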

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B

