[Pvfs2-users] PVFS2 Stability
Julian Martin Kunkel
Julian.Kunkel at web.de
Mon Apr 10 17:14:27 EDT 2006
> I've been working with PVFS2 since December, trying to implement it on our
> 60 node (120 CPU) Linux Beowulf Cluster.
Your error report is quite interesting and reminds me of benchmarks I ran a
while ago on our test cluster with 5 nodes and 10 CPUs (dual Xeon 2 GHz,
Gigabit Ethernet). This seems to be a similar configuration!
I did not use the kernel module. The servers already hung during a single
large I/O operation similar to pvfs2-cp, writing from the memory of just
one (!) node to the servers. I was able to reproducibly hang a server with
one big I/O operation. However, I ran out of time and, for some other
reasons, did not finish debugging the pvfs2-server...
At the time I thought the server hang was caused by our system
configuration, especially the kernel (2.6.8), which had some problems with our
Intel IDE chipset, or by the dual-CPU configuration (we did not use
I dug a bit, and it seemed that on a random server the thread responsible
for dbpf hung (maybe a deadlock or problems with the asynchronous I/O -
In my bachelor's thesis I replaced the trove module with an I/O stub and
testing module, which did not hang under the same operations, so I suspect
the problem lies somewhere in the trove layer.
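To illustrate the isolation technique: if the suspect storage layer sits behind a table of function pointers, it can be swapped for a stub that accepts the data and reports success without touching the disk. This is only a minimal sketch under that assumption; the names storage_ops, posix_backend, stub_backend and handle_write are hypothetical and are not the actual trove interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical storage interface for illustration; NOT the real trove API. */
struct storage_ops {
    const char *name;
    /* Write len bytes at offset; return bytes written, or -1 on error. */
    ssize_t (*write_at)(void *ctx, const void *buf, size_t len, off_t offset);
};

/* Real backend: ctx points to an int file descriptor; data goes to pwrite(). */
static ssize_t posix_write_at(void *ctx, const void *buf, size_t len, off_t offset)
{
    int fd = *(const int *)ctx;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, (const char *)buf + done, len - done,
                           offset + (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;              /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;             /* real error, or no progress */
        done += (size_t)n;         /* short write: continue with the rest */
    }
    return (ssize_t)done;
}

/* Stub backend: counts requests and discards the data (no disk, no AIO, no locks). */
struct stub_ctx {
    size_t calls;
    size_t bytes;
};

static ssize_t stub_write_at(void *ctx, const void *buf, size_t len, off_t offset)
{
    struct stub_ctx *s = ctx;
    (void)buf;
    (void)offset;
    s->calls++;
    s->bytes += len;
    return (ssize_t)len;
}

static const struct storage_ops posix_backend = { "posix", posix_write_at };
static const struct storage_ops stub_backend  = { "stub",  stub_write_at };

/* The layer above only talks to the interface; the backend is chosen at startup. */
static ssize_t handle_write(const struct storage_ops *ops, void *ctx,
                            const void *buf, size_t len, off_t offset)
{
    assert(ops != NULL && ops->write_at != NULL);
    return ops->write_at(ctx, buf, len, offset);
}
```

If the server stays responsive with the stub backend but hangs with the real one, the fault is below the interface (disk I/O, AIO or locking in the storage layer) rather than in the request handling above it.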
Maybe it would be good to try pvfs2 without threads to make sure that there
are no deadlocks or problems with AIO. In the past it was possible to compile
the server single-threaded, but I'm not sure how the server has to be
compiled nowadays, or whether that still works. I only remember that the
following servercflags in the Makefile have to be removed/replaced
I would be glad to hear whether this is still possible, and how :)
The configure flag --disable-kernel-aio might also help.
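To illustrate the general idea of ruling out asynchronous I/O, here is a minimal sketch of a write path that can bypass it: with use_aio set, it submits the request with POSIX aio_write() and waits with aio_suspend(); otherwise it calls pwrite() directly. This is generic POSIX code, not PVFS2's implementation, and it does not show which code path --disable-kernel-aio actually switches off; write_block and use_aio are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: use_aio selects POSIX AIO or a plain synchronous pwrite(). */
static ssize_t write_block(int fd, const void *buf, size_t len, off_t offset,
                           bool use_aio)
{
    assert(buf != NULL);
    if (!use_aio)
        return pwrite(fd, buf, len, offset);    /* synchronous path, no AIO involved */

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = (volatile void *)buf;          /* only read from, since this is a write */
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no completion signal; we wait below */

    if (aio_write(&cb) != 0)
        return -1;                              /* request could not be queued */

    /* Wait for completion; if the process hangs here, the AIO path is the suspect. */
    const struct aiocb *const list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);       /* may return early on EINTR; loop re-checks */

    return aio_return(&cb);                     /* bytes written, or -1 on error */
}
```

If a hang disappears when the synchronous path is forced, AIO becomes the prime suspect. On glibc versions older than 2.34 the aio_* functions need -lrt at link time.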
I'm currently busy and our cluster is being reinstalled with a new
configuration, but maybe I can try to figure out what is happening at the end
of the week.