[PVFS-users] FW: PVFS Hangups during concurrent read/writes

Brannen S Hough bshough at impactsci.com
Mon Aug 2 17:25:11 EDT 2004



	I've run into a new problem (new to me, anyway) while trying to get
some performance measurements on PVFS. I've induced version 1.6.2 to hang
as follows:
	1) Three machines run 4 ionodes (one machine runs two of them, with
separate iod.config files and separate ports, 7000 and 7001).
	2) The PVFS filesystem is mounted using pvfs.o, pvfsd, and mount.pvfs
on two machines (one runs the Manager and 1 ionode, the other runs the 2
ionodes).
	3) The same test program runs concurrently on both machines that have
PVFS mounted, but each writes to a different subdirectory.
	4) It took several minutes of running to hang, but both test programs
then stopped mid-cycle, one while trying to write a file and the other
while trying to read a file. These were different files in different
directories, though.
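In case it helps anyone try to reproduce this, the write/read cycle described in steps 3 and 4 can be approximated by a minimal sketch like the one below. It is hypothetical and is not the actual test program: the file names, the 1 MB file size, and the cycle count are illustrative assumptions. Each step is logged and flushed before it starts, so when a process stalls, its last printed line shows whether it was stuck writing or reading.

```python
import os
import sys
import tempfile
import time


def log(msg):
    # Flush immediately so the last line printed shows where a hang occurred.
    print(f"{time.strftime('%H:%M:%S')} {msg}", flush=True)


def cycle(directory, index, size=1 << 20):
    """Write one file, read it back, delete it, and report whether the data matched."""
    path = os.path.join(directory, f"file{index:04d}.dat")  # hypothetical naming
    data = os.urandom(size)
    log(f"cycle {index}: writing {path}")
    with open(path, "wb") as f:
        f.write(data)
    log(f"cycle {index}: reading {path}")
    with open(path, "rb") as f:
        back = f.read()
    os.remove(path)
    return back == data


def run(directory, cycles):
    """Run `cycles` write/read cycles in `directory`, stopping at the first mismatch."""
    os.makedirs(directory, exist_ok=True)
    return all(cycle(directory, i) for i in range(cycles))


if __name__ == "__main__":
    # Usage: python rw_cycle.py DIR [CYCLES]
    # DIR defaults to a fresh local temporary directory for a dry run.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    log("all cycles OK" if run(target, count) else "data mismatch")
```

To mirror step 3, one copy would be started on each client at the same time, each pointed at its own subdirectory of the PVFS mount (for example /mnt/pvfs/nodeA and /mnt/pvfs/nodeB, which are hypothetical paths).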

	That was last week. I noticed that 1.6.3-pre3 was posted on the 29th,
so I figured the latest patches incorporated into it might have a bearing
on this. I downloaded and built 1.6.3-pre3 on all machines in my tiny
cluster and reran the tests. Exactly the same scenario as above occurred.
I have been running individual tests on 1.6.3-pre3 all day, first on one
node and then on the other, to assure myself that everything was set up
and operating correctly. No problems. When I ran them concurrently, the
lock-up described above happened again.

	Has anyone seen something like this before? Any suggestions on how to
diagnose it? I saw debug level settings on the pvfsd executable, but there
are a lot of debug mask settings, so I imagine that if I turn on too much
I'll just be inundated with debug data that I can't interpret. Suggestions
on the relevant debug masks would be helpful.

	Any help on how to figure this out would be appreciated. Right now the
PVFS logs (iod and pvfsd) contain only the startup message; no errors are
recorded. It almost seems like a deadlock between read and write access
(pure speculation on my part).

	I can still unmount the filesystem, shut down pvfsd, and remove the
module, but if I do anything that accesses the PVFS-mounted filesystem
before that (even a df), it hangs that shell. However, if I unmount, kill
pvfsd, rmmod pvfs, and then redo the steps to mount PVFS, it does work
again (at least when running one test at a time).

	Thanks in advance,
		- Brannen
