[PVFS-users] FW: PVFS Hangups during concurrent read/writes

Rob Ross rross at mcs.anl.gov
Tue Aug 3 14:58:01 EDT 2004


Hi Brannen,

Thanks for the problem report and for trying the newest prerelease before 
getting back to us!

What exactly is your test program?
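
If the full test program is hard to share, a stripped-down reproducer can stand in for it. The sketch below is hypothetical and is not Brannen's actual program: the function name `run_cycles`, the file naming, the 1 MiB payload, and the 60-second stall threshold are all assumptions made for illustration. It writes a file, reads it back, and uses a watchdog thread to report which step was in progress if a call stops returning.

```python
import os
import sys
import threading
import time


def run_cycles(directory, cycles=100, size=1 << 20, stall_after=60.0):
    """Write then read back one file per cycle; report a stalled step on stderr."""
    state = {"step": "idle", "since": time.monotonic()}
    done = threading.Event()

    def watchdog():
        # A hung system call blocks the main thread, so a second thread
        # is the one that reports which step has not returned.
        while not done.wait(1.0):
            waited = time.monotonic() - state["since"]
            if waited > stall_after:
                print(f"stalled {waited:.0f}s in {state['step']}",
                      file=sys.stderr, flush=True)

    threading.Thread(target=watchdog, daemon=True).start()
    payload = os.urandom(size)
    try:
        for i in range(cycles):
            path = os.path.join(directory, f"cycle-{i}.dat")
            state.update(step=f"write {path}", since=time.monotonic())
            with open(path, "wb") as f:
                f.write(payload)
            state.update(step=f"read {path}", since=time.monotonic())
            with open(path, "rb") as f:
                if f.read() != payload:
                    raise IOError(f"data mismatch in {path}")
            state.update(step=f"remove {path}", since=time.monotonic())
            os.remove(path)
    finally:
        done.set()
    return cycles


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python repro.py <directory on the mounted file system>
    run_cycles(sys.argv[1])
```

To mirror the reported scenario, one instance would run on each client at the same time, each pointed at a different subdirectory of the mounted file system.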

Does this only happen when you have the two iods running on the same 
node?

It would help us if you could recompile with "-g", attach a debugger to
the pvfsd, and get a stack dump.  Some debugging output could also help:
as a start, try a debug mask of 0x077 and see whether that puts much into
the pvfsdlog file.
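
As a rough illustration of both suggestions, the sketch below builds a gdb batch command that attaches to a running process and prints every thread's stack, and lists which bit positions a mask such as 0x077 turns on. The helper names are made up for this example, gdb is assumed to be installed, and attaching typically needs root. Which pvfsd debug category each bit selects is not covered here and should be checked against the PVFS source.

```python
import subprocess


def set_bits(mask):
    """Return the positions of the bits that are set in a debug mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def gdb_backtrace_cmd(pid):
    """Build a gdb batch command that prints every thread's stack and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_stacks(pid):
    """Run gdb against a live process and return whatever it printed."""
    result = subprocess.run(gdb_backtrace_cmd(pid),
                            capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Bit positions enabled by the suggested mask.
    print(set_bits(0x077))
```

The pid passed to `dump_stacks` would be that of the pvfsd process on the client that hung; a binary built with "-g" makes the resulting backtrace show function names and line numbers.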

The iods don't coordinate read and write operations in a way that would 
cause deadlock, so that shouldn't be it.  The mgr just serializes 
everything it does, so that should be ok too.  I'm not sure what is going 
on quite yet...

Rob

On Mon, 2 Aug 2004, Brannen S Hough wrote:

> 	I've seen a new problem (for me anyway) trying to get some performance
> measurements on PVFS.  I've induced version 1.6.2 to hang thus:
> 	1) Having 3 machines operate 4 ionodes (1 does two, with separate
> iod.config files and separate ports - 7000 and 7001).
> 	2) Mounting the PVFS filesystem using pvfs.o, pvfsd, mount.pvfs on two
> machines (1 is running the Manager and 1 ionode, the other running the 2
> ionodes).
> 	3) Running the same test program concurrently on both machines that
> have PVFS mounted, but to different subdirectories.
> 	4) Took several minutes of running to hang, but both test programs
> stopped in midcycle - one while trying to write a file, the other while
> trying to read a file.  Different files in different directories though.
> 
> 	That was last week.  I noticed that 1.6.3-pre3 was posted on the 29th,
> so I figured maybe the latest patches that were incorporated might have
> a bearing.  So I downloaded and built 1.6.3-pre3 on all machines in
> my tiny cluster, and reran tests.  Exactly the same scenario as above
> occurred.  I have been running individual tests all day on 1.6.3 - first on
> one node then the other - to assure myself that everything was set up and
> operating correctly.  No problems.  When I ran them concurrently the
> lock-up described above happened.
> 
> 	Has anyone seen something like this before?  Any suggestions as to how
> to diagnose it?  I saw debug level settings on the pvfsd executable, but
> there are a lot of debug mask settings, so I imagine that if I turn on too
> much I'll just be inundated with debug data that I can't interpret.
> Suggestions on relevant debug masks would be helpful.
> 
> 	Any help on how to figure this out would be appreciated.  Right now
> the PVFS logs (iod and pvfsd) have only the starting message in them - no
> errors are recorded.  Almost seems like a deadlock between read and write
> access (??? pure speculation here).  
> 
> 	I can unmount and shut down the pvfsd and remove the module, but if
> I try to do anything that tries to access the PVFS mounted fs before I do
> that (even a df) - it will hang that shell. However, if I unmount, kill
> pvfsd, rmmod pvfs, and then redo the steps to mount the PVFS it does indeed
> work again (at least running one test at a time).
> 
> 	Thanks in advance,
> 		- Brannen
> 
> 
> 
> 
> 

