[PVFS-users] RE: PVFS Hangups during concurrent reads/writes

Brannen S Hough bshough at impactsci.com
Tue Aug 10 10:27:04 EDT 2004


	Hi Rob,
	I haven't had any luck finding mention of select() problems in
general either - just the notes in the pvfs-1.6.3-pre3/shared/sockset.c file
itself.
	I'm sorry to hear that no one has seen anything like it before, but
not too surprised.  I seem to have a gift for breaking things in new and
novel ways - a gift of dubious utility at times  8-).
	My test program is a little odd - I was switching back and forth
between access through the libraries and standard I/O so I could get a feel
for how they compare to each other for different file and block sizes.  I
will get rid of the local copies of the headers and symbolically link in the
current ones - it was only meant to be a quick and dirty test, not something
I'd use long term, so I wasn't too systematic about writing it.
	My test program can use either standard file I/O in a short test
(the -s switch), or the pvfs libraries in a short test (the -sd switch).  So
I've run them both ways - using all standard file I/O and all via pvfs
library calls.  No difference in the behaviour.
	The /etc/pvfstab files on both machines are identical and point to
the same directory (/mnt/pvfs) on the same machine, but I'll double check
again, and send you a copy too.
	I'll check on the PVFS_USE_NODELAY define too.

	Thanks for all your help,
	- Brannen

> -----Original Message-----
> From: Rob Ross [mailto:rross at mcs.anl.gov]
> Sent: Monday, August 09, 2004 5:42 PM
> To: Brannen S Hough
> Cc: pvfs-users at beowulf-underground.org
> Subject: Re: PVFS Hangups during concurrent reads/writes
> 
> Hi Brannen,
> 
> I did a quick search and couldn't find any mention of 2.4.20 select()
> problems.  Of course I would like this to be a kernel problem, or perhaps a
> libc problem, but I don't see anything indicating that others have had the
> same issues.
> 
> At the same time, no, we haven't had anything like this reported either!
> It's particularly odd to me that things hang when on different
> machines while working just fine on the same machine!  Usually it is the
> other way around :).
> 
> Your test program is a little odd in that it moves back and forth between
> using the kernel and using the user library (if my cursory skim got the
> right impression).  Also, you're playing a dangerous game keeping extra
> copies of the PVFS headers in the test subdirectory; there are changes
> between what I see in there and CVS for sure.
> 
> Have you tried just using the kernel interface or just using the library?
> If so, did those work ok?  Do you have an /etc/pvfstab file set up on your
> machine pointing exactly to the same directory as the mount point?
> 
> Can you verify for me that PVFS_USE_NODELAY is defined in pvfs/config.h
> (not pvfs-kernel)?  It's probably defined twice (it's ok).
> 
> Thanks, and sorry we don't have a quick solution for you!
> 
> Rob
> 
> 
> On Mon, 9 Aug 2004, Brannen S Hough wrote:
> 
> >              I've been trying to isolate this problem and find a way around
> > it.  At its core it seems to be a select() call problem, which would mean a
> > Linux kernel problem.   Attached is a screen shot of the trace from running
> > ddd on pvfsd, which gets hung up on line 199 in sockset.c, which is calling
> > dfd_select() (in pvfs-1.6.3-pre3/shared/dfd_set.c), which is calling
> > select().
> >
> >              I tried updating my RedHat 9 to kernel version 2.4.20-31.9,
> > recompiling everything, and rerunning my tests, but I got the same results.
> > Any other ideas?   I could try rewriting the dfd_select routine to break out
> > each socket file descriptor individually and call select() on each
> > instead of passing the array of file descriptors to select(), but I'm not
> > sure that would fix the problem (and would make things slightly less
> > efficient).
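
For illustration only, the per-descriptor idea described above might look
roughly like the sketch below. This is not the PVFS dfd_select() code; the
names select_each_readable() and demo_select_each() are hypothetical, and
the sketch assumes plain POSIX select() on descriptors below FD_SETSIZE.
It only checks readability, and it is not known whether this approach
would avoid the hang.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PVFS): call select() once per
 * descriptor instead of once for the whole set.
 * ready[i] is set to 1 if fds[i] is readable, otherwise 0.
 * Returns the number of readable descriptors, or -1 if select() fails.
 */
int select_each_readable(const int *fds, size_t nfds, int *ready,
                         long timeout_usec)
{
    int count = 0;

    assert(fds != NULL && ready != NULL);

    for (size_t i = 0; i < nfds; i++) {
        fd_set rset;
        struct timeval tv;
        int rc;

        ready[i] = 0;
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            continue;           /* select() cannot watch this descriptor */

        FD_ZERO(&rset);
        FD_SET(fds[i], &rset);
        /* select() may modify the timeout, so rebuild it on every pass. */
        tv.tv_sec = timeout_usec / 1000000L;
        tv.tv_usec = timeout_usec % 1000000L;

        rc = select(fds[i] + 1, &rset, NULL, NULL, &tv);
        if (rc < 0)
            return -1;          /* caller can inspect errno (e.g. EINTR) */
        if (rc > 0 && FD_ISSET(fds[i], &rset)) {
            ready[i] = 1;
            count++;
        }
    }
    return count;
}

/*
 * Usage sketch: two pipes, with data written to only the second one.
 * Returns the number of readable descriptors found, or -1 on any error.
 */
int demo_select_each(void)
{
    int a[2], b[2], fds[2], ready[2] = {0, 0};
    int n = -1;

    if (pipe(a) < 0)
        return -1;
    if (pipe(b) == 0) {
        if (write(b[1], "x", 1) == 1) {
            fds[0] = a[0];      /* empty pipe: should time out */
            fds[1] = b[0];      /* pipe with one byte pending */
            n = select_each_readable(fds, 2, ready, 1000);
            if (n == 1 && !(ready[0] == 0 && ready[1] == 1))
                n = -1;
        }
        close(b[0]);
        close(b[1]);
    }
    close(a[0]);
    close(a[1]);
    return n;
}
```

The cost mentioned above shows up directly: there is one system call per
descriptor, and the per-descriptor timeouts are waited out one after
another rather than in parallel.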





