[PVFS-users] RE: PVFS Hangups during concurrent reads/writes
Brannen S Hough
bshough at impactsci.com
Tue Aug 10 10:27:04 EDT 2004
I haven't had any luck finding mention of select() problems in
general either - just the notes in the pvfs-1.6.3-pre3/shared/sockset.c file.
I'm sorry to hear that no one has seen anything like it before, but
not too surprised. I seem to have a gift for breaking things in new and
novel ways - a gift of dubious utility at times 8-).
My test program is a little odd - I was switching back and forth
between access through the libraries and standard I/O so I could get a feel
for how they compare to each other for different file and block sizes. I
will get rid of the local copies of the headers and symbolically link in the
current ones - it was only meant to be a quick and dirty test, not something
I'd use long term, so I wasn't too systematic about writing it.
My test program can use either standard file I/O in a short test
(the -s switch), or the pvfs libraries in a short test (the -sd switch). So
I've run them both ways - using all standard file I/O and all via pvfs
library calls. No difference in the behaviour.
The /etc/pvfstab files on both machines are identical and point to
the same directory (/mnt/pvfs) on the same machine, but I'll double check
again, and send you a copy too.
I'll check on the PVFS_USE_NODELAY define too.
Thanks for all your help,
> -----Original Message-----
> From: Rob Ross [mailto:rross at mcs.anl.gov]
> Sent: Monday, August 09, 2004 5:42 PM
> To: Brannen S Hough
> Cc: pvfs-users at beowulf-underground.org
> Subject: Re: PVFS Hangups during concurrent reads/writes
> Hi Brannen,
> I did a quick search and couldn't find any mention of 2.4.20 select()
> problems. Of course I would like this to be a kernel problem, or perhaps a
> libc problem, but I don't see anything indicating that others have had the
> same issues.
> At the same time, no, we haven't had anything like this reported either!
> It's particularly odd to me that things work fine when on different
> machines while working just fine on the same machine! Usually it is the
> other way around :).
> Your test program is a little odd in that it moves back and forth between
> using the kernel and using the user library (if my cursory skim got the
> right impression). Also, you're playing a dangerous game keeping extra
> copies of the PVFS headers in the test subdirectory; there are changes
> between what I see in there and CVS for sure.
> Have you tried just using the kernel interface or just using the library?
> If so, did those work ok? Do you have an /etc/pvfstab file set up on your
> machine pointing exactly to the same directory as the mount point?
> Can you verify for me that PVFS_USE_NODELAY is defined in pvfs/config.h
> (not pvfs-kernel)? It's probably defined twice (it's ok).
> Thanks, and sorry we don't have a quick solution for you!
> On Mon, 9 Aug 2004, Brannen S Hough wrote:
> > I've been trying to isolate this problem and find a way
> > it. At its core it seems to be a select() call problem, which would
> mean a
> > linux kernel problem. Attached is a screen shot of the trace from
> > ddd on pvfsd, gets hung up on line 199 in sockset.c, which is calling
> > dfd_select() (in pvfs-1.6.3-pre3/shared/dfd_set.c), which is calling
> > select().
> > I tried updating my RedHat 9 to kernel version 2.4.20-31.9,
> > recompiling everything, and rerunning my tests, but I got the same result.
> > Any other ideas? I could try rewriting the dfd_select routine to break
> > each socket file descriptor individually and calling select() on each
> > instead of passing the array of file descriptors to select(), but I'm
> > sure that would fix the problem (and would make things slightly less
> > efficient).
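
For reference, the per-descriptor select() idea in the quoted message above could look roughly like the sketch below. This is not the PVFS dfd_select() code: poll_one_fd() and poll_each_fd() are hypothetical helper names, the plain int array stands in for whatever set structure dfd_set.c really uses, and only readability is checked. It just shows one select() call per socket instead of one call over the whole set.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not part of PVFS): wait on a single descriptor
 * with its own select() call.
 * Returns 1 if readable, 0 if not ready within timeout_ms, -1 on error. */
static int poll_one_fd(int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;                      /* fd_set cannot represent this fd */

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;                      /* includes EINTR; caller may retry */
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Hypothetical stand-in for one select() over the whole set: checks each
 * descriptor in turn, stores 0/1 in ready[i], and returns the number of
 * readable descriptors, or -1 on the first error. */
static int poll_each_fd(const int *fds, size_t nfds, int *ready, long timeout_ms)
{
    assert(fds != NULL && ready != NULL);

    int count = 0;
    for (size_t i = 0; i < nfds; i++) {
        int rc = poll_one_fd(fds[i], timeout_ms);
        if (rc < 0)
            return -1;
        ready[i] = rc;
        count += rc;
    }
    return count;
}
```

Note that the timeout applies to each descriptor separately, so a pass over n sockets can block for up to n times timeout_ms, and it costs n system calls instead of one. That is the efficiency loss mentioned above, and a zero timeout turns the loop into a non-blocking scan.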