[PVFS-users] Re: PVFS access hangs
Brannen S Hough
bshough at impactsci.com
Mon Jul 19 11:07:56 EDT 2004
Thanks for getting back to me, Rob. Responses blended in below:
> -----Original Message-----
> From: Rob Ross [mailto:rross at mcs.anl.gov]
> Sent: Friday, July 16, 2004 4:02 PM
> To: Brannen S Hough
> Cc: pvfs-users at beowulf-underground.org
> Subject: RE: [PVFS-users] Re: PVFS access hangs
>
> On Fri, 16 Jul 2004, Brannen S Hough wrote:
>
> > I've gotten in some gigabit Ethernet equipment in the last couple of
> > days, so I reconfigured my little cluster to use them. In the process I
> > made them more homogeneous by installing RedHat 9 on the other two (so now
> > all are running that). So at least I can get rid of any doubts I have about
> > different kernel versions, etc.
>
> Sounds good. But all that stuff was ok (except for the Xeon) before,
> right?
Yes, that was all working before, only different nodes were running
different kernel versions. Now everyone is on one kernel version. Figured
it would eliminate one of the variables.
>
> > I see a lot of strange things that I cannot understand. Setting
> > things up as in the quick start guide (except I have the pvfs data on a
> > separate partition, /pvfs, with appropriate config file changes) and running
> > tests on the Xeon client (running as Manager, one of the three IONodes, and
> > Client) everything works fine. However, I tested both before and after
> > setting up the kernel module and mounting the filesystem so standard I/O
> > could access it (using the pvfs library calls) and it performed much faster
> > after the filesystem was mounted. That seems odd. Also, the access using
> > the library calls was slower than using standard I/O. Which also seems odd.
>
> How were you testing with the library calls? With standard I/O?
Using the pvfs_open, pvfs_close, pvfs_read, pvfs_write calls - no custom
striping. And using the regular open, close, read, write calls to test
"standard I/O", while keeping everything else as close as possible.
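For reference, a minimal sketch of the "standard I/O" half of such a test, written in Python for brevity rather than the C the pvfs_* calls would need. The function name, block size, and block count are illustrative assumptions, not the actual test program; the library-call half would be a C program doing the same loop with pvfs_open, pvfs_write, pvfs_read and pvfs_close.

```python
import os
import time

def time_standard_io(path, block_size=65536, blocks=64):
    """Write, then read back, `blocks` blocks using plain open/write/read/close.

    Returns (write_seconds, read_seconds, bytes_read).
    """
    payload = b"x" * block_size

    # Timed write phase.
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(blocks):
            written = 0
            # os.write may write fewer bytes than requested, so loop.
            while written < block_size:
                written += os.write(fd, payload[written:])
    finally:
        os.close(fd)
    write_seconds = time.perf_counter() - start

    # Timed read phase: read until end of file.
    start = time.perf_counter()
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:
                break
            bytes_read += len(chunk)
    finally:
        os.close(fd)
    read_seconds = time.perf_counter() - start

    return write_seconds, read_seconds, bytes_read
```

Pointing `path` at a file under the mounted pvfs filesystem (for example a hypothetical /mnt/pvfs/testfile) and at a local disk file gives two sets of timings to compare.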
>
> > However, with the Xeon set up as the Manager, I was not able to use
> > either the library or standard I/O (after mounting the pvfs filesystem using
> > the kernel module) to access the pvfs files on either of the other machines
> > running as IONodes. Here's the dump from the iolog on one of them:
> >
> >
> > $Header: /projects/cvsroot/pvfs/iod/iod.c,v 1.66 2003/11/12 14:09:19 pcarns Exp $
> > do_rw_req: empty job - removing
> > do_rw_req: empty job - removing
> > brecv_timeout: giving up
> > do_rw_req: empty job - removing
> > do_rw_req: empty job - removing
> > do_rw_req: empty job - removing
> > do_rw_req: empty job - removing
> > do_rw_req: empty job - removing
> > do_close_req: failed (i=0, ino=532259, cap=0)
> > *** SENDING ERROR ***
> > new_request: close failed
>
> That "empty job" message occurs when a client asks for a region of a file
> and the server has no data in that region -- a common case would be trying
> to read beyond EOF.
>
> Are you sure that those messages are from the "cat" case below? If they
> are, then they imply that some data was in fact getting to the iods from
> the client node. But my first guess would be that that isn't happening.
>
> Another thing that is possible is that somehow the IP addresses of your
> iods are not quite right? For example, if you listed an address in the
> .iodtab that is accessible from the local node but not remotely, then you
> would have issues. Can you send your .iodtab to us? The pvfs-ping output
> too, in both cases?
That actually sounds like it might be the problem right there - I need to
look at my .hosts and .iodtab files - I specified all nodes by host names,
but my .hosts file might have substituted the loopback address. THAT would
definitely screw it up for any other machine.
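A minimal sketch of that check, assuming the name lookups go through a hosts-format file (one address followed by one or more names per line). The function name and the sample entries (node1, node2, node3 and their addresses) are hypothetical, not taken from the actual cluster.

```python
import ipaddress

def loopback_names(hosts_text):
    """Return {name: address} for non-localhost names mapped to a loopback address."""
    flagged = {}
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        address, *names = entry.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if is_loopback:
            for name in names:
                # localhost aliases are expected to be on the loopback address.
                if not name.startswith("localhost"):
                    flagged[name] = address
    return flagged

# Hypothetical hosts file where the machine's own name sits on the loopback line.
sample = """\
127.0.0.1   node1 localhost.localdomain localhost
192.168.1.2 node2
192.168.1.3 node3
"""
print(loopback_names(sample))  # {'node1': '127.0.0.1'}
```

Any name this flags that also appears in .iodtab would be handed to remote clients as an address they can only reach on themselves.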
>
> > I could use pvfs-ping, pvfs-ls, etc (forgot to try u2p though)
> > without a problem. After mounting the pvfs filesystem I could use 'ls',
> > etc. However my test program hangs while trying to write files (though it
> > can be interrupted). I also tried to 'cat' one of the files, but that hung
> > also (the last errors above are from that attempt).
>
> What does your test program do, and what interface does it use?
>
> Please do try u2p also if you have a chance, I'm guessing that it won't
> work.
I'll try that too - but I think the above one will fix my problem.
>
> > I'm using the Jan patch to pvfs-1.6.2, the two patches for redhat to
> > the kernel code. Not sure what else to try - I could set up one of the
> > other two machines as the Manager and see what happens then - so far my
> > experience has been that unless a machine is the Manager node, I cannot
> > access the pvfs filesystem for read or write even though the utilities work
> > fine.
>
> I'm almost certain that you're seeing either a misconfiguration problem
> due to odd IP addresses in a PVFS config file, or a routing problem, or a
> firewall problem.
>
> Regards,
>
> Rob
I'll check everything out and give you an update - thanks for all your help!
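For the routing and firewall possibilities, a rough sketch of a plain TCP reachability check that could be run from the client against each I/O node. The function name is hypothetical, and the port has to be whatever the iod on that node is actually configured to listen on.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable hosts and failed lookups.
        return False
```

A False result for a node whose iod is known to be running points at addressing, routing, or a firewall rather than at PVFS itself.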
- Brannen