[PVFS-users] Re: PVFS access hangs

Brannen S Hough bshough at impactsci.com
Mon Jul 19 11:07:56 EDT 2004


Thanks for getting back to me, Rob.  Responses blended in below:

> -----Original Message-----
> From: Rob Ross [mailto:rross at mcs.anl.gov]
> Sent: Friday, July 16, 2004 4:02 PM
> To: Brannen S Hough
> Cc: pvfs-users at beowulf-underground.org
> Subject: RE: [PVFS-users] Re: PVFS access hangs
> 
> On Fri, 16 Jul 2004, Brannen S Hough wrote:
> 
> > 	I've gotten in some gigabit Ethernet equipment in the last couple of
> > days, so I reconfigured my little cluster to use them.  In the process I
> > made them more homogeneous by installing RedHat 9 on the other two (so now
> > all are running that).  So at least I can get rid of any doubts I have about
> > different kernel versions, etc.
> 
> Sounds good.  But all that stuff was ok (except for the Xeon) before,
> right?

Yes, that was all working before, only different nodes were running
different kernel versions.  Now everyone is on one kernel version.  Figured
it would eliminate one of the variables.

> 
> > 	I see a lot of strange things that I cannot understand.  Setting
> > things up as in the quick start guide (except I have the pvfs data on a
> > separate partition, /pvfs, with appropriate config file changes) and running
> > tests on the Xeon client (running as Manager, one of the three IONodes, and
> > Client) everything works fine.  However, I tested both before and after
> > setting up the kernel module and mounting the filesystem so standard I/O
> > could access it (using the pvfs library calls) and it performed much faster
> > after the filesystem was mounted.  That seems odd.  Also, the access using
> > the library calls was slower than using standard I/O.  Which also seems odd.
> 
> How were you testing with the library calls?  With standard I/O?

Using the pvfs_open, pvfs_close, pvfs_read, pvfs_write calls - no custom
striping.  And using the regular open, close, read, write calls to test
"standard I/O", while keeping everything else as close as possible.
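For anyone wanting to repeat the "standard I/O" half of that comparison, here is a minimal sketch in Python rather than my actual test program.  It assumes the PVFS filesystem is already mounted through the kernel module; the function name, block size, and block count are made up for illustration, and it does not touch the pvfs_* library calls at all (those are C).

```python
import os
import time

def time_write_read(path, block_size=65536, count=16):
    """Write `count` blocks to `path`, read them back, and time both phases.

    Uses only the plain open/write/read/close system calls via the os module.
    Returns (write_seconds, read_seconds, bytes_read).
    """
    block = b"x" * block_size

    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            written = 0
            # os.write may write fewer bytes than requested, so loop.
            while written < block_size:
                written += os.write(fd, block[written:])
    finally:
        os.close(fd)
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:  # empty result means end of file
                break
            bytes_read += len(chunk)
    finally:
        os.close(fd)
    read_seconds = time.perf_counter() - start

    return write_seconds, read_seconds, bytes_read

# Hypothetical usage (the mount point is an assumption):
#   print(time_write_read("/mnt/pvfs/timing_test.bin"))
```

Timings from a sketch like this include page-cache effects, so they are only comparable between runs done the same way.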

> 
> > 	However, with the Xeon set up as the Manager, I was not able to use
> > either the library or standard I/O (after mounting the pvfs filesystem using
> > the kernel module) to access the pvfs files on either of the other machines
> > running as IONodes.  Here's the dump from the iolog on one of them:
> >
> >
> > $Header: /projects/cvsroot/pvfs/iod/iod.c,v 1.66 2003/11/12 14:09:19
> > pcarns Exp $
> > do_rw_req: empty job - removing
> > do_rw_req: empty job - removing
> > brecv_timeout: giving up
> > do_rw_req: empty job - removing
> > do_rw_req: empty job - removing
> > do_rw_req: empty job - removing
> > do_rw_req: empty job - removing
> > do_rw_req: empty job - removing
> > do_close_req: failed (i=0, ino=532259, cap=0)
> > *** SENDING ERROR ***
> > new_request: close failed
> 
> That "empty job" message occurs when a client asks for a region of a file
> and the server has no data in that region -- a common case would be trying
> to read beyond EOF.
> 
> Are you sure that those messages are from the "cat" case below?  If they
> are, then they imply that some data was in fact getting to the iods from
> the client node.  But my first guess would be that that isn't happening.
> 
> Another thing that is possible is that somehow the IP addresses of your
> iods are not quite right?  For example, if you listed an address in the
> .iodtab that is accessible from the local node but not remotely, then you
> would have issues.  Can you send your .iodtab to us?  The pvfs-ping output
> too, in both cases?

That actually sounds like it might be the problem right there - I need to
look at my .hosts and .iodtab files - I specified all nodes by host names,
but my .hosts file might have substituted the loopback address.  THAT would
definitely screw it up for any other machine.
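A rough sketch of the check I have in mind, in Python: scan hosts-format text (address first, then names) and report any cluster node name that sits on a loopback line.  The function name, node names, and sample addresses are hypothetical, not my real configuration.

```python
import ipaddress

def loopback_names(hosts_text, node_names):
    """Return the node names that hosts-format text maps to a loopback address."""
    wanted = set(node_names)
    flagged = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            is_loop = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if is_loop:
            for name in names:
                if name in wanted and name not in flagged:
                    flagged.append(name)
    return flagged

if __name__ == "__main__":
    # Made-up sample: node1 has been placed on the loopback line.
    sample = """
127.0.0.1   localhost.localdomain localhost node1
192.168.1.2 node2
192.168.1.3 node3
"""
    print(loopback_names(sample, ["node1", "node2", "node3"]))
```

Any name this reports would resolve fine on its own machine but point every other machine back at itself.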

> 
> > 	I could use pvfs-ping, pvfs-ls, etc (forgot to try u2p though)
> > without a problem.  After mounting the pvfs filesystem I could use 'ls',
> > etc.  However my test program hangs while trying to write files (though it
> > can be interrupted).  I also tried to 'cat' one of the files, but that hung
> > also (the last errors above are from that attempt).
> 
> What does your test program do, and what interface does it use?
> 
> Please do try u2p also if you have a chance, I'm guessing that it won't
> work.

I'll try that too - but I think the above one will fix my problem.

> 
> > 	I'm using the Jan patch to pvfs-1.6.2, the two patches for redhat to
> > the kernel code.  Not sure what else to try - I could set up one of the
> > other two machines as the Manager and see what happens then - so far my
> > experience has been that unless a machine is the Manager node, I cannot
> > access the pvfs filesystem for read or write even though the utilities work
> > fine.
> 
> I'm almost certain that you're seeing either a misconfiguration problem
> due to odd IP addresses in a PVFS config file, or a routing problem, or a
> firewall problem.
> 
> Regards,
> 
> Rob

I'll check everything out and give you an update - thanks for all your help!
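For the routing and firewall possibilities, one quick test is a plain TCP connect from the client to each iod's host and port.  A minimal Python sketch follows; the helper name is made up, and the host names and port in the usage comment are assumptions (use whatever your own .iodtab lists).

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

# Hypothetical usage, run from the client node:
#   for host in ["node1", "node2", "node3"]:
#       print(host, can_connect(host, 7000))
```

A False result from a remote node alongside a True result on the iod's own machine would point at addressing, routing, or a firewall rather than at PVFS itself.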

	- Brannen






