[PVFS-users] Re: PVFS access hangs
Brannen S Hough
bshough at impactsci.com
Fri Jul 16 16:19:57 EDT 2004
Rob,
I've gotten in some gigabit Ethernet equipment in the last couple of
days, so I reconfigured my little cluster to use it. In the process I
made the machines more homogeneous by installing Red Hat 9 on the other two
(so now all are running that). So at least I can get rid of any doubts I
have about different kernel versions, etc.
I see a lot of strange things that I cannot understand. Setting
things up as in the quick start guide (except I have the pvfs data on a
separate partition, /pvfs, with appropriate config file changes) and running
tests on the Xeon client (running as Manager, one of the three IONodes, and
Client), everything works fine. However, I ran the test using the pvfs
library calls both before and after setting up the kernel module and
mounting the filesystem (so standard I/O could access it), and it performed
much faster after the filesystem was mounted. That seems odd. Also, access
using the library calls was slower than using standard I/O, which also seems
odd.
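A comparison like the one described above depends on timing the same
amount of data the same way in each run. The following is a minimal sketch
of a standard I/O write timer, not the test program used in this post. The
function name time_write, the default sizes, and the target path are all
hypothetical; pass a file on the mounted pvfs filesystem as the argument.

```python
import os
import sys
import time


def time_write(path, total_bytes=8 * 1024 * 1024, block_size=64 * 1024):
    """Write total_bytes to path in block_size chunks.

    Returns (bytes_written, elapsed_seconds). The sizes are illustrative
    defaults, not values taken from the original test.
    """
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            # Trim the final chunk so exactly total_bytes are written.
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        # Push buffered data out so the timing covers the real write.
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return written, elapsed


if __name__ == "__main__":
    # Hypothetical usage: python time_write.py <file on the mounted filesystem>
    target = sys.argv[1] if len(sys.argv) > 1 else "timing_test.dat"
    n, secs = time_write(target)
    print(f"{n} bytes in {secs:.3f} s ({n / secs / 1e6:.1f} MB/s)")
```

Running it against a local disk and then against the mounted filesystem
with the same sizes gives two numbers that can be compared directly.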
However, with the Xeon set up as the Manager, I was not able to use
either the library or standard I/O (after mounting the pvfs filesystem using
the kernel module) to access the pvfs files on either of the other machines
running as IONodes. Here's the dump from the iolog on one of them:
$Header: /projects/cvsroot/pvfs/iod/iod.c,v 1.66 2003/11/12 14:09:19 pcarns Exp $
do_rw_req: empty job - removing
do_rw_req: empty job - removing
brecv_timeout: giving up
do_rw_req: empty job - removing
do_rw_req: empty job - removing
do_rw_req: empty job - removing
do_rw_req: empty job - removing
do_rw_req: empty job - removing
do_close_req: failed (i=0, ino=532259, cap=0)
*** SENDING ERROR ***
new_request: close failed
I could use pvfs-ping, pvfs-ls, etc. (forgot to try u2p though)
without a problem. After mounting the pvfs filesystem I could use 'ls',
etc. However, my test program hangs while trying to write files (though it
can be interrupted). I also tried to 'cat' one of the files, but that hung
as well (the last errors above are from that attempt).
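Since the hang can be interrupted, one way to probe it from each machine
without tying up a terminal is to attempt the write in a child process and
give up after a timeout. This is a minimal sketch under that assumption;
write_with_timeout, the 10 second default, and the path passed to it are
hypothetical, and a process stuck in uninterruptible I/O would not be
killed this way.

```python
import subprocess
import sys

# Small child program: write N bytes of filler to the given path.
WRITER = (
    "import sys\n"
    "with open(sys.argv[1], 'wb') as f:\n"
    "    f.write(b'x' * int(sys.argv[2]))\n"
)


def write_with_timeout(path, nbytes=4096, timeout_s=10.0):
    """Attempt a write in a child process.

    Returns 'ok' if the child exits cleanly, 'error' if it exits with a
    nonzero status, and 'timeout' if it is still running after timeout_s
    seconds (the child is killed in that case).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", WRITER, path, str(nbytes)],
            timeout=timeout_s,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    # Hypothetical usage: python probe.py <file on the mounted filesystem>
    target = sys.argv[1] if len(sys.argv) > 1 else "probe_test.dat"
    print(write_with_timeout(target))
```

A 'timeout' result on the non-Manager machines and 'ok' on the Manager
would reproduce the pattern described here in a form that is easy to
repeat after each configuration change.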
I'm using the Jan patch to pvfs-1.6.2 and the two Red Hat patches to
the kernel code. Not sure what else to try - I could set up one of the
other two machines as the Manager and see what happens then - so far my
experience has been that unless a machine is the Manager node, I cannot
access the pvfs filesystem for read or write even though the utilities work
fine.
Any ideas would be great.
- Brannen
> -----Original Message-----
> From: Rob Ross [mailto:rross at mcs.anl.gov]
> Sent: Monday, July 12, 2004 10:15 PM
> To: Brannen S Hough
> Cc: pvfs-users at beowulf-underground.org
> Subject: Re: [PVFS-users] Re: PVFS access hangs
>
> Hi Brannen,
>
> There's very little that changes due to configure scripts, and in
> particular for a given PVFS version there are no changes to what is sent
> between client and server based on that (the protocol is fixed), so that
> shouldn't be an issue.
>
> Do the PVFS utilities work when you have your original Xeon client
> configuration (that was hanging)? In particular do pvfs-ping, u2p (to
> copy something onto the file system), and pvfs-ls work?
>
> Those will help us narrow down whether it's definitely a kernel issue or
> something stranger.
>
> Thanks!
>
> Rob
>
> On Fri, 9 Jul 2004, Brannen S Hough wrote:
>
> > I've gotten further in diagnosing my problem - I've gotten the Red Hat
> > 9 system to work as a standalone (as Manager, only IONode, and Client),
> > and can mount and read and write to the pvfs file system OK. I also
> > extended it by adding two other nodes (a PIII based machine and a PII
> > based machine), so I have 3 IONodes running and can use the pvfs file
> > system OK.
> >
> > I was running before using the PIII based system as the Manager, and
> > using the Red Hat 9 system (on a Xeon) as the Client - which was
> > hanging.
> >
> > How much adjusting of the source code is done by the configure script?
> > I built the binaries for each system on each system separately - is it
> > possible there was some inconsistency introduced by the configure script
> > due to the different architectures of the different machines I am using?