[PVFS-users] mmargo
Ron W. Green
rwgree at sandia.gov
Fri Apr 2 18:52:44 EST 2004
Martin,
We seem to get those "failed on enqueue" quite often. Of course, our
cluster is much bigger too. I've scratched my head on this, and looked
at the code. As best I can tell, it happens when the PVFS client sends a
metadata operation to the mgr node. I suspect that the mgr is slow to
respond and/or has run out of queue space for the metadata operation
request (create or stat).
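One way to see which metadata operations are failing, and against which mgr, is to tally the messages in syslog. Below is a minimal Python sketch of that. It assumes each entry sits on a single line in the format of the kernel messages in the log quoted below (`kernel: (file.c, line): func failed on enqueue for host:port/path`). The helper name `count_enqueue_failures` and `SAMPLE_LOG` are illustrative only and are not part of PVFS.

```python
import re
from collections import Counter

# Hypothetical helper: matches kernel lines like
#   "... kernel: (ll_pvfs.c, 233): ll_pvfs_create failed on enqueue for host:port/path"
# Assumes one syslog entry per line (the quoted log below is wrapped by the mailer).
ENQUEUE_FAIL = re.compile(
    r"kernel: \((?P<src>[\w.]+), (?P<line>\d+)\): "
    r"(?P<func>\w+) failed on enqueue for (?P<target>\S+)"
)

def count_enqueue_failures(lines):
    """Tally 'failed on enqueue' messages per client function and per mgr host:port."""
    by_func = Counter()
    by_mgr = Counter()
    for line in lines:
        m = ENQUEUE_FAIL.search(line)
        if m is None:
            continue  # other messages (e.g. "pvfs_statfs failed") are ignored
        by_func[m.group("func")] += 1
        # The host:port before the first "/" identifies the mgr being contacted.
        by_mgr[m.group("target").split("/", 1)[0]] += 1
    return by_func, by_mgr

# Sample lines taken from the log in the original report, unwrapped.
SAMPLE_LOG = [
    "Mar 3 01:49:12 tg-c131 kernel: (ll_pvfs.c, 233): ll_pvfs_create failed on enqueue for 198.202.112.166:30004/pvfs-meta/testFile",
    "Mar 3 01:49:18 tg-c131 kernel: (ll_pvfs.c, 507): ll_pvfs_statfs failed on enqueue for 198.202.112.166:30004/pvfs-meta",
    "Mar 3 01:49:18 tg-c131 kernel: (inode.c, 264): pvfs_statfs failed",
]

by_func, by_mgr = count_enqueue_failures(SAMPLE_LOG)
print(dict(by_func))
print(dict(by_mgr))
```

On the sample, this reports one failure each for `ll_pvfs_create` and `ll_pvfs_statfs`, both against `198.202.112.166:30004`. On a real node you would pass an open log file instead (for example `/var/log/messages` on SuSE, though the path depends on the syslog configuration). Running it across all clients would show whether creates or statfs calls dominate.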
Does anyone on the list know if the mgr has a fixed queue size? Or can we
jack up the client timeouts? Could the mgr be multithreaded?
From our testing we're quite convinced the problem lies in the mgr: it
can't keep up with metadata requests from the clients.
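The "can't keep up" hypothesis can be made concrete with a toy discrete-time model: a single server draining a fixed-capacity request queue. This is a sketch under stated assumptions, not the mgr's actual code. Whether the real failure comes from a mgr-side queue, a client-side queue, or a timeout is exactly the open question above. `BoundedQueueModel`, `capacity`, and `service_per_tick` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class BoundedQueueModel:
    """Toy model of one server with a fixed-size request queue.

    Hypothetical: parameters are illustrative and not taken from the PVFS mgr.
    """
    capacity: int          # max requests allowed to wait in the queue
    service_per_tick: int  # requests the server completes per time step

    def run(self, arrivals_per_tick: int, ticks: int) -> tuple[int, int]:
        """Return (accepted, rejected) enqueue attempts over `ticks` steps."""
        queued = 0
        accepted = rejected = 0
        for _ in range(ticks):
            # Client requests arriving this step try to enter the queue.
            for _ in range(arrivals_per_tick):
                if queued < self.capacity:
                    queued += 1
                    accepted += 1
                else:
                    rejected += 1  # loosely analogous to "failed on enqueue"
            # The server drains as many requests as it can this step.
            queued -= min(queued, self.service_per_tick)
        return accepted, rejected

small = BoundedQueueModel(capacity=8, service_per_tick=4)
print(small.run(arrivals_per_tick=4, ticks=100))
print(small.run(arrivals_per_tick=5, ticks=100))
```

With `capacity=8` and `service_per_tick=4`, four arrivals per step are always accepted (400 accepted, 0 rejected). Five per step start rejecting one request per step from the fifth step onward (404 accepted, 96 rejected over 100 steps). Raising `capacity` to 80 only pushes the first rejection back to step 77. In this model, once arrivals outpace service, only a higher service rate (a faster or multithreaded server) removes the failures.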
ron
Martin Margo wrote:
> Dear list,
>
> I have a 20-node, two-way Itanium cluster running SuSE Linux
> 2.4.21_144. Each node is an iod and a client; one head node is all
> three (mgr, iod, client). We are running PVFS 1.6.2. I set up this
> system to benchmark PVFS. For the benchmark, I used a tool from
> ASCI called IOR (v2.7.3), an MPI-coordinated test of parallel I/O.
>
> Each iod has access to a 73 GB internal SCSI disk, making the entire
> file system 1.3 TB in size.
>
> My IOR test hung almost every time; usually one of the clients hung
> and stopped responding. I started out with each node writing a 1 GB
> file with a 16k block size, then increased the block size while keeping
> the file size constant at 1 GB. I saw that Rob had a patch out on
> January 29, 2004; is that patch incorporated into PVFS 1.6.2? I would
> like to try the patch, but I couldn't seem to get it from the list. Is
> it available somewhere on the website or FTP server?
>
> By the way, this is the error I got in syslog from the node that hung:
>
> Mar 3 01:49:12 tg-c131 kernel: (ll_pvfs.c, 233): ll_pvfs_create failed on enqueue for 198.202.112.166:30004/pvfs-meta/testFile
> Mar 3 01:49:18 tg-c131 kernel: (ll_pvfs.c, 507): ll_pvfs_statfs failed on enqueue for 198.202.112.166:30004/pvfs-meta
> Mar 3 01:49:18 tg-c131 kernel: (inode.c, 264): pvfs_statfs failed
> Mar 3 01:50:54 tg-c131 sshd[9466]: Accepted rsa for root from ::ffff:198.202.112.166 port 33773
>
>
> Thanks in advance.
>
> -Martin W Margo
> HPC Engineer
> San Diego Supercomputer Center
>
> _______________________________________________
> PVFS-users mailing list
> PVFS-users at www.beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs-users
--
Ron W. Green
rwgree at sandia.gov
+1-505-284-1600
Sr. Engineer, ICC Applications Support