[PVFS-users] mmargo

Ron W. Green rwgree at sandia.gov
Fri Apr 2 18:52:44 EST 2004


Martin,

We seem to get those "failed on enqueue" errors quite often.  Of course, our 
cluster is much bigger too.  I've scratched my head over this and looked 
at the code.  As best I can tell, it happens when the pvfs client attempts a 
metadata operation against the mgr node.  I suspect that the mgr is slow in 
responding and/or has run out of queueing space to enqueue the metadata 
operation request (create or stat).
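
One way to check that suspicion is to time bare create and stat calls against the mounted file system while the cluster is under load. What follows is only a minimal sketch, not anything taken from the PVFS code: the function name, the file-name pattern, and the directory argument are all hypothetical, and it uses nothing PVFS-specific, just ordinary POSIX calls through Python's os module.

```python
import os
import time


def time_metadata_ops(directory, count=100):
    """Create, stat, and unlink `count` empty files in `directory`.

    Returns two lists of latencies in seconds: one for the creates and
    one for the stats. `directory` is assumed to be a path on the file
    system under test (hypothetical; pick your own mount point).
    """
    create_times = []
    stat_times = []
    for i in range(count):
        # Hypothetical file-name pattern; the pid keeps clients from colliding.
        path = os.path.join(directory, "mdtest.%d.%d" % (os.getpid(), i))

        start = time.perf_counter()
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        create_times.append(time.perf_counter() - start)
        os.close(fd)

        start = time.perf_counter()
        os.stat(path)
        stat_times.append(time.perf_counter() - start)

        # Clean up so repeated runs start from an empty directory.
        os.unlink(path)
    return create_times, stat_times
```

Running this from several clients at once and comparing max(create_times) against a single-client run should show whether metadata latency climbs with the number of clients, which is what a saturated mgr would look like.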

Does anyone on the list know if mgr has a fixed queue size?  Or can we jack 
up the client timeouts?  Or multithread mgr?

From our testing we're quite convinced the problem lies in mgr: it 
can't keep up with metadata requests from the clients.
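
To see how often this happens and which operations are hit, the syslog entries can be tallied per host and operation. This is a rough sketch that assumes each kernel message sits on a single line in syslog (the ones quoted below were wrapped by the mail client) and follows the same layout as those messages; the function name and the regular expression are my own, not part of PVFS.

```python
import re
from collections import Counter

# Assumed layout, based on the messages quoted in this thread:
#   "Mar  3 01:49:12 tg-c131 kernel: (ll_pvfs.c, 233): ll_pvfs_create failed on enqueue ..."
ENQUEUE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<host>\S+)\s+kernel:\s+"
    r"\([^,]+,\s*\d+\):\s+(?P<op>\w+)\s+failed on enqueue"
)


def tally_enqueue_failures(lines):
    """Count 'failed on enqueue' entries per (host, operation) pair."""
    counts = Counter()
    for line in lines:
        match = ENQUEUE_RE.match(line)
        if match:
            counts[(match.group("host"), match.group("op"))] += 1
    return counts


# Sample input: the quoted syslog entries, rejoined onto single lines.
SAMPLE = [
    "Mar  3 01:49:12 tg-c131 kernel: (ll_pvfs.c, 233): ll_pvfs_create "
    "failed on enqueue for 198.202.112.166:30004/pvfs-meta/testFile",
    "Mar  3 01:49:18 tg-c131 kernel: (ll_pvfs.c, 507): ll_pvfs_statfs "
    "failed on enqueue for 198.202.112.166:30004/pvfs-meta",
    "Mar  3 01:49:18 tg-c131 kernel: (inode.c, 264): pvfs_statfs failed",
]

for (host, op), n in sorted(tally_enqueue_failures(SAMPLE).items()):
    print(host, op, n)
```

Feeding it a whole syslog file (for example, the lines of an open file object) and lining the counts up against the times the benchmark was running should show whether the failures cluster around bursts of creates.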

ron

Martin Margo wrote:

> Dear list,
>
> I have a 20-node, two-way Itanium cluster running SuSE Linux 
> 2.4.21_144. Each node is an iod and a client; one head node is all 
> three (mgr, iod, client). We are running PVFS 1.6.2. I set up this 
> system to benchmark PVFS. For the benchmark I used a tool from 
> ASCI called IOR (v2.7.3), an MPI-coordinated test of parallel I/O.
>
> Each iod has access to a 73 GB internal SCSI disk, making the entire 
> fs 1.3 TB in size.
>
> My IOR test hangs almost every time; usually one of the clients hangs 
> and stops responding. I started out with each node writing a 1 GB file 
> with a 16k block size, then increased the block size while keeping the 
> 1 GB file size constant. I saw that Rob had a patch out on January 29, 
> 2004. Is that patch incorporated into PVFS 1.6.2? I would like to try 
> the patch, but I couldn't seem to get it from the list. Is it available 
> somewhere on the website or FTP server?
>
> By the way, this is the error I got in syslog from the node that hung:
>
> Mar  3 01:49:12 tg-c131 kernel: (ll_pvfs.c, 233): ll_pvfs_create failed on enqueue for 198.202.112.166:30004/pvfs-meta/testFile
> Mar  3 01:49:18 tg-c131 kernel: (ll_pvfs.c, 507): ll_pvfs_statfs failed on enqueue for 198.202.112.166:30004/pvfs-meta
> Mar  3 01:49:18 tg-c131 kernel: (inode.c, 264): pvfs_statfs failed
> Mar  3 01:50:54 tg-c131 sshd[9466]: Accepted rsa for root from ::ffff:198.202.112.166 port 33773
>
>
> Thanks in advance.
>
> -Martin W Margo
> HPC Engineer
> San Diego Supercomputer Center
>


-- 
Ron W. Green
rwgree at sandia.gov
+1-505-284-1600

Sr. Engineer, ICC Applications Support




