[Pvfs2-users] Major Performance Issues with my pvfs2 install
jkusznir at gmail.com
Thu Sep 29 11:42:59 EDT 2011
1) iperf (defaults) reported 873, 884, and 929 Mbit/s for connections from
the three servers to the head node (a pvfs2 client).
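Those iperf numbers are in Mbit/s, so the raw TCP path is good for roughly 110 MB/s per link, an order of magnitude above the ~10 MB/s seen in bwm-ng later in the thread. A quick sketch of the conversion (my arithmetic, not from the original message):

```python
# Rough conversion of iperf's Mbit/s figures to MB/s, to compare against
# the ~10 MB/s observed with bwm-ng. iperf uses decimal units
# (1 Mbit = 10^6 bits), so MB/s here means 10^6 bytes per second.
def mbit_to_mbyte(mbit_per_s):
    return mbit_per_s / 8.0

for mbit in (873, 884, 929):
    print("%d Mbit/s is about %.0f MB/s" % (mbit, mbit_to_mbyte(mbit)))
```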
2) no errors showed up on any of the ports on the managed switch.
3) I'm not sure what this will do, as the pvfs2 volume is composed of
3 servers, so mounting it on a server still uses the network for the
other two. I also don't understand the "single datafile per file"
statement. In any case, I do not have the kernel module compiled on
my servers; they ONLY have the pvfs2 server software installed.
4) I'm not sure; I used largely defaults. I've attached my config below.
5) the network bandwidth measurement was taken on one of the servers
(the one I checked; I believe them all to be similar).
6) Not sure. I have created an XFS filesystem using LVM to combine
the two hardware raid6 volumes and mounted that at /mnt/pvfs2 on the
servers. I then let pvfs do its magic. Config files below.
7) (from second e-mail): Config file attached.
[root at pvfs2-io-0-2 mnt]# cat /etc/pvfs2-fs.conf
Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
Range pvfs2-io-0-0 4-715827885
Range pvfs2-io-0-1 715827886-1431655767
Range pvfs2-io-0-2 1431655768-2147483649
Range pvfs2-io-0-0 2147483650-2863311531
Range pvfs2-io-0-1 2863311532-3579139413
Range pvfs2-io-0-2 3579139414-4294967295
All the server config files are very similar.
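Those Range lines should tile the full 32-bit handle space across the three servers, with each server appearing twice (presumably its metadata and data handle ranges) and no gaps or overlaps. A quick sanity-check sketch, using the values from the config above:

```python
# Sanity-check that the Range entries from pvfs2-fs.conf tile the
# handle space [4, 2**32 - 1] contiguously, with no gaps or overlaps.
ranges = [
    ("pvfs2-io-0-0", 4,          715827885),
    ("pvfs2-io-0-1", 715827886,  1431655767),
    ("pvfs2-io-0-2", 1431655768, 2147483649),
    ("pvfs2-io-0-0", 2147483650, 2863311531),
    ("pvfs2-io-0-1", 2863311532, 3579139413),
    ("pvfs2-io-0-2", 3579139414, 4294967295),
]

expected_start = 4
for alias, lo, hi in sorted(ranges, key=lambda r: r[1]):
    # Each range must begin exactly where the previous one ended.
    assert lo == expected_start, "gap or overlap before %s at %d" % (alias, lo)
    expected_start = hi + 1

assert expected_start - 1 == 2**32 - 1  # covers the full 32-bit space
print("handle ranges are contiguous")
```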
On Wed, Sep 28, 2011 at 4:45 PM, Michael Moore <mtmoore at omnibond.com> wrote:
> No doubt something is awry. Offhand I'm suspecting the network. A couple
> things that might help give a direction:
> 1) Do an end-to-end TCP test between client/server. Something like iperf or
> nuttcp should do the trick.
> 2) Check server and client ethernet ports on the switch for high error
> counts (not familiar with that switch, not sure if it's managed or not).
> Hardware (port/cable) errors should show up in the above test.
> 3) Can you mount the PVFS2 file system on the server and run some I/O tests
> (single datafile per file) to see if the network is in fact in play.
> 4) How many datafiles (by default) is each file you're writing to
> using? 3?
> 5) When you watch network bandwidth and see 10 MB/s, where is that? On the
> client or on the server?
> 6) What backend are you using for I/O, direct or alt-aio? Nothing really
> wrong either way, just wondering.
> It sounds like, based on the dd output, the disks are capable of more than
> you're seeing; just need to narrow down where the performance is getting
> lost.
> On Wed, Sep 28, 2011 at 6:10 PM, Jim Kusznir <jkusznir at gmail.com> wrote:
>> Hi all:
>> I've got a pvfs2 install on my cluster. I never felt it was
>> performing up to snuff, but lately total throughput and overall
>> usability have dropped way down, to the point that jobs writing out
>> 900MB take an extra 1-2 hours to complete due to disk I/O waits. A
>> 2-hr job that writes about 30GB over the course of the run takes up
>> to 20hrs; once the disk I/O is cut out, it completes in 1.5-2hrs.
>> I've noticed
>> personally that there's up to a 5 sec lag time when I cd into
>> /mnt/pvfs2 and do an ls. Note that all of our operations are using
>> the kernel module / mount point. Our programs and code base do not
>> support the use of other tools (such as the pvfs2-* utilities or the
>> native MPI libraries); it's all done through the kernel module /
>> filesystem interface.
>> My configuration is this: 3 pvfs2 servers (Dell PowerEdge 1950's with
>> 1.6Ghz quad-core CPUs, 4GB ram, raid-0 for metadata+os on perc5i
>> card), Dell Perc6e card with hardware raid6 in two volumes: one on a
>> bunch of 750GB sata drives, and the other on its second SAS connector
>> to about 12 2tb WD drives. The two raid volumes are lvm'ed together
>> in the OS and mounted as the pvfs2 data store. Each server is
>> connected via ethernet to a stack of LG-Ericsson gig-e switches
>> (stack==2 switches with 40Gbit stacking cables installed). PVFS 2.8.2
>> used throughout the cluster on Rocks (using site-compiled pvfs, not
>> the rocks-supplied pvfs). OSes are CentOS5-x-based (both clients and
>> servers).
>> As I said, I always felt something wasn't quite right, but a few
>> months back, I performed a series of upgrades and reconfigurations on
>> the infrastructure and hardware. Specifically, I upgraded to the
>> LG-Ericsson switches and replaced a full 12-bay drive shelf with a
>> 24-bay one (moving all the disks through) and adding some additional
>> disks. All three pvfs2 servers are identical in this. At some point
>> prior to these changes, my users were able to get acceptable
>> performance from pvfs2; now they are not. I don't have any evidence
>> pointing to the switch or to the disks.
>> I can run dd if=/dev/zero of=testfile bs=1024k count=10000 and get
>> 380+MB/s locally on the pvfs server, writing to the partition on the
>> hardware raid6 card. From a compute node, doing that for a 100MB file,
>> I get 47.7MB/s to my RAID-5 NFS server on the head node, and 36.5MB/s
>> to my pvfs2 mounted share. When I watch the network
>> bandwidth/throughput using bwm-ng, I rarely see more than 10MB/s, and
>> often it's around 4MB/s with a 12-node IO-bound job running.
>> I originally had the pvfs2 servers connected to the switch with dual
>> gig-e connections and using bonding (ALB) to make it more able to
>> serve multiple nodes. I never saw anywhere close to the throughput I
>> should have. In any case, to test if that was the problem, I removed the
>> bonding and am running through a single gig-e pipe now, but
>> performance hasn't improved at all.
>> I'm not sure how to troubleshoot this problem further. Presently, the
>> cluster isn't usable for large I/O jobs, so I really have to fix this.
>> Pvfs2-users mailing list
>> Pvfs2-users at beowulf-underground.org