[Pvfs2-users] Performance Problems

Phil Carns carns at mcs.anl.gov
Thu May 7 10:11:41 EDT 2009


Just to comment further on #2 (the slow cp issue), the biggest problem 
there is the relatively old version of coreutils that shipped with RHEL 
5/CentOS 5.  They opted for coreutils 5.x, despite 6.x being available at
the time.  For most people it probably doesn't matter, but a notable
problem with 5.x is that it doesn't always honor the block size suggested
by PVFS.

CentOS 5 comes with coreutils 5.97.  Here is a comparison on my laptop 
of that version vs. something more recent:

$ dpkg -l |grep coreutils
ii  coreutils  6.10-6ubuntu1  The GNU core utilities

$ time cp /mnt/pvfs2/50MB.dat /tmp/50A.dat
real	0m1.348s
user	0m0.008s
sys	0m0.120s

$ time /home/pcarns/tar/coreutils-5.97/src/cp /mnt/pvfs2/50MB.dat /tmp/50B.dat
real	0m8.897s
user	0m0.028s
sys	0m0.584s

strace shows that coreutils 6.10 is moving data 64 KB at a time, while 
coreutils 5.97 is moving data 4 KB at a time.  As a side note, you can 
increase the block size for 6.x further by increasing the strip size in 
PVFS.
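
As a rough sketch (not part of the original measurements), the block size
a filesystem suggests can be inspected from the shell.  This assumes the
hint is exposed through st_blksize, which GNU stat prints with the %o
format; the helper name blksize_hint is made up for illustration, and the
paths are simply the ones used in the timings above.

```shell
# Print the I/O block size hint (st_blksize) reported for a path.
# Assumes GNU stat, where %o is the "optimal I/O transfer size hint".
blksize_hint() {
    stat -c '%o' "$1"
}

# Compare the PVFS mount point with a local filesystem, skipping any
# path that does not exist on this machine.
for path in /mnt/pvfs2 /tmp; do
    if [ -e "$path" ]; then
        printf '%s: %s bytes\n' "$path" "$(blksize_hint "$path")"
    fi
done

# To see the transfer size a given cp binary actually uses, one option is
# to trace its read/write calls (not run here):
#   strace -e trace=read,write cp /mnt/pvfs2/50MB.dat /tmp/50A.dat
```

If the hint reported for the PVFS mount is larger than the read sizes that
strace shows, that would be consistent with cp ignoring the suggestion.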

PVFS probably isn't going to perform as well as you want for serialized 
4KB transfers no matter how you tune, so you might want to consider 
either upgrading coreutils/cp or else trying to convince your users not 
to use cp on PVFS.
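
To make the cost of small transfers concrete, here is a back-of-the-envelope
sketch of how many read calls each buffer size needs for the test file.  It
assumes 50MB.dat is exactly 50 MiB, which the timings above do not confirm;
calls_needed is a hypothetical helper.

```shell
# Number of read() calls needed to move a file of $1 bytes using a
# buffer of $2 bytes, rounded up.
calls_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

file_bytes=$(( 50 * 1024 * 1024 ))   # assumption: the file is exactly 50 MiB

echo "4 KB buffer:  $(calls_needed "$file_bytes" 4096) reads"
echo "64 KB buffer: $(calls_needed "$file_bytes" 65536) reads"
```

Under that assumption, 4 KB transfers need 12800 reads against 800 for
64 KB.  Dividing the wall-clock times above by those counts gives roughly
0.7 ms per 4 KB read and roughly 1.7 ms per 64 KB read, so the per-request
cost grows far more slowly than the amount of data moved per request.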

-Phil

Kyle Schochenmaier wrote:
> Jim -
> 
> I will be brief here because I don't have a specific answer for all of
> your questions right now.
> 1.  Bonnie++ is notoriously evil to PVFS2 because it uses relatively
> small accesses, which just aren't what PVFS2 is usually tuned for.
> 2.  pvfs2-cp uses a larger IO buffer than `cp`.  Newer versions of
> `cp` allow you to specify what size of buffer is used to transfer
> data, but these versions are not out in public that I'm aware of.  I
> would suggest using `dd if=<local_file> of=</pvfs2-mount/file> bs=1M
> count=10240`, for example, to transfer a 10GB file to pvfs, or swap
> the of/if to take it off of pvfs.  It will be orders of magnitude
> faster than plain old cp.
> 3.  I agree with your comment about the ram loss, I've heard of this
> before, but I'm not aware of there being a fix for it because it tends
> to be very difficult to reproduce.  If you can do some debugging and
> pinpoint where this memory loss is happening, we'd definitely love to
> work with you on that.
> 
> 
> Good luck!
> 
> ~Kyle
> Kyle Schochenmaier
> 
> 
> 
> On Wed, May 6, 2009 at 4:21 PM, Jim Kusznir <jkusznir at gmail.com> wrote:
>> Hi all:
>>
>> I've been trying to track down some performance problems with my pvfs2
>> system on my HPC cluster.  Here's my system arch:
>>
>> I have 3 dedicated I/O nodes, each are identical Dell PowerEdge 1950's
>> with PERC 6/e cards attached to a 15-disk MD1000 that has about 9.8TB
>> of storage after RAID-6'ing it.
>> Each I/O node has both of its Gig-E interfaces connected to the
>> cluster's switch and bonded together (bond0).  The systems are running
>> CentOS 5 and the RAID is formatted with XFS.
>> bonnie++ on one of the RAID disks (locally) reports:
>>
>> Version  1.94       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> pvfs2-io-0-0.loc 8G   596  99 177257  35 93649  21   851  97 353346  35 353.2   9
>> Latency             42912us     733ms    1016ms   65897us     436ms     118ms
>> Version  1.94       ------Sequential Create------ --------Random Create--------
>> pvfs2-io-0-0.local  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                 16  1752  32 +++++ +++  4639  26  1932  37 +++++ +++  4378  23
>> Latency               135ms     104us     688ms   80075us     136us     214ms
>> 1.93c,1.94,pvfs2-io-0-0.local,1,1241623639,8G,,596,99,177257,35,93649,21,851,97,353346,35,353.2,9,16,,,,,1752,32,+++++,+++,4639,26,1932,37,+++++,+++,4378,23,42912us,733ms,1016ms,65897us,436ms,118ms,135ms,104us,688ms,80075us,136us,214ms
>>
>> I then have 24 compute nodes, running ROCKS 5.1 (all PVFS2 support
>> added by me, NOT using rocks pvfs2 roll), and connected via a single
>> Gig-E port to the same switch.
>>
>> I have 1 head node, Dell 2950 with a RAID-5 for user home directories,
>> but otherwise identical to the compute nodes.  It has one gig-e
>> interface on the "outside world" and one on the "cluster switch".
>> I also have a gig-e fiber link to a second set of compute nodes
>> located off site, but we'll ignore those for now (I haven't performed
>> performance testing on them yet).
>>
>> I started with what I thought would be the base case: testing I/O
>> performance on the head node.  This is a valid use case, as users are
>> moving data sets into or out of pvfs2 through the head node; they may
>> also be running some single-threaded analysis on their data or some
>> post-processing.  I'd say that currently, about half (maybe a bit
>> more) of all I/O to pvfs2 happens in this way.
>>
>> Using pvfs2-cp  -t, I obtain about 60MB/s with large file I/O (moving
>> a 10GB file to or from the pvfs).  While not stellar, it works.
>> However, any I/O happening with the filesystem interface deteriorates
>> rapidly.  Performing the same copy, but this time with time cp .....
>> (i.e., using the native / kernel filesystem hooks), I get only 2.97
>> MB/s.
>>
>> After 2 hours, I have not yet been able to complete a single run of
>> bonnie++ using the filesystem interface.
>>
>> There's got to be something wrong....How do I go about fixing it?
>>
>> (BTW: I have seen a number of problems with the filesystem / kernel
>> module.  For example, this morning I found about 5GB of ram "missing",
>> and it appears that it got "lost" in the kernel.  While I can't pin it
>> on pvfs2, this doesn't happen if I don't have the pvfs2 module loaded.
>>  I haven't been able to reproduce it easily, but I think it has to do
>> with all my nodes running updatedb at the same time.  I realize the
>> solution in this case is to tell updatedb not to scan pvfs, but why
>> does the pvfs kernel module lose 5GB of ram when this happens?  It should
>> either work slowly or fail miserably, but it should NOT crash the
>> system or lose large quantities of ram permanently (must reboot to
>> reclaim ram)).
>>
>> --Jim
>> _______________________________________________
>> Pvfs2-users mailing list
>> Pvfs2-users at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>
