[Pvfs2-developers] pvfs2-cp performance with single client and single server

Scott Atchley atchley at myri.com
Fri Dec 29 13:35:02 EST 2006


On Dec 29, 2006, at 11:44 AM, Pete Wyckoff wrote:

> atchley at myri.com wrote on Fri, 29 Dec 2006 10:43 -0500:
>> What performance do you typically see with a single client and single
>> server (not the same machine) with 10 Gb/s NICs?
>
> 1 metaserver, 1 IO server, 1 client, 16 MB flow buffers.  Here are
> some similarly uninteresting numbers on IB, with the server running
> at maybe 50% (only 800 MB, else I fall into swap):
>
> ib30$ pvfs2-cp -t /tmp/tmpfs/800m /pvfs-ib/x1
> Wrote 838860800 bytes in 3.035549 seconds. 263.543767 MB/seconds
>
> ib30$ pvfs2-cp -t -b $((64*1024*1024)) /tmp/tmpfs/800m /pvfs-ib/x1
> Wrote 838860800 bytes in 2.237504 seconds. 357.541259 MB/seconds

Thanks for the sanity check.

> pvfs2-cp isn't that great a code.  Find yourself an MPI interface
> benchmark, like "perf".  This produces server load around 90%:
>
> ib30$ mpiexec -n 1 2402/perf -n 10 -s 800m -c 100m -f pvfs2:/pvfs-ib/x1
> #np size   chunk  write no sync  read no sync   write sync     read sync
> #   (MB)   (MB)   (MB/s)         (MB/s)         (MB/s)         (MB/s)
> 1   800.0  100.0  681.56 +- 1.9  612.87 +- 2.2  679.99 +- 1.0  613.76 +- 3.0
>
> With 1 MB flow buffers, the server is pegged and slower, more like
> what you're seeing:
>
> ib30$ mpiexec -n 1 2402/perf -n 10 -s 800m -c 100m -f pvfs2:/pvfs-ib/x1
> #np size   chunk  write no sync  read no sync   write sync     read sync
> #   (MB)   (MB)   (MB/s)         (MB/s)         (MB/s)         (MB/s)
> 1   800.0  100.0  342.73 +- 3.8  317.24 +- 2.6  343.96 +- 2.2  318.34 +- 1.8

Do these require the kernel module? I have not tried using that yet.

> It's important to keep the flow buffer size comparable with the
> network speed.  The default 256 kB is too small even for gige.
> The stripe size only comes into play with multiple IO servers, and
> that wants to be large too.

I do not see any improvement with FlowBufferSizeBytes values larger
than 1 MB.

>> On the same machine, if I use dd to copy from /dev/zero to
>> /mnt/tmpfs/zeros using 1 MB blocks, I get 300 MB/s for a 1 GB file.
>
> This is wrong.  You should get 700-900 MB/s for memcpy on a recent
> vintage machine.  Data in tmpfs will go to swap if you exceed the
> free memory on the box.  Watch for that.

I was not swapping (I have 8 GB available). Using ramfs instead of
tmpfs, I can get 1,200 MB/s. I have switched to ramfs, but the numbers
are roughly the same.

>> Initially, I used the dumbest possible BMI_meth_memalloc() and
>> BMI_meth_memfree(), which simply call malloc() and free(), and I was
>> getting about 300 MB/s. Thinking that this was the problem, I
>> tinkered with mallopt() to set higher thresholds for trim and mmap.
>> This added about 50 MB/s.
>>
>> Next, I pre-allocated memory at startup and managed a list of these
>> buffers. This added another 50 MB/s, bringing me to 400 MB/s.
>
> IB uses malloc/free, but caches freed blocks to avoid costly
> re-registrations later, handing out an old block on future malloc
> calls.  You probably don't care about registration, but we added
> a hook so the IO client can tell the BMI device about the
> user-supplied buffer rather than seeing lots of 64 kB buffers:
> BMI_OPTIMISTIC_BUFFER_REG.

What does this do? Internally, MX can cache some registrations (the  
API does not expose it). It is in my best interest to try to reuse  
buffers.

>> I tried playing with pvfs2-cp's -b option, but performance never
>> improved over the default behavior. Interestingly, on the client,
>> pvfs2-cp uses only two 1 MB buffers (over and over) for the entire
>> 1 GB transfer. Is this intentional? Does this mean that only one
>> buffer is in flight while the other is being filled? Is there a way
>> to get pvfs2-cp to use more concurrent messages?
>
> pvfs2-cp is not exactly optimized for performance.  Don't spend too
> much time worrying about it.
>
> 		-- Pete

I have found that the -b option _really_ likes multiples of 10 MB
(the default seems to be 10 MB).

Scott

