[Pvfs2-developers] pvfs2-cp performance with single client and single server
Scott Atchley
atchley at myri.com
Fri Dec 29 13:35:02 EST 2006
On Dec 29, 2006, at 11:44 AM, Pete Wyckoff wrote:
> atchley at myri.com wrote on Fri, 29 Dec 2006 10:43 -0500:
>> What performance do you typically see with a single client and single
>> server (not the same machine) with 10 Gb/s NICs?
>
> 1 metaserver, 1 io server, 1 client, 16 MB flow buffer sizes. Here's
> some similarly uninteresting numbers on IB, with the server running
> maybe around 50%. (Only 800 MB else I fall into swap.):
>
> ib30$ pvfs2-cp -t /tmp/tmpfs/800m /pvfs-ib/x1
> Wrote 838860800 bytes in 3.035549 seconds. 263.543767 MB/seconds
>
> ib30$ pvfs2-cp -t -b $((64*1024*1024)) /tmp/tmpfs/800m /pvfs-ib/x1
> Wrote 838860800 bytes in 2.237504 seconds. 357.541259 MB/seconds
Thanks for the sanity check.
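
As a quick check on the figures quoted above, here is a small sketch that recomputes the rates from the byte counts and elapsed times in the pvfs2-cp output. It assumes that "MB" in that output means 2**20 bytes, which is the reading that matches the reported numbers; the function name is only illustrative.

```python
MIB = 1024 * 1024  # assumption: pvfs2-cp's "MB" is 2**20 bytes


def rate_mib_per_s(nbytes, seconds):
    """Return throughput in MiB/s for nbytes moved in the given time."""
    return nbytes / MIB / seconds


# Byte counts and times copied from the two pvfs2-cp runs quoted above.
default_run = rate_mib_per_s(838860800, 3.035549)
big_buffer_run = rate_mib_per_s(838860800, 2.237504)

print(f"default -b: {default_run:.2f} MiB/s")
print(f"64 MB -b:   {big_buffer_run:.2f} MiB/s")
```

Both values agree with the reported 263.54 and 357.54 to two decimal places, so the 800 MB file is 800 * 2**20 bytes.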
> pvfs2-cp isn't that great a code. Find yourself an MPI interface
> benchmark, like "perf". This produces server load around 90%:
>
> ib30$ mpiexec -n 1 2402/perf -n 10 -s 800m -c 100m -f pvfs2:/pvfs-ib/x1
> #np  size   chunk  write no sync-  read no sync--  write sync----  read sync-----
> #    (MB)   (MB)   (MB/s)          (MB/s)          (MB/s)          (MB/s)
> 1    800.0  100.0  681.56 +- 1.9   612.87 +- 2.2   679.99 +- 1.0   613.76 +- 3.0
>
> With 1 MB flow buffers, the server is pegged and slower, more like
> what you're seeing:
>
> ib30$ mpiexec -n 1 2402/perf -n 10 -s 800m -c 100m -f pvfs2:/pvfs-ib/x1
> #np  size   chunk  write no sync-  read no sync--  write sync----  read sync-----
> #    (MB)   (MB)   (MB/s)          (MB/s)          (MB/s)          (MB/s)
> 1    800.0  100.0  342.73 +- 3.8   317.24 +- 2.6   343.96 +- 2.2   318.34 +- 1.8
Do these require the kernel module? I have not tried using that yet.
> It's important to keep the flow buffer size comparable with the
> network speed. The default 256 kB is too small even for gige.
> The stripe size only comes into play with multiple IO servers, and
> that wants to be large too.
I do not see any improvement using larger than 1 MB with
FlowBufferSizeBytes.
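
For a rough sense of scale, the sketch below computes how long a single flow buffer occupies the wire at raw line rate. It is a back-of-the-envelope estimate only: it ignores protocol overhead, latency, and server-side costs, and it treats kB and MB as powers of two, which is an assumption. The buffer sizes are the ones mentioned in this thread; the function name is illustrative.

```python
def wire_time_ms(buffer_bytes, link_gbit_per_s):
    """Milliseconds needed to send buffer_bytes at the link's raw line rate."""
    return buffer_bytes * 8 / (link_gbit_per_s * 1e9) * 1e3


KIB = 1024
MIB = 1024 * KIB

# Sizes discussed above: the 256 kB default, 1 MB, and 16 MB flow buffers.
for label, size in [("256 kB", 256 * KIB), ("1 MB", MIB), ("16 MB", 16 * MIB)]:
    print(f"{label:>6}: {wire_time_ms(size, 1):8.3f} ms at 1 Gb/s, "
          f"{wire_time_ms(size, 10):8.3f} ms at 10 Gb/s")
```

At 10 Gb/s a 256 kB buffer drains in roughly 0.2 ms, so any fixed per-buffer cost is paid thousands of times per second. Whether that cost is what limits a given setup is something only measurement can show.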
>> On the same machine, if I use dd to copy from /dev/zero to
>> /mnt/tmpfs/zeros using 1 MB blocks, I get 300 MB/s for a 1 GB file.
>
> This is wrong. You should get 700-900 MB/s for memcpy on a recent
> vintage machine. Data in tmpfs will go to swap if you exceed the
> free memory on the box. Watch for that.
I was not swapping (I have 8 GB available). Using ramfs instead of
tmpfs, I can get 1,200 MB/s. I have switched to ramfs but the numbers
are roughly the same.
>> Initially, I used the dumbest of BMI_meth_memalloc() and
>> BMI_meth_memfree(), where they are simply calls to malloc() and
>> free(), and I was getting about 300 MB/s. Thinking that this was the
>> problem, I tinkered with mallopt() to set higher thresholds for trim
>> and mmap. This added about 50 MB/s.
>>
>> Next, I added pre-malloced memory on startup and I manage a list of
>> these buffers. This added another 50 MB/s to get me to 400 MB/s.
>
> IB uses malloc/free, but caches freed blocks to avoid costly
> re-registrations later, handing out an old block on future malloc
> calls. You probably don't care about registration, but we added
> a hook so the IO client can tell the BMI device about the
> user-supplied buffer rather than seeing lots of 64 kB buffers:
> BMI_OPTIMISTIC_BUFFER_REG.
What does this do? Internally, MX can cache some registrations (the
API does not expose it). It is in my best interest to try to reuse
buffers.
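
To make the buffer-reuse idea concrete, here is a minimal sketch of a pre-allocated pool with a free list. It is written in Python purely for illustration, not in C, and it is not the BMI or MX code: BufferPool, alloc, and free are hypothetical names, and the pool size is arbitrary.

```python
class BufferPool:
    """Hypothetical pool: allocate buffers once, then hand them out and take them back."""

    def __init__(self, count, size):
        self.size = size
        # Allocate every buffer up front so the hot path never allocates.
        self._free = [bytearray(size) for _ in range(count)]
        self.fallback_allocs = 0  # how often the pool ran dry

    def alloc(self):
        if self._free:
            return self._free.pop()
        # Pool exhausted: fall back to a fresh allocation.
        self.fallback_allocs += 1
        return bytearray(self.size)

    def free(self, buf):
        # Keep the buffer for reuse instead of releasing it.
        self._free.append(buf)


pool = BufferPool(count=2, size=1024 * 1024)
a = pool.alloc()
b = pool.alloc()
pool.free(a)
c = pool.alloc()  # hands back the buffer that was just freed
print(c is a, pool.fallback_allocs)
```

Handing back the most recently freed buffer keeps the same few buffers in circulation, which is the behavior that would let a registration cache hit.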
>> I tried playing with pvfs2-cp's -b option but performance never
>> improved over the default behavior. Interestingly, on the client,
>> pvfs2-cp only uses two 1 MB buffers (over and over) for the entire
>> 1 GB transfer. Is this intentional? Does this mean that only one
>> buffer is in flight while the other is being filled? Is there a way
>> to get pvfs2-cp to use more concurrent messages?
>
> pvfs2-cp is not exactly optimized for performance. Don't spend too
> much time worrying about it.
>
> -- Pete
I have found that the -b option _really_ likes multiples of 10 MB
(the default seems to be 10 MB).
Scott