[PVFS2-developers] latest mpi-io-test numbers on jazz
Rob Ross
rross at mcs.anl.gov
Tue Mar 29 11:41:09 EST 2005
Hi all,
Thanks for the updated numbers, RobL. I think we should try to do
this more often, and ideally get some real storage servers attached to
Jazz so we can see better overall I/O performance.
Two things jump out at me about this data.
The first is that we're clearly not getting all the bandwidth we can out
of our I/O servers in the read case with equal numbers of servers and
clients (N:N), because aggregate bandwidth continues to climb as we add
more clients than we have servers. We should figure out why; there's a
potential opportunity to provide better aggregate throughput when there
are fewer clients.
The write cases, on the other hand, show apparently the opposite effect:
throughput peaks at a small number of nodes and flattens out after that.
That is almost certainly a local storage bottleneck. How does our peak
local storage I/O performance compare with iozone/bonnie numbers on those
same nodes? Is there room for improvement in how we use that local
storage, or are those peaks simply the best those disks can do?
The second thing is the difference (or lack thereof) between these
numbers and the previous ones. On the read side, we're clearly winning
with the new system software on those nodes (or maybe with improvements
in PVFS2; it's hard to know without more data). It looks like the better
async I/O support is giving us better scalability at the high end.
Writes, however, haven't really changed at all, which again points to a
local storage bottleneck.
We've almost hit the 10GB/sec mark! Very cool!
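For the record, the 8.5 GiB/sec peak RobL reports is in binary units, while "10GB/sec" reads most naturally as decimal. A minimal Python sketch of the conversion, assuming GB here means 10^9 bytes (the helper name is just for illustration):

```python
GIB = 2**30  # bytes in a gibibyte (binary)
GB = 10**9   # bytes in a gigabyte (decimal), assumed meaning of "GB"

def gib_to_gb(gib_per_sec: float) -> float:
    """Convert a rate in GiB/sec to decimal GB/sec."""
    return gib_per_sec * GIB / GB

peak_gb = gib_to_gb(8.5)  # 8.5 GiB/sec peak from the quoted runs
print(f"8.5 GiB/sec = {peak_gb:.2f} GB/sec")  # prints "8.5 GiB/sec = 9.13 GB/sec"
```

Under that assumption the peak is a bit over 9.1 GB/sec, so we really are close.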
Rob
Robert Latham wrote:
> A little bit ago I queued up a bunch of mpi-io-test jobs on jazz over
> the myrinet network and they finally finished.
>
> http://www.mcs.anl.gov/~robl/pvfs2/jazz-20050308/
>
> [The -eb versions of the plots in that directory show the high, low,
> and average over multiple runs (4 or 5, I think, depending on how many
> jobs finished before Jazz recently had an "unscheduled maintenance" :> ]
>
> As always for these mpi-io-test runs, these read numbers are with warm
> caches. Given that favorable condition, we peaked out at about 8.5
> GiB/sec (128 servers and 100 clients).
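Dividing that peak across the hardware gives a rough average per server and per client. This is only a sketch: it assumes the load is spread evenly, which the aggregate number alone can't confirm, and `per_node_mib` is a hypothetical helper name.

```python
MIB_PER_GIB = 1024  # binary units, matching the GiB/sec figure

def per_node_mib(aggregate_gib_per_sec: float, nodes: int) -> float:
    """Average MiB/sec per node, assuming an even split across nodes."""
    return aggregate_gib_per_sec * MIB_PER_GIB / nodes

peak = 8.5  # GiB/sec aggregate read, 128 servers and 100 clients
print(f"per server: {per_node_mib(peak, 128):.1f} MiB/sec")  # 68.0
print(f"per client: {per_node_mib(peak, 100):.1f} MiB/sec")  # 87.0
```

Since these are warm-cache reads, the per-server figure reflects memory and network service, not disk speed.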
>
> The big knee at 8 servers and 100 clients shouldn't be too alarming.
> That's the point where the clients' working set exceeds the servers'
> memory (and caches).
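The knee condition amounts to comparing the clients' combined working set against the servers' aggregate memory. The sketch below is illustrative only: the function names and the 128 MiB-per-client and 1 GiB-per-server figures are hypothetical placeholders, not Jazz's actual configuration or mpi-io-test's actual file sizes.

```python
MIB = 2**20
GIB = 2**30

def exceeds_server_cache(clients: int, bytes_per_client: int,
                         servers: int, cache_per_server: int) -> bool:
    """True once the combined client working set no longer fits in the
    servers' aggregate memory, so reads must start going to disk."""
    return clients * bytes_per_client > servers * cache_per_server

def min_servers_to_cache(clients: int, bytes_per_client: int,
                         cache_per_server: int) -> int:
    """Smallest server count whose aggregate memory holds the working set."""
    working_set = clients * bytes_per_client
    return -(-working_set // cache_per_server)  # ceiling division

# Hypothetical values, for illustration only.
clients, per_client, per_server = 100, 128 * MIB, 1 * GIB
for servers in (4, 8, 16, 32):
    print(servers, exceeds_server_cache(clients, per_client, servers, per_server))
print("servers needed:", min_servers_to_cache(clients, per_client, per_server))
```

With real per-node memory sizes plugged in, the same comparison would predict where the knee should fall.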
>
> Write numbers look rather unimpressive as usual. The disks on Jazz
> compute nodes are nothing special. The recent system software upgrade
> means we can compare the performance impact of TroveSyncMeta and
> TroveSyncData. jazz-gm-write.png is with those config file options
> set to 'yes'. jazz-gm-write-nosync is with them set to 'no'. The
> mpi-io-test program syncs after every MPI_File_write, so we are
> still pushing data to disk, just not as often. Rough calculations
> give us 18 MiB/sec per server with TroveSync{Meta,Data} set to 'yes',
> but 23 MiB/sec per server with those options set to 'no'. The
> TroveSync options did not appear to have any impact on read
> performance (nor would one expect to see such an impact).
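In relative terms, the two per-server figures say that turning the TroveSync options off buys a bit more than a quarter more write throughput. A small sketch of that arithmetic, using only the 18 and 23 MiB/sec numbers from the message (both are rough calculations, so the percentages inherit that uncertainty):

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Fractional improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline

sync_on = 18.0   # MiB/sec per server, TroveSyncMeta/TroveSyncData = yes
sync_off = 23.0  # MiB/sec per server, TroveSyncMeta/TroveSyncData = no

print(f"speedup with sync off: {relative_gain(sync_on, sync_off):.1%}")  # 27.8%
print(f"no-sync rate kept with sync on: {sync_on / sync_off:.1%}")       # 78.3%
```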
>
> The biggest change between this run and the previous run in September
> is that Jazz was recently upgraded to Red Hat Enterprise Linux 3.
> pvfs2-server now gets to take advantage of working AIO callbacks in
> glibc. Just as Neill asserted, AIO callbacks make a significant
> difference in performance.
>
> For comparison, here are the old plots:
> http://www-unix.mcs.anl.gov/~robl/pvfs2/jazz_gm.read-20040928.png
> http://www-unix.mcs.anl.gov/~robl/pvfs2/jazz_gm.write-20040928.png
>
> I guess I should write this up in a little more coherent manner but
> I'm somewhat pressed for time ... let me know if you would like
> anything explained in further detail.
>
> ==rob