[PVFS2-developers] latest mpi-io-test numbers on jazz

Rob Ross rross at mcs.anl.gov
Tue Mar 29 11:41:09 EST 2005

Hi all,

Thanks for the updated numbers, RobL; I think that we should try to do 
this more often, and ideally get some real storage servers attached to 
Jazz so we can see better overall I/O performance.

Two things jump out at me with respect to this data.

The first is that we're clearly not getting all the bandwidth we can out 
of our I/O servers in the N:N (servers to clients) read case, because 
our bandwidth continues to climb as we add more clients than we have 
servers.  We should figure out why this is; there's a potential 
opportunity there to provide better aggregate throughput when the number 
of clients is small.

On the other hand, we apparently see the opposite effect in the write 
cases; we peak at a small number of nodes and flatten out after that. 
That almost certainly has to be a local storage bottleneck.  How does 
our peak local storage I/O performance match with iozone/bonnie numbers 
on those same nodes?  Is there room for improvement in how we leverage 
that local storage, or are those peaks simply the best we can do with 
those disks?
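
One rough way to frame that comparison, as a sketch only: take the per-server write rates quoted in Robert's message below (18 and 23 MiB/sec) and divide them by a local-disk figure from iozone or bonnie. The local-disk value used here is a hypothetical placeholder, not a measurement from Jazz, and the function names are made up for illustration.

```python
# Per-server write rates quoted in Robert Latham's message (MiB/sec).
SYNC_MIB_S = 18.0    # TroveSyncMeta/TroveSyncData set to 'yes'
NOSYNC_MIB_S = 23.0  # both options set to 'no'

# HYPOTHETICAL placeholder: replace with a measured iozone/bonnie
# write rate for one Jazz compute node. Not a real measurement.
ASSUMED_LOCAL_DISK_MIB_S = 30.0


def storage_efficiency(pvfs2_mib_s, local_disk_mib_s):
    """Fraction of the raw local-disk rate that one PVFS2 server delivers."""
    if local_disk_mib_s <= 0:
        raise ValueError("local disk rate must be positive")
    return pvfs2_mib_s / local_disk_mib_s


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline` (0.25 means 25%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


for label, rate in (("sync", SYNC_MIB_S), ("nosync", NOSYNC_MIB_S)):
    eff = storage_efficiency(rate, ASSUMED_LOCAL_DISK_MIB_S)
    print(f"{label}: {rate:.0f} MiB/sec per server, "
          f"{eff:.0%} of the assumed local disk rate")

gain = relative_gain(SYNC_MIB_S, NOSYNC_MIB_S)
print(f"nosync vs. sync: {gain:.1%} faster per server")
```

If the ratio against a real local benchmark comes out close to 1, the disks themselves are the limit; a much lower ratio would suggest there is room to improve how the servers use local storage.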

The second thing is the difference (or lack thereof) between these 
numbers and the previous ones.  On the read side of things, we're 
clearly winning with the new system software on those nodes (or maybe 
improvements in PVFS2; hard to know w/out more data...).  It looks like 
the better async I/O support is providing better scalability at the high 
end for us.

However, writes haven't changed at all really, again pointing to a local 
storage bottleneck.

We've almost hit the 10GB/sec mark!  Very cool!
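
For reference, a small unit-conversion sketch: the peak quoted below is 8.5 GiB/sec (binary units), and this assumes the "10GB/sec" mark is meant in decimal gigabytes. The helper name is made up for illustration.

```python
GIB = 2 ** 30   # bytes in one gibibyte (binary)
GB = 10 ** 9    # bytes in one gigabyte (decimal)


def gib_to_gb(gib_per_sec):
    """Convert a rate in GiB/sec to GB/sec."""
    return gib_per_sec * GIB / GB


peak_gib = 8.5                 # read peak quoted in Robert Latham's message
peak_gb = gib_to_gb(peak_gib)  # roughly 9.13 GB/sec
shortfall = 10.0 - peak_gb     # distance from the 10 GB/sec mark

print(f"{peak_gib} GiB/sec = {peak_gb:.2f} GB/sec "
      f"({shortfall:.2f} GB/sec short of 10 GB/sec)")
```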


Robert Latham wrote:
> A little bit ago I queued up a bunch of mpi-io-test jobs on jazz over
> the myrinet network and they finally finished.  
> http://www.mcs.anl.gov/~robl/pvfs2/jazz-20050308/
> [The -eb versions of the plots in that directory show the high, low,
> and average over multiple runs (4 or 5, I think, depending on how many
> jobs finished before Jazz recently had an "unscheduled maintenance" :> )]
> As always for these mpi-io-test runs, these read numbers are with warm
> caches.  Given that favorable condition, we peaked out at about 8.5
> GiB/sec (128 servers and 100 clients).   
> The big knee at 8 servers and 100 clients shouldn't be too alarming.
> That's the point where the clients' working set exceeds the servers'
> memory (and caches).
> Write numbers look rather unimpressive as usual.  The disks on Jazz
> compute nodes are nothing special.  The recent system software upgrade
> means we can compare the performance impact of TroveSyncMeta and
> TroveSyncData.  jazz-gm-write.png is with those config file options
> set to 'yes'.  jazz-gm-write-nosync is with them set to 'no'.  The
> mpi-io-test program syncs after every MPI_File_write, so we are
> still pushing data to disk, just not as often.   Rough calculations
> give us 18 MiB/sec per server with TroveSync{Meta,Data} set to 'yes', 
> but 23 MiB/sec per server with those options set to 'no'.   The
> TroveSync options did not appear to have any impact on read
> performance (nor would one expect to see such an impact).
> The biggest change between this run and the previous run in September
> is that Jazz was recently upgraded to Red Hat Enterprise Linux 3.
> pvfs2-server now gets to take advantage of working AIO callbacks in
> glibc.  Just as Neill asserted, AIO callbacks make a significant
> difference in performance. 
> For comparison, here are the old plots:
> http://www-unix.mcs.anl.gov/~robl/pvfs2/jazz_gm.read-20040928.png
> http://www-unix.mcs.anl.gov/~robl/pvfs2/jazz_gm.write-20040928.png
> I guess I should write this up in a more coherent manner, but
> I'm somewhat pressed for time ... let me know if you would like
> anything explained in further detail.  
> ==rob
