[PVFS2-developers] latest mpi-io-test numbers on jazz
Phil Carns
pcarns at wastedcycles.org
Thu Mar 31 19:45:40 EST 2005
>>On the other hand, we see apparently the opposite effect in the write
>>cases; we hit peaks early in # of nodes and flatten out after that.
>>That almost certainly has to be a local storage bottleneck. How does
>>our peak local storage I/O performance match with iozone/bonnie numbers
>>on those same nodes? Is there room for improvement in how we leverage
>>that local storage, or are those peaks simply the best we can do with
>>those disks?
>
> I love it when I've managed to anticipate one of your questions :>
>
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> j28              2G 20024  96 31642  21 14055   5 20273  87 31502   4 210.1   0
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  2376  97 +++++ +++ +++++ +++  2435  98 +++++ +++  6028  96
> j28,2G,20024,96,31642,21,14055,5,20273,87,31502,4,210.1,0,16,2376,97,+++++,+++,+++++,+++,2435,98,+++++,+++,6028,96
>
> The 'block sequential input' number of ~31 MiB/sec means we are
> getting just over half of peak from trove.

There definitely needs to be some improvement here. I think the most
important write graph is the "nosync" version, though. According to the
rough calculations in the earlier email, that graph shows us getting 23
MB/s per server, which is close to 75% of peak. Still not as good as we
could be, though.
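
As a rough check of that figure, the Python sketch below pulls the two block-transfer rates out of the bonnie++ CSV line quoted earlier and computes what fraction of the local peak 23 MB/s represents. The field positions and the unit conversion (1 MB treated as 1024 K) are assumptions made for illustration, and the helper names are hypothetical; none of this is part of bonnie++ or PVFS2.

```python
# CSV summary line copied from the bonnie++ output quoted earlier in this message.
BONNIE_CSV = (
    "j28,2G,20024,96,31642,21,14055,5,20273,87,31502,4,210.1,0,"
    "16,2376,97,+++++,+++,+++++,+++,2435,98,+++++,+++,6028,96"
)


def block_rates_k_per_sec(csv_line):
    """Return the block write and block read rates (K/sec) from one CSV line.

    Assumption: field 4 is sequential block output and field 10 is
    sequential block input, matching the column order of the table above.
    """
    fields = csv_line.split(",")
    return {"block_write": int(fields[4]), "block_read": int(fields[10])}


def fraction_of_peak(observed_mb_per_sec, peak_k_per_sec):
    """Fraction of the local-disk peak reached by an observed rate.

    Assumption: 1 MB is treated as 1024 K so both rates share a unit.
    """
    return observed_mb_per_sec * 1024 / peak_k_per_sec


if __name__ == "__main__":
    rates = block_rates_k_per_sec(BONNIE_CSV)
    # 23 MB/s per server is the rough "nosync" estimate from the earlier email.
    ratio = fraction_of_peak(23, rates["block_write"])
    print(f"block write peak: {rates['block_write']} K/sec")
    print(f"nosync fraction of peak: {ratio:.2f}")
```

Under those assumptions the ratio comes out a little under 0.75, which is consistent with the "close to 75%" estimate.
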
I don't think that the runs with TroveSync set to yes really tell us
that much. Those are forcing trove to sync after every individual
underlying operation (which can be no larger than 256K each by default)
whether the application wants it to or not. That's adding a good bit of
superfluous work to a benchmark that is already syncing the traditional
way by just calling a sync() function once at the end of a write. It is
also probably causing blocks to flush to disk in an unusual order given
the interleaving of data from many clients, therefore causing extra disk
latency.

PVFS2 ships with the default configuration files setting TroveSync to
yes, though, so I guess that's really showing PVFS2's normal mode of
operation. IMHO, turning TroveSync off is a better default choice, at
least until we have a server-controlled cache
that prevents us from hitting trove unless we really want to hit
physical disk. Applications still have the choice of deciding when/if
to explicitly sync regardless of the TroveSync option.
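
To make the two sync policies concrete, here is a minimal Python sketch that writes a buffer to a local file in 256K pieces and either calls fsync after every piece or once at the end. It is only an analogy on an ordinary file, not trove's code: the function name and the sync_each_chunk flag are hypothetical, and the 256K chunk size simply mirrors the per-operation limit mentioned above.

```python
import os
import tempfile

CHUNK = 256 * 1024  # mirrors the 256K per-operation limit mentioned above


def write_with_sync_policy(path, data, sync_each_chunk):
    """Write data to path in CHUNK-sized pieces; return the number of fsync calls.

    sync_each_chunk=True  : fsync after every piece (the forced-sync pattern).
    sync_each_chunk=False : one fsync at the end (the application-driven pattern).
    """
    syncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for offset in range(0, len(data), CHUNK):
            piece = data[offset:offset + CHUNK]
            written = 0
            # os.write may write fewer bytes than requested, so loop until done.
            while written < len(piece):
                written += os.write(fd, piece[written:])
            if sync_each_chunk:
                os.fsync(fd)
                syncs += 1
        if not sync_each_chunk:
            os.fsync(fd)
            syncs += 1
    finally:
        os.close(fd)
    return syncs


if __name__ == "__main__":
    payload = b"\0" * (4 * 1024 * 1024)
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sync_demo.bin")
        print("syncs, per chunk:", write_with_sync_policy(target, payload, True))
        print("syncs, at end   :", write_with_sync_policy(target, payload, False))
```

Both calls leave the same bytes on disk; the per-chunk variant just issues one fsync per 256K piece instead of a single one, which is the extra work described above. Timing the two calls on a real disk would show the cost, though the numbers depend heavily on the hardware and file system.
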
-Phil