[PVFS2-developers] latest mpi-io-test numbers on jazz

Phil Carns pcarns at wastedcycles.org
Thu Mar 31 19:45:40 EST 2005


>>On the other hand, we see apparently the opposite effect in the write 
>>cases; we hit peaks early in # of nodes and flatten out after that. 
>>That almost certainly has to be a local storage bottleneck.  How does 
>>our peak local storage I/O performance match with iozone/bonnie numbers 
>>on those same nodes?  Is there room for improvement in how we leverage 
>>that local storage, or are those peaks simply the best we can do with 
>>those disks?
>  
> I love it when I've managed to anticipate one of your questions :>
> 
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> j28              2G 20024  96 31642  21 14055   5 20273  87 31502   4 210.1   0
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16  2376  97 +++++ +++ +++++ +++  2435  98 +++++ +++  6028  96
> j28,2G,20024,96,31642,21,14055,5,20273,87,31502,4,210.1,0,16,2376,97,+++++,+++,+++++,+++,2435,98,+++++,+++,6028,96
> 
> The 'block sequential input' number of ~31 MiB/sec means we are
> getting just over half of peak from trove.  

There definitely needs to be some improvement here.  I think the most 
important write graph is the "nosync" version, though.  According to the 
rough calculations in the earlier email, that graph shows us getting 23 
MB/s per server, which is close to 75% of peak.  Still not as good as we 
could be, though.
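
For anyone who wants to redo that rough calculation, here is a minimal sketch. It assumes that "peak" means the block sequential output figure for j28 in the bonnie table above (31642 K/sec) and that 23 MB/s per server is the nosync number; the variable and function names are just made up for illustration.

```python
# Rough check of the per-server efficiency quoted above.
# Assumption: "peak" is the bonnie block sequential output for j28
# (31642 K/sec in the table above), converted to MB/s by dividing by 1024.
BONNIE_BLOCK_WRITE_KB_S = 31642   # from the bonnie table
NOSYNC_MB_S_PER_SERVER = 23.0     # from the earlier rough calculation


def fraction_of_peak(observed_mb_s, peak_kb_s):
    """Return observed throughput as a fraction of local disk peak."""
    peak_mb_s = peak_kb_s / 1024.0
    return observed_mb_s / peak_mb_s


ratio = fraction_of_peak(NOSYNC_MB_S_PER_SERVER, BONNIE_BLOCK_WRITE_KB_S)
print(f"{ratio:.0%} of peak")  # prints "74% of peak"
```

That lands at roughly 74%, which is where the "close to 75%" comes from.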

I don't think that the runs with TroveSync set to yes really tell us 
that much.  Those are forcing trove to sync after every individual 
underlying operation (which can be no larger than 256K each by default) 
whether the application wants it to or not.  That's adding a good bit of 
superfluous work to a benchmark that is already syncing the traditional 
way by just calling a sync() function once at the end of a write.  It is 
also probably causing blocks to flush to disk in an unusual order given 
the interleaving of data from many clients, therefore causing extra disk 
latency.
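
To make the difference concrete, here is a small sketch against a plain local file. This is not trove code; write_chunks is a hypothetical helper, and the only thing borrowed from the discussion above is the 256K per-operation size. It just counts how many times the disk is forced to sync in each mode.

```python
import os
import tempfile

# 256K: the default upper bound on one underlying operation mentioned above.
CHUNK = 256 * 1024


def write_chunks(path, total_bytes, sync_every_chunk):
    """Write total_bytes in CHUNK-sized pieces and return the fsync count.

    sync_every_chunk=True mimics syncing after every underlying operation;
    sync_every_chunk=False mimics one sync at the end of the write.
    """
    syncs = 0
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(b"\0" * n)
            remaining -= n
            if sync_every_chunk:
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # force the OS to hit the disk
                syncs += 1
        if not sync_every_chunk:
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs
```

For a 4 MB write this does 16 syncs in the first mode and 1 in the second, and the first mode never gives the OS a chance to coalesce or reorder blocks before they go to disk.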

PVFS2's default configuration files ship with TroveSync set to yes, 
though, so I guess those runs really do show PVFS2's normal mode of 
operation.  IMHO, turning TroveSync off is a better default choice, at 
least until we have a server-controlled cache that keeps us from hitting 
trove unless we really want to hit physical disk.  Applications can 
still decide when/if to explicitly sync regardless of the TroveSync 
option.

-Phil

