[Pvfs2-developers] tuning kernel buffer settings

Sam Lang slang at mcs.anl.gov
Mon Dec 4 01:18:51 EST 2006


Murali, Phil,

I've gone ahead and committed this patch.  Thanks Murali!

-sam

On Nov 29, 2006, at 10:21 PM, Murali Vilayannur wrote:

> Hi Phil,
> Attached patch fixes the read buffer bug that you had mentioned and
> also implements the variable sized buffer counts and lengths that we
> can pass as command line options to pvfs2-client-core.
> I did not implement module-time options for buffer size settings since
> that is fairly complicated and not intuitive (having the client core
> drive the buffer size and count settings seems to make more sense to me).
>
> So now we can do
> pvfs2-client --desc-count=<NUM1> --desc-size=<NUM2>
> in addition to the usual options.
> With regard to the changes themselves, this involved modifying the
> parameters of an existing ioctl, and so we break binary compatibility,
> but I don't think we have a policy of maintaining backward binary
> compatibility, do we?
> I have updated the compat ioctl code as well, so hopefully we won't
> break in mixed 32-64 bit environments.
> I have tested this out with various buffer sizes and counts on 32-bit
> platforms only!
> That said, I haven't done comprehensive testing, so there may still be
> bugs.
> Please review it and let me know if this looks OK.
> BTW: the patch is against pvfs-2.6.0; sorry about that.
> cvs ports are firewalled off at work and my internet at home is
> temporarily not working.
> thanks,
> Murali
>
>
> On 11/29/06, Murali Vilayannur <murali.vilayannur at gmail.com> wrote:
>> Hi Phil,
>> Thanks for running these tests.
>> I think this buffer size will be dependent on the machine
>> configuration, right?
>> If we work out a simple formula for the buffer size based on say
>> memory b/w (and/or latency), network b/w (and/or latency), we could
>> plug that in as a sane default (bandwidth .
>> I did not realize that this setting would have such a noticeable effect
>> on performance.
>> I can work on a patch to change these settings at runtime.
>> thanks,
>> Murali
>>
>> > >> - single client
>> > >> - 16 servers
>> > >> - gigabit ethernet
>> > >> - read/write tests, with 40 GB files
>> > >> - using reads and writes of 100 MB each in size
>> > >> - varying number of processes running concurrently on the client
>> > >>
>> > >> This test application can be configured to be run with multiple
>> > >> processes and/or multiple client nodes.  In this case we kept
>> > >> everything on a single client to focus on bottlenecks on that side.
>> > >>
>> > >> What we were looking at was the kernel buffer settings controlled in
>> > >> pint-dev-shared.h.  By default PVFS2 uses 5 buffers of 4 MB each.
>> > >> After experimenting for a while, we made a few observations:
>> > >>
>> > >> - increasing the buffer size helped performance
>> > >> - using only 2 buffers (rather than 5) was sufficient to saturate the
>> > >> client when we were running multiple processes; adding more made only
>> > >> a marginal difference
>> > >>
>> > >> We found good results using 2 32MB buffers.  Here are some
>> > >> comparisons between the standard settings and the 2 x 32MB
>> > >> configuration:
>> > >>
>> > >> results for RHEL4 (2.6 kernel):
>> > >> ------------------------------
>> > >> 5 x 4MB, 1 process: 83.6 MB/s
>> > >> 2 x 32MB, 1 process: 95.5 MB/s
>> > >>
>> > >> 5 x 4MB, 5 processes: 107.4 MB/s
>> > >> 2 x 32MB, 5 processes: 111.2 MB/s
>> > >>
>> > >> results for RHEL3 (2.4 kernel):
>> > >> -------------------------------
>> > >> 5 x 4MB, 1 process: 80.5 MB/s
>> > >> 2 x 32MB, 1 process: 90.7 MB/s
>> > >>
>> > >> 5 x 4MB, 5 processes: 91 MB/s
>> > >> 2 x 32MB, 5 processes: 103.5 MB/s
>> > >>
>> > >>
>> > >> A few comments based on those numbers:
>> > >>
>> > >> - on 3 out of 4 tests, we saw a 13-15% performance improvement by
>> > >> going to 2 32 MB buffers
>> > >> - the remaining test (5 process RHEL4) probably did not see as much
>> > >> improvement because we maxed out the network.  In the past, netpipe
>> > >> has shown that we can get around 112 MB/s out of these nodes.
>> > >> - the RHEL3 nodes are on a different switch, so it is hard to say how
>> > >> much of the difference from RHEL3 to RHEL4 is due to network topology
>> > >> and how much is due to the kernel version
>> > >>
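The percentages in the comments above can be rechecked directly from the quoted throughput figures. A minimal sketch in Python; the dictionary layout and the function name improvement_pct are illustrative only and are not part of PVFS2:

```python
# Throughput figures in MB/s, copied from the results quoted above.
# Each entry maps (platform, process count) -> (5 x 4MB, 2 x 32MB).
results = {
    ("RHEL4", 1): (83.6, 95.5),
    ("RHEL4", 5): (107.4, 111.2),
    ("RHEL3", 1): (80.5, 90.7),
    ("RHEL3", 5): (91.0, 103.5),
}


def improvement_pct(baseline, tuned):
    """Relative throughput gain of the tuned setting over the baseline, in percent."""
    return (tuned - baseline) / baseline * 100.0


for (platform, nprocs), (baseline, tuned) in results.items():
    gain = improvement_pct(baseline, tuned)
    print(f"{platform}, {nprocs} process(es): {gain:+.1f}%")
```

Three of the four cases come out at roughly 13-14%, and the 5 process RHEL4 case is the outlier at under 4%, consistent with the network already being close to saturated there.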
>> > >> It is also worth noting that even with this tuning, the single
>> > >> process tests are about 14% slower than the 5 process tests.  I am
>> > >> guessing that this is due to a lack of pipelining, probably caused by
>> > >> two things:
>> > >> - the application only submitting one read/write at a time
>> > >> - the kernel module itself serializing when it breaks reads/writes
>> > >> into buffer sized chunks
>> > >>
>> > >> The latter could be addressed by either pipelining the I/O through
>> > >> the bufmap interface (so that a single read or write could keep
>> > >> multiple buffers busy) or by going to a system like Murali came up
>> > >> with for memory transfers a while back that isn't limited by buffer
>> > >> size.
>> > >>
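To give a feel for why buffer size matters if the serialization guess above is right, here is a rough model of how many buffer-sized chunks one 100 MB request is split into. This is only a back-of-the-envelope sketch in Python, assuming each chunk is handled strictly one after another; the function name chunks_per_request is hypothetical and this is not the kernel module's actual code:

```python
MB = 1024 * 1024


def chunks_per_request(request_bytes, buffer_bytes):
    """Number of buffer-sized pieces a single request is split into (ceiling division)."""
    return -(-request_bytes // buffer_bytes)


request = 100 * MB  # size of each read/write used in the tests above
for buffer_mb in (4, 32):
    n = chunks_per_request(request, buffer_mb * MB)
    print(f"{buffer_mb} MB buffers: {n} chunks per 100 MB request")
```

Under this model a 100 MB request takes 25 round trips through a 4 MB buffer but only 4 through a 32 MB buffer, so any fixed per-chunk cost is paid far less often with the larger setting.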
>> > >> It would also be nice to have a way to set these buffer settings
>> > >> without recompiling, either via module options or via
>> > >> pvfs2-client-core command line options.  For the time being we are
>> > >> going to hard code our tree to run with the 32 MB buffers.  The 64 MB
>> > >> of RAM that this uses up (vs. 20 MB with the old settings) doesn't
>> > >> really matter for our standard node footprint.
>> > >>
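The memory figures above are simply buffer count times buffer size. A tiny sketch in Python for trying other combinations; the function and parameter names are illustrative and do not correspond to any PVFS2 setting:

```python
def bufmap_footprint_mb(buffer_count, buffer_size_mb):
    """Total RAM taken by the kernel buffers, in MB."""
    return buffer_count * buffer_size_mb


for count, size_mb in ((5, 4), (2, 32)):
    total = bufmap_footprint_mb(count, size_mb)
    print(f"{count} x {size_mb} MB -> {total} MB")
```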
>> > >> -Phil
>> > >> _______________________________________________
>> > >> Pvfs2-developers mailing list
>> > >> Pvfs2-developers at beowulf-underground.org
>> > >> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>> > >>
>> > >
>> >
>>
>> <kmod-bufsizes.patch>
