[Pvfs2-developers] tuning kernel buffer settings

Murali Vilayannur murali.vilayannur at gmail.com
Wed Nov 29 23:21:48 EST 2006


Hi Phil,
The attached patch fixes the read buffer bug that you mentioned and
also implements variable buffer counts and sizes, which can be passed
as command line options to pvfs2-client-core.
I did not implement module options for the buffer size settings, since
that is fairly complicated and not intuitive (having the client core
drive the buffer size and count settings seems to make more sense to me).

So now we can do
pvfs2-client --desc-count=<NUM1> --desc-size=<NUM2>
in addition to the usual options.
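
As a rough way to gauge the memory cost of a given pair of settings,
the total is just count times size. Below is a minimal sketch in
Python; the helper name is hypothetical and not part of the patch, and
sizes are given in MB purely for illustration (this message does not
state which unit --desc-size expects):

```python
def bufmap_footprint_mb(desc_count, desc_size_mb):
    """Total memory (in MB) taken by desc_count buffers of desc_size_mb each.

    Hypothetical helper for illustration only; it is not part of PVFS2.
    """
    return desc_count * desc_size_mb


# Default configuration described in the quoted thread: 5 buffers of 4 MB.
print(bufmap_footprint_mb(5, 4))
# Tuned configuration from the quoted thread: 2 buffers of 32 MB.
print(bufmap_footprint_mb(2, 32))
```

This reproduces the 20 MB and 64 MB figures that Phil mentions in the
quoted message below.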
As for the changes themselves, they involved modifying the
parameters of an existing ioctl, so we break binary compatibility,
but I don't think we have a policy of maintaining backward binary
compatibility, do we?
I have updated the compat ioctl code as well, so hopefully we won't
break in mixed 32/64-bit environments.
I have tested this with various buffer sizes and counts, but on 32-bit
platforms only!
I also haven't done comprehensive testing, so there may still be bugs.
Please review it and let me know if this looks ok.
BTW: the patch is against pvfs-2.6.0, sorry about that.
The cvs ports are firewalled off at work and my internet at home is
temporarily not working.
thanks,
Murali


On 11/29/06, Murali Vilayannur <murali.vilayannur at gmail.com> wrote:
> Hi Phil,
> Thanks for running these tests.
> I think this buffer size will be dependent on the machine configuration, right?
> If we work out a simple formula for the buffer size based on say
> memory b/w (and/or latency), network b/w (and/or latency), we could
> plug that in as a sane default (bandwidth .
> I did not realize that this setting would have such a noticeable effect
> on performance.
> I can work on a patch to change these settings at runtime.
> thanks,
> Murali
>
> > >> - single client
> > >> - 16 servers
> > >> - gigabit ethernet
> > >> - read/write tests, with 40 GB files
> > >> - using reads and writes of 100 MB each in size
> > >> - varying number of processes running concurrently on the client
> > >>
> > >> This test application can be configured to be run with multiple
> > >> processes and/or multiple client nodes.  In this case we kept
> > >> everything on a single client to focus on bottlenecks on that side.
> > >>
> > >> What we were looking at were the kernel buffer settings controlled in
> > >> pint-dev-shared.h.  By default PVFS2 uses 5 buffers of 4 MB each.
> > >> After experimenting for a while, we made a few observations:
> > >>
> > >> - increasing the buffer size helped performance
> > >> - using only 2 buffers (rather than 5) was sufficient to saturate the
> > >> client when we were running multiple processes; adding more made only
> > >> a marginal difference
> > >>
> > >> We found good results using two 32 MB buffers.  Here are some
> > >> comparisons between the standard settings and the 2 x 32MB
> > >> configuration:
> > >>
> > >> results for RHEL4 (2.6 kernel):
> > >> ------------------------------
> > >> 5 x 4MB, 1 process: 83.6 MB/s
> > >> 2 x 32MB, 1 process: 95.5 MB/s
> > >>
> > >> 5 x 4MB, 5 processes: 107.4 MB/s
> > >> 2 x 32MB, 5 processes: 111.2 MB/s
> > >>
> > >> results for RHEL3 (2.4 kernel):
> > >> -------------------------------
> > >> 5 x 4MB, 1 process: 80.5 MB/s
> > >> 2 x 32MB, 1 process: 90.7 MB/s
> > >>
> > >> 5 x 4MB, 5 processes: 91 MB/s
> > >> 2 x 32MB, 5 processes: 103.5 MB/s
> > >>
> > >>
> > >> A few comments based on those numbers:
> > >>
> > >> - on 3 out of 4 tests, we saw a 13-15% performance improvement by
> > >> going to two 32 MB buffers
> > >> - the remaining test (5 process RHEL4) probably did not see as much
> > >> improvement because we maxed out the network.  In the past, netpipe
> > >> has shown that we can get around 112 MB/s out of these nodes.
> > >> - the RHEL3 nodes are on a different switch, so it is hard to say how
> > >> much of the difference from RHEL3 to RHEL4 is due to network topology
> > >> and how much is due to the kernel version
> > >>
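
For reference, the improvement percentages can be recomputed from the
throughput numbers quoted above. This is only a small sketch in Python;
the dictionary labels and the helper name are my own, while the MB/s
values are copied from the results tables:

```python
# (5 x 4MB throughput, 2 x 32MB throughput) in MB/s, from the quoted results.
results = {
    "RHEL4, 1 process": (83.6, 95.5),
    "RHEL4, 5 processes": (107.4, 111.2),
    "RHEL3, 1 process": (80.5, 90.7),
    "RHEL3, 5 processes": (91.0, 103.5),
}


def improvement_pct(before, after):
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


for label, (before, after) in results.items():
    print(f"{label}: {improvement_pct(before, after):.1f}%")
```

Three of the four cases come out between roughly 12.7% and 14.2%, and
the 5 process RHEL4 case is around 3.5%, which matches the comments
above.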
> > >> It is also worth noting that even with this tuning, the single
> > >> process tests are about 14% slower than the 5 process tests.  I am
> > >> guessing that this is due to a lack of pipelining, probably caused by
> > >> two things:
> > >> - the application only submitting one read/write at a time
> > >> - the kernel module itself serializing when it breaks reads/writes
> > >> into buffer sized chunks
> > >>
> > >> The latter could be addressed by either pipelining the I/O through
> > >> the bufmap interface (so that a single read or write could keep
> > >> multiple buffers busy) or by going to a system like Murali came up
> > >> with for memory transfers a while back that isn't limited by buffer
> > >> size.
> > >>
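
To make the chunking point concrete, here is a small Python sketch of
how a single request breaks down into buffer sized pieces. It is not
the kernel module's code; the function name and the (offset, length)
representation are assumptions made for illustration:

```python
MB = 1024 * 1024


def split_into_chunks(request_bytes, buffer_bytes):
    """Return (offset, length) pairs that cover a request of request_bytes,
    each piece no larger than one buffer.

    Hypothetical illustration only, not taken from the PVFS2 kernel module.
    """
    chunks = []
    offset = 0
    while offset < request_bytes:
        length = min(buffer_bytes, request_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


# One 100 MB read or write, as used in the tests described above.
print(len(split_into_chunks(100 * MB, 4 * MB)))
print(len(split_into_chunks(100 * MB, 32 * MB)))
```

With 4 MB buffers a 100 MB request becomes 25 pieces, and with 32 MB
buffers it becomes 4 (three full buffers plus a 4 MB remainder). If the
pieces are handled strictly one after another, a single process only
ever keeps one buffer busy at a time, whatever the buffer count is.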
> > >> It would also be nice to have a way to set these buffer settings
> > >> without recompiling, either via module options or via
> > >> pvfs2-client-core command line options.  For the time being we are
> > >> going to hard code our tree to run with the 32 MB buffers.  The 64 MB
> > >> of RAM that this uses up (vs. 20 MB with the old settings) doesn't
> > >> really matter for our standard node footprint.
> > >>
> > >> -Phil
> > >> _______________________________________________
> > >> Pvfs2-developers mailing list
> > >> Pvfs2-developers at beowulf-underground.org
> > >> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
> > >>
> > >
> >
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kmod-bufsizes.patch
Type: text/x-patch
Size: 36064 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20061129/b4373034/kmod-bufsizes-0001.bin

