[Pvfs2-developers] threaded client-core and the device thread
Phil Carns
pcarns at wastedcycles.org
Tue Oct 17 13:16:31 EDT 2006
> Just to see if I'm noticing the same issue, what was the exact problem
> Phil was noticing? Shouldn't multiple requests take longer than a
> single request?
>
> The workload I was using was multiple rpc.nfsd threads issuing 64 KB
> requests (through the writev/readv interface) to the PVFS2 kernel module
> (and then to client-core and so on). To make things easy, I bet using
> iozone with multiple threads and a random workload would simulate this
> workload quite well. What I was noticing is that although we haven't
> reached disk, CPU, or network limits, the I/O throughput is fixed at
> some low value.
>
> One test Sam and I tried was to increase the number of kernel mmapped
> buffers. Instead of five 4MB buffers, we used sixty-four 128KB
> buffers. This reduced performance considerably, especially read
> performance. Since we are using 64KB requests, this should not be an
> issue, but it was. One thing we didn't get a chance to test was whether
> the reduced performance was caused by the increase in the number of
> buffers or the reduction in their size. My guess would be the increase,
> but why would this be?
>
> Beyond inefficient coding issues, Sam and I talked about where the
> bottleneck could be from a design standpoint. We came up with the
> following list:
> 0) kmapping and copying data is going as fast as possible
> 1) Sending messages through the pvfs2-req device can only happen at a
> constant rate.
> 2) client-core reading messages off the pvfs2-req device (should no
> longer be an issue with the --threaded option, but maybe reading 5 at a
> time is still inefficient)
> 3) A single BMI thread issuing I/O requests. Are multiple threads
> necessary to issue the multiple I/O requests from the kernel?
>
> Can anyone think of other parts of the I/O path that might be a
> bottleneck? So far, we have only started to investigate items 1 and 2.
>
> Thanks for everyone's help.
> Dean
>
To follow up on your question about what I saw, see the following mailing
list post. This was back in June, but I haven't had an opportunity to
look at it more closely yet:
http://www.beowulf-underground.org/pipermail/pvfs2-developers/2006-June/002208.html
At any rate, I was running a benchmark with 5 processes on a single
node. I found that I got a significant performance improvement by
limiting the kernel module to only 1 transfer buffer rather than the
default of 5.
If you are seeing the same issue that I was, then it seems to indicate
that the number of buffers you are using, rather than the size of the
buffers, is what causes the additional slowdown. I have no idea why. It
may be a direct problem with the mechanism that handles the buffers, or
it may be an indirect result elsewhere that only shows up when we get
concurrent I/O operations in flight from the VFS.
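
One way to separate the two variables would be to test every combination
of buffer count and buffer size, since the two configurations quoted above
changed both at once (five 4MB buffers is 20 MB in total, sixty-four 128KB
buffers is 8 MB). The following is only a sketch of such a test matrix. The
counts and sizes are the ones mentioned in this thread; the function name
and the idea of a full sweep are assumptions for illustration, and nothing
here touches the actual kernel module settings.

```python
from itertools import product

KB = 1024
MB = 1024 * KB

# Buffer counts and sizes mentioned in this thread. Sweeping the full
# cross-product is an illustrative assumption, not an existing test plan.
COUNTS = [1, 5, 64]
SIZES = [128 * KB, 4 * MB]


def buffer_test_matrix(counts=COUNTS, sizes=SIZES):
    """Return (count, size_bytes, total_bytes) for every count/size pair."""
    return [(c, s, c * s) for c, s in product(counts, sizes)]


if __name__ == "__main__":
    for count, size, total in buffer_test_matrix():
        print(f"{count:3d} buffers x {size // KB:5d} KB = {total / MB:8.3f} MB total")
```

Comparing rows that share a size but differ in count (or the reverse) would
show which of the two changes is responsible for the slowdown.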
-Phil