[Pvfs2-developers] threaded client-core and the device thread

Phil Carns pcarns at wastedcycles.org
Tue Oct 17 13:16:31 EDT 2006


> Just to see if I'm noticing the same issue, what was the exact problem 
> Phil was noticing?  Shouldn't multiple requests take longer than a 
> single request?
> 
> The workload I was using was multiple rpc.nfsd threads issuing 64 KB 
> requests (through the writev/readv interface) to the PVFS2 kernel module 
> (and then to client-core and so on).  To make things easy, I bet using 
> iozone with multiple threads and a random workload would simulate this 
> workload quite well.  What I was noticing is that although we haven't 
> reached disk, cpu, or network limits, the I/O throughput is fixed at 
> some low value.
> 
> One test Sam and I tried was to increase the number of kernel mmapped 
> buffers.  Instead of five 4MB buffers, we used sixty-four 128KB 
> buffers.  This reduced performance considerably, especially read 
> performance.  Since we are using 64KB requests, this should not be an 
> issue, but it was.  One thing we didn't get a chance to test was 
> whether the reduced performance was caused by the increase in the 
> number of buffers or the reduction in their size.  My guess would be 
> the increase, but why would this be?
> 
> Beyond inefficient coding issues, Sam and I talked about where the 
> bottleneck could be from a design standpoint.  We came up with the 
> following list:
> 0) kmapping and copying data is going as fast as possible
> 1) Sending messages through the pvfs2-req device can only happen at a 
> constant rate.
> 2) client-core reading messages off the pvfs2-req device (should no 
> longer be an issue with the --threaded option, but maybe reading 5 at a 
> time is still inefficient)
> 3) A single BMI thread issuing I/O requests.  Are multiple threads 
> necessary to issue the multiple I/O requests from the kernel?
> 
> Can anyone think of other parts of the I/O path that might be a 
> bottleneck?  So far, we have only started to investigate items 1 and 2.
> 
> Thanks for everyone's help.
> Dean
> 

To follow up on your question about what I saw, see the following mailing 
list post.  This was back in June, but I haven't had an opportunity to 
look at it more closely yet:

http://www.beowulf-underground.org/pipermail/pvfs2-developers/2006-June/002208.html

At any rate, I was running a benchmark with 5 processes on a single 
node.  I found that I got a significant performance improvement by 
limiting the kernel module to only 1 transfer buffer rather than the 
default of 5.
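
For anyone who wants to approximate this kind of concurrent workload 
without iozone or NFS, here is a minimal sketch in Python.  It is not 
the benchmark used above.  It only starts a few threads that each issue 
random, aligned 64 KB reads against one file and reports the total bytes 
read and the elapsed time.  The function names (make_test_file, 
random_reader, run_benchmark) and the default thread and request counts 
are made up for illustration, and os.pread stands in for the readv path.

```python
import os
import random
import threading
import time

REQUEST_SIZE = 64 * 1024  # 64 KB requests, matching the workload in this thread


def make_test_file(path, size):
    """Create a file of `size` bytes (hypothetical helper for this sketch)."""
    with open(path, "wb") as f:
        f.write(os.urandom(size))


def random_reader(path, file_size, num_requests, seed, results, index):
    """Issue `num_requests` random, request-aligned reads and record bytes read."""
    rng = random.Random(seed)
    slots = file_size // REQUEST_SIZE  # number of aligned 64 KB positions
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(num_requests):
            offset = rng.randrange(slots) * REQUEST_SIZE
            total += len(os.pread(fd, REQUEST_SIZE, offset))
    finally:
        os.close(fd)
    results[index] = total


def run_benchmark(path, num_threads=5, num_requests=256):
    """Run the readers concurrently; return (total_bytes, elapsed_seconds)."""
    file_size = os.path.getsize(path)
    results = [0] * num_threads
    threads = [
        threading.Thread(
            target=random_reader,
            args=(path, file_size, num_requests, i, results, i),
        )
        for i in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return sum(results), elapsed
```

To use it, point run_benchmark at a file on the mounted file system 
under test and compare the result for 1 thread against 5 or more.  The 
file must be at least 64 KB.  Numbers from a local file mostly measure 
the page cache, so they are only meaningful on the target mount.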

If you are seeing the same issue that I was, then it seems to indicate 
that the number of buffers that you are using is causing additional 
slowdown rather than the size of the buffers.  I have no idea why.  It 
may be a direct problem with the mechanism that handles the buffers, or 
it may be an indirect effect elsewhere that only shows up when we get 
concurrent I/O operations in flight from the VFS.
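
One way to separate the two variables is to test every combination of 
buffer count and buffer size mentioned in this thread, so that each 
changes while the other is held fixed.  The sketch below only 
enumerates that grid and the total buffer memory for each case.  The 
helper name buffer_configs is hypothetical, and how a given count and 
size actually get applied to the kernel module is outside the sketch.

```python
from itertools import product

KB = 1024
MB = 1024 * KB

# Values taken from this thread: 1 and 5 buffers (Phil), 5 x 4MB and
# 64 x 128KB (Dean and Sam).
COUNTS = [1, 5, 64]
SIZES = [4 * MB, 128 * KB]


def buffer_configs(counts, sizes):
    """Return every (count, size) pair plus the total buffer memory it implies."""
    return [
        {"count": c, "size": s, "total": c * s}
        for c, s in product(counts, sizes)
    ]


if __name__ == "__main__":
    for cfg in buffer_configs(COUNTS, SIZES):
        print(
            f"{cfg['count']:3d} buffers x {cfg['size'] // KB:5d} KB"
            f" = {cfg['total'] // KB:7d} KB total"
        )
```

Note that the two configurations compared so far also differ in total 
buffer memory (5 x 4MB is 20 MB, 64 x 128KB is 8 MB), so a full grid 
helps rule that out as a third variable.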

-Phil


