[Pvfs2-developers] threaded client-core and the device thread
Phil Carns
pcarns at wastedcycles.org
Mon Oct 16 09:37:15 EDT 2006
Sam Lang wrote:
>
> Hi All,
>
> Dean and I are looking at trying to push the efficiency of requests
> from the kernel module up through the device to client-core. I added
> the --threaded option to the client to allow the client-core to run
> with multiple threads (one each for bmi, dev, and main -- and also a
> remount thread, but let's ignore that for now), so the device thread
> should be able to keep pulling requests off the device without having to
> wait for bmi operations to complete.
>
> I noticed a couple things with the device thread that I wanted to ask
> about.
>
> PINT_dev_test_unexpected takes an incount of 5, so it's only going to
> read at most 5 requests off the device for each call. Once it returns,
> each of the unexpected requests is added to the completed jobs array
> and then we signal the jobs completed condition variable _for each
> request_. It seems like this will be 5x the number of context switches
> between the device thread and the main thread that we need.
>
> Also, we poll every time before reading another request off the
> device. What about trying to read a number of requests off the device
> at once with one read (or possibly a readv, so we can keep separate
> buffers per request)?
>
> Also, it looks like we do a malloc for each new request buffer, and
> then a free once we're done with it, and a memset of the info struct.
> It seems like we could manage the buffers on the stack instead of the
> heap, and save on a few system calls there.
>
> For both threaded and nonthreaded, with the workload that Dean is
> using, he found that the PINT_dev_test_unexpected always returned 5
> requests in the outcount. So it looks like there are always requests
> sitting on the device, waiting to be read by client-core. Are we just
> not able to process requests fast enough through BMI and the state
> machines, or is the cost of polling and signaling every time we read a
> request off the device slowing us down? In other words, does it make
> sense to rework the code a little bit or will we just get bottlenecked
> elsewhere?
I am just speculating, but of the things you list, I would guess that
these two are the most likely to show improvement without much
coding effort:
- increasing the testcount to something higher than 5 (since it sounds
like that is getting maxed out for this workload)
- fixing the "signalling on every request" problem
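To make the signalling point concrete, here is a rough sketch of the
difference between signalling once per request and once per batch. It is
not the client-core code: the completion_queue struct, the cq_* function
names, and MAX_COMPLETED are all made up for illustration, and it assumes
a single consumer (the main thread) waiting on the condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMPLETED 64            /* hypothetical queue capacity */

/* Hypothetical stand-in for the completed jobs array and its condvar. */
struct completion_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int             items[MAX_COMPLETED];
    size_t          count;
    unsigned long   signals;        /* counts signals, for illustration only */
};

void cq_init(struct completion_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->count = 0;
    q->signals = 0;
}

/* Pattern described in the thread: one lock/signal cycle per request. */
void cq_post_each(struct completion_queue *q, const int *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pthread_mutex_lock(&q->lock);
        assert(q->count < MAX_COMPLETED);
        q->items[q->count++] = reqs[i];
        q->signals++;
        pthread_cond_signal(&q->nonempty);
        pthread_mutex_unlock(&q->lock);
    }
}

/* Alternative: append the whole batch, then signal the consumer once. */
void cq_post_batch(struct completion_queue *q, const int *reqs, size_t n)
{
    if (n == 0)
        return;
    pthread_mutex_lock(&q->lock);
    assert(q->count + n <= MAX_COMPLETED);
    memcpy(q->items + q->count, reqs, n * sizeof(int));
    q->count += n;
    q->signals++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: wait until something is queued, then take up to max. */
size_t cq_drain(struct completion_queue *q, int *out, size_t max)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    size_t n = q->count < max ? q->count : max;
    memcpy(out, q->items, n * sizeof(int));
    memmove(q->items, q->items + n, (q->count - n) * sizeof(int));
    q->count -= n;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

With a batch of 5, cq_post_each wakes the consumer up to 5 times where
cq_post_batch wakes it once. With more than one waiting consumer,
pthread_cond_broadcast might be the better call.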
The need for multiple reads and the mallocs could be a problem, but I am
with Murali: I think problems in this area are more likely related to
inefficient threading or I/O stalls than to CPU or memory overhead.
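For reference, if the multiple reads did turn out to matter, the readv
idea with caller-provided (e.g. stack) buffers might look roughly like
the sketch below. It assumes fixed-size requests and a descriptor that
hands back several requests per read call; I have not checked whether the
pvfs2 device supports that. REQ_SIZE, BATCH, and read_request_batch are
hypothetical names.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define REQ_SIZE 64     /* hypothetical fixed request size in bytes */
#define BATCH    4      /* hypothetical number of requests per readv */

/*
 * Read up to BATCH fixed-size requests with a single readv(), one buffer
 * per request. The buffers are supplied by the caller, so they can live
 * on the stack and no malloc/free is needed per request.
 * Returns the number of complete requests read, or -1 on error.
 */
int read_request_batch(int fd, char bufs[BATCH][REQ_SIZE])
{
    struct iovec iov[BATCH];

    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len  = REQ_SIZE;
    }

    ssize_t n = readv(fd, iov, BATCH);
    if (n < 0)
        return -1;

    /* A trailing partial request is ignored in this sketch. */
    return (int)(n / REQ_SIZE);
}
```

A caller would declare char bufs[BATCH][REQ_SIZE] locally and pass it in,
then hand the filled buffers on before returning.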
-Phil