[Pvfs2-developers] patches: tuning options
Phil Carns
pcarns at wastedcycles.org
Thu Aug 10 18:52:39 EDT 2006
Sam Lang wrote:
>
> On Aug 10, 2006, at 4:04 PM, Phil Carns wrote:
>
>> flow-proto-tuning.patch:
>> -----------
>> This patch adds "FlowBufferSizeBytes" and "FlowBuffersPerFlow"
>> options to the configuration file format. They allow you to specify
>> the buffer size that the default flow protocol will use as well as
>> the maximum number of buffers to use per flow. Note that if you
>> change either of these parameters, then you need to remount any
>> active clients so that they pick up the configuration change before
>> performing any I/O.
>>
>> max-aio.patch:
>> ----------
>> This patch adds "TroveMaxConcurrentIO" to the configuration file
>> format. It allows you to specify the maximum number of I/O
>> operations that trove will allow to proceed concurrently (currently
>> 16). Note from the previous email regarding AIO that depending on
>> your access pattern, AIO may queue all of your operations anyway
>> regardless of this setting. It probably doesn't have much effect
>> unless you are accessing more than one file at a time, or if you are
>> using an alternative to the stock AIO implementation.
>>
>
> I had made the same change in Julian's branch,
Oh, ok. We will switch over when that stuff hits trunk.
> there are still a couple
> things that aren't clear to me about this max value though. First, it's
> a global value for all outstanding lio_listio calls the pvfs server
> makes, but based on your previous email comments about glibc's
> one-thread-per-fd oddity, it seems like we only want that value to max
> out per datafile. Also, after we hit the max we just queue the
> operations and post them once current ops have completed. If librt
> just queues ops and does them in FIFO order though, it's pretty much the
> same thing. Why not just let librt handle the queuing? If we were to
> do ordering of the operations based on offsets, then it would make
> sense for us to queue, but we don't. Are we better at queueing than
> librt?
I agree that if you are using librt for aio, then this max value isn't
doing much of anything :) librt's queueing isn't exactly the same thing
though. librt allows N operations in flight at a time (where N can be
tuned using aio_init) by way of limiting the maximum number of threads
that it will spawn. However, since it serializes on each fd, that limit
never kicks in unless you are accessing N different files. Otherwise it
is really only going to do one thing at a time. The librt source that I
looked at happened to default to N=16, just like Trove did.
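
For reference, a minimal sketch of tuning N through aio_init, assuming
glibc's AIO implementation (aio_init and struct aioinit are GNU
extensions, not POSIX). The helper name tune_glibc_aio and the values
passed to it are hypothetical. It would have to run before the first AIO
request is issued, and given the per-fd serialization described above, a
larger thread count should only matter when several files are accessed at
once.

```c
#define _GNU_SOURCE   /* aio_init() and struct aioinit are GNU extensions */
#include <aio.h>
#include <string.h>

/* Hypothetical helper: ask glibc's AIO layer to allow up to max_threads
 * worker threads and to expect roughly expected_requests simultaneous
 * requests. Returns the settings that were passed to aio_init(). */
struct aioinit tune_glibc_aio(int max_threads, int expected_requests)
{
    struct aioinit init;

    memset(&init, 0, sizeof(init));
    init.aio_threads = max_threads;     /* the "N" discussed above */
    init.aio_num = expected_requests;   /* a sizing hint, not a hard limit */

    aio_init(&init);                    /* returns void; no error reporting */
    return init;
}
```

On older glibc versions this needs to be linked with -lrt.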
I think the point of the aio limit in trove was to try to throttle I/O
on the servers, but it turns out that librt was already throttling above
and beyond, so the trove limit wasn't actually decreasing the number of
posted I/O operations to the kernel any.
Maybe the throttling makes more sense when you bypass librt somehow (as
in the previous patch) because then there is nothing to queue/throttle
the operations besides trove?
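
The kind of throttle under discussion (post up to a maximum, queue the
rest, and post a queued operation whenever a slot frees up) can be
sketched roughly as follows. This is an illustration only, not Trove's
actual code; the struct and function names are hypothetical, and the queue
is reduced to a counter because the operations are not reordered.

```c
#include <assert.h>

/* Hypothetical throttle state; not Trove's actual data structures. */
struct io_throttle {
    int max_inflight;  /* plays the role of TroveMaxConcurrentIO */
    int inflight;      /* operations currently posted */
    int queued;        /* operations waiting for a free slot */
};

/* Returns 1 if the operation may be posted now, 0 if it was queued. */
int throttle_submit(struct io_throttle *t)
{
    assert(t->max_inflight > 0);
    if (t->inflight < t->max_inflight) {
        t->inflight++;
        return 1;
    }
    t->queued++;
    return 0;
}

/* Called when a posted operation finishes. Returns 1 if a queued
 * operation was moved into the freed slot, 0 otherwise. */
int throttle_complete(struct io_throttle *t)
{
    assert(t->inflight > 0);
    t->inflight--;
    if (t->queued > 0) {
        t->queued--;
        t->inflight++;
        return 1;
    }
    return 0;
}
```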
At any rate, we decided to make this configurable before understanding
the issues involved; it was just a hardcoded value we saw that looked
like it should have been tunable.
> I know Julian was looking at performance of aio and found results
> somewhere (I don't have a reference, sorry) that showed lio_listio did
> better in cases where multiple fds were passed to one lio_listio
> operation (right now we just do one fd with multiple segments to one
> lio_listio). I wonder if that difference is based on the glibc queuing
> behavior that you describe.
I would guess that the queueing behavior is the reason. I can't imagine
that using separate files would make much difference once you get to the
system call level.
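
To make the multiple-fd case concrete, here is a rough sketch of a single
lio_listio call that carries one write per file descriptor. It uses only
the POSIX AIO interface; the function name write_batch, the MAX_BATCH
limit, and the fixed offset of 0 are assumptions for illustration and do
not reflect how Trove builds its requests.

```c
#include <aio.h>
#include <string.h>
#include <sys/types.h>

#define MAX_BATCH 16   /* arbitrary cap for this sketch */

/* Write len bytes from bufs[i] to fds[i] at offset 0, submitting all n
 * requests in one lio_listio() call. Returns 0 on success, -1 on error. */
int write_batch(const int *fds, char *const *bufs, size_t len, int n)
{
    struct aiocb cbs[MAX_BATCH];
    struct aiocb *list[MAX_BATCH];
    int i;

    if (n < 1 || n > MAX_BATCH)
        return -1;

    memset(cbs, 0, sizeof(cbs));
    for (i = 0; i < n; i++) {
        cbs[i].aio_fildes = fds[i];        /* a different fd per entry */
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = len;
        cbs[i].aio_offset = 0;
        cbs[i].aio_lio_opcode = LIO_WRITE;
        list[i] = &cbs[i];
    }

    /* LIO_WAIT blocks until every request in the list has finished. */
    if (lio_listio(LIO_WAIT, list, n, NULL) != 0)
        return -1;

    /* Check each request's individual status and byte count. */
    for (i = 0; i < n; i++) {
        if (aio_error(&cbs[i]) != 0 || aio_return(&cbs[i]) != (ssize_t)len)
            return -1;
    }
    return 0;
}
```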
> Just a curiosity, but I wonder if the aio
> performance would change if we were to post multiple trove operations
> in the same lio_listio call, or possibly even break up the bstream into
> multiple files based on strip size...sounds crazy right? :-)
On the former question, I guess it depends on who is better at
coalescing: the kernel disk scheduler or the trove queue? Hopefully we
can avoid splitting files up :)
-Phil