[Pvfs2-developers] null-aio
Troy Benjegerdes
troy at scl.ameslab.gov
Fri Apr 25 19:51:44 EDT 2008
Do you still use posix-aio with O_DIRECT?
I'd be very interested in a linux-native AIO trove module that uses
O_DIRECT, and this would avoid any buffer cache hits. I get the best IO
performance out of our storage servers when using linux native O_DIRECT
and linux native AIO.
Sam Lang wrote:
>
> Hi Phil,
>
> It's good to get this functionality into the code base -- we've had a
> number of attempts at this sort of thing, but none of them got
> committed to HEAD, and having it in there in whatever state is better
> than not. I have a concern (and overall design gripe) with the use of
> AIO interfaces for this sort of thing, when we already have callback
> structures in dbpf.
>
> We now have two levels of indirection, with threads being created and
> managed in both. Obviously, that's more code to manage in different
> locations that do more or less the same thing, making it harder for
> others to understand and augment.
>
> In general I don't think the aio callback structures are needed at
> all, but it's admittedly much easier to implement to those functions
> than the dbpf ones, if only because of the disorderly op mgmt code in
> dbpf bstream.
>
> I don't know what our long term plans will be for the trove code, but
> I would vote for trying to move towards a simpler centralized location
> for management of the IO threads and queues, and different callbacks
> for IO impls. I've done a prototype of this for queue/thread
> management and O_DIRECT, and I think it would clean things up quite a
> bit to go that route.
>
> -sam
>
> On Apr 17, 2008, at 4:10 PM, Phil Carns wrote:
>
>> There is a new trove method available in trunk now called "null-aio".
>> It can be selected by putting "TroveMethod null-aio" in the
>> <StorageHints> section of the file system configuration file.
>>
>> This is only useful for debugging purposes, because it deliberately
>> skips doing any file I/O on the server side. Please use with
>> caution! It does all metadata operations the same as any other
>> method, but file reads will return garbage and file writes are thrown
>> away. Writing beyond eof triggers a truncate to mimic the
>> appropriate resulting bstream size.
>>
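
The write-side behaviour described above (data thrown away, file
extended when a write lands beyond eof) could be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
actual null-aio code: null_write is a hypothetical helper name, and the
sketch uses plain fstat/ftruncate on an ordinary file descriptor rather
than the trove/dbpf interfaces.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the PVFS2 tree: pretend to write
 * `size` bytes at `offset` into the bstream file open on `fd`.
 * The payload is discarded; only the file size is kept consistent. */
static ssize_t null_write(int fd, const void *buf, size_t size, off_t offset)
{
    struct stat st;
    off_t end = offset + (off_t)size;

    (void)buf;                      /* data is intentionally thrown away */

    if (fstat(fd, &st) < 0)
        return -1;

    /* Writing beyond eof: extend the file so that its size matches
     * what a real write would have produced. */
    if (end > st.st_size && ftruncate(fd, end) < 0)
        return -1;

    return (ssize_t)size;           /* report the write as complete */
}
```

A write that falls entirely inside the current file size leaves the file
untouched, so a later stat still reports the size a real write would
have left behind.
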
>> This might be useful once in a while for narrowing down performance
>> problems between network and storage. It takes the storage out of
>> the loop and shows approximately what the network is capable of by
>> itself. Of course it will only work for benchmarks that don't verify
>> data correctness (or otherwise rely on data read off of PVFS).
>>
>> We used to have a compile time option (--disable-disk-io) for this
>> same purpose, but that actually hasn't worked in a while. Nowadays
>> it's easier to just do this as a trove method that can be selected at
>> runtime without recompiling.
>>
>> -Phil
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>