[Pvfs2-developers] null-aio

Troy Benjegerdes troy at scl.ameslab.gov
Fri Apr 25 19:51:44 EDT 2008


Do you still use posix-aio with O_DIRECT?

I'd be very interested in a Linux-native AIO trove module that uses 
O_DIRECT, since that would bypass the buffer cache entirely. I get the 
best I/O performance out of our storage servers when using Linux-native 
AIO together with O_DIRECT.
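
A minimal sketch of that combination, assuming a Linux system and the raw 
kernel AIO syscalls (io_setup, io_submit, io_getevents) rather than any 
trove or PVFS interface. The function name and the 4096-byte alignment are 
hypothetical choices for illustration only:

```c
#define _GNU_SOURCE               /* exposes O_DIRECT and syscall() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/aio_abi.h>        /* aio_context_t, struct iocb, struct io_event */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SKETCH_ALIGN 4096         /* assumed alignment; real devices may differ */

/* Write one aligned block to `path` with O_DIRECT via kernel AIO.
 * Returns the number of bytes written, or a negative errno value. */
static long sketch_direct_aio_write(const char *path)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    void *buf = NULL;
    long n, ret;
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0)
        return -errno;

    /* O_DIRECT needs an aligned buffer, length, and file offset. */
    if (posix_memalign(&buf, SKETCH_ALIGN, SKETCH_ALIGN) != 0) {
        close(fd);
        return -ENOMEM;
    }
    memset(buf, 'x', SKETCH_ALIGN);

    if (syscall(SYS_io_setup, 1, &ctx) < 0) {
        ret = -errno;
        goto out;
    }

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = SKETCH_ALIGN;
    cb.aio_offset = 0;

    n = syscall(SYS_io_submit, ctx, 1, list);
    if (n != 1) {
        ret = (n < 0) ? -errno : -EIO;
    } else {
        /* Block until the single request completes. */
        n = syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL);
        ret = (n == 1) ? (long)ev.res : ((n < 0) ? -errno : -EIO);
    }
    syscall(SYS_io_destroy, ctx);
out:
    free(buf);
    close(fd);
    return ret;
}
```

Calling sketch_direct_aio_write("/some/file") returns SKETCH_ALIGN on 
success. Some file systems reject O_DIRECT, in which case the sketch 
returns a negative errno value instead.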

Sam Lang wrote:
>
> Hi Phil,
>
> It's good to get this functionality into the code base -- we've had a 
> number of attempts at this sort of thing, but none of them got 
> committed to HEAD, and having it in there in whatever state is better 
> than not.  I have a concern (and overall design gripe) with the use of 
> AIO interfaces for this sort of thing, when we already have callback 
> structures in dbpf.
>
> We now have two levels of indirection, with threads being created and 
> managed in both.  Obviously, that's more code to manage in different 
> locations that do more or less the same thing, making it harder for 
> others to understand and augment.
>
> In general I don't think the aio callback structures are needed at 
> all, but it's admittedly much easier to implement against those 
> functions than the dbpf ones, if only because of the disorderly op 
> mgmt code in dbpf bstream.
>
> I don't know what our long term plans will be for the trove code, but 
> I would vote for trying to move towards a simpler centralized location 
> for management of the IO threads and queues, and different callbacks 
> for IO impls.  I've done a prototype of this for queue/thread 
> management and O_DIRECT, and I think it would clean things up quite a 
> bit to go that route.
>
> -sam
>
> On Apr 17, 2008, at 4:10 PM, Phil Carns wrote:
>
>> There is a new trove method available in trunk now called "null-aio". 
>> It can be selected by putting "TroveMethod null-aio" in the 
>> <StorageHints> section of the file system configuration file.
>>
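A minimal sketch of that setting, showing only the line described above 
(any other hints and the enclosing parts of the configuration file are 
omitted and would stay as they are):

```
<StorageHints>
    TroveMethod null-aio
</StorageHints>
```
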
>> This is only useful for debugging purposes, because it deliberately 
>> skips doing any file I/O on the server side.  Please use with 
>> caution! It does all metadata operations the same as any other 
>> method, but file reads will return garbage and file writes are thrown 
>> away.  Writing beyond eof triggers a truncate to mimic the 
>> appropriate resulting bstream size.
>>
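The behavior described above can be sketched roughly as follows. This is 
not the null-aio source; the function names are hypothetical and a plain 
file descriptor stands in for a bstream:

```c
#define _POSIX_C_SOURCE 200809L   /* mkstemp, ftruncate, fstat */
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical "null" write: the payload is discarded, but the file is
 * extended if the write would have ended past EOF.
 * Returns len on success, -1 on error. */
static ssize_t sketch_null_write(int fd, const void *buf, size_t len,
                                 off_t offset)
{
    struct stat st;
    off_t end = offset + (off_t)len;

    (void)buf;                    /* data is intentionally thrown away */
    assert(offset >= 0);
    if (fstat(fd, &st) < 0)
        return -1;
    if (end > st.st_size && ftruncate(fd, end) < 0)
        return -1;
    return (ssize_t)len;
}

/* Hypothetical "null" read: report success without touching buf, so the
 * caller sees whatever bytes the buffer already held. */
static ssize_t sketch_null_read(int fd, void *buf, size_t len, off_t offset)
{
    (void)fd; (void)buf; (void)offset;
    return (ssize_t)len;
}

/* Write 3 bytes at offset 100 of an empty temp file and return the
 * resulting file size, or -1 on error. */
static long sketch_null_demo(void)
{
    char path[] = "/tmp/null_sketch_XXXXXX";
    char rbuf[4] = "old";
    struct stat st;
    long size = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                 /* file disappears once fd is closed */
    if (sketch_null_write(fd, "abc", 3, 100) == 3 && fstat(fd, &st) == 0)
        size = (long)st.st_size;
    assert(sketch_null_read(fd, rbuf, 3, 0) == 3);
    close(fd);
    return size;
}
```

In this sketch the write extends the file to cover the requested range 
without storing any data, and the read leaves the caller's buffer 
untouched.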
>> This might be useful once in a while for narrowing down performance 
>> problems between network and storage.  It takes the storage out of 
>> the loop and shows approximately what the network is capable of by 
>> itself. Of course it will only work for benchmarks that don't verify 
>> data correctness (or otherwise rely on data read off of PVFS).
>>
>> We used to have a compile time option (--disable-disk-io) for this 
>> same purpose, but that actually hasn't worked in a while.  Nowadays 
>> it's easier to just do this as a trove method that can be selected at 
>> runtime without recompiling.
>>
>> -Phil
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>