[Pvfs2-developers] Question in pvfs I/O

Phil Carns carns at mcs.anl.gov
Tue Apr 21 09:59:09 EDT 2009


Hi Christina,

I answered some of your specific questions inline below:

Christina Patrick wrote:
> Hi All,
> 
> I am doing a project where I need to implement simple prefetching in
> pvfs. While I was going through the pvfs code, I couldn't understand
> the following and hence have to ask the below questions:
> 
> 1. What is the file descriptor assigned by pvfs? Does this file
> descriptor correspond to actual descriptor (of the underlying
> filesystem) ... e.g. that which is seen by the /proc filesystem?

The file descriptor used on the client side has no relation to the file 
descriptors used by the servers.  The client's file descriptor is 
assigned by the Linux kernel when a file is opened.

> 2. When I traced the I/O calls in pvfs v2.8.0, I found the following
> function chain on the pvfs2 server side:
> main
>   -> PINT_state_machine_continue
>     -> PINT_state_machine_next
>       -> PINT_state_machine_invoke
>         -> io_start_flow
>           -> job_flow
>             -> PINT_flow_post
>               -> fp_multiqueue_post
>                 -> bmi_send_callback_fn
>                   -> trove_bstream_read_list
>                     -> alt_aio_bstream_read_list
>                       -> dbpf_bstream_rw_list
>                         -> issue_or_delay_io_operation
>                           -> alt_lio_listio
>                             -> pthread_create => alt_lio_thread
> alt_lio_thread (new thread)
>   -> pread
> Since, pvfs is a parallel file system, are all the file descriptors
> assigned on all the servers the same? I saw one instance where the
> servers used descriptors 12, 12 & 17 and another instance where all
> the file servers used descriptor 12?

I think the previous answer mostly covers this: the servers obtain 
their own file descriptors as needed, and those have no relation to the 
client's file descriptor.  If the descriptors on different servers 
happen to match, it is just coincidence.
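
To see why matching numbers carry no meaning, here is a small sketch 
(plain POSIX behavior, not PVFS code; the helper name is made up for 
this example).  The kernel hands each process the lowest descriptor 
number that is currently unused, so independent server processes that 
have opened a similar set of files can easily end up with the same 
number:

```python
import os
import tempfile

def open_close_reopen(path):
    """Open path, close it, open it again; return both descriptor numbers."""
    fd1 = os.open(path, os.O_RDONLY)
    os.close(fd1)
    # The number just released is the lowest free one again, so it is reused.
    fd2 = os.open(path, os.O_RDONLY)
    os.close(fd2)
    return fd1, fd2

# Assumes a POSIX system where an open temporary file can be reopened by name.
with tempfile.NamedTemporaryFile() as tmp:
    first, second = open_close_reopen(tmp.name)
    print(first, second)
```

The two numbers printed are equal in a single-threaded process, even 
though they came from two separate open calls.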

> 3. The most important question is how come all the file servers (the
> function alt_lio_thread) are passed the same offsets and also the
> total size of the data to be read? Shouldn't each file server be
> passed a list of offsets corresponding to what is stored on its own
> local disk (because of the striping)? Where does this translation
> actually take place and how does the server read in the appropriate
> data in other words how is the striped data actually read?

The offsets used by the client are mapped to offsets that make sense 
relative to each server's part of the data.  The default striping unit 
is 64 KB.  This means that if you read at logical offset 64 KB, for 
example, it actually corresponds to offset 0 of the 2nd server's data. 
As a result, it is pretty common for one request to hit the same local 
offset on every server.
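
As a rough illustration of that mapping, here is a minimal sketch.  It 
assumes plain round-robin striping with a 64 KB strip and servers 
numbered from 0; the function name is hypothetical and this is not the 
actual PVFS distribution code:

```python
STRIP_SIZE = 64 * 1024  # default striping unit mentioned above

def logical_to_server(offset, num_servers, strip_size=STRIP_SIZE):
    """Map a logical file offset to (server index, offset in that server's data).

    Assumes simple round-robin striping; illustrative only.
    """
    strip = offset // strip_size      # which strip of the logical file
    server = strip % num_servers      # strips are dealt out round-robin
    # Full strips already stored on this server, plus position inside the strip.
    local = (strip // num_servers) * strip_size + offset % strip_size
    return server, local

# A 192 KB read starting at logical offset 0, spread over 3 servers:
for logical in (0, 64 * 1024, 128 * 1024):
    print(logical, logical_to_server(logical, 3))
# prints:
# 0 (0, 0)
# 65536 (1, 0)
# 131072 (2, 0)
```

Each server ends up reading from local offset 0, which is why the 
offsets seen on the different servers can look identical.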

> 4. In order to implement prefetching on the server side, how do I have
> to go about it? Can I do some shortcut method or do I need to
> introduce some new state machine etc? I could really do with some help
> here.
> 
> I really appreciate your help,
> Thanks and Regards,
> Christina.
