[Pvfs2-developers] I/O statemachines adaption for migration ?
Sam Lang
slang at mcs.anl.gov
Mon Aug 28 11:53:26 EDT 2006
On Aug 28, 2006, at 8:23 AM, Julian Martin Kunkel wrote:
> Hi,
> I want to adapt the I/O state machines to reread the dfile array in
> case an I/O server responds with PVFS_ENOENT during the flow or in the
> initial I/O ACK. This might happen if the file is migrated away and the
> client does not have the updated dfile array before it initiates the
> I/O.
> Thus, I want to reread the dfile array and restart the I/O only for
> this particular server. The progress of the other I/O requests should
> not be affected.
> While looking at sys-io.sm I wondered whether the transition for the
> IO_RETRY case in the state io_analyze_results does this. Maybe some
> extra lines could be added, for example to restart the process if the
> initial acknowledgement returns PVFS_ENOENT, without increasing the
> retry count in this case?
> I'd be thankful for any suggestions on how this could be implemented
> easily.
>
I think IO_RETRY is a little different. The first step of sys-io.sm
(before the IO request/response) is a getattr to the metadata server
to get the datafile handles. It's this step that you want to repeat
if the IO request to the IO server fails, right? So instead of
jumping back to io_datafile_post_msgpairs, I think you'll want to
jump all the way back to io_init. It's probably easier to create
another return code (IO_REINIT or something) and return that from
io_datafile_complete_operations. I think there will also be some
cleanup that you have to do in complete_operations before you can
jump back up to init.
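
To make the difference between the two transitions concrete, here is a
rough, self-contained toy in plain C. It is not sys-io.sm code and does
not use any PVFS2 API: the enum values, the io_ctx struct, run_io() and
the reinit_limit bound are all made up for illustration. The only names
taken from this thread are IO_RETRY, the proposed IO_REINIT, and the
state names the toy states are loosely modeled on.

```c
#include <assert.h>

/* Hypothetical result codes. Only IO_RETRY and the proposed IO_REINIT are
 * named in this thread; the numeric values are invented. */
enum io_result { IO_DONE, IO_RETRY, IO_REINIT };

/* Hypothetical states, loosely modeled on the sys-io.sm state names. */
enum io_state { ST_INIT, ST_POST_MSGPAIRS, ST_COMPLETE_OPS, ST_FINISHED };

struct io_ctx {
    int stale_responses;   /* simulated ENOENT replies still to come */
    int transient_errors;  /* simulated ordinary failures still to come */
    int retry_count, retry_limit;
    int reinit_count, reinit_limit;
    int getattr_count;     /* how often the init step (getattr) ran */
};

/* Stand-in for io_datafile_complete_operations: classify the outcome. */
static enum io_result complete_operations(struct io_ctx *c)
{
    if (c->stale_responses > 0) { c->stale_responses--; return IO_REINIT; }
    if (c->transient_errors > 0) { c->transient_errors--; return IO_RETRY; }
    return IO_DONE;
}

/* Drive the toy machine; returns 0 on success, -1 when a limit is hit. */
static int run_io(struct io_ctx *c)
{
    enum io_state st = ST_INIT;
    assert(c->retry_limit >= 0 && c->reinit_limit >= 0);
    while (st != ST_FINISHED) {
        switch (st) {
        case ST_INIT:               /* refetch the dfile handles */
            c->getattr_count++;
            st = ST_POST_MSGPAIRS;
            break;
        case ST_POST_MSGPAIRS:      /* send the IO requests */
            st = ST_COMPLETE_OPS;
            break;
        case ST_COMPLETE_OPS:
            switch (complete_operations(c)) {
            case IO_REINIT:         /* all the way back to init */
                if (c->reinit_count >= c->reinit_limit) return -1;
                c->reinit_count++;  /* retry_count is left untouched */
                st = ST_INIT;
                break;
            case IO_RETRY:          /* only repost the msgpairs */
                if (c->retry_count >= c->retry_limit) return -1;
                c->retry_count++;
                st = ST_POST_MSGPAIRS;
                break;
            case IO_DONE:
                st = ST_FINISHED;
                break;
            }
            break;
        case ST_FINISHED:
            break;
        }
    }
    return 0;
}
```

With one simulated stale reply and reinit_limit set to 1, run_io() goes
through the init step twice and leaves retry_count alone for that
round. A second stale reply in a row makes it give up, which is one
possible way to get the "abort if the dfiles did not change" behaviour
asked about above.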
Also, it seems unlikely that the dfile handle array would have
changed from the initial getattr to the IO requests (wouldn't a
migrate disable the metadata server temporarily?), so this retry is
probably only necessary if the attribute cache holding the dfile
handle array has become stale. For now you could just turn that attr
cache off by giving it a timeout of 0; otherwise you'll have to
invalidate the cache (at least the dfile handle array bits of it)
before doing the getattr again.
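
The two options can be sketched with a toy single-entry cache in plain
C. This is not the PVFS2 attribute cache: attr_cache, attr_entry,
cache_lookup(), cache_store() and cache_invalidate() are hypothetical
names, and treating a timeout of 0 as "never serve from the cache" is
an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DFILES 8

/* Hypothetical cached attributes: just the dfile handle array. */
struct attr_entry {
    bool valid;
    long fetched_at_ms;
    int dfile_count;
    unsigned long dfiles[MAX_DFILES];
};

/* Hypothetical cache holding a single entry, for brevity. */
struct attr_cache {
    long timeout_ms;            /* assumed: 0 disables caching */
    struct attr_entry entry;
};

/* Returns true and sets *out only if the entry is valid and still fresh. */
static bool cache_lookup(const struct attr_cache *c, long now_ms,
                         const struct attr_entry **out)
{
    if (c->timeout_ms <= 0 || !c->entry.valid)
        return false;
    if (now_ms - c->entry.fetched_at_ms >= c->timeout_ms)
        return false;
    *out = &c->entry;
    return true;
}

/* Record the handles returned by a (simulated) getattr. */
static void cache_store(struct attr_cache *c, long now_ms,
                        const unsigned long *dfiles, int n)
{
    if (n > MAX_DFILES)
        n = MAX_DFILES;
    for (int i = 0; i < n; i++)
        c->entry.dfiles[i] = dfiles[i];
    c->entry.dfile_count = n;
    c->entry.fetched_at_ms = now_ms;
    c->entry.valid = true;
}

/* Drop the cached handles so the next lookup misses and forces a getattr. */
static void cache_invalidate(struct attr_cache *c)
{
    c->entry.valid = false;
    c->entry.dfile_count = 0;
}
```

In this sketch a cache created with timeout_ms = 0 never returns a hit,
so every IO would start with a fresh getattr. With a non-zero timeout,
the reinit path would have to call something like cache_invalidate()
before going back to the init step, or it would read the same stale
handles again.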
> In this context, a weird error message:
> If the fs is corrupted, e.g. there is a metafile pointing to a
> non-existing datafile, I think the I/O should abort quickly instead
> of retrying (in the migration case: retry getting the dfiles and, if
> they did not change, abort). Currently the client sm returns the
> error "Operation now in progress". You can try this by removing a
> datafile with pvfs2-remove-object (first get the object number with
> pvfs2-viewdist).
Hmm..that is a little odd. I think the EINPROGRESS only gets
returned from aio_error though...I don't see us setting it anywhere
in our code. My guess is that the remove-object may not remove the
actual fd from the open cache, so the IO doesn't fail sooner with
ENOENT as it should. I haven't looked at the code to verify that
though.
-sam
>
> thanks,
> julian
>
> ---
> Ben (Obi-Wan) Kenobi:
> Use the Force, Luke!
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>