[Pvfs2-developers] I/O statemachines adaption for migration ?

Sam Lang slang at mcs.anl.gov
Mon Aug 28 11:53:26 EDT 2006


On Aug 28, 2006, at 8:23 AM, Julian Martin Kunkel wrote:

> Hi,
> I want to adapt the I/O state machines to reread the dfile array in case
> an I/O server responds with PVFS_ENOENT during the flow or within the
> initial I/O ACK. This might happen if the file is migrated away and the
> client does not have the updated dfile array before it initiates the I/O.
> Thus, I want to reread the dfile array and restart the I/O only for this
> particular server. The progress of the other I/O requests should not be
> influenced.
> While looking at sys-io.sm, I wonder if the transition for the case
> IO_RETRY in the state io_analyze_results does this. Maybe some extra
> lines could be added, for example to restart the process if the initial
> acknowledgement returns PVFS_ENOENT, without increasing the retry count
> in this case?
> I'm thankful for any suggestions on how that could be implemented easily.
>

I think IO_RETRY is a little different.  The first step of sys-io.sm
(before the IO request/response) is a getattr to the metadata server to
get the datafile handles.  It's this step that you want to repeat if the
IO request to the IO server fails, right?  So instead of jumping back to
io_datafile_post_msgpairs, I think you'll want to jump all the way back
to io_init.  It's probably easier to create another return code
(IO_REINIT or something) and return that from
io_datafile_complete_operations.  I think there will be some cleanup
that you have to do in complete_operations before you can jump back up
to init as well.
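
To make the difference between the two transitions concrete, here is a
rough sketch in plain C.  It is not the .sm syntax and not the actual
sys-io.sm code: the enum values, the io_ctx struct and both helper
functions are made up for illustration, and errno's ENOENT stands in
for PVFS_ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical result codes: IO_RETRY exists in sys-io.sm,
 * IO_REINIT is the proposed addition. The values are invented. */
enum io_result { IO_DONE, IO_RETRY, IO_REINIT };

/* Simplified stand-ins for the states discussed in this thread. */
enum io_state { ST_IO_INIT, ST_POST_MSGPAIRS, ST_FINISHED };

/* Hypothetical per-operation bookkeeping. */
struct io_ctx {
    int retry_count;   /* number of plain retries so far */
    int dfiles_valid;  /* nonzero while the dfile array may be used */
};

/* Map a server error onto a result code.
 * Assumption: ENOENT here plays the role of PVFS_ENOENT. */
static enum io_result classify_server_error(int err)
{
    if (err == 0)
        return IO_DONE;
    if (err == ENOENT)
        return IO_REINIT;  /* datafile gone: handles are stale */
    return IO_RETRY;       /* anything else: ordinary retry */
}

/* Choose the next state for a given result code. */
static enum io_state next_state(struct io_ctx *ctx, enum io_result rc)
{
    assert(ctx != NULL);
    switch (rc) {
    case IO_REINIT:
        /* Cleanup before going back up: drop the stale handles.
         * The retry count is deliberately left untouched. */
        ctx->dfiles_valid = 0;
        return ST_IO_INIT;
    case IO_RETRY:
        ctx->retry_count++;
        return ST_POST_MSGPAIRS;
    default:
        return ST_FINISHED;
    }
}
```

Note that this restarts the whole operation from init, which is
simpler than restarting only the one server's request; the latter
would need per-server state that this sketch does not model.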

Also, it seems unlikely that the dfile handle array would have  
changed from the initial getattr to the IO requests (wouldn't a  
migrate disable the metadata server temporarily?), so this retry is  
probably only necessary if the attribute cache holding the dfile  
handle array has become stale.  You could just turn that attr cache  
off with a 0 timeout for now, otherwise you'll have to invalidate the  
cache (at least the dfile handle array bits of it) before doing the  
getattr again.
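
The two options (zero timeout vs. explicit invalidation) behave
roughly like the sketch below.  This is a generic timeout cache entry
in C, not the real attribute cache: the struct layout and function
names are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical cache entry holding attributes for one handle. */
struct attr_entry {
    uint64_t handle;       /* metafile handle the entry belongs to */
    time_t   stored_at;    /* when the entry was filled in */
    int      dfiles_valid; /* nonzero if the dfile array is usable */
};

/* Return nonzero if the cached dfile array may be used.
 * A timeout of 0 disables the cache: every lookup is a miss. */
static int attr_entry_usable(const struct attr_entry *e, time_t now,
                             unsigned timeout_secs)
{
    assert(e != NULL);
    if (timeout_secs == 0)
        return 0;
    if (!e->dfiles_valid)
        return 0;
    return (now - e->stored_at) < (time_t)timeout_secs;
}

/* Invalidate only the dfile array part of the entry, so that the
 * next getattr has to go back to the metadata server for it. */
static void attr_entry_invalidate_dfiles(struct attr_entry *e)
{
    assert(e != NULL);
    e->dfiles_valid = 0;
}
```

With the timeout at 0 the invalidate call is never needed; with a
nonzero timeout it would have to run before jumping back to the
getattr step.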

> In this context, a weird error message:
> In case the fs is corrupted, e.g. there is a metafile pointing to a
> non-existing datafile, I think the I/O should abort quickly instead of
> doing retries (in the migration case: retry to get the dfiles, and abort
> if they did not change). Currently the client sm returns the error
> "Operation now in progress". You can try this by removing a datafile
> with pvfs2-remove-object (first get the object number with
> pvfs2-viewdist).

Hmm..that is a little odd.  I think the EINPROGRESS only gets  
returned from aio_error though...I don't see us setting it anywhere  
in our code.  My guess is that the remove-object may not remove the  
actual fd from the open cache, so the IO doesn't fail sooner with  
ENOENT as it should.  I haven't looked at the code to verify that  
though.
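
For reference, "Operation now in progress" is the text glibc's
strerror() gives for EINPROGRESS (other C libraries may word it
differently), and EINPROGRESS is what POSIX aio_error() returns while
a request is still pending.  A small sketch of how one might tell the
two cases apart; the helper names are made up for this example:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if a status (e.g. from aio_error()) means the
 * request is still pending rather than failed. */
static int status_is_pending(int status)
{
    return status == EINPROGRESS;
}

/* Return nonzero if a status means the object does not exist,
 * which is what the client would ideally see here. */
static int status_is_missing(int status)
{
    return status == ENOENT;
}

/* Human-readable text for a status; wording is platform-dependent. */
static const char *status_text(int status)
{
    const char *msg = strerror(status);
    assert(msg != NULL);
    return msg;
}
```

If a pending status leaks out as the final result, the client prints
the EINPROGRESS text instead of the ENOENT one.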

-sam

>
> thanks,
> julian
>
> ---
> Ben (Obi-Wan) Kenobi:
> 	Use the Force, Luke!
>


