[PVFS-developers] How to handle file unlinks when IOD(s) are down
Rob Ross
rross at mcs.anl.gov
Tue Oct 26 14:36:26 EDT 2004
Hi David,
It looks like you've worked through most of this already :).
To clarify, the problem that you're seeing is that while a particular iod
is down, if someone goes and deletes a file that has a component on that
iod, there's an orphaned data file sitting on that iod. The other iods
have deleted their pieces, right?
If the user asked to delete something, I think we should go ahead and
delete everything that we can. Otherwise we could run into situations
where it's impossible to free space on the servers because one is down.
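To make the "delete everything that we can" idea concrete, here is a minimal sketch in Python. It is a toy model, not PVFS code: `FakeIod`, `ServerDown`, and `unlink_everywhere` are hypothetical names invented for illustration. The loop keeps going past an unreachable server and returns the list of servers that still hold a piece, so that list could be logged for later cleanup.

```python
from dataclasses import dataclass, field


class ServerDown(Exception):
    """Raised by this toy model when an iod cannot be reached (hypothetical)."""


@dataclass
class FakeIod:
    """Stand-in for an I/O daemon; `files` holds the inode numbers it stores."""
    name: str
    up: bool = True
    files: set = field(default_factory=set)

    def unlink(self, inode):
        if not self.up:
            raise ServerDown(self.name)
        self.files.discard(inode)


def unlink_everywhere(iods, inode):
    """Delete the piece on every reachable iod; return names of iods that failed."""
    failed = []
    for iod in iods:
        try:
            iod.unlink(inode)
        except ServerDown:
            # Do not abort: free what we can and remember the orphan.
            failed.append(iod.name)
    return failed
```

A caller would delete the metadata regardless of the result and record the returned names somewhere durable, since each one marks an orphaned data file.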
There are no tools for helping with this at this time; basically you'd
need to do the same sorts of things that the pvfs2-fsck tool does for
PVFS2 volumes, except that PVFS1 servers don't have the right building
block operations to make it easy...
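As a rough illustration of what such a cleanup tool would have to compute, the sketch below compares the inode numbers that still have metadata against what each iod is holding. It assumes, hypothetically, that both lists can be obtained somehow and that data files are keyed by the metadata inode number; `find_orphans` and its arguments are made-up names, and getting these listings out of PVFS1 servers is exactly the part that is not easy.

```python
def find_orphans(metadata_inodes, iod_listing):
    """Return {iod_name: sorted inode numbers that have data but no metadata}.

    metadata_inodes: iterable of inode numbers that still have a metadata file.
    iod_listing: dict mapping an iod name to the inode numbers it stores.
    """
    live = set(metadata_inodes)
    orphans = {}
    for name, inodes in iod_listing.items():
        stale = sorted(set(inodes) - live)
        if stale:
            orphans[name] = stale
    return orphans
```

For example, with metadata for inodes 1 and 2 and an iod that also holds 42, only that iod is reported, with 42 as its orphan.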
So I think the issue boils down to how best to deal with the orphaned data
files. Perhaps the thing to do at this point is to change the iod so that
it unlinks the old file instead of renaming it to .saveme?
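The two policies for a stale data file that is found when an inode number is reused can be sketched as follows. This is an illustrative Python model on a local filesystem, not the iod source; `claim_data_file` and `keep_stale` are hypothetical names. With `keep_stale=True` it mimics the renaming to .saveme described in this thread, and with the default it unlinks the leftover file instead.

```python
import os


def claim_data_file(path, keep_stale=False):
    """Prepare `path` for a new file whose inode number has been reused."""
    if os.path.exists(path):
        if keep_stale:
            # Renaming policy: preserve the leftover data as <path>.saveme.
            os.replace(path, path + ".saveme")
        else:
            # Unlinking policy: the metadata is gone, so drop the orphan.
            os.unlink(path)
    # Create the new, empty data file; "x" fails if something is still there.
    open(path, "x").close()
```

Unlinking frees the space immediately, at the cost of losing whatever could have been salvaged from the .saveme copy.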
Rob
On Tue, 26 Oct 2004, David S. Metheny wrote:
> We are seeing that file deletes are leaving dangling inode files
> (future .saveme files) on the IOD(s) when a delete is issued from a client,
> and one or more IOD(s) are down (not accepting connections). The mgr will
> delete the metadata file before checking and attempting to delete the files
> from the IOD(s). There is a bad return from one or more of the IODs, but the
> end result is the metadata file is gone, and the IOD(s) have dangling inode
> files on them. I'm looking at working on a patch for this, but wanted a
> little direction.
>
> I see that the do_unlink call executes md_unlink before calling the
> IOD(s). The md_unlink actually opens the file to keep the inode from being
> reused after the deletion. Would it be OK to just make the call to the
> IOD(s) to try the deletion first, before executing the deletion on the mgr?
> This could also avoid having to open the metafile to keep the inode from
> being re-used.
>
> Some things that I don't yet understand how to handle, or if they would
> throw big kinks in the above idea:
>
> - If only one IOD doesn't delete the file (or just one does delete the
> file), we can't recover any of the data, so we really want to go ahead and
> delete the mgr file anyhow, instead of leaving a bad file viewable to all.
> Maybe we just want to log what happens so we can perform some sort of active
> cleanup, instead of waiting for it to turn into a .saveme file.
>
> - Also, it seems that we delete the manager file first, because we don't
> want anyone trying to access the file while the IOD deletes are occurring.
>
> Are there any tools available to clean up a PVFS cluster that has
> these dangling inodes, rather than waiting for these files to be renamed to a
> .saveme file when the inode number is reused?