[PVFS-developers] Recovering from an IOD failure
Rob Ross
rross at mcs.anl.gov
Wed Feb 11 12:30:33 EST 2004
On Wed, 11 Feb 2004, Porter Don wrote:
> You are right about 1A. The code is there. The only change that needs to
> happen is sending unlink to the iods.
That's what I'm getting at -- the code *is* there to send the unlink to
the iods! In mgr.c:do_close():
    if (f_p->unlinked >= 0) {
        /* tell IODs to remove the file too */
        memset(&iodreq, 0, sizeof(iodreq));
        iodreq.majik_nr = IOD_MAJIK_NR;
        iodreq.release_nr = PVFS_RELEASE_NR;
        iodreq.type = IOD_UNLINK;
        iodreq.dsize = 0;
        iodreq.req.unlink.f_ino = f_p->f_ino;
        if (send_req(fs_p->iod, fs_p->nr_iods, f_p->p_stat.base,
                     f_p->p_stat.pcount, &iodreq, NULL, NULL) < 0)
        {
            myerr = errno;
        }
        meta_close(f_p->unlinked); /* close the FD that was kept around */
    }
So I think there is just a bug hiding in there. I'm not sure whether the
above code somehow isn't being executed, or whether the iods are choosing
not to remove the file for some reason. Or am I missing something?
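
One way to tell the two possibilities apart is to check the branch in isolation. The following is a minimal sketch, not the mgr.c code: struct file_sketch, stub_send_unlink() and close_sketch() are hypothetical stand-ins, and the only thing taken from the excerpt above is the convention that unlinked >= 0 means "an FD was kept around, so tell the iods to remove the file".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the manager's per-file state; only the two
 * fields used by the excerpt are modeled. */
struct file_sketch {
    int  unlinked;  /* FD kept open after an unlink, or -1 if not unlinked */
    long f_ino;     /* inode number that would be sent to the iods */
};

static int unlink_requests_sent = 0;  /* counts calls into the stub below */

/* Stub standing in for send_req(); 'fail' forces an error return so the
 * failure path can be exercised. Returns 0 on success, -1 on failure. */
static int stub_send_unlink(long f_ino, int fail)
{
    unlink_requests_sent++;
    fprintf(stderr, "IOD_UNLINK sent for inode %ld\n", f_ino);
    return fail ? -1 : 0;
}

/* Mirrors the branch structure of the excerpt.
 * Returns 0 if the branch was skipped, 1 if the unlink was sent,
 * and -1 if sending it failed. */
static int close_sketch(const struct file_sketch *f, int fail)
{
    assert(f != NULL);
    if (f->unlinked < 0) {
        fprintf(stderr, "inode %ld: not marked unlinked, branch skipped\n",
                f->f_ino);
        return 0;
    }
    if (stub_send_unlink(f->f_ino, fail) < 0)
        return -1;
    return 1;
}
```

If a log line like the "branch skipped" one shows up for a file that was unlinked while open, the unlinked field is not being set; if the request is sent and the data files remain, the problem is on the iod side.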
> I suppose I mentioned the code in do_open is for bug "1B", which doesn't
> call do_close on a new _entry_ (file already exists, but hasn't been opened
> before).
Right; this one isn't handled. I think we would need to (a) fix whatever
the problem is for 1A, then (b) modify the do_open() code in that
"if (new_file)" block so that the close gets called regardless of whether
or not the file is new (and just set the unlink field for new files).
Does that make sense? I'm tempted to try to knock this out while we're
talking about it :).
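
The change in (b) can be sketched roughly as follows. This is an illustration under stated assumptions, not the do_open() code: struct entry_sketch, close_entry_sketch(), cleanup_current() and cleanup_proposed() are hypothetical names, and cleanup_current() reflects only one reading of the description in this thread (the close happens solely inside the "if (new_file)" block).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a manager file entry. */
struct entry_sketch {
    int  unlinked;  /* >= 0 means "remove on close", -1 otherwise */
    bool closed;    /* set by close_entry_sketch() */
};

/* Stand-in for the close path; only records that it ran. */
static void close_entry_sketch(struct entry_sketch *e)
{
    assert(e != NULL);
    e->closed = true;
}

/* Assumed current shape: existing files are never closed here. */
static void cleanup_current(struct entry_sketch *e, bool new_file, int meta_fd)
{
    if (new_file) {
        e->unlinked = meta_fd;  /* new file: mark it for removal */
        close_entry_sketch(e);
    }
}

/* Proposed shape: always close, but mark only new files for removal. */
static void cleanup_proposed(struct entry_sketch *e, bool new_file, int meta_fd)
{
    if (new_file)
        e->unlinked = meta_fd;
    close_entry_sketch(e);
}
```

With the proposed shape, an entry for a file that already exists is still closed but keeps unlinked at -1, so the close path would not ask the iods to remove it.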
Rob