[PVFS-developers] Recovering from an IOD failure

Rob Ross rross at mcs.anl.gov
Wed Feb 11 12:30:33 EST 2004


On Wed, 11 Feb 2004, Porter Don wrote:

> You are right about 1A.  The code is there.  The only change that needs to
> happen is sending unlink to the iods.  

That's what I'm getting at -- the code *is* there to send the unlink to 
the iods!  In mgr.c:do_close():

	if (f_p->unlinked >= 0) /* tell IODs to remove the file too */ {
		memset(&iodreq, 0, sizeof(iodreq));
		iodreq.majik_nr         = IOD_MAJIK_NR;
		iodreq.release_nr       = PVFS_RELEASE_NR;
		iodreq.type             = IOD_UNLINK;
		iodreq.dsize            = 0;
		iodreq.req.unlink.f_ino = f_p->f_ino;
		if (send_req(fs_p->iod, fs_p->nr_iods, f_p->p_stat.base, 
			     f_p->p_stat.pcount, &iodreq, NULL, NULL) < 0)
		{
			myerr = errno;
		}
		meta_close(f_p->unlinked); /* close the FD that was kept around */
	}

So I think there is just a bug hiding in there.  I'm not sure whether 
the above code somehow isn't being executed, or whether the iods are 
choosing not to remove the file for some reason.  Or am I missing something?
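
One way to narrow that down would be to trace each outcome of the close
path separately. The following is only a rough sketch with stand-in types:
apart from the "unlinked >= 0" test, every name here (file_state,
close_trace, trace_close, the stub senders) is hypothetical and not taken
from mgr.c, and send_fn merely stands in for send_req().

```c
#include <stdio.h>

/* Hypothetical stand-in for the manager's per-file state; only the
 * "unlinked" field mirrors the snippet above (< 0 means not unlinked). */
struct file_state {
	int  unlinked;
	long f_ino;
};

/* Hypothetical outcomes a trace in the close path could tell apart. */
enum close_trace {
	TRACE_BRANCH_SKIPPED, /* unlinked < 0: the unlink is never sent */
	TRACE_SEND_FAILED,    /* branch taken, but the send returned < 0 */
	TRACE_SEND_OK         /* branch taken, send succeeded: look at the iods */
};

/* Stub senders standing in for send_req(); < 0 means failure. */
int stub_send_ok(long f_ino)   { (void)f_ino; return 0; }
int stub_send_fail(long f_ino) { (void)f_ino; return -1; }

/* Logs which of the three cases was hit and returns it. */
enum close_trace trace_close(const struct file_state *f_p,
			     int (*send_fn)(long f_ino))
{
	if (f_p->unlinked < 0) {
		fprintf(stderr, "close: ino %ld not marked unlinked, nothing sent\n",
			f_p->f_ino);
		return TRACE_BRANCH_SKIPPED;
	}
	if (send_fn(f_p->f_ino) < 0) {
		fprintf(stderr, "close: ino %ld unlink send failed\n", f_p->f_ino);
		return TRACE_SEND_FAILED;
	}
	fprintf(stderr, "close: ino %ld unlink sent; check the iod side\n",
		f_p->f_ino);
	return TRACE_SEND_OK;
}
```

If the trace reports the last case and the file is still there, the
problem would be on the iod side rather than in the manager.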

> I suppose I mentioned the code in do_open is for bug "1B", which doesn't
> call do_close on a new _entry_ (file already exists, but hasn't been opened
> before).  

Right; this one isn't handled.  I think we would need to (a) fix whatever 
the problem is for 1A, then (b) modify the do_open() code in that
"if (new_file)" block so that the close gets called whether or not the 
file is new (setting the unlink field only for new files).
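
Step (b) could look roughly like the sketch below. It assumes the block in
question is the path taken when the open cannot complete, which is a
reading of this thread and not something checked against do_open(). The
struct and function names (open_entry, close_entry, abort_open) are
hypothetical stand-ins, not the real mgr.c code.

```c
/* Hypothetical stand-in for the manager's open-file entry. */
struct open_entry {
	int unlinked; /* >= 0: metadata FD kept, so close also removes the file */
	int closed;   /* set by the stand-in close below */
	int removed;  /* set when the stand-in close "removes" the file */
};

/* Stand-in for do_close(): removes the file only if unlinked >= 0. */
void close_entry(struct open_entry *e)
{
	if (e->unlinked >= 0)
		e->removed = 1;
	e->closed = 1;
}

/* Assumed error path: always close the entry, but mark it for removal
 * only when the file was newly created by this open. */
void abort_open(struct open_entry *e, int new_file, int meta_fd)
{
	if (new_file)
		e->unlinked = meta_fd;
	close_entry(e);
}
```

The point of the split is that an existing file which merely got a new
entry is closed but left in place, while a freshly created file is both
closed and removed.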

Does that make sense?  I'm tempted to try to knock this out while we're 
talking about it :).

Rob


