[PVFS-developers] Recovering from an IOD failure

Rob Ross rross at mcs.anl.gov
Thu Feb 12 16:10:05 EST 2004


On Tue, 10 Feb 2004, Porter Don wrote:

> 3) The big one is that when iods come back up, say after a power loss, their
> state is out of sync with the rest of the cluster.  If a client tries to
> submit an io request to a newly bounced iod (with the rest not bounced), the
> iod will not have a file with that inode/cap open and squash the connection
> (as it does anytime it thinks it is getting a bogus request).  The net
> result will be that the client will have to reopen the file on the manager
> and all iods (hopefully closing the old one first).  Because all files are
> open on all iods, the loss of a single iod means that the state of the
> entire cluster will have to be reset en masse, albeit not necessarily all at
> once.

One other note on this: we should be thinking about mgr failures too.  
These cause the loss of all open file state, and mean that we need to 
reopen everything that we had open before.

(As an aside, I almost daily wish that I hadn't made "open" a real 
operation in PVFS, but it's a little late to fix that now!)

So the "close everything, then reopen" approach does a pretty good job of 
handling this case too, and it really couldn't be done much better (it's 
going to be inefficient no matter what we do).
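
For anyone who wants something concrete to poke at, here is a rough sketch of what "close everything, then reopen" might look like on the client side. This is not PVFS code: struct open_file, reopen_all(), and the stub_open()/stub_close() callbacks are all hypothetical names made up for illustration, and the real library tracks open files differently. The only point is the two-pass shape: drop every stale handle first (ignoring close errors, since the mgr or iod may have already lost the state), then replay the opens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client-side record of one open file; NOT a PVFS structure. */
struct open_file {
    const char *path;   /* name used for the original open */
    int         flags;  /* open flags to replay on reopen */
    int         handle; /* current handle, or -1 if not open */
};

/* Hypothetical callbacks standing in for the real open/close calls. */
typedef int (*open_fn)(const char *path, int flags);
typedef int (*close_fn)(int handle);

/*
 * Close every file in the table, then reopen each one.
 * Close errors are ignored: the server side may already have lost
 * the state, so a failed close is expected here.
 * Returns the number of files that could not be reopened.
 */
int reopen_all(struct open_file *tab, size_t n,
               open_fn do_open, close_fn do_close)
{
    size_t i;
    int failed = 0;

    assert(tab != NULL || n == 0);

    /* pass 1: drop every stale handle */
    for (i = 0; i < n; i++) {
        if (tab[i].handle >= 0) {
            (void) do_close(tab[i].handle);
            tab[i].handle = -1;
        }
    }

    /* pass 2: replay the opens */
    for (i = 0; i < n; i++) {
        tab[i].handle = do_open(tab[i].path, tab[i].flags);
        if (tab[i].handle < 0)
            failed++;
    }
    return failed;
}

/* Stub transport for illustration only: hands out increasing
 * handles and counts how many closes were issued. */
int stub_next_handle = 100;
int stub_closes = 0;

int stub_open(const char *path, int flags)
{
    (void) flags;
    if (path == NULL)
        return -1;              /* simulate a reopen that fails */
    return stub_next_handle++;
}

int stub_close(int handle)
{
    (void) handle;
    stub_closes++;
    return 0;
}
```

Doing it in two passes means no new handle is handed out while stale ones are still outstanding, at the cost of every file being unavailable for a moment. That is the inefficiency mentioned above.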

So mgr failures are one more thing to keep in mind when considering how to 
tackle this problem.

I've attached a patch to address the 1B problem that Don brought up, in 
case someone other than Don is interested in looking at this (he has it 
already, but I forgot to mail the list).  This patch is not in CVS yet.

Rob
-------------- next part --------------
Index: mgr.c
===================================================================
RCS file: /projects/cvsroot/pvfs/mgr/mgr.c,v
retrieving revision 1.94
diff -r1.94 mgr.c
1376c1376,1398
< 		ERR1("do_open (%s): send_req failed\n", (char *) data_p);
---
> 		mreq fakereq;
> 		mack fakeack;
> 		
> 		myerr = errno;
> 
> 		/* if we fail to open the file we need to send a close to
> 		 * free resources on the iods that *did* get the open request.
> 		 *
> 		 * note that if there are other open instances, we probably won't
> 		 * really get the file closed.  we aren't trying to handle that case
> 		 * here.
> 		 */
> 		
> 		ERR1("do_open(%s): cleaning up after failed open\n", (char *) data_p);
> 		fakereq.majik_nr   = MGR_MAJIK_NR;
> 		fakereq.release_nr = PVFS_RELEASE_NR;
> 		fakereq.type       = MGR_CLOSE;
> 		fakereq.uid        = req_p->uid;
> 		fakereq.gid        = req_p->gid;
> 		fakereq.dsize      = 0;
> 		fakereq.req.close.meta.fs_ino        = fs_p->fs_ino;
> 		fakereq.req.close.meta.u_stat.st_ino = f_p->f_ino;
> 		
1379,1380c1401,1402
< 			 * created on the nodes that DID work and we need to remove the
< 			 * metadata file that md_open() created too...
---
> 			 * created on the nodes that did work and we need to remove the
> 			 * metadata file that md_open() created.
1382,1384c1404,1405
< 			 * We're going to get errors from all this, because something
< 			 * has happened to one of our IODs.  We're just trying to get
< 			 * everything back to a sane state as best we can.
---
> 			 * we do this by marking the file as unlinked before sending
> 			 * the close request out to the iods.
1386,1401d1406
< 			mreq fakereq;
< 			mack fakeack;
< 
< 			myerr = errno;
< 
< 			ERR1("do_open(%s): cleaning up after failed open\n", (char *) data_p);
< 			fakereq.majik_nr   = MGR_MAJIK_NR;
< 			fakereq.release_nr = PVFS_RELEASE_NR;
< 			fakereq.type       = MGR_CLOSE;
< 			fakereq.uid        = req_p->uid;
< 			fakereq.gid        = req_p->gid;
< 			fakereq.dsize      = 0;
< 			fakereq.req.close.meta.fs_ino        = fs_p->fs_ino;
< 			fakereq.req.close.meta.u_stat.st_ino = f_p->f_ino;
< 
< 			/* hack to ensure that this goes through */
1402a1408
> 		}
1405,1412c1411,1412
< 			/* here we add the file in so that do_close() can find it */
< 			f_add(fs_p->fl_p, f_p);
< 			/* we're closing the file so all the IODs will know it is safe
< 			 * to unlink it; we're telling them to unlink it at the same
< 			 * time...
< 			 */
< 			do_close(-1, &fakereq, NULL, &fakeack);
< 			/* do_close() gets rid of our copy of the file info at f_p */
---
> 		/* add the file in so that do_close() can find it */
> 		if (new_entry) f_add(fs_p->fl_p, f_p);
1413a1414,1417
> 		do_close(-1, &fakereq, NULL, &fakeack);
> 		/* do_close() gets rid of our copy of the file info at f_p */
> 
> 		if (new_file) {
1416d1419
< 			errno = myerr;
1417a1421,1423
> 
> 		errno = myerr;
> 

