[PVFS-developers] Recovering from an IOD failure
Porter Don
PorterDE at mercury.hendrix.edu
Wed Feb 11 10:59:14 EST 2004
You are right about 1A. The code is there. The only change that needs to
happen is sending unlink to the iods.
I suppose I mentioned the code in do_open is for bug "1B", which doesn't
call do_close on a new _entry_ (file already exists, but hasn't been opened
before).
Both should be simple bugfixes.
-----Original Message-----
From: Rob Ross
To: Don Porter
Cc: pvfs-developers at www.beowulf-underground.org
Sent: 2/11/04 8:43 AM
Subject: Re: [PVFS-developers] Recovering from an IOD failure
On Tue, 10 Feb 2004, Don Porter wrote:
> Good insights as usual :) We haven't had the iods going down often at
> all. I have just spent a good deal of time thinking through the
> different scenarios, both for Martin's project as well as some others.
> So you are probably right on the point about the simplicity.
Glad to hear you aren't seeing this too often; I was a little concerned
that I was missing something!
> I will definitely take a crack at the first two things and see where
> they get me, but I may put the third on the back burner for a while if
> we don't really need it.
I've looked at the first issue (call it 1A, removing data files on failed
opens), and there actually is code in the mgr to try to remove the data
files on a failed open. If you look at do_open(), you'll see a note about
a "tricky case" of trying to remove these. What it does is:
- if the open request fails,
- and if the file is a new file,
  - put together a fake close request (to pass to do_close())
  - set the f_p->unlinked field to point to the metadata
  - call do_close()
  - remove the metadata
Then do_close() should:
- send close request to iods
- if (f_p->unlinked >= 0),
  - send unlink to iods
So I think for this particular case it's a matter of fixing a bug, not
implementing new functionality. Any ideas where in this path we're
failing?
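
The intended control flow above can be sketched roughly as follows. This is not the mgr source: the struct layouts, the sketch_ function names, the counters standing in for requests sent to the iods, and the use of an integer handle for "the metadata" are all assumptions made for illustration. Only do_open(), do_close(), and the f_p->unlinked field come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the mgr's per-file state. Only the name
 * "unlinked" comes from the thread; the other fields are assumptions. */
struct sketch_file {
    int  unlinked;         /* < 0: nothing to unlink; >= 0: refers to the metadata */
    bool is_new;           /* file was created by this open request */
    bool metadata_present; /* metadata still exists on the mgr side */
};

/* Hypothetical counters standing in for requests sent to the iods. */
struct sketch_iod_log {
    int closes_sent;
    int unlinks_sent;
};

/* What do_close() should do according to the description. */
static void sketch_do_close(struct sketch_file *f_p, struct sketch_iod_log *log)
{
    assert(f_p != NULL && log != NULL);
    log->closes_sent++;            /* send close request to iods */
    if (f_p->unlinked >= 0)
        log->unlinks_sent++;       /* send unlink to iods */
}

/* The failed-open cleanup path described for do_open().
 * open_ok simulates whether the open request succeeded; meta_handle is
 * an assumed integer handle for the metadata. */
static int sketch_do_open(struct sketch_file *f_p, struct sketch_iod_log *log,
                          bool open_ok, int meta_handle)
{
    assert(f_p != NULL && log != NULL);
    if (open_ok)
        return 0;
    if (f_p->is_new) {
        f_p->unlinked = meta_handle;   /* point at the metadata */
        sketch_do_close(f_p, log);     /* the "fake close" */
        f_p->metadata_present = false; /* remove the metadata */
    }
    return -1;
}
```

In this sketch, a failed open on a new file results in exactly one close and one unlink going to the iods, while a failed open on an existing file sends neither. If the real path leaves data files behind, one place to look is whether the unlink branch in do_close() is actually reached.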
Thanks,
Rob