[PVFS-developers] Recovering from an IOD failure

Porter Don PorterDE at mercury.hendrix.edu
Mon Feb 16 14:23:48 EST 2004


Yeah, I have a test case that always causes an fd leak without the patch,
but never does with the patch.  I am currently stress testing the patch to
see whether it has any inadvertent side effects (I would be very surprised
if it did, but better safe than sorry).

-----Original Message-----
From: Rob Ross
To: Porter Don
Cc: 'pvfs-developers at www.beowulf-underground.org'
Sent: 2/16/04 1:11 PM
Subject: RE:[PVFS-developers] Recovering from an IOD failure

Heh, cool; it was definitely worth a shot.

Any news re: the 1B problems?

Thanks!

Rob

On Mon, 16 Feb 2004, Porter Don wrote:

> >2) In mgr.c/send_req, if the manager had an open socket connection that
> >dies, there is no retry logic.  It seems like the manager ought to at least
> >try once to reestablish the connection on an EPIPE.  This would primarily
> >help the case where an iod died and came back up between requests.
> 
> Yeah, this was a bad idea.  I tinkered around with it and it quickly got
> into an infinite loop, so I am going to go with Rob in saying that the retry
> logic is just going to have to live in the client.
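
For reference, the "try once on EPIPE" idea quoted above can be sketched
with a hard bound on the number of reconnects, which is what keeps a retry
from spinning forever when the iod stays down.  This is only an illustration
under stated assumptions, not the code in mgr.c: send_req_retry_once, the
two callback types, and the demo_* stubs are all hypothetical names, and the
stubs stand in for a real socket so the control flow can be exercised
without a network.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types; these are NOT the PVFS manager interfaces. */
typedef int (*send_fn_t)(int fd, const void *buf, size_t len); /* 0 ok, -1 + errno */
typedef int (*reconnect_fn_t)(void);                           /* new fd, or -1 */

/*
 * Send a request, reconnecting at most once if the peer closed the socket.
 * Returns 0 on success, -1 on failure.  Closing the stale descriptor is left
 * to the reconnect callback (an assumption of this sketch), since forgetting
 * to close it would leak an fd.
 */
int send_req_retry_once(int *fd, const void *buf, size_t len,
                        send_fn_t do_send, reconnect_fn_t do_reconnect)
{
    int reconnects_left = 1;  /* hard bound: one reconnect, never more */

    assert(fd != NULL && do_send != NULL && do_reconnect != NULL);

    for (;;) {
        if (do_send(*fd, buf, len) == 0)
            return 0;
        /* Give up on any error other than EPIPE, or once the retry is spent. */
        if (errno != EPIPE || reconnects_left == 0)
            return -1;
        reconnects_left--;

        int new_fd = do_reconnect();
        if (new_fd < 0)
            return -1;
        *fd = new_fd;
    }
}

/* Demo stubs: descriptors >= 100 accept writes, all others fail with EPIPE. */
static int demo_send_calls = 0;
static int demo_reconnects = 0;
static int demo_next_fd = 100;

static int demo_send(int fd, const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    demo_send_calls++;
    if (fd >= 100)
        return 0;
    errno = EPIPE;
    return -1;
}

static int demo_reconnect(void)
{
    demo_reconnects++;
    return demo_next_fd;
}
```

With the stubs, a dead descriptor followed by a successful reconnect costs
two sends and one reconnect.  If the reconnect hands back another dead
descriptor, the function still stops after two sends and reports failure,
leaving any further retry policy to the caller (here, the client).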

