[PVFS2-developers] job timeouts
Phil Carns
pcarns at parl.clemson.edu
Thu Jul 15 16:20:33 EDT 2004
On Thursday 15 July 2004 18:36, Pete Wyckoff wrote:
> neillm at mcs.anl.gov wrote on Tue, 13 Jul 2004 11:37 -0500:
> > Pete and Robl, if you guys could pull from CVS and re-run your tests,
> > it would be good to know if it works for you now. Please report, as
> > more tweaking may be necessary.
>
> Just FYI, I ran a bunch of tests last night on the then-latest CVS
> with more or less the same results, against a single server on
> a different node from the clients, TCP on gige.
Doh. I guess the 30 second timeout on the flows (even with it "automatically"
resetting any time data is moved) isn't enough. I bet some of the flows are
stalling right at the beginning waiting on some earlier work and never get a
chance to move data before timing out :(

I'll see if I can add a log message that prints out when a flow times out and
says how far along it got before quitting.
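Roughly the idea, as a minimal sketch only. The struct and function names below are made up for illustration and are not the actual PVFS2 flow or job code; time is passed in as plain seconds to keep it self-contained. It shows an idle timeout that resets whenever data moves, plus a message reporting how far the flow got when it expires:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flow record; illustrative names, not PVFS2's structures. */
struct flow_sketch {
    long total_bytes;    /* size of the whole transfer */
    long bytes_moved;    /* progress so far */
    long last_progress;  /* time (s) of posting or of the last data movement */
    long timeout;        /* idle limit in seconds, e.g. 30 */
};

/* Post a flow: the idle clock starts now, before any data has moved. */
void flow_post(struct flow_sketch *f, long total, long timeout, long now)
{
    assert(total >= 0 && timeout > 0);
    f->total_bytes = total;
    f->bytes_moved = 0;
    f->last_progress = now;
    f->timeout = timeout;
}

/* Any data movement resets the idle clock. */
void flow_progress(struct flow_sketch *f, long nbytes, long now)
{
    f->bytes_moved += nbytes;
    f->last_progress = now;
}

/* Returns 1 (and logs how far the flow got) once it has been idle for
 * at least `timeout` seconds; returns 0 otherwise. */
int flow_check_timeout(const struct flow_sketch *f, long now)
{
    if (now - f->last_progress < f->timeout)
        return 0;
    fprintf(stderr, "flow timed out after %ld of %ld bytes (idle %ld s)\n",
            f->bytes_moved, f->total_bytes, now - f->last_progress);
    return 1;
}
```

A flow that is posted but stuck behind earlier work never calls flow_progress(), so in this sketch it expires a full timeout after posting with 0 bytes moved, which is the case suspected above.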
> I'll run with the "return 0" workaround at the top of __job_time_mgr_add()
> for the time being. Let me know if you want any of my test codes or
> scripts, though PAV does the heavy lifting. Once you get through the
> painful process of building MPI it's not too hard to run these tests on
> a couple compute nodes.
>
> Weird thing is, disabling the job timeouts leads to slower writes. No
> clue why. See second batch below. It seems to be repeatable.
Now that is really strange!
-Phil