[PVFS2-developers] job timeouts

Phil Carns pcarns at parl.clemson.edu
Thu Jul 15 16:20:33 EDT 2004


On Thursday 15 July 2004 18:36, Pete Wyckoff wrote:
> neillm at mcs.anl.gov wrote on Tue, 13 Jul 2004 11:37 -0500:
> > Pete and Robl, if you guys could pull from CVS and re-run your tests,
> > it would be good to know if it works for you now.  Please report, as
> > more tweaking may be necessary.
>
> Just FYI, I ran a bunch of tests last night on the then-latest CVS
> with more or less the same results, against a single server on
> a different node from the clients, TCP on gige.

Doh.  I guess the 30-second timeout on the flows (even with it "automatically"
resetting any time data is moved) isn't enough.  I bet some of the flows are
stalling right at the beginning, waiting on some earlier work, and never get a
chance to move data before timing out :(

I'll see if I can add a log message that prints out when a flow times out and
says how far along it got before quitting.
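
Roughly the idea, as a sketch only.  The struct and function names below are
made up for illustration and are not the actual PVFS2 flow or job code; the
only things taken from the discussion above are the 30-second timeout, the
reset whenever data moves, and the log message reporting progress.  Note how a
flow that never moves any data still expires after one timeout period, which
is the suspected failure case:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical per-flow bookkeeping; not the real PVFS2 flow descriptor. */
struct flow_timer {
    time_t last_progress;   /* when data last moved, or when the flow started */
    long long total_moved;  /* bytes moved so far */
    long long total_size;   /* bytes the flow is expected to move */
    int timeout_secs;       /* idle time allowed, e.g. 30 */
};

/* Start the clock when the flow is posted. */
static void flow_timer_start(struct flow_timer *ft, long long total_size,
                             int timeout_secs, time_t now)
{
    ft->last_progress = now;
    ft->total_moved = 0;
    ft->total_size = total_size;
    ft->timeout_secs = timeout_secs;
}

/* Call whenever the flow moves data; this is the "automatic" reset. */
static void flow_timer_progress(struct flow_timer *ft, long long bytes,
                                time_t now)
{
    ft->total_moved += bytes;
    ft->last_progress = now;
}

/* Returns 1 and logs how far the flow got if it has been idle too long,
 * otherwise returns 0. */
static int flow_timer_check(const struct flow_timer *ft, time_t now)
{
    long idle = (long)(now - ft->last_progress);

    if (idle < ft->timeout_secs)
        return 0;

    fprintf(stderr,
            "flow timed out after %ld idle seconds: moved %lld of %lld bytes%s\n",
            idle, ft->total_moved, ft->total_size,
            ft->total_moved == 0 ? " (stalled before moving any data)" : "");
    return 1;
}
```

If the log shows mostly "moved 0 of N bytes", that would back up the theory
that the flows are stuck behind earlier work rather than stalling midway.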

> I'll run with the "return 0" workaround at the top of __job_time_mgr_add()
> for the time being.  Let me know if you want any of my test codes or
> scripts, though PAV does the heavy lifting.  Once you get through the
> painful process of building MPI it's not too hard to run these tests on
> a couple compute nodes.
>
> Weird thing is, disabling the job timeouts leads to slower writes.  No
> clue why.  See second batch below.  It seems to be repeatable.

Now that is really strange!

-Phil

