[Pvfs2-developers] proposed changes to kernel timeout mechanism

Phil Carns pcarns at wastedcycles.org
Fri Mar 24 03:42:02 EST 2006


This is somewhat related to the timeout discussion from the previous 
email, but this time the issue is the "op timeout" that the kernel 
module uses.  This is an absolute timeout associated with every upcall 
that the kernel submits, and is fully independent of the job timeouts 
that the pvfs2-client daemon uses.

These two competing timeouts for operations posted through the VFS can
cause headaches in some circumstances.  I mentioned part of the
problem a while back but we recently dug into it a little bit more:
http://www.beowulf-underground.org/pipermail/pvfs2-developers/2005-December/001702.html

It seems like the pvfs2-client should be the real authority on when to 
timeout and retry from network or server problems.  In particular, it 
can do a couple of things that the kernel can't:
- it can distinguish BMI errors from non-BMI errors
- it uses a sliding timeout (based on progress over time) for the flows 
rather than an absolute timeout
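To make the second point concrete, here is a small C sketch that contrasts an absolute per-operation timeout with a sliding, progress-based one. It is illustrative only. The struct and function names are hypothetical and are not PVFS2 code, and time is passed in as plain seconds rather than read from a clock.

```c
#include <assert.h>
#include <stdbool.h>

/* Absolute timeout (like the kernel op timeout): the operation fails once
 * 'limit' seconds have passed since submission, regardless of progress. */
bool absolute_timed_out(unsigned long submit_time, unsigned long now,
                        unsigned long limit)
{
    return now - submit_time >= limit;
}

/* Sliding timeout (like the flow timeout): the clock restarts every time
 * the operation makes progress, so a slow but live transfer never expires. */
struct sliding_timer {
    unsigned long last_progress; /* time of the most recent progress event */
    unsigned long limit;         /* allowed idle time, in seconds */
};

void sliding_note_progress(struct sliding_timer *t, unsigned long now)
{
    assert(now >= t->last_progress); /* time must not go backwards */
    t->last_progress = now;
}

bool sliding_timed_out(const struct sliding_timer *t, unsigned long now)
{
    return now - t->last_progress >= t->limit;
}
```

With a 30-second limit, a flow that reports progress every 10 seconds for 100 seconds is killed by the absolute check. The sliding check lets the same flow continue until it actually stalls for 30 seconds.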

I believe that the timeout/retry mechanism in the kernel module was 
mainly added to handle cases in which the pvfs2-client daemon is 
restarted, but it ends up triggering in a variety of unrelated scenarios 
because it is shorter than the job timeouts and doesn't understand 
flows.  It is very uncommon for the pvfs2-client-core to restart anymore
(this is no longer a normal error cleanup mechanism).

However, it seems like the kernel should still recover gracefully from
pvfs2-client-core restarts; it would just be nice if the timeouts/retries
used to handle this didn't interfere with the pvfs2-client-core
timeout/retry mechanism.

Here is a proposed solution:
- completely get rid of the per-operation timeout and retry mechanism
   (and the op-timeout tunable parameter)
- instead add logic to the device release function in the kernel, since
   a call to it indicates that the pvfs2-client-core has exited:
    - when this happens, requeue all pending operations to be resubmitted
    - start a single global timer
      - if the timer expires before someone reopens the device file, then
        cancel all pending operations with some error code to indicate
        that the pvfs2-client died
      - if the device is reopened in time, the new pvfs2-client-core
        instance will service the old operations (transparent to the
        application), and the timer is cancelled
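The proposed release/reopen logic can be sketched as a small user-space C simulation. This is not kernel code. The names (dev_release, dev_open, dev_timer_check, dev_read), the "grace" tunable, and the use of -EIO as the "client died" error are all hypothetical placeholders. Time is passed in explicitly instead of being driven by a real kernel timer, and completion (downcall) handling is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the "pvfs2-client died" error; -EIO is an
 * arbitrary placeholder, not a decision from this thread. */
#define CLIENT_DIED_ERROR (-EIO)
#define MAX_OPS 64

enum op_state {
    OP_WAITING,     /* queued, not yet read by pvfs2-client-core */
    OP_IN_PROGRESS, /* read by the daemon, awaiting a downcall */
    OP_CANCELLED    /* failed because no daemon came back in time */
};

struct op {
    enum op_state state;
    int error;
};

struct dev {
    struct op *ops[MAX_OPS]; /* every operation not yet completed */
    size_t nops;
    bool open;               /* is the device file held by a daemon? */
    bool timer_armed;        /* the single global restart timer */
    unsigned long deadline;
    unsigned long grace;     /* hypothetical: seconds to wait for reopen */
};

/* Upcall submission: no per-operation timer is started. */
void dev_submit(struct dev *d, struct op *op)
{
    assert(d->nops < MAX_OPS);
    op->state = OP_WAITING;
    op->error = 0;
    d->ops[d->nops++] = op;
}

/* The daemon reads the next waiting operation from the device. */
struct op *dev_read(struct dev *d)
{
    for (size_t i = 0; i < d->nops; i++)
        if (d->ops[i]->state == OP_WAITING) {
            d->ops[i]->state = OP_IN_PROGRESS;
            return d->ops[i];
        }
    return NULL;
}

/* Device release: the daemon exited, so requeue everything it had picked
 * up and arm one global timer. */
void dev_release(struct dev *d, unsigned long now)
{
    d->open = false;
    for (size_t i = 0; i < d->nops; i++)
        if (d->ops[i]->state == OP_IN_PROGRESS)
            d->ops[i]->state = OP_WAITING;
    d->timer_armed = true;
    d->deadline = now + d->grace;
}

/* Device reopened in time: cancel the timer; the new daemon simply reads
 * the requeued operations, transparently to the application. */
void dev_open(struct dev *d)
{
    d->open = true;
    d->timer_armed = false;
}

/* Timer expiry: nobody reopened the device, so fail every pending op. */
void dev_timer_check(struct dev *d, unsigned long now)
{
    if (!d->timer_armed || d->open || now < d->deadline)
        return;
    for (size_t i = 0; i < d->nops; i++) {
        d->ops[i]->state = OP_CANCELLED;
        d->ops[i]->error = CLIENT_DIED_ERROR;
    }
    d->nops = 0;
    d->timer_armed = false;
}
```

The design point is that nothing times out while a daemon holds the device open. Only the single timer armed in dev_release can fail operations, so the pvfs2-client-core job and flow timeouts remain the sole authority during normal operation.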

The end result is that the kernel module never times out or retries any 
operation unless it is specifically to handle the case that the 
pvfs2-client-core has been restarted.  We also have the opportunity to 
use an error code other than -ETIMEDOUT that might be a little more helpful.

Any thoughts?

-Phil

