[Pvfs2-developers] terminating state machines
Sam Lang
slang at mcs.anl.gov
Thu Jul 27 11:37:44 EDT 2006
On Jul 27, 2006, at 10:16 AM, Phil Carns wrote:
>
>> Hmm...I had been thinking about a flow implementation that used
>> the new concurrent state machine code...it sounds like that's a
>> bad idea because the testing and restarting would take too long
>> to switch between bmi and trove? We use the post/test model
>> through pvfs2 though, so maybe I don't understand the issue.
>
> I don't think that is a bad idea. There were really two separate but
> related problems in one of the older flow protocol implementations,
> I can try to describe them a little more here if I can remember:
>
> - explicitly tracking and testing each trove and bmi operation: It
> basically kept arrays that listed pending trove and bmi ops, and
> would call testsome() to service them. This was a problem because of
> the time it took to keep running up and down those arrays (when
> building them at the flow level, or when testing them at the trove/
> bmi level). The solution is to just use testcontext() and let
> trove/bmi tell you when something finishes without managing extra
> state.
>
> - thread switch time: the architecture here was set up at one time
> to have one thread pushing the test functions for bmi, another
> thread pushing the test functions for trove, while another thread
> was processing the flow and posting new operations. The problem
> here is that it (at the time) took too long to jump between the
> "pushing" threads and the "processing" thread when an operation
> finished that should trigger progress on the flow. This led to the
> thread-mgr.c code and associated callbacks. The callbacks actually
> drive the flow progress and post new operations. That means that
> the same thread that pushes testcontext() gets to trigger the next
> post, without waiting on the latency of waking up a different
> thread to do something (using condition variable etc.). I managed
> to reuse the thread-mgr for the job code as well, so that one
> testcontext() call triggers callbacks to both the job and flow
> interfaces.
>
> I don't think either of the above issues precludes different flow
> protocol implementations, and they are really kind of orthogonal to
> whether state machines are used or not. The first issue is solved
> just by using testcontext() rather than manually tracking operations.
>
> The second issue could be solved in a variety of ways, some of
> which may be better than what we have now. The callback approach
> is efficient enough, but is hard to debug. Of course it is also
> possible that the thread switch (i.e. condition signal) latency is
> low enough nowadays that you don't even need to worry about it
> anymore. I last looked at this problem before NPTL arrived on the
> scene.
>
> At any rate I think a state machine based flow protocol could dodge
> issue #2 by either:
> - lucking out with a faster modern thread implementation
> - being smarter about how thread work is divided up
> - using callbacks as we do now, and making the state machine
> mechanism thread safe so that it can be driven directly from those
> callbacks rather than from a testcontext() work loop
>
> On a related note, it is important to remember that trove has its
> own internal thread also- so on the trove push side (depending on
> your design) you could have to worry about a chain of 2 threads
> that have to be woken up to get something done at completion time.
> The trove part of that chain can't be avoided without changing the
> API.
>
> Sorry about the tangent here, but I figured I may as well share
> some warnings about things to look out for here. I think it would
> be good to have a cleaner flow protocol implementation.
>
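The callback-driven arrangement described in the second point above can be sketched roughly as follows. This is only a toy model under stated assumptions: `toy_context`, `toy_complete()`, `toy_test_context()` and `toy_flow_progress()` are hypothetical names made up for illustration and are not the PVFS2, BMI, Trove or thread-mgr API. The point it tries to show is that each operation carries its own callback, so the thread that drains completions also triggers the next post, with no per-operation arrays and no wakeup of a second thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; none of this is the real PVFS2/BMI/Trove API. */
typedef void (*completion_fn)(void *user_data);

struct toy_op {
    completion_fn on_complete;  /* callback stored with the operation */
    void *user_data;            /* state handed back to the callback */
};

#define TOY_QUEUE_MAX 16

struct toy_context {
    struct toy_op done[TOY_QUEUE_MAX];  /* ops the lower layer has finished */
    size_t count;
};

/* Lower layer records a finished op. Returns 0 on success, -1 if full. */
static int toy_complete(struct toy_context *ctx, completion_fn fn,
                        void *user_data)
{
    assert(ctx != NULL && fn != NULL);
    if (ctx->count == TOY_QUEUE_MAX)
        return -1;
    ctx->done[ctx->count].on_complete = fn;
    ctx->done[ctx->count].user_data = user_data;
    ctx->count++;
    return 0;
}

/* Rough analogue of a testcontext()-style call: drain whatever finished
 * and run each callback in the calling thread. The batch is copied out
 * first so a callback may safely record new completions.
 * Returns the number of callbacks run. */
static size_t toy_test_context(struct toy_context *ctx)
{
    struct toy_op batch[TOY_QUEUE_MAX];
    size_t n = ctx->count;
    size_t i;

    memcpy(batch, ctx->done, n * sizeof(batch[0]));
    ctx->count = 0;
    for (i = 0; i < n; i++)
        batch[i].on_complete(batch[i].user_data);
    return n;
}

/* Hypothetical flow state driven directly from the callback. */
struct toy_flow {
    int ops_finished;
    int ops_posted;
};

static void toy_flow_progress(void *user_data)
{
    struct toy_flow *flow = user_data;

    flow->ops_finished++;
    flow->ops_posted++;  /* a real flow would post its next op here */
}
```

A caller would register `toy_flow_progress` with each operation it posts and then sit in a loop calling `toy_test_context()`. The flow layer keeps no list of pending operations of its own.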
Thanks for the detailed explanation, Phil. I hadn't thought about the
context switches that might slow down flow. I was primarily thinking
of something that would be cleaner, and easier to modify and test for
different scenarios. If at some point I get around to playing with a
flow impl that uses the concurrent state machine framework, I'll open
up the discussion again to avoid any of the pitfalls you described.
-sam
>>> I think I'm lost now. What do you mean by replace? The states
>>> are still isolated, jobs trigger the transitions, only one state
>>> action gets executed at a time, there still may be a time gap
>>> between completion of any given child and when the parent picks
>>> up processing again, and there are still frames. I think both
>>> approaches will look the same when running unless I missed
>>> something. If Walt puts a longjmp() in there we can both hit
>>> him over the head.
>>>
>> Heh. Don't give him ideas! ;-)
>> I was operating under the constraint that a state machine can
>> only post a job for itself. If I understand the current plan
>> correctly, using job_null in the child state machine to post a
>> job for the parent breaks that constraint, and so in some sense
>> is a replace (the job_null actually takes the parent smcb
>> pointer). I think you're probably right that it's not a big
>> difference either way, it's just cleaner in my head to only have
>> state machines posting jobs for themselves.
>
> I see what you are saying. I guess it depends on how you look at
> it. I had kind of started thinking of the jobs as a signalling
> mechanism since they are the construct that "signals" a state
> machine to make its next transition. The job_null() approach just
> makes it so that a child state machine is what triggers this
> particular signal, rather than a bmi/trove/dev/req_sched/flow
> component. I know this is a change in the model and adds a
> dependency that wasn't previously there, but at least job_null() is
> just a few dozen lines of code. If someone reuses the SM code
> elsewhere, I would guess that is one of the more minor worries
> considering that they would need a whole new mechanism (other than
> the job api) to motivate all of the transitions anyway.
>
>> Walt probably got more discussion than he bargained for, but at
>> the least, lively discussion keeps me awake in the afternoon ;-).
>
> Heh- same here :)
>
> -Phil
>
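The "job as a signal" view of job_null() discussed above can be illustrated with a small toy model. Everything here is an assumption made for illustration: `toy_smcb`, `toy_job_null()`, `toy_child_terminate()` and `toy_parent_step()` are hypothetical stand-ins, not the real PVFS2 state machine or job interfaces. The sketch only tries to show a child posting a no-op job on behalf of its parent, and the parent making its transition later, when its own work loop notices the signal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; not the real PVFS2 state machine or job API. */
enum toy_state {
    TOY_WAITING_FOR_CHILDREN,
    TOY_CHILDREN_DONE
};

struct toy_smcb {
    enum toy_state state;
    int children_running;  /* children that have not been accounted for */
    int signals_pending;   /* null jobs posted for this machine, unhandled */
};

/* Stand-in for a job_null()-style post: it does no work and completes
 * immediately, and its target is whichever smcb is passed in. */
static void toy_job_null(struct toy_smcb *target)
{
    assert(target != NULL);
    target->signals_pending++;
}

/* Called when a child reaches its terminal state. Note that the child
 * posts a job for the parent here, not for itself. */
static void toy_child_terminate(struct toy_smcb *parent)
{
    toy_job_null(parent);
}

/* One pass of the parent's work loop: consume pending signals and
 * transition once the last child has reported in.
 * Returns 1 if the parent transitioned during this call, else 0. */
static int toy_parent_step(struct toy_smcb *parent)
{
    while (parent->signals_pending > 0) {
        parent->signals_pending--;
        parent->children_running--;
    }
    if (parent->state == TOY_WAITING_FOR_CHILDREN &&
        parent->children_running == 0) {
        parent->state = TOY_CHILDREN_DONE;
        return 1;
    }
    return 0;
}
```

As in the discussion, there can still be a gap between a child finishing and the parent picking up again: the signal just sits in `signals_pending` until `toy_parent_step()` runs.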