[Pvfs2-developers] terminating state machines
Sam Lang
slang at mcs.anl.gov
Wed Jul 26 22:23:21 EDT 2006
On Jul 26, 2006, at 6:16 PM, Phil Carns wrote:
>> I think I'm getting voted down here, so I should probably just
>> shut up, but I don't think in practice we're going to have that
>> many child state machines that iterating through the list is at
>> all costly. I'm arguing for simpler mechanisms that fit in with
>> the job subsystem over something more fancy and possibly slightly
>> better performing.
>
> Well, as far as the number of SMs goes, I would rather not risk
> it. I still hope this is lightweight enough that we could
> eventually use it in more places that would generate a lot of
> children (like a re-architected sys-io implementation), though I
> don't know if that will pan out in practice. I got bitten by a
> similar assumption in the flow protocol: it used to track all of
> its posted operations for testing rather than relying on someone to
> notify it of completion. Admittedly the flow protocol is a more
> obvious case and I should have known better, but at the time it
> seemed reasonable :)
>
Hmm...I had been thinking about a flow implementation that used the
new concurrent state machine code...it sounds like that's a bad idea
because the testing and restarting would take too long to switch
between bmi and trove? We use the post/test model throughout pvfs2
though, so maybe I don't understand the issue.
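To pin down the difference being discussed, here is a minimal C sketch contrasting the two styles: testing every tracked operation on each pass versus being notified of each completion. The types and functions (struct op, struct waiter, poll_all, notify_done) are hypothetical illustrations, not PVFS2 code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation record; not a PVFS2 type. */
struct op {
    int done;               /* set when the underlying I/O finishes */
};

/* Polling style: every pass tests every tracked operation, so each
 * pass costs O(n) even when nothing has completed. Returns the number
 * of operations examined and stores the completed count. */
static size_t poll_all(const struct op *ops, size_t n, size_t *completed)
{
    size_t examined = 0;

    *completed = 0;
    for (size_t i = 0; i < n; i++) {
        examined++;
        if (ops[i].done)
            (*completed)++;
    }
    return examined;
}

/* Notification style: whoever completes an operation decrements a
 * counter owned by the waiter, so the cost is O(1) per completion. */
struct waiter {
    size_t outstanding;     /* operations still in flight */
};

/* Returns 1 when the last outstanding operation has completed. */
static int notify_done(struct waiter *w)
{
    assert(w->outstanding > 0);
    w->outstanding--;
    return w->outstanding == 0;
}
```

With only a handful of operations the two cost about the same; the polling loop only starts to hurt when many operations are tracked and few complete per pass.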
>>> I think that the way that you describe would work fine too, but
>>> it would require a little more active work to check the status
>>> of the array of child SMs and would require more code to keep
>>> track of them.
>
>> Probably a bit more code yes, but it seems cleaner than keeping
>> around backpointers and checking for parents. Instead of driving
>> all state machines from one place, this event notification
>> scheme essentially replaces the last child state machine with the
>> parent, which seems like a bit of a hack and harder to debug.
>
> I think I'm lost now. What do you mean by replace? The states are
> still isolated, jobs trigger the transitions, only one state action
> gets executed at a time, there still may be a time gap between
> completion of any given child and when the parent picks up
> processing again, and there are still frames. I think both
> approaches will look the same when running unless I missed
> something. If Walt puts a longjmp() in there we can both hit him
> over the head.
>
Heh. Don't give him ideas! ;-)
I was operating under the constraint that a state machine can only
post a job for itself. If I understand the current plan correctly,
using job_null in the child state machine to post a job for the
parent breaks that constraint, and so in some sense is a replace (the
job_null actually takes the parent smcb pointer). I think you're
probably right that it's not a big difference either way, it's just
cleaner in my head to only have state machines posting jobs for
themselves.
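A rough C sketch of the scheme as I understand it, where the last child to terminate posts the completion that wakes the parent. Everything here is an assumption for illustration: struct smcb is reduced to three made-up fields, and post_null_job_for() is a stand-in that does not match the real job_null() signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical control block; the real PVFS2 smcb has other fields. */
struct smcb {
    struct smcb *parent;    /* back pointer, NULL for a top-level SM */
    int children_running;   /* children that have not yet terminated */
    int ready;              /* set when a completion is queued for this SM */
};

/* Stand-in for posting a null job on behalf of `target`. This is not
 * the real job_null() signature. */
static void post_null_job_for(struct smcb *target)
{
    target->ready = 1;
}

/* Register `child` under `parent` and count it as running. */
static void start_child(struct smcb *parent, struct smcb *child)
{
    child->parent = parent;
    child->children_running = 0;
    child->ready = 0;
    parent->children_running++;
}

/* Called when a child reaches its terminating state. Only the last
 * child to finish posts the completion that lets the parent resume;
 * this is the step where a child posts a job for another SM. */
static void child_terminate(struct smcb *child)
{
    struct smcb *p = child->parent;

    if (p == NULL)
        return;
    assert(p->children_running > 0);
    if (--p->children_running == 0)
        post_null_job_for(p);
}
```

The parent never scans its children here. The tradeoff is the back pointer, and the fact that the child, not the parent, does the posting.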
> I think having a pointer to the parent actually improves
> debuggability (though I'm not sure this approach actually requires
> it, all you really need is either a job descriptor or a pointer to
> a counter). If I have a state machine that does something bad or
> gets stuck it would be nice to be able to work backwards to find
> out who invoked it, without having to search for it in a separate
> data structure.
>
> I don't mean to keep struggling with this issue. I honestly think
> that both approaches are pretty good, and if Walt implements it the
> way I think he is going to, then 95% of developers won't notice the
> difference anyway. At this point I am mostly hammering away to
> make sure I am not missing a larger issue...
Walt probably got more discussion than he bargained for, but at the
least, lively discussion keeps me awake in the afternoon ;-).
-sam
>
> -Phil
>