[Pvfs2-developers] terminating state machines

Sam Lang slang at mcs.anl.gov
Wed Jul 26 22:23:21 EDT 2006


On Jul 26, 2006, at 6:16 PM, Phil Carns wrote:

>> I think I'm getting voted down here, so I should probably just
>> shut up, but I don't think in practice we're going to have so
>> many child state machines that iterating through the list is at
>> all costly.  I'm arguing for simpler mechanisms that fit in with
>> the job subsystem over something more fancy and possibly slightly
>> better performing.
>
> Well, as far as the number of SMs goes, I would rather not risk  
> it.  I still hope this is lightweight enough that we could  
> eventually use it in more places that would generate a lot of  
> children (like a re-architected sys-io implementation), though I  
> don't know if that will pan out in practice.  I got bitten by a
> similar assumption in the flow protocol: it used to track all of
> its posted operations for testing rather than relying on someone to  
> notify it of completion.  Admittedly the flow protocol is a more  
> obvious case and I should have known better, but at the time it  
> seemed reasonable :)
>

Hmm... I had been thinking about a flow implementation that used the
new concurrent state machine code... it sounds like that's a bad idea
because the testing and restarting would take too long to switch
between BMI and Trove?  We use the post/test model throughout PVFS2,
though, so maybe I don't understand the issue.
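To make the cost argument concrete, here is a minimal sketch of the two ways a parent can learn that its children are done. The names (`struct child`, `parent_poll`, `struct parent`, `child_finished`) are hypothetical and are not PVFS2's state machine or job API. Setting `woken` stands in for whatever mechanism actually resumes the parent. Polling rescans up to n children every time the parent runs. The counter costs O(1) per completion no matter how many children there are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical child descriptor; not the PVFS2 state machine struct. */
struct child {
    bool done;
};

/* Polling: the parent rescans the children each time it runs, stopping
 * at the first one that is still busy.  *checks counts the slots
 * examined, to show the up-to-O(n) cost of every test. */
static bool parent_poll(const struct child *kids, size_t n, size_t *checks)
{
    for (size_t i = 0; i < n; i++) {
        (*checks)++;
        if (!kids[i].done)
            return false;
    }
    return true;
}

/* Notification: each child decrements a shared counter as it finishes;
 * only the child that brings it to zero wakes the parent. */
struct parent {
    size_t remaining;   /* children still running */
    bool woken;         /* set when the last child completes */
};

static void child_finished(struct parent *p)
{
    assert(p->remaining > 0);
    if (--p->remaining == 0)
        p->woken = true;   /* stands in for posting a job for the parent */
}
```

With a handful of children the difference is negligible, which is Sam's point. With many children (the re-architected sys-io case Phil mentions), the per-test rescan is the kind of cost the old flow protocol ran into.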

>>> I think that the way that you describe would work fine too, but
>>> it would require a little more active work to check the status
>>> of the array of child SMs and would require more code to keep
>>> track of them.
>
>> Probably a bit more code, yes, but it seems cleaner than keeping
>> around backpointers and checking for parents.  Instead of driving
>> all state machines from one place, this event notification
>> scheme essentially replaces the last child state machine with the
>> parent, which seems like a bit of a hack and harder to debug.
>
> I think I'm lost now.  What do you mean by replace?  The states are  
> still isolated, jobs trigger the transitions, only one state action  
> gets executed at a time, there still may be a time gap between  
> completion of any given child and when the parent picks up  
> processing again, and there are still frames.  I think both  
> approaches will look the same when running unless I missed  
> something.  If Walt puts a longjmp() in there we can both hit him  
> over the head.
>
Heh.  Don't give him ideas! ;-)

I was operating under the constraint that a state machine can only  
post a job for itself.  If I understand the current plan correctly,  
using job_null in the child state machine to post a job for the  
parent breaks that constraint, and so in some sense is a replace (the  
job_null actually takes the parent smcb pointer).  I think you're  
probably right that it's not a big difference either way; it's just
cleaner in my head to only have state machines posting jobs for  
themselves.
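A minimal sketch of the scheme under discussion follows. It assumes a single queue of completed events that a main loop drains one at a time. All names (`struct sm`, `post_event`, `child_complete`, `run_events`) are hypothetical, and `post_event(p)` only stands in for the `job_null` call that takes the parent's smcb pointer. The sketch isolates the step Sam objects to: the last child posts an event whose target is its parent rather than itself. The parent still resumes only when the loop dequeues that event.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified state machine; not the PVFS2 smcb struct. */
struct sm {
    const char *name;
    struct sm *parent;       /* backpointer, NULL for a top-level SM */
    int pending_children;    /* children this SM is still waiting on */
    int resumed;             /* times an event has resumed this SM */
};

/* Completed "jobs" waiting to be processed; each names the SM to resume. */
#define QCAP 16
static struct sm *queue[QCAP];
static size_t q_head, q_tail;

static void post_event(struct sm *target)
{
    assert(q_tail < QCAP);   /* fixed-size queue, no wraparound */
    queue[q_tail++] = target;
}

/* Run as a child's final state action.  The last child to finish posts
 * an event for its parent rather than for itself. */
static void child_complete(struct sm *child)
{
    struct sm *p = child->parent;

    assert(p != NULL && p->pending_children > 0);
    if (--p->pending_children == 0)
        post_event(p);
}

/* Drain the queue, resuming one state machine per event, so only one
 * state action runs at a time.  Returns the number of events handled. */
static int run_events(void)
{
    int handled = 0;

    while (q_head < q_tail) {
        struct sm *target = queue[q_head++];
        target->resumed++;   /* stands in for running the next state action */
        handled++;
    }
    return handled;
}
```

Under the constraint Sam describes, `child_complete` would post only for the child itself. The parent would then have to discover completion on its own, for example by checking its children when it next runs.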

> I think having a pointer to the parent actually improves  
> debuggability (though I'm not sure this approach actually requires
> it, all you really need is either a job descriptor or a pointer to  
> a counter).  If I have a state machine that does something bad or  
> gets stuck it would be nice to be able to work backwards to find  
> out who invoked it, without having to search for it in a separate
> data structure.
>
> I don't mean to keep struggling with this issue- I honestly think  
> that both approaches are pretty good, and if Walt implements it the  
> way I think he is going to, then 95% of developers won't notice the  
> difference anyway.  At this point I am mostly hammering away to  
> make sure I am not missing a larger issue...
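Phil's debugging argument can be sketched as follows. The sketch assumes that each state machine frame keeps a backpointer to the state machine that invoked it. `struct sm_frame` and `format_invokers` are hypothetical illustrations, not PVFS2 code. Starting from a stuck state machine, the helper follows the backpointers up to the root and writes the chain into a buffer, with no separate lookup structure involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical frame with a backpointer to the SM that invoked it. */
struct sm_frame {
    const char *name;
    const struct sm_frame *parent;   /* NULL for a top-level state machine */
};

/* Write "stuck <- caller <- caller's caller ..." into buf, following
 * backpointers from sm up to the root.  Stops early (keeping only whole
 * names) if buf is too small.  Returns the number of frames written. */
static int format_invokers(const struct sm_frame *sm, char *buf, size_t len)
{
    static const char sep[] = " <- ";
    const size_t sep_len = sizeof sep - 1;
    size_t used = 0;
    int depth = 0;

    assert(buf != NULL && len > 0);
    for (; sm != NULL; sm = sm->parent) {
        size_t name_len = strlen(sm->name);
        size_t need = name_len + (depth ? sep_len : 0);

        if (used + need >= len)
            break;                   /* no room left for this frame */
        if (depth) {
            memcpy(buf + used, sep, sep_len);
            used += sep_len;
        }
        memcpy(buf + used, sm->name, name_len);
        used += name_len;
        depth++;
    }
    buf[used] = '\0';
    return depth;
}
```

The backpointer costs one pointer per frame. In exchange, a debugger or log message can recover the whole invocation chain from the stuck state machine alone.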

Walt probably got more discussion than he bargained for, but at the  
least, lively discussion keeps me awake in the afternoon ;-).

-sam

>
> -Phil
>


