[Pvfs2-developers] terminating state machines

Walter B. Ligon III walt at clemson.edu
Thu Jul 27 10:56:40 EDT 2006



Sam Lang wrote:
> 
> On Jul 26, 2006, at 6:16 PM, Phil Carns wrote:
> 
>>> I think I'm getting voted down here, so I should probably just
>>> shut up, but I don't think in practice we're going to have that many
>>> child state machines that iterating through the list is at all
>>> costly.  I'm arguing for simpler mechanisms that fit in with the
>>> job subsystem over something more fancy and possibly slightly
>>> better performing.
>>
>>
>> Well, as far as the number of SMs goes, I would rather not risk it.
>> I still hope this is lightweight enough that we could eventually use
>> it in more places that would generate a lot of children (like a
>> re-architected sys-io implementation), though I don't know if that
>> will pan out in practice.  I got bitten by a similar assumption in
>> the flow protocol: it used to track all of its posted operations for
>> testing rather than relying on someone to notify it of completion.
>> Admittedly the flow protocol is a more obvious case and I should have
>> known better, but at the time it seemed reasonable :)
>>
> 
> Hmm...I had been thinking about a flow implementation that used the new
> concurrent state machine code...it sounds like that's a bad idea
> because the testing and restarting would take too long to switch
> between bmi and trove?  We use the post/test model throughout pvfs2
> though, so maybe I don't understand the issue.
> 
>>>> I think that the way that you describe would work fine too, but it
>>>> would require a little more active work to check the status of the
>>>> array of child SMs and would require more code to keep track of them.
>>
>>
>>> Probably a bit more code yes, but it seems cleaner than keeping
>>> around backpointers and checking for parents.  Instead of driving
>>> all state machines from one place, this event notification scheme
>>> essentially replaces the last child state machine with the parent,
>>> which seems like a bit of a hack and harder to debug.
>>
>>
>> I think I'm lost now.  What do you mean by replace?  The states are
>> still isolated, jobs trigger the transitions, only one state action
>> gets executed at a time, there still may be a time gap between
>> completion of any given child and when the parent picks up processing
>> again, and there are still frames.  I think both approaches will look
>> the same when running unless I missed something.  If Walt puts a
>> longjmp() in there we can both hit him over the head.
>>
> Heh.  Don't give him ideas! ;-)
> 
> I was operating under the constraint that a state machine can only post
> a job for itself.  If I understand the current plan correctly, using
> job_null in the child state machine to post a job for the parent breaks
> that constraint, and so in some sense is a replace (the job_null
> actually takes the parent smcb pointer).  I think you're probably right
> that it's not a big difference either way, it's just cleaner in my head
> to only have state machines posting jobs for themselves.
> 
>> I think having a pointer to the parent actually improves debuggability
>> (though I'm not sure this approach actually requires it, all you
>> really need is either a job descriptor or a pointer to a counter).
>> If I have a state machine that does something bad or gets stuck it
>> would be nice to be able to work backwards to find out who invoked
>> it, without having to search for it in a separate data structure.
>>
>> I don't mean to keep struggling with this issue; I honestly think
>> that both approaches are pretty good, and if Walt implements it the
>> way I think he is going to, then 95% of developers won't notice the
>> difference anyway.  At this point I am mostly hammering away to make
>> sure I am not missing a larger issue...
> 
> 
> Walt probably got more discussion than he bargained for, but at the
> least, lively discussion keeps me awake in the afternoon ;-).
> 
> -sam
> 
>>
>> -Phil
>>

Good discussion.  Phil has convinced me the level of dependency is low, 
and unless I completely misunderstand Sam, the complexity of the parent 
pointer/job_null approach is a lot less than the alternative, and I like 
low complexity.  I also think debugging will be simpler.  So that's 
where I'm going.
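
For anyone following along, here is a minimal sketch of the parent
pointer/counter idea as discussed in this thread.  It is only an
illustration under stated assumptions: the struct layout and the names
start_child, child_terminated and post_null_job are hypothetical
stand-ins, not the actual PVFS2 smcb or job_null interface.  The point
is that the child that finishes last posts one job for the parent, so
nothing has to iterate over a list of children.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a state machine control block (not the real smcb). */
struct smcb {
    const char *name;
    struct smcb *parent;      /* back pointer; NULL for a top-level machine */
    int children_running;     /* children started but not yet terminated */
    int wakeups;              /* how many null jobs were posted for this machine */
};

/* Stand-in for posting a null job on behalf of `target`; it only records
 * the wakeup.  A real job layer would queue a completion instead. */
static void post_null_job(struct smcb *target)
{
    target->wakeups++;
}

/* Attach a child to its parent and count it as outstanding. */
static void start_child(struct smcb *parent, struct smcb *child, const char *name)
{
    child->name = name;
    child->parent = parent;
    child->children_running = 0;
    child->wakeups = 0;
    parent->children_running++;
}

/* Called when a child reaches its terminal state.  Returns 1 if this was
 * the last outstanding child and the parent was woken, 0 otherwise. */
static int child_terminated(struct smcb *child)
{
    struct smcb *parent = child->parent;

    if (parent == NULL)
        return 0;             /* top-level machine: nobody to notify */

    assert(parent->children_running > 0);
    if (--parent->children_running == 0) {
        post_null_job(parent); /* the child posts the job for the parent */
        return 1;
    }
    return 0;
}
```

With two children, the first call to child_terminated() returns 0 and
the second returns 1, at which point the parent has exactly one wakeup
recorded.  The parent field is also what lets you work backwards from a
stuck child to whoever invoked it.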

I'll have to think of other topics to get you guys going from time to
time!  ;-)

Now off to figure out a way to use setjmp/longjmp in my implementation!

Walt
-- 
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University

