[Pvfs2-developers] terminating state machines

Phil Carns pcarns at wastedcycles.org
Wed Jul 26 18:17:43 EDT 2006


Sam Lang wrote:
> 
> On Jul 26, 2006, at 3:41 PM, Walter B. Ligon III wrote:
> 
>> Yeah, the idea is that the SM code would call the job function.  
>> Depending on the state actions to do it seems like asking for  
>> trouble, all the details that have to be kept up with.
>>
>> Actually, there are already job structs used by the SM code; now I've
>> had to add a context id to the smcb and there will be job calls.  I
>> think you are right though, the amount of dependency is pretty small.
>>
>> As for the job funcs I think I'd need one new one to post the parent
>> job, establishing a counter.  The child job would look up the
>> counter, decrement, and if zero, call job_null to relaunch the
>> parent, or just replicate what job_null does, whichever seems
>> easiest.
> 
> I would rather see the parent get relaunched by the normal job test
> code by putting itself in the job completion queue once it's finished.
> This could happen in a job_sm_test call like I suggested in my previous
> email.  Also, instead of a counter that a test function would check,
> and the child state machines would have to decrement, I'd prefer the
> parent job keep an array of child state machines (it does this anyway,
> no?) and check each element in the array for completion of the state
> machine.  That way the children aren't competing to lock the same state
> to notify of completion, the parent just checks each one.

There doesn't need to be any locking: the main server thread only
executes one state function or one transition at a time.  The counter
also doesn't need to be visible; it could be hidden inside the job call,
which could lock or not lock as it sees fit.

The parent also couldn't be the one checking the elements in an array 
like that - it would have to be done from within the job code somewhere 
(which I think you described in your previous email).  That means that 
somewhere in the job code (or request scheduler, etc.) something will 
have to do the following on every job_testcontext() call:

for each active sm
	for each child within that sm
		check state

Which could get expensive depending on how extensively we use the 
child/parallel sm model.
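
For comparison, here is a rough sketch of the hidden-counter alternative,
where each terminating child does a constant amount of work and nothing
has to be scanned on every test call.  All of the names below
(sm_sketch, post_parent_wait, child_done) are made up for illustration
and are not existing PVFS2 symbols; the "ready" flag is only a stand-in
for putting the parent on the job completion queue.

```c
#include <assert.h>

/* Hypothetical stand-in for the bookkeeping a job call could keep
 * privately on behalf of a waiting parent sm. */
struct sm_sketch {
    int pending_children;  /* children that have not terminated yet */
    int ready;             /* 1 once the parent may be relaunched */
};

/* Parent side: post a wait for n children (assumed interface). */
static void post_parent_wait(struct sm_sketch *parent, int n)
{
    parent->pending_children = n;
    parent->ready = (n == 0);
}

/* Child side: called once per terminating child.  Constant time, no
 * scan over active sms.  Returns 1 if this child was the last one. */
static int child_done(struct sm_sketch *parent)
{
    assert(parent->pending_children > 0);
    if (--parent->pending_children == 0) {
        parent->ready = 1;  /* stand-in for queueing the parent's job */
        return 1;
    }
    return 0;
}
```

Since only one state function or transition runs at a time in the main
server thread, the decrement here is left unlocked; a threaded caller
would need to protect it.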

> 
>> The implicit call is the child's call when it terminates.  The  
>> parent's call could be implicit too, or done by the state action.
> 
> Doesn't this require child state machines to only function in the child
> state machine context?  I'd prefer to just have generic state machines
> that can be used as a child state machine or as a top-level state machine.

I would prefer that too :)  Is this going to work, Walt?  It would be
nice if the state machine processing code transparently triggered
different termination functions depending on whether it was running a
top-level sm or not, without the state functions themselves knowing
the difference.
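
Something along these lines is what I have in mind, as a sketch only.
The struct layout and function names are hypothetical (the real smcb
looks different); the point is just that the processing code picks the
termination path from a parent pointer, so the state functions never
have to check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical control block; not the real smcb layout. */
struct smcb_sketch {
    struct smcb_sketch *parent;  /* NULL for a top-level sm */
    int children_running;        /* children that have not terminated */
    int completed;               /* set once this sm has terminated */
};

/* Top-level termination: stand-in for the normal completion path. */
static void terminate_top_level(struct smcb_sketch *sm)
{
    sm->completed = 1;
}

/* Child termination: mark the child done and notify its parent. */
static void terminate_child(struct smcb_sketch *sm)
{
    struct smcb_sketch *p = sm->parent;
    assert(p->children_running > 0);
    sm->completed = 1;
    p->children_running--;
}

/* Called by the sm processing code when any sm reaches its final
 * state; the state functions themselves are unaware of the choice. */
static void sm_terminate(struct smcb_sketch *sm)
{
    if (sm->parent != NULL)
        terminate_child(sm);
    else
        terminate_top_level(sm);
}
```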

>> As of this moment we really haven't taken any pains to keep the SM
>> independent from the job system, in fact you have to have the job
>> system to drive things, so in some sense it's not really an issue.
> 
> 
> I vote for making the interfaces as separate as possible.  If someone
> else wants to use the state machine code somewhere else, it would be
> nice to allow them to take it as-is (mpich2 guys were talking about
> using it, but I think they ended up doing something else).  Also,
> independent layers make testing and debugging easier in my view.
>
> In the current code, the sm_p is passed through to the job descriptor
> as a void*, and we just cast back to a sm_p in the while loop that does
> the job_testcontext and then drives the state machines again.  The use
> of job_status does bring the job code into the state machine code,
> but it seems like mostly only the error_code field is used within the
> state actions, and the rest of that structure could be independent of
> the state machine code.
> 
> -sam
> 
>>
>> Any more comments?  (Sam, I hope this addresses some of yours.)
>>
>> Walt
>>
>> Phil Carns wrote:
>>
>>> Walter B. Ligon III wrote:
>>>
>>>>
>>>> OK, guys, I have another issue I want input on.  When child SMs
>>>> terminate they have to notify their parent.  The parent has to wait
>>>> for all the children to terminate.  So I've been thinking to use
>>>> the job subsystem for this: the parent would post a job to wait for
>>>> N children, and each child would post a job, the last one releasing
>>>> the parent.
>>>>
>>>> Now I see two ways to implement this - one is to implement this
>>>> directly in the state machine code.  The parent simply stops
>>>> running (because it does not schedule a job yet returns DEFERRED).
>>>> Each child decrements a counter, and when it hits 0 the parent is
>>>> restarted.  This is a little ugly because the waiting parent is not
>>>> being held on any list or queue (up to now all waiting SMs are in
>>>> the job subsystem), also the last terminating child becomes the
>>>> parent as it starts executing the parent code.  Things can get
>>>> weird when one SM starts children that start children, and so on.
>>>>
>>>> Now the other way to implement this is with the job subsystem as I
>>>> suggested above.  Much cleaner except for one thing: up to now the
>>>> state machine subsystem has had no dependency at all on the job
>>>> subsystem.  If we do it this way, this function only works with the
>>>> job system intact.  I'd prefer not to do this, but it does seem the
>>>> cleanest, most logical means.
>>>
>>> I like the job approach.  I guess this is an extra dependency
>>> because the sms would be calling these particular job functions
>>> implicitly, rather than relying on the state functions to handle
>>> those posts and releases?  We definitely haven't done that before,
>>> but at least in this case the job function that the sm
>>> infrastructure would be depending on is the simplest one in the
>>> arsenal :)  It shouldn't be hard for someone to reimplement that
>>> particular functionality if they wanted to use the state machine
>>> mechanism in another project.
>>>
>>> If you weren't planning on these job calls to be implicit, then I'm
>>> not sure where the extra dependency is; we already use jobs to
>>> trigger all of the other "normal" transitions.
>>>
>>> This reminded me of a question, though: is there going to be a
>>> standard mechanism for the children to report each of their
>>> independent error codes to the parent sm?  Or do the children need
>>> to just keep a reference to the parent sm structure and manually
>>> fill in an array or something?
>>>
>>> I guess I have a broader question of how data that the children
>>> generate (like a handle value or an attr structure) gets transferred
>>> to the parent.  Does the parent copy this stuff from the child after
>>> the child finishes, or does the child copy it to the parent before
>>> it exits?  I think we talked about this before at some point but I
>>> forgot what the plan is.  It would be nice if we made the developer
>>> define macros or something to dictate what input parameters need
>>> to be filled in when invoking a child and what output parameters can
>>> be retrieved when it finishes.  Otherwise it starts getting tricky
>>> to remember what fields need to be set in the sm structure before
>>> kicking something off.
>>> -Phil
>>
>>
>> -- 
>> Dr. Walter B. Ligon III
>> Associate Professor
>> ECE Department
>> Clemson University
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
> 
