Fwd: [Pvfs2-developers] threaded client-core and the device thread
Sam Lang
slang at mcs.anl.gov
Tue Oct 24 16:09:23 EDT 2006
Thanks Walt. I'm forwarding your response to dev so that everyone
can benefit. :-)
-sam
Begin forwarded message:
> From: "Walter B. Ligon III" <walt at clemson.edu>
> Date: October 24, 2006 2:52:45 PM CDT
> To: Sam Lang <slang at mcs.anl.gov>
> Subject: Re: [Pvfs2-developers] threaded client-core and the device
> thread
>
> Well, I had been planning to write that up ... as soon as I get the
> code done, but the pvfs-client stuff is being a real bitch. I
> don't really understand most of it any more than you understand
> this other stuff.
>
> But the quick answer is SM_ACTION_DEFERRED is just 0 and
> SM_ACTION_COMPLETE is just 1; you use them the same way 0 and 1
> were used, only now (hopefully) it is a little clearer what is going
> on (either the state action completed and you can continue to the
> next state, or it was deferred, and you have to wait for it to
> finish).
>
> There is a new return code SM_ACTION_TERMINATE which indicates that
> the state machine should terminate. The state machines themselves
> treat the various "frames" (PINT_client_sm or PINT_server_op) as
> opaque types. They are set up by the respective code (client or
> server) and dutifully returned when requested in the state
> actions. Really, none of that is changed from the original, except
> how you get to them. They are no longer directly accessible but
> should be accessed with the PINT_sm_frame function.
>
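A rough sketch of how the three return codes described above might drive a state machine loop. Only the values 0 and 1 for SM_ACTION_DEFERRED and SM_ACTION_COMPLETE come from the message; the value given to SM_ACTION_TERMINATE, the toy_smcb struct, and all toy_* functions are hypothetical and are not the PVFS2 implementation.

```c
#include <assert.h>

/* DEFERRED = 0 and COMPLETE = 1 are from the message above.
 * The value of TERMINATE is an assumption made for this sketch. */
enum sm_action {
    SM_ACTION_DEFERRED  = 0,
    SM_ACTION_COMPLETE  = 1,
    SM_ACTION_TERMINATE = 2
};

#define TOY_NUM_STATES 3

/* Hypothetical control block; not the real PVFS2 structure. */
struct toy_smcb {
    int state;       /* index of the next state action to run */
    int steps_run;   /* how many state actions have executed */
    int terminated;  /* set once an action returns TERMINATE */
};

typedef enum sm_action (*toy_action_fn)(struct toy_smcb *);

/* Finished immediately: the machine may continue to the next state. */
static enum sm_action toy_setup(struct toy_smcb *s)
{
    (void)s;
    return SM_ACTION_COMPLETE;
}

/* Pretends to post a job: the machine must wait for it to finish. */
static enum sm_action toy_post_io(struct toy_smcb *s)
{
    (void)s;
    return SM_ACTION_DEFERRED;
}

/* Last state: asks the machine to terminate. */
static enum sm_action toy_finish(struct toy_smcb *s)
{
    (void)s;
    return SM_ACTION_TERMINATE;
}

static const toy_action_fn toy_states[TOY_NUM_STATES] = {
    toy_setup, toy_post_io, toy_finish
};

/* Run state actions until one defers or terminates.
 * After a DEFERRED return, the caller invokes toy_run() again
 * once the (imaginary) job has completed. */
static enum sm_action toy_run(struct toy_smcb *s)
{
    enum sm_action rc = SM_ACTION_COMPLETE;

    while (rc == SM_ACTION_COMPLETE) {
        assert(s->state >= 0 && s->state < TOY_NUM_STATES);
        rc = toy_states[s->state](s);
        s->steps_run++;
        if (rc == SM_ACTION_TERMINATE)
            s->terminated = 1;
        else
            s->state++;  /* advance now, or on resume after a deferral */
    }
    return rc;
}
```

In this toy version the first call to toy_run() stops at the deferred action, and a second call (standing in for job completion) runs the final state and terminates.
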
> I don't have a general method to kill a state machine. I guess I
> can put that on the list for the next revision, but at this point
> I'm still focussing on getting what is there to run. The
> unexpected message state machine in the server checks for a server
> specific flag and kills a state machine if it is set. It's really
> up to the jobs to deal with killing a deferred SM - and I don't
> know if or how to do that. If there is a way to cancel a job, and
> if we keep a reference to a job for a deferred SM, then we could
> kill it that way. But if it's just the timer SMs we could always
> have them check a flag each time the timer goes off and die if the
> flag is set - similar to what the unexpected messages do.
>
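The flag-check idea for timer state machines could look roughly like the sketch below. Everything here is hypothetical (toy_timer_sm, stop_requested, toy_timer_fired, and the value of SM_ACTION_TERMINATE); it only illustrates "check a flag each time the timer goes off and die if the flag is set", and does not touch the real job interface.

```c
#include <stdbool.h>

/* DEFERRED = 0 and COMPLETE = 1 are from the thread;
 * the TERMINATE value is an assumption for this sketch. */
enum sm_action {
    SM_ACTION_DEFERRED  = 0,
    SM_ACTION_COMPLETE  = 1,
    SM_ACTION_TERMINATE = 2
};

/* Hypothetical frame for a looping timer state machine. */
struct toy_timer_sm {
    bool stop_requested;  /* set by whoever wants the SM to exit */
    int  fires;           /* number of timer expirations handled */
};

/* State action run each time the timer goes off. */
static enum sm_action toy_timer_fired(struct toy_timer_sm *t)
{
    if (t->stop_requested)
        return SM_ACTION_TERMINATE;  /* die instead of re-arming */

    t->fires++;
    /* Real code would post a new timer job here; the sketch only
     * reports that the action is waiting on one. */
    return SM_ACTION_DEFERRED;
}
```

The cost of this approach is latency: the machine only notices the flag at the next timer expiration, so shutdown can lag by up to one timer period.
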
> Oh, and the other big change is that unexpected messages are
> no longer special cases - they are regular old SMs like everything
> else. Much cleaner.
>
> Walt
>
> Sam Lang wrote:
>> On Oct 24, 2006, at 1:53 PM, Walter B. Ligon III wrote:
>>> Good. I'm making progress tracking down the problems in the code
>>> - somehow a bunch of edits got lost. I'm fixing them.
>>> It involves changes in all of the client state machines.
>>>
>>>
>>> BTW, there is one I'm confused about: src/client/sysint/sys-getattr.sm
>>> the last state action "getattr_set_sys_response" returns from
>>> several places. It is not clear if ALL of them intend to
>>> terminate since they don't all set the op_completed flag, but
>>> the only option in the SM is to terminate. So I'm assuming they
>>> want to terminate. If you know anything about that one I'd
>>> appreciate it if you'd look.
>> I agree that you don't want SM_ACTION_DEFERRED for any of those.
>> It looks like you just went through and replaced all the return
>> 0; lines in state actions with SM_ACTION_DEFERRED, even if the
>> error_code is set to a negative value (we used to ignore the
>> return value if the error value was negative?). If it was just a
>> search and replace, there are probably a bunch of other places
>> like this as well.
>> BTW, when is SM_ACTION_COMPLETE supposed to be used (returned by
>> a state action)? For nested machines? We could really use some
>> documentation for what is supposed to be returned by state
>> actions and when. It didn't exist before, and it took me a while
>> to figure out how return 0; and return 1; behaved, and now all
>> that is changing again. It's certainly for the better, but it
>> will help me to have the rules documented explicitly.
>> Also, the semantics of state machines and jobs, what are they?
>> What are the jobs currently associated with a state machine
>> pointer (PINT_client_sm or PINT_server_op)? How do I stop/cancel
>> a state machine? This is especially pertinent for our state
>> machines that essentially infinite loop, such as the job-timer
>> sm. We don't currently clean up those state machines ourselves,
>> it would be nice of us if we did. That means figuring out what
>> (if any) jobs are currently posted by the machine, and cancelling
>> or waiting for completion on those jobs.
>> -sam
>>>
>>> Walt
>>>
>>> Sam Lang wrote:
>>>
>>>> I'm working with your branch Walt. Most of the code that does
>>>> allocation of the client state machines is the same.
>>>> -sam
>>>> On Oct 24, 2006, at 9:10 AM, Walter B. Ligon III wrote:
>>>>
>>>>> Should be careful here, since all of the code dealing with
>>>>> PINT_client_sm's has been rewritten for the new SM code and
>>>>> Murali's suggestions (for example) may not work so well.
>>>>>
>>>>> Walt
>>>>>
>>>>> Murali Vilayannur wrote:
>>>>>
>>>>>> Hey Sam,
>>>>>>
>>>>>>> I ran pvfs2-client-core in valgrind, and then ran Bonnie++ a
>>>>>>> few times (10) on the mounted pvfs volume, and noticed the
>>>>>>> following when I stopped the client process:
>>>>>>>
>>>>>>> ==20132== malloc/free: 1,298,824 allocs, 1,297,888 frees,
>>>>>>> 3,462,517,583 bytes allocated.
>>>>>>>
>>>>>>> Allocating and freeing 3.5GB seemed extreme, so I went
>>>>>>> exploring. It turns out that every time we allocate a
>>>>>>> PINT_client_sm, we're allocating about 35KB:
>>>>>>>
>>>>>>> (gdb) p sizeof(struct PINT_client_sm)
>>>>>>> $4 = 37764
>>>>>>
>>>>>>
>>>>>> Oh boy.. that is definitely large..
>>>>>>
>>>>>>> static array of 8 PINT_client_lookup_sm_ctx, which itself
>>>>>>> has a static array of 40 PINT_client_lookup_sm_segment, which
>>>>>>> are each about 112 bytes. Anyway, it ends up accumulating.
>>>>>>>
>>>>>>> So I'm convinced at this point that this is beyond the
>>>>>>> noise range, plus it's just cruft that we don't need. I'd
>>>>>>> like to swap out those static arrays for dynamic allocation
>>>>>>> when we get to the start of the lookup state machine. Any
>>>>>>> thoughts or suggestions?
>>>>>>
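A small sketch of the arithmetic behind the numbers quoted above and of the static-to-dynamic swap being proposed. The toy_* types are stand-ins, not the real PINT_client_lookup_sm_ctx or PINT_client_lookup_sm_segment definitions; the only figures taken from the thread are 8 contexts, 40 segments each, and roughly 112 bytes per segment (8 * 40 * 112 = 35840 bytes).

```c
#include <stdlib.h>

#define TOY_MAX_CTX  8    /* contexts per lookup, from the thread */
#define TOY_MAX_SEGS 40   /* segments per context, from the thread */

/* Stand-in for a segment of "about 112 bytes". */
struct toy_segment { char payload[112]; };

/* Static layout: every state machine carries the full arrays. */
struct toy_ctx_static    { struct toy_segment segs[TOY_MAX_SEGS]; };
struct toy_lookup_static { struct toy_ctx_static ctx[TOY_MAX_CTX]; };

/* Dynamic layout: only pointers and counts live in the state machine. */
struct toy_ctx_dynamic    { struct toy_segment *segs; int seg_count; };
struct toy_lookup_dynamic { struct toy_ctx_dynamic *ctx; int ctx_count; };

/* Release everything owned by a dynamic lookup; safe on partial init. */
static void toy_lookup_free(struct toy_lookup_dynamic *l)
{
    if (l->ctx != NULL) {
        for (int i = 0; i < l->ctx_count; i++)
            free(l->ctx[i].segs);
    }
    free(l->ctx);
    l->ctx = NULL;
    l->ctx_count = 0;
}

/* Allocate only what this lookup needs, e.g. on entry to the lookup
 * state machine. Returns 0 on success, -1 on allocation failure. */
static int toy_lookup_init(struct toy_lookup_dynamic *l,
                           int ctx_count, int seg_count)
{
    l->ctx = calloc((size_t)ctx_count, sizeof(*l->ctx));
    if (l->ctx == NULL) {
        l->ctx_count = 0;
        return -1;
    }
    l->ctx_count = ctx_count;

    for (int i = 0; i < ctx_count; i++) {
        l->ctx[i].segs = calloc((size_t)seg_count,
                                sizeof(struct toy_segment));
        if (l->ctx[i].segs == NULL) {
            toy_lookup_free(l);
            return -1;
        }
        l->ctx[i].seg_count = seg_count;
    }
    return 0;
}
```

With this layout, operations that never perform a lookup pay only for a pointer and a count, and a lookup that resolves a short path allocates a fraction of the static footprint.
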
>>>>>>
>>>>>> I agree. It definitely does not look like it is in the noise range
>>>>>> anymore. How about we keep a pool of PINT_client_sm's around in
>>>>>> client-core and allocate from that instead of dynamically
>>>>>> allocating one every time?
>>>>>> My 2 cents :)
>>>>>> thanks,
>>>>>> Murali
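The pool suggestion could be sketched as a fixed-size free list like the one below. The names (toy_client_sm, toy_sm_pool, TOY_POOL_SIZE) and the 256-byte body are hypothetical, the sketch is not thread-safe, and, per the caution elsewhere in this thread, it may not map cleanly onto the rewritten state machine code.

```c
#include <stddef.h>

#define TOY_POOL_SIZE 4   /* arbitrary size for the sketch */

/* Stand-in for PINT_client_sm; the real struct is far larger. */
struct toy_client_sm {
    struct toy_client_sm *next_free;  /* free-list link while unused */
    char body[256];
};

/* Preallocated slots plus a singly linked list of the unused ones. */
struct toy_sm_pool {
    struct toy_client_sm  slots[TOY_POOL_SIZE];
    struct toy_client_sm *free_list;
};

/* Thread every slot onto the free list, lowest index first. */
static void toy_pool_init(struct toy_sm_pool *p)
{
    p->free_list = NULL;
    for (int i = TOY_POOL_SIZE - 1; i >= 0; i--) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Take a slot from the pool; returns NULL when the pool is exhausted. */
static struct toy_client_sm *toy_pool_get(struct toy_sm_pool *p)
{
    struct toy_client_sm *sm = p->free_list;
    if (sm != NULL)
        p->free_list = sm->next_free;
    return sm;
}

/* Return a slot to the pool so a later operation can reuse it. */
static void toy_pool_put(struct toy_sm_pool *p, struct toy_client_sm *sm)
{
    sm->next_free = p->free_list;
    p->free_list = sm;
}
```

A pool avoids the repeated malloc/free traffic but keeps the per-object size as is, so it complements rather than replaces shrinking the struct itself.
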
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dr. Walter B. Ligon III
>>>>> Associate Professor
>>>>> ECE Department
>>>>> Clemson University
>>>>>