[Pvfs2-developers] threaded client-core and the device thread

Sam Lang slang at mcs.anl.gov
Mon Oct 23 20:19:18 EDT 2006


On Oct 16, 2006, at 8:37 AM, Phil Carns wrote:

> Sam Lang wrote:
>> Hi All,
>> Dean and I are looking at trying to push the efficiency of  
>> requests  from the kernel module up through the device to client- 
>> core.  I added  the --threaded option to the client to allow the  
>> client-core to run  with multiple threads (one each for bmi, dev,  
>> and main -- and also a  remount thread, but let's ignore that for  
>> now), so the device thread  should be able to keep pulling  
>> requests of the device without having  to wait for bmi operations  
>> to complete.
>> I noticed a couple things with the device thread that I wanted to  
>> ask  about.
>> PINT_dev_test_unexpected takes an incount of 5, so it's only going  
>> to  read at most 5 requests off the device for each call.  Once  
>> it  returns, each of the unexpected requests is added to the  
>> completed  jobs array and then we signal the jobs completed  
>> condition variable  _for each request_.  It seems like this will  
>> be 5x the number of  context switches between the device thread  
>> and the main thread that  we need.
>> Also, we poll every time before reading another request off the   
>> device.  What about trying to read a number of requests off the   
>> device at once with one read (or possibly a readv so we can keep   
>> separate buffers per request).
>> Also, it looks like we do a malloc for each new request buffer,  
>> and  then a free once we're done with it, and a memset of the  
>> info  struct.  It seems like we could manage the buffers on the  
>> stack  instead of the heap, and save on a few system calls there.
>> For both threaded and nonthreaded, with the workload that Dean is   
>> using, he found that the PINT_dev_test_unexpected always returned  
>> 5  requests in the outcount.  So it looks like there are always  
>> requests  sitting on the device, waiting to be read by client- 
>> core.  Are we  just not able to process requests fast enough  
>> through BMI and the  state machines, or is the cost of polling and  
>> signaling every time we  read a request off the device slowing us  
>> down?  In other words, does  it make sense to rework the code a  
>> little bit or will we just get  bottlenecked elsewhere?
>
> I am just speculating, but out of the things you list I would guess  
> that these two things would be most likely to show improvement  
> without much coding effort:
>
> - increasing the testcount to something higher than 5 (since it  
> sounds like that is getting maxed out for this workload)
> - fixing the "signalling on every request" problem
>
> The need for multiple reads and the mallocs could be a problem,
> but I am with Murali in that I think problems in this area are more  
> likely related to inefficient threading or I/O stalls rather than  
> CPU or memory overhead.
>
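
For reference, the "signal once per batch" idea could look roughly like
the sketch below. This is not the client-core code: the queue type, its
field names, and the signals_sent counter are all made up for
illustration, and the real completed-jobs array is more involved. The
point is only that the producer takes the lock once, appends the whole
batch, and signals the condition variable once.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_QUEUE_CAP 64

/* Hypothetical completion queue; not a PVFS2 type. */
struct sketch_completion_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int             items[SKETCH_QUEUE_CAP];
    size_t          count;
    unsigned long   signals_sent; /* instrumentation for this sketch only */
};

static void sketch_queue_init(struct sketch_completion_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->count = 0;
    q->signals_sent = 0;
}

/* Device-thread side: append up to n request ids under one lock
 * acquisition, then wake the consumer once. Returns how many fit. */
static size_t sketch_queue_push_batch(struct sketch_completion_queue *q,
                                      const int *ids, size_t n)
{
    size_t added = 0;

    pthread_mutex_lock(&q->lock);
    while (added < n && q->count < SKETCH_QUEUE_CAP)
        q->items[q->count++] = ids[added++];
    if (added > 0) {
        pthread_cond_signal(&q->nonempty); /* one wakeup for the batch */
        q->signals_sent++;
    }
    pthread_mutex_unlock(&q->lock);
    return added;
}

/* Main-thread side: wait until something is queued, then take up to
 * cap entries in FIFO order. Returns how many were copied to out. */
static size_t sketch_queue_drain(struct sketch_completion_queue *q,
                                 int *out, size_t cap)
{
    size_t taken;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    taken = q->count < cap ? q->count : cap;
    memcpy(out, q->items, taken * sizeof(q->items[0]));
    q->count -= taken;
    memmove(q->items, q->items + taken, q->count * sizeof(q->items[0]));
    pthread_mutex_unlock(&q->lock);
    return taken;
}
```

With five requests per call, this issues one signal where the
per-request version would issue five.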

I ran pvfs2-client-core in valgrind, and then ran Bonnie++ a few  
times (10) on the mounted pvfs volume, and noticed the following when  
I stopped the client process:

==20132== malloc/free: 1,298,824 allocs, 1,297,888 frees,  
3,462,517,583 bytes allocated.

Allocating and freeing 3.5GB seemed extreme, so I went exploring.  It  
turns out that every time we allocate a PINT_client_sm, we're  
allocating about 37KB:

(gdb) p sizeof(struct PINT_client_sm)
$4 = 37764

The problem is that we allocate a PINT_client_sm every time a new  
request is posted.  Most of the memory is from the u.lookup field:

(gdb) p sizeof(struct PINT_client_lookup_sm)
$3 = 36196

PINT_client_lookup_sm has a static array of 8  
PINT_client_lookup_sm_ctx, each of which has a static array of 40  
PINT_client_lookup_sm_segment, which are each about 112 bytes.   
That comes to 8 x 40 x 112 = 35,840 bytes, which accounts for nearly  
all of the 36,196 above.
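
To make the arithmetic concrete, here is a small stand-in for the
nesting. The struct names and the 112-byte payload are placeholders
chosen to match the numbers quoted here; the real PVFS2 structs have
actual fields and a slightly different total.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder sizes taken from the figures in this message. */
#define SKETCH_MAX_CTX      8
#define SKETCH_MAX_SEGMENTS 40

/* Hypothetical stand-ins, not the PVFS2 definitions. */
struct sketch_segment { char payload[112]; };
struct sketch_ctx     { struct sketch_segment segments[SKETCH_MAX_SEGMENTS]; };
struct sketch_lookup  { struct sketch_ctx contexts[SKETCH_MAX_CTX]; };

/* A char array has no padding, so the segment is exactly 112 bytes. */
static_assert(sizeof(struct sketch_segment) == 112, "segment size");

/* Bytes embedded in every request that carries this struct by value,
 * whether or not the request is a lookup. */
static size_t sketch_lookup_footprint(void)
{
    return sizeof(struct sketch_lookup);
}
```

Because the arrays are embedded by value, every operation that
allocates the enclosing struct pays for all 8 x 40 slots up front.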

So I'm convinced at this point that this is beyond the noise range,  
plus it's just cruft that we don't need.  I'd like to swap out those  
static arrays for dynamic allocation when we get to the start of the  
lookup state machine.  Any thoughts or suggestions?
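
Roughly what I have in mind, as a sketch only. The dyn_* names, the
init/free split, and the idea of passing the counts in are all
assumptions for illustration; the actual change would have to fit the
existing state machine entry and cleanup paths.

```c
#include <stdlib.h>

/* Hypothetical stand-ins, not the PVFS2 definitions. */
struct dyn_segment { char payload[112]; };

struct dyn_ctx {
    struct dyn_segment *segments;   /* allocated on lookup entry */
    size_t              segment_count;
};

struct dyn_lookup {
    struct dyn_ctx *contexts;       /* allocated on lookup entry */
    size_t          context_count;
};

/* Release everything dyn_lookup_init allocated. Safe on a partially
 * initialized struct because unused slots are zeroed by calloc. */
static void dyn_lookup_free(struct dyn_lookup *l)
{
    size_t i;

    for (i = 0; i < l->context_count; i++)
        free(l->contexts[i].segments);
    free(l->contexts);
    l->contexts = NULL;
    l->context_count = 0;
}

/* Would be called at the start of the lookup state machine.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
static int dyn_lookup_init(struct dyn_lookup *l, size_t nctx, size_t nseg)
{
    size_t i;

    l->contexts = NULL;
    l->context_count = 0;
    if (nctx == 0 || nseg == 0)
        return -1;

    l->contexts = calloc(nctx, sizeof(*l->contexts));
    if (l->contexts == NULL)
        return -1;
    l->context_count = nctx;

    for (i = 0; i < nctx; i++) {
        l->contexts[i].segments = calloc(nseg, sizeof(struct dyn_segment));
        if (l->contexts[i].segments == NULL) {
            dyn_lookup_free(l);
            return -1;
        }
        l->contexts[i].segment_count = nseg;
    }
    return 0;
}
```

With this layout the per-request struct only carries a pointer and a
count for the lookup case, and non-lookup operations allocate nothing
extra.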

-sam

> -Phil
>


