[Pvfs2-developers] bmi questions
Sam Lang
slang at mcs.anl.gov
Fri Aug 18 00:58:41 EDT 2006
On Aug 17, 2006, at 7:49 PM, Pete Wyckoff wrote:
> slang at mcs.anl.gov wrote on Thu, 17 Aug 2006 18:14 -0500:
>> * BMI memory allocation. Do we place any restrictions on when or how
>> frequently BMI_memalloc is called? In the pvfs code, we always call
>> BMI_memalloc for a post_send or post_recv. Would it be possible to
>> avoid the malloc on the client for a write and just use the user
>> buffer? Or should we mandate that calls to post_send and post_recv
>> always pass in a pointer from BMI_memalloc? (as a side note, if we
>> make that mandate, maybe we should have a BMI_buffer type that
>> memalloc returns and post_send/post_recv accept).
>
> Both bmi_ib and bmi_gm define the BMI memalloc method to do
> something other than simply malloc(). In the IB case, it pins the
> memory early, and never unpins it until the corresponding
> BMI_memfree() happens. This is better than letting BMI do the
> pinning explicitly, as it moves some of the messaging work out of
> the critical path, if you can arrange to alloc/free before you do
> send/recv.
>
> Note that these alloc routines only do something special if the
> buffer is big enough to be "worth it" (8 kB for IB).
>
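[Editor's sketch] The size cutoff described above can be illustrated roughly as follows. This is only a minimal model, not the bmi_ib code: every name here (sketch_buf, sketch_memalloc, sketch_memfree, SKETCH_PIN_THRESHOLD) is hypothetical, pinning is simulated with a counter rather than a real registration call, and treating a buffer of exactly 8 kB as "big enough" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed cutoff: the 8 kB figure quoted for IB in the message. */
#define SKETCH_PIN_THRESHOLD (8 * 1024)

/* Hypothetical buffer handle; not a real BMI type. */
struct sketch_buf {
    void *ptr;
    size_t len;
    int pinned;   /* 1 if the (simulated) pinning was done at alloc time */
};

/* Stand-in for the per-region bookkeeping that real pinning would create. */
static int sketch_pinned_regions = 0;

/* Allocate a buffer; pin it early only if it is large enough to be worth it. */
static int sketch_memalloc(struct sketch_buf *b, size_t len)
{
    assert(b != NULL);
    b->ptr = malloc(len ? len : 1);
    if (b->ptr == NULL)
        return -1;
    b->len = len;
    b->pinned = (len >= SKETCH_PIN_THRESHOLD);
    if (b->pinned)
        sketch_pinned_regions++;   /* a real transport would register memory here */
    return 0;
}

/* Free the buffer; the region stays pinned until this point. */
static void sketch_memfree(struct sketch_buf *b)
{
    assert(b != NULL);
    if (b->pinned)
        sketch_pinned_regions--;   /* a real transport would unregister here */
    free(b->ptr);
    b->ptr = NULL;
    b->len = 0;
    b->pinned = 0;
}
```

In this model a 1 kB request behaves like plain malloc(), while a 64 kB request is counted as pinned from allocation until the matching free, which is the part that moves work out of the send/recv path.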
> There are no restrictions on how frequently you can call these things.
> Each pinned memory region has some overhead in terms of in-pvfs data
> structures, in-kernel data structures, and on-NIC data structures.
> Ideally we'd try to limit the growth of these things and force old
> entries to be freed, but in practice they mostly just grow and it's
> not a big problem (unless you have lots of pvfs apps on a single
> box, for instance).
>
> You can certainly avoid the malloc and use the user buffer when you
> have one instead. I think this is the common case for MPI-IO
> operations. Point out what case you're talking about and I'll take
> a look.
It looks like the mem_to_bmi code (client write) in flow always does
a memalloc for the intermediate buffer and then copies the user
buffer into that. On reads (bmi_to_mem), flow does use the client's
buffer, so I guess that's a case that doesn't do memalloc. I wonder
if the copy on a client write could be avoided as well though.
>
> We definitely cannot mandate that all memory is BMI_memalloc-ed.
> Arbitrary MPI_File_write() and similar will pass in user buffers.
> We don't want to copy them into BMI_memalloc-ed memory, and it's not
> really practical to require that application writers use the MPI (or
> PVFS) alloc routines.
>
> If the bmi_buffer_type argument to the post_send and post_recv
> routines is BMI_PRE_ALLOC, a BMI implementation can avoid pinning
> the memory, as does GM. For IB, it's just as fast to check the
> address to see if it has already been pinned, either through
> memalloc or implicitly by having been used as a user buffer.
>
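[Editor's sketch] The "check the address to see if it has already been pinned" idea can be modeled with a small table of registered regions. This is an assumption-laden illustration rather than the actual bmi_ib data structure: all names (sketch_region, sketch_is_pinned, sketch_pin_if_needed) are hypothetical, the table is a fixed-size array with a linear scan, and no real memory registration takes place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_REGIONS 16

/* Hypothetical record of one pinned address range. */
struct sketch_region {
    uintptr_t start;
    size_t len;
};

static struct sketch_region sketch_regions[SKETCH_MAX_REGIONS];
static size_t sketch_nregions = 0;

/* Return 1 if [buf, buf+len) lies entirely inside a known pinned region. */
static int sketch_is_pinned(const void *buf, size_t len)
{
    uintptr_t a = (uintptr_t)buf;
    for (size_t i = 0; i < sketch_nregions; i++) {
        const struct sketch_region *r = &sketch_regions[i];
        /* Written to avoid overflow in a + len. */
        if (a >= r->start && len <= r->len && a - r->start <= r->len - len)
            return 1;
    }
    return 0;
}

/*
 * Called on the send/recv path for any buffer, whether it came from the
 * allocator or straight from the user.
 * Returns 0 if the range was already pinned, 1 if it was pinned now
 * (the implicit case), -1 if the table is full.
 */
static int sketch_pin_if_needed(const void *buf, size_t len)
{
    assert(buf != NULL);
    if (sketch_is_pinned(buf, len))
        return 0;
    if (sketch_nregions == SKETCH_MAX_REGIONS)
        return -1;   /* a real cache would evict an old entry instead */
    sketch_regions[sketch_nregions].start = (uintptr_t)buf;
    sketch_regions[sketch_nregions].len = len;
    sketch_nregions++;
    return 1;
}
```

The first use of a user buffer pays the pinning cost; later sends from the same buffer, or from a sub-range of it, hit the table and skip it. The -1 branch corresponds to the unbounded-growth concern raised earlier in the message.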
Sounds cool. Thanks for the good explanation.
-sam
> -- Pete
>