[Pvfs2-developers] libpvfs2 usage

Sam Lang slang at mcs.anl.gov
Wed Oct 18 17:21:28 EDT 2006


On Oct 18, 2006, at 3:57 PM, Pete Wyckoff wrote:

> slang at mcs.anl.gov wrote on Wed, 18 Oct 2006 13:38 -0500:
>> We do what seems like unnecessary conversion from the output of
>> PINT_process_request (offsets and sizes into a buffer) to the input
>> of BMI_post_send_list (array of memory pointers).  Could we change
>> the BMI_post_send_list interface to take offsets instead?
>
> We need the list of buffers in case post_send_list wants to use bits
> of memory from separate allocations.  But we could also pass in the
> "parent buffer" if one exists, or all of them if multiple ones
> exist.  This gets messy fast.
>
> Another way to do it would be for the flow code to make a separate
> call to BMI to say "here's the big buffer I'm using".  But then to
> be completely correct it should probably say "I'm done with this big
> buffer" later.
>
> It's all a massive layering violation.  Calls out for some sort of
> unified approach to buffer management that spans layers.
>
> Here's what I'm sort of thinking now, but I'm not sure.  It kind of
> curdles the stomach contents.  Tell me if you think of a better way.
>
> Declare this:
>
> 	struct bmi_optimistic_buffer_info {
> 	    const void *buffer;
> 	    bmi_size_t len;
> 	    enum PVFS_io_type rw;
> 	};
>
> and hang one off the sm u.io.  Initialize in PVFS_isys_io by:
>
> 	sm_p->u.io.binfo.buffer = sm_p->u.io.buffer;
> 	sm_p->u.io.binfo.len = PINT_REQUEST_TOTAL_BYTES(sm_p->u.io.mem_req);
> 	sm_p->u.io.binfo.rw = sm_p->u.io.io_type;
>
> Then in io_post_flow() somewhere just before the job_flow():
>
> 	BMI_set_info(cur_ctx->msg.svr_addr, BMI_OPTIMISTIC_BUFFER_REG,
> 	             &sm_p->u.io.binfo);
>
> (called once for each server), and deep in
> io_datafile_complete_operations(), near the IO_SM_PHASE_FLOW
> handling, undo it for that particular server with:
>
> 	BMI_set_info(cur_ctx->msg.svr_addr, BMI_OPTIMISTIC_BUFFER_DEREG,
> 	             &sm_p->u.io.binfo);
>
> Under the hood BMI can choose to preserve its actual registration
> beyond the flow lifetimes of course.  The main benefit of doing this
> will be that BMI can recognize that a particular buffer is part of
> this bigger registration and avoid having the 900 separate little
> 64 kB registrations in favor of a single 36 MB registration.
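
For what it's worth, the "recognize that a particular buffer is part of
this bigger registration" step could be a plain range check.  A minimal
sketch only: struct big_reg and reg_covers() are made-up names for
illustration, not anything that exists in BMI today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one large registration; not a real BMI type. */
struct big_reg {
    const char *base;   /* start of the registered parent buffer */
    size_t len;         /* length of the registration in bytes */
};

/* Return 1 if [buf, buf + len) lies entirely inside the registration. */
int reg_covers(const struct big_reg *r, const void *buf, size_t len)
{
    uintptr_t start, p;

    assert(r != NULL && buf != NULL);
    start = (uintptr_t) r->base;
    p = (uintptr_t) buf;
    if (p < start || len > r->len)
        return 0;
    /* Written as a subtraction so that p + len cannot overflow. */
    return (p - start) <= (r->len - len);
}
```

Each small buffer handed to post_send_list would be checked this way
first, and only registered on its own if no parent registration covers it.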

Rob pointed out that even with one buffer pointer passed into
PVFS_sys_io, if the request is non-contiguous, the offsets in the
request could have been calculated from different buffer pointers.
So we don't know how many separate buffers the memory request
covers, only that there are at most as many buffers as contiguous
regions in the request.  The PINT_process_request code (potentially)
breaks those buffers up even further, based on the distribution
parameters (like strip size), before passing the pointers to BMI.
It seems like we could do what you're suggesting, but we would have
to do it for each contiguous region of the request.  Maybe that's
not such a big deal?  Not sure...
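
Roughly, the per-region approach could first merge regions that happen
to be adjacent, so a single buffer described as neighboring pieces still
ends up as one registration.  This is a sketch only: struct mem_region
and coalesce_regions() are hypothetical names, and it assumes the
regions are already sorted by address and do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one contiguous piece of the memory request. */
struct mem_region {
    char *base;
    size_t len;
};

/*
 * Merge regions that touch end-to-start, in place, and return the new
 * count.  Assumes the input is sorted by address and non-overlapping.
 */
size_t coalesce_regions(struct mem_region *r, size_t n)
{
    size_t out = 0;
    size_t i;

    assert(r != NULL || n == 0);
    if (n == 0)
        return 0;
    for (i = 1; i < n; i++) {
        if (r[out].base + r[out].len == r[i].base) {
            /* Region i starts where the current one ends: extend it. */
            r[out].len += r[i].len;
        } else {
            /* Gap between regions: start a new output entry. */
            out++;
            r[out] = r[i];
        }
    }
    return out + 1;
}
```

Each region left after the merge would then get its own register and
deregister call, in place of the single call per server in the proposal
above.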

-sam


>
> 		-- Pete
>


