[Pvfs2-developers] libpvfs2 usage
Troy Benjegerdes
troy at scl.ameslab.gov
Wed Oct 18 16:09:46 EDT 2006
> The bigger problem is the same one seen by most applications that
> use networks that require memory registration: program semantics do
> not require users to register memory but underlying hardware does,
> thus something has to patch that gap. If you reg/dereg around every
> transfer, things are very slow. Hence we go with caching in some
> middle layer to fix this up. The same is true for MPI as well.
> (The Netpipe guys had a way to cause lots of damage by sending lots
> of little buffers rather than one big one, I recall.)
The NetPIPE guy(s), which is me right now, do this by ping-ponging a
message of, say, 128 KB, but sending it from a different address each
time. This beats up memory registration caches nicely ;)
We call this NetPIPE's cache-invalidate mode. It was originally
written to defeat data left over in CPU caches, but it works quite
nicely to break other caches as well.
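To make the failure mode concrete, here is a toy model in Python (not NetPIPE's actual code). A hypothetical LRU registration cache (`RegCache`) stands in for the middle layer, and a ping-pong loop either reuses one buffer or, as in cache-invalidate mode, rotates through more distinct buffers than the cache can hold. Each miss stands in for an expensive register/deregister pair. The cache capacity, the base address and the 128 KB message size are illustrative assumptions.

```python
from collections import OrderedDict


class RegCache:
    """Hypothetical LRU cache of memory registrations, keyed by buffer address."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> placeholder registration handle
        self.hits = 0
        self.misses = 0  # each miss models an expensive register call

    def lookup(self, addr):
        if addr in self.entries:
            self.entries.move_to_end(addr)  # mark as most recently used
            self.hits += 1
            return self.entries[addr]
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict LRU entry (models a deregister)
        self.entries[addr] = ("mr", addr)  # stand-in for a real registration handle
        return self.entries[addr]


def ping_pong(cache, iterations, nbuffers, msg_size=128 * 1024):
    """Send from one buffer (nbuffers=1) or rotate through nbuffers distinct ones."""
    base = 0x10000000  # arbitrary illustrative address
    for i in range(iterations):
        addr = base + (i % nbuffers) * msg_size
        cache.lookup(addr)
    return cache.hits, cache.misses
```

With a capacity of 16, `ping_pong(RegCache(16), 1000, 1)` returns `(999, 1)`, while rotating through 64 buffers returns `(0, 1000)`: under LRU, a cyclic access pattern larger than the cache evicts every entry just before it is reused, so every transfer pays for registration.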
When run in cache-invalidate mode, the NetPIPE PVFS module effectively
writes sequentially, but from a different buffer every time, and we
end up seeing the same behavior, including breakage on the ehca.
>
> By the way, various groups keep rediscovering this problem but there
> are no real appealing fixes. When was the last time you saw anybody
> use MPI_Alloc_mem? :) We discovered it ourselves in the context of
> PVFS back in 2003 or thereabouts, and took a stab at fixing it, but
> didn't quite complete the work needed to fully integrate it.
> (Wuj's Unifier framework (CCGrid04):
> http://www.osc.edu/~pw/papers/wu-unifier-ccgrid04.pdf
> )
>
> -- Pete
The solution for a kernel hacker like me is obvious: let the OS
kernel's memory management and the network driver handle memory
pinning and the interaction with the hardware. That way an application
can simply ask the OS to register its entire memory space, and the
kernel can deal with keeping it all pinned down, unpinning something
when it needs to.
The catch is that this requires the hardware to support keeping an
address registered but *not* physically pinned, and to *ask nicely*,
via the OS page-fault handler, for the page to be pinned back down if
something comes in for it. This seems to be an idea that RDMA hardware
designers just can't wrap their heads around. I guess they are too
used to dealing with OSes that never change.
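The proposal can be sketched as a small Python simulation. Everything here is hypothetical (`Kernel`, `Nic`, `register_all`, `reclaim` and `pin_fault` are illustrative names, not a real RDMA or kernel API). The application registers its whole address space once. The kernel may later unpin a page under memory pressure without deregistering it. The adapter, on finding a registered but unpinned page, calls back into the OS to re-pin it instead of failing the transfer.

```python
class Page:
    def __init__(self):
        self.registered = False  # known to the NIC's translation table
        self.pinned = False      # physically resident and locked


class Kernel:
    """Hypothetical OS that owns pinning decisions for NIC-registered memory."""

    def __init__(self, npages):
        self.pages = [Page() for _ in range(npages)]
        self.faults = 0

    def register_all(self):
        # The application asks once: register the entire address space.
        for p in self.pages:
            p.registered = True
            p.pinned = True

    def reclaim(self, idx):
        # Under memory pressure the kernel unpins a page; it stays registered.
        self.pages[idx].pinned = False

    def pin_fault(self, idx):
        # Invoked when the NIC "asks nicely" for a registered but unpinned page.
        self.faults += 1
        self.pages[idx].pinned = True


class Nic:
    """Hypothetical adapter that tolerates registered-but-unpinned pages."""

    def __init__(self, kernel):
        self.kernel = kernel

    def dma_write(self, idx):
        page = self.kernel.pages[idx]
        if not page.registered:
            raise PermissionError("page %d not registered" % idx)
        if not page.pinned:
            # In this sketch the OS re-pins synchronously before the write proceeds.
            self.kernel.pin_fault(idx)
        return True
```

For example, after `k = Kernel(4)`, `k.register_all()` and `k.reclaim(2)`, a call to `Nic(k).dma_write(2)` triggers one fault and leaves page 2 pinned again; a second write to the same page triggers none. Real hardware would have to stall or retry the transfer while the fault is serviced, which is exactly the support described above.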