[Pvfs2-developers] server flow post
pw at osc.edu
Fri Jan 12 10:03:04 EST 2007
atchley at myri.com wrote on Fri, 12 Jan 2007 09:55 -0500:
> In TCP, the OS will receive and buffer the data. There is always a
> copy regardless of whether you pre-post the receive. Are you asking
> which is faster between memory copy and network transfer? If so, I
> would think that the memory copy is always faster. Given that, the
> current strategy (ack, then post the flow) makes the most sense.
> In IB, I believe that A cannot write/put a large message to B until B
> has allocated memory and sent the memory address to A. This is why
> bmi_ib needs the RTS and CTS messages.
> MX does this internally. When A posts a large send, MX sends a
> "scout" message, which is equivalent to the RTS, to B that includes
> the matching info and length. If B has posted a receive, then B
> replies with an ack and A can start sending data. If B has not posted
> a receive, then the scout message goes into the unexpected queue.
> When B does post a matching receive, it then has to scan the
> unexpected queue to see if it has already arrived. If so, it matches
> and sends an ack to start the data transfer.
> By pre-posting the receives, we eliminate scanning a potentially
> very long unexpected queue (I am thinking of the case of a storage
> server handling 10s or 100s of clients).
> If you pre-post the receives, then in the IB case you could send all
> of that data in the ack to the initial sendunexpected and potentially
> eliminate the RTS and CTS messages as well.
> Pete, I could possibly be smoking something and this is not possible
> in IB at all. Any thoughts?
> Sam, it may be that I am trying to optimize something that will not
> provide much benefit at all. Can you send a patch that simply posts
> the flow before the ack? I can test it on MX-10G and see if it
> impacts performance at all. If not, leave things as they are.
I think that all makes sense. Agree that the need for preposting
receives is to avoid big queues of waiting unexpected messages.
(Doubt anyone will bother to coalesce the sendunexpected ack and CTS,
as that's some complexity to save one little message.)
A long time ago I suggested that we mandate that BMI users must
prepost all receives, but this was rejected (reasonably) in that it
makes app programming more difficult. Instead I had to go and
implement RTS/CTS, and MX has to use its scout messages. These
things are fine to do, but we can avoid some performance overheads
by still trying to use preposted receives where possible, especially
in hot paths like IO flows.
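
To make the queue-scanning cost concrete, here is a rough toy model of
the receiver-side matching described above. It is only a sketch, not MX
or bmi_ib code: ToyReceiver, on_scout, post_recv and run are hypothetical
names, tags are plain integers, posted receives are matched with a dict
lookup, and the late-posting case deliberately posts in reverse arrival
order to show the worst case for a linear scan.

```python
class ToyReceiver:
    """Toy model of receiver-side rendezvous matching (not real MX or BMI code)."""

    def __init__(self):
        self.posted = {}      # tag -> True for receives posted before their scout
        self.unexpected = []  # (tag, length) scouts that found no posted receive
        self.scanned = 0      # unexpected-queue entries examined so far
        self.acks = []        # tags for which an ack (CTS) has been sent

    def on_scout(self, tag, length):
        """The sender's scout (RTS) arrives carrying matching info and length."""
        if self.posted.pop(tag, None):
            self.acks.append(tag)                  # pre-posted: ack immediately
        else:
            self.unexpected.append((tag, length))  # no receive yet: queue it

    def post_recv(self, tag):
        """The application posts a receive and checks for an earlier scout."""
        for i, (queued_tag, _length) in enumerate(self.unexpected):
            self.scanned += 1
            if queued_tag == tag:
                del self.unexpected[i]
                self.acks.append(tag)              # late match: ack now
                return
        self.posted[tag] = True                    # nothing waiting: remember it


def run(n_clients, prepost):
    """Return (queue entries scanned, acks sent) for n_clients large sends."""
    rx = ToyReceiver()
    tags = list(range(n_clients))
    if prepost:
        for tag in tags:
            rx.post_recv(tag)          # unexpected queue is empty, nothing to scan
    for tag in tags:
        rx.on_scout(tag, 1 << 20)      # one 1 MiB send announced per client
    if not prepost:
        for tag in reversed(tags):     # assumed worst case: match is always last
            rx.post_recv(tag)
    return rx.scanned, len(rx.acks)


if __name__ == "__main__":
    print("pre-posted: ", run(100, prepost=True))
    print("late-posted:", run(100, prepost=False))
```

In this model every send is acked either way, but the pre-posted run
never examines an unexpected-queue entry, while the late-posted run
examines a number of entries that grows quadratically with the client
count. A real implementation may match more cleverly than a linear scan,
so treat the numbers as illustrative only.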