[Pvfs2-developers] server flow post

Sam Lang slang at mcs.anl.gov
Fri Jan 12 12:52:22 EST 2007


On Jan 12, 2007, at 9:03 AM, Pete Wyckoff wrote:

> atchley at myri.com wrote on Fri, 12 Jan 2007 09:55 -0500:
>> In TCP, the OS will receive and buffer the data. There is always a
>> copy regardless of whether you pre-post the receive. Are you asking
>> which is faster between memory copy and network transfer? If so, I
>> would think that the memory copy is always faster. Given that, the
>> current strategy (ack, then post the flow) makes the most sense.
>>
>> In IB, I believe that A cannot write/put a large message to B until B
>> has allocated memory and sent the memory address to A. This is why
>> bmi_ib needs the RTS and CTS messages.
>>
>> MX does this internally. When A posts a large send, MX sends a
>> "scout" message, which is equivalent to the RTS, to B that includes
>> the matching info and length. If B has posted a receive, then B
>> replies with an ack and A can start sending data. If B has not posted
>> a receive, then the scout message goes into the unexpected queue.
>> When B does post a matching receive, it then has to scan the
>> unexpected queue to see if it has already arrived. If so, it matches
>> and sends an ack to start the data transfer.
>>
>> By pre-posting the receives, we eliminate scanning a potentially
>> very long unexpected queue (I am thinking of the case of a storage
>> server handling 10s or 100s of clients).
>>
>> If you pre-post the receives, then in the IB case you could send all
>> of that data in the ack to the initial sendunexpected and potentially
>> eliminate the RTS and CTS messages as well.
>>
>> Pete, I could possibly be smoking something and this is not possible
>> in IB at all. Any thoughts?
>>
>> Sam, it may be that I am trying to optimize something that will not
>> provide much benefit at all. Can you send a patch that simply posts
>> the flow before the ack? I can test it on MX-10G and see if it
>> impacts performance at all. If not, leave things as they are.
>
> I think that all makes sense.  Agree that the need for preposting
> receives is to avoid big queues of waiting unexpected messages.

Hi Scott,

The attached patch posts the flow (and receives) before posting the  
send of the response ack.  I had originally posted the response ack before the flow,
because the first call to BMI_memalloc (with a request for 1MB in  
your case) happens in the flow post call, so that delays posting of  
the response ack.  I'm curious if this will actually give you better  
performance.

I also fixed that assert failure you were getting.  Let me know if  
this works for you.

Thanks,

-sam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: io-flow-post.patch
Type: application/octet-stream
Size: 16727 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20070112/a8c5d97e/io-flow-post.obj
-------------- next part --------------

>
> (Doubt anyone will bother to coalesce the sendunexpected ack and CTS,
> as that's some complexity to save one little message.)
>
> A long time ago I suggested that we mandate that BMI users must
> prepost all receives, but this was rejected (reasonably) in that it
> makes app programming more difficult.  Instead I had to go and
> implement RTS/CTS, and MX has to use its scout messages.  These
> things are fine to do, but we can avoid some performance overheads
> by still trying to use preposted receives where possible, especially
> in hot paths like IO flows.
>
> 		-- Pete


