[Pvfs2-developers] BMI questions
Scott Atchley
atchley at myri.com
Fri Dec 1 13:26:43 EST 2006
On Dec 1, 2006, at 12:53 PM, Sam Lang wrote:
>>> It looks like the flow code on the server doesn't actually post
>>> the next recv of IO (IO2) until the first recv (IO1) has completed,
>>> so it's possible that the client posts (and starts) the
>>> next send before the server posts the next receive, although it's
>>> probably unlikely.
>>
>> If IO operations are always > 32 KB, I would agree. But if any are
>> <= 32 KB, MX will buffer them on the send side and complete them
>> immediately. The client could then post another send even if MX is
>> in the middle of delivering the first one. I can override this
>> behavior (use mx_issend()) or use credits for flow control.
>
> Hm...these particular IOs are going to post BMI_send calls > 32 KB.
> If the IO is smaller than that, we probably want to pack the IO into
> the first request. We call that eager mode, and you would need to have
> BMI_get_info(BMI_GET_UNEXP_SIZE) return 32K.
The reason I mention 32 KB is that it is a magic (albeit adjustable)
number in MX that determines when MX switches from sending messages
eagerly to using rendezvous. I do not necessarily want to tie the
maximum unexpected message size to that value (bmi_ib uses 8 KB, for
example).
If IO calls are always larger than 32 KB, then MX will use the
rendezvous protocol, and I do not have to worry about the server being
overwhelmed by sends arriving before the matching receive is posted
(in rendezvous mode, the client sends only the header info, and the
data stays on the client until the server indicates it is ready for
the payload).
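A minimal C sketch of the size-based decision described above, assuming the default 32 KB threshold. EAGER_LIMIT, choose_protocol() and send_completes_before_recv_posted() are hypothetical names, not part of the MX API. Messages at or below the threshold complete on the send side, so a client can post its next send before the server has posted a matching receive; larger messages cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant: MX's eager/rendezvous switch point, assumed to
 * be the 32 KB value discussed above (it is adjustable in MX). */
#define EAGER_LIMIT (32u * 1024u)

typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } protocol_t;

/* Up to and including EAGER_LIMIT: the sender buffers the payload and
 * the send completes locally. Above it: only a header goes out, and the
 * payload stays in the sender's buffer until the receiver is ready. */
static protocol_t choose_protocol(size_t len)
{
    return len <= EAGER_LIMIT ? PROTO_EAGER : PROTO_RENDEZVOUS;
}

/* Can the sender treat the send as done (and post the next one) before
 * the receiver has posted its matching receive? */
static int send_completes_before_recv_posted(size_t len)
{
    return choose_protocol(len) == PROTO_EAGER;
}
```

This is why only the <= 32 KB case raises the concern about sends piling up ahead of posted receives.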
> In either case it sounds like it's possible for a bunch of client
> sends to get posted, and a bunch of server receives to get posted,
> without any of them actually completing. Is it possible to sort
> all that out if the same tag is specified for all of them?
In MX, matching is done in order so if they use the same tag, then
send[0] should match against recv[0], send[1] matches recv[1], etc.
If it doesn't, we will fix it. ;-)
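To illustrate what in-order matching means for a flow that reuses one tag, here is a simplified C model. It is not MX's implementation; recv_queue, post_recv() and match_send() are hypothetical names. Each arriving send is matched against the oldest posted receive with the same tag, so send[i] lands in recv[i].

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PENDING 16  /* hypothetical queue depth for this sketch */

/* One posted receive: the tag it matches and an id identifying it. */
typedef struct { uint64_t tag; int recv_id; } posted_recv;

typedef struct {
    posted_recv slots[MAX_PENDING];  /* kept in posting order */
    int count;
} recv_queue;

static void post_recv(recv_queue *q, uint64_t tag, int recv_id)
{
    assert(q->count < MAX_PENDING);
    q->slots[q->count].tag = tag;
    q->slots[q->count].recv_id = recv_id;
    q->count++;
}

/* Match an incoming send against the OLDEST posted receive with the
 * same tag and remove it. Returns that receive's id, or -1 if none is
 * posted yet (a real library would hold the message as unexpected). */
static int match_send(recv_queue *q, uint64_t tag)
{
    for (int i = 0; i < q->count; i++) {
        if (q->slots[i].tag == tag) {
            int id = q->slots[i].recv_id;
            /* shift later entries down so posting order is preserved */
            for (int j = i + 1; j < q->count; j++)
                q->slots[j - 1] = q->slots[j];
            q->count--;
            return id;
        }
    }
    return -1;
}
```

Posting three receives with the same tag and then delivering three sends yields receive ids 0, 1, 2 in that order.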
>>> Each BMI receive uses a separate buffer (up to a max of 8 buffers).
>>
>> Does this mean that at most, the client will post 8 IO sends per
>> operation?
>
> The 8-buffer limit is specified by the FlowBuffersPerFlow config
> option, and it just limits the number of buffers that can be
> allocated on the server (and hence the number of outstanding BMI
> operations for a particular IO). In the diagram I sent in the
> previous email, each IOn would have had an associated buffer. When
> it gets to 8, no more BMI_post_recv calls are made until one of the
> TROVE_post_write calls has completed (freeing up one of the
> buffers). None of that changes the behavior on the client, since
> the client uses the user buffer. It posts another send each time a
> previous send completes.
Ok.
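A rough C model of the server-side flow as Sam describes it, assuming one receive is posted up front and ignoring the client side. flow_state, try_post_recv(), on_recv_complete() and on_write_complete() are hypothetical names, and the real PVFS2 flow code is more involved. Each completed receive hands its buffer to a storage write and tries to post another receive; once all buffers are busy, the next receive waits for a write to finish.

```c
#include <assert.h>

/* Hypothetical stand-in for the FlowBuffersPerFlow setting (8 in this thread). */
#define FLOW_BUFFERS_PER_FLOW 8

typedef struct {
    int buffers_in_use;   /* buffers holding a posted recv or a pending write */
    int recvs_posted;     /* outstanding network receives */
    int writes_posted;    /* outstanding storage writes */
    int recvs_remaining;  /* receives this IO still needs to post */
} flow_state;

/* Post one more receive if the IO needs one and a buffer is available. */
static void try_post_recv(flow_state *f)
{
    if (f->recvs_remaining > 0 && f->buffers_in_use < FLOW_BUFFERS_PER_FLOW) {
        f->buffers_in_use++;
        f->recvs_posted++;
        f->recvs_remaining--;
    }
}

/* A receive finished: its buffer now feeds a storage write, and we try
 * to post the next receive (only possible if another buffer is free). */
static void on_recv_complete(flow_state *f)
{
    assert(f->recvs_posted > 0);
    f->recvs_posted--;
    f->writes_posted++;
    try_post_recv(f);
}

/* A storage write finished: its buffer is released, which may unblock
 * a receive that was held back by the buffer limit. */
static void on_write_complete(flow_state *f)
{
    assert(f->writes_posted > 0);
    f->writes_posted--;
    f->buffers_in_use--;
    try_post_recv(f);
}
```

With 20 receives needed and no writes completing, eight receive completions fill all eight buffers and leave no receive posted; the first write completion lets the next receive go out.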
>>> Every time a BMI recv completes, two things happen: the
>>> associated Trove write is posted, and a new BMI recv is posted.
>>> So over time, BMI receives will get posted at the server before
>>> BMI sends get posted at the client, but the second and maybe
>>> third BMI receives may be posted after the BMI sends at
>>> the client.
>>>
>>> To answer your specific questions:
>>>
>>> The same BMI tag is passed to each of the post_send and post_recv
>>> calls for the entire IO operation.
>>
>> I can live with this as long as only one receive is posted at a
>> time using a specific tag.
>
> Hm...we actually do post multiple receives using the same tag. All
> BMI messages for a given IO operation get the same tag.
As mentioned above, this should work. My statement that only one
receive should be posted at a time was not well thought out. ;-)
> No, that's partly my own confusion. We post unexpected jobs in the
> server, but this doesn't translate to a posted receive for
> unexpected messages in BMI. We just set up a queue for completed
> unexpected BMI messages, and populate it once BMI_testunexpected
> returns something.
>
> -sam
Ok.
Thanks,
Scott