[Pvfs2-developers] help debugging request processor/distribution
Rob Ross
rross at mcs.anl.gov
Tue Jun 13 18:47:41 EDT 2006
There's a fundamental issue here that I don't quite get: if we're in
rendezvous mode, why is there data on the wire if we aren't ready to
receive it? The whole point of rendezvous mode is to *not* send the data
until the matching receive has been posted.
What am I missing?
Thanks,
Rob
Phil Carns wrote:
> Ok, I think I _might_ see what the problem is with the BMI messaging.
>
> I haven't 100% confirmed yet, but it looks like we have the following
> scenario:
>
> On the client side:
> --------------------
> - pvfs2-client-core starts an I/O operation (write) to server X
> - a send (for the request) is posted, which is a small buffer
> - the flow is posted before an ack is received
> - the flow itself posts another send for data, which is a large buffer
> - ...
>
> A quick note: I think the above is a performance optimization;
> we try to go ahead and get the flow going before receiving a positive
> ack from the server. It will be canceled if we get a negative ack (or
> fail to get an ack altogether).
>
> - while the above is in progress, pvfs2-client-core starts another write
> operation to server X (from another application that is hitting the same
> server)
> - a send for this second request is posted
> - another flow is posted before an ack is received
> - depending on the timing, it may manage to post a send for data as
> well, which is another large buffer
> - this traffic is interleaved on the same socket as is being used for
> the first flow, which is still running at this point
>
> On the server side:
> --------------------
> - the first I/O request arrives
> - it gets past the request scheduler
> - a flow is started and receives the first (large) data buffer
> - a different request for the same handle arrives
> - getattr would be a good example, could be from any client
> - this getattr gets queued in the request scheduler behind the write
> - the second I/O request arrives
> - it gets queued behind the getattr in the request scheduler
>
> At this point on the server side, we have a flow in progress that is
> waiting on a data buffer. However, the next message is for a different
> flow (the tags don't match). Since this message is relatively large
> (256K), it is in rendezvous mode within bmi_tcp and cannot be pulled out
> of the socket until a matching receive is posted. The flow that is
> expected to post that receive is not running yet because the second I/O
> request is stuck in the scheduler.
>
> ... so we have a deadlock. The socket is filled with data that the
> server isn't allowed to recv yet, and the data that it really needs next
> is stuck behind it.
>
> I'm not sure that I described that all that well. At a high level we
> have two flows sharing the same socket. The client started both of them
> and the messages got interleaved. The server only started one of them,
> but is now stuck because it can't deal with data arriving for the second
> one.
>
> I am going to try to find a brute force way to serialize I/O from each
> pvfs2-client-core just to see if that solves the problem (maybe only
> allowing one buffer between pvfs2-client-core and kernel, rather than
> 5). If that looks like it fixes the problem, then we need a more
> elegant solution. Maybe waiting for acks before starting flows, or
> just somehow serializing flows that share sockets.
>
> -Phil
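
For anyone following along, here is a toy model of the scenario Phil describes. It is only a sketch, not PVFS2 or bmi_tcp code. It assumes one socket modeled as a FIFO, a request scheduler that is strictly FIFO per handle, small requests that can always be pulled off the socket, and large data messages that can only be pulled when the operation at the head of the scheduler has the matching tag. The getattr is merged into the same arrival sequence for simplicity, even though it could come from any client. All names (server_drains, needed, and so on) are made up for illustration.

```python
from collections import deque

def server_drains(wire, buffers_needed):
    """Return True if the server consumes every message, False if it stalls."""
    socket = deque(wire)   # messages in the order they sit in the socket
    scheduler = deque()    # operations on one handle, strictly FIFO (assumption)
    received = {}          # data buffers received so far, per tag
    while True:
        # Retire operations at the head of the scheduler that have all their data.
        while scheduler and received[scheduler[0]] >= buffers_needed[scheduler[0]]:
            scheduler.popleft()
        if not socket:
            return not scheduler
        kind, tag = socket[0]
        if kind == "req":
            # Small request: can always be pulled and queued.
            socket.popleft()
            scheduler.append(tag)
            received[tag] = 0
        elif scheduler and scheduler[0] == tag:
            # Rendezvous data whose flow is running: a matching receive is posted.
            socket.popleft()
            received[tag] += 1
        else:
            # Rendezvous data for a flow that has not started: the head of the
            # socket cannot be pulled, and everything behind it is stuck.
            return False

# Write 1 needs two data buffers, the getattr needs none, write 2 needs one.
needed = {1: 2, "getattr": 0, 2: 1}

# The client starts flow 2 before flow 1 has finished sending.
interleaved = [("req", 1), ("data", 1), ("req", "getattr"),
               ("req", 2), ("data", 2), ("data", 1)]

# The client finishes flow 1 before sending anything for flow 2.
serialized = [("req", 1), ("data", 1), ("data", 1),
              ("req", "getattr"), ("req", 2), ("data", 2)]

print("interleaved drains:", server_drains(interleaved, needed))
print("serialized drains:", server_drains(serialized, needed))
```

In this model the interleaved ordering stalls as soon as the data for the second flow reaches the head of the socket, while the serialized ordering drains completely. That matches the intuition behind serializing flows that share a socket, but it says nothing about whether that is the right fix in the real code.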