[Pvfs2-developers] Re: [Pvfs2-cvs] commit by slang in pvfs2/src/io/bmi/bmi_tcp: bmi-tcp.c sockio.c
Rob Ross
rross at mcs.anl.gov
Tue Dec 11 12:22:25 EST 2007
i was more concerned that we not lose track of awaiting data because
of not retrying, but i'm woefully uninformed on how poll()/epoll do
their thing at this point.
the solution you chose (in your later email) seems good to me.
rob
On Dec 11, 2007, at 10:52 AM, Sam Lang wrote:
>
> That seems possible. I did some reading and couldn't find any
> obvious reasons the kernel does this, but I think the basic answer
> is that it doesn't break the semantics of recv, as there aren't any
> semantic guarantees between results from poll and calls to recv.
> To investigate further I think would require looking at the kernel
> code, and while I'm interested in what's going on there, it's not
> something I want to dig into right now. The answer isn't going to
> change the behavior of the functions in any case. :-)
>
> I think Kevin's and maybe Rob's concerns are that recv would loop
> forever returning EAGAIN, and just from empirical evidence, it
> doesn't appear to, so I would go with that as the current solution.
> -sam
>
>
> On Dec 11, 2007, at 10:46 AM, Walter B. Ligon III wrote:
>
>> Shooting from the hip here, but is it possible EAGAIN might
>> indicate there is a structure locked in the socket - say if a
>> packet receive handler is running - which would block the call,
>> even though there ARE bytes in the socket?
>>
>> Walt
>>
>> Sam Lang wrote:
>>> I agree, Pete: it's messy. Just by the names of the errnos, it seems
>>> appropriate to return what's been completed if we get
>>> EWOULDBLOCK, while EAGAIN suggests we can just call recv again
>>> and get what we want. But as you point out, they're the same
>>> value. According to the Open Group, implementations _may_ assign
>>> the same value to both:
>>> http://www.opengroup.org/pubs/online/7908799/xsh/errors.html
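Because POSIX lets EAGAIN and EWOULDBLOCK be distinct values, a portable test for "no data yet" compares against both. A minimal sketch, assuming a C99 compiler; the helper name errno_would_block is hypothetical and not part of the PVFS2 source:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: true if err means "no data right now, try later".
 * POSIX permits EAGAIN and EWOULDBLOCK to differ, so test both when they
 * do; when they are equal (glibc on Linux) one comparison suffices. */
static bool errno_would_block(int err)
{
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    return err == EAGAIN || err == EWOULDBLOCK;
#else
    return err == EAGAIN;
#endif
}
```

The preprocessor check only removes a redundant comparison; writing `errno == EAGAIN || errno == EWOULDBLOCK` unconditionally is equally correct.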
>>> Strictly from a linux implementation perspective, epoll/poll tell
>>> us that the bytes are on the socket, so even when EAGAIN is
>>> returned, we can call recv again and get what we wanted. I've
>>> tested this a bunch, and when EAGAIN is returned (which is
>>> infrequent), the next call invariably returns successfully.
>>> There were two instances where the code looped up to around 200
>>> times on EAGAIN under heavy load. But looping does turn nbrecv
>>> into more of a brecv, although we avoid all the fcntl calls to
>>> turn the socket into a blocking one just for the recv call.
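For comparison, the fcntl-toggling approach that the loop avoids might look roughly like the sketch below. It assumes a socket that is normally non-blocking and uses MSG_WAITALL for the blocking read; set_nonblocking and brecv_toggle are hypothetical names, not functions from sockio.c. Each receive costs four extra fcntl system calls (get and set to clear O_NONBLOCK, then get and set to restore it):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical: set (on == true) or clear O_NONBLOCK on fd. Returns 0 or -1. */
static int set_nonblocking(int fd, bool on)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags);
}

/* Hypothetical blocking receive on a normally non-blocking socket:
 * switch to blocking mode, recv with MSG_WAITALL, then switch back.
 * Returns bytes received (may be short on EOF or a signal) or -1. */
static ssize_t brecv_toggle(int s, void *buf, size_t len)
{
    ssize_t ret;
    int saved_errno;

    if (set_nonblocking(s, false) == -1)
        return -1;

    do {
        ret = recv(s, buf, len, MSG_WAITALL);
    } while (ret == -1 && errno == EINTR);
    saved_errno = errno;

    /* Restore non-blocking mode; a failure here is reported as an error. */
    if (set_nonblocking(s, true) == -1)
        return -1;

    errno = saved_errno;
    return ret;
}
```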
>>> With the socket in non-blocking mode, the conditional:
>>>     if (ret == -1 && errno == EWOULDBLOCK)
>>>     {
>>>         return (len - comp); /* return amount completed */
>>>     }
>>> Just doesn't work. It causes the caller to error and close the
>>> socket. Not what we want.
>>> I think we can get away with doing:
>>>     if (!ret) /* socket closed */
>>>     {
>>>         errno = EPIPE;
>>>         return (-1);
>>>     }
>>>     if (ret == -1 && (errno == EINTR || errno == EAGAIN ||
>>>                       errno == EWOULDBLOCK))
>>>     {
>>>         goto nbrecv_restart;
>>>     }
>>>     else if (ret == -1)
>>>     {
>>>         return (-1);
>>>     }
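Folded back into the full function Pete quotes further down, the proposed change might look like this sketch. It assumes DEFAULT_MSG_FLAGS is 0 (the real value is defined elsewhere in sockio.c) and uses the hypothetical name nbrecv_retry so as not to imply this is the committed code. The goto is replaced by continue, which re-enters the loop with comp unchanged, so the behavior is the same:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Assumption: the real DEFAULT_MSG_FLAGS lives elsewhere in sockio.c. */
#ifndef DEFAULT_MSG_FLAGS
#define DEFAULT_MSG_FLAGS 0
#endif

/* Receive exactly len bytes from non-blocking socket s, retrying on
 * EINTR and EAGAIN/EWOULDBLOCK. Returns len, or -1 with errno set
 * (EPIPE if the peer closed). Spins while recv() reports EAGAIN. */
static int nbrecv_retry(int s, void *buf, int len)
{
    int comp = len;
    char *p = buf;

    assert(fcntl(s, F_GETFL, 0) & O_NONBLOCK);

    while (comp > 0)
    {
        ssize_t ret = recv(s, p, (size_t)comp, DEFAULT_MSG_FLAGS);
        if (ret == 0) /* socket closed */
        {
            errno = EPIPE;
            return -1;
        }
        if (ret == -1)
        {
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                continue; /* equivalent to goto nbrecv_restart */
            return -1;
        }
        comp -= (int)ret;
        p += ret;
    }
    return len - comp;
}
```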
>>> From a practical perspective, this seems to work, and an
>>> implementation that has poll telling us that bytes are ready, but
>>> recv returning EWOULDBLOCK because of anything other than small
>>> timing issues in the kernel seems broken anyway.
>>> The alternative is to return the bytes received with the errno,
>>> and on EAGAIN, we would have to add the operation back onto the
>>> op queue with a state variable of how much was received. The
>>> code is designed to avoid doing this in the first place by
>>> polling until the bytes we need are ready, so doing this would
>>> probably be messy.
>>> -sam
>>> On Dec 10, 2007, at 11:47 PM, Pete Wyckoff wrote:
>>>> rross at mcs.anl.gov wrote on Mon, 10 Dec 2007 21:19 -0600:
>>>>> while a loop will fix it, it would be really nice to understand how
>>>>> we get EAGAIN when we think that there are bytes there...
>>>> [..]
>>>>> On Dec 7, 2007, at 4:55 PM, Sam Lang wrote:
>>>>>> I'm seeing recv on a socket in non-blocking mode returning EAGAIN
>>>>>> occasionally, even though epoll has just told us there are bytes
>>>>>> waiting. I guess that's why the call was initially a blocking recv.
>>>>>> I can add a loop around the non-blocking recv while it returns
>>>>>> EAGAIN, unless someone can think of a better workaround.
>>>>
>>>> The function is getting a bit messy. I'm all for looping on E* and
>>>> thought Sam's original mail made sense. But on second glance:
>>>>
>>>> int BMI_sockio_nbrecv(int s,
>>>>                       void *buf,
>>>>                       int len)
>>>> {
>>>>     int ret, comp = len;
>>>>
>>>>     assert(fcntl(s, F_GETFL, 0) & O_NONBLOCK);
>>>>
>>>>     while (comp)
>>>>     {
>>>>       nbrecv_restart:
>>>>         ret = recv(s, buf, comp, DEFAULT_MSG_FLAGS);
>>>>         if (!ret) /* socket closed */
>>>>         {
>>>>             errno = EPIPE;
>>>>             return (-1);
>>>>         }
>>>>         if (ret == -1 && errno == EWOULDBLOCK)
>>>>         {
>>>>             return (len - comp); /* return amount completed */
>>>>         }
>>>>         if (ret == -1 && (errno == EINTR || errno == EAGAIN))
>>>>         {
>>>>             goto nbrecv_restart;
>>>>         }
>>>>         else if (ret == -1)
>>>>         {
>>>>             return (-1);
>>>>         }
>>>>         comp -= ret;
>>>>         buf = (char *)buf + ret;
>>>>     }
>>>>     return (len - comp);
>>>> }
>>>>
>>>> Note that we get from standard headers:
>>>>
>>>> /usr/include/asm-generic/errno.h:#define EWOULDBLOCK EAGAIN /* Operation would block */
>>>>
>>>> But maybe there are some systems where this is not true? Not ones
>>>> that use glibc, apparently.
>>>>
>>>> Anyway, the first use of EWOULDBLOCK runs us back to the poll
>>>> loop, which is the right thing to do. The second use of EAGAIN
>>>> would lead to a busy loop on recv()->EAGAIN that isn't quite so
>>>> nice. But that code never gets hit.
>>>>
>>>> I'm not sure that a poll readable result necessarily means we'll get
>>>> any bytes on the socket. There are numerous ways in which things
>>>> can get messy.
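One concrete, easily reproduced case: when the peer shuts down its sending side, poll() reports the socket readable, yet recv() returns 0 (end of file) rather than data. The sketch below uses a local AF_UNIX socketpair as a stand-in for a TCP connection; the function name is hypothetical:

```c
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a connected socket pair, has the peer shut down its write
 * side without sending anything, then checks the other end with poll()
 * and recv(). Stores poll's revents in *revents and returns recv()'s
 * result (0 means EOF), or -2 if setup or poll fails. */
static ssize_t poll_readable_at_eof(short *revents)
{
    int sv[2];
    char c;
    ssize_t ret = -2;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -2;

    if (shutdown(sv[1], SHUT_WR) == 0) {
        struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
        if (poll(&pfd, 1, 0) == 1) {
            *revents = pfd.revents;
            ret = recv(sv[0], &c, 1, 0); /* returns at once: EOF pending */
        }
    }
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

This is the case the `if (!ret)` branch in nbrecv turns into EPIPE.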
>>>>
>>>> -- Pete
>>>>
>>> ------------------------------------------------------------------------
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>> --
>> Dr. Walter B. Ligon III
>> Associate Professor
>> ECE Department
>> Clemson University
>>
>