[Pvfs2-developers] Re: [Pvfs2-cvs] commit by slang
in pvfs2/src/io/bmi/bmi_tcp: bmi-tcp.c sockio.c
Sam Lang
slang at mcs.anl.gov
Tue Dec 11 11:52:38 EST 2007
That seems possible. I did some reading and couldn't find any obvious
reasons the kernel does this, but I think the basic answer is that it
doesn't break the semantics of recv, as there aren't any semantic
guarantees between results from poll and calls to recv. Investigating
further would, I think, require looking at the kernel code, and while
I'm interested in what's going on there, it's not something I want to
dig into right now. The answer isn't going to change the behavior of
the functions in any case. :-)
I think Kevin's and maybe Rob's concern is that recv would loop
forever returning EAGAIN. Just from empirical evidence, it doesn't
appear to, so I would go with the retry loop as the current solution.
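
For anyone worried about the spin, here is a minimal sketch of a variant
that sleeps in poll() between retries instead of calling recv() again
right away. To be clear, this is not the code in sockio.c: nbrecv_wait,
is_retryable, demo_roundtrip and the timeout_ms parameter are hypothetical
names made up for illustration, and the per-wait timeout is an assumption
that nothing in bmi_tcp requires.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: errnos that mean "try again", not a hard failure.
 * EAGAIN and EWOULDBLOCK are the same value on Linux/glibc, but the
 * standard allows them to differ, so both are checked. */
static int is_retryable(int err)
{
    return err == EINTR || err == EAGAIN || err == EWOULDBLOCK;
}

/* Sketch only: read exactly len bytes from non-blocking socket s.
 * On EAGAIN it waits in poll() (up to timeout_ms per wait) rather than
 * spinning. Returns len on success, -1 on error. On error any bytes
 * already copied into buf are not reported to the caller. */
static ssize_t nbrecv_wait(int s, void *buf, size_t len, int timeout_ms)
{
    char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t ret = recv(s, p, left, 0);
        if (ret == 0) {               /* peer closed the connection */
            errno = EPIPE;
            return -1;
        }
        if (ret > 0) {                /* made progress, keep going */
            p += ret;
            left -= (size_t)ret;
            continue;
        }
        if (!is_retryable(errno))
            return -1;                /* real error, errno already set */
        if (errno == EINTR)
            continue;                 /* interrupted: retry immediately */

        /* Nothing to read yet: block in poll() until readable. */
        struct pollfd pfd = { .fd = s, .events = POLLIN };
        int n = poll(&pfd, 1, timeout_ms);
        if (n == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (n < 0 && errno != EINTR)
            return -1;
    }
    return (ssize_t)len;
}

/* Usage example: push 4 bytes through a socketpair and read them back.
 * Returns 0 on success, -1 on any failure. */
static int demo_roundtrip(void)
{
    int sv[2];
    char out[4];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int fl = fcntl(sv[0], F_GETFL, 0);
    if (fl < 0 || fcntl(sv[0], F_SETFL, fl | O_NONBLOCK) != 0)
        return -1;
    if (write(sv[1], "ping", 4) != 4)
        return -1;
    ssize_t got = nbrecv_wait(sv[0], out, sizeof out, 1000);
    close(sv[0]);
    close(sv[1]);
    return (got == 4 && memcmp(out, "ping", 4) == 0) ? 0 : -1;
}
```

The trade-off is an extra poll() call on the rare EAGAIN path in exchange
for not burning CPU if the data takes a while to show up. Like the goto
version, it still turns the call into something closer to a blocking recv.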
-sam
On Dec 11, 2007, at 10:46 AM, Walter B. Ligon III wrote:
> Shooting from the hip here, but is it possible EAGAIN might indicate
> there is a structure locked in the socket - say if a packet receive
> handler is running - which would block the call, even though there
> ARE bytes in the socket?
>
> Walt
>
> Sam Lang wrote:
>> I agree Pete -- it's messy. Just by the names of the errnos, it seems
>> appropriate to return what's been completed if we get EWOULDBLOCK,
>> while EAGAIN suggests we can just call recv again and get what we
>> want. But as you point out they're the same value. According to
>> the opengroup, impls _may_ assign the same value to both:
>> http://www.opengroup.org/pubs/online/7908799/xsh/errors.html
>> Strictly from a Linux implementation perspective, epoll/poll tell
>> us that the bytes are on the socket, so even when EAGAIN is
>> returned, we can call recv again and get what we wanted. I've
>> tested this a bunch, and when EAGAIN is returned (which is
>> infrequent), the next call invariably returns successfully. There
>> were two instances where the code looped up to around 200 times on
>> EAGAIN under heavy load. But looping does turn nbrecv into more of
>> a brecv, although we avoid all the fcntl calls to turn the socket
>> into a blocking one just for the recv call.
>> With the socket in non-blocking mode, the conditional:
>>     if (ret == -1 && errno == EWOULDBLOCK)
>>     {
>>         return (len - comp); /* return amount completed */
>>     }
>> Just doesn't work. It causes the caller to error and close the
>> socket. Not what we want.
>> I think we can get away with doing:
>>     if (!ret) /* socket closed */
>>     {
>>         errno = EPIPE;
>>         return (-1);
>>     }
>>     if (ret == -1 &&
>>         (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK))
>>     {
>>         goto nbrecv_restart;
>>     }
>>     else if (ret == -1)
>>     {
>>         return (-1);
>>     }
>> From a practical perspective, this seems to work, and an
>> implementation that has poll telling us that bytes are ready, but
>> recv returning EWOULDBLOCK because of anything other than small
>> timing issues in the kernel seems broken anyway.
>> The alternative is to return the bytes received with the errno, and
>> on EAGAIN, we would have to add the operation back onto the op
>> queue with a state variable of how much was received. The code is
>> designed to avoid doing this in the first place by polling until
>> the bytes we need are ready, so doing this would probably be messy.
>> -sam
>> On Dec 10, 2007, at 11:47 PM, Pete Wyckoff wrote:
>>> rross at mcs.anl.gov wrote on Mon, 10 Dec 2007 21:19 -0600:
>>>> while a loop will fix it, it would be really nice to understand
>>>> how we get EAGAIN when we think that there are bytes there...
>>> [..]
>>>> On Dec 7, 2007, at 4:55 PM, Sam Lang wrote:
>>>>> I'm seeing recv on a socket in non-blocking mode returning EAGAIN
>>>>> occasionally, even though epoll has just told us there are bytes
>>>>> waiting. I guess that's why the call was initially a blocking
>>>>> recv. I can add a loop around the non-blocking recv while it
>>>>> returns EAGAIN, unless someone can think of a better workaround.
>>>
>>> The function is getting a bit messy. I'm all for looping on E* and
>>> thought Sam's original mail made sense. But on second glance:
>>>
>>> int BMI_sockio_nbrecv(int s,
>>>                       void *buf,
>>>                       int len)
>>> {
>>>     int ret, comp = len;
>>>
>>>     assert(fcntl(s, F_GETFL, 0) & O_NONBLOCK);
>>>
>>>     while (comp)
>>>     {
>>>       nbrecv_restart:
>>>         ret = recv(s, buf, comp, DEFAULT_MSG_FLAGS);
>>>         if (!ret) /* socket closed */
>>>         {
>>>             errno = EPIPE;
>>>             return (-1);
>>>         }
>>>         if (ret == -1 && errno == EWOULDBLOCK)
>>>         {
>>>             return (len - comp); /* return amount completed */
>>>         }
>>>         if (ret == -1 && (errno == EINTR || errno == EAGAIN))
>>>         {
>>>             goto nbrecv_restart;
>>>         }
>>>         else if (ret == -1)
>>>         {
>>>             return (-1);
>>>         }
>>>         comp -= ret;
>>>         buf = (char *)buf + ret;
>>>     }
>>>     return (len - comp);
>>> }
>>>
>>> Note that we get from standard headers:
>>>
>>>     /usr/include/asm-generic/errno.h:
>>>     #define EWOULDBLOCK EAGAIN /* Operation would block */
>>>
>>> But maybe there are some systems where this is not true? Not ones
>>> that use glibc, apparently.
>>>
>>> Anyway, the first use of EWOULDBLOCK runs us back to the poll
>>> loop, which is the right thing to do. The second use of EAGAIN
>>> would lead to a busy loop on recv()->EAGAIN that isn't quite so
>>> nice. But that code never gets hit.
>>>
>>> I'm not sure that a poll readable result necessarily means we'll get
>>> any bytes on the socket. There are numerous ways in which things
>>> can get messy.
>>>
>>> -- Pete
>>>
>> ------------------------------------------------------------------------
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>
> --
> Dr. Walter B. Ligon III
> Associate Professor
> ECE Department
> Clemson University
>