[Pvfs2-developers] Re: [Pvfs2-cvs] commit by slang in pvfs2/src/io/bmi/bmi_tcp: bmi-tcp.c sockio.c

Sam Lang slang at mcs.anl.gov
Tue Dec 11 11:24:20 EST 2007


I agree, Pete -- it's messy.  Just going by the errno names, it seems
appropriate to return what's been completed if we get EWOULDBLOCK,
while EAGAIN suggests we can just call recv again and get what we
want.  But as you point out, they're the same value.  According to the
Open Group, implementations _may_ assign the same value to both:

http://www.opengroup.org/pubs/online/7908799/xsh/errors.html
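
Since the two names may or may not share a value, one way to keep the
check portable is to fold both into a single test.  A minimal sketch;
is_would_block is a hypothetical helper name, not something that exists
in bmi_tcp:

```c
#include <errno.h>

/* Hypothetical helper: nonzero if err means "nothing to read right now".
 * The #if avoids comparing against the same constant twice on systems
 * (such as Linux with glibc) where EWOULDBLOCK is defined as EAGAIN. */
static int is_would_block(int err)
{
#if EAGAIN == EWOULDBLOCK
    return err == EAGAIN;
#else
    return err == EAGAIN || err == EWOULDBLOCK;
#endif
}
```

Either branch behaves the same from the caller's point of view; the
guard only keeps the comparison from being redundant.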

Strictly from a Linux implementation perspective, epoll/poll tell us
that the bytes are on the socket, so even when EAGAIN is returned, we
can call recv again and get what we wanted.  I've tested this a bunch,
and when EAGAIN is returned (which is infrequent), the next call
nearly always returns successfully.  The exceptions were two instances
where the code looped up to around 200 times on EAGAIN under heavy
load.  Looping does turn nbrecv into more of a brecv, although we avoid
all the fcntl calls to turn the socket into a blocking one just for the
recv call.

With the socket in non-blocking mode, the conditional:

        if (ret == -1 && errno == EWOULDBLOCK)
        {
            return (len - comp);        /* return amount completed */
        }

just doesn't work.  It makes the caller report an error and close the
socket, which is not what we want.

I think we can get away with doing:

        if (!ret)       /* socket closed */
        {
            errno = EPIPE;
            return (-1);
        }
        if (ret == -1 &&
            (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK))
        {
            goto nbrecv_restart;
        }
        else if (ret == -1)
        {
            return (-1);
        }

From a practical perspective, this seems to work.  An implementation
where poll tells us that bytes are ready but recv returns EWOULDBLOCK
for any reason other than small timing issues in the kernel seems
broken anyway.
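
Put together, the restart logic might look roughly like the following.
This is a sketch, not the committed code: the name nbrecv_retry_sketch
is made up, a plain continue stands in for the goto, and 0 is passed
where the real function passes DEFAULT_MSG_FLAGS.  It assumes the
caller only invokes it after poll/epoll has reported the socket
readable, since otherwise the loop spins until the data arrives.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Sketch only: receive exactly len bytes from a non-blocking socket,
 * retrying whenever recv() reports EINTR, EAGAIN or EWOULDBLOCK.
 * Returns len on success, -1 on error or if the peer closed. */
static int nbrecv_retry_sketch(int s, void *buf, int len)
{
    int comp = len;             /* bytes still outstanding */
    ssize_t ret;

    assert(fcntl(s, F_GETFL, 0) & O_NONBLOCK);

    while (comp > 0)
    {
        /* 0 is an assumption standing in for DEFAULT_MSG_FLAGS */
        ret = recv(s, buf, (size_t)comp, 0);
        if (ret == 0)           /* socket closed */
        {
            errno = EPIPE;
            return -1;
        }
        if (ret == -1)
        {
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
            {
                continue;       /* try again; busy-waits if nothing arrives */
            }
            return -1;          /* any other error is fatal */
        }
        comp -= (int)ret;
        buf = (char *)buf + ret;
    }
    return len - comp;
}
```

The trade-off is the one mentioned above: no fcntl calls to flip the
socket into blocking mode, at the cost of spinning on recv while the
kernel catches up.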

The alternative is to return the bytes received with the errno, and on  
EAGAIN, we would have to add the operation back onto the op queue with  
a state variable of how much was received.  The code is designed to  
avoid doing this in the first place by polling until the bytes we need  
are ready, so doing this would probably be messy.
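
For comparison, a rough sketch of what that alternative could look
like.  The recv_state struct and nbrecv_resume are hypothetical names
for illustration only; the real op queue would carry this state in its
own structures, and 0 again stands in for DEFAULT_MSG_FLAGS.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical per-operation state; not a PVFS2 structure. */
struct recv_state
{
    char *buf;      /* destination buffer */
    size_t len;     /* total bytes wanted */
    size_t done;    /* bytes received so far */
};

/* Returns 1 when all bytes have arrived, 0 when the socket would block
 * (caller requeues the op and polls again), -1 on error or peer close. */
static int nbrecv_resume(int s, struct recv_state *st)
{
    assert(fcntl(s, F_GETFL, 0) & O_NONBLOCK);

    while (st->done < st->len)
    {
        ssize_t ret = recv(s, st->buf + st->done, st->len - st->done, 0);
        if (ret == 0)           /* socket closed */
        {
            errno = EPIPE;
            return -1;
        }
        if (ret == -1)
        {
            if (errno == EINTR)
            {
                continue;       /* interrupted: retry immediately */
            }
            if (errno == EAGAIN || errno == EWOULDBLOCK)
            {
                return 0;       /* progress is saved in st->done */
            }
            return -1;
        }
        st->done += (size_t)ret;
    }
    return 1;
}
```

The extra state is small, but every caller then has to handle the
"requeue" result, which is where the messiness would come from.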

-sam

On Dec 10, 2007, at 11:47 PM, Pete Wyckoff wrote:

> rross at mcs.anl.gov wrote on Mon, 10 Dec 2007 21:19 -0600:
>> while a loop will fix it, it would be really nice to understand how
>> we get EAGAIN when we think that there are bytes there...
> [..]
>> On Dec 7, 2007, at 4:55 PM, Sam Lang wrote:
>>> I'm seeing recv on a socket in non-blocking mode returning EAGAIN
>>> occasionally, even though epoll has just told us there's bytes
>>> waiting.  I guess that's why the call was initially a blocking
>>> recv.  I can add a loop around the non-blocking recv while it
>>> returns EAGAIN, unless someone can think of a better work around.
>
> The function is getting a bit messy.  I'm all for looping on E* and
> thought Sam's original mail made sense.  But on second glance:
>
> int BMI_sockio_nbrecv(int s,
>           void *buf,
>           int len)
> {
>    int ret, comp = len;
>
>    assert(fcntl(s, F_GETFL, 0) & O_NONBLOCK);
>
>    while (comp)
>    {
>      nbrecv_restart:
>        ret = recv(s, buf, comp, DEFAULT_MSG_FLAGS);
>        if (!ret)       /* socket closed */
>        {
>            errno = EPIPE;
>            return (-1);
>        }
>        if (ret == -1 && errno == EWOULDBLOCK)
>        {
>            return (len - comp);        /* return amount completed */
>        }
>        if (ret == -1 && (errno == EINTR || errno == EAGAIN))
>        {
>            goto nbrecv_restart;
>        }
>        else if (ret == -1)
>        {
>            return (-1);
>        }
>        comp -= ret;
>        buf = (char *)buf + ret;
>    }
>    return (len - comp);
> }
>
> Note that we get from standard headers:
>
> /usr/include/asm-generic/errno.h:#define        EWOULDBLOCK     EAGAIN  /* Operation would block */
>
> But maybe there are some systems where this is not true?  Not ones
> that use glibc, apparently.
>
> Anyway, the first use of EWOULDBLOCK runs us back to the poll
> loop, which is the right thing to do.  The second use of EAGAIN
> would lead to a busy loop on recv()->EAGAIN that isn't quite so
> nice.  But that code never gets hit.
>
> I'm not sure that a poll readable result necessarily means we'll get
> any bytes on the socket.  There are numerous ways in which things
> can get messy.
>
> 		-- Pete
>
