[Pvfs2-developers] Re: [Pvfs2-cvs] commit by slang in pvfs2/src/io/bmi/bmi_tcp: bmi-tcp.c sockio.c

Sam Lang slang at mcs.anl.gov
Tue Dec 11 11:52:38 EST 2007


That seems possible.  I did some reading and couldn't find any obvious  
reasons the kernel does this, but I think the basic answer is that it  
doesn't break the semantics of recv, as there aren't any semantic  
guarantees between results from poll and calls to recv.  Investigating
further would, I think, require looking at the kernel code, and while
I'm interested in what's going on there, it's not something I want to
dig into right now.  The answer isn't going to change the
behavior of the functions in any case. :-)

I think Kevin's and maybe Rob's concerns are that recv would loop  
forever returning EAGAIN, and just from empirical evidence, it doesn't  
appear to, so I would go with the retry loop as the current solution.
-sam
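
For readers worried about the unbounded case, here is a minimal sketch of
the same retry loop with a cap on consecutive EAGAIN results.  It is not
the code committed to bmi_tcp: the names nbrecv_bounded, nbrecv_demo and
NBRECV_MAX_EAGAIN, and the cap of 1000, are hypothetical and chosen only
for illustration.  When the cap is hit it returns the bytes received so
far, so the caller could fall back to its poll loop.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical cap on consecutive EAGAIN retries; not a PVFS2 constant. */
#define NBRECV_MAX_EAGAIN 1000

/*
 * Non-blocking receive that retries on EINTR and, up to a bound, on
 * EAGAIN/EWOULDBLOCK.  Returns the number of bytes received (fewer than
 * len if the retry bound is hit), or -1 on error with errno set.
 */
static int nbrecv_bounded(int s, void *buf, int len)
{
    int comp = len;           /* bytes still to receive */
    int eagain_count = 0;     /* consecutive EAGAIN results */

    assert(fcntl(s, F_GETFL, 0) & O_NONBLOCK);

    while (comp > 0)
    {
        ssize_t ret = recv(s, buf, (size_t)comp, 0);
        if (ret == 0)         /* socket closed */
        {
            errno = EPIPE;
            return -1;
        }
        if (ret == -1)
        {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
            {
                if (++eagain_count > NBRECV_MAX_EAGAIN)
                    break;    /* give up and report partial progress */
                continue;
            }
            return -1;        /* any other error */
        }
        eagain_count = 0;     /* progress was made; reset the counter */
        comp -= (int)ret;
        buf = (char *)buf + ret;
    }
    return len - comp;
}

/* Usage sketch: send 5 bytes over a socketpair and receive them back. */
static int nbrecv_demo(void)
{
    int sv[2];
    char out[5];
    int flags, n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1 ||
        write(sv[1], "hello", 5) != 5)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    n = nbrecv_bounded(sv[0], out, (int)sizeof(out));
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Whether a cap is worth the extra state depends on how the caller handles
a short return; the unbounded loop is simpler if EAGAIN really does clear
after a handful of retries.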


On Dec 11, 2007, at 10:46 AM, Walter B. Ligon III wrote:

> Shooting from the hip here, but is it possible EAGAIN might indicate  
> there is a structure locked in the socket - say if a packet receive  
> handler is running - which would block the call, even though there  
> ARE bytes in the socket?
>
> Walt
>
> Sam Lang wrote:
>> I agree, Pete -- it's messy.  Just by the names of errnos, it seems
>> appropriate to return what's been completed if we get EWOULDBLOCK,  
>> while EAGAIN suggests we can just call recv again and get what we  
>> want.  But as you point out they're the same value.  According to  
>> the opengroup, impls _may_ assign the same value to both:
>> http://www.opengroup.org/pubs/online/7908799/xsh/errors.html
>> Strictly from a linux implementation perspective, epoll/poll tell  
>> us that the bytes are on the socket, so even when EAGAIN is  
>> returned, we can call recv again and get what we wanted.  I've  
>> tested this a bunch, and when EAGAIN is returned (which is  
>> infrequent), the next call invariably returns successfully.  There  
>> were two instances where the code looped up to around 200 times on
>> EAGAIN under heavy load.  But looping does turn nbrecv into more of  
>> a brecv, although we avoid all the fcntl calls to turn the socket  
>> into a blocking one just for the recv call.
>> With the socket in non-blocking mode, the conditional:
>>       if (ret == -1 && errno == EWOULDBLOCK)
>>       {
>>           return (len - comp);        /* return amount completed */
>>       }
>> just doesn't work.  It causes the caller to error and close the
>> socket.  Not what we want.
>> I think we can get away with doing:
>>       if (!ret)       /* socket closed */
>>       {
>>           errno = EPIPE;
>>           return (-1);
>>       }
>>       if (ret == -1 && (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK))
>>       {
>>           goto nbrecv_restart;
>>       }
>>       else if (ret == -1)
>>       {
>>           return (-1);
>>       }
>> From a practical perspective, this seems to work, and an  
>> implementation that has poll telling us that bytes are ready, but  
>> recv returning EWOULDBLOCK because of anything other than small  
>> timing issues in the kernel seems broken anyway.
>> The alternative is to return the bytes received with the errno, and  
>> on EAGAIN, we would have to add the operation back onto the op  
>> queue with a state variable of how much was received.  The code is  
>> designed to avoid doing this in the first place by polling until  
>> the bytes we need are ready, so doing this would probably be messy.
>> -sam
>> On Dec 10, 2007, at 11:47 PM, Pete Wyckoff wrote:
>>> rross at mcs.anl.gov wrote on Mon, 10 Dec 2007 21:19 -0600:
>>>> while a loop will fix it, it would be really nice to understand  
>>>> how we get
>>>> EAGAIN when we think that there are bytes there...
>>> [..]
>>>> On Dec 7, 2007, at 4:55 PM, Sam Lang wrote:
>>>>> I'm seeing recv on a socket in non-blocking mode returning EAGAIN
>>>>> occasionally, even though epoll has just told us there's bytes  
>>>>> waiting.  I
>>>>> guess that's why the call was initially a blocking recv.  I can  
>>>>> add a loop
>>>>> around the non-blocking recv while it returns EAGAIN, unless  
>>>>> someone can
>>>>> think of a better work around.
>>>
>>> The function is getting a bit messy.  I'm all for looping on E* and
>>> thought Sam's original mail made sense.  But on second glance:
>>>
>>> int BMI_sockio_nbrecv(int s,
>>>          void *buf,
>>>          int len)
>>> {
>>>   int ret, comp = len;
>>>
>>>   assert(fcntl(s, F_GETFL, 0) & O_NONBLOCK);
>>>
>>>   while (comp)
>>>   {
>>>     nbrecv_restart:
>>>       ret = recv(s, buf, comp, DEFAULT_MSG_FLAGS);
>>>       if (!ret)       /* socket closed */
>>>       {
>>>           errno = EPIPE;
>>>           return (-1);
>>>       }
>>>       if (ret == -1 && errno == EWOULDBLOCK)
>>>       {
>>>           return (len - comp);        /* return amount completed */
>>>       }
>>>       if (ret == -1 && (errno == EINTR || errno == EAGAIN))
>>>       {
>>>           goto nbrecv_restart;
>>>       }
>>>       else if (ret == -1)
>>>       {
>>>           return (-1);
>>>       }
>>>       comp -= ret;
>>>       buf = (char *)buf + ret;
>>>   }
>>>   return (len - comp);
>>> }
>>>
>>> Note that we get from standard headers:
>>>
>>> /usr/include/asm-generic/errno.h:#define        EWOULDBLOCK     EAGAIN  /* Operation would block */
>>>
>>> But maybe there are some systems where this is not true?  Not ones
>>> that use glibc, apparently.
>>>
>>> Anyway, the first use of EWOULDBLOCK runs us back to the poll
>>> loop, which is the right thing to do.  The second use of EAGAIN
>>> would lead to a busy loop on recv()->EAGAIN that isn't quite so
>>> nice.  But that code never gets hit.
>>>
>>> I'm not sure that a poll readable result necessarily means we'll get
>>> any bytes on the socket.  There are numerous ways in which things
>>> can get messy.
>>>
>>> -- Pete
>>>
>
> -- 
> Dr. Walter B. Ligon III
> Associate Professor
> ECE Department
> Clemson University
>


