[Pvfs2-developers] Re: [Pvfs2-cvs] commit by slang in pvfs2/src/io/bmi/bmi_tcp: bmi-tcp.c sockio.c

Rob Ross rross at mcs.anl.gov
Tue Dec 11 12:22:25 EST 2007


i was more concerned that we not lose track of awaiting data because  
of not retrying, but i'm woefully uninformed on how poll()/epoll do  
their thing at this point.

the solution you chose (in later email) seems good to me.

rob

On Dec 11, 2007, at 10:52 AM, Sam Lang wrote:

>
> That seems possible.  I did some reading and couldn't find any  
> obvious reasons the kernel does this, but I think the basic answer  
> is that it doesn't break the semantics of recv, as there aren't any  
> semantic guarantees between results from poll and calls to recv.   
> To investigate further would, I think, require looking at the kernel
> code, and while I'm interested in what's going on there, it's not
> something I want to dig into right now.  The answer isn't going to
> change the behavior of the functions in any case. :-)
>
> I think Kevin's and maybe Rob's concerns are that recv would loop  
> forever returning EAGAIN, and just from empirical evidence, it  
> doesn't appear to, so I would go with that as the current solution.
> -sam
>
>
> On Dec 11, 2007, at 10:46 AM, Walter B. Ligon III wrote:
>
>> Shooting from the hip here, but is it possible EAGAIN might  
>> indicate there is a structure locked in the socket - say if a  
>> packet receive handler is running - which would block the call,  
>> even though there ARE bytes in the socket?
>>
>> Walt
>>
>> Sam Lang wrote:
>>> I agree, Pete -- it's messy.  Just by the names of errnos, it seems
>>> appropriate to return what's been completed if we get  
>>> EWOULDBLOCK, while EAGAIN suggests we can just call recv again  
>>> and get what we want.  But as you point out they're the same  
>>> value.  According to the opengroup, impls _may_ assign the same  
>>> value to both:
>>> http://www.opengroup.org/pubs/online/7908799/xsh/errors.html
>>> Strictly from a linux implementation perspective, epoll/poll tell  
>>> us that the bytes are on the socket, so even when EAGAIN is  
>>> returned, we can call recv again and get what we wanted.  I've  
>>> tested this a bunch, and when EAGAIN is returned (which is  
>>> infrequent), the next call invariably returns successfully.   
>>> There were two instances where the code looped up to around 200
>>> times on EAGAIN under heavy load.  But looping does turn nbrecv  
>>> into more of a brecv, although we avoid all the fcntl calls to  
>>> turn the socket into a blocking one just for the recv call.
>>> With the socket in non-blocking mode, the conditional:
>>>       if (ret == -1 && errno == EWOULDBLOCK)
>>>       {
>>>           return (len - comp);        /* return amount completed */
>>>       }
>>> Just doesn't work.  It causes the caller to error and close the  
>>> socket.  Not what we want.
>>> I think we can get away with doing:
>>>       if (!ret)       /* socket closed */
>>>       {
>>>           errno = EPIPE;
>>>           return (-1);
>>>       }
>>>       if (ret == -1 && (errno == EINTR || errno == EAGAIN ||
>>>                         errno == EWOULDBLOCK))
>>>       {
>>>           goto nbrecv_restart;
>>>       }
>>>       else if (ret == -1)
>>>       {
>>>           return (-1);
>>>       }
>>> From a practical perspective, this seems to work, and an  
>>> implementation that has poll telling us that bytes are ready, but  
>>> recv returning EWOULDBLOCK because of anything other than small  
>>> timing issues in the kernel seems broken anyway.
>>> The alternative is to return the bytes received with the errno,  
>>> and on EAGAIN, we would have to add the operation back onto the  
>>> op queue with a state variable of how much was received.  The  
>>> code is designed to avoid doing this in the first place by  
>>> polling until the bytes we need are ready, so doing this would  
>>> probably be messy.
>>> -sam
>>> On Dec 10, 2007, at 11:47 PM, Pete Wyckoff wrote:
>>>> rross at mcs.anl.gov wrote on Mon, 10 Dec 2007 21:19 -0600:
>>>>> while a loop will fix it, it would be really nice to understand how
>>>>> we get EAGAIN when we think that there are bytes there...
>>>> [..]
>>>>> On Dec 7, 2007, at 4:55 PM, Sam Lang wrote:
>>>>>> I'm seeing recv on a socket in non-blocking mode returning EAGAIN
>>>>>> occasionally, even though epoll has just told us there are bytes
>>>>>> waiting.  I guess that's why the call was initially a blocking recv.
>>>>>> I can add a loop around the non-blocking recv while it returns
>>>>>> EAGAIN, unless someone can think of a better workaround.
>>>>
>>>> The function is getting a bit messy.  I'm all for looping on E* and
>>>> thought Sam's original mail made sense.  But on second glance:
>>>>
>>>> int BMI_sockio_nbrecv(int s,
>>>>          void *buf,
>>>>          int len)
>>>> {
>>>>   int ret, comp = len;
>>>>
>>>>   assert(fcntl(s, F_GETFL, 0) & O_NONBLOCK);
>>>>
>>>>   while (comp)
>>>>   {
>>>>     nbrecv_restart:
>>>>       ret = recv(s, buf, comp, DEFAULT_MSG_FLAGS);
>>>>       if (!ret)       /* socket closed */
>>>>       {
>>>>           errno = EPIPE;
>>>>           return (-1);
>>>>       }
>>>>       if (ret == -1 && errno == EWOULDBLOCK)
>>>>       {
>>>>           return (len - comp);        /* return amount completed */
>>>>       }
>>>>       if (ret == -1 && (errno == EINTR || errno == EAGAIN))
>>>>       {
>>>>           goto nbrecv_restart;
>>>>       }
>>>>       else if (ret == -1)
>>>>       {
>>>>           return (-1);
>>>>       }
>>>>       comp -= ret;
>>>>       buf = (char *)buf + ret;
>>>>   }
>>>>   return (len - comp);
>>>> }
>>>>
>>>> Note that we get from standard headers:
>>>>
>>>> /usr/include/asm-generic/errno.h:#define        EWOULDBLOCK     EAGAIN  /* Operation would block */
>>>>
>>>> But maybe there are some systems where this is not true?  Not ones
>>>> that use glibc, apparently.
>>>>
>>>> Anyway, the first use of EWOULDBLOCK runs us back to the poll
>>>> loop, which is the right thing to do.  The second use of EAGAIN
>>>> would lead to a busy loop on recv()->EAGAIN that isn't quite so
>>>> nice.  But that code never gets hit.
>>>>
>>>> I'm not sure that a poll readable result necessarily means we'll get
>>>> any bytes on the socket.  There are numerous ways in which things
>>>> can get messy.
>>>>
>>>> -- Pete
>>>>
>>> ------------------------------------------------------------------------
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>> -- 
>> Dr. Walter B. Ligon III
>> Associate Professor
>> ECE Department
>> Clemson University
>>
>
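
For reference, here is a rough sketch of the middle ground discussed in the
thread: retry on EINTR, and on EAGAIN/EWOULDBLOCK go back to poll() with a
timeout instead of spinning on recv(). This is not the code that was
committed to bmi-tcp; the names sketch_nbrecv and timeout_ms are made up for
illustration, and returning a short count on timeout is an assumption about
what the caller wants.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not the PVFS2 implementation.
 *
 * Reads up to len bytes from the non-blocking socket s.
 * Returns the number of bytes received (possibly fewer than len if no more
 * data shows up within timeout_ms), or -1 with errno set.  A peer close is
 * reported as -1/EPIPE, mirroring the code quoted in the thread.
 */
static ssize_t sketch_nbrecv(int s, void *buf, size_t len, int timeout_ms)
{
    char *p = buf;
    size_t left = len;

    while (left > 0)
    {
        ssize_t ret = recv(s, p, left, 0);

        if (ret == 0)           /* socket closed */
        {
            errno = EPIPE;
            return -1;
        }
        if (ret > 0)            /* made progress */
        {
            p += ret;
            left -= (size_t)ret;
            continue;
        }
        if (errno == EINTR)     /* interrupted by a signal: just retry */
        {
            continue;
        }
        /* EAGAIN and EWOULDBLOCK may or may not be the same value;
         * testing both is harmless either way. */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
        {
            struct pollfd pfd = { .fd = s, .events = POLLIN };
            int n = poll(&pfd, 1, timeout_ms);

            if (n > 0 || (n == -1 && errno == EINTR))
            {
                continue;       /* readable (or interrupted): try recv again */
            }
            if (n == -1)
            {
                return -1;      /* poll itself failed */
            }
            break;              /* timed out: return the amount completed */
        }
        return -1;              /* any other error */
    }
    return (ssize_t)(len - left);
}
```

With timeout_ms set to 0 this behaves like the "return amount completed"
branch; with a small positive timeout it rides out the short window where
poll says readable but recv still returns EAGAIN, without busy-looping.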
