[Pvfs2-developers] epoll fun

Sam Lang slang at mcs.anl.gov
Thu Oct 4 20:09:40 EDT 2007


It looks like you're right Pete.  poll/epoll isn't to blame here.   
The poll calls are completed only a few microsecs after the request  
is received, even for the VFS cases where polls seems to be slower  
than with the admin tools.  It looks like the poll calls are  
"slowing" because the operations at the server are being delayed,  
causing them to just wait until another event occurs on the socket.

So the real delay is somewhere on the server, between the first  
return of poll and the response being posted (11ms), whereas the VFS  
only takes 1.2ms to receive the response and send another request.

It boggles me how this behavior can take place with the VFS, and not  
with the admin tools.  How can an operation coming from the VFS take  
11ms on the server, but for the same operation coming from the admin  
tool, it only takes 1ms?  I could believe that somehow our operations  
were getting gradually slower over time (which they appear to be),  
but why only with the VFS?

In any case, I think this explains the gradients we see in the  
graphs.  Each of the different visible contours is a different type  
of operation, and the operation is just taking a little bit longer  
each time.  Its easier to see in the poll plot below.  My guess is  
that the one at about 6ms is the getattr, then the two creates and  
the setattr at about 8ms, and finally, the contour at about 10ms is  
the crdirent.

-sam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: poll-timings.png
Type: image/png
Size: 28878 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/31de314f/poll-timings-0001.png
-------------- next part --------------


On Oct 4, 2007, at 4:57 PM, Sam Lang wrote:

>
> On Oct 4, 2007, at 4:42 PM, Sam Lang wrote:
>
>>
>> On Oct 4, 2007, at 3:45 PM, Pete Wyckoff wrote:
>>
>>> slang at mcs.anl.gov wrote on Thu, 04 Oct 2007 15:21 -0500:
>>>> I'm in the process of getting a trace and dump from the same server
>>>> during the same runs.
>>>
>>> Good, will be interesting too see tcpdump + strace.
>>
>> Pete,
>>
>> Here's a trace and dump from the same set of runs.
>>
>> http://www.mcs.anl.gov/~slang/epoll-trace.out [370MB]
>> http://www.mcs.anl.gov/~slang/server-packets.dump [180MB]
>>
>> Again, the trace isn't from strace, I'm tracing the epoll calls  
>> manually.
>>
>
> Damnit, I forgot to recompile the servers.  These results are with  
> poll.  The slowdown still exists for these runs though.
> -sam
>
>> -sam
>>
>>
>>>   Although I
>>> worry we might not learn anything new.
>>>
>>>> In the meantime, I attached two "zoomed" plots of the last one I
>>>> sent, with ranges set to 5000-5200 secs and 5000-6000 secs,
>>>> respectively.  I thought they were interesting, both for the  
>>>> behavior
>>>> that epoll_wait exhibits during operations over the VFS, and  
>>>> just for
>>>> the patterns they show at a scale of 100s of seconds (it almost  
>>>> looks
>>>> like a seirpinski triangle).  Not that it sheds much light on the
>>>> problem.
>>>
>>> Pretty.  Maybe there's a category for Most Attractive Unexplained
>>> Distributed System Art in the state fair this year.
>>>
>>> Would the pending request ID stuff with slog etc be useful in
>>> tracking individual operations?
>>>
>>> 		-- Pete
>>>
>>
>



More information about the Pvfs2-developers mailing list