[Pvfs2-developers] epoll fun

Sam Lang slang at mcs.anl.gov
Wed Oct 3 14:26:25 EDT 2007


On Oct 3, 2007, at 12:56 PM, Pete Wyckoff wrote:

> slang at mcs.anl.gov wrote on Wed, 03 Oct 2007 10:55 -0500:
>> On Oct 3, 2007, at 10:17 AM, Pete Wyckoff wrote:
>>> Can you do something on the server like:
>>>
>>>    tcpdump -ttt
>>>    strace -tt -T
>>>
>>> to distinguish the two cases of 1) epoll_wait is taking a long time
>>> after the packet shows up at the host, vs 2) the client request
>>> packet is taking a long time to show up.
>>
>> I'm fairly sure its number 1).  I got dumps off the server while I
>> was doing creates and deletes over the VFS on a system that had been
>> running for a while and exhibited this performance degradation.  The
>> delay was seen between the receipt of the request, and the send of
>> the response.  Something in the server handling of the request was
>> slowing it down.  At that point Rob suggested I strace the server to
>> see if it was system call related, and we noticed the behavior with
>> epoll.
>
> The times in the plots, as Murali points out, are seconds.

Hmm..did I miss an email?

>   So the
> flat line in the middle of your "Picture 19" at .012 represents the
> testcontext timeout of 10ms, more or less.  The low dots are when
> something was waiting on the socket.  And the high dots, up to 20ms,
> are the events that you think are causing slowness?

Yes, the 10ms timeout is clear.  And its obvious that the epoll_wait  
calls behave more or less as expected when the server is idling (even  
though the pvfs2-client daemon socket is still in the set).  The plot  
is using histeps instead of points, so the filled-in bits of red  
aren't all that informative.  The take-away is the really how the  
width of the VFS operations part increases over time, showing how  
much longer epoll_wait is taking over the pvfs2 admin tools.  It  
shows that the average time of epoll_wait calls for those VFS  
operations is increasing.  The fact that the _shortest_ epoll_wait  
calls for the VFS are getting larger over time I think is also  
interesting.  It looks like the 20ms top-off values are just what the  
epoll_wait itself times out at.  I could certainly spend the time to  
get better graphs that show averages over time, I'm not sure it would  
be more informative though.

>
> Do these times represent the values in <> braces at the end of the
> epoll_wait lines in 'strace -T' output?  Then these 20ms dots would
> be bad, as epoll_wait isn't supposed to sit around longer than its
> timeout, which is 10ms.  Presumably you are running on
> CONFIG_HZ=1000 kernels on modern hardware, so tick granularity
> should not be an issue.  You could change the 10ms to 1ms and see if
> it just an issue of clock granularity, though.

These particular plots were on a 2.6.20 kernel on 4 cpu, dual core  
opterons.  I instrumented the server to get timestamps before and  
after the epoll_wait call.  The strace -ttt output gave pretty much  
the same results though.

I don't think the 20ms issue is all that important to investigate for  
this particular problem.  Even with the pvfs admin tools, the  
epoll_wait call takes longer than 10ms sometimes, and even the 20ms  
happened periodically for the VFS cases, but the other epoll_wait  
calls behaved normally, we probably wouldn't notice a problem  
overall.  But the average for the VFS cases gets much worse over  
time, even if its completing at or under 10ms.

>
>>> I'm sure some of us will look at the traces and dumps too, if you
>>> send them out.
>>
>> The traces are huge.  :-)  On the order of ~500MB.  I can probably
>> put them on the web somewhere or something if you really want to sift
>> through them.  I also have zoomed in plots of the plots I sent in the
>> previous email, which I can send.  I've attached an example, but I
>> have lots more :-).
> [..]
>> The dumps are not as large.  I'll try to dig them up.
>
> Yeah, let's use the ESnet that we all pay for.  Maybe you can pull
> out the good bits of an strace and tcpdump that happened at the same
> time, or point to the timestamp that is interesting in the traces.

You know the plots are just visual versions of the traces, right?   
Still collecting the dumps.
-sam

>
> 		-- Pete
>



More information about the Pvfs2-developers mailing list