[Pvfs2-developers] epoll fun
Sam Lang
slang at mcs.anl.gov
Wed Oct 3 14:26:25 EDT 2007
On Oct 3, 2007, at 12:56 PM, Pete Wyckoff wrote:
> slang at mcs.anl.gov wrote on Wed, 03 Oct 2007 10:55 -0500:
>> On Oct 3, 2007, at 10:17 AM, Pete Wyckoff wrote:
>>> Can you do something on the server like:
>>>
>>> tcpdump -ttt
>>> strace -tt -T
>>>
>>> to distinguish the two cases of 1) epoll_wait is taking a long time
>>> after the packet shows up at the host, vs 2) the client request
>>> packet is taking a long time to show up.
>>
>> I'm fairly sure its number 1). I got dumps off the server while I
>> was doing creates and deletes over the VFS on a system that had been
>> running for a while and exhibited this performance degradation. The
>> delay was seen between the receipt of the request, and the send of
>> the response. Something in the server handling of the request was
>> slowing it down. At that point Rob suggested I strace the server to
>> see if it was system call related, and we noticed the behavior with
>> epoll.
>
> The times in the plots, as Murali points out, are seconds.
Hmm..did I miss an email?
> So the
> flat line in the middle of your "Picture 19" at .012 represents the
> testcontext timeout of 10ms, more or less. The low dots are when
> something was waiting on the socket. And the high dots, up to 20ms,
> are the events that you think are causing slowness?
Yes, the 10ms timeout is clear. And its obvious that the epoll_wait
calls behave more or less as expected when the server is idling (even
though the pvfs2-client daemon socket is still in the set). The plot
is using histeps instead of points, so the filled-in bits of red
aren't all that informative. The take-away is the really how the
width of the VFS operations part increases over time, showing how
much longer epoll_wait is taking over the pvfs2 admin tools. It
shows that the average time of epoll_wait calls for those VFS
operations is increasing. The fact that the _shortest_ epoll_wait
calls for the VFS are getting larger over time I think is also
interesting. It looks like the 20ms top-off values are just what the
epoll_wait itself times out at. I could certainly spend the time to
get better graphs that show averages over time, I'm not sure it would
be more informative though.
>
> Do these times represent the values in <> braces at the end of the
> epoll_wait lines in 'strace -T' output? Then these 20ms dots would
> be bad, as epoll_wait isn't supposed to sit around longer than its
> timeout, which is 10ms. Presumably you are running on
> CONFIG_HZ=1000 kernels on modern hardware, so tick granularity
> should not be an issue. You could change the 10ms to 1ms and see if
> it just an issue of clock granularity, though.
These particular plots were on a 2.6.20 kernel on 4 cpu, dual core
opterons. I instrumented the server to get timestamps before and
after the epoll_wait call. The strace -ttt output gave pretty much
the same results though.
I don't think the 20ms issue is all that important to investigate for
this particular problem. Even with the pvfs admin tools, the
epoll_wait call takes longer than 10ms sometimes, and even the 20ms
happened periodically for the VFS cases, but the other epoll_wait
calls behaved normally, we probably wouldn't notice a problem
overall. But the average for the VFS cases gets much worse over
time, even if its completing at or under 10ms.
>
>>> I'm sure some of us will look at the traces and dumps too, if you
>>> send them out.
>>
>> The traces are huge. :-) On the order of ~500MB. I can probably
>> put them on the web somewhere or something if you really want to sift
>> through them. I also have zoomed in plots of the plots I sent in the
>> previous email, which I can send. I've attached an example, but I
>> have lots more :-).
> [..]
>> The dumps are not as large. I'll try to dig them up.
>
> Yeah, let's use the ESnet that we all pay for. Maybe you can pull
> out the good bits of an strace and tcpdump that happened at the same
> time, or point to the timestamp that is interesting in the traces.
You know the plots are just visual versions of the traces, right?
Still collecting the dumps.
-sam
>
> -- Pete
>
More information about the Pvfs2-developers
mailing list