[Pvfs2-developers] epoll fun
Sam Lang
slang at mcs.anl.gov
Wed Oct 3 17:31:36 EDT 2007
Hi Pete,
I've placed the big trace I have from the instrumented server at:
http://www.mcs.anl.gov/~slang/sbig.out
It only traces epoll_wait and epoll_ctl calls (along with any other
bmi tcp debug messages, which are infrequent). Even so, it still
weighs in at around 850MB. A lot of that is the idle epoll_wait
calls during my 10 minutes of sleeping between each run, so it's
bloated, but you asked for it, so... :-) It's big enough that I split
it into separate output files of 20MB each:
http://www.mcs.anl.gov/~slang/sbig.out-[aa-bm]
I have a perl script that goes through and generates a gnuplot file
that looks like the images I sent. I've attached that.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: epoll-hist.pl
Type: text/x-perl-script
Size: 1473 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071003/4411510a/epoll-hist-0001.bin
-------------- next part --------------
You have to do something like:
cat sbig.out-bb | ./epoll-hist.pl > epoll-hist.gp
I generated the zoomed-in plots by just setting the range manually in
gnuplot. Having the script will let you play with the plot formatting,
or spit out averages or whatever.
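If you'd rather not deal with perl, here is a rough Python sketch of the
same idea. It is not a port of epoll-hist.pl. It assumes the trace lines
look like `strace -tt -T` output (a wall-clock timestamp up front and the
call duration in <> at the end); the instrumented server's format in
sbig.out may well differ, so the regex is an assumption. The function
names and the 2ms slack value are made up for illustration.

```python
import re

# Assumed line shape (strace -tt -T style); the real sbig.out may differ:
#   17:31:36.123456 epoll_wait(5, {}, 16, 10) = 0 <0.010012>
LINE_RE = re.compile(
    r"^(?P<h>\d+):(?P<m>\d+):(?P<s>\d+\.\d+)\s+"
    r"epoll_wait\(.*\)\s*=\s*(?P<ret>-?\d+).*"
    r"<(?P<dur>\d+\.\d+)>\s*$"
)


def parse_epoll_waits(lines):
    """Yield (seconds_since_midnight, duration_s, return_value) per epoll_wait line."""
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # epoll_ctl lines, debug messages, anything else
        t = int(m["h"]) * 3600 + int(m["m"]) * 60 + float(m["s"])
        yield t, float(m["dur"]), int(m["ret"])


def overshoots(samples, timeout_s=0.010, slack_s=0.002):
    """Return the samples whose duration exceeds the timeout plus some slack."""
    return [s for s in samples if s[1] > timeout_s + slack_s]


def gnuplot_rows(samples):
    """Format samples as 'time duration' rows, one per line, for gnuplot."""
    return "\n".join(f"{t:.6f} {dur:.6f}" for t, dur, _ in samples)
```

Feeding it `sys.stdin` and printing `gnuplot_rows(...)` gives a two-column
data file you can plot with points; `overshoots(...)` picks out the calls
that ran past the 10ms timeout.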
I've attached a dump of the metadata server while doing 10 creates/
deletes from the VFS, sleeping 4 secs, and then 10 creates/deletes
from the admin tool, sleeping 10 secs, and then repeating that. The
time differences between VFS and pvfs admin tools are apparent.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pvfs-fs2-dump
Type: application/octet-stream
Size: 3025522 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071003/4411510a/pvfs-fs2-dump-0001.obj
-------------- next part --------------
If it's helpful to you, Wireshark (TAFKA Ethereal) has pvfs dissection
out of the box, so if you just load the dump into Wireshark you can
see the pvfs requests and responses, along with tcp packets. This
turned out to be really useful to me.
Let me know if you see anything interesting. I'm going to keep
trying to reproduce this with my synthetic test.
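
For anyone who wants a quick standalone check of whether an idle
epoll_wait overshoots its timeout on a given box, something along these
lines should do. This is a hypothetical sketch, not my synthetic test and
not server code: it just waits on a pipe that never becomes readable and
times each call. Linux only, since it uses Python's select.epoll; the
function name and defaults are made up.

```python
import os
import select
import time


def measure_idle_epoll(timeout_s=0.010, iterations=50):
    """Time epoll waits on a pipe that never becomes readable.

    Returns a list of elapsed wall-clock durations in seconds, one per wait.
    """
    r, w = os.pipe()
    ep = select.epoll()
    ep.register(r, select.EPOLLIN)
    durations = []
    try:
        for _ in range(iterations):
            start = time.monotonic()
            ep.poll(timeout_s)  # nothing is ever written, so this should time out
            durations.append(time.monotonic() - start)
    finally:
        ep.close()
        os.close(r)
        os.close(w)
    return durations
```

If `max(measure_idle_epoll())` comes back well above the timeout on an
otherwise idle machine, that points at timer granularity or scheduling
rather than anything in the server.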
-sam
On Oct 3, 2007, at 12:56 PM, Pete Wyckoff wrote:
> slang at mcs.anl.gov wrote on Wed, 03 Oct 2007 10:55 -0500:
>> On Oct 3, 2007, at 10:17 AM, Pete Wyckoff wrote:
>>> Can you do something on the server like:
>>>
>>> tcpdump -ttt
>>> strace -tt -T
>>>
>>> to distinguish the two cases of 1) epoll_wait is taking a long time
>>> after the packet shows up at the host, vs 2) the client request
>>> packet is taking a long time to show up.
>>
>> I'm fairly sure it's number 1). I got dumps off the server while I
>> was doing creates and deletes over the VFS on a system that had been
>> running for a while and exhibited this performance degradation. The
>> delay was seen between the receipt of the request, and the send of
>> the response. Something in the server handling of the request was
>> slowing it down. At that point Rob suggested I strace the server to
>> see if it was system call related, and we noticed the behavior with
>> epoll.
>
> The times in the plots, as Murali points out, are seconds. So the
> flat line in the middle of your "Picture 19" at .012 represents the
> testcontext timeout of 10ms, more or less. The low dots are when
> something was waiting on the socket. And the high dots, up to 20ms,
> are the events that you think are causing slowness?
>
> Do these times represent the values in <> braces at the end of the
> epoll_wait lines in 'strace -T' output? Then these 20ms dots would
> be bad, as epoll_wait isn't supposed to sit around longer than its
> timeout, which is 10ms. Presumably you are running on
> CONFIG_HZ=1000 kernels on modern hardware, so tick granularity
> should not be an issue. You could change the 10ms to 1ms and see if
> it is just an issue of clock granularity, though.
>
>>> I'm sure some of us will look at the traces and dumps too, if you
>>> send them out.
>>
>> The traces are huge. :-) On the order of ~500MB. I can probably
>> put them on the web somewhere or something if you really want to sift
>> through them. I also have zoomed in plots of the plots I sent in the
>> previous email, which I can send. I've attached an example, but I
>> have lots more :-).
> [..]
>> The dumps are not as large. I'll try to dig them up.
>
> Yeah, let's use the ESnet that we all pay for. Maybe you can pull
> out the good bits of an strace and tcpdump that happened at the same
> time, or point to the timestamp that is interesting in the traces.
>
> -- Pete
>