[Pvfs2-developers] epoll fun
Sam Lang
slang at mcs.anl.gov
Thu Oct 4 14:08:38 EDT 2007
On Oct 4, 2007, at 11:41 AM, Sam Lang wrote:
>
> On Oct 4, 2007, at 11:12 AM, Pete Wyckoff wrote:
>
>> slang at mcs.anl.gov wrote on Wed, 03 Oct 2007 16:31 -0500:
>>> I've placed the big trace I have from the instrumented server at:
>>>
>>> http://www.mcs.anl.gov/~slang/sbig.out
>>
>> I get nothing useful out of this unfortunately. In fact, I don't
>> see the big values (0.2 ish) you had in your "Iterations 1-10" plot:
>>
>> $ sed 15242000q sbig.out | ./epoll-hist.pl | sed 1,2d |
>>     awk '{if ($2 > max) {max = $2}} END {print max}'
>> 0.0167419999925187
I'm not sure how, but those plots were definitely wrong. I think
trying to add the epoll_ctl calls in there somehow skewed the
results and put data points higher than the 0.016 value. I know
you're not finding the plots that useful, but I've attached new ones
that use points for each timing instead of the histeps, and get rid
of the epoll_ctl calls.
-sam
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-aa.png
Type: image/png
Size: 16100 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/f01097f0/sbig-new-aa-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-ab.png
Type: image/png
Size: 18643 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/f01097f0/sbig-new-ab-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-ac.png
Type: image/png
Size: 17788 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/f01097f0/sbig-new-ac-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-ad.png
Type: image/png
Size: 22192 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/f01097f0/sbig-new-ad-0001.png
-------------- next part --------------
>>
>>> I've attached a dump of the metadata server while doing 10 creates/
>>> deletes from the VFS, sleeping 4 secs, and then 10 creates/deletes
>>> from the admin tool, sleeping 10 secs, and then repeating that. The
>>> time differences between VFS and pvfs admin tools are apparent.
>> [..]
>>> If it's helpful to you, Wireshark (TAFKA Ethereal) has pvfs
>>> dissection out of the box, so if you just load the dump into
>>> Wireshark you can see the pvfs requests and responses, along with
>>> tcp packets. This turned out to be really useful to me.
>>
>> Yeah, wireshark is cute, having PVFS. But I see no correlation
>> between the timestamps on the network and those in the sbig file.
>
> Those are from runs on completely different systems. Sorry for the
> confusion.
>
>> Was hoping to see a packet arrive in the tcp dump, then see
>> epoll_wait END within a few tens of microseconds after that. Hoping to prove
>> that epoll_wait wasn't the problem here. But the packet arrivals
>> are pretty far away from when sbig says epoll_wait found something.
>> More than the 10 ms or so delays you are seeing.
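
The correlation check described above could be scripted roughly as
follows. This is only a sketch: it assumes both traces come from the
same host and clock and have already been reduced to lists of
timestamps in seconds. The function name and the input lists are
hypothetical and are not part of PVFS or epoll-hist.pl.

```python
from bisect import bisect_left

def wakeup_delays(packet_times, wakeup_times):
    """For each packet arrival, return the delay until the next epoll_wait END.

    Both inputs are timestamps in seconds taken from the same clock
    (an assumption). Packets with no later wakeup are skipped.
    """
    wakeups = sorted(wakeup_times)
    delays = []
    for t in sorted(packet_times):
        i = bisect_left(wakeups, t)  # first wakeup at or after the packet
        if i < len(wakeups):
            delays.append(wakeups[i] - t)
    return delays
```

Delays of a few tens of microseconds would suggest epoll_wait itself
is not the problem; delays in the millisecond range would point
elsewhere, for instance at scheduling.
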
>
> I'll have to look. It's possible I posted the wrong trace. :-|
>
>>
>>> Let me know if you see anything interesting. I'm going to keep
>>> trying to reproduce this with my synthetic test.
>>
>> Some observations. The libc does nothing to epoll. It is just a
>> system call. In the kernel, the timeout is converted to ticks.
>> Seeing the base case of 12 ms in your plots, it is clear you are using a
>> HZ=250 kernel. Running HZ=1000 would reduce this to 10ms when
>> nothing happens, but probably not fix the problem.
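
The 12 ms versus 10 ms arithmetic above can be reproduced with a short
calculation. This is a sketch under two assumptions: that the
epoll_wait timeout in use is 10 ms (inferred from the message, not
checked against the source), and that the kernel rounds the timeout
up to whole ticks. The function names are made up for illustration.

```python
def timeout_ms_to_ticks(timeout_ms, hz):
    """Round a millisecond timeout up to whole scheduler ticks (assumed rule)."""
    return (timeout_ms * hz + 999) // 1000

def effective_timeout_ms(timeout_ms, hz):
    """Length in milliseconds of the rounded-up timeout at a given HZ."""
    return timeout_ms_to_ticks(timeout_ms, hz) * 1000 / hz

for hz in (250, 1000):
    print(hz, timeout_ms_to_ticks(10, hz), effective_timeout_ms(10, hz))
```

At HZ=250 a tick is 4 ms, so 10 ms rounds up to 3 ticks, which is
12 ms. At HZ=1000 it is exactly 10 ticks.
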
>>
>> Each epoll fd is locked, but we only do epoll_wait in one thread, so
>> there will be no contention. The incidence of _ctl operations is
>> very rare, so no problem there.
>>
>> When TCP sees an incoming packet, it invokes the epoll callback that
>> wakes up the waiting task. Hard to imagine any delays in here. Of
>> course, the polling task may not actually run until it can be
>> scheduled on a processor.
>>
>> I'm guessing this is simple CPU contention. Can you look at the
>> average run-queue length (load average) during some runs and see if
>> that seems to be the case? Might not be fine-grained enough. Are
>> there other scheduler tools available that can help with this
>> approach? You might change your synthetic test to add some threads
>> with lots of CPU-hungry tasks and see if that provokes the behavior.
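
One way to watch run-queue pressure at a finer grain than the
1-minute load average is to sample /proc/loadavg during a run, since
its fourth field reports the number of currently runnable scheduling
entities. A minimal sketch, assuming a Linux host; the function names
are hypothetical.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),   # 1-minute load average
        "load5": float(fields[1]),   # 5-minute load average
        "load15": float(fields[2]),  # 15-minute load average
        "runnable": int(runnable),   # currently runnable entities
        "total": int(total),         # total scheduling entities
    }

def read_loadavg(path="/proc/loadavg"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

Calling read_loadavg() in a loop with a short sleep while the
creates/deletes run would show whether the runnable count rises when
the slow epoll_wait wakeups occur.
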
>
> Yeah that could be, but it doesn't explain why operations from a
> newly connected socket don't create contention, while identical
> operations from a long-lived socket do.
>
> Thanks for the pointers.
> -sam
>
>>
>> For timing tests here, we'll run with a non-threaded server and
>> disable posix locks. You might experiment with that and see if you
>> at least get consistent results.
>>
>> -- Pete
>>
>