[Pvfs2-developers] epoll fun
Sam Lang
slang at mcs.anl.gov
Thu Oct 4 16:21:18 EDT 2007
On Oct 4, 2007, at 1:08 PM, Sam Lang wrote:
>
> I'm not sure how, but those plots were definitely wrong. I think
> trying to add the epoll_ctl calls in there somehow screwed up the
> results to put data points higher than the 0.016 value. I know
> you're not finding the plots that useful, but I've attached new
> ones that use points for each timing instead of the histeps, and
> get rid of the epoll_ctl calls.
>
Pete,
I'm in the process of getting a trace and dump from the same server
during the same runs.
In the meantime, I attached two "zoomed" plots of the last one I
sent, with ranges set to 5000-5200 secs and 5000-6000 secs,
respectively. I thought they were interesting, both for the behavior
that epoll_wait exhibits during operations over the VFS, and just for
the patterns they show at a scale of 100s of seconds (it almost looks
like a Sierpinski triangle). Not that it sheds much light on the
problem.
-sam
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-admid.png
Type: image/png
Size: 21074 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/45b46b16/sbig-new-admid-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-admid2.png
Type: image/png
Size: 28429 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/45b46b16/sbig-new-admid2-0001.png
-------------- next part --------------
> -sam
>
> <sbig-new-aa.png><sbig-new-ab.png><sbig-new-ac.png><sbig-new-ad.png>
>>>
>>>> I've attached a dump of the metadata server while doing 10 creates/
>>>> deletes from the VFS, sleeping 4 secs, and then 10 creates/deletes
>>>> from the admin tool, sleeping 10 secs, and then repeating that. The
>>>> time differences between VFS and pvfs admin tools are apparent.
>>> [..]
>>>> If it's helpful to you, Wireshark (TAFKA Ethereal) has pvfs dissection
>>>> out of the box, so if you just load the dump into Wireshark you can
>>>> see the pvfs requests and responses, along with tcp packets. This
>>>> turned out to be really useful to me.
>>>
>>> Yeah, wireshark is cute, having PVFS. But I see no correlation
>>> between the timestamps on the network and those in the sbig file.
>>
>> Those are from runs on completely different systems. Sorry for
>> the confusion.
>>
>>> Was hoping to see a packet arrive in the tcp dump, then see
>>> epoll_wait END within a few tens of microseconds after that. Hoping to prove
>>> that epoll_wait wasn't the problem here. But the packet arrivals
>>> are pretty far away from when sbig says epoll_wait found something.
>>> More than the 10 ms or so delays you are seeing.
>>
>> I'll have to look. It's possible I posted the wrong trace. :-|
>>
>>>
>>>> Let me know if you see anything interesting. I'm going to keep
>>>> trying to reproduce this with my synthetic test.
>>>
>>> Some observations. The libc does nothing to epoll. It is just a
>>> system call. In the kernel, the timeout is converted to ticks.
>>> Seeing the base case of 12 ms in your plots, it is clear you are using a
>>> HZ=250 kernel. Running HZ=1000 would reduce this to 10ms when
>>> nothing happens, but probably not fix the problem.
>>>
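To make the tick arithmetic above concrete, here is a minimal sketch of
how a millisecond timeout rounds up to whole ticks. It assumes the
server passes a 10 ms timeout to epoll_wait (inferred from the 10 ms
figure quoted for HZ=1000) and uses a simplified round-up model, not the
kernel's actual code. The function names are hypothetical.

```c
/* Simplified model (an assumption, not copied from the kernel source):
 * a millisecond timeout is rounded up to a whole number of ticks,
 * where one tick lasts 1000/hz milliseconds. */
long timeout_ms_to_ticks(long timeout_ms, long hz)
{
    return (timeout_ms * hz + 999) / 1000;
}

/* Length of the resulting sleep in milliseconds once the timeout has
 * been expressed in whole ticks. */
double effective_timeout_ms(long timeout_ms, long hz)
{
    return timeout_ms_to_ticks(timeout_ms, hz) * 1000.0 / (double)hz;
}
```

Under this model, effective_timeout_ms(10, 250) gives 12 ms (3 ticks of
4 ms each) and effective_timeout_ms(10, 1000) gives 10 ms, which matches
the two figures quoted above.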
>>> Each epoll fd is locked, but we only do epoll_wait in one thread, so
>>> there will be no contention. The incidence of _ctl operations is
>>> very rare, so no problem there.
>>>
>>> When TCP sees an incoming packet, it invokes the epoll callback that
>>> wakes up the waiting task. Hard to imagine any delays in here. Of
>>> course, the polling task may not actually run until it can be
>>> scheduled on a processor.
>>>
>>> I'm guessing this is simple CPU contention. Can you look at the
>>> average run-queue length (load average) during some runs and see if
>>> that seems to be the case? Might not be fine-grained enough. Are
>>> there other scheduler tools available that can help with this
>>> approach? You might change your synthetic test to add some threads
>>> with lots of CPU-hungry tasks and see if that provokes the behavior.
>>
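One possible building block for such a synthetic test is sketched
below: it times a single epoll_wait call on an epoll set with no
registered descriptors, so the call can only return by timing out. This
is an illustration under stated assumptions, not the test used in this
thread. The function name is hypothetical, and the CPU-hungry threads
would have to be started separately by the caller.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Time one epoll_wait() call on an empty epoll set (Linux only).
 * Returns the elapsed wall time in milliseconds, or -1.0 on error. */
double time_idle_epoll_wait_ms(int timeout_ms)
{
    struct epoll_event ev;
    struct timespec start, end;
    int n;
    int epfd = epoll_create(1);  /* size argument is only a hint */

    if (epfd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    n = epoll_wait(epfd, &ev, 1, timeout_ms);  /* nothing registered: times out */
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(epfd);

    if (n < 0)
        return -1.0;

    return (end.tv_sec - start.tv_sec) * 1e3 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}
```

Calling time_idle_epoll_wait_ms(10) in a loop, first on an idle machine
and then with CPU-bound threads running, would show whether the
overshoot beyond the requested timeout grows with the run-queue length.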
>> Yeah that could be, but it doesn't explain why operations from a
>> newly connected socket don't create contention, while identical
>> operations from a long-lived socket do.
>>
>> Thanks for the pointers.
>> -sam
>>
>>>
>>> For timing tests here, we'll run with a non-threaded server and
>>> disable posix locks. You might experiment with that and see if you
>>> at least get consistent results.
>>>
>>> -- Pete
>>>
>>
>