[Pvfs2-developers] epoll fun

Sam Lang slang at mcs.anl.gov
Thu Oct 4 14:08:38 EDT 2007


On Oct 4, 2007, at 11:41 AM, Sam Lang wrote:

>
> On Oct 4, 2007, at 11:12 AM, Pete Wyckoff wrote:
>
>> slang at mcs.anl.gov wrote on Wed, 03 Oct 2007 16:31 -0500:
>>> I've placed the big trace I have from the instrumented server at:
>>>
>>> http://www.mcs.anl.gov/~slang/sbig.out
>>
>> I get nothing useful out of this unfortunately.  In fact, I don't
>> see the big values (0.2 ish) you had in your "Iterations 1-10" plot:
>>
>> $ sed 15242000q sbig.out | ./epoll-hist.pl | sed 1,2d | \
>>     awk '{if ($2 > max) {max = $2}} END {print max}'
>> 0.0167419999925187

I'm not sure how, but those plots were definitely wrong.  I think
trying to add the epoll_ctl calls in there somehow screwed up the
results and put data points higher than that 0.016 value.  I know
you're not finding the plots that useful, but I've attached new ones
that use points for each timing instead of histeps, and leave out
the epoll_ctl calls.

-sam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-aa.png
Type: image/png
Size: 16100 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/f01097f0/sbig-new-aa-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-ab.png
Type: image/png
Size: 18643 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/f01097f0/sbig-new-ab-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-ac.png
Type: image/png
Size: 17788 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/f01097f0/sbig-new-ac-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-ad.png
Type: image/png
Size: 22192 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/f01097f0/sbig-new-ad-0001.png
-------------- next part --------------

>>
>>> I've attached a dump of the metadata server while doing 10 creates/
>>> deletes from the VFS, sleeping 4 secs, and then 10 creates/deletes
>>> from the admin tool, sleeping 10 secs, and then repeating that.  The
>>> time differences between VFS and pvfs admin tools are apparent.
>> [..]
>>> If it's helpful to you, Wireshark (TAFKA Ethereal) has pvfs
>>> dissection out of the box, so if you just load the dump into
>>> Wireshark you can see the pvfs requests and responses, along with
>>> tcp packets.  This turned out to be really useful to me.
>>
>> Yeah, wireshark is cute, having PVFS.  But I see no correlation
>> between the timestamps on the network and those in the sbig file.
>
> Those are from runs on completely different systems.  Sorry for the  
> confusion.
>
>> Was hoping to see a packet arrive in the tcp dump, then see
>> epoll_wait END within a few tens of microseconds after that.  Hoping
>> to prove that epoll_wait wasn't the problem here.  But the packet
>> arrivals are pretty far away from when sbig says epoll_wait found
>> something.  More than the 10 ms or so delays you are seeing.
>
> I'll have to look.  It's possible I posted the wrong trace.  :-|
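
For reference, the check Pete describes (match each packet arrival
against the next epoll_wait END and look at the gap) could be sketched
roughly as below.  The function name and the plain lists of timestamps
are assumptions for illustration; the real tcpdump and sbig formats
would need their own parsing, and the comparison only means something
if both traces come from the same run on the same clock.

```python
from bisect import bisect_left

def wakeup_delays(packet_times, epoll_end_times):
    """For each packet arrival, return the delay until the first
    epoll_wait END at or after it, or None if there is none.

    Both arguments are sequences of timestamps in seconds, taken
    from the same clock (hypothetical, already-parsed input).
    """
    ends = sorted(epoll_end_times)
    delays = []
    for t in packet_times:
        i = bisect_left(ends, t)      # first END with timestamp >= t
        delays.append(ends[i] - t if i < len(ends) else None)
    return delays
```

Delays in the tens of microseconds would point away from epoll_wait;
delays near the timeout would point at it.
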
>
>>
>>> Let me know if you see anything interesting.  I'm going to keep
>>> trying to reproduce this with my synthetic test.
>>
>> Some observations.  The libc does nothing to epoll.  It is just a
>> system call.  In the kernel, the timeout is converted to ticks.
>> Seeing the base case 12ms in your plots, it is clear you are using a
>> HZ=250 kernel.  Running HZ=1000 would reduce this to 10ms when
>> nothing happens, but probably not fix the problem.
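
The tick arithmetic behind those numbers can be written out as a small
sketch.  It assumes a 10 ms epoll_wait timeout (inferred from the
"reduce this to 10ms" remark, not stated outright) and assumes the
kernel rounds the timeout up to a whole number of ticks; the function
name is made up for illustration.

```python
def effective_timeout_ms(timeout_ms, hz):
    """Timeout after rounding up to whole scheduler ticks.

    Assumes a round-up conversion from milliseconds to ticks,
    i.e. ceil(timeout_ms * hz / 1000).
    """
    ticks = (timeout_ms * hz + 999) // 1000
    return ticks * 1000 / hz

# HZ=250 has 4 ms ticks, so 10 ms becomes 3 ticks.
print(effective_timeout_ms(10, 250))    # 12.0
# HZ=1000 has 1 ms ticks, so 10 ms stays 10 ticks.
print(effective_timeout_ms(10, 1000))   # 10.0
```
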
>>
>> Each epoll fd is locked, but we only do epoll_wait in one thread, so
>> there will be no contention.  The incidence of _ctl operations is
>> very rare, so no problem there.
>>
>> When TCP sees an incoming packet, it invokes the epoll callback that
>> wakes up the waiting task.  Hard to imagine any delays in here.  Of
>> course, the polling task may not actually run until it can be
>> scheduled on a processor.
>>
>> I'm guessing this is simple CPU contention.  Can you look at the
>> average run-queue length (load average) during some runs and see if
>> that seems to be the case?  Might not be fine-grained enough.  Are
>> there other scheduler tools available that can help with this
>> approach?  You might change your synthetic test to add some threads
>> with lots of CPU-hungry tasks and see if that provokes the behavior.
>
> Yeah that could be, but it doesn't explain why operations from a  
> newly connected socket don't create contention, while identical  
> operations from a long-lived socket do.
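
A rough sketch of the synthetic test Pete suggests, with CPU-hungry
tasks competing against the poller, might look like the following.
This is not the test I have been running; the names, the socketpair
transport, and the use of forked busy-loop processes are all
assumptions, and it is Linux-only (epoll, fork, and a monotonic clock
shared across processes).

```python
import os
import select
import signal
import socket
import struct
import time

def spawn(fn):
    """Fork a child that runs fn() and exits; return its pid."""
    pid = os.fork()
    if pid == 0:
        try:
            fn()
        finally:
            os._exit(0)
    return pid

def burn():
    """CPU hog: spin until killed."""
    while True:
        pass

def measure_wakeup_latency(samples=100, hogs=0, gap=0.01):
    """Return send-to-wakeup latencies (seconds) seen by an epoll loop."""
    rx, tx = socket.socketpair()

    def sender():
        rx.close()
        for _ in range(samples):
            time.sleep(gap)
            # Payload is the send time on the shared monotonic clock.
            tx.send(struct.pack("d", time.monotonic()))

    children = [spawn(sender)]
    tx.close()
    children += [spawn(burn) for _ in range(hogs)]

    ep = select.epoll()
    ep.register(rx.fileno(), select.EPOLLIN)
    latencies = []
    try:
        while len(latencies) < samples:
            if not ep.poll(1.0):          # timeout is in seconds here
                continue
            data = rx.recv(8)
            now = time.monotonic()
            if not data:                  # sender went away early
                break
            latencies.append(now - struct.unpack("d", data)[0])
    finally:
        ep.close()
        rx.close()
        for pid in children:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)
    return latencies
```

Comparing max(measure_wakeup_latency(hogs=0)) against a run with hogs
set to at least the number of cores would show whether plain CPU
contention is enough to produce delays of the size in the plots.
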
>
> Thanks for the pointers.
> -sam
>
>>
>> For timing tests here, we'll run with a non-threaded server and
>> disable posix locks.  You might experiment with that and see if you
>> at least get consistent results.
>>
>> 		-- Pete
>>
>


