[Pvfs2-developers] epoll fun

Sam Lang slang at mcs.anl.gov
Thu Oct 4 12:41:43 EDT 2007


On Oct 4, 2007, at 11:12 AM, Pete Wyckoff wrote:

> slang at mcs.anl.gov wrote on Wed, 03 Oct 2007 16:31 -0500:
>> I've placed the big trace I have from the instrumented server at:
>>
>> http://www.mcs.anl.gov/~slang/sbig.out
>
> I get nothing useful out of this unfortunately.  In fact, I don't
> see the big values (0.2 ish) you had in your "Iterations 1-10" plot:
>
> $ sed 15242000q sbig.out | ./epoll-hist.pl | sed 1,2d | awk '{if ($2 > max) {max = $2}} END {print max}'
> 0.0167419999925187
>
>> I've attached a dump of the metadata server while doing 10 creates/
>> deletes from the VFS, sleeping 4 secs, and then 10 creates/deletes
>> from the admin tool, sleeping 10 secs, and then repeating that.  The
>> time differences between VFS and pvfs admin tools are apparent.
> [..]
>> If it's helpful to you, Wireshark (TAFKA Ethereal) has pvfs dissection
>> out of the box, so if you just load the dump into Wireshark you can
>> see the pvfs requests and responses, along with tcp packets.  This
>> turned out to be really useful to me.
>
> Yeah, wireshark is cute, having PVFS.  But I see no correlation
> between the timestamps on the network and those in the sbig file.

Those are from runs on completely different systems.  Sorry for the  
confusion.

> Was hoping to see a packet arrive in the tcp dump, then see
> epoll_wait END within a few tens of microseconds after that.  Hoping
> to prove that epoll_wait wasn't the problem here.  But the packet
> arrivals are pretty far away from when sbig says epoll_wait found
> something.  More than the 10 ms or so delays you are seeing.

I'll have to look.  It's possible I posted the wrong trace.  :-|

>
>> Let me know if you see anything interesting.  I'm going to keep
>> trying to reproduce this with my synthetic test.
>
> Some observations.  The libc does nothing to epoll.  It is just a
> system call.  In the kernel, the timeout is converted to ticks.
> Seeing the base case of 12 ms in your plots, it is clear you are using a
> HZ=250 kernel.  Running HZ=1000 would reduce this to 10ms when
> nothing happens, but probably not fix the problem.
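
The tick arithmetic in the paragraph above can be checked with a few
lines of Python.  This is only a simplified model: it assumes the
epoll_wait timeout is 10 ms (implied by the 10 ms figure quoted for
HZ=1000, but not stated outright) and that the kernel rounds the
timeout up to a whole number of ticks.  The function name is made up
for illustration.

```python
def effective_wait_ms(timeout_ms: int, hz: int) -> float:
    """Model of how long an idle wait lasts once the timeout is tick-aligned.

    Assumption: the millisecond timeout is rounded UP to whole ticks,
    where one tick lasts 1000/hz milliseconds.
    """
    ticks = (timeout_ms * hz + 999) // 1000  # integer ceiling of timeout_ms * hz / 1000
    return ticks * 1000 / hz                 # convert ticks back to milliseconds


if __name__ == "__main__":
    # Assumed 10 ms timeout, tried against a few common HZ settings.
    for hz in (100, 250, 1000):
        print(f"HZ={hz}: {effective_wait_ms(10, hz)} ms")
```

Under those assumptions a 10 ms timeout becomes 3 ticks of 4 ms, or
12 ms, at HZ=250, and exactly 10 ms at HZ=1000, which matches the
numbers quoted.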
>
> Each epoll fd is locked, but we only do epoll_wait in one thread, so
> there will be no contention.  The incidence of _ctl operations is
> very rare, so no problem there.
>
> When TCP sees an incoming packet, it invokes the epoll callback that
> wakes up the waiting task.  Hard to imagine any delays in here.  Of
> course, the polling task may not actually run until it can be
> scheduled on a processor.
>
> I'm guessing this is simple CPU contention.  Can you look at the
> average run-queue length (load average) during some runs and see if
> that seems to be the case?  Might not be fine-grained enough.  Are
> there other scheduler tools available that can help with this
> approach?  You might change your synthetic test to add some threads
> with lots of CPU-hungry tasks and see if that provokes the behavior.
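
One rough way to look at run-queue pressure at a finer grain than the
1-minute load average is to sample /proc/loadavg on Linux, whose fourth
field gives the number of currently runnable scheduling entities over
the total.  The sketch below is not part of PVFS; the names are made up
for illustration, and polling this file is still fairly coarse.

```python
from typing import NamedTuple


class LoadSample(NamedTuple):
    load1: float    # 1-minute load average
    load5: float    # 5-minute load average
    load15: float   # 15-minute load average
    runnable: int   # scheduling entities runnable right now
    total: int      # scheduling entities that exist in total


def parse_loadavg(line: str) -> LoadSample:
    """Parse one line in /proc/loadavg format, e.g. '0.20 0.18 0.12 3/80 11206'."""
    fields = line.split()
    runnable, total = fields[3].split("/")
    return LoadSample(float(fields[0]), float(fields[1]), float(fields[2]),
                      int(runnable), int(total))


def read_loadavg(path: str = "/proc/loadavg") -> LoadSample:
    """Read and parse the current sample (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.readline())
```

Calling read_loadavg() in a loop during a run, and logging the runnable
count next to the epoll_wait timings, would show whether the slow
iterations line up with more runnable tasks than processors.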

Yeah that could be, but it doesn't explain why operations from a  
newly connected socket don't create contention, while identical  
operations from a long-lived socket do.

Thanks for the pointers.
-sam

>
> For timing tests here, we'll run with a non-threaded server and
> disable posix locks.  You might experiment with that and see if you
> at least get consistent results.
>
> 		-- Pete
>
