[Pvfs2-developers] epoll fun

Sam Lang slang at mcs.anl.gov
Thu Oct 4 16:21:18 EDT 2007


On Oct 4, 2007, at 1:08 PM, Sam Lang wrote:
>
> I'm not sure how, but those plots were definitely wrong.  I think  
> trying to add the epoll_ctl calls in there somehow screwed up the  
> results to put data points higher than the 0.016 value.  I know  
> you're not finding the plots that useful, but I've attached new  
> ones that use points for each timing instead of the histeps, and  
> get rid of the epoll_ctl calls.
>

Pete,

I'm in the process of getting a trace and dump from the same server  
during the same runs.

In the meantime, I attached two "zoomed" plots of the last one I  
sent, with ranges set to 5000-5200 secs and 5000-6000 secs,  
respectively.  I thought they were interesting, both for the behavior  
that epoll_wait exhibits during operations over the VFS, and just for  
the patterns they show at a scale of 100s of seconds (it almost looks  
like a Sierpinski triangle).  Not that it sheds much light on the  
problem.

-sam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-admid.png
Type: image/png
Size: 21074 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/45b46b16/sbig-new-admid-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbig-new-admid2.png
Type: image/png
Size: 28429 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071004/45b46b16/sbig-new-admid2-0001.png
-------------- next part --------------

> -sam
>
> <sbig-new-aa.png><sbig-new-ab.png><sbig-new-ac.png><sbig-new-ad.png>
>>>
>>>> I've attached a dump of the metadata server while doing 10 creates/
>>>> deletes from the VFS, sleeping 4 secs, and then 10 creates/deletes
>>>> from the admin tool, sleeping 10 secs, and then repeating that.  The
>>>> time differences between VFS and pvfs admin tools are apparent.
>>> [..]
>>>> If it's helpful to you, Wireshark (TAFKA Ethereal) has pvfs dissection
>>>> out of the box, so if you just load the dump into Wireshark you can
>>>> see the pvfs requests and responses, along with tcp packets.  This
>>>> turned out to be really useful to me.
>>>
>>> Yeah, wireshark is cute, having PVFS.  But I see no correlation
>>> between the timestamps on the network and those in the sbig file.
>>
>> Those are from runs on completely different systems.  Sorry for  
>> the confusion.
>>
>>> Was hoping to see a packet arrive in the tcp dump, then see
>>> epoll_wait END within a few tens of microseconds after that.  Hoping to prove
>>> that epoll_wait wasn't the problem here.  But the packet arrivals
>>> are pretty far away from when sbig says epoll_wait found something.
>>> More than the 10 ms or so delays you are seeing.
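The comparison described above (packet arrival versus the next epoll_wait END) could be sketched roughly as follows. This is only an illustration: the function name and the sample timestamps are made up, and it assumes both traces were taken on the same machine with the same clock, which was not the case for the traces discussed here.

```python
from bisect import bisect_left

def wakeup_delays(packet_times, epoll_end_times):
    """For each packet arrival time, return the delay in seconds until the
    first epoll_wait END at or after it, or None if no such END exists."""
    ends = sorted(epoll_end_times)
    delays = []
    for t in packet_times:
        i = bisect_left(ends, t)  # index of the first END timestamp >= t
        delays.append(ends[i] - t if i < len(ends) else None)
    return delays

# Hypothetical timestamps in seconds, not taken from any real trace.
packets = [1.000000, 2.000000, 3.000000]
epoll_ends = [1.000040, 2.011000]
for t, d in zip(packets, wakeup_delays(packets, epoll_ends)):
    print(t, "no wakeup found" if d is None else f"{d * 1e6:.0f} us")
```

A delay of a few tens of microseconds would point away from epoll_wait itself, while delays in the millisecond range would match the behavior seen in the plots.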
>>
>> I'll have to look.  It's possible I posted the wrong trace.  :-|
>>
>>>
>>>> Let me know if you see anything interesting.  I'm going to keep
>>>> trying to reproduce this with my synthetic test.
>>>
>>> Some observations.  The libc does nothing to epoll.  It is just a
>>> system call.  In the kernel, the timeout is converted to ticks.
>>> Seeing the base case of 12ms in your plots, it is clear you are using a
>>> HZ=250 kernel.  Running HZ=1000 would reduce this to 10ms when
>>> nothing happens, but probably not fix the problem.
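The tick arithmetic behind the 12ms and 10ms figures can be worked through with a small sketch. Two things here are assumptions rather than facts from the thread: that the server passes a 10 ms timeout to epoll_wait (implied by the numbers above but not stated), and that the kernel rounds the timeout up to a whole number of ticks. The function names are hypothetical.

```python
def timeout_ms_to_ticks(timeout_ms, hz):
    """Convert a millisecond timeout to scheduler ticks, rounding up
    (assumed rounding behavior; integer arithmetic only)."""
    return (timeout_ms * hz + 999) // 1000

def effective_timeout_ms(timeout_ms, hz):
    """Timeout actually waited once it is expressed in whole ticks."""
    return timeout_ms_to_ticks(timeout_ms, hz) * 1000 / hz

# Assumed 10 ms requested timeout, compared across two HZ settings.
for hz in (250, 1000):
    print(hz, timeout_ms_to_ticks(10, hz), effective_timeout_ms(10, hz))
```

With HZ=250 a tick is 4 ms, so 10 ms rounds up to 3 ticks, or 12 ms. With HZ=1000 it is exactly 10 ticks, or 10 ms.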
>>>
>>> Each epoll fd is locked, but we only do epoll_wait in one thread, so
>>> there will be no contention.  The incidence of _ctl operations is
>>> very rare, so no problem there.
>>>
>>> When TCP sees an incoming packet, it invokes the epoll callback that
>>> wakes up the waiting task.  Hard to imagine any delays in here.  Of
>>> course, the polling task may not actually run until it can be
>>> scheduled on a processor.
>>>
>>> I'm guessing this is simple CPU contention.  Can you look at the
>>> average run-queue length (load average) during some runs and see if
>>> that seems to be the case?  Might not be fine-grained enough.  Are
>>> there other scheduler tools available that can help with this
>>> approach?  You might change your synthetic test to add some threads
>>> with lots of CPU-hungry tasks and see if that provokes the behavior.
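One simple way to watch the load average during a run is to sample it periodically, as in the sketch below. It uses Python's os.getloadavg(), which is available on Unix systems only; the function name, sample count, and interval are arbitrary choices for illustration. As noted above, the 1-minute load average may be too coarse to show contention on a 10 ms scale.

```python
import os
import time

def sample_load(samples=3, interval_s=0.1):
    """Return a list of (timestamp, 1-minute load average) pairs."""
    readings = []
    for _ in range(samples):
        # getloadavg() returns the 1, 5, and 15 minute averages.
        one_min, _five_min, _fifteen_min = os.getloadavg()
        readings.append((time.time(), one_min))
        time.sleep(interval_s)
    return readings

for stamp, load in sample_load():
    print(f"{stamp:.3f} load={load:.2f}")
```

If the slow operations line up with a run-queue length above the number of processors, that would support the CPU contention guess. If they do not, the sampling is probably just too coarse to tell.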
>>
>> Yeah that could be, but it doesn't explain why operations from a  
>> newly connected socket don't create contention, while identical  
>> operations from a long-lived socket do.
>>
>> Thanks for the pointers.
>> -sam
>>
>>>
>>> For timing tests here, we'll run with a non-threaded server and
>>> disable posix locks.  You might experiment with that and see if you
>>> at least get consistent results.
>>>
>>> 		-- Pete
>>>
>>
>


