[Pvfs2-developers] epoll fun

Sam Lang slang at mcs.anl.gov
Wed Oct 3 11:55:54 EDT 2007


On Oct 3, 2007, at 10:17 AM, Pete Wyckoff wrote:

> slang at mcs.anl.gov wrote on Wed, 03 Oct 2007 09:52 -0500:
>> Tried that.  :-)  It's more or less the same problem with poll.  The
>> behavior of poll timings seems a bit less erratic than with epoll,
>> but the performance degradation is identical.
> [..]
>>>> It's a PITA to debug, because the servers have to remain running for a
>>>> long time (and the clients have to remain mounted) for the problem to
>>>> be visible.  Rob suggested I use strace on the servers to see what
>>>> epoll was doing, and that showed some interesting results.
>>>> Basically, it looks like epoll_wait takes significantly longer when
>>>> clients are doing operations over the VFS, rather than with the pvfs2
>>>> admin tools.  Also, strace reported epoll_ctl(...,
>>>> EPOLL_CTL_ADD, ...) getting called a few times, even for the VFS
>>>> ops, and in those cases it's returning EEXIST.
>
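
For reference, the errno behind that second observation is EEXIST, which is what epoll_ctl(EPOLL_CTL_ADD) returns when the fd is already in the epoll set. A minimal sketch that reproduces just that return value, assuming Linux and Python's select.epoll wrapper (the helper name add_same_fd_twice is made up for illustration and has nothing to do with the PVFS2 code):

```python
import errno
import os
import select


def add_same_fd_twice():
    """Register one fd with epoll twice; return the errno of the second try.

    Returns None where epoll is unavailable (it is Linux-only) and 0 if the
    second registration unexpectedly succeeds.
    """
    if not hasattr(select, "epoll"):
        return None
    r, w = os.pipe()
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN)      # first EPOLL_CTL_ADD succeeds
        try:
            ep.register(r, select.EPOLLIN)  # second ADD of the same fd
        except OSError as e:
            return e.errno
        return 0
    finally:
        ep.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    rc = add_same_fd_twice()
    print(errno.errorcode.get(rc, rc))
```

This only shows where the error comes from. It says nothing about why the server would try to add an fd it is already tracking.
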
> Really?  Poll also behaves the same?  Now I am intrigued.

Heh.  That's what it takes huh?  I'll have to start adding random  
poll comments in my emails to get through your filter. ;-)

>
> You can't really tell how long epoll_wait is taking just using
> strace, since it will wait until a packet arrives plus this
> mysterious extra time.
>
> Can you do something on the server like:
>
>     tcpdump -ttt
>     strace -tt -T
>
> to distinguish the two cases of 1) epoll_wait is taking a long time
> after the packet shows up at the host, vs 2) the client request
> packet is taking a long time to show up.

I'm fairly sure it's case 1).  I got dumps off the server while I
was doing creates and deletes over the VFS on a system that had been
running for a while and exhibited this performance degradation.  The
delay showed up between the receipt of the request and the send of
the response, so something in the server's handling of the request was
slowing it down.  At that point Rob suggested I strace the server to
see if it was system call related, and we noticed the behavior with
epoll.
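
For anyone who wants to pull the per-call times out of a trace like this, here is a rough sketch of a parser for `strace -tt -T` output. It assumes the usual line shape (timestamp, call, return value, elapsed seconds in angle brackets); the function names and the sample lines are made up for illustration and are not taken from my traces. Lines that strace splits into "unfinished"/"resumed" pairs are simply skipped.

```python
import re
from collections import defaultdict

# Matches lines produced by `strace -tt -T`, for example:
#   11:55:54.123456 epoll_wait(5, {}, 16, 10) = 0 <0.010000>
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"      # optional pid prefix (strace -f)
    r"(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+"  # wall-clock timestamp from -tt
    r"(?P<name>\w+)\("                    # syscall name
    r".*<(?P<secs>\d+\.\d+)>\s*$"         # elapsed time from -T
)


def syscall_durations(lines):
    """Return {syscall name: [elapsed seconds, ...]} for parsable lines."""
    durations = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            durations[m.group("name")].append(float(m.group("secs")))
    return dict(durations)


def summarize(durations, name):
    """Return (count, mean, max) for one syscall, or None if it is absent."""
    samples = durations.get(name)
    if not samples:
        return None
    return len(samples), sum(samples) / len(samples), max(samples)


# Made-up sample lines, only to show the expected input shape.
SAMPLE = [
    "11:55:54.100000 epoll_wait(5, {}, 16, 10) = 0 <0.010000>",
    "11:55:54.110500 epoll_wait(5, {{EPOLLIN, {u32=7, u64=7}}}, 16, 10) = 1 <0.030000>",
    "11:55:54.141000 epoll_ctl(5, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = -1 EEXIST (File exists) <0.000010>",
    "11:55:54.142000 epoll_wait(5,  <unfinished ...>",
]

if __name__ == "__main__":
    print(summarize(syscall_durations(SAMPLE), "epoll_wait"))
```

As Pete points out, the elapsed time for epoll_wait includes time spent waiting for a packet to arrive, so these numbers only mean something when lined up against the tcpdump timestamps.
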

>
> If (2), try the same exercise at the client side.

This has been a problem we've been seeing on our BGL system at
Argonne for over a year.  It's taken me that long to dig into where
the degradation occurs.  It was tough to pin down, in part because
it wasn't clear whether it was a client-side or a server-side problem.
On a production system admins usually just restart everything.  After
working with one of the admins here, we started noticing that
restarting the servers seemed to fix it, but then noticed that
restarting the client daemon seemed to fix it too.

With BGL, IO nodes mount the pvfs volume every time they reboot,
which is essentially every time a new job runs.  So the problem
wasn't visible on the IO nodes, but the remounting was possibly the
cause (many connections coming and going over time).  That would cause
the degradation on the login nodes, which remained up and connected
for weeks and months.  My guess is that it hasn't been a visible
problem for many users because their workloads differ -- either they
always use the VFS and not the admin tools/MPI-IO, or vice-versa.
Mixing MPI-IO runs with a mounted pvfs volume should cause the
slowdown in the mounted volume though.

>
> I'm sure some of us will look at the traces and dumps too, if you
> send them out.

The traces are huge.  :-)  On the order of ~500MB.  I can probably
put them on the web somewhere if you really want to sift
through them.  I also have zoomed-in versions of the plots I sent in
the previous email, which I can send.  I've attached an example, but I
have lots more :-).

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picture 19.png
Type: image/png
Size: 9128 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071003/1c975560/Picture19.png
-------------- next part --------------

The dumps are not as large.  I'll try to dig them up.
-sam

>
> 		-- Pete
>


