[Pvfs2-developers] epoll fun
Sam Lang
slang at mcs.anl.gov
Wed Oct 3 11:55:54 EDT 2007
On Oct 3, 2007, at 10:17 AM, Pete Wyckoff wrote:
> slang at mcs.anl.gov wrote on Wed, 03 Oct 2007 09:52 -0500:
>> Tried that. :-) It's more or less the same problem with poll. The
>> behavior of poll timings seems a bit less erratic than with epoll,
>> but the performance degradation is identical.
> [..]
>>>> It's a PITA to debug, because the servers have to remain running
>>>> for a long time (and the clients have to remain mounted) for the
>>>> problem to be visible. Rob suggested I use strace on the servers
>>>> to see what epoll was doing, and that showed some interesting
>>>> results. Basically, it looks like epoll_wait takes significantly
>>>> longer when clients are doing operations over the VFS than with
>>>> the pvfs2 admin tools. Also, strace reported epoll_ctl(...,
>>>> EPOLL_CTL_ADD, ...) getting called a few times, even for the VFS
>>>> ops, and in those cases it's returning EEXIST.
>
> Really? Poll also behaves the same? Now I am intrigued.
Heh. That's what it takes, huh? I'll have to start adding random
poll comments to my emails to get through your filter. ;-)
>
> You can't really tell how long epoll_wait is taking just using
> strace, since it will wait until a packet arrives plus this
> mysterious extra time.
>
> Can you do something on the server like:
>
> tcpdump -ttt
> strace -tt -T
>
> to distinguish the two cases of 1) epoll_wait is taking a long time
> after the packet shows up at the host, vs 2) the client request
> packet is taking a long time to show up.
I'm fairly sure it's number 1). I got dumps off the server while I
was doing creates and deletes over the VFS on a system that had been
running for a while and exhibited this performance degradation. The
delay was between the receipt of the request and the sending of the
response, so something in the server's handling of the request was
slowing it down. At that point Rob suggested I strace the server to
see if it was system-call related, and we noticed the behavior with
epoll.
>
> If (2), try the same exercise at the client side.
This has been a problem we've been seeing on our BGL system at
Argonne for over a year. It's taken me that long to dig into where
the degradation occurs; it was tough to pin down, in part because
it wasn't clear whether it was a client-side or server-side problem.
On a production system, admins usually just restart everything.
After working with one of the admins here, we noticed that
restarting the servers seemed to fix it, but then noticed that
restarting the client daemon seemed to fix it too.
With BGL, IO nodes mount the pvfs volume every time they reboot,
which is essentially every time a new job runs. So the problem
wasn't visible there, but the IO nodes were possibly the cause (many
connections coming and going over time). This would cause the
degradation on the login nodes, which remained up and connected for
weeks and months.
My guess is that it hasn't been a visible problem for many users
because their workloads differ: either they always use the VFS and
not the admin tools/MPI-IO, or vice versa. Mixed MPI-IO runs against
a mounted pvfs volume should cause the slowdown in the mounted
volume, though.
>
> I'm sure some of us will look at the traces and dumps too, if you
> send them out.
The traces are huge. :-) On the order of 500MB. I can probably put
them on the web somewhere if you really want to sift through them. I
also have zoomed-in versions of the plots I sent in the previous
email, which I can send. I've attached an example, but I have lots
more :-).
[Attachment: Picture 19.png (image/png, 9128 bytes), scrubbed by the archive: http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20071003/1c975560/Picture19.png]
The dumps are not as large. I'll try to dig them up.
-sam
>
> -- Pete
>