[Pvfs2-developers] epoll fun
Sam Lang
slang at mcs.anl.gov
Wed Oct 3 10:52:55 EDT 2007
Hi Murali,
Thanks for your comments! I've added my own inline.
-sam
On Oct 1, 2007, at 8:24 PM, Murali Vilayannur wrote:
> Hey Sam,
> Ugh..
> First off, really nice detective work!!!
>
>> degrades slowly with a long-lived (weeks and months) PVFS volume.
>> The degradation is significant -- simple metadata operations are an
>> order of magnitude slower after a month or so. The behavior turns
>> out to only occur with the VFS and pvfs2-client daemon: performance
>> of the admin tools (pvfs2-touch, pvfs2-rm, etc.) to the same set of
>> servers remains good. Restarting the client daemon also fixes the
>> problem, suggesting that the long-lived open sockets are somehow the
>> cause. The slowness also appears to be at the servers not the
>> clients: the same kernel module and client daemon to a different
>> filesystem and set of servers doesn't exhibit the performance
>> degradation.
>>
>> Also, I should mention that the system config is a little different
>> than usual. We have IO nodes mounting and unmounting the PVFS
>> volume (and stopping the client daemon) with each user's job, which
>> is fairly frequent, while on the login nodes, the volume remains
>> mounted for a long time (and where the performance degrades).
>>
>> Our hunch here is that epoll or our use of epoll on the servers is
>> somehow to blame. Maybe the file descriptors opened on the server
>> for pvfs2-client-core are getting pushed down further and further
>> into the epoll set, which for some reason is growing with new
>> connections coming and going. This might be the case if we were
>> failing to remove sockets from the set on disconnect, for example.
>> It doesn't look like that's happening though, at least for normal
>> disconnects.
>
> Just to make sure, can't we switch to a poll()-based server and see if
> we have the same problem?
Tried that. :-) It's more or less the same problem with poll. The
behavior of poll timings seems a bit less erratic than with epoll,
but the performance degradation is identical.
>
>
>>
>> It's a PITA to debug, because the servers have to remain running for a
>> long time (and the clients have to remain mounted) for the problem to
>> be visible. Rob suggested I use strace on the servers to see what
>> epoll was doing, and that showed some interesting results.
>> Basically, it looks like epoll_wait takes significantly longer when
>> clients are doing operations over the VFS, rather than with the pvfs2
>> admin tools. Also, strace reported epoll_ctl(...,
>> EPOLL_CTL_ADD, ...) getting called a few times, even for the VFS
>> ops, and in those cases it's returning EEXIST.
>>
>> I noticed that we add a socket to the epoll set whenever we get a new
>> connection, or a read or write is posted (enqueue_operation), but we
>> only remove the socket from the epoll set on errors or disconnects.
>> So why are we adding it for reads and writes? Any connected socket
>> should already be in the set, no? I think this may be why I'm seeing
>> EEXIST with strace.
>
> Yep, agreed. We shouldn't need to add it if it already exists. But
> that is not a bug as far as I can tell.
Right -- just an extra system call we don't need?
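
For illustration, here is a minimal sketch of the redundant-add case, assuming a Linux epoll descriptor. The helper name epoll_add_once is hypothetical and is not part of BMI. It shows that a second EPOLL_CTL_ADD for a descriptor already in the set fails with EEXIST and leaves the set unchanged, so the only cost is the extra system call.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (not a BMI function): register fd for read events.
 * Returns 1 if fd was newly added, 0 if it was already in the set,
 * or -errno on any other failure. */
int epoll_add_once(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
        return 1;
    if (errno == EEXIST)
        return 0;   /* redundant add: harmless, but a wasted syscall */
    return -errno;
}
```

Calling it twice on the same open descriptor should return 1 and then 0. Tracking "already registered" in user space would avoid the second call entirely.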
>>
>> Also, is it safe to check the error from epoll_ctl in
>> BMI_socket_collection_[add|remove]?
>
> Yep, we should be checking the return value from these functions.
> Perhaps make _add and _remove inline functions with return
> values?
Ok yeah - something like that. I'll try to look into it more once I
get the bug figured out.
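
As a rough sketch of what checked add/remove wrappers might look like (the struct and function names below are hypothetical stand-ins, not the actual BMI_socket_collection code): each wrapper returns 0 or -errno, and a counter tracks how many descriptors we believe are registered, which would make unexpected growth of the set visible.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical stand-in for a socket collection; not the BMI struct. */
struct sock_collection {
    int epfd;       /* epoll instance */
    int registered; /* descriptors we believe are in the set */
};

/* Returns 0 on success or -errno; EEXIST is reported, not hidden. */
static inline int sc_add(struct sock_collection *sc, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };

    if (epoll_ctl(sc->epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
        return -errno;
    sc->registered++;
    return 0;
}

/* Returns 0 on success or -errno (ENOENT if fd was not in the set). */
static inline int sc_remove(struct sock_collection *sc, int fd)
{
    /* Kernels before 2.6.9 require a non-NULL event pointer for
     * EPOLL_CTL_DEL even though its contents are ignored. */
    struct epoll_event ev = { 0 };

    if (epoll_ctl(sc->epfd, EPOLL_CTL_DEL, fd, &ev) < 0)
        return -errno;
    sc->registered--;
    return 0;
}
```

With return values like these, a caller could log an unexpected -ENOENT on remove, which is the kind of signal that would help confirm or rule out a missed EPOLL_CTL_DEL.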
>
>> And finally, assuming PVFS is actually using epoll calls properly,
>> does anyone know of epoll bugs on a SUSE 2.6.5 kernel that would
>> cause epoll_ctl(..., EPOLL_CTL_DEL, ...) to not do what it's meant
>> to? Googling epoll and SUSE 2.6.5 isn't turning up anything...
>
> Nope, none that I can think of.
> thanks,
> Murali
>>
>> Thanks,
>> -sam