[Pvfs2-developers] epoll fun

Sam Lang slang at mcs.anl.gov
Wed Oct 3 10:52:55 EDT 2007


Hi Murali,

Thanks for your comments!  I've added my own inline.
-sam

On Oct 1, 2007, at 8:24 PM, Murali Vilayannur wrote:

> Hey Sam,
> Ugh..
> First off, really nice detective work!!!
>
>> degrades slowly with a long-lived (weeks and months) PVFS volume.
>> The degradation is significant -- simple metadata operations are an
>> order of magnitude slower after a month or so.  The behavior turns
>> out to only occur with the VFS and pvfs2-client daemon:  performance
>> of the admin tools (pvfs2-touch, pvfs2-rm, etc.) to the same set of
>> servers remains good.  Restarting the client daemon also fixes the
>> problem, suggesting that the long-lived open sockets are somehow the
>> cause.  The slowness also appears to be at the servers not the
>> clients: the same kernel module and client daemon to a different
>> filesystem and set of servers doesn't exhibit the performance
>> degradation.
>>
>> Also, I should mention that the system config is a little different
>> than usual.  We have IO nodes mounting and unmounting the PVFS
>> volume (and stopping the client daemon) with each user's job, which
>> is fairly frequent, while on the login nodes, the volume remains
>> mounted for a long time (and where the performance degrades).
>>
>> Our hunch here is that epoll or our use of epoll on the servers is
>> somehow to blame.  Maybe the file descriptors opened on the server
>> for pvfs2-client-core are getting pushed down further and further
>> into the epoll set, which for some reason is growing with new
>> connections coming and going.  This might be the case if we were
>> failing to remove sockets from the set on disconnect, for example.
>> It doesn't look like that's happening though, at least for normal
>> disconnects.
>
> Just to make sure, can't we switch to a poll()-based server and see if
> we have the same problem?

Tried that.  :-)  It's more or less the same problem with poll.  The
behavior of poll timings seems a bit less erratic than with epoll,
but the performance degradation is identical.

>
>
>>
>> It's a PITA to debug because the servers have to remain running for a
>> long time (and the clients have to remain mounted) for the problem to
>> be visible.  Rob suggested I use strace on the servers to see what
>> epoll was doing, and that showed some interesting results.
>> Basically, it looks like epoll_wait takes significantly longer when
>> clients are doing operations over the VFS, rather than with the pvfs2
>> admin tools.  Also, strace reported epoll_ctl(...,
>> EPOLL_CTL_ADD, ...) getting called a few times, even for the VFS
>> ops, and in those cases it's returning EEXIST.
>>
>> I noticed that we add a socket to the epoll set whenever we get a new
>> connection, or a read or write is posted (enqueue_operation), but we
>> only remove the socket from the epoll set on errors or disconnects.
>> So why are we adding it for reads and writes?  Any connected socket
>> should already be in the set, no?  I think this may be why I'm seeing
>> EEXIST with strace.
>
> Yep, agreed; we shouldn't need to add it if it already exists. But
> that is not a bug as far as I can tell.

Right -- just an extra system call we don't need?
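
To see whether the redundant EPOLL_CTL_ADD is really harmless, here is a minimal standalone C sketch (not PVFS code; a socketpair stands in for a client connection, and duplicate_add_demo is a hypothetical helper). It registers the same socket twice, roughly what the new-connection path followed by a post in enqueue_operation would do, then counts how many events epoll_wait reports after a single write. On Linux the second add should fail with EEXIST and leave the existing registration untouched, so only one event should fire.

```c
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register fd for readability; returns epoll_ctl's result (errno set on failure). */
static int add_read_fd(int epfd, int fd)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Hypothetical demo: add one end of a socketpair to an epoll set twice.
 * Stores the errno from the second add in *dup_errno (0 if it succeeded)
 * and returns the number of ready events epoll_wait reports after one
 * write, or -1 if setup fails.
 */
static int duplicate_add_demo(int *dup_errno)
{
    int sv[2], nready = -1;
    struct epoll_event out[4];
    int epfd = epoll_create(1);   /* size is only a hint, but must be > 0 */

    if (epfd < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(epfd);
        return -1;
    }

    *dup_errno = 0;
    if (add_read_fd(epfd, sv[0]) == 0) {        /* "new connection" */
        if (add_read_fd(epfd, sv[0]) < 0)       /* "enqueue_operation" */
            *dup_errno = errno;                 /* expected: EEXIST */
        /* Make sv[0] readable and count how many entries fire. */
        if (write(sv[1], "x", 1) == 1)
            nready = epoll_wait(epfd, out, 4, 1000);
    }

    close(sv[0]);
    close(sv[1]);
    close(epfd);
    return nready;
}
```

If the same result holds on the servers' kernel, the EEXIST calls in strace are just the wasted system call, not a second entry accumulating in the set.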

>>
>> Also, is it safe to check the error from epoll_ctl in
>> BMI_socket_collection_[add|remove]?
>
> Yep; We should be checking for the return value from these functions.
> Perhaps make the _add and _remove as inline functions with return  
> values?

Ok yeah - something like that.  I'll try to look into it more once I  
get the bug figured out.
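
A rough sketch of what checked inline versions could look like, assuming the collection boils down to an epoll fd (sc_add and sc_remove are hypothetical names; the real BMI_socket_collection_[add|remove] helpers may take different arguments). Each returns 0 or a negative errno so the caller can log EEXIST or ENOENT instead of silently ignoring it. The dummy event passed on removal follows the epoll_ctl(2) man page note that kernels before 2.6.9 require a non-NULL event pointer even for EPOLL_CTL_DEL, which could matter on a 2.6.5 kernel.

```c
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>

/*
 * Hypothetical checked wrappers: return 0 on success or -errno on failure.
 */
static inline int sc_add(int epfd, int fd, unsigned int events)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = events;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0 ? 0 : -errno;
}

static inline int sc_remove(int epfd, int fd)
{
    /* Kernels before 2.6.9 reject a NULL event pointer even for
     * EPOLL_CTL_DEL, so always pass a (zeroed, ignored) one. */
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &ev) == 0 ? 0 : -errno;
}
```

A caller could then treat -EEXIST from sc_add as benign while still logging anything else, and flag -ENOENT from sc_remove as a sign the socket was never registered or was already removed.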

>
>> And finally, assuming PVFS is actually using epoll calls properly,
>> does anyone know of epoll bugs on a SUSE 2.6.5 kernel that would
>> cause epoll_ctl(..., EPOLL_CTL_DEL, ...) to not do what it's meant
>> to?  Googling epoll and SUSE 2.6.5 isn't turning up anything...
>
> Nope, none that I can think of.
> thanks,
> Murali
>>
>> Thanks,
>> -sam
>
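
One way to rule the kernel in or out is a small standalone check run on the SUSE 2.6.5 servers themselves. A minimal C sketch, with del_check as a hypothetical helper (not part of PVFS): add a socket, delete it, make it readable, and confirm epoll_wait no longer reports it; a second EPOLL_CTL_DEL on the same fd should fail with ENOENT.

```c
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical EPOLL_CTL_DEL sanity check. Returns the number of events
 * epoll_wait reports for a socket after it was removed from the set
 * (expected 0), or -1 if setup fails. *second_del_errno receives the
 * errno from a second EPOLL_CTL_DEL on the same fd (expected ENOENT).
 */
static int del_check(int *second_del_errno)
{
    int sv[2], n = -1;
    struct epoll_event ev, out[4];
    int epfd = epoll_create(1);

    if (epfd < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(epfd);
        return -1;
    }

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = sv[0];
    *second_del_errno = 0;

    /* Non-NULL event pointer on DEL: required before 2.6.9. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        epoll_ctl(epfd, EPOLL_CTL_DEL, sv[0], &ev) == 0) {
        if (epoll_ctl(epfd, EPOLL_CTL_DEL, sv[0], &ev) < 0)
            *second_del_errno = errno;
        /* sv[0] is still open but should no longer be watched. */
        if (write(sv[1], "x", 1) == 1)
            n = epoll_wait(epfd, out, 4, 100);
    }

    close(sv[0]);
    close(sv[1]);
    close(epfd);
    return n;
}
```

Also worth keeping in mind, per the epoll(7) man page: a descriptor is dropped from the set automatically once every descriptor referring to the underlying open file is closed. So a socket that is closed on disconnect should not linger in the set even without an explicit EPOLL_CTL_DEL, unless something else (a dup'd or inherited descriptor) still holds it open.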


