[PVFS-developers] IOD File Descriptor Leak
Porter Don
PorterDE@mercury.hendrix.edu
Tue, 16 Sep 2003 12:02:35 -0500
Thanks Rob. This is very helpful.
The problem with initializing the sock field in do_open_req() is that (at
least in some observation I did), this seems to always be called by the
manager.
I looked at pvfs_open.c and it seems to call the manager with a MGR_OPEN
request, but it sends a NOOP to the iods just to initialize the connection.
In mgr.c:do_open() the manager sends the IOD_OPEN request to the iods if
this is the first time the file has been opened.
In some tests I did with a similar one-line fix, all finfo's got associated
with the manager (usually sock fd 0). So unless they are re-assigned in
do_rw_req(), they will still not be cleaned up unless the manager is
bounced.
Also, what is the capability number? I can't figure that out. My wrong
conclusion that it grabbed the first finfo was based on the fact that all
requests I see used a capability number of 0.
Thanks,
Don
-----Original Message-----
From: Rob Ross
To: Porter Don
Cc: 'pvfs-developers@www.beowulf-underground.org'
Sent: 9/16/03 11:28 AM
Subject: Re: [PVFS-developers] IOD File Descriptor Leak
Hi Don,
I see what you're talking about; the file structure is allocated in
iod.c:do_open_req() but the sock field isn't set until
jobs.c:do_rw_req().
This is a problem because the flist_purge_sock() call then cannot
correctly remove the file structure on socket close.
However, your assertion that do_rw_job() just grabs the first finfo
isn't
correct. That search is based on both the ino and the capability
number,
which is different for every client.
I'm not sure that there is still a clear reason to have multiple file
descriptors for each socket; however, it is currently partially used for
permission purposes, and it isn't clear to me how moving to a single FD
would change how the kernel perceives what the iod is doing.
I'm checking in a single-line fix that associated the socket with the
file
structure at open time. I think that this will fix the leak. If you
want
to investigate changing to using a single FD for each file, that's cool,
but I'm a little hesitant to switch that up without significant testing.
Thanks for the leak catch!
Regards,
Rob
On Tue, 16 Sep 2003, Porter Don wrote:
> I have been noticing on some of the iods what appears to be a file
> descriptor leak. In looking at the iod code, it seems to be happening
when
> a job cancels between opening a file and doing a rw job. The socket
number
> in the finfo struct is initialized to -1 and doesn't get set to a real
> socket until a rw job gets it from the list. If this field never gets
> initialized, the finfo will not be cleaned up when the socket closes.
The
> leak seems to be introduced because right now the do_rw_job function
just
> grabs the first finfo for the inode it wants and assigns its socket if
it is
> uninitialized, completely ignoring all but the first finfo on the
linked
> list.
>
> This raises the question, is there any reason to have multiple file
> descriptors for the same inode file at all? The iods are single
threaded
> and are effectively sharing one anyway. It seems that perhaps
changing the
> finfo structure to keep a linked list of sockets as a sort of
"reference
> count" (like the mgr) might be more efficient and plug up this leak.
I
> suppose this would also need to be coupled with having something in
> check_socks that would clean up unclaimed finfo structs after a period
of
> idleness.
>
> Anyway, I am about to start working on this, but I thought I would
bounce it
> off of the list and see if anyone had any insights.