Mark Vitale c5993b0a4f rx: prevent leak of cache manager NAT ping rx_connections
Both the Unix and Windows cache managers maintain a set of persistent
client rx_connections ("conns") to the known fileservers for each active
AFS user.  These conns are periodically refreshed (destroyed and
possibly rebuilt) approximately NOTOKTIMEOUT (2h) after token
expiration for authenticated (rx_kad) conns, or every NOTOKTIMEOUT (2h)
for anonymous (rx_null) conns.

Both cache managers enable NAT ping for one rx_connection to each known
fileserver.  Thus we see this common idiom when a cache manager destroys
a fileserver connection:

  rx_SetConnSecondsUntilNatPing(conn, 0);
  rx_DestroyConnection(conn);

Doing this for all conns is harmless, even if a given conn doesn't have
NAT ping enabled.

It is important to note that rx_SetConnSecondsUntilNatPing(conn, 0) does
not actually cancel the conn's natKeepAliveEvent (if it has one); it
merely sets conn->secondsUntilNatPing to 0.  If there is a
natKeepAliveEvent, this prevents it from being rescheduled after the
next event.  This was fine in the past because rx_DestroyConnection
eventually did cancel natKeepAliveEvent and destroy the rx_connection
itself.

However, this idiom broke after commit 304d758983 "Standardize
rx_event usage" introduced a number of changes:
 - an extra rx_connection refCount for each outstanding conn event
 - a requirement (enforced by osi_Assert) that all conn events be
   canceled before rx_DestroyConnection can succeed

Therefore, if natKeepAliveEvent is still active when we enter
rx_DestroyConnection, it will return early, due to the extra conn
refCount associated with the NAT ping event.  The rx_connection is not
destroyed in this case.

Eventually, the final scheduled natKeepAliveEvent will fire; the event
will not be rescheduled, and the final conn refCount will be removed via
putConnection and not rx_DestroyConnection (and so the conn is not
destroyed). This rx_connection now has refCount 0, but can never be
destroyed - it has been leaked:
 - The cache manager has "forgotten" this rx_connection and has no means
   to invoke rx_DestroyConnection again.
 - rxi_ReapConnections will not destroy it because it is a client conn
   and is still on the rx_connHashTable, not the rx_connCleanup_list.
 - If there is still a dallying rx_call, any eventual call to
   rxi_CheckCall -> rxi_FreeCall will not destroy the conn because it
   has not been flagged RX_CONN_DESTROY_ME.
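
The sequence above can be traced with a small toy model. This is only a
sketch: struct toy_conn and every toy_* function are hypothetical
stand-ins invented for illustration, not the real rx structures or
functions, and the model assumes one refCount held by the cache manager
plus one held by the pending NAT ping event.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an rx_connection; NOT the real struct. */
struct toy_conn {
    int refCount;            /* one ref per holder, including a pending event */
    bool natEventPending;    /* models "natKeepAliveEvent is scheduled" */
    int secondsUntilNatPing; /* 0 means "do not reschedule" */
    bool destroyed;          /* set only when the conn is really torn down */
};

/* Models the setter before the fix: it only records the interval. */
static void toy_set_nat_ping_old(struct toy_conn *c, int seconds)
{
    c->secondsUntilNatPing = seconds;
}

/* Models the destroy path: drop the caller's ref and return early if
 * anyone else (here, the pending event) still holds one. */
static void toy_destroy(struct toy_conn *c)
{
    if (--c->refCount > 0)
        return;              /* early return; conn is not destroyed */
    c->destroyed = true;
}

/* Models the NAT ping event firing. */
static void toy_nat_event_fires(struct toy_conn *c)
{
    c->natEventPending = false;
    if (c->secondsUntilNatPing > 0) {
        c->natEventPending = true;   /* rescheduled; the event keeps its ref */
        return;
    }
    c->refCount--;           /* plain ref drop; nothing destroys the conn */
}

/* The cache manager idiom followed by the last event firing. */
static struct toy_conn toy_leak_scenario(void)
{
    struct toy_conn c = {
        .refCount = 2, .natEventPending = true,
        .secondsUntilNatPing = 20, .destroyed = false
    };
    toy_set_nat_ping_old(&c, 0);
    toy_destroy(&c);
    toy_nat_event_fires(&c);
    return c;
}
```

In this model the conn ends with refCount 0 and no pending event, yet it
was never destroyed, which matches the leak described above.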

Modify rx_SetConnSecondsUntilNatPing to explicitly cancel any
natKeepAliveEvent and remove its refcount if canceled.  With this
change in place, the cache managers will no longer periodically leak
client rx_connections.
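
The effect of this change can be sketched with a toy model. Everything
here (struct toy_conn and the toy_* functions) is a hypothetical
stand-in for illustration, not the actual rx code. The model also
assumes that cancellation happens only when the interval is set to 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an rx_connection; NOT the real struct. */
struct toy_conn {
    int refCount;            /* one ref per holder, including a pending event */
    bool natEventPending;    /* models "natKeepAliveEvent is scheduled" */
    int secondsUntilNatPing; /* 0 means NAT ping is disabled */
    bool destroyed;          /* set only when the conn is really torn down */
};

/* Models the modified setter: when NAT ping is being disabled, cancel
 * any pending event and drop the ref that event was holding. */
static void toy_set_nat_ping_fixed(struct toy_conn *c, int seconds)
{
    c->secondsUntilNatPing = seconds;
    if (seconds == 0 && c->natEventPending) {
        c->natEventPending = false;  /* event canceled */
        c->refCount--;               /* release the event's ref */
    }
}

/* Models the destroy path: drop the caller's ref and return early if
 * anyone else still holds one. */
static void toy_destroy(struct toy_conn *c)
{
    if (--c->refCount > 0)
        return;
    c->destroyed = true;
}

/* The same cache manager idiom, now with the modified setter. */
static struct toy_conn toy_fixed_scenario(void)
{
    struct toy_conn c = {
        .refCount = 2, .natEventPending = true,
        .secondsUntilNatPing = 20, .destroyed = false
    };
    toy_set_nat_ping_fixed(&c, 0);
    toy_destroy(&c);
    return c;
}
```

Because the event's ref is released before the destroy call, the destroy
path in this model sees the last ref go away and tears the conn down
instead of returning early.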

Change-Id: I4e89ebc4bd2c95b6e61b95bd8f91867d451dd34c
Reviewed-on: https://gerrit.openafs.org/14951
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
2024-10-10 16:19:20 -04:00
build-tools make-release: create SHA256 checksums too 2024-04-25 12:22:19 -04:00
doc auth: Remove src/auth/copyauth 2024-10-09 16:35:34 -04:00
src rx: prevent leak of cache manager NAT ping rx_connections 2024-10-10 16:19:20 -04:00
tests tests: Fix perl string concatenation spacing 2024-09-12 11:36:12 -04:00
.gitignore Remove alpha_dux/alpha_osf references 2018-09-22 17:05:26 -04:00
.gitreview Add .gitreview 2018-02-04 15:34:55 -05:00
.mailmap git: add a mailmap file 2016-09-25 21:05:23 -04:00
.splintrc start-splint-support-20030528 2003-05-28 19:18:08 +00:00
acinclude.m4 cf: Remove SRCDIR_PARENT 2024-08-19 09:41:11 -04:00
CODING Stop defining HC_DEPRECATED 2024-07-09 08:13:29 -04:00
configure-libafs.ac cf: Set CC before calling AC_PROG_CC 2024-07-02 13:13:45 -04:00
configure.ac build: Remove doc directory checks 2024-07-09 11:21:54 -04:00
CONTRIBUTING Correct our contributor's code of conduct 2020-09-04 10:01:28 -04:00
INSTALL INSTALL: Update AIX notes 2024-07-02 14:52:10 -04:00
libafsdep Move build support files into build-tools 2010-07-14 20:40:36 -07:00
LICENSE cf: Make local copy of ax_gcc_func_attribute.m4 2020-07-24 08:35:59 -04:00
Makefile-libafs.in Fix libafs_tree's cross-architecture support 2010-05-24 20:28:41 -07:00
Makefile.in tests: Make src/tests buildable 2024-10-03 15:44:31 -04:00
NEWS Update NEWS for OpenAFS 1.9.1 2021-03-18 21:48:27 -04:00
NTMakefile Remove rpctestlib 2021-06-10 12:59:53 -04:00
README Tweak grammar in README 2015-12-28 19:32:17 -05:00
README-WINDOWS Update windows build documentation 2013-07-02 15:14:09 -07:00
regen.sh Use autoconf-archive m4 from src/external 2020-05-08 11:30:36 -04:00

AFS is a distributed file system that enables users to share and
access all of the files stored in a network of computers as easily as
they access the files stored on their local machines. The file system is
called distributed for this exact reason: files can reside on many
different machines, but are available to users on every machine.

OpenAFS 1.0 was originally released by IBM under the terms of the
IBM Public License 1.0 (IPL10).  For details on IPL10 see the LICENSE
file in this directory.  The current OpenAFS distribution is licensed
under a combination of the IPL10 and many other licenses as granted by
the relevant copyright holders.  The LICENSE file in this directory
contains more details, though it is not a comprehensive statement.

See INSTALL for information about building and installing OpenAFS
on various platforms.

See CODING for developer information and guidelines.

See NEWS for recent changes to OpenAFS.