mirror of
https://git.openafs.org/openafs.git
synced 2025-01-18 15:00:12 +00:00
rx: prevent leak of cache manager NAT ping rx_connections
Both the Unix and Windows cache managers maintain a set of persistent
client rx_connections ("conns") to the known fileservers for each active
AFS user. These conns are periodically refreshed (destroyed and
possibly rebuilt) approximately NOTOKETIMEOUT (2h) after token
expiration for authenticated (rx_kad) conns, or every NOTOKETIMEOUT (2h)
for anonymous (rx_null) conns.
Both cache managers enable NAT ping for one rx_connection to each known
fileserver. Thus we see this common idiom when a cache manager destroys
a fileserver connection:
rx_SetConnSecondsUntilNatPing(conn, 0);
rx_DestroyConnection(conn);
Doing this for all conns is harmless, even if a given conn doesn't have
NAT ping enabled.
It is important to note that rx_SetConnSecondsUntilNatPing(conn, 0) does
not actually cancel the conn's natKeepAliveEvent (if it has one); it
merely sets conn->secondsUntilNatPing to 0. If there is a
natKeepAliveEvent, this prevents it from being rescheduled after the
next event. This was fine in the past because rx_DestroyConnection
eventually did cancel natKeepAliveEvent and destroy the rx_connection
itself.
However, this idiom broke after commit 304d758983
"Standardize
rx_event usage" introduced a number of changes:
- an extra rx_connection refCount for each outstanding conn event
- a requirement (enforced by osi_Assert) that all conn events be
canceled before rx_DestroyConnection can succeed
Therefore, if natKeepAliveEvent is still active when we enter
rx_DestroyConnection, it will return early, due to the extra conn
refCount associated with the NAT ping event. The rx_connection is not
destroyed in this case.
Eventually, the final scheduled natKeepAliveEvent will fire; the event
will not be rescheduled, and the final conn refCount will be removed via
putConnection and not rx_DestroyConnection (and so the conn is not
destroyed). This rx_connection now has refCount 0, but can never be
destroyed - it has been leaked:
- The cache manager has "forgotten" this rx_connection and has no means
to invoke rx_DestroyConnection again.
- rxi_ReapConnections will not destroy it because it is a client conn
and is still on the rx_connHashTable, not the rx_connCleanup_list.
- If there is still a dallying rx_call, any eventual call to
rxi_CheckCall -> rxi_FreeCall will not destroy the conn because it
has not been flagged RX_CONN_DESTROY_ME.
Modify rx_SetConnSecondsUntilNatPing to explicitly cancel any
natKeepAliveEvent and remove its refcount if cancelled. With this
change in place, the cache managers will no longer periodically leak
client rx_connections.
Change-Id: I4e89ebc4bd2c95b6e61b95bd8f91867d451dd34c
Reviewed-on: https://gerrit.openafs.org/14951
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
This commit is contained in:
parent
983a670b9d
commit
c5993b0a4f
@ -6659,7 +6659,10 @@ rx_SetConnSecondsUntilNatPing(struct rx_connection *conn, afs_int32 seconds)
|
||||
{
|
||||
MUTEX_ENTER(&conn->conn_data_lock);
|
||||
conn->secondsUntilNatPing = seconds;
|
||||
if (seconds != 0) {
|
||||
if (seconds == 0) {
|
||||
if (rxevent_Cancel(&conn->natKeepAliveEvent))
|
||||
putConnection(conn);
|
||||
} else {
|
||||
if (!(conn->flags & RX_CONN_ATTACHWAIT))
|
||||
rxi_ScheduleNatKeepAliveEvent(conn);
|
||||
else
|
||||
|
Loading…
Reference in New Issue
Block a user