From bb5397e4c409e3c075ee73d6bf54a3b6eacc0060 Mon Sep 17 00:00:00 2001 From: Mark Vitale Date: Fri, 20 Apr 2018 00:57:28 -0400 Subject: [PATCH] rx: prevent leakage of non-cached rx_connections (pthread) The rxi_connectionCache (AFS_PTHREAD_ENV only) allows applications to reuse rx_connection structs. Cached rx_connections are obtained via rx_GetCachedConnection and released via rx_ReleaseCachedConnection. This feature is used most heavily by libadmin and kauth, but there are other users in the tree as well. For instance, ubikclient routines ubik_ClientInit and ubik_ClientDestroy call rx_ReleaseCachedConnections (if AFS_PTHREAD_ENV) when disposing of their rx_connections. Unfortunately, in many cases these rx_connections were obtained via rx_NewConnection, _not_ from the cache via rx_GetCachedConnection. In those cases, rx_ReleaseCachedConnection will not find the rx_connection in the rxi_connectionCache, and thus it returns without doing anything. Therefore, when ubik_ClientInit is passed an existing ubik_client (for re-initialization) that contains rx_connections NOT allocated via rx_GetCachedConnection, those connections are not destroyed, but will be silently leaked. Similarly, ubik_ClientDestroy will leak its rx_connections when it frees the ubik_client struct. For example, the fileserver host package calls ubik_ClientInit (via hpr_Initialize) and ubik_ClientDestroy (via hpr_End) to manage connections to the ptserver. However, these connections were obtained via rx_NewConnection, not rx_GetCachedConnection. If the fileserver has a failed call to the ptserver that sets prfail=1, the next RPC scheduled for that client (in CallPreamble) will refresh the thread's ubik_client (viced_uclient_key) by calling hprEnd -> ubik_ClientDestroy -> rx_ReleaseCachedConnection. The "released" connections will be leaked. This problem exists in all versions of OpenAFS going back to IBM 1.0. Starting with 1.8.x, many components that were formerly LWP-only are now pthreaded and thus susceptible to this leak. It seems difficult and error-prone to identify all possible code paths that may pass a non-cached rx_connection to rx_ReleaseCachedConnection, and convert them to obtain connections via rx_GetCachedConnection. Instead, prevent all existing and future leaks by modifying the connection cache to: - flag all rx_connections it allocates - correctly release any rx_connection it is passed, whether they came from the cache or not. Change-Id: Ibe164ccd30a8ddd799438c28fd6e1d8a0a9040dd Reviewed-on: https://gerrit.openafs.org/13042 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot --- src/rx/rx.h | 1 + src/rx/rx_conncache.c | 15 +++++++++++++++ 2 files changed, 16 insertions(+) diff --git a/src/rx/rx.h b/src/rx/rx.h index 219ad2d656..3e1b77e467 100644 --- a/src/rx/rx.h +++ b/src/rx/rx.h @@ -366,6 +366,7 @@ struct rx_service { #define RX_CONN_ATTACHWAIT 64 /* attach waiting for peer->lastReach */ #define RX_CONN_MAKECALL_ACTIVE 128 /* a thread is actively in rx_NewCall */ #define RX_CONN_NAT_PING 256 /* NAT ping requested but deferred during attachWait */ +#define RX_CONN_CACHED 512 /* connection is managed by rxi_connectionCache */ /* Type of connection, client or server */ #define RX_CLIENT_CONNECTION 0 diff --git a/src/rx/rx_conncache.c b/src/rx/rx_conncache.c index 89f8d5e5df..ced822c736 100644 --- a/src/rx/rx_conncache.c +++ b/src/rx/rx_conncache.c @@ -142,6 +142,7 @@ rxi_AddCachedConnection(rx_connParts_p parts, struct rx_connection **conn) new_entry->hasError = 0; opr_queue_Prepend(&rxi_connectionCache, &new_entry->queue); } + (*conn)->flags |= RX_CONN_CACHED; /* * if malloc fails, we fail silently @@ -258,6 +259,19 @@ rx_ReleaseCachedConnection(struct rx_connection *conn) struct opr_queue *cursor, *store; LOCK_CONN_CACHE; + + /* Check if the caller is asking us to release a conn that did NOT come + * from the connection cache. If so, don't bother searching the cache + * because the connection won't be found or destroyed. Since we return + * void, the caller must assume the connection _has_ been found and + * destroyed. So to avoid leaking the connection, just destroy it now and + * return. + */ + if (!(conn->flags & RX_CONN_CACHED)) { + rxi_DestroyConnection(conn); + UNLOCK_CONN_CACHE; + return; + } for (opr_queue_ScanSafe(&rxi_connectionCache, cursor, store)) { struct cache_entry *cacheConn = opr_queue_Entry(cursor, struct cache_entry, queue); @@ -274,6 +288,7 @@ rx_ReleaseCachedConnection(struct rx_connection *conn) cacheConn->hasError = 1; if (cacheConn->inUse == 0) { opr_queue_Remove(&cacheConn->queue); + cacheConn->conn->flags &= ~RX_CONN_CACHED; rxi_DestroyConnection(cacheConn->conn); free(cacheConn); }