From 5757a0dc3debb6b1d7e47faee18373a2866e4b03 Mon Sep 17 00:00:00 2001
From: Michael Meffie
Date: Thu, 27 Aug 2015 13:06:05 -0400
Subject: [PATCH] afs: shake harder in shake-loose-vcaches

Linux-based cache managers allocate vcaches on demand and deallocate
batches of vcaches in the background. This feature is called dynamic
vcaches.

Vcaches to be deallocated are found by traversing the vcache LRU list
(VLRU) from the oldest vcache to the newest. The cache manager attempts
to evict up to a target number of vcaches. The afs_xvcache lock
protecting the VLRU may be dropped and re-acquired while attempting to
evict a vcache. When this happens, the VLRU may have changed, so the
traversal of the VLRU is restarted. This restarting of the VLRU
traversal is limited to 100 iterations to avoid looping indefinitely.

Vcaches which are busy cannot be evicted and remain in the VLRU. When a
busy vcache was not evicted and the afs_xvcache lock was dropped, the
VLRU traversal is restarted from the end of the VLRU. When the busy
vcache is encountered on the retry, it triggers additional retries
until the loop limit is reached, at which point the target number of
vcaches will not have been deallocated.

This can leave a very large number of unbusy vcaches which are never
deallocated. On a busy machine, tens of millions of unused vcaches can
remain in memory. When the busy vcache at the end of the VLRU is finally
evicted, the log jam is broken, and the background daemon will hold the
afs_xvcache lock for an excessively long time, hanging the system.

Fix this by moving busy vcaches to the head of the VLRU before
restarting the VLRU traversal. These busy vcaches will be skipped when
retrying the VLRU traversal, allowing the cache manager to make progress
deallocating vcaches down to the target level.

This was already done on the Mac OS X platform while attempting to
evict vcaches. Move the code that moves busy vcaches to the head of the
VLRU up to the platform-agnostic caller.

Thanks to Andrew Deason for the initial version of this patch.

Reviewed-on: https://gerrit.openafs.org/11654
Tested-by: BuildBot
Reviewed-by: Andrew Deason
Reviewed-by: Benjamin Kaduk
(cherry picked from commit 5c136c7d93ed97166f39bf716cc7f5d579b70677)

Change-Id: If60b1889d012a739aa5b43e842abb80a6ebfdb6a
Reviewed-on: https://gerrit.openafs.org/12451
Tested-by: BuildBot
Reviewed-by: Benjamin Kaduk
Reviewed-by: Mark Vitale
Reviewed-by: Stephan Wiesand
---
 src/afs/DARWIN/osi_vcache.c |  5 +----
 src/afs/afs_vcache.c        | 14 +++++++++++++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/afs/DARWIN/osi_vcache.c b/src/afs/DARWIN/osi_vcache.c
index 18d8d9a089..1a1199c4cb 100644
--- a/src/afs/DARWIN/osi_vcache.c
+++ b/src/afs/DARWIN/osi_vcache.c
@@ -53,10 +53,7 @@ osi_TryEvictVCache(struct vcache *avc, int *slept, int defersleep) {
      * this out, since the iocount we have to hold makes it
      * always "fail" */
     if (AFSTOV(avc) == tvp) {
-        if (*slept) {
-            QRemove(&avc->vlruq);
-            QAdd(&VLRU, &avc->vlruq);
-        }
+        /* Caller will move this vcache to the head of the VLRU. */
         return 0;
     } else
         return 1;
diff --git a/src/afs/afs_vcache.c b/src/afs/afs_vcache.c
index d751a564ce..ca5a956ea2 100644
--- a/src/afs/afs_vcache.c
+++ b/src/afs/afs_vcache.c
@@ -725,6 +725,7 @@ int
 afs_ShakeLooseVCaches(afs_int32 anumber)
 {
     afs_int32 i, loop;
+    int evicted;
     struct vcache *tvc;
     struct afs_q *tq, *uq;
     int fv_slept, defersleep = 0;
@@ -752,12 +753,23 @@ afs_ShakeLooseVCaches(afs_int32 anumber)
         }
 
         fv_slept = 0;
-        if (osi_TryEvictVCache(tvc, &fv_slept, defersleep))
+        evicted = osi_TryEvictVCache(tvc, &fv_slept, defersleep);
+        if (evicted) {
             anumber--;
+        }
 
         if (fv_slept) {
             if (loop++ > 100)
                 break;
+            if (!evicted) {
+                /*
+                 * This vcache was busy and we slept while trying to evict it.
+                 * Move this busy vcache to the head of the VLRU so vcaches
+                 * following this busy vcache can be evicted during the retry.
+                 */
+                QRemove(&tvc->vlruq);
+                QAdd(&VLRU, &tvc->vlruq);
+            }
             goto retry; /* start over - may have raced. */
         }
         if (uq == &VLRU) {