FBSD: Avoid extra vcache puts in afs_root

Our two 'goto tryagain;' cases in afs_root() have a couple of
refcounting problems:

- If vget() returns an error, but any of the cases in the first if()
  statement are true (such as, afs_globalVp does not have CStatd set),
  we'll vput() and afs_PutVCache(), but vget() didn't grab a
  reference, since it returned an error. So we'll put references we
  don't actually have.

- If we enter the first if() block when vget() returns a success, we
  vput() the reference we got from vget(), but we also put an
  additional reference by calling afs_PutVCache(). If afs_globalVp
  still points to this same vcache, this can cause afs_globalVp to
  point to a vcache without a ref held for it.

Because of this, if afs_globalVp loses CStatd while we're waiting for
the vnode lock, this can cause the afs_globalVp vcache to not have any
refs held for it, which causes all sorts of other possible problems,
where the usecount for afs_globalVp can drop to 0 when we don't expect
it to.

To fix these issues, remove the extra afs_PutVCache(), and check for
an error from vget() before doing the other afs_globalVp-related
checks.

The relevant code path involved here can be stressed by frequently
causing lookups via /afs/..., while at the same time causing frequent
callback breaks on the root vnode. This can be achieved by using an RW
volume as the root volume (with non-dynroot), and having another
client constantly modify the root directory in that volume while a
FreeBSD client constantly does /afs/... lookups in separate pids.

Thanks to tcreech@tcreech.com for reporting and helping investigate
the relevant issue.

Change-Id: I812d063b3d60ac6eb841863cc7fba3e152393910
Reviewed-on: https://gerrit.openafs.org/14206
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
This commit is contained in:
Andrew Deason 2020-05-10 17:02:21 -05:00 committed by Andrew Deason
parent ccd5cb381e
commit d822447c27

View File

@ -233,15 +233,15 @@ tryagain:
AFS_GUNLOCK();
error = vget(vp, LK_EXCLUSIVE | LK_RETRY, td);
AFS_GLOCK();
if (error != 0) {
goto tryagain;
}
/* we dropped the glock, so re-check everything it had serialized */
if (!afs_globalVp || !(afs_globalVp->f.states & CStatd) ||
tvp != afs_globalVp) {
vput(vp);
afs_PutVCache(tvp);
goto tryagain;
}
if (error != 0)
goto tryagain;
/*
* I'm uncomfortable about this. Shouldn't this happen at a
* higher level, and shouldn't we busy the top-level directory