LINUX: Unlock page on afs_linux_read_cache errors

When afs_linux_read_cache is called with a non-NULL task, it is
responsible for unlocking 'page' (unless it's unlocked in a background
task), even if we encounter an error. Currently we almost always do
unlock the given page for a non-NULL task, but if we manage to hit one
of the codepaths that 'goto out', we skip over the unlock_page() call
near the end of the function, and the page never gets unlocked.

As a result, the page stays locked forever. That generally means any
future access to the same file will block forever, and when we try to
flush the relevant vcache, we will block waiting for the page lock
while holding GLOCK. (This can happen via the background daemon via
e.g. afs_ShakeLooseVCaches -> osi_TryEvictVCache -> afs_FlushVCache ->
osi_VM_FlushVCache -> vmtruncate -> ... -> truncate_inode_pages_range
-> __lock_page on Linux 2.6.32-754.2.1.el6.) This quickly brings the
whole client to a halt until the machine can be forcibly rebooted.

To solve this, just move the 'out:' label to before the page unlock.
Add a few locking-related comments around the relevant code to help
explain some relevant details.

The relevant code has changed and been refactored over the years, but
this problem has probably existed ever since this code was originally
converted to using the readpage() of the underlying cache fs, in
commit 88a03758 (Use readpage, not read for fastpath access).

Change-Id: If7e882ed54ca93ad6b9fdda938c606b241236241
Reviewed-on: https://gerrit.openafs.org/13672
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
This commit is contained in:
Andrew Deason 2019-07-03 12:55:53 -05:00 committed by Benjamin Kaduk
parent 0d8ce846ab
commit eed79e2d28

View File

@ -2080,7 +2080,8 @@ afs_linux_put_link(struct dentry *dentry, struct nameidata *nd)
* If task is NULL, the page copy occurs syncronously, and the routine
* returns with page still locked. If task is non-NULL, then page copies
* may occur in the background, and the page will be unlocked when it is
* ready for use.
* ready for use. Note that if task is non-NULL and we encounter an error
* before we start the background copy, we MUST unlock 'page' before we return.
*/
static int
afs_linux_read_cache(struct file *cachefp, struct page *page,
@ -2144,7 +2145,9 @@ afs_linux_read_cache(struct file *cachefp, struct page *page,
if (!PageUptodate(cachepage)) {
ClearPageError(cachepage);
code = cachemapping->a_ops->readpage(NULL, cachepage);
/* Note that ->readpage always handles unlocking the given page, even
* when an error is returned. */
code = cachemapping->a_ops->readpage(NULL, cachepage);
if (!code && !task) {
wait_on_page_locked(cachepage);
}
@ -2167,11 +2170,11 @@ afs_linux_read_cache(struct file *cachefp, struct page *page,
}
}
out:
if (code && task) {
unlock_page(page);
}
out:
if (cachepage)
put_page(cachepage);
@ -2711,6 +2714,8 @@ afs_linux_readpages(struct file *fp, struct address_space *mapping,
if (!pagevec_add(&lrupv, page))
__pagevec_lru_add_file(&lrupv);
/* Note that add_to_page_cache() locked 'page'.
* afs_linux_read_cache() is guaranteed to handle unlocking it. */
afs_linux_read_cache(cacheFp, page, tdc->f.chunk, &lrupv, task);
}
put_page(page);