openafs

mirror of https://git.openafs.org/openafs.git synced 2025-01-18 23:10:58 +00:00

Author	SHA1	Message	Date
Ian Wienand	833e2783a3	Add .gitreview git-review [1] makes it much easier to submit changes. Add a default configuration file. [1] https://docs.openstack.org/infra/git-review/usage.html Reviewed-on: https://gerrit.openafs.org/12884 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `c7c71d2429`) Change-Id: I271cfeb6aea888ae40539e248a18131b0affeda8 Reviewed-on: https://gerrit.openafs.org/12901 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 21:48:12 -05:00
Mark Vitale	780ed24d36	SOLARIS: Avoid vcache locks when flushing pages for RO vnodes We have multiple code paths that hold the following locks at the same time: - avc->lock for a vcache - The page lock for a page in 'avc' In order to avoid deadlocks, we need a consistent ordering for obtaining these two locks. The code in afs_putpage() currently obtains avc->lock before the page lock (Obtain*Lock is called before pvn_vplist_dirty). The code in afs_getpages() also obtains avc->lock before the page lock, but it does so in a loop for all requested pages (via pvn_getpages()). On the second iteration of that loop, it obtains avc->lock, and the page from the first iteration of the loop is still locked. Thus, it obtains a page lock before locking avc->lock in some cases. Since we have two code paths that obtain those two locks in a different order, a deadlock can occur. Fixing this properly requires changing at least one of those code paths, so the locks are taken in a consistent order. However, doing so is complex and will be done in a separate future commit. For this commit, we can avoid the deadlock for RO volumes by simply avoiding taking avc->lock in afs_putpages() at all while the pages are locked. Normally, we lock avc->lock because pvn_vplist_dirty() will call afs_putapage() for each dirty page (and afs_putapage() requires avc->lock held). But for RO volumes, we will have no dirty pages (because RO volumes cannot be written to from a client), and so afs_putapage() will never be called. So to avoid this deadlock issue for RO volumes, avoid taking avc->lock across the pvn_vplist_dirty() call in afs_putpage(). We now pass a dummy pageout callback function to pvn_vplist_dirty() instead, which should never be called, and which panics if it ever is. We still need to hold avc->lock a few other times during afs_putpage() for other minor reasons, but none of these hold page locks at the same time, so the deadlock issue is still avoided. 
[mmeffie: comments, and fix missing write lock, fix lock releases] [adeason: revised commit message] Reviewed-on: https://gerrit.openafs.org/12247 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@dson.org> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `5e09a694ec`) Change-Id: I5d4e4ddba12c09dc549edeee3cad7de40582ac65 Reviewed-on: https://gerrit.openafs.org/12900 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 21:47:41 -05:00
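The read-only shortcut described in this commit can be pictured with a small user-space sketch. Everything here is a hypothetical stand-in: `demo_vnode`, `demo_flush_dirty()` and the callbacks are not the Solaris or OpenAFS interfaces, and the "lock" is only a flag. The sketch shows the shape of the idea only: for a read-only vnode the lock is never taken and a callback that must never run is passed instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vcache; not the real OpenAFS structure. */
struct demo_vnode {
    int read_only;    /* nonzero for a vnode in an RO volume */
    int lock_held;    /* models avc->lock as a simple flag */
    int dirty_pages;  /* number of dirty pages left to write */
};

typedef int (*demo_pageout_fn)(struct demo_vnode *);

/* Normal per-page callback: requires the vnode lock to be held. */
static int demo_putapage(struct demo_vnode *vp)
{
    assert(vp->lock_held);
    vp->dirty_pages--;
    return 0;
}

/* Dummy callback for RO vnodes: there should be no dirty pages,
 * so reaching this function is treated as a fatal error. */
static int demo_never_called(struct demo_vnode *vp)
{
    (void)vp;
    abort();
    return -1; /* not reached */
}

/* Models the "walk the dirty pages" step: calls fn once per dirty page. */
static int demo_flush_dirty(struct demo_vnode *vp, demo_pageout_fn fn)
{
    int flushed = 0;
    while (vp->dirty_pages > 0) {
        fn(vp);
        flushed++;
    }
    return flushed;
}

/* RO vnodes skip the lock entirely; RW vnodes keep the old behaviour. */
int demo_putpage(struct demo_vnode *vp, int *took_lock)
{
    int flushed;

    assert(vp != NULL && took_lock != NULL);
    if (vp->read_only) {
        *took_lock = 0;
        return demo_flush_dirty(vp, demo_never_called);
    }
    vp->lock_held = 1;
    *took_lock = 1;
    flushed = demo_flush_dirty(vp, demo_putapage);
    vp->lock_held = 0;
    return flushed;
}
```

The sketch does not model page locks or the second code path; it only shows why skipping the lock is safe when no dirty page can exist.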
Benjamin Kaduk	5bb7684f07	rx: remove trailing semicolons from FBSD mutex operations Since the first introduction of FreeBSD support, the macros (MUTEX_ENTER, etc.) for kernel mutex operations have included trailing semicolons, unique among all the platforms. This did not cause problems until the recent work on rx event handlers, which put a MUTEX_ENTER() in the body of an 'if' clause with no brackets, and attempted to follow it with an 'else' clause. This results in the following (rather obtuse) compiler error: /root/openafs/src/rx/rx.c:3666:5: error: expected expression else ^ Which is more visible in the preprocessed source, as if (condition) expression;; else other_expression; is clearly invalid C. To fix the FreeBSD kernel module build, remove the unneeded semicolons. Reviewed-on: https://gerrit.openafs.org/12853 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `0760feb799`) Change-Id: I503a5967a167e9be92721af8dc82d191f3bf18ba Reviewed-on: https://gerrit.openafs.org/12899 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 10:40:14 -05:00
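A minimal sketch of why the trailing semicolon matters. The `DEMO_MUTEX_*` macros and the counter are hypothetical stand-ins, not the real rx or FreeBSD definitions; the `do { } while (0)` wrapper is one common way to write a statement-like macro and is an assumption here, not necessarily what the commit does.

```c
#include <assert.h>

/* Hypothetical lock-depth counter standing in for a real kernel mutex. */
int demo_lock_depth = 0;
int demo_mutex = 0;

/* Statement-like macros with NO trailing semicolon: the caller supplies it.
 * If the macro body ended in ';', then
 *     if (c) DEMO_MUTEX_ENTER(m); else ...
 * would expand to "if (c) ...;; else ...", and the 'else' would have no
 * matching 'if', which is the compile error quoted in the commit message. */
#define DEMO_MUTEX_ENTER(m) do { (void)(m); demo_lock_depth++; } while (0)
#define DEMO_MUTEX_EXIT(m)  do { (void)(m); demo_lock_depth--; } while (0)

/* Uses the macro in an unbraced if/else, the pattern that exposed the bug. */
int demo_enter_if(int condition, int *mutex)
{
    if (condition)
        DEMO_MUTEX_ENTER(mutex);
    else
        return 0;

    assert(demo_lock_depth > 0);
    DEMO_MUTEX_EXIT(mutex);
    return 1;
}
```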
Benjamin Kaduk	2f07951a47	libuafs: remove stale afs_nfsdisp.lo rule afs_nfsdisp.lo is not used, so we do not need a build rule for it. Reviewed-on: https://gerrit.openafs.org/12802 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `decb4308d4`) Change-Id: I53680df1c8648ceb43cc032cada573964622d5b4 Reviewed-on: https://gerrit.openafs.org/12898 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 09:37:25 -05:00
Christof Hanke	4988628a2e	Avoid gcc warning When using the configure option --enable-checking with gcc 7.2.1, the compilation fails with vutil.c:860:20: error: ‘%s’ directive writing up to 255 bytes into \ a region of size 63 [-Werror=format-overflow=] This can be seen in the logs of the openSUSE Tumbleweed builder for e.g. build 2368. Avoid this warning by using snprintf which is provided by libroken for all platforms. Reviewed-on: https://gerrit.openafs.org/12813 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> (cherry picked from commit `fd4eaebb60`) Change-Id: I3be14f6f1228fd09f036da7ff4f1505c65e49406 Reviewed-on: https://gerrit.openafs.org/12897 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 09:34:56 -05:00
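As an illustration of the kind of change described, here is a sketch of bounded formatting with `snprintf`. The function name, the `.vol` suffix and the buffer sizes are made up for the example and are not taken from vutil.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats "<base>.vol" into dst without ever writing past dstlen bytes.
 * Returns 0 on success and -1 if the result had to be truncated.
 * (Hypothetical helper; the suffix is arbitrary.) */
int demo_format_name(char *dst, size_t dstlen, const char *base)
{
    int n;

    assert(dst != NULL && base != NULL && dstlen > 0);
    /* snprintf returns the length the full string would have had. */
    n = snprintf(dst, dstlen, "%s.vol", base);
    if (n < 0 || (size_t)n >= dstlen)
        return -1;
    return 0;
}
```

Unlike `sprintf`, the destination size is passed explicitly, so the compiler no longer has to assume a 255-byte string may be written into a smaller region, and callers can detect truncation from the return value.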
Marcio Barbosa	35636bd9e3	ubik: avoid DISK_Begin on sites that didn't vote for sync As already described on `7c708506`, SDISK_Begin fails on remotes if lastYesState is not set. To fix this problem, `7c708506` does not allow write transactions until we know that lastYesState is set on at least quorum (ubik_syncSiteAdvertised == 1). In other words, if enough sites received a beacon packet informing that a sync-site was elected, write transactions will be allowed. This means that ubik_syncSiteAdvertised can be true while lastYesState is not set in a few sites. Consider the following scenario in a cell with frequent write transactions: Site A => Sync-site (up) Site B => Remote 1 (up) Site C => Remote 2 (down - unreachable) Since A and B are up, we have quorum. After the second wave of beacons, ubik_syncSiteAdvertised will be true and write transactions will be allowed. At some point, C is not unreachable anymore. Site A sends a copy of its database to C, but C did not vote for A yet (lastYesState == 0). A new write transaction is initialized and, since lastYesState is not set on C, DISK_Begin fails on this remote site and C is marked as down. Since C is reachable, A will mark this remote site as up. The sync-site will send its database to C, but C did not vote for A yet. A new write transaction is initialized and, since lastYesState is not set on C, DISK_Begin fails on this remote site and C is marked as down. In a cell with frequent write transactions, this cycle will repeat forever. As a result, the sync-site will be constantly sending its database to C and quorum will be operating with less sites, increasing the chances of re-elections. To fix this problem, do not call DISK_Begin on remotes that did not vote for the sync-site yet. 
Reviewed-on: https://gerrit.openafs.org/12715 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `68ec78950a`) Change-Id: I3764c23125f0bc675762449cd29b282ba403f871 Reviewed-on: https://gerrit.openafs.org/12896 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 09:30:43 -05:00
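The fix can be pictured as a filter in the loop that starts a write transaction on the remotes. This is a simplified sketch under stated assumptions: `demo_site`, its fields and `demo_begin_on_remotes()` are hypothetical and do not mirror the actual ubik data structures or RPCs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a remote site as seen from the sync-site. */
struct demo_site {
    int up;            /* site is currently considered reachable */
    int voted_for_us;  /* site has voted "yes" for this sync-site */
    int begin_calls;   /* counts simulated DISK_Begin calls */
};

/* Starts a write transaction only on remotes that are up AND have
 * already voted for the sync-site. Returns the number of sites contacted. */
int demo_begin_on_remotes(struct demo_site *sites, size_t n)
{
    int started = 0;
    size_t i;

    assert(sites != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!sites[i].up)
            continue;
        /* Skipping here avoids a Begin call that would fail remotely
         * and cause the site to be marked down again. */
        if (!sites[i].voted_for_us)
            continue;
        sites[i].begin_calls++;
        started++;
    }
    return started;
}
```

In the scenario from the commit message, site C would be up but not yet have voted, so it is skipped and stays marked up instead of cycling between up and down.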
Michael Meffie	51816cfd04	add rfc3961.h to kernel sources Export this header to the kernel sources in the libafs_tree, since it is needed for the kernel module build. FIXES 134476 Reviewed-on: https://gerrit.openafs.org/12882 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `073522b3d4`) Change-Id: I4e5c7883a1dd4b66b9252f4e630ca489f05e9ad3 Reviewed-on: https://gerrit.openafs.org/12890 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 09:25:05 -05:00
Benjamin Kaduk	07811e3b15	Add param.h files for recent FreeBSD Add files for FreeBSD 10.4, 11.1, and 12.0 (12-CURRENT), for i386 and amd64. Reviewed-on: https://gerrit.openafs.org/12863 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `88dc4d93f5`) Change-Id: I6ddb0f03e209b0ce9c7ed1168c86a675d7802c23 Reviewed-on: https://gerrit.openafs.org/12888 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 09:24:26 -05:00
Benjamin Kaduk	0ba9b5559e	FBSD: catch up to missing sysnames Add sysnames for i386 and amd64 10.4, 11.1, and 12.0 (12-CURRENT, at present). Reviewed-on: https://gerrit.openafs.org/12862 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `c390f368a5`) Change-Id: I5183c19d446fd0c00bd26c32ca3f7f00a4d12907 Reviewed-on: https://gerrit.openafs.org/12887 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 09:18:45 -05:00
Marcio Barbosa	1f10f08726	ubik: update ubik_dbVersion during SDISK_SendFile The ubik_dbVersion global represents the sync site's database version and it is mostly used by the remote sites for sanity checks. Currently, this global is updated when database changes are made on the sync site (SDISK_Commit or SDISK_SetVersion), as well as every time we vote "yes" for the sync-site in a beacon reply. Unfortunately, ubik_dbVersion is not updated when a copy of the sync site's database is received via DISK_SendFile, and it won't get updated until our next "yes" vote. During this window, the current database version will not match ubik_dbVersion. As a result, any write transaction during this time frame will fail on the remote site in question. To fix this problem, do not wait for the next beacon packet to update ubik_dbVersion when the sync site's database is received; just update it when we get the new database. Since no write transactions are allowed while the db is transferring, ubik_dbVersion can be safely updated. Reviewed-on: https://gerrit.openafs.org/12716 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Andrew Deason <adeason@dson.org> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `50c1d1088d`) Change-Id: Icbbe9efb9c8dab9ac69237380e824d4a523a53d3 Reviewed-on: https://gerrit.openafs.org/12885 Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 09:07:49 -05:00
Andrew Deason	e9419dc895	LINUX: Avoid locking inode in check_dentry_race Currently, check_dentry_race locks the parent inode in order to ensure it is not running in parallel with d_splice_alias for the same inode. (For old Linux kernel versions; see commit `b0461f2d`: "LINUX: Workaround d_splice_alias/d_lookup race".) However, it is possible to hit this area of code when the parent inode is already locked. When someone tries to create a file, directory, or symlink, Linux tries to lookup the dentry for the target path, to see if it already exists. While looking up the last component of the path, Linux locks the directory, and if it finds a dentry for the target name, it calls d_invalidate on it while the parent directory is locked. For a dentry with a NULL inode, we'll then try to lock the parent inode in check_dentry_race. But since the inode is already locked, we will deadlock. From a user's point of view, the hang can be reproduced by doing something similar to: $ mkdir dir # succeeds $ rmdir dir $ ls -l dir ls: cannot access dir: No such file or directory $ mkdir dir # hangs To avoid this, we can just change which lock we're using to avoid check_dentry_race/d_splice_alias from running in parallel. Instead of locking the parent inode, introduce a new global lock (called dentry_race_sem), and lock that in check_dentry_race and around our d_splice_alias call. We know that those are the only two users of this new lock, so this should avoid any such deadlocks. This does potentially reduce performance, since all tasks that hit check_dentry_race or d_splice_alias will take the same global lock. However, this at least still allows us to make use of negative dentries, and this entire code path only applies to older Linux kernels. It could be possible to add a new lock into struct vcache instead, but using a global lock like this commit does is much simpler. 
Reviewed-on: https://gerrit.openafs.org/12868 Tested-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `ef1d4c8d32`) Change-Id: Ia8e28519fff36baca7dc4061ceef6719a2a738d4 Reviewed-on: https://gerrit.openafs.org/12881 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 09:03:45 -05:00
Caitlyn Marko	f523c92a74	SOLARIS: save kernel module function arguments for debugging Add the -Wu,-save_args compiler option when building kernel modules under Solaris 10 and 11 for the amd64 architecture. Binaries generated with this option save function arguments on the stack during function entry for debugging purposes. Up to six integer arguments are saved on function entry, and are not modified during the execution of the function. [mmeffie: commit message update] Reviewed-on: https://gerrit.openafs.org/12798 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `32d0493a7e`) Change-Id: I478ce65da78b86aa3c13e1c615bafd51d0f5d567 Reviewed-on: https://gerrit.openafs.org/12903 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:57:29 -05:00
Marcio Barbosa	96ce04c78b	autoconf: detect ctf-tools and add ctf to libafs CTF is a reduced form of debug information similar to DWARF and stab. It describes types and function prototypes. The principal objective of the format is to shrink the data size as much as possible so that it could be included in a production environment. MDB, DTrace, and other tools use CTF debug information to read and display structures correctly. This commit introduces a new configure option called --with-ctf-tools. This option can be used to specify an alternative path where the tools can be found. If the path is not provided, the tools will be searched in a set of default directories (including $PATH). The CTF debugging information will only be included if the corresponding --enable-debug / --enable-debug-kernel is specified. Note: at the moment, the Solaris kernel module is the only module benefited by this commit. Reviewed-on: https://gerrit.openafs.org/12680 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `88cb536f99`) Change-Id: I174347370a83b31f68d2631c965e17d72b438cd1 Reviewed-on: https://gerrit.openafs.org/12902 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:55:16 -05:00
Michael Meffie	0247eb0a8c	autoconf: refactor linux-checks.m4 Further refactoring of the autoconf macros. Divvy up the linux kernel checks into smaller files. This is a non-functional change. Care has been taken to preserve the ordering of the autoconf tests. Except for whitespace, the generated configure file has not been changed by this refactoring. This has been verified with a 'diff -u -w -B' comparison of the generated configure file before and after applying this commit. Reviewed-on: https://gerrit.openafs.org/12844 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `6a2b85cd4c`) Change-Id: Iae325bc14fb160f27791b2f3d82198fe671badd8 Reviewed-on: https://gerrit.openafs.org/12878 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:34:09 -05:00
Michael Meffie	e05b0b10b9	autoconf: refactor ostype.m4 Further refactoring of the autoconf macros. Move more linux and solaris specific checks into their own files. This is a non-functional change. Care has been taken to preserve the ordering of the autoconf tests. Except for whitespace, the generated configure file has not been changed by this refactoring. This has been verified with a 'diff -u -w -B' comparison of the generated configure file before and after applying this commit. Reviewed-on: https://gerrit.openafs.org/12843 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `3c2e39bab7`) Change-Id: I4d91753afd90e4735ab61413e757f6852750a3de Reviewed-on: https://gerrit.openafs.org/12877 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:33:16 -05:00
Michael Meffie	e549637573	autoconf: refactor acinclude.m4 The acinclude.m4 file is very large and often has to be changed by unrelated commits. Divvy up the large acinclude.m4 into a number of smaller files to reduce such contention and to make the autoconf system easier to maintain. This is a non-functional change. Care has been taken to preserve the ordering of the autoconf tests. Except for whitespace, the generated configure file has not been changed by this refactoring. This has been verified with a 'diff -u -w -B' comparison of the generated configure file before and after applying this commit. Reviewed-on: https://gerrit.openafs.org/12842 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `c72622a244`) Change-Id: I9504eaa2430fd35f79b55c3df96c82cc7e58fafd Reviewed-on: https://gerrit.openafs.org/12876 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:32:20 -05:00
Michael Meffie	0798b54b25	CellServDB update 14 Mar 2017 Update all remaining copies of CellServDB in the tree, and make the Red Hat packaging use it by default too. Reviewed-on: https://gerrit.openafs.org/12880 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `3ca1352170`) Change-Id: I773d35745e14903dd3069a0627932153900e0ba6 Reviewed-on: https://gerrit.openafs.org/12889 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:30:19 -05:00
Michael Meffie	5eb64632fd	redhat: fix conditional for kernel-debuginfo files directive Commit `443dd5367e` added support for a separate debuginfo package for the kernel module. Unfortunately, the %files directive for the kernel module debuginfo package was incorrectly placed in the %if stanza of the build_userspace condition, so the rpmbuild fails when attempting to build just the kernel module. That is, when running rpmbuild with the options: rpmbuild --define "build_userspace 0" --define "build_modules 1" ... rpmbuild fails with: RPM build errors: Installed (but unpackaged) file(s) found: /usr/lib/debug/lib/modules/.../extra/openafs/openafs.ko.debug Fix this by moving the new %files directive out of the build_userspace conditional. Reviewed-on: https://gerrit.openafs.org/12874 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `f599e1ce63`) Change-Id: I07e25d3dd43b2cd7056cefb8f0f5c10f78347b85 Reviewed-on: https://gerrit.openafs.org/12875 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:29:07 -05:00
Michael Meffie	781624f7f4	redhat: avoid rpmbuild exclude directives Older versions of rpmbuild do not support the files exclude directive, so fall back to the old way in which we remove the files to be excluded and list the files to be included. Reviewed-on: https://gerrit.openafs.org/12733 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `a71288a387`) Change-Id: I01c20bc21ec6143be76458c311d826023c370d51 Reviewed-on: https://gerrit.openafs.org/12873 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:28:31 -05:00
Michael Meffie	9d62e1d5c6	redhat: move .krb variants to the kauth-client subpackage Move the deprecated klog.krb, pagsh.krb, and tokens.krb programs and man pages to the optional openafs-kauth-client subpackage. Reviewed-on: https://gerrit.openafs.org/12732 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `4d247e1ae4`) Change-Id: I3c6164022b07f0c3283cb54ffd26e1f9c3dd67bb Reviewed-on: https://gerrit.openafs.org/12872 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:26:31 -05:00
Michael Meffie	ea1bf1bd75	redhat: specify man pages without wildcards Currently, some of the man pages are specified with the full name and some are specified with a wildcard for the filename extension. Instead, specify all the man pages without wildcards to be more consistent and to avoid putting incorrect man pages in packages. This change removes a stray copy of the klog.krb5.1 man page from the openafs-kauth-client subpackage and moves the AuthLog/AuthLog.dir man pages to the optional openafs-kauth-server subpackage. Reviewed-on: https://gerrit.openafs.org/12731 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `671db4ca5a`) Change-Id: I9d10cc7aad94a2dc004526acb426a9b9badc8e3c Reviewed-on: https://gerrit.openafs.org/12871 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:21:58 -05:00
Michael Meffie	f56ea0e095	redhat: remove afsd.fuse man page The afsd.fuse binary is not currently packaged; do not package the man page. Reviewed-on: https://gerrit.openafs.org/12730 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `a9810b829b`) Change-Id: I7c829a492e999cc989e9341e94f56d6669722a4c Reviewed-on: https://gerrit.openafs.org/12870 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-02-09 08:19:33 -05:00
Benjamin Kaduk	79188311b4	Make OpenAFS 1.8.0pre4 Update version strings for the fourth 1.8.0 prerelease. Change-Id: Ib7defe21ca5e5a8c2214879633a467e002f3269b Reviewed-on: https://gerrit.openafs.org/12837 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-01-03 00:56:08 -05:00
Benjamin Kaduk	90bc3cf935	Update NEWS for 1.8.0pre4 Change-Id: I0ba71b1e837309b36db39895914b6a8b9380a81f Reviewed-on: https://gerrit.openafs.org/12836 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-01-03 00:55:32 -05:00
Mark Vitale	4feec06c7b	LINUX: Avoid d_invalidate() during afs_ShakeLooseVCaches() With recent changes to d_invalidate's semantics (it returns void in Linux 3.11, and always returns success in RHEL 7.4), it has become increasingly clear that d_invalidate() is not the best function for use in our best-effort (nondisruptive) attempt to free up vcaches that is afs_ShakeLooseVCaches(). The new d_invalidate() semantics always force the invalidation of a directory dentry, which contradicts our desire to be nondisruptive, especially when that directory is being used as the current working directory for a process. Our call to d_invalidate(), intended to merely probe for whether a dentry can be discarded without affecting other consumers, instead would cause processes using that dentry as a CWD to receive ENOENT errors from getcwd(). A previous commit (`c3bbf0b444`) tried to address this issue by calling d_prune_aliases() instead of d_invalidate(), but d_prune_aliases() does not recursively descend into children of the given dentry while pruning, leaving it an incomplete solution for our use-case. To address these issues, modify the shakeloose routine TryEvictDentries() to call shrink_dcache_parent() and maybe __d_drop() for directories, and d_prune_aliases() for non-directories, instead of d_invalidate(). (Calls to d_prune_aliases() for directories have already been removed by reverting commit c3bbf0b4444db88192eea4580ac9e9ca3de0d286.) Just like d_invalidate(), shrink_dcache_parent() has been around "forever" (since pre-git v2.6.12). Also like d_invalidate(), it "walks" the parent dentry's subdirectories and "shrinks" (unhashes) unused dentries. But unlike d_invalidate(), shrink_dcache_parent() will not unhash an in-use dentry, and has never changed its signature or semantics. d_prune_aliases() has also been available "forever", and has also never changed its signature or semantics. 
The lack of recursive descent is not an issue for non-directories, which cannot have such children. [kaduk@mit.edu: apply review feedback to fix locking and avoid extraneous changes, and reword commit message] Reviewed-on: https://gerrit.openafs.org/12830 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> (cherry picked from commit `afbc199f15`) Change-Id: I6d37e5584b57dcbb056385a79f67b92a363e08d2 Reviewed-on: https://gerrit.openafs.org/12851 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-01-03 00:53:11 -05:00
Mark Vitale	4b633c9681	LINUX: consolidate duplicate code in osi_TryEvictDentries The two stanzas for HAVE_DCACHE_LOCK are now functionally identical; remove the preprocessor conditionals and duplicate code. A minor functional change is incurred for very old (before 2.6.38) Linux versions that have dcache_lock; we are now obtaining the d_lock as well. This is safe because d_lock is also quite old (pre-git, 2.6.12), and it is a spinlock that's only held for checking d_unhashed. Therefore, it should have negligible performance impact. It cannot cause deadlocks or violate locking order, because spinlocks can't be held across sleeps. Reviewed-on: https://gerrit.openafs.org/12792 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Andrew Deason <adeason@dson.org> Tested-by: BuildBot <buildbot@rampaginggeek.com> (cherry picked from commit `5076dfc14b`) Change-Id: I7a17494b40c049a562dec20c50c27125f54436d0 Reviewed-on: https://gerrit.openafs.org/12850 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-01-03 00:45:59 -05:00
Mark Vitale	16c0dbd796	LINUX: consolidate duplicate code in canonical_dentry The two stanzas for HAVE_DCACHE_LOCK are now identical; remove the preprocessor conditionals and duplicate code. No functional change should be incurred by this commit. Reviewed-on: https://gerrit.openafs.org/12791 Reviewed-by: Andrew Deason <adeason@dson.org> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> (cherry picked from commit `0678ad26b6`) Change-Id: If0f9516201cea747a753db04ba2d0e2cac69971b Reviewed-on: https://gerrit.openafs.org/12849 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-01-03 00:44:10 -05:00
Mark Vitale	c42dea8e02	LINUX: add afs_d_alias_lock & _unlock compat wrappers Simplify some #ifdefs for HAVE_DCACHE_LOCK by pushing them down into new helpers in osi_compat.h. No functional change should be incurred by this commit. Reviewed-on: https://gerrit.openafs.org/12790 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> (cherry picked from commit `652cd597d9`) Change-Id: I6aec7d6a21e68011ca10ceaa15e83d80f52fad59 Reviewed-on: https://gerrit.openafs.org/12848 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-01-03 00:43:29 -05:00
Mark Vitale	0ec02ef73d	LINUX: create afs_linux_dget() compat wrapper For dentry operations that cover multiple dentry aliases of a single inode, create a compatibility wrapper to hide differences between the older dget_locked() and the current dget(). No functional change should be incurred by this commit. Reviewed-on: https://gerrit.openafs.org/12789 Reviewed-by: Andrew Deason <adeason@dson.org> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> (cherry picked from commit `74f4bfc627`) Change-Id: Id854e5957547a1370cadb400f7f699c30d861fd1 Reviewed-on: https://gerrit.openafs.org/12847 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-01-03 00:41:47 -05:00
Mark Vitale	ae70407faf	Revert "LINUX: do not use d_invalidate to evict dentries" Linux recently changed the semantics of d_invalidate() to: - return void - invalidate even a current working directory OpenAFS commit `c3bbf0b444` switched libafs to use d_prune_aliases() instead. However, since that commit, several things have happened: - RHEL 7.4 changed the semantics of d_invalidate() such that it invalidates the cwd, but did NOT change the return type to void. This broke our autoconf test for detecting the new semantics. - Further research reveals that d_prune_aliases() was not the best choice for replacing d_invalidate(). This is because for directories, d_prune_aliases() doesn't invalidate dentries when they are referenced by its children, and it doesn't walk the tree trying to invalidate child dentries. So it can leave dentries dangling, if the only references to those dentries are via children. In preparation for future commits, revert `c3bbf0b444`. Reviewed-on: https://gerrit.openafs.org/12788 Reviewed-by: Andrew Deason <adeason@dson.org> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> (cherry picked from commit `367693bd7d`) Change-Id: I3dfa9127adf8424fe675e237194d6ade5a7fc4f1 Reviewed-on: https://gerrit.openafs.org/12846 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-01-03 00:40:52 -05:00
Mark Vitale	2a06e32c39	Revert "LINUX: eliminate unused variable warning" This reverts commit `19599b5ef5` to allow also reverting commit `c3bbf0b444` . Reviewed-on: https://gerrit.openafs.org/12787 Reviewed-by: Andrew Deason <adeason@dson.org> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `f8247078bd`) Change-Id: I023c88e19d9f1a18b2bfaec8a35bd635f157b570 Reviewed-on: https://gerrit.openafs.org/12845 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2018-01-03 00:39:31 -05:00
Pat Riehecky	e412a87b1f	redhat: separate debuginfo package for kmod rpm Place the debuginfo for the kmod into its own rpm so that it doesn't have to track against the userspace packages. FIXES 132034 Reviewed-on: https://gerrit.openafs.org/11867 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> (cherry picked from commit `443dd5367e`) Change-Id: I6a24bb08242ec34c123880e9cbca4580a3560cba Reviewed-on: https://gerrit.openafs.org/12822 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-12-26 02:07:46 -05:00
Stephan Wiesand	7a80b4ba67	Linux 4.15: check for 2nd argument to pagevec_init Linux 4.15 removes the distinction between "hot" and "cold" cache pages, and pagevec_init() no longer takes a "cold" flag as the second argument. Add a configure test and use it in osi_vnodeops.c . Reviewed-on: https://gerrit.openafs.org/12824 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net> Tested-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `fb1f14d8ee`) Change-Id: Ib9e0751e4900d984a4197d18ee9ebb1bdc7bf331 Reviewed-on: https://gerrit.openafs.org/12829 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-12-26 02:07:05 -05:00
Stephan Wiesand	2ff3ef2ec6	Linux: use plain page_cache_alloc Linux 4.15 removes the distinction between "hot" and "cold" cache pages, and no longer provides page_cache_alloc_cold(). Simply use page_cache_alloc() instead, rather than adding yet another test. Reviewed-on: https://gerrit.openafs.org/12823 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net> Tested-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `be5f5b2aff`) Change-Id: I2d4df508abfa9d3c7020b8a2817ed3e882a4dbbc Reviewed-on: https://gerrit.openafs.org/12828 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-12-26 02:06:12 -05:00
Marcio Barbosa	d9bb508e07	macos: make the OpenAFS client aware of APFS Apple has introduced a new file system called APFS. Starting from High Sierra, APFS replaces Mac OS Extended (HFS+) as the default file system for solid-state drives and other flash storage devices. The current OpenAFS client is not aware of APFS. As a result, the installation of the current client into an APFS volume will panic the machine. To fix this problem, make the OpenAFS client aware of APFS. Reviewed-on: https://gerrit.openafs.org/12743 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `6e57b22642`) Change-Id: I60d2a57fae3ee227bb3327a5e18962f46b49c991 Reviewed-on: https://gerrit.openafs.org/12827 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-12-26 02:05:12 -05:00
Marcio Barbosa	5857724bf6	macos: packaging support for MacOS X 10.13 This commit introduces the new set of changes / files required to successfully create the dmg installer on OS X 10.13 "High Sierra". Reviewed-on: https://gerrit.openafs.org/12742 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `e533d07370`) Change-Id: I8932f6a3db6a0572aa36944aa339b888fac94b7d Reviewed-on: https://gerrit.openafs.org/12826 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-12-26 02:04:16 -05:00
Marcio Barbosa	ac8cab7fcd	macos: add support for MacOS 10.13 This commit introduces the new set of changes / files required to successfully build the OpenAFS source code on OS X 10.13 "High Sierra". Reviewed-on: https://gerrit.openafs.org/12741 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> (cherry picked from commit `804c9cbf50`) Change-Id: I9abcccf8313c8ac075eb1edbd36cbaa565968b38 Reviewed-on: https://gerrit.openafs.org/12825 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-12-26 02:02:20 -05:00
Benjamin Kaduk	383688fa0d	Fix macro used to check kernel_read() argument order The m4 macro implementing the configure check is called LINUX_KERNEL_READ_OFFSET_IS_LAST, but it defines a preprocessor symbol that is just KERNEL_READ_OFFSET_IS_LAST. Our code needs to check for the latter being defined, not the former. Reported by Aaron Ucko. Reviewed-on: https://gerrit.openafs.org/12808 Reviewed-by: Anders Kaseorg <andersk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `edc5463f3d`) Change-Id: I7bc6615118f1200d3f257e7a01652b49b458a8fa Reviewed-on: https://gerrit.openafs.org/12809 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-12-14 22:58:33 -05:00
Benjamin Kaduk	3cae0a01d0	Update NEWS for rx security fix Change-Id: I30282ac8f51a7b16dd851fdbd41464f8fdafc279	2017-12-05 08:43:14 -06:00
Benjamin Kaduk	eae2575dc7	OPENAFS-SA-2017-001: rx: Sanity-check received MTU and twind values Rather than blindly trusting the values received in the (unauthenticated) ack packet trailer, apply some minimal sanity checks to received values. natMTU and regular MTU values are subject to Rx minimum/maximum packet sizes, and the transmit window cannot drop below one without risk of deadlock. The maxDgramPackets value that can also be present in the trailer already has sufficient sanity checking. Extremely low MTU values (less than 28 == RX_HEADER_SIZE) can cause us to set a negative "maximum usable data" size that gets used as an (unsigned) packet length for subsequent allocation and computation, triggering an assertion when the connection is used to transmit data. FIXES 134450 (cherry picked from commit `894555f93a`) Change-Id: I98e2a65d1aa291a73e8cfed9c9eaac71c6af00dc	2017-12-05 08:43:14 -06:00
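The sanity checks described in the OPENAFS-SA-2017-001 entry above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: the `sketch_*` function names and the `SKETCH_MIN_PACKET_SIZE` / `SKETCH_MAX_PACKET_SIZE` bounds are hypothetical, and only the value 28 for `RX_HEADER_SIZE` is taken from the commit message.

```c
#include <stdint.h>

#define RX_HEADER_SIZE 28             /* value quoted in the commit message */
#define SKETCH_MIN_PACKET_SIZE 576    /* hypothetical lower bound, for illustration */
#define SKETCH_MAX_PACKET_SIZE 16384  /* hypothetical upper bound, for illustration */

/* Clamp an MTU taken from an unauthenticated ack trailer into a sane range. */
static uint32_t sketch_clamp_mtu(uint32_t received)
{
    if (received < SKETCH_MIN_PACKET_SIZE)
        return SKETCH_MIN_PACKET_SIZE;
    if (received > SKETCH_MAX_PACKET_SIZE)
        return SKETCH_MAX_PACKET_SIZE;
    return received;
}

/* The transmit window must never drop below one packet. */
static uint32_t sketch_clamp_twind(uint32_t received)
{
    return received < 1 ? 1 : received;
}

/* "Maximum usable data" size; safe only because the MTU was clamped first,
 * so the unsigned subtraction cannot wrap around. */
static uint32_t sketch_max_data(uint32_t clamped_mtu)
{
    return clamped_mtu - RX_HEADER_SIZE;
}
```

Without the clamp, a received MTU below 28 would make the subtraction in `sketch_max_data()` wrap to a huge unsigned value, which is the class of problem the commit message describes.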
Benjamin Kaduk	352fbc8111	Make OpenAFS 1.8.0pre3 Update the version strings for the third 1.8.0 prerelease. Change-Id: I25a4eee4de04e57ffcf9055f69ae9a3d683b8d64 Reviewed-on: https://gerrit.openafs.org/12765 Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-11-28 11:47:30 -05:00
Benjamin Kaduk	1efc44f397	Update NEWS for 1.8.0pre3 Change-Id: I38110825cbe8b5c4ca18d86e4542374ae26f6fd4 Reviewed-on: https://gerrit.openafs.org/12764 Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>	2017-11-28 11:46:59 -05:00
Benjamin Kaduk	e2c47cae56	afs: Fix bounds check in PNewCell Reported by the opensuse buildbot: CC [M] /home/buildbot/opensuse-tumbleweed-i386-builder/build/src/libafs/MODLOAD-4.13.12-1-default-MP/rx_packet.o /home/buildbot/opensuse-tumbleweed-i386-builder/build/src/afs/afs_pioctl.c: In function ‘PNewCell’: /home/buildbot/opensuse-tumbleweed-i386-builder/build/src/afs/afs_pioctl.c:3075:55: error: ‘*’ in boolean context, suggest ‘&&’ instead [-Werror=int-in-bool-context] if ((afs_pd_remaining(ain) < AFS_MAXCELLHOSTS +3) * sizeof(afs_int32)) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~ The bug was introduced in commit `718f85a8b6`. Reviewed-on: https://gerrit.openafs.org/12782 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `4fa0ee620c`) Change-Id: I0963403846a62dddf2d13ce3c03d772a6d869119 Reviewed-on: https://gerrit.openafs.org/12784 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-11-28 11:46:09 -05:00
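The misplaced parenthesis reported in the PNewCell entry above can be illustrated with a small sketch. It assumes the operator flagged by the compiler was `*` (as the "suggest ‘&&’ instead" diagnostic implies); the `sketch_*` names and the value of `SKETCH_MAXCELLHOSTS` are hypothetical stand-ins, not the actual OpenAFS definitions or the actual fix.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAXCELLHOSTS 8  /* hypothetical stand-in for AFS_MAXCELLHOSTS */

/* Buggy shape: the comparison is evaluated first, and its 0/1 result is then
 * multiplied by sizeof(int32_t), so the byte count is compared against the
 * number of elements rather than the number of bytes. */
static int sketch_buggy_too_short(size_t remaining)
{
    return ((remaining < SKETCH_MAXCELLHOSTS + 3) * sizeof(int32_t)) != 0;
}

/* Intended shape: compare the remaining byte count against the full size
 * of (SKETCH_MAXCELLHOSTS + 3) 32-bit integers. */
static int sketch_fixed_too_short(size_t remaining)
{
    return remaining < (SKETCH_MAXCELLHOSTS + 3) * sizeof(int32_t);
}
```

With 20 bytes remaining, the buggy form does not report the buffer as too short, while the intended form does, since 11 integers need 44 bytes.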
Benjamin Kaduk	6e611c56c5	rx: fix call refcount leak in error case The recent event handling normalization in commit `304d758983` had event handlers switch to dropping their reference on the associated connection/call just before return. An early return case was missed in the conversion, leading to a refcount leak in an error case. Reviewed-on: https://gerrit.openafs.org/12781 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `66b74e78ba`) Change-Id: I532c49b2ef6ec95dd26a99c02e12ea53348f9690 Reviewed-on: https://gerrit.openafs.org/12783 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-11-28 11:43:08 -05:00
Marcio Barbosa	ad11867973	afs: fix kernel_write / kernel_read arguments The order / content of the arguments passed to kernel_write and kernel_read are not right. As a result, the kernel will panic if one of the functions in question is called. [kaduk@mit.edu: include configure check for multiple kernel_read() variants, per linux commits bdd1d2d3d251c65b74ac4493e08db18971c09240 and e13ec939e96b13e664bb6cee361cc976a0ee621a] FIXES 134440 Reviewed-on: https://gerrit.openafs.org/12769 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `3ce55426ee`) Change-Id: I28f04f7625a471c37f98515d5186f80082bf6a43 Reviewed-on: https://gerrit.openafs.org/12780 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-11-27 23:50:56 -05:00
Michael Meffie	42993b3a33	tests: fix out of bounds access in the rx-event test Use the NUMEVENTS symbol which defines the array size instead of an incorrect hard coded number when checking if a second event can be added to be fired at the same time. This fixes a potential out of bounds access of the event test array. Also update the comment which incorrectly mentions the incorrect number of events in the test. Reviewed-on: https://gerrit.openafs.org/12762 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> (cherry picked from commit `50a3eb7b7e`) Change-Id: I7a975e7498c1c7416a800c9294c97ee4de4fd57a Reviewed-on: https://gerrit.openafs.org/12779 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-11-27 23:49:46 -05:00
Benjamin Kaduk	6c635a66b5	Sprinkle rx_GetConnection() for concision Instead of inlining the body (taking the lock, incrementing the refcount, and dropping the lock), use the convenience function designed for this purpose. Reviewed-on: https://gerrit.openafs.org/12772 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `2ae84bf053`) Change-Id: I60794d877a76fbb7c8ba59207e710a20641cc8f1 Reviewed-on: https://gerrit.openafs.org/12778 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-11-27 23:49:09 -05:00
Benjamin Kaduk	667617b870	rx: fix mutex leak in error case Reported by Mark Vitale Reviewed-on: https://gerrit.openafs.org/12771 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `01bcfd3e14`) Change-Id: I4384d6813a5cfb053e6991eb3c157fa59ecfa11b Reviewed-on: https://gerrit.openafs.org/12777 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-11-27 23:48:34 -05:00
Benjamin Kaduk	4220eadae0	Add event-related mutex assertions In utility functions that access fields of type struct rxevent *, assert that the appropriate lock is held for the access in question. These assertions are only compiled in when built with -DOPR_DEBUG_LOCKS, which can be enabled by --debug-locks at configure time. Reviewed-on: https://gerrit.openafs.org/12757 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `a7a3108e60`) Change-Id: I147a2e475feffb1b75a08ac5b08614bd6d8f46a5 Reviewed-on: https://gerrit.openafs.org/12776 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-11-27 23:48:12 -05:00
Benjamin Kaduk	8ce3b5e253	Standardize rx_event usage Go over all consumers of the rx event framework and normalize its usage according to the following principles: rxevent_Post() is used to create an event, and it returns an event handle (with a reference on the event structure) that can be used to cancel the event before its timeout fires. (There is also an additional reference on the event held by the global event tree.) In all(*) usage within the tree, that event handle is stored within either an rx_connection or an rx_call. Reads/writes to the member variable that holds the event handle require either the conn_data_lock or call lock, respectively -- that means that in most cases, callers of rxevent_Post() and rxevent_Cancel() will be holding one of those aforementioned locks. The event handlers themselves will need to modify the call/connection object according to the nature of the event, which requires holding those same locks, and also a guarantee that the call/connection is still a live object and has not been deallocated! Whether or not rxevent_Cancel() succeeds in cancelling the event before it fires, whenever passed a non-NULL event structure it will NULL out the supplied pointer and drop a reference on the event structure. This is the correct behavior, since the caller has asked to cancel the event and has no further use for the event handle or its reference on the event structure. The caller of rxevent_Cancel() must check its return value to know whether or not the event was cancelled before its handler was able to run. The interaction window between the call/connection lock and the lock protecting the red/black tree of pending events opens up a somewhat problematic race window. 
Because the application thread is expected to hold the call/connection lock around rxevent_Cancel() (to protect the write to the field in the call/connection structure that holds an event handle), and rxevent_Cancel() must take the lock protecting the red/black tree of events, this establishes a lock order with the call/connection lock taken before the eventTree lock. This is in conflict with the event handler thread, which must take the eventTree lock first, in order to select an event to run (and thus know what additional lock would need to be taken, by virtue of what handler function is to be run). The conflict is easy to resolve in the standard way, by having a local pointer to the event that is obtained while the event is removed from the red/black tree under the eventTree lock, and then the eventTree lock can be dropped and the event run based on the local variable referring to it. The race window occurs when the caller of rxevent_Cancel() holds the call/connection lock, and rxevent_Cancel() obtains the eventTree lock just after the event handler thread drops it in order to run the event. The event handler function begins to execute, and immediately blocks trying to obtain the call/connection lock. Now that rxevent_Cancel() has the eventTree lock it can proceed to search the tree, fail to find the indicated event in the tree, clear out the event pointer from the call/connection data structure, drop its caller's reference to the event structure, and return failure (the event was not cancelled). Only then does the caller of rxevent_Cancel() drop the call/connection lock and allow the event handler to make progress. This race is not necessarily problematic if appropriate care is taken, but in the previous code such was not the case. In particular, it is a common idiom for the firing event to call rxevent_Put() on itself, to release the handle stored in the call/connection that could have been used to cancel the event before it fired. 
Failing to do so would result in a memory leak of event structures; however, rxevent_Put() does not check for a NULL argument, so a segfault (NULL dereference) was observed in the test suite when the race occurred and the event handler tried to rxevent_Put() the reference that had already been released by the unsuccessful rxevent_Cancel() call. Upon inspection, many (but not all) of the uses in rx.c were susceptible to a similar race condition and crash. The test suite also papers over a related issue in that the event handler in the test suite always knows that the data structure containing the event handle will remain live, since it is a global array that is allocated for the entire scope of the test. In rx.c, events are associated with calls and connections that have a finite lifetime, so we need to take care to ensure that the call/connection pointer stored in the event remains valid for the duration of the event's lifecycle. In particular, even an attempt to take the call/connection lock to check whether the corresponding event field is NULL is fraught with risk, as it could crash if the lock (and containing call/connection) has already been destroyed! There are several potential ways to ensure the liveness of the associated call/connection while the event handler runs, most notably to take care in the call/connection destruction path to ensure that all associated events are either successfully cancelled or run to completion before tearing down the call/connection structure, and to give the pending event its own reference on the associated call/connection. Here, we opt for the latter, acknowledging that this may result in the event handler thread doing the full call/connection teardown and delay the firing of subsequent events. This is deemed acceptable, as pending events are for intentionally delayed tasks, and some extra delay is probably acceptable. 
(The various keepalive events and the challenge event could delay the user experience and/or security properties if significantly delayed, but I do not believe that this change admits completely unbounded delay in the event handler thread, so the practical risk seems minimal.) Accordingly, this commit attempts to ensure that: * Each event holds a formal reference on its associated call/connection. * The appropriate lock is held for all accesses to event pointers in call/connection structures. * Each event handler (after taking the appropriate lock) checks whether it raced with rxevent_Cancel() and only drops the call/connection's reference to the event if the race did not occur. * Each event handler drops its reference to the associated call/connection after doing any actions that might access/modify the call/connection. * The per-event reference on the associated call/connection is dropped by the thread that removes the event from the red/black tree. That is, the event handler function if the event runs, or by the caller of rxevent_Cancel() when the cancellation succeeds. * No non-NULL event handles remain in a call/connection being destroyed, which would indicate a refcounting error. (*) There is an additional event used in practice, to reap old connections, but it is effectively a background task that reschedules itself periodically, with no handle to the event retained so as to be able to cancel it. As such, it is unaffected by the concerns raised here. While here, standardize on the rx_GetConnection() function for incrementing the reference count on a connection object, instead of inlining the corresponding mutex lock/unlock and variable access. In contrast to what was done on master, for the 1.8 branch we do not force-enable refcount checking. 
Reviewed-on: https://gerrit.openafs.org/12756 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> (cherry picked from commit `304d758983`) Change-Id: I68e6cc162a148b6ebbabe037a7bc3cccd648423c Reviewed-on: https://gerrit.openafs.org/12775 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2017-11-27 23:46:00 -05:00
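The handler-side check described in the "Standardize rx_event usage" entry above can be sketched in a simplified, single-threaded form. This is an illustration under assumptions, not the rx.c implementation: all `sketch_*` types and functions are hypothetical, locking is omitted (the comments note where the call lock would be held), and reference counts are plain integers.

```c
#include <stddef.h>

struct sketch_event {
    int refs;                      /* references on the event structure */
};

struct sketch_call {
    int refs;                      /* references on the call itself */
    struct sketch_event *pending;  /* event handle; guarded by the call lock */
};

/* Drop one reference on an event; like the rxevent_Put() behavior described
 * in the commit message, this does not tolerate a NULL argument. */
static void sketch_event_put(struct sketch_event *ev)
{
    ev->refs--;
}

/* Models a cancel that lost the race: the handler thread has already removed
 * the event from the tree. The handle is cleared and its reference dropped,
 * and 0 is returned to report that the event was not cancelled. */
static int sketch_cancel_lost_race(struct sketch_call *call)
{
    if (call->pending == NULL)
        return 0;
    sketch_event_put(call->pending);
    call->pending = NULL;
    return 0;
}

/* Event handler body; assumed to run with the call lock held. */
static void sketch_handler(struct sketch_event *ev, struct sketch_call *call)
{
    if (call->pending == ev) {
        /* No race with cancel: release the handle stored in the call. */
        sketch_event_put(call->pending);
        call->pending = NULL;
    }
    /* ... event-specific work on the call would go here ... */
    call->refs--;  /* drop the event's own reference on the call last */
}
```

In the raced ordering (cancel first, then handler), the handler sees that the stored handle no longer matches and skips the put, so the event's reference count is decremented only once and no NULL pointer is dereferenced.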


12744 Commits