We have multiple code paths that hold the following locks at the same
time:
- avc->lock for a vcache
- The page lock for a page in 'avc'
In order to avoid deadlocks, we need a consistent ordering for obtaining
these two locks. The code in afs_putpage() currently obtains avc->lock
before the page lock (Obtain*Lock is called before pvn_vplist_dirty).
The code in afs_getpages() also obtains avc->lock before the page lock,
but it does so in a loop for all requested pages (via pvn_getpages()).
On the second iteration of that loop, it obtains avc->lock, and the page
from the first iteration of the loop is still locked. Thus, it obtains a
page lock before locking avc->lock in some cases.
Since we have two code paths that obtain those two locks in a different
order, a deadlock can occur. Fixing this properly requires changing at
least one of those code paths, so the locks are taken in a consistent
order. However, doing so is complex and will be done in a separate
future commit.
For this commit, we can avoid the deadlock for RO volumes by simply
avoiding taking avc->lock in afs_putpage() at all while the pages are
locked. Normally, we lock avc->lock because pvn_vplist_dirty() will call
afs_putapage() for each dirty page (and afs_putapage() requires
avc->lock held). But for RO volumes, we will have no dirty pages
(because RO volumes cannot be written to from a client), and so
afs_putapage() will never be called.
So to avoid this deadlock issue for RO volumes, avoid taking avc->lock
across the pvn_vplist_dirty() call in afs_putpage(). We now pass a dummy
pageout callback function to pvn_vplist_dirty() instead, which should
never be called, and which panics if it ever is.
We still need to hold avc->lock a few other times during afs_putpage()
for other minor reasons, but none of these hold page locks at the same
time, so the deadlock issue is still avoided.
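As a rough illustration, the new pvn_vplist_dirty() call is of this shape
(the callback name and surrounding details below are assumptions for the
sketch, not the literal code):
    /* Pageout callback handed to pvn_vplist_dirty() for RO volumes.
     * RO volumes have no dirty pages, so this must never be called. */
    static int
    afs_ro_putapage(struct vnode *vp, struct page *pp, u_offset_t *offp,
                    size_t *lenp, int flags, struct cred *credp)
    {
        osi_Panic("afs_ro_putapage: dirty page on a read-only volume");
        return EIO;                /* not reached */
    }
    /* ... in afs_putpage(), with avc->lock NOT held across the call ... */
    code = pvn_vplist_dirty(vp, (u_offset_t)off, afs_ro_putapage, flags, credp);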
[mmeffie: comments, and fix missing write lock, fix lock releases]
[adeason: revised commit message]
Change-Id: Iec11101147220828f319dae4027e7ab1f08483a6
Reviewed-on: https://gerrit.openafs.org/12247
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@dson.org>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Export this header to the kernel sources in the libafs_tree, since it is
needed for the kernel module build.
FIXES 134476
Change-Id: Id359c6d065c259601d14ee5c02b93647f86a0288
Reviewed-on: https://gerrit.openafs.org/12882
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Update all remaining copies of CellServDB in the tree, and make the
Red Hat packaging use it by default too.
Change-Id: I5a70a7c658ad0056cd10945bb730e84f0edfb730
Reviewed-on: https://gerrit.openafs.org/12880
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Add files for FreeBSD 10.4, 11.1, and 12.0 (12-CURRENT), for i386 and amd64.
Change-Id: I904f576914bb965a659750e6302f011acf66ba81
Reviewed-on: https://gerrit.openafs.org/12863
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Add sysnames for FreeBSD 10.4, 11.1, and 12.0 (12-CURRENT, at present) on i386 and amd64.
Change-Id: If38ecca7b2b3e40c186b7e9321ce017b4711139c
Reviewed-on: https://gerrit.openafs.org/12862
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
The sync-site relabels its database at the end of the first write
transaction. The new label will be equal to the time at which the
sync-site in question first received its coordinator mandate. This time
is stored in a global called ubik_epochTime. In order to make sure that
the new database label is sane, only relabel the database if
ubik_epochTime is within a specific range.
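A minimal sketch of such a bound check (the lower-bound constant here is a
made-up placeholder and the surrounding code is assumed):
    now = FT_ApproxTime();
    if (ubik_epochTime > UBIK_MILESTONE && ubik_epochTime <= now) {
        /* epoch looks sane: newer than a fixed milestone, not in the future */
        version.epoch = ubik_epochTime;
    }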
Change-Id: I2408569e5de46d387f63cbc2fab05ea1264a505c
Reviewed-on: https://gerrit.openafs.org/12640
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@dson.org>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
The ubik_dbVersion global represents the sync site's database version
and it is mostly used by the remote sites for sanity checks. Currently,
this global is updated when database changes are made on the sync site
(SDISK_Commit or SDISK_SetVersion), as well as every time we vote "yes"
for the sync-site in a beacon reply. Unfortunately, ubik_dbVersion is
not updated when a copy of the sync site's database is received via
DISK_SendFile, and it won't get updated until our next "yes" vote.
During this window, the current database version will not match
ubik_dbVersion. As a result, any write transaction during this time
frame will fail on the remote site in question.
To fix this problem, do not wait for the next beacon packet to update
ubik_dbVersion when the sync site's database is received; just update
it when we get the new database. Since no write transactions are
allowed while the db is transferring, ubik_dbVersion can be safely
updated.
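Conceptually (not the literal diff, and assuming the usual ubik globals), the
change amounts to:
    /* after installing a database received via DISK_SendFile, publish its
     * version right away instead of waiting for the next beacon reply */
    ubik_dbVersion = ubik_dbase->version;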
Change-Id: Ide7a695a69cb3229ad585d9e56c5ddc2efb76dd7
Reviewed-on: https://gerrit.openafs.org/12716
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@dson.org>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Currently, check_dentry_race locks the parent inode in order to ensure
it is not running in parallel with d_splice_alias for the same inode.
(For old Linux kernel versions; see commit b0461f2d: "LINUX:
Workaround d_splice_alias/d_lookup race".)
However, it is possible to hit this area of code when the parent inode
is already locked. When someone tries to create a file, directory, or
symlink, Linux tries to lookup the dentry for the target path, to see
if it already exists. While looking up the last component of the path,
Linux locks the directory, and if it finds a dentry for the target
name, it calls d_invalidate on it while the parent directory is
locked.
For a dentry with a NULL inode, we'll then try to lock the parent
inode in check_dentry_race. But since the inode is already locked, we
will deadlock.
From a user's point of view, the hang can be reproduced by doing
something similar to:
$ mkdir dir # succeeds
$ rmdir dir
$ ls -l dir
ls: cannot access dir: No such file or directory
$ mkdir dir # hangs
To avoid this, we can just change which lock we use to prevent
check_dentry_race and d_splice_alias from running in parallel. Instead of
locking the parent inode, introduce a new global lock (called
dentry_race_sem), and lock that in check_dentry_race and around our
d_splice_alias call. We know that those are the only two users of this
new lock, so this should avoid any such deadlocks.
This does potentially reduce performance, since all tasks that hit
check_dentry_race or d_splice_alias will take the same global lock.
However, this at least still allows us to make use of negative
dentries, and this entire code path only applies to older Linux
kernels. It could be possible to add a new lock into struct vcache
instead, but using a global lock like this commit does is much
simpler.
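Sketch of the approach (the lock type and the surrounding helpers are
assumptions; only the idea of the shared global lock comes from this change):
    static DEFINE_MUTEX(dentry_race_sem);          /* new global lock */
    /* in check_dentry_race(): serialize against d_splice_alias() without
     * touching the (possibly already locked) parent inode */
    mutex_lock(&dentry_race_sem);
    /* ... decide whether this dentry lost a race with d_splice_alias ... */
    mutex_unlock(&dentry_race_sem);
    /* around our d_splice_alias() call in the lookup path: */
    mutex_lock(&dentry_race_sem);
    newdp = d_splice_alias(ip, dp);
    mutex_unlock(&dentry_race_sem);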
Change-Id: Ide0f21145c83d6fbb34c637d8a36c8cd21549940
Reviewed-on: https://gerrit.openafs.org/12868
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Commit 443dd5367e added support for a
separate debuginfo package for the kernel module. Unfortunately, the
%files directive for the kernel module debuginfo package was incorrectly
placed in the %if stanza of the build_userspace condition, so
rpmbuild fails when attempting to build just the kernel module.
That is, when running rpmbuild with the options:
rpmbuild --define "build_userspace 0" --define "build_modules 1" ...
rpmbuild fails with:
RPM build errors:
Installed (but unpackaged) file(s) found:
/usr/lib/debug/lib/modules/.../extra/openafs/openafs.ko.debug
Fix this by moving the new %files directive out of the build_userspace
conditional.
Change-Id: I46e74b660048022a4cc4327835c6055402a34ccf
Reviewed-on: https://gerrit.openafs.org/12874
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Further refactoring of the autoconf macros. Divvy up the Linux kernel
checks into smaller files.
This is a non-functional change. Care has been taken to preserve the
ordering of the autoconf tests. Except for whitespace, the generated
configure file has not been changed by this refactoring. This has been
verified with a 'diff -u -w -B' comparison of the generated configure
file before and after applying this commit.
Change-Id: I5ea4c9e3a0aeff1767ef561bdb8361781694ee28
Reviewed-on: https://gerrit.openafs.org/12844
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Further refactoring of the autoconf macros. Move more Linux- and
Solaris-specific checks into their own files.
This is a non-functional change. Care has been taken to preserve the
ordering of the autoconf tests. Except for whitespace, the generated
configure file has not been changed by this refactoring. This has been
verified with a 'diff -u -w -B' comparison of the generated configure
file before and after applying this commit.
Change-Id: Ib3e7b1270826970c541a695230f4e3cd13cf9e3d
Reviewed-on: https://gerrit.openafs.org/12843
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
The acinclude.m4 file is very large and often needs to be changed by
unrelated commits. Divvy up the large acinclude.m4 into a number of
smaller files to reduce contention between changes and to make the
autoconf system easier to maintain.
This is a non-functional change. Care has been taken to preserve the
ordering of the autoconf tests. Except for whitespace, the generated
configure file has not been changed by this refactoring. This has been
verified with a 'diff -u -w -B' comparison of the generated configure
file before and after applying this commit.
Change-Id: I70e7f846dea0055d00a60a47422aa73bff25c4c6
Reviewed-on: https://gerrit.openafs.org/12842
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Since the first introduction of FreeBSD support, the macros
(MUTEX_ENTER, etc.) for kernel mutex operations have included
trailing semicolons, unique among all the platforms.
This did not cause problems until the recent work on rx event
handlers, which put a MUTEX_ENTER() in the body of an 'if' clause
with no brackets, and attempted to follow it with an 'else' clause.
This results in the following (rather obtuse) compiler error:
/root/openafs/src/rx/rx.c:3666:5: error: expected expression
else
^
The problem is more visible in the preprocessed source, as
if (condition)
expression;;
else
other_expression;
is clearly invalid C.
To fix the FreeBSD kernel module build, remove the unneeded semicolons.
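For illustration, the change is of this shape (assuming the FreeBSD macros
wrap mtx_lock()/mtx_unlock(); the exact definitions may differ):
    /* before: the trailing ';' breaks "if (x) MUTEX_ENTER(&m); else ..." */
    #define MUTEX_ENTER(a) mtx_lock(a);
    /* after: */
    #define MUTEX_ENTER(a) mtx_lock(a)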
Change-Id: I191009ad412852dcc03cd71a0982fe41a953301d
Reviewed-on: https://gerrit.openafs.org/12853
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
afs_nfsdisp.lo is not used, so we do not need a build rule for it.
Change-Id: I4ca53a4823b0ccd5bfd769867f6766bd05ea4ceb
Reviewed-on: https://gerrit.openafs.org/12802
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Our in-tree xdr.h appears to have started life as a concatenation of
rpc/types.h and rpc/xdr.h, and should include all the needed functionality.
Indeed, commit 7293ddf325 even indicates
that we expect to be using our in-tree XDR everywhere anyway, so the
system XDR is superfluous.
Note that afs/sysincludes.h (not afsincludes.h!) already includes
rx/xdr.h ifndef AFS_LINUX22_ENV.
This change should help systems running glibc 2.26 or newer, which has
stopped providing the Sun RPC headers by default.
While here remove some duplicate includes of rpc/types.h in the
AIX-specific sources.
The Solaris NFS translator bits cannot really be changed, since the system
headers are used and have tight interdependencies.
Update rxgen to not emit rpc/types.h inclusion.
[mmeffie: squash 12801 to not emit rpc/types.h from rxgen]
Change-Id: I0b195216affa06ab9e259cb0bab0c8286a1636d9
Reviewed-on: https://gerrit.openafs.org/12800
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
With recent changes to d_invalidate's semantics (it returns void in Linux 3.11,
and always returns success in RHEL 7.4), it has become increasingly clear that
d_invalidate() is not the best function to use in afs_ShakeLooseVCaches(),
our best-effort (nondisruptive) attempt to free up vcaches.
The new d_invalidate() semantics always force the invalidation of a directory
dentry, which contradicts our desire to be nondisruptive, especially when
that directory is being used as the current working directory for a process.
Our call to d_invalidate(), intended to merely probe for whether a dentry
can be discarded without affecting other consumers, instead would cause
processes using that dentry as a CWD to receive ENOENT errors from getcwd().
A previous commit (c3bbf0b444) tried to address
this issue by calling d_prune_aliases() instead of d_invalidate(), but
d_prune_aliases() does not recursively descend into children of the given
dentry while pruning, leaving it an incomplete solution for our use-case.
To address these issues, modify the shakeloose routine TryEvictDentries() to
call shrink_dcache_parent() and maybe __d_drop() for directories, and
d_prune_aliases() for non-directories, instead of d_invalidate(). (Calls to
d_prune_aliases() for directories have already been removed by reverting commit
c3bbf0b4444db88192eea4580ac9e9ca3de0d286.)
Just like d_invalidate(), shrink_dcache_parent() has been around "forever"
(since pre-git v2.6.12). Also like d_invalidate(), it "walks" the parent
dentry's subdirectories and "shrinks" (unhashes) unused dentries. But unlike
d_invalidate(), shrink_dcache_parent() will not unhash an in-use dentry, and
has never changed its signature or semantics.
d_prune_aliases() has also been available "forever", and has also never changed
its signature or semantics. The lack of recursive descent is not an issue for
non-directories, which cannot have such children.
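In rough outline, the new logic is (simplified; the helper calls and locking
details here are assumptions rather than the exact code):
    if (S_ISDIR(ip->i_mode)) {
        struct dentry *dp = d_find_alias(ip);     /* illustrative */
        if (dp) {
            shrink_dcache_parent(dp);             /* prune unused children */
            spin_lock(&dp->d_lock);
            if (!d_unhashed(dp) && d_count(dp) == 1)
                __d_drop(dp);                     /* the "maybe __d_drop()" */
            spin_unlock(&dp->d_lock);
            dput(dp);
        }
    } else {
        d_prune_aliases(ip);                      /* non-directories */
    }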
[kaduk@mit.edu: apply review feedback to fix locking and avoid extraneous
changes, and reword commit message]
Change-Id: Icb6138ee5785e0ef82a9b85b1d2651dfd0830043
Reviewed-on: https://gerrit.openafs.org/12830
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
The two stanzas for HAVE_DCACHE_LOCK are now functionally identical;
remove the preprocessor conditionals and duplicate code.
A minor functional change is incurred for very old (before 2.6.38) Linux
versions that have dcache_lock: we now obtain d_lock as well.
This is safe because d_lock is also quite old (pre-git, 2.6.12), and it
is a spinlock that's only held for checking d_unhashed. Therefore, it
should have negligible performance impact. It cannot cause deadlocks or
violate locking order, because spinlocks can't be held across sleeps.
Change-Id: I08faf204e6bd82c4401cdf6048d12cd551dd18fc
Reviewed-on: https://gerrit.openafs.org/12792
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@dson.org>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
The two stanzas for HAVE_DCACHE_LOCK are now identical;
remove the preprocessor conditionals and duplicate code.
No functional change should be incurred by this commit.
Change-Id: I15cd4631d1932dcfb920313acb82fcbe570087e8
Reviewed-on: https://gerrit.openafs.org/12791
Reviewed-by: Andrew Deason <adeason@dson.org>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Simplify some #ifdefs for HAVE_DCACHE_LOCK by pushing them down into
new helpers in osi_compat.h.
No functional change should be incurred by this commit.
Change-Id: Ia0dc560bc84c8db4b84ddcc77a17bab5fbf93af9
Reviewed-on: https://gerrit.openafs.org/12790
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
For dentry operations that cover multiple dentry aliases of
a single inode, create a compatibility wrapper to hide differences
between the older dget_locked() and the current dget().
No functional change should be incurred by this commit.
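A minimal sketch of such a wrapper (the wrapper name is an assumption):
    static inline struct dentry *
    afs_linux_dget(struct dentry *dp)
    {
    #if defined(HAVE_DCACHE_LOCK)
        return dget_locked(dp);    /* old kernels: caller holds dcache_lock */
    #else
        return dget(dp);
    #endif
    }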
Change-Id: I2bb0d453417f37707018f6ba5859903c3d34c8ff
Reviewed-on: https://gerrit.openafs.org/12789
Reviewed-by: Andrew Deason <adeason@dson.org>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Linux recently changed the semantics of d_invalidate() to:
- return void
- invalidate even a current working directory
OpenAFS commit c3bbf0b444 switched libafs
to use d_prune_aliases() instead.
However, since that commit, several things have happened:
- RHEL 7.4 changed the semantics of d_invalidate() such that it
invalidates the cwd, but did NOT change the return type to void.
This broke our autoconf test for detecting the new semantics.
- Further research reveals that d_prune_aliases() was not the best
choice for replacing d_invalidate(). This is because for directories,
d_prune_aliases() doesn't invalidate dentries when they are referenced
by their children, and it doesn't walk the tree trying to invalidate
child dentries. So it can leave dentries dangling, if the only
references to those dentries are via children.
In preparation for future commits, revert
c3bbf0b444.
Change-Id: Iafbef23a6070180c0e21eb01a2d59385ef52f55c
Reviewed-on: https://gerrit.openafs.org/12788
Reviewed-by: Andrew Deason <adeason@dson.org>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Linux 4.15 removes the distinction between "hot" and "cold" cache
pages, and pagevec_init() no longer takes a "cold" flag as the
second argument. Add a configure test and use it in osi_vnodeops.c.
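The resulting call site looks roughly like this (the configure symbol and the
pagevec variable name are assumptions):
    #if defined(HAVE_LINUX_PAGEVEC_INIT_COLD_ARG)
        pagevec_init(&lrupv, 0);   /* pre-4.15: second argument is the cold flag */
    #else
        pagevec_init(&lrupv);      /* 4.15+: "cold" pages no longer exist */
    #endif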
Change-Id: Ia5287b409b2a811d2250c274579e6f15fd18fdbb
Reviewed-on: https://gerrit.openafs.org/12824
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Tested-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Linux 4.15 removes the distinction between "hot" and "cold" cache
pages, and no longer provides page_cache_alloc_cold(). Simply use
page_cache_alloc() instead, rather than adding yet another test.
Change-Id: I34e734223927030f7ff252acb61120366a808ad6
Reviewed-on: https://gerrit.openafs.org/12823
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Tested-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Place the debuginfo for the kmod into its own rpm so that
it doesn't have to track against the userspace packages.
FIXES 132034
Change-Id: I60a753275d896a89c1f6896c653d78a4e1fe7e2c
Reviewed-on: https://gerrit.openafs.org/11867
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
When using the configure option --enable-checking with gcc 7.2.1,
the compilation fails with
vutil.c:860:20: error: ‘%s’ directive writing up to 255 bytes into \
a region of size 63 [-Werror=format-overflow=]
This can be seen in the logs of the openSUSE Tumbleweed builder
for e.g. build 2368.
Avoid this warning by using snprintf which is provided by libroken
for all platforms.
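Illustration of the pattern (the buffer and its size are placeholders, not
the actual vutil.c code):
    char header[64];
    /* sprintf(header, "%s", name);  -- gcc 7 warns: may overflow the buffer */
    snprintf(header, sizeof(header), "%s", name);   /* truncates safely */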
Change-Id: I6acd3a1c06760abc8144c0892812c3bb50477227
Reviewed-on: https://gerrit.openafs.org/12813
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Apple has introduced a new file system called APFS. Starting from High
Sierra, APFS replaces Mac OS Extended (HFS+) as the default file system
for solid-state drives and other flash storage devices.
The current OpenAFS client is not aware of APFS. As a result, the
installation of the current client into an APFS volume will panic the
machine.
To fix this problem, make the OpenAFS client aware of APFS.
Change-Id: Ib5ac88b87f348744864f4e33f1f222efbc852d41
Reviewed-on: https://gerrit.openafs.org/12743
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
This commit introduces the new set of changes / files required to
successfully create the dmg installer on OS X 10.13 "High Sierra".
Change-Id: Id9da3cf959627a13d8cfd1d1d7412820e46ad63e
Reviewed-on: https://gerrit.openafs.org/12742
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
This commit introduces the new set of changes / files required to
successfully build the OpenAFS source code on OS X 10.13 "High Sierra".
Change-Id: I51928279d97c9d86c67db7de5eb7fc9d317fd381
Reviewed-on: https://gerrit.openafs.org/12741
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
The m4 macro implementing the configure check is called
LINUX_KERNEL_READ_OFFSET_IS_LAST, but it defines a preprocessor symbol
that is just KERNEL_READ_OFFSET_IS_LAST. Our code needs to check
for the latter being defined, not the former.
Reported by Aaron Ucko.
Change-Id: Id7cd3245b6a8eb05f83c03faee9c15bab8d0f6e8
Reviewed-on: https://gerrit.openafs.org/12808
Reviewed-by: Anders Kaseorg <andersk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Rather than blindly trusting the values received in the
(unauthenticated) ack packet trailer, apply some minimal sanity checks
to received values. natMTU and regular MTU values are subject to
Rx minimum/maximum packet sizes, and the transmit window cannot drop
below one without risk of deadlock.
The maxDgramPackets value that can also be present in the trailer
already has sufficient sanity checking.
Extremely low MTU values (less than 28 == RX_HEADER_SIZE) can cause us
to set a negative "maximum usable data" size that gets used as an
(unsigned) packet length for subsequent allocation and computation,
triggering an assertion when the connection is used to transmit data.
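The added checks are of roughly this shape (variable names approximate the
ack-trailer parsing code and are assumptions):
    if (tSize < RX_HEADER_SIZE)    /* 28 bytes; anything smaller is nonsense */
        tSize = RX_HEADER_SIZE;
    if (twind < 1)                 /* a transmit window of 0 risks deadlock */
        twind = 1;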
FIXES 134450
Change-Id: I37698ff166da47a57aa0d1962ae8effc74e30851
Reported by the opensuse buildbot:
CC [M] /home/buildbot/opensuse-tumbleweed-i386-builder/build/src/libafs/MODLOAD-4.13.12-1-default-MP/rx_packet.o
/home/buildbot/opensuse-tumbleweed-i386-builder/build/src/afs/afs_pioctl.c: In function ‘PNewCell’:
/home/buildbot/opensuse-tumbleweed-i386-builder/build/src/afs/afs_pioctl.c:3075:55: error: ‘*’ in boolean context, suggest ‘&&’ instead [-Werror=int-in-bool-context]
if ((afs_pd_remaining(ain) < AFS_MAXCELLHOSTS +3) * sizeof(afs_int32))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
The bug was introduced in commit 718f85a8b6.
Change-Id: Iae55a99e35266aa763fb431f2acc4eba09fa5357
Reviewed-on: https://gerrit.openafs.org/12782
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
The recent event handling normalization in commit
304d758983 had event handlers switch
to dropping their reference on the associated connection/call just
before return. An early return case was missed in the conversion,
leading to a refcount leak in an error case.
Change-Id: Ie3d0bc9474fdbc09be9c753f4d0192c8cca68351
Reviewed-on: https://gerrit.openafs.org/12781
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
The order / content of the arguments passed to kernel_write and
kernel_read are not right. As a result, the kernel will panic if one of
the functions in question is called.
[kaduk@mit.edu: include configure check for multiple kernel_read()
variants, per linux commits bdd1d2d3d251c65b74ac4493e08db18971c09240
and e13ec939e96b13e664bb6cee361cc976a0ee621a]
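The resulting compatibility shim looks roughly like this (the call site shown
is illustrative; the two signatures are the pre- and post-4.14 Linux ones):
    #if defined(KERNEL_READ_OFFSET_IS_LAST)
        /* ssize_t kernel_read(struct file *, void *, size_t, loff_t *) */
        code = kernel_read(filp, buf, count, &offset);
    #else
        /* int kernel_read(struct file *, loff_t, char *, unsigned long) */
        code = kernel_read(filp, offset, buf, count);
    #endif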
FIXES 134440
Change-Id: I4753dee61f1b986bbe6a12b5568d1a8db30c65f8
Reviewed-on: https://gerrit.openafs.org/12769
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Tested-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Use the NUMEVENTS symbol, which defines the array size, instead of an
incorrect hard-coded number when checking whether a second event can be
added to fire at the same time. This fixes a potential out-of-bounds
access of the event test array.
Also update the comment, which stated an incorrect number of events in
the test.
Change-Id: I4f993b42e53e7e6a42fa31302fd1baa70e9f5041
Reviewed-on: https://gerrit.openafs.org/12762
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Instead of inlining the body (taking the lock, incrementing the
refcount, and dropping the lock), use the convenience function
designed for this purpose.
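Before/after illustration (the name of the lock guarding refCount is an
assumption):
    /* before (inlined): */
    MUTEX_ENTER(&rx_refcnt_mutex);
    conn->refCount++;
    MUTEX_EXIT(&rx_refcnt_mutex);
    /* after: */
    rx_GetConnection(conn);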
Change-Id: I674d389e61e42710ef340e202992748e66c5e763
Reviewed-on: https://gerrit.openafs.org/12772
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reported by Mark Vitale
Change-Id: I3269fbb0f87285bcb9af64f4ad81791177582e6d
Reviewed-on: https://gerrit.openafs.org/12771
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
In utility functions that access fields of type struct rxevent *,
assert that the appropriate lock is held for the access in question.
These assertions are only compiled in when built with -DOPR_DEBUG_LOCKS,
which can be enabled by --debug-locks at configure time.
Change-Id: I16885a4d37a0f094f0d365c54e8157ed92070c69
Reviewed-on: https://gerrit.openafs.org/12757
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Go over all consumers of the rx event framework and normalize its usage
according to the following principles:
rxevent_Post() is used to create an event, and it returns an event
handle (with a reference on the event structure) that can be used
to cancel the event before its timeout fires. (There is also an
additional reference on the event held by the global event tree.)
In all(*) usage within the tree, that event handle is stored within
either an rx_connection or an rx_call. Reads/writes to the member variable
that holds the event handle require either the conn_data_lock or call
lock, respectively -- that means that in most cases, callers of
rxevent_Post() and rxevent_Cancel() will be holding one of those
aforementioned locks. The event handlers themselves will need to
modify the call/connection object according to the nature of the
event, which requires holding those same locks, and also a guarantee
that the call/connection is still a live object and has not been
deallocated! Whether or not rxevent_Cancel() succeeds in cancelling
the event before it fires, whenever passed a non-NULL event structure
it will NULL out the supplied pointer and drop a reference on the
event structure. This is the correct behavior, since the caller
has asked to cancel the event and has no further use for the event
handle or its reference on the event structure. The caller of
rxevent_Cancel() must check its return value to know whether or
not the event was cancelled before its handler was able to run.
The interaction window between the call/connection lock and the lock
protecting the red/black tree of pending events opens up a somewhat
problematic race window. Because the application thread is expected
to hold the call/connection lock around rxevent_Cancel() (to protect
the write to the field in the call/connection structure that holds
an event handle), and rxevent_Cancel() must take the lock protecting
the red/black tree of events, this establishes a lock order with the
call/connection lock taken before the eventTree lock. This is in
conflict with the event handler thread, which must take the eventTree
lock first, in order to select an event to run (and thus know what
additional lock would need to be taken, by virtue of what handler
function is to be run). The conflict is easy to resolve in the
standard way, by having a local pointer to the event that is obtained
while the event is removed from the red/black tree under the eventTree
lock, and then the eventTree lock can be dropped and the event run
based on the local variable referring to it. The race window occurs
when the caller of rxevent_Cancel() holds the call/connection lock,
and rxevent_Cancel() obtains the eventTree lock just after the event
handler thread drops it in order to run the event. The event handler
function begins to execute, and immediately blocks trying to obtain
the call/connection lock. Now that rxevent_Cancel() has the eventTree
lock it can proceed to search the tree, fail to find the indicated event
in the tree, clear out the event pointer from the call/connection
data structure, drop its caller's reference to the event structure,
and return failure (the event was not cancelled). Only then does the
caller of rxevent_Cancel() drop the call/connection lock and allow
the event handler to make progress.
This race is not necessarily problematic if appropriate care is taken,
but in the previous code such was not the case. In particular, it
is a common idiom for the firing event to call rxevent_Put() on itself,
to release the handle stored in the call/connection that could have
been used to cancel the event before it fired. Failing to do so would
result in a memory leak of event structures; however, rxevent_Put() does
not check for a NULL argument, so a segfault (NULL dereference) was
observed in the test suite when the race occurred and the event handler
tried to rxevent_Put() the reference that had already been released by
the unsuccessful rxevent_Cancel() call. Upon inspection, many (but not
all) of the uses in rx.c were susceptible to a similar race condition
and crash.
The test suite also papers over a related issue in that the event handler
in the test suite always knows that the data structure containing the
event handle will remain live, since it is a global array that is allocated
for the entire scope of the test. In rx.c, events are associated with
calls and connections that have a finite lifetime, so we need to take care
to ensure that the call/connection pointer stored in the event remains
valid for the duration of the event's lifecycle. In particular, even an
attempt to take the call/connection lock to check whether the corresponding
event field is NULL is fraught with risk, as it could crash if the lock
(and containing call/connection) has already been destroyed! There are
several potential ways to ensure the liveness of the associated
call/connection while the event handler runs, most notably to take care
in the call/connection destruction path to ensure that all associated
events are either successfully cancelled or run to completion before
tearing down the call/connection structure, and to give the pending event
its own reference on the associated call/connection. Here, we opt for
the latter, acknowledging that this may result in the event handler thread
doing the full call/connection teardown and delay the firing of subsequent
events. This is deemed acceptable, as pending events are for intentionally
delayed tasks, and some extra delay is probably tolerable. (The various
keepalive events and the challenge event could delay the user experience
and/or security properties if significantly delayed, but I do not believe
that this change admits completely unbounded delay in the event handler
thread, so the practical risk seems minimal.)
Accordingly, this commit attempts to ensure that:
* Each event holds a formal reference on its associated call/connection.
* The appropriate lock is held for all accesses to event pointers in
call/connection structures.
* Each event handler (after taking the appropriate lock) checks whether
it raced with rxevent_Cancel() and only drops the call/connection's
reference to the event if the race did not occur.
* Each event handler drops its reference to the associated call/connection
*after* doing any actions that might access/modify the call/connection.
* The per-event reference on the associated call/connection is dropped by
the thread that removes the event from the red/black tree. That is,
the event handler function if the event runs, or by the caller of
rxevent_Cancel() when the cancellation succeeds.
* No non-NULL event handles remain in a call/connection being destroyed,
which would indicate a refcounting error.
(*) There is an additional event used in practice, to reap old connections,
but it is effectively a background task that reschedules itself
periodically, with no handle to the event retained so as to be able
to cancel it. As such, it is unaffected by the concerns raised here.
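As a rough sketch of the handler shape this implies (the field names, refcount
tag, and exact rxevent_Put() calling convention are assumptions; only the
ordering matters):
    static void
    rxi_ExampleEvent(struct rxevent *event, void *arg, void *dummy, int dummy2)
    {
        struct rx_call *call = arg;
        MUTEX_ENTER(&call->lock);
        if (event == call->delayedAckEvent) {
            /* No rxevent_Cancel() raced with us; drop the call's handle
             * (and its reference) on this event. */
            rxevent_Put(&call->delayedAckEvent);
        }
        /* ... act on the call while the event's reference keeps it live ... */
        MUTEX_EXIT(&call->lock);
        /* Drop this event's own reference on the call *last*. */
        CALL_RELE(call, RX_CALL_REFCOUNT_DELAY);
    }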
While here, standardize on the rx_GetConnection() function for incrementing
the reference count on a connection object, instead of inlining the
corresponding mutex lock/unlock and variable access.
Also enable refcount checking unconditionally on unix, as this is a
rather invasive change late in the 1.8.0 release process and we want
to get as much sanity checking coverage as possible.
Change-Id: I27bcb932ec200ff20364fb1b83ea811221f9871c
Reviewed-on: https://gerrit.openafs.org/12756
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
We currently do not properly handle the case where a thread runs
rxevent_Cancel() in parallel with the event-handler thread attempting
to fire that event, but the test suite only picked up on this issue
in a handful of the Debian automated builds (somewhat less-resourced
ones, perhaps).
Modify the event scheduling algorithm in the test so as to create a
larger chunk of events scheduled to fire "right away" and thereby
exercise the race condition more often when we proceed to cancel
a quarter of events "right away".
Change-Id: I50f55fd532901147cfda1a5f40ef949bf3270401
Reviewed-on: https://gerrit.openafs.org/12755
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
If ncurses is built with "./configure --with-termlib=tinfo", gtx fails
to link because of an undefined reference to the LINES symbol which is
then provided by libtinfo.so and not libncurses.so.
If ncurses is present, additionally check whether LINES is provided by
ncurses or tinfo and set $LIB_curses accordingly.
This change is based on a patch provided by Bastian Beischer.
FIXES 134420
Change-Id: I3e29c61405d90d0b850bafe4c51125bef433452b
Reviewed-on: https://gerrit.openafs.org/12760
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
AS_IF does not invoke the test(1) shell builtin for us, so we must
take care to consistently use it ourselves.
While here, sprinkle some missing double-quotes around variable
expansions in AS_IF statements in this file.
Submitted by Bastian Beischer.
FIXES 134414
Change-Id: Iccfe311011f17de6317cf64abdc58b0812b81b8c
Reviewed-on: https://gerrit.openafs.org/12738
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
We hide the uses of set_fs/get_fs behind a macro, as those functions
are likely to soon become unavailable:
> Christoph Hellwig suggested removing all calls outside of the core
> filesystem and architecture code; Andy Lutomirski went one step
> further and said they should all go.
https://lwn.net/Articles/722267/
Change-Id: Ib668f8fdb62ca01fe14321c07bd14d218744d909
Reviewed-on: https://gerrit.openafs.org/12729
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Older versions of rpmbuild do not support the files exclude directive,
so fall back to the old way in which we remove the files to be excluded
and list the files to be included.
Change-Id: If64df382ef372aa1078f1703a34942a1930bdc88
Reviewed-on: https://gerrit.openafs.org/12733
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Move the deprecated klog.krb, pagsh.krb, and tokens.krb programs and man
pages to the optional openafs-kauth-client subpackage.
Change-Id: I09a2e36b60f9d47726a6a314a26db88e44575567
Reviewed-on: https://gerrit.openafs.org/12732
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Currently, some of the man pages are specified with the full name and
some are specified with a wildcard for the filename extension. Instead,
specify all the man pages without wildcards to be more consistent and
to avoid putting incorrect man pages in packages.
This change removes a stray copy of the klog.krb5.1 man page from the
openafs-kauth-client subpackage and moves the AuthLog/AuthLog.dir man
pages to the optional openafs-kauth-server subpackage.
Change-Id: Id30a6174c532a9a00f850d6ca2722158293d5118
Reviewed-on: https://gerrit.openafs.org/12731
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
The afsd.fuse binary is not currently packaged; do not package the man
page.
Change-Id: Ia0dd4fa72dc8a87e2c835798b6fbe1213d71da5f
Reviewed-on: https://gerrit.openafs.org/12730
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
As already described in 7c708506, SDISK_Begin fails on remotes if
lastYesState is not set. To fix this problem, 7c708506 does not allow
write transactions until we know that lastYesState is set on at least
quorum (ubik_syncSiteAdvertised == 1). In other words, if enough sites
received a beacon packet informing that a sync-site was elected, write
transactions will be allowed. This means that ubik_syncSiteAdvertised
can be true while lastYesState is not set in a few sites.
Consider the following scenario in a cell with frequent write
transactions:
Site A => Sync-site (up)
Site B => Remote 1 (up)
Site C => Remote 2 (down - unreachable)
Since A and B are up, we have quorum. After the second wave of beacons,
ubik_syncSiteAdvertised will be true and write transactions will be
allowed. At some point, C becomes reachable again. Site A sends a
copy of its database to C, but C did not vote for A yet (lastYesState ==
0). A new write transaction is initialized and, since lastYesState is
not set on C, DISK_Begin fails on this remote site and C is marked as
down. Since C is reachable, A will mark this remote site as up. The
sync-site will send its database to C, but C did not vote for A yet. A
new write transaction is initialized and, since lastYesState is not set
on C, DISK_Begin fails on this remote site and C is marked as down. In a
cell with frequent write transactions, this cycle will repeat forever.
As a result, the sync-site will be constantly sending its database to C
and quorum will be operating with less sites, increasing the chances
of re-elections.
To fix this problem, do not call DISK_Begin on remotes that did not
vote for the sync-site yet.
Change-Id: I27f5122a089064e7b83beba3533261d8a4e31c64
Reviewed-on: https://gerrit.openafs.org/12715
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
The following commit:
commit eb031849d52e61d24ba54e9d27553189ff328174
Author: Christoph Hellwig <hch@lst.de>
Date: Fri Sep 1 17:39:23 2017 +0200
fs: unexport __vfs_read/__vfs_write
unexports both __vfs_read and __vfs_write, but keeps the former in
fs.h, as it is still being used by another part of the tree.
This situation results in a false positive in our Autoconf check,
which does not see the export statements, and ends up marking the
corresponding API as available.
That, in turn, causes some code which assumes symmetry with
__vfs_write to fail to compile.
Switch to testing for __vfs_write, which correctly marks the API as
unavailable.
Change-Id: I392f2b17b4de7bd81d549c84e6f7b5ef05e1b999
Reviewed-on: https://gerrit.openafs.org/12728
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>