openafs

mirror of https://git.openafs.org/openafs.git synced 2025-01-31 05:27:44 +00:00

Author	SHA1	Message	Date
Marcio Barbosa	45a69b6113	volser: take RO volume offline during convertROtoRW The vos convertROtoRW command converts a RO volume into a RW volume. Unfortunately, the RO volume is not checked out from the fileserver during this process. As a result, accesses to the volume being converted can leave volume objects in an inconsistent state. Moreover, consider the following scenario: 1. Create a volume on host_b and add replicas on host_a and host_b. $ vos create host_b a vol_1 $ vos addsite host_b a vol_1 $ vos addiste host_a a vol_1 2. Mount the volume: $ fs mkmount /afs/.mycell/vol_1 vol_1 $ vos release vol_1 $ vos release root.cell 3. Shutdown dafs on host_b: $ bos shutdown host_b dafs 4. Remove RO reference to host_b from the vldb: $ vos remsite host_b a vol_1 5. Attach the RO copy by touching it: $ fs flushall $ ls /afs/mycell/vol_1 6. Convert RO copy to RW: $ vos convertROtoRW host_a a vol_1 Notice that FSYNC_com_VolDone fails silently (FSYNC_BAD_STATE), leaving the volume object for the RO copy set as VOL_STATE_ATTACHED (on success, this volume should be set as VOL_STATE_DELETED). 7. Add replica on host_a: $ vos addsite host_a a vol_1 8. Wait until the "inUse" flag of the RO entry is cleared (or force this to happen by attaching multiple volumes). 9. Release the volume: $ vos release vol_1 Failed to start transaction on volume 536870922 Volume not attached, does not exist, or not on line Error in vos release command. Volume not attached, does not exist, or not on line Notice that this happens because we cannot mark an attached volume as destroyed (FSYNC_com_VolDone). To avoid the problem mentioned above and to prevent accesses to the volume being converted, take the RO volume offline before converting it to RW. Change-Id: Ifd342e1f420dc42e5da49242a7aa70db7d97a884 Reviewed-on: https://gerrit.openafs.org/14340 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-13 12:33:46 -05:00
Cheyenne Wills	c17c157641	vos: Cleanup indentation whitespace Fix the indentation whitespace in vos.c, and remove double blank lines. No functional change. Change-Id: I97587779d6d2c131b5eac98bbee49efae73fafe9 Reviewed-on: https://gerrit.openafs.org/14007 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 12:13:19 -05:00
Michael Meffie	4bbf1239f8	vos: Return true when GetServerAndPart finds a site Change the GetServerAndPart() function to return true when a volume site in the vldb entry is found. Do not change the output arguments unless the site is found. Also, add a function comment header and fix some comment typos in this function. Change-Id: I10b43054b1bf9e6757ccdc95cb4559ab8b6dc013 Reviewed-on: https://gerrit.openafs.org/14006 Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>	2020-11-13 11:08:10 -05:00
Michael Meffie	ace2f7f5ce	vos: Add missing -partition requires -server checks The `vos remove` command was missing a check for the -server option when the -partition option is given. This command requires the -server option when the -partition is given, as documented in the man page. The `vos syncvldb` command performed the check for the -server option when the -partition option is given, but in the wrong location. As documented, the `vos unlockvldb` command permits the -partition option without a -server option, in which case all of the volumes listed in the VLDB with sites on the specified partition are unlocked. However, this command incorrectly issued an RPC to a volume server at address 0.0.0.0 when only the partition is given. Change-Id: I6b878678e28b34250e63d2d082747f6fd416972d Reviewed-on: https://gerrit.openafs.org/14005 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>	2020-11-13 11:07:08 -05:00
Mark Vitale	de3e7289e2	vos: avoid double release of a volume lock To update a volume entry in the VLDB, vos commands typically lock the volume entry via VL_SetLock, then call VL_UpdateEntryN, then release the lock via VL_ReleaseLock. However, some vos commands exploit the optional lock release flags of VL_UpdateEntryN to combine the update and unlock operations into a single RPC. This approach requires extra care to ensure that VL_ReleaseLock is issued for a failed VL_UpdateEntryN, but NOT for a successful VL_UpdateEntryN. Unfortunately, the following commands have success paths that fall through to the error path, resulting in a double release of the volume lock: - vos convertROtoRW - vos release A second VL_ReleaseLock of a volume entry that has already been unlocked via VL_UpdateEntryN is essentially a harmless no-op (other than negating any benefit of exploiting the VL_UpdateEntryN lock flags). However, if there is a race with another volume operation on the same volume, it is possible for this bug to release the volume lock of a different volume operation. This problem has been present in 'vos release' since OpenAFS 1.0. This problem has been present in 'vos convertROtoRW' since the command's introduction in commit 8af8241e94284522feb77d75aee8ea3deb73f3cc vol-ro-to-rw-tool-20030314. Properly maintain state to avoid unlocking a volume (with VL_ReleaseLock) that has already been unlocked (via VL_UpdateEntryN). Thanks to Andrew Deason for discovering the issue and suggesting the fix. Change-Id: I757b4619b9431d1ca980f755349806993add14a5 Reviewed-on: https://gerrit.openafs.org/14426 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 10:54:46 -05:00
Mark Vitale	8e1c321dc8	volser: document 'vos restore -readonly' restriction Commit 0c03f8607e15 vos-command-enhancements-20011008 introduced the 'vos restore' -readonly option, which allows the restored volume to be RO instead of the default RW. The commit message documents the following restriction: - ... This option causes the restored volume to be an RO volume. It is not permitted to restore an RO volume when the associated RW volume already exists. While it is possible to restore an RW volume where an RO volume exists, caution should be used to avoid doing this with VLDB entries created by 'vos restore -readonly', since such entries have their ROVOL and RWVOL ID's set to the same thing. Document this restriction in the 'vos restore' man page, and in a code comment. No functional change is incurred by this commit. Change-Id: I34f6c5434b82da538a38a9d219207b33dcf62b17 Reviewed-on: https://gerrit.openafs.org/14348 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 10:35:15 -05:00
Mark Vitale	0bcfe89d68	volser: improve error checking for 'vos restore' UV_RestoreVolume2 calls VLDB_GetEntryByName to obtain information for sanity checking, but only checks for a VL_NOENT error code; other codes are thus ignored, which may lead to confusing results. Add an additional error check for 'vos restore' (and other callers of UV_RestoreVolume2) to stop and issue an error message if a non-VL_NOENT error code is received from VLDB_GetEntryByName. Change-Id: Idf41965fdd84fa282a3397215ec393ae10f72018 Reviewed-on: https://gerrit.openafs.org/14347 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 10:21:56 -05:00
Mark Vitale	a534ee83f2	volser: fix 'cant' typos Correctly spell "can't" in a log message and a comment. Change-Id: I9d5c667d9c5ea3c5b726f958431c497353433239 Reviewed-on: https://gerrit.openafs.org/14346 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 10:09:57 -05:00
Mark Vitale	aed4a0c4b9	afs: avoid panic in DNew when afs_WriteDCache fails afs_WriteDCache may fail for an IO error, or if interrupted (EINTR). Unfortunately, DNew will panic in this case, crashing the entire machine. In order to avoid an outage in this case, don't panic. Instead, reflect the error back to the caller of DNew. While here, add Doxygen comments to DNew. Change-Id: I27a8f89bab979c5691dded70e8b9eacbe8aff4fd Reviewed-on: https://gerrit.openafs.org/13804 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 15:51:35 -05:00
Mark Vitale	1c04036b34	afs: remove redundant assignment DRelease has two assignments for tp = entry->buffer; remove the second (redundant) one. Introduced with 0284e65f97861e888d95576f22a93cd681813c39 'dir: Explicitly state buffer locations for data'. No functional change should be incurred by this commit. Change-Id: If4a17862f451973075fa3fa267b5139046d97ede Reviewed-on: https://gerrit.openafs.org/13802 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 15:51:12 -05:00
Mark Vitale	6bd94fe29d	dir: check DNew return code Commit 0284e65f97861e888d95576f22a93cd681813c39 'dir: Explicitly state buffer locations for data' changed DNew and DRead to return a return code. However, the callers of DNew were not modified to check the new return code. (This commit applied only to the implementations dealing with AFS directories, in afs/afs_buffer.c and dir/dir.c. The ubik implmentations of DNew and DRead, dealing with ubik databases, were not modified.) Modify all (non-ubik) callers of DNew to check the return code. In addition, modify code as needed so return codes are properly propagated to the callers. While here, add Doxygen comments for AddPage and FindBlobs. Change-Id: Iabde6499745dd351f3fcda73c9f52c440a36490e Reviewed-on: https://gerrit.openafs.org/13801 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-06 14:42:15 -05:00
Andrew Deason	7b0a66f63b	Remove unused xdr types Numerous types and constants are defined in our various RPC-L files that are never used or referenced by anything. Remove them. Change-Id: I0b03be1ce0e186a88f80d2f3f7a66a1e25965ff3 Reviewed-on: https://gerrit.openafs.org/14404 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 14:29:37 -05:00
Benjamin Kaduk	0787a2c8ae	volser: apply static keyword to VolPartitionInfo definition The function declaration was already marked as static; mark the definition as well for consistency (and consistency with the other helpers in this file). Change-Id: I642db1d27efd34ab2a09f7299791c19d07b1f923 Reviewed-on: https://gerrit.openafs.org/13321 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 11:59:17 -05:00
Mark Vitale	dcce956df4	dir: check afs_dir_Create return code in afs_dir_MakeDir afs_dir_MakeDir() ignores the return code from afs_dir_Create() for the '.' and '..' ("dot" and "dotdot") directories. This has been the case from the earliest implementation (MakeDir() calling Create()) in the original IBM import. Instead, check the return codes to prevent the possibility of creating malformed directories. Change-Id: I60179488429dfa9afe60c4862c5e42b41f1e0048 Reviewed-on: https://gerrit.openafs.org/13800 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 11:47:01 -05:00
Benjamin Kaduk	04805f48a2	ptserver: rename NameToID and IDToName helpers These helper function names alias the names of public RPCs and can cause confusion when grepping the code. Rename them in a different style to provide greater hamming distance between the various functions involved in handling these RPCs. Change-Id: I0e2c7997bc145888affdac28716293ff820756c7 Reviewed-on: https://gerrit.openafs.org/13320 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-06 10:16:42 -05:00
Mark Vitale	0639ca8d22	dir: check afs_dir_MakeDir return code in DirSalvage Since the original IBM import, DirSalvage() has ignored the return code from afs_dir_MakeDir() (f.k.a. MakeDir). This has been safe because, as the comment states, afs_dir_MakeDir returns no (non-zero) error code. In preparation for a future commit, add a check for the return from afs_dir_MakeDir and remove the comment. Change-Id: Ibb259a7aaeeb21ef70a7794143a0dadb2a75725d Reviewed-on: https://gerrit.openafs.org/13799 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 10:07:04 -05:00
Mark Vitale	735fa5fb09	dir: distinguish logical and physical errors on reads The directory package (src/dir) salvage routines DirOK and DirSalvage check a global variable 'DErrno' to distinguish logical errors (e.g. short read) from physical errors (e.g. EIO). However, since the original IBM import, this logic has not worked correctly because there is no longer any code that sets the value of DErrno - its value is always zero. Instead, modify all implementations of ReallyRead to optionally return the errno for low-level IO errors. Also, create a new userspace-only variant - DReadWithErrno() - of the src/dir/buffer.c version of DRead (the version called by DirOK and DirSalvage, and the only caller of ReallyRead) to return the ReallyRead errno upon request. Also create an analogous variant of afs_dir_GetBlobs, afs_dir_GetBlobsWithErrno(). Finally, convert DirOK and DirSalvage to use the new variants and replace DErrno with equivalent logic. Remove all other references to DErrno. Change-Id: I3de182ce49c1682572142da594af5dc2c00ede74 Reviewed-on: https://gerrit.openafs.org/13798 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-06 10:06:13 -05:00
Andrew Deason	1caeeea43c	afs: Log pid with disk cache read errors Log the current pid (and procname) when we complain about an error when reading from CacheItems in afs_UFSGetDSlot. These errors can result in confusing situations, so it can be helpful to know at least what process saw the error. Our logic for logging this information is getting a bit large, so also move this to a new function, LogCacheError. Change-Id: I3427e736458784df0d516f4182684605e930e128 Reviewed-on: https://gerrit.openafs.org/14416 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-06 01:18:21 -05:00
Cheyenne Wills	98c1a8751c	roken: use strtok_r from roken Windows standard library doesn't provide strtok_r. Use the strtok_r that is provided from roken. Change-Id: I1bccb9a306c9dd1963f044127fb5dfe4da5728cc Reviewed-on: https://gerrit.openafs.org/13891 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-05 23:14:24 -05:00
Heimdal Developers	e5abb34882	Import of code from heimdal This commit updates the code imported from heimdal to 5dfaa0d10b8320293e85387778adcdd043dfc1fe (git2svn-syncpoint-master-311-g5dfaa0d10) New files are: roken/strtok_r.c Change-Id: I27042f614c7d6ce9a95a80d01474e8bf401e4760 Reviewed-on: https://gerrit.openafs.org/13890 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-05 23:14:04 -05:00
Cheyenne Wills	fe6a97b4d8	roken: add strtok_r to the imported file list Import the strtok_r function which is needed by audit for parsing command line options. Change-Id: I8412c5a663dc3315c4146665edb72d9a6b8df5be Reviewed-on: https://gerrit.openafs.org/13889 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-05 23:13:46 -05:00
Benjamin Kaduk	a912a29a45	Detect realloc failure While reviewing other commits, a call to realloc() was discovered that would leak memory on failure (by virtue of always assigning the realloc() return value to the pointer holding the input address, even when the return value is NULL). Check for failure and return early in that case (giving an incomplete list of events). Change-Id: Ic6e889f1d990bd289812ce4bf8e9cd4ebce488ec Reviewed-on: https://gerrit.openafs.org/13313 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-05 22:49:43 -05:00
Benjamin Kaduk	7ede3fa17f	ptserver: move IDToName, NameToID to ptprocs.c and make static These two helpers are only used in implementing server-side RPC handlers, and having to track the codeflow across files is unhelpful. Move them into the file where they're used, make them static, and remove the prototypes from ptrototypes.h (which is not an installed header, so there is no API/ABI breakage). Change-Id: I236d17865a296933f41aaee206535d341c3a955d Reviewed-on: https://gerrit.openafs.org/13319 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-05 22:29:07 -05:00
Benjamin Kaduk	fe4f6638d1	Assign explicit opcodes to butc RPCs This should prevent inadvertent reassignment if additional RPCs are introduced in the future. Change-Id: I5645ca478d2ecef9962f4bde04ab8f9895dd9497 Reviewed-on: https://gerrit.openafs.org/13317 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-05 21:39:31 -05:00
Andrew Deason	fd6add0aca	vlserver: Return VL_DBBAD on unhash failure If we try to delete a vlentry, and the vlentry cannot be found on one of its hash chains, we cannot unhash the vlentry properly and the operation fails with VL_NOENT. This results in the following error messages to the user: $ vos delentry 123456 Could not delete entry for volume 123456 You must specify a RW volume name or ID (the entire VLDB entry will be deleted) VLDB: no such entry Deleted 0 VLDB entries This is confusing, because VL_NOENT can also occur if the user specifies a volume that does actually not exist. This situation is indicative of database corruption, usually because of a ubik transaction that was only half-applied, or because of other ubik bugs in the past. The situation can only really be fixed by repairing the database, so return VL_DBBAD in this case instead, to more clearly indicate that something is wrong with the database, and not a problem with the arguments the caller provided. Change-Id: I6fc275c3ad05c108778f36687227b0a927cca5da Reviewed-on: https://gerrit.openafs.org/13384 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-02 11:53:43 -05:00
Andrew Deason	878d27c845	vlserver: Add VL_DBBAD error code The VL_ error table currently doesn't have an error code to indicate that an operation cannot succeed because the database is corrupted. There are a few error codes for specific cases of errors that are probably the result of corruption (like VL_IDALREADYHASHED, or VL_EMPTY), but these are only for specific cases and indicate rather low-level internal problems. There are some instances where the real problem preventing an operation from succeeding is that the database is just corrupt or inconsistent in some way, and the administrator must repair the database before it can succeed. And we currently don't have any way of indicating that situation via an error code. So, introduce the VL_DBBAD code, to indicate this situation. Error codes already exist in other tables for similar situations, such as PRDBBAD, and KADATABASEINCONSISTENT. This commit does not use the new error code anywhere; we just introduce it into the VL_ error table, so comerr-using applications will be able to interpret it. Note that the VL_DBBAD error code has been recognized by the AFS Assigned Numbers Registry as recorded in the ticket history of <https://rt.central.org/rt/Ticket/Display.html?id=134817> Change-Id: I8fea356a4e0db907ec8418efe6ef35d547be0a63 Reviewed-on: https://gerrit.openafs.org/13383 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-02 11:53:22 -05:00
Benjamin Kaduk	c1c4e308cf	ptserver: move allocation out of put_prentries() into listEntries() put_prentries() is a helper function for listEntries(), but the contract between the two is rather odd -- put_prentries() is expected to notice when the backing store has not yet been allocated and silently allocate it, even though there is only the single caller and the allocation could be done in the caller. Move the allocation to the caller and adjust the "buffer is full" logic accordingly, and normalize the initialization of the output array to just use calloc() instead of individual memset()s when populating each entry. Change-Id: Icf84e3b60eae81a1570b12d7adbf006a24a104f3 Reviewed-on: https://gerrit.openafs.org/13315 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-02 03:08:38 -05:00
Cheyenne Wills	61a8f0c5b8	volser: Avoid calling osi_audit before audit init volmain.c calls osi_audit before the audit facility is fully initialized. Commit 16d67791 (auditlogs-for-everyone-20050702) introduced the -auditlog parameter; it appears that it didn't remove the call to osi_audit (right after osi_audit_init) that was called before command line argument processing. This resulted in calling the audit facility before it was fulling initialized with the -auditlog and -audit-interface parameters. The 16d67791 commit replicated the osi_audit call after command line processing. Change-Id: Ia0c0054a2fb11892b5b30c0f0838a4d6bbdf9bbb Reviewed-on: https://gerrit.openafs.org/13772 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-01 14:27:06 -05:00
Andrew Deason	52c4bf4d18	audit: Always call pthread_once in osi_audit_init Currently, we skip the pthread_once call in osi_audit_init if audit_lock_initialized is set. But this is somewhat pointless, since pthread_once will effectively do this check itself, and better (it will wait if osi_audit_init is actively running in another thread). So just get rid of audit_lock_initialized, and replace the other assert for audit_lock_initialized with another plain pthread_once call. Change-Id: I466c8ec2d1516edecaae23d4354892e7e3a88918 Reviewed-on: https://gerrit.openafs.org/14403 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-01 13:11:48 -05:00
Benjamin Kaduk	e83347ce6a	remove unused src/butc/common.h Change-Id: Ie25a9ca4f715c841a7f7fa130176cfbdc5ef18e7 Reviewed-on: https://gerrit.openafs.org/13322 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-29 23:30:24 -04:00
Benjamin Kaduk	b1975c27d2	butc: consistently spell taskId parameter All but one RPC used the capitalization "taskId"; adjust the long straggler for consistent style. Change-Id: I996d96a4fc67af7f745bf67041c90390073ca9ea Reviewed-on: https://gerrit.openafs.org/13318 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-29 23:09:23 -04:00
Benjamin Kaduk	f44d7e64bb	Remove commented-out butc RPC definitions These functions have been commented out since the original IBM import, and un-commenting them in their current location would be an ABI break (by causing opcodes to be reassigned for subsequent RPCs). Since they are just noise in the interface description file, remove them. Change-Id: I7e8cd2e7dfa4469e39e26a0437059c108f3ef218 Reviewed-on: https://gerrit.openafs.org/13316 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-29 21:49:39 -04:00
Benjamin Kaduk	465701c8f1	butc: Initialize RPC outputs at top of function RPC handlers are a little bit special in that their output parameters are discarded on error and an Rx abort is sent instead of the usual response fields. Nonetheless, it is good code hygeine to adhere to the practices we use for the rest of the functions in the tree: initialize output variables before the first return. Change-Id: I6c2e25b04ccb6277bd28e398121723b92fe42b04 Reviewed-on: https://gerrit.openafs.org/13314 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-29 21:32:54 -04:00
Andrew Deason	3e3fce24da	vlserver: Warn when we cannot unhash deleted entry If we are trying to delete an entry from the vldb, we fail with VL_NOENT if we cannot find the given entry on one of its hash chains. This is indicative of corruption in the vldb (since we have an entry not on a hash chain), but we don't really indicate this clearly. There are no log messages, and the user running 'vos' only sees an error like this: $ vos delentry 123456 Could not delete entry for volume 123456 You must specify a RW volume name or ID (the entire VLDB entry will be deleted) VLDB: no such entry Deleted 0 VLDB entries Which is the exact same error message if the user tries to delete a volume that does not actually exist. We currently do not have an error code that clearly says that the database appears corrupted and needs to be fixed, but we can at least log an error in VLLog for this case, to give the administrator a chance at fixing the situation. So, log a message in this situation. Change-Id: I4f0ee8749a90441e1f8d779890293dc5d1d9dbee Reviewed-on: https://gerrit.openafs.org/13382 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-29 21:20:08 -04:00
Mark Vitale	48df3ac302	bos: do not assume fs just if dafs bnode is stopped If dafs is configured but stopped, 'bos salvage <fs> <vicep> -forceDAFS' will fail with: bos: failed to get instance info for 'fs' (no such entity) bos: shutting down 'fs'. bos: can't stop 'fs' (no such entity) This is due to incomplete logic in IsDAFS, introduced with commit e46f10a0a0a930f318833a8a86b10c19744160c1 'bos: Do not assume DAFS just if DAFS bnode exists' Add logic to IsDAFS to work correctly when dafs is configured but stopped. Change-Id: I50f8209180536d25e68c0ad6fb826202d8f27ce7 Reviewed-on: https://gerrit.openafs.org/14382 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-25 23:22:02 -04:00
Mark Vitale	f372ec041a	bozo: defer audit open until log dir is created and current On a new OpenAFS install where the log directory has not yet been created. 'bosserver -auditlog /usr/afs/logs/<auditlog>' (absolute path) fails with ENOENT because the log directory doesn't exist yet. Furthermore, 'bosserver -auditlog <auditlog>' (relative path) succeeds, but the audit file is created in the current working directory when bosserver was started, not in the expected log directory (Transarc /usr/afs/logs). Both problems have been present since bosserver audit log support was introduced by commit 16d67791dce45e5d4ee9b854c796492ffcde2113 'auditlogs-for-everyone-20050702'. Reorder the bosserver initialization steps to ensure that the log directory has been created and is the current working directory, before creating and opening the audit log. Change-Id: I1dc3c136edd12c5425ef0b7a3212a18d4c3036f7 Reviewed-on: https://gerrit.openafs.org/14381 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-25 16:58:02 -04:00
Andrew Deason	87041d676c	bozo: Properly detect presence of -auditlog cmd_OptionAsString returns non-zero if the given option _isn't_ given (CMD_MISSING), so we need to call osi_audit_file only when cmd_OptionAsString returns 0. Since commit f6cdf71 (bozo: Use libcmd for command line options), this causes bosserver to complain on startup if no -auditlog was given: $ bosserver Warning: auditlog (null) not writable, ignored. To fix this, skip calling osi_audit_file if -auditlog was not given. While we're changing this anyway, change our processing of our audit-related options to more closely match what other daemons do, like ptserver or viced, so it's easier to see if we're doing the right thing. That is, just call cmd_OptionAsString() without a conditional, and just test if auditFileName is non-NULL later on, after options processing. Change-Id: I563c7efd02cb5210c32c0cc7f5a03683db792e98 Reviewed-on: https://gerrit.openafs.org/14402 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-25 16:57:50 -04:00
Mark Vitale	e8702e6a61	afs: prevent double release of global lock afs_xvcb afs_GetServer calls ReleaseWriteLock(&afs_xvcb) twice within a few lines. The second one is spurious. Commits b18653de7ae90491c2e75f4a98410581655d776c 'xserver lock order violation' and f2bf60ed4f1323cd6f74f2f01114f7e4f714db53 'xvcb lock order violation' were written by the same author at the same time and apparently were victims of a bad merge. Discovered during a lock audit project as a panic during afsd startup: assertion failed: (&afs_xvcb)->excl_locked == WRITE_LOCK, file: /home/mvitale/src/sna-openafs/src/afs/afs_server.c, line: 2089 afs_GetServer is called frequently by many threads and so this bug could easily have released another thread's write lock on afs_xvcb. Remove the spurious second release. Change-Id: I495f4775e18ed37cfbccd03b5f25594586864b83 Reviewed-on: https://gerrit.openafs.org/14411 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-24 00:44:03 -04:00
Marcio Barbosa	fed176cc50	ubik: Introduce IndexOf() To make the ubik_Call* functions cleaner, consolidate code that finds the index of the connection associated with a host into a new function. No functional change should be incurred by this commit. Change-Id: I320d7a41221cb533e8d077c412f872152ac43b75 Reviewed-on: https://gerrit.openafs.org/14060 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 23:06:13 -04:00
Andrew Deason	ea9e5e8519	afs: Handle osi_NewVnode failures Currently, code inside afs_vcache.c assumes that osi_NewVnode always returns non-NULL, which means that osi_NewVnode must panic if it cannot create a new vnode. All of the callers of afs_GetVCache, afs_NewVCache, etc, already handle getting a NULL return, though (after all, the given fid may not exist or be inaccessible due to network errors, etc). So, just propagate NULL returns from osi_NewVnode up to our callers, to avoid panics in these situations. Modify osi_NewVnode on many arches to return an error on allocation failure, instead of panic'ing. Change-Id: Ib578b1747590bdf65327d4674e0849811ed999eb Reviewed-on: https://gerrit.openafs.org/13701 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Yadavendra Yadav <yadayada@in.ibm.com> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-10-23 12:43:48 -04:00
Mark Vitale	e1e5df918f	stats: incorrect clock square algorithm Since the original IBM code import, OpenAFS has an algorithm for squaring clock values, implemented identically in three different places. This algorithm does not account correctly for microsecs overflow into seconds, resulting in incorrect "sum-of-squares" values for queue and execution time in several OpenAFS performance utilities. Specifically, this code: t1.tv_usec += (2 * t2.tv_sec * t2.tv_usec) % 1000000 \ + (t2.tv_usec / 1000)(t2.tv_usec / 1000) \ + 2 (t2.tv_usec / 1000) * (t2.tv_usec % 1000) / 1000 \ + (((t2.tv_usec % 1000) > 707) ? 1 : 0); \ Can allow for the tv_usec field to be increased by a theoretical max of around: t1.tv_usec += 999998 \ + 999999 \ + 2 999 * 999 / 1000 \ + 1; \ Or: t1.tv_usec += 1999996; \ If t1.tv_usec is already 999999, after this calculation its value could be as high as 2999995. So just checking once if t1.tv_usec is over 1000000 is not sufficient, since the resulting value (1999995) is still over 1000000. Correct all implementations by repeatedly checking if tv_usec is over 1000000 after the above calculation: macro affected utility ===================== ============================ afs_stats_SquareAddTo xstat_cm_test fs_stats_SquareAddTo xstat_fs_test clock_AddSq rxstat_get_process and _peer Change-Id: I3145d592ba6bc1556729eac657f43d476c99eede Reviewed-on: https://gerrit.openafs.org/14376 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 12:25:43 -04:00
Mark Vitale	e985d43d99	rxstats: correctly report vlserver VL_* RPC stats Since the original IBM code import, rxstat_get_process and rxstat_get_peer have reported vlserver VL_* RPC stats as for the "volserver interface". Correct this to read "vlserver interface". Change-Id: Ie65fd41150bed8180ad8792c21a67012084459ab Reviewed-on: https://gerrit.openafs.org/14375 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 12:25:10 -04:00
Mark Vitale	18c345a9f8	rxstats: correctly distinguish client and server stats Commit d3eaa39da3693bba708fa2fa951568009e929550 'rx: Make the rx_call structure private' inadvertently caused all rxstats (aka rpcstats) to be recorded as client stats by hardcoding the value for isServer to 1. Therefore, when peer or process rxstats are enabled for a OpenAFS component, the rxstat_get_process and rxstat_get_peer utilities will erroneously report both client and server stats as "accessed as a client". This is particularly problematic for ubik VOTE_* and DISK_* RPC stats, for which a given ubik server may be both client and server over time. In this case, both client and server stats are conflated into the same "accessed as a client" counters. Instead, properly pass the value of isServer from rx_RecordCallStatistics through to rxi_IncrementTimeAndCount. Note to maintainers: This bug is only in master and all 1.8.x releases; no 1.6.x releases are affected. Note: Confusingly, isServer=1 indicates client stats and isServer=0 indicates server stats. However, this is a quirk of the original implementation and wire format of the RXSTATS_* RPCs and cannot be changed. isServer is actually shorthand for "remote is server"; thus all RPC client stubs record their rxstats with isServer == 1, and all RPC server stubs record their rxstats with isServer == 0. Change-Id: I2420f807e2c18ddfb9de7093a487825fa2d0a68e Reviewed-on: https://gerrit.openafs.org/14374 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 12:24:37 -04:00
Marcio Barbosa	f18b58f822	volser: Close dirp on error in ConvertROtoRW Currently, if SAFSVolConvertROtoRWvolume cannot create a new transaction for the volume to be converted, it returns without closing the directory stream opened by it. To prevent this leak, go through a new 'goto done' destructor if NewTrans fails. Change-Id: Ie0580e7739ae667f1cd2f9cabb8aaf5e15d3f2dd Reviewed-on: https://gerrit.openafs.org/14342 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 12:16:07 -04:00
Michael Meffie	47d809d443	bozo: Log each dir and file with bad access rights The bosserver directory and file access check stops after finding one directory or file with incorrect permissions or owner. A log message is written for this first one found, but more than one directory or file may have incorrect access rights. Instead check all of them so the bosserver logs a warning message for each incorrect director or file permission found. This should make it easier to fix all of the file permission problems at once. Change-Id: Ia3f14800ce036aa390929109a286cf21828e8a35 Reviewed-on: https://gerrit.openafs.org/14330 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 12:11:43 -04:00
Michael Meffie	a6b14ea902	bozo: Add KeyFileExt and rxkad.keytab to access rights check When the KeyFileExt and rxkad.keytab were added to OpenAFS, they were not added to the bosserver's access rights check. Add these files to the bosserver access checks, with the same access rights needed for the original KeyFile. Also, add the full path for KeyFileExt to the dirpath package (not just the filename), which was not done when the KeyFileExt was introduced. This is needed to perform the access checks. Change-Id: I8c9028e846fad9f15823baeb7cc15a8f80ed5c1c Reviewed-on: https://gerrit.openafs.org/14329 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 12:11:26 -04:00
Mark Vitale	e17bc8ce86	afs: remove vestigial externs for afs_xvcache These have not been needed since src/afs/afs_prototypes.h gained 'extern afs_rwlock_t afs_xvcache' with commit 8f2df21ffe59e9aa66219bf24656775b584c122d "pull-prototypes-to-head-20020821" Remove the vestigial extern references. Change-Id: Id6aceff0d5df1f1bed210a3fbf2951c62f35ddbb Reviewed-on: https://gerrit.openafs.org/14406 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 11:46:19 -04:00
Mark Vitale	a3fc79633f	afs: remove vestigial externs for afs_xcbhash Commit 64cc7f0ca7a44bb214396c829268a541ab286c69 "afs: Create afs_StaleVCache" consolidated many references to afs_xcbhash into a new function afs_StaleVCache. However, this left many references to 'extern afs_wrlock_t afs_xcbhash' that are no longer needed. But actually, many of these have not been needed since src/afs/afs_prototypes.h gained 'extern afs_rwlock_t afs_xcbhash' with commit 8f2df21ffe59e9aa66219bf24656775b584c122d "pull-prototypes-to-head-20020821" Remove the vestigial extern references. No functional change is incurred by this commit. Change-Id: Ie6cfb6d90c52951795378d3b42e041567d207305 Reviewed-on: https://gerrit.openafs.org/14405 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 11:43:12 -04:00
Mark Vitale	4e85324729	xstat: prevent CPU loop when -period 0 Historically xstat_cm_test and xstat_fs_test have supported option '-period <mm>' to specify continuous operaiton for a length of time. If '-period 0' was specified, both programs exited immediately. Beginning with commits 2c1a7e47336c8f8d14dd6c65d53925a9e0e87c66 'xstat: add xstat_*_Wait functions' and 6b67cac432043a43d7cdfa6af972ab54412aff94 'convert xstat and friends to pthreads', xstat_cm_test and xstat_fs_test now support -period 0 to run "forever". This support is implemented in xstat_cm_Wait and xstat_fs_Wait, respectively. Although the "wait forever" logic was added to allow consolidation of similar code in afsmonitor, it also changed how xstat_cm_test and xstat_fs_test behave for '-period 0'. Unfortunately, there is a bug in this support, at least when running on pthreads. After the initial 24 minute timer expires, the while (1) will repeatedly run select with a timeout that is now 0. This causes the while loop to consume 100% of the CPU on which this thread is dispatched. Instead, modify the wait-forever logic to specify NULL for the select() timeout value. Also update the man page to document that '-period 0' means forever. Change-Id: I25d0d5be0eedb8bf3de495785b9b03a3e3d45221 Reviewed-on: https://gerrit.openafs.org/14366 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 11:41:30 -04:00
Andrew Deason	74f46e0912	afs: Return to userspace after AFS_NEW_BKG reqs Currently, for AFS_NEW_BKG, background daemons run in the context of a normal user process (afsd), in order to return to run userspace-handled background ops. For non-AFS_NEW_BKG when AFS_DAEMONOP_ENV is defined, background daemons run as kernel threads instead, and have no corresponding userspace process. On LINUX, whether or not we run as a kernel thread has some odd side-effects: at least one example of this is how open file handles (struct file) are treated when closed. When the last reference to a struct file is closed, the final free is deferred to an asynchronous call to be executed "later", in order to avoid issues with lock inversion. For kernel threads, "later" means the work is schedule on the global system work queue (schedule_work()), but for userspace processes, it is scheduled on the task work queue (task_work_add()), which is run around when the thread returns to userspace. For background daemons, we never return from the relevant syscall until we get a userspace background request (or the client is shutting down), Commit ca472e66 (LINUX: Turn on AFS_NEW_BKG) changed LINUX to use AFS_NEW_BKG background daemons, so background requests now run as a normal userspace process, and last-reference file closes are deferred. Since we may never return to userspace, this means that our file handles (used for accessing the disk cache) may never get freed, leading to an unbounded number of file handles remaining open. This can be seen by seeing the first value in /proc/sys/fs/file-nr growing without bound (possibly slowly), as accessing /afs causes background requests. Eventually the number of open files can exceed the /proc/sys/fs/file-max limit, causing further file opens to fail, causing various other problems and potentially panics. To avoid this issue, define a new userspace background op, called AFS_USPC_NOOP, which gets returned to the afsd background daemon process. When afsd sees this, it just does nothing and calls the AFSOP_BKG_HANDLER syscall again, to go into the background daemon loop again. In afs_BackgroundDaemon, we return the AFS_USPC_NOOP op whenever there are no pending background requests, or if we've run 100 background requests in a row without a break. This causes us to return to userspace periodically, preventing any such task-queued work from building up indefinitely. Do this for all platforms (currently just LINUX and DARWIN), in order to simplify the code, and possibly avoid other similar issues, since staying inside a syscall for so long while doing real work is arguably weird. Add a documentation comment block for afs_BackgroundDaemon while we're here. Thanks to mvitale@sinenomine.net for discovering the file leak. Change-Id: I1953d73b2142d4128b064f8c5f95a5858d7df131 Reviewed-on: https://gerrit.openafs.org/13984 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-10-23 11:16:47 -04:00

1 2 3 4 5 ...

13420 Commits