openafs

mirror of https://git.openafs.org/openafs.git synced 2025-01-31 05:27:44 +00:00

Author	SHA1	Message	Date
Andrew Deason	cd35aa9e2a	afs: Fix XBSD check for VNOVAL va_uid Commit e86eb73e (obsd-vattrs-20040125) introduced an XBSD-specific check to detect some unchanged attributes. But the #ifdef for XBSD for the va_uid section was added in the middle of an HPUX-specific block by mistake. Move this #ifdef one level higher, so it's actually used on BSD platforms. Change-Id: I606f87f21d6c4830ed8bcf50abd6fb5807868ff5 Reviewed-on: https://gerrit.openafs.org/14473 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Tim Creech <tcreech@tcreech.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-29 20:40:42 -05:00
Mark Vitale	3f9a08db86	rx: Avoid new server calls for non-DATA packets Normally, a client starts a new Rx call by sending DATA packets for that call to a server, and rxi_ReceiveServerCall on the server creates a new call struct for that call (since we don't recognize it as an existing call). Under certain circumstances, it's possible for a server to see a non-DATA packet as the first packet for a call, and currently rxi_ReceiveServerCall will create a new server call for any packet type. The call cannot actually proceed until the server receives data from the client (and goes through the challenge/response auth handshake, if needed), but usually this is harmless, since the existence of any packets for a particular call channel indicates that the client is trying to run such a call. The server will respond to the client with ACKs to indicate that it is missing the needed DATA packet(s), and the client will send them and the call can proceed. However, if a call is in the middle of running when the server is restarted, the client may be sending ACKs for a pre-existing call that the server doesn't know about. In this case, the server generates ACKs that indicate the server has not received any DATA packets, which may appear to violate the protocol, depending on the prior state of the call (e.g. the server appears to try to move the window backwards). Clients should be able to detect this and kill the call, but many do not. For many OpenAFS releases before commit 7b204946 (rx: Avoid lastReceiveTime update for invalid ACKs), the client will get confused in this situation and will keep the call open forever, never making progress. There isn't any benefit to creating a new server call in these situations, so just ignore non-DATA packets for unrecognized calls, to avoid stalled calls from such clients. Those clients will not get a response from the server, and so the call will eventually die from the normal Rx call timeout.
Change-Id: I565625ba8b6901f9b745124a8816a9ba816c0264 Reviewed-on: https://gerrit.openafs.org/13758 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-29 15:11:23 -05:00
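The decision described in commit 3f9a08db86 can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the enum names, `struct call`, and `server_dispatch` are hypothetical and are not the identifiers used in the Rx sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet types for this sketch (not the Rx constants). */
enum pkt_type { PKT_DATA, PKT_ACK, PKT_ABORT };

/* What the server does with an incoming packet. */
enum pkt_action { USE_EXISTING_CALL, CREATE_SERVER_CALL, IGNORE_PACKET };

/* Hypothetical stand-in for a call structure. */
struct call { unsigned int number; };

/*
 * Decide how to treat a packet. 'existing' is the call already known for
 * this call channel, or NULL if the server does not recognize the call.
 */
enum pkt_action server_dispatch(const struct call *existing, enum pkt_type type)
{
    if (existing != NULL)
        return USE_EXISTING_CALL;
    if (type == PKT_DATA)
        return CREATE_SERVER_CALL;   /* a client is starting a new call */
    return IGNORE_PACKET;            /* e.g. a stray ACK after a server restart */
}

/* Usage: a restarted server that sees an ACK for a call it never knew. */
void dispatch_demo(void)
{
    assert(server_dispatch(NULL, PKT_ACK) == IGNORE_PACKET);
}
```

Because an ignored packet produces no reply, the client side is left to time the call out on its own, which matches the behavior the commit message describes.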
Andrew Deason	7b20494601	rx: Avoid lastReceiveTime update for invalid ACKs Currently, we ignore ACK packets in a few cases: - If the ACK appears to move the window backwards (if firstPacket is smaller than call->tfirst). - If the ACK appears to have been received out of order (if previousPacket is smaller than call->tprev). - If the ACK packet appears truncated. In all of these cases, we ignore the ACK packet completely in our ACK processing code (rxi_ReceiveAckPacket), but we still process the packet at higher levels (rxi_ReceivePacket). Notably, this means we update call->lastReceiveTime after rxi_ReceiveAckPacket returns, even for ACK packets we haven't really looked at. Normally this does not cause any noticeable problems, because such packets should either never be encountered, or only consist of a small number of packets that are mixed in with valid packets. However, if our peer is a server, and it is restarted in the middle of a call, our peer may exclusively send us packets that fall into the above categories. (This does not happen if our peer is a client, because clients just ignore packets for calls they do not recognize.) For example: Consider a call where a client is sending data to a server, and the server restarts after the client has sent a DATA packet with sequence number 1000. The server may then start responding to the client with ACKs with firstPacket set to 1, since the restarted server has no knowledge of the call's state. In this case, a firstPacket of 1 is well below where our window was, so all of the ACKs from the server are ignored. But we keep updating call->lastReceiveTime for all of these packets, and so the call stays alive forever until an idle-dead or hard-dead timeout activates (if any are set). As another example, consider the case where a client is sending data to a server, and the server receives a full window of packets (say, 16 packets), has not yet passed any data to the application, and the server restarts.
The restarted server then starts responding to the client with ACKs with firstPacket set to 1, and previousPacket set to 0. We also ignore all of the ACKs from the server in this case, because even though firstPacket looks sane, it looks like previousPacket has gone backwards. We still update call->lastReceiveTime for each ignored ACK we get, keeping the call alive. Before commit 4e71409f (Rx: Reject out of order ACK packets) was introduced in 1.6.0, neither of these issues could occur. That commit introduced the issue specifically if previousPacket goes backwards; that is, if the server restarts before firstPacket moves forwards. Commit 8d359e6d (rx: Remove duplicate out of order ACK check) in 1.8.0 introduced the issue when 'firstPacket' goes backwards, since previously the FIRSTACKOFFSET-based check caused us to ignore those packets without updating call->lastReceiveTime. That is, if the server restarts after firstPacket moves forwards. In this commit, we still ignore packets in the above cases, but we also avoid updating lastReceiveTime when we ignore such packets, to make sure that we do not keep a call alive solely from receiving these invalid packets. Alternatively, we could change our logic to immediately abort calls where firstPacket moves backwards (since this violates the Rx protocol), or to not ignore some packets where previousPacket goes backwards (since these calls may be recoverable). And we could also skip updating lastReceiveTime for invalid packets of other types. But for now, this commit just avoids updating lastReceiveTime for invalid ACK packets, in order to just try to restore our behavior before 1.6.0, while still retaining the benefits of ignoring out-of-order ACKs. Further changes in this area can potentially be handled separately by future commits.
Also increment the spuriousPacketsRead counter for packets that we ignore in this way (which we used to do for some packets before commit 8d359e6d), so we are not entirely silent about ignoring them. Written in collaboration with mvitale@sinenomine.net. Change-Id: Ibf11bcb2417d481ab80cf4104f2862d1d6502bf4 Reviewed-on: https://gerrit.openafs.org/13875 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-18 13:18:48 -05:00
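The three "ignore" cases from commit 7b20494601, and the rule that an ignored ACK must not refresh the receive timestamp, can be sketched as follows. The struct layouts and function names here are assumptions made for illustration; only the field names tfirst, tprev, firstPacket, previousPacket and lastReceiveTime come from the commit message, and the real checks in Rx carry more nuance than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the sender-side call state. */
struct call_state {
    uint32_t tfirst;          /* first packet in our transmit window */
    uint32_t tprev;           /* last previousPacket value we accepted */
    long lastReceiveTime;     /* refreshed only by packets we accept */
    unsigned int spurious;    /* count of ACKs we ignored */
};

/* Hypothetical, simplified view of a received ACK. */
struct ack_info {
    uint32_t firstPacket;
    uint32_t previousPacket;
    int truncated;            /* nonzero if the packet was cut short */
};

/* Return 1 if the ACK should be processed, 0 if it should be ignored. */
int ack_looks_valid(const struct call_state *c, const struct ack_info *a)
{
    if (a->truncated)
        return 0;                         /* truncated ACK */
    if (a->firstPacket < c->tfirst)
        return 0;                         /* window appears to move backwards */
    if (a->previousPacket < c->tprev)
        return 0;                         /* appears to be out of order */
    return 1;
}

/* Process one ACK; returns 1 if accepted, 0 if ignored. */
int receive_ack(struct call_state *c, const struct ack_info *a, long now)
{
    assert(c != NULL && a != NULL);
    if (!ack_looks_valid(c, a)) {
        c->spurious++;                    /* do not stay silent about it */
        return 0;                         /* lastReceiveTime is left alone */
    }
    c->tfirst = a->firstPacket;
    c->tprev = a->previousPacket;
    c->lastReceiveTime = now;
    return 1;
}
```

With a call whose window has advanced to packet 1001, an ACK carrying firstPacket 1 from a restarted server is counted as spurious and leaves lastReceiveTime untouched, so the normal call timeout can still fire.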
Andrew Deason	f6490629e1	rx: Introduce ack_is_valid Take some of our existing logic for ignoring invalid ACK packets and split it out into a separate function, ack_is_valid. This just makes it easier to add more complex logic in here and write longer comments explaining the decisions. Note that the bug mentioned regarding the previousPacket field was introduced in IBM AFS 3.5, and was fixed in OpenAFS in commit bbf92017 (rx: rxi_ReceiveDataPacket do not set rprev on drop), included in OpenAFS 1.6.23. This commit incurs no functional change; it is just code reorganization. Change-Id: Idd569c6bc0c475e700935cf86780a04ab24102f4 Reviewed-on: https://gerrit.openafs.org/13874 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-18 11:59:25 -05:00
Andrew Deason	5c92346945	rx: For AFS_RXERRQ_ENV, retry sendmsg on error When AFS_RXERRQ_ENV is defined, we currently end up doing something like this for our sendmsg abstractions: if (sendmsg(...) < 0) { while (rxi_HandleSocketError(sock)) ; return error; } return success; This means that when sendmsg() returns an error, we process the socket error queue before returning an error. The problem with this is that when we receive an ICMP error on our socket, it creates a pending socket error that is returned for any operation on the socket. So, if we receive an ICMP error after trying to contact any peer, sendmsg() could return an error when trying to send for any other peer. Even though there is no issue preventing us from sending the packet, we'll fail to actually send the packet because sendmsg() returned an error. This effectively causes an extra outgoing packet drop, possibly delaying the related RPC. To avoid this, change Rx to retry the sendmsg call when it returns an error, since the error may be due to an unrelated ICMP error. To avoid needing to implement this retry loop in multiple places, move around our sendmsg code for AFS_RXERRQ_ENV, so that the higher-level function rxi_NetSend performs the retry and checks for socket errors (instead of the lower-level rxi_Sendmsg or osi_NetSend). Also change our functions to process socket errors to be more consistent between kernel and userspace: now we always have rxi_HandleSocketErrors, which runs a loop around the platform-specific osi_HandleSocketError. With this commit, osi_HandleSocketError is now required to be implemented when AFS_RXERRQ_ENV is defined. We hadn't been implementing this for UKERNEL, so just turn off AFS_RXERRQ_ENV for UKERNEL. Thanks to mbarbosa@sinenomine.net for discovering and providing information about the relevant issue. 
Change-Id: Iccceddcd2d28992ed7a00dc308816a0cb1a0195f Reviewed-on: https://gerrit.openafs.org/14424 Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-12-18 11:54:51 -05:00
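The retry idea in commit 5c92346945 can be sketched with a fake socket so that it runs without any networking. Everything below is hypothetical: `struct fake_sock`, the helper names, and the `max_tries` cap are assumptions for illustration, and the commit message does not say how many retries rxi_NetSend performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket with an error queue. */
struct fake_sock {
    int pending_errors;   /* queued errors, e.g. from unrelated ICMP messages */
    int attempts;         /* how many times a send was attempted */
};

/* Fails while any error is pending, like a send on a socket with a pending error. */
int fake_send(struct fake_sock *s)
{
    s->attempts++;
    return s->pending_errors > 0 ? -1 : 0;
}

/* Handle one queued error; returns 1 if one was handled, 0 if the queue is empty. */
int fake_handle_one_error(struct fake_sock *s)
{
    if (s->pending_errors == 0)
        return 0;
    s->pending_errors--;
    return 1;
}

/*
 * Send, and on failure drain the error queue and try again, since the
 * failure may have been caused by an error that belongs to another peer.
 */
int net_send_with_retry(struct fake_sock *s, int max_tries)
{
    int code = -1;

    assert(s != NULL && max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        code = fake_send(s);
        if (code == 0)
            return 0;
        while (fake_handle_one_error(s))
            ;   /* process every queued error before retrying */
    }
    return code;
}
```

Placing the loop in one higher-level function, as the commit describes, keeps the platform-specific send and error-handling routines free of retry logic.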
Andrew Deason	eff7fa4b2e	rx: Save errno in pthread rxi_Sendmsg Currently, our pthread version of rxi_Sendmsg uses 'errno' in some logic if sendmsg fails, but we do so after calling functions that might alter errno (e.g. fflush). To make sure we get the correct errno value, save the value of errno right after sendmsg returns an error. Reorganize this function a bit to help make the logic easier to follow. Change-Id: I6bf284bd75edb5404bb6771bb99a9381b0f8654d Reviewed-on: https://gerrit.openafs.org/14423 Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-18 11:53:57 -05:00
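The errno pitfall fixed in commit eff7fa4b2e is easy to reproduce in isolation. In this sketch, `failing_send` and `send_and_report` are hypothetical names, and errno is reset by hand after fflush to stand in for a library call that happens to overwrite it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical send routine that always fails with ECONNREFUSED. */
int failing_send(void)
{
    errno = ECONNREFUSED;
    return -1;
}

/* Returns 0 on success, or -errno as it was right after the failed send. */
int send_and_report(int (*do_send)(void))
{
    assert(do_send != NULL);
    if (do_send() < 0) {
        int saved_errno = errno;   /* capture before calling anything else */

        fflush(stdout);            /* may change errno on some platforms */
        errno = 0;                 /* forced here so the clobbering is visible */
        return -saved_errno;
    }
    return 0;
}
```

Had the function returned -errno after the fflush call instead of using the saved copy, this sketch would report 0 for a failed send.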
Michael Meffie	2ad9190b83	afsio: readdir/fidreaddir commands Add the readdir/fidreaddir sub-commands to afsio to dump AFS3 directory objects. This command dumps the raw directory object to stdout. Pipe the output to a program, such as the afsdump_dirlist program (from the CMU dumpscan tool kit), to parse the directory object. Example usage: afsio readdir -dir /afs/mycell/mypath/somedir | afsdump_dirlist Change-Id: Ief181b432cdea6a11bbe61e781686ade2795faad Reviewed-on: https://gerrit.openafs.org/12381 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-18 10:59:09 -05:00
Mark Vitale	1efa4e49f2	vol: always build vol-bless utility In order to avoid future bit-rot, always build vol-bless. Also add it to the clean rule. However, continue to leave it undistributed and uninstalled by default. Change-Id: I3d2dc94c28a7feeb20167223655e97538e807ce6 Reviewed-on: https://gerrit.openafs.org/14464 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-18 10:44:52 -05:00
Benjamin Kaduk	4a45219eb7	Fix spelling of struct rx_ackPacket in comment A comment in rx_packet.h referred to the size of struct rx_ackpacket, but the actual structure is spelled with a majuscule 'P'. Change-Id: Iaf57f098b2e818fe0d492a89347a0a14bc3eb392 Reviewed-on: https://gerrit.openafs.org/14468 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-13 21:21:22 -05:00
Andrew Deason	7239565b0f	rx: Reorganize LWP rxi_Sendmsg to use 'goto error' Our LWP version of rxi_Sendmsg can allocate an fd_set, but we don't free the fd_set if sendmsg() returns certain errors afterwards. To make sure we go through the same cleanup code for the different possible error code paths, reorganize the function to go through a 'goto error'-style destructor. This also makes our return codes a bit more consistent; we should always return -errno now for errors. Change-Id: I5eaeb7f4ea1d76acc3bd9c52dc258f53f59f631e Reviewed-on: https://gerrit.openafs.org/14422 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-10 23:48:06 -05:00
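The single-exit cleanup style that commit 7239565b0f describes looks roughly like the sketch below. The function and helper names are hypothetical, a plain malloc'd buffer stands in for the fd_set, and the label is called `out` here because the success path also passes through it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical steps: one succeeds, one fails with EAGAIN. */
int ok_step(void *scratch)  { (void)scratch; return 0; }
int bad_step(void *scratch) { (void)scratch; errno = EAGAIN; return -1; }

/*
 * Allocate a scratch buffer, run a step that may fail, and release the
 * buffer on every path. Returns 0 on success or -errno on failure.
 */
int send_with_scratch(int (*step)(void *scratch), size_t scratch_len)
{
    void *scratch = NULL;
    int code = 0;

    assert(step != NULL);

    scratch = malloc(scratch_len);
    if (scratch == NULL) {
        code = -ENOMEM;
        goto out;
    }

    if (step(scratch) < 0) {
        code = -errno;      /* consistent return convention: always -errno */
        goto out;
    }

 out:
    free(scratch);          /* free(NULL) is a no-op, so this is always safe */
    return code;
}
```

Routing every exit through one label means a newly added failure branch cannot forget the cleanup.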
Andrew Deason	01c10fe8a9	audit: Add missing AUD_TSTT case In commit 9ebff4c6 (OPENAFS-SA-2018-001 audit: support butc types), several new butc-related audit data types were added. In the AIX-specific audmakebuf() function, the case for the AUD_TSTT type is missing the actual "case" clause in the code, causing AUD_TSTT types to be treated as invalid (and so falling through to the "AFS_Aud_EINVAL" case). Add the "case" for AUD_TSTT, so it's treated properly on AIX. Note that the non-AIX printbuf() already handled this properly, so no changes are needed there. Change-Id: Ic46c18b503bacb0901ff0a60534f6c45ce3c9a75 Reviewed-on: https://gerrit.openafs.org/14466 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-12-10 16:48:35 -05:00
Mark Vitale	986ee6a0a7	vol: add vol-bless to .gitignore No functional change is incurred by this commit. Change-Id: If84ba946d43d67eb6c253462f5826f9a45a2df46 Reviewed-on: https://gerrit.openafs.org/14463 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-10 14:33:28 -05:00
Mark Vitale	e1f20287a4	vol: make vol-bless buildable again The vol-bless utility is not built by default and so is subject to bit-rot. Thus commit 170dbb3ce301329ff127bb23fb588db31439ae8d 'rx: Use opr queues' overlooked vol-bless.c when adding includes for users of struct rx_queue. Add the required #include <rx/rx_queue.h> so vol-bless builds again. Note to maintainers: this change is only required for 1.8.x and later; vol-bless builds fine in 1.6.x and earlier releases. Change-Id: Ia0bb78e3e7dd74b2f65ac07707aced2c81aaa5d9 Reviewed-on: https://gerrit.openafs.org/14462 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-10 14:27:05 -05:00
Mark Vitale	23bd776b01	afs: consolidate duplicated wait-for-cache-drain code Consolidate duplicated logic into a new routine afs_MaybeWaitForCacheDrain(). Change-Id: I2e23b86eeaabe3bc559e3ddca5c1e03082af6a3f Reviewed-on: https://gerrit.openafs.org/13278 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-12-06 17:38:01 -05:00
Michael Meffie	25792e2463	afs: more cache truncation stats Add counters for cache too full and waiting to drain occurrences. These will be used in later commits to indicate how often the cache truncation is required and how often the cache manager is waiting for cache truncation to complete. Change-Id: I4aa802729f0910dff1fb3e90b2d44d36df8bf8f3 Reviewed-on: https://gerrit.openafs.org/13168 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-12-06 17:37:44 -05:00
Cheyenne Wills	611507d8b5	kauth: Add support for updated audit facility New functionality was added to the audit facility that allows multiple audit logs. The updated audit interfaces require a specific calling sequence even if multiple audit logs are not used. Multiple audit logs are not supported for kauth: kauth does not use libcmd for processing the command line, so adding support for multiple audit log instances would require additional effort that is not warranted. Update kauth to follow the proper calling sequences for the audit facility. Update help message and manpage entries for -auditlog and -audit-interface. Make note that multiple -auditlogs are not supported. Change-Id: I98111b1e399e6687fde235bc2eadf0a28fa8acf4 Reviewed-on: https://gerrit.openafs.org/13782 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-04 19:08:59 -05:00
Cheyenne Wills	5069c697c7	Add command line support for multiple audit logs Gerrits #13774 (audit: Support multiple audit interfaces and interface options) and #13775 (audit: Add cmd helper for processing audit options) added support in the audit facility for multiple audit logs. Add command line support to use multiple audit logs for daemons that use libcmd for command line processing: bosserver, buserver, butc, fileserver, volserver, ptserver, and vlserver. Update the daemons to add a call to audit_open, and where possible add a call to audit_close when shutting down the daemon. Update help message and manpage entries for -auditlog and -audit-interface Change-Id: I4356e1aa84f580897a0e788e2a2829685be891aa Reviewed-on: https://gerrit.openafs.org/13776 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-12-04 19:04:12 -05:00
Cheyenne Wills	3e204354f5	audit: Add cmd helper for processing audit options osi_audit_cmd_Options will handle the processing for the -audit-interface and -auditlog command line options. The -auditlog / -audit-interface options are used by several services; this new helper routine provides a simple method to process the audit related command line options in a consistent fashion. Change-Id: I5acd12062dbfec23c1cbb0b2cdfc2d224354eed9 Reviewed-on: https://gerrit.openafs.org/13775 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-29 11:58:43 -05:00
Cheyenne Wills	52da4b9488	audit: Support multiple audit interfaces and interface options Currently, the audit subsystem only allows for one audit log to exist for the entire process. This can make it cumbersome to use for sites that have multiple tools or destinations that want to read the audit data. For example, to feed the audit data to two separate scripts, one script needs to read the data, and retransmit the data to the second script. To make such a setup easier, change the audit system to allow for multiple audit logs to exist at once. To allow callers to associate each audit log with an interface, we change the syntax for the value to the -auditlog parameter to the following: [interface:]filespec[:options] For example: -auditlog sysvmq:/tmp/msgqueue To accommodate the existing -audit-interface parameter, change the behavior of -audit-interface so that it sets the default audit interface if none is specified for -auditlog. This allows existing users of -audit-interface to experience the same behavior as before. In order to implement this, change the audit API and all existing audit interfaces to avoid using per-interface globals, and instead allocate per-instance contexts during startup. Also change the code so the audit message is constructed inside audit.c, instead of via a per-interface callback, which eliminates the duplicated logic in each interface's append_msg(), and lets us avoid holding 'audit_lock' during message construction. While we're changing the audit API, also introduce a few new operations: open_interface, close_interface and set_options. This commit and the existing interfaces do not make use of these new functions, but future commits will do so. This commit also only changes the audit subsystem itself to be able to handle multiple audit logs, and doesn't change any command-line parsing logic. Future commits will add the command-line parsing logic changes required so daemons can actually configure multiple interfaces. 
Thanks to Andrew Deason (adeason@sinenomine.net) for providing the changes needed to reduce holding the 'audit_lock' and improve performance as well as providing input during the development of this change. Change-Id: I1311ea417fdd0ba38d2206083cd65bd7a054d017 Reviewed-on: https://gerrit.openafs.org/13774 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-29 11:52:12 -05:00
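The `[interface:]filespec[:options]` syntax from commit 52da4b9488 can be illustrated with a small parser. This is a sketch under assumptions and not the parsing code in audit.c: the struct, the function names, and the list of recognized interface names are hypothetical ("sysvmq" appears in the commit message; "file" is assumed here), and this sketch treats a leading token as an interface only if it is in that list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one -auditlog value. */
struct audit_spec {
    char iface[32];
    char filespec[256];
    char options[128];
};

/* Assumed set of interface names for this sketch. */
static const char *known_ifaces[] = { "file", "sysvmq" };

static int is_known_iface(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(known_ifaces) / sizeof(known_ifaces[0]); i++) {
        if (strlen(known_ifaces[i]) == len && strncmp(known_ifaces[i], s, len) == 0)
            return 1;
    }
    return 0;
}

/*
 * Split "[interface:]filespec[:options]". When no interface prefix is
 * present, fall back to default_iface (the role -audit-interface plays).
 * Returns 0 on success, -1 if the filespec is empty.
 */
int parse_auditlog(const char *arg, const char *default_iface, struct audit_spec *out)
{
    const char *colon;

    assert(arg != NULL && default_iface != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    colon = strchr(arg, ':');
    if (colon != NULL && is_known_iface(arg, (size_t)(colon - arg))) {
        snprintf(out->iface, sizeof(out->iface), "%.*s", (int)(colon - arg), arg);
        arg = colon + 1;
    } else {
        snprintf(out->iface, sizeof(out->iface), "%s", default_iface);
    }

    colon = strchr(arg, ':');
    if (colon != NULL) {
        snprintf(out->filespec, sizeof(out->filespec), "%.*s", (int)(colon - arg), arg);
        snprintf(out->options, sizeof(out->options), "%s", colon + 1);
    } else {
        snprintf(out->filespec, sizeof(out->filespec), "%s", arg);
    }
    return out->filespec[0] != '\0' ? 0 : -1;
}
```

For the value from the commit message, `sysvmq:/tmp/msgqueue`, this yields the interface `sysvmq` and the filespec `/tmp/msgqueue`; a bare path keeps the default interface, which is how existing -audit-interface users would see unchanged behavior.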
Andrew Deason	78e5e1b0e5	LINUX: Return errors in our d_revalidate In our d_revalidate callback (afs_linux_dentry_revalidate), we currently 'goto bad_dentry' when we encounter any error. This can happen if we can't allocate memory or some other internal errors, or if the relevant afs_lookup call fails just due to plain network errors. For any of these cases, we'll treat the dentry as if it's no longer valid, so we'll return '0' and call d_invalidate() on the dentry. However, the behavior of d_invalidate changed, as mentioned in commit afbc199f1 (LINUX: Avoid d_invalidate() during afs_ShakeLooseVCaches()). After a certain point in the Linux kernel, d_invalidate() will also effectively d_drop() the given dentry, unhashing it. This can cause getcwd() calls to fail with ENOENT for those directories (as mentioned in afbc199f1), and can cause bind-mount calls to fail similarly during a small window. To avoid all of this, when we encounter an error that prevents us from checking if the dentry is valid or not, we need to return an error, instead of saying 'yes' or 'no'. So, change afs_linux_dentry_revalidate to jump to the 'done' label when we encounter such errors, and avoid calling d_drop/d_invalidate in such cases. This also lets us remove the 'lookup_good' variable and consolidate some of the related logic. Important note: in older Linux kernels, d_revalidate cannot return errors; callers just interpreted its return value as either 'valid' (non-zero) or 'not valid' (zero). The treatment of negative values as errors was introduced in Linux commit bcdc5e019d9f525a9f181a7de642d3a9c27c7610, which was included in 2.6.19. This is very old, but technically still above our stated requirements for the Linux kernel, so try to handle this case, by jumping to 'bad_dentry' still for those old kernels. Just do this with a version check, since no configure check can detect this (no function signatures changed), and the only Linux versions that are a concern are quite old. 
Change-Id: Ie530ce08463cf6b6899f056cb76ae4047c989ef2 Reviewed-on: https://gerrit.openafs.org/14417 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-27 13:15:41 -05:00
Michael Meffie	4c33820525	vldb_check: Check for volume lock inconsistencies Verify that a lock timestamp is set if, and only if, a lock volume operation flag is also set. When running vldb_check with the -fix option, fix the inconsistent entries by setting the lock timestamp to the current time if a lock flag is set, or by setting the VLOP_DELETE flag if the lock timestamp is set but no lock flags are set. (The VLOP_DELETE flag is the flag set by the 'vos lock' command, and is shown in vos output as "delete/misc".) Volume lock fields can be put into an inconsistent state, at least, by interrupted vos rename operations, due to bugs in vos rename. When the volume lock timestamp and lock flags are in this inconsistent state, the volume is locked, but that is not indicated by 'vos listvldb'. The volume can be unlocked by issuing 'vos unlock'. Change-Id: Idc4f821a9eb7675edd78a8547fdfe46e838b0c89 Reviewed-on: https://gerrit.openafs.org/14307 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-26 23:08:07 -05:00
Michael Meffie	6779e30d37	vsprocs: Remove dead code Remove the dead code in UV_VolumeMove() commented out with the macro ENABLE_BUGFIX_1165. Remove two commented out lines of code in UV_ConvertRO(). Change-Id: Ic628c74df011b0f09be6b03f72ab1baac5e59caf Reviewed-on: https://gerrit.openafs.org/14004 Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-17 21:24:52 -05:00
Cheyenne Wills	56aa396d83	vos: Cleanup function definitions The functions defined within vos.c are not referenced outside of vos.c but are not declared as static. Convert the functions within vos.c to static declarations. Change-Id: Ia684e698adc53ced964e10ee0496cb52a3af564e Reviewed-on: https://gerrit.openafs.org/14009 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 21:16:34 -05:00
Cheyenne Wills	a3be2c74a9	vos: Remove dead code Clean out dead code from vos.c: GetVolumeType - not referenced anywhere; CompareVLDBEntry - commented out since 1st git commit; osi_audit - Comment indicates this might have been needed at one point. Builds without it. Does not look like the vos executable is pulling in any of the audit code; RestoreVolume - remove stale comment about typo previous to openafs 1.0; RemoveSite - remove commented out partition check. Change-Id: I9c0b59d5c37d403610c7a904717ac9765598fc99 Reviewed-on: https://gerrit.openafs.org/14008 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 21:02:28 -05:00
Marcio Barbosa	45a69b6113	volser: take RO volume offline during convertROtoRW The vos convertROtoRW command converts a RO volume into a RW volume. Unfortunately, the RO volume is not checked out from the fileserver during this process. As a result, accesses to the volume being converted can leave volume objects in an inconsistent state. Moreover, consider the following scenario: 1. Create a volume on host_b and add replicas on host_a and host_b. $ vos create host_b a vol_1 $ vos addsite host_b a vol_1 $ vos addsite host_a a vol_1 2. Mount the volume: $ fs mkmount /afs/.mycell/vol_1 vol_1 $ vos release vol_1 $ vos release root.cell 3. Shut down dafs on host_b: $ bos shutdown host_b dafs 4. Remove RO reference to host_b from the vldb: $ vos remsite host_b a vol_1 5. Attach the RO copy by touching it: $ fs flushall $ ls /afs/mycell/vol_1 6. Convert RO copy to RW: $ vos convertROtoRW host_a a vol_1 Notice that FSYNC_com_VolDone fails silently (FSYNC_BAD_STATE), leaving the volume object for the RO copy set as VOL_STATE_ATTACHED (on success, this volume should be set as VOL_STATE_DELETED). 7. Add replica on host_a: $ vos addsite host_a a vol_1 8. Wait until the "inUse" flag of the RO entry is cleared (or force this to happen by attaching multiple volumes). 9. Release the volume: $ vos release vol_1 Failed to start transaction on volume 536870922 Volume not attached, does not exist, or not on line Error in vos release command. Volume not attached, does not exist, or not on line Notice that this happens because we cannot mark an attached volume as destroyed (FSYNC_com_VolDone). To avoid the problem mentioned above and to prevent accesses to the volume being converted, take the RO volume offline before converting it to RW.
Change-Id: Ifd342e1f420dc42e5da49242a7aa70db7d97a884 Reviewed-on: https://gerrit.openafs.org/14340 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-13 12:33:46 -05:00
Cheyenne Wills	c17c157641	vos: Cleanup indentation whitespace Fix the indentation whitespace in vos.c, and remove double blank lines. No functional change. Change-Id: I97587779d6d2c131b5eac98bbee49efae73fafe9 Reviewed-on: https://gerrit.openafs.org/14007 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 12:13:19 -05:00
Michael Meffie	4bbf1239f8	vos: Return true when GetServerAndPart finds a site Change the GetServerAndPart() function to return true when a volume site in the vldb entry is found. Do not change the output arguments unless the site is found. Also, add a function comment header and fix some comment typos in this function. Change-Id: I10b43054b1bf9e6757ccdc95cb4559ab8b6dc013 Reviewed-on: https://gerrit.openafs.org/14006 Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>	2020-11-13 11:08:10 -05:00
Michael Meffie	ace2f7f5ce	vos: Add missing -partition requires -server checks The `vos remove` command was missing a check for the -server option when the -partition option is given. This command requires the -server option when the -partition is given, as documented in the man page. The `vos syncvldb` command performed the check for the -server option when the -partition option is given, but in the wrong location. As documented, the `vos unlockvldb` command permits the -partition option without a -server option, in which case all of the volumes listed in the VLDB with sites on the specified partition are unlocked. However, this command incorrectly issued an RPC to a volume server at address 0.0.0.0 when only the partition is given. Change-Id: I6b878678e28b34250e63d2d082747f6fd416972d Reviewed-on: https://gerrit.openafs.org/14005 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>	2020-11-13 11:07:08 -05:00
Mark Vitale	de3e7289e2	vos: avoid double release of a volume lock To update a volume entry in the VLDB, vos commands typically lock the volume entry via VL_SetLock, then call VL_UpdateEntryN, then release the lock via VL_ReleaseLock. However, some vos commands exploit the optional lock release flags of VL_UpdateEntryN to combine the update and unlock operations into a single RPC. This approach requires extra care to ensure that VL_ReleaseLock is issued for a failed VL_UpdateEntryN, but NOT for a successful VL_UpdateEntryN. Unfortunately, the following commands have success paths that fall through to the error path, resulting in a double release of the volume lock: - vos convertROtoRW - vos release A second VL_ReleaseLock of a volume entry that has already been unlocked via VL_UpdateEntryN is essentially a harmless no-op (other than negating any benefit of exploiting the VL_UpdateEntryN lock flags). However, if there is a race with another volume operation on the same volume, it is possible for this bug to release the volume lock of a different volume operation. This problem has been present in 'vos release' since OpenAFS 1.0. This problem has been present in 'vos convertROtoRW' since the command's introduction in commit 8af8241e94284522feb77d75aee8ea3deb73f3cc vol-ro-to-rw-tool-20030314. Properly maintain state to avoid unlocking a volume (with VL_ReleaseLock) that has already been unlocked (via VL_UpdateEntryN). Thanks to Andrew Deason for discovering the issue and suggesting the fix. Change-Id: I757b4619b9431d1ca980f755349806993add14a5 Reviewed-on: https://gerrit.openafs.org/14426 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 10:54:46 -05:00
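The state tracking that commit de3e7289e2 describes can be sketched with a fake VLDB entry. All names below are hypothetical stand-ins: `fake_update` plays the role of VL_UpdateEntryN called with its lock release flags, and `fake_release` plays the role of VL_ReleaseLock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VLDB entry and its lock. */
struct fake_vldb {
    int locked;
    int release_calls;   /* number of explicit release RPCs issued */
};

/* Update the entry; on success, optionally release the lock in the same call. */
int fake_update(struct fake_vldb *v, int release_on_success, int fail)
{
    if (fail)
        return -1;
    if (release_on_success)
        v->locked = 0;
    return 0;
}

/* Explicit lock release. */
void fake_release(struct fake_vldb *v)
{
    v->release_calls++;
    v->locked = 0;
}

/* Lock, update, and make sure the lock is released exactly once. */
int update_entry(struct fake_vldb *v, int fail_update)
{
    int islocked = 0;
    int code;

    assert(v != NULL);

    v->locked = 1;       /* stands in for taking the entry lock */
    islocked = 1;

    code = fake_update(v, 1, fail_update);
    if (code == 0)
        islocked = 0;    /* the update already released the lock */

    if (islocked)
        fake_release(v); /* only needed when the update failed */
    return code;
}
```

Clearing the local flag on success is what keeps the shared cleanup path from issuing a second release, which under a race could otherwise drop a lock taken by a different volume operation.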
Mark Vitale	8e1c321dc8	volser: document 'vos restore -readonly' restriction Commit 0c03f8607e15 vos-command-enhancements-20011008 introduced the 'vos restore' -readonly option, which allows the restored volume to be RO instead of the default RW. The commit message documents the following restriction: - ... This option causes the restored volume to be an RO volume. It is not permitted to restore an RO volume when the associated RW volume already exists. While it is possible to restore an RW volume where an RO volume exists, caution should be used to avoid doing this with VLDB entries created by 'vos restore -readonly', since such entries have their ROVOL and RWVOL ID's set to the same thing. Document this restriction in the 'vos restore' man page, and in a code comment. No functional change is incurred by this commit. Change-Id: I34f6c5434b82da538a38a9d219207b33dcf62b17 Reviewed-on: https://gerrit.openafs.org/14348 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 10:35:15 -05:00
Mark Vitale	0bcfe89d68	volser: improve error checking for 'vos restore' UV_RestoreVolume2 calls VLDB_GetEntryByName to obtain information for sanity checking, but only checks for a VL_NOENT error code; other codes are thus ignored, which may lead to confusing results. Add an additional error check for 'vos restore' (and other callers of UV_RestoreVolume2) to stop and issue an error message if a non-VL_NOENT error code is received from VLDB_GetEntryByName. Change-Id: Idf41965fdd84fa282a3397215ec393ae10f72018 Reviewed-on: https://gerrit.openafs.org/14347 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 10:21:56 -05:00
Mark Vitale	a534ee83f2	volser: fix 'cant' typos Correctly spell "can't" in a log message and a comment. Change-Id: I9d5c667d9c5ea3c5b726f958431c497353433239 Reviewed-on: https://gerrit.openafs.org/14346 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-13 10:09:57 -05:00
Mark Vitale	aed4a0c4b9	afs: avoid panic in DNew when afs_WriteDCache fails afs_WriteDCache may fail for an IO error, or if interrupted (EINTR). Unfortunately, DNew will panic in this case, crashing the entire machine. In order to avoid an outage in this case, don't panic. Instead, reflect the error back to the caller of DNew. While here, add Doxygen comments to DNew. Change-Id: I27a8f89bab979c5691dded70e8b9eacbe8aff4fd Reviewed-on: https://gerrit.openafs.org/13804 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 15:51:35 -05:00
Mark Vitale	1c04036b34	afs: remove redundant assignment DRelease has two assignments for tp = entry->buffer; remove the second (redundant) one. Introduced with 0284e65f97861e888d95576f22a93cd681813c39 'dir: Explicitly state buffer locations for data'. No functional change should be incurred by this commit. Change-Id: If4a17862f451973075fa3fa267b5139046d97ede Reviewed-on: https://gerrit.openafs.org/13802 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 15:51:12 -05:00
Mark Vitale	6bd94fe29d	dir: check DNew return code Commit 0284e65f97861e888d95576f22a93cd681813c39 'dir: Explicitly state buffer locations for data' changed DNew and DRead to return a return code. However, the callers of DNew were not modified to check the new return code. (This commit applied only to the implementations dealing with AFS directories, in afs/afs_buffer.c and dir/dir.c. The ubik implementations of DNew and DRead, dealing with ubik databases, were not modified.) Modify all (non-ubik) callers of DNew to check the return code. In addition, modify code as needed so return codes are properly propagated to the callers. While here, add Doxygen comments for AddPage and FindBlobs. Change-Id: Iabde6499745dd351f3fcda73c9f52c440a36490e Reviewed-on: https://gerrit.openafs.org/13801 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-06 14:42:15 -05:00
Andrew Deason	7b0a66f63b	Remove unused xdr types Numerous types and constants are defined in our various RPC-L files that are never used or referenced by anything. Remove them. Change-Id: I0b03be1ce0e186a88f80d2f3f7a66a1e25965ff3 Reviewed-on: https://gerrit.openafs.org/14404 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 14:29:37 -05:00
Benjamin Kaduk	0787a2c8ae	volser: apply static keyword to VolPartitionInfo definition The function declaration was already marked as static; mark the definition as well for consistency (and consistency with the other helpers in this file). Change-Id: I642db1d27efd34ab2a09f7299791c19d07b1f923 Reviewed-on: https://gerrit.openafs.org/13321 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 11:59:17 -05:00
Mark Vitale	dcce956df4	dir: check afs_dir_Create return code in afs_dir_MakeDir afs_dir_MakeDir() ignores the return code from afs_dir_Create() for the '.' and '..' ("dot" and "dotdot") directories. This has been the case from the earliest implementation (MakeDir() calling Create()) in the original IBM import. Instead, check the return codes to prevent the possibility of creating malformed directories. Change-Id: I60179488429dfa9afe60c4862c5e42b41f1e0048 Reviewed-on: https://gerrit.openafs.org/13800 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 11:47:01 -05:00
Benjamin Kaduk	04805f48a2	ptserver: rename NameToID and IDToName helpers These helper function names alias the names of public RPCs and can cause confusion when grepping the code. Rename them in a different style to provide greater hamming distance between the various functions involved in handling these RPCs. Change-Id: I0e2c7997bc145888affdac28716293ff820756c7 Reviewed-on: https://gerrit.openafs.org/13320 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-06 10:16:42 -05:00
Mark Vitale	0639ca8d22	dir: check afs_dir_MakeDir return code in DirSalvage Since the original IBM import, DirSalvage() has ignored the return code from afs_dir_MakeDir() (f.k.a. MakeDir). This has been safe because, as the comment states, afs_dir_MakeDir returns no (non-zero) error code. In preparation for a future commit, add a check for the return from afs_dir_MakeDir and remove the comment. Change-Id: Ibb259a7aaeeb21ef70a7794143a0dadb2a75725d Reviewed-on: https://gerrit.openafs.org/13799 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-06 10:07:04 -05:00
Mark Vitale	735fa5fb09	dir: distinguish logical and physical errors on reads The directory package (src/dir) salvage routines DirOK and DirSalvage check a global variable 'DErrno' to distinguish logical errors (e.g. short read) from physical errors (e.g. EIO). However, since the original IBM import, this logic has not worked correctly because there is no longer any code that sets the value of DErrno - its value is always zero. Instead, modify all implementations of ReallyRead to optionally return the errno for low-level IO errors. Also, create a new userspace-only variant - DReadWithErrno() - of the src/dir/buffer.c version of DRead (the version called by DirOK and DirSalvage, and the only caller of ReallyRead) to return the ReallyRead errno upon request. Also create an analogous variant of afs_dir_GetBlobs, afs_dir_GetBlobsWithErrno(). Finally, convert DirOK and DirSalvage to use the new variants and replace DErrno with equivalent logic. Remove all other references to DErrno. Change-Id: I3de182ce49c1682572142da594af5dc2c00ede74 Reviewed-on: https://gerrit.openafs.org/13798 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-06 10:06:13 -05:00
Andrew Deason	1caeeea43c	afs: Log pid with disk cache read errors Log the current pid (and procname) when we complain about an error when reading from CacheItems in afs_UFSGetDSlot. These errors can result in confusing situations, so it can be helpful to know at least what process saw the error. Our logic for logging this information is getting a bit large, so also move this to a new function, LogCacheError. Change-Id: I3427e736458784df0d516f4182684605e930e128 Reviewed-on: https://gerrit.openafs.org/14416 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Cheyenne Wills <cwills@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-06 01:18:21 -05:00
Cheyenne Wills	98c1a8751c	roken: use strtok_r from roken Windows standard library doesn't provide strtok_r. Use the strtok_r that is provided from roken. Change-Id: I1bccb9a306c9dd1963f044127fb5dfe4da5728cc Reviewed-on: https://gerrit.openafs.org/13891 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-05 23:14:24 -05:00
Heimdal Developers	e5abb34882	Import of code from heimdal This commit updates the code imported from heimdal to 5dfaa0d10b8320293e85387778adcdd043dfc1fe (git2svn-syncpoint-master-311-g5dfaa0d10) New files are: roken/strtok_r.c Change-Id: I27042f614c7d6ce9a95a80d01474e8bf401e4760 Reviewed-on: https://gerrit.openafs.org/13890 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-05 23:14:04 -05:00
Cheyenne Wills	fe6a97b4d8	roken: add strtok_r to the imported file list Import the strtok_r function which is needed by audit for parsing command line options. Change-Id: I8412c5a663dc3315c4146665edb72d9a6b8df5be Reviewed-on: https://gerrit.openafs.org/13889 Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-05 23:13:46 -05:00
Benjamin Kaduk	a912a29a45	Detect realloc failure While reviewing other commits, a call to realloc() was discovered that would leak memory on failure (by virtue of always assigning the realloc() return value to the pointer holding the input address, even when the return value is NULL). Check for failure and return early in that case (giving an incomplete list of events). Change-Id: Ic6e889f1d990bd289812ce4bf8e9cd4ebce488ec Reviewed-on: https://gerrit.openafs.org/13313 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-05 22:49:43 -05:00
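The realloc() pitfall described in the commit above (a912a29a45) is a general C pattern, sketched below under stated assumptions. The `event_list` structure and `event_list_append` function are hypothetical and are not the OpenAFS code touched by that commit. The sketch only shows the usual fix: assign the result to a temporary so the original pointer is not lost when realloc() returns NULL.

```c
#include <stdlib.h>

/* Hypothetical growable list of event IDs (not an OpenAFS structure). */
struct event_list {
    int *items;
    size_t count;
    size_t capacity;
};

/* Append one value. Returns 0 on success, or -1 if the list could not
 * be grown; in that case the existing contents are left untouched. */
int
event_list_append(struct event_list *list, int value)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 4;
        /* Use a temporary: writing the result straight into list->items
         * would overwrite the only pointer to the old block on failure. */
        int *tmp = realloc(list->items, new_cap * sizeof(*tmp));
        if (tmp == NULL)
            return -1;      /* list->items is still valid and owned */
        list->items = tmp;
        list->capacity = new_cap;
    }
    list->items[list->count++] = value;
    return 0;
}
```

On failure the caller keeps a shorter but still valid list, which matches the commit's choice to return early with an incomplete list of events.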
Benjamin Kaduk	7ede3fa17f	ptserver: move IDToName, NameToID to ptprocs.c and make static These two helpers are only used in implementing server-side RPC handlers, and having to track the codeflow across files is unhelpful. Move them into the file where they're used, make them static, and remove the prototypes from ptprototypes.h (which is not an installed header, so there is no API/ABI breakage). Change-Id: I236d17865a296933f41aaee206535d341c3a955d Reviewed-on: https://gerrit.openafs.org/13319 Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-05 22:29:07 -05:00
Benjamin Kaduk	fe4f6638d1	Assign explicit opcodes to butc RPCs This should prevent inadvertent reassignment if additional RPCs are introduced in the future. Change-Id: I5645ca478d2ecef9962f4bde04ab8f9895dd9497 Reviewed-on: https://gerrit.openafs.org/13317 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-05 21:39:31 -05:00
Andrew Deason	fd6add0aca	vlserver: Return VL_DBBAD on unhash failure If we try to delete a vlentry, and the vlentry cannot be found on one of its hash chains, we cannot unhash the vlentry properly and the operation fails with VL_NOENT. This results in the following error messages to the user: $ vos delentry 123456 Could not delete entry for volume 123456 You must specify a RW volume name or ID (the entire VLDB entry will be deleted) VLDB: no such entry Deleted 0 VLDB entries This is confusing, because VL_NOENT can also occur if the user specifies a volume that does not actually exist. This situation is indicative of database corruption, usually because of a ubik transaction that was only half-applied, or because of other ubik bugs in the past. The situation can only really be fixed by repairing the database, so return VL_DBBAD in this case instead, to more clearly indicate that something is wrong with the database, and not a problem with the arguments the caller provided. Change-Id: I6fc275c3ad05c108778f36687227b0a927cca5da Reviewed-on: https://gerrit.openafs.org/13384 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2020-11-02 11:53:43 -05:00
Andrew Deason	878d27c845	vlserver: Add VL_DBBAD error code The VL_ error table currently doesn't have an error code to indicate that an operation cannot succeed because the database is corrupted. There are a few error codes for specific cases of errors that are probably the result of corruption (like VL_IDALREADYHASHED, or VL_EMPTY), but these are only for specific cases and indicate rather low-level internal problems. There are some instances where the real problem preventing an operation from succeeding is that the database is just corrupt or inconsistent in some way, and the administrator must repair the database before it can succeed. And we currently don't have any way of indicating that situation via an error code. So, introduce the VL_DBBAD code, to indicate this situation. Error codes already exist in other tables for similar situations, such as PRDBBAD, and KADATABASEINCONSISTENT. This commit does not use the new error code anywhere; we just introduce it into the VL_ error table, so comerr-using applications will be able to interpret it. Note that the VL_DBBAD error code has been recognized by the AFS Assigned Numbers Registry as recorded in the ticket history of <https://rt.central.org/rt/Ticket/Display.html?id=134817> Change-Id: I8fea356a4e0db907ec8418efe6ef35d547be0a63 Reviewed-on: https://gerrit.openafs.org/13383 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-11-02 11:53:22 -05:00

13494 Commits