RedHat: Make client unit start/stop more robust

Our openafs-client.service systemd unit currently has some unfortunate
behaviors:

- If someone runs 'systemctl stop openafs-client' while /afs is in
  use, our umount will fail, and systemd will consider the
  openafs-client unit failed and deactivated. Trying to stop the unit
  again won't do anything, and trying to start the unit will fail
  because of our 'fs sysname' check. The client can then only be
  stopped by manually running umount/rmmod.

- If our kernel module is already initialized (because afsd failed
  during startup, or someone 'umount'd /afs without unloading the
  kernel module), running 'systemctl start openafs-client' will try to
  start afsd with an already-initialized kernel module, which will
  either fail or cause errors/panics.

To improve this situation, change our startup sequence to unload the
kernel module if it's already loaded (and then load it again right
afterwards). This should guarantee that we won't use an
already-initialized kernel module when we run afsd. This also means we
will fail during startup if the kernel module cannot be unloaded for
any reason (for example, if the client is already running but the 'fs
sysname' check somehow didn't detect this).
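
The module check in this startup sequence can be sketched roughly as
follows. This is only an illustration, not the helper script itself:
openafs_module_listed and startup_action are hypothetical names, and
both read lsmod-style text on stdin (an assumption made so the logic
can be exercised without touching real kernel modules).

```shell
# Hypothetical helper: reads lsmod-style output on stdin and succeeds
# only if a line starts with the whole word "openafs".
openafs_module_listed() {
    grep -wq '^openafs'
}

# Hypothetical wrapper: prints which steps the startup path would take.
# The real helper runs rmmod/modprobe directly instead of printing.
startup_action() {
    if openafs_module_listed ; then
        echo "rmmod-then-modprobe"
    else
        echo "modprobe"
    fi
}
```

For example, 'lsmod | startup_action' would print which path applies
on the current machine. Because the match is on the whole word, a
module named openafs_extra would not count as openafs.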

Also change our 'fs sysname' check to return success if the client is
already running, instead of failure. This means that after a failed
'stop', the user can run 'start' and then 'stop' again to retry
stopping the client. Just running 'stop' again still won't do
anything, because systemd already considers the unit inactive; that is
not ideal, but that's just how systemd works.
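
As a rough sketch of that decision (not the helper script itself):
start_decision is a hypothetical name, and the command passed to it is
a stand-in for 'fs sysname', so the logic can be tried without a
running client.

```shell
# Hypothetical sketch: "$@" stands in for the real 'fs sysname' check.
# If the check succeeds, the client is assumed to be running, so the
# startup sequence is skipped and success is reported, which lets the
# unit activate and a later 'stop' run the shutdown sequence again.
start_decision() {
    if "$@" >/dev/null 2>&1 ; then
        echo "AFS client appears to be running -- skipping startup"
        return 0
    fi
    echo "continuing startup"
}
```

Here 'start_decision fs sysname' would exercise the real check, while
'start_decision true' and 'start_decision false' simulate a running
and a stopped client.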

Move our 'afsd -shutdown' and 'rmmod' steps into ExecStopPost, so they
also get run in some additional corner cases for a
partially-initialized service (systemd runs ExecStopPost commands even
when the service failed to start up correctly).

Add --verbose to a few commands, to make it a little clearer what's
happening in what order in systemd logs.

If we cannot unload the openafs kernel module when stopping (because,
for example, we couldn't 'umount /afs' because it was in use), log some
information about how the user can actually get the client stopped.

Change-Id: I78463160a1835137efaeeb0f27bb19c78171e9da
Reviewed-on: https://gerrit.openafs.org/15647
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Andrew Deason 2024-02-12 15:41:36 -06:00
parent ba762b83d4
commit f951a9bf9b
2 changed files with 42 additions and 10 deletions

@@ -8,26 +8,58 @@ set -e
 [ -f /etc/sysconfig/openafs ] && . /etc/sysconfig/openafs
 case $1 in
-    ExecStartPre)
+    ExecStart)
         if fs sysname >/dev/null 2>&1 ; then
-            echo AFS client appears to be running -- not starting
-            exit 1
+            # If we previously tried to stop the client and failed (because
+            # e.g. /afs was in use), our unit will be deactivated but the
+            # client will keep running. So if we're starting up, but the client
+            # is currently running, do not perform the startup sequence but
+            # just return success, to let the unit activate, so stopping the
+            # unit can go through the shutdown sequence again.
+            echo AFS client appears to be running -- skipping startup
+            exit 0
         fi
         sed -n 'w/usr/vice/etc/CellServDB' /usr/vice/etc/CellServDB.local /usr/vice/etc/CellServDB.dist
         chmod 0644 /usr/vice/etc/CellServDB
-        exec /sbin/modprobe openafs
-        ;;
-    ExecStart)
+        # If the kernel module is already initialized from a previous client
+        # run, it must be unloaded and loaded again. So if the module is
+        # currently loaded, unload it in case it was (partly) initialized.
+        if lsmod | grep -wq ^openafs ; then
+            /sbin/rmmod --verbose openafs
+        fi
+        /sbin/modprobe --verbose openafs
         exec /usr/vice/etc/afsd $AFSD_ARGS
         ;;
     ExecStop)
-        /bin/umount /afs || true
+        if /bin/umount --verbose /afs ; then
+            exit 0
+        else
+            echo "Failed to unmount /afs: $?"
+        fi
+        exit 1
+        ;;
+    ExecStopPost)
         /usr/vice/etc/afsd -shutdown || true
-        exec /sbin/rmmod openafs
+        /sbin/rmmod --verbose openafs || true
+        if lsmod | grep -wq ^openafs ; then
+            echo "Cannot unload the OpenAFS client kernel module."
+            echo "systemd will consider the openafs-client.service unit inactive, but the AFS client may still be running."
+            echo "To stop the client, stop all access to /afs, and then either:"
+            echo "stop the client manually:"
+            echo " umount /afs"
+            echo " rmmod openafs"
+            echo "or start and stop the openafs-client.service unit:"
+            echo " systemctl start openafs-client.service"
+            echo " systemctl stop openafs-client.service"
+            echo 'See "journalctl -u openafs-client.service" for details.'
+            exit 1
+        fi
+        exit 0
         ;;
 esac
-echo "Usage: $0 {ExecStartPre|ExecStart|ExecStop}" >&2
+echo "Usage: $0 {ExecStart|ExecStop|ExecStopPost}" >&2
 exit 1

@@ -7,9 +7,9 @@ Before=remote-fs.target
 [Service]
 Type=forking
 RemainAfterExit=true
-ExecStartPre=/usr/vice/etc/openafs-client-systemd-helper.sh ExecStartPre
 ExecStart= /usr/vice/etc/openafs-client-systemd-helper.sh ExecStart
 ExecStop= /usr/vice/etc/openafs-client-systemd-helper.sh ExecStop
+ExecStopPost=/usr/vice/etc/openafs-client-systemd-helper.sh ExecStopPost
 KillMode=process
 GuessMainPID=no
 SendSIGKILL=no