RedHat: Retry umount /afs on systemd shutdown

When systemd tries to stop openafs-client.service during system
shutdown, our 'umount /afs' will fail if someone else is accessing
/afs. The openafs-client.service unit is then marked as deactivated
(and failed), and the shutdown sequence proceeds.

After all services have been stopped, systemd-shutdown tries to kill
all remaining processes with SIGTERM and then SIGKILL, waiting
DefaultTimeoutStopSec seconds (default: 90) after each signal for them
to die. If there are unkillable processes running (for example, afsd),
this results in at least a 3-minute delay.

It's hard to make sure there are no processes accessing /afs during
the shutdown sequence, since that could include processes outside of
defined systemd units. Some processes may be shutting down in
parallel with openafs-client.service, so it's a race whether any
processes are still using /afs when we try to umount it.

To avoid the most common cases of this, retry our umount during
openafs-client's ExecStop for $UMOUNT_TIMEOUT seconds (default: 30),
to give other /afs-using processes a chance to go away. Only do this
if the system is shutting down (according to 'systemctl
is-system-running'), so users don't see a long delay running
'systemctl stop openafs-client' during normal system operation.
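
As a rough sketch of how a site might tune this: the helper script sets
its default and then sources /etc/sysconfig/openafs, so a value assigned
in that file should take precedence. The value 60 below is only an
illustrative assumption, not a recommendation.

```shell
# Hypothetical fragment for /etc/sysconfig/openafs.
# Retry 'umount /afs' for up to 60 seconds during system shutdown
# instead of the default 30.
UMOUNT_TIMEOUT=60

# Setting the variable to an empty string instead would skip the
# retry loop, since the helper checks that the value is non-empty:
# UMOUNT_TIMEOUT=
```

Because the helper sleeps 3 seconds between attempts, the timeout is
effectively rounded up to the next multiple of 3.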

Written in collaboration with cwills@sinenomine.net.

Change-Id: I5755dbf4cddf4204ed6836f9f4f21c00133fcb39
Reviewed-on: https://gerrit.openafs.org/15633
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Andrew Deason 2024-04-04 13:20:05 -05:00
parent f951a9bf9b
commit bd2a7530ad

@@ -5,6 +5,8 @@
set -e
UMOUNT_TIMEOUT=30
[ -f /etc/sysconfig/openafs ] && . /etc/sysconfig/openafs
case $1 in
@@ -38,6 +40,28 @@ case $1 in
        else
            echo "Failed to unmount /afs: $?"
        fi
        state=$(systemctl is-system-running || true)
        if [ "$state" = stopping ] && [ x"$UMOUNT_TIMEOUT" != x ] && /bin/mountpoint --quiet /afs ; then
            # If we are shutting down the system, failing to umount /afs
            # can lead to longer delays later as systemd tries to forcibly
            # kill our afsd processes. So retry the umount a few times,
            # just in case other /afs-using processes just need a few
            # seconds to go away.
            echo "For system shutdown, retrying umount /afs for $UMOUNT_TIMEOUT secs"
            interval=3
            for (( i = 0; i < $UMOUNT_TIMEOUT; i += $interval )) ; do
                sleep $interval
                if /bin/umount --verbose /afs ; then
                    exit 0
                fi
                if ! /bin/mountpoint --quiet /afs ; then
                    echo "mountpoint /afs disappeared; bailing out"
                    exit 0
                fi
            done
            echo "Still cannot umount /afs, bailing out"
        fi
        exit 1
        ;;