RedHat: Use KillMode=process for systemd client

Our openafs-client.service systemd unit file contains a deprecated
option, KillMode=none. Using this option results in the following
message with systemd version 246 or later:

    /lib/systemd/system/openafs-client.service:22: Unit configured to
        use KillMode=none. This is unsafe, as it disables systemd's
        process lifecycle management for the service. Please update your
        service to use a safer KillMode=, such as 'mixed' or
        'control-group'. Support for KillMode=none is deprecated and
        will eventually be removed.

Without this option, if someone runs 'systemctl stop openafs-client'
and the client fails to shut down (e.g., because files under /afs are
still in use), systemd will try to kill all of our afsd processes. Our
afsd processes usually either cannot be killed, or will cause unstable
behavior if they are killed (e.g., because AFSDB requests can no
longer be fulfilled).

If systemd cannot kill all of our afsd processes, it will wait for a
timeout before reporting an error. By default, it waits 90 seconds
before sending SIGTERMs, and another 90 seconds before sending
SIGKILLs. This means that by default, if someone is using /afs,
'systemctl stop openafs-client' will hang for 3 minutes (!), even
though we know immediately that we cannot stop the client.

One way to avoid this is using KillMode=none, which skips killing our
processes and waiting for any timeouts. To avoid using a deprecated
option, switch to using KillMode=process.

With KillMode=process, after a failed 'stop', systemd will only try to
kill the 'main' pid run by ExecStart. The 'main' pid is detected by
systemd either automatically by some heuristic (with
GuessMainPID=yes), or by a pid file (when PIDFile= is set). If we
disable GuessMainPID and don't set PIDFile, systemd will not try to
terminate any of our processes on shutdown.

systemd will still try to kill our other remaining processes using
SIGKILL, but we can disable that with SendSIGKILL=no. To be safe, also
specify KillSignal=SIGCONT to make sure systemd doesn't actually
forcibly kill any of our afsd processes.
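
A sketch of how the four settings described above fit together,
annotated with the reasoning from this message. The comments are a
paraphrase and are not part of the unit file; the rest of the
[Service] section is omitted:

```ini
[Service]
# Signal only the 'main' pid on stop, not every process in the unit.
KillMode=process
# Do not let systemd guess a 'main' pid; with no PIDFile= set either,
# there is no 'main' pid for systemd to signal.
GuessMainPID=no
# Do not follow up with SIGKILL on processes that remain after a stop.
SendSIGKILL=no
# If systemd does send a stop signal, make it a harmless one
# instead of the default SIGTERM.
KillSignal=SIGCONT
```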

None of this matters during a successful client shutdown, since then
all of our afsd processes go away after a successful unmount, and
there's nothing to clean up.

Our behavior during a failed 'stop' is still not ideal. After a failed
'stop', systemd will flag the service as "failed (Result: exit-code)".
This is similar to a service that is stopped successfully
(deactivated), and running 'systemctl stop' on it again does nothing.
Running 'systemctl start openafs-client' will try to start the service
again, but this will fail because of the ExecStartPre check that runs
'fs sysname', and the service will still be considered
failed/deactivated. The only way to fix the situation is for an
administrator to run the shutdown sequence manually, unmounting /afs
and removing the kernel module themselves, and then starting the
client again.

That behavior is unfortunate, but seems difficult or impossible to
avoid with a single systemd service.

Change-Id: Ibed2971f72e4cde2cbeaeefc3ac14325ac8f84e1
Reviewed-on: https://gerrit.openafs.org/15613
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Cheyenne Wills 2024-01-22 15:06:56 -07:00 committed by Andrew Deason
parent 2e39bea08f
commit df3b8129ce

@@ -16,7 +16,10 @@ ExecStart=/usr/vice/etc/afsd $AFSD_ARGS
 ExecStop=/bin/umount /afs
 ExecStop=/usr/vice/etc/afsd -shutdown
 ExecStop=/sbin/rmmod openafs
-KillMode=none
+KillMode=process
+GuessMainPID=no
+SendSIGKILL=no
+KillSignal=SIGCONT
 
 [Install]
 WantedBy=multi-user.target remote-fs.target