volser: Add simple shutdown signal handler

Currently, the volserver process doesn't register any signal handlers
to perform a shutdown sequence. When the fileserver process group is
shut down, the bosserver sends a SIGTERM to the volserver process, and
the volserver process immediately dies. If the volserver has any volumes
attached at the time (e.g., for dumping or restoring a volume), those
volumes are not cleanly detached, and usually must be salvaged before
they can be used again. This can be confusing to administrators, since
a volume may need salvage even though we never logged a reason why the
volume got into an unclean or broken state.

To improve this situation, add a signal handler to the volserver so it
can go through a shutdown sequence. In the future, we can add a more
complex shutdown process that may interrupt running volume transactions,
wait for transactions to finish, or do something else. For now, as a
first step, add a very simple shutdown process that only logs which
transactions are being interrupted, so we at least give a clue as to
why some volumes were not cleanly detached.

With this commit, the volserver logs its shutdown, including a message
for each transaction that is still running when it is shut down. For
example, a VolserLog may look like this:

    Mon Jan 13 10:11:32 2025 Volserver shutting down on signal 15
    Mon Jan 13 10:11:32 2025 Interrupting transaction 2 for volume 536871057 partition /vicepa; volume may need salvage
    Mon Jan 13 10:11:32 2025 Interrupting transaction 1 for volume 536871052 partition /vicepa; volume may need salvage
    Mon Jan 13 10:11:32 2025 Volserver shutdown complete
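The "Interrupting transaction" lines above come from walking the transaction list while holding the transaction lock. The following is a minimal standalone sketch of that loop, not the volserver code: `struct trans`, `trans_list`, `trans_lock`, `partition_name()` and `report_interrupted()` are hypothetical stand-ins for `struct volser_trans`, `TransList()`, `VTRANS_LOCK` and `volutil_PartitionName2_r()`, and the sketch assumes for simplicity that only partition indices 0-25 (/vicepa through /vicepz) are valid. It writes into a caller buffer instead of calling `Log()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct volser_trans: only the fields the
 * shutdown path reads, which never change after creation. */
struct trans {
    int tid;
    unsigned int volid;
    int partition;
    struct trans *next;
};

/* Hypothetical list head and lock (stand-ins for TransList() and
 * VTRANS_LOCK/VTRANS_UNLOCK). */
static struct trans *trans_list;
static pthread_mutex_t trans_lock = PTHREAD_MUTEX_INITIALIZER;

/* Simplifying assumption: indices 0-25 map to /vicepa../vicepz and
 * anything else is an error. */
static int
partition_name(int index, char *buf, size_t len)
{
    if (index < 0 || index > 25)
        return -1;
    snprintf(buf, len, "/vicep%c", 'a' + index);
    return 0;
}

/* Append one "Interrupting transaction ..." line per transaction to out.
 * Output is truncated (but still NUL-terminated) if outlen is too small.
 * Returns the number of transactions found. */
static int
report_interrupted(char *out, size_t outlen)
{
    struct trans *tt;
    char part[16];
    size_t used = 0;
    int count = 0;

    assert(outlen > 0);
    out[0] = '\0';

    pthread_mutex_lock(&trans_lock);
    for (tt = trans_list; tt != NULL; tt = tt->next) {
        if (partition_name(tt->partition, part, sizeof(part)) != 0)
            snprintf(part, sizeof(part), "[bad index %d]", tt->partition);
        if (used < outlen) {
            int n = snprintf(out + used, outlen - used,
                             "Interrupting transaction %d for volume %u "
                             "partition %s; volume may need salvage\n",
                             tt->tid, tt->volid, part);
            if (n > 0)
                used += (size_t)n;
        }
        count++;
    }
    pthread_mutex_unlock(&trans_lock);
    return count;
}
```

Holding one list-wide lock, without locking each entry, mirrors the reasoning in the diff's comment: the fields being read are immutable once a transaction exists.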

With this commit, the volserver process also exits with code 0 on a
normal shutdown, instead of being terminated by the SIGTERM signal. The
BosLog entry for shutting down a volserver process used to look like
this:

    Mon Jan 13 10:11:32 2025 dafs:vol exited on signal 15

and with this commit, now looks like this:

    Mon Jan 13 10:11:32 2025 dafs:vol exited with code 0
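A supervising process can tell these two cases apart with the standard POSIX `waitpid()` status macros: `WIFEXITED`/`WEXITSTATUS` for a normal exit and `WIFSIGNALED`/`WTERMSIG` for death by signal. The sketch below is not bosserver code; the hypothetical `term_child()` forks a child that either keeps SIGTERM's default action or installs a handler that calls `_exit(0)`, sends it SIGTERM, and formats the result the way the BosLog lines above read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* "Clean shutdown" handler for the child; _exit() is async-signal-safe. */
static void
on_term(int sig)
{
    (void)sig;
    _exit(0);
}

/* Fork a child, send it SIGTERM, and describe how it ended in buf.
 * If handle_term is nonzero the child installs on_term first; otherwise it
 * keeps the default action (terminate). Returns 0, or -1 on failure. */
static int
term_child(int handle_term, char *buf, size_t len)
{
    int fds[2], status;
    char ready;
    pid_t pid;

    assert(buf != NULL && len > 0);
    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        struct sigaction sa;
        sigset_t unblock;

        close(fds[0]);
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = handle_term ? on_term : SIG_DFL;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGTERM, &sa, NULL) != 0)
            _exit(2);
        /* Don't inherit a blocked SIGTERM from the parent. */
        sigemptyset(&unblock);
        sigaddset(&unblock, SIGTERM);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);
        /* Tell the parent the disposition is in place, then wait. */
        if (write(fds[1], "x", 1) != 1)
            _exit(2);
        for (;;)
            pause();
    }

    close(fds[1]);
    if (read(fds[0], &ready, 1) != 1) {
        close(fds[0]);
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }
    close(fds[0]);

    kill(pid, SIGTERM);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFEXITED(status))
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "exited on signal %d", WTERMSIG(status));
    else
        snprintf(buf, len, "unknown status");
    return 0;
}
```

The pipe handshake only exists so the parent never sends SIGTERM before the child has set up its handler.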

This commit only adds the signal handler for the pthreaded volserver;
we don't bother adding a code path for the obsolete LWP volserver.

Change-Id: I9f8321f845d45f6b37d9c69d12d54d1830d68b23
Reviewed-on: https://gerrit.openafs.org/16083
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Andrew Deason 2024-11-14 01:45:57 -06:00 committed by Michael Meffie
parent 4a32840b3a
commit e127edeff6


@@ -112,6 +112,38 @@ MyAfterProc(struct rx_call *acall, afs_int32 code)
     return;
 }
 
+#ifdef AFS_PTHREAD_ENV
+static void
+shutdown_signal(int sig)
+{
+    struct volser_trans *tt;
+    char part[16];
+
+    Log("Volserver shutting down on signal %d\n", sig);
+
+    VTRANS_LOCK;
+    for (tt = TransList(); tt != NULL; tt = tt->next) {
+        /*
+         * We don't need to lock each individual 'tt', since we are only
+         * accessing tt->tid, tt->volid and tt->partition, which never change
+         * after the transaction is created.
+         */
+
+        if (volutil_PartitionName2_r(tt->partition, part, sizeof(part)) != 0) {
+            snprintf(part, sizeof(part), "[bad index %d]", tt->partition);
+        }
+
+        Log("Interrupting transaction %d for volume %u partition %s; volume may need salvage\n",
+            tt->tid, tt->volid, part);
+    }
+    VTRANS_UNLOCK;
+
+    Log("Volserver shutdown complete\n");
+
+    exit(0);
+}
+#endif /* AFS_PTHREAD_ENV */
+
 /* Called every GCWAKEUP seconds to try to unlock all our partitions,
  * if we're idle and there are no active transactions
  */
@@ -567,6 +599,8 @@ main(int argc, char **argv)
 #ifdef AFS_PTHREAD_ENV
     opr_softsig_Init();
     SetupLogSoftSignals();
+    opr_softsig_Register(SIGINT, shutdown_signal);
+    opr_softsig_Register(SIGTERM, shutdown_signal);
 #else
     SetupLogSignals();
 #endif
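The handler is registered with `opr_softsig_Register()` rather than `signal()` or `sigaction()`. As we understand the softsig API, it runs handlers from an ordinary thread instead of in asynchronous signal context, which is what lets `shutdown_signal()` take `VTRANS_LOCK`, call `Log()` and call `exit()`. The sketch below shows the general `sigwait()`-based technique under that assumption; it is not the opr_softsig implementation, and `struct soft_ctx`, `soft_start()`, `soft_thread()` and `record_signal()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

typedef void (*soft_handler)(int sig);

/* State shared with the waiter thread; must outlive it. */
struct soft_ctx {
    sigset_t set;          /* signals handled synchronously */
    soft_handler handler;  /* ordinary function to run for them */
};

/* Waiter thread: sigwait() returns once a signal in ctx->set is pending,
 * and the handler then runs as normal thread code, so it may lock mutexes,
 * log, or exit. This sketch handles a single signal and returns, which is
 * enough for a shutdown handler that ends the process. */
static void *
soft_thread(void *arg)
{
    struct soft_ctx *ctx = arg;
    int sig;

    if (sigwait(&ctx->set, &sig) == 0)
        ctx->handler(sig);
    return NULL;
}

/* Block SIGINT and SIGTERM in the calling thread, then start the waiter.
 * Call this early in main(), before other threads exist, so every later
 * thread inherits the blocked mask and only the waiter ever consumes the
 * signals. Returns 0 or an errno value. */
static int
soft_start(struct soft_ctx *ctx, soft_handler handler, pthread_t *tid)
{
    int code;

    assert(handler != NULL);
    sigemptyset(&ctx->set);
    sigaddset(&ctx->set, SIGINT);
    sigaddset(&ctx->set, SIGTERM);
    ctx->handler = handler;

    code = pthread_sigmask(SIG_BLOCK, &ctx->set, NULL);
    if (code != 0)
        return code;
    return pthread_create(tid, NULL, soft_thread, ctx);
}

/* Demonstration handler: records which signal arrived. A shutdown handler
 * like shutdown_signal() would log interrupted work and call exit(0). */
static int last_signal;

static void
record_signal(int sig)
{
    last_signal = sig;
}
```

In a daemon, `soft_start()` would be called once near the top of `main()`, with a handler that logs and exits; a SIGTERM sent with `kill()` to the process is then picked up by the waiter thread rather than terminating the process.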