From e127edeff69c5b8cc3865658b93b548b6b1d8569 Mon Sep 17 00:00:00 2001
From: Andrew Deason
Date: Thu, 14 Nov 2024 01:45:57 -0600
Subject: [PATCH] volser: Add simple shutdown signal handler

Currently, the volserver process doesn't register any signal handlers
for a shutdown sequence. When the fileserver process group is shut
down, the bosserver sends a SIGTERM to the volserver process, and the
volserver process immediately dies.

If any volumes are attached by the volserver at the time (e.g., for
dumping or restoring a volume), the volume is not cleanly detached, and
usually must be salvaged later on before it can be used. This can be
confusing to administrators, since a volume may need salvage even
though we never logged a reason why the volume got in an
unclean/broken state.

To improve this situation, add a signal handler to the volserver so we
can go through a shutdown process. In the future, we can add a more
complex shutdown process that may interrupt running volume
transactions, or wait for transactions to go away, or something else.
But for now, just as a first step, add a very simple shutdown process
that just logs what transactions are being interrupted, so we at least
give a clue as to why some volumes were not cleanly detached.

With this commit, the volserver now logs some messages if transactions
are running when it's shut down. For example, a VolserLog may look like
this:

    Mon Jan 13 10:11:32 2025 Volserver shutting down on signal 15
    Mon Jan 13 10:11:32 2025 Interrupting transaction 2 for volume 536871057 partition /vicepa; volume may need salvage
    Mon Jan 13 10:11:32 2025 Interrupting transaction 1 for volume 536871052 partition /vicepa; volume may need salvage
    Mon Jan 13 10:11:32 2025 Volserver shutdown complete

With this commit, the volserver process also exits with code 0 on a
normal shutdown, instead of being terminated by the SIGTERM signal. The
BosLog entry for shutting down a volserver process used to look like
this:

    Mon Jan 13 10:11:32 2025 dafs:vol exited on signal 15

and with this commit, now looks like this:

    Mon Jan 13 10:11:32 2025 dafs:vol exited with code 0

This commit just adds the signal handler for the pthreaded volserver;
don't bother adding a code path for the obsolete LWP volserver.

Change-Id: I9f8321f845d45f6b37d9c69d12d54d1830d68b23
Reviewed-on: https://gerrit.openafs.org/16083
Tested-by: BuildBot
Reviewed-by: Cheyenne Wills
Reviewed-by: Marcio Brito Barbosa
Reviewed-by: Michael Meffie
---
 src/volser/volmain.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/src/volser/volmain.c b/src/volser/volmain.c
index 77bd5db46e..c5a2c18eca 100644
--- a/src/volser/volmain.c
+++ b/src/volser/volmain.c
@@ -112,6 +112,38 @@ MyAfterProc(struct rx_call *acall, afs_int32 code)
     return;
 }
 
+#ifdef AFS_PTHREAD_ENV
+static void
+shutdown_signal(int sig)
+{
+    struct volser_trans *tt;
+    char part[16];
+
+    Log("Volserver shutting down on signal %d\n", sig);
+
+    VTRANS_LOCK;
+
+    for (tt = TransList(); tt != NULL; tt = tt->next) {
+        /*
+         * We don't need to lock each individual 'tt', since we are only
+         * accessing tt->tid, tt->volid and tt->partition, which never change
+         * after the transaction is created.
+         */
+        if (volutil_PartitionName2_r(tt->partition, part, sizeof(part)) != 0) {
+            snprintf(part, sizeof(part), "[bad index %d]", tt->partition);
+        }
+        Log("Interrupting transaction %d for volume %u partition %s; volume may need salvage\n",
+            tt->tid, tt->volid, part);
+    }
+
+    VTRANS_UNLOCK;
+
+    Log("Volserver shutdown complete\n");
+
+    exit(0);
+}
+#endif /* AFS_PTHREAD_ENV */
+
 /* Called every GCWAKEUP seconds to try to unlock all our partitions,
  * if we're idle and there are no active transactions
  */
@@ -567,6 +599,8 @@ main(int argc, char **argv)
 #ifdef AFS_PTHREAD_ENV
     opr_softsig_Init();
     SetupLogSoftSignals();
+    opr_softsig_Register(SIGINT, shutdown_signal);
+    opr_softsig_Register(SIGTERM, shutdown_signal);
 #else
     SetupLogSignals();
 #endif
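
Note for readers new to the pattern: shutdown_signal() calls Log(), takes VTRANS_LOCK and calls exit(), none of which would be safe inside an asynchronous signal handler. The working assumption here (not shown in this patch) is that opr_softsig delivers signals to handlers running in ordinary thread context. The standalone POSIX sketch below illustrates that general idea only. Every demo_* name is hypothetical, the list type is a stand-in for struct volser_trans, and none of this is OpenAFS code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

/* Hypothetical stand-in for a volserver transaction record. */
struct demo_trans {
    int tid;
    unsigned int volid;
    const char *part;
    struct demo_trans *next;
};

/* Hypothetical stand-in for the lock that protects the transaction list. */
static pthread_mutex_t demo_trans_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Block SIGINT and SIGTERM in the calling thread. Threads created
 * afterwards inherit the mask, so no thread is interrupted asynchronously.
 * Returns 0 on success.
 */
static int
demo_block_shutdown_signals(sigset_t *set)
{
    assert(set != NULL);
    sigemptyset(set);
    sigaddset(set, SIGINT);
    sigaddset(set, SIGTERM);
    return pthread_sigmask(SIG_BLOCK, set, NULL);
}

/*
 * Wait synchronously for one of the blocked signals.
 * Returns the signal number, or -1 if sigwait() fails.
 */
static int
demo_wait_shutdown_signal(const sigset_t *set)
{
    int sig = 0;

    assert(set != NULL);
    if (sigwait(set, &sig) != 0) {
        return -1;
    }
    return sig;
}

/*
 * Report every transaction still on the list. This runs in normal thread
 * context, so stdio and mutexes are safe to use. Returns the number of
 * transactions reported.
 */
static int
demo_log_interrupted(FILE *out, struct demo_trans *head, int sig)
{
    struct demo_trans *tt;
    int count = 0;

    fprintf(out, "shutting down on signal %d\n", sig);

    pthread_mutex_lock(&demo_trans_lock);
    for (tt = head; tt != NULL; tt = tt->next) {
        fprintf(out, "Interrupting transaction %d for volume %u partition %s\n",
                tt->tid, tt->volid, tt->part);
        count++;
    }
    pthread_mutex_unlock(&demo_trans_lock);

    fprintf(out, "shutdown complete\n");
    return count;
}
```

In a real program, main() would call demo_block_shutdown_signals() before creating any threads, and a dedicated thread would loop on demo_wait_shutdown_signal(), call demo_log_interrupted(), and then exit(0). Exiting that way is what lets a supervisor record "exited with code 0" rather than "exited on signal 15".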