mirror of https://git.openafs.org/openafs.git (synced 2025-01-18 06:50:12 +00:00)
volser: Add simple shutdown signal handler

Currently, the volserver process doesn't register any signal handlers
for a shutdown sequence. When the fileserver process group is shutdown,
the bosserver sends a SIGTERM to the volserver process, and the
volserver process immediately dies.

If any volumes are attached by the volserver at the time (e.g., for
dumping or restoring a volume), the volume is not cleanly detached, and
usually must be salvaged later on before it can be used. This can be
confusing to administrators, since a volume may need salvage even though
we never logged a reason why the volume got in an unclean/broken state.

To improve this situation, add a signal handler to the volserver so we
can go through a shutdown process. In the future, we can add a more
complex shutdown process that may interrupt running volume transactions,
or wait for transactions to go away, or something else. But for now,
just as a first step, add a very simple shutdown process that just logs
what transactions are being interrupted, so we at least give a clue as
to why some volumes were not cleanly detached.

With this commit, the volserver now logs some messages if transactions
are running when it's shutdown. For example, a VolserLog may look like
this:

    Mon Jan 13 10:11:32 2025 Volserver shutting down on signal 15
    Mon Jan 13 10:11:32 2025 Interrupting transaction 2 for volume 536871057 partition /vicepa; volume may need salvage
    Mon Jan 13 10:11:32 2025 Interrupting transaction 1 for volume 536871052 partition /vicepa; volume may need salvage
    Mon Jan 13 10:11:32 2025 Volserver shutdown complete

With this commit, the volserver process also exits with code 0 on a
normal shutdown, instead of being terminated by the SIGTERM signal. The
BosLog entry for shutting down a volserver process used to look like
this:

    Mon Jan 13 10:11:32 2025 dafs:vol exited on signal 15

and with this commit, now looks like this:

    Mon Jan 13 10:11:32 2025 dafs:vol exited with code 0

This commit just adds the signal handler for the pthreaded volserver;
don't bother adding a code path for the obsolete LWP volserver.

Change-Id: I9f8321f845d45f6b37d9c69d12d54d1830d68b23
Reviewed-on: https://gerrit.openafs.org/16083
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
parent 4a32840b3a
commit e127edeff6
@@ -112,6 +112,38 @@ MyAfterProc(struct rx_call *acall, afs_int32 code)
     return;
 }

+#ifdef AFS_PTHREAD_ENV
+static void
+shutdown_signal(int sig)
+{
+    struct volser_trans *tt;
+    char part[16];
+
+    Log("Volserver shutting down on signal %d\n", sig);
+
+    VTRANS_LOCK;
+
+    for (tt = TransList(); tt != NULL; tt = tt->next) {
+        /*
+         * We don't need to lock each individual 'tt', since we are only
+         * accessing tt->tid, tt->volid and tt->partition, which never change
+         * after the transaction is created.
+         */
+        if (volutil_PartitionName2_r(tt->partition, part, sizeof(part)) != 0) {
+            snprintf(part, sizeof(part), "[bad index %d]", tt->partition);
+        }
+        Log("Interrupting transaction %d for volume %u partition %s; volume may need salvage\n",
+            tt->tid, tt->volid, part);
+    }
+
+    VTRANS_UNLOCK;
+
+    Log("Volserver shutdown complete\n");
+
+    exit(0);
+}
+#endif /* AFS_PTHREAD_ENV */
+
 /* Called every GCWAKEUP seconds to try to unlock all our partitions,
  * if we're idle and there are no active transactions
  */
@@ -567,6 +599,8 @@ main(int argc, char **argv)
 #ifdef AFS_PTHREAD_ENV
     opr_softsig_Init();
     SetupLogSoftSignals();
+    opr_softsig_Register(SIGINT, shutdown_signal);
+    opr_softsig_Register(SIGTERM, shutdown_signal);
 #else
     SetupLogSignals();
 #endif