mirror of https://git.openafs.org/openafs.git (synced 2025-01-19 15:30:14 +00:00)
ubik: Try to detect VOTE_Beacon errors
Currently the way ubik dbsites vote for each other is via the "return
value" of the Beacon VOTE RPC. Since this is really an Rx abort, this
can easily collide with actual errors on the wire, such as rxkad
errors.

Try to detect these by detecting vote times that are very different
than the current timestamp (more than an hour in the future or past),
and treat it like a network error.

If we do not do this, a single site reporting an error can cause us to
never reach quorum, since we calculate our sync site expiration based
on the oldest 'yes' vote, which for most known Rx aborts will be far in
the past.

Change-Id: I28cf4c520bbbe9e98eb55947476c8785d3c8ec0b
Reviewed-on: http://gerrit.openafs.org/8486
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
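
The quorum problem described in the commit message can be illustrated with a small standalone sketch. It is not the ubik implementation: the function name, the 75-second lifetime constant, and the sample abort code are assumptions made only for this example. It shows how one abort code misread as a vote time drags the "oldest 'yes' vote" far into the past.

```c
#include <stddef.h>

/* Hypothetical constant for this sketch: seconds a 'yes' vote stays valid. */
#define SKETCH_VOTE_LIFETIME 75

/* Illustrative abort code in the rxkad range. The exact value does not
 * matter; what matters is that it is a small positive number when it is
 * misread as a Unix timestamp. */
#define SKETCH_ABORT_CODE 19270409L

/*
 * Return the time at which sync-site status would lapse, computed from the
 * oldest 'yes' vote. Values <= 0 are treated as "no vote" and skipped.
 * Returns 0 if there are no 'yes' votes at all.
 */
long
sketch_sync_expiration(const long *votes, size_t nvotes)
{
    long oldest = 0;
    size_t i;

    for (i = 0; i < nvotes; i++) {
        if (votes[i] <= 0)
            continue;           /* not a 'yes' vote */
        if (oldest == 0 || votes[i] < oldest)
            oldest = votes[i];  /* track the earliest vote time */
    }
    return oldest ? oldest + SKETCH_VOTE_LIFETIME : 0;
}
```

With three recent votes the computed expiration lies in the future. If one of them is replaced by SKETCH_ABORT_CODE, the expiration lands in 1970 and has already passed, even though the other two sites voted normally. Mapping that value to -1 first, as the patch does, restores the expected result.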
parent 7c8373c8c2
commit 4d4668b161
@@ -497,6 +497,22 @@ ubeacon_Interact(void *dummy)
             UBIK_BEACON_LOCK;
             ts->lastBeaconSent = temp;
             code = multi_error;
+
+            if (code > 0 && ((code < temp && code < temp - 3600) ||
+                             (code > temp && code > temp + 3600))) {
+                /* if we reached here, supposedly the remote host voted
+                 * for us based on a computation from over an hour ago in
+                 * the past, or over an hour in the future. this is
+                 * unlikely; what actually probably happened is that the
+                 * call generated some error and was aborted. this can
+                 * happen due to errors with the rx security class in play
+                 * (rxkad, rxgk, etc). treat the host as if we got a
+                 * timeout, since this is not a valid vote. */
+                ubik_print("assuming distant vote time %d from %s is an error; marking host down\n",
+                           (int)code, afs_inet_ntoa_r(ts->addr[0], hoststr));
+                code = -1;
+            }
+
             /* note that the vote time (the return code) represents the time
              * the vote was computed, *not* the time the vote expires. We compute
              * the latter down below if we got enough votes to go with */
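
The check added in this hunk can be isolated into a small standalone sketch for experimentation. The function and macro names below are hypothetical and do not appear in the OpenAFS tree; the logic mirrors the patch's condition, which reduces to "a positive code more than 3600 seconds before or after the local time".

```c
/* Hypothetical name for the one-hour window used in the patch. */
#define SKETCH_MAX_VOTE_SKEW 3600

/*
 * Return 1 if a positive beacon return code is more than one hour away
 * from 'now' in either direction, else 0. Non-positive codes are not
 * vote times, so they are never reported as distant.
 */
int
sketch_vote_time_is_distant(long code, long now)
{
    if (code <= 0)
        return 0;
    return code < now - SKETCH_MAX_VOTE_SKEW ||
           code > now + SKETCH_MAX_VOTE_SKEW;
}

/*
 * Map an implausible vote time to -1, the value the patch assigns so the
 * host is handled like a timeout. All other codes pass through unchanged.
 */
long
sketch_filter_vote(long code, long now)
{
    return sketch_vote_time_is_distant(code, now) ? -1 : code;
}
```

A vote time exactly 3600 seconds away is still accepted, because the comparisons are strict, as they are in the patch.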