Purpose
Reports the status of the Ubik process associated with a database server process
Synopsis
udebug -servers <server machine> [-port <IP port>] [-long] [-help]
udebug -s <server machine> [-p <IP port>] [-l] [-h]
Description
The udebug command displays the status of the lightweight Ubik process for the database server process identified by the -port argument that is running on the database server machine named by the -servers argument. The output identifies the machines where peer database server processes are running, which of them is the synchronization site (Ubik coordinator), and the status of the connections between them.
Options
buserver or 7021 for the Backup Server
kaserver or 7004 for the Authentication Server
ptserver or 7002 for the Protection Server
vlserver or 7003 for the Volume Location Server
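As a minimal sketch, the name-to-port pairs listed above can be kept in a small table and used to assemble a udebug command line. The helper name `udebug_argv` and the validation it performs are assumptions made for illustration; only the `-servers`, `-port`, and `-long` arguments come from the Synopsis.

```python
# Service names and port numbers as listed in this Options section.
UBIK_PORTS = {
    "buserver": 7021,  # Backup Server
    "kaserver": 7004,  # Authentication Server
    "ptserver": 7002,  # Protection Server
    "vlserver": 7003,  # Volume Location Server
}


def udebug_argv(server, port, long_output=False):
    """Build an argument list for udebug (hypothetical helper).

    `port` may be one of the service names above or a port number.
    """
    port_value = str(port)
    if port_value not in UBIK_PORTS and not port_value.isdigit():
        raise ValueError(f"unknown service name or port: {port!r}")
    argv = ["udebug", "-servers", server, "-port", port_value]
    if long_output:
        argv.append("-long")
    return argv
```

The resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True)` on a machine where the udebug binary is installed.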
Output
Several of the messages in the output provide basic status information about the Ubik process on the machine specified by the -servers argument, and the remaining messages are useful mostly for debugging purposes.
To check basic Ubik status, issue the command for each database server machine in turn. In the output for each, one of the following messages appears in the top third of the output.
I am sync site . . . (#_sites servers)
I am not sync site
For the synchronization site, the following message indicates that all sites have the same version of the database, which implies that Ubik is functioning correctly. Values other than 1f are described later in this section.
Recovery state 1f
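The basic check described above can be sketched as a small text classifier. This is an illustration only: the function name `basic_status` and the wording of its return values are assumptions, and it relies solely on the two messages quoted above appearing verbatim in the captured output.

```python
import re


def basic_status(output):
    """Classify captured udebug output using the basic status messages."""
    if "I am not sync site" in output:
        return "secondary"
    if "I am sync site" not in output:
        return "unknown"
    match = re.search(r"Recovery state\s+([0-9a-fA-F]+)", output)
    if match is None:
        return "sync site, recovery state not reported"
    flags = int(match.group(1), 16)  # the flags field is hexadecimal
    if flags == 0x1F:
        return "sync site, all sites have the same database version"
    return f"sync site, recovery state {flags:x}"
```

Running this on the output collected from each database server machine in turn gives a quick view of which machine is the synchronization site and whether it reports 1f.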
For correct Ubik operation, the database server machine clocks must agree on the time. The following messages, which are the second and third lines in the output, report the current date and time according to the database server machine's clock and the clock on the machine where the udebug command is issued.
Host's IP_addr time is dbserver_date/time
Local time is local_date/time (time differential skew secs)
The skew is the difference between the database server machine clock and the local clock. Its absolute value is not vital for Ubik functioning, but a difference of more than a few seconds between the skew values for the database server machines indicates that their clocks are not synchronized and Ubik performance is possibly hampered.
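Because it is the difference between the skew values that matters, a comparison across machines can be sketched as follows. The function names and the dictionary layout are assumptions for illustration; the regular expression matches the "time differential" message shown above. What counts as "more than a few seconds" is left to the caller.

```python
import re

SKEW_RE = re.compile(r"\(time differential\s+(-?\d+)\s+secs\)")


def parse_skew(output):
    """Return the skew in seconds reported in one udebug output."""
    match = SKEW_RE.search(output)
    if match is None:
        raise ValueError("no 'time differential' message found")
    return int(match.group(1))


def skew_spread(outputs_by_server):
    """Return (spread, skews) for outputs gathered from the same local machine.

    `outputs_by_server` maps a server name to its captured udebug output.
    The spread is the largest skew minus the smallest skew.
    """
    skews = {name: parse_skew(text) for name, text in outputs_by_server.items()}
    return max(skews.values()) - min(skews.values()), skews
```

The comparison is only meaningful when every udebug command is issued from the same local machine at roughly the same time, so that the local clock cancels out.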
Following is a description of all messages in the output. As noted, it is useful mostly for debugging and most meaningful to someone who understands Ubik's implementation.
The output begins with the following messages. The first message reports the IP addresses that are configured with the operating system on the machine specified by the -servers argument. As previously noted, the second and third messages report the current date and time according to the clocks on the database server machine and the machine where the udebug command is issued, respectively. All subsequent timestamps in the output are expressed in terms of the local clock rather than the database server machine clock.
Host's addresses are: list_of_IP_addrs
Host's IP_addr time is dbserver_date/time
Local time is local_date/time (time differential skew secs)
If the skew is more than about 10 seconds, the following message appears. As noted, it does not necessarily indicate Ubik malfunction: it denotes clock skew between the database server machine and the local machine, rather than among the database server machines.
****clock may be bad
If the udebug command is issued during the coordinator election process and voting has not yet begun, the following message appears next.
Last yes vote not cast yet
Otherwise, the output continues with the following messages.
Last yes vote for sync_IP_addr was last_vote secs ago (sync site);
Last vote started vote_start secs ago (at date/time)
Local db version is db_version
The first indicates which peer this Ubik process last voted for as coordinator (it can vote for itself) and how long ago it sent the vote. The second message indicates how long ago the Ubik coordinator requested confirming votes from the secondary sites. Usually, the last_vote and vote_start values are the same; a difference between them can indicate clock skew or a slow network connection between the two database server machines. A small difference is not harmful. The third message reports the current version number db_version of the database maintained by this Ubik process. It has two fields separated by a period. The field before the period is based on a timestamp that reflects when the database first changed after the most recent coordinator election, and the field after the period indicates the number of changes since the election.
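The two-field structure of db_version can be illustrated with a short parser. This sketch assumes that the field before the period is a count of seconds since the Unix epoch, which the text above does not state explicitly; the function name `parse_db_version` is likewise hypothetical.

```python
from datetime import datetime, timezone


def parse_db_version(db_version):
    """Split a db_version string into (first_change_time, change_count).

    Assumption: the first field is seconds since the Unix epoch (UTC).
    """
    epoch_text, separator, counter_text = db_version.partition(".")
    if not separator or not epoch_text.isdigit() or not counter_text.isdigit():
        raise ValueError(f"unexpected db_version format: {db_version!r}")
    first_change = datetime.fromtimestamp(int(epoch_text), tz=timezone.utc)
    return first_change, int(counter_text)
```

Under that assumption, the value 940902602.674 from the Examples section decodes to 1999-10-26 01:50:02 UTC with a change count of 674.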
The output continues with messages that differ depending on whether the Ubik process is the coordinator or not.
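If there is only one database server machine, it is always the coordinator, as indicated by the following message.
I am sync site forever (1 server)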
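If there are several sites and the Ubik process is the coordinator, the output includes the following messages.
I am sync site until expiration secs from now (at date/time) (#_sites servers)
Recovery state flags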
The first message reports how much longer the site remains coordinator even if the next attempt to maintain quorum fails, and how many sites are participating in the quorum. The flags field in the second message is a hexadecimal number that indicates the current state of the quorum. A value of 1f indicates complete database synchronization, whereas a value of f means that the coordinator has the correct database but cannot contact all secondary sites to determine if they also have it. Lesser values are acceptable if the udebug command is issued during coordinator election, but they denote a problem if they persist. The individual flags have the following meanings:
If the udebug command is issued while the coordinator is writing a change into the database, the following additional message appears.
I am currently managing write transaction identifier
If the Ubik process is at a secondary site, the output includes the following messages.
I am not sync site
Lowest host lowest_IP_addr was set low_time secs ago
Sync host sync_IP_addr was set sync_time secs ago
The lowest_IP_addr is the lowest IP address of any peer from which the Ubik process has received a message recently, whereas the sync_IP_addr is the IP address of the current coordinator. If they differ, the machine with the lowest IP address is not currently the coordinator. The Ubik process continues voting for the current coordinator as long as they remain in contact, which provides for maximum stability. However, in the event of another coordinator election, this Ubik process votes for the lowest_IP_addr site instead (assuming they are in contact), because it has a bias to vote in elections for the site with the lowest IP address.
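The comparison described above can be sketched by extracting the two addresses from a secondary site's output. The function name and the keys of the returned dictionary are assumptions for illustration; the patterns follow the "Lowest host" and "Sync host" messages as shown.

```python
import re

LOWEST_RE = re.compile(r"Lowest host\s+(\S+)\s+was set\s+(\d+)\s+secs ago")
SYNC_RE = re.compile(r"Sync host\s+(\S+)\s+was set\s+(\d+)\s+secs ago")


def election_preference(output):
    """Report the lowest host, the sync host, and whether they differ.

    Returns None when the output does not come from a secondary site.
    """
    lowest = LOWEST_RE.search(output)
    sync = SYNC_RE.search(output)
    if lowest is None or sync is None:
        return None
    return {
        "lowest_host": lowest.group(1),
        "sync_host": sync.group(1),
        # True means the lowest-address site is not the current coordinator,
        # so this site's vote would change in another election.
        "would_switch_vote": lowest.group(1) != sync.group(1),
    }
```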
For both the synchronization and secondary sites, the output continues with the following messages. The first message reports the version number of the database at the synchronization site, which needs to match the db_version reported by the preceding Local db version message. The second message indicates how many VLDB records are currently locked for any operation or for writing in particular. The values are nonzero if the udebug command is issued while an operation is in progress.
Sync site's db version is db_version
locked locked pages, writes of them for write
The following messages appear next only if there are any read or write locks on database records:
There are read locks held
There are write locks held
Similarly, one or more of the following messages appear next only if there are any read or write transactions in progress when the udebug command is issued:
There is an active write transaction
There is at least one active read transaction
Transaction tid is tid
If the machine named by the -servers argument is the coordinator, the next message reports when the current coordinator last updated the database.
Last time a new db version was labelled was: last_restart secs ago (at date/time)
If the machine named by the -servers argument is the coordinator, the output concludes with an entry for each secondary site that is participating in the quorum, in the following format.
Server( IP_address ): (db db_version)
    last vote rcvd last_vote secs ago (at date/time),
    last beacon sent last_beacon secs ago (at date/time), last vote was { yes | no }
    dbcurrent={ 0 | 1 }, up={ 0 | 1 } beaconSince={ 0 | 1 }
The first line reports the site's IP address and the version number of the database it is maintaining. The last_vote field reports how long ago the coordinator received a vote message from the Ubik process at the site, and the last_beacon field how long ago the coordinator last requested a vote message. If the udebug command is issued during the coordinator election process and voting has not yet begun, the following messages appear instead.
Last vote never rcvd
Last beacon never sent
On the final line of each entry, the fields have the following meaning:
Including the -long flag produces peer entries even when the -servers argument names a secondary site, but in that case only the IP_address field is guaranteed to be accurate. For example, the value in the db_version field is usually 0.0, because secondary sites do not poll their peers for this information. The values in the last_vote and last_beacon fields indicate when this site last received or requested a vote as coordinator; they generally indicate the time of the last coordinator election.
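As a rough illustration of the peer entry format described above, the following sketch extracts one dictionary per Server( ... ) entry. The function name and dictionary keys are assumptions, and the patterns are based only on the entry layout shown in this section, so real output that deviates from it may need adjustments.

```python
import re

SERVER_RE = re.compile(
    r"Server\(\s*(?P<address>[^)\s]+)\s*\):\s*\(db\s+(?P<db_version>[^)\s]+)\)"
    r"(?P<body>.*?)"
    r"dbcurrent=(?P<dbcurrent>[01]),\s*up=(?P<up>[01])\s+"
    r"beaconSince=(?P<beacon_since>[01])",
    re.DOTALL,
)


def _seconds(pattern, text):
    """Return the integer captured by `pattern`, or None if absent."""
    match = re.search(pattern, text)
    return int(match.group(1)) if match else None


def parse_peers(output):
    """Return a list of dictionaries, one per peer entry in the output."""
    peers = []
    for entry in SERVER_RE.finditer(output):
        body = entry.group("body")
        vote = re.search(r"last vote was\s+(yes|no)", body)
        peers.append({
            "address": entry.group("address"),
            "db_version": entry.group("db_version"),
            # None when the output says the vote or beacon never happened
            "last_vote_secs": _seconds(r"last vote rcvd\s+(\d+)\s+secs ago", body),
            "last_beacon_secs": _seconds(r"last beacon sent\s+(\d+)\s+secs ago", body),
            "last_vote_yes": (vote.group(1) == "yes") if vote else None,
            "dbcurrent": entry.group("dbcurrent") == "1",
            "up": entry.group("up") == "1",
            "beacon_since": entry.group("beacon_since") == "1",
        })
    return peers
```

When the output comes from a secondary site queried with the -long flag, only the address in each dictionary should be trusted, for the reasons given above.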
Examples
This example checks the status of the Ubik process for the Volume Location Server on the machine afs1, which is the synchronization site.
% udebug afs1 vlserver
Host's addresses are: 192.12.107.33
Host's 192.12.107.33 time is Wed Oct 27 09:49:50 1999
Local time is Wed Oct 27 09:49:52 1999 (time differential 2 secs)
Last yes vote for 192.12.107.33 was 1 secs ago (sync site);
Last vote started 1 secs ago (at Wed Oct 27 09:49:51 1999)
Local db version is 940902602.674
I am sync site until 58 secs from now (at Wed Oct 27 09:50:50 1999) (3 servers)
Recovery state 1f
Sync site's db version is 940902602.674
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
    129588 secs ago (at Mon Oct 25 21:50:04 1999)
Server( 192.12.107.35 ): (db 940902602.674)
    last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
    last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
    dbcurrent=1, up=1 beaconSince=1
Server( 192.12.107.34 ): (db 940902602.674)
    last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
    last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
    dbcurrent=1, up=1 beaconSince=1
This example checks the status of the Authentication Server on the machine with IP address 192.12.107.34, which is a secondary site. The local clock is about 4 minutes behind the database server machine's clock.
% udebug 192.12.107.34 7004
Host's addresses are: 192.12.107.34
Host's 192.12.107.34 time is Wed Oct 27 09:54:15 1999
Local time is Wed Oct 27 09:50:08 1999 (time differential -247 secs)
****clock may be bad
Last yes vote for 192.12.107.33 was 6 secs ago (sync site);
Last vote started 6 secs ago (at Wed Oct 27 09:50:02 1999)
Local db version is 940906574.25
I am not sync site
Lowest host 192.12.107.33 was set 6 secs ago
Sync host 192.12.107.33 was set 6 secs ago
Sync site's db version is 940906574.25
0 locked pages, 0 of them for write
Privilege Required
Related Information