mirror of
https://git.openafs.org/openafs.git
synced 2025-01-19 07:20:11 +00:00
54cecedb42
The formatting gets screwed up if we have multiple =item tags together like this. To just have each one be a bullet point, just have a bare =item before each one, without a "tag" or "key" for the =item. Change-Id: I5cb7a98bd16f5999d529a42a0f822835f6d2f66e Reviewed-on: http://gerrit.openafs.org/10388 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
399 lines
14 KiB
Plaintext
399 lines
14 KiB
Plaintext
=head1 NAME
|
|
|
|
udebug - Reports Ubik process status for a database server process
|
|
|
|
=head1 SYNOPSIS
|
|
|
|
=for html
|
|
<div class="synopsis">
|
|
|
|
B<udebug> S<<< B<-server> <I<server machine>> >>> S<<< [B<-port> <I<IP port>>] >>>
|
|
[B<-long>] [B<-help>]
|
|
|
|
B<udebug> S<<< B<-s> <I<server machine>> >>> S<<< [B<-p> <I<IP port>>] >>> [B<-l>] [B<-h>]
|
|
|
|
=for html
|
|
</div>
|
|
|
|
=head1 DESCRIPTION
|
|
|
|
The B<udebug> command displays the status of the lightweight Ubik process
|
|
for the database server process identified by the B<-port> argument that
|
|
is running on the database server machine named by the B<-server>
|
|
argument. The output identifies the machines where peer database server
|
|
processes are running, which of them is the synchronization site (Ubik
|
|
coordinator), and the status of the connections between them.
|
|
|
|
=head1 OPTIONS
|
|
|
|
=over 4
|
|
|
|
=item B<-server> <I<server machine>>
|
|
|
|
Names the database server machine that is running the process for which to
|
|
display status information. Provide the machine's IP address in dotted
|
|
decimal format, its fully qualified host name (for example,
|
|
B<fs1.example.com>), or the shortest abbreviated form of its host name that
|
|
distinguishes it from other machines. Successful use of an abbreviated
|
|
form depends on the availability of a name resolution service (such as the
|
|
Domain Name Service or a local host table) at the time the command is
|
|
issued.
|
|
|
|
=item B<-port> <I<IP port>>
|
|
|
|
Identifies the database server process for which to display status
|
|
information, either by its process name or port number. Provide one of the
|
|
following values:
|
|
|
|
=over 4
|
|
|
|
=item
|
|
|
|
B<buserver> or 7021 for the Backup Server
|
|
|
|
=item
|
|
|
|
B<kaserver> or 7004 for the Authentication Server
|
|
|
|
=item
|
|
|
|
B<ptserver> or 7002 for the Protection Server
|
|
|
|
=item
|
|
|
|
B<vlserver> or 7003 for the Volume Location Server
|
|
|
|
=back
|
|
|
|
=item B<-long>
|
|
|
|
Reports additional information about each peer of the machine named by the
|
|
B<-server> argument. The information appears by default if that machine
|
|
is the synchronization site.
|
|
|
|
=item B<-help>
|
|
|
|
Prints the online help for this command. All other valid options are
|
|
ignored.
|
|
|
|
=back
|
|
|
|
=head1 OUTPUT
|
|
|
|
Several of the messages in the output provide basic status information
|
|
about the Ubik process on the machine specified by the B<-server>
|
|
argument, and the remaining messages are useful mostly for debugging
|
|
purposes.
|
|
|
|
To check basic Ubik status, issue the command for each database server
|
|
machine in turn. In the output for each, one of the following messages
|
|
appears in the top third of the output.
|
|
|
|
I am sync site . . . (<#_sites> servers)
|
|
|
|
I am not sync site
|
|
|
|
For the synchronization site, the following message indicates that all
|
|
sites have the same version of the database, which implies that Ubik is
|
|
functioning correctly. See the following for a description of values other
|
|
than C<1f>.
|
|
|
|
Recovery state 1f
|
|
|
|
For correct Ubik operation, the database server machine clocks must agree
|
|
on the time. The following messages, which are the second and third lines
|
|
in the output, report the current date and time according to the database
|
|
server machine's clock and the clock on the machine where the B<udebug>
|
|
command is issued.
|
|
|
|
Host's <IP_addr> time is <dbserver_date/time>
|
|
Local time is <local_date/time> (time differential <skew> secs)
|
|
|
|
The <skew> is the difference between the database server machine clock and
|
|
the local clock. Its absolute value is not vital for Ubik functioning, but
|
|
a difference of more than a few seconds between the I<skew> values for the
|
|
database server machines indicates that their clocks are not synchronized
|
|
and Ubik performance is possibly hampered.
|
|
|
|
Following is a description of all messages in the output. As noted, it is
|
|
useful mostly for debugging and most meaningful to someone who understands
|
|
Ubik's implementation.
|
|
|
|
The output begins with the following messages. The first message reports
|
|
the IP addresses that are configured with the operating system on the
|
|
machine specified by the B<-server> argument. As previously noted, the
|
|
second and third messages report the current date and time according to
|
|
the clocks on the database server machine and the machine where the
|
|
B<udebug> command is issued, respectively. All subsequent timestamps in
|
|
the output are expressed in terms of the local clock rather than the
|
|
database server machine clock.
|
|
|
|
Host's addresses are: <list_of_IP_addrs>
|
|
Host's <IP_addr> time is <dbserver_date/time>
|
|
Local time is <local_date/time> (time differential <skew> secs)
|
|
|
|
If the <skew> is more than about 10 seconds, the following message
|
|
appears. As noted, it does not necessarily indicate Ubik malfunction: it
|
|
denotes clock skew between the database server machine and the local
|
|
machine, rather than among the database server machines.
|
|
|
|
****clock may be bad
|
|
|
|
If the udebug command is issued during the coordinator election process
|
|
and voting has not yet begun, the following message appears next.
|
|
|
|
Last yes vote not cast yet
|
|
|
|
Otherwise, the output continues with the following messages.
|
|
|
|
Last yes vote for <sync_IP_addr> was <last_vote> secs ago (sync site);
|
|
Last vote started <vote_start> secs ago (at <date/time>)
|
|
Local db version is <db_version>
|
|
|
|
The first indicates which peer this Ubik process last voted for as
|
|
coordinator (it can vote for itself) and how long ago it sent the vote.
|
|
The second message indicates how long ago the Ubik coordinator requested
|
|
confirming votes from the secondary sites. Usually, the <last_vote> and
|
|
<vote_start> values are the same; a difference between them can indicate
|
|
clock skew or a slow network connection between the two database server
|
|
machines. A small difference is not harmful. The third message reports the
|
|
current version number <db_version> of the database maintained by this
|
|
Ubik process. It has two fields separated by a period. The field before
|
|
the period is based on a timestamp that reflects when the database first
|
|
changed after the most recent coordinator election, and the field after
|
|
the period indicates the number of changes since the election.
|
|
|
|
The output continues with messages that differ depending on whether the
|
|
Ubik process is the coordinator or not.
|
|
|
|
=over 4
|
|
|
|
=item *
|
|
|
|
If there is only one database server machine, it is always the coordinator
|
|
(synchronization site), as indicated by the following message.
|
|
|
|
I am sync site forever (1 server)
|
|
|
|
=item *
|
|
|
|
If there are multiple database sites, and the B<-server> argument names
|
|
the coordinator (synchronization site), the output continues with the
|
|
following two messages.
|
|
|
|
I am sync site until <expiration> secs from now (at <date/time>)
|
|
(<#_sites> servers)
|
|
Recovery state <flags>
|
|
|
|
The first message (which is reported on one line) reports how much longer
|
|
the site remains coordinator even if the next attempt to maintain quorum
|
|
fails, and how many sites are participating in the quorum. The I<flags>
|
|
field in the second message is a hexadecimal number that indicates the
|
|
current state of the quorum. A value of C<1f> indicates complete database
|
|
synchronization, whereas a value of C<f> means that the coordinator has
|
|
the correct database but cannot contact all secondary sites to determine
|
|
if they also have it. Lesser values are acceptable if the B<udebug>
|
|
command is issued during coordinator election, but they denote a problem
|
|
if they persist. The individual flags have the following meanings:
|
|
|
|
=over 4
|
|
|
|
=item 0x1
|
|
|
|
This machine is the coordinator.
|
|
|
|
=item 0x2
|
|
|
|
The coordinator has determined which site has the database with the
|
|
highest version number.
|
|
|
|
=item 0x4
|
|
|
|
The coordinator has a copy of the database with the highest version
|
|
number.
|
|
|
|
=item 0x8
|
|
|
|
The database's version number has been updated correctly.
|
|
|
|
=item 0x10
|
|
|
|
All sites have the database with the highest version number.
|
|
|
|
=back
|
|
|
|
If the udebug command is issued while the coordinator is writing a change
|
|
into the database, the following additional message appears.
|
|
|
|
I am currently managing write transaction I<identifier>
|
|
|
|
=item *
|
|
|
|
If the B<-server> argument names a secondary site, the output continues
|
|
with the following messages.
|
|
|
|
I am not sync site
|
|
Lowest host <lowest_IP_addr> was set <low_time> secs ago
|
|
Sync host <sync_IP_addr> was set <sync_time> secs ago
|
|
|
|
The <lowest_IP_addr> is the lowest IP address of any peer from which the
|
|
Ubik process has received a message recently, whereas the <sync_IP_addr>
|
|
is the IP address of the current coordinator. If they differ, the machine
|
|
with the lowest IP address is not currently the coordinator. The Ubik
|
|
process continues voting for the current coordinator as long as they
|
|
remain in contact, which provides for maximum stability. However, in the
|
|
event of another coordinator election, this Ubik process votes for the
|
|
<lowest_IP_addr> site instead (assuming they are in contact), because it
|
|
has a bias to vote in elections for the site with the lowest IP address.
|
|
|
|
=back
|
|
|
|
For both the synchronization and secondary sites, the output continues
|
|
with the following messages. The first message reports the version number
|
|
of the database at the synchronization site, which needs to match the
|
|
<db_version> reported by the preceding C<Local db version> message. The
|
|
second message indicates how many VLDB records are currently locked for
|
|
any operation or for writing in particular. The values are nonzero if the
|
|
B<udebug> command is issued while an operation is in progress.
|
|
|
|
Sync site's db version is <db_version>
|
|
<locked> locked pages, <writes> of them for write
|
|
|
|
The following messages appear next only if there are any read or write
|
|
locks on database records:
|
|
|
|
There are read locks held
|
|
There are write locks held
|
|
|
|
Similarly, one or more of the following messages appear next only if there
|
|
are any read or write transactions in progress when the B<udebug> command
|
|
is issued:
|
|
|
|
There is an active write transaction
|
|
There is at least one active read transaction
|
|
Transaction tid is <tid>
|
|
|
|
If the machine named by the B<-server> argument is the coordinator, the
|
|
next message reports when the current coordinator last updated the
|
|
database.
|
|
|
|
Last time a new db version was labelled was:
|
|
<last_restart> secs ago (at <date/time>)
|
|
|
|
If the machine named by the B<-server> argument is the coordinator, the
|
|
output concludes with an entry for each secondary site that is
|
|
participating in the quorum, in the following format.
|
|
|
|
Server (<IP_address>): (db <db_version>)
|
|
last vote rcvd <last_vote> secs ago (at <date/time>),
|
|
last beacon sent <last_beacon> secs ago (at <date/time>),
|
|
last vote was { yes | no }
|
|
dbcurrent={ 0 | 1 }, up={ 0 | 1 } beaconSince={ 0 | 1 }
|
|
|
|
The first line reports the site's IP address and the version number of the
|
|
database it is maintaining. The <last_vote> field reports how long ago the
|
|
coordinator received a vote message from the Ubik process at the site, and
|
|
the <last_beacon> field how long ago the coordinator last requested a vote
|
|
message. If the B<udebug> command is issued during the coordinator
|
|
election process and voting has not yet begun, the following messages
|
|
appear instead.
|
|
|
|
Last vote never rcvd
|
|
Last beacon never sent
|
|
|
|
On the final line of each entry, the fields have the following meaning:
|
|
|
|
=over 4
|
|
|
|
=item *
|
|
|
|
C<dbcurrent> is C<1> if the site has the database with the highest version
|
|
number, C<0> if it does not.
|
|
|
|
=item *
|
|
|
|
C<up> is C<1> if the Ubik process at the site is functioning correctly,
|
|
C<0> if it is not.
|
|
|
|
=item *
|
|
|
|
C<beaconSince> is C<1> if the site has responded to the coordinator's last
|
|
request for votes, C<0> if it has not.
|
|
|
|
=back
|
|
|
|
Including the B<-long> flag produces peer entries even when the
|
|
B<-server> argument names a secondary site, but in that case only the
|
|
I<IP_address> field is guaranteed to be accurate. For example, the value
|
|
in the <db_version> field is usually C<0.0>, because secondary sites do
|
|
not poll their peers for this information. The values in the I<last_vote>
|
|
and I<last_beacon> fields indicate when this site last received or
|
|
requested a vote as coordinator; they generally indicate the time of the
|
|
last coordinator election.
|
|
|
|
=head1 EXAMPLES
|
|
|
|
This example checks the status of the Ubik process for the Volume Location
|
|
Server on the machine C<afs1>, which is the synchronization site.
|
|
|
|
% udebug afs1 vlserver
|
|
Host's addresses are: 192.12.107.33
|
|
Host's 192.12.107.33 time is Wed Oct 27 09:49:50 1999
|
|
Local time is Wed Oct 27 09:49:52 1999 (time differential 2 secs)
|
|
Last yes vote for 192.12.107.33 was 1 secs ago (sync site);
|
|
Last vote started 1 secs ago (at Wed Oct 27 09:49:51 1999)
|
|
Local db version is 940902602.674
|
|
I am sync site until 58 secs from now (at Wed Oct 27 09:50:50 1999) (3 servers)
|
|
Recovery state 1f
|
|
Sync site's db version is 940902602.674
|
|
0 locked pages, 0 of them for write
|
|
Last time a new db version was labelled was:
|
|
129588 secs ago (at Mon Oct 25 21:50:04 1999)
|
|
|
|
Server( 192.12.107.35 ): (db 940902602.674)
|
|
last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
|
|
last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
|
|
dbcurrent=1, up=1 beaconSince=1
|
|
|
|
Server( 192.12.107.34 ): (db 940902602.674)
|
|
last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
|
|
last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
|
|
dbcurrent=1, up=1 beaconSince=1
|
|
|
|
This example checks the status of the Authentication Server on the machine
|
|
with IP address 192.12.107.34, which is a secondary site. The local clock
|
|
is about 4 minutes behind the database server machine's clock.
|
|
|
|
% udebug 192.12.107.34 7004
|
|
Host's addresses are: 192.12.107.34
|
|
Host's 192.12.107.34 time is Wed Oct 27 09:54:15 1999
|
|
Local time is Wed Oct 27 09:50:08 1999 (time differential -247 secs)
|
|
****clock may be bad
|
|
Last yes vote for 192.12.107.33 was 6 secs ago (sync site);
|
|
Last vote started 6 secs ago (at Wed Oct 27 09:50:02 1999)
|
|
Local db version is 940906574.25
|
|
I am not sync site
|
|
Lowest host 192.12.107.33 was set 6 secs ago
|
|
Sync host 192.12.107.33 was set 6 secs ago
|
|
Sync site's db version is 940906574.25
|
|
0 locked pages, 0 of them for write
|
|
|
|
=head1 PRIVILEGE REQUIRED
|
|
|
|
None
|
|
|
|
=head1 SEE ALSO
|
|
|
|
L<buserver(8)>,
|
|
L<kaserver(8)>,
|
|
L<ptserver(8)>,
|
|
L<vlserver(8)>
|
|
|
|
=head1 COPYRIGHT
|
|
|
|
IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.
|
|
|
|
This documentation is covered by the IBM Public License Version 1.0. It was
|
|
converted from HTML to POD by software written by Chas Williams and Russ
|
|
Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.
|