openafs/doc/html/AdminReference/auarf237.htm
Derrick Brashear d7da1acc31 initial-html-documentation-20010606
pull in all documentation from IBM
2001-06-06 19:09:07 +00:00

326 lines
17 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 4//EN">
<HTML><HEAD>
<TITLE>Administration Reference</TITLE>
<!-- Begin Header Records ========================================== -->
<!-- /tmp/idwt3672/auarf000.scr converted by idb2h R4.2 (359) ID -->
<!-- Workbench Version (AIX) on 3 Oct 2000 at 16:18:30 -->
<META HTTP-EQUIV="updated" CONTENT="Tue, 03 Oct 2000 16:18:29">
<META HTTP-EQUIV="review" CONTENT="Wed, 03 Oct 2001 16:18:29">
<META HTTP-EQUIV="expires" CONTENT="Thu, 03 Oct 2002 16:18:29">
</HEAD><BODY>
<!-- (C) IBM Corporation 2000. All Rights Reserved -->
<BODY bgcolor="ffffff">
<!-- End Header Records ============================================ -->
<A NAME="Top_Of_Page"></A>
<H1>Administration Reference</H1>
<HR><P ALIGN="center"> <A HREF="../index.htm"><IMG SRC="../books.gif" BORDER="0" ALT="[Return to Library]"></A> <A HREF="auarf002.htm#ToC"><IMG SRC="../toc.gif" BORDER="0" ALT="[Contents]"></A> <A HREF="auarf236.htm"><IMG SRC="../prev.gif" BORDER="0" ALT="[Previous Topic]"></A> <A HREF="#Bot_Of_Page"><IMG SRC="../bot.gif" BORDER="0" ALT="[Bottom of Topic]"></A> <A HREF="auarf238.htm"><IMG SRC="../next.gif" BORDER="0" ALT="[Next Topic]"></A> <A HREF="auarf284.htm#HDRINDEX"><IMG SRC="../index.gif" BORDER="0" ALT="[Index]"></A> <P>
<P>
<H2><A NAME="HDRUDEBUG" HREF="auarf002.htm#ToC_251">udebug</A></H2>
<A NAME="IDX5484"></A>
<A NAME="IDX5485"></A>
<A NAME="IDX5486"></A>
<P><STRONG>Purpose</STRONG>
<P>Reports status of Ubik process associated with a database server process
<P><STRONG>Synopsis</STRONG>
<PRE><B>udebug -servers</B> &lt;<VAR>server&nbsp;machine</VAR>> [<B>-port</B> &lt;<VAR>IP&nbsp;port</VAR>>] [<B>-long</B>] [<B>-help</B>]
<B>udebug -s</B> &lt;<VAR>server&nbsp;machine</VAR>> [<B>-p</B> &lt;<VAR>IP&nbsp;port</VAR>>] [<B>-l</B>] [<B>-h</B>]
</PRE>
<P><STRONG>Description</STRONG>
<P>The <B>udebug</B> command displays the status of the lightweight Ubik
process for the database server process identified by the <B>-port</B>
argument that is running on the database server machine named by the
<B>-servers</B> argument. The output identifies the machines where
peer database server processes are running, which of them is the
synchronization site (Ubik coordinator), and the status of the connections
between them.
<P><STRONG>Options</STRONG>
<DL>
<P><DT><B>-servers
</B><DD>Names the database server machine that is running the process for which to
display status information. Provide the machine's IP address in
dotted decimal format, its fully qualified host name (for example,
<B>fs1.abc.com</B>), or the shortest abbreviated form of its
host name that distinguishes it from other machines. Successful use of
an abbreviated form depends on the availability of a name resolution service
(such as the Domain Name Service or a local host table) at the time the
command is issued.
<P><DT><B>-port
</B><DD>Identifies the database server process for which to display status
information, either by its process name or port number. Provide one of
the following values.
<DL>
<DD><P><B>buserver</B> or <B>7021</B> for the Backup Server
<DD><P><B>kaserver</B> or <B>7004</B> for the Authentication Server
<DD><P><B>ptserver</B> or <B>7002</B> for the Protection Server
<DD><P><B>vlserver</B> or <B>7003</B> for the Volume Location Server
</DL>
<P><DT><B>-long
</B><DD>Reports additional information about each peer of the machine named by the
<B>-servers</B> argument. The information appears by default if
that machine is the synchronization site.
<P><DT><B>-help
</B><DD>Prints the online help for this command. All other valid options
are ignored.
</DL>
<P><STRONG>Output</STRONG>
<P>Several of the messages in the output provide basic status information
about the Ubik process on the machine specified by the <B>-servers</B>
argument, and the remaining messages are useful mostly for debugging
purposes.
<P>To check basic Ubik status, issue the command for each database server
machine in turn. In the output for each, one of the following messages
appears in the top third of the output.
<PRE> I am sync site . . . (<VAR>#_sites</VAR> servers)
I am not sync site
</PRE>
<P>For the synchronization site, the following message indicates that all
sites have the same version of the database, which implies that Ubik is
functioning correctly. See the following for a description of values
other than <TT>1f</TT>.
<PRE> Recovery state 1f
</PRE>
<P>For correct Ubik operation, the database server machine clocks must agree
on the time. The following messages, which are the second and third
lines in the output, report the current date and time according to the
database server machine's clock and the clock on the machine where the
<B>udebug</B> command is issued.
<PRE> Host's <VAR>IP_addr</VAR> time is <VAR>dbserver_date/time</VAR>
Local time is <VAR>local_date/time</VAR> (time differential <VAR>skew</VAR> secs)
</PRE>
<P>The <VAR>skew</VAR> is the difference between the database server machine
clock and the local clock. Its absolute value is not vital for Ubik
functioning, but a difference of more than a few seconds between the
<VAR>skew</VAR> values for the database server machines indicates that their
clocks are not synchronized and Ubik performance is possibly hampered.
<P>Following is a description of all messages in the output. As noted,
it is useful mostly for debugging and most meaningful to someone who
understands Ubik's implementation.
<P>The output begins with the following messages. The first message
reports the IP addresses that are configured with the operating system on the
machine specified by the <B>-servers</B> argument. As previously
noted, the second and third messages report the current date and time
according to the clocks on the database server machine and the machine where
the <B>udebug</B> command is issued, respectively. All subsequent
timestamps in the output are expressed in terms of the local clock rather than
the database server machine clock.
<PRE> Host's addresses are: <VAR>list_of_IP_addrs</VAR>
Host's <VAR>IP_addr</VAR> time is <VAR>dbserver_date/time</VAR>
Local time is <VAR>local_date/time</VAR> (time differential <VAR>skew</VAR> secs)
</PRE>
<P>If the <VAR>skew</VAR> is more than about 10 seconds, the following message
appears. As noted, it does not necessarily indicate Ubik
malfunction: it denotes clock skew between the database server machine
and the local machine, rather than among the database server machines.
<PRE> ****clock may be bad
</PRE>
<P>If the <B>udebug</B> command is issued during the coordinator election
process and voting has not yet begun, the following message appears
next.
<PRE> Last yes vote not cast yet
</PRE>
<P>Otherwise, the output continues with the following messages.
<PRE> Last yes vote for <VAR>sync_IP_addr</VAR> was <VAR>last_vote</VAR> secs ago (sync site);
Last vote started <VAR>vote_start</VAR> secs ago (at <VAR>date/time</VAR>)
Local db version is <VAR>db_version</VAR>
</PRE>
<P>The first indicates which peer this Ubik process last voted for as
coordinator (it can vote for itself) and how long ago it sent the vote.
The second message indicates how long ago the Ubik coordinator requested
confirming votes from the secondary sites. Usually, the
<VAR>last_vote</VAR> and <VAR>vote_start</VAR> values are the same; a
difference between them can indicate clock skew or a slow network connection
between the two database server machines. A small difference is not
harmful. The third message reports the current version number
<VAR>db_version</VAR> of the database maintained by this Ubik process. It
has two fields separated by a period. The field before the period is
based on a timestamp that reflects when the database first changed after the
most recent coordinator election, and the field after the period indicates the
number of changes since the election.
<P>The output continues with messages that differ depending on whether the
Ubik process is the coordinator or not.
<UL>
<P><LI>If there is only one database server machine, it is always the coordinator
(synchronization site), as indicated by the following message.
<PRE> I am sync site forever (1 server)
</PRE>
<P><LI>If there are multiple database sites, and the <B>-servers</B> argument
names the coordinator (synchronization site), the output continues with the
following two messages.
<PRE> I am sync site until <VAR>expiration</VAR> secs from now (at <VAR>date/time</VAR>) (<VAR>#_sites</VAR> servers)
Recovery state <VAR>flags</VAR>
</PRE>
<P>The first message reports how much longer the site remains coordinator even
if the next attempt to maintain quorum fails, and how many sites are
participating in the quorum. The <VAR>flags</VAR> field in the second
message is a hexadecimal number that indicates the current state of the
quorum. A value of <TT>1f</TT> indicates complete database
synchronization, whereas a value of <TT>f</TT> means that the coordinator
has the correct database but cannot contact all secondary sites to determine
if they also have it. Lesser values are acceptable if the
<B>udebug</B> command is issued during coordinator election, but they
denote a problem if they persist. The individual flags have the
following meanings:
<DL>
<P><DT><B><TT>0x1</TT>
</B><DD>This machine is the coordinator
<P><DT><B><TT>0x2</TT>
</B><DD>The coordinator has determined which site has the database with the
highest version number
<P><DT><B><TT>0x4</TT>
</B><DD>The coordinator has a copy of the database with the highest version number
<P><DT><B><TT>0x8</TT>
</B><DD>The database's version number has been updated correctly
<P><DT><B><TT>0x10</TT>
</B><DD>All sites have the database with the highest version number
</DL>
<P>If the <B>udebug</B> command is issued while the coordinator is writing
a change into the database, the following additional message appears.
<PRE> I am currently managing write transaction <VAR>identifier</VAR>
</PRE>
<P><LI>If the <B>-servers</B> argument names a secondary site, the output
continues with the following messages.
<PRE> I am not sync site
Lowest host <VAR>lowest_IP_addr</VAR> was set <VAR>low_time</VAR> secs ago
Sync host <VAR>sync_IP_addr</VAR> was set <VAR>sync_time</VAR> secs ago
</PRE>
<P>The <VAR>lowest_IP_addr</VAR> is the lowest IP address of any peer from which
the Ubik process has received a message recently, whereas the
<VAR>sync_IP_addr</VAR> is the IP address of the current coordinator. If
they differ, the machine with the lowest IP address is not currently the
coordinator. The Ubik process continues voting for the current
coordinator as long as they remain in contact, which provides for maximum
stability. However, in the event of another coordinator election, this
Ubik process votes for the <VAR>lowest_IP_addr</VAR> site instead (assuming they
are in contact), because it has a bias to vote in elections for the site with
the lowest IP address.
</UL>
<P>For both the synchronization and secondary sites, the output continues with
the following messages. The first message reports the version number of
the database at the synchronization site, which needs to match the
<VAR>db_version</VAR> reported by the preceding <TT>Local db version</TT>
message. The second message indicates how many VLDB records are
currently locked for any operation or for writing in particular. The
values are nonzero if the <B>udebug</B> command is issued while an
operation is in progress.
<PRE> Sync site's db version is <VAR>db_version</VAR>
<VAR>locked</VAR> locked pages, <VAR>writes</VAR> of them for write
</PRE>
<P>The following messages appear next only if there are any read or write
locks on database records:
<PRE> There are read locks held
There are write locks held
</PRE>
<P>Similarly, one or more of the following messages appear next only if there
are any read or write transactions in progress when the <B>udebug</B>
command is issued:
<PRE> There is an active write transaction
There is at least one active read transaction
Transaction tid is <VAR>tid</VAR>
</PRE>
<P>If the machine named by the <B>-servers</B> argument is the
coordinator, the next message reports when the current coordinator last
updated the database.
<PRE> Last time a new db version was labelled was:
<VAR>last_restart</VAR> secs ago (at <VAR>date/time</VAR>)
</PRE>
<P>If the machine named by the <B>-servers</B> argument is the
coordinator, the output concludes with an entry for each secondary site that
is participating in the quorum, in the following format.
<PRE> Server( <VAR>IP_address</VAR> ): (db <VAR>db_version</VAR>)
last vote rcvd <VAR>last_vote</VAR> secs ago (at <VAR>date/time</VAR>),
last beacon sent <VAR>last_beacon</VAR> secs ago (at <VAR>date/time</VAR>), last vote was { yes | no }
dbcurrent={ 0 | 1 }, up={ 0 | 1 } beaconSince={ 0 | 1 }
</PRE>
<P>The first line reports the site's IP address and the version number of
the database it is maintaining. The <VAR>last_vote</VAR> field reports
how long ago the coordinator received a vote message from the Ubik process at
the site, and the <VAR>last_beacon</VAR> field how long ago the coordinator last
requested a vote message. If the <B>udebug</B> command is issued
during the coordinator election process and voting has not yet begun, the
following messages appear instead.
<PRE> Last vote never rcvd
Last beacon never sent
</PRE>
<P>On the final line of each entry, the fields have the following
meaning:
<UL>
<P><LI><TT>dbcurrent</TT> is <TT>1</TT> if the site has the database with the
highest version number, <TT>0</TT> if it does not
<P><LI><TT>up</TT> is <TT>1</TT> if the Ubik process at the site is
functioning correctly, <TT>0</TT> if it is not
<P><LI><TT>beaconSince</TT> is <TT>1</TT> if the site has responded to the
coordinator's last request for votes, <TT>0</TT> if it has not
</UL>
<P>Including the <B>-long</B> flag produces peer entries even when the
<B>-servers</B> argument names a secondary site, but in that case only the
<VAR>IP_address</VAR> field is guaranteed to be accurate. For example,
the value in the <VAR>db_version</VAR> field is usually <TT>0.0</TT>,
because secondary sites do not poll their peers for this information.
The values in the <VAR>last_vote</VAR> and <VAR>last_beacon</VAR> fields indicate
when this site last received or requested a vote as coordinator; they
generally indicate the time of the last coordinator election.
<P><STRONG>Examples</STRONG>
<P>This example checks the status of the Ubik process for the Volume Location
Server on the machine <B>afs1</B>, which is the synchronization
site.
<PRE> % <B>udebug afs1 vlserver</B>
Host's addresses are: 192.12.107.33
Host's 192.12.107.33 time is Wed Oct 27 09:49:50 1999
Local time is Wed Oct 27 09:49:52 1999 (time differential 2 secs)
Last yes vote for 192.12.107.33 was 1 secs ago (sync site);
Last vote started 1 secs ago (at Wed Oct 27 09:49:51 1999)
Local db version is 940902602.674
I am sync site until 58 secs from now (at Wed Oct 27 09:50:50 1999) (3 servers)
Recovery state 1f
Sync site's db version is 940902602.674
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
129588 secs ago (at Mon Oct 25 21:50:04 1999)
Server( 192.12.107.35 ): (db 940902602.674)
last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
dbcurrent=1, up=1 beaconSince=1
Server( 192.12.107.34 ): (db 940902602.674)
last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
dbcurrent=1, up=1 beaconSince=1
</PRE>
<P>This example checks the status of the Authentication Server on the machine
with IP address 192.12.107.34, which is a secondary
site. The local clock is about 4 minutes behind the database server
machine's clock.
<PRE> % <B>udebug 192.12.107.34 7004</B>
Host's addresses are: 192.12.107.34
Host's 192.12.107.34 time is Wed Oct 27 09:54:15 1999
Local time is Wed Oct 27 09:50:08 1999 (time differential -247 secs)
****clock may be bad
Last yes vote for 192.12.107.33 was 6 secs ago (sync site);
Last vote started 6 secs ago (at Wed Oct 27 09:50:02 1999)
Local db version is 940906574.25
I am not sync site
Lowest host 192.12.107.33 was set 6 secs ago
Sync host 192.12.107.33 was set 6 secs ago
Sync site's db version is 940906574.25
0 locked pages, 0 of them for write
</PRE>
<P><STRONG>Privilege Required</STRONG>
<P><STRONG>Related Information</STRONG>
<P><A HREF="auarf125.htm#HDRBUSERVER">buserver</A>
<P><A HREF="auarf198.htm#HDRKASERVER">kaserver</A>
<P><A HREF="auarf227.htm#HDRPTSERVER">ptserver</A>
<P><A HREF="auarf249.htm#HDRVLSERVER">vlserver</A>
<P>
<HR><P ALIGN="center"> <A HREF="../index.htm"><IMG SRC="../books.gif" BORDER="0" ALT="[Return to Library]"></A> <A HREF="auarf002.htm#ToC"><IMG SRC="../toc.gif" BORDER="0" ALT="[Contents]"></A> <A HREF="auarf236.htm"><IMG SRC="../prev.gif" BORDER="0" ALT="[Previous Topic]"></A> <A HREF="#Top_Of_Page"><IMG SRC="../top.gif" BORDER="0" ALT="[Top of Topic]"></A> <A HREF="auarf238.htm"><IMG SRC="../next.gif" BORDER="0" ALT="[Next Topic]"></A> <A HREF="auarf284.htm#HDRINDEX"><IMG SRC="../index.gif" BORDER="0" ALT="[Index]"></A> <P>
<!-- Begin Footer Records ========================================== -->
<P><HR><B>
<br>&#169; <A HREF="http://www.ibm.com/">IBM Corporation 2000.</A> All Rights Reserved
</B>
<!-- End Footer Records ============================================ -->
<A NAME="Bot_Of_Page"></A>
</BODY></HTML>