<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 4//EN">
|
|
<HTML><HEAD>
|
|
<TITLE>Administration Guide</TITLE>
|
|
<!-- Begin Header Records ========================================== -->
|
|
<!-- /tmp/idwt3570/auagd000.scr converted by idb2h R4.2 (359) ID -->
|
|
<!-- Workbench Version (AIX) on 2 Oct 2000 at 11:42:14 -->
|
|
<META HTTP-EQUIV="updated" CONTENT="Mon, 02 Oct 2000 11:42:13">
|
|
<META HTTP-EQUIV="review" CONTENT="Tue, 02 Oct 2001 11:42:13">
|
|
<META HTTP-EQUIV="expires" CONTENT="Wed, 02 Oct 2002 11:42:13">
|
|
</HEAD><BODY>
|
|
<!-- (C) IBM Corporation 2000. All Rights Reserved -->
|
|
<BODY bgcolor="ffffff">
|
|
<!-- End Header Records ============================================ -->
|
|
<A NAME="Top_Of_Page"></A>
|
|
<H1>Administration Guide</H1>
|
|
<HR><P ALIGN="center"> <A HREF="../index.htm"><IMG SRC="../books.gif" BORDER="0" ALT="[Return to Library]"></A> <A HREF="auagd002.htm#ToC"><IMG SRC="../toc.gif" BORDER="0" ALT="[Contents]"></A> <A HREF="auagd012.htm"><IMG SRC="../prev.gif" BORDER="0" ALT="[Previous Topic]"></A> <A HREF="#Bot_Of_Page"><IMG SRC="../bot.gif" BORDER="0" ALT="[Bottom of Topic]"></A> <A HREF="auagd014.htm"><IMG SRC="../next.gif" BORDER="0" ALT="[Next Topic]"></A> <A HREF="auagd026.htm#HDRINDEX"><IMG SRC="../index.gif" BORDER="0" ALT="[Index]"></A> <P>
|
|
<HR><H1><A NAME="HDRWQ323" HREF="auagd002.htm#ToC_360">Monitoring and Auditing AFS Performance</A></H1>
|
|
<A NAME="IDX7094"></A>
|
|
<A NAME="IDX7095"></A>
|
|
<A NAME="IDX7096"></A>
|
|
<A NAME="IDX7097"></A>
|
|
<A NAME="IDX7098"></A>
|
|
<A NAME="IDX7099"></A>
|
|
<A NAME="IDX7100"></A>
|
|
<A NAME="IDX7101"></A>
|
|
<P>AFS comes with three main monitoring tools:
|
|
<UL>
|
|
<P><LI>The <B>scout</B> program, which monitors and gathers statistics on
|
|
File Server performance.
|
|
<P><LI>The <B>fstrace</B> command suite, which traces Cache Manager
|
|
operations in detail.
|
|
<P><LI>The <B>afsmonitor</B> program, which monitors and gathers statistics
|
|
on both the File Server and the Cache Manager.
|
|
</UL>
|
|
<P>AFS also provides a tool for auditing AFS events on file server machines
|
|
running AIX.
|
|
<HR><H2><A NAME="HDRWQ324" HREF="auagd002.htm#ToC_361">Summary of Instructions</A></H2>
|
|
<P>This chapter explains how to perform the following tasks by
|
|
using the indicated commands:
|
|
<BR>
|
|
<TABLE WIDTH="100%">
|
|
<TR>
|
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="70%">Initialize the <B>scout</B> program
|
|
</TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="30%"><B>scout</B>
|
|
</TD></TR><TR>
|
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="70%">Display information about a trace log
|
|
</TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="30%"><B>fstrace lslog</B>
|
|
</TD></TR><TR>
|
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="70%">Display information about an event set
|
|
</TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="30%"><B>fstrace lsset</B>
|
|
</TD></TR><TR>
|
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="70%">Change the size of a trace log
|
|
</TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="30%"><B>fstrace setlog</B>
|
|
</TD></TR><TR>
|
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="70%">Set the state of an event set
|
|
</TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="30%"><B>fstrace setset</B>
|
|
</TD></TR><TR>
|
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="70%">Dump contents of a trace log
|
|
</TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="30%"><B>fstrace dump</B>
|
|
</TD></TR><TR>
|
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="70%">Clear a trace log
|
|
</TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="30%"><B>fstrace clear</B>
|
|
</TD></TR><TR>
|
|
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="70%">Initialize the <B>afsmonitor</B> program
|
|
</TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="30%"><B>afsmonitor</B>
|
|
</TD></TR></TABLE>
|
|
<HR><H2><A NAME="HDRWQ326" HREF="auagd002.htm#ToC_362">Using the scout Program</A></H2>
|
|
<A NAME="IDX7102"></A>
|
|
<P>The <B>scout</B> program monitors the status of the File Server process
|
|
running on file server machines. It periodically collects statistics
|
|
from a specified set of File Server processes, displays them in a graphical
|
|
format, and alerts you if any of the statistics exceed a configurable
|
|
threshold.
|
|
<P>More specifically, the <B>scout</B> program includes the following
|
|
features.
|
|
<UL>
|
|
<P><LI>You can monitor, from a single location, the File Server process on any
|
|
number of server machines from the local and foreign cells. The number
|
|
is limited only by the size of the display window, which must be large enough
|
|
to display the statistics.
|
|
<P><LI>You can set a threshold for many of the statistics. When the value
|
|
of a statistic exceeds the threshold, the <B>scout</B> program highlights
|
|
it (displays it in reverse video) to draw your attention to it. If the
|
|
value goes back under the threshold, the highlighting is deactivated.
|
|
You control the thresholds, so highlighting reflects what you consider to be a
|
|
noteworthy situation. See <A HREF="#HDRWQ332">Highlighting Significant Statistics</A>.
|
|
<P><LI>The <B>scout</B> program alerts you to File Server process, machine,
|
|
and network outages by highlighting the name of each machine that does not
|
|
respond to its probe, enabling you to respond more quickly.
|
|
<P><LI>You can set how often the <B>scout</B> program collects statistics
|
|
from the File Server processes.
|
|
</UL>
|
|
<P><H3><A NAME="HDRWQ327" HREF="auagd002.htm#ToC_363">System Requirements</A></H3>
|
|
<A NAME="IDX7103"></A>
|
|
<A NAME="IDX7104"></A>
|
|
<A NAME="IDX7105"></A>
|
|
<A NAME="IDX7106"></A>
|
|
<A NAME="IDX7107"></A>
|
|
<A NAME="IDX7108"></A>
|
|
<A NAME="IDX7109"></A>
|
|
<P>The <B>scout</B> program runs on any AFS client machine that has access
|
|
to the <B>curses</B> graphics package, which most UNIX distributions
|
|
include as a standard utility. It can run on both dumb terminals and
|
|
under windowing systems that emulate terminals, but the output looks best on
|
|
machines that support reverse video and cursor addressing. For best
|
|
results, set the TERM environment variable to the correct terminal type, or
|
|
one with characteristics similar to the actual ones. For machines
|
|
running AIX, the recommended TERM setting is <B>vt100</B>, assuming the
|
|
terminal is similar to that. For other operating systems, the wider
|
|
range of acceptable values includes <B>xterm</B>, <B>xterms</B>,
|
|
<B>vt100</B>, <B>vt200</B>, and <B>wyse85</B>.
|
|
<A NAME="IDX7110"></A>
|
|
<P>No privilege is required to run the <B>scout</B> program, so any user
|
|
who can access the directory where its binary resides (the
|
|
<B>/usr/afsws/bin</B> directory in the conventional configuration) can use
|
|
it. The program's probes for collecting statistics do not impose a
|
|
significant burden on the File Server process, but you can restrict its use by
|
|
placing the binary file in a directory with a more restrictive access control
|
|
list (ACL).
|
|
<P>Multiple instances of the <B>scout</B> program can run on a single
|
|
client machine, each over its own dedicated connection (in its own
|
|
window). It must run in the foreground, so the window in which it runs
|
|
does not accept further input except for an interrupt signal.
|
|
<P>You can also run the <B>scout</B> program on several machines and view
|
|
its output on a single machine, by opening telnet connections to the other
|
|
machines from the central one and initializing the program in each remote
|
|
window. In this case, you can include the <B>-host</B> flag to the
|
|
<B>scout</B> command to make the name of each remote machine appear in the
|
|
<I>banner line</I> at the top of the window displaying its output.
|
|
See <A HREF="#HDRWQ330">The Banner Line</A>.
|
|
<P><H3><A NAME="HDRWQ328" HREF="auagd002.htm#ToC_364">Using the -basename argument to Specify a Domain Name</A></H3>
|
|
<A NAME="IDX7111"></A>
|
|
<A NAME="IDX7112"></A>
|
|
<P>As previously mentioned, the <B>scout</B> program can monitor the File
|
|
Server process on any number of file server machines. If all of the
|
|
machines belong to the same cell, then their hostnames probably all have the
|
|
same domain name suffix, such as <B>abc.com</B> in the ABC
|
|
Corporation cell. In this case, you can use the <B>-basename</B>
|
|
argument to the <B>scout</B> command, which has several advantages:
|
|
<UL>
|
|
<P><LI>You can omit the domain name suffix as you enter each file server
|
|
machine's name on the command line. The <B>scout</B> program
|
|
automatically appends the domain name to each machine's name, resulting
|
|
in a fully-qualified hostname. You can omit the domain name suffix even
|
|
when you don't include the <B>-basename</B> argument, but in that
|
|
case correct resolution of the name depends on the state of your cell's
|
|
naming service at the time of connection.
|
|
<P><LI>The machine names are more likely to fit in the appropriate column of the
|
|
display without having to be truncated (for more on truncating names in the
|
|
display column, see <A HREF="#HDRWQ331">The Statistics Display Region</A>).
|
|
<P><LI>The domain name appears in the banner line at the top of the display
|
|
window to indicate the name of the cell you are monitoring.
|
|
</UL>
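<P>The name expansion described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the <B>scout</B> program's own code; the function name <TT>qualify</TT> is hypothetical.

```python
def qualify(server_names, basename=None):
    """Return the hostname scout would try to contact for each -server value.

    Illustrative sketch only: it mirrors the behavior described in the
    text, not the actual scout source.
    """
    if not basename:
        # Without -basename the names are used as typed, and expanding
        # them is left to the cell's naming service.
        return list(server_names)
    # The -basename value is given without its leading period, so the
    # separator is supplied here.
    return [f"{name}.{basename}" for name in server_names]


print(qualify(["fs1", "fs2"], "abc.com"))
```

<P>With the <B>abc.com</B> basename, the short names <B>fs1</B> and <B>fs2</B> become <B>fs1.abc.com</B> and <B>fs2.abc.com</B>.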
|
|
<P><H3><A NAME="HDRWQ329" HREF="auagd002.htm#ToC_365">The Layout of the scout Display</A></H3>
|
|
<A NAME="IDX7113"></A>
|
|
<A NAME="IDX7114"></A>
|
|
<P>The <B>scout</B> program can display statistics either in a dedicated
|
|
window or on a plain screen if a windowing environment is not
|
|
available. For best results, use a window or screen that can print in
|
|
reverse video and do cursor addressing.
|
|
<P>The <B>scout</B> program screen has three main regions: the
|
|
<I>banner line</I>, the <I>statistics display region</I> and the
|
|
<I>probe/message line</I>. This section describes their contents,
|
|
and graphic examples appear in <A HREF="#HDRWQ336">Example Commands and Displays</A>.
|
|
<P><H4><A NAME="HDRWQ330">The Banner Line</A></H4>
|
|
<A NAME="IDX7115"></A>
|
|
<A NAME="IDX7116"></A>
|
|
<P>By default, the string <TT>scout</TT> appears in the banner line at the
|
|
top of the window or screen, to indicate that the <B>scout</B> program is
|
|
running. You can display two additional types of information by including
|
|
the appropriate option on the command line:
|
|
<UL>
|
|
<P><LI>Include the <B>-host</B> flag to display the local machine's name
|
|
in the banner line. This is particularly useful when you are running
|
|
the <B>scout</B> program on several machines but displaying the results on
|
|
a single machine.
|
|
<P>For example, the following banner line appears when you run the
|
|
<B>scout</B> program on the machine
|
|
<B>client1.abc.com</B> and use the <B>-host</B>
|
|
flag:
|
|
<PRE> [client1.abc.com] scout
</PRE>
|
|
<P><LI>Include the <B>-basename</B> argument to display the specified cell
|
|
domain name in the banner line. For further discussion, see <A HREF="#HDRWQ328">Using the -basename argument to Specify a Domain Name</A>.
|
|
<P>For example, if you specify a value of <B>abc.com</B> for the
|
|
<B>-basename</B> argument, the banner line reads:
|
|
<PRE> scout for abc.com
</PRE>
|
|
</UL>
|
|
<P><H4><A NAME="HDRWQ331">The Statistics Display Region</A></H4>
|
|
<A NAME="IDX7117"></A>
|
|
<A NAME="IDX7118"></A>
|
|
<P>The statistics display region occupies most of the window and is divided
|
|
into six columns. The following list describes them as they appear from
|
|
left to right in the window.
|
|
<DL>
|
|
<P><DT><B><TT>Conn</TT>
|
|
<A NAME="IDX7119"></A>
|
|
</B><DD>Displays the number of RPC connections open between the File Server
|
|
process and client machines. This number normally equals or exceeds the
|
|
number in the <TT>Ws</TT> (fourth) column. It can exceed the number in
|
|
that column because each user on the machine can have more than one connection
|
|
open at once, and one client machine can handle several users.
|
|
<P><DT><B><TT>Fetch</TT>
|
|
<A NAME="IDX7120"></A>
|
|
</B><DD>Displays the number of fetch-type RPCs (fetch data, fetch access list, and
|
|
fetch status) that the File Server process has received from client machines
|
|
since it started. It resets to zero when the File Server process
|
|
restarts.
|
|
<P><DT><B><TT>Store</TT>
|
|
<A NAME="IDX7121"></A>
|
|
</B><DD>Displays the number of store-type RPCs (store data, store access list, and
|
|
store status) that the File Server process has received from client machines
|
|
since it started. It resets to zero when the File Server process
|
|
restarts.
|
|
<P><DT><B><TT>Ws</TT>
|
|
<A NAME="IDX7122"></A>
|
|
<A NAME="IDX7123"></A>
|
|
<A NAME="IDX7124"></A>
|
|
</B><DD>Displays the number of client machines (workstations) that have
|
|
communicated with the File Server process within the last 15 minutes (such
|
|
machines are termed <I>active</I>). This number is likely to be
|
|
smaller than the number in the <TT>Conn</TT> column because a single client
|
|
machine can have several connections open to one File Server process.
|
|
<P><DT><B>[Unlabeled column]
|
|
</B><DD>Displays the name of the file server machine on which the File Server
|
|
process is running. It is 12 characters wide. Longer names are
|
|
truncated and an asterisk (<TT>*</TT>) appears as the last character in the
|
|
name. If all machines have the same domain name suffix, you can use the
|
|
<B>-basename</B> argument to decrease the need for truncation; see <A HREF="#HDRWQ328">Using the -basename argument to Specify a Domain Name</A>.
|
|
<P><DT><B><TT>Disk attn</TT>
|
|
<A NAME="IDX7125"></A>
|
|
<A NAME="IDX7126"></A>
|
|
<A NAME="IDX7127"></A>
|
|
<A NAME="IDX7128"></A>
|
|
</B><DD>Displays the number of kilobyte blocks available on up to 26 of the file
|
|
server machine's AFS server (<B>/vicep</B>) partitions. The
|
|
display for each partition has the following format:
|
|
<PRE> <VAR>partition_letter</VAR>:<VAR>free_blocks</VAR>
</PRE>
|
|
<P>
|
|
<P>For example, <TT>a:8949</TT> indicates that partition
|
|
<B>/vicepa</B> has 8,949 KB free. If the window is not wide enough
|
|
for all partition entries to appear on a single line, the <B>scout</B>
|
|
program automatically stacks the partition entries into subcolumns within the
|
|
sixth column.
|
|
<P>The label on the <TT>Disk attn</TT> column indicates the threshold value
|
|
at which entries in the column become highlighted. By default, the
|
|
<B>scout</B> program highlights a partition that is over 95% full, in
|
|
which case the label is as follows:
|
|
<PRE> Disk attn: > 95% used
</PRE>
|
|
<P>
|
|
<P>For more on this threshold and its effect on highlighting, see <A HREF="#HDRWQ332">Highlighting Significant Statistics</A>.
|
|
</DL>
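<P>The truncation rule for the machine name column can be illustrated as follows. The sketch assumes that the asterisk counts toward the 12-character width, which the text implies but does not state outright; <TT>fit_name</TT> is a hypothetical helper, not part of AFS.

```python
NAME_COLUMN_WIDTH = 12  # column width given in the description above


def fit_name(name, width=NAME_COLUMN_WIDTH):
    """Shorten a machine name so that it fits the name column.

    Assumption: a truncated name keeps width - 1 characters and ends
    with an asterisk, so the result is exactly `width` characters long.
    """
    if len(name) > width:
        return name[: width - 1] + "*"
    return name


for host in ("fs1", "fs1.abc.com", "server3.stateu.edu"):
    print(fit_name(host))
```

<P>Short names such as <B>fs1</B> pass through unchanged, which is why using the <B>-basename</B> argument reduces the need for truncation.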
|
|
<P>For all columns except the fifth (file server machine name), you can use
|
|
the <B>-attention</B> argument to set a threshold value above which the
|
|
<B>scout</B> program highlights the statistic. By default, only
|
|
values in the fifth and sixth columns ever become highlighted. For
|
|
instructions on using the <B>-attention</B> argument, see <A HREF="#HDRWQ332">Highlighting Significant Statistics</A>.
|
|
<P><H4><A NAME="Header_368">The Probe Reporting Line</A></H4>
|
|
<A NAME="IDX7129"></A>
|
|
<A NAME="IDX7130"></A>
|
|
<P>The bottom line of the display indicates how many times the
|
|
<B>scout</B> program has probed the File Server processes for
|
|
statistics. The statistics gathered in the latest probe appear in the
|
|
statistics display region. By default, the <B>scout</B> program
|
|
probes the File Servers every 60 seconds, but you can use the
|
|
<B>-frequency</B> argument to specify a different probe frequency.
|
|
<P><H3><A NAME="HDRWQ332" HREF="auagd002.htm#ToC_369">Highlighting Significant Statistics</A></H3>
|
|
<A NAME="IDX7131"></A>
|
|
<A NAME="IDX7132"></A>
|
|
<A NAME="IDX7133"></A>
|
|
<A NAME="IDX7134"></A>
|
|
<P>To draw your attention to a statistic that currently exceeds a threshold
|
|
value, the <B>scout</B> program displays it in reverse video (highlights
|
|
it). You can set the threshold value for most statistics, and so
|
|
determine which values are worthy of special attention and which are
|
|
normal.
|
|
<P><H4><A NAME="HDRWQ333">Highlighting Server Outages</A></H4>
|
|
<A NAME="IDX7135"></A>
|
|
<A NAME="IDX7136"></A>
|
|
<A NAME="IDX7137"></A>
|
|
<A NAME="IDX7138"></A>
|
|
<A NAME="IDX7139"></A>
|
|
<P>The only column in which you cannot control highlighting is the fifth,
|
|
which identifies the file server machine for which statistics are displayed in
|
|
the other columns. The <B>scout</B> program uses highlighting in
|
|
this column to indicate that the File Server process on a machine fails to
|
|
respond to its probe, and automatically blanks out the other columns.
|
|
Failure to respond to the probe can indicate a File Server process, file
|
|
server machine, or network outage, so the highlighting draws your attention to
|
|
a situation that is probably interrupting service to users.
|
|
<P>When the File Server process once again responds to the probes, its name
|
|
appears normally and statistics reappear in the other columns. If all
|
|
machine names become highlighted at once, a possible network outage has
|
|
disrupted the connection between the file server machines and the client
|
|
machine running the <B>scout</B> program.
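<P>The reasoning in this section can be written as a small decision function. It is a hedged sketch of how an administrator might interpret a set of probe results, not something the <B>scout</B> program itself reports; the function name and message strings are invented for illustration.

```python
def interpret_probe_results(responded):
    """Suggest what a set of probe results most likely means.

    `responded` maps each monitored machine name to True (the File
    Server process answered the probe) or False (its name would be
    highlighted). Hypothetical helper for illustration only.
    """
    down = sorted(name for name, ok in responded.items() if not ok)
    if not down:
        return "all File Server processes responded"
    if len(down) == len(responded) and len(responded) > 1:
        # Every name highlighted at once points at the network path
        # between the scout client and the servers.
        return "no machine responded: possible network outage"
    return "no response from: " + ", ".join(down)


print(interpret_probe_results({"fs1.abc.com": True, "sv7.def.com": False}))
```

<P>A single unresponsive machine can still mean a process, machine, or network problem; the sketch only separates that case from the one where every machine is unreachable.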
|
|
<P><H4><A NAME="Header_371">Highlighting for Extreme Statistic Values</A></H4>
|
|
<P>To set the threshold value for one or more of the five
|
|
statistics-displaying columns, use the <B>-attention</B> argument.
|
|
The threshold value applies to all File Server processes you are monitoring
|
|
(you cannot set different thresholds for different machines). For
|
|
details, see the syntax description in <A HREF="#HDRWQ335">To start the scout program</A>.
|
|
<P>It is not possible to change the threshold values for a running
|
|
<B>scout</B> program. Stop the current program and start a new
|
|
one. Also, the <B>scout</B> program does not retain threshold
|
|
values across restarts, so you must specify all thresholds every time you
|
|
start the program.
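<P>The highlighting rules can be summarized in code. The following Python sketch is an illustration under stated assumptions, not the program's implementation: it assumes the total size of a partition is known (the display itself shows only free blocks), and the partition size used in the example is made up.

```python
def is_highlighted(value, threshold=None):
    """Conn, Fetch, Store and Ws: highlight when the value exceeds the
    threshold. These statistics have no default threshold, so None
    means the value is never highlighted."""
    return threshold is not None and value > threshold


def disk_is_highlighted(free_kb, total_kb, percent_full=95, min_blocks=None):
    """Disk attn: highlight on either kind of threshold.

    If min_blocks is given, it replaces the percentage test, as the
    -attention syntax allows only one disk threshold at a time.
    """
    if min_blocks is not None:
        return min_blocks > free_kb
    used_percent = 100 * (total_kb - free_kb) / total_kb
    return used_percent > percent_full


# Hypothetical partition: 200,000 KB in total with 8,949 KB free.
print(disk_is_highlighted(8949, 200000))                   # default 95% rule
print(disk_is_highlighted(8949, 200000, min_blocks=5000))  # 5000-block rule
```

<P>The same partition is highlighted under the default percentage rule but not under a 5000-block minimum, because it is more than 95% full yet still has more than 5000 KB free.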
|
|
<P><H3><A NAME="HDRWQ334" HREF="auagd002.htm#ToC_372">Resizing the scout Display</A></H3>
|
|
<A NAME="IDX7140"></A>
|
|
<A NAME="IDX7141"></A>
|
|
<A NAME="IDX7142"></A>
|
|
<P>Do not resize the display window while the <B>scout</B> program is
|
|
running. Increasing the size does no harm, but the <B>scout</B>
|
|
program does not necessarily adjust to the new dimensions. Decreasing
|
|
the display's width can disturb column alignment, making the display
|
|
harder to read. With any type of resizing, the <B>scout</B> program
|
|
does not adjust the display in any way until it displays the results of the
|
|
next probe.
|
|
<P>To resize the display effectively, stop the <B>scout</B> program,
|
|
resize the window and then restart the program. Even in this case, the
|
|
<B>scout</B> program's response depends on the accuracy of the
|
|
information it receives from the display environment. Testing during
|
|
development has shown that the display environment does not reliably provide
|
|
information about window resizing. If you use the X windowing system,
|
|
issuing the following sequence of commands before starting the
|
|
<B>scout</B> program (or placing them in the shell initialization file)
|
|
sometimes makes it adjust properly to resizing.
|
|
<PRE> % <B>set noglob</B>
 % <B>eval `/usr/bin/X11/resize`</B>
 % <B>unset noglob</B>
</PRE>
|
|
<A NAME="IDX7143"></A>
|
|
<A NAME="IDX7144"></A>
|
|
<A NAME="IDX7145"></A>
|
|
<A NAME="IDX7146"></A>
|
|
<A NAME="IDX7147"></A>
|
|
<P><H3><A NAME="HDRWQ335" HREF="auagd002.htm#ToC_373">To start the scout program</A></H3>
|
|
<OL TYPE=1>
|
|
<P><LI>Open a dedicated command shell. If necessary, adjust it to the
|
|
appropriate size.
|
|
<P><LI>Issue the <B>scout</B> command to start the program.
|
|
<PRE> % <B>scout</B> [<B>initcmd</B>] <B>-server</B> &lt;<VAR>FileServer name(s) to monitor</VAR>&gt;<SUP>+</SUP> \
 [<B>-basename</B> &lt;<VAR>base server name</VAR>&gt;] \
 [<B>-frequency</B> &lt;<VAR>poll frequency, in seconds</VAR>&gt;] [<B>-host</B>] \
 [<B>-attention</B> &lt;<VAR>specify attention (highlighting) level</VAR>&gt;<SUP>+</SUP>] \
 [<B>-debug</B> &lt;<VAR>turn debugging output on to the named file</VAR>&gt;]
</PRE>
|
|
<P>where
|
|
<DL>
|
|
<P><DT><B>initcmd
|
|
</B><DD>Is an optional string that accommodates the command's use of the AFS
|
|
command parser. It can be omitted and ignored.
|
|
<P><DT><B>-server
|
|
</B><DD>Identifies each File Server process to monitor, by naming the file server
|
|
machine it is running on. Provide fully-qualified hostnames unless the
|
|
<B>-basename</B> argument is used. In that case, specify only the
|
|
initial part of each machine name, omitting the domain name suffix common to
|
|
all the machine names.
|
|
<P><DT><B>-basename
|
|
</B><DD>Specifies the domain name suffix common to all of the file server machines
|
|
named by the <B>-server</B> argument. For discussion of this
|
|
argument's effects, see <A HREF="#HDRWQ328">Using the -basename argument to Specify a Domain Name</A>.
|
|
<P>Do not include the period that separates the domain suffix from the initial
|
|
part of the machine name, but do include any periods that occur within the
|
|
suffix itself. (For example, in the ABC Corporation cell, the proper
|
|
value is <B>abc.com</B>, not
|
|
<B>.abc.com</B>.)
|
|
<P><DT><B>-frequency
|
|
</B><DD>Sets the frequency, in seconds, of the <B>scout</B> program's
|
|
probes to File Server processes. Specify an integer greater than 0
|
|
(zero). The default is 60 seconds.
|
|
<P><DT><B>-host
|
|
</B><DD>Displays the name of the machine that is running the <B>scout</B>
|
|
program in the display window's banner line. By default, no
|
|
machine name is displayed.
|
|
<P><DT><B>-attention
|
|
</B><DD>Defines the threshold value at which to highlight one or more
|
|
statistics. You can provide the pairs of statistic and threshold in any
|
|
order, separating each pair and the parts of each pair with one or more
|
|
spaces. The following list defines the syntax for each
|
|
statistic.
|
|
<A NAME="IDX7148"></A>
|
|
<A NAME="IDX7149"></A>
|
|
<A NAME="IDX7150"></A>
|
|
<DL>
|
|
<P><DT><B>conn <VAR>connections</VAR>
|
|
</B><DD>Highlights the value in the <TT>Conn</TT> (first) column when the number
|
|
of connections that the File Server has open to client machines exceeds the
|
|
<VAR>connections</VAR> value. The highlighting deactivates when the value
|
|
goes back below the threshold. There is no default threshold.
|
|
<P><DT><B>fetch <VAR>fetch_RPCs</VAR>
|
|
</B><DD>Highlights the value in the <TT>Fetch</TT> (second) column when the
|
|
number of fetch RPCs that clients have made to the File Server process exceeds
|
|
the <VAR>fetch_RPCs</VAR> value. The highlighting deactivates only when
|
|
the File Server process restarts, at which time the value returns to
|
|
zero. There is no default threshold.
|
|
<P><DT><B>store <VAR>store_RPCs</VAR>
|
|
</B><DD>Highlights the value in the <TT>Store</TT> (third) column when the
|
|
number of store RPCs that clients have made to the File Server process exceeds
|
|
the <VAR>store_RPCs</VAR> value. The highlighting deactivates only when
|
|
the File Server process restarts, at which time the value returns to
|
|
zero. There is no default threshold.
|
|
<P><DT><B>ws <VAR>active_clients</VAR>
|
|
</B><DD>Highlights the value in the <TT>Ws</TT> (fourth) column when the number
|
|
of active client machines (those that have contacted the File Server in the
|
|
last 15 minutes) exceeds the <VAR>active_clients</VAR> value. The
|
|
highlighting deactivates when the value goes back below the threshold.
|
|
There is no default threshold.
|
|
<P><DT><B>disk <VAR>percent_full</VAR> % or disk <VAR>min_blocks</VAR>
|
|
</B><DD>Highlights the value for a partition in the <TT>Disk attn</TT> (sixth)
|
|
column when either the amount of disk space used exceeds the percentage
|
|
indicated by the <VAR>percent_full</VAR> value, or the number of free KB blocks
|
|
is less than the <VAR>min_blocks</VAR> value. The highlighting
|
|
deactivates when the value goes back below the <VAR>percent_full</VAR> threshold
|
|
or above the <VAR>min_blocks</VAR> threshold.
|
|
<P>The value you specify appears in the header of the sixth column following
|
|
the string <TT>Disk attn</TT>. The default threshold is 95%
|
|
full.
|
|
<P>Acceptable values for <VAR>percent_full</VAR> are the integers from the range
|
|
<B>0</B> (zero) to <B>99</B>, and you must include the percent sign to
|
|
distinguish this statistic from a <VAR>min_blocks</VAR> value.
|
|
</DL>
|
|
<P>The following example sets the threshold for the <TT>Conn</TT> column to
|
|
100, for the <TT>Ws</TT> column to 50, and for the <TT>Disk attn</TT>
|
|
column to 75%. There is no threshold for the <TT>Fetch</TT> and
|
|
<TT>Store</TT> columns.
|
|
<P><B>-attention conn 100 ws 50 disk 75%</B>
|
|
<P>The following example has the same effect as the previous one except that
|
|
it sets the threshold for the <TT>Disk attn</TT> column to 5000 free KB
|
|
blocks:
|
|
<P><B>-attention disk 5000 ws 50 conn 100</B>
|
|
<P><DT><B>-debug
|
|
</B><DD>Enables debugging output and directs it into the specified file.
|
|
Partial pathnames are interpreted relative to the current working
|
|
directory. By default, no debugging output is produced.
|
|
</DL>
|
|
</OL>
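<P>To make the <B>-attention</B> syntax concrete, the following Python sketch parses the values that follow the argument into a set of thresholds. It is an illustration only: the function name and the returned data structure are hypothetical, and it assumes the percent sign is attached to the number, as in the examples above.

```python
VALID_STATISTICS = {"conn", "fetch", "store", "ws", "disk"}


def parse_attention(tokens):
    """Turn the words after -attention into a threshold mapping.

    Pairs may appear in any order. A disk value ending in a percent
    sign is a percent_full threshold; otherwise it is min_blocks.
    """
    if len(tokens) % 2 != 0:
        raise ValueError("expected statistic/threshold pairs")
    thresholds = {}
    for stat, raw in zip(tokens[0::2], tokens[1::2]):
        if stat not in VALID_STATISTICS:
            raise ValueError(f"unknown statistic: {stat}")
        if stat == "disk" and raw.endswith("%"):
            percent = int(raw[:-1])
            if percent not in range(0, 100):
                raise ValueError("percent_full must be from 0 to 99")
            thresholds["disk"] = ("percent_full", percent)
        elif stat == "disk":
            thresholds["disk"] = ("min_blocks", int(raw))
        else:
            thresholds[stat] = int(raw)
    return thresholds


print(parse_attention("conn 100 ws 50 disk 75%".split()))
print(parse_attention("disk 5000 ws 50 conn 100".split()))
```

<P>Both calls set the same <TT>Conn</TT> and <TT>Ws</TT> thresholds; they differ only in whether the disk threshold is a percentage or a minimum number of free blocks.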
|
|
<P><H3><A NAME="Header_374" HREF="auagd002.htm#ToC_374">To stop the scout program</A></H3>
|
|
<A NAME="IDX7151"></A>
|
|
<OL TYPE=1>
|
|
<P><LI>Enter <B>Ctrl-c</B> in the display window. This is the proper
|
|
interrupt signal even if the general interrupt signal in your environment is
|
|
different.
|
|
</OL>
|
|
<P><H3><A NAME="HDRWQ336" HREF="auagd002.htm#ToC_375">Example Commands and Displays</A></H3>
|
|
<A NAME="IDX7152"></A>
|
|
<A NAME="IDX7153"></A>
|
|
<P>This section presents examples of the <B>scout</B> program, combining
|
|
different arguments and illustrating the screen displays that result.
|
|
<P>In the first example, an administrator in the ABC Corporation issues the
|
|
<B>scout</B> command without providing any optional arguments or
|
|
flags. She includes the <B>-server</B> argument because she is
|
|
providing multiple machine names. She chooses to specify only the initial
|
|
part of each machine's name even though she has not used the
|
|
<B>-basename</B> argument, relying on the cell's name service to
|
|
obtain the fully-qualified name that the <B>scout</B> program requires for
|
|
establishing a connection.
|
|
<PRE> % <B>scout -server fs1 fs2</B>
</PRE>
|
|
<P><A HREF="#FIGWQ337">Figure 2</A> depicts the resulting display. Notice first that the
|
|
machine names in the fifth (unlabeled) column appear in the format the
|
|
administrator used on the command line. Now consider the second line in
|
|
the display region, where the machine name <TT>fs2</TT> appears in the fifth
|
|
column. The <TT>Conn</TT> and <TT>Ws</TT> columns together show
|
|
that machine <B>fs2</B> has 144 RPC connections open to 44 client
|
|
machines, demonstrating that multiple connections per client machine are
|
|
possible. The <TT>Fetch</TT> column shows that client machines have
|
|
made 2,734,278 fetch RPCs to machine <B>fs2</B> since the File Server
|
|
process last started and the <TT>Store</TT> column shows that they have made
|
|
34,066 store RPCs.
|
|
<P>Six partition entries appear in the <TT>Disk attn</TT> column, marked
|
|
<TT>a</TT> through <TT>f</TT> (for <B>/vicepa</B> through
|
|
<B>/vicepf</B>). They appear on three lines in two subcolumns
|
|
because of the width of the window; if the window is wider, there are
|
|
more subcolumns. Four of the partition entries (<TT>a</TT>,
|
|
<TT>c</TT>, <TT>d</TT>, and <TT>e</TT>) appear in reverse video to
|
|
indicate that they are more than 95% full (the threshold value that appears in
|
|
the <TT>Disk attn</TT> header).
|
|
<P><B><A NAME="FIGWQ337" HREF="auagd003.htm#FT_FIGWQ337">Figure 2. First example scout display</A></B><BR>
|
|
<TABLE BORDER ><TR><TD><BR>
|
|
<B><BR><IMG SRC="scout1.gif" ALT="First example scout display"><BR></B><BR>
|
|
</TD></TR></TABLE>
|
|
<P>In the second example, the administrator uses more of the <B>scout</B>
|
|
program's optional arguments.
|
|
<UL>
|
|
<P><LI>She provides the machine names in the same form as in Example 1, but this
|
|
time she also uses the <B>-basename</B> argument to specify their domain
|
|
name suffix, <B>abc.com</B>. This implies that the
|
|
<B>scout</B> program does not need the name service to expand the names to
|
|
fully-qualified hostnames, but the name service still converts the hostnames
|
|
to IP addresses.
|
|
<P><LI>She uses the <B>-host</B> flag to display in the banner line the name
|
|
of the client machine where the <B>scout</B> program is running.
|
|
<P><LI>She uses the <B>-frequency</B> argument to change the probing
|
|
frequency from its default of once per minute to once every five
|
|
seconds.
|
|
<P><LI>She uses the <B>-attention</B> argument to change the highlighting
|
|
threshold for partitions to a 5000 KB minimum rather than the default of 95%
|
|
full.
|
|
</UL>
|
|
<PRE> % <B>scout -server fs1 fs2 -basename abc.com -host -frequency 5 -attention disk 5000</B>
</PRE>
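<P>If you start <B>scout</B> from a wrapper script, a command line like the one above can be assembled from its parts. The following Python sketch uses only the arguments documented in this chapter; the helper name <TT>scout_argv</TT> is hypothetical, and the sketch builds the command without running it, because the program must run in the foreground of its own window.

```python
import shlex


def scout_argv(servers, basename=None, frequency=None, host=False,
               attention=None):
    """Assemble a scout command line from the documented arguments."""
    if not servers:
        raise ValueError("at least one file server machine is required")
    argv = ["scout", "-server", *servers]
    if basename:
        argv += ["-basename", basename]
    if host:
        argv.append("-host")
    if frequency is not None:
        if not frequency > 0:
            raise ValueError("-frequency must be an integer greater than 0")
        argv += ["-frequency", str(frequency)]
    if attention:
        argv += ["-attention", *attention]
    return argv


command = scout_argv(["fs1", "fs2"], basename="abc.com", host=True,
                     frequency=5, attention=["disk", "5000"])
print(shlex.join(command))
```

<P>The printed line reproduces the command from the second example.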
|
|
<P>The use of optional arguments results in several differences between <A HREF="#FIGWQ338">Figure 3</A> and <A HREF="#FIGWQ337">Figure 2</A>. First, because the <B>-host</B>
|
|
flag is included, the banner line displays the name of the machine running the
|
|
<B>scout</B> process as <TT>[client52]</TT> along with the basename
|
|
<TT>abc.com</TT> specified with the <B>-basename</B>
|
|
argument.
|
|
<P>Another difference is that two rather than four of machine
|
|
<B>fs2</B>'s partitions appear in reverse video, even though their
|
|
values are almost the same as in <A HREF="#FIGWQ337">Figure 2</A>. This is because the administrator changed the
|
|
highlight threshold to a 5000 block minimum, as also reflected in the
|
|
<TT>Disk attn</TT> column's header. And while machine
|
|
<B>fs2</B>'s partitions <B>/vicepa</B> and <B>/vicepd</B> are
|
|
still more than 95% full, they have more than 5000 free blocks left; partitions
|
|
<B>/vicepc</B> and <B>/vicepe</B> are highlighted because they have
|
|
fewer than 5000 blocks free.
|
|
<P>Note also the result of changing the probe frequency, reflected in the
|
|
probe reporting line at the bottom left corner of the display. Both
|
|
this example and the previous one represent a time lapse of one minute after
|
|
the administrator issues the <B>scout</B> command. In this example,
|
|
however, the <B>scout</B> program has probed the File Server processes 12
|
|
times as opposed to once.
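<P>The probe counts follow directly from the probe interval. As a rough check, the following Python sketch counts the whole intervals that fit in the elapsed time; it assumes probes are evenly spaced and does not model exactly when the first probe is sent.

```python
def probes_completed(elapsed_seconds, frequency_seconds=60):
    """Number of whole probe intervals that fit in the elapsed time.

    The default interval of 60 seconds matches the default value of
    the -frequency argument.
    """
    return elapsed_seconds // frequency_seconds


print(probes_completed(60, 5))  # -frequency 5, one minute elapsed
print(probes_completed(60))     # default frequency, one minute elapsed
```

<P>After one minute this gives 12 probes at a 5-second interval and 1 probe at the default interval, matching the two example displays.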
|
|
<P><B><A NAME="FIGWQ338" HREF="auagd003.htm#FT_FIGWQ338">Figure 3. Second example scout display</A></B><BR>
|
|
<TABLE BORDER ><TR><TD><BR>
|
|
<B><BR><IMG SRC="scout2.gif" ALT="Second example scout display"><BR></B><BR>
|
|
</TD></TR></TABLE>
|
|
<P>In <A HREF="#FIGWQ339">Figure 4</A>, an administrator in the State University cell monitors
|
|
three of that cell's file server machines. He uses the
|
|
<B>-basename</B> argument to specify the <B>stateu.edu</B>
|
|
domain name.
|
|
<PRE> % <B>scout -server server2 server3 server4 -basename stateu.edu</B>
</PRE>
|
|
<P><B><A NAME="FIGWQ339" HREF="auagd003.htm#FT_FIGWQ339">Figure 4. Third example scout display</A></B><BR>
|
|
<TABLE BORDER ><TR><TD><BR>
|
|
<B><BR><IMG SRC="scout3.gif" ALT="Third example scout display"><BR></B><BR>
|
|
</TD></TR></TABLE>
|
|
<P><A HREF="#FIGWQ340">Figure 5</A> illustrates three of the <B>scout</B> program's
|
|
features. First, you can monitor file server machines from different
|
|
cells in a single display: <B>fs1.abc.com</B>,
|
|
<B>server3.stateu.edu</B>, and
|
|
<B>sv7.def.com</B>. Because the machines belong to
|
|
different cells, it is not possible to provide the <B>-basename</B>
|
|
argument.
|
|
<P>Second, it illustrates how the display must truncate machine names that do
|
|
not fit in the fifth column, using an asterisk at the end of the name to show
|
|
that it is shortened.
|
|
<P>Third, it illustrates what happens when the <B>scout</B> process cannot
|
|
reach a File Server process, in this case the one on the machine
|
|
<B>sv7.def.com</B>: it highlights the machine name and
|
|
blanks out the values in the other columns.
|
|
<P><B><A NAME="FIGWQ340" HREF="auagd003.htm#FT_FIGWQ340">Figure 5. Fourth example scout display</A></B><BR>
|
|
<TABLE BORDER ><TR><TD><BR>
|
|
<B><BR><IMG SRC="scout4.gif" ALT="Fourth example scout display"><BR></B><BR>
|
|
</TD></TR></TABLE>
|
|
<HR><H2><A NAME="HDRWQ341" HREF="auagd002.htm#ToC_376">Using the fstrace Command Suite</A></H2>
|
|
<P>This section describes the <B>fstrace</B> commands that
|
|
system administrators employ to trace Cache Manager activity for debugging
|
|
purposes. It assumes the reader is familiar with the Cache Manager
|
|
concepts described in <A HREF="auagd015.htm#HDRWQ387">Administering Client Machines and the Cache Manager</A>.
|
|
<P>The <B>fstrace</B> command suite monitors the internal activity of the
|
|
Cache Manager and enables you to record, or trace, its operations in
|
|
detail. The operations, which are termed <I>events</I>, comprise
|
|
the <B>cm</B> <I>event set</I>. Examples of <B>cm</B>
|
|
events are fetching files and looking up information for a listing of files
|
|
and subdirectories using the UNIX <B>ls</B> command.
|
|
<P>Following are the <B>fstrace</B> commands and their respective
|
|
functions:
|
|
<UL>
|
|
<P><LI>The <B>fstrace apropos</B> command provides a short description of
|
|
commands.
|
|
<P><LI>The <B>fstrace clear</B> command clears the trace log.
|
|
<P><LI>The <B>fstrace dump</B> command dumps the contents of the trace
|
|
log.
|
|
<P><LI>The <B>fstrace help</B> command provides a description and syntax for
|
|
commands.
|
|
<P><LI>The <B>fstrace lslog</B> command lists information about the trace
|
|
log.
|
|
<P><LI>The <B>fstrace lsset</B> command lists information about the event
|
|
set.
|
|
<P><LI>The <B>fstrace setlog</B> command changes the size of the trace
|
|
log.
|
|
<P><LI>The <B>fstrace setset</B> command sets the state of the event
|
|
set.
|
|
</UL>
|
|
<P><H3><A NAME="HDRWQ342" HREF="auagd002.htm#ToC_377">About the fstrace Command Suite</A></H3>
|
|
<P>The <B>fstrace</B> command suite replaces and greatly
|
|
expands the functionality formerly provided by the <B>fs debug</B>
|
|
command. Its intended use is to aid in diagnosis of specific Cache
|
|
Manager problems, such as client machine hangs, cache consistency problems,
|
|
clock synchronization errors, and failures to access a volume or AFS
|
|
file. Therefore, it is best not to keep <B>fstrace</B> logging
|
|
enabled at all times, unlike the logging for AFS server processes.
|
|
<P>Most of the messages in the trace log correspond to low-level Cache Manager
|
|
operations. It is likely that only personnel familiar with the AFS
|
|
source code can interpret them. If you have an AFS source license, you
|
|
can attempt to interpret the trace yourself, or work with the AFS Product
|
|
Support group to resolve the underlying problems. If you do not have an
|
|
AFS source license, it is probably more efficient to contact the AFS Product
|
|
Support group immediately in case of problems. They can instruct you to
|
|
activate <B>fstrace</B> tracing if appropriate.
|
|
<P>The log can grow in size very quickly; this can use valuable disk
|
|
space if you are writing to a file in the local file space.
|
|
Additionally, if the size of the log becomes too large, it can become
|
|
difficult to parse the results for pertinent information.
|
|
<A NAME="IDX7154"></A>
|
|
<A NAME="IDX7155"></A>
|
|
<P>When AFS tracing is enabled, each time a <B>cm</B> event occurs, a
|
|
message is written to the trace log, <B>cmfx</B>. To diagnose a
|
|
problem, read the output of the trace log and analyze the operations executed
|
|
by the Cache Manager. The default size of the trace log is 60 KB, but
|
|
you can increase or decrease it.
|
|
<A NAME="IDX7156"></A>
|
|
<A NAME="IDX7157"></A>
|
|
<P>To use the <B>fstrace</B> command suite, you must first enable tracing
|
|
and reserve, or allocate, space for the trace log with the <B>fstrace
|
|
setset</B> command. With this command, you can set the <B>cm</B>
|
|
event set to one of three states to enable or disable tracing for the event
|
|
set and to allocate or deallocate space for the trace log in the kernel:
|
|
<A NAME="IDX7158"></A>
|
|
<A NAME="IDX7159"></A>
|
|
<A NAME="IDX7160"></A>
|
|
<DL>
|
|
<P><DT><B>active
|
|
</B><DD>Enables tracing for the event set and allocates space for the trace
|
|
log.
|
|
<P><DT><B>inactive
|
|
</B><DD>Temporarily disables tracing for the event set; however, the event
|
|
set continues to reserve the space occupied by the log to which it sends
|
|
data.
|
|
<P><DT><B>dormant
|
|
</B><DD>Disables tracing for the event set; furthermore, the event set
|
|
releases the space occupied by the log to which it sends data. When the
|
|
<B>cm</B> event set that sends data to the <B>cmfx</B> trace log is in
|
|
this state, the space allocated for that log is freed or deallocated.
|
|
</DL>
|
|
<A NAME="IDX7161"></A>
|
|
<A NAME="IDX7162"></A>
|
|
<A NAME="IDX7163"></A>
|
|
<P>Both event sets and trace logs can be designated as <I>persistent</I>,
|
|
which prevents accidental resetting of an event set's state or clearing
|
|
of a trace log. The designation is made when the kernel is compiled and
|
|
cannot be changed.
|
|
<P>If an event set such as <B>cm</B> is persistent, you can change its
|
|
state only by including the <B>-set</B> argument to the <B>fstrace
|
|
setset</B> command. (That is, you cannot change its state along with
|
|
the state of all other event sets by issuing the <B>fstrace setset</B>
|
|
command with no arguments.) Similarly, if a trace log such as
|
|
<B>cmfx</B> is persistent, you can clear it only by including either the
|
|
<B>-set</B> or <B>-log</B> argument to the <B>fstrace clear</B>
|
|
command (you cannot clear it along with all other trace logs by issuing the
|
|
<B>fstrace clear</B> command with no arguments).
|
|
<P>When a problem occurs, set the <B>cm</B> event set to active using the
|
|
<B>fstrace setset</B> command. When tracing is enabled on a busy
|
|
AFS client, the volume of events being recorded is significant;
|
|
therefore, when you are diagnosing problems, restrict AFS activity as much as
|
|
possible to minimize the amount of extraneous tracing in the log.
|
|
Because tracing can have a negative impact on system performance, leave
|
|
<B>cm</B> tracing in the dormant state when you are not diagnosing
|
|
problems.
|
|
<P>If a problem is reproducible, clear the <B>cmfx</B> trace log with the
|
|
<B>fstrace clear</B> command and reproduce the problem. If the
|
|
problem is not easily reproduced, keep the state of the event set active until
|
|
the problem recurs.
|
|
<P>To view the contents of the trace log and analyze the <B>cm</B> events,
|
|
use the <B>fstrace dump</B> command to copy the content lines of the trace
|
|
log to standard output (stdout) or to a file.
|
|
<TABLE><TR><TD ALIGN="LEFT" VALIGN="TOP"><B>Note:</B></TD><TD ALIGN="LEFT" VALIGN="TOP">If a particular command or process is causing problems, determine its process
|
|
id (PID). Search the output of the <B>fstrace dump</B> command for
|
|
the PID to find only those lines associated with the problem.
|
|
</TD></TR></TABLE>
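<P>The PID search described in the note can be sketched in a few lines of
Python. This is only an illustration: the function name
<TT>lines_for_pid</TT> is hypothetical, the sample lines are abbreviated
from dump output of the kind shown in this chapter, and the sketch assumes
each message carries a <TT>pid</TT> <VAR>number</VAR><TT>:</TT> field.

```python
import re

# Matches the "pid <number>:" field of a dumped trace message.
_PID_FIELD = re.compile(r"\bpid (\d+):")


def lines_for_pid(dump_lines, pid):
    """Return only the dump lines whose pid field equals `pid`."""
    selected = []
    for line in dump_lines:
        match = _PID_FIELD.search(line)
        # Compare as integers so that pid 3365 does not match pid 33658.
        if match and int(match.group(1)) == pid:
            selected.append(line)
    return selected


# Abbreviated sample lines (illustrative only).
sample = [
    "time 71.440456, pid 33658: Lookup adp 0x5bbdcf0 name g3oCKs",
    "time 71.440569, pid 33658: Returning code 2 from 19",
    "time 71.464989, pid 38267: Gn_open vp 0x5bbd000 flags 0x0",
]
for line in lines_for_pid(sample, 33658):
    print(line)
```

<P>Matching the whole numeric field, rather than searching for the PID as a
substring, avoids picking up lines for other processes whose PIDs happen to
contain the same digits.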
|
|
<P><H3><A NAME="HDRWQ343" HREF="auagd002.htm#ToC_378">Requirements for Using the fstrace Command Suite</A></H3>
|
|
<A NAME="IDX7164"></A>
|
|
<A NAME="IDX7165"></A>
|
|
<P>Except for the <B>fstrace help</B> and <B>fstrace apropos</B>
|
|
commands, which require no privilege, issuing the <B>fstrace</B> commands
|
|
requires that the issuer be logged in as the local superuser <B>root</B>
|
|
on the local client machine. Before issuing an <B>fstrace</B>
|
|
command, verify that you have the necessary privilege.
|
|
<P>The Cache Manager catalog must be in place so that logging can
|
|
occur. The <B>fstrace</B> command suite uses the standard UNIX
|
|
catalog utilities. The default location of the catalog is
<B>/usr/vice/etc/C/afszcm.cat</B>. To keep the catalog in
another directory, place the file there and set the NLSPATH and LANG
environment variables accordingly.
|
|
<P><H3><A NAME="Header_379" HREF="auagd002.htm#ToC_379">Using fstrace Commands Effectively</A></H3>
|
|
<P>To use <B>fstrace</B> commands most effectively, configure them as
|
|
indicated:
|
|
<UL>
|
|
<P><LI>Store the <B>fstrace</B> binary in a local disk directory.
|
|
<P><LI>When you dump the <B>fstrace</B> log to a file, direct it to one on
|
|
the local disk.
|
|
<P><LI>The trace can grow large in just a few minutes. Before attempting
|
|
to dump the log to a local file, verify that you have enough room. Be
|
|
particularly careful if you are using disk quotas on partitions in the local
|
|
file system.
|
|
<P><LI>Attempt to limit Cache Manager activity on the AFS client machine other
|
|
than the problem operation. This reduces the amount of extraneous data
|
|
in the trace.
|
|
<P><LI>Activate the <B>fstrace</B> log for the shortest possible period of
|
|
time. If possible, activate the trace immediately before performing the
|
|
problem operation, deactivate it as soon as the operation completes, and dump
|
|
the trace log to a file immediately.
|
|
<P><LI>If possible, obtain the UNIX process ID (PID) of the command or program that
|
|
initiates the problematic operation. This enables the person analyzing
|
|
the trace log to search it for messages associated with the PID.
|
|
</UL>
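<P>The guidelines above can be sketched as a short Python helper that
builds the <B>fstrace</B> command lines for one brief tracing session. The
function names (<TT>commands_before</TT>, <TT>commands_after</TT>,
<TT>has_room</TT>) and the dump path are hypothetical; the
<B>fstrace</B> arguments are the ones listed in the syntax statements in
this chapter. The sketch only constructs the commands and does not run
them.

```python
import shutil


def commands_before(set_name="cm"):
    """Commands to issue immediately before the problem operation."""
    return [
        ["fstrace", "clear", "-set", set_name],              # drop old trace data
        ["fstrace", "setset", "-set", set_name, "-active"],  # start tracing
    ]


def commands_after(dump_path, set_name="cm"):
    """Commands to issue as soon as the problem operation completes."""
    return [
        # Stop tracing but keep the log allocated so it can still be dumped.
        ["fstrace", "setset", "-set", set_name, "-inactive"],
        ["fstrace", "dump", "-set", set_name, "-file", dump_path],
    ]


def has_room(directory, needed_kb):
    """Check that the local directory has at least needed_kb kilobytes free."""
    return shutil.disk_usage(directory).free >= needed_kb * 1024


for argv in commands_before() + commands_after("/tmp/cmfx.dump"):
    print(" ".join(argv))
```

<P>Each list can be passed to a routine such as Python's
<TT>subprocess.run</TT> by a process running as the local superuser
<B>root</B>. Keeping the dump file on a local path such as
<B>/tmp</B> follows the recommendation to avoid writing the dump into
AFS.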
|
|
<P><H3><A NAME="HDRWQ344" HREF="auagd002.htm#ToC_380">Activating the Trace Log</A></H3>
|
|
<P>To start Cache Manager tracing on an AFS client machine, you
|
|
must first configure:
|
|
<UL>
|
|
<P><LI>The <B>cmfx</B> kernel trace log using the <B>fstrace setlog</B>
|
|
command
|
|
<P><LI>The <B>cm</B> event set using the <B>fstrace setset</B> command
|
|
</UL>
|
|
<P>The <B>fstrace setlog</B> command sets the size of the <B>cmfx</B>
|
|
kernel trace log in kilobytes. The trace log occupies 60 kilobytes of
|
|
kernel memory by default. If the trace log already exists, it is cleared when
|
|
this command is issued and a new log of the given size is created.
|
|
Otherwise, a new log of the desired size is created.
|
|
<P>The <B>fstrace setset</B> command sets the state of the <B>cm</B>
|
|
kernel event set. The state of the <B>cm</B> event set determines
|
|
whether information on the events in that event set is logged.
|
|
<P>After establishing kernel tracing on the AFS client machine, you can check
|
|
the state of the event set and the size of the kernel buffer allocated for the
|
|
trace log. To display information about the state of the <B>cm</B>
|
|
event set, issue the <B>fstrace lsset</B> command. To display
|
|
information about the <B>cmfx</B> trace log, use the <B>fstrace
|
|
lslog</B> command. See the instructions in <A HREF="#HDRWQ346">Displaying the State of a Trace Log or Event Set</A>.
|
|
<A NAME="IDX7166"></A>
|
|
<A NAME="IDX7167"></A>
|
|
<A NAME="IDX7168"></A>
|
|
<A NAME="IDX7169"></A>
|
|
<P><H3><A NAME="Header_381" HREF="auagd002.htm#ToC_381">To configure the trace log</A></H3>
|
|
<OL TYPE=1>
|
|
<P><LI>Become the local superuser <B>root</B> on the machine, if you are not
|
|
already, by issuing the <B>su</B> command.
|
|
<PRE> % <B>su root</B>
|
|
Password: <VAR>root_password</VAR>
|
|
</PRE>
|
|
<P><LI>Issue the <B>fstrace setlog</B> command to set the size of the
|
|
<B>cmfx</B> kernel trace log.
|
|
<PRE> # <B>fstrace setlog</B> [<B>-log</B> <<VAR>log_name</VAR>><SUP>+</SUP>] <B>-buffersize</B> <<VAR>1-kilobyte_units</VAR>>
|
|
</PRE>
|
|
</OL>
|
|
<P>The following example sets the size of the <B>cmfx</B> trace log to 80
|
|
KB.
|
|
<PRE> # <B>fstrace setlog cmfx 80</B>
|
|
</PRE>
|
|
<A NAME="IDX7170"></A>
|
|
<A NAME="IDX7171"></A>
|
|
<A NAME="IDX7172"></A>
|
|
<A NAME="IDX7173"></A>
|
|
<P><H3><A NAME="HDRWQ345" HREF="auagd002.htm#ToC_382">To set the event set</A></H3>
|
|
<OL TYPE=1>
|
|
<P><LI>Become the local superuser <B>root</B> on the machine, if you are not
|
|
already, by issuing the <B>su</B> command.
|
|
<PRE> % <B>su root</B>
|
|
Password: <VAR>root_password</VAR>
|
|
</PRE>
|
|
<P><LI>Issue the <B>fstrace setset</B> command to set the state of event
|
|
sets.
|
|
<PRE> # <B>fstrace setset</B> [<B>-set</B> <<VAR>set_name</VAR>><SUP>+</SUP>] [<B>-active</B>] [<B>-inactive</B>] \
|
|
[<B>-dormant</B>]
|
|
</PRE>
|
|
</OL>
|
|
<P>The following example activates the <B>cm</B> event set.
|
|
<PRE> # <B>fstrace setset cm -active</B>
|
|
</PRE>
|
|
<P><H3><A NAME="HDRWQ346" HREF="auagd002.htm#ToC_383">Displaying the State of a Trace Log or Event Set</A></H3>
|
|
<P>An event set must be in the <I>active</I> state for its events to be
recorded in the trace log. To display an event set's state, use
|
|
the <B>fstrace lsset</B> command. To set its state, issue the
|
|
<B>fstrace setset</B> command as described in <A HREF="#HDRWQ345">To set the event set</A>.
|
|
<P>To display size and allocation information for the trace log, issue the
|
|
<B>fstrace lslog</B> command with the <B>-long</B> argument.
|
|
<A NAME="IDX7174"></A>
|
|
<A NAME="IDX7175"></A>
|
|
<A NAME="IDX7176"></A>
|
|
<A NAME="IDX7177"></A>
|
|
<P><H3><A NAME="Header_384" HREF="auagd002.htm#ToC_384">To display the state of an event set</A></H3>
|
|
<OL TYPE=1>
|
|
<P><LI>Become the local superuser <B>root</B> on the machine, if you are not
|
|
already, by issuing the <B>su</B> command.
|
|
<PRE> % <B>su root</B>
|
|
Password: <VAR>root_password</VAR>
|
|
</PRE>
|
|
<P><LI>Issue the <B>fstrace lsset</B> command to display the available event
|
|
set and its state.
|
|
<PRE> # <B>fstrace lsset</B> [<B>-set</B> <<VAR>set_name</VAR>><SUP>+</SUP>]
|
|
</PRE>
|
|
</OL>
|
|
<P>The following example displays the event set and its state on the local
|
|
machine.
|
|
<PRE> # <B>fstrace lsset cm</B>
|
|
Available sets:
|
|
cm active
|
|
</PRE>
|
|
<P>The output from this command lists the event set and its state. The
|
|
three event states for the <B>cm</B> event set are:
|
|
<DL>
|
|
<P><DT><B><TT>active</TT>
|
|
</B><DD>Tracing is enabled.
|
|
<P><DT><B><TT>inactive</TT>
|
|
</B><DD>Tracing is disabled, but space is still allocated for the corresponding
|
|
trace log (<B>cmfx</B>).
|
|
<P><DT><B><TT>dormant</TT>
|
|
</B><DD>Tracing is disabled, and space is no longer allocated for the
|
|
corresponding trace log (<B>cmfx</B>).
|
|
</DL>
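<P>A script that checks the event set before starting a trace can read the
<B>fstrace lsset</B> output with a small parser like the following Python
sketch. The function name <TT>parse_lsset</TT> is hypothetical, and the
layout it assumes (a header line ending in a colon, followed by one
<VAR>name</VAR> <VAR>state</VAR> line per event set) is inferred from the
sample output in this chapter, including the
<TT>inactive (dormant)</TT> form.

```python
def parse_lsset(output):
    """Map each event set name to the state string reported for it."""
    states = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue  # skip blank lines and the "Available sets:" header
        name, _, state = line.partition(" ")
        states[name] = state.strip()
    return states


def tracing_enabled(state):
    """Only the active state records events."""
    return state == "active"


report = parse_lsset("Available sets:\ncm inactive (dormant)\n")
print(report, tracing_enabled(report["cm"]))
```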
|
|
<A NAME="IDX7178"></A>
|
|
<A NAME="IDX7179"></A>
|
|
<A NAME="IDX7180"></A>
|
|
<A NAME="IDX7181"></A>
|
|
<P><H3><A NAME="Header_385" HREF="auagd002.htm#ToC_385">To display the log size</A></H3>
|
|
<OL TYPE=1>
|
|
<P><LI>Become the local superuser <B>root</B> on the machine, if you are not
|
|
already, by issuing the <B>su</B> command.
|
|
<PRE> % <B>su root</B>
|
|
Password: <VAR>root_password</VAR>
|
|
</PRE>
|
|
<P><LI>Issue the <B>fstrace lslog</B> command to display information about
|
|
the kernel trace log.
|
|
<PRE> # <B>fstrace lslog</B> [<B>-set</B> <<VAR>set_name</VAR>><SUP>+</SUP>] [<B>-log</B> <<VAR>log_name</VAR>>] [<B>-long</B>]
|
|
</PRE>
|
|
</OL>
|
|
<P>The following example uses the <B>-long</B> flag to display additional
|
|
information about the <B>cmfx</B> trace log.
|
|
<PRE> # <B>fstrace lslog cmfx -long</B>
|
|
Available logs:
|
|
cmfx : 60 kbytes (allocated)
|
|
</PRE>
|
|
<P>The output from this command lists information on the trace log.
|
|
When issued without the <B>-long</B> flag, the <B>fstrace lslog</B>
|
|
command lists only the name of the log. When issued with the
|
|
<B>-long</B> flag, the <B>fstrace lslog</B> command lists the log, the
|
|
size of the log in kilobytes, and the allocation state of the log.
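<P>The <B>-long</B> output can be read programmatically with a sketch
such as the following. The function name <TT>parse_lslog_long</TT> is
hypothetical, and the regular expression assumes the line layout shown in
the example above (<VAR>log_name</VAR> <TT>:</TT> <VAR>size</VAR>
<TT>kbytes (</TT><VAR>state</VAR><TT>)</TT>).

```python
import re

# Example line:  cmfx : 60 kbytes (allocated)
_LOG_LINE = re.compile(r"^\s*(\S+)\s*:\s*(\d+)\s+kbytes\s+\((\w+)\)\s*$")


def parse_lslog_long(output):
    """Map each log name to a (size_in_kilobytes, allocation_state) pair."""
    logs = {}
    for line in output.splitlines():
        match = _LOG_LINE.match(line)
        if match:  # header and blank lines do not match and are skipped
            logs[match.group(1)] = (int(match.group(2)), match.group(3))
    return logs


print(parse_lslog_long("Available logs:\ncmfx : 60 kbytes (allocated)\n"))
```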
|
|
<P>There are two allocation states for the kernel trace log:
|
|
<DL>
|
|
<P><DT><B><TT>allocated</TT>
|
|
</B><DD>Space is reserved for the log in the kernel. This indicates that
|
|
the event set that writes to this log is either <I>active</I> (tracing is
|
|
enabled for the event set) or <I>inactive</I> (tracing is temporarily
disabled for the event set, but the event set continues to reserve the
space occupied by the log to which it sends data).
|
|
<P><DT><B><TT>unallocated</TT>
|
|
</B><DD>Space is not reserved for the log in the kernel. This indicates
|
|
that the event set that writes to this log is <I>dormant</I> (tracing is
disabled for the event set, and the event set releases the space occupied
by the log to which it sends data).
|
|
</DL>
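<P>The relationship between the three event set states and the two
allocation states can be summarized in a small Python sketch. The names
<TT>SetState</TT>, <TT>tracing_enabled</TT>, and
<TT>log_allocation</TT> are hypothetical and serve only to restate the
definitions above.

```python
from enum import Enum


class SetState(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    DORMANT = "dormant"


def tracing_enabled(state):
    """Events are recorded only while the event set is active."""
    return state is SetState.ACTIVE


def log_allocation(state):
    """Kernel space stays reserved unless the event set is dormant."""
    return "unallocated" if state is SetState.DORMANT else "allocated"


for state in SetState:
    print(state.value, tracing_enabled(state), log_allocation(state))
```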
|
|
<P><H3><A NAME="HDRWQ347" HREF="auagd002.htm#ToC_386">Dumping and Clearing the Trace Log</A></H3>
|
|
<P>After the Cache Manager operation you want to trace is
|
|
complete, use the <B>fstrace dump</B> command to dump the trace log to the
|
|
standard output stream or to the file named by the <B>-file</B>
|
|
argument. Or, to dump the trace log continuously, use the
|
|
<B>-follow</B> argument (combine it with the <B>-file</B> argument if
|
|
desired). To halt continuous dumping, send an interrupt signal, for example by pressing
|
|
<<B>Ctrl-c</B>>.
|
|
<P>To clear a trace log when you no longer need the data in it, issue the
|
|
<B>fstrace clear</B> command. (The <B>fstrace setlog</B>
|
|
command also clears an existing trace log automatically when you use it to
|
|
change the log's size.)
|
|
<A NAME="IDX7182"></A>
|
|
<A NAME="IDX7183"></A>
|
|
<A NAME="IDX7184"></A>
|
|
<A NAME="IDX7185"></A>
|
|
<A NAME="IDX7186"></A>
|
|
<P><H3><A NAME="Header_387" HREF="auagd002.htm#ToC_387">To dump the contents of a trace log</A></H3>
|
|
<OL TYPE=1>
|
|
<P><LI>Become the local superuser <B>root</B> on the machine, if you are not
|
|
already, by issuing the <B>su</B> command.
|
|
<PRE> % <B>su root</B>
|
|
Password: <VAR>root_password</VAR>
|
|
</PRE>
|
|
<P><LI>Issue the <B>fstrace dump</B> command to dump trace logs.
|
|
<PRE> # <B>fstrace dump</B> [<B>-set</B> <<VAR>set_name</VAR>><SUP>+</SUP>] [<B>-follow</B> <<VAR>log_name</VAR>>] \
|
|
[<B>-file</B> <<VAR>output_filename</VAR>>] \
|
|
[<B>-sleep</B> <<VAR>seconds_between_reads</VAR>>]
|
|
</PRE>
|
|
</OL>
|
|
<P>At the beginning of the output of each dump is a header specifying the date
|
|
and time at which the dump began. The number of logs being dumped is
|
|
also displayed if the <B>-follow</B> argument is not specified. The
|
|
header appears as follows:
|
|
<PRE> AFS Trace Dump --
|
|
Date: <VAR>date</VAR> <VAR>time</VAR>
|
|
Found <VAR>n</VAR> logs.
|
|
</PRE>
|
|
<P>where <I>date</I> is the starting date of the trace log dump,
|
|
<I>time</I> is the starting time of the trace log dump, and <I>n</I>
|
|
specifies the number of logs found by the <B>fstrace dump</B>
|
|
command.
|
|
<P>The following is an example of a trace log dump header:
|
|
<PRE> AFS Trace Dump --
|
|
Date: Fri Apr 16 10:44:38 1999
|
|
Found 1 logs.
|
|
</PRE>
|
|
<P>The contents of the log follow the header and consist of messages
|
|
written to the log from an active event set. The messages written to
|
|
the log contain the following three components:
|
|
<UL>
|
|
<P><LI>The timestamp associated with the message (number of seconds from an
|
|
arbitrary start point)
|
|
<P><LI>The process ID or thread ID associated with the message
|
|
<P><LI>The message itself
|
|
</UL>
|
|
<P>A trace log message is formatted as follows:
|
|
<PRE> time <VAR>timestamp</VAR>, pid <VAR>pid</VAR>:<VAR>event message</VAR>
|
|
</PRE>
|
|
<P>where <I>timestamp</I> is the number of seconds from an arbitrary start
|
|
point, <I>pid</I> is the process ID number of the Cache Manager event, and
|
|
<I>event message</I> is the Cache Manager event, which corresponds to a
|
|
function in the AFS source code.
|
|
<P>The following is an example of a dumped trace log message:
|
|
<PRE> time 749.641274, pid 3002:Returning code 2 from 19
|
|
</PRE>
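<P>When a dump is processed by a script, each message can be split into
its three components. The following Python sketch assumes the message
format given above; the names <TT>TraceMessage</TT> and
<TT>parse_message</TT> are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TraceMessage(NamedTuple):
    timestamp: float  # seconds from an arbitrary start point
    pid: int          # process ID or thread ID
    text: str         # the event message itself


_MESSAGE = re.compile(r"^\s*time (\d+\.\d+), pid (\d+):\s*(.*)$")


def parse_message(line) -> Optional[TraceMessage]:
    """Split one dumped line into its components, or return None."""
    match = _MESSAGE.match(line)
    if match is None:
        return None  # header lines, "raw op" lines, and so on
    return TraceMessage(float(match.group(1)), int(match.group(2)), match.group(3))


print(parse_message("time 749.641274, pid 3002:Returning code 2 from 19"))
```

<P>Lines that do not follow the format, such as the dump header, yield
<TT>None</TT> so that the caller can skip them.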
|
|
<P>For the messages in the trace log to be most readable, the Cache Manager
|
|
catalog file needs to be installed on the local disk of the client
|
|
machine; the conventional location is
|
|
<B>/usr/vice/etc/C/afszcm.cat</B>. Log messages that begin
|
|
with the string <TT>raw op</TT>, like the following, indicate that the
|
|
catalog is not installed.
|
|
<PRE> raw op 232c, time 511.916288, pid 0
|
|
p0:Fri Apr 16 10:36:31 1999
|
|
</PRE>
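<P>A quick way to notice a missing catalog is to scan the dump for
<TT>raw op</TT> lines, as in this Python sketch. The function name
<TT>catalog_seems_missing</TT> is hypothetical, and the check relies only
on the prefix described above.

```python
def catalog_seems_missing(dump_lines):
    """Return True if any dumped line starts with the string "raw op"."""
    return any(line.lstrip().startswith("raw op") for line in dump_lines)


dump = [
    "raw op 232c, time 511.916288, pid 0",
    "p0:Fri Apr 16 10:36:31 1999",
]
if catalog_seems_missing(dump):
    print("Install the Cache Manager catalog before dumping again.")
```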
|
|
<P>Every 1024 seconds, a current time message is written to each log.
|
|
This message has the following format:
|
|
<PRE> time <VAR>timestamp</VAR>, pid <VAR>pid</VAR>: Current time: <VAR>unix_time</VAR>
|
|
</PRE>
|
|
<P>where <VAR>timestamp</VAR> is the number of seconds from an arbitrary start
|
|
point, <VAR>pid</VAR> is the process ID number, and <VAR>unix_time</VAR> is the
|
|
standard time format since January 1, 1970.
|
|
<P>The current time message can be used to determine the actual time
|
|
associated with each log message. Determine the actual time as
|
|
follows:
|
|
<OL TYPE=1>
|
|
<P><LI>Locate the log message whose actual time you want to determine.
|
|
<P><LI>Search backward through the dump record until you come to a current time
|
|
message.
|
|
<P><LI>If the current time message's <I>timestamp</I> is smaller than
|
|
the log message's <I>timestamp</I>, subtract the former from the
|
|
latter. If the current time message's <I>timestamp</I> is
|
|
larger than the log message's <I>timestamp</I>, add 1024 to the
|
|
latter and subtract the former from the result.
|
|
<P><LI>Add the resulting number to the current time message's
|
|
<I>unix_time</I> to determine the log message's actual time.
|
|
</OL>
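<P>The calculation in the preceding steps can be written as a short Python
function. This is a sketch under stated assumptions: the function name
<TT>actual_time</TT> is hypothetical, the timestamps in the usage lines are
illustrative, and the current time message's <VAR>unix_time</VAR> is
assumed to have been converted to a <TT>datetime</TT> value already.

```python
from datetime import datetime, timedelta

# Interval, in seconds, at which current time messages are written.
INTERVAL_SECONDS = 1024


def actual_time(message_ts, current_ts, current_time):
    """Return the wall-clock time of a log message.

    message_ts   -- timestamp of the log message of interest
    current_ts   -- timestamp of the preceding current time message
    current_time -- unix_time of that current time message, as a datetime
    """
    if current_ts <= message_ts:
        elapsed = message_ts - current_ts
    else:
        # The message timestamp is smaller, so add 1024 before subtracting.
        elapsed = message_ts + INTERVAL_SECONDS - current_ts
    return current_time + timedelta(seconds=elapsed)


reference = datetime(1999, 4, 16, 10, 45, 52)
print(actual_time(71.440456, 32.965783, reference))
print(actual_time(10.0, 1000.0, reference))
```

<P>In the first call the message falls about 38 seconds after the current
time message; in the second, the message timestamp is the smaller of the
two, so the elapsed time is 10 + 1024 - 1000 = 34 seconds.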
|
|
<P>Because log data is stored in a finite, circular buffer, some of the data
|
|
can be overwritten before being read. If this happens, the following
|
|
message appears at the appropriate place in the dump:
|
|
<PRE> Log wrapped; data missing.
|
|
</PRE>
|
|
<TABLE><TR><TD ALIGN="LEFT" VALIGN="TOP"><B>Note:</B></TD><TD ALIGN="LEFT" VALIGN="TOP">If this message appears in the middle of a dump, which can happen under a
|
|
heavy work load, it indicates that not all of the log data is being written to
|
|
the log or some data is being overwritten. Increasing the size of the
|
|
log with the <B>fstrace setlog</B> command can alleviate this
|
|
problem.
|
|
</TD></TR></TABLE>
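<P>A script that post-processes dumps can count the wrap messages to decide
whether a larger log is worth trying. In this Python sketch the names
<TT>count_wraps</TT> and <TT>suggested_buffer_kb</TT> are hypothetical, and
doubling the size is an arbitrary illustrative policy, not a documented
recommendation.

```python
WRAP_MARKER = "Log wrapped; data missing."


def count_wraps(dump_lines):
    """Count how many times the wrap message appears in a dump."""
    return sum(1 for line in dump_lines if line.strip() == WRAP_MARKER)


def suggested_buffer_kb(current_kb, wraps):
    """Suggest a new log size: double it if any data was lost (assumed policy)."""
    return current_kb * 2 if wraps > 0 else current_kb


dump = [
    "time 35.159854, pid 10891: Breaking callback for 5bd95e4 states 1024",
    "   Log wrapped; data missing.",
    "time 71.440569, pid 33658: Returning code 2 from 19",
]
wraps = count_wraps(dump)
print(wraps, suggested_buffer_kb(60, wraps))
```

<P>The suggested value can then be supplied to the <B>-buffersize</B>
argument of the <B>fstrace setlog</B> command.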
|
|
<A NAME="IDX7187"></A>
|
|
<A NAME="IDX7188"></A>
|
|
<A NAME="IDX7189"></A>
|
|
<A NAME="IDX7190"></A>
|
|
<A NAME="IDX7191"></A>
|
|
<P><H3><A NAME="Header_388" HREF="auagd002.htm#ToC_388">To clear the contents of a trace log</A></H3>
|
|
<OL TYPE=1>
|
|
<P><LI>Become the local superuser <B>root</B> on the machine, if you are not
|
|
already, by issuing the <B>su</B> command.
|
|
<PRE> % <B>su root</B>
|
|
Password: <VAR>root_password</VAR>
|
|
</PRE>
|
|
<P><LI>Issue the <B>fstrace clear</B> command to clear logs by log name or by
|
|
event set.
|
|
<PRE> # <B>fstrace clear</B> [<B>-set</B> <<VAR>set_name</VAR>><SUP>+</SUP>] [<B>-log</B> <<VAR>log_name</VAR>><SUP>+</SUP>]
|
|
</PRE>
|
|
</OL>
|
|
<P>The following example clears the <B>cmfx</B> log used by the
|
|
<B>cm</B> event set on the local machine.
|
|
<PRE> # <B>fstrace clear cm</B>
|
|
</PRE>
|
|
<P>The following example also clears the <B>cmfx</B> log on the local
|
|
machine.
|
|
<PRE> # <B>fstrace clear cmfx</B>
|
|
</PRE>
|
|
<A NAME="IDX7192"></A>
|
|
<P><H3><A NAME="HDRWQ348" HREF="auagd002.htm#ToC_389">Examples of fstrace Commands</A></H3>
|
|
<P>This section contains an extensive example of the use of the
|
|
<B>fstrace</B> command suite, which is useful for gathering a detailed
|
|
trace of Cache Manager activity when you are working with AFS Product Support
|
|
to diagnose a problem. The Product Support representative can guide you
|
|
in choosing appropriate parameter settings for the trace.
|
|
<P>Before starting the kernel trace log, try to isolate the Cache Manager on
|
|
the AFS client machine that is experiencing the problem accessing the
|
|
file. If necessary, instruct users to move to another machine so as to
|
|
minimize the Cache Manager activity on this machine. To minimize the
|
|
amount of unrelated AFS activity recorded in the trace log, place both the
|
|
<B>fstrace</B> binary and the dump file on the local disk, not
|
|
in AFS. You must be logged in as the local superuser <B>root</B> to
|
|
issue <B>fstrace</B> commands.
|
|
<P>Before starting a kernel trace, issue the <B>fstrace lsset</B> command
|
|
to check the state of the <B>cm</B> event set.
|
|
<PRE> # <B>fstrace lsset cm</B>
|
|
</PRE>
|
|
<P>If tracing has not been enabled previously or if tracing has been turned
|
|
off on the client machine, the following output is displayed:
|
|
<PRE> Available sets:
|
|
cm inactive
|
|
</PRE>
|
|
<P>If tracing has been turned off and kernel memory is not allocated for the
|
|
trace log on the client machine, the following output is displayed:
|
|
<PRE> Available sets:
|
|
cm inactive (dormant)
|
|
</PRE>
|
|
<P>If the current state of the <B>cm</B> event set is <TT>inactive</TT>
|
|
or <TT>inactive (dormant)</TT>, turn on kernel tracing by issuing the
|
|
<B>fstrace setset</B> command with the <B>-active</B> flag.
|
|
<PRE> # <B>fstrace setset cm -active</B>
|
|
</PRE>
|
|
<P>If tracing is enabled currently on the client machine, the following output
|
|
is displayed:
|
|
<PRE> Available sets:
|
|
cm active
|
|
</PRE>
|
|
<P>If tracing is enabled currently, you do not need to use the <B>fstrace
|
|
setset</B> command. Do issue the <B>fstrace clear</B> command to
|
|
clear the contents of any existing trace log, removing prior traces that are
|
|
not related to the current problem.
|
|
<PRE> # <B>fstrace clear cm</B>
|
|
</PRE>
|
|
<P>After checking on the state of the event set, issue the <B>fstrace
|
|
lslog</B> command with the <B>-long</B> flag to check the current state
|
|
and size of the kernel trace log.
|
|
<PRE> # <B>fstrace lslog cmfx -long</B>
|
|
</PRE>
|
|
<P>If tracing has not been enabled previously or the <B>cm</B> event set
|
|
was set to <TT>active</TT> or <TT>inactive</TT> previously, output similar
|
|
to the following is displayed:
|
|
<PRE> Available logs:
|
|
cmfx : 60 kbytes (allocated)
|
|
</PRE>
|
|
<P>The <B>fstrace</B> tracing utility allocates 60 kilobytes of memory to
|
|
the trace log by default. You can increase or decrease the amount of
|
|
memory allocated to the kernel trace log by setting it with the <B>fstrace
|
|
setlog</B> command. The number specified with the
|
|
<B>-buffersize</B> argument represents the number of kilobytes allocated
|
|
to the kernel trace log. To increase the size of the kernel trace
|
|
log to 100 kilobytes, issue the following command.
|
|
<PRE> # <B>fstrace setlog cmfx 100</B>
|
|
</PRE>
|
|
<P>After ensuring that the kernel trace log is configured for your needs, you
|
|
can set up a file into which you can dump the kernel trace log. For
|
|
example, create a dump file with the name
|
|
<B>cmfx.dump.file.1</B> using the following
|
|
<B>fstrace dump</B> command. Issue the command as a continuous
|
|
process by adding the <B>-follow</B> and <B>-sleep</B>
|
|
arguments. Setting the <B>-sleep</B> argument to <I>10</I>
|
|
dumps output from the kernel trace log to the file every 10 seconds.
|
|
<PRE> # <B>fstrace dump -follow</B> cmfx <B>-file</B> cmfx.dump.file.1 <B>-sleep</B> 10
|
|
AFS Trace Dump -
|
|
Date: Fri Apr 16 10:54:57 1999
|
|
Found 1 logs.
|
|
time 32.965783, pid 0: Fri Apr 16 10:45:52 1999
|
|
time 32.965783, pid 33657: Close 0x5c39ed8 flags 0x20
|
|
time 32.965897, pid 33657: Gn_close vp 0x5c39ed8 flags 0x20 (returns
|
|
0x0)
|
|
time 35.159854, pid 10891: Breaking callback for 5bd95e4 states 1024
|
|
(volume 0)
|
|
time 35.407081, pid 10891: Breaking callback for 5c0fadc states 1024
|
|
(volume 0)
|
|
. .
|
|
. .
|
|
. .
|
|
time 71.440456, pid 33658: Lookup adp 0x5bbdcf0 name g3oCKs fid (756
|
|
4fb7e:588d240.2ff978a8.6)
|
|
time 71.440569, pid 33658: Returning code 2 from 19
|
|
time 71.440619, pid 33658: Gn_lookup vp 0x5bbdcf0 name g3oCKs (returns
|
|
0x2)
|
|
time 71.464989, pid 38267: Gn_open vp 0x5bbd000 flags 0x0 (returns 0x
|
|
0)
|
|
AFS Trace Dump - Completed
|
|
</PRE>
|
|
<HR><H2><A NAME="HDRWQ349" HREF="auagd002.htm#ToC_390">Using the afsmonitor Program</A></H2>
|
|
<A NAME="IDX7193"></A>
|
|
<P>The <B>afsmonitor</B> program enables you to monitor the status and
|
|
performance of specified File Server and Cache Manager processes by gathering
|
|
statistical information. Among its other uses, the
|
|
<B>afsmonitor</B> program can be used to fine-tune Cache Manager
|
|
configuration and load balance File Servers.
|
|
<P>The <B>afsmonitor</B> program enables you to perform the following
|
|
tasks.
|
|
<UL>
|
|
<P><LI>Monitor any number of File Server and Cache Manager processes on any
|
|
number of machines (in both local and foreign cells) from a single
|
|
location.
|
|
<P><LI>Set threshold values for any monitored statistic. When the value of
|
|
a statistic exceeds the threshold, the <B>afsmonitor</B> program
|
|
highlights it to draw your attention. You can set threshold levels that
|
|
apply to every machine or only some.
|
|
<P><LI>Invoke programs or scripts automatically when a statistic exceeds its
|
|
threshold.
|
|
</UL>
|
|
<P><H3><A NAME="HDRWQ350" HREF="auagd002.htm#ToC_391">Requirements for running the afsmonitor program</A></H3>
|
|
<A NAME="IDX7194"></A>
|
|
<P>The following software must be accessible to a machine where the
|
|
<B>afsmonitor</B> program is running:
|
|
<UL>
|
|
<P><LI>The AFS <B>xstat</B> libraries, which the <B>afsmonitor</B>
|
|
program uses to gather data
|
|
<P><LI>The <B>curses</B> graphics package, which most UNIX distributions
|
|
provide as a standard utility
|
|
</UL>
|
|
<A NAME="IDX7195"></A>
|
|
<A NAME="IDX7196"></A>
|
|
<P>The <B>afsmonitor</B> screens format successfully both on so-called
|
|
dumb terminals and in windowing systems that emulate terminals. For the
|
|
output to look its best, the display environment needs to support reverse
|
|
video and cursor addressing. Set the TERM environment variable to the
|
|
correct terminal type, or to a value that has characteristics similar to the
|
|
actual terminal type. The display window or terminal must be at least
|
|
80 columns wide and 12 lines long.
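<P>The display requirements can be checked before starting the program
with a sketch like the following. The function name
<TT>display_problems</TT> is hypothetical; the minimum dimensions are the
ones stated above, and the check cannot verify reverse video or cursor
addressing, only that TERM is set.

```python
import os
import shutil

MIN_COLUMNS = 80
MIN_LINES = 12


def display_problems(columns, lines, term):
    """Return a list of reasons the display may be unsuitable."""
    problems = []
    if columns < MIN_COLUMNS:
        problems.append("window is narrower than 80 columns")
    if lines < MIN_LINES:
        problems.append("window is shorter than 12 lines")
    if not term:
        problems.append("TERM environment variable is not set")
    return problems


size = shutil.get_terminal_size()
print(display_problems(size.columns, size.lines, os.environ.get("TERM", "")))
```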
|
|
<A NAME="IDX7197"></A>
|
|
<A NAME="IDX7198"></A>
|
|
<A NAME="IDX7199"></A>
|
|
<P>The <B>afsmonitor</B> program must run in the foreground, and in its
|
|
own separate, dedicated window or terminal. The window or terminal is
|
|
unavailable for any other activity as long as the <B>afsmonitor</B>
|
|
program is running. Any number of instances of the
|
|
<B>afsmonitor</B> program can run on a single machine, as long as each
|
|
instance runs in its own dedicated window or terminal. Note that it can
|
|
take up to three minutes to start an additional instance.
|
|
<P>
|
|
<A NAME="IDX7200"></A>
|
|
No privilege is required to run the <B>afsmonitor</B> program. By
|
|
convention, it is installed in the <B>/usr/afsws/bin</B> directory, and
|
|
anyone who can access the directory can monitor File Servers and Cache
|
|
Managers. The probes through which the <B>afsmonitor</B> program
|
|
collects statistics do not constitute a significant burden on the File Server
|
|
or Cache Manager unless hundreds of people are running the program. If
|
|
you wish to restrict its use, place the binary file in a directory available
|
|
only to authorized users.
|
|
<P><H3><A NAME="Header_392" HREF="auagd002.htm#ToC_392">The afsmonitor Output Screens</A></H3>
|
|
<A NAME="IDX7201"></A>
|
|
<P>The <B>afsmonitor</B> program displays its data on three screens:
|
|
<UL>
|
|
<P><LI><TT>System Overview</TT>: This screen appears automatically when
|
|
the <B>afsmonitor</B> program initializes. It summarizes separately
|
|
for File Servers and Cache Managers the number of machines being monitored and
|
|
how many of them have <I>alerts</I> (statistics that have exceeded their
|
|
thresholds). It then lists the hostname and number of alerts for each
|
|
machine being monitored, indicating if appropriate that a process failed to
|
|
respond to the last probe.
|
|
<P><LI><TT>File Servers</TT>: This screen displays File Server statistics
|
|
for each file server machine being monitored. It highlights statistics
|
|
that have exceeded their thresholds, and identifies machines that failed to
|
|
respond to the last probe.
|
|
<P><LI><TT>Cache Managers</TT>: This screen displays Cache Manager
|
|
statistics for each client machine being monitored. It highlights
|
|
statistics that have exceeded their thresholds, and identifies machines that
|
|
failed to respond to the last probe.
|
|
</UL>
|
|
<P>Fields at the corners of every screen display the following
|
|
information:
|
|
<UL>
|
|
<P><LI>In the top left corner, the program name and version number.
|
|
<P><LI>In the top right corner, the screen name, current and total page numbers,
|
|
and current and total column numbers. The page number (for example,
|
|
<TT>p. 1 of 3</TT>) indicates the index of the current page and the
|
|
total number of (vertical) pages over which data is displayed. The
|
|
column number (for example, <TT>c. 1 of 235</TT>) indicates the index
|
|
of the current leftmost column and the total number of columns in which data
|
|
appears. (The symbol <TT>>>></TT> indicates that there is additional
|
|
data to the right; the symbol <TT><<<</TT> indicates that
|
|
there is additional data to the left.)
|
|
<P><LI>In the bottom left corner, a list of the available commands. Enter
|
|
the first letter in the command name to run that command. Only the
|
|
currently possible options appear; for example, if there is only one page
|
|
of data, the <TT>next</TT> and <TT>prev</TT> commands, which scroll the
|
|
screen down and up respectively, do not appear. For descriptions of the
|
|
commands, see the following section about navigating the display
|
|
screens.
|
|
<P><LI>In the bottom right corner, the <TT>probes</TT> field reports how many
|
|
times the program has probed File Servers (<TT>fs</TT>), Cache Managers
|
|
(<TT>cm</TT>), or both. The counts for File Servers and Cache
|
|
Managers can differ. The <TT>freq</TT> field reports how often the
|
|
program sends probes.
|
|
</UL>
|
|
<P><B>Navigating the afsmonitor Display Screens</B>
|
|
<P>As noted, the lower left hand corner of every display screen displays the
|
|
names of the commands currently available for moving to alternate screens,
|
|
which can either be a different type or display more statistics or machines of
|
|
the current type. To execute a command, press the lowercase version of
|
|
the first letter in its name. Some commands also have an uppercase
|
|
version that has a somewhat different effect, as indicated in the following
|
|
list.
|
|
<DL>
|
|
<P><DT><B><TT>cm</TT>
|
|
</B><DD>Switches to the <TT>Cache Managers</TT> screen. Available only on
|
|
the <TT>System Overview</TT> and <TT>File Servers</TT> screens.
|
|
<P><DT><B><TT>fs</TT>
|
|
</B><DD>Switches to the <TT>File Servers</TT> screen. Available only on
|
|
the <TT>System Overview</TT> and the <TT>Cache Managers</TT>
|
|
screens.
|
|
<P><DT><B><TT>left</TT>
|
|
</B><DD>Scrolls horizontally to the left, to access the data columns situated to
|
|
the left of the current set. Available when the <TT><<<</TT>
|
|
symbol appears at the top left of the screen. Press uppercase
|
|
<B>L</B> to scroll horizontally all the way to the left (to display the
|
|
first set of data columns).
|
|
<P><DT><B><TT>next</TT>
|
|
</B><DD>Scrolls down vertically to the next page of machine names.
|
|
Available when there are two or more pages of machines and the final page is
|
|
not currently displayed. Press uppercase <B>N</B> to scroll to the
|
|
final page.
|
|
<P><DT><B><TT>oview</TT>
|
|
</B><DD>Switches to the <TT>System Overview</TT> screen. Available only
|
|
on the <TT>Cache Managers</TT> and <TT>File Servers</TT> screens.
|
|
<P><DT><B><TT>prev</TT>
|
|
</B><DD>Scrolls up vertically to the previous page of machine names.
|
|
Available when there are two or more pages of machines and the first page is
|
|
not currently displayed. Press uppercase <B>P</B> to scroll to the
|
|
first page.
|
|
<P><DT><B><TT>right</TT>
|
|
</B><DD>Scrolls horizontally to the right, to access the data columns situated to
|
|
the right of the current set. This command is available when the
|
|
<TT>>>></TT> symbol appears at the upper right of the screen. Press
|
|
uppercase <B>R</B> to scroll horizontally all the way to the right (to
|
|
display the final set of data columns).
|
|
</DL>
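<P>The availability rules in the preceding list can be summarized in a few lines of code. The following Python sketch is a model of those rules only, not the program's implementation: the function name <TT>available_commands</TT>, the screen identifiers, and the <TT>more_left</TT> and <TT>more_right</TT> flags (which stand for the two scroll symbols) are all hypothetical.

```python
def available_commands(screen, page, total_pages, more_left=False, more_right=False):
    """List the command names offered in the bottom left corner.

    screen is "oview", "fs", or "cm"; page is 1-based.
    more_left / more_right model the <<< and >>> scroll symbols.
    """
    commands = []
    if screen in ("oview", "fs"):
        commands.append("cm")       # switch to the Cache Managers screen
    if screen in ("oview", "cm"):
        commands.append("fs")       # switch to the File Servers screen
    if more_left:
        commands.append("left")
    if total_pages >= 2 and page < total_pages:
        commands.append("next")     # final page not yet displayed
    if screen in ("cm", "fs"):
        commands.append("oview")    # return to the System Overview screen
    if total_pages >= 2 and page > 1:
        commands.append("prev")     # first page not currently displayed
    if more_right:
        commands.append("right")
    return commands
```

<P>For instance, on the first of three pages of the <TT>File Servers</TT> screen with more columns to the right, the model offers <TT>cm</TT>, <TT>next</TT>, <TT>oview</TT>, and <TT>right</TT>.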
|
|
<P><H3><A NAME="Header_393" HREF="auagd002.htm#ToC_393">The System Overview Screen</A></H3>
|
|
<P>The <TT>System Overview</TT> screen appears automatically as the
|
|
<B>afsmonitor</B> program initializes. This screen displays the
|
|
status of as many File Server and Cache Manager processes as can fit in the
|
|
current window; scroll down to access additional information.
|
|
<P>The information on this screen is split into File Server information on the
|
|
left and Cache Manager information on the right. The header for each
|
|
grouping reports two pieces of information:
|
|
<UL>
|
|
<P><LI>The number of machines on which the program is monitoring the indicated
|
|
process
|
|
<P><LI>The number of alerts and the number of machines affected by them (an
|
|
<I>alert</I> means that a statistic has exceeded its threshold or a process
|
|
failed to respond to the last probe)
|
|
</UL>
|
|
<P>A list of the machines being monitored follows. If there are any
|
|
alerts on a machine, the number of them appears in square brackets to the left
|
|
of the hostname. If a process failed to respond to the last probe, the
|
|
letters <TT>PF</TT> (probe failure) appear in square brackets to the left of
|
|
the hostname.
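<P>The bracketed markers described above follow a simple rule, sketched here in Python. The function name <TT>overview_entry</TT> is hypothetical, and the blank padding used for a machine with no alerts is an assumption made so that hostnames line up; the actual screen layout may differ.

```python
def overview_entry(hostname, alerts=0, probe_failed=False):
    """Format one machine line in the style of the System Overview screen."""
    if probe_failed:
        marker = "[PF]"               # probe failure
    elif alerts > 0:
        marker = f"[{alerts:2d}]"     # number of alerts, right-aligned
    else:
        marker = " " * 4              # assumption: padding keeps hostnames aligned
    return f"{marker} {hostname}"
```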
|
|
<P>The following graphic is an example <TT>System Overview</TT>
|
|
screen. The <B>afsmonitor</B> program is monitoring six File
|
|
Servers and seven Cache Managers. The File Server process on host
|
|
<B>fs1.abc.com</B> and the Cache Manager on host
|
|
<B>cli33.abc.com</B> are each marked <TT>[ 1]</TT> to
|
|
indicate that one threshold value is exceeded. The <TT>[PF]</TT>
|
|
marker on host <B>fs6.abc.com</B> indicates that its File
|
|
Server process did not respond to the last probe.
|
|
<P><B><A NAME="Figure_6" HREF="auagd003.htm#FT_Figure_6">Figure 6. The afsmonitor System Overview Screen</A></B><BR>
|
|
<TABLE BORDER ><TR><TD><BR>
|
|
<B><BR><IMG SRC="overview.gif" ALT="System Overview Screen"><BR></B><BR>
|
|
</TD></TR></TABLE>
|
|
<P><H3><A NAME="Header_394" HREF="auagd002.htm#ToC_394">The File Servers Screen</A></H3>
|
|
<P>The <TT>File Servers</TT> screen displays the values collected at the
|
|
most recent probe for File Server statistics.
|
|
<P>A summary line at the top of the screen (just below the standard program
|
|
version and screen title blocks) specifies the number of monitored File
|
|
Servers, the number of alerts, and the number of machines affected by the
|
|
alerts.
|
|
<P>The first column always displays the hostnames of the machines running the
|
|
monitored File Servers.
|
|
<P>To the right of the hostname column appear as many columns of statistics as
|
|
can fit within the current width of the display screen or window; each
|
|
column requires space for 10 characters. The name of the statistic
|
|
appears at the top of each column. If the File Server on a machine did
|
|
not respond to the most recent probe, a pair of dashes (<TT>--</TT>) appears
|
|
in each column. If a value exceeds its configured threshold, it is
|
|
highlighted in reverse video. If a value is too large to fit into the
|
|
allotted column width, it overflows into the next row in the same
|
|
column.
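<P>Because each statistic column needs 10 characters, the number of columns visible at once follows from the window width. The Python sketch below works through that arithmetic. The width of the hostname column is not stated in this guide, so <TT>host_column_width</TT> is a caller-supplied assumption, and both function names are hypothetical.

```python
import math

STAT_COLUMN_WIDTH = 10  # characters per statistic column, as stated in the text


def stats_per_screen(window_width, host_column_width):
    """Number of statistic columns that fit beside the hostname column."""
    return max(0, (window_width - host_column_width) // STAT_COLUMN_WIDTH)


def horizontal_screens(total_stats, window_width, host_column_width):
    """How many horizontal screens are needed to show every statistic."""
    per_screen = stats_per_screen(window_width, host_column_width)
    if per_screen == 0:
        raise ValueError("window too narrow to show any statistic column")
    return math.ceil(total_stats / per_screen)
```

<P>Assuming a 20-character hostname column in an 80-column window, six statistic columns fit at a time, so 235 columns of data would take 40 horizontal screens to view in full.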
|
|
<P>For a list of the available File Server statistics, see <A HREF="auagd024.htm#HDRWQ617">Appendix C, The afsmonitor Program Statistics</A>.
|
|
<P>The following graphic depicts the <TT>File Servers</TT> screen that
|
|
follows the System Overview Screen example previously discussed; however,
|
|
one additional server probe has been completed. In this example, the
|
|
File Server process on <B>fs1</B> has exceeded the configured threshold
|
|
for the number of performance calls received (the <B>numPerfCalls</B>
|
|
statistic), and that field appears in reverse video. Host
|
|
<B>fs6</B> did not respond to Probe 10, so dashes appear in all
|
|
fields.
|
|
<P><B><A NAME="Figure_7" HREF="auagd003.htm#FT_Figure_7">Figure 7. The afsmonitor File Servers Screen</A></B><BR>
|
|
<TABLE BORDER ><TR><TD><BR>
|
|
<B><BR><IMG SRC="fserver1.gif" ALT="File Servers Screen"><BR></B><BR>
|
|
</TD></TR></TABLE>
|
|
<P>Both the File Servers and Cache Managers screens (discussed in the following
section) can display hundreds of columns of data and are therefore designed to
scroll left and right. In the preceding graphic, the screen displays
the leftmost set of columns, and the screen title block shows that column 1 of 235 is
|
|
displayed. The appearance of the <TT>>>></TT> symbol in the upper
|
|
right hand corner of the screen and the <B>right</B> command in the
|
|
command block indicate that additional data is available by scrolling
|
|
right. (For information on the available statistics, see <A HREF="auagd024.htm#HDRWQ617">Appendix C, The afsmonitor Program Statistics</A>.)
|
|
<P>If the <B>right</B> command is executed, the screen looks something
|
|
like the following example. Note that the horizontal scroll symbols now
|
|
point both to the left (<TT><<<</TT>) and to the right
|
|
(<TT>>>></TT>) and both the <B>left</B> and <B>right</B> commands
|
|
appear, indicating that additional data is available by scrolling both left
|
|
and right.
|
|
<P><B><A NAME="Figure_8" HREF="auagd003.htm#FT_Figure_8">Figure 8. The afsmonitor File Servers Screen Shifted One Page to the Right</A></B><BR>
|
|
<TABLE BORDER ><TR><TD><BR>
|
|
<B><BR><IMG SRC="fserver2.gif" ALT="File Servers Screen Shifted One Page to the Right"><BR></B><BR>
|
|
</TD></TR></TABLE>
|
|
<P><H3><A NAME="Header_395" HREF="auagd002.htm#ToC_395">The Cache Managers Screen</A></H3>
|
|
<P>The <TT>Cache Managers</TT> screen displays the values collected at
|
|
the most recent probe for Cache Manager statistics.
|
|
<P>A summary line at the top of the screen (just below the standard program
|
|
version and screen title blocks) specifies the number of monitored Cache
|
|
Managers, the number of alerts, and the number of machines affected by the
|
|
alerts.
|
|
<P>The first column always displays the hostnames of the machines running the
|
|
monitored Cache Managers.
|
|
<P>To the right of the hostname column appear as many columns of statistics as
|
|
can fit within the current width of the display screen or window; each
|
|
column requires space for 10 characters. The name of the statistic
|
|
appears at the top of each column. If the Cache Manager on a machine
|
|
did not respond to the most recent probe, a pair of dashes (<TT>--</TT>)
|
|
appears in each column. If a value exceeds its configured threshold, it
|
|
is highlighted in reverse video. If a value is too large to fit into
|
|
the allotted column width, it overflows into the next row in the same
|
|
column.
|
|
<P>For a list of the available Cache Manager statistics, see <A HREF="auagd024.htm#HDRWQ617">Appendix C, The afsmonitor Program Statistics</A>.
|
|
<P>The following graphic depicts a Cache Managers screen that follows the
|
|
System Overview Screen previously discussed. In the example, the Cache
|
|
Manager process on host <B>cli33</B> has exceeded the configured threshold
|
|
for the number of cells it can contact (the <B>numCellsContacted</B>
|
|
statistic), so that field appears in reverse video.
|
|
<P><B><A NAME="Figure_9" HREF="auagd003.htm#FT_Figure_9">Figure 9. The afsmonitor Cache Managers Screen</A></B><BR>
|
|
<TABLE BORDER WIDTH="100%"><TR><TD>
|
|
<B><BR><IMG SRC="cachmgr.gif" ALT="Cache Managers Screen"><BR></B>
|
|
</TD></TR></TABLE>
|
|
<HR><H2><A NAME="HDRWQ351" HREF="auagd002.htm#ToC_396">Configuring the afsmonitor Program</A></H2>
|
|
<A NAME="IDX7202"></A>
|
|
<A NAME="IDX7203"></A>
|
|
<P>To customize the <B>afsmonitor</B> program, create an ASCII-format
|
|
configuration file and use the <B>-config</B> argument to name it.
|
|
You can specify the following in the configuration file:
|
|
<UL>
|
|
<P><LI>The File Servers, Cache Managers, or both to monitor.
|
|
<P><LI>The statistics to display. By default, the display includes 271
|
|
statistics for File Servers and 570 statistics for Cache Managers. For
|
|
information on the available statistics, see <A HREF="auagd024.htm#HDRWQ617">Appendix C, The afsmonitor Program Statistics</A>.
|
|
<P><LI>The threshold values to set for statistics and a script or program to
|
|
execute if a threshold is exceeded. By default, no threshold values are
|
|
defined and no scripts or programs are executed.
|
|
</UL>
|
|
<P>The following list describes the instructions that can appear in the
|
|
configuration file:
|
|
<DL>
|
|
<P><DT><B><TT>cm <VAR>host_name</VAR></TT>
|
|
</B><DD>Names a client machine for which to display Cache Manager
|
|
statistics. The order of <B>cm</B> lines in the file determines the
|
|
order in which client machines appear from top to bottom on the <TT>System
|
|
Overview</TT> and <TT>Cache Managers</TT> output screens.
|
|
<P><DT><B><TT>fs <VAR>host_name</VAR></TT>
|
|
</B><DD>Names a file server machine for which to display File Server
|
|
statistics. The order of <B>fs</B> lines in the file determines the
|
|
order in which file server machines appear from top to bottom on the
|
|
<TT>System Overview</TT> and <TT>File Servers</TT> output screens.
|
|
<P><DT><B><TT>thresh fs | cm <VAR>field_name</VAR> <VAR>thresh_val</VAR>
|
|
[<VAR>cmd_to_run</VAR>] [<VAR>arg</VAR><SUB>1</SUB>] . . .
|
|
[<VAR>arg</VAR><SUB>n</SUB>]</TT>
|
|
</B><DD>Assigns the threshold value <VAR>thresh_val</VAR> to the statistic
|
|
<VAR>field_name</VAR>, for either a File Server statistic (<B>fs</B>) or a
|
|
Cache Manager statistic (<B>cm</B>). The optional
|
|
<VAR>cmd_to_run</VAR> field names a binary or script to execute each time
|
|
the value of the statistic changes from being below <VAR>thresh_val</VAR> to
|
|
being at or above <VAR>thresh_val</VAR>. A change between two values that
|
|
both exceed <VAR>thresh_val</VAR> does not retrigger the binary or
|
|
script. The optional <VAR>arg</VAR><SUB>1</SUB> through
|
|
<VAR>arg</VAR><SUB>n</SUB> fields are additional values that the
|
|
<B>afsmonitor</B> program passes as arguments to the
|
|
<VAR>cmd_to_run</VAR> command. If any of them includes one or more
|
|
spaces, enclose the entire field in double quotes.
|
|
<P>When it runs the command, the <B>afsmonitor</B> program passes it
parameters that describe the event. The parameters <B>fs</B>, <B>cm</B>, <VAR>field_name</VAR>,
<VAR>thresh_val</VAR>, and <VAR>arg</VAR><SUB>1</SUB> through
|
|
<VAR>arg</VAR><SUB>n</SUB> correspond to the values with the same name on the
|
|
<B>thresh</B> line. The <VAR>host_name</VAR> parameter identifies the
|
|
file server or client machine where the statistic has crossed the threshold,
|
|
and the <VAR>actual_val</VAR> parameter is the actual value of
|
|
<VAR>field_name</VAR> that equals or exceeds the threshold value.
|
|
<P>Use the <B>thresh</B> line to set either a global threshold, which
|
|
applies to all file server machines listed on <B>fs</B> lines or client
|
|
machines listed on <B>cm</B> lines in the configuration file, or a
|
|
machine-specific threshold, which applies to only one file server or client
|
|
machine.
|
|
<UL>
|
|
<P><LI>To set a global threshold, place the <B>thresh</B> line before any of
|
|
the <B>fs</B> or <B>cm</B> lines in the file.
|
|
<P><LI>To set a machine-specific threshold, place the <B>thresh</B> line
|
|
below the corresponding <B>fs</B> or <B>cm</B> line, and above any
|
|
other <B>fs</B> or <B>cm</B> lines. A machine-specific
|
|
threshold value always overrides the corresponding global threshold, if
|
|
set. Do not place a <B>thresh fs</B> line directly after a
|
|
<B>cm</B> line or a <B>thresh cm</B> line directly after a
|
|
<B>fs</B> line.
|
|
</UL>
|
|
<P><DT><B><TT>show fs | cm <VAR>field/group/section</VAR></TT>
|
|
</B><DD>Specifies which individual statistic, group of statistics, or section of
|
|
statistics to display on the <TT>File Servers</TT> screen (<B>fs</B>) or
|
|
<TT>Cache Managers</TT> screen (<B>cm</B>) and the order in which to
|
|
display them. The appendix of <B>afsmonitor</B> statistics in the
|
|
<I>IBM AFS Administration Guide</I> specifies the group and section to
|
|
which each statistic belongs. Include as many <B>show</B> lines as
|
|
necessary to customize the screen display as desired, and place them anywhere
|
|
in the file. The top-to-bottom order of the <B>show</B> lines in
|
|
the configuration file determines the left-to-right order in which the
|
|
statistics appear on the corresponding screen.
|
|
<P>If there are no <B>show</B> lines in the configuration file, then the
|
|
screens display all statistics for both Cache Managers and File
|
|
Servers. Similarly, if there are no <B>show fs</B> lines, the
|
|
<TT>File Servers</TT> screen displays all file server statistics, and if
|
|
there are no <B>show cm</B> lines, the <TT>Cache Managers</TT> screen
|
|
displays all client statistics.
|
|
<P><DT><B># <VAR>comments</VAR>
|
|
</B><DD>Precedes a line of text that the <B>afsmonitor</B> program ignores
|
|
because of the initial number (<B>#</B>) sign, which must appear in the
|
|
very first column of the line.
|
|
</DL>
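<P>The trigger rule for a <B>thresh</B> command (run it when the value moves from below the threshold to at or above it, and not again while the value stays there) can be sketched as follows. This Python fragment only models the rule as described in this section. The function name <TT>trigger_points</TT> is hypothetical, and treating the state before the first probe as below the threshold is an assumption.

```python
def trigger_points(values, thresh_val):
    """Return the indexes of the probes at which the command would be run."""
    fired = []
    was_at_or_above = False  # assumption: before the first probe counts as below
    for index, value in enumerate(values):
        at_or_above = value >= thresh_val
        # Fire only on the transition from below to at-or-above.
        if at_or_above and not was_at_or_above:
            fired.append(index)
        was_at_or_above = at_or_above
    return fired
```

<P>With a threshold of 1000 and successive probe values of 900, 1000, 1500, 800, and 1200, the command runs at the second and fifth probes; the change from 1000 to 1500 does not run it again.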
|
|
<P>For a list of the values that can appear in the
|
|
<VAR>field/group/section</VAR> field of a <B>show</B> instruction, see <A HREF="auagd024.htm#HDRWQ617">Appendix C, The afsmonitor Program Statistics</A>.
|
|
<P>The following example illustrates a possible configuration file:
|
|
<PRE>   thresh cm dlocalAccesses 1000000
   thresh cm dremoteAccesses 500000 handleDRemote
   thresh fs rx_maxRtt_Usec 1000
   cm client5
   cm client33
   cm client14
   cm client2
   thresh cm dlocalAccesses 2000000
   thresh cm vcacheMisses 10000
   fs fs3
   fs fs9
   fs fs5
   fs fs10
   show cm numCellsContacted
   show cm dlocalAccesses
   show cm dremoteAccesses
   show cm vcacheMisses
   show cm Auth_Stats_group
</PRE>
|
|
<P>Since the first three <B>thresh</B> instructions appear before any
|
|
<B>fs</B> or <B>cm</B> instructions, they set global threshold
|
|
values:
|
|
<UL>
|
|
<P><LI>All Cache Manager processes in this file use <B>1000000</B> as the
threshold for the <B>dlocalAccesses</B> statistic (except for the machine
<B>client2</B>, which uses an overriding value of
<B>2000000</B>).
|
|
<P><LI>All Cache Manager processes in this file use <B>500000</B> as the
|
|
threshold value for the <B>dremoteAccesses</B> statistic; if that
|
|
value is exceeded, the script <B>handleDRemote</B> is invoked.
|
|
<P><LI>All File Server processes in this file use <B>1000</B> as the
|
|
threshold value for the <B>rx_maxRtt_Usec</B> statistic.
|
|
</UL>
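<P>A command such as <B>handleDRemote</B> can be any executable. The following Python sketch shows one possible shape for it. The parameter order it expects (host name, <B>fs</B> or <B>cm</B>, field name, threshold value, actual value, then any extra arguments) is an assumption that this guide does not confirm; check it against the <B>afsmonitor</B> reference page for your release. The function name <TT>describe_alert</TT> is hypothetical.

```python
import sys


def describe_alert(params):
    """Turn the parameters passed by afsmonitor into a one-line message.

    Assumed order (verify for your release):
    host_name, fs|cm, field_name, thresh_val, actual_val, [arg1 ... argn]
    """
    if len(params) < 5:
        raise ValueError("expected at least five parameters")
    host_name, kind, field_name, thresh_val, actual_val = params[:5]
    extra = params[5:]
    process = {"fs": "File Server", "cm": "Cache Manager"}.get(kind, kind)
    message = (f"{process} on {host_name}: {field_name} is {actual_val} "
               f"(threshold {thresh_val})")
    if extra:
        message += " args=" + ",".join(extra)
    return message


# Run only when invoked with a full set of parameters.
if __name__ == "__main__" and len(sys.argv) >= 6:
    print(describe_alert(sys.argv[1:]), file=sys.stderr)
```

<P>Printing is used here for brevity. Because the <B>afsmonitor</B> program occupies its terminal, a production handler would more plausibly append the message to a log file or send mail.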
|
|
<P>The four <B>cm</B> instructions monitor the Cache Manager on the
|
|
machines <B>client5</B>, <B>client33</B>, <B>client14</B>, and
|
|
<B>client2</B>. The first three use all of the global threshold
|
|
values.
|
|
<P>The Cache Manager on <B>client2</B> uses the global threshold value for
|
|
the <B>dremoteAccesses</B> statistic, but a different one for the
|
|
<B>dlocalAccesses</B> statistic. Furthermore, <B>client2</B>
|
|
is the only Cache Manager that uses the threshold set for the
|
|
<B>vcacheMisses</B> statistic.
|
|
<P>The <B>fs</B> instructions monitor the File Server on the machines
|
|
<B>fs3</B>, <B>fs9</B>, <B>fs5</B>, and <B>fs10</B>.
|
|
They all use the global threshold for the <B>rx_maxRtt_Usec</B>
|
|
statistic.
|
|
<P>Because there are no <B>show fs</B> instructions, the File Servers
|
|
screen displays all File Server statistics. The Cache Managers screen
|
|
displays only the statistics named in <B>show cm</B> instructions,
|
|
ordering them from left to right. The <B>Auth_Stats_group</B>
|
|
includes several statistics, all of which are displayed (<B>curr_PAGs</B>,
|
|
<B>curr_Records</B>, <B>curr_AuthRecords</B>,
|
|
<B>curr_UnauthRecords</B>, <B>curr_MaxRecordsInPAG</B>,
|
|
<B>curr_LongestChain</B>, <B>PAGCreations</B>,
|
|
<B>TicketUpdates</B>, <B>HWM_PAGS</B>, <B>HWM_Records</B>,
|
|
<B>HWM_MaxRecordsInPAG</B>, and <B>HWM_LongestChain</B>).
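<P>The way global and machine-specific thresholds combine can be checked mechanically. The Python sketch below reads configuration text and reports the thresholds in effect for each machine, following the placement rules given earlier (a <B>thresh</B> line before any <B>fs</B> or <B>cm</B> line is global; one below an <B>fs</B> or <B>cm</B> line applies to that machine and overrides the global value). It is not the parser the program uses. The function name <TT>effective_thresholds</TT> and the shortened sample configuration are hypothetical, and threshold values are kept as text.

```python
import shlex

SAMPLE = """\
thresh cm dlocalAccesses 1000000
thresh cm dremoteAccesses 500000 handleDRemote
thresh fs rx_maxRtt_Usec 1000
cm client5
cm client2
thresh cm dlocalAccesses 2000000
thresh cm vcacheMisses 10000
fs fs3
"""


def effective_thresholds(config_text):
    """Map (kind, host_name) to {field_name: thresh_val} for each machine."""
    global_thresh = {"fs": {}, "cm": {}}
    specific = {}    # (kind, host) -> {field: value}
    hosts = []       # (kind, host) in file order
    current = None   # most recent fs/cm line, if any
    for line in config_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line, or comment marked in the first column
        words = shlex.split(line)  # honors double-quoted arguments
        keyword = words[0]
        if keyword in ("fs", "cm"):
            current = (keyword, words[1])
            hosts.append(current)
            specific[current] = {}
        elif keyword == "thresh":
            kind, field, value = words[1], words[2], words[3]
            if current is None:
                global_thresh[kind][field] = value
            elif kind == current[0]:
                specific[current][field] = value
            else:
                raise ValueError(f"thresh {kind} line follows a {current[0]} line")
        # show lines affect only the display and are skipped here
    return {host: {**global_thresh[host[0]], **specific[host]} for host in hosts}
```

<P>For the sample text, <B>client5</B> ends up with the two global Cache Manager thresholds, while <B>client2</B> has the overriding <B>dlocalAccesses</B> value of 2000000 and is the only machine with a <B>vcacheMisses</B> threshold.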
|
|
<HR><H2><A NAME="HDRWQ352" HREF="auagd002.htm#ToC_397">Writing afsmonitor Statistics to a File</A></H2>
|
|
<A NAME="IDX7204"></A>
|
|
<P>All of the statistical information collected and displayed by the
|
|
<B>afsmonitor</B> program can be preserved by writing it to an output
|
|
file. You can create an output file by using the <B>-output</B>
|
|
argument when you start the <B>afsmonitor</B> process. You can
|
|
use the output file to track process performance over long periods of time and
|
|
to apply post-processing techniques to further analyze system trends.
|
|
<P>The <B>afsmonitor</B> program output file is a simple ASCII file that
|
|
records the information reported by the File Server and Cache Manager
|
|
screens. The output file has the following format:
|
|
<PRE> <VAR>time</VAR> <VAR>host_name</VAR> <B>CM</B>|<B>FS</B> <VAR>list_of_measured_values</VAR>
|
|
</PRE>
|
|
<P>and specifies the <I>time</I> at which the
|
|
<I>list_of_measured_values</I> were gathered from the Cache Manager
|
|
(<B>CM</B>) or File Server (<B>FS</B>) process housed on
|
|
<VAR>host_name</VAR>. On those occasions when probes fail, the value
|
|
<TT>-1</TT> is reported instead of the
|
|
<I>list_of_measured_values</I>.
|
|
<P>This file format provides several advantages:
|
|
<UL>
|
|
<P><LI>It can be viewed using a standard editor. If you intend to view
|
|
this file frequently, use the <B>-detailed</B> flag with the
|
|
<B>-output</B> argument. It formats the output file in a way that
|
|
is easier to read.
|
|
<P><LI>It can be passed through filters to extract desired information using the
|
|
standard set of UNIX tools.
|
|
<P><LI>It is suitable for long term storage of the <B>afsmonitor</B> program
|
|
output.
|
|
</UL>
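<P>As an example of post-processing, the following Python sketch splits one line of the output file into its parts. It assumes the non-detailed format shown above, that the measured values are numeric, and that the time may occupy more than one whitespace-separated word; for that reason it locates the <B>CM</B> or <B>FS</B> marker instead of counting fields. The function name <TT>parse_record</TT> and the sample lines in the test are hypothetical.

```python
def parse_record(line):
    """Split one output line into (time, host_name, kind, values).

    values is None when the probe failed (the file records -1 instead).
    """
    words = line.split()
    # Find the CM/FS marker; at least a time word and a host must precede it.
    for position, word in enumerate(words):
        if word in ("CM", "FS") and position >= 2:
            break
    else:
        raise ValueError("no CM or FS marker found")
    time_text = " ".join(words[:position - 1])
    host_name = words[position - 1]
    raw_values = words[position + 1:]
    if raw_values == ["-1"]:
        return time_text, host_name, word, None
    return time_text, host_name, word, [float(v) for v in raw_values]
```

<P>A collection that legitimately consists of the single value -1 cannot be told apart from a failed probe by this sketch.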
|
|
<A NAME="IDX7205"></A>
|
|
<A NAME="IDX7206"></A>
|
|
<HR><H2><A NAME="Header_398" HREF="auagd002.htm#ToC_398">To start the afsmonitor Program</A></H2>
|
|
<OL TYPE=1>
|
|
<P><LI>Open a separate command shell window or use a dedicated terminal for each
|
|
instance of the <B>afsmonitor</B> program. This window or terminal
|
|
must be devoted to the exclusive use of the <B>afsmonitor</B> process
|
|
because the command cannot be run in the background.
|
|
<P><LI>Initialize the <B>afsmonitor</B> program. The message <TT>
|
|
afsmonitor Collecting Statistics...</TT>, followed by
|
|
the appearance of the System Overview screen, confirms a successful
|
|
start.
|
|
<PRE> % <B>afsmonitor</B> [<B>initcmd</B>] [<B>-config</B> <<VAR>configuration file</VAR>>] \
|
|
[<B>-frequency</B> <<VAR>poll frequency, in seconds</VAR>>] \
|
|
[<B>-output</B> <<VAR>storage file name</VAR>>] [<B>-detailed</B>] \
|
|
[<B>-debug</B> <<VAR>turn debugging output on to the named file</VAR>>] \
|
|
[<B>-fshosts</B> <<VAR>list of file servers to monitor</VAR>><SUP>+</SUP>] \
|
|
[<B>-cmhosts</B> <<VAR>list of cache managers to monitor</VAR>><SUP>+</SUP>]
|
|
afsmonitor Collecting Statistics...
|
|
</PRE>
|
|
<P>where
|
|
<DL>
|
|
<P><DT><B>initcmd
|
|
</B><DD>Is an optional string that accommodates the command's use of the AFS
|
|
command parser. It can be omitted and ignored.
|
|
<P><DT><B>-config
|
|
</B><DD>Specifies the pathname of an <B>afsmonitor</B> configuration file,
|
|
which lists the machines and statistics to monitor. Partial pathnames
|
|
are interpreted relative to the current working directory. Provide
|
|
either this argument or one or both of the <B>-fshosts</B> and
|
|
<B>-cmhosts</B> arguments. You must use a configuration file to set
|
|
thresholds or customize the screen display. For instructions on
|
|
creating the configuration file, see <A HREF="#HDRWQ351">Configuring the afsmonitor Program</A>.
|
|
<P><DT><B>-frequency
|
|
</B><DD>Specifies how often to probe the File Server and Cache Manager processes,
|
|
as a number of seconds. Acceptable values range from <B>1</B> to
|
|
<B>86400</B>; the default value is <B>60</B>. This
|
|
frequency applies to both File Server and Cache Manager probes; however,
|
|
File Server and Cache Manager probes are initiated and processed independently
|
|
of each other. The actual interval between probes to a host is the
|
|
probe frequency plus the time needed by all hosts to respond to the
|
|
probe.
|
|
<P><DT><B>-output
|
|
</B><DD>Specifies the name of an output file to which to write all of the
|
|
statistical data. By default, no output file is created. For
|
|
information on this file, see <A HREF="#HDRWQ352">Writing afsmonitor Statistics to a File</A>.
|
|
<P><DT><B>-detailed
|
|
</B><DD>Formats the output file named by the <B>-output</B> argument to be
|
|
more easily readable. The <B>-output</B> argument must be provided
|
|
along with this flag.
|
|
<P><DT><B>-fshosts
|
|
</B><DD>Identifies each File Server process to monitor by specifying the host it
|
|
is running on. You can identify a host using either its complete
|
|
Internet-style host name or an abbreviation acceptable to the cell's
|
|
naming service. Combine this argument with the <B>-cmhosts</B> argument if
|
|
you wish, but not the <B>-config</B> argument.
|
|
<P><DT><B>-cmhosts
|
|
</B><DD>Identifies each Cache Manager process to monitor by specifying the host it
|
|
is running on. You can identify a host using either its complete
|
|
Internet-style host name or an abbreviation acceptable to the cell's
|
|
naming service. Combine this argument with the <B>-fshosts</B> argument if
|
|
you wish, but not the <B>-config</B> argument.
|
|
</DL>
|
|
</OL>
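<P>The constraints among the arguments described above can be enforced by a small wrapper before the command is issued. The following Python sketch builds an argument list and does not run anything. The function name <TT>afsmonitor_argv</TT> is hypothetical, and only arguments documented in this section are used.

```python
def afsmonitor_argv(config=None, fshosts=(), cmhosts=(), frequency=None,
                    output=None, detailed=False):
    """Build an afsmonitor argument list, checking the documented constraints."""
    if config and (fshosts or cmhosts):
        raise ValueError("use -config or -fshosts/-cmhosts, not both")
    if not config and not fshosts and not cmhosts:
        raise ValueError("name a configuration file or at least one host")
    if detailed and not output:
        raise ValueError("-detailed requires -output")
    if frequency is not None and not 1 <= frequency <= 86400:
        raise ValueError("-frequency must be between 1 and 86400 seconds")
    argv = ["afsmonitor"]
    if config:
        argv += ["-config", config]
    if frequency is not None:
        argv += ["-frequency", str(frequency)]
    if output:
        argv += ["-output", output]
    if detailed:
        argv.append("-detailed")
    if fshosts:
        argv += ["-fshosts", *fshosts]
    if cmhosts:
        argv += ["-cmhosts", *cmhosts]
    return argv
```

<P>The resulting list still has to be run in the foreground in its own window or terminal, as the first step of the procedure requires.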
|
|
<HR><H2><A NAME="Header_399" HREF="auagd002.htm#ToC_399">To stop the afsmonitor program</A></H2>
|
|
<A NAME="IDX7207"></A>
|
|
<P>To exit an <B>afsmonitor</B> program session, enter the
|
|
<<B>Ctrl-c</B>> interrupt signal or an uppercase <B>Q</B>.
|
|
<HR><H2><A NAME="HDRWQ353" HREF="auagd002.htm#ToC_400">The xstat Data Collection Facility</A></H2>
|
|
<A NAME="IDX7208"></A>
|
|
<A NAME="IDX7209"></A>
|
|
<A NAME="IDX7210"></A>
|
|
<A NAME="IDX7211"></A>
|
|
<A NAME="IDX7212"></A>
|
|
<A NAME="IDX7213"></A>
|
|
<A NAME="IDX7214"></A>
|
|
<A NAME="IDX7215"></A>
|
|
<A NAME="IDX7216"></A>
|
|
<A NAME="IDX7217"></A>
|
|
<A NAME="IDX7218"></A>
|
|
<P>The <B>afsmonitor</B> program uses the <B>xstat</B> data collection
|
|
facility to gather and calculate the data that it displays. You can also use the
|
|
<B>xstat</B> facility to create your own data display programs. If
|
|
you do, keep the following in mind. The File Server considers any
|
|
program calling its RPC routines to be a Cache Manager; therefore, any
|
|
program calling the File Server interface directly must export the Cache
|
|
Manager's callback interface. The calling program must be capable
|
|
of emulating the necessary callback state, and it must respond to periodic
|
|
keep-alive messages from the File Server. In addition, a calling
|
|
program must be able to gather the collected data.
|
|
<P>The <B>xstat</B> facility consists of two C language libraries
|
|
available to user-level applications:
|
|
<UL>
|
|
<P><LI><B>/usr/afsws/lib/afs/libxstat_fs.a</B> exports calls that
|
|
gather information from one or more running File Server processes.
|
|
<P><LI><B>/usr/afsws/lib/afs/libxstat_cm.a</B> exports calls that
|
|
collect information from one or more running Cache Managers.
|
|
</UL>
|
|
<P>The libraries allow the caller to register
|
|
<UL>
|
|
<P><LI>A set of File Servers or Cache Managers to be examined.
|
|
<P><LI>The frequency with which the File Servers or Cache Managers are to be
|
|
probed for data.
|
|
<P><LI>A user-specified routine to be called each time data is collected.
|
|
</UL>
|
|
<P>The libraries handle all of the lightweight processes, callback
|
|
interactions, and timing issues associated with the data collection.
|
|
The user needs only to process the data as it arrives.
|
|
<P><H3><A NAME="Header_401" HREF="auagd002.htm#ToC_401">The libxstat Libraries</A></H3>
|
|
<A NAME="IDX7219"></A>
|
|
<A NAME="IDX7220"></A>
|
|
<P>The <B>libxstat_fs.a</B> and <B>libxstat_cm.a</B>
|
|
libraries handle the callback requirements and other complications associated
|
|
with the collection of data from File Servers and Cache Managers. The
|
|
user provides only the means of accumulating the desired data. Each
|
|
<B>xstat</B> library implements three routines:
|
|
<UL>
|
|
<P><LI>Initialization (<B>xstat_fs_Init</B> and <B>xstat_cm_Init</B>)
|
|
arranges the periodic collection and handling of data.
|
|
<P><LI>Immediate probe (<B>xstat_fs_ForceProbeNow</B> and
|
|
<B>xstat_cm_ForceProbeNow</B>) forces the immediate collection of data,
|
|
after which collection returns to its normal probe schedule.
|
|
<P><LI>Cleanup (<B>xstat_fs_Cleanup</B> and <B>xstat_cm_Cleanup</B>)
|
|
terminates all connections and removes all traces of the data collection from
|
|
memory.
|
|
</UL>
|
|
<P>
|
|
<A NAME="IDX7221"></A>
|
|
<A NAME="IDX7222"></A>
|
|
<A NAME="IDX7223"></A>
|
|
<A NAME="IDX7224"></A>
|
|
<A NAME="IDX7225"></A>
|
|
The File Server and Cache Manager each define data collections that clients
|
|
can fetch. A data collection is simply a related set of numbers that
|
|
can be collected as a unit. For example, the File Server and Cache
|
|
Manager each define profiling and performance data collections. The
|
|
profiling collections maintain counts of the number of times internal
|
|
functions are called within servers, allowing bottleneck analysis to be
|
|
performed. The performance collections record, among other things,
|
|
internal disk I/O statistics for a File Server and cache effectiveness figures
|
|
for a Cache Manager, allowing for performance analysis.
|
|
<P>
|
|
<A NAME="IDX7226"></A>
|
|
<A NAME="IDX7227"></A>
|
|
<A NAME="IDX7228"></A>
|
|
For a copy of the detailed specification which provides much additional usage
|
|
information about the <B>xstat</B> facility, its libraries, and the
|
|
routines in the libraries, contact AFS Product Support.
|
|
<P><H3><A NAME="Header_402" HREF="auagd002.htm#ToC_402">Example xstat Commands</A></H3>
|
|
<A NAME="IDX7229"></A>
|
|
<A NAME="IDX7230"></A>
|
|
<A NAME="IDX7231"></A>
|
|
<A NAME="IDX7232"></A>
|
|
<A NAME="IDX7233"></A>
|
|
<P>AFS comes with two low-level, example commands:
|
|
<B>xstat_fs_test</B> and <B>xstat_cm_test</B>. The commands
|
|
allow you to experiment with the <B>xstat</B> facility. They gather
|
|
information and display the available data collections for a File Server or
|
|
Cache Manager. They are intended merely to provide examples of the
|
|
types of data that can be collected via <B>xstat</B>; they are not
|
|
intended for use in the actual collection of data.
|
|
<A NAME="IDX7234"></A>
|
|
<A NAME="IDX7235"></A>
|
|
<A NAME="IDX7236"></A>
|
|
<A NAME="IDX7237"></A>
|
|
<P><H4><A NAME="Header_403">To use the example xstat_fs_test command</A></H4>
|
|
<OL TYPE=1>
|
|
<P><LI>Issue the example <B>xstat_fs_test</B> command to test the routines in
|
|
the <B>libxstat_fs.a</B> library and display the data collections
|
|
associated with the File Server process. The command executes in the
|
|
foreground.
|
|
<PRE>   % <B>xstat_fs_test</B> [<B>initcmd</B>]  \
                   <B>-fsname</B> &lt;<VAR>File Server name(s) to monitor</VAR>&gt;<SUP>+</SUP>  \
                   <B>-collID</B> &lt;<VAR>Collection(s) to fetch</VAR>&gt;<SUP>+</SUP>  [<B>-onceonly</B>]  \
                   [<B>-frequency</B> &lt;<VAR>poll frequency, in seconds</VAR>&gt;]  \
                   [<B>-period</B> &lt;<VAR>data collection time, in minutes</VAR>&gt;]  [<B>-debug</B>]
</PRE>
|
|
<P>where
|
|
<DL>
|
|
<P><DT><B>xstat_fs_test
|
|
</B><DD>Must be typed in full.
|
|
<P><DT><B>initcmd
|
|
</B><DD>Is an optional string that accommodates the command's use of the AFS
|
|
command parser. You can omit it.
|
|
<P><DT><B>-fsname
|
|
</B><DD>Is the Internet host name of each file server machine on which to monitor
|
|
the File Server process.
|
|
<P><DT><B>-collID
|
|
</B><DD>Specifies each data collection to return. The indicated data
|
|
collection defines the type and amount of data the command is to gather about
|
|
the File Server. Data is returned in the form of a predefined data
|
|
structure (refer to the specification documents referenced previously for more
|
|
information about the data structures).
|
|
<P>There are two acceptable values:
|
|
<UL>
|
|
<P><LI><B>1</B> reports various internal performance statistics related to
|
|
the File Server (for example, vnode cache entries and <B>Rx</B> protocol
|
|
activity).
|
|
<P><LI><B>2</B> reports all of the internal performance statistics provided
|
|
by the <B>1</B> setting, plus some additional, detailed performance
|
|
figures about the File Server (for example, minimum, maximum, and cumulative
|
|
statistics regarding File Server RPCs, how long they take to complete, and how
|
|
many succeed).
|
|
</UL>
|
|
<P><DT><B>-onceonly
|
|
</B><DD>Directs the command to gather statistics just one time. Omit this
|
|
option to have the command continue to probe the File Server for statistics
|
|
every 30 seconds. If you omit this option, you can use the
|
|
&lt;<B>Ctrl-c</B>&gt; interrupt signal to halt the command at any
|
|
time.
|
|
<P><DT><B>-frequency
|
|
</B><DD>Sets the frequency in seconds at which the program initiates probes to the
|
|
File Server. If you omit this argument, the default is 30
|
|
seconds.
|
|
<P><DT><B>-period
|
|
</B><DD>Sets how long the utility runs before exiting, as a number of
|
|
minutes. If you omit this argument, the default is 10 minutes.
|
|
<P><DT><B>-debug
|
|
</B><DD>Displays additional information as the command runs.
|
|
</DL>
|
|
</OL>
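<P>The following is a minimal sketch, not part of the AFS distribution, of how the options described above combine into a single one-time probe. The helper function name and the host name <B>fs1.example.com</B> are hypothetical. The function only prints the command line, so that you can review it before running it on a machine where <B>xstat_fs_test</B> is installed.

```shell
#!/bin/sh
# Sketch only: print an xstat_fs_test command line for one File Server.
# The function name and the example host name are illustrative assumptions.
xstat_fs_cmd() {
    host=$1        # file server machine to probe
    coll=$2        # collection ID; this guide lists 1 and 2
    case "$coll" in
        1|2) ;;
        *) echo "collection ID must be 1 or 2" >&2; return 1 ;;
    esac
    # -onceonly gathers statistics a single time and then exits.
    echo "xstat_fs_test -fsname $host -collID $coll -onceonly"
}

xstat_fs_cmd fs1.example.com 2
```

<P>Printing the command instead of executing it keeps the sketch safe to try on a machine that has no AFS binaries installed.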
|
|
<A NAME="IDX7238"></A>
|
|
<A NAME="IDX7239"></A>
|
|
<A NAME="IDX7240"></A>
|
|
<A NAME="IDX7241"></A>
|
|
<P><H4><A NAME="Header_404">To use the example xstat_cm_test command</A></H4>
|
|
<OL TYPE=1>
|
|
<P><LI>Issue the example <B>xstat_cm_test</B> command to test the routines in
|
|
the <B>libxstat_cm.a</B> library and display the data collections
|
|
associated with the Cache Manager. The command executes in the
|
|
foreground.
|
|
<PRE>   % <B>xstat_cm_test</B> [<B>initcmd</B>]  \
                   <B>-cmname</B> &lt;<VAR>Cache Manager name(s) to monitor</VAR>&gt;<SUP>+</SUP>  \
                   <B>-collID</B> &lt;<VAR>Collection(s) to fetch</VAR>&gt;<SUP>+</SUP>  \
                   [<B>-onceonly</B>]  [<B>-frequency</B> &lt;<VAR>poll frequency, in seconds</VAR>&gt;]  \
                   [<B>-period</B> &lt;<VAR>data collection time, in minutes</VAR>&gt;]  [<B>-debug</B>]
</PRE>
|
|
<P>where
|
|
<DL>
|
|
<P><DT><B>xstat_cm_test
|
|
</B><DD>Must be typed in full.
|
|
<P><DT><B>initcmd
|
|
</B><DD>Is an optional string that accommodates the command's use of the AFS
|
|
command parser. You can omit it.
|
|
<P><DT><B>-cmname
|
|
</B><DD>Is the host name of each client machine on which to monitor the Cache
|
|
Manager.
|
|
<P><DT><B>-collID
|
|
</B><DD>Specifies each data collection to return. The indicated data
|
|
collection defines the type and amount of data the command is to gather about
|
|
the Cache Manager. Data is returned in the form of a predefined data
|
|
structure (refer to the specification documents referenced previously for more
|
|
information about the data structures).
|
|
<P>There are three acceptable values:
|
|
<UL>
|
|
<P><LI><B>0</B> provides profiling information about the number of times
different internal Cache Manager routines were called since the Cache Manager
was started.
|
|
<P><LI><B>1</B> reports various internal performance statistics related to
|
|
the Cache Manager (for example, statistics about how effectively the cache is
|
|
being used and the quantity of intracell and intercell data access).
|
|
<P><LI><B>2</B> reports all of the internal performance statistics provided
|
|
by the <B>1</B> setting, plus some additional, detailed performance
|
|
figures about the Cache Manager (for example, statistics about the number of
|
|
RPCs sent by the Cache Manager and how long they take to complete; and
|
|
statistics regarding things such as authentication, access, and PAG
|
|
information associated with data access).
|
|
</UL>
|
|
<P><DT><B>-onceonly
|
|
</B><DD>Directs the command to gather statistics just one time. Omit this
|
|
option to have the command continue to probe the Cache Manager for statistics
|
|
every 30 seconds. If you omit this option, you can use the
|
|
&lt;<B>Ctrl-c</B>&gt; interrupt signal to halt the command at any
|
|
time.
|
|
<P><DT><B>-frequency
|
|
</B><DD>Sets the frequency in seconds at which the program initiates probes to the
|
|
Cache Manager. If you omit this argument, the default is 30
|
|
seconds.
|
|
<P><DT><B>-period
|
|
</B><DD>Sets how long the utility runs before exiting, as a number of
|
|
minutes. If you omit this argument, the default is 10 minutes.
|
|
<P><DT><B>-debug
|
|
</B><DD>Displays additional information as the command runs.
|
|
</DL>
|
|
</OL>
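<P>As a rough illustration of how the <B>-frequency</B> and <B>-period</B> arguments interact, the sketch below prints a repeated-polling command line and estimates the number of probes in one run as the period (converted to seconds) divided by the frequency. The function names and the host name <B>client1.example.com</B> are hypothetical. The estimate is an approximation and is not taken from the <B>xstat</B> specification.

```shell
#!/bin/sh
# Sketch only: estimate probes per run and print an xstat_cm_test command.
# Function names and the example host name are illustrative assumptions.
probe_estimate() {
    freq=$1      # -frequency value, in seconds
    period=$2    # -period value, in minutes
    echo $(( $period * 60 / $freq ))
}

xstat_cm_cmd() {
    # $1 = client machine, $2 = collection ID, $3 = frequency, $4 = period
    echo "xstat_cm_test -cmname $1 -collID $2 -frequency $3 -period $4"
}

# The documented defaults: a probe every 30 seconds for 10 minutes.
xstat_cm_cmd client1.example.com 1 30 10
echo "approximate probes: $(probe_estimate 30 10)"
```

<P>With the default values the estimate is 600 / 30, or about 20 probes.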
|
|
<HR><H2><A NAME="HDRWQ354" HREF="auagd002.htm#ToC_405">Auditing AFS Events on AIX File Servers</A></H2>
|
|
<A NAME="IDX7242"></A>
|
|
<A NAME="IDX7243"></A>
|
|
<A NAME="IDX7244"></A>
|
|
<A NAME="IDX7245"></A>
|
|
<P>You can audit AFS events on AIX File Servers using an AFS mechanism that
|
|
transfers audit information from AFS to the AIX auditing system. The
|
|
following general classes of AFS events can be audited. For a complete
|
|
list of specific AFS audit events, see <A HREF="auagd025.htm#HDRWQ620">Appendix D, AIX Audit Events</A>.
|
|
<UL>
|
|
<P><LI>Authentication and Identification Events
|
|
<P><LI>Security Events
|
|
<P><LI>Privilege Required Events
|
|
<P><LI>Object Creation and Deletion Events
|
|
<P><LI>Attribute Modification Events
|
|
<P><LI>Process Control Events
|
|
</UL>
|
|
<TABLE><TR><TD ALIGN="LEFT" VALIGN="TOP"><B>Note:</B></TD><TD ALIGN="LEFT" VALIGN="TOP">This section assumes familiarity with the AIX auditing system. For
|
|
more information, see the <I>AIX System Management Guide</I> for the
|
|
version of AIX you are using.
|
|
</TD></TR></TABLE>
|
|
<P><H3><A NAME="Header_406" HREF="auagd002.htm#ToC_406">Configuring AFS Auditing on AIX File Servers</A></H3>
|
|
<P>The directory <B>/usr/afs/local/audit</B> contains three files that
|
|
contain the information needed to configure AIX File Servers to audit AFS
|
|
events:
|
|
<UL>
|
|
<P><LI>The <B>events.sample</B> file contains information on auditable
|
|
AFS events. The contents of this file must be integrated into the
|
|
corresponding AIX events file (<B>/etc/security/audit/events</B>).
|
|
<P><LI>The <B>config.sample</B> file defines the six classes of AFS
|
|
audit events and the events that make up each class. It also defines
|
|
the classes of AFS audit events to audit for the File Server, which runs as
|
|
the local superuser <B>root</B>. The contents of this file must be
|
|
integrated into the corresponding AIX config file
|
|
(<B>/etc/security/audit/config</B>).
|
|
<P><LI>The <B>objects.sample</B> file contains a list of information
|
|
about audited files. Audit only files in the local file
|
|
space. The contents of this file must be integrated into the
|
|
corresponding AIX objects file
|
|
(<B>/etc/security/audit/objects</B>).
|
|
</UL>
|
|
<P>Once you have properly configured these files to include the AFS-relevant
|
|
information, use the AIX auditing system to start up and shut down the
|
|
auditing.
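<P>Because the sample files are merged into existing AIX files, it can be prudent to save copies of the AIX files first. The following sketch does only that; it does not perform the merge. The function name and the <B>.pre-afs</B> suffix are assumptions for illustration, and the default directory is the one named above.

```shell
#!/bin/sh
# Sketch only: back up the AIX audit files before merging the AFS samples
# into them by hand. The function name and the ".pre-afs" suffix are
# assumptions; the default directory is the one named in this section.
backup_audit_files() {
    aix_dir=${1:-/etc/security/audit}
    for name in events config objects; do
        if [ -f "$aix_dir/$name" ]; then
            # -p preserves the mode and timestamps of the original file.
            cp -p "$aix_dir/$name" "$aix_dir/$name.pre-afs" || return 1
            echo "saved $aix_dir/$name.pre-afs"
        fi
    done
}
```

<P>Called with no argument as the local superuser <B>root</B>, the function works on <B>/etc/security/audit</B>. Pass a different directory to try it on scratch copies.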
|
|
<P><H3><A NAME="Header_407" HREF="auagd002.htm#ToC_407">To enable AFS auditing</A></H3>
|
|
<OL TYPE=1>
|
|
<P><LI>Create the following string in the file <B>/usr/afs/local/Audit</B> on
|
|
each File Server on which you plan to audit AFS events:
|
|
<PRE> <B>AFS_AUDIT_AllEvents</B>
|
|
</PRE>
|
|
<P><LI>Issue the <B>bos restart</B> command (with the <B>-all</B> flag)
|
|
to stop and restart all server processes on each File Server. For
|
|
instructions on using this command, see <A HREF="auagd009.htm#HDRWQ170">Stopping and Immediately Restarting Processes</A>.
|
|
</OL>
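<P>The two steps above can be sketched as follows for a single File Server, run on that machine. The function name, the <B>AUDIT_FILE</B> override variable, and the host name are hypothetical. The <B>bos restart</B> command is printed rather than executed, so that you can review it first.

```shell
#!/bin/sh
# Sketch only: write the audit string, then print the restart command.
# The function name, the AUDIT_FILE override, and the host are assumptions.
enable_afs_audit() {
    server=$1
    audit_file=${AUDIT_FILE:-/usr/afs/local/Audit}
    # Step 1: create the string that turns on auditing of AFS events.
    echo "AFS_AUDIT_AllEvents" > "$audit_file" || return 1
    # Step 2: printed, not run, so the command can be reviewed first.
    echo "bos restart $server -all"
}
```

<P>For example, <B>enable_afs_audit fs1.example.com</B> writes the string to <B>/usr/afs/local/Audit</B> and prints the <B>bos restart</B> command to issue next.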
|
|
<P><H3><A NAME="Header_408" HREF="auagd002.htm#ToC_408">To disable AFS auditing</A></H3>
|
|
<OL TYPE=1>
|
|
<P><LI>Remove the contents of the file <B>/usr/afs/local/Audit</B> on each
File Server on which you no longer want to audit AFS events.
|
|
<P><LI>Issue the <B>bos restart</B> command (with the <B>-all</B> flag)
|
|
to stop and restart all server processes on each File Server. For
|
|
instructions on using this command, see <A HREF="auagd009.htm#HDRWQ170">Stopping and Immediately Restarting Processes</A>.
|
|
</OL>
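<P>A matching sketch for disabling auditing on a single File Server, run on that machine, follows. As before, the function name, the <B>AUDIT_FILE</B> override variable, and the host name are hypothetical, and the <B>bos restart</B> command is only printed.

```shell
#!/bin/sh
# Sketch only: empty the audit file, then print the restart command.
# The function name, the AUDIT_FILE override, and the host are assumptions.
disable_afs_audit() {
    server=$1
    audit_file=${AUDIT_FILE:-/usr/afs/local/Audit}
    # Step 1: truncate the file so that it no longer holds the audit string.
    : > "$audit_file" || return 1
    # Step 2: printed, not run, so the command can be reviewed first.
    echo "bos restart $server -all"
}
```

<P>The file itself is kept and only its contents are removed, which matches the wording of the first step.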
|
|
<HR><P ALIGN="center"> <A HREF="../index.htm"><IMG SRC="../books.gif" BORDER="0" ALT="[Return to Library]"></A> <A HREF="auagd002.htm#ToC"><IMG SRC="../toc.gif" BORDER="0" ALT="[Contents]"></A> <A HREF="auagd012.htm"><IMG SRC="../prev.gif" BORDER="0" ALT="[Previous Topic]"></A> <A HREF="#Top_Of_Page"><IMG SRC="../top.gif" BORDER="0" ALT="[Top of Topic]"></A> <A HREF="auagd014.htm"><IMG SRC="../next.gif" BORDER="0" ALT="[Next Topic]"></A> <A HREF="auagd026.htm#HDRINDEX"><IMG SRC="../index.gif" BORDER="0" ALT="[Index]"></A> <P>
|
|
<!-- Begin Footer Records ========================================== -->
|
|
<P><HR><B>
|
|
<br>© <A HREF="http://www.ibm.com/">IBM Corporation 2000.</A> All Rights Reserved
|
|
</B>
|
|
<!-- End Footer Records ============================================ -->
|
|
<A NAME="Bot_Of_Page"></A>
|
|
</BODY></HTML>
|