mirror of
https://git.openafs.org/openafs.git
synced 2025-01-31 13:38:01 +00:00
=head1 NAME

afsmonitor - Monitors File Servers and Cache Managers

=head1 SYNOPSIS

B<afsmonitor> [B<initcmd>] [B<-config> <I<configuration file>>]
[B<-frequency> <I<poll frequency, in seconds>>]
[B<-output> <I<storage file name>>] [B<-detailed>]
[B<-debug> <I<turn debugging output on to the named file>>]
[B<-fshosts> <I<list of file servers to monitor>>+]
[B<-cmhosts> <I<list of cache managers to monitor>>+]
[B<-buffers> <I<number of buffer slots>>] [B<-help>]

B<afsmonitor> [B<i>] [B<-co> <I<configuration file>>]
[B<-fr> <I<poll frequency, in seconds>>]
[B<-o> <I<storage file name>>] [B<-det>]
[B<-deb> <I<turn debugging output on to the named file>>]
[B<-fs> <I<list of file servers to monitor>>+]
[B<-cm> <I<list of cache managers to monitor>>+]
[B<-b> <I<number of buffer slots>>] [B<-h>]

=head1 DESCRIPTION

The B<afsmonitor> command initializes a program that gathers and
displays statistics about specified File Server and Cache Manager
operations. It allows the issuer to monitor, from a single location, a
wide range of File Server and Cache Manager operations on any number of
machines in both local and foreign cells.

There are 271 available File Server statistics and 571 available Cache
Manager statistics, listed in the appendix about B<afsmonitor>
statistics in the I<IBM AFS Administration Guide>. By default,
the command displays all of the relevant statistics for the file server
machines named by the B<-fshosts> argument and the client machines
named by the B<-cmhosts> argument. To limit the display to only
the statistics of interest, list them in the configuration file specified by
the B<-config> argument. In addition, use the configuration
file for the following purposes:

=over 4

=item *

To set threshold values for any monitored statistic. When the value
of a statistic exceeds the threshold, the B<afsmonitor> command
displays it in reverse video. There are no default threshold
values.

=item *

To invoke a program or script automatically when a statistic exceeds its
threshold. The AFS distribution does not include any such
scripts.

=item *

To list the file server and client machines to monitor, instead of using
the B<-fshosts> and B<-cmhosts> arguments.

=back

For a description of the configuration file, see the B<afsmonitor>
Configuration File reference page.

=head1 CAVEATS

The following software must be accessible to a machine where the
B<afsmonitor> program is running:

=over 4

=item *

The AFS B<xstat> libraries, which the B<afsmonitor>
program uses to gather data

=item *

The curses graphics package, which most UNIX distributions
provide as a standard utility

=back

The B<afsmonitor> screens format successfully both on so-called
dumb terminals and in windowing systems that emulate terminals. For the
output to look its best, the display environment needs to support reverse
video and cursor addressing. Set the TERM environment variable to the
correct terminal type, or to a value that has characteristics similar to the
actual terminal type. The display window or terminal must be at least
80 columns wide and 12 lines long.

The B<afsmonitor> program must run in the foreground, and in its
own separate, dedicated window or terminal. The window or terminal is
unavailable for any other activity as long as the B<afsmonitor>
program is running. Any number of instances of the
B<afsmonitor> program can run on a single machine, as long as each
instance runs in its own dedicated window or terminal. Note that it can
take up to three minutes to start an additional instance.

=head1 OPTIONS

=over 4

=item initcmd

Accommodates the command's use of the AFS command parser, and is
optional.

=item -config

Names the configuration file which lists the machines to monitor,
statistics to display, and threshold values, if any. A partial pathname
is interpreted relative to the current working directory. Provide this
argument if you provide neither the B<-fshosts> argument nor the
B<-cmhosts> argument. For instructions on creating
this file, see the preceding B<Description> section, and the section
on the B<afsmonitor> program in the I<IBM AFS Administration
Guide>.

=item -frequency

Specifies in seconds how often the B<afsmonitor> program probes
the File Servers and Cache Managers. Valid values range from
B<1> to B<86400> (which is 24 hours); the default value
is B<60>. This frequency applies to both File Servers and Cache
Managers, but the B<afsmonitor> program initiates the two types of
probes, and processes their results, separately. The actual interval
between probes to a host is the probe frequency plus the time required for all
hosts to respond.

=item -output

Names the file to which the B<afsmonitor> program writes all of
the statistics that it collects. By default, no output file is
created. See the section on the B<afsmonitor> command in the
I<IBM AFS Administration Guide> for information on this file.

=item -detailed

Formats the information in the output file named by the B<-output>
argument in a maximally readable format. Provide the B<-output>
argument along with this one.

=item -fshosts

Names one or more machines from which to gather File Server
statistics. For each machine, provide either a fully qualified host
name, or an unambiguous abbreviation (the ability to resolve an abbreviation
depends on the state of the cell's name service at the time the command
is issued). This argument can be combined with the B<-cmhosts>
argument, but not with the B<-config> argument.

=item -cmhosts

Names one or more machines from which to gather Cache Manager
statistics. For each machine, provide either a fully qualified host
name, or an unambiguous abbreviation (the ability to resolve an abbreviation
depends on the state of the cell's name service at the time the command
is issued). This argument can be combined with the B<-fshosts>
argument, but not with the B<-config> argument.

=item -buffers

Is nonoperational and provided to accommodate potential future
enhancements to the program.

=item -help

Prints the online help for this command. All other valid options
are ignored.

=back

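
The constraints among the options above can be checked before the
command is started. The following Python sketch is illustrative only: the
helper name C<build_afsmonitor_argv> and the host name
C<fs1.example.com> are hypothetical, and the checks encode nothing
beyond the rules stated in this section (the B<-frequency> range, the
exclusion between B<-config> and the host arguments, and the pairing of
B<-detailed> with B<-output>).

```python
from typing import List, Optional, Sequence


def build_afsmonitor_argv(
    fshosts: Sequence[str] = (),
    cmhosts: Sequence[str] = (),
    config: Optional[str] = None,
    frequency: Optional[int] = None,
    output: Optional[str] = None,
    detailed: bool = False,
) -> List[str]:
    """Assemble an afsmonitor argument vector (hypothetical helper)."""
    # -fshosts and -cmhosts may be combined with each other, not with -config.
    if config is not None and (fshosts or cmhosts):
        raise ValueError("-config cannot be combined with -fshosts or -cmhosts")
    # Without host arguments, the machines must come from a configuration file.
    if config is None and not (fshosts or cmhosts):
        raise ValueError("provide -config, -fshosts, or -cmhosts")
    # Valid probe frequencies range from 1 to 86400 seconds.
    if frequency is not None and not 1 <= frequency <= 86400:
        raise ValueError("-frequency must be between 1 and 86400 seconds")
    # -detailed only affects the file named by -output.
    if detailed and output is None:
        raise ValueError("-detailed requires -output")

    argv = ["afsmonitor"]
    if config is not None:
        argv += ["-config", config]
    if fshosts:
        argv += ["-fshosts", *fshosts]
    if cmhosts:
        argv += ["-cmhosts", *cmhosts]
    if frequency is not None:
        argv += ["-frequency", str(frequency)]
    if output is not None:
        argv += ["-output", output]
    if detailed:
        argv.append("-detailed")
    return argv


# fs1.example.com is a placeholder host name.
print(build_afsmonitor_argv(fshosts=["fs1.example.com"], frequency=30))
```

The helper only builds the argument list. Because the program must run
in the foreground in its own dedicated window or terminal, launching it
is left to the caller.
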
=head1 OUTPUT

The B<afsmonitor> program displays its data on three screens:

=over 4

=item *

C<System Overview>: This screen appears automatically when
the B<afsmonitor> program initializes. It summarizes separately
for File Servers and Cache Managers the number of machines being monitored and
how many of them have I<alerts> (statistics that have exceeded their
thresholds). It then lists the hostname and number of alerts for each
machine being monitored, indicating if appropriate that a process failed to
respond to the last probe.

=item *

C<File Servers>: This screen displays File Server statistics
for each file server machine being monitored. It highlights statistics
that have exceeded their thresholds, and identifies machines that failed to
respond to the last probe.

=item *

C<Cache Managers>: This screen displays Cache Manager
statistics for each client machine being monitored. It highlights
statistics that have exceeded their thresholds, and identifies machines that
failed to respond to the last probe.

=back

Fields at the corners of every screen display the following
information:

=over 4

=item *

In the top left corner, the program name and version number.

=item *

In the top right corner, the screen name, current and total page numbers,
and current and total column numbers. The page number (for example,
C<p. 1 of 3>) indicates the index of the current page and the
total number of (vertical) pages over which data is displayed. The
column number (for example, C<c. 1 of 235>) indicates the index
of the current leftmost column and the total number of columns in which data
appears. (The symbol C<<<< >>> >>>> indicates that there is additional
data to the right; the symbol C<<<< <<< >>>> indicates that
there is additional data to the left.)

=item *

In the bottom left corner, a list of the available commands. Enter
the first letter in the command name to run that command. Only the
currently possible options appear; for example, if there is only one page
of data, the C<next> and C<prev> commands, which scroll the
screen down and up respectively, do not appear. For descriptions of the
commands, see the following section about navigating the display
screens.

=item *

In the bottom right corner, the C<probes> field reports how many
times the program has probed File Servers (C<fs>), Cache Managers
(C<cm>), or both. The counts for File Servers and Cache
Managers can differ. The C<freq> field reports how often the
program sends probes.

=back

=head2 Navigating the afsmonitor Display Screens

As noted, the lower left-hand corner of every display screen shows the
names of the commands currently available for moving to alternate screens,
which can either be of a different type or display more statistics or machines
of the current type. To execute a command, press the lowercase version of
the first letter in its name. Some commands also have an uppercase
version that has a somewhat different effect, as indicated in the following
list.

=over 4

=item C<cm>

Switches to the C<Cache Managers> screen. Available only on
the C<System Overview> and C<File Servers> screens.

=item C<fs>

Switches to the C<File Servers> screen. Available only on
the C<System Overview> and the C<Cache Managers>
screens.

=item C<left>

Scrolls horizontally to the left, to access the data columns situated to
the left of the current set. Available when the C<<<< <<< >>>>
symbol appears at the top left of the screen. Press uppercase
B<L> to scroll horizontally all the way to the left (to display the
first set of data columns).

=item C<next>

Scrolls down vertically to the next page of machine names.
Available when there are two or more pages of machines and the final page is
not currently displayed. Press uppercase B<N> to scroll to the
final page.

=item C<oview>

Switches to the C<System Overview> screen. Available only
on the C<Cache Managers> and C<File Servers> screens.

=item C<prev>

Scrolls up vertically to the previous page of machine names.
Available when there are two or more pages of machines and the first page is
not currently displayed. Press uppercase B<P> to scroll to the
first page.

=item C<right>

Scrolls horizontally to the right, to access the data columns situated to
the right of the current set. This command is available when the
C<<<< >>> >>>> symbol appears at the upper right of the screen. Press
uppercase B<R> to scroll horizontally all the way to the right (to
display the final set of data columns).

=back

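
The availability rules in the list above can be summarized in a few
lines of Python. This is a sketch of the documented behavior, not the
program's own code: the function name C<available_commands> and its
parameters are hypothetical, and the C<more_left> and C<more_right>
flags stand for the presence of the C<<<< <<< >>>> and C<<<< >>> >>>>
symbols on the screen.

```python
from typing import List

SCREENS = ("System Overview", "File Servers", "Cache Managers")


def available_commands(screen: str, page: int, pages: int,
                       more_left: bool, more_right: bool) -> List[str]:
    """Return the command names that the list above says are available."""
    if screen not in SCREENS:
        raise ValueError(f"unknown screen: {screen!r}")
    commands = []
    # cm and fs switch screens, so each is hidden on its own screen.
    if screen != "Cache Managers":
        commands.append("cm")
    if screen != "File Servers":
        commands.append("fs")
    if more_left:
        commands.append("left")
    # next needs two or more pages and a current page that is not the last.
    if pages >= 2 and page < pages:
        commands.append("next")
    if screen != "System Overview":
        commands.append("oview")
    # prev needs two or more pages and a current page that is not the first.
    if pages >= 2 and page > 1:
        commands.append("prev")
    if more_right:
        commands.append("right")
    return commands


print(available_commands("System Overview", page=1, pages=3,
                         more_left=False, more_right=False))
```

Each returned name is invoked by pressing its first letter in lowercase.
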
=head2 The System Overview Screen

The C<System Overview> screen appears automatically as the
B<afsmonitor> program initializes. This screen displays the
status of as many File Server and Cache Manager processes as can fit in the
current window; scroll down to access additional information.

The information on this screen is split into File Server information on the
left and Cache Manager information on the right. The header for each
grouping reports two pieces of information:

=over 4

=item *

The number of machines on which the program is monitoring the indicated
process

=item *

The number of alerts and the number of machines affected by them (an
I<alert> means that a statistic has exceeded its threshold or a process
failed to respond to the last probe)

=back

A list of the machines being monitored follows. If there are any
alerts on a machine, the number of them appears in square brackets to the left
of the hostname. If a process failed to respond to the last probe, the
letters C<PF> (probe failure) appear in square brackets to the left of
the hostname.

=head2 The File Servers Screen

The C<File Servers> screen displays the values collected at the
most recent probe for File Server statistics.

A summary line at the top of the screen (just below the standard program
version and screen title blocks) specifies the number of monitored File
Servers, the number of alerts, and the number of machines affected by the
alerts.

The first column always displays the hostnames of the machines running the
monitored File Servers.

To the right of the hostname column appear as many columns of statistics as
can fit within the current width of the display screen or window; each
column requires space for 10 characters. The name of the statistic
appears at the top of each column. If the File Server on a machine did
not respond to the most recent probe, a pair of dashes (C<-->) appears
in each column. If a value exceeds its configured threshold, it is
highlighted in reverse video. If a value is too large to fit into the
allotted column width, it overflows into the next row in the same
column.

=head2 The Cache Managers Screen

The C<Cache Managers> screen displays the values collected at the
most recent probe for Cache Manager statistics.

A summary line at the top of the screen (just below the standard program
version and screen title blocks) specifies the number of monitored Cache
Managers, the number of alerts, and the number of machines affected by the
alerts.

The first column always displays the hostnames of the machines running the
monitored Cache Managers.

To the right of the hostname column appear as many columns of statistics as
can fit within the current width of the display screen or window; each
column requires space for 10 characters. The name of the statistic
appears at the top of each column. If the Cache Manager on a machine
did not respond to the most recent probe, a pair of dashes (C<-->)
appears in each column. If a value exceeds its configured threshold, it
is highlighted in reverse video. If a value is too large to fit into
the allotted column width, it overflows into the next row in the same
column.

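
Both the C<File Servers> and C<Cache Managers> screens give each
statistic a 10-character column, so the number of statistics visible at
once follows from the window width. The Python sketch below works
through that arithmetic. The width of the hostname column is not stated
on this page, so C<host_column_width> is an assumed input, and the
function names are hypothetical. The estimate of horizontal sets also
assumes that each scroll moves by one full set of columns.

```python
import math

COLUMN_WIDTH = 10  # from the text: each statistic column needs 10 characters


def columns_per_screen(window_width: int, host_column_width: int) -> int:
    """Statistic columns that fit beside the hostname column.

    host_column_width is an assumption; this page does not specify it.
    """
    usable = window_width - host_column_width
    return max(usable // COLUMN_WIDTH, 0)


def column_sets_needed(total_columns: int, per_screen: int) -> int:
    """Horizontal sets needed to show every column, one full set at a time."""
    if per_screen <= 0:
        raise ValueError("window too narrow to show any statistic column")
    return math.ceil(total_columns / per_screen)


# An 80-column window with a hostname column assumed to be 20 characters wide.
per_screen = columns_per_screen(window_width=80, host_column_width=20)
print(per_screen, column_sets_needed(235, per_screen))
```

Under these assumptions the minimum 80-column window shows 6 statistic
columns at a time, which is one reason to list only the statistics of
interest in the configuration file.
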
=head2 Writing to an Output File

Include the B<-output> argument to name the file into which the
B<afsmonitor> program writes all of the statistics it collects.
The output file can be useful for tracking performance over long periods of
time, and enables the administrator to apply post-processing techniques that
reveal system trends. The AFS distribution does not include any
post-processing programs.

The output file is in ASCII format and records the same information as the
C<File Servers> and C<Cache Managers> display screens.
Each line in the file uses the following format to record the time at which
the B<afsmonitor> program gathered the indicated statistic from the
Cache Manager (C<CM>) or File Server (C<FS>) running on the
machine called I<host_name>. If a probe failed, the error code
C<-1> appears in the I<statistic> field.

I<time> I<host_name> CM|FS I<statistic>

If the administrator usually reviews the output file manually, rather than
using it as input to an automated analysis program or script, including the
B<-detailed> flag formats the data in a more easily readable
form.

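
Because the distribution includes no post-processing programs, a site
that wants trend analysis has to parse the output file itself. The
following Python sketch splits one line of the default (not
B<-detailed>) format into its four fields. It is a starting point under
stated assumptions: the fields are taken to be separated by white space,
the I<time> field is whatever precedes the host name, and the sample
line, including its timestamp layout and host name, is invented for
illustration. Check the assumptions against a real output file before
relying on them.

```python
from typing import NamedTuple


class Record(NamedTuple):
    time: str          # text before the host name; its layout is not assumed
    host_name: str
    kind: str          # "CM" or "FS"
    statistic: str     # everything after the CM|FS marker
    probe_failed: bool


def parse_line(line: str) -> Record:
    """Split 'time host_name CM|FS statistic' into a Record."""
    tokens = line.split()
    for i, token in enumerate(tokens):
        # The marker needs a time and a host before it and a statistic after it.
        if token in ("CM", "FS") and i >= 2 and i + 1 < len(tokens):
            statistic = " ".join(tokens[i + 1:])
            return Record(
                time=" ".join(tokens[: i - 1]),
                host_name=tokens[i - 1],
                kind=token,
                statistic=statistic,
                # The error code -1 in the statistic field marks a failed probe.
                probe_failed=statistic == "-1",
            )
    raise ValueError(f"not an afsmonitor record: {line!r}")


# Invented sample line; the real timestamp layout may differ.
print(parse_line("Mon Jan 1 00:00:00 2001 fs1.example.com FS -1"))
```

Searching for the C<CM> or C<FS> marker, rather than counting fields
from the left, keeps the parser independent of how many tokens the
timestamp occupies.
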
=head1 EXAMPLES

For examples of commands, display screens, and configuration files, see the
section about the B<afsmonitor> program in the I<IBM AFS
Administration Guide>.

=head1 PRIVILEGE REQUIRED

None

=head1 SEE ALSO

L<afsmonitor Configuration File(1)>,
L<fstrace(1)>,
L<scout(1)>

=head1 COPYRIGHT

IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.

This documentation is covered by the IBM Public License Version 1.0. It was
converted from HTML to POD by software written by Chas Williams and Russ
Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.