mirror of
https://git.openafs.org/openafs.git
synced 2025-01-19 07:20:11 +00:00
0e91773fd7
LICENSE IPL10 Update the fileserver documentation for demand-attach and add documentation of other missing options and notes where some options are only applicable with particular builds.
=head1 NAME
|
|
|
|
fileserver - Initializes the File Server component of the fs process
|
|
|
|
=head1 SYNOPSIS
|
|
|
|
=for html
|
|
<div class="synopsis">
|
|
|
|
B<fileserver> S<<< [B<-auditlog> <I<path to log file>>] >>>
|
|
S<<< [B<-d> <I<debug level>>] >>>
|
|
S<<< [B<-p> <I<number of processes>>] >>>
|
|
S<<< [B<-spare> <I<number of spare blocks>>] >>>
|
|
S<<< [B<-pctspare> <I<percentage spare>>] >>>
|
|
S<<< [B<-b> <I<buffers>>] >>>
|
|
S<<< [B<-l> <I<large vnodes>>] >>>
|
|
S<<< [B<-s> <I<small vnodes>>] >>>
|
|
S<<< [B<-vc> <I<volume cachesize>>] >>>
|
|
S<<< [B<-w> <I<call back wait interval>>] >>>
|
|
S<<< [B<-cb> <I<number of call backs>>] >>>
|
|
S<<< [B<-banner>] >>>
|
|
S<<< [B<-novbc>] >>>
|
|
S<<< [B<-implicit> <I<admin mode bits: rlidwka>>] >>>
|
|
S<<< [B<-readonly>] >>>
|
|
S<<< [B<-hr> <I<number of hours between refreshing the host cps>>] >>>
|
|
S<<< [B<-busyat> <I<< redirect clients when queue > n >>>] >>>
|
|
S<<< [B<-nobusy>] >>>
|
|
S<<< [B<-rxpck> <I<number of rx extra packets>>] >>>
|
|
S<<< [B<-rxdbg>] >>>
|
|
S<<< [B<-rxdbge>] >>>
|
|
S<<< [B<-rxmaxmtu> <I<bytes>>] >>>
|
|
S<<< [B<-nojumbo>] >>>
|
|
S<<< [B<-rxbind>] >>>
|
|
S<<< [B<-allow-dotted-principals>] >>>
|
|
S<<< [B<-L>] >>>
|
|
S<<< [B<-S>] >>>
|
|
S<<< [B<-k> <I<stack size>>] >>>
|
|
S<<< [B<-realm> <I<Kerberos realm name>>] >>>
|
|
S<<< [B<-udpsize> <I<size of socket buffer in bytes>>] >>>
|
|
S<<< [B<-sendsize> <I<size of send buffer in bytes>>] >>>
|
|
S<<< [B<-abortthreshold> <I<abort threshold>>] >>>
|
|
S<<< [B<-enable_peer_stats>] >>>
|
|
S<<< [B<-enable_process_stats>] >>>
|
|
S<<< [B<-syslog> [<I< loglevel >>]] >>>
|
|
S<<< [B<-mrafslogs>] >>>
|
|
S<<< [B<-saneacls>] >>>
|
|
S<<< [B<-help>] >>>
|
|
S<<< [B<-fs-state-dont-save>] >>>
|
|
S<<< [B<-fs-state-dont-restore>] >>>
|
|
S<<< [B<-fs-state-verify> (none | save | restore | both)] >>>
|
|
S<<< [B<-vhashsize> <I<log(2) of number of volume hash buckets>>] >>>
|
|
S<<< [B<-vlrudisable>] >>>
|
|
S<<< [B<-vlruthresh> <I<minutes before unused volumes become eligible for soft detach>>] >>>
|
|
S<<< [B<-vlruinterval> <I<seconds between VLRU scans>>] >>>
|
|
S<<< [B<-vlrumax> <I<max volumes to soft detach in one VLRU scan>>] >>>
|
|
S<<< [B<-vattachpar> <I<number of volume attach threads>>] >>>
|
|
S<<< [B<-m> <I<min percentage spare in partition>>] >>>
|
|
S<<< [B<-lock>] >>>
|
|
|
|
=for html
|
|
</div>
|
|
|
|
=head1 DESCRIPTION
|
|
|
|
The B<fileserver> command initializes the File Server component of the
|
|
C<fs> process. In the conventional configuration, its binary file is
|
|
located in the F</usr/afs/bin> directory on a file server machine.
|
|
|
|
The B<fileserver> command is not normally issued at the command shell
|
|
prompt, but rather placed into a file server machine's
|
|
F</usr/afs/local/BosConfig> file with the B<bos create> command. If it is
|
|
ever issued at the command shell prompt, the issuer must be logged onto a
|
|
file server machine as the local superuser C<root>.
|
|
|
|
The File Server creates the F</usr/afs/logs/FileLog> log file as it
|
|
initializes, if the file does not already exist. It does not write a
|
|
detailed trace by default, but the B<-d> option may be used to
|
|
increase the amount of detail. Use the B<bos getlog> command to
|
|
display the contents of the log file.
|
|
|
|
The command's arguments enable the administrator to control many aspects
|
|
of the File Server's performance, as detailed in L<OPTIONS>. By default
|
|
the B<fileserver> command sets values for many arguments that are suitable
|
|
for a medium-sized file server machine. To set values suitable for a small
|
|
or large file server machine, use the B<-S> or B<-L> flag
|
|
respectively. The following list describes the parameters and
|
|
corresponding argument for which the B<fileserver> command sets default
|
|
values, and the table below summarizes the setting for each of the three
|
|
machine sizes.
|
|
|
|
=over 4
|
|
|
|
=item *
|
|
|
|
The maximum number of lightweight processes (LWPs) or pthreads
|
|
the File Server uses to handle requests for data; corresponds to the
|
|
B<-p> argument. The File Server always uses a minimum of 32 KB of
|
|
memory for these processes.
|
|
|
|
=item *
|
|
|
|
The maximum number of directory blocks the File Server caches in memory;
|
|
corresponds to the B<-b> argument. Each cached directory block (buffer)
|
|
consumes 2,092 bytes of memory.
|
|
|
|
=item *
|
|
|
|
The maximum number of large vnodes the File Server caches in memory for
|
|
tracking directory elements; corresponds to the B<-l> argument. Each large
|
|
vnode consumes 292 bytes of memory.
|
|
|
|
=item *
|
|
|
|
The maximum number of small vnodes the File Server caches in memory for
|
|
tracking file elements; corresponds to the B<-s> argument. Each small
|
|
vnode consumes 100 bytes of memory.
|
|
|
|
=item *
|
|
|
|
The maximum volume cache size, which determines how many volumes the File
|
|
Server can cache in memory before having to retrieve data from disk;
|
|
corresponds to the B<-vc> argument.
|
|
|
|
=item *
|
|
|
|
The maximum number of callback structures the File Server caches in
|
|
memory; corresponds to the B<-cb> argument. Each callback structure
|
|
consumes 16 bytes of memory.
|
|
|
|
=item *
|
|
|
|
The maximum number of Rx packets the File Server uses; corresponds to the
|
|
B<-rxpck> argument. Each packet consumes 1544 bytes of memory.
|
|
|
|
=back
|
|
|
|
The default values are:
|
|
|
|
  Parameter (Argument)               Small (-S)     Medium   Large (-L)
  ---------------------------------------------------------------------
  Number of LWPs (-p)                         6          9           12
  Number of cached dir blocks (-b)           70         90          120
  Number of cached large vnodes (-l)        200        400          600
  Number of cached small vnodes (-s)        200        400          600
  Maximum volume cache size (-vc)           200        400          600
  Number of callbacks (-cb)              20,000     60,000       64,000
  Number of Rx packets (-rxpck)             100        150          200
|
|
|
|
To override any of the values, provide the indicated argument (which can
|
|
be combined with the B<-S> or B<-L> flag).
|
|
|
|
The amount of memory required for the File Server varies. The approximate
|
|
default memory usage is 751 KB when the B<-S> flag is used (small
|
|
configuration), 1.1 MB when all defaults are used (medium configuration),
|
|
and 1.4 MB when the B<-L> flag is used (large configuration). If
|
|
additional memory is available, increasing the value of the B<-cb> and
|
|
B<-vc> arguments can improve File Server performance most directly.
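
The per-item sizes listed above make it possible to estimate how much extra memory an override costs. The following is a minimal sketch, not part of OpenAFS: the function names are hypothetical, and it sums only the items for which this page quotes a size (no size is given for B<-p> or B<-vc>), so its totals are not expected to reproduce the approximate overall figures quoted above.

```python
# Per-item memory costs in bytes, as quoted in the parameter list on this page.
BYTES_PER_ITEM = {
    "b": 2092,      # cached directory block (buffer)
    "l": 292,       # large vnode
    "s": 100,       # small vnode
    "cb": 16,       # callback structure
    "rxpck": 1544,  # Rx packet
}

# Default counts from the table of defaults for each machine size.
DEFAULTS = {
    "small":  {"b": 70,  "l": 200, "s": 200, "cb": 20000, "rxpck": 100},
    "medium": {"b": 90,  "l": 400, "s": 400, "cb": 60000, "rxpck": 150},
    "large":  {"b": 120, "l": 600, "s": 600, "cb": 64000, "rxpck": 200},
}


def itemized_bytes(size="medium", **overrides):
    """Sum the listed per-item costs for one configuration (hypothetical helper)."""
    settings = dict(DEFAULTS[size], **overrides)
    return sum(BYTES_PER_ITEM[name] * count for name, count in settings.items())


def extra_bytes(size="medium", **overrides):
    """Additional bytes the overrides cost relative to that size's defaults."""
    return itemized_bytes(size, **overrides) - itemized_bytes(size)
```

For example, C<extra_bytes("medium", cb=100000)> gives the added cost of raising B<-cb> from the medium default to 100,000 callbacks.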
|
|
|
|
By default, the File Server allows a volume to exceed its quota by 1 MB
|
|
when an application is writing data to an existing file in a volume that
|
|
is full. The File Server still does not allow users to create new files in
|
|
a full volume. To change the default, use one of the following arguments:
|
|
|
|
=over 4
|
|
|
|
=item *
|
|
|
|
Set the B<-spare> argument to the number of extra kilobytes that the File
|
|
Server allows as overage. A value of C<0> allows no overage.
|
|
|
|
=item *
|
|
|
|
Set the B<-pctspare> argument to the percentage of the volume's quota the
|
|
File Server allows as overage.
|
|
|
|
=back
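
The overage rules above can be sketched as a small calculation. This is an illustration under stated assumptions, not File Server code: the function name is hypothetical, the percentage is rounded down to a whole kilobyte (the page does not say how the File Server rounds), and the 0 to 99 range for B<-pctspare> is taken from L<OPTIONS>.

```python
DEFAULT_SPARE_KB = 1024  # the default 1 MB overage described on this page


def write_limit_kb(quota_kb, spare_kb=None, pctspare=None):
    """Largest size (in KB) a volume may reach while an existing file is
    being written, under the rules described above (hypothetical helper)."""
    if spare_kb is not None and pctspare is not None:
        # The File Server exits if both arguments are given.
        raise ValueError("-spare and -pctspare must not be combined")
    if pctspare is not None:
        if not 0 <= pctspare <= 99:
            raise ValueError("-pctspare must be between 0 and 99")
        # Assumption: round the percentage down to a whole kilobyte.
        return quota_kb + quota_kb * pctspare // 100
    if spare_kb is None:
        spare_kb = DEFAULT_SPARE_KB
    if spare_kb < 0:
        raise ValueError("-spare must not be negative")
    return quota_kb + spare_kb
```

With a quota of 3,000,000 KB, the default allows 3,001,024 KB, C<-pctspare 10> allows 3,300,000 KB, and C<-spare 0> allows no overage at all.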
|
|
|
|
By default, the File Server implicitly grants the C<a> (administer) and
|
|
C<l> (lookup) permissions to system:administrators on the access control
|
|
list (ACL) of every directory in the volumes stored on its file server
|
|
machine. In other words, the group's members can exercise those two
|
|
permissions even when an entry for the group does not appear on an ACL. To
|
|
change the set of default permissions, use the B<-implicit> argument.
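
A value intended for B<-implicit> can be checked before it is placed in F<BosConfig>. The sketch below is an assumption-laden illustration, not the File Server's own parser: the function name is hypothetical, and the accepted letters and shorthand words are the ones listed for B<-implicit> in L<OPTIONS>.

```python
STANDARD = set("rlidwka")    # standard permission letters
AUXILIARY = set("ABCDEFGH")  # auxiliary permission letters
SHORTHAND = {"all", "none", "read", "write"}


def is_valid_implicit(value):
    """Return True if value looks acceptable for -implicit (hypothetical check)."""
    if value in SHORTHAND:
        return True
    # Otherwise require a non-empty string made only of permission letters.
    return bool(value) and set(value) <= (STANDARD | AUXILIARY)
```

The check only validates the spelling; it does not expand the shorthand words into letters.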
|
|
|
|
The File Server maintains a I<host current protection subgroup> (I<host
|
|
CPS>) for each client machine from which it has received a data access
|
|
request. Like the CPS for a user, a host CPS lists all of the Protection
|
|
Database groups to which the machine belongs, and the File Server compares
|
|
the host CPS to a directory's ACL to determine in what manner users on the
|
|
machine are authorized to access the directory's contents. When the B<pts
|
|
adduser> or B<pts removeuser> command is used to change the groups to
|
|
which a machine belongs, the File Server must recompute the machine's host
|
|
CPS in order to notice the change. By default, the File Server contacts
|
|
the Protection Server every two hours to recompute host CPSs, implying
|
|
that it can take that long for changed group memberships to become
|
|
effective. To change this frequency, use the B<-hr> argument.
|
|
|
|
The File Server stores volumes in partitions. A partition is a
|
|
filesystem or directory on the server machine that is named C</vicepX>
|
|
or C</vicepXX>, where X or XX is "a" through "z" or "aa" through "zz". The
|
|
File Server expects that the /vicepXX directories are each on a
|
|
dedicated filesystem. The File Server will only use a /vicepXX if it's
|
|
a mountpoint for another filesystem, unless the file
|
|
C</vicepXX/AlwaysAttach> exists. The data in the partition is in a
|
|
special format that can only be accessed using OpenAFS commands or an
|
|
OpenAFS client.
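
The naming rule above can be expressed as a pattern. This is a minimal sketch with a hypothetical function name; it checks only the name as described here (one or two lowercase letters after C</vicep>) and does not test whether the directory is a mountpoint or contains an F<AlwaysAttach> file.

```python
import re

# /vicep followed by "a".."z" or "aa".."zz", per the naming rule on this page.
VICEP_RE = re.compile(r"/vicep[a-z]{1,2}")


def is_partition_name(path):
    """Return True if path is named like an AFS server partition."""
    return VICEP_RE.fullmatch(path) is not None
```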
|
|
|
|
The File Server generates the following message when a partition is nearly
|
|
full:
|
|
|
|
No space left on device
|
|
|
|
This command does not use the syntax conventions of the AFS command
|
|
suites. Provide the command name and all option names in full.
|
|
|
|
=head1 CAUTIONS
|
|
|
|
Do not use the B<-k> and B<-w> arguments, which are intended for use
|
|
by the OpenAFS developers only. Changing them from their default
|
|
values can result in unpredictable File Server behavior. In any case,
|
|
on many operating systems the File Server uses native threads rather
|
|
than the LWP threads, so using the B<-k> argument to set the LWP stack
size has no effect.
|
|
|
|
Do not specify both the B<-spare> and B<-pctspare> arguments. Doing so
|
|
causes the File Server to exit, leaving an error message in the
|
|
F</usr/afs/logs/FileLog> file.
|
|
|
|
Options that are available only on some system types, such as the B<-m>
|
|
and B<-lock> options, appear in the output generated by the B<-help>
|
|
option only on the relevant system type.
|
|
|
|
Currently, the maximum size of a volume is 2 terabytes (2^31 kilobytes)
|
|
and the maximum size of a /vicepX partition on a fileserver is 2^64
|
|
kilobytes. The fileserver will not report an error when it has access
|
|
to a partition larger than 2^64 kilobytes, but it will probably fail if
|
|
the administrator attempts to use more than 2^64 kilobytes of space. In
|
|
addition, there are reports of erroneous disk usage numbers when
|
|
B<vos partinfo> or other OpenAFS disk reporting tools are used with
|
|
partitions larger than 2^64 kilobytes.
|
|
|
|
The maximum number of directory entries is 64,000 if all of the
|
|
entries have names that are 15 characters or less in length. A name
|
|
that is 15 characters long requires the use of only one block in the
|
|
directory. Additional sequential blocks are required to store entries
|
|
with names that are longer than 15 characters. Each additional block
|
|
provides an additional length of 32 characters for the name of the
|
|
entry.
|
|
|
|
In real world use, the maximum number of objects in an AFS directory
|
|
is usually between 16,000 and 25,000, depending on the average name
|
|
length.
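
The relationship between name length and directory capacity can be worked through numerically. The sketch below rests on an assumption that this page implies but does not state outright: that the 64,000 figure is a budget of entry blocks shared by all names in the directory. The function names are hypothetical.

```python
ENTRY_BLOCK_BUDGET = 64000  # assumption: 64,000 one-block entries fill a directory


def blocks_for_name(length):
    """Blocks needed by one entry whose name has the given length."""
    if length < 1:
        raise ValueError("a name has at least one character")
    if length <= 15:
        return 1
    # Each additional block holds up to 32 more characters (ceiling division).
    extra_chars = length - 15
    return 1 + -(-extra_chars // 32)


def max_entries(name_length):
    """Entries that fit if every name has the same length."""
    return ENTRY_BLOCK_BUDGET // blocks_for_name(name_length)
```

Under that assumption, names of 16 to 47 characters need two blocks each and halve the capacity to 32,000 entries, which is consistent in scale with the real-world range quoted above.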
|
|
|
|
=head1 OPTIONS
|
|
|
|
=over 4
|
|
|
|
=item B<-auditlog> <I<log path>>
|
|
|
|
Turns on audit logging and sets the path of the audit log file.
|
|
|
|
=item B<-d> <I<debug level>>
|
|
|
|
Sets the detail level for the debugging trace written to the
|
|
F</usr/afs/logs/FileLog> file. Provide one of the following values, each
|
|
of which produces an increasingly detailed trace: C<0>, C<1>, C<5>, C<25>,
|
|
and C<125>. The default value of C<0> produces only a few messages.
|
|
|
|
=item B<-p> <I<number of processes>>
|
|
|
|
Sets the number of threads (or LWPs) to run. Provide a positive integer.
|
|
The File Server creates and uses five threads for special purposes,
|
|
in addition to the number specified (but if this argument specifies
|
|
the maximum possible number, the File Server automatically uses five
|
|
of the threads for its own purposes).
|
|
|
|
The maximum number of threads can differ in each release of AFS. Consult
|
|
the I<IBM AFS Release Notes> for the current release.
|
|
|
|
=item B<-spare> <I<number of spare blocks>>
|
|
|
|
Specifies the number of additional kilobytes an application can store in a
|
|
volume after the quota is exceeded. Provide a positive integer; a value of
|
|
C<0> prevents the volume from ever exceeding its quota. Do not combine
|
|
this argument with the B<-pctspare> argument.
|
|
|
|
=item B<-pctspare> <I<percentage spare>>
|
|
|
|
Specifies the amount by which the File Server allows a volume to exceed
|
|
its quota, as a percentage of the quota. Provide an integer between C<0>
|
|
and C<99>. A value of C<0> prevents the volume from ever exceeding its
|
|
quota. Do not combine this argument with the B<-spare> argument.
|
|
|
|
=item B<-b> <I<buffers>>
|
|
|
|
Sets the number of directory buffers. Provide a positive integer.
|
|
|
|
=item B<-l> <I<large vnodes>>
|
|
|
|
Sets the number of large vnodes available in memory for caching directory
|
|
elements. Provide a positive integer.
|
|
|
|
=item B<-s> <I<small vnodes>>
|
|
|
|
Sets the number of small vnodes available in memory for caching file
|
|
elements. Provide a positive integer.
|
|
|
|
=item B<-vc> <I<volume cachesize>>
|
|
|
|
Sets the number of volumes the File Server can cache in memory. Provide a
|
|
positive integer.
|
|
|
|
=item B<-w> <I<call back wait interval>>
|
|
|
|
Sets the interval at which the daemon spawned by the File Server performs
|
|
its maintenance tasks. Do not use this argument; changing the default
|
|
value can cause unpredictable behavior.
|
|
|
|
=item B<-cb> <I<number of callbacks>>
|
|
|
|
Sets the number of callbacks the File Server can track. Provide a positive
|
|
integer.
|
|
|
|
=item B<-banner>
|
|
|
|
Prints the following banner to F</dev/console> about every 10 minutes.
|
|
|
|
File Server is running at I<time>.
|
|
|
|
=item B<-novbc>
|
|
|
|
Prevents the File Server from breaking the callbacks that Cache Managers
|
|
hold on a volume that the File Server is reattaching after the volume was
|
|
offline (as a result of the B<vos restore> command, for example). Use of
|
|
this flag is strongly discouraged.
|
|
|
|
=item B<-implicit> <I<admin mode bits>>
|
|
|
|
Defines the set of permissions granted by default to the
|
|
system:administrators group on the ACL of every directory in a volume
|
|
stored on the file server machine. Provide one or more of the standard
|
|
permission letters (C<rlidwka>) and auxiliary permission letters
|
|
(C<ABCDEFGH>), or one of the shorthand notations for groups of permissions
|
|
(C<all>, C<none>, C<read>, and C<write>). To review the meaning of the
|
|
permissions, see the B<fs setacl> reference page.
|
|
|
|
=item B<-readonly>
|
|
|
|
Don't allow writes to this fileserver.
|
|
|
|
=item B<-hr> <I<number of hours between refreshing the host cps>>
|
|
|
|
Specifies how often the File Server refreshes its knowledge of the
|
|
machines that belong to protection groups (refreshes the host CPSs for
|
|
machines). The File Server must update this information to enable users
|
|
from machines recently added to protection groups to access data for which
|
|
those machines now have the necessary ACL permissions. The default is two hours.
|
|
|
|
=item B<-busyat> <I<< redirect clients when queue > n >>>
|
|
|
|
Defines the number of incoming RPCs that can be waiting for a response
|
|
from the File Server before the File Server returns the error code
|
|
C<VBUSY> to the Cache Manager that sent the latest RPC. In response, the
|
|
Cache Manager retransmits the RPC after a delay. This argument prevents
|
|
the accumulation of so many waiting RPCs that the File Server can never
|
|
process them all. Provide a positive integer. The default value is
|
|
C<600>.
|
|
|
|
=item B<-rxpck> <I<number of rx extra packets>>
|
|
|
|
Controls the number of Rx packets the File Server uses to store data for
|
|
incoming RPCs that it is currently handling, that are waiting for a
|
|
response, and for replies that are not yet complete. Provide a positive
|
|
integer.
|
|
|
|
=item B<-rxdbg>
|
|
|
|
Writes a trace of the File Server's operations on Rx packets to the file
|
|
F</usr/afs/logs/rx_dbg>.
|
|
|
|
=item B<-rxdbge>
|
|
|
|
Writes a trace of the File Server's operations on Rx events (such as
|
|
retransmissions) to the file F</usr/afs/logs/rx_dbg>.
|
|
|
|
=item B<-rxmaxmtu> <I<bytes>>
|
|
|
|
Defines the maximum size of an MTU. The value must be between the
|
|
minimum and maximum packet data sizes for Rx.
|
|
|
|
=item B<-nojumbo>
|
|
|
|
Do not send, and do not accept, jumbograms.
|
|
|
|
=item B<-rxbind>
|
|
|
|
Force the fileserver to only bind to one IP address.
|
|
|
|
=item B<-allow-dotted-principals>
|
|
|
|
By default, the RXKAD security layer will disallow access by Kerberos
|
|
principals with a dot in the first component of their name. This is to avoid
|
|
the confusion where principals user/admin and user.admin are both mapped to the
|
|
user.admin PTS entry. Sites whose Kerberos realms don't have these collisions
|
|
between principal names may disable this check by starting the server
|
|
with this option.
|
|
|
|
=item B<-L>
|
|
|
|
Sets values for many arguments in a manner suitable for a large file
|
|
server machine. Combine this flag with any option except the B<-S> flag;
|
|
omit both flags to set values suitable for a medium-sized file server
|
|
machine.
|
|
|
|
=item B<-S>
|
|
|
|
Sets values for many arguments in a manner suitable for a small file
|
|
server machine. Combine this flag with any option except the B<-L> flag;
|
|
omit both flags to set values suitable for a medium-sized file server
|
|
machine.
|
|
|
|
=item B<-k> <I<stack size>>
|
|
|
|
Sets the LWP stack size in units of 1 kilobyte. Do not use this argument,
|
|
and in particular do not specify a value less than the default of C<24>.
|
|
|
|
=item B<-realm> <I<Kerberos realm name>>
|
|
|
|
Defines the Kerberos realm name for the File Server to use. If this
|
|
argument is not provided, it uses the realm name corresponding to the cell
|
|
listed in the local F</usr/afs/etc/ThisCell> file.
|
|
|
|
=item B<-udpsize> <I<size of socket buffer in bytes>>
|
|
|
|
Sets the size of the UDP buffer, which is 64 KB by default. Provide a
|
|
positive integer, preferably larger than the default.
|
|
|
|
=item B<-sendsize> <I<size of send buffer in bytes>>
|
|
|
|
Sets the size of the send buffer, which is 16384 bytes by default.
|
|
|
|
=item B<-abortthreshold> <I<abort threshold>>
|
|
|
|
Sets the abort threshold, which is triggered when an AFS client sends
|
|
a number of FetchStatus requests in a row and all of them fail due to
|
|
access control or some other error. When the abort threshold is
|
|
reached, the file server starts to slow down the responses to the
|
|
problem client in order to reduce the load on the file server.
|
|
|
|
The throttling behavior can cause issues especially for some versions
|
|
of the Windows OpenAFS client. When using Windows Explorer to navigate
|
|
the AFS directory tree, directories with only "lookup" access for the
|
|
current user may load more slowly because of the throttling. This is
|
|
because the Windows OpenAFS client sends FetchStatus calls one at a
|
|
time instead of in bulk like the Unix OpenAFS client.
|
|
|
|
Setting the threshold to 0 disables the throttling behavior. This
|
|
option is available in OpenAFS versions 1.4.1 and later.
|
|
|
|
=item B<-enable_peer_stats>
|
|
|
|
Activates the collection of Rx statistics and allocates memory for their
|
|
storage. For each connection with a specific UDP port on another machine,
|
|
a separate record is kept for each type of RPC (FetchFile, GetStatus, and
|
|
so on) sent or received. To display or otherwise access the records, use
|
|
the Rx Monitoring API.
|
|
|
|
=item B<-enable_process_stats>
|
|
|
|
Activates the collection of Rx statistics and allocates memory for their
|
|
storage. A separate record is kept for each type of RPC (FetchFile,
|
|
GetStatus, and so on) sent or received, aggregated over all connections to
|
|
other machines. To display or otherwise access the records, use the Rx
|
|
Monitoring API.
|
|
|
|
=item B<-syslog> [<I<loglevel>>]
|
|
|
|
Use syslog instead of the normal logging location for the fileserver
|
|
process. If provided, log messages are at I<loglevel> instead of the
|
|
default LOG_USER.
|
|
|
|
=item B<-mrafslogs>
|
|
|
|
Use MR-AFS (Multi-Resident) style logging. This option is deprecated.
|
|
|
|
=item B<-saneacls>
|
|
|
|
Offer the SANEACLS capability for the fileserver. This option is
|
|
currently unimplemented.
|
|
|
|
=item B<-help>
|
|
|
|
Prints the online help for this command. All other valid options are
|
|
ignored.
|
|
|
|
=item B<-fs-state-dont-save>
|
|
|
|
When present, fileserver state will not be saved during shutdown. Default
|
|
is to save state.
|
|
|
|
This option is only supported by the demand-attach file server.
|
|
|
|
=item B<-fs-state-dont-restore>
|
|
|
|
When present, fileserver state will not be restored during startup.
|
|
Default is to restore state on startup.
|
|
|
|
This option is only supported by the demand-attach file server.
|
|
|
|
=item B<-fs-state-verify> (none | save | restore | both)
|
|
|
|
This argument controls the behavior of the state verification mechanism.
|
|
A value of C<none> turns off all verification. A value of C<save> only
|
|
performs the verification steps prior to saving state to disk. A value
|
|
of C<restore> only performs the verification steps after restoring state
|
|
from disk. A value of C<both> performs all verification steps both
|
|
prior to save and following a restore.
|
|
|
|
The default is C<both>.
|
|
|
|
This option is only supported by the demand-attach file server.
|
|
|
|
=item B<-vhashsize> <I<size>>
|
|
|
|
The base-2 logarithm of the number of volume hash buckets. Default is 8 (i.e., by
|
|
default, there are 2^8 = 256 volume hash buckets).
|
|
|
|
This option is only supported by the demand-attach file server.
|
|
|
|
=item B<-vlruthresh> <I<minutes>>
|
|
|
|
The number of minutes of inactivity before a volume is eligible for soft
|
|
detachment. Default is 120 minutes.
|
|
|
|
This option is only supported by the demand-attach file server.
|
|
|
|
=item B<-vlruinterval> <I<seconds>>
|
|
|
|
The number of seconds between VLRU candidate queue scans. The default is
120 seconds.
|
|
|
|
This option is only supported by the demand-attach file server.
|
|
|
|
=item B<-vlrumax> <I<positive integer>>
|
|
|
|
The maximum number of volumes which can be soft detached in a single pass
|
|
of the scanner. Default is 8 volumes.
|
|
|
|
This option is only supported by the demand-attach file server.
|
|
|
|
=item B<-vattachpar> <I<number of volume attach threads>>
|
|
|
|
The number of threads assigned to attach and detach volumes. The default
|
|
is 1. Warning: many of the I/O parallelism features of Demand-Attach
|
|
Fileserver are turned off when the number of volume attach threads is only
|
|
1.
|
|
|
|
This option is only meaningful for a file server built with pthreads
|
|
support.
|
|
|
|
=item B<-m> <I<min percentage spare in partition>>
|
|
|
|
Specifies the percentage of each AFS server partition that the AIX version
|
|
of the File Server creates as a reserve. Specify an integer value between
|
|
C<0> and C<30>; the default is 8%. A value of C<0> means that the
|
|
partition can become completely full, which can have serious negative
|
|
consequences. This option is not supported on platforms other than AIX.
|
|
|
|
=item B<-lock>
|
|
|
|
Prevents any portion of the fileserver binary from being paged (swapped)
|
|
out of memory on a file server machine running the IRIX operating system.
|
|
This option is not supported on platforms other than IRIX.
|
|
|
|
=back
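
Several options above carry constraints that can be checked before a command line is written into F<BosConfig>. The following is a hedged sketch rather than anything the File Server provides: the function and the dictionary layout (option names without the leading dash, flags mapped to C<True>) are hypothetical, and it covers only constraints stated on this page.

```python
def check_options(opts):
    """Return a list of problems found in a dict of fileserver options."""
    problems = []
    if "spare" in opts and "pctspare" in opts:
        problems.append("-spare and -pctspare must not be combined")
    if opts.get("S") and opts.get("L"):
        problems.append("-S and -L must not be combined")
    if "pctspare" in opts and not 0 <= opts["pctspare"] <= 99:
        problems.append("-pctspare must be between 0 and 99")
    if "d" in opts and opts["d"] not in (0, 1, 5, 25, 125):
        problems.append("-d must be one of 0, 1, 5, 25, 125")
    if "m" in opts and not 0 <= opts["m"] <= 30:
        problems.append("-m must be between 0 and 30")
    for name in ("k", "w"):
        if name in opts:
            problems.append(f"-{name} is intended for OpenAFS developers only")
    return problems
```

An empty list means none of the checked rules were violated; it does not guarantee that the File Server accepts the command line.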
|
|
|
|
=head1 EXAMPLES
|
|
|
|
The following B<bos create> command creates an fs process on the file
|
|
server machine C<fs2.abc.com> that uses the large configuration size, and
|
|
allows volumes to exceed their quota by 10%. Type the command on a single
|
|
line:
|
|
|
|
   % bos create -server fs2.abc.com -instance fs -type fs \
                -cmd "/usr/afs/bin/fileserver -pctspare 10 \
                -L" /usr/afs/bin/volserver /usr/afs/bin/salvager
|
|
|
|
|
|
=head1 TROUBLESHOOTING
|
|
|
|
Sending process signals to the File Server Process can change its
|
|
behavior in the following ways:
|
|
|
|
  Process          Signal   OS       Result
  ---------------------------------------------------------------------

  File Server      XCPU     Unix     Prints a list of client IP
                                     Addresses.

  File Server      USR2     Windows  Prints a list of client IP
                                     Addresses.

  File Server      POLL     HPUX     Prints a list of client IP
                                     Addresses.

  Any server       TSTP     Any      Increases Debug level by a power
                                     of 5 -- 1,5,25,125, etc.
                                     This has the same effect as the
                                     -d XXX command-line option.

  Any Server       HUP      Any      Resets Debug level to 0

  File Server      TERM     Any      Run minor instrumentation over
                                     the list of descriptors.

  Other Servers    TERM     Any      Causes the process to quit.

  File Server      QUIT     Any      Causes the File Server to Quit.
                                     Bos Server knows this.
|
|
|
|
The basic metric of whether an AFS file server is doing well is the number
|
|
of connections waiting for a thread,
|
|
which can be found by running the following command:
|
|
|
|
% rxdebug <server> | grep waiting_for | wc -l
|
|
|
|
Each line returned by C<rxdebug> that contains the text "waiting_for"
|
|
represents a connection that's waiting for a file server thread.
|
|
|
|
If the blocked connection count is ever above 0, the server is having
|
|
problems replying to clients in a timely fashion. If it gets above 10,
|
|
roughly, users will notice the slowness. The total number of
|
|
connections is a mostly irrelevant number that increases essentially
|
|
monotonically for as long as the server has been running and then goes back
|
|
down to zero when it's restarted.
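
For monitoring scripts, the same count and the rough thresholds above can be sketched as follows. This is an illustration only: the function names and the three labels are hypothetical, and the text passed in is assumed to be B<rxdebug> output captured elsewhere.

```python
def blocked_connections(rxdebug_output):
    """Count lines containing "waiting_for", like grep waiting_for | wc -l."""
    return sum(1 for line in rxdebug_output.splitlines() if "waiting_for" in line)


def assess(count):
    """Map a blocked-connection count to a rough label (hypothetical labels)."""
    if count == 0:
        return "ok"
    if count <= 10:
        return "delayed"  # the server is not replying in a timely fashion
    return "slow"         # above roughly 10, users notice the slowness
```

The threshold of 10 is the approximate figure given above, not a hard limit.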
|
|
|
|
The most common cause of blocked connections rising on a server is some
|
|
process somewhere performing an abnormal number of accesses to that server
|
|
and its volumes. If multiple servers have a blocked connection count, the
|
|
most likely explanation is that there is a volume replicated between those
|
|
servers that is absorbing an abnormally high access rate.
|
|
|
|
To get an access count on all the volumes on a server, run:
|
|
|
|
% vos listvol <server> -long
|
|
|
|
and save the output in a file. The results resemble the output of B<vos
examine> for each volume on the server. Look for lines like:
|
|
|
|
40065 accesses in the past day (i.e., vnode references)
|
|
|
|
and look for volumes with an abnormally high number of accesses. Anything
|
|
over 10,000 is fairly high, but some volumes like root.cell and other
|
|
volumes close to the root of the cell will have that many hits routinely.
|
|
Anything over 100,000 is generally abnormally high. The count resets about
|
|
once a day.
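
Scanning a saved listing for high access counts can be automated. The sketch below is a minimal example with hypothetical function names; it assumes the saved text contains access lines in the form shown above and applies the two rough thresholds from this section.

```python
import re

# Matches lines such as "40065 accesses in the past day (i.e., vnode references)".
ACCESS_RE = re.compile(r"(\d+) accesses in the past day")


def access_counts(listing):
    """Return every access count found in the saved listing, in order."""
    return [int(match.group(1)) for match in ACCESS_RE.finditer(listing)]


def classify(count):
    """Label a daily access count using the rough thresholds above."""
    if count > 100000:
        return "abnormally high"
    if count > 10000:
        return "fairly high"
    return "normal"
```

Volumes near the root of the cell routinely land in the "fairly high" band, so the label is a prompt for a closer look rather than a verdict.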
|
|
|
|
Another approach that can be used to narrow the possibilities for a
|
|
replicated volume, when multiple servers are having trouble, is to find all
|
|
replicated volumes for that server. Run:
|
|
|
|
% vos listvldb -server <server>
|
|
|
|
where <server> is one of the servers having problems, to refresh the VLDB
|
|
cache, and then run:
|
|
|
|
% vos listvldb -server <server> -part <partition>
|
|
|
|
to get a list of all volumes on that server and partition, including every
|
|
other server with replicas.
|
|
|
|
Once the volume causing the problem has been identified, the best way to
|
|
deal with the problem is to move that volume to another server with a low
|
|
load or to stop any runaway programs that are accessing that volume
|
|
unnecessarily. Often the volume name alone is enough to tell what's
going on.
|
|
|
|
If you still need additional information about who's hitting that server,
|
|
sometimes you can guess at that information from the failed callbacks in the
|
|
F<FileLog> log in F</usr/afs/logs> on the server, or from the output of:
|
|
|
|
% /usr/afsws/etc/rxdebug <server> -rxstats
|
|
|
|
but the best way is to turn on debugging output from the file server.
|
|
(Warning: This generates a lot of output into FileLog on the AFS server.)
|
|
To do this, log on to the AFS server, find the PID of the fileserver
|
|
process, and do:
|
|
|
|
kill -TSTP <pid>
|
|
|
|
where <pid> is the PID of the file server process. This will raise the
|
|
debugging level so that you'll start seeing what people are actually doing
|
|
on the server. You can do this up to three more times to get even more
|
|
output if needed. To reset the debugging level back to normal, use the
following command, which does NOT terminate the file server:
|
|
|
|
kill -HUP <pid>
|
|
|
|
The debugging setting on the File Server should be reset back to normal when
|
|
debugging is no longer needed. Otherwise, the AFS server may well fill its
|
|
disks with debugging output.
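
The debug level reached after a number of TSTP signals follows the sequence given in the signal table. The function below is a small sketch with a hypothetical name; it assumes the level starts at 0 (after startup without B<-d>, or after a HUP signal).

```python
def debug_level(tstp_count):
    """Debug level after tstp_count TSTP signals, starting from level 0."""
    if tstp_count < 0:
        raise ValueError("signal count cannot be negative")
    # First signal gives 1; each further signal multiplies the level by 5.
    return 0 if tstp_count == 0 else 5 ** (tstp_count - 1)
```

Four signals in total therefore correspond to level 125, the most detailed value listed for the B<-d> option.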
|
|
|
|
The lines of the debugging output that are most useful for debugging load
|
|
problems are:
|
|
|
|
   SAFS_FetchStatus,  Fid = 2003828163.77154.82248, Host 171.64.15.76
   SRXAFS_FetchData, Fid = 2003828163.77154.82248
|
|
|
|
(The example above is partly truncated to highlight the interesting
|
|
information). The Fid identifies the volume and vnode within the volume;
|
|
the volume is the first long number. So, for example, this was:
|
|
|
|
   % vos examine 2003828163
   pubsw.matlab61                2003828163 RW    1040060 K  On-line
       afssvr5.Stanford.EDU /vicepa
       RWrite 2003828163 ROnly 2003828164 Backup 2003828165
       MaxQuota    3000000 K
       Creation    Mon Aug  6 16:40:55 2001
       Last Update Tue Jul 30 19:00:25 2002
       86181 accesses in the past day (i.e., vnode references)

       RWrite: 2003828163    ROnly: 2003828164    Backup: 2003828165
       number of sites -> 3
          server afssvr5.Stanford.EDU partition /vicepa RW Site
          server afssvr11.Stanford.EDU partition /vicepd RO Site
          server afssvr5.Stanford.EDU partition /vicepa RO Site
|
|
|
|
and from the Host information one can tell what system is accessing that
|
|
volume.
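
When the debugging output is large, the volume and host can be pulled out of each line mechanically. The following is a sketch under assumptions: the function names are hypothetical, the pattern is modeled only on the two truncated example lines above (real F<FileLog> lines carry more fields), and the three dotted numbers are read as volume ID, vnode, and uniquifier.

```python
import re
from collections import Counter

# "Fid = <volume>.<vnode>.<uniquifier>", optionally followed by ", Host <address>".
FID_RE = re.compile(r"Fid = (\d+)\.(\d+)\.(\d+)(?:, Host (\S+))?")


def parse_fid_line(line):
    """Return the Fid parts and host from one debug line, or None if absent."""
    match = FID_RE.search(line)
    if match is None:
        return None
    volume, vnode, unique, host = match.groups()
    return {"volume": int(volume), "vnode": int(vnode),
            "unique": int(unique), "host": host}


def requests_per_volume(lines):
    """Count how many parsed debug lines refer to each volume ID."""
    counts = Counter()
    for line in lines:
        parsed = parse_fid_line(line)
        if parsed is not None:
            counts[parsed["volume"]] += 1
    return counts
```

The volume IDs with the highest counts are the ones worth passing to B<vos examine>.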
|
|
|
|
Note that the output of L<vos_examine(1)> also includes the access count, so
|
|
once the problem has been identified, vos examine can be used to see if the
|
|
access count is still increasing. Also remember that you can run vos
|
|
examine on the read-only replica (e.g., pubsw.matlab61.readonly) to see the
|
|
access counts on the read-only replica on all of the servers that it's
|
|
located on.
|
|
|
|
=head1 PRIVILEGE REQUIRED
|
|
|
|
The issuer must be logged in as the superuser C<root> on a file server
|
|
machine to issue the command at a command shell prompt. It is conventional
|
|
instead to create and start the process by issuing the B<bos create>
|
|
command.
|
|
|
|
=head1 SEE ALSO
|
|
|
|
L<BosConfig(5)>,
|
|
L<FileLog(5)>,
|
|
L<bos_create(8)>,
|
|
L<bos_getlog(8)>,
|
|
L<fs_setacl(1)>,
|
|
L<salvager(8)>,
|
|
L<volserver(8)>,
|
|
L<vos_examine(1)>
|
|
|
|
=head1 COPYRIGHT
|
|
|
|
IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.
|
|
|
|
This documentation is covered by the IBM Public License Version 1.0. It was
|
|
converted from HTML to POD by software written by Chas Williams and Russ
|
|
Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.
|