One of your most important responsibilities as a system administrator is ensuring that the processes on file server machines are running correctly. The BOS Server, which runs on every file server machine, relieves you of much of the responsibility by constantly monitoring the other AFS server processes on its machine. It can automatically restart processes that have failed, ordering the restarts to take interdependencies into account.
Because different file server machines run different combinations of processes, you must define which processes the BOS Server on each file server machine is to monitor (to learn how, see Controlling and Checking Process Status).
It is sometimes necessary to take direct control of server process status before performing routine maintenance or correcting problems that the BOS Server cannot correct (such as problems with database replication or mutual authentication). At those times, you control process status through the BOS Server by issuing bos commands.
This chapter explains how to perform the following tasks by using the indicated commands:
Examine process status | bos status |
Examine information from the BosConfig file | bos status with -long flag |
Create a process instance | bos create |
Stop a process | bos stop |
Start a stopped process | bos start |
Stop a process temporarily | bos shutdown |
Start a temporarily stopped process | bos startup |
Stop and immediately restart a process | bos restart |
Stop and immediately restart all processes | bos restart with -bosserver flag |
Examine BOS Server's restart times | bos getrestart |
Set BOS Server's restart times | bos setrestart |
Examine a log file | bos getlog |
Execute a command remotely | bos exec |
This section briefly describes the different server processes that can run on an AFS server machine. In cells with multiple server machines, not all processes necessarily run on all machines.
An AFS server process is referred to in one of three ways, depending on the context:
The following sections specify each name for the process as well as some of the administrative tasks in which you use the process. For a more general description of the servers, see AFS Server Processes and the Cache Manager.
The bosserver process, which runs on every AFS server machine, is the Basic OverSeer (BOS) Server responsible for monitoring the other AFS server processes running on its machine. If a process fails, the BOS Server can restart it automatically, without human intervention. It takes interdependencies into account when restarting a process that has multiple component processes (such as the fs process described in The fs Collection of Processes: the File Server, Volume Server and Salvager).
Because the BOS Server does not monitor or restart itself, it does not appear in the output from the bos status command. It appears in the ps command's output as /usr/afs/bin/bosserver.
As a system administrator, you contact the BOS Server when you issue bos commands to perform the following kinds of tasks.
The buserver process, which runs on database server machines, is the Backup Server. It maintains information about Backup System configuration and operations in the Backup Database.
The process appears as buserver in the bos status command's output, if the conventional name is assigned. It appears in the ps command's output as /usr/afs/bin/buserver.
As a system administrator, you contact the Backup Server when you issue any backup command that manipulates information in the Backup Database, including those that change Backup System configuration information, that dump data from volumes to permanent storage, or that restore data to AFS. See Configuring the AFS Backup System and Backing Up and Restoring AFS Data.
The fs process, which runs on every file server machine, combines three component processes: File Server, Volume Server and Salvager. The three components perform independent functions, but are controlled as a single process for the following reasons.
The File Server component handles AFS data at the level of files and directories, manipulating file system elements as requested by application programs and the standard operating system commands. Its main duty is to deliver requested files to client machines and store them again on the server machine when the client is finished. It also maintains status and protection information about each file and directory. It runs continuously during normal operation.
The Volume Server component handles AFS data at the level of complete volumes rather than files and directories. In response to vos commands, it creates, removes, moves, dumps and restores entire volumes, among other actions. It runs continuously during normal operation.
The Salvager component runs only after the failure of one of the other two processes. It checks the file system for internal consistency and repairs any errors it finds.
The process appears as fs in the bos status command's output, if the conventional name is assigned. An auxiliary message reports the status of the File Server or Salvager component. See Displaying Process Status and Information from the BosConfig File.
The component processes of the fs process appear individually in the ps command's output, as follows. There is no entry for the fs process itself.
The Cache Manager contacts the File Server component on your behalf whenever you access data or status information in an AFS file or directory or issue file manipulation commands such as the UNIX cp and ls commands. You can contact the File Server directly by issuing fs commands that perform the following functions:
You contact the Volume Server component when you issue vos commands that manipulate volumes in any way--creating, removing, replicating, moving, renaming, converting to different formats, and salvaging. For instructions, see Managing Volumes.
The Salvager normally runs automatically in case of a failure. You can also start it with the bos salvage command as described in Salvaging Volumes.
The kaserver process, which runs on database server machines, is the Authentication Server responsible for several aspects of AFS security. It verifies AFS user identity by requiring a password. It maintains all AFS server encryption keys and user passwords in the Authentication Database. The Authentication Server's Ticket Granting Service (TGS) module creates the shared secrets that AFS client and server processes use when establishing secure connections.
The process appears as kaserver in the bos status command's output, if the conventional name is assigned. The ka string stands for Kerberos Authentication, reflecting the fact that AFS's authentication protocols are based on Kerberos, which was originally developed at the Massachusetts Institute of Technology's Project Athena.
It appears in the ps command's output as /usr/afs/bin/kaserver.
As a system administrator, you contact the Authentication Server when you issue kas commands to perform the following kinds of tasks.
The ptserver process, which runs on database server machines, is the Protection Server. Its main responsibility is maintaining the Protection Database which contains user, machine, and group entries. The Protection Server allocates AFS IDs and maintains the mapping between them and names. The File Server consults the Protection Server when verifying that a user is authorized to perform a requested action.
The process appears as ptserver in the bos status command's output, if the conventional name is assigned. It appears in the ps command's output as /usr/afs/bin/ptserver.
As a system administrator, you contact the Protection Server when you issue pts commands to perform the following kinds of tasks.
The runntp process, which runs on every server machine, is a controller program for the Network Time Protocol Daemon (NTPD), which synchronizes the hardware clocks on server machines. You need to run the runntp process if you are not already running NTP or another time synchronization protocol on your server machines.
The clocks on database server machines need to be synchronized because AFS's distributed database technology (Ubik) works properly only when the clocks agree within a narrow range of variation (see Configuring the Cell for Proper Ubik Operation). The clocks on file server machines need to be correct not only because the File Server sets modification time stamps on files, but because in the conventional configuration they serve as the time source for AFS client machines.
The process appears as runntp in the bos status command's output, if the conventional name is assigned. It appears in the output from the ps command as /usr/afs/bin/runntp. The ps command's output also includes an entry called ntpd; its exact form depends on the arguments you provide to the runntp command.
As a system administrator, you do not contact the NTPD directly once you have installed it according to the instructions in the IBM AFS Quick Beginnings.
The Update Server has two separate parts, each of which runs on a different type of server machine. The upserver process is the server portion of the Update Server. Its function depends on which edition of AFS you use:
The upclient process is the client portion of the Update Server, and like the server portion its function depends on the AFS edition in use.
In output from the bos status command, the server portion appears as upserver and the client portions as upclientbin and upclientetc, if the conventional names are assigned. In the output from the ps command, the server portion appears as /usr/afs/bin/upserver and the client portions as /usr/afs/bin/upclient.
You do not contact the Update Server directly once you have installed it. It operates automatically whenever you use bos commands to change the files that it distributes.
The vlserver process, which runs on database server machines, is the Volume Location (VL) Server that automatically tracks which file server machines house each volume, making its location transparent to client applications.
The process appears as vlserver in the bos status command's output, if the conventional name is assigned. It appears in the ps command's output as /usr/afs/bin/vlserver.
As a system administrator, you contact the VL Server when you issue any vos command that changes the status of a volume (it records the status changes in the VLDB).
To define the AFS server processes that run on a server machine, use the bos create command to create entries for them in the local /usr/afs/local/BosConfig file. The BOS Server monitors the processes listed in the BosConfig file that are marked with the Run status flag, and automatically attempts to restart them if they fail. After creating process entries, you use other commands from the bos suite to stop and start processes or change the status flag as desired.
Never edit the BosConfig file directly; always use bos commands instead. Similarly, it is not a good practice to run server processes without listing them in the BosConfig file, or to stop them using process termination commands such as the UNIX kill command.
A process's entry in the BosConfig file includes the following information:
In addition to process definitions, the BosConfig file also records automatic restart times for processes that have new binaries, and for all server processes including the BOS Server. See Setting the BOS Server's Restart Times.
Whenever the BOS Server starts or restarts, it reads the BosConfig file to learn which processes it is to start and monitor. It transfers the information into its own memory and does not read the BosConfig file again until it next restarts. This implies that the BOS Server's memory state can change independently of the BosConfig file. You can, for example, stop a process but leave its status flag in the BosConfig file as Run, or start a process even though its status flag in the BosConfig file is NotRun.
When you start or stop a database server process (Authentication Server, Backup Server, Protection Server, or Volume Location Server) for more than a short time, you must follow the instructions in the IBM AFS Quick Beginnings for installing or removing a database server machine. Here is a summary of the tasks you must perform to preserve correct AFS functioning.
In the conventional cell configuration, one server machine of each system type acts as a binary distribution machine, running the server portion of the Update Server (upserver process) to distribute the contents of its /usr/afs/bin directory. The other server machines of its system type run an instance of the Update Server client portion (by convention called upclientbin) that references the binary distribution machine.
If you run the United States edition of AFS, it is conventional for the first server machine you install to act as the system control machine, running the server portion of the Update Server (upserver process) to distribute the contents of its /usr/afs/etc directory. All other server machines run an instance of the Update Server client portion (by convention called upclientetc) that references the system control machine.
Note: | If you are using the international edition of AFS, do not use the Update Server to distribute the contents of the /usr/afs/etc directory (you do not run a system control machine). Ignore all references to the process in this chapter. |
It is simplest not to move binary distribution or system control responsibilities to a different machine unless you completely decommission a machine that is currently serving in one of those roles. Running the Update Server usually imposes very little processing load. If you must move the functionality, perform the following related tasks.
To display the status of the AFS server processes on a server machine, issue the bos status command. Adding the -long flag displays most of the information from each process's entry in the BosConfig file, including its type and command parameters. It also displays a warning message if the mode bits on files and subdirectories in the /usr/afs directory do not match the expected values.
% bos status <machine name> [<server process name>+] [-long]
where
The output includes an entry for each process and uses one of the following strings to indicate the process's status:
The output for the fs process always includes a message marked Auxiliary status, which can be one of the following:
The output for a cron process also includes an Auxiliary status message to report when the command is scheduled to run next; see the example that follows.
The output for any process can include the supplementary message has core file to indicate that at some point the process failed and generated a core file in the /usr/afs/logs directory. In most cases, the BOS Server is able to restart the process and it is running.
The following example includes a user-defined cron entry called backupusers:
% bos status fs3.abc.com
Instance kaserver, currently running normally.
Instance ptserver, currently running normally.
Instance vlserver, has core file, currently running normally.
Instance buserver, currently running normally.
Instance fs, currently running normally.
    Auxiliary status is: file server running.
Instance upserver, currently running normally.
Instance runntp, currently running normally.
Instance backupusers, currently running normally.
    Auxiliary status is: run next at Mon Jun 7 02:00:00 1999.
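Because each entry uses a fixed status string, the output is easy to filter. The following is a minimal sketch and not part of AFS: the function name bos_problems is hypothetical, and the sample text is copied from the example above. It prints only the entries that mention a core file or that do not report "currently running normally".

```shell
# Hypothetical helper (not an AFS command): read `bos status` output on
# stdin and print the Instance lines that mention a core file or that do
# not report "currently running normally".
bos_problems() {
    awk '/^Instance / && (/has core file/ || !/currently running normally/)'
}

# Sample input copied from the example output above.
sample='Instance kaserver, currently running normally.
Instance ptserver, currently running normally.
Instance vlserver, has core file, currently running normally.
Instance fs, currently running normally.
    Auxiliary status is: file server running.'

printf '%s\n' "$sample" | bos_problems
```

Against a live machine, the same filter would be used as: bos status fs3.abc.com | bos_problems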
If you include the -long flag to the bos status command, a process's entry in the output includes the following additional information from the BosConfig file:
In addition, if the BOS Server has found that the mode bits on certain files and directories under /usr/afs deviate from what it expects, it prints the following warning message:
Bosserver process reports inappropriate access on server directories
The expected protections for the directories and files in the /usr/afs directory are as follows. A question mark indicates that the BOS Server does not check the mode bit. See the IBM AFS Quick Beginnings for more information about setting the protections on these files and directories.
/usr/afs | drwxr?xr-x |
/usr/afs/backup | drwx???--- |
/usr/afs/bin | drwxr?xr-x |
/usr/afs/db | drwx???--- |
/usr/afs/etc | drwxr?xr-x |
/usr/afs/etc/KeyFile | -rw????--- |
/usr/afs/etc/UserList | -rw?????-- |
/usr/afs/local | drwx???--- |
/usr/afs/logs | drwxr?xr-x |
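The question-mark notation in the table happens to coincide with the shell's single-character wildcard, so the expected protections can be checked with an ordinary case pattern. This is only an illustrative sketch: mode_matches is a hypothetical helper, and the comparison is not how the BOS Server itself performs the check.

```shell
# Hypothetical helper: succeed if an `ls -ld`-style mode string matches an
# expected pattern from the table, where "?" means "bit not checked".
mode_matches() {
    mode=$1 pattern=$2
    case $mode in
        $pattern) return 0 ;;   # unquoted, so "?" acts as a wildcard
        *)        return 1 ;;
    esac
}

# Example mode strings (assumed values, not read from a real machine).
mode_matches drwxr-xr-x 'drwxr?xr-x' && echo "/usr/afs: ok"
mode_matches drwxrwxrwx 'drwx???---' || echo "/usr/afs/local: unexpected mode"
```

On a server machine, the first argument could come from a command such as ls -ld /usr/afs/local | cut -c1-10.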
The following illustrates the extended output for the fs process running on the machine fs3.abc.com:
% bos status fs3.abc.com fs -long
Instance fs, (type is fs), currently running normally.
    Auxiliary status is file server running
    Process last started at Mon May 3 8:29:19 1999 (3 proc starts)
    Last exit at Mon May 3 8:29:19 1999
    Last error exit at Mon May 3 8:29:19 1999, due to shutdown request
    Command 1 is '/usr/afs/bin/fileserver'
    Command 2 is '/usr/afs/bin/volserver'
    Command 3 is '/usr/afs/bin/salvager'
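If you only need the command lines recorded for a process, they can be pulled out of the extended output with a short filter. The sketch below assumes the "Command N is '...'" layout shown in the example; bos_commands is a hypothetical name, not an AFS command.

```shell
# Hypothetical helper: print just the command strings from the output of
# `bos status ... -long`, assuming lines of the form  Command N is '...'
bos_commands() {
    sed -n "s/^ *Command [0-9][0-9]* is '\(.*\)'\$/\1/p"
}

# Sample input copied from the extended output above.
bos_commands <<'EOF'
Instance fs, (type is fs), currently running normally.
    Auxiliary status is file server running
    Command 1 is '/usr/afs/bin/fileserver'
    Command 2 is '/usr/afs/bin/volserver'
    Command 3 is '/usr/afs/bin/salvager'
EOF
```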
To start a new AFS server process on a server machine, issue the bos create command, which creates an entry in the /usr/afs/local/BosConfig file, sets the process's status flag to Run both in the file and in the BOS Server's memory, and starts it running immediately. The binary file for the new process must already be installed, by convention in the /usr/afs/bin directory (see Installing New Binaries).
To stop a process permanently, first issue the bos stop command, which changes the process's status flag to NotRun in both the BosConfig file and the BOS Server's memory; it is marked as disabled in the output from the bos status command. If desired, issue the bos delete command to remove the process's entry from the BosConfig file; the process no longer appears in the bos status command's output.
Note: | If you are starting or stopping a database server process in the manner described in this section, follow the complete instructions in the IBM AFS Quick Beginnings for creating or removing a database server machine. If you run one database server process on a given machine, you must run them all; for more information, see About Starting and Stopping the Database Server Processes. Similarly, if you are stopping the upserver process on the system control machine or a binary distribution machine, you must complete the additional tasks described in About Starting and Stopping the Update Server. |
% bos listusers <machine name>
If the binaries are not present, install them on the binary distribution machine of the appropriate system type, and wait for the Update Server to copy them to this machine. For instructions, see Installing New Binaries.
% ls /usr/afs/bin
% bos create <machine name> <server process name> \
             <server type> <command lines>+ [ -notifier <Notifier program>]
where
For a simple process, provide the complete pathname of the process's binary file on the local disk (for example, /usr/afs/bin/ptserver for the Protection Server). If including any of the initialization command's options, surround the entire command in double quotes (" "). The upclient process has a required argument, and the commands for all other processes take optional arguments.
For the fs process, provide the complete pathname of the local disk binary file for each of the component processes: fileserver, volserver, and salvager, in that order. The standard binary directory is /usr/afs/bin. If including any of an initialization command's options, surround the entire command in double quotes (" ").
For a cron process, provide two parameters:
The following example defines and starts the Protection Server on the machine db2.abc.com:
% bos create db2.abc.com ptserver simple /usr/afs/bin/ptserver
The following example defines and starts the fs process on the machine fs6.abc.com.
% bos create fs6.abc.com fs fs /usr/afs/bin/fileserver \
             /usr/afs/bin/volserver /usr/afs/bin/salvager
The following example defines and starts a cron process called backupuser on the machine fs3.abc.com, scheduling it to run each day at 3:00 a.m.
% bos create fs3.abc.com backupuser cron "/usr/afs/bin/vos backupsys -prefix user -local" 3:00
% bos listusers <machine name>
% bos stop <machine name> <server process name>+ [-wait]
% bos delete <machine name> <server process name>+
where
To stop a process so that the BOS Server no longer attempts to monitor it, issue the bos stop command. The process's status flag is set to NotRun in both the BOS Server's memory and in the BosConfig file. The process does not run again until you issue the bos start command, which sets its status flag back to Run in both the BOS Server's memory and in the BosConfig file. (You can also use the bos startup command to start the process again without changing its status flag in the BosConfig file; see Stopping and Starting Processes Temporarily.)
There is no entry for the BOS Server in the BosConfig file, so the bos stop and bos start commands do not control it. To stop and immediately restart the BOS Server along with all other processes, use the -bosserver flag to the bos restart command as described in Stopping and Immediately Restarting Processes.
Note: | If you are starting or stopping a database server process in the manner described in this section, follow the complete instructions in the IBM AFS Quick Beginnings for creating or removing a database server machine. If you run one database server process on a given machine, you must run them all; for more information, see About Starting and Stopping the Database Server Processes. Similarly, if you are stopping the upserver process on the system control machine or a binary distribution machine, you must complete the additional tasks described in About Starting and Stopping the Update Server. |
% bos listusers <machine name>
% bos stop <machine name> <server process name>+ [-wait]
where
% bos listusers <machine name>
% bos start <machine name> <server process name>+
where
It is sometimes necessary to halt a process temporarily (for example, to make slight configuration changes or to perform maintenance). The commands described in this section change a process's status in the BOS Server's memory only; the effect is immediate and lasts until you change the memory state again (or until the BOS Server restarts, at which time it starts the process according to its entry in the BosConfig file).
To stop a process temporarily by changing its status flag in BOS Server memory to NotRun, use the bos shutdown command. To restart a stopped process by changing its status flag in the BOS Server's memory to Run, use the bos startup command. The process starts regardless of its status flag in the BosConfig file. You can also use the bos startup command to start all processes marked with status flag Run in the BosConfig file, as described in the following instructions.
Because the bos startup command starts a process without changing its status flag in the BosConfig file, it is useful for testing a server process without enabling it permanently. To stop and start processes by changing their status flags in the BosConfig file, see Stopping and Starting Processes Permanently; to stop and immediately restart a process, see Stopping and Immediately Restarting Processes.
Note: | Do not temporarily stop a database server process on all machines at once. Doing so makes the database completely unavailable. |
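A temporary stop is usually paired with a later start, so the two commands can be wrapped around a maintenance step. The following is a sketch under stated assumptions, not an AFS-supplied tool: the function name with_temporary_shutdown and the BOS variable are hypothetical, and the example sets BOS to "echo bos" so that it only prints the commands it would run.

```shell
# Hypothetical helper: stop an instance in BOS Server memory only, run a
# caller-supplied maintenance command, then start the instance again.
# The BosConfig file is not changed by either bos command.
# BOS is an assumption of this sketch; it defaults to the real bos command.
with_temporary_shutdown() {
    machine=$1 instance=$2
    shift 2
    ${BOS:-bos} shutdown "$machine" "$instance" -wait || return 1
    "$@"                        # the maintenance step
    rc=$?
    ${BOS:-bos} startup "$machine" "$instance"
    return $rc
}

# Dry run: print the commands instead of contacting a BOS Server.
BOS="echo bos"
with_temporary_shutdown fs3.abc.com upclientbin echo "maintenance step"
```

Unsetting BOS makes the helper issue the real bos shutdown and bos startup commands in the forms described in this section.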
% bos listusers <machine name>
% bos shutdown <machine name> [<instances>+] [-wait]
where
% bos listusers <machine name>
% bos startup <machine name>
where
% bos listusers <machine name>
% bos startup <machine name> <instances>+
where
Although by default the BOS Server checks each day for newly installed binary files and restarts the associated processes, it is sometimes desirable to stop and restart processes immediately. The bos restart command provides this functionality, starting a completely new instance of each affected process:
Restarting processes causes a service outage. It is usually best to schedule restarts for periods of low usage. The BOS Server automatically restarts all processes once a week, to reduce the potential for the core leaks that can develop as any process runs for an extended time; see Setting the BOS Server's Restart Times.
% bos listusers <machine name>
% bos restart <machine name> -bosserver
where
% bos listusers <machine name>
% bos restart <machine name> -all
where
% bos listusers <machine name>
% bos restart <machine name> <instances>+
where
The BOS Server by default restarts once a week, and the new instance restarts all processes marked with status flag Run in the local /usr/afs/local/BosConfig file (this is equivalent to issuing the bos restart command with the -bosserver flag). The default restart time is Sunday at 4:00 a.m. The weekly restart is designed to minimize core leaks, which can develop as a process continues to allocate virtual memory but does not free it again. When the memory is completely exhausted, the machine can no longer function correctly.
The BOS Server also by default checks once a day for any newly installed binary files. If it finds that the modification time stamp on a process's binary file in the /usr/afs/bin directory is more recent than the time at which the process last started, it restarts the process so that a new instance starts using the new binary file. The default binary-checking time is 5:00 a.m.
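The comparison described here (binary modified more recently than the process last started) can be imitated with the shell's file-age test. This sketch only mirrors the idea and is not how the BOS Server is implemented: binary_is_newer is a hypothetical helper, a reference file stands in for the process start time, and the -nt operator is a widely supported extension rather than strict POSIX.

```shell
# Hypothetical helper: succeed if the first file was modified more
# recently than the second.
binary_is_newer() {
    [ "$1" -nt "$2" ]
}

# Self-contained demonstration with temporary stand-in files.
tmp=$(mktemp -d)
touch -t 199905030829 "$tmp/started"     # stand-in for the last start time
touch -t 199906070200 "$tmp/ptserver"    # stand-in for a newer binary
if binary_is_newer "$tmp/ptserver" "$tmp/started"; then
    echo "ptserver binary is newer than the last start"
fi
rm -rf "$tmp"
```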
Because restarts can cause outages during which the file system is inaccessible, the default times for restarts are in the early morning when usage is likely to be lowest. Restarting a database server process on any database server machine usually makes the entire system unavailable to everyone for a brief time, whereas restarting other types of processes inconveniences only users interacting with that process on that machine. The longest outages typically result from restarting the fs process, because the File Server must reattach all volumes.
The BosConfig file on each file server machine records the two restart times. To display the current settings, issue the bos getrestart command. To reset a time, use the bos setrestart command.
% bos getrestart <machine name>
where
% bos listusers <machine name>
% bos setrestart <machine name> "<time to restart server>" [-general] [-newbinary]
where
If desired, precede a time or day and time definition with the string every or at. These words do not change the meaning, but possibly make the output of the bos getrestart command easier to understand.
Note: | If the specified time is within one hour of the current time, the BOS Server does not perform the restart until the next eligible time (the next day for a time or next week for a day and time). |
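To keep restart times consistent across several machines, the two settings can be applied in a loop. The sketch below is an illustration only: set_restart_times and the BOS variable are hypothetical names, the time strings "sun 4:00" and "5:00" are assumed spellings of the default times mentioned earlier in this section, and the example runs as a dry run that prints the commands.

```shell
# Hypothetical helper: apply one general restart time and one
# binary-checking time to each machine named on the command line.
# BOS is an assumption of this sketch; it defaults to the real bos command.
set_restart_times() {
    general=$1 newbinary=$2
    shift 2
    for machine in "$@"; do
        ${BOS:-bos} setrestart "$machine" "$general" -general
        ${BOS:-bos} setrestart "$machine" "$newbinary" -newbinary
    done
}

# Dry run: print the commands instead of contacting a BOS Server.
BOS="echo bos"
set_restart_times "sun 4:00" "5:00" fs3.abc.com fs6.abc.com
```

Because echo drops the quotation marks, the dry run shows the day and time as separate words; the real command receives them as a single quoted argument.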
The /usr/afs/logs directory on each file server machine contains log files that detail interesting events that occur during normal operation of some AFS server processes. The self-explanatory information in the log files can help you evaluate process failures and other problems. To display a log file remotely, issue the bos getlog command. You can also establish a connection to the server machine and use a text editor or other file display program (such as the cat command).
Note: | Log files can grow unmanageably large if you do not periodically shut down and restart the database server processes (for example, if you disable the general restart time). In this case it is a good policy to issue the UNIX rm command periodically to delete the current log file. The server process automatically creates a new one as needed. |
% bos listusers <machine name>
% bos getlog <machine name> <log file to examine>
where
You can provide a full or relative pathname to display a file from another directory. Relative pathnames are interpreted relative to the /usr/afs/logs directory.