This chapter provides a broad overview of the concepts and organization of AFS. It is strongly recommended that anyone involved in administering an AFS cell read this chapter before beginning to issue commands.
This section introduces most of the key terms and concepts necessary for a basic understanding of AFS. For a more detailed discussion, see More Detailed Discussions of Some Basic Concepts.
AFS: A Distributed File System
AFS is a distributed file system that enables users to share and access all of the files stored in a network of computers as easily as they access the files stored on their local machines. The file system is called distributed for this exact reason: files can reside on many different machines (be distributed across them), but are available to users on every machine.
Servers and Clients
In fact, AFS stores files on a subset of the machines in a network, called file server machines. File server machines provide file storage and delivery service, along with other specialized services, to the other subset of machines in the network, the client machines. These machines are called clients because they make use of the servers' services while doing their own work. In a standard AFS configuration, clients provide computational power, access to the files in AFS and other "general purpose" tools to the users seated at their consoles. There are generally many more client workstations than file server machines.
AFS file server machines run a number of server processes, so called because each provides a distinct specialized service: one handles file requests, another tracks file location, a third manages security, and so on. To avoid confusion, AFS documentation always refers to server machines and server processes, not simply to servers. For a more detailed description of the server processes, see AFS Server Processes and the Cache Manager.
Cells
A cell is an administratively independent site running AFS. As a cell's system administrator, you make many decisions about configuring and maintaining your cell in the way that best serves its users, without having to consult the administrators in other cells. For example, you determine how many clients and servers to have, where to put files, and how to allocate client machines to users.
Transparent Access and the Uniform Namespace
Although your AFS cell is administratively independent, you probably want to organize the local collection of files (your filespace or tree) so that users from other cells can also access the information in it. AFS enables cells to combine their local filespaces into a global filespace, and does so in such a way that file access is transparent--users do not need to know anything about a file's location in order to access it. All they need to know is the pathname of the file, which looks the same in every cell. Thus every user at every machine sees the collection of files in the same way, meaning that AFS provides a uniform namespace to its users.
Volumes
AFS groups files into volumes, making it possible to distribute files across many machines and yet maintain a uniform namespace. A volume is a unit of disk space that functions like a container for a set of related files, keeping them all together on one partition. Volumes can vary in size, but are (by definition) smaller than a partition.
Volumes are important to system administrators and users for several reasons. Their small size makes them easy to move from one partition to another, or even between machines. The system administrator can maintain maximum efficiency by moving volumes to keep the load balanced evenly. In addition, volumes correspond to directories in the filespace--most cells store the contents of each user home directory in a separate volume. Thus the complete contents of the directory move together when the volume moves, making it easy for AFS to keep track of where a file is at a certain time. Volume moves are recorded automatically, so users do not have to keep track of file locations.
Efficiency Boosters: Replication and Caching
AFS incorporates special features on server machines and client machines that help make it efficient and reliable.
On server machines, AFS enables administrators to replicate commonly used volumes, such as those containing binaries for popular programs. Replication means putting an identical read-only copy (sometimes called a clone) of a volume on more than one file server machine. The failure of one file server machine housing the volume does not interrupt users' work, because the volume's contents are still available from other machines. Replication also means that one machine does not become overburdened with requests for files from a popular volume.
On client machines, AFS uses caching to improve efficiency. When a user on a client workstation requests a file, the Cache Manager on the client sends a request for the data to the File Server process running on the proper file server machine. The user does not need to know which machine this is; the Cache Manager determines file location automatically. The Cache Manager receives the file from the File Server process and puts it into the cache, an area of the client machine's local disk or memory dedicated to temporary file storage. Caching improves efficiency because the client does not need to send a request across the network every time the user wants the same file. Network traffic is minimized, and subsequent access to the file is especially fast because the file is stored locally. AFS uses a mechanism called a callback to ensure that the cached file stays up to date.
Security: Mutual Authentication and Access Control Lists
Even in a cell where file sharing is especially frequent and widespread, it is not desirable that every user have equal access to every file. One way AFS provides adequate security is by requiring that servers and clients prove their identities to one another before they exchange information. This procedure, called mutual authentication, requires that both server and client demonstrate knowledge of a "shared secret" (like a password) known only to the two of them. Mutual authentication guarantees that servers provide information only to authorized clients and that clients receive information only from legitimate servers.
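The idea of proving knowledge of a shared secret in both directions can be illustrated with a small challenge-and-response sketch. This is not the Kerberos-based procedure that AFS actually uses; it is a simplified illustration, and the function names and the choice of HMAC-SHA256 are assumptions made only for this example.

```python
import hashlib
import hmac
import secrets


def prove(shared_secret: bytes, challenge: bytes) -> bytes:
    """Answer a challenge in a way that requires knowing the shared secret."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).digest()


def verify(shared_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Check a response against the answer expected for this secret."""
    return hmac.compare_digest(prove(shared_secret, challenge), response)


def mutually_authenticate(client_secret: bytes, server_secret: bytes) -> bool:
    """Return True only if each side proves its identity to the other."""
    # Each side issues a fresh random challenge to the other.
    client_challenge = secrets.token_bytes(16)
    server_challenge = secrets.token_bytes(16)

    # The server answers the client's challenge; the client checks the answer.
    server_is_legitimate = verify(
        client_secret, client_challenge, prove(server_secret, client_challenge)
    )
    # The client answers the server's challenge; the server checks the answer.
    client_is_authorized = verify(
        server_secret, server_challenge, prove(client_secret, server_challenge)
    )
    return server_is_legitimate and client_is_authorized
```

When both sides hold the same secret the exchange succeeds; if either side holds a different secret, the other side rejects it, and the secret itself never crosses the network.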
Users themselves control another aspect of AFS security, by determining who has access to the directories they own. For any directory a user owns, he or she can build an access control list (ACL) that grants or denies access to the contents of the directory. An access control list pairs specific users with specific types of access privileges. There are seven separate permissions and up to twenty different people or groups of people can appear on an access control list.
For a more detailed description of AFS's mutual authentication procedure, see A More Detailed Look at Mutual Authentication. For further discussion of ACLs, see Managing Access Control Lists.
The previous section offered a brief overview of the many concepts that an AFS system administrator needs to understand. The following sections examine some important concepts in more detail. Although not all concepts are new to an experienced administrator, reading this section helps ensure a common understanding of terms and concepts.
A network is a collection of interconnected computers able to communicate with each other and transfer information back and forth.
A networked computing environment contrasts with two types of computing environments: mainframe and personal.
A network can connect computers of any kind, but the typical network running AFS connects high-function personal workstations. Each workstation has some computing power and local disk space, usually more than a personal computer or terminal, but less than a mainframe. For more about the classes of machines used in an AFS environment, see Servers and Clients.
A file system is a collection of files and the facilities (programs and commands) that enable users to access the information in the files. All computing environments have file systems. In a mainframe environment, the file system consists of all the files on the mainframe's storage disks, whereas in a personal computing environment it consists of the files on the computer's local disk.
Networked computing environments often use distributed file systems like AFS. A distributed file system takes advantage of the interconnected nature of the network by storing files on more than one computer in the network and making them accessible to all of them. In other words, the responsibility for file storage and delivery is "distributed" among multiple machines instead of relying on only one. Despite the distribution of responsibility, a distributed file system like AFS creates the illusion that there is a single filespace.
AFS uses a server/client model. In general, a server is a machine, or a process running on a machine, that provides specialized services to other machines. A client is a machine or process that makes use of a server's specialized service during the course of its own work, which is often of a more general nature than the server's. The functional distinction between clients and servers is not always strict, however--a server can be considered the client of another server whose service it is using.
AFS divides the machines on a network into two basic classes, file server machines and client machines, and assigns different tasks and responsibilities to each.
File server machines store the files in the distributed file system, and a server process running on the file server machine delivers and receives files. AFS file server machines run a number of server processes. Each process has a special function, such as maintaining databases important to AFS administration, managing security or handling volumes. This modular design enables each server process to specialize in one area, and thus perform more efficiently. For a description of the function of each AFS server process, see AFS Server Processes and the Cache Manager.
Not all AFS server machines must run all of the server processes. Some processes run on only a few machines because the demand for their services is low. Other processes run on only one machine in order to act as a synchronization site. See The Four Roles for File Server Machines.
The other class of machines consists of the client machines, which generally work directly for users, providing computational power and other general purpose tools. Clients also provide users with access to the files stored on the file server machines. Clients do not run any special processes per se, but do use a modified kernel that enables them to communicate with the AFS server processes running on the file server machines and to cache files. This collection of kernel modifications is referred to as the Cache Manager; see The Cache Manager. There are usually many more client machines in a cell than file server machines.
Client and Server Configuration
In the most typical AFS configuration, both file server machines and client machines are high-function workstations with disk drives. While this configuration is not required, it does have some advantages.
There are several advantages to using personal workstations as file server machines. One is that it is easy to expand the network by adding another file server machine. It is also easy to increase storage space by adding disks to existing machines. Using workstations rather than more powerful mainframes makes it more economical to use multiple file server machines rather than one. Multiple file server machines provide an increase in system availability and reliability if popular files are available on more than one machine.
The advantage of using workstations as clients is that caching on the local disk speeds the delivery of files to application programs. (For an explanation of caching, see Caching and Callbacks.) Diskless machines can access AFS if they are running NFS(R) and the NFS/AFS Translator, an optional component of the AFS distribution.
A cell is an independently administered site running AFS. In terms of hardware, it consists of a collection of file server machines and client machines defined as belonging to the cell; a machine can only belong to one cell at a time. Users also belong to a cell in the sense of having an account in it, but unlike machines can belong to (have an account in) multiple cells. To say that a cell is administratively independent means that its administrators determine many details of its configuration without having to consult administrators in other cells or a central authority. For example, a cell administrator determines how many machines of different types to run, where to put files in the local tree, how to associate volumes and directories, and how much space to allocate to each user.
The terms local cell and home cell are equivalent, and refer to the cell in which a user has initially authenticated during a session, by logging onto a machine that belongs to that cell. All other cells are referred to as foreign from the user's perspective. In other words, throughout a login session, a user is accessing the filespace through a single Cache Manager--the one on the machine to which he or she initially logged in--whose cell membership defines the local cell. All other cells are considered foreign during that login session, even if the user authenticates in additional cells or uses the cd command to change directories into their file trees.
It is possible to maintain more than one cell at a single geographical location. For instance, separate departments on a university campus or in a corporation can choose to administer their own cells. It is also possible to have machines at geographically distant sites belong to the same cell; only limits on the speed of network communication determine how practical this is.
Despite their independence, AFS cells generally agree to make their local filespace visible to other AFS cells, so that users in different cells can share files if they choose. If your cell is to participate in the "global" AFS namespace, it must comply with a few basic conventions governing how the local filespace is configured and how the addresses of certain file server machines are advertised to the outside world.
One of the features that makes AFS easy to use is that it provides transparent access to the files in a cell's filespace. Users do not have to know which file server machine stores a file in order to access it; they simply provide the file's pathname, which AFS automatically translates into a machine location.
In addition to transparent access, AFS also creates a uniform namespace--a file's pathname is identical regardless of which client machine the user is working on. The cell's file tree looks the same when viewed from any client because the cell's file server machines store all the files centrally and present them in an identical manner to all clients.
To enable the transparent access and the uniform namespace features, the system administrator must follow a few simple conventions in configuring client machines and file trees. For details, see Making Other Cells Visible in Your Cell.
A volume is a conceptual container for a set of related files that keeps them all together on one file server machine partition. Volumes can vary in size, but are (by definition) smaller than a partition. Volumes are the main administrative unit in AFS, and have several characteristics that make administrative tasks easier and help improve overall system performance.
The previous section discussed how each volume corresponds logically to a directory in the file system: the volume keeps together on one partition all the data in the files residing in the directory. The directory that corresponds to a volume is called its root directory, and the mechanism that associates the directory and volume is called a mount point. A mount point is similar to a symbolic link in the file tree that specifies which volume contains the files kept in a directory. A mount point is not an actual symbolic link; its internal structure is different.
Note: You must not create a symbolic link to a file whose name begins with the number sign (#) or the percent sign (%), because the Cache Manager interprets such a link as a mount point to a regular or read/write volume, respectively.
The use of mount points means that many of the elements in an AFS file tree that look and function just like standard UNIX file system directories are actually mount points. In form, a mount point is a one-line file that names the volume containing the data for files in the directory. When the Cache Manager (see The Cache Manager) encounters a mount point--for example, in the course of interpreting a pathname--it looks in the volume named in the mount point. In the volume the Cache Manager finds an actual UNIX-style directory element--the volume's root directory--that lists the files contained in the directory/volume. The next element in the pathname appears in that list.
A volume is said to be mounted at the point in the file tree where there is a mount point pointing to the volume. A volume's contents are not visible or accessible unless it is mounted.
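The pathname walk described above can be sketched with a small in-memory model. This is an illustration only, not the Cache Manager's implementation: the MountPoint class, the VOLUMES table, and every volume and file name in it are hypothetical.

```python
class MountPoint:
    """Names the volume whose root directory supplies a directory's contents."""

    def __init__(self, volume_name):
        self.volume_name = volume_name


# Hypothetical volumes. Each volume name maps to its root directory, and a
# directory maps element names to file data or to further mount points.
VOLUMES = {
    "root.cell": {"usr": MountPoint("user")},
    "user": {"pat": MountPoint("user.pat")},
    "user.pat": {"notes.txt": "contents of notes.txt"},
}


def resolve(path, root_volume="root.cell"):
    """Walk a pathname, crossing into a volume at each mount point."""
    current = VOLUMES[root_volume]
    for element in path.strip("/").split("/"):
        entry = current[element]  # raises KeyError if the element is missing
        if isinstance(entry, MountPoint):
            # Look in the volume named by the mount point. Its root
            # directory lists the next element of the pathname.
            entry = VOLUMES[entry.volume_name]
        current = entry
    return current
```

In this model, resolve("/usr/pat/notes.txt") crosses two mount points before reaching the file. A volume that no mount point names would never be reached by any pathname, which mirrors the statement that a volume's contents are not accessible unless it is mounted.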
Replication refers to making a copy, or clone, of a source read/write volume and then placing the copy on one or more additional file server machines in a cell. One benefit of replicating a volume is that it increases the availability of the contents. If one file server machine housing the volume fails, users can still access the volume on a different machine. No one machine need become overburdened with requests for a popular file, either, because the file is available from several machines.
Replication is not necessarily appropriate for cells with limited disk space, nor are all types of volumes equally suitable for replication (replication is most appropriate for volumes that contain popular files that do not change very often). For more details, see When to Replicate Volumes.
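The availability benefit of replication can be sketched as a client choosing among the machines that house copies of a volume. The site list, machine names, and the first-available selection rule below are assumptions made for illustration; they do not describe how the Cache Manager actually ranks sites.

```python
# Hypothetical list of machines housing read-only copies of a volume.
REPLICA_SITES = {
    "bin.tools": ["fs1.example.com", "fs2.example.com", "fs3.example.com"],
}


def pick_site(volume, down=frozenset()):
    """Return the first machine housing the volume that is not known to be down."""
    for machine in REPLICA_SITES[volume]:
        if machine not in down:
            return machine
    raise RuntimeError("no available site for volume " + volume)
```

With three sites, the failure of any one machine still leaves two places from which the volume's contents can be read; only when every site is down does the request fail.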
Just as replication increases system availability, caching increases the speed and efficiency of file access in AFS. Each AFS client machine dedicates a portion of its local disk or memory to a cache where it stores data temporarily. Whenever an application program (such as a text editor) running on a client machine requests data from an AFS file, the request passes through the Cache Manager. The Cache Manager is a portion of the client machine's kernel that translates file requests from local application programs into cross-network requests to the File Server process running on the file server machine storing the file. When the Cache Manager receives the requested data from the File Server, it stores it in the cache and then passes it on to the application program.
Caching improves the speed of data delivery to application programs in the following ways:
While caching provides many advantages, it also creates the problem of maintaining consistency among the many cached copies of a file and the source version of a file. This problem is solved using a mechanism referred to as a callback.
A callback is a promise by a File Server to a Cache Manager to inform the latter when a change is made to any of the data delivered by the File Server. Callbacks are used differently based on the type of file delivered by the File Server:
The callback mechanism ensures that the Cache Manager always requests the most up-to-date version of a file. However, it does not ensure that the user necessarily notices the most current version as soon as the Cache Manager has it. That depends on how often the application program requests additional data from the File System or how often it checks with the Cache Manager.
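The interaction between caching and callbacks can be sketched with two toy classes. This is a minimal model under stated assumptions, not the AFS implementation: the class and method names are hypothetical, and a single callback per file stands in for the different callback types.

```python
class FileServer:
    """Toy File Server: stores file data and remembers callback promises."""

    def __init__(self):
        self.files = {}
        self.callbacks = {}  # path -> set of Cache Managers holding a callback

    def fetch(self, path, cache_manager):
        # Delivering data carries a promise to notify this client of changes.
        self.callbacks.setdefault(path, set()).add(cache_manager)
        return self.files[path]

    def store(self, path, data):
        self.files[path] = data
        # Break the callback for every client that holds a cached copy.
        for cache_manager in self.callbacks.pop(path, set()):
            cache_manager.break_callback(path)


class CacheManager:
    """Toy Cache Manager: serves reads locally while its callback is intact."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.fetches = 0  # number of requests sent across the "network"

    def read(self, path):
        if path not in self.cache:  # no valid cached copy
            self.cache[path] = self.server.fetch(path, self)
            self.fetches += 1
        return self.cache[path]

    def break_callback(self, path):
        # Discard the stale copy so that the next read goes to the server.
        self.cache.pop(path, None)
```

Repeated reads of an unchanged file are served from the cache without contacting the server. After a store breaks the callback, the next read fetches the new version, but an application that never reads again does not see the change, which is the caveat noted above.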
As mentioned in Servers and Clients, AFS file server machines run a number of processes, each with a specialized function. One of the main responsibilities of a system administrator is to make sure that processes are running correctly as much of the time as possible, using the administrative services that the server processes provide.
The following list briefly describes the function of each server process and the Cache Manager; the following sections then discuss the important features in more detail.
The File Server, the most fundamental of the servers, delivers data files from the file server machine to local workstations as requested, and stores the files again when the user saves any changes to the files.
The Basic OverSeer Server (BOS Server) ensures that the other server processes on its server machine are running correctly as much of the time as possible, since a server is useful only if it is available. The BOS Server relieves system administrators of much of the responsibility for overseeing system operations.
The Authentication Server helps ensure that communications on the network are secure. It verifies user identities at login and provides the facilities through which participants in transactions prove their identities to one another (mutually authenticate). It maintains the Authentication Database.
The Protection Server helps users control who has access to their files and directories. Users can grant access to several other users at once by putting them all in a group entry in the Protection Database maintained by the Protection Server.
The Volume Server performs all types of volume manipulation. It helps the administrator move volumes from one server machine to another to balance the workload among the various machines.
The Volume Location Server (VL Server) maintains the Volume Location Database (VLDB), in which it records the location of volumes as they move from file server machine to file server machine. This service is the key to transparent file access for users.
The Update Server distributes new versions of AFS server process software and configuration information to all file server machines. It is crucial to stable system performance that all server machines run the same software.
The Backup Server maintains the Backup Database, in which it stores information related to the Backup System. It enables the administrator to back up data from volumes to tape. The data can then be restored from tape in the event that it is lost from the file system.
The Salvager is not a server in the sense that others are. It runs only after the File Server or Volume Server fails; it repairs any inconsistencies caused by the failure. The system administrator can invoke it directly if necessary.
The Network Time Protocol Daemon (NTPD) is not an AFS server process per se, but plays a vital role nonetheless. It synchronizes the internal clock on a file server machine with those on other machines. Synchronized clocks are particularly important for correct functioning of the AFS distributed database technology (known as Ubik); see Configuring the Cell for Proper Ubik Operation. The NTPD is controlled by the runntp process.
The Cache Manager is the one component in this list that resides on AFS client rather than file server machines. It is not a process per se, but rather a part of the kernel on AFS client machines that communicates with AFS server processes. Its main responsibilities are to retrieve files for application programs running on the client and to maintain the files in the cache.
The File Server is the most fundamental of the AFS server processes and runs on each file server machine. It provides the same services across the network that the UNIX file system provides on the local disk:
The Basic OverSeer Server (BOS Server) reduces the demands on system administrators by constantly monitoring the processes running on its file server machine. It can restart failed processes automatically and provides a convenient interface for administrative tasks.
The BOS Server runs on every file server machine. Its primary function is to minimize system outages. It also
The Authentication Server performs two main functions related to network security:
In fulfilling these duties, the Authentication Server utilizes algorithms and other procedures known as Kerberos (which is why many commands used to contact the Authentication Server begin with the letter k). This technology was originally developed by the Massachusetts Institute of Technology's Project Athena.
The Authentication Server also maintains the Authentication Database, in which it stores user passwords converted into encryption key form as well as the AFS server encryption key. To learn more about the procedures AFS uses to verify user identity and to perform mutual authentication, see A More Detailed Look at Mutual Authentication.
The Protection Server is the key to AFS's refinement of the normal UNIX methods for protecting files and directories from unauthorized use. The refinements include the following:
The Protection Server's main duty is to help the File Server determine if a user is authorized to access a file in the requested manner. The Protection Server creates a list of all the groups to which the user belongs. The File Server then compares this list to the ACL associated with the file's parent directory. A user thus acquires access both as an individual and as a member of any groups.
The Protection Server also maps usernames (the name typed at the login prompt) to AFS user ID numbers (AFS UIDs). These UIDs are functionally equivalent to UNIX UIDs, but operate in the domain of AFS rather than in the UNIX file system on a machine's local disk. This conversion service is essential because the tokens that the Authentication Server grants to authenticated users are stamped with usernames (to comply with Kerberos standards). The AFS server processes identify users by AFS UID, not by username. Before they can understand whom the token represents, they need the Protection Server to translate the username into an AFS UID. For further discussion of tokens, see A More Detailed Look at Mutual Authentication.
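The division of labor described above, in which the Protection Server supplies identities and the File Server compares them with an ACL, can be sketched as follows. The user names, AFS UIDs, group names, and the dictionary layout are all hypothetical. For simplicity the ACL here is keyed by name and only grants permissions; the sketch omits entries that deny access.

```python
# Hypothetical Protection Database contents.
UIDS = {"pat": 1001, "terry": 1002}  # username -> AFS UID
GROUPS = {"pat:friends": {1002}, "staff": {1001, 1002}}  # group -> member UIDs


def identities(username):
    """Protection Server side: the user plus every group the user belongs to."""
    uid = UIDS[username]  # translate the username into an AFS UID
    groups = {name for name, members in GROUPS.items() if uid in members}
    return {username} | groups


def allowed(username, acl, permission):
    """File Server side: compare the identity list with a directory's ACL."""
    granted = set()
    for who in identities(username):
        granted |= acl.get(who, set())
    return permission in granted


# A directory ACL pairing users or groups with single-letter permissions.
HOME_ACL = {"pat": set("rlidwka"), "pat:friends": set("rl")}
```

In this model terry acquires read access to the directory only through membership in the pat:friends group, while pat holds every permission as an individual.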
The Volume Server provides the interface through which you create, delete, move, and replicate volumes, as well as prepare them for archiving to tape or other media (backing up). The Volumes section explained the advantages gained by storing files in volumes. Creating and deleting volumes are necessary when adding users to and removing them from the system; volume moves are done for load balancing; and replication enables volume placement on multiple file server machines (for more on replication, see Replication).
The VL Server maintains a complete list of volume locations in the Volume Location Database (VLDB). When the Cache Manager (see The Cache Manager) begins to fill a file request from an application program, it first contacts the VL Server in order to learn which file server machine currently houses the volume containing the file. The Cache Manager then requests the file from the File Server process running on that file server machine.
The VLDB and VL Server make it possible for AFS to take advantage of the increased system availability gained by using multiple file server machines, because the Cache Manager knows where to find a particular file. Indeed, in a certain sense the VL Server is the keystone of the entire file system--when the information in the VLDB is inaccessible, the Cache Manager cannot retrieve files, even if the File Server processes are working properly. A list of the information stored in the VLDB about each volume is provided in Volume Information in the VLDB.
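The two-step lookup, first the VL Server and then the File Server, can be sketched with two tables. The machine names, volume name, and function names are hypothetical, and the sketch ignores replication and the VLDB's actual record format.

```python
# Hypothetical Volume Location Database: volume name -> file server machine.
VLDB = {"user.pat": "fs1.example.com"}

# Hypothetical file server machines and the volumes each one houses.
FILE_SERVERS = {
    "fs1.example.com": {"user.pat": {"notes.txt": "contents of notes.txt"}},
    "fs2.example.com": {},
}


def fetch_file(volume, filename):
    """Cache Manager side: learn the volume's location, then request the file."""
    machine = VLDB[volume]  # step 1: ask the VL Server
    return FILE_SERVERS[machine][volume][filename]  # step 2: ask the File Server


def move_volume(volume, destination):
    """Move a volume and record its new location in the VLDB."""
    source = VLDB[volume]
    FILE_SERVERS[destination][volume] = FILE_SERVERS[source].pop(volume)
    VLDB[volume] = destination
```

Because fetch_file consults the VLDB on each request, the same call succeeds before and after move_volume runs. If the VLDB entry were unavailable, the first step would fail even though the file server machine still held the data.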
The Update Server helps guarantee that all file server machines are running the same version of a server process. System performance can be inconsistent if some machines are running one version of the BOS Server (for example) and other machines are running another version.
To ensure that all machines run the same version of a process, install new software on a single file server machine of each system type, called the binary distribution machine for that type. The binary distribution machine runs the server portion of the Update Server, whereas all the other machines of that type run the client portion of the Update Server. The client portions check frequently with the server portion to see if they are running the right version of every process; if not, the client portion retrieves the right version from the binary distribution machine and installs it locally. The system administrator does not need to remember to install new software individually on all the file server machines: the Update Server does it automatically. For more on binary distribution machines, see Binary Distribution Machines.
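The check performed by the client portion can be sketched as a comparison between what a machine has installed and what the binary distribution machine offers. The version strings, process names, and the dictionary representation are assumptions for illustration; the sketch does not show how the Update Server actually detects that a binary is out of date.

```python
def sync_binaries(local, distribution):
    """Bring a machine's binaries in line with the binary distribution machine.

    Both arguments map a process name to a version label. Returns the sorted
    names of the processes that were replaced or newly installed.
    """
    updated = []
    for process, version in distribution.items():
        if local.get(process) != version:
            local[process] = version  # retrieve and install the right version
            updated.append(process)
    return sorted(updated)
```

Running the same check again immediately afterward finds nothing to update, so the client portion can repeat it frequently at little cost.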
In cells that run the United States edition of AFS, the Update Server also distributes configuration files that all file server machines need to store on their local disks (for a description of the contents and purpose of these files, see Common Configuration Files in the /usr/afs/etc Directory). As with server process software, the need for consistent system performance demands that all the machines have the same version of these files. With the United States edition, the system administrator needs to make changes to these files on one machine only, the cell's system control machine, which runs a server portion of the Update Server. All other machines in the cell run a client portion that accesses the correct versions of these configuration files from the system control machine. Cells running the international edition of AFS do not use a system control machine to distribute configuration files. For more information, see The System Control Machine.
The Backup Server maintains the information in the Backup Database. The Backup Server and the Backup Database enable administrators to back up data from AFS volumes to tape and restore it from tape to the file system if necessary. The server and database together are referred to as the Backup System.
Administrators initially configure the Backup System by defining sets of volumes to be dumped together and the schedule by which the sets are to be dumped. They also install the system's tape drives and define the drives' Tape Coordinators, which are the processes that control the tape drives.
Once the Backup System is configured, user and system data can be dumped from volumes to tape. In the event that data is ever lost from the system (for example, if a system or disk failure causes data to be lost), administrators can restore the data from tape. If tapes are periodically archived, or saved, data can also be restored to its state at a specific time. Additionally, because Backup System data is difficult to reproduce, the Backup Database itself can be backed up to tape and restored if it ever becomes corrupted. For more information on configuring and using the Backup System, see Configuring the AFS Backup System and Backing Up and Restoring AFS Data.
The Salvager differs from other AFS server processes in that it runs only at selected times. The BOS Server invokes the Salvager when the File Server, Volume Server, or both fail. The Salvager attempts to repair disk corruption that can result from a failure.
As a system administrator, you can also invoke the Salvager as necessary, even if the File Server or Volume Server has not failed. See Salvaging Volumes.
The Network Time Protocol Daemon (NTPD) is not an AFS server process per se, but plays an important role. It helps guarantee that all of the file server machines agree on the time. The NTPD on one file server machine acts as a synchronization site, generally learning the correct time from a source outside the cell. The NTPDs on the other file server machines refer to the synchronization site to set the internal clocks on their machines.
Keeping clocks synchronized is particularly important to the correct operation of AFS's distributed database technology, which coordinates the copies of the Authentication, Backup, Protection, and Volume Location Databases; see Replicating the AFS Administrative Databases. Client machines also refer to these clocks for the correct time; therefore, it is less confusing if all file server machines have the same time. For more technical detail about the NTPD, see The runntp Process.
As already mentioned in Caching and Callbacks, the Cache Manager is the one component in this section that resides on client machines rather than on file server machines. It is not technically a stand-alone process, but rather a set of extensions or modifications in the client machine's kernel that enable communication with the server processes running on server machines. Its main duty is to translate file requests (made by application programs on client machines) into remote procedure calls (RPCs) to the File Server. (The Cache Manager first contacts the VL Server to find out which File Server currently houses the volume that contains a requested file, as mentioned in The Volume Location (VL) Server.) When the Cache Manager receives the requested file, it caches it before passing data on to the application program.
The Cache Manager also tracks the state of files in its cache compared to the version at the File Server by storing the callbacks sent by the File Server. When the File Server breaks a callback, indicating that a file or volume changed, the Cache Manager requests a copy of the new version before providing more data to application programs.