diff --git a/doc/arch/arch-overview.h b/doc/arch/arch-overview.h
new file mode 100644
index 0000000000..fcd1e54ed7
--- /dev/null
+++ b/doc/arch/arch-overview.h
@@ -0,0 +1,1223 @@
+/*!
+ \page title AFS-3 Programmer's Reference: Architectural Overview
+
+\author Edward R. Zayas
+Transarc Corporation
+\version 1.0
+\date 2 September 1991 22:53 Copyright 1991 Transarc Corporation All Rights
+Reserved FS-00-D160
+
+
+ \page chap1 Chapter 1: Introduction
+
+ \section sec1-1 Section 1.1: Goals and Background
+
+\par
+This paper provides an architectural overview of Transarc's wide-area
+distributed file system, AFS. Specifically, it covers the current level of
+available software, the third-generation AFS-3 system. This document will
+explore the technological climate in which AFS was developed, the nature of
+problem(s) it addresses, and how its design attacks these problems in order to
+realize the inherent benefits in such a file system. It also examines a set of
+additional features for AFS, some of which are actively being considered.
+\par
+This document is a member of a reference suite providing programming
+specifications as to the operation of and interfaces offered by the various AFS
+system components. It is intended to serve as a high-level treatment of
+distributed file systems in general and of AFS in particular. This document
+should ideally be read before any of the others in the suite, as it provides
+the organizational and philosophical framework in which they may best be
+interpreted.
+
+ \section sec1-2 Section 1.2: Document Layout
+
+\par
+Chapter 2 provides a discussion of the technological background and
+developments that created the environment in which AFS and related systems were
+inspired. Chapter 3 examines the specific set of goals that AFS was designed to
+meet, given the possibilities created by personal computing and advances in
+communication technology. Chapter 4 presents the core AFS architecture and how
+it addresses these goals. Finally, Chapter 5 considers how AFS functionality
+may be improved by certain design changes.
+
+ \section sec1-3 Section 1.3: Related Documents
+
+\par
+The names of the other documents in the collection, along with brief summaries
+of their contents, are listed below.
+\li AFS-3 Programmer's Reference: File Server/Cache Manager Interface: This
+document describes the File Server and Cache Manager agents, which provide the
+backbone file management services for AFS. The collection of File Servers for a
+cell supplies centralized file storage for that site, and allows clients
+running the Cache Manager component to access those files in a
+high-performance, secure fashion.
+\li AFS-3 Programmer's Reference: Volume Server/Volume Location Server
+Interface: This document describes the services through which "containers" of
+related user data are located and managed.
+\li AFS-3 Programmer's Reference: Protection Server Interface: This paper
+describes the server responsible for mapping printable user names to and from
+their internal AFS identifiers. The Protection Server also allows users to
+create, destroy, and manipulate "groups" of users, which are suitable for
+placement on Access Control Lists (ACLs).
+\li AFS-3 Programmer's Reference: BOS Server Interface: This paper covers the
+"nanny" service which assists in the administrability of the AFS environment.
+\li AFS-3 Programmer's Reference: Specification for the Rx Remote Procedure
+Call Facility: This document specifies the design and operation of the remote
+procedure call and lightweight process packages used by AFS.
+
+ \page chap2 Chapter 2: Technological Background
+
+\par
+Certain changes in technology over the past two decades greatly influenced the
+nature of computational resources, and the manner in which they were used.
+These developments created the conditions under which the notion of a
+distributed file system (DFS) was born. This chapter describes these
+technological changes, and explores how a distributed file system attempts to
+capitalize on the new computing environment's strengths and minimize its
+disadvantages.
+
+ \section sec2-1 Section 2.1: Shift in Computational Idioms
+
+\par
+By the beginning of the 1980s, new classes of computing engines and new methods
+by which they may be interconnected were becoming firmly established. At this
+time, a shift was occurring away from the conventional mainframe-based,
+timeshared computing environment to one in which both workstation-class
+machines and the smaller personal computers (PCs) were a strong presence.
+\par
+The new environment offered many benefits to its users when compared with
+timesharing. These smaller, self-sufficient machines moved dedicated computing
+power and cycles directly onto people's desks. Personal machines were powerful
+enough to support a wide variety of applications, and allowed for a richer,
+more intuitive, more graphically-based interface for them. Learning curves were
+greatly reduced, cutting training costs and increasing new-employee
+productivity. In addition, these machines provided a constant level of service
+throughout the day. Since a personal machine was typically only executing
+programs for a single human user, it did not suffer from timesharing's
+load-based response time degradation. Expanding the computing services for an
+organization was often accomplished by simply purchasing more of the relatively
+cheap machines. Even small organizations could now afford their own computing
+resources, over which they exercised full control. This provided more freedom
+to tailor computing services to the specific needs of particular groups.
+\par
+However, many of the benefits offered by the timesharing systems were lost when
+the computing idiom first shifted to include personal-style machines. One of
+the prime casualties of this shift was the loss of the notion of a single name
+space for all files. Instead, workstation- and PC-based environments each had
+independent and completely disconnected file systems. The standardized
+mechanisms through which files could be transferred between machines (e.g.,
+FTP) were largely designed at a time when there were relatively few large
+machines that were connected over slow links. Although the newer multi-megabit
+per second communication pathways allowed for faster transfers, the problem of
+resource location in this environment was still not addressed. There was no
+longer a system-wide file system, or even a file location service, so
+individual users were more isolated from the organization's collective data.
+Overall, disk requirements ballooned, since lack of a shared file system was
+often resolved by replicating all programs and data to each machine that needed
+it. This proliferation of independent copies further complicated the problem of
+version control and management in this distributed world.
Since computers were +often no longer behind locked doors at a computer center, user authentication +and authorization tasks became more complex. Also, since organizational +managers were now in direct control of their computing facilities, they had to +also actively manage the hardware and software upon which they depended. +\par +Overall, many of the benefits of the proliferation of independent, +personal-style machines were partially offset by the communication and +organizational penalties they imposed. Collaborative work and dissemination of +information became more difficult now that the previously unified file system +was fragmented among hundreds of autonomous machines. + + \section sec2-2 Section 2.2: Distributed File Systems + +\par +As a response to the situation outlined above, the notion of a distributed file +system (DFS) was developed. Basically, a DFS provides a framework in which +access to files is permitted regardless of their locations. Specifically, a +distributed file system offers a single, common set of file system operations +through which those accesses are performed. +\par +There are two major variations on the core DFS concept, classified according to +the way in which file storage is managed. These high-level models are defined +below. +\li Peer-to-peer: In this symmetrical model, each participating machine +provides storage for specific set of files on its own attached disk(s), and +allows others to access them remotely. Thus, each node in the DFS is capable of +both importing files (making reference to files resident on foreign machines) +and exporting files (allowing other machines to reference files located +locally). +\li Server-client: In this model, a set of machines designated as servers +provide the storage for all of the files in the DFS. All other machines, known +as clients, must direct their file references to these machines. Thus, servers +are the sole exporters of files in the DFS, and clients are the sole importers. + +\par +The notion of a DFS, whether organized using the peer-to-peer or server-client +discipline, may be used as a conceptual base upon which the advantages of +personal computing resources can be combined with the single-system benefits of +classical timeshared operation. +\par +Many distributed file systems have been designed and deployed, operating on the +fast local area networks available to connect machines within a single site. +These systems include DOMAIN [9], DS [15], RFS [16], and Sprite [10]. Perhaps +the most widespread of distributed file systems to date is a product from Sun +Microsystems, NFS [13] [14], extending the popular unix file system so that it +operates over local networks. + + \section sec2-3 Section 2.3: Wide-Area Distributed File Systems + +\par +Improvements in long-haul network technology are allowing for faster +interconnection bandwidths and smaller latencies between distant sites. +Backbone services have been set up across the country, and T1 (1.5 +megabit/second) links are increasingly available to a larger number of +locations. Long-distance channels are still at best approximately an order of +magnitude slower than the typical local area network, and often two orders of +magnitude slower. The narrowed difference between local-area and wide-area data +paths opens the window for the notion of a wide-area distributed file system +(WADFS). In a WADFS, the transparency of file access offered by a local-area +DFS is extended to cover machines across much larger distances. 
Wide-area file +system functionality facilitates collaborative work and dissemination of +information in this larger theater of operation. + + \page chap3 Chapter 3: AFS-3 Design Goals + + \section sec3-1 Section 3.1: Introduction + +\par +This chapter describes the goals for the AFS-3 system, the first commercial +WADFS in existence. +\par +The original AFS goals have been extended over the history of the project. The +initial AFS concept was intended to provide a single distributed file system +facility capable of supporting the computing needs of Carnegie Mellon +University, a community of roughly 10,000 people. It was expected that most CMU +users either had their own workstation-class machine on which to work, or had +access to such machines located in public clusters. After being successfully +implemented, deployed, and tuned in this capacity, it was recognized that the +basic design could be augmented to link autonomous AFS installations located +within the greater CMU campus. As described in Section 2.3, the long-haul +networking environment developed to a point where it was feasible to further +extend AFS so that it provided wide-area file service. The underlying AFS +communication component was adapted to better handle the widely-varying channel +characteristics encountered by intra-site and inter-site operations. +\par +A more detailed history of AFS evolution may be found in [3] and [18]. + + \section sec3-2 Section 3.2: System Goals + +\par +At a high level, the AFS designers chose to extend the single-machine unix +computing environment into a WADFS service. The unix system, in all of its +numerous incarnations, is an important computing standard, and is in very wide +use. Since AFS was originally intended to service the heavily unix-oriented CMU +campus, this decision served an important tactical purpose along with its +strategic ramifications. +\par +In addition, the server-client discipline described in Section 2.2 was chosen +as the organizational base for AFS. This provides the notion of a central file +store serving as the primary residence for files within a given organization. +These centrally-stored files are maintained by server machines and are made +accessible to computers running the AFS client software. +\par +Listed in the following sections are the primary goals for the AFS system. +Chapter 4 examines how the AFS design decisions, concepts, and implementation +meet this list of goals. + + \subsection sec3-2-1 Section 3.2.1: Scale + +\par +AFS differs from other existing DFSs in that it has the specific goal of +supporting a very large user community with a small number of server machines. +Unlike the rule-of-thumb ratio of approximately 20 client machines for every +server machine (20:1) used by Sun Microsystem's widespread NFS distributed file +system, the AFS architecture aims at smoothly supporting client/server ratios +more along the lines of 200:1 within a single installation. In addition to +providing a DFS covering a single organization with tens of thousands of users, +AFS also aims at allowing thousands of independent, autonomous organizations to +join in the single, shared name space (see Section 3.2.2 below) without a +centralized control or coordination point. Thus, AFS envisions supporting the +file system needs of tens of millions of users at interconnected yet autonomous +sites. 
+ + \subsection sec3-2-2 Section 3.2.2: Name Space + +\par +One of the primary strengths of the timesharing computing environment is the +fact that it implements a single name space for all files in the system. Users +can walk up to any terminal connected to a timesharing service and refer to its +files by the identical name. This greatly encourages collaborative work and +dissemination of information, as everyone has a common frame of reference. One +of the major AFS goals is the extension of this concept to a WADFS. Users +should be able to walk up to any machine acting as an AFS client, anywhere in +the world, and use the identical file name to refer to a given object. +\par +In addition to the common name space, it was also an explicit goal for AFS to +provide complete access transparency and location transparency for its files. +Access transparency is defined as the system's ability to use a single +mechanism to operate on a file, regardless of its location, local or remote. +Location transparency is defined as the inability to determine a file's +location from its name. A system offering location transparency may also +provide transparent file mobility, relocating files between server machines +without visible effect to the naming system. + + \subsection sec3-2-3 Section 3.2.3: Performance + +\par +Good system performance is a critical AFS goal, especially given the scale, +client-server ratio, and connectivity specifications described above. The AFS +architecture aims at providing file access characteristics which, on average, +are similar to those of local disk performance. + + \subsection sec3-2-4 Section 3.2.4: Security + +\par +A production WADFS, especially one which allows and encourages transparent file +access between different administrative domains, must be extremely conscious of +security issues. AFS assumes that server machines are "trusted" within their +own administrative domain, being kept behind locked doors and only directly +manipulated by reliable administrative personnel. On the other hand, AFS client +machines are assumed to exist in inherently insecure environments, such as +offices and dorm rooms. These client machines are recognized to be +unsupervisable, and fully accessible to their users. This situation makes AFS +servers open to attacks mounted by possibly modified client hardware, firmware, +operating systems, and application software. In addition, while an organization +may actively enforce the physical security of its own file servers to its +satisfaction, other organizations may be lax in comparison. It is important to +partition the system's security mechanism so that a security breach in one +administrative domain does not allow unauthorized access to the facilities of +other autonomous domains. +\par +The AFS system is targeted to provide confidence in the ability to protect +system data from unauthorized access in the above environment, where untrusted +client hardware and software may attempt to perform direct remote file +operations from anywhere in the world, and where levels of physical security at +remote sites may not meet the standards of other sites. + + \subsection sec3-2-5 Section 3.2.5: Access Control + +\par +The standard unix access control mechanism associates mode bits with every file +and directory, applying them based on the user's numerical identifier and the +user's membership in various groups. This mechanism was considered too +coarse-grained by the AFS designers. 
It was seen as insufficient for specifying +the exact set of individuals and groups which may properly access any given +file, as well as the operations these principals may perform. The unix group +mechanism was also considered too coarse and inflexible. AFS was designed to +provide more flexible and finer-grained control of file access, improving the +ability to define the set of parties which may operate on files, and what their +specific access rights are. + + \subsection sec3-2-6 Section 3.2.6: Reliability + +\par +The crash of a server machine in any distributed file system causes the +information it hosts to become unavailable to the user community. The same +effect is observed when server and client machines are isolated across a +network partition. Given the potential size of the AFS user community, a single +server crash could potentially deny service to a very large number of people. +The AFS design reflects a desire to minimize the visibility and impact of these +inevitable server crashes. + + \subsection sec3-2-7 Section 3.2.7: Administrability + +\par +Driven once again by the projected scale of AFS operation, one of the system's +goals is to offer easy administrability. With the large projected user +population, the amount of file data expected to be resident in the shared file +store, and the number of machines in the environment, a WADFS could easily +become impossible to administer unless its design allowed for easy monitoring +and manipulation of system resources. It is also imperative to be able to apply +security and access control mechanisms to the administrative interface. + + \subsection sec3-2-8 Section 3.2.8: Interoperability/Coexistence + +\par +Many organizations currently employ other distributed file systems, most +notably Sun Microsystem's NFS, which is also an extension of the basic +single-machine unix system. It is unlikely that AFS will receive significant +use if it cannot operate concurrently with other DFSs without mutual +interference. Thus, coexistence with other DFSs is an explicit AFS goal. +\par +A related goal is to provide a way for other DFSs to interoperate with AFS to +various degrees, allowing AFS file operations to be executed from these +competing systems. This is advantageous, since it may extend the set of +machines which are capable of interacting with the AFS community. Hardware +platforms and/or operating systems to which AFS is not ported may thus be able +to use their native DFS system to perform AFS file references. +\par +These two goals serve to extend AFS coverage, and to provide a migration path +by which potential clients may sample AFS capabilities, and gain experience +with AFS. This may result in data migration into native AFS systems, or the +impetus to acquire a native AFS implementation. + + \subsection sec3-2-9 Section 3.2.9: Heterogeneity/Portability + +\par +It is important for AFS to operate on a large number of hardware platforms and +operating systems, since a large community of unrelated organizations will most +likely utilize a wide variety of computing environments. The size of the +potential AFS user community will be unduly restricted if AFS executes on a +small number of platforms. Not only must AFS support a largely heterogeneous +computing base, it must also be designed to be easily portable to new hardware +and software releases in order to maintain this coverage over time. 
+
+ \page chap4 Chapter 4: AFS High-Level Design
+
+ \section sec4-1 Section 4.1: Introduction
+
+\par
+This chapter presents an overview of the system architecture for the AFS-3
+WADFS. Different treatments of the AFS system may be found in several
+documents, including [3], [4], [5], and [2]. Certain system features discussed
+here are examined in more detail in the set of accompanying AFS programmer
+specification documents.
+\par
+After the architectural overview, the system goals enumerated in Chapter 3 are
+revisited, and the contribution of the various AFS design decisions and
+resulting features is noted.
+
+ \section sec4-2 Section 4.2: The AFS System Architecture
+
+ \subsection sec4-2-1 Section 4.2.1: Basic Organization
+
+\par
+As stated in Section 3.2, a server-client organization was chosen for the AFS
+system. A group of trusted server machines provides the primary disk space for
+the central store managed by the organization controlling the servers. File
+system operation requests for specific files and directories arrive at server
+machines from machines running the AFS client software. If the client is
+authorized to perform the operation, then the server proceeds to execute it.
+\par
+In addition to this basic file access functionality, AFS server machines also
+provide related system services. These include authentication service, mapping
+between printable and numerical user identifiers, file location service, time
+service, and such administrative operations as disk management, system
+reconfiguration, and tape backup.
+
+ \subsection sec4-2-2 Section 4.2.2: Volumes
+
+ \subsubsection sec4-2-2-1 Section 4.2.2.1: Definition
+
+\par
+Disk partitions used for AFS storage do not directly host individual user files
+and directories. Rather, connected subtrees of the system's directory structure
+are placed into containers called volumes. Volumes vary in size dynamically as
+the objects they house are inserted, overwritten, and deleted. Each volume has
+an associated quota, or maximum permissible storage. A single unix disk
+partition may thus host one or more volumes, and in fact may host as many
+volumes as physically fit in the storage space. However, the practical maximum
+is currently 3,500 volumes per disk partition. This limitation is imposed by
+the salvager program, which examines and repairs file system metadata
+structures.
+\par
+There are two ways to identify an AFS volume. The first option is a 32-bit
+numerical value called the volume ID. The second is a human-readable character
+string called the volume name.
+\par
+Internally, a volume is organized as an array of mutable objects, representing
+individual files and directories. The file system object associated with each
+index in this internal array is assigned a uniquifier and a data version
+number. A subset of these values are used to compose an AFS file identifier, or
+FID. FIDs are not normally visible to user applications, but rather are used
+internally by AFS. They consist of ordered triplets, whose components are the
+volume ID, the index within the volume, and the uniquifier for the index.
+\par
+To understand AFS FIDs, let us consider the case where index i in volume v
+refers to a file named example.txt. This file's uniquifier is currently set to
+one (1), and its data version number is currently set to zero (0). The AFS
+client software may then refer to this file with the following FID: (v, i, 1).
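+\par
+To make the composition of a FID concrete, the sketch below expresses the
+triplet as a simple C structure. The type and field names here are purely
+illustrative and are not taken from the actual AFS sources; note that the data
+version number is kept alongside the object rather than inside the FID itself.
+\code
+/* Illustrative sketch only; the real AFS declarations may differ. */
+struct example_fid {
+    unsigned int volume;       /* 32-bit volume ID (v) */
+    unsigned int vnode_index;  /* index of the object within volume v (i) */
+    unsigned int uniquifier;   /* distinguishes successive uses of the index */
+};
+
+/* The example.txt file above would thus be named by the triplet: */
+/* struct example_fid fid = { v, i, 1 };   (data version number: 0) */
+\endcode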
+The next time a client overwrites the object identified with the (v, i, 1) FID,
+the data version number for example.txt will be promoted to one (1). Thus, the
+data version number serves to distinguish between different versions of the
+same file. A higher data version number indicates a newer version of the file.
+\par
+Consider the result of deleting file (v, i, 1). This causes the body of
+example.txt to be discarded, and marks index i in volume v as unused. Should
+another program create a file, say a.out, within this volume, index i may be
+reused. If it is, the creation operation will bump the index's uniquifier to
+two (2), and the data version number is reset to zero (0). Any client caching a
+FID for the deleted example.txt file thus cannot affect the completely
+unrelated a.out file, since the uniquifiers differ.
+
+ \subsubsection sec4-2-2-2 Section 4.2.2.2: Attachment
+
+\par
+The connected subtrees contained within individual volumes are attached to
+their proper places in the file space defined by a site, forming a single,
+apparently seamless unix tree. These attachment points are called mount points.
+These mount points are persistent file system objects, implemented as symbolic
+links whose contents obey a stylized format. Thus, AFS mount points differ from
+NFS-style mounts. In the NFS environment, the user dynamically mounts entire
+remote disk partitions using any desired name. These mounts do not survive
+client restarts, and do not insure a uniform namespace between different
+machines.
+\par
+A single volume is chosen as the root of the AFS file space for a given
+organization. By convention, this volume is named root.afs. Each client machine
+belonging to this organization performs a unix mount() of this root volume (not
+to be confused with an AFS mount point) on its empty /afs directory, thus
+attaching the entire AFS name space at this point.
+
+ \subsubsection sec4-2-2-3 Section 4.2.2.3: Administrative Uses
+
+\par
+Volumes serve as the administrative unit for AFS file system data, providing
+the basis for replication, relocation, and backup operations.
+
+ \subsubsection sec4-2-2-4 Section 4.2.2.4: Replication
+
+\par
+Read-only snapshots of AFS volumes may be created by administrative personnel.
+These clones may be deployed on up to eight disk partitions, on the same server
+machine or across different servers. Each clone has the identical volume ID,
+which must differ from that of its read-write parent. Thus, at most one clone
+of any given volume v may reside on a given disk partition. File references to
+this read-only clone volume may be serviced by any of the servers which host a
+copy.
+
+ \subsubsection sec4-2-2-5 Section 4.2.2.5: Backup
+
+\par
+Volumes serve as the unit of tape backup and restore operations. Backups are
+accomplished by first creating an on-line backup volume for each volume to be
+archived. This backup volume is organized as a copy-on-write shadow of the
+original volume, capturing the volume's state at the instant that the backup
+took place. Thus, the backup volume may be envisioned as being composed of a
+set of object pointers back to the original image. The first update operation
+on the file located in index i of the original volume triggers the
+copy-on-write association. This causes the file's contents at the time of the
+snapshot to be physically written to the backup volume before the newer version
+of the file is stored in the parent volume.
+\par
+Thus, AFS on-line backup volumes typically consume little disk space.
On +average, they are composed mostly of links and to a lesser extent the bodies of +those few files which have been modified since the last backup took place. +Also, the system does not have to be shut down to insure the integrity of the +backup images. Dumps are generated from the unchanging backup volumes, and are +transferred to tape at any convenient time before the next backup snapshot is +performed. + + \subsubsection sec4-2-2-6 Section 4.2.2.6: Relocation + +\par +Volumes may be moved transparently between disk partitions on a given file +server, or between different file server machines. The transparency of volume +motion comes from the fact that neither the user-visible names for the files +nor the internal AFS FIDs contain server-specific location information. +\par +Interruption to file service while a volume move is being executed is typically +on the order of a few seconds, regardless of the amount of data contained +within the volume. This derives from the staged algorithm used to move a volume +to a new server. First, a dump is taken of the volume's contents, and this +image is installed at the new site. The second stage involves actually locking +the original volume, taking an incremental dump to capture file updates since +the first stage. The third stage installs the changes at the new site, and the +fourth stage deletes the original volume. Further references to this volume +will resolve to its new location. + + \subsection sec4-2-3 Section 4.2.3: Authentication + +\par +AFS uses the Kerberos [22] [23] authentication system developed at MIT's +Project Athena to provide reliable identification of the principals attempting +to operate on the files in its central store. Kerberos provides for mutual +authentication, not only assuring AFS servers that they are interacting with +the stated user, but also assuring AFS clients that they are dealing with the +proper server entities and not imposters. Authentication information is +mediated through the use of tickets. Clients register passwords with the +authentication system, and use those passwords during authentication sessions +to secure these tickets. A ticket is an object which contains an encrypted +version of the user's name and other information. The file server machines may +request a caller to present their ticket in the course of a file system +operation. If the file server can successfully decrypt the ticket, then it +knows that it was created and delivered by the authentication system, and may +trust that the caller is the party identified within the ticket. +\par +Such subjects as mutual authentication, encryption and decryption, and the use +of session keys are complex ones. Readers are directed to the above references +for a complete treatment of Kerberos-based authentication. + + \subsection sec4-2-4 Section 4.2.4: Authorization + + \subsubsection sec4-2-4-1 Section 4.2.4.1: Access Control Lists + +\par +AFS implements per-directory Access Control Lists (ACLs) to improve the ability +to specify which sets of users have access to the ?les within the directory, +and which operations they may perform. ACLs are used in addition to the +standard unix mode bits. ACLs are organized as lists of one or more (principal, +rights) pairs. A principal may be either the name of an individual user or a +group of individual users. There are seven expressible rights, as listed below. +\li Read (r): The ability to read the contents of the files in a directory. +\li Lookup (l): The ability to look up names in a directory. 
+\li Write (w): The ability to create new files and overwrite the contents of +existing files in a directory. +\li Insert (i): The ability to insert new files in a directory, but not to +overwrite existing files. +\li Delete (d): The ability to delete files in a directory. +\li Lock (k): The ability to acquire and release advisory locks on a given +directory. +\li Administer (a): The ability to change a directory's ACL. + + \subsubsection sec4-2-4-2 Section 4.2.4.2: AFS Groups + +\par +AFS users may create a certain number of groups, differing from the standard +unix notion of group. These AFS groups are objects that may be placed on ACLs, +and simply contain a list of AFS user names that are to be treated identically +for authorization purposes. For example, user erz may create a group called +erz:friends consisting of the kazar, vasilis, and mason users. Should erz wish +to grant read, lookup, and insert rights to this group in directory d, he +should create an entry reading (erz:friends, rli) in d's ACL. +\par +AFS offers three special, built-in groups, as described below. +\par +1. system:anyuser: Any individual who accesses AFS files is considered by the +system to be a member of this group, whether or not they hold an authentication +ticket. This group is unusual in that it doesn't have a stable membership. In +fact, it doesn't have an explicit list of members. Instead, the system:anyuser +"membership" grows and shrinks as file accesses occur, with users being +(conceptually) added and deleted automatically as they interact with the +system. +\par +The system:anyuser group is typically put on the ACL of those directories for +which some specific level of completely public access is desired, covering any +user at any AFS site. +\par +2. system:authuser: Any individual in possession of a valid Kerberos ticket +minted by the organization's authentication service is treated as a member of +this group. Just as with system:anyuser, this special group does not have a +stable membership. If a user acquires a ticket from the authentication service, +they are automatically "added" to the group. If the ticket expires or is +discarded by the user, then the given individual will automatically be +"removed" from the group. +\par +The system:authuser group is usually put on the ACL of those directories for +which some specific level of intra-site access is desired. Anyone holding a +valid ticket within the organization will be allowed to perform the set of +accesses specified by the ACL entry, regardless of their precise individual ID. +\par +3. system:administrators: This built-in group de?nes the set of users capable +of performing certain important administrative operations within the cell. +Members of this group have explicit 'a' (ACL administration) rights on every +directory's ACL in the organization. Members of this group are the only ones +which may legally issue administrative commands to the file server machines +within the organization. This group is not like the other two described above +in that it does have a stable membership, where individuals are added and +deleted from the group explicitly. +\par +The system:administrators group is typically put on the ACL of those +directories which contain sensitive administrative information, or on those +places where only administrators are allowed to make changes. All members of +this group have implicit rights to change the ACL on any AFS directory within +their organization. 
Thus, they don't have to actually appear on an ACL, or have +'a' rights enabled in their ACL entry if they do appear, to be able to modify +the ACL. + + \subsection sec4-2-5 Section 4.2.5: Cells + +\par +A cell is the set of server and client machines managed and operated by an +administratively independent organization, as fully described in the original +proposal [17] and specification [18] documents. The cell's administrators make +decisions concerning such issues as server deployment and configuration, user +backup schedules, and replication strategies on their own hardware and disk +storage completely independently from those implemented by other cell +administrators regarding their own domains. Every client machine belongs to +exactly one cell, and uses that information to determine where to obtain +default system resources and services. +\par +The cell concept allows autonomous sites to retain full administrative control +over their facilities while allowing them to collaborate in the establishment +of a single, common name space composed of the union of their individual name +spaces. By convention, any file name beginning with /afs is part of this shared +global name space and can be used at any AFS-capable machine. The original +mount point concept was modified to contain cell information, allowing volumes +housed in foreign cells to be mounted in the file space. Again by convention, +the top-level /afs directory contains a mount point to the root.cell volume for +each cell in the AFS community, attaching their individual file spaces. Thus, +the top of the data tree managed by cell xyz is represented by the /afs/xyz +directory. +\par +Creating a new AFS cell is straightforward, with the operation taking three +basic steps: +\par +1. Name selection: A prospective site has to first select a unique name for +itself. Cell name selection is inspired by the hierarchical Domain naming +system. Domain-style names are designed to be assignable in a completely +decentralized fashion. Example cell names are transarc.com, ssc.gov, and +umich.edu. These names correspond to the AFS installations at Transarc +Corporation in Pittsburgh, PA, the Superconducting Supercollider Lab in Dallas, +TX, and the University of Michigan at Ann Arbor, MI. respectively. +\par +2. Server installation: Once a cell name has been chosen, the site must bring +up one or more AFS file server machines, creating a local file space and a +suite of local services, including authentication (Section 4.2.6.4) and volume +location (Section 4.2.6.2). +\par +3. Advertise services: In order for other cells to discover the presence of the +new site, it must advertise its name and which of its machines provide basic +AFS services such as authentication and volume location. An established site +may then record the machines providing AFS system services for the new cell, +and then set up its mount point under /afs. By convention, each cell places the +top of its file tree in a volume named root.cell. + + \subsection sec4-2-6 Section 4.2.6: Implementation of Server +Functionality + +\par +AFS server functionality is implemented by a set of user-level processes which +execute on server machines. This section examines the role of each of these +processes. + + \subsubsection sec4-2-6-1 Section 4.2.6.1: File Server + +\par +This AFS entity is responsible for providing a central disk repository for a +particular set of files within volumes, and for making these files accessible +to properly-authorized users running on client machines. 
+ + \subsubsection sec4-2-6-2 Section 4.2.6.2: Volume Location Server + +\par +The Volume Location Server maintains and exports the Volume Location Database +(VLDB). This database tracks the server or set of servers on which volume +instances reside. Among the operations it supports are queries returning volume +location and status information, volume ID management, and creation, deletion, +and modification of VLDB entries. +\par +The VLDB may be replicated to two or more server machines for availability and +load-sharing reasons. A Volume Location Server process executes on each server +machine on which a copy of the VLDB resides, managing that copy. + + \subsubsection sec4-2-6-3 Section 4.2.6.3: Volume Server + +\par +The Volume Server allows administrative tasks and probes to be performed on the +set of AFS volumes residing on the machine on which it is running. These +operations include volume creation and deletion, renaming volumes, dumping and +restoring volumes, altering the list of replication sites for a read-only +volume, creating and propagating a new read-only volume image, creation and +update of backup volumes, listing all volumes on a partition, and examining +volume status. + + \subsubsection sec4-2-6-4 Section 4.2.6.4: Authentication Server + +\par +The AFS Authentication Server maintains and exports the Authentication Database +(ADB). This database tracks the encrypted passwords of the cell's users. The +Authentication Server interface allows operations that manipulate ADB entries. +It also implements the Kerberos mutual authentication protocol, supplying the +appropriate identification tickets to successful callers. +\par +The ADB may be replicated to two or more server machines for availability and +load-sharing reasons. An Authentication Server process executes on each server +machine on which a copy of the ADB resides, managing that copy. + + \subsubsection sec4-2-6-5 Section 4.2.6.5: Protection Server + +\par +The Protection Server maintains and exports the Protection Database (PDB), +which maps between printable user and group names and their internal numerical +AFS identifiers. The Protection Server also allows callers to create, destroy, +query ownership and membership, and generally manipulate AFS user and group +records. +\par +The PDB may be replicated to two or more server machines for availability and +load-sharing reasons. A Protection Server process executes on each server +machine on which a copy of the PDB resides, managing that copy. + + \subsubsection sec4-2-6-6 Section 4.2.6.6: BOS Server + +\par +The BOS Server is an administrative tool which runs on each file server machine +in a cell. This server is responsible for monitoring the health of the AFS +agent processess on that machine. The BOS Server brings up the chosen set of +AFS agents in the proper order after a system reboot, answers requests as to +their status, and restarts them when they fail. It also accepts commands to +start, suspend, or resume these processes, and install new server binaries. + + \subsubsection sec4-2-6-7 Section 4.2.6.7: Update Server/Client + +\par +The Update Server and Update Client programs are used to distribute important +system files and server binaries. For example, consider the case of +distributing a new File Server binary to the set of Sparcstation server +machines in a cell. One of the Sparcstation servers is declared to be the +distribution point for its machine class, and is configured to run an Update +Server. 
The new binary is installed in the appropriate local directory on that
+Sparcstation distribution point. Each of the other Sparcstation servers runs an
+Update Client instance, which periodically polls the proper Update Server. The
+new File Server binary will be detected and copied over to the client. Thus,
+new server binaries need only be installed manually once per machine type, and
+the distribution to like server machines will occur automatically.
+
+ \subsection sec4-2-7 Section 4.2.7: Implementation of Client
+Functionality
+
+ \subsubsection sec4-2-7-1 Section 4.2.7.1: Introduction
+
+\par
+The portion of the AFS WADFS which runs on each client machine is called the
+Cache Manager. This code, running within the client's kernel, is a user's
+representative in communicating and interacting with the File Servers. The
+Cache Manager's primary responsibility is to create the illusion that the
+remote AFS file store resides on the client machine's local disk(s).
+\par
+As implied by its name, the Cache Manager supports this illusion by maintaining
+a cache of files referenced from the central AFS store on the machine's local
+disk. All file operations executed by client application programs on files
+within the AFS name space are handled by the Cache Manager and are realized on
+these cached images. Client-side AFS references are directed to the Cache
+Manager via the standard VFS and vnode file system interfaces pioneered and
+advanced by Sun Microsystems [21]. The Cache Manager stores and fetches files
+to and from the shared AFS repository as necessary to satisfy these operations.
+It is responsible for parsing unix pathnames on open() operations and mapping
+each component of the name to the File Server or group of File Servers that
+house the matching directory or file.
+\par
+The Cache Manager has additional responsibilities. It also serves as a reliable
+repository for the user's authentication information, holding on to their
+tickets and wielding them as necessary when challenged during File Server
+interactions. It caches volume location information gathered from probes to the
+VLDB, and keeps the client machine's local clock synchronized with a reliable
+time source.
+
+ \subsubsection sec4-2-7-2 Section 4.2.7.2: Chunked Access
+
+\par
+In previous AFS incarnations, whole-file caching was performed. Whenever an AFS
+file was referenced, the entire contents of the file were stored on the
+client's local disk. This approach had several disadvantages. One problem was
+that no file larger than the amount of disk space allocated to the client's
+local cache could be accessed.
+\par
+AFS-3 supports chunked file access, allowing individual 64 kilobyte pieces to
+be fetched and stored. Chunking allows AFS files of any size to be accessed
+from a client. The chunk size is settable at each client machine, but the
+default chunk size of 64K was chosen so that most unix files would fit within a
+single chunk.
+
+ \subsubsection sec4-2-7-3 Section 4.2.7.3: Cache Management
+
+\par
+The use of a file cache by the AFS client-side code, as described above, raises
+the thorny issue of cache consistency. Each client must efficiently determine
+whether its cached file chunks are identical to the corresponding sections of
+the file as stored at the server machine before allowing a user to operate on
+those chunks.
+\par
+AFS employs the notion of a callback as the backbone of its cache consistency
+algorithm.
When a server machine delivers one or more chunks of a file to a +client, it also includes a callback "promise" that the client will be notified +if any modifications are made to the data in the file at the server. Thus, as +long as the client machine is in possession of a callback for a file, it knows +it is correctly synchronized with the centrally-stored version, and allows its +users to operate on it as desired without any further interaction with the +server. Before a file server stores a more recent version of a file on its own +disks, it will first break all outstanding callbacks on this item. A callback +will eventually time out, even if there are no changes to the file or directory +it covers. + + \subsection sec4-2-8 Section 4.2.8: Communication Substrate: Rx + +\par +All AFS system agents employ remote procedure call (RPC) interfaces. Thus, +servers may be queried and operated upon regardless of their location. +\par +The Rx RPC package is used by all AFS agents to provide a high-performance, +multi-threaded, and secure communication mechanism. The Rx protocol is +adaptive, conforming itself to widely varying network communication media +encountered by a WADFS. It allows user applications to de?ne and insert their +own security modules, allowing them to execute the precise end-to-end +authentication algorithms required to suit their specific needs and goals. Rx +offers two built-in security modules. The first is the null module, which does +not perform any encryption or authentication checks. The second built-in +security module is rxkad, which utilizes Kerberos authentication. +\par +Although pervasive throughout the AFS distributed file system, all of its +agents, and many of its standard application programs, Rx is entirely separable +from AFS and does not depend on any of its features. In fact, Rx can be used to +build applications engaging in RPC-style communication under a variety of +unix-style file systems. There are in-kernel and user-space implementations of +the Rx facility, with both sharing the same interface. + + \subsection sec4-2-9 Section 4.2.9: Database Replication: ubik + +\par +The three AFS system databases (VLDB, ADB, and PDB) may be replicated to +multiple server machines to improve their availability and share access loads +among the replication sites. The ubik replication package is used to implement +this functionality. A full description of ubik and of the quorum completion +algorithm it implements may be found in [19] and [20]. +\par +The basic abstraction provided by ubik is that of a disk file replicated to +multiple server locations. One machine is considered to be the synchronization +site, handling all write operations on the database file. Read operations may +be directed to any of the active members of the quorum, namely a subset of the +replication sites large enough to insure integrity across such failures as +individual server crashes and network partitions. All of the quorum members +participate in regular elections to determine the current synchronization site. +The ubik algorithms allow server machines to enter and exit the quorum in an +orderly and consistent fashion. +\par +All operations to one of these replicated "abstract files" are performed as +part of a transaction. If all the related operations performed under a +transaction are successful, then the transaction is committed, and the changes +are made permanent. Otherwise, the transaction is aborted, and all of the +operations for that transaction are undone. 
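+\par
+The commit-or-abort discipline described above can be illustrated with a small,
+self-contained C sketch. This toy code is not the ubik interface; it merely
+shows the pattern of staging provisional writes against a shadow copy and
+exposing them only when the transaction commits.
+\code
+#include <string.h>
+
+/* Toy transaction over a tiny in-memory "database" (illustrative only). */
+struct toy_db { char data[64]; };
+struct toy_tx { struct toy_db *db; struct toy_db shadow; };
+
+static void tx_begin(struct toy_db *db, struct toy_tx *tx)
+{
+    tx->db = db;
+    tx->shadow = *db;        /* stage all updates against a private copy */
+}
+
+static void tx_write(struct toy_tx *tx, int off, const char *buf, int len)
+{
+    memcpy(tx->shadow.data + off, buf, len);    /* provisional change */
+}
+
+static void tx_commit(struct toy_tx *tx)
+{
+    *tx->db = tx->shadow;    /* all changes become visible at once */
+}
+
+static void tx_abort(struct toy_tx *tx)
+{
+    (void)tx;                /* the shadow copy is simply discarded */
+}
+
+/* Usage: stage a write, then either commit it or abort the transaction. */
+static int toy_update(struct toy_db *db, int off, const char *buf, int len)
+{
+    struct toy_tx tx;
+
+    tx_begin(db, &tx);
+    if (off < 0 || len < 0 || off + len > (int)sizeof(tx.shadow.data)) {
+        tx_abort(&tx);                  /* nothing visible has changed */
+        return -1;
+    }
+    tx_write(&tx, off, buf, len);       /* provisional until the commit */
+    tx_commit(&tx);
+    return 0;
+}
+\endcode
+In the actual system, it is the synchronization site that coordinates such
+updates on behalf of the full set of replication sites.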
+\par
+Like Rx, the ubik facility may be used by client applications directly. Thus,
+user applications may easily implement the notion of a replicated disk file in
+this fashion.
+
+ \subsection sec4-2-10 Section 4.2.10: System Management
+
+\par
+There are several AFS features aimed at facilitating system management. Some of
+these features have already been mentioned, such as volumes, the BOS Server,
+and the pervasive use of secure RPCs throughout the system to perform
+administrative operations from any AFS client machine in the worldwide
+community. This section covers additional AFS features and tools that assist in
+making the system easier to manage.
+
+ \subsubsection sec4-2-10-1 Section 4.2.10.1: Intelligent Access
+Programs
+
+\par
+A set of intelligent user-level applications were written so that the AFS
+system agents could be more easily queried and controlled. These programs
+accept user input, then translate the caller's instructions into the proper
+RPCs to the responsible AFS system agents, in the proper order.
+\par
+An example of this class of AFS application programs is vos, which mediates
+access to the Volume Server and the Volume Location Server agents. Consider the
+vos move operation, which results in a given volume being moved from one site
+to another. The Volume Server does not support a complex operation like a
+volume move directly. In fact, this move operation involves the Volume Servers
+at the current and new machines, as well as the Volume Location Server, which
+tracks volume locations. Volume moves are accomplished by a combination of full
+and incremental volume dump and restore operations, and a VLDB update. The vos
+move command issues the necessary RPCs in the proper order, and attempts to
+recover from errors at each of the steps. (A sketch of this staged
+orchestration appears at the end of Section 4.2.10.)
+\par
+The end result is that the AFS interface presented to system administrators is
+much simpler and more powerful than that offered by the raw RPC interfaces
+themselves. The learning curve for administrative personnel is thus flattened.
+Also, automated execution of complex system operations is more likely to be
+successful, free from human error.
+
+ \subsubsection sec4-2-10-2 Section 4.2.10.2: Monitoring Interfaces
+
+\par
+The various AFS agent RPC interfaces provide calls which allow for the
+collection of system status and performance data. This data may be displayed by
+such programs as scout, which graphically depicts File Server performance
+numbers and disk utilizations. Such monitoring capabilities allow for quick
+detection of system problems. They also support detailed performance analyses,
+which may indicate the need to reconfigure system resources.
+
+ \subsubsection sec4-2-10-3 Section 4.2.10.3: Backup System
+
+\par
+A special backup system has been designed and implemented for AFS, as described
+in [6]. It is not sufficient to simply dump the contents of all File Server
+partitions onto tape, since volumes are mobile, and need to be tracked
+individually. The AFS backup system allows hierarchical dump schedules to be
+built based on volume names. It generates the appropriate RPCs to create the
+required backup volumes and to dump these snapshots to tape. A database is used
+to track the backup status of system volumes, along with the set of tapes on
+which backups reside.
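+\par
+As promised in Section 4.2.10.1, the skeleton below sketches the staged
+orchestration that a tool in the vos mold performs for a volume move (see also
+Section 4.2.2.6). The helper routines are hypothetical stubs standing in for
+the Volume Server and Volume Location Server RPCs; they are not the actual AFS
+interfaces, and volume and site identifiers are abbreviated as ints for
+brevity.
+\code
+/* Hypothetical stubs; a real tool issues the corresponding RPCs here. */
+static int full_dump_and_restore(int vol, int dest)        { return 0; }
+static int lock_volume(int vol)                            { return 0; }
+static int incremental_dump_and_restore(int vol, int dest) { return 0; }
+static int update_vldb_entry(int vol, int dest)            { return 0; }
+static int delete_original(int vol)                        { return 0; }
+
+/* Staged volume move: each step may fail, and a real implementation
+ * attempts to recover or back out at each point. */
+static int move_volume(int vol, int dest)
+{
+    if (full_dump_and_restore(vol, dest) != 0)        /* install full image  */
+        return -1;
+    if (lock_volume(vol) != 0)                        /* freeze the original */
+        return -1;
+    if (incremental_dump_and_restore(vol, dest) != 0) /* catch up on updates */
+        return -1;
+    if (update_vldb_entry(vol, dest) != 0)            /* point VLDB at dest  */
+        return -1;
+    return delete_original(vol);                      /* remove old instance */
+}
+\endcode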
+ + \subsection sec4-2-11 Section 4.2.11: Interoperability + +\par +Since the client portion of the AFS software is implemented as a standard +VFS/vnode file system object, AFS can be installed into client kernels and +utilized without interference with other VFS-style file systems, such as +vanilla unix and the NFS distributed file system. +\par +Certain machines either cannot or choose not to run the AFS client software +natively. If these machines run NFS, it is still possible to access AFS files +through a protocol translator. The NFS-AFS Translator may be run on any machine +at the given site that runs both NFS and the AFS Cache Manager. All of the NFS +machines that wish to access the AFS shared store proceed to NFS-mount the +translator's /afs directory. File references generated at the NFS-based +machines are received at the translator machine, which is acting in its +capacity as an NFS server. The file data is actually obtained when the +translator machine issues the corresponding AFS references in its role as an +AFS client. + + \section sec4-3 Section 4.3: Meeting AFS Goals + +\par +The AFS WADFS design, as described in this chapter, serves to meet the system +goals stated in Chapter 3. This section revisits each of these AFS goals, and +identifies the specific architectural constructs that bear on them. + + \subsection sec4-3-1 Section 4.3.1: Scale + +\par +To date, AFS has been deployed to over 140 sites world-wide, with approximately +60 of these cells visible on the public Internet. AFS sites are currently +operating in several European countries, in Japan, and in Australia. While many +sites are modest in size, certain cells contain more than 30,000 accounts. AFS +sites have realized client/server ratios in excess of the targeted 200:1. + + \subsection sec4-3-2 Section 4.3.2: Name Space + +\par +A single uniform name space has been constructed across all cells in the +greater AFS user community. Any pathname beginning with /afs may indeed be used +at any AFS client. A set of common conventions regarding the organization of +the top-level /afs directory and several directories below it have been +established. These conventions also assist in the location of certain per-cell +resources, such as AFS configuration files. +\par +Both access transparency and location transparency are supported by AFS, as +evidenced by the common access mechanisms and by the ability to transparently +relocate volumes. + + \subsection sec4-3-3 Section 4.3.3: Performance + +\par +AFS employs caching extensively at all levels to reduce the cost of "remote" +references. Measured data cache hit ratios are very high, often over 95%. This +indicates that the file images kept on local disk are very effective in +satisfying the set of remote file references generated by clients. The +introduction of file system callbacks has also been demonstrated to be very +effective in the efficient implementation of cache synchronization. Replicating +files and system databases across multiple server machines distributes load +among the given servers. The Rx RPC subsystem has operated successfully at +network speeds ranging from 19.2 kilobytes/second to experimental +gigabit/second FDDI networks. +\par +Even at the intra-site level, AFS has been shown to deliver good performance, +especially in high-load situations. One often-quoted study [1] compared the +performance of an older version of AFS with that of NFS on a large file system +task named the Andrew Benchmark. 
While NFS sometimes outperformed AFS at low
+load levels, its performance fell off rapidly at higher loads, whereas AFS
+performance was not significantly affected.
+
+ \subsection sec4-3-4 Section 4.3.4: Security
+
+\par
+The use of Kerberos as the AFS authentication system fits the security goal
+nicely. Access to AFS files from untrusted client machines is predicated on the
+caller's possession of the appropriate Kerberos ticket(s). Setting up per-site,
+Kerberos-based authentication services compartmentalizes any security breach to
+the cell which was compromised. Since the Cache Manager will store multiple
+tickets for its users, they may take on different identities depending on the
+set of file servers being accessed.
+
+ \subsection sec4-3-5 Section 4.3.5: Access Control
+
+\par
+AFS extends the standard unix authorization mechanism with per-directory Access
+Control Lists. These ACLs allow specific AFS principals and groups of these
+principals to be granted a wide variety of rights on the associated files.
+Users may create and manipulate AFS group entities without administrative
+assistance, and place these tailored groups on ACLs.
+
+ \subsection sec4-3-6 Section 4.3.6: Reliability
+
+\par
+A subset of file server crashes are masked by the use of read-only replication
+on volumes containing slowly-changing files. Availability of important,
+frequently-used programs such as editors and compilers may thus be greatly
+improved. Since the level of replication may be chosen per volume, and easily
+changed, each site may decide the proper replication levels for certain
+programs and/or data.
+Similarly, replicated system databases help to maintain service in the face of
+server crashes and network partitions.
+
+ \subsection sec4-3-7 Section 4.3.7: Administrability
+
+\par
+Such features as pervasive, secure RPC interfaces to all AFS system components,
+volumes, overseer processes for monitoring and management of file system
+agents, intelligent user-level access tools, interface routines providing
+performance and statistics information, and an automated backup service
+tailored to a volume-based environment all contribute to the administrability
+of the AFS system.
+
+ \subsection sec4-3-8 Section 4.3.8: Interoperability/Coexistence
+
+\par
+Due to its VFS-style implementation, the AFS client code may be easily
+installed in the machine's kernel, and may service file requests without
+interfering in the operation of any other installed file system. Machines
+either not capable of running AFS natively or choosing not to do so may still
+access AFS files via NFS with the help of a protocol translator agent.
+
+ \subsection sec4-3-9 Section 4.3.9: Heterogeneity/Portability
+
+\par
+As most modern kernels use a VFS-style interface to support their native file
+systems, AFS may usually be ported to a new hardware and/or software
+environment in a relatively straightforward fashion. Such ease of porting
+allows AFS to run on a wide variety of platforms.
+
+ \page chap5 Chapter 5: Future AFS Design Refinements
+
+ \section sec5-1 Section 5.1: Overview
+
+\par
+The current AFS WADFS design and implementation provides a high-performance,
+scalable, secure, and flexible computing environment. However, there is room
+for improvement on a variety of fronts. This chapter considers a set of topics,
+examining the shortcomings of the current AFS system and considering how
+additional functionality may be fruitfully constructed.
+\par
+Many of these areas are already being addressed in the next-generation AFS
+system which is being built as part of the Open Software Foundation's (OSF)
+Distributed Computing Environment [7] [8].
+
+ \section sec5-2 Section 5.2: unix Semantics
+
+\par
+Any distributed file system which extends the unix file system model to include
+remote file accesses presents its application programs with failure modes which
+do not exist in a single-machine unix implementation. This semantic difference
+is difficult to mask.
+\par
+The current AFS design varies from pure unix semantics in other ways. In a
+single-machine unix environment, modifications made to an open file are
+immediately visible to other processes with open file descriptors to the same
+file. AFS does not reproduce this behavior when programs on different machines
+access the same file. Changes made to one cached copy of the file are not made
+immediately visible to other cached copies. The changes are only made visible
+to other access sites when a modified version of a file is stored back to the
+server providing its primary disk storage. Thus, one client's changes may be
+entirely overwritten by another client's modifications. The situation is
+further complicated by the possibility that dirty file chunks may be flushed
+out to the File Server before the file is closed.
+\par
+The version of AFS created for the OSF offering extends the current, untyped
+callback notion to a set of multiple, independent synchronization guarantees.
+These synchronization tokens allow functionality not offered by AFS-3,
+including byte-range mandatory locking, exclusive file opens, and read and
+write privileges over portions of a file.
+
+ \section sec5-3 Section 5.3: Improved Name Space Management
+
+\par
+Discovery of new AFS cells and their integration into each existing cell's name
+space is a completely manual operation in the current system. As the rate of
+new cell creations increases, the load imposed on system administrators also
+increases. Also, representing each cell's file space entry as a mount point
+object in the /afs directory leads to a potential problem. As the number of
+entries in the /afs directory increases, search time through the directory also
+grows.
+\par
+One improvement to this situation is to implement the top-level /afs directory
+through a Domain-style database. The database would map cell names to the set
+of server machines providing authentication and volume location services for
+that cell. The Cache Manager would query the cell database in the course of
+pathname resolution, and cache its lookup results.
+\par
+In this database-style environment, adding a new cell entry under /afs is
+accomplished by creating the appropriate database entry. The new cell
+information is then immediately accessible to all AFS clients.
+
+ \section sec5-4 Section 5.4: Read/Write Replication
+
+\par
+The AFS-3 servers and databases are currently equipped to handle read-only
+replication exclusively. However, other distributed file systems have
+demonstrated the feasibility of providing full read/write replication of data
+in environments very similar to AFS [11]. Such systems can serve as models for
+the set of required changes.
+
+ \section sec5-5 Section 5.5: Disconnected Operation
+
+\par
+Several facilities are provided by AFS so that server failures and network
+partitions may be completely or partially masked. However, AFS does not provide
+for completely disconnected operation of file system clients.
+Disconnected operation is a mode in which a client continues to access critical
+data during accidental or intentional inability to access the shared file
+repository. After some period of autonomous operation on the set of cached
+files, the client reconnects with the repository and resynchronizes the
+contents of its cache with the shared store.
+\par
+Studies of related systems provide evidence that such disconnected operation is
+feasible [11] [12]. Such a capability may be explored for AFS.
+
+ \section sec5-6 Section 5.6: Multiprocessor Support
+
+\par
+The LWP lightweight thread package used by all AFS system processes assumes
+that individual threads may execute non-preemptively, and that all other
+threads are quiescent until control is explicitly relinquished from within the
+currently active thread. These assumptions conspire to prevent AFS from
+operating correctly on a multiprocessor platform.
+\par
+A solution to this restriction is to restructure the AFS code organization so
+that the proper locking is performed. Thus, critical sections which were
+previously only implicitly defined are explicitly specified.
+
+ \page biblio Bibliography
+
+\li [1] John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols,
+M. Satyanarayanan, Robert N. Sidebotham, Michael J. West, Scale and Performance
+in a Distributed File System, ACM Transactions on Computer Systems, Vol. 6, No.
+1, February 1988, pp. 51-81.
+\li [2] Michael L. Kazar, Synchronization and Caching Issues in the Andrew File
+System, USENIX Proceedings, Dallas, TX, Winter 1988.
+\li [3] Alfred Z. Spector, Michael L. Kazar, Uniting File Systems, Unix
+Review, March 1989.
+\li [4] Johna Till Johnson, Distributed File System Brings LAN Technology to
+WANs, Data Communications, November 1990, pp. 66-67.
+\li [5] Michael Padovano, PADCOM Associates, AFS widens your horizons in
+distributed computing, Systems Integration, March 1991.
+\li [6] Steve Lammert, The AFS 3.0 Backup System, LISA IV Conference
+Proceedings, Colorado Springs, Colorado, October 1990.
+\li [7] Michael L. Kazar, Bruce W. Leverett, Owen T. Anderson, Vasilis
+Apostolides, Beth A. Bottos, Sailesh Chutani, Craig F. Everhart, W. Anthony
+Mason, Shu-Tsui Tu, Edward R. Zayas, DEcorum File System Architectural
+Overview, USENIX Conference Proceedings, Anaheim, California, Summer 1990.
+\li [8] AFS Drives DCE Selection, Digital Desktop, Vol. 1, No. 6,
+September 1990.
+\li [9] Levine, P.H., The Apollo DOMAIN Distributed File System, in NATO ASI
+Series: Theory and Practice of Distributed Operating Systems, Y. Paker, J-P.
+Banatre, M. Bozyigit, editors, Springer-Verlag, 1987.
+\li [10] M.N. Nelson, B.B. Welch, J.K. Ousterhout, Caching in the Sprite
+Network File System, ACM Transactions on Computer Systems, Vol. 6, No. 1,
+February 1988.
+\li [11] James J. Kistler, M. Satyanarayanan, Disconnected Operation in the
+Coda File System, CMU School of Computer Science technical report,
+CMU-CS-91-166, 26 July 1991.
+\li [12] Puneet Kumar, M. Satyanarayanan, Log-Based Directory Resolution
+in the Coda File System, CMU School of Computer Science internal document, 2
+July 1991.
+\li [13] Sun Microsystems, Inc., NFS: Network File System Protocol
+Specification, RFC 1094, March 1989.
+\li [14] Sun Microsystems, Inc., Design and Implementation of the Sun Network
+File System, USENIX Summer Conference Proceedings, June 1985.
+\li [15] C.H. Sauer, D.W. Johnson, L.K. Loucks, A.A. Shaheen-Gouda, and T.A.
+Smith, RT PC Distributed Services Overview, Operating Systems Review, Vol. 21,
+No. 3, July 1987.
+\li [16] A.P. Rifkin, M.P. Forbes, R.L. Hamilton, M. Sabrio, S. Shah, and
+K. Yueh, RFS Architectural Overview, USENIX Conference Proceedings, Atlanta,
+Summer 1986.
+\li [17] Edward R. Zayas, Administrative Cells: Proposal for Cooperative Andrew
+File Systems, Information Technology Center internal document, Carnegie Mellon
+University, 25 June 1987.
+\li [18] Ed. Zayas, Craig Everhart, Design and Specification of the Cellular
+Andrew Environment, Information Technology Center, Carnegie Mellon University,
+CMU-ITC-070, 2 August 1988.
+\li [19] Kazar, Michael L., Information Technology Center, Carnegie Mellon
+University. Ubik - A Library For Managing Ubiquitous Data, ITCID, Pittsburgh,
+PA, Month, 1988.
+\li [20] Kazar, Michael L., Information Technology Center, Carnegie Mellon
+University. Quorum Completion, ITCID, Pittsburgh, PA, Month, 1988.
+\li [21] S.R. Kleiman. Vnodes: An Architecture for Multiple File
+System Types in Sun UNIX, Conference Proceedings, 1986 Summer USENIX Technical
+Conference, pp. 238-247, El Toro, CA, 1986.
+\li [22] S.P. Miller, B.C. Neuman, J.I. Schiller, J.H. Saltzer. Kerberos
+Authentication and Authorization System, Project Athena Technical Plan, Section
+E.2.1, M.I.T., December 1987.
+\li [23] Bill Bryant. Designing an Authentication System: a Dialogue in Four
+Scenes, Project Athena internal document, M.I.T., draft of 8 February 1988.
+
+
+*/
diff --git a/doc/protocol/bos-spec.h b/doc/protocol/bos-spec.h
new file mode 100644
index 0000000000..ae50dfafc8
--- /dev/null
+++ b/doc/protocol/bos-spec.h
@@ -0,0 +1,2473 @@
+/*!
+
+ \page title AFS-3 Programmer's Reference: BOS Server Interface
+
+\author Edward R. Zayas
+Transarc Corporation
+\version 1.0
+\date 28 August 1991 11:58 Copyright 1991 Transarc Corporation All Rights
+Reserved FS-00-D161
+
+ \page chap1 Chapter 1: Overview
+
+ \section sec1-1 Section 1.1: Introduction
+
+\par
+One of the important duties of an AFS system administrator is to ensure that
+processes on file server machines are properly installed and kept running. The
+BOS Server was written as a tool for assisting administrators in these tasks.
+An instance of the BOS Server runs on each AFS server machine, and has the
+following specific areas of responsibility:
+\li Definition of the set of processes that are to be run on the machine on
+which a given BOS Server executes. This definition may be changed dynamically
+by system administrators. Programs may be marked as continuously or
+periodically runnable.
+\li Automatic startup and restart of these specified processes upon server
+bootup and program failure. The BOS Server also responds to administrator
+requests for stopping and starting one or more of these processes. In addition,
+the BOS Server is capable of restarting itself on command.
+\li Collection of information regarding the current status, command line
+parameters, execution history, and log files generated by the set of server
+programs.
+\li Management of the security information resident on the machine on which the
+BOS Server executes. Such information includes the list of administratively
+privileged people associated with the machine and the set of AFS File Server
+encryption keys used in the course of file service.
+\li Management of the cell configuration information for the server machine in
+question.
+This includes the name of the cell in which the server resides, along
+with the list and locations of the servers within the cell providing AFS
+database services (e.g., volume location, authentication, protection).
+\li Installation of server binaries on the given machine. The BOS Server allows
+several "generations" of server software to be kept on its machine.
+Installation of new software for one or more server agents is handled by the
+BOS Server, as is "rolling back" to a previous version should it prove more
+stable than the currently-installed image.
+\li Execution of commands on the server machine. An administrator may execute
+arbitrary unix commands on a machine running the BOS Server.
+\par
+Unlike many other AFS server processes, the BOS Server does not maintain a
+cell-wide, replicated database. It does, however, maintain several databases
+used exclusively on every machine on which it runs.
+
+ \section sec1-2 Section 1.2: Scope
+
+\par
+This paper describes the design and structure of the AFS-3 BOS Server. The
+scope of this work is to provide readers with a sufficiently detailed
+description of the BOS Server so that they may construct client applications
+that call the server's RPC interface routines.
+
+ \section sec1-3 Section 1.3: Document Layout
+
+\par
+The second chapter discusses various aspects of the BOS Server's architecture.
+First, one of the basic concepts is examined, namely the bnode. Providing the
+complete description of a program or set of programs to be run on the given
+server machine, a bnode is the generic definitional unit for the BOS Server's
+duties. After bnodes have been explained, the set of standard directories on
+which the BOS Server depends is considered. Also, the set of well-known files
+within these directories is explored. Their uses and internal formats are
+presented. After these sections, a discussion of BOS Server restart times
+follows. The BOS Server has special support for two commonly-used restart
+occasions, as described by this section. Finally, the organization and behavior
+of the bosserver program itself is presented.
+\par
+The third and final chapter provides a detailed examination of the
+programmer-visible BOS Server constants and structures, along with a full
+specification of the API for the RPC-based BOS Server functionality.
+
+ \section sec1-4 Section 1.4: Related Documents
+
+\par
+This document is a member of a documentation suite providing programmer-level
+specifications for the operation of the various AFS servers and agents, and the
+interfaces they export, as well as the underlying RPC system they use to
+communicate. The full suite of related AFS specification documents is listed
+below:
+\li AFS-3 Programmer's Reference: Architectural Overview: This paper provides
+an architectural overview of the AFS distributed file system, describing the
+full set of servers and agents in a coherent way, illustrating their
+relationships to each other and examining their interactions.
+\li AFS-3 Programmer's Reference: File Server/Cache Manager Interface: This
+document describes the File Server and Cache Manager agents, which provide the
+backbone file management services for AFS. The collection of File Servers for a
+cell supplies centralized file storage for that site, and allows clients
+running the Cache Manager component to access those files in a
+high-performance, secure fashion.
+\li AFS-3 Programmer's Reference: Volume Server/Volume Location Server
+Interface: This document describes the services through which "containers" of
+related user data are located and managed.
+\li AFS-3 Programmer's Reference: Protection Server Interface: This paper
+describes the server responsible for mapping printable user names to and from
+their internal AFS identifiers. The Protection Server also allows users to
+create, destroy, and manipulate "groups" of users, which are suitable for
+placement on ACLs.
+\li AFS-3 Programmer's Reference: Specification for the Rx Remote Procedure
+Call Facility: This document specifies the design and operation of the remote
+procedure call and lightweight process packages used by AFS.
+\par
+In addition to these papers, the AFS 3.1 product is delivered with its own
+user, administrator, installation, and command reference documents.
+
+ \page chap2 Chapter 2: BOS Server Architecture
+
+\par
+This chapter considers some of the architectural features of the AFS-3 BOS
+Server. First, the basic organizational and functional entity employed by the
+BOS Server, the bnode, is discussed. Next, the set of files with which the
+server interacts is examined. The notion of restart times is then explained in
+detail. Finally, the organization and components of the bosserver program
+itself, which implements the BOS Server, are presented.
+
+ \section sec2-1 Section 2.1: Bnodes
+
+ \subsection sec2-1-1 Section 2.1.1: Overview
+
+\par
+The information required to manage each AFS-related program running on a File
+Server machine is encapsulated in a bnode object. These bnodes serve as the
+basic building blocks for BOS Server services. Bnodes have two forms of
+existence:
+\li On-disk: The BosConfig file (see Section 2.3.4 below) defines the set of
+bnodes for which the BOS Server running on that machine will be responsible,
+along with specifying other information such as values for the two restart
+times. This file provides permanent storage (i.e., between bootups) for the
+desired list of programs for that server platform.
+\li In-memory: The contents of the BosConfig file are parsed and internalized
+by the BOS Server when it starts execution. The basic data for a particular
+server program is placed into a struct bnode structure.
+\par
+The initial contents of the BosConfig file are typically set up during system
+installation. The BOS Server can be directed, via its RPC interface, to alter
+existing bnode entries in the BosConfig file, add new ones, and delete old
+ones. Typically, this file is never edited directly.
+
+ \subsection sec2-1-2 Section 2.1.2: Bnode Classes
+
+\par
+The descriptions of the members of the AFS server suite fall into three broad
+classes of programs:
+\li Simple programs: This server class is populated by programs that simply
+need to be kept running, and do not depend on other programs for correctness or
+effectiveness. Examples of AFS servers falling into this category are the
+Volume Location Server, Authentication Server, and Protection Server. Since its
+members exhibit such straightforward behavior, this class of programs is
+referred to as the simple class.
+\li Interrelated programs: The File Server program depends on two other
+programs, and requires that they be executed at the appropriate times and in
+the appropriate sequence, for correct operation. The first of these programs is
+the Volume Server, which must be run concurrently with the File Server.
+The second is the salvager, which repairs the AFS volume metadata on the server
+partitions should the metadata become damaged. The salvager must not be run at
+the same time as the File Server. In honor of the File Server trio that
+inspired it, the class of programs consisting of groups of interrelated
+processes is named the fs class.
+\li Periodic programs: Some AFS servers, such as the BackupServer, only need to
+run every so often, but on a regular and well-defined basis. The name for this
+class is taken from cron, the unix tool that is typically used to define such
+regular executions; hence, it is referred to as the cron class.
+\par
+The recognition and definition of these three server classes is exploited by
+the BOS Server. Since all of the programs in a given class share certain common
+characteristics, they may all utilize the same basic data structures to record
+and manage their special requirements. Thus, it is not necessary to reimplement
+all the operations required to satisfy the capabilities promised by the BOS
+Server RPC interface for each and every program the BOS Server manages.
+Implementing one set of operations for each server class is sufficient to
+handle any and all server binaries to be run on the platform.
+
+ \subsection sec2-1-3 Section 2.1.3: Per-Class Bnode Operations
+
+\par
+As mentioned above, only one set of basic routines must be implemented for each
+AFS server class. Much like Sun's VFS/vnode interface [8], providing a common
+set of routines for interacting with a given file system, regardless of its
+underlying implementation and semantics, the BOS Server defines a common vector
+of operations for a class of programs to be run under the BOS Server's
+tutelage. In fact, it was this standardized file system interface that inspired
+the "bnode" name.
+\par
+The BOS Server manipulates the process or processes that are described by each
+bnode by invoking the proper functions in the appropriate order from the
+operation vector for that server class. Thus, the BOS Server itself operates in
+a class-independent fashion. This allows each class to take care of the special
+circumstances associated with it, yet to have the BOS Server itself be totally
+unaware of what special actions (if any) are needed for the class. This
+abstraction also allows more server classes to be implemented without any
+significant change to the BOS Server code itself.
+\par
+There are ten entries in this standardized class function array. The purpose
+and usage of each individual class function is described in detail in Section
+3.3.5. Much like the VFS system, a collection of macros is also provided in
+order to simplify the invocation of these functions. These macros are described
+in Section 3.5. The ten function slots are named here for convenience:
+\li create()
+\li timeout()
+\li getstat()
+\li setstat()
+\li delete()
+\li procexit()
+\li getstring()
+\li getparm()
+\li restartp()
+\li hascore()
+
+ \section sec2-2 Section 2.2: BOS Server Directories
+
+\par
+The BOS Server expects the existence of the following directories on the local
+disk of the platform on which it runs. These directories define where the
+system binaries, log files, ubik databases, and other files lie.
+\li /usr/afs/bin: This directory houses the full set of AFS server binaries.
+Such executables as bosserver, fileserver, vlserver, volserver, kaserver, and
+ptserver reside here.
+\li /usr/afs/db: This directory serves as the well-known location on the
+server's local disk for the ubik database replicas for this machine.
+Specifically, the Authentication Server, Protection Server, and the Volume +Location Server maintain their local database images here. +\li /usr/afs/etc: This directory hosts the files containing the security, cell, +and authorized system administrator list for the given machine. Specifically, +the CellServDB, KeyFile, License, ThisCell, and UserList files are stored here. +\li /usr/afs/local: This directory houses the BosConfig file, which supplies +the BOS Server with the permanent set of bnodes to support. Also contained in +this directory are files used exclusively by the salvager. +\li /usr/afs/logs: All of the AFS server programs that maintain log files +deposit them in this directory. + + \section sec2-3 Section 2.3: BOS Server Files + +\par +Several files, some mentioned above, are maintained on the server's local disk +and manipulated by the BOS Server. This section examines many of these files, +and describes their formats. + + \subsection sec2-3-1 Section 2.3.1: /usr/afs/etc/UserList + +\par +This file contains the names of individuals who are allowed to issue +"restricted" BOS Server commands (e.g., creating & deleting bnodes, setting +cell information, etc.) on the given hardware platform. The format is +straightforward, with one administrator name per line. If a cell grants joe and +schmoe these rights on a machine, that particular UserList will have the +following two lines: +\n joe +\n schmoe + + \subsection sec2-3-2 Section 2.3.2: /usr/afs/etc/CellServDB + +\par +This file identifies the name of the cell to which the given server machine +belongs, along with the set of machines on which its ubik database servers are +running. Unlike the CellServDB found in /usr/vice/etc on AFS client machines, +this file contains only the entry for the home cell. It shares the formatting +rules with the /usr/vice/etc version of CellServDB. The contents of the +CellServDB file used by the BOS Server on host grand.central.org are: +\code +>grand.central.org #DARPA clearinghouse cell +192.54.226.100 #grand.central.org +192.54.226.101 #penn.central.org +\endcode + + \subsection sec2-3-3 Section 2.3.3: /usr/afs/etc/ThisCell + +\par +The BOS Server obtains its notion of cell membership from the ThisCell file in +the specified directory. As with the version of ThisCell found in /usr/vice/etc +on AFS client machines, this file simply contains the character string +identifying the home cell name. For any server machine in the grand.central.org +cell, this file contains the following: +\code +grand.central.org +\endcode + + \subsection sec2-3-4 Section 2.3.4: /usr/afs/local/BosConfig + +\par +The BosConfig file is the on-disk representation of the collection of bnodes +this particular BOS Server manages. The BOS Server reads and writes to this +file in the normal course of its affairs. The BOS Server itself, in fact, +should be the only agent that modifies this file. Any changes to BosConfig +should be carried out by issuing the proper RPCs to the BOS Server running on +the desired machine. +\par +The following is the text of the BosConfig file on grand.central.org. A +discussion of the contents follows immediately afterwards. 
+\code
+restarttime 11 0 4 0 0
+checkbintime 3 0 5 0 0
+bnode simple kaserver 1
+parm /usr/afs/bin/kaserver
+end
+bnode simple ptserver 1
+parm /usr/afs/bin/ptserver
+end
+bnode simple vlserver 1
+parm /usr/afs/bin/vlserver
+end
+bnode fs fs 1
+parm /usr/afs/bin/fileserver
+parm /usr/afs/bin/volserver
+parm /usr/afs/bin/salvager
+end
+bnode simple runntp 1
+parm /usr/afs/bin/runntp -localclock transarc.com
+end
+bnode simple upserver 1
+parm /usr/afs/bin/upserver
+end
+bnode simple budb_server 1
+parm /usr/afs/bin/budb_server
+end
+bnode cron backup 1
+parm /usr/afs/backup/clones/lib/backup.csh daily
+parm 05:00
+end
+\endcode
+
+\par
+The first two lines of this file set the system and new-binary restart times
+(see Section 2.4, below). They are optional, with the default system restart
+time being every Sunday at 4:00am and the new-binary restart time being 5:00am
+daily. Following the reserved words restarttime and checkbintime are five
+integers, providing the mask, day, hour, minute, and second values (in decimal)
+for the restart time, as diagrammed below:
+\code
+restarttime <mask> <day> <hour> <minute> <second>
+checkbintime <mask> <day> <hour> <minute> <second>
+\endcode
+
+\par
+The range of acceptable values for these fields is presented in Section 3.3.1.
+In this example, the restart line specifies a (decimal) mask value of 11,
+selecting the KTIME HOUR, KTIME MIN, and KTIME DAY bits. This indicates that
+the hour, minute, and day values are the ones to be used when matching times.
+Thus, this line requests that system restarts occur on day 0 (Sunday), hour 4
+(4:00am), and minute 0 within that hour.
+\par
+The sets of lines that follow define the individual bnodes for the particular
+machine. The first line of the bnode definition set must begin with the
+reserved word bnode, followed by the type name, the instance name, and the
+initial bnode goal:
+\code
+bnode <type> <instance> <goal>
+\endcode
+
+\par
+The type and instance fields are both character strings, and the goal field is
+an integer. Acceptable values for the type field are simple, fs, and cron.
+Acceptable values for the goal field are defined in Section 3.2.3, and are
+normally restricted to the integer values representing BSTAT NORMAL and BSTAT
+SHUTDOWN. Thus, in the bnode line defining the Authentication Server, it is
+declared to be of type simple, have instance name kaserver, and have 1 (BSTAT
+NORMAL) as a goal (i.e., it should be brought up and kept running).
+\par
+Following the bnode line in the BosConfig file may be one or more parm lines.
+These entries represent the command line parameters that will be used to invoke
+the proper related program or programs. The entire text of the line after the
+parm reserved word up to the terminating newline is stored as the command line
+string.
+\code
+parm <command line text>
+\endcode
+
+\par
+After the parm lines, if any, the reserved word end must appear alone on a
+line, marking the end of an individual bnode definition.
+
+ \subsection sec2-3-5 Section 2.3.5: /usr/afs/local/NoAuth
+
+\par
+The appearance of this file is used to mark whether the BOS Server is to insist
+on properly authenticated connections for its restricted operations or whether
+it will allow any caller to exercise these commands. Not only is the BOS Server
+affected by the presence of this file, but so are all of the bnode objects the
+BOS Server starts up. If /usr/afs/local/NoAuth is present, the BOS Server will
+start all of its bnodes with the -noauth flag.
+\par
+Completely unauthenticated AFS operation will result if this file is present
+when the BOS Server starts execution. The file itself is typically empty.
+If any data is put into the NoAuth file, it will be ignored by the system.
+
+ \subsection sec2-3-6 Section 2.3.6: /usr/afs/etc/KeyFile
+
+\par
+This file stores the set of AFS encryption keys used for file service
+operations. The file follows the format defined by struct afsconf key and
+struct afsconf keys in include file afs/keys.h. For the reader's convenience,
+these structures are detailed below:
+\code
+#define AFSCONF_MAXKEYS 8
+struct afsconf_key {
+    long kvno;
+    char key[8];
+};
+struct afsconf_keys {
+    long nkeys;
+    struct afsconf_key key[AFSCONF_MAXKEYS];
+};
+\endcode
+\par
+The first longword of the file reveals the number of keys that may be found
+there, with a maximum of AFSCONF MAXKEYS (8). The keys themselves follow, each
+preceded by its key version number.
+\par
+All information in this file is stored in network byte order. Each BOS Server
+converts the data to the appropriate host byte order before storing and
+manipulating it.
+
+ \section sec2-4 Section 2.4: Restart Times
+
+\par
+It is possible to manually start or restart any server defined within the set
+of BOS Server bnodes from any AFS client machine, simply by making the
+appropriate call to the RPC interface while authenticated as a valid
+administrator (i.e., a principal listed in the UserList file on the given
+machine). However, two restart situations merit the implementation of special
+functionality within the BOS Server. There are two common occasions, occurring
+on a regular basis, where the entire system or individual server programs
+should be brought down and restarted:
+\par
+\b Complete \b system \b restart: To guard against the reliability and
+performance problems caused by any core leaks in long-running programs, the
+entire AFS system should be shut down and restarted periodically. This action
+'clears out' the memory system, and may result in smaller memory images for
+these servers, as internal data structures are reinitialized back to their
+starting sizes. It also allows AFS partitions to be regularly examined, and
+salvaged if necessary.
+\par
+Another reason for performing a complete system restart is to commence
+execution of a new release of the BOS Server itself. The new-binary restarts
+described below do not restart the BOS Server if a new version of its software
+has been installed on the machine.
+\par
+\b New-binary \b restarts: New server software may be installed at any time
+with the assistance of the BOS Server. However, it is often not the case that
+such software installations occur as a result of the discovery of an error in
+the program or programs requiring immediate restart. On these occasions,
+restarting the given processes in prime time so that the new binaries may begin
+execution is counterproductive, causing system downtime and interfering with
+user productivity. The system administrator may wish to set an off-peak time
+when the server binaries are automatically compared to the running program
+images, and restarts take place should the on-disk binary be more recent than
+the currently running program. These restarts would thus minimize the resulting
+service disruption.
+\par
+Automatically performing these restart functions could be accomplished by
+creating cron-type bnodes that were defined to execute at the desired times.
+However, rather than force the system administrator to create and supervise
+such bnodes, the BOS Server supports the notion of an internal LWP thread with
+the same effect (see Section 2.5.2).
+As part of the BosConfig file defined
+above, the administrator may simply specify the values for the complete system
+restart and new-binary restart times, and a dedicated BOS Server thread will
+manage the restarts.
+\par
+Unless otherwise instructed, the BOS Server selects default times for the two
+above restart times. A complete system restart is carried out every Sunday at
+4:00am by default, and a new-binary restart is executed for each updated binary
+at 5:00am every day.
+
+ \section sec2-5 Section 2.5: The bosserver Process
+
+ \subsection sec2-5-1 Section 2.5.1: Introduction
+
+\par
+The user-space bosserver process is in charge of managing the AFS server
+processes and software images, the local security and cell database files, and
+allowing administrators to execute arbitrary programs on the server machine on
+which it runs. It also implements the RPC interface defined in the bosint.xg
+Rxgen definition file.
+
+ \subsection sec2-5-2 Section 2.5.2: Threading
+
+\par
+As with all the other AFS server agents, the BOS Server is a multithreaded
+program. It is configured so that a minimum of two lightweight threads are
+guaranteed to be allocated to handle incoming RPC calls to the BOS Server, and
+a maximum of four threads are commissioned for this task.
+\par
+In addition to these threads assigned to RPC duties, there is one other thread
+employed by the BOS Server, the BozoDaemon(). This thread is responsible for
+keeping track of the two major restart events, namely the system restart and
+the new binary restart (see Section 2.4). Every 60 seconds, this thread is
+awakened, at which time it checks to see if either deadline has occurred. If
+the complete system restart is then due, it invokes internal BOS Server
+routines to shut down the entire suite of AFS agents on that machine and then
+reexecute the BOS Server binary, which results in the restart of all of the
+server processes. If the new-binary time has arrived, the BOS Server shuts down
+the bnodes for which binaries newer than those running are available, and then
+invokes the new software.
+\par
+In general, the following procedure is used when stopping and restarting
+processes. First, the restartp() operation defined for each bnode's class is
+invoked via the BOP RESTARTP() macro. This allows each server to take any
+special steps required before cycling its service. After that function
+completes, the standard mechanisms are used to shut down each bnode's process,
+wait until it has truly stopped its execution, and then start it back up again.
+
+ \subsection sec2-5-3 Section 2.5.3: Initialization Algorithm
+
+\par
+This section describes the procedure followed by the BOS Server from the time
+when it is invoked to the time it has properly initialized the server machine
+upon which it is executing.
+\par
+The first check performed by the BOS Server is whether or not it is running as
+root. It needs to manipulate local unix files and directories to which only
+root has been given access, so it immediately exits with an error message if
+this is not the case. The BOS Server's unix working directory is then set to be
+/usr/afs/logs in order to more easily service incoming RPC requests to fetch
+the contents of the various server log files at this location. Also, changing
+the working directory in this fashion results in any core images dumped by the
+BOS Server's wards being left in /usr/afs/logs.
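+\par
+As an illustration only (this sketch is not taken from the bosserver source,
+and the function name is hypothetical), the privilege and working-directory
+checks just described might be coded as follows:
+\code
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+static void
+bos_startup_checks(void)
+{
+    /* The BOS Server manipulates files and directories accessible only
+     * to root, so refuse to continue without that privilege. */
+    if (geteuid() != 0) {
+        fprintf(stderr, "bosserver: must be run as root\n");
+        exit(1);
+    }
+    /* Work out of /usr/afs/logs so that log fetches are convenient and
+     * any core files dumped by managed processes land there as well. */
+    if (chdir("/usr/afs/logs") < 0) {
+        perror("bosserver: chdir /usr/afs/logs");
+        exit(1);
+    }
+}
+\endcode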
+\par
+The command line is then inspected, and the BOS Server determines whether it
+will insist on authenticated RPC connections for secure administrative
+operations by setting up the /usr/afs/local/NoAuth file appropriately (see
+Section 2.3.5). It initializes the underlying bnode package and installs the
+three known bnode types (simple, fs, and cron).
+\par
+After the bnode package is thus set up, the BOS Server ensures that the set of
+local directories on which it will depend are present; refer to Section 2.2 for
+more details on this matter. The license file in /usr/afs/etc/License is then
+read to determine the number of AFS server machines the site is allowed to
+operate, and whether the cell is allowed to run the NFS/AFS Translator
+software. This file is typically obtained in the initial system installation,
+taken from the installation tape. The BOS Server will exit unless this file
+exists and is properly formatted.
+\par
+In order to record its actions, any existing /usr/afs/logs/BosLog file is moved
+to BosLog.old, and a new version is opened in append mode. The BOS Server
+immediately writes a log entry concerning the state of the above set of
+important directories.
+\par
+At this point, the BOS Server reads the BosConfig file, which lists the set of
+bnodes for which it will be responsible. It starts up the processes associated
+with the given bnodes. Once accomplished, it invokes its internal system
+restart LWP thread (covered in Section 2.5.2 above).
+\par
+Rx initialization begins at this point, setting up the RPC infrastructure to
+receive its packets on the AFSCONF NANNYPORT, UDP port 7007. The local cell
+database is then read and internalized, followed by acquisition of the AFS
+encryption keys.
+\par
+After all of these steps have been carried out, the BOS Server has gleaned all
+of the necessary information from its environment and has also started up its
+wards. The final initialization action required is to start all of its listener
+LWP threads, which are devoted to executing incoming requests for the BOS
+Server RPC interface.
+
+ \subsection sec2-5-4 Section 2.5.4: Command Line Switches
+
+\par
+The BOS Server recognizes exactly one command line argument: -noauth. By
+default, the BOS Server attempts to use authenticated RPC connections (unless
+the /usr/afs/local/NoAuth file is present; see Section 2.3.5). The appearance
+of the -noauth command line flag signals that this server incarnation is to use
+unauthenticated connections for even those operations that are normally
+restricted to system administrators. This switch is essential during the
+initial AFS system installation, where the procedures followed to bootstrap AFS
+onto a new machine require the BOS Server to run before some system databases
+have been created.
+
+ \page chap3 Chapter 3: BOS Server Interface
+
+ \section sec3-1 Section 3.1: Introduction
+
+\par
+This chapter documents the API for the BOS Server facility, as defined by the
+bosint.xg Rxgen interface file and the bnode.h include file. Descriptions of
+all the constants, structures, macros, and interface functions available to the
+application programmer appear in this chapter.
+
+ \section sec3-2 Section 3.2: Constants
+
+\par
+This section covers the basic constant definitions of interest to the BOS
+Server application programmer. These definitions appear in the bosint.h file,
+automatically generated from the bosint.xg Rxgen interface file. Another file
+is exported to the programmer, namely bnode.h.
+
+\par
+Each subsection is devoted to describing constants falling into each of the
+following categories:
+\li Status bits
+\li Bnode activity bits
+\li Bnode states
+\li Pruning server binaries
+\li Flag bits for struct bnode proc
+\par
+One constant of general utility is BOZO BSSIZE, which defines the length in
+characters of BOS Server character string buffers, including the trailing null.
+It is defined to be 256 characters.
+
+ \subsection sec3-2-1 Section 3.2.1: Status Bits
+
+\par
+The following bit values are used in the flags field of struct bozo status, as
+defined in Section 3.3.4. They record whether or not the associated bnode
+process currently has a stored core file, whether the bnode execution was
+stopped because of an excessive number of errors, and whether the mode bits on
+server binary directories are incorrect.
+
+\par Name
+BOZO HASCORE
+\par Value
+1
+\par Description
+Does this bnode have a stored core file?
+
+\par Name
+BOZO ERRORSTOP
+\par Value
+2
+\par Description
+Was this bnode execution shut down because of an excessive number of errors
+(more than 10 in a 10 second period)?
+
+\par Name
+BOZO BADDIRACCESS
+\par Value
+3
+\par Description
+Are the mode bits on the /usr/afs directory and its descendants (etc, bin,
+logs, backup, db, local, etc/KeyFile, etc/UserList) correctly set?
+
+ \subsection sec3-2-2 Section 3.2.2: Bnode Activity Bits
+
+\par
+This section describes the legal values for the bit positions within the flags
+field of struct bnode, as defined in Section 3.3.8. They specify conditions
+related to the basic activity of the bnode and of the entities relying on it.
+
+\par Name
+BNODE NEEDTIMEOUT
+\par Value
+0x01
+\par Description
+This bnode is utilizing the timeout mechanism for invoking actions on its
+behalf.
+
+\par Name
+BNODE ACTIVE
+\par Value
+0x02
+\par Description
+The given bnode is in active service.
+
+\par Name
+BNODE WAIT
+\par Value
+0x04
+\par Description
+Someone is waiting for a status change in this bnode.
+
+\par Name
+BNODE DELETE
+\par Value
+0x08
+\par Description
+This bnode should be deleted at the earliest convenience.
+
+\par Name
+BNODE ERRORSTOP
+\par Value
+0x10
+\par Description
+This bnode was decommissioned because of an excessive number of errors in its
+associated unix processes.
+
+ \subsection sec3-2-3 Section 3.2.3: Bnode States
+
+\par
+The constants defined in this section are used as values within the goal and
+fileGoal fields within a struct bnode. They specify either the current state of
+the associated bnode, or the anticipated state. In particular, the fileGoal
+field, which is the value stored on disk for the bnode, always represents the
+desired state of the bnode, whether or not it properly reflects the current
+state. For this reason, only BSTAT SHUTDOWN and BSTAT NORMAL may be used within
+the fileGoal field. The goal field may take on any of these values, and
+accurately reflects the current status of the bnode.
+
+\par Name
+BSTAT SHUTDOWN
+\par Value
+0
+\par Description
+The bnode's execution has been (should be) terminated.
+
+\par Name
+BSTAT NORMAL
+\par Value
+1
+\par Description
+The bnode is (should be) executing normally.
+
+\par Name
+BSTAT SHUTTINGDOWN
+\par Value
+2
+\par Description
+The bnode is currently being shut down; execution has not yet ceased.
+
+\par Name
+BSTAT STARTINGUP
+\par Value
+3
+\par Description
+The bnode execution is currently being commenced; execution has not yet begun.
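+\par
+For illustration only, the state values above might be declared and checked as
+in the following sketch; the underscore spellings of the identifiers are an
+assumption based on the names in this section, and bosint.h and bnode.h remain
+the authoritative definitions:
+\code
+#define BSTAT_SHUTDOWN     0   /* execution terminated (or to be terminated) */
+#define BSTAT_NORMAL       1   /* executing (or to be executing) normally */
+#define BSTAT_SHUTTINGDOWN 2   /* shutdown in progress */
+#define BSTAT_STARTINGUP   3   /* startup in progress */
+
+/* Only the two stable states may be recorded in a bnode's fileGoal field. */
+static int
+legal_file_goal(int goal)
+{
+    return (goal == BSTAT_SHUTDOWN || goal == BSTAT_NORMAL);
+}
+\endcode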
+
+ \subsection sec3-2-4 Section 3.2.4: Pruning Server Binaries
+
+\par
+The BOZO Prune() interface function, fully defined in Section 3.6.6.4, allows a
+properly-authenticated caller to remove ("prune") old copies of server binaries
+and core files managed by the BOS Server. This section identifies the legal
+values for the flags argument to the above function, specifying exactly what is
+to be pruned.
+
+\par Name
+BOZO PRUNEOLD
+\par Value
+1
+\par Description
+Prune all server binaries with the *.OLD extension.
+
+\par Name
+BOZO PRUNEBAK
+\par Value
+2
+\par Description
+Prune all server binaries with the *.BAK extension.
+
+\par Name
+BOZO PRUNECORE
+\par Value
+3
+\par Description
+Prune core files.
+
+ \subsection sec3-2-5 Section 3.2.5: Flag Bits for struct bnode proc
+
+\par
+This section specifies the acceptable bit values for the flags field in the
+struct bnode proc structure, as defined in Section 3.3.9. Basically, they are
+used to record whether or not the unix binary associated with the bnode has
+ever been run, and if so whether it has ever exited.
+
+\par Name
+BPROC STARTED
+\par Value
+1
+\par Description
+Has the associated unix process ever been started up?
+
+\par Name
+BPROC EXITED
+\par Value
+2
+\par Description
+Has the associated unix process ever exited?
+
+ \section sec3-3 Section 3.3: Structures
+
+\par
+This section describes the major exported BOS Server data structures of
+interest to application programmers.
+
+ \subsection sec3-3-1 Section 3.3.1: struct bozo netKTime
+
+\par
+This structure is used to communicate time values to and from the BOS Server.
+In particular, the BOZO GetRestartTime() and BOZO SetRestartTime() interface
+functions, described in Sections 3.6.2.5 and 3.6.2.6 respectively, use
+parameters declared to be of this type.
+\par
+Four of the fields in this structure specify the hour, minute, second, and day
+of the event in question. The first field in the layout serves as a mask,
+identifying which of the above four fields are to be considered when matching
+the specified time to a given reference time (most often the current time). The
+bit values that may be used for the mask field are defined in the afs/ktime.h
+include file. For convenience, their values are reproduced here, including some
+special cases at the end of the table.
+
+\par Name
+KTIME HOUR
+\par Value
+0x01
+\par Description
+Hour should match.
+
+\par Name
+KTIME MIN
+\par Value
+0x02
+\par Description
+Minute should match.
+
+\par Name
+KTIME SEC
+\par Value
+0x04
+\par Description
+Second should match.
+
+\par Name
+KTIME DAY
+\par Value
+0x08
+\par Description
+Day should match.
+
+\par Name
+KTIME TIME
+\par Value
+0x07
+\par Description
+All times should match.
+
+\par Name
+KTIME NEVER
+\par Value
+0x10
+\par Description
+Special case: never matches.
+
+\par Name
+KTIME NOW
+\par Value
+0x20
+\par Description
+Special case: right now.
+
+\n \b Fields
+\li int mask - A field of bit values used to specify which of the following
+fields are to be used in computing matches.
+\li short hour - The hour, ranging in value from 0 to 23.
+\li short min - The minute, ranging in value from 0 to 59.
+\li short sec - The second, ranging in value from 0 to 59.
+\li short day - Zero specifies Sunday, other days follow in order.
+
+ \subsection sec3-3-2 Section 3.3.2: struct bozo key
+
+\par
+This structure defines the format of an AFS encryption key, as stored in the
+key file located at /usr/afs/etc/KeyFile at the host on which the BOS Server
+runs.
+It is used in the argument list of the BOZO ListKeys() and BOZO AddKeys()
+interface functions, as described in Sections 3.6.4.4 and 3.6.4.5 respectively.
+\n \b Fields
+\li char data[8] - The array of 8 characters representing an encryption key.
+
+ \subsection sec3-3-3 Section 3.3.3: struct bozo keyInfo
+
+\par
+This structure defines the information kept regarding a given AFS encryption
+key, as represented by a variable of type struct bozo key, as described in
+Section 3.3.2 above. A parameter of this type is used by the BOZO ListKeys()
+function (described in Section 3.6.4.4). It contains fields holding the
+associated key's modification time, a checksum on the key, and an unused
+longword field. Note that the mod sec time field listed below is a standard
+unix time value.
+\n \b Fields
+\li long mod sec - The time in seconds when the associated key was last
+modified.
+\li long mod usec - The number of microseconds elapsed since the second
+reported in the mod sec field. This field is never set by the BOS Server, and
+should always contain a zero.
+\li unsigned long keyCheckSum - The 32-bit cryptographic checksum of the
+associated key. A block of zeros is encrypted, and the first four bytes of the
+result are placed into this field.
+\li long spare2 - This longword field is currently unused, and is reserved for
+future use.
+
+ \subsection sec3-3-4 Section 3.3.4: struct bozo status
+
+\par
+This structure defines the layout of the information returned by the status
+parameter for the interface function BOZO GetInstanceInfo(), as defined in
+Section 3.6.2.3. The enclosed fields include such information as the temporary
+and long-term goals for the process instance, an array of bit values recording
+status information, start and exit times, and associated error codes and
+signals.
+\n \b Fields
+\li long goal - The short-term goal for a process instance. Settings for this
+field are BSTAT SHUTDOWN, BSTAT NORMAL, BSTAT SHUTTINGDOWN, and BSTAT
+STARTINGUP. These values are fully defined in Section 3.2.3.
+\li long fileGoal - The long-term goal for a process instance. Accepted
+settings are restricted to a subset of those used by the goal field above, as
+explained in Section 3.2.3.
+\li long procStartTime - The last time the given process instance was started.
+\li long procStarts - The number of process starts executed on behalf of
+the given bnode.
+\li long lastAnyExit - The last time the process instance exited for any
+reason.
+\li long lastErrorExit - The last time a process exited unexpectedly.
+\li long errorCode - The last exit's return code.
+\li long errorSignal - The last signal terminating the process.
+\li long flags - BOZO HASCORE, BOZO ERRORSTOP, and BOZO BADDIRACCESS. These
+constants are fully defined in Section 3.2.1.
+\li long spare[] - Eight longword spares, currently unassigned and reserved for
+future use.
+
+ \subsection sec3-3-5 Section 3.3.5: struct bnode ops
+
+\par
+This structure defines the base set of operations that each BOS Server bnode
+type (struct bnode type, see Section 3.3.6 below) must implement. They are
+called at the appropriate times within the BOS Server code via the BOP * macros
+(see Section 3.5 and the individual descriptions therein). They allow each
+bnode type to define its own behavior in response to its particular needs.
+\n \b Fields
+\li struct bnode *(*create)() - This function is called whenever a bnode of the
+given type is created.
Typically, this function will create bnode structures +peculiar to its own type and initialize the new records. Each type +implementation may take a different number of parameters. Note: there is no BOP +* macro defined for this particular function; it is always called directly. +\li int (*timeout)() - This function is called whenever a timeout action must +be taken for this bnode type. It takes a single argument, namely a pointer to a +type-specific bnode structure. The BOP TIMEOUT macro is defined to simplify the +construction of a call to this function. +\li int (*getstat)() - This function is called whenever a caller is attempting +to get status information concerning a bnode of the given type. It takes two +parameters, the first being a pointer to a type-specific bnode structure, and +the second being a pointer to a longword in which the desired status value will +be placed. The BOP GETSTAT macro is defined to simplify the construction of a +call to this function. +\li int (*setstat)() - This function is called whenever a caller is attempting +to set the status information concerning a bnode of the given type. It takes +two parameters, the first being a pointer to a type-specific bnode structure, +and the second being a longword from which the new status value is obtained. +The BOP SETSTAT macro is defined to simplify the construction of a call to this +function. +\li int (*delete)() - This function is called whenever a bnode of this type is +being deleted. It is expected that the proper deallocation and cleanup steps +will be performed here. It takes a single argument, a pointer to a +type-specific bnode structure. The BOP DELETE macro is defined to simplify the +construction of a call to this function. +\li int (*procexit)() - This function is called whenever the unix process +implementing the given bnode exits. It takes two parameters, the first being a +pointer to a type-specific bnode structure, and the second being a pointer to +the struct bnode proc (defined in Section 3.3.9), describing that process in +detail. The BOP PROCEXIT macro is defined to simplify the construction of a +call to this function. +\li int (*getstring)() - This function is called whenever the status string for +the given bnode must be fetched. It takes three parameters. The first is a +pointer to a type-specific bnode structure, the second is a pointer to a +character buffer, and the third is a longword specifying the size, in bytes, of +the above buffer. The BOP GETSTRING macro is defined to simplify the +construction of a call to this function. +\li int (*getparm)() - This function is called whenever a particular parameter +string for the given bnode must be fetched. It takes four parameters. The first +is a pointer to a type-specific bnode structure, the second is a longword +identifying the index of the desired parameter string, the third is a pointer +to a character buffer to receive the parameter string, and the fourth and final +argument is a longword specifying the size, in bytes, of the above buffer. The +BOP GETPARM macro is defined to simplify the construction of a call to this +function. +\li int (*restartp)() - This function is called whenever the unix process +implementing the bnode of this type is being restarted. It is expected that the +stored process command line will be parsed in preparation for the coming +execution. It takes a single argument, a pointer to a type-specific bnode +structure from which the command line can be located. 
The BOP RESTARTP macro is
+defined to simplify the construction of a call to this function.
+\li int (*hascore)() - This function is called whenever it must be determined
+if the attached process currently has a stored core file. It takes a single
+argument, a pointer to a type-specific bnode structure from which the name of
+the core file may be constructed. The BOP HASCORE macro is defined to simplify
+the construction of a call to this function.
+
+ \subsection sec3-3-6 Section 3.3.6: struct bnode type
+
+\par
+This structure encapsulates the defining characteristics for a given bnode
+type. Bnode types are placed on a singly-linked list within the BOS Server, and
+are identified by a null-terminated character string name. They also contain
+the function array defined in Section 3.3.5, which implements the behavior of
+that object type. There are three predefined bnode types known to the BOS
+Server. Their names are simple, fs, and cron. It is not currently possible to
+dynamically define and install new BOS Server types.
+\n \b Fields
+\li struct bnode type *next - Pointer to the next bnode type definition
+structure in the list.
+\li char *name - The null-terminated string name by which this bnode type is
+identified.
+\li struct bnode ops *ops - The function array that defines the behavior of
+this given bnode type.
+
+ \subsection sec3-3-7 Section 3.3.7: struct bnode token
+
+\par
+This structure is used internally by the BOS Server when parsing the command
+lines with which it will start up process instances. This structure is made
+externally visible should additional bnode types be implemented.
+\n \b Fields
+\li struct bnode token *next - The next token structure queued to the list.
+\li char *key - A pointer to the token, or parsed character string, associated
+with this entry.
+
+ \subsection sec3-3-8 Section 3.3.8: struct bnode
+
+\par
+This structure defines the essence of a BOS Server process instance. It
+contains such important information as the identifying string name, numbers
+concerning periodic execution on its behalf, the bnode's type, data on start
+and error behavior, a reference count used for garbage collection, and a set of
+flag bits.
+\n \b Fields
+\li char *name - The null-terminated character string providing the instance
+name associated with this bnode.
+\li long nextTimeout - The next time this bnode should be awakened. At the
+specified time, the bnode's flags field will be examined to see if BNODE
+NEEDTIMEOUT is set. If so, its timeout() operation will be invoked via the BOP
+TIMEOUT() macro. This field will then be reset to the current time plus the
+value kept in the period field.
+\li long period - This field specifies the time period between timeout calls.
+It is only used by processes that need to have periodic activity performed.
+\li long rsTime - The time that the BOS Server started counting restarts for
+this process instance.
+\li long rsCount - The count of the number of restarts since the time recorded
+in the rsTime field.
+\li struct bnode type *type - The type object defining this bnode's behavior.
+\li struct bnode ops *ops - This field is a pointer to the function array
+defining this bnode's basic behavior. Note that this is identical to the value
+of type->ops.
+\par
+This pointer is duplicated here for convenience. All of the BOP * macros,
+discussed in Section 3.5, reference the bnode's operation array through this
+pointer.
+\li long procStartTime - The last time this process instance was started
+(executed).
+\li long procStarts - The number of starts (executions) for this process
+instance.
+\li long lastAnyExit - The last time this process instance exited for any
+reason.
+\li long lastErrorExit - The last time this process instance exited
+unexpectedly.
+\li long errorCode - The last exit return code for this process instance.
+\li long errorSignal - The last signal that terminated this process instance.
+\li char *lastErrorName - The name of the last core file generated.
+\li short refCount - A reference count maintained for this bnode.
+\li short flags - This field contains a set of bit fields that identify
+additional status information for the given bnode. The meanings of the legal
+bit values, explained in Section 3.2.2, are: BNODE NEEDTIMEOUT, BNODE ACTIVE,
+BNODE WAIT, BNODE DELETE, and BNODE ERRORSTOP.
+\li char goal - The current goal for the process instance. It may take on any
+of the values defined in Section 3.2.3, namely BSTAT SHUTDOWN, BSTAT NORMAL,
+BSTAT SHUTTINGDOWN, and BSTAT STARTINGUP.
+\par
+This goal may be changed at will by an authorized caller. Such changes affect
+the current status of the process instance. See the description of the BOZO
+SetStatus() and BOZO SetTStatus() interface functions, defined in Sections
+3.6.3.1 and 3.6.3.2 respectively, for more details.
+\li char fileGoal - This field is similar to goal, but represents the goal
+stored in the on-file BOS Server description of this process instance. As with
+the goal field, see the description of the BOZO SetStatus() and BOZO
+SetTStatus() interface functions defined in Sections 3.6.3.1 and 3.6.3.2
+respectively for more details.
+
+ \subsection sec3-3-9 Section 3.3.9: struct bnode proc
+
+\par
+This structure defines all of the information known about each unix process the
+BOS Server is currently managing. It contains a reference to the bnode defining
+the process, along with the command line to be used to start the process, the
+optional core file name, the unix pid, and such things as a flag field to keep
+additional state information. The BOS Server keeps these records on a global
+singly-linked list.
+\n \b Fields
+\li struct bnode proc *next - A pointer to the BOS Server's next process
+record.
+\li struct bnode *bnode - A pointer to the bnode creating and defining this
+unix process.
+\li char *comLine - The text of the command line used to start this process.
+\li char *coreName - An optional core file component name for this process.
+\li long pid - The unix pid, if successfully created.
+\li long lastExit - This field keeps the last termination code for this
+process.
+\li long lastSignal - The last signal used to kill this process.
+\li long flags - A set of bits providing additional process state. These bits
+are fully defined in Section 3.2.5, and are: BPROC STARTED and BPROC EXITED.
+
+ \section sec3-4 Section 3.4: Error Codes
+
+\par
+This section covers the set of error codes exported by the BOS Server,
+displaying the printable phrases with which they are associated.
+
+\par Name
+BZNOTACTIVE
+\par Value
+(39424L)
+\par Description
+process not active.
+
+\par Name
+BZNOENT
+\par Value
+(39425L)
+\par Description
+no such entity.
+
+\par Name
+BZBUSY
+\par Value
+(39426L)
+\par Description
+can't do operation now.
+
+\par Name
+BZEXISTS
+\par Value
+(39427L)
+\par Description
+entity already exists.
+
+\par Name
+BZNOCREATE
+\par Value
+(39428L)
+\par Description
+failed to create entity.
+
+\par Name
+BZDOM
+\par Value
+(39429L)
+\par Description
+index out of range.
+ +\par Name +BZACCESS +\par Value +(39430L) +\par Description +you are not authorized for this operation. + +\par Name +BZSYNTAX +\par Value +(39431L) +\par Description +syntax error in create parameter. + +\par Name +BZIO +\par Value +(39432L) +\par Description +I/O error. + +\par Name +BZNET +\par Value +(39433L) +\par Description +network problem. + +\par Name +BZBADTYPE +\par Value +(39434L) +\par Description +unrecognized bnode type. + + \section sec3-5 Section 3.5: Macros + +\par +The BOS Server defines a set of macros that are externally visible via the +bnode.h file. They are used to facilitate the invocation of the members of the +struct bnode ops function array, which defines the basic operations for a given +bnode type. Invocations appear throughout the BOS Server code, wherever bnode +type-specific operations are required. Note that the only member of the struct +bnode ops function array that does not have a corresponding invocation macro +defined is create(), which is always called directly. + + \subsection sec3-5-1 Section 3.5.1: BOP TIMEOUT() + +\code +#define BOP_TIMEOUT(bnode) \ +((*(bnode)->ops->timeout)((bnode))) +\endcode +\par +Execute the bnode type-specific actions required when a timeout action must be +taken. This macro takes a single argument, namely a pointer to a type-specific +bnode structure. + + \subsection sec3-5-2 Section 3.5.2: BOP GETSTAT() + +\code +#define BOP_GETSTAT(bnode, a) \ +((*(bnode)->ops->getstat)((bnode),(a))) +\endcode +\par +Execute the bnode type-specific actions required when a caller is attempting to +get status information concerning the bnode. It takes two parameters, the first +being a pointer to a type-specific bnode structure, and the second being a +pointer to a longword in which the desired status value will be placed. + + \subsection sec3-5-3 Section 3.5.3: BOP SETSTAT() + +\code +#define BOP_SETSTAT(bnode, a) \ +((*(bnode)->ops->setstat)((bnode),(a))) +\endcode +\par +Execute the bnode type-specific actions required when a caller is attempting to +set the status information concerning the bnode. It takes two parameters, the +first being a pointer to a type-specific bnode structure, and the second being +a longword from which the new status value is obtained. + + \subsection sec3-5-4 Section 3.5.4: BOP DELETE() + +\code +#define BOP_DELETE(bnode) \ +((*(bnode)->ops->delete)((bnode))) +\endcode +\par +Execute the bnode type-specific actions required when a bnode is deleted. This +macro takes a single argument, namely a pointer to a type-specific bnode +structure. + + \subsection sec3-5-5 Section 3.5.5: BOP PROCEXIT() + +\code +#define BOP_PROCEXIT(bnode, a) \ +((*(bnode)->ops->procexit)((bnode),(a))) +\endcode +\par +Execute the bnode type-specific actions required whenever the unix process +implementing the given bnode exits. It takes two parameters, the first being a +pointer to a type-specific bnode structure, and the second being a pointer to +the struct bnode proc (defined in Section 3.3.9), describing that process in +detail. + + \subsection sec3-5-6 Section 3.5.6: BOP GETSTRING() + +\code +#define BOP_GETSTRING(bnode, a, b) \ +((*(bnode)->ops->getstring)((bnode),(a), (b))) +\endcode +\par +Execute the bnode type-specific actions required when the status string for the +given bnode must be fetched. It takes three parameters. The first is a pointer +to a type-specific bnode structure, the second is a pointer to a character +buffer, and the third is a longword specifying the size, in bytes, of the above +buffer. 
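+\par
+As an informal illustration of how these invocation macros are used (a sketch
+only, assuming the declarations exported by bnode.h and underscore-style
+spellings of the names referenced in the text, e.g. BOP_GETSTAT() and
+BSTAT_NORMAL), a dispatch through a bnode's operation array might look like
+this:
+\code
+long code, status;
+struct bnode *bn;   /* assumed to reference a bnode registered with the server */
+
+/* Expands to (*(bn)->ops->getstat)((bn), (&status)) */
+code = BOP_GETSTAT(bn, &status);
+if (code == 0 && status == BSTAT_NORMAL) {
+    /* the associated process instance is running normally */
+}
+\endcode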
+ + \subsection sec3-5-7 Section 3.5.7: BOP GETPARM() + +\code +#define BOP_GETPARM(bnode, n, b, l) \ +((*(bnode)->ops->getparm)((bnode),(n),(b),(l))) +\endcode +\par +Execute the bnode type-specific actions required when a particular parameter +string for the given bnode must be fetched. It takes four parameters. The first +is a pointer to a type-specific bnode structure, the second is a longword +identifying the index of the desired parameter string, the third is a pointer +to a character buffer to receive the parameter string, and the fourth and final +argument is a longword specifying the size, in bytes, of the above buffer. + + \subsection sec3-5-8 Section 3.5.8: BOP RESTARTP() + +\code +#define BOP_RESTARTP(bnode) \ +((*(bnode)->ops->restartp)((bnode))) +\endcode +\par +Execute the bnode type-specific actions required when the unix process +implementing the bnode of this type is restarted. It is expected that the +stored process command line will be parsed in preparation for the coming +execution. It takes a single argument, a pointer to a type-specific bnode +structure from which the command line can be located. + + \subsection sec3-5-9 Section 3.5.9: BOP HASCORE() + +\code +#define BOP_HASCORE(bnode) ((*(bnode)->ops->hascore)((bnode))) +\endcode +\par +Execute the bnode type-specific actions required when it must be determined +whether or not the attached process currently has a stored core file. It takes +a single argument, a pointer to a type-specific bnode structure from which the +name of the core file may be constructed. + + \section sec3-6 Section 3.6: Functions + +\par +This section covers the BOS Server RPC interface routines. They are generated +from the bosint.xg Rxgen file. At a high level, these functions may be seen as +belonging to seven basic classes: +\li Creating and removing process entries +\li Examining process information +\li Starting, stopping, and restarting processes +\li Security configuration +\li Cell configuration +\li Installing/uninstalling server binaries +\li Executing commands at the server +\par +The following is a summary of the interface functions and their purpose, +divided according to the above classifications: + +\par Creating & Removing Process Entries + +\par Function Name +BOZO CreateBnode() +\par Description +Create a process instance. + +\par Function Name +BOZO DeleteBnode() +\par Description +Delete a process instance. + +\par Examining Process Information + +\par Function Name +BOZO GetStatus() +\par Description +Get status information for the given process instance. + +\par Function Name +BOZO EnumerateInstance() +\par Description +Get instance name from the i'th bnode. + +\par Function Name +BOZO GetInstanceInfo() +\par Description +Get information on the given process instance. + +\par Function Name +BOZO GetInstanceParm() +\par Description +Get text of command line associated with the given process instance. + +\par Function Name +BOZO GetRestartTime() +\par Description +Get one of the BOS Server restart times. + +\par Function Name +BOZO SetRestartTime() +\par Description +Set one of the BOS Server restart times. + +\par Function Name +BOZOGetDates() +\par Description +Get the modification times for versions of a server binary file. + +\par Function Name +StartBOZO GetLog() +\par Description +Pass the IN params when fetching a BOS Server log file. + +\par Function Name +EndBOZO GetLog() +\par Description +Get the OUT params when fetching a BOS Server log file. 
+
+\par Function Name
+BOZO GetInstanceStrings()
+\par Description
+Get strings related to a given process instance.
+
+\par Starting, Stopping & Restarting Processes
+
+\par Function Name
+BOZO SetStatus()
+\par Description
+Set process instance status and goal.
+
+\par Function Name
+BOZO SetTStatus()
+\par Description
+Temporarily set process instance status and goal.
+
+\par Function Name
+BOZO StartupAll()
+\par Description
+Start all existing process instances.
+
+\par Function Name
+BOZO ShutdownAll()
+\par Description
+Shut down all process instances.
+
+\par Function Name
+BOZO RestartAll()
+\par Description
+Shut down, then restart all process instances.
+
+\par Function Name
+BOZO ReBozo()
+\par Description
+Shut down, then restart all process instances and the BOS Server itself.
+
+\par Function Name
+BOZO Restart()
+\par Description
+Restart a given instance.
+
+\par Function Name
+BOZO WaitAll()
+\par Description
+Wait until all process instances have reached their goals.
+
+\par Security Configuration
+
+\par Function Name
+BOZO AddSUser()
+\par Description
+Add a user to the UserList.
+
+\par Function Name
+BOZO DeleteSUser()
+\par Description
+Delete a user from the UserList.
+
+\par Function Name
+BOZO ListSUsers()
+\par Description
+Get the name of the user in a given position in the UserList file.
+
+\par Function Name
+BOZO ListKeys()
+\par Description
+List info about the key at a given index in the key file.
+
+\par Function Name
+BOZO AddKey()
+\par Description
+Add a key to the key file.
+
+\par Function Name
+BOZO DeleteKey()
+\par Description
+Delete the entry for an AFS key.
+
+\par Function Name
+BOZO SetNoAuthFlag()
+\par Description
+Enable or disable authenticated call requirements.
+
+\par Cell Configuration
+
+\par Function Name
+BOZO GetCellName()
+\par Description
+Get the name of the cell to which the BOS Server belongs.
+
+\par Function Name
+BOZO SetCellName()
+\par Description
+Set the name of the cell to which the BOS Server belongs.
+
+\par Function Name
+BOZO GetCellHost()
+\par Description
+Get the name of a database host given its index.
+
+\par Function Name
+BOZO AddCellHost()
+\par Description
+Add an entry to the list of database server hosts.
+
+\par Function Name
+BOZO DeleteCellHost()
+\par Description
+Delete an entry from the list of database server hosts.
+
+\par Installing/Uninstalling Server Binaries
+
+\par Function Name
+StartBOZO Install()
+\par Description
+Pass the IN params when installing a server binary.
+
+\par Function Name
+EndBOZO Install()
+\par Description
+Get the OUT params when installing a server binary.
+
+\par Function Name
+BOZO UnInstall()
+\par Description
+Roll back from a server binary installation.
+
+\par Function Name
+BOZO Prune()
+\par Description
+Throw away old versions of server binaries and core files.
+
+\par Executing Commands at the Server
+
+\par Function Name
+BOZO Exec()
+\par Description
+Execute a shell command at the server.
+
+\par
+All of the string parameters in these functions are expected to point to
+character buffers that are at least BOZO BSSIZE bytes long.
+
+ \subsection sec3-6-1 Section 3.6.1: Creating and Removing Processes
+
+\par
+The two interface routines defined in this section are used for creating and
+deleting bnodes, thus determining which process instances the BOS Server must
+manage.
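+\par
+As an informal sketch of how these two calls might be issued from a client
+(assuming underscore-style stub names such as BOZO_CreateBnode() and
+BOZO_DeleteBnode() as generated from bosint.xg, an already-established
+authenticated Rx connection conn to the target BOS Server, and an
+illustrative instance name and binary path), the following fragment creates
+a simple instance and later removes it:
+\code
+long code;
+
+/* Create a "simple" bnode whose single command line starts the binary;
+ * the unused parameter strings are passed as empty strings. */
+code = BOZO_CreateBnode(conn, "simple", "ptserver",
+                        "/usr/afs/bin/ptserver", "", "", "", "", "");
+if (code == 0) {
+    /* ... later, once the instance has been shut down ... */
+    code = BOZO_DeleteBnode(conn, "ptserver");
+}
+\endcode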
+
+ \subsubsection sec3-6-1-1 Section 3.6.1.1: BOZO CreateBnode - Create a
+process instance
+
+\code
+int BOZO CreateBnode(IN struct rx connection *z conn,
+                     IN char *type,
+                     IN char *instance,
+                     IN char *p1,
+                     IN char *p2,
+                     IN char *p3,
+                     IN char *p4,
+                     IN char *p5,
+                     IN char *p6)
+\endcode
+\par Description
+This interface function allows the caller to create a bnode (process instance)
+on the server machine executing the routine.
+\par
+The instance's type is declared to be the string referenced in the type
+argument. There are three supported instance type names, namely simple, fs,
+and cron (see Section 2.1 for a detailed examination of the types of bnodes
+available).
+\par
+The bnode's name is specified via the instance parameter. Any name may be
+chosen for a BOS Server instance. However, it is advisable to choose a name
+related to the name of the actual binary being instantiated. There are eight
+well-known names already in common use, corresponding to the AFS system
+agents. They are as follows:
+\li kaserver for the Authentication Server.
+\li runntp for the Network Time Protocol Daemon (ntpd).
+\li ptserver for the Protection Server.
+\li upclient for the client portion of the UpdateServer, which brings over
+binary files from the /usr/afs/bin directory and configuration files from the
+/usr/afs/etc directory on the system control machine.
+\li upclientbin for the client portion of the UpdateServer, which uses the
+/usr/afs/bin directory on the binary distribution machine for this platform's
+CPU/operating system type.
+\li upclientetc for the client portion of the UpdateServer, which
+references the /usr/afs/etc directory on the system control machine.
+\li upserver for the server portion of the UpdateServer.
+\li vlserver for the Volume Location Server.
+\par
+Up to six command-line strings may be communicated in this routine, residing
+in arguments p1 through p6. Different types of bnodes allow for different
+numbers of actual server processes to be started, and the command lines
+required for such instantiation are passed in this manner.
+\par
+The given bnode's setstat() routine from its individual ops array will be
+called in the course of this execution via the BOP SETSTAT() macro.
+\par
+The BOS Server will only allow individuals listed in its locally-maintained
+UserList file to create new instances. If successfully created, the new BOS
+Server instance will be appended to the BosConfig file kept on the machine's
+local disk. The UserList and BosConfig files are examined in detail in
+Sections 2.3.1 and 2.3.4 respectively.
+\par Error Codes
+BZACCESS The caller is not authorized to perform this operation.
+\n BZEXISTS The given instance already exists.
+\n BZBADTYPE Illegal value provided in the type parameter.
+\n BZNOCREATE Failed to create desired entry.
+
+ \subsubsection sec3-6-1-2 Section 3.6.1.2: BOZO DeleteBnode - Delete a
+process instance
+
+\code
+int BOZO DeleteBnode(IN struct rx connection *z conn, IN char *instance)
+\endcode
+\par Description
+This routine deletes the BOS Server bnode whose name is specified by the
+instance parameter. If an instance with that name does not exist, this
+operation fails. Similarly, if the process or processes associated with the
+given bnode have not been shut down (see the descriptions of the BOZO
+SetStatus() and BOZO ShutdownAll() interface functions), the operation also
+fails.
+\par +The given bnode's setstat() and delete() routines from its individual ops array +will be called in the course of this execution via the BOP SETSTAT() and BOP +DELETE() macros. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to delete existing instances. If successfully deleted, the old +BOS Server instance will be removed from the BosConfig file kept on the +machine's local disk. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n BZNOENT The given instance name not registered with the BOS Server. +\n BZBUSY The process(es) associated with the given instance are still active +(i.e., a shutdown has not yet been performed or has not yet completed). + + \subsection sec3-6-2 Section 3.6.2: Examining Process Information + +\par +This section describes the ten interface functions that collectively allow +callers to obtain and modify the information stored by the BOS Server to +describe the set of process that it manages. Among the operations supported by +the functions examined here are getting and setting status information, +obtaining the instance parameters, times, and dates, and getting the text of +log files on the server machine + + \subsubsection sec3-6-2-1 Section 3.6.2.1: BOZO GetStatus - Get status +information for the given process instance + +\code +int BOZO GetStatus(IN struct rx connection *z conn, + IN char *instance, + OUT long *intStat, + OUT char **statdescr) +\endcode +\par Description +This interface function looks up the bnode for the given process instance and +places its numerical status indicator into intStat and its status string (if +any) into a buffer referenced by statdescr. +\par +The set of values that may be returned in the intStat argument are defined +fully in Section 3.2.3. Briefly, they are BSTAT STARTINGUP, BSTAT NORMAL, BSTAT +SHUTTINGDOWN, and BSTAT SHUTDOWN. +\par +A buffer holding BOZO BSSIZE (256) characters is allocated, and statdescr is +set to point to it. Not all bnodes types implement status strings, which are +used to provide additional status information for the class. An example of one +bnode type that does define these strings is fs, which exports the following +status strings: +\li "file server running" +\li "file server up; volser down" +\li "salvaging file system" +\li "starting file server" +\li "file server shutting down" +\li "salvager shutting down" +\li "file server shut down" +\par +The given bnode's getstat() routine from its individual ops array will be +called in the course of this execution via the BOP GETSTAT() macro. +\par Error Codes +BZNOENT The given process instance is not registered with the BOS Server. + + \subsubsection sec3-6-2-2 Section 3.6.2.2: BOZO EnumerateInstance - Get +instance name from i'th bnode + +\code +int BOZO EnumerateInstance(IN struct rx connection *z conn, + IN long instance, + OUT char **iname); +\endcode +\par Description +This routine will find the bnode describing process instance number instance +and return that instance's name in the buffer to which the iname parameter +points. This function is meant to be used to enumerate all process instances at +a BOS Server. The first legal instance number value is zero, which will return +the instance name from the first registered bnode. Successive values for +instance will return information from successive bnodes. 
When all bnodes have +been thus enumerated, the BOZO EnumerateInstance() function will return BZDOM, +indicating that the list of bnodes has been exhausted. +\par Error Codes +BZDOM The instance number indicated in the instance parameter does not exist. + + \subsubsection sec3-6-2-3 Section 3.6.2.3: BOZO GetInstanceInfo - Get +information on the given process instance + +\code +int BOZO GetInstanceInfo(IN struct rx connection *z conn, + IN char *instance, + OUT char **type, + OUT struct bozo status *status) +\endcode +\par Description +Given the string name of a BOS Server instance, this interface function returns +the type of the instance and its associated status descriptor. The set of +values that may be placed into the type parameter are simple, fs, and cron (see +Section 2.1 for a detailed examination of the types of bnodes available). The +status structure filled in by the call includes such information as the goal +and file goals, the process start time, the number of times the process has +started, exit information, and whether or not the process has a core file. +\par Error Codes +BZNOENT The given process instance is not registered with the BOS Server. + + \subsubsection sec3-6-2-4 Section 3.6.2.4: BOZO GetInstanceParm - Get +text of command line associated with the given process instance + +\code +int BOZO GetInstanceParm(IN struct rx connection *z conn, + IN char *instance, + IN long num, + OUT char **parm) +\endcode +\par Description +Given the string name of a BOS Server process instance and an index identifying +the associated command line of interest, this routine returns the text of the +desired command line. The first associated command line text for the instance +may be acquired by setting the index parameter, num, to zero. If an index is +specified for which there is no matching command line stored in the bnode, then +the function returns BZDOM. +\par Error Codes +BZNOENT The given process instance is not registered with the BOS Server. +\n BZDOM There is no command line text associated with index num for this +bnode. + + \subsubsection sec3-6-2-5 Section 3.6.2.5: BOZO GetRestartTime - Get +one of the BOS Server restart times + +\code +int BOZO GetRestartTime(IN struct rx connection *z conn, + IN long type, + OUT struct bozo netKTime *restartTime) +\endcode +\par Description +The BOS Server maintains two different restart times, for itself and all server +processes it manages, as described in Section 2.4. Given which one of the two +types of restart time is desired, this routine fetches the information from the +BOS Server. The type argument is used to specify the exact restart time to +fetch. If type is set to one (1), then the general restart time for all agents +on the machine is fetched. If type is set to two (2), then the new-binary +restart time is returned. A value other than these two for the type parameter +results in a return value of BZDOM. +\par Error Codes +BZDOM All illegal value was passed in via the type parameter. + + \subsubsection sec3-6-2-6 Section 3.6.2.6: BOZO SetRestartTime - Set +one of the BOS Server restart times + +\code +int BOZO SetRestartTime(IN struct rx connection *z conn, + IN long type, + IN struct bozo netKTime *restartTime) +\endcode +\par Description +This function is the inverse of the BOZO GetRestartTime() interface routine +described in Section 3.6.2.5 above. Given the type of restart time and its new +value, this routine will set the desired restart time at the BOS Server +receiving this call. 
The values for the type parameter are identical to those
+used by BOZO GetRestartTime(), namely one (1) for the general restart time and
+two (2) for the new-binary restart time.
+\par
+The BOS Server will only allow individuals listed in its locally-maintained
+UserList file to set its restart times.
+\par Error Codes
+BZACCESS The caller is not authorized to perform this operation.
+\n BZDOM An illegal value was passed in via the type parameter.
+
+ \subsubsection sec3-6-2-7 Section 3.6.2.7: BOZO GetDates - Get the
+modification times for versions of a server binary file
+
+\code
+int BOZO GetDates(IN struct rx connection *z conn,
+                  IN char *path,
+                  OUT long *newtime,
+                  OUT long *baktime,
+                  OUT long *oldtime)
+\endcode
+\par Description
+Given a fully-qualified pathname identifying the particular server binary to
+examine in the path argument, this interface routine returns the modification
+time of that file, along with the modification times for the intermediate
+(.BAK) and old (.OLD) versions. The above-mentioned times are deposited into
+the newtime, baktime and oldtime arguments. Any one or all of the reported
+times may be set to zero, indicating that the associated file does not exist.
+\par Error Codes
+---None.
+
+ \subsubsection sec3-6-2-8 Section 3.6.2.8: StartBOZO GetLog - Pass the
+IN params when fetching a BOS Server log file
+
+\code
+int StartBOZO GetLog(IN struct rx connection *z conn, IN char *name)
+\endcode
+\par Description
+The BOZO GetLog() function defined in the BOS Server Rxgen interface file is
+used to acquire the contents of the given log file from the machine processing
+the call. It is defined to be a streamed function, namely one that can return
+an arbitrary amount of data. For full details on the definition and use of
+streamed functions, please refer to the Streamed Function Calls section in
+[4].
+\par
+This function is created by Rxgen in response to the BOZO GetLog() interface
+definition in the bosint.xg file. The StartBOZO GetLog() routine handles
+passing the IN parameters of the streamed call to the BOS Server.
+Specifically, the name parameter is used to convey the string name of the
+desired log file. For the purposes of opening the specified file at the
+machine being contacted, the current working directory for the BOS Server is
+considered to be /usr/afs/logs. If the caller is included in the
+locally-maintained UserList file, any pathname may be specified in the name
+parameter, and the contents of the given file will be fetched. All other
+callers must provide a string that does not include the slash character, as it
+might be used to construct an unauthorized request for a file outside the
+/usr/afs/logs directory.
+\par Error Codes
+RXGEN CC MARSHAL The transmission of the GetLog() IN parameters failed. This
+and all rxgen constant definitions are available from the rxgen consts.h
+include file.
+
+ \subsubsection sec3-6-2-9 Section 3.6.2.9: EndBOZO GetLog - Get the OUT
+params when fetching a BOS Server log file
+
+\code
+int EndBOZO GetLog(IN struct rx connection *z conn)
+\endcode
+\par Description
+This function is created by Rxgen in response to the BOZO GetLog() interface
+definition in the bosint.xg file. The EndBOZO GetLog() routine handles the
+recovery of the OUT parameters for this interface call (of which there are
+none). The utility of such functions is often the value they return. In this
+case, however, EndBOZO GetLog() always returns success.
Thus, it is not even +necessary to invoke this particular function, as it is basically a no-op. +\par Error Codes +---Always returns successfully. + + \subsubsection sec3-6-2-10 Section 3.6.2.10: BOZO GetInstanceStrings - +Get strings related to a given process instance + +\code +int BOZO GetInstanceStrings(IN struct rx connection *z conn, +IN char *instance, +OUT char **errorName, +OUT char **spare1, +OUT char **spare2, +OUT char **spare3) +\endcode +\par Description +This interface function takes the string name of a BOS Server instance and +returns a set of strings associated with it. At the current time, there is only +one string of interest returned by this routine. Specifically, the errorName +parameter is set to the error string associated with the bnode, if any. The +other arguments, spare1 through spare3, are set to the null string. Note that +memory is allocated for all of the OUT parameters, so the caller should be +careful to free them once it is done. +\par Error Codes +BZNOENT The given process instance is not registered with the BOS Server. + + \subsection sec3-6-3 Section 3.6.3: Starting, Stopping, and Restarting +Processes + +\par +The eight interface functions described in this section allow BOS Server +clients to manipulate the execution of the process instances the BOS Server +controls. + + \subsubsection sec3-6-3-1 Section 3.6.3.1: BOZO SetStatus - Set process +instance status and goal + +\code +int BOZO SetStatus(IN struct rx connection *z conn, + IN char *instance, + IN long status) +\endcode +\par Description +This routine sets the actual status field, as well as the "file goal", of the +given instance to the value supplied in the status parameter. Legal values for +status are taken from the set described in Section 3.2.3, specifically BSTAT +NORMAL and BSTAT SHUTDOWN. For more information about these constants (and +about goals/file goals), please refer to Section 3.2.3. +\par +The given bnode's setstat() routine from its individual ops array will be +called in the course of this execution via the BOP SETSTAT() macro. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to perform this operation. If successfully modified, the BOS +Server bnode defining the given instance will be written out to the BosConfig +file kept on the machine's local disk. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n BZNOENT The given instance name not registered with the BOS Server. + + \subsubsection sec3-6-3-2 Section 3.6.3.2: BOZO SetTStatus - +Temporarily set process instance status and goal + +\code +int BOZO SetTStatus(IN struct rx connection *z conn, + IN char *instance, + IN long status) +\endcode +\par Description +This interface routine is much like the BOZO SetStatus(), defined in Section +3.6.3.1 above, except that its effect is to set the instance status on a +temporary basis. Specifically, the status field is set to the given status +value, but the "file goal" field is not changed. Thus, the instance's stated +goal has not changed, just its current status. +\par +The given bnode's setstat() routine from its individual ops array will be +called in the course of this execution via the BOP SETSTAT() macro. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to perform this operation. If successfully modified, the BOS +Server bnode defining the given instance will be written out to the BosConfig +file kept on the machine's local disk. 
+\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n BZNOENT The given instance name not registered with the BOS Server. + + \subsubsection sec3-6-3-3 Section 3.6.3.3: BOZO StartupAll - Start all +existing process instances + +\code +int BOZO StartupAll(IN struct rx connection *z conn) +\endcode +\par Description +This interface function examines all bnodes and attempts to restart all of +those that have not been explicitly been marked with the BSTAT SHUTDOWN file +goal. Specifically, BOP SETSTAT() is invoked, causing the setstat() routine +from each bnode's ops array to be called. The bnode's flags field is left with +the BNODE ERRORSTOP bit turned off after this call. +\par +The given bnode's setstat() routine from its individual ops array will be +called in the course of this execution via the BOP SETSTAT() macro. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to start up bnode process instances. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. + + \subsubsection sec3-6-3-4 Section 3.6.3.4: BOZO ShutdownAll - Shut down +all process instances + +\code +int BOZO ShutdownAll(IN struct rx connection *z conn) +\endcode +\par Description +This interface function iterates through all bnodes and attempts to shut them +all down. Specifically, the BOP SETSTAT() is invoked, causing the setstat() +routine from each bnode's ops array to be called, setting that bnode's goal +field to BSTAT SHUTDOWN. +\par +The given bnode's setstat() routine from its individual ops array will be +called in the course of this execution via the BOP SETSTAT() macro. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to perform this operation. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. + + \subsubsection sec3-6-3-5 Section 3.6.3.5: BOZO RestartAll - Shut down, +then restart all process instances + +\code +int BOZO RestartAll(IN struct rx connection *z conn) +\endcode +\par Description +This interface function shuts down every BOS Server process instance, waits +until the shutdown is complete (i.e., all instances are registered as being in +state BSTAT SHUTDOWN), and then starts them all up again. While all the +processes known to the BOS Server are thus restarted, the invocation of the BOS +Server itself does not share this fate. For simulation of a truly complete +machine restart, as is necessary when an far-reaching change to a database file +has been made, use the BOZO ReBozo() interface routine defined in Section +3.6.3.6 below. +\par +The given bnode's getstat() and setstat() routines from its individual ops +array will be called in the course of this execution via the BOP GETSTAT() and +BOP SETSTAT() macros. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to restart bnode process instances. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. + + \subsubsection sec3-6-3-6 Section 3.6.3.6: BOZO ReBozo - Shut down, +then restart all process instances and the BOS Server itself + +\code +int BOZO ReBozo(IN struct rx connection *z conn) +\endcode +\par Description +This interface routine is identical to the BOZO RestartAll() call, defined in +Section 3.6.3.5 above, except for the fact that the BOS Server itself is +restarted in addition to all the known bnodes. 
All of the BOS Server's open +file descriptors are closed, and the standard BOS Server binary image is +started via execve(). +\par +The given bnode's getstat() and setstat() routines from its individual ops +array will be called in the course of this execution via the BOP GETSTAT() and +BOP SETSTAT() macros. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to restart bnode process instances. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. + + \subsubsection sec3-6-3-7 Section 3.6.3.7: BOZO Restart - Restart a +given process instance + +\code +int BOZO Restart(IN struct rx connection *z conn, IN char *instance) +\endcode +\par Description +This interface function is used to shut down and then restart the process +instance identified by the instance string passed as an argument. +\par +The given bnode's getstat() and setstat() routines from its individual ops +array will be called in the course of this execution via the BOP GETSTAT() and +BOP SETSTAT() macros. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to restart bnode process instances. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n BZNOENT The given instance name not registered with the BOS Server. + + \subsubsection sec3-6-3-8 Section 3.6.3.8: BOZO WaitAll - Wait until +all process instances have reached their goals + +\code +int BOZO WaitAll(IN struct rx connection *z conn) +\endcode +\par Description +This interface function is used to synchronize with the status of the bnodes +managed by the BOS Server. Specifically, the BOZO WaitAll() call returns when +each bnode's current status matches the value in its short-term goal field. For +each bnode it manages, the BOS Server thread handling this call invokes the BOP +GETSTAT() macro, waiting until the bnode's status and goals line up. +\par +Typically, the BOZO WaitAll() routine is used to allow a program to wait until +all bnodes have terminated their execution (i.e., all goal fields have been set +to BSTAT SHUTDOWN and all corresponding processes have been killed). Note, +however, that this routine may also be used to wait until all bnodes start up. +The true utility of this application of BOZO WaitAll() is more questionable, +since it will return when all bnodes have simply commenced execution, which +does not imply that they have completed their initialization phases and are +thus rendering their normal services. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to wait on bnodes through this interface function. +\par +The given bnode's getstat() routine from its individual ops array will be +called in the course of this execution via the BOP GETSTAT() macro. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. + + \subsection sec3-6-4 Section 3.6.4: Security Configuration + +\par +This section describes the seven BOS Server interface functions that allow a +properly-authorized person to examine and modify certain data relating to +system security. Specifically, it allows for manipulation of the list of +adminstratively 'privileged' individuals, the set of Kerberos keys used for +file service, and whether authenticated connections should be required by the +BOS Server and all other AFS server agents running on the machine. 
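+\par
+The following fragment is an informal sketch of two of these calls, BOZO
+AddSUser() and BOZO ListSUsers(), described in the subsections that follow.
+It assumes underscore-style stub names, an authenticated Rx connection conn,
+a hypothetical principal name "admin", and a caller-supplied buffer of at
+least BOZO BSSIZE bytes for the OUT string, as noted in Section 3.6:
+\code
+char namebuf[256];     /* caller-supplied space for the OUT string */
+char *name;
+long i;
+
+/* Grant BOS-level administrative rights to a principal. */
+(void) BOZO_AddSUser(conn, "admin");
+
+/* Enumerate the UserList until an index is rejected. */
+for (i = 0; ; i++) {
+    name = namebuf;
+    if (BOZO_ListSUsers(conn, i, &name) != 0)
+        break;          /* invalid index; the list has been exhausted */
+    /* namebuf now holds the name of the privileged user in slot i */
+}
+\endcode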
+
+ \subsubsection sec3-6-4-1 Section 3.6.4.1: BOZO AddSUser - Add a user
+to the UserList
+
+\code
+int BOZO AddSUser(IN struct rx connection *z conn, IN char *name);
+\endcode
+\par Description
+This interface function is used to add the given user name to the UserList
+file of privileged BOS Server principals. Only individuals already appearing
+in the UserList are permitted to add new entries. If the given user name
+already appears in the file, the function fails. Otherwise, the file is opened
+in append mode and the name is written at the end with a trailing newline.
+\par Error Codes
+BZACCESS The caller is not authorized to perform this operation.
+\n EEXIST The individual specified by name is already on the UserList.
+\n EIO The UserList file could not be opened or closed.
+
+ \subsubsection sec3-6-4-2 Section 3.6.4.2: BOZO DeleteSUser - Delete a
+user from the UserList
+
+\code
+int BOZO DeleteSUser(IN struct rx connection *z conn, IN char *name)
+\endcode
+\par Description
+This interface function is used to delete the given user name from the
+UserList file of privileged BOS Server principals. Only individuals already
+appearing in the UserList are permitted to delete existing entries. The file
+is opened in read mode, and a new file named UserList.NXX is created in the
+same directory and opened in write mode. The original UserList is scanned,
+with each entry copied to the new file if it doesn't match the given name.
+After the scan is done, all files are closed, and the UserList.NXX file is
+renamed to UserList, overwriting the original.
+\par Error Codes
+BZACCESS The caller is not authorized to perform this operation.
+\n -1 The UserList file could not be opened.
+\n EIO The UserList.NXX file could not be opened, or an error occurred in the
+file close operations.
+\n ENOENT The given name was not found in the original UserList file.
+
+ \subsubsection sec3-6-4-3 Section 3.6.4.3: BOZO ListSUsers - Get the
+name of the user in the given position in the UserList file
+
+\code
+int BOZO ListSUsers(IN struct rx connection *z conn,
+                    IN long an,
+                    OUT char **name)
+\endcode
+\par Description
+This interface function is used to request the name of the privileged user in
+the an'th slot in the BOS Server's UserList file. The string placed into the
+name parameter may be up to 256 characters long, including the trailing null.
+\par Error Codes
+The UserList file could not be opened, or an invalid value was specified for
+an.
+
+ \subsubsection sec3-6-4-4 Section 3.6.4.4: BOZO ListKeys - List info
+about the key at a given index in the key file
+
+\code
+int BOZO ListKeys(IN struct rx connection *z conn,
+                  IN long an,
+                  OUT long *kvno,
+                  OUT struct bozo key *key,
+                  OUT struct bozo keyInfo *keyinfo)
+\endcode
+\par Description
+This interface function allows its callers to specify the index of the desired
+AFS encryption key, and to fetch information regarding that key. If the caller
+is properly authorized, the version number of the specified key is placed into
+the kvno parameter. Similarly, a description of the given key is placed into
+the keyinfo parameter. When the BOS Server is running in noauth mode, the key
+itself will be copied into the key argument; otherwise, the key structure will
+be zeroed. The data placed into the keyinfo argument, declared as a struct
+bozo keyInfo as defined in Section 3.3.3, is obtained as follows. The mod sec
+field is taken from the value of st mtime after stat()ing /usr/afs/etc/KeyFile,
+and the mod usec field is zeroed.
The keyCheckSum is computed by an Authentication +Server routine, which calculates a 32-bit cryptographic checksum of the key, +encrypting a block of zeros and then using the first 4 bytes as the checksum. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to obtain information regarding the list of AFS keys held by the +given BOS Server. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n BZDOM An invalid index was found in the an parameter. +\n KABADKEY Defined in the exported kautils.h header file corresponding to the +Authentication Server, this return value indicates a problem with generating +the checksum field of the keyinfo parameter. + + \subsubsection sec3-6-4-5 Section 3.6.4.5: BOZO AddKey - Add a key to +the key file + +\code +int BOZO AddKey(IN struct rx connection *z conn, + IN long an, + IN struct bozo key *key) +\endcode +\par Description +This interface function allows a properly-authorized caller to set the value of +key version number an to the given AFS key. If a slot is found in the key file +/usr/afs/etc/KeyFile marked as key version number an, its value is overwritten +with the key provided. If an entry for the desired key version number does not +exist, the key file is grown, and the new entry filled with the specified +information. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to add new entries into the list of AFS keys held by the BOS +Server. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n AFSCONF FULL The system key file already contains the maximum number of keys +(AFSCONF MAXKEYS, or 8). These two constant defintions are available from the +cellconfig.h and keys.h AFS include files respectively. + + \subsubsection sec3-6-4-6 Section 3.6.4.6: BOZO DeleteKey - Delete the +entry for an AFS key + +\code +int BOZO DeleteKey(IN struct rx connection *z conn, + IN long an) +\endcode +\par Description +This interface function allows a properly-authorized caller to delete key +version number an from the key file, /usr/afs/etc/KeyFile. The existing keys +are scanned, and if one with key version number an is found, it is removed. Any +keys occurring after the deleted one are shifted to remove the file entry +entirely. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to delete entries from the list of AFS keys held by the BOS +Server. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n AFSCONF NOTFOUND An entry for key version number an was not found. This +constant defintion is available from the cellconfig.h AFS include file. + + \subsubsection sec3-6-4-7 Section 3.6.4.7: BOZO SetNoAuthFlag - Enable +or disable requirement for authenticated calls + +\code +int BOZO SetNoAuthFlag(IN struct rx connection *z conn, + IN long flag) +\endcode +\par Description +This interface routine controls the level of authentication imposed on the BOS +Server and all other AFS server agents on the machine by manipulating the +NoAuth file in the /usr/afs/local directory on the server. If the flag +parameter is set to zero (0), the NoAuth file will be removed, instructing the +BOS Server and AFS agents to authenenticate the RPCs they receive. Otherwise, +the file is created as an indication to honor all RPC calls to the BOS Server +and AFS agents, regardless of the credentials carried by callers. 
+\par Error Codes +BZACCESS The caller is not authorized to perform this operation. + + \subsection sec3-6-5 Section 3.6.5: Cell Configuration + +\par +The five interface functions covered in this section all have to do with +manipulating the configuration information of the machine on which the BOS +Server runs. In particular, one may get and set the cell name for that server +machine, enumerate the list of server machines running database servers for the +cell, and add and delete machines from this list. + + \subsubsection sec3-6-5-1 Section 3.6.5.1: BOZO GetCellName - Get the +name of the cell to which the BOS Server belongs + +\code +int BOZO GetCellName(IN struct rx connection *z conn, OUT char **name) +\endcode +\par Description +This interface routine returns the name of the cell to which the given BOS +Server belongs. The BOS Server consults a file on its local disk, +/usr/afs/etc/ThisCell to obtain this information. If this file does not exist, +then the BOS Server will return a null string. +\par Error Codes +AFSCONF UNKNOWN The BOS Server could not access the cell name file. This +constant defintion is available from the cellconfig.h AFS include file. + + \subsubsection sec3-6-5-2 Section 3.6.5.2: BOZO SetCellName - Set the +name of the cell to which the BOS Server belongs + +\code +int BOZO SetCellName(IN struct rx connection *z conn, IN char *name) +\endcode +\par Description +This interface function allows the caller to set the name of the cell to which +the given BOS Server belongs. The BOS Server writes this information to a file +on its local disk, /usr/afs/etc/ThisCell. The current contents of this file are +first obtained, along with other information about the current cell. If this +operation fails, then BOZO SetCellName() also fails. The string name provided +as an argument is then stored in ThisCell. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to set the name of the cell to which the machine executing the +given BOS Server belongs. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n AFSCONF NOTFOUND Information about the current cell could not be obtained. +This constant definition, along with AFSCONF FAILURE appearing below, is +availabel from the cellconfig.h AFS include file. +\n AFSCONF FAILURE New cell name could not be written to file. + + \subsubsection sec3-6-5-3 Section 3.6.5.3: BOZO GetCellHost - Get the +name of a database host given its index + +\code +int BOZO GetCellHost(IN struct rx connection *z conn, + IN long awhich, + OUT char **name) +\endcode +\par Description +This interface routine allows the caller to get the name of the host appearing +in position awhich in the list of hosts acting as database servers for the BOS +Server's cell. The first valid position in the list is index zero. The host's +name is deposited in the character buffer pointed to by name. If the value of +the index provided in awhich is out of range, the function fails and a null +string is placed in name. +\par Error Codes +BZDOM The host index in awhich is out of range. +\n AFSCONF NOTFOUND Information about the current cell could not be obtained. +This constant defintion may be found in the cellconfig.h AFS include file. 
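+\par
+An informal sketch of enumerating the database server hosts through this call
+(assuming underscore-style stub names, an established Rx connection conn, and
+a caller-supplied buffer of at least BOZO BSSIZE bytes for the OUT string):
+\code
+char hostbuf[256];     /* caller-supplied space for the OUT string */
+char *host;
+long i;
+
+for (i = 0; ; i++) {
+    host = hostbuf;
+    if (BOZO_GetCellHost(conn, i, &host) != 0)
+        break;         /* BZDOM: index out of range, list exhausted */
+    /* hostbuf now names the database server host at index i */
+}
+\endcode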
+ + \subsubsection sec3-6-5-4 Section 3.6.5.4: BOZO AddCellHost - Add an +entry to the list of database server hosts + +\code +int BOZO AddCellHost(IN struct rx connection *z conn, IN char *name) +\endcode +\par Description +This interface function allows properly-authorized callers to add a name to the +list of hosts running AFS database server processes for the BOS Server's home +cell. If the given name does not already appear in the database server list, a +new entry will be created. Regardless, the mapping from the given name to its +IP address will be recomputed, and the cell database file, +/usr/afs/etc/CellServDB will be updated. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to add an entry to the list of host names providing database +services for the BOS Server's home cell. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n AFSCONF NOTFOUND Information about the current cell could not be obtained. +This constant defintion may be found in the cellconfig.h AFS include file. + + \subsubsection sec3-6-5-5 Section 3.6.5.5: BOZO DeleteCellHost - Delete +an entry from the list of database server hosts + +\code +int BOZO DeleteCellHost(IN struct rx connection *z conn, IN char *name) +\endcode +\par Description +This interface routine allows properly-authorized callers to remove a given +name from the list of hosts running AFS database server processes for the BOS +Server's home cell. If the given name does not appear in the database server +list, this function will fail. Otherwise, the matching entry will be removed, +and the cell database file, /usr/afs/etc/CellServDB will be updated. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to delete an entry from the list of host names providing database +services for the BOS Server's home cell. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n AFSCONF NOTFOUND Information about the current cell could not be obtained. +This constant defintion may be found in the cellconfig.h AFS include file. + + \subsection sec3-6-6 Section 3.6.6: Installing/Uninstalling Server +Binaries + +\par +There are four BOS Server interface routines that allow administrators to +install new server binaries and to roll back to older, perhaps more reliable, +executables. They also allow for stored images of the old binaries (as well as +core files) to be 'pruned', or selectively deleted. + +3.6.6.1 StartBOZO Install - Pass the IN params when installing a server binary + +\code +int StartBOZO Install(IN struct rx connection *z conn, + IN char *path, + IN long size, + IN long flags, + IN long date) +\endcode +\par Description +The BOZO Install() function defined in the BOS Server Rxgen interface file is +used to deliver the executable image of an AFS server process to the given +server machine and then installing it in the appropriate directory there. It is +defined to be a streamed function, namely one that can deliver an arbitrary +amount of data. For full details on the definition and use of streamed +functions, please refer to the Streamed Function Calls section in [4]. +\par +This function is created by Rxgen in response to the BOZO Install() interface +definition in the bosint.xg file. The StartBOZO Install() routine handles +passing the IN parameters of the streamed call to the BOS Server. 
Specifically, +the apath argument specifies the name of the server binary to be installed +(including the full pathname prefix, if necessary). Also, the length of the +binary is communicated via the size argument, and the modification time the +caller wants the given file to carry is placed in date. The flags argument is +currently ignored by the BOS Server. +\par +After the above parameters are delivered with StartBOZO Install(), the BOS +Server creates a file with the name given in the path parameter followed by a +.NEW postfix. The size bytes comprising the text of the executable in question +are then read over the RPC channel and stuffed into this new file. When the +transfer is complete, the file is closed. The existing versions of the server +binary are then 'demoted'; the *.BAK version (if it exists) is renamed to +*.OLD. overwriting the existing *.OLD version if and only if an *.OLD version +does not exist, or if a *.OLD exists and the .BAK file is at least seven days +old. The main binary is then renamed to *.BAK. Finally, the *.NEW file is +renamed to be the new standard binary image to run, and its modification time +is set to date. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to install server software onto the machine on which the BOS +Server runs. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. +\n 100 An error was encountered when writing the binary image to the local disk +file. The truncated file was closed and deleted on the BOS Server. +\n 101 More than size bytes were delivered to the BOS Server in the RPC +transfer of the executable image. +\n 102 Fewer than size bytes were delivered to the BOS Server in the RPC +transfer of the executable image. + + \subsubsection sec3-6-6-2 Section 3.6.6.2: EndBOZO Install - Get the +OUT params when installing a server binary + +\code +int EndBOZO Install(IN struct rx connection *z conn) +\endcode +\par Description +This function is created by Rxgen in response to the BOZO Install() interface +definition in the bosint.xg file. The EndBOZO Install() routine handles the +recovery of the OUT parameters for this interface call, of which there are +none. The utility of such functions is often the value they return. In this +case, however, EndBOZO Install() always returns success. Thus, it is not even +necessary to invoke this particular function, as it is basically a no-op. +\par Error Codes +---Always returns successfully. + + \subsubsection sec3-6-6-3 Section 3.6.6.3: BOZO UnInstall - Roll back +from a server binary installation + +\code +int BOZO UnInstall(IN struct rx connection *z conn, IN char *path) +\endcode +\par Description +This interface function allows a properly-authorized caller to "roll back" from +the installation of a server binary. If the *.BAK version of the server named +path exists, it will be renamed to be the main executable file. In this case, +the *.OLD version, if it exists, will be renamed to *.BAK.If a *.BAK version of +the binary in question is not found, the *.OLD version is renamed as the new +standard binary file. If neither a *.BAK or a *.OLD version of the executable +can be found, the function fails, returning the low-level unix error generated +at the server. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to roll back server software on the machine on which the BOS +Server runs. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. 
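+\par
+To make the version-rotation behavior described above concrete, the following
+is an informal sketch of rolling back an installation (assuming an
+underscore-style stub name, an established Rx connection conn, and
+/usr/afs/bin/fileserver used purely as an illustrative path):
+\code
+/* Conceptual effect of the roll-back, per the description above:
+ *   fileserver.BAK -> fileserver   (and fileserver.OLD -> fileserver.BAK),
+ *   or, if no .BAK version exists, fileserver.OLD -> fileserver.        */
+long code = BOZO_UnInstall(conn, "/usr/afs/bin/fileserver");
+if (code != 0) {
+    /* neither a .BAK nor a .OLD version was found, or the caller
+     * was not authorized (BZACCESS) */
+}
+\endcode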
+ + \subsubsection sec3-6-6-4 Section 3.6.6.4: BOZO Prune - Throw away old +versions of server binaries and core files + +\code +int BOZO Prune(IN struct rx connection *z conn, IN long flags) +\endcode +\par Description +This interface routine allows a properly-authorized caller to prune the saved +versions of server binaries resident on the machine on which the BOS Server +runs. The /usr/afs/bin directory on the server machine is scanned in directory +order. If the BOZO PRUNEOLD bit is set in the flags argument, every file with +the *.OLD extension is deleted. If the BOZO PRUNEBAK bit is set in the flags +argument, every file with the *.BAK extension is deleted. Next, the +/usr/afs/logs directory is scanned in directory order. If the BOZO PRUNECORE +bit is set in the flags argument, every file with a name beginning with the +prefix core is deleted. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to prune server software binary versions and core files on the +machine on which the BOS Server runs. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. + + \subsection sec3-6-7 Section 3.6.7: Executing Commands at the Server + +\par +There is a single interface function defined by the BOS Server that allows +execution of arbitrary programs or scripts on any server machine on which a BOS +Server process is active. + +3.6.7.1 BOZO Exec - Execute a shell command at the server + +\code +int BOZO Exec(IN struct rx connection *z conn, IN char *cmd) +\endcode +\par Description +This interface routine allows a properly-authorized caller to execute any +desired shell command on the server on which the given BOS Server runs. There +is currently no provision made to pipe the output of the given command's +execution back to the caller through the RPC channel. +\par +The BOS Server will only allow individuals listed in its locally-maintained +UserList file to execute arbitrary shell commands on the server machine on +which the BOS Server runs via this call. +\par Error Codes +BZACCESS The caller is not authorized to perform this operation. + + \page biblio Bibliography + +\li [1] CMU Information Technology Center. Synchronization and Caching +Issues in the Andrew File System, USENIX Proceedings, Dallas, TX, Winter 1988. +\li [2] Transarc Corporation. AFS 3.0 Command Reference Manual, F-30-0-D103, +Pittsburgh, PA, April 1990. +\li [3] Zayas, Edward R., Transarc Corporation. AFS-3 Programmer's +Reference: Specification for the Rx Remote Procedure Call Facility, FS-00-D164, +Pittsburgh, PA, April 1991. +\li [4] Zayas, Edward R., Transarc Corporation. AFS-3 Programmer's Reference: +File Server/Cache Manager Interface, FS-00-D162, Pittsburgh, PA, April 1991. +\li [5] Transarc Corporation. AFS 3.0 System Administrator's Guide, +F-30-0-D102, Pittsburgh, PA, April 1990. +\li [6] Kazar, Michael L., Information Technology Center, Carnegie Mellon +University. Ubik -A Library For Managing Ubiquitous Data, ITCID, Pittsburgh, +PA, Month, 1988. +\li [7] Kazar, Michael L., Information Technology Center, Carnegie Mellon +University. Quorum Completion, ITCID, Pittsburgh, PA, Month, 1988. +\li [8] S. R. Kleinman. Vnodes: An Architecture for Multiple file +System Types in Sun UNIX, Conference Proceedings, 1986 Summer Usenix Technical +Conference, pp. 238-247, El Toro, CA, 1986. 
+ +*/ + diff --git a/doc/protocol/fs-cm-spec.h b/doc/protocol/fs-cm-spec.h new file mode 100644 index 0000000000..74d7160c59 --- /dev/null +++ b/doc/protocol/fs-cm-spec.h @@ -0,0 +1,3981 @@ +/*! + + \page title AFS-3 Programmer's Reference: File Server/Cache Manager +Interface + +\author Edward R. Zayas +Transarc Corporation +\version 1.1 +\date 20 Aug 1991 9:38 Copyright 1991 Transarc Corporation All Rights Reserved +FS-00-D162 + + \page chap1 Chapter 1: Overview + + \section sec1-1 Section 1.1: Introduction + + \subsection sec1-1-1 Section 1.1.1: The AFS 3.1 Distributed File System + +\par +AFS 3.1 is a distributed file system (DFS) designed to meet the following set +of requirements: +\li Server-client model: Permanent file storage for AFS is maintained by a +collection of file server machines. This centralized storage is accessed by +individuals running on client machines, which also serve as the computational +engines for those users. A single machine may act as both an AFS file server +and client simultaneously. However, file server machines are generally assumed +to be housed in a secure environment, behind locked doors. +\li Scale: Unlike other existing DFSs, AFS was designed with the specific goal +of supporting a very large user community. Unlike the rule-of-thumb ratio of 20 +client machines for every server machine (20:1) used by Sun Microsystem's NFS +distributed file system [4][5], the AFS architecture aims at smoothly +supporting client/server ratios more along the lines of 200:1 within a single +installation. +\par +AFS also provides another, higher-level notion of scalability. Not only can +each independently-administered AFS site, or cell, grow very large (on the +order of tens of thousands of client machines), but individual cells may easily +collaborate to form a single, unified file space composed of the union of the +individual name spaces. Thus, users have the image of a single unix file system +tree rooted at the /afs directory on their machine. Access to files in this +tree is performed with the standard unix commands, editors, and tools, +regardless of a file's location. +\par +These cells and the files they export may be geographically dispersed, thus +requiring client machines to access remote file servers across network pathways +varying widely in speed, latency, and reliability. The AFS architecture +encourages this concept of a single, wide-area file system. As of this writing, +the community AFS filespace includes sites spanning the continental United +States and Hawaii, and also reaches overseas to various installations in +Europe, Japan, and Australia. +\li Performance: This is a critical consideration given the scalability and +connectivity requirements described above. A high-performance system in the +face of high client/server ratios and the existence of low-bandwidth, +high-latency network connections as well as the normal high-speed ones is +achieved by two major mechanisms: +\li Caching: Client machines make extensive use of caching techniques wherever +possible. One important application of this methodology is that each client is +required to maintain a cache of files it has accessed from AFS file servers, +performing its operations exclusively on these local copies. This file cache is +organized in a least-recently-used (LRU) fashion. Thus, each machine will build +a local working set of objects being referenced by its users. 
As long as the +cached images remain 'current' (i.e., compatible with the central version +stored at the file servers), operations may be performed on these files without +further communication with the central servers. This results in significant +reductions in network traffic and server loads, paving the way for the target +client/server ratios. +\par +This file cache is typically located on the client's local hard disk, although +a strictly in-memory cache is also supported. The disk cache has the advantage +that its contents will survive crashes and reboots, with the expectation that +the majority of cached objects will remain current. The local cache parameters, +including the maximum number of blocks it may occupy on the local disk, may be +changed on the fly. In order to avoid having the size of the client file cache +become a limit on the length of an AFS file, caching is actually performed on +chunks of the file. These chunks are typically 64 Kbytes in length, although +the chunk size used by the client is settable when the client starts up. +\li Callbacks: The use of caches by the file system, as described above, raises +the thorny issue of cache consistency. Each client must efficiently determine +whether its cached file chunks are identical to the corresponding sections of +the file as stored at the server machine before allowing a user to operate on +those chunks. AFS employs the notion of a callback as the backbone of its cache +consistency algorithm. When a server machine delivers one or more chunks of a +file to a client, it also includes a callback 'promise' that the client will be +notified if any modifications are made to the data in the file. Thus, as long +as the client machine is in possession of a callback for a file, it knows it is +correctly synchronized with the centrally-stored version, and allows its users +to operate on it as desired without any further interaction with the server. +Before a file server stores a more recent version of a file on its own disks, +it will first break all outstanding callbacks on this item. A callback will +eventually time out, even if there are no changes to the file or directory it +covers. +\li Location transparency: The typical AFS user does not know which server or +servers houses any of his or her files. In fact, the user's storage may be +distributed among several servers. This location transparency also allows user +data to be migrated between servers without users having to take corrective +actions, or even becoming aware of the shift. +\li Reliability: The crash of a server machine in any distributed file system +will cause the information it hosts to become unavailable to the user +community. The same effect is caused when server and client machines are +isolated across a network partition. AFS addresses this situation by allowing +data to be replicated across two or more servers in a read-only fashion. If the +client machine loses contact with a particular server from which it is +attempting to fetch data, it hunts among the remaining machines hosting +replicas, looking for one that is still in operation. This search is performed +without the user's knowledge or intervention, smoothly masking outages whenever +possible. Each client machine will automatically perform periodic probes of +machines on its list of known servers, updating its internal records concerning +their status. Consequently, server machines may enter and exit the pool without +administrator intervention. 
+
+\par
+Replication also applies to the various databases employed by the AFS server
+processes. These system databases are read/write replicated with a single
+synchronization site at any instant. If a synchronization site is lost due to
+failure, the remaining database sites elect a new synchronization site
+automatically without operator intervention.
+\li Security: A production file system, especially one which allows and
+encourages transparent access between administrative domains, must be conscious
+of security issues. AFS considers the server machines as 'trusted', being kept
+behind locked doors and only directly manipulated by administrators. On the
+other hand, client machines are, by definition, assumed to exist in inherently
+insecure environments. These client machines are recognized to be fully
+accessible to their users, making AFS servers open to attacks mounted from
+clients with possibly modified hardware, operating systems, and software.
+\li To provide credible file system security, AFS employs an authentication
+system based on the Kerberos facility developed by Project Athena at MIT
+[6][7]. Users operating from client machines are required to interact with
+Authentication Server agents running on the secure server machines to generate
+secure tokens of identity. These tokens express the user's identity in an
+encrypted fashion, and are stored in the kernel of the client machine. When the
+user attempts to fetch or store files, the server may challenge the user to
+verify his or her identity. This challenge, hidden from the user and handled
+entirely by the RPC layer, will transmit this token to the file server involved
+in the operation. The server machine, upon decoding the token and thus
+discovering the user's true identity, will allow the caller to perform the
+operation if permitted.
+\li Access control: The standard unix access control mechanism associates mode
+bits with every file and directory, applying them based on the user's numerical
+identifier and the user's membership in various groups. AFS has augmented this
+traditional access control mechanism with Access Control Lists (ACLs). Every
+AFS directory has an associated ACL which defines the principals or parties
+that may operate on all files contained in the directory, and which operations
+these principals may perform. Rights granted by these ACLs include read, write,
+delete, lookup, insert (create new files, but don't overwrite old files), and
+administer (change the ACL). Principals on these ACLs include individual users
+and groups of users. These groups may be defined by AFS users without
+administrative intervention. AFS ACLs thus provide much finer-grained access
+control over AFS files than the standard unix mechanism.
+\li Administrability: Any system with the scaling goals of AFS must pay close
+attention to its ease of administration. The task of running an AFS
+installation is facilitated via the following mechanisms:
+\li Pervasive RPC interfaces: Access to AFS server agents is performed mostly
+via RPC interfaces. Thus, servers may be queried and operated upon regardless
+of their location. In combination with the security system outlined above, even
+administrative functions such as instigating backups, reconfiguring server
+machines, and stopping and restarting servers may be performed by an
+administrator sitting in front of any AFS-capable machine, as long as the
+administrator holds the proper tokens.
+\li Replication: As AFS supports read-only replication for user data and +read-write replication for system databases, much of the system reconfiguration +work in light of failures is performed transparently and without human +intervention. Administrators thus typically have more time to respond to many +common failure situations. +\li Data mobility: Improved and balanced utilization of disk resources is +facilitated by the fact that AFS supports transparent relocation of user data +between partitions on a single file server machine or between two different +machines. In a situation where a machine must be brought down for an extended +period, all its storage may be migrated to other servers so that users may +continue their work completely unaffected. +\li Automated 'nanny' services: Each file server machine runs a BOS Server +process, which assists in the machine's administration. This server is +responsible for monitoring the health of the AFS agents under its care, +bringing them up in the proper order after a system reboot, answering requests +as to their status and restarting them when they fail. It also accepts commands +to start, suspend, or resume these processes, and install new server binaries. +Accessible via an RPC interface, this supervisory process relieves +administrators of some oversight responsibilities and also allows them to +perform their duties from any machine running AFS, regardless of location or +geographic distance from the targeted file server machine. +\li On-line backup: Backups may be performed on the data stored by the AFS file +server machines without bringing those machines down for the duration. +Copy-on-write 'snapshots' are taken of the data to be preserved, and tape +backup is performed from these clones. One added benefit is that these backup +clones are on-line and accessible by users. Thus, if someone accidentally +deletes a file that is contained in their last snapshot, they may simply copy +its contents as of the time the snapshot was taken back into their active +workspace. This facility also serves to improve the administrability of the +system, greatly reducing the number of requests to restore data from tape. +\li On-line help: The set of provided program tools used to interact with the +active AFS agents are self-documenting in that they will accept command-line +requests for help, displaying descriptive text in response. +\li Statistics: Each AFS agent facilitates collection of statistical data on +its performance, configuration, and status via its RPC interface. Thus, the +system is easy to monitor. One tool that takes advantage of this facility is +the scout program. Scout polls file server machines periodically, displaying +usage statistics, current disk capacities, and whether the server is +unavailable. Administrators monitoring this information can thus quickly react +to correct overcrowded disks and machine crashes. +\li Coexistence: Many organizations currently employ other distributed file +systems, most notably NFS. AFS was designed to run simultaneously with other +DFSs without interfering in their operation. In fact, an NFS-AFS translator +agent exists that allows pure-NFS client machines to transparently access files +in the AFS community. +\li Portability: Because AFS is implemented using the standard VFS and vnode +interfaces pioneered and advanced by Sun Microsystems, AFS is easily portable +between different platforms from a single vendor or from different vendors. 
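+
+\par
+As a small illustration of the chunk-based caching arithmetic described in the
+Caching item above, the fragment below maps a byte offset within a file onto a
+chunk index and an offset within that chunk. It is only a sketch: the 64 Kbyte
+figure is the typical default mentioned earlier, and the names used here
+(EXAMPLE_CHUNKSIZE, example_ChunkOffsets()) are illustrative rather than part
+of any AFS interface.
+
+\code
+#include <stdio.h>
+
+#define EXAMPLE_CHUNKSIZE (64 * 1024)   /* typical default chunk length */
+
+/*
+ * Split a byte offset within a file into a (chunk index, offset in chunk)
+ * pair, assuming fixed-size chunks.  Illustrative only.
+ */
+static void
+example_ChunkOffsets(long fileOffset, long chunkSize,
+                     long *chunkIndex, long *chunkOffset)
+{
+    *chunkIndex = fileOffset / chunkSize;   /* which chunk holds the byte */
+    *chunkOffset = fileOffset % chunkSize;  /* position within that chunk */
+}
+
+int
+main(void)
+{
+    long idx, off;
+
+    /* Byte 200,000 of a file lands in chunk 3, offset 3,392, with 64K chunks. */
+    example_ChunkOffsets(200000L, EXAMPLE_CHUNKSIZE, &idx, &off);
+    printf("chunk %ld, offset %ld\n", idx, off);
+    return 0;
+}
+\endcode
+\par
+Caching on fixed-size chunks in this way keeps the cache footprint required for
+a very large file bounded by the number of chunks actually referenced, rather
+than by the file's total length.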
+
+ \subsection sec1-1-2 Section 1.1.2: Scope of this Document
+
+\par
+This document is a member of a documentation suite providing specifications of
+the operations and interfaces offered by the various AFS servers and agents.
+Specifically, this document will focus on two of these system agents:
+\li File Server: This AFS entity is responsible for providing a central disk
+repository for a particular set of files and for making these files accessible
+to properly-authorized users running on client machines. The File Server is
+implemented as a user-space process.
+\li Cache Manager: This code, running within the kernel of an AFS client
+machine, is a user's representative in communicating with the File Servers,
+fetching files back and forth into the local cache as needed. The Cache Manager
+also keeps information as to the composition of its own cell as well as the
+other AFS cells in existence. It resolves file references and operations,
+determining the proper File Server (or group of File Servers) that may satisfy
+the request. In addition, it is a reliable repository for the user's
+authentication information, holding on to the user's tokens and wielding them
+as necessary when challenged.
+
+ \subsection sec1-1-3 Section 1.1.3: Related Documents
+
+\par
+The full AFS specification suite of documents is listed below:
+\li AFS-3 Programmer's Reference: Architectural Overview: This paper provides
+an architectural overview of the AFS distributed file system, describing the
+full set of servers and agents in a coherent way, illustrating their
+relationships to each other and examining their interactions.
+\li AFS-3 Programmer's Reference: Volume Server/Volume Location Server
+Interface: This document describes the services through which 'containers' of
+related user data are located and managed.
+\li AFS-3 Programmer's Reference: Protection Server Interface: This paper
+describes the server responsible for providing two-way mappings between
+printable usernames and their internal AFS identifiers. The Protection Server
+also allows users to create, destroy, and manipulate 'groups' of users, which
+are suitable for placement on ACLs.
+\li AFS-3 Programmer's Reference: BOS Server Interface: This paper explicates
+the 'nanny' service described above, which assists in the administrability of
+the AFS environment.
+\li AFS-3 Programmer's Reference: Specification for the Rx Remote Procedure
+Call Facility: This document specifies the design and operation of the remote
+procedure call and lightweight process packages used by AFS.
+\par
+In addition to these papers, the AFS 3.1 product is delivered with its own
+user, administrator, installation, and command reference documents.
+
+ \section sec1-2 Section 1.2: Basic Concepts
+
+\par
+To properly understand AFS operation, specifically the tasks and objectives of
+the File Server and Cache Manager, it is necessary to introduce and explain the
+following concepts:
+\li Cell: A cell is the set of server and client machines operated by an
+administratively independent organization. The cell administrators make
+decisions concerning such issues as server deployment and configuration, user
+backup schedules, and replication strategies on their own hardware and disk
+storage completely independently from those implemented by other cell
+administrators regarding their own domains.
Every client machine belongs to +exactly one cell, and uses that information to determine the set of database +servers it uses to locate system resources and generate authentication +information. +\li Volume: AFS disk partitions do not directly host individual user files or +directories. Rather, connected subtrees of the system's directory structure are +placed into containers called volumes. Volumes vary in size dynamically as +objects are inserted, overwritten, and deleted. Each volume has an associated +quota, or maximum permissible storage. A single unix disk partition may host +one or more volumes, and in fact may host as many volumes as physically fit in +the storage space. However, a practical maximum is 3,500 volumes per disk +partition, since this is the highest number currently handled by the salvager +program. The salvager is run on occasions where the volume structures on disk +are inconsistent, repairing the damage. A compile-time constant within the +salvager imposes the above limit, causing it to refuse to repair any +inconsistent partition with more than 3,500 volumes. Volumes serve many +purposes within AFS. First, they reduce the number of objects with which an +administrator must be concerned, since operations are normally performed on an +entire volume at once (and thus on all files and directories contained within +the volume). In addition, volumes are the unit of replication, data mobility +between servers, and backup. Disk utilization may be balanced by transparently +moving volumes between partitions. +\li Mount Point: The connected subtrees contained within individual volumes +stored at AFS file server machines are 'glued' to their proper places in the +file space defined by a site, forming a single, apparently seamless unix tree. +These attachment points are referred to as mount points. Mount points are +persistent objects, implemented as symbolic links whose contents obey a +stylized format. Thus, AFS mount points differ from NFS-style mounts. In the +NFS environment, the user dynamically mounts entire remote disk partitions +using any desired name. These mounts do not survive client restarts, and do not +insure a uniform namespace between different machines. +\par +As a Cache Manager resolves an AFS pathname as part of a file system operation +initiated by a user process, it recognizes mount points and takes special +action to resolve them. The Cache Manager consults the appropriate Volume +Location Server to discover the File Server (or set of File Servers) hosting +the indicated volume. This location information is cached, and the Cache +Manager then proceeds to contact the listed File Server(s) in turn until one is +found that responds with the contents of the volume's root directory. Once +mapped to a real file system object, the pathname resolution proceeds to the +next component. +\li Database Server: A set of AFS databases is required for the proper +functioning of the system. Each database may be replicated across two or more +file server machines. Access to these databases is mediated by a database +server process running at each replication site. One site is declared to be the +synchronization site, the sole location accepting requests to modify the +databases. All other sites are read-only with respect to the set of AFS users. +When the synchronization site receives an update to its database, it +immediately distributes it to the other sites. 
Should a synchronization site go +down through either a hard failure or a network partition, the remaining sites +will automatically elect a new synchronization site if they form a quorum, or +majority. This insures that multiple synchronization sites do not become active +in the network partition scenario. +\par +The classes of AFS database servers are listed below: +\li Authentication Server: This server maintains the authentication database +used to generate tokens of identity. +\li Protection Server: This server maintains mappings between human-readable +user account names and their internal numerical AFS identifiers. It also +manages the creation, manipulation, and update of user-defined groups suitable +for use on ACLs. +\li Volume Location Server: This server exports information concerning the +location of the individual volumes housed within the cell. + + \section sec1-3 Section 1.3: Document Layout + +\par +Following this introduction and overview, Chapter 2 describes the architecture +of the File Server process design. Similarly, Chapter 3 describes the +architecture of the in-kernel Cache Manager agent. Following these +architectural examinations, Chapter 4 provides a set of basic coding +definitions common to both the AFS File Server and Cache Manager, required to +properly understand the interface specifications which follow. Chapter 5 then +proceeds to specify the various File Server interfaces. The myriad Cache +Manager interfaces are presented in Chapter 6, thus completing the document. + + \page chap2 Chapter 2: File Server Architecture + + \section sec2-1 Section 2.1: Overview + +\par +The AFS File Server is a user-level process that presides over the raw disk +partitions on which it supports one or more volumes. It provides 'half' of the +fundamental service of the system, namely exporting and regimenting access to +the user data entrusted to it. The Cache Manager provides the other half, +acting on behalf of its human users to locate and access the files stored on +the file server machines. +\par +This chapter examines the structure of the File Server process. First, the set +of AFS agents with which it must interact are discussed. Next, the threading +structure of the server is examined. Some details of its handling of the race +conditions created by the callback mechanism are then presented. This is +followed by a discussion of the read-only volume synchronization mechanism. +This functionality is used in each RPC interface call and intended to detect +new releases of read-only volumes. File Servers do not generate callbacks for +objects residing in read-only volumes, so this synchronization information is +used to implement a 'whole-volume' callback. Finally, the fact that the File +Server may drop certain information recorded about the Cache Managers with +which it has communicated and yet guarantee correctness of operation is +explored. + + \section sec2-2 Section 2.2: Interactions + +\par +By far the most frequent partner in File Server interactions is the set of +Cache Managers actively fetching and storing chunks of data files for which the +File Server provides central storage facilities. The File Server also +periodically probes the Cache Managers recorded in its tables with which it has +recently dealt, determining if they are still active or whether their records +might be garbage-collected. +\par +There are two other server entities with which the File Server interacts, +namely the Protection Server and the BOS Server. 
Given a fetch or store request
+generated by a Cache Manager, the File Server needs to determine if the caller
+is authorized to perform the given operation. An important step in this process
+is to determine what is referred to as the caller's Current Protection
+Subdomain, or CPS. A user's CPS is a list of principals, beginning with the
+user's internal identifier, followed by the numerical identifiers for all
+groups to which the user belongs. Once this CPS information is determined, the
+File Server scans the ACL controlling access to the file system object in
+question. If it finds that the ACL contains an entry specifying a principal
+with the appropriate rights which also appears in the user's CPS, then the
+operation is cleared. Otherwise, it is rejected and a protection violation is
+reported to the Cache Manager for ultimate reflection back to the caller. (An
+illustrative sketch of this check appears at the end of Section 2.3 below.)
+\par
+The BOS Server performs administrative operations on the File Server process.
+Thus, their interactions are quite one-sided, and always initiated by the BOS
+Server. The BOS Server does not utilize the File Server's RPC interface, but
+rather generates unix signals to achieve the desired effect.
+
+ \section sec2-3 Section 2.3: Threading
+
+\par
+The File Server is organized as a multi-threaded server. Its threaded behavior
+within a single unix process is achieved by use of the LWP lightweight process
+facility, as described in detail in the companion "AFS-3 Programmer's
+Reference: Specification for the Rx Remote Procedure Call Facility" document.
+The various threads utilized by the File Server are described below:
+\li WorkerLWP: This lightweight process sleeps until a request to execute one
+of the RPC interface functions arrives. It pulls the relevant information out
+of the request, including any incoming data delivered as part of the request,
+and then executes the server stub routine to carry out the operation. The
+thread finishes its current activation by feeding the return code and any
+output data through the RPC channel back to the calling Cache Manager. The
+File Server initialization sequence specifies that at least three but no more
+than six of these WorkerLWP threads are to exist at any one time. It is
+currently not possible to configure the File Server process with a different
+number of WorkerLWP threads.
+\li FiveMinuteCheckLWP: This thread runs every five minutes, performing such
+housekeeping chores as cleaning up timed-out callbacks, setting disk usage
+statistics, and executing the special handling required by certain AIX
+implementations. Generally, this thread performs activities that do not take
+unbounded time to accomplish and do not block the thread. If reassurance is
+required, FiveMinuteCheckLWP can also be told to print out a banner message to
+the machine's console every so often, stating that the File Server process is
+still running. This is not strictly necessary, and is an artifact of earlier
+versions, as the File Server's status is now easily accessible at any time
+through the BOS Server running on its machine.
+\li HostCheckLWP: This thread, also activated every five minutes, performs
+periodic checking of the status of Cache Managers that have been previously
+contacted and thus appear in this File Server's internal tables. It generates
+RXAFSCB_Probe() calls from the Cache Manager interface, and may find itself
+suspended for an arbitrary amount of time when it encounters unreachable Cache
+Managers.
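+
+\par
+To make the access check of Section 2.2 concrete, the fragment below sketches
+the CPS-based test described there: the object's ACL is scanned for an entry
+that both grants the requested rights and names a principal appearing in the
+caller's CPS. The types and names used here (struct exampleACLEntry,
+examplePermitted(), and so on) are purely illustrative and are not the File
+Server's actual data structures or interfaces.
+
+\code
+/* Illustrative only; not the File Server's real representation. */
+struct exampleACLEntry {
+    long principal;     /* user or group identifier named on the ACL */
+    long rights;        /* bit mask of rights granted to that principal */
+};
+
+/*
+ * Return 1 if some ACL entry grants all of the requested rights and names a
+ * principal appearing in the caller's CPS, and 0 otherwise (a protection
+ * violation).
+ */
+static int
+examplePermitted(const long *cps, int cpsCount,
+                 const struct exampleACLEntry *acl, int aclCount,
+                 long wantedRights)
+{
+    int i, j;
+
+    for (i = 0; i < aclCount; i++) {
+        if ((acl[i].rights & wantedRights) != wantedRights)
+            continue;                      /* entry lacks a needed right */
+        for (j = 0; j < cpsCount; j++) {
+            if (cps[j] == acl[i].principal)
+                return 1;                  /* caller holds this principal */
+        }
+    }
+    return 0;
+}
+\endcode
+\par
+Because the CPS already lists the caller's own identifier together with all of
+the caller's group identifiers, a single pass over the directory's ACL is
+sufficient to decide the request.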
+
+ \section sec2-4 Section 2.4: Callback Race Conditions
+
+\par
+Callbacks serve to implement the efficient AFS cache consistency mechanism, as
+described in Section 1.1.1. Because of the asynchronous nature of callback
+generation and the multi-threaded operation and organization of both the File
+Server and Cache Manager, race conditions can arise in their use. As an
+example, consider the case of a client machine fetching a chunk of file X. The
+File Server thread activated to carry out the operation ships the contents of
+the chunk and the callback information over to the requesting Cache Manager.
+Before the corresponding Cache Manager thread involved in the exchange can be
+scheduled, another request arrives at the File Server, this time storing a
+modified image of the same chunk from file X. Another worker thread comes to
+life and completes processing of this second request, including execution of an
+RXAFSCB_CallBack() to the Cache Manager, which has not yet picked up the
+results of its fetch operation. If the Cache Manager blindly honors the
+RXAFSCB_CallBack() operation first and then proceeds to process the fetch, it
+will wind up believing it has a callback on X when in reality it is out of sync
+with the central copy on the File Server. To resolve the above class of
+callback race condition, the Cache Manager effectively double-checks the
+callback information received from File Server calls, making sure it has not
+already been nullified by other file system activity.
+
+ \section sec2-5 Section 2.5: Read-Only Volume Synchronization
+
+\par
+The File Server issues a callback for each file chunk it delivers from a
+read-write volume, thus allowing Cache Managers to efficiently synchronize
+their local caches with the authoritative File Server images. However, no
+callbacks are issued when data from read-only volumes is delivered to clients.
+Thus, it is possible for a new snapshot of the read-only volume to be
+propagated to the set of replication sites without Cache Managers becoming
+aware of the event and marking the appropriate chunks in their caches as stale.
+Although the Cache Manager refreshes its volume version information
+periodically (once an hour), there is still a window where a Cache Manager will
+fail to notice that it has outdated chunks.
+\par
+The volume synchronization mechanism was defined to close this window,
+resulting in what is nearly a 'whole-volume' callback device for read-only
+volumes. Each File Server RPC interface function handling the transfer of file
+data is equipped with a parameter (a volSyncP), which carries this volume
+synchronization information. This parameter is set to a non-zero value by the
+File Server exclusively when the data being fetched is coming from a read-only
+volume. Although the struct AFSVolSync defined in Section 5.1.2.2 and passed
+via a volSyncP consists of six longwords, only the first one is set. This
+leading longword carries the creation date of the read-only volume. The Cache
+Manager immediately compares the synchronization value stored in its cached
+volume information against the one just received. If they are identical, then
+the operation is free to complete, secure in the knowledge that all the
+information and files held from that volume are still current. A mismatch,
+though, indicates that every file chunk from this volume is potentially out of
+date, having come from a previous release of the read-only volume.
In this case, the Cache Manager proceeds to mark every chunk from this volume
+as suspect. The next time the Cache Manager considers accessing any of these
+chunks, it first checks with the File Server from which the chunks were
+obtained to see if they are up to date.
+
+ \section sec2-6 Section 2.6: Disposal of Cache Manager Records
+
+\par
+Every File Server, when first starting up, will, by default, allocate enough
+space to record 20,000 callback promises (see Section 5.3 for how to override
+this default). Should the File Server fully populate its callback records, it
+will not allocate more, since that would allow its memory image to grow in an
+unbounded fashion. Rather, the File Server chooses to break existing callbacks
+until it acquires a free record. All reachable Cache Managers respond by
+marking their cache entries appropriately, preserving the consistency
+guarantee. In fact, a File Server may arbitrarily and unilaterally purge itself
+of all records associated with a particular Cache Manager. Such actions will
+reduce performance (forcing the affected Cache Managers to revalidate items
+cached from that File Server) without sacrificing correctness.
+
+ \page chap3 Chapter 3: Cache Manager Architecture
+
+ \section sec3-1 Section 3.1: Overview
+
+\par
+The AFS Cache Manager is a kernel-resident agent with the following duties and
+responsibilities:
+\li Users are to be given the illusion that files stored in the AFS distributed
+file system are in fact part of the local unix file system of their client
+machine. There are several areas in which this illusion is not fully realized:
+\li Semantics: Full unix semantics are not maintained by the set of agents
+implementing the AFS distributed file system. The largest deviation involves
+the time when changes made to a file are seen by others who also have the file
+open. In AFS, modifications made to a cached copy of a file are not necessarily
+reflected immediately to the central copy (the one hosted by File Server disk
+storage), and thus to other cache sites. Rather, the changes are only
+guaranteed to be visible to others who simultaneously have their own cached
+copies open when the modifying process executes a unix close() operation on the
+file.
+\par
+This differs from the semantics expected from the single-machine, local unix
+environment, where writes performed on one open file descriptor are immediately
+visible to all processes reading the file via their own file descriptors. Thus,
+instead of the standard "last writer wins" behavior, users see "last closer
+wins" behavior on their AFS files. Incidentally, other DFSs, such as NFS, do
+not implement full unix semantics in this case either.
+\li Partial failures: A panic experienced by a local, single-machine unix file
+system will, by definition, cause all local processes to terminate immediately.
+On the other hand, any hard or soft failure experienced by a File Server
+process or the machine upon which it is executing does not cause any of the
+Cache Managers interacting with it to crash. Rather, the Cache Managers must
+now reflect their failure to obtain responses from the affected File Server
+back up to their callers. Network partitions also induce the same behavior.
+From the user's point of view, part of the file system tree has become
+inaccessible. In addition, certain system calls (e.g., open() and read()) may
+return unexpected failures to their users.
Thus, certain coding practices that have become common amongst experienced
+(single-machine) unix programmers (e.g., not checking error codes from
+operations that "can't" fail) cause these programs to misbehave in the face of
+partial failures.
+\par
+To support this transparent access paradigm, the Cache Manager proceeds to:
+\li Intercept all standard unix operations directed towards AFS objects,
+mapping them to references aimed at the corresponding copies in the local
+cache.
+\li Keep a synchronized local cache of AFS files referenced by the client
+machine's users. If the chunks involved in an operation reading data from an
+object are either stale or do not exist in the local cache, then they must be
+fetched from the File Server(s) on which they reside. This may require a query
+to the volume location service in order to locate the place(s) of residence.
+Authentication challenges from File Servers needing to verify the caller's
+identity are handled by the Cache Manager, and the chunk is then incorporated
+into the cache.
+\li Upon receipt of a unix close, all dirty chunks belonging to the object will
+be flushed back to the appropriate File Server.
+\li Callback deliveries and withdrawals from File Servers must be processed,
+keeping the local cache in close synchrony with the state of affairs at the
+central store.
+\li Interfaces are also provided for those principals who wish to perform
+AFS-specific operations, such as Access Control List (ACL) manipulations or
+changes to the Cache Manager's configuration.
+\par
+This chapter takes a tour of the Cache Manager's architecture, and examines how
+it supports these roles and responsibilities. First, the set of AFS agents with
+which it must interact is discussed. Next, some of the Cache Manager's
+implementation and interface choices are examined. Finally, the server's
+ability to arbitrarily dispose of callback information without affecting the
+correctness of the cache consistency algorithm is explained.
+
+ \section sec3-2 Section 3.2: Interactions
+
+\par
+The main AFS agent interacting with a Cache Manager is the File Server. The
+most common operation performed by the Cache Manager is to act as its users'
+agent in fetching and storing files to and from the centralized repositories.
+Related to this activity, a Cache Manager must be prepared to answer queries
+from a File Server concerning its health. It must also be able to accept
+callback revocation notices generated by File Servers. Since the Cache Manager
+not only engages in data transfer but must also determine where the data is
+located in the first place, it also directs inquiries to Volume Location Server
+agents. There must also be an interface allowing direct interactions with both
+common and administrative users. Certain AFS-specific operations must be made
+available to these parties. In addition, administrative users may desire to
+dynamically reconfigure the Cache Manager. For example, information about a
+newly-created cell may be added without restarting the client's machine.
+
+ \section sec3-3 Section 3.3: Implementation Techniques
+
+\par
+The above roles and behaviors for the Cache Manager influenced the
+implementation choices and methods used to construct it, along with the desire
+to maximize portability.
This section begins by showing how the VFS/vnode +interface, pioneered and standardized by Sun Microsystems, provides not only +the necessary fine-grain access to user file system operations, but also +facilitates Cache Manager ports to new hardware and operating system platforms. +Next, the use of unix system calls is examined. Finally, the threading +structure employed is described. + + \subsection sec3-3-1 Section 3.3.1: VFS Interface + +\par +As mentioned above, Sun Microsystems has introduced and propagated an important +concept in the file system world, that of the Virtual File System (VFS) +interface. This abstraction defines a core collection of file system functions +which cover all operations required for users to manipulate their data. System +calls are written in terms of these standardized routines. Also, the associated +vnode concept generalizes the original unix inode idea and provides hooks for +differing underlying environments. Thus, to port a system to a new hardware +platform, the system programmers have only to construct implementations of this +base array of functions consistent with the new underlying machine. +\par +The VFS abstraction also allows multiple file systems (e.g., vanilla unix, DOS, +NFS, and AFS) to coexist on the same machine without interference. Thus, to +make a machine AFS-capable, a system designer first extends the base vnode +structure in well-defined ways in order to store AFS-specific operations with +each file description. Then, the base function array is coded so that calls +upon the proper AFS agents are made to accomplish each function's standard +objectives. In effect, the Cache Manager consists of code that interprets the +standard set of unix operations imported through this interface and executes +the AFS protocols to carry them out. + + \subsection sec3-3-2 Section 3.3.2: System Calls + +\par +As mentioned above, many unix system calls are implemented in terms of the base +function array of vnode-oriented operations. In addition, one existing system +call has been modified and two new system calls have been added to perform +AFS-specific operations apart from the Cache Manager's unix 'emulation' +activities. The standard ioctl() system call has been augmented to handle +AFS-related operations on objects accessed via open unix file descriptors. One +of the brand-new system calls is pioctl(), which is much like ioctl() except it +names targeted objects by pathname instead of file descriptor. Another is afs +call(), which is used to initialize the Cache Manager threads, as described in +the section immediately following. + + \subsection sec3-3-3 Section 3.3.3: Threading + +\par +In order to execute its many roles, the Cache Manager is organized as a +multi-threaded entity. It is implemented with (potentially multiple +instantiations of) the following three thread classes: +\li CallBack Listener: This thread implements the Cache Manager callback RPC +interface, as described in Section 6.5. +\li Periodic Maintenance: Certain maintenance and checkup activities need to be +performed at five set intervals. Currently, the frequency of each of these +operations is hard-wired. It would be a simple matter, though, to make these +times configurable by adding command-line parameters to the Cache Manager. +\li Thirty seconds: Flush pending writes for NFS clients coming in through the +NFS-AFS Translator facility. 
+\li One minute: Make sure local cache usage is below the assigned quota, write +out dirty buffers holding directory data, and keep flock()s alive. +\li Three minutes: Check for the resuscitation of File Servers previously +determined to be down, and check the cache of previously computed access +information in light of any newly expired tickets. +\li Ten minutes: Check health of all File Servers marked as active, and +garbage-collect old RPC connections. +\li One hour: Check the status of the root AFS volume as well as all cached +information concerning read-only volumes. +\li Background Operations: The Cache Manager is capable of prefetching file +system objects, as well as carrying out delayed stores, occurring sometime +after a close() operation. At least two threads are created at Cache Manager +initialization time and held in reserve to carry out these objectives. This +class of background threads implements the following three operations: +\li Prefetch operation: Fetches particular file system object chunks in the +expectation that they will soon be needed. +\li Path-based prefetch operation: The prefetch daemon mentioned above operates +on objects already at least partly resident in the local cache, referenced by +their vnode. The path-based prefetch daemon performs the same actions, but on +objects named solely by their unix pathname. +\li Delayed store operation: Flush all modified chunks from a file system +object to the appropriate File Server's disks. + + \section sec3-4 Section 3.4: Disposal of Cache Manager Records + +\par +The Cache Manager is free to throw away any or all of the callbacks it has +received from the set of File Servers from which it has cached files. This +housecleaning does not in any way compromise the correctness of the AFS cache +consistency algorithm. The File Server RPC interface described in this paper +provides a call to allow a Cache Manager to advise of such unilateral +jettisoning. However, failure to use this routine still leaves the machine's +cache consistent. Let us examine the case of a Cache Manager on machine C +disposing of its callback on file X from File Server F. The next user access on +file X on machine C will cause the Cache Manager to notice that it does not +currently hold a callback on it (although the File Server will think it does). +The Cache Manager on C attempts to revalidate its entry when it is entirely +possible that the file is still in sync with the central store. In response, +the File Server will extend the existing callback information it has and +deliver the new promise to the Cache Manager on C. Now consider the case where +file X is modified by a party on a machine other than C before such an access +occurs on C. Under these circumstances, the File Server will break its callback +on file X before performing the central update. The Cache Manager on C will +receive one of these "break callback" messages. Since it no longer has a +callback on file X, the Cache Manager on C will cheerfully acknowledge the File +Server's notification and move on to other matters. In either case, the +callback information for both parties will eventually resynchronize. The only +potential penalty paid is extra inquiries by the Cache Manager and thus +providing for reduced performance instead of failure of operation. + + \page chap4 Chapter 4: Common Definitions and Data Structures + +\par +This chapter discusses the definitions used in common by the File Server and +the Cache Manager. 
They appear in the common.xg file, used by Rxgen to generate
+the C code instantiations of these definitions.
+
+ \section sec4-1 Section 4.1: File-Related Definitions
+
+ \subsection sec4-1-1 Section 4.1.1: struct AFSFid
+
+\par
+This is the type used to identify file system objects within AFS.
+\n \b Fields
+\li unsigned long Volume - This provides the identifier for the volume in which
+the object resides.
+\li unsigned long Vnode - This specifies the index within the given volume
+corresponding to the object.
+\li unsigned long Unique - This is a 'uniquifier' or generation number for the
+slot identified by the Vnode field.
+
+ \section sec4-2 Section 4.2: Callback-related Definitions
+
+ \subsection sec4-2-1 Section 4.2.1: Types of Callbacks
+
+\par
+There are three types of callbacks defined by AFS-3:
+
+\li EXCLUSIVE: This version of callback has not been implemented. Its intent
+was to allow a single Cache Manager to have exclusive rights on the associated
+file data.
+\li SHARED: This callback type indicates that the status information kept by a
+Cache Manager for the associated file is up to date. All cached chunks from
+this file whose version numbers match the status information are thus
+guaranteed to also be up to date. This type of callback is non-exclusive,
+allowing any number of other Cache Managers to have callbacks on this file and
+cache chunks from the file.
+\li DROPPED: This is used to indicate that the given callback promise has been
+cancelled by the issuing File Server. The Cache Manager is forced to mark the
+status of its cache entry as unknown, forcing it to stat the file the next time
+a user attempts to access any chunk from it.
+
+ \subsection sec4-2-2 Section 4.2.2: struct AFSCallBack
+
+\par
+This is the canonical callback structure passed in many File Server RPC
+interface calls.
+\n \b Fields
+\li unsigned long CallBackVersion - Callback version number.
+\li unsigned long ExpirationTime - Time when the callback expires, measured in
+seconds.
+\li unsigned long CallBackType - The type of callback involved, one of
+EXCLUSIVE, SHARED, or DROPPED.
+
+ \subsection sec4-2-3 Section 4.2.3: Callback Arrays
+
+\par
+AFS-3 sometimes does callbacks in bulk. Up to AFSCBMAX (50) callbacks can be
+handled at once. Layouts for the two related structures implementing callback
+arrays, struct AFSCBFids and struct AFSCBs, follow below. Note that the
+callback descriptor in slot i of the array in the AFSCBs structure applies to
+the file identifier contained in slot i in the fid array in the matching
+AFSCBFids structure.
+
+ \subsubsection sec4-2-3-1 Section 4.2.3.1: struct AFSCBFids
+
+\n \b Fields
+\li u_int AFSCBFids_len - Number of AFS file identifiers stored in the
+structure, up to a maximum of AFSCBMAX.
+\li AFSFid *AFSCBFids_val - Pointer to the first element of the array of file
+identifiers.
+
+ \subsubsection sec4-2-3-2 Section 4.2.3.2: struct AFSCBs
+
+\n \b Fields
+\li u_int AFSCBs_len - Number of AFS callback descriptors stored in the
+structure, up to a maximum of AFSCBMAX.
+\li AFSCallBack *AFSCBs_val - Pointer to the actual array of callback
+descriptors.
+
+ \section sec4-3 Section 4.3: Locking Definitions
+
+ \subsection sec4-3-1 Section 4.3.1: struct AFSDBLockDesc
+
+\par
+This structure describes the state of an AFS lock.
+\n \b Fields
+\li char waitStates - Types of lockers waiting for the lock.
+\li char exclLocked - Does anyone have a boosted, shared, or write lock?
(A boosted lock allows the holder to have data read-locked and then 'boost' up
+to a write lock on the data without ever relinquishing the lock.)
+\li char readersReading - Number of readers that actually hold a read lock on
+the associated object.
+\li char numWaiting - Total number of parties waiting to acquire this lock in
+some fashion.
+
+ \subsection sec4-3-2 Section 4.3.2: struct AFSDBCacheEntry
+
+\par
+This structure defines the description of a Cache Manager local cache entry, as
+made accessible via the RXAFSCB_GetCE() callback RPC call. Note that File
+Servers do not make the above call. Rather, client debugging programs (such as
+cmdebug) are the agents which call RXAFSCB_GetCE().
+\n \b Fields
+\li long addr - Memory location in the Cache Manager where this description is
+located.
+\li long cell - Cell part of the fid.
+\li AFSFid netFid - Network (standard) part of the fid.
+\li long Length - Number of bytes in the cache entry.
+\li long DataVersion - Data version number for the contents of the cache entry.
+\li struct AFSDBLockDesc lock - Status of the lock object controlling access to
+this cache entry.
+\li long callback - Index in callback records for this object.
+\li long cbExpires - Time when the callback expires.
+\li short refCount - General reference count.
+\li short opens - Number of opens performed on this object.
+\li short writers - Number of writers active on this object.
+\li char mvstat - The file classification, indicating one of normal file, mount
+point, or volume root.
+\li char states - Remembers the state of the given file with a set of
+bits indicating, from lowest-order to highest-order: stat info valid, read-only
+file, mount point valid, pending core file, wait-for-store, and mapped file.
+
+ \subsection sec4-3-3 Section 4.3.3: struct AFSDBLock
+
+\par
+This is a fuller description of an AFS lock, including a string name used to
+identify it.
+\n \b Fields
+\li char name[16] - String name of the lock.
+\li struct AFSDBLockDesc lock - Contents of the lock itself.
+
+ \section sec4-4 Section 4.4: Miscellaneous Definitions
+
+ \subsection sec4-4-1 Section 4.4.1: Opaque structures
+
+\par
+A maximum size for opaque structures passed via the File Server interface is
+defined as AFSOPAQUEMAX. Currently, this is set to 1,024 bytes. The AFSOpaque
+typedef is defined for use by those parameters that wish their contents to
+travel completely uninterpreted across the network.
+
+ \subsection sec4-4-2 Section 4.4.2: String Lengths
+
+\par
+Two common definitions used to specify basic AFS string lengths are AFSNAMEMAX
+and AFSPATHMAX. AFSNAMEMAX places an upper limit of 256 characters on such
+things as file and directory names passed as parameters. AFSPATHMAX defines the
+longest pathname expected by the system, composed of slash-separated instances
+of the individual directory and file names mentioned above. The longest
+acceptable pathname is currently set to 1,024 characters.
+
+ \page chap5 Chapter 5: File Server Interfaces
+
+\par
+There are several interfaces offered by the File Server, allowing it to export
+the files stored within the set of AFS volumes resident on its disks to the AFS
+community in a secure fashion and to perform self-administrative tasks. This
+chapter will cover the three File Server interfaces, summarized below. There is
+one File Server interface that will not be discussed in this document, namely
+that used by the Volume Server.
It will be fully described in the companion
+AFS-3 Programmer's Reference: Volume Server/Volume Location Server Interface.
+\li RPC: This is the main File Server interface, supporting all of the Cache
+Manager's needs for providing its own clients with appropriate access to file
+system objects stored within AFS. It is closely tied to the callback interface
+exported by the Cache Manager as described in Section 6.5, which has special
+implications for any application program making direct calls to this interface.
+\li Signals: Certain operations on a File Server must be performed by sending
+it unix signals on the machine on which it is executing. These operations
+include performing clean shutdowns and adjusting debugging output levels.
+Properly-authenticated administrative users do not have to be physically logged
+into a File Server machine to generate these signals. Rather, they may use the
+RPC interface exported by that machine's BOS Server process to generate them
+from any AFS-capable machine.
+\li Command Line: Many of the File Server's operating parameters may be set
+upon startup via its command line interface. Such choices as the number of data
+buffers and callback records to hold in memory may be made here, along with
+various other decisions such as lightweight thread stack size.
+
+ \section sec5-1 Section 5.1: RPC Interface
+
+ \subsection sec5-1-1 Section 5.1.1: Introduction and Caveats
+
+\par
+The documentation for the AFS-3 File Server RPC interface commences with some
+basic definitions and data structures used in conjunction with the function
+calls. This is followed by an examination of the set of non-streamed RPC
+functions, namely those routines whose parameters are all fixed in size. Next,
+the streamed RPC functions, those with parameters that allow an arbitrary
+amount of data to be delivered, are described. A code fragment and accompanying
+description and analysis are offered as an example of how to use the streamed
+RPC calls. Finally, a description of the special requirements on any
+application program making direct calls to this File Server interface appears.
+The File Server assumes that any entity making calls to its RPC functionality
+is a bona fide and full-fledged Cache Manager. Thus, it expects this caller to
+export the Cache Manager's own RPC interface, even if the application simply
+uses File Server calls that do not transfer files and thus do not generate
+callbacks.
+\par
+Within those sections describing the RPC functions themselves, the purpose of
+each call is detailed, and the nature and use of its parameters are documented.
+Each of these RPC interface routines returns an integer error code, and a
+subset of the possible values is described. A complete and systematic list of
+potential error returns for each function is difficult to construct and
+unwieldy to examine. This is due to the fact that error codes from many
+different packages and from many different levels may arise. Instead of
+attempting completeness, the error return descriptions discuss error codes
+generated within the functions themselves (or a very small number of code
+levels below them) within the File Server code itself, and not from such
+associated packages as the Rx, volume, and protection modules. Many of these
+error codes are defined in the companion AFS-3 documents.
+\par
+By convention, a return value of zero reveals that the function call was
+successful and that all of its OUT parameters have been set by the File Server.
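+
+\par
+As a brief illustration of these conventions, the fragment below makes a single
+unauthenticated call to the statistics-gathering routine whose output structure
+is described in Section 5.1.2.6, and tests the return value against zero. This
+is only a sketch: the header names, the File Server port (7000), the Rx service
+number (1), the use of a null security object, and the precise argument list of
+RXAFS_GetStatistics() are assumptions rather than normative statements, and a
+complete application would also have to export the Cache Manager callback
+interface, as explained above.
+
+\code
+#include <stdio.h>
+#include <netinet/in.h>
+#include <rx/rx.h>              /* assumed header locations */
+#include <afs/afsint.h>
+
+int
+main(void)
+{
+    struct rx_connection *conn;
+    ViceStatistics stats;
+    int code;
+
+    rx_Init(0);                                 /* start the Rx package */
+
+    /* Unauthenticated connection to a File Server; the address, port, and
+     * service number shown here are illustrative assumptions. */
+    conn = rx_NewConnection(htonl(0x7f000001),  /* server address (loopback) */
+                            htons(7000),        /* assumed File Server port */
+                            1,                  /* assumed RXAFS service id */
+                            rxnull_NewClientSecurityObject(),
+                            0);
+
+    code = RXAFS_GetStatistics(conn, &stats);
+    if (code == 0)                              /* zero means success ... */
+        printf("File Server started at %lu\n",
+               (unsigned long) stats.StartTime);
+    else                                        /* ... non-zero is an error */
+        printf("RXAFS_GetStatistics failed, code %d\n", code);
+    return code;
+}
+\endcode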
+
+ \subsection sec5-1-2 Section 5.1.2: Definitions and Structures
+
+ \subsubsection sec5-1-2-1 Section 5.1.2.1: Constants and Typedefs
+
+\par
+The following constants and typedefs are required to properly use the File
+Server RPC interface, both to provide values and to interpret information
+returned by the calls. The constants appear first, followed by the list of
+typedefs, which sometimes depend on the constants above. Items are alphabetized
+within each group.
+\par
+All of the constants appearing below whose names contain the XSTAT string are
+used in conjunction with the extended data collection facility supported by the
+File Server. The File Server defines some number of data collections, each of
+which consists of an array of longword values computed by the File Server.
+\par
+There are currently two data collections defined for the File Server. The first
+is identified by the AFS_XSTATSCOLL_CALL_INFO constant. This collection of
+longwords records the number of times each internal function within the File
+Server code has been executed, thus providing profiling information. The second
+File Server data collection is identified by the AFS_XSTATSCOLL_PERF_INFO
+constant. This set of longwords contains information related to the File
+Server's performance.
+
+\par Section 5.1.2.1.1 AFS_DISKNAMESIZE [Value = 32]
+Specifies the maximum length of an AFS disk partition name, used directly in
+the definition for the DiskName typedef. A DiskName appears as part of a struct
+ViceDisk, a group of which appear inside a struct ViceStatistics, used for
+carrying basic File Server statistics information.
+
+\par Section 5.1.2.1.2 AFS_MAX_XSTAT_LONGS [Value = 1,024]
+Defines the maximum size for a File Server data collection, as exported via the
+RXAFS_GetXStats() RPC call. It is used directly in the AFS_CollData typedef.
+
+\par Section 5.1.2.1.3 AFS_XSTATSCOLL_CALL_INFO [Value = 0]
+This constant identifies the File Server's data collection containing profiling
+information on the number of times each of its internal procedures has been
+called.
+\par
+Please note that this data collection is not supported by the File Server at
+this time. A request for this data collection will result in the return of a
+zero-length array.
+
+\par Section 5.1.2.1.4 AFS_XSTATSCOLL_PERF_INFO [Value = 1]
+This constant identifies the File Server's data collection containing
+performance-related information.
+
+\par Section 5.1.2.1.5 AFS_CollData [typedef long AFS_CollData;]
+This typedef is used by Rxgen to create a structure used to pass File Server
+data collections to the caller. It resolves into a C typedef statement defining
+a structure of the same name with the following fields:
+\n \b Fields
+\li u_int AFS_CollData_len - The number of longwords contained within the data
+pointed to by the next field.
+\li long *AFS_CollData_val - A pointer to a sequence of AFS_CollData_len
+longwords.
+
+\par Section 5.1.2.1.6 AFSBulkStats [typedef AFSFetchStatus
+AFSBulkStats;]
+This typedef is used by Rxgen to create a structure used to pass a set of
+statistics structures, as described in the RXAFS_BulkStatus documentation in
+Section 5.1.3.21. It resolves into a C typedef statement defining a structure
+of the same name with the following fields:
+\n \b Fields
+\li u_int AFSBulkStats_len - The number of struct AFSFetchStatus units
+contained within the data to which the next field points.
+
+\li AFSFetchStatus *AFSBulkStats_val - This field houses a pointer to a
+sequence of AFSBulkStats_len units of type struct AFSFetchStatus.
+
+\par Section 5.1.2.1.7 DiskName [typedef opaque DiskName[AFS_DISKNAMESIZE];]
+The name of an AFS disk partition. This object appears as a field within a
+struct ViceDisk, a group of which appear inside a struct ViceStatistics, used
+for carrying basic File Server statistics information. The term opaque
+appearing above indicates that the object being defined will be treated as an
+undifferentiated string of bytes.
+
+\par Section 5.1.2.1.8 ViceLockType [typedef long ViceLockType;]
+This defines the format of a lock used internally by the Cache Manager. The
+content of these locks is accessible via the RXAFSCB_GetLock() RPC function. An
+isomorphic and more refined version of the lock structure used by the Cache
+Manager, mapping directly to this definition, is struct AFSDBLockDesc, defined
+in Section 4.3.1.
+
+ \subsubsection sec5-1-2-2 Section 5.1.2.2: struct AFSVolSync
+
+\par
+This structure conveys volume synchronization information across many of the
+File Server RPC interface calls, allowing something akin to a "whole-volume
+callback" on read-only volumes.
+\n \b Fields
+\li unsigned long spare1 ... spare6 - The first longword, spare1, contains the
+volume's creation date. The rest are currently unused.
+
+ \subsubsection sec5-1-2-3 Section 5.1.2.3: struct AFSFetchStatus
+
+\par
+This structure defines the information returned when a file system object is
+fetched from a File Server.
+\n \b Fields
+\li unsigned long InterfaceVersion - RPC interface version, defined to be 1.
+\li unsigned long FileType - Distinguishes the object as either a file,
+directory, symlink, or invalid.
+\li unsigned long LinkCount - Number of links to this object.
+\li unsigned long Length - Length in bytes.
+\li unsigned long DataVersion - Object's data version number.
+\li unsigned long Author - Identity of the object's author.
+\li unsigned long Owner - Identity of the object's owner.
+\li unsigned long CallerAccess - The set of access rights computed for the
+caller on this object.
+\li unsigned long AnonymousAccess - The set of access rights computed for any
+completely unauthenticated principal.
+\li unsigned long UnixModeBits - Contents of associated unix mode bits.
+\li unsigned long ParentVnode - Vnode for the object's parent directory.
+\li unsigned long ParentUnique - Uniquifier field for the parent object.
+\li unsigned long SegSize - (Not implemented).
+\li unsigned long ClientModTime - Time when the caller last modified the data
+within the object.
+\li unsigned long ServerModTime - Time when the server last modified the data
+within the object.
+\li unsigned long Group - (Not implemented).
+\li unsigned long SyncCounter - (Not implemented).
+\li unsigned long spare1 ... spare4 - Spares.
+
+ \subsubsection sec5-1-2-4 Section 5.1.2.4: struct AFSStoreStatus
+
+\par
+This structure is used to convey which of a file system object's status fields
+should be set, and their new values. Several File Server RPC calls make use of
+this structure, including RXAFS_StoreStatus(), RXAFS_CreateFile(),
+RXAFS_SymLink(), RXAFS_MakeDir(), and the streamed call to store file data onto
+the File Server.
+\n \b Fields
+\li unsigned long Mask - Bit mask, specifying which of the following fields
+should be assigned into the File Server's status block on the object.
+\li unsigned long ClientModTime - The time of day that the object was last
+modified.
+\li unsigned long Owner - The principal identified as the owner of the file +system object. +\li unsigned long Group - (Not implemented). +\li unsigned long UnixModeBits - The set of associated unix mode bits. +\li unsigned long SegSize - (Not implemented). + + \subsubsection sec5-1-2-5 Section 5.1.2.5: struct ViceDisk + +\par +This structure occurs in struct ViceStatistics, and describes the +characteristics and status of a disk partition used for AFS storage. +\n \b Fields +\li long BlocksAvailable - Number of 1 Kbyte disk blocks still available on the +partition. +\li long TotalBlocks - Total number of disk blocks in the partition. +\li DiskName Name - The human-readable character string name of the disk +partition (e.g., /vicepa). + + \subsubsection sec5-1-2-6 Section 5.1.2.6: struct ViceStatistics + +\par +This is the File Server statistics structure returned by the RXAFS +GetStatistics() RPC call. +\n \b Fields +\li unsigned long CurrentMsgNumber - Not used. +\li unsigned long OldestMsgNumber - Not used. +\li unsigned long CurrentTime - Time of day, as understood by the File Server. +\li unsigned long BootTime - Kernel's boot time. +\li unsigned long StartTime - Time when the File Server started up. +\li long CurrentConnections - Number of connections to Cache Manager instances. +\li unsigned long TotalViceCalls - Count of all calls made to the RPC +interface. +\li unsigned long TotalFetchs - Total number of fetch operations, either status +or data, performed. +\li unsigned long FetchDatas - Total number of data fetch operations +exclusively. +\li unsigned long FetchedBytes - Total number of bytes fetched from the File +Server since it started up. +\li long FetchDataRate - Result of dividing the FetchedBytes field by the +number of seconds the File Server has been running. +\li unsigned long TotalStores - Total number of store operations, either status +or data, performed. +\li unsigned long StoreDatas - Total number of data store operations +exclusively. +\li unsigned long StoredBytes - Total number of bytes stored to the File Server +since it started up. +\li long StoreDataRate - The result of dividing the StoredBytes field by the +number of seconds the File Server has been running. +\li unsigned long TotalRPCBytesSent - Outdated +\li unsigned long TotalRPCBytesReceived - Outdated +\li unsigned long TotalRPCPacketsSent - Outdated +\li unsigned long TotalRPCPacketsReceived - Outdated +\li unsigned long TotalRPCPacketsLost - Outdated +\li unsigned long TotalRPCBogusPackets - Outdated +\li long SystemCPU - Result of reading from the kernel the usage times +attributed to system activities. +\li long UserCPU - Result of reading from the kernel the usage times attributed +to user-level activities. +\li long NiceCPU - Result of reading from the kernel the usage times attributed +to File Server activities that have been nice()d (i.e., run at a lower +priority). +\li long IdleCPU - Result of reading from the kernel the usage times attributed +to idling activities. +\li long TotalIO - Summary of the number of bytes read/written from the disk. +\li long ActiveVM - Amount of virtual memory used by the File Server. +\li long TotalVM - Total space available on disk for virtual memory activities. +\li long EtherNetTotalErrors - Not used. +\li long EtherNetTotalWrites - Not used. +\li long EtherNetTotalInterupts - Not used. +\li long EtherNetGoodReads - Not used. +\li long EtherNetTotalBytesWritten - Not used. +\li long EtherNetTotalBytesRead - Not used. 
+\li long ProcessSize - The size of the File Server's data space in 1 Kbyte
+chunks.
+\li long WorkStations - The total number of client Cache Managers
+(workstations) for which information is held by the File Server.
+\li long ActiveWorkStations - The total number of client Cache Managers
+(workstations) that have recently interacted with the File Server. This number
+is strictly less than or equal to the WorkStations field.
+\li long Spare1 ... Spare8 - Not used.
+\li ViceDisk Disk1 ... Disk10 - Statistics concerning up to 10 disk partitions
+used by the File Server. These records keep information on all partitions, not
+just partitions reserved for AFS storage.
+
+ \subsubsection sec5-1-2-7 Section 5.1.2.7: struct afs PerfStats
+
+\par
+This is the structure corresponding to the AFS XSTATSCOLL PERF INFO data
+collection that is defined by the File Server (see Section 5.1.2.1.4). It is
+accessible via the RXAFS GetXStats() interface routine, as defined in Section
+5.1.3.26.
+The fields within this structure fall into the following classifications:
+\li Number of requests for the structure.
+\li Vnode cache information.
+\li Directory package numbers.
+\li Rx information.
+\li Host module fields.
+\li Spares.
+
+\par
+Please note that the Rx fields represent the contents of the rx stats structure
+maintained by the Rx RPC facility itself. Also, a full description of all of
+the structure's fields is not possible here. In particular, the reader is
+referred to the companion Rx document for further clarification on the
+Rx-related fields within afs PerfStats.
+\n \b Fields
+\li long numPerfCalls - Number of performance collection calls received.
+\li long vcache L Entries - Number of entries in large (directory) vnode cache.
+\li long vcache L Allocs - Number of allocations for the large vnode cache.
+\li long vcache L Gets - Number of get operations for the large vnode cache.
+\li long vcache L Reads - Number of reads performed on the large vnode cache.
+\li long vcache L Writes - Number of writes executed on the large vnode cache.
+\li long vcache S Entries - Number of entries in the small (file) vnode cache.
+\li long vcache S Allocs - Number of allocations for the small vnode cache.
+\li long vcache S Gets - Number of get operations for the small vnode cache.
+\li long vcache S Reads - Number of reads performed on the small vnode cache.
+\li long vcache S Writes - Number of writes executed on the small vnode cache.
+\li long vcache H Entries - Number of entries in the header of the vnode cache.
+\li long vcache H Gets - Number of get operations on the header of the vnode
+cache.
+\li long vcache H Replacements - Number of replacement operations on the header
+of the vnode cache.
+\li long dir Buffers - Number of directory package buffers in use.
+\li long dir Calls - Number of read calls made to the directory package.
+\li long dir IOs - Number of directory I/O operations performed.
+\li long rx packetRequests - Number of Rx packet allocation requests.
+\li long rx noPackets RcvClass - Number of failed packet reception requests.
+\li long rx noPackets SendClass - Number of failed packet transmission
+requests.
+\li long rx noPackets SpecialClass - Number of 'special' Rx packet requests.
+\li long rx socketGreedy - Did setting the Rx socket to SO GREEDY succeed?
+\li long rx bogusPacketOnRead - Number of short packets received.
+\li long rx bogusHost - Latest host address from bogus packets.
+\li long rx noPacketOnRead - Number of attempts to read a packet when one was
+not physically available. 
+\li long rx noPacketBuffersOnRead - Number of packets dropped due to buffer +shortages. +\li long rx selects - Number of selects performed, waiting for a packet arrival +or a timeout. +\li long rx sendSelects - Number of selects forced upon a send. +\li long rx packetsRead RcvClass - Number of packets read belonging to the +'Rcv' class. +\li long rx packetsRead SendClass - Number of packets read that belong to the +'Send' class. +\li long rx packetsRead SpecialClass - Number of packets read belonging to the +'Special' class. +\li long rx dataPacketsRead - Number of unique data packets read off the wire. +\li long rx ackPacketsRead - Number of acknowledgement packets read. +\li long rx dupPacketsRead - Number of duplicate data packets read. +\li long rx spuriousPacketsRead - Number of inappropriate packets read. +\li long rx packetsSent RcvClass - Number of packets sent belonging to the +'Rcv' class. +\li long rx packetsSent SendClass - Number of packets sent belonging to the +'Send' class. +\li long rx packetsSent SpecialClass - Number of packets sent belonging to the +'Special' class. +\li long rx ackPacketsSent - Number of acknowledgement packets sent. +\li long rx pingPacketsSent - Number of ping packets sent. +\li long rx abortPacketsSent - Number of abort packets sent. +\li long rx busyPacketsSent - Number of busy packets sent. +\li long rx dataPacketsSent - Number of unique data packets sent. +\li long rx dataPacketsReSent - Number of retransmissions sent. +\li long rx dataPacketsPushed - Number of retransmissions pushed by a NACK. +\li long rx ignoreAckedPacket - Number of packets whose acked flag was set at +rxi Start() time. +\li long rx totalRtt Sec - Total round trip time in seconds. +\li long rx totalRtt Usec - Microsecond portion of the total round trip time, +\li long rx minRtt Sec - Minimum round trip time in seconds. +\li long rx minRtt Usec - Microsecond portion of minimal round trip time. +\li long rx maxRtt Sec - Maximum round trip time in seconds. +\li long rx maxRtt Usec - Microsecond portion of maximum round trip time. +\li long rx nRttSamples - Number of round trip samples. +\li long rx nServerConns - Total number of server connections. +\li long rx nClientConns - Total number of client connections. +\li long rx nPeerStructs - Total number of peer structures. +\li long rx nCallStructs - Total number of call structures. +\li long rx nFreeCallStructs - Total number of call structures residing on the +free list. +\li long host NumHostEntries - Number of host entries. +\li long host HostBlocks - Number of blocks in use for host entries. +\li long host NonDeletedHosts - Number of non-deleted host entries. +\li long host HostsInSameNetOrSubnet - Number of host entries in the same +[sub]net as the File Server. +\li long host HostsInDiffSubnet - Number of host entries in a different subnet +as the File Server. +\li long host HostsInDiffNetwork - Number of host entries in a different +network entirely as the File Server. +\li long host NumClients - Number of client entries. +\li long host ClientBlocks - Number of blocks in use for client entries. +\li long spare[32] - Spare fields, reserved for future use. + + \subsubsection sec5-1-2-8 Section 5.1.2.8: struct AFSFetchVolumeStatus + +\par +The results of asking the File Server for status information concerning a +particular volume it hosts. +\n \b Fields +\li long Vid - Volume ID. +\li long ParentId - Volume ID in which the given volume is 'primarily' mounted. 
+This is used to properly resolve pwd operations, as a volume may be mounted
+simultaneously at multiple locations.
+\li char Online - Is the volume currently online and fully available?
+\li char InService - This field records whether the volume is currently in
+service. It is indistinguishable from the Blessed field.
+\li char Blessed - See the description of the InService field immediately
+above.
+\li char NeedsSalvage - Should this volume be salvaged (run through a
+consistency-checking procedure)?
+\li long Type - The classification of this volume, namely a read/write volume
+(RWVOL = 0), read-only volume (ROVOL = 1), or backup volume (BACKVOL = 2).
+\li long MinQuota - Minimum number of 1 Kbyte disk blocks to be set aside for
+this volume. Note: this field is not currently set or accessed by any AFS
+agents.
+\li long MaxQuota - Maximum number of 1 Kbyte disk blocks that may be occupied
+by this volume.
+\li long BlocksInUse - Number of 1 Kbyte disk blocks currently in use by this
+volume.
+\li long PartBlocksAvail - Number of available 1 Kbyte blocks currently unused
+in the volume's partition.
+\li long PartMaxBlocks - Total number of blocks, in use or not, for the
+volume's partition.
+
+ \subsubsection sec5-1-2-9 Section 5.1.2.9: struct AFSStoreVolumeStatus
+
+\par
+This structure is used to convey which of a volume's status fields should be
+set, and their new values. The RXAFS SetVolumeStatus() RPC call is the only
+user of this structure.
+\n \b Fields
+\li long Mask - Bit mask to determine which of the following two fields should
+be stored in the centralized status for a given volume.
+\li long MinQuota - Minimum number of 1 Kbyte disk blocks to be set aside for
+this volume.
+\li long MaxQuota - Maximum number of 1 Kbyte disk blocks that may be occupied
+by this volume.
+
+ \subsubsection sec5-1-2-10 Section 5.1.2.10: struct AFSVolumeInfo
+
+\par
+This structure conveys information regarding a particular volume through
+certain File Server RPC interface calls. For information regarding the
+different volume types that exist, please consult the companion document, AFS-3
+Programmer's Reference: Volume Server/Volume Location Server Interface.
+\n \b Fields
+\li unsigned long Vid - Volume ID.
+\li long Type - Volume type (see struct AFSFetchVolumeStatus in Section 5.1.2.8
+above).
+\li unsigned long Type0 ... Type4 - The volume IDs for the possible volume
+types in existence for this volume.
+\li unsigned long ServerCount - The number of File Server machines on which an
+instance of this volume is located.
+\li unsigned long Server0 ... Server7 - Up to 8 IP addresses of File Server
+machines hosting an instance of this volume. The first ServerCount of these
+fields hold valid server addresses.
+\li unsigned short Port0 ... Port7 - Up to 8 UDP port numbers on which
+operations on this volume should be directed. The first ServerCount of these
+fields hold valid port identifiers.
+
+ \subsection sec5-1-3 Section 5.1.3: Non-Streamed Function Calls
+
+\par
+The following is a description of the File Server RPC interface routines that
+utilize only parameters with fixed maximum lengths. The majority of the File
+Server calls fall into this suite, with only a handful using streaming
+techniques to pass objects of unbounded size between a File Server and Cache
+Manager.
+\par
+Each function is labeled with an opcode number. This is the low-level numerical
+identifier for the function, and appears in the set of network packets
+constructed for the RPC call. 
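+
+\par
+To illustrate the general calling pattern shared by the non-streamed routines,
+the fragment below sketches a status fetch using RXAFS FetchStatus(), which is
+specified in Section 5.1.3.2 below. This is only a sketch: the inclusion of the
+necessary Rx and interface header files, the prior construction of the Rx
+connection referenced by rxConnP, and the initialization of the file identifier
+in fid are all assumed to have been handled by the caller, much as in the
+larger example of Section 5.1.5.
+
+\code
+struct AFSFid fid;            /*File identifier, assumed already filled in*/
+struct AFSFetchStatus status; /*Status block returned by the File Server*/
+struct AFSCallBack callBack;  /*Callback promise, if the volume is read/write*/
+struct AFSVolSync volSync;    /*Volume synchronization information*/
+int code;                     /*Return code*/
+
+/*Invoke the RPC over the previously established connection rxConnP*/
+code = RXAFS_FetchStatus(rxConnP, &fid, &status, &callBack, &volSync);
+if (code == 0) {
+    /*The object's length and data version number are now available*/
+    printf("Length = %lu, DataVersion = %lu\n",
+           status.Length, status.DataVersion);
+}
+\endcode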
+
+ \subsubsection sec5-1-3-1 Section 5.1.3.1: RXAFS FetchACL - Fetch the
+ACL associated with the given AFS file identifier
+
+\code
+int RXAFS FetchACL(IN struct rx connection *a rxConnP,
+ IN AFSFid *a dirFidP,
+ OUT AFSOpaque *a ACLP,
+ OUT AFSFetchStatus *a dirNewStatP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+[Opcode 131] Fetch the ACL for the directory identified by a dirFidP, placing
+it in the space described by the opaque structure to which a ACLP points. Also
+returned is the given directory's status, written to a dirNewStatP. An ACL may
+thus take up at most AFSOPAQUEMAX (1,024) bytes, since this is the maximum size
+of an AFSOpaque.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. Volume version information is returned for synchronization purposes in
+a volSyncP.
+\par Error Codes
+EACCES The caller is not permitted to perform this operation.
+\n EINVAL An internal error in looking up the client record was encountered, or
+an invalid fid was provided.
+\n VICETOKENDEAD Caller's authentication token has expired.
+
+ \subsubsection sec5-1-3-2 Section 5.1.3.2: RXAFS FetchStatus - Fetch
+the status information regarding a given file system object
+
+\code
+int RXAFS FetchStatus(IN struct rx connection *a rxConnP,
+ IN AFSFid *a fidToStatP,
+ OUT AFSFetchStatus *a currStatP,
+ OUT AFSCallBack *a callBackP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+[Opcode 132] Fetch the current status information for the file or directory
+identified by a fidToStatP, placing it into the area to which a currStatP
+points. If the object resides in a read/write volume, then the related callback
+information is returned in a callBackP.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. Volume version information is returned for synchronization purposes in
+a volSyncP.
+\par Error Codes
+EACCES The caller is not permitted to perform this operation.
+\n EINVAL An internal error in looking up the client record was encountered, or
+an invalid fid was provided.
+\n VICETOKENDEAD Caller's authentication token has expired.
+
+ \subsubsection sec5-1-3-3 Section 5.1.3.3: RXAFS StoreACL - Associate
+the given ACL with the named directory
+
+\code
+int RXAFS StoreACL(IN struct rx connection *a rxConnP,
+ IN AFSOpaque *a ACLToStoreP,
+ IN AFSFid *a dirFidP,
+ OUT AFSFetchStatus *a dirNewStatP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+[Opcode 134] Store the ACL information to which a ACLToStoreP points to the
+File Server, associating it with the directory identified by a dirFidP. The
+resulting status information for the a dirFidP directory is returned in a
+dirNewStatP. Note that the ACL supplied via a ACLToStoreP may be at most
+AFSOPAQUEMAX (1,024) bytes long, since this is the maximum size accommodated by
+an AFSOpaque.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. Volume version information is returned for synchronization purposes in
+a volSyncP.
+\par Error Codes
+EACCES The caller is not permitted to perform this operation.
+\n E2BIG The given ACL is too large.
+\n EINVAL The given ACL could not be translated to its on-disk format. 
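+
+\par
+As an illustration of how the opaque ACL parameter is typically handled, the
+sketch below fetches a directory's ACL and treats the returned bytes as
+printable text. It assumes that AFSOpaque follows the same Rxgen length/value
+convention described for AFS CollData in Section 5.1.2.1.5 (a length count and
+a value pointer), that the ACL's external representation is a printable string,
+and that rxConnP and dirFid have already been prepared by the caller; the exact
+field names used here are therefore illustrative rather than definitive.
+
+\code
+struct AFSOpaque acl;          /*Opaque ACL container*/
+struct AFSFetchStatus dirStat; /*Status of the directory queried*/
+struct AFSVolSync volSync;     /*Volume synchronization information*/
+char aclBuff[AFSOPAQUEMAX];    /*Caller-supplied space for the ACL bytes*/
+int code;                      /*Return code*/
+
+acl.AFSOpaque_len = 0;         /*Filled in by the stub upon return*/
+acl.AFSOpaque_val = aclBuff;   /*At most AFSOPAQUEMAX bytes will be written*/
+code = RXAFS_FetchACL(rxConnP, &dirFid, &acl, &dirStat, &volSync);
+if (code == 0 && acl.AFSOpaque_len < AFSOPAQUEMAX) {
+    acl.AFSOpaque_val[acl.AFSOpaque_len] = '\0'; /*Terminate for printing*/
+    printf("ACL for the directory:\n%s\n", acl.AFSOpaque_val);
+}
+\endcode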
+ + \subsubsection sec5-1-3-4 Section 5.1.3.4: RXAFS StoreStatus - Store +the given status information for the specified file + +\code +int RXAFS StoreStatus(IN struct rx connection *a rxConnP, + IN AFSFid *a fidP, + IN AFSStoreStatus *a currStatusP, + OUT AFSFetchStatus *a srvStatusP, + OUT AFSVolSync *a volSyncP) +\endcode +\par Description +[Opcode 135] Store the status information to which a currStatusP points, +associating it with the file identified by a fidP. All outstanding callbacks on +this object are broken. The resulting status structure stored at the File +Server is returned in a srvStatusP. +\par +Rx connection information for the related File Server is contained in a +rxConnP. Volume version information is returned for synchronization purposes in +a volSyncP. +\par Error Codes +EACCES The caller is not permitted to perform this operation. +\n EINVAL An internal error in looking up the client record was encountered, or +an invalid fid was provided, or an attempt was made to change the mode of a +symbolic link. +\n VICETOKENDEAD Caller's authentication token has expired. + + \subsubsection sec5-1-3-5 Section 5.1.3.5: RXAFS RemoveFile - Delete +the given file + +\code +int RXAFS RemoveFile(IN struct rx connection *a rxConnP, + IN AFSFid *a dirFidP, + IN char *a name, + OUT AFSFetchStatus *a srvStatusP, + OUT AFSVolSync *a volSyncP) +\endcode +\par Description +[Opcode 136] Destroy the file named a name within the directory identified by a +dirFidP. All outstanding callbacks on this object are broken. The resulting +status structure stored at the File Server is returned in a srvStatusP. +\par +Rx connection information for the related File Server is contained in a +rxConnP. Volume version information is returned for synchronization purposes in +a volSyncP. +\par Error Codes +EACCES The caller is not permitted to perform this operation. +\n EINVAL An internal error in looking up the client record was encountered, or +an invalid fid was provided, or an attempt was made to remove "." or "..". +\n EISDIR The target of the deletion was supposed to be a file, but it is +really a directory. +\n ENOENT The named file was not found. +\n ENOTDIR The a dirFidP parameter references an object which is not a +directory, or the deletion target is supposed to be a directory but is not. +\n ENOTEMPTY The target directory being deleted is not empty. +\n VICETOKENDEAD Caller's authentication token has expired. + + \subsubsection sec5-1-3-6 Section 5.1.3.6: RXAFS CreateFile - Create +the given file + +\code +int RXAFS CreateFile(IN struct rx connection *a rxConnP, + IN AFSFid *DirFid, + IN char *Name, + IN AFSStoreStatus *InStatus, + OUT AFSFid *OutFid, + OUT AFSFetchStatus *OutFidStatus, + OUT AFSFetchStatus *OutDirStatus, + OUT AFSCallBack *CallBack, + OUT AFSVolSync *a volSyncP) +/* associated with the new file. */ +\endcode +\par Description +[Opcode 137] This call is used to create a file, but not for creating a +directory or a symbolic link. If this call succeeds, it is the Cache Manager's +responsibility to either create an entry locally in the directory specified by +DirFid or to invalidate this directory's cache entry. +\par +Rx connection information for the related File Server is contained in a +rxConnP. Volume version information is returned for synchronization purposes in +a volSyncP. +\par Error Codes +EACCES The caller is not permitted to perform this operation. +\n EINVAL An internal error in looking up the client record was encountered, or +an invalid fid or name was provided. 
+\n ENOTDIR The DirFid parameter references an object which is not a directory.
+\n VICETOKENDEAD Caller's authentication token has expired.
+
+ \subsubsection sec5-1-3-7 Section 5.1.3.7: RXAFS Rename - Rename the
+specified file in the given directory
+
+\code
+int RXAFS Rename(IN struct rx connection *a rxConnP,
+ IN AFSFid *a origDirFidP,
+ IN char *a origNameP,
+ IN AFSFid *a newDirFidP,
+ IN char *a newNameP,
+ OUT AFSFetchStatus *a origDirStatusP,
+ OUT AFSFetchStatus *a newDirStatusP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+[Opcode 138] Rename file a origNameP in the directory identified by a
+origDirFidP. Its new name is to be a newNameP, and it will reside in the
+directory identified by a newDirFidP. Each of these names must be no more than
+AFSNAMEMAX (256) characters long. The status fields of the original and new
+directories after the rename operation completes are deposited in a
+origDirStatusP and a newDirStatusP respectively. Existing callbacks are broken
+for all files and directories involved in the operation.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. Volume version information is returned for synchronization purposes in
+a volSyncP.
+\par Error Codes
+EACCES New file exists but user doesn't have Delete rights in the directory.
+\n EINVAL Name provided is invalid.
+\n EISDIR Original object is a file and new object is a directory.
+\n ENOENT The object to be renamed doesn't exist in the parent directory.
+\n ENOTDIR Original object is a directory and new object is a file.
+\n EXDEV The rename was attempted across a volume boundary, would create a
+pathname loop, or hard links exist to the file.
+
+ \subsubsection sec5-1-3-8 Section 5.1.3.8: RXAFS Symlink - Create a
+symbolic link
+
+\code
+int RXAFS Symlink(IN struct rx connection *a rxConnP,
+ IN AFSFid *a dirFidP,
+ IN char *a nameP,
+ IN char *a linkContentsP,
+ IN AFSStoreStatus *a origDirStatP,
+ OUT AFSFid *a newFidP,
+ OUT AFSFetchStatus *a newFidStatP,
+ OUT AFSFetchStatus *a newDirStatP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+[Opcode 139] Create a symbolic link named a nameP in the directory identified
+by a dirFidP. The text of the symbolic link is provided in a linkContentsP, and
+the desired status fields for the symbolic link are given by a origDirStatP.
+The name offered in a nameP must be less than AFSNAMEMAX (256) characters long,
+and the text of the link to which a linkContentsP points must be less than
+AFSPATHMAX (1,024) characters long. Once the symbolic link has been
+successfully created, its file identifier is returned in a newFidP. Existing
+callbacks to the a dirFidP directory are broken before the symbolic link
+creation completes. The status fields for the symbolic link itself and its
+parent's directory are returned in a newFidStatP and a newDirStatP
+respectively.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. Volume version information is returned for synchronization purposes in
+a volSyncP.
+\par Error Codes
+EACCES The caller does not have the necessary access rights.
+\n EINVAL Illegal symbolic link name provided. 
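+
+\par
+A hedged sketch of a rename operation follows. It assumes, as before, that the
+Rx connection referenced by rxConnP and the two directory identifiers have
+already been prepared by the caller, and that both names are well under the
+AFSNAMEMAX (256) character limit; only the calling sequence itself is being
+illustrated, and the file names shown are hypothetical.
+
+\code
+struct AFSFid origDirFid;      /*Directory currently holding the file*/
+struct AFSFid newDirFid;       /*Directory that will receive the file*/
+struct AFSFetchStatus origDirStat; /*Post-rename status of the old directory*/
+struct AFSFetchStatus newDirStat;  /*Post-rename status of the new directory*/
+struct AFSVolSync volSync;     /*Volume synchronization information*/
+int code;                      /*Return code*/
+
+/*Move "draft.txt" to the new directory under the name "final.txt".
+ *Both directories must reside within the same volume, or EXDEV results.*/
+code = RXAFS_Rename(rxConnP, &origDirFid, "draft.txt",
+                    &newDirFid, "final.txt",
+                    &origDirStat, &newDirStat, &volSync);
+if (code != 0) {
+    /*Handle EACCES, ENOENT, EXDEV, and the other codes listed above*/
+}
+\endcode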
+
+ \subsubsection sec5-1-3-9 Section 5.1.3.9: RXAFS Link - Create a hard
+link
+
+\code
+int RXAFS Link(IN struct rx connection *a rxConnP,
+ IN AFSFid *a dirFidP,
+ IN char *a nameP,
+ IN AFSFid *a existingFidP,
+ OUT AFSFetchStatus *a newFidStatP,
+ OUT AFSFetchStatus *a newDirStatP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+[Opcode 140] Create a hard link named a nameP in the directory identified by a
+dirFidP. The file serving as the basis for the hard link is identified by a
+existingFidP. The name offered in a nameP must be less than AFSNAMEMAX (256)
+characters long. Existing callbacks to the a dirFidP directory are broken
+before the hard link creation completes. The status fields for the file itself
+and its parent's directory are returned in a newFidStatP and a newDirStatP
+respectively.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. Volume version information is returned for synchronization purposes in
+a volSyncP.
+\par Error Codes
+EACCES The caller does not have the necessary access rights.
+\n EISDIR An attempt was made to create a hard link to a directory.
+\n EXDEV Hard link attempted across directories.
+
+ \subsubsection sec5-1-3-10 Section 5.1.3.10: RXAFS MakeDir - Create a
+directory
+
+\code
+int RXAFS MakeDir(IN struct rx connection *a rxConnP,
+ IN AFSFid *a parentDirFidP,
+ IN char *a newDirNameP,
+ IN AFSStoreStatus *a currStatP,
+ OUT AFSFid *a newDirFidP,
+ OUT AFSFetchStatus *a dirFidStatP,
+ OUT AFSFetchStatus *a parentDirStatP,
+ OUT AFSCallBack *a newDirCallBackP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+[Opcode 141] Create a directory named a newDirNameP within the directory
+identified by a parentDirFidP. The initial status fields for the new directory
+are provided in a currStatP. The new directory's name must be less than
+AFSNAMEMAX (256) characters long. The new directory's ACL is inherited from its
+parent. Existing callbacks on the parent directory are broken before the
+creation completes. Upon successful directory creation, the new directory's
+file identifier is returned in a newDirFidP, and the resulting status
+information for the new and parent directories are stored in a dirFidStatP and
+a parentDirStatP respectively. In addition, a callback for the new directory is
+returned in a newDirCallBackP.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. Volume version information is returned for synchronization purposes in
+a volSyncP.
+\par Error Codes
+EACCES The caller does not have the necessary access rights.
+\n EINVAL The directory name provided is unacceptable.
+
+ \subsubsection sec5-1-3-11 Section 5.1.3.11: RXAFS RemoveDir - Remove a
+directory
+
+\code
+int RXAFS RemoveDir(IN struct rx connection *a rxConnP,
+ IN AFSFid *a parentDirFidP,
+ IN char *a dirNameP,
+ OUT AFSFetchStatus *a newParentDirStatP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+[Opcode 142] Remove the directory named a dirNameP from within its parent
+directory, identified by a parentDirFidP. The directory being removed must be
+empty, and its name must be less than AFSNAMEMAX (256) characters long.
+Existing callbacks to the directory being removed and its parent directory are
+broken before the deletion completes. Upon successful deletion, the status
+fields for the parent directory are returned in a newParentDirStatP.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. 
Volume version information is returned for synchronization purposes in +a volSyncP. +\par Error Codes +EACCES The caller does not have the necessary access rights. + + \subsubsection sec5-1-3-12 Section 5.1.3.12: RXAFS GetStatistics - Get +common File Server statistics + +\code +int RXAFS GetStatistics(IN struct rx connection *a rxConnP, + OUT ViceStatistics *a FSInfoP) +\endcode +\par Description +[Opcode 146] Fetch the structure containing a set of common File Server +statistics. These numbers represent accumulated readings since the time the +File Server last restarted. For a full description of the individual fields +contained in this structure, please see Section 5.1.2.6. +\par +Rx connection information for the related File Server is contained in a +rxConnP. +\par Error Codes +---No error codes generated. + + \subsubsection sec5-1-3-13 Section 5.1.3.13: RXAFS GiveUpCallBacks - +Ask the File Server to break the given set of callbacks on the corresponding +set of file identifiers + +\code +int RXAFS GiveUpCallBacks(IN struct rx connection *a rxConnP, + IN AFSCBFids *a fidArrayP, + IN AFSCBs *a callBackArrayP) +\endcode +\par Description +[Opcode 147] Given an array of up to AFSCBMAX file identifiers in a fidArrayP +and a corresponding number of callback structures in a callBackArrayP, ask the +File Server to remove these callbacks from its register. Note that this routine +only affects callbacks outstanding on the given set of files for the host +issuing the RXAFS GiveUpCallBacks call. Callback promises made to other +machines on any or all of these files are not affected. +\par +Rx connection information for the related File Server is contained in a +rxConnP. +\par Error Codes +EINVAL More file identifiers were provided in the a fidArrayP than callbacks in +the a callBackArray. + + \subsubsection sec5-1-3-14 Section 5.1.3.14: RXAFS GetVolumeInfo - Get +information about a volume given its name + +\code +int RXAFS GetVolumeInfo(IN struct rx connection *a rxConnP, + IN char *a volNameP, + OUT VolumeInfo *a volInfoP) +\endcode +\par Description +[Opcode 148] Ask the given File Server for information regarding a volume whose +name is a volNameP. The volume name must be less than AFSNAMEMAX characters +long, and the volume itself must reside on the File Server being probed. +\par +Rx connection information for the related File Server is contained in a +rxConnP. Please note that definitions for the error codes with VL prefixes may +be found in the vlserver.h include file +\par Error Codes +Could not contact any of the corresponding Volume Location Servers. +VL BADNAME An improperly-formatted volume name provided. +\n VL ENTDELETED An entry was found for the volume, reporting that the volume +has been deleted. +\n VL NOENT The given volume was not found. + + \subsubsection sec5-1-3-15 Section 5.1.3.15: RXAFS GetVolumeStatus - +Get basic status information for the named volume + +\code +int RXAFS GetVolumeStatus(IN struct rx connection *a rxConnP, + IN long a volIDP, + OUT AFSFetchVolumeStatus *a volFetchStatP, + OUT char *a volNameP, + OUT char *a offLineMsgP, + OUT char *a motdP) +\endcode +\par Description +[Opcode 149] Given the numeric volume identifier contained in a volIDP, fetch +the basic status information corresponding to that volume. This status +information is stored into a volFetchStatP. A full description of this status +structure is found in Section 5.1.2.8. In addition, three other facts about the +volume are returned. 
The volume's character string name is placed into a
+volNameP. This name is guaranteed to be less than AFSNAMEMAX characters long.
+The volume's offline message, namely the string recording why the volume is
+off-line (if it is), is stored in a offLineMsgP. Finally, the volume's
+"Message of the Day" is placed in a motdP. Each of the character strings
+deposited into a offLineMsgP and a motdP is guaranteed to be less than
+AFSOPAQUEMAX (1,024) characters long.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP.
+\par Error Codes
+EACCES The caller does not have the necessary access rights.
+\n EINVAL A volume identifier of zero was specified.
+
+ \subsubsection sec5-1-3-16 Section 5.1.3.16: RXAFS SetVolumeStatus -
+Set the basic status information for the named volume
+
+\code
+int RXAFS SetVolumeStatus(struct rx connection *a rxConnP,
+ long a volIDP,
+ AFSStoreVolumeStatus *a volStoreStatP,
+ char *a volNameP,
+ char *a offLineMsgP,
+ char *a motdP)
+\endcode
+\par Description
+[Opcode 150] Given the numeric volume identifier contained in a volIDP, set
+that volume's basic status information to the values contained in a
+volStoreStatP. A full description of the fields settable by this call,
+including the necessary masking, is found in Section 5.1.2.9. In addition,
+three other items relating to the volume may be set. Non-null character strings
+found in a volNameP, a offLineMsgP, and a motdP will be stored in the volume's
+printable name, off-line message, and "Message of the Day" fields respectively.
+The volume name provided must be less than AFSNAMEMAX (256) characters long,
+and the other two strings must be less than AFSOPAQUEMAX (1,024) characters
+long each.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP.
+\par Error Codes
+EACCES The caller does not have the necessary access rights.
+\n EINVAL A volume identifier of zero was specified.
+
+ \subsubsection sec5-1-3-17 Section 5.1.3.17: RXAFS GetRootVolume -
+Return the name of the root volume for the file system
+
+\code
+int RXAFS GetRootVolume(IN struct rx connection *a rxConnP,
+ OUT char *a rootVolNameP)
+\endcode
+\par Description
+[Opcode 151] Fetch the name of the volume which serves as the root of the AFS
+file system and place it into a rootVolNameP. This name will always be less
+than AFSNAMEMAX characters long. Any File Server will respond to this call, not
+just the one hosting the root volume. The queried File Server first tries to
+discover the name of the root volume by reading from the
+/usr/afs/etc/RootVolume file on its local disks. If that file doesn't exist,
+then it will return the default value, namely "root.afs".
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP.
+\par Error Codes
+---No error codes generated.
+
+ \subsubsection sec5-1-3-18 Section 5.1.3.18: RXAFS CheckToken - (Obsolete)
+Check that the given user identifier matches the one in the supplied
+authentication token
+
+\code
+int RXAFS CheckToken(IN struct rx connection *a rxConnP,
+ IN long ViceId,
+ IN AFSOpaque *token)
+\endcode
+\par Description
+[Opcode 152] This function only works with the now-obsolete RPC facility
+formerly used by AFS, named R. For modern systems using the Rx RPC mechanism,
+an error is always returned from this routine.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. 
+\par Error Codes +ECONNREFUSED Always returned on Rx connections. + + \subsubsection sec5-1-3-19 Section 5.1.3.19: RXAFS GetTime - Get the +File Server's time of day + +\code +int RXAFS GetTime(IN struct rx connection *a rxConnP, + OUT unsigned long *a secondsP, + OUT unsigned long *a uSecondsP) +\endcode +\par Description +[Opcode 153] Get the current time of day from the File Server specified in the +Rx connection information contained in a rxConnP. The time is returned in +elapsed seconds (a secondsP) and microseconds (a uSecondsP) since that standard +unix "start of the world". +\par Error Codes +---No error codes generated. + + \subsubsection sec5-1-3-20 Section 5.1.3.20: RXAFS NGetVolumeInfo - Get +information about a volume given its name + +\code +int RXAFS NGetVolumeInfo(IN struct rx connection *a rxConnP, + IN char *a volNameP, + OUT AFSVolumeInfo *a volInfoP) +\endcode +\par Description +[Opcode 154] This function is identical to RXAFS GetVolumeInfo() (see Section +5.1.3.14), except that it returns a struct AFSVolumeInfo instead of a struct +VolumeInfo. The basic difference is that struct AFSVolumeInfo also carries an +accompanying UDP port value for each File Server listed in the record. + + \subsubsection sec5-1-3-21 Section 5.1.3.21: RXAFS BulkStatus - Fetch +the status information regarding a set of given file system objects + +\code +int RXAFS BulkStatus(IN struct rx connection *a rxConnP, + IN AFSCBFids *a fidToStatArrayP, + OUT AFSBulkStats *a currStatArrayP, + OUT AFSCBs *a callBackArrayP, + OUT AFSVolSync *a volSyncP) +\endcode +\par Description +[Opcode 155] This routine is identical to RXAFS FetchStatus() as described in +Section 5.1.3.2, except for the fact that it allows the caller to ask for the +current status fields for a set of up to AFSCBMAX (50) file identifiers at +once. +\par +Rx connection information for the related File Server is contained in a +rxConnP. Volume version information is returned for synchronization purposes in +a volSyncP. +\par Error Codes +EACCES The caller does not have the necessary access rights. +\n EINVAL The number of file descriptors for which status information was +requested is illegal. + + \subsubsection sec5-1-3-22 Section 5.1.3.22: RXAFS SetLock - Set an +advisory lock on the given file identifier + +\code +int RXAFS SetLock(IN struct rx connection *a rxConnP, + IN AFSFid *a fidToLockP, + IN ViceLockType a lockType, + OUT AFSVolSync *a volSyncP) +\endcode +\par Description +[Opcode 156] Set an advisory lock on the file identified by a fidToLockP. There +are two types of locks that may be specified via a lockType: LockRead and +LockWrite. An advisory lock times out after AFS LOCKWAIT (5) minutes, and must +be extended in order to stay in force (see RXAFS ExtendLock(), Section +5.1.3.23). +\par +Rx connection information for the related File Server is contained in a +rxConnP. Volume version information is returned for synchronization purposes in +a volSyncP. +\par Error Codes +EACCES The caller does not have the necessary access rights. +\n EINVAL An illegal lock type was specified. +\n EWOULDBLOCK The lock was already incompatibly granted to another party. 
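+
+\par
+The fragment below sketches the expected advisory lock lifecycle: acquire the
+lock, renew it before the AFS LOCKWAIT (5) minute timeout expires, and release
+it when done. RXAFS ExtendLock() and RXAFS ReleaseLock() are described in
+Sections 5.1.3.23 and 5.1.3.24 below. As in the earlier sketches, the
+connection setup, the file identifier, and the long-running work being
+protected are all assumed to be supplied by the caller.
+
+\code
+struct AFSFid fid;         /*File to be locked, assumed already filled in*/
+struct AFSVolSync volSync; /*Volume synchronization information*/
+int code;                  /*Return code*/
+
+code = RXAFS_SetLock(rxConnP, &fid, LockWrite, &volSync); /*Acquire the lock*/
+if (code == 0) {
+    /* ... perform part of the long-running update ... */
+
+    /*Renew the lock before the 5-minute window closes*/
+    code = RXAFS_ExtendLock(rxConnP, &fid, &volSync);
+
+    /* ... finish the update ... */
+
+    /*Drop the lock; the File Server may break callbacks at this point*/
+    code = RXAFS_ReleaseLock(rxConnP, &fid, &volSync);
+}
+\endcode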
+
+ \subsubsection sec5-1-3-23 Section 5.1.3.23: RXAFS ExtendLock - Extend
+an advisory lock on a file
+
+\code
+int RXAFS ExtendLock(IN struct rx connection *a rxConnP,
+ IN AFSFid *a fidToBeExtendedP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+[Opcode 157] Extend the advisory lock that has already been granted to the
+caller on the file identified by a fidToBeExtendedP.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. Volume version information is returned for synchronization purposes in
+a volSyncP.
+\par Error Codes
+EINVAL The caller does not already have the given file locked.
+
+ \subsubsection sec5-1-3-24 Section 5.1.3.24: RXAFS ReleaseLock -
+Release the advisory lock on a file
+
+\code
+int RXAFS ReleaseLock(IN struct rx connection *a rxConnP,
+ IN AFSFid *a fidToUnlockP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+[Opcode 158] Release the advisory lock held on the file identified by a
+fidToUnlockP. If this was the last lock on this file, the File Server will
+break all existing callbacks to this file.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. Volume version information is returned for synchronization purposes in
+a volSyncP.
+\par Error Codes
+EACCES The caller does not have the necessary access rights.
+
+ \subsubsection sec5-1-3-25 Section 5.1.3.25: RXAFS XStatsVersion - Get
+the version number associated with the File Server's extended statistics
+structure
+
+\code
+int RXAFS XStatsVersion(IN struct rx connection *a rxConnP,
+ OUT long *a versionNumberP)
+\endcode
+\par Description
+[Opcode 159] This call asks the File Server for the current version number of
+the extended statistics structures it exports (see RXAFS GetXStats(), Section
+5.1.3.26). The version number is placed into a versionNumberP.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP.
+\par Error Codes
+---No error codes generated.
+
+ \subsubsection sec5-1-3-26 Section 5.1.3.26: RXAFS GetXStats - Get the
+current contents of the specified extended statistics structure
+
+\code
+int RXAFS GetXStats(IN struct rx connection *a rxConnP,
+ IN long a clientVersionNumber,
+ IN long a collectionNumber,
+ OUT long *a srvVersionNumberP,
+ OUT long *a timeP,
+ OUT AFS CollData *a dataP)
+\endcode
+\par Description
+[Opcode 160] This function fetches the contents of the specified File Server
+extended statistics structure. The caller provides the version number of the
+data it expects to receive in a clientVersionNumber. Also provided in a
+collectionNumber is the numerical identifier for the desired data collection.
+There are currently two of these data collections defined: AFS XSTATSCOLL CALL
+INFO, which is the list of tallies of the number of invocations of internal
+File Server procedure calls, and AFS XSTATSCOLL PERF INFO, which is a list of
+performance-related numbers. The precise contents of these collections are
+described in Section 5.1.2.7. The current version number of the File Server
+collections is returned in a srvVersionNumberP, and is always set upon return,
+even if the caller has asked for a different version. If the correct version
+number has been specified, and a supported collection number given, then the
+collection data is returned in a dataP. The time of collection is also
+returned, being placed in a timeP.
+\par
+Rx connection information for the related File Server is contained in a
+rxConnP. 
+\par Error Codes
+---No error codes are generated.
+
+ \subsection sec5-1-4 Section 5.1.4: Streamed Function Calls
+
+\par
+There are two streamed functions in the File Server RPC interface, used to
+fetch and store arbitrary amounts of file data. While some non-streamed
+calls pass such variable-length objects as struct AFSCBFids, these objects have
+a pre-determined maximum size.
+\par
+The two streamed RPC functions are also distinctive in that their single Rxgen
+declarations generate not one but two client-side stub routines. The first is
+used to ship the IN parameters off to the designated File Server, and the
+second to gather the OUT parameters and the error code. If a streamed
+definition declares a routine named X YZ(), the two resulting stubs will be
+named StartX YZ() and EndX YZ(). It is the application programmer's job to
+first invoke StartX YZ(), then manage the unbounded data transfer, then finish
+up by calling EndX YZ(). The first longword in the unbounded data stream being
+fetched from a File Server contains the number of data bytes to follow. The
+application then reads the specified number of bytes from the stream.
+\par
+The following sections describe the four client-side functions resulting from
+the FetchData() and StoreData() declarations in the Rxgen interface definition
+file. These are the actual routines the application programmer will include in
+the client code. For reference, here are the interface definitions that
+generate these functions. Note that the split keyword is what causes Rxgen to
+generate the separate start and end routines. In each case, the number after
+the equal sign specifies the function's identifying opcode number. The opcode
+is passed to the File Server by the StartRXAFS FetchData() and StartRXAFS
+StoreData() stub routines.
+
+\code
+FetchData(IN AFSFid *a_fidToFetchP,
+ IN long a_offset,
+ IN long a_lenInBytes,
+ OUT AFSFetchStatus *a_fidStatP,
+ OUT AFSCallBack *a_callBackP,
+ OUT AFSVolSync *a_volSyncP) split = 130;
+
+StoreData(IN AFSFid *Fid,
+ IN AFSStoreStatus *InStatus,
+ IN long Pos,
+ IN long Length,
+ IN long FileLength,
+ OUT AFSFetchStatus *OutStatus,
+ OUT AFSVolSync *a_volSyncP) split = 133;
+\endcode
+
+ \subsubsection sec5-1-4-1 Section 5.1.4.1: StartRXAFS FetchData - Begin
+a request to fetch file data
+
+\code
+int StartRXAFS FetchData(IN struct rx call *a rxCallP,
+ IN AFSFid *a fidToFetchP,
+ IN long a offset,
+ IN long a lenInBytes)
+\endcode
+
+\par Description
+Begin a request for a lenInBytes bytes of data starting at byte offset a offset
+from the file identified by a fidToFetchP. After successful completion of this
+call, the data stream will make the desired bytes accessible. The first
+longword in the stream contains the number of bytes to actually follow.
+\par
+Rx call information to the related File Server is contained in a rxCallP.
+\par Error Codes
+---No error codes generated.
+
+ \subsubsection sec5-1-4-2 Section 5.1.4.2: EndRXAFS FetchData -
+Conclude a request to fetch file data
+
+\code
+int EndRXAFS FetchData(IN struct rx call *a rxCallP,
+ OUT AFSFetchStatus *a fidStatP,
+ OUT AFSCallBack *a callBackP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+Conclude a request to fetch file data, as commenced by a StartRXAFS
+FetchData() invocation. By the time this routine has been called, all of the
+desired data has been read off the data stream. The status fields for the file
+from which the data was read are stored in a fidStatP. 
If the file was from a
+read/write volume, its callback information is placed in a callBackP.
+\par
+Rx call information to the related File Server is contained in a rxCallP.
+Volume version information is returned for synchronization purposes in a
+volSyncP.
+\par Error Codes
+EACCES The caller does not have the necessary access rights.
+\n EIO The given file could not be opened or statted on the File Server, or
+there was an error reading the given data off the File Server's disk.
+\n -31 An Rx write into the stream ended prematurely.
+
+ \subsubsection sec5-1-4-3 Section 5.1.4.3: StartRXAFS StoreData - Begin
+a request to store file data
+
+\code
+int StartRXAFS StoreData(IN struct rx call *a rxCallP,
+ IN AFSFid *a fidToStoreP,
+ IN AFSStoreStatus *a fidStatusP,
+ IN long a offset,
+ IN long a lenInBytes,
+ IN long a fileLenInBytes)
+\endcode
+\par Description
+Begin a request to write a lenInBytes bytes of data starting at byte offset a
+offset to the file identified by a fidToStoreP, causing that file's length to
+become a fileLenInBytes bytes. After successful completion of this call, the
+data stream will be ready to begin accepting the actual data being written.
+\par
+Rx call information to the related File Server is contained in a rxCallP.
+\par Error Codes
+---No error codes generated.
+
+ \subsubsection sec5-1-4-4 Section 5.1.4.4: EndRXAFS StoreData -
+Conclude a request to store file data
+
+\code
+int EndRXAFS StoreData(IN struct rx call *a rxCallP,
+ OUT AFSFetchStatus *a fidStatP,
+ OUT AFSCallBack *a callBackP,
+ OUT AFSVolSync *a volSyncP)
+\endcode
+\par Description
+Conclude a request to store file data, as commenced by a StartRXAFS StoreData()
+invocation. By the time this routine has been called, all of the file data has
+been inserted into the data stream. The status fields for the file to which the
+data was written are stored in a fidStatP. All existing callbacks to the given
+file are broken before the store concludes.
+\par
+Rx call information to the related File Server is contained in a rxCallP.
+Volume version information is returned for synchronization purposes in a
+volSyncP.
+\par Error Codes
+EACCES The caller does not have the necessary access rights.
+\n EISDIR The file being written to is a symbolic link.
+\n ENOSPC A write to the File Server's file on local disk failed.
+\n -32 A short read was encountered by the File Server on the data stream.
+
+ \subsection sec5-1-5 Section 5.1.5: Example of Streamed Function Call
+Usage
+
+ \subsubsection sec5-1-5-1 Section 5.1.5.1: Preface
+
+\par
+The following code fragment is offered as an example of how to use the streamed
+File Server RPC calls. In this case, a client fetches some amount of data from
+the given File Server and writes it to a local file it uses to cache the
+information. For simplicity, many issues faced by a true application programmer
+are not addressed here. These issues include locking, managing file chunking,
+data version number mismatches, volume location, Rx connection management,
+defensive programming (e.g., checking parameters before using them),
+client-side cache management algorithms, callback management, and full error
+detection and recovery. Pseudocode is incorporated when appropriate to keep the
+level of detail reasonable. 
For further descriptions of some of these details
+and issues, the reader is referred to such companion documents as AFS-3
+Programmer's Reference: Specification for the Rx Remote Procedure Call
+Facility, AFS-3 Programmer's Reference: Volume Server/Volume Location Server
+Interface, and AFS-3 Programmer's Reference: Architectural Overview.
+\par
+A discussion of the methods used within the example code fragment follows
+immediately afterwards in Section 5.1.5.3.
+
+ \subsubsection sec5-1-5-2 Section 5.1.5.2: Code Fragment Illustrating
+Fetch Operation
+
+\code
+int code; /*Return code*/
+long bytesRead; /*Num bytes read from Rx*/
+struct myConnInfo *connP; /*Includes Rx conn info*/
+struct rx_call *rxCallP; /*Rx call ptr*/
+struct AFSFid *afsFidP; /*Fid for file to fetch*/
+int lclFid; /*Descriptor for local cache file*/
+long offsetBytes; /*Starting fetch offset*/
+long bytesToFetch; /*Num bytes to fetch*/
+long bytesFromFS; /*Num bytes FileServer returns*/
+char *fetchBuffP; /*Buffer to hold stream data*/
+int currReadBytes; /*Num bytes for current read*/
+long pos; /*Result of local lseek()*/
+/*
+* Assume that connP, afsFidP, offsetBytes, CacheFileName, mode,
+* bytesToFetch, and the result pointers fidStatP, fidCallBackP, and
+* volSynchP have all been given their desired values.
+*/ . . .
+rxCallP = rx_NewCall(connP->rxConnP);
+code = StartRXAFS_FetchData( rxCallP, /*Rx call to use*/
+        afsFidP, /*Fid being fetched from*/
+        offsetBytes, /*Offset in bytes*/
+        bytesToFetch); /*Num bytes wanted*/
+if (code == 0)
+{
+    bytesRead = rx_Read(rxCallP, &bytesFromFS, sizeof(long));
+    if (bytesRead != sizeof(long)) ExitWithError(SHORT_RX_READ);
+    bytesFromFS = ntohl(bytesFromFS);
+    fetchBuffP = malloc(FETCH_BUFF_BYTES);
+    lclFid = open(CacheFileName, O_RDWR, mode);
+    pos = lseek(lclFid, offsetBytes, L_SET);
+    while (bytesToFetch > 0) {
+        currReadBytes = (bytesToFetch > FETCH_BUFF_BYTES) ?
+            FETCH_BUFF_BYTES : bytesToFetch;
+        bytesRead = rx_Read(rxCallP, fetchBuffP, currReadBytes);
+        if (bytesRead != currReadBytes) ExitWithError(SHORT_RX_READ);
+        code = write(lclFid, fetchBuffP, currReadBytes);
+        if (code != currReadBytes) ExitWithError(LCL_WRITE_FAILED);
+        bytesToFetch -= bytesRead;
+    } /*Read from the Rx stream*/
+    close(lclFid);
+} else ExitWithError(code);
+code = EndRXAFS_FetchData( rxCallP, /*Rx call to use*/
+        fidStatP, /*Resulting stat fields*/
+        fidCallBackP, /*Resulting callback info*/
+        volSynchP); /*Resulting volume sync info*/
+code = rx_EndCall(rxCallP, code);
+return(code); . . .
+\endcode
+
+ \subsubsection sec5-1-5-3 Section 5.1.5.3: Discussion and Analysis
+
+\par
+The opening assumption in this discussion is that all the information required
+to do the fetch has already been set up. These mandatory variables are the
+client-side connection information for the File Server hosting the desired
+file, the corresponding AFS file identifier, the byte offset into the file, the
+number of bytes to fetch, and the name of the local file serving as a
+cached copy.
+\par
+Given the Rx connection information stored in the client's connP record, rx
+NewCall() is used to create a new Rx call to handle this fetch operation. The
+structure containing this call handle is placed into rxCallP. This call handle
+is used immediately in the invocation of StartRXAFS FetchData(). If this setup
+call fails, the fragment exits. Upon success, though, the File Server will
+commence writing the desired data into the Rx data stream. The File Server
+first writes a single longword onto the stream announcing to the client how
+many bytes of data will actually follow. The fragment reads this number with
+its first rx Read() call. 
Since all Rx stream data is written in network byte
+order, the fragment translates the byte count to its own host byte order first
+to properly interpret it. Once the number of bytes to appear on the stream is
+known, the client code proceeds to open the appropriate cache file on its own
+local disk and seeks to the appropriate spot within it. A buffer into which the
+stream data will be placed is also created at this time.
+\par
+The example code then falls into a loop where it reads all of the data from the
+File Server and stores it in the corresponding place in the local cache file.
+For each iteration, the code decides whether to read a full buffer's worth or
+the remaining number of bytes, whichever is smaller. After all the data is
+pulled off the Rx stream, the local cache file is closed. At this point, the
+example finishes off the RPC by calling EndRXAFS FetchData(). This gathers in
+the required set of OUT parameters, namely the status fields for the file just
+fetched, callback and volume synchronization information, and the overall error
+code for the streamed routine. The Rx call created to perform the fetch is then
+terminated and cleaned up by invoking rx EndCall().
+
+ \subsection sec5-1-6 Section 5.1.6: Required Caller Functionality
+
+\par
+The AFS File Server RPC interface was originally designed to interact only with
+Cache Manager agents, and thus made some assumptions about its callers. In
+particular, the File Server expected that the agents calling it would
+potentially have stored callback state on file system objects, and would have
+to be periodically pinged in order to garbage-collect its records, removing
+information on dead client machines. Thus, any entity making direct calls to
+this interface must mimic certain Cache Manager actions, and respond to certain
+Cache Manager RPC interface calls.
+\par
+To be safe, any application calling the File Server RPC interface directly
+should export the entire Cache Manager RPC interface. Realistically, though, it
+will only need to provide stubs for the three calls from this interface that
+File Servers know how to make: RXAFSCB InitCallBackState(), RXAFSCB Probe(),
+and RXAFSCB CallBack(). The very first File Server call made by this
+application will prompt the given File Server to call RXAFSCB
+InitCallBackState(). This informs the application that the File Server has no
+record of its existence and hence this "Cache Manager" should clear all
+callback information for that server. Once the application responds positively
+to the initial RXAFSCB InitCallBackState(), the File Server will treat it as a
+bona fide, fully-fledged Cache Manager, and probe it every so often with
+RXAFSCB Probe() calls to make sure it is still alive.
+
+ \section sec5-2 Section 5.2: Signal Interface
+
+\par
+While the majority of communication with AFS File Servers occurs over the RPC
+interface, some important operations are invoked by sending unix signals to the
+process. This section describes the set of signals recognized by the File
+Server and the actions they trigger upon receipt, as summarized below:
+\li SIGQUIT: Shut down a File Server.
+\li SIGTSTP: Upgrade debugging output level.
+\li SIGHUP: Reset debugging output level.
+\li SIGTERM: Generate debugging output specifically concerning open files
+within the File Server process.
+
+ \subsection sec5-2-1 Section 5.2.1: SIGQUIT: Server Shutdown
+
+\par
+Upon receipt of this signal, the File Server shuts itself down in an orderly
+fashion. 
It first writes a message to the console and to its log file
+(/usr/afs/logs/FileLog) stating that a shutdown has commenced. The File Server
+then flushes all modified buffers and prints out a set of internal statistics,
+including cache and disk numbers. Finally, each attached volume is taken
+offline, which means the volume header is written to disk with the appropriate
+bits set.
+\par
+In typical usage, human operators do not send the SIGQUIT signal directly to
+the File Server in order to effect an orderly shutdown. Rather, the BOS Server
+managing the server processes on that machine issues the signal upon receipt of
+a properly-authorized shutdown RPC request.
+
+ \subsection sec5-2-2 Section 5.2.2: SIGTSTP: Upgrade Debugging Level
+
+\par
+Arrival of a SIGTSTP signal results in an increase of the debugging level used
+by the File Server. The routines used for writing to log files are sensitive to
+this debugging level, as recorded in the global LogLevel variable.
+Specifically, these routines will only generate output if the value of LogLevel
+is greater than or equal to the value of its threshold parameter. By default,
+the File Server sets LogLevel to zero upon startup. If a SIGTSTP signal is
+received when the debugging level is zero, it will be bumped to 1. If the
+signal arrives when the debugging level is positive, its value will be
+multiplied by 5. Thus, as more SIGTSTPs are received, the set of debugging
+messages eligible to be delivered to log files grows.
+\par
+Since the SIGTSTP signal is not supported under IBM's AIX 2.2.1 operating
+system, this form of debugging output manipulation is not possible on those
+platforms.
+
+ \subsection sec5-2-3 Section 5.2.3: SIGHUP: Reset Debugging Level
+
+\par
+Receiving a SIGHUP signal causes a File Server to reset its debugging level to
+zero. This effectively reduces the set of debugging messages eligible for
+delivery to log files to a bare minimum. This signal is used in conjunction
+with SIGTSTP to manage the verbosity of log information.
+\par
+Since the SIGHUP signal is not supported under IBM's AIX 2.2.1 operating
+system, this form of debugging output manipulation is not possible on those
+platforms.
+
+ \subsection sec5-2-4 Section 5.2.4: SIGTERM: File Descriptor Check
+
+\par
+Receipt of a SIGTERM signal triggers a routine which sweeps through the given
+File Server's unix file descriptors. For each possible unix file descriptor
+slot, an fstat() is performed on that descriptor, and the particulars of each
+open file are printed out. This action is designed solely for debugging
+purposes.
+
+ \section sec5-3 Section 5.3: Command Line Interface
+
+\par
+Another interface exported by the File Server is the set of command line
+switches it accepts. Using these switches, many server parameters and actions
+can be set. Under normal conditions, the File Server process is started up by
+the BOS Server on that machine, as described in AFS-3 Programmer's Reference:
+BOS Server Interface. So, in order to utilize any combination of these
+command-line options, the system administrator must define the File Server
+bnode in such a way that these parameters are properly included. Note that the
+switch names must be typed exactly as listed, and that abbreviations are not
+allowed. Thus, specifying -b 300 on the command line is unambiguous, directing
+that 300 buffers are to be allocated. It is not an abbreviation for the -banner
+switch, which asks that a message be printed to the console periodically. 
+\par +A description of the set of currently-supported command line switches follows. +\li -b <# buffers> Choose the number of 2,048-byte data buffers to allocate at +system startup. If this switch is not provided, the File Server will operate +with 70 such buffers by default. +\li -banner This switch instructs the File Server to print messages to the +console every 10 minutes to demonstrate it is still running correctly. The text +of the printed message is: File Server is running at