diff --git a/doc/txt/rx-spec.txt b/doc/txt/rx-spec.txt new file mode 100644 index 0000000000..c1499132eb --- /dev/null +++ b/doc/txt/rx-spec.txt @@ -0,0 +1,921 @@ +Rx protocol specification draft +Nickolai Zeldovich, kolya@MIT.EDU + +Introduction +============ + +Rx is a client-server RPC protocol, an extended and combined version +of the older R and RFTP protocols. This document describes Rx, but +the details of Rx security protocols (such as Rxkad) are not specified. + +Rx communicates via UDP datagrams on a user-specified port. Rx also +provides for multiplexing of Rx services on a single port, via a +16-bit service ID which identifies a particular Rx service that's +listening on a given port akin to a port number. Therefore, an Rx +service is identified by a triple of . + +The protocol is connection-oriented -- a client and a server must +first hand-shake and establish a connection before Rx calls can be +made. Said hand-shaking is implicit upon the first request if no +authentication is desired, or can consist of a pair of Challenge +and Response requests in order to establish authentication between +the client and the server. + +Protocol Overview +================= + +As mentioned above, Rx uses UDP/IP datagrams on a user-specified +port to communicate. An optional user-selectable authentication +and encryption method can be used to achieve desired security. +Each Rx server may provide multiple services, specified by the +Service ID. This allows for service multiplexing, much in the +same way as UDP port numbers allow for multiplexing of UDP +datagrams addressed to the same host. + +Each client and server pair that want to communicate using Rx must +establish an Rx connection, which can be thought of as a context +for all subsequent Rx activity between these two parties. An Rx +connection can only be associated with a single Rx service. + +Each Rx connection context contains multiple channels, which are +used for data transmission and actually performing an RPC call. +The channels are independent of each other, allowing multiple +RPC calls to be performed to the same Rx server simultaneously. + +An Rx call involves the transmission of call arguments over an Rx +channel to the server and reception of the reply data. For each +Rx call, an available Rx channel must be allocated exclusively to +that call. The channel cannot be used for anything else until the +call completes. After call completion, the channel may be reused +for subsequent Rx calls. + +Rx Connections +============== + +This section makes many references to fields of an Rx header; see +the ``Packet Formats'' section for specific layout of the Rx header. + +The connection epoch is a unique value chosen by Rx on startup and +used by the peer to both to identify connections to this host, and +to detect when this host's Rx restarts. An Rx connection between +two hosts is identified by: + + { Epoch, Connection ID, Peer IP, Peer Port }, + if the high bit of the epoch (+) is not set + { Epoch, Connection ID }, + if the high bit of the epoch (+) is set + +This means that if the high epoch bit is set, the recipient of a +packet should accept packets for this Rx connection from any IP +address and port number. Conversely, if the high bit is not set, +the IP and port number must be the same in order for packets to +be properly recognized as being part of the same connection. + +Connection ID is chosen by the client that establishes the connection. +The last two bits of the same 32-bit field are used by Rx to multiplex +between 4 parallel calls on the same connection. Each one of them is +called an Rx channel, and therefore the field is denoted "Channel ID". + +Call number identifies a particular call within a channel (so there +are four call numbers associated with an Rx connection). Each new +call should start with a higher number than the previous call, and +typically this is just the previous call number + 1. The initial +call number must be non-zero, since call number zero indicates a +connection-only Rx packet (see below). The call number is chosen +by the peer initiating the call. Although only one call can use +a channel at one time, the call number allows peers to distinguish +packets on the same channel that belong to different calls. + +The sequence number is similar to the sequence number in TCP, but +instead of bytes they count packets within a call. Sequence numbers +always start with 1 at the beginning of each call, and are incremented +by 1 for each additional packet sent. Retransmissions in Rx are done +on a packet-by-packet basis, identified by these sequence numbers. + +Every outgoing packet associated with a certain connection is stamped +with a serial number in the serial field, and the serial number is +incremented by 1 for every packet sent. This is used by the flow +control mechanisms (described below). The serial number for a +connection should start out with 1 (i.e., the first packet sent +should have a serial number of 1.) + +Service ID identifies a particular Rx service running on a given +host/port combination. This is analogous to how UDP port numbers +allow multiplexing packets to a single IP address. Note that once +an Rx connection has been created, the service ID may not be changed; +existing implementations cache the service ID value for a given +connection, and will ignore service ID values in subsequent packets. + +The Checksum field allows for an optional packet checksum. A zero +checksum field value means that checksums are not being computed. +An Rx security protocol (identified by the security field, described +below) may choose to use this field to transport some checksum of +the packet that is computed and verified by it (for example, rxkad +uses this field for a cryptographic header checksum). Rx itself +makes no use of the checksum field. + +The status field allows for additional user flags to be transported +with each packet. These have no significance to the protocol itself. +These flags are associated with a call rather than an individual +packet. + +The security field specifies the type of security in use on this +connection. These values don't have a defined mapping in the Rx +protocol but rather are mapped to specific Rx security types by +the application using Rx. + +An Rx security protocol can use the checksum field as described +above, and can also modify the packet payload in any way, for +instance by encrypting the contents or adding headers or trailers +specific to the security protocol (although the end result must +be a properly sized packet that Rx will be able to transmit.) + +The "Flags" field consists of a number of single-bit flags with +meanings as follows. The actual bit values are defined below, +in the ``Protocol Constants'' section. + + * CLIENT-INITIATED + This packet originated from an Rx client (as opposed + to server). To avoid packet loops, a server should + always clear the CLIENT-INITIATED flag on any packets + it sends, and discard incoming packets without the + CLIENT-INITIATED flag. + + * REQUEST-ACK + Sender is requesting acknowledgement of this packet, + via an Ack packet response. + + * LAST-PACKET + This packet is the last packet in this call from the + sender. + + NOTE: some older Rx implementations, which do not + support the trailing packet size fields in Rx Ack + packets, use the LAST-PACKET flag for computing the + MTU. In particular, when a DATA packet with the + REQUEST-ACK flag but without the LAST-PACKET flag + is received, the MTU is adjusted down to the size + of that packet. + + * MORE-PACKETS + More packets are going to be following this one. This + flag is set on all but the last packet by the sender + transmitting a list of packets at once, for possible + optimization at the receiver end. + + * SLOW-START-OK + In an ack packet, indicates that the sender of this + packet supports the slow-start mechanism, described + below under ``Flow Control''. + + * JUMBO-PACKET + In a data packet, indicates that this packet is part + of a jumbogram, and is not the last one. See the + ``Jumbograms'' section below for more details. + +Packet Types +============ + +The "Type" field indicates the contents of this packet. Actual +values are specified in the ``Protocol Constants'' section. +This section describes the simpler packet types, and subsequent +sections cover more complex packet types in more detail. + +Certain type packets are connection-only requests (that is, they +are not associated with an RPC call). A connection-only request +is indicated by a zero call number. Valid packet types in a +connection-only context are Abort, Challenge, Response, Debug, +Version, and the parameter exchange packet types. All other +packets can only be used in the context of a call. Additionally, +Abort can be used both in a connection and call context. + +The payload of the packet following the header depends on the +type of the field, as follows: + + * DATA type (Standard data packet) + + The payload of a data packet is simply the Rx payload, + corresponding to the sequence number and call specified + in the header. The actual data that is transmitted in + Rx data packets is described below. + + The receipt of a data packet by a client implicitly + acknowledges that the server has received and processed + all the packets that have been transmitted to it as + part of this call. + + * ACK type (Acknowledgement of received data) + + An acknowledgement packet provides information about + which packets were or were not received by the peer, + and other useful parameters. The semantics of these + packets are described below in the ``Call Layer'' + section. + + * BUSY type (Busy response) + + When a client tries to start a new call on a channel + which the server still considers active, a busy response + is returned. The call and channel number in the packet + header indicate which call is being rejected. This packet + type has no payload associated with it. + + * ABORT type (Abort packet) + + Indicates that the relevant connection or call (if the + call number field is non-zero) has encountered an error + and has been terminated. The payload of the packet has + a network-byte-order 32-bit user error code. + + * ACKALL type (Acknowledgement of all packets) + + An acknowledge-all packet indicates the obvious: the peer + wants to acknowledge the receipt of all packets sent to + it. This could be used, for example, when a connection + is being closed and the client wants to ensure that no + retransmissions are attempted after it exits. + + There is no payload associated with an acknowledge-all + packet. + + * CHALLENGE, RESPONSE types (Challenge request/response) + + The payload of the packet is security-layer-specific + data, and is used to authenticate an Rx connection. + + Perhaps this should include a reference to some spec + on rxkad (or rxkad should just be added to this spec.) + + * DEBUG type (Debug packet) + + Rx supports an optional debugging interface; see the + ``Debugging'' section below for more details. + + * PARAMS types (Parameter exchange) + + These types were assigned in AFS 3.2 but never used for + anything, and therefore have no protocol significance + at this time. + + * VERSION type (Get AFS version) + + If a server receives a packet with a type value of 13, and + the client-initiated flag set, it should respond with a + 65-byte payload containing a string that identifies the + version of AFS software it is running. The response should + not have the client-initiated flag set. + + Nothing should respond to a version packet without the + client-initiated flag, to avoid infinite packet loops. + +Call Layer +========== + + The call layer provides a reliable data transport over an + Rx channel, and is used by the RPC layer to make Rx calls. + One of the most important pieces of the call layer is the + Rx acknowledgement packet. The acknowledgement packet is + used by Rx to determine when retransmissions are needed, + as well as determining the proper transmission / receiving + parameters to use (such as the transmit window size and + jumbogram length, described in more detail below). + + A new call is established by the client simply sending a + data packet to the server on an available channel. Either + side can indicate that they have no more data to send by + setting the LAST-PACKET flag in their last Rx packet. The + call remains open until the upper layer informs Rx that it + is done with the call. (The upper layer in this case would + most likely be the Rx RPC layer.) + + The structure of an Rx acknowledgement packet is described + in the Packet Formats section. We will refer to particular + fields of the acknowledgement packet here by names. + + The field specifies the number of packets that + the sender of the acknowledgement is willing to provide for + receiving packets for this call. The sender, presumably, + should not send packets beyond the number specified here, + without receiving further acknowledgement allowing it. + + The field indicates the maximum packet skew that + the sender of this packet has seen for this call. If a + packet is received N packets later than expected (based + on the packet's serial number, i.e. if the last received + packet's serial number is N higher than this packet's), + then it is defined to have a skew of N. This can be used + to avoid retransmission because of packet reordering. + + The number specifies the sequence number of + the first packet that is being explicitly acknowledged (either + positively or negatively) by this packet. All packets with + sequence numbers smaller than this are implicitly acknowledged. + + The field, previously used to indicate the previous + received packet, is no longer used. It should be set to zero + by the sender and not interpreted by the receiver. + + The field indicates the serial number of the + packet which has triggered this acknowledgement, or zero if there + is no such packet (i.e. the ack packet was delayed and should not + be used for round-trip time computation). The receiver should + note that any transmitted packets with a serial number less than + this, which are not acknowledged by this packet, are likely lost + or reordered. Thus, these packets should be retransmitted, after + a possible delay to allow for packet reordering (as measured by + packet skew). + + The trailing fields after the variable-length acknowledgements + section are not always 32-bit aligned with respect to the packet, + and aren't always present. (Their presence depends on the Rx + version of the peer.) The maximum and recommended packet sizes + are, respectively, the largest possible packet size that the peer + is willing to accept from us, and the size of the packet they + would prefer to receive. In absence of these fields, it should + be assumed that the maximum allowed packet size is 1444 bytes. + + The receive window size indicates the size of the ACK sender's + receive window, in packets. Its use is described below in + the "Flow Control" section. If this field is absent, the + implementation must assume a maximum window size of 15 packets; + older implementations that do not support this trailing field + only allow for a window of 15 packets. + + The "Max Packets per Jumbogram" field indicates how many packets + the ACK sender is willing to receive in a jumbogram (also + described below). All packets in a jumbogram are always of the + same size (except the last one), regardless of the maximum and + recommended packet sizes described above. + + The field specifies a particular type of an ack packet. + Valid reason codes are specified in the ``Packet Formats and + Protocol Constants'' section; their meanings are as follows: + + REQUESTED + Acknowledgement was requested. The peer received + a packet from us with the acknowledgement-requested + flag set, and is acknowledging it. + + DUPLICATE + A duplicate packet was received. The duplicate + packet's serial number is in the field. + + OUT-OF-SEQUENCE + A packet was received out of sequence. The serial + number of said packet is in the field. + + WINDOW-EXCEEDED + A packet was received but exceeded the current + receive window, and was dropped. + + NO-SPACE + A packet was received, but no buffer space was + available and therefore it was dropped. + + PING + This is a keep-alive packet, used to verify that + the peer is still alive. If the REQUEST-ACK flag + in the Rx packet is set, the recipient of this + packet should reply with a PING-RESPONSE packet. + + PING-RESPONSE + This is a response to a keep-alive ack (ping). + + DELAYED + A delayed acknowledgement, usually because a certain + amount of time has passed since the receipt of the + last packet and there are outstanding unacknowledged + packets. Should not be used for RTT computation. + + OTHER + Un-delayed general acknowledgement, which does not + fall in any of the above categories. + + A peer should never delay the transmission of an ack packet + in response to a received packet unless it sets the delayed + ack type field. This is because ack packets (except for + delayed ones) are used for RTT computation by Rx. + + All acknowledgement packets should have the REQUEST-ACK + flag in the Rx header turned off, except for PING type + ack packets. + + The field specifies the number of bytes following + in the acknowledgements section. Each of those bytes indicate + the acknowledgement status corresponding to a sequence number + between firstSequence and firstSequence+ackCount-1 inclusively. + There can be up to 255 bytes in the acknowledgements section. + Typically the ack count is the receive window size of the + ack packet sender, and the individual packet status bytes + correspond to the packets in the current receive window. + The values in each of those bytes can be as follows: + + 0 Explicit negative acknowledgement: packet with the + corresponding sequence number has not been received + or has been dropped. + 1 Explicit acknowledgement: packet with the corresponding + sequence number has been received but not processed by + the application yet. + + It's important to note the distinction between packets with + sequence numbers before firstSequence, between firstSequence + and firstSequence+ackCount-1, and those with sequence numbers + of at least firstSequence+ackCount. Those in the first category + have been passed up to the application level and the sender + (recipient of this ack) can recycle packets with such sequence + numbers. + + Packets in the second category are individually acknowledged + in the acknowledgements section, either as being queued for + the application or not received. The recipient of the ack + should keep all packets with sequence numbers in this range, + but avoid retransmitting the positively acknowledged ones. + Negatively acknowledged packets should be retransmitted. + A more detailed explaination of the retransmit strategy is + given below. + + Packets in the third category are not acknowledged at all, + and the recipient of the ack should assume no knowledge + of their state. Since the Rx receive window should not + exceed the size of an ack packet, the sender shouldn't + have transmitted any packets in this category anyway. + + * Round-trip time computation + + To determine when packet retransmission is necessary, Rx + computes some statistics about the round-trip time between + the two hosts: exponentially-decaying averages of the + round-trip time and the standard deviation thereof. Each + acknowledgement packet which mentions a specific packet in + the field and is not delayed is used to update the + round-trip statistics. First, the round-trip time for this + packet (R) is computed as the difference between the arrival + time of the ack packet and the time we transmitted the + packet with the serial number specified in . + + Next, the round-trip time average and standard deviation + values are updated. For instance, this algorithm could + be used: + + RTTdev = RTTdev * (3/4) + |RTTavg - R| / 4 + RTTavg = RTTavg * (7/8) + R / 8 + + * Packet retransmission + + In order to support reliable data transport, Rx must retransmit + packet which are lost in the network. This must not be done + too early, otherwise we might retransmit a packet whose first + copy is still in transit, thereby wasting bandwidth. + + Rx computes a retransmit timeout value T, and retransmits any + packet which hasn't been positively acknowledged since last + transmission for at least T seconds. This timeout could be + computed as follows from the round-trip statistics above: + + T = RTTavg + 4 * RTTdev + 0.350 + + This allows the packet to be up to 4 deviations late and still + not be retransmitted. The 350 msec fudge factor is used to + compensate for bursty networks, though it is likely becoming + less relevant (and accurate) with time. + + A more clever algorithm could take into account the maximum + packet skew rate, and improve the retransmission strategy to + take into the account the likelihood that a given packet has + been reordered, and give it extra time before retransmission. + + * Keepalive and Timeout + + The upper layer (either the Rx RPC layer or the application) + have to specify a timeout, T, to the call layer. If the peer + is not heard from within T seconds, the call layer declares + the call to be dead and propagates the error to the upper + layer. + + In order to determine whether the peer is still alive or not, + keepalive requests are used. These take form of an ack PING + and PING-RESPONSE packets. When the client has not received + any response from the server, either to the original request + or the keepalive requests, in T seconds, the call times out. + + The following strategy may be used to determine when to send + keepalive requests: + + Compute a keepalive timeout, KT = T/6 + + If the call was initiated KT seconds ago, or KT + seconds have passed since the last keepalive + request transmission, send a keepalive packet. + + This strategy limits the number of transmitted keepalive + packets to a fixed number in the case of a dead server, + and proportional to the real timeout in case of a slow + server. It also allows up to 5 keepalives to be dropped + before the server is erroneously declared dead. + + * Flow Control + + Every Rx client or server has associated with each Rx call a + receive and transmit window. These windows indicate the number + of packets that haven't been fully acknowledged packets (that + is, not read by the peer's application) that an Rx sender can + have outstanding at any time. A sender's transmit window may + never be greater than it's peer's receive window for that call. + The receive windows are exchanged via the "Receive Window Size" + parameter in an Ack packet. + + Rx ``sliding windows'' are similar to those used by TCP, except + they measure packets rather than bytes. Also, in TCP the window + effectively applies to bytes in flight between the two peers, + whileas in Rx the window applies to packets between the user + applications. For example, a transmit window of 8 on a certain + Rx connection means that at most 8 packets can be transmitted + and not yet read by the peer's application at any time. The + sequence number of the first packet that hasn't been read by + the application is indicated by the First Sequence field of + an Ack packet. + + The selection of initial window sizes isn't strictly defined + by the Rx protocol, but here are a few things that one might + want to consider when choosing initial windows: + + * A useful strategy can be to advertise a small receive + window until the application starts reading data, and + advertise a larger window afterwards. + + * The transmit window should be initially a conservative + small value. Once an Ack packet is received, the peer's + advertised receive window can be used to choose a better + transmit window. + + Rx uses the slow start, congestion avoidance, and fast recovery + algorithms[6]. The algorithms are modified to work in the context + of Rx packet-based transmission windows, and are described below. + + These algorithms require two additional variables to be maintained + for each active Rx call: a congestion window, cwind, and a slow + start threshold, ssthresh. + + Define a "negative ack" as an Ack packet that contains a negative + acknowledgement followed by a positive one. Similarly, define a + "positive ack" to be any Ack that is not negative. Upon receiving + three negative acks for a call in a row since the last congestion + avoidance attempt (if any), the Rx protocol enters congestion + avoidance for that Rx call. + + * Slow start, congestion avoidance, and fast recovery algorithms + + First, the congestion window, cwind, is initialized to 1. + The number of unread transmitted packets is now limited not + only by the transmission window, but also by the congestion + window. The latter limit is a little different: Rx may + send up to cwind packets (by sequence number) past the last + contiguous positively acknowledged packet. For example, + if an Ack packet indicates that packets 1, 2 and 8 were + received, and cwind is 2, Rx may transmit packets 3 and 4. + + When congestion occurs (indicated by a negative ack or a + packet retransmission timeout), Rx enters congestion avoidance + and fast recovery. The slow-start threshold, ssthresh, is + set to half of the effective transmission window (minimum of + cwind and transmit window), but no less than 2 packets. + + If triggered by a negative ack, any negatively acknowledged + packets should be retransmitted as soon as possible (i.e. + window-permitting). + + If triggered by a retransmission timeout, the congestion + window is reset to a single packet. + + When in fast-recovery mode, every additional negative ack + packet received causes cwind to be increased by one packet. + A positive ack packet causes cwind to be set to ssthresh, + and terminates fast recovery. At this point we are back + to congestion avoidance, since the cwind is half the original + transmission window. + + When packet acknowledgements are received, the congestion + window should be increased. If cwind is less than ssthresh, + cwind should be increased by 1 for each newly acknowledged + packet. If cwind is at least ssthresh, cwind is increased + by 1 for each newly received Ack packet. + + The size of the receive window should not grow past the size of + an Rx ack packet (which can acknowledge up to 255 packets at a + time.) + +Debugging +========= + +Rx provides for an optional debugging interface, using the Debug AFS +packet type, allowing remote Rx clients to query an Rx server for +some Rx protocol statistics. Not all implementations are required +to implement this interface. Some parts of this interface may also +be specific to a particular implementation of Rx. In order to prevent +packet loops, a server should only reply to debug packets with the +client-initiated flag set. + +The payload of a debug request packet is always the same; both of +the 32-bit quantities are in network byte order: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Debug Type | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Debug Index | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +The debug type indicates the kind of debug information being sent +or requested, and determines the format of the rest of the packet. +The debug index allows some debug types to export array-like data, +indexed by this field. The following debug types are defined for +the Transarc implementation: + + 0x01 Retrieve basic connection statistics + 0x02 Get information about some connections + 0x03 Get information about all connections + 0x04 Get all Rx stats + 0x05 Get all peers of this server + +The index field in the debug packet indicates which element of the +debug information the client wants to access, in cases where there +are multiple entries in question. + +The responses to each of those debug queries contain the following +information: + +1. Retrieve basic connection stats + + An array of general statistics about packet allocation, + server performance, and so on. The first octet in this + response represents the debug protocol version being used + by the server. See RX_DEBUGI_VERSION* in rx/rx.h. + +2, 3. Get information about connections + + Both of these calls return a struct rx_debugConn (see + rx/rx.h), indexed by the "index" field. + + The first version of the debug call (type 2) only retrieves + information about connections which are deemed interesting, + that is, connections which are active, or about to be + reaped. + + The end of the list is signaled by a response where the + connection ID value is 0xFFFFFFFF. + +4. Get Rx stats + + This call returns a struct rx_stats to the client in network + byte order, containing various statistics about the state of + Rx on the server (see rx/rx.h). + +5. Get all Rx peers + + Similar to the connection request above (2, 3) this call + returns all the Rx peers of the server (in a network-byte-order + struct rx_debugPeer), indexed by the index field in the request. + End of list is indicated by a host value of 0xFFFFFFFF. (These + are the first 4 octets.) + +In response to unknown requests, the server returns 0xFFFFFFF8 in the +debug type field. + + XXX The response interface should probably be fixed + to include a fixed header that indicates whether + the request was successfully completed. + +Jumbograms +========== + +To be able to transmit more data in a single packet, Rx supports +``jumbograms'', which are single UDP datagrams containing multiple +sequential Rx DATA packets. In a jumbogram, all packets except the +last one must be of a fixed maximal size (1412 bytes). Because all +the packets in the jumbogram are sequential, only one full header +is needed. Here is what a jumbogram could look like: + + +-----------+---------------+--------------+---------------+ + | Rx header | 1412 byte pkt | Short header | 1412 byte pkt | -> + +-----------+---------------+--------------+---------------+ + + +--------------+- -+-----------------------+ + -> | Short header | ... | <= 1412 byte last pkt | + +--------------+- -+-----------------------+ + +Every Rx packet in a jumbogram except the first one must be preceeded +by the short Rx header, and all packets except the last one must have +the Jumbogram Rx flag set in their respective headers. The number of +packets in a jumbogram may not exceed the peer's advertised Max Packets +Per Jumbogram value in the Ack packet. + +The maximum number of packets per jumbogram should be assumed to be 1 +(i.e., no jumbograms) unless explicitly specified otherwise by an Ack +packet. If an Ack packet is received without the packet-per-jumbogram +field, it might indicate that the peer is now running a version of Rx +that does not support jumbograms, and therefore no jumbograms should +be sent until they are explicitly enabled again. + +The short header in a jumbogram has the following makeup: + + 0 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Flags | Reserved | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Checksum | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +All the packets in the jumbogram have the same Rx header fields +(from the full Rx header) except for Flags, Checksum, Sequence, +and Serial. The flags and checksum field for subsequent packets +are taken from the short header preceeding that packet in the +jumbogram. The sequence and serial numbers are assumed to be +consecutive, and are incremented by 1 from the first packet in +the jumbogram (ie the full Rx header). + +Retransmitted packets should not be sent in a jumbogram. + +RPC Layer +========= + +This section discusses how an RPC call is made using the Rx protocol. +There are two common ``types'' of Rx calls: simple and streaming. +These mostly reflect a difference in the upper-level API rather than +in the Rx protocol. A simple Rx call has a fixed number of input +variables and a fixed number of output variables. A streaming Rx +call, in addition to the above, allows the user to send and receive +arbitrary amounts of data (whose length should be specified as a +fixed-length argument.) + +In either case, an Rx call consists of two basic stages: client +sending the data to the server, and server sending the response +back to the client. No data can be sent by the client in the +same call after the server has started sending its response. + +Each remote function call associated with a particular Rx service +(identified by the IP-port-serviceId triplet, as mentioned above) +is assigned a 32-bit integer opcode number. To make a simple Rx +call, the caller must transmit the opcode number followed by the +expected arguments for that call over an Rx channel using XDR +encoding. The callee uses XDR to unmarshall the opcode and input +arguments, performs a function call corresponding to that opcode +and arguments, and then uses XDR to encode the return values back +to the caller. The caller then uses XDR to receive the output +variables. + +For streaming calls which send data from the caller to the callee, +the convention is to include the length of the data to be sent as +one of the fixed-length arguments, and send the variable-length +data immediately after the fixed-length portion. For streaming +calls which receive data, the convention is for the callee to first +reply with a fixed-length field specifying the number of bytes it's +about to send, and then send those bytes. Upon completion of the +streaming part of the call, the output arguments are sent back to +the caller in fixed-length XDR form, as with simple calls. + +Packet Formats and Protocol Constants +===================================== + + * Rx packet + + Every simple Rx packet has an Rx header, of the form below. + All quantities are in network byte order. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |+| Connection Epoch | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Connection ID | * | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Call Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Sequence Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Serial Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type | Flags | Status | Security | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Checksum | Service ID | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Payload .... + +-+-+-+-+- + + [*] The field marked with * is the Channel ID. The last + two bits of the connection ID are used to multiplex + between 4 parallel calls. + + [+] The bit marked with + is used to indicate that only + the connection ID should be used to identify this + connection, and sender host/port should not be used. + + The values for the Flags field are defined as follows: + + 0000 0001 CLIENT-INITIATED + 0000 0010 REQUEST-ACK + 0000 0100 LAST-PACKET + 0000 1000 MORE-PACKETS + 0001 0000 - Reserved - + 0010 0000 SLOW-START-OK + 0010 0000 JUMBO-PACKET + + Commonly, but not necessarily, the following value mappings + for the Security field are used: + + 0 No security or encryption + 1 bcrypt security, only used in AFS 2.0 + 2 "krb4" rxkad + 3 "krb4" rxkad with encryption (sometimes) + + The following packet type values are defined: + + 1 DATA Standard data packet + 2 ACK Acknowledgement of received data + 3 BUSY Busy response + 4 ABORT Abort packet + 5 ACKALL Acknowledgement of all packets + 6 CHALLENGE Challenge request + 7 RESPONSE Challenge response + 8 DEBUG Debug packet + 9 PARAMS Exchange of parameters + 10 PARAMS Exchange of parameters + 11 PARAMS Exchange of parameters + 12 PARAMS Exchange of parameters + 13 VERSION Get AFS version + + * Rx acknowledgement packet + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Buffer Space | Max Skew | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | First Sequence | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Reserved | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Serial | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Reason | Ack Count | Acknowledgements ... + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ .. + + ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + ... Acks | Reserved | Reserved | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Maximum Packet Size | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Recommended Packet Size | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Receive Window Size | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Max Packets per Jumbogram | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Note that the trailing fields can have arbitrary alignment, + determined by the number of individual acks in the packet. + There are three reserved octets between the variable acks + section and the start of the trailing fields; they also have + no particular alignment. + + The valid values for the Reason code are: + + 1 REQUESTED + 2 DUPLICATE + 3 OUT-OF-SEQUENCE + 4 WINDOW-EXCEEDED + 5 NO-SPACE + 6 PING + 7 PING-RESPONSE + 8 DELAYED + 9 OTHER + +Acknowledgements +================ + +Jeffrey Hutzelman reviewed an early draft of this +specification, and provided much appreciated feedback on technical +details as well as document structuring. + +Love Hornquist-Astrand made many corrections +to this specification, especially regarding backwards-compatibility +with older Rx implementations. + +References +========== + + [1] /afs/sipb.mit.edu/contrib/doc/AFS/hijacking-afs.ps.gz + + [2] OpenAFS: src/rx/ + + [3] /afs/sipb.mit.edu/contrib/doc/AFS/ps/rx-spec.ps + + [4] ftp://ftp.stacken.kth.se/pub/arla/prog-afs/shadow/doc/r.vdoc + + [5] ftp://ftp.stacken.kth.se/pub/arla/prog-afs/shadow/doc/rx.mss + + [6] http://web.mit.edu/rfc/rfc2001.txt + +$Id: rx-spec,v 1.22 2002/10/20 06:46:00 kolya Exp $