Rx protocol specification draft (2002)
Nickolai Zeldovich, kolya@MIT.EDU

Introduction
============
Rx is a client-server RPC protocol, an extended and combined version
of the older R and RFTP protocols. This document describes Rx, but
the details of Rx security protocols (such as Rxkad) are not specified.
Rx communicates via UDP datagrams on a user-specified port. Rx also
provides for multiplexing of Rx services on a single port, via a
16-bit service ID which, akin to a port number, identifies a
particular Rx service listening on a given port. Therefore, an Rx
service is identified by a triple of <IP address; UDP port number;
Rx service ID>.
The protocol is connection-oriented -- a client and a server must
first hand-shake and establish a connection before Rx calls can be
made. Said hand-shaking is implicit upon the first request if no
authentication is desired, or can consist of a pair of Challenge
and Response requests in order to establish authentication between
the client and the server.
Protocol Overview
=================
As mentioned above, Rx uses UDP/IP datagrams on a user-specified
port to communicate. An optional user-selectable authentication
and encryption method can be used to achieve desired security.
Each Rx server may provide multiple services, specified by the
Service ID. This allows for service multiplexing, much in the
same way as UDP port numbers allow for multiplexing of UDP
datagrams addressed to the same host.
Each client and server pair that want to communicate using Rx must
establish an Rx connection, which can be thought of as a context
for all subsequent Rx activity between these two parties. An Rx
connection can only be associated with a single Rx service.
Each Rx connection context contains multiple channels, which are
used for data transmission and actually performing an RPC call.
The channels are independent of each other, allowing multiple
RPC calls to be performed to the same Rx server simultaneously.
An Rx call involves the transmission of call arguments over an Rx
channel to the server and reception of the reply data. For each
Rx call, an available Rx channel must be allocated exclusively to
that call. The channel cannot be used for anything else until the
call completes. After call completion, the channel may be reused
for subsequent Rx calls.
Rx Connections
==============
This section makes many references to fields of an Rx header; see
the ``Packet Formats'' section for specific layout of the Rx header.
The connection epoch is a unique value chosen by Rx on startup and
used by the peer both to identify connections to this host and
to detect when this host's Rx restarts. An Rx connection between
two hosts is identified by:
{ Epoch, Connection ID, Peer IP, Peer Port },
if the high bit of the epoch (+) is not set
{ Epoch, Connection ID },
if the high bit of the epoch (+) is set
This means that if the high epoch bit is set, the recipient of a
packet should accept packets for this Rx connection from any IP
address and port number. Conversely, if the high bit is not set,
the IP and port number must be the same in order for packets to
be properly recognized as being part of the same connection.
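
The lookup rule above can be sketched in Python as follows. This is
only an illustration, not code from any Rx implementation: the names
connection_key and HIGH_EPOCH_BIT are hypothetical, and the high bit
is assumed to be the most significant bit of the 32-bit epoch (the
bit marked + in the header diagram).

```python
# Assumption: the (+) bit is the most significant bit of the 32-bit epoch.
HIGH_EPOCH_BIT = 0x80000000


def connection_key(epoch, connection_id, peer_ip, peer_port):
    """Return the tuple used to look up an Rx connection."""
    if epoch & HIGH_EPOCH_BIT:
        # High bit set: accept packets from any IP address and port.
        return (epoch, connection_id)
    # High bit clear: the peer address and port are part of the identity.
    return (epoch, connection_id, peer_ip, peer_port)
```

With this keying, a packet that arrives from a new address matches
the existing connection only when the epoch has its high bit set.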
Connection ID is chosen by the client that establishes the connection.
The last two bits of the same 32-bit field are used by Rx to multiplex
between 4 parallel calls on the same connection. Each one of them is
called an Rx channel, and therefore the field is denoted "Channel ID".
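
As a small sketch of how the 32-bit field might be split, the
following Python helper separates the Channel ID (low two bits) from
the Connection ID (remaining 30 bits). The helper and mask names are
hypothetical, and leaving the Connection ID unshifted is an
assumption made for this example.

```python
CHANNEL_MASK = 0x00000003     # last two bits: Channel ID (0..3)
CONNECTION_MASK = 0xFFFFFFFC  # remaining 30 bits: Connection ID


def split_cid_field(field):
    """Split the 32-bit header field into (connection_id, channel_id).

    Assumption: the connection ID is kept in place (not shifted right).
    """
    return field & CONNECTION_MASK, field & CHANNEL_MASK
```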
Call number identifies a particular call within a channel (so there
are four call numbers associated with an Rx connection). Each new
call should start with a higher number than the previous call, and
typically this is just the previous call number + 1. The initial
call number must be non-zero, since call number zero indicates a
connection-only Rx packet (see below). The call number is chosen
by the peer initiating the call. Although only one call can use
a channel at one time, the call number allows peers to distinguish
packets on the same channel that belong to different calls.
The sequence number is similar to the sequence number in TCP, but
it counts packets within a call rather than bytes. Sequence numbers
always start with 1 at the beginning of each call, and are incremented
by 1 for each additional packet sent. Retransmissions in Rx are done
on a packet-by-packet basis, identified by these sequence numbers.
Every outgoing packet associated with a certain connection is stamped
with a serial number in the serial field, and the serial number is
incremented by 1 for every packet sent. This is used by the flow
control mechanisms (described below). The serial number for a
connection should start out with 1 (i.e., the first packet sent
should have a serial number of 1.)
Service ID identifies a particular Rx service running on a given
host/port combination. This is analogous to how UDP port numbers
allow multiplexing packets to a single IP address. Note that once
an Rx connection has been created, the service ID may not be changed;
existing implementations cache the service ID value for a given
connection, and will ignore service ID values in subsequent packets.
The Checksum field allows for an optional packet checksum. A zero
checksum field value means that checksums are not being computed.
An Rx security protocol (identified by the security field, described
below) may choose to use this field to transport some checksum of
the packet that is computed and verified by it (for example, rxkad
uses this field for a cryptographic header checksum). Rx itself
makes no use of the checksum field.
The status field allows for additional user flags to be transported
with each packet. These have no significance to the protocol itself.
These flags are associated with a call rather than an individual
packet.
The security field specifies the type of security in use on this
connection. These values don't have a defined mapping in the Rx
protocol but rather are mapped to specific Rx security types by
the application using Rx.
An Rx security protocol can use the checksum field as described
above, and can also modify the packet payload in any way, for
instance by encrypting the contents or adding headers or trailers
specific to the security protocol (although the end result must
be a properly sized packet that Rx will be able to transmit.)
The "Flags" field consists of a number of single-bit flags with
meanings as follows. The actual bit values are defined below,
in the ``Protocol Constants'' section.
* CLIENT-INITIATED
This packet originated from an Rx client (as opposed
to server). To avoid packet loops, a server should
always clear the CLIENT-INITIATED flag on any packets
it sends, and discard incoming packets without the
CLIENT-INITIATED flag.
* REQUEST-ACK
Sender is requesting acknowledgement of this packet,
via an Ack packet response.
* LAST-PACKET
This packet is the last packet in this call from the
sender.
NOTE: some older Rx implementations, which do not
support the trailing packet size fields in Rx Ack
packets, use the LAST-PACKET flag for computing the
MTU. In particular, when a DATA packet with the
REQUEST-ACK flag but without the LAST-PACKET flag
is received, the MTU is adjusted down to the size
of that packet.
* MORE-PACKETS
More packets are going to be following this one. This
flag is set on all but the last packet by the sender
transmitting a list of packets at once, for possible
optimization at the receiver end.
* SLOW-START-OK
In an ack packet, indicates that the sender of this
packet supports the slow-start mechanism, described
below under ``Flow Control''.
* JUMBO-PACKET
In a data packet, indicates that this packet is part
of a jumbogram, and is not the last one. See the
``Jumbograms'' section below for more details.
Packet Types
============
The "Type" field indicates the contents of this packet. Actual
values are specified in the ``Protocol Constants'' section.
This section describes the simpler packet types, and subsequent
sections cover more complex packet types in more detail.
Certain packet types are connection-only requests (that is, they
are not associated with an RPC call). A connection-only request
is indicated by a zero call number. Valid packet types in a
connection-only context are Abort, Challenge, Response, Debug,
Version, and the parameter exchange packet types. All other
packets can only be used in the context of a call. Additionally,
Abort can be used both in a connection and call context.
The payload of the packet following the header depends on the
value of the Type field, as follows:
* DATA type (Standard data packet)
The payload of a data packet is simply the Rx payload,
corresponding to the sequence number and call specified
in the header. The actual data that is transmitted in
Rx data packets is described below.
The receipt of a data packet by a client implicitly
acknowledges that the server has received and processed
all the packets that have been transmitted to it as
part of this call.
* ACK type (Acknowledgement of received data)
An acknowledgement packet provides information about
which packets were or were not received by the peer,
and other useful parameters. The semantics of these
packets are described below in the ``Call Layer''
section.
* BUSY type (Busy response)
When a client tries to start a new call on a channel
which the server still considers active, a busy response
is returned. The call and channel number in the packet
header indicate which call is being rejected. This packet
type has no payload associated with it.
* ABORT type (Abort packet)
Indicates that the relevant connection or call (if the
call number field is non-zero) has encountered an error
and has been terminated. The payload of the packet has
a network-byte-order 32-bit user error code.
* ACKALL type (Acknowledgement of all packets)
An acknowledge-all packet indicates the obvious: the peer
wants to acknowledge the receipt of all packets sent to
it. This could be used, for example, when a connection
is being closed and the client wants to ensure that no
retransmissions are attempted after it exits.
There is no payload associated with an acknowledge-all
packet.
* CHALLENGE, RESPONSE types (Challenge request/response)
The payload of the packet is security-layer-specific
data, and is used to authenticate an Rx connection.
Perhaps this should include a reference to some spec
on rxkad (or rxkad should just be added to this spec.)
* DEBUG type (Debug packet)
Rx supports an optional debugging interface; see the
``Debugging'' section below for more details.
* PARAMS types (Parameter exchange)
These types were assigned in AFS 3.2 but never used for
anything, and therefore have no protocol significance
at this time.
* VERSION type (Get AFS version)
If a server receives a packet with a type value of 13, and
the client-initiated flag set, it should respond with a
65-byte payload containing a string that identifies the
version of AFS software it is running. The response should
not have the client-initiated flag set.
Nothing should respond to a version packet without the
client-initiated flag, to avoid infinite packet loops.
Call Layer
==========
The call layer provides a reliable data transport over an
Rx channel, and is used by the RPC layer to make Rx calls.
One of the most important pieces of the call layer is the
Rx acknowledgement packet. The acknowledgement packet is
used by Rx to determine when retransmissions are needed,
as well as determining the proper transmission / receiving
parameters to use (such as the transmit window size and
jumbogram length, described in more detail below).
A new call is established by the client simply sending a
data packet to the server on an available channel. Either
side can indicate that they have no more data to send by
setting the LAST-PACKET flag in their last Rx packet. The
call remains open until the upper layer informs Rx that it
is done with the call. (The upper layer in this case would
most likely be the Rx RPC layer.)
The structure of an Rx acknowledgement packet is described
in the Packet Formats section. We will refer to particular
fields of the acknowledgement packet here by name.
The <Buffer Space> field specifies the number of packets that
the sender of the acknowledgement is willing to provide for
receiving packets for this call. The sender, presumably,
should not send packets beyond the number specified here,
without receiving further acknowledgement allowing it.
The <Max Skew> field indicates the maximum packet skew that
the sender of this packet has seen for this call. If a
packet is received N packets later than expected (based
on the packet's serial number, i.e. if the last received
packet's serial number is N higher than this packet's),
then it is defined to have a skew of N. This can be used
to avoid retransmission because of packet reordering.
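
The skew definition can be illustrated with a short Python sketch.
The SkewTracker class is hypothetical, and it assumes that "the last
received packet's serial number" may be approximated by the highest
serial number seen so far on the call.

```python
class SkewTracker:
    """Track packet skew on one call (illustrative only)."""

    def __init__(self):
        self.highest_serial = 0  # assumption: stands in for "last received"
        self.max_skew = 0        # candidate value for the <Max Skew> field

    def observe(self, serial):
        """Record an arriving packet and return its skew."""
        # A packet that arrives N serial numbers late has a skew of N.
        skew = max(0, self.highest_serial - serial)
        self.max_skew = max(self.max_skew, skew)
        self.highest_serial = max(self.highest_serial, serial)
        return skew
```

For example, if packets arrive with serial numbers 1, 2, 5, 3, the
last one has a skew of 2 under these assumptions.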
The <First Sequence> number specifies the sequence number of
the first packet that is being explicitly acknowledged (either
positively or negatively) by this packet. All packets with
sequence numbers smaller than this are implicitly acknowledged.
The <Reserved> field, previously used to indicate the previous
received packet, is no longer used. It should be set to zero
by the sender and not interpreted by the receiver.
The <Serial Number> field indicates the serial number of the
packet which has triggered this acknowledgement, or zero if there
is no such packet (i.e. the ack packet was delayed and should not
be used for round-trip time computation). The receiver should
note that any transmitted packets with a serial number less than
this, which are not acknowledged by this packet, are likely lost
or reordered. Thus, these packets should be retransmitted, after
a possible delay to allow for packet reordering (as measured by
packet skew).
The trailing fields after the variable-length acknowledgements
section are not always 32-bit aligned with respect to the packet,
and aren't always present. (Their presence depends on the Rx
version of the peer.) The maximum and recommended packet sizes
are, respectively, the largest possible packet size that the peer
is willing to accept from us, and the size of the packet they
would prefer to receive. In absence of these fields, it should
be assumed that the maximum allowed packet size is 1444 bytes.
The receive window size indicates the size of the ACK sender's
receive window, in packets. Its use is described below in
the "Flow Control" section. If this field is absent, the
implementation must assume a maximum window size of 15 packets;
older implementations that do not support this trailing field
only allow for a window of 15 packets.
The "Max Packets per Jumbogram" field indicates how many packets
the ACK sender is willing to receive in a jumbogram (also
described below). All packets in a jumbogram are always of the
same size (except the last one), regardless of the maximum and
recommended packet sizes described above.
The <Reason> field specifies a particular type of an ack packet.
Valid reason codes are specified in the ``Packet Formats and
Protocol Constants'' section; their meanings are as follows:
REQUESTED
Acknowledgement was requested. The peer received
a packet from us with the acknowledgement-requested
flag set, and is acknowledging it.
DUPLICATE
A duplicate packet was received. The duplicate
packet's serial number is in the <Serial> field.
OUT-OF-SEQUENCE
A packet was received out of sequence. The serial
number of said packet is in the <Serial> field.
WINDOW-EXCEEDED
A packet was received but exceeded the current
receive window, and was dropped.
NO-SPACE
A packet was received, but no buffer space was
available and therefore it was dropped.
PING
This is a keep-alive packet, used to verify that
the peer is still alive. If the REQUEST-ACK flag
in the Rx packet is set, the recipient of this
packet should reply with a PING-RESPONSE packet.
PING-RESPONSE
This is a response to a keep-alive ack (ping).
DELAYED
A delayed acknowledgement, usually because a certain
amount of time has passed since the receipt of the
last packet and there are outstanding unacknowledged
packets. Should not be used for RTT computation.
OTHER
Un-delayed general acknowledgement, which does not
fall in any of the above categories.
A peer should never delay the transmission of an ack packet
in response to a received packet unless it sets the delayed
ack type field. This is because ack packets (except for
delayed ones) are used for RTT computation by Rx.
All acknowledgement packets should have the REQUEST-ACK
flag in the Rx header turned off, except for PING type
ack packets.
The <Ack Count> field specifies the number of bytes following
in the acknowledgements section. Each of those bytes indicates
the acknowledgement status corresponding to a sequence number
between firstSequence and firstSequence+ackCount-1 inclusive.
There can be up to 255 bytes in the acknowledgements section.
Typically the ack count is the receive window size of the
ack packet sender, and the individual packet status bytes
correspond to the packets in the current receive window.
The values in each of those bytes can be as follows:
0 Explicit negative acknowledgement: packet with the
corresponding sequence number has not been received
or has been dropped.
1 Explicit acknowledgement: packet with the corresponding
sequence number has been received but not processed by
the application yet.
It's important to note the distinction between packets with
sequence numbers before firstSequence, between firstSequence
and firstSequence+ackCount-1, and those with sequence numbers
of at least firstSequence+ackCount. Those in the first category
have been passed up to the application level and the sender
(recipient of this ack) can recycle packets with such sequence
numbers.
Packets in the second category are individually acknowledged
in the acknowledgements section, either as being queued for
the application or not received. The recipient of the ack
should keep all packets with sequence numbers in this range,
but avoid retransmitting the positively acknowledged ones.
Negatively acknowledged packets should be retransmitted.
A more detailed explanation of the retransmit strategy is
given below.
Packets in the third category are not acknowledged at all,
and the recipient of the ack should assume no knowledge
of their state. Since the Rx receive window should not
exceed the size of an ack packet, the sender shouldn't
have transmitted any packets in this category anyway.
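
The three categories can be sketched as a small Python function. The
function name and the returned labels are hypothetical; acks stands
for the bytes of the acknowledgements section, so its length is the
<Ack Count> value.

```python
def classify_sequence(seq, first_sequence, acks):
    """Classify a sequence number against one received Ack packet.

    acks: bytes of the acknowledgements section (0 = negative, 1 = positive).
    """
    if seq < first_sequence:
        # Category 1: passed up to the application; the packet can be recycled.
        return "delivered"
    if seq < first_sequence + len(acks):
        # Category 2: individually acknowledged, positively or negatively.
        return "received" if acks[seq - first_sequence] == 1 else "missing"
    # Category 3: the Ack says nothing about this packet.
    return "unknown"
```

A sender would keep "received" packets without retransmitting them,
and schedule "missing" packets for retransmission.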
* Round-trip time computation
To determine when packet retransmission is necessary, Rx
computes some statistics about the round-trip time between
the two hosts: exponentially-decaying averages of the
round-trip time and the standard deviation thereof. Each
acknowledgement packet which mentions a specific packet in
the <Serial> field and is not delayed is used to update the
round-trip statistics. First, the round-trip time for this
packet (R) is computed as the difference between the arrival
time of the ack packet and the time we transmitted the
packet with the serial number specified in <Serial>.
Next, the round-trip time average and standard deviation
values are updated. For instance, this algorithm could
be used:
RTTdev = RTTdev * (3/4) + |RTTavg - R| / 4
RTTavg = RTTavg * (7/8) + R / 8
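
A minimal Python sketch of the update above follows. The function
name is hypothetical; the deviation is updated first, using the old
average, in the same order as the two formulas.

```python
def update_rtt(rtt_avg, rtt_dev, sample):
    """Return (new_avg, new_dev) after one round-trip sample R, in seconds."""
    # RTTdev = RTTdev * (3/4) + |RTTavg - R| / 4   (uses the old average)
    new_dev = rtt_dev * 3 / 4 + abs(rtt_avg - sample) / 4
    # RTTavg = RTTavg * (7/8) + R / 8
    new_avg = rtt_avg * 7 / 8 + sample / 8
    return new_avg, new_dev
```

For instance, starting from an average of 100 ms and a deviation of
20 ms, a 180 ms sample moves the average to 110 ms and the deviation
to 35 ms.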
* Packet retransmission
In order to support reliable data transport, Rx must retransmit
packets which are lost in the network. This must not be done
too early, otherwise we might retransmit a packet whose first
copy is still in transit, thereby wasting bandwidth.
Rx computes a retransmit timeout value T, and retransmits any
packet which hasn't been positively acknowledged since last
transmission for at least T seconds. This timeout could be
computed as follows from the round-trip statistics above:
T = RTTavg + 4 * RTTdev + 0.350
This allows the packet to be up to 4 deviations late and still
not be retransmitted. The 350 msec fudge factor is used to
compensate for bursty networks, though it is likely becoming
less relevant (and accurate) with time.
A more clever algorithm could take into account the maximum
packet skew rate, and improve the retransmission strategy to
take into account the likelihood that a given packet has
been reordered, and give it extra time before retransmission.
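
The basic timeout rule (without the skew refinement) can be sketched
in Python as below. The function and constant names are hypothetical,
and all times are assumed to be in seconds.

```python
FUDGE_SECONDS = 0.350  # the 350 msec fudge factor from the formula for T


def retransmit_timeout(rtt_avg, rtt_dev):
    """T = RTTavg + 4 * RTTdev + 0.350"""
    return rtt_avg + 4 * rtt_dev + FUDGE_SECONDS


def needs_retransmit(last_sent_at, now, positively_acked, timeout):
    """True if a packet has gone at least `timeout` seconds since its last
    transmission without being positively acknowledged."""
    return (not positively_acked) and (now - last_sent_at) >= timeout
```

With an average of 110 ms and a deviation of 35 ms, the timeout
works out to 0.110 + 0.140 + 0.350 = 0.600 seconds.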
* Keepalive and Timeout
The upper layer (either the Rx RPC layer or the application)
has to specify a timeout, T, to the call layer. If the peer
is not heard from within T seconds, the call layer declares
the call to be dead and propagates the error to the upper
layer.
In order to determine whether the peer is still alive or not,
keepalive requests are used. These take the form of ack PING
and PING-RESPONSE packets. When the client has not received
any response from the server, either to the original request
or the keepalive requests, in T seconds, the call times out.
The following strategy may be used to determine when to send
keepalive requests:
Compute a keepalive timeout, KT = T/6
If the call was initiated KT seconds ago, or KT
seconds have passed since the last keepalive
request transmission, send a keepalive packet.
This strategy limits the number of transmitted keepalive
packets to a fixed number in the case of a dead server,
and proportional to the real timeout in case of a slow
server. It also allows up to 5 keepalives to be dropped
before the server is erroneously declared dead.
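
The suggested strategy can be sketched in Python as follows. The
function names are hypothetical, and times are assumed to be in
seconds on a common clock.

```python
def keepalive_interval(call_timeout):
    """KT = T/6"""
    return call_timeout / 6


def should_send_keepalive(now, call_started_at, last_keepalive_at, call_timeout):
    """Decide whether a keepalive (ack PING) is due.

    last_keepalive_at is None until the first keepalive has been sent.
    """
    kt = keepalive_interval(call_timeout)
    if last_keepalive_at is None:
        # No keepalive yet: measure from the start of the call.
        return now - call_started_at >= kt
    # Otherwise measure from the last keepalive transmission.
    return now - last_keepalive_at >= kt
```

With T = 60 seconds, KT is 10 seconds, so keepalives go out at
roughly 10, 20, 30, 40 and 50 seconds before the call times out.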
* Flow Control
Every Rx client or server associates a receive window and a
transmit window with each Rx call. These windows indicate the
number of packets not yet fully acknowledged (that is, not yet
read by the peer's application) that an Rx sender can have
outstanding at any time. A sender's transmit window may never
be greater than its peer's receive window for that call.
The receive windows are exchanged via the "Receive Window Size"
parameter in an Ack packet.
Rx ``sliding windows'' are similar to those used by TCP, except
they measure packets rather than bytes. Also, in TCP the window
effectively applies to bytes in flight between the two peers,
whereas in Rx the window applies to packets between the user
applications. For example, a transmit window of 8 on a certain
Rx connection means that at most 8 packets can be transmitted
and not yet read by the peer's application at any time. The
sequence number of the first packet that hasn't been read by
the application is indicated by the First Sequence field of
an Ack packet.
The selection of initial window sizes isn't strictly defined
by the Rx protocol, but here are a few things that one might
want to consider when choosing initial windows:
* A useful strategy can be to advertise a small receive
window until the application starts reading data, and
advertise a larger window afterwards.
* The transmit window should be initially a conservative
small value. Once an Ack packet is received, the peer's
advertised receive window can be used to choose a better
transmit window.
Rx uses the slow start, congestion avoidance, and fast recovery
algorithms[6]. The algorithms are modified to work in the context
of Rx packet-based transmission windows, and are described below.
These algorithms require two additional variables to be maintained
for each active Rx call: a congestion window, cwind, and a slow
start threshold, ssthresh.
Define a "negative ack" as an Ack packet that contains a negative
acknowledgement followed by a positive one. Similarly, define a
"positive ack" to be any Ack that is not negative. Upon receiving
three negative acks for a call in a row since the last congestion
avoidance attempt (if any), the Rx protocol enters congestion
avoidance for that Rx call.
* Slow start, congestion avoidance, and fast recovery algorithms
First, the congestion window, cwind, is initialized to 1.
The number of unread transmitted packets is now limited not
only by the transmission window, but also by the congestion
window. The latter limit is a little different: Rx may
send up to cwind packets (by sequence number) past the last
contiguous positively acknowledged packet. For example,
if an Ack packet indicates that packets 1, 2 and 8 were
received, and cwind is 2, Rx may transmit packets 3 and 4.
When congestion occurs (indicated by a negative ack or a
packet retransmission timeout), Rx enters congestion avoidance
and fast recovery. The slow-start threshold, ssthresh, is
set to half of the effective transmission window (minimum of
cwind and transmit window), but no less than 2 packets.
If triggered by a negative ack, any negatively acknowledged
packets should be retransmitted as soon as possible (i.e.
window-permitting).
If triggered by a retransmission timeout, the congestion
window is reset to a single packet.
When in fast-recovery mode, every additional negative ack
packet received causes cwind to be increased by one packet.
A positive ack packet causes cwind to be set to ssthresh,
and terminates fast recovery. At this point we are back
to congestion avoidance, since the cwind is half the original
transmission window.
When packet acknowledgements are received, the congestion
window should be increased. If cwind is less than ssthresh,
cwind should be increased by 1 for each newly acknowledged
packet. If cwind is at least ssthresh, cwind is increased
by 1 for each newly received Ack packet.
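
The rules in this subsection can be sketched in Python as below.
This is one possible reading, not a reference implementation. The
names are hypothetical; the initial ssthresh value is an assumption
(the text does not give one); a retransmission timeout is assumed to
reset cwind without entering fast recovery; and detecting congestion
(three negative acks in a row, or a timeout) is left to the caller.

```python
def sendable_by_cwind(acked, cwind):
    """Sequence numbers allowed by the congestion window, given the set of
    positively acknowledged sequence numbers (numbering starts at 1)."""
    last_contiguous = 0
    while last_contiguous + 1 in acked:
        last_contiguous += 1
    window = range(last_contiguous + 1, last_contiguous + cwind + 1)
    return [seq for seq in window if seq not in acked]


class CongestionState:
    def __init__(self, transmit_window):
        self.transmit_window = transmit_window
        self.cwind = 1
        self.ssthresh = transmit_window  # assumption: initial value not specified
        self.in_fast_recovery = False

    def on_congestion(self, timeout):
        """Call when congestion is detected (negative acks or a timeout)."""
        effective = min(self.cwind, self.transmit_window)
        self.ssthresh = max(2, effective // 2)
        if timeout:
            self.cwind = 1                # reset to a single packet
        else:
            self.in_fast_recovery = True  # triggered by negative acks

    def on_ack(self, negative, newly_acked):
        """Call for each Ack packet; newly_acked counts newly acknowledged packets."""
        if self.in_fast_recovery:
            if negative:
                self.cwind += 1
            else:
                self.cwind = self.ssthresh
                self.in_fast_recovery = False
        elif self.cwind < self.ssthresh:
            self.cwind += newly_acked     # slow start: +1 per acknowledged packet
        else:
            self.cwind += 1               # congestion avoidance: +1 per Ack packet
```

With packets 1, 2 and 8 acknowledged and cwind equal to 2,
sendable_by_cwind returns packets 3 and 4, matching the example in
the text.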
The size of the receive window should not grow past the size of
an Rx ack packet (which can acknowledge up to 255 packets at a
time.)
Debugging
=========
Rx provides for an optional debugging interface, using the Debug AFS
packet type, allowing remote Rx clients to query an Rx server for
some Rx protocol statistics. Not all implementations are required
to implement this interface. Some parts of this interface may also
be specific to a particular implementation of Rx. In order to prevent
packet loops, a server should only reply to debug packets with the
client-initiated flag set.
The payload of a debug request packet is always the same; both of
the 32-bit quantities are in network byte order:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Debug Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Debug Index |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The debug type indicates the kind of debug information being sent
or requested, and determines the format of the rest of the packet.
The debug index allows some debug types to export array-like data,
indexed by this field. The following debug types are defined for
the Transarc implementation:
0x01 Retrieve basic connection statistics
0x02 Get information about some connections
0x03 Get information about all connections
0x04 Get all Rx stats
0x05 Get all peers of this server
The index field in the debug packet indicates which element of the
debug information the client wants to access, in cases where there
are multiple entries in question.
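
As an illustration of the request payload, the following Python
sketch packs and unpacks the two 32-bit fields in network byte
order. The function names and the DEBUG_TYPES table are hypothetical
helpers; the type values are the ones listed above.

```python
import struct

# Debug types listed for the Transarc implementation.
DEBUG_TYPES = {
    0x01: "basic connection statistics",
    0x02: "some connections",
    0x03: "all connections",
    0x04: "all Rx stats",
    0x05: "all peers",
}


def pack_debug_request(debug_type, index=0):
    """Build the 8-byte debug request payload (Debug Type, Debug Index)."""
    return struct.pack("!II", debug_type, index)


def unpack_debug_request(payload):
    """Return (debug_type, index) from the first 8 bytes of a payload."""
    return struct.unpack("!II", payload[:8])
```

A client walking an array-like debug type would resend the request
with an increasing index until the end-of-list marker is returned.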
The responses to each of those debug queries contain the following
information:
1. Retrieve basic connection stats
An array of general statistics about packet allocation,
server performance, and so on. The first octet in this
response represents the debug protocol version being used
by the server. See RX_DEBUGI_VERSION* in rx/rx.h.
2, 3. Get information about connections
Both of these calls return a struct rx_debugConn (see
rx/rx.h), indexed by the "index" field.
The first version of the debug call (type 2) only retrieves
information about connections which are deemed interesting,
that is, connections which are active, or about to be
reaped.
The end of the list is signaled by a response where the
connection ID value is 0xFFFFFFFF.
4. Get Rx stats
This call returns a struct rx_stats to the client in network
byte order, containing various statistics about the state of
Rx on the server (see rx/rx.h).
5. Get all Rx peers
Similar to the connection request above (2, 3) this call
returns all the Rx peers of the server (in a network-byte-order
struct rx_debugPeer), indexed by the index field in the request.
End of list is indicated by a host value of 0xFFFFFFFF. (These
are the first 4 octets.)
In response to unknown requests, the server returns 0xFFFFFFF8 in the
debug type field.
XXX The response interface should probably be fixed
to include a fixed header that indicates whether
the request was successfully completed.
Jumbograms
==========
To be able to transmit more data in a single packet, Rx supports
``jumbograms'', which are single UDP datagrams containing multiple
sequential Rx DATA packets. In a jumbogram, all packets except the
last one must be of a fixed maximal size (1412 bytes). Because all
the packets in the jumbogram are sequential, only one full header
is needed. Here is what a jumbogram could look like:
+-----------+---------------+--------------+---------------+
| Rx header | 1412 byte pkt | Short header | 1412 byte pkt | ->
+-----------+---------------+--------------+---------------+
   +--------------+-   -+-----------------------+
-> | Short header | ... | <= 1412 byte last pkt |
   +--------------+-   -+-----------------------+
Every Rx packet in a jumbogram except the first one must be preceded
by the short Rx header, and all packets except the last one must have
the Jumbogram Rx flag set in their respective headers. The number of
packets in a jumbogram may not exceed the peer's advertised Max Packets
Per Jumbogram value in the Ack packet.
The maximum number of packets per jumbogram should be assumed to be 1
(i.e., no jumbograms) unless explicitly specified otherwise by an Ack
packet. If an Ack packet is received without the packet-per-jumbogram
field, it might indicate that the peer is now running a version of Rx
that does not support jumbograms, and therefore no jumbograms should
be sent until they are explicitly enabled again.
The short header in a jumbogram has the following makeup:
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
All the packets in the jumbogram have the same Rx header fields
(from the full Rx header) except for Flags, Checksum, Sequence,
and Serial. The flags and checksum field for subsequent packets
are taken from the short header preceding that packet in the
jumbogram. The sequence and serial numbers are assumed to be
consecutive, and are incremented by 1 from the first packet in
the jumbogram (i.e., the full Rx header).
Retransmitted packets should not be sent in a jumbogram.
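
The layout described in this section can be sketched as a Python
splitter. This is an illustration under stated assumptions, not a
reference implementation: the function name is hypothetical, the
28-byte header layout is taken from the Rx header diagram, and the
bit value of the JUMBO-PACKET flag is passed in by the caller rather
than hard-coded here.

```python
import struct

# Full Rx header: five 32-bit words, four 8-bit fields, two 16-bit
# fields, all in network byte order (28 bytes).
RX_HEADER = struct.Struct("!IIIIIBBBBHH")
# Short header: Flags (8 bits), Reserved (8 bits), Checksum (16 bits).
SHORT_HEADER = struct.Struct("!BBH")
JUMBO_DATA_SIZE = 1412


def split_jumbogram(datagram, jumbo_flag):
    """Split a datagram into (seq, serial, flags, checksum, data) tuples.

    jumbo_flag: bit value of the JUMBO-PACKET flag (supplied by the caller).
    """
    (_epoch, _cid, _call, seq, serial, _type, flags,
     _status, _security, checksum, _service) = RX_HEADER.unpack_from(datagram, 0)
    offset = RX_HEADER.size
    packets = []
    while flags & jumbo_flag:
        # Not the last packet: exactly 1412 bytes, then a short header.
        end = offset + JUMBO_DATA_SIZE
        if len(datagram) < end + SHORT_HEADER.size:
            raise ValueError("truncated jumbogram")
        packets.append((seq, serial, flags, checksum, datagram[offset:end]))
        flags, _reserved, checksum = SHORT_HEADER.unpack_from(datagram, end)
        offset = end + SHORT_HEADER.size
        seq += 1      # sequence and serial numbers are consecutive
        serial += 1
    last = datagram[offset:]
    if len(last) > JUMBO_DATA_SIZE:
        raise ValueError("last packet exceeds 1412 bytes")
    packets.append((seq, serial, flags, checksum, last))
    return packets
```

A datagram whose first header lacks the jumbo flag yields a single
packet, so the same routine also handles ordinary DATA packets.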
RPC Layer
=========
This section discusses how an RPC call is made using the Rx protocol.
There are two common ``types'' of Rx calls: simple and streaming.
These mostly reflect a difference in the upper-level API rather than
in the Rx protocol. A simple Rx call has a fixed number of input
variables and a fixed number of output variables. A streaming Rx
call, in addition to the above, allows the user to send and receive
arbitrary amounts of data (whose length should be specified as a
fixed-length argument.)
In either case, an Rx call consists of two basic stages: client
sending the data to the server, and server sending the response
back to the client. No data can be sent by the client in the
same call after the server has started sending its response.
Each remote function call associated with a particular Rx service
(identified by the IP-port-serviceId triplet, as mentioned above)
is assigned a 32-bit integer opcode number. To make a simple Rx
call, the caller must transmit the opcode number followed by the
expected arguments for that call over an Rx channel using XDR
encoding. The callee uses XDR to unmarshall the opcode and input
arguments, performs a function call corresponding to that opcode
and arguments, and then uses XDR to encode the return values back
to the caller. The caller then uses XDR to receive the output
variables.
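A minimal sketch of the marshalling step for a simple call, written
in Python with the struct module rather than a full XDR library.
The opcode value and the argument list in the usage note are made up
for illustration; only the XDR encodings of an unsigned integer and
a string are shown.

```python
import struct


def xdr_uint32(value):
    """Encode an unsigned 32-bit integer in XDR (big-endian) form."""
    return struct.pack("!I", value)


def xdr_string(text):
    """Encode a string as XDR: 32-bit length, bytes, zero pad to 4."""
    data = text.encode("utf-8")
    padding = (-len(data)) % 4
    return struct.pack("!I", len(data)) + data + b"\x00" * padding


def marshal_call(opcode, *encoded_args):
    """Caller side: the opcode first, then the XDR-encoded arguments."""
    return xdr_uint32(opcode) + b"".join(encoded_args)


def unmarshal_opcode(stream):
    """Callee side: split the opcode from the remaining argument bytes."""
    (opcode,) = struct.unpack_from("!I", stream, 0)
    return opcode, stream[4:]
```

For example, marshal_call(1234, xdr_uint32(7), xdr_string("hello"))
yields the byte stream for a hypothetical opcode 1234 taking an
integer and a string; the callee would dispatch on the value
returned by unmarshal_opcode.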
For streaming calls which send data from the caller to the callee,
the convention is to include the length of the data to be sent as
one of the fixed-length arguments, and send the variable-length
data immediately after the fixed-length portion. For streaming
calls which receive data, the convention is for the callee to first
reply with a fixed-length field specifying the number of bytes it's
about to send, and then send those bytes. Upon completion of the
streaming part of the call, the output arguments are sent back to
the caller in fixed-length XDR form, as with simple calls.
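The streaming conventions can be sketched the same way. In this
Python example the function names, the single 32-bit length
argument, and the single 32-bit "status" output argument are all
hypothetical; a real call would carry whatever fixed-length
arguments its interface defines. The stream data is assumed to be
sent as raw bytes without XDR padding.

```python
import struct


def build_store_request(opcode, data):
    """Caller -> callee: opcode, length argument, then the raw data."""
    return struct.pack("!II", opcode, len(data)) + data


def build_fetch_reply(data, status):
    """Callee -> caller: length field, raw data, then output arguments."""
    return struct.pack("!I", len(data)) + data + struct.pack("!I", status)


def parse_fetch_reply(stream):
    """Caller side: read the length, that many bytes, then the outputs."""
    (length,) = struct.unpack_from("!I", stream, 0)
    data = stream[4:4 + length]
    if len(data) != length:
        raise ValueError("reply truncated before end of stream data")
    (status,) = struct.unpack_from("!I", stream, 4 + length)
    return data, status
```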
Packet Formats and Protocol Constants
=====================================
* Rx packet
Every simple Rx packet has an Rx header, of the form below.
All quantities are in network byte order.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|+|                      Connection Epoch                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Connection ID                       | * |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Call Number                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Serial Number                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Flags     |    Status     |   Security    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |          Service ID           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Payload ....
+-+-+-+-+-
[*] The field marked with * is the Channel ID. It occupies the
low-order two bits of the 32-bit connection ID word and is
used to multiplex between 4 parallel calls.
[+] The bit marked with + is used to indicate that only
the connection ID should be used to identify this
connection, and sender host/port should not be used.
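The header layout above maps directly onto a fixed 28-octet
structure. The Python sketch below packs and unpacks it and
extracts the two marked sub-fields; the tuple and function names
are hypothetical and chosen only for this example.

```python
import struct
from collections import namedtuple

# Five 32-bit words, four octets, two 16-bit words; network byte order.
RX_HEADER = struct.Struct("!IIIIIBBBBHH")

RxHeader = namedtuple(
    "RxHeader",
    "epoch cid call_number seq serial type flags status security "
    "checksum service_id",
)


def unpack_header(datagram):
    """Split a datagram into (RxHeader, payload)."""
    fields = RX_HEADER.unpack_from(datagram, 0)
    return RxHeader(*fields), datagram[RX_HEADER.size:]


def pack_header(hdr):
    """Serialize an RxHeader into its 28-octet wire form."""
    return RX_HEADER.pack(*hdr)


def channel_id(hdr):
    """The field marked '*': low two bits of the connection ID word."""
    return hdr.cid & 0x3


def connection_id(hdr):
    """The connection ID word with the channel bits masked off."""
    return hdr.cid & ~0x3 & 0xFFFFFFFF


def cid_only(hdr):
    """The bit marked '+': the first (most significant) epoch bit."""
    return bool(hdr.epoch & 0x80000000)
```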
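The values for the Flags field are defined as follows. Note that
SLOW-START-OK and JUMBO-PACKET share the same bit; which meaning
applies depends on the packet type.
0000 0001 CLIENT-INITIATED
0000 0010 REQUEST-ACK
0000 0100 LAST-PACKET
0000 1000 MORE-PACKETS
0001 0000 - Reserved -
0010 0000 SLOW-START-OK (in Ack packets)
0010 0000 JUMBO-PACKET (in Data packets)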
Commonly, but not necessarily, the following value mappings
for the Security field are used:
0 No security or encryption
1 bcrypt security, only used in AFS 2.0
2 "krb4" rxkad
3 "krb4" rxkad with encryption (sometimes)
The following packet type values are defined:
1 DATA Standard data packet
2 ACK Acknowledgement of received data
3 BUSY Busy response
4 ABORT Abort packet
5 ACKALL Acknowledgement of all packets
6 CHALLENGE Challenge request
7 RESPONSE Challenge response
8 DEBUG Debug packet
9 PARAMS Exchange of parameters
10 PARAMS Exchange of parameters
11 PARAMS Exchange of parameters
12 PARAMS Exchange of parameters
13 VERSION Get AFS version
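The Type and Flags tables above can be turned into simple lookup
tables. This Python sketch is illustrative only: the names
PACKET_TYPES, COMMON_FLAGS and flag_names are hypothetical, and it
assumes that the shared 0010 0000 bit is read as SLOW-START-OK on
Ack packets and as JUMBO-PACKET on Data packets.

```python
# Packet type values, as listed in the table above.
PACKET_TYPES = {
    1: "DATA", 2: "ACK", 3: "BUSY", 4: "ABORT", 5: "ACKALL",
    6: "CHALLENGE", 7: "RESPONSE", 8: "DEBUG",
    9: "PARAMS", 10: "PARAMS", 11: "PARAMS", 12: "PARAMS",
    13: "VERSION",
}

# Flag bits whose meaning does not depend on the packet type.
COMMON_FLAGS = {
    0x01: "CLIENT-INITIATED",
    0x02: "REQUEST-ACK",
    0x04: "LAST-PACKET",
    0x08: "MORE-PACKETS",
}

SHARED_BIT = 0x20  # SLOW-START-OK or JUMBO-PACKET


def flag_names(packet_type, flags):
    """Return the names of the flags set in a header's Flags octet."""
    names = [name for bit, name in COMMON_FLAGS.items() if flags & bit]
    if flags & SHARED_BIT:
        # Assumption of this sketch: the packet type disambiguates.
        if PACKET_TYPES.get(packet_type) == "ACK":
            names.append("SLOW-START-OK")
        elif PACKET_TYPES.get(packet_type) == "DATA":
            names.append("JUMBO-PACKET")
    return names
```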
* Rx acknowledgement packet
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Buffer Space          |           Max Skew            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        First Sequence                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Serial                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Reason     |   Ack Count   |    Acknowledgements ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ..
         ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       ... Acks |   Reserved    |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Maximum Packet Size                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Recommended Packet Size                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Receive Window Size                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Max Packets per Jumbogram                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Note that the trailing fields can have arbitrary alignment,
determined by the number of individual acks in the packet.
There are three reserved octets between the variable acks
section and the start of the trailing fields; they also have
no particular alignment.
The valid values for the Reason code are:
1 REQUESTED
2 DUPLICATE
3 OUT-OF-SEQUENCE
4 WINDOW-EXCEEDED
5 NO-SPACE
6 PING
7 PING-RESPONSE
8 DELAYED
9 OTHER
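Because the trailing fields are unaligned and may be missing
altogether, an Ack payload has to be parsed by offset rather than
by overlaying a fixed structure. The Python sketch below shows one
way to do that. The names are hypothetical, and it assumes one
octet per individual acknowledgement and that a sender which omits
a trailing field also omits every field after it.

```python
import struct

# Fixed part, up to and including Ack Count (18 octets, no padding).
ACK_FIXED = struct.Struct("!HHIIIBB")

REASONS = {
    1: "REQUESTED", 2: "DUPLICATE", 3: "OUT-OF-SEQUENCE",
    4: "WINDOW-EXCEEDED", 5: "NO-SPACE", 6: "PING",
    7: "PING-RESPONSE", 8: "DELAYED", 9: "OTHER",
}

TRAILING_FIELDS = (
    "max_packet_size",
    "recommended_packet_size",
    "receive_window",
    "max_packets_per_jumbogram",
)


def parse_ack(payload):
    """Parse the payload of an Ack packet into a dict."""
    (buffer_space, max_skew, first_seq, _reserved, serial,
     reason, ack_count) = ACK_FIXED.unpack_from(payload, 0)
    pos = ACK_FIXED.size
    acks = list(payload[pos:pos + ack_count])
    if len(acks) != ack_count:
        raise ValueError("acknowledgement array truncated")
    pos += ack_count + 3  # skip the three reserved octets
    result = {
        "buffer_space": buffer_space,
        "max_skew": max_skew,
        "first_seq": first_seq,
        "serial": serial,
        "reason": REASONS.get(reason, reason),
        "acks": acks,
    }
    # Read as many 32-bit trailing fields as are actually present.
    for name in TRAILING_FIELDS:
        if pos + 4 > len(payload):
            break
        (result[name],) = struct.unpack_from("!I", payload, pos)
        pos += 4
    return result
```

A caller could treat the absence of "max_packets_per_jumbogram" in
the result as the signal, described earlier, to stop sending
jumbograms to that peer.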
Acknowledgements
================
Jeffrey Hutzelman <jhutz@cmu.edu> reviewed an early draft of this
specification, and provided much appreciated feedback on technical
details as well as document structuring.
Love Hornquist-Astrand <lha@stacken.kth.se> made many corrections
to this specification, especially regarding backwards-compatibility
with older Rx implementations.
References
==========
[1] /afs/sipb.mit.edu/contrib/doc/AFS/hijacking-afs.ps.gz
[2] OpenAFS: src/rx/
[3] /afs/sipb.mit.edu/contrib/doc/AFS/ps/rx-spec.ps
[4] ftp://ftp.stacken.kth.se/pub/arla/prog-afs/shadow/doc/r.vdoc
[5] ftp://ftp.stacken.kth.se/pub/arla/prog-afs/shadow/doc/rx.mss
[6] http://web.mit.edu/rfc/rfc2001.txt
$Id: rx-spec,v 1.22 2002/10/20 06:46:00 kolya Exp $