Update version strings for the third 1.9.x development release.
Change-Id: I203a9d3487d1344882afb02ded094e01e7672753
Reviewed-on: https://gerrit.openafs.org/15926
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10397
Currently, there are a few callers of RPCs that specify some data for
an INOUT parameter, but do not initialize the memory for that data.
This can result in the uninitialized memory being sent to the peer
when the argument is processed as an IN argument. Simply clear the
relevant data before running the RPC to avoid this.
The relevant RPCs and arguments are:
- For RMTSYS_Pioctl, the 'OutData' argument.
- For BUDB_GetVolumes, the 'volumes' argument.
-- via DBLookupByVolume -> bcdb_LookupVolume -> ubik_BUDB_GetVolumes
-- and via bc_Restorer -> bcdb_FindVolumes -> ubik_BUDB_GetVolumes
- For KAA_Authenticate_old / KAA_Authenticate, this can happen with
the 'answer' argument in ka_Authenticate if KAA_AuthenticateV2 or
KAA_Authenticate return RXGEN_OPCODE, but the server manages to
populate oanswer.SeqLen with non-zero.
For all of these, make sure the memory is blanked before running the
relevant RPC. For ka_Authenticate, reset oanswer.SeqLen to 0 to avoid
sending any data, but still blank 'answer' and 'answer_old' just to be
safe.
FIXES 135043
Change-Id: I95ee91a1710d66e2e88c3086d57c279a376f7dc6
Reviewed-on: https://gerrit.openafs.org/15925
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10397
Here, 'OutData' only has OutData.rmtbulk_len bytes in it. We know that
OutData.rmtbulk_len is at most data->out_size, but it could be
smaller. So, only copy OutData.rmtbulk_len bytes, not data->out_size,
since data->out_size could be more than the number of bytes we have
allocated in OutData.
FIXES 135043
Change-Id: I6f87fc8cb5df0298061f419112200f6c7e1974ba
Reviewed-on: https://gerrit.openafs.org/15924
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10397
A few areas of code retry the same RPC, like so:
do {
code = VL_SomeRPC(rxconn, &array_out);
} while (some_condition);
xdr_free((xdrproc_t) xdr_foo, &array_out);
Or try a different version/variant of an RPC (e.g.
VLDB_ListAttributesN2 -> VLDB_ListAttributes).
If the first RPC call causes the output array to be allocated with
length N, then the subsequent RPC calls may fail if the server
responds with an array larger than N.
Furthermore, if the subsequent call responds with an array smaller
than N, then when we xdr_free the array, our length will be smaller
than the actual number of allocated elements. That results in two
potential issues:
- We'll fail to free the elements at the end of the array. This is
only a problem if each element in the array also uses
dynamically-allocated memory (e.g. each element contains a string or
another array). Fortunately, there are only a few such structures in
any of our RPC-L definitions: SysNameList and CredInfos. And neither
of those are used in such a retry loop, so this isn't a problem.
- We'll give the wrong length to osi_free when freeing the array
itself. This only matters for KERNEL, and only on some platforms
(such as Solaris), since the length given to osi_free is ignored
everywhere else.
To avoid these possible issues, change the relevant retry loops to
free our xdr-allocated arrays on every iteration of the loop, like
this:
do {
xdr_free((xdrproc_t) xdr_foo, &array_out);
code = VL_SomeRPC(rxconn, &array_out);
} while (some_condition);
xdr_free((xdrproc_t) xdr_foo, &array_out);
Or like this:
do {
code = VL_SomeRPC(rxconn, &array_out);
xdr_free((xdrproc_t) xdr_foo, &array_out);
} while (some_condition);
FIXES 135043
Change-Id: I4e058eb52d277e6d3817df8b703c5e5b894c0b5a
Reviewed-on: https://gerrit.openafs.org/15923
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10397
Currently, if a caller calls an RPC with a string output argument,
like so:
{
char *str = NULL;
code = RXAFS_SomeCall(&str);
/* do something with 'str' */
xdr_free((xdrproc_t) xdr_string, &str);
}
Normally, xdr_free causes xdr_string to call osi_free, specifying the
same size that we allocated for the string. However, since we only
have a char*, the amount of space allocated for the string is not
recorded separately, and so xdr_string calculates the size of the
buffer to free by using strlen().
This works for well-formed strings, but if we fail to decode the
payload of the string, or if our peer gave us a string with a NUL byte
in the middle of it, then strlen() may be significantly less than the
actual allocated size. And so in this case, the size given to osi_free
will be wrong.
The size given to osi_free is ignored in userspace, and for KERNEL on
many platforms like Linux and DARWIN. However, it is notably not
ignored for KERNEL on Solaris and some other less supported platforms
(HPUX, Irix, NetBSD). At least on Solaris, an incorrect size given to
osi_free can cause a system panic or possibly memory corruption.
To avoid this, change xdr_string during XDR_DECODE to make sure that
strlen() of the string always reflects the allocated size. If we fail
to decode the string's payload, replace the payload with non-NUL bytes
(fill it with 'z', an arbitrary choice). And if we do successfully
decode the payload, check if the strlen() is wrong (that is, if the
payload contains NUL '\0' bytes), and fail if so, also filling the
payload with 'z'. This is only strictly needed in KERNEL on certain
platforms, but do it everywhere so our behavior is consistent.
FIXES 135043
Change-Id: I90c419a7ef0ede247187172a182863dcb4250578
Reviewed-on: https://gerrit.openafs.org/15922
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10397
Various RPCs return a variable-length array in an OUT argument, but
are only supposed to return specific sizes. A few instances of this
include the following (but this is not an exhaustive list):
- AFSVolListOneVolume should only return a single volintInfo.
- PR_NameToID should return the same number of IDs as names given.
- VL_GetAddrsU should return the same number of addresses as the
'nentries' OUT argument.
Some callers of these RPCs just assume that the server has not
violated these rules. If the server responds with a nonsensical array
size, this could cause us to read beyond the end of the array, or
cause a NULL dereference or other errors.
For example, some callers of VL_GetAddrsU will iterate over 'nentries'
addresses, even if the 'blkaddrs' OUT argument contains fewer entries.
Or with AFSVolListOneVolume, some callers assume that at least 1
volintInfo has been returned; if 0 have been returned, we can try to
access a NULL array.
To avoid all of this, add various sanity checks on the relevant
returned lengths of these RPCs. For most cases, if the lengths are not
sane, return an internal error from the appropriate subsystem (or
RXGEN_CC_UNMARSHAL if there isn't one). For VL_GetAddrsU, if
'nentries' is too long, just set it to the length of the returned
array.
FIXES 135043
Change-Id: Ibdc7837ab09b4765436fc4c0d780e695bba07128
Reviewed-on: https://gerrit.openafs.org/15921
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10397
When making an RPC call from a client, output arguments that use
arrays (or array-like objects like strings and opaques) can be
allocated by XDR, like so:
{
struct idlist ids;
ids.idlist_val = NULL;
ids.idlist_len = 0;
code = PR_NameToID(rxconn, names, &ids);
/* data inside ids.idlist_val[...] */
xdr_free((xdrproc_t) xdr_idlist, &ids);
}
With this approach, during XDR_DECODE, xdr_array() reads in the number
of array elements from the peer, then allocates enough memory to hold
that many elements, and then reads in the array elements.
Alternatively, the caller can provide preallocated memory, like so:
{
struct idlist ids;
afs_int32 ids_buf[30];
ids.idlist_val = ids_buf;
ids.idlist_len = 30;
code = PR_NameToID(rxconn, names, &ids);
/* data inside ids.idlist_val[...] */
}
With this approach, during XDR_DECODE, xdr_array() reads in the number
of array elements from the peer, and then reads in the array elements
into the supplied buffer. However, in this case, xdr_array() never
checks that the number of array elements will actually fit into the
supplied buffer; the _len field provided by the caller is just ignored.
In this example, if the ptserver responds with 50 elements for the 'ids'
output argument, xdr_array() will write 50 afs_int32's into
'ids.idlist_val', going beyond the end of the 30 elements that are
actually allocated.
It's also possible, and in fact very easy, to use xdr-allocated
buffers and then reuse them as a preallocated buffer, possibly
accidentally. For example:
{
struct idlist ids;
ids.idlist_val = NULL;
ids.idlist_len = 0;
while (some_condition) {
code = PR_NameToID(rxconn, names, &ids);
}
}
In this case, the first call to PR_NameToID can cause the buffer for
'ids' to be allocated by XDR, which will then be reused by the
subsequent calls to PR_NameToId. Note that this can happen even if the
first PR_NameToID call fails; the call can be aborted after the output
array is allocated.
Retrying an RPC in this way is effectively what all ubik_Call*
codepaths do (including all ubik_* wrappers, e.g. ubik_PR_NameToID).
Or some callers retry effectively the same RPC when falling back to
earlier versions (e.g. VL_ListAttributesN2 -> VL_ListAttributesN).
To prevent this for arrays and opaques, change xdr_array (and
xdr_bytes) to check if the _len field for preallocated buffers is
large enough, and return failure if it's not.
Also perform the same check for the ka_CBS and ka_BBS structures. These
are mostly the same as opaques, but they have custom serialization
functions in src/kauth/kaaux.c. ka_BBS also has two lengths: the actual
length of bytes, and a 'max' length. ka_CBS isn't used for any RPC
output arguments, but fix it for consistency.
For strings, the situation is complicated by the fact that callers
cannot pass in how much space was allocated for the string, since
callers only provide a char**. So for strings, just refuse to use a
preallocated buffer at all, and return failure if one is provided.
Note that for some callers using preallocated arrays or strings, the
described buffer overruns are not possible, since the preallocated
buffers are larger than the max length specified in the relevant
RPC-L. For example, afs_DoBulkStat() allocates AFSCBMAX entries for
the output args for RXAFS_InlineBulkStatus, which is the max length
specified in the RPC-L, so a buffer overrun is impossible. But since
it is so easy to allow a buffer overrun, enforce the length checks for
everyone.
FIXES 135043
Change-Id: Iaf913e2156af2081c72125ec066d9636086c7d14
Reviewed-on: https://gerrit.openafs.org/15920
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10397
Currently, a few RPCs with arrays or opaque OUT arguments are called
with preallocated memory for the arg, but also provide a _len of 0 (or
an uninitialized _len). This makes it impossible for the xdr routine to
tell whether we have allocated enough space to actually hold the
response from the server.
To help this situation, either specify an appropriate _len for the
preallocated value (cm_IoctlGetACL, fsprobe_LWP), or don't provide a
preallocated buffer at all and let xdr allocate a buffer for us
(PGetAcl).
Note that this commit doesn't change xdr to actually check the value of
the given _len; but now a future commit can do so without breaking
callers.
FIXES 135043
Change-Id: Ieb50aaa5ae9a1bde027999ce1c668e0c99b4d82b
Reviewed-on: https://gerrit.openafs.org/15919
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10397
Currently, several callers call RPCs with string OUT arguments, and
provide preallocated memory for those arguments. This can easily allow a
response from the server to overrun the allocated buffer, stomping over
stack or heap memory.
We could simply make our preallocated buffers larger than the maximum
size that the RPC allows, but relying on that is error prone, and
there's no way for XDR to check if a string buffer is large enough.
Instead, to make sure we don't overrun a given preallocated buffer,
avoid giving a preallocated buffer to such RPCs, and let XDR allocate
the memory for us.
Specifically, this commit changes several callers to
RXAFS_GetVolumeStatus(), and one caller of BOZO_GetInstanceParm(), to
avoid passing in a preallocated string buffer.
All other callers of RPCs with string OUT args already let XDR allocate
the buffers for them.
FIXES 135043
Change-Id: If42e2cc983903cff9766e1bab487142d4d493a17
Reviewed-on: https://gerrit.openafs.org/15918
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10396
Several places in the tree parse ACLs using sscanf() calls that look
similar to this:
sscanf(str, "%d dfs:%d %s", &nplus, &dfs, cell);
sscanf(str, "%100s %d", tname, &trights);
Some callers check whether the scanf() returns negative or 0, but some
callers do not check the return code at all. If only some of the fields
are present in the sscanf()'d string (because, for instance, the ACL is
malformed), some of the arguments are left alone, and may be set to
garbage if the relevant variable was never initialized.
If the parsed ACL is copied to another ACL, this can result in the
copied ACL containing uninitialized memory.
To avoid this, make sure all of the variables passed to sscanf() and
similar calls are initialized before parsing. This commit does not
guarantee that the results make sense, but at least the results do not
contain uninitialized memory.
Change-Id: Ib0becffbce5a6e15f91ac4390bb9c422d478bea6
Reviewed-on: https://gerrit.openafs.org/15917
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10396
After the preceding commits, the data returned by the VIOCGETAL
pioctl (a RXAFS_FetchAcl wrapper) will safely be NUL-terminated.
However, the callers that attempt to parse the ACL string make
assumptions that the returned data will be properly formatted,
and implement a "skip to next line" functionality (under various
names) that blindly increments a char* until it finds a newline
character, which can read past the end of even a properly
NUL-terminated string if there is not a newline where one is
expected.
Adjust the various "skip to next line" functionality to keep
the current string pointer at the trailing NUL if the end of the
string is reached while searching for a newline.
Change-Id: I7fb7f23d7d6f68608f3e656a1530a7fc40b4a567
Reviewed-on: https://gerrit.openafs.org/15916
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10396
Supplement the previous commit by additionally verifying that
the returned ACL string occupies the entire XDR opaque, rejecting
any values returned that have an internal NUL prior to the end
of the opaque.
Change-Id: Iefa3d00a9a0e25ef66b7166fe952aae0603ee3d7
Reviewed-on: https://gerrit.openafs.org/15915
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10396
Analogously to how a call to RXAFS_StoreACL() with a malformed
ACL string can cause a fileserver to perform invalid memory operations,
a malformed ACL string returned in response to a call to RXAFS_FetchACL()
can cause a client to perform invalid memory operations.
Modify all the in-tree callers of the RPC to verify that the ACL
data, which is conveyed as an XDR 'opaque' but whose contents
are actually expected to be a string, is a valid C string. If
a zero-length opaque or one without a trailing NUL is received,
treat that as an error response from the fileserver rather than
returning success.
The Unix cache manager's pioctl handler already has logic to cope with a
zero-length reply by emitting a single NUL byte to userspace. This
special-casing seems to have been in place from the original IBM import,
though it does so by confusingly "skipping over" a NUL byte already put
in place. For historical compatibility, preserve that behavior rather
than treating the zero-length reply as an error as we do for the other
callers. It seems likely that this location should treat a zero-length
reply as an error just as the other call sites do, but that can be done
as a later change.
Change-Id: Ibf685e54e7e3fca6a4caac63c961cfcfb2f4732a
Reviewed-on: https://gerrit.openafs.org/15914
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10396
Currently in SRXAFS_StoreACL, if CallPreamble() or check_acl() fail, we
will jump to Bad_StoreACL, which will pass the ACL string from the
client to osi_auditU. Since check_acl() hasn't yet checked if the given
ACL contains a NUL byte, the ACL may be an unterminated string. If
auditing is enabled, this can cause garbage to be logged to the audit
log, or cause the fileserver to crash.
To avoid this, set 'rawACL' to NULL at first, only setting it to the
actual ACL string after check_acl() has succeeded. This ensures that all
code accessing 'rawACL' is guaranteed to be using a terminated string.
This may mean that we pass a NULL AUD_ACL to osi_auditU. Our auditing
code explicitly checks for and handles handles NULL strings, so this is
fine.
FIXES 135445
Change-Id: Iecde5677805a28d55c833b135732a14fd86cc985
Reviewed-on: https://gerrit.openafs.org/15913
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10396
Change our StoreACL implementation to refer to the 'AccessList' argument
via a new local variable called 'rawACL'. This makes it clearer to
users that the data is a string, and makes it easier for future commits
to make sure we don't access the 'AccessList' argument in certain
situations.
Update almost all users in StoreACL to refer to 'rawACL' instead of
'AccessList'. Change the name of 'AccessList' to 'uncheckedACL' to make
sure we don't miss any users. Update our check_acl() call to use
'uncheckedACL' (and not 'rawACL'), because it must use an AFSOpaque to
check the ACL.
Change RXStore_AccessList() and printableACL() to accept a plain char*
instead of a struct AFSOpaque.
This commit should not incur any noticeable behavior change. Technically
printableACL() is changed to run strlen() on the given string, but this
should not cause any noticeable change in behavior:
This change could cause printableACL() to process less of the string
than before, if the string contains a NUL byte before the end of the
AFSOpaque buffer. But this doesn't matter, since the all of our code
after this treats the ACL as a plain string, and so doesn't look at any
data beyond the first NUL. It's not possible for printableACL() to
process more data than before, because check_acl() has already checked
that the ACL string contains a NUL byte, so we must process
AFSOpaque_len bytes or fewer.
FIXES 135445
Change-Id: I26229a39528fec788d96c77012c042c278a214c9
Reviewed-on: https://gerrit.openafs.org/15912
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10396
In acl_Internalize_pr(), each line in an ACL granting rights (positive
or negative) is sscanf()'d with "%63s\t%d\n", and then we try to
advance 'nextc' beyond the next newline character.
However, sscanf()'ing "%63s\t%d\n" does not guarantee that there is a
newline in the given string. Whitespace characters in sscanf() are not
matched exactly, and may match any amount of whitespace (including
none at all). For example, a string like "foo 4" may be parsed by
sscanf(), but does not contain any newlines.
If this happens, strchr(nextc, '\n') will return NULL, and we'll
advance 'nextc' to 0x1, causing a segfault when we next try to
dereference 'nextc'.
To avoid this, check if 'nextc' is NULL after the strchr() call, and
return an error if so.
FIXES 135445
Change-Id: I6bcbbaf88a16202fb84c0932578dd8d5712726dd
Reviewed-on: https://gerrit.openafs.org/15911
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10396
The early parsing code in acl_Internalize_pr() tries to advance
'nextc' to go beyond the first two newlines in the given ACL string.
But if the given ACL string has no newlines, or only 1 newline, then
'nextc' will point beyond the end of the ACL string, potentially
pointing to garbage.
Intuitively, it may look like the ACL string must contain at least 2
newlines because we have sscanf()'d the string with "%d\n%\d".
However, whitespace characters in sscanf() are not matched exactly
like non-whitespace characters are; a sequence of whitespace
characters matches any amount of whitespace (including none). So, a
string like "1 2" will be parsed by "%d\n%d\n", but will not contain
any newline characters.
Usually this should result in a parse error from acl_Internalize_pr(),
but if the garbage happens to parse successfully, this could result in
unrelated memory getting stored to the ACL.
To fix this, don't advance 'nextc' if we're already at the end of the
ACL string.
FIXES 135445
Change-Id: Ie009b59bec9a75afc81fee201c2fca6955f484e4
Reviewed-on: https://gerrit.openafs.org/15910
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10394
Currently, we only throttle PAG creation in afs_setpag(). But there
are several callers that call setpag() directly, not via afs_setpag;
notably _settok_setParentPag in afs_pioctl.c. When setpag() is called
with a PAG value of -1, it generates a new PAG internally without any
throttling. So, those callers effectively bypass the PAG throttling
mechanism, which allows a calling user to create PAGs without any
delay.
To avoid this, move our afs_pag_wait call from afs_setpag() to
afs_genpag(), which all code uses to generate a new PAG value. This
ensures that PAG creation is always throttled for unprivileged users.
FIXES 135062
Change-Id: Ic4cb352edaa693984995fbdb6dc35b89686e8470
Reviewed-on: https://gerrit.openafs.org/15907
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10394
Currently, several areas in the code call genpag() to generate a new
PAG id, but the signature of genpag() is very limited. To allow for
the code in genpag() to return errors and to examine the calling
user's credentials, introduce a new function, afs_genpag(), that does
the same thing as genpag(), but accepts creds and allows errors to be
returned.
Convert all existing callers to use afs_genpag() and to handle any
errors, though no errors are ever returned in this commit on its own.
To ensure there are no old callers of genpag() left around, change the
existing genpag() to be called genpagval(), and declare it static.
FIXES 135062
Change-Id: I5c96c0134db901f21ca30c8b3f57aeec1eb67aa5
Reviewed-on: https://gerrit.openafs.org/14090
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10396
Currently, we don't free 'newACL' if acl_Internalize_pr() fails. If
acl_Internalize_pr() has already allocated 'newACL', then the memory
associated with newACL will be leaked. This can happen if parsing the
given ACL fails at any point after successfully parsing the first
couple of lines in the ACL.
Change acl_FreeACL() to make freeing a NULL acl a no-op, to make it
easier to make sure the acl has been freed.
FIXES 135445
Change-Id: I87745fa9b6285574acdd5ecb613e80fa1ea37ae8
Reviewed-on: https://gerrit.openafs.org/15909
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
CVE-2024-10396
Currently, the fileserver treats the ACL given in RXAFS_StoreACL as a
string, even though it is technically an AFSOpaque and could be not
NUL-terminated.
We give the ACL opaque/string to acl_Internalize_pr() to parse, which
will run off the end of the allocated buffer if the given ACL does not
contain a '\0' character. Usually this will result in a parse error
since we'll encounter garbage, but if the partially-garbage ACL
happens to parse successfully, some uninitialized data could make it
into the stored ACL.
In addition, if the given ACL is an opaque of length 0, we'll still
give the opaque pointer to acl_Internalize_pr(). In this case, the
pointer will point to &memZero, which happens to contain a NUL byte,
and so is treated like an empty string (which is not a valid ACL). But
the fact that this causes no problems is somewhat a coincidence, and
so should also be avoided.
To avoid both of these situations, just check if the given ACL string
contains a NUL byte. If it doesn't, or if it has length 0, refuse to
look at it and abort the call with EINVAL.
FIXES 135445
Change-Id: If55f72d6556bc7b1704a3848865bfb902ee9f92a
Reviewed-on: https://gerrit.openafs.org/15908
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Commit de46edef69 (Remove lowercase min/max macro definitions)
inadvertently removed a definition of max() that is required by
_krb5_n_fold() (src/external/heimdal/krb5/n-fold.c) if no system
definition of max() is available. For example, on Solaris this results
in a kernel module with an undefined external symbol 'max', which
prevents the kernel module from loading:
genunix: [ID 819705 kern.notice] /kernel/drv/amd64/afs: undefined symbol
genunix: [ID 826211 kern.notice] 'max'
genunix: [ID 472681 kern.notice] WARNING: mod_load: cannot load module 'afs'
Restore the required max() macro definition.
Change-Id: I6bca2bb2b90d7cdbe90a3b769997cdc153f59f6c
Reviewed-on: https://gerrit.openafs.org/15874
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
During parallel builds, we can run the final 'libtool --mode=link'
step for, say, pam_afs.la and pam_afs.krb.la in parallel. When libtool
runs --mode=link for these libraries, it effectively does the
following (amongst many other steps):
- rm -rf .libs/"$libname".*
- mkdir .libs/"$libname".lax
- mkdir .libs/"$libname".lax/libopr_pic.a
The 'rm -rf' cleans up any lingering files from previous runs in the
.libs dir (e.g. .libs/pam_afs.a, .libs/pam_afs.lai, etc etc). The
'.lax' directory is used as a temporary space to extract object files
in dependant libs. And of course it creates various other files in
there named like .libs/pam_afs.lai.
But if we're building a library called pam_afs.krb.la, then some of
our temporary files will be named like .libs/pam_afs.krb.lax, which
get deleted when building pam_afs.la when it deletes
".libs/pam_afs.*". So, if we build pam_afs.la while the libtool
command for pam_afs.krb.la is using .libs/pam_afs.krb.lax, the
temporary files can get deleted out from under us, causing rare
confusing libtool errors. For example:
/bin/sh ../../libtool --quiet --mode=link --tag=CC gcc -rpath /usr/local/lib \
-module -no-undefined -o pam_afs.la [...]
/bin/sh ../../libtool --quiet --mode=link --tag=CC gcc -rpath /usr/local/lib \
-module -no-undefined -o pam_afs.krb.la [...]
../../libtool: line 1719: cd: .libs/pam_afs.krb.lax/libopr_pic.a: No such file or directory
make[3]: *** [Makefile:76: pam_afs.krb.la] Error 1
make[3]: *** Waiting for unfinished jobs....
Fortunately, most of the libraries we build don't have dots in their
name, and so avoid this issue. But we do have a few libraries that
build a ".krb" variant, and so can be impacted by this. This includes
pam_afs.krb, libauth.krb, and libkauth.krb.
To make sure these libraries don't encounter this problem, make the
.krb variant depend on the non-.krb variant, so the libraries aren't
built in parallel.
Change-Id: Icb96a721b481bc3d99c9e24cf81fcfbbf7d498f6
Reviewed-on: https://gerrit.openafs.org/15328
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Since the original IBM code import, the 'bos' utility source code has
contained an unused pre-processor conditional 'OPBOS'. If OPBOS is
defined, it builds a bos binary that implements only the following bos
commands: status, start, stop, restart, startup, shutdown.
There is no configure option or any other logic that defines 'OPBOS',
nor is it documented. Remove the unused conditional.
Change-Id: I4d32084e6a6b4ca8cadb0aa0ef06c74fb6edb770
Reviewed-on: https://gerrit.openafs.org/15875
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Modify OpenAFS to respect user-specified ARCHFLAGS (-m32 or -m64) when
building on Solaris 10 and 11 SPARC platforms. If ARCHFLAGS is
unset/unspecified, we will continue to build 32-bit userspace by
default.
Implement -m64 SPARC LWP via USE_UCONTEXT; leave the -m32 SPARC LWP
(default) implementation unchanged.
Kernelspace builds remain 64-bit, regardless of ARCHFLAGS value.
Change-Id: Ibb8e145d0cb8dfbe05de896b84c1d1135664b845
Reviewed-on: https://gerrit.openafs.org/14897
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Currently, the ReclaimedVCList is only fed on Darwin. As a result, the
logic present in afs_FlushReclaimedVcaches() is never used on all other
platforms and should be wrapped in #ifdef AFS_DARWIN_ENV. While here,
also reorganize this function in an effort to make it more readable.
No functional change is incurred by this commit.
Change-Id: I1097a2b07fcb48dec2b31385477c34e974144b51
Reviewed-on: https://gerrit.openafs.org/15027
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Because the afs_vhashT hash table uses singly-linked lists in its hash
buckets, to remove an entry we have to:
1. Find the bucket where the entry to be removed should be present.
2. Go through each element of this bucket until we find this entry.
3. Remove it.
Due to step number 2, the time required to remove an entry from a bucket
increases as the linked list associated with this bucket gets longer.
This can get particularly bad when evicting vcaches (ShakeLooseVCaches),
as each entry from the least recently used queue (VLRU) has to be found
in our afs_vhashT hash table. This problem is exacerbated when files are
repeatedly deleted and recreated in the same volume, resulting in many
vcaches with the same volume id and vnode number (but different unique
ids) in the same bucket.
To avoid this problem, build afs_vhashT using doubly-linked lists with
'struct afs_q', like we did for afs_vhashTV in commit 4fc48af8
(vc-hashing-be-less-expensive-20050728). Now, removing a vcache from
afs_vhashT is an O(1) operation (QRemove).
Change-Id: I5a1a9a090f9aa3d8884e2bb12aca1f1acc3b902d
Reviewed-on: https://gerrit.openafs.org/14949
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Replace strcpy, strcat, and sprintf with safer string functions in the
tests/torture/DumpAfsLog program. Add checks for string truncation
where needed.
Change-Id: I4eac12e73bbeaeadc0a8a6b5ebded664ca91b7f8
Reviewed-on: https://gerrit.openafs.org/15489
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Gaurav Saxena <gsaxena@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: Michael Meffie <mmeffie@sinenomine.net>
Remove the commented out code in the DumpAfsLog test program.
Change-Id: Ie7b380ec37d18cfc7b1be8ce9904b55ad101b263
Reviewed-on: https://gerrit.openafs.org/15871
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Several static analysis tools have identified various problems:
- files left open from early returns/exits (infer)
- missing checks to ensure *alloc was successful (infer)
- memory leaks due to not freeing storage before
a return (infer)
- unused variables (cppcheck)
To resolve the above problems:
- close files before returning/exiting
- add checks to ensure *alloc was successful before using the memory
- free storage before returning
- remove unused variables
This commit is a reorganization of commits submitted by Pat Riehecky,
who ran the static analysis tools and developed the fixes.
Change-Id: If45d64c370ad68d1e844b6aa5881f9dbd75300c9
Reviewed-on: https://gerrit.openafs.org/14671
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
There are several large #if/#endif blocks that do not have a comment on
the #endif, and some of the nested #if/#else/#endif blocks are not
indented.
In addition, there are violations of the CODING style such as
indentation errors, trailing whitespace, braces for functions that are
not on their own line, etc.
Add comments to the #endif statements for the larger blocks.
Use indent with parameters from CODING to fix the remaining style
issues, and fix up the preprocessor indentation.
There are no functional changes
Change-Id: I89a13a1f43aa08728777a71a41984613ac163b95
Reviewed-on: https://gerrit.openafs.org/15669
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Commit de46edef69 (Remove lowercase min/max macro definitions) removed
the min() macro definition from the util/ktime.h header. Unfortunately,
that min() definition is still required for some of the Windows source
files.
Later, the windows sources can be converted to use the new in-tree
opr_min and opr_max macros, but for now reinstate the min() macro
definition in ktime.h.
Fixes the build errors on Windows:
msrpc.obj : error LNK2001: unresolved external symbol _min
RDRFunction.obj : error LNK2001: unresolved external symbol _min
cm_server.obj : error LNK2019: unresolved external symbol _min referenced
in function _cm_SetServerIPRank
cm_dcache.obj : error LNK2001: unresolved external symbol _min
cm_vnodeops.obj : error LNK2001: unresolved external symbol _min
smb3.obj : error LNK2001: unresolved external symbol _min
Change-Id: I043e23e72977ef06330b53f05044cd3be93cc00e
Reviewed-on: https://gerrit.openafs.org/15870
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Commit 8e1e242ce4 (Replace lowercase min/max macros with opr_min/max)
converted the min() invocation in auth/ktc_nt.c with our in-tree
opr_min() macro, but missed adding the opr.h header to that source file.
Add the opr.h include to fix the build error on windows:
afsauth.lib(ktc_nt.obj) : error LNK2019: unresolved external symbol
_opr_min referenced in function _GetLocalToken
Change-Id: I96dbe377a8a85921fa1833cd27638cd853e26b9c
Reviewed-on: https://gerrit.openafs.org/15869
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
The commit:
"afsd: Move to newer cmd syntax" (fbe6f0f4ad)
removed setting the flag variable "sawBiods" when the -nbiods option is
used, this results in the -nbiods value being ignored and the default
value being used.
This problem was identified by the cppcheck static analysis tool.
Note: The -nbiods option is only for AIX systems.
Change-Id: I9a8d199a212a164660c1e9f8b4151792d9c9d0e0
Reviewed-on: https://gerrit.openafs.org/15212
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Since introduced in commit 53986479ee (ptutil-initial-20001219),
pt_util has used a statically allocated buffer to read pages from the
database. In several places the buffer is cast to a struct ubik_header.
Memory allocated dynamically via malloc()/calloc() is guaranteed to be
properly aligned for objects of any type at run time, however buffers
that are not allocated dynamically have no such guarantee. So, instead
of using a statically allocated buffer, dynamically allocate it with
calloc() to ensure it is aligned for struct ubik_header objects.
This fixes a SIGBUS error whenever the buffer happens to be built with a
non-word alignment. This failure was observed while running the
tests/ptserver/pt_util-t unit test on Solaris SPARC 64-bit.
Note: By inspection, the same potential problem exists in
src/kauth/ka_util.c. However, kaserver is deprecated and ka_util has
never been built by default.
Change-Id: I00372ed6adede32448b5e0df1c1d74689286a218
Reviewed-on: https://gerrit.openafs.org/15352
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
rxkad/fcrypt defines 2 macros: ENCRYPT and DECRYPT. The names are
pretty generic and do not clearly associate with fcrypt.
In addition, the name ENCRYPT collide with the Microsoft provided header
'ntddndis.h' that uses ENCRYPT as an ENUM (on newer versions of the
Windows development kits):
typedef enum _OFFLOAD_OPERATION_E
{
AUTHENTICATE = 1,
ENCRYPT
}
This results in a build error on Windows when fcrypt.h is included with
ntddndis.h:
... ntddndis.h(2212): error C2059: syntax error: 'constant'
Rename the ENCRYPT/DECRYPT macros to FCRYPT_ENCRYPT/FCRYPT_DECRYPT in
fcrypt.h to a name that relates the macros back to fcrypt.
Note: The ENCRYPT/DECRYPT symbols are part of a public interface
installed in fcrypt.h, but keeping the old names are impractical, so we
are changing them anyway.
Change-Id: I9c51c81fab7fcbf0dae01569852ca94c0e6a0439
Reviewed-on: https://gerrit.openafs.org/15868
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Several static analysis tools have identified various problems:
- missing checks to ensure *alloc was successful (infer)
To resolve the above problems:
- add checks to ensure *alloc was successful before using the memory
This commit is a reorganization of commits developed by Pat Riehecky,
who ran the static analysis tools and developed the fixes.
Change-Id: I9396b8327f8283c60821050c75de1c36e0ebd12e
Reviewed-on: https://gerrit.openafs.org/14684
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
CheckVolume accepts an optional nvldbentry as an argument. When this
argument is NULL, CheckVolume correctly locks the nvldbentry before
obtaining the entry and making changes. However, if an nvldbentry is
passed, CheckVolume merely copies it to a work entry; it also neglects
to obtain the current state of the entry after locking it for
modification.
Change CheckVolume to obtain a locked copy of the nvldbentry if
modifications are required.
Change-Id: Ie98bc57af1cc2955d10982f44abd4c55755c6244
Reviewed-on: https://gerrit.openafs.org/14356
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
In preparation for a future commit, refactor some large chunks of logic
out of CheckVolume and into new static routines to handle the specific
cases for RW, RO, and BK volumes. Also provide Doxygen documentation for
CheckVolume.
No functional change should be incurred by this commit.
Change-Id: Iac6a773d29ed530e9676f02726db031b1f2a2526
Reviewed-on: https://gerrit.openafs.org/14355
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Since the original IBM code import, UV_RestoreVolume (later renamed to
to the current name UV_RestoreVolume2) has called VLDB_GetEntryByID to
obtain the VLDB entry of the target/destination volume. This entry is
then modified to match the restore changes, locked via VL_SetLock, and
replaced via VL_ReplaceEntryN.
Unfortunately, this allows for a race with other volume operations
against the same volume, if another operation alters the entry after we
call VLDB_GetEntryByID but before ubik_VL_SetLock. To avoid this, call
GetLockedEntry to ensure that the entry is locked while we are reading
and modifying it.
Change-Id: I772ef80158205f45c04f8ef0aa95726f1f2c4ac4
Reviewed-on: https://gerrit.openafs.org/14354
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Since the original IBM code import, UV_RestoreVolume2 (originally
UV_RestoreVolume) has checked for VL_ENTDELETED from VLDB_GetEntryByID.
However, if VL_ENTDELETED is received, we will act like we retrieved the
vlentry successfully. If the -verbose flag is given, we'll try to print
the entry (even though one was not returned), and we'll call VL_SetLock
for the volume, which will almost certainly also fail with VL_ENTDELETED
and bail out.
Furthermore (also since the original IBM import), there is no longer any
code path to set the volume entry flag VLDELETED. So it should be rare
for any VL_* RPC to return VL_ENTDELETED; that would require an entry to
have the VLDELETED flag carried over from a very old database.
Therefore, it is safe and reasonable to simply remove the check for
VL_ENTDELETED from UV_RestoreVolume2. If we do somehow receive the
VL_ENTDELETED error code, vos will terminate with an error instead of
possibly printing an uninitialized entry.
In addition, remove the same check for VL_ENTDELETED in the equivalent
code path in libadmin's UV_RestoreVolume.
Change-Id: I11d1c3306f67d68de54780f6aac75e4c27779db4
Reviewed-on: https://gerrit.openafs.org/14357
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: Andrew Deason <adeason@sinenomine.net>
In several routines in vsprocs.c, we declare and clear nvldbentry
storeEntry long before it is actually needed.
Instead, where possible, move the declaration and memset of storeEntry
closer to the actual point of use.
No functional change is incurred by this commit.
Change-Id: Iffbf5ff56fab4d221586dfb93bbbbfde063c856e
Reviewed-on: https://gerrit.openafs.org/14353
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Change UV_ReleaseVolume() to use the new GetLockedEntry() function. Add
logic to GetLockedEntry() to support the special semantics of VL_SetLock
for VLOP_RELEASE, to preserve the current behavior of UV_ReleaseVolume()
with the VL_RERELEASE error code.
The special semantics of the VL_RERELEASE error code with VLOP_RELEASE
are a bit odd: this situation can currently only happen with our
vlserver if the vlentry has the VLOP_RELEASE flag set but no lock
timestamp. The only way that situation can occur with our current 'vos'
is during a small window after a partially failed 'vos release'. Around
where UV_ReleaseVolume() prints the message "The volume %lu could not be
released to the following %d sites", we call VLDB_ReplaceEntry() with
LOCKREL_TIMESTAMP, clearing just the lock timestamp but not the lock
flags. We then jump to the 'rfail' label where we ubik_VL_ReleaseLock()
with all LOCKREL_* flags are set, clearing the lock flags.
So, the only time in this codepath where the lock timestamp is cleared
but the VLOP_RELEASE flag is set is after the mentioned
VLDB_ReplaceEntry() call but before the ubik_VL_ReleaseLock() call. If
that last ubik_VL_ReleaseLock() fails, or if vos dies before reaching
it, then the vlentry is left in a state that causes the VL_RERELEASE
error code in the future. This behavior has existed since at least
OpenAFS 1.0.
Currently, the handling of this special VL_RERELEASE case is completely
silent, so it's not clear that this is happening, if it does manage to
happen. These special semantics are not really safe, since two vos
processes could encounter a VL_RERELEASE volume at the same time, and go
through the 'vos release' process in parallel thinking they have the
vlentry locked, causing all kinds of problems.
This behavior may be changed in the future, but for now just preserve
the existing special handling of VL_RERELEASE. Add a warning message,
though, to make this special situation not silent.
No functional change (other than error messages) should be incurred by
this commit.
[adeason@sinenomine.net: Add VL_RERELEASE warning message and additional
related comments.]
Change-Id: I9da50677233f63235de730b5d234a9b575e196fd
Reviewed-on: https://gerrit.openafs.org/14352
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
In preparation for a future commit, consolidate common VL_SetLock and
VL_GetEntryById logic into a new routine GetLockedEntry.
This commit should incur no practical change in behavior, but there is a
minor internal behavior change for one case:
For the case in UV_ConvertRO(), previously we called VL_SetLock with the
RW volume id (with volume type RWVOL), and called VL_GetEntryByID with
the RO volume id (with the volume type -1); now we call both with the RW
volume id. We get these volume IDs from the same VL entry, and they must
refer to the same VL entry in order for this function to work properly,
so this difference should not matter.
Change-Id: I76faf3d184c397bceff27ba3a857b5c80c88e814
Reviewed-on: https://gerrit.openafs.org/14350
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
rxi_DestroyConnectionNoLock contains the following early return
test:
if ((conn->refCount > 0) || (conn->flags & RX_CONN_BUSY)) {
/* Busy; wait till the last guy before proceeding */
MUTEX_EXIT(&rx_refcnt_mutex);
MUTEX_EXIT(&conn->conn_data_lock);
USERPRI;
return;
}
This prevents the rx_connection ("conn") from being destroyed if it is
still in use elsewhere, e.g.:
- the rx_Listener is handling packets for one of the conn's rx_calls
- another caller has a reference to the conn
However, since the original IBM code import, rxi_ReceivePacket obtains a
conn reference (via rxi_FindConnection) but returns it by merely
decrementing the refCount (currently via putConnection), and does not
call rx_DestroyConnection. If the listener's rxi_ReceivePacket is
processing a reply or ack for a client conn (holding a reference) while
another thread is calling rx_DestroyConnection on the otherwise-last
reference, the rx_connection will not be destroyed and can never be
destroyed - it has been leaked.
Modify rxi_ReapConnections to destroy client conns with refCount == 0.
This way, these connections can still "leak", but they're eventually
cleaned up the next time rxi_ReapConnections runs.
Change-Id: I8c588d43b108a8147a07d0ff0cc69055cd00d355
Reviewed-on: https://gerrit.openafs.org/15135
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
rxi_ReapConnections obtains conn_data_lock and rx_refcnt_mutex for each
server connection in the rx_connHashTable.
Instead, only obtain these locks for server connections that have no
calls.
Change-Id: I27d748e604070792bee3c1f31d4d558462fd399a
Reviewed-on: https://gerrit.openafs.org/15349
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Both the Unix and Windows cache managers maintain a set of persistent
client rx_connections ("conns") to the known fileservers for each active
AFS user. These conns are periodically refreshed (destroyed and
possibly rebuilt) approximately NOTOKETIMEOUT (2h) after token
expiration for authenticated (rx_kad) conns, or every NOTOKETIMEOUT (2h)
for anonymous (rx_null) conns.
Both cache managers enable NAT ping for one rx_connection to each known
fileserver. Thus we see this common idiom when a cache manager destroys
a fileserver connection:
rx_SetConnSecondsUntilNatPing(conn, 0);
rx_DestroyConnection(conn);
Doing this for all conns is harmless, even if a given conn doesn't have
NAT ping enabled.
It is important to note that rx_SetConnSecondsUntilNatPing(conn, 0) does
not actually cancel the conn's natKeepAliveEvent (if it has one); it
merely sets conn->secondsUntilNatPing to 0. If there is a
natKeepAliveEvent, this prevents it from being rescheduled after the
next event. This was fine in the past because rx_DestroyConnection
eventually did cancel natKeepAliveEvent and destroy the rx_connection
itself.
However, this idiom broke after commit 304d758983 "Standardize
rx_event usage" introduced a number of changes:
- an extra rx_connection refCount for each outstanding conn event
- a requirement (enforced by osi_Assert) that all conn events be
canceled before rx_DestroyConnection can succeed
Therefore, if natKeepAliveEvent is still active when we enter
rx_DestroyConnection, it will return early, due to the extra conn
refCount associated with the NAT ping event. The rx_connection is not
destroyed in this case.
Eventually, the final scheduled natKeepAliveEvent will fire; the event
will not be rescheduled, and the final conn refCount will be removed via
putConnection and not rx_DestroyConnection (and so the conn is not
destroyed). This rx_connection now has refCount 0, but can never be
destroyed - it has been leaked:
- The cache manager has "forgotten" this rx_connection and has no means
to invoke rx_DestroyConnection again.
- rxi_ReapConnections will not destroy it because it is a client conn
and is still on the rx_connHashTable, not the rx_connCleanup_list.
- If there is still a dallying rx_call, any eventual call to
rxi_CheckCall -> rxi_FreeCall will not destroy the conn because it
has not been flagged RX_CONN_DESTROY_ME.
Modify rx_SetConnSecondsUntilNatPing to explicitly cancel any
natKeepAliveEvent and remove its refcount if cancelled. With this
change in place, the cache managers will no longer periodically leak
client rx_connections.
Change-Id: I4e89ebc4bd2c95b6e61b95bd8f91867d451dd34c
Reviewed-on: https://gerrit.openafs.org/14951
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
The cmd_Parse() function calls strdup() to create a temporary string
when there is a parameter in the form "-<name>=<value>". This is
done in order to parse the <name> and <value> components of the
parameter.
This memory is not freed when the last parameter of the command is the
form "-<name>=<value>". Always free the param pointer at the end of
cmd_Parse() to avoid this memory leak.
This memory leak was found by running the tests/cmd/command-t unit test
program with valgrind.
Change-Id: I85f8f5f10660268596072c8bb8be2c014ac46ad8
Reviewed-on: https://gerrit.openafs.org/15086
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
A prior commit:
"Replace lowercase min/max macros with opr_min/max"
(Change-Id: Ib7c91b4e59aaf76ab77f293016496d6179c23c88)
replaced the uses of the min and max macros with opr_min and opr_max.
As a cleanup, remove the definitions for the min() and max() macros.
As noted in the above commit, there were several locations that left
the min/max macros alone:
src/WINNT
src/external/heimdal
Change-Id: I263d91b04a2aed14bf9e1ebca1616629f36e6b23
Reviewed-on: https://gerrit.openafs.org/15843
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
The commit:
"opr: replace MIN/MAX macros with opr_min/opr_max" (915c5cff16)
replaced all uses of the MIN and MAX macros with opr_min and opr_max.
Since we now have a consolidated version of opr_min and opr_max,
replace the min/max macros with opr_min and opr_max, except for the
files in src/WINNT (which can be handled in a future commit).
The exception are the min/max macros that are defined and used in the
external sources src/external/heimdal, which cannot be changed.
Change-Id: Ib7c91b4e59aaf76ab77f293016496d6179c23c88
Reviewed-on: https://gerrit.openafs.org/15825
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>