Commit Graph

15 Commits

Author SHA1 Message Date
Michael Meffie
50a3eb7b7e tests: fix out of bounds access in the rx-event test
Use the NUMEVENTS symbol which defines the array size instead of an
incorrect hard coded number when checking if a second event can be added
to be fired at the same time.  This fixes a potential out of bounds
access of the event test array.

Also update the comment which incorrectly mentions the incorrect number
of events in the test.

Change-Id: I4f993b42e53e7e6a42fa31302fd1baa70e9f5041
Reviewed-on: https://gerrit.openafs.org/12762
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
2017-11-22 20:50:47 -05:00
Benjamin Kaduk
304d758983 Standardize rx_event usage
Go over all consumers of the rx event framework and normalize its usage
according to the following principles:

rxevent_Post() is used to create an event, and it returns an event
handle (with a reference on the event structure) that can be used
to cancel the event before its timeout fires.  (There is also an
additional reference on the event held by the global event tree.)
In all(*) usage within the tree, that event handle is stored within
either an rx_connection or an rx_call.  Reads/writes to the member variable
that holds the event handle require either the conn_data_lock or call
lock, respectively -- that means that in most cases, callers of
rxevent_Post() and rxevent_Cancel() will be holding one of those
aforementioned locks.  The event handlers themselves will need to
modify the call/connection object according to the nature of the
event, which requires holding those same locks, and also a guarantee
that the call/connection is still a live object and has not been
deallocated!  Whether or not rxevent_Cancel() succeeds in cancelling
the event before it fires, whenever passed a non-NULL event structure
it will NULL out the supplied pointer and drop a reference on the
event structure.  This is the correct behavior, since the caller
has asked to cancel the event and has no further use for the event
handle or its reference on the event structure.  The caller of
rxevent_Cancel() must check its return value to know whether or
not the event was cancelled before its handler was able to run.

The interaction window between the call/connection lock and the lock
protecting the red/black tree of pending events opens up a somewhat
problematic race window.  Because the application thread is expected
to hold the call/connection lock around rxevent_Cancel() (to protect
the write to the field in the call/connection structure that holds
an event handle), and rxevent_Cancel() must take the lock protecting
the red/black tree of events, this establishes a lock order with the
call/connection lock taken before the eventTree lock.  This is in
conflict with the event handler thread, which must take the eventTree
lock first, in order to select an event to run (and thus know what
additional lock would need to be taken, by virtue of what handler
function is to be run).  The conflict is easy to resolve in the
standard way, by having a local pointer to the event that is obtained
while the event is removed from the red/black tree under the eventTree
lock, and then the eventTree lock can be dropped and the event run
based on the local variable referring to it.  The race window occurs
when the caller of rxevent_Cancel() holds the call/connection lock,
and rxevent_Cancel() obtains the eventTree lock just after the event
handler thread drops it in order to run the event.  The event handler
function begins to execute, and immediately blocks trying to obtain
the call/connection lock.  Now that rxevent_Cancel() has the eventTree
lock it can proceed to search the tree, fail to find the indicated event
in the tree, clear out the event pointer from the call/connection
data structure, drop its caller's reference to the event structure,
and return failure (the event was not cancelled).  Only then does the
caller of rxevent_Cancel() drop the call/connection lock and allow
the event handler to make progress.

This race is not necessarily problematic if appropriate care is taken,
but in the previous code such was not the case.  In particular, it
is a common idiom for the firing event to call rxevent_Put() on itself,
to release the handle stored in the call/connection that could have
been used to cancel the event before it fired.  Failing to do so would
result in a memory leak of event structures; however, rxevent_Put() does
not check for a NULL argument, so a segfault (NULL dereference) was
observed in the test suite when the race occurred and the event handler
tried to rxevent_Put() the reference that had already been released by
the unsuccessful rxevent_Cancel() call.  Upon inspection, many (but not
all) of the uses in rx.c were susceptible to a similar race condition
and crash.

The test suite also papers over a related issue in that the event handler
in the test suite always knows that the data structure containing the
event handle will remain live, since it is a global array that is allocated
for the entire scope of the test.  In rx.c, events are associated with
calls and connections that have a finite lifetime, so we need to take care
to ensure that the call/connection pointer stored in the event remains
valid for the duration of the event's lifecycle.  In particular, even an
attempt to take the call/connection lock to check whether the corresponding
event field is NULL is fraught with risk, as it could crash if the lock
(and containing call/connection) has already been destroyed!  There are
several potential ways to ensure the liveness of the associated
call/connection while the event handler runs, most notably to take care
in the call/connection destruction path to ensure that all associated
events are either successfully cancelled or run to completion before
tearing down the call/connection structure, and to give the pending event
its own reference on the associated call/connection.  Here, we opt for
the latter, acknowledging that this may result in the event handler thread
doing the full call/connection teardown and delay the firing of subsequent
events.  This is deemed acceptable, as pending events are for intentionally
delayed tasks, and some extra delay is probably acceptable.  (The various
keepalive events and the challenge event could delay the user experience
and/or security properties if significantly delayed, but I do not believe
that this change admits completely unbounded delay in the event handler
thread, so the practical risk seems minimal.)

Accordingly, this commit attempts to ensure that:

* Each event holds a formal reference on its associated call/connection.
* The appropriate lock is held for all accesses to event pointers in
  call/connection structures.
* Each event handler (after taking the appropriate lock) checks whether
  it raced with rxevent_Cancel() and only drops the call/connection's
  reference to the event if the race did not occur.
* Each event handler drops its reference to the associated call/connection
  *after* doing any actions that might access/modify the call/connection.
* The per-event reference on the associated call/connection is dropped by
  the thread that removes the event from the red/black tree.  That is,
  the event handler function if the event runs, or by the caller of
  rxevent_Cancel() when the cancellation succeed.
* No non-NULL event handles remain in a call/connection being destroyed,
  which would indicate a refcounting error.

(*) There is an additional event used in practice, to reap old connections,
    but it is effectively a background task that reschedules itself
    periodically, with no handle to the event retained so as to be able
    to cancel it.  As such, it is unaffected by the concerns raised here.

While here, standardize on the rx_GetConnection() function for incrementing
the reference count on a connection object, instead of inlining the
corresponding mutex lock/unlock and variable access.

Also enable refcount checking unconditionally on unix, as this is a
rather invasive change late in the 1.8.0 release process and we want
to get as much sanity checking coverage as possible.

Change-Id: I27bcb932ec200ff20364fb1b83ea811221f9871c
Reviewed-on: https://gerrit.openafs.org/12756
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
2017-11-22 20:17:40 -05:00
Benjamin Kaduk
bdb509fb1d Adjust rx-event test to exercise cancel/fire race
We currently do not properly handle the case where a thread runs
rxevent_Cancel() in parallel with the event-handler thread attempting
to fire that event, but the test suite only picked up on this issue
in a handful of the Debian automated builds (somewhat less-resourced
ones, perhaps).

Modify the event scheduling algorithm in the test so as to create a
larger chunk of events scheduled to fire "right away" and thereby
exercise the race condition more often when we proceed to cancel
a quarter of events "right away".

Change-Id: I50f55fd532901147cfda1a5f40ef949bf3270401
Reviewed-on: https://gerrit.openafs.org/12755
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
2017-11-08 11:25:23 -05:00
Benjamin Kaduk
81a1a53a36 Clean up our cleaning
'make clean' and 'make maintainer-clean' still leave around a fair
number of droppings, prior to this commit.

We were not descending into the 'tests' top-level directory while
cleaning.  Furthermore, tests/opr/Makefile needed $(LT_CLEAN), and
tests/rx/Makefile needed to spell it correctly.

The libtoolization places a lot of files to be removed in the
'pristine' target.

The processing used to implement the =include directive in the pod
sources for the man pages leaves around the non-.in versions of
files; we should clean that up in the 'pristine' target as well.

The 'pristine' target should likewise remove the man pages which
are generated from the pod files.

Additionally, the documentation build uses a Doxyfile which is
output by configure; that should be removed (if present) by the
'distclean' target.

When hcrypto was converted to libtool, the use of ${OBJECTS} in
the clean target was missed, so we were leaving around most of the
actual object files -- $(LT_CLEAN) does not handle this for us.
Change the rule to remove *.o as is done elsewhere.

The conversion of libafsrpc to libtool added a convenience library
libafsrpc_sys.la, and changed how syscall.o was generated on
most architectures, to be the result of compiling an empty .c file
(instead of just an empty .o file).  This introduced a new
intermediate file, syscall.c, which must be cleaned up.

tvolser was only listing volserver and not vos in its list of
executables to remove while cleaning.

The conversion of venus/test to libtool was not done quite right.
Makefile.libtool and the .lo suffix are only needed when libtool
is being used to link *libraries*; just Makefile.pthread suffices
when libtool is being used to link executables.  As such, remove
the inclusion of Makefile.libtool, and change the .lo targets back
to regular .o ones, and add back *.o to the list of files to remove
in the 'clean' target (it was needed there even without the
other changes to that Makefile).

Change-Id: Ifbc3eee4ad2dce54df991301bc5edd11eb29a24a
Reviewed-on: http://gerrit.openafs.org/11532
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
2014-11-28 18:04:15 -05:00
Benjamin Kaduk
75d9e4b954 Fix memset invocation in rx/event-t.c
The order of the parameters was swapped, which recent gcc complains
loudly about.

Change-Id: I2329ca3dd0eee81639731e78172621b580199024
Reviewed-on: http://gerrit.openafs.org/11451
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
2014-09-10 11:10:51 -04:00
Michael Meffie
b88ff242df tests: posix signal constants in rx/perf test
Export the posix signal constants in the rx/perf perl test. Fixes a
perl syntax error on solaris.

Change-Id: Iaad361b8533787f9ad97fa00221e01e687f50723
Reviewed-on: http://gerrit.openafs.org/9836
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
2013-05-01 07:18:19 -07:00
Simon Wilkinson
f2fdd3040c rx: Make rxevent_Put NULL the event ptr being put
Change rxevent_Put so that it takes a pointer to the event being
put, and NULLs that pointer. This removes a lot of duplicate code
in callers, as well as making it harder to reuse a discarded event.

Change-Id: Ib7a51f01687e08ea3dced5932ec9ec27797a784a
Reviewed-on: http://gerrit.openafs.org/8540
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
2012-12-01 09:51:31 -08:00
Simon Wilkinson
20034a8157 rx: Don't treat calls specially in event package
Many different structures can be passed to the rxevent package as
data. Don't give calls special treatment by making rxevent aware of
how to release their reference counts when an event is cancelled.

Update all of the callers of rxevent_Cancel to use the new arguments,
and where they were cancelling functions with calls as parameters add
the appropriate CALL_RELE directives. In many cases, this has led to
new helper functions to cancel particular call-based events.

Change-Id: Ic02778e48fd950e8850b77bd3c076c235453274d
Reviewed-on: http://gerrit.openafs.org/8538
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>
2012-12-01 09:51:06 -08:00
Simon Wilkinson
564fe1e329 rx: Build libtool library
Build a pthreaded, libtool, version of librx.a called liboafs_rx.la.
librx.a remains for LWP applications to use. With this change, all RX
objects are built in both the LWP and pthread cases, so some #ifdef
guards are required to protect code that isn't relevant in a given
build.

Currently, all of our pthreaded objects use libafsrpc to get RX
functionality, so this change is fairly minimal outside of the RX
directory.

Change-Id: I8e629e2319fb1964058e70c3c0c3ed548b09b22d
Reviewed-on: http://gerrit.openafs.org/8058
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@your-file-system.com>
2012-09-08 20:28:32 -07:00
Simon Wilkinson
8b84d9538c opr: Convert to using libtool
Convert opr so that it uses libtool. For backwards compatibility we
still build libopr.a, but we do so as a static convenience library.

As libopr.a may, in the future, be converted to an LWP library, change
all of the pthreaded binaries so that they link against the libtool
library liboafs_opr.la

Change-Id: Icee04ff4745334f06ffba16df5bb07fc9dcc0b54
Reviewed-on: http://gerrit.openafs.org/8034
Reviewed-by: Derrick Brashear <shadow@dementix.org>
Tested-by: Derrick Brashear <shadow@dementix.org>
2012-09-04 13:23:14 -07:00
Simon Wilkinson
c877c0b419 tests: Start using the upstream C TAP harness
Instead of bundling our own copies of Russ's C TAP Harness, start using
source pulled from his git repository using the src/external import
mechanism. Note that we are not currently building the floating
point (is_double) portion of the harness.

In the process of doing so, we also upgrade our test harness to the latest
upstream version, 1.11. This is somewhat problematic, as there have been
some significant code changes since the version bundled with OpenAFS.
Work around these by
   *) Referencing the basic.h header as <tests/tap/basic.h>, rather than
      just <tap/basic.h>, to match the new upstream layout
   *) Changing the include path so that the tests/ directory can be
      found within it.

Change-Id: I63efbb30248165e5729005b0a791e7eb7afb051d
Reviewed-on: http://gerrit.openafs.org/7374
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Russ Allbery <rra@stanford.edu>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
2012-05-11 16:36:44 -07:00
Simon Wilkinson
3a1e129b76 tests: Add a RX functionality test
Use the rxperf performance testing tools to add a couple of simple
RX tests. The first moves 1Mbyte of data backwards and forwards 30
times. The second starts 30 threads, which each move 1MByte of data
once.

This is by no means an exhaustive test of RX, but the single and
multi-threaded invocations should provide a useful smoke test if
things get very broken.

Change-Id: I11267be067cf6c05a20aeb90a18ed4031502a1b1
Reviewed-on: http://gerrit.openafs.org/7244
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
2012-04-19 05:07:19 -07:00
Simon Wilkinson
4188865670 Update .gitignore files
Update (and create) .gitignore files across the tree

Change-Id: I7534e4f1eac44e6024f86591a171b63a64c6f320
Reviewed-on: http://gerrit.openafs.org/7241
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
2012-04-18 20:26:54 -07:00
Simon Wilkinson
02f470e99d tests: rx needs roken
On Linux, the RX library has a dependency on libroken for the rk_socket
function. Add this dependency to the RX tests.

Change-Id: I306e846524232bc136cd969ab1b8664d1c570e2d
Reviewed-on: http://gerrit.openafs.org/7127
Tested-by: Simon Wilkinson <simonxwilkinson@gmail.com>
Reviewed-by: Derrick Brashear <shadow@dementix.org>
2012-04-08 18:24:20 -07:00
Simon Wilkinson
043c31bf8d rx: Use a red black tree for the event stack
Instead of the current event stack, which uses a sorted linked
list, use a red/black tree to maintain the timer stack. This
dramatically improves event insertion times, at the expense of
some additional implementation complexity.

This change also adds reference counting to the rxevent
structure. We've always had a race between an event being
fired, and that event being simultaneously cancelled by
the user thread. Reference counting avoids that race resulting
in the structure appearing twice in the free list.

Change-Id: Icbef6e04e01f3eef5b888bc3cb77b7a3d1be26ae
Reviewed-on: http://gerrit.openafs.org/5841
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com>
Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>
2011-11-29 12:29:41 -08:00