openafs

mirror of https://git.openafs.org/openafs.git synced 2025-01-18 23:10:58 +00:00

Author	SHA1	Message	Date
Cheyenne Wills	bf37aec672	tests: fix potential divide by zero condition Running clang's static analysis revealed a possible divide by zero condition. There is a random chance of the divide by zero. - it has to be in the first pass of the main loop testing events (counter = 0) - 90% chance path : if (counter < (NUMEVENTS -1) && random() % 10 == 0) -- needs to be false - 25% chance path: if (random() % 4 == 0) -- needs to be true if the above conditions are met, the statement int victim = random() % counter is a divide by zero. Add a check to ensure the counter is greater than zero. Add a comment to document that only events prior to the current event are randomly selected. Change-Id: I4b4e73fa324842bb504bcc952079af15aea8a6a3 Reviewed-on: https://gerrit.openafs.org/14501 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Andrew Deason <adeason@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2021-01-22 19:36:36 -05:00
Andrew Deason	ebaefc5a06	tests: Give more leeway in rx/event-t Currently, the rx/event-t tests schedule a bunch of events up to 3 seconds in the future, and then we sleep for 3 seconds to give them a chance to run. Since we're cutting it so close, this can rarely result in a few events not being run (observed occasionally on FreeBSD 12.1, where we failed to run about 3 events out of 10000). To avoid this, just sleep for 4 seconds instead of 3. Also print out a little more info regarding the number of fired/cancelled events, so we can see the event count when it's wrong. Change-Id: I6269bea2c245aeed00c129ff638423d0fa81ad23 Reviewed-on: https://gerrit.openafs.org/14160 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2020-05-01 17:37:05 -04:00
Michael Meffie	50a3eb7b7e	tests: fix out of bounds access in the rx-event test Use the NUMEVENTS symbol which defines the array size instead of an incorrect hard coded number when checking if a second event can be added to be fired at the same time. This fixes a potential out of bounds access of the event test array. Also update the comment which incorrectly mentions the incorrect number of events in the test. Change-Id: I4f993b42e53e7e6a42fa31302fd1baa70e9f5041 Reviewed-on: https://gerrit.openafs.org/12762 Reviewed-by: Benjamin Kaduk <kaduk@mit.edu> Tested-by: BuildBot <buildbot@rampaginggeek.com>	2017-11-22 20:50:47 -05:00
Benjamin Kaduk	304d758983	Standardize rx_event usage Go over all consumers of the rx event framework and normalize its usage according to the following principles: rxevent_Post() is used to create an event, and it returns an event handle (with a reference on the event structure) that can be used to cancel the event before its timeout fires. (There is also an additional reference on the event held by the global event tree.) In all() usage within the tree, that event handle is stored within either an rx_connection or an rx_call. Reads/writes to the member variable that holds the event handle require either the conn_data_lock or call lock, respectively -- that means that in most cases, callers of rxevent_Post() and rxevent_Cancel() will be holding one of those aforementioned locks. The event handlers themselves will need to modify the call/connection object according to the nature of the event, which requires holding those same locks, and also a guarantee that the call/connection is still a live object and has not been deallocated! Whether or not rxevent_Cancel() succeeds in cancelling the event before it fires, whenever passed a non-NULL event structure it will NULL out the supplied pointer and drop a reference on the event structure. This is the correct behavior, since the caller has asked to cancel the event and has no further use for the event handle or its reference on the event structure. The caller of rxevent_Cancel() must check its return value to know whether or not the event was cancelled before its handler was able to run. The interaction window between the call/connection lock and the lock protecting the red/black tree of pending events opens up a somewhat problematic race window. Because the application thread is expected to hold the call/connection lock around rxevent_Cancel() (to protect the write to the field in the call/connection structure that holds an event handle), and rxevent_Cancel() must take the lock protecting the red/black tree of events, this establishes a lock order with the call/connection lock taken before the eventTree lock. This is in conflict with the event handler thread, which must take the eventTree lock first, in order to select an event to run (and thus know what additional lock would need to be taken, by virtue of what handler function is to be run). The conflict is easy to resolve in the standard way, by having a local pointer to the event that is obtained while the event is removed from the red/black tree under the eventTree lock, and then the eventTree lock can be dropped and the event run based on the local variable referring to it. The race window occurs when the caller of rxevent_Cancel() holds the call/connection lock, and rxevent_Cancel() obtains the eventTree lock just after the event handler thread drops it in order to run the event. The event handler function begins to execute, and immediately blocks trying to obtain the call/connection lock. Now that rxevent_Cancel() has the eventTree lock it can proceed to search the tree, fail to find the indicated event in the tree, clear out the event pointer from the call/connection data structure, drop its caller's reference to the event structure, and return failure (the event was not cancelled). Only then does the caller of rxevent_Cancel() drop the call/connection lock and allow the event handler to make progress. This race is not necessarily problematic if appropriate care is taken, but in the previous code such was not the case. In particular, it is a common idiom for the firing event to call rxevent_Put() on itself, to release the handle stored in the call/connection that could have been used to cancel the event before it fired. Failing to do so would result in a memory leak of event structures; however, rxevent_Put() does not check for a NULL argument, so a segfault (NULL dereference) was observed in the test suite when the race occurred and the event handler tried to rxevent_Put() the reference that had already been released by the unsuccessful rxevent_Cancel() call. Upon inspection, many (but not all) of the uses in rx.c were susceptible to a similar race condition and crash. The test suite also papers over a related issue in that the event handler in the test suite always knows that the data structure containing the event handle will remain live, since it is a global array that is allocated for the entire scope of the test. In rx.c, events are associated with calls and connections that have a finite lifetime, so we need to take care to ensure that the call/connection pointer stored in the event remains valid for the duration of the event's lifecycle. In particular, even an attempt to take the call/connection lock to check whether the corresponding event field is NULL is fraught with risk, as it could crash if the lock (and containing call/connection) has already been destroyed! There are several potential ways to ensure the liveness of the associated call/connection while the event handler runs, most notably to take care in the call/connection destruction path to ensure that all associated events are either successfully cancelled or run to completion before tearing down the call/connection structure, and to give the pending event its own reference on the associated call/connection. Here, we opt for the latter, acknowledging that this may result in the event handler thread doing the full call/connection teardown and delay the firing of subsequent events. This is deemed acceptable, as pending events are for intentionally delayed tasks, and some extra delay is probably acceptable. (The various keepalive events and the challenge event could delay the user experience and/or security properties if significantly delayed, but I do not believe that this change admits completely unbounded delay in the event handler thread, so the practical risk seems minimal.) Accordingly, this commit attempts to ensure that: Each event holds a formal reference on its associated call/connection. * The appropriate lock is held for all accesses to event pointers in call/connection structures. * Each event handler (after taking the appropriate lock) checks whether it raced with rxevent_Cancel() and only drops the call/connection's reference to the event if the race did not occur. * Each event handler drops its reference to the associated call/connection after doing any actions that might access/modify the call/connection. * The per-event reference on the associated call/connection is dropped by the thread that removes the event from the red/black tree. That is, the event handler function if the event runs, or by the caller of rxevent_Cancel() when the cancellation succeed. * No non-NULL event handles remain in a call/connection being destroyed, which would indicate a refcounting error. (*) There is an additional event used in practice, to reap old connections, but it is effectively a background task that reschedules itself periodically, with no handle to the event retained so as to be able to cancel it. As such, it is unaffected by the concerns raised here. While here, standardize on the rx_GetConnection() function for incrementing the reference count on a connection object, instead of inlining the corresponding mutex lock/unlock and variable access. Also enable refcount checking unconditionally on unix, as this is a rather invasive change late in the 1.8.0 release process and we want to get as much sanity checking coverage as possible. Change-Id: I27bcb932ec200ff20364fb1b83ea811221f9871c Reviewed-on: https://gerrit.openafs.org/12756 Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-11-22 20:17:40 -05:00
Benjamin Kaduk	bdb509fb1d	Adjust rx-event test to exercise cancel/fire race We currently do not properly handle the case where a thread runs rxevent_Cancel() in parallel with the event-handler thread attempting to fire that event, but the test suite only picked up on this issue in a handful of the Debian automated builds (somewhat less-resourced ones, perhaps). Modify the event scheduling algorithm in the test so as to create a larger chunk of events scheduled to fire "right away" and thereby exercise the race condition more often when we proceed to cancel a quarter of events "right away". Change-Id: I50f55fd532901147cfda1a5f40ef949bf3270401 Reviewed-on: https://gerrit.openafs.org/12755 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Mark Vitale <mvitale@sinenomine.net> Reviewed-by: Michael Meffie <mmeffie@sinenomine.net> Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>	2017-11-08 11:25:23 -05:00
Benjamin Kaduk	75d9e4b954	Fix memset invocation in rx/event-t.c The order of the parameters was swapped, which recent gcc complains loudly about. Change-Id: I2329ca3dd0eee81639731e78172621b580199024 Reviewed-on: http://gerrit.openafs.org/11451 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>	2014-09-10 11:10:51 -04:00
Simon Wilkinson	f2fdd3040c	rx: Make rxevent_Put NULL the event ptr being put Change rxevent_Put so that it takes a pointer to the event being put, and NULLs that pointer. This removes a lot of duplicate code in callers, as well as making it harder to reuse a discarded event. Change-Id: Ib7a51f01687e08ea3dced5932ec9ec27797a784a Reviewed-on: http://gerrit.openafs.org/8540 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>	2012-12-01 09:51:31 -08:00
Simon Wilkinson	20034a8157	rx: Don't treat calls specially in event package Many different structures can be passed to the rxevent package as data. Don't give calls special treatment by making rxevent aware of how to release their reference counts when an event is cancelled. Update all of the callers of rxevent_Cancel to use the new arguments, and where they were cancelling functions with calls as parameters add the appropriate CALL_RELE directives. In many cases, this has led to new helper functions to cancel particular call-based events. Change-Id: Ic02778e48fd950e8850b77bd3c076c235453274d Reviewed-on: http://gerrit.openafs.org/8538 Reviewed-by: Derrick Brashear <shadow@your-file-system.com> Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>	2012-12-01 09:51:06 -08:00
Simon Wilkinson	c877c0b419	tests: Start using the upstream C TAP harness Instead of bundling our own copies of Russ's C TAP Harness, start using source pulled from his git repository using the src/external import mechanism. Note that we are not currently building the floating point (is_double) portion of the harness. In the process of doing so, we also upgrade our test harness to the latest upstream version, 1.11. This is somewhat problematic, as there have been some significant code changes since the version bundled with OpenAFS. Work around these by ) Referencing the basic.h header as <tests/tap/basic.h>, rather than just <tap/basic.h>, to match the new upstream layout ) Changing the include path so that the tests/ directory can be found within it. Change-Id: I63efbb30248165e5729005b0a791e7eb7afb051d Reviewed-on: http://gerrit.openafs.org/7374 Tested-by: BuildBot <buildbot@rampaginggeek.com> Reviewed-by: Russ Allbery <rra@stanford.edu> Reviewed-by: Derrick Brashear <shadow@dementix.org>	2012-05-11 16:36:44 -07:00
Simon Wilkinson	043c31bf8d	rx: Use a red black tree for the event stack Instead of the current event stack, which uses a sorted linked list, use a red/black tree to maintain the timer stack. This dramatically improves event insertion times, at the expense of some additional implementation complexity. This change also adds reference counting to the rxevent structure. We've always had a race between an event being fired, and that event being simultaneously cancelled by the user thread. Reference counting avoids that race resulting in the structure appearing twice in the free list. Change-Id: Icbef6e04e01f3eef5b888bc3cb77b7a3d1be26ae Reviewed-on: http://gerrit.openafs.org/5841 Tested-by: BuildBot <buildbot@rampaginggeek.com> Tested-by: Jeffrey Altman <jaltman@secure-endpoints.com> Reviewed-by: Jeffrey Altman <jaltman@secure-endpoints.com>	2011-11-29 12:29:41 -08:00

10 Commits