ciss: Don't panic on null CR ciss_dequeue_notify

Apparently, sometimes on hot plug/unplug, a null cr comes back from
ciss_dequeue_notify. This is clearly a bug, and by ignoring it we're
papering over that bug. We only ever wake the thread after enqueing a
notification or setting a bit about killing the thread, so once we check
the bit isn't the cause, cr can't be NULL unless something else has
dequeued it.

Ideally, this would be fixed, rather than papered over, but this makes a
very old card somewhat more useable for external enclosures. I suspect
it's a race when we set CISS_THREAD_SHUT and another flag (the latter
w/o ciss_mtx held), but I don't see it and w/o hardware to reproduce
it would be hard to know for sure.

PR: 246279
Reviewed by: imp
Tested by: Marek Zarychta
Differential Revision: https://reviews.freebsd.org/D25155
This commit is contained in:
Peter Eriksson 2024-10-13 22:01:33 -06:00 committed by Warner Losh
parent fd95966af5
commit b339ab1491

View File

@ -4207,8 +4207,20 @@ ciss_notify_thread(void *arg)
cr = ciss_dequeue_notify(sc);
if (cr == NULL)
panic("cr null");
if (cr == NULL) {
/*
* We get a NULL message sometimes when unplugging/replugging
* stuff But this indicates a bug, since we only wake this thread
* when we (a) set the THREAD_SHUT flag, or (b) we have enqueued
* something. Since it's reported around errors, it may be a
* locking bug related to ciss_flags being modified in multiple
* threads some without ciss_mtx held. Or there's some other
* way we either fail to sleep or corrupt the ciss_flags.
*/
ciss_printf(sc, "Driver bug: NULL notify event received\n");
continue;
}
cn = (struct ciss_notify *)cr->cr_data;
switch (cn->class) {