Following arm64 and risc-v, move the definitions that describe the
hardware-enforced layout of PTEs and #PF error bits into a dedicated
header.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D47749
Correct swap_pager_seek_data so that, when the first lookup finds no
valid pages, second and subsequent lookups are attempted anyway.
This was broken by db08b0b04d.
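The control flow being restored can be modelled in userland C. This is
only a sketch under stated assumptions: lookup_batch(), BATCH and the
valid[] array are hypothetical stand-ins, not the swap pager's actual
interfaces. The point is that an empty first lookup must not end the
search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	BATCH	4	/* hypothetical number of pages examined per lookup */

/*
 * Hypothetical batched lookup: scan up to BATCH entries starting at
 * 'start' and return the index of the first valid page, or -1 if this
 * batch contains none.
 */
long
lookup_batch(const bool *valid, size_t npages, size_t start)
{
	assert(valid != NULL);
	for (size_t i = start; i < npages && i < start + BATCH; i++) {
		if (valid[i])
			return ((long)i);
	}
	return (-1);
}

/*
 * Return the index of the first valid page at or after 'start', or
 * npages if there is none.  An empty batch does not terminate the
 * search; lookups continue until the range is exhausted.
 */
size_t
seek_data(const bool *valid, size_t npages, size_t start)
{
	for (size_t pos = start; pos < npages; pos += BATCH) {
		long hit = lookup_batch(valid, npages, pos);

		if (hit >= 0)
			return ((size_t)hit);
	}
	return (npages);
}
```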
Reported by: marklmi@yahoo.com
Reviewed by: kib
Tested by: marklmi@yahoo.com
Fixes: db08b0b04d tmpfs_vnops: move swap work to swap_pager
Differential Revision: https://reviews.freebsd.org/D47767
The KASSERT in chn_sleep() can be triggered if more than one thread
wants to sleep on a given channel at the same time. While this is not
really a common scenario, it can be triggered by tools such as stress2,
which use fork() so that the child process(es) inherit the parent's
FDs.
Fix this by removing CHN_F_SLEEPING altogether, which is not very useful
in the first place:
- CHN_BROADCAST() checks cv_waiters already, so there is no need to
check CHN_F_SLEEPING as well.
- We can check whether cv_waiters is 0 in pcm_killchans(), instead of
whether CHN_F_SLEEPING is not set.
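Why a waiter count is enough where a single flag is not can be shown
with a small userland model. The struct and function names below are
hypothetical and only stand in for the channel and the waiter count of
its condition variable; this is not the sound(4) code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a channel and the waiter count of its cv. */
struct chan {
	int	waiters;
};

/* A thread is about to sleep on the channel. */
void
chan_sleep_enter(struct chan *c)
{
	c->waiters++;
}

/* A thread has woken up and left the sleep. */
void
chan_sleep_exit(struct chan *c)
{
	assert(c->waiters > 0);
	c->waiters--;
}

/* Teardown may only proceed once nobody sleeps on the channel. */
bool
chan_has_sleepers(const struct chan *c)
{
	return (c->waiters != 0);
}
```

With a boolean flag, the first thread to wake up would clear it while a
second thread is still asleep; the count stays non-zero until the last
sleeper has left.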
Reported by: dougm, pho (stress2)
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch, markj
Differential Revision: https://reviews.freebsd.org/D47559
Since SD_F_REGISTERED is cleared at the same time SD_F_DETACHING and
SD_F_DYING are set, and since PCM_DETACHING() is always used in
conjunction with PCM_REGISTERED()/DSP_REGISTERED(), it is enough to just
check SD_F_REGISTERED.
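The implication can be checked with a small model. The bit values here
are hypothetical; only the relationship between the flags, as described
above, matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; only the flag relationships matter. */
#define	SD_F_REGISTERED	0x01u
#define	SD_F_DETACHING	0x02u
#define	SD_F_DYING	0x04u

/* Model of detach: all three flags change together in one step. */
uint32_t
begin_detach(uint32_t flags)
{
	assert((flags & SD_F_REGISTERED) != 0);
	flags &= ~SD_F_REGISTERED;
	flags |= SD_F_DETACHING | SD_F_DYING;
	return (flags);
}

/* Old-style check: registered and not detaching. */
bool
usable_old(uint32_t flags)
{
	return ((flags & SD_F_REGISTERED) != 0 &&
	    (flags & SD_F_DETACHING) == 0);
}

/* Simplified check: registered already implies not detaching. */
bool
usable_new(uint32_t flags)
{
	return ((flags & SD_F_REGISTERED) != 0);
}
```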
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch, markj
Differential Revision: https://reviews.freebsd.org/D47463
This patch fixes multiple different panic scenarios occurring during
hot-unload:
1. The channel is unlocked in chn_read()/chn_write() for uiomove(9) and
in the meantime we enter pcm_killchans() and free it. By the time we
have returned from userland and try to lock it back, the channel will
have been freed.
2. The parent channel has been freed in pcm_killchans(), but at the same
time, some yet-unstopped vchan's chn_read()/chn_write() calls
chn_start(), which eventually calls vchan_trigger(), which references
the freed parent.
3. PCM_WAIT() panics because it references a freed PCM lock.
For scenarios 1 and 2, refactor pcm_killchans() to first make sure all
channels have been stopped, and then proceed to free them one by one, as
opposed to freeing the first free channel until all channels have been
freed. This change makes the code more robust, but might introduce some
performance overhead when many channels are allocated, since we
continuously loop through the channel list until all of them are
stopped, and then we loop one last time to free them.
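The two-phase shape described above can be sketched as follows. This is
a userland model with hypothetical names (struct chan, all_stopped(),
free_all()), not the actual pcm_killchans() implementation, and it
leaves out locking and sleeping entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct chan {
	bool		 running;	/* still inside a read/write call */
	struct chan	*next;
};

/* Test helper: allocate a channel and push it onto the list. */
struct chan *
chan_add(struct chan **headp, bool running)
{
	struct chan *c = malloc(sizeof(*c));

	assert(c != NULL);
	c->running = running;
	c->next = *headp;
	*headp = c;
	return (c);
}

/* Phase 1 predicate: true only when no channel on the list is running. */
bool
all_stopped(const struct chan *head)
{
	for (const struct chan *c = head; c != NULL; c = c->next) {
		if (c->running)
			return (false);
	}
	return (true);
}

/* Phase 2: free every channel; returns the number of channels freed. */
size_t
free_all(struct chan **headp)
{
	size_t n = 0;

	assert(all_stopped(*headp));	/* never free a running channel */
	while (*headp != NULL) {
		struct chan *c = *headp;

		*headp = c->next;
		free(c);
		n++;
	}
	return (n);
}
```

A caller would repeat the phase 1 check (sleeping between attempts)
until it succeeds, and only then run phase 2.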
For scenario 3, restructure the code so that we can use destroy_dev(9)
instead of destroy_dev_sched(9) in dsp_destroy_dev(). Because
destroy_dev(9) blocks until all references to the device have gone away,
we ensure that the PCM cv and lock will be freed safely.
While here, move the delete_unrhdr(9) calls to pcm_killchans() and
re-order some lines.
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch
Differential Revision: https://reviews.freebsd.org/D47462
Consider the following scenario:
1. CHN currently has its trigger set to PCMTRIG_STOP.
2. Thread A locks CHN, calls CHANNEL_TRIGGER(PCMTRIG_START), sets the
trigger to PCMTRIG_START and unlocks.
3. Thread B picks up the lock, calls CHANNEL_TRIGGER(PCMTRIG_ABORT) and
returns a non-zero value, so it returns from chn_trigger() as well.
4. Thread A picks up the lock and adds CHN to the list, which is
_wrong_, because the last call to CHANNEL_TRIGGER() was with
PCMTRIG_ABORT, meaning the channel is stopped, yet we are adding it
to the list and marking it as started.
Another problematic scenario:
1. Thread A locks CHN, sets the trigger to PCMTRIG_ABORT, and unlocks
CHN. It then locks PCM and _removes_ CHN from the list.
2. In the meantime, since thread A unlocked CHN, thread B has locked it,
set the trigger to PCMTRIG_START, unlocked it, and is now blocking on
PCM held by thread A.
3. At the same time, thread C locks CHN, sets the trigger back to
PCMTRIG_ABORT, unlocks CHN, and is also blocking on PCM. However,
once thread A unlocks PCM, because thread C is higher-priority than
thread B, it picks up the PCM lock instead of thread B, and because
CHN is already removed from the list, and thread B hasn't added it
back yet, we take a page fault in CHN_REMOVE() by trying to remove a
non-existent element.
To fix the former scenario, set the channel trigger before the call to
CHANNEL_TRIGGER() (could also come after, doesn't really matter) and
check if anything changed once we lock CHN back.
To fix the latter scenario, use the SAFE variants of CHN_INSERT_HEAD()
and CHN_REMOVE(). A similar scenario can occur in vchan_trigger(), so do
the trigger setting after we've locked the parent channel.
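Both fixes can be illustrated with a single-threaded model in which a
hook plays the role of another thread running while the channel lock is
dropped. All names below are hypothetical; this is a sketch of the
re-check and of idempotent list operations, not the real chn_trigger().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum trig { TRIG_STOP, TRIG_START, TRIG_ABORT };	/* hypothetical */

struct chan {
	enum trig	trigger;
	bool		on_list;
	/* Stands in for another thread running while CHN is unlocked. */
	void		(*unlocked_hook)(struct chan *);
};

/* Idempotent list operations, in the spirit of the SAFE variants. */
void
list_insert_safe(struct chan *c)
{
	if (!c->on_list)
		c->on_list = true;
}

void
list_remove_safe(struct chan *c)
{
	if (c->on_list)
		c->on_list = false;
}

int
chan_trigger(struct chan *c, enum trig go)
{
	assert(c != NULL);
	c->trigger = go;		/* record the request first */
	if (c->unlocked_hook != NULL)	/* lock dropped: others may run */
		c->unlocked_hook(c);
	if (c->trigger != go)		/* lock retaken: did we lose a race? */
		return (-1);
	if (go == TRIG_START)
		list_insert_safe(c);
	else
		list_remove_safe(c);
	return (0);
}

/* Example racing thread: aborts the channel while it is unlocked. */
void
racing_abort(struct chan *c)
{
	c->trigger = TRIG_ABORT;
}
```

When the hook aborts the channel in the unlocked window, the starting
thread notices the change and leaves the list alone; a repeated abort is
harmless because removal is a no-op for a channel that is not listed.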
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch
Differential Revision: https://reviews.freebsd.org/D47461
Use callout_init_mtx(9) to associate the callback with the driver's
lock. Also make sure the callout is stopped properly during detach.
While here, introduce a dummy_active() function to know when it's
appropriate to stop or not reschedule the callout.
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch, markj
Differential Revision: https://reviews.freebsd.org/D47459
If a fragmented IPv6 packet hits a route-to rule we have to first prevent
the pf_test(PF_OUT) check in pf_route6() from refragmenting (and calling
ip6_output()/ip6_forward()). We then have to refragment in pf_route6() and
transmit the packets on the route-to interface.
Split pf_refragment6() into two parts, the first to perform the refragmentation,
the second to call ip6_output()/ip6_forward() and call the former from
pf_route6().
Add a test case for route-to-ing fragmented IPv6 packets to verify this works
as expected.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47684
D37419 corrupts VFP context store on signal delivery and D38696 corrupts PCB
because it performs a binary copy between structures with different layouts.
Revert the problematic parts of these commits to get signal delivery
working. Unfortunately, there are more problems with these revisions and
more fixes need to be developed.
Fixes: 6926e2699a
Fixes: 4d2427f2c4
MFC after: 4 weeks
The current quirk is designed to discard duplicated data read from
the chip. The problem is that it also discards real events when they
happen to be identical, which is the case with scroll wheel events:
unlike X/Y, they always move by a fixed offset. This results in
two-finger scrolling that stops mid-way, which could be worked around
by manually setting dev.hms.0.drift_thresh to 0.
To fix that, don't discard duplicates when there's wheel movement.
For users with an actual duplicates problem, this will result in
scrolling suddenly becoming quite inertial, but it will stop at any
touch, so it shouldn't be terrible.
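The rule can be sketched roughly as below. The report layout and the
function name are hypothetical and much simpler than what the driver
handles; the only point is that wheel movement exempts a report from
duplicate filtering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mouse report. */
struct report {
	int8_t	dx;
	int8_t	dy;
	int8_t	wheel;
	uint8_t	buttons;
};

/*
 * Return true if 'cur' should be dropped as a duplicate of 'prev'.
 * Reports carrying wheel movement are never dropped, since consecutive
 * scroll events are legitimately identical.
 */
bool
should_discard(const struct report *prev, const struct report *cur)
{
	assert(prev != NULL && cur != NULL);
	if (cur->wheel != 0)
		return (false);
	return (prev->dx == cur->dx && prev->dy == cur->dy &&
	    prev->buttons == cur->buttons);
}
```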
PR: kern/276709
Reviewed By: wulf
Differential Revision: https://reviews.freebsd.org/D47640
It's possible to take a signal after pselect/ppoll have set their return
value, but before we actually return to userland. This results in
taking a signal without reflecting it in the return value, which weakens
the guarantees provided by these functions.
Switch both to restore the signal mask before we would deliver signals
on return to userland. If a signal was received after the wait was
over, then we'll just have the signal queued up for the next time it
comes unblocked. The modified signal mask is retained if we were
interrupted so that ast() actually handles the signal, at which point
the signal mask is restored.
des@ has a test case demonstrating the issue in D47738 which will
follow.
Note for MFC: TDA_PSELECT is a KBI break, so we should just inline
ast_sigsuspend() in pselect/ppoll for stable branches. It's not exactly
the same, but it will be close enough.
Reported by: des
Reviewed by: des (earlier version), kib
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47741
It may be the case that we want to avoid delivering signals that are
normally blocked by the thread's signal mask, in which case the syscall
should schedule this one instead to restore the mask prior to delivery.
This will be used by pselect/ppoll to avoid delivering signals that were
supposed to be blocked after the timeout has elapsed. The name was
chosen as this is the expected behavior of pselect/ppoll, while late
restoration of the mask is exceptional behavior for these specific
calls.
Bump __FreeBSD_version as later TDA_* values have changed; third-party
modules that may be using MOD3/MOD4 need to be rebuilt.
Reviewed by: kib
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47741
Segment base registers are at 8-byte intervals, while the register
write helper takes a byte-aligned offset. This fixes
DEV_TAB_HARDWARE_ERROR events and associated peripheral I/O failures
on an Epyc-based system with 8-segment device tables.
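The offset arithmetic amounts to scaling the segment index by the
register stride. In the sketch below, SEG_BASE_REG0 is a hypothetical
placeholder for the offset of the first segment base register, and NSEG
reflects the 8-segment configuration mentioned above; neither is taken
from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define	SEG_BASE_REG0	0x100u	/* hypothetical offset of segment 0 */
#define	SEG_REG_STRIDE	8u	/* registers sit at 8-byte intervals */
#define	NSEG		8u	/* 8-segment device table */

/* Byte offset of the base register for segment 'seg'. */
uint32_t
seg_base_offset(unsigned int seg)
{
	assert(seg < NSEG);
	return (SEG_BASE_REG0 + seg * SEG_REG_STRIDE);
}
```

Using the bare segment index as the offset would place segment 1 only
one byte after segment 0, inside the same 8-byte register.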
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D47752
The livedumper triggers reports from both of these sanitizers since it
necessarily accesses uninitialized or freed memory. Add a flag to
silence reports from both sanitizers.
Reviewed by: mhorne, khng
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47714
The T-HEAD custom PTE bits are defined in such a way that the
default/normal memory type is a non-zero value. This _unthoughtful_ choice
means that, unlike the Svpbmt and non-Svpbmt cases, this field cannot be
left bare in our bootstrap PTEs, or the hardware will fail to proceed
far enough in boot (cache strangeness). On the other hand, we cannot
unconditionally apply the PTE_THEAD_MA_NONE attributes, as this is not
compatible with spec-compliant RISC-V hardware, and will result in a
fatal exception.
Therefore, in order to handle this erratum, we are forced to perform a
check of the CPU type at the first moment possible. Do so, and fix up
the PTEs with the correct memory attribute bits in the T-HEAD case.
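In C terms the fix-up amounts to something like the following sketch.
The real work happens in early boot code; the macro names and the
attribute encoding here are illustrative placeholders, not the values
used by the kernel or the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	PTE_V		0x1ULL			/* valid bit */
#define	MA_MASK		(0x1fULL << 59)		/* PTE bits [63:59] */
/* Illustrative non-zero "normal memory" encoding, NOT the real one. */
#define	MA_NORMAL_DEMO	(0x0eULL << 59)

/*
 * Apply the vendor-specific memory attribute to every valid bootstrap
 * PTE, but only when running on the affected CPU.  On spec-compliant
 * hardware the entries are left untouched.
 */
void
fixup_bootstrap_ptes(uint64_t *ptes, size_t n, bool is_thead)
{
	assert(ptes != NULL);
	if (!is_thead)
		return;
	for (size_t i = 0; i < n; i++) {
		if ((ptes[i] & PTE_V) != 0)
			ptes[i] = (ptes[i] & ~MA_MASK) | MA_NORMAL_DEMO;
	}
}
```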
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47458
Switch the boot argument registers to the unused s3 and s4. This ensures
the values will not be clobbered by SBI or function calls; they are
consumed late in the assembly routine.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47457
T-HEAD CPUs provide a spec-violating implementation of page-based memory
types, using PTE bits [63:59]. Add basic support for this "errata",
referred to in some places as an "extension".
Note that this change is not enough on its own; a workaround is
needed for the bootstrap (locore) page tables as well.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45472
This is the first major quirk we need to support in order to run on
current T-HEAD/XuanTie CPUs, e.g. the C906 or C910, found in several
existing RISC-V SBCs. With these custom dcache routines installed,
busdma can reliably communicate with devices which are not coherent
w.r.t. the CPU's data caches.
This patch introduces the first quirk/errata handling functions to
identcpu.c, and thus is forced to make some decisions about how this
code is structured. It will be amended with the changes that follow in
the series, yet I feel the final result is (unavoidably) somewhat
clumsy. I expect the CPU identification code will continue to evolve as
more CPUs and their quirks are eventually supported.
Discussed with: jrtc27
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47455
Cache management operations were, for a long time, unspecified by the
RISC-V ISA, and thus these functions have been no-ops. To cope, hardware
with non-coherent I/O has implemented custom cache flush mechanisms,
either in the form of custom instructions or special device registers.
Additionally, the RISC-V CMO extension is ratified and these official
instructions will start to show up in hardware eventually. Therefore, a
method is needed to select the dcache management routines at runtime.
Add a simple set of function hooks, as well as a routine to install them
and specify the minimum dcache line size. The first consumer will be the
non-standard cache management instructions for T-HEAD CPUs.
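A minimal sketch of such a hook table is shown below. All names are
hypothetical and do not reflect the kernel's actual interface; demo_op()
exists only so that the effect of installing hooks can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*cache_op_t)(uintptr_t va, size_t len);

struct dcache_ops {
	cache_op_t	wb;	/* write back */
	cache_op_t	inv;	/* invalidate */
	cache_op_t	wbinv;	/* write back and invalidate */
};

/* Default: no-ops, matching hardware with coherent I/O. */
static void
cache_nop(uintptr_t va, size_t len)
{
	(void)va;
	(void)len;
}

static struct dcache_ops dcache_ops = { cache_nop, cache_nop, cache_nop };
static size_t dcache_line_size;

/* Install CPU-specific routines and record the dcache line size. */
void
dcache_install_hooks(const struct dcache_ops *ops, size_t line_size)
{
	assert(ops != NULL && line_size != 0);
	dcache_ops = *ops;
	dcache_line_size = line_size;
}

/* Callers always go through the hook table. */
void
dcache_wb_range(uintptr_t va, size_t len)
{
	dcache_ops.wb(va, len);
}

/* Demo hook that merely counts invocations. */
unsigned long demo_calls;

void
demo_op(uintptr_t va, size_t len)
{
	(void)va;
	(void)len;
	demo_calls++;
}
```

Selecting the routines through function pointers at runtime lets one
kernel binary serve both coherent and non-coherent hardware.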
The unused I-cache variables and macros are removed.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47454
Properly initialize setdf variable in ipsec_encap().
It is used for AF_INET6 case when IPv6 datagram is going to be
encapsulated into IPv4 datagram.
PR: 282535
Fixes: 4046178557
MFC after: 1 week
instead of constructing transient pte itself. This pre-sets the PG_A and
PG_M bits, avoiding an atomic pte update on access and modification. It
also sets the nx bit, since the mapping is not supposed to be used for
executing.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D47717
Pass the to-be-freed page to vm_page_iter_free as a parameter, rather
than computing it from the iterator parameter, to improve performance.
Sort declarations of page_iter functions in vm_page.h.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D47727
Without this patch, an all upper case user domain name
(as specified by nfsuserd(8)) would not work.
I believe this was done so that Kerberos realms were
not confused with user domains.
Now, RFC8881 specifies that the user domain name is a
DNS name. As such, all upper case names should work.
This patch fixes this case so that it works. The custom
comparison function is no longer needed.
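Since DNS names compare case-insensitively, the check reduces to a
plain case-insensitive string comparison. The sketch below assumes
strcasecmp(3) as that comparison; the text above does not name the
function used in place of the custom one, and same_domain() is a
hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Return true if two user domain names match, ignoring case. */
bool
same_domain(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return (strcasecmp(a, b) == 0);
}
```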
PR: 282620
Tested by: jmmv
MFC after: 2 weeks
Notable upstream pull request merges:
#16643 -multiple Change rangelock handling in FreeBSD's zfs_getpages()
#16697 46c4f2ce0 dsl_dataset: put IO-inducing frees on the pool deadlist
#16740 -multiple BRT: Rework structures and locks to be per-vdev
#16743 a60ed3822 L2ARC: Move different stats updates earlier
#16758 8dc452d90 Fix some nits in zfs_getpages()
#16759 534688948 Remove hash_elements_max accounting from DBUF and ARC
#16766 9a81484e3 ZAP: Reduce leaf array and free chunks fragmentation
#16773 457f8b76e BRT: More optimizations after per-vdev splitting
#16782 0ca82c568 L2ARC: Stop rebuild before setting spa_final_txg
#16785 d76d79fd2 zio: Avoid sleeping in the I/O path
#16791 ae1d11882 BRT: Clear bv_entcount_dirty on destroy
#16796 b3b0ce64d FreeBSD: Lock vnode in zfs_ioctl()
#16797 d0a91b9f8 FreeBSD: Reduce copy_file_range() source lock to shared
Obtained from: OpenZFS
OpenZFS commit: d0a91b9f88
This is a retread of https://reviews.freebsd.org/D34449 which I think
will fix the issue for the remote side not supporting autoneg. We now
attempt an autoneg, and if that fails fall back to the current code
that forces the link speed/duplex.
The original intent of this patch is to inform the remote switch of
duplex settings when we (the client) are specifying a fixed 10 or 100
speed. Otherwise it may get the duplex setting wrong.
The tricky case is when the remote (switch) side is fixing its
speed AND duplex while disabling autoneg and we (client) need to do
the same, which still seems to be common enough at some ISPs.
Original commit message follows:
Currently if an e1000 interface is set to a fixed media configuration,
for gigabit, it will participate in auto-negotiation as required by
IEEE 802.3-2018 Clause 37. However, if set to fixed media configuration
for 100 or 10, it does NOT participate in auto-negotiation.
By my reading of Clauses 28 and 37, while auto-negotiation is optional
for 100 and 10, it is not prohibited and is, in fact, "highly
recommended".
This patch enables auto-negotiation for fixed 100 and 10 media
configuration, in a similar manner to that already performed for 1000.
I.e., the patch enables advertising of just the manually configured
settings with the goal of allowing the remote end to match the manually
configured settings if it has them available.
To be clear, this patch does NOT allow an em(4) interface that has been
manually configured with specific media settings to respond to
auto-negotiation by then configuring different parameters to those that
were manually configured. The intent of this patch is to fully comply
with the requirements of Clause 37, but for 100 and 10.
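The idea of advertising only the manually configured setting can be
sketched as below. The ADV_* bit values and the function name are
hypothetical, not the driver's or the PHY's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical advertisement bits. */
#define	ADV_10_HALF	0x01u
#define	ADV_10_FULL	0x02u
#define	ADV_100_HALF	0x04u
#define	ADV_100_FULL	0x08u

/*
 * For a fixed 10 or 100 media configuration, advertise exactly the one
 * configured speed/duplex pair so the link partner can match it.
 */
unsigned int
fixed_media_advertise(unsigned int speed, bool full_duplex)
{
	assert(speed == 10 || speed == 100);
	if (speed == 10)
		return (full_duplex ? ADV_10_FULL : ADV_10_HALF);
	return (full_duplex ? ADV_100_FULL : ADV_100_HALF);
}
```

Because exactly one bit is set, negotiation can only succeed with the
configured parameters; the interface never ends up with different ones.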
The need for this has arisen on an em(4) link where the other end is
under a different administrative control and is set to full
auto-negotiation. Due to the cable length GigE is not working well. It
is desired to set the em(4) end to "media 100baseTX mediatype
full-duplex" which does work when both ends are configured that way.
Currently, because em(4) does not participate in autoneg for this
setting, the remote defaults to half-duplex - i.e., there's a duplex
mismatch and things don't work. With this patch, em(4) would inform the
remote that it has only 100baseTX full, the remote would match that and
it will work.
Tested by: Natalino Picone <natalino.picone@nozominetworks.com>
Tested by: Franco Fichtner <franco@opnsense.org>
Tested by: J.R. Oldroyd <fbsd@opal.com> (previous version)
Sponsored by: Nozomi Networks
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D47336
A device can be disabled via a hint after it is probed (but before it
is attached). The initial version of this marked the device disabled,
but left the device "alive" meaning that dev->driver and dev->desc
were untouched and still pointed into the driver that probed the
device. If that driver lives in a kernel module that is later
unloaded, device_detach() called from devclass_delete_driver() doesn't
do anything (the device's state is DS_ALIVE). In particular, it
doesn't call device_set_driver(dev, NULL) to disassociate the device
from the driver that is being unloaded.
There are several places where these stale pointers can be tripped
over. After kldunload, invoking the sysctl to fetch device info can
dereference dev->desc and dev->driver causing panics. Even without
kldunload, a system suspend request will call the device_suspend and
device_resume DEVMETHODs of the driver in question even though the
device is not attached which can cause some excitement.
To clean this up, more fully detach a device that is disabled by a
hint by clearing the driver and setting the state to DS_NOTPRESENT.
However, to keep the device name+unit combination reserved, leave the
device attached to its devclass.
This requires a change to 'devctl enable' handling to deal with this
updated state. It now checks for a non-NULL devclass to determine if
a disabled device is in this state and if so it clears the hint.
However, it also now clears the devclass before attaching the device.
This gives all drivers an opportunity to attach to the now-enabled
device.
Reported by: adrian
Discussed with: imp
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47691
On a laptop with no other console devices than the screen, things
scroll off the screen faster than the eye or a camera can capture them.
This tunable slows the console down and makes it update synchronously,
so console output continues when timers or interrupts do not.
Differential Revision: https://reviews.freebsd.org/D47710
Remove the array of port module status and instead save module status
and module number.
At boot, the driver for each PCI function gets an event from fw about
module status. The event contains the module number and module status,
and the driver stores both. When the user (ifconfig) asks for module
information, the driver for each PCI function first queries fw to get
the module number of the current PCI function. The driver then compares
that module number to the one it stored before, and if they match and
the module status is "plugged and enabled", the driver queries fw for
the EPROM information of that module number and returns it to the
caller.
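The decision described above can be sketched as follows. The enum
values, the struct and the function name are hypothetical, and the
firmware queries themselves are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module status values. */
enum module_status {
	MODULE_UNKNOWN,
	MODULE_UNPLUGGED,
	MODULE_PLUGGED_ENABLED
};

/* Per-PCI-function state saved from the boot-time fw event. */
struct module_state {
	int			module_num;
	enum module_status	status;
};

/*
 * Decide whether the EPROM of the module may be read: the module number
 * reported by fw for this PCI function must match the stored one, and
 * the stored status must be "plugged and enabled".
 */
bool
can_query_eprom(const struct module_state *st, int fw_module_num)
{
	assert(st != NULL);
	return (st->module_num == fw_module_num &&
	    st->status == MODULE_PLUGGED_ENABLED);
}
```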
In fact, fw could have determined the required module number of the
current PCI function itself, but fw is not implemented this way. The
current design of PRM/FW is that MCIA register handling is only aware
of modules, not the PCI function->module connections. FW is designed to
take the module number written to MCIA and write/read the content
to/from the associated module's EPROM.
So, based on the current FW design, we must supply the module number so
fw can find the corresponding I2C interface of the module to write/read.
Sponsored by: NVidia networking
MFC after: 1 week