* thread/condition: fix PthreadCondition compilation
* thread/condition: add wait, signal and broadcast
This is like std.Thread.Mutex, which forwards calls to `impl`; this avoids
having to spell out `cond.impl` at every call site (see the sketch below).
* thread/condition: initialize the implementation
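A minimal sketch of that forwarding pattern, with a dummy implementation type
standing in for the platform-specific one (such as the pthread-based condition);
this is illustrative only, not the actual std implementation:
```zig
const std = @import("std");

// Stand-in for the platform-specific implementation selected at comptime.
const DummyImpl = struct {
    pub fn wait(self: *DummyImpl, mutex: *std.Thread.Mutex) void {
        _ = self;
        _ = mutex;
    }
    pub fn signal(self: *DummyImpl) void {
        _ = self;
    }
    pub fn broadcast(self: *DummyImpl) void {
        _ = self;
    }
};

pub const Condition = struct {
    impl: DummyImpl = .{},

    // The public wrapper forwards each call, so callers never touch `impl`.
    pub fn wait(cond: *Condition, mutex: *std.Thread.Mutex) void {
        cond.impl.wait(mutex);
    }
    pub fn signal(cond: *Condition) void {
        cond.impl.signal();
    }
    pub fn broadcast(cond: *Condition) void {
        cond.impl.broadcast();
    }
};
```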
After a right shift, top limbs may be all zero. However, without
normalization, the number of limbs is not going to change.
In order to check if a big number is zero, we used to assume that the
number of limbs is 1, which may not be the case after right shifts,
even if the actual value is zero.
- Normalize after a right shift
- Add a test for that issue
- Check all the limbs in `eqlZero()`. This may not be necessary if
callers always remember to normalize before calling the function, but
checking all the limbs is very cheap and makes the function less
bug-prone (see the sketch below).
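A rough sketch of both fixes, written against a bare limbs slice rather than
the actual std.math.big types, so the names here are illustrative:
```zig
// Limbs are stored least-significant first.
fn normalizedLen(limbs: []const usize) usize {
    // Drop all-zero top limbs, e.g. after a right shift.
    var len = limbs.len;
    while (len > 1 and limbs[len - 1] == 0) len -= 1;
    return len;
}

fn eqlZero(limbs: []const usize) bool {
    // Check every limb, not just the first one; this is cheap and stays
    // correct even when a caller forgot to normalize.
    for (limbs) |limb| {
        if (limb != 0) return false;
    }
    return true;
}
```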
This is a proof-of-concept of switching to a new memory layout for
tokens and AST nodes. The goal is threefold:
* smaller memory footprint
* faster performance for tokenization and parsing
* most importantly, a proof-of-concept that can be also applied to ZIR
and TZIR to improve the entire compiler pipeline in this way.
I had a few key insights here:
* Underlying premise: using less memory will make things faster, because
of fewer allocations and better cache utilization. Also using less
memory is valuable in and of itself.
* Using a struct-of-arrays for tokens and AST nodes saves the bytes of
padding between the enum tag (which kind of token or AST node it is)
and the next fields in the struct. It also improves cache coherence,
since one can peek ahead in the tokens array without having to load
the source locations of tokens.
* Token memory can be conserved by storing only the tag (1 byte) and byte
offset (4 bytes), for a total of 5 bytes per token. It is not necessary
to store the token's ending byte offset because one can always re-tokenize
later. Besides, for most tokens the length can be trivially determined
from the tag alone, and for the ones where it can't (string literals,
for example), the string literal must be parsed again later in astgen
anyway, making it effectively free to re-tokenize.
* AST nodes do not actually need to store more than 1 token index because
one can poke left and right in the tokens array very cheaply.
So far we are left with one big problem though: how can we put AST nodes
into an array, since different AST nodes are different sizes?
This is where my key observation comes in: one can have a hash table for
the extra data for the less common AST nodes! But it gets even better than
that:
I defined this data that is always present for every AST Node:
* tag (1 byte)
- which AST node is it
* main_token (4 bytes, index into tokens array)
- the tag determines which token this points to
* struct{lhs: u32, rhs: u32}
- enough to store 2 indexes to other AST nodes; the tag determines
how to interpret this data
You can see how a binary operation, such as `a * b`, would fit into this
structure perfectly. A unary operation, such as `*a`, would also fit,
leaving `rhs` unused. So this is a total of 13 bytes per AST node.
And again, we don't have to pay for the padding to round up to 16 because
we store in struct-of-arrays format.
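A rough sketch of that per-node data (field and tag names are illustrative,
not necessarily those used in the actual parser); stored in struct-of-arrays
form, the 1-byte tag array lives apart from the 4-byte fields, so no
per-node padding is paid:
```zig
const Node = struct {
    tag: Tag, // 1 byte: which kind of AST node it is
    main_token: u32, // 4 bytes: index into the tokens array; meaning depends on tag
    data: Data, // 8 bytes: two indexes whose interpretation depends on tag

    const Tag = enum(u8) {
        mul, // binary: lhs and rhs are the operand sub-expressions
        negation, // unary: lhs is the operand, rhs is unused
        function_call_only_one_param, // lhs is the callee, rhs is the single argument
        function_call, // lhs is the callee, rhs indexes into extra_data (described further below)
    };

    const Data = struct {
        lhs: u32,
        rhs: u32,
    };
};
```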
I made a further observation: the only kind of data AST nodes need to
store other than the main_token is indexes to sub-expressions. That's it.
The only purpose of an AST is to bring a tree structure to a list of tokens.
This observation means that all the data nodes store is just sets of u32
indexes to other nodes. The other tokens can be found later by the compiler
by poking around in the tokens array, which again is super fast because it
is struct-of-arrays, so you often only need to look at the token tags array,
which is an array of bytes and very cache friendly.
So for nearly every kind of AST node, you can store it in 13 bytes. For the
rarer AST nodes that have 3 or more indexes to other nodes to store, either
the lhs or the rhs will be repurposed to be an index into an extra_data array
which contains the extra AST node indexes. In other words, no hash table needed,
it's just 1 big ArrayList with the extra data for AST Nodes.
Final observation: there is no need to have a canonical tag for a given AST construct. For example:
The expression `foo(bar)` is a function call. Function calls can have any
number of parameters. However in this example, we can encode the function
call into the AST with a tag called `FunctionCallOnlyOneParam`, and use lhs
for the function expr and rhs for the only parameter expr. Meanwhile if the
code was `foo(bar, baz)` then the AST node would have to be `FunctionCall`
with lhs still being the function expr, but rhs being the index into
`extra_data`. Then because the tag is `FunctionCall` it means
`extra_data[rhs]` is the "start" and `extra_data[rhs+1]` is the "end".
Now the range `extra_data[start..end]` describes the list of parameters
to the function.
Point being, you only have to pay for the extra bytes if the AST actually
requires it. There's no limit to the number of different AST tag encodings.
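Continuing the illustrative sketch from earlier, decoding the parameter list
for these two call encodings might look like this (again, names such as
`extra_data` follow the prose, not necessarily the final source):
```zig
/// Uses the `Node` type from the sketch above.
/// Returns the node indexes of the call's parameters; `buf` provides storage
/// for the single-parameter encoding.
fn callParams(node: Node, extra_data: []const u32, buf: *[1]u32) []const u32 {
    switch (node.tag) {
        .function_call_only_one_param => {
            buf[0] = node.data.rhs; // the one and only parameter expression
            return buf;
        },
        .function_call => {
            const start = extra_data[node.data.rhs];
            const end = extra_data[node.data.rhs + 1];
            return extra_data[start..end]; // node indexes of all parameters
        },
        else => unreachable,
    }
}
```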
Preliminary results:
* 15% improvement on cache-misses
* 28% improvement on total instructions executed
* 26% improvement on total CPU cycles
* 22% improvement on wall clock time
This completes 1 of the 4 items on the checklist before this can actually be merged:
* [x] parser
* [ ] render (zig fmt)
* [ ] astgen
* [ ] translate-c
It now uses the log scope "gpa" instead of "std".
Additionally, there is a new config option `verbose_log` which enables
info log messages for every allocation. This can be useful when debugging.
The option is off by default.
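A hedged example of opting in, assuming the usual config-struct parameter of
std.heap.GeneralPurposeAllocator (how the allocator interface is obtained
differs slightly between Zig versions):
```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{ .verbose_log = true }){};
    defer _ = gpa.deinit();

    const allocator = &gpa.allocator; // `gpa.allocator()` on newer Zig versions

    // Each of these is reported as an info message under the "gpa" log scope.
    const buf = try allocator.alloc(u8, 16);
    allocator.free(buf);
}
```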
Also known as "Struct-Of-Arrays" or "SOA". The purpose of this data
structure is to provide a similar API to ArrayList but instead of
the element type being a struct, the fields of the struct are in N
different arrays, all with the same length and capacity.
Having this abstraction means we can put all the field arrays in the same
allocation, avoiding allocator overhead. It also saves a tiny bit of
overhead from redundant capacity and length fields, since every field
array shares the same length and capacity.
This is an alternate implementation to #7854.
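A hedged usage sketch, assuming the std.MultiArrayList API (a
default-initialized list, an allocator passed to every growing call, and
per-field `items(.field)` accessors):
```zig
const std = @import("std");

const Token = struct {
    tag: u8,
    start: u32,
};

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    var tokens: std.MultiArrayList(Token) = .{};
    defer tokens.deinit(gpa);

    try tokens.append(gpa, .{ .tag = 1, .start = 0 });
    try tokens.append(gpa, .{ .tag = 2, .start = 4 });

    // All tags are stored contiguously as bytes, separately from the starts.
    const tags = tokens.items(.tag);
    std.debug.assert(tags[1] == 2);
}
```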
This temporary patch fixes a segfault caused by a miscompilation in LLD
when generating stubs for the initialization of thread local storage. We
effectively bypass TLS in the default panic handler so that no segfault
occurs and the stack trace is correctly reported back to the user.
Note that this is linked directly to a bigger issue with LLD
(ziglang/zig#7527); when that is resolved, we only need to remove the
`comptime` code path introduced with this patch to go back to the default
panic handler that relies on TLS.
Co-authored-by: Andrew Kelley <andrew@ziglang.org>
ideal capacity is now determined by e.g.
x += x / f
rather than
x = x * b / a
This turns a multiplication into an addition, making it less likely to
overflow the integer. This commit also introduces padToIdeal() which
does saturating arithmetic so that no overflow is possible when
calculating ideal capacity.
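A rough sketch of such a helper; the divisor and the constant pad below are
illustrative assumptions, not necessarily the constants used in the actual
implementation:
```zig
const std = @import("std");

/// Saturating version of `x += x / f`: if the pad would overflow,
/// clamp to the maximum instead of wrapping or trapping.
fn padToIdeal(min_capacity: u32) u32 {
    const pad = min_capacity / 8 + 16;
    return std.math.add(u32, min_capacity, pad) catch std.math.maxInt(u32);
}
```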
closes #7830
* move concurrency primitives that always operate on kernel threads to
the std.Thread namespace
* remove std.SpinLock. Nobody should use this in a non-freestanding
environment; the other primitives are always preferable. In
freestanding, it will be necessary to put custom spin logic in there,
so there are no use cases for a std lib version.
* move some std lib files to the top level fields convention
* add std.Thread.spinLoopHint
* add std.Thread.Condition
* add std.Thread.Semaphore
* new implementation of std.Thread.Mutex for Windows and non-pthreads Linux
* add std.Thread.RwLock
Implementations provided by @kprotty
* CLI: change to -mred-zone and -mno-red-zone to match gcc/clang.
* build.zig: remove the double negative and make it an optional bool.
This follows precedent from other flags, allowing the compiler CLI to
be the decider of what is default instead of duplicating the default
value into the build system code.
* Compilation: make it an optional `want_red_zone` instead of a
`no_red_zone` bool. The default is decided by a call to
`target_util.hasRedZone`.
* When creating a Clang command line, put -mred-zone on the command
line if we are forcing it to be enabled.
* Update update_clang_options.zig with respect to the recent {s}/{} format changes.
* `zig cc` integration with red zone preference.
The macOS version is now obtained by parsing `SystemVersion.plist`.
Test cases were added for plist files dating back to Panther (2005) and up
to the recent Big Sur 11.1 (2020) release of macOS.
Thus we are now able to reliably identify 10.3...11.1 and higher.
- drop use of kern.osproductversion sysctl
- drop use of kern.osversion sysctl (fallback)
- drop kern.osversion tests
- add `lib.std.zig.system.detect()`
- add a minimal parser for `SystemVersion.plist` (an illustrative sketch follows after this list)
- add test cases for { 10.3, 10.3.9, 10.15.6, 11.0, 11.1 }
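Purely as an illustration of the idea (not the actual implementation in
`lib.std.zig.system`), extracting the version string can be as simple as
locating the ProductVersion key and reading the `<string>` element that
follows it:
```zig
const std = @import("std");

fn productVersion(plist: []const u8) ?[]const u8 {
    const key = std.mem.indexOf(u8, plist, "<key>ProductVersion</key>") orelse return null;
    const open = std.mem.indexOfPos(u8, plist, key, "<string>") orelse return null;
    const start = open + "<string>".len;
    const end = std.mem.indexOfPos(u8, plist, start, "</string>") orelse return null;
    return plist[start..end];
}

pub fn main() void {
    const sample =
        \\<key>ProductVersion</key>
        \\<string>11.1</string>
    ;
    // Prints "11.1".
    std.debug.print("{s}\n", .{productVersion(sample).?});
}
```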
closes #7569
... and mem.copy operations. This requires input buffers slightly larger than the result length. Adds helper functions std.mem.alignInBytes and std.mem.alignInSlice.
NtQueryInformationFile with .FileNormalizedNameInformation is only available in Windows 10 1803 (RS4) and later; however, there is probably still another route we can take via ntdll.
Avoid errors if the socket enters the TIME_WAIT state and we need to
re-execute this test before the OS releases it.
This was not a problem before, since the accept()-ed socket was never
closed on the server side.
The overlap between files and sockets is minimal, and lumping them
together means supporting only a small subset of the functionality
provided by the OS.
Moreover, the socket and file handles are not always interchangeable: on
Windows one should use Winsock's close() call rather than the one used
for common files.
This will enable code to perform version checks and make it easier to
support multiple versions of Zig.
Within the SemVer implementation, an intermediate value needed to be
coerced to a slice to work around a comptime bug.
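For example, a project can now reject too-old compilers with a check like the
following (the minimum version here is just a placeholder):
```zig
const std = @import("std");
const builtin = @import("builtin");

const min_zig = std.SemanticVersion{ .major = 0, .minor = 8, .patch = 0 };

comptime {
    if (builtin.zig_version.order(min_zig) == .lt) {
        @compileError("this project requires a newer Zig compiler");
    }
}
```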
Closes #6466
Zig's format system is flexible enough to add custom formatters. This PR removes the new z/Z format specifiers that were added for printing Zig identifiers and replaces them with custom formatters.
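For illustration, such a custom formatter can be a small wrapper type whose
`format` method std.fmt picks up; the signature below is the one in use at the
time of this change, and the escaping logic is deliberately simplified (it
always quotes, unlike the real identifier formatter):
```zig
const std = @import("std");

const FmtId = struct {
    name: []const u8,

    pub fn format(
        self: FmtId,
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        _ = fmt;
        _ = options;
        // Simplified: always quote the identifier.
        try writer.print("@\"{s}\"", .{self.name});
    }
};

pub fn main() !void {
    const out = std.io.getStdOut().writer();
    try out.print("const {} = undefined;\n", .{FmtId{ .name = "while" }});
}
```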
* std.ArrayList gains `moveToUnmanaged`, and the dead code
`ArrayListUnmanaged.appendWrite` is deleted.
* emit_h state is attached to Module rather than Compilation.
* remove the implementation of emit-h because it did not properly
integrate with incremental compilation. I will re-implement it
in a follow-up commit.
* Compilation: use the .codegen_failure tag rather than
.dependency_failure tag for when `bin_file.updateDecl` fails.
C backend:
* Use a CValue tagged union instead of strings for C values.
* Cleanly separate state into Object and DeclGen:
- Object is present only when generating a .c file
- DeclGen is present for both generating a .c and .h
* Move some functions into their respective Object/DeclGen namespace.
* Forward decls are managed by the incremental compilation frontend; C
backend no longer renders function signatures based on callsites.
For simplicity, all functions always get forward decls.
* Constants are managed by the incremental compilation frontend. C
backend no longer has a "constants" section.
* Participate in incremental compilation. Each Decl gets an ArrayList
for its generated C code and it is updated when the Decl is updated.
During flush(), all these are joined together in the output file.
* The new CValue tagged union is used to clean up assigning to locals
without needing an additional pointer local.
* Fix bug with bitcast of non-pointers making the memcpy destination
immutable.
In strictly conforming C, identifiers cannot contain dollar signs.
However, GCC and Clang allow them by default, so translate-c should
handle them. See http://gcc.gnu.org/onlinedocs/cpp/Tokenization.html
I encountered this in the wild in windows.h.
Fixes #7585
This broke build scripts that wanted to refer to `exe_dir` or
`install_path`.
There has also been some pushback and discussion on this breaking
change. I think it should be re-evaluated.
This reverts commit a1a1929cf4.
This code is adapted from a paste by pixelherodev on IRC.
I have added a new fmt option, `{v}` or `{V}`, to handle printing slice values.
While I think it can be made the default, I want your opinion about it.
```zig
var slicea = [0]u32{};
var sliceb = [3]u32{ 1, 2, 3 };
std.log.info("Content: {v}", .{slicea});
std.log.info("Content: {v}", .{sliceb});
```
will print:
```
info: Content: []
info: Content: [1, 2, 3]
```
Question:
Should we drop `{v}` and make it the default behavior?
* Improve ArrayList & co documentation
- Added doc comments about the validity of references to elements in
an ArrayList and how they may become invalid after resizing operations.
- This should help users avoid footguns in future.
* Improve ArrayListUnmanaged & co's documentation
- Port improved documentation from ArrayList and ArrayList aligned to
their unmanaged counterparts.
- Made documentation for ArrayListUnmanaged & co more inclusive and
up-to-date.
- Made documentation more consistent with `ArrayList`.
* Corrections on ArrayList documentation.
- Remove incorrect/unpreferred wording on ArrayList vs
ArrayListUnmanaged.
- Fix notes about the alignment of ArrayListAligned
- Be more verbose with warnings on when pointers are invalidated.
- Copy+paste a few warnings
* add warning to replaceRange
* revert changes to append documentation
This is required for two reasons: 1) the libc
targeting `aarch64` macOS is slightly newer than that targeting
`x86_64`, and 2) `OSAtomic.h` uses relative imports rather than
system-wide imports for accompanying headers which clearly is an
oversight on Apple's part. Until such time when `libkern` headers
between `x86_64` and `aarch64` are identical, this will require a
manual intervention to duplicate the relevant headers between the
respective architectures.
This commit adds default search paths for system frameworks
on macOS while also adding `-isysroot` for OS versions of at least Big Sur.
Since Big Sur (11.0.1), neither headers nor libs exist in standard
root locations (`/usr/include`, `/System/Library/Frameworks`). Instead, they
are now exclusively part of the installed developer toolchain (either
via Xcode.app or the CLT), and specifying `-isysroot` allows us to keep
using universal search paths such as `/System/Library/Frameworks` while
only changing the include flag from `-iframework` to
`-iframeworkwithsysroot`.
The current API does not allow the user to distinguish between EOF and
an empty line. Reader.readUntilDelimiterOrEof() gets this right, so update
readUntilDelimiterOrEofAlloc() to match it. Returning an optional
here additionally makes calling this in a loop much cleaner.
Remove readUntilDelimiterOrEofArrayList(), as it is no longer needed to
implement readUntilDelimiterOrEof() and has the same API issues described
above, without a clear way to fix them.
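A usage sketch of the updated API (assuming the Reader interface as it existed
at the time); the optional return value lets a loop cleanly distinguish an
empty line (an empty slice) from EOF (null):
```zig
const std = @import("std");

pub fn main() !void {
    const gpa = std.heap.page_allocator;
    const stdin = std.io.getStdIn().reader();

    while (try stdin.readUntilDelimiterOrEofAlloc(gpa, '\n', 4096)) |line| {
        defer gpa.free(line);
        std.debug.print("line: {s}\n", .{line});
    }
}
```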
* read directly into the ArrayList buffers.
* respect max_output_bytes
* std.ArrayList:
- make `allocatedSlice` public.
- add `unusedCapacitySlice`.
I removed the Windows implementation of this stuff; I am doing a partial
merge of LemonBoy's patch with the understanding that a later patch can
add the Windows implementation after it is vetted.
Keep polling as long as there are open handles; if the child process
terminates (closing the handles) or explicitly closes them, we just quit
polling and wait for the process handle to signal the termination
condition.
Reading stdout & stderr at different times may lead to nasty deadlocks (e.g.
when stdout is read before stderr and the child process doesn't write
anything to stdout).
Implement a polling mechanism to make sure this won't happen: we read
data from stderr/stdout as it becomes ready and copy it into an
ArrayList provided by the user, avoiding any kind of blocking read.
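From the caller's point of view this looks like the following sketch (assuming
the std.ChildProcess.exec options-struct API of that era, since renamed), with
both streams collected and no risk of the deadlock described above:
```zig
const std = @import("std");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    const result = try std.ChildProcess.exec(.{
        .allocator = gpa,
        .argv = &[_][]const u8{ "echo", "hello" },
        .max_output_bytes = 1024 * 1024,
    });
    defer gpa.free(result.stdout);
    defer gpa.free(result.stderr);

    std.debug.print("stdout: {s}", .{result.stdout});
}
```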
These tests asserted there were no args passed to the test binary, but
now there is an arg intentionally passed to the test binary, so the test
case needed to be updated.
This is not in fact safe to use with GeneralPurposeAllocator as GPA
requires align(page_size) but raw_c_allocator provides only
@alignOf(std.c.max_align_t).
We were violating the POSIX standard, which resulted in a deadlock on
musl v1.1.24 on aarch64 Alpine Linux, uncovered with the new ThreadPool
usage in the stage2 compiler.
std.os execv functions that accept an Allocator parameter are removed
because they are footguns: the POSIX standard does not allow calls to
malloc() between fork() and execv(), and it is common to both
(1) call execv() after fork() and (2) use std.heap.c_allocator.
Programmers are encouraged to go through the `std.process` API
instead, which causes some dissonance when combined with `std.os` APIs.
I also slapped a big warning message on all the relevant doc comments.
* rename is_compiler_rt_or_libc to skip_linker_dependencies
and set it to `true` for all sub-Compilations. I believe
this resolves the deadlock we were experiencing on Drone
CI and on some users' computers. I will remove the CI workaround in
a follow-up commit.
* enabling TSAN automatically causes the Compilation to link against
libc++ even if not requested, because TSAN depends on libc++.
* add -fno-rtti flags where appropriate when building TSAN objects.
Thanks Firefox317 for pointing this out.
* TSAN support: resolve all the undefined symbols. We are still seeing
a dependency on __gcc_personality_v0 but will resolve this one in a
follow-up commit.
* static libs do not try to build libc++ or libc++abi.
* split std.ResetEvent into:
- ResetEvent - requires init() at runtime and it can fail. Also
requires deinit().
- StaticResetEvent - can be statically initialized and requires no
deinitialization. Initialization cannot fail.
* the POSIX sem_t implementation can in fact fail on initialization
because it is allowed to be implemented as a file descriptor.
* Completely define, clarify, and explain in detail the semantics of
these APIs. Remove the `isSet` function.
* `ResetEvent.timedWait` returns an enum instead of a possible error.
* `ResetEvent.init` takes a pointer to the ResetEvent instead of
returning a copy.
* On Darwin, `ResetEvent` is implemented using Grand Central Dispatch,
which is exposed by libSystem.
stage2 changes:
* ThreadPool: use a single, pre-initialized `ResetEvent` per worker.
* WaitGroup: now requires init() and deinit() and init() can fail.
- Add a `reset` function.
- Compilation initializes one for the work queue in creation and
re-uses it for every update.
- Rename `stop` to `finish`.
- Simplify the implementation based on the usage pattern.