These constants were read as a block count in initForBlockCount()
but at the same time, as a size in update().
The unit could be blocks or bytes, but we should use the same one
everywhere.
So, use blocks as intended.
Fixes#13506
- the meaning of packed structs changed in zig 0.10. adjust accordingly.
Use "extern struct" for the cases that directly map to C structs.
- Add new type info kinds, like enum64 and DeclTag
- change the Type enum to use the canonical names from libbpf.
This is more predictable when comparing with external BPF
documentation (than invented synonyms that need to be guessed)
* std.crypto: make ghash faster, esp. for small messages
Aggregated reduction requires 5 additional multiplications (to
precompute the powers of H), in order to save 2 multiplications
per batch.
So, only use large batches when it's actually interesting to do so.
For the last blocks, reuse the precomputations in order to perform
a single reduction.
Also, even in .ReleaseSmall, allow 2-block aggregation.
The speedup is worth it, and the code increase is reasonable.
And in .ReleaseFast, bump the upper batch size up to 16.
Leverage comptime by the way instead of duplicating code.
std/crypto/benchmark.zig on Apple M1:
Zig 0.10.0: 2769 MiB/s
Before: 6014 MiB/s
After: 7334 MiB/s
Normalize function names by the way.
* Change clmul() to accept the half to be processed
This avoids a bunch of truncate() calls.
* Add more ghash tests to check all code paths
* crypto.core.aes: process 6 block in parallel instead of 8 on aarch64
At least on Apple Silicon, this is slightly faster than 8 blocks.
* AES: add parallel blocks for tigerlake, rocketlake, alderlake, zen3
...instead of hard-coding it to 20.
- This is consistent with the ChaCha implementation
- NaCl and libsodium, that this API is designed to interop with,
also support 8 and 12 round variants. The 12 round variant, in
particular, provides the same security level as the 20 round variant,
but is obviously faster.
- scrypt currently uses its own non optimized version of Salsa, just
because it use 8 rounds instead of 20. This will help remove code
duplication.
No behavior nor public API changes. The Salsa20 and XSalsa20 still
represent the 20-round variant.
PR #13101 recently renamed the "i386" architecture to "x86", and it
seems the specific CPU model got swept up in that. "x86" is an umbrella
term that describes a family of CPUs, and the "i386" is the oldest
supported model under that umbrella.
Rewrite GHASH to use 128-bit multiplication over non-reversed
integers, and up to 8 blocks aggregated reduction.
lib/std/crypto/benchmark.zig results:
Xeon E5:
Before: 1604 MiB/s
After: 4005 MiB/s
Apple M1:
Before: 2769 MiB/s
After: 6014 MiB/s
This also makes AES-GCM faster by the way.
Instead of making the memory alignment functions more complicated, I
added more API documentation for their existing semantics.
closes#12118closes#12135
* std.os.uefi: integer backed structs, add tests to catch regressions
device_path_protocol now uses extern structs with align(1) fields because
the transition to integer backed packed struct broke alignment
added comptime asserts that device_path_protocol structs do not violate
alignment and size specifications