Commit Graph

704 Commits

Author SHA1 Message Date
mlugg
605f2a0978
cases: update for new error wording, add coverage for field/decl name conflict 2024-08-29 23:43:52 +01:00
Andrew Kelley
c2b8afcac9 tokenizer: tabs and carriage returns spec conformance 2024-07-31 16:57:42 -07:00
Andrew Kelley
377e8579f9 std.zig.tokenizer: simplify
I pointed a fuzzer at the tokenizer and it crashed immediately. Upon
inspection, I was dissatisfied with the implementation. This commit
removes several mechanisms:
* Removes the "invalid byte" compile error note.
* Dramatically simplifies tokenizer recovery by making recovery always
  occur at newlines, and never otherwise.
* Removes UTF-8 validation.
* Moves some character validation logic to `std.zig.parseCharLiteral`.

Removing UTF-8 validation is a regression of #663, however, the existing
implementation was already buggy. When adding this functionality back,
it must be fuzz-tested while checking the property that it matches an
independent Unicode validation implementation on the same file. While
we're at it, fuzzing should check the other properties of that proposal,
such as no ASCII control characters existing inside the source code.

Other changes included in this commit:

* Deprecate `std.unicode.utf8Decode` and its WTF-8 counterpart. This
  function has an awkward API that is too easy to misuse.
* Make `utf8Decode2` and friends use arrays as parameters, eliminating a
  runtime assertion in favor of using the type system.

After this commit, the crash found by fuzzing, which was
"\x07\xd5\x80\xc3=o\xda|a\xfc{\x9a\xec\x91\xdf\x0f\\\x1a^\xbe;\x8c\xbf\xee\xea"
no longer causes a crash. However, I did not feel the need to add this
test case because the simplified logic eradicates most crashes of this
nature.
2024-07-31 16:57:42 -07:00
gooncreeper
c50f300387 Tokenizer bug fixes and improvements
Fixes many error messages corresponding to invalid bytes displaying the
wrong byte. Additionaly improves handling of UTF-8 in some places.
2024-07-15 11:31:19 +03:00
Krzysztof Wolicki
45c77931c2 Change deprecated b.host to b.graph.host in tests and Zig's build.zig 2024-06-13 10:49:06 -04:00
Andrew Kelley
142471fcc4 zig build system: change target, compilation, and module APIs
Introduce the concept of "target query" and "resolved target". A target
query is what the user specifies, with some things left to default. A
resolved target has the default things discovered and populated.
In the future, std.zig.CrossTarget will be rename to std.Target.Query.
Introduces `std.Build.resolveTargetQuery` to get from one to the other.

The concept of `main_mod_path` is gone, no longer supported. You have to
put the root source file at the module root now.

* remove deprecated API
* update build.zig for the breaking API changes in this branch
* move std.Build.Step.Compile.BuildId to std.zig.BuildId
* add more options to std.Build.ExecutableOptions, std.Build.ObjectOptions,
  std.Build.SharedLibraryOptions, std.Build.StaticLibraryOptions, and
  std.Build.TestOptions.
* remove `std.Build.constructCMacro`. There is no use for this API.
* deprecate `std.Build.Step.Compile.defineCMacro`. Instead,
  `std.Build.Module.addCMacro` is provided.
  - remove `std.Build.Step.Compile.defineCMacroRaw`.
* deprecate `std.Build.Step.Compile.linkFrameworkNeeded`
  - use `std.Build.Module.linkFramework`
* deprecate `std.Build.Step.Compile.linkFrameworkWeak`
  - use `std.Build.Module.linkFramework`
* move more logic into `std.Build.Module`
* allow `target` and `optimize` to be `null` when creating a Module.
  Along with other fields, those unspecified options will be inherited
  from parent `Module` when inserted into an import table.
* the `target` field of `addExecutable` is now required. pass `b.host`
  to get the host target.
2024-01-01 17:51:18 -07:00
Andrew Kelley
5eb5d523b5 give modules friendly names for error reporting 2023-10-08 20:58:04 -07:00
mlugg
09a57583a4
compiler: preserve result type information through address-of operator
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for #16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements #17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.

Resolves: #16512
Resolves: #17194
2023-09-23 22:01:08 +01:00
mlugg
2209813bae
cases: modify error wording to match new errors
The changes to result locations and generic calls has caused mild
changes to some compile errors. Some are slightly better, some slightly
worse, but none of the changes are major.
2023-08-10 10:00:37 +01:00
Jacob Young
736df27663 Sema: use the correct decl for generic argument source locations
Closes #16746
2023-08-09 10:09:01 -04:00
mlugg
3a25f6a22e Port some stage1 test cases to stage2
There are now very few stage1 cases remaining:
* `cases/compile_errors/stage1/obj/*` currently don't work correctly on
  stage2. There are 6 of these, and most of them are probably fairly
  simple to fix.
* `cases/compile_errors/async/*` and all remaining `safety/*` depend on
  async; see #6025.

Resolves: #14849
2023-03-20 19:55:50 -04:00
Andrew Kelley
29cfd47d65 re-enable test-cases and get them all passing
Instead of using `zig test` to build a special version of the compiler
that runs all the test-cases, the zig build system is now used as much
as possible - all with the basic steps found in the standard library.

For incremental compilation tests (the ones that look like foo.0.zig,
foo.1.zig, foo.2.zig, etc.), a special version of the compiler is
compiled into a utility executable called "check-case" which checks
exactly one sequence of incremental updates in an independent
subprocess. Previously, all incremental and non-incremental test cases
were done in the same test runner process.

The compile error checking code is now simpler, but also a bit
rudimentary, and so it additionally makes sure that the actual compile
errors do not include *extra* messages, and it makes sure that the
actual compile errors output in the same order as expected. It is also
based on the "ends-with" property of each line rather than the previous
logic, which frankly I didn't want to touch with a ten-meter pole. The
compile error test cases have been updated to pass in light of these
differences.

Previously, 'error' mode with 0 compile errors was used to shoehorn in a
different kind of test-case - one that only checks if a piece of code
compiles without errors. Now there is a 'compile' mode of test-cases,
and 'error' must be only used when there are greater than 0 errors.

link test cases are updated to omit the target object format argument
when calling checkObject since that is no longer needed.

The test/stage2 directory is removed; the 2 files within are moved to be
directly in the test/ directory.
2023-03-15 10:48:14 -07:00
mlugg
f94cbab3ac
Add test coverage for some module structures 2023-02-21 02:05:36 +00:00
Tom Read Cutting
346ec15c50
Correctly handle carriage return characters according to the spec (#12661)
* Scan from line start when finding tag in tokenizer

This resolves a crash that can occur for invalid bytes like carriage
returns that are valid characters when not parsed from within literals.

There are potentially other edge cases this could resolve as well, as
the calling code for this function didn't account for any potential
'pending_invalid_tokens' that could be queued up by the tokenizer from
within another state.

* Fix carriage return crash in multiline string

Follow the guidance of #38:

> However CR directly before NL is interpreted as only a newline and not part of the multiline string. zig fmt will delete the CR.

Zig fmt already had code for deleting carriage returns, but would still
crash - now it no longer does so. Carriage returns encountered before
line-feeds are now appropriately removed on program compilation as well.

* Only accept carriage returns before line feeds

Previous commit was much less strict about this, this more closely
matches the desired spec of only allow CR characters in a CRLF pair, but
not otherwise.

* Fix CR being rejected when used as whitespace

Missed this comment from ziglang/zig-spec#83:

> CR used as whitespace, whether directly preceding NL or stray, is still unambiguously whitespace. It is accepted by the grammar and replaced by the canonical whitespace by zig fmt.

* Add tests for carriage return handling
2023-02-19 14:14:03 +02:00
Veikka Tuominen
8eea73fb92 add tests for tuple declarations 2022-11-23 22:16:31 +02:00
Veikka Tuominen
c3b85e4e2f Sema: further enhance explanation of why expr is evaluated at comptime 2022-10-28 13:31:16 +03:00
r00ster91
51d9db8569 fix(text): hyphenate "comptime" adjectives 2022-10-05 21:19:30 +02:00
r00ster91
654e0b6679 fix(text): hyphenation and other fixes 2022-10-05 21:19:10 +02:00
John Schmidt
b6bda5183e sema: load the correct AST in failWithInvalidComptimeFieldStore
The container we want to get the fields from might not be declared in the
same file as the block we are analyzing, so we should get the AST from
the decl's file instead.
2022-09-26 08:56:34 +02:00
Veikka Tuominen
b2f02a820f Sema: check for astgen failures in semaStructFields
The struct might be a top level struct in which case it might not have Zir.

Closes #12548
2022-08-22 11:16:36 +03:00
Veikka Tuominen
e8102d8738 Sema: add note about function call being comptime because of comptime only return type 2022-08-21 12:24:48 +03:00
Veikka Tuominen
2f34d06d01 Sema: analyzeInlineCallArg needs a block for the arg and the param 2022-07-25 22:04:08 +03:00
Veikka Tuominen
1463144fc8 Compilation: point caret in error message at the main token 2022-07-15 15:11:43 +03:00
r00ster91
da75eb0d79
Compilation: indent multiline error messages properly
Co-authored-by: Veikka Tuominen <git@vexu.eu>
2022-07-12 00:10:39 +03:00
Jakub Konka
62625d9d95 test: migrate stage1 compile error tests to updated test manifest 2022-04-28 18:35:01 +02:00
Andrew Kelley
243afdcdf5 test harness improvements
* `-Dskip-compile-errors` is removed; `-Dskip-stage1` is added.
 * Use `std.testing.allocator` instead of a new instance of GPA.
   - Fix the memory leaks this revealed.
 * Show the file name when it is not parsed correctly such as when the
   manifest is missing.
   - Better error messages when test files are not parsed correctly.
 * Ignore unknown files such as swap files.
 * Move logic from declarative file to the test harness implementation.
 * Move stage1 tests to stage2 tests where appropriate.
2022-03-31 15:10:31 -07:00
Andrew Kelley
47dfaf47b8 stage2: test compile errors independently
Until we land some incremental compilation bug fixes, this prevents CI
failures when running the compile errors test cases.
2022-03-30 11:22:27 -07:00
Andrew Kelley
9aa431cba3 test harness: include case names for compile errors
in the progress nodes
2022-03-29 12:01:45 -07:00
Cody Tapscott
0568b45779 Move existing compile errors to independent files
Some cases had to stay behind, either because they required complex case
configuration that we don't support in independent files yet, or because
they have associated comments which we don't want to lose track of.

To make sure I didn't drop any tests in the process, I logged all
obj/test/exe test cases from a run of "zig build test" and compared
before/after this change.

All of the test cases match, with two exceptions:
 - "use of comptime-known undefined function value" was deleted, since
   it was a duplicate
 - "slice sentinel mismatch" was renamed to "vector index out of
   bounds", since it was incorrectly named
2022-03-25 12:27:46 -07:00
Cody Tapscott
7f64f7c925 Add rudimentary compile error test file support
This brings two quality-of-life improvements for folks working on
compile error test cases:
 - test cases can be added/changed without re-building Zig
 - wrapping the source in a multi-line string literal is not necessary

I decided to keep things as simple as possible for this initial
implementation. The test "manifest" is a contiguous comment block at the
end of the test file:
1. The first line is the test case name
2. The second line is a blank comment
2. The following lines are expected errors

Here's an example:
```zig
const U = union(enum(u2)) {
    A: u8,
    B: u8,
    C: u8,
    D: u8,
    E: u8,
};
export fn entry() void {
    _ = U{ .E = 1 };
}

// union with too small explicit unsigned tag type
//
// tmp.zig:1:22: error: specified integer tag type cannot represent every field
// tmp.zig:1:22: note: type u2 cannot fit values in range 0...4
```

The mode of the test (obj/exe/test), as well as the target
(stage1/stage2) is determined based on the directory containing the
test.

We'll probably eventually want to support embedding this information
in the test files themselves, similar to the arocc test runner, but
that enhancement can be tackled later.
2022-03-25 12:25:43 -07:00
Mitchell Hashimoto
a36f4ee290 stage2: able to slice to sentinel index at comptime
The runtime behavior allowed this in both stage1 and stage2, but stage1
fails with index out of bounds during comptime. This behavior makes
sense to support, and comptime behavior should match runtime behavior. I
implement this fix only in stage2.
2022-03-23 17:08:08 -04:00
Mitchell Hashimoto
91fd0f42c8 stage2: out of bounds error for slicing 2022-03-21 22:10:34 -04:00
Daniel Hooper
911c839e97
add error when binary ops don't have matching whitespace on both sides
This change also moves the warning about "&&" from the AstGen into the parser so that the "&&" warning can supersede the whitespace warning.
2022-03-20 12:55:04 +02:00
Robin Voetter
5c3325588e stage1: make type names more unique 2022-03-19 19:40:46 -04:00
Mitchell Hashimoto
394252c9db stage2: move duplicate error set check to AstGen 2022-03-16 01:41:22 -04:00
Jonathan Marler
d805adddd6 deprecated TypeInfo in favor of Type
Co-authored-by: Veikka Tuominen <git@vexu.eu>
2022-03-08 20:38:12 +02:00
Veikka Tuominen
f8154905e7 stage1: rename TypeInfo.FnArg to Fn.Param 2022-02-23 09:44:36 +02:00
Veikka Tuominen
2f0204aca3 parser: fix "previous field here" pointing to wrong field 2022-02-19 10:15:54 +02:00
Veikka Tuominen
6b65590715 parser: add notes to decl_between_fields error 2022-02-17 22:16:26 +02:00
Veikka Tuominen
92f2767814 parser: add error for missing colon before continue expr
If a '(' is found where the continue expression was expected and it is
on the same line as the previous token issue an error about missing
colon before the continue expression.
2022-02-17 20:51:26 +02:00
Veikka Tuominen
c9dde10f86 stage1: improve error message when casting tuples 2022-02-17 17:39:54 +02:00
Veikka Tuominen
9c36cf92f0 parser: make some errors point to end of previous token
For some errors if the found token is not on the same line as
the previous token, point to the end of the previous token.
This usually results in more helpful errors.
2022-02-17 14:23:35 +02:00
Veikka Tuominen
8a432436ae update compile error tests 2022-02-13 13:48:20 +02:00
Sebsatian Keller
f5471299d8
stage 1: improve error message if error union is cast to payload (#10770)
Also: Added special error message for for `?T` to `T` casting
2022-02-09 20:35:53 -05:00
Andrew Kelley
449554a730 stage2: remove anytype fields from the language
closes #10705
2022-02-01 19:06:40 -07:00
Andrew Kelley
f4a249325e stage1: avoid anytype fields for type info
prerequisite for #10705
2022-02-01 18:10:19 -07:00
Andrew Kelley
4d22fa5a2a update behavior tests and compile error tests 2022-01-31 22:33:49 -07:00
Andrew Kelley
664e1a892c stage1: remove the "referenced here" error note
It's generally noise. The parts where it is useful will need to be
redone to not be annoying for the general case.
2022-01-20 13:27:52 -05:00
Andrew Kelley
70894d5c2f AstGen: fix loop result locations
The main problem was that the loop body was treated as an expression
that was one of the peer result values of a loop, when in reality the
loop body is noreturn and only the `break` operands are the result
values of loops.

This was solved by introducing an override that prevents rvalue() from
emitting a store to result location instruction for loop bodies.

An orthogonal change also included in this commit is switching
`elem_val` index expressions to using `coerced_ty` and doing the
coercion to `usize` inside `Sema`, resulting in smaller ZIR (since the
cast becomes implied).

I also changed the break operand expression to use `reachableExpr`,
introducing a new compile error for double break.

This makes a few more behavior tests pass for `while` and `for` loops.
2021-12-27 15:30:31 -07:00
Isaac Freund
9f9f215305
stage1, stage2: rename c_void to anyopaque (#10316)
zig fmt now replaces c_void with anyopaque to make updating
code easy.
2021-12-19 00:24:45 -05:00