Ken Domino 284602b3f2
[scala] Add new grammar for Scala 3. (#4836)
* Add Scala 3 grammar with 22 passing test cases

New split grammar (Scala3Lexer.g4 + Scala3Parser.g4) based on the
Scala 3 syntax spec, targeting the CSharp ANTLR4 runtime. Includes
desc.xml for Trash Toolkit testing and 22 example .scala files covering
traits, enums, given/using, extension methods, union/intersection types,
opaque types, pattern matching, for-comprehensions, generics, higher-order
functions, variance, type classes, match types, inline, context functions,
packages, and companion objects. All 22 examples parse without errors.

Key grammar decisions:
- ':' added to Opchar so '::' tokenises as a single Op token
- template rule made non-nullable to avoid mismatched '{' errors
- defDef accepts abstract declarations (no body) to allow the ANTLR
  adaptive predictor to correctly enter templateBody
- enumerators allows optional SEMI between generators for newline-separated
  for-comprehension generators

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Reorganize Scala grammars.

* Fix up testing.

* Minimum update to readmes.

* scala/scala3: add 20 example files reaching 98.3% parser rule coverage

Added examples/23_*.scala through examples/40_*.scala (plus null_package.scala)
to exercise previously untested parser rule call sites in Scala3Parser.g4.
Coverage grew from ~73% to 750/763 call sites (98.3%).

Also removed examples/ex1.scala (indentation-based Scala 3 syntax, incompatible
with the brace-delimited grammar).

Updated readme.md with a Parser Rule Coverage section documenting the coverage
methodology, current score, and explanations for the 8 permanently unreachable
grammar alternatives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
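
The coverage figure quoted in this entry can be checked from the two counts it gives. The snippet below is only an arithmetic check; the variable names are illustrative and are not taken from any coverage tool.

```python
# Counts quoted in the commit message: covered call sites and total call sites.
covered_call_sites = 750
total_call_sites = 763

# Call sites that no example exercises.
uncovered = total_call_sites - covered_call_sites

# Coverage as a percentage of all call sites.
coverage_pct = 100 * covered_call_sites / total_call_sites

print(f"{uncovered} uncovered call sites, {coverage_pct:.1f}% coverage")
```

The difference of 13 matches the "13 uncovered call sites" mentioned in the later review-comment entry.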

* Updated targets.

* scala/scala3: address Copilot review comments

- desc.xml: fix xsi:noNamespaceSchemaLocation path (../ -> ../../_scripts/desc.xsd)
- readme.md: clarify coverage count (8 unreachable alternative lines = 13 uncovered
  call sites); add 'Known grammar limitations' section explaining the importSelectors
  design choice (mixed named/wildcard lists not supported) and the deliberate use of
  Op for wildCardSelector, negation, and variance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix desc.xml.

* Checkpoint commit.

* scala/scala3: reduce ALL(*) lookahead in expr1, block, blockResult

Three grammar ambiguities caused ALL(*) to scan hundreds of tokens:

1. expr1: collapsed five assignment alternatives and 'postfixExpr ascription?'
   into 'postfixExpr (ASSIGN expr | ascription)?'.  Every assignment LHS is a
   sub-case of postfixExpr (= infixExpr id?), so the old alternatives shared
   the same unbounded prefix and forced max-k up to 422.

2. block: removed '| USCORE (ARROW|CTXARROW) block' and '| id (ARROW|CTXARROW)
   block' — both are fully subsumed by blockResult via funParams, so they
   introduced genuine ambiguity and deep lookahead.

3. blockResult: removed '| expr1' — it duplicated blockStat's expr1 alternative,
   causing a genuine ambiguity (every plain expression at end-of-block matched
   both blockStat and blockResult) and O(n) ALL(*) lookahead in block.

Original rules documented in comments referencing 757667725.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* scala/scala3: fix _colonArgLambda reset inside lambda parameter bindings

When a colonArgument lambda has tuple parameters, e.g.:
  .rescue: (code, msg) =>
    body

the COMMA inside `(code, msg)` was incorrectly resetting _colonArgLambda,
preventing INDENT from being emitted after `=>`.  The parser then fell back
to reading `: (code, msg)` as an ascription (type annotation), leaving `=>`
stranded and producing "mismatched input '=>' expecting ')'".

Fix: record the region-stack depth when _colonArgLambda is set, and only
reset on COMMA/SEMI at that exact depth — not at deeper nesting (i.e. not
inside the lambda's own parameter list).  Applied to both CSharp and Java
ports of Scala3LexerBase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
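
The depth-scoped reset described in this entry can be sketched as a toy state machine. This is not the code in Scala3LexerBase: the class, its fields, and the string tokens are hypothetical stand-ins, and a plain parenthesis counter is assumed in place of the real region stack.

```python
class ColonArgLambdaTracker:
    """Toy model of a 'colon-argument lambda pending' flag (hypothetical names)."""

    def __init__(self):
        self.depth = 0             # stands in for the region-stack depth
        self.pending = False       # stands in for _colonArgLambda
        self.pending_depth = None  # depth recorded when the flag was set

    def feed(self, tok):
        """Consume one token; return True when INDENT should follow '=>'."""
        if tok == ":":
            # Assumption: every ':' starts a colon argument in this toy model.
            self.pending = True
            self.pending_depth = self.depth
        elif tok == "(":
            self.depth += 1
        elif tok == ")":
            self.depth -= 1
        elif tok in (",", ";"):
            # Reset only at the depth where the flag was set, so a comma
            # inside the lambda's own parameter list leaves it untouched.
            if self.pending and self.depth == self.pending_depth:
                self.pending = False
        elif tok == "=>":
            emit_indent = self.pending
            self.pending = False
            return emit_indent
        return False


tracker = ColonArgLambdaTracker()
tuple_params = [tracker.feed(t) for t in [":", "(", "code", ",", "msg", ")", "=>"]]

tracker = ColonArgLambdaTracker()
same_depth_comma = [tracker.feed(t) for t in [":", "x", ",", "y", "=>"]]
```

In the first token stream the comma sits one level deeper than the colon, so the flag survives until `=>`; in the second the comma is at the same depth and clears it.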

* scala/scala3: drain Indented regions on COMMA for same-line colonArgument bodies

When a colonArgument body is entirely on one line followed by a comma, e.g.:
  tv = userTv.map: u =>
    lila.round.OnTv.User(u.id),   // INDENT opened, no NEWLINE before ','

no NEWLINE-triggered DEDENT fires before the comma, leaving the Indented
region open.  The comma then appears to be inside the indented block rather
than at the enclosing argument-list level, producing 'no viable alternative'.

Fix: handle COMMA the same way as RPAREN/RBRACKET — drain any open Indented
regions before processing the token.  Applied to both CSharp and Java ports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
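
A minimal sketch of the drain step, assuming the region stack is a plain list of strings. The function name and region labels are hypothetical, and the real lexer base presumably does more work for RPAREN/RBRACKET (such as closing the bracket region itself), which this sketch leaves out.

```python
INDENTED = "Indented"
IN_PARENS = "InParens"


def drain_indented(region_stack, tok):
    """Return the tokens to emit for `tok`, closing open Indented regions first.

    `region_stack` is mutated in place; its last element is the innermost region.
    """
    out = []
    if tok in (",", ")", "]"):
        # Close every Indented region that is still open on top of the stack.
        while region_stack and region_stack[-1] == INDENTED:
            region_stack.pop()
            out.append("DEDENT")
    out.append(tok)
    return out


# An argument list whose lambda body opened an Indented region on the same line.
stack = [IN_PARENS, INDENTED]
emitted = drain_indented(stack, ",")
```

After the call the comma is seen at the argument-list level again, because the Indented region has been closed before the comma is emitted.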

* scala/scala3: allow INDENT for => inside (…); remove colonArgLambda mechanism

Multi-statement lambda bodies inside parentheses —
  f(loginData =>
      val x = foo()
      x)
— require INDENT/DEDENT injection after => even when the enclosing context
is InParens.  The previous code suppressed INDENT for ARROW/CTXARROW inside
InParens to avoid trailing-comma conflicts, but the COMMA drain (already
present) handles those correctly: it emits DEDENT before processing any comma
that appears on the same line as the lambda body.

Removing the exclusion also makes the _colonArgLambda / _colonArgLambdaStackDepth
mechanism redundant (it existed solely to override the exclusion for
colonArgument lambdas), so both fields and all associated tracking code are
deleted, simplifying the lexer base significantly.

Fixes parse errors in lila controllers/Auth.scala.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* scala/scala3: fix InterpolatedStringLiteral for nested ${...} expressions

'$' '{' .*? '}' (non-greedy dot-star) stops at the first '}', breaking
expressions like ${msgs.map(m => s"${m.date}...").mkString("\n")} where
the interpolated block itself contains nested braces or string literals.

Replace .*? with InterpExprContent*, a fragment that handles:
  - ordinary chars (excluding {, }, ", \, newlines)
  - escape sequences (\.)
  - double-quoted string literals ("...")
  - one additional level of nested { } (via InterpExprContent0)

Two levels of nesting covers the practical cases in lila (Lichess) source.
InterpMultiPart (triple-quoted strings) gets the same fix.

Fixes parse errors in lila controllers/Mod.scala.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
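
The difference between the non-greedy match and the bounded-nesting fragment can be illustrated with regular expressions. This is an approximation, not the ANTLR fragments themselves: the pattern names are hypothetical, and Python regex matching differs from ANTLR lexer matching in its details.

```python
import re

# Old behaviour: non-greedy dot-star stops at the first '}'.
naive = re.compile(r'\$\{.*?\}')

# Innermost level: ordinary chars, escape sequences, or a double-quoted string.
content0 = r'(?:[^{}"\\\r\n]|\\.|"(?:[^"\\\r\n]|\\.)*")'

# Outer level: anything from the inner level, or one nested { ... } group.
content = r'(?:' + content0 + r'|\{' + content0 + r'*\})'

nested = re.compile(r'\$\{' + content + r'*\}')

text = r'${msgs.map(m => s"${m.date}...").mkString("\n")}'

naive_match = naive.match(text).group(0)
nested_match = nested.fullmatch(text)
one_level = nested.fullmatch('${xs.map { x => x + 1 }}')
two_levels = nested.fullmatch('${a { b { c } } }')
```

Here `naive_match` ends at the `}` of the inner `${m.date}`, while `nested` consumes the whole expression because the inner string literal is skipped as a unit. Braces nested more than one level below the `${ }` itself are rejected, which mirrors the bounded nesting the entry describes.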

* scala/scala3: fix multiline ${...} in triple-quoted interpolated strings

InterpExprContent excludes \r and \n (correct for single-line strings, which
cannot span lines), but triple-quoted strings allow ${...} expressions
that span multiple source lines, e.g.:

  s"""${env.user.lightUserApi
      .sync(t.createdBy)
      .fold(t.createdBy)(_.name)}"""

Add InterpMultiExprContent / InterpMultiExprContent0 fragments, which are
identical to InterpExprContent / InterpExprContent0 except that ~[{}"\]+ does
NOT exclude \r or \n, and use them in InterpMultiPart's ${ } alternative.

Fixes parse errors in lila controllers/Tournament.scala.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
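
As a rough illustration of why the character class matters, the sketch below builds two regex variants that differ only in whether line breaks are excluded from ordinary content. The helper name `interp_expr_pattern` is hypothetical, and the patterns approximate the ANTLR fragments rather than reproduce them.

```python
import re


def interp_expr_pattern(allow_newlines):
    """Build a regex for one interpolated expression with one level of nested braces."""
    # Characters that may not appear as ordinary content.
    excluded = r'{}"\\' if allow_newlines else r'{}"\\\r\n'
    # Innermost level: ordinary char, escape sequence, or double-quoted string.
    content0 = '(?:[^' + excluded + r']|\\.|"(?:[^"\\\r\n]|\\.)*")'
    # Outer level additionally allows one nested { ... } group.
    content = '(?:' + content0 + r'|\{' + content0 + r'*\})'
    return re.compile(r'\$\{' + content + r'*\}')


single_line = interp_expr_pattern(allow_newlines=False)
multi_line = interp_expr_pattern(allow_newlines=True)

expr = (
    "${env.user.lightUserApi\n"
    "      .sync(t.createdBy)\n"
    "      .fold(t.createdBy)(_.name)}"
)

single_result = single_line.fullmatch(expr)
multi_result = multi_line.fullmatch(expr)
```

The single-line variant fails at the first line break, while the multi-line variant accepts the whole expression.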

* Update desc.xml with correct targets.

* scala/scala3: add AS, DERIVES, USING to CanStartStat

Contextual keywords AS, DERIVES, and USING can be used as plain
identifiers in Scala 3 (e.g. `as <- fetchActivity` in a for-generator).
Because they were absent from CanStartStat, a NEWLINE before a line
starting with one of these keywords was suppressed, breaking the
end_of_stat separator between for-comprehension generators and causing
cascading parse errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
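
A simplified model of the effect, assuming NEWLINE is kept only when the next token's type is a member of a can-start-statement set. The set contents other than AS, DERIVES, and USING are placeholders, and the real lexer presumably applies further conditions (for example on the preceding token) that are omitted here.

```python
# Hypothetical stand-in for the set before the fix; only a few token types shown.
CAN_START_STAT_BEFORE = {"ID", "VAL", "DEF"}

# After the fix, the three contextual keywords are included as well.
CAN_START_STAT_AFTER = CAN_START_STAT_BEFORE | {"AS", "DERIVES", "USING"}


def newline_is_kept(next_token_type, can_start_stat):
    """A NEWLINE survives only if the next line can begin a statement."""
    return next_token_type in can_start_stat


# A for-generator such as `as <- fetchActivity` starts with the AS token type.
kept_before = newline_is_kept("AS", CAN_START_STAT_BEFORE)
kept_after = newline_is_kept("AS", CAN_START_STAT_AFTER)
```

Before the change the separator ahead of `as <- ...` is dropped, so two generators run together; afterwards it is kept.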

* scala/scala3: fix rparenOpensIndent to exclude InBraces context

RPAREN must not open a new indented region when the outer context is
InBraces: inside a brace block, trailing RPAREN from a function call
(e.g. `authenticateAppealUser(u, redirectTo)`) can appear at a higher
column than curIndent (because curIndent tracks a surrounding block,
not the brace block), so the (newIndent > curIndent) guard would fire
and emit a spurious INDENT before the following token (e.g. `else`).

The fix adds `&& _regionStack.Peek() != Region.InBraces` to the
rparenOpensIndent predicate. The condition still allows RPAREN to
open an INDENT when the outer context is Indented or TopLevel, which
is the case for `extension (params)` bodies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
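
The guard added in this entry can be sketched as a pure predicate. The signature below is an assumption made for illustration: only the indentation comparison and the InBraces exclusion come from the text above, and any other conditions in the real rparenOpensIndent are left out.

```python
from enum import Enum, auto


class Region(Enum):
    TOP_LEVEL = auto()
    INDENTED = auto()
    IN_PARENS = auto()
    IN_BRACES = auto()


def rparen_opens_indent(prev_is_rparen, new_indent, cur_indent, outer_region):
    """Decide whether a line following ')' should open an indented region."""
    return (
        prev_is_rparen
        and new_indent > cur_indent
        # The fix: never open an indented region directly inside a brace block.
        and outer_region is not Region.IN_BRACES
    )


# `authenticateAppealUser(u, redirectTo)` followed by `else` inside braces.
inside_braces = rparen_opens_indent(True, 8, 4, Region.IN_BRACES)

# `extension (params)` followed by an indented body at top level.
extension_body = rparen_opens_indent(True, 2, 0, Region.TOP_LEVEL)
```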

* scala/scala3: fix rparenOpensIndent for class-header continuations and triple-quoted escape

Two fixes:

1. rparenOpensIndent: also exclude the case where the next-line token is
   EXTENDS or WITH — these are class/trait header continuation keywords,
   not the start of a new indented block.  Without this guard, a class
   definition like
       final class Foo(params)
           extends Bar:
   emits a spurious INDENT before `extends` because `params)` ends with
   RPAREN, the outer region is TopLevel/Indented (not InParens/InBraces),
   and `extends` is at a deeper indent than curIndent.

2. Scala3Lexer.g4 InterpMultiPart: add `'\' .` backslash-escape support
   and update the non-special char class from ~["$]+ to ~["$\]+.
   Without this, s"""...\"""" (triple-quoted interpolated string ending
   with \") is mis-tokenized: `\` is consumed as plain content, `""""`
   matches as one content-quote plus a closing `"""`, leaving a stray `"`
   that cannot be lexed.  With `'\' .` having priority, `\"` is consumed
   as a two-character escape unit, and the remaining `"""` correctly closes
   the string.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Test.

* Test `mvn clean test`; it is broken.

* _scripts/maven.sh: skip duplicate grammar directories

When multiple changed files map to the same grammar directory, the
walk-up loop could produce the same path more than once.  Add a
membership check before appending to grammars() so each directory
appears at most once.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
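
The membership check can be sketched as follows. The actual script is a shell script; this Python version only illustrates the idea, and both `grammar_dirs` and the `is_grammar_dir` callback are hypothetical names (how the real script recognises a grammar directory is not stated here).

```python
from pathlib import PurePosixPath


def grammar_dirs(changed_files, is_grammar_dir):
    """Map changed files to their enclosing grammar directories, without duplicates."""
    grammars = []
    for changed in changed_files:
        d = PurePosixPath(changed).parent
        # Walk up until a grammar directory is found or the top is reached.
        while d != d.parent and not is_grammar_dir(d):
            d = d.parent
        # Membership check: append each directory at most once.
        if is_grammar_dir(d) and str(d) not in grammars:
            grammars.append(str(d))
    return grammars


# Assumed set of grammar directories, for illustration only.
known = {"scala/scala3"}
files = [
    "scala/scala3/Scala3Lexer.g4",
    "scala/scala3/examples/01.scala",
    "readme.md",
]
result = grammar_dirs(files, lambda d: str(d) in known)
```

Two of the changed files resolve to the same directory, and the list still contains it once; the file at the repository root maps to no grammar directory.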

* Fix debug change.

* scala/scala3: add --3.0-migration option for Scala 2-compatible syntax

Add a --3.0-migration command-line flag that enables two Scala 2-style
constructs accepted by the Scala 3 compiler under -source:3.0-migration:
  - ._ wildcard import selectors (e.g. import scala.jdk.CollectionConverters._)
  - _ as a wildcard type argument (e.g. Seq[_])

Also fix post-DEDENT NEWLINE re-surface inside InBraces regions, so that
val definitions with indented RHS bodies (e.g. val x: T =\n  expr) inside
class bodies correctly emit a NEWLINE separator before the next member.

Implementation:
  - Scala3LexerBase (C# and Java): read --3.0-migration from command-line args;
    remove && newTop != Region.InBraces from the post-DEDENT NEWLINE condition
  - Scala3ParserBase (C# and Java): new parser base class exposing migration30()
  - Scala3Parser.g4: superClass = Scala3ParserBase; {migration30()}? predicates
    guard the USCORE alternatives in wildCardSelector and simpleType_
  - readme.md: document the new option; add examples 2246.scala2, 4043a/b.scala

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
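
The predicate gating can be modelled in a few lines. This is a toy illustration rather than the Scala3ParserBase code: the function `wildcard_selector_ok` is hypothetical, and only the flag name and the idea of a `migration30()` check come from the entry above.

```python
import sys


def migration30(argv=None):
    """Return True when the --3.0-migration flag is present in the arguments."""
    argv = sys.argv[1:] if argv is None else argv
    return "--3.0-migration" in argv


def wildcard_selector_ok(token, argv=None):
    """Accept '*' always; accept the Scala 2-style '_' only in migration mode."""
    if token == "*":
        return True
    if token == "_":
        return migration30(argv)
    return False


strict_underscore = wildcard_selector_ok("_", argv=[])
migration_underscore = wildcard_selector_ok("_", argv=["--3.0-migration"])
strict_star = wildcard_selector_ok("*", argv=[])
```

Keeping the check in one function means every guarded alternative shares a single switch.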

* Add testFileExtension.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 10:03:06 -04:00


Grammars-v4

This repository is a collection of formal grammars written for ANTLR v4.

Each root-level directory is named with the all-lowercase name of the language or file format that its grammar parses, for example java, cpp, csharp, or c.

FAQ

Please refer to the grammars-v4 Wiki.

Code of Conduct

Please refer to the House Rules.

Description
Grammars written for ANTLR v4, with the expectation that the grammars are free of actions.