
lexer: track wirelog grammar (crc32 keywords, string escapes) #2

Merged
justinjoy merged 2 commits into main from lexer-track-wirelog-grammar on Jun 29, 2026

@justinjoy
Collaborator

The wirelog dialect's lexer has changed in two ways since wiig last mirrored
it. This PR brings wiig's lossless tokenizer back in sync.

Changes

  • crc32 keywords — wirelog added crc32_ethernet and
    crc32_castagnoli to its lexer token type, between the UUID and
    string-function groups. Mirrored in wiig's token catalogue, keyword
    table, and kind-string switch, with keyword-test coverage.

  • string literal escapes — wirelog now treats a backslash before a
    quote or backslash (\" / \\) as an escaped byte that does not close
    a string literal. scan_string skips the escape pair rather than
    terminating, while still keeping the raw bytes (wiig is a formatter and
    does not decode escapes). A lone trailing backslash before the closing
    quote escapes that quote, so the literal is left unterminated and stays
    an ERROR token.
    Added coverage for escaped quote, escaped backslash, and the
    unterminated trailing-backslash case.

Verification

meson test -C builddir passes (wiig:lexer, wiig:skeleton).

Two atomic commits, each builds and tests green on its own.

The wirelog dialect added two CRC-32 checksum function keywords
(crc32_ethernet, crc32_castagnoli) between the UUID and string-function
groups of its lexer token type. Mirror them in wiig's token catalogue,
keyword table, and kind-string switch so the lossless tokenizer keeps
tracking the wirelog grammar one-for-one, and cover both in the keyword
test.
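A minimal sketch of the three places this commit touches (token catalogue, keyword table, kind-string switch), written in C on the assumption that wiig is a C project. Every identifier here (token_kind, TOK_CRC32_ETHERNET, TOK_CRC32_CASTAGNOLI, keyword_lookup, token_kind_str, and the kind strings) is a hypothetical stand-in, since the PR does not show wiig's actual names; only the two new keywords are listed and the neighbouring UUID and string-function entries are elided.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token catalogue excerpt. In wiig the two CRC-32 kinds sit
 * between the UUID and string-function groups, mirroring wirelog. */
typedef enum {
    TOK_IDENT,
    /* ... UUID-function keywords ... */
    TOK_CRC32_ETHERNET,
    TOK_CRC32_CASTAGNOLI,
    /* ... string-function keywords ... */
} token_kind;

typedef struct {
    const char *text;
    token_kind kind;
} keyword_entry;

/* Hypothetical keyword table: spelling -> token kind. */
static const keyword_entry keywords[] = {
    /* ... UUID-function keywords ... */
    { "crc32_ethernet",   TOK_CRC32_ETHERNET },
    { "crc32_castagnoli", TOK_CRC32_CASTAGNOLI },
    /* ... string-function keywords ... */
};

/* Classify an identifier-shaped lexeme of exactly len bytes (it need not be
 * NUL-terminated). Anything not in the table stays a plain identifier. */
static token_kind keyword_lookup(const char *text, size_t len)
{
    assert(text != NULL);
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        const char *kw = keywords[i].text;
        if (strlen(kw) == len && memcmp(kw, text, len) == 0)
            return keywords[i].kind;
    }
    return TOK_IDENT;
}

/* Hypothetical kind-string switch used for debugging and test output. */
static const char *token_kind_str(token_kind kind)
{
    switch (kind) {
    case TOK_IDENT:            return "IDENT";
    case TOK_CRC32_ETHERNET:   return "CRC32_ETHERNET";
    case TOK_CRC32_CASTAGNOLI: return "CRC32_CASTAGNOLI";
    }
    return "UNKNOWN";
}
```

Comparing by length first means a prefix such as crc32_ether, or a longer identifier that merely starts with a keyword, still falls through to TOK_IDENT.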

The wirelog lexer was fixed to treat a backslash before a quote or
backslash (\" or \\) as an escaped byte that does not close a string
literal. Match that in scan_string: skip the escape pair rather than
terminating, while still keeping the raw bytes (wiig is a formatter and
does not decode escapes). A lone trailing backslash before the closing
quote now leaves the literal unterminated, which stays an ERROR token.
Add coverage for escaped quote, escaped backslash, and the unterminated
trailing-backslash case.
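A minimal sketch of the escape rule described above, again in C as an assumption. The names scan_string's signature, string_scan, and string_status are hypothetical (wiig's real scanner is not shown). It only models what the commit message states: \" and \\ are skipped as pairs, any other backslash is an ordinary byte, and running out of input before a closing quote yields an unterminated (ERROR) literal. Other wirelog string rules, if any, are not modelled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STR_OK, STR_UNTERMINATED } string_status;

typedef struct {
    size_t len;           /* raw bytes consumed, including both quotes when closed */
    string_status status; /* STR_UNTERMINATED would become an ERROR token */
} string_scan;

/* Hypothetical scanner: src[0] must be the opening '"', n is the number of
 * bytes available. Escapes are skipped, never decoded, so the token text is
 * simply src[0 .. len). */
static string_scan scan_string(const char *src, size_t n)
{
    assert(n > 0 && src[0] == '"');
    size_t i = 1;
    while (i < n) {
        char c = src[i];
        if (c == '\\' && i + 1 < n && (src[i + 1] == '"' || src[i + 1] == '\\')) {
            i += 2; /* escape pair: keep both raw bytes, do not close the literal */
            continue;
        }
        if (c == '"')
            return (string_scan){ i + 1, STR_OK };
        i++; /* ordinary byte, including a backslash before any other byte */
    }
    /* Input ended first, e.g. "abc\" where the final quote was escaped. */
    return (string_scan){ n, STR_UNTERMINATED };
}
```

With this shape, the trailing-backslash case falls out of the escape rule itself: the backslash consumes the would-be closing quote, so the loop runs off the end of the input.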
justinjoy merged commit 193a5f5 into main on Jun 29, 2026
3 checks passed
justinjoy deleted the lexer-track-wirelog-grammar branch June 29, 2026 10:10