lexer: track wirelog grammar (crc32 keywords, string escapes) #2
Merged
The wirelog dialect added two CRC-32 checksum function keywords (crc32_ethernet, crc32_castagnoli) between the UUID and string-function groups of its lexer token type. Mirror them in wiig's token catalogue, keyword table, and kind-string switch so the lossless tokenizer keeps tracking the wirelog grammar one-for-one, and cover both in the keyword test.
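As a rough illustration of the three places such a keyword has to appear, here is a minimal sketch in C. The enum names, the `keyword_entry` struct, and the `lookup_keyword` and `kind_str` helpers are hypothetical; wiig's actual token catalogue and function names are not shown in this PR.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the real wiig enum is not shown in this PR. */
typedef enum {
    TOK_IDENT,
    TOK_CRC32_ETHERNET,
    TOK_CRC32_CASTAGNOLI
} tok_kind;

/* One row of the keyword table: spelling -> token kind. */
typedef struct {
    const char *text;
    tok_kind kind;
} keyword_entry;

static const keyword_entry keywords[] = {
    {"crc32_ethernet", TOK_CRC32_ETHERNET},
    {"crc32_castagnoli", TOK_CRC32_CASTAGNOLI},
};

/* Returns the keyword kind for s[0..len), or TOK_IDENT if it is not a keyword. */
static tok_kind lookup_keyword(const char *s, size_t len) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strlen(keywords[i].text) == len &&
            memcmp(keywords[i].text, s, len) == 0) {
            return keywords[i].kind;
        }
    }
    return TOK_IDENT;
}

/* Kind-string switch: every kind in the enum needs a case here as well. */
static const char *kind_str(tok_kind k) {
    switch (k) {
    case TOK_IDENT:            return "IDENT";
    case TOK_CRC32_ETHERNET:   return "CRC32_ETHERNET";
    case TOK_CRC32_CASTAGNOLI: return "CRC32_CASTAGNOLI";
    }
    return "UNKNOWN";
}
```

Keeping the enum, the table, and the switch in the same order as the upstream lexer makes a missed keyword easier to spot in review.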
The wirelog lexer was fixed to treat a backslash before a quote or backslash (\" or \\) as an escaped byte that does not close a string literal. Match that in scan_string: skip the escape pair rather than terminating, while still keeping the raw bytes (wiig is a formatter and does not decode escapes). A lone trailing backslash before the closing quote now leaves the literal unterminated, which stays an ERROR token. Add coverage for escaped quote, escaped backslash, and the unterminated trailing-backslash case.
The wirelog dialect's lexer changed in two ways since wiig last mirrored
it. This brings the lossless tokenizer back in sync.
Changes
crc32 keywords: wirelog added crc32_ethernet and crc32_castagnoli to
its lexer token type, between the UUID and string-function groups.
Mirrored in wiig's token catalogue, keyword table, and kind-string
switch, with keyword-test coverage.
string literal escapes: wirelog now treats a backslash before a quote
or backslash (\" or \\) as an escaped byte that does not close a string
literal. scan_string skips the escape pair rather than terminating,
while still keeping the raw bytes (wiig is a formatter and does not
decode escapes). A lone trailing backslash before the closing quote
leaves the literal unterminated, which stays an ERROR token. Added
coverage for escaped quote, escaped backslash, and the unterminated
trailing-backslash case.
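The escape rule can be sketched in C roughly as follows. This is not wiig's actual scan_string: the `string_scan` result type, the `scan_string_sketch` name and signature, and the choice to run an unterminated literal to the end of the input are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type; wiig's real scan_string signature is not shown. */
typedef struct {
    size_t len;      /* raw bytes covered, including quotes and escape pairs */
    bool terminated; /* false means the caller would emit an ERROR token */
} string_scan;

/* Scans a string literal that starts at src[0] == '"' within n bytes. */
static string_scan scan_string_sketch(const char *src, size_t n) {
    string_scan r = {0, false};
    if (n == 0 || src[0] != '"') {
        return r; /* not a string literal at all */
    }
    size_t i = 1;
    while (i < n) {
        char c = src[i];
        if (c == '\\' && i + 1 < n &&
            (src[i + 1] == '"' || src[i + 1] == '\\')) {
            i += 2; /* skip the escape pair; the raw bytes stay in the token */
            continue;
        }
        if (c == '"') {
            r.len = i + 1; /* include the closing quote */
            r.terminated = true;
            return r;
        }
        i++;
    }
    /* Assumption: an unterminated literal covers the rest of the input. */
    r.len = n;
    return r;
}
```

With this rule, the input `"a\"` is unterminated: the backslash pairs with the final quote, so nothing is left to close the literal.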
Verification
meson test -C builddir green (wiig:lexer, wiig:skeleton).
Two atomic commits, each builds and tests green on its own.