Migrate to Runt by Nikil-Shyamsunder · Pull Request #252 · cucapra/protocols

Nikil-Shyamsunder · 2026-06-23T03:17:20Z

Runt-based snapshot testing: catalog-driven workflow

TL;DR for contributors

Add/edit a test: edit scripts/test_catalog.py, run
python3 scripts/generate_runt_configs.py, then runt -s runt/<suite> to
save the golden .expect file. Commit the catalog, the regenerated
runt.toml, and the new .expect.
Run tests: just runt (or runt runt/interp, runt runt/monitor,
runt runt/graph_interp).
Never hand-write runt.toml; it is regenerated from the catalog.

How it works

scripts/test_catalog.py -> scripts/generate_runt_configs.py
-> runt/<suite>/runt.toml -> runt  
->  <test_dir>/expects/<stem>.<runner>.expect

The catalog holds only the facts that can't be derived from the .prot/.tx/
.v files themselves (how a test wires up to its protocol/RTL and its expected
outcome). Everything else — commands, expect-file names, which protocol features
a test uses — is computed by the generator.

The catalog (`scripts/test_catalog.py`)

TODO: Rename TX to Interp
I previously presented a design in which protocols, interp cases, and monitor cases were all first-class objects in the catalog. Now only interp and monitor cases are first-class, because protocols are not really tests. Interp and monitor cases just point to protocols, and when we need data about the protocol used (i.e. which constructs it uses), we derive it later, while generating the Runt configs. This minimizes the duplicated test metadata you need to check in.

`TX_CASES` — interpreter / graph-interpreter cases, keyed by `.tx` path

"tests/adders/adder_d0/add_combinational.tx": {
    "protocol": "tests/adders/adder_d0/add_d0.prot",   # the .prot it runs against
    "verilog": ("tests/adders/adder_d0/add_d0.v",),     # RTL (optional)
    "top": "picorv32_pcpi_mul",                          # top module (optional)
    "expect": "pass",                                    # "pass" or a failure class
    # "max_steps": 8,        (optional)
    # "extra_args": ("--skip-static-step-fork-checks",),  (optional)
},

expect is "pass", or a failure-class string for expected failures
(comb_dependency, assertion_mismatch, assignment_conflict,
fork_protocol_error, static_type_error, static_well_formedness,
max_steps).
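The accepted expect values can be checked mechanically before any config is generated. A minimal sketch, assuming a hypothetical helper named validate_expect (it is not taken from the PR) and using only the failure classes listed above:

```python
# Failure classes listed in the description above; "pass" means the run succeeds.
FAILURE_CLASSES = frozenset(
    {
        "comb_dependency",
        "assertion_mismatch",
        "assignment_conflict",
        "fork_protocol_error",
        "static_type_error",
        "static_well_formedness",
        "max_steps",
    }
)


def validate_expect(case_key: str, case: dict) -> str:
    """Return the case's expect value, or raise if it is not a known outcome.

    Hypothetical helper for illustration; the real generator may validate differently.
    """
    expect = case.get("expect")
    if expect != "pass" and expect not in FAILURE_CLASSES:
        raise ValueError(f"{case_key}: unknown expect value {expect!r}")
    return expect
```

Failing early on a misspelled failure class keeps a typo in the catalog from turning into a confusing snapshot mismatch later.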

`MONITOR_CASES` — monitor cases, keyed by a unique id

Monitors are keyed by id (not path) because many cases can share one .prot
(e.g. the antmicro cases differ only by waveform).

"tests.wishbone.wishbone.monitor": {
    "protocol": "tests/wishbone/wishbone.monitor.prot",
    "wave": "tests/wishbone/reqwalker.vcd",
    "instances": ("TOP.reqwalker:WBSubordinate",),
    "expect": "pass",
    "extra_args": ("--sample-posedge", "TOP.reqwalker.i_clk"),
    # "max_steps" / "timeout_secs": (optional)
},

Programmatically Generating Cases

The large antmicro family is generated from a list of trace stems via a small
helper at the bottom of the file. I recommend doing something like this whenever we have to procedurally generate a number of cases.
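A sketch of what such a helper could look like. Everything concrete here is a placeholder: the stems, the paths, the instance string, and the name make_wave_cases are hypothetical, and the .vcd extension is assumed from the wishbone example above. The real helper lives in scripts/test_catalog.py and may differ.

```python
# Placeholder values for illustration only; the real stems, paths, and
# instance strings live in scripts/test_catalog.py.
EXAMPLE_STEMS = ["trace_a", "trace_b"]


def make_wave_cases(stems, protocol, wave_dir, instances, prefix):
    """Build one monitor case per waveform stem, all sharing one protocol."""
    cases = {}
    for stem in stems:
        case_id = f"{prefix}.{stem}"
        if case_id in cases:
            raise ValueError(f"duplicate case id: {case_id}")
        cases[case_id] = {
            "protocol": protocol,
            "wave": f"{wave_dir}/{stem}.vcd",
            "instances": tuple(instances),
            "expect": "pass",
        }
    return cases


MONITOR_CASES = {}
MONITOR_CASES.update(
    make_wave_cases(
        EXAMPLE_STEMS,
        protocol="tests/example/example.monitor.prot",  # hypothetical path
        wave_dir="tests/example/waves",  # hypothetical directory
        instances=("TOP.dut:Example",),  # hypothetical instance binding
        prefix="tests.example",
    )
)
```

Keying each generated case by prefix plus stem keeps the ids unique even though every case shares the same .prot.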

The suites

Exactly three, and their union covers every test:

| suite | runner | contents |
| --- | --- | --- |
| `interp` | interpreter | every `TX_CASES` entry |
| `monitor` | monitor | every `MONITOR_CASES` entry |
| `graph_interp` | graph interpreter | the subset of passing tx whose protocol has no `for`/`repeat` loop |

Each suite is defined by a series of filters over the test cases. Since this is all Python, you can do arbitrary filtering.
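The three suites can be expressed as plain filters over the two catalog dicts. This is a sketch under assumptions: build_suites and the is_loop_free predicate are hypothetical names, and the real generator may structure its filters differently.

```python
def build_suites(tx_cases, monitor_cases, is_loop_free):
    """Return a mapping of suite name -> sorted list of case keys.

    `is_loop_free(protocol_path)` is a caller-supplied predicate (an assumption
    of this sketch) that reports whether a protocol avoids for/repeat loops.
    """
    return {
        "interp": sorted(tx_cases),
        "monitor": sorted(monitor_cases),
        "graph_interp": sorted(
            key
            for key, case in tx_cases.items()
            if case["expect"] == "pass" and is_loop_free(case["protocol"])
        ),
    }
```

Sorting the keys keeps the generated output stable from run to run, which matters when the result is checked into the repository.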

How features are detected (graph_interp selection)

The graph interpreter can't yet handle for-in/repeat loops. Instead of
tracking an allowlist, the generator asks the protocol compiler which constructs
each protocol uses:

cargo run --bin protocols-cli -- -p <file>.prot constructs

This prints the AST-derived constructs per protocol definition. A passing tx is
included in graph_interp iff its protocol uses no for_in_loop/repeat_loop.
This reads the real AST via an EnumDiscriminant macro, so adding new Stmt types to the AST or adding new test cases requires no extra maintenance.
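The selection rule above can be sketched as follows. The output format of the constructs subcommand is not shown in this description, so the sketch deliberately skips parsing and starts from an already-parsed mapping. The function names are hypothetical, and treating every definition in the .prot file as relevant to the tx is a simplifying assumption.

```python
LOOP_CONSTRUCTS = frozenset({"for_in_loop", "repeat_loop"})


def constructs_command(prot_path: str) -> list[str]:
    """argv for the constructs query quoted above."""
    return ["cargo", "run", "--bin", "protocols-cli", "--", "-p", prot_path, "constructs"]


def graph_interp_eligible(expect: str, constructs_by_definition: dict) -> bool:
    """A passing tx qualifies iff no definition in its protocol uses a loop construct.

    `constructs_by_definition` maps a protocol definition name to the set of
    construct names it uses. How that mapping is parsed from the CLI output is
    left out, because the output format is not known here.
    """
    if expect != "pass":
        return False
    return all(
        LOOP_CONSTRUCTS.isdisjoint(used) for used in constructs_by_definition.values()
    )
```

Because the check is phrased as "no loop construct present" rather than as a list of allowed constructs, a new non-loop Stmt type would not exclude a test from graph_interp.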

Expected failures and timeouts

Expected failures (expect != "pass"): the runner prints its diagnostic
to stdout and exits non-zero; Runt captures both the message and the exit code
in the .expect snapshot, so failure output is diff-tested like everything
else.
Expected timeouts: unfortunately, Runt cannot handle an expected timeout. A few monitor cases are non-terminating by design and
set timeout_secs. The generated command wraps them in a small timeout script
(it kills the process group and exits 124), and that exit code is something we can expect in Runt.
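A wrapper with the behaviour described above (kill the whole process group, exit 124) could look like the sketch below. It is not the script from the PR: run_with_timeout is a hypothetical name, and the code assumes a POSIX system, since it relies on process groups.

```python
import os
import signal
import subprocess


def run_with_timeout(argv, timeout_secs):
    """Run argv in its own process group; on timeout kill the group and return 124."""
    # start_new_session=True makes the child a session and process-group leader
    # (POSIX only), so its pid doubles as the process-group id.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()  # reap the child so it does not linger as a zombie
        return 124
```

Killing the group rather than only the child matters when the command spawns helpers of its own (cargo run does), and 124 matches the exit status GNU timeout uses, so the snapshot reads the same either way.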

Auto-generated naming of .expect files

Expect file: <test_dir>/expects/<stem>.<runner>.expect
(e.g. add_combinational.interp.expect). The <runner> keeps a .tx's
interp and graph_interp goldens from colliding; monitor antmicro cases are
named by their waveform stem.
The generator errors out if two cases would ever map to the same expect file
with different commands, so collisions can't silently clobber a golden.
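The naming scheme and the collision guard can be sketched in a few lines. The function names are hypothetical and the real generator may organize this differently; only the path pattern and the "same file, different command" rule come from the description above.

```python
from pathlib import PurePosixPath


def expect_path(case_path: str, runner: str) -> str:
    """<test_dir>/expects/<stem>.<runner>.expect for a test input file."""
    p = PurePosixPath(case_path)
    return str(p.parent / "expects" / f"{p.stem}.{runner}.expect")


def check_collisions(planned):
    """planned: iterable of (expect_file, command) pairs.

    Raises if one expect file would be written by two different commands.
    Identical (file, command) pairs are tolerated.
    """
    seen = {}
    for expect_file, command in planned:
        previous = seen.setdefault(expect_file, command)
        if previous != command:
            raise ValueError(
                f"{expect_file} is claimed by two commands:\n  {previous}\n  {command}"
            )
    return seen
```

For example, expect_path("tests/adders/adder_d0/add_combinational.tx", "interp") yields tests/adders/adder_d0/expects/add_combinational.interp.expect, matching the example named above.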

CI

In .github/workflows/test.yml, the tests job:

builds the workspace,
regenerates the configs and fails if they differ from what's checked in
(python3 scripts/generate_runt_configs.py + git diff) — so a stale
runt.toml can't slip through,
runs cargo test,
runs runt/interp, runt/graph_interp, runt/monitor.
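The config-freshness step amounts to "regenerate, then ask git whether anything changed". A sketch in Python, assuming the check is limited to the runt/ directory; the function name and the injectable run parameter are hypothetical, and the actual workflow step may be written directly in shell.

```python
import subprocess


def configs_are_fresh(run=subprocess.run) -> bool:
    """Regenerate the Runt configs, then report whether the working tree is unchanged."""
    run(["python3", "scripts/generate_runt_configs.py"], check=True)
    # `git diff --exit-code` exits with status 1 when tracked files have changes.
    diff = run(["git", "diff", "--exit-code", "--", "runt"])
    return diff.returncode == 0
```

Note that git diff only reports tracked files, so a freshly generated runt.toml that was never committed would not be flagged by this check alone.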

Runt is installed in CI from our fork
(cargo install --git https://github.com/Nikil-Shyamsunder/runt.git), which adds
the custom expect file naming behavior these suites rely on.

Other repo-level organization changes

All snapshot tests now live under a common tests/ (and examples/) tree instead of being scattered per crate, with one catalog describing them. Expect files for each test family live in that family's expects/ directory.
All turnt-related files have been deleted.

Files that should be reviewed by hand

scripts/test_catalog.py - hand-maintained catalog (the only thing you edit to add tests)

scripts/generate_runt_configs.py - generates runt/*/runt.toml from the catalog (the place to edit when you want to make a new suite)

.github/workflows/test.yml - CI: config-freshness check + the three suites

ast.rs and cli/main.rs - a few changes that allow us to print all the constructs in a .prot file

ekiwi · 2026-06-24T01:34:45Z

unfortunately runt can not handle an expected timeout. a few monitor cases are non-terminating by design

Can you point me to these tests? That seems a little sketch.

Nikil-Shyamsunder added 5 commits June 22, 2026 23:16

switch to runt

b624a20

coalesce tests

d02ab8c

prettify

38cca7a

ruff format and delete the error messages

e49e8c7

expand to all passing tests for graph-interpreter that dont have non-determinism remove all the .out files and .err fiels from turnt

0f5c254

Nikil-Shyamsunder changed the title ~~Monstor Runt Migration~~ Monster Runt Migration Jun 23, 2026

move to a custom runt version in CI

b37bcb6

serialize all runt tests

Nikil-Shyamsunder force-pushed the runt branch from 978ec96 to d88accf Compare June 23, 2026 15:06

resolve failing monitor tests missing vcd

fb9f728

Nikil-Shyamsunder force-pushed the runt branch from d88accf to fb9f728 Compare June 23, 2026 15:12

switch to cli command printing constructs and simplify the test catalog

14dab17

Nikil-Shyamsunder force-pushed the runt branch from e0b3bb5 to fb8bf66 Compare June 23, 2026 20:54

simplify generate_runt_configs

14301f8

Nikil-Shyamsunder force-pushed the runt branch 3 times, most recently from 090d682 to 5316520 Compare June 23, 2026 21:19

check runt configs are up to date in CI

48e91c9

and remove unecessary runt targets

Nikil-Shyamsunder force-pushed the runt branch from 5316520 to 48e91c9 Compare June 23, 2026 21:26

Nikil-Shyamsunder marked this pull request as ready for review June 23, 2026 22:17

ekiwi reviewed Jun 24, 2026

Comment thread .github/workflows/test.yml Outdated

simplify naming conventions

2d26811

ekiwi reviewed Jun 24, 2026

Comment thread scripts/roundtrip_case.py Outdated

delete roundtrip for now and move test freshness to its own job

b2b49bb

Nikil-Shyamsunder changed the title ~~Monster Runt Migration~~ Migrate to Runt Jun 24, 2026

Nikil-Shyamsunder force-pushed the runt branch from 933e177 to 2d26811 Compare June 24, 2026 14:08

delete a lingering markdown doc

ba7210d

Nikil-Shyamsunder merged commit 0c9ac13 into main Jun 24, 2026
22 checks passed

Nikil-Shyamsunder merged 13 commits into main from runt
