test(scatter): reproduce #1592 dynamic-address GM scatter deadlock in cube+vec mix scope #1627
…s#1592 Add FusedMatmulDynScatterProgram + block-table helper: the scatter row index is read from a GM block table (dynamic-address pto.tstore) on the vec side of a cube+vec mix scope — the variant that compiles clean but deadlocks at runtime (507018) per the hw-native-sys#1592 isolation matrix.
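A minimal host-side sketch of what the dynamic-address variant computes. It helps explain why its golden can match the static #1564 variant. Everything here is an assumption for illustration: `matmul`, `make_block_table` (a hypothetical stand-in for the PR's `_make_block_table`, assumed to return the identity mapping), `dyn_scatter_ref`, `static_scatter_ref`, the dynamic layout `cache[block_table[b] * CACHE]`, and the static layout `cache[b * CACHE]`. The two references differ only in where the row offset comes from: the dynamic one reads it from the block table at runtime, and the static one uses the loop index directly.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, w: Matrix) -> Matrix:
    """Plain (M x K) @ (K x N) product, standing in for the cube-side matmul."""
    k, n = len(w), len(w[0])
    return [[sum(row[i] * w[i][j] for i in range(k)) for j in range(n)] for row in a]


def make_block_table(batch: int) -> List[int]:
    """HYPOTHETICAL stand-in for _make_block_table: identity mapping b -> b."""
    return list(range(batch))


def dyn_scatter_ref(a: Matrix, w: Matrix, cache: Matrix,
                    block_table: List[int], cache_stride: int) -> Matrix:
    """Dynamic-address scatter: the destination row is read from block_table."""
    out = matmul(a, w)
    result = [row[:] for row in cache]  # leave the input cache untouched
    for b, row in enumerate(out):
        # Analogue of: cache_row = pl.read(block_table, [b]) * CACHE
        cache_row = block_table[b] * cache_stride
        result[cache_row] = row[:]
    return result


def static_scatter_ref(a: Matrix, w: Matrix, cache: Matrix, cache_stride: int) -> Matrix:
    """Static-address scatter (assumed #1564-style layout): row b goes to b * stride."""
    out = matmul(a, w)
    result = [row[:] for row in cache]
    for b, row in enumerate(out):
        result[b * cache_stride] = row[:]
    return result


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    cache = [[0.0] * 3 for _ in range(4)]  # 2 blocks x CACHE=2 rows
    same = dyn_scatter_ref(a, w, cache, make_block_table(2), 2) == static_scatter_ref(a, w, cache, 2)
    print(same)  # True
    print(dyn_scatter_ref(a, w, cache, [1, 0], 2)[0])  # [3.0, 4.0, 7.0]
```

With an identity table the two references agree, which is consistent with the goldens being shared. A permuted table (e.g. `[1, 0]`) shows that the write targets are genuinely data-dependent, which is the property the kernel can only discover at runtime.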
Code Review
This pull request introduces new test cases and programs to address Issue #1592, which involves a fused matmul followed by a dynamic-address scatter operation where the row index is read from a block table. Specifically, it adds the _make_block_table helper, FusedMatmulDynScatterProgram, FusedMatmulDynScatterNoSplitProgram, and their corresponding test cases and test runners to verify behavior under both SplitMode.UP_DOWN and SplitMode.NONE. There are no review comments, so no additional feedback is provided.
Summary
Adds a runtime regression repro for #1592 to `tests/st/runtime/test_fused_matmul_scatter.py`: a cube matmul fused with a per-row, block-table-indirected (dynamic-address) GM scatter in one `CORE_GROUP` mix scope, in both `SplitMode.NONE` and `SplitMode.UP_DOWN`.

Per the issue's isolation matrix, this dynamic-address WRITE on the vec side of a cube+vec mix scope compiles cleanly but deadlocks at runtime (`507018`), while the static-address WRITE (#1564, already in this file) runs. This PR adds only the repro, with no fix yet, to confirm the behavior on a2a3 CI before committing to a fix.

New programs/cases (golden identical to the #1564 static variants; only the addressing mode differs):

- `FusedMatmulDynScatterProgram` / `…NoSplitProgram`: the scatter row is read from a block table at runtime (`cache_row = pl.read(block_table, [b]) * CACHE`; the `* CACHE` keeps the offset `index`-typed so codegen is clean, matching the issue).
- `TestFusedMatmulDynScatter` / `TestFusedMatmulDynScatterNoSplit`.

Why a repro-only PR
I grounded the diagnosis offline by generating the post-PTOAS `aiv.cpp` via `--codegen-only --save-kernels`. Findings:

- The two lane branches (the `subblockid==0` real path vs. the `else` replay) are independently pipe-flag balanced, matching the reporter's note that the outer `set_flag`/`wait_flag` frame is balanced. This disproves the lane-asymmetry theory.

So it is possible that this minimal repro does not deadlock. CI on a2a3 is the decisive signal:

- Deadlocks (`507018`) → repro confirmed; proceed with the fix on this branch.
- … `models/deepseek/v4/decode_indexer_compressor.py` shape (the `gate(c_idx)` guard + block-table-via-tile-load path).

Testing
- … (`--codegen-only`) for both NONE and UP_DOWN

Related
#1592
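The offline finding that the `subblockid==0` branch and the `else` replay are each pipe-flag balanced can be checked mechanically on the saved kernel source. The sketch below is a hypothetical helper (`flag_imbalance` is not part of the repo). It assumes the generated code spells the intrinsics as `set_flag(SRC_PIPE, DST_PIPE, EVENT)` / `wait_flag(SRC_PIPE, DST_PIPE, EVENT)` with plain identifier arguments, and that each branch body has already been extracted by hand. It only compares counts per (src, dst, event) triple, so passing is a necessary but not sufficient condition for deadlock freedom: ordering, loop trip counts, and cross-core cube/vec synchronization are not modelled.

```python
import re
from collections import defaultdict
from typing import Dict, Tuple

# Matches e.g. "set_flag(PIPE_V, PIPE_MTE3, EVENT_ID0)"; assumes plain identifier args.
FLAG_CALL = re.compile(
    r"\b(set_flag|wait_flag)\s*\(\s*(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*\)"
)


def flag_imbalance(branch_src: str) -> Dict[Tuple[str, str, str], int]:
    """Return {(src_pipe, dst_pipe, event): sets - waits} for every unbalanced triple.

    An empty dict means every set_flag in the snippet has a matching wait_flag
    by count. Ordering and loop trip counts are NOT checked.
    """
    net: Dict[Tuple[str, str, str], int] = defaultdict(int)
    for kind, src_pipe, dst_pipe, event in FLAG_CALL.findall(branch_src):
        net[(src_pipe, dst_pipe, event)] += 1 if kind == "set_flag" else -1
    return {key: n for key, n in net.items() if n != 0}


if __name__ == "__main__":
    # Hypothetical branch bodies, hand-extracted from a saved aiv.cpp.
    real_branch = """
        set_flag(PIPE_MTE2, PIPE_V, EVENT_ID0);
        wait_flag(PIPE_MTE2, PIPE_V, EVENT_ID0);
        set_flag(PIPE_V, PIPE_MTE3, EVENT_ID1);
        wait_flag(PIPE_V, PIPE_MTE3, EVENT_ID1);
    """
    replay_branch = """
        set_flag(PIPE_V, PIPE_MTE3, EVENT_ID1);
    """
    print(flag_imbalance(real_branch))    # {}
    print(flag_imbalance(replay_branch))  # {('PIPE_V', 'PIPE_MTE3', 'EVENT_ID1'): 1}
```

Run it separately on each branch body. An empty dict for both matches the "independently balanced" observation, and any nonzero entry points at the (src, dst, event) triple to inspect.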