fix(passes): outlined tile.store SSA scope violation via idempotent ConvertToSSA + escaping promotion (fixes #1693)#1722
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR extends the IR SSA conversion pipeline to run a second pass after outlining, ensuring outlined functions maintain SSA validity. The implementation uniformly applies ChangesPost-Outline SSA Conversion with Dynamic-Type Consistency
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request registers and documents a second SSA conversion pass (ConvertToSSA_post_outline) to run after outlining, ensuring that outlined functions maintain valid SSA variables. To support this, the ConvertToSSA pass is made idempotent by threading type-embedded variable references through SubstType during variable construction. However, a critical issue was identified in src/ir/transforms/convert_to_ssa_pass.cpp where FindInitValue compares types using pointer equality. Because SubstType returns newly allocated type pointers, this pointer comparison fails for substituted types of escaping variables, causing FindInitValue to return nullptr and fallback to placeholder variables. It is recommended to update FindInitValue to compare the original, unsubstituted types instead.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
…mes propagate The post-outline ConvertToSSA invocation introduced in PR hw-native-sys#1722 broke orchestration codegen for any program using ``pl.spmd(<dynamic>)``: the generated C++ referenced ``cores`` without declaring it (``'cores' was not declared in this scope``). Cause: ``OutlineIncoreScopes`` lifts the SpmdScopeStmt's ``core_num`` onto the outlined Spmd function as a function-level attr (``mul_spmd.attrs["core_num"]`` — an ExprPtr that references a Var defined in the dispatching Orchestration function — see ``include/pypto/ir/transforms/utils/scope_outline_utils.h:778``). The ConvertToSSA pass was function-level; re-running it on the orchestration function re-versioned the ``cores`` AssignStmt LHS but had no visibility into the sibling Spmd function's attr, leaving it pointing at the stale pointer. Simplify's ``CollectCoreNumReferencedVars`` cross-function DCE protection then mis-fired (stale attr pointer ≠ new AssignStmt LHS pointer) and DCE dropped the ``cores = ...`` assignment. Fix: promote ConvertToSSA from ``CreateFunctionPass`` to ``CreateProgramPass``. The new driver runs in two phases: 1. Per-function SSA conversion (unchanged behaviour). Each converter's final ``cur_`` table is accumulated into a single program-wide rename map (original Var pointer → final SSA Var). 2. Walk every function's attrs and substitute Var refs per the program-wide map. Currently only ``core_num`` (ExprPtr) needs handling — extension point documented at ``SubstFunctionAttrs``. The per-function converter is exposed via a new ``SSAConverter::cur()`` accessor; ``TransformConvertToSSAFunction`` (the renamed single-function entry) now returns the new function plus its renames. Tests: - New ``test_rerun_propagates_renames_into_sibling_function_core_num_attr`` in ``TestIdempotence`` exercises the exact failure: drive the default pipeline through the post-outline ConvertToSSA, then assert the sibling Spmd function's ``core_num`` attr Var refs share the same pointer identity as the orchestration body's ``cores`` AssignStmt LHS. - Local repro on ``tests/st/runtime/cross_core/test_spmd_dynamic_core_num.py`` confirms the orchestration cpp now contains ``int64_t cores = ((b_dim * 3) / 4);`` (the device-side ``simpler_init`` failure that remains is the card-0 environmental issue, unrelated to codegen). - Full ut suite: 5904 passed; one pre-existing unrelated failure in ``test_distributed_worker_call_stores_last_run_timing`` (an upstream ``self._multi_program`` attribute bug introduced by hw-native-sys#1687, not touched by this PR).
PR hw-native-sys#1722 broke dist-system-tests with numerical mismatches and ``recv_count=[[0,0,0,0]]`` patterns. Root cause: the post-outline ConvertToSSA invocation re-versioned Vars that were already in auto-name SSA form (``foo__ssa_v0`` / ``foo__tmp_v1`` / ``foo__idx_v0`` etc.), minting fresh Var pointers with *colliding* ``name_hint_`` strings. The collision shows up downstream in host_orch codegen (``src/codegen/distributed/distributed_codegen.cpp`` / ``distributed_ops_codegen.cpp``), which uses the raw ``name_hint_`` as a Python identifier — e.g. ``tensors["<name_hint>"]``, ``CommBufferSpec(name="<name_hint>")``. Two Vars with the same ``name_hint_`` collapse into one Python slot. For the allgather repro: tensors["t__ssa_v0"] = tensors["inputs__ssa_v0"][r, 0:1, 0:64] tensors["t__ssa_v0"] = tensors["outputs__ssa_v0"][r, 0:1, 0:128] t__ssa_v0 = world_size ... _ta_0.add_tensor(make_tensor_arg(tensors["t__ssa_v0"]), INPUT) _ta_0.add_tensor(make_tensor_arg(tensors["t__ssa_v0"]), OUTPUT_EXISTING) _ta_0.add_scalar(((r__idx_v0 + 1) % t__ssa_v0)) The input and output tensor slots both resolve to the *output* slice (last write to the dict key); the peer scalar reads ``world_size`` instead of the world-size-mod result. All distributed ops then dispatch with garbage args. Fix: ``AllocVersion`` short-circuits when the input Var is auto-name- versioned and has not been seen yet as a write site in this function. Keep its pointer identity in ``cur_`` instead of minting a new Var. This gives true idempotence — re-running ConvertToSSA on already-SSA IR is a no-op for the renaming step, and: - Cross-function refs (e.g. an outlined Spmd function's ``core_num`` ExprPtr, a ``DistributedTensorType``'s ``window_buffer_`` field) that the per-function ``cur_`` cannot reach stay valid. - Codegen paths keyed on ``name_hint_`` don't collide. - Escape promotion for hw-native-sys#1693 still works because the inside-loop ``_store_var`` synth'd by ``StoreEvalToAssignMutator`` is *not* auto-versioned — it goes through the normal versioning path. Auto-versioned but multi-assigned Vars (e.g. ``break__tmp_v0`` from ``CtrlFlowTransform``) still need re-versioning. The ``already_seen`` flag (``cur_.find(key) != cur_.end()``) distinguishes a first-time write (preserve identity) from a re-assignment (version normally). Tests: - Local repro on ``tests/st/distributed/test_l3_allgather.py`` now generates ``tensors["t__tmp_v1"]`` / ``tensors["t__tmp_v2"]`` / ``t__tmp_v3`` — three distinct keys, no collision. - Full ut suite: 5904 passed; only the pre-existing unrelated ``test_distributed_worker_call_stores_last_run_timing`` failure from upstream hw-native-sys#1687's ``self._multi_program`` attribute bug.
``SubstType`` returns a freshly-allocated ``shared_ptr<TensorType>`` (or similar) when any embedded Var ref gets renamed; two structurally identical substituted types are *not* pointer-equal. The escaping path in ``ConvertFor`` / ``ConvertWhile`` was passing the substituted type to ``FindInitValue``, which compares via raw pointer equality. With dynamic shapes (Var-embedded types) the comparison fails even when an Out/InOut param has the same logical type, sending escape promotion to the placeholder fallback (``foo__FREE_VAR``) the SSA verifier rejects. Fix: pass the original ``type_it->second`` (un-substituted from ``TypeCollector``) and compare against the params' / pre-loop Vars' original ``GetType()`` pointers, which still point at the same TypePtr identities from the input IR. The substituted ``type`` is still used for the new iter_arg / placeholder Var construction. Per Gemini's review on PR hw-native-sys#1722.
|
Re: Gemini's review on The original concern is theoretical — Keeping the original substituted-type comparison for now. Leaving this thread open so it stays visible if the dynamic-shape edge case ever surfaces — a future fix would need a separate test demonstrating the failure first. |
|
Update on the FindInitValue thread: re-applied the fix in 86523a7. Earlier I attributed the dist-system-tests regression (test_l3_ep_dispatch_combine, max diff = 40.5) to this change, but on closer inspection that test has a known intermittent synchronisation precision bug unrelated to ConvertToSSA. The original-type comparison is the correct semantics — keeping it. |
The pass doc claims "Preserves existing SSA structure" and "Keep
existing SSA constructs unchanged", but re-running ConvertToSSA on
already-SSA IR with dynamic shape / valid_shape / stride Var refs
embedded in Tile/Tensor type annotations corrupted those refs. Every
AssignStmt LHS was unconditionally re-versioned (cur_[OLD] = NEW), but
the iter_arg / phi / return_var construction sites read the type from
the old Var verbatim — type-embedded refs to OLD survived as dangling
pointers, and SSAVerify reported "used before definition".
Fix: thread every Var construction site that reads a type from an
existing Var (or from TypeCollector.types[]) through SubstType so
renamed refs stay consistent.
Sites fixed:
- ConvertFor: existing iter_args (line 542), carried iter_arg/rv
init types (603/608), escaping iter_arg/rv types (614/624/629),
loop_var type (636), existing return_vars push (656).
- ConvertWhile: mirror of ForStmt (696, 748/753, 759/765/771, 792).
- ConvertIf: no-phi rv (877), phi var (900), mixed rv (919).
For the existing return_vars push, freshening requires a new Var
pointer when SubstType actually changes the type; cur_ is updated to
map the old rv pointer to the new one so subsequent uses substitute
correctly. Removed the now-redundant RegisterExistingReturnVars
helper — its job is folded into the freshen-and-push loop.
Tests: added TestIdempotence with two regression cases (IfStmt phi
type with dynamic valid_shape, ForStmt iter_arg type with dynamic
valid_shape). Both fail on baseline with "used before definition"
and pass after the fix. Full ut suite: 5867 pass, 3 pre-existing
unrelated runtime failures (Worker.current attribute) match main.
Docs: tighten the existing "Preserves existing SSA structure" line
into an explicit idempotence guarantee covering dynamic-shape Var
refs in type annotations; mirror the same in zh-cn.
…-sys#1693) OutlineScope's StoreEvalToAssignMutator rewrites EvalStmt-form ``pl.tile.store(t, ..., target)`` into ``_store_var = pl.tile.store(...)`` so the outlined function can return the post-store SSA name. When the store sits inside a for/while/if body, the synthesised AssignStmt is inside-loop while the ReturnStmt that references ``_store_var`` is at function-body depth — SSA scope violation, surfaced as the ``__FREE_VAR`` suffix the SSA verifier rejects. Fix: re-run ConvertToSSA right after OutlineClusterScopes (the last outline pass). The escaping-variable promotion in ConvertToSSA detects "var defined inside loop, used after loop", synthesises IterArg + YieldStmt for each enclosing for/while, and FindInitValue picks the matching Out/InOut function param as the initial value. The outlined ``return _store_var`` becomes ``return _store_var__rv_vN`` referencing a function-scope yield result — SSA-legal, matches what the user would have written by hand with the equivalent native-DSL escaping store pattern. This relies on ConvertToSSA being idempotent on already-SSA IR (c432fd19 — the first pass's outputs survive the second pass unchanged). Pipeline-level effect: ``ConvertToSSA_post_outline`` is added between ``OutlineClusterScopes`` and ``ConvertTensorToTileOps``. The original 4th-position ``ConvertToSSA`` is untouched. Tests: - ``tests/ut/ir/transforms/test_pass_manager.py``: add the new entry to ``TENSOR_ONLY_PASSES`` so the default-strategy inventory check matches. - Existing outline pass tests (74 cases) are unaffected — they exercise ``outline_*()`` directly, not the full pipeline. - Full ut suite: 5867 passed, 3 pre-existing unrelated runtime failures (Worker.current attribute) match main. Docs: note the second invocation point in both ``docs/{en,zh-cn}/dev/passes/04-convert_to_ssa.md`` and add the post-outline row to ``00-pass_manager.md``'s pipeline table.
…mes propagate The post-outline ConvertToSSA invocation introduced in PR hw-native-sys#1722 broke orchestration codegen for any program using ``pl.spmd(<dynamic>)``: the generated C++ referenced ``cores`` without declaring it (``'cores' was not declared in this scope``). Cause: ``OutlineIncoreScopes`` lifts the SpmdScopeStmt's ``core_num`` onto the outlined Spmd function as a function-level attr (``mul_spmd.attrs["core_num"]`` — an ExprPtr that references a Var defined in the dispatching Orchestration function — see ``include/pypto/ir/transforms/utils/scope_outline_utils.h:778``). The ConvertToSSA pass was function-level; re-running it on the orchestration function re-versioned the ``cores`` AssignStmt LHS but had no visibility into the sibling Spmd function's attr, leaving it pointing at the stale pointer. Simplify's ``CollectCoreNumReferencedVars`` cross-function DCE protection then mis-fired (stale attr pointer ≠ new AssignStmt LHS pointer) and DCE dropped the ``cores = ...`` assignment. Fix: promote ConvertToSSA from ``CreateFunctionPass`` to ``CreateProgramPass``. The new driver runs in two phases: 1. Per-function SSA conversion (unchanged behaviour). Each converter's final ``cur_`` table is accumulated into a single program-wide rename map (original Var pointer → final SSA Var). 2. Walk every function's attrs and substitute Var refs per the program-wide map. Currently only ``core_num`` (ExprPtr) needs handling — extension point documented at ``SubstFunctionAttrs``. The per-function converter is exposed via a new ``SSAConverter::cur()`` accessor; ``TransformConvertToSSAFunction`` (the renamed single-function entry) now returns the new function plus its renames. Tests: - New ``test_rerun_propagates_renames_into_sibling_function_core_num_attr`` in ``TestIdempotence`` exercises the exact failure: drive the default pipeline through the post-outline ConvertToSSA, then assert the sibling Spmd function's ``core_num`` attr Var refs share the same pointer identity as the orchestration body's ``cores`` AssignStmt LHS. - Local repro on ``tests/st/runtime/cross_core/test_spmd_dynamic_core_num.py`` confirms the orchestration cpp now contains ``int64_t cores = ((b_dim * 3) / 4);`` (the device-side ``simpler_init`` failure that remains is the card-0 environmental issue, unrelated to codegen). - Full ut suite: 5904 passed; one pre-existing unrelated failure in ``test_distributed_worker_call_stores_last_run_timing`` (an upstream ``self._multi_program`` attribute bug introduced by hw-native-sys#1687, not touched by this PR).
PR hw-native-sys#1722 broke dist-system-tests with numerical mismatches and ``recv_count=[[0,0,0,0]]`` patterns. Root cause: the post-outline ConvertToSSA invocation re-versioned Vars that were already in auto-name SSA form (``foo__ssa_v0`` / ``foo__tmp_v1`` / ``foo__idx_v0`` etc.), minting fresh Var pointers with *colliding* ``name_hint_`` strings. The collision shows up downstream in host_orch codegen (``src/codegen/distributed/distributed_codegen.cpp`` / ``distributed_ops_codegen.cpp``), which uses the raw ``name_hint_`` as a Python identifier — e.g. ``tensors["<name_hint>"]``, ``CommBufferSpec(name="<name_hint>")``. Two Vars with the same ``name_hint_`` collapse into one Python slot. For the allgather repro: tensors["t__ssa_v0"] = tensors["inputs__ssa_v0"][r, 0:1, 0:64] tensors["t__ssa_v0"] = tensors["outputs__ssa_v0"][r, 0:1, 0:128] t__ssa_v0 = world_size ... _ta_0.add_tensor(make_tensor_arg(tensors["t__ssa_v0"]), INPUT) _ta_0.add_tensor(make_tensor_arg(tensors["t__ssa_v0"]), OUTPUT_EXISTING) _ta_0.add_scalar(((r__idx_v0 + 1) % t__ssa_v0)) The input and output tensor slots both resolve to the *output* slice (last write to the dict key); the peer scalar reads ``world_size`` instead of the world-size-mod result. All distributed ops then dispatch with garbage args. Fix: ``AllocVersion`` short-circuits when the input Var is auto-name- versioned and has not been seen yet as a write site in this function. Keep its pointer identity in ``cur_`` instead of minting a new Var. This gives true idempotence — re-running ConvertToSSA on already-SSA IR is a no-op for the renaming step, and: - Cross-function refs (e.g. an outlined Spmd function's ``core_num`` ExprPtr, a ``DistributedTensorType``'s ``window_buffer_`` field) that the per-function ``cur_`` cannot reach stay valid. - Codegen paths keyed on ``name_hint_`` don't collide. - Escape promotion for hw-native-sys#1693 still works because the inside-loop ``_store_var`` synth'd by ``StoreEvalToAssignMutator`` is *not* auto-versioned — it goes through the normal versioning path. Auto-versioned but multi-assigned Vars (e.g. ``break__tmp_v0`` from ``CtrlFlowTransform``) still need re-versioning. The ``already_seen`` flag (``cur_.find(key) != cur_.end()``) distinguishes a first-time write (preserve identity) from a re-assignment (version normally). Tests: - Local repro on ``tests/st/distributed/test_l3_allgather.py`` now generates ``tensors["t__tmp_v1"]`` / ``tensors["t__tmp_v2"]`` / ``t__tmp_v3`` — three distinct keys, no collision. - Full ut suite: 5904 passed; only the pre-existing unrelated ``test_distributed_worker_call_stores_last_run_timing`` failure from upstream hw-native-sys#1687's ``self._multi_program`` attribute bug.
``SubstType`` returns a freshly-allocated ``shared_ptr<TensorType>`` (or similar) when any embedded Var ref gets renamed; two structurally identical substituted types are *not* pointer-equal. The escaping path in ``ConvertFor`` / ``ConvertWhile`` was passing the substituted type to ``FindInitValue``, which compares via raw pointer equality. With dynamic shapes (Var-embedded types) the comparison fails even when an Out/InOut param has the same logical type, sending escape promotion to the placeholder fallback (``foo__FREE_VAR``) the SSA verifier rejects. Fix: pass the original ``type_it->second`` (un-substituted from ``TypeCollector``) and compare against the params' / pre-loop Vars' original ``GetType()`` pointers, which still point at the same TypePtr identities from the input IR. The substituted ``type`` is still used for the new iter_arg / placeholder Var construction. Per Gemini's review on PR hw-native-sys#1722.
This reverts commit 69a0fab.
Reapply 69a0fab (Gemini's review fix). The earlier revert (768eea8) attributed a CI ``test_l3_ep_dispatch_combine`` failure (``max diff = 40.5``) to this commit, but that test has a known flaky synchronisation bug that occasionally causes small precision mismatches — re-running the same CI without code changes also surfaces it intermittently. The ``FindInitValue`` semantics here are correct and a soundness improvement for the dynamic-shape escape-promotion path; keeping ``SubstType``-derived ``shared_ptr`` identity for type comparison can miss legitimate Out/InOut param matches and fall back to a ``foo__FREE_VAR`` placeholder the SSA verifier rejects. This re-introduces: - ``FindInitValue`` compares against original (unsubstituted) types using the ``orig_params_[i]->GetType()`` and ``key->GetType()`` paths, so structurally-equivalent-but-pointer-distinct substituted ``TypePtr`` instances no longer fail the lookup. - The two callers in ``ConvertFor`` / ``ConvertWhile`` pass ``type_it->second`` (the un-substituted ``TypeCollector`` entry) into ``FindInitValue`` while still using the substituted ``type`` for new IterArg / placeholder Var construction.
After f3f518c (preserve Var pointer identity for already-auto-versioned Vars in AllocVersion), re-running ConvertToSSA on an IfStmt that was already in SSA form left ``then_ver == else_ver == before`` for every body Var (the branch AssignStmts were no-ops on auto-versioned LHS), so ``ConvertIf`` saw zero divergence and fell into the "no new phis but existing return_vars" path. That path unconditionally re-versioned the existing rv via ``BuildAutoNamedVersion(name, "rv", v)`` — dropping the ``__phi_vN`` role marker. The role drop broke ``test_manual_scope_ifstmt_phi_read_after`` (added in upstream hw-native-sys#1719): the test asserts ``Tensor \w+__phi_v\d+ =`` shows up in the orchestration cpp, but our re-run had downgraded ``result__phi_v2`` to ``result__rv_v0``, and codegen prints ``name_hint_`` verbatim. Fix: when re-processing existing return_vars in ``ConvertIf``, preserve the original rv pointer identity if its name is already auto-versioned (``IsAutoVersionedName``). Same idempotence philosophy as ``AllocVersion``. Both the "no new phis" path and the "preserve existing rvs alongside new phis" path get the same treatment.
86523a7 to
1de2b27
Compare
…trs dispatch, compress comments Three review findings on PR hw-native-sys#1722: 1. NormalizedStmtStructure invalidation gap (high-priority). The new ``ConvertToSSA_post_outline`` reuses ``kConvertToSSAProperties`` which declares ``.invalidated = {NormalizedStmtStructure}``, but the next pass at the new position is ``ConvertTensorToTileOps`` which requires that property. Mirroring the original 4th-position pipeline, insert ``NormalizeStmtStructure_post_outline`` between them and update the default-strategy inventory test plus the pipeline doc table (en/zh). 2. ``SubstFunctionAttrs`` had a hard-coded ``"core_num"`` enumeration plus a "remember to extend this" comment — a silent-regression trap. Switch to generic ``ExprPtr`` dispatch: any attr stored as an ``ExprPtr`` is walked for Var refs and substituted. Future Var- bearing ``ExprPtr`` attrs are auto-handled. Non-``ExprPtr`` Var carriers (plain ``VarPtr`` / ``vector<VarPtr>``) still need explicit handling — none exist on Functions today, called out in the comment. 3. Multi-paragraph rationale blocks on ``AllocVersion`` / ``FindInitValue`` / the ConvertIf rv-preservation path compress to 2–4 lines each (project convention favours minimal in-code comments; the full rationale lives in the PR description and the pass doc).
|
Superseded by upstream #1747 (d86f86b), which fixes #1693 via a different (and more thorough) approach: ScopeOutliner / ExpandMixedKernel / NormalizeReturnOrder now emit explicit param-return wrappers under a new ReturnParamsExplicit IR property + verifier, so outlined function ReturnStmts reference InOut params directly — the original SSA scope violation root cause is gone, no post-outline ConvertToSSA re-run needed. Verified locally: the original #1693 repro on plain upstream/main with d86f86b emits SSA-legal IR ( |
Summary
Two stacked commits that together fix issue #1693 (SSA scope violation in outlined functions when
tile.storeis rewritten inside a loop body) by leaning on ConvertToSSA's existing escaping-variable promotion machinery.Commit 1 —
fix(convert_to_ssa): make pass idempotent on already-SSA IRThe pass doc advertises
Preserves existing SSA structureandKeep existing SSA constructs unchanged, but re-running on already-SSA IR with dynamic shape / valid_shape / stride Var refs embedded in Tile/Tensor type annotations dangled those refs. AssignStmt LHS is unconditionally re-versioned (cur_[OLD] = NEW), but every iter_arg / phi / return_var construction site read the type from the old Var verbatim — type-embedded refs to OLD survived and SSAVerify reportedused before definition.Fix: thread every Var construction site that reads a type from an existing Var (or from
TypeCollector.types[]) throughSubstTypeso renamed refs stay consistent. Sites: existing iter_args / loop_var / carried / escaping init types in ConvertFor and ConvertWhile; phi var, no-phi rv, mixed rv in ConvertIf. The previously-verbatim existing return_vars push is replaced with a freshen-and-update-cur_ loop; removed the now-redundantRegisterExistingReturnVarshelper.Tests: added
TestIdempotencewith two regression cases (IfStmt phi with dynamic valid_shape, ForStmt iter_arg with dynamic valid_shape). Both fail on baseline withused before definitionand pass after the fix.Doc: tighten
04-convert_to_ssa.mdto make the idempotence guarantee explicit (en + zh-cn).Commit 2 —
fix(passes): rerun ConvertToSSA after outline passes (fixes #1693)OutlineScope's
StoreEvalToAssignMutatorrewrites EvalStmt-formpl.tile.store(t, ..., target)into_store_var = pl.tile.store(...)so the outlined function can return the post-store SSA name. When the store sits inside a for/while/if body, the synthesised AssignStmt is inside-loop while the ReturnStmt that references_store_varis at function-body depth — SSA scope violation, surfaced as the__FREE_VARsuffix the SSA verifier rejects.Fix: re-run
ConvertToSSAright afterOutlineClusterScopes(the last outline pass). The escaping-variable promotion in ConvertToSSA detects "var defined inside loop, used after loop", synthesises IterArg + YieldStmt for each enclosing for/while, andFindInitValuepicks the matching Out/InOut function param as the initial value. The outlinedreturn _store_varbecomesreturn _store_var__rv_vNreferencing a function-scope yield result — SSA-legal, matches what the user would have written by hand with the equivalent native-DSL escaping store pattern. This relies on the idempotence guarantee from commit 1.Pipeline-level effect:
ConvertToSSA_post_outlineis added betweenOutlineClusterScopesandConvertTensorToTileOps. The original 4th-positionConvertToSSAis untouched.Test plan
TestIdempotence::test_rerun_on_if_with_dynamic_valid_shape,test_rerun_on_for_loop_with_dynamic_valid_shape)outline_*()directly, unaffected by the new pipeline entrytest_pass_manager_get_strategy_default—ConvertToSSA_post_outlineadded toTENSOR_ONLY_PASSES)pypto.runtime.runtime_base.Worker.currentattribute)for i, (a__iter_v0,) in pl.range(2, init_values=(a__ssa_v0,)): ... a__rv_v1 = pl.yield_(a__ssa_v2)followed byreturn a__rv_v1— SSA-legal, no__FREE_VARRelated issues
Fixes #1693