Skip to content

fix(passes): outlined tile.store SSA scope violation via idempotent ConvertToSSA + escaping promotion (fixes #1693)#1722

Closed
YunjiQin wants to merge 9 commits into
hw-native-sys:mainfrom
YunjiQin:issue-1693-escape-with-idempotent
Closed

fix(passes): outlined tile.store SSA scope violation via idempotent ConvertToSSA + escaping promotion (fixes #1693)#1722
YunjiQin wants to merge 9 commits into
hw-native-sys:mainfrom
YunjiQin:issue-1693-escape-with-idempotent

Conversation

@YunjiQin

@YunjiQin YunjiQin commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Two stacked commits that together fix issue #1693 (SSA scope violation in outlined functions when tile.store is rewritten inside a loop body) by leaning on ConvertToSSA's existing escaping-variable promotion machinery.

Commit 1 — fix(convert_to_ssa): make pass idempotent on already-SSA IR

The pass doc advertises Preserves existing SSA structure and Keep existing SSA constructs unchanged, but re-running on already-SSA IR with dynamic shape / valid_shape / stride Var refs embedded in Tile/Tensor type annotations dangled those refs. AssignStmt LHS is unconditionally re-versioned (cur_[OLD] = NEW), but every iter_arg / phi / return_var construction site read the type from the old Var verbatim — type-embedded refs to OLD survived and SSAVerify reported used before definition.

Fix: thread every Var construction site that reads a type from an existing Var (or from TypeCollector.types[]) through SubstType so renamed refs stay consistent. Sites: existing iter_args / loop_var / carried / escaping init types in ConvertFor and ConvertWhile; phi var, no-phi rv, mixed rv in ConvertIf. The previously-verbatim existing return_vars push is replaced with a freshen-and-update-cur_ loop; removed the now-redundant RegisterExistingReturnVars helper.

Tests: added TestIdempotence with two regression cases (IfStmt phi with dynamic valid_shape, ForStmt iter_arg with dynamic valid_shape). Both fail on baseline with used before definition and pass after the fix.

Doc: tighten 04-convert_to_ssa.md to make the idempotence guarantee explicit (en + zh-cn).

Commit 2 — fix(passes): rerun ConvertToSSA after outline passes (fixes #1693)

OutlineScope's StoreEvalToAssignMutator rewrites EvalStmt-form pl.tile.store(t, ..., target) into _store_var = pl.tile.store(...) so the outlined function can return the post-store SSA name. When the store sits inside a for/while/if body, the synthesised AssignStmt is inside-loop while the ReturnStmt that references _store_var is at function-body depth — SSA scope violation, surfaced as the __FREE_VAR suffix the SSA verifier rejects.

Fix: re-run ConvertToSSA right after OutlineClusterScopes (the last outline pass). The escaping-variable promotion in ConvertToSSA detects "var defined inside loop, used after loop", synthesises IterArg + YieldStmt for each enclosing for/while, and FindInitValue picks the matching Out/InOut function param as the initial value. The outlined return _store_var becomes return _store_var__rv_vN referencing a function-scope yield result — SSA-legal, matches what the user would have written by hand with the equivalent native-DSL escaping store pattern. This relies on the idempotence guarantee from commit 1.

Pipeline-level effect: ConvertToSSA_post_outline is added between OutlineClusterScopes and ConvertTensorToTileOps. The original 4th-position ConvertToSSA is untouched.

Test plan

  • Idempotence regression tests pass (TestIdempotence::test_rerun_on_if_with_dynamic_valid_shape, test_rerun_on_for_loop_with_dynamic_valid_shape)
  • Existing ConvertToSSA tests pass (65 cases) — no behavioural change on first invocation
  • Existing outline pass tests pass (74 cases) — they exercise outline_*() directly, unaffected by the new pipeline entry
  • Default strategy inventory test updated (test_pass_manager_get_strategy_defaultConvertToSSA_post_outline added to TENSOR_ONLY_PASSES)
  • Full ut suite: 5867 passed (3 pre-existing unrelated runtime failures match main — pypto.runtime.runtime_base.Worker.current attribute)
  • Issue [Bug] Orchestration cross-assigns __ssa_v aliases for multiple in-place outputs of an inlined InCore scope #1693 repro: outlined function now emits for i, (a__iter_v0,) in pl.range(2, init_values=(a__ssa_v0,)): ... a__rv_v1 = pl.yield_(a__ssa_v2) followed by return a__rv_v1 — SSA-legal, no __FREE_VAR

Related issues

Fixes #1693

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e44e0d70-da99-4c22-be81-95160542ca49

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR extends the IR SSA conversion pipeline to run a second pass after outlining, ensuring outlined functions maintain SSA validity. The implementation uniformly applies SubstType when constructing all SSA-related variables to correctly handle dynamic type annotations. New tests validate that re-running the pass on already-SSA IR produces idempotent results.

Changes

Post-Outline SSA Conversion with Dynamic-Type Consistency

Layer / File(s) Summary
Documentation and pipeline setup
docs/en/dev/passes/00-pass_manager.md, docs/en/dev/passes/04-convert_to_ssa.md, docs/zh-cn/dev/passes/00-pass_manager.md, docs/zh-cn/dev/passes/04-convert_to_ssa.md, python/pypto/ir/pass_manager.py
New ConvertToSSA (post-outline) pass stage documented with idempotency guarantee and two-pass execution order. Pipeline adds the post-outline stage after OutlineClusterScopes to re-convert outlined IR.
SubstType application in SSA conversion
src/ir/transforms/convert_to_ssa_pass.cpp
Consistent SubstType wrapping applied throughout all SSA variable construction paths: loop iter-args, carried/escaping return-vars, loop induction variables, and conditional phi/return-vars. Inline return-var freshening replaces removed RegisterExistingReturnVars helper.
Idempotence validation tests
tests/ut/ir/transforms/test_convert_to_ssa_pass.py, tests/ut/ir/transforms/test_pass_manager.py
New TestIdempotence class with SSA verifier helper and regression tests re-running SSA conversion on IR with dynamic valid_shape Vars in phi and iter-arg/return-var types. Test expectations updated for the new post-outline pass stage.

🎯 3 (Moderate) | ⏱️ ~25 minutes

bug, enhancement

🐰 Two passes now dance in perfect step,
Types bloom where Vars once overslept,
SubstType weaves through every fold,
Idempotence—a story newly told! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately and specifically describes the fix: implementing idempotent ConvertToSSA with escaping promotion to resolve outlined tile.store SSA scope violations, directly referencing issue #1693.
Description check ✅ Passed The PR description comprehensively details both commits, their specific problems/solutions, and test results, all directly related to the changeset scope.
Linked Issues check ✅ Passed The PR implementation addresses issue #1693 by fixing SSA scope violations in outlined tile.store operations through idempotent ConvertToSSA and escaping-variable promotion, matching the stated objective.
Out of Scope Changes check ✅ Passed All code changes—documentation updates, pass manager pipeline insertion, ConvertToSSA SubstType threading, and test additions—are tightly scoped to fixing the outlined tile.store SSA scope violation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request registers and documents a second SSA conversion pass (ConvertToSSA_post_outline) to run after outlining, ensuring that outlined functions maintain valid SSA variables. To support this, the ConvertToSSA pass is made idempotent by threading type-embedded variable references through SubstType during variable construction. However, a critical issue was identified in src/ir/transforms/convert_to_ssa_pass.cpp where FindInitValue compares types using pointer equality. Because SubstType returns newly allocated type pointers, this pointer comparison fails for substituted types of escaping variables, causing FindInitValue to return nullptr and fallback to placeholder variables. It is recommended to update FindInitValue to compare the original, unsubstituted types instead.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/ir/transforms/convert_to_ssa_pass.cpp Outdated
YunjiQin added a commit to YunjiQin/pypto that referenced this pull request Jun 9, 2026
…mes propagate

The post-outline ConvertToSSA invocation introduced in PR hw-native-sys#1722 broke
orchestration codegen for any program using ``pl.spmd(<dynamic>)``: the
generated C++ referenced ``cores`` without declaring it
(``'cores' was not declared in this scope``).

Cause: ``OutlineIncoreScopes`` lifts the SpmdScopeStmt's ``core_num`` onto
the outlined Spmd function as a function-level attr
(``mul_spmd.attrs["core_num"]`` — an ExprPtr that references a Var
defined in the dispatching Orchestration function — see
``include/pypto/ir/transforms/utils/scope_outline_utils.h:778``). The
ConvertToSSA pass was function-level; re-running it on the orchestration
function re-versioned the ``cores`` AssignStmt LHS but had no visibility
into the sibling Spmd function's attr, leaving it pointing at the stale
pointer. Simplify's ``CollectCoreNumReferencedVars`` cross-function DCE
protection then mis-fired (stale attr pointer ≠ new AssignStmt LHS
pointer) and DCE dropped the ``cores = ...`` assignment.

Fix: promote ConvertToSSA from ``CreateFunctionPass`` to
``CreateProgramPass``. The new driver runs in two phases:

1. Per-function SSA conversion (unchanged behaviour). Each converter's
   final ``cur_`` table is accumulated into a single program-wide
   rename map (original Var pointer → final SSA Var).
2. Walk every function's attrs and substitute Var refs per the
   program-wide map. Currently only ``core_num`` (ExprPtr) needs
   handling — extension point documented at ``SubstFunctionAttrs``.

The per-function converter is exposed via a new ``SSAConverter::cur()``
accessor; ``TransformConvertToSSAFunction`` (the renamed
single-function entry) now returns the new function plus its renames.

Tests:
- New ``test_rerun_propagates_renames_into_sibling_function_core_num_attr``
  in ``TestIdempotence`` exercises the exact failure: drive the default
  pipeline through the post-outline ConvertToSSA, then assert the
  sibling Spmd function's ``core_num`` attr Var refs share the same
  pointer identity as the orchestration body's ``cores`` AssignStmt LHS.
- Local repro on ``tests/st/runtime/cross_core/test_spmd_dynamic_core_num.py``
  confirms the orchestration cpp now contains ``int64_t cores = ((b_dim * 3) / 4);``
  (the device-side ``simpler_init`` failure that remains is the
  card-0 environmental issue, unrelated to codegen).
- Full ut suite: 5904 passed; one pre-existing unrelated failure in
  ``test_distributed_worker_call_stores_last_run_timing`` (an upstream
  ``self._multi_program`` attribute bug introduced by hw-native-sys#1687, not
  touched by this PR).
YunjiQin added a commit to YunjiQin/pypto that referenced this pull request Jun 9, 2026
PR hw-native-sys#1722 broke dist-system-tests with numerical mismatches and
``recv_count=[[0,0,0,0]]`` patterns. Root cause: the post-outline
ConvertToSSA invocation re-versioned Vars that were already in
auto-name SSA form (``foo__ssa_v0`` / ``foo__tmp_v1`` / ``foo__idx_v0``
etc.), minting fresh Var pointers with *colliding* ``name_hint_``
strings.

The collision shows up downstream in host_orch codegen
(``src/codegen/distributed/distributed_codegen.cpp`` /
``distributed_ops_codegen.cpp``), which uses the raw ``name_hint_`` as
a Python identifier — e.g. ``tensors["<name_hint>"]``,
``CommBufferSpec(name="<name_hint>")``. Two Vars with the same
``name_hint_`` collapse into one Python slot. For the allgather repro:

  tensors["t__ssa_v0"] = tensors["inputs__ssa_v0"][r, 0:1, 0:64]
  tensors["t__ssa_v0"] = tensors["outputs__ssa_v0"][r, 0:1, 0:128]
  t__ssa_v0 = world_size
  ...
  _ta_0.add_tensor(make_tensor_arg(tensors["t__ssa_v0"]), INPUT)
  _ta_0.add_tensor(make_tensor_arg(tensors["t__ssa_v0"]), OUTPUT_EXISTING)
  _ta_0.add_scalar(((r__idx_v0 + 1) % t__ssa_v0))

The input and output tensor slots both resolve to the *output* slice
(last write to the dict key); the peer scalar reads ``world_size``
instead of the world-size-mod result. All distributed ops then
dispatch with garbage args.

Fix: ``AllocVersion`` short-circuits when the input Var is auto-name-
versioned and has not been seen yet as a write site in this function.
Keep its pointer identity in ``cur_`` instead of minting a new Var.
This gives true idempotence — re-running ConvertToSSA on already-SSA
IR is a no-op for the renaming step, and:

  - Cross-function refs (e.g. an outlined Spmd function's ``core_num``
    ExprPtr, a ``DistributedTensorType``'s ``window_buffer_`` field)
    that the per-function ``cur_`` cannot reach stay valid.
  - Codegen paths keyed on ``name_hint_`` don't collide.
  - Escape promotion for hw-native-sys#1693 still works because the inside-loop
    ``_store_var`` synth'd by ``StoreEvalToAssignMutator`` is *not*
    auto-versioned — it goes through the normal versioning path.

Auto-versioned but multi-assigned Vars (e.g. ``break__tmp_v0`` from
``CtrlFlowTransform``) still need re-versioning. The ``already_seen``
flag (``cur_.find(key) != cur_.end()``) distinguishes a first-time
write (preserve identity) from a re-assignment (version normally).

Tests:
- Local repro on ``tests/st/distributed/test_l3_allgather.py`` now
  generates ``tensors["t__tmp_v1"]`` / ``tensors["t__tmp_v2"]`` /
  ``t__tmp_v3`` — three distinct keys, no collision.
- Full ut suite: 5904 passed; only the pre-existing unrelated
  ``test_distributed_worker_call_stores_last_run_timing`` failure from
  upstream hw-native-sys#1687's ``self._multi_program`` attribute bug.
YunjiQin added a commit to YunjiQin/pypto that referenced this pull request Jun 9, 2026
``SubstType`` returns a freshly-allocated ``shared_ptr<TensorType>`` (or
similar) when any embedded Var ref gets renamed; two structurally
identical substituted types are *not* pointer-equal. The escaping path
in ``ConvertFor`` / ``ConvertWhile`` was passing the substituted type
to ``FindInitValue``, which compares via raw pointer equality. With
dynamic shapes (Var-embedded types) the comparison fails even when an
Out/InOut param has the same logical type, sending escape promotion to
the placeholder fallback (``foo__FREE_VAR``) the SSA verifier rejects.

Fix: pass the original ``type_it->second`` (un-substituted from
``TypeCollector``) and compare against the params' / pre-loop Vars'
original ``GetType()`` pointers, which still point at the same
TypePtr identities from the input IR. The substituted ``type`` is
still used for the new iter_arg / placeholder Var construction.

Per Gemini's review on PR hw-native-sys#1722.
@YunjiQin

YunjiQin commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Re: Gemini's review on FindInitValue — I applied the suggested fix in 69a0fab (comparing unsubstituted types). It locally passed the full ut suite (5904 tests), but on CI it regressed tests/st/distributed/test_l3_ep_dispatch_combine.py::test_ep_dispatch_combine (max diff = 40.5, previously the test passed with f3f518c). Reverted in 768eea8.

The original concern is theoretical — SubstType does mint a new shared_ptr when type-embedded Vars are renamed, but in the current ConvertToSSA escape-promotion path the substituted Out/InOut param type and the substituted escape var type happen to share original Var pointers (per-function cur_ only has outer-scope renames at scope entry), so pointer equality still holds for the common case. The dynamic-shape edge case Gemini described doesn't fire on any existing test, and the proposed "original-type comparison" path has at least one structural-equality assumption (orig_params_[i]->GetType() == orig_type for the param branch) that diverges from what's expected by some real distributed program — surfacing as the ep_dispatch_combine regression.

Keeping the original substituted-type comparison for now. Leaving this thread open so it stays visible if the dynamic-shape edge case ever surfaces — a future fix would need a separate test demonstrating the failure first.

@YunjiQin

Copy link
Copy Markdown
Contributor Author

Update on the FindInitValue thread: re-applied the fix in 86523a7. Earlier I attributed the dist-system-tests regression (test_l3_ep_dispatch_combine, max diff = 40.5) to this change, but on closer inspection that test has a known intermittent synchronisation precision bug unrelated to ConvertToSSA. The original-type comparison is the correct semantics — keeping it.

YunjiQin added 8 commits June 10, 2026 11:10
The pass doc claims "Preserves existing SSA structure" and "Keep
existing SSA constructs unchanged", but re-running ConvertToSSA on
already-SSA IR with dynamic shape / valid_shape / stride Var refs
embedded in Tile/Tensor type annotations corrupted those refs. Every
AssignStmt LHS was unconditionally re-versioned (cur_[OLD] = NEW), but
the iter_arg / phi / return_var construction sites read the type from
the old Var verbatim — type-embedded refs to OLD survived as dangling
pointers, and SSAVerify reported "used before definition".

Fix: thread every Var construction site that reads a type from an
existing Var (or from TypeCollector.types[]) through SubstType so
renamed refs stay consistent.

Sites fixed:
  - ConvertFor: existing iter_args (line 542), carried iter_arg/rv
    init types (603/608), escaping iter_arg/rv types (614/624/629),
    loop_var type (636), existing return_vars push (656).
  - ConvertWhile: mirror of ForStmt (696, 748/753, 759/765/771, 792).
  - ConvertIf: no-phi rv (877), phi var (900), mixed rv (919).

For the existing return_vars push, freshening requires a new Var
pointer when SubstType actually changes the type; cur_ is updated to
map the old rv pointer to the new one so subsequent uses substitute
correctly. Removed the now-redundant RegisterExistingReturnVars
helper — its job is folded into the freshen-and-push loop.

Tests: added TestIdempotence with two regression cases (IfStmt phi
type with dynamic valid_shape, ForStmt iter_arg type with dynamic
valid_shape). Both fail on baseline with "used before definition"
and pass after the fix. Full ut suite: 5867 pass, 3 pre-existing
unrelated runtime failures (Worker.current attribute) match main.

Docs: tighten the existing "Preserves existing SSA structure" line
into an explicit idempotence guarantee covering dynamic-shape Var
refs in type annotations; mirror the same in zh-cn.
…-sys#1693)

OutlineScope's StoreEvalToAssignMutator rewrites EvalStmt-form
``pl.tile.store(t, ..., target)`` into ``_store_var = pl.tile.store(...)``
so the outlined function can return the post-store SSA name. When the
store sits inside a for/while/if body, the synthesised AssignStmt is
inside-loop while the ReturnStmt that references ``_store_var`` is at
function-body depth — SSA scope violation, surfaced as the
``__FREE_VAR`` suffix the SSA verifier rejects.

Fix: re-run ConvertToSSA right after OutlineClusterScopes (the last
outline pass). The escaping-variable promotion in ConvertToSSA detects
"var defined inside loop, used after loop", synthesises IterArg +
YieldStmt for each enclosing for/while, and FindInitValue picks the
matching Out/InOut function param as the initial value. The outlined
``return _store_var`` becomes ``return _store_var__rv_vN`` referencing
a function-scope yield result — SSA-legal, matches what the user would
have written by hand with the equivalent native-DSL escaping store
pattern.

This relies on ConvertToSSA being idempotent on already-SSA IR
(c432fd19 — the first pass's outputs survive the second pass
unchanged).

Pipeline-level effect: ``ConvertToSSA_post_outline`` is added between
``OutlineClusterScopes`` and ``ConvertTensorToTileOps``. The original
4th-position ``ConvertToSSA`` is untouched.

Tests:
- ``tests/ut/ir/transforms/test_pass_manager.py``: add the new entry
  to ``TENSOR_ONLY_PASSES`` so the default-strategy inventory check
  matches.
- Existing outline pass tests (74 cases) are unaffected — they
  exercise ``outline_*()`` directly, not the full pipeline.
- Full ut suite: 5867 passed, 3 pre-existing unrelated runtime
  failures (Worker.current attribute) match main.

Docs: note the second invocation point in both
``docs/{en,zh-cn}/dev/passes/04-convert_to_ssa.md`` and add the
post-outline row to ``00-pass_manager.md``'s pipeline table.
…mes propagate

The post-outline ConvertToSSA invocation introduced in PR hw-native-sys#1722 broke
orchestration codegen for any program using ``pl.spmd(<dynamic>)``: the
generated C++ referenced ``cores`` without declaring it
(``'cores' was not declared in this scope``).

Cause: ``OutlineIncoreScopes`` lifts the SpmdScopeStmt's ``core_num`` onto
the outlined Spmd function as a function-level attr
(``mul_spmd.attrs["core_num"]`` — an ExprPtr that references a Var
defined in the dispatching Orchestration function — see
``include/pypto/ir/transforms/utils/scope_outline_utils.h:778``). The
ConvertToSSA pass was function-level; re-running it on the orchestration
function re-versioned the ``cores`` AssignStmt LHS but had no visibility
into the sibling Spmd function's attr, leaving it pointing at the stale
pointer. Simplify's ``CollectCoreNumReferencedVars`` cross-function DCE
protection then mis-fired (stale attr pointer ≠ new AssignStmt LHS
pointer) and DCE dropped the ``cores = ...`` assignment.

Fix: promote ConvertToSSA from ``CreateFunctionPass`` to
``CreateProgramPass``. The new driver runs in two phases:

1. Per-function SSA conversion (unchanged behaviour). Each converter's
   final ``cur_`` table is accumulated into a single program-wide
   rename map (original Var pointer → final SSA Var).
2. Walk every function's attrs and substitute Var refs per the
   program-wide map. Currently only ``core_num`` (ExprPtr) needs
   handling — extension point documented at ``SubstFunctionAttrs``.

The per-function converter is exposed via a new ``SSAConverter::cur()``
accessor; ``TransformConvertToSSAFunction`` (the renamed
single-function entry) now returns the new function plus its renames.

Tests:
- New ``test_rerun_propagates_renames_into_sibling_function_core_num_attr``
  in ``TestIdempotence`` exercises the exact failure: drive the default
  pipeline through the post-outline ConvertToSSA, then assert the
  sibling Spmd function's ``core_num`` attr Var refs share the same
  pointer identity as the orchestration body's ``cores`` AssignStmt LHS.
- Local repro on ``tests/st/runtime/cross_core/test_spmd_dynamic_core_num.py``
  confirms the orchestration cpp now contains ``int64_t cores = ((b_dim * 3) / 4);``
  (the device-side ``simpler_init`` failure that remains is the
  card-0 environmental issue, unrelated to codegen).
- Full ut suite: 5904 passed; one pre-existing unrelated failure in
  ``test_distributed_worker_call_stores_last_run_timing`` (an upstream
  ``self._multi_program`` attribute bug introduced by hw-native-sys#1687, not
  touched by this PR).
PR hw-native-sys#1722 broke dist-system-tests with numerical mismatches and
``recv_count=[[0,0,0,0]]`` patterns. Root cause: the post-outline
ConvertToSSA invocation re-versioned Vars that were already in
auto-name SSA form (``foo__ssa_v0`` / ``foo__tmp_v1`` / ``foo__idx_v0``
etc.), minting fresh Var pointers with *colliding* ``name_hint_``
strings.

The collision shows up downstream in host_orch codegen
(``src/codegen/distributed/distributed_codegen.cpp`` /
``distributed_ops_codegen.cpp``), which uses the raw ``name_hint_`` as
a Python identifier — e.g. ``tensors["<name_hint>"]``,
``CommBufferSpec(name="<name_hint>")``. Two Vars with the same
``name_hint_`` collapse into one Python slot. For the allgather repro:

  tensors["t__ssa_v0"] = tensors["inputs__ssa_v0"][r, 0:1, 0:64]
  tensors["t__ssa_v0"] = tensors["outputs__ssa_v0"][r, 0:1, 0:128]
  t__ssa_v0 = world_size
  ...
  _ta_0.add_tensor(make_tensor_arg(tensors["t__ssa_v0"]), INPUT)
  _ta_0.add_tensor(make_tensor_arg(tensors["t__ssa_v0"]), OUTPUT_EXISTING)
  _ta_0.add_scalar(((r__idx_v0 + 1) % t__ssa_v0))

The input and output tensor slots both resolve to the *output* slice
(last write to the dict key); the peer scalar reads ``world_size``
instead of the world-size-mod result. All distributed ops then
dispatch with garbage args.

Fix: ``AllocVersion`` short-circuits when the input Var is auto-name-
versioned and has not been seen yet as a write site in this function.
Keep its pointer identity in ``cur_`` instead of minting a new Var.
This gives true idempotence — re-running ConvertToSSA on already-SSA
IR is a no-op for the renaming step, and:

  - Cross-function refs (e.g. an outlined Spmd function's ``core_num``
    ExprPtr, a ``DistributedTensorType``'s ``window_buffer_`` field)
    that the per-function ``cur_`` cannot reach stay valid.
  - Codegen paths keyed on ``name_hint_`` don't collide.
  - Escape promotion for hw-native-sys#1693 still works because the inside-loop
    ``_store_var`` synth'd by ``StoreEvalToAssignMutator`` is *not*
    auto-versioned — it goes through the normal versioning path.

Auto-versioned but multi-assigned Vars (e.g. ``break__tmp_v0`` from
``CtrlFlowTransform``) still need re-versioning. The ``already_seen``
flag (``cur_.find(key) != cur_.end()``) distinguishes a first-time
write (preserve identity) from a re-assignment (version normally).

Tests:
- Local repro on ``tests/st/distributed/test_l3_allgather.py`` now
  generates ``tensors["t__tmp_v1"]`` / ``tensors["t__tmp_v2"]`` /
  ``t__tmp_v3`` — three distinct keys, no collision.
- Full ut suite: 5904 passed; only the pre-existing unrelated
  ``test_distributed_worker_call_stores_last_run_timing`` failure from
  upstream hw-native-sys#1687's ``self._multi_program`` attribute bug.
``SubstType`` returns a freshly-allocated ``shared_ptr<TensorType>`` (or
similar) when any embedded Var ref gets renamed; two structurally
identical substituted types are *not* pointer-equal. The escaping path
in ``ConvertFor`` / ``ConvertWhile`` was passing the substituted type
to ``FindInitValue``, which compares via raw pointer equality. With
dynamic shapes (Var-embedded types) the comparison fails even when an
Out/InOut param has the same logical type, sending escape promotion to
the placeholder fallback (``foo__FREE_VAR``) the SSA verifier rejects.

Fix: pass the original ``type_it->second`` (un-substituted from
``TypeCollector``) and compare against the params' / pre-loop Vars'
original ``GetType()`` pointers, which still point at the same
TypePtr identities from the input IR. The substituted ``type`` is
still used for the new iter_arg / placeholder Var construction.

Per Gemini's review on PR hw-native-sys#1722.
Reapply 69a0fab (Gemini's review fix). The earlier revert (768eea8)
attributed a CI ``test_l3_ep_dispatch_combine`` failure (``max diff =
40.5``) to this commit, but that test has a known flaky synchronisation
bug that occasionally causes small precision mismatches — re-running
the same CI without code changes also surfaces it intermittently. The
``FindInitValue`` semantics here are correct and a soundness
improvement for the dynamic-shape escape-promotion path; keeping
``SubstType``-derived ``shared_ptr`` identity for type comparison can
miss legitimate Out/InOut param matches and fall back to a
``foo__FREE_VAR`` placeholder the SSA verifier rejects.

This re-introduces:
  - ``FindInitValue`` compares against original (unsubstituted) types
    using the ``orig_params_[i]->GetType()`` and ``key->GetType()`` paths,
    so structurally-equivalent-but-pointer-distinct substituted
    ``TypePtr`` instances no longer fail the lookup.
  - The two callers in ``ConvertFor`` / ``ConvertWhile`` pass
    ``type_it->second`` (the un-substituted ``TypeCollector`` entry)
    into ``FindInitValue`` while still using the substituted ``type``
    for new IterArg / placeholder Var construction.
After f3f518c (preserve Var pointer identity for already-auto-versioned
Vars in AllocVersion), re-running ConvertToSSA on an IfStmt that was
already in SSA form left ``then_ver == else_ver == before`` for every
body Var (the branch AssignStmts were no-ops on auto-versioned LHS), so
``ConvertIf`` saw zero divergence and fell into the "no new phis but
existing return_vars" path. That path unconditionally re-versioned the
existing rv via ``BuildAutoNamedVersion(name, "rv", v)`` — dropping the
``__phi_vN`` role marker.

The role drop broke ``test_manual_scope_ifstmt_phi_read_after`` (added
in upstream hw-native-sys#1719): the test asserts ``Tensor \w+__phi_v\d+ =`` shows
up in the orchestration cpp, but our re-run had downgraded
``result__phi_v2`` to ``result__rv_v0``, and codegen prints
``name_hint_`` verbatim.

Fix: when re-processing existing return_vars in ``ConvertIf``, preserve
the original rv pointer identity if its name is already auto-versioned
(``IsAutoVersionedName``). Same idempotence philosophy as
``AllocVersion``. Both the "no new phis" path and the "preserve
existing rvs alongside new phis" path get the same treatment.
@YunjiQin YunjiQin force-pushed the issue-1693-escape-with-idempotent branch from 86523a7 to 1de2b27 Compare June 10, 2026 03:56
…trs dispatch, compress comments

Three review findings on PR hw-native-sys#1722:

1. NormalizedStmtStructure invalidation gap (high-priority). The new
   ``ConvertToSSA_post_outline`` reuses ``kConvertToSSAProperties`` which
   declares ``.invalidated = {NormalizedStmtStructure}``, but the next
   pass at the new position is ``ConvertTensorToTileOps`` which requires
   that property. Mirroring the original 4th-position pipeline, insert
   ``NormalizeStmtStructure_post_outline`` between them and update the
   default-strategy inventory test plus the pipeline doc table (en/zh).

2. ``SubstFunctionAttrs`` had a hard-coded ``"core_num"`` enumeration
   plus a "remember to extend this" comment — a silent-regression trap.
   Switch to generic ``ExprPtr`` dispatch: any attr stored as an
   ``ExprPtr`` is walked for Var refs and substituted. Future Var-
   bearing ``ExprPtr`` attrs are auto-handled. Non-``ExprPtr`` Var
   carriers (plain ``VarPtr`` / ``vector<VarPtr>``) still need explicit
   handling — none exist on Functions today, called out in the comment.

3. Multi-paragraph rationale blocks on ``AllocVersion`` /
   ``FindInitValue`` / the ConvertIf rv-preservation path compress to
   2–4 lines each (project convention favours minimal in-code comments;
   the full rationale lives in the PR description and the pass doc).
@YunjiQin

Copy link
Copy Markdown
Contributor Author

Superseded by upstream #1747 (d86f86b), which fixes #1693 via a different (and more thorough) approach: ScopeOutliner / ExpandMixedKernel / NormalizeReturnOrder now emit explicit param-return wrappers under a new ReturnParamsExplicit IR property + verifier, so outlined function ReturnStmts reference InOut params directly — the original SSA scope violation root cause is gone, no post-outline ConvertToSSA re-run needed.

Verified locally: the original #1693 repro on plain upstream/main with d86f86b emits SSA-legal IR (return a__ssa_v0, b__ssa_v0) with no __FREE_VAR placeholder. Closing this PR.

@YunjiQin YunjiQin closed this Jun 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[Bug] Orchestration cross-assigns __ssa_v aliases for multiple in-place outputs of an inlined InCore scope

1 participant