Skip to content

fix(ir): derive cross-scope task dependencies#1759

Open
sunkaixuan2018 wants to merge 7 commits into
hw-native-sys:mainfrom
sunkaixuan2018:codex/cross-scope-auto-deps-plan
Open

fix(ir): derive cross-scope task dependencies#1759
sunkaixuan2018 wants to merge 7 commits into
hw-native-sys:mainfrom
sunkaixuan2018:codex/cross-scope-auto-deps-plan

Conversation

@sunkaixuan2018

Copy link
Copy Markdown
Contributor

Summary

Stacked on #1753 (fix-issue-1744). This PR adds the follow-up cross-scope/P3 auto-deps work:

  • export analyzed producer accesses across runtime-scope boundaries and preserve loop-carried TaskId bindings for later consumers
  • hoist compiler-derived TaskIds in orchestration codegen when set_dependencies(...) needs a producer from a sibling/parent scope
  • cover fixed/dynamic loop fan-in, tuple outputs, fire-and-forget producers, mixed group producers, and qk_norm/rope-style dependency stability
  • keep runtime/debug plumbing aligned for the updated Qwen validation path and update English/zh-CN developer docs

Validation

  • local: git diff --check --cached
  • local: ruff check ... on touched Python files
  • local: ruff format --check ... on touched Python files
  • local: python -m py_compile tests\\ut\\codegen\\test_orchestration_codegen.py tests\\ut\\ir\\transforms\\test_auto_derive_task_dependencies.py tests\\ut\\runtime\\test_pto_rebuild.py
  • local pre-commit during commit: check-headers, large files, yaml, EOF, trailing whitespace, clang-format, cpplint, markdownlint-cli2, ruff check, ruff format, pyright passed; skipped check-english-only because it incorrectly scans existing docs/zh-cn files despite its own exclude list
  • clang-tidy: python tests\\lint\\clang_tidy.py --diff-base upstream/main --jobs 4 --verbose reported no lint target after detecting changed C++ file(s), so C++ validation relies on the remote build/tests below
  • remote isolated .venv: cmake --build build-native --parallel 8 --target pypto_core
  • remote isolated .venv: python -m pytest tests/ut/ir/transforms/test_auto_derive_task_dependencies.py -q -> 46 passed
  • remote isolated .venv: python -m pytest tests/ut/codegen/test_orchestration_codegen.py -q -> 135 passed, 17 warnings
  • remote NPU Qwen decode layer: task_20260612_013226_34605516078 passed; non-LLM-head TensorMap unique pair coverage 19/19, non-creator explicit share 98.64%, generated C++ set_dependencies=21

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b691437f-5ca0-4fc4-b4bd-7307a41317ae

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR extends orchestration codegen and the AutoDeriveTaskDependencies pass to support rich TaskId binding, track array-typed task IDs with dynamic coverage completeness, hoist TaskIds across scope boundaries, and emit compiler-derived dependencies for loop fan-in and cross-scope patterns while refining fallback conditions.

Changes

Cross-Scope TaskId Hoisting and Compiler-Derived Dependency Tracking

Layer / File(s) Summary
Documentation and specification updates
.claude/rules/pass-doc-ordering.md, docs/en/dev/codegen/01-orchestration_codegen.md, docs/en/dev/passes/35-auto_derive_task_dependencies.md, docs/zh-cn/dev/codegen/*, docs/zh-cn/dev/passes/*
Pass documentation refined to specify conservative task-ID array expansion (expand only when all static slots are covered), dynamic coverage completeness checks, and cross-scope TaskId hoisting behavior (invalid-init→assign→guard emission) for compiler-derived dependencies.
AutoDeriveTaskDependencies pass: task-ID discovery and tracking
src/ir/transforms/auto_derive_task_dependencies_pass.cpp
Extended SubmitTaskIdCollector to record array TaskId lineage (static/dynamic slots, extents, dynamic-update unknowns) and propagate loop-carried task-ID arrays. AutoDepMutator now tracks is_virtual_whole_body state, remaps loop-carried TaskIds per trip type (single, fixed, dynamic), iterates the full prior-stack frames, canonicalizes TaskIds including array expansions, and gates fallback based on dynamic producer coverage completeness. Constructor signature updated to accept new task-ID mapping structures.
Orchestration codegen: cross-scope TaskId hoisting and binding
src/codegen/orchestration/orchestration_codegen.cpp
Added PrepareCrossScopeTaskIdHoists to analyze RuntimeScopeStmt nesting and compute hoist targets via LCA/scope-depth logic. Introduced ManualTaskIdBinding variant to represent scalar, vector-of-strings, and dynamic std::vector<PTO2TaskId> collections. Loop lowering distinguishes compiler-dep vs dynamic collections; parallel paths skip outer init for compiler-dep collection. Task submission/call emission computes compiler-dep output args and binds synthetic TaskIds. EmitManualDeps refactored to support both fixed-size stack arrays and dynamic vectors. New helpers declare hoisted TaskIds, resolve bindings, and bind synthetic/variable TaskIds while reusing hoisted names.
Test suite: cross-scope hoisting and dependency emission
tests/ut/codegen/test_orchestration_codegen.py, tests/ut/ir/transforms/test_auto_derive_task_dependencies.py, tests/ut/codegen/test_pto_codegen.py
Updated existing tests for reformatted C++ layouts and strengthened regex assertions for TaskId invalid→assign patterns. Extended _generate_orch_full_pipeline with analyze_auto_scopes_for_deps flag and added _generate_orch_from_transformed_program helper. New comprehensive tests cover fixed/dynamic-trip loop fan-in, tensor/tuple carriers, manual-scope preservation, partial coverage fallback, sibling/nested auto-scope edge export, cross-scope read/read behavior, and whole-body dynamic skip preservation.
Runtime and utility fixes
python/pypto/backend/pto_backend.py, python/pypto/runtime/debug/pto_rebuild.py, python/pypto/runtime/device_runner.py, python/pypto/runtime/runner.py, tests/ut/runtime/test_block_dim_plumbing.py, tests/ut/runtime/test_pto_rebuild.py
Updated ptoas-output preprocessing regex (_PTOAS_FUNC_DEF_RE) to recognize optional extern "C" and __global__ qualifiers. DFX flag handling: enable_dump_tensor now explicitly coerced to boolean, enable_scope_stats set only when enabled (not unconditionally assigned). Tests verify boolean coercion and extern "C" handling.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related issues

Possibly related PRs

  • hw-native-sys/pypto#1560: Both PRs modify orchestration codegen's manual-scope dependency TaskId hoisting and cross-scope set_dependencies emission via the same OrchestrationStmtCodegen codepaths.
  • hw-native-sys/pypto#1582: Both PRs modify TaskId binding/snapshotting from array.get_element(...Array[TASK_ID]...) and resolution when emitting deps/set_dependencies across PTO2_SCOPE boundaries.

Suggested labels

enhancement

Poem

🐰 A hoppy tale of TaskIds bright,
Array slots tracked, dynamic in flight!
Hoists and binds across every scope,
Loop fan-in fans find newfound hope.
Deps now dance where shadows used to hide! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.95% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main objective of the PR: deriving cross-scope task dependencies, which aligns with the primary changes across the codebase.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the key objectives (cross-scope dependency exporting, TaskId hoisting, loop fan-in handling, etc.) and validation steps.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for compiler-derived cross-scope TaskId dependencies and loop fan-in task dependency tracking, updating the dependency derivation pass and orchestration codegen to handle dynamic TaskId collections and loop-carried carries. The review feedback suggests directly assigning boolean values to enable_scope_stats in the Python runtime to allow explicit disabling, and optimizing the C++ codegen maps by using uint64_t instead of std::string keys for UniqueId-based lookups to avoid unnecessary heap allocations.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread python/pypto/runtime/device_runner.py Outdated
Comment thread python/pypto/runtime/runner.py Outdated
Comment thread src/codegen/orchestration/orchestration_codegen.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/ut/ir/transforms/test_auto_derive_task_dependencies.py (1)

1276-1342: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add an explicit no-edge assertion to the fallback regressions.

These cases only check manual is False; a regression that leaves stale compiler_manual_dep_edges on the consume call would still pass. Please assert the compiler edge list is empty here so the fallback contract is exercised end-to-end.

🛠️ Suggested check
         out = _run_auto_deps(Prog)
+        consume_call = _user_calls(out, "consume")[0]
         scopes = _runtime_scopes(out)
         assert len(scopes) == 1
         assert scopes[0].manual is False
+        assert _compiler_edges(consume_call) == []

Also applies to: 1604-1659

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/ir/transforms/test_auto_derive_task_dependencies.py` around lines
1276 - 1342, Add an explicit assertion that the compiler-added manual dependency
edge list on the consume submission is empty: after obtaining out =
_run_auto_deps(Prog) and scopes = _runtime_scopes(out) (and the existing
asserts), locate the runtime call corresponding to the consume submit (the
submit that invokes the consume function) and assert that its
compiler_manual_dep_edges (or equivalent attribute used to store
compiler-inserted edges) is empty (e.g., == []); do this in both
test_partial_loop_task_id_array_deps_still_falls_back and
test_partial_user_deps_with_dynamic_hazard_still_falls_back so the fallback
contract is validated end-to-end.
🧹 Nitpick comments (3)
tests/ut/runtime/test_pto_rebuild.py (1)

160-160: 💤 Low value

Redundant assertion.

Line 161 already asserts that 'extern "C"' is not present anywhere in body. This makes line 160 redundant, since if 'extern "C"' doesn't exist at all, the substring 'extern "C" static __aicore__' also cannot exist. Removing line 160 would simplify the test without losing coverage.

♻️ Simplify by removing the redundant check
     body = _preprocess_ptoas_body(raw)
-    assert 'extern "C" static __aicore__' not in body
     assert 'extern "C"' not in body
     assert "static __aicore__ void copy_hidden" in body
     assert _extract_func_names(raw) == ["copy_hidden"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/runtime/test_pto_rebuild.py` at line 160, Remove the redundant
assertion that checks for 'extern "C" static __aicore__' in the test; since the
test already asserts that 'extern "C"' is not present in the variable body, the
more specific assertion (assert 'extern "C" static __aicore__' not in body) can
be removed—locate the assertion line in tests/ut/runtime/test_pto_rebuild.py
referencing the variable body and delete that single redundant assert to
simplify the test while preserving coverage.
src/codegen/orchestration/orchestration_codegen.cpp (1)

376-379: 💤 Low value

Variable shadowing may cause confusion.

The inner variable key on line 378 shadows the loop variable key from line 372. While this compiles correctly (the inner key is intentionally different - one is the attr key string, the other is the VarKey), the name collision reduces readability.

Consider renaming the inner variable to var_key for clarity:

Suggested rename
           for (const auto& edge : *edges) {
             if (!edge) continue;
-            const std::string key = VarKey(edge.get());
-            ref_var_by_key.emplace(key, edge.get());
-            refs[key].ref_scopes.push_back(CurrentScope());
+            const std::string var_key = VarKey(edge.get());
+            ref_var_by_key.emplace(var_key, edge.get());
+            refs[var_key].ref_scopes.push_back(CurrentScope());
           }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/codegen/orchestration/orchestration_codegen.cpp` around lines 376 - 379,
The loop creates a new local named key that shadows an outer key variable (lines
around the loop using VarKey and the attr key), reducing readability; rename the
inner variable (e.g., change the VarKey result currently assigned to key to
var_key) wherever used inside the loop (the for (const auto& edge : *edges)
block that calls VarKey(edge.get()) and inserts into ref_var_by_key) so the
outer attr key and the VarKey-derived name are distinct.
tests/ut/codegen/test_orchestration_codegen.py (1)

6352-6373: ⚡ Quick win

Keep the verification downgrade opt-in in _generate_orch_full_pipeline.

This helper now forces BEFORE_AND_AFTER for every caller, so unrelated tests stop exercising the default print→parse verification path. Please make the relaxed mode opt-in per test and only use it for the known pl.submit(..., deps=[...]) roundtrip gap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/codegen/test_orchestration_codegen.py` around lines 6352 - 6373, The
helper _generate_orch_full_pipeline currently forces
VerificationInstrument(passes.VerificationMode.BEFORE_AND_AFTER) for all
callers; make this relaxed verification mode opt-in by adding a boolean
parameter (e.g., allow_relaxed_verification: bool = False) to
_generate_orch_full_pipeline and only wrap the pm.run_passes(program_cls) call
in with
passes.PassContext([passes.VerificationInstrument(passes.VerificationMode.BEFORE_AND_AFTER)])
when that parameter is true; leave the existing behavior unchanged when false so
other tests continue to exercise the default print→parse verification path, and
update any tests that need the relaxed mode to pass the new flag.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/en/dev/codegen/01-orchestration_codegen.md`:
- Around line 493-498: The sentence in the docs incorrectly states the direction
of producer/consumer scopes; update the wording around compiler_manual_dep_edges
to say that when a later sibling or parent scope references a producer TaskId
that was created in an earlier nested or sibling scope, codegen hoists only that
TaskId binding: declare a PTO2TaskId <name> = PTO2TaskId::invalid(); sentinel
before the producer PTO2_SCOPE, assign <name> = task_<n>_outs.task_id(); inside
the block, and then emit the later guarded set_dependencies(...) entry from the
enclosing C++ scope, while leaving ordinary scope-local TaskIds local.

In `@python/pypto/runtime/device_runner.py`:
- Around line 616-620: The assignment to cfg.enable_dump_tensor is incorrectly
coercing a 3-state dump-level into a boolean (using bool(enable_dump_tensor));
remove the bool(...) cast and pass the original integer-level value through so
off/partial/full (e.g., 0/1/2) semantics are preserved; update the analogous
assignment in python/pypto/runtime/runner.py as well (where enable_dump_tensor
is set) to stop collapsing to boolean and ensure cfg.enable_dump_tensor receives
the raw integer level.

In `@src/ir/transforms/auto_derive_task_dependencies_pass.cpp`:
- Around line 1511-1516: The helper MarkCurrentScopeFallback currently refuses
to set fallback for virtual whole-body AUTO scopes because of the
is_virtual_whole_body_stack_ check; remove that check so
MarkCurrentScopeFallback always sets fallback_stack_.back() = true when the
stacks are non-empty, and ensure the subsequent fallback handling logic that
strips compiler_manual_dep_edges for the current scope is invoked for virtual
whole-body scopes as well (reference MarkCurrentScopeFallback, fallback_stack_,
is_virtual_whole_body_stack_, and compiler_manual_dep_edges).

In `@tests/ut/codegen/test_orchestration_codegen.py`:
- Around line 5109-5115: The test expects that hoisted compiler-derived TaskId
variables (the regex-captured producer_tid which is initialized to
PTO2TaskId::invalid()) are only appended into dependency arrays after checking
validity; update the code that emits "params_t1_deps[params_t1_deps_count++] =
{producer_tid}" to instead guard the append with an `.is_valid()` check (e.g.,
if ({producer_tid}.is_valid()) {{ params_t1_deps[params_t1_deps_count++] =
{producer_tid}; }}), and apply the same guarded-emission fix for the analogous
places that append into params_t2_deps/other params_t*_deps so all
hoisted/manual TaskId carriers are only pushed when is_valid() returns true.

In `@tests/ut/runtime/test_block_dim_plumbing.py`:
- Line 104: The assertion that cfg.enable_dump_tensor is False in
tests/ut/runtime/test_block_dim_plumbing.py is misplaced because
python/pypto/runtime/device_runner.py sets cfg.enable_dump_tensor directly from
the enable_dump_tensor argument (cfg.enable_dump_tensor =
bool(enable_dump_tensor)) with no block_dim gating; either remove or change this
assertion in the block-dim plumbing test and instead add a dedicated test for
default DFX/debug flag behavior that covers (a) when enable_dump_tensor is
omitted/None the default is False and (b) when explicitly passed True/False it
reflects that value, or alternatively update the existing test to explicitly
pass enable_dump_tensor=False when constructing the DeviceRunner (or invoking
the function that sets cfg.enable_dump_tensor) so the test matches the actual
code path.

---

Outside diff comments:
In `@tests/ut/ir/transforms/test_auto_derive_task_dependencies.py`:
- Around line 1276-1342: Add an explicit assertion that the compiler-added
manual dependency edge list on the consume submission is empty: after obtaining
out = _run_auto_deps(Prog) and scopes = _runtime_scopes(out) (and the existing
asserts), locate the runtime call corresponding to the consume submit (the
submit that invokes the consume function) and assert that its
compiler_manual_dep_edges (or equivalent attribute used to store
compiler-inserted edges) is empty (e.g., == []); do this in both
test_partial_loop_task_id_array_deps_still_falls_back and
test_partial_user_deps_with_dynamic_hazard_still_falls_back so the fallback
contract is validated end-to-end.

---

Nitpick comments:
In `@src/codegen/orchestration/orchestration_codegen.cpp`:
- Around line 376-379: The loop creates a new local named key that shadows an
outer key variable (lines around the loop using VarKey and the attr key),
reducing readability; rename the inner variable (e.g., change the VarKey result
currently assigned to key to var_key) wherever used inside the loop (the for
(const auto& edge : *edges) block that calls VarKey(edge.get()) and inserts into
ref_var_by_key) so the outer attr key and the VarKey-derived name are distinct.

In `@tests/ut/codegen/test_orchestration_codegen.py`:
- Around line 6352-6373: The helper _generate_orch_full_pipeline currently
forces VerificationInstrument(passes.VerificationMode.BEFORE_AND_AFTER) for all
callers; make this relaxed verification mode opt-in by adding a boolean
parameter (e.g., allow_relaxed_verification: bool = False) to
_generate_orch_full_pipeline and only wrap the pm.run_passes(program_cls) call
in with
passes.PassContext([passes.VerificationInstrument(passes.VerificationMode.BEFORE_AND_AFTER)])
when that parameter is true; leave the existing behavior unchanged when false so
other tests continue to exercise the default print→parse verification path, and
update any tests that need the relaxed mode to pass the new flag.

In `@tests/ut/runtime/test_pto_rebuild.py`:
- Line 160: Remove the redundant assertion that checks for 'extern "C" static
__aicore__' in the test; since the test already asserts that 'extern "C"' is not
present in the variable body, the more specific assertion (assert 'extern "C"
static __aicore__' not in body) can be removed—locate the assertion line in
tests/ut/runtime/test_pto_rebuild.py referencing the variable body and delete
that single redundant assert to simplify the test while preserving coverage.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 719f7798-e50d-4edd-b999-716aba908273

📥 Commits

Reviewing files that changed from the base of the PR and between ac379e4 and c0a9f5b.

📒 Files selected for processing (16)
  • .claude/rules/pass-doc-ordering.md
  • docs/en/dev/codegen/01-orchestration_codegen.md
  • docs/en/dev/passes/35-auto_derive_task_dependencies.md
  • docs/zh-cn/dev/codegen/01-orchestration_codegen.md
  • docs/zh-cn/dev/passes/35-auto_derive_task_dependencies.md
  • python/pypto/backend/pto_backend.py
  • python/pypto/runtime/debug/pto_rebuild.py
  • python/pypto/runtime/device_runner.py
  • python/pypto/runtime/runner.py
  • src/codegen/orchestration/orchestration_codegen.cpp
  • src/ir/transforms/auto_derive_task_dependencies_pass.cpp
  • tests/ut/codegen/test_orchestration_codegen.py
  • tests/ut/codegen/test_pto_codegen.py
  • tests/ut/ir/transforms/test_auto_derive_task_dependencies.py
  • tests/ut/runtime/test_block_dim_plumbing.py
  • tests/ut/runtime/test_pto_rebuild.py

Comment thread docs/en/dev/codegen/01-orchestration_codegen.md Outdated
Comment thread python/pypto/runtime/device_runner.py Outdated
Comment thread src/ir/transforms/auto_derive_task_dependencies_pass.cpp
Comment thread tests/ut/codegen/test_orchestration_codegen.py Outdated
Comment thread tests/ut/runtime/test_block_dim_plumbing.py Outdated
@sunkaixuan2018 sunkaixuan2018 force-pushed the codex/cross-scope-auto-deps-plan branch from c0a9f5b to 28be0b0 Compare June 12, 2026 07:40
- Preserve runtime DFX flag semantics in CallConfig wiring

- Strip compiler deps on virtual AUTO fallback

- Tighten TaskId dependency tests and docs wording
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant