fix(ir): derive cross-scope task dependencies#1759
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR extends orchestration codegen and the ChangesCross-Scope TaskId Hoisting and Compiler-Derived Dependency Tracking
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related issues
Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces support for compiler-derived cross-scope TaskId dependencies and loop fan-in task dependency tracking, updating the dependency derivation pass and orchestration codegen to handle dynamic TaskId collections and loop-carried carries. The review feedback suggests directly assigning boolean values to enable_scope_stats in the Python runtime to allow explicit disabling, and optimizing the C++ codegen maps by using uint64_t instead of std::string keys for UniqueId-based lookups to avoid unnecessary heap allocations.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/ut/ir/transforms/test_auto_derive_task_dependencies.py (1)
1276-1342:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd an explicit no-edge assertion to the fallback regressions.
These cases only check
manual is False; a regression that leaves stalecompiler_manual_dep_edgeson theconsumecall would still pass. Please assert the compiler edge list is empty here so the fallback contract is exercised end-to-end.🛠️ Suggested check
out = _run_auto_deps(Prog) + consume_call = _user_calls(out, "consume")[0] scopes = _runtime_scopes(out) assert len(scopes) == 1 assert scopes[0].manual is False + assert _compiler_edges(consume_call) == []Also applies to: 1604-1659
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/ut/ir/transforms/test_auto_derive_task_dependencies.py` around lines 1276 - 1342, Add an explicit assertion that the compiler-added manual dependency edge list on the consume submission is empty: after obtaining out = _run_auto_deps(Prog) and scopes = _runtime_scopes(out) (and the existing asserts), locate the runtime call corresponding to the consume submit (the submit that invokes the consume function) and assert that its compiler_manual_dep_edges (or equivalent attribute used to store compiler-inserted edges) is empty (e.g., == []); do this in both test_partial_loop_task_id_array_deps_still_falls_back and test_partial_user_deps_with_dynamic_hazard_still_falls_back so the fallback contract is validated end-to-end.
🧹 Nitpick comments (3)
tests/ut/runtime/test_pto_rebuild.py (1)
160-160: 💤 Low valueRedundant assertion.
Line 161 already asserts that
'extern "C"'is not present anywhere inbody. This makes line 160 redundant, since if'extern "C"'doesn't exist at all, the substring'extern "C" static __aicore__'also cannot exist. Removing line 160 would simplify the test without losing coverage.♻️ Simplify by removing the redundant check
body = _preprocess_ptoas_body(raw) - assert 'extern "C" static __aicore__' not in body assert 'extern "C"' not in body assert "static __aicore__ void copy_hidden" in body assert _extract_func_names(raw) == ["copy_hidden"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/ut/runtime/test_pto_rebuild.py` at line 160, Remove the redundant assertion that checks for 'extern "C" static __aicore__' in the test; since the test already asserts that 'extern "C"' is not present in the variable body, the more specific assertion (assert 'extern "C" static __aicore__' not in body) can be removed—locate the assertion line in tests/ut/runtime/test_pto_rebuild.py referencing the variable body and delete that single redundant assert to simplify the test while preserving coverage.src/codegen/orchestration/orchestration_codegen.cpp (1)
376-379: 💤 Low valueVariable shadowing may cause confusion.
The inner variable
keyon line 378 shadows the loop variablekeyfrom line 372. While this compiles correctly (the innerkeyis intentionally different - one is the attr key string, the other is the VarKey), the name collision reduces readability.Consider renaming the inner variable to
var_keyfor clarity:Suggested rename
for (const auto& edge : *edges) { if (!edge) continue; - const std::string key = VarKey(edge.get()); - ref_var_by_key.emplace(key, edge.get()); - refs[key].ref_scopes.push_back(CurrentScope()); + const std::string var_key = VarKey(edge.get()); + ref_var_by_key.emplace(var_key, edge.get()); + refs[var_key].ref_scopes.push_back(CurrentScope()); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/codegen/orchestration/orchestration_codegen.cpp` around lines 376 - 379, The loop creates a new local named key that shadows an outer key variable (lines around the loop using VarKey and the attr key), reducing readability; rename the inner variable (e.g., change the VarKey result currently assigned to key to var_key) wherever used inside the loop (the for (const auto& edge : *edges) block that calls VarKey(edge.get()) and inserts into ref_var_by_key) so the outer attr key and the VarKey-derived name are distinct.tests/ut/codegen/test_orchestration_codegen.py (1)
6352-6373: ⚡ Quick winKeep the verification downgrade opt-in in
_generate_orch_full_pipeline.This helper now forces
BEFORE_AND_AFTERfor every caller, so unrelated tests stop exercising the default print→parse verification path. Please make the relaxed mode opt-in per test and only use it for the knownpl.submit(..., deps=[...])roundtrip gap.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/ut/codegen/test_orchestration_codegen.py` around lines 6352 - 6373, The helper _generate_orch_full_pipeline currently forces VerificationInstrument(passes.VerificationMode.BEFORE_AND_AFTER) for all callers; make this relaxed verification mode opt-in by adding a boolean parameter (e.g., allow_relaxed_verification: bool = False) to _generate_orch_full_pipeline and only wrap the pm.run_passes(program_cls) call in with passes.PassContext([passes.VerificationInstrument(passes.VerificationMode.BEFORE_AND_AFTER)]) when that parameter is true; leave the existing behavior unchanged when false so other tests continue to exercise the default print→parse verification path, and update any tests that need the relaxed mode to pass the new flag.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/en/dev/codegen/01-orchestration_codegen.md`:
- Around line 493-498: The sentence in the docs incorrectly states the direction
of producer/consumer scopes; update the wording around compiler_manual_dep_edges
to say that when a later sibling or parent scope references a producer TaskId
that was created in an earlier nested or sibling scope, codegen hoists only that
TaskId binding: declare a PTO2TaskId <name> = PTO2TaskId::invalid(); sentinel
before the producer PTO2_SCOPE, assign <name> = task_<n>_outs.task_id(); inside
the block, and then emit the later guarded set_dependencies(...) entry from the
enclosing C++ scope, while leaving ordinary scope-local TaskIds local.
In `@python/pypto/runtime/device_runner.py`:
- Around line 616-620: The assignment to cfg.enable_dump_tensor is incorrectly
coercing a 3-state dump-level into a boolean (using bool(enable_dump_tensor));
remove the bool(...) cast and pass the original integer-level value through so
off/partial/full (e.g., 0/1/2) semantics are preserved; update the analogous
assignment in python/pypto/runtime/runner.py as well (where enable_dump_tensor
is set) to stop collapsing to boolean and ensure cfg.enable_dump_tensor receives
the raw integer level.
In `@src/ir/transforms/auto_derive_task_dependencies_pass.cpp`:
- Around line 1511-1516: The helper MarkCurrentScopeFallback currently refuses
to set fallback for virtual whole-body AUTO scopes because of the
is_virtual_whole_body_stack_ check; remove that check so
MarkCurrentScopeFallback always sets fallback_stack_.back() = true when the
stacks are non-empty, and ensure the subsequent fallback handling logic that
strips compiler_manual_dep_edges for the current scope is invoked for virtual
whole-body scopes as well (reference MarkCurrentScopeFallback, fallback_stack_,
is_virtual_whole_body_stack_, and compiler_manual_dep_edges).
In `@tests/ut/codegen/test_orchestration_codegen.py`:
- Around line 5109-5115: The test expects that hoisted compiler-derived TaskId
variables (the regex-captured producer_tid which is initialized to
PTO2TaskId::invalid()) are only appended into dependency arrays after checking
validity; update the code that emits "params_t1_deps[params_t1_deps_count++] =
{producer_tid}" to instead guard the append with an `.is_valid()` check (e.g.,
if ({producer_tid}.is_valid()) {{ params_t1_deps[params_t1_deps_count++] =
{producer_tid}; }}), and apply the same guarded-emission fix for the analogous
places that append into params_t2_deps/other params_t*_deps so all
hoisted/manual TaskId carriers are only pushed when is_valid() returns true.
In `@tests/ut/runtime/test_block_dim_plumbing.py`:
- Line 104: The assertion that cfg.enable_dump_tensor is False in
tests/ut/runtime/test_block_dim_plumbing.py is misplaced because
python/pypto/runtime/device_runner.py sets cfg.enable_dump_tensor directly from
the enable_dump_tensor argument (cfg.enable_dump_tensor =
bool(enable_dump_tensor)) with no block_dim gating; either remove or change this
assertion in the block-dim plumbing test and instead add a dedicated test for
default DFX/debug flag behavior that covers (a) when enable_dump_tensor is
omitted/None the default is False and (b) when explicitly passed True/False it
reflects that value, or alternatively update the existing test to explicitly
pass enable_dump_tensor=False when constructing the DeviceRunner (or invoking
the function that sets cfg.enable_dump_tensor) so the test matches the actual
code path.
---
Outside diff comments:
In `@tests/ut/ir/transforms/test_auto_derive_task_dependencies.py`:
- Around line 1276-1342: Add an explicit assertion that the compiler-added
manual dependency edge list on the consume submission is empty: after obtaining
out = _run_auto_deps(Prog) and scopes = _runtime_scopes(out) (and the existing
asserts), locate the runtime call corresponding to the consume submit (the
submit that invokes the consume function) and assert that its
compiler_manual_dep_edges (or equivalent attribute used to store
compiler-inserted edges) is empty (e.g., == []); do this in both
test_partial_loop_task_id_array_deps_still_falls_back and
test_partial_user_deps_with_dynamic_hazard_still_falls_back so the fallback
contract is validated end-to-end.
---
Nitpick comments:
In `@src/codegen/orchestration/orchestration_codegen.cpp`:
- Around line 376-379: The loop creates a new local named key that shadows an
outer key variable (lines around the loop using VarKey and the attr key),
reducing readability; rename the inner variable (e.g., change the VarKey result
currently assigned to key to var_key) wherever used inside the loop (the for
(const auto& edge : *edges) block that calls VarKey(edge.get()) and inserts into
ref_var_by_key) so the outer attr key and the VarKey-derived name are distinct.
In `@tests/ut/codegen/test_orchestration_codegen.py`:
- Around line 6352-6373: The helper _generate_orch_full_pipeline currently
forces VerificationInstrument(passes.VerificationMode.BEFORE_AND_AFTER) for all
callers; make this relaxed verification mode opt-in by adding a boolean
parameter (e.g., allow_relaxed_verification: bool = False) to
_generate_orch_full_pipeline and only wrap the pm.run_passes(program_cls) call
in with
passes.PassContext([passes.VerificationInstrument(passes.VerificationMode.BEFORE_AND_AFTER)])
when that parameter is true; leave the existing behavior unchanged when false so
other tests continue to exercise the default print→parse verification path, and
update any tests that need the relaxed mode to pass the new flag.
In `@tests/ut/runtime/test_pto_rebuild.py`:
- Line 160: Remove the redundant assertion that checks for 'extern "C" static
__aicore__' in the test; since the test already asserts that 'extern "C"' is not
present in the variable body, the more specific assertion (assert 'extern "C"
static __aicore__' not in body) can be removed—locate the assertion line in
tests/ut/runtime/test_pto_rebuild.py referencing the variable body and delete
that single redundant assert to simplify the test while preserving coverage.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 719f7798-e50d-4edd-b999-716aba908273
📒 Files selected for processing (16)
.claude/rules/pass-doc-ordering.mddocs/en/dev/codegen/01-orchestration_codegen.mddocs/en/dev/passes/35-auto_derive_task_dependencies.mddocs/zh-cn/dev/codegen/01-orchestration_codegen.mddocs/zh-cn/dev/passes/35-auto_derive_task_dependencies.mdpython/pypto/backend/pto_backend.pypython/pypto/runtime/debug/pto_rebuild.pypython/pypto/runtime/device_runner.pypython/pypto/runtime/runner.pysrc/codegen/orchestration/orchestration_codegen.cppsrc/ir/transforms/auto_derive_task_dependencies_pass.cpptests/ut/codegen/test_orchestration_codegen.pytests/ut/codegen/test_pto_codegen.pytests/ut/ir/transforms/test_auto_derive_task_dependencies.pytests/ut/runtime/test_block_dim_plumbing.pytests/ut/runtime/test_pto_rebuild.py
c0a9f5b to
28be0b0
Compare
- Preserve runtime DFX flag semantics in CallConfig wiring - Strip compiler deps on virtual AUTO fallback - Tighten TaskId dependency tests and docs wording
Summary
Stacked on #1753 (
fix-issue-1744). This PR adds the follow-up cross-scope/P3 auto-deps work:set_dependencies(...)needs a producer from a sibling/parent scopeValidation
git diff --check --cachedruff check ...on touched Python filesruff format --check ...on touched Python filespython -m py_compile tests\\ut\\codegen\\test_orchestration_codegen.py tests\\ut\\ir\\transforms\\test_auto_derive_task_dependencies.py tests\\ut\\runtime\\test_pto_rebuild.pycheck-english-onlybecause it incorrectly scans existingdocs/zh-cnfiles despite its own exclude listpython tests\\lint\\clang_tidy.py --diff-base upstream/main --jobs 4 --verbosereported no lint target after detecting changed C++ file(s), so C++ validation relies on the remote build/tests below.venv:cmake --build build-native --parallel 8 --target pypto_core.venv:python -m pytest tests/ut/ir/transforms/test_auto_derive_task_dependencies.py -q-> 46 passed.venv:python -m pytest tests/ut/codegen/test_orchestration_codegen.py -q-> 135 passed, 17 warningstask_20260612_013226_34605516078passed; non-LLM-head TensorMap unique pair coverage 19/19, non-creator explicit share 98.64%, generated C++set_dependencies=21