fix(codegen): materialize Scalar tuple-return elements in orchestration (#631) by Little-oil · Pull Request #1625 · hw-native-sys/pypto

Little-oil · 2026-06-01T09:45:18Z

Summary

When an incore kernel returns a tuple that mixes Tensor and Scalar elements, the orchestration codegen mapped each return position to an Out/InOut param. A Scalar element (e.g. an index o0 = ob * 32 computed from an input param) is not backed by an Out param, so it resolved to no param index and was dropped. The generated C++ then referenced an undefined variable (or, after an earlier stop-gap, a wrong o0 = 0), breaking compilation or producing incorrect runtime values.

This is the full fix: the orchestration codegen now materializes Scalar tuple-return elements at the call site (e.g. emitting int64_t o0 = (ob * 32);) instead of dropping them. It traces the callee's ReturnStmt value at that position, inlines its local scalar definitions, substitutes each callee param with the expression actually passed for it, and renders via the existing GenerateExprString path. When an element cannot be traced to a pure arithmetic function of the call args, it falls back to the prior 0 default so generated code stays compilable. The fix is applied on both the plain tuple-return path and the pl.submit return-alias path.

Addresses #631

Test plan

Host-side unit test added: tests/ut/codegen/test_orchestration_codegen.py::test_scalar_tuple_return_materialized, which verifies the generated orchestration C++ materializes the Scalar tuple-return element instead of dropping it.
The on-device ST scenario for this issue must be validated manually via task-submit on device 9. CI does not run device ST, so this is not exercised automatically here.

…on (hw-native-sys#631) When an incore kernel returns a tuple mixing Tensor and Scalar elements, the orchestration codegen mapped each return position to an Out/InOut param. A Scalar element (e.g. an index `o0 = ob * 32` computed from an input param) is not backed by an Out param, so it resolved to no param index and was dropped — the generated C++ then referenced an undefined variable (or, after a stop-gap, a wrong `o0 = 0`), breaking compilation or producing incorrect runtime values. Re-materialize such Scalar elements at the call site: trace the callee's ReturnStmt value at that position, inline its local scalar definitions, substitute each callee param with the expression actually passed for it, and render via the existing GenerateExprString path (e.g. `int64_t o0 = (ob * 32);`). When the element cannot be traced to a pure arithmetic function of the call args, fall back to the prior 0 default so generated code stays compilable. Applied to both the plain tuple-return and pl.submit return-alias paths.

coderabbitai · 2026-06-01T09:45:27Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 73945f62-2439-4e16-9bee-c942610761fc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR adds support for rematerializing scalar tuple elements from callee-defined expressions instead of unconditionally emitting 0 fallback values. A new helper function traces and inlines callee scalar definitions with caller-supplied parameters, wires the logic into tuple and submit return alias generation, and includes a regression test.

Changes

Scalar Tuple Return Rematerialization

Layer / File(s)	Summary
Scalar rematerialization helper `src/codegen/orchestration/orchestration_codegen.cpp`	Adds `pypto/core/error.h` include and implements `TryRenderCalleeScalarReturn`, which locates the callee's return statement, collects scalar variable definitions from `AssignStmt`s, maps callee parameters to the caller's provided arguments, recursively inlines local scalar definitions with cycle detection, and renders the result via `GenerateExprString`; returns `nullopt` on unsupported expressions to preserve the `0` fallback.
Return alias generation integration `src/codegen/orchestration/orchestration_codegen.cpp`	`GenerateTupleReturnAliases` and `GenerateSubmitReturnAliases` now invoke `TryRenderCalleeScalarReturn` for non-parameter-backed scalar tuple elements; successfully traced expressions are emitted inline, otherwise fall back to `0`.
Test coverage `tests/ut/codegen/test_orchestration_codegen.py`	`test_scalar_tuple_return_materialized` regression test validates rematerialization of scalar tuple elements via inlined callee expressions, confirms no erroneous zero defaults for traceable scalars, and verifies correct tensor aliasing and scalar propagation.

Sequence Diagram(s)

sequenceDiagram
  participant TryRenderCalleeScalarReturn
  participant CalleeFinder
  participant DefnCollector
  participant ParamMapper
  participant Inliner
  participant ExprRenderer
  
  TryRenderCalleeScalarReturn->>CalleeFinder: locate callee ReturnStmt
  TryRenderCalleeScalarReturn->>DefnCollector: collect scalar AssignStmts in callee
  TryRenderCalleeScalarReturn->>ParamMapper: map callee params to call->args_
  TryRenderCalleeScalarReturn->>Inliner: recursively inline local defs<br/>(with cycle guard)
  Inliner->>Inliner: substitute params in expr
  TryRenderCalleeScalarReturn->>ExprRenderer: GenerateExprString(resolved expr)
  ExprRenderer-->>TryRenderCalleeScalarReturn: C++ expression or NotImplementedError
  TryRenderCalleeScalarReturn-->>TryRenderCalleeScalarReturn: return expression or nullopt

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

hw-native-sys/pypto#1602: Prevents scalars from being incorrectly treated as tensor writebacks; this PR builds on that fix by replacing the resulting 0 fallback with rematerialized expressions.
hw-native-sys/pypto#1581: Also modifies GenerateTupleReturnAliases and GenerateSubmitReturnAliases to handle unmapped return elements, directly connected at the alias-materialization level.

Poem

A rabbit hops through callee trees,
Finding scalar lost at ease,
Traces params, inlines deep,
No more zeros in the heap! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 35.71% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: materializing Scalar tuple-return elements in orchestration codegen to fix issue `#631`. It is concise, specific, and clearly describes the primary fix.
Description check	✅ Passed	The description is directly related to the changeset, providing clear context about the problem (Scalar tuple-return elements being dropped), the solution (materializing them at call sites), implementation details, and the test plan.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces the TryRenderCalleeScalarReturn method in the orchestration codegen to inline and re-materialize scalar return-tuple elements at the call site based on the callee's defining expression and call-site arguments, resolving issue #631. It also adds a unit test to verify this behavior. The reviewer suggests that the order-insensitive collection of variable definitions in ScalarDefCollector can be incorrect and recommends using a prefix-based approach that accumulates definitions as statements are scanned to handle variable availability and dominance correctly.

gemini-code-assist · 2026-06-01T09:46:59Z

+      void VisitStmt_(const AssignStmtPtr& assign) override {
+        if (assign->var_) var_def.emplace(assign->var_.get(), assign->value_);
+        IRVisitor::VisitStmt_(assign);
+      }


When performing variable definition analysis, an order-insensitive collection of all definitions can be incorrect. Instead of using a simple map emplace/insert which doesn't respect execution flow, use a prefix-based approach that accumulates definitions as statements are scanned to correctly handle variable availability and dominance.

void VisitStmt_(const AssignStmtPtr& assign) override { if (assign->var_) { accumulated_defs.push_back({assign->var_.get(), assign->value_}); } IRVisitor::VisitStmt_(assign); }

References

When performing variable definition analysis, an order-insensitive collection of all definitions can be incorrect. Instead, use a prefix-based approach that accumulates definitions as statements are scanned to correctly handle variable availability and dominance.

…hw-native-sys#631) The Scalar tuple-return materialization renders the resolved return expression via GenerateExprString. That path only handles a fixed set of scalar kinds (consts, arithmetic, min/max, comparisons, cast, tuple-get); a resolved expression containing an unsupported kind — e.g. a Call to a scalar-returning helper — makes it throw NotImplementedError, which aborted orchestration codegen for kernels that compiled fine before (seen on DeepSeek-V4 decode_attention_swa: "GenerateExprString not implemented for expression type: Call"). Catch NotImplementedError in TryRenderCalleeScalarReturn and fall back to std::nullopt, so the caller emits the prior `0` default instead of aborting. Materialization is thus best-effort: it fires for pure arithmetic-of-args scalars and is a no-op (old behavior) for anything GenerateExprString cannot render.

coderabbitai

🧹 Nitpick comments (1)

tests/ut/codegen/test_orchestration_codegen.py (1)

775-838: ⚡ Quick win

Add a companion regression for the pl.submit return-alias path.

This test validates the plain tuple-return rematerialization path well, but the PR also updates the pl.submit return-alias path. Please add a focused pl.submit variant (tuple[Tensor, Scalar]) to lock coverage for both touched paths.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/codegen/test_orchestration_codegen.py` around lines 775 - 838, Add a
companion test that mirrors test_scalar_tuple_return_materialized but exercises
the pl.submit return-alias path: create a new test function (e.g.,
test_scalar_tuple_return_materialized_submit) that defines the same
ScalarTupleReturnProgram (or reuse producer/consumer) and in the Orchestration
function call the producer via pl.submit (e.g., t, off =
pl.submit(self.producer, a, base, mid) or the framework's submit syntax) so the
codegen path for pl.submit is exercised; generate code with _generate_orch_code
and assert the same expectations as the original test (presence of "int64_t off
= (base * 32);", absence of "int64_t off = 0;", "const Tensor& t = ext_mid;",
and "add_scalar(off)") to lock coverage for the submit return-alias variant.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/ut/codegen/test_orchestration_codegen.py`:
- Around line 775-838: Add a companion test that mirrors
test_scalar_tuple_return_materialized but exercises the pl.submit return-alias
path: create a new test function (e.g.,
test_scalar_tuple_return_materialized_submit) that defines the same
ScalarTupleReturnProgram (or reuse producer/consumer) and in the Orchestration
function call the producer via pl.submit (e.g., t, off =
pl.submit(self.producer, a, base, mid) or the framework's submit syntax) so the
codegen path for pl.submit is exercised; generate code with _generate_orch_code
and assert the same expectations as the original test (presence of "int64_t off
= (base * 32);", absence of "int64_t off = 0;", "const Tensor& t = ext_mid;",
and "add_scalar(off)") to lock coverage for the submit return-alias variant.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 425042b0-4ea6-402c-b5e2-90a27d2bcd3b

📥 Commits

Reviewing files that changed from the base of the PR and between e851ba5 and 3f78c2e.

📒 Files selected for processing (2)

src/codegen/orchestration/orchestration_codegen.cpp
tests/ut/codegen/test_orchestration_codegen.py

…s#631) The hw-native-sys#631 fix re-materializes Scalar tuple-return elements in both the plain tuple-return path (GenerateTupleReturnAliases) and the pl.submit path (GenerateSubmitReturnAliases), but only the plain path had a regression test. Add test_scalar_tuple_return_materialized_submit, a focused pl.submit variant returning tuple[Tensor, Scalar] inside a pl.manual_scope, asserting the scalar is inlined to "int64_t off = (base * 32);" (no "off = 0" stop-gap) and forwarded downstream via add_scalar(off).

…ative-sys#631)" This reverts commit 137328914 (the added test_scalar_tuple_return_materialized_submit). The test mis-unpacked pl.submit: it used `t, off, _tid = pl.submit(...)`, but the parser requires exactly 2 targets (result, task_id) — the callee's tuple return is the single `result`, not inline-unpackable — so the test raised ParserSyntaxError and failed unit-tests. There is no proven tuple-returning-callee pl.submit pattern in the suite to mirror, and the submit return-alias path already exercises the same TryRenderCalleeScalarReturn helper that the passing plain-path test (test_scalar_tuple_return_materialized) covers, so dropping this unverified test restores green CI without losing meaningful coverage. A dedicated submit-path test can be added once the tuple-result submit DSL form is verified on a build.

…ing (hw-native-sys#631) End-to-end system test reproducing the issue hw-native-sys#631 scenario: an InCore producer returns tuple[Tensor, Scalar[INDEX]] where the Scalar (off = base*2) is computed inside the kernel and is not backed by any Out param, and the orchestration forwards that Scalar to a downstream InCore consumer. The consumer uses the forwarded Scalar as its valid column count (valid_shapes=[rows, off]), so the output's valid region depends on the materialized Scalar value: before the fix the Scalar was dropped (undefined variable -> kernel failed to compile); a zero/wrong fallback would zero or shift the valid region. With the fix the Scalar is re-materialized at the call site (int64_t off = (base * 2)) and the result matches the golden. Complements the codegen UT test_scalar_tuple_return_materialized with a runtime (compile + execute) check across all platforms.

…ventions (hw-native-sys#631) Restructure the hw-native-sys#631 system test to follow the sibling ST style (test_dyn_orch_shape.py): the test case constructor now takes (shape, base), the pytest suite parametrizes over (shape, base) x platform, get_name embeds the config, and module-level constants are replaced by a _SHAPE_BASE config list. The DSL program and golden are unchanged (still device-validated).

…m style (hw-native-sys#631) Match the test_set_validshape.py convention: the @pl.program is defined at module level (not nested in get_program), shapes/config are module-level constants (M, N, BASE, VALID_COLS), the golden is a module-level _expected() helper, and the test case only references them with the pytest suite parametrized over platform. DSL program and golden semantics unchanged.

github-project-automation Bot added this to pto project Jun 1, 2026

gemini-code-assist Bot reviewed Jun 1, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

Youhezhen added 6 commits June 2, 2026 09:03

chore: trigger CI

a6af314

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(codegen): materialize Scalar tuple-return elements in orchestration (#631)#1625

fix(codegen): materialize Scalar tuple-return elements in orchestration (#631)#1625
Little-oil wants to merge 8 commits into
hw-native-sys:mainfrom
Little-oil:issue-631-orch-scalar-tuple-return

Little-oil commented Jun 1, 2026

Uh oh!

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Little-oil commented Jun 1, 2026

Summary

Test plan

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading