Skip to content

fix(codegen): materialize Scalar tuple-return elements in orchestration (#631)#1625

Open
Little-oil wants to merge 8 commits into
hw-native-sys:mainfrom
Little-oil:issue-631-orch-scalar-tuple-return
Open

fix(codegen): materialize Scalar tuple-return elements in orchestration (#631)#1625
Little-oil wants to merge 8 commits into
hw-native-sys:mainfrom
Little-oil:issue-631-orch-scalar-tuple-return

Conversation

@Little-oil

Copy link
Copy Markdown
Contributor

Summary

When an incore kernel returns a tuple that mixes Tensor and Scalar elements, the orchestration codegen mapped each return position to an Out/InOut param. A Scalar element (e.g. an index o0 = ob * 32 computed from an input param) is not backed by an Out param, so it resolved to no param index and was dropped. The generated C++ then referenced an undefined variable (or, after an earlier stop-gap, a wrong o0 = 0), breaking compilation or producing incorrect runtime values.

This is the full fix: the orchestration codegen now materializes Scalar tuple-return elements at the call site (e.g. emitting int64_t o0 = (ob * 32);) instead of dropping them. It traces the callee's ReturnStmt value at that position, inlines its local scalar definitions, substitutes each callee param with the expression actually passed for it, and renders via the existing GenerateExprString path. When an element cannot be traced to a pure arithmetic function of the call args, it falls back to the prior 0 default so generated code stays compilable. The fix is applied on both the plain tuple-return path and the pl.submit return-alias path.

Addresses #631

Test plan

  • Host-side unit test added: tests/ut/codegen/test_orchestration_codegen.py::test_scalar_tuple_return_materialized, which verifies the generated orchestration C++ materializes the Scalar tuple-return element instead of dropping it.
  • The on-device ST scenario for this issue must be validated manually via task-submit on device 9. CI does not run device ST, so this is not exercised automatically here.

…on (hw-native-sys#631)

When an incore kernel returns a tuple mixing Tensor and Scalar elements,
the orchestration codegen mapped each return position to an Out/InOut
param. A Scalar element (e.g. an index `o0 = ob * 32` computed from an
input param) is not backed by an Out param, so it resolved to no param
index and was dropped — the generated C++ then referenced an undefined
variable (or, after a stop-gap, a wrong `o0 = 0`), breaking compilation
or producing incorrect runtime values.

Re-materialize such Scalar elements at the call site: trace the callee's
ReturnStmt value at that position, inline its local scalar definitions,
substitute each callee param with the expression actually passed for it,
and render via the existing GenerateExprString path (e.g.
`int64_t o0 = (ob * 32);`). When the element cannot be traced to a pure
arithmetic function of the call args, fall back to the prior 0 default so
generated code stays compilable. Applied to both the plain tuple-return
and pl.submit return-alias paths.
@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 73945f62-2439-4e16-9bee-c942610761fc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds support for rematerializing scalar tuple elements from callee-defined expressions instead of unconditionally emitting 0 fallback values. A new helper function traces and inlines callee scalar definitions with caller-supplied parameters, wires the logic into tuple and submit return alias generation, and includes a regression test.

Changes

Scalar Tuple Return Rematerialization

Layer / File(s) Summary
Scalar rematerialization helper
src/codegen/orchestration/orchestration_codegen.cpp
Adds pypto/core/error.h include and implements TryRenderCalleeScalarReturn, which locates the callee's return statement, collects scalar variable definitions from AssignStmts, maps callee parameters to the caller's provided arguments, recursively inlines local scalar definitions with cycle detection, and renders the result via GenerateExprString; returns nullopt on unsupported expressions to preserve the 0 fallback.
Return alias generation integration
src/codegen/orchestration/orchestration_codegen.cpp
GenerateTupleReturnAliases and GenerateSubmitReturnAliases now invoke TryRenderCalleeScalarReturn for non-parameter-backed scalar tuple elements; successfully traced expressions are emitted inline, otherwise fall back to 0.
Test coverage
tests/ut/codegen/test_orchestration_codegen.py
test_scalar_tuple_return_materialized regression test validates rematerialization of scalar tuple elements via inlined callee expressions, confirms no erroneous zero defaults for traceable scalars, and verifies correct tensor aliasing and scalar propagation.

Sequence Diagram(s)

sequenceDiagram
  participant TryRenderCalleeScalarReturn
  participant CalleeFinder
  participant DefnCollector
  participant ParamMapper
  participant Inliner
  participant ExprRenderer
  
  TryRenderCalleeScalarReturn->>CalleeFinder: locate callee ReturnStmt
  TryRenderCalleeScalarReturn->>DefnCollector: collect scalar AssignStmts in callee
  TryRenderCalleeScalarReturn->>ParamMapper: map callee params to call->args_
  TryRenderCalleeScalarReturn->>Inliner: recursively inline local defs<br/>(with cycle guard)
  Inliner->>Inliner: substitute params in expr
  TryRenderCalleeScalarReturn->>ExprRenderer: GenerateExprString(resolved expr)
  ExprRenderer-->>TryRenderCalleeScalarReturn: C++ expression or NotImplementedError
  TryRenderCalleeScalarReturn-->>TryRenderCalleeScalarReturn: return expression or nullopt
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • hw-native-sys/pypto#1602: Prevents scalars from being incorrectly treated as tensor writebacks; this PR builds on that fix by replacing the resulting 0 fallback with rematerialized expressions.
  • hw-native-sys/pypto#1581: Also modifies GenerateTupleReturnAliases and GenerateSubmitReturnAliases to handle unmapped return elements, directly connected at the alias-materialization level.

Poem

A rabbit hops through callee trees,
Finding scalar lost at ease,
Traces params, inlines deep,
No more zeros in the heap! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: materializing Scalar tuple-return elements in orchestration codegen to fix issue #631. It is concise, specific, and clearly describes the primary fix.
Description check ✅ Passed The description is directly related to the changeset, providing clear context about the problem (Scalar tuple-return elements being dropped), the solution (materializing them at call sites), implementation details, and the test plan.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the TryRenderCalleeScalarReturn method in the orchestration codegen to inline and re-materialize scalar return-tuple elements at the call site based on the callee's defining expression and call-site arguments, resolving issue #631. It also adds a unit test to verify this behavior. The reviewer suggests that the order-insensitive collection of variable definitions in ScalarDefCollector can be incorrect and recommends using a prefix-based approach that accumulates definitions as statements are scanned to handle variable availability and dominance correctly.

Comment on lines +2212 to +2215
void VisitStmt_(const AssignStmtPtr& assign) override {
if (assign->var_) var_def.emplace(assign->var_.get(), assign->value_);
IRVisitor::VisitStmt_(assign);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When performing variable definition analysis, an order-insensitive collection of all definitions can be incorrect. Instead of using a simple map emplace/insert which doesn't respect execution flow, use a prefix-based approach that accumulates definitions as statements are scanned to correctly handle variable availability and dominance.

      void VisitStmt_(const AssignStmtPtr& assign) override {
        if (assign->var_) {
          accumulated_defs.push_back({assign->var_.get(), assign->value_});
        }
        IRVisitor::VisitStmt_(assign);
      }
References
  1. When performing variable definition analysis, an order-insensitive collection of all definitions can be incorrect. Instead, use a prefix-based approach that accumulates definitions as statements are scanned to correctly handle variable availability and dominance.

…hw-native-sys#631)

The Scalar tuple-return materialization renders the resolved return
expression via GenerateExprString. That path only handles a fixed set of
scalar kinds (consts, arithmetic, min/max, comparisons, cast, tuple-get);
a resolved expression containing an unsupported kind — e.g. a Call to a
scalar-returning helper — makes it throw NotImplementedError, which
aborted orchestration codegen for kernels that compiled fine before (seen
on DeepSeek-V4 decode_attention_swa: "GenerateExprString not implemented
for expression type: Call").

Catch NotImplementedError in TryRenderCalleeScalarReturn and fall back to
std::nullopt, so the caller emits the prior `0` default instead of
aborting. Materialization is thus best-effort: it fires for pure
arithmetic-of-args scalars and is a no-op (old behavior) for anything
GenerateExprString cannot render.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/ut/codegen/test_orchestration_codegen.py (1)

775-838: ⚡ Quick win

Add a companion regression for the pl.submit return-alias path.

This test validates the plain tuple-return rematerialization path well, but the PR also updates the pl.submit return-alias path. Please add a focused pl.submit variant (tuple[Tensor, Scalar]) to lock coverage for both touched paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/codegen/test_orchestration_codegen.py` around lines 775 - 838, Add a
companion test that mirrors test_scalar_tuple_return_materialized but exercises
the pl.submit return-alias path: create a new test function (e.g.,
test_scalar_tuple_return_materialized_submit) that defines the same
ScalarTupleReturnProgram (or reuse producer/consumer) and in the Orchestration
function call the producer via pl.submit (e.g., t, off =
pl.submit(self.producer, a, base, mid) or the framework's submit syntax) so the
codegen path for pl.submit is exercised; generate code with _generate_orch_code
and assert the same expectations as the original test (presence of "int64_t off
= (base * 32);", absence of "int64_t off = 0;", "const Tensor& t = ext_mid;",
and "add_scalar(off)") to lock coverage for the submit return-alias variant.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/ut/codegen/test_orchestration_codegen.py`:
- Around line 775-838: Add a companion test that mirrors
test_scalar_tuple_return_materialized but exercises the pl.submit return-alias
path: create a new test function (e.g.,
test_scalar_tuple_return_materialized_submit) that defines the same
ScalarTupleReturnProgram (or reuse producer/consumer) and in the Orchestration
function call the producer via pl.submit (e.g., t, off =
pl.submit(self.producer, a, base, mid) or the framework's submit syntax) so the
codegen path for pl.submit is exercised; generate code with _generate_orch_code
and assert the same expectations as the original test (presence of "int64_t off
= (base * 32);", absence of "int64_t off = 0;", "const Tensor& t = ext_mid;",
and "add_scalar(off)") to lock coverage for the submit return-alias variant.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 425042b0-4ea6-402c-b5e2-90a27d2bcd3b

📥 Commits

Reviewing files that changed from the base of the PR and between e851ba5 and 3f78c2e.

📒 Files selected for processing (2)
  • src/codegen/orchestration/orchestration_codegen.cpp
  • tests/ut/codegen/test_orchestration_codegen.py

Youhezhen added 6 commits June 2, 2026 09:03
…s#631)

The hw-native-sys#631 fix re-materializes Scalar tuple-return elements in both the
plain tuple-return path (GenerateTupleReturnAliases) and the pl.submit
path (GenerateSubmitReturnAliases), but only the plain path had a
regression test. Add test_scalar_tuple_return_materialized_submit, a
focused pl.submit variant returning tuple[Tensor, Scalar] inside a
pl.manual_scope, asserting the scalar is inlined to
"int64_t off = (base * 32);" (no "off = 0" stop-gap) and forwarded
downstream via add_scalar(off).
…ative-sys#631)"

This reverts commit 137328914 (the added test_scalar_tuple_return_materialized_submit).

The test mis-unpacked pl.submit: it used `t, off, _tid = pl.submit(...)`,
but the parser requires exactly 2 targets (result, task_id) — the callee's
tuple return is the single `result`, not inline-unpackable — so the test
raised ParserSyntaxError and failed unit-tests. There is no proven
tuple-returning-callee pl.submit pattern in the suite to mirror, and the
submit return-alias path already exercises the same TryRenderCalleeScalarReturn
helper that the passing plain-path test (test_scalar_tuple_return_materialized)
covers, so dropping this unverified test restores green CI without losing
meaningful coverage. A dedicated submit-path test can be added once the
tuple-result submit DSL form is verified on a build.
…ing (hw-native-sys#631)

End-to-end system test reproducing the issue hw-native-sys#631 scenario: an InCore
producer returns tuple[Tensor, Scalar[INDEX]] where the Scalar (off = base*2)
is computed inside the kernel and is not backed by any Out param, and the
orchestration forwards that Scalar to a downstream InCore consumer.

The consumer uses the forwarded Scalar as its valid column count
(valid_shapes=[rows, off]), so the output's valid region depends on the
materialized Scalar value: before the fix the Scalar was dropped (undefined
variable -> kernel failed to compile); a zero/wrong fallback would zero or
shift the valid region. With the fix the Scalar is re-materialized at the
call site (int64_t off = (base * 2)) and the result matches the golden.

Complements the codegen UT test_scalar_tuple_return_materialized with a
runtime (compile + execute) check across all platforms.
…ventions (hw-native-sys#631)

Restructure the hw-native-sys#631 system test to follow the sibling ST style
(test_dyn_orch_shape.py): the test case constructor now takes (shape, base),
the pytest suite parametrizes over (shape, base) x platform, get_name embeds
the config, and module-level constants are replaced by a _SHAPE_BASE config
list. The DSL program and golden are unchanged (still device-validated).
…m style (hw-native-sys#631)

Match the test_set_validshape.py convention: the @pl.program is defined at
module level (not nested in get_program), shapes/config are module-level
constants (M, N, BASE, VALID_COLS), the golden is a module-level _expected()
helper, and the test case only references them with the pytest suite
parametrized over platform. DSL program and golden semantics unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant