
[BLOCKED on PTOAS] feat(ir): Add tile.interleave and tile.deinterleave operators #1742

Draft
Little-oil wants to merge 1 commit into hw-native-sys:main from Little-oil:issue-1325-fix

Conversation

@Little-oil

Contributor

Summary

  • Add tile-level tile.interleave / tile.deinterleave ops (issue [New Op] Support interleave/deinterleave operations #1325): two same-typed 2D Vec tiles in, ordered result pair out (TupleType{low, high} resp. {even, odd}), mirroring the gather_compare multi-result pattern end to end (C++ op + type deduction, Python IR wrapper, DSL tuple-unpack API, codegen)
  • Type contract enforced in a shared deducer: matching dtype/shape/valid_shape, 2D only, 8/16/32-bit element widths
  • Codegen emits pto.tintlv / pto.tdintlv (2 ins + 2 outs, one alloc_tile per output) from a single shared factory; mnemonics are constants in pto_ops_common.cpp — the single touch point once PTOAS adds tile-form interleave
  • Docs (en/zh) operator tables updated with a pending-PTOAS note
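
The type contract described in the second bullet can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the C++ deducer from the PR: `TileType`, `BIT_WIDTH`, and `deduce_interleave_type` are hypothetical names, and the dtype-to-width table is assumed for the example.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-in for a tile type; not PyPTO's real class.
@dataclass(frozen=True)
class TileType:
    dtype: str
    shape: Tuple[int, ...]
    valid_shape: Tuple[int, ...]


# Assumed bit widths for a few dtype names (illustrative only).
BIT_WIDTH = {
    "INT8": 8, "INT16": 16, "FP16": 16, "BF16": 16,
    "INT32": 32, "FP32": 32, "INT64": 64,
}


def deduce_interleave_type(lhs: TileType, rhs: TileType) -> Tuple[TileType, TileType]:
    """Check the shared contract and return the (out0, out1) type pair."""
    if len(lhs.shape) != 2 or len(rhs.shape) != 2:
        raise TypeError("interleave/deinterleave requires 2D tiles")
    if lhs.dtype != rhs.dtype:
        raise TypeError("lhs and rhs dtype must match")
    if lhs.shape != rhs.shape:
        raise TypeError("lhs and rhs shape must match")
    if lhs.valid_shape != rhs.valid_shape:
        raise TypeError("lhs and rhs valid_shape must match")
    if BIT_WIDTH[lhs.dtype] not in (8, 16, 32):
        raise TypeError("element width must be 8, 16, or 32 bits")
    # Both outputs copy the lhs tile type, as the docs row in this PR states.
    return lhs, lhs


fp16 = TileType("FP16", (16, 32), (16, 32))
out0, out1 = deduce_interleave_type(fp16, fp16)  # both equal fp16
```

Keeping every check in one function mirrors the "shared deducer" idea: both ops would call the same routine, so their constraints cannot drift apart.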

Testing

  • Op UT: valid dtypes, INT64 rejection, dtype/shape/valid_shape mismatch, non-2D rejection (tests/ut/ir/operators/test_interleave.py)
  • Codegen UT: emitted line shape and per-output allocs (tests/ut/codegen/test_pto_codegen_ops.py)
  • Worktree build + pre-commit + clang-tidy clean
  • No ST / on-device coverage yet — PTOAS v0.45 has no tile-form tintlv/tdintlv; ST follows when PTOAS lands

Related Issues

Closes #1325

Add tile-level interleave/deinterleave ops returning an ordered result
pair (low/high resp. even/odd) as TupleType{out0, out1}. Inputs must be
two 2D Vec tiles with matching dtype, shape, and valid_shape; element
widths 8/16/32-bit. Codegen emits pto.tintlv / pto.tdintlv with two ins
and two outs; the mnemonics are isolated in pto_ops_common.cpp pending
PTOAS tile-form support, so coverage is op and codegen UT only.

Closes hw-native-sys#1325
@coderabbitai

coderabbitai Bot commented Jun 11, 2026


📝 Walkthrough

This PR adds full support for tile-level interleave and deinterleave operations to PyPTO. The implementation spans IR definition with type constraints, Python APIs at both IR and DSL layers, PTO backend code generation, documentation, and comprehensive tests covering type validation and code generation correctness.

Changes

Tile Interleave/Deinterleave Operations Implementation

| Layer | File(s) | Summary |
|---|---|---|
| IR operation definition and type inference | `src/ir/op/tile_ops/interleave.cpp`, `CMakeLists.txt` | `tile.interleave` and `tile.deinterleave` IR operations defined with shared type deduction enforcing matching 2D tiles, 8/16/32-bit element widths, and returning `TupleType` outputs. Both ops registered for vector memory spaces. |
| Python IR-level wrapper API | `python/pypto/ir/op/tile_ops.py` | `interleave(lhs, rhs, span=None)` and `deinterleave(lhs, rhs, span=None)` wrapper functions normalize span and emit corresponding IR ops. |
| Python DSL user-facing API | `python/pypto/language/op/tile_ops.py` | `pl.tile.interleave(lhs, rhs)` and `pl.tile.deinterleave(lhs, rhs)` exported functions call IR wrappers and extract tuple results into two `Tile` objects. |
| PTO backend code generation | `src/backend/common/pto_ops_common.cpp` | `MakeInterleaveCodegenPTO` factory generates `pto.tintlv` and `pto.tdintlv` PTO ops with proper tile allocation and DPS input/output structure. |
| Documentation | `docs/en/dev/ir/05-operators.md`, `docs/zh-cn/dev/ir/05-operators.md` | Documented `tile.interleave` and `tile.deinterleave` operations with semantics, constraints, and PTOAS support status. |
| IR-level type validation tests | `tests/ut/ir/operators/test_interleave.py` | Parameterized tests for valid 8/16/32-bit dtypes, and error validation for invalid dtype widths, dtype/shape/valid_shape mismatches, dimensionality, and constraints. |
| Code generation verification tests | `tests/ut/codegen/test_pto_codegen_ops.py` | Validates PTO MLIR output contains correct `pto.tintlv`/`pto.tdintlv` with two-input/two-output DPS structure and proper tile allocations. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

enhancement

Poem

🐰 Two tiles interleaved with care,
Low and high streams fill the air,
Deinterleave to even, odd so neat,
Hardware ops make code complete!
New PTO powers, swift and fleet! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 36.84% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The PR description is comprehensive and directly related to the changeset, covering implementation details, testing, and design decisions relevant to the new tile operators. |
| Linked Issues check | ✅ Passed | The PR successfully implements all primary coding requirements from issue #1325: tile-level APIs (`pl.tile.interleave`/`deinterleave` with tuple return), type constraints (matching dtype/shape/valid_shape, 2D-only, 8/16/32-bit), IR ops with deducer, Python wrappers, DSL tuple-unpack API, and PTO codegen (`pto.tintlv`/`pto.tdintlv`). |
| Out of Scope Changes check | ✅ Passed | All code changes align with the PR scope and issue #1325 objectives: operator implementations, type deduction, Python bindings, documentation, and tests for the new interleave/deinterleave functionality. |
| Title check | ✅ Passed | The title clearly and specifically identifies the main change: adding `tile.interleave` and `tile.deinterleave` operators. The [BLOCKED on PTOAS] prefix appropriately indicates the status context. |


@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request implements the tile.interleave and tile.deinterleave operators, adding support across C++ IR operator registration, Python language bindings, documentation, and backend code generation. A review comment correctly identifies a potential issue in the backend codegen helper MakeInterleaveCodegenPTO, where mismatched empty type annotations could result in malformed PTOAS instructions. It is recommended to apply the suggested validation checks to ensure both type annotations are either consistently present or consistently absent.


Comment on lines +1005 to +1014
    std::ostringstream oss;
    oss << pto_op << " ins(" << lhs << ", " << rhs;
    if (!lhs_ty.empty() || !rhs_ty.empty()) {
      oss << " : " << lhs_ty << ", " << rhs_ty;
    }
    oss << ") outs(" << out0 << ", " << out1;
    if (!out0_ty.empty() || !out1_ty.empty()) {
      oss << " : " << out0_ty << ", " << out1_ty;
    }
    oss << ")";

Contributor


medium

In MakeInterleaveCodegenPTO, if one of the input or output type annotations is empty while the other is not, the generated PTOAS instruction will contain a malformed type clause (e.g., ins(%lhs, %rhs : , %rhs_ty) or outs(%out0, %out1 : %out0_ty, )).

To prevent generating malformed PTOAS, we should enforce that either both type annotations are present or both are absent, similar to the pattern used in MakeScatterCodegenPTO.

    INTERNAL_CHECK_SPAN(lhs_ty.empty() == rhs_ty.empty(), op->span_)
        << "Internal error: " << op->op_->name_ << " lhs/rhs type annotations must both be present or both absent, got lhs_ty='"
        << lhs_ty << "', rhs_ty='" << rhs_ty << "'";
    INTERNAL_CHECK_SPAN(out0_ty.empty() == out1_ty.empty(), op->span_)
        << "Internal error: " << op->op_->name_ << " output type annotations must both be present or both absent, got out0_ty='"
        << out0_ty << "', out1_ty='" << out1_ty << "'";

    std::ostringstream oss;
    oss << pto_op << " ins(" << lhs << ", " << rhs;
    if (!lhs_ty.empty() && !rhs_ty.empty()) {
      oss << " : " << lhs_ty << ", " << rhs_ty;
    }
    oss << ") outs(" << out0 << ", " << out1;
    if (!out0_ty.empty() && !out1_ty.empty()) {
      oss << " : " << out0_ty << ", " << out1_ty;
    }
    oss << ")";
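
The both-or-neither rule in the suggestion above can be modeled in a few lines of Python. This is a sketch for illustration only: `format_clause` and `format_interleave_op` are hypothetical helpers that mimic the string layout of the C++ snippet, and the `!t` type string is a placeholder rather than real PTO type syntax.

```python
def format_clause(keyword, names, types):
    """Render `keyword(a, b)` or `keyword(a, b : ta, tb)`."""
    present = [bool(t) for t in types]
    # Mirror the proposed check: annotations must be all present or all absent.
    if any(present) != all(present):
        raise ValueError(
            f"{keyword} type annotations must both be present or both absent, got {types!r}"
        )
    text = f"{keyword}({', '.join(names)}"
    if all(present):
        text += " : " + ", ".join(types)
    return text + ")"


def format_interleave_op(pto_op, ins, in_types, outs, out_types):
    """Assemble the full two-in, two-out instruction line."""
    return (
        f"{pto_op} {format_clause('ins', ins, in_types)} "
        f"{format_clause('outs', outs, out_types)}"
    )


print(format_interleave_op("pto.tintlv", ("%a", "%b"), ("!t", "!t"), ("%lo", "%hi"), ("!t", "!t")))
# pto.tintlv ins(%a, %b : !t, !t) outs(%lo, %hi : !t, !t)
```

With the original `||` condition, a call such as `in_types=("", "!t")` would render `ins(%a, %b : , !t)`; here it raises instead of emitting a malformed clause.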

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
src/ir/op/tile_ops/interleave.cpp (2)

127-141: ⚡ Quick win

Clarify the deinterleave semantics in the description.

The description states "even = even-indexed elements of lhs|rhs concat, odd = odd-indexed elements." The notation lhs|rhs concat is informal and may be unclear to readers unfamiliar with the hardware instruction.

Consider updating the description to specify the precise element-level semantics that match the PTO ISA tdintlv instruction, e.g., by clarifying the indexing scheme used to partition elements into even and odd outputs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ir/op/tile_ops/interleave.cpp` around lines 127 - 141, Update the
REGISTER_OP("tile.deinterleave") description to explicitly state element-level
semantics matching the PTO ISA tdintlv: define that the operation conceptually
forms a concatenated vector lhs||rhs (lhs elements followed by rhs elements),
then produces TupleType{even, odd} where "even" contains elements from the
concatenated vector at even indices (0,2,4,...) and "odd" contains elements at
odd indices (1,3,5,...); keep the note that lhs and rhs must have the same
dtype/shape/valid_shape (8/16/32-bit) and mention that index ordering is
element-level (not byte-level) so readers can map this to tdintlv and to
DeduceTileInterleaveType.
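
The deinterleave semantics proposed in this comment can be written as a small reference model. This is a sketch under assumptions: it operates on flat Python sequences rather than 2D tiles, `deinterleave_reference` is a hypothetical name, and the concat-then-split behavior is the reviewer's reading, not something confirmed against the PTO ISA here.

```python
def deinterleave_reference(lhs, rhs):
    """Split the lhs-then-rhs concatenation into even- and odd-indexed elements."""
    if len(lhs) != len(rhs):
        raise ValueError("lhs and rhs must have the same length")
    concat = list(lhs) + list(rhs)
    even = concat[0::2]  # elements at indices 0, 2, 4, ...
    odd = concat[1::2]   # elements at indices 1, 3, 5, ...
    return even, odd


even, odd = deinterleave_reference([0, 1, 2, 3], [4, 5, 6, 7])
print(even)  # [0, 2, 4, 6]
print(odd)   # [1, 3, 5, 7]
```

Indexing is per element, so each output has the same length as each input regardless of element width.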

109-125: ⚡ Quick win

Clarify the interleave semantics in the description.

The description states "low = lhs0,rhs0,lhs1,rhs1,... over the lower halves, high = same over the upper halves." The phrase "over the lower halves" is ambiguous—it's unclear whether this means:

  • The lower half of each input tile's elements are interleaved, or
  • Some other partitioning scheme.

The PR objectives mention "low/high interleaved streams" but don't define the exact element indices. Consider updating the description to specify the precise element-level semantics that match the PTO ISA tintlv instruction, e.g., by clarifying which element indices from lhs and rhs end up in low vs. high.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ir/op/tile_ops/interleave.cpp` around lines 109 - 125, The op description
for REGISTER_OP("tile.interleave") is ambiguous about "lower halves"; update the
.set_description text to explicitly state element-index semantics (and reference
the PTO tintlv behavior): define half = lane_count/2, then low = [lhs[0],
rhs[0], lhs[1], rhs[1], ..., lhs[half-1], rhs[half-1]] and high = [lhs[half],
rhs[half], lhs[half+1], rhs[half+1], ..., lhs[half+half-1], rhs[half+half-1]] so
readers know which input indices map to each output stream; keep this
description next to REGISTER_OP("tile.interleave") and ensure it matches
DeduceTileInterleaveType and PTO ISA tintlv semantics.
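
The index mapping spelled out in this comment can be checked with a short reference model. This is a sketch under assumptions: inputs are flat sequences of even length standing in for the tile's lanes, `interleave_reference` is a hypothetical name, and the low/high split follows the reviewer's wording rather than a confirmed PTO ISA definition.

```python
def interleave_reference(lhs, rhs):
    """Return (low, high): pairwise interleave of the lower and upper halves."""
    if len(lhs) != len(rhs):
        raise ValueError("lhs and rhs must have the same length")
    if len(lhs) % 2 != 0:
        raise ValueError("this sketch assumes an even element count")
    half = len(lhs) // 2
    # low = lhs[0], rhs[0], ..., lhs[half-1], rhs[half-1]
    low = [x for pair in zip(lhs[:half], rhs[:half]) for x in pair]
    # high = lhs[half], rhs[half], ..., lhs[-1], rhs[-1]
    high = [x for pair in zip(lhs[half:], rhs[half:]) for x in pair]
    return low, high


low, high = interleave_reference([0, 1, 2, 3], [10, 11, 12, 13])
print(low)   # [0, 10, 1, 11]
print(high)  # [2, 12, 3, 13]
```

Each output has the same length as an input, which is consistent with both outputs copying the lhs tile type.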
tests/ut/codegen/test_pto_codegen_ops.py (1)

2254-2268: ⚡ Quick win

Strengthen operand-count validation.

Lines 2261-2262 use ins_clause.count("%") to count ins/outs operands, which is fragile:

  • Type annotations may contain % (e.g., !pto.tile<%rows, %cols>), leading to overcounting.
  • Doesn't distinguish SSA value identifiers from other % uses.
  • Uses >= instead of exact equality, allowing extra operands to slip through.
🔧 Recommended fix: parse comma-separated operands before type annotation
     def _assert_two_in_two_out(self, mlir: str, pto_op: str, out_names: tuple[str, str]) -> None:
         op_lines = [line for line in mlir.splitlines() if pto_op in line]
         assert len(op_lines) == 1, f"Expected exactly one {pto_op}, got {len(op_lines)}:\n{mlir}"
         line = op_lines[0]
         assert "ins(" in line and "outs(" in line, f"{pto_op} must use ins(...) outs(...), got:\n{line}"
         ins_clause = line.split("ins(", 1)[1].split(")", 1)[0]
         outs_clause = line.split("outs(", 1)[1].split(")", 1)[0]
-        assert ins_clause.count("%") >= 2, f"{pto_op} must have two ins operands, got:\n{line}"
-        assert outs_clause.count("%") >= 2, f"{pto_op} must have two outs operands, got:\n{line}"
+        # Extract operands before type annotation (before ':')
+        ins_operands = [x.strip() for x in ins_clause.split(":", 1)[0].split(",") if x.strip().startswith("%")]
+        outs_operands = [x.strip() for x in outs_clause.split(":", 1)[0].split(",") if x.strip().startswith("%")]
+        assert len(ins_operands) == 2, f"{pto_op} must have exactly 2 ins operands, got {len(ins_operands)}:\n{line}"
+        assert len(outs_operands) == 2, f"{pto_op} must have exactly 2 outs operands, got {len(outs_operands)}:\n{line}"
         for name in out_names:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/codegen/test_pto_codegen_ops.py` around lines 2254 - 2268, The test
helper _assert_two_in_two_out should stop using
ins_clause.count("%")/outs_clause.count("%") and instead parse the
comma-separated operand list before any type annotations: split ins_clause and
outs_clause by commas, trim each token and strip any trailing type annotations
(e.g., remove content after whitespace or ':' or a '<'), then assert that the
resulting operand lists have length == 2 (use equality, not >=) and that each
entry matches expected SSA names; keep the existing check that each expected out
name (from out_names) appears in the parsed outs list and that exactly one
pto.alloc_tile line exists for each out name.
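
The difference between counting `%` characters and parsing operands can be seen in a standalone example. This is a sketch, not the test helper from the PR: `parse_operands` is a hypothetical function, and the sample line reuses the reviewer's illustrative `!pto.tile<%rows, %cols>` annotation, which may not match real PTO type syntax.

```python
def parse_operands(line, keyword):
    """Return the SSA operand names inside `keyword(...)`, ignoring any type list."""
    clause = line.split(f"{keyword}(", 1)[1].split(")", 1)[0]
    names = clause.split(":", 1)[0]  # drop everything after the type separator
    return [tok.strip() for tok in names.split(",") if tok.strip().startswith("%")]


line = (
    "pto.tintlv ins(%a, %b : !pto.tile<%rows, %cols>, !pto.tile<%rows, %cols>) "
    "outs(%low, %high)"
)

ins_clause = line.split("ins(", 1)[1].split(")", 1)[0]
print(ins_clause.count("%"))          # 6: the type annotations inflate the count
print(parse_operands(line, "ins"))    # ['%a', '%b']
print(parse_operands(line, "outs"))   # ['%low', '%high']
```

Asserting `len(parse_operands(...)) == 2` then rejects both missing and extra operands, which the `>= 2` check on raw `%` counts cannot do.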
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/en/dev/ir/05-operators.md`:
- Line 277: The docs for the interleave operator omit the rank requirement;
update the Interleave description for tile.interleave (and the table row for
**Interleave**) to state that both lhs and rhs must be 2D tiles (rank == 2) in
addition to matching dtype, shape, and valid_shape, and note the element width
constraints; reference the IR implementation in
src/ir/op/tile_ops/interleave.cpp for the enforced 2D constraint to ensure
documentation matches the code.
- Line 278: Update the `tile.deinterleave` doc row to explicitly state the 2D
(rank == 2) constraint: mention that, like `tile.interleave`,
`pl.tile.deinterleave(lhs, rhs)` requires 2D inputs (rank == 2) and follows the
same dtype/shape/valid_shape and 8/16/32-bit width restrictions; reference the
shared type-deducer behavior that enforces rank == 2 (the same rule applied in
`tile.interleave`) so readers don’t need to cross-reference the previous row.

In `@docs/zh-cn/dev/ir/05-operators.md`:
- Line 272: The entry for `tile.deinterleave` must explicitly state the 2D (rank
== 2) tile constraint rather than only saying "same constraints as
`tile.interleave`"; update the sentence for `tile.deinterleave` to mention that
both operators require tiles of rank == 2 (2D), the same dtype/shape/valid_shape
and bit-width constraints, and that this 2D requirement is enforced by their
shared type inference (rank == 2).
- Line 271: Update the interleave operator docs to state that lhs and rhs must
be 2D tiles (rank == 2) in addition to existing requirements; specifically note
that src/ir/op/tile_ops/interleave.cpp enforces a 2D tile constraint, so
document that lhs/rhs must have rank==2, matching dtype/shape/valid_shape,
element bitwidth (8/16/32), and that both outputs copy lhs tile type.

In `@tests/ut/ir/operators/test_interleave.py`:
- Around line 163-215: Add two tests to TestTileDeinterleaveTypes to mirror
interleave coverage: implement test_non_2d_raises which wraps a program using
pl.tile.deinterleave on 3D tiles and asserts pytest.raises(Exception, match="2D
tiles"), and implement test_valid_shape_mismatch_raises which creates a tile
with an altered valid shape via pl.tile.set_validshape then calls
pl.tile.deinterleave and asserts pytest.raises(Exception, match="valid_shape to
match"); place both methods in the TestTileDeinterleaveTypes class so they
reference pl.tile.deinterleave, pl.tile.set_validshape and the same pattern used
by existing tests (use `@pl.program` and `@pl.function` with appropriate Tensor/Tile
annotations).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f5c2af21-2ade-4c6d-ac23-0e10da63601d

📥 Commits

Reviewing files that changed from the base of the PR and between 3cc256c and 4ea58ec.

📒 Files selected for processing (9)
  • CMakeLists.txt
  • docs/en/dev/ir/05-operators.md
  • docs/zh-cn/dev/ir/05-operators.md
  • python/pypto/ir/op/tile_ops.py
  • python/pypto/language/op/tile_ops.py
  • src/backend/common/pto_ops_common.cpp
  • src/ir/op/tile_ops/interleave.cpp
  • tests/ut/codegen/test_pto_codegen_ops.py
  • tests/ut/ir/operators/test_interleave.py

| **Reduction** | `tile.sum` | Reduction along axis (axis, keepdim) |
| **Scatter** | `tile.scatter` | Row-scatter `src` into `dst` at per-row indices (`pto.tscatter` index form; DPS — `dst` is in/out, the result aliases `dst`). `src`/`dst` dtype ∈ {I8, I16, I32, FP16, FP32, BF16}; `indexes` dtype ∈ {I16, I32}; element-size matching rule: 4-byte dst ↔ INT32, 2-byte dst ↔ INT16, 1-byte dst ↔ INT16. |
| - | `tile.scatter_mask` | Mask-pattern row-scatter: write each `src` row into the mask-marked columns of `dst` (`pto.tscatter` mask form; DPS). Mask pattern selects positions: P0101 (1) / P1010 (2) — stride 2; P0001 (3) / P0010 (4) / P0100 (5) / P1000 (6) — stride 4; P1111 (7) — no expansion. Targeted at A3 / CPU-sim style backends — A5 rejects this form. |
| **Interleave** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` — interleave two same-typed tiles: `low` = `lhs0, rhs0, lhs1, rhs1, ...` over the lower halves, `high` = same over the upper halves. `lhs`/`rhs` must have identical dtype, shape, and valid_shape; both outputs copy the lhs tile type. Element widths 8/16/32-bit. Pending PTOAS tile-form support (`pto.tintlv`) — codegen-verified only, no on-device system test yet. |


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the 2D tile constraint.

The IR implementation enforces that both lhs and rhs must be 2D tiles (rank == 2). The documentation currently lists dtype/shape/valid_shape matching and element-width constraints but omits the 2D requirement, which could mislead users into attempting unsupported ranks.

📝 Suggested addition
-| **Interleave** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` — interleave two same-typed tiles: `low` = `lhs0, rhs0, lhs1, rhs1, ...` over the lower halves, `high` = same over the upper halves. `lhs`/`rhs` must have identical dtype, shape, and valid_shape; both outputs copy the lhs tile type. Element widths 8/16/32-bit. Pending PTOAS tile-form support (`pto.tintlv`) — codegen-verified only, no on-device system test yet. |
+| **Interleave** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` — interleave two same-typed 2D tiles: `low` = `lhs0, rhs0, lhs1, rhs1, ...` over the lower halves, `high` = same over the upper halves. `lhs`/`rhs` must be 2D tiles with identical dtype, shape, and valid_shape; both outputs copy the lhs tile type. Element widths 8/16/32-bit. Pending PTOAS tile-form support (`pto.tintlv`) — codegen-verified only, no on-device system test yet. |

Based on context from src/ir/op/tile_ops/interleave.cpp which enforces the 2D requirement.


| **Scatter** | `tile.scatter` | Row-scatter `src` into `dst` at per-row indices (`pto.tscatter` index form; DPS — `dst` is in/out, the result aliases `dst`). `src`/`dst` dtype ∈ {I8, I16, I32, FP16, FP32, BF16}; `indexes` dtype ∈ {I16, I32}; element-size matching rule: 4-byte dst ↔ INT32, 2-byte dst ↔ INT16, 1-byte dst ↔ INT16. |
| - | `tile.scatter_mask` | Mask-pattern row-scatter: write each `src` row into the mask-marked columns of `dst` (`pto.tscatter` mask form; DPS). Mask pattern selects positions: P0101 (1) / P1010 (2) — stride 2; P0001 (3) / P0010 (4) / P0100 (5) / P1000 (6) — stride 4; P1111 (7) — no expansion. Targeted at A3 / CPU-sim style backends — A5 rejects this form. |
| **Interleave** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` — interleave two same-typed tiles: `low` = `lhs0, rhs0, lhs1, rhs1, ...` over the lower halves, `high` = same over the upper halves. `lhs`/`rhs` must have identical dtype, shape, and valid_shape; both outputs copy the lhs tile type. Element widths 8/16/32-bit. Pending PTOAS tile-form support (`pto.tintlv`) — codegen-verified only, no on-device system test yet. |
| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` — split the `lhs\|rhs` concatenation into even-indexed and odd-indexed elements. Same dtype/shape/valid_shape constraints and 8/16/32-bit widths as `tile.interleave`. Pending PTOAS tile-form support (`pto.tdintlv`) — codegen-verified only, no on-device system test yet. |


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the 2D tile constraint explicitly.

Although this entry references "same constraints" as tile.interleave, the 2D requirement is not stated explicitly. Both operators share the same type deducer which enforces rank == 2. For clarity and to avoid requiring users to cross-reference the previous row, explicitly state the 2D constraint here as well.

📝 Suggested addition
-| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` — split the `lhs\|rhs` concatenation into even-indexed and odd-indexed elements. Same dtype/shape/valid_shape constraints and 8/16/32-bit widths as `tile.interleave`. Pending PTOAS tile-form support (`pto.tdintlv`) — codegen-verified only, no on-device system test yet. |
+| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` — split the `lhs\|rhs` concatenation into even-indexed and odd-indexed elements. `lhs`/`rhs` must be 2D tiles with identical dtype, shape, and valid_shape; element widths 8/16/32-bit (same constraints as `tile.interleave`). Pending PTOAS tile-form support (`pto.tdintlv`) — codegen-verified only, no on-device system test yet. |

Based on context from src/ir/op/tile_ops/interleave.cpp which enforces the 2D requirement.


| **规约** | `tile.sum` | 沿轴规约(axis, keepdim) |
| **散布** | `tile.scatter` | 按行索引把 `src` 散布到 `dst`(`pto.tscatter` 索引形式;DPS:`dst` 为 in/out,结果别名为 `dst`)。`src` / `dst` dtype ∈ {I8, I16, I32, FP16, FP32, BF16};`indexes` dtype ∈ {I16, I32};元素宽度匹配规则:4 字节 dst ↔ INT32,2 字节 dst ↔ INT16,1 字节 dst ↔ INT16。 |
| - | `tile.scatter_mask` | 按掩码模式把 `src` 行写入 `dst` 中由掩码选中的列(`pto.tscatter` 掩码形式;DPS)。掩码 P0101 (1) / P1010 (2) 步幅 2;P0001..P1000 (3-6) 步幅 4;P1111 (7) 不扩展。仅 A3 / CPU-sim 后端支持,A5 拒绝。 |
| **交织** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` —— 交织两个同类型 tile:`low` = 两输入低半部的 `lhs0, rhs0, lhs1, rhs1, ...`,`high` = 高半部同样交织。`lhs`/`rhs` 的 dtype、shape、valid_shape 必须完全一致;两个输出均复制 lhs 的 tile 类型。元素位宽 8/16/32-bit。等待 PTOAS tile 形式(`pto.tintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

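Add the 2D tile constraint.

The IR implementation requires both lhs and rhs to be 2D tiles (rank == 2). The documentation currently lists the dtype/shape/valid_shape matching and element-width constraints but omits the 2D requirement, which could mislead users into attempting unsupported ranks.

📝 Suggested change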
-| **交织** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` —— 交织两个同类型 tile:`low` = 两输入低半部的 `lhs0, rhs0, lhs1, rhs1, ...`,`high` = 高半部同样交织。`lhs`/`rhs` 的 dtype、shape、valid_shape 必须完全一致;两个输出均复制 lhs 的 tile 类型。元素位宽 8/16/32-bit。等待 PTOAS tile 形式(`pto.tintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |
+| **交织** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` —— 交织两个同类型 2D tile:`low` = 两输入低半部的 `lhs0, rhs0, lhs1, rhs1, ...`,`high` = 高半部同样交织。`lhs`/`rhs` 必须为 2D tile 且 dtype、shape、valid_shape 完全一致;两个输出均复制 lhs 的 tile 类型。元素位宽 8/16/32-bit。等待 PTOAS tile 形式(`pto.tintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |

Based on the 2D requirement enforced in src/ir/op/tile_ops/interleave.cpp.


| **散布** | `tile.scatter` | 按行索引把 `src` 散布到 `dst`(`pto.tscatter` 索引形式;DPS:`dst` 为 in/out,结果别名为 `dst`)。`src` / `dst` dtype ∈ {I8, I16, I32, FP16, FP32, BF16};`indexes` dtype ∈ {I16, I32};元素宽度匹配规则:4 字节 dst ↔ INT32,2 字节 dst ↔ INT16,1 字节 dst ↔ INT16。 |
| - | `tile.scatter_mask` | 按掩码模式把 `src` 行写入 `dst` 中由掩码选中的列(`pto.tscatter` 掩码形式;DPS)。掩码 P0101 (1) / P1010 (2) 步幅 2;P0001..P1000 (3-6) 步幅 4;P1111 (7) 不扩展。仅 A3 / CPU-sim 后端支持,A5 拒绝。 |
| **交织** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` —— 交织两个同类型 tile:`low` = 两输入低半部的 `lhs0, rhs0, lhs1, rhs1, ...`,`high` = 高半部同样交织。`lhs`/`rhs` 的 dtype、shape、valid_shape 必须完全一致;两个输出均复制 lhs 的 tile 类型。元素位宽 8/16/32-bit。等待 PTOAS tile 形式(`pto.tintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |
| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` —— 将 `lhs\|rhs` 拼接按偶/奇下标拆分为两个 tile。约束(同 dtype/shape/valid_shape、8/16/32-bit 位宽)与 `tile.interleave` 相同。等待 PTOAS tile 形式(`pto.tdintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

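Add the 2D tile constraint explicitly.

Although this entry says the constraints are the same as `tile.interleave`, it does not state the 2D requirement explicitly. Both operators share one type deducer, and both enforce rank == 2. For clarity, and so that users do not have to cross-reference the previous row, state the 2D constraint explicitly here as well.

📝 Suggested change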
-| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` —— 将 `lhs\|rhs` 拼接按偶/奇下标拆分为两个 tile。约束(同 dtype/shape/valid_shape、8/16/32-bit 位宽)与 `tile.interleave` 相同。等待 PTOAS tile 形式(`pto.tdintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |
+| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` —— 将 `lhs\|rhs` 拼接按偶/奇下标拆分为两个 tile。`lhs`/`rhs` 必须为 2D tile 且 dtype、shape、valid_shape 完全一致;元素位宽 8/16/32-bit(约束与 `tile.interleave` 相同)。等待 PTOAS tile 形式(`pto.tdintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |

Based on the 2D requirement enforced in src/ir/op/tile_ops/interleave.cpp.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/zh-cn/dev/ir/05-operators.md` at line 272, The entry for
`tile.deinterleave` must explicitly state the 2D (rank == 2) tile constraint
rather than only saying "same constraints as `tile.interleave`"; update the
sentence for `tile.deinterleave` to mention that both operators require tiles of
rank == 2 (2D), the same dtype/shape/valid_shape and bit-width constraints, and
that this 2D requirement is enforced by their shared type inference (rank == 2).
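
The documented `even`/`odd` split can be modelled the same way. This is a minimal sketch under the assumption that a flat 1D sequence stands in for each tile; `deinterleave_reference` is a hypothetical name, not the project's API.

```python
def deinterleave_reference(lhs, rhs):
    """Reference model of the documented deinterleave semantics: concatenate
    lhs|rhs, then split by even and odd index (assumption: flat sequences)."""
    if len(lhs) != len(rhs):
        raise ValueError("inputs must have the same length")
    concatenated = list(lhs) + list(rhs)
    even = concatenated[0::2]  # elements at indices 0, 2, 4, ...
    odd = concatenated[1::2]   # elements at indices 1, 3, 5, ...
    return even, odd


even, odd = deinterleave_reference([0, 10, 1, 11], [2, 12, 3, 13])
print(even)  # [0, 1, 2, 3]
print(odd)   # [10, 11, 12, 13]
```

Under this model, feeding in an alternating stream split into its low and high halves recovers the two original sequences, so the op acts as the inverse of interleaving.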

Comment on lines +163 to +215
class TestTileDeinterleaveTypes:
    """Type-contract tests for tile.deinterleave: same contract as interleave."""

    @pytest.mark.parametrize("dtype", _VALID_DTYPES)
    def test_valid_dtype(self, dtype):
        prog = _build_deinterleave_program(dtype=dtype)
        text = str(prog)
        assert "tile.deinterleave" in text
        assert text.count("tile.deinterleave") == 1

    @pytest.mark.parametrize("dtype", _INVALID_DTYPES)
    def test_invalid_dtype_raises(self, dtype):
        with pytest.raises(Exception, match="8/16/32-bit"):
            _build_deinterleave_program(dtype=dtype)

    def test_dtype_mismatch_raises(self):
        with pytest.raises(Exception, match="dtype to match"):

            @pl.program
            class BadDtypeDeintlv:
                @pl.function(type=pl.FunctionType.InCore)
                def main(
                    self,
                    lhs: pl.Tensor[[32, 64], pl.INT16],
                    rhs: pl.Tensor[[32, 64], pl.INT32],
                    out_even: pl.Tensor[[32, 64], pl.INT16],
                    out_odd: pl.Tensor[[32, 64], pl.INT16],
                ):
                    a: pl.Tile[[32, 64], pl.INT16] = pl.load(lhs, [0, 0], [32, 64])
                    b: pl.Tile[[32, 64], pl.INT32] = pl.load(rhs, [0, 0], [32, 64])
                    even, odd = pl.tile.deinterleave(a, b)
                    pl.store(even, [0, 0], out_even)
                    pl.store(odd, [0, 0], out_odd)

    def test_shape_mismatch_raises(self):
        with pytest.raises(Exception, match="shapes to match"):

            @pl.program
            class BadShapeDeintlv:
                @pl.function(type=pl.FunctionType.InCore)
                def main(
                    self,
                    lhs: pl.Tensor[[32, 64], pl.FP32],
                    rhs: pl.Tensor[[16, 64], pl.FP32],
                    out_even: pl.Tensor[[32, 64], pl.FP32],
                    out_odd: pl.Tensor[[32, 64], pl.FP32],
                ):
                    a: pl.Tile[[32, 64], pl.FP32] = pl.load(lhs, [0, 0], [32, 64])
                    b: pl.Tile[[16, 64], pl.FP32] = pl.load(rhs, [0, 0], [16, 64])
                    even, odd = pl.tile.deinterleave(a, b)
                    pl.store(even, [0, 0], out_even)
                    pl.store(odd, [0, 0], out_odd)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add missing test coverage for deinterleave.

TestTileDeinterleaveTypes is missing two test cases that are present in TestTileInterleaveTypes:

  1. test_non_2d_raises – verifies that deinterleave rejects non-2D tiles (PR objectives state "restricted to 2D tiles" for both ops)
  2. test_valid_shape_mismatch_raises – verifies that deinterleave rejects mismatched valid_shape

Although both ops share the same type deduction and would reject these cases, explicit test coverage ensures the contract is verified for both operators.

🧪 Suggested test additions

Add these two test methods to TestTileDeinterleaveTypes:

def test_non_2d_raises(self):
    with pytest.raises(Exception, match="2D tiles"):

        @pl.program
        class Bad3DDeintlv:
            @pl.function(type=pl.FunctionType.InCore)
            def main(
                self,
                lhs: pl.Tensor[[2, 32, 64], pl.FP32],
                rhs: pl.Tensor[[2, 32, 64], pl.FP32],
                out_even: pl.Tensor[[2, 32, 64], pl.FP32],
                out_odd: pl.Tensor[[2, 32, 64], pl.FP32],
            ):
                a: pl.Tile[[2, 32, 64], pl.FP32] = pl.load(lhs, [0, 0, 0], [2, 32, 64])
                b: pl.Tile[[2, 32, 64], pl.FP32] = pl.load(rhs, [0, 0, 0], [2, 32, 64])
                even, odd = pl.tile.deinterleave(a, b)
                pl.store(even, [0, 0, 0], out_even)
                pl.store(odd, [0, 0, 0], out_odd)

def test_valid_shape_mismatch_raises(self):
    with pytest.raises(Exception, match="valid_shape to match"):

        @pl.program
        class BadValidShapeDeintlv:
            @pl.function(type=pl.FunctionType.InCore)
            def main(
                self,
                lhs: pl.Tensor[[32, 64], pl.FP32],
                rhs: pl.Tensor[[32, 64], pl.FP32],
                out_even: pl.Tensor[[32, 64], pl.FP32],
                out_odd: pl.Tensor[[32, 64], pl.FP32],
            ):
                a: pl.Tile[[32, 64], pl.FP32] = pl.load(lhs, [0, 0], [32, 64])
                b_full: pl.Tile[[32, 64], pl.FP32] = pl.load(rhs, [0, 0], [32, 64])
                b: pl.Tile[[32, 64], pl.FP32] = pl.tile.set_validshape(b_full, 32, 32)
                even, odd = pl.tile.deinterleave(a, b)
                pl.store(even, [0, 0], out_even)
                pl.store(odd, [0, 0], out_odd)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/ir/operators/test_interleave.py` around lines 163 - 215, Add two
tests to TestTileDeinterleaveTypes to mirror interleave coverage: implement
test_non_2d_raises which wraps a program using pl.tile.deinterleave on 3D tiles
and asserts pytest.raises(Exception, match="2D tiles"), and implement
test_valid_shape_mismatch_raises which creates a tile with an altered valid
shape via pl.tile.set_validshape then calls pl.tile.deinterleave and asserts
pytest.raises(Exception, match="valid_shape to match"); place both methods in
the TestTileDeinterleaveTypes class so they reference pl.tile.deinterleave,
pl.tile.set_validshape and the same pattern used by existing tests (use
`@pl.program` and `@pl.function` with appropriate Tensor/Tile annotations).
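
To make the point about a shared contract concrete, here is a self-contained sketch of one deducer serving both ops, with the same rejection cases exercised for each. Everything in it is hypothetical: `TileType`, `deduce_pair_type`, and `rejection_reason` are illustrative stand-ins, the order of the checks is an assumption, and only the message fragments ("2D tiles", "dtype to match", "shapes to match", "valid_shape to match", "8/16/32-bit") are taken from the `match=` patterns in the tests above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileType:
    """Hypothetical stand-in for a tile type; not the project's real class."""
    dtype: str
    bits: int
    shape: tuple
    valid_shape: tuple


def deduce_pair_type(op_name, lhs, rhs):
    """Shared contract check for both ops (check order is an assumption)."""
    if len(lhs.shape) != 2 or len(rhs.shape) != 2:
        raise ValueError(f"{op_name} requires 2D tiles")
    if lhs.dtype != rhs.dtype:
        raise ValueError(f"{op_name} requires dtype to match")
    if lhs.shape != rhs.shape:
        raise ValueError(f"{op_name} requires shapes to match")
    if lhs.valid_shape != rhs.valid_shape:
        raise ValueError(f"{op_name} requires valid_shape to match")
    if lhs.bits not in (8, 16, 32):
        raise ValueError(f"{op_name} requires 8/16/32-bit element width")
    # Both results copy the lhs tile type.
    return (lhs, lhs)


def rejection_reason(op_name, lhs, rhs):
    """Return the error message for a rejected pair, or None if accepted."""
    try:
        deduce_pair_type(op_name, lhs, rhs)
    except ValueError as err:
        return str(err)
    return None


full = TileType("FP32", 32, (32, 64), (32, 64))
partial = TileType("FP32", 32, (32, 64), (32, 32))
for op in ("tile.interleave", "tile.deinterleave"):
    print(op, "->", rejection_reason(op, full, partial))
```

Looping over both op names with the same inputs is the plain-Python analogue of parametrizing one test over both operators, which is one way to keep coverage for the two ops from drifting apart.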

@Little-oil Little-oil marked this pull request as draft June 11, 2026 03:10
@Little-oil Little-oil changed the title from "feat(ir): Add tile.interleave and tile.deinterleave operators" to "[BLOCKED on PTOAS] feat(ir): Add tile.interleave and tile.deinterleave operators" Jun 11, 2026
@Little-oil

Contributor Author

Status: BLOCKED — converted to draft.

PTOAS v0.45 has no tile-form tintlv/tdintlv (pto-isa only documents vreg-level vintlv/vdintlv), so the emitted PTO cannot be assembled or run on device. UT/codegen tests are in place; this PR stays draft until PTOAS lands the tile interleave instructions. Single touch point on unblock: mnemonic constants in src/backend/common/pto_ops_common.cpp, then add device ST.

Development

Successfully merging this pull request may close these issues.

[New Op] Support interleave/deinterleave operations