[BLOCKED on PTOAS] feat(ir): Add tile.interleave and tile.deinterleave operators#1742
Little-oil wants to merge 1 commit into
Add tile-level interleave/deinterleave ops returning an ordered result
pair (low/high resp. even/odd) as TupleType{out0, out1}. Inputs must be
two 2D Vec tiles with matching dtype, shape, and valid_shape; element
widths 8/16/32-bit. Codegen emits pto.tintlv / pto.tdintlv with two ins
and two outs; the mnemonics are isolated in pto_ops_common.cpp pending
PTOAS tile-form support, so coverage is op and codegen UT only.
Closes hw-native-sys#1325
📝 Walkthrough
This PR adds full support for tile-level …
Changes: Tile Interleave/Deinterleave Operations Implementation
Code Review
This pull request implements the tile.interleave and tile.deinterleave operators, adding support across C++ IR operator registration, Python language bindings, documentation, and backend code generation. A review comment correctly identifies a potential issue in the backend codegen helper MakeInterleaveCodegenPTO, where mismatched empty type annotations could result in malformed PTOAS instructions. It is recommended to apply the suggested validation checks to ensure both type annotations are either consistently present or consistently absent.
std::ostringstream oss;
oss << pto_op << " ins(" << lhs << ", " << rhs;
if (!lhs_ty.empty() || !rhs_ty.empty()) {
  oss << " : " << lhs_ty << ", " << rhs_ty;
}
oss << ") outs(" << out0 << ", " << out1;
if (!out0_ty.empty() || !out1_ty.empty()) {
  oss << " : " << out0_ty << ", " << out1_ty;
}
oss << ")";
In MakeInterleaveCodegenPTO, if one of the input or output type annotations is empty while the other is not, the generated PTOAS instruction will contain a malformed type clause (e.g., ins(%lhs, %rhs : , %rhs_ty) or outs(%out0, %out1 : %out0_ty, )).
To prevent generating malformed PTOAS, we should enforce that either both type annotations are present or both are absent, similar to the pattern used in MakeScatterCodegenPTO.
INTERNAL_CHECK_SPAN(lhs_ty.empty() == rhs_ty.empty(), op->span_)
    << "Internal error: " << op->op_->name_ << " lhs/rhs type annotations must both be present or both absent, got lhs_ty='"
    << lhs_ty << "', rhs_ty='" << rhs_ty << "'";
INTERNAL_CHECK_SPAN(out0_ty.empty() == out1_ty.empty(), op->span_)
    << "Internal error: " << op->op_->name_ << " output type annotations must both be present or both absent, got out0_ty='"
    << out0_ty << "', out1_ty='" << out1_ty << "'";
std::ostringstream oss;
oss << pto_op << " ins(" << lhs << ", " << rhs;
if (!lhs_ty.empty() && !rhs_ty.empty()) {
  oss << " : " << lhs_ty << ", " << rhs_ty;
}
oss << ") outs(" << out0 << ", " << out1;
if (!out0_ty.empty() && !out1_ty.empty()) {
  oss << " : " << out0_ty << ", " << out1_ty;
}
oss << ")";
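The both-or-neither rule in the suggestion above can also be sketched outside C++. The Python below is a minimal illustration only: `format_two_in_two_out` and its parameters are hypothetical stand-ins for the locals of `MakeInterleaveCodegenPTO`, not part of the project's API, and the operand and type strings are placeholders.

```python
def format_two_in_two_out(pto_op, lhs, rhs, out0, out1,
                          lhs_ty="", rhs_ty="", out0_ty="", out1_ty=""):
    """Build '<op> ins(a, b[ : ta, tb]) outs(c, d[ : tc, td])'.

    Hypothetical helper: a half-annotated pair is rejected up front
    instead of producing a malformed clause such as 'ins(%a, %b : , T)'.
    """
    if bool(lhs_ty) != bool(rhs_ty):
        raise ValueError(
            f"{pto_op}: lhs/rhs type annotations must both be present or both absent"
        )
    if bool(out0_ty) != bool(out1_ty):
        raise ValueError(
            f"{pto_op}: output type annotations must both be present or both absent"
        )

    parts = [f"{pto_op} ins({lhs}, {rhs}"]
    if lhs_ty:  # rhs_ty is non-empty too, by the check above
        parts.append(f" : {lhs_ty}, {rhs_ty}")
    parts.append(f") outs({out0}, {out1}")
    if out0_ty:  # out1_ty is non-empty too, by the check above
        parts.append(f" : {out0_ty}, {out1_ty}")
    parts.append(")")
    return "".join(parts)
```

Because the checks run before any text is assembled, the later `if` branches only need to test one annotation of each pair.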
Actionable comments posted: 5
🧹 Nitpick comments (3)
src/ir/op/tile_ops/interleave.cpp (2)
127-141: ⚡ Quick win: Clarify the deinterleave semantics in the description.
The description states "even = even-indexed elements of lhs|rhs concat, odd = odd-indexed elements." The notation `lhs|rhs concat` is informal and may be unclear to readers unfamiliar with the hardware instruction.
Consider updating the description to specify the precise element-level semantics that match the PTO ISA `tdintlv` instruction, e.g., by clarifying the indexing scheme used to partition elements into `even` and `odd` outputs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/ir/op/tile_ops/interleave.cpp` around lines 127 - 141, Update the REGISTER_OP("tile.deinterleave") description to explicitly state element-level semantics matching the PTO ISA tdintlv: define that the operation conceptually forms a concatenated vector lhs||rhs (lhs elements followed by rhs elements), then produces TupleType{even, odd} where "even" contains elements from the concatenated vector at even indices (0,2,4,...) and "odd" contains elements at odd indices (1,3,5,...); keep the note that lhs and rhs must have the same dtype/shape/valid_shape (8/16/32-bit) and mention that index ordering is element-level (not byte-level) so readers can map this to tdintlv and to DeduceTileInterleaveType.
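The element-level reading proposed in this comment can be written down as a small reference model. This is a sketch under assumptions: it works on flat Python lists rather than 2D tiles, `deinterleave_reference` is a hypothetical name, and whether `tdintlv` applies the split per row or across the whole tile is not stated in this thread.

```python
def deinterleave_reference(lhs, rhs):
    """Split the concatenation lhs followed by rhs by element index.

    Assumed semantics (from the review comment, not verified against
    the PTO ISA): even takes indices 0, 2, 4, ... and odd takes
    indices 1, 3, 5, ... of the concatenated sequence.
    """
    if len(lhs) != len(rhs):
        raise ValueError("lhs and rhs must have the same length")
    concat = list(lhs) + list(rhs)
    even = concat[0::2]
    odd = concat[1::2]
    return even, odd
```

For two inputs of length n, each output also has length n, which is consistent with both results sharing the input tile type.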
109-125: ⚡ Quick win: Clarify the interleave semantics in the description.
The description states "low = lhs0,rhs0,lhs1,rhs1,... over the lower halves, high = same over the upper halves." The phrase "over the lower halves" is ambiguous; it is unclear whether this means:
- The lower half of each input tile's elements are interleaved, or
- Some other partitioning scheme.
The PR objectives mention "low/high interleaved streams" but don't define the exact element indices. Consider updating the description to specify the precise element-level semantics that match the PTO ISA `tintlv` instruction, e.g., by clarifying which element indices from `lhs` and `rhs` end up in `low` vs. `high`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/ir/op/tile_ops/interleave.cpp` around lines 109 - 125, The op description for REGISTER_OP("tile.interleave") is ambiguous about "lower halves"; update the .set_description text to explicitly state element-index semantics (and reference the PTO tintlv behavior): define half = lane_count/2, then low = [lhs[0], rhs[0], lhs[1], rhs[1], ..., lhs[half-1], rhs[half-1]] and high = [lhs[half], rhs[half], lhs[half+1], rhs[half+1], ..., lhs[half+half-1], rhs[half+half-1]] so readers know which input indices map to each output stream; keep this description next to REGISTER_OP("tile.interleave") and ensure it matches DeduceTileInterleaveType and PTO ISA tintlv semantics.
tests/ut/codegen/test_pto_codegen_ops.py (1)
2254-2268: ⚡ Quick win: Strengthen operand-count validation.
Lines 2261-2262 use `ins_clause.count("%")` to count ins/outs operands, which is fragile:
- Type annotations may contain `%` (e.g., `!pto.tile<%rows, %cols>`), leading to overcounting.
- Doesn't distinguish SSA value identifiers from other `%` uses.
- Uses `>=` instead of exact equality, allowing extra operands to slip through.
🔧 Recommended fix: parse comma-separated operands before type annotation
 def _assert_two_in_two_out(self, mlir: str, pto_op: str, out_names: tuple[str, str]) -> None:
     op_lines = [line for line in mlir.splitlines() if pto_op in line]
     assert len(op_lines) == 1, f"Expected exactly one {pto_op}, got {len(op_lines)}:\n{mlir}"
     line = op_lines[0]
     assert "ins(" in line and "outs(" in line, f"{pto_op} must use ins(...) outs(...), got:\n{line}"
     ins_clause = line.split("ins(", 1)[1].split(")", 1)[0]
     outs_clause = line.split("outs(", 1)[1].split(")", 1)[0]
-    assert ins_clause.count("%") >= 2, f"{pto_op} must have two ins operands, got:\n{line}"
-    assert outs_clause.count("%") >= 2, f"{pto_op} must have two outs operands, got:\n{line}"
+    # Extract operands before type annotation (before ':')
+    ins_operands = [x.strip() for x in ins_clause.split(":", 1)[0].split(",") if x.strip().startswith("%")]
+    outs_operands = [x.strip() for x in outs_clause.split(":", 1)[0].split(",") if x.strip().startswith("%")]
+    assert len(ins_operands) == 2, f"{pto_op} must have exactly 2 ins operands, got {len(ins_operands)}:\n{line}"
+    assert len(outs_operands) == 2, f"{pto_op} must have exactly 2 outs operands, got {len(outs_operands)}:\n{line}"
     for name in out_names:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/ut/codegen/test_pto_codegen_ops.py` around lines 2254 - 2268, The test helper _assert_two_in_two_out should stop using ins_clause.count("%")/outs_clause.count("%") and instead parse the comma-separated operand list before any type annotations: split ins_clause and outs_clause by commas, trim each token and strip any trailing type annotations (e.g., remove content after whitespace or ':' or a '<'), then assert that the resulting operand lists have length == 2 (use equality, not >=) and that each entry matches expected SSA names; keep the existing check that each expected out name (from out_names) appears in the parsed outs list and that exactly one pto.alloc_tile line exists for each out name.
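The overcounting problem is easy to reproduce in isolation. The following standalone sketch assumes a sample line whose type syntax is borrowed from the reviewer's `!pto.tile<%rows, %cols>` example; `extract_clause`, `parse_operands`, and `SAMPLE_LINE` are hypothetical names, and the real emitted MLIR may look different.

```python
# Illustrative line only; the type annotation syntax is an assumption.
SAMPLE_LINE = (
    "pto.tintlv ins(%a, %b : !pto.tile<%rows, %cols>, !pto.tile<%rows, %cols>) "
    "outs(%lo, %hi)"
)


def extract_clause(line, keyword):
    """Return the text between '<keyword>(' and the next ')'."""
    return line.split(keyword + "(", 1)[1].split(")", 1)[0]


def parse_operands(clause):
    """Return SSA operand names, ignoring everything after the first ':'."""
    operand_part = clause.split(":", 1)[0]
    return [tok.strip() for tok in operand_part.split(",") if tok.strip().startswith("%")]


ins_clause = extract_clause(SAMPLE_LINE, "ins")
naive_count = ins_clause.count("%")        # also counts '%' inside the types
ins_operands = parse_operands(ins_clause)  # only the operands before ':'
outs_operands = parse_operands(extract_clause(SAMPLE_LINE, "outs"))
```

On this sample the naive count sees six `%` characters in the `ins` clause, while the parsed operand list has exactly two entries, which is what an equality assertion should check.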
📒 Files selected for processing (9)
- CMakeLists.txt
- docs/en/dev/ir/05-operators.md
- docs/zh-cn/dev/ir/05-operators.md
- python/pypto/ir/op/tile_ops.py
- python/pypto/language/op/tile_ops.py
- src/backend/common/pto_ops_common.cpp
- src/ir/op/tile_ops/interleave.cpp
- tests/ut/codegen/test_pto_codegen_ops.py
- tests/ut/ir/operators/test_interleave.py
| **Reduction** | `tile.sum` | Reduction along axis (axis, keepdim) |
| **Scatter** | `tile.scatter` | Row-scatter `src` into `dst` at per-row indices (`pto.tscatter` index form; DPS — `dst` is in/out, the result aliases `dst`). `src`/`dst` dtype ∈ {I8, I16, I32, FP16, FP32, BF16}; `indexes` dtype ∈ {I16, I32}; element-size matching rule: 4-byte dst ↔ INT32, 2-byte dst ↔ INT16, 1-byte dst ↔ INT16. |
| - | `tile.scatter_mask` | Mask-pattern row-scatter: write each `src` row into the mask-marked columns of `dst` (`pto.tscatter` mask form; DPS). Mask pattern selects positions: P0101 (1) / P1010 (2) — stride 2; P0001 (3) / P0010 (4) / P0100 (5) / P1000 (6) — stride 4; P1111 (7) — no expansion. Targeted at A3 / CPU-sim style backends — A5 rejects this form. |
| **Interleave** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` — interleave two same-typed tiles: `low` = `lhs0, rhs0, lhs1, rhs1, ...` over the lower halves, `high` = same over the upper halves. `lhs`/`rhs` must have identical dtype, shape, and valid_shape; both outputs copy the lhs tile type. Element widths 8/16/32-bit. Pending PTOAS tile-form support (`pto.tintlv`) — codegen-verified only, no on-device system test yet. |
Add the 2D tile constraint.
The IR implementation enforces that both lhs and rhs must be 2D tiles (rank == 2). The documentation currently lists dtype/shape/valid_shape matching and element-width constraints but omits the 2D requirement, which could mislead users into attempting unsupported ranks.
📝 Suggested addition
-| **Interleave** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` — interleave two same-typed tiles: `low` = `lhs0, rhs0, lhs1, rhs1, ...` over the lower halves, `high` = same over the upper halves. `lhs`/`rhs` must have identical dtype, shape, and valid_shape; both outputs copy the lhs tile type. Element widths 8/16/32-bit. Pending PTOAS tile-form support (`pto.tintlv`) — codegen-verified only, no on-device system test yet. |
+| **Interleave** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` — interleave two same-typed 2D tiles: `low` = `lhs0, rhs0, lhs1, rhs1, ...` over the lower halves, `high` = same over the upper halves. `lhs`/`rhs` must be 2D tiles with identical dtype, shape, and valid_shape; both outputs copy the lhs tile type. Element widths 8/16/32-bit. Pending PTOAS tile-form support (`pto.tintlv`) — codegen-verified only, no on-device system test yet. |
Based on context from src/ir/op/tile_ops/interleave.cpp, which enforces the 2D requirement.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/en/dev/ir/05-operators.md` at line 277, The docs for the interleave
operator omit the rank requirement; update the Interleave description for
tile.interleave (and the table row for **Interleave**) to state that both lhs
and rhs must be 2D tiles (rank == 2) in addition to matching dtype, shape, and
valid_shape, and note the element width constraints; reference the IR
implementation in src/ir/op/tile_ops/interleave.cpp for the enforced 2D
constraint to ensure documentation matches the code.
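One plausible reading of the "lower halves" / "upper halves" wording in the doc row is the element mapping proposed in the nitpick on `interleave.cpp`. The sketch below models that reading on flat Python lists; `interleave_reference` is a hypothetical name, and the mapping is an assumption that has not been checked against the PTO ISA `tintlv` definition.

```python
def interleave_reference(lhs, rhs):
    """Interleave two equal-length sequences into (low, high).

    Assumed mapping: with half = len(lhs) // 2,
      low  = lhs[0], rhs[0], ..., lhs[half-1], rhs[half-1]
      high = lhs[half], rhs[half], ..., lhs[-1], rhs[-1]
    """
    if len(lhs) != len(rhs):
        raise ValueError("lhs and rhs must have the same length")
    if len(lhs) % 2 != 0:
        raise ValueError("this sketch assumes an even element count")
    half = len(lhs) // 2

    def zip_flat(a, b):
        # Alternate elements of a and b: a0, b0, a1, b1, ...
        out = []
        for x, y in zip(a, b):
            out.extend((x, y))
        return out

    low = zip_flat(lhs[:half], rhs[:half])
    high = zip_flat(lhs[half:], rhs[half:])
    return low, high
```

Under this reading each output has the same element count as an input, which matches the statement that both outputs copy the lhs tile type.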
| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` — split the `lhs\|rhs` concatenation into even-indexed and odd-indexed elements. Same dtype/shape/valid_shape constraints and 8/16/32-bit widths as `tile.interleave`. Pending PTOAS tile-form support (`pto.tdintlv`) — codegen-verified only, no on-device system test yet. |
Add the 2D tile constraint explicitly.
Although this entry references "same constraints" as tile.interleave, the 2D requirement is not stated explicitly. Both operators share the same type deducer which enforces rank == 2. For clarity and to avoid requiring users to cross-reference the previous row, explicitly state the 2D constraint here as well.
📝 Suggested addition
-| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` — split the `lhs\|rhs` concatenation into even-indexed and odd-indexed elements. Same dtype/shape/valid_shape constraints and 8/16/32-bit widths as `tile.interleave`. Pending PTOAS tile-form support (`pto.tdintlv`) — codegen-verified only, no on-device system test yet. |
+| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` — split the `lhs\|rhs` concatenation into even-indexed and odd-indexed elements. `lhs`/`rhs` must be 2D tiles with identical dtype, shape, and valid_shape; element widths 8/16/32-bit (same constraints as `tile.interleave`). Pending PTOAS tile-form support (`pto.tdintlv`) — codegen-verified only, no on-device system test yet. |
Based on context from src/ir/op/tile_ops/interleave.cpp, which enforces the 2D requirement.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/en/dev/ir/05-operators.md` at line 278, Update the `tile.deinterleave`
doc row to explicitly state the 2D (rank == 2) constraint: mention that, like
`tile.interleave`, `pl.tile.deinterleave(lhs, rhs)` requires 2D inputs (rank ==
2) and follows the same dtype/shape/valid_shape and 8/16/32-bit width
restrictions; reference the shared type-deducer behavior that enforces rank == 2
(the same rule applied in `tile.interleave`) so readers don’t need to
cross-reference the previous row.
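The shared type contract that both doc rows describe can be summarized as a small checker. This is a sketch, not the project's `DeduceTileInterleaveType`: `TileType` and `deduce_pair_type` are hypothetical, the order of the checks is an assumption, and the error wording only borrows the substrings that the unit tests match on ("2D tiles", "dtype to match", "shapes to match", "valid_shape to match", "8/16/32-bit").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileType:
    """Hypothetical stand-in for a tile type; not the project's class."""
    dtype: str
    bits: int
    shape: tuple
    valid_shape: tuple


def deduce_pair_type(lhs, rhs, op_name="tile.deinterleave"):
    """Validate the two-input contract and return the two result types."""
    if len(lhs.shape) != 2 or len(rhs.shape) != 2:
        raise ValueError(f"{op_name} requires 2D tiles")
    if lhs.dtype != rhs.dtype:
        raise ValueError(f"{op_name} requires lhs/rhs dtype to match")
    if lhs.shape != rhs.shape:
        raise ValueError(f"{op_name} requires lhs/rhs shapes to match")
    if lhs.valid_shape != rhs.valid_shape:
        raise ValueError(f"{op_name} requires lhs/rhs valid_shape to match")
    if lhs.bits not in (8, 16, 32):
        raise ValueError(f"{op_name} requires an 8/16/32-bit element type")
    # Per the docs, both outputs copy the lhs tile type.
    return lhs, lhs
```

Keeping one checker for both ops is what makes it reasonable to state the 2D rule in each doc row: the constraint applies identically to `tile.interleave` and `tile.deinterleave`.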
| **规约** | `tile.sum` | 沿轴规约(axis, keepdim) |
| **散布** | `tile.scatter` | 按行索引把 `src` 散布到 `dst`(`pto.tscatter` 索引形式;DPS:`dst` 为 in/out,结果别名为 `dst`)。`src` / `dst` dtype ∈ {I8, I16, I32, FP16, FP32, BF16};`indexes` dtype ∈ {I16, I32};元素宽度匹配规则:4 字节 dst ↔ INT32,2 字节 dst ↔ INT16,1 字节 dst ↔ INT16。 |
| - | `tile.scatter_mask` | 按掩码模式把 `src` 行写入 `dst` 中由掩码选中的列(`pto.tscatter` 掩码形式;DPS)。掩码 P0101 (1) / P1010 (2) 步幅 2;P0001..P1000 (3-6) 步幅 4;P1111 (7) 不扩展。仅 A3 / CPU-sim 后端支持,A5 拒绝。 |
| **交织** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` —— 交织两个同类型 tile:`low` = 两输入低半部的 `lhs0, rhs0, lhs1, rhs1, ...`,`high` = 高半部同样交织。`lhs`/`rhs` 的 dtype、shape、valid_shape 必须完全一致;两个输出均复制 lhs 的 tile 类型。元素位宽 8/16/32-bit。等待 PTOAS tile 形式(`pto.tintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |
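Add the 2D tile constraint.
The IR implementation requires both lhs and rhs to be 2D tiles (rank == 2). The documentation currently lists the dtype/shape/valid_shape matching and element-width constraints but omits the 2D requirement, which could mislead users into trying unsupported ranks.
📝 Suggested change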
-| **交织** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` —— 交织两个同类型 tile:`low` = 两输入低半部的 `lhs0, rhs0, lhs1, rhs1, ...`,`high` = 高半部同样交织。`lhs`/`rhs` 的 dtype、shape、valid_shape 必须完全一致;两个输出均复制 lhs 的 tile 类型。元素位宽 8/16/32-bit。等待 PTOAS tile 形式(`pto.tintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |
+| **交织** | `tile.interleave` | `low, high = pl.tile.interleave(lhs, rhs)` —— 交织两个同类型 2D tile:`low` = 两输入低半部的 `lhs0, rhs0, lhs1, rhs1, ...`,`high` = 高半部同样交织。`lhs`/`rhs` 必须为 2D tile 且 dtype、shape、valid_shape 完全一致;两个输出均复制 lhs 的 tile 类型。元素位宽 8/16/32-bit。等待 PTOAS tile 形式(`pto.tintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |
Based on the 2D requirement enforced in src/ir/op/tile_ops/interleave.cpp.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/zh-cn/dev/ir/05-operators.md` at line 271, Update the interleave
operator docs to state that lhs and rhs must be 2D tiles (rank == 2) in addition
to existing requirements; specifically note that
src/ir/op/tile_ops/interleave.cpp enforces a 2D tile constraint, so document
that lhs/rhs must have rank==2, matching dtype/shape/valid_shape, element
bitwidth (8/16/32), and that both outputs copy lhs tile type.
| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` —— 将 `lhs\|rhs` 拼接按偶/奇下标拆分为两个 tile。约束(同 dtype/shape/valid_shape、8/16/32-bit 位宽)与 `tile.interleave` 相同。等待 PTOAS tile 形式(`pto.tdintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |
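Add the 2D tile constraint explicitly.
Although this entry says its constraints are "the same as tile.interleave", it does not state the 2D requirement explicitly. Both operators share the same type deducer, which enforces rank == 2. For clarity, and so that users need not cross-reference the previous row, state the 2D constraint explicitly here as well.
📝 Suggested change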
-| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` —— 将 `lhs\|rhs` 拼接按偶/奇下标拆分为两个 tile。约束(同 dtype/shape/valid_shape、8/16/32-bit 位宽)与 `tile.interleave` 相同。等待 PTOAS tile 形式(`pto.tdintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |
+| - | `tile.deinterleave` | `even, odd = pl.tile.deinterleave(lhs, rhs)` —— 将 `lhs\|rhs` 拼接按偶/奇下标拆分为两个 tile。`lhs`/`rhs` 必须为 2D tile 且 dtype、shape、valid_shape 完全一致;元素位宽 8/16/32-bit(约束与 `tile.interleave` 相同)。等待 PTOAS tile 形式(`pto.tdintlv`)支持 —— 目前仅 codegen 验证,无在板系统测试。 |
Based on the 2D requirement enforced in src/ir/op/tile_ops/interleave.cpp.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/zh-cn/dev/ir/05-operators.md` at line 272, The entry for
`tile.deinterleave` must explicitly state the 2D (rank == 2) tile constraint
rather than only saying "same constraints as `tile.interleave`"; update the
sentence for `tile.deinterleave` to mention that both operators require tiles of
rank == 2 (2D), the same dtype/shape/valid_shape and bit-width constraints, and
that this 2D requirement is enforced by their shared type inference (rank == 2).
class TestTileDeinterleaveTypes:
    """Type-contract tests for tile.deinterleave: same contract as interleave."""

    @pytest.mark.parametrize("dtype", _VALID_DTYPES)
    def test_valid_dtype(self, dtype):
        prog = _build_deinterleave_program(dtype=dtype)
        text = str(prog)
        assert "tile.deinterleave" in text
        assert text.count("tile.deinterleave") == 1

    @pytest.mark.parametrize("dtype", _INVALID_DTYPES)
    def test_invalid_dtype_raises(self, dtype):
        with pytest.raises(Exception, match="8/16/32-bit"):
            _build_deinterleave_program(dtype=dtype)

    def test_dtype_mismatch_raises(self):
        with pytest.raises(Exception, match="dtype to match"):

            @pl.program
            class BadDtypeDeintlv:
                @pl.function(type=pl.FunctionType.InCore)
                def main(
                    self,
                    lhs: pl.Tensor[[32, 64], pl.INT16],
                    rhs: pl.Tensor[[32, 64], pl.INT32],
                    out_even: pl.Tensor[[32, 64], pl.INT16],
                    out_odd: pl.Tensor[[32, 64], pl.INT16],
                ):
                    a: pl.Tile[[32, 64], pl.INT16] = pl.load(lhs, [0, 0], [32, 64])
                    b: pl.Tile[[32, 64], pl.INT32] = pl.load(rhs, [0, 0], [32, 64])
                    even, odd = pl.tile.deinterleave(a, b)
                    pl.store(even, [0, 0], out_even)
                    pl.store(odd, [0, 0], out_odd)

    def test_shape_mismatch_raises(self):
        with pytest.raises(Exception, match="shapes to match"):

            @pl.program
            class BadShapeDeintlv:
                @pl.function(type=pl.FunctionType.InCore)
                def main(
                    self,
                    lhs: pl.Tensor[[32, 64], pl.FP32],
                    rhs: pl.Tensor[[16, 64], pl.FP32],
                    out_even: pl.Tensor[[32, 64], pl.FP32],
                    out_odd: pl.Tensor[[32, 64], pl.FP32],
                ):
                    a: pl.Tile[[32, 64], pl.FP32] = pl.load(lhs, [0, 0], [32, 64])
                    b: pl.Tile[[16, 64], pl.FP32] = pl.load(rhs, [0, 0], [16, 64])
                    even, odd = pl.tile.deinterleave(a, b)
                    pl.store(even, [0, 0], out_even)
                    pl.store(odd, [0, 0], out_odd)
Add missing test coverage for deinterleave.
TestTileDeinterleaveTypes is missing two test cases that are present in TestTileInterleaveTypes:
- `test_non_2d_raises`: verifies that deinterleave rejects non-2D tiles (PR objectives state "restricted to 2D tiles" for both ops)
- `test_valid_shape_mismatch_raises`: verifies that deinterleave rejects mismatched valid_shape
Although both ops share the same type deduction and would reject these cases, explicit test coverage ensures the contract is verified for both operators.
🧪 Suggested test additions
Add these two test methods to TestTileDeinterleaveTypes:
def test_non_2d_raises(self):
    with pytest.raises(Exception, match="2D tiles"):

        @pl.program
        class Bad3DDeintlv:
            @pl.function(type=pl.FunctionType.InCore)
            def main(
                self,
                lhs: pl.Tensor[[2, 32, 64], pl.FP32],
                rhs: pl.Tensor[[2, 32, 64], pl.FP32],
                out_even: pl.Tensor[[2, 32, 64], pl.FP32],
                out_odd: pl.Tensor[[2, 32, 64], pl.FP32],
            ):
                a: pl.Tile[[2, 32, 64], pl.FP32] = pl.load(lhs, [0, 0, 0], [2, 32, 64])
                b: pl.Tile[[2, 32, 64], pl.FP32] = pl.load(rhs, [0, 0, 0], [2, 32, 64])
                even, odd = pl.tile.deinterleave(a, b)
                pl.store(even, [0, 0, 0], out_even)
                pl.store(odd, [0, 0, 0], out_odd)

def test_valid_shape_mismatch_raises(self):
    with pytest.raises(Exception, match="valid_shape to match"):

        @pl.program
        class BadValidShapeDeintlv:
            @pl.function(type=pl.FunctionType.InCore)
            def main(
                self,
                lhs: pl.Tensor[[32, 64], pl.FP32],
                rhs: pl.Tensor[[32, 64], pl.FP32],
                out_even: pl.Tensor[[32, 64], pl.FP32],
                out_odd: pl.Tensor[[32, 64], pl.FP32],
            ):
                a: pl.Tile[[32, 64], pl.FP32] = pl.load(lhs, [0, 0], [32, 64])
                b_full: pl.Tile[[32, 64], pl.FP32] = pl.load(rhs, [0, 0], [32, 64])
                b: pl.Tile[[32, 64], pl.FP32] = pl.tile.set_validshape(b_full, 32, 32)
                even, odd = pl.tile.deinterleave(a, b)
                pl.store(even, [0, 0], out_even)
                pl.store(odd, [0, 0], out_odd)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/ut/ir/operators/test_interleave.py` around lines 163 - 215, Add two
tests to TestTileDeinterleaveTypes to mirror interleave coverage: implement
test_non_2d_raises which wraps a program using pl.tile.deinterleave on 3D tiles
and asserts pytest.raises(Exception, match="2D tiles"), and implement
test_valid_shape_mismatch_raises which creates a tile with an altered valid
shape via pl.tile.set_validshape then calls pl.tile.deinterleave and asserts
pytest.raises(Exception, match="valid_shape to match"); place both methods in
the TestTileDeinterleaveTypes class so they reference pl.tile.deinterleave,
pl.tile.set_validshape and the same pattern used by existing tests (use
`@pl.program` and `@pl.function` with appropriate Tensor/Tile annotations).
Status: BLOCKED, converted to draft. PTOAS v0.45 has no tile-form …
Summary
- `tile.interleave`/`tile.deinterleave` ops (issue [New Op] Support interleave/deinterleave operations #1325): two same-typed 2D Vec tiles in, ordered result pair out (`TupleType{low, high}` resp. `{even, odd}`), mirroring the `gather_compare` multi-result pattern end to end (C++ op + type deduction, Python IR wrapper, DSL tuple-unpack API, codegen)
- `pto.tintlv`/`pto.tdintlv` (2 ins + 2 outs, one `alloc_tile` per output) from a single shared factory; mnemonics are constants in `pto_ops_common.cpp`, the single touch point once PTOAS adds tile-form interleave
Testing
- … (`tests/ut/ir/operators/test_interleave.py`)
- … (`tests/ut/codegen/test_pto_codegen_ops.py`)
- … `tintlv`/`tdintlv`; ST follows when PTOAS lands
Related Issues
Closes #1325