Skip to content

feat(backend): add SuperscalarNPU backend with TREG register-file allocation#1680

Open
lyfne123 wants to merge 2 commits into
hw-native-sys:mainfrom
lyfne123:superscalar-npu-backend
Open

feat(backend): add SuperscalarNPU backend with TREG register-file allocation#1680
lyfne123 wants to merge 2 commits into
hw-native-sys:mainfrom
lyfne123:superscalar-npu-backend

Conversation

@lyfne123

@lyfne123 lyfne123 commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds a new SuperscalarNPU backend whose only memory spaces are DDR and TREG — a register file of 256 fixed 4 KB blocks (1 MB), addressed by block index. The backend lowers tensor ops to tile ops and allocates TREG via register renaming. Code generation is not implemented yet; this is an IR-only backend.

Memory space

  • Add MemorySpace::TREG across the enum, string maps, Python binding, type stub, and IR-hierarchy BNF.

Register-renaming allocation

  • Extend MemoryAllocatorPolicy with AddressUnitBytes (default 1) and MaxAddressUnits (default unbounded).
  • AllocateMemoryAddr stores block indices for register-file spaces (unit > 1), reserves consecutive blocks for tiles larger than one block, and raises a user-facing error when more blocks are simultaneously live than exist (register pressure). The AllocatedMemoryAddr verifier scales stored addresses by the unit so the byte-based capacity check stays unit-agnostic.
  • Add SuperscalarNPURegisterAllocatorPolicy (4 KB blocks, 256-block limit). Liveness-based coalescing is still done by the shared MemoryReuse pass.

Backend registration

  • Add BackendType::SuperscalarNPU, CreateSuperscalarNPUSoC (DDR + 1 MB TREG), BackendSuperscalarNPU + SuperscalarNPUHandler, factory/registry wiring, CMake sources, Python bindings, and type stubs.

Pass pipeline

  • Add OptimizationStrategy.SuperscalarNPU: tensor prefix + outline + ConvertTensorToTileOps + InferTileMemorySpace + InitMemRef + MemoryReuse + AllocateMemoryAddr, omitting all Ascend cube/vector/cross-core passes.
  • Add BackendHandler::GetDefaultOnChipMemorySpace (Vec on Ascend, TREG here) so InferTileMemorySpace targets each backend's on-chip storage without branching on BackendType; a generic Vec op input-constraint is realized as this default so no spurious tile.move is inserted. Ascend behavior is unchanged (its default is Vec, so all new code paths are no-ops there).

Testing

  • New backend tests (construction / SoC / handler) and an end-to-end TREG strategy test: TREG assignment, block-index addressing, register renaming, and the register-pressure error.
  • Regression: tests/ut/ir/transforms/ (1550 passed), tests/ut/codegen/ (408 passed), allocate/reuse/infer suites — Ascend unaffected.
  • Codegen for SuperscalarNPU (out of scope; backend is IR-only for now).

Notes

  • CoreType only offers CUBE/VECTOR; the single SuperscalarNPU compute core reuses VECTOR (noted in the SoC factory).

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4ae55ca5-9ef0-4848-a00c-396ccf92454c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces support for a new SuperscalarNPU backend with a register-file memory model. It expands IR memory abstractions, adds a complete backend implementation with TREG (register file) memory, refactors tile memory-space inference and address allocation passes to use backend-specific on-chip defaults and block-indexed addressing, integrates the backend into the dispatch system, exposes it to Python, documents the architecture, and validates the implementation with comprehensive tests.

Changes

SuperscalarNPU Backend Addition

Layer / File(s) Summary
IR Core Memory Types and Allocator Policy Abstraction
include/pypto/ir/memory_space.h, include/pypto/ir/memory_allocator_policy.h, src/ir/memref.cpp
MemorySpace enum gains TREG for SuperscalarNPU register files. MemoryAllocatorPolicy interface adds AddressUnitBytes() and MaxAddressUnits() virtual hooks (with default implementations) to support block-indexed addressing schemes. String serialization for TREG is added.
Backend Handler Default On-Chip Memory Space
include/pypto/backend/common/backend_handler.h
New GetDefaultOnChipMemorySpace() virtual hook on BackendHandler allows backends to specify the on-chip memory space used when no constraint is present, defaulting to Vec for backward compatibility.
SuperscalarNPU Backend and SoC Definition
include/pypto/backend/superscalar/backend_superscalar_npu.h, include/pypto/backend/superscalar/backend_superscalar_npu_handler.h, include/pypto/backend/superscalar/register_allocator_policy.h, include/pypto/backend/common/soc.h, src/backend/superscalar/backend_superscalar_npu.cpp, src/backend/superscalar/backend_superscalar_npu_handler.cpp, src/backend/common/soc.cpp
Implements BackendSuperscalarNPU class with singleton accessor, handler returning SuperscalarNPUHandler (which overrides all backend interface methods with register-file semantics: TREG as default on-chip space, no Vec/Mat buffers, zero capacities/thresholds), and register allocator policy providing 4KB-block addressing. Defines CreateSuperscalarNPUSoC() factory creating a single-core VECTOR SoC with TREG (256×4KB) as only storage plus DDR.
Backend Type System Integration and Dispatch
include/pypto/backend/common/backend.h, src/backend/common/backend.cpp, src/backend/common/backend_registry.cpp
Adds BackendType::SuperscalarNPU enum variant. Extends GetBackendInstance() to dispatch to BackendSuperscalarNPU::Instance(). Updates BackendTypeToString(). Registers SuperscalarNPU in backend registry as a singleton-only backend (factory throws).
Transform Passes Refactoring for Backend-Specific Defaults
src/ir/transforms/infer_tile_memory_space_pass.cpp, src/ir/transforms/allocate_memory_addr_pass.cpp
Refactors InferTileMemorySpace to query backend's default on-chip memory space and thread it through all inference phases, remapping Vec constraints to backend defaults so SuperscalarNPU (which uses TREG) avoids spurious move insertions. Refactors AllocateMemoryAddresses to derive addressing unit and max units per space from allocator policy, computing block indices for register-file spaces and validating register pressure against policy-reported capacity limits.
Python Bindings and Pass Strategy
python/bindings/modules/backend.cpp, python/bindings/modules/ir.cpp, python/pypto/pypto_core/backend.pyi, python/pypto/pypto_core/ir.pyi, python/pypto/backend/__init__.py, python/pypto/ir/pass_manager.py
Exposes BackendSuperscalarNPU class, BackendType.SuperscalarNPU, and ir.MemorySpace.TREG to Python nanobind bindings. Adds OptimizationStrategy.SuperscalarNPU with tensor and tile pass sequences. Exports backend classes via module initializers.
Documentation and Build Configuration
CMakeLists.txt, docs/en/dev/backend/00-backend_handler.md, docs/en/dev/ir/01-hierarchy.md, docs/zh-cn/dev/backend/00-backend_handler.md, docs/zh-cn/dev/ir/01-hierarchy.md
Adds two backend implementation files to CMakeLists.txt. Documents GetDefaultOnChipMemorySpace() and SuperscalarNPU behavior in backend handler docs (English and Chinese). Updates IR hierarchy grammar to include TREG memory space. Explains how default on-chip space and register allocator policy eliminate the need to branch on backend type in shared passes.
Tests and Validation
tests/ut/backend/test_backend_superscalar_npu.py, tests/ut/ir/transforms/test_superscalar_npu_treg.py
Backend unit tests validate singleton pattern, type name, SoC topology (1 die, 1 cluster, 1 core), TREG size (1MB = 256×4KB), DDR→TREG memory path, absence of Vec/Mat buffers, and handler defaults. End-to-end transform tests validate elementwise lowering to TREG, register renaming reducing allocations, block-indexed addressing, and register pressure error when exceeding 256 blocks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

enhancement

Poem

🐰 A register file hops into view,
Four kilobyte blocks, two hundred fifty-six or two,
SuperscalarNPU takes its place with care,
Default on-chip space in backend layers fair,
No spurious moves, just blocks addressed with grace!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding a new SuperscalarNPU backend with TREG register-file allocation, which is the primary focus of this PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, covering memory space additions, register-renaming allocation, backend registration, pass pipeline, testing, and design notes about the SuperscalarNPU backend.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the SuperscalarNPU backend, which features a register-file memory model consisting of DDR and a 1MB TREG register file divided into 256 fixed 4KB blocks. This backend lowers tensor operations to tile operations and performs register-renaming TREG allocation, bypassing Ascend-specific passes. Key changes include updating the memory space enum, extending the memory allocator policy to support address units and maximum units, and integrating the new backend into the shared memory-space inference and address allocation passes. A critical review comment points out that during address allocation for register-file spaces, integer division of the byte address by the address unit could silently truncate non-aligned offsets, potentially causing aliasing bugs. The reviewer suggests adding an INTERNAL_CHECK_SPAN to guarantee that the byte address is perfectly divisible by the address unit.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/ir/transforms/allocate_memory_addr_pass.cpp

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ir/transforms/infer_tile_memory_space_pass.cpp (1)

871-886: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

TileMemoryInferred verifier still checks raw Vec constraints instead of backend-mapped constraints.

Phase 1/2/3 now map generic Vec constraints to backend on-chip default (TREG on SuperscalarNPU), but verifier input-constraint validation still compares against the un-mapped registry list. This can emit false errors on valid SuperscalarNPU output.

Proposed fix
 class TileMemoryInferredVerifier : public IRVisitor {
  public:
   explicit TileMemoryInferredVerifier(std::vector<Diagnostic>& diagnostics, std::string func_name)
-      : diagnostics_(diagnostics), func_name_(std::move(func_name)) {}
+      : diagnostics_(diagnostics), func_name_(std::move(func_name)) {
+    default_onchip_ = MemorySpace::Vec;
+    if (backend::BackendConfig::IsConfigured()) {
+      default_onchip_ = backend::GetBackend()->GetHandler()->GetDefaultOnChipMemorySpace();
+    }
+  }
@@
  private:
   std::vector<Diagnostic>& diagnostics_;
   std::string func_name_;
+  MemorySpace default_onchip_;
@@
-      bool allowed = std::find(allowed_spaces.begin(), allowed_spaces.end(), actual) != allowed_spaces.end();
+      bool allowed = std::any_of(allowed_spaces.begin(), allowed_spaces.end(),
+                                 [&](MemorySpace s) {
+                                   return MapConstraintSpace(s, default_onchip_) == actual;
+                                 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ir/transforms/infer_tile_memory_space_pass.cpp` around lines 871 - 886,
VerifyInputConstraints currently compares actual memory_space_ against the raw
registry Vec returned by GetInputConstraints, which causes false errors because
earlier phases map generic Vec constraints to backend-specific on-chip defaults;
update VerifyInputConstraints (and the call-site using GetInputConstraints) to
first convert/map the returned allowed_spaces into backend-resolved memory
spaces via the same mapping used by TileMemoryInferred (use the
TileMemoryInferred backend-mapping helper or the function that maps generic Vec
constraints to backend-specific MemorySpace values) and then compare actual
against that mapped set (i.e., replace direct use of allowed_spaces with the
backend-mapped_allowed_spaces before the std::find check).
🧹 Nitpick comments (1)
tests/ut/ir/transforms/test_superscalar_npu_treg.py (1)

104-107: ⚡ Quick win

Scope the byte-offset guard to TREG allocation lines

Line 107 currently rejects pl.const(65536, pl.INT64) anywhere in the whole dump, which can fail on unrelated constants. Narrow this check to pl.tile.alloc(pl.Mem.TREG, ...) lines only.

Suggested tightening
-        assert f"pl.const({_BLOCKS_PER_TILE}, pl.INT64), {_TILE_BYTES})" in text
-        # A byte-addressed second tile would read pl.const(65536, ...); ensure no
-        # TREG MemRef uses the byte offset as its address.
-        assert f"pl.const({_TILE_BYTES}, pl.INT64)" not in text
+        treg_alloc_lines = [ln for ln in text.splitlines() if "pl.tile.alloc(pl.Mem.TREG" in ln]
+        assert any(
+            f"pl.const({_BLOCKS_PER_TILE}, pl.INT64), {_TILE_BYTES})" in ln for ln in treg_alloc_lines
+        )
+        # A byte-addressed second tile would read pl.const(65536, ...); ensure this
+        # does not appear as a TREG MemRef address.
+        assert all(f"pl.const({_TILE_BYTES}, pl.INT64)" not in ln for ln in treg_alloc_lines)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/ir/transforms/test_superscalar_npu_treg.py` around lines 104 - 107,
The test currently asserts that f"pl.const({_TILE_BYTES}, pl.INT64)" does not
appear anywhere in the dump; instead restrict this check to TREG allocation
lines by scanning only the lines (or substrings) that contain
"pl.tile.alloc(pl.Mem.TREG" and asserting that none of those lines include the
f"pl.const({_TILE_BYTES}, pl.INT64)" pattern; update the assertion to use a
targeted search/regex against "pl.tile.alloc(pl.Mem.TREG" entries rather than
the whole `text`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/pypto/backend/common/soc.h`:
- Around line 265-267: Update the contradictory SoC doc wording so it
consistently states that the SuperscalarNPU model has no dedicated CUBE core or
Ascend L0/L1 hierarchy and instead uses a single compute core modeled as VECTOR
carrying the TREG memory; specifically, edit the description mentioning "no
cube/vector cores" and the sentence referencing CoreType and VECTOR so both
assert "no dedicated CUBE core" (or "no Ascend L0/L1 buffers") and that the
single compute core is modeled as VECTOR in CoreType.

In `@src/backend/common/backend_registry.cpp`:
- Around line 72-79: Update the thrown ValueError in
RegisterBackendSuperscalarNPU so the guidance includes
SuperscalarNPU::Instance() (i.e., change the message in the throw within
RegisterBackendSuperscalarNPU to mention "use
BackendSuperscalarNPU::Instance()") and similarly update any other registry
error messages that currently reference Backend910B::Instance() or
Backend950::Instance() so they mention the new SuperscalarNPU::Instance() option
(look for the throw ValueError calls in the backend registration functions and
update their strings).

In `@src/ir/transforms/allocate_memory_addr_pass.cpp`:
- Around line 372-374: The division by addr_unit in the stored_addr computation
can collapse distinct byte addresses when byte_addr is not aligned; update the
code around byte_addr/stored_addr (using symbols current_addr, relative_offset,
byte_addr, stored_addr, addr_unit, and ConstInt) to validate alignment before
converting: if addr_unit > 1 assert or check that (byte_addr % addr_unit) == 0
and only then compute stored_addr = byte_addr / addr_unit; otherwise handle the
unaligned case explicitly (e.g., emit an error/guard, convert to a
byte-granularity address path, or use a defined rounding/packing strategy) so
you never silently floor non-aligned byte addresses into the same stored index.
Ensure the chosen behavior is deterministic and documented in the surrounding
logic.

---

Outside diff comments:
In `@src/ir/transforms/infer_tile_memory_space_pass.cpp`:
- Around line 871-886: VerifyInputConstraints currently compares actual
memory_space_ against the raw registry Vec returned by GetInputConstraints,
which causes false errors because earlier phases map generic Vec constraints to
backend-specific on-chip defaults; update VerifyInputConstraints (and the
call-site using GetInputConstraints) to first convert/map the returned
allowed_spaces into backend-resolved memory spaces via the same mapping used by
TileMemoryInferred (use the TileMemoryInferred backend-mapping helper or the
function that maps generic Vec constraints to backend-specific MemorySpace
values) and then compare actual against that mapped set (i.e., replace direct
use of allowed_spaces with the backend-mapped_allowed_spaces before the
std::find check).

---

Nitpick comments:
In `@tests/ut/ir/transforms/test_superscalar_npu_treg.py`:
- Around line 104-107: The test currently asserts that f"pl.const({_TILE_BYTES},
pl.INT64)" does not appear anywhere in the dump; instead restrict this check to
TREG allocation lines by scanning only the lines (or substrings) that contain
"pl.tile.alloc(pl.Mem.TREG" and asserting that none of those lines include the
f"pl.const({_TILE_BYTES}, pl.INT64)" pattern; update the assertion to use a
targeted search/regex against "pl.tile.alloc(pl.Mem.TREG" entries rather than
the whole `text`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ccd9bed6-68b5-4551-81f4-c8e23186de22

📥 Commits

Reviewing files that changed from the base of the PR and between a825800 and c5525d7.

📒 Files selected for processing (29)
  • CMakeLists.txt
  • docs/en/dev/backend/00-backend_handler.md
  • docs/en/dev/ir/01-hierarchy.md
  • docs/zh-cn/dev/backend/00-backend_handler.md
  • docs/zh-cn/dev/ir/01-hierarchy.md
  • include/pypto/backend/common/backend.h
  • include/pypto/backend/common/backend_handler.h
  • include/pypto/backend/common/soc.h
  • include/pypto/backend/superscalar/backend_superscalar_npu.h
  • include/pypto/backend/superscalar/backend_superscalar_npu_handler.h
  • include/pypto/backend/superscalar/register_allocator_policy.h
  • include/pypto/ir/memory_allocator_policy.h
  • include/pypto/ir/memory_space.h
  • python/bindings/modules/backend.cpp
  • python/bindings/modules/ir.cpp
  • python/pypto/backend/__init__.py
  • python/pypto/ir/pass_manager.py
  • python/pypto/pypto_core/backend.pyi
  • python/pypto/pypto_core/ir.pyi
  • src/backend/common/backend.cpp
  • src/backend/common/backend_registry.cpp
  • src/backend/common/soc.cpp
  • src/backend/superscalar/backend_superscalar_npu.cpp
  • src/backend/superscalar/backend_superscalar_npu_handler.cpp
  • src/ir/memref.cpp
  • src/ir/transforms/allocate_memory_addr_pass.cpp
  • src/ir/transforms/infer_tile_memory_space_pass.cpp
  • tests/ut/backend/test_backend_superscalar_npu.py
  • tests/ut/ir/transforms/test_superscalar_npu_treg.py

Comment thread include/pypto/backend/common/soc.h Outdated
Comment thread src/backend/common/backend_registry.cpp
Comment thread src/ir/transforms/allocate_memory_addr_pass.cpp
lyfne123 added a commit to lyfne123/pypto that referenced this pull request Jun 4, 2026
- AllocateMemoryAddr: guard register-file block-index conversion with an
  INTERNAL_CHECK_SPAN that the byte address is unit-aligned before dividing,
  so a non-aligned offset can no longer silently truncate and alias distinct
  addresses to the same block index (CodeRabbit, gemini-code-assist).
- soc.h: fix contradictory SuperscalarNPU SoC doc wording ("no cube/vector
  cores" vs "modelled as VECTOR") — now "no dedicated CUBE core / no Ascend
  L1/L0 hierarchy; single compute core modelled as VECTOR" (CodeRabbit).
- backend_registry.cpp: include BackendSuperscalarNPU::Instance() in the
  registry singleton-guidance error messages (CodeRabbit).
- tests: order imports as a single group matching the repo convention
  (pytest < pypto < pypto.*), fixing the pre-commit isort failure.
lyfne123 added 2 commits June 12, 2026 08:37
…ocation

Introduce a new SuperscalarNPU backend whose only memory spaces are DDR and
TREG, a register file of 256 fixed 4KB blocks (1MB) addressed by block index.
The backend lowers tensor ops to tile ops and allocates TREG via register
renaming; code generation is not implemented yet (IR-only).

Memory space:
- Add MemorySpace::TREG (enum, string maps, Python binding, type stub, BNF docs).

Register-renaming allocation:
- Extend MemoryAllocatorPolicy with AddressUnitBytes (default 1) and
  MaxAddressUnits (default unbounded). AllocateMemoryAddr now stores block
  indices for register-file spaces (unit > 1), reserves consecutive blocks for
  tiles larger than one block, and raises a user-facing error when more than the
  available blocks are simultaneously live (register pressure). The
  AllocatedMemoryAddr verifier scales stored addresses by the unit so the
  byte-based capacity check is unit-agnostic.
- Add SuperscalarNPURegisterAllocatorPolicy (4KB blocks, 256-block limit).
  Liveness-based coalescing is still performed by the shared MemoryReuse pass.

Backend registration:
- Add BackendType::SuperscalarNPU, CreateSuperscalarNPUSoC (DDR + 1MB TREG),
  BackendSuperscalarNPU + SuperscalarNPUHandler, factory/registry wiring,
  CMake sources, Python bindings, and type stubs.

Pass pipeline:
- Add OptimizationStrategy.SuperscalarNPU: tensor prefix + outline +
  ConvertTensorToTileOps + InferTileMemorySpace + InitMemRef + MemoryReuse +
  AllocateMemoryAddr, omitting all Ascend cube/vector/cross-core passes.
- Add BackendHandler::GetDefaultOnChipMemorySpace (Vec on Ascend, TREG here) so
  InferTileMemorySpace targets each backend's on-chip storage without branching
  on BackendType; a generic Vec op input-constraint is realized as this default
  so no spurious tile.move is inserted. Ascend behavior is unchanged.

Tests:
- Backend construction/SoC/handler tests and an end-to-end TREG strategy test
  (TREG assignment, block-index addressing, register renaming, pressure error).

Docs (en + zh-cn): BackendHandler reference and IR hierarchy BNF.
- AllocateMemoryAddr: guard register-file block-index conversion with an
  INTERNAL_CHECK_SPAN that the byte address is unit-aligned before dividing,
  so a non-aligned offset can no longer silently truncate and alias distinct
  addresses to the same block index (CodeRabbit, gemini-code-assist).
- soc.h: fix contradictory SuperscalarNPU SoC doc wording ("no cube/vector
  cores" vs "modelled as VECTOR") — now "no dedicated CUBE core / no Ascend
  L1/L0 hierarchy; single compute core modelled as VECTOR" (CodeRabbit).
- backend_registry.cpp: include BackendSuperscalarNPU::Instance() in the
  registry singleton-guidance error messages (CodeRabbit).
- tests: order imports as a single group matching the repo convention
  (pytest < pypto < pypto.*), fixing the pre-commit isort failure.
@lyfne123 lyfne123 force-pushed the superscalar-npu-backend branch from f72039b to 1f40641 Compare June 12, 2026 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant