fep(sig-edge): add Arm64 cpu backend to flagtree#15

Open
kevinzs2048 wants to merge 4 commits into flagos-ai:main from kevinzs2048:sig-edge-arm64

@kevinzs2048

This FEP proposes adding Arm64 CPU as a supported backend for FlagOS, enabling FlagTree's Triton kernels and FlagGems operators to compile and run on Arm CPUs. The Arm64 TLE capabilities are organized as a standalone FlagTree plugin following existing plugin conventions, with Triton-CPU as the compiler substrate. This extends FlagOS's multi-backend compilation model from cloud-side accelerators to edge devices.

Status: Provisional

SIG: sig-edge

Target Version: FlagOS 2.1

@CLAassistant

CLAassistant commented May 27, 2026

CLA assistant check
All committers have signed the CLA.

kevinzs2048 changed the title from "fep: add Arm64 cpu backend to flagtree" to "fep(sig-edge): add Arm64 cpu backend to flagtree" on May 27, 2026

Contributor

Please add the specific usage instructions and commands.

Per maintainer feedback, expand the FEP to include explicit reproduction
steps in the Packaging and Test Plan sections, matching the format used by
the vllm-plugin-FL FEP (see flagos-ai#23):

* Packaging: 7-step build sequence (system deps, venv, optional manual
  LLVM, clone FlagTree, helper script, build, FlagGems install).
* Test Plan: Environment Matrix + 3 verification phases (backend
  registration / operator-level rms_norm / end-to-end MiniCPM5-0.9B INT8
  decode), each with executable commands and expected outputs.

Full install reference lives in FlagTree#633 (documents/install_arm64.md).

- FlagGems changes merged into master (#3616) and the 5.3.0-rc2 release
  branch (#3775); step 7 now clones flagos-ai/FlagGems -b 5.3.0-rc2
- C++ extension repo renamed flagos-ai/triton-cpu -> flagos-ai/flagtree-cpu;
  helper script is now link_flagtree_cpu.sh; install guide is install_cpu.md
- FLAGGEMS_VENDOR=arm is now mandatory (the arm device_query_cmd probe was
  removed, so import flag_gems fails without it); added to the e2e run command
- Verified build prerequisites: pybind11 (must be preinstalled, since the
  build uses --no-build-isolation), TRITON_APPEND_CMAKE_ARGS to keep sleef
  out of /usr/local, and export FLAGTREE_BACKEND=cpu for runtime kernel.s JIT

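The runtime prerequisites above can be checked before running anything. The following is a minimal Python sketch, assuming only the settings named in these notes matter (FLAGGEMS_VENDOR=arm, FLAGTREE_BACKEND=cpu, and an importable pybind11); the function name check_arm64_env is hypothetical and is not part of FlagTree or FlagGems.

```python
"""Hypothetical pre-flight check for the Arm64 CPU runtime environment.

Assumption: only the settings named in these PR notes are checked; the
real install guide (install_cpu.md) may require more.
"""
import importlib.util
import os
from typing import List, Mapping, Optional

# Required environment variables and the values these notes call for.
REQUIRED_ENV = {
    "FLAGGEMS_VENDOR": "arm",   # import flag_gems fails without it
    "FLAGTREE_BACKEND": "cpu",  # needed for runtime JIT on the CPU backend
}


def check_arm64_env(env: Optional[Mapping[str, str]] = None,
                    check_pybind11: bool = True) -> List[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    if env is None:
        env = os.environ
    problems: List[str] = []
    for name, expected in REQUIRED_ENV.items():
        actual = env.get(name)
        if actual != expected:
            problems.append(f"{name} must be {expected!r}, got {actual!r}")
    # With --no-build-isolation, pip does not install build dependencies,
    # so pybind11 must already be importable in the active environment.
    if check_pybind11 and importlib.util.find_spec("pybind11") is None:
        problems.append("pybind11 is not installed in this environment")
    return problems
```

Calling check_arm64_env() before import flag_gems turns an opaque import failure into an explicit list of missing settings.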
Reproduced the full packaging + test plan as a clean non-owner user:

- Build install prefix: use per-user $HOME/.flagtree_install instead of the
  shared /tmp/flagtree_install. On a multi-user machine the shared dir is owned
  by whoever built first, so a second user's cmake --install fails with
  Permission denied (the very error the flag is meant to avoid).
- Add 'pip install numpy' to step 2 to silence torch's 'Failed to initialize
  NumPy' warning (FlagGems pulls numpy anyway).
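The per-user install prefix can be sketched in Python as follows, assuming a POSIX system (os.getuid is Unix-only). The helper names are hypothetical; only the $HOME/.flagtree_install path and the shared /tmp failure mode come from the notes above. The ownership test is a heuristic, since a directory owned by another user may still be group- or world-writable, which is why the write check is done separately.

```python
"""Sketch: pick a per-user CMake install prefix instead of a shared /tmp dir.

Assumption: helper names are hypothetical; only the path
$HOME/.flagtree_install comes from the PR notes.
"""
import os
from typing import Optional


def per_user_install_prefix(home: Optional[str] = None) -> str:
    """Return <home>/.flagtree_install (home defaults to $HOME)."""
    if home is None:
        home = os.path.expanduser("~")
    return os.path.join(home, ".flagtree_install")


def owned_by_other_user(path: str) -> bool:
    """True if path exists but belongs to a different Unix user.

    This is the shared /tmp/flagtree_install failure mode: the first user
    to build owns the directory, so a later cmake --install by someone
    else can hit "Permission denied".
    """
    try:
        return os.stat(path).st_uid != os.getuid()
    except FileNotFoundError:
        return False


def ensure_writable_prefix(prefix: str) -> str:
    """Create prefix if missing and fail early if it is not writable."""
    os.makedirs(prefix, exist_ok=True)
    if not os.access(prefix, os.W_OK):
        raise PermissionError(f"install prefix {prefix!r} is not writable")
    return prefix
```

Failing early in ensure_writable_prefix surfaces the permission problem before a long build, rather than at the final install step.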

Validated: clean test user passes Test 1 (10 create_cpu_* ops), Test 2
(rms_norm max err 0.014348745346069336), Test 3 (MiniCPM5-0.9B INT8, 169 INT8
Linears, TPS 18.19).
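Test 2 reports a maximum error for rms_norm. Assuming that figure is the largest absolute element-wise difference between the kernel output and a reference RMSNorm, the comparison can be sketched in plain Python as below. The function names are hypothetical, and the eps default, the single 1-D row, and the float64 reference are assumptions; the actual FlagGems test may use a different eps, dtype, shape, and comparison routine.

```python
"""Sketch of a reference RMSNorm and the max-abs-error metric.

Assumptions: eps=1e-6 and a single 1-D row; not the FlagGems test code.
"""
import math
from typing import List, Sequence


def rms_norm_ref(x: Sequence[float], weight: Sequence[float],
                 eps: float = 1e-6) -> List[float]:
    """y_i = x_i / sqrt(mean(x^2) + eps) * w_i, computed in float64."""
    if len(x) != len(weight) or not x:
        raise ValueError("x and weight must be non-empty and equal length")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def max_abs_err(actual: Sequence[float], expected: Sequence[float]) -> float:
    """Largest element-wise |actual - expected|; 0.0 for empty inputs."""
    if len(actual) != len(expected):
        raise ValueError("inputs must have the same length")
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)
```

Computing the reference in float64 keeps its own rounding error well below that of a lower-precision kernel, so the measured gap mostly reflects the kernel.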