
fep(sig-operator): add FlagGems-vllm high-performance fused operator library proposal#20

Open
huangyiqun wants to merge 5 commits into flagos-ai:main from huangyiqun:add_flaggems-vllm_fep


huangyiqun commented May 27, 2026


FEP: FlagGems-vllm

Adds a FEP document for FlagGems-vllm, a high-performance fused operator library for vLLM inference workloads in the FlagOS ecosystem.

  • SIG: sig-operator
  • Status: Provisional
  • Target: FlagOS 2.1

FlagGems-vllm provides Triton-based fused kernels and vLLM-facing operator implementations for performance-critical paths such as MoE routing, cache update, rotary embedding, FP8 quantization, sequence pack/unpack, and DeepSeek V4 attention helper kernels.
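To make one of these paths concrete, here is a minimal pure-Python reference sketch of the MoE routing alignment step (the operation exercised by `test_moe_align_block_size.py`). It assumes the semantics commonly used by vLLM's `moe_align_block_size`: token-expert assignments are grouped by expert, each expert's segment is padded to a multiple of `block_size` with an out-of-range index, and one expert id is emitted per block. The function name `moe_align_block_size_ref` and its return layout are hypothetical and are not the FlagGems-vllm API; a reference like this is only useful as a correctness oracle for the fused Triton kernel.

```python
from typing import List, Sequence, Tuple


def moe_align_block_size_ref(
    topk_ids: Sequence[Sequence[int]],
    num_experts: int,
    block_size: int,
) -> Tuple[List[int], List[int], int]:
    """Hypothetical reference for MoE token alignment.

    Returns (sorted_token_ids, expert_ids, num_tokens_post_padded).
    """
    # Flatten the [num_tokens, topk] routing table into flat assignment indices.
    flat = [expert for row in topk_ids for expert in row]
    # Padding slots hold one-past-the-last valid index so kernels can mask them.
    pad_value = len(flat)

    # Bucket flat indices by expert, preserving their original order.
    buckets: List[List[int]] = [[] for _ in range(num_experts)]
    for idx, expert in enumerate(flat):
        buckets[expert].append(idx)

    sorted_token_ids: List[int] = []
    expert_ids: List[int] = []
    for expert, idxs in enumerate(buckets):
        # Round this expert's segment up to a multiple of block_size (0 stays 0).
        padded = -(-len(idxs) // block_size) * block_size
        sorted_token_ids.extend(idxs + [pad_value] * (padded - len(idxs)))
        # Every block of block_size slots is processed by exactly one expert.
        expert_ids.extend([expert] * (padded // block_size))
    return sorted_token_ids, expert_ids, len(sorted_token_ids)
```

For example, with `topk_ids=[[0, 1], [1, 2], [0, 1]]`, three experts, and `block_size=2`, expert 1 receives three assignments and is padded to four slots, so the result has four blocks in total. A fused GPU kernel may order indices within an expert differently, so a test comparing against this reference would compare per-expert sets rather than exact positions.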

The FEP defines the repository scope, fused operator coverage, packaging approach, test plan, and migration process for keeping vLLM-related fused kernels in sync with FlagGems while exposing them through the standalone flaggems_vllm package.

Repository: https://github.com/flagos-ai/FlagGems-vllm

@huangyiqun huangyiqun changed the title fep(sig-operator): add FlagGems-vllm operator library proposal fep(sig-operator): add FlagGems-vllm high-performance fused operator library proposal May 27, 2026

This design allows the same operator API to be used across supported hardware backends as implementations become available.
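One common way to keep a single operator API while backend implementations arrive incrementally is a registry keyed by operator name and backend, with a fallback implementation. The sketch below illustrates that pattern only; `register`, `dispatch`, the `add_scaled` operator, and the backend names `"cuda"`, `"npu"`, and `"reference"` are all hypothetical and are not taken from FlagGems-vllm.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: (op_name, backend) -> implementation.
_REGISTRY: Dict[Tuple[str, str], Callable] = {}


def register(op_name: str, backend: str) -> Callable[[Callable], Callable]:
    """Decorator that records an implementation of op_name for one backend."""
    def wrap(fn: Callable) -> Callable:
        _REGISTRY[(op_name, backend)] = fn
        return fn
    return wrap


def dispatch(op_name: str, backend: str, fallback: str = "reference") -> Callable:
    """Return the backend-specific implementation, else the fallback one."""
    impl = _REGISTRY.get((op_name, backend)) or _REGISTRY.get((op_name, fallback))
    if impl is None:
        raise NotImplementedError(
            f"{op_name!r} has no {backend!r} or {fallback!r} implementation"
        )
    return impl


@register("add_scaled", "reference")
def add_scaled_reference(x: List[float], y: List[float], alpha: float) -> List[float]:
    # Portable reference: x + alpha * y, element-wise.
    return [a + alpha * b for a, b in zip(x, y)]


@register("add_scaled", "cuda")
def add_scaled_cuda(x: List[float], y: List[float], alpha: float) -> List[float]:
    # Stand-in for a backend-specific fused kernel with identical semantics.
    return [a + alpha * b for a, b in zip(x, y)]
```

Callers always go through `dispatch("add_scaled", backend)`, so a backend without its own kernel (here `"npu"`) transparently uses the reference implementation until a dedicated one is registered.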

### Testing and Benchmarking


Test: Does FlagGems-vllm support only NVIDIA hardware, or does it also work with other vendors' hardware? If so, please list the supported vendors.

Author


Multi-backend adaptation and verification are currently underway.

| Check | How to verify |
|---|---|
| Dedicated package import | Run `python -c "import flaggems_vllm; import flaggems_vllm.ops"` after installation. |
| Fused operator API availability | Verify that the exported symbols in `flaggems_vllm.ops.__all__` include the migrated vLLM-facing fused operators. |
| Accuracy coverage | Run `pytest -q tests --collect-only` and targeted tests such as `pytest -q tests/test_moe_align_block_size.py --quick`. |
| DeepSeek V4 helper coverage | Run the DeepSeek V4 attention helper tests when a matching CUDA/vLLM reference environment is available. |
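The "Fused operator API availability" check can be scripted instead of inspected by hand. The helper below is a minimal sketch using only the standard library: it imports a module and reports which expected names are missing from its `__all__` or are not actually defined. The helper name `missing_exports` is hypothetical, and any list of expected operator names passed to it is an assumption to be replaced with the real migrated set.

```python
import importlib
from typing import Iterable, List


def missing_exports(module_name: str, expected: Iterable[str]) -> List[str]:
    """Return expected names absent from the module's __all__ or attributes."""
    module = importlib.import_module(module_name)
    exported = set(getattr(module, "__all__", ()))
    # A name counts as present only if it is listed in __all__ AND resolvable,
    # which catches stale __all__ entries left behind after a migration.
    return [
        name
        for name in expected
        if name not in exported or not hasattr(module, name)
    ]
```

After installation, a CI step could call `missing_exports("flaggems_vllm.ops", [...])` with the operator names the migration is expected to expose and fail the job if the returned list is non-empty.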


Could you specify the concrete test methods and test procedures?

Author


The specific testing methods have been added.


2 participants