fep(sig-framework): Add vllm-plugin-fl v0.2.0 new features #23

Open. cyber-pioneer wants to merge 3 commits into flagos-ai:main from cyber-pioneer:main

cyber-pioneer commented May 27, 2026


Overview

Add two platform-specific FEP files: one for NVIDIA and one for Hygon.

Description

  • Added separate FEP docs for NVIDIA and Hygon
  • Kept the same Qwen3.6 scope in both files:
    • Qwen3.6-35B-A3B
    • Qwen3.6-27B
    • text and image test coverage

cyber-pioneer changed the title from "add vllm-plugin_fl fep" to "fep(sig-framework): Add vllm-plugin-fl v0.2.0new features" on May 27, 2026
cyber-pioneer changed the title from "fep(sig-framework): Add vllm-plugin-fl v0.2.0new features" to "fep(sig-framework): Add vllm-plugin-fl v0.2.0 new features" on May 27, 2026
cyber-pioneer force-pushed the main branch 2 times, most recently from 02beb90 to 8bc63af on May 28, 2026 07:37
Comment on lines +53 to +121
--name perf \
--network=host \
--ipc=host \
--device=/dev/kfd \
--device=/dev/mkfd \
--device=/dev/dri \
-v /opt/hyhal:/opt/hyhal \
-v /path/to/models:/models \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-itd harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.20.0-ubuntu22.04-dtk26.04-py3.10-MiniCPM-V-4.6 \
/bin/bash
```

### Build and Package

1. Inside the container, install build dependencies and FlagGems:

```bash
pip install -U scikit-build-core==0.11 pybind11 ninja cmake
git clone https://github.com/flagos-ai/FlagGems
cd FlagGems
git checkout 1dab11ab1a6671e3132528492d2cc193e78af8f4
pip install --no-build-isolation .
```

2. Clone and install vllm-plugin-FL:

```bash
git clone https://github.com/flagos-ai/vllm-plugin-FL
cd vllm-plugin-FL
pip install --no-build-isolation .
```

3. Download the models:

```bash
modelscope download --model Qwen/Qwen3.6-27B --local_dir /models/Qwen3.6-27B
modelscope download --model Qwen/Qwen3.6-35B-A3B --local_dir /models/Qwen3.6-35B-A3B
```

## Test Plan

The test plan below is required for Hygon.

### Environment Matrix

- Platform: Hygon

### Image Acquisition

Record the image source explicitly in the test logs, including:

- image name/tag
- vllm-plugin-FL commit
- vLLM version


### Component Setup and Running (Unified Case)

Use one unified serving-and-request case for Hygon. Only the model path changes between `Qwen3.6-35B-A3B` and `Qwen3.6-27B`.

#### 1. Start vLLM service

```bash
export VLLM_PLUGINS=fl
vllm serve /models/Qwen3.6-35B-A3B \
--served-model-name "qwen" \
--host 0.0.0.0 \

Bug Description

When enabling the fl platform plugin on a Hygon BW1000 environment, vLLM fails during platform initialization because the vendor "hygon" is not registered in VENDOR_DEVICE_MAP.

Reproduction

Command:

export VLLM_PLUGINS=fl

vllm serve /models/Qwen/Qwen3.6-27B \
    --served-model-name "qwen" \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 2 \
    --max-model-len 32768 \
    --trust-remote-code \
    --limit-mm-per-prompt '{"image": 1}'
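
Because the FEP's unified case changes only the model path between the two models, the command above can be parameterized. The following is a minimal sketch under that assumption: the `serve_command` helper is hypothetical, the flags and default values are copied from this reproduction command, and the model paths follow the FEP's download step rather than the path used above.

```python
import shlex


def serve_command(model_path, port=8000, tensor_parallel_size=2, max_model_len=32768):
    """Build the argv list for `vllm serve`; flags mirror the reproduction command."""
    return [
        "vllm", "serve", model_path,
        "--served-model-name", "qwen",
        "--host", "0.0.0.0",
        "--port", str(port),
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--max-model-len", str(max_model_len),
        "--trust-remote-code",
        "--limit-mm-per-prompt", '{"image": 1}',
    ]


# Model directories as written by the FEP's `modelscope download` step.
MODEL_PATHS = ["/models/Qwen3.6-35B-A3B", "/models/Qwen3.6-27B"]

# Print one shell-quoted command per model; VLLM_PLUGINS=fl must still be
# exported in the environment before running either of them.
for path in MODEL_PATHS:
    print(shlex.join(serve_command(path)))
```

Building the command as a list and quoting it with `shlex.join` keeps the JSON argument to `--limit-mm-per-prompt` intact when the line is pasted into a shell.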

Environment

  • Hardware: Hygon BW1000
  • Python: 3.10
  • vLLM plugin: vllm_fl

Error Log

INFO 05-29 09:14:37 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-29 09:14:37 [__init__.py:46] - fl -> vllm_fl:register
INFO 05-29 09:14:37 [__init__.py:58] Loading plugin fl
INFO 05-29 09:14:41 [__init__.py:238] Platform plugin fl is activated

Traceback (most recent call last):
  ...
  File "/usr/local/lib/python3.10/dist-packages/vllm_fl/utils.py", line 52, in _get_vendor_device_field
    raise ValueError(
ValueError: Vendor 'hygon' not found in VENDOR_DEVICE_MAP.

Analysis

The platform plugin successfully detects the vendor name as "hygon", but vllm_fl/utils.py does not contain a corresponding entry in VENDOR_DEVICE_MAP, causing initialization failure during platform resolution.

It may be necessary to:

  • add "hygon" into VENDOR_DEVICE_MAP, or
  • alias "hygon" to the AMD/ROCm backend if Hygon is expected to reuse the ROCm-compatible execution path.
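
The second option can be illustrated with a lookup that resolves aliases before consulting the map. This is only a sketch: the real structure of `VENDOR_DEVICE_MAP` in `vllm_fl/utils.py` is not shown in this thread, so the entries, field names, and the `VENDOR_ALIASES` table below are hypothetical, and whether Hygon can actually reuse the AMD path is an open question for the maintainers.

```python
# Hypothetical stand-in for the plugin's vendor table; the real
# VENDOR_DEVICE_MAP may use different keys, fields, and values.
VENDOR_DEVICE_MAP = {
    "nvidia": {"device_type": "nvidia-device"},
    "amd": {"device_type": "amd-device"},
}

# Hypothetical alias table: vendors that reuse another vendor's entry.
VENDOR_ALIASES = {
    "hygon": "amd",
}


def get_vendor_device_field(vendor, field,
                            device_map=VENDOR_DEVICE_MAP,
                            aliases=VENDOR_ALIASES):
    """Look up `field` for `vendor`, resolving an alias first if one exists."""
    key = vendor.lower()
    key = aliases.get(key, key)
    try:
        entry = device_map[key]
    except KeyError:
        # Same message shape as the error in the log above.
        raise ValueError(
            f"Vendor '{vendor}' not found in VENDOR_DEVICE_MAP."
        ) from None
    return entry[field]
```

With the alias in place, `get_vendor_device_field("hygon", "device_type")` returns the same value as the `"amd"` entry, while an unregistered vendor still raises `ValueError`. Adding a dedicated `"hygon"` entry instead (the first option) would avoid coupling Hygon to the AMD settings.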

Author (cyber-pioneer):

Please check whether your commit ID matches the one shown in the doc.


Reproduced on branch v0.2.0-rc0 with commit:

90e8c497e0241bf52ac7584f4e0cba573e4fa555

Author (cyber-pioneer):

Is the bug still present?

kevinzs2048 added a commit to kevinzs2048/community that referenced this pull request Jun 3, 2026
Per maintainer feedback, expand the FEP to include explicit reproduction
steps in the Packaging and Test Plan sections, matching the format used by
the vllm-plugin-FL FEP (see flagos-ai#23):

* Packaging: 7-step build sequence (system deps, venv, optional manual
  LLVM, clone FlagTree, helper script, build, FlagGems install).
* Test Plan: Environment Matrix + 3 verification phases (backend
  registration / operator-level rms_norm / end-to-end MiniCPM5-0.9B INT8
  decode) each with executable commands and expected outputs.

Full install reference lives in FlagTree#633 (documents/install_arm64.md).
yixiaodapeng added a commit to yixiaodapeng/community that referenced this pull request Jun 8, 2026