feat: reduce CI time#819

Open
mouliangyu wants to merge 3 commits into
hw-native-sys:mainfrom
mouliangyu:feature-ci-time-opt

Conversation

@mouliangyu

Contributor

Summary

  • reduce VPTO SIM validation case cost with merged/pruned semantic coverage
  • add quiet camodel support for simulator-heavy CI logs
  • add TileLang ST smoke test path and keep full-test changes limited to silent camodel support
  • add smoke coverage for textract, textract_fp, and textract_v2v

Validation

  • VPTO SIM gate: PASS=127 FAIL=0, real 180.02s
  • TileLang DSL smoke CI: passed=88 failed=0 total=88, real 172.82s
  • Single smoke runs: textract, textract_fp, textract_v2v passed
  • py_compile for TileLang ST runner scripts and new smoke Python files

  1. silent camodel
  2. reduce inputs
  3. deep merge vpto cases
  4. build light tilelang dsl smoke test
@mouliangyu mouliangyu marked this pull request as ready for review June 15, 2026 11:57

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a comprehensive system testing (ST) suite for TileLang operators on the NPU (a5 architecture), including CMake configurations, host drivers, data generation, and comparison scripts for a wide range of operators. It also adds a utility script to prepare a quiet camodel directory to reduce simulator I/O. The reviewer feedback highlights several key improvements: replacing the non-deterministic hash() function with zlib.crc32 for random seeding in st_common.py, opening the lock file in append mode ("a") in prepare_quiet_camodel.py to prevent premature truncation, making the hardcoded Ascend driver path configurable in CMakeLists.txt, and catching ValueError in the softmax comparison script to handle corrupted output files gracefully.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +74 to +80
def setup_case_rng(case):
    """Set a per-case deterministic random seed.

    Using hash(name) ensures that adding/reordering cases does not change
    the random data of existing cases.
    """
    np.random.seed(hash(case["name"]) & 0xFFFFFFFF)

high

Using Python's built-in hash() function for seeding np.random.seed is not deterministic across different runs. In Python 3, hash randomization is enabled by default, meaning hash(case["name"]) will return a different value every time the Python process restarts. To ensure reproducible test data, use a stable hashing algorithm like zlib.crc32.

Suggested change
-def setup_case_rng(case):
-    """Set a per-case deterministic random seed.
-    Using hash(name) ensures that adding/reordering cases does not change
-    the random data of existing cases.
-    """
-    np.random.seed(hash(case["name"]) & 0xFFFFFFFF)
+def setup_case_rng(case):
+    """Set a per-case deterministic random seed.
+    Using a stable CRC32 hash ensures that adding/reordering cases does not change
+    the random data of existing cases, and remains deterministic across runs.
+    """
+    import zlib
+    seed = zlib.crc32(case['name'].encode('utf-8')) & 0xFFFFFFFF
+    np.random.seed(seed)
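
As a rough illustration of the point above, the sketch below derives a seed from a case name with zlib.crc32, which depends only on the input bytes. The helper names stable_seed and case_samples are hypothetical and not part of the PR, and the standard-library random.Random stands in for np.random.seed so the snippet has no third-party dependencies.

```python
import random
import zlib


def stable_seed(name: str) -> int:
    """Derive a 32-bit seed from a case name (hypothetical helper).

    zlib.crc32 depends only on the bytes of the name, whereas hash(str)
    is salted per process unless PYTHONHASHSEED is fixed.
    """
    return zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF


def case_samples(name: str, count: int = 3) -> list:
    """Draw a few pseudo-random values for a case (illustration only)."""
    # random.Random is a stand-in here; the same integer could be
    # passed to np.random.seed in the real test helper.
    rng = random.Random(stable_seed(name))
    return [rng.random() for _ in range(count)]


if __name__ == "__main__":
    # The same name yields the same seed and the same data in every process.
    print(stable_seed("textract"))
    print(case_samples("textract"))
```

Because the seed is a pure function of the case name, adding or reordering cases leaves the data of existing cases unchanged, which is the property the original docstring asks for.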

Comment on lines +54 to +57
lock_path = os.path.join(quiet_dir, ".quiet-camodel.lock")
with open(lock_path, "w", encoding="utf-8") as lock_file:
    fcntl.flock(lock_file, fcntl.LOCK_EX)
    return prepare_quiet_camodel_locked(source_dir, quiet_dir)

medium

Opening the lock file in write ("w") mode truncates the file before acquiring the lock. If another process is currently holding the lock or waiting for it, this truncation can cause race conditions or interfere with its operations. It is a safer best practice to open lock files in append ("a") mode to avoid truncating the file.

Suggested change
-lock_path = os.path.join(quiet_dir, ".quiet-camodel.lock")
-with open(lock_path, "w", encoding="utf-8") as lock_file:
-    fcntl.flock(lock_file, fcntl.LOCK_EX)
-    return prepare_quiet_camodel_locked(source_dir, quiet_dir)
+lock_path = os.path.join(quiet_dir, ".quiet-camodel.lock")
+with open(lock_path, "a", encoding="utf-8") as lock_file:
+    fcntl.flock(lock_file, fcntl.LOCK_EX)
+    return prepare_quiet_camodel_locked(source_dir, quiet_dir)
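
A minimal sketch of the append-mode locking pattern, assuming a POSIX system where fcntl is available. The exclusive_lock context manager and the demo function are hypothetical names for illustration; they are not taken from prepare_quiet_camodel.py.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path (hypothetical helper)."""
    # Mode "a" creates the file if it is missing but never truncates it,
    # so opening the file has no side effect before the lock is held.
    with open(lock_path, "a", encoding="utf-8") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield lock_file
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def demo():
    """Show that existing lock-file content survives acquiring the lock."""
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, ".demo.lock")
        with open(lock_path, "w", encoding="utf-8") as handle:
            handle.write("owner=1234\n")
        with exclusive_lock(lock_path):
            with open(lock_path, encoding="utf-8") as handle:
                return handle.read()


if __name__ == "__main__":
    print(repr(demo()))
```

With mode "w" in place of "a", the open call would empty the file before flock is even attempted, which is the side effect the review comment warns about.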

set(PTO_ISA_ROOT "${CMAKE_CURRENT_LIST_DIR}/../../../../../../../../pto-isa" CACHE PATH "Path to pto-isa repo")
set(PTO_TILELANG_ST_COMMON_DIR
"${CMAKE_CURRENT_LIST_DIR}/../common")
set(ASCEND_DRIVER_PATH /usr/local/Ascend/driver)

medium

The Ascend driver path is hardcoded to /usr/local/Ascend/driver. If the driver is installed in a non-standard directory, the build will fail. It is better to allow overriding this path via a CMake cache variable or environment variable.

if(NOT DEFINED ASCEND_DRIVER_PATH)
    set(ASCEND_DRIVER_PATH /usr/local/Ascend/driver CACHE PATH "Path to Ascend driver directory")
endif()

Comment on lines +43 to +45
except FileNotFoundError as exc:
    print(style_fail(f"[ERROR] {case['name']}: missing file {exc}"))
    return False

medium

If any of the output binary files are incomplete or corrupted (e.g., due to a simulator crash), np.fromfile().reshape() will raise a ValueError. Since only FileNotFoundError is caught here, a ValueError will cause the comparison script to crash with a traceback. Catching ValueError along with FileNotFoundError makes the test runner more robust.

Suggested change
-except FileNotFoundError as exc:
-    print(style_fail(f"[ERROR] {case['name']}: missing file {exc}"))
-    return False
+except (FileNotFoundError, ValueError) as exc:
+    print(style_fail(f'[ERROR] {case["name"]}: failed to load array {exc}'))
+    return False
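
The failure mode described above can be reproduced with a short sketch, assuming NumPy is installed. The load_output and demo names are hypothetical and the file names are made up for the example; the real comparison script may load its arrays differently.

```python
import os
import tempfile

import numpy as np


def load_output(path, dtype, shape):
    """Return the array stored at path, or None if it cannot be loaded."""
    try:
        # reshape raises ValueError when the element count does not match
        # the requested shape, e.g. after a truncated simulator dump.
        return np.fromfile(path, dtype=dtype).reshape(shape)
    except (FileNotFoundError, ValueError) as exc:
        print(f"[ERROR] failed to load array: {exc}")
        return None


def demo():
    """Load a complete, a truncated, and a missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.bin")
        short = os.path.join(tmp, "short.bin")
        missing = os.path.join(tmp, "missing.bin")
        np.arange(6, dtype=np.float32).tofile(good)   # 6 elements fit (2, 3)
        np.arange(5, dtype=np.float32).tofile(short)  # 5 elements do not
        return (
            load_output(good, np.float32, (2, 3)) is not None,
            load_output(short, np.float32, (2, 3)) is None,
            load_output(missing, np.float32, (2, 3)) is None,
        )


if __name__ == "__main__":
    print(demo())
```

Both the truncated and the missing file end in a reported failure rather than an uncaught traceback, so the runner can continue with the remaining cases.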

@reedhecre

reedhecre commented Jun 15, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: #819 feat: reduce CI time
  • Author: mouliangyu
  • Base/Head: main / feature-ci-time-opt
  • Head SHA: d0c42c594aae
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-06-15T12:06:09Z
  • Previous Head SHA: 0a9d1e9e2431
  • Status: failed at codex-review (exit=1)

Summary

Review failed at stage codex-review: exit=1

Findings

No structured findings were generated because the review failed early.

Log Tail

 .../micro-op/vector-load-store/vsts/compare.py     | 209 ------
 .../micro-op/vector-load-store/vsts/golden.py      |  51 --
 .../micro-op/vector-load-store/vsts/kernel.pto     |  69 --
 .../micro-op/vector-load-store/vsts/launch.cpp     |  70 --
 .../cases/micro-op/vector-load-store/vsts/main.cpp | 130 ----
 .../vstsx2-layout-check/compare.py                 | 210 ------
 .../vstsx2-layout-check/golden.py                  |  56 --
 .../vstsx2-layout-check/kernel.pto                 |  49 --
 .../vstsx2-layout-check/launch.cpp                 |  71 --
 .../vector-load-store/vstsx2-layout-check/main.cpp | 130 ----
 .../vstur-init-align-outside-loop/compare.py       | 112 ---
 .../vstur-init-align-outside-loop/golden.py        |  52 --
 .../vstur-init-align-outside-loop/kernel.pto       |  53 --
 .../vstur-init-align-outside-loop/launch.cpp       |  54 --
 .../vstur-init-align-outside-loop/main.cpp         | 129 ----
 .../micro-op/vector-load-store/vstur/compare.py    | 257 -------
 .../micro-op/vector-load-store/vstur/golden.py     |  52 --
 .../micro-op/vector-load-store/vstur/kernel.pto    |  61 --
 .../micro-op/vector-load-store/vstur/launch.cpp    |  70 --
 .../micro-op/vector-load-store/vstur/main.cpp      | 130 ----
 .../scripts/run_host_vpto_validation_parallel.sh   |   3 +
 1671 files changed, 60006 insertions(+), 68192 deletions(-)
===== END STAGE clone rc=0 @ 2026-06-15 20:05:34 =====

===== STAGE codex-review @ 2026-06-15 20:05:34 =====
set -euo pipefail
cd '/tmp/ptoas-pr-review-monitor/runs/20260615_200528_pr819/repo'
'codex' exec -C '/tmp/ptoas-pr-review-monitor/runs/20260615_200528_pr819/repo' -s read-only -c 'model_provider="codereview"' -c 'model="gpt-5.4"' -c 'model_reasoning_effort="xhigh"' --output-schema '/tmp/ptoas-pr-review-monitor/runs/20260615_200528_pr819/review_schema.json' -o '/tmp/ptoas-pr-review-monitor/runs/20260615_200528_pr819/codex_last_message.json' --color never - < '/tmp/ptoas-pr-review-monitor/runs/20260615_200528_pr819/review_prompt.txt'
OpenAI Codex v0.115.0 (research preview)
--------
workdir: /tmp/ptoas-pr-review-monitor/runs/20260615_200528_pr819/repo
model: gpt-5.4
provider: codereview
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none
session id: 019ecb2c-4032-7ad0-851a-226aafa5c8fe
--------
user
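You are now reviewing a GitHub PR.

Repository: hw-native-sys/PTOAS
PR: #819 feat: reduce CI time
Author: mouliangyu
base branch: origin/main
head branch: HEAD (the PR head is already checked out)

Requirements:
1. Review only this PR's changes relative to origin/main; consult surrounding files for context when necessary.
2. Focus on real correctness / regression / contract mismatch / CI / runtime / compatibility problems.
3. Do not make purely stylistic suggestions or low-value guesses.
4. Report strictly by priority:
   - P1: highly likely to cause wrong results, compile/runtime failures, severe regressions, or release blockers
   - P2: significant defects, behavioral regressions, missing validation/tests, or major compatibility problems
   - P3: minor but clearly fixable problems
5. If there are no problems, write the summary as exactly "No problems found in PR #819" and return findings=[].
6. If there are problems, summarize them concisely, and give each finding:
   - severity
   - title
   - body (explain why it is a problem, as specifically as possible)
   - file (a relative path where possible)
   - line (an integer if it can be determined, otherwise null)

Suggested starting points:
- git status --short
- git diff --stat origin/main...HEAD
- git diff --unified=80 origin/main...HEAD

The final output must strictly match the JSON schema.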

mcp startup: no servers
Reconnecting... 1/5 (unexpected status 503 Service Unavailable: Service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: b7aee97e-60be-4fa4-9b1d-41fb4c81d473)
Reconnecting... 2/5 (unexpected status 503 Service Unavailable: Service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 2a0e2e80-7c58-475a-84aa-d9c18d6ac9ed)
Reconnecting... 3/5 (unexpected status 503 Service Unavailable: Service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 77d5e690-a6d0-4a6e-8824-148499660fbc)
Reconnecting... 4/5 (unexpected status 503 Service Unavailable: Service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: b7878a16-124e-489a-a435-9cac99eab354)
Reconnecting... 5/5 (unexpected status 503 Service Unavailable: Service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 76444b71-d869-4d68-ac63-f51a09b6e62f)
ERROR: unexpected status 503 Service Unavailable: Service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: acb48bd8-dddb-4184-8bfc-0c653322ec9e
Warning: no last agent message; wrote empty content to /tmp/ptoas-pr-review-monitor/runs/20260615_200528_pr819/codex_last_message.json
===== END STAGE codex-review rc=1 @ 2026-06-15 20:06:09 =====
