feat: honour ACESTEP_DTYPE env var on standard CUDA devices #1185

Draft
ChuxiJ with Copilot wants to merge 2 commits into main from copilot/fix-acestep-dtype-checks

Conversation

Copilot AI commented May 4, 2026

Contributor

ACESTEP_DTYPE was never read: the env var existed but had no effect on standard (non-ROCm) CUDA paths, leaving users with no way to override the hardware-inferred dtype.

Changes

init_service_orchestrator.py

  • In the non-ROCm elif resolved_device == "cuda" branch, check ACESTEP_DTYPE before hardware auto-detection.
  • Accepts float32 | float16 | bfloat16; invalid/unset values fall through to the existing Ampere/Pre-Ampere detection unchanged.
env_dtype_str = os.environ.get("ACESTEP_DTYPE", "").strip().lower()
if env_dtype_str in ("float32", "float16", "bfloat16"):
    self.dtype = getattr(torch, env_dtype_str)
    logger.info(f"[initialize_service] ACESTEP_DTYPE={env_dtype_str} override: using dtype={self.dtype}.")
elif gpu_config.cuda_supports_bfloat16():
    self.dtype = torch.bfloat16
else:
    self.dtype = torch.float16
    ...
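
The precedence described above (explicit override first, hardware detection second) can be sketched without torch. This is a minimal illustration, not the code in the PR: `resolve_dtype_name`, `VALID_DTYPES`, and the `supports_bfloat16` flag are hypothetical names standing in for the branch on `gpu_config.cuda_supports_bfloat16()`, and the function returns dtype names rather than `torch` dtypes.

```python
import os
from typing import Mapping, Optional

# Values the PR description lists as accepted for ACESTEP_DTYPE.
VALID_DTYPES = ("float32", "float16", "bfloat16")


def resolve_dtype_name(
    supports_bfloat16: bool,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the dtype name the CUDA branch would pick (hypothetical helper)."""
    env = os.environ if env is None else env
    # Normalise the same way as the PR snippet: trim whitespace, lowercase.
    requested = env.get("ACESTEP_DTYPE", "").strip().lower()
    if requested in VALID_DTYPES:
        return requested  # explicit override wins over hardware detection
    # Invalid or unset: fall back to the Ampere / Pre-Ampere default.
    return "bfloat16" if supports_bfloat16 else "float16"
```

For example, `resolve_dtype_name(False, {"ACESTEP_DTYPE": " Float32 "})` yields `"float32"`, while an unrecognised value such as `"fp8"` falls back to the hardware default. As in the snippet above, the override is not checked against hardware capability in this sketch.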

init_service_loader.py

  • Pre-Ampere CUDA branch in _load_main_model_from_checkpoint now checks self.dtype before selecting attention implementation.
  • float32 → sdpa (no overflow risk in SDPA's fused softmax with float32).
  • float16 (default on Pre-Ampere) → eager workaround retained unchanged.
elif device == "cuda" and not gpu_config.cuda_supports_bfloat16():
    if getattr(self, "dtype", None) == torch.float32:
        attn_implementation = "sdpa"   # float32 is safe with SDPA
    else:
        attn_implementation = "eager"  # float16 overflow guard
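
The attention-selection rule for Pre-Ampere CUDA can be isolated as a small pure function for illustration. This is a sketch under stated assumptions: `select_attn_implementation` and its parameters are hypothetical, dtypes are passed as names instead of `torch` dtypes, and branches this PR description does not show are left as `None` rather than guessed.

```python
from typing import Optional


def select_attn_implementation(
    device: str,
    supports_bfloat16: bool,
    dtype_name: str,
) -> Optional[str]:
    """Hypothetical stand-in for the Pre-Ampere branch shown above."""
    if device == "cuda" and not supports_bfloat16:
        # Pre-Ampere: float32 may use SDPA; any other dtype keeps the eager guard.
        return "sdpa" if dtype_name == "float32" else "eager"
    # Other device branches are not shown in the PR description, so this
    # sketch leaves them undefined.
    return None
```

Only an exact float32 match switches to SDPA. Any other dtype on Pre-Ampere, including a bfloat16 override, takes the `else` path and stays on eager, which mirrors the snippet's structure.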

Tests

  • 7 new unit tests in init_service_test.py covering: all three valid ACESTEP_DTYPE values, invalid/unset fallback, and the float32 vs float16 attention-selection paths on Pre-Ampere CUDA.
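
Tests of this kind usually need to isolate `os.environ` so that one case cannot leak into the next. The following is a rough sketch of how the valid, invalid, and unset cases might be exercised with `unittest.mock.patch.dict`. It is not the content of init_service_test.py: `read_dtype_override` and `DtypeOverrideTests` are hypothetical names defined here only to keep the example self-contained.

```python
import os
import unittest
from typing import Optional
from unittest import mock


def read_dtype_override() -> Optional[str]:
    """Hypothetical helper: return the normalised override, or None if invalid/unset."""
    value = os.environ.get("ACESTEP_DTYPE", "").strip().lower()
    return value if value in ("float32", "float16", "bfloat16") else None


class DtypeOverrideTests(unittest.TestCase):
    def test_valid_values_are_accepted(self):
        for name in ("float32", "float16", "bfloat16"):
            # patch.dict restores the previous environment on exit.
            with self.subTest(name=name), mock.patch.dict(
                os.environ, {"ACESTEP_DTYPE": name}
            ):
                self.assertEqual(read_dtype_override(), name)

    def test_invalid_value_is_ignored(self):
        with mock.patch.dict(os.environ, {"ACESTEP_DTYPE": "int8"}):
            self.assertIsNone(read_dtype_override())

    def test_unset_value_is_ignored(self):
        # clear=True empties os.environ for the duration of the block.
        with mock.patch.dict(os.environ, {}, clear=True):
            self.assertIsNone(read_dtype_override())
```

Saved as a module, the cases can be run with `python -m unittest <module>`.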

Copilot AI linked an issue May 4, 2026 that may be closed by this pull request
- Add ACESTEP_DTYPE env var check in init_service_orchestrator.py for
  non-ROCm CUDA dtype selection (float32/float16/bfloat16 override)
- Add float32-aware attention selection in init_service_loader.py:
  Pre-Ampere + float32 → SDPA; Pre-Ampere + float16 → eager (unchanged)
- Add 7 unit tests covering the new env override and attention paths

Agent-Logs-Url: https://github.com/ace-step/ACE-Step-1.5/sessions/991cf1b8-0712-4bef-a6d2-ae9e2b3bbaa2

Co-authored-by: ChuxiJ <30956809+ChuxiJ@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix ACESTEP_DTYPE checks in init_service_loader and init_service_orchestrator" to "feat: honour ACESTEP_DTYPE env var on standard CUDA devices" May 4, 2026
Copilot finished work on behalf of ChuxiJ May 4, 2026 05:58
Copilot AI requested a review from ChuxiJ May 4, 2026 05:58

Development

Successfully merging this pull request may close these issues.

ACESTEP_DTYPE

2 participants