Skip to content

issue-fix-426: Multi-GPU support by component offloading#1149

Open
imsarang wants to merge 6 commits into
ace-step:mainfrom
imsarang:issue-426
Open

issue-fix-426: Multi-GPU support by component offloading#1149
imsarang wants to merge 6 commits into
ace-step:mainfrom
imsarang:issue-426

Conversation

@imsarang

@imsarang imsarang commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

Multi GPU support for component offloading.
issue link: #426

Earlier: GPU 0 used to perform dit, vae and lm tasks.

Now:
expected result GPU i will take respective component.
GPU allocation is done by ranking at runtime, where one gpu has one component to execute.

Summary by CodeRabbit

  • New Features

    • Per-component GPU mapping auto-selects, validates and is applied at startup and service initialization; environment overrides still honored.
    • LLM and model components can be targeted to different GPUs via the resolved mapping.
  • UI

    • GPU info now shows a per-component GPU hint (DiT/VAE/LM) when applicable.
  • Tests

    • Added/updated tests for device resolution, validation, multi-/single-GPU behavior, no-CUDA handling, and LLM device selection.

@coderabbitai

coderabbitai Bot commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a ComponentDeviceMap dataclass and device-mapping utilities; resolves and validates per-component GPU assignments at startup; propagates component device hints into DiT/VAE/LLM/service initialization and UI GPU hint display; tests added/updated for mapping behavior.

Changes

Cohort / File(s) Summary
Device Mapping Core
acestep/core/generation/device_mapping.py, acestep/core/generation/device_mapping_test.py
Add frozen ComponentDeviceMap(dit, vae, lm), resolve_component_device_map() (rank GPUs by free VRAM and assign components), format_component_gpu_hint_text(), and validate_component_device_map() plus unit tests covering multi/single/no-CUDA, hint formatting, and validation errors.
Startup LLM & Model Init
acestep/api/startup_llm_init.py, acestep/api/startup_llm_init_test.py, acestep/api/startup_model_init.py
initialize_llm_at_startup(...) now accepts component_device_map and prefers component_device_map.lm for LM device unless env override. do_model_initialization resolves/validates component map, derives service_device from dit mapping or fallback, passes component_device_map into downstream init calls; tests updated and new test verifies lm device selection from map.
Service Orchestrator Integration
acestep/core/generation/handler/init_service_orchestrator.py
initialize_service(...) gains optional component_device_map; resolves/validates when absent, derives dit_device/vae_device with fallbacks, logs mapping, sets self.device to dit_device, loads DiT/text-encoder on dit_device and VAE on vae_device, and records per-component devices in last_init_params.
UI: GPU Hints & i18n
acestep/ui/gradio/events/generation/service_init.py, acestep/ui/gradio/interfaces/generation_service_config_rows.py, acestep/ui/gradio/i18n/en.json, acestep/ui/gradio/i18n/zh.json
Add format_component_gpu_hint_text usage and localized component_gpu_hint strings; GPU info/tier display appended with per-component GPU hint when relevant.
Tests: Startup & Mapping
acestep/api/startup_llm_init_test.py, acestep/core/generation/device_mapping_test.py
Existing startup tests updated to pass component_device_map; new unit tests validate device-selection logic for LLM and device-mapping behaviors under varied CUDA setups.

Sequence Diagram(s)

sequenceDiagram
    participant Startup as Startup Handler
    participant DeviceMapper as Device Mapper
    participant Validator as Validator
    participant LLMInit as LLM Initializer
    participant ServiceInit as Service Orchestrator
    participant GPU as GPU Devices

    Startup->>DeviceMapper: resolve_component_device_map()
    activate DeviceMapper
    DeviceMapper->>GPU: query device count & mem info
    DeviceMapper-->>Startup: ComponentDeviceMap(dit, vae, lm)
    deactivate DeviceMapper

    Startup->>Validator: validate_component_device_map(mapping)
    activate Validator
    Validator->>GPU: verify cuda indices in range
    Validator-->>Startup: validation result
    deactivate Validator

    Startup->>LLMInit: initialize_llm_at_startup(component_device_map)
    activate LLMInit
    LLMInit->>GPU: load LLM on mapped lm device (or env override)
    LLMInit-->>Startup: LLM ready
    deactivate LLMInit

    Startup->>ServiceInit: initialize_service(component_device_map)
    activate ServiceInit
    ServiceInit->>GPU: load DiT/text-encoder on dit device
    ServiceInit->>GPU: load VAE on vae device
    ServiceInit-->>Startup: service ready
    deactivate ServiceInit
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • ChuxiJ

Poem

🐇 I counted chips with whiskered care,

I nudged each model to its lair,
DiT hops left, VAE takes the right,
LM naps where memory's bright,
Together we hum in CUDA light.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'issue-fix-426: Multi-GPU support by component offloading' clearly and specifically summarizes the main change—adding multi-GPU support through component offloading. It is concise, directly related to the changeset, and a teammate can quickly understand the primary improvement.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
acestep/api/startup_llm_init.py (1)

75-76: Drop the redundant truthiness check on component_device_map.

component_device_map is a required parameter typed as ComponentDeviceMap (a frozen dataclass instance is always truthy), so if component_device_map else None is dead-branch defensive code that contradicts the type hint. Reading .lm directly is clearer.

♻️ Proposed simplification
-        mapped_lm_device = component_device_map.lm if component_device_map else None
-        lm_device = os.getenv("ACESTEP_LM_DEVICE", mapped_lm_device or device)
+        lm_device = os.getenv("ACESTEP_LM_DEVICE", component_device_map.lm or device)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/api/startup_llm_init.py` around lines 75 - 76, The code defensively
checks component_device_map's truthiness even though it's a required
ComponentDeviceMap (always truthy); remove the redundant branch and read the .lm
attribute directly by setting mapped_lm_device = component_device_map.lm, then
keep the existing lm_device assignment using os.getenv("ACESTEP_LM_DEVICE",
mapped_lm_device or device) so the explicit fallback to device remains; update
references in startup_llm_init.py (mapped_lm_device, lm_device,
ACESTEP_LM_DEVICE, device) accordingly.
acestep/core/generation/device_mapping_test.py (2)

18-45: Optional: cover the 2-GPU branch.

Tests cover 1 and 3 GPUs but not the 2-GPU case, which has its own non-obvious behavior in resolve_component_device_map: vae_idx = ranked[1] and lm_idx = ranked[-1] = ranked[1], so VAE and LM intentionally share the second GPU while DiT alone uses the first. A small test would lock that contract in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/device_mapping_test.py` around lines 18 - 45, Add a
test that covers the 2-GPU branch of resolve_component_device_map by mocking
torch.cuda.is_available to True, torch.cuda.device_count to 2, and
torch.cuda.mem_get_info to return two device tuples; call
resolve_component_device_map() and assert that mapping.dit == "cuda:0" and
mapping.vae == mapping.lm == "cuda:1" (comparing against a ComponentDeviceMap()
instance where appropriate) so the contract where VAE and LM share the second
GPU is locked in.

33-40: Single-GPU test silently relies on the mem_get_info exception fallback.

This test does not patch torch.cuda.mem_get_info, so when _rank_cuda_devices_by_free_vram is invoked under device_count=1 it actually calls the real torch.cuda.mem_get_info(0) — which will typically raise in a CPU-only test environment, hit the inner except in device_mapping.py:34, and fall through with free_bytes = 0. The assertion still passes (ranked = [0]), but for the wrong reason: it’s exercising the error path rather than the happy path.

Consider patching mem_get_info here too so the test verifies the intended single-GPU branch deterministically.

♻️ Proposed test hardening
-    `@patch`("torch.cuda.is_available", return_value=True)
-    `@patch`("torch.cuda.device_count", return_value=1)
-    def test_resolve_component_device_map_single_gpu_collapses_to_cuda_zero(self, *_mocks) -> None:
+    `@patch`("torch.cuda.is_available", return_value=True)
+    `@patch`("torch.cuda.device_count", return_value=1)
+    `@patch`("torch.cuda.mem_get_info", return_value=(8 * 1024**3, 8 * 1024**3))
+    def test_resolve_component_device_map_single_gpu_collapses_to_cuda_zero(
+        self, *_mocks
+    ) -> None:
         """Single visible GPU should resolve all components onto cuda:0."""
         mapping = resolve_component_device_map()
         self.assertEqual("cuda:0", mapping.dit)
         self.assertEqual("cuda:0", mapping.vae)
         self.assertEqual("cuda:0", mapping.lm)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/device_mapping_test.py` around lines 33 - 40, The
test test_resolve_component_device_map_single_gpu_collapses_to_cuda_zero
currently omits patching torch.cuda.mem_get_info so it accidentally exercises
the exception fallback in _rank_cuda_devices_by_free_vram; patch
torch.cuda.mem_get_info in that test to return a valid (free_bytes, total_bytes)
tuple with non-zero free memory (so the happy path runs) and keep the existing
patches for torch.cuda.is_available and device_count; ensure the mock target
name matches "torch.cuda.mem_get_info" and that the mocked return value produces
a ranked list [0] so resolve_component_device_map still maps dit/vae/lm to
"cuda:0".
acestep/api/startup_llm_init_test.py (1)

36-36: Optional: add a test that exercises the component_device_map.lm path.

Both test cases pass an empty ComponentDeviceMap(), so the new mapped_lm_device branch in startup_llm_init.py:75-76 is not actually exercised — lm_device always falls through to the device argument. A minimal third case asserting that llm_handler.initialize is called with device="cuda:2" when component_device_map=ComponentDeviceMap(lm="cuda:2") and ACESTEP_LM_DEVICE is unset would lock in the new behavior.

Also applies to: 77-77

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/api/startup_llm_init_test.py` at line 36, Add a third test case in
startup_llm_init_test.py that passes
component_device_map=ComponentDeviceMap(lm="cuda:2") and ensures
ACESTEP_LM_DEVICE is unset, then call the code path that triggers
startup_llm_init.initialize (or the test harness that calls
llm_handler.initialize) and assert that llm_handler.initialize was invoked with
device="cuda:2"; this will exercise the mapped_lm_device branch in
startup_llm_init.py (the lines handling mapped_lm_device at ~75-76) and verify
the new behavior instead of always falling back to the provided device argument.
acestep/core/generation/device_mapping.py (1)

64-78: Document the raised exception in the docstring.

Per project guidelines, public-function docstrings should mention raised exceptions. validate_component_device_map raises ValueError but the docstring does not.

♻️ Proposed docstring update
 def validate_component_device_map(mapping: ComponentDeviceMap) -> None:
-    """Validate that mapped CUDA indices exist on the current host."""
+    """Validate that mapped CUDA indices exist on the current host.
+
+    Args:
+        mapping: Component-to-device mapping to validate.
+
+    Raises:
+        ValueError: If any ``cuda:<idx>`` entry references an index that
+            is out of range for the visible CUDA device count.
+    """

As per coding guidelines: "Docstrings must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/device_mapping.py` around lines 64 - 78, The
docstring for validate_component_device_map is missing documentation of the
exception it can raise; update the function's docstring to include a "Raises"
section (or sentence) that states it raises ValueError when a provided device
string maps to a non-existent CUDA index (e.g., index out of range or invalid
format) so callers know to handle invalid device mappings.
acestep/api/startup_model_init.py (1)

43-54: Optional: migrate startup output to loguru.

The new lines (76-81) keep using print() to match the surrounding code in this file, which is fine for this PR. Just noting that the project guideline is to use loguru.logger for all non-CLI output; converting this whole startup helper in a follow-up would bring it in line.

As per coding guidelines: "Use loguru logger for logging instead of print(), except for CLI output".

Also applies to: 56-68, 76-81, 90-90, 103-103, 108-108, 122-122, 125-125, 133-133, 147-147, 149-149, 151-151, 160-160, 174-174, 176-176, 178-178

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/api/startup_model_init.py` around lines 43 - 54, The startup code
prints GPU configuration using multiple print() calls (e.g., printing
gpu_memory_gb and fields on gpu_config); replace these print() calls with
loguru's logger (e.g., logger.info) and format the same multi-line block as a
single logged message or several logger.info calls, and add the required "from
loguru import logger" import at the top; update every other noted print()
occurrence in this module (the blocks that reference gpu_memory_gb, gpu_config,
and other startup variables) to use logger instead of print so all non-CLI
startup output follows the project's logging guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/api/startup_model_init.py`:
- Around line 76-81: The startup log prints a resolved component_device_map that
can differ from final placements because
initialize_service_orchestrator.initialize_service re-resolves the map per call;
either make the map resolution deterministic by resolving component_device_map
once and thread that same object through
initialize_service_orchestrator.initialize_service (so handler placements use
the same map as the log), or change the print message to clarify it is only the
initial/allocation hint (e.g., "initial component GPU hint") so it won't mislead
— update the use of component_device_map, the print call, and any callers to
pass the single resolved map through to
initialize_service_orchestrator.initialize_service or alter the log text
accordingly.

In `@acestep/core/generation/device_mapping.py`:
- Around line 30-42: The broad `except Exception` handlers in the loop that
probes VRAM (the block calling torch.cuda.mem_get_info in device_mapping.py that
populates free_by_device) should be narrowed to catch RuntimeError only, and any
caught RuntimeError must be logged via loguru.logger (e.g., logger.exception or
logger.error with the exception) so failures aren’t silently treated as zero
free bytes; add/import loguru.logger if missing and replace both `except
Exception` blocks with `except RuntimeError as e:` and a logger call before
falling back to free_bytes = 0, keeping the existing logic that appends (idx,
free_bytes) and sorts afterwards.

In `@acestep/core/generation/handler/init_service_orchestrator.py`:
- Around line 77-89: The orchestrator re-calls resolve_component_device_map()
inside initialize_service, which discards the caller-provided device and causes
inconsistent placement; update initialize_service (and its use of
_resolve_initialize_device, dit_device, vae_device, and component_device_map) to
accept an optional ComponentDeviceMap parameter and short-circuit re-resolution
when a map is provided (i.e., use the passed-in component_device_map to compute
dit_device/vae_device and only call resolve_component_device_map() if none was
passed), and modify callers to pass their already-resolved component_device_map
so the device argument is honored and mem_get_info (GPU ranking) is not
performed twice.

---

Nitpick comments:
In `@acestep/api/startup_llm_init_test.py`:
- Line 36: Add a third test case in startup_llm_init_test.py that passes
component_device_map=ComponentDeviceMap(lm="cuda:2") and ensures
ACESTEP_LM_DEVICE is unset, then call the code path that triggers
startup_llm_init.initialize (or the test harness that calls
llm_handler.initialize) and assert that llm_handler.initialize was invoked with
device="cuda:2"; this will exercise the mapped_lm_device branch in
startup_llm_init.py (the lines handling mapped_lm_device at ~75-76) and verify
the new behavior instead of always falling back to the provided device argument.

In `@acestep/api/startup_llm_init.py`:
- Around line 75-76: The code defensively checks component_device_map's
truthiness even though it's a required ComponentDeviceMap (always truthy);
remove the redundant branch and read the .lm attribute directly by setting
mapped_lm_device = component_device_map.lm, then keep the existing lm_device
assignment using os.getenv("ACESTEP_LM_DEVICE", mapped_lm_device or device) so
the explicit fallback to device remains; update references in
startup_llm_init.py (mapped_lm_device, lm_device, ACESTEP_LM_DEVICE, device)
accordingly.

In `@acestep/api/startup_model_init.py`:
- Around line 43-54: The startup code prints GPU configuration using multiple
print() calls (e.g., printing gpu_memory_gb and fields on gpu_config); replace
these print() calls with loguru's logger (e.g., logger.info) and format the same
multi-line block as a single logged message or several logger.info calls, and
add the required "from loguru import logger" import at the top; update every
other noted print() occurrence in this module (the blocks that reference
gpu_memory_gb, gpu_config, and other startup variables) to use logger instead of
print so all non-CLI startup output follows the project's logging guideline.

In `@acestep/core/generation/device_mapping_test.py`:
- Around line 18-45: Add a test that covers the 2-GPU branch of
resolve_component_device_map by mocking torch.cuda.is_available to True,
torch.cuda.device_count to 2, and torch.cuda.mem_get_info to return two device
tuples; call resolve_component_device_map() and assert that mapping.dit ==
"cuda:0" and mapping.vae == mapping.lm == "cuda:1" (comparing against a
ComponentDeviceMap() instance where appropriate) so the contract where VAE and
LM share the second GPU is locked in.
- Around line 33-40: The test
test_resolve_component_device_map_single_gpu_collapses_to_cuda_zero currently
omits patching torch.cuda.mem_get_info so it accidentally exercises the
exception fallback in _rank_cuda_devices_by_free_vram; patch
torch.cuda.mem_get_info in that test to return a valid (free_bytes, total_bytes)
tuple with non-zero free memory (so the happy path runs) and keep the existing
patches for torch.cuda.is_available and device_count; ensure the mock target
name matches "torch.cuda.mem_get_info" and that the mocked return value produces
a ranked list [0] so resolve_component_device_map still maps dit/vae/lm to
"cuda:0".

In `@acestep/core/generation/device_mapping.py`:
- Around line 64-78: The docstring for validate_component_device_map is missing
documentation of the exception it can raise; update the function's docstring to
include a "Raises" section (or sentence) that states it raises ValueError when a
provided device string maps to a non-existent CUDA index (e.g., index out of
range or invalid format) so callers know to handle invalid device mappings.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 869649b8-22e7-4d0e-87e1-d339dc8576a5

📥 Commits

Reviewing files that changed from the base of the PR and between d5d958e and e51cc4b.

📒 Files selected for processing (6)
  • acestep/api/startup_llm_init.py
  • acestep/api/startup_llm_init_test.py
  • acestep/api/startup_model_init.py
  • acestep/core/generation/device_mapping.py
  • acestep/core/generation/device_mapping_test.py
  • acestep/core/generation/handler/init_service_orchestrator.py

Comment thread acestep/api/startup_model_init.py
Comment thread acestep/core/generation/device_mapping.py
Comment thread acestep/core/generation/handler/init_service_orchestrator.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
acestep/api/startup_llm_init.py (1)

8-14: 🛠️ Refactor suggestion | 🟠 Major

Duplicate import of get_recommended_lm_model / is_lm_model_supported.

Line 9 imports get_recommended_lm_model, is_lm_model_supported from acestep.gpu_config, and the very next statement (lines 10–14) re-imports the same two symbols (plus resolve_lm_backend) from the same module. The single grouped import is sufficient.

♻️ Proposed cleanup
 from acestep.core.generation.device_mapping import ComponentDeviceMap
-from acestep.gpu_config import get_recommended_lm_model, is_lm_model_supported
 from acestep.gpu_config import (
     get_recommended_lm_model,
     is_lm_model_supported,
     resolve_lm_backend,
 )

As per coding guidelines: "Imports: Group by type (stdlib, third-party, local), sort alphabetically within groups."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/api/startup_llm_init.py` around lines 8 - 14, The file currently
imports get_recommended_lm_model and is_lm_model_supported twice; remove the
first duplicate line importing get_recommended_lm_model, is_lm_model_supported
and keep the grouped import that also includes resolve_lm_backend, ensuring
imports are grouped/sorted alphabetically (keep ComponentDeviceMap import
separate if it’s a different group). Update the top imports so only one from
acestep.gpu_config exports get_recommended_lm_model, is_lm_model_supported,
resolve_lm_backend is present.
🧹 Nitpick comments (2)
acestep/api/startup_model_init.py (1)

55-56: validate_component_device_map propagates ValueError and will hard-crash eager startup.

resolve_component_device_map() only emits indices it just enumerated, so in practice the validator should never fail — but a ValueError raised here propagates out of do_model_initialization and aborts the API server startup (when ACESTEP_NO_INIT=false) or the first request (lazy path). Given this is purely a defensive check on an internally-produced mapping, consider either:

  • catching ValueError and falling back to an empty ComponentDeviceMap with a logger.warning(...), or
  • moving the validation inside resolve_component_device_map so callers don't need a try/except contract here.

Not a blocker since the failure mode is currently unreachable, but worth hardening before user-supplied mappings (e.g., env-driven) get added.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/api/startup_model_init.py` around lines 55 - 56, The call to
validate_component_device_map can raise ValueError and crash startup; change
do_model_initialization to catch ValueError from
validate_component_device_map(component_device_map) and on exception log a
warning via the module logger and set component_device_map to an empty
ComponentDeviceMap (or equivalent default) so startup continues; alternatively,
move the validation logic into resolve_component_device_map so callers like
do_model_initialization no longer need to handle exceptions—update the code
paths referencing resolve_component_device_map and validate_component_device_map
accordingly.
acestep/api/startup_llm_init.py (1)

23-23: Type annotation says non-Optional, but downstream defensive if component_device_map else None (line 80) treats it as optional — pick one contract.

component_device_map: ComponentDeviceMap declares a required, non-None argument, yet line 80 guards against None. Either:

  • make the type Optional[ComponentDeviceMap] = None to match the runtime check (and update callers that may pass None), or
  • drop the if component_device_map else None guard at line 80 and rely on the type.

Given the orchestrator always resolves and passes a ComponentDeviceMap (see startup_model_init.py:55-56,168), preferring the non-Optional contract and dropping the guard is cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/api/startup_llm_init.py` at line 23, The parameter
component_device_map is declared non-Optional but is being treated as optional
downstream; follow the preferred non-Optional contract: keep
component_device_map: ComponentDeviceMap in the function signature and remove
the defensive conditional that does "if component_device_map else None" (replace
uses with component_device_map directly). Ensure any variable passed into places
that previously accepted None now receive component_device_map, and remove any
typing or branches that handled a None case for component_device_map.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/api/startup_llm_init.py`:
- Around line 77-83: The new validation and mapped-device logic for
lm_backend/mapped_lm_device (variables lm_backend, mapped_lm_device, lm_device
and component_device_map) is being clobbered by the subsequent calls to
resolve_lm_backend(...) and reassigning lm_device; remove the redundant
overwrites and reconcile with the centralized resolver: call
resolve_lm_backend(given_env, gpu_config) to set lm_backend (preserving its
capability fallbacks) and only set lm_device using component_device_map.lm when
ACESTEP_LM_DEVICE is unset (fall back to device), ensuring you do not reassign
lm_backend or lm_device after these decisions.

In `@acestep/api/startup_model_init.py`:
- Around line 55-63: The log prints inconsistent fallbacks: it uses
service_device (component_device_map.dit or device) for DiT/VAE but uses device
for LM, making the message diverge from how devices are actually resolved;
change the print to use the same fallback pattern
(component_device_map.<component> or device) for DiT, VAE, and LM so the log
matches initialize_llm_at_startup and the orchestrator's behavior, and consider
renaming the message to indicate it's an "initial component GPU hint" rather
than a definitive mapping; update the print statement that references
component_device_map, service_device, and lm_device accordingly.

---

Outside diff comments:
In `@acestep/api/startup_llm_init.py`:
- Around line 8-14: The file currently imports get_recommended_lm_model and
is_lm_model_supported twice; remove the first duplicate line importing
get_recommended_lm_model, is_lm_model_supported and keep the grouped import that
also includes resolve_lm_backend, ensuring imports are grouped/sorted
alphabetically (keep ComponentDeviceMap import separate if it’s a different
group). Update the top imports so only one from acestep.gpu_config exports
get_recommended_lm_model, is_lm_model_supported, resolve_lm_backend is present.

---

Nitpick comments:
In `@acestep/api/startup_llm_init.py`:
- Line 23: The parameter component_device_map is declared non-Optional but is
being treated as optional downstream; follow the preferred non-Optional
contract: keep component_device_map: ComponentDeviceMap in the function
signature and remove the defensive conditional that does "if
component_device_map else None" (replace uses with component_device_map
directly). Ensure any variable passed into places that previously accepted None
now receive component_device_map, and remove any typing or branches that handled
a None case for component_device_map.

In `@acestep/api/startup_model_init.py`:
- Around line 55-56: The call to validate_component_device_map can raise
ValueError and crash startup; change do_model_initialization to catch ValueError
from validate_component_device_map(component_device_map) and on exception log a
warning via the module logger and set component_device_map to an empty
ComponentDeviceMap (or equivalent default) so startup continues; alternatively,
move the validation logic into resolve_component_device_map so callers like
do_model_initialization no longer need to handle exceptions—update the code
paths referencing resolve_component_device_map and validate_component_device_map
accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 411647bb-f746-40a7-a5ab-df1dd6dff476

📥 Commits

Reviewing files that changed from the base of the PR and between e51cc4b and c045a7e.

📒 Files selected for processing (4)
  • acestep/api/startup_llm_init.py
  • acestep/api/startup_llm_init_test.py
  • acestep/api/startup_model_init.py
  • acestep/core/generation/handler/init_service_orchestrator.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • acestep/api/startup_llm_init_test.py
  • acestep/core/generation/handler/init_service_orchestrator.py

Comment thread acestep/api/startup_llm_init.py Outdated
Comment thread acestep/api/startup_model_init.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
acestep/core/generation/device_mapping.py (2)

10-13: Narrow the bare except on optional torch import.

Ruff is flagging BLE001. The intent here is a fallback for environments without torch, so catching ImportError is more precise and aligns with coding guidelines (avoid bare except:/Exception).

♻️ Proposed refactor
 try:
     import torch  # type: ignore[reportMissingImports]
-except Exception:  # pragma: no cover - fallback for minimal environments
+except ImportError:  # pragma: no cover - fallback for minimal environments
     torch = None  # type: ignore[assignment]

As per coding guidelines: "Avoid bare except: clauses; catch specific exceptions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/device_mapping.py` around lines 10 - 13, The bare
"except Exception" in the torch import fallback should be narrowed to catch
ImportError only; update the import block that sets "torch = None" (the optional
torch import) to use "except ImportError" so we only handle the missing-import
case and avoid masking other exceptions.

86-98: Brittle index parsing in validate_component_device_map.

int(device.split(":", 1)[1]) will raise an unhelpful ValueError: invalid literal for int() for malformed values like "cuda:abc" or "cuda: 0". Consider catching the parse failure and re-raising the same ValueError style used a few lines below so the user gets a consistent, contextual error.

♻️ Proposed refactor
         if not device or not device.startswith("cuda:"):
             continue
-        idx = int(device.split(":", 1)[1])
+        try:
+            idx = int(device.split(":", 1)[1])
+        except ValueError as exc:
+            raise ValueError(
+                f"Invalid {component} device '{device}': expected 'cuda:<int>'."
+            ) from exc
         if idx < 0 or idx >= cuda_count:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/core/generation/device_mapping.py` around lines 86 - 98, In
validate_component_device_map, parsing the CUDA index via int(device.split(":",
1)[1]) is brittle and can raise raw ValueError for malformed strings; wrap the
parsing of device (the part after "cuda:") in a try/except (catch ValueError and
possibly TypeError), and on parse failure raise the same contextual ValueError
used below that mentions the component, the original device string, and the
available cuda_count so users get a consistent, helpful error message (use the
existing symbols component, device, cuda_count and keep the same error format).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@acestep/core/generation/device_mapping.py`:
- Around line 10-13: The bare "except Exception" in the torch import fallback
should be narrowed to catch ImportError only; update the import block that sets
"torch = None" (the optional torch import) to use "except ImportError" so we
only handle the missing-import case and avoid masking other exceptions.
- Around line 86-98: In validate_component_device_map, parsing the CUDA index
via int(device.split(":", 1)[1]) is brittle and can raise raw ValueError for
malformed strings; wrap the parsing of device (the part after "cuda:") in a
try/except (catch ValueError and possibly TypeError), and on parse failure raise
the same contextual ValueError used below that mentions the component, the
original device string, and the available cuda_count so users get a consistent,
helpful error message (use the existing symbols component, device, cuda_count
and keep the same error format).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f2b2f91e-b5b4-4f76-8f4b-5e110d8fbe87

📥 Commits

Reviewing files that changed from the base of the PR and between c045a7e and e3ed830.

📒 Files selected for processing (6)
  • acestep/api/startup_llm_init.py
  • acestep/api/startup_llm_init_test.py
  • acestep/api/startup_model_init.py
  • acestep/core/generation/device_mapping.py
  • acestep/core/generation/device_mapping_test.py
  • acestep/core/generation/handler/init_service_orchestrator.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • acestep/api/startup_llm_init_test.py
  • acestep/api/startup_model_init.py
  • acestep/core/generation/handler/init_service_orchestrator.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
acestep/ui/gradio/events/generation/service_init.py (1)

18-20: Cross‑layer import: events/ shouldn't depend on interfaces/.

build_component_gpu_hint_text is a pure string formatter around resolve_component_device_map() — it has no Gradio UI construction in it. Pulling it from acestep/ui/gradio/interfaces/generation_service_config_rows.py into an event handler reverses the usual dependency direction (events → interfaces) and risks future circular imports if the interfaces module ever needs to import from events/.

Consider relocating the helper to a neutral location (e.g. alongside resolve_component_device_map in acestep/core/generation/device_mapping.py, or a small ui/gradio/helpers/ module), and have both generation_service_config_rows.py and service_init.py import it from there.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/ui/gradio/events/generation/service_init.py` around lines 18 - 20,
The event module service_init.py currently imports build_component_gpu_hint_text
from interfaces/generation_service_config_rows.py, creating a cross-layer
dependency; move the helper build_component_gpu_hint_text to a neutral module
(e.g., colocate with resolve_component_device_map in
acestep/core/generation/device_mapping.py or create a small
acestep/ui/gradio/helpers module), update generation_service_config_rows.py and
service_init.py to import build_component_gpu_hint_text from the new neutral
location, and ensure resolve_component_device_map remains referenced from its
original module so no interfaces → events import occurs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acestep/ui/gradio/interfaces/generation_service_config_rows.py`:
- Around line 21-29: build_component_gpu_hint_text currently always returns a
hardcoded English hint even when resolve_component_device_map() yields no
resolved components (noise on non‑CUDA/single‑GPU systems) and bypasses i18n;
update build_component_gpu_hint_text to (1) detect an empty ComponentDeviceMap
or the case where dit, vae and lm all resolve to the same device (including
matching the default_device) and return an empty string in those cases, and (2)
localize the leading label by calling t("service.gpu_auto_tier") (or the
appropriate key) instead of the literal "Component GPU hint: "; also ensure
callers build_gpu_info_and_tier and on_tier_change skip the "  \n" prefix when
the helper returns an empty string so no extra whitespace is rendered.

---

Nitpick comments:
In `@acestep/ui/gradio/events/generation/service_init.py`:
- Around line 18-20: The event module service_init.py currently imports
build_component_gpu_hint_text from interfaces/generation_service_config_rows.py,
creating a cross-layer dependency; move the helper build_component_gpu_hint_text
to a neutral module (e.g., colocate with resolve_component_device_map in
acestep/core/generation/device_mapping.py or create a small
acestep/ui/gradio/helpers module), update generation_service_config_rows.py and
service_init.py to import build_component_gpu_hint_text from the new neutral
location, and ensure resolve_component_device_map remains referenced from its
original module so no interfaces → events import occurs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ed738ce9-3f49-4137-a002-1dc25b49f4b3

📥 Commits

Reviewing files that changed from the base of the PR and between e3ed830 and 4036677.

📒 Files selected for processing (2)
  • acestep/ui/gradio/events/generation/service_init.py
  • acestep/ui/gradio/interfaces/generation_service_config_rows.py

Comment thread acestep/ui/gradio/interfaces/generation_service_config_rows.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
acestep/ui/gradio/interfaces/generation_service_config_rows.py (1)

1-245: 🏗️ Heavy lift

Split this row-builder module before it grows further.

generation_service_config_rows.py is already over the repo's 200-line hard cap, and the new GPU hint logic adds more responsibility to a file that already builds language, GPU, checkpoint, model, and LM controls. Extracting the GPU-info/tier block into a smaller helper module would keep the boundary clearer and bring the file back under policy.

As per coding guidelines: Module LOC policy is met (<=150 target, <=200 hard cap or justified exception).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acestep/ui/gradio/interfaces/generation_service_config_rows.py` around lines
1 - 245, The file exceeds the repo LOC cap and should be split by extracting the
GPU info/tier UI into its own helper module: move the build_gpu_info_and_tier
function (and its direct dependencies: the calls to
format_component_gpu_hint_text, get_gpu_device_name, and GPU_TIER_LABELS lookup
plus any imports used only by that function) into a new module (e.g.,
generation_service_gpu_rows) and update imports in
generation_service_config_rows.py to import build_gpu_info_and_tier from the new
module; ensure the new module exports the same function signature and that
remaining imports (t, GPU_TIER_LABELS, format_component_gpu_hint_text,
get_gpu_device_name) are adjusted there so the original callers and tests
continue to work.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@acestep/ui/gradio/interfaces/generation_service_config_rows.py`:
- Around line 1-245: The file exceeds the repo LOC cap and should be split by
extracting the GPU info/tier UI into its own helper module: move the
build_gpu_info_and_tier function (and its direct dependencies: the calls to
format_component_gpu_hint_text, get_gpu_device_name, and GPU_TIER_LABELS lookup
plus any imports used only by that function) into a new module (e.g.,
generation_service_gpu_rows) and update imports in
generation_service_config_rows.py to import build_gpu_info_and_tier from the new
module; ensure the new module exports the same function signature and that
remaining imports (t, GPU_TIER_LABELS, format_component_gpu_hint_text,
get_gpu_device_name) are adjusted there so the original callers and tests
continue to work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c2fdc42f-8755-43bd-8695-6daea536c993

📥 Commits

Reviewing files that changed from the base of the PR and between 4036677 and 8f579b0.

📒 Files selected for processing (6)
  • acestep/core/generation/device_mapping.py
  • acestep/core/generation/device_mapping_test.py
  • acestep/ui/gradio/events/generation/service_init.py
  • acestep/ui/gradio/i18n/en.json
  • acestep/ui/gradio/i18n/zh.json
  • acestep/ui/gradio/interfaces/generation_service_config_rows.py
✅ Files skipped from review due to trivial changes (2)
  • acestep/ui/gradio/i18n/en.json
  • acestep/ui/gradio/i18n/zh.json
🚧 Files skipped from review as they are similar to previous changes (2)
  • acestep/ui/gradio/events/generation/service_init.py
  • acestep/core/generation/device_mapping_test.py

@pokepress

Copy link
Copy Markdown

So, I tried using this by creating a branch off the current code in the main branch and merging yours in. No issues getting it to start up, but I didn't see it move anything to the second GPU. For the record, my setup is:

ThinkStation p720 with 32GB RAM on Windows 10
GPU 1: 4060ti 16GB
GPU 2: 3060 12GB

The generated hint was: DiT=cuda:0, VAE=cuda:1, LM=cuda:1

Suggesting the DiT music model would remain on the 4060ti, and the others would move onto 3060. However, only the 4060 ti was used, and the VAE and LM were offloaded to the CPU. Do I need to override the setup based on my total memory to make this work?

@imsarang

Copy link
Copy Markdown
Contributor Author

according to the code, all the available gpus should be shown and ranked, based on the ranking the module should be offloaded.
You can try by overriding the setup, but i dont think this will be the permanent fix.
Possibly a static mapping might get involved, but we need a better solution, I will push my changes based on this.

Thanks for the feedback, will update the code.

@pokepress

Copy link
Copy Markdown

according to the code, all the available gpus should be shown and ranked, based on the ranking the module should be offloaded. You can try by overriding the setup, but i dont think this will be the permanent fix. Possibly a static mapping might get involved, but we need a better solution, I will push my changes based on this.

Thanks for the feedback, will update the code.

I believe I also had "offload to CPU" on, which is checked by default in this configuration. I did turn off dit to CPU, though.

@pokepress

Copy link
Copy Markdown

So, I gave editing the code a try myself, and wanted to explain some of the things that make this more difficult than it might seem:

  1. The existing code has a parameter that allows you to specify a specific device for the (L)LM. However, many places in code (both in inference and the general gpu utils) actually just hard code an index of 0.
  2. Some supporting frameworks (in particular, VLLM) don't make it easy to specify which device a model should be sent to.
  3. If models still need to be offloaded to the CPU, that needs to take into account the model may be on a device other than 0
  4. Some cuda operations require the relevant tensors to be on the same device. I don't know if this can be reworked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants