issue-fix-426: Multi-GPU support by component offloading#1149
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a ComponentDeviceMap dataclass and device-mapping utilities; resolves and validates per-component GPU assignments at startup; propagates component device hints into DiT/VAE/LLM/service initialization and UI GPU hint display; tests added/updated for mapping behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant Startup as Startup Handler
participant DeviceMapper as Device Mapper
participant Validator as Validator
participant LLMInit as LLM Initializer
participant ServiceInit as Service Orchestrator
participant GPU as GPU Devices
Startup->>DeviceMapper: resolve_component_device_map()
activate DeviceMapper
DeviceMapper->>GPU: query device count & mem info
DeviceMapper-->>Startup: ComponentDeviceMap(dit, vae, lm)
deactivate DeviceMapper
Startup->>Validator: validate_component_device_map(mapping)
activate Validator
Validator->>GPU: verify cuda indices in range
Validator-->>Startup: validation result
deactivate Validator
Startup->>LLMInit: initialize_llm_at_startup(component_device_map)
activate LLMInit
LLMInit->>GPU: load LLM on mapped lm device (or env override)
LLMInit-->>Startup: LLM ready
deactivate LLMInit
Startup->>ServiceInit: initialize_service(component_device_map)
activate ServiceInit
ServiceInit->>GPU: load DiT/text-encoder on dit device
ServiceInit->>GPU: load VAE on vae device
ServiceInit-->>Startup: service ready
deactivate ServiceInit
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (6)
acestep/api/startup_llm_init.py (1)
75-76: Drop the redundant truthiness check oncomponent_device_map.
component_device_mapis a required parameter typed asComponentDeviceMap(a frozen dataclass instance is always truthy), soif component_device_map else Noneis dead-branch defensive code that contradicts the type hint. Reading.lmdirectly is clearer.♻️ Proposed simplification
- mapped_lm_device = component_device_map.lm if component_device_map else None - lm_device = os.getenv("ACESTEP_LM_DEVICE", mapped_lm_device or device) + lm_device = os.getenv("ACESTEP_LM_DEVICE", component_device_map.lm or device)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/api/startup_llm_init.py` around lines 75 - 76, The code defensively checks component_device_map's truthiness even though it's a required ComponentDeviceMap (always truthy); remove the redundant branch and read the .lm attribute directly by setting mapped_lm_device = component_device_map.lm, then keep the existing lm_device assignment using os.getenv("ACESTEP_LM_DEVICE", mapped_lm_device or device) so the explicit fallback to device remains; update references in startup_llm_init.py (mapped_lm_device, lm_device, ACESTEP_LM_DEVICE, device) accordingly.acestep/core/generation/device_mapping_test.py (2)
18-45: Optional: cover the 2-GPU branch.Tests cover 1 and 3 GPUs but not the 2-GPU case, which has its own non-obvious behavior in
resolve_component_device_map:vae_idx = ranked[1]andlm_idx = ranked[-1] = ranked[1], so VAE and LM intentionally share the second GPU while DiT alone uses the first. A small test would lock that contract in.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/core/generation/device_mapping_test.py` around lines 18 - 45, Add a test that covers the 2-GPU branch of resolve_component_device_map by mocking torch.cuda.is_available to True, torch.cuda.device_count to 2, and torch.cuda.mem_get_info to return two device tuples; call resolve_component_device_map() and assert that mapping.dit == "cuda:0" and mapping.vae == mapping.lm == "cuda:1" (comparing against a ComponentDeviceMap() instance where appropriate) so the contract where VAE and LM share the second GPU is locked in.
33-40: Single-GPU test silently relies on themem_get_infoexception fallback.This test does not patch
torch.cuda.mem_get_info, so when_rank_cuda_devices_by_free_vramis invoked underdevice_count=1it actually calls the realtorch.cuda.mem_get_info(0)— which will typically raise in a CPU-only test environment, hit the innerexceptindevice_mapping.py:34, and fall through withfree_bytes = 0. The assertion still passes (ranked =[0]), but for the wrong reason: it’s exercising the error path rather than the happy path.Consider patching
mem_get_infohere too so the test verifies the intended single-GPU branch deterministically.♻️ Proposed test hardening
- `@patch`("torch.cuda.is_available", return_value=True) - `@patch`("torch.cuda.device_count", return_value=1) - def test_resolve_component_device_map_single_gpu_collapses_to_cuda_zero(self, *_mocks) -> None: + `@patch`("torch.cuda.is_available", return_value=True) + `@patch`("torch.cuda.device_count", return_value=1) + `@patch`("torch.cuda.mem_get_info", return_value=(8 * 1024**3, 8 * 1024**3)) + def test_resolve_component_device_map_single_gpu_collapses_to_cuda_zero( + self, *_mocks + ) -> None: """Single visible GPU should resolve all components onto cuda:0.""" mapping = resolve_component_device_map() self.assertEqual("cuda:0", mapping.dit) self.assertEqual("cuda:0", mapping.vae) self.assertEqual("cuda:0", mapping.lm)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/core/generation/device_mapping_test.py` around lines 33 - 40, The test test_resolve_component_device_map_single_gpu_collapses_to_cuda_zero currently omits patching torch.cuda.mem_get_info so it accidentally exercises the exception fallback in _rank_cuda_devices_by_free_vram; patch torch.cuda.mem_get_info in that test to return a valid (free_bytes, total_bytes) tuple with non-zero free memory (so the happy path runs) and keep the existing patches for torch.cuda.is_available and device_count; ensure the mock target name matches "torch.cuda.mem_get_info" and that the mocked return value produces a ranked list [0] so resolve_component_device_map still maps dit/vae/lm to "cuda:0".acestep/api/startup_llm_init_test.py (1)
36-36: Optional: add a test that exercises thecomponent_device_map.lmpath.Both test cases pass an empty
ComponentDeviceMap(), so the newmapped_lm_devicebranch instartup_llm_init.py:75-76is not actually exercised —lm_devicealways falls through to thedeviceargument. A minimal third case asserting thatllm_handler.initializeis called withdevice="cuda:2"whencomponent_device_map=ComponentDeviceMap(lm="cuda:2")andACESTEP_LM_DEVICEis unset would lock in the new behavior.Also applies to: 77-77
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/api/startup_llm_init_test.py` at line 36, Add a third test case in startup_llm_init_test.py that passes component_device_map=ComponentDeviceMap(lm="cuda:2") and ensures ACESTEP_LM_DEVICE is unset, then call the code path that triggers startup_llm_init.initialize (or the test harness that calls llm_handler.initialize) and assert that llm_handler.initialize was invoked with device="cuda:2"; this will exercise the mapped_lm_device branch in startup_llm_init.py (the lines handling mapped_lm_device at ~75-76) and verify the new behavior instead of always falling back to the provided device argument.acestep/core/generation/device_mapping.py (1)
64-78: Document the raised exception in the docstring.Per project guidelines, public-function docstrings should mention raised exceptions.
validate_component_device_mapraisesValueErrorbut the docstring does not.♻️ Proposed docstring update
def validate_component_device_map(mapping: ComponentDeviceMap) -> None: - """Validate that mapped CUDA indices exist on the current host.""" + """Validate that mapped CUDA indices exist on the current host. + + Args: + mapping: Component-to-device mapping to validate. + + Raises: + ValueError: If any ``cuda:<idx>`` entry references an index that + is out of range for the visible CUDA device count. + """As per coding guidelines: "Docstrings must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/core/generation/device_mapping.py` around lines 64 - 78, The docstring for validate_component_device_map is missing documentation of the exception it can raise; update the function's docstring to include a "Raises" section (or sentence) that states it raises ValueError when a provided device string maps to a non-existent CUDA index (e.g., index out of range or invalid format) so callers know to handle invalid device mappings.acestep/api/startup_model_init.py (1)
43-54: Optional: migrate startup output tologuru.The new lines (76-81) keep using
print()to match the surrounding code in this file, which is fine for this PR. Just noting that the project guideline is to useloguru.loggerfor all non-CLI output; converting this whole startup helper in a follow-up would bring it in line.As per coding guidelines: "Use
logurulogger for logging instead ofprint(), except for CLI output".Also applies to: 56-68, 76-81, 90-90, 103-103, 108-108, 122-122, 125-125, 133-133, 147-147, 149-149, 151-151, 160-160, 174-174, 176-176, 178-178
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/api/startup_model_init.py` around lines 43 - 54, The startup code prints GPU configuration using multiple print() calls (e.g., printing gpu_memory_gb and fields on gpu_config); replace these print() calls with loguru's logger (e.g., logger.info) and format the same multi-line block as a single logged message or several logger.info calls, and add the required "from loguru import logger" import at the top; update every other noted print() occurrence in this module (the blocks that reference gpu_memory_gb, gpu_config, and other startup variables) to use logger instead of print so all non-CLI startup output follows the project's logging guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/api/startup_model_init.py`:
- Around line 76-81: The startup log prints a resolved component_device_map that
can differ from final placements because
initialize_service_orchestrator.initialize_service re-resolves the map per call;
either make the map resolution deterministic by resolving component_device_map
once and thread that same object through
initialize_service_orchestrator.initialize_service (so handler placements use
the same map as the log), or change the print message to clarify it is only the
initial/allocation hint (e.g., "initial component GPU hint") so it won't mislead
— update the use of component_device_map, the print call, and any callers to
pass the single resolved map through to
initialize_service_orchestrator.initialize_service or alter the log text
accordingly.
In `@acestep/core/generation/device_mapping.py`:
- Around line 30-42: The broad `except Exception` handlers in the loop that
probes VRAM (the block calling torch.cuda.mem_get_info in device_mapping.py that
populates free_by_device) should be narrowed to catch RuntimeError only, and any
caught RuntimeError must be logged via loguru.logger (e.g., logger.exception or
logger.error with the exception) so failures aren’t silently treated as zero
free bytes; add/import loguru.logger if missing and replace both `except
Exception` blocks with `except RuntimeError as e:` and a logger call before
falling back to free_bytes = 0, keeping the existing logic that appends (idx,
free_bytes) and sorts afterwards.
In `@acestep/core/generation/handler/init_service_orchestrator.py`:
- Around line 77-89: The orchestrator re-calls resolve_component_device_map()
inside initialize_service, which discards the caller-provided device and causes
inconsistent placement; update initialize_service (and its use of
_resolve_initialize_device, dit_device, vae_device, and component_device_map) to
accept an optional ComponentDeviceMap parameter and short-circuit re-resolution
when a map is provided (i.e., use the passed-in component_device_map to compute
dit_device/vae_device and only call resolve_component_device_map() if none was
passed), and modify callers to pass their already-resolved component_device_map
so the device argument is honored and mem_get_info (GPU ranking) is not
performed twice.
---
Nitpick comments:
In `@acestep/api/startup_llm_init_test.py`:
- Line 36: Add a third test case in startup_llm_init_test.py that passes
component_device_map=ComponentDeviceMap(lm="cuda:2") and ensures
ACESTEP_LM_DEVICE is unset, then call the code path that triggers
startup_llm_init.initialize (or the test harness that calls
llm_handler.initialize) and assert that llm_handler.initialize was invoked with
device="cuda:2"; this will exercise the mapped_lm_device branch in
startup_llm_init.py (the lines handling mapped_lm_device at ~75-76) and verify
the new behavior instead of always falling back to the provided device argument.
In `@acestep/api/startup_llm_init.py`:
- Around line 75-76: The code defensively checks component_device_map's
truthiness even though it's a required ComponentDeviceMap (always truthy);
remove the redundant branch and read the .lm attribute directly by setting
mapped_lm_device = component_device_map.lm, then keep the existing lm_device
assignment using os.getenv("ACESTEP_LM_DEVICE", mapped_lm_device or device) so
the explicit fallback to device remains; update references in
startup_llm_init.py (mapped_lm_device, lm_device, ACESTEP_LM_DEVICE, device)
accordingly.
In `@acestep/api/startup_model_init.py`:
- Around line 43-54: The startup code prints GPU configuration using multiple
print() calls (e.g., printing gpu_memory_gb and fields on gpu_config); replace
these print() calls with loguru's logger (e.g., logger.info) and format the same
multi-line block as a single logged message or several logger.info calls, and
add the required "from loguru import logger" import at the top; update every
other noted print() occurrence in this module (the blocks that reference
gpu_memory_gb, gpu_config, and other startup variables) to use logger instead of
print so all non-CLI startup output follows the project's logging guideline.
In `@acestep/core/generation/device_mapping_test.py`:
- Around line 18-45: Add a test that covers the 2-GPU branch of
resolve_component_device_map by mocking torch.cuda.is_available to True,
torch.cuda.device_count to 2, and torch.cuda.mem_get_info to return two device
tuples; call resolve_component_device_map() and assert that mapping.dit ==
"cuda:0" and mapping.vae == mapping.lm == "cuda:1" (comparing against a
ComponentDeviceMap() instance where appropriate) so the contract where VAE and
LM share the second GPU is locked in.
- Around line 33-40: The test
test_resolve_component_device_map_single_gpu_collapses_to_cuda_zero currently
omits patching torch.cuda.mem_get_info so it accidentally exercises the
exception fallback in _rank_cuda_devices_by_free_vram; patch
torch.cuda.mem_get_info in that test to return a valid (free_bytes, total_bytes)
tuple with non-zero free memory (so the happy path runs) and keep the existing
patches for torch.cuda.is_available and device_count; ensure the mock target
name matches "torch.cuda.mem_get_info" and that the mocked return value produces
a ranked list [0] so resolve_component_device_map still maps dit/vae/lm to
"cuda:0".
In `@acestep/core/generation/device_mapping.py`:
- Around line 64-78: The docstring for validate_component_device_map is missing
documentation of the exception it can raise; update the function's docstring to
include a "Raises" section (or sentence) that states it raises ValueError when a
provided device string maps to a non-existent CUDA index (e.g., index out of
range or invalid format) so callers know to handle invalid device mappings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 869649b8-22e7-4d0e-87e1-d339dc8576a5
📒 Files selected for processing (6)
acestep/api/startup_llm_init.pyacestep/api/startup_llm_init_test.pyacestep/api/startup_model_init.pyacestep/core/generation/device_mapping.pyacestep/core/generation/device_mapping_test.pyacestep/core/generation/handler/init_service_orchestrator.py
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
acestep/api/startup_llm_init.py (1)
8-14: 🛠️ Refactor suggestion | 🟠 MajorDuplicate import of
get_recommended_lm_model/is_lm_model_supported.Line 9 imports
get_recommended_lm_model, is_lm_model_supportedfromacestep.gpu_config, and the very next statement (lines 10–14) re-imports the same two symbols (plusresolve_lm_backend) from the same module. The single grouped import is sufficient.♻️ Proposed cleanup
from acestep.core.generation.device_mapping import ComponentDeviceMap -from acestep.gpu_config import get_recommended_lm_model, is_lm_model_supported from acestep.gpu_config import ( get_recommended_lm_model, is_lm_model_supported, resolve_lm_backend, )As per coding guidelines: "Imports: Group by type (stdlib, third-party, local), sort alphabetically within groups."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/api/startup_llm_init.py` around lines 8 - 14, The file currently imports get_recommended_lm_model and is_lm_model_supported twice; remove the first duplicate line importing get_recommended_lm_model, is_lm_model_supported and keep the grouped import that also includes resolve_lm_backend, ensuring imports are grouped/sorted alphabetically (keep ComponentDeviceMap import separate if it’s a different group). Update the top imports so only one from acestep.gpu_config exports get_recommended_lm_model, is_lm_model_supported, resolve_lm_backend is present.
🧹 Nitpick comments (2)
acestep/api/startup_model_init.py (1)
55-56:validate_component_device_mappropagatesValueErrorand will hard-crash eager startup.
resolve_component_device_map()only emits indices it just enumerated, so in practice the validator should never fail — but aValueErrorraised here propagates out ofdo_model_initializationand aborts the API server startup (whenACESTEP_NO_INIT=false) or the first request (lazy path). Given this is purely a defensive check on an internally-produced mapping, consider either:
- catching
ValueErrorand falling back to an emptyComponentDeviceMapwith alogger.warning(...), or- moving the validation inside
resolve_component_device_mapso callers don't need a try/except contract here.Not a blocker since the failure mode is currently unreachable, but worth hardening before user-supplied mappings (e.g., env-driven) get added.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/api/startup_model_init.py` around lines 55 - 56, The call to validate_component_device_map can raise ValueError and crash startup; change do_model_initialization to catch ValueError from validate_component_device_map(component_device_map) and on exception log a warning via the module logger and set component_device_map to an empty ComponentDeviceMap (or equivalent default) so startup continues; alternatively, move the validation logic into resolve_component_device_map so callers like do_model_initialization no longer need to handle exceptions—update the code paths referencing resolve_component_device_map and validate_component_device_map accordingly.acestep/api/startup_llm_init.py (1)
23-23: Type annotation says non-Optional, but downstream defensiveif component_device_map else None(line 80) treats it as optional — pick one contract.
component_device_map: ComponentDeviceMapdeclares a required, non-None argument, yet line 80 guards againstNone. Either:
- make the type
Optional[ComponentDeviceMap] = Noneto match the runtime check (and update callers that may passNone), or- drop the
if component_device_map else Noneguard at line 80 and rely on the type.Given the orchestrator always resolves and passes a
ComponentDeviceMap(seestartup_model_init.py:55-56,168), preferring the non-Optional contract and dropping the guard is cleaner.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/api/startup_llm_init.py` at line 23, The parameter component_device_map is declared non-Optional but is being treated as optional downstream; follow the preferred non-Optional contract: keep component_device_map: ComponentDeviceMap in the function signature and remove the defensive conditional that does "if component_device_map else None" (replace uses with component_device_map directly). Ensure any variable passed into places that previously accepted None now receive component_device_map, and remove any typing or branches that handled a None case for component_device_map.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/api/startup_llm_init.py`:
- Around line 77-83: The new validation and mapped-device logic for
lm_backend/mapped_lm_device (variables lm_backend, mapped_lm_device, lm_device
and component_device_map) is being clobbered by the subsequent calls to
resolve_lm_backend(...) and reassigning lm_device; remove the redundant
overwrites and reconcile with the centralized resolver: call
resolve_lm_backend(given_env, gpu_config) to set lm_backend (preserving its
capability fallbacks) and only set lm_device using component_device_map.lm when
ACESTEP_LM_DEVICE is unset (fall back to device), ensuring you do not reassign
lm_backend or lm_device after these decisions.
In `@acestep/api/startup_model_init.py`:
- Around line 55-63: The log prints inconsistent fallbacks: it uses
service_device (component_device_map.dit or device) for DiT/VAE but uses device
for LM, making the message diverge from how devices are actually resolved;
change the print to use the same fallback pattern
(component_device_map.<component> or device) for DiT, VAE, and LM so the log
matches initialize_llm_at_startup and the orchestrator's behavior, and consider
renaming the message to indicate it's an "initial component GPU hint" rather
than a definitive mapping; update the print statement that references
component_device_map, service_device, and lm_device accordingly.
---
Outside diff comments:
In `@acestep/api/startup_llm_init.py`:
- Around line 8-14: The file currently imports get_recommended_lm_model and
is_lm_model_supported twice; remove the first duplicate line importing
get_recommended_lm_model, is_lm_model_supported and keep the grouped import that
also includes resolve_lm_backend, ensuring imports are grouped/sorted
alphabetically (keep ComponentDeviceMap import separate if it’s a different
group). Update the top imports so only one from acestep.gpu_config exports
get_recommended_lm_model, is_lm_model_supported, resolve_lm_backend is present.
---
Nitpick comments:
In `@acestep/api/startup_llm_init.py`:
- Line 23: The parameter component_device_map is declared non-Optional but is
being treated as optional downstream; follow the preferred non-Optional
contract: keep component_device_map: ComponentDeviceMap in the function
signature and remove the defensive conditional that does "if
component_device_map else None" (replace uses with component_device_map
directly). Ensure any variable passed into places that previously accepted None
now receive component_device_map, and remove any typing or branches that handled
a None case for component_device_map.
In `@acestep/api/startup_model_init.py`:
- Around line 55-56: The call to validate_component_device_map can raise
ValueError and crash startup; change do_model_initialization to catch ValueError
from validate_component_device_map(component_device_map) and on exception log a
warning via the module logger and set component_device_map to an empty
ComponentDeviceMap (or equivalent default) so startup continues; alternatively,
move the validation logic into resolve_component_device_map so callers like
do_model_initialization no longer need to handle exceptions—update the code
paths referencing resolve_component_device_map and validate_component_device_map
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 411647bb-f746-40a7-a5ab-df1dd6dff476
📒 Files selected for processing (4)
acestep/api/startup_llm_init.pyacestep/api/startup_llm_init_test.pyacestep/api/startup_model_init.pyacestep/core/generation/handler/init_service_orchestrator.py
🚧 Files skipped from review as they are similar to previous changes (2)
- acestep/api/startup_llm_init_test.py
- acestep/core/generation/handler/init_service_orchestrator.py
There was a problem hiding this comment.
🧹 Nitpick comments (2)
acestep/core/generation/device_mapping.py (2)
10-13: Narrow the bareexcepton optional torch import.Ruff is flagging
BLE001. The intent here is a fallback for environments withouttorch, so catchingImportErroris more precise and aligns with coding guidelines (avoid bareexcept:/Exception).♻️ Proposed refactor
try: import torch # type: ignore[reportMissingImports] -except Exception: # pragma: no cover - fallback for minimal environments +except ImportError: # pragma: no cover - fallback for minimal environments torch = None # type: ignore[assignment]As per coding guidelines: "Avoid bare
except:clauses; catch specific exceptions."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/core/generation/device_mapping.py` around lines 10 - 13, The bare "except Exception" in the torch import fallback should be narrowed to catch ImportError only; update the import block that sets "torch = None" (the optional torch import) to use "except ImportError" so we only handle the missing-import case and avoid masking other exceptions.
86-98: Brittle index parsing invalidate_component_device_map.
int(device.split(":", 1)[1])will raise an unhelpfulValueError: invalid literal for int()for malformed values like"cuda:abc"or"cuda: 0". Consider catching the parse failure and re-raising the sameValueErrorstyle used a few lines below so the user gets a consistent, contextual error.♻️ Proposed refactor
if not device or not device.startswith("cuda:"): continue - idx = int(device.split(":", 1)[1]) + try: + idx = int(device.split(":", 1)[1]) + except ValueError as exc: + raise ValueError( + f"Invalid {component} device '{device}': expected 'cuda:<int>'." + ) from exc if idx < 0 or idx >= cuda_count:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/core/generation/device_mapping.py` around lines 86 - 98, In validate_component_device_map, parsing the CUDA index via int(device.split(":", 1)[1]) is brittle and can raise raw ValueError for malformed strings; wrap the parsing of device (the part after "cuda:") in a try/except (catch ValueError and possibly TypeError), and on parse failure raise the same contextual ValueError used below that mentions the component, the original device string, and the available cuda_count so users get a consistent, helpful error message (use the existing symbols component, device, cuda_count and keep the same error format).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@acestep/core/generation/device_mapping.py`:
- Around line 10-13: The bare "except Exception" in the torch import fallback
should be narrowed to catch ImportError only; update the import block that sets
"torch = None" (the optional torch import) to use "except ImportError" so we
only handle the missing-import case and avoid masking other exceptions.
- Around line 86-98: In validate_component_device_map, parsing the CUDA index
via int(device.split(":", 1)[1]) is brittle and can raise raw ValueError for
malformed strings; wrap the parsing of device (the part after "cuda:") in a
try/except (catch ValueError and possibly TypeError), and on parse failure raise
the same contextual ValueError used below that mentions the component, the
original device string, and the available cuda_count so users get a consistent,
helpful error message (use the existing symbols component, device, cuda_count
and keep the same error format).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f2b2f91e-b5b4-4f76-8f4b-5e110d8fbe87
📒 Files selected for processing (6)
acestep/api/startup_llm_init.pyacestep/api/startup_llm_init_test.pyacestep/api/startup_model_init.pyacestep/core/generation/device_mapping.pyacestep/core/generation/device_mapping_test.pyacestep/core/generation/handler/init_service_orchestrator.py
🚧 Files skipped from review as they are similar to previous changes (3)
- acestep/api/startup_llm_init_test.py
- acestep/api/startup_model_init.py
- acestep/core/generation/handler/init_service_orchestrator.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
acestep/ui/gradio/events/generation/service_init.py (1)
18-20: Cross‑layer import:events/shouldn't depend oninterfaces/.
build_component_gpu_hint_textis a pure string formatter aroundresolve_component_device_map()— it has no Gradio UI construction in it. Pulling it fromacestep/ui/gradio/interfaces/generation_service_config_rows.pyinto an event handler reverses the usual dependency direction (events → interfaces) and risks future circular imports if the interfaces module ever needs to import fromevents/.Consider relocating the helper to a neutral location (e.g. alongside
resolve_component_device_mapinacestep/core/generation/device_mapping.py, or a smallui/gradio/helpers/module), and have bothgeneration_service_config_rows.pyandservice_init.pyimport it from there.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/ui/gradio/events/generation/service_init.py` around lines 18 - 20, The event module service_init.py currently imports build_component_gpu_hint_text from interfaces/generation_service_config_rows.py, creating a cross-layer dependency; move the helper build_component_gpu_hint_text to a neutral module (e.g., colocate with resolve_component_device_map in acestep/core/generation/device_mapping.py or create a small acestep/ui/gradio/helpers module), update generation_service_config_rows.py and service_init.py to import build_component_gpu_hint_text from the new neutral location, and ensure resolve_component_device_map remains referenced from its original module so no interfaces → events import occurs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@acestep/ui/gradio/interfaces/generation_service_config_rows.py`:
- Around line 21-29: build_component_gpu_hint_text currently always returns a
hardcoded English hint even when resolve_component_device_map() yields no
resolved components (noise on non‑CUDA/single‑GPU systems) and bypasses i18n;
update build_component_gpu_hint_text to (1) detect an empty ComponentDeviceMap
or the case where dit, vae and lm all resolve to the same device (including
matching the default_device) and return an empty string in those cases, and (2)
localize the leading label by calling t("service.gpu_auto_tier") (or the
appropriate key) instead of the literal "Component GPU hint: "; also ensure
callers build_gpu_info_and_tier and on_tier_change skip the " \n" prefix when
the helper returns an empty string so no extra whitespace is rendered.
---
Nitpick comments:
In `@acestep/ui/gradio/events/generation/service_init.py`:
- Around line 18-20: The event module service_init.py currently imports
build_component_gpu_hint_text from interfaces/generation_service_config_rows.py,
creating a cross-layer dependency; move the helper build_component_gpu_hint_text
to a neutral module (e.g., colocate with resolve_component_device_map in
acestep/core/generation/device_mapping.py or create a small
acestep/ui/gradio/helpers module), update generation_service_config_rows.py and
service_init.py to import build_component_gpu_hint_text from the new neutral
location, and ensure resolve_component_device_map remains referenced from its
original module so no interfaces → events import occurs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ed738ce9-3f49-4137-a002-1dc25b49f4b3
📒 Files selected for processing (2)
acestep/ui/gradio/events/generation/service_init.pyacestep/ui/gradio/interfaces/generation_service_config_rows.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
acestep/ui/gradio/interfaces/generation_service_config_rows.py (1)
1-245: 🏗️ Heavy liftSplit this row-builder module before it grows further.
generation_service_config_rows.pyis already over the repo's 200-line hard cap, and the new GPU hint logic adds more responsibility to a file that already builds language, GPU, checkpoint, model, and LM controls. Extracting the GPU-info/tier block into a smaller helper module would keep the boundary clearer and bring the file back under policy.As per coding guidelines: Module LOC policy is met (<=150 target, <=200 hard cap or justified exception).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@acestep/ui/gradio/interfaces/generation_service_config_rows.py` around lines 1 - 245, The file exceeds the repo LOC cap and should be split by extracting the GPU info/tier UI into its own helper module: move the build_gpu_info_and_tier function (and its direct dependencies: the calls to format_component_gpu_hint_text, get_gpu_device_name, and GPU_TIER_LABELS lookup plus any imports used only by that function) into a new module (e.g., generation_service_gpu_rows) and update imports in generation_service_config_rows.py to import build_gpu_info_and_tier from the new module; ensure the new module exports the same function signature and that remaining imports (t, GPU_TIER_LABELS, format_component_gpu_hint_text, get_gpu_device_name) are adjusted there so the original callers and tests continue to work.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@acestep/ui/gradio/interfaces/generation_service_config_rows.py`:
- Around line 1-245: The file exceeds the repo LOC cap and should be split by
extracting the GPU info/tier UI into its own helper module: move the
build_gpu_info_and_tier function (and its direct dependencies: the calls to
format_component_gpu_hint_text, get_gpu_device_name, and GPU_TIER_LABELS lookup
plus any imports used only by that function) into a new module (e.g.,
generation_service_gpu_rows) and update imports in
generation_service_config_rows.py to import build_gpu_info_and_tier from the new
module; ensure the new module exports the same function signature and that
remaining imports (t, GPU_TIER_LABELS, format_component_gpu_hint_text,
get_gpu_device_name) are adjusted there so the original callers and tests
continue to work.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c2fdc42f-8755-43bd-8695-6daea536c993
📒 Files selected for processing (6)
acestep/core/generation/device_mapping.pyacestep/core/generation/device_mapping_test.pyacestep/ui/gradio/events/generation/service_init.pyacestep/ui/gradio/i18n/en.jsonacestep/ui/gradio/i18n/zh.jsonacestep/ui/gradio/interfaces/generation_service_config_rows.py
✅ Files skipped from review due to trivial changes (2)
- acestep/ui/gradio/i18n/en.json
- acestep/ui/gradio/i18n/zh.json
🚧 Files skipped from review as they are similar to previous changes (2)
- acestep/ui/gradio/events/generation/service_init.py
- acestep/core/generation/device_mapping_test.py
|
So, I tried using this by creating a branch off the current code in the main branch and merging yours in. No issues getting it to start up, but I didn't see it move anything to the second GPU. For the record, my setup is: ThinkStation p720 with 32GB RAM on Windows 10 The generated hint was: DiT=cuda:0, VAE=cuda:1, LM=cuda:1 Suggesting the DiT music model would remain on the 4060ti, and the others would move onto 3060. However, only the 4060 ti was used, and the VAE and LM were offloaded to the CPU. Do I need to override the setup based on my total memory to make this work? |
|
according to the code, all the available gpus should be shown and ranked, based on the ranking the module should be offloaded. Thanks for the feedback, will update the code. |
I believe I also had "offload to CPU" on, which is checked by default in this configuration. I did turn off dit to CPU, though. |
|
So, I gave editing the code a try myself, and wanted to explain some of the things that make this more difficult than it might seem:
|
Multi GPU support for component offloading.
issue link: #426
Earlier: GPU 0 used to perform dit, vae and lm tasks.
Now:
expected result GPU i will take respective component.
GPU allocation is done by ranking at runtime, where one gpu has one component to execute.
Summary by CodeRabbit
New Features
UI
Tests