Prior to commit d881f2a, I was able to load the q2-q4 imatrix model both with and without MTP on a Framework Desktop:
nsm@neuro ~/src/ds4 $ ./ds4-server --mtp /home/nsm/src/ds4/gguf/DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf --mtp-draft 2 --ctx 100000 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 8192 --host 0.0.0.0
ds4: MTP support model loaded: /home/nsm/src/ds4/gguf/DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf (draft=2)
ds4: ROCm backend initialized on Radeon 8060S Graphics (sm_115)
ds4: ROCm chunk-copying 90.89 GiB model image
ds4: ROCm loading model tensors into device cache: 90.06 GiB
ds4: ROCm model chunk copy complete in 18.247s (90.88 GiB tensors)
ds4: ROCm chunk-copying 3.55 GiB model image
ds4: ROCm loading model tensors into device cache: 2.06 GiB
ds4: ROCm model chunk copy complete in 0.701s (3.55 GiB tensors)
ds4: ROCm preparing model tensor mappings: 90.36 GiB
ds4: ROCm startup model preparation covered 90.88 GiB of tensor spans in 0.787s
ds4: ROCm preparing model tensor mappings: 2.38 GiB
ds4: ROCm startup model preparation covered 3.55 GiB of tensor spans in 0.018s
ds4: rocm backend initialized for graph diagnostics
0614 14:21:27 ds4-server: context buffers 3567.06 MiB (ctx=100000, backend=rocm, prefill_chunk=8192, raw_kv_rows=8192, compressed_kv_rows=25002)
0614 14:21:27 ds4-server: KV disk cache /tmp/ds4-kv (budget=8192 MiB, cross-quant=accept, min=512, cold_max=30000, continued=10000, trim=32, align=2048, hit_half_life=21600s)
0614 14:21:27 ds4-server: listening on http://0.0.0.0:8000
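For reference, here is a rough tally of the sizes reported in the working log above. This is only a sketch of the arithmetic: it adds up the figures the log prints (main model tensors, MTP model tensors, context buffers) and assumes nothing else is counted, so the real device footprint is probably somewhat higher. The variable and function names are mine, not anything from ds4.

```python
# Sizes copied from the working (pre-d881f2a) log output.
MIB_PER_GIB = 1024.0

main_model_gib = 90.88        # "chunk copy complete ... (90.88 GiB tensors)"
mtp_model_gib = 3.55          # "chunk copy complete ... (3.55 GiB tensors)"
context_buffers_mib = 3567.06  # "context buffers 3567.06 MiB (ctx=100000, ...)"


def total_gib(include_mtp: bool) -> float:
    """Sum the logged allocations in GiB; an estimate, not a measurement."""
    total = main_model_gib + context_buffers_mib / MIB_PER_GIB
    if include_mtp:
        total += mtp_model_gib
    return total


if __name__ == "__main__":
    print(f"with MTP:    {total_gib(True):.2f} GiB")
    print(f"without MTP: {total_gib(False):.2f} GiB")
```

By this estimate the configuration that used to load already needed close to 98 GiB with MTP, so even a modest increase in allocations after the commit could be enough to tip it into out-of-memory.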
After that commit, I can't load the q2-q4 model with or without MTP; both attempts fail with out-of-memory errors:
nsm@neuro ~/src/ds4 $ ./ds4-server --mtp /home/nsm/src/ds4/gguf/DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf --mtp-draft 2 --ctx 100000 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 8192 --host 0.0.0.0
ds4: MTP support model loaded: /home/nsm/src/ds4/gguf/DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf (draft=2)
ds4: ROCm backend initialized on Radeon 8060S Graphics (sm_115)
ds4: ROCm preparing model tensor mappings: 60.76 GiB
ds4: ROCm model range alloc failed for tensor-span:108 (528.00 MiB): out of memory
ds4: accelerator failed to prepare model tensor span 108 at offset 65244457792
ds4: rocm failed to prepare optional model cache
nsm@neuro ~/src/ds4 $ ./ds4-server --ctx 100000 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 8192 --host 0.0.0.0
ds4: ROCm backend initialized on Radeon 8060S Graphics (sm_115)
ds4: ROCm preparing model tensor mappings: 90.36 GiB
ds4: ROCm startup model preparation covered 90.88 GiB of tensor spans in 27.024s
ds4: rocm backend initialized for graph diagnostics
0614 14:16:14 ds4-server: context buffers 3567.06 MiB (ctx=100000, backend=rocm, prefill_chunk=8192, raw_kv_rows=8192, compressed_kv_rows=25002)
ds4: ROCm tensor alloc failed: out of memory
ds4: ROCm tensor fill f32 launch failed: out of memory
ds4: ROCm tensor alloc failed: out of memory
[previous line repeated 205 more times]
0614 14:16:15 ds4-server: failed to create rocm session