llm-server

Docker-based LLM serving with llama.cpp. Auto-downloads models from Hugging Face and serves them via an OpenAI-compatible API.

Quickstart

Add models to models.toml:

[[models]]
name = "my-model"
repo = "org/model-GGUF"
file = "model-q4_k_m.gguf"

Download models and start the server:

just sync    # download missing models
just up      # start server

The server runs at http://localhost:8080.

Commands

Command	Description
`just up`	Start server
`just down`	Stop server
`just restart`	Restart server
`just sync`	Download missing models from Hugging Face
`just logs`	Follow container logs

Config

models.toml is the single source of truth. Define global settings under [global] and one [[models]] entry per model:

[global]
n-gpu-layers = -1
flash-attn = true

[[models]]
name = "gemma4-v2"
repo = "org/repo"
file = "model.gguf"

Missing models are downloaded automatically when the container starts, or on demand with `just sync`. At startup the server reads a /config.ini generated from models.toml, so you don't need to write a config file by hand.

API

The server exposes an OpenAI-compatible API at http://localhost:8080:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma4-v2", "messages": [{"role": "user", "content": "hello"}]}'
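For scripted use, you can make the same request with Python's standard library. This is a minimal sketch. It assumes the default http://localhost:8080 address and the standard OpenAI chat response shape (`choices[0].message.content`). The helper names are illustrative and not part of this repo.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default address from this README


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible chat completions endpoint."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    # OpenAI-style responses put the assistant message under choices[0].message.content.
    return response["choices"][0]["message"]["content"]


def chat(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `print(chat("gemma4-v2", "hello"))` sends the same request as the curl command above. Building the request separately from sending it keeps the payload easy to inspect.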

License

MIT
