aiproxy

A model-aware, OpenAI-compatible reverse proxy for pools of local/remote LLM backends (llama.cpp, vLLM, OpenAI, Anthropic, Google, Ollama).

WARNING! This is still a work-in-progress. My main use-case is providing a single OpenAI API-compatible service from which I can reach multiple llama-server (llama.cpp) instances in my AI lab. It is working really well for this use-case so far. However I have not yet thoroughly tested the other backend LLM services. If you encounter any problems, please open an issue or PR.

Supported backend `schema` values

Schema	Description
`llamacpp`	llama.cpp servers (OpenAI-compatible wire format, `/health` probe). Also works for vLLM, which exposes an OpenAI-compatible server with the same `/health` probe.
`openai`	OpenAI API and OpenAI-compatible providers
`anthropic`	Anthropic Messages API (`/v1/messages`)
`google`	Google Gemini API
`ollama`	Native Ollama API (`/api/chat`, `/api/embed`, `/api/tags`) with NDJSON streaming. Supports `chat_completions` and `embeddings`. `rerank` is unsupported. Ollama also exposes an OpenAI-compatible `/v1/*` shim, for which you can use the `openai` schema.

Supported Endpoints

Endpoint	API path
chat_completions	`/v1/chat/completions`
embeddings	`/v1/embeddings`
rerank	`/v1/rerank`
infill	`/infill`

infill is a byte-for-byte pass-through to llama.cpp's native /infill endpoint and is only valid with schema: llamacpp. Both the request body and the response body (including SSE stream frames) are forwarded unchanged so the wire shape matches llama.cpp exactly. Unlike the other routes there is no /v1/ alias, no canonical OpenAI envelope, and no data: [DONE] sentinel on the stream.

Example `config.yaml`

server:
  host: '0.0.0.0'
  port: 8080
  api_key: '012345678901234567890123456789'

  # TLS/HTTPS Configuration (optional)
  # Uncomment and configure to enable HTTPS
  tls:
    enabled: false  # Set to true to enable HTTPS
    # cert_file: '/path/to/server-cert.pem'     # Server certificate file
    # key_file: '/path/to/server-key.pem'       # Server private key file

log:
  level: 'info'   # debug, info, warn, error
  format: 'json'  # json, console

pools:
  - model: 'qwen3.5-122b'
    endpoint: 'chat_completions' # chat_completions, embeddings, rerank, infill (infill requires schema: llamacpp)
    schema: 'llamacpp' # anthropic, google, openai, llamacpp, ollama
    instances:
      - url: 'http://192.0.2.11:8080'
        api_key: 'abcdefabcdefabcdefabcdef'
        # tls:                                  # May be necessary if URL is https:
        #   ca_file: '/path/to/backend-ca.pem'  # Custom CA certificate for backend validation (use OS-installed CA certs if not specified)
        #   insecure_skip_verify: false         # Set to true to skip certificate verification (not recommended for production)
      - url: 'http://192.0.2.12:8080'
        api_key: 'abcdefabcdefabcdefabcdef'
        tls:
          ca_file: '/path/to/backend-ca.pem'
          insecure_skip_verify: false
    session_timeout: 300       # seconds
    health_check_interval: 30  # seconds

  - model: 'nemotron-embed-1b-v2'
    endpoint: 'embeddings' # chat_completions, embeddings, rerank, infill (infill requires schema: llamacpp)
    schema: 'llamacpp' # anthropic, google, openai, llamacpp, ollama
    instances:
      - url: 'http://192.0.2.13:8080'
        api_key: 'fedcbafedcbafedcbafedcba'
    session_timeout: 300       # seconds
    health_check_interval: 30  # seconds

Example `docker-compose.yaml`

services:
  aiproxy:
    image: robcowart/aiproxy:latest
    container_name: aiproxy
    restart: unless-stopped
    volumes:
      - ./config.yaml:/etc/aiproxy/config.yaml:ro
    ports:
      - 8080:8080/tcp

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
cmd/aiproxy		cmd/aiproxy
pkg		pkg
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
config.yaml		config.yaml
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

aiproxy

Supported backend `schema` values

Supported Endpoints

Example `config.yaml`

Example `docker-compose.yaml`

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

aiproxy

Supported backend schema values

Supported Endpoints

Example config.yaml

Example docker-compose.yaml

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Supported backend `schema` values

Example `config.yaml`

Example `docker-compose.yaml`

Packages