A lightweight text embedding API designed as a drop-in replacement for the OpenAI embeddings endpoint.
Built with FastAPI and fastembed, LocalEmbed is optimized for running local document processing and vector pipelines securely on your own infrastructure.
- OpenAI SDK Compatible: Natively mirrors the `/v1/embeddings` schema. Point your existing OpenAI client to `localhost` and it just works.
- Privacy First: 100% local execution. No data ever leaves your network.
- Zero-Latency Starts: Automatically pre-loads your default model into memory on server boot.
- Container-Native: Multi-stage Docker build utilizing `uv` for a minimal, highly optimized runtime footprint.
- CPU + GPU Ready: Published Docker images for both CPU (`latest`) and NVIDIA GPU (`latest-gpu`) deployments.
- Docker (Recommended)
- For GPU deployment: NVIDIA GPU + drivers + NVIDIA Container Toolkit
- Python 3.12+ (for local development)
LocalEmbed uses optional environment variables for configuration. Create a .env file in the root directory:
- Copy the sample environment file from here:

  ```bash
  cp .env.sample .env
  ```

- Open the `.env` file and set your desired configurations (like `DEFAULT_EMBEDDING_MODEL` or `HF_TOKEN`).
Environment variables:
- `DEFAULT_EMBEDDING_MODEL`: model to preload on startup
- `HF_TOKEN`: optional, useful to avoid model download rate limits
- `MODEL_CACHE_LIMIT`: max number of models kept in memory (LRU eviction)
- `EMBEDDING_THREADS`: CPU threads for embedding computation
- `BATCH_SIZE`: number of inputs processed per batch
- `USE_GPU`: set `true` to force the CUDA provider in local/non-GPU-image runs
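For illustration, a `.env` setting a few of these might look like this (the values shown are examples, not project defaults):

```
DEFAULT_EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
MODEL_CACHE_LIMIT=2
EMBEDDING_THREADS=4
BATCH_SIZE=32
```

Any variable you omit falls back to the application's built-in default.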
The easiest and recommended way to run LocalEmbed is using the pre-built Docker image from Docker Hub.
Run the CPU image:

```bash
docker run -d --pull=always --name localembed --env-file .env -p 8000:8000 heshinth/localembed:latest
```

Or the GPU image:

```bash
docker run -d --pull=always --gpus all --name localembed-gpu --env-file .env -p 8000:8000 heshinth/localembed:latest-gpu
```

The compose file includes environment variables directly within it.
Download the docker-compose.yml file from here
You can edit the file to configure it, then simply run:
```bash
docker compose up -d
```

For GPU, download the docker-compose.gpu.yml file from here, then run:

```bash
docker compose -f docker-compose.gpu.yml up -d
```

For a release tag like v0.1.3, published image tags are:

- CPU: `latest`, `0.1.3`, `0.1`
- GPU: `latest-gpu`, `0.1.3-gpu`, `0.1-gpu`
The API will be available at: http://localhost:8000/v1.
If you want to run the application natively without Docker:
- Install dependencies for CPU mode:

  ```bash
  uv sync --extra cpu
  ```

- Run the FastAPI development server:

  ```bash
  fastapi dev app/main.py
  ```
For local GPU mode:
- Install dependencies for GPU mode:

  ```bash
  uv sync --extra gpu
  ```

- Start with the GPU provider enabled:

  ```bash
  USE_GPU=true fastapi dev app/main.py
  ```
- `GET /v1/health` — Health check
- `POST /v1/embeddings` — Generate text embeddings using local models (OpenAI API compatible)
- `GET /v1/models` — List supported and ready-to-use embedding models
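For reference, the request and response bodies for `POST /v1/embeddings` follow the OpenAI embeddings schema. A sketch of the shapes involved (the field values below are illustrative, and the optional fields your instance returns may differ):

```python
import json

# Request body for POST /v1/embeddings (OpenAI-compatible schema)
request_body = {
    "model": "BAAI/bge-small-en-v1.5",
    "input": ["Hello, world!", "A second document"],
}

# The response mirrors OpenAI's shape: an "object" of type "list" whose
# "data" entries each carry an index and an embedding vector, one per input.
example_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02]},
        {"object": "embedding", "index": 1, "embedding": [0.03, 0.04]},
    ],
    "model": "BAAI/bge-small-en-v1.5",
}

print(json.dumps(request_body))
print(len(example_response["data"]))  # one data entry per input string
```

Because the schema matches, any tooling that already speaks the OpenAI embeddings format can be pointed at this endpoint unchanged.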
LocalEmbed supports all dense text embedding models provided by fastembed.
You can view the full list of supported models in the FastEmbed Documentation, or programmatically query your running instance via the API:
```
GET http://localhost:8000/v1/models
```

Since the `/v1/embeddings` endpoint is OpenAI API compatible, you can easily use the official `openai` Python package to interact with it just like the real OpenAI API:
```python
from openai import OpenAI

# Initialize the client pointing to the local base URL
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-no-key-required"
)

# Generate an embedding
response = client.embeddings.create(
    input=["Hello, world!"],
    model="BAAI/bge-small-en-v1.5"  # Replace with any supported model
)

print(response.data[0].embedding)
```

This project is licensed under the MIT License - see the LICENSE file for details.
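Once you have embeddings back, a common next step is ranking documents by similarity. A minimal cosine-similarity sketch in plain Python (the vectors below are toy values standing in for real model output such as `response.data[i].embedding`):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding output
query_vec = [0.1, 0.3, 0.5]
doc_vec = [0.1, 0.3, 0.5]

print(cosine_similarity(query_vec, doc_vec))  # identical vectors score ~1.0
```

For production-scale search you would hand the vectors to a vector database instead, but the scoring principle is the same.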