kerim/subextract

subextract

Extract hardcoded (burned-in) subtitles from video files using Apple Vision OCR on macOS.

subextract reads a video file, samples frames at a configurable rate, crops the subtitle region, uses Apple's Vision framework to OCR the text, deduplicates consecutive frames, and outputs a standard SRT subtitle file.

Requirements

  • macOS (uses Apple Vision framework)
  • Python 3.9–3.12
  • ffmpeg (optional, for videos OpenCV can't decode)

Installation

pip install subextract

Quick Start

# Extract subtitles with defaults
subextract video.mp4 -o subtitles.srt

# Use accurate OCR mode (slower but better)
subextract video.mp4 -o subtitles.srt --recognition-level accurate

# Extract a specific time range
subextract video.mp4 -o subtitles.srt --start 14:30 --end 15:30

Configuration

subextract can be configured via CLI flags, a YAML config file, or both. When a setting appears in more than one place, the order of precedence is: CLI flags > config file > built-in defaults.
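
The precedence rule can be illustrated with a short Python sketch. This is not subextract's actual code: `resolve_settings`, the flattened key names, and the convention that unset flags arrive as `None` are all assumptions made for the example.

```python
from collections import ChainMap

# Defaults taken from the generated config.yaml, flattened for brevity.
# The flat key names are hypothetical, not subextract's internal names.
DEFAULTS = {
    "crop_y": 400,
    "crop_height": 105,
    "recognition_level": "fast",
    "sample_fps": 3.0,
}


def resolve_settings(cli_args, file_config):
    """Layer settings so CLI flags beat the config file, which beats defaults."""
    # Assumption: flags the user did not pass are None; drop them so they
    # do not shadow values from the lower layers.
    cli_set = {key: value for key, value in cli_args.items() if value is not None}
    # ChainMap looks keys up left to right, so the first layer wins.
    return dict(ChainMap(cli_set, file_config, DEFAULTS))


settings = resolve_settings(
    cli_args={"sample_fps": 5.0, "crop_y": None},
    file_config={"crop_y": 620, "sample_fps": 2.0},
)
print(settings)
```

Here `sample_fps` comes from the command line, `crop_y` from the config file, and the remaining keys fall back to the defaults.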

Generate a config file

subextract --init-config

This creates config.yaml with all options and their defaults:

# Subtitle crop region (pixels from top of frame)
crop:
  y: 400              # Top of crop region
  height: 105         # Height of crop region

# OCR settings
ocr:
  recognition_level: fast   # "fast" or "accurate"
  language: en-US           # OCR language code

# Frame sampling
sampling:
  fps: 3.0                   # Frames per second to sample
  variance_threshold: 100.0  # Skip frames below this variance

# Text processing
text:
  similarity_threshold: 0.75  # Merge consecutive subs above this
  min_duration: 0.3           # Discard subs shorter than this (seconds)
  brightness_threshold: 200   # White-text masking threshold (0-255)

Use a config file

subextract video.mp4 --config config.yaml -o subtitles.srt

CLI Reference

subextract video.mp4 [options]

positional arguments:
  video                       Path to video file

options:
  -o, --output PATH           Output SRT path (default: output.srt)
  --config PATH               Path to YAML config file
  --init-config               Write config.yaml with defaults and exit
  --version                   Show version and exit

  --crop-y INT                Crop region top Y (pixels)
  --crop-height INT           Crop region height (pixels)
  --sample-fps FLOAT          Frames per second to sample
  --recognition-level MODE    "fast" or "accurate"
  --language CODE             OCR language code (e.g. en-US)
  --variance-threshold FLOAT  Skip frames below this variance
  --similarity-threshold FLOAT Text similarity for dedup (0-1)
  --min-duration FLOAT        Min subtitle duration in seconds
  --brightness-threshold INT  White-text masking threshold (0-255)
  --no-mask                   Disable brightness masking (for subs with drop shadow)
  --start TIME                Start time (HH:MM:SS or MM:SS)
  --end TIME                  End time (HH:MM:SS or MM:SS)
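
The TIME values accepted by --start and --end are documented as HH:MM:SS or MM:SS. A minimal sketch of converting such a string to seconds follows; `parse_time` is a hypothetical helper written for illustration, and subextract's own parser may behave differently (for example with fractional seconds).

```python
def parse_time(value: str) -> float:
    """Convert "HH:MM:SS" or "MM:SS" to a number of seconds."""
    parts = value.strip().split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected HH:MM:SS or MM:SS, got {value!r}")
    seconds = 0.0
    # Each field is worth 60 times the one to its right.
    for part in parts:
        seconds = seconds * 60 + float(part)
    return seconds


start = parse_time("14:30")     # MM:SS
end = parse_time("01:02:03")    # HH:MM:SS
print(start, end)
```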

How It Works

  1. Frame sampling — Reads the video and samples frames at the configured rate (default 3 fps)
  2. Crop — Extracts the subtitle region from each frame
  3. White-text masking — Thresholds the crop to isolate bright white subtitle text, removing background noise
  4. Variance check — Skips frames with low variance (no text present)
  5. OCR — Runs Apple Vision text recognition on each frame via ocrmac
  6. Deduplication — Merges consecutive frames with similar text into single subtitle entries
  7. SRT output — Writes standard SRT format with sequential numbering and timestamps
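
Steps 6 and 7 can be sketched in plain Python. The README does not say which similarity measure subextract uses, so this example assumes `difflib.SequenceMatcher`; the function names and the rule that a cue ends one sampling interval after its last frame are also assumptions.

```python
from difflib import SequenceMatcher


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def merge_frames(frames, fps=3.0, similarity_threshold=0.75, min_duration=0.3):
    """Collapse (time, text) OCR results into (start, end, text) cues."""
    step = 1.0 / fps
    cues = []
    for time, text in frames:
        if not text:
            continue  # no text recognised in this frame
        if cues:
            start, end, prev = cues[-1]
            adjacent = time - end < step / 2  # no empty frame in between
            similar = SequenceMatcher(None, prev, text).ratio() > similarity_threshold
            if adjacent and similar:
                cues[-1] = (start, time + step, prev)  # extend the open cue
                continue
        cues.append((time, time + step, text))
    # Discard cues shorter than the minimum duration.
    return [cue for cue in cues if cue[1] - cue[0] >= min_duration]


def to_srt(cues):
    """Serialise cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Five frames sampled at 3 fps; the third has a one-character OCR error.
texts = ["Hello there", "Hello there", "Hello therc", "", "Goodbye"]
frames = [(i / 3.0, text) for i, text in enumerate(texts)]
cues = merge_frames(frames)
srt = to_srt(cues)
print(srt)
```

The first three frames merge into one cue despite the OCR slip, the empty frame closes it, and "Goodbye" starts a second cue.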

Limitations

  • macOS only — Requires Apple Vision framework (no Linux/Windows support)
  • White text assumed — The brightness masking works best with white or light-colored subtitles on darker backgrounds. For subtitles with a drop shadow or outline, use --no-mask to skip masking and let Vision OCR the raw crop directly
  • OCR quality varies — Low resolution video, unusual fonts, or complex backgrounds may produce errors that need manual cleanup
  • No auto-detection — You need to specify the subtitle crop region manually (use a screenshot to determine the Y position and height)
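
To make the crop, masking, and variance settings concrete, here is a toy sketch on a tiny grayscale frame stored as nested lists. It only illustrates the idea: the helper names are hypothetical, real frames would be OpenCV/NumPy arrays, and the exact thresholding and comparison rules subextract applies are assumptions.

```python
from statistics import pvariance


def crop_rows(frame, y, height):
    """Keep rows y .. y+height-1 of a grayscale frame (a list of pixel rows)."""
    return frame[y:y + height]


def mask_bright(region, threshold=200):
    """Set pixels at or above the threshold to 255 and everything else to 0."""
    return [[255 if px >= threshold else 0 for px in row] for row in region]


def has_text(region, variance_threshold=100.0):
    """Treat a region as containing text when its pixel variance is high enough."""
    pixels = [px for row in region for px in row]
    return pvariance(pixels) >= variance_threshold


# A 6x8 mid-grey frame with a few bright "subtitle" pixels in row 4.
frame = [[90] * 8 for _ in range(6)]
frame[4][2:6] = [250, 250, 250, 250]

region = mask_bright(crop_rows(frame, y=3, height=2))
blank = mask_bright(crop_rows([[90] * 8 for _ in range(6)], y=3, height=2))

print(has_text(region), has_text(blank))
```

With masking, the grey background collapses to zero, so a frame without bright text has zero variance and is skipped. Skipping the mask (as --no-mask does) would leave the raw crop untouched for OCR.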

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

License

GPL v3
