Extract hardcoded (burned-in) subtitles from video files using Apple Vision OCR on macOS.
subextract reads a video file, samples frames at a configurable rate, crops the subtitle region, uses Apple's Vision framework to OCR the text, deduplicates consecutive frames, and outputs a standard SRT subtitle file.
- macOS (uses Apple Vision framework)
- Python 3.9–3.12
- ffmpeg (optional, for videos OpenCV can't decode)
pip install subextract
# Extract subtitles with defaults
subextract video.mp4 -o subtitles.srt
# Use accurate OCR mode (slower but better)
subextract video.mp4 -o subtitles.srt --recognition-level accurate
# Extract a specific time range
subextract video.mp4 -o subtitles.srt --start 14:30 --end 15:30
subextract can be configured via CLI flags, a YAML config file, or both. Priority: CLI flags > config file > defaults.
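The priority rule can be illustrated with a small sketch. This is not subextract's actual code: the function name `resolve_settings` and the flat key names are hypothetical, and unset CLI flags are assumed to arrive as `None`.

```python
from typing import Any, Dict

# A few defaults copied from the generated config.yaml (flattened for brevity).
DEFAULTS: Dict[str, Any] = {
    "crop_y": 400,
    "crop_height": 105,
    "recognition_level": "fast",
    "sample_fps": 3.0,
}


def resolve_settings(file_cfg: Dict[str, Any], cli_args: Dict[str, Any]) -> Dict[str, Any]:
    """Merge settings so that CLI flags > config file > defaults.

    Hypothetical helper: values of None are treated as "not provided".
    """
    merged = dict(DEFAULTS)
    # Config file values override the defaults.
    merged.update({k: v for k, v in file_cfg.items() if v is not None})
    # CLI flags override everything else.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


settings = resolve_settings(
    file_cfg={"crop_y": 380, "recognition_level": "accurate"},
    cli_args={"crop_y": None, "sample_fps": 2.0, "recognition_level": None},
)
print(settings)
```

Here `crop_y` and `recognition_level` come from the config file, `sample_fps` from the CLI, and `crop_height` falls back to the default.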
subextract --init-config
This creates config.yaml with all options and their defaults:
# Subtitle crop region (pixels from top of frame)
crop:
  y: 400                      # Top of crop region
  height: 105                 # Height of crop region
# OCR settings
ocr:
  recognition_level: fast     # "fast" or "accurate"
  language: en-US             # OCR language code
# Frame sampling
sampling:
  fps: 3.0                    # Frames per second to sample
  variance_threshold: 100.0   # Skip frames below this variance
# Text processing
text:
  similarity_threshold: 0.75  # Merge consecutive subs above this
  min_duration: 0.3           # Discard subs shorter than this (seconds)
  brightness_threshold: 200   # White-text masking threshold (0-255)
subextract video.mp4 --config config.yaml -o subtitles.srt
subextract video.mp4 [options]
positional arguments:
  video                          Path to video file
options:
  -o, --output PATH              Output SRT path (default: output.srt)
  --config PATH                  Path to YAML config file
  --init-config                  Write config.yaml with defaults and exit
  --version                      Show version and exit
  --crop-y INT                   Crop region top Y (pixels)
  --crop-height INT              Crop region height (pixels)
  --sample-fps FLOAT             Frames per second to sample
  --recognition-level MODE       "fast" or "accurate"
  --language CODE                OCR language code (e.g. en-US)
  --variance-threshold FLOAT     Skip frames below this variance
  --similarity-threshold FLOAT   Text similarity for dedup (0-1)
  --min-duration FLOAT           Min subtitle duration in seconds
  --brightness-threshold INT     White-text masking threshold (0-255)
  --no-mask                      Disable brightness masking (for subs with drop shadow)
  --start TIME                   Start time (HH:MM:SS or MM:SS)
  --end TIME                     End time (HH:MM:SS or MM:SS)
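As an illustration of the two TIME formats accepted by --start and --end, here is a minimal sketch that converts them to seconds. The function name `parse_time` is hypothetical, and subextract's real parser may differ in what it accepts or rejects.

```python
def parse_time(value: str) -> float:
    """Convert "HH:MM:SS" or "MM:SS" to a number of seconds.

    Hypothetical helper for illustration; raises ValueError on other shapes.
    """
    parts = value.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected HH:MM:SS or MM:SS, got {value!r}")
    seconds = 0.0
    for part in parts:
        # Each field is worth 60x the field to its right.
        seconds = seconds * 60 + float(part)
    return seconds


print(parse_time("14:30"))     # 870.0
print(parse_time("01:02:03"))  # 3723.0
```

With this reading, `--start 14:30 --end 15:30` selects the 60 seconds between 870 s and 930 s.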
- Frame sampling — Reads the video and samples frames at the configured rate (default 3 fps)
- Crop — Extracts the subtitle region from each frame
- White-text masking — Thresholds the crop to isolate bright white subtitle text, removing background noise
- Variance check — Skips frames with low variance (no text present)
- OCR — Runs Apple Vision text recognition on each frame via ocrmac
- Deduplication — Merges consecutive frames with similar text into single subtitle entries
- SRT output — Writes standard SRT format with sequential numbering and timestamps
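The last two steps above can be sketched with the standard library alone. This is an assumption-laden illustration, not subextract's implementation: the function names are hypothetical, `difflib.SequenceMatcher` stands in for whatever similarity measure the tool really uses, and each sampled frame is assumed to cover `1 / fps` seconds.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Sub = Tuple[float, float, str]  # (start seconds, end seconds, text)


def merge_frames(frames: List[Tuple[float, str]], fps: float = 3.0,
                 similarity_threshold: float = 0.75,
                 min_duration: float = 0.3) -> List[Sub]:
    """Merge consecutive (timestamp, text) OCR results into subtitle entries."""
    frame_len = 1.0 / fps
    subs: List[Sub] = []
    current = None  # [start, end, text] of the entry being built
    for ts, text in frames:
        text = text.strip()
        if text and current and SequenceMatcher(
                None, current[2], text).ratio() >= similarity_threshold:
            current[1] = ts + frame_len  # similar text: extend the entry
            continue
        if current:
            subs.append((current[0], current[1], current[2]))
        # An empty OCR result closes the entry without starting a new one.
        current = [ts, ts + frame_len, text] if text else None
    if current:
        subs.append((current[0], current[1], current[2]))
    # Discard entries shorter than min_duration.
    return [s for s in subs if s[1] - s[0] >= min_duration]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(subs: List[Sub]) -> str:
    """Render entries as numbered SRT blocks separated by blank lines."""
    return "\n".join(
        f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        for i, (start, end, text) in enumerate(subs, start=1)
    )


frames = [(0.0, "Hello there"), (1 / 3, "Hello there."),
          (2 / 3, ""), (1.0, "General Kenobi")]
print(to_srt(merge_frames(frames)))
```

In this sketch the first two frames differ only by a trailing period, so they merge into one entry that keeps the first frame's text; the empty frame ends that entry before the next one begins.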
- macOS only — Requires Apple Vision framework (no Linux/Windows support)
- White text assumed — The brightness masking works best with white or light-colored subtitles on darker backgrounds. For subtitles with a drop shadow or outline, use --no-mask to skip masking and let Vision OCR the raw crop directly
- OCR quality varies — Low-resolution video, unusual fonts, or complex backgrounds may produce errors that need manual cleanup
- No auto-detection — You need to specify the subtitle crop region manually (use a screenshot to determine the Y position and height)
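To make the brightness masking and variance check behind these limitations more concrete, here is a toy sketch in pure Python. It is only an approximation under stated assumptions: the crop is a small grid of 0-255 grayscale values, the function names are hypothetical, and the variance is assumed to be computed on the masked crop. The real tool works on video frames and may order or implement these steps differently.

```python
from statistics import pvariance
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def mask_bright(crop: Gray, brightness_threshold: int = 200) -> Gray:
    """Keep pixels at or above the threshold as white (255); zero out the rest."""
    return [[255 if px >= brightness_threshold else 0 for px in row]
            for row in crop]


def has_text(crop: Gray, variance_threshold: float = 100.0) -> bool:
    """Treat a crop as containing text when its pixel variance is high enough."""
    pixels = [px for row in crop for px in row]
    return pvariance(pixels) >= variance_threshold


background_only = [[40, 60, 55, 40], [45, 40, 50, 40]]
white_text = [[40, 250, 250, 40], [40, 40, 250, 40]]
dim_text = [[40, 180, 180, 40], [40, 40, 180, 40]]

print(has_text(mask_bright(background_only)))  # False: masked crop is all black
print(has_text(mask_bright(white_text)))       # True: bright pixels survive
print(has_text(mask_bright(dim_text)))         # False: text is below the threshold
```

In this sketch, text dimmer than the threshold disappears after masking, which is the kind of case where lowering --brightness-threshold or passing --no-mask is worth trying.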
Contributions are welcome. Please open an issue first to discuss what you'd like to change.