feat(api): add optional MarkItDown OCR support #2145
Draft
Sanderhoff-alt wants to merge 1 commit into
MarkItDown advertises image extensions, but without an OCR model the image path cannot read screenshots or scanned pages and can fail with low-level parsing/no-content errors. Add server-level MARKITDOWN_OCR_* config that is off by default, falls back to the main LLM API key/base URL/model when unset, and wires those settings into MarkItDown's llm_client support. Image uploads now fail fast with an actionable OCR configuration error when OCR is disabled. Docs and front-end copy also explain that image OCR depends on server configuration. Closes vectorize-io#927
Summary
Adds optional OCR support for the default MarkItDown file parser by wiring MarkItDown's OpenAI-compatible `llm_client` integration into Hindsight configuration.
This addresses image and scanned-document uploads where MarkItDown advertises image extensions but, without a vision-capable OCR model, cannot extract useful text from screenshots or scanned pages.
Closes #927.
Motivation
Before this change, `.jpg`, `.jpeg`, and `.png` files were accepted by the default `markitdown` parser path, but deployments without OCR support could end up with low-level parser/no-content errors. From an API or control-plane user's perspective, those errors did not explain whether the file type was unsupported, the parser was misconfigured, or OCR simply was not enabled.
The goal is to keep the default local MarkItDown behavior unchanged, while giving operators a clear opt-in path for OCR using a vision-capable OpenAI-compatible endpoint.
What Changed
- Added server-level OCR configuration variables:
  - `HINDSIGHT_API_FILE_PARSER_MARKITDOWN_OCR_ENABLED`
  - `HINDSIGHT_API_FILE_PARSER_MARKITDOWN_OCR_API_KEY`
  - `HINDSIGHT_API_FILE_PARSER_MARKITDOWN_OCR_BASE_URL`
  - `HINDSIGHT_API_FILE_PARSER_MARKITDOWN_OCR_MODEL`
  - `HINDSIGHT_API_FILE_PARSER_MARKITDOWN_OCR_PROMPT`
- Wired these settings into MarkItDown's `llm_client`, with `llm_model` and `llm_prompt`.
- Updated the docs and `.env.example` files.
Configuration Behavior
OCR is opt-in:
When enabled, the MarkItDown OCR-specific settings take precedence:
If those are not set, the parser falls back to the existing main LLM settings:
The selected endpoint must support OpenAI Chat Completions with image input, because MarkItDown's OCR integration is model/client based.
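The precedence rules above can be sketched as follows. This is a minimal illustration, not the code in this PR: `OcrSettings`, `resolve_ocr_settings`, `build_markitdown`, the accepted truthy spellings, and the shape of the `main_llm` mapping are all hypothetical. Only the `HINDSIGHT_API_FILE_PARSER_MARKITDOWN_OCR_*` names come from the description. The last function assumes the installed MarkItDown version accepts `llm_client`, `llm_model`, and `llm_prompt` as constructor arguments.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "HINDSIGHT_API_FILE_PARSER_MARKITDOWN_OCR_"


@dataclass(frozen=True)
class OcrSettings:
    """Hypothetical container for the resolved OCR settings."""
    api_key: Optional[str]
    base_url: Optional[str]
    model: Optional[str]
    prompt: Optional[str]


def _is_truthy(value: Optional[str]) -> bool:
    # Assumption: the real flag parsing may accept different spellings.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def resolve_ocr_settings(
    env: Mapping[str, str],
    main_llm: Mapping[str, Optional[str]],
) -> Optional[OcrSettings]:
    """Return OCR settings, or None when OCR is disabled (the default)."""
    if not _is_truthy(env.get(PREFIX + "ENABLED")):
        return None

    def pick(suffix: str, fallback_key: str) -> Optional[str]:
        # OCR-specific value wins; otherwise fall back to the main LLM value.
        return env.get(PREFIX + suffix) or main_llm.get(fallback_key)

    return OcrSettings(
        api_key=pick("API_KEY", "api_key"),
        base_url=pick("BASE_URL", "base_url"),
        model=pick("MODEL", "model"),
        # The prompt has no main-LLM fallback in the description.
        prompt=env.get(PREFIX + "PROMPT") or None,
    )


def build_markitdown(settings: OcrSettings):
    """Build a MarkItDown instance backed by an OpenAI-compatible client."""
    # Imported lazily so the resolution logic has no third-party dependency.
    from markitdown import MarkItDown
    from openai import OpenAI

    client = OpenAI(api_key=settings.api_key, base_url=settings.base_url)
    kwargs = {"llm_client": client, "llm_model": settings.model}
    if settings.prompt:
        kwargs["llm_prompt"] = settings.prompt
    return MarkItDown(**kwargs)


# Example: only the OCR model is overridden; key and base URL fall back.
main_llm = {"api_key": "sk-main", "base_url": None, "model": "main-model"}
env = {PREFIX + "ENABLED": "true", PREFIX + "MODEL": "vision-model"}
settings = resolve_ocr_settings(env, main_llm)
```

In a real deployment `env` would typically be `os.environ`. Keeping resolution separate from client construction makes the fallback rules testable without network access or the MarkItDown and OpenAI packages installed.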
User-Facing Behavior
With OCR disabled, image uploads now fail with an explicit, actionable error similar to:
This replaces the less helpful behavior where image conversion could fail later with a generic no-content or parser error.
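A fail-fast check of this kind could look roughly like the sketch below. The function name, the exception class, and the exact error wording are hypothetical and are not taken from this PR. The extension set mirrors the `.jpg`, `.jpeg`, and `.png` types named in the motivation, and the `=true` value in the message is an assumed spelling.

```python
from pathlib import Path

# Image extensions mentioned in this PR's motivation section.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


class OcrNotConfiguredError(ValueError):
    """Hypothetical error raised when an image is uploaded without OCR."""


def ensure_ocr_available(filename: str, ocr_enabled: bool) -> None:
    """Reject image uploads up front when OCR is disabled."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS and not ocr_enabled:
        # Illustrative message only; the real error text may differ.
        raise OcrNotConfiguredError(
            f"Cannot parse '{filename}': image files require OCR, which is "
            "disabled on this server. Set "
            "HINDSIGHT_API_FILE_PARSER_MARKITDOWN_OCR_ENABLED=true and "
            "configure a vision-capable model to enable it."
        )
```

Running the check before conversion means the caller sees a configuration hint instead of a parser error raised deep inside the image path.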
For parser fallback chains such as `iris,markitdown`, this remains compatible with the existing fallback behavior: MarkItDown failures can still allow the next configured parser to run.
Compatibility
Tests
Added coverage for:
Validation run locally:
Result: 8 passed.
Also run:
The commit pre-hook also ran `generate-docs-skill.sh` and `lint.sh` successfully.