Replace source analysis with chunked resumable extraction #71

Merged
justinjoy merged 8 commits into main from codex/extract-verbatim-source-text
Jul 3, 2026
Conversation

@justinjoy

Contributor

Summary

  • replace monolithic web source analysis with durable extraction jobs and persisted source chunks
  • process uploads chunk-by-chunk with progress, failed-chunk retry, resume on server restart, and duplicate chunk-claim protection
  • link chunked candidate facts to their source, run, and extraction job; clean up jobs and chunks on source deletion
  • preserve source text more strictly and drop translated Han facts from Korean sources
  • make the Ollama request timeout configurable and remove source analysis's dependence on a single long request
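
For readers unfamiliar with the pattern, the chunk lifecycle described above (persisted chunks, duplicate-claim protection, resume after restart, failed-chunk retry, progress) can be sketched roughly as follows. This is a minimal illustration using SQLite, not the code merged in this PR: the `chunks` table, its status values, and every function name here are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema and helpers for illustration only; the PR's real
# tables, column names, and chunking strategy may differ.

def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS chunks (
               job_id TEXT NOT NULL,
               idx    INTEGER NOT NULL,
               text   TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               PRIMARY KEY (job_id, idx))"""
    )
    return db

def split_into_chunks(text, size):
    # Plain slicing keeps the source text verbatim: joining the chunks
    # reproduces the input exactly.
    return [text[i:i + size] for i in range(0, len(text), size)]

def create_job(db, job_id, text, size=1000):
    # Persist every chunk up front so the job survives a restart.
    rows = [(job_id, i, c) for i, c in enumerate(split_into_chunks(text, size))]
    with db:
        db.executemany(
            "INSERT INTO chunks (job_id, idx, text) VALUES (?, ?, ?)", rows
        )

def claim_chunk(db, job_id, idx):
    # Conditional UPDATE: only one caller can move a chunk from
    # 'pending' to 'running', so a second claim reports failure.
    with db:
        cur = db.execute(
            "UPDATE chunks SET status = 'running' "
            "WHERE job_id = ? AND idx = ? AND status = 'pending'",
            (job_id, idx),
        )
    return cur.rowcount == 1

def finish_chunk(db, job_id, idx, ok):
    # Record the outcome of a chunk that is currently running.
    with db:
        db.execute(
            "UPDATE chunks SET status = ? "
            "WHERE job_id = ? AND idx = ? AND status = 'running'",
            ("done" if ok else "failed", job_id, idx),
        )

def recover_after_restart(db):
    # Chunks left 'running' by a dead process go back to 'pending'.
    with db:
        cur = db.execute(
            "UPDATE chunks SET status = 'pending' WHERE status = 'running'"
        )
    return cur.rowcount

def retry_failed(db, job_id):
    # Make failed chunks claimable again; returns how many were reset.
    with db:
        cur = db.execute(
            "UPDATE chunks SET status = 'pending' "
            "WHERE job_id = ? AND status = 'failed'",
            (job_id,),
        )
    return cur.rowcount

def progress(db, job_id):
    # Returns (chunks done, total chunks) for the job.
    return db.execute(
        "SELECT COALESCE(SUM(status = 'done'), 0), COUNT(*) "
        "FROM chunks WHERE job_id = ?",
        (job_id,),
    ).fetchone()
```

In this sketch, a worker would call `recover_after_restart` once at startup, then loop over pending chunks with `claim_chunk` and `finish_chunk`. Because the claim is a single conditional `UPDATE`, two workers racing for the same chunk cannot both succeed.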

Tests

  • .venv/bin/python -m pytest -q

Closes #70

@justinjoy justinjoy merged commit 4e6b9ce into main Jul 3, 2026
4 checks passed
@justinjoy justinjoy deleted the codex/extract-verbatim-source-text branch July 3, 2026 07:37

Development

Successfully merging this pull request may close these issues.

Replace monolithic source analysis with chunked resumable extraction
