[codex] Downgrade invalid extracted terms to strings #67

Merged
justinjoy merged 2 commits into main from codex/downgrade-invalid-extracted-terms on Jul 2, 2026

@justinjoy

Contributor

Summary

  • Downgrade invalid extracted kind="term" slots to strings instead of failing the whole source analysis.
  • Preserve valid structural slots in the same fact, so mixed string/compound facts still work.
  • Tighten the extraction prompt to tell models that bare Korean/Chinese labels and non-Datalog text must be kind="string".

Why

Local Ollama extraction marked a Korean company name as kind="term", which failed parsing and caused source analysis to fail. The safer review behavior is to keep the invalid slot as a StringLit, preserve the valid compound slot, and surface the downgrade reason in the candidate note.
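A minimal sketch of the downgrade behavior described above, not the code in this PR. Only `StringLit` and `kind="term"` / `kind="string"` come from the description; `Term`, `Candidate`, `is_valid_term`, `build_candidate`, and the simplified term grammar are hypothetical names and assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed, simplified grammar: a lowercase ASCII atom, optionally followed by
# a parenthesized list of atoms, integers, or double-quoted strings.
_ATOM = r"[a-z][a-zA-Z0-9_]*"
_ARG = rf'(?:{_ATOM}|\d+|"[^"]*")'
_TERM_RE = re.compile(rf"{_ATOM}(?:\({_ARG}(?:,\s*{_ARG})*\))?")


@dataclass
class StringLit:
    value: str


@dataclass
class Term:  # hypothetical stand-in for a parsed structural slot
    text: str


@dataclass
class Candidate:  # hypothetical container for one extracted fact
    slots: list
    notes: list = field(default_factory=list)


def is_valid_term(text: str) -> bool:
    """Return True if text matches the simplified term grammar above."""
    return _TERM_RE.fullmatch(text) is not None


def build_candidate(raw_slots: list) -> Candidate:
    """Convert raw extracted slots, downgrading invalid terms to strings."""
    slots, notes = [], []
    for index, slot in enumerate(raw_slots):
        kind, value = slot["kind"], slot["value"]
        if kind == "term" and is_valid_term(value):
            slots.append(Term(value))
        elif kind == "term":
            # Invalid term: keep the text as a string and record why,
            # instead of failing the whole fact.
            slots.append(StringLit(value))
            notes.append(f"slot {index}: downgraded invalid term {value!r} to string")
        else:
            slots.append(StringLit(value))
    return Candidate(slots, notes)


raw = [
    {"kind": "term", "value": "삼성전자"},            # bare Korean label, not a term
    {"kind": "term", "value": "revenue(2024, usd)"},  # valid compound slot
]
candidate = build_candidate(raw)
```

In this sketch the Korean label ends up as a `StringLit`, the compound slot stays structural, and the single note explains the downgrade so a reviewer can see it.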

Validation

  • python -m pytest tests/test_llm_schema.py tests/test_pipeline.py tests/test_fact_term_e2e.py -q
  • python -m pytest -q

@justinjoy justinjoy marked this pull request as ready for review July 2, 2026 09:15
@justinjoy justinjoy force-pushed the codex/downgrade-invalid-extracted-terms branch from f96c4a1 to 172c4a7 July 2, 2026 09:22
@justinjoy justinjoy force-pushed the codex/downgrade-invalid-extracted-terms branch from 172c4a7 to ec3ac43 July 2, 2026 09:29
@justinjoy justinjoy merged commit cb27f60 into main Jul 2, 2026
4 checks passed
@justinjoy justinjoy deleted the codex/downgrade-invalid-extracted-terms branch July 2, 2026 09:31