Feature/eval with badcase#27

Open: xujiayuan0205 wants to merge 2 commits into main from feature/eval_with_badcase
@xujiayuan0205

No description provided.

…x ToMi field mapping

- Add StructuredResult dataclass wrapping parsed output with raw_response and reasoning_content
- Add src/judge.py for LLM semantic judge fallback when structured extraction fails
- Extend runner.py with collect_badcases() and build_corrected_predictions()
- Add dual metrics (strict + judge-corrected) to all task run.py scripts
- Fix ToMi field mapping: Story.full_story/Question/Answer.Correct_Answer
- Fix reasoning_content capture: support both 'reasoning' and 'reasoning_content' field names
- Fix run_all.py subprocess PYTHONPATH for src module resolution
- Update SUMMARY.md with deepseek-chat and deepseek-r1 results
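
For reviewers, here is a minimal sketch of how the pieces listed above might fit together. It is not the code in this PR. The names `StructuredResult`, `raw_response`, `reasoning_content`, `collect_badcases` and `build_corrected_predictions` come from the commit notes, but the `parsed` field, every signature, the `judge` callable, and the rule "a bad case is any item whose parsed output does not equal the gold answer" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class StructuredResult:
    """Parsed output kept together with the raw text it was extracted from."""
    parsed: Optional[Any]                     # hypothetical field; None if extraction failed
    raw_response: str
    reasoning_content: Optional[str] = None


def extract_reasoning(message: dict) -> Optional[str]:
    """Read reasoning text from either supported field name."""
    for key in ("reasoning_content", "reasoning"):
        value = message.get(key)
        if value:
            return value
    return None


def collect_badcases(results: Sequence[StructuredResult],
                     golds: Sequence[Any]) -> List[int]:
    """Assumed rule: a bad case is any index where the strict parse misses gold."""
    return [i for i, (r, g) in enumerate(zip(results, golds)) if r.parsed != g]


def build_corrected_predictions(results: Sequence[StructuredResult],
                                golds: Sequence[Any],
                                judge: Callable[[str, Any], bool]) -> List[Any]:
    """Replace a bad-case prediction with gold only when the judge accepts it."""
    corrected = [r.parsed for r in results]
    for i in collect_badcases(results, golds):
        # `judge` stands in for the LLM semantic judge; here it is any callable.
        if judge(results[i].raw_response, golds[i]):
            corrected[i] = golds[i]
    return corrected


def accuracy(preds: Sequence[Any], golds: Sequence[Any]) -> float:
    """Fraction of predictions equal to gold; 0.0 for an empty set."""
    if not golds:
        return 0.0
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


if __name__ == "__main__":
    results = [
        StructuredResult("A", "A"),
        StructuredResult(None, "I think the answer is B"),  # extraction failed
        StructuredResult("C", "C"),
    ]
    golds = ["A", "B", "D"]
    toy_judge = lambda raw, gold: str(gold) in raw  # toy substring judge, not an LLM
    strict = accuracy([r.parsed for r in results], golds)
    judged = accuracy(build_corrected_predictions(results, golds, toy_judge), golds)
    print({"strict": strict, "judge_corrected": judged})
```

Reporting both numbers side by side keeps the strict metric comparable across runs while showing how much of the gap is due to output formatting rather than wrong answers.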