Add paper experiment scripts #47

Closed
ErlisLushtaku wants to merge 1 commit into pr32-split/04-elo-sampling-gemini from pr32-split/05-paper-experiment-scripts
@ErlisLushtaku
Collaborator

Summary

  • Add scripts for LMArena Elo analysis, Phase B serial execution, and paper table generation.
  • Update the experiment launcher default judge model for the paper benchmark setup.
  • Keep paper artifact generation as the final consumer of the stacked runtime changes.

Stack

  • Base: pr32-split/04-elo-sampling-gemini.
  • This is the final PR in the stack.

Test plan

  • uv run ruff check scripts/analyze_lmarena_elo.py scripts/phase_b_marena_localized_prompt_table.py scripts/phase_b_paper_table.py scripts/phase_b_serial_runner.py slurmpilot_scripts/launch_generation_and_evaluation.py
  • uv run python -m py_compile scripts/analyze_lmarena_elo.py scripts/phase_b_marena_localized_prompt_table.py scripts/phase_b_paper_table.py scripts/phase_b_serial_runner.py slurmpilot_scripts/launch_generation_and_evaluation.py
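
The two test-plan commands share one file list, so they can be driven from a single place. The following is a minimal sketch, not part of this PR: `FILES`, `build_commands`, and `run_checks` are hypothetical names introduced here for illustration, and the sketch assumes `uv` is on the PATH and the working directory is the repository root.

```python
import subprocess

# File list copied from the test plan above.
FILES = [
    "scripts/analyze_lmarena_elo.py",
    "scripts/phase_b_marena_localized_prompt_table.py",
    "scripts/phase_b_paper_table.py",
    "scripts/phase_b_serial_runner.py",
    "slurmpilot_scripts/launch_generation_and_evaluation.py",
]


def build_commands(files):
    """Return the two test-plan commands as argument lists (hypothetical helper)."""
    return [
        ["uv", "run", "ruff", "check", *files],
        ["uv", "run", "python", "-m", "py_compile", *files],
    ]


def run_checks(files):
    """Run each command in order; return the first non-zero exit code, or 0."""
    for cmd in build_commands(files):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_checks(FILES)` from the repository root would run the lint pass first and the compile check second, stopping at the first failure.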

Add standalone scripts for LMArena Elo analysis, Phase B serial execution, and paper table generation so paper artifacts can be regenerated from run outputs.

Includes-AI-Code: true