sileod/reasoning-core

Reasoning Core ◉

reasoning-core is a suite of textual procedural data generators for language model pre-training and post-training. It centers on expressive formal and algorithmic tasks, including full-fledged first-order logic, formal mathematics with Lean/TPTP, planning, and CFG syntax tasks.

We release pre-generated data scaled to more than 10B tokens:
🤗 https://hf.co/collections/reasoning-core/datasets

Standalone

uv pip install reasoning-core

from reasoning_core import list_tasks, get_task, score_answer

T = get_task('arithmetics')()
x = T.generate_example()
assert score_answer(x.answer, x)==1
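
To make the generate-then-score contract above concrete, here is a minimal self-contained sketch of a procedural task in plain Python. It does not use reasoning-core: Example, ToyArithmeticTask, and toy_score_answer are hypothetical names invented for illustration, and the library's real classes may be organized differently.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical container: a prompt plus its reference answer."""
    prompt: str
    answer: str


class ToyArithmeticTask:
    """Hypothetical stand-in for a procedural task (not reasoning-core's class)."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def generate_example(self, level=0):
        # Higher levels add more operands, so difficulty scales procedurally.
        n_terms = 2 + level
        terms = [self.rng.randint(1, 9) for _ in range(n_terms)]
        ops = [self.rng.choice("+-*") for _ in range(n_terms - 1)]
        expr = str(terms[0])
        for op, term in zip(ops, terms[1:]):
            expr += f" {op} {term}"
        # eval is acceptable here only because expr is built from digits and + - *.
        # The generator computes the answer, so it is correct by construction.
        answer = str(eval(expr))
        return Example(prompt=f"Evaluate: {expr}", answer=answer)


def toy_score_answer(prediction, example):
    """Return 1 if the prediction matches the reference answer, else 0."""
    return int(prediction.strip() == example.answer)


task = ToyArithmeticTask(seed=0)
x = task.generate_example(level=1)
assert toy_score_answer(x.answer, x) == 1
assert toy_score_answer("not a number", x) == 0
```

The point of the sketch is that the generator produces the prompt and the reference answer together, so every example can be verified automatically, at any scale and difficulty level.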

Task examples and task authoring guide

GALLERY (names link to task code)

arithmetics · math_word_problem · equation_system · lean_candidate_compilation · lean_missing_proof_line_selection · conjecture_entailment · tptp_consistency_repair · planar_geometry_relations · lambda_reduction · rewrite_system · most_probable_evidence · most_probable_outcome · evidence_retrieval · logic_nli · logic_qa · multistep_abduction · multistep_evidence_retrieval · multistep_nli · planning · count_elements · set_equality · set_intersection · set_missing_element · sequential_induction · qualitative_reasoning · navigation · reference_tracking · coreference · constraint_satisfaction · graph_dependencies · graph_pathfinding · graph_successors · regex_following · regex_induction · regex_reasoning · regex_retrieval · constrained_continuation · locate_error · parsing_derivation · table_qa · string_transduction · code_execution · code_runnability · analogical_case_retrieval

TASK_AUTHORING_GUIDE

Parallel generation script

Run bash run_generate.sh for multi-threaded generation to JSON files (readable by Hugging Face Datasets).
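
The contents of run_generate.sh are not reproduced here, so the following is only a rough sketch of the general idea (several workers each writing their own JSON Lines shard), not a description of what the script does. make_example, write_shard, generate_parallel, and the shard file naming are all hypothetical; a real run would call a task's generate_example() instead of the toy generator.

```python
import json
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_example(rng):
    """Hypothetical generator standing in for a real task."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    return {"prompt": f"{a} + {b} = ?", "answer": str(a + b)}


def write_shard(out_dir, shard_id, n_examples):
    """Write one JSON Lines shard; each worker gets its own seed and file."""
    rng = random.Random(shard_id)
    path = Path(out_dir) / f"shard_{shard_id:03d}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for _ in range(n_examples):
            f.write(json.dumps(make_example(rng)) + "\n")
    return path


def generate_parallel(out_dir, n_shards=4, n_examples=5):
    """Generate shards concurrently and return the paths in shard order."""
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        futures = [
            pool.submit(write_shard, out_dir, i, n_examples)
            for i in range(n_shards)
        ]
        return [f.result() for f in futures]


with tempfile.TemporaryDirectory() as tmp:
    paths = generate_parallel(tmp)
    rows = [
        json.loads(line)
        for p in paths
        for line in p.read_text(encoding="utf-8").splitlines()
    ]
    assert len(rows) == 20
```

Files in this one-object-per-line layout can be loaded with the Hugging Face Datasets "json" loader. Giving each shard its own seed and output file avoids write contention between workers; for CPU-bound generators in CPython, a process pool may be a better fit than threads.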

Integrations

Prime Environment Hub

# !pip install uv  # install uv if needed
!uv tool install prime --with openai -q
!uv tool run prime -- env install sileod/reasoning-core-env

import os

from openai import OpenAI
from verifiers import load_environment

env = load_environment("reasoning-core-env")

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=os.getenv("OPENROUTER_API_KEY"))  # 🔑
results = env.evaluate(client=client, model="gpt-4.1-mini", num_examples=20, rollouts_per_example=1)
df = env.make_dataset(results).to_pandas()

Reasoning gym

We use a custom but compatible interface. Our tasks, which are mostly orthogonal to those of Reasoning Gym (RG), can be imported into it.

import reasoning_gym, reasoning_core
from reasoning_gym.composite import DatasetSpec

reasoning_core.register_to_reasoning_gym()  # registers RC tasks into RG

specs = [
    DatasetSpec(name='leg_counting', weight=1, config={}),  # from reasoning_gym 🏋
    DatasetSpec(name='arithmetics', weight=1, config={}),  # from reasoning_core ◉
]
D = reasoning_gym.create_dataset('composite', size=10, seed=42, datasets=specs)

And the other way around:

from reasoning_core import get_task
t = get_task('reasoning_gym')
t.generate_example(level=1, rg_task='lcm')  # or leave rg_task unspecified for a random task

Openreward

https://openreward.ai/dsileo/reasoning-core

Citation and paper

@article{reasoningcore2026,
  title={Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training},
  author={Lacombe, Valentin and Quesnel, Valentin and Sileo, Damien},
  journal={arXiv preprint arXiv:2603.02208},
  year={2026},
  url={https://arxiv.org/abs/2603.02208}
}

https://arxiv.org/abs/2603.02208
Contact: damien.sileo@inria.fr
