Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
a091e83
Add ATIF conversion pipeline
openhands-agent Jun 3, 2026
1c6a153
fix: bump schema version for ATIF changes
openhands-agent Jun 3, 2026
7a121b6
chore: address PR review feedback (#256)
openhands-agent Jun 3, 2026
4b9a0f6
chore: clarify ATIF normalization pipeline
openhands-agent Jun 3, 2026
1a6e588
fix: preserve environment observations in ATIF roundtrip
openhands-agent Jun 3, 2026
2773fea
fix: preserve available APIs through ATIF conversion
openhands-agent Jun 3, 2026
8e570b5
fix: preserve ATIF standalone observation results
openhands-agent Jun 3, 2026
a28b498
docs: align OpenHands sample SFT generation with ATIF
openhands-agent Jun 3, 2026
6c79c83
chore: ignore full ATIF artifacts
openhands-agent Jun 3, 2026
627e4cc
fix: normalize ATIF code action language metadata
openhands-agent Jun 3, 2026
531bde4
docs: update sample script OpenHands ATIF route
openhands-agent Jun 3, 2026
deeed6f
docs: clarify SFT conversion test scope
openhands-agent Jun 3, 2026
123daa5
fix: separate ATIF normalization stage
openhands-agent Jun 3, 2026
830b8dc
Merge remote-tracking branch 'origin/main' into openhands/atif-unific…
Jun 8, 2026
6bb7226
fix: emit normalized ATIF sample std data
Jun 8, 2026
fce6bb0
test: satisfy ATIF std pre-commit checks
Jun 8, 2026
8fc8583
fix: preserve raw ATIF before standardization
Jun 8, 2026
5b2af5b
fix: address ATIF docs and OS SDK wrapper
Jun 8, 2026
7d0dc3c
fix: satisfy ruff import order for OS wrapper
Jun 8, 2026
50df06c
docs: align contributing schema overview with ATIF
Jun 8, 2026
bb0b1ab
docs: restore AGENTS repository guide
Jun 8, 2026
c463092
fix: move OS normalization into ATIF std
Jun 8, 2026
396133a
fix: align shared converter execution guards
Jun 8, 2026
ff580a6
Remove legacy ADP schema
openhands-agent Jun 11, 2026
95e27d6
chore: remove raw_to_standardized wrappers
openhands-agent Jun 11, 2026
f8d9597
chore: regenerate samples through ATIF pipeline
openhands-agent Jun 11, 2026
8654c75
fix: preserve ATIF SFT conversion outputs
openhands-agent Jun 13, 2026
b56a855
fix: remove sys path import hacks
openhands-agent Jun 13, 2026
4267bf1
ci: install package before tests
openhands-agent Jun 13, 2026
11b2050
fix: keep raw extraction raw
openhands-agent Jun 13, 2026
5139e2e
style: match CI ruff formatting
openhands-agent Jun 13, 2026
2d6c3dc
fix: preserve raw extraction for remaining datasets
openhands-agent Jun 13, 2026
a3f7a2e
fix: keep remaining raw extraction literal
openhands-agent Jun 13, 2026
6116064
style: apply ruff formatting
openhands-agent Jun 13, 2026
af91d08
fix: preserve raw trajectory semantics in atif conversion
openhands-agent Jun 13, 2026
925cf7e
test: require OpenHands SDK samples
openhands-agent Jun 13, 2026
e6314c4
refactor: keep dataset scripts local
openhands-agent Jun 14, 2026
cf976ed
ci: pin pytest below 9
openhands-agent Jun 14, 2026
7be8b47
Normalize std tools and simplify SFT prompts
openhands-agent Jun 14, 2026
7d7ebae
Match CI formatting for SFT invariants
openhands-agent Jun 14, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
16 changes: 8 additions & 8 deletions .agents/skills/custom-codereview-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,18 +13,18 @@ When reviewing this repository, be strict about dataset correctness and reproduc

For every dataset addition or dataset-format change, verify that the PR follows all applicable guidelines in `AGENTS.md`. In particular, check that:

- Required files are present: `README.md`, `extract_raw.py`, `raw_to_standardized.py`, `schema_raw.py`, `sample_raw.json`, `sample_std.json`, and `sample_sft/openhands_v0.json`. `api.py` is additionally required whenever the dataset emits any `ApiAction`.
- Required files are present: `README.md`, `extract_raw.py`, `raw_to_atif.py`, `atif_to_std.py`, `schema_raw.py`, `sample_raw.json`, `sample_std.json`, and `sample_sft/openhands_v0.json`.
- Top-level dataset JSON files are limited to `sample_raw.json`, `sample_std.json`, and `generated_thoughts.json`. No root-level `sample_sft.json`, `full_*.json`, temporary chunks, downloaded corpora, scratch JSON, or alternate sample files such as `sample_fixed.json`.
- The `sample_sft/` subdirectory contains agent-specific samples named `{agent_name}.json` (e.g. `openhands_v0.json`, `sweagent.json`). These must be regenerable from `sample_std.json` via the corresponding agent's `std_to_sft.py` and must cover the same trajectories/IDs as the standardized sample.
- Sample files are generated by committed scripts and are not hand-patched fixtures. Mentally (or actually) re-run the pipeline: `sample_raw.json` → `raw_to_standardized.py` → `sample_std.json` → `agents/<agent>/std_to_sft.py` → `sample_sft/<agent_name>.json` should reproduce the committed JSON. If sample JSON changed but the corresponding generator bug was not fixed, flag it.
- `sample_raw.json`, `sample_std.json`, and each `sample_sft/<agent_name>.json` represent **the same records in the same order**, with matching IDs between standardized and SFT stages. This is a hard requirement, not a soft preference.
- Sample files are generated by committed scripts and are not hand-patched fixtures. Mentally (or actually) re-run the pipeline: `sample_raw.json` → `raw_to_atif.py` → `sample_atif.json` → `atif_to_std.py` → `sample_std.json` → `agents/<agent>/std_to_sft.py` → `sample_sft/<agent_name>.json` should reproduce the committed JSON. If sample JSON changed but the corresponding generator bug was not fixed, flag it.
- `sample_raw.json`, `sample_atif.json`, `sample_std.json`, and each `sample_sft/<agent_name>.json` represent **the same records in the same order**, with matching IDs between standardized and SFT stages. This is a hard requirement, not a soft preference.
- Sample size is small but representative — normally 3–5 trajectories — and covers important edge cases (tool calls, command output, final answers, dataset-specific action types, failures/rewards/terminal states where applicable).
- Extraction, standardization, and SFT conversion are deterministic so future contributors can reproduce the samples (no unseeded `random.*`, time-dependent behavior, or nondeterministic dict ordering in outputs).
- `schema_raw.py` validates `sample_raw.json` and standardized trajectories validate against the ADP schema.
- Every `ApiAction.function` exists in the dataset's `api.py`, and every `kwargs` object satisfies that function's Python signature (including required parameters such as the `message` argument for `finish`). If the dataset emits `ApiAction` without an `api.py`, flag it.
- If standardized trajectories include top-level `available_apis`, verify the dataset has `api.py`, the source data explicitly specifies per-instance tool/API availability, the list is not merely copied wholesale from `api.py` or inferred from used actions, every listed API exists in `api.py`, and every `ApiAction.function` in that trajectory appears in the list.
- `schema_raw.py` validates `sample_raw.json` and standardized trajectories validate against the ATIF schema.
- Every custom `ToolCall.function_name` used in `sample_std.json` is declared in `metadata.json`, and every `ToolCall.arguments` object satisfies that tool schema.
- If standardized trajectories include per-instance tool availability metadata, verify the source data explicitly specifies it; do not infer availability merely from the tools used in the trajectory.
- SFT messages containing `<function=`, `<function_calls>`, or `<invoke name=` use `"from": "function_call"` (not `gpt`, `human`, `assistant`, etc.).
- `TextObservation.source` uses only schema-supported values: `user`, `agent`, or `environment`. Reject invented values like `system`, `os`, or `assistant`.
- ATIF `Step.source` uses only schema-supported values: `system`, `user`, or `agent`.
- Raw trajectory semantics are preserved: repeated actions, consecutive tool calls, observations, failures, rewards, and terminal states are not silently dropped. Any filtering must be implemented in code AND explained/justified in the PR description.
- Dataset-local `std_to_sft.py`, duplicate API definitions, or schema changes are clearly justified. Prefer shared agent converters in `agents/` whenever possible.
- Large corpora, full generated files (`full_raw.json`, `full_std.json`, `full_sft.json`), temporary chunks, caches, screenshots, and scratch JSON are not committed.
Expand Down Expand Up @@ -67,7 +67,7 @@ Dataset PR descriptions must include **all** of the following. If any is missing
- **License** of the source data.
- **Size and split** used (e.g. number of trajectories, which split(s)).
- **Files added or changed** in this PR.
- **Schema mapping summary** — how raw roles/actions/observations map to ADP types (`MessageAction`, `CodeAction`, `ApiAction`, `TextObservation`, `WebObservation`).
- **Schema mapping summary** — how raw roles/actions/observations map to ATIF steps, tool calls, and observation results.
- **Tests run** — which validation tests were executed and their results, or which equivalent CI checks passed.
- **Known limitations** of the dataset or conversion.
- **Design-decision catalog** for unclear implementation choices (see below).
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/check_api_docstrings.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ jobs:
python -m pip install --upgrade pip
pip install pytest
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
pip install -e .

- name: Check dataset metadata
run: |
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/pre-commit.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

- name: Run pre-commit
uses: pre-commit/action@v3.0.0
Expand Down
16 changes: 1 addition & 15 deletions .github/workflows/pytest.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,21 +26,7 @@ jobs:
python -m pip install --upgrade pip
pip install pytest
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Check schema version bump
run: |
if [ "${{ github.event_name }}" = "pull_request" ]; then
base_ref="origin/${{ github.base_ref }}"
else
base_ref="${{ github.event.before }}"
fi

if echo "$base_ref" | grep -Eq '^0+$'; then
echo "Skipping schema version bump check because no base ref is available."
else
python scripts/check_schema_version_bump.py \
--base-ref "$base_ref" \
--head-ref HEAD
fi
pip install -e .
- name: Run pytest
run: |
pytest tests/test_*.py
65 changes: 0 additions & 65 deletions .github/workflows/schema-release.yml

This file was deleted.

3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -17,10 +17,12 @@ full_std_chunks/
full_sft_chunks/

full_raw.json
full_atif.json
full_std.json
full_sft.json

full_raw.jsonl
full_atif.jsonl
full_std.jsonl
full_sft.jsonl

Expand All @@ -30,6 +32,7 @@ full_sft.jsonl
/tags-opts

.cache
*.egg-info/

/datasets/androidcontrol/android_env_utils/.eggs/
/datasets/androidcontrol/android_env_utils/android_env/
Expand Down
Loading
Loading