
fix: Compatibility with huggingface_hub 1.16, numpy 2.0, and pandas 3.0 #1971

Open: yoavkatz wants to merge 5 commits into main from fix/hf-namespaced-dataset-paths

Conversation

Member

@yoavkatz yoavkatz commented May 26, 2026

Summary

  • HF dataset paths: Update all LoadHF(path=...) calls to use the full namespace/name format required by huggingface_hub >= 1.16 (e.g., hellaswag → Rowan/hellaswag)
  • F1 metrics: Replace evaluate library wrapper with direct sklearn calls to fix numpy 2.0 TypeError on 0-d array scalar conversion
  • text2sql: Cast DataFrame to str before column-wise sorting to fix pandas 3.0 TypeError when assigning string values to int64 columns

Test plan

  • All prepare/cards scripts run successfully and regenerate catalog JSONs
  • CI performance test passes (hellaswag loads correctly)
  • test_f1_multiple_use, test_confidence_interval_off pass
  • test_text2sql_accuracy_different_db_schema passes

🤖 Generated with Claude Code

yoavkatz and others added 4 commits May 26, 2026 16:41
…atibility

huggingface_hub 1.16+ enforces that dataset repository IDs must use the
'namespace/name' format. Bare dataset names (e.g., 'hellaswag') are no
longer accepted, causing HfUriError in CI.
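
As an illustration of the migration described above, here is a minimal sketch of a helper that checks for the 'namespace/name' shape and rewrites bare names. The helper names and the mapping table are hypothetical and not part of this PR; only the two mappings shown (hellaswag and wiki_bio) come from it, and the sketch does not call huggingface_hub at all.

```python
# Hypothetical lookup table from bare dataset names to namespaced repo IDs.
# Only these two entries are taken from the PR; anything else would need
# to be looked up on the Hub.
NAMESPACED_IDS = {
    "hellaswag": "Rowan/hellaswag",
    "wiki_bio": "michaelauli/wiki_bio",
}


def is_namespaced(path: str) -> bool:
    """Return True if path has exactly the 'namespace/name' shape."""
    parts = path.split("/")
    return len(parts) == 2 and all(parts)


def to_namespaced(path: str) -> str:
    """Return path unchanged if already namespaced, else look it up."""
    if is_namespaced(path):
        return path
    try:
        return NAMESPACED_IDS[path]
    except KeyError:
        # Fail loudly instead of guessing a namespace.
        raise ValueError(f"No known namespaced repo ID for {path!r}") from None
```

A check like this could run over every LoadHF(path=...) value before the catalog is regenerated, so that a bare name fails early rather than at download time.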

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Yoav Katz <[email protected]>

Run all prepare/cards scripts to update the catalog JSON files with the
full namespace/name format for HuggingFace dataset paths.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Yoav Katz <[email protected]>

The evaluate library's cached f1.py uses `float(score)` on numpy arrays,
which raises TypeError with numpy >= 2.0. Bypass the evaluate wrapper and
call sklearn's f1_score/precision_score/recall_score directly.
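
For readers unfamiliar with the direct calls, the following is a rough sketch of what bypassing the wrapper can look like. The function name `macro_scores`, the choice of `average="macro"`, and `zero_division=0` are assumptions made for illustration; the actual metric code in this PR may use different averaging and label handling.

```python
from sklearn.metrics import f1_score, precision_score, recall_score


def macro_scores(references, predictions):
    """Compute macro-averaged F1, precision and recall as plain floats.

    Hypothetical helper: averaging mode and zero_division are assumptions.
    """
    kwargs = {"average": "macro", "zero_division": 0}
    return {
        # With an averaging mode set, each call returns a single scalar,
        # so converting to a Python float is straightforward.
        "f1": float(f1_score(references, predictions, **kwargs)),
        "precision": float(precision_score(references, predictions, **kwargs)),
        "recall": float(recall_score(references, predictions, **kwargs)),
    }


scores = macro_scores(
    ["cat", "dog", "cat", "dog"],  # references
    ["cat", "dog", "dog", "dog"],  # predictions
)
```

Calling sklearn directly keeps the scalar conversion in code the project controls, rather than in a cached third-party script.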

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Yoav Katz <[email protected]>

- F1 metric: replace evaluate library wrapper with direct sklearn calls
  to avoid numpy 2.0 float() TypeError on 0-d arrays
- text2sql: cast DataFrame to str before sorting to avoid pandas 3.0
  TypeError when assigning string values to int64 columns
- wiki_bio: use namespaced HF dataset path (michaelauli/wiki_bio)
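
The text2sql change can be sketched roughly as follows. This is one possible reading of "cast to str before sorting", not the code from this PR: the function name `normalize_for_comparison` is hypothetical, and it assumes the goal is to compare two query results regardless of row and column order. The point it illustrates is that the frame is cast to strings first and a new frame is built from the sorted values, so no string is ever written back into an int64 column.

```python
import pandas as pd


def normalize_for_comparison(df: pd.DataFrame) -> pd.DataFrame:
    """Return an order-insensitive, all-string copy of df (illustrative)."""
    # Cast first, so every cell is a string before any sorting happens.
    as_text = df.astype(str)
    # Sort the values inside each row so that column order does not matter.
    sorted_rows = [sorted(row) for row in as_text.values.tolist()]
    # Build a fresh frame instead of assigning back into the original columns.
    normalized = pd.DataFrame(sorted_rows)
    # Sort the rows themselves so that row order does not matter either.
    by = list(normalized.columns)
    return normalized.sort_values(by=by).reset_index(drop=True)


df_a = pd.DataFrame({"id": [2, 1], "name": ["b", "a"]})
df_b = pd.DataFrame({"name": ["a", "b"], "id": [1, 2]})
same = normalize_for_comparison(df_a).equals(normalize_for_comparison(df_b))
```

The sketch does not handle empty frames; a real comparison would need to decide how to treat them.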

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Yoav Katz <[email protected]>
@yoavkatz changed the title from "fix: Use namespaced HF dataset paths for huggingface_hub >= 1.16" to "fix: Compatibility with huggingface_hub 1.16, numpy 2.0, and pandas 3.0" on May 27, 2026