Add standalone HF v2 shard emitter #29

Open
chboishabba wants to merge 1 commit into acrion:develop from chboishabba:add-hf-v2-shard-emitter

@chboishabba

This adds a minimal standalone producer for HF-backed partial loading.

What it adds:

  • tools/emit_zelph_hf_v2.py
    • stdlib-only
    • reads a Zelph .index-file sidecar
    • emits one shard file per section-local chunk
    • writes a zelph-hf-layout/v2 manifest for manifest-backed partial loading
  • a short usage note in mkdocs/docs/binaries.md
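For readers who want a feel for what the emitter does, here is a minimal stdlib-only sketch of the slicing step. It is not the code in tools/emit_zelph_hf_v2.py. The sidecar keys (sections, name, chunks, offset, length), the shard file naming, and every manifest field other than the zelph-hf-layout/v2 layout string are assumptions made for illustration; the real schema may differ.

```python
import json
from pathlib import Path


def emit_shards(bin_path, index_path, shard_root, manifest_path,
                artifact_name, hf_root):
    """Write one shard file per chunk and a manifest describing them.

    ASSUMPTION: the index sidecar looks like
      {"sections": [{"name": str,
                     "chunks": [{"offset": int, "length": int}, ...]}]}
    This schema is hypothetical, not the actual Zelph sidecar format.
    """
    index = json.loads(Path(index_path).read_text())
    shard_dir = Path(shard_root)
    shard_dir.mkdir(parents=True, exist_ok=True)

    shards = []
    with open(bin_path, "rb") as src:
        for section in index["sections"]:
            for i, chunk in enumerate(section["chunks"]):
                # Copy the byte range of this chunk out of the .bin file.
                src.seek(chunk["offset"])
                data = src.read(chunk["length"])
                if len(data) != chunk["length"]:
                    raise ValueError(
                        f"chunk {i} of section {section['name']!r} "
                        "extends past the end of the .bin file")
                # Hypothetical naming scheme: <section>-<chunk number>.bin
                name = f"{section['name']}-{i:06d}.bin"
                (shard_dir / name).write_bytes(data)
                shards.append({
                    "section": section["name"],
                    "path": f"{shard_dir.name}/{name}",
                    "offset": chunk["offset"],
                    "length": chunk["length"],
                })

    # Only the "layout" value comes from the PR text; the other
    # manifest fields are illustrative placeholders.
    manifest = {
        "layout": "zelph-hf-layout/v2",
        "artifact": artifact_name,
        "hfRoot": hf_root,
        "shards": shards,
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Reading each chunk with seek and read keeps memory use bounded by the largest chunk, which matters for large hosted artifacts.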

Why:

  • the manifest consumer/runtime side is already in Zelph
  • the missing piece for large hosted artifacts is a small producer tool that does not require the full ITIR repo
  • this is the minimal slice Stefan can run directly after .index-file

Validation:

  • built zelph from the current branch base
  • smoke-tested tools/emit_zelph_hf_v2.py against a synthetic .bin + .index.json and verified shard files plus manifest emission
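A smoke test of this kind could be automated roughly as follows. This is a sketch, not the check that was actually run: the manifest fields it reads (shards, path, offset, length) are hypothetical, and shard paths are assumed to be relative to the directory containing the manifest.

```python
import json
from pathlib import Path


def verify_shards(bin_path, manifest_path):
    """Return the shard paths whose bytes do not match the source .bin.

    ASSUMPTION: each manifest entry has "path", "offset" and "length";
    these field names are illustrative, not the real manifest schema.
    """
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text())
    # Loading the whole file is fine for a small synthetic fixture.
    blob = Path(bin_path).read_bytes()

    mismatches = []
    for shard in manifest["shards"]:
        start = shard["offset"]
        expected = blob[start:start + shard["length"]]
        shard_file = manifest_path.parent / shard["path"]
        # A missing shard file counts as a mismatch too.
        if not shard_file.is_file() or shard_file.read_bytes() != expected:
            mismatches.append(shard["path"])
    return mismatches
```

An empty return value means every listed shard exists and is byte-identical to its range in the .bin file.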

Usage:

python tools/emit_zelph_hf_v2.py \
  --bin wikidata-20260309-all.bin \
  --index wikidata-20260309-all-index.json \
  --output wikidata-20260309-all.hf-v2.json \
  --artifact-name wikidata-20260309-all \
  --hf-root hf://datasets/<owner>/<dataset> \
  --shard-root wikidata-20260309-all-shards

This produces:

  • wikidata-20260309-all.hf-v2.json
  • wikidata-20260309-all-shards/

After upload, the manifest can be consumed via .load-partial manifest.json ....
