Open-science implementation of Google Scholar's h5-index and h5-median using Semantic Scholar.
The h5-index and h5-median are bibliometric indicators published annually by Google Scholar — yet the underlying data, computation logic, and historical series remain opaque and inaccessible. Research systems built on closed metrics cannot be scrutinised, reproduced, or challenged by the community they are meant to serve. This repository exists as a principled alternative: all venue definitions, collection scripts, and results are fully open and reproducible, so that any researcher can verify, extend, or contest the numbers.
See also Open Impact Factor for the same philosophy applied to journal impact factors.
Comparison for Software Systems: https://gistpreview.github.io/?48a57825d306f86eaf2b2c062e30674a
- impact_factor.py: Shared API, caching, and venue-matching helpers.
- collect_h5_median_timeseries.py: Recomputes a rolling 5-year h5-index and h5-median analogue from Semantic Scholar.
- compare_h5_median.py: Compares the recomputed values against the saved Google Scholar HTML snapshot and draws an interactive scatter plot.
- render_h5_median_figure.py: Renders the transcribed Google Scholar snapshot as an interactive HTML chart.
- data/software_systems_google_scholar.json: Venue definitions and aliases used for Semantic Scholar collection.
- data/software_systems_google_scholar_h5.json: Transcribed Google Scholar h5-index and h5-median values.
- Software Systems - Google Scholar Metrics - 20260630.html: Saved Google Scholar snapshot used for verification.
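For readers unfamiliar with the metrics, here is a minimal sketch of how an h5-index and h5-median can be computed from per-paper citation counts. It is not the code in collect_h5_median_timeseries.py. The function names, the `(year, citations)` input shape, and the `window` parameter are assumptions made for illustration. It follows the usual definitions: the h5-index is the largest h such that h papers published in the five-year window have at least h citations each, and the h5-median is the median citation count of those h papers.

```python
from statistics import median


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h5_metrics(papers, end_year, window=5):
    """Return (h5_index, h5_median) for papers given as (year, citations).

    Hypothetical helper: only papers published in the `window` years
    ending at `end_year` (inclusive) are counted.
    """
    start_year = end_year - window + 1
    counts = sorted(
        (cites for year, cites in papers if start_year <= year <= end_year),
        reverse=True,
    )
    h = h_index(counts)
    core = counts[:h]  # the h most-cited papers in the window
    # statistics.median averages the two middle values for an even-sized
    # core; whether Google Scholar does the same is an assumption.
    h_median = median(core) if core else 0
    return h, h_median


if __name__ == "__main__":
    sample = [(2021, 10), (2022, 8), (2023, 5), (2024, 4), (2025, 3), (2019, 100)]
    print(h5_metrics(sample, end_year=2025))
```

The 2019 paper in the sample falls outside the 2021 to 2025 window, so its citations do not affect the result.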
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt

Recommended: get and configure a Semantic Scholar API key in the system keyring (data collection goes much faster):
- service: login2
- username: semanticscholar_key
The scripts also accept --api-key explicitly.
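The key lookup described above could look roughly like the following sketch, which prefers an explicitly passed key and otherwise falls back to the keyring entry. The function name `resolve_api_key` is hypothetical and the repository's scripts may resolve the key differently. The service and username values are the ones listed above, and `keyring.get_password` comes from the third-party `keyring` package.

```python
def resolve_api_key(explicit_key=None,
                    service="login2",
                    username="semanticscholar_key"):
    """Return an API key: the explicit one if given, else the keyring entry.

    Hypothetical helper for illustration. Returns None when no key is found.
    """
    if explicit_key:
        return explicit_key
    try:
        # Imported lazily so the explicit-key path works without keyring.
        import keyring
        from keyring.errors import KeyringError
    except ImportError:
        return None
    try:
        return keyring.get_password(service, username)
    except KeyringError:
        # For example, no usable keyring backend on this machine.
        return None


if __name__ == "__main__":
    key = resolve_api_key()
    print("API key found" if key else "No API key configured")
```

To store the key once from Python, `keyring.set_password("login2", "semanticscholar_key", "<your key>")` writes the entry that the sketch reads back.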