Releases: patcon/valency-anndata

v0.3.0

04 Mar 22:26

Added

  • val.pp.filter_participants() — filter participants (rows) by minimum number of statements voted on. Counts non-NaN entries (real votes), correctly treating -1, 0, and +1 as votes.
  • val.pp.filter_statements() — filter statements (columns) by minimum number of participants who voted. Counts non-NaN entries (real votes), correctly treating -1, 0, and +1 as votes.
  • val.datasets.vtaiwan() — load any of four Polis conversations from Taiwan's vTaiwan civic policymaking process, selectable by topic= keyword ("uber", "airbnb", "online_alcohol", "caning"). Topic parameter uses Literal type hint for IDE/notebook autocomplete.
  • val.datasets.american_assembly() — load Polis conversations run by the American Assembly in Kentucky cities, selectable by city= keyword ("bowling_green", "louisville"). City parameter uses Literal type hint for IDE/notebook autocomplete.
  • val.datasets.bg2050() — load the BG 2050 community visioning conversation from Bowling Green and Warren County, Kentucky (~7,900 participants).
  • val.datasets.cuba_protest() — load any of three Polis conversations run around Cuba's planned 15N march (November 2021), selectable by period= keyword ("before_1", "before_2", "after"). Period parameter uses Literal type hint for IDE/notebook autocomplete.
  • val.datasets.japanchoice() — load any of eight Polis conversations from Japan Choice (four policy topics × two election years: 2025 and 2026), selectable by positional topic argument.
  • val.datasets.klimarat() — load any of the five Polis conversations from Austria's Citizens' Climate Council (Klimarat), selectable by topic= keyword.
  • Five Klimarat datasets added to the docs overview table with fingerprints.
  • scripts/generate_fingerprint_heatmap.py — generates a square RdYlGn vote-matrix heatmap from any Polis report URL.
  • docs/api/datasets.yml — machine-readable registry of reference datasets rendered into an overview table.
  • Two reference datasets added to the docs table: Aufstehen and Chile Protests.
  • mkdocs-glightbox — clicking a fingerprint thumbnail opens the full-size image in a lightbox popup.
  • New Labs page in docs.
  • make strip-notebook-widgets — strips ipywidget metadata from notebooks so they render correctly on GitHub.
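
The vote-counting rule behind filter_participants() and filter_statements() can be sketched with plain NumPy. This is only an illustration of "count non-NaN entries, so that 0 (pass) still counts as a vote"; the helper name filter_by_vote_count, its parameters, and the toy matrix are hypothetical and are not the library's API.

```python
import numpy as np

# Toy vote matrix: rows are participants, columns are statements.
# NaN means "no vote"; -1, 0 (pass), and +1 are all real votes.
X = np.array([
    [1.0, 0.0, -1.0, np.nan],
    [np.nan, np.nan, 1.0, np.nan],
    [0.0, 0.0, 0.0, np.nan],
])


def filter_by_vote_count(matrix, min_votes, axis):
    """Keep rows (axis=0) or columns (axis=1) with >= min_votes non-NaN entries.

    Hypothetical helper for illustration only.
    """
    voted = ~np.isnan(matrix)          # True wherever a real vote exists
    counts = voted.sum(axis=1 - axis)  # per-row counts for axis=0, per-column for axis=1
    keep = counts >= min_votes
    return matrix[keep] if axis == 0 else matrix[:, keep]


# Participants with at least 2 votes: the second row (1 vote) is dropped,
# while the all-zero third row is kept because 0 is a real vote.
participants_kept = filter_by_vote_count(X, min_votes=2, axis=0)

# Statements with at least 2 voters: the last, all-NaN column is dropped.
statements_kept = filter_by_vote_count(X, min_votes=2, axis=1)
```

Counting with `~np.isnan(...)` rather than testing for non-zero values is what keeps pass votes from being mistaken for missing data.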

Changed

  • val.preprocessing.impute() now uses sklearn.impute.SimpleImputer for "zero", "mean", and "median" strategies. Adds strategy="knn" backed by sklearn.impute.KNNImputer.
  • val.preprocessing.highly_variable_statements() defaults changed: variance_mode is now "valence", bin_by is now "p_engaged", and n_bins is now 10.

Fixes

  • val.preprocessing.highly_variable_statements() no longer emits RuntimeWarning: Degrees of freedom <= 0 for slice when a statement column has fewer than 2 non-NaN votes.
  • Fixed scaling factors in recipe_polis, which were dividing instead of multiplying.
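
The RuntimeWarning mentioned above is what NumPy emits when a sample variance (ddof=1) is requested for a slice with fewer than two values. One way to avoid it, sketched below, is to compute the variance only for columns with enough votes and leave the rest as NaN. The safe_nanvar helper is hypothetical and is not necessarily how the library fixed the issue.

```python
import warnings

import numpy as np

# Column 0 has three votes, column 1 has one, column 2 has none.
X = np.array([
    [1.0, np.nan, np.nan],
    [-1.0, 1.0, np.nan],
    [1.0, np.nan, np.nan],
])


def safe_nanvar(matrix, ddof=1):
    """Per-column variance that skips columns with too few non-NaN votes.

    Hypothetical helper for illustration only.
    """
    n_votes = (~np.isnan(matrix)).sum(axis=0)
    out = np.full(matrix.shape[1], np.nan)
    enough = n_votes > ddof  # need at least ddof + 1 values
    if enough.any():
        out[enough] = np.nanvar(matrix[:, enough], axis=0, ddof=ddof)
    return out


with warnings.catch_warnings():
    # Turn warnings into errors so a stray RuntimeWarning would fail loudly.
    warnings.simplefilter("error")
    variances = safe_nanvar(X)
```

Columns without enough votes come back as NaN, which downstream code can treat as "dispersion undefined" rather than as zero variance.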

v0.2.0

17 Feb 04:38

Added

  • hf: and huggingface: source prefixes for val.datasets.polis.load() — load any HuggingFace-hosted Polis export as a one-liner, e.g. load("hf:patcon/polis-aufstehen-2018") (#81).
  • CLAUDE.md guidance file for Claude Code contributors (#58).
  • Pytest infrastructure and test suite for datasets.polis.load (#59).
    • 29 unit + local-fixture tests; 4 opt-in live network tests (make test-live).
    • Synthetic and real CSV fixtures checked in under tests/fixtures/.
    • make test and make test-live targets added to Makefile.
  • Unit and integration tests for tools.kmeans (#63).
    • 22 mocked unit tests + 1 real-clustering integration test.
    • 3 k-means++ smoke tests.
  • val.tl.recipe_polis2_statements() — embeds and clusters statements (var axis) via polismath (#44).
    • New polis2 optional-dependency group (pip install valency-anndata[polis2]).
    • 13 unit tests with all polismath helpers mocked.
    • Noise/unassigned cluster labels (-1) in evoc_polis2_top are stored as NA so scanpy renders them as lightgray by default.
    • show_progress=False (the default) now fully silences HF download progress bars and mlx model-load stdout.
    • "Polis 2.0 Pipeline" tutorial added to docs nav.
  • val.preprocessing.highly_variable_statements() — identify highly variable statements in vote matrices (#52).
    • Analogous to scanpy's highly_variable_genes for single-cell data.
    • Supports multiple variance modes (overall, valence, engagement) and binning strategies.
    • key_added parameter allows running multiple times with different settings.
    • val.viz.highly_variable_statements() plotting function for visualizing dispersion metrics.
    • mask_var parameter added to val.tools.recipe_polis(), val.tools.pacmap(), and val.tools.localmap() for filtering statements before dimensionality reduction.
  • val.write() — export AnnData to h5ad with automatic sanitization for webapp compatibility (#57).
    • include parameter for selective export using glob-style "namespace/key" paths (e.g. "obsm/X_*").
  • make lint and make fmt targets for ruff.
  • Claude Code skill for guided Polis conversation exploration (#42).
    • Interactive prompts for projection selection (PaCMAP, LocalMAP, UMAP, t-SNE) and QC annotation selection.
    • Fixed CLI plotting to support multi-color val.viz.embedding() calls.
  • Cache downloaded Polis report files locally for 24 hours using platformdirs (#70).
    • skip_cache parameter on val.datasets.polis.load() to bypass the cache.
    • Smart cache revalidation using last_vote_timestamp from the Polis math endpoint — stale cache is reused without re-fetching when no new votes have been cast (#78).
  • mask_obs parameter on val.tools.kmeans() for clustering a subset of participants (#77).
  • val.datasets.polis.export_csv() — export an AnnData object to Polis CSV format (votes.csv + comments.csv).
  • include_huggingface_metadata parameter on val.datasets.polis.export_csv() — opt-in generation of a HuggingFace dataset card (README.md with YAML frontmatter) alongside the CSV export.
  • show_progress parameter on val.datasets.polis.load() — displays a tqdm progress bar when fetching votes per-participant from the API; auto-detects notebooks vs terminal (#79).
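
The glob-style "namespace/key" selection described for the include parameter of val.write() can be pictured with the standard library's fnmatch module. This is a sketch under assumptions: the select_paths helper and the example key names are hypothetical, and the library's actual matching rules may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical "namespace/key" paths for the contents of an AnnData object.
available = [
    "obs/kmeans",
    "obsm/X_pca",
    "obsm/X_pacmap",
    "uns/statements",
    "var/highly_variable",
]


def select_paths(paths, include):
    """Return the paths matching at least one glob pattern, in original order."""
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in include)]


# Keep every embedding stored under obsm whose key starts with "X_",
# plus the statements table.
selected = select_paths(available, ["obsm/X_*", "uns/statements"])
```

Note that in fnmatch a `*` also matches `/`, so a pattern such as `*` alone would select every path regardless of namespace.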

Fixes

  • Fixed uns["statements"] having comment-id as both index and column, which prevented h5ad serialization (#57).
  • Fixed API vote sign inversion — the Polis API returns inverted vote signs vs the CSV export convention; votes are now negated on ingest so +1 = agree and -1 = disagree everywhere.
  • Replaced deprecated use_highly_variable=False with mask_var=None in recipe_polis PCA call to eliminate FutureWarning from scanpy (#82).
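
The vote sign fix amounts to negating API votes on ingest so that they match the CSV convention. A minimal sketch, assuming votes arrive as a float array with NaN for missing votes; the normalize_api_votes helper is hypothetical and not part of the library.

```python
import numpy as np

# Votes as returned by the API, whose signs are inverted relative to the
# CSV export convention (per the fix described above). NaN means "no vote".
api_votes = np.array([1.0, -1.0, 0.0, np.nan])


def normalize_api_votes(votes):
    """Negate API votes so +1 = agree and -1 = disagree (hypothetical helper).

    Adding 0.0 turns the -0.0 produced by negating a pass vote back into
    plain 0.0; NaN entries stay NaN.
    """
    return -votes + 0.0


normalized = normalize_api_votes(api_votes)
```

Pass votes (0) and missing votes (NaN) are unaffected by the negation, so only agree and disagree swap.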

v0.1.1

20 Jan 07:56

Fixes

  • Fixed the README image so that it is no longer broken on PyPI.

v0.1.0

20 Jan 07:47
b981b1b

Initial release includes:

  • val.viz.schematic_diagram() helper for showing data structure and visual diffs.
  • Helper methods for PaCMAP and LocalMAP dimensionality reduction.
  • Langevitour visualisation for exploring high-dimensional space.
  • Basic Jupyter Scatter support for up to 1M participants, including animations.
  • Import of Polis conversation data.
  • Basic Polis v1 pipeline support.
  • val.tools.kmeans() for k-means clustering.
  • Large reference datasets for Aufstehen political party consultation (33k participants) and the #ChileDesperto protest (3k).
  • val.viz.voter_vignette_widget() for exploring data stories of random individuals.
  • Comprehensive documentation website.
  • Wrappers for various scanpy methods.