Add pandas 3.0 support #294

Open
MaykThewessen wants to merge 3 commits into PyPSA:master from MaykThewessen:fix/pandas3-compat


MaykThewessen commented Jun 12, 2026


Summary

Lifts the dependency pin from pandas<3 to pandas<4 and fixes every breakage that surfaces once pandas 3 is installed. Verified end-to-end under pandas 3.0.3 with a JDK present: the full test suite (incl. the Java-gated Duke aggregate + matching paths) passes, and still passes under pandas 2.x.

Several of these breakages sit behind the Java-gated tests, so they are invisible in a JDK-less environment. They were found by running the suite with OpenJDK on PATH.

Root cause

pandas 3.0 makes the NA-preserving string dtype the default and removes several long-deprecated knobs. Three patterns recur:

  • Series.astype(str) now preserves NA as a float NaN instead of the string "nan", breaking downstream str.join / unidecode / astype(int).
  • pd.option_context("future.no_silent_downcasting", True) raises (option removed; behaviour is now default).
  • groupby(..., as_index=False).apply(...) re-inserts the grouping column.
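A minimal sketch of the first pattern and of the two guards this PR applies to it: fillna("") before astype(str) (as in clean_name) and per-item str() before ','.join (as in add_geoposition_for_duke). The helper names clean_names and join_coords and the sample values are hypothetical, not the PR's code, and unidecode is left out so the snippet needs only pandas.

```python
import pandas as pd

# Hypothetical helpers illustrating the pattern; not the PR's actual code.


def clean_names(names: pd.Series) -> pd.Series:
    # Fill NA *before* converting: on pandas 3, astype(str) keeps a missing
    # value as float NaN, so later string functions (unidecode, .encode, ...)
    # would receive a float. Filling first gives a real str on 2.x and 3.x.
    return names.fillna("").astype(str).str.strip()


def join_coords(row: pd.Series) -> str:
    # Stringify each item: a missing value becomes "nan" instead of reaching
    # ",".join as a float, which would raise TypeError.
    return ",".join(map(str, row))


names = pd.Series(["  Plant A", None, "Plant B  "])
print(clean_names(names).tolist())  # ['Plant A', '', 'Plant B']

coords = pd.DataFrame({"lat": [52.1, None], "lon": [4.3, None]}).astype(str)
print(coords.apply(join_coords, axis=1).tolist())  # ['52.1,4.3', 'nan,nan']
```

Both helpers behave the same on pandas 2.x and 3.x because neither relies on astype(str) having already turned NaN into the string "nan".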

Fixes

| Location | Symptom | Fix |
|---|---|---|
| cleaning.clean_name | GEO, test_reduced_retrieval: AttributeError: 'float' object has no attribute 'encode' | fillna("") before astype(str) |
| data.BEYONDCOAL | option_context raises | version-gate the context (nullcontext() on pandas 3) |
| data.MASTR | low-memory parser IndexError with usecols; Postleitzahl astype(int) on NA | low_memory=False; fillna("0") |
| duke.add_geoposition_for_duke | ','.join hits float NaN | stringify per item (map(str, s)) |
| cleaning.aggregate_units | second option_context; deprecated copy= kwarg | version-gate; drop copy= (no-op under CoW) |
| matching.best_matches | groupby.apply(as_index=False) KeyError on grouping column | vectorised idxmax (behaviour-identical, faster, no apply) |

All changes are no-ops on pandas 2.x (version-gated contexts; fillna where NaN already stringified to "nan").

🤖 Generated with Claude Code

- clean_name: astype(str) preserves NA under the pandas 3 string dtype,
  so unidecode received float NaN for unnamed plants (GEO, reduced
  retrieval). fillna("") before the conversion.
- BEYONDCOAL: the future.no_silent_downcasting option was removed in
  pandas 3 (its behaviour is now the default); requesting it raises.
  Guard the option_context behind a version check.
- MASTR: on pandas 3 the chunked low-memory parser crashes when usecols
  is given (IndexError in the DtypeWarning column lookup); read with
  low_memory=False. For Postleitzahl, astype(str) now preserves NA, which
  breaks the final astype(int); insert fillna("0") between the two. The
  outcome is identical on pandas 2, where NaN already became the string
  "nan" -> "000".

All previously failing non-Duke tests pass under pandas 3.0.3
(test_data, test_plots, test_aggregate[GEO/BEYONDCOAL/MASTR],
test_reduced_retrieval); behaviour unchanged under pandas 2.x.
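The MASTR changes in a small, self-contained form. The inline CSV and every column name except Postleitzahl are invented for illustration. One deliberate difference from the commit: the PR places fillna("0") between astype(str) and astype(int), which on pandas 2 relies on a step not shown here that turns "nan" into "000"; this sketch fills before the conversion instead, so it gives the same integers on both versions without that step.

```python
import io

import pandas as pd

# Hypothetical miniature export: only Postleitzahl is a real column name here.
raw = io.StringIO(
    "unit_id,Postleitzahl,capacity_kw\n"
    "SEE1,01067,100\n"
    "SEE2,,250\n"
    "SEE3,80331,75\n"
)

units = pd.read_csv(
    raw,
    usecols=["unit_id", "Postleitzahl"],
    dtype={"Postleitzahl": str},  # read postcodes as text
    low_memory=False,  # parse in one pass, not in internal low-memory chunks
)

# Fill the missing postcode before any string conversion, so neither
# pandas 2 ("nan") nor pandas 3 (NaN kept) reaches astype(int) with a non-digit.
postcodes = units["Postleitzahl"].fillna("0").astype(str).astype(int)
print(postcodes.tolist())  # [1067, 0, 80331]
```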
…ing paths

Relax the dependency pin from pandas<3 to pandas<4 and fix the four
remaining breakages that surface once pandas 3 is installed (these sit
behind the Java-gated Duke aggregate/matching tests, so they are not
visible without a JDK on PATH):

- duke.add_geoposition_for_duke: astype(str) preserves NA on pandas 3,
  so ','.join hit a float NaN. Stringify per item (map(str, s)) so NaN
  becomes 'nan', keeping the 'nan,nan' -> NaN cleanup valid.
- cleaning.aggregate_units: second future.no_silent_downcasting
  option_context (the twin of the BEYONDCOAL one) version-gated; drop
  the deprecated copy= kwarg from infer_objects (no-op under CoW).
- matching.best_matches: groupby(...).apply with as_index=False
  re-inserts the grouping column on pandas 3, absent from the per-group
  Series -> KeyError. Rewrote with a vectorised idxmax (behaviour-
  identical, faster, no apply).

Full test suite (incl. Duke aggregate + matching) passes under both
pandas 2.x and 3.0.3.
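The best_matches rewrite can be illustrated on a toy table. The columns left, right and score are hypothetical; the PR's actual frame and grouping keys are not shown here. Instead of groupby(...).apply(...) returning the best row per group, the sketch takes the index label of each group's maximum with idxmax and selects those rows with .loc, so the as_index=False column re-insertion never comes into play.

```python
import pandas as pd

# Hypothetical candidate links: each left-hand id may match several
# right-hand ids with a similarity score.
links = pd.DataFrame(
    {
        "left": [1, 1, 2, 2, 3],
        "right": ["a", "b", "c", "d", "e"],
        "score": [0.4, 0.9, 0.7, 0.2, 0.5],
    }
)

# Vectorised: idxmax gives, per group, the index label of the highest score;
# .loc then selects those rows. No per-group Python function and no apply.
best = links.loc[links.groupby("left")["score"].idxmax()].reset_index(drop=True)
print(best["right"].tolist())  # ['b', 'c', 'e']
```

idxmax keeps the first row on ties, and the .loc lookup assumes unique index labels and at least one non-missing score per group; a frame with a duplicated index would need reset_index() first.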
MaykThewessen changed the title from "Fix pandas 3.0 incompatibilities in GEO, BEYONDCOAL and MASTR loaders" to "Add pandas 3.0 support" on Jun 13, 2026
MaykThewessen (Author)

Gentle nudge: still mergeable against current master, and the full suite passes under both pandas 2.x and 3.0.3. With pandas 3.0 now released, downstream installs (PyPSA-Eur and friends) will start hitting the pandas<3 pin. Could a maintainer take a look when there's a window? Happy to rebase or split the diff if that eases review.

FabianHofmann (Contributor) left a comment

Thanks @MaykThewessen, happy to merge this. Could you remove the inline comments, please? They make sense for reviewing now, but not in the production code.

Per maintainer review on PyPSA#294: drop the explanatory comments added for
review. Code changes are unchanged.
MaykThewessen (Author)

Thanks @FabianHofmann! Removed the inline comments in b33de9c. The code changes are unchanged (only the explanatory comments were dropped), so it's still verified end-to-end under pandas 3.0.3 and 2.x. Ready to merge whenever you are.
