Add pandas 3.0 support #294
- clean_name: astype(str) preserves NA under the pandas 3 string dtype,
so unidecode received float NaN for unnamed plants (GEO, reduced
retrieval). fillna("") before the conversion.
- BEYONDCOAL: the future.no_silent_downcasting option was removed in
pandas 3 (its behaviour is now the default); requesting it raises.
Guard the option_context behind a version check.
- MASTR: the chunked low-memory parser crashes with usecols on
pandas 3 (IndexError in the DtypeWarning column lookup); read with
low_memory=False. Postleitzahl astype(str) preserves NA, breaking the
final astype(int); fillna("0") in between, identical outcome on
pandas 2 where NaN already became the string "nan" -> "000".
All previously failing non-Duke tests pass under pandas 3.0.3
(test_data, test_plots, test_aggregate[GEO/BEYONDCOAL/MASTR],
test_reduced_retrieval); behaviour unchanged under pandas 2.x.
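For readers unfamiliar with the NA-preserving string dtype, the fill-before-convert pattern used in clean_name and MASTR can be sketched as follows. This is an illustration only, not the code from the PR: the series contents and variable names are made up, and the postcode example skips the intermediate steps of the real MASTR pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical plant names with one unnamed plant (illustrative data only).
names = pd.Series(["Kraftwerk A", np.nan, "Plant B"], dtype=object)

# Filling first means every element is a real Python string afterwards,
# regardless of how astype(str) treats missing values on the installed pandas.
cleaned = names.fillna("").astype(str)

# Hypothetical postcode column, loosely mirroring the Postleitzahl case:
# a missing value would otherwise make the final integer cast fail.
postcodes = pd.Series(["10115", np.nan, "80331"], dtype=object)
as_int = postcodes.fillna("0").astype(int)
```

Because the fill happens before the conversion, the result does not depend on whether the installed pandas turns NaN into the string "nan" or keeps it missing.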
…ing paths

Relax the dependency pin from pandas<3 to pandas<4 and fix the four remaining breakages that surface once pandas 3 is installed (these sit behind the Java-gated Duke aggregate/matching tests, so they are not visible without a JDK on PATH):
- duke.add_geoposition_for_duke: astype(str) preserves NA on pandas 3, so ','.join hit a float NaN. Stringify per item (map(str, s)) so NaN becomes 'nan', keeping the 'nan,nan' -> NaN cleanup valid.
- cleaning.aggregate_units: second future.no_silent_downcasting option_context (the twin of the BEYONDCOAL one) version-gated; drop the deprecated copy= kwarg from infer_objects (no-op under CoW).
- matching.best_matches: groupby(...).apply with as_index=False re-inserts the grouping column on pandas 3, absent from the per-group Series -> KeyError. Rewrote with a vectorised idxmax (behaviour-identical, faster, no apply).
Full test suite (incl. Duke aggregate + matching) passes under both pandas 2.x and 3.0.3.
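The idxmax rewrite mentioned for matching.best_matches is not shown in this thread. A minimal sketch of the general technique, assuming a hypothetical match table with made-up column names (plant, candidate, score), might look like this:

```python
import pandas as pd

# Hypothetical match table; column names and values are illustrative only.
matches = pd.DataFrame(
    {
        "plant": ["a", "a", "b", "b", "b"],
        "candidate": ["x", "y", "x", "y", "z"],
        "score": [0.4, 0.9, 0.7, 0.2, 0.5],
    }
)

# Per group, idxmax returns the index label of the row with the highest score.
best_rows = matches.groupby("plant")["score"].idxmax()

# Selecting those labels keeps every original column, so nothing depends on
# how groupby.apply handles the grouping column.
best = matches.loc[best_rows].reset_index(drop=True)
```

Since no per-group function is called, the result has the same columns on pandas 2.x and 3.x.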
Gentle nudge: still mergeable against current |
FabianHofmann
left a comment
thanks @MaykThewessen, happy to merge this. could you remove the inline comments please? they make sense for reviewing now, but not in the production code
Per maintainer review on PyPSA#294: drop the explanatory comments added for review. Code changes are unchanged.
Thanks @FabianHofmann! Removed the inline comments in b33de9c. The code changes are unchanged (only the explanatory comments were dropped), so it's still verified end-to-end under pandas 3.0.3 and 2.x. Ready to merge whenever you are. |
Summary
Lifts the dependency pin from `pandas<3` to `pandas<4` and fixes every breakage that surfaces once pandas 3 is installed. Verified end-to-end under pandas 3.0.3 with a JDK present: the full test suite (incl. the Java-gated Duke aggregate + matching paths) passes, and still passes under pandas 2.x.

Several of these breakages sit behind the Java-gated tests, so they are invisible in a JDK-less environment. They were found by running the suite with OpenJDK on PATH.
Root cause
pandas 3.0 makes the NA-preserving string dtype the default and removes several long-deprecated knobs. Three patterns recur:
- `Series.astype(str)` now preserves `NA` as a float NaN instead of the string `"nan"`, breaking downstream `str.join` / `unidecode` / `astype(int)`.
- `pd.option_context("future.no_silent_downcasting", True)` raises (option removed; behaviour is now default).
- `groupby(..., as_index=False).apply(...)` re-inserts the grouping column.

Fixes
| Location | Symptom | Fix |
| --- | --- | --- |
| `cleaning.clean_name` | `test_reduced_retrieval`: `AttributeError: 'float' has no attribute 'encode'` | `fillna("")` before `astype(str)` |
| `data.BEYONDCOAL` | `option_context` raises | version-gated (`nullcontext()` on pandas 3) |
| `data.MASTR` | `IndexError` with `usecols`; `Postleitzahl` `astype(int)` on NA | `low_memory=False`; `fillna("0")` |
| `duke.add_geoposition_for_duke` | `','.join` hits float NaN | stringify per item (`map(str, s)`) |
| `cleaning.aggregate_units` | second `option_context`; deprecated `copy=` kwarg | version-gated; drop `copy=` (no-op under CoW) |
| `matching.best_matches` | `groupby.apply(as_index=False)` `KeyError` on grouping column | vectorised `idxmax` (behaviour-identical, faster, no `apply`) |

All changes are no-ops on pandas 2.x (version-gated contexts; `fillna` where NaN already stringified to `"nan"`).

🤖 Generated with Claude Code
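The version-gated contexts mentioned above are not reproduced in this thread. One way such a guard could look is sketched here; the helper name downcasting_context is hypothetical, and the lower bound rests on the assumption that the option was only introduced in pandas 2.2.

```python
from contextlib import nullcontext

import pandas as pd


def downcasting_context():
    """Hypothetical helper: request the option only where it is assumed to exist."""
    major, minor = (int(part) for part in pd.__version__.split(".")[:2])
    # Assumption: the option exists on pandas 2.2.x only. On pandas 3 the
    # behaviour is the default and requesting the removed option raises.
    if (2, 2) <= (major, minor) < (3, 0):
        return pd.option_context("future.no_silent_downcasting", True)
    return nullcontext()


# Usage: the body runs unchanged on either major version.
with downcasting_context():
    filled = pd.Series([1.0, None, 3.0], dtype=object).fillna(0)
```

Returning `nullcontext()` keeps the call site identical on both versions, which is what makes the change a no-op on pandas 2.x.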