
feat: add kd_tree option for O(N log N) neighbor search in kNN estimators#12

Open
iPopovS wants to merge 5 commits into Center-For-Complex-Systems-Science:main from iPopovS:main



iPopovS commented Jun 4, 2026


What this does

Adds an optional kd_tree=False parameter to discover_network, conditional_mutual_information, knn_mutual_information, and geometric_knn_mutual_information. When set to True, neighbor searches use scipy.spatial.KDTree instead of a full pairwise distance matrix.
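For reviewers, the two neighbor-search paths can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name kth_neighbor_distance is hypothetical, and the Euclidean metric is an assumption (the estimators may use a different norm).

```python
import numpy as np
from scipy.spatial import KDTree
from scipy.spatial.distance import cdist


def kth_neighbor_distance(points, k, kd_tree=False):
    """Distance from each point to its k-th nearest neighbor, self excluded.

    Hypothetical helper for illustration; Euclidean metric assumed.
    """
    points = np.asarray(points, dtype=float)
    if kd_tree:
        # Ask for k + 1 neighbors: the nearest hit for each point is itself.
        dist, _ = KDTree(points).query(points, k=k + 1)
        return dist[:, k]
    # Brute force: materialize the full N x N distance matrix.
    full = cdist(points, points)
    # After partitioning, index k holds the (k + 1)-th smallest value in each
    # row; the smallest is the zero self-distance, so this is the k-th neighbor.
    return np.partition(full, k, axis=1)[:, k]


rng = np.random.default_rng(0)
sample = rng.normal(size=(200, 3))
brute = kth_neighbor_distance(sample, k=3)
tree = kth_neighbor_distance(sample, k=3, kd_tree=True)
print(np.allclose(brute, tree))
```

Both branches return the same distances up to floating-point rounding; only the cost of obtaining them differs.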

Why

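The brute-force cdist approach builds the full N×N distance matrix, so it is O(N²) in both memory and time. With a KD-tree, building the tree is roughly O(N log N) and each neighbor query is O(log N) on average in low dimensions, so searching all N points is O(N log N) overall. The gain grows with N (see the benchmark results below). In high-dimensional data, KD-tree queries can degrade toward brute-force cost.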
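To put the memory side in concrete terms, here is a quick back-of-the-envelope calculation. It assumes the distance matrix is stored as float64 (8 bytes per entry); the function name is hypothetical.

```python
def pairwise_matrix_bytes(n, itemsize=8):
    """Bytes needed for a dense n x n distance matrix (float64 assumed)."""
    return n * n * itemsize


for n in (800, 10_000, 100_000):
    print(f"N={n}: {pairwise_matrix_bytes(n) / 1e9:.3f} GB")
```

Under that assumption, N=800 needs about 5 MB, N=10,000 about 0.8 GB, and N=100,000 about 80 GB, so the brute-force path stops fitting in memory well before it becomes too slow.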

Backward compatibility

kd_tree=False is the default, so all existing code is unaffected.

Testing

  • Added tests/test_kdtree.py with correctness tests (kd_tree=True vs False produce numerically identical results) and a benchmark showing speedup.
  • All existing tests continue to pass.

Benchmark results

N=300: brute=0.003s kd_tree=0.003s speedup=0.9x

N=800: brute=0.021s kd_tree=0.006s speedup=3.5x
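A timing comparison of this kind can be reproduced with a sketch like the one below. It is not the benchmark in tests/test_kdtree.py: it times only the raw neighbor search rather than the estimators, and the function name, dimensionality, and value of k are assumptions. Absolute numbers will vary by machine.

```python
import time

import numpy as np
from scipy.spatial import KDTree
from scipy.spatial.distance import cdist


def time_neighbor_search(n, k=3, dim=2, repeats=5, seed=0):
    """Return best-of-`repeats` wall times (brute, kd_tree) in seconds."""
    points = np.random.default_rng(seed).normal(size=(n, dim))

    def brute():
        # Full pairwise matrix, then the k-th neighbor distance per row.
        np.partition(cdist(points, points), k, axis=1)[:, k]

    def tree():
        # Tree construction is included in the timing on purpose.
        KDTree(points).query(points, k=k + 1)

    def best(fn):
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        return min(times)

    return best(brute), best(tree)


for n in (300, 800):
    brute_s, tree_s = time_neighbor_search(n)
    print(f"N={n}: brute={brute_s:.3f}s kd_tree={tree_s:.3f}s "
          f"speedup={brute_s / tree_s:.1f}x")
```

Taking the minimum over several repeats reduces noise from other processes, which matters at these sub-10 ms timings.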

Pre-existing test failures

TestGeometricKnnConditionalMutualInformation::test_geometric_knn_cmi_no_conditioning fails on the unmodified original code. The test compares geometric_knn_conditional_mutual_information(X, Y, Z=None, k=3) against geometric_knn_mutual_information(X, Y, k=1). Because the two calls use different k values, the assertion fails regardless of this PR's changes, so the failure is not introduced here.
