sissa-data-science/DADApy

DADApy is a Python package for the characterization of manifolds in high-dimensional spaces.

Homepage

For more details and tutorials, visit the homepage at: https://dadapy.readthedocs.io/

Quick Example

import numpy as np
from dadapy.data import Data

# Generate a simple 3D gaussian dataset
X = np.random.normal(0, 1, (1000, 3))

# initialize the "Data" class with the set of coordinates
data = Data(X)

# compute distances up to the 100th nearest neighbor
data.compute_distances(maxk=100)

# compute the intrinsic dimension using the Two-NN estimator
id_twonn, id_error, id_distance = data.compute_id_2NN()

# compute the intrinsic dimension up to the 64th nearest neighbors using Gride
id_gride_list, id_error_list, id_distance_list = data.return_id_scaling_gride(range_max=64)

# compute the density using PAk, a point adaptive kNN estimator
log_den, log_den_error = data.compute_density_PAk()

# find the peaks of the density profile through the ADP algorithm
cluster_assignment = data.compute_clustering_ADP()

# compute the neighborhood overlap with another dataset
X2 = np.random.normal(0, 1, (1000, 5))
overlap_x2 = data.return_data_overlap(X2)

# compute the information imbalance with another dataset
ii_x2 = data.return_information_imbalance(X2)

# compute the neighborhood overlap with a set of labels
labels = np.repeat(np.arange(10), 100)
overlap_labels = data.return_label_overlap(labels, k=10)
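To give an idea of what an intrinsic dimension estimate involves, here is a minimal NumPy sketch of the Two-NN idea (Facco et al., 2017). It is an illustration only, not DADApy's implementation: it assumes the maximum-likelihood form of the estimator, uses brute-force distances, and the helper name `two_nn_id` is hypothetical.

```python
import numpy as np


def two_nn_id(X):
    """Maximum-likelihood Two-NN estimate of the intrinsic dimension (illustrative helper)."""
    # Brute-force pairwise Euclidean distances (fine for a few thousand points)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Distances to the first and second nearest neighbor of every point
    two_smallest = np.partition(dist, 1, axis=1)[:, :2]
    r1, r2 = two_smallest[:, 0], two_smallest[:, 1]

    # The ratio mu = r2 / r1 follows a Pareto law whose exponent is the
    # intrinsic dimension d, so the maximum-likelihood estimate is N / sum(log mu)
    mu = r2 / r1
    return len(mu) / np.log(mu).sum()


rng = np.random.default_rng(0)
X = rng.normal(0, 1, (1000, 3))
print(f"Two-NN estimate: {two_nn_id(X):.2f}")
```

For the 3D Gaussian dataset above the estimate should land close to 3. The sketch assumes there are no duplicated points, since a zero nearest-neighbor distance would make the ratio undefined.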

The Data class is just a container of classes. If you only need to work with a specific module,
you can equivalently import it directly.

import numpy as np
from dadapy import IdEstimation

# Generate a simple 3D gaussian dataset
X = np.random.normal(0, 1, (1000, 3))

# initialize the "IdEstimation" class with the set of coordinates
ie = IdEstimation(X)

# compute the intrinsic dimension up to the 64th nearest neighbors using Gride
id_list, id_error_list, id_distance_list = ie.return_id_scaling_gride(range_max=64)

This makes it more natural to work with data comparison methods.

import numpy as np
from dadapy import NeighborhoodOverlap

# Generate a 3D and a 5D Gaussian dataset, plus a set of labels
X = np.random.normal(0, 1, (1000, 3))
X2 = np.random.normal(0, 1, (1000, 5))
labels = np.repeat(np.arange(10), 100)

# compute the neighborhood overlap with another dataset
no = NeighborhoodOverlap(X, X2)
overlap_x2 = no.return_data_overlap()

# compute the neighborhood overlap with a set of labels
no = NeighborhoodOverlap(X, labels=labels)
overlap_labels = no.return_label_overlap(k=10)
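For readers who want to see what the two overlap quantities measure, the following NumPy sketch computes them by brute force. It is a simplified illustration under our own assumptions (Euclidean distances, a fixed `k`, and the hypothetical helper names `knn_mask`, `data_overlap` and `label_overlap`), not the code used inside DADApy.

```python
import numpy as np


def knn_mask(X, k):
    """Boolean (N, N) matrix: entry (i, j) is True if j is one of the k nearest neighbors of i."""
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff**2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    idx = np.argpartition(dist, k - 1, axis=1)[:, :k]
    mask = np.zeros(dist.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask


def data_overlap(X, Y, k=30):
    """Average fraction of the k nearest neighbors shared by the same point in X and in Y."""
    shared = (knn_mask(X, k) & knn_mask(Y, k)).sum(axis=1)
    return shared.mean() / k


def label_overlap(X, labels, k=30):
    """Average fraction of the k nearest neighbors of a point that carry its own label."""
    same_label = labels[:, None] == labels[None, :]
    return (knn_mask(X, k) & same_label).sum(axis=1).mean() / k


rng = np.random.default_rng(0)
X = rng.normal(0, 1, (500, 3))
X2 = rng.normal(0, 1, (500, 5))
labels = np.repeat(np.arange(10), 50)

print("overlap with itself:      ", data_overlap(X, X, k=10))
print("overlap with unrelated X2:", data_overlap(X, X2, k=10))
print("overlap with labels:      ", label_overlap(X, labels, k=10))
```

Both quantities lie between 0 and 1. Two unrelated datasets, or labels unrelated to the coordinates, give values near the chance level, while a representation compared with itself gives exactly 1.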

Currently implemented algorithms

  • Intrinsic dimension estimators

      • Two-NN estimator
        Facco et al., Scientific Reports (2017)

      • Gride estimator
        Denti et al., Scientific Reports (2022)

      • I3D estimator (for both continuous and discrete spaces)
        Macocco et al., Physical Review Letters (2023)

      • BID estimator
        Acevedo et al., Nature Communications Physics (2025)

  • Density estimators

      • kNN estimator

      • k*NN estimator (kNN with an adaptive choice of k)

      • PAk estimator
        Rodriguez et al., JCTC (2018)

      • point-adaptive mean-shift gradient estimator
        Carli et al., arXiv (2024)

      • BMTI estimator
        Carli et al., arXiv (2024)

  • Density peaks clustering methods

      • Density peaks clustering
        Rodriguez and Laio, Science (2014)

      • Advanced density peaks clustering
        d’Errico et al., Information Sciences (2021)

      • k-peak clustering
        Sormani, Rodriguez and Laio, JCTC (2020)

  • Manifold comparison tools

      • Neighbourhood overlap
        Doimo et al., NeurIPS (2020)

      • Information imbalance
        Glielmo et al., PNAS Nexus (2022)

  • Feature selection and weighting tool

      • Differentiable Information Imbalance
        Wild et al., Nature Communications (2025)

  • Causal analysis tools

      • Imbalance Gain
        Del Tatto et al., PNAS (2024)

      • Community causal graph
        Allione et al., arXiv (2025)
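As an illustration of one of the manifold comparison tools listed above, the sketch below computes the information imbalance from a space A to a space B in its simplest form: the average rank, in B, of each point's nearest neighbor in A, rescaled by 2/N. This is a brute-force NumPy sketch under our own assumptions (Euclidean distances, first nearest neighbor only); the helper names are hypothetical and it is not DADApy's implementation.

```python
import numpy as np


def pairwise_sq_dist(X):
    """Squared Euclidean distances, with the diagonal set to infinity."""
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff**2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)
    return dist


def information_imbalance(A, B):
    """Imbalance from A to B: close to 0 if A predicts B, close to 1 if it does not."""
    n = len(A)
    dist_a, dist_b = pairwise_sq_dist(A), pairwise_sq_dist(B)

    # Nearest neighbor of every point in space A
    nn_a = dist_a.argmin(axis=1)

    # Rank of every point j around every point i in space B (1 = nearest)
    ranks_b = dist_b.argsort(axis=1).argsort(axis=1) + 1

    # Average rank in B of the nearest neighbor found in A, rescaled by 2 / N
    return 2.0 / n * ranks_b[np.arange(n), nn_a].mean()


rng = np.random.default_rng(0)
full = rng.normal(0, 1, (500, 3))  # three informative coordinates
partial = full[:, :1]              # only the first coordinate

print("full -> partial:", information_imbalance(full, partial))
print("partial -> full:", information_imbalance(partial, full))
```

The two directions are generally different, and this asymmetry is what makes the quantity useful: the lower value indicates which space is more informative about the other.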

Installation

The package is compatible with Python 3.10, 3.11, 3.12, 3.13, and 3.14. Only Unix-based systems, including Linux and macOS, are currently supported. For Windows machines, we suggest using the Windows Subsystem for Linux (WSL).

The package requires numpy, scipy, scikit-learn, jax, and jaxlib, plus matplotlib for the visualizations.

The package contains Cython-generated C extensions that are automatically compiled during installation.

The latest release is available through pip:

pip install dadapy

To install the latest development version, install it directly from the GitHub repository with pip:

pip install git+https://github.com/sissa-data-science/DADApy

Alternatively, if you'd like to modify the implementation of some function locally, you can download the repository and install the package with:

git clone https://github.com/sissa-data-science/DADApy.git
cd DADApy
python setup.py build_ext --inplace
pip install .
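Whichever installation route you choose, a quick way to confirm that the package and its dependencies are visible to your interpreter is to query the installed versions. The snippet below is a small convenience sketch using only the standard library; the helper name `installed_version` is ours and is not part of DADApy.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used by pip
packages = ("dadapy", "numpy", "scipy", "scikit-learn", "jax", "jaxlib", "matplotlib")
for name in packages:
    print(f"{name:<14}{installed_version(name) or 'not installed'}")
```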

The methods of the classes DiffImbalance and CausalGraph can be run on a GPU, using a suitable installation of JAX on a GPU platform. The code has been tested using JAX v0.4.30 with CUDA 12, which can be installed with:

pip install --upgrade "jax[cuda12_pip]==0.4.30" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

For more information on the installation of the JAX library on GPUs, see the official repository.
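To check whether JAX actually sees a GPU after installation, you can inspect the backend and the device list it reports. This is a minimal sketch; the helper name `describe_jax_backend` is hypothetical, and the exact device strings depend on your hardware and JAX version.

```python
def describe_jax_backend():
    """Summarize the backend and devices JAX will use, or report that JAX is missing."""
    try:
        import jax
    except ImportError:
        return "JAX is not installed"
    devices = ", ".join(str(d) for d in jax.devices())
    return f"backend={jax.default_backend()}; devices={devices}"


print(describe_jax_backend())
```

If the reported backend is `cpu` on a machine with a GPU, the CUDA-enabled build of JAX is most likely not the one that is installed.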

Citing DADApy

A description of the package is available in the Patterns article referenced below.

Please consider citing it if you find this package useful for your research:

@article{dadapy,
    title = {DADApy: Distance-based analysis of data-manifolds in Python},
    journal = {Patterns},
    pages = {100589},
    year = {2022},
    issn = {2666-3899},
    doi = {https://doi.org/10.1016/j.patter.2022.100589},
    url = {https://www.sciencedirect.com/science/article/pii/S2666389922002070},
    author = {Aldo Glielmo and Iuri Macocco and Diego Doimo and Matteo Carli and Claudio Zeni and Romina Wild and Maria d’Errico and Alex Rodriguez and Alessandro Laio},
    }