Add MedMNIST tutorial for issue #172 #239

Open
Zeesejo wants to merge 1 commit into
frankkramer-lab:master from
Zeesejo:Zeesejo-patch-1

Zeesejo commented Jun 18, 2026

This notebook demonstrates how to use AUCMEDI with the MedMNIST benchmark datasets for medical image classification, including setup, data loading, and model training.

Closes #172

Copilot AI review requested due to automatic review settings June 18, 2026 12:33

Copilot AI left a comment

Pull request overview

Adds a new tutorial notebook demonstrating how to use AUCMEDI with the MedMNIST benchmark (PathMNIST) for medical image classification (issue #172).

Changes:

  • Introduces examples/tutorials/MedMNIST.ipynb covering installation, data loading, AUCMEDI pipeline setup, training, evaluation, and visualization.
  • Demonstrates a DenseNet-based training workflow on PathMNIST.
Comments suppressed due to low confidence (1)

examples/tutorials/MedMNIST.ipynb:2

  • This notebook file is committed as a minified one-line JSON blob (no stable formatting / cell IDs), whereas other notebooks in examples/tutorials/ are pretty-printed with one JSON field per line. The minified format makes reviews and future diffs very hard to read/maintain. Re-save the notebook with standard Jupyter formatting (and ideally clear outputs) before merging.
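The re-save step requested above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's tooling: the helper names `normalize_notebook`, `dump_pretty`, and `resave` are hypothetical, and the one-space indent with sorted keys is an assumption meant to approximate the layout Jupyter writes to disk.

```python
import json


def normalize_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    # Round-tripping through JSON is a cheap deep copy for JSON-compatible data.
    cleaned = json.loads(json.dumps(nb))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def dump_pretty(nb: dict) -> str:
    """Serialize with one JSON field per line (assumed Jupyter-like layout)."""
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"


def resave(path: str) -> None:
    """Rewrite the notebook at `path` in place, pretty-printed and output-free."""
    with open(path, encoding="utf-8") as fh:
        nb = json.load(fh)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(dump_pretty(normalize_notebook(nb)))
```

Calling `resave("examples/tutorials/MedMNIST.ipynb")` would rewrite the file in place. Opening the notebook in Jupyter and saving it again should have a similar effect and also lets Jupyter add its own cell metadata.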
{"cells":[{"cell_type":"markdown","metadata":{},"source":["# Medical Image Classification with MedMNIST\n","\n","This notebook demonstrates how to use **AUCMEDI** with the **MedMNIST** benchmark datasets.\n","\n","MedMNIST is a collection of standardized biomedical image datasets for medical image analysis. We'll use **PathMNIST** (pathology images) to demonstrate AUCMEDI's three pillars: DataInterface, NeuralNetwork, and DataGenerator."]},{"cell_type":"markdown","metadata":{},"source":["## Setup and Installation\n","\n","First, install the medmnist package:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["!pip install medmnist\n","import os\n","os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\""]},{"cell_type":"markdown","metadata":{},"source":["## Import Libraries"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import numpy as np\n","from medmnist import PathMNIST\n","import aucmedi\n","from aucmedi import *"]},{"cell_type":"markdown","metadata":{},"source":["## Load MedMNIST Data\n","\n","We'll use PathMNIST, which contains 107,180 pathology images from 9 tissue classes."]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Download and load PathMNIST\n","train_dataset = PathMNIST(split='train', download=True)\n","val_dataset = PathMNIST(split='val', download=True)\n","test_dataset = PathMNIST(split='test', download=True)\n","\n","print(f'Train: {len(train_dataset)} samples')\n","print(f'Val: {len(val_dataset)} samples')\n","print(f'Test: {len(test_dataset)} samples')"]},{"cell_type":"markdown","metadata":{},"source":["## Convert to AUCMEDI Format\n","\n","AUCMEDI expects data in a specific format. 
We'll convert the MedMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Prepare data for AUCMEDI\n","# Create sample lists and labels\n","train_samples = []\n","train_labels = []\n","for i in range(len(train_dataset)):\n","    train_samples.append(train_dataset[i][0])\n","    train_labels.append(train_dataset[i][1])\n","\n","test_samples = []\n","test_labels = []\n","for i in range(len(test_dataset)):\n","    test_samples.append(test_dataset[i][0])\n","    test_labels.append(test_dataset[i][1])\n","\n","# Convert to numpy arrays\n","train_x = np.array([np.array(img) for img in train_samples])\n","train_y = np.array(train_labels).flatten()\n","test_x = np.array([np.array(img) for img in test_samples])\n","test_y = np.array(test_labels).flatten()\n","\n","print(f'Train images shape: {train_x.shape}')\n","print(f'Train labels shape: {train_y.shape}')\n","print(f'Number of classes: {len(np.unique(train_y))}')"]},{"cell_type":"markdown","metadata":{},"source":["## AUCMEDI's Three Pillars\n","\n","AUCMEDI is built on three pillars:\n","1. **DataInterface**: Handles data loading\n","2. **NeuralNetwork**: Manages the model architecture\n","3. **DataGenerator**: Provides data augmentation and preprocessing"]},{"cell_type":"markdown","metadata":{},"source":["### 1. DataInterface Setup"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Create index lists\n","train_samples_idx = list(range(len(train_x)))\n","test_samples_idx = list(range(len(test_x)))\n","\n","# Initialize DataInterface for in-memory data\n","ds = DataInterface(interface=\"numpy\",\n","                   data_directory=None,\n","                   training_samples=train_samples_idx,\n","                   validation_samples=None,\n","                   test_samples=test_samples_idx)\n","\n","print('DataInterface initialized')"]},{"cell_type":"markdown","metadata":{},"source":["### 2. 
NeuralNetwork Setup\n","\n","We'll use a DenseNet121 architecture pretrained on ImageNet:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Define model\n","model = NeuralNetwork(n_labels=9,\n","                      channels=3,\n","                      architecture=\"DenseNet121\",\n","                      pretrained_weights=True,\n","                      loss=\"categorical_crossentropy\",\n","                      metrics=[\"accuracy\"])\n","\n","print('Model architecture:', model.architecture)\n","print('Input shape:', (28, 28, 3))"]},{"cell_type":"markdown","metadata":{},"source":["### 3. DataGenerator Setup\n","\n","Configure data augmentation and preprocessing:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["from aucmedi.data_processing.subfunctions import Resize\n","\n","# Create subfunctions for preprocessing\n","# Resize images to 224x224 for DenseNet\n","sf_list = [Resize(shape=(224, 224))]\n","\n","# Initialize DataGenerators\n","train_gen = DataGenerator(train_samples_idx,\n","                          path_imagedir=None,\n","                          labels=train_y,\n","                          image_format=\"array\",\n","                          batch_size=32,\n","                          data_aug=None,\n","                          shuffle=True,\n","                          subfunctions=sf_list,\n","                          resize=None,\n","                          standardize_mode=\"tf\",\n","                          grayscale=False,\n","                          sample_weights=None,\n","                          seed=None,\n","                          image_loader=lambda idx: train_x[idx])\n","\n","test_gen = DataGenerator(test_samples_idx,\n","                         path_imagedir=None,\n","                         labels=test_y,\n","                         image_format=\"array\",\n","                         batch_size=32,\n","                         
data_aug=None,\n","                         shuffle=False,\n","                         subfunctions=sf_list,\n","                         resize=None,\n","                         standardize_mode=\"tf\",\n","                         grayscale=False,\n","                         sample_weights=None,\n","                         seed=None,\n","                         image_loader=lambda idx: test_x[idx])\n","\n","print('DataGenerators initialized')"]},{"cell_type":"markdown","metadata":{},"source":["## Training\n","\n","Train the model on PathMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Train model\n","history = model.train(train_gen,\n","                      epochs=10,\n","                      validation_freq=1,\n","                      callbacks=[])\n","\n","print('Training complete!')"]},{"cell_type":"markdown","metadata":{},"source":["## Evaluation\n","\n","Evaluate the model on the test set:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Predict on test set\n","preds = model.predict(test_gen)\n","\n","# Calculate accuracy\n","from sklearn.metrics import accuracy_score, classification_report\n","\n","pred_labels = np.argmax(preds, axis=1)\n","accuracy = accuracy_score(test_y, pred_labels)\n","\n","print(f'Test Accuracy: {accuracy:.4f}')\n","print('\\nClassification Report:')\n","print(classification_report(test_y, pred_labels))"]},{"cell_type":"markdown","metadata":{},"source":["## Visualization\n","\n","Visualize some predictions:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import matplotlib.pyplot as plt\n","\n","# Show sample predictions\n","fig, axes = plt.subplots(2, 5, figsize=(15, 6))\n","axes = axes.ravel()\n","\n","class_names = ['ADI', 'BACK', 'DEB', 'LYM', 'MUC', 'MUS', 'NORM', 'STR', 'TUM']\n","\n","for i in range(10):\n","    axes[i].imshow(test_x[i])\n","    axes[i].set_title(f'True: 
{class_names[test_y[i]]}\\nPred: {class_names[pred_labels[i]]}')\n","    axes[i].axis('off')\n","\n","plt.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{},"source":["## Summary\n","\n","This tutorial demonstrated:\n","- Loading MedMNIST datasets\n","- Converting to AUCMEDI format\n","- Using AUCMEDI's three pillars (DataInterface, NeuralNetwork, DataGenerator)\n","- Training and evaluating on medical imaging data\n","\n","Try experimenting with:\n","- Different MedMNIST datasets (ChestMNIST, DermaMNIST, etc.)\n","- Different architectures (ResNet, EfficientNet, etc.)\n","- Data augmentation techniques\n","- Ensemble methods"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.8.0"}},"nbformat":4,"nbformat_minor":4}
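One point in the notebook above may be worth checking: it passes flattened integer labels (`train_y`, `test_y`) to the generators while the model is configured with `loss="categorical_crossentropy"`, and Keras' categorical cross-entropy expects one-hot targets. A conversion step along these lines could bridge the two. This is a sketch only: `to_one_hot` is a hypothetical helper, and it assumes the generator forwards the label array to the model unchanged.

```python
import numpy as np


def to_one_hot(labels, n_classes: int) -> np.ndarray:
    """Convert integer class labels of shape (N,) or (N, 1) to one-hot rows."""
    flat = np.asarray(labels).reshape(-1).astype(int)
    if flat.size and (flat.min() < 0 or flat.max() >= n_classes):
        raise ValueError("label outside the range [0, n_classes)")
    one_hot = np.zeros((flat.shape[0], n_classes), dtype=np.float32)
    # Set exactly one entry per row: the column given by that row's class index.
    one_hot[np.arange(flat.shape[0]), flat] = 1.0
    return one_hot
```

With the notebook's variables this would read `train_y_onehot = to_one_hot(train_y, 9)`. The evaluation cell can keep comparing `np.argmax(preds, axis=1)` against the original integer `test_y`.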


@@ -0,0 +1 @@
{"cells":[{"cell_type":"markdown","metadata":{},"source":["# Medical Image Classification with MedMNIST\n","\n","This notebook demonstrates how to use **AUCMEDI** with the **MedMNIST** benchmark datasets.\n","\n","MedMNIST is a collection of standardized biomedical image datasets for medical image analysis. We'll use **PathMNIST** (pathology images) to demonstrate AUCMEDI's three pillars: DataInterface, NeuralNetwork, and DataGenerator."]},{"cell_type":"markdown","metadata":{},"source":["## Setup and Installation\n","\n","First, install the medmnist package:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["!pip install medmnist\n","import os\n","os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\""]},{"cell_type":"markdown","metadata":{},"source":["## Import Libraries"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import numpy as np\n","from medmnist import PathMNIST\n","import aucmedi\n","from aucmedi import *"]},{"cell_type":"markdown","metadata":{},"source":["## Load MedMNIST Data\n","\n","We'll use PathMNIST, which contains 107,180 pathology images from 9 tissue classes."]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Download and load PathMNIST\n","train_dataset = PathMNIST(split='train', download=True)\n","val_dataset = PathMNIST(split='val', download=True)\n","test_dataset = PathMNIST(split='test', download=True)\n","\n","print(f'Train: {len(train_dataset)} samples')\n","print(f'Val: {len(val_dataset)} samples')\n","print(f'Test: {len(test_dataset)} samples')"]},{"cell_type":"markdown","metadata":{},"source":["## Convert to AUCMEDI Format\n","\n","AUCMEDI expects data in a specific format. 
We'll convert the MedMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Prepare data for AUCMEDI\n","# Create sample lists and labels\n","train_samples = []\n","train_labels = []\n","for i in range(len(train_dataset)):\n"," train_samples.append(train_dataset[i][0])\n"," train_labels.append(train_dataset[i][1])\n","\n","test_samples = []\n","test_labels = []\n","for i in range(len(test_dataset)):\n"," test_samples.append(test_dataset[i][0])\n"," test_labels.append(test_dataset[i][1])\n","\n","# Convert to numpy arrays\n","train_x = np.array([np.array(img) for img in train_samples])\n","train_y = np.array(train_labels).flatten()\n","test_x = np.array([np.array(img) for img in test_samples])\n","test_y = np.array(test_labels).flatten()\n","\n","print(f'Train images shape: {train_x.shape}')\n","print(f'Train labels shape: {train_y.shape}')\n","print(f'Number of classes: {len(np.unique(train_y))}')"]},{"cell_type":"markdown","metadata":{},"source":["## AUCMEDI's Three Pillars\n","\n","AUCMEDI is built on three pillars:\n","1. **DataInterface**: Handles data loading\n","2. **NeuralNetwork**: Manages the model architecture\n","3. **DataGenerator**: Provides data augmentation and preprocessing"]},{"cell_type":"markdown","metadata":{},"source":["### 1. DataInterface Setup"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Create index lists\n","train_samples_idx = list(range(len(train_x)))\n","test_samples_idx = list(range(len(test_x)))\n","\n","# Initialize DataInterface for in-memory data\n","ds = DataInterface(interface=\"numpy\",\n"," data_directory=None,\n"," training_samples=train_samples_idx,\n"," validation_samples=None,\n"," test_samples=test_samples_idx)\n","\n","print('DataInterface initialized')"]},{"cell_type":"markdown","metadata":{},"source":["### 2. 
NeuralNetwork Setup\n","\n","We'll use a DenseNet121 architecture pretrained on ImageNet:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Define model\n","model = NeuralNetwork(n_labels=9,\n"," channels=3,\n"," architecture=\"DenseNet121\",\n"," pretrained_weights=True,\n"," loss=\"categorical_crossentropy\",\n"," metrics=[\"accuracy\"])\n","\n","print('Model architecture:', model.architecture)\n","print('Input shape:', (28, 28, 3))"]},{"cell_type":"markdown","metadata":{},"source":["### 3. DataGenerator Setup\n","\n","Configure data augmentation and preprocessing:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["from aucmedi.data_processing.subfunctions import Resize\n","\n","# Create subfunctions for preprocessing\n","# Resize images to 224x224 for DenseNet\n","sf_list = [Resize(shape=(224, 224))]\n","\n","# Initialize DataGenerators\n","train_gen = DataGenerator(train_samples_idx,\n"," path_imagedir=None,\n"," labels=train_y,\n"," image_format=\"array\",\n"," batch_size=32,\n"," data_aug=None,\n"," shuffle=True,\n"," subfunctions=sf_list,\n"," resize=None,\n"," standardize_mode=\"tf\",\n"," grayscale=False,\n"," sample_weights=None,\n"," seed=None,\n"," image_loader=lambda idx: train_x[idx])\n","\n","test_gen = DataGenerator(test_samples_idx,\n"," path_imagedir=None,\n"," labels=test_y,\n"," image_format=\"array\",\n"," batch_size=32,\n"," data_aug=None,\n"," shuffle=False,\n"," subfunctions=sf_list,\n"," resize=None,\n"," standardize_mode=\"tf\",\n"," grayscale=False,\n"," sample_weights=None,\n"," seed=None,\n"," image_loader=lambda idx: test_x[idx])\n","\n","print('DataGenerators initialized')"]},{"cell_type":"markdown","metadata":{},"source":["## Training\n","\n","Train the model on PathMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Train model\n","history = model.train(train_gen,\n"," epochs=10,\n"," validation_freq=1,\n"," 
callbacks=[])\n","\n","print('Training complete!')"]},{"cell_type":"markdown","metadata":{},"source":["## Evaluation\n","\n","Evaluate the model on the test set:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Predict on test set\n","preds = model.predict(test_gen)\n","\n","# Calculate accuracy\n","from sklearn.metrics import accuracy_score, classification_report\n","\n","pred_labels = np.argmax(preds, axis=1)\n","accuracy = accuracy_score(test_y, pred_labels)\n","\n","print(f'Test Accuracy: {accuracy:.4f}')\n","print('\\nClassification Report:')\n","print(classification_report(test_y, pred_labels))"]},{"cell_type":"markdown","metadata":{},"source":["## Visualization\n","\n","Visualize some predictions:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import matplotlib.pyplot as plt\n","\n","# Show sample predictions\n","fig, axes = plt.subplots(2, 5, figsize=(15, 6))\n","axes = axes.ravel()\n","\n","class_names = ['ADI', 'BACK', 'DEB', 'LYM', 'MUC', 'MUS', 'NORM', 'STR', 'TUM']\n","\n","for i in range(10):\n"," axes[i].imshow(test_x[i])\n"," axes[i].set_title(f'True: {class_names[test_y[i]]}\\nPred: {class_names[pred_labels[i]]}')\n"," axes[i].axis('off')\n","\n","plt.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{},"source":["## Summary\n","\n","This tutorial demonstrated:\n","- Loading MedMNIST datasets\n","- Converting to AUCMEDI format\n","- Using AUCMEDI's three pillars (DataInterface, NeuralNetwork, DataGenerator)\n","- Training and evaluating on medical imaging data\n","\n","Try experimenting with:\n","- Different MedMNIST datasets (ChestMNIST, DermaMNIST, etc.)\n","- Different architectures (ResNet, EfficientNet, etc.)\n","- Data augmentation techniques\n","- Ensemble methods"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.8.0"}},"nbformat":4,"nbformat_minor":4}
@@ -0,0 +1 @@
{"cells":[{"cell_type":"markdown","metadata":{},"source":["# Medical Image Classification with MedMNIST\n","\n","This notebook demonstrates how to use **AUCMEDI** with the **MedMNIST** benchmark datasets.\n","\n","MedMNIST is a collection of standardized biomedical image datasets for medical image analysis. We'll use **PathMNIST** (pathology images) to demonstrate AUCMEDI's three pillars: DataInterface, NeuralNetwork, and DataGenerator."]},{"cell_type":"markdown","metadata":{},"source":["## Setup and Installation\n","\n","First, install the medmnist package:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["!pip install medmnist\n","import os\n","os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\""]},{"cell_type":"markdown","metadata":{},"source":["## Import Libraries"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import numpy as np\n","from medmnist import PathMNIST\n","import aucmedi\n","from aucmedi import *"]},{"cell_type":"markdown","metadata":{},"source":["## Load MedMNIST Data\n","\n","We'll use PathMNIST, which contains 107,180 pathology images from 9 tissue classes."]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Download and load PathMNIST\n","train_dataset = PathMNIST(split='train', download=True)\n","val_dataset = PathMNIST(split='val', download=True)\n","test_dataset = PathMNIST(split='test', download=True)\n","\n","print(f'Train: {len(train_dataset)} samples')\n","print(f'Val: {len(val_dataset)} samples')\n","print(f'Test: {len(test_dataset)} samples')"]},{"cell_type":"markdown","metadata":{},"source":["## Convert to AUCMEDI Format\n","\n","AUCMEDI expects data in a specific format. 
We'll convert the MedMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Prepare data for AUCMEDI\n","# Create sample lists and labels\n","train_samples = []\n","train_labels = []\n","for i in range(len(train_dataset)):\n"," train_samples.append(train_dataset[i][0])\n"," train_labels.append(train_dataset[i][1])\n","\n","test_samples = []\n","test_labels = []\n","for i in range(len(test_dataset)):\n"," test_samples.append(test_dataset[i][0])\n"," test_labels.append(test_dataset[i][1])\n","\n","# Convert to numpy arrays\n","train_x = np.array([np.array(img) for img in train_samples])\n","train_y = np.array(train_labels).flatten()\n","test_x = np.array([np.array(img) for img in test_samples])\n","test_y = np.array(test_labels).flatten()\n","\n","print(f'Train images shape: {train_x.shape}')\n","print(f'Train labels shape: {train_y.shape}')\n","print(f'Number of classes: {len(np.unique(train_y))}')"]},{"cell_type":"markdown","metadata":{},"source":["## AUCMEDI's Three Pillars\n","\n","AUCMEDI is built on three pillars:\n","1. **DataInterface**: Handles data loading\n","2. **NeuralNetwork**: Manages the model architecture\n","3. **DataGenerator**: Provides data augmentation and preprocessing"]},{"cell_type":"markdown","metadata":{},"source":["### 1. DataInterface Setup"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Create index lists\n","train_samples_idx = list(range(len(train_x)))\n","test_samples_idx = list(range(len(test_x)))\n","\n","# Initialize DataInterface for in-memory data\n","ds = DataInterface(interface=\"numpy\",\n"," data_directory=None,\n"," training_samples=train_samples_idx,\n"," validation_samples=None,\n"," test_samples=test_samples_idx)\n","\n","print('DataInterface initialized')"]},{"cell_type":"markdown","metadata":{},"source":["### 2. 
NeuralNetwork Setup\n","\n","We'll use a DenseNet121 architecture pretrained on ImageNet:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Define model\n","model = NeuralNetwork(n_labels=9,\n"," channels=3,\n"," architecture=\"DenseNet121\",\n"," pretrained_weights=True,\n"," loss=\"categorical_crossentropy\",\n"," metrics=[\"accuracy\"])\n","\n","print('Model architecture:', model.architecture)\n","print('Input shape:', (28, 28, 3))"]},{"cell_type":"markdown","metadata":{},"source":["### 3. DataGenerator Setup\n","\n","Configure data augmentation and preprocessing:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["from aucmedi.data_processing.subfunctions import Resize\n","\n","# Create subfunctions for preprocessing\n","# Resize images to 224x224 for DenseNet\n","sf_list = [Resize(shape=(224, 224))]\n","\n","# Initialize DataGenerators\n","train_gen = DataGenerator(train_samples_idx,\n"," path_imagedir=None,\n"," labels=train_y,\n"," image_format=\"array\",\n"," batch_size=32,\n"," data_aug=None,\n"," shuffle=True,\n"," subfunctions=sf_list,\n"," resize=None,\n"," standardize_mode=\"tf\",\n"," grayscale=False,\n"," sample_weights=None,\n"," seed=None,\n"," image_loader=lambda idx: train_x[idx])\n","\n","test_gen = DataGenerator(test_samples_idx,\n"," path_imagedir=None,\n"," labels=test_y,\n"," image_format=\"array\",\n"," batch_size=32,\n"," data_aug=None,\n"," shuffle=False,\n"," subfunctions=sf_list,\n"," resize=None,\n"," standardize_mode=\"tf\",\n"," grayscale=False,\n"," sample_weights=None,\n"," seed=None,\n"," image_loader=lambda idx: test_x[idx])\n","\n","print('DataGenerators initialized')"]},{"cell_type":"markdown","metadata":{},"source":["## Training\n","\n","Train the model on PathMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Train model\n","history = model.train(train_gen,\n"," epochs=10,\n"," validation_freq=1,\n"," 
callbacks=[])\n","\n","print('Training complete!')"]},{"cell_type":"markdown","metadata":{},"source":["## Evaluation\n","\n","Evaluate the model on the test set:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Predict on test set\n","preds = model.predict(test_gen)\n","\n","# Calculate accuracy\n","from sklearn.metrics import accuracy_score, classification_report\n","\n","pred_labels = np.argmax(preds, axis=1)\n","accuracy = accuracy_score(test_y, pred_labels)\n","\n","print(f'Test Accuracy: {accuracy:.4f}')\n","print('\\nClassification Report:')\n","print(classification_report(test_y, pred_labels))"]},{"cell_type":"markdown","metadata":{},"source":["## Visualization\n","\n","Visualize some predictions:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import matplotlib.pyplot as plt\n","\n","# Show sample predictions\n","fig, axes = plt.subplots(2, 5, figsize=(15, 6))\n","axes = axes.ravel()\n","\n","class_names = ['ADI', 'BACK', 'DEB', 'LYM', 'MUC', 'MUS', 'NORM', 'STR', 'TUM']\n","\n","for i in range(10):\n"," axes[i].imshow(test_x[i])\n"," axes[i].set_title(f'True: {class_names[test_y[i]]}\\nPred: {class_names[pred_labels[i]]}')\n"," axes[i].axis('off')\n","\n","plt.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{},"source":["## Summary\n","\n","This tutorial demonstrated:\n","- Loading MedMNIST datasets\n","- Converting to AUCMEDI format\n","- Using AUCMEDI's three pillars (DataInterface, NeuralNetwork, DataGenerator)\n","- Training and evaluating on medical imaging data\n","\n","Try experimenting with:\n","- Different MedMNIST datasets (ChestMNIST, DermaMNIST, etc.)\n","- Different architectures (ResNet, EfficientNet, etc.)\n","- Data augmentation techniques\n","- Ensemble methods"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.8.0"}},"nbformat":4,"nbformat_minor":4}

Development

Successfully merging this pull request may close these issues.

Application on MedMNIST

2 participants