Add MedMNIST tutorial for issue #172 #239
Open
Zeesejo wants to merge 1 commit into
This notebook demonstrates how to use AUCMEDI with the MedMNIST benchmark datasets for medical image classification, including setup, data loading, and model training.
Pull request overview
Adds a new tutorial notebook demonstrating how to use AUCMEDI with the MedMNIST benchmark (PathMNIST) for medical image classification (issue #172).
Changes:
- Introduces examples/tutorials/MedMNIST.ipynb, covering installation, data loading, AUCMEDI pipeline setup, training, evaluation, and visualization.
- Demonstrates a DenseNet-based training workflow on PathMNIST.
Comments suppressed due to low confidence (1)
examples/tutorials/MedMNIST.ipynb:2
- This notebook file is committed as a minified one-line JSON blob (no stable formatting / cell IDs), whereas other notebooks in examples/tutorials/ are pretty-printed with one JSON field per line. The minified format makes reviews and future diffs very hard to read and maintain. Re-save the notebook with standard Jupyter formatting (and ideally clear outputs) before merging.
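The re-save step requested above can also be scripted. The following is a minimal sketch using only the Python standard library, not the project's own tooling: the helper name `normalize_notebook` is hypothetical, and the one-space indent with sorted keys is an assumption about the layout Jupyter usually writes. It does not add cell IDs; re-saving from Jupyter itself would be the way to get those.

```python
import json
from pathlib import Path


def normalize_notebook(src, dst=None, clear_outputs=True):
    """Rewrite a notebook as indented JSON, optionally clearing code-cell outputs.

    Hypothetical helper for illustration; overwrites `src` when `dst` is None.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src
    notebook = json.loads(src.read_text(encoding="utf-8"))

    if clear_outputs:
        for cell in notebook.get("cells", []):
            if cell.get("cell_type") == "code":
                cell["outputs"] = []
                cell["execution_count"] = None

    # One JSON field per line keeps future diffs reviewable.
    text = json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False)
    dst.write_text(text + "\n", encoding="utf-8")
    return dst
```

Calling `normalize_notebook("examples/tutorials/MedMNIST.ipynb")` would rewrite the file in place; passing a second path writes a copy instead.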
{"cells":[{"cell_type":"markdown","metadata":{},"source":["# Medical Image Classification with MedMNIST\n","\n","This notebook demonstrates how to use **AUCMEDI** with the **MedMNIST** benchmark datasets.\n","\n","MedMNIST is a collection of standardized biomedical image datasets for medical image analysis. We'll use **PathMNIST** (pathology images) to demonstrate AUCMEDI's three pillars: DataInterface, NeuralNetwork, and DataGenerator."]},{"cell_type":"markdown","metadata":{},"source":["## Setup and Installation\n","\n","First, install the medmnist package:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["!pip install medmnist\n","import os\n","os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\""]},{"cell_type":"markdown","metadata":{},"source":["## Import Libraries"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import numpy as np\n","from medmnist import PathMNIST\n","import aucmedi\n","from aucmedi import *"]},{"cell_type":"markdown","metadata":{},"source":["## Load MedMNIST Data\n","\n","We'll use PathMNIST, which contains 107,180 pathology images from 9 tissue classes."]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Download and load PathMNIST\n","train_dataset = PathMNIST(split='train', download=True)\n","val_dataset = PathMNIST(split='val', download=True)\n","test_dataset = PathMNIST(split='test', download=True)\n","\n","print(f'Train: {len(train_dataset)} samples')\n","print(f'Val: {len(val_dataset)} samples')\n","print(f'Test: {len(test_dataset)} samples')"]},{"cell_type":"markdown","metadata":{},"source":["## Convert to AUCMEDI Format\n","\n","AUCMEDI expects data in a specific format. 
We'll convert the MedMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Prepare data for AUCMEDI\n","# Create sample lists and labels\n","train_samples = []\n","train_labels = []\n","for i in range(len(train_dataset)):\n"," train_samples.append(train_dataset[i][0])\n"," train_labels.append(train_dataset[i][1])\n","\n","test_samples = []\n","test_labels = []\n","for i in range(len(test_dataset)):\n"," test_samples.append(test_dataset[i][0])\n"," test_labels.append(test_dataset[i][1])\n","\n","# Convert to numpy arrays\n","train_x = np.array([np.array(img) for img in train_samples])\n","train_y = np.array(train_labels).flatten()\n","test_x = np.array([np.array(img) for img in test_samples])\n","test_y = np.array(test_labels).flatten()\n","\n","print(f'Train images shape: {train_x.shape}')\n","print(f'Train labels shape: {train_y.shape}')\n","print(f'Number of classes: {len(np.unique(train_y))}')"]},{"cell_type":"markdown","metadata":{},"source":["## AUCMEDI's Three Pillars\n","\n","AUCMEDI is built on three pillars:\n","1. **DataInterface**: Handles data loading\n","2. **NeuralNetwork**: Manages the model architecture\n","3. **DataGenerator**: Provides data augmentation and preprocessing"]},{"cell_type":"markdown","metadata":{},"source":["### 1. DataInterface Setup"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Create index lists\n","train_samples_idx = list(range(len(train_x)))\n","test_samples_idx = list(range(len(test_x)))\n","\n","# Initialize DataInterface for in-memory data\n","ds = DataInterface(interface=\"numpy\",\n"," data_directory=None,\n"," training_samples=train_samples_idx,\n"," validation_samples=None,\n"," test_samples=test_samples_idx)\n","\n","print('DataInterface initialized')"]},{"cell_type":"markdown","metadata":{},"source":["### 2. 
NeuralNetwork Setup\n","\n","We'll use a DenseNet121 architecture pretrained on ImageNet:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Define model\n","model = NeuralNetwork(n_labels=9,\n"," channels=3,\n"," architecture=\"DenseNet121\",\n"," pretrained_weights=True,\n"," loss=\"categorical_crossentropy\",\n"," metrics=[\"accuracy\"])\n","\n","print('Model architecture:', model.architecture)\n","print('Input shape:', (28, 28, 3))"]},{"cell_type":"markdown","metadata":{},"source":["### 3. DataGenerator Setup\n","\n","Configure data augmentation and preprocessing:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["from aucmedi.data_processing.subfunctions import Resize\n","\n","# Create subfunctions for preprocessing\n","# Resize images to 224x224 for DenseNet\n","sf_list = [Resize(shape=(224, 224))]\n","\n","# Initialize DataGenerators\n","train_gen = DataGenerator(train_samples_idx,\n"," path_imagedir=None,\n"," labels=train_y,\n"," image_format=\"array\",\n"," batch_size=32,\n"," data_aug=None,\n"," shuffle=True,\n"," subfunctions=sf_list,\n"," resize=None,\n"," standardize_mode=\"tf\",\n"," grayscale=False,\n"," sample_weights=None,\n"," seed=None,\n"," image_loader=lambda idx: train_x[idx])\n","\n","test_gen = DataGenerator(test_samples_idx,\n"," path_imagedir=None,\n"," labels=test_y,\n"," image_format=\"array\",\n"," batch_size=32,\n"," data_aug=None,\n"," shuffle=False,\n"," subfunctions=sf_list,\n"," resize=None,\n"," standardize_mode=\"tf\",\n"," grayscale=False,\n"," sample_weights=None,\n"," seed=None,\n"," image_loader=lambda idx: test_x[idx])\n","\n","print('DataGenerators initialized')"]},{"cell_type":"markdown","metadata":{},"source":["## Training\n","\n","Train the model on PathMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Train model\n","history = model.train(train_gen,\n"," epochs=10,\n"," validation_freq=1,\n"," 
callbacks=[])\n","\n","print('Training complete!')"]},{"cell_type":"markdown","metadata":{},"source":["## Evaluation\n","\n","Evaluate the model on the test set:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Predict on test set\n","preds = model.predict(test_gen)\n","\n","# Calculate accuracy\n","from sklearn.metrics import accuracy_score, classification_report\n","\n","pred_labels = np.argmax(preds, axis=1)\n","accuracy = accuracy_score(test_y, pred_labels)\n","\n","print(f'Test Accuracy: {accuracy:.4f}')\n","print('\\nClassification Report:')\n","print(classification_report(test_y, pred_labels))"]},{"cell_type":"markdown","metadata":{},"source":["## Visualization\n","\n","Visualize some predictions:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import matplotlib.pyplot as plt\n","\n","# Show sample predictions\n","fig, axes = plt.subplots(2, 5, figsize=(15, 6))\n","axes = axes.ravel()\n","\n","class_names = ['ADI', 'BACK', 'DEB', 'LYM', 'MUC', 'MUS', 'NORM', 'STR', 'TUM']\n","\n","for i in range(10):\n"," axes[i].imshow(test_x[i])\n"," axes[i].set_title(f'True: {class_names[test_y[i]]}\\nPred: {class_names[pred_labels[i]]}')\n"," axes[i].axis('off')\n","\n","plt.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{},"source":["## Summary\n","\n","This tutorial demonstrated:\n","- Loading MedMNIST datasets\n","- Converting to AUCMEDI format\n","- Using AUCMEDI's three pillars (DataInterface, NeuralNetwork, DataGenerator)\n","- Training and evaluating on medical imaging data\n","\n","Try experimenting with:\n","- Different MedMNIST datasets (ChestMNIST, DermaMNIST, etc.)\n","- Different architectures (ResNet, EfficientNet, etc.)\n","- Data augmentation techniques\n","- Ensemble methods"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.8.0"}},"nbformat":4,"nbformat_minor":4}
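One detail in the notebook above may deserve a sketch: it hands flattened integer labels (`train_y`) to a model configured with `categorical_crossentropy`, a loss that expects one-hot targets. Whether AUCMEDI's DataGenerator converts labels internally is not confirmed here, so the snippet below only illustrates the encoding itself. The helper name `to_one_hot` is hypothetical, and plain Python is used so the sketch runs without dependencies.

```python
def to_one_hot(labels, n_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    encoded = []
    for label in labels:
        label = int(label)
        if not 0 <= label < n_classes:
            raise ValueError(f"label {label} is outside [0, {n_classes})")
        row = [0.0] * n_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded


# PathMNIST has 9 tissue classes, so valid indices are 0..8.
example = to_one_hot([0, 8, 3], n_classes=9)
```

Inside the notebook, where NumPy is already imported, `np.eye(9)[train_y]` would produce the same encoding as an array.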
| @@ -0,0 +1 @@ | |||
| {"cells":[{"cell_type":"markdown","metadata":{},"source":["# Medical Image Classification with MedMNIST\n","\n","This notebook demonstrates how to use **AUCMEDI** with the **MedMNIST** benchmark datasets.\n","\n","MedMNIST is a collection of standardized biomedical image datasets for medical image analysis. We'll use **PathMNIST** (pathology images) to demonstrate AUCMEDI's three pillars: DataInterface, NeuralNetwork, and DataGenerator."]},{"cell_type":"markdown","metadata":{},"source":["## Setup and Installation\n","\n","First, install the medmnist package:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["!pip install medmnist\n","import os\n","os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\""]},{"cell_type":"markdown","metadata":{},"source":["## Import Libraries"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import numpy as np\n","from medmnist import PathMNIST\n","import aucmedi\n","from aucmedi import *"]},{"cell_type":"markdown","metadata":{},"source":["## Load MedMNIST Data\n","\n","We'll use PathMNIST, which contains 107,180 pathology images from 9 tissue classes."]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Download and load PathMNIST\n","train_dataset = PathMNIST(split='train', download=True)\n","val_dataset = PathMNIST(split='val', download=True)\n","test_dataset = PathMNIST(split='test', download=True)\n","\n","print(f'Train: {len(train_dataset)} samples')\n","print(f'Val: {len(val_dataset)} samples')\n","print(f'Test: {len(test_dataset)} samples')"]},{"cell_type":"markdown","metadata":{},"source":["## Convert to AUCMEDI Format\n","\n","AUCMEDI expects data in a specific format. 
We'll convert the MedMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Prepare data for AUCMEDI\n","# Create sample lists and labels\n","train_samples = []\n","train_labels = []\n","for i in range(len(train_dataset)):\n"," train_samples.append(train_dataset[i][0])\n"," train_labels.append(train_dataset[i][1])\n","\n","test_samples = []\n","test_labels = []\n","for i in range(len(test_dataset)):\n"," test_samples.append(test_dataset[i][0])\n"," test_labels.append(test_dataset[i][1])\n","\n","# Convert to numpy arrays\n","train_x = np.array([np.array(img) for img in train_samples])\n","train_y = np.array(train_labels).flatten()\n","test_x = np.array([np.array(img) for img in test_samples])\n","test_y = np.array(test_labels).flatten()\n","\n","print(f'Train images shape: {train_x.shape}')\n","print(f'Train labels shape: {train_y.shape}')\n","print(f'Number of classes: {len(np.unique(train_y))}')"]},{"cell_type":"markdown","metadata":{},"source":["## AUCMEDI's Three Pillars\n","\n","AUCMEDI is built on three pillars:\n","1. **DataInterface**: Handles data loading\n","2. **NeuralNetwork**: Manages the model architecture\n","3. **DataGenerator**: Provides data augmentation and preprocessing"]},{"cell_type":"markdown","metadata":{},"source":["### 1. DataInterface Setup"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Create index lists\n","train_samples_idx = list(range(len(train_x)))\n","test_samples_idx = list(range(len(test_x)))\n","\n","# Initialize DataInterface for in-memory data\n","ds = DataInterface(interface=\"numpy\",\n"," data_directory=None,\n"," training_samples=train_samples_idx,\n"," validation_samples=None,\n"," test_samples=test_samples_idx)\n","\n","print('DataInterface initialized')"]},{"cell_type":"markdown","metadata":{},"source":["### 2. 
NeuralNetwork Setup\n","\n","We'll use a DenseNet121 architecture pretrained on ImageNet:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Define model\n","model = NeuralNetwork(n_labels=9,\n"," channels=3,\n"," architecture=\"DenseNet121\",\n"," pretrained_weights=True,\n"," loss=\"categorical_crossentropy\",\n"," metrics=[\"accuracy\"])\n","\n","print('Model architecture:', model.architecture)\n","print('Input shape:', (28, 28, 3))"]},{"cell_type":"markdown","metadata":{},"source":["### 3. DataGenerator Setup\n","\n","Configure data augmentation and preprocessing:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["from aucmedi.data_processing.subfunctions import Resize\n","\n","# Create subfunctions for preprocessing\n","# Resize images to 224x224 for DenseNet\n","sf_list = [Resize(shape=(224, 224))]\n","\n","# Initialize DataGenerators\n","train_gen = DataGenerator(train_samples_idx,\n"," path_imagedir=None,\n"," labels=train_y,\n"," image_format=\"array\",\n"," batch_size=32,\n"," data_aug=None,\n"," shuffle=True,\n"," subfunctions=sf_list,\n"," resize=None,\n"," standardize_mode=\"tf\",\n"," grayscale=False,\n"," sample_weights=None,\n"," seed=None,\n"," image_loader=lambda idx: train_x[idx])\n","\n","test_gen = DataGenerator(test_samples_idx,\n"," path_imagedir=None,\n"," labels=test_y,\n"," image_format=\"array\",\n"," batch_size=32,\n"," data_aug=None,\n"," shuffle=False,\n"," subfunctions=sf_list,\n"," resize=None,\n"," standardize_mode=\"tf\",\n"," grayscale=False,\n"," sample_weights=None,\n"," seed=None,\n"," image_loader=lambda idx: test_x[idx])\n","\n","print('DataGenerators initialized')"]},{"cell_type":"markdown","metadata":{},"source":["## Training\n","\n","Train the model on PathMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Train model\n","history = model.train(train_gen,\n"," epochs=10,\n"," validation_freq=1,\n"," 
callbacks=[])\n","\n","print('Training complete!')"]},{"cell_type":"markdown","metadata":{},"source":["## Evaluation\n","\n","Evaluate the model on the test set:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Predict on test set\n","preds = model.predict(test_gen)\n","\n","# Calculate accuracy\n","from sklearn.metrics import accuracy_score, classification_report\n","\n","pred_labels = np.argmax(preds, axis=1)\n","accuracy = accuracy_score(test_y, pred_labels)\n","\n","print(f'Test Accuracy: {accuracy:.4f}')\n","print('\\nClassification Report:')\n","print(classification_report(test_y, pred_labels))"]},{"cell_type":"markdown","metadata":{},"source":["## Visualization\n","\n","Visualize some predictions:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import matplotlib.pyplot as plt\n","\n","# Show sample predictions\n","fig, axes = plt.subplots(2, 5, figsize=(15, 6))\n","axes = axes.ravel()\n","\n","class_names = ['ADI', 'BACK', 'DEB', 'LYM', 'MUC', 'MUS', 'NORM', 'STR', 'TUM']\n","\n","for i in range(10):\n"," axes[i].imshow(test_x[i])\n"," axes[i].set_title(f'True: {class_names[test_y[i]]}\\nPred: {class_names[pred_labels[i]]}')\n"," axes[i].axis('off')\n","\n","plt.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{},"source":["## Summary\n","\n","This tutorial demonstrated:\n","- Loading MedMNIST datasets\n","- Converting to AUCMEDI format\n","- Using AUCMEDI's three pillars (DataInterface, NeuralNetwork, DataGenerator)\n","- Training and evaluating on medical imaging data\n","\n","Try experimenting with:\n","- Different MedMNIST datasets (ChestMNIST, DermaMNIST, etc.)\n","- Different architectures (ResNet, EfficientNet, etc.)\n","- Data augmentation techniques\n","- Ensemble methods"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.8.0"}},"nbformat":4,"nbformat_minor":4} | |||
| @@ -0,0 +1 @@ | |||
| {"cells":[{"cell_type":"markdown","metadata":{},"source":["# Medical Image Classification with MedMNIST\n","\n","This notebook demonstrates how to use **AUCMEDI** with the **MedMNIST** benchmark datasets.\n","\n","MedMNIST is a collection of standardized biomedical image datasets for medical image analysis. We'll use **PathMNIST** (pathology images) to demonstrate AUCMEDI's three pillars: DataInterface, NeuralNetwork, and DataGenerator."]},{"cell_type":"markdown","metadata":{},"source":["## Setup and Installation\n","\n","First, install the medmnist package:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["!pip install medmnist\n","import os\n","os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\""]},{"cell_type":"markdown","metadata":{},"source":["## Import Libraries"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import numpy as np\n","from medmnist import PathMNIST\n","import aucmedi\n","from aucmedi import *"]},{"cell_type":"markdown","metadata":{},"source":["## Load MedMNIST Data\n","\n","We'll use PathMNIST, which contains 107,180 pathology images from 9 tissue classes."]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Download and load PathMNIST\n","train_dataset = PathMNIST(split='train', download=True)\n","val_dataset = PathMNIST(split='val', download=True)\n","test_dataset = PathMNIST(split='test', download=True)\n","\n","print(f'Train: {len(train_dataset)} samples')\n","print(f'Val: {len(val_dataset)} samples')\n","print(f'Test: {len(test_dataset)} samples')"]},{"cell_type":"markdown","metadata":{},"source":["## Convert to AUCMEDI Format\n","\n","AUCMEDI expects data in a specific format. 
We'll convert the MedMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Prepare data for AUCMEDI\n","# Create sample lists and labels\n","train_samples = []\n","train_labels = []\n","for i in range(len(train_dataset)):\n"," train_samples.append(train_dataset[i][0])\n"," train_labels.append(train_dataset[i][1])\n","\n","test_samples = []\n","test_labels = []\n","for i in range(len(test_dataset)):\n"," test_samples.append(test_dataset[i][0])\n"," test_labels.append(test_dataset[i][1])\n","\n","# Convert to numpy arrays\n","train_x = np.array([np.array(img) for img in train_samples])\n","train_y = np.array(train_labels).flatten()\n","test_x = np.array([np.array(img) for img in test_samples])\n","test_y = np.array(test_labels).flatten()\n","\n","print(f'Train images shape: {train_x.shape}')\n","print(f'Train labels shape: {train_y.shape}')\n","print(f'Number of classes: {len(np.unique(train_y))}')"]},{"cell_type":"markdown","metadata":{},"source":["## AUCMEDI's Three Pillars\n","\n","AUCMEDI is built on three pillars:\n","1. **DataInterface**: Handles data loading\n","2. **NeuralNetwork**: Manages the model architecture\n","3. **DataGenerator**: Provides data augmentation and preprocessing"]},{"cell_type":"markdown","metadata":{},"source":["### 1. DataInterface Setup"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Create index lists\n","train_samples_idx = list(range(len(train_x)))\n","test_samples_idx = list(range(len(test_x)))\n","\n","# Initialize DataInterface for in-memory data\n","ds = DataInterface(interface=\"numpy\",\n"," data_directory=None,\n"," training_samples=train_samples_idx,\n"," validation_samples=None,\n"," test_samples=test_samples_idx)\n","\n","print('DataInterface initialized')"]},{"cell_type":"markdown","metadata":{},"source":["### 2. 
NeuralNetwork Setup\n","\n","We'll use a DenseNet121 architecture pretrained on ImageNet:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Define model\n","model = NeuralNetwork(n_labels=9,\n"," channels=3,\n"," architecture=\"DenseNet121\",\n"," pretrained_weights=True,\n"," loss=\"categorical_crossentropy\",\n"," metrics=[\"accuracy\"])\n","\n","print('Model architecture:', model.architecture)\n","print('Input shape:', (28, 28, 3))"]},{"cell_type":"markdown","metadata":{},"source":["### 3. DataGenerator Setup\n","\n","Configure data augmentation and preprocessing:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["from aucmedi.data_processing.subfunctions import Resize\n","\n","# Create subfunctions for preprocessing\n","# Resize images to 224x224 for DenseNet\n","sf_list = [Resize(shape=(224, 224))]\n","\n","# Initialize DataGenerators\n","train_gen = DataGenerator(train_samples_idx,\n"," path_imagedir=None,\n"," labels=train_y,\n"," image_format=\"array\",\n"," batch_size=32,\n"," data_aug=None,\n"," shuffle=True,\n"," subfunctions=sf_list,\n"," resize=None,\n"," standardize_mode=\"tf\",\n"," grayscale=False,\n"," sample_weights=None,\n"," seed=None,\n"," image_loader=lambda idx: train_x[idx])\n","\n","test_gen = DataGenerator(test_samples_idx,\n"," path_imagedir=None,\n"," labels=test_y,\n"," image_format=\"array\",\n"," batch_size=32,\n"," data_aug=None,\n"," shuffle=False,\n"," subfunctions=sf_list,\n"," resize=None,\n"," standardize_mode=\"tf\",\n"," grayscale=False,\n"," sample_weights=None,\n"," seed=None,\n"," image_loader=lambda idx: test_x[idx])\n","\n","print('DataGenerators initialized')"]},{"cell_type":"markdown","metadata":{},"source":["## Training\n","\n","Train the model on PathMNIST data:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Train model\n","history = model.train(train_gen,\n"," epochs=10,\n"," validation_freq=1,\n"," 
callbacks=[])\n","\n","print('Training complete!')"]},{"cell_type":"markdown","metadata":{},"source":["## Evaluation\n","\n","Evaluate the model on the test set:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Predict on test set\n","preds = model.predict(test_gen)\n","\n","# Calculate accuracy\n","from sklearn.metrics import accuracy_score, classification_report\n","\n","pred_labels = np.argmax(preds, axis=1)\n","accuracy = accuracy_score(test_y, pred_labels)\n","\n","print(f'Test Accuracy: {accuracy:.4f}')\n","print('\\nClassification Report:')\n","print(classification_report(test_y, pred_labels))"]},{"cell_type":"markdown","metadata":{},"source":["## Visualization\n","\n","Visualize some predictions:"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import matplotlib.pyplot as plt\n","\n","# Show sample predictions\n","fig, axes = plt.subplots(2, 5, figsize=(15, 6))\n","axes = axes.ravel()\n","\n","class_names = ['ADI', 'BACK', 'DEB', 'LYM', 'MUC', 'MUS', 'NORM', 'STR', 'TUM']\n","\n","for i in range(10):\n"," axes[i].imshow(test_x[i])\n"," axes[i].set_title(f'True: {class_names[test_y[i]]}\\nPred: {class_names[pred_labels[i]]}')\n"," axes[i].axis('off')\n","\n","plt.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{},"source":["## Summary\n","\n","This tutorial demonstrated:\n","- Loading MedMNIST datasets\n","- Converting to AUCMEDI format\n","- Using AUCMEDI's three pillars (DataInterface, NeuralNetwork, DataGenerator)\n","- Training and evaluating on medical imaging data\n","\n","Try experimenting with:\n","- Different MedMNIST datasets (ChestMNIST, DermaMNIST, etc.)\n","- Different architectures (ResNet, EfficientNet, etc.)\n","- Data augmentation techniques\n","- Ensemble methods"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.8.0"}},"nbformat":4,"nbformat_minor":4} | |||
This notebook demonstrates how to use AUCMEDI with the MedMNIST benchmark datasets for medical image classification, including setup, data loading, and model training.
Closes #172
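One detail of the data-loading step may be worth illustrating. The notebook configures `loss="categorical_crossentropy"` but flattens the PathMNIST labels to plain integers, whereas Keras-style categorical cross-entropy expects one-hot targets. The following is a minimal sketch of that conversion using only NumPy. The helper name `to_one_hot` and the sample labels are hypothetical and not part of the notebook, and whether AUCMEDI's `DataGenerator` needs this step should be checked against its documentation.

```python
import numpy as np


def to_one_hot(labels, n_classes):
    """Convert integer labels of shape (N,) or (N, 1) to one-hot rows of shape (N, n_classes)."""
    # Flatten so both (N,) and (N, 1) inputs are handled the same way.
    labels = np.asarray(labels).reshape(-1).astype(int)
    if labels.size and (labels.min() < 0 or labels.max() >= n_classes):
        raise ValueError("label outside the range [0, n_classes)")
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(n_classes, dtype=np.float32)[labels]


# Hypothetical stand-in for PathMNIST labels (assumed here to be an (N, 1) integer array).
example_labels = np.array([[0], [8], [3]])
one_hot = to_one_hot(example_labels, n_classes=9)
print(one_hot.shape)  # (3, 9)
```

Under this assumption, the evaluation cell would still compare `np.argmax(preds, axis=1)` against the original integer labels, so only the labels handed to the training generator would change.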