Week 03 — Q1

Open-Source Tools for Automated Image Segmentation

You now understand the clinical context (Week 1) and how to preprocess MRI data (Week 2). This week, you’ll explore the software ecosystem that makes it possible to actually build segmentation models — with a focus on MONAI, the framework purpose-built for medical imaging AI.

The Open-Source Medical Segmentation Ecosystem

Five years ago, building a medical image segmentation model meant writing everything from scratch — custom data loaders for 3D NIfTI files, hand-coded augmentations that respected voxel spacing, and training loops that handled the quirks of volumetric data. Today, a rich ecosystem of open-source tools does most of that for you.

The key insight for beginners: you don’t need to choose just one tool. These frameworks serve different purposes and are often used together. 3D Slicer for visualization and manual annotation, MONAI for building custom deep learning pipelines, nnU-Net for achieving state-of-the-art results out of the box, and SimpleITK for preprocessing. Understanding what each tool does and when to reach for it is the skill you’re building this week.

MONAI
Build custom DL pipelines

PyTorch-based framework for medical imaging AI. Medical-specific transforms, architectures, data loaders. Maximum flexibility for research.

nnU-Net
State-of-the-art out of the box

Self-configuring segmentation. Give it data, it handles everything. Won most MICCAI challenges. Minimal tuning needed.

3D Slicer
Visualize & annotate

Desktop app for viewing medical images, manual/semi-auto segmentation, and verifying AI outputs. Your visual workbench.

SimpleITK
Image processing in Python

Preprocessing powerhouse: registration, filtering, resampling, bias correction. Bridges clinical imaging and code.

TotalSegmentator
Multi-organ segmentation

Pre-trained nnU-Net for 80+ anatomical structures. Works on CT and MRI. Ready to use with minimal setup.

TorchIO
Medical data augmentation

PyTorch library for loading, augmenting, and patch-sampling 3D medical images. MRI artifact simulation built in.

MONAI: The Medical AI Framework

MONAI (Medical Open Network for Artificial Intelligence) is an open-source, PyTorch-based framework developed by NVIDIA and an international research community specifically for healthcare imaging. If PyTorch is the general-purpose deep learning language, MONAI is the medical dialect — it adds everything that’s missing when you try to use vanilla PyTorch on 3D medical data.

Why Not Just Use Raw PyTorch?

You can build medical image segmentation in pure PyTorch. But you’ll immediately hit problems that MONAI solves out of the box: 3D volumetric data (torchvision expects 2D images); NIfTI/DICOM file formats (PyTorch doesn’t know what these are); voxel spacing (a rotation of 10 degrees in a 1mm³ image is very different from a 0.5×0.5×3mm image — MONAI transforms respect physical dimensions); medical-specific augmentations (bias field simulation, elastic deformations appropriate for anatomy); and patch-based training (brain MRI volumes are too large to fit in GPU memory whole, so you train on patches).

Some numbers from the MONAI ecosystem:

94–97% DSC: achieved with MONAI’s 3D UNet on multi-organ segmentation benchmarks
4 components: Core, Label, Auto3DSeg, Deploy — covering the full pipeline
0.926 Dice: achieved by a clinician using MONAI Label for mandibular segmentation

The Four Components of MONAI

MONAI Core

The foundation. Provides medical imaging-specific data loaders (reads NIfTI, DICOM natively), transforms (spacing-aware rotations, Z-score normalization, random cropping for 3D patches), loss functions (Dice loss, generalized Dice, focal loss), and network architectures (U-Net, SegResNet, UNETR, SwinUNETR). This is what you’ll use to build your training pipeline.

Install: pip install monai
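To build intuition for what Dice loss optimizes, here is the underlying overlap metric in plain NumPy. This is an illustrative sketch, not MONAI’s implementation (which adds smoothing terms, one-hot conversion, and batch handling):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; 1.0 is perfect overlap."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Two toy 3D "segmentations" that half-overlap:
a = np.zeros((4, 4, 4)); a[:2] = 1   # 32 foreground voxels
b = np.zeros((4, 4, 4)); b[1:3] = 1  # 32 foreground voxels, 16 shared
print(dice_coefficient(a, b))        # ~0.5, i.e. 2*16 / (32 + 32)
```

Dice loss is simply 1 minus this coefficient, computed on soft probabilities rather than hard binary masks so it stays differentiable.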

MONAI Label

An interactive annotation framework that connects AI models to 3D Slicer or the OHIF web viewer. Instead of manually tracing every slice, you make a few clicks, the AI suggests a segmentation, you correct it, and the model learns from your corrections through active learning. A 2025 study showed that a novice with no medical imaging experience used MONAI Label to achieve a Dice of 0.831 on spleen segmentation — within the range of published expert results — in less than a month.

Install: pip install monailabel

MONAI Auto3DSeg

The “nnU-Net competitor.” An automated pipeline that analyzes your dataset and automatically selects architectures (DiNTS, SegResNet, SwinUNETR), configures hyperparameters, trains multiple models, and ensembles them. Like nnU-Net, the goal is state-of-the-art results with minimal manual configuration. In benchmarks, MONAI architectures are competitive with nnU-Net, though nnU-Net often edges ahead on boundary accuracy.

MONAI Deploy

Takes your trained model and packages it for clinical use. Creates inference pipelines, handles DICOM input/output, and integrates with hospital IT systems (PACS). This bridges the gap between “my model works in a Jupyter notebook” and “a radiologist can use this on real patients.” You’ll explore this in depth in Week 10.

Your First MONAI Pipeline

Here’s what a minimal MONAI training setup looks like — notice how medical-specific it is compared to vanilla PyTorch:

from monai.transforms import (
  Compose, LoadImaged, EnsureChannelFirstd,
  Spacingd, Orientationd, NormalizeIntensityd,
  CropForegroundd, RandCropByPosNegLabeld,
  RandFlipd, RandRotate90d, EnsureTyped
)
from monai.networks.nets import UNet
from monai.losses import DiceLoss

# Medical-specific transforms
train_transforms = Compose([
  LoadImaged(keys=["image", "label"]), # Reads NIfTI natively
  EnsureChannelFirstd(keys=["image", "label"]),
  Spacingd(keys=["image", "label"],
    pixdim=(1.0, 1.0, 1.0)), # Resample to 1mm isotropic
  Orientationd(keys=["image", "label"],
    axcodes="RAS"), # Standardize orientation
  NormalizeIntensityd(keys=["image"],
    nonzero=True, channel_wise=True), # Z-score per channel (MRI)
  CropForegroundd(keys=["image", "label"],
    source_key="image"), # Remove empty space
  RandCropByPosNegLabeld(keys=["image", "label"],
    label_key="label",
    spatial_size=(96, 96, 96),
    pos=1, neg=1, num_samples=4), # 3D patch sampling
  RandFlipd(keys=["image", "label"], prob=0.5),
  RandRotate90d(keys=["image", "label"], prob=0.5),
  EnsureTyped(keys=["image", "label"]), # Convert to tensors
])

# 3D U-Net model
model = UNet(
  spatial_dims=3,
  in_channels=4, # T1, T1ce, T2, FLAIR
  out_channels=4, # Background + 3 tumor regions
  channels=(16, 32, 64, 128, 256),
  strides=(2, 2, 2, 2),
)

loss_fn = DiceLoss(to_onehot_y=True, softmax=True)
💡
Notice the “d” suffix: MONAI transforms like LoadImaged, Spacingd, RandFlipd end in “d” for “dictionary.” They operate on dictionaries with keys like “image” and “label,” applying the same spatial transform to both so your image and segmentation mask stay aligned. This is different from torchvision transforms which operate on single tensors.
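The dictionary convention is easy to sketch in plain NumPy. The function below is a conceptual stand-in for RandFlipd, not MONAI’s actual code: one random decision is made, then applied to every listed key so paired arrays stay aligned.

```python
import numpy as np

def rand_flipd(data, keys, axis=0, prob=0.5, rng=None):
    """Dictionary-style transform: draw ONE random decision, then apply
    the same flip to every listed key (the idea behind MONAI's RandFlipd)."""
    rng = rng or np.random.default_rng()
    if rng.random() < prob:
        return {k: (np.flip(v, axis=axis) if k in keys else v)
                for k, v in data.items()}
    return data

sample = {
    "image": np.arange(8).reshape(2, 2, 2),
    "label": np.arange(8).reshape(2, 2, 2),
}
out = rand_flipd(sample, keys=["image", "label"], prob=1.0)
assert np.array_equal(out["image"], out["label"])  # identical flip applied
```

If image and label were transformed independently — two separate random draws — the mask would no longer overlay the anatomy, silently corrupting training.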

nnU-Net vs. MONAI: When to Use Which

This is the question every beginner asks: should I use nnU-Net or MONAI? The honest answer is that they serve different purposes and you’ll likely use both.

Use nnU-Net When…

You want the best possible results with minimal effort. nnU-Net is a complete, self-configuring pipeline — give it data in the right format and it handles preprocessing, architecture selection, training, post-processing, and ensembling automatically. It surpassed most existing approaches on 23 public datasets without any manual tuning. In multi-center brain tumor studies, it consistently achieves the highest Dice scores (0.86–0.93). Nine out of ten MICCAI 2020 challenge winners built on nnU-Net.

Trade-off: Less flexibility. nnU-Net is opinionated — it decides most things for you. Customizing the architecture or loss function requires modifying the framework itself.

Use MONAI When…

You want full control over your pipeline. MONAI gives you building blocks — transforms, architectures, losses, data loaders — and lets you assemble them however you want. Need to combine three different architectures? Use a custom loss function? Integrate active learning? Build a deployment pipeline? MONAI is the tool. It’s also the better educational tool because you understand every piece of your pipeline.

Trade-off: More decisions to make. You choose the architecture, hyperparameters, augmentation strategy, and post-processing yourself. That flexibility is powerful, but it also gives you more ways to go wrong.

📚
What the BraTS 2nd-place solution used: A MONAI pipeline with three separate architectures ensembled together. MONAI’s flexibility made it possible to combine models in ways that nnU-Net’s fixed pipeline doesn’t easily support. But many top BraTS teams use nnU-Net as their backbone and add customizations on top.

A useful benchmark: Gut et al. (2022) compared U-Net and five architectural variants under identical conditions across nine datasets and found that architecture variants don’t consistently improve over basic U-Net while resource demands increase. This suggests that automated configuration (what nnU-Net and Auto3DSeg do) matters more than architectural novelty — supporting a strategy of starting with nnU-Net for baselines and switching to MONAI when you need customization.

Segmentation Architectures You Should Know

Both MONAI and nnU-Net are built around specific neural network architectures. You don’t need to deeply understand all of these right now, but knowing what they are and when each shines will help you make informed choices later.

U-Net
Core innovation: Encoder-decoder with skip connections. The foundation of medical segmentation since 2015.
Best for: Default starting point for any task. Well-understood, widely validated.
Available in: MONAI, nnU-Net, PyTorch

SegResNet
Core innovation: Residual connections for deeper networks and better gradient flow.
Best for: Complex features needing deeper networks. Competitive in brain tumor segmentation (Dice 0.843–0.869 on pediatric BraTS).
Available in: MONAI, Auto3DSeg

UNETR
Core innovation: Replaces the CNN encoder with a Vision Transformer to capture long-range dependencies.
Best for: Tasks where global context matters (multi-organ, whole-body). Dice up to 0.962 on skull structures.
Available in: MONAI

SwinUNETR
Core innovation: Swin Transformer encoder with shifted windows for efficient hierarchical features.
Best for: BraTS and similar challenges. 0.84–0.91 Dice on brain tumor sub-regions. Good balance of accuracy and efficiency.
Available in: MONAI, Auto3DSeg

DiNTS
Core innovation: Automated architecture design through differentiable neural architecture search.
Best for: When you want the architecture itself optimized for your data. Computationally expensive.
Available in: MONAI, Auto3DSeg

nnU-Net
Core innovation: Self-configuring pipeline that auto-tunes preprocessing, architecture depth, patch size, and post-processing.
Best for: Achieving state-of-the-art with zero manual tuning. The benchmark to beat.
Available in: nnU-Net framework
⚠️
Don’t over-focus on architectures. A study comparing Slim UNETR with Swin UNETR showed that a model 34.6x smaller and 13.4x faster still reached 92.44% Dice. And a critical 2026 study found that a standard UNet outperformed the SAM and MedSAM foundation models even with very limited training data (median Dice 0.88 vs 0.82–0.84). Data quality, preprocessing, and training strategy matter more than the architecture.

Data Augmentation for 3D Medical Images

Medical imaging datasets are small by deep learning standards. BraTS has ~1,200 cases; ImageNet has 14 million images. Data augmentation — artificially expanding your training set by applying random transformations — is essential for preventing overfitting.

But medical augmentation is different from natural image augmentation. You can’t just apply random color jittering to an MRI — it needs to produce plausible medical images. A systematic review of 300+ articles found that augmentation is effective across organs, modalities, and dataset sizes, but the techniques must be chosen carefully for each imaging type.
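As one concrete example, random gamma correction is a common MRI-plausible intensity augmentation. The sketch below is a simplified, illustrative version of what MONAI and TorchIO ship, assuming intensities have already been normalized:

```python
import numpy as np

def random_gamma(image, gamma_range=(0.7, 1.5), rng=None):
    """Random gamma correction: nonlinearly brightens or darkens the image
    while keeping intensities in [0, 1] -- plausible for normalized MRI,
    unlike color jitter designed for natural photographs."""
    rng = rng or np.random.default_rng()
    gamma = rng.uniform(*gamma_range)
    lo, hi = image.min(), image.max()
    normalized = (image - lo) / (hi - lo + 1e-8)  # map to [0, 1]
    return normalized ** gamma

volume = np.random.default_rng(0).random((8, 8, 8))  # toy normalized volume
augmented = random_gamma(volume)
# Shape is preserved and intensities stay in a plausible range.
```

The helper name `random_gamma` is illustrative; in practice you would reach for `RandAdjustContrastd` (MONAI) or TorchIO’s intensity transforms.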

MONAI Transforms vs. torchvision

Standard torchvision transforms work on 2D images with pixel coordinates. MONAI transforms work on 3D volumes with physical coordinates (millimeters). This means:

Spacing-Aware Transforms

When MONAI rotates an image, it accounts for anisotropic voxel spacing. A 10-degree rotation in a volume with 1×1×1mm voxels looks different than in a volume with 0.5×0.5×3mm voxels. MONAI handles this automatically; torchvision does not.
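Simple arithmetic shows why this matters: the same voxel grid can describe very different physical volumes. A plain-Python illustration (the helper `physical_extent_mm` is hypothetical, for exposition only):

```python
def physical_extent_mm(shape, spacing):
    """Physical size of a voxel grid in millimeters, per axis."""
    return tuple(n * s for n, s in zip(shape, spacing))

shape = (128, 128, 32)
iso = physical_extent_mm(shape, (1.0, 1.0, 1.0))    # 128 x 128 x 32 mm
aniso = physical_extent_mm(shape, (0.5, 0.5, 3.0))  # 64 x 64 x 96 mm

print(iso, aniso)
# Identical arrays, very different anatomy: rotating the second volume
# naively in voxel space would shear the patient. Spacing-aware transforms
# work in millimeters, which is why the pipeline resamples with Spacingd first.
```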

Medical-Specific Augmentations

Bias field simulation: Artificially adds the kind of intensity inhomogeneity you’d see from RF coils, training the model to be robust to this artifact. Elastic deformation: Warps the image in anatomically plausible ways. Intensity augmentations: Gamma correction, Gaussian noise, brightness/contrast shifts calibrated for MRI intensity ranges. Patch sampling: RandCropByPosNegLabeld samples 3D patches ensuring a specified ratio contain foreground (tumor) vs background — critical for class-imbalanced medical data.
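The pos/neg idea can be sketched in NumPy. The function below is a conceptual illustration of the sampling strategy behind RandCropByPosNegLabeld, not MONAI’s implementation:

```python
import numpy as np

def sample_patch_centers(label, num_samples=4, pos=1, neg=1, rng=None):
    """Pick patch-center coordinates so that pos:neg of them land on
    foreground vs background voxels. Without this, a rare tumor class
    would almost never appear in randomly cropped patches."""
    rng = rng or np.random.default_rng()
    fg = np.argwhere(label > 0)
    bg = np.argwhere(label == 0)
    n_pos = round(num_samples * pos / (pos + neg))
    centers = []
    for pool, n in ((fg, n_pos), (bg, num_samples - n_pos)):
        idx = rng.choice(len(pool), size=n, replace=False)
        centers.extend(map(tuple, pool[idx]))
    return centers

label = np.zeros((32, 32, 32), dtype=int)
label[10:14, 10:14, 10:14] = 1                      # small "tumor"
centers = sample_patch_centers(label, num_samples=4, pos=1, neg=1)
print(sum(1 for c in centers if label[c] > 0))      # 2 of 4 centers on tumor
```

With pos=1, neg=1 the tumor occupies half the sampled patches even though it fills well under 1% of the volume — exactly the rebalancing that class-imbalanced medical data needs.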

TorchIO: A Specialized Alternative

TorchIO is a complementary library focused specifically on 3D medical image loading, augmentation, and patch-based sampling. It follows PyTorch conventions, supports invertible transforms (useful for test-time augmentation), and includes MRI-specific artifact simulation (motion artifacts, ghosting, spike noise). It integrates seamlessly with both MONAI and raw PyTorch pipelines. Think of TorchIO as the augmentation specialist and MONAI as the full-stack framework.

Pre-Trained Models & Transfer Learning

Do you need to train every model from scratch? Not necessarily. Pre-trained models and transfer learning can give you a head start, but the picture in medical imaging is more nuanced than in natural image processing.

The Transfer Learning Reality Check

In computer vision, starting from an ImageNet-pretrained model is standard practice. In medical imaging, the evidence is mixed:

When Transfer Learning Helps

Small datasets: When you have limited training data, pre-trained weights provide a better starting point than random initialization. A 2016 study showed pre-trained CNNs outperformed from-scratch training across four medical imaging tasks and were more robust to small training sets. Similar domains: Transfer from medical-to-medical (e.g., CT-to-CT) outperforms ImageNet-to-medical by about +2% Dice. Cross-dimensional: A clever 2023 approach embedded 2D pre-trained encoders into 3D U-Nets, achieving 91.69% Dice for whole tumor on BraTS 2022.

When It Doesn’t Matter (Much)

Sufficient data: When you have enough training data (like full BraTS), models trained from scratch perform just as well. Foundation model hype: A critical 2026 study found that a standard UNet outperformed fine-tuned SAM and MedSAM for neuroanatomic segmentation — even when SAM had orders of magnitude more pretraining data. The surprising finding: You can freeze the encoder at random values and only train the decoder, and still get competitive results — challenging the assumption that encoders must learn task-specific features.

Where to Find Pre-Trained Models

MONAI Model Zoo: Pre-trained models for various medical imaging tasks, directly loadable into MONAI pipelines. TotalSegmentator: A pre-trained nnU-Net for 80 anatomical structures in CT and MRI (Dice 0.839 on MRI, 0.966 on CT). Useful as a ready-to-use tool or as a starting point for fine-tuning. MedSAM / SAM-Med3D: Foundation models for interactive medical segmentation, though current evidence suggests they don’t consistently outperform well-trained task-specific models.

Practical Setup: GPU Access & Installation

Deep learning on 3D medical images is computationally demanding. A single BraTS MRI volume is 240×240×155 voxels across 4 channels — over 35 million values per patient. Training a 3D U-Net on this data requires a GPU.
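You can verify that arithmetic in plain Python (no GPU or MONAI needed):

```python
# Size of one BraTS case: 4 MRI channels of 240 x 240 x 155 voxels,
# stored as float32 (4 bytes per value).

def case_size(shape=(240, 240, 155), channels=4, bytes_per_value=4):
    """Return (number of values, in-memory size in MB) for one case."""
    voxels = shape[0] * shape[1] * shape[2]
    values = voxels * channels
    return values, values * bytes_per_value / 1024**2

values, mb = case_size()
print(f"{values:,} values, {mb:.0f} MB per patient")  # ~35.7M values

patch_values, patch_mb = case_size(shape=(96, 96, 96))
# A 96^3 training patch is roughly a tenth of the full volume -- and the
# intermediate activations of a 3D U-Net multiply these numbers many times
# over, which is why patch-based sampling is what makes training fit on a GPU.
```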

GPU Options for Students

Free Tier

Google Colab: Free GPU access (T4, sometimes V100). Limited session time (~12 hours) and RAM. Good for learning and small experiments. Kaggle Notebooks: 30 hours/week of GPU time (P100 or T4). Slightly more generous than Colab. Both platforms can run MONAI and nnU-Net for educational purposes, but serious training runs may time out.

Paid / Institutional Options

Cloud credits: Google Cloud, AWS, and Azure all offer $100–300 in student credits. Enough for several full nnU-Net training runs. Lambda Labs / Paperspace: Affordable GPU rentals ($0.50–2/hour for an A100). Good for serious training. Your school: Ask your CS department — many have shared GPU servers that students can access.

Installation Checklist

# 1. Create a clean environment
conda create -n medseg python=3.10
conda activate medseg

# 2. Install PyTorch with CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# 3. Install MONAI (full version with all dependencies)
pip install "monai[all]"

# 4. Install additional tools
pip install nibabel simpleitk torchio matplotlib

# 5. (Optional) Install nnU-Net for comparison
pip install nnunetv2

# 6. Verify GPU access
python -c "import torch; print(torch.cuda.is_available())"
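Once the checklist runs, a small sanity-check script can report which pieces of the stack are importable. This helper is a convenience sketch, not part of any of these packages, and it degrades gracefully when something is missing:

```python
# Environment sanity check: report which tools from this week's stack
# are importable. Safe to run even in a partially configured environment.
import importlib

PACKAGES = ["torch", "monai", "nibabel", "SimpleITK", "torchio"]

def check_environment(packages=PACKAGES):
    """Map each package name to its version string, or 'NOT INSTALLED'."""
    status = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            status[name] = getattr(module, "__version__", "installed")
        except ImportError:
            status[name] = "NOT INSTALLED"
    return status

for name, version in check_environment().items():
    print(f"{name:12s} {version}")
```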

Recommended Order of Exploration

Don’t try to learn everything at once. Here’s a structured path for this week:

Day 1–2: Explore 3D Slicer + MONAI Label

If you haven’t already, install 3D Slicer and load some BraTS data. Then install MONAI Label and connect it to 3D Slicer. Try the interactive segmentation — click on a tumor, let the AI suggest a boundary, correct it. This gives you intuition for what annotation feels like and why AI assistance matters.

Day 3–4: MONAI Core Tutorials

Work through the official MONAI tutorials. Start with the “MONAI 101” notebook, then the spleen segmentation tutorial. Pay attention to the transform pipeline — understand how LoadImaged, Spacingd, RandCropByPosNegLabeld work and why each exists. Try modifying parameters and see what changes.

Day 5–6: Build a Minimal Brain Tumor Pipeline

Using what you’ve learned, build a minimal MONAI pipeline that loads BraTS data, applies transforms, creates a 3D U-Net, trains for a few epochs, and produces a segmentation prediction. It won’t be good yet — that’s fine. The goal is to have a working end-to-end pipeline you understand.

Day 7: Compare with nnU-Net

Run nnU-Net on the same data (even just one fold). Compare its output to your MONAI pipeline. Notice the performance gap — this gap is what nnU-Net’s automated optimization is worth. In Weeks 4–6, you’ll learn to close it.

This Week’s Learning Resources

Start Here (Hands-On)

MONAI documentation: Official quickstart. Installation, first pipeline, core concepts. This is your Week 3 homepage.
MONAI Tutorials: Jupyter notebooks for every MONAI feature: 3D segmentation, transforms, architectures, training loops. Start with the spleen segmentation tutorial, then brain tumor segmentation.
MONAI Label: Interactive annotation framework. Follow the setup guide to connect to 3D Slicer. The “radiology” sample app is a good starting point for brain segmentation.
nnU-Net: Install alongside MONAI for comparison. Read documentation/how_to_use_nnunet.md for the step-by-step guide.
TorchIO: Medical image augmentation library. Browse the transform gallery to see what each augmentation does visually. Great complement to MONAI transforms.
3D Slicer: Your visual workbench for verifying everything. If you installed it in Week 2, explore the segmentation editor module and MONAI Label integration this week.

Key Papers

The foundational MONAI Label paper. Describes the framework, active learning strategies, and demonstrates significant annotation time reductions. (Med Image Anal. 2024;95:103207)

The nnU-Net paper. Essential for understanding the self-configuring approach and why automated pipeline design outperforms manual architecture selection. (Nat Methods. 2021;18(2):203–211)

The MSD paper proving that well-configured algorithms generalize across tasks. Key conclusion: training accurate AI segmentation models is now “commoditized” through tools like nnU-Net. (Nat Commun. 2022;13:4128)

Fair comparison of U-Net variants showing that fancier architectures don’t reliably beat basic U-Net. Important for calibrating expectations about architecture choices. (IEEE Trans Med Imaging. 2022;41(11):3128–3141)

The TorchIO paper. Covers the philosophy of medical-specific data loading and augmentation, with practical examples. (Comput Methods Programs Biomed. 2021;208:106236)

Extension of TotalSegmentator to MRI: 80 anatomical structures, sequence-independent. Shows where pre-trained segmentation tools are heading. (Radiology. 2025;314(1):e241613)

Deep Dives (Advanced)

Comprehensive taxonomy of U-Net variants with fair evaluations. The reference for understanding the architecture landscape.
Foundation model for medical segmentation trained on 1.57M image-mask pairs. Important to understand the promise and current limitations of foundation models in medical imaging.
Critical analysis showing transfer learning benefits are task/data-dependent. The surprising finding about random encoder weights challenges common assumptions.