You now understand the clinical context (Week 1) and how to preprocess MRI data (Week 2). This week, you’ll explore the software ecosystem that makes it possible to actually build segmentation models — with a focus on MONAI, the framework purpose-built for medical imaging AI.
Five years ago, building a medical image segmentation model meant writing everything from scratch — custom data loaders for 3D NIfTI files, hand-coded augmentations that respected voxel spacing, and training loops that handled the quirks of volumetric data. Today, a rich ecosystem of open-source tools does most of that for you.
The key insight for beginners: you don’t need to choose just one tool. These frameworks serve different purposes and are often used together. 3D Slicer for visualization and manual annotation, MONAI for building custom deep learning pipelines, nnU-Net for achieving state-of-the-art results out of the box, and SimpleITK for preprocessing. Understanding what each tool does and when to reach for it is the skill you’re building this week.
- **MONAI**: PyTorch-based framework for medical imaging AI. Medical-specific transforms, architectures, data loaders. Maximum flexibility for research.
- **nnU-Net**: Self-configuring segmentation. Give it data, it handles everything. Won most MICCAI challenges. Minimal tuning needed.
- **3D Slicer**: Desktop app for viewing medical images, manual/semi-automatic segmentation, and verifying AI outputs. Your visual workbench.
- **SimpleITK**: Preprocessing powerhouse: registration, filtering, resampling, bias correction. Bridges clinical imaging and code.
- **TotalSegmentator**: Pre-trained nnU-Net for 80+ anatomical structures. Works on CT and MRI. Ready to use with minimal setup.
- **TorchIO**: PyTorch library for loading, augmenting, and patch-sampling 3D medical images. MRI artifact simulation built in.
MONAI (Medical Open Network for Artificial Intelligence) is an open-source, PyTorch-based framework developed by NVIDIA and an international research community specifically for healthcare imaging. If PyTorch is the general-purpose deep learning language, MONAI is the medical dialect — it adds everything that’s missing when you try to use vanilla PyTorch on 3D medical data.
You can build medical image segmentation in pure PyTorch, but you'll immediately hit problems that MONAI solves out of the box:

- 3D volumetric data: torchvision expects 2D images.
- NIfTI/DICOM file formats: PyTorch doesn't know what these are.
- Voxel spacing: a 10-degree rotation in a volume with 1×1×1mm voxels is very different from one in a volume with 0.5×0.5×3mm voxels; MONAI transforms respect physical dimensions.
- Medical-specific augmentations: bias field simulation and elastic deformations appropriate for anatomy.
- Patch-based training: brain MRI volumes are too large to fit in GPU memory whole, so you train on patches.
**MONAI Core**: The foundation. Provides medical imaging-specific data loaders (reads NIfTI and DICOM natively), transforms (spacing-aware rotations, Z-score normalization, random cropping for 3D patches), loss functions (Dice loss, generalized Dice, focal loss), and network architectures (U-Net, SegResNet, UNETR, SwinUNETR). This is what you'll use to build your training pipeline.
Install: pip install monai
**MONAI Label**: An interactive annotation framework that connects AI models to 3D Slicer or the OHIF web viewer. Instead of manually tracing every slice, you make a few clicks, the AI suggests a segmentation, you correct it, and the model learns from your corrections through active learning. A 2025 study showed that a novice with no medical imaging experience used MONAI Label to achieve a Dice of 0.831 on spleen segmentation — within the range of published expert results — in less than a month.
Install: pip install monailabel
**Auto3DSeg**: The “nnU-Net competitor.” An automated pipeline that analyzes your dataset and automatically selects architectures (DiNTS, SegResNet, SwinUNETR), configures hyperparameters, trains multiple models, and ensembles them. Like nnU-Net, the goal is state-of-the-art results with minimal manual configuration. In benchmarks, MONAI architectures are competitive with nnU-Net, though nnU-Net often edges ahead on boundary accuracy.
**MONAI Deploy**: Takes your trained model and packages it for clinical use. Creates inference pipelines, handles DICOM input/output, and integrates with hospital IT systems (PACS). This bridges the gap between “my model works in a Jupyter notebook” and “a radiologist can use this on real patients.” You’ll explore this in depth in Week 10.
Here’s what a minimal MONAI training setup looks like — notice how medical-specific it is compared to vanilla PyTorch:
import monai
from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd,
    Spacingd, Orientationd, NormalizeIntensityd,
    CropForegroundd, RandCropByPosNegLabeld,
    RandFlipd, RandRotate90d, EnsureTyped
)
from monai.networks.nets import UNet
from monai.losses import DiceLoss

# Medical-specific transforms
train_transforms = Compose([
    LoadImaged(keys=["image", "label"]),            # Reads NIfTI natively
    EnsureChannelFirstd(keys=["image", "label"]),
    Spacingd(keys=["image", "label"],
             pixdim=(1.0, 1.0, 1.0)),               # Resample to 1mm isotropic
    Orientationd(keys=["image", "label"],
                 axcodes="RAS"),                    # Standardize orientation
    NormalizeIntensityd(keys=["image"],
                        nonzero=True,
                        channel_wise=True),         # Z-score normalize each MRI channel
    CropForegroundd(keys=["image", "label"],
                    source_key="image"),            # Remove empty space
    RandCropByPosNegLabeld(keys=["image", "label"],
                           label_key="label",
                           spatial_size=(96, 96, 96),
                           pos=1, neg=1,
                           num_samples=4),          # 3D patch sampling
    RandFlipd(keys=["image", "label"], prob=0.5),
    RandRotate90d(keys=["image", "label"], prob=0.5),
    EnsureTyped(keys=["image", "label"]),           # Convert to torch tensors
])

# 3D U-Net model
model = UNet(
    spatial_dims=3,
    in_channels=4,   # T1, T1ce, T2, FLAIR
    out_channels=4,  # Background + 3 tumor regions
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
)

loss_fn = DiceLoss(to_onehot_y=True, softmax=True)
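The Dice loss at the end of this pipeline optimizes the same Dice coefficient used to report results throughout this course. A minimal numpy sketch of the metric (not MONAI's implementation, which is differentiable and works on soft predictions):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Perfect overlap gives ~1.0; disjoint masks give 0.0
a = np.zeros((4, 4), dtype=np.uint8)
a[1:3, 1:3] = 1
print(dice_coefficient(a, a))  # ≈ 1.0
```

The eps term only guards against division by zero when both masks are empty; for real masks it is negligible.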
LoadImaged, Spacingd, and RandFlipd end in “d” for “dictionary.” They operate on dictionaries with keys like “image” and “label,” applying the same spatial transform to both so your image and segmentation mask stay aligned. This differs from torchvision transforms, which operate on single tensors.

This is the question every beginner asks: should I use nnU-Net or MONAI? The honest answer is that they serve different purposes, and you’ll likely use both.
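The dictionary-transform pattern is easy to illustrate in pure numpy (a toy sketch, not MONAI itself): one random decision governs every key, which is what keeps image and mask aligned.

```python
import random
import numpy as np

def rand_flipd(data, keys=("image", "label"), prob=0.5, axis=0, seed=None):
    """Dictionary-style random flip: a single coin toss is applied to
    all listed keys, so image and segmentation mask stay aligned."""
    rng = random.Random(seed)
    if rng.random() < prob:
        return {k: (np.flip(v, axis=axis) if k in keys else v)
                for k, v in data.items()}
    return dict(data)

sample = {"image": np.arange(8).reshape(2, 4),
          "label": np.arange(8).reshape(2, 4) % 2}
flipped = rand_flipd(sample, prob=1.0)  # prob=1.0 forces the flip

# Both arrays were flipped with the same decision along the same axis
assert np.array_equal(flipped["image"], np.flip(sample["image"], axis=0))
assert np.array_equal(flipped["label"], np.flip(sample["label"], axis=0))
```

If image and label each drew their own random number, as independent torchvision transforms would, the mask would no longer match the anatomy.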
You want the best possible results with minimal effort. nnU-Net is a complete, self-configuring pipeline — give it data in the right format and it handles preprocessing, architecture selection, training, post-processing, and ensembling automatically. It surpassed most existing approaches on 23 public datasets without any manual tuning. In multi-center brain tumor studies, it consistently achieves the highest Dice scores (0.86–0.93). Nine out of ten MICCAI 2020 challenge winners built on nnU-Net.
Trade-off: Less flexibility. nnU-Net is opinionated — it decides most things for you. Customizing the architecture or loss function requires modifying the framework itself.
You want full control over your pipeline. MONAI gives you building blocks — transforms, architectures, losses, data loaders — and lets you assemble them however you want. Need to combine three different architectures? Use a custom loss function? Integrate active learning? Build a deployment pipeline? MONAI is the tool. It’s also the better educational tool because you understand every piece of your pipeline.
Trade-off: More decisions to make. You choose the architecture, hyperparameters, augmentation strategy, and post-processing yourself. More rope, more flexibility, more ways to hang yourself.
A useful benchmark: Gut et al. (2022) compared U-Net and five architectural variants under identical conditions across nine datasets and found that architecture variants don’t consistently improve over basic U-Net while resource demands increase. This suggests that automated configuration (what nnU-Net and Auto3DSeg do) matters more than architectural novelty — supporting a strategy of starting with nnU-Net for baselines and switching to MONAI when you need customization.
Both MONAI and nnU-Net are built around specific neural network architectures. You don’t need to deeply understand all of these right now, but knowing what they are and when each shines will help you make informed choices later.
| Architecture | Core Innovation | Best For | Available In |
|---|---|---|---|
| U-Net | Encoder-decoder with skip connections. The foundation of medical segmentation since 2015. | Default starting point for any task. Well-understood, widely validated. | MONAI, nnU-Net, PyTorch |
| SegResNet | Residual connections for deeper networks and better gradient flow. | Complex features needing deeper networks. Competitive in brain tumor segmentation (Dice 0.843–0.869 on pediatric BraTS). | MONAI, Auto3DSeg |
| UNETR | Replaces the CNN encoder with a Vision Transformer to capture long-range dependencies. | Tasks where global context matters (multi-organ, whole-body). Dice up to 0.962 on skull structures. | MONAI |
| SwinUNETR | Uses Swin Transformer encoder with shifted windows for efficient hierarchical features. | BraTS and similar challenges. 0.84–0.91 Dice on brain tumor sub-regions. Good balance of accuracy and efficiency. | MONAI, Auto3DSeg |
| DiNTS | Automated architecture design through differentiable neural architecture search. | When you want the architecture itself to be optimized for your data. Computationally expensive. | MONAI, Auto3DSeg |
| nnU-Net | Self-configuring pipeline that auto-tunes preprocessing, architecture depth, patch size, and post-processing. | Achieving state-of-the-art with zero manual tuning. The benchmark to beat. | nnU-Net framework |
Medical imaging datasets are small by deep learning standards. BraTS has ~1,200 cases; ImageNet has 14 million images. Data augmentation — artificially expanding your training set by applying random transformations — is essential for preventing overfitting.
But medical augmentation is different from natural image augmentation. You can’t just apply random color jittering to an MRI — it needs to produce plausible medical images. A systematic review of 300+ articles found that augmentation is effective across organs, modalities, and dataset sizes, but the techniques must be chosen carefully for each imaging type.
Standard torchvision transforms work on 2D images with pixel coordinates. MONAI transforms work on 3D volumes with physical coordinates (millimeters). This means:
When MONAI rotates an image, it accounts for anisotropic voxel spacing. A 10-degree rotation in a volume with 1×1×1mm voxels looks different than in a volume with 0.5×0.5×3mm voxels. MONAI handles this automatically; torchvision does not.
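The difference between voxel and physical coordinates is easy to see in numbers; a small illustrative sketch (the spacings are hypothetical examples):

```python
import numpy as np

shape = np.array([240, 240, 155])   # voxel grid (BraTS dimensions)
iso = np.array([1.0, 1.0, 1.0])     # mm per voxel, isotropic
aniso = np.array([0.5, 0.5, 3.0])   # mm per voxel, anisotropic

# Same voxel grid, very different physical extents:
print(shape * iso)    # [240. 240. 155.] mm
print(shape * aniso)  # [120. 120. 465.] mm

# A 10-voxel shift along z is 10 mm in the first volume but 30 mm in
# the second. Spacing-aware transforms work in millimeters, not voxels.
```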
- Bias field simulation: artificially adds the kind of intensity inhomogeneity you’d see from RF coils, training the model to be robust to this artifact.
- Elastic deformation: warps the image in anatomically plausible ways.
- Intensity augmentations: gamma correction, Gaussian noise, and brightness/contrast shifts calibrated for MRI intensity ranges.
- Patch sampling: RandCropByPosNegLabeld samples 3D patches ensuring a specified ratio contain foreground (tumor) vs. background, which is critical for class-imbalanced medical data.
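The idea behind positive/negative patch sampling can be sketched in a few lines of numpy (a simplified 2D toy, not MONAI's implementation):

```python
import numpy as np

def sample_patch_center(label, patch=32, positive=True, rng=None):
    """Pick a patch center: from foreground voxels when positive,
    from background otherwise. Alternating the two guarantees the
    model sees tumor patches despite severe class imbalance."""
    rng = rng or np.random.default_rng(0)
    coords = np.argwhere(label > 0) if positive else np.argwhere(label == 0)
    center = coords[rng.integers(len(coords))]
    # Clamp so the full patch stays inside the volume
    half = patch // 2
    return np.clip(center, half, np.array(label.shape) - half)

label = np.zeros((128, 128), dtype=np.uint8)
label[60:70, 60:70] = 1                     # small "tumor" region (<1% of image)
c = sample_patch_center(label, positive=True)
assert label[c[0], c[1]] == 1               # positive centers land on the tumor
```

With purely random cropping, a patch from this image would miss the tumor more than 99% of the time; pos/neg sampling is what keeps the foreground class represented in every batch.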
TorchIO is a complementary library focused specifically on 3D medical image loading, augmentation, and patch-based sampling. It follows PyTorch conventions, supports invertible transforms (useful for test-time augmentation), and includes MRI-specific artifact simulation (motion artifacts, ghosting, spike noise). It integrates seamlessly with both MONAI and raw PyTorch pipelines. Think of TorchIO as the augmentation specialist and MONAI as the full-stack framework.
Do you need to train every model from scratch? Not necessarily. Pre-trained models and transfer learning can give you a head start, but the picture in medical imaging is more nuanced than in natural image processing.
In computer vision, starting from an ImageNet-pretrained model is standard practice. In medical imaging, the evidence is mixed:
- Small datasets: when training data is limited, pre-trained weights provide a better starting point than random initialization. A 2016 study showed pre-trained CNNs outperformed from-scratch training across four medical imaging tasks and were more robust to small training sets.
- Similar domains: transfer from medical to medical (e.g., CT-to-CT) outperforms ImageNet-to-medical by about +2% Dice.
- Cross-dimensional: a clever 2023 approach embedded 2D pre-trained encoders into 3D U-Nets, achieving 91.69% Dice for whole tumor on BraTS 2022.
- Sufficient data: when you have enough training data (like full BraTS), models trained from scratch perform just as well.
- Foundation model hype: a critical 2026 study found that a standard U-Net outperformed fine-tuned SAM and MedSAM for neuroanatomic segmentation, even though SAM had orders of magnitude more pretraining data.
- The surprising finding: you can freeze the encoder at random values, train only the decoder, and still get competitive results, challenging the assumption that encoders must learn task-specific features.
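Freezing an encoder is a one-line experiment in PyTorch. A hedged sketch with a hypothetical toy network (not the study's actual setup):

```python
import torch.nn as nn

# Toy encoder-decoder stand-in for a segmentation network
model = nn.Sequential(
    nn.Sequential(nn.Conv3d(4, 16, 3, padding=1), nn.ReLU()),  # "encoder"
    nn.Sequential(nn.Conv3d(16, 4, 1)),                        # "decoder"
)
encoder, decoder = model[0], model[1]

# Freeze the encoder: its weights keep their (random or pre-trained) values
for p in encoder.parameters():
    p.requires_grad = False

# Only decoder parameters will receive gradient updates
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 68: the decoder's 4*16 weights + 4 biases
```

An optimizer built from `filter(lambda p: p.requires_grad, model.parameters())` then updates only the decoder.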
MONAI Model Zoo: Pre-trained models for various medical imaging tasks, directly loadable into MONAI pipelines. TotalSegmentator: A pre-trained nnU-Net for 80 anatomical structures in CT and MRI (Dice 0.839 on MRI, 0.966 on CT). Useful as a ready-to-use tool or as a starting point for fine-tuning. MedSAM / SAM-Med3D: Foundation models for interactive medical segmentation, though current evidence suggests they don’t consistently outperform well-trained task-specific models.
Deep learning on 3D medical images is computationally demanding. A single BraTS MRI volume is 240×240×155 voxels across 4 channels — over 35 million values per patient. Training a 3D U-Net on this data requires a GPU.
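A quick back-of-the-envelope check of those numbers:

```python
# One BraTS case: 4 modalities, each 240 x 240 x 155 voxels
voxels_per_volume = 240 * 240 * 155
values_per_patient = 4 * voxels_per_volume
print(values_per_patient)  # 35,712,000 values

# In float32 that is ~136 MB for the raw input alone, before
# activations, gradients, and optimizer state multiply the footprint:
mb = values_per_patient * 4 / (1024 ** 2)
print(round(mb, 1))
```

This is why patch-based training with crops like 96×96×96 (as in the transform pipeline above) is the norm rather than feeding whole volumes to the GPU.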
- Google Colab: free GPU access (T4, sometimes V100). Limited session time (~12 hours) and RAM. Good for learning and small experiments.
- Kaggle Notebooks: 30 hours/week of GPU time (P100 or T4). Slightly more generous than Colab.

Both platforms can run MONAI and nnU-Net for educational purposes, but serious training runs may time out.
- Cloud credits: Google Cloud, AWS, and Azure all offer $100–300 in student credits, enough for several full nnU-Net training runs.
- Lambda Labs / Paperspace: affordable GPU rentals ($0.50–2/hour for an A100). Good for serious training.
- Your school: ask your CS department; many have shared GPU servers that students can access.
# 1. Create a clean environment
conda create -n medseg python=3.10
conda activate medseg
# 2. Install PyTorch with CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# 3. Install MONAI (full version with all dependencies)
pip install "monai[all]"
# 4. Install additional tools
pip install nibabel simpleitk torchio matplotlib
# 5. (Optional) Install nnU-Net for comparison
pip install nnunetv2
# 6. Verify GPU access
python -c "import torch; print(torch.cuda.is_available())"
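Beyond the GPU check, a quick stdlib-only sanity check confirms the packages from the steps above are importable (safe to run even on a machine where some installs failed):

```python
import importlib.util

def check_packages(names):
    """Report which packages are importable without importing them."""
    return {name: importlib.util.find_spec(name) is not None
            for name in names}

# Module names as imported, not pip package names (SimpleITK is capitalized)
status = check_packages(["torch", "monai", "nibabel", "SimpleITK", "torchio"])
for name, ok in status.items():
    print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```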
Don’t try to learn everything at once. Here’s a structured path for this week:
If you haven’t already, install 3D Slicer and load some BraTS data. Then install MONAI Label and connect it to 3D Slicer. Try the interactive segmentation — click on a tumor, let the AI suggest a boundary, correct it. This gives you intuition for what annotation feels like and why AI assistance matters.
Work through the official MONAI tutorials. Start with the “MONAI 101” notebook, then the spleen segmentation tutorial. Pay attention to the transform pipeline — understand how LoadImaged, Spacingd, RandCropByPosNegLabeld work and why each exists. Try modifying parameters and see what changes.
Using what you’ve learned, build a minimal MONAI pipeline that loads BraTS data, applies transforms, creates a 3D U-Net, trains for a few epochs, and produces a segmentation prediction. It won’t be good yet — that’s fine. The goal is to have a working end-to-end pipeline you understand.
Run nnU-Net on the same data (even just one fold). Compare its output to your MONAI pipeline. Notice the performance gap — this gap is what nnU-Net’s automated optimization is worth. In Weeks 4–6, you’ll learn to close it.
See documentation/how_to_use_nnunet.md in the nnU-Net repository for the step-by-step guide.