Week 02 — Q1

MRI Preprocessing: Skull-Stripping & Intensity Normalization

Raw MRI data straight from the scanner is messy. Before any segmentation model can work reliably, you need to clean, standardize, and prepare the images. This week covers the essential preprocessing steps that every brain tumor segmentation pipeline depends on.

Why Can’t We Just Use Raw MRI Data?

If you’ve ever taken a photo with your phone, you know the image looks different depending on lighting, camera settings, and the phone model. MRI is the same — except worse. The signal intensities in an MRI image are arbitrary. Unlike CT scans (where a Hounsfield unit of 0 always means water), a “bright voxel” on one MRI scanner might represent completely different tissue intensity on another.

On top of that, raw MRI scans include the skull, skin, fat, eyeballs, and other non-brain tissue. The images may have smooth intensity gradients across the volume caused by RF coil imperfections (called bias field). Different hospitals use different scanners, protocols, and resolutions. And patients’ heads are positioned differently in the scanner each time.

If you feed all of this variability into a deep learning model, it will learn the wrong things — latching onto scanner differences instead of tumor characteristics. Studies show that models trained at one hospital can drop from Dice scores of 0.72–0.76 down to 0.59–0.68 when tested at a different institution, largely due to these preprocessing-related differences.

💡
Key insight: Preprocessing errors propagate directly to segmentation errors. Research has shown that failure to reproduce the exact preprocessing pipeline of a published method makes it impossible to reproduce the segmentation results. Getting preprocessing right isn’t optional — it’s foundational.

Understanding NIfTI Files

Before you preprocess anything, you need to understand the file format. Brain MRI data for research (including BraTS) comes in NIfTI format (.nii or .nii.gz). A NIfTI file contains two things: a header (metadata about the image) and the image data itself (a 3D array of voxel intensities).

What’s in the Header?

Dimensions: The size of the 3D volume (e.g., 240 × 240 × 155 voxels for BraTS).
Voxel spacing (pixdim): The real-world size of each voxel in millimeters (e.g., 1mm × 1mm × 1mm).
Orientation (qform/sform): How the voxel grid maps to physical patient space (left-right, anterior-posterior, superior-inferior).
Data type: Whether intensities are stored as integers or floating-point numbers.

⚠️
Common pitfall — left-right flips: Research has documented widespread left-right orientation errors in MRI databases. If your orientation is wrong, the model may learn a mirror-image of reality. Always verify orientation visually in 3D Slicer or ITK-SNAP before processing. The original DICOM-to-NIfTI conversion step is where these errors typically originate.

In Python, you load and inspect NIfTI files with nibabel:

import nibabel as nib
import numpy as np

# Load a NIfTI file
img = nib.load('BraTS_001_t1ce.nii.gz')

# Access the header metadata
header = img.header
print(header.get_zooms()) # Voxel spacing, e.g. (1.0, 1.0, 1.0)
print(header.get_data_shape()) # Volume dimensions, e.g. (240, 240, 155)

# Get the image data as a NumPy array
data = img.get_fdata() # Shape: (240, 240, 155)
print(data.dtype, data.min(), data.max())

The MRI Preprocessing Pipeline, Step by Step

There’s a recommended order of operations based on how each step interacts with the others. This is the pipeline used in BraTS and most brain tumor segmentation research:

STEP 1
N4 Bias Field Correction

Remove the smooth intensity gradients caused by RF coil non-uniformity. This must come first because the bias field affects the entire image, and all subsequent steps assume relatively uniform tissue intensities.

STEP 2
Co-Registration

Align all four MRI modalities (T1, T1ce, T2, FLAIR) to a common space so the same voxel coordinate refers to the same brain location across sequences. This uses rigid or affine registration.

STEP 3
Atlas Registration / Spatial Normalization

Register the brain to a standard anatomical template (BraTS uses the SRI24 atlas). This puts every patient’s brain into the same coordinate frame, making it possible to compare across subjects.

STEP 4
Skull-Stripping (Brain Extraction)

Remove all non-brain tissue: skull bone, skin, fat, eyes, neck. This isolates the brain volume so the model only sees relevant tissue and isn’t distracted by irrelevant anatomy.

STEP 5
Intensity Normalization

Standardize the voxel intensity values so that the same tissue type has approximately the same intensity across different patients and scanners. Z-score normalization is the most common approach.

STEP 6
Resampling to Target Resolution

Resample all images to a uniform voxel spacing (typically 1mm × 1mm × 1mm isotropic). Use trilinear interpolation for images and nearest-neighbor interpolation for segmentation labels.

📚
BraTS does this for you (mostly): The BraTS challenge dataset comes pre-processed — images are already co-registered, skull-stripped, and resampled to 1mm isotropic on the SRI24 atlas. This is great for getting started, but understanding what happened behind the scenes is critical because real-world clinical data won’t be pre-processed for you.

N4 Bias Field Correction

Look at a raw brain MRI and you’ll often notice that one side of the brain appears brighter than the other, or the center is darker than the edges. This isn’t biology — it’s an artifact from the RF receive coil’s non-uniform sensitivity. This smooth, low-frequency intensity variation is called the bias field (or “inhomogeneity field”).

N4ITK (an improvement over the earlier N3 algorithm) is the standard tool for correcting this. It estimates the bias field as a smooth B-spline surface and divides it out of the image. The original N4ITK paper by Tustison et al. (2010) remains the most cited reference for this step.

A 2024 study evaluated 240 different N4 parameter configurations and found that results are sensitive to parameter choices — but for brain MRI, the defaults in most tools (SimpleITK, ANTs) work well. The key takeaway from the literature: N4 alone is not enough. A study on 615 glioma patients showed that using N4 without subsequent intensity normalization caused dramatic performance drops (AUC dropping from 0.85 to 0.19–0.52 on external data). N4 must be paired with Z-score or WhiteStripe normalization.

import SimpleITK as sitk

# Load the image
input_img = sitk.ReadImage('scan_t1ce.nii.gz', sitk.sitkFloat32)

# Head mask via Otsu thresholding (optional but recommended, so the
# bias estimate isn't driven by background air; head voxels -> 1)
mask = sitk.OtsuThreshold(input_img, 0, 1, 200)

# Run N4 bias field correction
corrector = sitk.N4BiasFieldCorrectionImageFilter()
corrected = corrector.Execute(input_img, mask)

# Save the result
sitk.WriteImage(corrected, 'scan_t1ce_n4.nii.gz')

Skull-Stripping (Brain Extraction)

Skull-stripping removes everything that isn’t brain: skull bone, scalp skin, fat, meninges, eyes, and neck tissue. It’s one of the most critical preprocessing steps because errors here propagate to everything downstream — bad brain extraction means bad segmentation.

There are two generations of tools:

Classical Methods

FSL BET — Uses a deformable surface model. Fast but sensitive to parameter tuning and inconsistent across datasets. High failure rates on pathological brains.
FreeSurfer — Hybrid watershed algorithm. Robust for T1-weighted images but very slow (hours per scan).
ROBEX — Random Forest + point distribution model. Significantly outperformed BET and FreeSurfer across three public datasets with better cross-dataset consistency.

Deep Learning Methods

HD-BET — Neural network trained on multisequence MRI. Outperformed 6 popular methods by +1.16 to +2.50 Dice points. Robust on brains with tumors. The top recommendation for tumor patients.
SynthStrip — Trained entirely on synthetic data. Works across any contrast, resolution, and age group with a single model. ~7 seconds per scan. Best for general-purpose use.
deepbet — Trained on 7,837 images from 191 datasets. State-of-the-art Dice of 99.0%. Processes one image in ~2 seconds.

0.989 — Dice score of nnU-Net-based brain extraction on tumor patients (multi-center)
~7 sec — SynthStrip processing time per scan (vs. hours for FreeSurfer)
94% — Improvement of OptiBET over conventional tools on severely pathological brains

The Tumor Problem

Standard skull-stripping tools were designed for healthy brains. Tumors cause mass effect (pushing brain tissue aside), midline shift (displacing the brain’s centerline), and disrupted boundaries (tumor breaking through normal tissue borders). Classical tools often fail catastrophically on these cases.

Solutions include: HD-BET (specifically trained on tumor cases), OptiBET (uses registration-based back-projection to handle severely pathological brains), and modality-agnostic training (Thakur et al., 2020 — evaluated on 3,340 brain tumor scans from multiple institutions and generalizes across available MRI sequences without retraining).

💡
Recommendation for BraTS work: Use HD-BET for brain tumor cases. It was developed by the same group that created nnU-Net (DKFZ) and is specifically validated on tumor patients. For non-tumor work or general-purpose use, SynthStrip is the most versatile single tool.

Co-Registration & Spatial Normalization

Co-registration aligns the four MRI modalities (T1, T1ce, T2, FLAIR) to each other. Since they’re acquired in the same scanning session, the patient’s head is in roughly the same position, but small movements between sequences mean the images aren’t perfectly aligned. Rigid registration (translation + rotation, 6 parameters) usually suffices for within-session alignment.

Atlas registration maps the patient’s brain to a standard template (BraTS uses the SRI24 atlas, created from 24 normal adult brains at 3T). This puts all patients into a common coordinate frame. It typically uses affine registration (translation + rotation + scaling + shearing, 12 parameters).

Which Registration Tool?

The largest benchmark of 14 registration algorithms (Klein et al., 2009, over 45,000 registrations) found that ANTs SyN delivered the most consistently high accuracy. For linear registration, a comparison on nearly 10,000 images found that MINC’s BestLinReg had the lowest failure rate (0.44%) while FSL FLIRT had 11.11% failures and SPM had 30.66%.

The Tumor Challenge (Again)

Registration is fundamentally harder with tumors because tumors have no correspondence in the template brain — they create tissue where none should exist. Research shows that lower-grade gliomas register more accurately than glioblastomas, and that for mapping tumor locations, linear registration actually works as well as non-linear approaches. Advanced solutions include tumor growth simulation, pathology-aware registration (PORTR), and deep learning methods that “hallucinate” what the brain would look like without the tumor before registering.

Intensity Normalization

This is arguably the most impactful preprocessing step for multi-site generalizability. Remember: MRI intensities are arbitrary. The same brain tissue can have completely different voxel values on different scanners. If you don’t normalize, your model learns scanner-specific patterns instead of anatomy.

Methods Compared

Z-Score Normalization (Most Common)

Subtract the mean and divide by the standard deviation of brain voxels. Simple, effective, and the default in most deep learning pipelines (including nnU-Net). A study on multi-institutional brain MRI showed Z-score normalization increased tumor grade classification accuracy from 0.67 to 0.82 (p = .005). Set voxels outside the brain mask to zero after normalizing.

WhiteStripe Normalization

Normalizes based on the intensity distribution of normal-appearing white matter. More principled than Z-score because it anchors to a specific tissue type, but requires identifying white matter voxels. Performs comparably to Z-score in most benchmarks.

Histogram Matching (Nyul)

Aligns the intensity histogram of each image to a reference histogram learned from a training set. Good for radiomic feature consistency. One systematic review of glioma studies found histogram matching produced the most reliable features, though Z-score is more widely used in segmentation pipelines.

The critical evidence comes from Foltyn-Dumitru et al. (2024): testing four strategies on 615 glioma patients, N4 + Z-score and N4 + WhiteStripe both maintained high performance on external data (AUC 0.85–0.87), while using N4 alone or no normalization caused AUC to plummet to 0.19–0.52. The lesson is clear: always normalize intensities after bias field correction.

import numpy as np
import nibabel as nib

# Load the bias-corrected, skull-stripped image
img = nib.load('brain_t1ce_n4_stripped.nii.gz')
data = img.get_fdata()

# Create brain mask (non-zero voxels)
brain_mask = data > 0

# Z-score normalize within the brain
brain_voxels = data[brain_mask]
mean_val = np.mean(brain_voxels)
std_val = np.std(brain_voxels)
data[brain_mask] = (data[brain_mask] - mean_val) / std_val
data[~brain_mask] = 0 # Set non-brain to zero

# Save
out = nib.Nifti1Image(data, img.affine, img.header)
nib.save(out, 'brain_t1ce_normalized.nii.gz')

Domain Harmonization: The Multi-Site Problem

Even after standard preprocessing, scanner-related differences persist. This is called the scanner effect or domain shift, and it’s one of the biggest unsolved problems in medical image analysis. When your model trained at Hospital A performs poorly at Hospital B, this is usually why.

Advanced harmonization techniques aim to explicitly remove scanner effects:

ComBat

A statistical method originally from genomics, now widely used in neuroimaging. It models and removes batch effects (scanner differences) from extracted features. A comprehensive 2025 study on 28 MR scanners found ComBat effective for most metrics but less successful for functional connectivity measures. The Longitudinal ComBat variant handles data collected over time. Multiple studies confirm ComBat is essential for radiomic analyses.
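Full ComBat pools per-scanner estimates with empirical Bayes and can protect biological covariates (see the neuroCombat/neuroHarmonize implementations); the core idea — removing per-scanner location and scale shifts from extracted features — can be sketched in a simplified form. Everything below is my own illustrative code, not the real ComBat algorithm:

```python
import numpy as np

def location_scale_adjust(features, batch):
    """Align each scanner batch's per-feature mean and variance to the
    pooled values. A simplified stand-in for ComBat: no empirical-Bayes
    shrinkage and no protection of biological covariates."""
    features = np.asarray(features, dtype=float)
    batch = np.asarray(batch)
    out = np.empty_like(features)
    grand_mean = features.mean(axis=0)
    grand_std = features.std(axis=0)
    for b in np.unique(batch):
        idx = batch == b
        m = features[idx].mean(axis=0)
        s = features[idx].std(axis=0)
        s[s == 0] = 1.0  # guard against constant features within a batch
        out[idx] = (features[idx] - m) / s * grand_std + grand_mean
    return out

# Two scanners with a systematic offset in their radiomic features
rng = np.random.default_rng(1)
feats = np.concatenate([rng.normal(0.0, 1.0, (50, 3)),
                        rng.normal(5.0, 2.0, (50, 3))])
batch = np.array([0] * 50 + [1] * 50)
adjusted = location_scale_adjust(feats, batch)
```

After adjustment, both scanners' feature distributions share the pooled mean and variance, so a downstream classifier can no longer separate samples by scanner alone.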

Deep Learning Approaches (CycleGAN, Style Transfer)

Neural networks that translate images from one scanner’s “style” to another’s. A style transfer GAN framework achieved 94.41% balanced accuracy harmonizing images from unseen scanners across five large-scale datasets. However, a comparison study found that no current method fully harmonizes longitudinal multi-scanner data — this remains an active research frontier.

📚
For BraTS beginners: Don’t worry about advanced harmonization yet. Standard preprocessing (N4 + skull-stripping + Z-score normalization + resampling) is what you need for BraTS data. Harmonization becomes important when you work with your own multi-institutional clinical data or do radiomic feature extraction (Week 12).

Quality Control: Checking Your Work

Automated preprocessing can fail silently. A skull-stripping algorithm might clip off part of the cerebellum, or registration might subtly misalign a scan. These errors are invisible in aggregate metrics but devastating for individual predictions. Always visually inspect your preprocessed data.

What to Check

After skull-stripping: Load the brain mask overlaid on the original image in 3D Slicer. Scroll through all three planes (axial, sagittal, coronal). Look for: brain tissue outside the mask (under-stripping), non-brain tissue inside the mask (over-inclusion), or clipped regions (especially cerebellum and temporal lobes).

After registration: Overlay the registered image on the atlas template. Structures should roughly align. Major misalignment is obvious visually. A deep learning QC tool called RegQCNET can automatically estimate registration error, but visual checks remain the gold standard.

After normalization: Check that the intensity histogram looks reasonable — roughly Gaussian for brain voxels, no extreme outliers, zero-centered after Z-score.
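The post-normalization check can be automated as a small report; the function name and thresholds below are my own, not a standard tool:

```python
import numpy as np

def qc_normalized(data, brain_mask, outlier_z=6.0):
    """Sanity-check a Z-score-normalized volume: brain voxels should be
    roughly zero-mean, unit-variance, finite, and light-tailed."""
    brain = data[brain_mask]
    return {
        'mean': float(brain.mean()),                       # expect ~0
        'std': float(brain.std()),                         # expect ~1
        'all_finite': bool(np.isfinite(brain).all()),
        'outlier_frac': float((np.abs(brain) > outlier_z).mean()),
        'background_zero': bool((data[~brain_mask] == 0).all()),
    }

# Example on a synthetic, already-normalized volume
rng = np.random.default_rng(2)
vol = np.zeros((32, 32, 32))
mask = np.zeros_like(vol, dtype=bool)
mask[8:24, 8:24, 8:24] = True
vals = rng.normal(0.0, 1.0, mask.sum())
vol[mask] = (vals - vals.mean()) / vals.std()
report = qc_normalized(vol, mask)
```

A high `outlier_frac` or a mean far from zero is a cue to go back and inspect the scan visually rather than feed it to the model.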

Automated QC Tools

MRIQC extracts quality metrics from MRI scans and can flag problematic images (76% accuracy on new sites). MRQy is an open-source tool that can identify site-specific variations. However, a study comparing 12 QC strategies found that manual visual inspection still outperforms all automated alternatives — so don’t skip the eyeballing.

Essential Tools for This Week

The Software You Need Installed

3D Slicer or ITK-SNAP — For visual inspection. Load NIfTI files, scroll through slices, overlay masks. Non-negotiable.
Python + nibabel — For loading and manipulating NIfTI files programmatically.
SimpleITK — For N4 bias field correction, resampling, and registration. More feature-rich than nibabel for processing operations.
ANTsPy — Python wrapper for ANTs, the top-performing registration toolkit. Also includes N4 and brain extraction.
HD-BET — For skull-stripping on tumor cases (install via pip: pip install hd-bet).

This Week’s Learning Resources

Start Here (Beginner-Friendly)

Download 3D Slicer this week. Load some BraTS NIfTI files, overlay the segmentation labels, and practice navigating axial/sagittal/coronal views. This is your visual verification tool for every preprocessing step.
nibabel — The Python library you’ll use constantly for loading and saving NIfTI files. Read the quickstart guide and try loading a BraTS scan, printing its header info, and visualizing a single slice with matplotlib.
SimpleITK — Your Swiss Army knife for image processing: N4 correction, resampling, registration, filtering. The SimpleITK Notebooks on GitHub have worked examples for every operation you’ll need.
HD-BET — The recommended skull-stripping tool for brain tumor cases. From the same DKFZ group that built nnU-Net. Install with pip, run on your MRI, and compare the output mask to the original image in 3D Slicer.
SynthStrip — Best general-purpose skull-stripping tool. Works on any MRI contrast, resolution, or age group with a single model. ~7 seconds per scan. Good alternative to HD-BET for non-tumor cases.
If you want to understand why bias fields exist and how MRI signal formation works, this award-winning interactive course is the best free resource. Helps you understand what you’re correcting and why.
BraTS Toolkit — Open-source preprocessing pipeline that reproduces the BraTS challenge preprocessing. The BraTS Preprocessor component handles image conversion, registration, and brain extraction in the same way the official challenge data is prepared.

Key Papers

The HD-BET paper. Validated on multisequence MRI including brain tumor patients. Outperformed six popular methods. Read this to understand why deep learning skull-stripping beats classical approaches on pathological brains.
Hum Brain Mapp. 2019;40(17):4952–4964
The SynthStrip paper. Demonstrates that training on entirely synthetic data can produce a single model that generalizes across contrasts, resolutions, and age groups. A paradigm shift in brain extraction methodology.
NeuroImage. 2022;260:119474
The strongest evidence that intensity normalization is essential. Shows N4 alone causes AUC to collapse on external data, while N4 + Z-score maintains performance. 615 glioma patients across institutions.
Eur Radiol. 2024;34(4):2535–2544
Systematic comparison of normalization methods on multi-institutional data. Z-score normalization increased classification accuracy from 0.67 to 0.82. Essential reading for understanding why normalization matters.
Sci Rep. 2020;10:12340
The foundational N4 bias field correction paper. Describes the B-spline approximation and hierarchical optimization that made N4 the standard. Still the most cited reference for bias correction.
IEEE Trans Med Imaging. 2010;29(6):1310–1320
The foundational BraTS paper describing the standardized preprocessing protocol (SRI24 atlas registration, skull-stripping, 1mm isotropic resampling) used for all BraTS datasets. Context for why the data you download looks the way it does.
IEEE Trans Med Imaging. 2015;34(10):1993–2024

Deep Dives (Advanced)

The definitive benchmark: 45,000+ registrations comparing ANTs, FSL, SPM, and 11 others. ANTs SyN won. Read this when you’re choosing a registration tool for your own pipeline.
Introduces the MICA tool for harmonizing images across scanners by aligning intensity distributions. Relevant when you move beyond BraTS to multi-site clinical data.
Comprehensive overview of ANTs, ANTsPy, and ANTsPyNet for registration, bias correction, segmentation, and deep learning. The complete reference for the ANTs ecosystem.
Multi-institutional evaluation of deep learning skull-stripping on 3,340 brain tumor scans. Introduces modality-agnostic training that works across any available MRI sequences.