Raw MRI data straight from the scanner is messy. Before any segmentation model can work reliably, you need to clean, standardize, and prepare the images. This week covers the essential preprocessing steps that every brain tumor segmentation pipeline depends on.
If you’ve ever taken a photo with your phone, you know the image looks different depending on lighting, camera settings, and the phone model. MRI is the same, only worse. The signal intensities in an MRI image are arbitrary: unlike CT scans (where a Hounsfield unit of 0 always means water), a voxel value that represents one tissue on one MRI scanner might represent something completely different on another.
On top of that, raw MRI scans include the skull, skin, fat, eyeballs, and other non-brain tissue. The images may have smooth intensity gradients across the volume caused by RF coil imperfections (called bias field). Different hospitals use different scanners, protocols, and resolutions. And patients’ heads are positioned differently in the scanner each time.
If you feed all of this variability into a deep learning model, it will learn the wrong things — latching onto scanner differences instead of tumor characteristics. Studies show that models trained at one hospital can drop from Dice scores of 0.72–0.76 down to 0.59–0.68 when tested at a different institution, largely due to these preprocessing-related differences.
Before you preprocess anything, you need to understand the file format. Brain MRI data for research (including BraTS) comes in NIfTI format (.nii or .nii.gz). A NIfTI file contains two things: a header (metadata about the image) and the image data itself (a 3D array of voxel intensities).
The key header fields:
Dimensions: The size of the 3D volume (e.g., 240 × 240 × 155 voxels for BraTS).
Voxel spacing (pixdim): The real-world size of each voxel in millimeters (e.g., 1mm × 1mm × 1mm).
Orientation (qform/sform): How the voxel grid maps to physical patient space (left-right, anterior-posterior, superior-inferior).
Data type: Whether intensities are stored as integers or floating-point numbers.
In Python, you load and inspect NIfTI files with nibabel:
import nibabel as nib
import numpy as np
# Load a NIfTI file
img = nib.load('BraTS_001_t1ce.nii.gz')
# Access the header metadata
header = img.header
print(header.get_zooms()) # Voxel spacing, e.g. (1.0, 1.0, 1.0)
print(header.get_data_shape()) # Volume dimensions, e.g. (240, 240, 155)
# Get the image data as a NumPy array
data = img.get_fdata() # Shape: (240, 240, 155)
print(data.dtype, data.min(), data.max())
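The header fields just printed pin down the physical extent and memory footprint of the volume. A quick arithmetic check using the BraTS numbers (assuming the volume is cast to float32, as many pipelines do):

```python
import numpy as np

# BraTS volume: 240 x 240 x 155 voxels at 1 mm isotropic spacing
shape = np.array([240, 240, 155])
spacing_mm = np.array([1.0, 1.0, 1.0])

# Physical extent of the field of view, in millimeters
extent_mm = shape * spacing_mm
print(extent_mm)  # [240. 240. 155.]

# Memory footprint as float32 (4 bytes per voxel)
n_voxels = int(np.prod(shape))
size_mb = n_voxels * 4 / 1024**2
print(f"{n_voxels} voxels, {size_mb:.1f} MB per modality as float32")
```

Four modalities per patient, so a single BraTS case occupies well over 100 MB in float32 before any augmentation.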
There’s a recommended order of operations based on how each step interacts with the others. This is the pipeline used in BraTS and most brain tumor segmentation research:
1. Bias field correction: Remove the smooth intensity gradients caused by RF coil non-uniformity. This must come first because the bias field affects the entire image, and all subsequent steps assume relatively uniform tissue intensities.
2. Co-registration: Align all four MRI modalities (T1, T1ce, T2, FLAIR) to a common space so the same voxel coordinate refers to the same brain location across sequences. This uses rigid or affine registration.
3. Atlas registration: Register the brain to a standard anatomical template (BraTS uses the SRI24 atlas). This puts every patient’s brain into the same coordinate frame, making it possible to compare across subjects.
4. Skull-stripping: Remove all non-brain tissue: skull bone, skin, fat, eyes, neck. This isolates the brain volume so the model only sees relevant tissue and isn’t distracted by irrelevant anatomy.
5. Intensity normalization: Standardize the voxel intensity values so that the same tissue type has approximately the same intensity across different patients and scanners. Z-score normalization is the most common approach.
6. Resampling: Resample all images to a uniform voxel spacing (typically 1mm × 1mm × 1mm isotropic). Use trilinear interpolation for images and nearest-neighbor interpolation for segmentation labels.
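Why nearest-neighbor for labels? A minimal 1D NumPy sketch (not a replacement for a real resampler) shows what goes wrong otherwise: linear interpolation invents in-between label values that correspond to no class.

```python
import numpy as np

# A 1D segmentation profile: background (0) then tumor (label 2)
labels = np.array([0, 0, 0, 2, 2, 2], dtype=float)

# Resample from 6 voxels to 11 by sampling at fractional source positions
src_pos = np.linspace(0, len(labels) - 1, 11)

# Linear interpolation: produces impossible label values like 1.0 at the boundary
linear = np.interp(src_pos, np.arange(len(labels)), labels)

# Nearest-neighbor: snaps to the closest source voxel, preserving the label set
nearest = labels[np.round(src_pos).astype(int)]

print(sorted(set(linear)))   # fractional values appear
print(sorted(set(nearest)))  # only the original labels 0 and 2 survive
```

The same logic holds in 3D, which is why SimpleITK's resampler is switched to nearest-neighbor interpolation when the input is a label map.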
Look at a raw brain MRI and you’ll often notice that one side of the brain appears brighter than the other, or the center is darker than the edges. This isn’t biology — it’s an artifact from the RF receive coil’s non-uniform sensitivity. This smooth, low-frequency intensity variation is called the bias field (or “inhomogeneity field”).
N4ITK (an improvement over the earlier N3 algorithm) is the standard tool for correcting this. It estimates the bias field as a smooth B-spline surface and divides it out of the image. The original N4ITK paper by Tustison et al. (2010) remains the most cited reference for this step.
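The model N4 assumes is multiplicative: the observed image is the true image times a smooth field. A toy NumPy demonstration of why dividing the field out restores the signal (here the field is known by construction; N4's actual work is estimating it from the image):

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" tissue intensities along one image row: two tissue classes
true_signal = rng.choice([100.0, 200.0], size=256)

# A smooth, low-frequency bias field (brighter toward one side)
x = np.linspace(0, 1, 256)
bias_field = 0.7 + 0.6 * x          # varies smoothly from 0.7 to 1.3

# The scanner records the product of signal and field
observed = true_signal * bias_field

# Dividing out the (estimated) field recovers uniform tissue intensities
corrected = observed / bias_field
print(np.allclose(corrected, true_signal))  # True
```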
A 2024 study evaluated 240 different N4 parameter configurations and found that results are sensitive to parameter choices — but for brain MRI, the defaults in most tools (SimpleITK, ANTs) work well. The key takeaway from the literature: N4 alone is not enough. A study on 615 glioma patients showed that using N4 without subsequent intensity normalization caused dramatic performance drops (AUC dropping from 0.85 to 0.19–0.52 on external data). N4 must be paired with Z-score or WhiteStripe normalization.
import SimpleITK as sitk
# Load the image
input_img = sitk.ReadImage('scan_t1ce.nii.gz', sitk.sitkFloat32)
# Create a rough head mask via Otsu thresholding (optional but recommended,
# so the field is estimated from tissue rather than background air)
mask = sitk.OtsuThreshold(input_img, 0, 1, 200)
# Run N4 bias field correction
corrector = sitk.N4BiasFieldCorrectionImageFilter()
corrected = corrector.Execute(input_img, mask)
# Save the result
sitk.WriteImage(corrected, 'scan_t1ce_n4.nii.gz')
Skull-stripping removes everything that isn’t brain: skull bone, scalp skin, fat, meninges, eyes, and neck tissue. It’s one of the most critical preprocessing steps because errors here propagate to everything downstream — bad brain extraction means bad segmentation.
There are two generations of tools:
Classical methods:
FSL BET — Uses a deformable surface model. Fast, but sensitive to parameter tuning and inconsistent across datasets, with high failure rates on pathological brains.
FreeSurfer — Hybrid watershed algorithm. Robust for T1-weighted images but very slow (hours per scan).
ROBEX — Random Forest + point distribution model. Significantly outperformed BET and FreeSurfer across three public datasets, with better cross-dataset consistency.
Deep learning methods:
HD-BET — Neural network trained on multisequence MRI. Outperformed six popular methods by +1.16 to +2.50 Dice points and is robust on brains with tumors. The top recommendation for tumor patients.
SynthStrip — Trained entirely on synthetic data. Works across any contrast, resolution, and age group with a single model, at ~7 seconds per scan. Best for general-purpose use.
deepbet — Trained on 7,837 images from 191 datasets. State-of-the-art Dice of 99.0% and processes one image in ~2 seconds.
Standard skull-stripping tools were designed for healthy brains. Tumors cause mass effect (pushing brain tissue aside), midline shift (displacing the brain’s centerline), and disrupted boundaries (tumor breaking through normal tissue borders). Classical tools often fail catastrophically on these cases.
Solutions include: HD-BET (specifically trained on tumor cases), OptiBET (uses registration-based back-projection to handle severely pathological brains), and modality-agnostic training (Thakur et al., 2020 — evaluated on 3,340 brain tumor scans from multiple institutions and generalizes across available MRI sequences without retraining).
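Whichever tool produces the binary brain mask, applying it to an image is just an element-wise multiply. A minimal NumPy sketch with placeholder arrays (in practice HD-BET or SynthStrip writes the mask file for you):

```python
import numpy as np

# Placeholder volumes standing in for a loaded MRI and its binary brain mask
image = np.full((4, 4, 4), 150.0)
brain_mask = np.zeros((4, 4, 4))
brain_mask[1:3, 1:3, 1:3] = 1.0   # "brain" occupies the center 2x2x2 block

# Skull-stripping, once a mask exists, zeroes everything outside the brain
stripped = image * brain_mask

print(stripped[1, 1, 1])  # 150.0 (inside the mask, intensity preserved)
print(stripped[0, 0, 0])  # 0.0   (outside the mask, zeroed)
```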
Co-registration aligns the four MRI modalities (T1, T1ce, T2, FLAIR) to each other. Since they’re acquired in the same scanning session, the patient’s head is in roughly the same position, but small movements between sequences mean the images aren’t perfectly aligned. Rigid registration (translation + rotation, 6 parameters) usually suffices for within-session alignment.
Atlas registration maps the patient’s brain to a standard template (BraTS uses the SRI24 atlas, created from 24 normal adult brains at 3T). This puts all patients into a common coordinate frame. It typically uses affine registration (translation + rotation + scaling + shearing, 12 parameters).
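The difference between the two transform classes can be made concrete with 4 × 4 homogeneous matrices. A minimal NumPy sketch (real tools like ANTs estimate these parameters by optimizing an image-similarity metric; this only illustrates the geometry):

```python
import numpy as np

def rigid_transform(theta_z, translation):
    """Illustrative rigid transform: one rotation about z plus a 3D translation.
    A full rigid transform has 6 parameters (3 rotations + 3 translations)."""
    c, s = np.cos(theta_z), np.sin(theta_z)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

# Rigid transforms preserve distances between points
T = rigid_transform(np.pi / 2, [10.0, 0.0, 0.0])
p1 = T @ np.array([1.0, 0.0, 0.0, 1.0])
p2 = T @ np.array([0.0, 1.0, 0.0, 1.0])
rigid_dist = np.linalg.norm(p1[:3] - p2[:3])   # stays sqrt(2)

# An affine transform (here: scaling x by 2) changes distances — shape deforms
A = np.diag([2.0, 1.0, 1.0, 1.0])
q1 = A @ np.array([1.0, 0.0, 0.0, 1.0])
q2 = A @ np.array([0.0, 1.0, 0.0, 1.0])
affine_dist = np.linalg.norm(q1[:3] - q2[:3])  # becomes sqrt(5)

print(rigid_dist, affine_dist)
```

Distance preservation is exactly why rigid registration suffices within a session (the head doesn't change shape between sequences), while mapping a patient's brain onto an atlas needs the extra scaling and shearing freedom of an affine transform.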
The largest benchmark of 14 registration algorithms (Klein et al., 2009, over 45,000 registrations) found that ANTs SyN delivered the most consistently high accuracy. For linear registration, a comparison on nearly 10,000 images found that MINC’s BestLinReg had the lowest failure rate (0.44%) while FSL FLIRT had 11.11% failures and SPM had 30.66%.
Registration is fundamentally harder with tumors because tumors have no correspondence in the template brain — they create tissue where none should exist. Research shows that lower-grade gliomas register more accurately than glioblastomas, and that for mapping tumor locations, linear registration actually works as well as non-linear approaches. Advanced solutions include tumor growth simulation, pathology-aware registration (PORTR), and deep learning methods that “hallucinate” what the brain would look like without the tumor before registering.
This is arguably the most impactful preprocessing step for multi-site generalizability. Remember: MRI intensities are arbitrary. The same brain tissue can have completely different voxel values on different scanners. If you don’t normalize, your model learns scanner-specific patterns instead of anatomy.
Z-score — Subtract the mean and divide by the standard deviation of brain voxels. Simple, effective, and the default in most deep learning pipelines (including nnU-Net). A study on multi-institutional brain MRI showed Z-score normalization increased tumor grade classification accuracy from 0.67 to 0.82 (p = .005). Set voxels outside the brain mask to zero after normalizing.
WhiteStripe — Normalizes based on the intensity distribution of normal-appearing white matter. More principled than Z-score because it anchors to a specific tissue type, but requires identifying white-matter voxels. Performs comparably to Z-score in most benchmarks.
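The real WhiteStripe algorithm fits the white-matter peak of a smoothed intensity histogram and normalizes within a "stripe" around it. The following is only a rough NumPy caricature on synthetic data, taking the histogram mode as the peak:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic brain intensities: gray matter near 80, white matter near 120;
# white matter dominates, so it forms the tallest histogram peak
gm = rng.normal(80, 5, 30000)
wm = rng.normal(120, 5, 70000)
intensities = np.concatenate([gm, wm])

# Crude white-matter peak estimate: center of the tallest histogram bin
counts, edges = np.histogram(intensities, bins=200)
i = np.argmax(counts)
wm_peak = 0.5 * (edges[i] + edges[i + 1])

# Anchor the white-matter peak at zero
normalized = intensities - wm_peak
print(round(wm_peak))  # close to 120
```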
Histogram matching — Aligns the intensity histogram of each image to a reference histogram learned from a training set. Good for radiomic feature consistency. One systematic review of glioma studies found histogram matching produced the most reliable features, though Z-score is more widely used in segmentation pipelines.
The critical evidence comes from Foltyn-Dumitru et al. (2024): testing four strategies on 615 glioma patients, N4 + Z-score and N4 + WhiteStripe both maintained high performance on external data (AUC 0.85–0.87), while using N4 alone or no normalization caused AUC to plummet to 0.19–0.52. The lesson is clear: always normalize intensities after bias field correction.
import numpy as np
import nibabel as nib
# Load the bias-corrected, skull-stripped image
img = nib.load('brain_t1ce_n4_stripped.nii.gz')
data = img.get_fdata()
# Create brain mask (non-zero voxels)
brain_mask = data > 0
# Z-score normalize within the brain
brain_voxels = data[brain_mask]
mean_val = np.mean(brain_voxels)
std_val = np.std(brain_voxels)
data[brain_mask] = (data[brain_mask] - mean_val) / std_val
data[~brain_mask] = 0 # Set non-brain to zero
# Save
out = nib.Nifti1Image(data, img.affine, img.header)
nib.save(out, 'brain_t1ce_normalized.nii.gz')
Even after standard preprocessing, scanner-related differences persist. This is called the scanner effect or domain shift, and it’s one of the biggest unsolved problems in medical image analysis. When your model trained at Hospital A performs poorly at Hospital B, this is usually why.
Advanced harmonization techniques aim to explicitly remove scanner effects:
ComBat — A statistical method originally from genomics, now widely used in neuroimaging. It models and removes batch effects (scanner differences) from extracted features. A comprehensive 2025 study on 28 MR scanners found ComBat effective for most metrics but less successful for functional connectivity measures. The Longitudinal ComBat variant handles data collected over time. Multiple studies confirm ComBat is essential for radiomic analyses.
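In spirit, ComBat estimates per-site location and scale effects and removes them. A bare-bones, single-feature version of that idea (illustration only — real analyses should use the neuroCombat or neuroHarmonize packages, which add empirical Bayes pooling and covariate preservation):

```python
import numpy as np

rng = np.random.default_rng(2)

# One radiomic feature measured at two sites with different scanner effects
site_a = rng.normal(10.0, 1.0, 500)   # site A shifts the feature up
site_b = rng.normal(7.0, 2.0, 500)    # site B shifts it down and widens it

# Location-scale harmonization: standardize per site, restore pooled stats
pooled = np.concatenate([site_a, site_b])
grand_mean, grand_std = pooled.mean(), pooled.std()

def harmonize(x):
    return (x - x.mean()) / x.std() * grand_std + grand_mean

ha, hb = harmonize(site_a), harmonize(site_b)
print(round(ha.mean(), 2), round(hb.mean(), 2))  # site means now agree
```

After harmonization the two sites share the same mean and spread, so a downstream model can no longer key on which scanner produced the feature.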
Deep learning image harmonization — Neural networks that translate images from one scanner’s “style” to another’s. A style transfer GAN framework achieved 94.41% balanced accuracy harmonizing images from unseen scanners across five large-scale datasets. However, a comparison study found that no current method fully harmonizes longitudinal multi-scanner data — this remains an active research frontier.
Automated preprocessing can fail silently. A skull-stripping algorithm might clip off part of the cerebellum, or registration might subtly misalign a scan. These errors are invisible in aggregate metrics but devastating for individual predictions. Always visually inspect your preprocessed data.
After skull-stripping: Load the brain mask overlaid on the original image in 3D Slicer. Scroll through all three planes (axial, sagittal, coronal). Look for: brain tissue outside the mask (under-stripping), non-brain tissue inside the mask (over-inclusion), or clipped regions (especially cerebellum and temporal lobes).
After registration: Overlay the registered image on the atlas template. Structures should roughly align. Major misalignment is obvious visually. A deep learning QC tool called RegQCNET can automatically estimate registration error, but visual checks remain the gold standard.
After normalization: Check that the intensity histogram looks reasonable — roughly Gaussian for brain voxels, no extreme outliers, zero-centered after Z-score.
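The normalization check can be automated as a few assertions run on every preprocessed volume. A sketch using NumPy on a synthetic volume (the tolerance thresholds are arbitrary sanity bounds, not a published standard):

```python
import numpy as np

def check_zscore(data, brain_mask, max_abs_z=15.0):
    """Sanity-check a Z-score-normalized volume: brain voxels should be
    zero-centered with unit variance, free of extreme outliers, and
    everything outside the mask should be exactly zero."""
    brain = data[brain_mask]
    assert abs(brain.mean()) < 1e-3, "brain voxels not zero-centered"
    assert abs(brain.std() - 1.0) < 1e-3, "brain voxels not unit variance"
    assert np.abs(brain).max() < max_abs_z, "extreme outlier intensities"
    assert np.allclose(data[~brain_mask], 0), "non-brain voxels not zeroed"
    return True

# Synthetic example, normalized the same way as the Z-score snippet above
rng = np.random.default_rng(3)
data = np.zeros((16, 16, 16))
mask = np.zeros((16, 16, 16), dtype=bool)
mask[4:12, 4:12, 4:12] = True
data[mask] = rng.normal(100, 10, mask.sum())
data[mask] = (data[mask] - data[mask].mean()) / data[mask].std()
print(check_zscore(data, mask))  # True
```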
MRIQC extracts quality metrics from MRI scans and can flag problematic images (76% accuracy on new sites). MRQy is an open-source tool that can identify site-specific variations. However, a study comparing 12 QC strategies found that manual visual inspection still outperforms all automated alternatives — so don’t skip the eyeballing.
3D Slicer or ITK-SNAP — For visual inspection. Load NIfTI files, scroll through slices, overlay masks. Non-negotiable.
Python + nibabel — For loading and manipulating NIfTI files programmatically.
SimpleITK — For N4 bias field correction, resampling, and registration. More feature-rich than nibabel for processing operations.
ANTsPy — Python wrapper for ANTs, the top-performing registration toolkit. Also includes N4 and brain extraction.
HD-BET — For skull-stripping on tumor cases (install via pip: pip install hd-bet).