Content from Course Introduction
Last updated on 2024-08-14
Estimated time: 10 minutes
Overview
Questions
- How can I use this course to be better at my research?
Objectives
- Explain how to get the most from the course
- Demonstrate and explain how the course will be laid out
This is a lesson created in the style of The Carpentries. It is written with the assumption that you already have basic skills in Git, Python and image processing.
The interpretation of medical images for clinical purposes requires skills that take highly trained professionals such as nuclear medicine specialists and radiologists many years to master. This course does not aim to improve such interpretive skills, but rather to enhance the computational skills needed to answer research questions involving medical images.
Some examples of the kinds of research questions that can be answered are:
Can we predict from brain MRIs that patients will develop dementia before they actually do?
Can we build machine learning models on ultrasound data which can aid in the detection of neuromuscular diseases?
Are there observable anatomical differences in the brains of autistic people at a population level?
Can we use existing medical imaging to screen for underdiagnosed conditions like osteoporosis?
You are in all likelihood here because you have a research question which can be answered with the processing and analysis of medical images. This course is meant to aid you.
Note that all figures and data presented are licensed under open-source terms.
Challenge: Can You Do It?
What is the best way to use the challenges and questions?
Do not peek at the solutions; try to solve each one yourself first. The effort will pay off.
Content from Medical Imaging Modalities
Last updated on 2024-10-15
Estimated time: 60 minutes
Overview
Questions
- What are the common types of diagnostic imaging?
- What sorts of computational challenges do they present?
Objectives
- Explain various common types of medical images
- Explain at a high level how images’ metadata is created and organized
Introduction
Medical imaging uses many technologies, including X-rays, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET) and microscopy. Although there are tendencies to use certain technologies, or modalities, to answer certain clinical questions, many modalities may provide information of interest for research questions. In order to work with digital images at scale we need to use information technology. We receive images in certain types of files, e.g., an x-ray stored at the hospital in DICOM format, but the image itself is contained as a 2D array inside a JPEG inside the DICOM. Understanding the kinds of files we are dealing with, and how the images within them were generated, can help us deal with them computationally.
Conceptually, we can think of medical images as signals. These signals need various kinds of processing before they are ‘readable’ by humans or by many of the algorithms we write.
While thinking about how the information from these signals is stored in different file types may seem less exciting than what the “true information” or final diagnosis from the image was, it is necessary to understand this to make the best algorithms possible. For example, a lot of hospital images are essentially JPEGs. This has implications in terms of image quality as we manipulate and resize the images.
Below are brief summaries of several very common imaging types. Keep in mind that manufacturers may have their own specifics in terms of file types not covered here, and there are many possibilities for how images could potentially be stored. Here we will discuss the files researchers are commonly given.
X-Rays
Historically, x-rays were the first common form of medical imaging. The diagram below should help you visualize how they are produced. The signal from an x-ray generator crosses the subject. Some tissues attenuate the radiation more than others. The signal is captured by an x-ray detector (you can think of this metaphorically like photographic film) on the other side of the subject.
As you can imagine, if you only have one view in an X-ray, every object in the path of the radiation from the generator is superimposed on every other object along that path. Even in the days of film, X-rays were often made in two views. In the case of chest X-rays these could be a posteroanterior (PA) view and a lateral view. In the case of joints the views may be specific; however, remember that in each view objects along the same line between the generator and receptor will be superimposed.
image courtesy of Radiopaedia, author and ID on image
Modern x-rays are born digital. No actual “film” is produced, rather a DICOM file which typically contains arrays in JPEG files.
We could use the metaphor of a wrapped present here. The DICOM file contains metadata wrapped around the image data. The image data itself is a set of 2D arrays, but these have been packed into a specific shape - they are "boxed" inside JPEG files, which act as containers for the pixel data. There are typically several JPEG files (emphasis on the plural) in a single DICOM file, usually containing images of the same body part acquired at different angles.
We can take x-rays from any angle and even take them repeatedly, which allows for fluoroscopy. Because fluoroscopy adds a time dimension to X-ray, the data become 3-dimensional, with X, Y and time dimensions. Below is a fluoroscopy image of a patient swallowing barium.
image courtesy of Ptrump16, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons
Fluoroscopy images are stored in a DICOM but can be displayed as movies because they are typically cine-files. Cine- is a file format that lets you store images in sequence with a frame rate.
Computed Tomography and Tomosynthesis
There are several kinds of tomography. This technique produces 3D-images, made of voxels, that allow us to see structures within a subject. CTs are extremely common, and helpful for many diagnostic questions, but have certain costs in terms of not only time and money, but also radiation to patients.
CTs and tomosynthetic images are produced with similar technology. One key difference is that in a CT the image is based on a 360-degree capture of the signal. You can conceptualize this as a spinning donut with the generator and receptor opposite each other. The raw data of a CT is a sinogram. Only by processing this data do we get what most people would recognize as a CT. At this level of processing there are already choices affecting the data we get. Let's examine two ways to process our sinograms:
PYTHON
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import iradon
from skimage.transform import iradon_sart
# load a sinogram of a simple phantom of the head
sino = np.load('data/medical/Schepp_Logan_sinogram.npy')
# make a filtered back projection reconstruction
theta = np.linspace(0.0, 180.0, max(sino.shape), endpoint=False)
reconstruction_fbp = iradon(sino, theta=theta, filter_name='ramp')
# make a reconstruction with Simultaneous Algebraic Reconstruction Technique
reconstruction_sart = iradon_sart(sino, theta=theta)
# plot with matplotlib
fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(9, 5), sharex=True, sharey=True)
ax1.set_title("Sinogram")
ax1.imshow(sino, cmap=plt.cm.Greys_r,)
ax2.set_title("Reconstruction\nFiltered back projection")
ax2.imshow(reconstruction_fbp, cmap=plt.cm.Greys_r)
ax3.set_title("Reconstruction\nSART")
ax3.imshow(reconstruction_sart, cmap=plt.cm.Greys_r)
ax4.set_title("Difference\n between reconstructions")
ax4.imshow(reconstruction_sart - reconstruction_fbp, cmap=plt.cm.Greys_r)
plt.show()
OUTPUT
Images generated from the Shepp–Logan phantom
While you may get an already processed CT (some commercial machines come with proprietary reconstruction algorithms which will already have been executed), it is not uncommon to get CTs as DICOM CT projection data (DICOM-CT-PD) files which can then be processed before viewing, or in some cases stored as other file types. As shown in the code above, there is more than one way to process the data into a radiologist-friendly CT. Filtered back projection and an algebraic reconstruction technique are shown, but there are other methods, such as iterative reconstruction, convolution back projection and even deep learning based methods.
Tomosynthesis makes X-ray based images using a limited angle instead of going all the way around the patient as in CT. The data from a tomosynthetic image is then processed so that you get multiple angles visible. This gets around the issue of overlapping objects in a plain film X-ray. In both the case of CT and tomosynthesis, the image output is made by processing the originally acquired data. Although most researchers work with already processed images, it is important to keep in mind that in theory the originally acquired data can be processed in a variety of ways.
Ultrasounds
Ultrasound can produce multiple complex types of images. Ultrasound uses high-frequency sound waves, sent and captured by a piezoelectric probe (also known as a transducer), to produce images.
Just as different tissues attenuate radiation differently, different tissues attenuate the sound waves differently. To be more precise different tissues reflect and absorb the waves differently and this can help us create images after some processing of the signal.
These images can be captured in rapid succession over time, so they can be saved as cine-files inside DICOMs. On the other hand, the sonographer can choose to record only a single ‘frame’, in which case a 2D-array will ultimately be saved.
Typically, sonographers produce a lot of B-mode images, but B-mode is far from the only type of ultrasound. M-mode, or motion mode, can also capture motion, like the cine-files in B-mode, but puts it into a single 2D array of one line of the image over time. In the compound image below you can see a B-mode 2D image and an M-mode made along the line marked in it.
What Are Some Disadvantages to Ultrasounds in Terms of Computational Analysis?
Ultrasounds images are operator-dependent, often with embedded patient data, and the settings and patients’ positions can vary widely.
Challenge: How to Reduce These Problems?
How can we optimize research involving ultrasounds in terms of the challenges above?
Possible solutions include:
- Reviewing metadata on existing images so it can be matched by machine type
- Training technicians to use standardized settings only
- Configuring machines to use only one power setting
- Image harmonization/normalization algorithms
Magnetic Resonance Imaging
MRIs are images made by utilizing some fairly complicated physics concerning what we can do to protons (abundant in human tissue) with magnets and radiofrequency waves, and how we capture their signal. Different ordering and timing of radiofrequency pulses and different magnetic gradients give us different MRI sequences. The actual signal of an anatomical MRI typically needs to be processed via Fourier transforms and some other computational work before it is recognizable as anatomy. The raw data is referred to as the k-space data, and this can be kept in vendor-specific formats or open common formats, e.g., ISMRMRD (the ISMRM Raw Data format). In practice, we rarely use the k-space data (unless perhaps we are medical physicists) for research on medical pathology. Nonetheless, researchers developing new MRI sequences are very interested in such data, and typically in transforming it as quickly as possible. There are many ways the raw data could be transformed or used to produce an MRI. While an inverse Fourier transform is typical, a Hartley transform could be used, and some scientists even use deep learning based methods. Let's look at k-space with a viridis color map:
Sourced from the NYU fastMRI dataset, 2024 (Knoll et al., Radiol Artif Intell. 2020;2(1):e190007. doi: 10.1148/ryai.2020190007, https://pubs.rsna.org/doi/10.1148/ryai.2020190007; and the arXiv paper, https://arxiv.org/abs/1811.08839).
Let’s do an example of a k-space transform
PYTHON
slice_kspace = np.load('data/medical/slice_kspace.npy')
# show shape
print(slice_kspace.shape)
# show type
print(type(slice_kspace))
# print type of an example pixel
print(type(slice_kspace[3,3]))
OUTPUT
(640, 368)
<class 'numpy.ndarray'>
<class 'numpy.complex64'>
Note that the array contains numbers with an imaginary component, which is why the type is complex. We can extract and graph the real and imaginary parts, and also graph a transformation:
PYTHON
real_from_slice_kspace = slice_kspace.real
imaginary_from_slice_kspace = slice_kspace.imag
# make an inverse fourier
def inverse_fft2_shift(kspace):
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace, axes=(-2, -1)), norm='ortho'), axes=(-2, -1))
# graph both
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(9, 5), sharex=True, sharey=True)
ax1.set_title("K-space real part")
ax1.imshow(real_from_slice_kspace, cmap=plt.cm.Greys_r,)
ax2.set_title("K-space imaginary part")
ax2.imshow(imaginary_from_slice_kspace, cmap=plt.cm.Greys_r,)
ax3.set_title("Transformed K-space")
ax3.imshow(np.abs(inverse_fft2_shift(slice_kspace)), cmap=plt.cm.Greys_r)
Hopefully you can see that our k-space data, like our sinogram, is not very human-readable, whereas the transformed image is recognizable as a knee. A transformed image, one a radiologist will be able to read, is often what we are given. This final product we are used to looking at is such a post-processed 3D array wrapped inside a DICOM file. We can transform the image, and parts of the metadata, into a variety of file types commonly used in research. These file types will be covered in more detail later in the course.
CT versus MRI
After processing, both CT and standard MRI give a 3D image. However, it is important to understand that there are differences. A standard MRI sequence will give better differentiation between various soft tissues, whereas a CT will provide better images of bones. CTs can be acquired more quickly and cheaply, but have the hidden cost of radiation.
Challenge: CT, MRIs and Artifacts?
You are researching tumors in adults. You team up with the best radiologist you can find and ask for imaging. She hands you DICOM files of some patient MRIs and CTs, and states, “These are exactly the images I use. I have checked that they are all without artifacts.” You have the images straight from the radiologist; could there potentially be any artifacts?
Strictly speaking, yes, they could have artifacts. Both CT and MRI are reconstructed from originally acquired data, and the reconstruction can introduce artifacts. The radiologist thinks of artifacts as things like motion blurring from when the patient moves, or wrap-around in MRIs when the field of view was set too small; these are obvious to the human eye. However, technically every reconstruction algorithm can potentially introduce tiny artifacts that are not visible to the human eye.
Other Image Types
Nuclear medicine scans and pathology images are also broadly available inside hospitals.
Nuclear medicine images, e.g. PET and SPECT, can be 2D or 3D images based on the signal of a radiotracer given to the patient. As the radiotracer is taken up and decays, it gives off gamma rays, which are then used as the signal. This type of image can be extremely useful in processes like looking for metastases. While 3D images registered with a CT or MRI give anatomic precision, in some cases a 2D image answers basic questions. In the image below, a 2D bone scan shows metastases from prostate cancer.
sourced from RadsWiki, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons
Pathology is currently undergoing a revolution of digitalization, and a typical file format has not emerged yet. Pathology images may be DICOM, but could also be stored as specific kinds of TIFF files or other file types. Pathology is an entire medical discipline in which various types of images are used, both 2D and 3D, and often multi-channel, i.e. in color. However, one classic type of pathology image is that of a stained tissue slide seen through a microscope. With various stains we can see what is going on in a tissue at a cellular level. In the image below you can see macrophages that have come to surround actinomyces in someone's lung.
sourced from Yale Rosen (USA), Actinomycosis, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=31127755
Beyond the more common types of imaging, researchers are actively looking into new forms of imaging. Some add new information to old modalities, like contrast-enhanced ultrasound. Other new forms of imaging are novel in terms of the signal, such as terahertz imaging, which uses a previously 'unused' part of the electromagnetic radiation spectrum. As you might guess, the more novel the imaging, usually the less consolidation there is around file types and how they are organized. It is useful to remember that all these file types, whether for established image types or novel ones, are sorts of 'containers' for the 'payload' of the actual images, which are the arrays. Often we simply need to know how to get the payload array out of its container and/or where to find certain metadata.
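As a quick illustration of pulling the payload array out of its container, here is a hedged sketch using pydicom (a library explored later in this course); the file name is hypothetical and not part of the lesson data.
PYTHON
import pydicom
# read a hypothetical DICOM file; the dataset object is the "container"
ds = pydicom.dcmread('example.dcm')
print(ds.Modality)       # a piece of metadata, if present in the file
array = ds.pixel_array   # the image "payload" as a numpy array
print(array.shape)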
There is less standardization around file formats of certain types of imaging.
For example, while typical radiological images have settled on how they are recorded in DICOM, more novel sequences, such as arterial spin-labeled ones, do not yet have a standardized way of being recorded in it. Some newer forms of imaging, such as electrical impedance tomography, use entirely different kinds of technologies and signals from anything already existing. When it comes to truly novel imaging types there can be no standardization at all.
Standard Image types
| Type | Signal | Standard File | Image file | Image array |
|---|---|---|---|---|
| X-ray | ionizing radiation | DICOM | JPEG 2000 | 2D arrays |
| Standard CT | ionizing radiation | DICOM | raw or compressed voxel data | 3D arrays |
| Ultrasound (B-mode) | high frequency sound | DICOM | CINE | 2D arrays or 4D tensors |
| MRI (spin or gradient echo) | patient's molecules | DICOM | raw or compressed voxel data | 3D arrays |
| Digital pathology slides | light through stained tissue | no consensus | often converted to TIFF | multichannel 2D or 3D arrays |
The above table is a drastic simplification as there are always cases where people use novel files or novel designs. There are also many newer technologies like 4D CT. Nonetheless it is useful to know some of the typical data structures you will usually find.
Key Points
- Each imaging modality provides distinct sets of information
- In computational imaging, images are essentially arrays, although embedded in additional data structures
- Many images we get, e.g. MRIs and CTs, have already been processed with algorithms to make them human-readable
- Research should be thoughtfully designed, taking into account the constraints and capabilities inherent in human capacities
- We can expect the emergence of additional imaging modalities in the future
Content from Working with MRI
Last updated on 2024-12-29
Estimated time: 90 minutes
Overview
Questions
- What kinds of MRI are there?
- How are MRI data represented digitally?
- How should I organize and structure files for neuroimaging MRI data?
Objectives
- Mention common kinds of MRI imaging used in research
- Show the most common file formats for MRI
- Introduce MRI coordinate systems
- Load an MRI scan into Python and explain how the data is stored
- View and manipulate image data
- Explain what BIDS is
- Explain advantages of working with Nifti and BIDS
- Show a method to convert from DICOM to BIDS/NIfTI
Introduction
When some programmers open their IDE, they type pwd because this command shows them which directory they are in, and then they often type git branch to see which branch they are on. When many clinicians open an MRI, they look at the anatomy, but also figure out which sequence and type of MRI they are looking at. We all need to begin with a rough idea of where in the world we are and which direction we are facing, so to speak. As a digital researcher, orienting yourself in terms of MRIs and the related computation is important. In terms of computation, MRIs, particularly neuro MRIs, have some quirks that make it worthwhile to think of them as a separate part of the universe from other medical imaging. Three critical differences between MRIs and other medical imaging are display conventions for image orientation, typical file types, and typical packages used for processing. In this lesson we will cover these three issues.
As a researcher, we generally recommend familiarizing yourself with the various possible MRI sequences. Some sequences are much better suited to answering certain questions than others. Broadly, we can divide MR techniques into structural (e.g. T1, T2 and so on), functional, diffusion, perfusion, angiographic techniques and spectroscopy. The sequences you work with determine the shape of the files to expect. For example, for structural imaging a 3D array is the norm, but for diffusion imaging you have a 4D tensor plus .bval and .bvec files.
If you work directly with a radiology department you will usually get DICOM files that contain whatever sequences were done. However, if you obtain images from elsewhere they may come in other formats.
File formats
From the MRI scanner, images are initially collected in the DICOM format but can be converted to other formats to make working with the data easier. Some file formats can be converted to others, for example NIfTI or Analyze files can be converted to MINC, but conversions cannot be made between all file formats in all directions.
Common MRI File Formats
To understand common file formats please review the table on neuroimaging file formats here. To get even more information than that table we have included links to documentation websites:
NIfTI is one of the most ubiquitous file formats for storing neuroimaging data. We can convert DICOM data to NIfTI using the dcm2niix software. Many people prefer working with dcm2bids to configure the long bash commands needed for dcm2niix.
For example, if you want to convert a hypothetical DICOM directory (/hypothetical) into a hypothetical directory called /converted on your machine, you can use the dcm2bids helper in the following way:
dcm2bids_helper -d /hypothetical -o /converted
One of the advantages of working with dcm2niix, whether you work with it directly or through a tool like dcm2bids, is that it can be used to create Brain Imaging Data Structure (BIDS) datasets, since it outputs a NIfTI and a JSON with metadata ready to fit into the BIDS standard. BIDS is a widely adopted standard for how data from neuroimaging research can be organized. The organization of data and files is crucial for seamless collaboration across research groups and even between individual researchers. Some pipelines assume your data is organized in the BIDS structure, and these are sometimes called BIDS Apps.
Some of the more popular examples are:
- fmriprep
- freesurfer
- micapipe
- SPM
- MRtrix3_connectome
We recommend the BIDS starter-kit website for learning the basics of this standard. Working with BIDS, and with much of the array of tools available for brain MRIs, is often facilitated by some familiarity with the command line. This can present a bit of a problem in terms of scientific reproducibility. One option is to leave bash scripts and good documentation in a repository. Another is to use a pythonic interface to such command-line tools. One possibility for this will be discussed in the next section.
Libraries
If you look at Python code for research on medical imaging of most of the body, a few libraries, such as SITK, will pop up again and again. The story with MRIs of the brain is entirely different. Most researchers will use nibabel and related libraries. These libraries are made to read, write, and manipulate neuroimaging file formats efficiently, and a lot of code has been developed with them for tasks specific to brain MRIs. They also tend to integrate better with existing pipelines for brain MRI. nipype is a popular tool which lets users harness popular existing pipelines from Python. If you are trying to compare outputs with an existing study, it is worth considering using the same pipeline and the same software version of that pipeline; then you know that differences between your outcomes are not artifacts of the software.
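As a small illustration of the kind of pythonic interface nipype provides, the sketch below wraps FSL's brain extraction tool. This is only a sketch under assumptions: it requires nipype and a working FSL installation, and the file names are hypothetical rather than part of this lesson's data.
PYTHON
# a minimal sketch of calling an FSL tool through nipype
from nipype.interfaces import fsl
bet = fsl.BET()
bet.inputs.in_file = 'sub-01_T1w.nii.gz'          # hypothetical input scan
bet.inputs.out_file = 'sub-01_T1w_brain.nii.gz'   # hypothetical output name
result = bet.run()   # runs FSL's bet under the hood
print(result.outputs)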
Sourced from https://www.nipreps.org/
Nipreps and Beyond:
- There are many, many packages for medical image analysis
- There are known pre-built pipelines with possibilities in python
- Your pipeline will probably begin with cleaning and preparing data
- You can mix and match parts of pipelines with NiPype
While you may harness sophisticated tools like FreeSurfer or FSL to do complex tasks like segmentation and registration in ways that match existing research, you should still check your data by hand. At some point you should open images and examine them, both in terms of the imaging itself (which can be done with a pre-built viewer) and in terms of computational aspects like metadata and shape. For these mundane tasks we recommend nibabel.
Next, we’ll cover some details on working with NIfTI files with Nibabel.
Reading NIfTI Images
NiBabel is a Python package for reading and writing neuroimaging data. To learn more about how NiBabel handles NIfTIs, refer to the NiBabel documentation on working with NIfTIs, which this episode heavily references. This part of the lesson is also similar to an existing lesson from The Carpentries Incubator, namely Introduction to Working with MRI Data in Python. This is because, in terms of reading and loading data and accessing the image, there are limited possibilities.
First, use the load() function to create a NiBabel image object from a NIfTI file. When loading a NIfTI file with NiBabel, you get a specialized data object that includes all the information stored in the file. Each piece of information is referred to as an attribute in Python's terminology. To view all these attributes, simply type t2_img. followed by Tab. We will focus on discussing mainly two attributes (header and affine) and one method (get_fdata).
1. Headers
The header contains metadata about the image, including image dimensions, data type, and more.
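The code cell that loads the image is not reproduced in this rendering of the lesson; below is a minimal sketch that would produce a header printout like the output that follows, assuming the phantom NIfTI file used later in this episode.
PYTHON
import nibabel as nib
# load the T2 phantom image used throughout this episode
t2_img = nib.load('data/mri/OBJECT_phantom_T2W_TSE_Cor_14_1.nii')
# grab the header object and print it
t2_hdr = t2_img.header
print(t2_hdr)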
OUTPUT
<class 'nibabel.nifti1.Nifti1Header'> object, endian='<'
sizeof_hdr : 348
data_type : b''
db_name : b''
extents : 0
session_error : 0
regular : b''
dim_info : 0
dim : [ 3 432 432 30 1 1 1 1]
intent_p1 : 0.0
intent_p2 : 0.0
intent_p3 : 0.0
intent_code : none
datatype : int16
bitpix : 16
slice_start : 0
pixdim : [1. 0.9259259 0.9259259 5.7360578 0. 0. 0.
0. ]
vox_offset : 0.0
scl_slope : nan
scl_inter : nan
slice_end : 29
slice_code : unknown
xyzt_units : 2
cal_max : 0.0
cal_min : 0.0
slice_duration : 0.0
toffset : 0.0
glmax : 0
glmin : 0
descrip : b'Philips Healthcare Ingenia 5.4.1 '
aux_file : b''
qform_code : scanner
sform_code : unknown
quatern_b : 0.008265011
quatern_c : 0.7070585
quatern_d : -0.7070585
qoffset_x : 180.81993
qoffset_y : 21.169691
qoffset_z : 384.01007
srow_x : [1. 0. 0. 0.]
srow_y : [0. 1. 0. 0.]
srow_z : [0. 0. 1. 0.]
intent_name : b''
magic : b'n+1'
t2_hdr is a dictionary-like object, i.e. a container that holds pairs of objects - keys and values. Let's take a look at all of the keys. Similar to t2_img, in which attributes can be accessed by typing t2_img. followed by Tab, you can do the same with t2_hdr. In particular, we'll be using a method belonging to t2_hdr that will allow you to view the keys associated with it.
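The cell itself is not shown here; a one-line sketch that would list the keys as in the output below, assuming t2_hdr from the cell above.
PYTHON
# list the metadata fields available in the NIfTI header
t2_hdr.keys()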
OUTPUT
['sizeof_hdr',
'data_type',
'db_name',
'extents',
'session_error',
'regular',
'dim_info',
'dim',
'intent_p1',
'intent_p2',
'intent_p3',
'intent_code',
'datatype',
'bitpix',
'slice_start',
'pixdim',
'vox_offset',
'scl_slope',
'scl_inter',
'slice_end',
'slice_code',
'xyzt_units',
'cal_max',
'cal_min',
'slice_duration',
'toffset',
'glmax',
'glmin',
'descrip',
'aux_file',
'qform_code',
'sform_code',
'quatern_b',
'quatern_c',
'quatern_d',
'qoffset_x',
'qoffset_y',
'qoffset_z',
'srow_x',
'srow_y',
'srow_z',
'intent_name',
'magic']
Notice that methods require you to include () at the end of them when you call them whereas attributes do not. The key difference between a method and an attribute is:
- Attributes are variables belonging to an object and containing information about their properties or characteristics
- Methods are functions that belong to an object and operate on its attributes. They differ from regular functions by implicitly receiving the object (self) as their first argument.
When you type in t2_img. followed by Tab, you may see that attributes are highlighted in orange and methods highlighted in blue.
The output above is a list of keys you can use to access the values of t2_hdr. We can access the value stored by a given key by indexing the header with that key, as shown below.
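A sketch, assuming t2_hdr from earlier; any key from the list above will work.
PYTHON
# look up a single header field by its key
t2_hdr['dim']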
Challenge: Extract Values from the NIfTI Header
Extract the ‘pixdim’ field from the NiFTI header of the loaded image.
2. Data
As you've seen above, the header contains useful information about the properties (metadata) associated with the MR data we've loaded. Now we'll move on to loading the actual image data itself. We can achieve this by using the method t2_img.get_fdata():
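The cell is not reproduced in this rendering; a minimal sketch that would produce the array output below.
PYTHON
# pull the image data out of the NiBabel object as a floating-point array
t2_data = t2_img.get_fdata()
t2_data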
OUTPUT
array([[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
...,
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]])
The initial observation you might make is the prevalence of zeros in the image. This abundance of zeros might prompt concerns about the presence of any discernible content in the picture. However, when working with radiological images, it’s important to keep in mind that these images frequently contain areas of air surrounding the objects of interest, which appear as black space.
What type of data is this exactly, in a computational sense? We can determine this by calling the type() function on t2_data:
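A one-line sketch, assuming t2_data from the earlier cell.
PYTHON
# check the computational type of the image data
type(t2_data)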
OUTPUT
numpy.ndarray
The data is stored as a multidimensional array, which can also be accessed through the file's dataobj property:
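Again a one-line sketch, assuming t2_img from earlier.
PYTHON
# the array proxy, which avoids loading the whole array into memory
t2_img.dataobj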
OUTPUT
<nibabel.arrayproxy.ArrayProxy at 0x20c63b5a4a0>
As you might guess, there are differences in how your computer handles something made with dataobj and an actual array. These differences affect memory and processing speed, which are not trivial issues if you deal with a very large dataset of MRIs. You can save time and memory by being conscious about what is cached, and by using the dataobj property when dealing with slices of the array, as detailed here.
Challenge: Meaning of Attributes of the Array
How can we see the number of dimensions in the t2_data array? Once again, all of the attributes of the array can be seen by typing t2_data. followed by Tab. What is the shape of the image? Can you make a guess about the size of a voxel based on the numbers you have here? Why or why not?
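One way to check, as a sketch of a possible solution:
PYTHON
# number of dimensions of the image array
t2_data.ndim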
OUTPUT
3
t2_data contains 3 dimensions. You can think of the data as a 3D version of a picture (more accurately, a volume).
Image by Tropwine, sourced from Wikimedia Commons (2024).https://commons.m.wikimedia.org/wiki/File:3D_array_diagram.svg; Creative Commons Attribution 4.0 International License
Remember typical 2D pictures are made out of pixels, but a 3D MR image is made up of 3D cubes called voxels.
Sourced with minor modification from Ahmed, M., Garzanich, M., Melaragno, L.E. et al. Exploring CT pixel and voxel size effect on anatomic modeling in mandibular reconstruction. 3D Print Med 10, 21 (2024). https://doi.org/10.1186/s41205-024-00223-0; Creative Commons Attribution 4.0 International License
With this in mind we examine the shape:
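A one-line sketch, assuming t2_data from earlier.
PYTHON
# the number of voxels along each dimension
t2_data.shape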
OUTPUT
(432, 432, 30)
The three numbers given here represent the number of values along each respective dimension (x, y, z). This image was scanned in 30 slices, each with a resolution of 432 x 432 voxels. If each voxel were 1 millimeter, then our image would represent something only 3 cm "tall" (so to speak), which seems unlikely. However, the object here is not a human, and could in theory be scanned on a special MRI machine of any size. We cannot guess the size of a voxel from the numbers printed here alone. We will learn one method to do this later in the episode.
Let’s see the type of data inside of the array.
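Again as a sketch:
PYTHON
# the numeric type of each element (voxel) in the array
t2_data.dtype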
OUTPUT
dtype('float64')
This tells us that each element in the array (or voxel) is a floating-point number. The data type of an image controls the range of possible intensities: as the number of possible values increases, the amount of space the image takes up in memory also increases.
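The cell that produced the two numbers below is not shown; a sketch:
PYTHON
# the range of intensity values in the volume
print(t2_data.min())
print(t2_data.max())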
OUTPUT
0.0
630641.0785522461
For our data, the range of intensity values goes from 0 (black) to more positive values (whiter).
To examine the value of a specific voxel, you can access it using its indices. For example, if you have a 3D array called data, you can retrieve the value of a voxel at coordinates (x, y, z) with the syntax data[x, y, z]. This gives you the value stored at the voxel located at the specified index (x, y, z). Make sure that the indices are within the bounds of the array dimensions.
To inspect the value of a voxel at coordinates (9, 19, 2), you can use the following code:
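A sketch of that cell, assuming t2_data from earlier:
PYTHON
# intensity value of the voxel at index (9, 19, 2)
t2_data[9, 19, 2]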
OUTPUT
0.
This command retrieves and prints the intensity value at the specified voxel. The output represents the signal intensity at that particular location.
Next, we will explore how to extract and visualize larger regions of interest, such as slices or arrays of voxels, for more comprehensive analysis.
Working with Image Data
When we think and speak about "slicing" colloquially, we often think about a 2D slice of our data.
Figure with adaptation from Valdes-Hernandez, P.A., Laffitte Nodarse, C., Peraza, J.A. et al. Toward MR protocol-agnostic, unbiased brain age predicted from clinical-grade MRIs. Sci Rep 13, 19570 (2023). https://doi.org/10.1038/s41598-023-47021-y ; Creative Commons Attribution 4.0 International License
From left to right: coronal, sagittal and axial slices of a brain.
Let’s select the 10th slice in the z-axis of our data:
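The cell is not shown in this rendering; a sketch, noting that the 10th slice corresponds to index 9 with zero-based indexing. The z_slice variable is reused later in this episode.
PYTHON
# take the 10th slice along the z-axis (index 9, zero-based)
z_slice = t2_data[:, :, 9]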
This is similar to the indexing we did before to select a single voxel. However, instead of providing a value for each axis, the : indicates that we want to grab all values from that particular axis. The nibabel documentation gives more details on how to use the slicer attribute of a nibabel image; using this attribute can help make code more efficient.
Challenge: Slicing MRI Data
Write a Python function using nibabel that takes an image path string and coordinates for any slices desired on the x, y and z axes, and plots those image slices. Then use the function to display slices of the T2 image at x = 9, y = 9, and z = 9.
PYTHON
# if in a new notebook re-import
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt

def slice_image_data(file_path, x, y, z):
    # load the image
    try:
        img = nib.load(file_path)
    except Exception as e:
        print(f"Error loading the image: {e}")
        return
    # get the image array
    data = img.get_fdata()
    # slice the image data
    # and extract a specific slice along each plane
    z_slice = data[:, :, z]  # slice with z fixed
    y_slice = data[:, y, :]  # slice with y fixed
    x_slice = data[x, :, :]  # slice with x fixed
    # make sure each slice is 2D by explicitly removing unnecessary dimensions
    z_slice = z_slice.squeeze()  # this ensures it's a 2D array
    y_slice = y_slice.squeeze()
    x_slice = x_slice.squeeze()
    print(z_slice.shape)
    # visualize the slices
    plt.figure(figsize=(12, 8))
    # plot the z slice
    plt.subplot(2, 2, 1)
    plt.imshow(z_slice.T, cmap='gray', origin='lower')
    plt.title('Z Slice')
    plt.axis('off')
    # plot the y slice
    plt.subplot(2, 2, 2)
    plt.imshow(y_slice.T, cmap='gray', origin='lower')
    plt.title('Y Slice')
    plt.axis('off')
    # plot the x slice
    plt.subplot(2, 2, 3)
    plt.imshow(x_slice.T, cmap='gray', origin='lower')
    plt.title('X Slice')
    plt.axis('off')

slice_image_data('data/mri//OBJECT_phantom_T2W_TSE_Cor_14_1.nii', 9, 9, 9)
In the above exercise you may note that many solutions do not return anything. This is not unusual in Python when we only want a graph or visualization; we really only want an artifact of the function. We could, however, return the artifact with more code if needed.
We’ve been slicing and dicing images but we have no idea what they look like in a more global sense. In the next section we’ll show you one way you can visualize it all together.
Visualizing the Data
We previously inspected the signal intensity of a single voxel.
We could look at the whole image at once by using a viewer. We can even use code to make a viewer.
PYTHON
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widget
import nibabel as nib
import importlib

class NiftiSliceViewer:
    """
    A class to examine slices of MRIs, which are in Nifti Format. Similar code appears
    in several open source liberally licensed libraries written by drcandacemakedamoore
    """

    def __init__(self, volume_str, figsize=(10, 10)):
        self.nifti = nib.load(volume_str)
        self.volume = self.nifti.get_fdata()
        self.figsize = figsize
        self.v = [np.min(self.volume), np.max(self.volume)]
        self.widgets = importlib.import_module('ipywidgets')
        self.widgets.interact(self.transpose, view=self.widgets.Dropdown(
            options=['axial', 'sagittal', 'coronal'],
            value='axial',
            description='View:',
            disabled=False))

    def transpose(self, view):
        # transpose the image to orient according to the slice plane selection
        orient = {"sagittal": [1, 2, 0], "coronal": [2, 0, 1], "axial": [0, 1, 2]}
        self.vol = np.transpose(self.volume, orient[view])
        maxZ = self.vol.shape[2] - 1
        self.widgets.interact(
            self.plot_slice,
            z=self.widgets.IntSlider(
                min=0,
                max=maxZ,
                step=1,
                continuous_update=True,
                description='Image Slice:'
            ),
        )

    def plot_slice(self, z):
        # plot the slice for the plane chosen in the widget input
        self.fig = plt.figure(figsize=self.figsize)
        plt.imshow(
            self.vol[:, :, z],
            cmap="gray",
            vmin=0,
            vmax=self.v[1],
        )

# now we will use our class on our image
NiftiSliceViewer('data/mri//OBJECT_phantom_T2W_TSE_Cor_14_1.nii')
Notice our viewer has axial, sagittal and coronal slices. These are anatomical terms that allow us to make some assumptions about where things are in space, if we assume the patient went into the machine in a certain way. When we look at an abstract image these terms have little meaning, but let's take a look at a head.
Challenge: Understanding anatomical terms for brains
Use the internet to read at least two sources on the meaning of the terms axial, coronal and sagittal. In your own words, write definitions for axial, sagittal and coronal in terms of how they slice up the head at the level of the eyes.
Axial: slices the head across both eyes, so that there is a slice separating higher and lower levels. Sagittal: slices between or to the side of the eyes, so that there is a slice separating more left from more right. Coronal: slices through both eyes, so that there is a slice separating front from back.
This brings us to the final crucial attribute of a NIfTI we will discuss: affine.
3. Affine
The final important piece of metadata associated with an image file is the affine matrix, which indicates the position of the image array data in the reference space. By reference space we usually mean a predefined system mapping to real world space if we are talking about real patient data.
Below is the affine matrix for our data:
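The cell is not shown in this rendering; a one-line sketch that would display the matrix below.
PYTHON
# the affine matrix mapping voxel indices to world coordinates
t2_img.affine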
OUTPUT
array([[-9.25672975e-01, 2.16410652e-02, -1.74031337e-05,
1.80819931e+02],
[ 2.80924864e-06, -3.28338569e-08, -5.73605776e+00,
2.11696911e+01],
[-2.16410652e-02, -9.25672975e-01, -2.03403855e-07,
3.84010071e+02],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.00000000e+00]])
To explain this concept, recall that we referred to coordinates in our data as (x,y,z) coordinates such that:
- x is the first dimension of t2_data
- y is the second dimension of t2_data
- z is the third dimension of t2_data
Although this tells us how to access our data in terms of voxels in a 3D volume, it doesn't tell us much about the actual dimensions in our data (centimetres, right or left, up or down, back or front). The affine matrix allows us to translate between voxel coordinates in (x,y,z) and world space coordinates in (left/right, bottom/top, back/front). An important thing to note is that the order of:
- Left/right
- Bottom/top
- Back/front
depends on how the affine matrix was constructed; thankfully there is in-depth coverage of the issue in the nibabel documentation. For most of the data we are dealing with, we use a RAS coordinate system, so the axes always refer to:
- Right
- Anterior
- Superior
Increasing a coordinate value in the first dimension corresponds to moving to the right of the person being scanned, and so on. In the real world, whatever orientation you put something in may make someone unhappy. Luckily you can quickly change arrays around in terms of direction by using efficient numpy functions. Two functions in numpy that can be generalized to produce any orientation of an image are numpy.flip() and numpy.rot90(); other functions are quite convenient for 2D arrays, as displayed in the flipping example below.
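To make the voxel-to-world mapping concrete first, here is a small sketch using nibabel's apply_affine helper, assuming t2_img and the voxel index used earlier in this episode.
PYTHON
from nibabel.affines import apply_affine
# map a voxel index (i, j, k) to world-space (x, y, z) coordinates in mm
apply_affine(t2_img.affine, [9, 19, 2])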
PYTHON
import numpy as np
slices = [z_slice, np.fliplr(z_slice), np.flipud(z_slice)]
fig, axes = plt.subplots(1, len(slices))
for i, slice in enumerate(slices):
    axes[i].imshow(slice, cmap="gray", origin="lower")
OUTPUT
This brings us to a final difference we must account for when we discuss neuro-MRIs: anatomy and visualization conventions.
Display conventions
When we describe imaging of any part of the body except the brain, as professionals we all agree on certain conventions. We rely on anatomical position as the basis of how we orient ourselves, and we expect the patient's right side to be on the left of our screen, among other conventions. For brain MRIs this is less true; there is a split. The issue is extremely well summarized in the nibabel documentation on radiological versus neurological conventions.
MRI processing in Python
To get a deeper view of MRI processing in Python you can explore lessons from Carpentries Incubators; namely:
Content from Registration and Segmentation with SITK
Last updated on 2024-10-08
Estimated time: 150 minutes
Overview
Questions
- What are SITK Images?
- How can registration be implemented in SITK?
- How can I segment an image in SITK?
Objectives
- Explain how to perform basic operations on SITK Images
- Explain when registration can be needed and how to register images with SITK
- Become familiar with basic segmentation algorithms available in SITK
Introduction
In the realm of medical imaging analysis, registration and segmentation play crucial roles. Registration aligns images, enabling the fusion of data or the tracking of changes over time. Segmentation identifies and labels objects of interest within images. Automation is essential due to high clinical demand, driving the development of robust algorithms for both processes. Moreover, many advanced segmentation techniques utilize image registration, whether implicitly or explicitly.
SimpleITK (SITK) is a simplified programming interface to the algorithms and data structures of the Insight Toolkit (ITK) for segmentation, registration and advanced image analysis, available in many programming languages (e.g., C++, Python, R, Java, C#, Lua, Ruby).
SITK is part of the Insight Software Consortium a non-profit educational consortium dedicated to promoting and maintaining open-source, freely available software for medical image analysis. Its copyright is held by NumFOCUS, and the software is distributed under the Apache License 2.0.
In this episode, we use a hands-on approach utilizing Python to show how to use SITK for performing registration and segmentation tasks in medical imaging use cases.
Fundamental Concepts
In this section, we’ll cover some fundamental image processing operations using SITK, such as reading and writing images, accessing pixel values, and resampling images.
Images
The fundamental tenet of an image in ITK, and consequently in SITK, is that an image is defined by a set of points on a grid occupying a physical region in space. This significantly differs from many other image analysis libraries that treat an image as an array, which has two implications: (1) pixel/voxel spacing is assumed to be isotropic and (2) there is no notion of an image's location in physical space.
SITK images are multi-dimensional (the default configuration includes images from two dimensional up to five dimensional) and can be a scalar, labelmap (scalar with run length encoding), complex value or have an arbitrary number of scalar channels (also known as a vector image). The region in physical space which an image occupies is defined by the image’s:
- Origin (vector like type) - location in the world coordinate system of the voxel with all zero indexes.
- Spacing (vector like type) - distance between pixels along each of the dimensions.
- Size (vector like type) - number of pixels in each dimension.
- Direction cosine matrix (vector like type representing matrix in row major order) - direction of each of the axes corresponding to the matrix columns.
The following figure illustrates these concepts.
In SITK, when we construct an image we specify its dimensionality, size and pixel type; all other components are set to reasonable default values (see the short sketch after this list):
- Origin - all zeros.
- Spacing - all ones.
- Direction - identity.
- Intensities in all channels - all zero.
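As a quick sketch of these defaults (the sizes and values here are arbitrary, chosen only for illustration):
PYTHON
import SimpleITK as sitk
# construct a 2D, 64 x 64, 32-bit float image; origin, spacing and
# direction take their default values
img = sitk.Image(64, 64, sitk.sitkFloat32)
print(img.GetOrigin())     # (0.0, 0.0)
print(img.GetSpacing())    # (1.0, 1.0)
print(img.GetDirection())  # (1.0, 0.0, 0.0, 1.0), i.e. identity
# the spatial metadata can be changed explicitly
img.SetOrigin((10.0, 20.0))
img.SetSpacing((0.5, 0.5))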
The tenet that images occupy a spatial location in the physical world has to do with the original application domain of ITK and SITK, medical imaging. In that domain images represent anatomical structures with metric sizes and spatial locations. Additionally, the spacing between voxels is often non-isotropic (most commonly the spacing along the inferior-superior/foot-to-head direction is larger). Viewers that treat images as an array will display a distorted image as shown below:
As an image is also defined by its spatial location, two images with the same pixel data and spacing may not be considered equivalent. Think of two CT scans of the same patient acquired at different sites. The following figure illustrates the notion of spatial location in the physical world, the two images are considered different even though the intensity values and pixel spacing are the same.
As SITK images occupy a physical region in space, the quantities defining this region have metric units (cm, mm, etc.). In general SITK assumes units are in millimeters (for historical reasons, due to the DICOM standard). In practice SITK is not aware of the specific units associated with each image; it just assumes that they are consistent. Thus, it is up to you, the developer, to ensure that all of the images you read and create use the same units.
A SITK image can have an arbitrary number of channels with the content of the channels being a scalar or complex value. This is determined when an image is created.
In the medical domain, many image types have a single scalar channel (e.g. CT, US). Another common image type is a three channel image where each channel has scalar values in [0,255], often people refer to such an image as an RGB image. This terminology implies that the three channels should be interpreted using the RGB color space. In some cases you can have the same image type, but the channel values represent another color space, such as HSV (it decouples the color and intensity information and is a bit more invariant to illumination changes). SITK has no concept of color space, thus in both cases it will simply view a pixel value as a 3-tuple.
Let’s read an example of human brain CT, and let’s explore it with SITK.
PYTHON
%matplotlib inline
import matplotlib.pyplot as plt
import SimpleITK as sitk
img_volume = sitk.ReadImage("data/sitk/A1_grayT1.nrrd")
print(type(img_volume))
print(img_volume.GetOrigin())
print(img_volume.GetSpacing())
print(img_volume.GetDirection())
OUTPUT
<class 'SimpleITK.SimpleITK.Image'>
(-77.625, -107.625, 119.625)
(0.75, 0.75, 0.75)
(0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0)
The function sitk.ReadImage loads the file as a SimpleITK.SimpleITK.Image instance, and then we can access useful methods to get an idea of what the image or images contained in the file look like. The image's dimensions have explicit accessors:
PYTHON
print(img_volume.GetSize())
print(img_volume.GetWidth())
print(img_volume.GetHeight())
print(img_volume.GetDepth())
print(img_volume.GetNumberOfComponentsPerPixel())
print(img_volume.GetPixelIDTypeAsString())
OUTPUT
(288, 320, 208)
288
320
208
1
32-bit float
Just inspecting these accessors, we deduce that the file contains a volume made of 208 images, each made of 288x320 pixels and one channel only (grayscale).
SITK Conventions
- Image access is in x,y,z order, image.GetPixel(x,y,z) or image[x,y,z], with zero-based indexing.
- If the output of an ITK filter has a non-zero starting index, then the index will be set to 0, and the origin adjusted accordingly.
Displaying Images
While SITK does not do visualization, it does contain a built-in Show method. This function writes the image out to disk and then launches a program for visualization. By default it is configured to use ImageJ, because ImageJ readily supports all the image types which SITK has and loads them very quickly. SITK provides two options for invoking an external viewer: a procedural interface (the Show function) or an object-oriented one. For more details about them, please refer to this notebook from the official documentation.
In this episode we will convert SITK images to numpy arrays, and we will plot them as such.
Images as Arrays
We have two options for converting from SITK to a numpy array:
- GetArrayFromImage(): returns a copy of the image data. You can then freely modify the data as it has no effect on the original SITK image.
- GetArrayViewFromImage(): returns a view on the image data which is useful for display in a memory-efficient manner. You cannot modify the data, and the view will be invalid if the original SITK image is deleted.
The order of indices and dimensions needs careful attention during conversion. ITK's Image class has a GetPixel method which takes an ITK Index object as an argument, which is ordered as (x,y,z). This is the convention that SITK's Image class uses for the GetPixel method and slicing operator as well. In numpy, an array is indexed in the opposite order (z,y,x). Also note that access to channels is different. In SITK you do not access the channel directly; rather, the pixel value representing all channels for the specific pixel is returned, and you then access the channel for that pixel. In a numpy array you access the channel directly. Let's see this in an example:
PYTHON
import numpy as np
multi_channel_3Dimage = sitk.Image([2, 4, 8], sitk.sitkVectorFloat32, 5)
x = multi_channel_3Dimage.GetWidth() - 1
y = multi_channel_3Dimage.GetHeight() - 1
z = multi_channel_3Dimage.GetDepth() - 1
multi_channel_3Dimage[x, y, z] = np.random.random(
multi_channel_3Dimage.GetNumberOfComponentsPerPixel()
)
nda = sitk.GetArrayFromImage(multi_channel_3Dimage)
print("Image size: " + str(multi_channel_3Dimage.GetSize()))
print("Numpy array size: " + str(nda.shape))
# Notice the index order and channel access are different:
print("First channel value in image: " + str(multi_channel_3Dimage[x, y, z][0]))
print("First channel value in numpy array: " + str(nda[z, y, x, 0]))
OUTPUT
Image size: (2, 4, 8)
Numpy array size: (8, 4, 2, 5)
First channel value in image: 0.5384010076522827
First channel value in numpy array: 0.538401
Going back to the loaded file, let’s plot the array version of the volume slice from the middle of the stack, along the z axis, using different color maps:
PYTHON
npa = sitk.GetArrayViewFromImage(img_volume)
# Display the image slice from the middle of the stack, z axis
z = int(img_volume.GetDepth() / 2)
npa_zslice = sitk.GetArrayViewFromImage(img_volume)[z, :, :]
# Three plots displaying the same data, how do we deal with the high dynamic range?
fig = plt.figure(figsize=(10, 3))
fig.add_subplot(1, 3, 1)
plt.imshow(npa_zslice)
plt.title("default colormap", fontsize=10)
plt.axis("off")
fig.add_subplot(1, 3, 2)
plt.imshow(npa_zslice, cmap=plt.cm.Greys_r)
plt.title("grey colormap", fontsize=10)
plt.axis("off")
fig.add_subplot(1, 3, 3)
plt.title(
"grey colormap,\n scaling based on volumetric min and max values", fontsize=10
)
plt.imshow(npa_zslice, cmap=plt.cm.Greys_r, vmin=npa.min(), vmax=npa.max())
plt.axis("off")
We can also do the reverse, i.e. converting a numpy array to a SITK Image:
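The cell is not shown in this rendering; a sketch using the slice array from above.
PYTHON
# convert a numpy array back into a SITK image
img_from_array = sitk.GetImageFromArray(npa_zslice)
print(type(img_from_array))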
OUTPUT
<class 'SimpleITK.SimpleITK.Image'>
We can also plot multiple slices at the same time, for better inspecting the volume:
PYTHON
img_xslices = [img_volume[x,:,:] for x in range(50, 200, 30)]
img_yslices = [img_volume[:,y,:] for y in range(50, 200, 30)]
img_zslices = [img_volume[:,:,z] for z in range(50, 200, 30)]
tile_x = sitk.Tile(img_xslices, [1,0])
tile_y = sitk.Tile(img_yslices, [1,0])
tile_z = sitk.Tile(img_zslices, [1,0])
nda_xslices = sitk.GetArrayViewFromImage(tile_x)
nda_yslices = sitk.GetArrayViewFromImage(tile_y)
nda_zslices = sitk.GetArrayViewFromImage(tile_z)
fig, (ax1, ax2, ax3) = plt.subplots(1,3)
ax1.imshow(nda_xslices, cmap=plt.cm.Greys_r)
ax2.imshow(nda_yslices, cmap=plt.cm.Greys_r)
ax3.imshow(nda_zslices, cmap=plt.cm.Greys_r)
ax1.set_title('X slices')
ax2.set_title('Y slices')
ax3.set_title('Z slices')
Operations like slice indexing, cropping, and flipping can be performed on SITK images much as they are usually done in numpy. Note that slicing of SITK images returns a copy of the image data, similarly to slicing Python lists, and differently from the "view" returned by slicing numpy arrays.
PYTHON
n_slice = 150
# Original slice
plt.imshow(sitk.GetArrayViewFromImage(img_volume[:, :, n_slice]), cmap="gray")
PYTHON
# Subsampling
plt.imshow(sitk.GetArrayViewFromImage(img_volume[:, ::3, n_slice]), cmap="gray")
PYTHON
# Comparative operators
plt.imshow(sitk.GetArrayViewFromImage(img_volume[:, :, n_slice] > 90), cmap="gray")
PYTHON
# Draw a square
draw_square = sitk.GetArrayFromImage(img_volume[:, :, n_slice])
draw_square[0:100,0:100] = draw_square.max()
plt.imshow(draw_square, cmap="gray")
Another example of an operation we can perform is creating a grid mask and applying it to an image:
PYTHON
img_zslice = img_volume[:, :, n_slice]
# Create a grid and use as mask
grid_image = sitk.GridSource(
outputPixelType=sitk.sitkUInt16,
size=img_zslice.GetSize(),
sigma=(0.1, 0.1),
gridSpacing=(20.0, 20.0),
)
# zero out the values in the original image that correspond to
# the grid lines in the grid_image
img_zslice[grid_image == 0] = 0
nda = sitk.GetArrayViewFromImage(img_zslice)
plt.imshow(nda, cmap="gray")
Challenge: Fix the Error (Optional)
By running the lines of code above for masking the slice with the grid, you will get an error. Can you guess what it is about?
By default, SITK creates images centered at the origin, with all-ones spacing. The grid and the image need to have the same origin and spacing for the former to be superimposed on the latter.
Add these two lines to the code above, after the creation of the grid, and everything should work fine. Remember that making such changes to an image already containing data should be done cautiously.
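The two lines themselves do not appear in this rendering of the lesson; a plausible sketch of what they would look like, assuming the intent is to copy the slice's origin and spacing onto the grid (the exact lines in the original lesson may differ).
PYTHON
# assumed reconstruction: give the grid the same spatial metadata as the slice
# so that the two images occupy the same physical region
grid_image.SetOrigin(img_zslice.GetOrigin())
grid_image.SetSpacing(img_zslice.GetSpacing())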
Meta-dictionaries
SITK can read (and write) images stored in a single file, or a set of files (e.g. DICOM series).
Images stored in the DICOM format have a meta-data dictionary associated with them, which is populated with the DICOM tags. When a DICOM series is read as a single image, the meta-data information is not available since DICOM tags are specific to each file. If you need the meta-data, you have three options:
1. Use the object oriented interface’s ImageSeriesReader class: configure it to load the tags using the MetaDataDictionaryArrayUpdateOn method, and possibly the LoadPrivateTagsOn method if you need the private tags. Once the series is read, you can access the meta-data from the series reader using the GetMetaDataKeys, HasMetaDataKey, and GetMetaData methods.
2. Use the object oriented interface’s ImageFileReader: set a specific slice’s file name and only read its meta-data using the ReadImageInformation method, which reads the meta-data but not the bulk pixel information. Once the meta-data is read, you can access it from the file reader using the GetMetaDataKeys, HasMetaDataKey, and GetMetaData methods.
3. Use the object oriented interface’s ImageFileReader, set a specific slice’s file name and read it; or, using the procedural interface’s ReadImage function, read a specific file. You can then access the meta-data directly from the Image using the GetMetaDataKeys, HasMetaDataKey, and GetMetaData methods.
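For instance, a minimal sketch of the first option (the folder data/sitk/dicom_series/ is a hypothetical path containing a single DICOM series):
PYTHON
reader = sitk.ImageSeriesReader()
dicom_names = reader.GetGDCMSeriesFileNames("data/sitk/dicom_series/")  # hypothetical path
reader.SetFileNames(dicom_names)
reader.MetaDataDictionaryArrayUpdateOn()  # keep the per-slice DICOM tags
reader.LoadPrivateTagsOn()                # also keep private tags, if needed
img_series = reader.Execute()
# tags are indexed per slice; print the keys of the first slice
for key in reader.GetMetaDataKeys(0):
    print(f'"{key}":"{reader.GetMetaData(0, key)}"')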
Note: when reading an image series, via the ImageSeriesReader or via the procedural ReadImage interface, the images in the list are assumed to be ordered correctly (GetGDCMSeriesFileNames ensures this for DICOM). If the order is incorrect, the image will be read, but its spacing and possibly the direction cosine matrix will be incorrect.
Let’s read in a digital x-ray image saved in a DICOM file format, and let’s print the metadata’s keys:
PYTHON
img_xray = sitk.ReadImage('data/sitk/digital_xray.dcm')
for key in img_xray.GetMetaDataKeys():
print(f'"{key}":"{img_xray.GetMetaData(key)}"')
OUTPUT
"0008|0005":"ISO_IR 100"
"0008|0012":"20180713"
"0008|0013":"233245.431"
"0008|0016":"1.2.840.10008.5.1.4.1.1.7"
"0008|0018":"2.25.225981116244996633889747723103230447077"
"0008|0020":"20160208"
"0008|0060":"XC"
"0020|000d":"2.25.59412532821774470466514450673479191872"
"0020|000e":"2.25.149748537243964334146339164752570719260"
"0028|0002":"3"
"0028|0004":"YBR_FULL_422"
"0028|0006":"0"
"0028|0010":"357"
"0028|0011":"371"
"0028|0100":"8"
"0028|0101":"8"
"0028|0102":"7"
"0028|0103":"0"
"ITK_original_direction":"[UNKNOWN_PRINT_CHARACTERISTICS]
"
"ITK_original_spacing":"[UNKNOWN_PRINT_CHARACTERISTICS]
"
Reading a DICOM:
- Many libraries allow you to read DICOM metadata.
- PyDICOM will be explored for this task later.
- If you are already using SITK, you will usually not need an extra library to get DICOM metadata.
- Many libraries have some basic DICOM functionality; check the documentation before adding extra dependencies.
Generally speaking, SITK represents color images as multi-channel images independent of a color space. It is up to you to interpret the channels correctly based on additional color space knowledge prior to using them for display or any other purpose.
Things to note:
- When using SITK to read a color DICOM image, the channel values will be transformed to the RGB color space.
- When using SITK to read a scalar image, it is assumed that the lowest intensity value is black and the highest is white. If the photometric interpretation tag is MONOCHROME2 (lowest value displayed as black), nothing is done. If it is MONOCHROME1 (lowest value displayed as white), the pixel values are inverted.
PYTHON
print(f'Image Modality: {img_xray.GetMetaData("0008|0060")}')
print(img_xray.GetNumberOfComponentsPerPixel())
OUTPUT
Image Modality: XC
3
“0008|0060” is the DICOM tag indicating the modality, and “XC” stands for “External-camera Photography”.
Grayscale Images Stored as sRGB
The “digital_xray.dcm” image is sRGB, even though an x-ray should be a single-channel grayscale image. In some cases looks may be deceiving: grayscale images are not always stored as a single-channel image. An image that looks like a grayscale image may actually be a three-channel image with the intensity values repeated in each of the channels. Even worse, some grayscale images are four-channel images, with the channels representing RGBA and the alpha channel set to all 255. This can result in a significant waste of memory and computation time. Always become familiar with your data.
We can convert sRGB to gray scale:
PYTHON
def srgb2gray(image):
# Convert sRGB image to gray scale and rescale results to [0,255]
channels = [
sitk.VectorIndexSelectionCast(image, i, sitk.sitkFloat32)
for i in range(image.GetNumberOfComponentsPerPixel())
]
# linear mapping
I = 1 / 255.0 * (0.2126 * channels[0] + 0.7152 * channels[1] + 0.0722 * channels[2])
# nonlinear gamma correction
I = (
I * sitk.Cast(I <= 0.0031308, sitk.sitkFloat32) * 12.92
+ I ** (1 / 2.4) * sitk.Cast(I > 0.0031308, sitk.sitkFloat32) * 1.055
- 0.055
)
return sitk.Cast(sitk.RescaleIntensity(I), sitk.sitkUInt8)
img_xray_gray = srgb2gray(img_xray)
nda = sitk.GetArrayViewFromImage(img_xray_gray)
plt.imshow(np.squeeze(nda), cmap="gray")
Transforms
SITK supports two types of spatial transforms, ones with a global
(unbounded) spatial domain and ones with a bounded spatial domain.
Points in SITK are mapped by the transform using the
TransformPoint
method.
All global domain transforms are of the form:
\(T(x) = A(x - c) + t + c\)
The nomenclature used in the documentation refers to the components of the transformations as follows:
- Matrix - the matrix \(A\).
- Center - the point \(c\).
- Translation - the vector \(t\).
- Offset - the expression \(t + c - Ac\).
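As a quick check of the global-domain form \(T(x) = A(x - c) + t + c\), here is a minimal sketch (our own example, not part of the lesson data) using a 2D affine transform and TransformPoint:
PYTHON
affine = sitk.AffineTransform(2)
affine.SetMatrix([2.0, 0.0, 0.0, 2.0])  # A: isotropic scaling by 2
affine.SetCenter((1.0, 1.0))            # c
affine.SetTranslation((3.0, 0.0))       # t
# A(x - c) + t + c with x = (2, 2): (2, 2) + (3, 0) + (1, 1) = (6, 3)
print(affine.TransformPoint((2.0, 2.0)))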
A variety of global 2D and 3D transformations are available (translation, rotation, rigid, similarity, affine…). Some of these transformations are available with various parameterizations which are useful for registration purposes.
The second type of spatial transformation, bounded domain transformations, are defined to be identity outside their domain. These include the B-spline deformable transformation, often referred to as Free-Form Deformation, and the displacement field transformation.
The B-spline transform uses a grid of control points to represent a spline based transformation. To specify the transformation the user defines the number of control points and the spatial region which they overlap. The spline order can also be set, though the default of cubic is appropriate in most cases. The displacement field transformation uses a dense set of vectors representing displacement in a bounded spatial domain. It has no implicit constraints on transformation continuity or smoothness.
Finally, SITK supports a composite transformation with either a bounded or global domain. This transformation represents multiple transformations applied one after the other, \(T_0(T_1(T_2(...T_n(p))))\). The semantics are stack based, that is, first in, last applied, as in the sketch below.
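A minimal sketch of the stack semantics (assuming a SimpleITK version ≥ 2.0, where CompositeTransform is available):
PYTHON
translate = sitk.TranslationTransform(2, (1.0, 0.0))
scale = sitk.ScaleTransform(2, (2.0, 2.0))
composite = sitk.CompositeTransform(2)
composite.AddTransform(translate)  # added first, applied last
composite.AddTransform(scale)      # added last, applied first
# scale about the origin first, then translate: (1, 1) -> (2, 2) -> (3, 2)
print(composite.TransformPoint((1.0, 1.0)))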
For more details about SITK transformation types and examples, see this tutorial notebook.
Resampling
Resampling, as the verb implies, is the action of sampling an image, which itself is a sampling of an original continuous signal.
Generally speaking, resampling in SITK involves four components:
- Image - the image we resample, given in coordinate system \(m\).
- Resampling grid - a regular grid of points given in coordinate system \(f\) which will be mapped to coordinate system \(m\).
- Transformation \(T_f^m\) - maps points from coordinate system \(f\) to coordinate system \(m\), \(^mp=T_f^m(^fp)\).
- Interpolator - method for obtaining the intensity values at arbitrary points in coordinate system \(m\) from the values of the points defined by the Image.
While SITK provides a large number of interpolation methods, the two most commonly used are sitkLinear and sitkNearestNeighbor. The former is used for most interpolation tasks and is a compromise between accuracy and computational efficiency. The latter is used to interpolate labeled images representing a segmentation; it is the only interpolation approach which will not introduce new labels into the result.
The SITK interface includes three variants for specifying the resampling grid:
- Use the same grid as defined by the resampled image.
- Provide a second, reference, image which defines the grid.
- Specify the grid using: size, origin, spacing, and direction cosine matrix.
Points that are mapped outside of the resampled image’s spatial extent in physical space are set to a constant pixel value which you provide (default is zero).
It is not uncommon to end up with an empty (all black) image after resampling. This is due to:
- Using wrong settings for the resampling grid (not too common, but it does happen).
- Using the inverse of the transformation \(T_f^m\). This is a relatively common error, which is readily addressed by invoking the transformation’s GetInverse method.
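A minimal sketch of the procedural interface (re-using the img_volume loaded earlier and resampling it onto its own grid after a small translation):
PYTHON
# shift the volume 10 mm along x; points mapped outside the input get the default value 0
translation = sitk.TranslationTransform(3, (10.0, 0.0, 0.0))
resampled = sitk.Resample(
    img_volume,               # the image to resample; here it also defines the output grid
    translation,              # transform mapping output (fixed) points to input (moving) points
    sitk.sitkLinear,          # interpolator
    0.0,                      # default pixel value for out-of-domain points
    img_volume.GetPixelID(),  # output pixel type
)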
Let’s try to plot multiple slices across different axes for the image “training_001_mr_T1.mha”.
PYTHON
img_volume = sitk.ReadImage("data/sitk/training_001_mr_T1.mha")
print(img_volume.GetSize())
print(img_volume.GetSpacing())
OUTPUT
(256, 256, 26)
(1.25, 1.25, 4.0)
We can plot the slices as we did before:
PYTHON
img_xslices = [img_volume[x,:,:] for x in range(50, 200, 30)]
img_yslices = [img_volume[:,y,:] for y in range(50, 200, 30)]
img_zslices = [img_volume[:,:,z] for z in range(1, 25, 3)]
tile_x = sitk.Tile(img_xslices, [1,0])
tile_y = sitk.Tile(img_yslices, [1,0])
tile_z = sitk.Tile(img_zslices, [1,0])
nda_xslices = sitk.GetArrayViewFromImage(tile_x)
nda_yslices = sitk.GetArrayViewFromImage(tile_y)
nda_zslices = sitk.GetArrayViewFromImage(tile_z)
fig, (ax1, ax2, ax3) = plt.subplots(1,3)
ax1.imshow(nda_xslices, cmap=plt.cm.Greys_r)
ax2.imshow(nda_yslices, cmap=plt.cm.Greys_r)
ax3.imshow(nda_zslices, cmap=plt.cm.Greys_r)
ax1.set_title('X slices')
ax2.set_title('Y slices')
ax3.set_title('Z slices')
Challenge: Distorted Images
What is the main difference with the first image we plotted (“A1_grayT1.nrrd”)?
In this case, there are only 26 slices in the volume and the voxel spacing is anisotropic: it is the same along the x- and y-axes, but differs along the z-axis.
We can fix the distortion by resampling the volume so that the z spacing (4 mm) matches the in-plane spacing (1.25 mm):
PYTHON
def resample_img(image, out_spacing=[1.25, 1.25, 1.25]):
# Resample images to 1.25mm spacing
original_spacing = image.GetSpacing()
original_size = image.GetSize()
out_size = [
int(np.round(original_size[0] * (original_spacing[0] / out_spacing[0]))),
int(np.round(original_size[1] * (original_spacing[1] / out_spacing[1]))),
int(np.round(original_size[2] * (original_spacing[2] / out_spacing[2])))]
resample = sitk.ResampleImageFilter()
resample.SetOutputSpacing(out_spacing)
resample.SetSize(out_size)
resample.SetOutputDirection(image.GetDirection())
resample.SetOutputOrigin(image.GetOrigin())
resample.SetTransform(sitk.Transform())
resample.SetDefaultPixelValue(0)  # fill value for points mapped outside the original image
resample.SetInterpolator(sitk.sitkBSpline)
return resample.Execute(image)
resampled_sitk_img = resample_img(img_volume)
print(resampled_sitk_img.GetSize())
print(resampled_sitk_img.GetSpacing())
OUTPUT
(256, 256, 83)
(1.25, 1.25, 1.25)
Registration
Image registration involves spatially transforming the source/moving image(s) to align with the target image. More specifically, the goal of registration is to estimate the transformation which maps points from one image to the corresponding points in another image. The transformation estimated via registration is said to map points from the fixed image (target image) coordinate system to the moving image (source image) coordinate system.
Many Ways to Do Registration:
- Several libraries offer built-in registration functionalities.
- Registration can be performed with DIPY (mentioned in the MRI episode), specifically for diffusion imaging, using the SymmetricDiffeomorphicRegistration functionality.
- NiBabel includes a function in its processing module that allows resampling or conforming one NIfTI image into the space of another.
- SITK provides robust registration capabilities and allows you to code your own registration algorithms.
- To maintain cleaner and more efficient code, it’s advisable to use as few libraries as possible and avoid those that may create conflicts.
SITK provides a configurable multi-resolution registration framework, implemented in the ImageRegistrationMethod class. In addition, a number of variations of the Demons registration algorithm are implemented independently from this class as they do not fit into the framework.
The task of registration is formulated using non-linear optimization which requires an initial estimate. The two most common initialization approaches are (1) Use the identity transform (a.k.a. forgot to initialize). (2) Align the physical centers of the two images (see CenteredTransformInitializerFilter). If after initialization there is no overlap between the images, registration will fail. The closer the initialization transformation is to the actual transformation, the higher the probability of convergence to the correct solution.
If your registration involves the use of a global domain transform (described here), you should also set an appropriate center of rotation. In many cases you want the center of rotation to be the physical center of the fixed image (the CenteredTransformInitializerFilter ensures this). This is of significant importance for registration convergence due to the non-linear nature of rotation. When the center of rotation is far from our physical region of interest (ROI), a small rotational angle results in a large displacement. Think of moving the pivot/fulcrum point of a lever: for the same rotation angle, the farther you are from the fulcrum, the larger the displacement. For numerical stability we do not want our computations to be sensitive to very small variations in the rotation angle, thus the ideal center of rotation is the point which minimizes the distance to the farthest point in our ROI:
\(p_{center} = \arg\min_{p_{rotation}} \max_{p_{roi} \in ROI} dist(p_{rotation}, p_{roi})\)
Without additional knowledge we can only assume that the ROI is the whole fixed image. If your ROI is only in a sub region of the image, a more appropriate point would be the center of the oriented bounding box of that ROI.
To create a specific registration instance using the ImageRegistrationMethod you need to select several components which together define the registration instance:
- Transformation: defines the mapping between the two images.
- Similarity metric: reflects the relationship between the intensities of the images (identity, affine, stochastic…).
- Optimizer: when selecting the optimizer you will also need to configure it (e.g. set the number of iterations).
- Interpolator: in most cases linear interpolation, the default setting, is sufficient.
Let’s now look at an example where we use registration to align two volumes from the same patient: a CT scan and a T1-weighted MRI scan. We first read the images, casting the pixel type to that required for registration (Float32 or Float64), and look at them:
PYTHON
from ipywidgets import interact, fixed
from IPython.display import clear_output
import os
OUTPUT_DIR = "data/sitk/"
fixed_image = sitk.ReadImage(f"{OUTPUT_DIR}training_001_ct.mha", sitk.sitkFloat32)
moving_image = sitk.ReadImage(f"{OUTPUT_DIR}training_001_mr_T1.mha", sitk.sitkFloat32)
print(f"Origin for fixed image: {fixed_image.GetOrigin()}, moving image: {moving_image.GetOrigin()}")
print(f"Spacing for fixed image: {fixed_image.GetSpacing()}, moving image: {moving_image.GetSpacing()}")
print(f"Size for fixed image: {fixed_image.GetSize()}, moving image: {moving_image.GetSize()}")
print(f"Number Of Components Per Pixel for fixed image: {fixed_image.GetNumberOfComponentsPerPixel()}, moving image: {moving_image.GetNumberOfComponentsPerPixel()}")
# Callback invoked by the interact IPython method for scrolling through the image stacks of
# the two images (moving and fixed).
def display_images(fixed_image_z, moving_image_z, fixed_npa, moving_npa):
# Create a figure with two subplots and the specified size.
plt.subplots(1,2,figsize=(10,8))
# Draw the fixed image in the first subplot.
plt.subplot(1,2,1)
plt.imshow(fixed_npa[fixed_image_z,:,:],cmap=plt.cm.Greys_r)
plt.title('fixed/target image')
plt.axis('off')
# Draw the moving image in the second subplot.
plt.subplot(1,2,2)
plt.imshow(moving_npa[moving_image_z,:,:],cmap=plt.cm.Greys_r)
plt.title('moving/source image')
plt.axis('off')
plt.show()
interact(
display_images,
fixed_image_z = (0,fixed_image.GetSize()[2]-1),
moving_image_z = (0,moving_image.GetSize()[2]-1),
fixed_npa = fixed(sitk.GetArrayViewFromImage(fixed_image)),
moving_npa = fixed(sitk.GetArrayViewFromImage(moving_image)))
OUTPUT
Origin for fixed image: (0.0, 0.0, 0.0), moving image: (0.0, 0.0, 0.0)
Spacing for fixed image: (0.653595, 0.653595, 4.0), moving image: (1.25, 1.25, 4.0)
Size for fixed image: (512, 512, 29), moving image: (256, 256, 26)
Number Of Components Per Pixel for fixed image: 1, moving image: 1
We can use the CenteredTransformInitializer to align the centers of the two volumes and set the center of rotation to the center of the fixed image:
PYTHON
# Callback invoked by the IPython interact method for scrolling and
# modifying the alpha blending of an image stack of two images that
# occupy the same physical space.
def display_images_with_alpha(image_z, alpha, fixed, moving):
img = (1.0 - alpha)*fixed[:,:,image_z] + alpha*moving[:,:,image_z]
plt.imshow(sitk.GetArrayViewFromImage(img),cmap=plt.cm.Greys_r)
plt.axis('off')
plt.show()
initial_transform = sitk.CenteredTransformInitializer(fixed_image,
moving_image,
sitk.Euler3DTransform(),
sitk.CenteredTransformInitializerFilter.GEOMETRY)
moving_resampled = sitk.Resample(moving_image, fixed_image, initial_transform, sitk.sitkLinear, 0.0, moving_image.GetPixelID())
interact(
display_images_with_alpha,
image_z = (0,fixed_image.GetSize()[2]-1),
alpha = (0.0,1.0,0.05),
fixed = fixed(fixed_image),
moving = fixed(moving_resampled))
The specific registration task at hand estimates a 3D rigid transformation between images of different modalities. There are multiple components from each group (optimizers, similarity metrics, interpolators) that are appropriate for the task. Note that each component selection requires setting some parameter values. We have made the following choices:
- Similarity metric, mutual information (Mattes MI):
- Number of histogram bins, 50.
- Sampling strategy, random.
- Sampling percentage, 1%.
- Interpolator, sitkLinear.
- Optimizer, gradient descent:
- Learning rate, step size along traversal direction in parameter space, 1.0.
- Number of iterations, maximal number of iterations, 100.
- Convergence minimum value, value used for convergence checking in conjunction with the energy profile of the similarity metric that is estimated in the given window size, 1e-6.
- Convergence window size, number of values of the similarity metric which are used to estimate the energy profile of the similarity metric, 10.
We perform registration using the settings given above, and by taking advantage of the built-in multi-resolution framework, we use a three-tier pyramid.
In this example we plot the similarity metric’s value during registration. Note that the change of scales in the multi-resolution framework is readily visible.
PYTHON
# Callback invoked when the StartEvent happens, sets up our new data.
def start_plot():
global metric_values, multires_iterations
metric_values = []
multires_iterations = []
# Callback invoked when the EndEvent happens, do cleanup of data and figure.
def end_plot():
global metric_values, multires_iterations
del metric_values
del multires_iterations
# Close figure, we don't want to get a duplicate of the plot later on.
plt.close()
# Callback invoked when the sitkMultiResolutionIterationEvent happens, update the index into the
# metric_values list.
def update_multires_iterations():
global metric_values, multires_iterations
multires_iterations.append(len(metric_values))
# Callback invoked when the IterationEvent happens, update our data and display new figure.
def plot_values(registration_method):
global metric_values, multires_iterations
metric_values.append(registration_method.GetMetricValue())
# Clear the output area (wait=True, to reduce flickering), and plot current data
clear_output(wait=True)
# Plot the similarity metric values
plt.plot(metric_values, 'r')
plt.plot(multires_iterations, [metric_values[index] for index in multires_iterations], 'b*')
plt.xlabel('Iteration Number',fontsize=12)
plt.ylabel('Metric Value',fontsize=12)
plt.show()
registration_method = sitk.ImageRegistrationMethod()
# Similarity metric settings.
registration_method.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
registration_method.SetMetricSamplingStrategy(registration_method.RANDOM)
registration_method.SetMetricSamplingPercentage(0.01)
registration_method.SetInterpolator(sitk.sitkLinear)
# Optimizer settings.
registration_method.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100, convergenceMinimumValue=1e-6, convergenceWindowSize=10)
registration_method.SetOptimizerScalesFromPhysicalShift()
# Setup for the multi-resolution framework.
registration_method.SetShrinkFactorsPerLevel(shrinkFactors = [4,2,1])
registration_method.SetSmoothingSigmasPerLevel(smoothingSigmas=[2,1,0])
registration_method.SmoothingSigmasAreSpecifiedInPhysicalUnitsOn()
# Don't optimize in-place, we would possibly like to run this cell multiple times.
registration_method.SetInitialTransform(initial_transform, inPlace=False)
# Connect all of the observers so that we can perform plotting during registration.
registration_method.AddCommand(sitk.sitkStartEvent, start_plot)
registration_method.AddCommand(sitk.sitkEndEvent, end_plot)
registration_method.AddCommand(sitk.sitkMultiResolutionIterationEvent, update_multires_iterations)
registration_method.AddCommand(sitk.sitkIterationEvent, lambda: plot_values(registration_method))
final_transform = registration_method.Execute(sitk.Cast(fixed_image, sitk.sitkFloat32),
sitk.Cast(moving_image, sitk.sitkFloat32))
Always remember to query why the optimizer terminated. This will help you understand whether termination was too early, due to thresholds being too tight or to a small number of iterations (numberOfIterations), or too loose, due to a large value for the minimal change in the similarity measure (convergenceMinimumValue).
PYTHON
print('Final metric value: {0}'.format(registration_method.GetMetricValue()))
print('Optimizer\'s stopping condition, {0}'.format(registration_method.GetOptimizerStopConditionDescription()))
OUTPUT
Final metric value: -0.6561600032169457
Optimizer's stopping condition, GradientDescentOptimizerv4Template: Convergence checker passed at iteration 61.
Now we can visually inspect the results:
PYTHON
moving_resampled = sitk.Resample(moving_image, fixed_image, final_transform, sitk.sitkLinear, 0.0, moving_image.GetPixelID())
interact(
display_images_with_alpha,
image_z = (0,fixed_image.GetSize()[2]-1),
alpha = (0.0,1.0,0.05),
fixed = fixed(fixed_image),
moving = fixed(moving_resampled))
PYTHON
print(f"Origin for fixed image: {fixed_image.GetOrigin()}, shifted moving image: {moving_resampled.GetOrigin()}")
print(f"Spacing for fixed image: {fixed_image.GetSpacing()}, shifted moving image: {moving_resampled.GetSpacing()}")
print(f"Size for fixed image: {fixed_image.GetSize()}, shifted moving image: {moving_resampled.GetSize()}")
OUTPUT
Origin for fixed image: (0.0, 0.0, 0.0), shifted moving image: (0.0, 0.0, 0.0)
Spacing for fixed image: (0.653595, 0.653595, 4.0), shifted moving image: (0.653595, 0.653595, 4.0)
Size for fixed image: (512, 512, 29), shifted moving image: (512, 512, 29)
If we are satisfied with the results, save them to file.
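For example (a minimal sketch; the output file names are hypothetical):
PYTHON
sitk.WriteImage(moving_resampled, "data/sitk/training_001_mr_T1_registered.mha")  # hypothetical output path
sitk.WriteTransform(final_transform, "data/sitk/ct_to_t1.tfm")                    # hypothetical output path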
Segmentation
Image segmentation filters process images by dividing them into meaningful regions. SITK provides a wide range of filters to support classical segmentation algorithms, including various thresholding methods and watershed algorithms. The output is typically an image where different integers represent distinct objects, with 0 often used for the background and 1 (or sometimes 255) for foreground objects. After segmenting the data, SITK allows for efficient post-processing, such as labeling distinct objects and analyzing their shapes.
Let’s start by reading in a T1 MRI scan, on which we will perform segmentation operations.
PYTHON
%matplotlib inline
import matplotlib.pyplot as plt
from ipywidgets import interact, fixed
import SimpleITK as sitk
img_T1 = sitk.ReadImage("data/sitk/A1_grayT1.nrrd")
# To visualize the label image in RGB we need an image with a 0-255 range
img_T1_255 = sitk.Cast(sitk.RescaleIntensity(img_T1), sitk.sitkUInt8)
# Callback invoked by the interact IPython method for scrolling through the image stacks of
# a volume image
def display_images(image_z, npa, title):
plt.imshow(npa[image_z,:,:], cmap=plt.cm.Greys_r)
plt.title(title)
plt.axis('off')
plt.show()
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(img_T1)),
title = fixed('Z slices'))
Thresholding
Thresholding is the most basic form of segmentation. It simply labels the pixels of an image based on the intensity range without respect to geometry or connectivity.
PYTHON
# Basic thresholding
seg = img_T1>200
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Basic thresholding"))
Another example using BinaryThreshold:
PYTHON
# Binary thresholding
seg = sitk.BinaryThreshold(img_T1, lowerThreshold=100, upperThreshold=400, insideValue=1, outsideValue=0)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Binary thresholding"))
ITK has a number of histogram based automatic thresholding filters, including Huang, MaximumEntropy, Triangle, and the popular Otsu’s method (OtsuThresholdImageFilter). These methods create a histogram and then use a heuristic to determine a threshold value.
PYTHON
# Otsu Thresholding
otsu_filter = sitk.OtsuThresholdImageFilter()
otsu_filter.SetInsideValue(0)
otsu_filter.SetOutsideValue(1)
seg = otsu_filter.Execute(img_T1)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Otsu thresholding"))
print(otsu_filter.GetThreshold())
OUTPUT
236.40869140625
Region Growing Segmentation
The first step of improvement upon the naive thresholding is a class of algorithms called region growing. The common theme for all these algorithms is that a voxel’s neighbor is considered to be in the same class if its intensities are similar to the current voxel. The definition of similar is what varies:
- ConnectedThreshold: The neighboring voxel’s intensity is within explicitly specified thresholds.
- ConfidenceConnected: The neighboring voxel’s intensity is within the implicitly specified bounds \(\mu \pm c \sigma\), where \(\mu\) is the mean intensity of the seed points, \(\sigma\) their standard deviation and \(c\) a user specified constant.
- VectorConfidenceConnected: A generalization of the previous approach to vector valued images, for instance multi-spectral images or multi-parametric MRI. The neighboring voxel’s intensity vector is within the implicitly specified bounds using the Mahalanobis distance \(\sqrt{(x-\mu)^T \Sigma^{-1} (x-\mu)} < c\), where \(\mu\) is the mean of the vectors at the seed points, \(\Sigma\) is the covariance matrix and \(c\) is a user specified constant.
- NeighborhoodConnected: Similar to ConnectedThreshold, but a voxel is only added if all of the voxels in its neighborhood are within the explicitly specified thresholds.
Let’s imagine that we have to segment the left lateral ventricle of the brain image we just visualized.
3D Slicer was used to determine that index: (132,142,96) was a good seed for the left lateral ventricle. Let’s first visualize the seed:
PYTHON
# Initial seed
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Initial seed"))
Let’s use the ConnectedThreshold functionality:
PYTHON
# Connected Threshold
seg = sitk.ConnectedThreshold(img_T1, seedList=[seed], lower=100, upper=190)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Connected threshold"))
Improving upon this is the ConfidenceConnected filter, which uses the initial seed or current segmentation to estimate the threshold range.
This region growing algorithm allows the user to implicitly specify the threshold bounds based on the statistics estimated from the seed points, \(\mu \pm c\sigma\). This algorithm has some flexibility which you should familiarize yourself with:
- The “multiplier” parameter is the constant \(c\) from the formula above.
- You can specify a region around each seed point “initialNeighborhoodRadius” from which the statistics are estimated, see what happens when you set it to zero.
- The “numberOfIterations” allows you to rerun the algorithm. In the first run the bounds are defined by the seed voxels you specified, in the following iterations \(\mu\) and \(\sigma\) are estimated from the segmented points and the region growing is updated accordingly.
PYTHON
seg = sitk.ConfidenceConnected(img_T1, seedList=[seed],
numberOfIterations=0,
multiplier=2,
initialNeighborhoodRadius=1,
replaceValue=1)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z=(0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title=fixed("Confidence connected"))
Since we also have another MRI scan of the same patient available, a T2-weighted scan, we can try to further improve the segmentation. We first load the T2 image and combine it with the T1 image to create a vector image. This region growing algorithm is similar to the previous one, ConfidenceConnected, and allows the user to implicitly specify the threshold bounds based on the statistics estimated from the seed points. The main difference is that in this case we are using the Mahalanobis distance and not the intensity difference.
PYTHON
img_T2 = sitk.ReadImage("data/sitk/A1_grayT2.nrrd")
img_multi = sitk.Compose(img_T1, img_T2)
seg = sitk.VectorConfidenceConnected(img_multi, seedList=[seed],
numberOfIterations=1,
multiplier=2.5,
initialNeighborhoodRadius=1)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Vector confidence connected"))
Clean Up
Use of low level segmentation algorithms such as region growing is
often followed by a clean up step. In this step we fill holes and remove
small connected components. Both of these operations are achieved by
using binary morphological operations, opening
(BinaryMorphologicalOpening
) to remove small connected
components and closing (BinaryMorphologicalClosing
) to fill
holes.
SITK supports several shapes for the structuring elements (kernels) including:
- sitkAnnulus
- sitkBall
- sitkBox
- sitkCross
The size of the kernel can be specified as a scalar (same for all dimensions) or as a vector of values, size per dimension.
The following code cell illustrates the results of such a clean up, using closing to remove holes in the original segmentation.
PYTHON
seg = sitk.ConfidenceConnected(img_T1, seedList=[seed],
numberOfIterations=0,
multiplier=2,
initialNeighborhoodRadius=1,
replaceValue=1)
vectorRadius=(1,1,1)
kernel=sitk.sitkBall
seg_clean = sitk.BinaryMorphologicalClosing(seg, vectorRadius, kernel)
seg_img_clean = sitk.LabelOverlay(img_T1_255, seg_clean)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img_clean)),
title = fixed("Confidence connected after morphological closing"))
Level-Set Segmentation
There are a variety of level-set based segmentation filters available in ITK, such as the ThresholdSegmentationLevelSetImageFilter used below. There is also a modular level-set framework which allows composition of terms and easy extension in C++.
First we create a label image from our seed.
PYTHON
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
Use the seed to estimate a reasonable threshold range.
PYTHON
stats = sitk.LabelStatisticsImageFilter()
stats.Execute(img_T1, seg)
factor = 3.5
lower_threshold = stats.GetMean(1)-factor*stats.GetSigma(1)
upper_threshold = stats.GetMean(1)+factor*stats.GetSigma(1)
print(lower_threshold,upper_threshold)
OUTPUT
81.25184541308809 175.0084466827569
PYTHON
init_ls = sitk.SignedMaurerDistanceMap(seg, insideIsPositive=True, useImageSpacing=True)
lsFilter = sitk.ThresholdSegmentationLevelSetImageFilter()
lsFilter.SetLowerThreshold(lower_threshold)
lsFilter.SetUpperThreshold(upper_threshold)
lsFilter.SetMaximumRMSError(0.02)
lsFilter.SetNumberOfIterations(1000)
lsFilter.SetCurvatureScaling(.5)
lsFilter.SetPropagationScaling(1)
lsFilter.ReverseExpansionDirectionOn()
ls = lsFilter.Execute(init_ls, sitk.Cast(img_T1, sitk.sitkFloat32))
print(lsFilter)
OUTPUT
itk::simple::ThresholdSegmentationLevelSetImageFilter
LowerThreshold: 81.2518
UpperThreshold: 175.008
MaximumRMSError: 0.02
PropagationScaling: 1
CurvatureScaling: 0.5
NumberOfIterations: 1000
ReverseExpansionDirection: 1
ElapsedIterations: 119
RMSChange: 0.0180966
Debug: 0
NumberOfThreads: 10
NumberOfWorkUnits: 0
Commands: (none)
ProgressMeasurement: 0.119
ActiveProcess: (none)
PYTHON
seg_img = sitk.LabelOverlay(img_T1_255, ls>0)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Level set segmentation"))
Challenge: Segment on the y-Axis
Try to segment the lateral ventricle using volume’s slices on y-axis instead of z-axis.
Hint: Start by editing the display_images function in order to select the slices on the y-axis.
PYTHON
def display_images(image_y, npa, title):
plt.imshow(npa[:,image_y,:], cmap=plt.cm.Greys_r)
plt.title(title)
plt.axis('off')
plt.show()
# Initial seed; we can reuse the same used for the z-axis slices
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_y = (0,img_T1.GetSize()[1]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Initial seed"))
After some attempts, this was the method that gave the best segmentation results:
PYTHON
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
stats = sitk.LabelStatisticsImageFilter()
stats.Execute(img_T1, seg)
factor = 3.5
lower_threshold = stats.GetMean(1)-factor*stats.GetSigma(1)
upper_threshold = stats.GetMean(1)+factor*stats.GetSigma(1)
print(lower_threshold,upper_threshold)
init_ls = sitk.SignedMaurerDistanceMap(seg, insideIsPositive=True, useImageSpacing=True)
lsFilter = sitk.ThresholdSegmentationLevelSetImageFilter()
lsFilter.SetLowerThreshold(lower_threshold)
lsFilter.SetUpperThreshold(upper_threshold)
lsFilter.SetMaximumRMSError(0.02)
lsFilter.SetNumberOfIterations(1000)
lsFilter.SetCurvatureScaling(.5)
lsFilter.SetPropagationScaling(1)
lsFilter.ReverseExpansionDirectionOn()
ls = lsFilter.Execute(init_ls, sitk.Cast(img_T1, sitk.sitkFloat32))
OUTPUT
81.25184541308809 175.0084466827569
Segmentation Evaluation
Evaluating segmentation algorithms typically involves comparing your results to reference data.
In the medical field, reference data is usually created through manual segmentation by an expert. When resources are limited, a single expert might define this data, though this is less than ideal. If multiple experts contribute, their inputs can be combined to produce reference data that more closely approximates the elusive “ground truth.”
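As a minimal sketch (assuming a hypothetical reference segmentation seg_reference defined on the same grid as our result seg), SITK’s overlap and distance filters can quantify the agreement:
PYTHON
overlap_filter = sitk.LabelOverlapMeasuresImageFilter()
overlap_filter.Execute(seg_reference, seg)  # seg_reference is a hypothetical expert segmentation
print(f"Dice coefficient: {overlap_filter.GetDiceCoefficient():.3f}")
print(f"Jaccard coefficient: {overlap_filter.GetJaccardCoefficient():.3f}")
hausdorff_filter = sitk.HausdorffDistanceImageFilter()
hausdorff_filter.Execute(seg_reference, seg)
print(f"Hausdorff distance: {hausdorff_filter.GetHausdorffDistance():.2f}")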
For detailed coding examples on segmentation evaluation, refer to this notebook.
Acknowledgements
This episode was largely inspired by the official SITK tutorial, which is copyrighted by NumFOCUS and distributed under the Creative Commons Attribution 4.0 International License, and SITK Notebooks.
Additional Resources
To really understand the structure of SITK images and how to work with them, we recommend some hands-on interaction using the SITK Jupyter notebooks from the SITK official channels. More detailed information about SITK fundamental concepts can also be found here.
Code illustrating various aspects of the registration and segmentation framework can be found in the set of examples which are part of the SITK distribution and in the SITK Jupyter notebook repository.
Key Points
- Registration aligns images for data merging or temporal tracking, while segmentation identifies objects within images, which is critical for detailed analysis.
- SITK simplifies segmentation, registration, and advanced analysis tasks using ITK algorithms and supporting several programming languages.
- Images in SITK are defined by physical space, unlike array-based libraries, ensuring accurate spatial representation and metadata management.
- SITK offers global and bounded domain transformations for spatial manipulation and efficient resampling techniques with various interpolation options.
- Use SITK’s robust capabilities for registration and classical segmentation methods such as thresholding and region growing, ensuring efficient analysis of medical images.
Content from Preparing Images for Machine Learning
Last updated on 2024-09-14 | Edit this page
Estimated time: 125 minutes
Overview
Questions
- What are the fundamental steps in preparing images for machine learning?
- What techniques can be used to augment data?
- How should data from various sources or collected under different conditions be handled?
- How can we generate features for machine learning using radiomics or volumetrics?
Objectives
- Outline the fundamental steps involved in preparing images for machine learning
- Demonstrate data augmentation techniques using affine transformations
- Discuss the potential pitfalls of data augmentation
- Explain the concept of derived features, including radiomics and other pipeline-derived features
- Review image harmonization techniques at both the image and dataset levels
Basic Steps
Datasets of images can serve as the raw data for machine learning (ML). While in rare cases they might be ready for use with minimal effort, in most projects, a significant portion of time is dedicated to cleaning and preparing the data. The quality of the data directly impacts the quality of the resulting models.
One of the initial steps in building a model involves manually inspecting both the data and metadata.
Challenge: Thinking About Metadata
What metadata should you examine in almost any radiological dataset?
Hint: Consider metadata pertinent to brain and thorax imaging.
In almost any radiological dataset, it is essential to examine the age and sex distribution. These metadata are crucial for both brain MRIs and chest X-rays, as well as for many other types of imaging studies.
In real-world scenarios, we often face the challenge of investigating a specific pathology that is relatively rare. Consequently, our dataset may contain thousands of normal subjects but relatively few with any pathology, and even fewer with the pathology of interest. This creates an imbalanced dataset and can introduce biases. For instance, if we aim to develop a model to detect heart failure on chest X-rays, but all our heart failure patients are older men, the model may incorrectly “focus” (by exploiting statistical correlations) on the absence of female breasts and signs of aging, rather than on true heart failure indicators like the cardiothoracic ratio.
Here are some initial steps to understanding your dataset if you plan to do supervised ML:
- Check some images by hand: assess quality, content, and any unexpected elements.
- Check some labeling by hand: ensure accuracy, consistency, and note any surprises.
- Diversity check for obvious protected classes: examine images, labels, or metadata for representation.
- Convert images to a lossless format and store a backup of the data collection.
- Anonymize data: remove identifiable information within images (consider using OCR for text).
- Determine important image features: consider creating cropped versions.
- Normalize images: adjust for size, histogram, and other factors, ensuring proper centering when necessary.
- Re-examine some images by hand: look for artifacts introduced during preprocessing steps.
- Harmonize the dataset.
- Create augmented and/or synthetic data.
When preparing images for unsupervised learning, many of these steps still apply. However, you must also consider the specific algorithm’s requirements, as some segmentation algorithms may not benefit from additional data.
In our preparation for ML as outlined above, step two involves verifying the accuracy of labeling. The label represents the target variable you want your model to predict, so it is crucial to ensure that the labeling is both accurate and aligns with the outcomes you expect your model to predict.
In step three, we emphasize the importance of checking for diversity among protected classes—groups that are often subject to specific legislation in medical diagnostics due to historical underrepresentation. It is essential to ensure your dataset includes women and ethnic minority groups, as population-level variations can significantly impact model performance. For instance, chest circumference and height can vary greatly between populations, such as between Dutch and Indonesian people. While these differences might not affect an algorithm focused on cerebral blood flow, they are critical for others.
In step five, the focus is on anonymization, particularly in imaging data like ultrasounds, where patient names or diagnoses might appear directly on the image. Simple cropping may sometimes suffice, but more advanced techniques such as blurring, masking, or optical character recognition (OCR) should also be considered to ensure thorough anonymization.
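As a minimal sketch (the image used and the location of the burned-in text are assumptions made for illustration):
PYTHON
from skimage import io

image_with_text = io.imread('data/ml/rotatechest.png')  # imagine this image carried burned-in patient text
anonymized = image_with_text.copy()
# black out a rectangle in the top-left corner where the annotation is assumed to sit
anonymized[:50, :200] = 0
io.imshow(anonymized)
io.show()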
We will cover step nine, dataset harmonization, in a separate section. For step ten, creating more data, synthetic data generation will be discussed further in an episode on generative AI. In the next section, we will explore examples of augmented data creation.
Let’s go through some examples. First, we import the libraries we need:
PYTHON
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
from skimage import io
from skimage import transform as tf
from skimage.transform import rotate, resize, PiecewiseAffineTransform
Then, we import our example images and examine them.
PYTHON
image_cxr1 = io.imread('data/ml/rotatechest.png') # a relatively normal chest X-ray (CXR)
image_cxr_cmegaly = io.imread('data/ml/cardiomegaly_cc0.png') # cardiomegaly CXR
image_cxr2 = io.imread('data/ml/other_op.png') # a relatively normal CXR
# create figure
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 3
# Adds a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(image_cxr1)
plt.axis('off')
plt.title("Normal 1")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(image_cxr_cmegaly)
plt.axis('off')
plt.title("Cardiomegaly")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(image_cxr2)
plt.axis('off')
plt.title("Normal 2")
Challenge: Can You See Some Problems in the Following Scenario?
Imagine you got the above images and many more because you have been assigned to make an algorithm for cardiomegaly detection. The goal is to notify patients if their hospital-acquired X-rays, taken for any reason, show signs of cardiomegaly. This is an example of using ML for opportunistic screening.
You are provided with two datasets:
- A dataset of healthy (no cardiomegaly) patients from an outpatient clinic in a very poor area, staffed by first-year radiography students. These patients were being screened for tuberculosis (TB).
- A dataset of chest X-rays of cardiomegaly patients from an extremely prestigious tertiary inpatient hospital.
All of the following may pose potential problems:
- We may have different quality images from different machines for our healthy versus diseased patients, which could introduce a “batch effect” and create bias.
- At the prestigious hospital, many X-rays might be taken in the supine position. Will these combine well with standing AP X-rays?
- There could be concerns about the accuracy of labeling at the clinic since it was performed by lower-level staff.
Challenge: Prepare the Images for Classic Supervised ML
Use skimage.transform.rotate to create two realistic augmented images from the given ‘normal’ image stored in the variables.
Then, in a single block of code, apply what you perceive as the most critical preprocessing operations to prepare these images for classic supervised ML.
Hint: Carefully examine the shape of the cardiomegaly image. Consider the impact of harsh lines on ML performance.
PYTHON
# figure out how much to cut on top and bottom
print("cut top/bottom:", (image_cxr_cmegaly.shape[0] - image_cxr1.shape[0])/2)
cut_top_bottom = abs(round((image_cxr_cmegaly.shape[0] - image_cxr1.shape[0])/2))
# figure out how much to cut on the sides
print("cut sides:",(image_cxr_cmegaly.shape[1] - image_cxr1.shape[1])/2)
OUTPUT
cut top/bottom: -119.0
cut sides: -208.5
PYTHON
cut_sides = abs(round((image_cxr_cmegaly.shape[1] - image_cxr1.shape[1])/2))
list_images = [image_cxr1, image_cxr_cmegaly, image_cxr2]
better_for_ml_list = []
for image in list_images:
image = rotate(image,2)
image = image[cut_top_bottom:-cut_top_bottom, cut_sides: -cut_sides]
better_for_ml_list.append(image)
# create figure for display
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 3
# add a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(better_for_ml_list[0])
plt.axis('off')
plt.title("Normal example rotated")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(better_for_ml_list[1])
plt.axis('off')
plt.title("Cardiomegaly rotated")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(better_for_ml_list[2])
plt.axis('off')
plt.title("Another normal rotated")
PYTHON
# find a size we want to resize to, here the smallest
zero_side = []
one_side = []
for image in better_for_ml_list:
zero_side.append(image.shape[0])
zero_side.sort()
one_side.append(image.shape[1])
one_side.sort()
smallest_size = zero_side[0], one_side[0]
# make resized images
resized = []
for image in better_for_ml_list:
image = resize(image, (smallest_size))
resized.append(image)
# create figure for display
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 3
# add a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(resized[0])
plt.axis('off')
plt.title("Normal 1 Augmented Resized")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(resized[1])
plt.axis('off')
plt.title("Augment Cardiomegaly Resized")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(resized[2])
plt.axis('off')
plt.title("Normal 2 Augment Resized")
There are a few key considerations to keep in mind regarding the choices made here. The goal is to create data that realistically represents what might occur in real-life scenarios. For example, a 2-degree rotation is far more realistic than a 25-degree rotation. Unless you are using a rotationally invariant algorithm or another advanced method that accounts for such differences, even small rotations can make every pixel slightly different for the computer. A 2-degree rotation closely mirrors what might happen in a clinical setting, making it a more practical choice.
Additionally, the results can be further improved by cropping the images. Generally, we aim to minimize harsh, artificial lines in any images used for ML algorithms, unless those lines are consistent across all images. It is a common observation among practitioners that many ML algorithms tend to “focus” on these harsh lines, so if they are not uniformly present, it is better to remove them. As for how much to crop, we based our decision on the size of our images. While you don’t need to follow our exact approach, it is important to ensure that all images are ultimately the same size and that all critical features are retained.
Of course, there are numerous other transformations we can apply to images beyond rotation. For example, let’s explore applying a shear:
PYTHON
# create affine transform
affine_tf = tf.AffineTransform(shear=0.2)
# apply transform to image data
image_affine_tf = tf.warp(image_cxr_cmegaly, inverse_map=affine_tf)
# display the result
io.imshow(image_affine_tf)
io.show()
And finally, let’s demonstrate a “wave over a mesh.” We’ll start by creating a grid, or “mesh,” over our image, represented by dots in our plot. Then, we’ll apply a warp function to transform the image into the shape of a wave.
PYTHON
rows, cols = image_affine_tf.shape[0], image_affine_tf.shape[1]
# np.linspace will return evenly spaced numbers over an interval
src_cols = np.linspace(0, cols, 20)
# ie above is start=0, stop=cols, num=20, and num is the number of chops
src_rows = np.linspace(0, rows, 10)
# np.meshgrid returns coordinate matrices from coordinate vectors.
src_rows, src_cols = np.meshgrid(src_rows, src_cols)
# numpy dstack stacks along a third dimension in the concatenation
src = np.dstack([src_cols.flat, src_rows.flat])[0]
dst_rows = src[:, 1] - np.sin(np.linspace(0, 3 * np.pi, src.shape[0])) * 50
dst_cols = src[:, 0]
dst_rows *= 1.5
dst_rows -= 1.5 * 50
dst = np.vstack([dst_cols, dst_rows]).T
tform = PiecewiseAffineTransform()
tform.estimate(src, dst)
noform = PiecewiseAffineTransform()
noform.estimate(src, src)
out_rows = image_affine_tf.shape[0] - 1.5 * 50
out_cols = cols
out = tf.warp(image_affine_tf, tform, output_shape=(out_rows, out_cols))
# create figure
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 4
# add a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(image_affine_tf)
plt.axis('off')
plt.title("Normal")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(image_affine_tf)
plt.plot(noform.inverse(src)[:, 0], noform.inverse(src)[:, 1], '.b')
plt.axis('off')
plt.title("Normal and Mesh")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(out)
plt.axis('off')
plt.title("Augment")
# add a subplot at the 4th position
fig.add_subplot(rows, columns, 4)
# showing image
plt.imshow(out)
plt.plot(tform.inverse(src)[:, 0], tform.inverse(src)[:, 1], '.b')
plt.axis('off')
plt.title("Augment and Mesh")
The last transformation doesn’t appear realistic. The chest became noticeably widened, which could be problematic. When augmenting data, there are numerous possibilities, but it’s crucial to ensure the augmented data remains realistic. Only a subject matter expert (typically a pathologist, nuclear medicine specialist, or radiologist) can accurately determine what realistic data should look like.
Libraries for Medical Image Augmentation
We demonstrated how to augment medical images using scikit-image. Another great resource for augmentation is SimpleITK, which offers a dedicated tutorial on data augmentation. Additionally, OpenCV provides versatile tools for image processing and augmentations.
To keep your environment manageable, we recommend using just one of these libraries. scikit-image, SimpleITK, and OpenCV all offer effective solutions for medical image augmentation.
Images’ Features
So far, we’ve focused on examples where we directly manipulate images. However, much of ML involves working with derived values from images, often converted into tabular data. In fact, it’s possible to combine images with various types of tabular data in multiple ways for ML. But before exploring these methods, let’s first consider using image features alone as inputs for ML.
Two notable examples of image features are volumetric data and radiomic data. Handling such data at scale—when dealing with thousands of images—typically involves automated processes rather than manual measurements. In the past, pioneers like Benjamin Felson, a founding figure in modern chest X-ray analysis, painstakingly analyzed hundreds of X-rays by hand to gather statistics. Today, processing pipelines allow us to efficiently analyze thousands of images simultaneously.
We aim to integrate images into a pipeline that automatically generates various types of data. A notable example of such a feature pipeline, particularly for brain imaging, is Freesurfer. Freesurfer facilitates the extraction of volume and other relevant data from brain imaging scans.
Challenge: Identifying Issues with Pipeline Outputs in the Absence of Original Images
Consider potential issues that could arise from using the output of a pipeline without accessing the original images.
One major concern is accuracy, as pipelines may mislabel or miscount features, potentially leading to unreliable data analysis or model training. To mitigate these issues, consider:
- Manual verification: validate pipeline outputs by manually reviewing a subset of images to ensure consistency with expected results.
- Literature review: consult existing research to establish benchmarks for feature counts and volumes, helping identify discrepancies that may indicate pipeline errors.
- Expert consultation: seek insights from radiologists or domain experts to assess the credibility of pipeline outputs based on their clinical experience.
Proceed with caution when relying on pipeline outputs, especially without direct access to the original images.
Another challenge with pipelines is that the results can vary significantly depending on which one is chosen. For a detailed exploration of how this can impact fMRI data, see this Nature article. Without access to the original images, it becomes difficult to determine which pipeline most accurately captures the desired outcomes from the data.
Morphometric or volumetric data represents one type of derived data, but radiomics introduces another dimension. Radiomics involves analyzing mathematical characteristics within images, such as entropy or kurtosis. Similar methodologies applied to pathology are referred to as pathomics. Some define radiomics and pathomics as advanced texture analysis in medical imaging, while others argue they encompass a broader range of quantitative data than traditional texture analysis methods.
Regardless of the data type chosen, the typical approach involves segmenting and/or masking an image to isolate the area of interest, followed by applying algorithms to derive radiomic or pathomic features. While it’s possible to develop custom code for this purpose, using established packages is preferred for ensuring reproducibility. For future reproducibility, adherence to standards such as those set by the International Bio-imaging Standards Initiative (IBSI) is crucial.
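As a minimal sketch of what such a step can look like with pyradiomics (the image and mask paths are hypothetical, and the default extractor settings are assumed):
PYTHON
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
# image.nrrd and mask.nrrd are hypothetical paths to an image and its segmentation mask
features = extractor.execute("image.nrrd", "mask.nrrd")
# print the first-order statistics computed on the original image
for name, value in features.items():
    if name.startswith("original_firstorder"):
        print(name, value)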
Below is a list of open-source* libraries that facilitate this type of analysis:
| | mirp | pyradiomics | LIFEx | radiomics |
|---|---|---|---|---|
| License | EUPL-1.2 | BSD-3 | custom | GPL-3.0 |
| Last updated | 5/2024 | 1/2024 | 6/2023 | 11/2019 |
| Programming language | Python | Python | Java | MATLAB |
| IBSI-1 compliant | yes | partial | yes | no claim |
| IBSI-2 compliant | yes | no claim | yes | no claim |
| Interface | high-level API | high-level API, Docker | GUI, low-level API | low-level API |
| Website | GitHub | GitHub | website | GitHub |
| Early publication | JOSS publication | doi:10.1158/0008-5472.CAN-17-0339 | doi:10.1158/0008-5472.CAN-18-0125 | doi:10.1088/0031-9155/60/14/5471 |
| Notes | relative newcomer | very standard and supported | user-friendly | * MATLAB requires a license |
Once we have tabular data, we can apply various algorithms to analyze it. However, applying ML isn’t just a matter of running algorithms and getting quick results. In the next section, we will explore one reason why this process is more complex than it may seem.
Harmonization
We often need to harmonize either images or datasets of derived features. When dealing with images, differences between datasets can be visually apparent. For example, if one set of X-rays consistently appears darker, it may be wise to compare average pixel values between datasets and adjust normalization to achieve more uniformity. This is a straightforward scenario.
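A minimal sketch of that straightforward check (the two lists of images, xrays_a and xrays_b, are hypothetical):
PYTHON
import numpy as np

# xrays_a and xrays_b are hypothetical lists of 2D numpy arrays, one per dataset
mean_a = np.mean([img.mean() for img in xrays_a])
mean_b = np.mean([img.mean() for img in xrays_b])
print(f"dataset A mean intensity: {mean_a:.1f}, dataset B mean intensity: {mean_b:.1f}")
# one crude normalization: rescale dataset B towards dataset A's average brightness
xrays_b_adjusted = [img * (mean_a / mean_b) for img in xrays_b]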
Consider another example involving derived datasets from brain MRIs with Virchow-Robin spaces. One dataset was captured using a 1.5 Tesla machine in a distant location, while the other used an experimental 5 Tesla (high resolution) machine in an advanced hospital. These machines differ in resolution, meaning that what appears as a single Virchow-Robin space at low resolution might actually show as two or three smaller spaces fused together at high resolution. This is just one potential difference.
Below are images of the same patient scanned with 1.5 Tesla and 3 Tesla machines:
Different contrast levels significantly affect the visibility of structures like the caudate and thalami in brain images. As a result, the radiomic characteristics of these images, including contrast and possibly other parameters, can vary even when they originate from the same patient.
Constructing a dataset based solely on extensive, unharmonized derived data is not always practical. However, without access to the original images, it becomes difficult to fully understand and account for these variations.
We recommend the following approach:
- Compare the data using descriptive statistics and your knowledge of the patient groups (see the sketch after this list). Consider whether you expect similar counts across both groups, and justify your reasoning.
- Explore the use of a harmonization package to standardize data across different datasets.
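As a sketch of the first recommendation, assuming your derived features are stored in two hypothetical CSV files (one per scanner or site), a first-pass comparison with pandas could look like this:
PYTHON
import pandas as pd

# hypothetical per-site tables of derived features (one row per scan)
site_a = pd.read_csv("site_a_features.csv")
site_b = pd.read_csv("site_b_features.csv")

# compare means and standard deviations of the feature columns both sites share
shared = site_a.columns.intersection(site_b.columns)
print(site_a[shared].describe().loc[["mean", "std"]])
print(site_b[shared].describe().loc[["mean", "std"]])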
Below are three examples from the field of neuroimaging:
 | neurocombat | haca3 | autocombat |
---|---|---|---|
License | MIT for Python / Artistic for R | missing? | Apache 2.0 |
Last updated | 2021 | 2024 | 2022 |
Programming language | Python or R | Python | Python |
Organ/area and modality | brain MRI | brain MRI | brain MRI |
Data type | derived values | images | derived values |
Notes | no standard release yet | versioned but not released | not released, not versioned |
Original website | GitHub | GitHub | GitHub |
Early publication | doi:10.1016/j.neuroimage.2017.08.047 | doi:10.1016/j.compmedimag.2023.102285 | doi:10.1038/s41598-022-16609-1 |
Versioned website | versioned on cvasl | versioned on GitHub | versioned on cvasl |
There are numerous packages available for brain MRI alone, not to mention those for imaging the rest of the body. We have selected these three examples to highlight some of the common issues and pitfalls associated with such packages.
Challenge: Identifying Potential Problems in Each Package
Consider issues that might hinder your implementation with each package.
Aging code: neurocombat was last modified three years ago. It may rely on outdated dependencies that could pose compatibility issues with newer hardware and software environments. Additionally, the lack of versioned releases makes it challenging to track changes and updates.
No license: haca3 does not have a clear license available. Using code without a proper license could lead to legal complications and uncertainties about its usage rights.
No releases and versioning: autocombat lacks both released versions and versioning practices. Without clear versioning, it becomes difficult to ensure consistency and compare results across different implementations of the package.
We’ve pointed out a few potential issues to highlight common challenges with such programs. Looking ahead, consider developing a harmonization package that’s sustainable and useful for your research community.
While each of these packages has its strengths for researchers, none are perfect. Choose wisely when selecting a package! Also, remember you can create your own harmonization method through coding or develop it into a package for others to use.
Key Points
- Direct knowledge of specific data cannot be substituted
- Statistical analysis is essential to detect and mitigate biases in patient distribution
- Verify if derived data aligns with known clinical realities through statistical examination
- Evaluate the validity and utility of data augmentation methods before applying them
- Radiomics enables the use of mathematical image qualities as features
- There are several accessible pipelines for volumetrics and radiomics
- Data from different machines (or even the same machines with different settings) often requires harmonization, achievable through coding and the use of existing libraries
Content from Working with Pathology Images
Last updated on 2025-02-05 | Edit this page
Estimated time: 90 minutes
Overview
Questions
- What are some open libraries that can be used with pathology?
- How are pathology slides different from a photo?
- What is the structure of a pathology slide?
Objectives
- Describe some open source tools and libraries for pathology images
- Discuss common file formats for pathology slides
- Explain tiling and pyramidal file formats
- Show basic code in openslide
Introduction
Pathologists deal with multiple types of images. Some pathologists may use photography when recording gross (or macroscopic) pathology data. The processing of digital photography is a straightforward application of general image processing. However, the workhorse image type of pathology is the histopathological image, and this is very different from a photograph.
In the below picture you can see an image with gross and histopathology of signet ring cell carcinoma metastasis to the ovary. The histopathology images (bottom row images) are stained microscopy images. The gross pathology images (top row images) are photographs.
Nakamura, Yoshiaki; Hiramatsu, Ayako; Koyama, Takafumi; Oyama, Yu; Tanaka, Ayuko; Honma, Koichi, CC BY 3.0 https://creativecommons.org/licenses/by/3.0, via Wikimedia Commons
Signet ring cell carcinoma is a type of cancer that currently can only be definitively differentiated from other types of gastric cancers based on the microscopic characteristics (the histology or histopathology) of the tumor. In this lesson we will cover how to work with histopathology images, as they differ from the simpler multi-channel color images that a modern digital camera produces.
File formats
In terms of histopathological images, you may find them in several file formats. Not every library or tool for such imaging works with every format. Below is a table with several file formats.
Format Name | Extension | Notes | More info |
---|---|---|---|
TIFF | .tiff | tag image file format | can be tiled |
OME-TIFF | .ome.tiff, ome.tif, .ome.tf2, .ome.tf8 or .ome.btf | includes XML and may be multiple TIFFs | https://brainder.org/2012/09/23/the-nifti-file-format/ |
SVS | .svs | Scanned Virtual Slide | https://www.dicomstandard.org/ |
DICOM | .dcm or none | supports an included WSI file | https://dicom.nema.org/dicom/dicomwsi/ |
The table above is far from complete because many vendors have their own file formats; however, the community is consolidating towards shared file formats that are not vendor specific. The table is organized in order of ascending complexity. A simple, general TIFF file is very close to a simple raster image. Such a file for a high-resolution image would be quite large and potentially hard to manage based on size alone. Tiling is a way to manage huge images more efficiently.
Tiling approaches will be very familiar to anyone who comes from the world of GIS, mapping, or digital geography. Maps provide an obvious way to think about the advantages of pyramidal tiling. Imagine if, every time you wanted a map with directions and a picture of your destination, you needed to load a detailed map of the entire world, literally. It would be incredibly inefficient and taxing on your computer or smartphone. It is much more efficient to load only what you need. There are many ways this can be done, but in practice some variation of pyramidal tiling is usually used. The same is true with tiled pathology images.
To be clear, tiling is the process of cutting a big image up so you can load only the relevant parts of it at a time. Tiling is useful even without pyramids: it allows efficient parallel inference on huge images (i.e., medical imaging at high resolution), which will mostly run at the highest resolution.
Pyramidal images are those where multiple images are made from a source image at predefined zoom levels, so you can switch between the different resolutions as needed. Remember the last time you downloaded an image for a presentation and there was a choice of resolutions, because not everyone needs an image gigabytes in size for high-resolution quality? You can think of this as a sort of pyramidal-like scheme, except that the images were not all in the same file. In the world of modern digital pathology we often use both tiling and a pyramidal structure simultaneously.
Therefore, practically speaking, we often have tiled images made in a pyramidal structure, and at each level of the pyramid after zero there is a lower-resolution, downscaled version of the image. Each level of the pyramid has tiles, smaller parts of the image that are quilted into the whole image. Creating images in this way allows much more efficient access and management of images because you can load and display images at only one level. An OME-TIFF file can be one or more tiled TIFFs plus some XML in the header. An SVS file is essentially a tiled TIFF image with a few additional things such as an overview image. Various types of pathology files can be packed into a DICOM, which has its own file specification. DICOMs can be conceptualized as containers that can hold many sorts of imaging data, e.g., JPEGs of X-rays, including OME-TIFFs or other pathology file types.
Libraries
There is actually very little in the way of open, pure-Python libraries that deal with pathology images. A promising new contender is TIA Toolbox; however, this is a relatively new library. There are better-known libraries, popular within the small community of digital pathology researchers, that have bindings or other tricks that allow you to process pathology images in Python. Probably the most popular is OpenSlide, but other contenders include Bio-Formats (which is both a library and a tool) and QuPath, which has the distinct advantage of being able to work with OME-TIFF directly.
QuPath is mainly a GUI tool that relies on libraries (Bio-Formats/OpenSlide) as “backends” to load the slide. It can be used directly as a tool for viewing, unlike Bio-Formats/OpenSlide, which are code only. One potential disadvantage of QuPath, if you code with it as opposed to just using its GUI, is that it usually requires some Java knowledge. Bio-Formats has Python bindings but is also mainly a Java tool, which can also be used as a Java or C++ library. To keep things simple we will use OpenSlide as the vehicle to explore histopathology images in this lesson.
Reading TIFF histopathology Images
OpenSlide is actually written in C, but you can also use Java or Python to work with it due to bindings. To run the code you will need to work in the pathy environment created with the environment_pathology.yml file.
PYTHON
import openslide
from openslide import open_slide
import numpy as np
import matplotlib.pyplot as plt
After importing the needed libraries we can now load a slide and print out its properties.
PYTHON
file_path = 'data/pathology/CMU-1.tiff'
slide_in = open_slide(file_path)
slide_in_props = slide_in.properties
print(slide_in_props)
OUTPUT
<_PropertyMap {'openslide.level-count': '9', 'openslide.level[0].downsample': '1', 'openslide.level[0].height': '32914', 'openslide.level[0].tile-height': '256', 'openslide.level[0].tile-width': '256', 'openslide.level[0].width': '46000', 'openslide.level[1].downsample': '2', 'openslide.level[1].height': '16457', 'openslide.level[1].tile-height': '256', 'openslide.level[1].tile-width': '256', 'openslide.level[1].width': '23000', 'openslide.level[2].downsample': '4.0001215362177929', 'openslide.level[2].height': '8228', 'openslide.level[2].tile-height': '256', 'openslide.level[2].tile-width': '256', 'openslide.level[2].width': '11500', 'openslide.level[3].downsample': '8.0002430724355857', 'openslide.level[3].height': '4114', 'openslide.level[3].tile-height': '256', 'openslide.level[3].tile-width': '256', 'openslide.level[3].width': '5750', 'openslide.level[4].downsample': '16.000486144871171', 'openslide.level[4].height': '2057', 'openslide.level[4].tile-height': '256', 'openslide.level[4].tile-width': '256', 'openslide.level[4].width': '2875', 'openslide.level[5].downsample': '32.014322017605849', 'openslide.level[5].height': '1028', 'openslide.level[5].tile-height': '256', 'openslide.level[5].tile-width': '256', 'openslide.level[5].width': '1437', 'openslide.level[6].downsample': '64.050935911470475', 'openslide.level[6].height': '514', 'openslide.level[6].tile-height': '256', 'openslide.level[6].tile-width': '256', 'openslide.level[6].width': '718', 'openslide.level[7].downsample': '128.10187182294095', 'openslide.level[7].height': '257', 'openslide.level[7].tile-height': '256', 'openslide.level[7].tile-width': '256', 'openslide.level[7].width': '359', 'openslide.level[8].downsample': '257.06193261173183', 'openslide.level[8].height': '128', 'openslide.level[8].tile-height': '256', 'openslide.level[8].tile-width': '256', 'openslide.level[8].width': '179', 'openslide.mpp-x': '1000', 'openslide.mpp-y': '1000', 'openslide.quickhash-1': '428aa6abf42c774234a463cb90e2cbf88423afc0217e46ec2e308f31e29f1a9f', 'openslide.vendor': 'generic-tiff', 'tiff.ResolutionUnit': 'centimeter', 'tiff.XResolution': '10', 'tiff.YResolution': '10'}>
From the printed properties we can see that we are dealing with a tiled slide, because we have properties like levels and tile height and width. We can observe that this specific image file has nine levels, and that tile heights and widths are often 256 in this file, while the actual heights and widths vary per level. These may not be the only properties we care about. We most likely also care about mapping this image back to real-world sizes, and we have a property that tells us the microns per pixel. Let’s look:
PYTHON
print("Pixel size of X in um is:", slide_in_props['openslide.mpp-x'])
print("Pixel size of Y in um is:", slide_in_props['openslide.mpp-y'])
OUTPUT
Pixel size of X in um is: 1000
Pixel size of Y in um is: 1000
Challenge: Find the real world size of an image
There is a property of the loaded image which is a (width, height) tuple for level 0 of the image. The property is called dimensions. Calculate the real-world dimensions of our image in microns.
Hint: Make sure you understand the type of each variable you assign. You can always check by printing out the type.
PYTHON
# get the width and height
width, height = slide_in.dimensions
# get the spatial resolution in microns per pixel for the X and Y directions
microns_per_pixel_x = slide_in.properties.get(openslide.PROPERTY_NAME_MPP_X)
microns_per_pixel_y = slide_in.properties.get(openslide.PROPERTY_NAME_MPP_Y)
# convert to floats because they are strings
microns_per_pixel_x = float(microns_per_pixel_x)
microns_per_pixel_y = float(microns_per_pixel_y)
# calculate real-world dimensions of the slide in microns
real_world_width = width * microns_per_pixel_x
real_world_height = height * microns_per_pixel_y
# print the results
print(f"Real-world dimensions in microns: {real_world_width} microns (width), {real_world_height} microns (height)")
OUTPUT
Real-world dimensions in microns: 46000000.0 microns (width), 32914000.0 microns (height)
As you might guess, loading this huge image is pointless if we just want a small image that gives us a general idea. We can use OpenSlide to make a thumbnail.
PYTHON
# get a thumbnail of the image and pop it out on your computer
slide_in_thumb_600 = slide_in.get_thumbnail(size=(600, 600))
slide_in_thumb_600.show()
With the above code an image will be opened on most operating systems, but if one does not open for you, do not despair. We could also store our thumbnail (or any part of the image, for that matter) as a NumPy array and then graph it.
PYTHON
# convert thumbnail to numpy array, and plot it
slide_in_thumb_600_np = np.array(slide_in_thumb_600)
plt.figure(figsize=(8,8))
plt.imshow(slide_in_thumb_600_np)
OUTPUT
The above image is part of the OpenSlide sample images provided with the library, which are distributed under an open license.
We can also explore a bit about how the image is tiled.
PYTHON
# get slide_in dims at each level of the pyramid at various levels
dims = slide_in.level_dimensions
num_levels = len(dims)
print("Number of levels in this image are:", num_levels)
print("Dimensions of various levels in this image are:", dims)
# understand by how much are levels downsampled from the original image
factors = slide_in.level_downsamples
print("Each level is downsampled by an amount of: ", factors)
OUTPUT
Number of levels in this image are: 9
Dimensions of various levels in this image are: ((46000, 32914), (23000, 16457), (11500, 8228), (5750, 4114), (2875, 2057), (1437, 1028), (718, 514), (359, 257), (179, 128))
Each level is downsampled by an amount of: (1.0, 2.0, 4.000121536217793, 8.000243072435586, 16.00048614487117, 32.01432201760585, 64.05093591147048, 128.10187182294095, 257.06193261173183)
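Because the slide is tiled and pyramidal, we never need to load a whole level to inspect one area; we can read just a region. Below is a minimal sketch using OpenSlide’s read_region; the location and size here are arbitrary values chosen for illustration:
PYTHON
# arguments: (x, y) location in level-0 coordinates, pyramid level, (width, height) of the region
region = slide_in.read_region((20000, 15000), 2, (512, 512))
region_np = np.array(region)  # an RGBA numpy array
plt.figure(figsize=(6, 6))
plt.imshow(region_np)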
Pyramidal file structure and tiling are quite challenging to understand, and different file formats can implement the concepts a bit differently. Pyramids are actually a more general concept in image processing. The key concept to understand is that a pyramidal image simply represents the image at multiple scales, and that this could be implemented in many different ways. Usually we don’t create such pyramids by hand, but rather, you guessed it, algorithmically with code. The pictures below illustrate the concepts involved in tiled pyramidal images.
Images with modifications from article “KMIT-Pathology: Digital Pathology AI Platform for Cancer Biomarkers Identification on Whole Slide Images” with CC BY 4.0 license. Subramanian, Rajasekaran & Rajasekaran, Devikarubi & Tapadia, Rohit & Singh, Rochan. (2022). KMIT-Pathology: Digital Pathology AI Platform for Cancer Biomarkers Identification on Whole Slide Images. International Journal of Advanced Computer Science and Applications. 13. 10.14569/IJACSA.2022.0131170.
One final note to consider about this pyramidal, tiled structure of images is that it may well be the wave of the future for more types of medical images, as medical images reach higher and higher resolutions and more and more data is stored in the cloud.
Content from Anonymizing Medical Images
Last updated on 2024-11-03 | Edit this page
Estimated time: 70 minutes
Overview
Questions
- What types of data make a patient’s imaging data identifiable?
- How can I ensure the safe sharing of medical image data?
- How can I remove specific metadata from DICOM files?
Objectives
- Provide examples of data that makes patient images identifiable
- Discuss the concepts of identifiable data and anonymization
- Demonstrate approaches to anonymizing identifiable images
- Demonstrate the use of the Pydicom library to manage DICOM metadata
Introduction
Each of us is similar yet unique, and this individuality can make us identifiable, posing challenges for medical research. While open data sharing advances research, most patients would not want their medical details shared if they could be identified. In most countries, patient information is protected by law.
Metadata elements in imaging, such as patient names and addresses, are often clearly designed to identify patients. However, the uniqueness of patients means that even images without obvious metadata, such as names, can potentially be identified as belonging to a specific individual. With advancements in facial recognition software and search engines, images we previously thought were non-identifiable, like head CTs, MRIs, or even PET scans, can theoretically be traced back to a specific patient. To address this, we can implement de-identification strategies to create shareable data.
Types of Patient Identifying Data
Metadata
DICOM files contain metadata, which includes various types of identifying information that should remain confidential. The easiest way to mitigate issues with DICOM metadata is to avoid having it in the first place. If possible, opt to receive just the images and select metadata rather than the entire DICOM file. When sharing data with collaborators, there is often no need to share the full DICOM files.
Text on Images
Occasionally, technicians will “burn” information directly onto images as part of a burned-in annotation. This means they change the image itself with text that becomes part of the image pixels. This may include details such as diagnoses, demographics, or the patient’s name. Fortunately, this text is usually typed rather than handwritten, making it recognizable by optical character recognition (OCR) functions. Often, this text is placed away from the center of the image, allowing for clever cropping to eliminate it entirely in some datasets.
Challenge: Getting rid of identifying burned-in data
An ultrasound (taken from the public internet under a Creative Commons license here) must be kept at its existing height and width dimensions. You should change the image from RGB to grayscale for your operations. It can be opened as follows:
PYTHON
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, filters
image = 'data/anonym/identifiable_us.jpg'
io.imshow(image)
io.show()
OUTPUT
Write code for two approaches that de-identify (remove the annotations from) the ultrasound image data.
The image has identifying burned-in text. You should identify not only the name, but also the date and place as problematic. You can take three different approaches: one is to mask the data, another is to blur it, and finally you can just crop it out entirely. First we will show a selectively blurred image:
PYTHON
from skimage.filters import gaussian
from skimage.color import rgb2gray
image_base = io.imread(image)
image_base = rgb2gray(image_base)
sub_im = image_base[0:78,:].copy()  # the strip of rows containing the burned-in text
blur_sub_im = gaussian(sub_im, sigma=9)  # blur just that strip
final_image = np.zeros(image_base.shape)
final_image[0:78,:] = blur_sub_im  # blurred annotation strip
final_image[78:,:] = image_base[78:, :]  # keep the rest of the image unchanged
io.imshow(final_image)
OUTPUT
We could also have just made a simple zero-mask.
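A minimal sketch of such a mask, assuming the annotations sit in the same top 78 rows as in the blurring solution:
PYTHON
# copy the grayscale image and zero out the rows containing the annotations
masked_image = image_base.copy()
masked_image[0:78, :] = 0
io.imshow(masked_image)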
OUTPUT
Finally, you could always just chop off the offending part, so to speak, and resize. However, if you examine the image when we try this solution, you should notice that the image has changed in aspect ratio, and thus pixel values have changed. This may not be the ideal solution for situations where you want to maintain the exact image pixels. Nonetheless, you can crop and resize as below:
PYTHON
from skimage.transform import resize
final_image= image_base[79:, :]
final_image = resize(final_image,image_base.shape)
io.imshow(final_image)
OUTPUT
Note that there are other valid solutions for getting rid of identifying image data, but the first two shown (blurring and masking) are very common and straightforward. You have now seen three approaches to removing some of the visual data on a 2D image. The same approaches can be taken on a 3D image.
Let’s take another look at an image we used earlier:
PYTHON
import matplotlib.pyplot as plt
import SimpleITK as sitk
from skimage import io
from ipywidgets import interact, fixed
from IPython.display import clear_output
import os
sag_image = sitk.ReadImage("data/sitk/A1_grayT1.nrrd", sitk.sitkFloat32)
cor_image = sitk.PermuteAxes(sag_image, [2, 1, 0])
# General display function for any two 3D images
def display_images(image1_z, image2_z, image1_npa, image2_npa, title1="Image 1", title2="Image 2"):
plt.subplots(1, 2, figsize=(10, 8))
# Display the first image
plt.subplot(1, 2, 1)
plt.imshow(image1_npa[image1_z, :, :], cmap=plt.cm.Greys_r)
plt.title(title1)
plt.axis('off')
# Display the second image
plt.subplot(1, 2, 2)
plt.imshow(image2_npa[image2_z, :, :], cmap=plt.cm.Greys_r)
plt.title(title2)
plt.axis('off')
plt.show()
# Display the sagittal and coronal views of the original image
interact(
display_images,
image1_z=(0, sag_image.GetSize()[2] - 1),
image2_z=(0, cor_image.GetSize()[2] - 1),
image1_npa=fixed(sitk.GetArrayViewFromImage(sag_image)),
image2_npa=fixed(sitk.GetArrayViewFromImage(cor_image)),
title1=fixed("Sagittal Cut"),
title2=fixed("Coronal Cut")
)
OUTPUT
Here we can see a technique analogous to one we used with the 2D ultrasound: identifying data has been cropped out.
Challenge: Are we truly de-identified? (Optional)
Look at the head MRI above. Are all such images, including CTs, with some of the soft tissue stripped, 100 percent de-identified? Give arguments for why and why not.
In most cases images like the above are de-identified. The image above is only an image, not a DICOM with potentially identifying metadata. Further, we are missing the nose, so putting this into a reverse image look-up engine online will not yield identifying results. On the other hand, we could theoretically have an identifiable image. One potentially identifiable case could be someone with a very distinctive neck, skull bones (these will be more visible on CT) and/or ears. Additionally, patients who have had some kind of rare pathology and surgery may still be identifiable to someone familiar with their cases.
Faces in Images
A full CT, MRI, or PET scan of the head can be reconstructed into a detailed facial image, potentially revealing the patient’s identity and demographic information, such as ethnicity and gender. To mitigate this risk, many image analysis programs employ ‘defacing’ techniques to obscure these identifiable features.
We could in theory write our own defacing algorithm. For such an algorithm, libraries such as SITK, scikit-image, or even OpenCV provide several useful built-in functions, including morphological operations such as erosion, dilation, and grow-from-seed, as well as other operations such as connected component analysis and masking. Which library you choose will often come down to minimizing the number of libraries in your code, because mixing similar libraries with similarly named functions is a bad idea.
Challenge: Soft tissue stripping (Optional)
Look at the head MRI above. Try using SITK to get rid of some of the soft tissue in the image (already loaded into sag_image).
We can approach the problem in various ways. Below is one solution with erosion, dilation and masking:
PYTHON
# Apply thresholding to remove soft tissues by first taking out the air then dilating and cleaning
# the assumption is that the black around the brain is zero and low values
# Create the brain mask
lower_thresh = 0
upper_thresh = 100
brain_mask = sitk.BinaryThreshold(sag_image, lowerThreshold=lower_thresh, upperThreshold=upper_thresh)
# Morphological operations to clean the mask
brain_mask_cleaned = sitk.BinaryDilate(brain_mask, [5, 5, 5])
brain_mask_cleaned = sitk.BinaryErode(brain_mask_cleaned, [5, 5, 5])
# Display the original and cleaned mask images using the general display function
interact(
display_images,
image1_z=(0, brain_mask.GetSize()[2] - 1),
image2_z=(0, brain_mask_cleaned.GetSize()[2] - 1),
image1_npa=fixed(sitk.GetArrayViewFromImage(brain_mask)),
image2_npa=fixed(sitk.GetArrayViewFromImage(brain_mask_cleaned)),
title1=fixed("Original Mask"),
title2=fixed("Cleaned Mask")
)
Then we can further clean up with a connected component analysis that throws out the small floaters, and use the inverse as our final mask.
PYTHON
def keep_largest_component(mask_image):
# Ensure mask_image is binary (0 and 1 values)
mask_image = sitk.Cast(mask_image, sitk.sitkUInt8)
# Label connected components in the mask image
labeled_image = sitk.ConnectedComponent(mask_image)
# Measure the size of each labeled component
label_shape_statistics = sitk.LabelShapeStatisticsImageFilter()
label_shape_statistics.Execute(labeled_image)
# Count and print the number of connected components
component_count = len(label_shape_statistics.GetLabels())
print(f"Number of connected components before filtering: {component_count}")
# Find the label with the largest size
largest_label = max(
label_shape_statistics.GetLabels(),
key=lambda label: label_shape_statistics.GetPhysicalSize(label)
)
# Create a new mask with only the largest component
largest_component_mask = sitk.BinaryThreshold(labeled_image, lowerThreshold=largest_label, upperThreshold=largest_label, insideValue=1, outsideValue=0)
# Verify the result by counting the components in the resulting image
labeled_result = sitk.ConnectedComponent(largest_component_mask)
label_shape_statistics.Execute(labeled_result)
result_component_count = len(label_shape_statistics.GetLabels())
print(f"Number of connected components after filtering: {result_component_count}")
return largest_component_mask
largest_component_mask = keep_largest_component(brain_mask_cleaned)
# we actually want the opposite mask so we will invert the mask
inverted_mask = sitk.BinaryNot(largest_component_mask)
# Apply the mask to the image
brain_only = sitk.Mask(sag_image, inverted_mask)
interact(
display_images,
image1_z = (0,brain_only.GetSize()[2]-1),
image2_z = (0,largest_component_mask.GetSize()[2]-1),
image1_npa = fixed(sitk.GetArrayViewFromImage(brain_only)),
image2_npa = fixed(sitk.GetArrayViewFromImage(largest_component_mask)),
title1=fixed("new image"),
title2=fixed("mask")
)
OUTPUT
Now you may notice that we put together an erosion and a dilation, where we could instead have used a closing. There are many ways to solve this problem. An entirely different approach would be to use grow-from-seed, as below (note this may take a while to execute):
PYTHON
def guess_seed_point(img):
"""
This function guesses a seed point for the brain in the middle of the image, and returns some seed points.
"""
possible_point = img.GetSize()[0]//2, img.GetSize()[1]//2, img.GetSize()[2]//2
# Get the pixel value at the potential location
pixel_value = img.GetPixel(*possible_point)
if pixel_value > 0:
picked_point = possible_point
else:
# just move over a bit and hope for better
new_possible_point = img.GetSize()[0]//2 + img.GetSize()[0]//10 , img.GetSize()[1]//2, img.GetSize()[2]//2
picked_point = new_possible_point
return picked_point
# do some reality check of a look at the value in your seed point
seed_point = guess_seed_point(sag_image)
pixel_value = sag_image.GetPixel(seed_point)
print(pixel_value)
OUTPUT
394.37042236328125
Grow from seed, then display:
PYTHON
seed_point = guess_seed_point(sag_image)
lower_threshold = 370 # lower threshold
upper_threshold = 480 # upper threshold
seed_mask = sitk.ConnectedThreshold(
sag_image, seedList=[seed_point],
lower=lower_threshold,
upper=upper_threshold)
# let's dilate up a bit
seed_mask= sitk.BinaryDilate(seed_mask, [5, 5, 5])
# apply the mask to the image
brain_only = sitk.Mask(sag_image, seed_mask)
# display to see what happened
interact(
display_images,
image1_z=(0, sag_image.GetSize()[2] - 1),
image2_z=(0, brain_only.GetSize()[2] - 1),
image1_npa=fixed(sitk.GetArrayViewFromImage(sag_image)),
image2_npa=fixed(sitk.GetArrayViewFromImage(brain_only)),
title1=fixed("Original"),
title2=fixed("Seeded and Masked")
)
OUTPUT
In the two solutions above we made masks, and each has different problems we can see. For example, in the grow-from-seed images we stripped out part of the genu and splenium of the corpus callosum, not to mention the entire pons. An alternative is that we could also have used a registered atlas as a mask. Regardless of the base algorithms we begin with, if we want a better image, we can keep tinkering with our algorithms.
In the provided solutions to the above optional challenge we didn’t get rid of a lot of the neck, and had other issues. A different approach would have been to register the brain to a brain-like shape and use that as a mask. Nonetheless, the given solutions show two potential approaches to removing tissue that you don’t want in an image (and which could in some cases lead to identification). Many researchers do not want to waste time optimizing a skull-stripping technique if this is not their research question, so they use a pre-made technique.
There are various tools available for defacing head imaging, ranging from fully developed software products like FreeSurfer, which includes built-in defacing capabilities, to specialized functions within coding libraries. Some of these tools strip off all of the skull and soft tissue which may be useful for analysis even if we don’t care about deidentification e.g. if we only want to look at brain tissue.
However, a key issue under current investigation is that some defacing algorithms may inadvertently alter more than just the facial features. Emerging research suggests that these algorithms might also affect the morphometry of the brain image. This could lead to the unintended loss or distortion of critical data. Therefore, it is advisable to proceed with caution and, whenever possible, compare the original and defaced images to ensure that important information remains intact and unaltered.
Other Parts of Images
Patient identity can often be inferred with just a few pieces of data. In some cases, a single piece of information can be enough to track down a patient’s identity, especially if medical files are accessible. For instance, a serial number or other identifying number on a medical device may be traceable back to a specific patient.
In other situations, slightly more data might be required to identify a patient. Some patients may wear unique jewelry, such as a MedicAlert bracelet or necklace with initials or a name. While most routine ambulatory images are taken without jewelry, in emergency situations, medical personnel may not have had the time to remove these items. The more data points we have on a patient, the easier it becomes to identify them.
Metadata
Metadata is, in the broadest sense, data about our data. In a practical sense, we need to understand that it is a place outside the image where identifying data may be lurking.
Various tools are available to help de-identify DICOM files in terms of metadata. A notable one is DicomAnonymizer, an open-source tool written in Python.
In some cases, you may need to examine and remove metadata manually or programmatically. For example, in some countries, DICOM fields are used inconsistently, and patient-identifying data can appear in unexpected fields. Therefore, careful examination and customized removal of metadata may be necessary.
Many Ways to Handle a DICOM:
- Multiple libraries, such as Pydicom and SimpleITK (SITK), allow you to read, access, and manipulate DICOM metadata.
- DICOMs follow an extremely complex standard, so it is usually better to use existing libraries rather than raw Python to handle them.
For various reasons, we may prefer Pydicom, SITK, or another method to handle DICOM metadata, typically based on the principle of minimizing dependencies and maintaining simplicity. SITK was introduced earlier in this course. Pydicom is an excellent alternative, particularly because of its comprehensive documentation.
Now, let’s see how to open a DICOM file and work with it using Pydicom.
First, let’s import Pydicom and read in a CT scan:
PYTHON
import pydicom
from pydicom import dcmread
fpath = "data/anonym/our_sample_dicom.dcm"
ds = dcmread(fpath)
print(ds)
OUTPUT
Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length UL: 218
(0002, 0001) File Meta Information Version OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID UI: CT Image Storage
(0002, 0003) Media Storage SOP Instance UID UI: 1.3.46.670589.33.1.63849049636503447100001.4758671761353145811
(0002, 0010) Transfer Syntax UID UI: JPEG Lossless, Non-Hierarchical, First-Order Prediction (Process 14 [Selection Value 1])
(0002, 0012) Implementation Class UID UI: 1.2.840.113845.1.1
(0002, 0013) Implementation Version Name SH: 'Syn7,3,0,258'
(0002, 0016) Source Application Entity Title AE: 'SynapseDicomSCP'
-------------------------------------------------
(0008, 0005) Specific Character Set CS: 'ISO_IR 100'
(0008, 0008) Image Type CS: ['DERIVED', 'SECONDARY', 'MPR']
(0008, 0012) Instance Creation Date DA: '20240418'
(0008, 0013) Instance Creation Time TM: '150716.503'
(0008, 0016) SOP Class UID UI: CT Image Storage
(0008, 0018) SOP Instance UID UI: 1.3.46.670589.33.1.63849049636503447100001.4758671761353145811
(0008, 0020) Study Date DA: '20240418'
(0008, 0022) Acquisition Date DA: '20240418'
(0008, 0023) Content Date DA: '20240418'
(0008, 002a) Acquisition DateTime DT: '20240418150313.020'
(0008, 0030) Study Time TM: '150045'
(0008, 0032) Acquisition Time TM: '150313'
(0008, 0033) Content Time TM: '150314.375'
(0008, 0050) Accession Number SH: '2001433888'
(0008, 0060) Modality CS: 'CT'
(0008, 0070) Manufacturer LO: 'Philips'
(0008, 0080) Institution Name LO: 'BovenIJ Ziekenhuis iCT'
(0008, 0081) Institution Address ST: ''
(0008, 0090) Referring Physician's Name PN: 'WILTING^I^I^""'
(0008, 1010) Station Name SH: 'HOST-999999'
(0008, 1030) Study Description LO: 'CT thorax met iv contrast'
(0008, 103e) Series Description LO: 'Cor IMR med'
(0008, 1040) Institutional Department Name LO: 'Radiology'
(0008, 1080) Admitting Diagnoses Description LO: ''
(0008, 1084) Admitting Diagnoses Code Sequence 0 item(s) ----
(0008, 1090) Manufacturer's Model Name LO: 'iCT 256'
(0008, 1111) Referenced Performed Procedure Step Sequence 1 item(s) ----
(0008, 1150) Referenced SOP Class UID UI: Modality Performed Procedure Step SOP Class
(0008, 1155) Referenced SOP Instance UID UI: 1.3.46.670589.33.1.63849049241567858000001.4675122277016890611
---------
(0008, 1140) Referenced Image Sequence 1 item(s) ----
(0008, 1150) Referenced SOP Class UID UI: CT Image Storage
(0008, 1155) Referenced SOP Instance UID UI: 1.3.46.670589.33.1.63849049294969912500001.5475332148846191441
---------
(0008, 3010) Irradiation Event UID UI: 1.3.46.670589.33.1.63849049343237673200010.5507538603167078985
(0010, 0010) Patient's Name PN: 'OurBeloved^Colleague'
(0010, 0020) Patient ID LO: 'party like 1999'
(0010, 0030) Patient's Birth Date DA: '19421104'
(0010, 0040) Patient's Sex CS: 'M'
(0010, 1000) Other Patient IDs LO: '1989442112'
(0010, 1010) Patient's Age AS: '041Y'
(0018, 0010) Contrast/Bolus Agent LO: 'Iodine'
(0018, 0015) Body Part Examined CS: 'CHEST'
(0018, 0022) Scan Options CS: 'HELIX'
(0018, 0050) Slice Thickness DS: '2.0'
(0018, 0060) KVP DS: '100.0'
(0018, 0088) Spacing Between Slices DS: '2.0'
(0018, 0090) Data Collection Diameter DS: '500.0'
(0018, 1000) Device Serial Number LO: ''
(0018, 1020) Software Versions LO: '4.1'
(0018, 1030) Protocol Name LO: 'Thorax std /Thorax'
(0018, 1040) Contrast/Bolus Route LO: 'IV'
(0018, 1041) Contrast/Bolus Volume DS: '80.0'
(0018, 1044) Contrast/Bolus Total Dose DS: '40.0'
(0018, 1046) Contrast Flow Rate DS: [3, 3]
(0018, 1047) Contrast Flow Duration DS: [17, 10]
(0018, 1049) Contrast/Bolus Ingredient Concentra DS: '300.0'
(0018, 1100) Reconstruction Diameter DS: '348.0'
(0018, 1110) Distance Source to Detector DS: '1040.0'
(0018, 1111) Distance Source to Patient DS: '570.0'
(0018, 1120) Gantry/Detector Tilt DS: '0.0'
(0018, 1130) Table Height DS: '85.1'
(0018, 1150) Exposure Time IS: '434'
(0018, 1151) X-Ray Tube Current IS: '258'
(0018, 1152) Exposure IS: '108'
(0018, 1160) Filter Type SH: 'IMR'
(0018, 1210) Convolution Kernel SH: 'IMR1,Soft Tissue'
(0018, 5100) Patient Position CS: 'FFS'
(0018, 9305) Revolution Time FD: 0.33
(0018, 9306) Single Collimation Width FD: 0.625
(0018, 9307) Total Collimation Width FD: 80.0
(0018, 9309) Table Speed FD: 185.0
(0018, 9310) Table Feed per Rotation FD: 97.664
(0018, 9311) Spiral Pitch Factor FD: 0.763
(0018, 9345) CTDIvol FD: 4.330253533859318
(0018, a001) Contributing Equipment Sequence 1 item(s) ----
(0008, 0070) Manufacturer LO: 'PHILIPS'
(0008, 0080) Institution Name LO: 'BRILLIANCE4'
(0008, 0081) Institution Address ST: 'BRILLIANCE4'
(0008, 1010) Station Name SH: 'HOST-999999'
(0008, 1040) Institutional Department Name LO: 'BRILLIANCE4'
(0008, 1090) Manufacturer's Model Name LO: 'BRILLIANCE4'
(0018, 1000) Device Serial Number LO: 'BRILLIANCE4'
(0018, 1020) Software Versions LO: '4.5.0.30020'
(0040, a170) Purpose of Reference Code Sequence 1 item(s) ----
(0008, 0100) Code Value SH: '109102'
(0008, 0102) Coding Scheme Designator SH: 'DCM'
(0008, 0104) Code Meaning LO: 'Processing Equipment'
---------
---------
(0020, 000d) Study Instance UID UI: 1.3.46.670589.33.1.63849049241560857600001.4706589000974752499
(0020, 000e) Series Instance UID UI: 1.3.46.670589.33.1.63849049343237673200004.5226562961912261811
(0020, 0010) Study ID SH: '8041'
(0020, 0011) Series Number IS: '203'
(0020, 0012) Acquisition Number IS: '2'
(0020, 0013) Instance Number IS: '1'
(0020, 0032) Image Position (Patient) DS: [-172.7884, 8.90000000000001, 1201.43792746114]
(0020, 0037) Image Orientation (Patient) DS: [1, 0, 0, 0, 0, -1]
(0020, 0052) Frame of Reference UID UI: 1.3.46.670589.33.1.63849049263758127200002.5362237490253193614
(0020, 1040) Position Reference Indicator LO: ''
(0020, 4000) Image Comments LT: 'Cor IMR med'
(0028, 0002) Samples per Pixel US: 1
(0028, 0004) Photometric Interpretation CS: 'MONOCHROME2'
(0028, 0010) Rows US: 832
(0028, 0011) Columns US: 772
(0028, 0030) Pixel Spacing DS: [0.4507772, 0.4507772]
(0028, 0100) Bits Allocated US: 16
(0028, 0101) Bits Stored US: 12
(0028, 0102) High Bit US: 11
(0028, 0103) Pixel Representation US: 0
(0028, 1050) Window Center DS: [50, 50]
(0028, 1051) Window Width DS: [350, 350]
(0028, 1052) Rescale Intercept DS: '-1024.0'
(0028, 1053) Rescale Slope DS: '1.0'
(0032, 1033) Requesting Service LO: 'CHIPSOFT'
(0040, 1001) Requested Procedure ID SH: 'CT5001IV'
(0054, 1001) Units CS: 'HU'
(00e1, 0010) Private Creator LO: 'ELSCINT1'
(00e1, 1036) Private tag data CS: 'YES'
(00e1, 1040) [Image Label] SH: 'Cor IMR med'
(00e1, 1046) Private tag data OB: Array of 512 elements
(01f1, 0010) Private Creator LO: 'ELSCINT1'
(01f1, 1001) [Acquisition Type] CS: 'SPIRAL'
(01f1, 1002) [Unknown] CS: 'STANDARD'
(01f1, 100e) [Unknown] FL: 0.0
(01f1, 1027) [Rotation Time] DS: '0.33'
(01f1, 1032) [Image View Convention] CS: 'RIGHT_ON_LEFT'
(01f1, 104a) [Unknown] SH: 'DOM'
(01f1, 104b) [Unknown] SH: '128x0.625'
(01f1, 104d) [Unknown] SH: 'NO'
(01f1, 104e) [Unknown] SH: 'Chest'
(01f1, 1054) Private tag data IS: '11'
(01f1, 1056) Private tag data LO: '30.0451206729581'
(01f7, 0010) Private Creator LO: 'ELSCINT1'
(01f7, 1022) [Unknown] UI: 1.3.46.670589.33.1.63849049343237673200010.5507538603167078985
(07a1, 0010) Private Creator LO: 'ELSCINT1'
(07a1, 1010) [Tamar Software Version] LO: '4.0.0'
(7fe0, 0010) Pixel Data OB: Array of 309328 elements
Challenge: Identifying Safe Metadata in DICOM
Can you determine which metadata for this CT scan is likely safe, meaning it does not lead to patient identification? When would you choose to retain such data?
Metadata related to the machine, image type, and file type are generally safe. This information is particularly valuable when sorting through numerous DICOM files to locate specific types of images or when generating tabular data for harmonization purposes.
We can modify elements of our DICOM metadata:
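One way to look at the current value is by tag-based access; here is a sketch (in a notebook, the value’s representation is displayed directly):
PYTHON
# access the Patient's Name data element by its (group, element) tag
ds[0x0010, 0x0010].value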
OUTPUT
'OurBeloved^Colleague'
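And one way, again sketched via the tag, to overwrite it:
PYTHON
# overwrite the value via the tag and display the modified element
ds[0x0010, 0x0010].value = 'Citizen^Almoni'
print(ds[0x0010, 0x0010])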
OUTPUT
(0010, 0010) Patient's Name PN: 'Citizen^Almoni'
In some cases, as here, we are dealing with a standard keyword. The keyword PatientName is, in programming terms, technically a property of the class FileDataset, but here we are using “keyword” to refer to it and other very standard properties of the DICOM. Certain keywords can be modified as follows:
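For example, a minimal sketch assigning through the PatientName keyword:
PYTHON
# assign directly via the standard PatientName keyword
ds.PatientName = 'Almoni^Shmalmoni'
print(ds[0x0010, 0x0010])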
OUTPUT
(0010, 0010) Patient's Name PN: 'Almoni^Shmalmoni'
You can also just set an element to empty by using None:
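For instance, as a sketch:
PYTHON
# empty the element by assigning None
ds.PatientName = None
print(ds[0x0010, 0x0010])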
OUTPUT
(0010, 0010) Patient's Name PN: None
You can also delete and add elements. After making modifications, remember to save your file:
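A minimal sketch of deleting an element and saving the result (the output file name here is a hypothetical illustration):
PYTHON
# delete the Patient ID element by its keyword
del ds.PatientID
# after modifications, write the result to a new DICOM file
ds.save_as("data/anonym/our_sample_dicom_deidentified.dcm")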
We recommend removing at least the patient IDs and birthdates in most cases. Additionally, consider examining the data elements ‘OtherPatientIDs’ and ‘OtherPatientIDsSequence’.
Challenge: Accessing Additional Patient Identifying Data
How can you access and print additional patient identifying data? Hint: Refer to the documentation and compare with what we have already printed.
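One possible, by no means exhaustive, sketch is to loop over the data elements and flag any whose keyword hints at identifying information:
PYTHON
# scan the top-level data elements for potentially identifying keywords
suspicious_words = ("Patient", "Physician", "Institution", "Station", "Birth", "Address")
for element in ds:
    if any(word in element.keyword for word in suspicious_words):
        print(element)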
Pydicom offers a wide range of capabilities. You can visualize your DICOM data in a hierarchical tree format for user-friendly GUI reading. It supports downsizing images and handling waveform data such as EKGs. By integrating with Matplotlib, you can load and plot files seamlessly. Before adding additional libraries, explore Pydicom’s full potential to leverage its extensive functionalities.
Key Points
- Certain metadata should almost always be removed from DICOM files before sharing
- Automated tools are available to strip metadata from DICOMs, but manual verification is necessary due to inconsistencies in how fields are utilized
- Several Python libraries enable access to DICOM metadata
- Sharing only image files such as JPEGs or NIfTI can mitigate risks associated with metadata
- Imaging data alone, even without explicit metadata, can sometimes lead to patient identification
- You may need to preprocess images themselves so patients are de-identified
- Tools exist to deface images to further protect patient identity
Content from Generative AI in Medical Imaging
Last updated on 2024-08-15 | Edit this page
Estimated time: 40 minutes
Overview
Questions
- What is generative AI?
- How can generative AI be safely used in my work?
Objectives
- Showcase practical examples of generative AI applications
- Explore critical considerations regarding the risks and challenges associated with generative AI
Introduction
Generative artificial intelligence (AI) includes technologies that create or generate data. Since the early 2020s, there has been significant attention on large language models (LLMs) like ChatGPT, Copilot, and LLaMA, alongside image generation tools such as Stable Diffusion, Midjourney, and DALL-E. These tools not only create images but can also manipulate styles.
The applications of generative AI span widely across fields like medical imaging, where they hold potential for image interpretation and numerous other tasks. Of particular interest to participants in this course are the capabilities of LLMs to generate code and produce large volumes of synthetic data. Ongoing research continues to explore these capabilities.
However, the safety implications of these technologies remain a subject of debate. Depending on the software or model used, data entered into the system may become the property of the software’s creators. It is crucial to exercise caution when inputting sensitive information, such as patient data, into these systems. Understanding where and how data is stored (i.e., whether on your servers or in the cloud) is essential to safeguard privacy and confidentiality.
Mastering the Use of Generative AI Tools
Architectures of Generative AI
Generative AI encompasses models capable of generating new data across various formats: text, images, video, or audio. Several prominent architectures, such as generative adversarial networks (GANs), variational autoencoders, diffusion models, and transformers, achieve this capability. While the technical intricacies of these architectures exceed this course’s scope, understanding their operational principles can illuminate the nature of their outputs. For instance, models treating data as sequential sequences (e.g., pixels or words) predict the next element based on probabilities derived from the training dataset, potentially perpetuating common patterns rather than reflecting absolute truths.
However, using generative algorithms carries inherent risks, including inadvertently embedding biases into both data and algorithms. While humans cannot infer patient ethnicity from body imaging, recent studies demonstrate that certain AI algorithms can leverage correlations in data, introducing potential ethical implications. This represents just one facet of the risks associated with these technologies, which we will further explore in the following safety section.
Safely Leveraging Generative AI
- Never upload non-anonymized sensitive patient data outside your secure servers
- Avoid using sensitive patient data with online tools (e.g., ChatGPT, or integrated tools like Copilot in MS Office)
- Be mindful of internet connectivity with your code editors and tools
- Prefer systems that operate locally (on your machine or hospital servers)
- Acknowledge that tools may generate erroneous information (hallucinations)
- Don’t assume code generated by an LLM lacks potential bugs or harmful code
- Address intellectual property and ethical considerations when utilizing generative AI models
- Document model usage thoroughly in academic work, detailing versions and exact applications
- Recognize that the training corpus of such models may introduce biases
Stable Diffusion is an open-source tool widely used for image generation, capable of running locally on your machine. While older versions typically require GPUs, recent builds are rumored to run on CPUs, although at a slower pace. For efficient large-scale image generation, utilizing a GPU is recommended to save time.
One limitation of Stable Diffusion, and similar tools, is that the model cannot be customized with your own datasets, restricting the ability to train or modify it according to specific needs.
Challenge: Assessing the Risks of Local Model Deployment
Can you identify a few risks associated with running a model entirely on your machine?
One significant risk of running any model locally is the potential for malicious code. Since you need to download the model to your machine, there is no guarantee that it will not contain malware. To mitigate this risk, we recommend using open-source models, which allow for community scrutiny and transparency.
Another concern is the environmental impact of model usage. Training and maintaining these models consume considerable resources, including carbon and water. We advise against running models like ChatGPT continuously due to their substantial environmental footprint. Even when operating a model locally, there are inherent risks, such as relying on a smaller training dataset to fit on your machine, which could compromise the model’s quality and performance.
Maximizing the Effectiveness of Generative AI
The output of generative tools is highly dependent on the specific words or images inputted, a process known as prompt engineering.
To create content that balances a dataset for machine learning, start by analyzing the dataset’s balance according to important labels. Then, describe images with these labels and build prompts based on the descriptions of label categories with fewer images. This approach helps focus the model’s use on generating useful outcomes while minimizing risks.
Keep in mind that content generation can be a somewhat stochastic process. Therefore, you may need to experiment with and refine your prompts and even the sequences of prompts to achieve the desired results.
Effective Prompting Tips
- Ensure accuracy in spelling and grammar to avoid misdirection
- Provide clear and detailed prompts
- Explain the context or goal (e.g., “helping my son… can you explain it in a specific way?”)
- Try different wordings to refine results
- Fact-check thoroughly
- First, ask the tool to fact-check its output
- Then, verify using search engines and experts
- Never fully trust references or facts generated by an LLM, seek human-generated content for confirmation
- Clearly indicate if you need a particular format (e.g., bullet points, square images)
- Use prompts like “you are a Python expert” or “you are an expert pathologist” to guide the tool
- For more accurate outputs, fine-tune foundation models with your specific data
These tips will help you optimize the quality and relevance of the content generated by AI tools.
Attention to detail in creating prompts is critical. For example, the following image was generated from the misspelling of “X-ray” with a popular AI tool:
While this result is laughable, more subtle mistakes can be harder to identify, especially for those without extensive training in a specific field. Therefore, having a specialist review the generated data is crucial to ensure its validity.
It is also important to remember that the training data for some algorithms is often sourced from the internet, which includes a lot of irrelevant or misleading content. For instance, while there are many cute cat pictures available, there are very few good examples of rare medical conditions like desmoplastic infantile ganglioglioma. Additionally, using terms like “CAT scan” (historically referring to computed axial tomography) might result in images of cats rather than the intended medical imagery:
Key Points
- Generative programs can create synthetic data, potentially enhancing various algorithms
- Generative AI models have inherent limitations
- Running generative AI programs locally on your own server is safer than using programs that send prompts to external servers
- Exercise caution when entering patient data into generative AI programs
- Numerous policies exist to ensure the safe and ethical use of generative AI tools across institutions