Content from Course Introduction
Last updated on 2024-08-14
Overview
Questions
- How can I use this course to be better at my research?
Objectives
- Explain how to get the most from the course
- Demonstrate and explain how the course will be laid out
This is a lesson created in the style of The Carpentries. It is written with the assumption that you already have skills in Git, Python and basic image processing.
The interpretation of medical images for clinical purposes requires skills that take highly trained professionals such as nuclear medicine specialists and radiologists many years to master. This course does not aim to improve such interpretive skills, but rather to enhance the computational skills needed to answer research questions involving medical images.
Some examples of the kinds of research questions that can be answered are:
Can we use brain MRIs to predict which patients will develop dementia before they do?
Can we build machine learning models on ultrasound data which can aid in the detection of neuromuscular diseases?
Are there observable anatomical differences in the brains of autistic people at a population level?
Can we use existing medical imaging to screen for underdiagnosed conditions like osteoporosis?
You are in all likelihood here because you have a research question which can be answered with the processing and analysis of medical images. This course is meant to aid you.
Note that all figures and data presented are licensed under open-source terms.
Challenge: Can You Do It?
What is the best way to use the challenges and questions?
Do not peek at the solutions; try to solve each challenge yourself. The effort will pay off.
Content from Medical Imaging Modalities
Last updated on 2024-10-15
Overview
Questions
- What are the common types of diagnostic imaging?
- What sorts of computational challenges do they present?
Objectives
- Explain various common types of medical images
- Explain at a high level how images’ metadata is created and organized
Introduction
Medical imaging uses many technologies including X-rays, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET) and microscopy. Although there are tendencies to use certain technologies, or modalities to answer certain clinical questions, many modalities may provide information of interest in terms of research questions. In order to work with digital images at scale we need to use information technology. We receive images in certain types of files, e.g., an x-ray stored at the hospital in DICOM format, but the image itself is contained in a JPEG inside the DICOM as a 2D-array. Understanding all the kinds of files we are dealing with and how the images within them were generated can help us deal with them computationally.
Conceptually, we can think of medical images as signals. These signals need various kinds of processing before they are ‘readable’ by humans or by many of the algorithms we write.
While thinking about how the information from these signals is stored in different file types may seem less exciting than what the “true information” or final diagnosis from the image was, it is necessary to understand this to make the best algorithms possible. For example, a lot of hospital images are essentially JPEGs. This has implications in terms of image quality as we manipulate and resize the images.
Below are brief summaries of the most common imaging types. Keep in mind that manufacturers may have their own specifics in terms of file types not covered here, and there are many possibilities for how images could potentially be stored. Here we will discuss the files researchers are commonly given.
X-Rays
Historically, x-rays were the first common form of medical imaging. The diagram below should help you visualize how they are produced. The signal from an x-ray generator crosses the subject. Some tissues attenuate the radiation more than others. The signal is captured by an x-ray detector (you can think of this metaphorically like photographic film) on the other side of the subject.
As you can imagine, if you only have one view in an X-ray, every object in the line of radiation from the generator is superimposed on every object below it. Even in the days of film, two X-ray views would often be made. In the case of chest X-rays these could be a posteroanterior (PA) view and a lateral view. In the case of joints the views may be specific; however, remember that in each view objects on the same line between the generator and receptor will be superimposed.
image courtesy of Radiopaedia, author and ID on image
Modern x-rays are born digital. No actual “film” is produced, rather a DICOM file which typically contains arrays in JPEG files.
We could use the metaphor of a wrapped present here. The DICOM file contains metadata around the image data, wrapping it. The image data itself is a bunch of 2D-arrays, but these have been organized to a specific shape - they are “boxed” by JPEG files. JPEG is a container format. There are JPEG files (emphasis on the plural) in a single DICOM file which typically contain images of the same body part with different angles of acquisition.
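As a quick illustration of this container idea, a library such as pydicom can open the DICOM "wrapping" and hand us both the metadata and the pixel payload. This is only a sketch: the file path is hypothetical, and JPEG-compressed pixel data may require an additional decoder plugin (e.g., pylibjpeg or GDCM) to be installed.
PYTHON
import pydicom
# hypothetical path to a digital chest x-ray stored as DICOM
ds = pydicom.dcmread("data/medical/chest_xray.dcm")
print(ds.Modality)                     # metadata, e.g. 'CR' or 'DX' for x-rays
print(ds.file_meta.TransferSyntaxUID)  # tells us how the pixel payload is encoded
pixels = ds.pixel_array                # the "payload": a 2D numpy array for a single view
print(pixels.shape, pixels.dtype)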
We can take x-rays from any angle and even do them repeatedly, and this allows for fluoroscopy. Because fluoroscopy adds a time dimension to X-ray the data becomes 3-dimensional, possessing an X, Y and time dimension. Below is a fluoroscopy image of a patient swallowing barium.
image courtesy of Ptrump16, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons
Fluoroscopy images are stored in a DICOM but can be displayed as movies because they are typically cine-files. Cine- is a file format that lets you store images in sequence with a frame rate.
Computed Tomography and Tomosynthesis
There are several kinds of tomography. These techniques produce 3D-images, made of voxels, that allow us to see structures within a subject. CTs are extremely common and helpful for many diagnostic questions, but they have costs not only in time and money, but also in radiation to patients.
CTs and tomosynthetic images are produced with similar technology. One key difference is that in a CT the image is based on a 360-degree capture of the signal. You can conceptualize this as a spinning donut with the generator and receptor opposite each other. The raw data of a CT is a sinogram. Only by processing this data do we get what most people would recognize as a CT. At this level of processing there are already choices affecting the data we get. Let’s examine two ways to process our sinograms:
PYTHON
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import iradon
from skimage.transform import iradon_sart
# load a sinogram of a simple phantom of the head
sino = np.load('data/medical/Schepp_Logan_sinogram.npy')
# make a filtered back projection reconstruction
theta = np.linspace(0.0, 180.0, max(sino.shape), endpoint=False)
reconstruction_fbp = iradon(sino, theta=theta, filter_name='ramp')
# make a reconstruction with Simultaneous Algebraic Reconstruction Technique
reconstruction_sart = iradon_sart(sino, theta=theta)
# plot with matplotlib
fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(9, 5), sharex=True, sharey=True)
ax1.set_title("Sinogram")
ax1.imshow(sino, cmap=plt.cm.Greys_r,)
ax2.set_title("Reconstruction\nFiltered back projection")
ax2.imshow(reconstruction_fbp, cmap=plt.cm.Greys_r)
ax3.set_title("Reconstruction\nSART")
ax3.imshow(reconstruction_sart, cmap=plt.cm.Greys_r)
ax4.set_title("Difference\n between reconstructions")
ax4.imshow(reconstruction_sart - reconstruction_fbp, cmap=plt.cm.Greys_r)
plt.show()
OUTPUT
Images generated from the Shepp–Logan phantom
While you may get an already processed CT (some commercial machines come with proprietary reconstruction algorithms which will already have been executed), it is not uncommon to get CTs as DICOM CT projection data (DICOM-CT-PD) files, which can then be processed before viewing or, in some cases, stored as other file types. As shown in the code above, there is more than one way to process the data into a radiologist-friendly CT. Filtered back projection and algebraic reconstruction methods are shown, but there are other methods such as iterative reconstruction, convolution back projection and even deep learning based methods.
Tomosynthesis makes X-ray based images using a limited angle instead of going all the way around the patient as in CT. The data from a tomosynthetic image is then processed so that you get multiple angles visible. This gets around the issue of overlapping objects in a plain film X-ray. In both the case of CT and tomosynthesis, the image output is made by processing the originally acquired data. Although most researchers work with already processed images, it is important to keep in mind that in theory the originally acquired data can be processed in a variety of ways.
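To get a feel for why the acquisition angle matters, we can reconstruct the sinogram from above using only a subset of its projection angles and compare the result with the full filtered back projection. This is a rough sketch of the limited-angle effect, not of a real tomosynthesis reconstruction; it reuses the sino and reconstruction_fbp variables and the imports from the code above, and assumes the sinogram columns correspond to the projection angles.
PYTHON
# keep only the first third of the projection angles
n_limited = sino.shape[1] // 3
theta_full = np.linspace(0.0, 180.0, sino.shape[1], endpoint=False)
reconstruction_limited = iradon(sino[:, :n_limited], theta=theta_full[:n_limited], filter_name='ramp')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 4), sharex=True, sharey=True)
ax1.set_title("Full-angle FBP")
ax1.imshow(reconstruction_fbp, cmap=plt.cm.Greys_r)
ax2.set_title("Limited-angle FBP")
ax2.imshow(reconstruction_limited, cmap=plt.cm.Greys_r)
plt.show()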
Ultrasounds
Ultrasound machines can produce multiple complex types of images. Ultrasound uses high-frequency sound waves, sent and captured by a piezoelectric probe (also known as a transducer), to form images.
Just as different tissues attenuate radiation differently, different tissues attenuate the sound waves differently. To be more precise different tissues reflect and absorb the waves differently and this can help us create images after some processing of the signal.
These images can be captured in rapid succession over time, so they can be saved as cine-files inside DICOMs. On the other hand, the sonographer can choose to record only a single ‘frame’, in which case a 2D-array will ultimately be saved.
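If we load such a file with pydicom (a sketch with a hypothetical path), a cine recording shows up as a multi-frame pixel array, while a single captured frame is just a 2D array:
PYTHON
import pydicom
ds = pydicom.dcmread("data/medical/ultrasound_cine.dcm")  # hypothetical cine loop
print(ds.get("NumberOfFrames", 1))  # greater than 1 for a cine loop
frames = ds.pixel_array             # shape: (frames, rows, cols) or (frames, rows, cols, channels)
print(frames.shape)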
Typically, sonographers produce a lot of B-mode images, but B-mode is far from the only type of ultrasound. M-mode, or motion mode, like the cine-files in B-mode, can also capture motion, but puts it into a single 2D-array of one line of the image over time. In the compound image below you can see a B-mode 2D-image and an M-mode made along the line marked in it.
What Are Some Disadvantages to Ultrasounds in Terms of Computational Analysis?
Ultrasounds images are operator-dependent, often with embedded patient data, and the settings and patients’ positions can vary widely.
Challenge: How to Reduce These Problems?
How can we optimize research involving ultrasounds in terms of the challenges above?
Possible solutions include:
- Reviewing metadata on existing images so it can be matched by machine type
- Training technicians to use standardized settings only
- Creating a machine set only on one power setting
- Image harmonization/normalization algorithms
Magnetic Resonance Imaging
MRIs are images made by utilizing some fairly complicated physics in terms of what we can do to protons (abundant in human tissue) with magnets and radiofrequency waves, and how we capture their signal. Different ordering and timing of radiofrequency pulses and different magnetic gradients give us different MRI sequences. The actual signal of an anatomical MRI typically needs to be processed via Fourier transforms and some other computational work before it is recognizable as anatomy. The raw data is referred to as k-space data, and it can be kept in vendor-specific formats or in open common formats, e.g., ISMRMRD (International Society for Magnetic Resonance in Medicine Raw Data). In practice, we rarely use the k-space data (unless perhaps we are medical physicists) for research on medical pathology. Nonetheless, researchers working on new MRI sequences will be very interested in such data, and typically in getting the fastest transformations of it possible. There are many ways the raw data could be transformed or used to produce an MRI. While an inverse Fourier transform is typical, a Hartley transform could be used, and some scientists even use deep learning based methods. Let’s look at k-space with a viridis color map:
Sourced from the NYU fastMRI dataset (Knoll et al., Radiol Artif Intell. 2020;2(1):e190007, doi:10.1148/ryai.2020190007, https://pubs.rsna.org/doi/10.1148/ryai.2020190007, and the arXiv paper https://arxiv.org/abs/1811.08839).
Let’s do an example of a k-space transform
PYTHON
slice_kspace = np.load('data/medical/slice_kspace.npy')
# show shape
print(slice_kspace.shape)
# show type
print(type(slice_kspace))
# print type of an example pixel
print(type(slice_kspace[3,3]))
OUTPUT
(640, 368)
<class 'numpy.ndarray'>
<class 'numpy.complex64'>
Note that we have an array containing numbers with an imaginary component, therefore the type is complex. We can extract and graph the real and imaginary parts, and also graph a transformation:
PYTHON
real_from_slice_kspace = slice_kspace.real
imaginary_from_slice_kspace = slice_kspace.imag
# make an inverse fourier
def inverse_fft2_shift(kspace):
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace, axes=(-2,-1)), norm='ortho'), axes=(-2,-1))
# graph both
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(9, 5), sharex=True, sharey=True)
ax1.set_title("K-space real part")
ax1.imshow(real_from_slice_kspace, cmap=plt.cm.Greys_r,)
ax2.set_title("K-space imaginary part")
ax2.imshow(imaginary_from_slice_kspace, cmap=plt.cm.Greys_r,)
ax3.set_title("Transformed K-space")
ax3.imshow(np.abs(inverse_fft2_shift(slice_kspace)), cmap=plt.cm.Greys_r)
Hopefully you can see that our k-space, like our sinogram, is not very human-readable, while the transformed image is recognizable as a knee. A transformed image, one a radiologist will be able to read, is often what we are given. This final product we are used to looking at is such a post-processed 3D-array wrapped inside a DICOM file. We can transform the image, and parts of the metadata, to a variety of file types commonly used in research. These file types will be covered in more detail later in the course.
CT versus MRI
After processing, both CT and standard MRI will give a 3D image. However, it is important to understand that there are differences. A standard MRI sequence will give better differentiation between various soft tissues, whereas a CT will provide better images of bones. CTs can be acquired more quickly and cheaply, but have the hidden cost of radiation.
Challenge: CT, MRIs and Artifacts?
You are researching tumors in adults. You team up with the best radiologist you can find and ask for imaging. She hands you DICOM files of some patient MRIs and CTs, and states “These are exactly the images I use. I have checked that they are all without artifacts” You have the images straight from the radiologist, could there potentially be any artifacts?
Strictly speaking, yes, they could have artifacts. Both CT and MRI are reconstructed from original data, and the reconstruction can introduce artifacts. The radiologist thinks of artifacts as things like motion blurring when the patient moves, or wrap-around in MRIs when the field of view was set too small. These are obvious to the human eye. However, technically every reconstruction algorithm can potentially introduce tiny artifacts not visible to the human eye.
Other Image Types
Nuclear medicine scans and pathology images are also broadly available inside hospitals.
Nuclear medicine images, e.g., PET and SPECT, can be 2D or 3D images based on the signal of a radiotracer given to the patient. The radiotracer gives off gamma rays, which are then used as a signal. This type of image can be extremely useful in processes like looking for metastases. While 3D images registered with a CT or MRI give anatomic precision, in some cases a 2D image answers basic questions. In the image below a 2D bone scan shows metastases from prostate cancer.
sourced from RadsWiki, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons
Pathology is currently undergoing a revolution of digitalization, and a typical file format has not emerged yet. Pathology images may be DICOM, but could also be stored as specific kinds of TIFF files or other file types. Pathology is an entire medical discipline in which various types of images are used, both 2D and 3D, and often multi-channel, i.e., in color. However, one classic type of pathology image is that of a stained tissue slide seen through a microscope. With various stains we can see what is going on in a tissue on a cellular level. In the image below you can see macrophages that have come to surround actinomyces in someone’s lung.
Sourced from Yale Rosen (USA), Actinomycosis, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=31127755
Beyond the more common types of imaging, researchers are actively looking into new forms of imaging. Some add new information to old modalities, like contrast-enhanced ultrasounds. Other new forms of imaging are novel in terms of the signal, such as terahertz imaging, which uses a previously ‘unused’ part of the electromagnetic radiation spectrum. As you might guess, the more novel the imaging, usually the less consolidation there is around file types and how they are organized. It is useful to remember that all these file types, whether for established image types or novel ones, are sorts of ‘containers’ for the ‘payload’ of the actual images, which are the arrays. Often we simply need to know how to get the payload array out of its container and/or where to find certain metadata.
There is less standardization around the file formats of certain types of imaging.
For example, while typical radiological images have settled into how they are recorded in DICOM, more novel sequences, such as arterial spin-labeled ones, do not have a standardized way of being recorded in it. Some newer forms of imaging, such as electrical impedance tomography, use entirely different kinds of technologies and signals than anything already existing. When it comes to truly novel imaging types there can be no standardization at all.
Standard Image types
| Type | Signal | Standard file | Image file | Image array |
|---|---|---|---|---|
| X-ray | ionizing radiation | DICOM | JPEG 2000 | 2D arrays |
| Standard CT | ionizing radiation | DICOM | raw or compressed voxel data | 3D arrays |
| Ultrasound (B-mode) | high frequency sound | DICOM | CINE | 2D arrays or 4D tensors |
| MRI (spin or gradient echo) | patient’s molecules | DICOM | raw or compressed voxel data | 3D arrays |
| Digital pathology slides | light through stained tissue | no consensus | often converted to TIFF | multichannel 2D or 3D arrays |
The above table is a drastic simplification as there are always cases where people use novel files or novel designs. There are also many newer technologies like 4D CT. Nonetheless it is useful to know some of the typical data structures you will usually find.
Key Points
- Each imaging modality provides distinct sets of information
- In computational imaging, images are essentially arrays, although embedded in additional data structures
- Many images we may get e.g. MRIs and CTs have already been processed with some algorithms to make them human readable
- Research should be thoughtfully designed, taking into account the constraints and capabilities inherent in human capacities
- We can expect the emergence of additional imaging modalities in the future
Content from Working with MRI
Last updated on 2024-11-19
Overview
Questions
- What kinds of MRI are there?
- How are MRI data represented digitally?
- How should I organize and structure files for neuroimaging MRI data?
Objectives
- Show common kinds of MRI imaging used in research
- Show the most common file formats for MRI
- Introduce MRI coordinate systems
- Load an MRI scan into Python and explain how the data is stored
- View and manipulate image data
- Explain what BIDS is
- Explain advantages of working with Nifti and BIDS
- Show a method to convert from DICOM to BIDS/NIfTI
Introduction
This lesson was heavily based on existing lessons from Carpentries Incubators; namely:
- Introduction to Working with MRI Data in Python
- Introduction to dMRI
- Functional Neuroimaging Analysis in Python
Until the use of overlapping materials has been clarified with all authors of these repositories, this lesson is temporarily offline.
Content from Registration and Segmentation with SITK
Last updated on 2024-10-08
Overview
Questions
- What are SITK Images?
- How can registration be implemented in SITK?
- How can I segment an image in SITK?
Objectives
- Explain how to perform basic operations on SITK Images
- Explain when registration can be needed and how to register images with SITK
- Become familiar with basic segmentation algorithms available in SITK
Introduction
In the realm of medical imaging analysis, registration and segmentation play crucial roles. Registration aligns images, enabling the fusion of data or the tracking of changes over time. Segmentation identifies and labels objects of interest within images. Automation is essential due to high clinical demand, driving the development of robust algorithms for both processes. Moreover, many advanced segmentation techniques utilize image registration, whether implicitly or explicitly.
SimpleITK (SITK) is a simplified programming interface to the algorithms and data structures of the Insight Toolkit (ITK) for segmentation, registration and advanced image analysis, available in many programming languages (e.g., C++, Python, R, Java, C#, Lua, Ruby).
SITK is part of the Insight Software Consortium, a non-profit educational consortium dedicated to promoting and maintaining open-source, freely available software for medical image analysis. Its copyright is held by NumFOCUS, and the software is distributed under the Apache License 2.0.
In this episode, we use a hands-on approach utilizing Python to show how to use SITK for performing registration and segmentation tasks in medical imaging use cases.
Fundamental Concepts
In this section, we’ll cover some fundamental image processing operations using SITK, such as reading and writing images, accessing pixel values, and resampling images.
Images
The fundamental tenet of an image in ITK, and consequently in SITK, is that an image is defined by a set of points on a grid occupying a physical region in space. This significantly differs from many other image analysis libraries that treat an image as an array, which has two implications: (1) pixel/voxel spacing is assumed to be isotropic and (2) there is no notion of an image’s location in physical space.
SITK images are multi-dimensional (the default configuration includes images from two dimensional up to five dimensional) and can be a scalar, labelmap (scalar with run length encoding), complex value or have an arbitrary number of scalar channels (also known as a vector image). The region in physical space which an image occupies is defined by the image’s:
- Origin (vector like type) - location in the world coordinate system of the voxel with all zero indexes.
- Spacing (vector like type) - distance between pixels along each of the dimensions.
- Size (vector like type) - number of pixels in each dimension.
- Direction cosine matrix (vector like type representing matrix in row major order) - direction of each of the axes corresponding to the matrix columns.
The following figure illustrates these concepts.
In SITK, when we construct an image we specify its dimensionality, size and pixel type, all other components are set to reasonable default values:
- Origin - all zeros.
- Spacing - all ones.
- Direction - identity.
- Intensities in all channels - all zero.
The tenet that images occupy a spatial location in the physical world has to do with the original application domain of ITK and SITK, medical imaging. In that domain images represent anatomical structures with metric sizes and spatial locations. Additionally, the spacing between voxels is often non-isotropic (most commonly the spacing along the inferior-superior/foot-to-head direction is larger). Viewers that treat images as an array will display a distorted image as shown below:
As an image is also defined by its spatial location, two images with the same pixel data and spacing may not be considered equivalent. Think of two CT scans of the same patient acquired at different sites. The following figure illustrates the notion of spatial location in the physical world, the two images are considered different even though the intensity values and pixel spacing are the same.
As SITK images occupy a physical region in space, the quantities defining this region have metric units (cm, mm, etc.). In general SITK assumes units are in millimeters (for historical reasons, due to the DICOM standard). In practice SITK is not aware of the specific units associated with each image; it just assumes that they are consistent. Thus, it is up to you, the developer, to ensure that all of the images you read and create use the same units.
A SITK image can have an arbitrary number of channels with the content of the channels being a scalar or complex value. This is determined when an image is created.
In the medical domain, many image types have a single scalar channel (e.g. CT, US). Another common image type is a three channel image where each channel has scalar values in [0,255], often people refer to such an image as an RGB image. This terminology implies that the three channels should be interpreted using the RGB color space. In some cases you can have the same image type, but the channel values represent another color space, such as HSV (it decouples the color and intensity information and is a bit more invariant to illumination changes). SITK has no concept of color space, thus in both cases it will simply view a pixel value as a 3-tuple.
Let’s read an example of a human brain MRI (a T1-weighted scan), and let’s explore it with SITK.
PYTHON
%matplotlib inline
import matplotlib.pyplot as plt
import SimpleITK as sitk
img_volume = sitk.ReadImage("data/sitk/A1_grayT1.nrrd")
print(type(img_volume))
print(img_volume.GetOrigin())
print(img_volume.GetSpacing())
print(img_volume.GetDirection())
OUTPUT
<class 'SimpleITK.SimpleITK.Image'>
(-77.625, -107.625, 119.625)
(0.75, 0.75, 0.75)
(0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0)
The function sitk.ReadImage loads the file as a SimpleITK.SimpleITK.Image instance, and then we can access useful methods to get an idea of what the image(s) contained in the file look like. The image’s dimensions have explicit size accessors:
PYTHON
print(img_volume.GetSize())
print(img_volume.GetWidth())
print(img_volume.GetHeight())
print(img_volume.GetDepth())
print(img_volume.GetNumberOfComponentsPerPixel())
print(img_volume.GetPixelIDTypeAsString())
OUTPUT
(288, 320, 208)
288
320
208
1
32-bit float
Just inspecting these accessors, we deduce that the file contains a volume made of 208 images, each made of 288x320 pixels and one channel only (grayscale).
SITK Conventions
- Image access is in x,y,z order, image.GetPixel(x,y,z) or image[x,y,z], with zero-based indexing.
- If the output of an ITK filter has a non-zero starting index, then the index will be set to 0, and the origin adjusted accordingly.
Displaying Images
While SITK does not do visualization, it does contain a built-in Show method. This function writes the image out to disk and then launches a program for visualization. By default it is configured to use ImageJ, because it readily supports all the image types which SITK has and loads them very quickly.
SITK provides two options for invoking an external viewer: a procedural interface (the Show function) or an object-oriented one. For more details about them, please refer to this notebook from the official documentation.
In this episode we will convert SITK images to numpy arrays, and we will plot them as such.
Images as Arrays
We have two options for converting from SITK to a numpy array:
- GetArrayFromImage(): returns a copy of the image data. You can then freely modify the data as it has no effect on the original SITK image.
- GetArrayViewFromImage(): returns a view on the image data which is useful for display in a memory-efficient manner. You cannot modify the data, and the view will be invalid if the original SITK image is deleted.
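A tiny sketch (using a blank image created on the fly) makes the copy-versus-view distinction concrete:
PYTHON
tiny = sitk.Image(4, 4, sitk.sitkUInt8)
arr_copy = sitk.GetArrayFromImage(tiny)
arr_copy[0, 0] = 42                      # modifies only the copy
print(tiny.GetPixel(0, 0))               # still 0, the SITK image is untouched
arr_view = sitk.GetArrayViewFromImage(tiny)
print(arr_view.flags.writeable)          # False: the view is read-only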
The order of index and dimensions needs careful attention during conversion. ITK’s Image class has a GetPixel method which takes an ITK Index object as an argument, ordered as (x,y,z). This is the convention that SITK’s Image class uses for the GetPixel method and slicing operator as well. In numpy, an array is indexed in the opposite order (z,y,x). Also note that access to channels is different. In SITK you do not access the channel directly; rather, the pixel value representing all channels for the specific pixel is returned and you then access the channel for that pixel. In the numpy array you access the channel directly. Let’s see this in an example:
PYTHON
import numpy as np
multi_channel_3Dimage = sitk.Image([2, 4, 8], sitk.sitkVectorFloat32, 5)
x = multi_channel_3Dimage.GetWidth() - 1
y = multi_channel_3Dimage.GetHeight() - 1
z = multi_channel_3Dimage.GetDepth() - 1
multi_channel_3Dimage[x, y, z] = np.random.random(
    multi_channel_3Dimage.GetNumberOfComponentsPerPixel()
)
nda = sitk.GetArrayFromImage(multi_channel_3Dimage)
print("Image size: " + str(multi_channel_3Dimage.GetSize()))
print("Numpy array size: " + str(nda.shape))
# Notice the index order and channel access are different:
print("First channel value in image: " + str(multi_channel_3Dimage[x, y, z][0]))
print("First channel value in numpy array: " + str(nda[z, y, x, 0]))
OUTPUT
Image size: (2, 4, 8)
Numpy array size: (8, 4, 2, 5)
First channel value in image: 0.5384010076522827
First channel value in numpy array: 0.538401
Going back to the loaded file, let’s plot the array version of the volume slice from the middle of the stack, along the z axis, using different color maps:
PYTHON
npa = sitk.GetArrayViewFromImage(img_volume)
# Display the image slice from the middle of the stack, z axis
z = int(img_volume.GetDepth() / 2)
npa_zslice = sitk.GetArrayViewFromImage(img_volume)[z, :, :]
# Three plots displaying the same data, how do we deal with the high dynamic range?
fig = plt.figure(figsize=(10, 3))
fig.add_subplot(1, 3, 1)
plt.imshow(npa_zslice)
plt.title("default colormap", fontsize=10)
plt.axis("off")
fig.add_subplot(1, 3, 2)
plt.imshow(npa_zslice, cmap=plt.cm.Greys_r)
plt.title("grey colormap", fontsize=10)
plt.axis("off")
fig.add_subplot(1, 3, 3)
plt.title(
"grey colormap,\n scaling based on volumetric min and max values", fontsize=10
)
plt.imshow(npa_zslice, cmap=plt.cm.Greys_r, vmin=npa.min(), vmax=npa.max())
plt.axis("off")
We can also do the reverse, i.e., convert a numpy array to a SITK Image:
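For instance, using sitk.GetImageFromArray (a minimal sketch reusing the npa_zslice array from the plotting code above):
PYTHON
img_from_array = sitk.GetImageFromArray(npa_zslice)
print(type(img_from_array))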
OUTPUT
<class 'SimpleITK.SimpleITK.Image'>
We can also plot multiple slices at the same time, for better inspecting the volume:
PYTHON
img_xslices = [img_volume[x,:,:] for x in range(50, 200, 30)]
img_yslices = [img_volume[:,y,:] for y in range(50, 200, 30)]
img_zslices = [img_volume[:,:,z] for z in range(50, 200, 30)]
tile_x = sitk.Tile(img_xslices, [1,0])
tile_y = sitk.Tile(img_yslices, [1,0])
tile_z = sitk.Tile(img_zslices, [1,0])
nda_xslices = sitk.GetArrayViewFromImage(tile_x)
nda_yslices = sitk.GetArrayViewFromImage(tile_y)
nda_zslices = sitk.GetArrayViewFromImage(tile_z)
fig, (ax1, ax2, ax3) = plt.subplots(1,3)
ax1.imshow(nda_xslices, cmap=plt.cm.Greys_r)
ax2.imshow(nda_yslices, cmap=plt.cm.Greys_r)
ax3.imshow(nda_zslices, cmap=plt.cm.Greys_r)
ax1.set_title('X slices')
ax2.set_title('Y slices')
ax3.set_title('Z slices')
Operations like slice indexing, cropping, and flipping can be performed on SITK images much as they are usually done in numpy. Note that slicing of SITK images returns a copy of the image data, similarly to slicing Python lists, and differently from the “view” returned by slicing numpy arrays.
PYTHON
n_slice = 150
# Original slice
plt.imshow(sitk.GetArrayViewFromImage(img_volume[:, :, n_slice]), cmap="gray")
PYTHON
# Subsampling
plt.imshow(sitk.GetArrayViewFromImage(img_volume[:, ::3, n_slice]), cmap="gray")
PYTHON
# Comparative operators
plt.imshow(sitk.GetArrayViewFromImage(img_volume[:, :, n_slice] > 90), cmap="gray")
PYTHON
# Draw a square
draw_square = sitk.GetArrayFromImage(img_volume[:, :, n_slice])
draw_square[0:100,0:100] = draw_square.max()
plt.imshow(draw_square, cmap="gray")
Another operation we can perform is creating a grid mask and applying it to an image:
PYTHON
img_zslice = img_volume[:, :, n_slice]
# Create a grid and use as mask
grid_image = sitk.GridSource(
outputPixelType=sitk.sitkUInt16,
size=img_zslice.GetSize(),
sigma=(0.1, 0.1),
gridSpacing=(20.0, 20.0),
)
# zero out the values in the original image that correspond to
# the grid lines in the grid_image
img_zslice[grid_image == 0] = 0
nda = sitk.GetArrayViewFromImage(img_zslice)
plt.imshow(nda, cmap="gray")
Challenge: Fix the Error (Optional)
By running the lines of code above for masking the slice with the grid, you will get an error. Can you guess what it is about?
By default, SITK creates images with the origin at zero and unit spacing, so the grid and the slice do not occupy the same physical region. The grid’s origin and spacing need to match those of the image in order to superimpose the former on the latter.
Add two lines to the code above, right after the creation of the grid, copying the slice’s spatial information onto it. Everything should work fine then. Remember that making such changes to an image already containing data should be done cautiously.
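In a sketch using the variable names from the code above, the two lines look like this:
PYTHON
grid_image.SetOrigin(img_zslice.GetOrigin())
grid_image.SetSpacing(img_zslice.GetSpacing())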
Meta-dictionaries
SITK can read (and write) images stored in a single file, or a set of files (e.g. DICOM series).
Images stored in the DICOM format have a meta-data dictionary associated with them, which is populated with the DICOM tags. When a DICOM series is read as a single image, the meta-data information is not available since DICOM tags are specific to each file. If you need the meta-data, you have three options:
1. Using the object-oriented interface’s ImageSeriesReader class, configure it to load the tags using the MetaDataDictionaryArrayUpdateOn method, and possibly the LoadPrivateTagsOn method if you need the private tags. Once the series is read you can access the meta-data from the series reader using GetMetaDataKeys, HasMetaDataKey, and GetMetaData.
2. Using the object-oriented interface’s ImageFileReader, set a specific slice’s file name and only read its meta-data using the ReadImageInformation method, which reads the meta-data but not the bulk pixel information. Once the meta-data is read you can access it from the file reader using GetMetaDataKeys, HasMetaDataKey, and GetMetaData.
3. Using the object-oriented interface’s ImageFileReader, set a specific slice’s file name and read it; or, using the procedural interface’s ReadImage function, read a specific file. You can then access the meta-data directly from the Image using GetMetaDataKeys, HasMetaDataKey, and GetMetaData.
Note: When reading an image series, via the ImageSeriesReader or via the procedural ReadImage interface, the images in the list are assumed to be ordered correctly (GetGDCMSeriesFileNames ensures this for DICOM). If the order is incorrect, the image will be read, but its spacing and possibly the direction cosine matrix will be incorrect.
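A sketch of the first option, reading a whole DICOM series while keeping the per-slice tags (the directory path is hypothetical):
PYTHON
series_reader = sitk.ImageSeriesReader()
file_names = series_reader.GetGDCMSeriesFileNames("data/sitk/ct_series/")  # hypothetical folder
series_reader.SetFileNames(file_names)
series_reader.MetaDataDictionaryArrayUpdateOn()  # keep per-slice meta-data
series_reader.LoadPrivateTagsOn()                # also load private tags
img_series = series_reader.Execute()
# meta-data is accessed per slice, here for slice 0
for key in series_reader.GetMetaDataKeys(0):
    print(f'"{key}":"{series_reader.GetMetaData(0, key)}"')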
Let’s read in a digital x-ray image saved in a DICOM file format, and let’s print the metadata’s keys:
PYTHON
img_xray = sitk.ReadImage('data/sitk/digital_xray.dcm')
for key in img_xray.GetMetaDataKeys():
    print(f'"{key}":"{img_xray.GetMetaData(key)}"')
OUTPUT
"0008|0005":"ISO_IR 100"
"0008|0012":"20180713"
"0008|0013":"233245.431"
"0008|0016":"1.2.840.10008.5.1.4.1.1.7"
"0008|0018":"2.25.225981116244996633889747723103230447077"
"0008|0020":"20160208"
"0008|0060":"XC"
"0020|000d":"2.25.59412532821774470466514450673479191872"
"0020|000e":"2.25.149748537243964334146339164752570719260"
"0028|0002":"3"
"0028|0004":"YBR_FULL_422"
"0028|0006":"0"
"0028|0010":"357"
"0028|0011":"371"
"0028|0100":"8"
"0028|0101":"8"
"0028|0102":"7"
"0028|0103":"0"
"ITK_original_direction":"[UNKNOWN_PRINT_CHARACTERISTICS]
"
"ITK_original_spacing":"[UNKNOWN_PRINT_CHARACTERISTICS]
"
Reading a DICOM:
- Many libraries allow you to read DICOM metadata.
- PyDICOM will be explored for this task later.
- If you are already using SITK, you will usually not need an extra library to get DICOM metadata.
- Many libraries have some basic DICOM functionality, check the documentation before adding extra dependencies.
Generally speaking, SITK represents color images as multi-channel images independent of a color space. It is up to you to interpret the channels correctly based on additional color space knowledge prior to using them for display or any other purpose.
Things to note:
1. When using SITK to read a color DICOM image, the channel values will be transformed to the RGB color space.
2. When using SITK to read a scalar image, it is assumed that the lowest intensity value is black and the highest white. If the photometric interpretation tag is MONOCHROME2 (lowest value displayed as black) nothing is done. If it is MONOCHROME1 (lowest value displayed as white), the pixel values are inverted.
PYTHON
print(f'Image Modality: {img_xray.GetMetaData("0008|0060")}')
print(img_xray.GetNumberOfComponentsPerPixel())
OUTPUT
Image Modality: XC
3
“0008|0060” is the tag indicating the modality, and “XC” stands for “External-camera Photography”.
Grayscale Images Stored as sRGB
The “digital_xray.dcm” image is sRGB, even though an x-ray should be a single-channel grayscale image. In some cases looks may be deceiving. Grayscale images are not always stored as a single-channel image. In some cases an image that looks like a grayscale image is actually a three-channel image with the intensity values repeated in each of the channels. Even worse, some grayscale images are four-channel images, with the channels representing RGBA and the alpha channel set to all 255. This can result in a significant waste of memory and computation time. Always become familiar with your data.
We can convert sRGB to gray scale:
PYTHON
def srgb2gray(image):
    # Convert sRGB image to gray scale and rescale results to [0,255]
    channels = [
        sitk.VectorIndexSelectionCast(image, i, sitk.sitkFloat32)
        for i in range(image.GetNumberOfComponentsPerPixel())
    ]
    # linear mapping
    I = 1 / 255.0 * (0.2126 * channels[0] + 0.7152 * channels[1] + 0.0722 * channels[2])
    # nonlinear gamma correction
    I = (
        I * sitk.Cast(I <= 0.0031308, sitk.sitkFloat32) * 12.92
        + I ** (1 / 2.4) * sitk.Cast(I > 0.0031308, sitk.sitkFloat32) * 1.055
        - 0.055
    )
    return sitk.Cast(sitk.RescaleIntensity(I), sitk.sitkUInt8)
img_xray_gray = srgb2gray(img_xray)
nda = sitk.GetArrayViewFromImage(img_xray_gray)
plt.imshow(np.squeeze(nda), cmap="gray")
Transforms
SITK supports two types of spatial transforms: ones with a global (unbounded) spatial domain and ones with a bounded spatial domain. Points in SITK are mapped by the transform using the TransformPoint method.
All global domain transforms are of the form:
\(T(x) = A(x - c) + t + c\)
The nomenclature used in the documentation refers to the components of the transformations as follows:
- Matrix - the matrix \(A\).
- Center - the point \(c\).
- Translation - the vector \(t\).
- Offset - the expression \(t + c - Ac\).
A variety of global 2D and 3D transformations are available (translation, rotation, rigid, similarity, affine…). Some of these transformations are available with various parameterizations which are useful for registration purposes.
The second type of spatial transformation, bounded domain transformations, are defined to be identity outside their domain. These include the B-spline deformable transformation, often referred to as Free-Form Deformation, and the displacement field transformation.
The B-spline transform uses a grid of control points to represent a spline based transformation. To specify the transformation the user defines the number of control points and the spatial region which they overlap. The spline order can also be set, though the default of cubic is appropriate in most cases. The displacement field transformation uses a dense set of vectors representing displacement in a bounded spatial domain. It has no implicit constraints on transformation continuity or smoothness.
Finally, SITK supports a composite transformation with either a bounded or global domain. This transformation represents multiple transformations applied one after the other \(T_0(T_1(T_2(...T_n(p))))\). The semantics are stack based, that is, first in last applied:
For more details about SITK transformation types and examples, see this tutorial notebook.
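A small sketch of these ideas, using a 2D rigid (Euler) transform and a composite transform; the numbers are arbitrary and chosen only to make the mapping easy to follow:
PYTHON
rigid = sitk.Euler2DTransform()
rigid.SetCenter((0.0, 0.0))        # the point c
rigid.SetAngle(np.pi / 2.0)        # 90 degree rotation, i.e. the matrix A
rigid.SetTranslation((1.0, 0.0))   # the vector t
# T(x) = A(x - c) + t + c, so (1, 0) maps to approximately (1, 1)
print(rigid.TransformPoint((1.0, 0.0)))
# composite transform: stack based, the first transform added is the last one applied
composite = sitk.CompositeTransform(2)
composite.AddTransform(rigid)
composite.AddTransform(sitk.TranslationTransform(2, (0.0, 2.0)))
print(composite.TransformPoint((1.0, 0.0)))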
Resampling
Resampling, as the verb implies, is the action of sampling an image, which itself is a sampling of an original continuous signal.
Generally speaking, resampling in SITK involves four components:
- Image - the image we resample, given in coordinate system \(m\).
- Resampling grid - a regular grid of points given in coordinate system \(f\) which will be mapped to coordinate system \(m\).
- Transformation \(T_f^m\) - maps points from coordinate system \(f\) to coordinate system \(m\), \(^mp=T_f^m(^fp)\).
- Interpolator - method for obtaining the intensity values at arbitrary points in coordinate system \(m\) from the values of the points defined by the Image.
While SITK provides a large number of interpolation methods, the two most commonly used are sitkLinear and sitkNearestNeighbor. The former is used for most interpolation tasks and is a compromise between accuracy and computational efficiency. The latter is used to interpolate labeled images representing a segmentation; it is the only interpolation approach which will not introduce new labels into the result.
The SITK interface includes three variants for specifying the resampling grid:
- Use the same grid as defined by the resampled image.
- Provide a second, reference, image which defines the grid.
- Specify the grid using: size, origin, spacing, and direction cosine matrix.
Points that are mapped outside of the resampled image’s spatial extent in physical space are set to a constant pixel value which you provide (default is zero).
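As a sketch, the three variants look roughly like this when applied to the volume loaded earlier; tx is an arbitrary translation used only so there is something to resample with:
PYTHON
tx = sitk.TranslationTransform(3, (10.0, 0.0, 0.0))
# 1. use the resampled image's own grid
res1 = sitk.Resample(img_volume, tx, sitk.sitkLinear, 0.0, img_volume.GetPixelID())
# 2. use the grid of a reference image (here simply the image itself)
res2 = sitk.Resample(img_volume, img_volume, tx, sitk.sitkLinear, 0.0, img_volume.GetPixelID())
# 3. specify size, origin, spacing and direction cosine matrix explicitly
res3 = sitk.Resample(img_volume, img_volume.GetSize(), tx, sitk.sitkLinear,
                     img_volume.GetOrigin(), img_volume.GetSpacing(),
                     img_volume.GetDirection(), 0.0, img_volume.GetPixelID())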
It is not uncommon to end up with an empty (all black) image after resampling. This is due to:
- Using wrong settings for the resampling grid (not too common, but it does happen).
- Using the inverse of the transformation \(T_f^m\). This is a relatively common error, which is readily addressed by invoking the transformation’s GetInverse method.
Let’s try to plot multiple slices across different axis for the image “training_001_mr_T1.mha”.
PYTHON
img_volume = sitk.ReadImage("data/sitk/training_001_mr_T1.mha")
print(img_volume.GetSize())
print(img_volume.GetSpacing())
OUTPUT
(256, 256, 26)
(1.25, 1.25, 4.0)
We can plot the slices as we did before:
PYTHON
img_xslices = [img_volume[x,:,:] for x in range(50, 200, 30)]
img_yslices = [img_volume[:,y,:] for y in range(50, 200, 30)]
img_zslices = [img_volume[:,:,z] for z in range(1, 25, 3)]
tile_x = sitk.Tile(img_xslices, [1,0])
tile_y = sitk.Tile(img_yslices, [1,0])
tile_z = sitk.Tile(img_zslices, [1,0])
nda_xslices = sitk.GetArrayViewFromImage(tile_x)
nda_yslices = sitk.GetArrayViewFromImage(tile_y)
nda_zslices = sitk.GetArrayViewFromImage(tile_z)
fig, (ax1, ax2, ax3) = plt.subplots(1,3)
ax1.imshow(nda_xslices, cmap=plt.cm.Greys_r)
ax2.imshow(nda_yslices, cmap=plt.cm.Greys_r)
ax3.imshow(nda_zslices, cmap=plt.cm.Greys_r)
ax1.set_title('X slices')
ax2.set_title('Y slices')
ax3.set_title('Z slices')
Challenge: Distorted Images
What is the main difference with the first image we plotted (“A1_grayT1.nrrd”)?
In this case, there are only 26 images in the volume and the spacing between voxels is non-isotropic, and in particular it is the same across x- and y-axis, but it differs across the z-axis.
We can fix the distortion by resampling the volume along the z-axis, which has a different spacing (i.e., 4mm), and make it match with the other two spacing measures (i.e., 1.25mm):
PYTHON
def resample_img(image, out_spacing=[1.25, 1.25, 1.25]):
    # Resample images to 1.25mm spacing
    original_spacing = image.GetSpacing()
    original_size = image.GetSize()
    out_size = [
        int(np.round(original_size[0] * (original_spacing[0] / out_spacing[0]))),
        int(np.round(original_size[1] * (original_spacing[1] / out_spacing[1]))),
        int(np.round(original_size[2] * (original_spacing[2] / out_spacing[2])))]
    resample = sitk.ResampleImageFilter()
    resample.SetOutputSpacing(out_spacing)
    resample.SetSize(out_size)
    resample.SetOutputDirection(image.GetDirection())
    resample.SetOutputOrigin(image.GetOrigin())
    resample.SetTransform(sitk.Transform())
    resample.SetDefaultPixelValue(0)  # value assigned to voxels mapped outside the input image
    resample.SetInterpolator(sitk.sitkBSpline)
    return resample.Execute(image)
resampled_sitk_img = resample_img(img_volume)
print(resampled_sitk_img.GetSize())
print(resampled_sitk_img.GetSpacing())
OUTPUT
(256, 256, 83)
(1.25, 1.25, 1.25)
Registration
Image registration involves spatially transforming the source/moving image(s) to align with the target image. More specifically, the goal of registration is to estimate the transformation which maps points from one image to the corresponding points in another image. The transformation estimated via registration is said to map points from the fixed image (target image) coordinate system to the moving image (source image) coordinate system.
Many Ways to Do Registration:
- Several libraries offer built-in registration functionalities.
- Registration can be performed with DIPY (mentioned in the MRI
episode), specifically for diffusion imaging using the
SymmetricDiffeomorphicRegistration
functionality. - NiBabel includes a function in its processing module that allows resampling or conforming one NIfTI image into the space of another.
- SITK provides robust registration capabilities and allows you to code your own registration algorithms.
- To maintain cleaner and more efficient code, it’s advisable to use as few libraries as possible and avoid those that may create conflicts.
SITK provides a configurable multi-resolution registration framework, implemented in the ImageRegistrationMethod class. In addition, a number of variations of the Demons registration algorithm are implemented independently from this class as they do not fit into the framework.
The task of registration is formulated using non-linear optimization which requires an initial estimate. The two most common initialization approaches are (1) Use the identity transform (a.k.a. forgot to initialize). (2) Align the physical centers of the two images (see CenteredTransformInitializerFilter). If after initialization there is no overlap between the images, registration will fail. The closer the initialization transformation is to the actual transformation, the higher the probability of convergence to the correct solution.
If your registration involves the use of a global domain transform (described here), you should also set an appropriate center of rotation. In many cases you want the center of rotation to be the physical center of the fixed image (the CenteredTransformInitializerFilter ensures this). This is of significant importance for registration convergence due to the non-linear nature of rotation. When the center of rotation is far from our physical region of interest (ROI), a small rotational angle results in a large displacement. Think of moving the pivot/fulcrum point of a lever. For the same rotation angle, the farther you are from the fulcrum the larger the displacement. For numerical stability we do not want our computations to be sensitive to very small variations in the rotation angle, thus the ideal center of rotation is the point which minimizes the distance to the farthest point in our ROI:
\(p_{center} = \arg\min_{p_{rotation}} dist(p_{rotation}, \{p_{roi}\})\)
Without additional knowledge we can only assume that the ROI is the whole fixed image. If your ROI is only in a sub region of the image, a more appropriate point would be the center of the oriented bounding box of that ROI.
To create a specific registration instance using the ImageRegistrationMethod you need to select several components which together define the registration instance:
- Transformation - defines the mapping between the two images.
- Similarity metric - reflects the relationship between the intensities of the images (identity, affine, stochastic…).
- Optimizer - when selecting the optimizer you will also need to configure it (e.g., set the number of iterations).
- Interpolator - in most cases linear interpolation, the default setting, is sufficient.
Let’s now see an example where we want to use registration to align two volumes of the same patient, one a CT scan and the other a T1-weighted MRI scan. We first read the images, casting the pixel type to that required for registration (Float32 or Float64), and look at them:
PYTHON
from ipywidgets import interact, fixed
from IPython.display import clear_output
import os
OUTPUT_DIR = "data/sitk/"
fixed_image = sitk.ReadImage(f"{OUTPUT_DIR}training_001_ct.mha", sitk.sitkFloat32)
moving_image = sitk.ReadImage(f"{OUTPUT_DIR}training_001_mr_T1.mha", sitk.sitkFloat32)
print(f"Origin for fixed image: {fixed_image.GetOrigin()}, moving image: {moving_image.GetOrigin()}")
print(f"Spacing for fixed image: {fixed_image.GetSpacing()}, moving image: {moving_image.GetSpacing()}")
print(f"Size for fixed image: {fixed_image.GetSize()}, moving image: {moving_image.GetSize()}")
print(f"Number Of Components Per Pixel for fixed image: {fixed_image.GetNumberOfComponentsPerPixel()}, moving image: {moving_image.GetNumberOfComponentsPerPixel()}")
# Callback invoked by the interact IPython method for scrolling through the image stacks of
# the two images (moving and fixed).
def display_images(fixed_image_z, moving_image_z, fixed_npa, moving_npa):
    # Create a figure with two subplots and the specified size.
    plt.subplots(1,2,figsize=(10,8))
    # Draw the fixed image in the first subplot.
    plt.subplot(1,2,1)
    plt.imshow(fixed_npa[fixed_image_z,:,:],cmap=plt.cm.Greys_r)
    plt.title('fixed/target image')
    plt.axis('off')
    # Draw the moving image in the second subplot.
    plt.subplot(1,2,2)
    plt.imshow(moving_npa[moving_image_z,:,:],cmap=plt.cm.Greys_r)
    plt.title('moving/source image')
    plt.axis('off')
    plt.show()
interact(
display_images,
fixed_image_z = (0,fixed_image.GetSize()[2]-1),
moving_image_z = (0,moving_image.GetSize()[2]-1),
fixed_npa = fixed(sitk.GetArrayViewFromImage(fixed_image)),
moving_npa = fixed(sitk.GetArrayViewFromImage(moving_image)))
OUTPUT
Origin for fixed image: (0.0, 0.0, 0.0), moving image: (0.0, 0.0, 0.0)
Spacing for fixed image: (0.653595, 0.653595, 4.0), moving image: (1.25, 1.25, 4.0)
Size for fixed image: (512, 512, 29), moving image: (256, 256, 26)
Number Of Components Per Pixel for fixed image: 1, moving image: 1
We can use the CenteredTransformInitializer to align the centers of the two volumes and set the center of rotation to the center of the fixed image:
PYTHON
# Callback invoked by the IPython interact method for scrolling and
# modifying the alpha blending of an image stack of two images that
# occupy the same physical space.
def display_images_with_alpha(image_z, alpha, fixed, moving):
    img = (1.0 - alpha)*fixed[:,:,image_z] + alpha*moving[:,:,image_z]
    plt.imshow(sitk.GetArrayViewFromImage(img),cmap=plt.cm.Greys_r)
    plt.axis('off')
    plt.show()
initial_transform = sitk.CenteredTransformInitializer(fixed_image,
moving_image,
sitk.Euler3DTransform(),
sitk.CenteredTransformInitializerFilter.GEOMETRY)
moving_resampled = sitk.Resample(moving_image, fixed_image, initial_transform, sitk.sitkLinear, 0.0, moving_image.GetPixelID())
interact(
display_images_with_alpha,
image_z = (0,fixed_image.GetSize()[2]-1),
alpha = (0.0,1.0,0.05),
fixed = fixed(fixed_image),
moving = fixed(moving_resampled))
The specific registration task at hand estimates a 3D rigid transformation between images of different modalities. There are multiple components from each group (optimizers, similarity metrics, interpolators) that are appropriate for the task. Note that each component selection requires setting some parameter values. We have made the following choices:
- Similarity metric, mutual information (Mattes MI):
  - Number of histogram bins, 50.
  - Sampling strategy, random.
  - Sampling percentage, 1%.
- Interpolator, sitkLinear.
- Optimizer, gradient descent:
  - Learning rate: step size along the traversal direction in parameter space, 1.0.
  - Number of iterations: maximal number of iterations, 100.
  - Convergence minimum value: value used for convergence checking in conjunction with the energy profile of the similarity metric that is estimated in the given window size, 1e-6.
  - Convergence window size: number of values of the similarity metric which are used to estimate the energy profile of the similarity metric, 10.
We perform registration using the settings given above, and by taking advantage of the built in multi-resolution framework, we use a three tier pyramid.
In this example we plot the similarity metric’s value during registration. Note that the change of scales in the multi-resolution framework is readily visible.
PYTHON
# Callback invoked when the StartEvent happens, sets up our new data.
def start_plot():
    global metric_values, multires_iterations
    metric_values = []
    multires_iterations = []

# Callback invoked when the EndEvent happens, do cleanup of data and figure.
def end_plot():
    global metric_values, multires_iterations
    del metric_values
    del multires_iterations
    # Close figure, we don't want to get a duplicate of the plot later on.
    plt.close()

# Callback invoked when the sitkMultiResolutionIterationEvent happens, update the index into the
# metric_values list.
def update_multires_iterations():
    global metric_values, multires_iterations
    multires_iterations.append(len(metric_values))

# Callback invoked when the IterationEvent happens, update our data and display new figure.
def plot_values(registration_method):
    global metric_values, multires_iterations
    metric_values.append(registration_method.GetMetricValue())
    # Clear the output area (wait=True, to reduce flickering), and plot current data
    clear_output(wait=True)
    # Plot the similarity metric values
    plt.plot(metric_values, 'r')
    plt.plot(multires_iterations, [metric_values[index] for index in multires_iterations], 'b*')
    plt.xlabel('Iteration Number',fontsize=12)
    plt.ylabel('Metric Value',fontsize=12)
    plt.show()
registration_method = sitk.ImageRegistrationMethod()
# Similarity metric settings.
registration_method.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
registration_method.SetMetricSamplingStrategy(registration_method.RANDOM)
registration_method.SetMetricSamplingPercentage(0.01)
registration_method.SetInterpolator(sitk.sitkLinear)
# Optimizer settings.
registration_method.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100, convergenceMinimumValue=1e-6, convergenceWindowSize=10)
registration_method.SetOptimizerScalesFromPhysicalShift()
# Setup for the multi-resolution framework.
registration_method.SetShrinkFactorsPerLevel(shrinkFactors = [4,2,1])
registration_method.SetSmoothingSigmasPerLevel(smoothingSigmas=[2,1,0])
registration_method.SmoothingSigmasAreSpecifiedInPhysicalUnitsOn()
# Don't optimize in-place, we would possibly like to run this cell multiple times.
registration_method.SetInitialTransform(initial_transform, inPlace=False)
# Connect all of the observers so that we can perform plotting during registration.
registration_method.AddCommand(sitk.sitkStartEvent, start_plot)
registration_method.AddCommand(sitk.sitkEndEvent, end_plot)
registration_method.AddCommand(sitk.sitkMultiResolutionIterationEvent, update_multires_iterations)
registration_method.AddCommand(sitk.sitkIterationEvent, lambda: plot_values(registration_method))
final_transform = registration_method.Execute(sitk.Cast(fixed_image, sitk.sitkFloat32),
sitk.Cast(moving_image, sitk.sitkFloat32))
Always remember to query why the optimizer terminated. This will help you understand whether termination was too early, either due to thresholds being too tight (early termination due to a small number of iterations, numberOfIterations) or too loose (early termination due to a large value for the minimal change in the similarity measure, convergenceMinimumValue).
PYTHON
print('Final metric value: {0}'.format(registration_method.GetMetricValue()))
print('Optimizer\'s stopping condition, {0}'.format(registration_method.GetOptimizerStopConditionDescription()))
OUTPUT
Final metric value: -0.6561600032169457
Optimizer's stopping condition, GradientDescentOptimizerv4Template: Convergence checker passed at iteration 61.
Now we can visually inspect the results:
PYTHON
moving_resampled = sitk.Resample(moving_image, fixed_image, final_transform, sitk.sitkLinear, 0.0, moving_image.GetPixelID())
interact(
display_images_with_alpha,
image_z = (0,fixed_image.GetSize()[2]-1),
alpha = (0.0,1.0,0.05),
fixed = fixed(fixed_image),
moving = fixed(moving_resampled))
PYTHON
print(f"Origin for fixed image: {fixed_image.GetOrigin()}, shifted moving image: {moving_resampled.GetOrigin()}")
print(f"Spacing for fixed image: {fixed_image.GetSpacing()}, shifted moving image: {moving_resampled.GetSpacing()}")
print(f"Size for fixed image: {fixed_image.GetSize()}, shifted moving image: {moving_resampled.GetSize()}")
OUTPUT
Origin for fixed image: (0.0, 0.0, 0.0), shifted moving image: (0.0, 0.0, 0.0)
Spacing for fixed image: (0.653595, 0.653595, 4.0), shifted moving image: (0.653595, 0.653595, 4.0)
Size for fixed image: (512, 512, 29), shifted moving image: (512, 512, 29)
If we are satisfied with the results, save them to file.
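For example (the file names below are only illustrative), the resampled image and the final transform could be written out with SITK's writers:
PYTHON
# Hypothetical output paths; any image format supported by SITK (e.g., .mha, .nrrd) will work
sitk.WriteImage(moving_resampled, "data/sitk/moving_resampled.mha")
sitk.WriteTransform(final_transform, "data/sitk/final_transform.tfm")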
Segmentation
Image segmentation filters process images by dividing them into meaningful regions. SITK provides a wide range of filters to support classical segmentation algorithms, including various thresholding methods and watershed algorithms. The output is typically an image where different integers represent distinct objects, with 0 often used for the background and 1 (or sometimes 255) for foreground objects. After segmenting the data, SITK allows for efficient post-processing, such as labeling distinct objects and analyzing their shapes.
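Before turning to real data, here is a minimal, self-contained sketch (synthetic data, not part of the lesson files) of that post-processing idea: label the connected objects in a binary segmentation and measure their sizes and centroids.
PYTHON
import numpy as np
import SimpleITK as sitk

# Synthetic binary "segmentation" with two separate blobs (purely illustrative)
arr = np.zeros((64, 64, 64), dtype=np.uint8)
arr[10:20, 10:20, 10:20] = 1
arr[40:55, 40:55, 40:55] = 1
binary = sitk.GetImageFromArray(arr)

# Give every connected object its own integer label, then analyze the shapes
labels = sitk.ConnectedComponent(binary)
stats = sitk.LabelShapeStatisticsImageFilter()
stats.Execute(labels)
for label in stats.GetLabels():
    print(label, stats.GetPhysicalSize(label), stats.GetCentroid(label))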
Let’s start by reading in a T1 MRI scan, on which we will perform segmentation operations.
PYTHON
%matplotlib inline
import matplotlib.pyplot as plt
from ipywidgets import interact, fixed
import SimpleITK as sitk
img_T1 = sitk.ReadImage("data/sitk/A1_grayT1.nrrd")
# To visualize the label image in RGB we need an image with a 0-255 range
img_T1_255 = sitk.Cast(sitk.RescaleIntensity(img_T1), sitk.sitkUInt8)
# Callback invoked by the interact IPython method for scrolling through the image stacks of
# a volume image
def display_images(image_z, npa, title):
plt.imshow(npa[image_z,:,:], cmap=plt.cm.Greys_r)
plt.title(title)
plt.axis('off')
plt.show()
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(img_T1)),
title = fixed('Z slices'))
Thresholding
Thresholding is the most basic form of segmentation. It simply labels the pixels of an image based on the intensity range without respect to geometry or connectivity.
PYTHON
# Basic thresholding
seg = img_T1>200
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Basic thresholding"))
Another example using BinaryThreshold:
PYTHON
# Binary thresholding
seg = sitk.BinaryThreshold(img_T1, lowerThreshold=100, upperThreshold=400, insideValue=1, outsideValue=0)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Binary thresholding"))****
ITK has a number of histogram-based automatic thresholding filters including Huang, MaximumEntropy, Triangle, and the popular Otsu's method (OtsuThresholdImageFilter). These methods create a histogram and then use a heuristic to determine a threshold value.
PYTHON
# Otsu Thresholding
otsu_filter = sitk.OtsuThresholdImageFilter()
otsu_filter.SetInsideValue(0)
otsu_filter.SetOutsideValue(1)
seg = otsu_filter.Execute(img_T1)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Otsu thresholding"))
print(otsu_filter.GetThreshold())
OUTPUT
236.40869140625
Region Growing Segmentation
The first step of improvement upon the naive thresholding is a class of algorithms called region growing. The common theme for all these algorithms is that a voxel’s neighbor is considered to be in the same class if its intensities are similar to the current voxel. The definition of similar is what varies:
- ConnectedThreshold: The neighboring voxel’s intensity is within explicitly specified thresholds.
- ConfidenceConnected: The neighboring voxel’s intensity is within the implicitly specified bounds \(\mu \pm c \sigma\), where \(\mu\) is the mean intensity of the seed points, \(\sigma\) their standard deviation and \(c\) a user specified constant.
- VectorConfidenceConnected: A generalization of the previous approach to vector-valued images, for instance multi-spectral images or multi-parametric MRI. The neighboring voxel’s intensity vector is within the implicitly specified bounds using the Mahalanobis distance \(\sqrt{(x-\mu)^T\Sigma^{-1}(x-\mu)}<c\), where \(\mu\) is the mean of the vectors at the seed points, \(\Sigma\) is the covariance matrix and \(c\) is a user specified constant.
- NeighborhoodConnected: Similar to ConnectedThreshold, but a voxel is only included if all voxels in a small neighborhood around it fall within the explicitly specified thresholds.
Let’s imagine that we have to segment the left lateral ventricle of the brain image we just visualized.
3D Slicer was used to determine that the index (132,142,96) is a good seed for the left lateral ventricle. Let’s first visualize the seed:
PYTHON
# Initial seed
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Initial seed"))
Let’s use the ConnectedThreshold functionality:
PYTHON
# Connected Threshold
seg = sitk.ConnectedThreshold(img_T1, seedList=[seed], lower=100, upper=190)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Connected threshold"))
Improving upon these explicitly thresholded approaches is the ConfidenceConnected filter, which uses the initial seed or current segmentation to estimate the threshold range.
This region growing algorithm allows the user to implicitly specify the threshold bounds based on the statistics estimated from the seed points, \(\mu \pm c\sigma\). This algorithm has some flexibility which you should familiarize yourself with:
- The “multiplier” parameter is the constant \(c\) from the formula above.
- You can specify a region around each seed point “initialNeighborhoodRadius” from which the statistics are estimated, see what happens when you set it to zero.
- The “numberOfIterations” allows you to rerun the algorithm. In the first run the bounds are defined by the seed voxels you specified, in the following iterations \(\mu\) and \(\sigma\) are estimated from the segmented points and the region growing is updated accordingly.
PYTHON
seg = sitk.ConfidenceConnected(img_T1, seedList=[seed],
numberOfIterations=0,
multiplier=2,
initialNeighborhoodRadius=1,
replaceValue=1)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z=(0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title=fixed("Confidence connected"))
Since another MRI scan of the same patient, a T2-weighted image, is also available, we can try to further improve the segmentation. We first load the T2 image and combine it with the T1 image to create a vector image. This region growing algorithm is similar to the previous one, ConfidenceConnected, and allows the user to implicitly specify the threshold bounds based on the statistics estimated from the seed points. The main difference is that in this case we use the Mahalanobis distance rather than the intensity difference.
PYTHON
img_T2 = sitk.ReadImage("data/sitk/A1_grayT2.nrrd")
img_multi = sitk.Compose(img_T1, img_T2)
seg = sitk.VectorConfidenceConnected(img_multi, seedList=[seed],
numberOfIterations=1,
multiplier=2.5,
initialNeighborhoodRadius=1)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Vector confidence connected"))
Clean Up
Use of low-level segmentation algorithms such as region growing is often followed by a clean up step. In this step we fill holes and remove small connected components. Both of these operations are achieved using binary morphological operations: opening (BinaryMorphologicalOpening) to remove small connected components and closing (BinaryMorphologicalClosing) to fill holes.
SITK supports several shapes for the structuring elements (kernels) including:
- sitkAnnulus
- sitkBall
- sitkBox
- sitkCross
The size of the kernel can be specified as a scalar (same for all dimensions) or as a vector of values, size per dimension.
The following code cell illustrates the results of such a clean up, using closing to remove holes in the original segmentation.
PYTHON
seg = sitk.ConfidenceConnected(img_T1, seedList=[seed],
numberOfIterations=0,
multiplier=2,
initialNeighborhoodRadius=1,
replaceValue=1)
vectorRadius=(1,1,1)
kernel=sitk.sitkBall
seg_clean = sitk.BinaryMorphologicalClosing(seg, vectorRadius, kernel)
seg_img_clean = sitk.LabelOverlay(img_T1_255, seg_clean)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img_clean)),
title = fixed("Confidence connected after morphological closing"))
Level-Set Segmentation
There are a variety of level-set based segmentation filters available in ITK. There is also a modular level-set framework which allows composition of terms and easy extension in C++.
First we create a label image from our seed.
PYTHON
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
Use the seed to estimate a reasonable threshold range.
PYTHON
stats = sitk.LabelStatisticsImageFilter()
stats.Execute(img_T1, seg)
factor = 3.5
lower_threshold = stats.GetMean(1)-factor*stats.GetSigma(1)
upper_threshold = stats.GetMean(1)+factor*stats.GetSigma(1)
print(lower_threshold,upper_threshold)
OUTPUT
81.25184541308809 175.0084466827569
PYTHON
init_ls = sitk.SignedMaurerDistanceMap(seg, insideIsPositive=True, useImageSpacing=True)
lsFilter = sitk.ThresholdSegmentationLevelSetImageFilter()
lsFilter.SetLowerThreshold(lower_threshold)
lsFilter.SetUpperThreshold(upper_threshold)
lsFilter.SetMaximumRMSError(0.02)
lsFilter.SetNumberOfIterations(1000)
lsFilter.SetCurvatureScaling(.5)
lsFilter.SetPropagationScaling(1)
lsFilter.ReverseExpansionDirectionOn()
ls = lsFilter.Execute(init_ls, sitk.Cast(img_T1, sitk.sitkFloat32))
print(lsFilter)
OUTPUT
itk::simple::ThresholdSegmentationLevelSetImageFilter
LowerThreshold: 81.2518
UpperThreshold: 175.008
MaximumRMSError: 0.02
PropagationScaling: 1
CurvatureScaling: 0.5
NumberOfIterations: 1000
ReverseExpansionDirection: 1
ElapsedIterations: 119
RMSChange: 0.0180966
Debug: 0
NumberOfThreads: 10
NumberOfWorkUnits: 0
Commands: (none)
ProgressMeasurement: 0.119
ActiveProcess: (none)
PYTHON
seg_img = sitk.LabelOverlay(img_T1_255, ls>0)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Level set segmentation"))
Challenge: Segment on the y-Axis
Try to segment the lateral ventricle using the volume’s slices along the y-axis instead of the z-axis.
Hint: Start by editing the display_images function in order to select slices along the y-axis.
PYTHON
def display_images(image_y, npa, title):
plt.imshow(npa[:,image_y,:], cmap=plt.cm.Greys_r)
plt.title(title)
plt.axis('off')
plt.show()
# Initial seed; we can reuse the same used for the z-axis slices
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_y = (0,img_T1.GetSize()[1]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Initial seed"))
After some attempts, this was the method that gave the best segmentation results:
PYTHON
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
stats = sitk.LabelStatisticsImageFilter()
stats.Execute(img_T1, seg)
factor = 3.5
lower_threshold = stats.GetMean(1)-factor*stats.GetSigma(1)
upper_threshold = stats.GetMean(1)+factor*stats.GetSigma(1)
print(lower_threshold,upper_threshold)
init_ls = sitk.SignedMaurerDistanceMap(seg, insideIsPositive=True, useImageSpacing=True)
lsFilter = sitk.ThresholdSegmentationLevelSetImageFilter()
lsFilter.SetLowerThreshold(lower_threshold)
lsFilter.SetUpperThreshold(upper_threshold)
lsFilter.SetMaximumRMSError(0.02)
lsFilter.SetNumberOfIterations(1000)
lsFilter.SetCurvatureScaling(.5)
lsFilter.SetPropagationScaling(1)
lsFilter.ReverseExpansionDirectionOn()
ls = lsFilter.Execute(init_ls, sitk.Cast(img_T1, sitk.sitkFloat32))
OUTPUT
81.25184541308809 175.0084466827569
Segmentation Evaluation
Evaluating segmentation algorithms typically involves comparing your results to reference data.
In the medical field, reference data is usually created through manual segmentation by an expert. When resources are limited, a single expert might define this data, though this is less than ideal. If multiple experts contribute, their inputs can be combined to produce reference data that more closely approximates the elusive “ground truth.”
For detailed coding examples on segmentation evaluation, refer to this notebook.
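As a minimal, self-contained illustration (synthetic arrays, not the lesson data), overlap measures such as the Dice and Jaccard coefficients can be computed with SITK's LabelOverlapMeasuresImageFilter:
PYTHON
import numpy as np
import SimpleITK as sitk

# Two overlapping synthetic binary "segmentations" on a 20x20 grid
reference = sitk.GetImageFromArray(np.pad(np.ones((10, 10), dtype=np.uint8), 5))
result = sitk.GetImageFromArray(np.pad(np.ones((8, 12), dtype=np.uint8), ((6, 6), (4, 4))))

# Compare the result against the reference
overlap = sitk.LabelOverlapMeasuresImageFilter()
overlap.Execute(reference, result)
print("Dice:", overlap.GetDiceCoefficient())
print("Jaccard:", overlap.GetJaccardCoefficient())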
Acknowledgements
This episode was largely inspired by the official SITK tutorial, which is copyrighted by NumFOCUS and distributed under the Creative Commons Attribution 4.0 International License, and SITK Notebooks.
Additional Resources
To really understand the structure of SITK images and how to work with them, we recommend some hands-on interaction using the SITK Jupyter notebooks from the SITK official channels. More detailed information about SITK fundamental concepts can also be found here.
Code illustrating various aspects of the registration and segmentation framework can be found in the set of examples which are part of the SITK distribution and in the SITK Jupyter notebook repository.
Key Points
- Registration aligns images for data merging or temporal tracking, while segmentation identifies objects within images, which is critical for detailed analysis.
- SITK simplifies segmentation, registration, and advanced analysis tasks using ITK algorithms, and it supports several programming languages.
- Images in SITK are defined by physical space, unlike array-based libraries, ensuring accurate spatial representation and metadata management.
- SITK offers global and bounded domain transformations for spatial manipulation and efficient resampling techniques with various interpolation options.
- Use SITK’s robust capabilities for registration and classical segmentation methods such as thresholding and region growing to ensure efficient analysis of medical images.
Content from Preparing Images for Machine Learning
Last updated on 2024-09-14 | Edit this page
Overview
Questions
- What are the fundamental steps in preparing images for machine learning?
- What techniques can be used to augment data?
- How should data from various sources or collected under different conditions be handled?
- How can we generate features for machine learning using radiomics or volumetrics?
Objectives
- Outline the fundamental steps involved in preparing images for machine learning
- Demonstrate data augmentation techniques using affine transformations
- Discuss the potential pitfalls of data augmentation
- Explain the concept of derived features, including radiomics and other pipeline-derived features
- Review image harmonization techniques at both the image and dataset levels
Basic Steps
Datasets of images can serve as the raw data for machine learning (ML). While in rare cases they might be ready for use with minimal effort, in most projects, a significant portion of time is dedicated to cleaning and preparing the data. The quality of the data directly impacts the quality of the resulting models.
One of the initial steps in building a model involves manually inspecting both the data and metadata.
Challenge: Thinking About Metadata
What metadata should you examine in almost any radiological dataset?
Hint: Consider metadata pertinent to brain and thorax imaging.
In almost any radiological dataset, it is essential to examine the age and sex distribution. These metadata are crucial for both brain MRIs and chest X-rays, as well as for many other types of imaging studies.
In real-world scenarios, we often face the challenge of investigating a specific pathology that is relatively rare. Consequently, our dataset may contain thousands of normal subjects but relatively few with any pathology, and even fewer with the pathology of interest. This creates an imbalanced dataset and can introduce biases. For instance, if we aim to develop a model to detect heart failure on chest X-rays, but all our heart failure patients are older men, the model may incorrectly “focus” (by exploiting statistical correlations) on the absence of female breasts and signs of aging, rather than on true heart failure indicators like the cardiothoracic ratio.
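One quick way to spot such imbalances is to cross-tabulate the labels against the demographic metadata. Below is a minimal sketch with a hypothetical metadata table (pandas is used here purely for illustration):
PYTHON
import pandas as pd

# Hypothetical metadata: one row per image with its label and patient demographics
df = pd.DataFrame({
    "label": ["cardiomegaly", "normal", "normal", "cardiomegaly", "normal"],
    "sex":   ["M", "F", "M", "M", "F"],
    "age":   [78, 45, 60, 81, 52],
})

# Cross-tabulate sex against label and summarize age per label to spot obvious imbalances
print(pd.crosstab(df["sex"], df["label"]))
print(df.groupby("label")["age"].describe())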
Here are some initial steps to understanding your dataset if you plan to do supervised ML:
- Check some images by hand: assess quality, content, and any unexpected elements.
- Check some labeling by hand: ensure accuracy, consistency, and note any surprises.
- Diversity check for obvious protected classes: examine images, labels, or metadata for representation.
- Convert images to a lossless format and store a backup of the data collection.
- Anonymize data: remove identifiable information within images (consider using OCR for text).
- Determine important image features: consider creating cropped versions.
- Normalize images: adjust for size, histogram, and other factors, ensuring proper centering when necessary.
- Re-examine some images by hand: look for artifacts introduced during preprocessing steps.
- Harmonize the dataset.
- Create augmented and/or synthetic data.
When preparing images for unsupervised learning, many of these steps still apply. However, you must also consider the specific algorithm’s requirements, as some segmentation algorithms may not benefit from additional data.
In our preparation for ML as outlined above, step two involves verifying the accuracy of labeling. The label represents the target variable you want your model to predict, so it is crucial to ensure that the labeling is both accurate and aligns with the outcomes you expect your model to predict.
In step three, we emphasize the importance of checking for diversity among protected classes—groups that are often subject to specific legislation in medical diagnostics due to historical underrepresentation. It is essential to ensure your dataset includes women and ethnic minority groups, as population-level variations can significantly impact model performance. For instance, chest circumference and height can vary greatly between populations, such as between Dutch and Indonesian people. While these differences might not affect an algorithm focused on cerebral blood flow, they are critical for others.
In step five, the focus is on anonymization, particularly in imaging data like ultrasounds, where patient names or diagnoses might appear directly on the image. Simple cropping may sometimes suffice, but more advanced techniques such as blurring, masking, or optical character recognition (OCR) should also be considered to ensure thorough anonymization.
We will cover step nine, dataset harmonization, in a separate section. For step ten, creating more data, synthetic data generation will be discussed further in an episode on generative AI. In the next section, we will explore examples of augmented data creation.
Let’s go through some examples. First, we import the libraries we need:
PYTHON
import numpy as np
from matplotlib import pyplot as plt
from skimage import io
from skimage import transform as tf
from skimage.transform import rotate, resize, PiecewiseAffineTransform
Then, we import our example images and examine them.
PYTHON
image_cxr1 = io.imread('data/ml/rotatechest.png') # a relatively normal chest X-ray (CXR)
image_cxr_cmegaly = io.imread('data/ml/cardiomegaly_cc0.png') # cardiomegaly CXR
image_cxr2 = io.imread('data/ml/other_op.png') # a relatively normal CXR
# create figure
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 3
# Adds a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(image_cxr1)
plt.axis('off')
plt.title("Normal 1")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(image_cxr_cmegaly)
plt.axis('off')
plt.title("Cardiomegaly")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(image_cxr2)
plt.axis('off')
plt.title("Normal 2")
Challenge: Can You See Some Problems in the Following Scenario?
Imagine you got the above images and many more because you have been assigned to make an algorithm for cardiomegaly detection. The goal is to notify patients if their hospital-acquired X-rays, taken for any reason, show signs of cardiomegaly. This is an example of using ML for opportunistic screening.
You are provided with two datasets:
- A dataset of healthy (no cardiomegaly) patients from an outpatient clinic in a very poor area, staffed by first-year radiography students. These patients were being screened for tuberculosis (TB).
- A dataset of chest X-rays of cardiomegaly patients from an extremely prestigious tertiary inpatient hospital.
All of the following may pose potential problems:
- We may have different quality images from different machines for our healthy versus diseased patients, which could introduce a “batch effect” and create bias.
- At the prestigious hospital, many X-rays might be taken in the supine position. Will these combine well with standing AP X-rays?
- There could be concerns about the accuracy of labeling at the clinic since it was performed by lower-level staff.
Challenge: Prepare the Images for Classic Supervised ML
Use skimage.transform.rotate to create two realistic augmented images from the given ‘normal’ image stored in the variables.
Then, in a single block of code, apply what you perceive as the most critical preprocessing operations to prepare these images for classic supervised ML.
Hint: Carefully examine the shape of the cardiomegaly image. Consider the impact of harsh lines on ML performance.
PYTHON
# figure out how much to cut on top and bottom
print("cut top/bottom:", (image_cxr_cmegaly.shape[0] - image_cxr1.shape[0])/2)
cut_top_bottom = abs(round((image_cxr_cmegaly.shape[0] - image_cxr1.shape[0])/2))
# figure out how much to cut on the sides
print("cut sides:",(image_cxr_cmegaly.shape[1] - image_cxr1.shape[1])/2)
OUTPUT
cut top/bottom: -119.0
cut sides: -208.5
PYTHON
cut_sides = abs(round((image_cxr_cmegaly.shape[1] - image_cxr1.shape[1])/2))
list_images = [image_cxr1, image_cxr_cmegaly, image_cxr2]
better_for_ml_list = []
for image in list_images:
image = rotate(image,2)
image = image[cut_top_bottom:-cut_top_bottom, cut_sides: -cut_sides]
better_for_ml_list.append(image)
# create figure for display
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 3
# add a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(better_for_ml_list[0])
plt.axis('off')
plt.title("Normal example rotated")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(better_for_ml_list[1])
plt.axis('off')
plt.title("Cardiomegaly rotated")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(better_for_ml_list[2])
plt.axis('off')
plt.title("Another normal rotated")
PYTHON
# find a size we want to resize to, here the smallest
zero_side = []
one_side = []
for image in better_for_ml_list:
zero_side.append(image.shape[0])
zero_side.sort()
one_side.append(image.shape[1])
one_side.sort()
smallest_size = zero_side[0], one_side[0]
# make resized images
resized = []
for image in better_for_ml_list:
image = resize(image, (smallest_size))
resized.append(image)
# create figure for display
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 3
# add a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(resized[0])
plt.axis('off')
plt.title("Normal 1 Augmented Resized")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(resized[1])
plt.axis('off')
plt.title("Augment Cardiomegaly Resized")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(resized[2])
plt.axis('off')
plt.title("Normal 2 Augment Resized")
There are a few key considerations to keep in mind regarding the choices made here. The goal is to create data that realistically represents what might occur in real-life scenarios. For example, a 2-degree rotation is far more realistic than a 25-degree rotation. Unless you are using a rotationally invariant algorithm or another advanced method that accounts for such differences, even small rotations can make every pixel slightly different for the computer. A 2-degree rotation closely mirrors what might happen in a clinical setting, making it a more practical choice.
Additionally, the results can be further improved by cropping the images. Generally, we aim to minimize harsh, artificial lines in any images used for ML algorithms, unless those lines are consistent across all images. It is a common observation among practitioners that many ML algorithms tend to “focus” on these harsh lines, so if they are not uniformly present, it is better to remove them. As for how much to crop, we based our decision on the size of our images. While you don’t need to follow our exact approach, it is important to ensure that all images are ultimately the same size and that all critical features are retained.
Of course, there are numerous other transformations we can apply to images beyond rotation. For example, let’s explore applying a shear:
PYTHON
# create affine transform
affine_tf = tf.AffineTransform(shear=0.2)
# apply transform to image data
image_affine_tf = tf.warp(image_cxr_cmegaly, inverse_map=affine_tf)
# display the result
io.imshow(image_affine_tf)
io.show()
And finally, let’s demonstrate a “wave over a mesh.” We’ll start by creating a grid, or “mesh,” over our image, represented by dots in our plot. Then, we’ll apply a warp function to transform the image into the shape of a wave.
PYTHON
rows, cols = image_affine_tf.shape[0], image_affine_tf.shape[1]
# np.linspace will return evenly spaced numbers over an interval
src_cols = np.linspace(0, cols, 20)
# i.e. above is start=0, stop=cols, num=20, where num is the number of evenly spaced points
src_rows = np.linspace(0, rows, 10)
# np.meshgrid returns coordinate matrices from coordinate vectors.
src_rows, src_cols = np.meshgrid(src_rows, src_cols)
# numpy dstack stacks along a third dimension in the concatenation
src = np.dstack([src_cols.flat, src_rows.flat])[0]
dst_rows = src[:, 1] - np.sin(np.linspace(0, 3 * np.pi, src.shape[0])) * 50
dst_cols = src[:, 0]
dst_rows *= 1.5
dst_rows -= 1.5 * 50
dst = np.vstack([dst_cols, dst_rows]).T
tform = PiecewiseAffineTransform()
tform.estimate(src, dst)
noform = PiecewiseAffineTransform()
noform.estimate(src, src)
out_rows = image_affine_tf.shape[0] - 1.5 * 50
out_cols = cols
out = tf.warp(image_affine_tf, tform, output_shape=(out_rows, out_cols))
# create figure
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 4
# add a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(image_affine_tf)
plt.axis('off')
plt.title("Normal")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(image_affine_tf)
plt.plot(noform.inverse(src)[:, 0], noform.inverse(src)[:, 1], '.b')
plt.axis('off')
plt.title("Normal and Mesh")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(out)
plt.axis('off')
plt.title("Augment")
# add a subplot at the 4th position
fig.add_subplot(rows, columns, 4)
# showing image
plt.imshow(out)
plt.plot(tform.inverse(src)[:, 0], tform.inverse(src)[:, 1], '.b')
plt.axis('off')
plt.title("Augment and Mesh")
The last transformation doesn’t appear realistic. The chest became noticeably widened, which could be problematic. When augmenting data, there are numerous possibilities, but it’s crucial to ensure the augmented data remains realistic. Only a subject matter expert (typically a pathologist, nuclear medicine specialist, or radiologist) can accurately determine what realistic data should look like.
Libraries for Medical Image Augmentation
We demonstrated how to augment medical images using scikit-image. Another great resource for augmentation is SimpleITK, which offers a dedicated tutorial on data augmentation. Additionally, OpenCV provides versatile tools for image processing and augmentations.
To keep your environment manageable, we recommend using just one of these libraries. scikit-image, SimpleITK, and OpenCV all offer effective solutions for medical image augmentation.
Images’ Features
So far, we’ve focused on examples where we directly manipulate images. However, much of ML involves working with derived values from images, often converted into tabular data. In fact, it’s possible to combine images with various types of tabular data in multiple ways for ML. But before exploring these methods, let’s first consider using image features alone as inputs for ML.
Two notable examples of image features are volumetric data and radiomic data. Handling such data at scale—when dealing with thousands of images—typically involves automated processes rather than manual measurements. In the past, pioneers like Benjamin Felson, a founding figure in modern chest X-ray analysis, painstakingly analyzed hundreds of X-rays by hand to gather statistics. Today, processing pipelines allow us to efficiently analyze thousands of images simultaneously.
We aim to integrate images into a pipeline that automatically generates various types of data. A notable example of such a feature pipeline, particularly for brain imaging, is Freesurfer. Freesurfer facilitates the extraction of volume and other relevant data from brain imaging scans.
Challenge: Identifying Issues with Pipeline Outputs in the Absence of Original Images
Consider potential issues that could arise from using the output of a pipeline without accessing the original images.
One major concern is accuracy, as pipelines may mislabel or miscount features, potentially leading to unreliable data analysis or model training. To mitigate these issues, consider:
- Manual verification: validate pipeline outputs by manually reviewing a subset of images to ensure consistency with expected results.
- Literature review: consult existing research to establish benchmarks for feature counts and volumes, helping identify discrepancies that may indicate pipeline errors.
- Expert consultation: seek insights from radiologists or domain experts to assess the credibility of pipeline outputs based on their clinical experience.
Proceed with caution when relying on pipeline outputs, especially without direct access to the original images.
Another challenge with pipelines is that the results can vary significantly depending on which one is chosen. For a detailed exploration of how this can impact fMRI data, see this Nature article. Without access to the original images, it becomes difficult to determine which pipeline most accurately captures the desired outcomes from the data.
Morphometric or volumetric data represents one type of derived data, but radiomics introduces another dimension. Radiomics involves analyzing mathematical characteristics within images, such as entropy or kurtosis. Similar methodologies applied to pathology are referred to as pathomics. Some define radiomics and pathomics as advanced texture analysis in medical imaging, while others argue they encompass a broader range of quantitative data than traditional texture analysis methods.
Regardless of the data type chosen, the typical approach involves segmenting and/or masking an image to isolate the area of interest, followed by applying algorithms to derive radiomic or pathomic features. While it’s possible to develop custom code for this purpose, using established packages is preferred for ensuring reproducibility. For future reproducibility, adherence to standards such as those set by the International Bio-imaging Standards Initiative (IBSI) is crucial.
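To make the idea concrete, here is a minimal numpy-only sketch of two first-order features (entropy and kurtosis) computed over a masked region of a synthetic image; in practice you would rely on an IBSI-compliant package such as those listed below:
PYTHON
import numpy as np

# Synthetic image and region-of-interest mask (purely illustrative)
rng = np.random.default_rng(0)
image = rng.normal(100, 20, (64, 64))
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True

roi = image[mask]

# First-order entropy from a 32-bin intensity histogram
counts, _ = np.histogram(roi, bins=32)
p = counts / counts.sum()
p = p[p > 0]
entropy = -np.sum(p * np.log2(p))

# Excess kurtosis of the intensity distribution
z = (roi - roi.mean()) / roi.std()
kurtosis = np.mean(z**4) - 3

print(f"entropy: {entropy:.3f}, kurtosis: {kurtosis:.3f}")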
Below is a list of open-source* libraries that facilitate this type of analysis:
| | mirp | pyradiomics | LIFEx | radiomics |
| --- | --- | --- | --- | --- |
| License | EUPL-1.2 | BSD-3 | custom | GPL-3.0 |
| Last updated | 5/2024 | 1/2024 | 6/2023 | 11/2019 |
| Programming language | Python | Python | Java | MATLAB |
| IBSI-1 compliant | yes | partial | yes | no claim |
| IBSI-2 compliant | yes | no claim | yes | no claim |
| Interface | high-level API | high-level API, Docker | GUI, low-level API | low-level API |
| Website | GitHub | GitHub | website | GitHub |
| Early publication | JOSS publication | doi:10.1158/0008-5472.CAN-17-0339 | doi:10.1158/0008-5472.CAN-18-0125 | doi:10.1088/0031-9155/60/14/5471 |
| Notes | relative newcomer | very standard and supported | user-friendly | * MATLAB requires a license |
Once we have tabular data, we can apply various algorithms to analyze it. However, applying ML isn’t just a matter of running algorithms and getting quick results. In the next section, we will explore one reason why this process is more complex than it may seem.
Harmonization
We often need to harmonize either images or datasets of derived features. When dealing with images, differences between datasets can be visually apparent. For example, if one set of X-rays consistently appears darker, it may be wise to compare average pixel values between datasets and adjust normalization to achieve more uniformity. This is a straightforward scenario.
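For instance, a very crude check and correction could look like the following sketch (synthetic arrays standing in for two datasets; real harmonization usually needs more care):
PYTHON
import numpy as np

# Synthetic stand-ins for images from two different scanners
rng_a, rng_b = np.random.default_rng(1), np.random.default_rng(2)
dataset_a = [rng_a.uniform(0.0, 0.8, (64, 64)) for _ in range(5)]
dataset_b = [rng_b.uniform(0.0, 1.0, (64, 64)) for _ in range(5)]

# Compare average pixel values between the datasets
mean_a = np.mean([img.mean() for img in dataset_a])
mean_b = np.mean([img.mean() for img in dataset_b])
print(f"mean intensity A: {mean_a:.3f}, B: {mean_b:.3f}")

# Crude gain correction: rescale dataset A toward dataset B's average intensity
dataset_a_rescaled = [img * (mean_b / mean_a) for img in dataset_a]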
Consider another example involving derived datasets from brain MRIs with Virchow-Robin spaces. One dataset was captured using a 1.5 Tesla machine in a distant location, while the other used an experimental 5 Tesla machine (high resolution) in an advanced hospital. These machines vary in resolution, meaning that what appears as a single Virchow-Robin space at low resolution might actually be two or three smaller spaces that are only resolved as separate at high resolution. This is just one potential difference.
Below are images of the same patient scanned with 1.5 Tesla and 3 Tesla machines:
Different contrast levels significantly affect the visibility of structures like the caudate and thalami in brain images. As a result, the radiomic characteristics of these images, including contrast and possibly other parameters, can vary even when they originate from the same patient.
Constructing a dataset based solely on extensive, unharmonized derived data is not always practical. However, without access to the original images, it becomes difficult to fully understand and account for these variations.
We recommend the following approach:
- Compare the data using descriptive statistics and your knowledge of the patient groups. Consider whether you expect similar counts across both groups, and justify your reasoning.
- Explore the use of a harmonization package to standardize data across different datasets.
Below are three examples from the field of neuroimaging:
| | neurocombat | haca3 | autocombat |
| --- | --- | --- | --- |
| License | MIT for Python / Artistic for R | Missing? | Apache 2.0 |
| Last updated | 2021 | 2024 | 2022 |
| Programming language | Python or R | Python | Python |
| Organ/area and modality | brain MRI | brain MRI | brain MRI |
| Data type | derived values | images | derived values |
| Notes | no standard release yet | versioned but not released | not released, not versioned |
| Original website | GitHub | GitHub | GitHub |
| Early publication | doi:10.1016/j.neuroimage.2017.08.047 | doi:10.1016/j.compmedimag.2023.102285 | doi:10.1038/s41598-022-16609-1 |
| Versioned website | versioned on cvasl | versioned on GitHub | versioned on cvasl |
There are numerous packages available for brain MRI alone, not to mention those for imaging the rest of the body. We have selected these three examples to highlight some of the common issues and pitfalls associated with such packages.
Challenge: Identifying Potential Problems in Each Package
Consider issues that might hinder your implementation with each package.
- Aging code: neurocombat was last modified three years ago. It may rely on outdated dependencies that could pose compatibility issues with newer hardware and software environments. Additionally, the lack of versioned releases makes it challenging to track changes and updates.
- No license: haca3 does not have a clear license available. Using code without a proper license could lead to legal complications and uncertainties about its usage rights.
- No releases and versioning: autocombat lacks both released versions and versioning practices. Without clear versioning, it becomes difficult to ensure consistency and compare results across different implementations of the package.
We’ve pointed out a few potential issues to highlight common challenges with such programs. Looking ahead, consider developing a harmonization package that’s sustainable and useful for your research community.
While each of these packages has its strengths for researchers, none are perfect. Choose wisely when selecting a package! Also, remember you can create your own harmonization method through coding or develop it into a package for others to use.
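As a trivial example of a hand-rolled approach (not a substitute for ComBat-style methods), derived features could simply be standardized per site; the table below is hypothetical:
PYTHON
import pandas as pd

# Hypothetical table of derived features with a 'site' column
df = pd.DataFrame({
    "site":   ["A", "A", "A", "B", "B", "B"],
    "volume": [1200.0, 1150.0, 1260.0, 1410.0, 1390.0, 1350.0],
})

# Z-score each feature within its own site
df["volume_z"] = df.groupby("site")["volume"].transform(lambda x: (x - x.mean()) / x.std())
print(df)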
Key Points
- Direct knowledge of specific data cannot be substituted
- Statistical analysis is essential to detect and mitigate biases in patient distribution
- Verify if derived data aligns with known clinical realities through statistical examination
- Evaluate the validity and utility of data augmentation methods before applying them
- Radiomics enables the use of mathematical image qualities as features
- There are several accessible pipelines for volumetrics and radiomics
- Data from different machines (or even the same machines with different settings) often requires harmonization, achievable through coding and the use of existing libraries
Content from Anonymizing Medical Images
Last updated on 2024-11-03 | Edit this page
Overview
Questions
- What types of data make patient’s imaging data identifiable?
- How can I ensure the safe sharing of medical image data?
- How can I remove specific metadata from DICOM files?
Objectives
- Provide examples of data that makes patient images identifiable
- Discuss the concepts of identifiable data and anonymization
- Demonstrate how approaches to anonymizing identifiable images
- Demonstrate the use of the Pydicom library to manage DICOM metadata
Introduction
Each of us is similar yet unique, and this individuality can make us identifiable, posing challenges for medical research. While open data sharing advances research, most patients would not want their medical details shared if they could be identified. In most countries, patient information is protected by law.
Metadata elements in imaging, such as patient names and addresses, are often clearly designed to identify patients. However, the uniqueness of patients means that even images without obvious metadata, such as names, can potentially be identified as belonging to a specific individual. With advancements in facial recognition software and search engines, images we previously thought were non-identifiable, like head CTs, MRIs, or even PET scans, can theoretically be traced back to a specific patient. To address this, we can implement de-identification strategies to create shareable data.
Types of Patient Identifying Data
Metadata
DICOM files contain metadata, which includes various types of identifying information that should remain confidential. The easiest way to mitigate issues with DICOM metadata is to avoid having it in the first place. If possible, opt to receive just the images and select metadata rather than the entire DICOM file. When sharing data with collaborators, there is often no need to share the full DICOM files.
Text on Images
Occasionally, technicians will “burn” information directly onto images as part of a burned-in annotation. This means they change the image itself with text that becomes part of the image pixels. This may include details such as diagnoses, demographics, or the patient’s name. Fortunately, this text is usually typed rather than handwritten, making it recognizable by optical character recognition (OCR) functions. Often, this text is placed away from the center of the image, allowing for clever cropping to eliminate it entirely in some datasets.
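As a rough illustration of the OCR idea, a library such as pytesseract (which requires the Tesseract engine and is not one of this lesson's dependencies) could be used to flag images that contain burned-in text; the snippet below is a sketch, not part of the lesson setup:
PYTHON
# Sketch only: assumes pytesseract and the Tesseract OCR engine are installed
import pytesseract
from skimage import io

image = io.imread('data/anonym/identifiable_us.jpg')  # the ultrasound used in the challenge below
found_text = pytesseract.image_to_string(image)
print(repr(found_text))  # non-empty text suggests the image needs masking, blurring, or cropping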
Challenge: Getting Rid of Identifying Burned-in Data
An ultrasound image (taken from the public internet under a Creative Commons license here) must be kept at its existing height and width dimensions. You should change the image from RGB to grayscale for your operations. It can be opened as follows:
PYTHON
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, filters
image = 'data/anonym/identifiable_us.jpg'
io.imshow(image)
io.show()
OUTPUT
Write code for two approaches that de-identify (remove the annotations from) the ultrasound image data.
The image has identifying information burned into it. You should flag not only the name, but also the date and place as problematic. You can take three different approaches: mask the data, blur it, or crop it out entirely. First we will show a selectively blurred image:
PYTHON
from skimage.filters import gaussian
from skimage.color import rgb2gray
image_base = io.imread(image)
image_base = rgb2gray(image_base)
sub_im = image_base[0:78,:].copy()
blur_sub_im = gaussian(sub_im, sigma=9)
final_image = np.zeros(image_base.shape)
final_image[0:78,:] = blur_sub_im
final_image[78:,:] = image_base[78:, :]
io.imshow(final_image)
OUTPUT
We could also have just made a simple zero mask:
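A minimal sketch, reusing image_base and the same 78-pixel text band from the blurring example above:
PYTHON
# Zero out the band that contains the burned-in text
masked_image = image_base.copy()
masked_image[0:78, :] = 0
io.imshow(masked_image)
io.show()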
OUTPUT
Finally, you could simply crop off the offending part and resize. However, if you examine the result of this solution, you will notice that the aspect ratio has changed and therefore the pixel values have changed too. This may not be ideal when you want to preserve the exact image pixels. Nonetheless, you can crop and resize as below:
PYTHON
from skimage.transform import resize
final_image= image_base[79:, :]
final_image = resize(final_image,image_base.shape)
io.imshow(final_image)
OUTPUT
Note that there are other valid solutions for removing identifying image data, but the first two shown (blurring and masking) are very common and straightforward. You have now seen three approaches to removing some of the visual data from a 2D image. The same approaches can be applied to a 3D image.
Let’s take another look at an image we used earlier:
PYTHON
import matplotlib.pyplot as plt
import SimpleITK as sitk
from skimage import io
from ipywidgets import interact, fixed
from IPython.display import clear_output
import os
sag_image = sitk.ReadImage("data/sitk/A1_grayT1.nrrd", sitk.sitkFloat32)
cor_image = sitk.PermuteAxes(sag_image, [2, 1, 0])
# General display function for any two 3D images
def display_images(image1_z, image2_z, image1_npa, image2_npa, title1="Image 1", title2="Image 2"):
plt.subplots(1, 2, figsize=(10, 8))
# Display the first image
plt.subplot(1, 2, 1)
plt.imshow(image1_npa[image1_z, :, :], cmap=plt.cm.Greys_r)
plt.title(title1)
plt.axis('off')
# Display the second image
plt.subplot(1, 2, 2)
plt.imshow(image2_npa[image2_z, :, :], cmap=plt.cm.Greys_r)
plt.title(title2)
plt.axis('off')
plt.show()
# Display the sagittal and coronal views of the original image
interact(
display_images,
image1_z=(0, sag_image.GetSize()[2] - 1),
image2_z=(0, cor_image.GetSize()[2] - 1),
image1_npa=fixed(sitk.GetArrayViewFromImage(sag_image)),
image2_npa=fixed(sitk.GetArrayViewFromImage(cor_image)),
title1=fixed("Sagittal Cut"),
title2=fixed("Coronal Cut")
)
OUTPUT
Here we can see a technique analogous to the one we used with the 2D ultrasound: identifying data has been cropped out.
Challenge: Are We Truly De-identified? (Optional)
Look at the head MRI above. Are all such images, including CTs, with some of the soft tissue stripped 100 percent de-identified? Give arguments why and why not.
In most cases images like the above are de-identified. The image above is only an image, not a DICOM with potentially identifying metadata. Further, we are missing the nose, so putting this into a reverse image look-up engine online will not yield identifying results. On the other hand, we could theoretically have an identifiable image. One potentially identifiable case could be someone with a very distinctive neck, skull bones (these are more visible on CT), and/or ears. Additionally, patients who have had some kind of rare pathology and surgery may still be identifiable to someone familiar with their cases.
Faces in Images
A full CT, MRI, or PET scan of the head can be reconstructed into a detailed facial image, potentially revealing the patient’s identity and demographic information, such as ethnicity and gender. To mitigate this risk, many image analysis programs employ ‘defacing’ techniques to obscure these identifiable features.
We could in theory write our own defacing algorithm. For such an algorithm, libraries such as SITK, skimage (scikit-image), or even OpenCV provide several useful built-in functions, including morphological operations such as erosion, dilation, and grow-from-seed, as well as other operations such as connected component analysis and masking. Which library you choose will often come down to minimizing the number of libraries in your code, because mixing closely related libraries with similarly named functions is a bad idea.
Challenge: Soft Tissue Stripping (Optional)
Look at the head MRI above. Try using SITK to get rid of some of the soft tissue in the image (already loaded into sag_image).
We can approach the problem in various ways. Below is one solution with erosion, dilation and masking:
PYTHON
# Apply thresholding to remove soft tissues by first taking out the air then dilating and cleaning
# the assumption is that the black around the brain is zero and low values
# Create the brain mask
lower_thresh = 0
upper_thresh = 100
brain_mask = sitk.BinaryThreshold(sag_image, lowerThreshold=lower_thresh, upperThreshold=upper_thresh)
# Morphological operations to clean the mask
brain_mask_cleaned = sitk.BinaryDilate(brain_mask, [5, 5, 5])
brain_mask_cleaned = sitk.BinaryErode(brain_mask_cleaned, [5, 5, 5])
# Display the original and cleaned mask images using the general display function
interact(
display_images,
image1_z=(0, brain_mask.GetSize()[2] - 1),
image2_z=(0, brain_mask_cleaned.GetSize()[2] - 1),
image1_npa=fixed(sitk.GetArrayViewFromImage(brain_mask)),
image2_npa=fixed(sitk.GetArrayViewFromImage(brain_mask_cleaned)),
title1=fixed("Original Mask"),
title2=fixed("Cleaned Mask")
)
Then we can further clean up with a connected component analysis that throws out the small floaters, and use the inverse as our final mask.
PYTHON
def keep_largest_component(mask_image):
# Ensure mask_image is binary (0 and 1 values)
mask_image = sitk.Cast(mask_image, sitk.sitkUInt8)
# Label connected components in the mask image
labeled_image = sitk.ConnectedComponent(mask_image)
# Measure the size of each labeled component
label_shape_statistics = sitk.LabelShapeStatisticsImageFilter()
label_shape_statistics.Execute(labeled_image)
# Count and print the number of connected components
component_count = len(label_shape_statistics.GetLabels())
print(f"Number of connected components before filtering: {component_count}")
# Find the label with the largest size
largest_label = max(
label_shape_statistics.GetLabels(),
key=lambda label: label_shape_statistics.GetPhysicalSize(label)
)
# Create a new mask with only the largest component
largest_component_mask = sitk.BinaryThreshold(labeled_image, lowerThreshold=largest_label, upperThreshold=largest_label, insideValue=1, outsideValue=0)
# Verify the result by counting the components in the resulting image
labeled_result = sitk.ConnectedComponent(largest_component_mask)
label_shape_statistics.Execute(labeled_result)
result_component_count = len(label_shape_statistics.GetLabels())
print(f"Number of connected components after filtering: {result_component_count}")
return largest_component_mask
largest_component_mask = keep_largest_component(brain_mask_cleaned)
# we actually want the opposite mask so we will invert the mask
inverted_mask = sitk.BinaryNot(largest_component_mask)
# Apply the mask to the image
brain_only = sitk.Mask(sag_image, inverted_mask)
interact(
display_images,
image1_z = (0,brain_only.GetSize()[2]-1),
image2_z = (0,largest_component_mask.GetSize()[2]-1),
image1_npa = fixed(sitk.GetArrayViewFromImage(brain_only)),
image2_npa = fixed(sitk.GetArrayViewFromImage(largest_component_mask)),
title1=fixed("new image"),
title2=fixed("mask")
)
OUTPUT
Now you may notice that we put together an erosion and a dilation; we could instead have used a closing. There are many ways to solve this problem. An entirely different approach would be to use grow-from-seed as below (note this may take a while to execute):
PYTHON
def guess_seed_point(img):
"""
This function guesses a seed point for the brain in the middle of the image, and returns some seed points.
"""
possible_point = img.GetSize()[0]//2, img.GetSize()[1]//2, img.GetSize()[2]//2
# Get the pixel value at the potential location
pixel_value = img.GetPixel(*possible_point)
if pixel_value > 0:
picked_point = possible_point
else:
# just move over a bit and hope for better
new_possible_point = img.GetSize()[0]//2 + img.GetSize()[0]//10 , img.GetSize()[1]//2, img.GetSize()[2]//2
picked_point = new_possible_point
return picked_point
# do some reality check of a look at the value in your seed point
seed_point = guess_seed_point(sag_image)
pixel_value = sag_image.GetPixel(seed_point)
print(pixel_value)
OUTPUT
394.37042236328125
Grow from seed, then display:
PYTHON
seed_point = guess_seed_point(sag_image)
lower_threshold = 370 # lower threshold
upper_threshold = 480 # upper threshold
seed_mask = sitk.ConnectedThreshold(
sag_image, seedList=[seed_point],
lower=lower_threshold,
upper=upper_threshold)
# let's dilate up a bit
seed_mask= sitk.BinaryDilate(seed_mask, [5, 5, 5])
# apply the mask to the image
brain_only = sitk.Mask(sag_image, seed_mask)
# display to see what happened
interact(
display_images,
image1_z=(0, sag_image.GetSize()[2] - 1),
image2_z=(0, brain_only.GetSize()[2] - 1),
image1_npa=fixed(sitk.GetArrayViewFromImage(sag_image)),
image2_npa=fixed(sitk.GetArrayViewFromImage(brain_only)),
title1=fixed("Original"),
title2=fixed("Seeded and Masked")
)
OUTPUT
In the two solutions above we made masks. Each has different problems we can see. For example, in the grow-from-seed images we stripped out part of the genu and splenium of the corpus callosum, not to mention the entire pons. As an alternative, we could also have used a registered atlas as a mask. Regardless of the base algorithm we begin with, if we want a better image, we can keep tinkering with our algorithms.
In the provided solutions to the above optional challenge we didn't get rid of a lot of the neck, and we had other issues. A different approach would have been to register the brain to a brain-like shape and use this as a mask. Nonetheless, the given solutions show two potential approaches to removing tissue (which could in some cases lead to identification) that you don't want in an image. Many researchers do not want to spend time optimizing a skull-stripping technique if that is not their research question, so they use a pre-made technique.
There are various tools available for defacing head imaging, ranging from fully developed software products like FreeSurfer, which includes built-in defacing capabilities, to specialized functions within coding libraries. Some of these tools strip off all of the skull and soft tissue, which may be useful for analysis even when de-identification is not the concern, e.g. if we only want to look at brain tissue.
However, a key issue under current investigation is that some defacing algorithms may inadvertently alter more than just the facial features. Emerging research suggests that these algorithms might also affect the morphometry of the brain image. This could lead to the unintended loss or distortion of critical data. Therefore, it is advisable to proceed with caution and, whenever possible, compare the original and defaced images to ensure that important information remains intact and unaltered.
Other Parts of Images
Patient identity can often be inferred with just a few pieces of data. In some cases, a single piece of information can be enough to track down a patient’s identity, especially if medical files are accessible. For instance, a serial number or other identifying number on a medical device may be traceable back to a specific patient.
In other situations, slightly more data might be required to identify a patient. Some patients may wear unique jewelry, such as a MedicAlert bracelet or necklace with initials or a name. While most routine ambulatory images are taken without jewelry, in emergency situations, medical personnel may not have had the time to remove these items. The more data points we have on a patient, the easier it becomes to identify them.
Metadata
Metadata is, in the broadest sense, data about our data. In practical terms, we need to understand that it is a place outside the image pixels where identifying data may be lurking.
Various tools are available to help de-identify DICOM files in terms of metadata. A notable one is DicomAnonymizer, an open-source tool written in Python.
In some cases, you may need to examine and remove metadata manually or programmatically. For example, in some countries, DICOM fields are used inconsistently, and patient-identifying data can appear in unexpected fields. Therefore, careful examination and customized removal of metadata may be necessary.
Many Ways to Handle a DICOM:
- Multiple libraries, such as Pydicom and SimpleITK (SITK), allow you to read, access, and manipulate DICOM metadata.
- DICOMs follow an extremely complex standard, so it is usually better to use existing libraries rather than raw Python to handle them.
For various reasons, we may prefer Pydicom, SITK, or another method to handle DICOM metadata, typically based on the principle of minimizing dependencies and maintaining simplicity. SITK was introduced earlier in this course. Pydicom is an excellent alternative, particularly because of its comprehensive documentation.
Now, let’s see how to open a DICOM file and work with it using Pydicom.
First, let’s import Pydicom and read in a CT scan:
PYTHON
import pydicom
from pydicom import dcmread
fpath = "data/anonym/our_sample_dicom.dcm"
ds = dcmread(fpath)
print(ds)
OUTPUT
Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length UL: 218
(0002, 0001) File Meta Information Version OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID UI: CT Image Storage
(0002, 0003) Media Storage SOP Instance UID UI: 1.3.46.670589.33.1.63849049636503447100001.4758671761353145811
(0002, 0010) Transfer Syntax UID UI: JPEG Lossless, Non-Hierarchical, First-Order Prediction (Process 14 [Selection Value 1])
(0002, 0012) Implementation Class UID UI: 1.2.840.113845.1.1
(0002, 0013) Implementation Version Name SH: 'Syn7,3,0,258'
(0002, 0016) Source Application Entity Title AE: 'SynapseDicomSCP'
-------------------------------------------------
(0008, 0005) Specific Character Set CS: 'ISO_IR 100'
(0008, 0008) Image Type CS: ['DERIVED', 'SECONDARY', 'MPR']
(0008, 0012) Instance Creation Date DA: '20240418'
(0008, 0013) Instance Creation Time TM: '150716.503'
(0008, 0016) SOP Class UID UI: CT Image Storage
(0008, 0018) SOP Instance UID UI: 1.3.46.670589.33.1.63849049636503447100001.4758671761353145811
(0008, 0020) Study Date DA: '20240418'
(0008, 0022) Acquisition Date DA: '20240418'
(0008, 0023) Content Date DA: '20240418'
(0008, 002a) Acquisition DateTime DT: '20240418150313.020'
(0008, 0030) Study Time TM: '150045'
(0008, 0032) Acquisition Time TM: '150313'
(0008, 0033) Content Time TM: '150314.375'
(0008, 0050) Accession Number SH: '2001433888'
(0008, 0060) Modality CS: 'CT'
(0008, 0070) Manufacturer LO: 'Philips'
(0008, 0080) Institution Name LO: 'BovenIJ Ziekenhuis iCT'
(0008, 0081) Institution Address ST: ''
(0008, 0090) Referring Physician's Name PN: 'WILTING^I^I^""'
(0008, 1010) Station Name SH: 'HOST-999999'
(0008, 1030) Study Description LO: 'CT thorax met iv contrast'
(0008, 103e) Series Description LO: 'Cor IMR med'
(0008, 1040) Institutional Department Name LO: 'Radiology'
(0008, 1080) Admitting Diagnoses Description LO: ''
(0008, 1084) Admitting Diagnoses Code Sequence 0 item(s) ----
(0008, 1090) Manufacturer's Model Name LO: 'iCT 256'
(0008, 1111) Referenced Performed Procedure Step Sequence 1 item(s) ----
(0008, 1150) Referenced SOP Class UID UI: Modality Performed Procedure Step SOP Class
(0008, 1155) Referenced SOP Instance UID UI: 1.3.46.670589.33.1.63849049241567858000001.4675122277016890611
---------
(0008, 1140) Referenced Image Sequence 1 item(s) ----
(0008, 1150) Referenced SOP Class UID UI: CT Image Storage
(0008, 1155) Referenced SOP Instance UID UI: 1.3.46.670589.33.1.63849049294969912500001.5475332148846191441
---------
(0008, 3010) Irradiation Event UID UI: 1.3.46.670589.33.1.63849049343237673200010.5507538603167078985
(0010, 0010) Patient's Name PN: 'OurBeloved^Colleague'
(0010, 0020) Patient ID LO: 'party like 1999'
(0010, 0030) Patient's Birth Date DA: '19421104'
(0010, 0040) Patient's Sex CS: 'M'
(0010, 1000) Other Patient IDs LO: '1989442112'
(0010, 1010) Patient's Age AS: '041Y'
(0018, 0010) Contrast/Bolus Agent LO: 'Iodine'
(0018, 0015) Body Part Examined CS: 'CHEST'
(0018, 0022) Scan Options CS: 'HELIX'
(0018, 0050) Slice Thickness DS: '2.0'
(0018, 0060) KVP DS: '100.0'
(0018, 0088) Spacing Between Slices DS: '2.0'
(0018, 0090) Data Collection Diameter DS: '500.0'
(0018, 1000) Device Serial Number LO: ''
(0018, 1020) Software Versions LO: '4.1'
(0018, 1030) Protocol Name LO: 'Thorax std /Thorax'
(0018, 1040) Contrast/Bolus Route LO: 'IV'
(0018, 1041) Contrast/Bolus Volume DS: '80.0'
(0018, 1044) Contrast/Bolus Total Dose DS: '40.0'
(0018, 1046) Contrast Flow Rate DS: [3, 3]
(0018, 1047) Contrast Flow Duration DS: [17, 10]
(0018, 1049) Contrast/Bolus Ingredient Concentra DS: '300.0'
(0018, 1100) Reconstruction Diameter DS: '348.0'
(0018, 1110) Distance Source to Detector DS: '1040.0'
(0018, 1111) Distance Source to Patient DS: '570.0'
(0018, 1120) Gantry/Detector Tilt DS: '0.0'
(0018, 1130) Table Height DS: '85.1'
(0018, 1150) Exposure Time IS: '434'
(0018, 1151) X-Ray Tube Current IS: '258'
(0018, 1152) Exposure IS: '108'
(0018, 1160) Filter Type SH: 'IMR'
(0018, 1210) Convolution Kernel SH: 'IMR1,Soft Tissue'
(0018, 5100) Patient Position CS: 'FFS'
(0018, 9305) Revolution Time FD: 0.33
(0018, 9306) Single Collimation Width FD: 0.625
(0018, 9307) Total Collimation Width FD: 80.0
(0018, 9309) Table Speed FD: 185.0
(0018, 9310) Table Feed per Rotation FD: 97.664
(0018, 9311) Spiral Pitch Factor FD: 0.763
(0018, 9345) CTDIvol FD: 4.330253533859318
(0018, a001) Contributing Equipment Sequence 1 item(s) ----
(0008, 0070) Manufacturer LO: 'PHILIPS'
(0008, 0080) Institution Name LO: 'BRILLIANCE4'
(0008, 0081) Institution Address ST: 'BRILLIANCE4'
(0008, 1010) Station Name SH: 'HOST-999999'
(0008, 1040) Institutional Department Name LO: 'BRILLIANCE4'
(0008, 1090) Manufacturer's Model Name LO: 'BRILLIANCE4'
(0018, 1000) Device Serial Number LO: 'BRILLIANCE4'
(0018, 1020) Software Versions LO: '4.5.0.30020'
(0040, a170) Purpose of Reference Code Sequence 1 item(s) ----
(0008, 0100) Code Value SH: '109102'
(0008, 0102) Coding Scheme Designator SH: 'DCM'
(0008, 0104) Code Meaning LO: 'Processing Equipment'
---------
---------
(0020, 000d) Study Instance UID UI: 1.3.46.670589.33.1.63849049241560857600001.4706589000974752499
(0020, 000e) Series Instance UID UI: 1.3.46.670589.33.1.63849049343237673200004.5226562961912261811
(0020, 0010) Study ID SH: '8041'
(0020, 0011) Series Number IS: '203'
(0020, 0012) Acquisition Number IS: '2'
(0020, 0013) Instance Number IS: '1'
(0020, 0032) Image Position (Patient) DS: [-172.7884, 8.90000000000001, 1201.43792746114]
(0020, 0037) Image Orientation (Patient) DS: [1, 0, 0, 0, 0, -1]
(0020, 0052) Frame of Reference UID UI: 1.3.46.670589.33.1.63849049263758127200002.5362237490253193614
(0020, 1040) Position Reference Indicator LO: ''
(0020, 4000) Image Comments LT: 'Cor IMR med'
(0028, 0002) Samples per Pixel US: 1
(0028, 0004) Photometric Interpretation CS: 'MONOCHROME2'
(0028, 0010) Rows US: 832
(0028, 0011) Columns US: 772
(0028, 0030) Pixel Spacing DS: [0.4507772, 0.4507772]
(0028, 0100) Bits Allocated US: 16
(0028, 0101) Bits Stored US: 12
(0028, 0102) High Bit US: 11
(0028, 0103) Pixel Representation US: 0
(0028, 1050) Window Center DS: [50, 50]
(0028, 1051) Window Width DS: [350, 350]
(0028, 1052) Rescale Intercept DS: '-1024.0'
(0028, 1053) Rescale Slope DS: '1.0'
(0032, 1033) Requesting Service LO: 'CHIPSOFT'
(0040, 1001) Requested Procedure ID SH: 'CT5001IV'
(0054, 1001) Units CS: 'HU'
(00e1, 0010) Private Creator LO: 'ELSCINT1'
(00e1, 1036) Private tag data CS: 'YES'
(00e1, 1040) [Image Label] SH: 'Cor IMR med'
(00e1, 1046) Private tag data OB: Array of 512 elements
(01f1, 0010) Private Creator LO: 'ELSCINT1'
(01f1, 1001) [Acquisition Type] CS: 'SPIRAL'
(01f1, 1002) [Unknown] CS: 'STANDARD'
(01f1, 100e) [Unknown] FL: 0.0
(01f1, 1027) [Rotation Time] DS: '0.33'
(01f1, 1032) [Image View Convention] CS: 'RIGHT_ON_LEFT'
(01f1, 104a) [Unknown] SH: 'DOM'
(01f1, 104b) [Unknown] SH: '128x0.625'
(01f1, 104d) [Unknown] SH: 'NO'
(01f1, 104e) [Unknown] SH: 'Chest'
(01f1, 1054) Private tag data IS: '11'
(01f1, 1056) Private tag data LO: '30.0451206729581'
(01f7, 0010) Private Creator LO: 'ELSCINT1'
(01f7, 1022) [Unknown] UI: 1.3.46.670589.33.1.63849049343237673200010.5507538603167078985
(07a1, 0010) Private Creator LO: 'ELSCINT1'
(07a1, 1010) [Tamar Software Version] LO: '4.0.0'
(7fe0, 0010) Pixel Data OB: Array of 309328 elements
Challenge: Identifying Safe Metadata in DICOM
Can you determine which metadata for this CT scan is likely safe, meaning it does not lead to patient identification? When would you choose to retain such data?
Metadata related to the machine, image type, and file type are generally safe. This information is particularly valuable when sorting through numerous DICOM files to locate specific types of images or when generating tabular data for harmonization purposes.
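For instance (a sketch only: the folder path and the columns collected here are illustrative, and pandas is assumed purely to build the table), you could catalogue a directory of DICOM files like this:
PYTHON
from pathlib import Path

import pandas as pd
from pydicom import dcmread

rows = []
for path in Path("data/anonym/").glob("*.dcm"):  # hypothetical folder of DICOM files
    ds = dcmread(path, stop_before_pixels=True)  # metadata only; skip the pixel data for speed
    rows.append({
        "file": path.name,
        "modality": ds.get("Modality", ""),
        "manufacturer": ds.get("Manufacturer", ""),
        "series_description": ds.get("SeriesDescription", ""),
        "slice_thickness": ds.get("SliceThickness", ""),
    })

catalog = pd.DataFrame(rows)
print(catalog.head())
Sorting or filtering such a table then makes it easy to locate, say, all thin-slice contrast-enhanced CT series without opening each file by hand.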
We can modify elements of our DICOM metadata. First, let's check the current value of the patient's name (here using the ds dataset we read in above):
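PYTHON
# access the element value via its standard keyword;
# repr() shows the quoted form used in the dataset printout
print(repr(ds.PatientName))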
OUTPUT
'OurBeloved^Colleague'
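We can then overwrite the value, for example by grabbing the data element by its tag (a sketch; the replacement name is of course arbitrary):
PYTHON
elem = ds[0x0010, 0x0010]      # the Patient's Name data element
elem.value = "Citizen^Almoni"  # overwrite with a placeholder name
print(elem)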
OUTPUT
(0010, 0010) Patient's Name PN: 'Citizen^Almoni'
In some cases, as here, we are dealing with a standard keyword. The keyword PatientName is, in programming terms, technically a property of the FileDataset class, but here we use “keyword” to refer to it and other standard properties of the DICOM. Certain keywords can be modified by assigning to them directly, as follows (a sketch, continuing with the same ds):
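PYTHON
# keywords give attribute-style access, so plain assignment is enough
ds.PatientName = "Almoni^Shmalmoni"
print(ds[0x0010, 0x0010])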
OUTPUT
(0010, 0010) Patient's Name PN: 'Almoni^Shmalmoni'
You can also just set an element to empty by using None:
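PYTHON
# a sketch: assigning None empties the value while keeping the element in the dataset
ds.PatientName = None
print(ds[0x0010, 0x0010])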
OUTPUT
(0010, 0010) Patient's Name PN: None
You can also delete and add elements. After making modifications, remember to save your file:
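For instance (a sketch; the elements chosen and the output file name below are just examples):
PYTHON
# delete an element entirely by its keyword
del ds.OtherPatientIDs
# add a new element simply by assigning to a standard keyword
ds.PatientIdentityRemoved = "YES"
# write the modified dataset to disk (hypothetical output path)
ds.save_as("data/anonym/our_sample_dicom_deid.dcm")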
We recommend removing at least the patient IDs and birthdates in most cases. Additionally, consider examining the data elements ‘OtherPatientIDs’ and ‘OtherPatientIDsSequence’.
Challenge: Accessing Additional Patient Identifying Data
How can you access and print additional patient identifying data? Hint: Refer to the documentation and compare with what we have already printed.
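One possible sketch of a solution, using keywords that appear in the printout above (get() returns None if an element has already been removed):
PYTHON
# elements that directly identify the patient
print(ds.get("PatientID"))
print(ds.get("PatientBirthDate"))
print(ds.get("PatientSex"))
print(ds.get("PatientAge"))
print(ds.get("OtherPatientIDs"))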
Pydicom offers a wide range of capabilities. You can visualize your DICOM data in a hierarchical tree format for user-friendly GUI reading. It supports downsizing images and handling waveform data such as EKGs. By integrating with Matplotlib, you can load and plot files seamlessly. Before adding additional libraries, explore Pydicom’s full potential to leverage its extensive functionalities.
Key Points
- Certain metadata should almost always be removed from DICOM files before sharing
- Automated tools are available to strip metadata from DICOMs, but manual verification is necessary due to inconsistencies in how fields are utilized
- Several Python libraries enable access to DICOM metadata
- Sharing only image files such as JPEGs or NIfTI can mitigate risks associated with metadata
- Imaging data alone, even without explicit metadata, can sometimes lead to patient identification
- You may need to preprocess images themselves so patients are de-identified
- Tools exist to deface images to further protect patient identity
Content from Generative AI in Medical Imaging
Last updated on 2024-08-15 | Edit this page
Overview
Questions
- What is generative AI?
- How can generative AI be safely used in my work?
Objectives
- Showcase practical examples of generative AI applications
- Explore critical considerations regarding the risks and challenges associated with generative AI
Introduction
Generative artificial intelligence (AI) includes technologies that create or generate data. Since the early 2020s, there has been significant attention on large language models (LLMs) like ChatGPT, Copilot, and LLaMA, alongside image generation tools such as Stable Diffusion, Midjourney, and DALL-E. These tools not only create images but can also manipulate styles.
The applications of generative AI span widely across fields like medical imaging, where they hold potential for image interpretation and numerous other tasks. Of particular interest to participants in this course are the capabilities of LLMs to generate code and produce large volumes of synthetic data. Ongoing research continues to explore these capabilities.
However, the safety implications of these technologies remain a subject of debate. Depending on the software or model used, data entered into the system may become the property of the software’s creators. It is crucial to exercise caution when inputting sensitive information, such as patient data, into these systems. Understanding where and how data is stored (i.e., whether on your servers or in the cloud) is essential to safeguard privacy and confidentiality.
Mastering the Use of Generative AI Tools
Architectures of Generative AI
Generative AI encompasses models capable of generating new data across various formats: text, images, video, or audio. Several prominent architectures, such as generative adversarial networks (GANs), variational autoencoders, diffusion models, and transformers, achieve this capability. While the technical intricacies of these architectures exceed this course’s scope, understanding their operational principles can illuminate the nature of their outputs. For instance, models that treat data as sequences (e.g., of pixels or words) predict the next element based on probabilities derived from the training dataset, potentially perpetuating common patterns rather than reflecting absolute truths.
However, using generative algorithms carries inherent risks, including inadvertently embedding biases into both data and algorithms. For example, while human experts generally cannot infer a patient’s ethnicity from body imaging alone, recent studies demonstrate that certain AI algorithms can do so by leveraging subtle correlations in the data, which raises ethical concerns. This represents just one facet of the risks associated with these technologies, which we will explore further in the following safety section.
Safely Leveraging Generative AI
- Never upload non-anonymized sensitive patient data outside your secure servers
- Avoid using sensitive patient data with online tools (e.g., ChatGPT, or integrated tools such as Copilot in MS Office)
- Be mindful of internet connectivity with your code editors and tools
- Prefer systems that operate locally (on your machine or hospital servers)
- Acknowledge that tools may generate erroneous information (hallucinations)
- Don’t assume code generated by an LLM is free of bugs or harmful code
- Address intellectual property and ethical considerations when utilizing generative AI models
- Document model usage thoroughly in academic work, detailing versions and exact applications
- Recognize that the training corpus of such models may introduce biases
Stable Diffusion is an open-source tool widely used for image generation, capable of running locally on your machine. While older versions typically require GPUs, recent builds can reportedly also run on CPUs, albeit much more slowly. For efficient large-scale image generation, a GPU is recommended to save time.
One practical limitation of Stable Diffusion and similar tools is that customizing the model with your own datasets (i.e., fine-tuning) is not straightforward out of the box, which can restrict your ability to adapt it to specific needs.
Challenge: Assessing the Risks of Local Model Deployment
Can you identify a few risks associated with running a model entirely on your machine?
One significant risk of running any model locally is the potential for malicious code. Since you need to download the model to your machine, there is no guarantee that it will not contain malware. To mitigate this risk, we recommend using open-source models, which allow for community scrutiny and transparency.
Another concern is the environmental impact of model usage. Training and maintaining these models consume considerable resources, including carbon and water. We advise against running models like ChatGPT continuously due to their substantial environmental footprint. Even when operating a model locally, there are inherent risks, such as relying on a smaller training dataset to fit on your machine, which could compromise the model’s quality and performance.
Maximizing the Effectiveness of Generative AI
The output of generative tools depends heavily on the specific words or images you provide as input; crafting and refining these inputs is known as prompt engineering.
To create content that balances a dataset for machine learning, start by analyzing the dataset’s balance according to important labels. Then, describe images with these labels and build prompts based on the descriptions of label categories with fewer images. This approach helps focus the model’s use on generating useful outcomes while minimizing risks.
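As a small illustration of that workflow (the label names and prompt template below are hypothetical), you might count how often each label occurs and then generate prompts only for the under-represented categories:
PYTHON
from collections import Counter

# hypothetical labels for an imaging dataset
labels = ["effusion", "normal", "normal", "normal", "normal", "pneumonia"]

counts = Counter(labels)
target = max(counts.values())  # aim to match the largest class

prompts = []
for label, n in counts.items():
    # build one prompt per missing image for this label
    prompts.extend(
        [f"frontal chest X-ray showing {label}, photorealistic, high detail"] * (target - n)
    )

print(prompts[:3])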
Keep in mind that content generation can be a somewhat stochastic process. Therefore, you may need to experiment with and refine your prompts and even the sequences of prompts to achieve the desired results.
Effective Prompting Tips
- Ensure accuracy in spelling and grammar to avoid misdirection
- Provide clear and detailed prompts
- Explain the context or goal (e.g., “helping my son… can you explain it in a specific way?”)
- Try different wordings to refine results
- Fact-check thoroughly
  - First, ask the tool to fact-check its output
  - Then, verify using search engines and experts
  - Never fully trust references or facts generated by an LLM; seek human-generated content for confirmation
- Clearly indicate if you need a particular format (e.g., bullet points, square images)
- Use prompts like “you are a Python expert” or “you are an expert pathologist” to guide the tool
- For more accurate outputs, fine-tune foundation models with your specific data
These tips will help you optimize the quality and relevance of the content generated by AI tools.
Attention to detail in creating prompts is critical. For example, the following image was generated by a popular AI tool from a prompt that misspelled “X-ray”:
While this result is laughable, more subtle mistakes can be harder to identify, especially for those without extensive training in a specific field. Therefore, having a specialist review the generated data is crucial to ensure its validity.
It is also important to remember that the training data for some algorithms is often sourced from the internet, which includes a lot of irrelevant or misleading content. For instance, while there are many cute cat pictures available, there are very few good examples of rare medical conditions like desmoplastic infantile ganglioglioma. Additionally, using terms like “CAT scan” (historically referring to computed axial tomography) might result in images of cats rather than the intended medical imagery:
Key Points
- Generative programs can create synthetic data, potentially enhancing various algorithms
- Generative AI models have inherent limitations
- Running generative AI programs locally on your own server is safer than using programs that send prompts to external servers
- Exercise caution when entering patient data into generative AI programs
- Numerous policies exist to ensure the safe and ethical use of generative AI tools across institutions