Content from Course Introduction
Last updated on 2024-08-14
Estimated time: 10 minutes
Overview
Questions
- How can I use this course to be better at my research?
Objectives
- Explain how to get the most from the course
- Demonstrate and explain how the course will be laid out
This is a lesson created in the style of The Carpentries. It is written with the assumption that you already have basic skills in Git, Python and image processing.
The interpretation of medical images for clinical purposes requires skills that take highly trained professionals such as nuclear medicine specialists and radiologists many years to master. This course does not aim to improve such interpretive skills, but rather to enhance the computational skills needed to answer research questions involving medical images.
Some examples of the kinds of research questions that can be answered are:
Can we predict from brain MRIs that patients will develop dementia before they actually do?
Can we build machine learning models on ultrasound data which can aid in the detection of neuromuscular diseases?
Are there observable anatomical differences in the brains of autistic people at a population level?
Can we use existing medical imaging to screen for underdiagnosed conditions like osteoporosis?
You are in all likelihood here because you have a research question which can be answered with the processing and analysis of medical images. This course is meant to aid you.
Note that all figures and data presented are licensed under open-source terms.
Challenge: Can You Do It?
What is the best way to use the challenges and questions?
Do not peek at the solutions; try to solve each one yourself first. The effort will pay off.
Content from Medical Imaging Modalities
Last updated on 2024-10-15
Estimated time: 60 minutes
Overview
Questions
- What are the common types of diagnostic imaging?
- What sorts of computational challenges do they present?
Objectives
- Explain various common types of medical images
- Explain at a high level how images’ metadata is created and organized
Introduction
Medical imaging uses many technologies, including X-rays, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET) and microscopy. Although there are tendencies to use certain technologies, or modalities, to answer certain clinical questions, many modalities may provide information of interest for research questions. In order to work with digital images at scale we need to use information technology. We receive images in certain types of files, e.g., an x-ray stored at the hospital in DICOM format, but the image itself is contained as a 2D array inside a JPEG inside the DICOM. Understanding the kinds of files we are dealing with, and how the images within them were generated, can help us deal with them computationally.
Conceptually, we can think of medical images as signals. These signals need various kinds of processing before they are ‘readable’ by humans or by many of the algorithms we write.
While thinking about how the information from these signals is stored in different file types may seem less exciting than what the “true information” or final diagnosis from the image was, it is necessary to understand this to make the best algorithms possible. For example, a lot of hospital images are essentially JPEGs. This has implications in terms of image quality as we manipulate and resize the images.
Below are brief summaries of several very common imaging types. Keep in mind that manufacturers may have their own specifics in terms of file types not covered here, and there are many possibilities for how images could potentially be stored. Here we will discuss the files researchers are commonly given.
X-Rays
Historically, x-rays were the first common form of medical imaging. The diagram below should help you visualize how they are produced. The signal from an x-ray generator crosses the subject. Some tissues attenuate the radiation more than others. The signal is captured by an x-ray detector (you can think of this metaphorically like photographic film) on the other side of the subject.
As you can imagine, if you only have one view in an X-ray, every object in the path of the radiation from the generator is superimposed on every other object along that path. Even in the days of film, X-rays were often made in two views. In the case of chest X-rays these could be a posteroanterior (PA) view and a lateral view. In the case of joints the views may be specific; however, remember that in each view objects along the same line between the generator and receptor will be superimposed.
image courtesy of Radiopaedia, author and ID on image
Modern x-rays are born digital. No actual “film” is produced, rather a DICOM file which typically contains arrays in JPEG files.
We could use the metaphor of a wrapped present here. The DICOM file contains metadata wrapped around the image data. The image data itself is a set of 2D arrays, but these have been packed into a specific shape - they are "boxed" inside JPEG files, which act as containers for the pixel data. There are typically several JPEG files (emphasis on the plural) in a single DICOM file, usually containing images of the same body part acquired at different angles.
We can take x-rays from any angle and even take them repeatedly, which allows for fluoroscopy. Because fluoroscopy adds a time dimension to X-ray, the data become 3-dimensional, with X, Y and time dimensions. Below is a fluoroscopy image of a patient swallowing barium.
image courtesy of Ptrump16, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons
Fluoroscopy images are stored in a DICOM but can be displayed as movies because they are typically cine-files. Cine- is a file format that lets you store images in sequence with a frame rate.
Computed Tomography and Tomosynthesis
There are several kinds of tomography. This technique produces 3D-images, made of voxels, that allow us to see structures within a subject. CTs are extremely common, and helpful for many diagnostic questions, but have certain costs in terms of not only time and money, but also radiation to patients.
CTs and tomosynthetic images are produced with similar technology. One key difference is that in a CT the image is based on a 360-degree capture of the signal. You can conceptualize this as a spinning donut with the generator and receptor opposite each other. The raw data of a CT is a sinogram. Only by processing this data do we get what most people would recognize as a CT. At this level of processing there are already choices affecting the data we get. Let's examine two ways to process our sinograms:
PYTHON
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import iradon
from skimage.transform import iradon_sart
# load a sinogram of a simple phantom of the head
sino = np.load('data/medical/Schepp_Logan_sinogram.npy')
# make a filtered back projection reconstruction
theta = np.linspace(0.0, 180.0, max(sino.shape), endpoint=False)
reconstruction_fbp = iradon(sino, theta=theta, filter_name='ramp')
# make a reconstruction with Simultaneous Algebraic Reconstruction Technique
reconstruction_sart = iradon_sart(sino, theta=theta)
# plot with matplotlib
fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(9, 5), sharex=True, sharey=True)
ax1.set_title("Sinogram")
ax1.imshow(sino, cmap=plt.cm.Greys_r,)
ax2.set_title("Reconstruction\nFiltered back projection")
ax2.imshow(reconstruction_fbp, cmap=plt.cm.Greys_r)
ax3.set_title("Reconstruction\nSART")
ax3.imshow(reconstruction_sart, cmap=plt.cm.Greys_r)
ax4.set_title("Difference\n between reconstructions")
ax4.imshow(reconstruction_sart - reconstruction_fbp, cmap=plt.cm.Greys_r)
plt.show()
OUTPUT
Images generated from the Shepp–Logan phantom
While you may get an already processed CT (some commercial machines come with proprietary reconstruction algorithms which will already have been executed), it is not uncommon to get CTs as DICOM CT projection data (DICOM-CT-PD) files which can then be processed before viewing, or in some cases stored as other file types. As shown in the code above, there is more than one way to process the data into a radiologist-friendly CT. Filtered back projection and an algebraic reconstruction technique are shown, but there are other methods, such as iterative reconstruction, convolution back projection and even deep learning based methods.
Tomosynthesis makes X-ray based images using a limited angle instead of going all the way around the patient as in CT. The data from a tomosynthetic image is then processed so that you get multiple angles visible. This gets around the issue of overlapping objects in a plain film X-ray. In both the case of CT and tomosynthesis, the image output is made by processing the originally acquired data. Although most researchers work with already processed images, it is important to keep in mind that in theory the originally acquired data can be processed in a variety of ways.
Ultrasounds
Ultrasound can produce multiple complex types of images. Ultrasound uses high-frequency sound waves, sent and captured by a piezoelectric probe (also known as a transducer), to produce images.
Just as different tissues attenuate radiation differently, different tissues attenuate the sound waves differently. To be more precise different tissues reflect and absorb the waves differently and this can help us create images after some processing of the signal.
These images can be captured in rapid succession over time, so they can be saved as cine-files inside DICOMs. On the other hand, the sonographer can choose to record only a single ‘frame’, in which case a 2D-array will ultimately be saved.
Typically, sonographers produce a lot of B-mode images, but B-mode is far from the only type of ultrasound. M-mode, or motion mode, can also capture motion, like the cine-files in B-mode, but puts it into a single 2D array of one line of the image over time. In the compound image below you can see a B-mode 2D image and an M-mode made along the line marked in it.
What Are Some Disadvantages to Ultrasounds in Terms of Computational Analysis?
Ultrasounds images are operator-dependent, often with embedded patient data, and the settings and patients’ positions can vary widely.
Challenge: How to Reduce These Problems?
How can we optimize research involving ultrasounds in terms of the challenges above?
Possible solutions include:
- Reviewing metadata on existing images so it can be matched by machine type
- Training technicians to use standardized settings only
- Configuring machines to use only one power setting
- Image harmonization/normalization algorithms
Magnetic Resonance Imaging
MRIs are images made by utilizing some fairly complicated physics concerning what we can do to protons (abundant in human tissue) with magnets and radiofrequency waves, and how we capture their signal. Different ordering and timing of radiofrequency pulses and different magnetic gradients give us different MRI sequences. The actual signal of an anatomical MRI typically needs to be processed via Fourier transforms and some other computational work before it is recognizable as anatomy. The raw data is referred to as the k-space data, and this can be kept in vendor-specific formats or open common formats, e.g., ISMRMRD (the ISMRM Raw Data format). In practice, we rarely use the k-space data (unless perhaps we are medical physicists) for research on medical pathology. Nonetheless, researchers developing new MRI sequences are very interested in such data, and typically in transforming it as quickly as possible. There are many ways the raw data could be transformed or used to produce an MRI. While an inverse Fourier transform is typical, a Hartley transform could be used, and some scientists even use deep learning based methods. Let's look at k-space with a viridis color map:
Sourced from the NYU fastMRI dataset, 2024 (Knoll et al., Radiol Artif Intell. 2020;2(1):e190007. doi: 10.1148/ryai.2020190007, https://pubs.rsna.org/doi/10.1148/ryai.2020190007; and the arXiv paper, https://arxiv.org/abs/1811.08839).
Let’s do an example of a k-space transform
PYTHON
slice_kspace = np.load('data/medical/slice_kspace.npy')
# show shape
print(slice_kspace.shape)
# show type
print(type(slice_kspace))
# print type of an example pixel
print(type(slice_kspace[3,3]))
OUTPUT
(640, 368)
<class 'numpy.ndarray'>
<class 'numpy.complex64'>
Note that the array contains numbers with an imaginary component, which is why the type is complex. We can extract and graph the real and imaginary parts, and also graph a transformation:
PYTHON
real_from_slice_kspace = slice_kspace.real
imaginary_from_slice_kspace = slice_kspace.imag
# make an inverse fourier
def inverse_fft2_shift(kspace):
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace, axes=(-2, -1)), norm='ortho'), axes=(-2, -1))
# graph both
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(9, 5), sharex=True, sharey=True)
ax1.set_title("K-space real part")
ax1.imshow(real_from_slice_kspace, cmap=plt.cm.Greys_r,)
ax2.set_title("K-space imaginary part")
ax2.imshow(imaginary_from_slice_kspace, cmap=plt.cm.Greys_r,)
ax3.set_title("Transformed K-space")
ax3.imshow(np.abs(inverse_fft2_shift(slice_kspace)), cmap=plt.cm.Greys_r)
Hopefully you can see that our k-space data, like our sinogram, is not very human-readable, whereas the transformed image is recognizable as a knee. A transformed image, one a radiologist will be able to read, is often what we are given. This final product we are used to looking at is such a post-processed 3D array wrapped inside a DICOM file. We can transform the image, and parts of the metadata, into a variety of file types commonly used in research. These file types will be covered in more detail later in the course.
CT versus MRI
After processing, both CT and standard MRI give a 3D image. However, it is important to understand that there are differences. A standard MRI sequence will give better differentiation between various soft tissues, whereas a CT will provide better images of bones. CTs can be acquired more quickly and cheaply, but have the hidden cost of radiation.
Challenge: CT, MRIs and Artifacts?
You are researching tumors in adults. You team up with the best radiologist you can find and ask for imaging. She hands you DICOM files of some patient MRIs and CTs, and states, “These are exactly the images I use. I have checked that they are all without artifacts.” You have the images straight from the radiologist; could there potentially be any artifacts?
Strictly speaking, yes, they could have artifacts. Both CT and MRI are reconstructed from originally acquired data, and the reconstruction can introduce artifacts. The radiologist thinks of artifacts as things like motion blurring from when the patient moves, or wrap-around in MRIs when the field of view was set too small; these are obvious to the human eye. However, technically every reconstruction algorithm can potentially introduce tiny artifacts that are not visible to the human eye.
Other Image Types
Nuclear medicine scans and pathology images are also broadly available inside hospitals.
Nuclear medicine images, e.g. PET and SPECT, can be 2D or 3D images based on the signal of a radiotracer given to the patient. As the radiotracer is taken up and decays, it gives off gamma rays, which are then used as the signal. This type of image can be extremely useful in processes like looking for metastases. While 3D images registered with a CT or MRI give anatomic precision, in some cases a 2D image answers basic questions. In the image below, a 2D bone scan shows metastases from prostate cancer.
sourced from RadsWiki, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons
Pathology is currently undergoing a revolution of digitalization, and a typical file format has not emerged yet. Pathology images may be DICOM, but could also be stored as specific kinds of TIFF files or other file types. Pathology is an entire medical discipline in which various types of images are used, both 2D and 3D, and often multi-channel, i.e. in color. However, one classic type of pathology image is that of a stained tissue slide seen through a microscope. With various stains we can see what is going on in a tissue at a cellular level. In the image below you can see macrophages that have come to surround actinomyces in someone's lung.
sourced from Yale Rosen (USA), Actinomycosis, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=31127755
Beyond the more common types of imaging, researchers are actively looking into new forms of imaging. Some add new information to old modalities, like contrast-enhanced ultrasound. Other new forms of imaging are novel in terms of the signal, such as terahertz imaging, which uses a previously 'unused' part of the electromagnetic radiation spectrum. As you might guess, the more novel the imaging, usually the less consolidation there is around file types and how they are organized. It is useful to remember that all these file types, whether for established image types or novel ones, are sorts of 'containers' for the 'payload' of the actual images, which are the arrays. Often we simply need to know how to get the payload array out of its container and/or where to find certain metadata.
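As a quick illustration of pulling the payload array out of its container, here is a hedged sketch using pydicom (a library explored later in this course); the file name is hypothetical and not part of the lesson data.
PYTHON
import pydicom
# read a hypothetical DICOM file; the dataset object is the "container"
ds = pydicom.dcmread('example.dcm')
print(ds.Modality)       # a piece of metadata, if present in the file
array = ds.pixel_array   # the image "payload" as a numpy array
print(array.shape)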
There is less standardization around file formats of certain types of imaging.
For example, while typical radiological images have settled on how they are recorded in DICOM, more novel sequences, such as arterial spin-labeled ones, do not yet have a standardized way of being recorded in it. Some newer forms of imaging, such as electrical impedance tomography, use entirely different kinds of technologies and signals from anything already existing. When it comes to truly novel imaging types there can be no standardization at all.
Standard Image types
| Type | Signal | Standard File | Image file | Image array |
|---|---|---|---|---|
| X-ray | ionizing radiation | DICOM | JPEG 2000 | 2D arrays |
| Standard CT | ionizing radiation | DICOM | raw or compressed voxel data | 3D arrays |
| Ultrasound (B-mode) | high frequency sound | DICOM | CINE | 2D arrays or 4D tensors |
| MRI (spin or gradient echo) | patient's molecules | DICOM | raw or compressed voxel data | 3D arrays |
| Digital pathology slides | light through stained tissue | no consensus | often converted to TIFF | multichannel 2D or 3D arrays |
The above table is a drastic simplification as there are always cases where people use novel files or novel designs. There are also many newer technologies like 4D CT. Nonetheless it is useful to know some of the typical data structures you will usually find.
Key Points
- Each imaging modality provides distinct sets of information
- In computational imaging, images are essentially arrays, although embedded in additional data structures
- Many images we get, e.g. MRIs and CTs, have already been processed with algorithms to make them human-readable
- Research should be thoughtfully designed, taking into account the constraints and capabilities inherent in human capacities
- We can expect the emergence of additional imaging modalities in the future
Content from Working with MRI
Last updated on 2024-12-29
Estimated time: 90 minutes
Overview
Questions
- What kinds of MRI are there?
- How are MRI data represented digitally?
- How should I organize and structure files for neuroimaging MRI data?
Objectives
- Mention common kinds of MRI imaging used in research
- Show the most common file formats for MRI
- Introduce MRI coordinate systems
- Load an MRI scan into Python and explain how the data is stored
- View and manipulate image data
- Explain what BIDS is
- Explain advantages of working with Nifti and BIDS
- Show a method to convert from DICOM to BIDS/NIfTI
Introduction
When some programmers open their IDE, they type pwd because this command shows them which directory they are in, and then they often type git branch to see which branch they are on. When many clinicians open an MRI, they look at the anatomy, but also figure out which sequence and type of MRI they are looking at. We all need to begin with a rough idea of where in the world we are and which direction we are facing, so to speak. As a digital researcher, orienting yourself in terms of MRIs and the related computation is important. In terms of computation, MRIs, particularly neuro MRIs, have some quirks that make it worthwhile to think of them as a separate part of the universe from other medical imaging. Three critical differences between MRIs and other medical imaging are display conventions for image orientation, typical file types, and typical packages used for processing. In this lesson we will cover these three issues.
As a researcher, we generally recommend familiarizing yourself with the various possible MRI sequences. Some sequences are much better suited to answering certain questions than others. Broadly, we can divide MR techniques into structural (e.g. T1, T2 and so on), functional, diffusion, perfusion, angiographic techniques and spectroscopy. The sequences you work with determine the shape of the files to expect. For example, for structural imaging a 3D array is the norm, but for diffusion imaging you have a 4D tensor plus .bval and .bvec files.
If you work directly with a radiology department you will usually get DICOM files that contain whatever sequences were done. However, if you obtain images from elsewhere they may come in other formats.
File formats
From the MRI scanner, images are initially collected in the DICOM format but can be converted to other formats to make working with the data easier. Some file formats can be converted to others, for example NIfTI or Analyze files can be converted to MINC, but conversions cannot be made between all file formats in all directions.
Common MRI File Formats
To understand common file formats please review the table on neuroimaging file formats here. To get even more information than that table we have included links to documentation websites:
NIfTI is one of the most ubiquitous file formats for storing neuroimaging data. We can convert DICOM data to NIfTI using the dcm2niix software. Many people prefer working with dcm2bids to configure the long bash commands needed for dcm2niix.
For example, if you want to convert a hypothetical DICOM directory (/hypothetical) into a hypothetical directory called /converted on your machine, you can use the dcm2bids helper in the following way:
dcm2bids_helper -d /hypothetical -o /converted
One of the advantages of working with dcm2niix, whether you work with it directly or through a tool like dcm2bids, is that it can be used to create Brain Imaging Data Structure (BIDS) datasets, since it outputs a NIfTI and a JSON with metadata ready to fit into the BIDS standard. BIDS is a widely adopted standard for how data from neuroimaging research can be organized. The organization of data and files is crucial for seamless collaboration across research groups and even between individual researchers. Some pipelines assume your data is organized in the BIDS structure, and these are sometimes called BIDS Apps.
Some of the more popular examples are:
- fmriprep
- freesurfer
- micapipe
- SPM
- MRtrix3_connectome
We recommend the BIDS starter-kit website for learning the basics of this standard. Working with BIDS, and with much of the array of tools available for brain MRIs, is often facilitated by some familiarity with the command line. This can present a bit of a problem in terms of scientific reproducibility. One option is to leave bash scripts and good documentation in a repository. Another is to use a pythonic interface to such command-line tools. One possibility for this will be discussed in the next section.
Libraries
If you look at Python code for research on medical imaging of most of the body, a few libraries, such as SITK, will pop up again and again. The story with MRIs of the brain is entirely different. Most researchers will use nibabel and related libraries. These libraries are made to read, write, and manipulate neuroimaging file formats efficiently, and a lot of code has been developed with them for tasks specific to brain MRIs. They also tend to integrate better with existing pipelines for brain MRI. nipype is a popular tool which lets users harness popular existing pipelines from Python. If you are trying to compare outputs with an existing study, it is worth considering using the same pipeline and the same software version of that pipeline; then you know that differences between your outcomes are not artifacts of the software.
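As a small illustration of the kind of pythonic interface nipype provides, the sketch below wraps FSL's brain extraction tool. This is only a sketch under assumptions: it requires nipype and a working FSL installation, and the file names are hypothetical rather than part of this lesson's data.
PYTHON
# a minimal sketch of calling an FSL tool through nipype
from nipype.interfaces import fsl
bet = fsl.BET()
bet.inputs.in_file = 'sub-01_T1w.nii.gz'          # hypothetical input scan
bet.inputs.out_file = 'sub-01_T1w_brain.nii.gz'   # hypothetical output name
result = bet.run()   # runs FSL's bet under the hood
print(result.outputs)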
Sourced from https://www.nipreps.org/
Nipreps and Beyond:
- There are many, many packages for medical image analysis
- There are known pre-built pipelines with possibilities in python
- Your pipeline will probably begin with cleaning and preparing data
- You can mix and match parts of pipelines with NiPype
While you may harness sophisticated tools like FreeSurfer or FSL to do complex tasks like segmentation and registration in ways that match existing research, you should still check your data by hand. At some point you should open images and examine them, both in terms of the imaging itself (which can be done with a pre-built viewer) and in terms of computational aspects like metadata and shape. For these mundane tasks we recommend nibabel.
Next, we’ll cover some details on working with NIfTI files with Nibabel.
Reading NIfTI Images
NiBabel is a Python package for reading and writing neuroimaging data. To learn more about how NiBabel handles NIfTIs, refer to the NiBabel documentation on working with NIfTIs, which this episode heavily references. This part of the lesson is also similar to an existing lesson from The Carpentries Incubator, namely Introduction to Working with MRI Data in Python. This is because, in terms of reading and loading data and accessing the image, there are limited possibilities.
First, use the load() function to create a NiBabel image object from a NIfTI file. When loading a NIfTI file with NiBabel, you get a specialized data object that includes all the information stored in the file. Each piece of information is referred to as an attribute in Python's terminology. To view all these attributes, simply type t2_img. followed by Tab. We will focus on discussing mainly two attributes (header and affine) and one method (get_fdata).
1. Headers
The header contains metadata about the image, including image dimensions, data type, and more.
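The code cell that loads the image is not reproduced in this rendering of the lesson; below is a minimal sketch that would produce a header printout like the output that follows, assuming the phantom NIfTI file used later in this episode.
PYTHON
import nibabel as nib
# load the T2 phantom image used throughout this episode
t2_img = nib.load('data/mri/OBJECT_phantom_T2W_TSE_Cor_14_1.nii')
# grab the header object and print it
t2_hdr = t2_img.header
print(t2_hdr)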
OUTPUT
<class 'nibabel.nifti1.Nifti1Header'> object, endian='<'
sizeof_hdr : 348
data_type : b''
db_name : b''
extents : 0
session_error : 0
regular : b''
dim_info : 0
dim : [ 3 432 432 30 1 1 1 1]
intent_p1 : 0.0
intent_p2 : 0.0
intent_p3 : 0.0
intent_code : none
datatype : int16
bitpix : 16
slice_start : 0
pixdim : [1. 0.9259259 0.9259259 5.7360578 0. 0. 0.
0. ]
vox_offset : 0.0
scl_slope : nan
scl_inter : nan
slice_end : 29
slice_code : unknown
xyzt_units : 2
cal_max : 0.0
cal_min : 0.0
slice_duration : 0.0
toffset : 0.0
glmax : 0
glmin : 0
descrip : b'Philips Healthcare Ingenia 5.4.1 '
aux_file : b''
qform_code : scanner
sform_code : unknown
quatern_b : 0.008265011
quatern_c : 0.7070585
quatern_d : -0.7070585
qoffset_x : 180.81993
qoffset_y : 21.169691
qoffset_z : 384.01007
srow_x : [1. 0. 0. 0.]
srow_y : [0. 1. 0. 0.]
srow_z : [0. 0. 1. 0.]
intent_name : b''
magic : b'n+1'
t2_hdr is a dictionary-like object, i.e. a container that holds pairs of objects - keys and values. Let's take a look at all of the keys. Similar to t2_img, in which attributes can be accessed by typing t2_img. followed by Tab, you can do the same with t2_hdr. In particular, we'll be using a method belonging to t2_hdr that will allow you to view the keys associated with it.
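The cell itself is not shown here; a one-line sketch that would list the keys as in the output below, assuming t2_hdr from the cell above.
PYTHON
# list the metadata fields available in the NIfTI header
t2_hdr.keys()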
OUTPUT
['sizeof_hdr',
'data_type',
'db_name',
'extents',
'session_error',
'regular',
'dim_info',
'dim',
'intent_p1',
'intent_p2',
'intent_p3',
'intent_code',
'datatype',
'bitpix',
'slice_start',
'pixdim',
'vox_offset',
'scl_slope',
'scl_inter',
'slice_end',
'slice_code',
'xyzt_units',
'cal_max',
'cal_min',
'slice_duration',
'toffset',
'glmax',
'glmin',
'descrip',
'aux_file',
'qform_code',
'sform_code',
'quatern_b',
'quatern_c',
'quatern_d',
'qoffset_x',
'qoffset_y',
'qoffset_z',
'srow_x',
'srow_y',
'srow_z',
'intent_name',
'magic']
Notice that methods require you to include () at the end of them when you call them whereas attributes do not. The key difference between a method and an attribute is:
- Attributes are variables belonging to an object and containing information about their properties or characteristics
- Methods are functions that belong to an object and operate on its attributes. They differ from regular functions by implicitly receiving the object (self) as their first argument.
When you type in t2_img. followed by Tab, you may see that attributes are highlighted in orange and methods highlighted in blue.
The output above is a list of keys you can use to access the values of t2_hdr. We can access the value stored by a given key by indexing the header with that key, as shown below.
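A sketch, assuming t2_hdr from earlier; any key from the list above will work.
PYTHON
# look up a single header field by its key
t2_hdr['dim']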
Challenge: Extract Values from the NIfTI Header
Extract the ‘pixdim’ field from the NiFTI header of the loaded image.
2. Data
As you've seen above, the header contains useful information about the properties (metadata) associated with the MR data we've loaded. Now we'll move on to loading the actual image data itself. We can achieve this by using the method t2_img.get_fdata():
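The cell is not reproduced in this rendering; a minimal sketch that would produce the array output below.
PYTHON
# pull the image data out of the NiBabel object as a floating-point array
t2_data = t2_img.get_fdata()
t2_data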
OUTPUT
array([[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
...,
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]])
The initial observation you might make is the prevalence of zeros in the image. This abundance of zeros might prompt concerns about the presence of any discernible content in the picture. However, when working with radiological images, it’s important to keep in mind that these images frequently contain areas of air surrounding the objects of interest, which appear as black space.
What type of data is this exactly, in a computational sense? We can determine this by calling the type() function on t2_data:
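A one-line sketch, assuming t2_data from the earlier cell.
PYTHON
# check the computational type of the image data
type(t2_data)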
OUTPUT
numpy.ndarray
The data is stored as a multidimensional array, which can also be accessed through the file's dataobj property:
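Again a one-line sketch, assuming t2_img from earlier.
PYTHON
# the array proxy, which avoids loading the whole array into memory
t2_img.dataobj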
OUTPUT
<nibabel.arrayproxy.ArrayProxy at 0x20c63b5a4a0>
As you might guess, there are differences in how your computer handles something made with dataobj and an actual array. These differences affect memory and processing speed, which are not trivial issues if you deal with a very large dataset of MRIs. You can save time and memory by being conscious about what is cached, and by using the dataobj property when dealing with slices of the array, as detailed here.
Challenge: Meaning of Attributes of the Array
How can we see the number of dimensions in the t2_data array? Once again, all of the attributes of the array can be seen by typing t2_data. followed by Tab. What is the shape of the image? Can you make a guess about the size of a voxel based on the numbers you have here? Why or why not?
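One way to check, as a sketch of a possible solution:
PYTHON
# number of dimensions of the image array
t2_data.ndim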
OUTPUT
3
t2_data contains 3 dimensions. You can think of the data as a 3D version of a picture (more accurately, a volume).
Image by Tropwine, sourced from Wikimedia Commons (2024).https://commons.m.wikimedia.org/wiki/File:3D_array_diagram.svg; Creative Commons Attribution 4.0 International License
Remember typical 2D pictures are made out of pixels, but a 3D MR image is made up of 3D cubes called voxels.
Sourced with minor modification from Ahmed, M., Garzanich, M., Melaragno, L.E. et al. Exploring CT pixel and voxel size effect on anatomic modeling in mandibular reconstruction. 3D Print Med 10, 21 (2024). https://doi.org/10.1186/s41205-024-00223-0; Creative Commons Attribution 4.0 International License
With this in mind we examine the shape:
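A one-line sketch, assuming t2_data from earlier.
PYTHON
# the number of voxels along each dimension
t2_data.shape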
OUTPUT
(432, 432, 30)
The three numbers given here represent the number of values along each respective dimension (x, y, z). This image was scanned in 30 slices, each with a resolution of 432 x 432 voxels. If each voxel were 1 millimeter, then our image would represent something only 3 cm "tall" (so to speak), which seems unlikely. However, the object here is not a human, and could in theory be scanned on a special MRI machine of any size. We cannot guess the size of a voxel from the numbers printed here alone. We will learn one method to do this later in the episode.
Let’s see the type of data inside of the array.
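Again as a sketch:
PYTHON
# the numeric type of each element (voxel) in the array
t2_data.dtype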
OUTPUT
dtype('float64')
This tells us that each element in the array (or voxel) is a floating-point number. The data type of an image controls the range of possible intensities: as the number of possible values increases, the amount of space the image takes up in memory also increases.
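The cell that produced the two numbers below is not shown; a sketch:
PYTHON
# the range of intensity values in the volume
print(t2_data.min())
print(t2_data.max())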
OUTPUT
0.0
630641.0785522461
For our data, the range of intensity values goes from 0 (black) to more positive values (whiter).
To examine the value of a specific voxel, you can access it using its indices. For example, if you have a 3D array called data, you can retrieve the value of a voxel at coordinates (x, y, z) with the syntax data[x, y, z]. This gives you the value stored at the voxel located at the specified index (x, y, z). Make sure that the indices are within the bounds of the array dimensions.
To inspect the value of a voxel at coordinates (9, 19, 2), you can use the following code:
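A sketch of that cell, assuming t2_data from earlier:
PYTHON
# intensity value of the voxel at index (9, 19, 2)
t2_data[9, 19, 2]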
OUTPUT
0.
This command retrieves and prints the intensity value at the specified voxel. The output represents the signal intensity at that particular location.
Next, we will explore how to extract and visualize larger regions of interest, such as slices or arrays of voxels, for more comprehensive analysis.
Working with Image Data
When we think and speak about "slicing" colloquially, we often think about a 2D slice of our data.
Figure with adaptation from Valdes-Hernandez, P.A., Laffitte Nodarse, C., Peraza, J.A. et al. Toward MR protocol-agnostic, unbiased brain age predicted from clinical-grade MRIs. Sci Rep 13, 19570 (2023). https://doi.org/10.1038/s41598-023-47021-y ; Creative Commons Attribution 4.0 International License
From left to right: coronal, sagittal and axial slices of a brain.
Let’s select the 10th slice in the z-axis of our data:
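The cell is not shown in this rendering; a sketch, noting that the 10th slice corresponds to index 9 with zero-based indexing. The z_slice variable is reused later in this episode.
PYTHON
# take the 10th slice along the z-axis (index 9, zero-based)
z_slice = t2_data[:, :, 9]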
This is similar to the indexing we did before to select a single voxel. However, instead of providing a value for each axis, the : indicates that we want to grab all values from that particular axis. The nibabel documentation gives more details on how to use the slicer attribute of a nibabel image; using this attribute can help make code more efficient.
Challenge: Slicing MRI Data
Write a Python function using nibabel that takes an image path string and coordinates for any slices desired on the x, y and z axes, and plots those image slices. Then use the function to display slices of the T2 image at x = 9, y = 9, and z = 9.
PYTHON
# if in a new notebook re-import
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt

def slice_image_data(file_path, x, y, z):
    # load the image
    try:
        img = nib.load(file_path)
    except Exception as e:
        print(f"Error loading the image: {e}")
        return
    # get the image array
    data = img.get_fdata()
    # slice the image data
    # and extract a specific slice along each plane
    z_slice = data[:, :, z]  # slice with z fixed
    y_slice = data[:, y, :]  # slice with y fixed
    x_slice = data[x, :, :]  # slice with x fixed
    # make sure each slice is 2D by explicitly removing unnecessary dimensions
    z_slice = z_slice.squeeze()  # this ensures it's a 2D array
    y_slice = y_slice.squeeze()
    x_slice = x_slice.squeeze()
    print(z_slice.shape)
    # visualize the slices
    plt.figure(figsize=(12, 8))
    # plot the z slice
    plt.subplot(2, 2, 1)
    plt.imshow(z_slice.T, cmap='gray', origin='lower')
    plt.title('Z Slice')
    plt.axis('off')
    # plot the y slice
    plt.subplot(2, 2, 2)
    plt.imshow(y_slice.T, cmap='gray', origin='lower')
    plt.title('Y Slice')
    plt.axis('off')
    # plot the x slice
    plt.subplot(2, 2, 3)
    plt.imshow(x_slice.T, cmap='gray', origin='lower')
    plt.title('X Slice')
    plt.axis('off')

slice_image_data('data/mri//OBJECT_phantom_T2W_TSE_Cor_14_1.nii', 9, 9, 9)
In the above exercise you may note that many solutions do not return anything. This is not unusual in Python when we only want a graph or visualization; we really only want an artifact of the function. We could, however, return the artifact with more code if needed.
We’ve been slicing and dicing images but we have no idea what they look like in a more global sense. In the next section we’ll show you one way you can visualize it all together.
Visualizing the Data
We previously inspected the signal intensity of a single voxel.
We could look at the whole image at once by using a viewer. We can even use code to make a viewer.
PYTHON
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widget
import nibabel as nib
import importlib

class NiftiSliceViewer:
    """
    A class to examine slices of MRIs, which are in Nifti Format. Similar code appears
    in several open source liberally licensed libraries written by drcandacemakedamoore
    """

    def __init__(self, volume_str, figsize=(10, 10)):
        self.nifti = nib.load(volume_str)
        self.volume = self.nifti.get_fdata()
        self.figsize = figsize
        self.v = [np.min(self.volume), np.max(self.volume)]
        self.widgets = importlib.import_module('ipywidgets')
        self.widgets.interact(self.transpose, view=self.widgets.Dropdown(
            options=['axial', 'sagittal', 'coronal'],
            value='axial',
            description='View:',
            disabled=False))

    def transpose(self, view):
        # transpose the image to orient according to the slice plane selection
        orient = {"sagittal": [1, 2, 0], "coronal": [2, 0, 1], "axial": [0, 1, 2]}
        self.vol = np.transpose(self.volume, orient[view])
        maxZ = self.vol.shape[2] - 1
        self.widgets.interact(
            self.plot_slice,
            z=self.widgets.IntSlider(
                min=0,
                max=maxZ,
                step=1,
                continuous_update=True,
                description='Image Slice:'
            ),
        )

    def plot_slice(self, z):
        # plot the slice for the plane chosen in the widget input
        self.fig = plt.figure(figsize=self.figsize)
        plt.imshow(
            self.vol[:, :, z],
            cmap="gray",
            vmin=0,
            vmax=self.v[1],
        )

# now we will use our class on our image
NiftiSliceViewer('data/mri//OBJECT_phantom_T2W_TSE_Cor_14_1.nii')
Notice our viewer has axial, sagittal and coronal slices. These are anatomical terms that allow us to make some assumptions about where things are in space, if we assume the patient went into the machine in a certain way. When we look at an abstract image these terms have little meaning, but let's take a look at a head.
Challenge: Understanding anatomical terms for brains
Use the internet to read at least two sources on the meaning of the terms axial, coronal and sagittal. In your own words, write definitions for axial, sagittal and coronal in terms of how they slice up the head at the level of the eyes.
Axial: slices the head across both eyes, so that there is a slice separating higher and lower levels. Sagittal: slices between or to the side of the eyes, so that there is a slice separating more left from more right. Coronal: slices through both eyes, so that there is a slice separating front from back.
This brings us to the final crucial attribute of a NIfTI we will discuss: affine.
3. Affine
The final important piece of metadata associated with an image file is the affine matrix, which indicates the position of the image array data in the reference space. By reference space we usually mean a predefined system mapping to real world space if we are talking about real patient data.
Below is the affine matrix for our data:
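The cell is not shown in this rendering; a one-line sketch that would display the matrix below.
PYTHON
# the affine matrix mapping voxel indices to world coordinates
t2_img.affine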
OUTPUT
array([[-9.25672975e-01, 2.16410652e-02, -1.74031337e-05,
1.80819931e+02],
[ 2.80924864e-06, -3.28338569e-08, -5.73605776e+00,
2.11696911e+01],
[-2.16410652e-02, -9.25672975e-01, -2.03403855e-07,
3.84010071e+02],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.00000000e+00]])
To explain this concept, recall that we referred to coordinates in our data as (x,y,z) coordinates such that:
- x is the first dimension of t2_data
- y is the second dimension of t2_data
- z is the third dimension of t2_data
Although this tells us how to access our data in terms of voxels in a 3D volume, it doesn't tell us much about the actual dimensions in our data (centimetres, right or left, up or down, back or front). The affine matrix allows us to translate between voxel coordinates in (x,y,z) and world space coordinates in (left/right, bottom/top, back/front). An important thing to note is that the order of:
- Left/right
- Bottom/top
- Back/front
depends on how the affine matrix was constructed; thankfully there is in-depth coverage of the issue in the nibabel documentation. For most of the data we are dealing with, we use a RAS coordinate system, so the axes always refer to:
- Right
- Anterior
- Superior
Increasing a coordinate value in the first dimension corresponds to moving to the right of the person being scanned, and so on. In the real world, whatever orientation you put something in may make someone unhappy. Luckily you can quickly change arrays around in terms of direction by using efficient numpy functions. Two functions in numpy that can be generalized to produce any orientation of an image are numpy.flip() and numpy.rot90(); other functions are quite convenient for 2D arrays, as displayed in the flipping example below.
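To make the voxel-to-world mapping concrete first, here is a small sketch using nibabel's apply_affine helper, assuming t2_img and the voxel index used earlier in this episode.
PYTHON
from nibabel.affines import apply_affine
# map a voxel index (i, j, k) to world-space (x, y, z) coordinates in mm
apply_affine(t2_img.affine, [9, 19, 2])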
PYTHON
import numpy as np
slices = [z_slice, np.fliplr(z_slice), np.flipud(z_slice)]
fig, axes = plt.subplots(1, len(slices))
for i, slice in enumerate(slices):
    axes[i].imshow(slice, cmap="gray", origin="lower")
OUTPUT
This brings us to a final difference we must account for when we discuss neuro-MRIs: anatomy and visualization conventions.
Display conventions
When we describe imaging of any part of the body except the brain, as professionals we all agree on certain conventions. We rely on anatomical position as the basis of how we orient ourselves, and we expect the patient's right side to be on the left of our screen, among other conventions. For brain MRIs this is less true; there is a split. The issue is extremely well summarized in the nibabel documentation on radiological versus neurological conventions.
MRI processing in Python
To get a deeper view of MRI processing in Python you can explore lessons from Carpentries Incubators; namely:
Content from Registration and Segmentation with SITK
Last updated on 2024-10-08
Estimated time: 150 minutes
Overview
Questions
- What are SITK Images?
- How can registration be implemented in SITK?
- How can I segment an image in SITK?
Objectives
- Explain how to perform basic operations on SITK Images
- Explain when registration can be needed and how to register images with SITK
- Become familiar with basic segmentation algorithms available in SITK
Introduction
In the realm of medical imaging analysis, registration and segmentation play crucial roles. Registration aligns images, enabling the fusion of data or the tracking of changes over time. Segmentation identifies and labels objects of interest within images. Automation is essential due to high clinical demand, driving the development of robust algorithms for both processes. Moreover, many advanced segmentation techniques utilize image registration, whether implicitly or explicitly.
SimpleITK (SITK) is a simplified programming interface to the algorithms and data structures of the Insight Toolkit (ITK) for segmentation, registration and advanced image analysis, available in many programming languages (e.g., C++, Python, R, Java, C#, Lua, Ruby).
SITK is part of the Insight Software Consortium a non-profit educational consortium dedicated to promoting and maintaining open-source, freely available software for medical image analysis. Its copyright is held by NumFOCUS, and the software is distributed under the Apache License 2.0.
In this episode, we use a hands-on approach utilizing Python to show how to use SITK for performing registration and segmentation tasks in medical imaging use cases.
Fundamental Concepts
In this section, we’ll cover some fundamental image processing operations using SITK, such as reading and writing images, accessing pixel values, and resampling images.
Images
The fundamental tenet of an image in ITK, and consequently in SITK, is that an image is defined by a set of points on a grid occupying a physical region in space. This significantly differs from many other image analysis libraries that treat an image as an array, which has two implications: (1) pixel/voxel spacing is assumed to be isotropic and (2) there is no notion of an image's location in physical space.
SITK images are multi-dimensional (the default configuration includes images from two dimensional up to five dimensional) and can be a scalar, labelmap (scalar with run length encoding), complex value or have an arbitrary number of scalar channels (also known as a vector image). The region in physical space which an image occupies is defined by the image’s:
- Origin (vector like type) - location in the world coordinate system of the voxel with all zero indexes.
- Spacing (vector like type) - distance between pixels along each of the dimensions.
- Size (vector like type) - number of pixels in each dimension.
- Direction cosine matrix (vector like type representing matrix in row major order) - direction of each of the axes corresponding to the matrix columns.
The following figure illustrates these concepts.
In SITK, when we construct an image we specify its dimensionality, size and pixel type; all other components are set to reasonable default values (see the short sketch after this list):
- Origin - all zeros.
- Spacing - all ones.
- Direction - identity.
- Intensities in all channels - all zero.
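As a quick sketch of these defaults (the sizes and values here are arbitrary, chosen only for illustration):
PYTHON
import SimpleITK as sitk
# construct a 2D, 64 x 64, 32-bit float image; origin, spacing and
# direction take their default values
img = sitk.Image(64, 64, sitk.sitkFloat32)
print(img.GetOrigin())     # (0.0, 0.0)
print(img.GetSpacing())    # (1.0, 1.0)
print(img.GetDirection())  # (1.0, 0.0, 0.0, 1.0), i.e. identity
# the spatial metadata can be changed explicitly
img.SetOrigin((10.0, 20.0))
img.SetSpacing((0.5, 0.5))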
The tenet that images occupy a spatial location in the physical world has to do with the original application domain of ITK and SITK, medical imaging. In that domain images represent anatomical structures with metric sizes and spatial locations. Additionally, the spacing between voxels is often non-isotropic (most commonly the spacing along the inferior-superior/foot-to-head direction is larger). Viewers that treat images as an array will display a distorted image as shown below:
As an image is also defined by its spatial location, two images with the same pixel data and spacing may not be considered equivalent. Think of two CT scans of the same patient acquired at different sites. The following figure illustrates the notion of spatial location in the physical world, the two images are considered different even though the intensity values and pixel spacing are the same.
As SITK images occupy a physical region in space, the quantities defining this region have metric units (cm, mm, etc.). In general SITK assumes units are in millimeters (for historical reasons, due to the DICOM standard). In practice SITK is not aware of the specific units associated with each image; it just assumes that they are consistent. Thus, it is up to you, the developer, to ensure that all of the images you read and create use the same units.
A SITK image can have an arbitrary number of channels with the content of the channels being a scalar or complex value. This is determined when an image is created.
In the medical domain, many image types have a single scalar channel (e.g. CT, US). Another common image type is a three channel image where each channel has scalar values in [0,255], often people refer to such an image as an RGB image. This terminology implies that the three channels should be interpreted using the RGB color space. In some cases you can have the same image type, but the channel values represent another color space, such as HSV (it decouples the color and intensity information and is a bit more invariant to illumination changes). SITK has no concept of color space, thus in both cases it will simply view a pixel value as a 3-tuple.
Let’s read an example of human brain CT, and let’s explore it with SITK.
PYTHON
%matplotlib inline
import matplotlib.pyplot as plt
import SimpleITK as sitk
img_volume = sitk.ReadImage("data/sitk/A1_grayT1.nrrd")
print(type(img_volume))
print(img_volume.GetOrigin())
print(img_volume.GetSpacing())
print(img_volume.GetDirection())
OUTPUT
<class 'SimpleITK.SimpleITK.Image'>
(-77.625, -107.625, 119.625)
(0.75, 0.75, 0.75)
(0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0)
The function sitk.ReadImage loads the file as a SimpleITK.SimpleITK.Image instance, and then we can access useful methods to get an idea of what the image or images contained in the file look like. The image's dimensions have explicit accessors:
PYTHON
print(img_volume.GetSize())
print(img_volume.GetWidth())
print(img_volume.GetHeight())
print(img_volume.GetDepth())
print(img_volume.GetNumberOfComponentsPerPixel())
print(img_volume.GetPixelIDTypeAsString())
OUTPUT
(288, 320, 208)
288
320
208
1
32-bit float
Just inspecting these accessors, we deduce that the file contains a volume made of 208 images, each made of 288x320 pixels and one channel only (grayscale).
SITK Conventions
- Image access is in x,y,z order, image.GetPixel(x,y,z) or image[x,y,z], with zero-based indexing.
- If the output of an ITK filter has a non-zero starting index, then the index will be set to 0, and the origin adjusted accordingly.
Displaying Images
While SITK does not do visualization, it does contain a built-in Show method. This function writes the image out to disk and then launches a program for visualization. By default it is configured to use ImageJ, because ImageJ readily supports all the image types which SITK has and loads them very quickly. SITK provides two options for invoking an external viewer: a procedural interface (the Show function) or an object-oriented one. For more details about them, please refer to this notebook from the official documentation.
In this episode we will convert SITK images to numpy arrays, and we will plot them as such.
Images as Arrays
We have two options for converting from SITK to a numpy array:
- GetArrayFromImage(): returns a copy of the image data. You can then freely modify the data as it has no effect on the original SITK image.
- GetArrayViewFromImage(): returns a view on the image data which is useful for display in a memory-efficient manner. You cannot modify the data, and the view will be invalid if the original SITK image is deleted.
The order of indices and dimensions needs careful attention during conversion. ITK's Image class has a GetPixel method which takes an ITK Index object as an argument, which is ordered as (x,y,z). This is the convention that SITK's Image class uses for the GetPixel method and slicing operator as well. In numpy, an array is indexed in the opposite order (z,y,x). Also note that access to channels is different. In SITK you do not access the channel directly; rather, the pixel value representing all channels for the specific pixel is returned, and you then access the channel for that pixel. In a numpy array you access the channel directly. Let's see this in an example:
PYTHON
import numpy as np
multi_channel_3Dimage = sitk.Image([2, 4, 8], sitk.sitkVectorFloat32, 5)
x = multi_channel_3Dimage.GetWidth() - 1
y = multi_channel_3Dimage.GetHeight() - 1
z = multi_channel_3Dimage.GetDepth() - 1
multi_channel_3Dimage[x, y, z] = np.random.random(
multi_channel_3Dimage.GetNumberOfComponentsPerPixel()
)
nda = sitk.GetArrayFromImage(multi_channel_3Dimage)
print("Image size: " + str(multi_channel_3Dimage.GetSize()))
print("Numpy array size: " + str(nda.shape))
# Notice the index order and channel access are different:
print("First channel value in image: " + str(multi_channel_3Dimage[x, y, z][0]))
print("First channel value in numpy array: " + str(nda[z, y, x, 0]))
OUTPUT
Image size: (2, 4, 8)
Numpy array size: (8, 4, 2, 5)
First channel value in image: 0.5384010076522827
First channel value in numpy array: 0.538401
Going back to the loaded file, let’s plot the array version of the volume slice from the middle of the stack, along the z axis, using different color maps:
PYTHON
npa = sitk.GetArrayViewFromImage(img_volume)
# Display the image slice from the middle of the stack, z axis
z = int(img_volume.GetDepth() / 2)
npa_zslice = sitk.GetArrayViewFromImage(img_volume)[z, :, :]
# Three plots displaying the same data, how do we deal with the high dynamic range?
fig = plt.figure(figsize=(10, 3))
fig.add_subplot(1, 3, 1)
plt.imshow(npa_zslice)
plt.title("default colormap", fontsize=10)
plt.axis("off")
fig.add_subplot(1, 3, 2)
plt.imshow(npa_zslice, cmap=plt.cm.Greys_r)
plt.title("grey colormap", fontsize=10)
plt.axis("off")
fig.add_subplot(1, 3, 3)
plt.title(
"grey colormap,\n scaling based on volumetric min and max values", fontsize=10
)
plt.imshow(npa_zslice, cmap=plt.cm.Greys_r, vmin=npa.min(), vmax=npa.max())
plt.axis("off")
We can also do the reverse, i.e. converting a numpy array to a SITK Image:
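The cell is not shown in this rendering; a sketch using the slice array from above.
PYTHON
# convert a numpy array back into a SITK image
img_from_array = sitk.GetImageFromArray(npa_zslice)
print(type(img_from_array))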
OUTPUT
<class 'SimpleITK.SimpleITK.Image'>
We can also plot multiple slices at the same time, for better inspecting the volume:
PYTHON
img_xslices = [img_volume[x,:,:] for x in range(50, 200, 30)]
img_yslices = [img_volume[:,y,:] for y in range(50, 200, 30)]
img_zslices = [img_volume[:,:,z] for z in range(50, 200, 30)]
tile_x = sitk.Tile(img_xslices, [1,0])
tile_y = sitk.Tile(img_yslices, [1,0])
tile_z = sitk.Tile(img_zslices, [1,0])
nda_xslices = sitk.GetArrayViewFromImage(tile_x)
nda_yslices = sitk.GetArrayViewFromImage(tile_y)
nda_zslices = sitk.GetArrayViewFromImage(tile_z)
fig, (ax1, ax2, ax3) = plt.subplots(1,3)
ax1.imshow(nda_xslices, cmap=plt.cm.Greys_r)
ax2.imshow(nda_yslices, cmap=plt.cm.Greys_r)
ax3.imshow(nda_zslices, cmap=plt.cm.Greys_r)
ax1.set_title('X slices')
ax2.set_title('Y slices')
ax3.set_title('Z slices')
Operations like slice indexing, cropping, and flipping can be performed on SITK images much as they are usually done in numpy. Note that slicing of SITK images returns a copy of the image data, similarly to slicing Python lists, and differently from the "view" returned by slicing numpy arrays.
PYTHON
n_slice = 150
# Original slice
plt.imshow(sitk.GetArrayViewFromImage(img_volume[:, :, n_slice]), cmap="gray")
PYTHON
# Subsampling
plt.imshow(sitk.GetArrayViewFromImage(img_volume[:, ::3, n_slice]), cmap="gray")
PYTHON
# Comparative operators
plt.imshow(sitk.GetArrayViewFromImage(img_volume[:, :, n_slice] > 90), cmap="gray")
PYTHON
# Draw a square
draw_square = sitk.GetArrayFromImage(img_volume[:, :, n_slice])
draw_square[0:100,0:100] = draw_square.max()
plt.imshow(draw_square, cmap="gray")
Another example of an operation we can perform is creating a grid mask and applying it to an image:
PYTHON
img_zslice = img_volume[:, :, n_slice]
# Create a grid and use as mask
grid_image = sitk.GridSource(
outputPixelType=sitk.sitkUInt16,
size=img_zslice.GetSize(),
sigma=(0.1, 0.1),
gridSpacing=(20.0, 20.0),
)
# zero out the values in the original image that correspond to
# the grid lines in the grid_image
img_zslice[grid_image == 0] = 0
nda = sitk.GetArrayViewFromImage(img_zslice)
plt.imshow(nda, cmap="gray")
Challenge: Fix the Error (Optional)
By running the lines of code above for masking the slice with the grid, you will get an error. Can you guess what it is about?
By default, SITK creates images centered at the origin, with all-ones spacing. The grid and the image need to have the same origin and spacing for the former to be superimposed on the latter.
Add these two lines to the code above, after the creation of the grid, and everything should work fine. Remember that making such changes to an image already containing data should be done cautiously.
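The two lines themselves do not appear in this rendering of the lesson; a plausible sketch of what they would look like, assuming the intent is to copy the slice's origin and spacing onto the grid (the exact lines in the original lesson may differ).
PYTHON
# assumed reconstruction: give the grid the same spatial metadata as the slice
# so that the two images occupy the same physical region
grid_image.SetOrigin(img_zslice.GetOrigin())
grid_image.SetSpacing(img_zslice.GetSpacing())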
Meta-dictionaries
SITK can read (and write) images stored in a single file, or a set of files (e.g. DICOM series).
Images stored in the DICOM format have a meta-data dictionary associated with them, which is populated with the DICOM tags. When a DICOM series is read as a single image, the meta-data information is not available since DICOM tags are specific to each file. If you need the meta-data, you have three options:
1. Use the object oriented interface’s ImageSeriesReader class: configure it to load the tags using the MetaDataDictionaryArrayUpdateOn method, and possibly the LoadPrivateTagsOn method if you need the private tags. Once the series is read, you can access the meta-data from the series reader using the GetMetaDataKeys, HasMetaDataKey, and GetMetaData methods.
2. Use the object oriented interface’s ImageFileReader: set a specific slice’s file name and only read its meta-data using the ReadImageInformation method, which reads the meta-data but not the bulk pixel information. Once the meta-data is read, you can access it from the file reader using the GetMetaDataKeys, HasMetaDataKey, and GetMetaData methods.
3. Use the object oriented interface’s ImageFileReader, set a specific slice’s file name and read it; or, using the procedural interface’s ReadImage function, read a specific file. You can then access the meta-data directly from the Image using the GetMetaDataKeys, HasMetaDataKey, and GetMetaData methods.
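For instance, a minimal sketch of the first option (the folder data/sitk/dicom_series/ is a hypothetical path containing a single DICOM series):
PYTHON
reader = sitk.ImageSeriesReader()
dicom_names = reader.GetGDCMSeriesFileNames("data/sitk/dicom_series/")  # hypothetical path
reader.SetFileNames(dicom_names)
reader.MetaDataDictionaryArrayUpdateOn()  # keep the per-slice DICOM tags
reader.LoadPrivateTagsOn()                # also keep private tags, if needed
img_series = reader.Execute()
# tags are indexed per slice; print the keys of the first slice
for key in reader.GetMetaDataKeys(0):
    print(f'"{key}":"{reader.GetMetaData(0, key)}"')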
Note: when reading an image series, via the ImageSeriesReader or via the procedural ReadImage interface, the images in the list are assumed to be ordered correctly (GetGDCMSeriesFileNames ensures this for DICOM). If the order is incorrect, the image will be read, but its spacing and possibly the direction cosine matrix will be incorrect.
Let’s read in a digital x-ray image saved in a DICOM file format, and let’s print the metadata’s keys:
PYTHON
img_xray = sitk.ReadImage('data/sitk/digital_xray.dcm')
for key in img_xray.GetMetaDataKeys():
print(f'"{key}":"{img_xray.GetMetaData(key)}"')
OUTPUT
"0008|0005":"ISO_IR 100"
"0008|0012":"20180713"
"0008|0013":"233245.431"
"0008|0016":"1.2.840.10008.5.1.4.1.1.7"
"0008|0018":"2.25.225981116244996633889747723103230447077"
"0008|0020":"20160208"
"0008|0060":"XC"
"0020|000d":"2.25.59412532821774470466514450673479191872"
"0020|000e":"2.25.149748537243964334146339164752570719260"
"0028|0002":"3"
"0028|0004":"YBR_FULL_422"
"0028|0006":"0"
"0028|0010":"357"
"0028|0011":"371"
"0028|0100":"8"
"0028|0101":"8"
"0028|0102":"7"
"0028|0103":"0"
"ITK_original_direction":"[UNKNOWN_PRINT_CHARACTERISTICS]
"
"ITK_original_spacing":"[UNKNOWN_PRINT_CHARACTERISTICS]
"
Reading a DICOM:
- Many libraries allow you to read DICOM metadata.
- PyDICOM will be explored for this task later.
- If you are already using SITK, you will usually not need an extra library to get DICOM metadata.
- Many libraries have some basic DICOM functionality; check the documentation before adding extra dependencies.
Generally speaking, SITK represents color images as multi-channel images independent of a color space. It is up to you to interpret the channels correctly based on additional color space knowledge prior to using them for display or any other purpose.
Things to note:
- When using SITK to read a color DICOM image, the channel values will be transformed to the RGB color space.
- When using SITK to read a scalar image, it is assumed that the lowest intensity value is black and the highest is white. If the photometric interpretation tag is MONOCHROME2 (lowest value displayed as black), nothing is done. If it is MONOCHROME1 (lowest value displayed as white), the pixel values are inverted.
PYTHON
print(f'Image Modality: {img_xray.GetMetaData("0008|0060")}')
print(img_xray.GetNumberOfComponentsPerPixel())
OUTPUT
Image Modality: XC
3
“0008|0060” is the DICOM tag indicating the modality, and “XC” stands for “External-camera Photography”.
Grayscale Images Stored as sRGB
The “digital_xray.dcm” image is sRGB, even though an x-ray should be a single-channel grayscale image. In some cases looks may be deceiving: grayscale images are not always stored as a single-channel image. An image that looks like a grayscale image may actually be a three-channel image with the intensity values repeated in each of the channels. Even worse, some grayscale images are four-channel images, with the channels representing RGBA and the alpha channel set to all 255. This can result in a significant waste of memory and computation time. Always become familiar with your data.
We can convert sRGB to gray scale:
PYTHON
def srgb2gray(image):
# Convert sRGB image to gray scale and rescale results to [0,255]
channels = [
sitk.VectorIndexSelectionCast(image, i, sitk.sitkFloat32)
for i in range(image.GetNumberOfComponentsPerPixel())
]
# linear mapping
I = 1 / 255.0 * (0.2126 * channels[0] + 0.7152 * channels[1] + 0.0722 * channels[2])
# nonlinear gamma correction
I = (
I * sitk.Cast(I <= 0.0031308, sitk.sitkFloat32) * 12.92
+ I ** (1 / 2.4) * sitk.Cast(I > 0.0031308, sitk.sitkFloat32) * 1.055
- 0.055
)
return sitk.Cast(sitk.RescaleIntensity(I), sitk.sitkUInt8)
img_xray_gray = srgb2gray(img_xray)
nda = sitk.GetArrayViewFromImage(img_xray_gray)
plt.imshow(np.squeeze(nda), cmap="gray")
Transforms
SITK supports two types of spatial transforms, ones with a global
(unbounded) spatial domain and ones with a bounded spatial domain.
Points in SITK are mapped by the transform using the
TransformPoint
method.
All global domain transforms are of the form:
\(T(x) = A(x - c) + t + c\)
The nomenclature used in the documentation refers to the components of the transformations as follows:
- Matrix - the matrix \(A\).
- Center - the point \(c\).
- Translation - the vector \(t\).
- Offset - the expression \(t + c - Ac\).
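As a quick check of the global-domain form \(T(x) = A(x - c) + t + c\), here is a minimal sketch (our own example, not part of the lesson data) using a 2D affine transform and TransformPoint:
PYTHON
affine = sitk.AffineTransform(2)
affine.SetMatrix([2.0, 0.0, 0.0, 2.0])  # A: isotropic scaling by 2
affine.SetCenter((1.0, 1.0))            # c
affine.SetTranslation((3.0, 0.0))       # t
# A(x - c) + t + c with x = (2, 2): (2, 2) + (3, 0) + (1, 1) = (6, 3)
print(affine.TransformPoint((2.0, 2.0)))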
A variety of global 2D and 3D transformations are available (translation, rotation, rigid, similarity, affine…). Some of these transformations are available with various parameterizations which are useful for registration purposes.
The second type of spatial transformation, bounded domain transformations, are defined to be identity outside their domain. These include the B-spline deformable transformation, often referred to as Free-Form Deformation, and the displacement field transformation.
The B-spline transform uses a grid of control points to represent a spline based transformation. To specify the transformation the user defines the number of control points and the spatial region which they overlap. The spline order can also be set, though the default of cubic is appropriate in most cases. The displacement field transformation uses a dense set of vectors representing displacement in a bounded spatial domain. It has no implicit constraints on transformation continuity or smoothness.
Finally, SITK supports a composite transformation with either a bounded or global domain. This transformation represents multiple transformations applied one after the other, \(T_0(T_1(T_2(...T_n(p))))\). The semantics are stack based, that is, first in, last applied, as in the sketch below.
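A minimal sketch of the stack semantics (assuming a SimpleITK version ≥ 2.0, where CompositeTransform is available):
PYTHON
translate = sitk.TranslationTransform(2, (1.0, 0.0))
scale = sitk.ScaleTransform(2, (2.0, 2.0))
composite = sitk.CompositeTransform(2)
composite.AddTransform(translate)  # added first, applied last
composite.AddTransform(scale)      # added last, applied first
# scale about the origin first, then translate: (1, 1) -> (2, 2) -> (3, 2)
print(composite.TransformPoint((1.0, 1.0)))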
For more details about SITK transformation types and examples, see this tutorial notebook.
Resampling
Resampling, as the verb implies, is the action of sampling an image, which itself is a sampling of an original continuous signal.
Generally speaking, resampling in SITK involves four components:
- Image - the image we resample, given in coordinate system \(m\).
- Resampling grid - a regular grid of points given in coordinate system \(f\) which will be mapped to coordinate system \(m\).
- Transformation \(T_f^m\) - maps points from coordinate system \(f\) to coordinate system \(m\), \(^mp=T_f^m(^fp)\).
- Interpolator - method for obtaining the intensity values at arbitrary points in coordinate system \(m\) from the values of the points defined by the Image.
While SITK provides a large number of interpolation methods, the two most commonly used are sitkLinear and sitkNearestNeighbor. The former is used for most interpolation tasks and is a compromise between accuracy and computational efficiency. The latter is used to interpolate labeled images representing a segmentation; it is the only interpolation approach which will not introduce new labels into the result.
The SITK interface includes three variants for specifying the resampling grid:
- Use the same grid as defined by the resampled image.
- Provide a second, reference, image which defines the grid.
- Specify the grid using: size, origin, spacing, and direction cosine matrix.
Points that are mapped outside of the resampled image’s spatial extent in physical space are set to a constant pixel value which you provide (default is zero).
It is not uncommon to end up with an empty (all black) image after resampling. This is due to:
- Using wrong settings for the resampling grid (not too common, but it does happen).
- Using the inverse of the transformation \(T_f^m\). This is a relatively common error, which is readily addressed by invoking the transformation’s GetInverse method.
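A minimal sketch of the procedural interface (re-using the img_volume loaded earlier and resampling it onto its own grid after a small translation):
PYTHON
# shift the volume 10 mm along x; points mapped outside the input get the default value 0
translation = sitk.TranslationTransform(3, (10.0, 0.0, 0.0))
resampled = sitk.Resample(
    img_volume,               # the image to resample; here it also defines the output grid
    translation,              # transform mapping output (fixed) points to input (moving) points
    sitk.sitkLinear,          # interpolator
    0.0,                      # default pixel value for out-of-domain points
    img_volume.GetPixelID(),  # output pixel type
)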
Let’s try to plot multiple slices across different axes for the image “training_001_mr_T1.mha”.
PYTHON
img_volume = sitk.ReadImage("data/sitk/training_001_mr_T1.mha")
print(img_volume.GetSize())
print(img_volume.GetSpacing())
OUTPUT
(256, 256, 26)
(1.25, 1.25, 4.0)
We can plot the slices as we did before:
PYTHON
img_xslices = [img_volume[x,:,:] for x in range(50, 200, 30)]
img_yslices = [img_volume[:,y,:] for y in range(50, 200, 30)]
img_zslices = [img_volume[:,:,z] for z in range(1, 25, 3)]
tile_x = sitk.Tile(img_xslices, [1,0])
tile_y = sitk.Tile(img_yslices, [1,0])
tile_z = sitk.Tile(img_zslices, [1,0])
nda_xslices = sitk.GetArrayViewFromImage(tile_x)
nda_yslices = sitk.GetArrayViewFromImage(tile_y)
nda_zslices = sitk.GetArrayViewFromImage(tile_z)
fig, (ax1, ax2, ax3) = plt.subplots(1,3)
ax1.imshow(nda_xslices, cmap=plt.cm.Greys_r)
ax2.imshow(nda_yslices, cmap=plt.cm.Greys_r)
ax3.imshow(nda_zslices, cmap=plt.cm.Greys_r)
ax1.set_title('X slices')
ax2.set_title('Y slices')
ax3.set_title('Z slices')
Challenge: Distorted Images
What is the main difference with the first image we plotted (“A1_grayT1.nrrd”)?
In this case, there are only 26 slices in the volume and the voxel spacing is anisotropic: it is the same along the x- and y-axes, but differs along the z-axis.
We can fix the distortion by resampling the volume so that the z spacing (4 mm) matches the in-plane spacing (1.25 mm):
PYTHON
def resample_img(image, out_spacing=[1.25, 1.25, 1.25]):
# Resample images to 1.25mm spacing
original_spacing = image.GetSpacing()
original_size = image.GetSize()
out_size = [
int(np.round(original_size[0] * (original_spacing[0] / out_spacing[0]))),
int(np.round(original_size[1] * (original_spacing[1] / out_spacing[1]))),
int(np.round(original_size[2] * (original_spacing[2] / out_spacing[2])))]
resample = sitk.ResampleImageFilter()
resample.SetOutputSpacing(out_spacing)
resample.SetSize(out_size)
resample.SetOutputDirection(image.GetDirection())
resample.SetOutputOrigin(image.GetOrigin())
resample.SetTransform(sitk.Transform())
resample.SetDefaultPixelValue(0)  # fill value for points mapped outside the original image
resample.SetInterpolator(sitk.sitkBSpline)
return resample.Execute(image)
resampled_sitk_img = resample_img(img_volume)
print(resampled_sitk_img.GetSize())
print(resampled_sitk_img.GetSpacing())
OUTPUT
(256, 256, 83)
(1.25, 1.25, 1.25)
Registration
Image registration involves spatially transforming the source/moving image(s) to align with the target image. More specifically, the goal of registration is to estimate the transformation which maps points from one image to the corresponding points in another image. The transformation estimated via registration is said to map points from the fixed image (target image) coordinate system to the moving image (source image) coordinate system.
Many Ways to Do Registration:
- Several libraries offer built-in registration functionalities.
- Registration can be performed with DIPY (mentioned in the MRI episode), specifically for diffusion imaging, using the SymmetricDiffeomorphicRegistration functionality.
- NiBabel includes a function in its processing module that allows resampling or conforming one NIfTI image into the space of another.
- SITK provides robust registration capabilities and allows you to code your own registration algorithms.
- To maintain cleaner and more efficient code, it’s advisable to use as few libraries as possible and avoid those that may create conflicts.
SITK provides a configurable multi-resolution registration framework, implemented in the ImageRegistrationMethod class. In addition, a number of variations of the Demons registration algorithm are implemented independently from this class as they do not fit into the framework.
The task of registration is formulated using non-linear optimization which requires an initial estimate. The two most common initialization approaches are (1) Use the identity transform (a.k.a. forgot to initialize). (2) Align the physical centers of the two images (see CenteredTransformInitializerFilter). If after initialization there is no overlap between the images, registration will fail. The closer the initialization transformation is to the actual transformation, the higher the probability of convergence to the correct solution.
If your registration involves the use of a global domain transform (described here), you should also set an appropriate center of rotation. In many cases you want the center of rotation to be the physical center of the fixed image (the CenteredTransformInitializerFilter ensures this). This is of significant importance for registration convergence due to the non-linear nature of rotation. When the center of rotation is far from our physical region of interest (ROI), a small rotational angle results in a large displacement. Think of moving the pivot/fulcrum point of a lever: for the same rotation angle, the farther you are from the fulcrum, the larger the displacement. For numerical stability we do not want our computations to be sensitive to very small variations in the rotation angle, thus the ideal center of rotation is the point which minimizes the distance to the farthest point in our ROI:
\(p_{center} = \arg\min_{p_{rotation}} \max_{p_{roi} \in ROI} dist(p_{rotation}, p_{roi})\)
Without additional knowledge we can only assume that the ROI is the whole fixed image. If your ROI is only in a sub region of the image, a more appropriate point would be the center of the oriented bounding box of that ROI.
To create a specific registration instance using the ImageRegistrationMethod you need to select several components which together define the registration instance:
- Transformation: defines the mapping between the two images.
- Similarity metric: reflects the relationship between the intensities of the images (identity, affine, stochastic…).
- Optimizer: when selecting the optimizer you will also need to configure it (e.g. set the number of iterations).
- Interpolator: in most cases linear interpolation, the default setting, is sufficient.
Let’s now look at an example where we use registration to align two volumes from the same patient: a CT scan and a T1-weighted MRI scan. We first read the images, casting the pixel type to that required for registration (Float32 or Float64), and look at them:
PYTHON
from ipywidgets import interact, fixed
from IPython.display import clear_output
import os
OUTPUT_DIR = "data/sitk/"
fixed_image = sitk.ReadImage(f"{OUTPUT_DIR}training_001_ct.mha", sitk.sitkFloat32)
moving_image = sitk.ReadImage(f"{OUTPUT_DIR}training_001_mr_T1.mha", sitk.sitkFloat32)
print(f"Origin for fixed image: {fixed_image.GetOrigin()}, moving image: {moving_image.GetOrigin()}")
print(f"Spacing for fixed image: {fixed_image.GetSpacing()}, moving image: {moving_image.GetSpacing()}")
print(f"Size for fixed image: {fixed_image.GetSize()}, moving image: {moving_image.GetSize()}")
print(f"Number Of Components Per Pixel for fixed image: {fixed_image.GetNumberOfComponentsPerPixel()}, moving image: {moving_image.GetNumberOfComponentsPerPixel()}")
# Callback invoked by the interact IPython method for scrolling through the image stacks of
# the two images (moving and fixed).
def display_images(fixed_image_z, moving_image_z, fixed_npa, moving_npa):
# Create a figure with two subplots and the specified size.
plt.subplots(1,2,figsize=(10,8))
# Draw the fixed image in the first subplot.
plt.subplot(1,2,1)
plt.imshow(fixed_npa[fixed_image_z,:,:],cmap=plt.cm.Greys_r)
plt.title('fixed/target image')
plt.axis('off')
# Draw the moving image in the second subplot.
plt.subplot(1,2,2)
plt.imshow(moving_npa[moving_image_z,:,:],cmap=plt.cm.Greys_r)
plt.title('moving/source image')
plt.axis('off')
plt.show()
interact(
display_images,
fixed_image_z = (0,fixed_image.GetSize()[2]-1),
moving_image_z = (0,moving_image.GetSize()[2]-1),
fixed_npa = fixed(sitk.GetArrayViewFromImage(fixed_image)),
moving_npa = fixed(sitk.GetArrayViewFromImage(moving_image)))
OUTPUT
Origin for fixed image: (0.0, 0.0, 0.0), moving image: (0.0, 0.0, 0.0)
Spacing for fixed image: (0.653595, 0.653595, 4.0), moving image: (1.25, 1.25, 4.0)
Size for fixed image: (512, 512, 29), moving image: (256, 256, 26)
Number Of Components Per Pixel for fixed image: 1, moving image: 1
We can use the CenteredTransformInitializer to align the centers of the two volumes and set the center of rotation to the center of the fixed image:
PYTHON
# Callback invoked by the IPython interact method for scrolling and
# modifying the alpha blending of an image stack of two images that
# occupy the same physical space.
def display_images_with_alpha(image_z, alpha, fixed, moving):
img = (1.0 - alpha)*fixed[:,:,image_z] + alpha*moving[:,:,image_z]
plt.imshow(sitk.GetArrayViewFromImage(img),cmap=plt.cm.Greys_r)
plt.axis('off')
plt.show()
initial_transform = sitk.CenteredTransformInitializer(fixed_image,
moving_image,
sitk.Euler3DTransform(),
sitk.CenteredTransformInitializerFilter.GEOMETRY)
moving_resampled = sitk.Resample(moving_image, fixed_image, initial_transform, sitk.sitkLinear, 0.0, moving_image.GetPixelID())
interact(
display_images_with_alpha,
image_z = (0,fixed_image.GetSize()[2]-1),
alpha = (0.0,1.0,0.05),
fixed = fixed(fixed_image),
moving = fixed(moving_resampled))
The specific registration task at hand estimates a 3D rigid transformation between images of different modalities. There are multiple components from each group (optimizers, similarity metrics, interpolators) that are appropriate for the task. Note that each component selection requires setting some parameter values. We have made the following choices:
- Similarity metric, mutual information (Mattes MI):
- Number of histogram bins, 50.
- Sampling strategy, random.
- Sampling percentage, 1%.
- Interpolator, sitkLinear.
- Optimizer, gradient descent:
- Learning rate, step size along traversal direction in parameter space, 1.0.
- Number of iterations, maximal number of iterations, 100.
- Convergence minimum value, value used for convergence checking in conjunction with the energy profile of the similarity metric that is estimated in the given window size, 1e-6.
- Convergence window size, number of values of the similarity metric which are used to estimate the energy profile of the similarity metric, 10.
We perform registration using the settings given above, and by taking advantage of the built-in multi-resolution framework, we use a three-tier pyramid.
In this example we plot the similarity metric’s value during registration. Note that the change of scales in the multi-resolution framework is readily visible.
PYTHON
# Callback invoked when the StartEvent happens, sets up our new data.
def start_plot():
global metric_values, multires_iterations
metric_values = []
multires_iterations = []
# Callback invoked when the EndEvent happens, do cleanup of data and figure.
def end_plot():
global metric_values, multires_iterations
del metric_values
del multires_iterations
# Close figure, we don't want to get a duplicate of the plot later on.
plt.close()
# Callback invoked when the sitkMultiResolutionIterationEvent happens, update the index into the
# metric_values list.
def update_multires_iterations():
global metric_values, multires_iterations
multires_iterations.append(len(metric_values))
# Callback invoked when the IterationEvent happens, update our data and display new figure.
def plot_values(registration_method):
global metric_values, multires_iterations
metric_values.append(registration_method.GetMetricValue())
# Clear the output area (wait=True, to reduce flickering), and plot current data
clear_output(wait=True)
# Plot the similarity metric values
plt.plot(metric_values, 'r')
plt.plot(multires_iterations, [metric_values[index] for index in multires_iterations], 'b*')
plt.xlabel('Iteration Number',fontsize=12)
plt.ylabel('Metric Value',fontsize=12)
plt.show()
registration_method = sitk.ImageRegistrationMethod()
# Similarity metric settings.
registration_method.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
registration_method.SetMetricSamplingStrategy(registration_method.RANDOM)
registration_method.SetMetricSamplingPercentage(0.01)
registration_method.SetInterpolator(sitk.sitkLinear)
# Optimizer settings.
registration_method.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100, convergenceMinimumValue=1e-6, convergenceWindowSize=10)
registration_method.SetOptimizerScalesFromPhysicalShift()
# Setup for the multi-resolution framework.
registration_method.SetShrinkFactorsPerLevel(shrinkFactors = [4,2,1])
registration_method.SetSmoothingSigmasPerLevel(smoothingSigmas=[2,1,0])
registration_method.SmoothingSigmasAreSpecifiedInPhysicalUnitsOn()
# Don't optimize in-place, we would possibly like to run this cell multiple times.
registration_method.SetInitialTransform(initial_transform, inPlace=False)
# Connect all of the observers so that we can perform plotting during registration.
registration_method.AddCommand(sitk.sitkStartEvent, start_plot)
registration_method.AddCommand(sitk.sitkEndEvent, end_plot)
registration_method.AddCommand(sitk.sitkMultiResolutionIterationEvent, update_multires_iterations)
registration_method.AddCommand(sitk.sitkIterationEvent, lambda: plot_values(registration_method))
final_transform = registration_method.Execute(sitk.Cast(fixed_image, sitk.sitkFloat32),
sitk.Cast(moving_image, sitk.sitkFloat32))
Always remember to query why the optimizer terminated. This will help you understand whether termination was too early, due to thresholds being too tight or to a small number of iterations (numberOfIterations), or too loose, due to a large value for the minimal change in the similarity measure (convergenceMinimumValue).
PYTHON
print('Final metric value: {0}'.format(registration_method.GetMetricValue()))
print('Optimizer\'s stopping condition, {0}'.format(registration_method.GetOptimizerStopConditionDescription()))
OUTPUT
Final metric value: -0.6561600032169457
Optimizer's stopping condition, GradientDescentOptimizerv4Template: Convergence checker passed at iteration 61.
Now we can visually inspect the results:
PYTHON
moving_resampled = sitk.Resample(moving_image, fixed_image, final_transform, sitk.sitkLinear, 0.0, moving_image.GetPixelID())
interact(
display_images_with_alpha,
image_z = (0,fixed_image.GetSize()[2]-1),
alpha = (0.0,1.0,0.05),
fixed = fixed(fixed_image),
moving = fixed(moving_resampled))
PYTHON
print(f"Origin for fixed image: {fixed_image.GetOrigin()}, shifted moving image: {moving_resampled.GetOrigin()}")
print(f"Spacing for fixed image: {fixed_image.GetSpacing()}, shifted moving image: {moving_resampled.GetSpacing()}")
print(f"Size for fixed image: {fixed_image.GetSize()}, shifted moving image: {moving_resampled.GetSize()}")
OUTPUT
Origin for fixed image: (0.0, 0.0, 0.0), shifted moving image: (0.0, 0.0, 0.0)
Spacing for fixed image: (0.653595, 0.653595, 4.0), shifted moving image: (0.653595, 0.653595, 4.0)
Size for fixed image: (512, 512, 29), shifted moving image: (512, 512, 29)
If we are satisfied with the results, save them to file.
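For example (a minimal sketch; the output file names are hypothetical):
PYTHON
sitk.WriteImage(moving_resampled, "data/sitk/training_001_mr_T1_registered.mha")  # hypothetical output path
sitk.WriteTransform(final_transform, "data/sitk/ct_to_t1.tfm")                    # hypothetical output path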
Segmentation
Image segmentation filters process images by dividing them into meaningful regions. SITK provides a wide range of filters to support classical segmentation algorithms, including various thresholding methods and watershed algorithms. The output is typically an image where different integers represent distinct objects, with 0 often used for the background and 1 (or sometimes 255) for foreground objects. After segmenting the data, SITK allows for efficient post-processing, such as labeling distinct objects and analyzing their shapes.
Let’s start by reading in a T1 MRI scan, on which we will perform segmentation operations.
PYTHON
%matplotlib inline
import matplotlib.pyplot as plt
from ipywidgets import interact, fixed
import SimpleITK as sitk
img_T1 = sitk.ReadImage("data/sitk/A1_grayT1.nrrd")
# To visualize the label image in RGB we need an image with a 0-255 range
img_T1_255 = sitk.Cast(sitk.RescaleIntensity(img_T1), sitk.sitkUInt8)
# Callback invoked by the interact IPython method for scrolling through the image stacks of
# a volume image
def display_images(image_z, npa, title):
plt.imshow(npa[image_z,:,:], cmap=plt.cm.Greys_r)
plt.title(title)
plt.axis('off')
plt.show()
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(img_T1)),
title = fixed('Z slices'))
Thresholding
Thresholding is the most basic form of segmentation. It simply labels the pixels of an image based on the intensity range without respect to geometry or connectivity.
PYTHON
# Basic thresholding
seg = img_T1>200
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Basic thresholding"))
Another example using BinaryThreshold:
PYTHON
# Binary thresholding
seg = sitk.BinaryThreshold(img_T1, lowerThreshold=100, upperThreshold=400, insideValue=1, outsideValue=0)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Binary thresholding"))
ITK has a number of histogram based automatic thresholding filters, including Huang, MaximumEntropy, Triangle, and the popular Otsu’s method (OtsuThresholdImageFilter). These methods create a histogram and then use a heuristic to determine a threshold value.
PYTHON
# Otsu Thresholding
otsu_filter = sitk.OtsuThresholdImageFilter()
otsu_filter.SetInsideValue(0)
otsu_filter.SetOutsideValue(1)
seg = otsu_filter.Execute(img_T1)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Otsu thresholding"))
print(otsu_filter.GetThreshold())
OUTPUT
236.40869140625
Region Growing Segmentation
The first step of improvement upon the naive thresholding is a class of algorithms called region growing. The common theme for all these algorithms is that a voxel’s neighbor is considered to be in the same class if its intensities are similar to the current voxel. The definition of similar is what varies:
- ConnectedThreshold: The neighboring voxel’s intensity is within explicitly specified thresholds.
- ConfidenceConnected: The neighboring voxel’s intensity is within the implicitly specified bounds \(\mu \pm c \sigma\), where \(\mu\) is the mean intensity of the seed points, \(\sigma\) their standard deviation and \(c\) a user specified constant.
- VectorConfidenceConnected: A generalization of the previous approach to vector valued images, for instance multi-spectral images or multi-parametric MRI. The neighboring voxel’s intensity vector is within the implicitly specified bounds using the Mahalanobis distance \(\sqrt{(x-\mu)^T \Sigma^{-1} (x-\mu)} < c\), where \(\mu\) is the mean of the vectors at the seed points, \(\Sigma\) is the covariance matrix and \(c\) is a user specified constant.
- NeighborhoodConnected: Similar to ConnectedThreshold, but a voxel is only added if all of the voxels in its neighborhood are within the explicitly specified thresholds.
Let’s imagine that we have to segment the left lateral ventricle of the brain image we just visualized.
3D Slicer was used to determine that index: (132,142,96) was a good seed for the left lateral ventricle. Let’s first visualize the seed:
PYTHON
# Initial seed
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Initial seed"))
Let’s use the ConnectedThreshold functionality:
PYTHON
# Connected Threshold
seg = sitk.ConnectedThreshold(img_T1, seedList=[seed], lower=100, upper=190)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Connected threshold"))
Improving upon this is the ConfidenceConnected filter, which uses the initial seed or current segmentation to estimate the threshold range.
This region growing algorithm allows the user to implicitly specify the threshold bounds based on the statistics estimated from the seed points, \(\mu \pm c\sigma\). This algorithm has some flexibility which you should familiarize yourself with:
- The “multiplier” parameter is the constant \(c\) from the formula above.
- You can specify a region around each seed point “initialNeighborhoodRadius” from which the statistics are estimated, see what happens when you set it to zero.
- The “numberOfIterations” allows you to rerun the algorithm. In the first run the bounds are defined by the seed voxels you specified, in the following iterations \(\mu\) and \(\sigma\) are estimated from the segmented points and the region growing is updated accordingly.
PYTHON
seg = sitk.ConfidenceConnected(img_T1, seedList=[seed],
numberOfIterations=0,
multiplier=2,
initialNeighborhoodRadius=1,
replaceValue=1)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z=(0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title=fixed("Confidence connected"))
Since we also have another MRI scan of the same patient available, a T2-weighted scan, we can try to further improve the segmentation. We first load the T2 image and combine it with the T1 image to create a vector image. This region growing algorithm is similar to the previous one, ConfidenceConnected, and allows the user to implicitly specify the threshold bounds based on the statistics estimated from the seed points. The main difference is that in this case we are using the Mahalanobis distance and not the intensity difference.
PYTHON
img_T2 = sitk.ReadImage("data/sitk/A1_grayT2.nrrd")
img_multi = sitk.Compose(img_T1, img_T2)
seg = sitk.VectorConfidenceConnected(img_multi, seedList=[seed],
numberOfIterations=1,
multiplier=2.5,
initialNeighborhoodRadius=1)
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Vector confidence connected"))
Clean Up
Use of low level segmentation algorithms such as region growing is
often followed by a clean up step. In this step we fill holes and remove
small connected components. Both of these operations are achieved by
using binary morphological operations, opening
(BinaryMorphologicalOpening
) to remove small connected
components and closing (BinaryMorphologicalClosing
) to fill
holes.
SITK supports several shapes for the structuring elements (kernels) including:
- sitkAnnulus
- sitkBall
- sitkBox
- sitkCross
The size of the kernel can be specified as a scalar (same for all dimensions) or as a vector of values, size per dimension.
The following code cell illustrates the results of such a clean up, using closing to remove holes in the original segmentation.
PYTHON
seg = sitk.ConfidenceConnected(img_T1, seedList=[seed],
numberOfIterations=0,
multiplier=2,
initialNeighborhoodRadius=1,
replaceValue=1)
vectorRadius=(1,1,1)
kernel=sitk.sitkBall
seg_clean = sitk.BinaryMorphologicalClosing(seg, vectorRadius, kernel)
seg_img_clean = sitk.LabelOverlay(img_T1_255, seg_clean)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img_clean)),
title = fixed("Confidence connected after morphological closing"))
Level-Set Segmentation
There are a variety of level-set based segmentation filters available in ITK, such as the ThresholdSegmentationLevelSetImageFilter used below. There is also a modular level-set framework which allows composition of terms and easy extension in C++.
First we create a label image from our seed.
PYTHON
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
Use the seed to estimate a reasonable threshold range.
PYTHON
stats = sitk.LabelStatisticsImageFilter()
stats.Execute(img_T1, seg)
factor = 3.5
lower_threshold = stats.GetMean(1)-factor*stats.GetSigma(1)
upper_threshold = stats.GetMean(1)+factor*stats.GetSigma(1)
print(lower_threshold,upper_threshold)
OUTPUT
81.25184541308809 175.0084466827569
PYTHON
init_ls = sitk.SignedMaurerDistanceMap(seg, insideIsPositive=True, useImageSpacing=True)
lsFilter = sitk.ThresholdSegmentationLevelSetImageFilter()
lsFilter.SetLowerThreshold(lower_threshold)
lsFilter.SetUpperThreshold(upper_threshold)
lsFilter.SetMaximumRMSError(0.02)
lsFilter.SetNumberOfIterations(1000)
lsFilter.SetCurvatureScaling(.5)
lsFilter.SetPropagationScaling(1)
lsFilter.ReverseExpansionDirectionOn()
ls = lsFilter.Execute(init_ls, sitk.Cast(img_T1, sitk.sitkFloat32))
print(lsFilter)
OUTPUT
itk::simple::ThresholdSegmentationLevelSetImageFilter
LowerThreshold: 81.2518
UpperThreshold: 175.008
MaximumRMSError: 0.02
PropagationScaling: 1
CurvatureScaling: 0.5
NumberOfIterations: 1000
ReverseExpansionDirection: 1
ElapsedIterations: 119
RMSChange: 0.0180966
Debug: 0
NumberOfThreads: 10
NumberOfWorkUnits: 0
Commands: (none)
ProgressMeasurement: 0.119
ActiveProcess: (none)
PYTHON
seg_img = sitk.LabelOverlay(img_T1_255, ls>0)
interact(
display_images,
image_z = (0,img_T1.GetSize()[2]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Level set segmentation"))
Challenge: Segment on the y-Axis
Try to segment the lateral ventricle using volume’s slices on y-axis instead of z-axis.
Hint: Start by editing the display_images function in order to select the slices on the y-axis.
PYTHON
def display_images(image_y, npa, title):
plt.imshow(npa[:,image_y,:], cmap=plt.cm.Greys_r)
plt.title(title)
plt.axis('off')
plt.show()
# Initial seed; we can reuse the same used for the z-axis slices
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
seg_img = sitk.LabelOverlay(img_T1_255, seg)
interact(
display_images,
image_y = (0,img_T1.GetSize()[1]-1),
npa = fixed(sitk.GetArrayViewFromImage(seg_img)),
title = fixed("Initial seed"))
After some attempts, this was the method that gave the best segmentation results:
PYTHON
seed = (132,142,96)
seg = sitk.Image(img_T1.GetSize(), sitk.sitkUInt8)
seg.CopyInformation(img_T1)
seg[seed] = 1
seg = sitk.BinaryDilate(seg, [3]*seg.GetDimension())
stats = sitk.LabelStatisticsImageFilter()
stats.Execute(img_T1, seg)
factor = 3.5
lower_threshold = stats.GetMean(1)-factor*stats.GetSigma(1)
upper_threshold = stats.GetMean(1)+factor*stats.GetSigma(1)
print(lower_threshold,upper_threshold)
init_ls = sitk.SignedMaurerDistanceMap(seg, insideIsPositive=True, useImageSpacing=True)
lsFilter = sitk.ThresholdSegmentationLevelSetImageFilter()
lsFilter.SetLowerThreshold(lower_threshold)
lsFilter.SetUpperThreshold(upper_threshold)
lsFilter.SetMaximumRMSError(0.02)
lsFilter.SetNumberOfIterations(1000)
lsFilter.SetCurvatureScaling(.5)
lsFilter.SetPropagationScaling(1)
lsFilter.ReverseExpansionDirectionOn()
ls = lsFilter.Execute(init_ls, sitk.Cast(img_T1, sitk.sitkFloat32))
OUTPUT
81.25184541308809 175.0084466827569
Segmentation Evaluation
Evaluating segmentation algorithms typically involves comparing your results to reference data.
In the medical field, reference data is usually created through manual segmentation by an expert. When resources are limited, a single expert might define this data, though this is less than ideal. If multiple experts contribute, their inputs can be combined to produce reference data that more closely approximates the elusive “ground truth.”
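As a minimal sketch (assuming a hypothetical reference segmentation seg_reference defined on the same grid as our result seg), SITK’s overlap and distance filters can quantify the agreement:
PYTHON
overlap_filter = sitk.LabelOverlapMeasuresImageFilter()
overlap_filter.Execute(seg_reference, seg)  # seg_reference is a hypothetical expert segmentation
print(f"Dice coefficient: {overlap_filter.GetDiceCoefficient():.3f}")
print(f"Jaccard coefficient: {overlap_filter.GetJaccardCoefficient():.3f}")
hausdorff_filter = sitk.HausdorffDistanceImageFilter()
hausdorff_filter.Execute(seg_reference, seg)
print(f"Hausdorff distance: {hausdorff_filter.GetHausdorffDistance():.2f}")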
For detailed coding examples on segmentation evaluation, refer to this notebook.
Acknowledgements
This episode was largely inspired by the official SITK tutorial, which is copyrighted by NumFOCUS and distributed under the Creative Commons Attribution 4.0 International License, and SITK Notebooks.
Additional Resources
To really understand the structure of SITK images and how to work with them, we recommend some hands-on interaction using the SITK Jupyter notebooks from the SITK official channels. More detailed information about SITK fundamental concepts can also be found here.
Code illustrating various aspects of the registration and segmentation framework can be found in the set of examples which are part of the SITK distribution and in the SITK Jupyter notebook repository.
Key Points
- Registration aligns images for data merging or temporal tracking, while segmentation identifies objects within images, which is critical for detailed analysis.
- SITK simplifies segmentation, registration, and advanced analysis tasks using ITK algorithms and supporting several programming languages.
- Images in SITK are defined by physical space, unlike array-based libraries, ensuring accurate spatial representation and metadata management.
- SITK offers global and bounded domain transformations for spatial manipulation and efficient resampling techniques with various interpolation options.
- Use SITK’s robust capabilities for registration and classical segmentation methods such as thresholding and region growing, ensuring efficient analysis of medical images.
Content from Preparing Images for Machine Learning
Last updated on 2024-09-14 | Edit this page
Estimated time: 125 minutes
Overview
Questions
- What are the fundamental steps in preparing images for machine learning?
- What techniques can be used to augment data?
- How should data from various sources or collected under different conditions be handled?
- How can we generate features for machine learning using radiomics or volumetrics?
Objectives
- Outline the fundamental steps involved in preparing images for machine learning
- Demonstrate data augmentation techniques using affine transformations
- Discuss the potential pitfalls of data augmentation
- Explain the concept of derived features, including radiomics and other pipeline-derived features
- Review image harmonization techniques at both the image and dataset levels
Basic Steps
Datasets of images can serve as the raw data for machine learning (ML). While in rare cases they might be ready for use with minimal effort, in most projects, a significant portion of time is dedicated to cleaning and preparing the data. The quality of the data directly impacts the quality of the resulting models.
One of the initial steps in building a model involves manually inspecting both the data and metadata.
Challenge: Thinking About Metadata
What metadata should you examine in almost any radiological dataset?
Hint: Consider metadata pertinent to brain and thorax imaging.
In almost any radiological dataset, it is essential to examine the age and sex distribution. These metadata are crucial for both brain MRIs and chest X-rays, as well as for many other types of imaging studies.
In real-world scenarios, we often face the challenge of investigating a specific pathology that is relatively rare. Consequently, our dataset may contain thousands of normal subjects but relatively few with any pathology, and even fewer with the pathology of interest. This creates an imbalanced dataset and can introduce biases. For instance, if we aim to develop a model to detect heart failure on chest X-rays, but all our heart failure patients are older men, the model may incorrectly “focus” (by exploiting statistical correlations) on the absence of female breasts and signs of aging, rather than on true heart failure indicators like the cardiothoracic ratio.
Here are some initial steps to understanding your dataset if you plan to do supervised ML:
- Check some images by hand: assess quality, content, and any unexpected elements.
- Check some labeling by hand: ensure accuracy, consistency, and note any surprises.
- Diversity check for obvious protected classes: examine images, labels, or metadata for representation.
- Convert images to a lossless format and store a backup of the data collection.
- Anonymize data: remove identifiable information within images (consider using OCR for text).
- Determine important image features: consider creating cropped versions.
- Normalize images: adjust for size, histogram, and other factors, ensuring proper centering when necessary.
- Re-examine some images by hand: look for artifacts introduced during preprocessing steps.
- Harmonize the dataset.
- Create augmented and/or synthetic data.
When preparing images for unsupervised learning, many of these steps still apply. However, you must also consider the specific algorithm’s requirements, as some segmentation algorithms may not benefit from additional data.
In our preparation for ML as outlined above, step two involves verifying the accuracy of labeling. The label represents the target variable you want your model to predict, so it is crucial to ensure that the labeling is both accurate and aligns with the outcomes you expect your model to predict.
In step three, we emphasize the importance of checking for diversity among protected classes—groups that are often subject to specific legislation in medical diagnostics due to historical underrepresentation. It is essential to ensure your dataset includes women and ethnic minority groups, as population-level variations can significantly impact model performance. For instance, chest circumference and height can vary greatly between populations, such as between Dutch and Indonesian people. While these differences might not affect an algorithm focused on cerebral blood flow, they are critical for others.
In step five, the focus is on anonymization, particularly in imaging data like ultrasounds, where patient names or diagnoses might appear directly on the image. Simple cropping may sometimes suffice, but more advanced techniques such as blurring, masking, or optical character recognition (OCR) should also be considered to ensure thorough anonymization.
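As a minimal sketch (the image used and the location of the burned-in text are assumptions made for illustration):
PYTHON
from skimage import io

image_with_text = io.imread('data/ml/rotatechest.png')  # imagine this image carried burned-in patient text
anonymized = image_with_text.copy()
# black out a rectangle in the top-left corner where the annotation is assumed to sit
anonymized[:50, :200] = 0
io.imshow(anonymized)
io.show()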
We will cover step nine, dataset harmonization, in a separate section. For step ten, creating more data, synthetic data generation will be discussed further in an episode on generative AI. In the next section, we will explore examples of augmented data creation.
Let’s go through some examples. First, we import the libraries we need:
PYTHON
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
from skimage import io
from skimage import transform as tf
from skimage.transform import rotate, resize, PiecewiseAffineTransform
Then, we import our example images and examine them.
PYTHON
image_cxr1 = io.imread('data/ml/rotatechest.png') # a relatively normal chest X-ray (CXR)
image_cxr_cmegaly = io.imread('data/ml/cardiomegaly_cc0.png') # cardiomegaly CXR
image_cxr2 = io.imread('data/ml/other_op.png') # a relatively normal CXR
# create figure
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 3
# Adds a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(image_cxr1)
plt.axis('off')
plt.title("Normal 1")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(image_cxr_cmegaly)
plt.axis('off')
plt.title("Cardiomegaly")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(image_cxr2)
plt.axis('off')
plt.title("Normal 2")
Challenge: Can You See Some Problems in the Following Scenario?
Imagine you got the above images and many more because you have been assigned to make an algorithm for cardiomegaly detection. The goal is to notify patients if their hospital-acquired X-rays, taken for any reason, show signs of cardiomegaly. This is an example of using ML for opportunistic screening.
You are provided with two datasets:
- A dataset of healthy (no cardiomegaly) patients from an outpatient clinic in a very poor area, staffed by first-year radiography students. These patients were being screened for tuberculosis (TB).
- A dataset of chest X-rays of cardiomegaly patients from an extremely prestigious tertiary inpatient hospital.
All of the following may pose potential problems:
- We may have different quality images from different machines for our healthy versus diseased patients, which could introduce a “batch effect” and create bias.
- At the prestigious hospital, many X-rays might be taken in the supine position. Will these combine well with standing AP X-rays?
- There could be concerns about the accuracy of labeling at the clinic since it was performed by lower-level staff.
Challenge: Prepare the Images for Classic Supervised ML
Use skimage.transform.rotate to create two realistic augmented images from the given ‘normal’ image stored in the variables.
Then, in a single block of code, apply what you perceive as the most critical preprocessing operations to prepare these images for classic supervised ML.
Hint: Carefully examine the shape of the cardiomegaly image. Consider the impact of harsh lines on ML performance.
PYTHON
# figure out how much to cut on top and bottom
print("cut top/bottom:", (image_cxr_cmegaly.shape[0] - image_cxr1.shape[0])/2)
cut_top_bottom = abs(round((image_cxr_cmegaly.shape[0] - image_cxr1.shape[0])/2))
# figure out how much to cut on the sides
print("cut sides:",(image_cxr_cmegaly.shape[1] - image_cxr1.shape[1])/2)
OUTPUT
cut top/bottom: -119.0
cut sides: -208.5
PYTHON
cut_sides = abs(round((image_cxr_cmegaly.shape[1] - image_cxr1.shape[1])/2))
list_images = [image_cxr1, image_cxr_cmegaly, image_cxr2]
better_for_ml_list = []
for image in list_images:
image = rotate(image,2)
image = image[cut_top_bottom:-cut_top_bottom, cut_sides: -cut_sides]
better_for_ml_list.append(image)
# create figure for display
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 3
# add a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(better_for_ml_list[0])
plt.axis('off')
plt.title("Normal example rotated")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(better_for_ml_list[1])
plt.axis('off')
plt.title("Cardiomegaly rotated")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(better_for_ml_list[2])
plt.axis('off')
plt.title("Another normal rotated")
PYTHON
# find a size we want to resize to, here the smallest
zero_side = []
one_side = []
for image in better_for_ml_list:
zero_side.append(image.shape[0])
zero_side.sort()
one_side.append(image.shape[1])
one_side.sort()
smallest_size = zero_side[0], one_side[0]
# make resized images
resized = []
for image in better_for_ml_list:
image = resize(image, (smallest_size))
resized.append(image)
# create figure for display
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 3
# add a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(resized[0])
plt.axis('off')
plt.title("Normal 1 Augmented Resized")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(resized[1])
plt.axis('off')
plt.title("Augment Cardiomegaly Resized")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(resized[2])
plt.axis('off')
plt.title("Normal 2 Augment Resized")
There are a few key considerations to keep in mind regarding the choices made here. The goal is to create data that realistically represents what might occur in real-life scenarios. For example, a 2-degree rotation is far more realistic than a 25-degree rotation. Unless you are using a rotationally invariant algorithm or another advanced method that accounts for such differences, even small rotations can make every pixel slightly different for the computer. A 2-degree rotation closely mirrors what might happen in a clinical setting, making it a more practical choice.
Additionally, the results can be further improved by cropping the images. Generally, we aim to minimize harsh, artificial lines in any images used for ML algorithms, unless those lines are consistent across all images. It is a common observation among practitioners that many ML algorithms tend to “focus” on these harsh lines, so if they are not uniformly present, it is better to remove them. As for how much to crop, we based our decision on the size of our images. While you don’t need to follow our exact approach, it is important to ensure that all images are ultimately the same size and that all critical features are retained.
Of course, there are numerous other transformations we can apply to images beyond rotation. For example, let’s explore applying a shear:
PYTHON
# create affine transform
affine_tf = tf.AffineTransform(shear=0.2)
# apply transform to image data
image_affine_tf = tf.warp(image_cxr_cmegaly, inverse_map=affine_tf)
# display the result
io.imshow(image_affine_tf)
io.show()
And finally, let’s demonstrate a “wave over a mesh.” We’ll start by creating a grid, or “mesh,” over our image, represented by dots in our plot. Then, we’ll apply a warp function to transform the image into the shape of a wave.
PYTHON
rows, cols = image_affine_tf.shape[0], image_affine_tf.shape[1]
# np.linspace will return evenly spaced numbers over an interval
src_cols = np.linspace(0, cols, 20)
# ie above is start=0, stop=cols, num=20, and num is the number of chops
src_rows = np.linspace(0, rows, 10)
# np.meshgrid returns coordinate matrices from coordinate vectors.
src_rows, src_cols = np.meshgrid(src_rows, src_cols)
# numpy dstack stacks along a third dimension in the concatenation
src = np.dstack([src_cols.flat, src_rows.flat])[0]
dst_rows = src[:, 1] - np.sin(np.linspace(0, 3 * np.pi, src.shape[0])) * 50
dst_cols = src[:, 0]
dst_rows *= 1.5
dst_rows -= 1.5 * 50
dst = np.vstack([dst_cols, dst_rows]).T
tform = PiecewiseAffineTransform()
tform.estimate(src, dst)
noform = PiecewiseAffineTransform()
noform.estimate(src, src)
out_rows = image_affine_tf.shape[0] - 1.5 * 50
out_cols = cols
out = tf.warp(image_affine_tf, tform, output_shape=(out_rows, out_cols))
# create figure
fig = plt.figure(figsize=(10, 7))
# setting values to rows and column variables
rows = 1
columns = 4
# add a subplot at the 1st position
fig.add_subplot(rows, columns, 1)
# showing image
plt.imshow(image_affine_tf)
plt.axis('off')
plt.title("Normal")
# add a subplot at the 2nd position
fig.add_subplot(rows, columns, 2)
# showing image
plt.imshow(image_affine_tf)
plt.plot(noform.inverse(src)[:, 0], noform.inverse(src)[:, 1], '.b')
plt.axis('off')
plt.title("Normal and Mesh")
# add a subplot at the 3rd position
fig.add_subplot(rows, columns, 3)
# showing image
plt.imshow(out)
plt.axis('off')
plt.title("Augment")
# add a subplot at the 4th position
fig.add_subplot(rows, columns, 4)
# showing image
plt.imshow(out)
plt.plot(tform.inverse(src)[:, 0], tform.inverse(src)[:, 1], '.b')
plt.axis('off')
plt.title("Augment and Mesh")
The last transformation doesn’t appear realistic. The chest became noticeably widened, which could be problematic. When augmenting data, there are numerous possibilities, but it’s crucial to ensure the augmented data remains realistic. Only a subject matter expert (typically a pathologist, nuclear medicine specialist, or radiologist) can accurately determine what realistic data should look like.
Libraries for Medical Image Augmentation
We demonstrated how to augment medical images using scikit-image. Another great resource for augmentation is SimpleITK, which offers a dedicated tutorial on data augmentation. Additionally, OpenCV provides versatile tools for image processing and augmentations.
To keep your environment manageable, we recommend using just one of these libraries. scikit-image, SimpleITK, and OpenCV all offer effective solutions for medical image augmentation.
Images’ Features
So far, we’ve focused on examples where we directly manipulate images. However, much of ML involves working with derived values from images, often converted into tabular data. In fact, it’s possible to combine images with various types of tabular data in multiple ways for ML. But before exploring these methods, let’s first consider using image features alone as inputs for ML.
Two notable examples of image features are volumetric data and radiomic data. Handling such data at scale—when dealing with thousands of images—typically involves automated processes rather than manual measurements. In the past, pioneers like Benjamin Felson, a founding figure in modern chest X-ray analysis, painstakingly analyzed hundreds of X-rays by hand to gather statistics. Today, processing pipelines allow us to efficiently analyze thousands of images simultaneously.
We aim to integrate images into a pipeline that automatically generates various types of data. A notable example of such a feature pipeline, particularly for brain imaging, is Freesurfer. Freesurfer facilitates the extraction of volume and other relevant data from brain imaging scans.
Challenge: Identifying Issues with Pipeline Outputs in the Absence of Original Images
Consider potential issues that could arise from using the output of a pipeline without accessing the original images.
One major concern is accuracy, as pipelines may mislabel or miscount features, potentially leading to unreliable data analysis or model training. To mitigate these issues, consider:
- Manual verification: validate pipeline outputs by manually reviewing a subset of images to ensure consistency with expected results.
- Literature review: consult existing research to establish benchmarks for feature counts and volumes, helping identify discrepancies that may indicate pipeline errors.
- Expert consultation: seek insights from radiologists or domain experts to assess the credibility of pipeline outputs based on their clinical experience.
Proceed with caution when relying on pipeline outputs, especially without direct access to the original images.
Another challenge with pipelines is that the results can vary significantly depending on which one is chosen. For a detailed exploration of how this can impact fMRI data, see this Nature article. Without access to the original images, it becomes difficult to determine which pipeline most accurately captures the desired outcomes from the data.
Morphometric or volumetric data represents one type of derived data, but radiomics introduces another dimension. Radiomics involves analyzing mathematical characteristics within images, such as entropy or kurtosis. Similar methodologies applied to pathology are referred to as pathomics. Some define radiomics and pathomics as advanced texture analysis in medical imaging, while others argue they encompass a broader range of quantitative data than traditional texture analysis methods.
Regardless of the data type chosen, the typical approach involves segmenting and/or masking an image to isolate the area of interest, followed by applying algorithms to derive radiomic or pathomic features. While it’s possible to develop custom code for this purpose, using established packages is preferred for ensuring reproducibility. For future reproducibility, adherence to standards such as those set by the International Bio-imaging Standards Initiative (IBSI) is crucial.
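As a minimal sketch of what such a step can look like with pyradiomics (the image and mask paths are hypothetical, and the default extractor settings are assumed):
PYTHON
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
# image.nrrd and mask.nrrd are hypothetical paths to an image and its segmentation mask
features = extractor.execute("image.nrrd", "mask.nrrd")
# print the first-order statistics computed on the original image
for name, value in features.items():
    if name.startswith("original_firstorder"):
        print(name, value)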
Below is a list of open-source* libraries that facilitate this type of analysis:
| | mirp | pyradiomics | LIFEx | radiomics |
|---|---|---|---|---|
| License | EUPL-1.2 | BSD-3 | custom | GPL-3.0 |
| Last updated | 5/2024 | 1/2024 | 6/2023 | 11/2019 |
| Programming language | Python | Python | Java | MATLAB |
| IBSI-1 compliant | yes | partial | yes | no claim |
| IBSI-2 compliant | yes | no claim | yes | no claim |
| Interface | high-level API | high-level API, Docker | GUI, low-level API | low-level API |
| Website | GitHub | GitHub | website | GitHub |
| Early publication | JOSS publication | doi:10.1158/0008-5472.CAN-17-0339 | doi:10.1158/0008-5472.CAN-18-0125 | doi:10.1088/0031-9155/60/14/5471 |
| Notes | relative newcomer | very standard and supported | user-friendly | * MATLAB requires a license |
Once we have tabular data, we can apply various algorithms to analyze it. However, applying ML isn’t just a matter of running algorithms and getting quick results. In the next section, we will explore one reason why this process is more complex than it may seem.
Harmonization
We often need to harmonize either images or datasets of derived features. When dealing with images, differences between datasets can be visually apparent. For example, if one set of X-rays consistently appears darker, it may be wise to compare average pixel values between datasets and adjust normalization to achieve more uniformity. This is a straightforward scenario.
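A minimal sketch of that straightforward check (the two lists of images, xrays_a and xrays_b, are hypothetical):
PYTHON
import numpy as np

# xrays_a and xrays_b are hypothetical lists of 2D numpy arrays, one per dataset
mean_a = np.mean([img.mean() for img in xrays_a])
mean_b = np.mean([img.mean() for img in xrays_b])
print(f"dataset A mean intensity: {mean_a:.1f}, dataset B mean intensity: {mean_b:.1f}")
# one crude normalization: rescale dataset B towards dataset A's average brightness
xrays_b_adjusted = [img * (mean_a / mean_b) for img in xrays_b]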
Consider another example involving derived datasets from brain MRIs with Virchow-Robin spaces. One dataset was captured using a 1.5 Tesla machine in a distant location, while the other used an experimental 5 Tesla (high resolution) machine in an advanced hospital. These machines differ in resolution, meaning that what appears as a single Virchow-Robin space at low resolution might actually show as two or three smaller spaces fused together at high resolution. This is just one potential difference.
Below are images of the same patient scanned with 1.5 Tesla and 3 Tesla machines:
Different contrast levels significantly affect the visibility of structures like the caudate and thalami in brain images. As a result, the radiomic characteristics of these images, including contrast and possibly other parameters, can vary even when they originate from the same patient.
Constructing a dataset based solely on extensive, unharmonized derived data is not always practical. However, without access to the original images, it becomes difficult to fully understand and account for these variations.
We recommend the following approach:
- Compare the data using descriptive statistics and your knowledge of the patient groups (see the sketch after this list). Consider whether you expect similar counts across both groups, and justify your reasoning.
- Explore the use of a harmonization package to standardize data across different datasets.
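As a sketch of the first recommendation, assuming your derived features are stored in two hypothetical CSV files (one per scanner or site), a first-pass comparison with pandas could look like this:
PYTHON
import pandas as pd

# hypothetical per-site tables of derived features (one row per scan)
site_a = pd.read_csv("site_a_features.csv")
site_b = pd.read_csv("site_b_features.csv")

# compare means and standard deviations of the feature columns both sites share
shared = site_a.columns.intersection(site_b.columns)
print(site_a[shared].describe().loc[["mean", "std"]])
print(site_b[shared].describe().loc[["mean", "std"]])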
Below are three examples from the field of neuroimaging:
 | neurocombat | haca3 | autocombat |
---|---|---|---|
License | MIT for Python / Artistic for R | missing? | Apache 2.0 |
Last updated | 2021 | 2024 | 2022 |
Programming language | Python or R | Python | Python |
Organ/area and modality | brain MRI | brain MRI | brain MRI |
Data type | derived values | images | derived values |
Notes | no standard release yet | versioned but not released | not released, not versioned |
Original website | GitHub | GitHub | GitHub |
Early publication | doi:10.1016/j.neuroimage.2017.08.047 | doi:10.1016/j.compmedimag.2023.102285 | doi:10.1038/s41598-022-16609-1 |
Versioned website | versioned on cvasl | versioned on GitHub | versioned on cvasl |
There are numerous packages available for brain MRI alone, not to mention those for imaging the rest of the body. We have selected these three examples to highlight some of the common issues and pitfalls associated with such packages.
Challenge: Identifying Potential Problems in Each Package
Consider issues that might hinder your implementation with each package.
Aging code: neurocombat was last modified three years ago. It may rely on outdated dependencies that could pose compatibility issues with newer hardware and software environments. Additionally, the lack of versioned releases makes it challenging to track changes and updates.
No license: haca3 does not have a clear license available. Using code without a proper license could lead to legal complications and uncertainties about its usage rights.
No releases and versioning: autocombat lacks both released versions and versioning practices. Without clear versioning, it becomes difficult to ensure consistency and compare results across different implementations of the package.
We’ve pointed out a few potential issues to highlight common challenges with such programs. Looking ahead, consider developing a harmonization package that’s sustainable and useful for your research community.
While each of these packages has its strengths for researchers, none are perfect. Choose wisely when selecting a package! Also, remember you can create your own harmonization method through coding or develop it into a package for others to use.
Key Points
- Direct knowledge of specific data cannot be substituted
- Statistical analysis is essential to detect and mitigate biases in patient distribution
- Verify if derived data aligns with known clinical realities through statistical examination
- Evaluate the validity and utility of data augmentation methods before applying them
- Radiomics enables the use of mathematical image qualities as features
- There are several accessible pipelines for volumetrics and radiomics
- Data from different machines (or even the same machines with different settings) often requires harmonization, achievable through coding and the use of existing libraries
Content from Working with Pathology Images
Last updated on 2025-02-05 | Edit this page
Estimated time: 90 minutes
Overview
Questions
- What are some open libraries that can be used with pathology?
- How are pathology slides different from a photo?
- What is the structure of a pathology slide?
Objectives
- Describe some open source tools and libraries for pathology images
- Discuss common file formats for pathology slides
- Explain tiling and pyramidal file formats
- Show basic code in openslide
Introduction
Pathologists deal with multiple types of images. Some pathologists may use photography when recording gross (or macroscopic) pathology data. The processing of digital photography is a straightforward application of general image processing. However, the workhorse image type of pathology is the histopathological image, and this is very different from a photograph.
In the below picture you can see an image with gross and histopathology of signet ring cell carcinoma metastasis to the ovary. The histopathology images (bottom row images) are stained microscopy images. The gross pathology images (top row images) are photographs.
Nakamura, Yoshiaki; Hiramatsu, Ayako; Koyama, Takafumi; Oyama, Yu; Tanaka, Ayuko; Honma, Koichi, CC BY 3.0 https://creativecommons.org/licenses/by/3.0, via Wikimedia Commons
Signet ring cell carcinoma is a type of cancer that currently can only be definitively differentiated from other types of gastric cancers based on the microscopic characteristics (the histology or histopathology) of the tumor. In this lesson we will cover how to work with histopathology images, as they differ from the simpler multi-channel color images that a modern digital camera produces.
File formats
In terms of histopathological images, you may find them in several file formats. Not every library or tool for such imaging works with every format. Below is a table with several file formats.
Format Name | Extension | Notes | More info |
---|---|---|---|
TIFF | .tiff | tag image file format | can be tiled |
OME-TIFF | .ome.tiff, ome.tif, .ome.tf2, .ome.tf8 or .ome.btf | includes XML and may be multiple TIFFs | https://brainder.org/2012/09/23/the-nifti-file-format/ |
SVS | .svs | Scanned Virtual Slide | https://www.dicomstandard.org/ |
DICOM | .dcm or none | supports an included WSI file | https://dicom.nema.org/dicom/dicomwsi/ |
The table above is far from complete because many vendors have their own file formats; however, the community is consolidating towards shared file formats that are not vendor specific. The table is organized in order of ascending complexity. A simple, general TIFF file is very close to a simple raster image. Such a file for a high-resolution image would be quite large and potentially hard to manage based on size alone. Tiling is a way to manage huge images more efficiently.
Tiling approaches will be very familiar to anyone who comes from the world of GIS, mapping, or digital geography. Maps provide an obvious way to think about the advantages of pyramidal tiling. Imagine if, every time you wanted a map with directions and a picture of your destination, you needed to load a detailed map of the entire world, literally. It would be incredibly inefficient and taxing on your computer or smartphone. It is much more efficient to load only what you need. There are many ways this can be done, but in practice some variation of pyramidal tiling is usually used. The same is true with tiled pathology images.
To be clear, tiling is the process of cutting a big image up so you can load only the relevant parts of it at a time. Tiling is useful even without pyramids: it allows efficient parallel inference on huge images (i.e., medical imaging at high resolution), which will mostly run at the highest resolution.
Pyramidal images are those where multiple images are made from a source image at predefined zoom levels, so you can switch between the different resolutions as needed. Remember the last time you downloaded an image for a presentation and there was a choice of resolutions, because not everyone needs an image gigabytes in size for high-resolution quality? You can think of this as a sort of pyramidal-like scheme, except that the images were not all in the same file. In the world of modern digital pathology we often use both tiling and a pyramidal structure simultaneously.
Therefore, practically speaking, we often have tiled images made in a pyramidal structure, and at each level of the pyramid after zero there is a lower-resolution, downscaled version of the image. Each level of the pyramid has tiles, smaller parts of the image that are quilted into the whole image. Creating images in this way allows much more efficient access and management of images because you can load and display images at only one level. An OME-TIFF file can be one or more tiled TIFFs plus some XML in the header. An SVS file is essentially a tiled TIFF image with a few additional things such as an overview image. Various types of pathology files can be packed into a DICOM, which has its own file specification. DICOMs can be conceptualized as containers that can hold many sorts of imaging data, e.g., JPEGs of X-rays, including OME-TIFFs or other pathology file types.
Libraries
There is actually very little in the way of open, pure-Python libraries that deal with pathology images. A promising new contender is TIA Toolbox; however, this is a relatively new library. There are better-known libraries, popular within the small community of digital pathology researchers, that have bindings or other tricks that allow you to process pathology images in Python. Probably the most popular is OpenSlide, but other contenders include Bio-Formats (which is both a library and a tool) and QuPath, which has the distinct advantage of being able to work with OME-TIFF directly.
QuPath is mainly a GUI tool that relies on libraries (Bio-Formats/OpenSlide) as “backends” to load the slide. It can be used directly as a tool for viewing, unlike Bio-Formats/OpenSlide, which are code only. One potential disadvantage of QuPath, if you code with it as opposed to just using its GUI, is that it usually requires some Java knowledge. Bio-Formats has Python bindings but is also mainly a Java tool, which can also be used as a Java or C++ library. To keep things simple we will use OpenSlide as the vehicle to explore histopathology images in this lesson.
Reading TIFF histopathology Images
OpenSlide is actually written in C, but you can also use Java or Python to work with it due to bindings. To run the code you will need to work in the pathy environment created with the environment_pathology.yml file.
PYTHON
import openslide
from openslide import open_slide
import numpy as np
import matplotlib.pyplot as plt
After importing the needed libraries we can now load a slide and print out its properties.
PYTHON
file_path = 'data/pathology/CMU-1.tiff'
slide_in = open_slide(file_path)
slide_in_props = slide_in.properties
print(slide_in_props)
OUTPUT
<_PropertyMap {'openslide.level-count': '9', 'openslide.level[0].downsample': '1', 'openslide.level[0].height': '32914', 'openslide.level[0].tile-height': '256', 'openslide.level[0].tile-width': '256', 'openslide.level[0].width': '46000', 'openslide.level[1].downsample': '2', 'openslide.level[1].height': '16457', 'openslide.level[1].tile-height': '256', 'openslide.level[1].tile-width': '256', 'openslide.level[1].width': '23000', 'openslide.level[2].downsample': '4.0001215362177929', 'openslide.level[2].height': '8228', 'openslide.level[2].tile-height': '256', 'openslide.level[2].tile-width': '256', 'openslide.level[2].width': '11500', 'openslide.level[3].downsample': '8.0002430724355857', 'openslide.level[3].height': '4114', 'openslide.level[3].tile-height': '256', 'openslide.level[3].tile-width': '256', 'openslide.level[3].width': '5750', 'openslide.level[4].downsample': '16.000486144871171', 'openslide.level[4].height': '2057', 'openslide.level[4].tile-height': '256', 'openslide.level[4].tile-width': '256', 'openslide.level[4].width': '2875', 'openslide.level[5].downsample': '32.014322017605849', 'openslide.level[5].height': '1028', 'openslide.level[5].tile-height': '256', 'openslide.level[5].tile-width': '256', 'openslide.level[5].width': '1437', 'openslide.level[6].downsample': '64.050935911470475', 'openslide.level[6].height': '514', 'openslide.level[6].tile-height': '256', 'openslide.level[6].tile-width': '256', 'openslide.level[6].width': '718', 'openslide.level[7].downsample': '128.10187182294095', 'openslide.level[7].height': '257', 'openslide.level[7].tile-height': '256', 'openslide.level[7].tile-width': '256', 'openslide.level[7].width': '359', 'openslide.level[8].downsample': '257.06193261173183', 'openslide.level[8].height': '128', 'openslide.level[8].tile-height': '256', 'openslide.level[8].tile-width': '256', 'openslide.level[8].width': '179', 'openslide.mpp-x': '1000', 'openslide.mpp-y': '1000', 'openslide.quickhash-1': '428aa6abf42c774234a463cb90e2cbf88423afc0217e46ec2e308f31e29f1a9f', 'openslide.vendor': 'generic-tiff', 'tiff.ResolutionUnit': 'centimeter', 'tiff.XResolution': '10', 'tiff.YResolution': '10'}>
From the printed properties we can see that we are dealing with a tiled slide, because we have properties like levels and tile height and width. We can observe that this specific image file has nine levels, and that tile heights and widths are often 256 in this file, while the actual heights and widths vary per level. These may not be the only properties we care about. We most likely also care about mapping this image back to real-world sizes, and we have a property that tells us the microns per pixel. Let’s look:
PYTHON
print("Pixel size of X in um is:", slide_in_props['openslide.mpp-x'])
print("Pixel size of Y in um is:", slide_in_props['openslide.mpp-y'])
OUTPUT
Pixel size of X in um is: 1000
Pixel size of Y in um is: 1000
Challenge: Find the real world size of an image
There is a property of the loaded image which is a (width, height) tuple for level 0 of the image. The property is called dimensions. Calculate the real-world dimensions of our image in microns.
Hint: Make sure you understand the type of each variable you assign. You can always check by printing out the type.
PYTHON
# get the width and height
width, height = slide_in.dimensions
# get the spatial resolution in microns per pixel for the X and Y directions
microns_per_pixel_x = slide_in.properties.get(openslide.PROPERTY_NAME_MPP_X)
microns_per_pixel_y = slide_in.properties.get(openslide.PROPERTY_NAME_MPP_Y)
# convert to floats because they are strings
microns_per_pixel_x = float(microns_per_pixel_x)
microns_per_pixel_y = float(microns_per_pixel_y)
# calculate real-world dimensions of the slide in microns
real_world_width = width * microns_per_pixel_x
real_world_height = height * microns_per_pixel_y
# print the results
print(f"Real-world dimensions in microns: {real_world_width} microns (width), {real_world_height} microns (height)")
OUTPUT
Real-world dimensions in microns: 46000000.0 microns (width), 32914000.0 microns (height)
As you might guess, loading this huge image is pointless if we just want a small image that gives us a general idea. We can use OpenSlide to make a thumbnail.
PYTHON
# get a thumbnail of the image and pop it out on your computer
slide_in_thumb_600 = slide_in.get_thumbnail(size=(600, 600))
slide_in_thumb_600.show()
With the above code an image will be opened on most operating systems, but if one does not open for you, do not despair. We could also store our thumbnail (or any part of the image, for that matter) as a NumPy array and then graph it.
PYTHON
# convert thumbnail to numpy array, and plot it
slide_in_thumb_600_np = np.array(slide_in_thumb_600)
plt.figure(figsize=(8,8))
plt.imshow(slide_in_thumb_600_np)
OUTPUT
The above image is part of the OpenSlide sample images provided with the library, which are distributed under an open license.
We can also explore a bit about how the image is tiled.
PYTHON
# get slide_in dims at each level of the pyramid at various levels
dims = slide_in.level_dimensions
num_levels = len(dims)
print("Number of levels in this image are:", num_levels)
print("Dimensions of various levels in this image are:", dims)
# understand by how much are levels downsampled from the original image
factors = slide_in.level_downsamples
print("Each level is downsampled by an amount of: ", factors)
OUTPUT
Number of levels in this image are: 9
Dimensions of various levels in this image are: ((46000, 32914), (23000, 16457), (11500, 8228), (5750, 4114), (2875, 2057), (1437, 1028), (718, 514), (359, 257), (179, 128))
Each level is downsampled by an amount of: (1.0, 2.0, 4.000121536217793, 8.000243072435586, 16.00048614487117, 32.01432201760585, 64.05093591147048, 128.10187182294095, 257.06193261173183)
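Because the slide is tiled and pyramidal, we never need to load a whole level to inspect one area; we can read just a region. Below is a minimal sketch using OpenSlide’s read_region; the location and size here are arbitrary values chosen for illustration:
PYTHON
# arguments: (x, y) location in level-0 coordinates, pyramid level, (width, height) of the region
region = slide_in.read_region((20000, 15000), 2, (512, 512))
region_np = np.array(region)  # an RGBA numpy array
plt.figure(figsize=(6, 6))
plt.imshow(region_np)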
Pyramidal file structure and tiling are quite challenging to understand, and different file formats can implement the concepts a bit differently. Pyramids are actually a more general concept in image processing. The key concept to understand is that a pyramidal image simply represents the image at multiple scales, and that this could be implemented in many different ways. Usually we don’t create such pyramids by hand, but rather, you guessed it, algorithmically with code. The pictures below illustrate the concepts involved in tiled pyramidal images.
Images with modifications from article “KMIT-Pathology: Digital Pathology AI Platform for Cancer Biomarkers Identification on Whole Slide Images” with CC BY 4.0 license. Subramanian, Rajasekaran & Rajasekaran, Devikarubi & Tapadia, Rohit & Singh, Rochan. (2022). KMIT-Pathology: Digital Pathology AI Platform for Cancer Biomarkers Identification on Whole Slide Images. International Journal of Advanced Computer Science and Applications. 13. 10.14569/IJACSA.2022.0131170.
One final note to consider about this pyramidal, tiled structure of images is that it may well be the wave of the future for more types of medical images, as medical images reach higher and higher resolutions and more and more data is stored in the cloud.
Content from Anonymizing Medical Images
Last updated on 2024-11-03 | Edit this page
Estimated time: 70 minutes
Overview
Questions
- What types of data make a patient’s imaging data identifiable?
- How can I ensure the safe sharing of medical image data?
- How can I remove specific metadata from DICOM files?
Objectives
- Provide examples of data that makes patient images identifiable
- Discuss the concepts of identifiable data and anonymization
- Demonstrate approaches to anonymizing identifiable images
- Demonstrate the use of the Pydicom library to manage DICOM metadata
Introduction
Each of us is similar yet unique, and this individuality can make us identifiable, posing challenges for medical research. While open data sharing advances research, most patients would not want their medical details shared if they could be identified. In most countries, patient information is protected by law.
Metadata elements in imaging, such as patient names and addresses, are often clearly designed to identify patients. However, the uniqueness of patients means that even images without obvious metadata, such as names, can potentially be identified as belonging to a specific individual. With advancements in facial recognition software and search engines, images we previously thought were non-identifiable, like head CTs, MRIs, or even PET scans, can theoretically be traced back to a specific patient. To address this, we can implement de-identification strategies to create shareable data.
Types of Patient Identifying Data
Metadata
DICOM files contain metadata, which includes various types of identifying information that should remain confidential. The easiest way to mitigate issues with DICOM metadata is to avoid having it in the first place. If possible, opt to receive just the images and select metadata rather than the entire DICOM file. When sharing data with collaborators, there is often no need to share the full DICOM files.
Text on Images
Occasionally, technicians will “burn” information directly onto images as part of a burned-in annotation. This means they change the image itself with text that becomes part of the image pixels. This may include details such as diagnoses, demographics, or the patient’s name. Fortunately, this text is usually typed rather than handwritten, making it recognizable by optical character recognition (OCR) functions. Often, this text is placed away from the center of the image, allowing for clever cropping to eliminate it entirely in some datasets.
Challenge: Getting rid of identifying burned-in data
An ultrasound (taken from the public internet under a Creative Commons license here) must be kept at its existing height and width dimensions. You should change the image from RGB to grayscale for your operations. It can be opened as follows:
PYTHON
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, filters
image = 'data/anonym/identifiable_us.jpg'
io.imshow(image)
io.show()
OUTPUT
Write code for two approaches that de-identify (remove the annotations from) the ultrasound image data.
The image has identifying burned-in text. You should identify not only the name, but also the date and place as problematic. You can take three different approaches: one is to mask the data, another is to blur it, and finally you can just crop it out entirely. First we will show a selectively blurred image:
PYTHON
from skimage.filters import gaussian
from skimage.color import rgb2gray
image_base = io.imread(image)
image_base = rgb2gray(image_base)
sub_im = image_base[0:78,:].copy()  # the strip of rows containing the burned-in text
blur_sub_im = gaussian(sub_im, sigma=9)  # blur just that strip
final_image = np.zeros(image_base.shape)
final_image[0:78,:] = blur_sub_im  # blurred annotation strip
final_image[78:,:] = image_base[78:, :]  # keep the rest of the image unchanged
io.imshow(final_image)
OUTPUT
We could also have just made a simple zero-mask.
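A minimal sketch of such a mask, assuming the annotations sit in the same top 78 rows as in the blurring solution:
PYTHON
# copy the grayscale image and zero out the rows containing the annotations
masked_image = image_base.copy()
masked_image[0:78, :] = 0
io.imshow(masked_image)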
OUTPUT
Finally, you could always just chop off the offending part, so to speak, and resize. However, if you examine the image when we try this solution, you should notice that the image has changed in aspect ratio, and thus pixel values have changed. This may not be the ideal solution for situations where you want to maintain the exact image pixels. Nonetheless, you can crop and resize as below:
PYTHON
from skimage.transform import resize
final_image= image_base[79:, :]
final_image = resize(final_image,image_base.shape)
io.imshow(final_image)
OUTPUT
Note that there are other valid solutions for getting rid of identifying image data, but the first two shown (blurring and masking) are very common and straightforward. You have now seen three approaches to removing some of the visual data on a 2D image. The same approaches can be taken on a 3D image.
Let’s take another look at an image we used earlier:
PYTHON
import matplotlib.pyplot as plt
import SimpleITK as sitk
from skimage import io
from ipywidgets import interact, fixed
from IPython.display import clear_output
import os
sag_image = sitk.ReadImage("data/sitk/A1_grayT1.nrrd", sitk.sitkFloat32)
cor_image = sitk.PermuteAxes(sag_image, [2, 1, 0])
# General display function for any two 3D images
def display_images(image1_z, image2_z, image1_npa, image2_npa, title1="Image 1", title2="Image 2"):
plt.subplots(1, 2, figsize=(10, 8))
# Display the first image
plt.subplot(1, 2, 1)
plt.imshow(image1_npa[image1_z, :, :], cmap=plt.cm.Greys_r)
plt.title(title1)
plt.axis('off')
# Display the second image
plt.subplot(1, 2, 2)
plt.imshow(image2_npa[image2_z, :, :], cmap=plt.cm.Greys_r)
plt.title(title2)
plt.axis('off')
plt.show()
# Display the sagittal and coronal views of the original image
interact(
display_images,
image1_z=(0, sag_image.GetSize()[2] - 1),
image2_z=(0, cor_image.GetSize()[2] - 1),
image1_npa=fixed(sitk.GetArrayViewFromImage(sag_image)),
image2_npa=fixed(sitk.GetArrayViewFromImage(cor_image)),
title1=fixed("Sagittal Cut"),
title2=fixed("Coronal Cut")
)
OUTPUT
Here we can see a technique analogous to one we used with the 2D ultrasound: identifying data has been cropped out.
Challenge: Are we truly de-identified? (Optional)
Look at the head MRI above. Are all such images, including CTs, with some of the soft tissue stripped, 100 percent de-identified? Give arguments for why and why not.
In most cases images like the above are de-identified. The image above is only an image, not a DICOM with potentially identifying metadata. Further, we are missing the nose, so putting this into a reverse image look-up engine online will not yield identifying results. On the other hand, we could theoretically have an identifiable image. One potentially identifiable case could be someone with a very distinctive neck, skull bones (these will be more visible on CT) and/or ears. Additionally, patients who have had some kind of rare pathology and surgery may still be identifiable to someone familiar with their cases.
Faces in Images
A full CT, MRI, or PET scan of the head can be reconstructed into a detailed facial image, potentially revealing the patient’s identity and demographic information, such as ethnicity and gender. To mitigate this risk, many image analysis programs employ ‘defacing’ techniques to obscure these identifiable features.
We could in theory write our own defacing algorithm. For such an algorithm, libraries such as SITK, scikit-image, or even OpenCV provide several useful built-in functions, including morphological operations such as erosion, dilation, and grow-from-seed, as well as other operations such as connected component analysis and masking. Which library you choose will often come down to minimizing the number of libraries in your code, because mixing similar libraries with similarly named functions is a bad idea.
Challenge: Soft tissue stripping (Optional)
Look at the head MRI above. Try using SITK to get rid of some of the soft tissue in the image (already loaded into sag_image).
We can approach the problem in various ways. Below is one solution with erosion, dilation and masking:
PYTHON
# Apply thresholding to remove soft tissues by first taking out the air then dilating and cleaning
# the assumption is that the black around the brain is zero and low values
# Create the brain mask
lower_thresh = 0
upper_thresh = 100
brain_mask = sitk.BinaryThreshold(sag_image, lowerThreshold=lower_thresh, upperThreshold=upper_thresh)
# Morphological operations to clean the mask
brain_mask_cleaned = sitk.BinaryDilate(brain_mask, [5, 5, 5])
brain_mask_cleaned = sitk.BinaryErode(brain_mask_cleaned, [5, 5, 5])
# Display the original and cleaned mask images using the general display function
interact(
display_images,
image1_z=(0, brain_mask.GetSize()[2] - 1),
image2_z=(0, brain_mask_cleaned.GetSize()[2] - 1),
image1_npa=fixed(sitk.GetArrayViewFromImage(brain_mask)),
image2_npa=fixed(sitk.GetArrayViewFromImage(brain_mask_cleaned)),
title1=fixed("Original Mask"),
title2=fixed("Cleaned Mask")
)
Then we can further clean up with a connected component analysis that throws out the small floaters, and use the inverse as our final mask.
PYTHON
def keep_largest_component(mask_image):
# Ensure mask_image is binary (0 and 1 values)
mask_image = sitk.Cast(mask_image, sitk.sitkUInt8)
# Label connected components in the mask image
labeled_image = sitk.ConnectedComponent(mask_image)
# Measure the size of each labeled component
label_shape_statistics = sitk.LabelShapeStatisticsImageFilter()
label_shape_statistics.Execute(labeled_image)
# Count and print the number of connected components
component_count = len(label_shape_statistics.GetLabels())
print(f"Number of connected components before filtering: {component_count}")
# Find the label with the largest size
largest_label = max(
label_shape_statistics.GetLabels(),
key=lambda label: label_shape_statistics.GetPhysicalSize(label)
)
# Create a new mask with only the largest component
largest_component_mask = sitk.BinaryThreshold(labeled_image, lowerThreshold=largest_label, upperThreshold=largest_label, insideValue=1, outsideValue=0)
# Verify the result by counting the components in the resulting image
labeled_result = sitk.ConnectedComponent(largest_component_mask)
label_shape_statistics.Execute(labeled_result)
result_component_count = len(label_shape_statistics.GetLabels())
print(f"Number of connected components after filtering: {result_component_count}")
return largest_component_mask
largest_component_mask = keep_largest_component(brain_mask_cleaned)
# we actually want the opposite mask so we will invert the mask
inverted_mask = sitk.BinaryNot(largest_component_mask)
# Apply the mask to the image
brain_only = sitk.Mask(sag_image, inverted_mask)
interact(
display_images,
image1_z = (0,brain_only.GetSize()[2]-1),
image2_z = (0,largest_component_mask.GetSize()[2]-1),
image1_npa = fixed(sitk.GetArrayViewFromImage(brain_only)),
image2_npa = fixed(sitk.GetArrayViewFromImage(largest_component_mask)),
title1=fixed("new image"),
title2=fixed("mask")
)
OUTPUT
Now you may notice that we put together an erosion and a dilation, where we could instead have used a closing. There are many ways to solve this problem. An entirely different approach would be to use grow-from-seed, as below (note this may take a while to execute):
PYTHON
def guess_seed_point(img):
"""
This function guesses a seed point for the brain in the middle of the image, and returns some seed points.
"""
possible_point = img.GetSize()[0]//2, img.GetSize()[1]//2, img.GetSize()[2]//2
# Get the pixel value at the potential location
pixel_value = img.GetPixel(*possible_point)
if pixel_value > 0:
picked_point = possible_point
else:
# just move over a bit and hope for better
new_possible_point = img.GetSize()[0]//2 + img.GetSize()[0]//10 , img.GetSize()[1]//2, img.GetSize()[2]//2
picked_point = new_possible_point
return picked_point
# do some reality check of a look at the value in your seed point
seed_point = guess_seed_point(sag_image)
pixel_value = sag_image.GetPixel(seed_point)
print(pixel_value)
OUTPUT
394.37042236328125
Grow from seed, then display:
PYTHON
seed_point = guess_seed_point(sag_image)
lower_threshold = 370 # lower threshold
upper_threshold = 480 # upper threshold
seed_mask = sitk.ConnectedThreshold(
sag_image, seedList=[seed_point],
lower=lower_threshold,
upper=upper_threshold)
# let's dilate up a bit
seed_mask= sitk.BinaryDilate(seed_mask, [5, 5, 5])
# apply the mask to the image
brain_only = sitk.Mask(sag_image, seed_mask)
# display to see what happened
interact(
display_images,
image1_z=(0, sag_image.GetSize()[2] - 1),
image2_z=(0, brain_only.GetSize()[2] - 1),
image1_npa=fixed(sitk.GetArrayViewFromImage(sag_image)),
image2_npa=fixed(sitk.GetArrayViewFromImage(brain_only)),
title1=fixed("Original"),
title2=fixed("Seeded and Masked")
)
OUTPUT
In the two solutions above we made masks, and each has different problems we can see. For example, in the grow-from-seed images we stripped out part of the genu and splenium of the corpus callosum, not to mention the entire pons. An alternative is that we could also have used a registered atlas as a mask. Regardless of the base algorithms we begin with, if we want a better image, we can keep tinkering with our algorithms.
In the provided solutions to the above optional challenge we didn’t get rid of a lot of the neck, and had other issues. A different approach would have been to register the brain to a brain-like shape and use that as a mask. Nonetheless, the given solutions show two potential approaches to removing tissue that you don’t want in an image (and which could in some cases lead to identification). Many researchers do not want to waste time optimizing a skull-stripping technique if this is not their research question, so they use a pre-made technique.
There are various tools available for defacing head imaging, ranging from fully developed software products like FreeSurfer, which includes built-in defacing capabilities, to specialized functions within coding libraries. Some of these tools strip off all of the skull and soft tissue which may be useful for analysis even if we don’t care about deidentification e.g. if we only want to look at brain tissue.
However, a key issue under current investigation is that some defacing algorithms may inadvertently alter more than just the facial features. Emerging research suggests that these algorithms might also affect the morphometry of the brain image. This could lead to the unintended loss or distortion of critical data. Therefore, it is advisable to proceed with caution and, whenever possible, compare the original and defaced images to ensure that important information remains intact and unaltered.
Other Parts of Images
Patient identity can often be inferred with just a few pieces of data. In some cases, a single piece of information can be enough to track down a patient’s identity, especially if medical files are accessible. For instance, a serial number or other identifying number on a medical device may be traceable back to a specific patient.
In other situations, slightly more data might be required to identify a patient. Some patients may wear unique jewelry, such as a MedicAlert bracelet or necklace with initials or a name. While most routine ambulatory images are taken without jewelry, in emergency situations, medical personnel may not have had the time to remove these items. The more data points we have on a patient, the easier it becomes to identify them.
Metadata
Metadata is, in the broadest sense, data about our data. In a practical sense, we need to understand that it is a place outside the image where identifying data may be lurking.
Various tools are available to help de-identify DICOM files in terms of metadata. A notable one is DicomAnonymizer, an open-source tool written in Python.
In some cases, you may need to examine and remove metadata manually or programmatically. For example, in some countries, DICOM fields are used inconsistently, and patient-identifying data can appear in unexpected fields. Therefore, careful examination and customized removal of metadata may be necessary.
Many Ways to Handle a DICOM:
- Multiple libraries, such as Pydicom and SimpleITK (SITK), allow you to read, access, and manipulate DICOM metadata.
- DICOMs follow an extremely complex standard, so it is usually better to use existing libraries rather than raw Python to handle them.
For various reasons, we may prefer Pydicom, SITK, or another method to handle DICOM metadata, typically based on the principle of minimizing dependencies and maintaining simplicity. SITK was introduced earlier in this course. Pydicom is an excellent alternative, particularly because of its comprehensive documentation.
Now, let’s see how to open a DICOM file and work with it using Pydicom.
First, let’s import Pydicom and read in a CT scan:
PYTHON
import pydicom
from pydicom import dcmread
fpath = "data/anonym/our_sample_dicom.dcm"
ds = dcmread(fpath)
print(ds)
OUTPUT
Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length UL: 218
(0002, 0001) File Meta Information Version OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID UI: CT Image Storage
(0002, 0003) Media Storage SOP Instance UID UI: 1.3.46.670589.33.1.63849049636503447100001.4758671761353145811
(0002, 0010) Transfer Syntax UID UI: JPEG Lossless, Non-Hierarchical, First-Order Prediction (Process 14 [Selection Value 1])
(0002, 0012) Implementation Class UID UI: 1.2.840.113845.1.1
(0002, 0013) Implementation Version Name SH: 'Syn7,3,0,258'
(0002, 0016) Source Application Entity Title AE: 'SynapseDicomSCP'
-------------------------------------------------
(0008, 0005) Specific Character Set CS: 'ISO_IR 100'
(0008, 0008) Image Type CS: ['DERIVED', 'SECONDARY', 'MPR']
(0008, 0012) Instance Creation Date DA: '20240418'
(0008, 0013) Instance Creation Time TM: '150716.503'
(0008, 0016) SOP Class UID UI: CT Image Storage
(0008, 0018) SOP Instance UID UI: 1.3.46.670589.33.1.63849049636503447100001.4758671761353145811
(0008, 0020) Study Date DA: '20240418'
(0008, 0022) Acquisition Date DA: '20240418'
(0008, 0023) Content Date DA: '20240418'
(0008, 002a) Acquisition DateTime DT: '20240418150313.020'
(0008, 0030) Study Time TM: '150045'
(0008, 0032) Acquisition Time TM: '150313'
(0008, 0033) Content Time TM: '150314.375'
(0008, 0050) Accession Number SH: '2001433888'
(0008, 0060) Modality CS: 'CT'
(0008, 0070) Manufacturer LO: 'Philips'
(0008, 0080) Institution Name LO: 'BovenIJ Ziekenhuis iCT'
(0008, 0081) Institution Address ST: ''
(0008, 0090) Referring Physician's Name PN: 'WILTING^I^I^""'
(0008, 1010) Station Name SH: 'HOST-999999'
(0008, 1030) Study Description LO: 'CT thorax met iv contrast'
(0008, 103e) Series Description LO: 'Cor IMR med'
(0008, 1040) Institutional Department Name LO: 'Radiology'
(0008, 1080) Admitting Diagnoses Description LO: ''
(0008, 1084) Admitting Diagnoses Code Sequence 0 item(s) ----
(0008, 1090) Manufacturer's Model Name LO: 'iCT 256'
(0008, 1111) Referenced Performed Procedure Step Sequence 1 item(s) ----
(0008, 1150) Referenced SOP Class UID UI: Modality Performed Procedure Step SOP Class
(0008, 1155) Referenced SOP Instance UID UI: 1.3.46.670589.33.1.63849049241567858000001.4675122277016890611
---------
(0008, 1140) Referenced Image Sequence 1 item(s) ----
(0008, 1150) Referenced SOP Class UID UI: CT Image Storage
(0008, 1155) Referenced SOP Instance UID UI: 1.3.46.670589.33.1.63849049294969912500001.5475332148846191441
---------
(0008, 3010) Irradiation Event UID UI: 1.3.46.670589.33.1.63849049343237673200010.5507538603167078985
(0010, 0010) Patient's Name PN: 'OurBeloved^Colleague'
(0010, 0020) Patient ID LO: 'party like 1999'
(0010, 0030) Patient's Birth Date DA: '19421104'
(0010, 0040) Patient's Sex CS: 'M'
(0010, 1000) Other Patient IDs LO: '1989442112'
(0010, 1010) Patient's Age AS: '041Y'
(0018, 0010) Contrast/Bolus Agent LO: 'Iodine'
(0018, 0015) Body Part Examined CS: 'CHEST'
(0018, 0022) Scan Options CS: 'HELIX'
(0018, 0050) Slice Thickness DS: '2.0'
(0018, 0060) KVP DS: '100.0'
(0018, 0088) Spacing Between Slices DS: '2.0'
(0018, 0090) Data Collection Diameter DS: '500.0'
(0018, 1000) Device Serial Number LO: ''
(0018, 1020) Software Versions LO: '4.1'
(0018, 1030) Protocol Name LO: 'Thorax std /Thorax'
(0018, 1040) Contrast/Bolus Route LO: 'IV'
(0018, 1041) Contrast/Bolus Volume DS: '80.0'
(0018, 1044) Contrast/Bolus Total Dose DS: '40.0'
(0018, 1046) Contrast Flow Rate DS: [3, 3]
(0018, 1047) Contrast Flow Duration DS: [17, 10]
(0018, 1049) Contrast/Bolus Ingredient Concentra DS: '300.0'
(0018, 1100) Reconstruction Diameter DS: '348.0'
(0018, 1110) Distance Source to Detector DS: '1040.0'
(0018, 1111) Distance Source to Patient DS: '570.0'
(0018, 1120) Gantry/Detector Tilt DS: '0.0'
(0018, 1130) Table Height DS: '85.1'
(0018, 1150) Exposure Time IS: '434'
(0018, 1151) X-Ray Tube Current IS: '258'
(0018, 1152) Exposure IS: '108'
(0018, 1160) Filter Type SH: 'IMR'
(0018, 1210) Convolution Kernel SH: 'IMR1,Soft Tissue'
(0018, 5100) Patient Position CS: 'FFS'
(0018, 9305) Revolution Time FD: 0.33
(0018, 9306) Single Collimation Width FD: 0.625
(0018, 9307) Total Collimation Width FD: 80.0
(0018, 9309) Table Speed FD: 185.0
(0018, 9310) Table Feed per Rotation FD: 97.664
(0018, 9311) Spiral Pitch Factor FD: 0.763
(0018, 9345) CTDIvol FD: 4.330253533859318
(0018, a001) Contributing Equipment Sequence 1 item(s) ----
(0008, 0070) Manufacturer LO: 'PHILIPS'
(0008, 0080) Institution Name LO: 'BRILLIANCE4'
(0008, 0081) Institution Address ST: 'BRILLIANCE4'
(0008, 1010) Station Name SH: 'HOST-999999'
(0008, 1040) Institutional Department Name LO: 'BRILLIANCE4'
(0008, 1090) Manufacturer's Model Name LO: 'BRILLIANCE4'
(0018, 1000) Device Serial Number LO: 'BRILLIANCE4'
(0018, 1020) Software Versions LO: '4.5.0.30020'
(0040, a170) Purpose of Reference Code Sequence 1 item(s) ----
(0008, 0100) Code Value SH: '109102'
(0008, 0102) Coding Scheme Designator SH: 'DCM'
(0008, 0104) Code Meaning LO: 'Processing Equipment'
---------
---------
(0020, 000d) Study Instance UID UI: 1.3.46.670589.33.1.63849049241560857600001.4706589000974752499
(0020, 000e) Series Instance UID UI: 1.3.46.670589.33.1.63849049343237673200004.5226562961912261811
(0020, 0010) Study ID SH: '8041'
(0020, 0011) Series Number IS: '203'
(0020, 0012) Acquisition Number IS: '2'
(0020, 0013) Instance Number IS: '1'
(0020, 0032) Image Position (Patient) DS: [-172.7884, 8.90000000000001, 1201.43792746114]
(0020, 0037) Image Orientation (Patient) DS: [1, 0, 0, 0, 0, -1]
(0020, 0052) Frame of Reference UID UI: 1.3.46.670589.33.1.63849049263758127200002.5362237490253193614
(0020, 1040) Position Reference Indicator LO: ''
(0020, 4000) Image Comments LT: 'Cor IMR med'
(0028, 0002) Samples per Pixel US: 1
(0028, 0004) Photometric Interpretation CS: 'MONOCHROME2'
(0028, 0010) Rows US: 832
(0028, 0011) Columns US: 772
(0028, 0030) Pixel Spacing DS: [0.4507772, 0.4507772]
(0028, 0100) Bits Allocated US: 16
(0028, 0101) Bits Stored US: 12
(0028, 0102) High Bit US: 11
(0028, 0103) Pixel Representation US: 0
(0028, 1050) Window Center DS: [50, 50]
(0028, 1051) Window Width DS: [350, 350]
(0028, 1052) Rescale Intercept DS: '-1024.0'
(0028, 1053) Rescale Slope DS: '1.0'
(0032, 1033) Requesting Service LO: 'CHIPSOFT'
(0040, 1001) Requested Procedure ID SH: 'CT5001IV'
(0054, 1001) Units CS: 'HU'
(00e1, 0010) Private Creator LO: 'ELSCINT1'
(00e1, 1036) Private tag data CS: 'YES'
(00e1, 1040) [Image Label] SH: 'Cor IMR med'
(00e1, 1046) Private tag data OB: Array of 512 elements
(01f1, 0010) Private Creator LO: 'ELSCINT1'
(01f1, 1001) [Acquisition Type] CS: 'SPIRAL'
(01f1, 1002) [Unknown] CS: 'STANDARD'
(01f1, 100e) [Unknown] FL: 0.0
(01f1, 1027) [Rotation Time] DS: '0.33'
(01f1, 1032) [Image View Convention] CS: 'RIGHT_ON_LEFT'
(01f1, 104a) [Unknown] SH: 'DOM'
(01f1, 104b) [Unknown] SH: '128x0.625'
(01f1, 104d) [Unknown] SH: 'NO'
(01f1, 104e) [Unknown] SH: 'Chest'
(01f1, 1054) Private tag data IS: '11'
(01f1, 1056) Private tag data LO: '30.0451206729581'
(01f7, 0010) Private Creator LO: 'ELSCINT1'
(01f7, 1022) [Unknown] UI: 1.3.46.670589.33.1.63849049343237673200010.5507538603167078985
(07a1, 0010) Private Creator LO: 'ELSCINT1'
(07a1, 1010) [Tamar Software Version] LO: '4.0.0'
(7fe0, 0010) Pixel Data OB: Array of 309328 elements
Challenge: Identifying Safe Metadata in DICOM
Can you determine which metadata for this CT scan is likely safe, meaning it does not lead to patient identification? When would you choose to retain such data?
Metadata related to the machine, image type, and file type are generally safe. This information is particularly valuable when sorting through numerous DICOM files to locate specific types of images or when generating tabular data for harmonization purposes.
We can modify elements of our DICOM metadata:
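One way to look at the current value is by tag-based access; here is a sketch (in a notebook, the value’s representation is displayed directly):
PYTHON
# access the Patient's Name data element by its (group, element) tag
ds[0x0010, 0x0010].value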
OUTPUT
'OurBeloved^Colleague'
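And one way, again sketched via the tag, to overwrite it:
PYTHON
# overwrite the value via the tag and display the modified element
ds[0x0010, 0x0010].value = 'Citizen^Almoni'
print(ds[0x0010, 0x0010])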
OUTPUT
(0010, 0010) Patient's Name PN: 'Citizen^Almoni'
In some cases, as here, we are dealing with a standard keyword. The keyword PatientName is, in programming terms, technically a property of the class FileDataset, but here we are using “keyword” to refer to it and other very standard properties of the DICOM. Certain keywords can be modified as follows:
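For example, a minimal sketch assigning through the PatientName keyword:
PYTHON
# assign directly via the standard PatientName keyword
ds.PatientName = 'Almoni^Shmalmoni'
print(ds[0x0010, 0x0010])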
OUTPUT
(0010, 0010) Patient's Name PN: 'Almoni^Shmalmoni'
You can also just set an element to empty by using None:
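For instance, as a sketch:
PYTHON
# empty the element by assigning None
ds.PatientName = None
print(ds[0x0010, 0x0010])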
OUTPUT
(0010, 0010) Patient's Name PN: None
You can also delete and add elements. After making modifications, remember to save your file:
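A minimal sketch of deleting an element and saving the result (the output file name here is a hypothetical illustration):
PYTHON
# delete the Patient ID element by its keyword
del ds.PatientID
# after modifications, write the result to a new DICOM file
ds.save_as("data/anonym/our_sample_dicom_deidentified.dcm")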
We recommend removing at least the patient IDs and birthdates in most cases. Additionally, consider examining the data elements ‘OtherPatientIDs’ and ‘OtherPatientIDsSequence’.
Challenge: Accessing Additional Patient Identifying Data
How can you access and print additional patient identifying data? Hint: Refer to the documentation and compare with what we have already printed.
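One possible, by no means exhaustive, sketch is to loop over the data elements and flag any whose keyword hints at identifying information:
PYTHON
# scan the top-level data elements for potentially identifying keywords
suspicious_words = ("Patient", "Physician", "Institution", "Station", "Birth", "Address")
for element in ds:
    if any(word in element.keyword for word in suspicious_words):
        print(element)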
Pydicom offers a wide range of capabilities. You can visualize your DICOM data in a hierarchical tree format for user-friendly GUI reading. It supports downsizing images and handling waveform data such as EKGs. By integrating with Matplotlib, you can load and plot files seamlessly. Before adding additional libraries, explore Pydicom’s full potential to leverage its extensive functionalities.
Key Points
- Certain metadata should almost always be removed from DICOM files before sharing
- Automated tools are available to strip metadata from DICOMs, but manual verification is necessary due to inconsistencies in how fields are utilized
- Several Python libraries enable access to DICOM metadata
- Sharing only image files such as JPEGs or NIfTI can mitigate risks associated with metadata
- Imaging data alone, even without explicit metadata, can sometimes lead to patient identification
- You may need to preprocess images themselves so patients are de-identified
- Tools exist to deface images to further protect patient identity
Content from Generative AI in Medical Imaging
Last updated on 2024-08-15 | Edit this page
Estimated time: 40 minutes
Overview
Questions
- What is generative AI?
- How can generative AI be safely used in my work?
Objectives
- Showcase practical examples of generative AI applications
- Explore critical considerations regarding the risks and challenges associated with generative AI
Introduction
Generative artificial intelligence (AI) includes technologies that create or generate data. Since the early 2020s, there has been significant attention on large language models (LLMs) like ChatGPT, Copilot, and LLaMA, alongside image generation tools such as Stable Diffusion, Midjourney, and DALL-E. These tools not only create images but can also manipulate styles.
The applications of generative AI span widely across fields like medical imaging, where they hold potential for image interpretation and numerous other tasks. Of particular interest to participants in this course are the capabilities of LLMs to generate code and produce large volumes of synthetic data. Ongoing research continues to explore these capabilities.
However, the safety implications of these technologies remain a subject of debate. Depending on the software or model used, data entered into the system may become the property of the software’s creators. It is crucial to exercise caution when inputting sensitive information, such as patient data, into these systems. Understanding where and how data is stored (i.e., whether on your servers or in the cloud) is essential to safeguard privacy and confidentiality.
Mastering the Use of Generative AI Tools
Architectures of Generative AI
Generative AI encompasses models capable of generating new data across various formats: text, images, video, or audio. Several prominent architectures, such as generative adversarial networks (GANs), variational autoencoders, diffusion models, and transformers, achieve this capability. While the technical intricacies of these architectures exceed this course’s scope, understanding their operational principles can illuminate the nature of their outputs. For instance, models treating data as sequential sequences (e.g., pixels or words) predict the next element based on probabilities derived from the training dataset, potentially perpetuating common patterns rather than reflecting absolute truths.
However, using generative algorithms carries inherent risks, including inadvertently embedding biases into both data and algorithms. While humans cannot infer patient ethnicity from body imaging, recent studies demonstrate that certain AI algorithms can leverage correlations in data, introducing potential ethical implications. This represents just one facet of the risks associated with these technologies, which we will further explore in the following safety section.
Safely Leveraging Generative AI
- Never upload non-anonymized sensitive patient data outside your secure servers
- Avoid using sensitive patient data with online tools (e.g., ChatGPT, or integrated tools like Copilot in MS Office)
- Be mindful of internet connectivity with your code editors and tools
- Prefer systems that operate locally (on your machine or hospital servers)
- Acknowledge that tools may generate erroneous information (hallucinations)
- Don’t assume code generated by an LLM lacks potential bugs or harmful code
- Address intellectual property and ethical considerations when utilizing generative AI models
- Document model usage thoroughly in academic work, detailing versions and exact applications
- Recognize that the training corpus of such models may introduce biases
Stable Diffusion is an open-source tool widely used for image generation, capable of running locally on your machine. While older versions typically require GPUs, recent builds are rumored to run on CPUs, although at a slower pace. For efficient large-scale image generation, utilizing a GPU is recommended to save time.
One limitation of Stable Diffusion, and similar tools, is that the model cannot be customized with your own datasets, restricting the ability to train or modify it according to specific needs.
Challenge: Assessing the Risks of Local Model Deployment
Can you identify a few risks associated with running a model entirely on your machine?
One significant risk of running any model locally is the potential for malicious code. Since you need to download the model to your machine, there is no guarantee that it will not contain malware. To mitigate this risk, we recommend using open-source models, which allow for community scrutiny and transparency.
Another concern is the environmental impact of model usage. Training and maintaining these models consume considerable resources, including carbon and water. We advise against running models like ChatGPT continuously due to their substantial environmental footprint. Even when operating a model locally, there are inherent risks, such as relying on a smaller training dataset to fit on your machine, which could compromise the model’s quality and performance.
Maximizing the Effectiveness of Generative AI
The output of generative tools is highly dependent on the specific words or images inputted, a process known as prompt engineering.
To create content that balances a dataset for machine learning, start by analyzing the dataset’s balance according to important labels. Then, describe images with these labels and build prompts based on the descriptions of label categories with fewer images. This approach helps focus the model’s use on generating useful outcomes while minimizing risks.
Keep in mind that content generation can be a somewhat stochastic process. Therefore, you may need to experiment with and refine your prompts and even the sequences of prompts to achieve the desired results.
Effective Prompting Tips
- Ensure accuracy in spelling and grammar to avoid misdirection
- Provide clear and detailed prompts
- Explain the context or goal (e.g., “helping my son… can you explain it in a specific way?”)
- Try different wordings to refine results
- Fact-check thoroughly
- First, ask the tool to fact-check its output
- Then, verify using search engines and experts
- Never fully trust references or facts generated by an LLM, seek human-generated content for confirmation
- Clearly indicate if you need a particular format (e.g., bullet points, square images)
- Use prompts like “you are a Python expert” or “you are an expert pathologist” to guide the tool
- For more accurate outputs, fine-tune foundation models with your specific data
These tips will help you optimize the quality and relevance of the content generated by AI tools.
Attention to detail in creating prompts is critical. For example, the following image was generated from the misspelling of “X-ray” with a popular AI tool:
While this result is laughable, more subtle mistakes can be harder to identify, especially for those without extensive training in a specific field. Therefore, having a specialist review the generated data is crucial to ensure its validity.
It is also important to remember that the training data for some algorithms is often sourced from the internet, which includes a lot of irrelevant or misleading content. For instance, while there are many cute cat pictures available, there are very few good examples of rare medical conditions like desmoplastic infantile ganglioglioma. Additionally, using terms like “CAT scan” (historically referring to computed axial tomography) might result in images of cats rather than the intended medical imagery:
Key Points
- Generative programs can create synthetic data, potentially enhancing various algorithms
- Generative AI models have inherent limitations
- Running generative AI programs locally on your own server is safer than using programs that send prompts to external servers
- Exercise caution when entering patient data into generative AI programs
- Numerous policies exist to ensure the safe and ethical use of generative AI tools across institutions