Exploring open MRI datasets
Last updated on 2024-02-28
Estimated time: 45 minutes
Overview
Questions
- How does standardizing neuroimaging data ease the data exploration process?
Objectives
- Understand the BIDS format
- Use PyBIDS to easily explore a BIDS dataset
Tutorial Dataset
In this episode, we will be using a subset of a publicly available dataset, ds000030, from openneuro.org. All of the datasets on OpenNeuro are already structured according to the Brain Imaging Data Structure (BIDS).
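BIDS stores much of its metadata directly in file and folder names. A file such as sub-10159/func/sub-10159_task-rest_bold.nii.gz sits in a subject folder and a data-type folder, and its name is a series of key-value entities (sub-10159, task-rest) joined by underscores, followed by a suffix (bold) and an extension. As a rough illustration, the simplified parser below splits a filename into those parts. It is a hypothetical helper written for this lesson (parse_bids_filename is not part of PyBIDS) and ignores many details of the specification.

```python
import re

# One key-value entity, e.g. "sub-10159" or "task-rest" (simplified rule).
ENTITY_RE = re.compile(r"^([a-zA-Z]+)-([a-zA-Z0-9]+)$")


def parse_bids_filename(filename):
    """Split a BIDS-style filename into (entities, suffix, extension).

    Hypothetical, simplified helper for illustration only; real BIDS
    tooling such as PyBIDS handles many more cases.
    """
    # Everything from the first "." onwards is the extension (keeps ".nii.gz" whole).
    stem, dot, rest = filename.partition(".")
    extension = dot + rest
    # The last underscore-separated part is the suffix; the others are entities.
    *entity_parts, suffix = stem.split("_")
    entities = {}
    for part in entity_parts:
        match = ENTITY_RE.match(part)
        if match is None:
            raise ValueError(f"not a key-value entity: {part!r}")
        key, value = match.groups()
        entities[key] = value
    return entities, suffix, extension


print(parse_bids_filename("sub-10159_task-rest_bold.nii.gz"))
# → ({'sub': '10159', 'task': 'rest'}, 'bold', '.nii.gz')
print(parse_bids_filename("sub-10159_T1w.json"))
# → ({'sub': '10159'}, 'T1w', '.json')
```

A file with no key-value entities, such as participants.tsv, parses to an empty entity set with the suffix participants.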
Downloading Data
DataLad
DataLad installs the data, which for a dataset means that we get the “small” data (i.e. the text files) and the download instructions for the larger files. We can now navigate the dataset as if it were part of our file system and plan our analysis.
First, navigate to the folder where you’d like to download the dataset.
Let’s take a look at the participants.tsv file to see what the demographics for this dataset look like. Because participants.tsv is a tab-separated file, we pass sep='\t' to pandas.read_csv.
PYTHON
import pandas as pd
participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')
participant_metadata.head()  # show the first five rows
OUTPUT
| | participant_id | diagnosis | age | gender | bart | bht | dwi | pamenc | pamret | rest | scap | stopsignal | T1w | taskswitch | ScannerSerialNumber | ghost_NoGhost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sub-10159 | CONTROL | 30 | F | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
| 1 | sub-10171 | CONTROL | 24 | M | 1.0 | 1.0 | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
| 2 | sub-10189 | CONTROL | 49 | M | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
| 3 | sub-10193 | CONTROL | 40 | M | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 35343.0 | No_ghost |
| 4 | sub-10206 | CONTROL | 21 | M | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
From this table we can easily view the unique diagnostic groups with participant_metadata['diagnosis'].unique():
OUTPUT
array(['CONTROL', 'SCHZ', 'BIPOLAR', 'ADHD'], dtype=object)
Imagine we’d like to work with participants that are either CONTROL or SCHZ (diagnosis) and have both a T1w (T1w == 1) and a rest (rest == 1) scan. You’ll also notice that some of the T1w scans included in this dataset have a ghosting artifact, so we’ll need to filter these out as well (ghost_NoGhost == 'No_ghost').
We’ll filter the table like so, keeping only the rows that meet all four conditions:
PYTHON
# Keep rows that satisfy all four conditions at once
participant_metadata = participant_metadata[
    (participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) &
    (participant_metadata.T1w == 1) &
    (participant_metadata.rest == 1) &
    (participant_metadata.ghost_NoGhost == 'No_ghost')
]
participant_metadata['diagnosis'].unique()
OUTPUT
array(['CONTROL', 'SCHZ'], dtype=object)
This gives us the list of participants who are of interest for our analysis. We can then use this list to download only those participants' data with tools such as aws or datalad. In fact, this is exactly how we set up the list of participants to download for the fMRI workshop!
Since we’ve already downloaded the dataset, we can now explore the
structure using PyBIDS, a Python API for querying, summarizing and
manipulating the BIDS folder structure.
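Getting and dropping data
Use datalad get to download the content of the large files you need, and datalad drop to remove that local content again when you no longer need it; the dataset still records where the content can be retrieved from.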
The PyBIDS layout object indexes the BIDS folder. Indexing can take a long time, especially if you have many subjects, modalities, scan types, etc., so PyBIDS has an option to save the indexed results to a SQLite database. This database can then be reused the next time you want to query the same BIDS folder, so the folder does not have to be indexed again.
PYTHON
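import os
from bids import BIDSLayout
root = os.path.expanduser("~/Desktop/dc-mri/data/ds000030")
layout = BIDSLayout(root)  # index the dataset (slow the first time)
layout.save(os.path.join(root, ".db"))  # save the index as a SQLite database
layout = BIDSLayout(root, database_path=os.path.join(root, ".db"))  # later sessions: reuse the saved index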
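To see why saving the index helps, here is a toy version of the idea using only the Python standard library: walk a BIDS-style folder once, record each file's subject, data type, suffix and extension in an SQLite table, and answer later queries from that table instead of the file system. This is not how PyBIDS is implemented internally; the table schema, the hypothetical build_index, get_subjects and get_files helpers, and the tiny fake dataset created in a temporary folder are all assumptions made for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_index(bids_root, db_path):
    """Walk bids_root once and store one row per file in an SQLite table (toy schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, subject TEXT, datatype TEXT, suffix TEXT, extension TEXT)"
    )
    root = Path(bids_root)
    for file in sorted(root.rglob("*")):
        if not file.is_file():
            continue
        rel = file.relative_to(root)
        stem, dot, rest = file.name.partition(".")
        *entity_parts, suffix = stem.split("_")
        entities = dict(part.split("-", 1) for part in entity_parts if "-" in part)
        # Files in sub-<label>/<datatype>/ take their datatype from the parent folder
        datatype = rel.parts[-2] if len(rel.parts) >= 3 else None
        conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
            (rel.as_posix(), entities.get("sub"), datatype, suffix, dot + rest),
        )
    conn.commit()
    return conn


def get_subjects(conn):
    """List subject labels, answered from the saved index rather than the file system."""
    rows = conn.execute(
        "SELECT DISTINCT subject FROM files WHERE subject IS NOT NULL ORDER BY subject"
    )
    return [row[0] for row in rows]


def get_files(conn, **filters):
    """Return relative paths whose columns equal every filter, e.g. datatype='func'."""
    allowed = {"subject", "datatype", "suffix", "extension"}
    unknown = set(filters) - allowed
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    # Column names are whitelisted above, so only the values need to be parameterized
    where = " AND ".join(f"{column} = ?" for column in filters) or "1"
    rows = conn.execute(
        f"SELECT path FROM files WHERE {where} ORDER BY path", tuple(filters.values())
    )
    return [row[0] for row in rows]


# Demo on a tiny fake dataset of empty files (a stand-in for ds000030)
with tempfile.TemporaryDirectory() as bids_root, tempfile.TemporaryDirectory() as db_dir:
    for rel in [
        "participants.tsv",
        "sub-10171/anat/sub-10171_T1w.nii.gz",
        "sub-10171/func/sub-10171_task-rest_bold.nii.gz",
        "sub-10292/func/sub-10292_task-rest_bold.nii.gz",
    ]:
        path = Path(bids_root, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    db_path = Path(db_dir, "index.sqlite")
    build_index(bids_root, db_path).close()  # the slow step, done once

    # A later session reopens the saved index instead of walking the folders again
    conn = sqlite3.connect(db_path)
    subjects = get_subjects(conn)
    func_files = get_files(conn, datatype="func", extension=".nii.gz")
    conn.close()

print(subjects)    # → ['10171', '10292']
print(func_files)  # → ['sub-10171/func/sub-10171_task-rest_bold.nii.gz', 'sub-10292/func/sub-10292_task-rest_bold.nii.gz']
```

In PyBIDS, the database_path argument plays the role of db_path here: once the database exists, later sessions can skip the slow walk over the folders.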
The PyBIDS layout object also lets you query your BIDS dataset according to a number of parameters using its get_*() methods. For example, layout.get_subjects() returns a list of the subjects we’ve downloaded from the dataset:
OUTPUT
['10171',
'10292',
'10365',
'10438',
'10565',
'10788',
'11106',
'11108',
'11122',
...
'50083']
We can also pull a list of the imaging modalities (BIDS data types) in the dataset with layout.get_datatypes():
OUTPUT
['anat', 'func']
As well as tasks and more:
PYTHON
# Tasks
print(layout.get_tasks())
# Suffixes, e.g. bold, brainmask, confounds, smoothwm, probtissue, warp
# (older PyBIDS releases called this get_types())
print(layout.get_suffixes())
OUTPUT
['rest']
['bold',
'brainmask',
'confounds',
'description',
'dtissue',
'fsaverage5',
'inflated',
'midthickness',
'participants',
'pial',
'preproc',
'probtissue',
'smoothwm',
'warp']
In addition, we can narrow a query down to the sub-types found within a single BIDS category:
OUTPUT
['brainmask', 'confounds', 'fsaverage5', 'preproc']
We can use this functionality to pull all our fMRI NIfTI files, for example with layout.get(datatype='func', extension='.nii.gz', return_type='filename').
Finally, we can convert the data stored in the layout object into a pandas.DataFrame with layout.to_df():
OUTPUT
| | path | modality | subject | task | type |
|---|---|---|---|---|---|
| 0 | ~/Desktop/dc-mri/data… | NaN | NaN | rest | bold |
| 1 | ~/Desktop/dc-mri/data… | NaN | NaN | NaN | participants |
| 2 | ~/Desktop/dc-mri/data… | NaN | NaN | NaN | NaN |
| 3 | ~/Desktop/dc-mri/data… | anat | 10565 | NaN | brainmask |
| 4 | ~/Desktop/dc-mri/data… | anat | 10565 | NaN | probtissue |
Key Points
- Public BIDS-compatible neuroimaging data repositories make it easy to pull data.
- PyBIDS is a Python-based tool that allows for easy exploration of BIDS-formatted neuroimaging data.