Exploring open MRI datasets
Overview
Teaching: 30 min
Exercises: 15 min
Questions
How does standardizing neuroimaging data ease the data exploration process?
Objectives
Understand the BIDS format
Use PyBIDS to easily explore a BIDS dataset
Tutorial Dataset
In this episode, we will be using a subset of a publicly available dataset, ds000030, from openneuro.org. All of the datasets on OpenNeuro are already structured according to BIDS.
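BIDS encodes metadata directly in file names, as key-value pairs separated by underscores and followed by a suffix and an extension. As a rough illustration, the sketch below splits such a name with plain Python. The file name and the helper parse_bids_name are hypothetical and chosen only for illustration; PyBIDS handles this far more robustly.

```python
def parse_bids_name(filename):
    """Split a BIDS-style file name into entities, suffix and extension.

    Simplified sketch: assumes 'key-value' pairs joined by underscores,
    ending in '<suffix>.<extension>'.
    """
    stem, _, extension = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, extension


# Hypothetical example name that follows the BIDS naming pattern
entities, suffix, extension = parse_bids_name("sub-10159_task-rest_bold.nii.gz")
print(entities)   # {'sub': '10159', 'task': 'rest'}
print(suffix)     # bold
print(extension)  # nii.gz
```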
Downloading Data
DataLad
DataLad installs the data, which for a dataset means that we get the “small” data (i.e. the text files) and the download instructions for the larger files.
We can now navigate the dataset as if it were part of our file system and plan our analysis.
First, navigate to the folder where you’d like to download the dataset.
cd ~/Desktop/dc-mri/data
datalad install ///openneuro/ds000030
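After the install step, the larger files are present only as placeholders. On Linux and macOS these usually appear as symbolic links that do not resolve until the content has been fetched. The sketch below lists such placeholders. The helper name find_unfetched is hypothetical, and the broken-symlink assumption may not hold on every platform or DataLad configuration.

```python
import os


def find_unfetched(root):
    """Return paths under `root` that are symlinks whose target is missing.

    Assumption: files whose content has not been fetched yet appear
    as broken symbolic links.
    """
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git internals
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                missing.append(path)
    return sorted(missing)
```

Calling find_unfetched(os.path.expanduser("~/Desktop/dc-mri/data/ds000030")) would then give a quick overview of what still needs to be downloaded.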
Let’s take a look at the participants.tsv file to see what the demographics for this dataset look like.
import pandas as pd
participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')
participant_metadata
OUTPUT:
| | participant_id | diagnosis | age | gender | bart | bht | dwi | pamenc | pamret | rest | scap | stopsignal | T1w | taskswitch | ScannerSerialNumber | ghost_NoGhost |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | sub-10159 | CONTROL | 30 | F | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
1 | sub-10171 | CONTROL | 24 | M | 1.0 | 1.0 | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
2 | sub-10189 | CONTROL | 49 | M | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
3 | sub-10193 | CONTROL | 40 | M | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 35343.0 | No_ghost |
4 | sub-10206 | CONTROL | 21 | M | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
From this table we can easily view unique diagnostic groups:
participant_metadata['diagnosis'].unique()
array(['CONTROL', 'SCHZ', 'BIPOLAR', 'ADHD'], dtype=object)
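Beyond listing the groups, it can help to know how many participants fall into each one. A minimal sketch using value_counts() on a small made-up table (the rows are invented for illustration and are not taken from ds000030):

```python
import pandas as pd

# Toy stand-in for participants.tsv; values are invented for illustration
toy = pd.DataFrame({
    "participant_id": ["sub-01", "sub-02", "sub-03", "sub-04"],
    "diagnosis": ["CONTROL", "SCHZ", "CONTROL", "ADHD"],
})

# Number of participants per diagnostic group
group_sizes = toy["diagnosis"].value_counts()
print(group_sizes)
```

The same call on participant_metadata['diagnosis'] would give the group sizes for the real dataset.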
Imagine we’d like to work with participants that are either CONTROL or SCHZ (diagnosis) and have both a T1w (T1w == 1) and rest (rest == 1) scan. Also, you’ll notice some of the T1w scans included in this dataset have a ghosting artifact. We’ll need to filter these out as well (ghost_NoGhost == 'No_ghost').
We’ll filter this data out like so:
participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) &
(participant_metadata.T1w == 1) &
(participant_metadata.rest == 1) &
(participant_metadata.ghost_NoGhost == 'No_ghost')]
participant_metadata['diagnosis'].unique()
array(['CONTROL', 'SCHZ'], dtype=object)
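The same selection can also be written with DataFrame.query, which some find easier to read when several conditions are combined. The sketch below runs on a small invented table; the column names match participants.tsv, but the rows (including the 'ghost' label) are hypothetical.

```python
import pandas as pd

# Toy stand-in for participants.tsv; rows are invented for illustration
toy = pd.DataFrame({
    "participant_id": ["sub-01", "sub-02", "sub-03", "sub-04"],
    "diagnosis": ["CONTROL", "SCHZ", "BIPOLAR", "CONTROL"],
    "T1w": [1.0, 1.0, 1.0, None],
    "rest": [1.0, 1.0, 1.0, 1.0],
    "ghost_NoGhost": ["No_ghost", "ghost", "No_ghost", "No_ghost"],
})

groups = ["CONTROL", "SCHZ"]

# '@groups' refers to the Python variable defined above
selected = toy.query(
    "diagnosis in @groups and T1w == 1 and rest == 1 "
    "and ghost_NoGhost == 'No_ghost'"
)
print(selected["participant_id"].tolist())
```

Only the first row satisfies all four conditions: the others fail on the ghosting label, the diagnosis, or a missing T1w scan.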
This gives us the list of participants who are of interest for our analysis.
We can then use this list to download only those participants’ data with software such as aws or datalad.
In fact, this is exactly how we set up a list of participants to download for the fMRI workshop!
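One simple way to hand such a list to a download tool is to write the participant IDs to a text file, one per line. A minimal sketch, assuming a filtered table with a participant_id column; the file name subject_list.txt and the two IDs are hypothetical.

```python
import pandas as pd

# Toy stand-in for the filtered participant_metadata table
filtered = pd.DataFrame({"participant_id": ["sub-01", "sub-02"]})

# One participant ID per line, without header or index column
filtered["participant_id"].to_csv("subject_list.txt", index=False, header=False)

# Read the list back to confirm what was written
with open("subject_list.txt") as f:
    subjects = f.read().split()
print(subjects)
```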
Since we’ve already downloaded the dataset, we can now explore the structure using PyBIDS, a Python API for querying, summarizing and manipulating the BIDS folder structure.
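Getting and dropping data
Use datalad get to download the content of specific files, and datalad drop to remove that content again and free up disk space once you no longer need it.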
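import os
from bids.layout import BIDSLayout
# "~" is not expanded automatically inside a Python string, so expand it explicitly
layout = BIDSLayout(os.path.expanduser("~/Desktop/dc-mri/data/ds000030"))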
The pybids layout object indexes the BIDS folder.
Indexing can take a really long time, especially if you have several subjects, modalities, scan types, etc.
pybids has an option to save the indexed results to a SQLite database.
This database can then be re-used the next time you want to query the same BIDS folder.
import os
db_path = os.path.expanduser("~/Desktop/dc-mri/data/ds000030/.db")
layout.save(db_path)
layout = BIDSLayout(os.path.expanduser("~/Desktop/dc-mri/data/ds000030"), database_path=db_path)
The pybids layout object also lets you query your BIDS dataset according to a number of parameters by using a get_*() method.
We can get a list of the subjects we’ve downloaded from the dataset.
layout.get_subjects()
['10171',
'10292',
'10365',
'10438',
'10565',
'10788',
'11106',
'11108',
'11122',
...
'50083']
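Note that get_subjects() returns labels without the sub- prefix, whereas participants.tsv stores IDs such as sub-10159. To compare the two, the prefix has to be stripped (or added) first. A small sketch, with hand-written lists standing in for the real values:

```python
# Hand-written stand-ins; in practice these would come from
# layout.get_subjects() and participant_metadata['participant_id']
layout_subjects = ["10171", "10292", "10365"]
participant_ids = ["sub-10159", "sub-10171", "sub-10365"]

# Strip the 'sub-' prefix so both collections use the same label format
tsv_labels = {
    pid[len("sub-"):] for pid in participant_ids if pid.startswith("sub-")
}

# Subjects that are both indexed by the layout and listed in the table
in_both = sorted(set(layout_subjects) & tsv_labels)
print(in_both)  # ['10171', '10365']
```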
We can also pull a list of imaging modalities in the dataset:
layout.get_modalities()
['anat', 'func']
We can also list the tasks and data types:
#Task fMRI
print(layout.get_tasks())
#Data types (bold, brainmask, confounds, smoothwm, probtissue, warp...)
print(layout.get_types())
['rest']
['bold',
'brainmask',
'confounds',
'description',
'dtissue',
'fsaverage5',
'inflated',
'midthickness',
'participants',
'pial',
'preproc',
'probtissue',
'smoothwm',
'warp']
In addition we can specify sub-types of each BIDS category:
layout.get_types(modality='func')
['brainmask', 'confounds', 'fsaverage5', 'preproc']
We can use this functionality to pull all our fMRI NIfTI files:
layout.get(task='rest', type='bold', extensions='nii.gz', return_type='file')
Finally, we can convert the data stored in the layout into a pandas.DataFrame:
df = layout.as_data_frame()
df.head()
OUTPUT:
| | path | modality | subject | task | type |
---|---|---|---|---|---|
0 | ~/Desktop/dc-mri/data... | NaN | NaN | rest | bold |
1 | ~/Desktop/dc-mri/data... | NaN | NaN | NaN | participants |
2 | ~/Desktop/dc-mri/data... | NaN | NaN | NaN | NaN |
3 | ~/Desktop/dc-mri/data... | anat | 10565 | NaN | brainmask |
4 | ~/Desktop/dc-mri/data... | anat | 10565 | NaN | probtissue |
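Once the layout is in a DataFrame, the usual pandas tools apply. The sketch below selects resting-state bold files and counts them per subject. It uses a small invented table with the same column names as above; the paths and rows are placeholders, not real output.

```python
import pandas as pd

# Toy stand-in for the DataFrame built from the layout; rows are invented
toy_df = pd.DataFrame({
    "path": ["a_bold.nii.gz", "b_brainmask.nii.gz", "c_bold.nii.gz"],
    "subject": ["10565", "10565", "10171"],
    "task": ["rest", None, "rest"],
    "type": ["bold", "brainmask", "bold"],
})

# Keep only resting-state bold files
bold_rest = toy_df[(toy_df["type"] == "bold") & (toy_df["task"] == "rest")]

# Count the selected files for each subject
files_per_subject = bold_rest.groupby("subject")["path"].count()
print(files_per_subject)
```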
Key Points
BIDS is an organizational principle for neuroimaging data
PyBIDS is a Python-based tool that allows for easy exploration of BIDS-formatted neuroimaging data