Exploring open MRI datasets
Last updated on 2024-02-28
Estimated time: 45 minutes
Overview
Questions
- How does standardizing neuroimaging data ease the data exploration process?
Objectives
- Understand the BIDS format
- Use PyBIDS to easily explore a BIDS dataset
Tutorial Dataset
In this episode, we will be using a subset of a publicly available dataset, ds000030, from openneuro.org. All of the datasets on OpenNeuro are already structured according to BIDS.
Downloading Data
DataLad
DataLad installs the data, which for a dataset means
that we get the “small” data (i.e. the text files) and the download
instructions for the larger files. We can now navigate the dataset as if
it were part of our file system and plan our analysis.
First, navigate to the folder where you’d like to download the dataset.
Let’s take a look at the participants.tsv file to see
what the demographics for this dataset look like.
PYTHON
import pandas as pd
participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')
participant_metadata
OUTPUT
| | participant_id | diagnosis | age | gender | bart | bht | dwi | pamenc | pamret | rest | scap | stopsignal | T1w | taskswitch | ScannerSerialNumber | ghost_NoGhost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sub-10159 | CONTROL | 30 | F | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
| 1 | sub-10171 | CONTROL | 24 | M | 1.0 | 1.0 | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
| 2 | sub-10189 | CONTROL | 49 | M | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
| 3 | sub-10193 | CONTROL | 40 | M | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 35343.0 | No_ghost |
| 4 | sub-10206 | CONTROL | 21 | M | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 35343.0 | No_ghost |
From this table we can easily view unique diagnostic groups:
OUTPUT
array(['CONTROL', 'SCHZ', 'BIPOLAR', 'ADHD'], dtype=object)
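The array above is the kind of result pandas returns when unique() is called on a single column. A minimal sketch on a small hand-made table follows; the rows are illustrative stand-ins and are not taken from participants.tsv:

```python
import pandas as pd

# Illustrative stand-in for participants.tsv; these rows are made up
toy_metadata = pd.DataFrame({
    "participant_id": ["sub-01", "sub-02", "sub-03", "sub-04"],
    "diagnosis": ["CONTROL", "SCHZ", "CONTROL", "ADHD"],
})

# unique() returns the distinct values in order of first appearance
groups = toy_metadata["diagnosis"].unique()
print(groups)
```

On the real table, the equivalent call would be participant_metadata['diagnosis'].unique().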
Imagine we’d like to work with participants who are either CONTROL
or SCHZ (diagnosis) and have both a T1w
(T1w == 1) and a rest (rest == 1) scan.
You’ll also notice that some of the T1w scans in this dataset have a
ghosting artifact, so we’ll need to filter those out as well
(ghost_NoGhost == 'No_ghost').
We can apply all of these filters at once:
PYTHON
participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) &
(participant_metadata.T1w == 1) &
(participant_metadata.rest == 1) &
(participant_metadata.ghost_NoGhost == 'No_ghost')]
participant_metadata['diagnosis'].unique()
OUTPUT
array(['CONTROL', 'SCHZ'], dtype=object)
This leaves us with the list of participants who are of interest in
our analysis. We can then use this list to download each participant's
data with software such as aws or datalad. In fact, this is exactly how
we set up a list of participants to download for the fMRI workshop!
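One way to hand such a list to a download tool is to write the participant IDs to a plain text file, one per line. The sketch below assumes a filtered table shaped like the one above; the toy rows and the file name participant_list.txt are hypothetical:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the filtered participant_metadata table (made-up rows)
filtered = pd.DataFrame({
    "participant_id": ["sub-01", "sub-02", "sub-03"],
    "diagnosis": ["CONTROL", "SCHZ", "CONTROL"],
})

# Hypothetical output location; a temporary folder keeps the example tidy
out_file = Path(tempfile.mkdtemp()) / "participant_list.txt"

# One participant ID per line, with no header or index column
out_file.write_text("\n".join(filtered["participant_id"].tolist()) + "\n")

print(out_file.read_text())
```

A command-line tool can then read the file line by line and fetch one participant at a time.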
Since we’ve already downloaded the dataset, we can now explore the
structure using PyBIDS, a Python API for querying, summarizing and
manipulating the BIDS folder structure.
Getting and dropping data
datalad get retrieves the content of files on demand, and datalad drop removes the local copy again to free up space.
The PyBIDS layout object indexes the BIDS folder. Indexing can take a
long time, especially if you have many subjects, modalities,
scan types, etc. PyBIDS has an option to save the indexed
results to a SQLite database, which can then be reused the next
time you want to query the same BIDS folder.
PYTHON
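from bids import BIDSLayout
layout = BIDSLayout("~/Desktop/dc-mri/data/ds000030")  # index the dataset once
layout.save("~/Desktop/dc-mri/data/ds000030/.db")  # write the index to a SQLite database
layout = BIDSLayout("~/Desktop/dc-mri/data/ds000030", database_path="~/Desktop/dc-mri/data/ds000030/.db")  # reuse the saved index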
The pybids layout object also lets you query your BIDS dataset
according to a number of parameters by using a get_*()
method. We can get a list of the subjects we’ve downloaded from the
dataset.
OUTPUT
['10171',
'10292',
'10365',
'10438',
'10565',
'10788',
'11106',
'11108',
'11122',
...
'50083']
We can also pull a list of imaging modalities in the dataset:
OUTPUT
['anat', 'func']
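These queries mirror the BIDS folder layout itself: each subject lives in a sub-<label> folder, and each modality is a sub-folder inside it. As a rough illustration of what an index has to discover (not how PyBIDS is implemented), the sketch below builds a tiny made-up BIDS-like tree in a temporary folder and lists its subjects and modalities using only the standard library:

```python
import tempfile
from pathlib import Path

# Build a tiny, made-up BIDS-like tree: <root>/sub-<label>/<modality>/
root = Path(tempfile.mkdtemp())
for label in ["10171", "10292"]:
    for modality in ["anat", "func"]:
        (root / f"sub-{label}" / modality).mkdir(parents=True)

# Subjects are the folders named sub-<label>; strip the "sub-" prefix
subjects = sorted(p.name[len("sub-"):] for p in root.glob("sub-*") if p.is_dir())

# Modalities are the folder names one level below each subject
modalities = sorted({p.name for p in root.glob("sub-*/*") if p.is_dir()})

print(subjects)
print(modalities)
```

Real datasets may add a ses-<label> level between the subject and modality folders, which this sketch ignores.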
As well as tasks and more:
PYTHON
#Task fMRI
print(layout.get_tasks())
#Data types (bold, brainmask, confounds, smoothwm, probtissue, warp...)
print(layout.get_types())
OUTPUT
['rest']
['bold',
'brainmask',
'confounds',
'description',
'dtissue',
'fsaverage5',
'inflated',
'midthickness',
'participants',
'pial',
'preproc',
'probtissue',
'smoothwm',
'warp']
In addition we can specify sub-types of each BIDS category:
OUTPUT
['brainmask', 'confounds', 'fsaverage5', 'preproc']
We can use this functionality to pull all our fMRI NIfTI files:
OUTPUT
TO FILL
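Queries like this work because BIDS file names encode their metadata as key-value pairs followed by a suffix, for example sub-10171_task-rest_bold.nii.gz. The sketch below parses names of that shape by splitting on underscores and hyphens, then keeps the resting-state BOLD NIfTI files. The file names and both helper functions are hypothetical, and this is a simplification rather than the PyBIDS implementation:

```python
# Hypothetical file names following the BIDS key-value naming pattern
filenames = [
    "sub-10171_task-rest_bold.nii.gz",
    "sub-10171_T1w.nii.gz",
    "sub-10292_task-rest_bold.nii.gz",
    "sub-10292_task-rest_bold.json",
]


def parse_bids_name(name):
    """Split a BIDS-style file name into entities, suffix and extension."""
    stem, _, extension = name.partition(".")
    *pairs, suffix = stem.split("_")
    # Each pair looks like "key-value", e.g. "task-rest"
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities.update(suffix=suffix, extension=extension)
    return entities


def is_rest_bold_nifti(name):
    """True for resting-state BOLD images stored as compressed NIfTI."""
    e = parse_bids_name(name)
    return (e.get("task") == "rest"
            and e["suffix"] == "bold"
            and e["extension"] == "nii.gz")


rest_bold = [name for name in filenames if is_rest_bold_nifti(name)]
print(rest_bold)
```

The JSON sidecar and the T1w image are excluded because their extension or suffix does not match.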
Finally, we can convert the data stored in bids.layout
into a pandas.DataFrame:
OUTPUT
| | path | modality | subject | task | type |
|---|---|---|---|---|---|
| 0 | ~/Desktop/dc-mri/data… | NaN | NaN | rest | bold |
| 1 | ~/Desktop/dc-mri/data… | NaN | NaN | NaN | participants |
| 2 | ~/Desktop/dc-mri/data… | NaN | NaN | NaN | NaN |
| 3 | ~/Desktop/dc-mri/data… | anat | 10565 | NaN | brainmask |
| 4 | ~/Desktop/dc-mri/data… | anat | 10565 | NaN | probtissue |
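Once the file index is a DataFrame, the same boolean filtering used on the participant table applies. A minimal sketch on made-up rows that reuse the column names shown above (the paths are placeholders, not real files):

```python
import pandas as pd

# Made-up rows with the same column names as the table above
files = pd.DataFrame({
    "path": ["a_bold.nii.gz", "b_brainmask.nii.gz", "c_preproc.nii.gz"],
    "modality": ["func", "anat", "func"],
    "subject": ["10565", "10565", "10171"],
    "type": ["bold", "brainmask", "preproc"],
})

# Boolean masks combine with &, just like the participant filter earlier
func_10565 = files[(files.modality == "func") & (files.subject == "10565")]
print(func_10565["path"].tolist())
```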
- Public neuroimaging BIDS-compatible data repositories allow for pulling data easily.
- PyBIDS is a Python-based tool that allows for easy exploration of BIDS-formatted neuroimaging data.