Exploring open MRI datasets

Last updated on 2024-02-28

Overview

Questions

How does standardizing neuroimaging data ease the data exploration process?

Objectives

Understand the BIDS format
Use PyBIDS to easily explore a BIDS dataset

Tutorial Dataset

In this episode, we will be using a subset of a publicly available dataset, ds000030, from openneuro.org. All of the datasets on OpenNeuro are already structured according to BIDS.
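BIDS encodes metadata directly in file names as key-value pairs joined by underscores, followed by a suffix and an extension. The sketch below uses only the standard library to split such a name into its parts. The file name is an illustrative example in the BIDS style, and parse_bids_name is a hypothetical helper written for this illustration, not part of any library.

```python
def parse_bids_name(filename):
    """Split a BIDS-style file name into entities, suffix, and extension."""
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    stem, _, extension = filename.partition(".")
    parts = stem.split("_")
    # Key-value chunks contain a hyphen; a final chunk without one is the suffix.
    entities = dict(p.split("-", 1) for p in parts if "-" in p)
    suffix = parts[-1] if "-" not in parts[-1] else None
    return {
        "entities": entities,
        "suffix": suffix,
        "extension": "." + extension if extension else "",
    }


info = parse_bids_name("sub-10159_task-rest_bold.nii.gz")
print(info)
```

Because every file follows the same naming pattern, tools can answer questions such as "which subjects have a resting-state scan?" without any dataset-specific code.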

Downloading Data

DataLad

DataLad installs the dataset, which means that we get the “small” data (i.e. the text files) and the download instructions for the larger files. We can then navigate the dataset as if it were part of our file system and plan our analysis.

First, navigate to the folder where you’d like to download the dataset.

BASH

cd ~/Desktop/dc-mri/data
datalad install ///openneuro/ds000030

Let’s take a look at the participants.tsv file to see what the demographics for this dataset look like.

PYTHON

import pandas as pd

participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')
participant_metadata.head()

OUTPUT

	participant_id	diagnosis	age	gender	bart	bht	dwi	pamenc	pamret	rest	scap	stopsignal	T1w	taskswitch	ScannerSerialNumber	ghost_NoGhost
0	sub-10159	CONTROL	30	F	1.0	NaN	1.0	NaN	NaN	1.0	1.0	1.0	1.0	1.0	35343.0	No_ghost
1	sub-10171	CONTROL	24	M	1.0	1.0	1.0	NaN	NaN	1.0	1.0	1.0	1.0	1.0	35343.0	No_ghost
2	sub-10189	CONTROL	49	M	1.0	NaN	1.0	NaN	NaN	1.0	1.0	1.0	1.0	1.0	35343.0	No_ghost
3	sub-10193	CONTROL	40	M	1.0	NaN	1.0	NaN	NaN	NaN	NaN	NaN	1.0	NaN	35343.0	No_ghost
4	sub-10206	CONTROL	21	M	1.0	NaN	1.0	NaN	NaN	1.0	1.0	1.0	1.0	1.0	35343.0	No_ghost

From this table we can easily view unique diagnostic groups:

PYTHON

participant_metadata['diagnosis'].unique()

OUTPUT

array(['CONTROL', 'SCHZ', 'BIPOLAR', 'ADHD'], dtype=object)

Imagine we’d like to work with participants that are either CONTROL or SCHZ (diagnosis) and have both a T1w (T1w == 1) and rest (rest == 1) scan. Also, you’ll notice some of the T1w scans included in this dataset have a ghosting artifact. We’ll need to filter these out as well (ghost_NoGhost == 'No_ghost').

We can apply these filters like so:

PYTHON

participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) &
                                            (participant_metadata.T1w == 1) &
                                            (participant_metadata.rest == 1) &
                                            (participant_metadata.ghost_NoGhost == 'No_ghost')]
participant_metadata['diagnosis'].unique()

OUTPUT

array(['CONTROL', 'SCHZ'], dtype=object)
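To see why particular participants drop out, it can help to run the same conditions on a tiny table. The sketch below builds one inline: the first two rows echo values from the table shown earlier, while the last two rows (including the 'ghost' label) are hypothetical and exist only to exercise each condition. Note that a missing value (NaN) never equals 1, so participants without a rest scan are excluded automatically.

```python
import io

import pandas as pd

# Toy table: the first two rows echo the earlier output; the last two are hypothetical.
toy_tsv = (
    "participant_id\tdiagnosis\tT1w\trest\tghost_NoGhost\n"
    "sub-10159\tCONTROL\t1.0\t1.0\tNo_ghost\n"
    "sub-10193\tCONTROL\t1.0\tn/a\tNo_ghost\n"
    "sub-99901\tSCHZ\t1.0\t1.0\tghost\n"
    "sub-99902\tADHD\t1.0\t1.0\tNo_ghost\n"
)
toy = pd.read_csv(io.StringIO(toy_tsv), sep="\t", na_values="n/a")

# Same four conditions as in the lesson, stored as a named boolean mask.
keep = (
    toy["diagnosis"].isin(["CONTROL", "SCHZ"])
    & (toy["T1w"] == 1)
    & (toy["rest"] == 1)            # NaN == 1 is False, so missing scans are dropped
    & (toy["ghost_NoGhost"] == "No_ghost")
)
selected_ids = toy.loc[keep, "participant_id"].tolist()
print(selected_ids)
```

Keeping the mask in its own variable makes it easy to inspect each condition separately when a participant is unexpectedly missing from the result.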

This leaves us with the list of participants who are of interest for our analysis. We can then use this list to download only those participants’ data with tools such as aws or DataLad. In fact, this is exactly how we set up the list of participants to download for the fMRI workshop! Since we’ve already downloaded the dataset, we can now explore its structure using PyBIDS, a Python API for querying, summarizing and manipulating the BIDS folder structure.

Getting and dropping data

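datalad get downloads the content of the files or folders you name. datalad drop removes that local content again to free disk space; it can be fetched again later with datalad get.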
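As a minimal sketch of how a participant list could drive these commands, the snippet below assembles the command line without running it. It assumes each participant's data lives in a folder named after the participant ID at the dataset root (the usual BIDS layout); datalad_command is a hypothetical helper, and the two IDs are taken from the table above.

```python
import shlex


def datalad_command(action, participant_ids):
    """Build a `datalad get` or `datalad drop` command for whole subject folders."""
    if action not in ("get", "drop"):
        raise ValueError("action must be 'get' or 'drop'")
    return ["datalad", action, *participant_ids]


# In practice this could be participant_metadata["participant_id"].tolist()
ids = ["sub-10159", "sub-10171"]

get_cmd = datalad_command("get", ids)
print(shlex.join(get_cmd))

# To actually run it from the dataset root (assumes DataLad is installed):
# import subprocess
# subprocess.run(get_cmd, cwd=dataset_root, check=True)
```

Building the command as a list rather than a single string avoids quoting problems if it is later passed to subprocess.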

PYTHON

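import os
from bids.layout import BIDSLayout

# Python does not expand "~" on its own, so expand it before handing the path to PyBIDS
layout = BIDSLayout(os.path.expanduser("~/Desktop/dc-mri/data/ds000030"))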

The PyBIDS layout object indexes the BIDS folder. Indexing can take a long time, especially for datasets with many subjects, modalities, scan types, etc. PyBIDS can save the index to a SQLite database, which can then be reused the next time you want to query the same BIDS folder.

PYTHON

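import os

# Expand "~" once and reuse the resulting path for saving and reloading the index
db_path = os.path.expanduser("~/Desktop/dc-mri/data/ds000030/.db")
layout.save(db_path)

layout = BIDSLayout(os.path.expanduser("~/Desktop/dc-mri/data/ds000030"), database_path=db_path)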
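To make the idea of indexing and caching concrete, here is a toy sketch using only the standard library. It walks a folder once, extracts the subject label from each file name, and stores the result in a SQLite file that a later session could query without walking the folder again. This illustrates the concept only and is not how PyBIDS is implemented; build_index, the table layout, and the file names are all hypothetical.

```python
import re
import sqlite3
import tempfile
from pathlib import Path


def build_index(root, db_path):
    """Walk root once and record (path, subject) for every file in a SQLite table."""
    con = sqlite3.connect(str(db_path))
    con.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, subject TEXT)")
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            match = re.search(r"sub-([a-zA-Z0-9]+)", path.name)
            subject = match.group(1) if match else None
            con.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?)",
                (str(path.relative_to(root)), subject),
            )
    con.commit()
    return con


# Create a throwaway two-subject folder to index.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset"
    for sub in ("10159", "10171"):
        func = root / f"sub-{sub}" / "func"
        func.mkdir(parents=True)
        (func / f"sub-{sub}_task-rest_bold.nii.gz").touch()

    con = build_index(root, Path(tmp) / "index.db")
    subjects = [
        row[0]
        for row in con.execute("SELECT DISTINCT subject FROM files ORDER BY subject")
    ]
    con.close()

print(subjects)
```

The expensive step (walking the file tree) happens once; every later question becomes a fast database query, which is the same trade-off the saved PyBIDS index offers.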

The PyBIDS layout object also lets you query your BIDS dataset according to a number of parameters by using a get_*() method. For example, we can get a list of the subjects we’ve downloaded from the dataset.

PYTHON

layout.get_subjects()

OUTPUT

['10171',
 '10292',
 '10365',
 '10438',
 '10565',
 '10788',
 '11106',
 '11108',
 '11122',
   ...
 '50083']
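Notice that get_subjects() returns labels without the sub- prefix, whereas participants.tsv stores IDs with it (compare '10171' here with sub-10171 in the table earlier). A small sketch of reconciling the two follows; the short hand-written lists stand in for the real ones, and strip_sub_prefix is a hypothetical helper.

```python
def strip_sub_prefix(participant_id):
    """Turn 'sub-10171' into '10171'; leave labels without the prefix unchanged."""
    prefix = "sub-"
    if participant_id.startswith(prefix):
        return participant_id[len(prefix):]
    return participant_id


tsv_ids = ["sub-10159", "sub-10171", "sub-10189"]  # stand-in for participants.tsv IDs
layout_subjects = ["10171", "10292"]               # stand-in for layout.get_subjects()

wanted = {strip_sub_prefix(pid) for pid in tsv_ids}
available = sorted(wanted & set(layout_subjects))  # wanted and present on disk
missing = sorted(wanted - set(layout_subjects))    # wanted but not downloaded
print(available, missing)
```

A check like this is a quick way to confirm that every participant selected from the metadata has actually been downloaded.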

We can also pull a list of imaging modalities in the dataset:

PYTHON

layout.get_datatypes()  # named get_modalities() in older PyBIDS releases

OUTPUT

['anat', 'func']

We can also list the tasks, data types, and more:

PYTHON

# Task fMRI
print(layout.get_tasks())

# Suffixes (bold, brainmask, confounds, smoothwm, probtissue, warp...); get_types() in older PyBIDS releases
print(layout.get_suffixes())

OUTPUT

['rest']

['bold',
 'brainmask',
 'confounds',
 'description',
 'dtissue',
 'fsaverage5',
 'inflated',
 'midthickness',
 'participants',
 'pial',
 'preproc',
 'probtissue',
 'smoothwm',
 'warp']

In addition, we can restrict a query to a single modality, for example to list only the data types found under func:

PYTHON

layout.get_suffixes(datatype='func')  # get_types(modality='func') in older PyBIDS releases

OUTPUT

['brainmask', 'confounds', 'fsaverage5', 'preproc']

We can use this functionality to pull all our fMRI NIfTI files:

PYTHON

layout.get(task='rest', suffix='bold', extension='nii.gz', return_type='file')  # older releases: type=, extensions=

OUTPUT

TO FILL

Finally, we can convert the information stored in the layout into a pandas.DataFrame:

PYTHON

df = layout.to_df()  # named as_data_frame() in older PyBIDS releases
df.head()

OUTPUT

	path	modality	subject	task	type
0	~/Desktop/dc-mri/data…	NaN	NaN	rest	bold
1	~/Desktop/dc-mri/data…	NaN	NaN	NaN	participants
2	~/Desktop/dc-mri/data…	NaN	NaN	NaN	NaN
3	~/Desktop/dc-mri/data…	anat	10565	NaN	brainmask
4	~/Desktop/dc-mri/data…	anat	10565	NaN	probtissue
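Once the layout is a DataFrame, ordinary pandas operations apply. The sketch below uses a small hand-built stand-in with the subject and task columns seen above; the path values are hypothetical placeholders, not real file paths from the dataset.

```python
import pandas as pd

# Stand-in for the layout DataFrame; the paths are hypothetical placeholders.
df = pd.DataFrame({
    "path": ["file_a", "file_b", "file_c", "file_d"],
    "subject": [None, "10565", "10565", "10171"],
    "task": ["rest", None, "rest", "rest"],
})

# All rows belonging to one subject
one_subject = df[df["subject"] == "10565"]

# Number of rest-task files per subject (rows without a subject are ignored)
rest_counts = df[df["task"] == "rest"].groupby("subject").size()

print(one_subject["path"].tolist())
print(rest_counts.to_dict())
```

Summaries like the per-subject count are a convenient sanity check that each participant has the expected number of files before starting an analysis.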

Key Points

Public, BIDS-compatible neuroimaging data repositories such as OpenNeuro make it easy to pull data.
PyBIDS is a Python-based tool that allows for easy exploration of BIDS-formatted neuroimaging data.