Exploring open MRI datasets

Last updated on 2024-02-28

Overview

Questions

  • How does standardizing neuroimaging data ease the data exploration process?

Objectives

  • Understand the BIDS format
  • Use PyBIDS to easily explore a BIDS dataset

Tutorial Dataset


In this episode, we will be using a subset of a publicly available dataset, ds000030, from openneuro.org. All of the datasets on OpenNeuro are already structured according to BIDS.

Downloading Data


DataLad

DataLad installs the dataset, which means that we get the “small” data (i.e. the text files) and the download instructions for the larger files. We can now navigate the dataset as if it were part of our file system and plan our analysis.

First, navigate to the folder where you’d like to download the dataset.

BASH

cd ~/Desktop/dc-mri/data
datalad install ///openneuro/ds000030

Let’s take a look at the participants.tsv file to see what the demographics for this dataset look like.

PYTHON

import pandas as pd

participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')
participant_metadata

OUTPUT

   participant_id  diagnosis  age  gender  bart  bht  dwi  pamenc  pamret  rest  scap  stopsignal  T1w  taskswitch  ScannerSerialNumber  ghost_NoGhost
0  sub-10159       CONTROL    30   F       1.0   NaN  1.0  NaN     NaN     1.0   1.0   1.0         1.0  1.0         35343.0              No_ghost
1  sub-10171       CONTROL    24   M       1.0   1.0  1.0  NaN     NaN     1.0   1.0   1.0         1.0  1.0         35343.0              No_ghost
2  sub-10189       CONTROL    49   M       1.0   NaN  1.0  NaN     NaN     1.0   1.0   1.0         1.0  1.0         35343.0              No_ghost
3  sub-10193       CONTROL    40   M       1.0   NaN  1.0  NaN     NaN     NaN   NaN   NaN         1.0  NaN         35343.0              No_ghost
4  sub-10206       CONTROL    21   M       1.0   NaN  1.0  NaN     NaN     1.0   1.0   1.0         1.0  1.0         35343.0              No_ghost

From this table we can easily view unique diagnostic groups:

PYTHON

participant_metadata['diagnosis'].unique()

OUTPUT

array(['CONTROL', 'SCHZ', 'BIPOLAR', 'ADHD'], dtype=object)
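Before filtering, it can also help to know how many participants fall into each group. A minimal sketch using `value_counts()` on a small, made-up diagnosis column (the values below are illustrative assumptions; with the real table you would call it on `participant_metadata['diagnosis']`):

```python
import pandas as pd

# Hypothetical diagnosis column; the real one comes from participants.tsv
toy_diagnosis = pd.Series(
    ["CONTROL", "SCHZ", "CONTROL", "ADHD", "BIPOLAR", "CONTROL"],
    name="diagnosis",
)

# Number of participants in each diagnostic group, largest group first
group_sizes = toy_diagnosis.value_counts()
print(group_sizes)
```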

Imagine we’d like to work with participants that are either CONTROL or SCHZ (diagnosis) and have both a T1w (T1w == 1) and rest (rest == 1) scan. Also, you’ll notice some of the T1w scans included in this dataset have a ghosting artifact. We’ll need to filter these out as well (ghost_NoGhost == 'No_ghost').

We can apply all of these filters like so:

PYTHON

participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) &
                                            (participant_metadata.T1w == 1) &
                                            (participant_metadata.rest == 1) &
                                            (participant_metadata.ghost_NoGhost == 'No_ghost')]
participant_metadata['diagnosis'].unique()

OUTPUT

array(['CONTROL', 'SCHZ'], dtype=object)
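Note that participants.tsv stores IDs with a `sub-` prefix (`sub-10171`), whereas PyBIDS reports bare subject labels (`'10171'`, as in the `get_subjects()` output further down). A minimal sketch of turning a filtered table into a plain list of labels, assuming a toy table built from the five rows shown earlier (the names `toy_metadata` and `subject_labels` are illustrative):

```python
import pandas as pd

# Toy stand-in for participants.tsv, using the five rows displayed above
toy_metadata = pd.DataFrame({
    "participant_id": ["sub-10159", "sub-10171", "sub-10189", "sub-10193", "sub-10206"],
    "diagnosis": ["CONTROL"] * 5,
    "T1w": [1.0, 1.0, 1.0, 1.0, 1.0],
    "rest": [1.0, 1.0, 1.0, float("nan"), 1.0],
    "ghost_NoGhost": ["No_ghost"] * 5,
})

# Same conditions as in the lesson: group, T1w scan, rest scan, no ghosting
keep = (
    toy_metadata["diagnosis"].isin(["CONTROL", "SCHZ"])
    & (toy_metadata["T1w"] == 1)
    & (toy_metadata["rest"] == 1)
    & (toy_metadata["ghost_NoGhost"] == "No_ghost")
)
selected = toy_metadata[keep]

# Strip the "sub-" prefix so the labels match what PyBIDS reports
subject_labels = (
    selected["participant_id"].str.replace("sub-", "", regex=False).tolist()
)
print(subject_labels)
```

Here sub-10193 is dropped because its `rest` value is missing, and a comparison of `NaN` with `1` is always false.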

This gives us a table of the participants who are of interest for our analysis. We can use their IDs to download only those participants’ data with tools such as aws or datalad. In fact, this is exactly how we set up the list of participants to download for the fMRI workshop! Since we’ve already downloaded the dataset, we can now explore its structure using PyBIDS, a Python API for querying, summarizing and manipulating the BIDS folder structure.

Getting and dropping data

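Use datalad get to download the content of the files you need, and datalad drop to remove that local content again once you no longer need it.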

PYTHON

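import os
from bids.layout import BIDSLayout

# "~" is not expanded automatically, so expand it before passing the path
layout = BIDSLayout(os.path.expanduser("~/Desktop/dc-mri/data/ds000030"))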

The PyBIDS layout object indexes the BIDS folder. Indexing can take a long time, especially when the dataset has many subjects, modalities, or scan types. PyBIDS has an option to save the indexed results to a SQLite database, which can then be re-used the next time you want to query the same BIDS folder.

PYTHON

import os

db_path = os.path.expanduser("~/Desktop/dc-mri/data/ds000030/.db")
layout.save(db_path)

layout = BIDSLayout(os.path.expanduser("~/Desktop/dc-mri/data/ds000030"), database_path=db_path)
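To make “indexing” more concrete, here is a minimal sketch of the idea. It is not how PyBIDS is actually implemented: it simply walks a toy BIDS tree, parses the key-value entities out of each file name, and stores one row per file in a SQLite database that can be queried afterwards. The helper names (`parse_bids_name`, `build_index`), the table layout, and the toy file names are all assumptions made for illustration.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

# BIDS file names are built from key-value pairs ("sub-10171", "task-rest")
# followed by a suffix ("bold") and an extension (".nii.gz").
ENTITY_PATTERN = re.compile(r"(?P<key>[a-zA-Z]+)-(?P<value>[a-zA-Z0-9]+)")


def parse_bids_name(filename):
    """Split a BIDS-style file name into entities, suffix and extension."""
    stem, _, extension = filename.partition(".")
    parts = stem.split("_")
    entities = {}
    for part in parts[:-1]:
        match = ENTITY_PATTERN.fullmatch(part)
        if match:
            entities[match["key"]] = match["value"]
    return {"entities": entities, "suffix": parts[-1], "extension": "." + extension}


def build_index(bids_root, database_path):
    """Walk bids_root once and store one row per file in a SQLite table."""
    connection = sqlite3.connect(str(database_path))
    connection.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT, subject TEXT, task TEXT, suffix TEXT)"
    )
    for path in sorted(Path(bids_root).rglob("sub-*")):
        if path.is_file():
            info = parse_bids_name(path.name)
            connection.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?)",
                (
                    str(path),
                    info["entities"].get("sub"),
                    info["entities"].get("task"),
                    info["suffix"],
                ),
            )
    connection.commit()
    return connection


with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny, empty BIDS-like tree (hypothetical files)
    root = Path(tmp) / "toy_bids"
    for relative in [
        "sub-10171/anat/sub-10171_T1w.nii.gz",
        "sub-10171/func/sub-10171_task-rest_bold.nii.gz",
        "sub-10292/func/sub-10292_task-rest_bold.nii.gz",
    ]:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    connection = build_index(root, Path(tmp) / "index.db")
    # Query the saved index instead of walking the folder again
    rest_subjects = [
        row[0]
        for row in connection.execute(
            "SELECT subject FROM files "
            "WHERE task = 'rest' AND suffix = 'bold' ORDER BY subject"
        )
    ]
    connection.close()

print(rest_subjects)
```

The expensive step (walking the folder and parsing every name) happens once; later queries only touch the database, which is why re-using a saved index is faster.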

The pybids layout object also lets you query your BIDS dataset according to a number of parameters by using a get_*() method. We can get a list of the subjects we’ve downloaded from the dataset.

PYTHON

layout.get_subjects()

OUTPUT

['10171',
 '10292',
 '10365',
 '10438',
 '10565',
 '10788',
 '11106',
 '11108',
 '11122',
   ...
 '50083']

We can also pull a list of imaging modalities in the dataset:

PYTHON

layout.get_modalities()

OUTPUT

['anat', 'func']

We can also list the tasks and data types:

PYTHON

#Task fMRI
print(layout.get_tasks())

#Data types (bold, brainmask, confounds, smoothwm, probtissue, warp...)
print(layout.get_types())

OUTPUT

['rest']

['bold',
 'brainmask',
 'confounds',
 'description',
 'dtissue',
 'fsaverage5',
 'inflated',
 'midthickness',
 'participants',
 'pial',
 'preproc',
 'probtissue',
 'smoothwm',
 'warp']

In addition, we can list the types available within a single BIDS modality:

PYTHON

layout.get_types(modality='func')

OUTPUT

['brainmask', 'confounds', 'fsaverage5', 'preproc']

We can use this functionality to pull all our fMRI NIfTI files:

PYTHON

layout.get(task='rest', type='bold', extensions='nii.gz', return_type='file')

OUTPUT

TO FILL
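A list of file paths like this can then be narrowed to the participants selected from participants.tsv. A minimal sketch, assuming hypothetical paths in place of the real query result and an illustrative set of subject labels (`bold_files`, `selected_subjects` and `files_by_subject` are names introduced here):

```python
import re
from collections import defaultdict

# Hypothetical paths standing in for the list returned by the query
bold_files = [
    "/data/ds000030/sub-10171/func/sub-10171_task-rest_bold.nii.gz",
    "/data/ds000030/sub-10292/func/sub-10292_task-rest_bold.nii.gz",
    "/data/ds000030/sub-50083/func/sub-50083_task-rest_bold.nii.gz",
]

# Labels to keep, e.g. derived from the filtered participants table
selected_subjects = {"10171", "50083"}

# Group the matching files by the subject label found in the file name
files_by_subject = defaultdict(list)
for path in bold_files:
    match = re.search(r"sub-([a-zA-Z0-9]+)_", path)
    if match and match.group(1) in selected_subjects:
        files_by_subject[match.group(1)].append(path)

for subject, paths in sorted(files_by_subject.items()):
    print(subject, paths)
```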

Finally, we can convert the data stored in the layout object into a pandas.DataFrame:

PYTHON

df = layout.as_data_frame()
df.head()

OUTPUT

   path                    modality  subject  task  type
0  ~/Desktop/dc-mri/data…  NaN       NaN      rest  bold
1  ~/Desktop/dc-mri/data…  NaN       NaN      NaN   participants
2  ~/Desktop/dc-mri/data…  NaN       NaN      NaN   NaN
3  ~/Desktop/dc-mri/data…  anat      10565    NaN   brainmask
4  ~/Desktop/dc-mri/data…  anat      10565    NaN   probtissue
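Once the index is a DataFrame, the usual pandas tools apply. A minimal sketch, assuming a toy frame with the same columns as the output above (the rows and file names are made up for illustration), that counts files per subject and modality and then selects the resting-state functional files:

```python
import pandas as pd

# Toy frame with the same columns as the output above (rows are illustrative)
toy_df = pd.DataFrame({
    "path": ["a_bold.nii.gz", "b_brainmask.nii.gz", "c_probtissue.nii.gz", "d_preproc.nii.gz"],
    "modality": ["func", "anat", "anat", "func"],
    "subject": ["10565", "10565", "10565", "10171"],
    "task": ["rest", None, None, "rest"],
    "type": ["bold", "brainmask", "probtissue", "preproc"],
})

# Count indexed files per subject and modality
file_counts = toy_df.groupby(["subject", "modality"]).size()
print(file_counts)

# Keep only the resting-state functional files
rest_func = toy_df[(toy_df["modality"] == "func") & (toy_df["task"] == "rest")]
print(rest_func["path"].tolist())
```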

Key Points

  • Public BIDS-compatible neuroimaging repositories make it easy to download data.
  • PyBIDS is a Python-based tool that allows for easy exploration of BIDS-formatted neuroimaging data.