This lesson is in the early stages of development (Alpha version)

Exploring open MRI datasets


Teaching: 30 min
Exercises: 15 min
  • How does standardizing neuroimaging data ease the data exploration process

  • Understand the BIDS format

  • Use PyBIDS in to easily explore a BIDS dataset

Tutorial Dataset

In this episode, we will be using a subset of a publicly available dataset, ds000030, from All of the datasets on OpenNeuro are already structured according to BIDS.

Downloading Data


DataLad installs the data - which for a dataset means that we get the “small” data (i.e. the text files) and the download instructions for the larger files. We can now navigate the dataset like its a part of our file system and plan our analysis.

First, navigate to the folder where you’d like to download the dataset.

cd ~/Desktop/dc-mri/data
datalad install ///openneuro/ds000030

Let’s take a look at the participants.tsv file to see what the demographics for this dataset look like.

import pandas as pd

participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')


participant_id diagnosis age gender bart bht dwi pamenc pamret rest scap stopsignal T1w taskswitch ScannerSerialNumber ghost_NoGhost
0 sub-10159 CONTROL 30 F 1.0 NaN 1.0 NaN NaN 1.0 1.0 1.0 1.0 1.0 35343.0 No_ghost
1 sub-10171 CONTROL 24 M 1.0 1.0 1.0 NaN NaN 1.0 1.0 1.0 1.0 1.0 35343.0 No_ghost
2 sub-10189 CONTROL 49 M 1.0 NaN 1.0 NaN NaN 1.0 1.0 1.0 1.0 1.0 35343.0 No_ghost
3 sub-10193 CONTROL 40 M 1.0 NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN 35343.0 No_ghost
4 sub-10206 CONTROL 21 M 1.0 NaN 1.0 NaN NaN 1.0 1.0 1.0 1.0 1.0 35343.0 No_ghost

From this table we can easily view unique diagnostic groups:

array(['CONTROL', 'SCHZ', 'BIPOLAR', 'ADHD'], dtype=object)

Imagine we’d like to work with participants that are either CONTROL or SCHZ (diagnosis) and have both a T1w (T1w == 1) and rest (rest == 1) scan. Also, you’ll notice some of the T1w scans included in this dataset have a ghosting artifact. We’ll need to filter these out as well (ghost_NoGhost == 'No_ghost').

We’ll filter this data out like so:

participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) &
                                            (participant_metadata.T1w == 1) &
                                            ( == 1) &
                                            (participant_metadata.ghost_NoGhost == 'No_ghost')]
array(['CONTROL', 'SCHZ'], dtype=object)

From this we have a list of participants corresponding to a list of participants who are of interest in our analysis. We can then use this list in order to download participant from software such as aws or datalad. In fact, this is exactly how we set up a list of participants to download for the fMRI workshop! Since we’ve already downloaded the dataset, we can now explore the structure using PyBIDS, a Python API for querying, summarizing and manipulating the BIDS folder structure.

Getting and dropping data

datalad get datalad drop

from bids.layout import BIDSLayout

layout = BIDSLayout("~/Desktop/dc-mri/data/ds000030")

The pybids layout object indexes the BIDS folder. Indexing can take a really long time, especially if you have several subjects, modalities, scan types, etc. pybids has an option to save the indexed results to a SQLite database. This database can then be re-used the next time you want to query the same BIDS folder."~/Desktop/dc-mri/data/ds000030/.db")

layout = BIDSLayout("~/Desktop/dc-mri/data/ds000030", database_path="~/Desktop/dc-mri/data/ds000030/.db")

The pybids layout object also lets you query your BIDS dataset according to a number of parameters by using a get_*() method. We can get a list of the subjects we’ve downloaded from the dataset.


We can also pull a list of imaging modalities in the dataset:

['anat', 'func']

As well as tasks and more!:

#Task fMRI

#Data types (bold, brainmask, confounds, smoothwm, probtissue, warp...)


In addition we can specify sub-types of each BIDS category:

['brainmask', 'confounds', 'fsaverage5', 'preproc']

We can use this functionality to pull all our fMRI NIfTI files:

layout.get(task='rest', type='bold', extensions='nii.gz', return_type='file')

Finally, we can convert the data stored in bids.layout into a pandas.DataFrame :

df = layout.as_data_frame()


path modality subject task type
0 ~/Desktop/dc-mri/data... NaN NaN rest bold
1 ~/Desktop/dc-mri/data... NaN NaN NaN participants
2 ~/Desktop/dc-mri/data... NaN NaN NaN NaN
3 ~/Desktop/dc-mri/data... anat 10565 NaN brainmask
4 ~/Desktop/dc-mri/data... anat 10565 NaN probtissue

Key Points

  • BIDS is an organizational principle for neuroimaging data

  • PyBIDS is a Python-based tool that allows for easy exploration of BIDS-formatted neuroimaging data