This lesson is in the early stages of development (Alpha version)

Exploration of Open Neuroimaging Datasets in BIDS format


Teaching: 30 min
Exercises: 15 min
  • How does standardization of neuroimaging data ease the data exploration process?

  • Gain a grasp of the BIDS format

  • Use PyBIDS to easily explore a BIDS dataset

Tutorial Dataset

For this tutorial, we will be using a subset of a publicly available dataset, ds000030. The dataset is structured according to the Brain Imaging Data Structure (BIDS). BIDS is a simple and intuitive way to organize and describe your neuroimaging and behavioural data. Neuroimaging experiments result in complicated data that can be arranged in several different ways. BIDS tackles this problem by proposing a standard (based on consensus among researchers across the world) for the arrangement of neuroimaging datasets.

Using the same structure for all of your studies will allow you to easily reuse all of your scripts between studies. Additionally, sharing code with other researchers will be much easier.
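Part of what makes a shared structure script-friendly is that BIDS filenames are built from underscore-separated key-value pairs, followed by a suffix and an extension. The following is a minimal sketch of that idea and not part of any library: the function name parse_bids_filename is hypothetical, and the filename is only an illustrative example in the BIDS style.

```python
def parse_bids_filename(filename):
    """Split a BIDS-style filename into key-value entities, suffix, and extension."""
    # Split at the first dot so a compound extension such as "nii.gz" stays together.
    stem, _, extension = filename.partition(".")
    entities = {}
    suffix = None
    for part in stem.split("_"):
        if "-" in part:
            # Chunks such as "sub-10159" or "task-rest" are key-value entities.
            key, value = part.split("-", 1)
            entities[key] = value
        else:
            # A chunk without a hyphen is treated as the suffix, e.g. "bold" or "T1w".
            suffix = part
    return entities, suffix, extension


# Illustrative filename (assumed for this example).
entities, suffix, extension = parse_bids_filename("sub-10159_task-rest_bold.nii.gz")
print(entities)   # {'sub': '10159', 'task': 'rest'}
print(suffix)     # bold
print(extension)  # nii.gz
```

Because every study follows the same naming pattern, a helper like this can be reused unchanged across datasets.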

Let’s take a look at the participants.tsv file to see what the demographics for this dataset look like.

import pandas as pd

participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')
participant_metadata.head()


   participant_id  diagnosis  age  gender  bart  bht  dwi  pamenc  pamret  rest  scap  stopsignal  T1w  taskswitch  ScannerSerialNumber  ghost_NoGhost
0  sub-10159       CONTROL    30   F       1.0   NaN  1.0  NaN     NaN     1.0   1.0   1.0         1.0  1.0         35343.0              No_ghost
1  sub-10171       CONTROL    24   M       1.0   1.0  1.0  NaN     NaN     1.0   1.0   1.0         1.0  1.0         35343.0              No_ghost
2  sub-10189       CONTROL    49   M       1.0   NaN  1.0  NaN     NaN     1.0   1.0   1.0         1.0  1.0         35343.0              No_ghost
3  sub-10193       CONTROL    40   M       1.0   NaN  1.0  NaN     NaN     NaN   NaN   NaN         1.0  NaN         35343.0              No_ghost
4  sub-10206       CONTROL    21   M       1.0   NaN  1.0  NaN     NaN     1.0   1.0   1.0         1.0  1.0         35343.0              No_ghost

From this table we can easily view the unique diagnostic groups:

participant_metadata.diagnosis.unique()

array(['CONTROL', 'SCHZ', 'BIPOLAR', 'ADHD'], dtype=object)

For this tutorial, we’re just going to work with participants that are either CONTROL or SCHZ (diagnosis) and have both a T1w (T1w == 1) and rest (rest == 1) scan. Also, you’ll notice some of the T1w scans included in this dataset have a ghosting artifact. We’ll need to filter these out as well (ghost_NoGhost == 'No_ghost').

We’ll filter the data like so:

participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) &
                                            (participant_metadata.T1w == 1) &
                                            (participant_metadata.rest == 1) &
                                            (participant_metadata.ghost_NoGhost == 'No_ghost')]
participant_metadata.diagnosis.unique()

array(['CONTROL', 'SCHZ'], dtype=object)
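To see what each condition in the filter contributes, here is a dependency-free sketch that applies the same four criteria row by row, using the standard-library csv module instead of pandas. The table is a hypothetical miniature of participants.tsv: the participant IDs and values are invented for illustration, and only the column names come from the real file.

```python
import csv
import io

# Hypothetical miniature of participants.tsv; every row is made up for illustration.
TSV = (
    "participant_id\tdiagnosis\tT1w\trest\tghost_NoGhost\n"
    "sub-A\tCONTROL\t1.0\t1.0\tNo_ghost\n"
    "sub-B\tSCHZ\t1.0\tn/a\tNo_ghost\n"
    "sub-C\tADHD\t1.0\t1.0\tNo_ghost\n"
    "sub-D\tSCHZ\t1.0\t1.0\tGhost\n"
    "sub-E\tSCHZ\t1.0\t1.0\tNo_ghost\n"
)


def keep(row):
    """Return True only when a row passes all four criteria (a logical AND)."""
    # csv yields strings, so the scan flags are compared as text here.
    return (
        row["diagnosis"] in {"CONTROL", "SCHZ"}
        and row["T1w"] == "1.0"
        and row["rest"] == "1.0"
        and row["ghost_NoGhost"] == "No_ghost"
    )


rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))
selected_ids = [row["participant_id"] for row in rows if keep(row)]
print(selected_ids)  # ['sub-A', 'sub-E']
```

In this toy table, sub-B is dropped for lacking a rest scan, sub-C for its diagnosis, and sub-D for the ghosting flag, which mirrors how the combined pandas mask removes any row that fails a single condition.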

This leaves us with a list of the participants who are of interest in our analysis. We can then use this list to download the corresponding participant data with tools such as aws or datalad. In fact, this is exactly how we set up the list of participants to download for this workshop! Since we’ve already downloaded the dataset, we can now explore its structure using PyBIDS:

import bids.layout
layout = bids.layout.BIDSLayout('../data/ds000030')

The PyBIDS layout object lets you query your BIDS dataset according to a number of parameters by using a get_*() method. We can get a list of the subjects we’ve downloaded from the dataset:

layout.get_subjects()

We can also pull a list of imaging modalities in the dataset:

layout.get_modalities()

['anat', 'func']
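Conceptually, queries like these collect the unique values of one entity across the files in the dataset. The sketch below approximates that idea with a regular expression over filenames. It is not how PyBIDS is implemented: the helper name list_entity_values and the filenames are hypothetical and chosen only for illustration.

```python
import re


def list_entity_values(filenames, key):
    """Collect the sorted unique values of one 'key-value' entity across filenames."""
    # Match "<key>-<value>" at the start of the name or right after an underscore.
    pattern = re.compile(rf"(?:^|_){re.escape(key)}-([A-Za-z0-9]+)")
    values = set()
    for name in filenames:
        match = pattern.search(name)
        if match:
            values.add(match.group(1))
    return sorted(values)


# Made-up filenames that follow the BIDS naming pattern.
filenames = [
    "sub-01_T1w.nii.gz",
    "sub-01_task-rest_bold.nii.gz",
    "sub-02_task-rest_bold.nii.gz",
    "sub-02_task-bart_bold.nii.gz",
]

subjects = list_entity_values(filenames, "sub")
tasks = list_entity_values(filenames, "task")
print(subjects)  # ['01', '02']
print(tasks)     # ['bart', 'rest']
```

Note that the values come back without their key prefix (for example '01' rather than 'sub-01'), matching the subject labels shown in the layout's data frame.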

The layout can also list tasks, data types, and more:

# Task fMRI
layout.get_tasks()

# Data types (bold, brainmask, confounds, smoothwm, probtissue, warp...)
layout.get_types()

In addition, we can specify sub-types of each BIDS category:

['brainmask', 'confounds', 'fsaverage5', 'preproc']

We can use this functionality to pull all our fMRI NIfTI files:

layout.get(task='rest', type='bold', extensions='nii.gz', return_type='file')
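For intuition, a query like this behaves roughly like a recursive filename search. The sketch below imitates it with pathlib on a throwaway directory tree. It is an approximation under stated assumptions, not PyBIDS itself: the helper find_files is hypothetical, the files are empty placeholders with invented names, and the pattern assumes the task entity directly precedes the suffix.

```python
import tempfile
from pathlib import Path


def find_files(dataset_root, task, suffix, extension):
    """Return sorted file paths (as strings) matching a task, suffix, and extension."""
    # Assumes names of the form "<...>_task-<task>_<suffix>.<extension>".
    pattern = f"*_task-{task}_{suffix}.{extension}"
    return [str(path) for path in sorted(Path(dataset_root).rglob(pattern))]


# Build a small, made-up dataset skeleton so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in (
        "sub-01/func/sub-01_task-rest_bold.nii.gz",
        "sub-01/func/sub-01_task-bart_bold.nii.gz",
        "sub-01/anat/sub-01_T1w.nii.gz",
        "sub-02/func/sub-02_task-rest_bold.nii.gz",
        "sub-02/func/sub-02_task-rest_bold.json",
    ):
        target = root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    matches = find_files(root, task="rest", suffix="bold", extension="nii.gz")
    match_names = [Path(match).name for match in matches]

print(match_names)
# ['sub-01_task-rest_bold.nii.gz', 'sub-02_task-rest_bold.nii.gz']
```

A plain glob like this breaks as soon as filenames carry extra entities, which is one reason to prefer an entity-aware query through the layout object.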

Finally, we can convert the data stored in the layout into a pandas.DataFrame:

df = layout.as_data_frame()
df.head()


   path                                               modality  subject  task  type
0  /home/jerry/projects/scwg2018_python_neuroimag...  NaN       NaN      rest  bold
1  /home/jerry/projects/scwg2018_python_neuroimag...  NaN       NaN      NaN   participants
2  /home/jerry/projects/scwg2018_python_neuroimag...  NaN       NaN      NaN   NaN
3  /home/jerry/projects/scwg2018_python_neuroimag...  anat      10565    NaN   brainmask
4  /home/jerry/projects/scwg2018_python_neuroimag...  anat      10565    NaN   probtissue

Key Points

  • BIDS is an organizational standard for neuroimaging data that supports transparent data sharing

  • PyBIDS is a Python-based tool that allows for easy exploration of BIDS-formatted neuroimaging data