Summary and Setup

Nextflow is workflow management software which enables the writing of scalable and reproducible scientific workflows. It can integrate various software package and environment management systems such as Docker, Singularity, and Conda. It allows for existing pipelines written in common scripting languages, such as R and Python, to be seamlessly coupled together. It implements a Domain Specific Language (DSL) that simplifies the implementation and running of workflows on cloud or high-performance computing (HPC) infrastructures.

This lesson also introduces nf-core, a community-driven platform that provides peer-reviewed, best-practice analysis pipelines written in Nextflow.

This lesson motivates the use of Nextflow and nf-core as development tools for building and sharing reproducible data science workflows.

Lesson objectives


  1. The learner will understand the fundamental components of a Nextflow script, including channels, processes and operators.
  2. The learner will write a multi-step workflow script to align, quantify, and perform QC on RNA-Seq data in Nextflow DSL2.
  3. The learner will be able to write a Nextflow configuration file to alter the computational resources allocated to a process.
  4. The learner will use nf-core to run a community curated pipeline.

Prerequisites

This is an intermediate lesson and assumes familiarity with the core materials covered in the Software Carpentry lessons. In particular, learners need to be familiar with the material covered in The Unix Shell. It is helpful to be familiar with another programming language, to the level of Plotting and Programming in Python or R for Reproducible Scientific Analysis, although this lesson does not specifically rely on Python or R. No previous knowledge of Nextflow, other workflow software, or Groovy is required.

Running the lessons on your local machine

Training directory

Each learner should set up a training folder, e.g. nf-training:

BASH

$ mkdir nf-training
$ cd nf-training

There are three items that you need to download:

  1. The training software.
  2. The training dataset.
  3. The workshop scripts.

Training software

The software and versions required for this training are listed below:

Software Version
Nextflow 20.10.0
nf-core/tools 1.12.1
salmon 1.5
fastqc 0.11
multiqc 1.10
python 3.8

conda

The simplest way to install the software for this course is using conda.

To install conda, see the conda installation instructions.

An environment file, environment.yml, is provided. Download it using wget or curl:

BASH

# wget
wget https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml

# or curl 
curl -L -o environment.yml https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml

To create the training environment run:

BASH

$ conda env create -n nf-training -f environment.yml

Then activate the environment by running:

BASH

$ conda activate nf-training
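With the environment activated, a quick check can confirm that the tools from the software table are reachable. The following is a minimal sketch, not part of the lesson materials: check_tools is a hypothetical helper, and the command names are assumed to match the package names in the table. CONDA_DEFAULT_ENV is the variable conda sets to the name of the active environment.

```shell
#!/usr/bin/env bash
# Report whether each named tool can be found on the PATH.
# check_tools is a hypothetical helper written for this illustration.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  echo "$missing tool(s) missing"
}

# conda sets CONDA_DEFAULT_ENV to the name of the active environment.
echo "active conda environment: ${CONDA_DEFAULT_ENV:-none}"

# Command names are assumed to match the entries in the software table.
check_tools nextflow nf-core salmon fastqc multiqc python
```

If any tool is reported as missing, the environment is probably not active or was not created from environment.yml.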

Training scripts

To aid in the delivery of the lesson, the scripts mentioned in each episode can be found in the respective episode folders in the GitHub repository: https://github.com/carpentries-incubator/workflows-nextflow/tree/main/episodes/files/scripts

To get the scripts associated with each episode, you will need to download the scripts folder from the GitHub repository.

Below is a series of commands to download and unpack the scripts folder.

BASH

# get the git repo as a zip file
wget https://github.com/carpentries-incubator/workflows-nextflow/archive/main.zip

# or curl
curl -L -o main.zip https://github.com/carpentries-incubator/workflows-nextflow/archive/main.zip

# unzip the script file
unzip main.zip 'workflows-nextflow-main/episodes/files/scripts*' -d  .

# mv the scripts folder to the nf-training folder 
mv workflows-nextflow-main/episodes/files/scripts .

# remove the zip file and the git repo
rm -r workflows-nextflow-main main.zip

The Nextflow scripts for each episode can be found in the respective episode folders inside the scripts folder.
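To confirm that the download and unpack steps worked, the scripts folder can be inspected from inside nf-training. This is a minimal sketch under one assumption: that the episode scripts use the conventional .nf extension. count_nf_scripts is a hypothetical helper, not something shipped with the lesson.

```shell
#!/usr/bin/env bash
# Count the files ending in .nf anywhere under the given directory.
# Prints 0 if the directory does not exist.
count_nf_scripts() {
  find "$1" -type f -name '*.nf' 2>/dev/null | wc -l | tr -d ' '
}

if [ -d scripts ]; then
  echo "scripts/ contains $(count_nf_scripts scripts) .nf file(s)"
else
  echo "scripts/ not found; re-run the download steps from inside nf-training"
fi
```

A count of zero, or a missing folder, suggests the unzip or mv step was run from a different directory.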

Data

Inside the nf-training folder, download the workshop dataset from Figshare: https://figshare.com/articles/dataset/RNA-seq_training_dataset/14822481

BASH

$ wget --content-disposition https://ndownloader.figshare.com/files/28531743

# or curl
curl -L -o  data.tar.gz https://ndownloader.figshare.com/files/28531743

Unpack the gzipped tar file:

BASH

$ tar -xvf  data.tar.gz
$ rm data.tar.gz
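To see what the archive will unpack, its contents can be listed without extracting anything. This has to be done before the rm step, which deletes the archive. The sketch below is illustrative only: tar_top_level is a hypothetical helper, and data.tar.gz is the file name used in the curl command.

```shell
#!/usr/bin/env bash
# Print the distinct top-level entries of a gzipped tar archive
# without extracting it.
tar_top_level() {
  tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# data.tar.gz is the file name used by the curl download command.
if [ -f data.tar.gz ]; then
  tar_top_level data.tar.gz
else
  echo "data.tar.gz not found in $(pwd)"
fi
```

Listing first shows which folder names will appear in nf-training after extraction.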

Visual Studio Code editor setup

Any text editor can be used to write Nextflow scripts. A recommended code editor is Visual Studio Code.

Go to the Visual Studio Code website, where you should see a download button. The button or buttons should be specific to your platform, and the downloaded package should be installable.

Nextflow language support in Visual Studio Code

You can add Nextflow language support in Visual Studio Code by clicking the install button on the Nextflow language extension.

Nextflow install without conda

Nextflow can be used on any POSIX-compatible system (Linux, OS X, etc.). It requires Bash and Java 8 (or later, up to 12) to be installed.

Windows systems may be supported using a POSIX compatibility layer like Cygwin (unverified) or, alternatively, installing it into a Linux VM using virtualization software like VirtualBox or VMware.

Nextflow installation

Install Nextflow by copying and pasting the following snippet into a terminal window. Setting NXF_VER pins the installed version to 20.10.0, the version used in this training:

# Make sure that Java v8+ is installed:
java -version

# Install Nextflow
export NXF_VER=20.10.0
curl -s https://get.nextflow.io | bash
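The Java requirement can also be checked programmatically rather than by reading the java -version output by eye. The following is a rough sketch, assuming the usual version string formats: legacy strings such as "1.8.0_292" (major version 8) and newer ones such as "11.0.11" (major version 11). java_major is a hypothetical helper and does not handle unusual strings such as early-access builds.

```shell
#!/usr/bin/env bash
# Extract the major Java version from a version string.
# Legacy strings look like "1.8.0_292" (major version 8);
# newer ones look like "11.0.11" (major version 11).
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; the version is the first quoted field.
  version=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  echo "Java major version: $(java_major "$version")"
else
  echo "java not found on PATH"
fi
```

The reported number can then be compared against the supported range stated above.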

Move the Nextflow binary to a directory on your user's PATH:

BASH

$ mkdir -p ~/bin
$ mv nextflow ~/bin/
# OR system-wide installation:
# sudo mv nextflow /usr/local/bin
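Moving the binary to ~/bin only helps if that directory is actually on the PATH, which varies between systems. The sketch below shows one way to check. on_path is a hypothetical helper, and the suggested export line is a common convention rather than something the lesson prescribes.

```shell
#!/usr/bin/env bash
# Succeed if the directory $1 is one of the colon-separated entries in PATH.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if on_path "$HOME/bin"; then
  echo "$HOME/bin is on PATH"
else
  echo "$HOME/bin is not on PATH; add it with: export PATH=\"\$HOME/bin:\$PATH\""
fi
```

Wrapping PATH in colons before matching avoids false positives from directories that merely share a prefix, such as ~/bin-old.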

Check that the installation is correct by running the following command:

BASH

$ nextflow info

nf-core/tools installation without conda

Pip

BASH

pip install nf-core