Getting Started with Nextflow
- A workflow is a sequence of tasks that process a set of data.
- A workflow management system (WfMS) is a computational platform that provides an infrastructure for the set-up, execution and monitoring of workflows.
- Nextflow is a workflow management system that comprises both a runtime environment and a domain specific language (DSL).
- Nextflow scripts consist of channels for controlling inputs and outputs, and processes for defining workflow tasks.
- You run a Nextflow script using the nextflow run command.
Workflow parameterisation
- Pipeline parameters are specified by prepending the prefix params to a variable name, separated by a dot character.
- To specify a pipeline parameter on the command line for a Nextflow run, use the --variable_name syntax.
- You can add parameters to a JSON-formatted file and pass them to the script using the -params-file option.
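A minimal sketch of the points above. The parameter name greeting and the file names params_demo.nf and my_params.json are hypothetical, chosen only for illustration.

```groovy
// params_demo.nf (hypothetical file name)
// Declare a pipeline parameter with a default value: prefix "params", a dot, then the name.
params.greeting = 'Hello'

workflow {
    // Print the value in use; the default can be overridden at launch time.
    println "greeting = ${params.greeting}"
}

// Override on the command line (two dashes for pipeline parameters):
//   nextflow run params_demo.nf --greeting 'Bonjour'
//
// Or store the value in a JSON file, e.g. my_params.json containing
//   { "greeting": "Bonjour" }
// and pass it with the single-dash Nextflow option:
//   nextflow run params_demo.nf -params-file my_params.json
```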
Channels
- Channels must be used to import data into Nextflow.
- Nextflow has two different kinds of channels: queue channels and value channels.
- Data in value channels can be used multiple times in a workflow.
- Data in queue channels are consumed when they are used by a process or an operator.
- Channel factory methods, such as Channel.of, are used to create channels.
- Channel factory methods have optional parameters, e.g. checkIfExists, that can be used to alter the creation and behaviour of a channel.
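The two channel types and an optional factory parameter might look like this in a script. The values and the file path are illustrative assumptions, and the last factory call only succeeds if that file is present.

```groovy
workflow {
    // Queue channel: emits the items 1, 2 and 3 in order.
    numbers_ch = Channel.of(1, 2, 3)

    // Value channel: holds a single value that can be read any number of times.
    genome_ch = Channel.value('GRCh38')

    // fromPath creates a queue channel of files. With checkIfExists: true,
    // Nextflow raises an error if the file does not exist.
    // 'data/sample.fq.gz' is a hypothetical path.
    reads_ch = Channel.fromPath('data/sample.fq.gz', checkIfExists: true)

    numbers_ch.view()
    genome_ch.view()
    reads_ch.view()
}
```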
Processes
- A Nextflow process is an independent step in a workflow.
- Processes contain up to five definition blocks: directives, inputs, outputs, a when clause and, finally, a script block.
- The script block contains the commands you would like to run.
- A process should have a script but the other four blocks are optional.
- Inputs are defined in the input block with a type qualifier and a name.
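As a rough sketch, a process with a directive, an input and a script block could be written as follows. The process name COUNT_LINES and the input name infile are hypothetical.

```groovy
process COUNT_LINES {
    // Directive: request one CPU for each task.
    cpus 1

    // Input block: a type qualifier (path) followed by a name (infile).
    input:
    path infile

    // Script block: the commands to run for each task.
    script:
    """
    wc -l ${infile}
    """
}
```

The process does nothing until it is invoked inside a workflow scope with a channel that supplies the input files.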
Processes Part 2
- Outputs from a process are defined using the output block.
- You can group input and output data from a process using the tuple qualifier.
- The execution of a process can be controlled using the when declaration and conditional statements.
- Files produced within a process and defined as output can be saved to a directory using the publishDir directive.
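The following sketch combines these four points in one process. The process name COMPRESS, the output directory and the skip rule are assumptions made for illustration.

```groovy
process COMPRESS {
    // Copy declared outputs into results/compressed (hypothetical directory).
    publishDir 'results/compressed', mode: 'copy'

    // The tuple qualifier groups a sample identifier with its file.
    input:
    tuple val(sample_id), path(infile)

    output:
    tuple val(sample_id), path("${infile}.gz")

    // Only run tasks whose sample_id does not start with 'skip' (hypothetical rule).
    when:
    !sample_id.startsWith('skip')

    script:
    """
    gzip -c ${infile} > ${infile}.gz
    """
}
```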
Workflow
- A Nextflow workflow is defined by invoking processes inside the workflow scope.
- A process is invoked like a function inside the workflow scope, passing any required input parameters as arguments, e.g. FASTQC(reads_ch).
- Process outputs can be accessed using the out attribute of the respective process object or by assigning the output to a Nextflow variable.
- Multiple outputs from a single process can be accessed using the list syntax [] and its index, or by referencing a named process output.
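A small sketch of process invocation and output access. The process COUNT, its output names and the input path are hypothetical; the input file is assumed to exist.

```groovy
process COUNT {
    input:
    path reads

    // Two outputs, each given a name with emit.
    output:
    path 'lines.txt', emit: lines
    path 'words.txt', emit: words

    script:
    """
    wc -l < ${reads} > lines.txt
    wc -w < ${reads} > words.txt
    """
}

workflow {
    reads_ch = Channel.fromPath('data/sample.txt')   // hypothetical input file

    // Invoke the process like a function, passing the input channel.
    COUNT(reads_ch)

    // Access the first output by index ...
    COUNT.out[0].view()

    // ... and the second by the name given with emit.
    COUNT.out.words.view()
}
```

Instead of using COUNT.out, the call could be written as counts = COUNT(reads_ch), after which counts.lines and counts.words refer to the same two outputs.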
Operators
- Nextflow operators are methods that allow you to modify, set or view channels.
- Operators can be separated into several groups: filtering, transforming, splitting, combining, forking and maths operators.
- To use an operator, use dot notation after the Channel object, e.g. my_ch.view().
- You can parse text items emitted by a channel that are formatted as CSV using the splitCsv operator.
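For illustration, a transforming operator and a splitting operator chained with dot notation. The channel contents are made-up values.

```groovy
workflow {
    // Transforming operator: map applies a closure to every item.
    Channel.of(1, 2, 3)
        .map { n -> n * 2 }
        .view()              // emits 2, 4 and 6

    // Splitting operator: splitCsv parses CSV-formatted text into rows,
    // each row being a list of column values.
    Channel.of('sample_1,control\nsample_2,treated')
        .splitCsv()
        .view { row -> "sample=${row[0]} condition=${row[1]}" }
}
```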
Reporting
- Nextflow can produce a custom execution report with run information using the log command.
- You can generate a report using the -t option to specify a template file.
Nextflow configuration
- Nextflow configuration can be managed using a Nextflow configuration file.
- Nextflow configuration files are plain text files containing a set of properties.
- You can define process-specific settings, such as cpus and memory, within the process scope.
- You can assign different resources to different processes using the process selectors withName or withLabel.
- You can define a profile for different configurations using the profiles scope. These profiles can be selected when launching a pipeline execution by using the -profile command-line option.
- Nextflow configuration settings are evaluated in the order they are read in.
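A configuration file using these scopes might look roughly like the sketch below. The resource values, the process name FASTQC, the label big_mem and the profile name cluster are illustrative assumptions.

```groovy
// nextflow.config (all values below are illustrative)
process {
    // Defaults applied to every process.
    cpus = 1
    memory = '2 GB'

    // Override for the process selected by name.
    withName: 'FASTQC' {
        cpus = 2
    }

    // Override for every process carrying the label 'big_mem'.
    withLabel: 'big_mem' {
        memory = '16 GB'
    }
}

profiles {
    // Selected at launch with: nextflow run main.nf -profile cluster
    cluster {
        process.executor = 'slurm'
    }
}
```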
Workflow caching and checkpointing
- Nextflow automatically keeps track of all the processes executed in your pipeline via checkpointing.
- Nextflow caches intermediate data in task directories within the work directory.
- Nextflow caching and checkpointing allow re-entrancy into a workflow after a pipeline error or when using new data, skipping steps that have already been executed successfully.
- Re-entrancy is enabled using the -resume option.
Simple RNA-Seq pipeline
- Nextflow can combine tasks (processes) and manage data flows using channels in a single pipeline/workflow.
- A workflow can be parameterised using params. The values of the parameters can be captured in a log file using log.info.
- Nextflow can handle a workflow’s software requirements using several technologies, including the conda package and environment manager.
- Workflow steps are connected via their inputs and outputs using Channels.
- Intermediate pipeline results can be transformed using Channel operators such as combine.
- Nextflow can execute an action when the pipeline completes execution, using the workflow.onComplete event handler, for example to print a confirmation message.
- Nextflow can produce multiple reports and charts providing several runtime metrics and execution information using the command-line options -with-report, -with-trace and -with-timeline, and can produce a graph using -with-dag.
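A sketch of the parameter logging and completion handler described above. The parameter names, default values and file name main.nf are hypothetical. The layout follows the classic DSL2 style with statements at the top level of the script; newer Nextflow releases with stricter syntax checking may expect these statements inside the workflow block.

```groovy
// Hypothetical parameters with default values.
params.reads  = 'data/*_{1,2}.fq.gz'
params.outdir = 'results'

// Record the parameter values in the run log.
log.info """\
    reads  : ${params.reads}
    outdir : ${params.outdir}
    """.stripIndent()

workflow {
    // Process invocations connected by channels would go here.
}

// Runs once when the pipeline finishes, whether it succeeded or not.
workflow.onComplete {
    log.info(workflow.success ? "Done! Results are in ${params.outdir}" : 'Pipeline finished with errors')
}

// Reports can be requested at launch, for example:
//   nextflow run main.nf -with-report -with-trace -with-timeline -with-dag
```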
Deploying nf-core pipelines
- nf-core is a community-led project to develop a set of best-practice pipelines built using the Nextflow workflow management system.
- The nf-core tool (nf-core) is a suite of helper tools that aims to help people run and develop nf-core pipelines.
- nf-core pipelines can be found using nf-core list, or by checking the nf-core website.
- nf-core launch nf-core/<pipeline> can be used to write a parameter file for an nf-core pipeline. This can be supplied to the pipeline using the -params-file option.
- An nf-core workflow is run using the nextflow run nf-core/<pipeline> syntax.
- nf-core pipelines can be reconfigured by using custom config files and/or adding command line parameters.