Sub-workflows
Last updated on 2023-12-08
Overview
Questions
- How do I reuse a workflow as part of a larger workflow?
- How do I run only a part of a workflow?
Objectives
- Understand how to create a sub-workflow.
- Understand how to run part of a workflow.
Sub-workflows
We have seen previously that the Nextflow DSL2 syntax allows the definition of reusable processes (modules). Nextflow DSL2 also allows the definition of reusable sub-workflow libraries.
Workflow definition
The workflow keyword allows the definition of workflow components that enclose the invocation of one or more processes and operators.
For example:
GROOVY
nextflow.enable.dsl=2

include { QUANT; INDEX } from './modules/module.nf'

workflow RNASEQ_QUANT_PIPE {
    read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
    transcriptome_ch = channel.fromPath('/data/yeast/transcriptome/*.fa.gz')
    QUANT(INDEX(transcriptome_ch), read_pairs_ch)
}
The above snippet defines a workflow component, named RNASEQ_QUANT_PIPE, that can be invoked from another workflow component definition in the same way as any other function or process, i.e. RNASEQ_QUANT_PIPE().
GROOVY
nextflow.enable.dsl=2

include { QUANT; INDEX } from './modules/module.nf'

workflow RNASEQ_QUANT_PIPE {
    read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
    transcriptome_ch = channel.fromPath('/data/yeast/transcriptome/*.fa.gz')
    QUANT(INDEX(transcriptome_ch), read_pairs_ch)
}

// Implicit workflow
workflow {
    /*
     * Call sub-workflow using <WORKFLOWNAME>() syntax
     */
    RNASEQ_QUANT_PIPE()
}
Workflow parameters
A workflow component can access any variable and parameter defined in the outer scope.
For example:
GROOVY
nextflow.enable.dsl=2

include { QUANT; INDEX } from './modules/module.nf'

params.transcriptome = '/some/data/file'
read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')

workflow RNASEQ_QUANT_PIPE {
    transcriptome_ch = channel.fromPath(params.transcriptome)
    QUANT(INDEX(transcriptome_ch), read_pairs_ch)
}
Workflow inputs
A workflow component can declare one or more input channels using the take keyword.
For example:
GROOVY
nextflow.enable.dsl=2

include { QUANT; INDEX } from './modules/module.nf'

params.transcriptome = '/some/data/file'
read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
// Created in the outer scope so that it can be passed in through take:
transcriptome_ch = channel.fromPath(params.transcriptome)

workflow RNASEQ_QUANT_PIPE {
    take:
    transcriptome_ch
    read_pairs_ch

    main:
    INDEX(transcriptome_ch)
    QUANT(INDEX.out, read_pairs_ch)
}
These input channels can then be passed to the workflow as parameters inside the (). Multiple parameters are separated by a comma (,) and must be specified in the order they appear under take:
GROOVY
nextflow.enable.dsl=2

include { QUANT; INDEX } from './modules/module.nf'

params.transcriptome = '/some/data/file'
read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
// Created in the outer scope so that the implicit workflow can pass it in
transcriptome_ch = channel.fromPath(params.transcriptome)

workflow RNASEQ_QUANT_PIPE {
    take:
    transcriptome_ch
    read_pairs_ch

    main:
    INDEX(transcriptome_ch)
    QUANT(INDEX.out, read_pairs_ch)
}

workflow {
    // Arguments follow the order of the take: declarations
    RNASEQ_QUANT_PIPE(transcriptome_ch, read_pairs_ch)
}
Workflow outputs
A workflow component can declare one or more output channels using the emit keyword.
For example:
GROOVY
nextflow.enable.dsl=2

include { QUANT; INDEX } from './modules/module.nf'

params.transcriptome = '/some/data/file'
read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
// Created in the outer scope so that it can be passed in through take:
transcriptome_ch = channel.fromPath(params.transcriptome)

workflow RNASEQ_QUANT_PIPE {
    take:
    transcriptome_ch
    read_pairs_ch

    main:
    INDEX(transcriptome_ch)
    QUANT(INDEX.out, read_pairs_ch)

    emit:
    QUANT.out
}
The above script declares one output, QUANT.out.
The result of the RNASEQ_QUANT_PIPE execution can be accessed using the out property, i.e. RNASEQ_QUANT_PIPE.out.
When there are multiple output channels declared, use the array bracket notation to access each output component as described for the Process outputs definition.
GROOVY
RNASEQ_QUANT_PIPE.out[0]
RNASEQ_QUANT_PIPE.out[1]
Alternatively, an output channel can be accessed using the name assigned to it in the emit declaration.
For example:
GROOVY
nextflow.enable.dsl=2

workflow RNASEQ_QUANT_PIPE {
    main:
    INDEX(transcriptome_ch)
    QUANT(INDEX.out, read_pairs_ch)

    emit:
    read_quant = QUANT.out
}
The output QUANT.out is assigned the name read_quant. The result of the above snippet can be accessed using:
GROOVY
RNASEQ_QUANT_PIPE.out.read_quant
Workflow composition
As with modules, workflow components can be defined within your script or imported by an include statement. They can then be invoked and composed like any other workflow component or process in your script.
GROOVY
nextflow.enable.dsl=2

// file modules/qc.nf
include { FASTQC } from './modules.nf'

workflow READ_QC_PIPE {
    take:
    read_pairs_ch
    quant_out_ch

    main:
    FASTQC(read_pairs_ch)

    emit:
    FASTQC.out
}
nextflow.enable.dsl=2

include { READ_QC_PIPE } from './modules/qc.nf'
// Assumption: INDEX, QUANT and MULTIQC are all defined in this module file
include { QUANT; INDEX; MULTIQC } from './modules/module.nf'

workflow RNASEQ_QUANT_PIPE {
    take:
    transcriptome_ch
    read_pairs_ch

    main:
    INDEX(transcriptome_ch)
    QUANT(INDEX.out, read_pairs_ch)

    emit:
    QUANT.out
}

params.transcriptome = '/some/data/file'
read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
transcriptome_ch = channel.fromPath(params.transcriptome)

// The implicit workflow is the entry point and receives no inputs,
// so it has no take: block and uses the channels defined above.
workflow {
    RNASEQ_QUANT_PIPE(transcriptome_ch, read_pairs_ch)
    READ_QC_PIPE(read_pairs_ch, RNASEQ_QUANT_PIPE.out)
    MULTIQC(RNASEQ_QUANT_PIPE.out.mix(READ_QC_PIPE.out).collect())
}
Nested workflow execution
Nested workflow execution determines an implicit scope. Therefore the same process can be invoked in two different workflow scopes. For example, a process such as INDEX could be invoked in both RNASEQ_QUANT_PIPE and READ_QC_PIPE from the above snippet. The workflow execution path, along with the process name, defines the fully qualified process name that is used to distinguish the two process invocations, i.e. RNASEQ_QUANT_PIPE:INDEX and READ_QC_PIPE:INDEX.
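The scoping rule can be illustrated with a minimal, self-contained sketch. The process SAY_HELLO and the workflows FLOW_A and FLOW_B are hypothetical names introduced only for this illustration; they are not part of the lesson's pipeline.

```groovy
nextflow.enable.dsl=2

// Hypothetical process, defined inline so the sketch needs no module file
process SAY_HELLO {
    input:
    val name

    output:
    stdout

    script:
    """
    echo "Hello from ${name}"
    """
}

// Each named workflow has its own scope, so both can invoke SAY_HELLO
workflow FLOW_A {
    SAY_HELLO(channel.of('FLOW_A'))
}

workflow FLOW_B {
    SAY_HELLO(channel.of('FLOW_B'))
}

// Implicit workflow: runs both sub-workflows
workflow {
    FLOW_A()
    FLOW_B()
}
```

When this script is run, the two invocations should be reported under their fully qualified names, FLOW_A:SAY_HELLO and FLOW_B:SAY_HELLO.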
Specific workflow entry points
By default, the unnamed workflow is assumed to be the main entry point for the script. Using named workflows, the entry point can be customised by using the -entry option of the run command. This allows users to run a specific sub-workflow or a section of their entire workflow script.
For example:
BASH
$ nextflow run main.nf -entry RNASEQ_QUANT_PIPE
The above command would run the RNASEQ_QUANT_PIPE sub-workflow.
Keypoints
- Nextflow allows for definition of reusable sub-workflow libraries.
- A sub-workflow groups processes into a workflow component that can be included from any other script and invoked like a custom function within the new workflow scope. This enables reuse of workflow components.
- The -entry option of the nextflow run command specifies the name of the workflow to be executed.