Sub-workflows
Last updated on 2023-12-11
Overview
Questions
- How do I reuse a workflow as part of a larger workflow?
- How do I run only a part of a workflow?
Objectives
- Understand how to create a sub-workflow.
- Understand how to run part of a workflow.
Sub-workflows
We have seen previously that the Nextflow DSL2 syntax allows for the definition of reusable processes (modules). Nextflow DSL2 also allows the definition of reusable sub-workflow libraries.
Workflow definition
The workflow keyword allows the definition of workflow
components that enclose the invocation of one or more
processes and operators.
For example:
GROOVY
nextflow.enable.dsl=2
include {QUANT;INDEX} from './modules/module.nf'
workflow RNASEQ_QUANT_PIPE {
    read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
    transcriptome_ch = channel.fromPath('data/yeast/transcriptome/*.fa.gz')
    QUANT(INDEX(transcriptome_ch),read_pairs_ch)
}
The above snippet defines a workflow component, named
RNASEQ_QUANT_PIPE, that can be invoked from another
workflow component definition in the same way as any other function or
process, i.e. RNASEQ_QUANT_PIPE().
GROOVY
nextflow.enable.dsl=2
include {QUANT;INDEX} from './modules/module.nf'
workflow RNASEQ_QUANT_PIPE {
    read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
    transcriptome_ch = channel.fromPath('data/yeast/transcriptome/*.fa.gz')
    QUANT(INDEX(transcriptome_ch),read_pairs_ch)
}
// Implicit workflow
workflow {
    /*
     * Call sub-workflow using <WORKFLOWNAME>() syntax
     */
    RNASEQ_QUANT_PIPE()
}
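The snippets above depend on processes from ./modules/module.nf and on the yeast data files. The script below is a self-contained sketch you can run without either: the process SAY_HELLO and the workflow HELLO_PIPE are hypothetical names used only for illustration. The process runs natively (an exec: block) so that no external tools are needed.

```nextflow
nextflow.enable.dsl=2

// Hypothetical process: native (exec:) execution, no external tools required
process SAY_HELLO {
    input:
    val name

    output:
    val greeting

    exec:
    greeting = "Hello, ${name}!"
}

// Hypothetical named workflow component wrapping a channel factory and a process call
workflow HELLO_PIPE {
    names_ch = channel.of('yeast', 'nextflow')
    SAY_HELLO(names_ch)
    SAY_HELLO.out.view()
}

// Implicit (unnamed) workflow: the default entry point of the script
workflow {
    HELLO_PIPE()
}
```

Running this with nextflow run should print one greeting per name. The order may vary because tasks run in parallel.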
Workflow parameters
A workflow component can access any variable and parameter defined in the outer scope.
For example:
GROOVY
nextflow.enable.dsl=2
include {QUANT;INDEX} from './modules/module.nf'
params.transcriptome = '/some/data/file'
read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
workflow RNASEQ_QUANT_PIPE {
    transcriptome_ch = channel.fromPath(params.transcriptome)
    QUANT(INDEX(transcriptome_ch),read_pairs_ch)
}
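As a minimal, self-contained sketch of outer-scope access, the workflow below reads both a pipeline parameter and a plain script-level variable without declaring either as an input. The names GREET_PIPE, params.greeting and punctuation are hypothetical and chosen only for this illustration.

```nextflow
nextflow.enable.dsl=2

// Parameter defined in the outer scope; can be overridden with --greeting
params.greeting = 'Hello'

// Plain script-level variable (no def), also visible inside workflow components
punctuation = '!'

workflow GREET_PIPE {
    // params.greeting and punctuation are resolved from the enclosing script scope
    channel.of('yeast', 'nextflow')
        .map { name -> "${params.greeting}, ${name}${punctuation}" }
        .view()
}

workflow {
    GREET_PIPE()
}
```

Because the value comes from params, running the script with --greeting Hola changes the output without editing the workflow.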
Workflow inputs
A workflow component can declare one or more input channels using the
take keyword.
For example:
GROOVY
nextflow.enable.dsl=2
include {QUANT;INDEX} from './modules/module.nf'
workflow RNASEQ_QUANT_PIPE {
    take:
    transcriptome_ch
    read_pairs_ch

    main:
    // both input channels are supplied by the caller
    INDEX(transcriptome_ch)
    QUANT(INDEX.out,read_pairs_ch)
}
These input channels can then be passed to the workflow as arguments
inside the parentheses (). Multiple arguments are separated by a comma
and must be given in the order in which they appear under
take:
GROOVY
nextflow.enable.dsl=2
include {QUANT;INDEX} from './modules/module.nf'
params.transcriptome = '/some/data/file'
workflow RNASEQ_QUANT_PIPE {
    take:
    transcriptome_ch
    read_pairs_ch

    main:
    INDEX(transcriptome_ch)
    QUANT(INDEX.out,read_pairs_ch)
}
workflow {
    transcriptome_ch = channel.fromPath(params.transcriptome)
    read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
    // arguments are matched to the take: declarations by position
    RNASEQ_QUANT_PIPE(transcriptome_ch,read_pairs_ch)
}
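To see that arguments are matched to take: declarations by position rather than by name, consider this small sketch. LABEL_PIPE, label_ch and items_ch are hypothetical names, and the workflow uses only operators so it runs without any module files.

```nextflow
nextflow.enable.dsl=2

// Hypothetical sub-workflow with two inputs declared under take:
workflow LABEL_PIPE {
    take:
    label_ch   // receives the first argument
    items_ch   // receives the second argument

    main:
    // pair every label with every item, then format each pair
    label_ch
        .combine(items_ch)
        .map { label, item -> "${label}: ${item}" }
        .view()
}

workflow {
    LABEL_PIPE(channel.of('sample'), channel.of('A', 'B'))
}
```

This should print lines such as "sample: A" and "sample: B". Swapping the two arguments would bind 'A' and 'B' to label_ch instead, giving "A: sample" and "B: sample".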
Workflow outputs
A workflow component can declare one or more output channels using
the emit keyword.
For example:
GROOVY
nextflow.enable.dsl=2
include {QUANT;INDEX} from './modules/module.nf'
workflow RNASEQ_QUANT_PIPE {
    take:
    transcriptome_ch
    read_pairs_ch

    main:
    INDEX(transcriptome_ch)
    QUANT(INDEX.out,read_pairs_ch)

    emit:
    QUANT.out
}
The above script declares one output, QUANT.out.
The result of the RNASEQ_QUANT_PIPE execution can be
accessed using the out property, i.e.
RNASEQ_QUANT_PIPE.out.
When there are multiple output channels declared, use the array bracket notation to access each output component as described for the Process outputs definition.
GROOVY
RNASEQ_QUANT_PIPE.out[0]
RNASEQ_QUANT_PIPE.out[1]
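A runnable sketch of the bracket notation, assuming a hypothetical SPLIT_PIPE workflow with two unnamed outputs. The index follows the order of the channels listed under emit:.

```nextflow
nextflow.enable.dsl=2

// Hypothetical sub-workflow declaring two unnamed output channels
workflow SPLIT_PIPE {
    take:
    words_ch

    main:
    // in DSL2 a channel can feed more than one operator
    upper_ch = words_ch.map { it.toUpperCase() }
    lower_ch = words_ch.map { it.toLowerCase() }

    emit:
    upper_ch   // accessed as out[0]
    lower_ch   // accessed as out[1]
}

workflow {
    SPLIT_PIPE(channel.of('Yeast', 'Reads'))
    SPLIT_PIPE.out[0].view { "upper: $it" }
    SPLIT_PIPE.out[1].view { "lower: $it" }
}
```

Index-based access depends on the emit: order, so reordering the declarations silently changes which channel out[0] refers to.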
Alternatively, an output channel can be accessed by the name it is assigned in the emit declaration.
For example:
GROOVY
nextflow.enable.dsl=2
include {QUANT;INDEX} from './modules/module.nf'
workflow RNASEQ_QUANT_PIPE {
    take:
    transcriptome_ch
    read_pairs_ch

    main:
    INDEX(transcriptome_ch)
    QUANT(INDEX.out,read_pairs_ch)

    emit:
    read_quant = QUANT.out
}
The output QUANT.out is assigned the name
read_quant. The result of the above snippet can then be accessed
using:
GROOVY
RNASEQ_QUANT_PIPE.out.read_quant
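The sketch below shows named outputs in a workflow that runs on its own. COUNT_PIPE, n_reads and first_read are hypothetical names. Because each output is reached by name, the caller keeps working even if the order of the emit: declarations changes.

```nextflow
nextflow.enable.dsl=2

// Hypothetical sub-workflow with two named outputs
workflow COUNT_PIPE {
    take:
    reads_ch

    main:
    total_ch = reads_ch.count()   // value channel holding the number of items
    first_ch = reads_ch.first()   // value channel holding the first item

    emit:
    n_reads = total_ch
    first_read = first_ch
}

workflow {
    COUNT_PIPE(channel.of('read1', 'read2', 'read3'))
    // named outputs are accessed as properties of .out
    COUNT_PIPE.out.n_reads.view { "total reads: $it" }
    COUNT_PIPE.out.first_read.view { "first read: $it" }
}
```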
Workflow composition
As with modules, workflow components can be defined
within your script or imported with an include statement. They
can then be invoked and composed like any other
workflow component or process in your script.
GROOVY
// file modules/qc.nf
nextflow.enable.dsl=2
include {FASTQC} from './module.nf'
workflow READ_QC_PIPE {
    take:
    read_pairs_ch

    main:
    FASTQC(read_pairs_ch)

    emit:
    FASTQC.out
}

// file main.nf
nextflow.enable.dsl=2
// assumes MULTIQC is defined in modules/module.nf alongside INDEX and QUANT
include {QUANT;INDEX;MULTIQC} from './modules/module.nf'
include {READ_QC_PIPE} from './modules/qc.nf'
workflow RNASEQ_QUANT_PIPE {
    take:
    transcriptome_ch
    read_pairs_ch

    main:
    INDEX(transcriptome_ch)
    QUANT(INDEX.out,read_pairs_ch)

    emit:
    QUANT.out
}
params.transcriptome = '/some/data/file'
workflow {
    read_pairs_ch = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
    transcriptome_ch = channel.fromPath(params.transcriptome)
    RNASEQ_QUANT_PIPE(transcriptome_ch,read_pairs_ch)
    READ_QC_PIPE(read_pairs_ch)
    MULTIQC(RNASEQ_QUANT_PIPE.out.mix(READ_QC_PIPE.out).collect())
}
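The same composition pattern can be tried without any module files. In this sketch, QUANT_STEP and QC_STEP are hypothetical stand-ins for the quantification and QC sub-workflows. They only rename their inputs with map. The final mix/collect step mirrors how the lesson gathers both outputs into one list for MULTIQC.

```nextflow
nextflow.enable.dsl=2

// Hypothetical stand-in for the quantification sub-workflow
workflow QUANT_STEP {
    take:
    samples_ch

    main:
    quant_ch = samples_ch.map { "${it}.quant" }

    emit:
    quant_ch
}

// Hypothetical stand-in for the QC sub-workflow
workflow QC_STEP {
    take:
    samples_ch

    main:
    qc_ch = samples_ch.map { "${it}.qc" }

    emit:
    qc_ch
}

workflow {
    samples_ch = channel.of('ref1', 'ref2')
    QUANT_STEP(samples_ch)
    QC_STEP(samples_ch)
    // merge both outputs and gather them into a single list emission
    QUANT_STEP.out.mix(QC_STEP.out).collect().view()
}
```

The items inside the collected list may appear in any order, since mix forwards items as they arrive.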
Nested workflow execution
Nested workflow execution determines an implicit scope. Therefore, the
same process can be invoked in two different workflow scopes. For
example, in the above snippet INDEX could be used in both
RNASEQ_QUANT_PIPE and READ_QC_PIPE. The workflow
execution path, together with the process name, defines the fully
qualified process name that is used to distinguish the two process
invocations, i.e. RNASEQ_QUANT_PIPE:INDEX and
READ_QC_PIPE:INDEX in the above example.
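A small runnable sketch makes the fully qualified names visible. A hypothetical process, TAG, is called from two hypothetical workflows, QUANT_STEP and QC_STEP. Each task reports task.process, which holds the fully qualified process name.

```nextflow
nextflow.enable.dsl=2

// Hypothetical process reused in two workflow scopes
process TAG {
    input:
    val x

    output:
    val label

    exec:
    // task.process is the fully qualified name of the calling scope, e.g. QUANT_STEP:TAG
    label = "${task.process} processed ${x}"
}

workflow QUANT_STEP {
    take:
    samples_ch

    main:
    TAG(samples_ch)
    TAG.out.view()
}

workflow QC_STEP {
    take:
    samples_ch

    main:
    TAG(samples_ch)
    TAG.out.view()
}

workflow {
    samples_ch = channel.of('ref1')
    QUANT_STEP(samples_ch)
    QC_STEP(samples_ch)
}
```

The output should contain one line for each scope, prefixed QUANT_STEP:TAG and QC_STEP:TAG. Process selectors in nextflow.config, such as withName: 'QUANT_STEP:TAG', can use these fully qualified names to configure a single invocation.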
Specific workflow entry points
By default, the unnamed workflow is assumed to be the main entry
point for the script. With named workflows, the entry point can be
customised using the -entry option of the
run command. This allows users to run a specific
sub-workflow or a section of their entire workflow script.
For example:
BASH
$ nextflow run main.nf -entry RNASEQ_QUANT_PIPE
The above command would run the RNASEQ_QUANT_PIPE
sub-workflow.
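The script below is a minimal sketch of a file with two selectable entry points. QUANT_ONLY and QC_ONLY are hypothetical names. Neither declares take: inputs, because a workflow launched with -entry is not passed any input channels. Each one therefore creates its own channel.

```nextflow
nextflow.enable.dsl=2

// Hypothetical named workflow that can be launched on its own
workflow QUANT_ONLY {
    channel.of('ref1', 'ref2')
        .map { "quantified ${it}" }
        .view()
}

// Second hypothetical named workflow
workflow QC_ONLY {
    channel.of('ref1', 'ref2')
        .map { "checked ${it}" }
        .view()
}

// Default entry point: used when -entry is not given, runs both
workflow {
    QUANT_ONLY()
    QC_ONLY()
}
```

Running nextflow run main.nf executes both workflows. nextflow run main.nf -entry QC_ONLY executes only the QC_ONLY workflow and skips the unnamed one.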
Key Points
- Nextflow allows for the definition of reusable sub-workflow libraries.
- Sub-workflows allow the definition of workflow components that can be included from any other script and invoked as a custom function within the new workflow scope. This enables reuse of workflow components.
- The -entry option of the nextflow run command specifies the name of the workflow to be executed.