Modules
Last updated on 2023-12-08
Estimated time 45 minutes
Overview
Questions
- How can I reuse a Nextflow process in different workflows?
- How do I use parameters in a module?
Objectives
- Add modules to a Nextflow script.
- Create a Nextflow module.
- Understand how to use parameters in a module.
Modules
In most programming languages there is the concept of creating code blocks/modules that can be reused.
Nextflow (DSL2) allows the definition of module scripts that can be included and shared across workflow pipelines.
A module file is nothing more than a Nextflow script containing one or more process definitions that can be imported from another Nextflow script.
A module can contain function, process and workflow definitions.
For example:
GROOVY
process INDEX {
    input:
    path transcriptome

    output:
    path 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}
The Nextflow process INDEX above could be saved in a file modules/rnaseq-tasks.nf as a module script.
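As noted above, a module script is not limited to processes. The following is a minimal sketch of a module file that holds a function, a process and a named workflow side by side. The file name modules/example-tasks.nf, the function indexName and the workflow BUILD_INDEX are hypothetical names chosen for illustration; only INDEX comes from the lesson.

```groovy
// modules/example-tasks.nf (hypothetical file name, for illustration only)

// A plain function (hypothetical helper): builds a label from a prefix
def indexName(prefix) {
    return "${prefix}_index"
}

// The INDEX process from the example above
process INDEX {
    input:
    path transcriptome

    output:
    path 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}

// A named workflow (hypothetical) that wraps the process
workflow BUILD_INDEX {
    take:
    transcriptome_ch

    main:
    INDEX(transcriptome_ch)

    emit:
    INDEX.out
}
```

Any of the three definitions could then be included by name from another script, in the same way as a single process.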
Importing module components
A component defined in a module script can be imported into another Nextflow script using the include statement.
For example:
GROOVY
nextflow.enable.dsl=2

include { INDEX } from './modules/rnaseq-tasks'

workflow {
    transcriptome_ch = channel.fromPath('data/yeast/transcriptome/*.fa.gz')
    // Invoke the included process like any locally defined process
    INDEX(transcriptome_ch)
}
The snippet above includes the process INDEX, defined in the module script rnaseq-tasks.nf, in the main execution context, so it can be invoked in the workflow scope.
Nextflow implicitly looks for the script file ./modules/rnaseq-tasks.nf, resolving the path against the location of the including script.
Note: Relative paths must begin with the ./ prefix.
Add module
Add the Nextflow module FASTQC from the Nextflow script ./modules/rnaseq-tasks.nf to the following workflow.
GROOVY
nextflow.enable.dsl=2

params.reads = "data/yeast/reads/ref1_{1,2}.fq.gz"
read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists:true )

workflow {
    FASTQC(read_pairs_ch)
}
Solution
GROOVY
nextflow.enable.dsl=2

include { FASTQC } from './modules/rnaseq-tasks'

params.reads = "$baseDir/data/yeast/reads/ref1_{1,2}.fq.gz"
read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists:true )

workflow {
    FASTQC(read_pairs_ch)
}
Multiple inclusions
A Nextflow script allows the inclusion of any number of modules. When multiple components need to be included from the same module script, the component names can be specified in the same inclusion using curly brackets {}.
Note: Component names are separated by a semicolon ; as shown below:
GROOVY
nextflow.enable.dsl=2

include { INDEX; QUANT } from './modules/rnaseq-tasks'

workflow {
    reads = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
    transcriptome_ch = channel.fromPath('data/yeast/transcriptome/*.fa.gz')
    INDEX(transcriptome_ch)
    // Process outputs are accessed through the process name, which is case sensitive
    QUANT(INDEX.out, reads)
}
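The semicolon form groups both names in a single statement. As a minimal sketch, assuming the same module file modules/rnaseq-tasks.nf, the same two components could also be brought in with one include statement per component, which some readers find easier to scan when the list grows:

```groovy
nextflow.enable.dsl=2

// One include statement per component, both from the same module script
include { INDEX } from './modules/rnaseq-tasks'
include { QUANT } from './modules/rnaseq-tasks'

workflow {
    reads = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
    transcriptome_ch = channel.fromPath('data/yeast/transcriptome/*.fa.gz')
    INDEX(transcriptome_ch)
    QUANT(INDEX.out, reads)
}
```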
Module aliases
A process component, such as INDEX, can be invoked only once in the same workflow context.
However, when including a module component it's possible to specify a name alias using the keyword as in the include statement. This allows the inclusion and the invocation of the same component multiple times in your script using different names.
For example:
GROOVY
nextflow.enable.dsl=2

include { INDEX } from './modules/rnaseq-tasks'
include { INDEX as SALMON_INDEX } from './modules/rnaseq-tasks'

workflow {
    transcriptome_ch = channel.fromPath('data/yeast/transcriptome/*.fa.gz')
    INDEX(transcriptome_ch)
    SALMON_INDEX(transcriptome_ch)
}
In the script above, the INDEX process is imported both under its own name, INDEX, and under the alias SALMON_INDEX.
The same is possible when including multiple components from the same module script as shown below:
GROOVY
nextflow.enable.dsl=2

include { INDEX; INDEX as SALMON_INDEX } from './modules/rnaseq-tasks'

workflow {
    transcriptome_ch = channel.fromPath('data/yeast/transcriptome/*.fa.gz')
    INDEX(transcriptome_ch)
    SALMON_INDEX(transcriptome_ch)
}
Add multiple modules
Add the Nextflow modules FASTQC and MULTIQC from the Nextflow script modules/rnaseq-tasks.nf to the following workflow.
GROOVY
nextflow.enable.dsl=2

params.reads = "$baseDir/data/yeast/reads/ref1_{1,2}.fq.gz"
read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists:true )

workflow {
    FASTQC(read_pairs_ch)
    MULTIQC(FASTQC.out.collect())
}
Solution
GROOVY
nextflow.enable.dsl=2

include { FASTQC; MULTIQC } from './modules/rnaseq-tasks'

params.reads = "$baseDir/data/yeast/reads/ref1_{1,2}.fq.gz"
read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists:true )

workflow {
    FASTQC(read_pairs_ch)
    MULTIQC(FASTQC.out.collect())
}
Module parameters
A module script can define one or more parameters using the same syntax as a Nextflow workflow script:
GROOVY
// functions.nf file
params.message = 'parameter from module script'

// The def keyword allows us to define a function that we can use in the code
def sayMessage() {
    println "$params.message"
}
Then, parameters are inherited from the including context. For example:
GROOVY
nextflow.enable.dsl=2

params.message = 'parameter from workflow script'

include { sayMessage } from './modules/functions'

workflow {
    sayMessage()
}
The above snippet prints:
OUTPUT
parameter from workflow script
The module uses the parameters defined before the include statement, therefore any parameter set later is ignored.
Tip: Define all pipeline parameters at the beginning of the script before any include declaration.
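To make the ordering rule concrete, here is a minimal sketch that sets the parameter only after the include statement. It assumes the functions.nf module shown earlier, saved as modules/functions.nf. Following the rule stated above, the module would not be expected to see the late value and would fall back to the default it defines itself; the exact behaviour may depend on the Nextflow version in use.

```groovy
nextflow.enable.dsl=2

include { sayMessage } from './modules/functions'

// Assigned AFTER the include: per the rule above, the module
// has already taken its parameters and does not pick this up.
params.message = 'parameter set too late'

workflow {
    sayMessage()
}
```

Moving the params.message assignment above the include line restores the behaviour of the earlier example, which is why the tip recommends declaring all parameters first.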
The option addParams can be used to extend the module parameters without affecting the parameters set before the include statement.
For example:
GROOVY
nextflow.enable.dsl=2

params.message = 'parameter from workflow script'

include { sayMessage } from './modules/module.nf' addParams(message: 'using addParams')

workflow {
    sayMessage()
}
The above code snippet prints:
OUTPUT
using addParams
Keypoints
- A module file is a Nextflow script containing one or more process definitions that can be imported from another Nextflow script.
- To import a module into a workflow use the include keyword.
- A module script can define one or more parameters using the same syntax as a Nextflow workflow script.
- The module inherits the parameters defined before the include statement, therefore any parameter set later is ignored.