Modules
Last updated on 2023-12-08 | Edit this page
Estimated time 45 minutes
Overview
Questions
- How can I reuse a Nextflow
processin different workflows? - How do I use parameters in a module?
Objectives
- Add modules to a Nextflow script.
- Create a Nextflow modules.
- Understand how to use parameters in a module.
Modules
In most programming languages there is the concept of creating code blocks/modules that can be reused.
Nextflow (DSL2) allows the definition of module scripts
that can be included and shared across workflow pipelines.
A module file is nothing more than a Nextflow script containing one
or more process definitions that can be imported from
another Nextflow script.
A module can contain the definition of a function,
process and workflow definitions.
For example:
GROOVY
process INDEX {
input:
path transcriptome
output:
path 'index'
script:
"""
salmon index --threads $task.cpus -t $transcriptome -i index
"""
}
The Nextflow process INDEX above could be saved in a
file modules/rnaseq-tasks.nf as a Module script.
Importing module components
A component defined in a module script can be imported into another
Nextflow script using the include statement.
For example:
GROOVY
nextflow.enable.dsl=2
include { INDEX } from './modules/rnaseq-tasks'
workflow {
transcriptome_ch = channel.fromPath('data/yeast/transcriptome/*.fa.gz')
//
INDEX(transcriptome_ch)
}
The above snippets includes a process with name INDEX
defined in the module script rnaseq-tasks.nf in the main
execution context, as such it can be invoked in the workflow scope.
Nextflow implicitly looks for the script file
./modules/rnaseq-tasks.nf resolving the path against the
including script location.
Note: Relative paths must begin with the
./ prefix.
Add module
Add the Nextflow module FASTQC from the Nextflow script
./modules/rnaseq-tasks.nf to the following workflow.
GROOVY
nextflow.enable.dsl=2
params.reads = "data/yeast/reads/ref1_{1,2}.fq.gz"
read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists:true )
workflow {
FASTQC(read_pairs_ch)
}
Solution
nextflow.enable.dsl=2 include { FASTQC } from './modules/rnaseq-tasks' params.reads = "$baseDir/data/yeast/reads/ref1_{1,2}.fq.gz" read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists:true ) workflow { FASTQC(read_pairs_ch) }
{: .language-groovy }
{: .solution}
Multiple inclusions
A Nextflow script allows the inclusion of any number of modules. When
multiple components need to be included from the some module script, the
component names can be specified in the same inclusion using the curly
brackets {}.
Note Component names are separated by a semi-colon
; as shown below:
GROOVY
nextflow.enable.dsl=2
include { INDEX; QUANT } from './modules/rnaseq-tasks'
workflow {
reads = channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz')
transcriptome_ch = channel.fromPath('data/yeast/transcriptome/*.fa.gz')
INDEX(transcriptome_ch)
QUANT(index.out,reads)
}
Module aliases
A process component, such as INDEX, can be invoked only
once in the same workflow context.
However, when including a module component it’s possible to specify a
name alias using the keyword as in the include
statement. This allows the inclusion and the invocation of the same
component multiple times in your script using different names.
For example:
GROOVY
nextflow.enable.dsl=2
include { INDEX } from './modules/rnaseq-tasks'
include { INDEX as SALMON_INDEX } from './modules/rnaseq-tasks'
workflow {
transcriptome_ch = channel.fromPath('data/yeast/transcriptome/*.fa.gz')
INDEX(transcriptome_ch)
SALMON_INDEX(transcriptome_ch)
}
In the above script the INDEX process is imported as
INDEX and an alias SALMON_INDEX.
The same is possible when including multiple components from the same module script as shown below:
GROOVY
nextflow.enable.dsl=2
include { INDEX; INDEX as SALMON_INDEX } from './modules/rnaseq-tasks'
workflow {
transcriptome_ch = channel.fromPath('/data/yeast/transcriptome/*.fa.gz)'
INDEX(transcriptome)
SALMON_INDEX(transcriptome)
}
Add multiple modules
Add the Nextflow modules FASTQC and MULTIQC
from the Nextflow script modules/rnaseq-tasks.nf to the
following workflow.
GROOVY
nextflow.enable.dsl=2
params.reads = "$baseDir/data/yeast/reads/ref1_{1,2}.fq.gz"
read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists:true )
workflow {
FASTQC(read_pairs_ch)
MULTIQC(fastqc.out.collect())
}
GROOVY
nextflow.enable.dsl=2
include { FASTQC; MULTIQC } from './modules/rnaseq-tasks'
params.reads = "$baseDir/data/yeast/reads/ref1_{1,2}.fq.gz"
read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists:true )
workflow {
FASTQC(read_pairs_ch)
MULTIQC(fastqc.out.collect())
}
Module parameters
A module script can define one or more parameters using the same syntax of a Nextflow workflow script:
GROOVY
//functions.nf file
params.message = 'parameter from module script'
//The def keyword allows use to define a function that we can use in the code
def sayMessage() {
println "$params.message"
}
Then, parameters are inherited from the including context. For example:
GROOVY
nextflow.enable.dsl=2
params.message = 'parameter from workflow script'
include {sayMessage} from './modules/functions'
workflow {
sayMessage()
}
The above snippet prints:
OUTPUT
parameter from workflow script
The module uses the parameters define before the include statement, therefore any further parameter set later is ignored.
Tip: Define all pipeline parameters at the beginning of the script before any include declaration.
The option addParams can be used to extend the module
parameters without affecting the parameters set before the
include statement.
For example:
GROOVY
nextflow.enable.dsl=2
params.message = 'parameter from workflow script'
include {sayMessage} from './modules/module.nf' addParams(message: 'using addParams')
workflow {
sayMessage()
}
The above code snippet prints:
OUTPUT
using addParams
Keypoints
- A module file is a Nextflow script containing one or more
processdefinitions that can be imported from another Nextflow script. - To import a module into a workflow use the
includekeyword. - A module script can define one or more parameters using the same syntax of a Nextflow workflow script.
- The module inherits the parameters define before the include statement, therefore any further parameter set later is ignored.