Nextflow scripting
Last updated on 2023-12-08
Estimated time 35 minutes
Overview
Questions
- What language are Nextflow scripts written in?
- How do I store values in a Nextflow script?
- How do I write comments in a Nextflow script?
- How can I store and retrieve multiple values?
- How are strings evaluated in Nextflow?
- How can I create simple reusable code blocks?
Objectives
- Understand what language Nextflow scripts are written in.
- Define variables in a script.
- Create lists of simple values.
- Comment Nextflow scripts.
- Explain what a list is.
- Explain what string interpolation is.
- Understand what a closure is.
Nextflow is a Domain Specific Language (DSL) implemented on top of the Groovy programming language, which in turn is a super-set of the Java programming language. This means that Nextflow can run any Groovy and Java code. It is not necessary to learn Groovy to use Nextflow DSL but it can be useful in edge cases where you need more functionality than the DSL provides.
Nextflow console
Nextflow has a console graphical interface. The console is a REPL (read-eval-print loop) environment that allows a user to quickly test part of a script or pieces of Nextflow code in an interactive manner.
It is a handy tool that allows a user to evaluate fragments of Nextflow/Groovy code or quickly prototype a complete pipeline script. More information can be found here
We can use the command nextflow console to launch the interactive console to test out Groovy code.
BASH
nextflow console
Console global scope
It is worth noting that the global script context is maintained across script executions. This means that variables declared in the global script scope are not lost when the script run is complete, and they can be accessed in further executions of the same or another piece of code.
Language Basics
Printing values
To print something, use the println method (println is a contraction of “print line”) and pass the text to print in quotes. The text is referred to as a string, as in a string of characters.
GROOVY
println("Hello, World!")
OUTPUT
Hello, World!
Parentheses are optional when a method is called with arguments, so the following is also valid syntax.
GROOVY
println "Hello, World!"
OUTPUT
Hello, World!
Methods
println is an example of a Groovy method. A method is just a block of code which only runs when it is called. You can pass data, known as parameters, into a method using the method name followed by parentheses (). Methods are used to perform certain actions, and they are also known as functions. Methods enable us to reuse code: define the code once, and use it many times.
Variables
In any programming language, you need to use variables to store different types of information. A variable is a pointer to a space in the computer’s memory that stores the value associated with it.
Variables are assigned using = and can have any value. Groovy is dynamically typed, which means the variable’s data type is based on its value. For example, setting x = 1 means x is an integer number, but if it is later set to x = "hello" then it becomes a String.
Variable scope
When we create a variable using the x = 1 syntax we can access it anywhere in the script: its scope is global. A variable declared in this fashion is sometimes called a public variable.
We can also define variables with a data type, e.g. String x="Hello", or with the def keyword, e.g. def x=1. This affects the accessibility (scope) of the variable. This is called lexical scoping (sometimes known as static scoping): the variable may only be accessed from within the block of code in which it is defined. A variable declared in this fashion is sometimes called a private variable.
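The difference between the two kinds of declaration can be sketched in a plain Groovy script. The names sample_id, read_count, describeSample and readCountVisible are made up for this illustration, and the behaviour shown is that of an ordinary Groovy script, which is assumed here to carry over to the Nextflow console.

```groovy
// Undeclared assignment: the variable is stored in the script binding (global scope).
sample_id = 'sample_1'

// Declared with def: the variable is local to the enclosing block (here, the script body).
def read_count = 100

// A method defined in the script can read the global variable...
def describeSample() {
    return "processing ${sample_id}".toString()
}

// ...but the def-declared variable is not visible inside the method,
// so looking it up raises a MissingPropertyException.
def readCountVisible() {
    try {
        def value = read_count
        return true
    } catch (MissingPropertyException e) {
        return false
    }
}

println(describeSample())     // processing sample_1
println(readCountVisible())   // false
```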
Types of Data
Groovy knows various types of data. Four common ones are:
- String − These are text literals which are represented in the form of a chain of characters enclosed in quotes. For example "Hello World".
- int − This is used to represent whole numbers. An example is 1234.
- Boolean − This represents a Boolean value which can either be true or false.
- float − This is used to represent floating point numbers. An example is 12.34.
A more complete list can be found here
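One way to check which type Groovy has picked for a value is to ask the value for its class. The sketch below uses made-up variable names. Note one detail that goes beyond the list above: a plain decimal literal such as 12.34 is stored by Groovy as a BigDecimal, and the f suffix is needed to get a Float.

```groovy
def label = "chr1"
def count = 1234
def flag = true
def decimal = 12.34     // plain decimal literal
def single = 12.34f     // the f suffix requests a Float

println(label.getClass().simpleName)     // String
println(count.getClass().simpleName)     // Integer
println(flag.getClass().simpleName)      // Boolean
println(decimal.getClass().simpleName)   // BigDecimal
println(single.getClass().simpleName)    // Float
```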
In the example below, variable my_var has an integer value of 1:
GROOVY
//int − This is used to represent whole numbers.
my_var = 1
To create a variable with a floating point value, we can execute:
GROOVY
//float − This is used to represent floating point numbers.
my_var = 3.1499392
To create a Boolean value we assign the value true or false.
Note: Do not enclose a Boolean value in quotes or it will be interpreted as a string.
GROOVY
//Boolean − This represents a Boolean value which can either be true or false.
my_var = false
And to create a string, we add single or double quotes around some text.
For example:
GROOVY
//String - These are text literals which are represented in the form of a chain of characters
my_var = "chr1"
Multi-line strings
A block of text that spans multiple lines can be defined by delimiting it with triple single quotes ''' or triple double quotes """:
GROOVY
text = """
This is a multi-line string
using triple quotes.
"""
To display the value of a variable to the screen in Groovy, we can use the println method, passing the variable name as a parameter.
GROOVY
x = 1
println(x)
OUTPUT
1
Slashy strings
Strings can also be defined using the forward slash / character as a delimiter. They are known as slashy strings and are useful for defining regular expressions and patterns, as there is no need to escape backslashes. In quoted strings, by contrast, a backslash starts an escape sequence, e.g. \n specifies a new line. As with double-quoted strings, slashy strings can interpolate variables prefixed with a $ character.
Try the following to see the difference:
GROOVY
x = /ATP1B2\TP53\WRAP53/
println(x)
OUTPUT
ATP1B2\TP53\WRAP53
GROOVY
y = 'ATP1B2\TP53\WRAP53'
println(y)
This produces an error, as the \ is a special character that we need to escape.
GROOVY
// use \ to escape
y = 'ATP1B2\\TP53\\WRAP53'
println(y)
OUTPUT
ATP1B2\TP53\WRAP53
String interpolation
To use a variable inside a single or multi-line double quoted string "", prefix the variable name with a $ to show it should be interpolated:
GROOVY
chr = "1"
println("processing chromosome $chr")
OUTPUT
processing chromosome 1
Note: Single quoted strings do not support string interpolation, so a variable name inside them is printed literally.
GROOVY
chr = "1"
println('processing chromosome $chr')
OUTPUT
processing chromosome $chr
Lists
To store multiple values in a variable we can use a List. A List (also known as an array) object can be defined by placing the list items in square brackets and separating items by commas:
GROOVY
kmers = [11,21,27,31]
You can access a given item in the list with square-bracket notation []. These positions are numbered starting at 0, so the first element has an index of 0.
GROOVY
kmers = [11,21,27,31]
println(kmers[0])
OUTPUT
11
We can use negative numbers as indices in Groovy. They count from the end of the list rather than the front: the index -1 gives us the last element in the list, -2 the second to last, and so on. Because of this, kmers[3] and kmers[-1] point to the same element in our example list.
GROOVY
kmers = [11,21,27,31]
//Lists can also be indexed with negative indexes
println(kmers[3])
println(kmers[-1])
OUTPUT
31
31
Lists can also be indexed using a range. A range is a quick way of declaring a list of consecutive numbers. To define a range use <num1>..<num2> notation.
GROOVY
kmers = [11,21,27,31]
// The first three elements using a range.
println(kmers[0..2])
OUTPUT
[11, 21, 27]
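Ranges are values in their own right, so they can be inspected outside of an index expression. The short sketch below uses illustrative variable names. It also shows the half-open form ..<, which excludes the upper bound, and a range that mixes a positive and a negative index.

```groovy
def inclusive = 0..3     // includes both ends
def exclusive = 0..<3    // excludes the upper bound

println(inclusive.toList())   // [0, 1, 2, 3]
println(exclusive.toList())   // [0, 1, 2]
println(inclusive.size())     // 4

def kmers = [11, 21, 27, 31]
// From the second element to the last one.
println(kmers[1..-1])         // [21, 27, 31]
```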
String interpolation of list elements
To use an expression like kmers[0..2] inside a double quoted String "" we use the ${expression} syntax, similar to Bash shell scripts.
For example, the expression below, written without the {},
GROOVY
kmers = [11,21,27,31]
println("The first three elements in the Lists are. $kmers[0..2]")
would output.
OUTPUT
The first three elements in the Lists are. [11, 21, 27, 31][0..2]
We need to enclose the kmers[0..2] expression inside ${} as below to get the correct output.
GROOVY
kmers = [11,21,27,31]
println("The first three elements in the Lists are. ${kmers[0..2]}")
OUTPUT
The first three elements in the Lists are. [11, 21, 27]
List methods
Lists have a number of useful methods that can perform operations on their contents. See more here. To call a method on an object, write the variable name, a dot, and then the method name.
For example, in order to get the length of the list use the list size method:
GROOVY
mylist = [0,1,2]
println(mylist.size())
//inside a string we need to use the ${} syntax
println("list size is: ${mylist.size()}")
OUTPUT
3
list size is: 3
We can use the get method to retrieve items in a list.
GROOVY
mylist = [0,1,2]
println mylist.get(1)
OUTPUT
1
Listed below are a few more common list methods and their output on a simple example.
GROOVY
mylist = [1,2,3]
println mylist
println mylist + [1]
println mylist - [1]
println mylist * 2
println mylist.reverse()
println mylist.collect{ it+3 }
println mylist.unique().size()
println mylist.count(1)
println mylist.min()
println mylist.max()
println mylist.sum()
println mylist.sort()
println mylist.find{it%2 == 0}
println mylist.findAll{it%2 == 0}
OUTPUT
[1, 2, 3]
[1, 2, 3, 1]
[2, 3]
[1, 2, 3, 1, 2, 3]
[3, 2, 1]
[4, 5, 6]
3
1
1
3
6
[1, 2, 3]
2
[2]
GROOVY
list = [1,2,3,4,5,6,7,8,9,10]
//or
list = 1..10
println("${list[4]}")
//or
println("${list.get(4)}")
The fifth element is 5. Remember that the array index starts at 0.
Maps
It can be difficult to remember the index of a value in a list, so we can use Groovy Maps (also known as associative arrays) that have an arbitrary type of key instead of an integer value. The syntax is very similar to Lists. To specify the key use a colon before the value [key:value]. Multiple values are separated by a comma. Note: the key value is not enclosed in quotes.
GROOVY
roi = [ chromosome : "chr17", start: 7640755, end: 7718054, genes: ['ATP1B2','TP53','WRAP53']]
Map values can be accessed with the conventional square-bracket syntax, with dot notation (as if the key were a property of the map), or with the get method. Note: When retrieving a value with square brackets or get, the key is enclosed in quotes.
GROOVY
//Use of the square brackets.
println(roi['chromosome'])
//Use a dot notation
println(roi.start)
//Use of get method
println(roi.get('genes'))
To add data to a map or to modify it, the syntax is similar to adding values to a list:
GROOVY
//Use of the square brackets
roi['chromosome'] = '17'
//Use a dot notation
roi.chromosome = 'chr17'
//Use of put method
roi.put('genome', 'hg38')
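For comparison, here is a minimal sketch of the list operations that the map syntax is being likened to. The values are made up for illustration: an element can be appended with the << operator or the add method, and an existing element can be replaced by assigning to its index.

```groovy
def kmers = [11, 21, 27]

kmers << 31       // append with the left-shift operator
kmers.add(35)     // append with the add method
kmers[0] = 9      // replace the element at index 0

println(kmers)    // [9, 21, 27, 31, 35]
```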
More information about maps can be found in the Groovy API.
Closures
Closures are the Swiss Army knife of Nextflow/Groovy programming. In a nutshell, a closure is a block of code that can be passed as an argument to a function. This can be useful to create a reusable function.
We can assign a closure to a variable in the same way as a value, using =.
GROOVY
square = { it * it }
The curly brackets {} around the expression it * it tell the script interpreter to treat this expression as code. it is an implicit variable that is provided in closures. It’s available when the closure doesn’t have an explicitly declared parameter and represents the value that is passed to the function when it is invoked.
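A closure stored in a variable can also be invoked directly, like a method. The short sketch below calls the square closure both with the ordinary call syntax and with its call method. The argument values are arbitrary.

```groovy
def square = { it * it }

// Invoke the closure like a regular method...
println(square(3))        // 9

// ...or explicitly through its call method.
println(square.call(4))   // 16
```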
We can pass the function square as an argument to other functions or methods. Some built-in functions take a function like this as an argument. One example is the collect method on lists that iterates through each element of the list transforming it into a new value using the closure:
GROOVY
square = { it * it }
x = [ 1, 2, 3, 4 ]
y = x.collect(square)
println y
OUTPUT
[1, 4, 9, 16]
A closure can also be defined in an anonymous manner, meaning that it is not given a name, and is defined in the place where it needs to be used.
GROOVY
x = [ 1, 2, 3, 4 ]
y = x.collect({ it * it })
println("x is $x")
println("y is $y")
OUTPUT
x is [1, 2, 3, 4]
y is [1, 4, 9, 16]
Closure parameters
By default, closures take a single parameter called it. To define a different name use the variable -> syntax.
For example:
GROOVY
square = { num -> num * num }
In the above example the variable num is assigned as the closure input parameter instead of it.
GROOVY
prefix = { "chr${it}"}
x = [ 1,2,3,4,5,6 ].collect(prefix)
println x
OUTPUT
[chr1, chr2, chr3, chr4, chr5, chr6]
Multiple map parameters
It’s also possible to define closures with multiple, custom-named parameters using the -> syntax. Separate the custom-named parameters with commas before the -> operator.
For example:
GROOVY
tp53 = [chromosome: "chr17",start:7661779 ,end:7687538, genome:'GRCh38', gene: "TP53"]
//perform subtraction of end and start coordinates
region_length = {start,end -> end-start }
tp53.length = region_length(tp53.start,tp53.end)
println(tp53)
This adds the region length to the map tp53, calculated as end - start.
OUTPUT
[chromosome:chr17, start:7661779, end:7687538, genome:GRCh38, gene:TP53, length:25759]
For another example, the method each() when applied to a map can take a closure with two arguments, to which it passes the key-value pair for each entry in the map object:
GROOVY
//closure with two parameters
printMap = { a, b -> println "$a with value $b" }

//map object
my_map = [ chromosome : "chr17", start : 1, end : 83257441 ]

//each iterates through each element
my_map.each(printMap)
OUTPUT
chromosome with value chr17
start with value 1
end with value 83257441
Learn more about closures in the Groovy documentation.
Additional Material
Conditional Execution
If statement
One of the most important features of any programming language is the ability to execute different code under different conditions. The simplest way to do this is to use the if construct.
The if statement uses the syntax common to other programming languages such as Java, C, JavaScript, etc.
GROOVY
if( < boolean expression > ) {
// true branch
}
else {
// false branch
}
The else branch is optional. Curly brackets are optional when the branch defines just a single statement.
GROOVY
x = 12
if( x > 10 )
    println "$x is greater than 10"
null, empty strings and empty collections are evaluated to false. Therefore a statement like:
GROOVY
list = [1,2,3]
if( list != null && list.size() > 0 ) {
    println list
}
else {
    println 'The list is empty'
}
Can be written as:
GROOVY
if( list )
    println list
else
    println 'The list is empty'
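The truthiness rules described above can be checked directly. The sketch below defines a small helper closure, isTruthy (a name made up for this illustration), that reports how Groovy evaluates a value in a Boolean context. It also includes the number zero, which the text above does not mention but which Groovy likewise treats as false.

```groovy
// Helper closure: returns true or false depending on how Groovy
// evaluates the value in a Boolean context.
def isTruthy = { value -> value ? true : false }

println(isTruthy(null))        // false
println(isTruthy(''))          // false
println(isTruthy([]))          // false
println(isTruthy([:]))         // false
println(isTruthy(0))           // false
println(isTruthy('chr1'))      // true
println(isTruthy([1, 2, 3]))   // true
```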
In some cases it can be useful to replace an if statement with a ternary expression, also known as a conditional expression. For example:
GROOVY
println list ? list : 'The list is empty'
The previous statement can be further simplified using the Elvis operator ?: as shown below:
GROOVY
println list ?: 'The list is empty'
For statement
The classical for loop syntax is supported as shown here:
GROOVY
for (int i = 0; i <3; i++) {
println("Hello World $i")
}
Iteration over list objects is also possible using the syntax below:
GROOVY
list = ['a','b','c']

for( String elem : list ) {
    println elem
}
Functions
It is possible to define a custom function in a script, as shown here:
GROOVY
int fib(int n) {
return n < 2 ? 1 : fib(n-1) + fib(n-2)
}
println (fib(10)) // prints 89
- A function can take multiple arguments separated by commas.
- The return keyword can be omitted and the function implicitly returns the value of the last evaluated expression. (Not recommended)
- Explicit types can be omitted. (Not recommended):
GROOVY
def fact( n ) {
    n > 1 ? n * fact(n-1) : 1
}
println (fact(5)) // prints 120
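As a small illustration of the first point in the list above, the sketch below defines a function with two typed arguments and an explicit return. The name regionLength is made up for this example. The coordinates are the TP53 start and end values used earlier in this lesson.

```groovy
// A function with two comma-separated, explicitly typed arguments.
int regionLength(int start, int end) {
    return end - start
}

println(regionLength(7661779, 7687538))   // 25759
```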
More resources
The complete Groovy language documentation is available at this link.
A great resource to master Apache Groovy syntax is Groovy in Action.
Keypoints
- Nextflow is a Domain Specific Language (DSL) implemented on top of the Groovy programming language.
- To define a variable, assign a value to it e.g., a = 1.
- Comments use the same syntax as in the C-family programming languages: // or multiline /* */.
- Multiple values can be stored in lists [value1, value2, value3, ...] or maps [chromosome: 1, start: 1].
- Lists are indexed and sliced with square brackets (e.g., list[0] and list[2..9]).
- String interpolation (variable interpolation, variable substitution, or variable expansion) is the process of evaluating a string literal containing one or more placeholders, yielding a result in which the placeholders are replaced with their corresponding values.
- A closure is an expression (block of code) encased in {} e.g. { it * it }.
Comments
When we write any code it is useful to document it using comments. In Nextflow comments use the same syntax as in the C-family programming languages. This can be confusing for people familiar with the # syntax for commenting in other languages.
GROOVY