HPC Workflow Management with Snakemake: Key Points

Running commands with Snakemake

“Before running Snakemake you need to write a Snakefile”
“A Snakefile is a text file which defines a list of rules”
“Rules have inputs, outputs, and shell commands to be run”
“You tell Snakemake what file to make and it will run the shell command defined in the appropriate rule”
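
As a rough illustration of the points above, the Python sketch below models a Snakefile as a list of rules, each with inputs, outputs, and a shell command, and looks up the rule that makes a requested file. It is a toy model, not Snakemake's own code: the `Rule` class, the `find_rule` helper, and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Toy stand-in for one Snakefile rule (hypothetical, not Snakemake's API)."""
    name: str
    inputs: list
    outputs: list
    shell: str


# The "Snakefile" here is just a list of rules; the file names are made up.
RULES = [
    Rule("hostname_login", [], ["hostname_login.txt"], "hostname > hostname_login.txt"),
    Rule("count_lines", ["data.txt"], ["data.count"], "wc -l data.txt > data.count"),
]


def find_rule(target, rules):
    """Return the rule whose outputs include the requested target file."""
    for rule in rules:
        if target in rule.outputs:
            return rule
    raise LookupError(f"no rule makes {target!r}")


# Ask for a file; the matching rule supplies the shell command to run.
rule = find_rule("data.count", RULES)
print(f"{rule.name}: {rule.shell}")
```

The sketch only reports the command instead of running it, which keeps it safe to try anywhere.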

Running Snakemake on the cluster

“Snakemake generates and submits its own batch scripts for your scheduler”
“You can store default configuration settings in a Snakemake profile”
“localrules defines rules that are executed locally, and never submitted to a cluster”

Placeholders

“Snakemake rules are made more generic with placeholders”
“Placeholders in the shell part of the rule are replaced with values based on the chosen wildcards”
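
To make the substitution step concrete, here is a minimal Python sketch of how placeholders in a shell command might be filled in once the wildcard values are known. The `fill_placeholders` helper and the file names are hypothetical, and the sketch assumes that a list of files is joined with spaces; it imitates the behaviour described above rather than reproducing Snakemake's implementation.

```python
from types import SimpleNamespace


def fill_placeholders(shell_template, inputs, outputs, wildcards):
    """Substitute {input}, {output}, and {wildcards.NAME} in a shell template."""
    return shell_template.format(
        input=" ".join(inputs),      # assumption: file lists are space-separated
        output=" ".join(outputs),
        wildcards=SimpleNamespace(**wildcards),  # allows {wildcards.sample}
    )


command = fill_placeholders(
    "echo {wildcards.sample} && wc -l {input} > {output}",
    inputs=["sampleA.txt"],
    outputs=["sampleA.count"],
    wildcards={"sample": "sampleA"},
)
print(command)
```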

MPI applications and Snakemake

“Snakemake chooses the appropriate rule by replacing wildcards such that the output matches the target”
“Snakemake checks for various error conditions and will stop if it sees a problem”
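
The rule-selection idea above can be sketched in Python as a pattern match: each `{name}` in an output pattern becomes a capture group, and a rule is chosen only if its pattern matches the whole target. This is a simplified model under stated assumptions: each wildcard is taken to match one or more characters, each wildcard name appears once per pattern, and the helper names, rule names, and the use of `LookupError` are illustrative choices, not Snakemake's own.

```python
import re


def match_wildcards(output_pattern, target):
    """Return the wildcard values that turn output_pattern into target, or None."""
    regex = ""
    # re.split with a capture group alternates literal text (even indices)
    # and wildcard names (odd indices).
    for i, part in enumerate(re.split(r"\{(\w+)\}", output_pattern)):
        if i % 2:
            regex += f"(?P<{part}>.+)"
        else:
            regex += re.escape(part)
    match = re.fullmatch(regex, target)
    return match.groupdict() if match else None


def choose_rule(rules, target):
    """Pick the one rule whose output pattern matches; stop on any problem."""
    hits = []
    for name, pattern in rules.items():
        wildcards = match_wildcards(pattern, target)
        if wildcards is not None:
            hits.append((name, wildcards))
    if not hits:
        raise LookupError(f"no rule can make {target!r}")
    if len(hits) > 1:
        raise LookupError(f"ambiguous rules for {target!r}: {[n for n, _ in hits]}")
    return hits[0]


# Hypothetical rules mapping a rule name to its output pattern.
RULES = {"count": "{sample}.count", "plot": "plots/{sample}.png"}
print(choose_rule(RULES, "sampleA.count"))
```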

Chaining rules

“Snakemake links rules by iteratively looking for rules that make missing inputs”
“Rules may have multiple named inputs and/or outputs”
“If a shell command does not yield an expected output then Snakemake will regard that as a failure”
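
The chaining behaviour described above can be pictured as a recursive search: for each missing file, find a rule that makes it, then repeat for that rule's inputs. The Python sketch below is a toy model with hypothetical rule and file names; it ignores details such as timestamps and only tracks which files exist. The `check_outputs` helper mirrors the last point by treating a missing expected output as a failure.

```python
# Hypothetical rules: each has a list of inputs and a list of outputs.
RULES = {
    "count": {"input": ["data.txt"], "output": ["data.count"]},
    "report": {"input": ["data.count"], "output": ["report.txt"]},
}


def plan(target, rules, existing, order=None):
    """Return the rule names, in run order, needed to make target."""
    order = [] if order is None else order
    if target in existing:
        return order  # nothing to do for files that are already present
    producers = [name for name, rule in rules.items() if target in rule["output"]]
    if not producers:
        raise LookupError(f"{target!r} is missing and no rule makes it")
    name = producers[0]
    # Resolve this rule's own missing inputs first, then schedule the rule.
    for needed in rules[name]["input"]:
        plan(needed, rules, existing, order)
    if name not in order:
        order.append(name)
    return order


def check_outputs(name, rules, files_after_run):
    """Fail if a rule's command did not produce every declared output."""
    missing = [f for f in rules[name]["output"] if f not in files_after_run]
    if missing:
        raise RuntimeError(f"rule {name!r} did not create {missing}")


print(plan("report.txt", RULES, existing={"data.txt"}))
```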

Processing lists of inputs

“Use the expand() function to generate lists of filenames you want to combine”
“Any {input} to a rule can be a variable-length list”
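
For intuition about what `expand()` produces, the following plain-Python stand-in builds the same kind of filename list with `itertools.product`. The name `expand_sketch` and the file pattern are hypothetical, and the sketch assumes every wildcard value is given as a list; inside a Snakefile you would call Snakemake's own `expand()` instead.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill pattern with every combination of the given wildcard values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


files = expand_sketch("p_{count}_runs.txt", count=[1, 2, 4])
print(files)

# A variable-length list like this can serve as a rule's input; in a shell
# command it would appear as the file names separated by spaces.
print(" ".join(files))
```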