Content from Running a Parallel Application on the Cluster
Last updated on 2023-08-02 | Edit this page
Overview
Questions
- What output does the Amdahl code generate?
- Why does parallelizing the amdahl code make it faster?
Objectives
- Run the amdahl parallel code on the cluster
- Note what output is generated, and where it goes
- Predict the trend of execution time vs parallelism
Introduction
A high-performance computing cluster offers powerful computational resources to its users, but taking advantage of these resources is not always straightforward. The cluster system does not work in the same way as systems you may be more familiar with.
The software we will use in this lesson is a model of the kind of parallel task that is well-adapted to high-performance computing resources. It’s called “amdahl”, named for Gene Amdahl, the computer scientist behind “Amdahl’s Law”, which concerns the advantages and limitations of parallelism in code execution.
Callout
Amdahl’s Law is a statement about how much benefit you can expect to get by parallelizing a computer program.
The limitation arises from the fact that, in any application, there is some fraction of the work to be done which is inherently serial, and some fraction which is amenable to parallelization. The law is a quantitative expression of the fact that, by parallelizing the code, you can only ever make the parallel part faster, you cannot reduce the execution time of the serial part.
As a practical matter, this means that developer effort spent on parallelization has diminishing returns on the overall reduction in execution time.
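Amdahl’s Law can be stated as a formula: if a fraction p of the work can be parallelized, the speed-up on n processors is S = 1 / ((1 − p) + p/n). A minimal sketch in Python (the function name is ours, not part of the amdahl package):

```python
def amdahl_speedup(p, n):
    """Speed-up predicted by Amdahl's Law for a program whose
    parallelizable fraction is p, run on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the work parallelizable, 8 processors
# give well under an 8x speed-up.
print(round(amdahl_speedup(0.9, 8), 2))
```

Note that as n grows without bound, the speed-up approaches 1/(1 − p): with p = 0.9, no number of processors can ever deliver more than a 10× improvement.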
The Amdahl Code
Download and install it via pip. Note that amdahl depends on MPI, so make sure that’s also available.
On the HPC Carpentry cluster:
[user@login1 ~]$ module load OpenMPI
[user@login1 ~]$ module load Python
[user@login1 ~]$ pip install amdahl
Running It on the Cluster
Use the sacct command to see the run-time. The run-time is also recorded in the output itself.
[user@login1 ~]$ nano amdahl_1.sh
BASH
#!/bin/bash
#SBATCH -t 00:01 # max 1 minute
#SBATCH -p smnodes # max 4 cores
#SBATCH -n 1 # use 1 core
#SBATCH -o amdahl-np1.out # record result
module load OpenMPI
module load Python
mpirun amdahl
[user@login1 ~]$ sbatch amdahl_1.sh # serial job ~ 25 sec
[user@login1 ~]$ sbatch amdahl_2.sh # 2-way parallel ~ 20 sec
[user@login1 ~]$ sbatch amdahl_3.sh # 3-way parallel ~ 16 sec
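The 2- and 3-way scripts are not shown above; they differ from amdahl_1.sh only in the core count and the output file name. A sketch of what amdahl_2.sh might contain (amdahl_3.sh is analogous, with `-n 3` and a matching output name):

```bash
#!/bin/bash
#SBATCH -t 00:01           # max 1 minute
#SBATCH -p smnodes         # max 4 cores
#SBATCH -n 2               # use 2 cores
#SBATCH -o amdahl-np2.out  # record result
module load OpenMPI
module load Python
mpirun amdahl
```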
The amdahl code runs faster with 3 processors than with 2, but going from 2 to 3 cores (1.5× the resources) yields a speed-up of only 20/16 ≈ 1.25×: the serial fraction of the work limits the gain.
Content from Introduction to Snakemake
Last updated on 2023-08-02 | Edit this page
Overview
Questions
- What are Snakemake rules?
- Why do Snakemake rules not always run?
Objectives
- Write a single-rule Snakefile and execute it with Snakemake
- Predict whether the rule will run or not
Snakemake
Snakemake is a workflow tool. It takes as input a description of the work that you would like the computer to do, and when run, does the work that you have asked for.
The description of the work takes the form of a series of rules, written in a special format into a Snakefile. Rules have outputs, and the Snakefile and generated output files make up the system state.
Write a Snakemake rule file
Open your favorite editor, do the thing.
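A minimal single-rule Snakefile might look like the following sketch (the rule name, file name, and command are illustrative, not prescribed by the lesson):

```snakemake
rule hello:
    output: "hello.txt"
    shell: "echo 'Hello, Snakemake' > {output}"
```

On recent Snakemake versions, running `snakemake --cores 1` in the same directory asks Snakemake to produce every output it knows how to make; here, that is hello.txt.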
Run Snakemake
Throw the switch!
The rule does not get executed the second time. The Snakemake infrastructure is stateful, and knows that the required outputs are up to date.
The rule does not get executed the third time either. The output file is no longer the output the rule generated, but Snakemake does not know that; it only checks the file time-stamp. Editing Snakemake-managed files by hand can leave the workflow in an inconsistent state.
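Snakemake’s up-to-date check is, at heart, a file time-stamp comparison: a rule re-runs only if an output is missing or an input is newer than an output. A simplified model of that logic in plain Python (our sketch, not Snakemake’s actual implementation):

```python
import os

def needs_rerun(inputs, outputs):
    """Re-run if any output is missing, or any input is newer than
    the oldest output. Simplified model of Snakemake's check."""
    if not all(os.path.exists(o) for o in outputs):
        return True
    oldest_output = min(os.path.getmtime(o) for o in outputs)
    return any(os.path.getmtime(i) > oldest_output for i in inputs)
```

This is why hand-editing an output file fools Snakemake: touching the file updates its time-stamp, so the check above still reports the rule as up to date.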
Content from More Complicated Snakefiles
Last updated on 2023-08-02 | Edit this page
Overview
Questions
- What is a task graph?
- How does the Snakemake file express a task graph?
Objectives
- Write a multiple-rule Snakefile with dependent rules
- Translate between a task graph and rule set
Snakemake and Workflow
A Snakefile can contain multiple rules. In the trivial case, there will be no dependencies between the rules, and they can all run concurrently.
A more interesting case is when there are dependencies between the rules, e.g. when one rule takes the output of another rule as its input. In this case, the dependent rule (the one that needs another rule’s output) cannot run until the rule it depends on has completed.
It’s possible to express this relationship by means of a task graph, whose nodes are tasks, and whose arcs are input-output relationships between the tasks.
A Snakefile is a textual description of a task graph.
Write a multi-rule Snakemake rule file
Open your favorite editor, do the thing.
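A two-rule Snakefile in which one rule consumes the other’s output might look like this sketch (rule and file names are illustrative):

```snakemake
rule make_data:
    output: "data.txt"
    shell: "echo '1 2 3' > {output}"

rule summarize:
    input: "data.txt"
    output: "summary.txt"
    shell: "wc -w {input} > {output}"
```

Asking for the final file, e.g. `snakemake --cores 1 summary.txt`, forces make_data to run first, because summarize’s input is make_data’s output.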
Run Snakemake
Throw the switch!
The rules in the snakefile are nodes in the task graph. Two rules are connected by an arc in the task graph if the output of one rule is the input to the other. The task graph is directed, so the arc points from the rule that generates a file as output to the rule that consumes the same file as input.
A rule with an output that no other rule consumes is a terminal rule.
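The input-output relationships can be modeled as a small directed graph. A sketch in Python (rule names illustrative) that identifies terminal rules, i.e. those whose outputs no other rule consumes:

```python
# Each rule maps to its inputs and outputs; names are illustrative.
rules = {
    "make_data": {"inputs": [], "outputs": ["data.txt"]},
    "summarize": {"inputs": ["data.txt"], "outputs": ["summary.txt"]},
}

def terminal_rules(rules):
    """Return rules none of whose outputs is consumed by another rule."""
    consumed = {f for r in rules.values() for f in r["inputs"]}
    return [name for name, r in rules.items()
            if not any(f in consumed for f in r["outputs"])]

print(terminal_rules(rules))
```

Here data.txt is consumed by summarize, so make_data is not terminal; summarize is the single terminal rule.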
Content from Snakemake and the Cluster
Last updated on 2023-08-02 | Edit this page
Overview
Questions
- How can we express a one-task cluster operation in Snakemake?
Objectives
- Write a Snakefile that executes a job on the cluster
- Use MPI options to ensure the job runs in parallel
Snakemake and the Cluster
Snakemake has provisions for operating on an HPC cluster.
Various command-line arguments can be provided to tell Snakemake not to run things locally, but to run them via the queuing system instead.
In this lesson, we will repeat the first module, running the amdahl code on the cluster, but will use Snakemake to make it happen.
Write a cluster Snakemake rule file
Open your favorite editor, do the thing. Specify resources. Provide command line arguments to do the cluster operations by hand.
Run Snakemake
Throw the switch!
Use the “mpi” option in the resource block of the Snakemake rule, and specify the number of tasks. This will be mapped to the -n argument of the equivalent sbatch command.
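Under Snakemake’s MPI support (version 7.x and later), such a rule might look like the following sketch; the exact resource keywords depend on your Snakemake version and executor, so treat this as an assumption to check against your installation:

```snakemake
rule amdahl_run:
    output: "amdahl-np2.out"
    resources:
        mpi="mpirun",
        tasks=2
    shell:
        "{resources.mpi} -n {resources.tasks} amdahl > {output}"
```

With a Slurm-aware invocation, the `tasks` resource is what gets mapped to sbatch’s `-n`.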
Content from Snakemake Profiles
Last updated on 2023-08-02 | Edit this page
Overview
Questions
- How can we encapsulate our desired snakemake configuration?
- How do we balance non-repetition and customizability?
Objectives
- Write a Snakemake profile for the cluster
- Run the amdahl code with varying degrees of parallelism with the cluster profile.
Snakemake Profiles
Snakemake has a provision for profiles, which allow users to collect various common settings together in a special file that Snakemake examines when it runs. This lets users avoid repetition and possible errors of omission for common settings, and encapsulates some of the cluster complexity we encountered in the previous module.
Not all settings should be in the profile. Users can choose which ones to make static and which ones to make adjustable. In our case, we will want to have the freedom to choose the degree of parallelism, but most of the cluster arguments will not change, and so can be static in the profile.
Write a Profile
Do the thing.
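A profile is a directory containing a config.yaml of command-line settings. A sketch of what a generic cluster profile might contain, using Snakemake 7’s `--cluster` support (partition and time limit copied from the batch script earlier in this lesson; newer Snakemake versions use executor plugins instead, so adapt as needed):

```yaml
# config.yaml inside a profile directory, e.g. cluster-profile/
cluster: "sbatch -p smnodes -t 00:01 -n {resources.tasks}"
jobs: 4
default-resources:
  - tasks=1
```

Invoking `snakemake --profile cluster-profile ...` then applies these settings without repeating them on every command line.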
Run Snakemake
Throw the switch!
The profile files can take variables from the rule file; in particular, they can refer to a rule’s resources.
Content from Amdahl Parallel Runs
Last updated on 2023-08-02 | Edit this page
Overview
Questions
- How can we collect data on Amdahl run times?
Objectives
- Collect systematic data on the runtime of the amdahl code
Systematic Data Collection
Using what we have learned so far, including Snakemake profiles and rules, we will now compose a Snakefile that runs the Amdahl example code over a range of parallel widths. This workflow will generate the data we will use in the next module to demonstrate the diminishing returns of increasing parallelism.
Write a File
Compose the Snakemake file that does what we want.
We can put the widths in a list and iterate over them. We will use the profile generated previously to ensure that the jobs run on the cluster.
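One way to iterate over parallel widths is a list plus Snakemake’s expand() helper; a sketch (the widths, rule names, and file names are illustrative):

```snakemake
WIDTHS = [1, 2, 3, 4]

rule all:
    input: expand("amdahl-np{w}.out", w=WIDTHS)

rule amdahl_run:
    output: "amdahl-np{w}.out"
    resources:
        mpi="mpirun",
        tasks=lambda wildcards: int(wildcards.w)
    shell:
        "{resources.mpi} -n {resources.tasks} amdahl > {output}"
```

The `all` rule’s inputs force one amdahl_run job per width, and the cluster profile from the previous module sends each one through the queue.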
Run Snakemake
Throw the switch!
Even with several parameters, the set of combinations is finite, so you could generate a flat list of all the combinations and iterate over that. Or you could generate two lists and do a nested loop.
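Flattening several parameter lists into all their combinations is a one-liner with Python’s itertools.product; for example, combining the widths with a hypothetical second parameter:

```python
from itertools import product

widths = [1, 2, 3]
trials = ["a", "b"]  # hypothetical second parameter

# Flat list of every (width, trial) combination.
combos = list(product(widths, trials))
print(len(combos))  # 6 combinations to iterate over
```

The nested-loop version produces the same pairs; product simply flattens it into one iterable.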