Content from Running a Parallel Application on the Cluster


Last updated on 2023-08-02

Overview

Questions

  • What output does the Amdahl code generate?
  • Why does parallelizing the amdahl code make it faster?

Objectives

  • Run the amdahl parallel code on the cluster
  • Note what output is generated, and where it goes
  • Predict the trend of execution time vs parallelism

Introduction


A high-performance computing cluster offers powerful computational resources to its users, but taking advantage of these resources is not always straightforward. The cluster system does not work in the same way as systems you may be more familiar with.

The software we will use in this lesson is a model of the kind of parallel task that is well-adapted to high-performance computing resources. It’s called “amdahl”, named for Gene Amdahl, the computer architect who formulated “Amdahl’s Law”, which describes the advantages and limitations of parallelism in code execution.

Callout

Amdahl’s Law is a statement about how much benefit you can expect to get by parallelizing a computer program.

The limitation arises from the fact that, in any application, there is some fraction of the work to be done which is inherently serial, and some fraction which is amenable to parallelization. The law is a quantitative expression of the fact that, by parallelizing the code, you can only ever make the parallel part faster; you cannot reduce the execution time of the serial part.
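
Quantitatively: if a fraction p of the work can be parallelized and the program runs on n processors, the best possible speed-up is

S(n) = 1 / ((1 − p) + p / n)

so even as n grows without bound, the speed-up never exceeds 1 / (1 − p).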

As a practical matter, this means that developer effort spent on parallelization has diminishing returns on the overall reduction in execution time.

The Amdahl Code


Download and install the amdahl package via pip. Note that amdahl depends on MPI, so make sure an MPI implementation is also available.

On the HPC Carpentry cluster:

[user@login1 ~]$ module load OpenMPI
[user@login1 ~]$ module load Python
[user@login1 ~]$ pip install amdahl

Running It on the Cluster


Create a batch script that runs the amdahl code and submit it with sbatch. Use the sacct command to see the run-time; the run-time is also recorded in the job’s output file.

[user@login1 ~]$ nano amdahl_1.sh

BASH

#!/bin/bash
#SBATCH -t 00:01:00       # max 1 minute
#SBATCH -p smnodes        # max 4 cores
#SBATCH -n 1              # 1 task (one MPI rank)
#SBATCH -o amdahl-np1.out # record result

module load OpenMPI
module load Python

mpirun amdahl
[user@login1 ~]$ sbatch amdahl_1.sh
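
Once the job finishes, query its accounting record. The job ID here is illustrative; use the ID that sbatch reported:

[user@login1 ~]$ sacct -j 1234 --format=JobID,JobName,Elapsed,State

The Elapsed column shows the wall-clock run-time.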

Challenge

Run the amdahl code with a few (small!) levels of parallelism. Make a quantitative estimate of how much faster the code will run with 3 processors than with 2. The naive estimate would be that it runs at 1.5× the speed, or equivalently, that it completes in 2/3 the time.

[user@login1 ~]$ sbatch amdahl_1.sh  # serial job     ~ 25 sec
[user@login1 ~]$ sbatch amdahl_2.sh  # 2-way parallel ~ 20 sec
[user@login1 ~]$ sbatch amdahl_3.sh  # 3-way parallel ~ 16 sec

The amdahl code runs faster with 3 processors than with 2, but the speed-up is less than 1.5×: the 3-way job takes roughly 16/20 = 0.8 of the 2-way job’s time, not the naive 2/3. The serial fraction of the work cannot be sped up, no matter how many processors are added.

Key Points

  • The amdahl code is a model of a parallel application
  • The execution speed depends on the degree of parallelism

Content from Introduction to Snakemake


Last updated on 2023-08-02

Overview

Questions

  • What are Snakemake rules?
  • Why do Snakemake rules not always run?

Objectives

  • Write a single-rule Snakefile and execute it with Snakemake
  • Predict whether the rule will run or not

Snakemake


Snakemake is a workflow tool. It takes as input a description of the work you would like the computer to do and, when run, carries out that work.

The description of the work takes the form of a series of rules, written in a special format into a Snakefile. Rules have outputs (and, usually, inputs); the Snakefile and the generated output files together make up the system state.

Write a Snakemake rule file


Open your favorite editor and create a file named Snakefile containing a single rule, as in the sketch below.
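
Here is a minimal sketch of a one-rule Snakefile; the rule name and file names are illustrative, not prescribed by the lesson:

PYTHON

rule hello:
    output:
        "hello.txt"
    shell:
        "echo 'Hello, Snakemake!' > {output}"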

Run Snakemake


Invoke snakemake from the command line, telling it how many cores it may use.
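
From the directory containing the Snakefile (recent Snakemake versions require the --cores argument):

[user@login1 ~]$ snakemake --cores 1

Snakemake reads the Snakefile, notices that hello.txt does not exist, and runs the rule that produces it.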

Challenge

Remove the output file, and run Snakemake. Then run it again. Edit the output file, and run it a third time. For which of these invocations does Snakemake do non-trivial work?

The rule does not get executed the second time. The Snakemake infrastructure is stateful, and knows that the required outputs are up to date.

The rule also does not get executed the third time. After your edit, the file’s contents are no longer what the rule produced, but the Snakemake infrastructure doesn’t know that; it only checks file time-stamps. Editing Snakemake-managed files by hand can leave you in an inconsistent state.

Key Points

  • Snakemake is an indirect way of running executables
  • Snakemake has a notion of system state, and can be fooled.

Content from More Complicated Snakefiles


Last updated on 2023-08-02

Overview

Questions

  • What is a task graph?
  • How does the Snakemake file express a task graph?

Objectives

  • Write a multiple-rule Snakefile with dependent rules
  • Translate between a task graph and rule set

Snakemake and Workflow


A Snakefile can contain multiple rules. In the trivial case, there will be no dependencies between the rules, and they can all run concurrently.

A more interesting case is when there are dependencies between the rules, e.g. when one rule takes the output of another rule as its input. In this case, the dependent rule (the one that needs another rule’s output) cannot run until the rule it depends on has completed.

It’s possible to express this relationship by means of a task graph, whose nodes are tasks, and whose arcs are input-output relationships between the tasks.

A Snakemake file is a textual description of a task graph.

Write a multi-rule Snakemake rule file


Open your favorite editor and add a second rule that consumes the first rule’s output, as in the sketch below.
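
A sketch of a Snakefile with a dependency between rules; all names are illustrative. The count rule cannot run until generate has produced data.txt:

PYTHON

# Default target: ask for the final output.
rule all:
    input:
        "count.txt"

# Produces the intermediate file.
rule generate:
    output:
        "data.txt"
    shell:
        "seq 1 100 > {output}"

# Consumes generate's output, so it runs second.
rule count:
    input:
        "data.txt"
    output:
        "count.txt"
    shell:
        "wc -l {input} > {output}"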

Run Snakemake


Run snakemake as before. Snakemake determines the dependency order itself and runs the rules accordingly.

Challenge

Draw the task graph for your Snakefile.

Given an example task graph, write a Snakefile that implements it.

The rules in the Snakefile are nodes in the task graph. Two rules are connected by an arc in the task graph if the output of one rule is the input to the other. The task graph is directed, so the arc points from the rule that generates a file as output to the rule that consumes the same file as input.

A rule with an output that no other rule consumes is a terminal rule.

Key Points

  • Snakemake rule files can be mapped to task graphs
  • Tasks are executed as required in dependency order
  • Where possible, tasks may run concurrently.

Content from Snakemake and the Cluster


Last updated on 2023-08-02

Overview

Questions

  • How can we express a one-task cluster operation in Snakemake?

Objectives

  • Write a Snakefile that executes a job on the cluster
  • Use MPI options to ensure the job runs in parallel

Snakemake and the Cluster


Snakemake has provisions for operating on an HPC cluster.

Various command-line arguments tell Snakemake not to run tasks locally, but to submit them via the queuing system instead.

In this lesson, we will repeat the first module, running the amdahl code on the cluster, but will use Snakemake to make it happen.

Write a cluster Snakemake rule file


Open your favorite editor and write a rule that runs the amdahl code, specifying the resources it needs, as in the sketch below. For now, the cluster-related command-line arguments are supplied by hand; profiles, in the next module, will tidy this up.
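
A sketch using Snakemake’s MPI resource convention; the rule name and output file are illustrative, and the tasks value of 4 is just an example width:

PYTHON

rule amdahl_run:
    output:
        "amdahl-np4.out"
    resources:
        # mpi names the launcher; tasks is the number of MPI ranks.
        mpi="mpirun",
        tasks=4
    shell:
        "{resources.mpi} -n {resources.tasks} amdahl > {output}"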

Run Snakemake


Run Snakemake, providing the cluster arguments on the command line.
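
One possible invocation, using the generic --cluster option of Snakemake 7 (newer releases provide a dedicated Slurm executor instead); the partition name matches the batch script from the first module:

[user@login1 ~]$ snakemake --jobs 1 --cluster "sbatch -p smnodes -n {resources.tasks}"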

Challenge

How can you control the degree of parallelism of your cluster task?

Use the “mpi” entry in the resources block of the Snakemake rule, and set “tasks” to the desired number of MPI ranks. The tasks value is mapped to the -n argument of the equivalent sbatch command.

Key Points

  • Snakemake rule files can submit cluster jobs.
  • There are a lot of options.

Content from Snakemake Profiles


Last updated on 2023-08-02

Overview

Questions

  • How can we encapsulate our desired Snakemake configuration?
  • How do we balance avoiding repetition against retaining customizability?

Objectives

  • Write a Snakemake profile for the cluster
  • Run the amdahl code at varying degrees of parallelism using the cluster profile

Snakemake Profiles


Snakemake has a provision for profiles, which allow users to collect various common settings together in a special file that Snakemake examines when it runs. This lets users avoid repetition, and possible errors of omission, for common settings, and encapsulates some of the cluster complexity we encountered in the previous module.

Not all settings should be in the profile. Users can choose which ones to make static and which ones to make adjustable. In our case, we will want to have the freedom to choose the degree of parallelism, but most of the cluster arguments will not change, and so can be static in the profile.

Write a Profile


Create a profile directory containing a config.yaml file that captures the common cluster settings, as in the sketch below.
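
A minimal sketch of a profile, assuming it lives at ~/.config/snakemake/cluster_profile/config.yaml (the directory name is your choice); each key corresponds to a snakemake command-line option:

YAML

jobs: 1
cluster: "sbatch -p smnodes -n {resources.tasks}"

Running snakemake --profile cluster_profile then applies these settings automatically.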

Run Snakemake


Run Snakemake with the --profile option; the profile’s settings are applied as though you had typed them on the command line.

Challenge

Write a profile that allows you to choose a different partition, in addition to the level of parallelism.

The profile’s cluster command can contain placeholders that are filled in from each rule, and in particular can refer to a rule’s resources.
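
A sketch, assuming each rule defines a partition resource alongside tasks (both resource names are illustrative):

YAML

jobs: 1
cluster: "sbatch -p {resources.partition} -n {resources.tasks}"

A rule would then set, for example, resources: partition="smnodes", tasks=3.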

Key Points

  • Snakemake profiles encapsulate cluster complexity.
  • Retaining operational flexibility is also important.

Content from Amdahl Parallel Runs


Last updated on 2023-08-02

Overview

Questions

  • How can we collect data on Amdahl run times?

Objectives

  • Collect systematic data on the runtime of the amdahl code

Systematic Data Collection


Using what we have learned so far, including Snakemake profiles and rules, we will now compose a Snakefile that runs the Amdahl example code over a range of parallel widths. This workflow will generate the data we will use in the next module to demonstrate the diminishing returns of increasing parallelism.

Write a File


Compose the Snakemake file that does what we want.

We can put the widths in a list and iterate over them, as sketched below. We will use the profile generated previously to ensure that the jobs run on the cluster.
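
A sketch of such a Snakefile; the width list and file-name pattern are illustrative. The expand() function builds the full list of target files, and a callable tasks resource converts each width wildcard into a task count:

PYTHON

# Parallel widths to measure (illustrative values).
WIDTHS = [1, 2, 3]

# Default target: one amdahl output file per width.
rule all:
    input:
        expand("amdahl-np{width}.out", width=WIDTHS)

rule amdahl_run:
    output:
        "amdahl-np{width}.out"
    resources:
        mpi="mpirun",
        # Request as many MPI ranks as the width in the file name.
        tasks=lambda wildcards: int(wildcards.width)
    shell:
        "{resources.mpi} -n {resources.tasks} amdahl > {output}"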

Run Snakemake


Run Snakemake with the cluster profile. It submits one job per width, and each run’s output lands in its own file.

Challenge

Our example has a single parameter, the parallelism, that we vary. How would you generalize this to arbitrary parameters?

Arbitrary parameter sets are still finite, so you could generate a flat list of all the combinations and iterate over that. Or you could generate one list per parameter and iterate over them in nested loops.
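
In Snakemake itself, expand() already produces the cross product of its argument lists, which gives you the flat list of combinations directly; the parameter names here are illustrative:

PYTHON

# Yields amdahl-np1-f0.8.out, amdahl-np1-f0.9.out,
# amdahl-np2-f0.8.out, ...: every combination of the two lists.
targets = expand(
    "amdahl-np{width}-f{fraction}.out",
    width=[1, 2, 4],
    fraction=[0.8, 0.9],
)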

Key Points

  • A relatively compact Snakemake file collects systematic run-time data.