Running a Parallel Application on the Cluster

Last updated on 2023-08-02



  • What output does the amdahl code generate?
  • Why does parallelizing the amdahl code make it faster?


  • Run the amdahl parallel code on the cluster
  • Note what output is generated, and where it goes
  • Predict the trend of execution time vs parallelism


A high-performance computing cluster offers powerful computational resources to its users, but taking advantage of these resources is not always straightforward. The cluster system does not work in the same way as systems you may be more familiar with.

The software we will use in this lesson is a model of the kind of parallel task that is well-adapted to high-performance computing resources. It’s called “amdahl”, named for Gene Amdahl, the computer architect who formulated “Amdahl’s Law”, which describes the advantages and limitations of parallelism in code execution.


Amdahl’s Law is a statement about how much benefit you can expect to get by parallelizing a computer program.

The limitation arises from the fact that, in any application, some fraction of the work is inherently serial and some fraction is amenable to parallelization. The law is a quantitative expression of the fact that parallelizing the code can only ever make the parallel part faster; it cannot reduce the execution time of the serial part.

As a practical matter, this means that developer effort spent on parallelization has diminishing returns on the overall reduction in execution time.
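To make the diminishing returns concrete, here is a minimal sketch in Python. It assumes the usual textbook form of Amdahl's Law, speed-up = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n is the number of processors. The function name `amdahl_speedup` and the value p = 0.8 are illustrative choices, not part of the amdahl package.

```python
def amdahl_speedup(p, n):
    """Ideal speed-up for parallel fraction p on n processors (Amdahl's Law)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial = 1.0 - p  # the part that cannot be sped up
    return 1.0 / (serial + p / n)


if __name__ == "__main__":
    p = 0.8  # hypothetical parallel fraction, chosen only for illustration
    for n in (1, 2, 4, 8, 16, 1024):
        print(f"{n:5d} processors: speed-up {amdahl_speedup(p, n):.2f}x")
```

Under this model, a program with p = 0.8 can never run more than 1 / (1 - p) = 5 times faster, however many processors are added.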

The Amdahl Code

Download and install it with pip. Note that amdahl depends on MPI, so make sure an MPI implementation is also available.

On the HPC Carpentry cluster:

[user@login1 ~]$ module load OpenMPI
[user@login1 ~]$ module load Python
[user@login1 ~]$ pip install amdahl

Running It on the Cluster

Once a job has finished, use the sacct command to see its run-time. The run-time is also recorded in the job's output.

[user@login1 ~]$ nano


#!/bin/bash
#SBATCH -t 00:01:00       # max 1 minute
#SBATCH -p smnodes        # max 4 cores
#SBATCH -n 1              # use 1 core
#SBATCH -o amdahl-np1.out # record result

module load OpenMPI
module load Python

mpirun amdahl

[user@login1 ~]$ sbatch
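Running at several levels of parallelism means keeping several near-identical job scripts that differ only in the core count and the output file name. One way to avoid copy-and-paste mistakes is to generate them. The sketch below is a convenience helper, not part of the lesson's tooling. The script file names (`amdahl-np2.sh` and so on) and the function names are assumptions made for illustration; the directives follow the job script above, with the time limit written out as HH:MM:SS.

```python
from pathlib import Path

# Template following the directives of the job script above; {n} is filled in
# with the number of MPI tasks requested.
JOB_TEMPLATE = """#!/bin/bash
#SBATCH -t 00:01:00          # max 1 minute (HH:MM:SS)
#SBATCH -p smnodes           # max 4 cores
#SBATCH -n {n}               # number of cores to use
#SBATCH -o amdahl-np{n}.out  # record result

module load OpenMPI
module load Python

mpirun amdahl
"""


def render_job_script(n):
    """Return the job script text for a run on n cores."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return JOB_TEMPLATE.format(n=n)


def write_job_scripts(core_counts, directory="."):
    """Write one script per core count; the .sh file names are hypothetical."""
    paths = []
    for n in core_counts:
        path = Path(directory) / f"amdahl-np{n}.sh"
        path.write_text(render_job_script(n))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Print the 2-core script for inspection instead of writing any files.
    print(render_job_script(2))
```

Calling `write_job_scripts([1, 2, 3])` would write three scripts to the current directory, each of which can then be submitted with sbatch.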


Run the amdahl code with a few (small!) levels of parallelism. Make a quantitative estimate of how much faster the code will run with 3 processors than with 2. The naive estimate would be that it runs at 1.5× the speed or, equivalently, completes in 2/3 of the time.

[user@login1 ~]$ sbatch  # serial job     ~ 25 sec
[user@login1 ~]$ sbatch  # 2-way parallel ~ 20 sec
[user@login1 ~]$ sbatch  # 3-way parallel ~ 16 sec

The amdahl code runs faster with 3 processors than with 2, but the speed-up is less than 1.5×: with the approximate timings above it is about 20 s / 16 s = 1.25×.
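The timings can also be turned around to estimate how much of the work is parallel. The sketch below assumes the simple Amdahl model T(n) = T(1) × ((1 - p) + p / n), which ignores communication and start-up overhead, and solves it for p. The function name `parallel_fraction` is hypothetical, and the inputs are the approximate timings quoted above, so the results are rough estimates only.

```python
def parallel_fraction(t_serial, t_parallel, n):
    """Estimate the parallel fraction p from a serial and an n-way timing.

    Rearranges T(n) = T(1) * ((1 - p) + p / n) to solve for p.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to estimate p")
    if t_serial <= 0:
        raise ValueError("t_serial must be positive")
    return (1.0 - t_parallel / t_serial) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    # Approximate timings in seconds, taken from the sbatch comments above.
    timings = {1: 25.0, 2: 20.0, 3: 16.0}
    for n in (2, 3):
        p = parallel_fraction(timings[1], timings[n], n)
        print(f"{n}-way run: estimated parallel fraction {p:.2f}")
```

The two estimates come out near 0.40 and 0.54. They do not agree exactly, which is expected given that the timings are rounded and the model is idealized. Either way, a substantial share of the run time is serial, which is why the speed-up falls short of the naive prediction.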

Key Points

  • The amdahl code is a model of a parallel application
  • The execution speed depends on the degree of parallelism