
Parallel Programming in Python

Python is one of the most widely used languages for scientific data analysis, visualization, and even modelling and simulation. Its popularity rests on two pillars: a friendly syntax and the availability of many high-quality libraries. The flexibility that Python offers comes with a few downsides, though: code typically doesn’t run as fast as lower-level implementations in C/C++ or Fortran, and it is not trivial to parallelize Python code so that it works efficiently on many-core architectures. This workshop addresses both issues, with an emphasis on running Python code efficiently (in parallel) on multiple cores.
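To give a sense of that speed gap, here is a minimal sketch comparing a pure-Python loop with its vectorized NumPy equivalent (the function names and problem size are our own illustration, not part of the lesson material):

```python
# Summing squares two ways: a pure-Python loop versus NumPy vectorization.
# Exact timings depend on your machine; the vectorized version is
# typically one to two orders of magnitude faster.
import timeit

import numpy as np

def sum_of_squares_loop(n):
    """Pure Python: every iteration passes through the interpreter."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def sum_of_squares_numpy(n):
    """Vectorized: the loop runs in compiled code inside NumPy."""
    a = np.arange(n, dtype=np.int64)
    return int(np.sum(a * a))

n = 1_000_000
print("loop: ", timeit.timeit(lambda: sum_of_squares_loop(n), number=10))
print("numpy:", timeit.timeit(lambda: sum_of_squares_numpy(n), number=10))
```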

We’ll start by learning to recognize problems that are suitable for parallel processing, looking at dependency diagrams and kitchen recipes. From there on, the workshop is highly interactive, diving straight into the first parallel programs. Participants code along with the instructor in the Software Carpentry style of teaching. This workshop teaches the principles of parallel programming in Python using Dask, Numba and Snakemake. More importantly, we try to give insight into how these different methods perform and when each should be used.
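As a first taste of what is to come, the sketch below shows Dask’s `delayed` interface, one of the abstractions on the schedule. It is our own minimal example, not an excerpt from the lesson material:

```python
# Dask's `delayed` turns ordinary function calls into a task graph.
# Independent tasks in that graph can then run on multiple cores.
from dask import delayed

@delayed
def square(x):
    return x * x

@delayed
def add(a, b):
    return a + b

# No computation happens here: we only build a dependency graph,
# much like the dependency diagrams used at the start of the workshop.
task = add(square(3), square(4))

# compute() executes the graph; the two independent `square` calls
# may run in parallel.
print(task.compute())  # 25
```

Because the two `square` calls do not depend on each other, the scheduler is free to run them concurrently; recognizing that kind of independence is exactly what the dependency diagrams in the first episode train you to do.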

Prerequisites

The course is aimed at graduate students and other researchers.

Participants should be:

  • familiar with basic Python: control flow, functions, NumPy
  • comfortable working in Jupyter

Recommended:

  • understand how NumPy and/or Pandas work

Schedule

Setup — Download files required for the lesson

00:00 1. Introduction
  • What problems are we solving, and what are we not discussing?
  • Why do we use Python?
  • What is parallel programming?
  • Why can it be hard to write a parallel program?

00:25 2. Measuring performance
  • How do we know our program ran faster?
  • How do we learn about efficiency?

01:25 3. Accelerators: vectorized NumPy and Numba
  • How do I parallelize a Python application?
  • What is data parallelism?
  • What is task parallelism?

02:55 4. Dask abstractions: delays
  • What abstractions does Dask offer?
  • What programming patterns exist in the parallel universe?

04:25 5. Threading and Multiprocessing
  • What is the Global Interpreter Lock (GIL)?
  • How do I use multiple threads in Python?

05:55 6. Dask abstractions: bags
  • What abstractions does Dask offer?
  • What programming patterns exist in the parallel universe?

07:25 7. Snakemake
  • What are computational workflows?
  • How do you program using a build system?
  • How do I mix Python code into a workflow?

08:15 8. Exercise: Photo Mosaic
  • How do I decide which technique to use where?
  • How do I put everything together?
  • Can you show some real-life examples?

09:45 9. Exercise: Mandelbrot fractals
  • How do I decide which technique to use where?
  • How do I put everything together?
  • Can you show some real-life examples?

11:15 10. Dynamic programming
  • How can I save intermediate results and recover from crashes?
  • How can I prevent duplicate computations?

11:15 11. Calling external C and C++ libraries from Python
  • What are some of my options in calling C and C++ libraries from Python code?
  • How does this work together with NumPy arrays?
  • How do I use this in multiple threads while lifting the GIL?

12:45 12. Asyncio fundamentals
  • What is AsyncIO?
  • How do I structure an async program?

13:45 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.