This lesson is still being designed and assembled (Pre-Alpha version)

Parallel Programming in Python

Python is one of the most widely used languages for scientific data analysis, visualization, and even modelling and simulation. The popularity of Python rests mainly on two pillars: a friendly syntax and the availability of many high-quality libraries. The flexibility that Python offers comes with a few downsides though: code typically doesn’t perform as fast as lower-level implementations in C/C++ or Fortran, and it is not trivial to parallelize Python code so that it runs efficiently on many-core architectures. This workshop addresses both of these issues, with an emphasis on running Python code efficiently (in parallel) on multiple cores.
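As an illustration of the performance gap mentioned above (a hedged sketch, not part of the lesson material), the snippet below times a pure Python loop against NumPy’s vectorized sum over the same array; the loop is typically orders of magnitude slower.

    import timeit
    import numpy as np

    values = np.arange(10_000_000)

    def python_sum():
        # Pure Python loop: every iteration goes through the interpreter.
        total = 0
        for v in values:
            total += v
        return total

    def numpy_sum():
        # Vectorized sum: the loop runs in compiled C inside NumPy.
        return np.sum(values)

    print("pure Python:", timeit.timeit(python_sum, number=1))
    print("NumPy:      ", timeit.timeit(numpy_sum, number=1))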

We’ll start by learning to recognize problems that are suitable for parallel processing, looking at dependency diagrams and kitchen recipes. From then on, the workshop is highly interactive, diving straight into the first parallel programs. Participants code along with the instructor in the live-coding style of Software Carpentry. The workshop teaches the principles of parallel programming in Python using Dask, Numba and Snakemake. More importantly, we try to give insight into how these different methods perform and when they should be used.
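To give a flavour of the style of code covered in the workshop, here is a minimal sketch using Dask’s delayed interface. It is an illustration only, not taken from the lesson material, and the helper functions square and add are made up for the example.

    from dask import delayed

    @delayed
    def square(x):   # illustrative helper, not from the lesson
        return x * x

    @delayed
    def add(a, b):   # illustrative helper, not from the lesson
        return a + b

    # Nothing is computed yet: Dask only records a task graph in which the
    # two square() calls are independent of each other.
    result = add(square(3), square(4))

    # compute() executes the graph; independent tasks can be scheduled on
    # separate threads or workers.
    print(result.compute())  # 25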

Prerequisites

The participant should be:

  • familiar with basic Python: control flow, functions, numpy
  • comfortable working in Jupyter

Recommended:

  • understand how NumPy and/or Pandas work

Schedule

Setup    Download files required for the lesson
00:00    1. Introduction
         What problems are we solving, and what are we not discussing?
         Why do we use Python?
         What is parallel programming?
         Why can it be hard to write a parallel program?
00:25    2. Measuring performance
         How do we know our program ran faster?
         How do we learn about efficiency?
01:25    3. Understanding parallelization in Python
         What is the Global Interpreter Lock (GIL)?
         How do I parallelize a Python application?
         What is data parallelism?
         What is task parallelism?
         How do I use multiple threads in Python?
02:55    4. Dask abstractions: bags and delays
         What abstractions does Dask offer?
         What programming patterns exist in the parallel universe?
04:25    5. Snakemake
         What are computational workflows?
         How do you program using a build system?
         How do I mix Python code into a workflow?
05:15    6. Asyncio fundamentals
         What is AsyncIO?
         How do I structure an async program?
06:15    7. Dynamic programming
         How can I save intermediate results and recover from crashes?
         How can I prevent duplicate computations?
06:15    8. Calling external C and C++ libraries from Python
         What are some of my options in calling C and C++ libraries from Python code?
         How does this work together with NumPy arrays?
         How do I use this in multiple threads while lifting the GIL?
07:45    Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.