Reproducible Computational Environments Using Containers: Introduction to Docker: Key Points

Introducing Containers

Almost all software depends on other software components to function, but these components have independent evolutionary paths.
Small environments that contain only the software that is needed for a given task are easier to replicate and maintain.
Critical systems that cannot be upgraded (due to cost, difficulty, etc.) need to be reproduced on newer systems in a maintainable and self-documented way.
Virtualization allows multiple environments to run on a single computer.
Containerization improves upon the virtualization of whole computers by allowing more efficient use of the host computer’s memory and storage resources.
Containers are built from ‘recipes’ that define the required set of software components and the instructions necessary to build/install them within a container image.
Docker is just one software platform that can create containers and the resources they use.

A toolbar icon indicates that Docker is ready to use (on Windows and macOS).
You will typically interact with Docker using the command line.
To learn how to run a certain Docker command, we can type the command followed by the --help flag.

The docker image pull command downloads Docker container images from the internet.
The docker image ls command lists Docker container images that are (now) on your computer.
The docker container run command creates running containers from container images and can run commands inside them.
When using the docker container run command, a container can run a default action (if it has one), a user-specified action, or a shell to be used interactively.
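
The three ways of using docker container run listed above can be sketched as argument lists. This is a minimal illustration in Python, not part of the lesson itself: the helper name run_args is hypothetical, the alpine image and the example commands are assumptions chosen for illustration, and the lists are only built and printed, not executed.

```python
from typing import List, Optional, Sequence


def run_args(image: str,
             command: Optional[Sequence[str]] = None,
             interactive: bool = False) -> List[str]:
    """Build an argument list for `docker container run` (hypothetical helper)."""
    args = ["docker", "container", "run"]
    if interactive:
        # -i keeps standard input open, -t allocates a terminal
        args.append("-it")
    args.append(image)
    if command:
        # Anything after the image name replaces the image's default action
        args.extend(command)
    return args


# 1. Default action: whatever the image defines (if anything)
default_action = run_args("alpine")
# 2. User-specified action: run one command, then exit
user_action = run_args("alpine", ["cat", "/etc/os-release"])
# 3. Interactive shell inside the container
interactive_shell = run_args("alpine", ["sh"], interactive=True)

for args in (default_action, user_action, interactive_shell):
    print(" ".join(args))
```

On a machine with Docker installed, any of these lists could be passed to subprocess.run(args, check=True) to actually start the container.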

docker container has subcommands used to interact with and manage containers.
docker image has subcommands used to interact with and manage container images.
docker container ls or docker ps can provide information on currently running containers.

The Docker Hub is an online repository of container images.
Many Docker Hub container images are public, and may be officially endorsed.
Each Docker Hub page about a container image provides structured information organized under subheadings.
Most Docker Hub pages about container images contain sections that provide examples of how to use those container images.
Many Docker Hub container images have multiple versions, indicated by tags.
The naming convention for Docker container images is: OWNER/CONTAINER_IMAGE_NAME:TAG
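
To make the OWNER/CONTAINER_IMAGE_NAME:TAG convention concrete, here is a small Python sketch that splits a name into its parts. The function and class names are hypothetical, and the example names are made up. It assumes the simple form shown above only: names that include a registry hostname or a digest are not handled. When no tag is given, Docker uses the tag latest, which the sketch mirrors.

```python
from typing import NamedTuple, Optional


class ImageName(NamedTuple):
    owner: Optional[str]  # None when the name has no OWNER/ prefix
    name: str
    tag: str


def parse_image_name(reference: str) -> ImageName:
    """Split OWNER/CONTAINER_IMAGE_NAME:TAG into parts (illustrative only)."""
    owner, slash, rest = reference.partition("/")
    if not slash:
        # No slash: the whole string is the image name (and maybe a tag)
        owner, rest = None, reference
    name, colon, tag = rest.partition(":")
    if not name or (colon and not tag):
        raise ValueError(f"cannot parse image name: {reference!r}")
    # Docker falls back to the tag "latest" when none is given
    return ImageName(owner, name, tag if colon else "latest")


print(parse_image_name("alice/alpine-python:v1"))
print(parse_image_name("alpine"))
```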

Dockerfiles specify what is within Docker container images.
The docker image build command is used to build a container image from a Dockerfile.
You can share your Docker container images through the Docker Hub so that others can create Docker containers from your container images.
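
As a rough sketch of the build step described above, the Python below renders the text of a small Dockerfile and assembles a matching docker image build command. Everything specific here is an assumption for illustration: the helper names, the alpine base image, the apk install step, and the alice/alpine-python:v1 name are not taken from this summary. The command is only printed, not run.

```python
import json
from typing import List, Sequence


def render_dockerfile(base_image: str,
                      run_steps: Sequence[str],
                      default_command: Sequence[str]) -> str:
    """Return Dockerfile text using FROM, RUN and CMD (hypothetical helper)."""
    lines = [f"FROM {base_image}"]
    lines += [f"RUN {step}" for step in run_steps]
    # CMD in exec form is written as a JSON array of strings
    lines.append("CMD " + json.dumps(list(default_command)))
    return "\n".join(lines) + "\n"


def build_args(image_name: str, context_dir: str = ".") -> List[str]:
    """Argument list for `docker image build`; -t names the resulting image."""
    return ["docker", "image", "build", "-t", image_name, context_dir]


dockerfile_text = render_dockerfile(
    "alpine",
    ["apk add --update python3"],
    ["python3", "--version"],
)
build_command = build_args("alice/alpine-python:v1")

print(dockerfile_text)
print(" ".join(build_command))
```

Saving dockerfile_text to a file named Dockerfile in the build directory and running the printed command there would produce an image that others could pull once it has been pushed to the Docker Hub.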

Docker allows containers to read files from and write files to the Docker host.
You can include files from your Docker host in your Docker container images by using the COPY instruction in your Dockerfile.
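
Giving a running container access to host files is commonly done with a bind mount, while COPY places files in the image at build time. The Python sketch below assembles the --mount option for docker container run. The helper name, the /data/project and /temp paths, and the alpine image are assumptions for illustration. The command is only printed, not run.

```python
import os
from typing import List


def bind_mount_args(host_dir: str, container_dir: str) -> List[str]:
    """Build a `--mount type=bind` option (hypothetical helper)."""
    # The source of a bind mount has to be an absolute path on the host
    source = os.path.abspath(host_dir)
    return ["--mount", f"type=bind,source={source},target={container_dir}"]


mount = bind_mount_args("/data/project", "/temp")
# List the mounted host directory from inside the container
args = ["docker", "container", "run", *mount, "alpine", "ls", "/temp"]

print(" ".join(args))
```

Files the container writes under the target directory appear in the host directory, so results outlive the container itself.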

There are many ways you might use Docker and existing container images in your research project.

Container images allow us to encapsulate the computation (and data) we have used in our research.
Using a service such as Docker Hub allows us to easily share computational work we have done.
Using container images along with a DOI service such as Zenodo allows us to capture our work and enables reproducibility.