This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository.

Reproducible Computational Environments Using Containers: Introduction to Docker: Glossary

Key Points

Introducing Containers
  • Almost all software depends on other software components to function, but these components have independent evolutionary paths.

  • Small environments that contain only the software that is needed for a given task are easier to replicate and maintain.

  • Critical systems that cannot be upgraded (because of cost, difficulty, etc.) need to be reproduced on newer systems in a maintainable and self-documented way.

  • Virtualization allows multiple environments to run on a single computer.

  • Containerization improves upon the virtualization of whole computers by allowing efficient management of the host computer’s memory and storage resources.

  • Containers are built from ‘recipes’ that define the required set of software components and the instructions necessary to build/install them within a container image.

  • Docker is just one of several software platforms that can create and manage containers and the resources they use.

Introducing the Docker Command Line
  • A toolbar icon indicates that Docker is ready to use (on Windows and macOS).

  • You will typically interact with Docker using the command line.

  • To learn how to run a certain Docker command, we can type the command followed by the --help flag.

Exploring and Running Containers
  • The docker image pull command downloads Docker container images from the internet.

  • The docker image ls command lists Docker container images that are (now) on your computer.

  • The docker container run command creates running containers from container images and can run commands inside them.

  • When using the docker container run command, a container can run a default action (if it has one), a user-specified action, or a shell to be used interactively.
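
As a rough sketch of the commands summarised above, the following pulls a container image, lists the images on your computer, and runs containers in two of the three ways mentioned. It assumes Docker is installed and can reach Docker Hub. The hello-world and alpine images are chosen purely for illustration. The guard skips the commands on a computer without Docker.

```shell
# Image names chosen for illustration (an assumption, not prescribed by this page).
default_image="hello-world"   # image with a default action
shell_image="alpine"          # small Linux image that includes a shell

if command -v docker >/dev/null 2>&1; then
    docker image pull "$shell_image"        # download the image
    docker image ls                         # list images now on this computer
    docker container run "$default_image"   # run the image's default action
    docker container run "$shell_image" cat /etc/os-release   # user-specified action
    # For an interactive shell, run this by hand in a terminal:
    #   docker container run -it alpine sh
else
    echo "docker not found; skipping the example commands"
fi
```

The interactive form is left as a comment because it needs a terminal attached.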

Cleaning Up Containers
  • docker container has subcommands used to interact with and manage containers.

  • docker image has subcommands used to interact with and manage container images.

  • docker container ls or docker ps can provide information on currently running containers.
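
A minimal sketch of a clean-up session, assuming Docker is installed. The container name cleanup-demo is hypothetical and exists only so that the example removes exactly the container it created.

```shell
# Hypothetical container name used only for this sketch.
container_name="cleanup-demo"

if command -v docker >/dev/null 2>&1; then
    # Running hello-world leaves a stopped container behind.
    docker container run --name "$container_name" hello-world
    docker container ls          # running containers only
    docker container ls --all    # also shows stopped containers
    docker container rm "$container_name"   # remove the stopped container
    # Images stay on disk until removed with: docker image rm IMAGE
    docker image ls
else
    echo "docker not found; skipping the example commands"
fi
```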

Finding Containers on Docker Hub
  • The Docker Hub is an online repository of container images.

  • Many Docker Hub container images are public, and some are officially endorsed.

  • Each Docker Hub page about a container image provides structured information under subheadings.

  • Most Docker Hub pages about container images contain sections that provide examples of how to use those container images.

  • Many Docker Hub container images have multiple versions, indicated by tags.

  • The naming convention for Docker container images is: OWNER/CONTAINER_IMAGE_NAME:TAG
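
To make the naming convention concrete, here is a small sketch that splits a full image name into its three parts using POSIX shell parameter expansion. The name alice/alpine-python:v1 is made up for illustration. The sketch assumes all three parts are present. In practice, official images have no OWNER part, and Docker uses the tag latest when the tag is omitted.

```shell
# Made-up example name in the form OWNER/CONTAINER_IMAGE_NAME:TAG.
image="alice/alpine-python:v1"

owner="${image%%/*}"         # everything before the first "/"
name_and_tag="${image#*/}"   # everything after the first "/"
name="${name_and_tag%%:*}"   # everything before the ":"
tag="${name_and_tag#*:}"     # everything after the ":"

echo "owner: $owner"
echo "image: $name"
echo "tag:   $tag"
```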

Creating Your Own Container Images
  • Dockerfiles specify what is within Docker container images.

  • The docker image build command is used to build a container image from a Dockerfile.

  • You can share your Docker container images through the Docker Hub so that others can create Docker containers from your container images.
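
The sketch below writes a minimal Dockerfile into an otherwise empty directory and builds a container image from it. The Dockerfile contents and the image name alice/alpine-python are illustrative assumptions, not the lesson's exact files. The build step runs only if Docker is installed.

```shell
# Use a fresh directory so it contains only the Dockerfile.
build_dir="$(mktemp -d)"

# Illustrative Dockerfile: start from alpine, add Python, set a default action.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM alpine
RUN apk add --update python3
CMD ["python3", "--version"]
EOF

if command -v docker >/dev/null 2>&1; then
    # alice/alpine-python is a placeholder; use your own Docker Hub user name.
    docker image build -t alice/alpine-python "$build_dir"
    # Sharing requires a Docker Hub account and login:
    #   docker image push alice/alpine-python
else
    echo "docker not found; Dockerfile written to $build_dir"
fi
```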

Creating More Complex Container Images
  • Docker allows containers to read files from, and write files to, the Docker host.

  • You can include files from your Docker host in your Docker container images by using the COPY instruction in your Dockerfile.
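
The two approaches can be sketched side by side: a bind mount makes a host directory visible inside a running container, while COPY places a file into the container image at build time. The file names, the /temp and /data paths, and the image name copy-demo are assumptions made for this example. It also assumes the temporary directory is one that Docker is allowed to share from the host.

```shell
# Scratch directory on the host holding one data file and a Dockerfile.
work_dir="$(mktemp -d)"
echo "hello from the host" > "$work_dir/hello.txt"

cat > "$work_dir/Dockerfile" <<'EOF'
FROM alpine
COPY hello.txt /data/hello.txt
CMD ["cat", "/data/hello.txt"]
EOF

if command -v docker >/dev/null 2>&1; then
    # 1. Bind mount: the container sees the host directory at /temp.
    docker container run --mount type=bind,source="$work_dir",target=/temp alpine cat /temp/hello.txt
    # 2. COPY: the file becomes part of the image itself.
    docker image build -t copy-demo "$work_dir"
    docker container run copy-demo
else
    echo "docker not found; example files written to $work_dir"
fi
```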

Examples of Using Container Images in Practice
  • There are many ways you might use Docker and existing container images in your research project.

Containers in Research Workflows: Reproducibility and Granularity
  • Container images allow us to encapsulate the computation (and data) we have used in our research.

  • Using a service such as Docker Hub allows us to easily share computational work we have done.

  • Using container images along with a DOI service such as Zenodo allows us to capture our work and enables reproducibility.

Glossary

Command-line argument/option
See the Carpentries Glossario entry
Command-line interface (CLI)
See the Carpentries Glossario entry
Container
A particular instance of a lightweight virtual machine derived from a container image. Containers are typically transient, unlike container images, which persist.
Container image
The persistent binary artefact that encapsulates the set of files and configuration for running an instance of a container. Sometimes shortened to just "image".
CPU/processor
See the Carpentries Glossario entry
Dependency
See the Carpentries Glossario entry
Dependency hell
A colloquial term for the frustration of some software users who run into issues with software packages which have dependencies on specific versions of other software packages. The dependency issue arises when several packages have dependencies on the same shared packages or libraries, but they depend on different and incompatible versions of the shared packages. If the shared package or library can only be installed in a single version, the user may need to address the problem by obtaining newer or older versions of the dependent packages. This, in turn, may break other dependencies and push the problem to another set of packages. (Extract from Wikipedia.)
Digital object identifier (DOI)
See the Carpentries Glossario entry
Docker
A software framework for creating, running and managing containers.
Docker build context
The docker build command builds Docker images from a Dockerfile and a “context”. A build's context is the set of files located in the specified PATH or URL.
Docker Hub
An online library of Docker container images.
Docker Hub repository
A collection of related Docker container images hosted on Docker Hub.
Docker tag
The specific version identifier associated with a Docker container image.
Dockerfile
A file containing the instructions that, together with the Docker build context, are used to build a Docker container image.
Filesystem
See the Carpentries Glossario entry
Filesystem layer
Each container image is made up of multiple read-only filesystem layers, each of which represents the filesystem differences from the layers below it in the image.
Hardware
See the Carpentries Glossario entry
Hard drive
The hardware in a computer that hosts the filesystem (or, sometimes, other storage types).
Host computer
The computer system which is running the container.
Memory/RAM
Random Access Memory (RAM) is where data the CPU is working with is temporarily stored.
Operating system (OS)
See the Carpentries Glossario entry
Reproducible research
See the Carpentries Glossario entry
Software library
See the Carpentries Glossario entry
Tar archive
A file archive format commonly used in Unix-like operating systems that combines multiple files into a single file. Tar archive files are used as the export format of Docker container images.
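
As a small illustration of combining several files into one archive, the sketch below creates two throwaway files, bundles them with the standard tar command, and lists the archive's contents. The file names are made up for the example.

```shell
# Scratch directory with two small example files.
archive_dir="$(mktemp -d)"
echo "first"  > "$archive_dir/a.txt"
echo "second" > "$archive_dir/b.txt"

# -c creates an archive, -f names the archive file, -C changes directory first.
tar -cf "$archive_dir/bundle.tar" -C "$archive_dir" a.txt b.txt

# -t lists the files stored in the archive.
tar -tf "$archive_dir/bundle.tar"

# With Docker installed, an image can be exported to a tar archive with:
#   docker image save --output alpine.tar alpine
```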
Virtualization
Having a second “virtual” computer running on, and accessible from, a host computer. Containers are an example of virtualization.