This lesson is in the early stages of development (Alpha version)

Introduction to Conda for (Data) Scientists: Glossary

Key Points

Getting Started with Conda
  • Conda is a platform agnostic, open source package and environment management system.

  • Using a package and environment management tool facilitates portability and reproducibility of (data) science workflows.

  • Conda (+pip) solves both the package and environment management problems and targets multiple programming languages. Other open source tools solve only one of the two problems, or target only a particular programming language.

Working with Environments
  • A Conda environment is a directory that contains a specific collection of Conda packages that you have installed.

  • You create a new environment using the conda create command and remove an existing one using the conda remove command with the --all option.

  • You activate (deactivate) an environment using the conda activate (conda deactivate) commands.

  • You install packages into environments using conda install; you install packages into an active environment using pip install.

  • You should install each environment as a sub-directory inside its corresponding project directory.

  • Use the conda env list command to list existing environments and their respective locations.

  • Use the conda list command to list all of the packages installed in an environment.
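The environment commands above can be sketched as one short session. This is a minimal sketch under stated assumptions, not the lesson's canonical workflow: the ./env location follows the sub-directory advice above, the package choices (python=3.10, numpy, requests) are hypothetical, and run is an illustrative helper that only prints each command unless CONDA_DEMO_RUN=1 is set.

```shell
# Hypothetical helper: print each command, and execute it only when
# CONDA_DEMO_RUN=1 is set in the environment.
run() {
    printf '+ %s\n' "$*"
    if [ "${CONDA_DEMO_RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Create an environment in ./env inside the current project directory.
run conda create --prefix ./env --yes python=3.10 pip

# Install a Conda package into ./env without activating it.
run conda install --prefix ./env --yes numpy

# Activate, install a pip package into the active environment, deactivate.
# (Activation needs the conda shell hook, e.g. a shell set up by conda init.)
run conda activate ./env
run python -m pip install requests
run conda deactivate

# List known environments and their locations, then the packages in ./env.
run conda env list
run conda list --prefix ./env

# Remove the environment and everything installed in it.
run conda remove --prefix ./env --all --yes
```

Run as written, the script only prints the commands, which makes it safe to read through before anything is created or deleted.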

Sharing Environments
  • Sharing Conda environments with other researchers facilitates the reproducibility of your research.

  • Create an environment.yml file that describes your project’s software environment.

  • Creating custom kernels enables you to connect your Conda environments to an existing JupyterLab install.
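One possible way to carry out the sharing steps above is sketched here. The ./env prefix, the kernel name my-project, and the display name are hypothetical, and run is an illustrative helper that only prints each command unless CONDA_DEMO_RUN=1 is set. The kernel step assumes the ipykernel package is installed in the environment.

```shell
# Hypothetical helper: print each command, and execute it only when
# CONDA_DEMO_RUN=1 is set in the environment.
run() {
    printf '+ %s\n' "$*"
    if [ "${CONDA_DEMO_RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Record the packages that were explicitly requested for ./env.
run conda env export --prefix ./env --from-history --file environment.yml

# On another machine: rebuild the environment from the shared file.
run conda env create --prefix ./env --file environment.yml

# Register the environment as a Jupyter kernel (needs ipykernel in ./env).
# Note: the printed form of this command does not show the quoting.
run conda run --prefix ./env python -m ipykernel install --user \
    --name my-project --display-name "My Project"
```

After the last step, the new kernel should appear in the launcher of an existing JupyterLab install under the chosen display name.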

Using Packages and Channels
  • A package is a tarball containing system-level libraries, Python or other modules, executable programs and other components, and associated metadata.

  • A Conda channel is a URL to a directory containing one or more Conda packages.

  • Explicitly including the channels (and their priority!) in a project’s environment file is necessary for another researcher to completely re-create that project’s software environment.
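As an illustration of the last point, the following sketch writes a small environment file with its channels listed explicitly, highest priority first. The channel and package choices are hypothetical, and the file is written to the current directory, replacing any existing environment.yml there.

```shell
# Write a minimal environment file. Channels are listed from highest to
# lowest priority; the packages below are hypothetical examples.
cat > environment.yml <<'EOF'
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - numpy
  - pip
  - pip:
      - requests
EOF

# Show the result.
cat environment.yml
```

The file has no name entry because it is meant to be used with a prefix, for example conda env create --prefix ./env --file environment.yml.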

Managing GPU dependencies
  • Conda can be used to manage your key GPU dependencies.

  • Use conda search to identify which versions of the CUDA libraries are available.

  • For most projects you will not need NVCC and can use the cudatoolkit package from the default channels.

  • If your project does need NVCC, try the cudatoolkit-dev package or the nvcc_linux-64 meta-package (the latter requires a separate NVIDIA CUDA Toolkit install).
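The GPU points above might translate into commands along these lines. This is a sketch under assumptions: the ./env prefix is hypothetical, no CUDA version is pinned, searching conda-forge for cudatoolkit-dev is an assumption about where that package is hosted, and run is an illustrative helper that only prints each command unless CONDA_DEMO_RUN=1 is set.

```shell
# Hypothetical helper: print each command, and execute it only when
# CONDA_DEMO_RUN=1 is set in the environment.
run() {
    printf '+ %s\n' "$*"
    if [ "${CONDA_DEMO_RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# See which versions of the CUDA libraries the configured channels offer.
run conda search cudatoolkit

# Most projects: install the runtime libraries only (no NVCC).
run conda install --prefix ./env --yes cudatoolkit

# Projects that compile CUDA code: look for a package that ships NVCC.
# (conda-forge is an assumption here.)
run conda search --channel conda-forge cudatoolkit-dev
```

In practice you would pin the cudatoolkit version to one that your GPU driver and deep learning framework both support.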