This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Introduction to Conda for (Data) Scientists: Glossary

Key Points

Getting Started with Conda
  • Conda is a platform agnostic, open source package and environment management system.

  • Using a package and environment management tool facilitates portability and reproducibility of (data) science workflows.

  • Conda solves both the package and environment management problems and targets multiple programming languages. Other open source tools solve either one or the other, or target only a particular programming language.

  • Anaconda is not only for Python.

Working with Environments
  • A Conda environment is a directory that contains a specific collection of Conda packages that you have installed.

  • You create (remove) a new environment using the conda create (conda remove) commands.

  • You activate (deactivate) an environment using the conda activate (conda deactivate) commands.

  • You install packages into environments using conda install; you install packages into an active environment using pip install.

  • You should install each environment as a sub-directory inside its corresponding project directory.

  • Use the conda env list command to list existing environments and their respective locations.

  • Use the conda list command to list all of the packages installed in an environment.
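
The commands in this section fit together roughly as in the sketch below. It assumes Conda is already installed. The ./env location, the Python version, and the package names are illustrative choices rather than part of the lesson, and the hook line is only there because conda activate is not normally available inside a non-interactive script.

```shell
# Directory for the environment, kept inside the project (illustrative name).
ENV_DIR=./env

# Only run the Conda commands if conda is actually on the PATH.
if command -v conda >/dev/null 2>&1; then
  # Make "conda activate" usable inside a non-interactive script.
  eval "$(conda shell.bash hook)"

  # Create the environment as a sub-directory of the project.
  conda create --yes --prefix "$ENV_DIR" python=3.10 pip

  # Activate it, then install packages with conda and with pip.
  conda activate "$ENV_DIR"
  conda install --yes numpy
  pip install requests

  # List known environments and the packages in the active one.
  conda env list
  conda list

  # Deactivate and delete the environment when it is no longer needed.
  conda deactivate
  conda remove --yes --prefix "$ENV_DIR" --all
else
  echo "conda not found on PATH; install Miniconda or Anaconda first"
fi
```

In an interactive terminal where conda init has been run, the same commands can be typed one at a time without the hook line or the surrounding check.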

Using Packages and Channels
  • A package is a tarball containing system-level libraries, Python or other modules, executable programs and other components, and associated metadata.

  • A Conda channel is a URL to a directory containing one or more Conda packages.

  • Explicitly including the channels (and their priority!) in a project’s environment file is necessary for another researcher to completely re-create that project’s software environment.

  • Understand how to use Conda and Pip together effectively.

Sharing Environments
  • Sharing Conda environments with other researchers facilitates the reproducibility of your research.

  • Create an environment.yml file that describes your project’s software environment.

  • Creating custom kernels enables you to connect your Conda environments to an existing JupyterLab install.
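
As a rough sketch of what such a file can look like, the shell snippet below writes a small environment.yml into the current project directory. The environment name, the channels, and the package names are illustrative assumptions, not taken from the lesson. Channels are listed from highest to lowest priority, and packages that are only installed with pip go under the nested pip key.

```shell
# Write a minimal environment file into the current project directory.
# All names below (example-env, the channels, the packages) are illustrative.
cat > environment.yml <<'EOF'
name: example-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - numpy
  - pip
  - pip:
    - requests
EOF

# Show the file that collaborators would use to re-create the environment.
cat environment.yml
```

Assuming Conda is installed, running conda env create --prefix ./env --file environment.yml in the same directory should then build the environment in an env sub-directory of the project.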

Managing GPU dependencies
  • Conda can be used to manage your key GPU dependencies.

  • Use conda search to identify which versions of the CUDA libraries are available.

  • For most projects you will not need NVCC and can use the cudatoolkit package from default channels.

  • If your project does need NVCC, try the cudatoolkit-dev package or the nvcc_linux-64 meta-package (the latter requires a separate NVIDIA CUDA Toolkit install).
