Using Packages and Channels
Overview
Teaching: 20 min
Exercises: 10 min
Questions
- What are Conda channels?
- What are Conda packages?
- Why should I be explicit about which channels my research project uses?
Objectives
- Install a package from a specific channel.
What are Conda packages?
A [conda package][conda-pkg-docs] is a compressed tarball file (.tar.bz2) that contains:
- system-level libraries
- Python or other modules
- executable programs and other components
- metadata under the info/ directory
- a collection of files that are installed directly into an install prefix
Conda keeps track of the dependencies between packages and platforms; the conda package format is identical across platforms and operating systems.
All conda packages have a specific sub-directory structure inside the tarball file. There is a bin directory that contains any binaries for the package; a lib directory containing the relevant library files (e.g., the .py files); and an info directory containing package metadata.
For more details on the conda package specification, including discussions of the various
metadata files, see the [docs][conda-pkg-spec-docs].
As an example of Conda package structure, consider the Conda package for the
Python 3.6 version of PyTorch targeting 64-bit macOS:
.
├── bin
│   ├── convert-caffe2-to-onnx
│   └── convert-onnx-to-caffe2
├── info
│   ├── LICENSE.txt
│   ├── about.json
│   ├── files
│   ├── git
│   ├── has_prefix.json
│   ├── hash_input.json
│   ├── index.json
│   ├── paths.json
│   ├── recipe/
│   └── test/
└── lib
    └── python3.6
        └── site-packages
            ├── caffe2/
            ├── torch/
            └── torch-1.1.0-py3.6.egg-info/
A complete listing of available PyTorch packages can be found on Anaconda Cloud.
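To make the bin/lib/info layout concrete, here is a minimal Python sketch that builds a tiny .tar.bz2 archive in memory and lists its top-level directories. The file names inside the archive (demo-tool, the demo module, and the index.json contents) are made up for illustration; this is not a real conda package and it skips most of the metadata that a real one carries.

```python
import io
import tarfile


def build_demo_package() -> bytes:
    """Build a tiny in-memory .tar.bz2 with a conda-like layout (hypothetical contents)."""
    members = {
        "bin/demo-tool": b"#!/bin/sh\necho demo\n",
        "lib/python3.6/site-packages/demo/__init__.py": b"",
        "info/index.json": b'{"name": "demo", "version": "0.1"}',
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        for path, payload in members.items():
            entry = tarfile.TarInfo(name=path)
            entry.size = len(payload)
            archive.addfile(entry, io.BytesIO(payload))
    return buffer.getvalue()


def top_level_dirs(package_bytes: bytes) -> list:
    """Return the sorted top-level directory names found in the archive."""
    with tarfile.open(fileobj=io.BytesIO(package_bytes), mode="r:bz2") as archive:
        return sorted({name.split("/")[0] for name in archive.getnames()})


print(top_level_dirs(build_demo_package()))  # ['bin', 'info', 'lib']
```

The same `top_level_dirs` helper could be pointed at the bytes of a downloaded package file to check that it follows the layout described above.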
What are Conda channels?
Again from the [Conda documentation][conda-channels-docs], conda packages are downloaded from remote channels, which are URLs to directories containing conda packages. The conda command searches a default set of channels, and packages are automatically downloaded and updated from the Anaconda Cloud channels.
- main: The majority of all new Anaconda, Inc. package builds are hosted here. Included in conda’s defaults channel as the top priority channel.
- r: Microsoft R Open conda packages and Anaconda, Inc.’s R conda packages. This channel is included in conda’s defaults channel. When creating new environments, MRO is now chosen as the default R implementation.
Collectively, the Anaconda managed channels are referred to as the defaults channel because, unless otherwise specified, packages installed using conda will be downloaded from these channels.
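Since a channel is a URL to a directory of packages, each channel expands into one URL per platform sub-directory plus a platform-independent noarch sub-directory. The sketch below shows that expansion for the base URLs that appear in the "Current channels" listing of the `conda search` output later in this lesson. The helper name `channel_urls` is hypothetical and is not part of conda.

```python
def channel_urls(channel_base: str, platform_subdir: str) -> list:
    """Return the platform-specific and platform-independent URLs for a channel."""
    base = channel_base.rstrip("/")
    return [f"{base}/{platform_subdir}", f"{base}/noarch"]


# Base URLs taken from the "Current channels" listing in this lesson.
defaults_bases = [
    "https://repo.anaconda.com/pkgs/main",
    "https://repo.anaconda.com/pkgs/r",
]

for base in defaults_bases:
    for url in channel_urls(base, "osx-64"):
        print(url)
```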
In addition to the defaults channels that are managed by Anaconda Inc., there is another channel called conda-forge that also has a special status. The Conda-Forge project “is a community led collection of recipes, build infrastructure and distributions for the conda package manager.”
There are a few reasons that you may wish to use the conda-forge channel instead of the defaults channel maintained by Anaconda:
- Packages on conda-forge may be more up-to-date than those on the defaults channel.
- There are packages on the conda-forge channel that aren’t available from defaults.
- You may wish to use a particular dependency from conda-forge instead of the alternative provided by defaults.
How do I install a package from a specific channel?
You can install a package from a specific channel into the currently active environment by
passing the --channel option to the conda install command as follows.
$ conda install scipy=1.3 --channel conda-forge
You can also install a package from a specific channel into a named environment (using --name)
or into an environment installed at a particular prefix (using --prefix). For example, the
following command installs the scipy package from the conda-forge channel into the environment
my-first-conda-env which we created earlier.
$ conda install scipy=1.3 --channel conda-forge --name my-first-conda-env
This command would install the tensorflow package from the conda-forge channel into an environment
installed into the ./env directory.
$ conda install tensorflow=1.13 --channel conda-forge --prefix ./env
Here is another example for R users. The following command would install the
r-tidyverse package from the r channel into an environment installed into the ./env directory.
$ conda install r-tidyverse=1.2 --channel r --prefix ./env
In this case the
--channel option is unnecessary because the
r channel is included by default.
The following works just as well!
$ conda install r-tidyverse=1.2 --prefix ./env
You may specify multiple channels for installing packages by passing the
--channel argument multiple times.
$ conda install scipy=1.3 --channel conda-forge --channel bioconda
Channel priority decreases from left to right: the first argument has higher priority than the second. For reference, bioconda is a channel for the conda package manager specializing in bioinformatics software. For those interested in learning more about the Bioconda project, check out the project’s GitHub page.
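The left-to-right rule can be illustrated with a small Python sketch that collects repeated --channel options in the order they were given and assigns each a rank. The parser below is a toy written with `argparse` for illustration only; it is not conda's actual command-line parser, and the program name `toy-install` is made up.

```python
import argparse


def channel_priorities(argv: list) -> dict:
    """Map each channel name to its priority rank (0 is the highest priority)."""
    parser = argparse.ArgumentParser(prog="toy-install")
    parser.add_argument("package")
    # action="append" keeps repeated options in the order they appear.
    parser.add_argument("--channel", action="append", default=None)
    args = parser.parse_args(argv)
    return {name: rank for rank, name in enumerate(args.channel or [])}


ranks = channel_priorities(
    ["scipy=1.3", "--channel", "conda-forge", "--channel", "bioconda"]
)
print(ranks)  # {'conda-forge': 0, 'bioconda': 1}
```

Swapping the two --channel options in the argument list swaps the ranks, which mirrors how reordering the options on the command line changes which channel is preferred.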
My package isn’t available on the defaults channel! What should I do?
It may very well be the case that packages (or often more recent versions of packages!) that you need to
install for your project are not available on the defaults channel. In this case you should try the following.
- conda-forge: the conda-forge channel contains a large number of community-curated conda packages. Typically the most recent versions of packages that are generally available via the defaults channel are available on conda-forge.
- pip: only if a package is not otherwise available via conda-forge (or some domain-specific channel like bioconda) should a package be installed into a conda environment from PyPI using pip.
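The order of preference described above (defaults first, then conda-forge, then a domain-specific channel such as bioconda, and pip only as a last resort) can be sketched as a simple lookup. The availability table below is hypothetical data invented for the example, and `choose_source` is an illustrative helper, not something conda provides.

```python
PREFERRED_ORDER = ["defaults", "conda-forge", "bioconda", "pip"]

# Hypothetical availability data, made up for this sketch.
AVAILABILITY = {
    "defaults": {"scipy"},
    "conda-forge": {"scipy", "kaggle"},
    "pip": {"kaggle", "some-pip-only-package"},
}


def choose_source(package: str, availability: dict) -> str:
    """Pick the first source, in the recommended order, that offers the package."""
    for source in PREFERRED_ORDER:
        if package in availability.get(source, set()):
            return source
    raise LookupError(f"{package} not found in any known source")


for name in ["scipy", "kaggle", "some-pip-only-package"]:
    print(name, "->", choose_source(name, AVAILABILITY))
```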
For example, Kaggle publishes a Python 3 API that can be used to interact with Kaggle datasets, kernels and competition submissions. You can search for the package on the
defaults channels but you will not find it!
$ conda search kaggle
Loading channels: done
No match found for: kaggle. Search: *kaggle*

PackagesNotFoundError: The following packages are not available from current channels:

  - kaggle

Current channels:

  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/osx-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.
The official installation instructions suggest downloading the kaggle package using pip. But since we are using
conda we should check whether the package exists on at least the conda-forge channel before proceeding to use pip.
$ conda search conda-forge::kaggle
Loading channels: done
# Name                  Version           Build  Channel
kaggle                    1.5.3          py27_1  conda-forge
kaggle                    1.5.3          py36_1  conda-forge
kaggle                    1.5.3          py37_1  conda-forge
kaggle                    1.5.4          py27_0  conda-forge
kaggle                    1.5.4          py36_0  conda-forge
kaggle                    1.5.4          py37_0  conda-forge
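If you want to work with search results programmatically, the table can be split into rows. The following Python sketch parses the text of the output shown above, pasted in as a string; `parse_search_output` is a hypothetical helper and assumes the four-column layout of this particular listing.

```python
SEARCH_OUTPUT = """\
Loading channels: done
# Name                  Version           Build  Channel
kaggle                    1.5.3          py27_1  conda-forge
kaggle                    1.5.3          py36_1  conda-forge
kaggle                    1.5.3          py37_1  conda-forge
kaggle                    1.5.4          py27_0  conda-forge
kaggle                    1.5.4          py36_0  conda-forge
kaggle                    1.5.4          py37_0  conda-forge
"""


def parse_search_output(text: str) -> list:
    """Turn the table rows into dicts, skipping the status and header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if line.startswith("#") or len(fields) != 4:
            continue
        name, version, build, channel = fields
        rows.append(
            {"name": name, "version": version, "build": build, "channel": channel}
        )
    return rows


rows = parse_search_output(SEARCH_OUTPUT)
# Plain string sorting is enough for this listing; it is not a general version sort.
print(sorted({row["version"] for row in rows}))
print([row["build"] for row in rows if row["build"].startswith("py36")])
```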
Once we know that the
kaggle package is available via
conda-forge we can go ahead and install
it! Note that we are explicitly providing both the channel to use when installing the
package as well as a specific version number.
$ conda install conda-forge::kaggle=1.5.4 --prefix ./env
For the moment let us suppose that the kaggle package was not available on conda-forge. Here is
how we would install the package into our environment using pip.
- Install pip into our environment (if necessary).
- Activate the environment (if necessary).
- Use pip to install the package.
$ conda install pip --prefix ./env
$ source activate ./env
$ pip install $SOME_PACKAGE
Since you write environment.yml files for all of your projects, you might be wondering how to specify that packages should be installed using pip in the environment.yml file. Here is an example environment.yml file that uses pip to install the kaggle and yellowbrick packages.
name: null

dependencies:
  - jupyterlab=1.0
  - matplotlib=3.1
  - pandas=0.24
  - scikit-learn=0.21
  - pip=19.1
  - pip:
    - kaggle==1.5
    - yellowbrick==0.9
Note that you should include pip itself as a dependency and then a sub-section denoting those packages to be installed via pip. Also, in case you are wondering, the Yellowbrick package is a suite of visual diagnostic tools called “Visualizers” that extend the Scikit-Learn API to allow human steering of the model selection process. Yellowbrick can also be installed using conda from the districtdatalabs channel.
$ conda install --channel districtdatalabs yellowbrick=0.9 --prefix ./env
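As a sketch of how the pip sub-section nests inside the dependency list, the following Python function renders an environment.yml-style document from two lists. The function `render_environment` is hypothetical and uses only string formatting rather than a YAML library. It also enforces the rule that pip itself must be listed as a conda dependency whenever pip packages are requested. Note that entries in the pip sub-section follow pip's own requirement syntax, which pins versions with `==`.

```python
def render_environment(conda_deps: list, pip_deps: list, name: str = "null") -> str:
    """Render a minimal environment.yml-style document as text."""
    has_pip = any(dep.split("=")[0] == "pip" for dep in conda_deps)
    if pip_deps and not has_pip:
        raise ValueError("list pip itself as a conda dependency before using pip:")
    lines = [f"name: {name}", "", "dependencies:"]
    lines += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        lines.append("  - pip:")
        lines += [f"    - {dep}" for dep in pip_deps]
    return "\n".join(lines) + "\n"


text = render_environment(
    conda_deps=["jupyterlab=1.0", "matplotlib=3.1", "pip=19.1"],
    pip_deps=["kaggle==1.5", "yellowbrick==0.9"],
)
print(text)
```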
What actually happens when I install packages?
During the installation process, files are extracted into the specified environment (defaulting to the current environment if none is specified). Installing the files of a conda package into an environment can be thought of as changing the directory to an environment, and then downloading and extracting the package and its dependencies.
For example, when you
conda install a package that exists in a channel and has no dependencies,
conda does the following.
- looks at your configured channels (in priority order)
- reaches out to the repodata associated with your channels/platform
- parses repodata to search for the package
- once the package is found, conda pulls it down and installs
The [conda documentation][conda-install-docs] has a nice decision tree that describes the package installation process.
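The four steps listed above can be sketched in Python with mocked repodata. The dictionaries below imitate the general shape of a channel's repodata.json (a "packages" mapping from file name to a record with a "name" and "version"), but the file names and contents are invented for the example. A real install also compares versions and builds and resolves dependencies, which this sketch deliberately skips.

```python
# Mocked repodata keyed by channel name; file names and records are made up.
MOCK_REPODATA = {
    "defaults": {
        "packages": {
            "scipy-1.3.0-py36_0.tar.bz2": {"name": "scipy", "version": "1.3.0"},
        }
    },
    "conda-forge": {
        "packages": {
            "scipy-1.3.1-py36_0.tar.bz2": {"name": "scipy", "version": "1.3.1"},
            "kaggle-1.5.4-py36_0.tar.bz2": {"name": "kaggle", "version": "1.5.4"},
        }
    },
}


def find_package(name: str, channels: list, repodata_by_channel: dict):
    """Return (channel, filename) for the first match in priority order, else None."""
    for channel in channels:  # step 1: walk channels in priority order
        repodata = repodata_by_channel.get(channel, {})  # step 2: "fetch" repodata
        for filename, record in repodata.get("packages", {}).items():  # step 3: parse
            if record["name"] == name:
                return channel, filename  # step 4: this file would be downloaded
    return None


print(find_package("scipy", ["defaults", "conda-forge"], MOCK_REPODATA))
print(find_package("kaggle", ["defaults", "conda-forge"], MOCK_REPODATA))
```

In this toy model scipy is taken from the higher-priority channel even though a newer build exists in the lower-priority one, which is why the order of channels matters.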
Specifying channels when installing packages
Like many projects, PyTorch has its own channel on Anaconda Cloud. This channel has several interesting packages, in particular pytorch (PyTorch core) and torchvision (datasets, transforms, and models specific to computer vision).
Create a new directory called my-computer-vision-project and then create a Python 3.6 environment in a sub-directory called env/ with the two packages listed above. Also include the most recent version of jupyterlab in your environment (so you have a nice UI) and matplotlib (so you can make plots).
In order to create a new environment you use the conda create command as follows.
$ mkdir my-computer-vision-project
$ cd my-computer-vision-project/
$ conda create --prefix ./env --channel pytorch \
> python=3.6 \
> jupyterlab=1.0 \
> pytorch=1.1 \
> torchvision=0.3 \
> matplotlib=3.1
Alternative syntax for installing packages from specific channels
There exists an alternative syntax for installing conda packages from specific channels that more explicitly links the channel being used to install a particular package.
$ conda install conda-forge::tensorflow --prefix ./env
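To see how the parts of a channel::package=version argument relate to each other, here is a simplified Python sketch that splits such a string into its channel, name, and version. `parse_spec` is a hypothetical helper that handles only the forms used in this lesson; conda's own spec parsing supports considerably more syntax.

```python
def parse_spec(spec: str) -> dict:
    """Split 'channel::name=version' into its parts; missing parts become None."""
    channel, separator, remainder = spec.partition("::")
    if not separator:
        # No '::' present, so the whole string is the package part.
        channel, remainder = None, spec
    name, _, version = remainder.partition("=")
    return {"channel": channel, "name": name, "version": version or None}


print(parse_spec("conda-forge::kaggle=1.5.4"))
print(parse_spec("conda-forge::tensorflow"))
print(parse_spec("scipy=1.3"))
```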
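Repeat the previous exercise using this alternative syntax to install python, jupyterlab, and matplotlib from the conda-forge channel and pytorch and torchvision from the pytorch channel.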
One possibility would be to use the conda create command as follows.
$ mkdir my-computer-vision-project
$ cd my-computer-vision-project/
$ conda create --prefix ./env \
> conda-forge::python=3.6 \
> conda-forge::jupyterlab=1.0 \
> conda-forge::matplotlib=3.1 \
> pytorch::pytorch=1.1 \
> pytorch::torchvision=0.3
A package is a tarball containing system-level libraries, Python or other modules, executable programs and other components, and associated metadata.
A Conda channel is a URL to a directory containing one or more Conda packages.
Explicitly including the channels (and their priority!) in a project’s environment file is necessary for another researcher to completely re-create that project’s software environment.