This lesson is still being designed and assembled (Pre-Alpha version)

Packaging Python Projects

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • How do I use my own functions?

  • How can I make my functions most usable for my collaborators?

Objectives
  • Identify the components of a Python package

  • Apply a template for packaging existing code

  • Update the packaged project after modifying the code

  • Install and update a local or GitHub-hosted package

Recall: Functions

When we develop code for research, we often start by writing unorganized code in notebook cells or a script. Eventually, we might want to re-use the code we wrote in other contexts. In order to re-use code, it is helpful to organize it into functions and classes in separate .py files. We call these files modules, and will soon go into more detail about them. Whenever we refer to a module in Python, we can think of it as a .py file that has other code, typically functions or other objects, in it.

For example, say we are making a program that deals with temperature data. We have a function to convert from degrees Fahrenheit to Celsius:

def fahr_to_celsius(temperature):
    """
    Function to convert temperature from fahrenheit to Celsius

    Parameters
    -------------
    temperature : float
         temperature in Fahrenheit
         
    Returns
    --------
    temperature_c : float
          temperature in Celsius
    """
    return (temperature - 32) * (5 / 9)

We use this function a lot, so we don’t want to have to copy and paste it every time. Instead, we can store it in a module and import it from there. You have probably imported modules or functions before. This time we will do that for our own code!
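Before moving the function into a module, it can help to confirm that it behaves as expected. The following is a minimal sketch of such a check; the two test temperatures are chosen here for illustration, and the second comparison uses a tolerance because floating-point arithmetic may not give exactly 100.

```python
import math


def fahr_to_celsius(temperature):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temperature - 32) * (5 / 9)


# Water freezes at 32 °F, which is 0 °C.
freezing = fahr_to_celsius(32)
print(freezing)  # 0.0

# Water boils at 212 °F; compare with a tolerance because of floating-point rounding.
boiling = fahr_to_celsius(212)
print(math.isclose(boiling, 100.0))  # True
```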

Pip

Pip is the most common package manager for Python. Pip allows you to easily install Python packages locally from your computer or from an online repository like the Python Package Index (PyPI). Once a package is installed with pip, you can import that package and use it in your own code.

Pip is a command line tool. We’ll start by exploring its help manual:

pip

The output will look like this:

Usage:   
  pip <command> [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  config                      Manage local and global configuration.
  search                      Search PyPI for packages.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  help                        Show help for commands.

General Options:
  -h, --help                  Show help.
  --isolated                  Run pip in an isolated mode, ignoring
                              environment variables and user configuration.
  -v, --verbose               Give more output. Option is additive, and can be
                              used up to 3 times.
  -V, --version               Show version and exit.
  -q, --quiet                 Give less output. Option is additive, and can be
                              used up to 3 times (corresponding to WARNING,
                              ERROR, and CRITICAL logging levels).
  --log <path>                Path to a verbose appending log.
  --proxy <proxy>             Specify a proxy in the form
                              [user:passwd@]proxy.server:port.
  --retries <retries>         Maximum number of retries each connection should
                              attempt (default 5 times).
  --timeout <sec>             Set the socket timeout (default 15 seconds).
  --exists-action <action>    Default action when a path already exists:
                              (s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort).
  --trusted-host <hostname>   Mark this host as trusted, even though it does
                              not have valid or any HTTPS.
  --cert <path>               Path to alternate CA bundle.
  --client-cert <path>        Path to SSL client certificate, a single file
                              containing the private key and the certificate
                              in PEM format.
  --cache-dir <dir>           Store the cache data in <dir>.
  --no-cache-dir              Disable the cache.
  --disable-pip-version-check
                              Don't periodically check PyPI to determine
                              whether a new version of pip is available for
                              download. Implied with --no-index.
  --no-color                  Suppress colored output

This shows the basic commands available with pip and the general options.

Exercise

  1. Use pip to install the sphinx package. We will need it later.
  2. Choose a pip command and look up its options. Discuss the command with your neighbour.

Solution

pip install sphinx
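One way to confirm from inside Python that an installation worked is to ask for the installed version. This is a sketch only, assuming Python 3.8 or newer (where the standard library module importlib.metadata is available); the helper name installed_version and the made-up package name are hypothetical.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# "sphinx" is the package from the exercise; the result depends on your environment.
print(installed_version("sphinx"))

# A made-up name (hypothetical) that should not be installed anywhere.
print(installed_version("no-such-package-for-this-lesson"))
```

If the first call prints None, the package is not installed in the environment that this Python interpreter uses.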

Python Modules

A module is a piece of code that serves a specific purpose. In Python, a module is written in a .py file. The name of the file is the name of the module. A module can contain classes, functions, or a combination of both. Modules can also define variables for use. For example, numpy defines the value of pi as numpy.pi.

If a .py file is on the path, we can import functions from it to our current file. Open up Python, import sys and print the path.

import sys
sys.path
['',
'/home/vlad/anaconda3/lib/python37.zip',
'/home/vlad/anaconda3/lib/python3.7',
'/home/vlad/anaconda3/lib/python3.7/lib-dynload',
'/home/vlad/anaconda3/lib/python3.7/site-packages'
]

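Here we see the directories that Python searches for modules. In an interactive session, the empty string stands for the current working directory. It is followed by the standard library locations and by site-packages, where installed packages live.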

sys.path is a list of strings, each describing the absolute path to a directory. Python will look in these directories for modules. If we have a directory containing modules we want Python to be aware of, we append that directory to the path. If I have a package in /home/vlad/Documents/science/cool-package, I add it with sys.path.append:

sys.path.append('/home/vlad/Documents/science/cool-package')
sys.path
['',
'/home/vlad/anaconda3/lib/python37.zip',
'/home/vlad/anaconda3/lib/python3.7',
'/home/vlad/anaconda3/lib/python3.7/lib-dynload',
'/home/vlad/anaconda3/lib/python3.7/site-packages',
'/home/vlad/Documents/science/cool-package'
]

We can see that the directory has been added to sys.path. Once the directory containing the module you want is in sys.path, the module can be imported just like any other module.
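The effect of sys.path.append can be tried out safely in a temporary directory. The sketch below is an illustration only: the module name lesson_demo_module and its ANSWER variable are hypothetical, and the directory is created with tempfile so that nothing in your own files is touched.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory holding one module (both names are hypothetical).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "lesson_demo_module.py").write_text("ANSWER = 42\n")

# Make the directory visible to Python's import system.
sys.path.append(str(module_dir))
importlib.invalidate_caches()

import lesson_demo_module

print(lesson_demo_module.ANSWER)  # 42
```

Note that the change to sys.path only lasts for the current Python session, which is one reason to prefer an installable package.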

Python Packages

To save adding modules to the path every time we want to use them, we can package our modules to be installable. This method of importing standardises how we import modules across different user systems. This is why when we import packages like pandas and matplotlib we don’t have to write out their path, or add it to the path before importing. When we install a package, its location gets added to the path, or it’s saved to a location already on the path.

Many packages contain multiple modules. When we write import matplotlib.pyplot as plt, the name plt refers only to the pyplot module, not to the whole matplotlib package. This use of package.module is a practice referred to as a namespace. Python namespaces help to keep modules and functions with the same name separate. For instance, both scipy and numpy have a rand function to create arrays of random numbers. We can differentiate them in our code by using scipy.sparse.rand and numpy.random.rand, respectively.

In this way, namespaces allow multiple packages to have functions of the same name without creating conflicts. Packages are namespaces or containers which can contain multiple modules.
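The same idea can be seen without installing anything. As a stand-in for scipy and numpy, this sketch uses the standard library modules math and cmath, which both define a function named sqrt.

```python
import cmath
import math

# Both modules define a function called sqrt, but the module name keeps them apart.
real_root = math.sqrt(4)
complex_root = cmath.sqrt(-4)

print(real_root)     # 2.0
print(complex_root)  # 2j
```

Because each call is prefixed with its module name, there is no ambiguity about which sqrt is meant.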

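Making Python code into a package requires no extra tools. We need to create a directory named after the package, put our modules inside it, add an __init__.py file, and write a setup.py file next to the directory.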

Our final package will look like this:

├── package_name
│ ├── __init__.py
│ ├── module_a.py
│ └── module_b.py
└── setup.py

The __init__.py file tells Python that the directory is supposed to be treated as a package.

Let’s create a package called conversions with two modules temperature and speed.

Step 1: Creating a directory

Create a directory called conversions

mkdir conversions

Step 2: Adding Modules

conversions/temperature.py

def fahr_to_celsius(temperature):
    """
    Function to convert temperature from fahrenheit to Celsius

    Parameters
    -------------
    temperature : float
         temperature in Fahrenheit
         
    Returns
    --------
    temperature_c : float
          temperature in Celsius
    """
    return (temperature - 32) * (5 / 9)

The file temperature.py will be treated as a module called temperature. This module contains the function fahr_to_celsius. The top level container is the package conversions. The end user will import this as: from conversions.temperature import fahr_to_celsius

Exercise

  1. Create a file named speed.py inside the conversions directory and add a function named kph_to_ms that will convert kilometres per hour to meters per second. Here’s the docstring describing the function:
     """
     Function to convert speed from kilometres per hour to meters per second
    
     Parameters
     -------------
     speed : float
          speed in kilometres per hour
    
     Returns
     --------
     speed_ms : float
           speed in meters per second
     """
    

Solution

conversions/speed.py

def kph_to_ms(speed):
    """
    Function to convert speed from kilometres per hour to meters per second

    Parameters
    -------------
    speed : float
         speed in kilometres per hour

    Returns
    --------
    speed_ms : float
          speed in meters per second
    """
    return speed / 3.6
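A quick way to check a solution is to try a value whose answer is known in advance. This is a minimal sketch with a shortened docstring; the test value is chosen here because 3.6 km/h corresponds to exactly 1 m/s.

```python
def kph_to_ms(speed):
    """Convert a speed from kilometres per hour to metres per second."""
    return speed / 3.6


# 3.6 km/h is exactly 1 m/s, which makes a convenient check.
walking = kph_to_ms(3.6)
print(walking)  # 1.0
```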

Step 3: Adding the init file

Finally, we create a file named __init__.py inside the conversions directory:

touch conversions/__init__.py

The init file is the map that tells Python what our package looks like. It is also what tells Python that a directory is a package. An empty init file is enough to mark a directory as a package.

Now, if we launch a new Python terminal from the directory that contains the conversions directory, we can import the package conversions:

from conversions import temperature, speed

print(temperature.fahr_to_celsius(100))

Even if the __init__.py file is empty, its existence indicates to Python that we can import names from that package. However, by adding import code to it, we can make our package easier to use. Add the following code to the init file:

from .temperature import fahr_to_celsius
from .speed import kph_to_ms

The . before temperature and speed means that they refer to local modules, that is, files in the same directory as the __init__.py file. If we start a new Python interpreter, we can now import fahr_to_celsius and kph_to_ms directly from the conversions package:

from conversions import fahr_to_celsius, kph_to_ms
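The whole layout built in Steps 1 to 3 can also be reproduced in a throwaway directory, which is handy for experimenting. The sketch below is an illustration under a few assumptions: it writes shortened versions of the two modules (without docstrings) into a temporary directory created with tempfile, and it puts that directory on sys.path by hand instead of relying on the working directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the package inside a throwaway directory so nothing on disk is overwritten.
project_dir = Path(tempfile.mkdtemp())
package_dir = project_dir / "conversions"
package_dir.mkdir()

(package_dir / "temperature.py").write_text(
    "def fahr_to_celsius(temperature):\n"
    "    return (temperature - 32) * (5 / 9)\n"
)
(package_dir / "speed.py").write_text(
    "def kph_to_ms(speed):\n"
    "    return speed / 3.6\n"
)
(package_dir / "__init__.py").write_text(
    "from .temperature import fahr_to_celsius\n"
    "from .speed import kph_to_ms\n"
)

# Put the directory that contains the package (not the package itself) on the path.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()

from conversions import fahr_to_celsius, kph_to_ms

print(fahr_to_celsius(32))  # 0.0
print(kph_to_ms(3.6))       # 1.0
```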

Now, we can import from conversions, but only if our working directory is one level above the conversions directory. What if we want to use the conversions package from another project or directory?

Setuptools and Installing Locally

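The file setup.py contains the essential information about our package, both for pip and for PyPI. It needs to be machine readable, so be sure to format it correctly. The example below reads the long description from a README.md file, so that file must exist next to setup.py: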

import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="conversions",
    version="0.0.1",
    author="Example Author",
    author_email="author@example.com",
    description="An example  package to perform unit conversions",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/pypa/sampleproject",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
)
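The packages argument above is filled in by setuptools.find_packages(), which looks for directories that contain an __init__.py file. The sketch below is a simplified standard-library stand-in for that idea, not the setuptools implementation; the function name find_package_dirs and the temporary project layout are hypothetical.

```python
import tempfile
from pathlib import Path


def find_package_dirs(root):
    """Return dotted names of directories under root that contain __init__.py.

    A simplified illustration only, not the setuptools implementation.
    """
    root = Path(root)
    names = []
    for init_file in sorted(root.rglob("__init__.py")):
        relative = init_file.parent.relative_to(root)
        if relative.parts:  # skip an __init__.py directly in root
            names.append(".".join(relative.parts))
    return names


# Hypothetical project layout created in a temporary directory.
project_dir = Path(tempfile.mkdtemp())
(project_dir / "conversions").mkdir()
(project_dir / "conversions" / "__init__.py").touch()
(project_dir / "notes").mkdir()  # no __init__.py, so not a package
(project_dir / "setup.py").touch()

print(find_package_dirs(project_dir))  # ['conversions']
```

This is why the __init__.py file matters for packaging as well as for importing: a directory without it is not picked up in this sketch.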

Now that our code is organized into a package and has setup instructions, how can we use it? If we try importing it now, what happens?

We need to install it first. Earlier, we saw that pip can install packages remotely from PyPI. pip can also install from a local directory.

Relative file paths

We want to install the package located in the conversions/ directory. If we move inside that directory, we can refer to it as . (a single dot). This is a special file path that means the current directory. We can see what directory we are in with the pwd command, which stands for “print working directory”. Other special file paths are .. (two dots), meaning “the directory containing this one”, and ~, which refers to the current user’s home directory (usually /home/<user-name> for UNIX systems).

Usually the . and .. file paths are hidden if we run ls (and the same happens for all file names that start with the . character), but if we run ls -a, we can list them:

ls -a
. .. conversions setup.py

So, to install our package, we move into the project directory (the one that contains setup.py, as listed above) and run:

cd conversions
pip install -e .

The -e flag (aka --editable) tells pip to install this package in editable mode. This allows us to make changes to the package without re-installing it. Analysis code can change dramatically over time, so this is a useful option!
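To see which files Python will actually load for a package, we can ask the import system where it finds it. This is a sketch only: the helper name module_location is hypothetical, and the standard library module json is used as a stand-in so the example runs even if conversions has not been installed yet. After an editable install, the location reported for conversions would be expected to point into your own source directory.

```python
import importlib.util


def module_location(name):
    """Return the file that would be loaded for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# "json" is a standard-library stand-in; after installing, try "conversions" instead.
print(module_location("json"))
print(module_location("conversions"))
```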

Now we can try importing and using our package.

Command Line Tools

FIXME: how to make a tool command line installable

More details on this may be found on the Python packaging documentation site.

Getting a Package from a Colleague

Many projects are distributed via GitHub as open source projects. We can use pip to install those as well.

Using git clone

Download and unzip their folder

Direct download via pip

cd project_dir
pip install .

PyPI Submission

To make pip install packagename work, you have to submit your package to the repository. We won’t do that today, but if you might want to go in this direction, keep in mind that the name must be unique. This means that it’s helpful to check PyPI before creating your package so that you choose a name that is available.

To do this, you also need to package it up somewhat more. There are two types of archives that pip looks for, as ‘compiled’ versions of your code. One is a source archive (.tar.gz) and the other is a built distribution (.whl). The built version will be used most often, but the source archive is a backup and makes your package more broadly compatible.

The next step is to generate distribution packages for the package. These are archives that are uploaded to the Package Index and can be installed by pip.

Make sure you have the latest versions of setuptools and wheel installed, then build the distributions from the directory that contains setup.py:

python3 -m pip install --user --upgrade setuptools wheel
python3 setup.py sdist bdist_wheel

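The second command should output a lot of text and, once completed, should generate two files in the dist directory. The file names start with your package’s name; the listing here uses the placeholder example_pkg_your_username: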

dist/
  example_pkg_your_username-0.0.1-py3-none-any.whl
  example_pkg_your_username-0.0.1.tar.gz
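Both kinds of archive can be opened with the standard library, which is a simple way to check what would be uploaded. The sketch below is an illustration under stated assumptions: the helper name list_archive is hypothetical, and instead of real build output it creates two tiny stand-in archives in a temporary directory so that it runs anywhere. On a real project you would pass the paths of the files in dist/ instead.

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path


def list_archive(path):
    """Return the sorted file names stored in a wheel (.whl) or source archive (.tar.gz)."""
    path = Path(path)
    if path.suffix == ".whl":
        # A wheel is a zip archive with a different file extension.
        with zipfile.ZipFile(path) as archive:
            return sorted(archive.namelist())
    with tarfile.open(path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


# Build two tiny stand-in archives (hypothetical contents) so the sketch runs anywhere.
work_dir = Path(tempfile.mkdtemp())
source_file = work_dir / "temperature.py"
source_file.write_text("# placeholder module\n")

wheel_path = work_dir / "demo-0.0.1-py3-none-any.whl"
with zipfile.ZipFile(wheel_path, "w") as archive:
    archive.write(source_file, "conversions/temperature.py")

sdist_path = work_dir / "demo-0.0.1.tar.gz"
with tarfile.open(sdist_path, "w:gz") as archive:
    archive.add(source_file, "demo-0.0.1/conversions/temperature.py")

print(list_archive(wheel_path))  # ['conversions/temperature.py']
print(list_archive(sdist_path))  # ['demo-0.0.1/conversions/temperature.py']
```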

Finally, it’s time to upload your package to the Python Package Index!

First, we’ll register for accounts on Test PyPI, intended for testing and experimentation. This way, we can practice all of the steps, without publishing our sample code that we’ve been working with.

Go to test.pypi.org/account/register/ and complete the steps on that page, then verify your account.

Now that you are registered, you can use twine to upload the distribution packages. You’ll need to install Twine:

python3 -m pip install --user --upgrade twine

Once installed, run Twine to upload all of the archives under dist:

python3 -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*

You will be prompted for the username and password you registered with Test PyPI. After the command completes, you should see output similar to this:

Uploading distributions to https://test.pypi.org/legacy/
Enter your username: [your username]
Enter your password:
Uploading example_pkg_your_username-0.0.1-py3-none-any.whl
100%|█████████████████████| 4.65k/4.65k [00:01<00:00, 2.88kB/s]
Uploading example_pkg_your_username-0.0.1.tar.gz
100%|█████████████████████| 4.25k/4.25k [00:01<00:00, 3.05kB/s]

Once uploaded, your package should be viewable on TestPyPI, for example at https://test.pypi.org/project/example-pkg-your-username
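Notice that the project URL uses hyphens while the archive file names use underscores. Package indexes compare names in a normalized form, described in PEP 503, so these count as the same name. The sketch below applies that normalization rule; the function name normalize_name is chosen here for illustration.

```python
import re


def normalize_name(name):
    """Normalize a project name the way PEP 503 describes."""
    # Runs of "-", "_" and "." become a single "-", and the result is lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_name("example_pkg_your_username"))  # example-pkg-your-username
print(normalize_name("Example.Pkg"))                # example-pkg
```

This also matters when choosing a unique name: two names that differ only in case or in these separator characters are treated as the same project.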

Test by having your neighbor install your package.

Since these are not actually packages with real functionality, we should uninstall them once we’re done, using pip uninstall.

Key Points

  • Packaged code is reusable within and across systems

  • A Python package consists of modules

  • Projects can be distributed in many ways and installed with a package manager