Introduction to Python Data Analysis Projects


  • Projects have common structures
  • Packaging enables a project to be installed
  • An environment allows different people to all have the same versions and run software more reliably
  • Documentation is an essential component of any complete project and should exist with the code

Setting up a Project


  • Data and code should be governed by different principles
  • A package enables a project to be installed
  • An environment allows different people to all have the same versions and run software more reliably
  • Documentation is an essential component of any complete project and should exist with the code

Packaging Python Projects


  • Packaged code is reusable within and across systems
  • A Python package consists of modules
  • Projects can be distributed in many ways and installed with a package manager
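
The relationship between a package and its modules can be sketched in a few lines. This is a minimal illustration, not the lesson's own project: the package name demo_project, the module cleaning, and the function drop_missing are hypothetical, and the package is written to a temporary directory and put on the import path by hand rather than installed with a package manager.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical names: "demo_project" (package) and "cleaning" (module).
project_root = Path(tempfile.mkdtemp())
package_dir = project_root / "demo_project"
package_dir.mkdir()

# An __init__.py file marks the directory as a regular package.
(package_dir / "__init__.py").write_text('"""Demo analysis package."""\n')

# Each .py file inside the package directory is a module.
(package_dir / "cleaning.py").write_text(
    "def drop_missing(values):\n"
    '    """Return values with None entries removed."""\n'
    "    return [v for v in values if v is not None]\n"
)

# Installing a package normally makes it importable. Here we imitate that
# step by adding the parent directory to the import path.
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

from demo_project.cleaning import drop_missing

print(drop_missing([1, None, 3]))  # prints [1, 3]
```

In a real project the same layout would sit in a repository alongside packaging metadata, so that a package manager such as pip can install it instead of the manual sys.path step used here.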

Managing Virtual Environments


  • A Python dependency is an independent package that a given project requires to be able to run.
  • An environment is a directory that contains a Python installation, plus a number of additional packages.
  • An environment manager enables one-step installation and recording of dependencies, including their versions.
  • virtualenv is a tool to create lightweight Python virtual environments.
  • conda is a more advanced environment and package manager that is included with Anaconda.
  • Isolating our environment can be helpful to keep our system organized.
  • Dependencies can be ‘pinned’ to exact versions in files such as requirements.txt or environment.yml.
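
To make the idea of pinning concrete, the sketch below lists the packages installed in the current environment as name==version lines, the format used in requirements.txt. It relies only on the standard library's importlib.metadata; the function name pinned_requirements is hypothetical, and the output is similar in spirit to pip freeze rather than a replacement for it.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata lacks a name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


# Show the first few pinned dependencies of the active environment.
for line in pinned_requirements()[:5]:
    print(line)
```

Run inside an isolated environment, the list contains only what that project needs, which is one reason isolation keeps pinned files short and reproducible.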

Getting started with Documentation


  • Documentation tells people how to use code and provides examples
  • Types of documentation include: literal, API, and tutorial/example
  • Literal documentation lives outside the code and explains the big-picture ideas of the project and how to get it set up
  • API documentation lives in docstrings within the code and explains how to use functions in detail
  • Examples are scripts (or notebooks, or code excerpts) that live alongside the project and connect the details to the common tasks.

Documentation in Code


  • Docstrings describe functions
  • Comments throughout the code help with onboarding and debugging
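
The difference between a docstring and a comment can be seen in a short function. This is an illustrative sketch: the function mean_temperature and its data are hypothetical, and the NumPy-style section headings in the docstring are one common convention, assumed here rather than prescribed by the lesson.

```python
def mean_temperature(readings):
    """Return the arithmetic mean of a sequence of temperature readings.

    Parameters
    ----------
    readings : list of float
        Temperatures in degrees Celsius; None marks a missing reading.

    Returns
    -------
    float
        Mean of the non-missing readings.

    Raises
    ------
    ValueError
        If there are no non-missing readings.
    """
    # Drop missing readings first so they do not distort the mean.
    valid = [r for r in readings if r is not None]
    if not valid:
        raise ValueError("no valid readings")
    return sum(valid) / len(valid)


print(mean_temperature([20.0, None, 22.0]))  # prints 21.0

# The docstring is available at runtime, which is how help() and
# documentation tools find it.
print(mean_temperature.__doc__.splitlines()[0])
```

The docstring tells a user what the function does and how to call it, while the inline comment explains a choice to someone reading or debugging the implementation.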

Building Documentation with Sphinx


  • Building documentation into a website is a common way of distributing it
  • Sphinx will automatically build a website from plain text files and your docstrings
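
Sphinx is configured through a Python file named conf.py. The sketch below shows what a minimal configuration might look like; the project name and author are placeholders, and the choice of extensions assumes the docstrings follow NumPy or Google style.

```python
# conf.py: a minimal Sphinx configuration sketch.
# The project name and author are placeholders, not taken from the lesson.
project = "demo_project"
author = "Project Contributors"

extensions = [
    "sphinx.ext.autodoc",   # pull documentation from docstrings
    "sphinx.ext.napoleon",  # parse NumPy- and Google-style docstrings
]

# Theme for the HTML output; "alabaster" is Sphinx's default theme.
html_theme = "alabaster"
```

Assuming this file and the plain text sources live in a docs directory, a command such as sphinx-build -b html docs docs/_build/html would build the website; the directory names are an assumption and can be changed.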

Publishing code and data


Testing and Continuous Integration