Summary and Schedule
Python is a general-purpose programming language that is useful for writing scripts to work effectively and reproducibly with data.
This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with basic information about Python syntax and the Jupyter notebook interface, then move through how to import CSV files, how to use the pandas package to work with data frames, how to calculate summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.
Getting Started
Data Carpentry’s teaching is hands-on, so participants are encouraged
to use their own computers to ensure the proper setup of tools for an
efficient workflow.
These lessons assume no prior knowledge of the skills or tools.
To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
For Instructors
If you are teaching this lesson in a workshop, please see the Instructor notes.
Setup Instructions | Download files required for the lesson
Duration: 00h 00m | 1. Short Introduction to Programming in Python | What is Python? Why should I learn Python?
Duration: 00h 30m | 2. Starting With Data | How can I import data in Python? What is Pandas? Why should I use Pandas to work with data?
Duration: 01h 30m | 3. Indexing, Slicing and Subsetting DataFrames in Python | How can I access specific data within my data set? How can Python and Pandas help me to analyse my data?
Duration: 02h 30m | 4. Data Types and Formats | What types of data can be contained in a DataFrame? Why is the data type important?
Duration: 03h 15m | 5. Combining DataFrames with pandas | Can I work with data from multiple sources? How can I combine data from different data sets?
Duration: 04h 00m | 6. Data workflows and automation | Can I automate operations in Python? What are functions and why should I use them?
Duration: 05h 30m | 7. Plotting with bokeh | Can I use Python to create plots? How can I customize plots generated in Python?
Duration: 06h 15m | 8. Data Ingest & Visualization - Matplotlib & Pandas | What other tools can I use to create plots apart from ggplot? Why should I use Python to create plots?
Duration: 07h 00m | 9. Accessing SQLite Databases Using Python & Pandas
Duration: 07h 45m | Finish
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Data
Data for this lesson is from the Humanities Lesson data.
In this course, several data files will be used as examples. You can download these example data files by right-clicking on the following links and selecting “Save as” (or, if you are on a Windows machine, by clicking on the link, then right-clicking on the file and selecting “Save as”). Save them in a memorable location, as you will need to tell Python where they are later.
- For lessons 2-4 and 6-8: eebo.csv
- For lesson 5:
  - authors.csv
  - places.csv
  - 1635.csv
  - 1640.csv
- For lesson 9: eebo database
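Because Python will later need to be told where these files are, it can help to confirm that they are all in one folder. The sketch below is only an illustration: the `missing_files` helper and the `Downloads` folder are hypothetical choices, not part of the lesson, and the database file for lesson 9 is left out because its file name is not given here.

```python
from pathlib import Path

# CSV file names taken from the list above.
CSV_FILES = ["eebo.csv", "authors.csv", "places.csv", "1635.csv", "1640.csv"]


def missing_files(data_dir, names=CSV_FILES):
    """Return the names from `names` that are not present in `data_dir`."""
    folder = Path(data_dir)
    return [name for name in names if not (folder / name).is_file()]


# Assumption: the files were saved to the Downloads folder in your home
# directory. Change this path to wherever you actually saved them.
data_dir = Path.home() / "Downloads"
print("Missing files:", missing_files(data_dir))
```

An empty list means every file was found in the folder you named.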
Software
Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of the scientific packages we use in this lesson individually can be a bit cumbersome, and therefore we recommend the all-in-one installer Anaconda.
Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.6 is fine).
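If you are unsure which version you have installed, a short check like the one below can confirm it once Python is running. This is a minimal sketch; the `is_python3` helper is a hypothetical name used only for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version information describes Python 3.x."""
    return version_info[0] == 3


# Show the version of the interpreter that is running this code.
print("Python version:", sys.version.split()[0])
print("Is Python 3.x:", is_python3())
```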
Install the workshop packages
To install these packages we will use Anaconda or Miniconda. Both use Conda; the main difference is that Anaconda comes with many packages pre-installed, whereas with Miniconda you will need to install the required packages yourself.
Anaconda installation
Anaconda will install the workshop packages for you. You only need to install one of the two (Anaconda or Miniconda).
Download and install Anaconda
Download and install Anaconda. Remember to download and run the installer for Python 3.x.
Miniconda installation
Miniconda is a “light” version of Anaconda. If you install and use Miniconda you will also need to install the workshop packages.
Download and install Miniconda
Download and install Miniconda following the instructions. Remember to download and run the installer for Python 3.x.
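With either installer, you may want to check that the workshop packages can be found before the lesson starts. The sketch below makes an assumption: the package names are inferred from the lesson titles in the schedule (pandas, Matplotlib, bokeh) plus the Jupyter notebook, and may not match the official list. The `missing_packages` helper is hypothetical.

```python
from importlib.util import find_spec

# Assumption: names inferred from the lesson titles, not an official list.
WORKSHOP_PACKAGES = ["pandas", "matplotlib", "bokeh", "notebook"]


def missing_packages(names=WORKSHOP_PACKAGES):
    """Return the package names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


print("Packages still to install:", missing_packages())
```

An empty list suggests the packages are available; any names printed would still need to be installed.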
Launch a Jupyter notebook
After installing either Anaconda or Miniconda and the workshop packages, launch a Jupyter notebook by typing this command from the terminal:
jupyter notebook
The notebook should open automatically in your browser. If it does not or you wish to use a different browser, open this link: http://localhost:8888.
Overview of the Jupyter notebook (Optional)
Screenshot of a Jupyter Notebook on quantum mechanics by Robert Johansson
How the Jupyter notebook works
After typing the command jupyter notebook, the following happens:
- A Jupyter Notebook server is automatically created on your local machine.
- The Jupyter Notebook server runs locally on your machine only and does not use an internet connection.
- The Jupyter Notebook server opens the Jupyter notebook client, also known as the notebook user interface, in your default web browser.
  Figure: The Jupyter notebook file browser
- To create a new Python notebook, select the “New” dropdown on the upper right of the screen.
  Figure: The Jupyter notebook file browser
- When you create a new notebook and type code into the browser, the web browser and the Jupyter notebook server communicate with each other.
  Figure: A new, blank Jupyter notebook
- Under the “help” menu, take a quick interactive tour of how to use the notebook. Help on Jupyter and key workshop packages is available here too.
  Figure: User interface tour and Help
- The Jupyter Notebook server does the work and calculations, and the web browser renders the notebook.
- The web browser then displays the updated notebook to you.
- For example, click in the first cell and type some Python code.
  Figure: A Code cell
- This is a Code cell (see the cell type dropdown with the word Code). To run the cell, type Shift-Enter.
  Figure: A Code cell and its output
- Let’s look at a Markdown cell. Markdown is a lightweight markup language that is readable as plain text yet offers additional formatting. Don’t forget to select Markdown from the cell type dropdown. Click in the cell and enter the markdown text.
  Figure: A markdown input cell
- To run the cell, type Shift-Enter.
  Figure: A rendered markdown cell
This workflow has several advantages:
- You can easily type, edit, and copy and paste blocks of code.
- Tab completion allows you to easily access the names of things you are using and learn more about them.
- It allows you to annotate your code with links, different sized text, bullets, etc. to make information more accessible to you and your collaborators.
- It allows you to display figures next to the code that produces them to tell a complete story of the analysis.
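As a concrete instance of the Code cell step described earlier, the lines below could be typed into the first cell and run with Shift-Enter. This is only an illustrative sketch; the sentence and variable names are arbitrary choices, not taken from the lesson.

```python
# Split a sentence into words, count them, and display the result.
words = "Python is a general purpose programming language".split()
word_count = len(words)
print("Number of words:", word_count)  # prints: Number of words: 7
```

In a notebook, the printed output appears directly beneath the cell.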
Changing the Title
- You can change the notebook’s title by clicking on the title cell.
- Rename the notebook in the box and click “Rename”.
How the notebook is stored
- The notebook file is stored in a format called JSON and has the suffix .ipynb.
- Just like HTML for a webpage, what’s saved in a notebook file looks different from what you see in your browser.
- But this format allows Jupyter to mix software (in several languages) with documentation and graphics, all in one file.
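To make the JSON storage concrete, the sketch below builds a very small notebook-like structure by hand and round-trips it through Python's `json` module. It is a simplified illustration that assumes the version 4 notebook layout; a real .ipynb file written by Jupyter carries more metadata than shown here.

```python
import json

# A hand-written, minimal notebook structure: one Markdown cell, one Code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# This text is what would be saved in a file with the .ipynb suffix.
text = json.dumps(notebook, indent=1)

# Reading the text back recovers the same cells, in the same order.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)  # prints: ['markdown', 'code']
```

Printing `text` shows the raw JSON, which looks quite different from the rendered notebook in the browser.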
Notebook modes: Control and Edit
The notebook has two modes of operation: Control and Edit. Control mode lets you edit notebook-level features, while Edit mode lets you change the contents of a notebook cell. Remember that a notebook is made up of a number of cells, which can contain code, markdown, HTML, visualizations, and more.