Learning objectives
- Understand concept of relational database
- Be able to perform simple queries in SQL from a single table
- Understand how to filter and sort results from a query
- Use aggregate functions to combine data
- Perform queries across tables
Data
As for all of the Data Carpentry ecology lessons, this lesson uses the Portal Project Teaching Database. The data is available at http://dx.doi.org/10.6084/m9.figshare.1314459 and the download includes a SQLite database file (portal_mammals.sqlite) as well as three .csv files (species.csv, plots.csv, surveys.csv) that can be imported into SQLite.
Note that the figshare download is an archive (.zip) file that rudely explodes all of the files into your current directory.
Instructor’s setup notes
By default SQLite Manager opens in a separate window and it is not possible to zoom in to enlarge the font so that it is more readable, especially for students in the back rows.
The way to fix this is to:
- Open the SQLite Manager
- Click on the Options button .
- Chose Start SQLite Manager: in a new tab.
You can then use Ctrl - + to zoom just like any other web page.
Motivation and Framing
See this slide deck as a sample intro for the lesson: SQL Intro Deck
Key points:
- Want to query data, instead of editing directly
- Need a solution that is scalable and reproducible
- Introduce the typical ways of dealing with rectangular data (subsetting, split-apply-combine)
If you’ve written up a diagram of the data analysis pipeline (raw data -> clean data -> import and analyze -> results -> visualization), it can be helpful to identify that you’re now somewhere between clean data and analysis.
Lesson outline
00-sql-introduction
- Introduce relational databases, database management systems, the SQLite Firefox plugin, and the Portal dataset
- Import .csv files into sqlite
- Structuring data for database import
- Discuss the different SQL data types
Tips
- Changing data type: There is a step on import where you have to select the data type for the field. Some people select the wrong type or nothing at all and wonder what they can do to change it. If you right click or Control click on the column name in the Structure tab, then there’s an Edit Column option and you can type in a new data type. When you go to save it, it does warn you that this is dangerous behavior, so people are being warned, but it might be nice to show this option and also to say why it’s not generally a good idea to change data types.
- Importing data: Note how cleanly the csv files import into SQL. If you have also taught the spreadsheet lesson, it would be a good idea to compare the format of the csv files with the messy spreadsheet and ask “Remember that messy spreadsheet? What would have happened if we tried to load that in to SQL?”
00-supplement-database-design.md
(optional) The first lesson includes a brief introduction to data design and choosing database systems. This material expands on the database design in the first section.
01-sql-basic-queries
- Write basic queries using SELECT and FROM
- Filter results using DISTINCT, WHERE
- Change how results are displayed using ORDER BY and by doing calculations on results
02-sql-aggregation
- Combine results using COUNT and GROUP BY
- Filtering based on aggregation with HAVING
- Saving queries using VIEW
03-sql-joins-aliases
- combining data from two tables using JOIN, ON, USING
- depending on time, could introduce different types of JOINs
- using aliases with AS to simplify queries
Alternative activities
Queries on the board
As you teach the lesson, it can be helpful to pause and write up the query keywords on the board. This could look like this:
- After 01-sql-basic queries ~~~ SELECT column FUNCTION(column)
FROM table
WHERE (conditional statement, applies to row values) (AND/OR)
ORDER BY column/FUNCTION(column) (ASC/DESC)
* After 02-sql-aggregation
SELECT column FUNCTION(column) AGGREGATE_FUNCTION(column) FROM table
WHERE (conditional statement, applies to row values) (AND/OR) (IS (NOT) NULL) GROUP BY column HAVING (conditional statement, applies to group) ORDER BY column/FUNCTION(column) (ASC/DESC)
* After 03-sql-joins-aliases
SELECT column FUNCTION(column) AGGREGATE_FUNCTION(column) FROM table JOIN table ON table.col = table.col WHERE (conditional statement, applies to row values) (AND/OR) (IS (NOT) NULL) GROUP BY column HAVING (conditional statement, applies to group) ORDER BY column/FUNCTION(column) (ASC/DESC)
As a bonus, if you can leave this on the board, it translates nicely into
the `dplyr` portion of the `R` lesson, i.e.:
SQL: dplyr:
SELECT column select(col) FUNCTION(column) mutate(col = fcn(col)) AGGREGATE_FUNCTION(column) summarize(col = fcn(col)) FROM table JOIN table ON table.col = table.col WHERE (conditional statement, applies to row values) filter(condition) (AND/OR) (IS (NOT) NULL) is.na() GROUP BY column group_by(col) HAVING (conditional statement, applies to group) ORDER BY column/FUNCTION(column) (ASC/DESC) arrange() ~~~
“Interactive” database
If you want to try something more active (esp. if you’re teaching SQL in the afternoon!), this is a an interactive activity to try.
- Give each student six cards, with the following labels:
- name
- name
- height
- DoC
- height*2.54
- dept
- Have students fill out their cards:
- Name: first name
- Height: height in INCHES
- Dept (department): close enough (just pick one if you don’t have a home department)
- DoC (Dog or Cat): Dog, Cat, Both, Neither, or leave blank
- Each student is now a record in an interactive “students” database, where each of the cards they hold is a field in that database.
- At various points in the lesson, stop and “query” the student database. To do this:
- On a slide (or in a text editor), show or type in a sample query. Something like:
SELECT name, name FROM students WHERE height > 66
- If the query applies to a record (student), that student should stand, and display (hold up) the appropriate field (card)
- See the following slide deck for a list of sample queries. Sample queries
- On a slide (or in a text editor), show or type in a sample query. Something like: