Overview

Teaching: 40 min
Exercises: 10 min
Questions
  • How do I bring data together from separate tables?

Objectives
  • Employ joins to combine data from two tables.

  • Apply functions to manipulate individual values.

  • Employ aliases to assign new names to tables and columns in a query.

Joins

To combine data from two tables we use the SQL JOIN command, which comes after the FROM command.

The JOIN command on its own will result in a cross product, where each row in the first table is paired with each row in the second table. Usually this is not what is desired when combining two tables with data that is related in some way.

An example of cross product:
Not very helpful right?
cross product

For that, we need to tell the computer which columns provide the link between the two tables using the word ON. What we want is to join the data with the same item id. For example, try join the invoice_info and item_info table:

SELECT *
FROM invoice_info
JOIN item_info
ON invoice_info.Item_id = item_info.Item_id;

ON is like WHERE, it filters things out according to a test condition. We use the table.colname format to tell the manager what column in which table we are referring to.

The output of the JOIN command will have columns from the first table plus the columns from the second table. For the above command, you will see two item_id columns.

Alternatively, we can use the word USING, as a short-hand. USING only works on columns which share the same name. In this case we are telling the manager that we want to combine invoice_info with item_info and that the common column is item_id.

SELECT *
FROM invoice_info
JOIN item_info
USING (item_id);

The output will only have one item_id column

We often won’t want all of the fields from both tables, we can select the columns by using table.colname.

For example, what if we wanted only the invoice_id, Item_Description, Bottle_Volume_ml, Bottle_Retail_Price? Sometimes table names are very long, it is handy to give alias to table names.

SELECT inv.Invoice_id, ite.Item_Description, ite.Bottle_Volume_ml, ite.Bottle_Retail_Price
FROM invoice_info AS inv
JOIN item_info AS ite
USING (item_id);

Challenge:

Solution

SELECT Store_id, Store_Name, County_Name, City_Name
FROM Store_info
JOIN County USING (County_id);

Different join types

There are few types of joins:
join
An INNER JOIN is the most common and default type of join. You can use INNER keyword optionally.
We can count the number of records returned by a join query with store_info and county table.

SELECT COUNT(*)
FROM store_info
INNER JOIN county
USING(County_id);

Notice that this number is larger than left join.

SELECT COUNT(*)
FROM store_info
LEFT JOIN county
USING(County_id);

Challenge:

Solution

There are 2 stores without a County_id

Remember: In SQL a NULL value in one table can never be joined to a NULL value in a second table because NULL is not equal to anything, not even itself.
CROSS JOIN are not very often used, it returns weird things…
RIGHT JOIN and FULL OUTER JOIN is not supported in sqlite3
If you need to do RIGHT JOIN, you can just swap the table names
If you need to do FULL OUTER JOIN, you need to do two queries and use UNION ALL to put them together
Suppose you have two tables, table A with attribute “ab” and “a”, table B with attribute “ab” and “b”, and the join column is “ab”,

SELECT *
FROM A
LEFT JOIN B USING(ab)
UNION ALL
SELECT *
FROM B
LEFT JOIN A USING(ab)
WHERE A.ab IS NULL; 

Challenge:

Think about this query, what does each SELECT statement do? How was the full outer join achieved?

Solution

First select statement selects “in A but not in B” and “both in A and in B”
This is different from SELECT * FROM A; because it identifies what’s in A but not in B
Second select statement selects “in B but not in A”

Combining joins with sorting and aggregation

Ok, now we mash everything together. Lets try to see which stores have the most number of purchases in 2015. Number of purchases by a store is equal to number of invoices grouped by store_id. We want the Store_Name, Address, County_Name, number of invoices for each store in 2015, and then sort the result by number of invoices at descending order. Try slowly build the query step by step.

SELECT st.Store_Name, st.Address, County_Name, count(invoice_id) AS Num_Invoice
FROM store_info as st
INNER JOIN County as ct 
ON st.County_id = ct.County_id
INNER JOIN invoice_info as inv
ON st.Store_id = inv.Store_id
WHERE strftime('%Y', Date)  == '2015'
GROUP BY st.Store_id
ORDER BY Num_Invoice DESC;

Challenge:

Solution

SELECT strftime('%Y-%m', Date) as month, sum(Bottles_sold*Bottle_Cost) AS Dollar_Sale
FROM invoice_info as inv
INNER JOIN item_info as ite ON inv.item_id = ite.item_id 
WHERE month BETWEEN '2013-01-01' AND '2016-12-31'
GROUP BY month
ORDER BY Dollar_Sale DESC;  

Functions IFNULL and NULLIF and more

SQL includes numerous functions for manipulating data. You’ve already seen some of these being used for aggregation (SUM and COUNT) but there are functions that operate on individual values as well. Probably the most important of these are IFNULL and NULLIF. IFNULL allows us to specify a value to use in place of NULL.

Remember the ones that does not have a County_id? Let’s replace the “None” with “Online”

SELECT Store_id, Store_Name, IFNULL(County_id, "Online") AS County_id
FROM store_info;

Keep in mind that this does not change the database, it is still just a query. So if you exclude the IFNULL, the query will still return None. IFNULL can be particularly useful in JOIN. Even if there is no NULL value in any tables, sometimes a LEFT JOIN could result in NULL values. Our database is very clean, so unfortunately, there are not much null values to play with…

The inverse of IFNULL is NULLIF. This returns NULL if the first argument is equal to the second argument. If the two are not equal, the first argument is returned. This is useful for “nulling out” specific values.

Some more functions which are common to SQL databases are listed in the table below:

Function Description
ABS(n) Returns the absolute (positive) value of the numeric expression n
LENGTH(s) Returns the length of the string expression s
LOWER(s) Returns the string expression s converted to lowercase
NULLIF(x, y) Returns NULL if x is equal to y, otherwise returns x
ROUND(n) or ROUND(n, x) Returns the numeric expression n rounded to x digits after the decimal point (0 by default)
TRIM(s) Returns the string expression s without leading and trailing whitespace characters
UPPER(s) Returns the string expression s converted to uppercase

Finally, some useful functions which are particular to SQLite are listed in the table below:

Function Description
IFNULL(x, y) Returns x if it is non-NULL, otherwise returns y
RANDOM() Returns a random integer between -9223372036854775808 and +9223372036854775807.
REPLACE(s, f, r) Returns the string expression s in which every occurrence of f has been replaced with r
SUBSTR(s, x, y) or SUBSTR(s, x) Returns the portion of the string expression s starting at the character position x (leftmost position is 1), y characters long (or to the end of s if y is omitted)

FINAL Challenge:

Suppose you are a retail store owner in Davenport and want to get some soda in the inventory. You want the sodas that can generate more profit for you.
Since you don’t have the actual sales data from the stores to individuals, you can only estimate by looking at how much each kind of soda did other stores buy from the vendors. Find the sodas that can potentially generated most profit after 2015 in Davenport (City_Name = “DAVENPORT”). Sort by total profit.
A few soda’s “Bottle_Retail_Price” is empty, replace those with Bottle_Cost * 1.8, which is average profit margin.

Solution

SELECT Item_id, Item_Description, SUM((IFNULL(Bottle_Retail_Price, Bottle_Cost*1.8) - Bottle_Cost) * Bottles_Sold) AS Profit
FROM invoice_info
INNER JOIN item_info USING (item_id)
INNER JOIN store_info USING (Store_id)
INNER JOIN county USING (County_id)
WHERE Date > "2015-01-01" AND City_Name = "DAVENPORT"
GROUP BY item_id
ORDER BY Profit DESC; 

Congratulations! You just completed the SQL lesson. Yes, there are a lot of stuff, it’s difficult to memorize everything. Keep practicing! Here is a cheat sheet for you.

Key Points