Multiple linear regression for public health

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This lesson was designed for researchers interested in working with public health data in R, but may be of interest to researchers in other fields as well.

This lesson introduces linear regression with more than one explanatory variable, known as multiple linear regression. The first episode covers how to fit and interpret models with one continuous and one categorical explanatory variable. The second episode extends these models by adding an interaction between the two explanatory variables. The third episode covers making predictions of the mean from a fitted model. The final episode covers assessing model fit and checking the model assumptions.
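As a rough illustration of the two kinds of model described above, the equations below show a general form. The notation is an assumption and may differ from the notation used in the episodes. Let y be the response, x_1 a continuous explanatory variable, and x_2 a two-level categorical explanatory variable coded as a 0/1 dummy variable:

```latex
\begin{aligned}
\text{Without interaction:}\quad E(y) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{With interaction:}\quad E(y) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2
\end{aligned}
```

In the model without an interaction, substituting x_2 = 0 and x_2 = 1 gives two parallel lines. Both lines have slope \beta_1, and their intercepts differ by \beta_2.

In the model with an interaction, the group with x_2 = 1 has intercept \beta_0 + \beta_2 and slope \beta_1 + \beta_3. The interaction therefore allows the slope of x_1 to differ between the two groups.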

Getting started

To get started, see the instructions on the Setup page, which explain how to obtain the data and packages used in this lesson.

Prerequisites

This lesson does not require a formal background in statistics.

This lesson requires:

Working copies of R and RStudio. See here for installation instructions.

An understanding of how to use the Tidyverse packages to summarise and manipulate data in RStudio. See these episodes on data handling and data manipulation.

An understanding of how to use the ggplot2 package to plot data in RStudio. See this episode on data visualisation.

An understanding of the concepts covered in the Statistical thinking for public health and Simple linear regression for public health lessons.

Schedule

	Setup	Download files required for the lesson
00:00	1. Linear regression with one continuous and one categorical explanatory variable	How can we visualise the relationship between three variables, two of which are continuous and one of which is categorical, in R? How can we fit a linear regression model to this type of data in R? How can we obtain and interpret the model parameters? How can we visualise the linear regression model in R?
00:35	2. Linear regression including an interaction between one continuous and one categorical explanatory variable	When is it appropriate to add an interaction to a multiple linear regression model? How do we add an interaction term in the lm() function? How do the coefficient estimates given by summ() relate to the multiple linear regression model equation? How can we visualise the final model in R?
01:25	3. Making predictions from a multiple linear regression model	How can we make predictions using the model equation of a multiple linear regression model? How can we use R to obtain predictions from a multiple linear regression model?
01:45	4. Assessing multiple linear regression model fit and assumptions	Why is the adjusted R squared used, instead of the standard R squared, when working with multiple linear regression? What are the six assumptions of multiple linear regression and how are they assessed?
02:45	Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.