This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository.

Simple linear regression for public health

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This lesson was designed for researchers interested in working with public health data in R, but may be of interest to researchers in other fields as well.

This lesson provides an introduction to simple linear regression. The episodes cover the concept of simple linear regression; its use with different types of predictor variable (a single continuous variable, a single factor variable with two groups, and a single factor variable with more than two groups); prediction of the mean; and the assessment of model fit and assumptions.

Getting started

To get started, see the instructions in the Setup page. There you will learn how to obtain the data and packages used in this lesson.

Prerequisites

This lesson does not require a formal background in statistics.

This lesson requires:

  • Working copies of R and RStudio. See here for installation instructions.
  • An understanding of how to use the Tidyverse packages to summarise and manipulate data in RStudio. See these episodes on data handling and data manipulation.
  • An understanding of how to use the ggplot2 package to plot data in RStudio. See this episode on data visualisation.
  • An understanding of the concepts covered in the Statistical thinking for public health lesson.

Schedule

Setup
  • Download files required for the lesson

00:00 1. An introduction to linear regression
  • What types of variables are required for simple linear regression?
  • What does each component in the equation of a simple linear regression model represent?

00:20 2. Linear regression with one continuous explanatory variable
  • How can we assess whether simple linear regression is a suitable way to model the relationship between two continuous variables?
  • How can we fit a simple linear regression model with one continuous explanatory variable in R?
  • How can the parameters of this model be interpreted in R?
  • How can this model be visualised in R?

01:00 3. Linear regression with a two-level factor explanatory variable
  • How can we explore the relationship between one continuous variable and one categorical variable with two groups prior to fitting a simple linear regression?
  • How can we fit a simple linear regression model with one two-level categorical explanatory variable in R?
  • How does the use of the simple linear regression equation differ between the continuous and categorical explanatory variable cases?
  • How can the parameters of this model be interpreted in R?
  • How can this model be visualised in R?

01:40 4. Making predictions from a simple linear regression model
  • How can predictions be manually obtained from a simple linear regression model?
  • How can R be used to obtain predictions from a simple linear regression model?

02:00 5. Assessing simple linear regression model fit and assumptions
  • What does it mean to assess model fit?
  • What does $R^2$ quantify and how is it interpreted?
  • What are the six assumptions of simple linear regression?
  • How do I check if any of these assumptions are violated?

04:00 6. Optional: linear regression with a multi-level factor explanatory variable
  • How can we explore the relationship between one continuous variable and one multi-level categorical variable prior to fitting a simple linear regression?
  • How can we fit a simple linear regression model with one multi-level categorical explanatory variable in R?
  • How can the parameters of this model be interpreted in R?
  • How can this model be visualised in R?

04:25 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.