This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository.

Statistical thinking for public health

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This lesson was designed for researchers interested in working with public health data in R, but may be of interest to researchers in other fields as well.

This lesson provides an introduction to statistical concepts commonly used for linear modelling. It is a prerequisite for the other lessons in the statistics for public health curriculum. The lesson covers mean estimation, linear association and mean prediction.

Getting started

To get started, see the instructions on the Setup page. There you will learn how to obtain the data and packages used in this lesson.


This lesson does not require a formal background in statistics.

This lesson requires:

  • Working copies of R and RStudio. See here for installation instructions.
  • An understanding of how to use the Tidyverse packages to summarise and manipulate data in RStudio. See these episodes on data handling and data manipulation.
  • An understanding of how to use the ggplot2 package to plot data in RStudio. See this episode on data visualisation.


Setup   Download files required for the lesson
00:00   1. Estimating the mean, variance and standard deviation
        How are the mean, variance and standard deviation calculated and interpreted?
00:35   2. Estimating the variation around the mean: standard errors and confidence intervals
        What are the definitions of the standard error and the 95% confidence interval?
        How is the 95% confidence interval interpreted in practice?
01:35   3. Visualising and quantifying linear associations
        How can we visualise the linear association between two variables?
        How can we quantify the size of a linear association?
01:55   4. Predicting means using linear associations
        How can the mean of a continuous outcome variable be predicted with a continuous explanatory variable?
02:30   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.