This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Making predictions from a multiple linear regression model

Overview

Teaching: 8 min
Exercises: 12 min
Questions
  • How can we make predictions using the model equation of a multiple linear regression model?

  • How can we use R to obtain predictions from a multiple linear regression model?

Objectives
  • Calculate a prediction from a simple linear regression model using parameter estimates given by the model output.

  • Use the predict function to generate predictions from a multiple linear regression model.

As with the simple linear regression model, the multiple linear regression model allows us to make predictions. First we will calculate predictions using the model equation. Then we will see how R can calculate predictions for us using the make_predictions() function.

Calculating predictions manually

Let us use the BPSysAve_Age_Sex model from the previous episode. The equation for this model was:

\(E(\text{BPSysAve}) = 93.39 + 0.55 \times \text{Age} + 17.06 \times x_2 - 0.28 \times x_2 \times \text{Age}\),

where $x_2 = 1$ for male participants and $0$ otherwise.

For a 30-year old female, the model predicts an average systolic blood pressure of

\(E(\text{BPSysAve}) = 93.39 + 0.55 \times 30 + 17.06 \times 0 - 0.28 \times 0 \times 30 = 93.39 + 16.5 = 109.89\).

For a 30-year old male, the model predicts an average systolic blood pressure of

\(E(\text{BPSysAve}) = 93.39 + 0.55 \times 30 + 17.06 \times 1 - 0.28 \times 1 \times 30 = 93.39 + 16.5 + 17.06 - 8.4 = 118.55\).

Exercise

Given the summ output from our Hemoglobin_Age_Sex model, the model can be described as \(E(\text{Hemoglobin}) = 13.29 + 0.00 \times \text{Age} + 2.76 \times x_2 - 0.02 \times x_2 \times \text{Age}\), where $x_2 = 1$ for participants of the male Sex and $0$ otherwise.

  1. What level of Hemoglobin does the model predict, on average, for an individual of the female Sex with an Age of 40?
  2. What Hemoglobin is predicted for a male of the same Age?

Solution

For 40-year old females: $13.29 + 0.00 \times 40 + 2.76 \times 0 - 0.02 \times 0 \times 40 = 13.29$.
For 40-year old males: $13.29 + 0.00 \times 40 + 2.76 \times 1 - 0.02 \times 1 \times 40 = 13.29 + 2.76 - 0.8 = 15.25$.

Making predictions using make_predictions()

Using the make_predictions() function brings two advantages. First, when calculating multiple predictions, we are saved the effort of inserting multiple values into our model manually and doing the calculations. Secondly, make_predictions() returns 95% confidence intervals around the predictions, giving us a sense of the uncertainty around the predictions.

To use make_predictions(), we need to create a tibble with the explanatory variable values for which we wish to have mean predictions from the model. We do this using the tibble() function. Note that the column names must correspond to the names of the explanatory variables in the model, i.e. Age and Sex. In the code below, we create a tibble with Age having the values 30, 40, 50 and 60 for females and males. We then provide make_predictions() with this tibble, alongside the model from which we wish to have predictions. By default, make_predicitons() returns 95% confidence intervals for the mean predictions.

From the output we can see that the model predicts an average systolic blood pressure of 109.8596 for a 30-year old female. The confidence interval around this prediction is [109.0593, 110.6599].

BPSysAve_Age_Sex <- dat %>%
  filter(Age > 17) %>%
  lm(formula = BPSysAve ~ Age * Sex)

predictionDat <- tibble(Age = c(30, 40, 50, 60,
                                30, 40, 50, 60),
                        Sex = c("female", "female", "female", "female",
                                "male", "male", "male", "male"))

make_predictions(BPSysAve_Age_Sex, new_data = predictionDat)
# A tibble: 8 × 5
    Age Sex    BPSysAve  ymin  ymax
  <dbl> <chr>     <dbl> <dbl> <dbl>
1    30 female     110.  109.  111.
2    40 female     115.  115.  116.
3    50 female     121.  120.  121.
4    60 female     126.  126.  127.
5    30 male       119.  118.  119.
6    40 male       121.  121.  122.
7    50 male       124.  123.  125.
8    60 male       127.  126.  127.

Exercise

Using the make_predictions() function, obtain the expected mean Hemoglobin levels predicted by the Hemoglobin_Age_Sex model for individuals with an Age of 20, 30, 40 and 50. Obtain confidence intervals for these predictions. How are these confidence intervals interpreted?

Solution

Hemoglobin_Age_Sex <- dat %>%
 filter(Age > 17) %>%
 lm(formula = Hemoglobin ~ Age * Sex)

predictionDat <- tibble(Age = c(20, 30, 40, 50,
                                 20, 30, 40, 50),
                         Sex = c("female", "female", "female", "female",
                                 "male", "male", "male", "male"))

make_predictions(Hemoglobin_Age_Sex, new_data = predictionDat)
# A tibble: 8 × 5
    Age Sex    Hemoglobin  ymin  ymax
  <dbl> <chr>       <dbl> <dbl> <dbl>
1    20 female       13.3  13.2  13.4
2    30 female       13.3  13.3  13.4
3    40 female       13.3  13.3  13.4
4    50 female       13.3  13.3  13.4
5    20 male         15.6  15.5  15.7
6    30 male         15.4  15.3  15.4
7    40 male         15.2  15.1  15.2
8    50 male         14.9  14.9  15.0

Recall that 95% of the 95% confidence intervals are expected to contain the population mean. Therefore, we can be fairly confident that the true population means lie somewhere between the bounds of the intervals, assuming that our model is good.

Key Points

  • Predictions of the mean in the outcome variable can be manually calculated using the model’s equation.

  • Predictions of multiple means in the outcome variable alongside 95% CIs can be obtained using the make_predictions() function.