Making predictions from a multiple linear regression model
Overview
Teaching: 8 min
Exercises: 12 minQuestions
How can we make predictions using the model equation of a multiple linear regression model?
How can we use R to obtain predictions from a multiple linear regression model?
Objectives
Calculate a prediction from a simple linear regression model using parameter estimates given by the model output.
Use the predict function to generate predictions from a multiple linear regression model.
As with the simple linear regression model, the multiple linear regression model allows us to make predictions. First we will calculate predictions using the model equation. Then we will see how R can calculate predictions for us using the make_predictions()
function.
Calculating predictions manually
Let us use the BPSysAve_Age_Sex
model from the previous episode. The equation for this model was:
\(E(\text{BPSysAve}) = 93.39 + 0.55 \times \text{Age} + 17.06 \times x_2 - 0.28 \times x_2 \times \text{Age}\),
where $x_2 = 1$ for male participants and $0$ otherwise.
For a 30-year old female, the model predicts an average systolic blood pressure of
\(E(\text{BPSysAve}) = 93.39 + 0.55 \times 30 + 17.06 \times 0 - 0.28 \times 0 \times 30 = 93.39 + 16.5 = 109.89\).
For a 30-year old male, the model predicts an average systolic blood pressure of
\(E(\text{BPSysAve}) = 93.39 + 0.55 \times 30 + 17.06 \times 1 - 0.28 \times 1 \times 30 = 93.39 + 16.5 + 17.06 - 8.4 = 118.55\).
Exercise
Given the
summ
output from ourHemoglobin_Age_Sex
model, the model can be described as \(E(\text{Hemoglobin}) = 13.29 + 0.00 \times \text{Age} + 2.76 \times x_2 - 0.02 \times x_2 \times \text{Age}\), where $x_2 = 1$ for participants of the maleSex
and $0$ otherwise.
- What level of Hemoglobin does the model predict, on average, for an individual of the female
Sex
with anAge
of 40?- What
Hemoglobin
is predicted for a male of the sameAge
?Solution
For 40-year old females: $13.29 + 0.00 \times 40 + 2.76 \times 0 - 0.02 \times 0 \times 40 = 13.29$.
For 40-year old males: $13.29 + 0.00 \times 40 + 2.76 \times 1 - 0.02 \times 1 \times 40 = 13.29 + 2.76 - 0.8 = 15.25$.
Making predictions using make_predictions()
Using the make_predictions()
function brings two advantages. First, when calculating multiple predictions, we are saved the effort of inserting multiple values into our model manually and doing the calculations. Secondly, make_predictions()
returns 95% confidence intervals around the predictions, giving us a sense of the uncertainty around the predictions.
To use make_predictions()
, we need to create a tibble
with the explanatory variable values for which we wish to have mean predictions from the model. We do this using the tibble()
function. Note that the column names must correspond to the names of the explanatory variables in the model, i.e. Age
and Sex
. In the code below, we create a tibble
with Age
having the values 30, 40, 50 and 60 for females and males. We then provide make_predictions()
with this tibble
, alongside the model from which we wish to have predictions. By default, make_predicitons()
returns
95% confidence intervals for the mean predictions.
From the output we can see that the model predicts an average systolic blood pressure of 109.8596 for a 30-year old female. The confidence interval around this prediction is [109.0593, 110.6599].
BPSysAve_Age_Sex <- dat %>%
filter(Age > 17) %>%
lm(formula = BPSysAve ~ Age * Sex)
predictionDat <- tibble(Age = c(30, 40, 50, 60,
30, 40, 50, 60),
Sex = c("female", "female", "female", "female",
"male", "male", "male", "male"))
make_predictions(BPSysAve_Age_Sex, new_data = predictionDat)
# A tibble: 8 × 5
Age Sex BPSysAve ymin ymax
<dbl> <chr> <dbl> <dbl> <dbl>
1 30 female 110. 109. 111.
2 40 female 115. 115. 116.
3 50 female 121. 120. 121.
4 60 female 126. 126. 127.
5 30 male 119. 118. 119.
6 40 male 121. 121. 122.
7 50 male 124. 123. 125.
8 60 male 127. 126. 127.
Exercise
Using the
make_predictions()
function, obtain the expected mean Hemoglobin levels predicted by theHemoglobin_Age_Sex
model for individuals with anAge
of 20, 30, 40 and 50. Obtain confidence intervals for these predictions. How are these confidence intervals interpreted?Solution
Hemoglobin_Age_Sex <- dat %>% filter(Age > 17) %>% lm(formula = Hemoglobin ~ Age * Sex) predictionDat <- tibble(Age = c(20, 30, 40, 50, 20, 30, 40, 50), Sex = c("female", "female", "female", "female", "male", "male", "male", "male")) make_predictions(Hemoglobin_Age_Sex, new_data = predictionDat)
# A tibble: 8 × 5 Age Sex Hemoglobin ymin ymax <dbl> <chr> <dbl> <dbl> <dbl> 1 20 female 13.3 13.2 13.4 2 30 female 13.3 13.3 13.4 3 40 female 13.3 13.3 13.4 4 50 female 13.3 13.3 13.4 5 20 male 15.6 15.5 15.7 6 30 male 15.4 15.3 15.4 7 40 male 15.2 15.1 15.2 8 50 male 14.9 14.9 15.0
Recall that 95% of the 95% confidence intervals are expected to contain the population mean. Therefore, we can be fairly confident that the true population means lie somewhere between the bounds of the intervals, assuming that our model is good.
Key Points
Predictions of the mean in the outcome variable can be manually calculated using the model’s equation.
Predictions of multiple means in the outcome variable alongside 95% CIs can be obtained using the
make_predictions()
function.