# Making predictions from a multiple linear regression model

## Overview

Teaching:8 min

Exercises:12 minQuestions

How can we make predictions using the model equation of a multiple linear regression model?

How can we use R to obtain predictions from a multiple linear regression model?

Objectives

Calculate a prediction from a simple linear regression model using parameter estimates given by the model output.

Use the predict function to generate predictions from a multiple linear regression model.

As with the simple linear regression model, the multiple linear regression model allows us to make predictions. First we will calculate predictions using the model equation. Then we will see how R can calculate predictions for us using the `make_predictions()`

function.

## Calculating predictions manually

Let us use the `BPSysAve_Age_Sex`

model from the previous episode. The equation for this model was:

\(E(\text{BPSysAve}) = 93.39 + 0.55 \times \text{Age} + 17.06 \times x_2 - 0.28 \times x_2 \times \text{Age}\),

where $x_2 = 1$ for male participants and $0$ otherwise.

For a 30-year old female, the model predicts an average systolic blood pressure of

\(E(\text{BPSysAve}) = 93.39 + 0.55 \times 30 + 17.06 \times 0 - 0.28 \times 0 \times 30 = 93.39 + 16.5 = 109.89\).

For a 30-year old male, the model predicts an average systolic blood pressure of

\(E(\text{BPSysAve}) = 93.39 + 0.55 \times 30 + 17.06 \times 1 - 0.28 \times 1 \times 30 = 93.39 + 16.5 + 17.06 - 8.4 = 118.55\).

## Exercise

Given the

`summ`

output from our`Hemoglobin_Age_Sex`

model, the model can be described as \(E(\text{Hemoglobin}) = 13.29 + 0.00 \times \text{Age} + 2.76 \times x_2 - 0.02 \times x_2 \times \text{Age}\), where $x_2 = 1$ for participants of the male`Sex`

and $0$ otherwise.

- What level of Hemoglobin does the model predict, on average, for an individual of the female
`Sex`

with an`Age`

of 40?- What
`Hemoglobin`

is predicted for a male of the same`Age`

?## Solution

For 40-year old females: $13.29 + 0.00 \times 40 + 2.76 \times 0 - 0.02 \times 0 \times 40 = 13.29$.

For 40-year old males: $13.29 + 0.00 \times 40 + 2.76 \times 1 - 0.02 \times 1 \times 40 = 13.29 + 2.76 - 0.8 = 15.25$.

## Making predictions using `make_predictions()`

Using the `make_predictions()`

function brings two advantages. First, when calculating multiple predictions, we are saved the effort of inserting multiple values into our model manually and doing the calculations. Secondly, `make_predictions()`

returns 95% confidence intervals around the predictions, giving us a sense of the uncertainty around the predictions.

To use `make_predictions()`

, we need to create a `tibble`

with the explanatory variable values for which we wish to have mean predictions from the model. We do this using the `tibble()`

function. Note that the column names must correspond to the names of the explanatory variables in the model, i.e. `Age`

and `Sex`

. In the code below, we create a `tibble`

with `Age`

having the values 30, 40, 50 and 60 for females and males. We then provide `make_predictions()`

with this `tibble`

, alongside the model from which we wish to have predictions. By default, `make_predicitons()`

returns
95% confidence intervals for the mean predictions.

From the output we can see that the model predicts an average systolic blood pressure of 109.8596 for a 30-year old female. The confidence interval around this prediction is [109.0593, 110.6599].

```
BPSysAve_Age_Sex <- dat %>%
filter(Age > 17) %>%
lm(formula = BPSysAve ~ Age * Sex)
predictionDat <- tibble(Age = c(30, 40, 50, 60,
30, 40, 50, 60),
Sex = c("female", "female", "female", "female",
"male", "male", "male", "male"))
make_predictions(BPSysAve_Age_Sex, new_data = predictionDat)
```

```
# A tibble: 8 × 5
Age Sex BPSysAve ymin ymax
<dbl> <chr> <dbl> <dbl> <dbl>
1 30 female 110. 109. 111.
2 40 female 115. 115. 116.
3 50 female 121. 120. 121.
4 60 female 126. 126. 127.
5 30 male 119. 118. 119.
6 40 male 121. 121. 122.
7 50 male 124. 123. 125.
8 60 male 127. 126. 127.
```

## Exercise

Using the

`make_predictions()`

function, obtain the expected mean Hemoglobin levels predicted by the`Hemoglobin_Age_Sex`

model for individuals with an`Age`

of 20, 30, 40 and 50. Obtain confidence intervals for these predictions. How are these confidence intervals interpreted?## Solution

`Hemoglobin_Age_Sex <- dat %>% filter(Age > 17) %>% lm(formula = Hemoglobin ~ Age * Sex) predictionDat <- tibble(Age = c(20, 30, 40, 50, 20, 30, 40, 50), Sex = c("female", "female", "female", "female", "male", "male", "male", "male")) make_predictions(Hemoglobin_Age_Sex, new_data = predictionDat)`

`# A tibble: 8 × 5 Age Sex Hemoglobin ymin ymax <dbl> <chr> <dbl> <dbl> <dbl> 1 20 female 13.3 13.2 13.4 2 30 female 13.3 13.3 13.4 3 40 female 13.3 13.3 13.4 4 50 female 13.3 13.3 13.4 5 20 male 15.6 15.5 15.7 6 30 male 15.4 15.3 15.4 7 40 male 15.2 15.1 15.2 8 50 male 14.9 14.9 15.0`

Recall that 95% of the 95% confidence intervals are expected to contain the population mean. Therefore, we can be fairly confident that the true population means lie somewhere between the bounds of the intervals, assuming that our model is good.

## Key Points

Predictions of the mean in the outcome variable can be manually calculated using the model’s equation.

Predictions of multiple means in the outcome variable alongside 95% CIs can be obtained using the

`make_predictions()`

function.