# Making predictions from a simple linear regression model

## Overview

Teaching:10 min

Exercises:10 minQuestions

How can predictions be manually obtained from a simple linear regression model?

How can R be used to obtain predictions from a simple linear regression model?

Objectives

Calculate a prediction from a simple linear regression model using parameter estimates given by the model output.

Use the predict function to generate predictions from a simple linear regression model.

One of the features of linear regression is prediction: a model presents predicted mean values for the outcome variable for any values of the explanatory variables. We have already seen this in the previous episodes through our `effect_plot()`

outputs, which showed mean predicted responses as straight lines (episode 2) or individual points for levels of a categorical variable (episodes 3). Here, we will see how to obtain predicted values and the uncertainty surrounding them.

## Calculating predictions manually

First, we can calculate a predicted value manually. From the `summ()`

output associated with our `Weight_Height_lm`

model from episode 2, we can write the model as $E(\text{Weight}) = \beta_0 + \beta_1 \times \text{Height} = -70.194 + 0.901 \times \text{Height}$. The output can be found again below. If we take a height of 165 cm, then our model predicts an average weight of $-70.194 + 0.901 \times 165 = 78.471$ kg.

```
Weight_Height_lm <- dat %>%
filter(Age > 17) %>%
lm(formula = Weight ~ Height)
summ(Weight_Height_lm)
```

```
MODEL INFO:
Observations: 6177 (320 missing obs. deleted)
Dependent Variable: Weight
Type: OLS linear regression
MODEL FIT:
F(1,6175) = 1398.22, p = 0.00
R² = 0.18
Adj. R² = 0.18
Standard errors: OLS
-------------------------------------------------
Est. S.E. t val. p
----------------- -------- ------ -------- ------
(Intercept) -70.19 4.06 -17.29 0.00
Height 0.90 0.02 37.39 0.00
-------------------------------------------------
```

## Exercise

Given the

`summ`

output from our`BPSysAve_AgeMonths_lm`

model, the model can be described as$E(\text{BPSysAve}) = \beta_0 + \beta_1 \times \text{Age (months)} = 101.812 + 0.033 \times \text{Age (months)}$.

What level of average systolic blood pressure does the model predict, on average, for an individual with an age of 480 months?

## Solution

$101.812 + 0.033 * 480 = 117.652 \text{mmHg}$.

## Making predictions using `make_predictions()`

Using the `make_predictions()`

function brings two advantages. First, when calculating multiple predictions, we are saved the effort of inserting multiple values into our model manually and doing the calculations. Secondly, `make_predictions()`

returns 95% confidence intervals around the predictions, giving us a sense of the uncertainty around the predictions.

To use `make_predictions()`

, we need to create a `tibble`

with the explanatory variable values for which we wish to have mean predictions from the model. We do this using the `tibble()`

function. Note that the column name must correspond to the name of the explanatory variable in the model, i.e. `Height`

. In the code below, we create a `tibble`

with the values 150, 160, 170 and 180. We then provide `make_predictions()`

with this `tibble`

, alongside the model from which we wish to have predictions. By default, 95% confidence intervals are returned.

We see that the model predicts an average weight of 64.88 kg for an individual with a height of 150 cm, with a 95% confidence interval of [63.9kg, 65.9kg].

```
Heights <- tibble(Height = c(150, 160, 170, 180))
make_predictions(Weight_Height_lm, new_data = Heights)
```

```
# A tibble: 4 × 4
Height Weight ymin ymax
<dbl> <dbl> <dbl> <dbl>
1 150 64.9 63.9 65.9
2 160 73.9 73.3 74.5
3 170 82.9 82.4 83.4
4 180 91.9 91.2 92.6
```

## Exercise

- Using the
`make_predictions()`

function, obtain the expected mean average systolic blood pressure levels predicted by the`BPSysAve_AgeMonths_lm`

model for individuals with an age of 300, 400, 500 and 600 months.- Obtain 95% confidence intervals for these predictions.
- How are these confidence intervals interpreted?
## Solution

`BPSysAve_AgeMonths_lm <- dat %>% filter(Age > 17) %>% lm(formula = BPSysAve ~ AgeMonths) ages <- tibble(AgeMonths = c(300, 400, 500, 600)) make_predictions(BPSysAve_AgeMonths_lm, new_data = ages)`

`# A tibble: 4 × 4 AgeMonths BPSysAve ymin ymax <dbl> <dbl> <dbl> <dbl> 1 300 112. 111. 113. 2 400 115. 114. 116. 3 500 118. 118. 119. 4 600 121. 121. 122.`

Recall that 95% of 95% confidence intervals are expected to contain the population mean. Therefore, we can be fairly confident that the true population means lie somewhere between the bounds of the intervals, assuming that our model is good.

## Key Points

Predictions of the mean in the outcome variable can be manually calculated using the model’s equation.

Predictions of multiple means in the outcome variable alongside 95% CIs can be obtained using the

`make_predictions()`

function.