This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Multiple linear regression for public health: Glossary

Key Points

Linear regression with one continuous and one categorical explanatory variable
  • A scatterplot, with points coloured by the levels of a categorical variable, can be used to explore the relationship between two continuous variables and a categorical variable.

  • The categorical variable can be added to the formula in lm() using +.

  • The model output gives the intercept for the baseline level of the categorical variable and, for each other level, the difference in intercept from the baseline. The slope is held constant across the levels of the categorical variable.

  • Parallel lines can be added to the exploratory scatterplot to visualise the linear regression model.
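
The points above can be sketched in R as follows. This is a minimal illustration on simulated data, not the lesson's own analysis: the variable names (age, sex, sbp) and all numeric values are assumptions made up for the example.

```r
# Simulated data; age, sex and sbp are hypothetical names, not from the lesson.
set.seed(1)
n   <- 200
age <- runif(n, 20, 70)
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
sbp <- 100 + 0.5 * age + 5 * (sex == "male") + rnorm(n, sd = 8)
dat <- data.frame(age, sex, sbp)

# Add the categorical variable to the formula with +
fit_add <- lm(sbp ~ age + sex, data = dat)
b <- coef(fit_add)
print(b)
# (Intercept): intercept for the baseline level ("female")
# age:         slope shared by both levels
# sexmale:     difference in intercept for "male" relative to "female"

# Exploratory scatterplot, points coloured by level, with the parallel fitted lines
cols <- c("darkorange", "steelblue")
plot(sbp ~ age, data = dat, col = cols[dat$sex], pch = 19)
abline(a = b[["(Intercept)"]],                  b = b[["age"]], col = cols[1])
abline(a = b[["(Intercept)"]] + b[["sexmale"]], b = b[["age"]], col = cols[2])
```

Because R orders factor levels alphabetically by default, "female" is the baseline here, so the intercept for "male" is the sum of the (Intercept) and sexmale coefficients.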

Linear regression including an interaction between one continuous and one categorical explanatory variable
  • It may be appropriate to include an interaction when the slopes appear to differ across levels of a categorical variable.

  • Replace + with * in the lm() command to add an interaction.

  • When an interaction is included, two coefficients relate to differences between the two levels of a categorical variable: one relates to a difference in the intercept, the other to a difference in the slope.

  • The function interact_plot() can be used to visualise the model.
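
As a rough sketch of the interaction model, again on simulated data with hypothetical variable names (age, sex, sbp), the code below fits the model with * and derives the intercept and slope implied for each level. The interact_plot() call assumes the function comes from the interactions package and is only run if that package is installed.

```r
# Simulated data in which the slope differs by level; all names are hypothetical.
set.seed(2)
n   <- 200
age <- runif(n, 20, 70)
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
sbp <- 100 + 0.3 * age + 2 * (sex == "male") +
  0.4 * age * (sex == "male") + rnorm(n, sd = 8)
dat <- data.frame(age, sex, sbp)

# Replace + with * to include the interaction
fit_int <- lm(sbp ~ age * sex, data = dat)
b <- coef(fit_int)
print(b)
# sexmale:     difference in intercept between "male" and "female"
# age:sexmale: difference in slope between "male" and "female"

# Intercept and slope implied by the model for each level
lines_by_sex <- data.frame(
  sex       = c("female", "male"),
  intercept = c(b[["(Intercept)"]], b[["(Intercept)"]] + b[["sexmale"]]),
  slope     = c(b[["age"]],         b[["age"]] + b[["age:sexmale"]])
)
print(lines_by_sex)

# Visualise the model (assumes the 'interactions' package is available)
if (requireNamespace("interactions", quietly = TRUE)) {
  print(interactions::interact_plot(fit_int, pred = age, modx = sex))
}
```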

Making predictions from a multiple linear regression model
  • Predictions of the mean of the outcome variable can be manually calculated using the model’s equation.

  • Predictions of multiple means of the outcome variable, alongside 95% CIs, can be obtained using the make_predictions() function.
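
The two approaches can be compared in a short sketch. Note that this example swaps in base R's predict() in place of make_predictions(), so that it runs without extra packages. The data are simulated and the variable names (age, sex, sbp) are assumptions for illustration only.

```r
# Simulated data; all names and values are hypothetical.
set.seed(3)
n   <- 200
age <- runif(n, 20, 70)
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
sbp <- 100 + 0.3 * age + 2 * (sex == "male") +
  0.4 * age * (sex == "male") + rnorm(n, sd = 8)
dat <- data.frame(age, sex, sbp)

fit <- lm(sbp ~ age * sex, data = dat)
b   <- coef(fit)

# Manual prediction from the model equation: mean sbp for a 50-year-old male
manual <- b[["(Intercept)"]] + b[["age"]] * 50 +
  b[["sexmale"]] * 1 + b[["age:sexmale"]] * 50 * 1

# Several predicted means with 95% confidence intervals (base R stand-in)
new_dat <- expand.grid(age = c(30, 50, 70), sex = c("female", "male"))
pred    <- predict(fit, newdata = new_dat, interval = "confidence", level = 0.95)
print(cbind(new_dat, pred))

# The manual value matches the row for age 50, sex "male" (row 5)
print(all.equal(manual, unname(pred[5, "fit"])))
```

The columns fit, lwr and upr hold the predicted mean and the lower and upper bounds of its 95% confidence interval.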

Assessing multiple linear regression model fit and assumptions
  • Unlike R squared, the adjusted R squared does not increase simply because a variable is added, as it penalises the number of explanatory variables. The added variable needs to improve model fit sufficiently for the adjusted R squared to increase.

  • The same assumptions hold for simple and multiple linear regression; however, more steps are involved in assessing the assumptions in the context of multiple linear regression.
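
The behaviour of the adjusted R squared can be illustrated with a small sketch, assuming simulated data and hypothetical variable names (age, sex, sbp, noise). A variable unrelated to the outcome is added to the model, and the adjusted R squared is recomputed by hand from the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of estimated slopes. The final lines draw base R's default diagnostic plots, which are one possible starting point for assessing assumptions and not necessarily the checks used in the lesson.

```r
# Simulated data; all names and values are hypothetical.
set.seed(4)
n   <- 200
age <- runif(n, 20, 70)
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
sbp <- 100 + 0.5 * age + 5 * (sex == "male") + rnorm(n, sd = 8)
dat <- data.frame(age, sex, sbp)
dat$noise <- rnorm(n)  # generated independently of the outcome

fit_small <- lm(sbp ~ age + sex, data = dat)
fit_big   <- lm(sbp ~ age + sex + noise, data = dat)
s_small   <- summary(fit_small)
s_big     <- summary(fit_big)

# R squared cannot decrease when a variable is added
print(c(small = s_small$r.squared, big = s_big$r.squared))

# Adjusted R squared only increases if the added variable improves fit enough
print(c(small = s_small$adj.r.squared, big = s_big$adj.r.squared))

# Adjusted R squared by hand: p counts the slopes (coefficients minus intercept)
p          <- length(coef(fit_big)) - 1
adj_manual <- 1 - (1 - s_big$r.squared) * (n - 1) / (n - p - 1)
print(all.equal(adj_manual, s_big$adj.r.squared))

# Base R diagnostic plots (residuals vs fitted, Q-Q plot, and others)
par(mfrow = c(2, 2))
plot(fit_big)
```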