This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository.

Simple linear regression for public health: Glossary

Key Points

An introduction to linear regression
  • Simple linear regression requires one continuous dependent variable and one continuous or categorical explanatory variable. In addition, the assumptions of the model must hold.

  • The components of the model describe, respectively, the mean of the dependent variable as a function of the explanatory variable, the mean of the dependent variable when the explanatory variable equals 0 (the intercept), and the change in the mean of the dependent variable per one-unit increase in the explanatory variable (the slope).
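
In symbols, the three components described above can be written as below. The notation is a common convention assumed here rather than taken from the lesson text: $y$ is the dependent variable, $x$ the explanatory variable, and $\beta_0$ and $\beta_1$ the model parameters.

$$
E(y) = \beta_0 + \beta_1 x
$$

Here $E(y)$ is the mean of the dependent variable, $\beta_0$ is that mean when $x = 0$, and $\beta_1$ is the change in the mean for a one-unit increase in $x$.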

Linear regression with one continuous explanatory variable
  • As a first check of suitability, examine the relationship between the continuous variables using a scatterplot.

  • Use lm() to fit a simple linear regression model.

  • Use summ() to obtain parameter estimates for the model.

  • Use effect_plot() to visualise the model.
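
The steps above can be sketched as follows. This is a minimal illustration on simulated data: the data frame `dat` and its columns `age` and `sbp` are hypothetical and do not come from the lesson. Base R's `coef()`, `confint()` and `abline()` are used so that the sketch runs without extra packages; in the lesson, `summ()` and `effect_plot()` would be applied to the same `fit` object.

```r
# Simulated, hypothetical data: systolic blood pressure (sbp) rising with age
set.seed(1)
dat <- data.frame(age = runif(200, min = 20, max = 80))
dat$sbp <- 100 + 0.5 * dat$age + rnorm(200, sd = 10)

# First check of suitability: scatterplot of the two continuous variables
plot(sbp ~ age, data = dat)

# Fit the simple linear regression model
fit <- lm(sbp ~ age, data = dat)

# Parameter estimates (intercept and slope) with 95% confidence intervals
est <- coef(fit)
ci <- confint(fit, level = 0.95)
print(cbind(estimate = est, ci))

# Overlay the fitted line on the scatterplot
abline(fit)
```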

Linear regression with a two-level factor explanatory variable
  • As a first exploration of the data, construct a violin plot describing the relationship between the two variables.

  • Use lm() to fit a simple linear regression model.

  • Use summ() to obtain parameter estimates for the model.

  • The intercept estimates the mean of the outcome variable for the baseline group. The other parameter estimates the difference in the mean of the outcome variable between the contrast group and the baseline group.

  • Use effect_plot() to visualise the estimated means per group along with their 95% CIs.
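
The interpretation of the two parameters can be checked directly against the group means. The sketch below assumes simulated data: `dat`, `group` (levels "No" and "Yes", with "No" as baseline) and `outcome` are hypothetical names, not taken from the lesson.

```r
# Simulated, hypothetical data: an outcome measured in two groups
set.seed(2)
dat <- data.frame(
  group = factor(rep(c("No", "Yes"), each = 50), levels = c("No", "Yes"))
)
dat$outcome <- ifelse(dat$group == "Yes", 28, 25) + rnorm(100, sd = 3)

# Fit the model with the two-level factor as explanatory variable
fit <- lm(outcome ~ group, data = dat)
est <- coef(fit)

# Sample means per group, for comparison with the parameter estimates
group_means <- tapply(dat$outcome, dat$group, mean)

# The intercept matches the baseline ("No") mean;
# the second parameter matches the difference in means, Yes minus No
print(est)
print(c(baseline_mean = group_means[["No"]],
        difference = group_means[["Yes"]] - group_means[["No"]]))
```

Changing the order of `levels` changes which group is the baseline, and therefore the sign of the second parameter.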

Making predictions from a simple linear regression model
  • Predictions of the mean of the outcome variable can be calculated manually using the model’s equation.

  • Predictions of several means of the outcome variable, alongside their 95% CIs, can be obtained using the make_predictions() function.
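
Both routes to a prediction can be compared in a short sketch. It assumes simulated data with hypothetical columns `x` and `y`, and it uses base R's `predict()` as a stand-in for the lesson's `make_predictions()`, so that no extra packages are needed.

```r
# Simulated, hypothetical data
set.seed(3)
dat <- data.frame(x = runif(100, min = 0, max = 10))
dat$y <- 2 + 1.5 * dat$x + rnorm(100)

fit <- lm(y ~ x, data = dat)
b <- coef(fit)

# Manual prediction of the mean outcome at x = 4 using the model's equation
manual <- b[["(Intercept)"]] + b[["x"]] * 4

# Several predicted means with 95% confidence intervals in one call
new_x <- data.frame(x = c(2, 4, 6))
pred <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
print(cbind(new_x, pred))
```

The `fit` column in the row for `x = 4` equals the manual calculation; `lwr` and `upr` are the bounds of the confidence interval for the mean.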

Assessing simple linear regression model fit and assumptions
  • Assessing model fit is the process of visually checking whether the model fits the data sufficiently well.

  • $R^2$ quantifies the proportion of variation in the response variable explained by the explanatory variable. An $R^2$ close to 1 indicates that most variation is accounted for by the model, while an $R^2$ close to 0 indicates that the model does not perform much better than predicting the mean of the response.

  • The six assumptions of the simple linear regression model are validity, representativeness, linearity and additivity, independence of errors, homoscedasticity of the residuals and normality of the residuals.

  • We can check the assumptions of a simple linear regression model by carefully considering our research question and the data set that we are using, and by visualising our model parameters.
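
As an illustration of the $R^2$ definition and of visual checks, the sketch below computes $R^2$ by hand and draws two residual plots that are commonly used for the linearity, homoscedasticity and normality assumptions. The data are simulated, the column names `x` and `y` are hypothetical, and the choice of plots is an assumption rather than the lesson's prescribed method.

```r
# Simulated, hypothetical data
set.seed(4)
dat <- data.frame(x = runif(150, min = 0, max = 10))
dat$y <- 1 + 2 * dat$x + rnorm(150, sd = 3)
fit <- lm(y ~ x, data = dat)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
rss <- sum(residuals(fit)^2)
tss <- sum((dat$y - mean(dat$y))^2)
r2_manual <- 1 - rss / tss
r2_lm <- summary(fit)$r.squared  # the value reported by R, for comparison

# Residuals vs fitted values: look for curvature (linearity) and
# changing spread (homoscedasticity)
plot(fitted(fit), residuals(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals: points near the line suggest normality
qqnorm(residuals(fit))
qqline(residuals(fit))
```

Validity, representativeness and independence of errors cannot be read off such plots; they depend on the research question and on how the data were collected.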

Optional: linear regression with a multi-level factor explanatory variable
  • As a first exploration of the data, construct a violin plot to describe the relationship between the two variables.

  • Use lm() to fit the simple linear regression model.

  • Use summ() to obtain parameter estimates for the model.

  • The intercept estimates the mean of the outcome variable for the baseline group. Each of the other parameters estimates the difference in the mean of the outcome variable between one contrast group and the baseline group.

  • Use effect_plot() to visualise the estimated means per group along with their 95% CIs.
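
With more than two levels, each group mean can be rebuilt from the intercept plus the relevant difference. The sketch below assumes simulated data: the three-level factor `group` (baseline "Low") and the column `outcome` are hypothetical names.

```r
# Simulated, hypothetical data: an outcome measured in three groups
set.seed(5)
lv <- c("Low", "Medium", "High")
dat <- data.frame(group = factor(rep(lv, each = 40), levels = lv))
true_means <- c(Low = 50, Medium = 55, High = 62)
dat$outcome <- unname(true_means[as.character(dat$group)]) + rnorm(120, sd = 5)

fit <- lm(outcome ~ group, data = dat)
est <- coef(fit)

# Sample means per group, in the order of the factor levels
group_means <- tapply(dat$outcome, dat$group, mean)

# Group means rebuilt from the parameters:
# baseline mean, then baseline mean plus each estimated difference
from_model <- c(est[[1]], est[[1]] + est[-1])
print(rbind(sample_means = as.vector(group_means),
            from_model = unname(from_model)))
```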
