This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

This lesson is part of The Carpentries Incubator, a place to share and use each other's Carpentries-style lessons. This lesson has not been reviewed by and is not endorsed by The Carpentries.

Simple linear regression for public health: Glossary

Key Points

An introduction to linear regression	Simple linear regression requires one continuous dependent variable and one continuous or categorical explanatory variable. In addition, the assumptions of the model must hold. The components of the model describe the mean of the dependent variable as a function of the explanatory variable, the mean of the dependent variable at the 0-point of the explanatory variable and the effect of the explanatory variable on the mean of the dependent variable.
Linear regression with one continuous explanatory variable	As a first check of suitability, examine the relationship between the continuous variables using a scatterplot. Use `lm()` to fit a simple linear regression model. Use `summ()` to obtain parameter estimates for the model. Use `effect_plot()` to visualise the model.
Linear regression with a two-level factor explanatory variable	As a first exploration of the data, construct a violin plot describing the relationship between the two variables. Use `lm()` to fit a simple linear regression model. Use `summ()` to obtain parameter estimates for the model. The intercept estimates the mean in the outcome variable for the baseline group. The other parameter estimates the difference in the means in the outcome variable between the baseline and contrast groups. Use `effect_plot()` to visualise the estimated means per group along with their 95% CIs.
Making predictions from a simple linear regression model	Predictions of the mean in the outcome variable can be manually calculated using the model’s equation. Predictions of multiple means in the outcome variable alongside 95% CIs can be obtained using the `make_predictions()` function.
Assessing simple linear regression model fit and assumptions	Assessing model fit is the process of visually checking whether the model fits the data sufficiently well. $R^2$ quantifies the proportion of variation in the response variable explained by the explanatory variable. An $R^2$ close to 1 indicates that most variation is accounted for by the model, while an $R^2$ close to 0 indicates that the model does not perform much better than predicting the mean of the response. The six assumptions of the simple linear regression model are validity, representativeness, linearity and additivity, independence of errors, homoscedasticity of the residuals and normality of the residuals. We can check the assumptions of a simple linear regression model by carefully considering our research question and the data set that we are using, and by visualising our model parameters.
Optional: linear regression with a multi-level factor explanatory variable	As a first exploration of the data, construct a violin plot to describe the relationship between the two variables. Use `lm()` to fit the simple linear regression model. Use `summ()` to obtain parameter estimates for the model. The intercept estimates the mean in the outcome variable for the baseline group. The other parameters estimate the differences in the means in the outcome variable between the baseline and contrast groups. Use `effect_plot()` to visualise the estimated means per group along with their 95% CIs.
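
The three model components named in the first key point can be written as a single equation. The symbols below are our own notation, chosen for illustration, and are not taken from the lesson: $y$ is the dependent variable, $x$ the explanatory variable, $\beta_0$ the intercept and $\beta_1$ the effect of $x$.

$$
E(y) = \beta_0 + \beta_1 x
$$

Here $E(y)$ is the mean of the dependent variable, $\beta_0$ is that mean at $x = 0$, and $\beta_1$ is the change in the mean per one-unit increase in $x$. For a two-level factor, $x$ can be read as an indicator that is 0 for the baseline group and 1 for the contrast group, so that $\beta_1$ is the difference in group means.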
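
The key points on continuous explanatory variables and on making predictions can be sketched as follows. This is a minimal example under stated assumptions: it uses R's built-in `women` data set rather than the lesson's data, and it relies on base R's `coef()` and `predict()` for the estimates and predictions. The `summ()` and `effect_plot()` calls assume the `jtools` package and are skipped if it is not installed.

```r
# Assumption: the built-in `women` data (height in inches, weight in pounds)
# stands in for the lesson's own data set.
data(women)

# First check of suitability: scatterplot of the two continuous variables
plot(weight ~ height, data = women)

# Fit the simple linear regression model
weight_height_lm <- lm(weight ~ height, data = women)

# Parameter estimates: intercept and slope
b <- coef(weight_height_lm)
print(b)

# Manual prediction of the mean weight at height = 65, using the model equation
manual_pred <- unname(b["(Intercept)"] + b["height"] * 65)

# Predictions of several means alongside 95% confidence intervals (base R)
new_heights <- data.frame(height = c(60, 65, 70))
pred <- predict(weight_height_lm, newdata = new_heights,
                interval = "confidence", level = 0.95)
print(pred)

# Optional: the jtools functions named in the key points, if available
if (requireNamespace("jtools", quietly = TRUE)) {
  print(jtools::summ(weight_height_lm, confint = TRUE))
  print(jtools::effect_plot(weight_height_lm, pred = height,
                            interval = TRUE, plot.points = TRUE))
}
```

The second row of `pred` should match `manual_pred`, since both evaluate the same fitted equation at a height of 65.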
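
The interpretation of the parameters for factor explanatory variables can be checked directly against the group means. The sketch below assumes R's built-in `ToothGrowth` (two-level factor `supp`) and `PlantGrowth` (three-level factor `group`) data sets as stand-ins for the lesson's data. The violin plot assumes `ggplot2` is installed and is skipped otherwise.

```r
# Assumption: built-in data sets stand in for the lesson's data.
data(ToothGrowth)  # len (continuous), supp (factor: OJ, VC)
data(PlantGrowth)  # weight (continuous), group (factor: ctrl, trt1, trt2)

# Optional first exploration: violin plot of the outcome per group
if (requireNamespace("ggplot2", quietly = TRUE)) {
  print(ggplot2::ggplot(ToothGrowth, ggplot2::aes(x = supp, y = len)) +
          ggplot2::geom_violin())
}

# Two-level factor: the first level (OJ) is the baseline group
len_supp_lm <- lm(len ~ supp, data = ToothGrowth)
supp_means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)

print(coef(len_supp_lm))     # intercept and difference (VC minus OJ)
print(supp_means)            # observed group means for comparison
print(confint(len_supp_lm))  # 95% CIs for both parameters

# Multi-level factor: one intercept plus one difference per contrast group
weight_group_lm <- lm(weight ~ group, data = PlantGrowth)
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

print(coef(weight_group_lm))
print(group_means - group_means["ctrl"])  # differences from the baseline mean
```

In both models the intercept equals the observed mean of the baseline group, and each remaining coefficient equals the difference between a contrast group's mean and the baseline mean.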
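
The key points on model fit can be illustrated by computing $R^2$ by hand and drawing two common residual plots. This is a sketch rather than the lesson's own procedure: it again assumes the built-in `women` data set, and the choice of a residuals-versus-fitted plot and a normal Q-Q plot is ours.

```r
# Assumption: the built-in `women` data stands in for the lesson's data.
data(women)
fit <- lm(weight ~ height, data = women)

# R^2 as reported by summary()
r_squared <- summary(fit)$r.squared

# R^2 by hand: 1 minus residual variation over total variation
rss <- sum(resid(fit)^2)                              # residual sum of squares
tss <- sum((women$weight - mean(women$weight))^2)     # total sum of squares
r_squared_manual <- 1 - rss / tss
print(c(summary = r_squared, manual = r_squared_manual))

# Residuals versus fitted values: a visual check of linearity and
# homoscedasticity (look for curvature or a funnel shape)
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: a visual check of normality of the residuals
qqnorm(resid(fit))
qqline(resid(fit))
```

The plots address only linearity, homoscedasticity and normality. Validity, representativeness and independence of errors depend on the research question and on how the data were collected, so they cannot be read off a plot.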

Glossary

FIXME