This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Logistic regression for public health: Glossary

Key Points

An introduction to binary response variables
  • The probabilities of success and failure are estimated as the proportions of participants with a success and failure, respectively.

  • The expectation of a binary variable equals the probability of success.

  • The odds equal the ratio of the probability of success and one minus the probability of success. The odds quantify how many times more likely success is than failure.

  • The log odds are calculated by taking the log of the odds. When the log odds are greater than 0, the probability of success is greater than 0.5.

An introduction to logistic regression
  • Logistic regression requires one binary dependent variable and one or more continuous or categorical explanatory variables.

  • The model equation in terms of the log odds is $\text{logit}(E(y)) = \beta_0 + \beta_1 \times x_1$.

  • The model equation in terms of the probability of success is $E(y) = \text{logit}^{-1}(\beta_0 + \beta_1 \times x_1)$.

  • The odds is multiplied by $e^{\beta_1}$ for a one-unit increase in $x_1$.

Logistic regression with one continuous explanatory variable
  • A violin plot can be used to explore the relationship between a binary response variable and a continuous explanatory variable.

  • Instead of lm(), glm() with family = binomial is used to fit a logistic regression model.

  • The default summ() output shows the model coefficients in terms of the log odds.

  • Adding exp = TRUE to summ() allows us to interpret the model in terms of the multiplicative change in the odds of success.

  • The logistic regression model is visualised in terms of the probability of success.

Making predictions from a logistic regression model
  • Predictions of the log odds, the odds and the probability of success can be manually calculated using the model’s equation.

  • Predictions of the log odds, the odds and the probability of success alongside 95% CIs can be obtained using the make_predictions() function.

Assessing logistic regression fit and assumptions
  • McFadden’s $R^2$ measures relative performance, compared to a model that always predicts the mean. Binned residual plots allow us to check whether the residuals have a pattern and whether particular residuals are larger than expected, both indicating poor model fit.

  • The logistic regression assumptions are similar to the linear regression assumptions. However, linearity and additivity are checked with respect to the logit of the outcome variable. In addition, homoscedasticity and normality of residuals are not assumptions of binary logistic regression.

Glossary

FIXME