class: center, middle, inverse, title-slide

# Regularised regression

---

# Feature-feature correlation in "Prostate"

<img src="fig/rmd-03-corr-mat-prostate-1.png" width="450px" />

---

# Feature-feature correlation in methylation

<img src="fig/rmd-03-corr-mat-meth-1.png" width="450px" />

---

# Finding the best linear model

<img src="fig/rmd-03-regplot-1.png" width="600px" />

---

# Ridge regression restricts the coefficients

<img src="fig/rmd-03-ridgeplot-1.png" width="450px" />

---

# Ridge regression formally

Ridge regression uses the normal linear regression loss function:

$$
\sum_{i=1}^N (y_i - \hat{y}_i)^2
$$

--

combined with the squared *L2 norm* of the coefficients:

$$
\left\lVert \beta \right\rVert_2^2 = \sum_{j=1}^p \beta_j^2
$$

--

This gives us a modified least squares, including a weight, `\(\lambda\)`, for the squared L2 norm:

$$
\sum_{i=1}^N (y_i - \hat{y}_i)^2 + \lambda \left\lVert \beta \right\rVert_2^2
$$

(Code sketches fitting these penalised models in R appear at the end of these slides.)

---

# LASSO restricts the coefficient values

<img src="fig/rmd-03-shrink-lasso-1.png" width="450px" />

---

# LASSO definition

As with ridge regression, LASSO uses the normal linear regression loss function:

$$
\sum_{i=1}^N (y_i - \hat{y}_i)^2
$$

but here it is combined with the *L1 norm* of the coefficients:

$$
\left\lVert \beta \right\rVert_1 = \sum_{j=1}^p |\beta_j|
$$

As before, this gives us a modified least squares, now with an L1 penalty:

$$
\sum_{i=1}^N (y_i - \hat{y}_i)^2 + \lambda \left\lVert \beta \right\rVert_1
$$

---

# Cross-validation is more rigorous than a single test/training split

<img src="fig/cross_validation.png" width="450px" />

---

# Selecting features can work

<img src="fig/rmd-03-heatmap-lasso-1.png" width="500px" />

---

# What is the elastic net?

The elastic net aims to combine the best parts of ridge regression:

- good performance with correlated variables
- generalisable models

--

with the good properties of LASSO:

- sparse models
- variable selection

---

# What is the elastic net?

For the elastic net, we use the normal linear regression loss, a bit of the ridge (L2 norm) penalty, and a bit of the LASSO (L1 norm) penalty:

$$
\left(\sum_{i=1}^N (y_i - \hat{y}_i)^2 \right) + \lambda \alpha \left\lVert \beta \right\rVert_1 + \lambda (1 - \alpha) \left\lVert \beta \right\rVert_2^2
$$

--

`\(\alpha\)` controls the blend of the two penalties: `\(\alpha = 1\)` gives pure LASSO, and `\(\alpha = 0\)` gives pure ridge regression.

---

# The elastic net blends LASSO and ridge regression

<img src="fig/rmd-03-elastic-contour-1.png" width="900px" />
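
---

# Ridge regression in R (sketch)

A minimal sketch of fitting a ridge model, assuming the `glmnet` package and a toy simulated dataset (neither is taken from these slides' source):

```r
library(glmnet)

## Toy data (placeholder): 100 observations, 20 features
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y <- rnorm(100)

## alpha = 0 selects a pure ridge (squared L2) penalty
fit_ridge <- glmnet(x, y, alpha = 0)

## Coefficients at a chosen penalty weight lambda
coef(fit_ridge, s = 0.1)
```

All coefficients are shrunk towards zero, but none are set exactly to zero.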
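
---

# LASSO in R (sketch)

The same sketch with an L1 penalty instead: in `glmnet`, `alpha = 1` gives pure LASSO (data are the same toy placeholder as on the previous slide):

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y <- rnorm(100)

## alpha = 1 selects a pure LASSO (L1) penalty
fit_lasso <- glmnet(x, y, alpha = 1)

## Unlike ridge, many coefficients are exactly zero: variable selection
coef(fit_lasso, s = 0.1)

## Coefficient paths as the penalty weight varies
plot(fit_lasso, xvar = "lambda")
```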
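
---

# Choosing lambda by cross-validation (sketch)

A sketch of selecting the penalty weight `\(\lambda\)` by k-fold cross-validation with `cv.glmnet` (toy data as before):

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y <- rnorm(100)

## 10-fold cross-validation over a grid of lambda values
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

cv_fit$lambda.min  # lambda minimising cross-validated error
cv_fit$lambda.1se  # largest lambda within 1 SE of that minimum

plot(cv_fit)  # cross-validated error across the lambda grid
```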
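
---

# Elastic net in R (sketch)

A sketch of an elastic net fit: `alpha` between 0 and 1 blends the two penalties. Note that `glmnet` scales the L2 term as `\((1 - \alpha)/2\)`, slightly differently from the formula on the earlier slide. Toy data as before:

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y <- rnorm(100)

## 0 < alpha < 1 mixes the L1 (LASSO) and L2 (ridge) penalties
fit_enet <- glmnet(x, y, alpha = 0.5)

coef(fit_enet, s = 0.1)
```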