class: center, middle, inverse, title-slide

# Regularised regression

---

# Feature-feature correlation in "Prostate"

<img src="fig/rmd-03-corr-mat-prostate-1.png" width="450px" />

---

# Feature-feature correlation in methylation

<img src="fig/rmd-03-corr-mat-meth-1.png" width="450px" />

---

# Finding the best linear model

<img src="fig/rmd-03-regplot-1.png" width="600px" />

---

# Ridge regression restricts the coefficients

<img src="fig/rmd-03-ridgeplot-1.png" width="450px" />

---

# Ridge regression formally

Ridge regression uses the normal linear regression loss function:

$$
\sum_{i=1}^N (y_i - \hat{y}_i)^2
$$

--

combined with the squared *L2 norm* of the coefficients:

$$
\left\lVert \beta \right\rVert_2^2 = \sum_{j=1}^p \beta_j^2
$$

--

This gives us a modified least squares, including a weight, `\(\lambda\)`, for the squared L2 norm:

$$
\sum_{i=1}^N (y_i - \hat{y}_i)^2 + \lambda \left\lVert \beta \right\rVert_2^2
$$

(Code sketches fitting these penalised models in R appear at the end of these slides.)

---

# LASSO restricts the coefficient values

<img src="fig/rmd-03-shrink-lasso-1.png" width="450px" />

---

# LASSO definition

As with ridge regression, LASSO uses the normal linear regression loss function:

$$
\sum_{i=1}^N (y_i - \hat{y}_i)^2
$$

but here it is combined with the *L1 norm* of the coefficients:

$$
\left\lVert \beta \right\rVert_1 = \sum_{j=1}^p |\beta_j|
$$

As before, this gives us a modified least squares, now with an L1 penalty:

$$
\sum_{i=1}^N (y_i - \hat{y}_i)^2 + \lambda \left\lVert \beta \right\rVert_1
$$

---

# Cross-validation is more rigorous than a single test/training split

<img src="fig/cross_validation.png" width="450px" />

---

# Selecting features can work

<img src="fig/rmd-03-heatmap-lasso-1.png" width="500px" />

---

# What is the elastic net?

The elastic net aims to combine the best parts of ridge regression:

- good performance with correlated variables
- generalisable models

--

with the good properties of LASSO:

- sparse models
- variable selection

---

# What is the elastic net?

For the elastic net, we use the normal linear regression loss, a bit of the ridge (L2 norm) penalty, and a bit of the LASSO (L1 norm) penalty:

$$
\left(\sum_{i=1}^N (y_i - \hat{y}_i)^2 \right) + \lambda \alpha \left\lVert \beta \right\rVert_1 + \lambda (1 - \alpha) \left\lVert \beta \right\rVert_2^2
$$

--

`\(\alpha\)` controls the blend of the two penalties: `\(\alpha = 1\)` gives pure LASSO, and `\(\alpha = 0\)` gives pure ridge regression.

---

# The elastic net blends LASSO and ridge regression

<img src="fig/rmd-03-elastic-contour-1.png" width="900px" />
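
---

# Ridge regression in R (sketch)

A minimal sketch of fitting a ridge model, assuming the `glmnet` package and a toy simulated dataset (neither is taken from these slides' source):

```r
library(glmnet)

## Toy data (placeholder): 100 observations, 20 features
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y <- rnorm(100)

## alpha = 0 selects a pure ridge (squared L2) penalty
fit_ridge <- glmnet(x, y, alpha = 0)

## Coefficients at a chosen penalty weight lambda
coef(fit_ridge, s = 0.1)
```

All coefficients are shrunk towards zero, but none are set exactly to zero.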
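
---

# LASSO in R (sketch)

The same sketch with an L1 penalty instead: in `glmnet`, `alpha = 1` gives pure LASSO (data are the same toy placeholder as on the previous slide):

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y <- rnorm(100)

## alpha = 1 selects a pure LASSO (L1) penalty
fit_lasso <- glmnet(x, y, alpha = 1)

## Unlike ridge, many coefficients are exactly zero: variable selection
coef(fit_lasso, s = 0.1)

## Coefficient paths as the penalty weight varies
plot(fit_lasso, xvar = "lambda")
```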
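
---

# Choosing lambda by cross-validation (sketch)

A sketch of selecting the penalty weight `\(\lambda\)` by k-fold cross-validation with `cv.glmnet` (toy data as before):

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y <- rnorm(100)

## 10-fold cross-validation over a grid of lambda values
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

cv_fit$lambda.min  # lambda minimising cross-validated error
cv_fit$lambda.1se  # largest lambda within 1 SE of that minimum

plot(cv_fit)  # cross-validated error across the lambda grid
```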
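
---

# Elastic net in R (sketch)

A sketch of an elastic net fit: `alpha` between 0 and 1 blends the two penalties. Note that `glmnet` scales the L2 term as `\((1 - \alpha)/2\)`, slightly differently from the formula on the earlier slide. Toy data as before:

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y <- rnorm(100)

## 0 < alpha < 1 mixes the L1 (LASSO) and L2 (ridge) penalties
fit_enet <- glmnet(x, y, alpha = 0.5)

coef(fit_enet, s = 0.1)
```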