class: center, middle, inverse, title-slide

# Regression with many features

---

# Heatmap of methylation values

<img src="data:image/png;base64,#/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-heatmap-1.png" width="450px" />

---

# A strong linear association with a continuous predictor

<img src="data:image/png;base64,#/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-example1-1.png" width="450px" />

---

# A strong linear association with a discrete predictor

<img src="data:image/png;base64,#/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-example2-1.png" width="450px" />

---

# A weak linear association with a discrete predictor

<img src="data:image/png;base64,#/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-example3-1.png" width="450px" />

---

# Calculating a t-statistic

The t-statistic for the `\(k^{th}\)` linear model coefficient is defined as

`\(t_{k} = \frac{\hat{\beta}_{k}}{SE\left(\hat{\beta}_{k}\right)}\)`

This means that a large estimate `\(\hat{\beta}_{k}\)` (in absolute value) *or* a small standard error `\(SE\left(\hat{\beta}_{k}\right)\)` leads to a large test statistic.

---

# p-values as measures of the null distribution

<img src="data:image/png;base64,#/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-tdist-1.png" width="450px" />

---

# Effect sizes from lm and limma

<img src="data:image/png;base64,#/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-plot-limma-lm-effect-1.png" width="450px" />

---

# p-values from lm and limma

<img src="data:image/png;base64,#/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-plot-limma-lm-pval-1.png" width="450px" />

---

# Accepting or rejecting the null

| |Reject null |Accept null|
|-------------:|-------------:|--------------:|
|Null is true |False positive |True negative |
|Null is false |True positive|False negative |

---

# Calculating FDR p-values

1. Sort the p-values from smallest to largest.
2. Assign each p-value a rank (1 is smallest).
3. Calculate the critical value for each p-value
    $$ q = \frac{i}{m}Q $$
    where `\(i\)` is the rank, `\(m\)` is the number of tests, and `\(Q\)` is the false discovery rate we want to target.
4. Find the largest p-value that is less than its critical value. That p-value and all smaller ones are significant.

---

# Comparing FWER and FDR properties

|FWER|FDR|
|:-------------|:--------------|
|+ Controls the probability of making at least one false positive|+ Controls the rate of false discoveries|
|+ Strict error rate control |+ Allows error control with less stringency|
|- Very conservative |- Does not control the probability of making errors|
|- Requires larger statistical power|- May result in false discoveries|
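
---

# Sketch: the FDR steps in R

The ranking procedure from the "Calculating FDR p-values" slide can be sketched in base R. This is a minimal illustration, not the lesson's own code. The p-values and the names `pvals`, `Q`, `crit` and `cutoff` are invented for the example. It compares with `<=`, as in the usual Benjamini-Hochberg formulation. The last line cross-checks against base R's `p.adjust()` with `method = "BH"`, which should select the same tests.

```r
# Hypothetical p-values, invented for illustration only
pvals <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205)
Q <- 0.05                                # target false discovery rate (assumed)
m <- length(pvals)                       # number of tests

i <- rank(pvals, ties.method = "first")  # rank 1 = smallest p-value
crit <- i / m * Q                        # critical value q = (i / m) * Q

# Largest p-value that does not exceed its own critical value
passing <- pvals <= crit
cutoff <- if (any(passing)) max(pvals[passing]) else NA_real_

# Everything at or below the cutoff is called significant
significant <- !is.na(cutoff) & pvals <= cutoff

# Cross-check with base R's Benjamini-Hochberg adjustment
significant_bh <- p.adjust(pvals, method = "BH") <= Q
```

With these made-up values, both approaches flag only the two smallest p-values. In practice `p.adjust()` is likely the more convenient route, since it returns adjusted p-values that can be compared with any threshold.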