class: center, middle, inverse, title-slide

# Regression with many features

---

# Heatmap of methylation values
<img src="/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-heatmap-1.png" width="450px" />

---
# A strong linear association with a continuous predictor
<img src="/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-example1-1.png" width="450px" />

---
# A strong linear association with a discrete predictor
<img src="/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-example2-1.png" width="450px" />

---
# A weak linear association with a discrete predictor
<img src="/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-example3-1.png" width="450px" />

---

# Calculating a t-statistic

The t-statistic for the `$k^{th}$` linear model coefficient is defined as

`$t_{k} = \frac{\hat{\beta}_{k}}{SE\left(\hat{\beta}_{k}\right)}$`

This means that large (in absolute value) `$\hat{\beta}_{k}$` estimates *or* small
`$SE\left(\hat{\beta}_{k}\right)$` estimates lead to large (in absolute value) test statistics.
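For the slope of a simple linear regression with an intercept, the calculation can be sketched in Python as below. This is a minimal illustration, not the lesson's code: the function `slope_t_statistic` and the toy data are hypothetical, and it uses the usual least-squares formulas with `$n - 2$` residual degrees of freedom.

```python
import math

def slope_t_statistic(x, y):
    """Return (beta_hat, se, t) for the slope of a simple linear regression y ~ x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx                       # least-squares slope estimate
    intercept = y_bar - beta_hat * x_bar
    rss = sum((yi - intercept - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)                 # residual variance, n - 2 degrees of freedom
    se = math.sqrt(sigma2_hat / sxx)           # standard error of the slope
    return beta_hat, se, beta_hat / se         # t = beta_hat / SE(beta_hat)

# Usage with hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta_hat, se, t = slope_t_statistic(x, y)
print(f"beta_hat={beta_hat:.3f}, SE={se:.4f}, t={t:.1f}")
```

Flipping the sign of `y` flips the signs of `$\hat{\beta}$` and `$t$` but leaves `$SE(\hat{\beta})$` unchanged, so for a two-sided test it is the magnitude of `$t$` that matters.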

---
# p-values as tail areas of the null distribution
<img src="/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-tdist-1.png" width="450px" />
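A p-value is the probability, if the null hypothesis is true, of a test statistic at least as extreme as the one observed, i.e. a tail area of the null distribution. The sketch below assumes the null distribution is Student's t with `df` degrees of freedom (`$n - 2$` for a simple regression slope); the function names are hypothetical, and the tail area is approximated with Simpson's rule instead of a statistics library so that the example has no dependencies.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_obs, df, n_steps=2000):
    """P(|T| >= |t_obs|) for T ~ t(df), using Simpson's rule on [0, |t_obs|]."""
    b = abs(t_obs)
    if b == 0.0:
        return 1.0
    if n_steps % 2:                            # Simpson's rule needs an even number of steps
        n_steps += 1
    h = b / n_steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3                    # approximates P(0 <= T <= |t_obs|)
    # The density is symmetric about 0, so both tails together are 1 - 2 * central
    return max(0.0, 1.0 - 2.0 * central)

# Usage: an observed t-statistic with 10 residual degrees of freedom
print(two_sided_p_value(2.228, df=10))
```

This prints a value close to 0.05, since 2.228 is the familiar two-sided 5% critical value of the t distribution with 10 degrees of freedom.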

---
# Effect sizes from lm and limma
<img src="/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-plot-limma-lm-effect-1.png" width="450px" />

---
# p-values from lm and limma
<img src="/home/alan/Documents/github/carpentries/high-dimensional-stats-r/fig/rmd-02-plot-limma-lm-pval-1.png" width="450px" />

---

# Accepting or rejecting the null

---

# Calculating FDR p-values

1. Sort the p-values from smallest to largest
2. Assign each p-value a rank (1 is smallest)
3. Calculate the critical value for each p-value

    $$
    q = \frac{i}{m}Q
    $$

    where `$i$` is the rank, `$m$` is the number of tests, and `$Q$` is the
    false discovery rate we want to target.
4. Find the largest p-value that is less than or equal to its critical value.
    This p-value and all smaller ones are significant.
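The steps above can be sketched in Python as follows. This is a minimal version of the Benjamini-Hochberg procedure, not the lesson's code; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, Q=0.05):
    """Return one boolean per p-value: True if declared significant at FDR Q."""
    m = len(p_values)
    # Steps 1-2: order the tests by p-value; position i (counting from 1) is the rank
    order = sorted(range(m), key=lambda j: p_values[j])
    # Steps 3-4: find the largest rank i whose p-value is <= its critical value i/m * Q
    cutoff_rank = 0
    for i, j in enumerate(order, start=1):
        if p_values[j] <= i / m * Q:
            cutoff_rank = i
    # That p-value and every smaller one are significant
    significant = [False] * m
    for j in order[:cutoff_rank]:
        significant[j] = True
    return significant

# Usage with hypothetical p-values from m = 5 tests
p = [0.035, 0.001, 0.6, 0.029, 0.021]
print(benjamini_hochberg(p, Q=0.05))   # [True, True, False, True, True]
```

Here 0.021 has rank 2 and exceeds its own critical value (`$\frac{2}{5} \times 0.05 = 0.02$`), yet it is still significant because the p-value at rank 4 (0.035, critical value 0.04) passes. This is why step 4 looks for the *largest* p-value that passes, rather than stopping at the first one that fails.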

---

# Comparing FWER and FDR properties

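|FWER (family-wise error rate)|FDR (false discovery rate)|
|:-------------|:--------------|
|+ Controls the probability of making at least one false positive|+ Controls the expected proportion of false positives among the discoveries|
|+ Strict error rate control|+ Less stringent error control, so more true effects can be detected|
|- Very conservative|- Does not control the probability of making any false positive|
|- Lower statistical power: larger effects or samples are needed to detect true effects|- Some of the reported discoveries are expected to be false|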