This lesson is still being designed and assembled (Pre-Alpha version)

# Differential expression analysis

## Overview

Teaching: XX min
Exercises: XX min
Questions
• How do we find differentially expressed genes?

Objectives
• Explain the steps involved in a differential expression analysis.

• Explain how to perform these steps in R, using DESeq2.

## Contribute!

This episode is intended to introduce the concepts required to perform differential expression analysis with RNA-seq data. Explain concepts like size factors, count modeling (Negative Binomial), dispersion, interpretation of the test output, multiple testing correction.

``````suppressPackageStartupMessages({
library(SummarizedExperiment)
library(DESeq2)
library(ggplot2)
library(ExploreModelMatrix)
library(cowplot)
library(ComplexHeatmap)
})
``````

``````se <- readRDS("data/GSE96870_se.rds")
``````

# Create DESeqDataSet

``````dds <- DESeq2::DESeqDataSet(se[, se\$tissue == "Cerebellum"],
design = ~ sex + time)
``````
``````Warning in DESeq2::DESeqDataSet(se[, se\$tissue == "Cerebellum"], design = ~sex + : some variables in design formula are
characters, converting to factors
``````

# Run DESeq()

## Contribute!

The concepts may be clearer if the steps of DESeq() are first performed separately, followed by a note that they can be performed in a single step using DESeq().

``````dds <- DESeq2::DESeq(dds)
``````
``````estimating size factors
``````
``````estimating dispersions
``````
``````gene-wise dispersion estimates
``````
``````mean-dispersion relationship
``````
``````final dispersion estimates
``````
``````fitting model and testing
``````
``````plotDispEsts(dds)
`````` # Extract results for specific contrasts

## Contribute!

Refer back to the episode about experimental design.

``````## Day 8 vs Day 0
resTime <- DESeq2::results(dds, contrast = c("time", "Day8", "Day0"))
summary(resTime)
``````
``````
out of 32652 with nonzero total read count
LFC > 0 (up)       : 4472, 14%
LFC < 0 (down)     : 4276, 13%
outliers        : 10, 0.031%
low counts      : 8732, 27%
(mean count < 1)
 see 'cooksCutoff' argument of ?results
 see 'independentFiltering' argument of ?results
``````
``````head(resTime[order(resTime\$pvalue), ])
``````
``````log2 fold change (MLE): time Day8 vs Day0
Wald test p-value: time Day8 vs Day0
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange     lfcSE      stat      pvalue        padj
<numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
Asl             701.343        1.11733 0.0592541   18.8565 2.59885e-79 6.21386e-75
Apod          18765.146        1.44698 0.0805186   17.9708 3.30147e-72 3.94690e-68
Cyp2d22        2550.480        0.91020 0.0554756   16.4072 1.69794e-60 1.35326e-56
Klk6            546.503       -1.67190 0.1058989  -15.7877 3.78228e-56 2.26086e-52
Fcrls           184.235       -1.94701 0.1279847  -15.2128 2.90708e-52 1.39017e-48
A330076C08Rik   107.250       -1.74995 0.1154279  -15.1606 6.45112e-52 2.57077e-48
``````
``````DESeq2::plotMA(resTime)
`````` ``````## Male vs Female
resSex <- DESeq2::results(dds, contrast = c("sex", "Male", "Female"))
summary(resSex)
``````
``````
out of 32652 with nonzero total read count
LFC > 0 (up)       : 53, 0.16%
LFC < 0 (down)     : 71, 0.22%
outliers        : 10, 0.031%
low counts      : 13717, 42%
(mean count < 6)
 see 'cooksCutoff' argument of ?results
 see 'independentFiltering' argument of ?results
``````
``````head(resSex[order(resSex\$pvalue), ])
``````
``````log2 fold change (MLE): sex Male vs Female
Wald test p-value: sex Male vs Female
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange     lfcSE      stat       pvalue         padj
<numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
Xist         22603.0359      -11.60429  0.336282  -34.5076 6.16852e-261 1.16739e-256
Ddx3y         2072.9436       11.87241  0.397493   29.8683 5.08722e-196 4.81378e-192
Eif2s3y       1410.8750       12.62514  0.565216   22.3369 1.62066e-110 1.02237e-106
Kdm5d          692.1672       12.55386  0.593627   21.1477  2.89566e-99  1.37001e-95
Uty            667.4375       12.01728  0.593591   20.2451  3.92780e-91  1.48667e-87
LOC105243748    52.9669        9.08325  0.597624   15.1989  3.59432e-52  1.13371e-48
``````
``````DESeq2::plotMA(resSex)
`````` # Visualize selected set of genes

## Contribute!

Here we intend to practice how to interpret the results from the differential expression analysis. Refer back to the exploratory/QC episode.

``````vsd <- DESeq2::vst(dds, blind = TRUE)

heatmapData <- assay(vsd)[genes, ]
heatmapData <- t(scale(t(heatmapData)))
heatmapColAnnot <- data.frame(colData(vsd)[, c("time", "sex")])
idx <- order(vsd\$time)
heatmapData <- heatmapData[, idx]
heatmapColAnnot <- HeatmapAnnotation(df = heatmapColAnnot[idx, ])
ComplexHeatmap::Heatmap(heatmapData,
top_annotation = heatmapColAnnot,
cluster_rows = TRUE, cluster_columns = FALSE)
`````` • Key point 1