Extra exploration of design matrices

Last updated on 2023-09-02 | Edit this page

Estimated time 60 minutes

Overview

Questions

  • How can one translate biological questions and comparisons to statistical terms suitable for use with RNA-seq analysis packages?

Objectives

  • Explain the formula notation and design matrices.
  • Explore different designs and learn how to interpret coefficients.

Loading required packages and reading data


We start by loading a few packages that will be needed in this episode. In particular, the ExploreModelMatrix package provides resources for exploring design matrices in a graphical fashion, for easier interpretation.

R

suppressPackageStartupMessages({
    library(SummarizedExperiment)
    library(ExploreModelMatrix)
    library(dplyr)
})

Next, we read the metadata table for our data set. Because we want to explore many different design matrices, we will read in the 4th file we downloaded but haven’t used yet: that for both Cerebellum and Spinal Cord samples (45 samples total). As seen in previous episodes, the metadata contains information about the age, sex, infection status, time of measurement and tissue of the collected samples. Note that Day0 always corresponds to non-infected samples, and that infected samples are collected on days 4 and 8. Moreover, all mice have the same age (8 weeks). Hence, in the first part of this episode we consider only the sex, tissue and time variables further.

R

meta <- read.csv("data/GSE96870_coldata_all.csv", row.names = 1)
meta

OUTPUT

                     title geo_accession     organism     age    sex
GSM2545336 CNS_RNA-seq_10C    GSM2545336 Mus musculus 8 weeks Female
GSM2545337 CNS_RNA-seq_11C    GSM2545337 Mus musculus 8 weeks Female
GSM2545338 CNS_RNA-seq_12C    GSM2545338 Mus musculus 8 weeks Female
GSM2545339 CNS_RNA-seq_13C    GSM2545339 Mus musculus 8 weeks Female
GSM2545340 CNS_RNA-seq_14C    GSM2545340 Mus musculus 8 weeks   Male
GSM2545341 CNS_RNA-seq_17C    GSM2545341 Mus musculus 8 weeks   Male
GSM2545342  CNS_RNA-seq_1C    GSM2545342 Mus musculus 8 weeks Female
GSM2545343 CNS_RNA-seq_20C    GSM2545343 Mus musculus 8 weeks   Male
GSM2545344 CNS_RNA-seq_21C    GSM2545344 Mus musculus 8 weeks Female
GSM2545345 CNS_RNA-seq_22C    GSM2545345 Mus musculus 8 weeks   Male
GSM2545346 CNS_RNA-seq_25C    GSM2545346 Mus musculus 8 weeks   Male
GSM2545347 CNS_RNA-seq_26C    GSM2545347 Mus musculus 8 weeks   Male
GSM2545348 CNS_RNA-seq_27C    GSM2545348 Mus musculus 8 weeks Female
GSM2545349 CNS_RNA-seq_28C    GSM2545349 Mus musculus 8 weeks   Male
GSM2545350 CNS_RNA-seq_29C    GSM2545350 Mus musculus 8 weeks   Male
GSM2545351  CNS_RNA-seq_2C    GSM2545351 Mus musculus 8 weeks Female
GSM2545352 CNS_RNA-seq_30C    GSM2545352 Mus musculus 8 weeks Female
GSM2545353  CNS_RNA-seq_3C    GSM2545353 Mus musculus 8 weeks Female
GSM2545354  CNS_RNA-seq_4C    GSM2545354 Mus musculus 8 weeks   Male
GSM2545355 CNS_RNA-seq_571    GSM2545355 Mus musculus 8 weeks   Male
GSM2545356 CNS_RNA-seq_574    GSM2545356 Mus musculus 8 weeks   Male
GSM2545357 CNS_RNA-seq_575    GSM2545357 Mus musculus 8 weeks   Male
GSM2545358 CNS_RNA-seq_583    GSM2545358 Mus musculus 8 weeks Female
GSM2545359 CNS_RNA-seq_585    GSM2545359 Mus musculus 8 weeks Female
GSM2545360 CNS_RNA-seq_589    GSM2545360 Mus musculus 8 weeks   Male
GSM2545361 CNS_RNA-seq_590    GSM2545361 Mus musculus 8 weeks   Male
GSM2545362  CNS_RNA-seq_5C    GSM2545362 Mus musculus 8 weeks Female
GSM2545363  CNS_RNA-seq_6C    GSM2545363 Mus musculus 8 weeks   Male
GSM2545364 CNS_RNA-seq_709    GSM2545364 Mus musculus 8 weeks Female
GSM2545365 CNS_RNA-seq_710    GSM2545365 Mus musculus 8 weeks Female
GSM2545366 CNS_RNA-seq_711    GSM2545366 Mus musculus 8 weeks Female
GSM2545367 CNS_RNA-seq_713    GSM2545367 Mus musculus 8 weeks   Male
GSM2545368 CNS_RNA-seq_728    GSM2545368 Mus musculus 8 weeks   Male
GSM2545369 CNS_RNA-seq_729    GSM2545369 Mus musculus 8 weeks   Male
GSM2545370 CNS_RNA-seq_730    GSM2545370 Mus musculus 8 weeks Female
GSM2545371 CNS_RNA-seq_731    GSM2545371 Mus musculus 8 weeks Female
GSM2545372 CNS_RNA-seq_733    GSM2545372 Mus musculus 8 weeks   Male
GSM2545373 CNS_RNA-seq_735    GSM2545373 Mus musculus 8 weeks   Male
GSM2545374 CNS_RNA-seq_736    GSM2545374 Mus musculus 8 weeks Female
GSM2545375 CNS_RNA-seq_738    GSM2545375 Mus musculus 8 weeks Female
GSM2545376 CNS_RNA-seq_740    GSM2545376 Mus musculus 8 weeks Female
GSM2545377 CNS_RNA-seq_741    GSM2545377 Mus musculus 8 weeks Female
GSM2545378 CNS_RNA-seq_742    GSM2545378 Mus musculus 8 weeks   Male
GSM2545379 CNS_RNA-seq_743    GSM2545379 Mus musculus 8 weeks   Male
GSM2545380  CNS_RNA-seq_9C    GSM2545380 Mus musculus 8 weeks Female
             infection  strain time     tissue mouse
GSM2545336  InfluenzaA C57BL/6 Day8 Cerebellum    14
GSM2545337 NonInfected C57BL/6 Day0 Cerebellum     9
GSM2545338 NonInfected C57BL/6 Day0 Cerebellum    10
GSM2545339  InfluenzaA C57BL/6 Day4 Cerebellum    15
GSM2545340  InfluenzaA C57BL/6 Day4 Cerebellum    18
GSM2545341  InfluenzaA C57BL/6 Day8 Cerebellum     6
GSM2545342  InfluenzaA C57BL/6 Day8 Cerebellum     5
GSM2545343 NonInfected C57BL/6 Day0 Cerebellum    11
GSM2545344  InfluenzaA C57BL/6 Day4 Cerebellum    22
GSM2545345  InfluenzaA C57BL/6 Day4 Cerebellum    13
GSM2545346  InfluenzaA C57BL/6 Day8 Cerebellum    23
GSM2545347  InfluenzaA C57BL/6 Day8 Cerebellum    24
GSM2545348 NonInfected C57BL/6 Day0 Cerebellum     8
GSM2545349 NonInfected C57BL/6 Day0 Cerebellum     7
GSM2545350  InfluenzaA C57BL/6 Day4 Cerebellum     1
GSM2545351  InfluenzaA C57BL/6 Day8 Cerebellum    16
GSM2545352  InfluenzaA C57BL/6 Day4 Cerebellum    21
GSM2545353 NonInfected C57BL/6 Day0 Cerebellum     4
GSM2545354 NonInfected C57BL/6 Day0 Cerebellum     2
GSM2545355  InfluenzaA C57BL/6 Day4 Spinalcord     1
GSM2545356 NonInfected C57BL/6 Day0 Spinalcord     2
GSM2545357 NonInfected C57BL/6 Day0 Spinalcord     3
GSM2545358 NonInfected C57BL/6 Day0 Spinalcord     4
GSM2545359  InfluenzaA C57BL/6 Day8 Spinalcord     5
GSM2545360  InfluenzaA C57BL/6 Day8 Spinalcord     6
GSM2545361 NonInfected C57BL/6 Day0 Spinalcord     7
GSM2545362  InfluenzaA C57BL/6 Day4 Cerebellum    20
GSM2545363  InfluenzaA C57BL/6 Day4 Cerebellum    12
GSM2545364 NonInfected C57BL/6 Day0 Spinalcord     8
GSM2545365 NonInfected C57BL/6 Day0 Spinalcord     9
GSM2545366 NonInfected C57BL/6 Day0 Spinalcord    10
GSM2545367 NonInfected C57BL/6 Day0 Spinalcord    11
GSM2545368  InfluenzaA C57BL/6 Day4 Spinalcord    12
GSM2545369  InfluenzaA C57BL/6 Day4 Spinalcord    13
GSM2545370  InfluenzaA C57BL/6 Day8 Spinalcord    14
GSM2545371  InfluenzaA C57BL/6 Day4 Spinalcord    15
GSM2545372  InfluenzaA C57BL/6 Day8 Spinalcord    17
GSM2545373  InfluenzaA C57BL/6 Day4 Spinalcord    18
GSM2545374  InfluenzaA C57BL/6 Day8 Spinalcord    19
GSM2545375  InfluenzaA C57BL/6 Day4 Spinalcord    20
GSM2545376  InfluenzaA C57BL/6 Day4 Spinalcord    21
GSM2545377  InfluenzaA C57BL/6 Day4 Spinalcord    22
GSM2545378  InfluenzaA C57BL/6 Day8 Spinalcord    23
GSM2545379  InfluenzaA C57BL/6 Day8 Spinalcord    24
GSM2545380  InfluenzaA C57BL/6 Day8 Cerebellum    19

R

table(meta$time, meta$infection)

OUTPUT

      
       InfluenzaA NonInfected
  Day0          0          15
  Day4         16           0
  Day8         14           0

R

table(meta$age)

OUTPUT


8 weeks 
     45 

We can start by visualizing the number of observations for each combination of the three predictor variables.

R

vd <- VisualizeDesign(sampleData = meta, 
                      designFormula = ~ tissue + time + sex)
vd$cooccurrenceplots

OUTPUT

$`tissue = Cerebellum`

OUTPUT


$`tissue = Spinalcord`

Challenge

Based on this visualization, would you say that the data set is balanced, or are there combinations of predictor variables that are severely over- or underrepresented?

Compare males and females, non-infected spinal cord


Next, we will set up our first design matrix. Here, we will focus on the uninfected (Day0) spinal cord samples, and our aim is to compare the male and female mice. Thus, we first subset the metadata to only the samples of interest, and next set up and visualize the design matrix with a single predictor variable (sex). By defining the design formula as ~ sex, we tell R to include an intercept in the design. This intercept will represent the ‘baseline’ level of the predictor variable, which in this case is selected to be the Female mice. If not explicitly specified, R will order the values of the predictor in alphabetical order and select the first one as the reference or baseline level.

R

## Subset metadata
meta_noninf_spc <- meta %>% filter(time == "Day0" & 
                                       tissue == "Spinalcord")
meta_noninf_spc

OUTPUT

                     title geo_accession     organism     age    sex
GSM2545356 CNS_RNA-seq_574    GSM2545356 Mus musculus 8 weeks   Male
GSM2545357 CNS_RNA-seq_575    GSM2545357 Mus musculus 8 weeks   Male
GSM2545358 CNS_RNA-seq_583    GSM2545358 Mus musculus 8 weeks Female
GSM2545361 CNS_RNA-seq_590    GSM2545361 Mus musculus 8 weeks   Male
GSM2545364 CNS_RNA-seq_709    GSM2545364 Mus musculus 8 weeks Female
GSM2545365 CNS_RNA-seq_710    GSM2545365 Mus musculus 8 weeks Female
GSM2545366 CNS_RNA-seq_711    GSM2545366 Mus musculus 8 weeks Female
GSM2545367 CNS_RNA-seq_713    GSM2545367 Mus musculus 8 weeks   Male
             infection  strain time     tissue mouse
GSM2545356 NonInfected C57BL/6 Day0 Spinalcord     2
GSM2545357 NonInfected C57BL/6 Day0 Spinalcord     3
GSM2545358 NonInfected C57BL/6 Day0 Spinalcord     4
GSM2545361 NonInfected C57BL/6 Day0 Spinalcord     7
GSM2545364 NonInfected C57BL/6 Day0 Spinalcord     8
GSM2545365 NonInfected C57BL/6 Day0 Spinalcord     9
GSM2545366 NonInfected C57BL/6 Day0 Spinalcord    10
GSM2545367 NonInfected C57BL/6 Day0 Spinalcord    11

R

## Use ExploreModelMatrix to create a design matrix and visualizations, given 
## the desired design formula. 
vd <- VisualizeDesign(sampleData = meta_noninf_spc, 
                      designFormula = ~ sex)
vd$designmatrix

OUTPUT

           (Intercept) sexMale
GSM2545356           1       1
GSM2545357           1       1
GSM2545358           1       0
GSM2545361           1       1
GSM2545364           1       0
GSM2545365           1       0
GSM2545366           1       0
GSM2545367           1       1

R

vd$plotlist

OUTPUT

[[1]]

R

## Note that we can also generate the design matrix like this
model.matrix(~ sex, data = meta_noninf_spc)

OUTPUT

           (Intercept) sexMale
GSM2545356           1       1
GSM2545357           1       1
GSM2545358           1       0
GSM2545361           1       1
GSM2545364           1       0
GSM2545365           1       0
GSM2545366           1       0
GSM2545367           1       1
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$sex
[1] "contr.treatment"

Challenge

With this design, what is the interpretation of the sexMale coefficient?

Challenge

Set up the design formula to compare male and female spinal cord samples from Day0 as above, but instruct R to not include an intercept in the model. How does this change the interpretation of the coefficients? What contrast would have to be specified to compare the mean expression of a gene between male and female mice?

R

meta_noninf_spc <- meta %>% filter(time == "Day0" & 
                                       tissue == "Spinalcord")
meta_noninf_spc

OUTPUT

                     title geo_accession     organism     age    sex
GSM2545356 CNS_RNA-seq_574    GSM2545356 Mus musculus 8 weeks   Male
GSM2545357 CNS_RNA-seq_575    GSM2545357 Mus musculus 8 weeks   Male
GSM2545358 CNS_RNA-seq_583    GSM2545358 Mus musculus 8 weeks Female
GSM2545361 CNS_RNA-seq_590    GSM2545361 Mus musculus 8 weeks   Male
GSM2545364 CNS_RNA-seq_709    GSM2545364 Mus musculus 8 weeks Female
GSM2545365 CNS_RNA-seq_710    GSM2545365 Mus musculus 8 weeks Female
GSM2545366 CNS_RNA-seq_711    GSM2545366 Mus musculus 8 weeks Female
GSM2545367 CNS_RNA-seq_713    GSM2545367 Mus musculus 8 weeks   Male
             infection  strain time     tissue mouse
GSM2545356 NonInfected C57BL/6 Day0 Spinalcord     2
GSM2545357 NonInfected C57BL/6 Day0 Spinalcord     3
GSM2545358 NonInfected C57BL/6 Day0 Spinalcord     4
GSM2545361 NonInfected C57BL/6 Day0 Spinalcord     7
GSM2545364 NonInfected C57BL/6 Day0 Spinalcord     8
GSM2545365 NonInfected C57BL/6 Day0 Spinalcord     9
GSM2545366 NonInfected C57BL/6 Day0 Spinalcord    10
GSM2545367 NonInfected C57BL/6 Day0 Spinalcord    11

R

vd <- VisualizeDesign(sampleData = meta_noninf_spc, 
                      designFormula = ~ 0 + sex)
vd$designmatrix

OUTPUT

           sexFemale sexMale
GSM2545356         0       1
GSM2545357         0       1
GSM2545358         1       0
GSM2545361         0       1
GSM2545364         1       0
GSM2545365         1       0
GSM2545366         1       0
GSM2545367         0       1

R

vd$plotlist

OUTPUT

[[1]]

Challenge

Set up the design formula to compare the three time points (Day0, Day4, Day8) in the male spinal cord samples, and visualize it using ExploreModelMatrix.

R

meta_male_spc <- meta %>% filter(sex == "Male" & tissue == "Spinalcord")
meta_male_spc

OUTPUT

                     title geo_accession     organism     age  sex   infection
GSM2545355 CNS_RNA-seq_571    GSM2545355 Mus musculus 8 weeks Male  InfluenzaA
GSM2545356 CNS_RNA-seq_574    GSM2545356 Mus musculus 8 weeks Male NonInfected
GSM2545357 CNS_RNA-seq_575    GSM2545357 Mus musculus 8 weeks Male NonInfected
GSM2545360 CNS_RNA-seq_589    GSM2545360 Mus musculus 8 weeks Male  InfluenzaA
GSM2545361 CNS_RNA-seq_590    GSM2545361 Mus musculus 8 weeks Male NonInfected
GSM2545367 CNS_RNA-seq_713    GSM2545367 Mus musculus 8 weeks Male NonInfected
GSM2545368 CNS_RNA-seq_728    GSM2545368 Mus musculus 8 weeks Male  InfluenzaA
GSM2545369 CNS_RNA-seq_729    GSM2545369 Mus musculus 8 weeks Male  InfluenzaA
GSM2545372 CNS_RNA-seq_733    GSM2545372 Mus musculus 8 weeks Male  InfluenzaA
GSM2545373 CNS_RNA-seq_735    GSM2545373 Mus musculus 8 weeks Male  InfluenzaA
GSM2545378 CNS_RNA-seq_742    GSM2545378 Mus musculus 8 weeks Male  InfluenzaA
GSM2545379 CNS_RNA-seq_743    GSM2545379 Mus musculus 8 weeks Male  InfluenzaA
            strain time     tissue mouse
GSM2545355 C57BL/6 Day4 Spinalcord     1
GSM2545356 C57BL/6 Day0 Spinalcord     2
GSM2545357 C57BL/6 Day0 Spinalcord     3
GSM2545360 C57BL/6 Day8 Spinalcord     6
GSM2545361 C57BL/6 Day0 Spinalcord     7
GSM2545367 C57BL/6 Day0 Spinalcord    11
GSM2545368 C57BL/6 Day4 Spinalcord    12
GSM2545369 C57BL/6 Day4 Spinalcord    13
GSM2545372 C57BL/6 Day8 Spinalcord    17
GSM2545373 C57BL/6 Day4 Spinalcord    18
GSM2545378 C57BL/6 Day8 Spinalcord    23
GSM2545379 C57BL/6 Day8 Spinalcord    24

R

vd <- VisualizeDesign(sampleData = meta_male_spc, designFormula = ~ time)
vd$designmatrix

OUTPUT

           (Intercept) timeDay4 timeDay8
GSM2545355           1        1        0
GSM2545356           1        0        0
GSM2545357           1        0        0
GSM2545360           1        0        1
GSM2545361           1        0        0
GSM2545367           1        0        0
GSM2545368           1        1        0
GSM2545369           1        1        0
GSM2545372           1        0        1
GSM2545373           1        1        0
GSM2545378           1        0        1
GSM2545379           1        0        1

R

vd$plotlist

OUTPUT

[[1]]

Factorial design without interactions


Next, we again consider only non-infected mice, but fit a model incorporating both sex and tissue as predictors. We assume that the tissue differences are the same for both male and female mice, and consequently fit an additive model, without interaction terms.

R

meta_noninf <- meta %>% filter(time == "Day0")
meta_noninf

OUTPUT

                     title geo_accession     organism     age    sex
GSM2545337 CNS_RNA-seq_11C    GSM2545337 Mus musculus 8 weeks Female
GSM2545338 CNS_RNA-seq_12C    GSM2545338 Mus musculus 8 weeks Female
GSM2545343 CNS_RNA-seq_20C    GSM2545343 Mus musculus 8 weeks   Male
GSM2545348 CNS_RNA-seq_27C    GSM2545348 Mus musculus 8 weeks Female
GSM2545349 CNS_RNA-seq_28C    GSM2545349 Mus musculus 8 weeks   Male
GSM2545353  CNS_RNA-seq_3C    GSM2545353 Mus musculus 8 weeks Female
GSM2545354  CNS_RNA-seq_4C    GSM2545354 Mus musculus 8 weeks   Male
GSM2545356 CNS_RNA-seq_574    GSM2545356 Mus musculus 8 weeks   Male
GSM2545357 CNS_RNA-seq_575    GSM2545357 Mus musculus 8 weeks   Male
GSM2545358 CNS_RNA-seq_583    GSM2545358 Mus musculus 8 weeks Female
GSM2545361 CNS_RNA-seq_590    GSM2545361 Mus musculus 8 weeks   Male
GSM2545364 CNS_RNA-seq_709    GSM2545364 Mus musculus 8 weeks Female
GSM2545365 CNS_RNA-seq_710    GSM2545365 Mus musculus 8 weeks Female
GSM2545366 CNS_RNA-seq_711    GSM2545366 Mus musculus 8 weeks Female
GSM2545367 CNS_RNA-seq_713    GSM2545367 Mus musculus 8 weeks   Male
             infection  strain time     tissue mouse
GSM2545337 NonInfected C57BL/6 Day0 Cerebellum     9
GSM2545338 NonInfected C57BL/6 Day0 Cerebellum    10
GSM2545343 NonInfected C57BL/6 Day0 Cerebellum    11
GSM2545348 NonInfected C57BL/6 Day0 Cerebellum     8
GSM2545349 NonInfected C57BL/6 Day0 Cerebellum     7
GSM2545353 NonInfected C57BL/6 Day0 Cerebellum     4
GSM2545354 NonInfected C57BL/6 Day0 Cerebellum     2
GSM2545356 NonInfected C57BL/6 Day0 Spinalcord     2
GSM2545357 NonInfected C57BL/6 Day0 Spinalcord     3
GSM2545358 NonInfected C57BL/6 Day0 Spinalcord     4
GSM2545361 NonInfected C57BL/6 Day0 Spinalcord     7
GSM2545364 NonInfected C57BL/6 Day0 Spinalcord     8
GSM2545365 NonInfected C57BL/6 Day0 Spinalcord     9
GSM2545366 NonInfected C57BL/6 Day0 Spinalcord    10
GSM2545367 NonInfected C57BL/6 Day0 Spinalcord    11

R

vd <- VisualizeDesign(sampleData = meta_noninf, 
                      designFormula = ~ sex + tissue)
vd$designmatrix

OUTPUT

           (Intercept) sexMale tissueSpinalcord
GSM2545337           1       0                0
GSM2545338           1       0                0
GSM2545343           1       1                0
GSM2545348           1       0                0
GSM2545349           1       1                0
GSM2545353           1       0                0
GSM2545354           1       1                0
GSM2545356           1       1                1
GSM2545357           1       1                1
GSM2545358           1       0                1
GSM2545361           1       1                1
GSM2545364           1       0                1
GSM2545365           1       0                1
GSM2545366           1       0                1
GSM2545367           1       1                1

R

vd$plotlist

OUTPUT

[[1]]

Factorial design with interactions


In the previous model, we assumed that the tissue differences were the same for both male and female mice. To allow for the estimation of sex-specific tissue differences (at the expense of having one additional coefficient to estimate from the data), we can include an interaction term in the model.

R

meta_noninf <- meta %>% filter(time == "Day0")
meta_noninf

OUTPUT

                     title geo_accession     organism     age    sex
GSM2545337 CNS_RNA-seq_11C    GSM2545337 Mus musculus 8 weeks Female
GSM2545338 CNS_RNA-seq_12C    GSM2545338 Mus musculus 8 weeks Female
GSM2545343 CNS_RNA-seq_20C    GSM2545343 Mus musculus 8 weeks   Male
GSM2545348 CNS_RNA-seq_27C    GSM2545348 Mus musculus 8 weeks Female
GSM2545349 CNS_RNA-seq_28C    GSM2545349 Mus musculus 8 weeks   Male
GSM2545353  CNS_RNA-seq_3C    GSM2545353 Mus musculus 8 weeks Female
GSM2545354  CNS_RNA-seq_4C    GSM2545354 Mus musculus 8 weeks   Male
GSM2545356 CNS_RNA-seq_574    GSM2545356 Mus musculus 8 weeks   Male
GSM2545357 CNS_RNA-seq_575    GSM2545357 Mus musculus 8 weeks   Male
GSM2545358 CNS_RNA-seq_583    GSM2545358 Mus musculus 8 weeks Female
GSM2545361 CNS_RNA-seq_590    GSM2545361 Mus musculus 8 weeks   Male
GSM2545364 CNS_RNA-seq_709    GSM2545364 Mus musculus 8 weeks Female
GSM2545365 CNS_RNA-seq_710    GSM2545365 Mus musculus 8 weeks Female
GSM2545366 CNS_RNA-seq_711    GSM2545366 Mus musculus 8 weeks Female
GSM2545367 CNS_RNA-seq_713    GSM2545367 Mus musculus 8 weeks   Male
             infection  strain time     tissue mouse
GSM2545337 NonInfected C57BL/6 Day0 Cerebellum     9
GSM2545338 NonInfected C57BL/6 Day0 Cerebellum    10
GSM2545343 NonInfected C57BL/6 Day0 Cerebellum    11
GSM2545348 NonInfected C57BL/6 Day0 Cerebellum     8
GSM2545349 NonInfected C57BL/6 Day0 Cerebellum     7
GSM2545353 NonInfected C57BL/6 Day0 Cerebellum     4
GSM2545354 NonInfected C57BL/6 Day0 Cerebellum     2
GSM2545356 NonInfected C57BL/6 Day0 Spinalcord     2
GSM2545357 NonInfected C57BL/6 Day0 Spinalcord     3
GSM2545358 NonInfected C57BL/6 Day0 Spinalcord     4
GSM2545361 NonInfected C57BL/6 Day0 Spinalcord     7
GSM2545364 NonInfected C57BL/6 Day0 Spinalcord     8
GSM2545365 NonInfected C57BL/6 Day0 Spinalcord     9
GSM2545366 NonInfected C57BL/6 Day0 Spinalcord    10
GSM2545367 NonInfected C57BL/6 Day0 Spinalcord    11

R

## Define a design including an interaction term
## Note that ~ sex * tissue is equivalent to 
## ~ sex + tissue + sex:tissue
vd <- VisualizeDesign(sampleData = meta_noninf, 
                      designFormula = ~ sex * tissue)
vd$designmatrix

OUTPUT

           (Intercept) sexMale tissueSpinalcord sexMale:tissueSpinalcord
GSM2545337           1       0                0                        0
GSM2545338           1       0                0                        0
GSM2545343           1       1                0                        0
GSM2545348           1       0                0                        0
GSM2545349           1       1                0                        0
GSM2545353           1       0                0                        0
GSM2545354           1       1                0                        0
GSM2545356           1       1                1                        1
GSM2545357           1       1                1                        1
GSM2545358           1       0                1                        0
GSM2545361           1       1                1                        1
GSM2545364           1       0                1                        0
GSM2545365           1       0                1                        0
GSM2545366           1       0                1                        0
GSM2545367           1       1                1                        1

R

vd$plotlist

OUTPUT

[[1]]

Paired design


In this particular data set the samples are paired - the same mice have contributed both the cerebellum and spinal cord samples. This information was not included in the previous models. However, accounting for it can increase power to detect tissue differences by eliminating variability in baseline expression levels between mice. Here, we define a paired design for the female non-infected mice, aimed at testing for differences between tissues after accounting for baseline differences between mice.

R

meta_fem_day0 <- meta %>% filter(sex == "Female" & 
                                     time == "Day0")

# ensure that mouse is treated as a categorical variable
meta_fem_day0$mouse <- factor(meta_fem_day0$mouse)

meta_fem_day0

OUTPUT

                     title geo_accession     organism     age    sex
GSM2545337 CNS_RNA-seq_11C    GSM2545337 Mus musculus 8 weeks Female
GSM2545338 CNS_RNA-seq_12C    GSM2545338 Mus musculus 8 weeks Female
GSM2545348 CNS_RNA-seq_27C    GSM2545348 Mus musculus 8 weeks Female
GSM2545353  CNS_RNA-seq_3C    GSM2545353 Mus musculus 8 weeks Female
GSM2545358 CNS_RNA-seq_583    GSM2545358 Mus musculus 8 weeks Female
GSM2545364 CNS_RNA-seq_709    GSM2545364 Mus musculus 8 weeks Female
GSM2545365 CNS_RNA-seq_710    GSM2545365 Mus musculus 8 weeks Female
GSM2545366 CNS_RNA-seq_711    GSM2545366 Mus musculus 8 weeks Female
             infection  strain time     tissue mouse
GSM2545337 NonInfected C57BL/6 Day0 Cerebellum     9
GSM2545338 NonInfected C57BL/6 Day0 Cerebellum    10
GSM2545348 NonInfected C57BL/6 Day0 Cerebellum     8
GSM2545353 NonInfected C57BL/6 Day0 Cerebellum     4
GSM2545358 NonInfected C57BL/6 Day0 Spinalcord     4
GSM2545364 NonInfected C57BL/6 Day0 Spinalcord     8
GSM2545365 NonInfected C57BL/6 Day0 Spinalcord     9
GSM2545366 NonInfected C57BL/6 Day0 Spinalcord    10

R

vd <- VisualizeDesign(sampleData = meta_fem_day0,
                      designFormula = ~ mouse + tissue)
vd$designmatrix

OUTPUT

           (Intercept) mouse8 mouse9 mouse10 tissueSpinalcord
GSM2545337           1      0      1       0                0
GSM2545338           1      0      0       1                0
GSM2545348           1      1      0       0                0
GSM2545353           1      0      0       0                0
GSM2545358           1      0      0       0                1
GSM2545364           1      1      0       0                1
GSM2545365           1      0      1       0                1
GSM2545366           1      0      0       1                1

R

vd$plotlist

OUTPUT

[[1]]

Within- and between-subject comparisons


In some situations, we need to combine the types of models considered above. For example, let’s say that we want to investigate if the tissue differences are different for infected and non-infected female mice. In this case, each mice only contributes to one of the infection groups (each mice is either infected or non-infected), but contributes both a cerebellum and a spinal cord sample. One way to view this type of design is as two paired experiments, one for each infection group (see the edgeR user guide section 3.5). We can then easily compare the two tissues in each infection group, and contrast the tissue differences between the infection groups.

R

meta_fem_day04 <- meta %>% 
    filter(sex == "Female" & 
               time %in% c("Day0", "Day4")) %>%
    droplevels()
# ensure that mouse is treated as a categorical variable
meta_fem_day04$mouse <- factor(meta_fem_day04$mouse)

meta_fem_day04

OUTPUT

                     title geo_accession     organism     age    sex
GSM2545337 CNS_RNA-seq_11C    GSM2545337 Mus musculus 8 weeks Female
GSM2545338 CNS_RNA-seq_12C    GSM2545338 Mus musculus 8 weeks Female
GSM2545339 CNS_RNA-seq_13C    GSM2545339 Mus musculus 8 weeks Female
GSM2545344 CNS_RNA-seq_21C    GSM2545344 Mus musculus 8 weeks Female
GSM2545348 CNS_RNA-seq_27C    GSM2545348 Mus musculus 8 weeks Female
GSM2545352 CNS_RNA-seq_30C    GSM2545352 Mus musculus 8 weeks Female
GSM2545353  CNS_RNA-seq_3C    GSM2545353 Mus musculus 8 weeks Female
GSM2545358 CNS_RNA-seq_583    GSM2545358 Mus musculus 8 weeks Female
GSM2545362  CNS_RNA-seq_5C    GSM2545362 Mus musculus 8 weeks Female
GSM2545364 CNS_RNA-seq_709    GSM2545364 Mus musculus 8 weeks Female
GSM2545365 CNS_RNA-seq_710    GSM2545365 Mus musculus 8 weeks Female
GSM2545366 CNS_RNA-seq_711    GSM2545366 Mus musculus 8 weeks Female
GSM2545371 CNS_RNA-seq_731    GSM2545371 Mus musculus 8 weeks Female
GSM2545375 CNS_RNA-seq_738    GSM2545375 Mus musculus 8 weeks Female
GSM2545376 CNS_RNA-seq_740    GSM2545376 Mus musculus 8 weeks Female
GSM2545377 CNS_RNA-seq_741    GSM2545377 Mus musculus 8 weeks Female
             infection  strain time     tissue mouse
GSM2545337 NonInfected C57BL/6 Day0 Cerebellum     9
GSM2545338 NonInfected C57BL/6 Day0 Cerebellum    10
GSM2545339  InfluenzaA C57BL/6 Day4 Cerebellum    15
GSM2545344  InfluenzaA C57BL/6 Day4 Cerebellum    22
GSM2545348 NonInfected C57BL/6 Day0 Cerebellum     8
GSM2545352  InfluenzaA C57BL/6 Day4 Cerebellum    21
GSM2545353 NonInfected C57BL/6 Day0 Cerebellum     4
GSM2545358 NonInfected C57BL/6 Day0 Spinalcord     4
GSM2545362  InfluenzaA C57BL/6 Day4 Cerebellum    20
GSM2545364 NonInfected C57BL/6 Day0 Spinalcord     8
GSM2545365 NonInfected C57BL/6 Day0 Spinalcord     9
GSM2545366 NonInfected C57BL/6 Day0 Spinalcord    10
GSM2545371  InfluenzaA C57BL/6 Day4 Spinalcord    15
GSM2545375  InfluenzaA C57BL/6 Day4 Spinalcord    20
GSM2545376  InfluenzaA C57BL/6 Day4 Spinalcord    21
GSM2545377  InfluenzaA C57BL/6 Day4 Spinalcord    22

R

design <- model.matrix(~ mouse, data = meta_fem_day04)
design <- cbind(design, 
                Spc.Day0 = meta_fem_day04$tissue == "Spinalcord" & 
                    meta_fem_day04$time == "Day0",
                Spc.Day4 = meta_fem_day04$tissue == "Spinalcord" & 
                    meta_fem_day04$time == "Day4")
rownames(design) <- rownames(meta_fem_day04)
design

OUTPUT

           (Intercept) mouse8 mouse9 mouse10 mouse15 mouse20 mouse21 mouse22
GSM2545337           1      0      1       0       0       0       0       0
GSM2545338           1      0      0       1       0       0       0       0
GSM2545339           1      0      0       0       1       0       0       0
GSM2545344           1      0      0       0       0       0       0       1
GSM2545348           1      1      0       0       0       0       0       0
GSM2545352           1      0      0       0       0       0       1       0
GSM2545353           1      0      0       0       0       0       0       0
GSM2545358           1      0      0       0       0       0       0       0
GSM2545362           1      0      0       0       0       1       0       0
GSM2545364           1      1      0       0       0       0       0       0
GSM2545365           1      0      1       0       0       0       0       0
GSM2545366           1      0      0       1       0       0       0       0
GSM2545371           1      0      0       0       1       0       0       0
GSM2545375           1      0      0       0       0       1       0       0
GSM2545376           1      0      0       0       0       0       1       0
GSM2545377           1      0      0       0       0       0       0       1
           Spc.Day0 Spc.Day4
GSM2545337        0        0
GSM2545338        0        0
GSM2545339        0        0
GSM2545344        0        0
GSM2545348        0        0
GSM2545352        0        0
GSM2545353        0        0
GSM2545358        1        0
GSM2545362        0        0
GSM2545364        1        0
GSM2545365        1        0
GSM2545366        1        0
GSM2545371        0        1
GSM2545375        0        1
GSM2545376        0        1
GSM2545377        0        1

R

vd <- VisualizeDesign(sampleData = meta_fem_day04 %>%
                          select(time, tissue, mouse),
                      designFormula = NULL, 
                      designMatrix = design, flipCoordFitted = FALSE)
vd$designmatrix

OUTPUT

           (Intercept) mouse8 mouse9 mouse10 mouse15 mouse20 mouse21 mouse22
GSM2545337           1      0      1       0       0       0       0       0
GSM2545338           1      0      0       1       0       0       0       0
GSM2545339           1      0      0       0       1       0       0       0
GSM2545344           1      0      0       0       0       0       0       1
GSM2545348           1      1      0       0       0       0       0       0
GSM2545352           1      0      0       0       0       0       1       0
GSM2545353           1      0      0       0       0       0       0       0
GSM2545358           1      0      0       0       0       0       0       0
GSM2545362           1      0      0       0       0       1       0       0
GSM2545364           1      1      0       0       0       0       0       0
GSM2545365           1      0      1       0       0       0       0       0
GSM2545366           1      0      0       1       0       0       0       0
GSM2545371           1      0      0       0       1       0       0       0
GSM2545375           1      0      0       0       0       1       0       0
GSM2545376           1      0      0       0       0       0       1       0
GSM2545377           1      0      0       0       0       0       0       1
           Spc.Day0 Spc.Day4
GSM2545337        0        0
GSM2545338        0        0
GSM2545339        0        0
GSM2545344        0        0
GSM2545348        0        0
GSM2545352        0        0
GSM2545353        0        0
GSM2545358        1        0
GSM2545362        0        0
GSM2545364        1        0
GSM2545365        1        0
GSM2545366        1        0
GSM2545371        0        1
GSM2545375        0        1
GSM2545376        0        1
GSM2545377        0        1

R

vd$plotlist

OUTPUT

$`time = Day0`

OUTPUT


$`time = Day4`

Keypoints

  • The formula framework in R allows creation of design matrices, which details the variables expected to be associated with systematic differences in gene expression levels.
  • Comparisons of interest can be defined using contrasts, which are linear combinations of the model coefficients.