Summary and Schedule

Prerequisites

Knowledge of R programming (e.g., a Data Carpentry course)
Knowledge of basic statistical techniques (e.g., an introduction to linear regression for health sciences)

Extra resources

This course can’t cover all aspects of statistics and data with R. There are many free resources to learn more about the topics, and indeed to learn even broader topics! Some of these are listed here:

Setup Instructions
Download files required for the lesson

Duration: 00h 00m
1. Introduction to high-dimensional data
What are high-dimensional data and what do these data look like in the biosciences?
What are the challenges when analysing high-dimensional data?
What statistical methods are suitable for analysing these data?
How can Bioconductor be used to access high-dimensional data in the biosciences?

Duration: 00h 50m
2. Regression with many outcomes
How can we apply linear regression in a high-dimensional setting?
How can we benefit from the fact that we have many outcomes?
How can we control for the fact that we do many tests?

Duration: 02h 50m
3. Regularised regression
What is regularisation?
How does regularisation work?
How can we select the level of regularisation for a model?

Duration: 05h 40m
4. Principal component analysis
What is principal component analysis (PCA) and when can it be used?
How can we perform a PCA in R?
How many principal components are needed to explain a significant amount of variation in the data?
How can we interpret the output of PCA using loadings and principal components?

Duration: 07h 50m
5. Factor analysis
What is factor analysis and when can it be used?
What are communality and uniqueness in factor analysis?
How do we decide on the number of factors to use?
How can we interpret the output of factor analysis?

Duration: 08h 30m
6. K-means
How do we detect real clusters in high-dimensional data?
How does K-means work and when should it be used?
How can we perform K-means in R?
How can we appraise a clustering and test cluster robustness?

Duration: 09h 50m
7. Hierarchical clustering
What is hierarchical clustering and how does it differ from other clustering methods?
How do we carry out hierarchical clustering in R?
What distance matrix and linkage methods should we use?
How can we validate identified clusters?

Duration: 11h 20m
Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

It’s usually recommended that course instructors provide a virtual environment with the software and data already available. However, this page includes instructions for setting up your own machine for the lessons. Setup should take about an hour, depending on the speed of your computer, your internet connection, and which packages you have installed already. You’ll need R 4.0 or later.
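Before going further, the installed R version can be compared against the 4.0 minimum stated above. This is a minimal sketch using base R only; the variable names `required_version` and `version_ok` are illustrative and not part of the lesson materials.

```r
# Minimum R version stated in the setup instructions.
required_version <- "4.0.0"

# getRversion() returns the running R version as a comparable object.
version_ok <- getRversion() >= required_version

if (version_ok) {
    message("R ", as.character(getRversion()), " meets the minimum requirement.")
} else {
    message(
        "R ", as.character(getRversion()), " is older than ",
        required_version, "; please upgrade before continuing."
    )
}
```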

R usually installs packages from pre-built binaries. Sometimes this is not possible, particularly on Linux and Mac systems, and package installation then often requires additional system dependencies. If you are a Linux user, first run the terminal commands for your distribution from the Posit documentation so that the code below can install packages. Note that you will need root access (sudo) to install the system dependencies. Mac users may need to use Homebrew to install system dependencies, and Windows users may need to install Rtools. Ideally, package installation will proceed without error and you can ignore these steps, but this isn’t always the case.

Previous learners have reported issues with igraph. Installation instructions for this package can be found at https://r.igraph.org/.
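One way to find out whether igraph is a problem on your machine is to test whether R can find the package. The sketch below uses only base R and can be run at any point, for example after the installation code on this page has finished; the variable name `igraph_available` is illustrative.

```r
# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
igraph_available <- requireNamespace("igraph", quietly = TRUE)

if (igraph_available) {
    message("igraph ", as.character(utils::packageVersion("igraph")), " is installed.")
} else {
    message("igraph is not available; see https://r.igraph.org/ for installation help.")
}
```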

All learners should then run the following code to download the data and install the libraries used in this lesson:

R

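# Install renv, which manages the package versions used in this lesson.
install.packages("renv")
# Download the lockfile that records the required packages.
download.file(
    "https://raw.githubusercontent.com/carpentries-incubator/high-dimensional-stats-r/refs/heads/transition-workbench/renv.lock",
    destfile = "renv.lock"
)
# Install the packages listed in renv.lock.
renv::restore()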

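# Create a data directory; no warning is shown if it already exists.
dir.create("data", recursive = TRUE, showWarnings = FALSE)
data_files <- c(
    "cancer_expression.rds",
    "coefHorvath.rds",
    "methylation.rds",
    "scrnaseq.rds",
    "prostate.rds",
    "cres.rds"
)
# Download each data file into the data directory.
for (file in data_files) {
    download.file(
        url = file.path(
            "https://raw.githubusercontent.com/carpentries-incubator/high-dimensional-stats-r/main/episodes/data",
            file
        ),
        destfile = file.path("data", file),
        # Write in binary mode so the .rds files are not corrupted on Windows.
        mode = "wb"
    )
}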
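Once the downloads have finished, it can be worth confirming that every data file arrived. The following is a minimal sketch, not part of the lesson materials: `missing_data_files()` is a hypothetical helper that takes the file names from the download step and reports any that are absent from the `data` directory.

```r
# File names copied from the download step above.
data_files <- c(
    "cancer_expression.rds",
    "coefHorvath.rds",
    "methylation.rds",
    "scrnaseq.rds",
    "prostate.rds",
    "cres.rds"
)

# Hypothetical helper: return the names of expected files not found in `dir`.
missing_data_files <- function(files, dir = "data") {
    paths <- file.path(dir, files)
    files[!file.exists(paths)]
}

missing <- missing_data_files(data_files)
if (length(missing) == 0) {
    message("All data files are present.")
} else {
    message("Missing data files: ", paste(missing, collapse = ", "))
}
```

If any files are reported missing, re-running the download loop for those names is usually enough.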