This lesson is in the early stages of development (Alpha version)

High dimensional statistics with R

Prerequisites

Extra resources

This course cannot cover all aspects of statistics and data analysis with R. Many free resources are available for learning more about these topics, and about broader topics too. Some of these are listed here:

Schedule

Setup
Download files required for the lesson

00:00 1. Introduction to high-dimensional data
What are high-dimensional data and what do these data look like in the biosciences?
What are the challenges when analysing high-dimensional data?
What statistical methods are suitable for analysing these data?
How can Bioconductor be used to access high-dimensional data in the biosciences?

00:50 2. Regression with many outcomes
How can we apply linear regression in a high-dimensional setting?
How can we benefit from the fact that we have many outcomes?
How can we control for the fact that we do many tests?

02:50 3. Regularised regression
What is regularisation?
How does regularisation work?
How can we select the level of regularisation for a model?

05:40 4. Principal component analysis
What is principal component analysis (PCA) and when can it be used?
How can we perform a PCA in R?
How many principal components are needed to explain a significant amount of variation in the data?
How can we interpret the output of PCA using loadings and principal components?

07:50 5. Factor analysis
What is factor analysis and when can it be used?
What are communality and uniqueness in factor analysis?
How do we decide on the number of factors to use?
How can we interpret the output of factor analysis?

08:30 6. K-means
How do we detect real clusters in high-dimensional data?
How does K-means work and when should it be used?
How can we perform K-means in R?
How can we appraise a clustering and test cluster robustness?

09:50 7. Hierarchical clustering
What is hierarchical clustering and how does it differ from other clustering methods?
How do we carry out hierarchical clustering in R?
What distance matrix and linkage methods should we use?
How can we validate identified clusters?

11:20 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.