This lesson is still being designed and assembled (Pre-Alpha version)

High dimensional statistics with R

Prerequisites

Extra resources

This course can’t cover every aspect of statistics and data analysis with R. Many free resources are available for learning more about these topics, and about broader ones. Some of these are listed here:

Schedule

Setup
      Download files required for the lesson
00:00 1. Introduction
      How can we describe statistical models?
      How can we describe model assumptions?
00:00 2. Regression with many features
      How can we apply regression methods in a high-dimensional setting?
      How can we control for the fact that we do many tests?
      How can we benefit from the fact that we have many variables?
01:30 3. Feature selection for regression
      Why would we want to find a subset of features that are associated with an outcome?
      How can we iteratively find a good subset of our features to use for regression?
      What are some risks and downsides of iterative feature selection?
02:30 4. Regularised regression
      What is regularisation?
      How does regularisation work?
      How can we select the level of regularisation for a model?
03:50 5. Principal component analysis
      What is principal component analysis and when can it be used?
      What are principal components and loadings?
      How many principal components are needed to explain a significant amount of variation in the data?
      How do we interpret the output of PCA?
05:50 6. Non-metric multidimensional scaling
      What statistical methods are available to compare communities of species/genes/groups between sites?
      What is non-metric multidimensional scaling (NMDS) and how does it differ from other ordination methods, such as PCA?
      How is NMDS carried out and how is the fit to the original data assessed?
      How is the output from NMDS interpreted?
07:00 7. Factor analysis
      What is factor analysis and when can it be used?
      What are communality and uniqueness in factor analysis?
      How do we decide on the number of factors to use?
      How do we interpret the output of factor analysis?
07:35 8. Mixture models
      What is clustering?
      Why would we want to find clusters in data?
      How can we cluster low-dimensional data with a model?
      What difficulties does high-dimensional clustering present?
08:35 9. K-means
      Should we always believe in clusters?
      How does K-means work?
      How can we perform K-means?
      How can we appraise a clustering?
      How can we test cluster robustness?
09:35 10. Hierarchical clustering
      What is hierarchical clustering and how does it differ from other clustering methods?
      How do we carry out hierarchical clustering in R?
      What distance matrix and linkage methods should we use?
      How can we validate identified clusters?
10:45 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.