Summary and Schedule
This lesson presents basic machine learning concepts featured in Data Analysis for the Life Sciences by Rafael A. Irizarry and Michael I. Love. The lesson is adapted from software for Chapter 9: Practical Machine Learning, which is published under this MIT license. Adaptation was funded by NIH grant 1R25GM141520 awarded to Dr. Gary Churchill at The Jackson Laboratory.
Setup Instructions | Download files required for the lesson |
Duration: 00h 00m | 1. Basic Machine Learning | What is machine learning? How is machine learning used in biomedical studies? |
Duration: 00h 10m | 2. Clustering | How can clusters within high-dimensional data be discovered? |
Duration: 01h 10m | 3. Conditional Probabilities and Expectations | How do we get statistical information from specific subsets in our data? |
Duration: 02h 10m | 4. Smoothing | Can a model be fitted to a dataset whose shape is unknown but smooth? |
Duration: 03h 10m | 5. Class Prediction | What is machine learning (ML)? Why should we learn ML? |
Duration: 04h 10m | 6. Cross-validation | How can the best configuration of parameters be selected for a machine learning model using only the data available? |
Duration: 05h 10m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Project organization
1. Start RStudio.
2. Create a new project on your Desktop called ml-biomed.
   - Click the File menu button, then New Project.
   - Click New Directory.
   - Click New Project.
   - Type ml-biomed as the directory name. Browse to your Desktop to create the project there.
   - Click the Create Project button.
3. Use the Files tab to create a data folder to hold the data, a scripts folder to hold your scripts, and a results folder to hold results. Alternatively, you can use the R console to run the following commands for step 3 only. You still need to create a project with step 2.

       dir.create("./data")
       dir.create("./scripts")
       dir.create("./results")
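Calling dir.create() on a folder that already exists produces a warning, which can be confusing if the commands are run twice. A minimal sketch of one way to avoid that is shown here; it assumes the working directory is the project folder, and the variable name `folders` is illustrative rather than part of the lesson.

```r
# Folder names taken from step 3; `folders` is an illustrative variable name.
folders <- c("data", "scripts", "results")

# Create each folder only if it does not already exist.
for (folder in folders) {
  if (!dir.exists(folder)) {
    dir.create(folder)
  }
}

# Check that all three folders are now present.
dir.exists(folders)
```

The last line returns one logical value per folder, so any FALSE points to a folder that still needs to be created by hand.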
Data Sets
Download the tissue gene expression data directly from GitHub and place it in your new data directory.
The data represent RNA expression levels for eight tissues, each with several individuals.
Software Setup
1. Install R packages and load the libraries. The package names must be passed to install.packages() as a single character vector.

       install.packages(c("rafalib", "RColorBrewer", "gplots", "UsingR", "class", "caret"))
       library(rafalib)
       library(RColorBrewer)
       library(gplots)
       library(UsingR)
       library(class)
       library(caret)

2. Install and load packages from Bioconductor.

       if (!require("BiocManager", quietly = TRUE))
         install.packages("BiocManager")
       BiocManager::install(c("genefilter", "Biobase", "SpikeIn", "hgu95acdf"))
       library(genefilter)
       library(Biobase)
       library(SpikeIn)
       library(hgu95acdf)
       data(SpikeIn95)
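Before or after installing, it can help to check which of the required packages are still missing. The sketch below is one possible way to do this. The helper `missing_packages()` and the variable names are hypothetical and not part of the lesson; only requireNamespace() is a base R function. Note that requireNamespace() loads a package's namespace as a side effect when the package is found.

```r
# Hypothetical helper: return the names in `pkgs` that cannot be loaded,
# i.e. packages that are not installed in the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Package lists copied from the setup steps above.
cran_pkgs <- c("rafalib", "RColorBrewer", "gplots", "UsingR", "class", "caret")
bioc_pkgs <- c("genefilter", "Biobase", "SpikeIn", "hgu95acdf")

# Report which packages still need to be installed.
missing_cran <- missing_packages(cran_pkgs)
missing_bioc <- missing_packages(bioc_pkgs)
print(missing_cran)
print(missing_bioc)
```

Any names left in `missing_cran` could then be passed to install.packages(), and any in `missing_bioc` to BiocManager::install(). Empty results suggest the setup is complete.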