Summary and Setup
This lesson presents basic machine learning concepts featured in Data Analysis for the Life Sciences by Rafael A. Irizarry and Michael I. Love. The lesson is adapted from software for Chapter 9: Practical Machine Learning, which is published under the MIT license. Adaptation was funded by NIH grant 1R25GM141520 awarded to Dr. Gary Churchill at The Jackson Laboratory.
Project organization
1. Start RStudio.
2. Create a new project on your Desktop called ml-biomed.
   - Click the File menu, then New Project.
   - Click New Directory.
   - Click New Project.
   - Type ml-biomed as the directory name. Browse to your Desktop to create the project there.
   - Click the Create Project button.
3. Use the Files tab to create a data folder to hold the data, a scripts folder to hold your scripts, and a results folder to hold results. Alternatively, you can complete step 3 (but not step 2, which still has to be done through the menus) by running the following commands in the R console:

       # Create the three lesson folders inside the project directory
       dir.create("./data")
       dir.create("./scripts")
       dir.create("./results")
Data Sets
Download the tissue gene expression data directly from GitHub and place it in your new data directory.
The data represent RNA expression levels for seven tissues, each with samples from several individuals.
Software Setup
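1. Install R packages from CRAN and load the libraries.

       # Pass all package names as one character vector; a second bare string
       # would be interpreted as the library path, not as another package.
       install.packages(c("rafalib", "RColorBrewer", "gplots", "UsingR", "class", "caret"))
       library(rafalib)
       library(RColorBrewer)
       library(gplots)
       library(UsingR)
       library(class)
       library(caret)

2. Install and load packages from Bioconductor.

       # Install BiocManager from CRAN only if it is not already available
       if (!require("BiocManager", quietly = TRUE))
           install.packages("BiocManager")
       BiocManager::install(c("genefilter", "Biobase", "SpikeIn", "hgu95acdf"))
       library(genefilter)
       library(Biobase)
       library(SpikeIn)
       library(hgu95acdf)
       # Load the SpikeIn95 example data set shipped with the SpikeIn package
       data(SpikeIn95)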
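Once both steps have run, one way to confirm the setup is to ask R which of the ten packages cannot yet be loaded. This is a minimal sketch that assumes the two package lists above are complete for the lesson. It uses requireNamespace(), which loads a package's namespace without attaching it, so the check does not change your search path:

```r
# CRAN and Bioconductor packages installed in the two steps above.
cran_pkgs <- c("rafalib", "RColorBrewer", "gplots", "UsingR", "class", "caret")
bioc_pkgs <- c("genefilter", "Biobase", "SpikeIn", "hgu95acdf")
all_pkgs <- c(cran_pkgs, bioc_pkgs)

# requireNamespace() returns TRUE when a package can be loaded and FALSE
# otherwise; quietly = TRUE suppresses the message for missing packages.
# vapply() names the result with the package names.
status <- vapply(all_pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- names(status)[!status]
if (length(missing_pkgs) == 0) {
  message("All lesson packages are installed.")
} else {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Any names it reports can be passed to install.packages() if they come from the CRAN list, or to BiocManager::install() if they come from the Bioconductor list.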