Summary and Setup
This lesson presents basic machine learning concepts featured in Data Analysis for the Life Sciences by Rafael A. Irizarry and Michael I. Love. The lesson is adapted from the software for Chapter 9: Practical Machine Learning, which is published under the MIT license. Adaptation was funded by NIH grant 1R25GM141520, awarded to Dr. Gary Churchill at The Jackson Laboratory.
Project organization
1. Start RStudio.
2. Create a new project in your Desktop called ml-biomed.
   - Click the File menu button, then New Project.
   - Click New Directory.
   - Click New Project.
   - Type ml-biomed as the directory name. Browse to your Desktop to create the project there.
   - Click the Create Project button.
3. Use the Files tab to create a data folder to hold the data, a scripts folder to hold your scripts, and a results folder to hold results. Alternatively, you can run the following commands in the R console for step 3 only. You still need to create a project as described in step 2.
       dir.create("./data")
       dir.create("./scripts")
       dir.create("./results")
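The folder-creation step can also be wrapped so that it is safe to rerun. The following is a minimal sketch, assuming you run it from the project root; the helper name create_project_dirs is hypothetical and not part of the lesson.

```r
# Hypothetical helper (not part of the lesson): create the project folders
# under `root`, skipping any that already exist.
create_project_dirs <- function(root = ".",
                                dirs = c("data", "scripts", "results")) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    # showWarnings = FALSE silences the warning raised for existing folders
    dir.create(p, showWarnings = FALSE, recursive = TRUE)
  }
  # Named logical vector: TRUE for each folder that now exists
  setNames(dir.exists(paths), dirs)
}
```

Calling create_project_dirs() from the R console inside the ml-biomed project returns a named logical vector showing whether each folder is present.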
Data Sets
Download the tissue gene expression data directly from GitHub and place them in your new data directory.
The data represent RNA expression levels for eight tissues, each with several individuals.
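Once the download is in place, it can help to confirm that the file loads and to see which objects it contains. The sketch below assumes the data are stored in R's .rda format; the helper name load_rda is hypothetical, and the path you pass to it is whatever file you actually saved in the data folder.

```r
# Hypothetical helper (not part of the lesson): load an .rda file into its
# own environment and report the names of the objects it contains.
load_rda <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, ". Check that it is in the data folder.")
  }
  env <- new.env()
  # load() returns the names of the restored objects
  loaded <- load(path, envir = env)
  list(names = loaded, env = env)
}
```

Loading into a separate environment keeps the global workspace untouched until you have checked that the expected objects are there.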
Software Setup
- Install R packages and load the libraries.
      # install.packages() expects the package names as one character vector
      install.packages(c("rafalib", "RColorBrewer", "gplots", "UsingR", "class", "caret"))
      library(rafalib)
      library(RColorBrewer)
      library(gplots)
      library(UsingR)
      library(class)
      library(caret)
- Install and load packages from Bioconductor.
      # Install BiocManager first if it is not already available
      if (!require("BiocManager", quietly = TRUE))
          install.packages("BiocManager")
      BiocManager::install(c("genefilter", "Biobase", "SpikeIn", "hgu95acdf"))
      library(genefilter)
      library(Biobase)
      library(SpikeIn)
      library(hgu95acdf)
      # Load the SpikeIn95 data set from the SpikeIn package
      data(SpikeIn95)
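As an optional check after both installation steps, the sketch below reports which of the packages named above are still missing. The helper name missing_packages is hypothetical and not part of the lesson; it relies only on base R's installed.packages().

```r
# Hypothetical helper (not part of the lesson): return the packages in `pkgs`
# that are not found in any library on .libPaths().
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

cran_pkgs <- c("rafalib", "RColorBrewer", "gplots", "UsingR", "class", "caret")
bioc_pkgs <- c("genefilter", "Biobase", "SpikeIn", "hgu95acdf")

# An empty character vector means every package is installed.
print(missing_packages(c(cran_pkgs, bioc_pkgs)))
```

If any names are printed, rerun the matching install command for those packages before continuing.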