This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to R


Teaching: 30 min
Exercises: 0 min
  • Basics of the R language.

  • Be able to create variables

  • Join variables into a vector

  • Join vectors into a tibble

  • Perform basic tibble manipulation



Introducing the data

We would like to introduce the data that we will be using during the course. The Critical Care Health Informatics Collaborative (CC-HIC) is a UK research body that has aggregated data from thousands of critical care patients. We will be using an anonymised sample from this cohort.

The data has been prepared for us in advance and will be presented as two “spreadsheets”, each of which is called a data frame in R parlance. This is the most common format for presenting data that can be described in rows and columns (so-called “rectangular data”).

The data is arranged with one row per patient and one column per variable.
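As a rough illustration of this layout, here is a minimal sketch that builds a tiny table by hand. The values and the `age` column are made up for the example and are not taken from the CC-HIC sample; it assumes the tibble package is installed.

```r
library(tibble)

# Hypothetical values for illustration only; not real patient data.
id  <- c("p001", "p002", "p003")  # one identifier per patient
age <- c(64, 71, 58)              # age in years
sex <- c("F", "M", "F")

# Join the vectors into a tibble: each vector becomes a column (a variable),
# and each position within the vectors becomes a row (a patient).
patients <- tibble(id = id, age = age, sex = sex)

nrow(patients)  # number of patients (rows)
ncol(patients)  # number of variables (columns)
```

Typing `patients` at the console prints the table, with one line per patient.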

For a full description of the data that exists inside CC-HIC see here

You are also encouraged to bring along your own data. We can’t promise to spend time on this, but there are exercises to do along the way, and you might want to try these exercises out on your own work after the course. We will be around for the day, so feel free to approach us at any time and ask for advice on your own data.

Files and directories

It is helpful to understand how files and folders (commonly called “directories”) are named on your computer. Unlike your usual habit of pointing and clicking to open something, we will need to start writing file locations down.
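To give a feel for what a written-down location looks like, the sketch below builds one from its parts using base R. The folder name `data` and the file name are assumptions about how the course material is laid out on your machine.

```r
# The working directory is the folder R currently treats as "here".
here <- getwd()

# Build a location from its parts; file.path() inserts the separators.
# "data" and "demographic-data.csv" are assumed names for this example.
relative_location <- file.path("data", "demographic-data.csv")
full_location <- file.path(here, relative_location)

relative_location
full_location

# TRUE only if a file really exists at that location on your computer.
file.exists(full_location)
```

A relative location is read starting from the working directory, whereas a full location spells out every folder from the top of the file system.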

While you get used to the notion of typing the location of files on the computer, you can use a little shortcut to help out. The file.choose() function will allow you to pick a file on your computer, and it will tell you the full location. For now, use the function to navigate to the course data and pick out the demographic-data.csv file.
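A minimal sketch of that shortcut follows. `file.choose()` opens a file picker, so it only works in an interactive session; the fallback location and the name `chosen_file` are assumptions made so the example also runs in a script.

```r
if (interactive()) {
  # Opens a file picker; navigate to the course data and select the file.
  chosen_file <- file.choose()
} else {
  # Assumed location, used only when no picker can be shown.
  chosen_file <- file.path("data", "demographic-data.csv")
}

# The result is an ordinary character string holding the file's location.
print(chosen_file)
is.character(chosen_file)

# basename() strips the folders and leaves just the file name.
basename(chosen_file)
```

Once the location has been printed, you can copy it into your script so that you do not have to pick the file by hand each time.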

library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
my_file <- read_csv("./data/synthetic_data_clean.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   creatinine = col_character(),
##   arrival_dttm = col_datetime(format = ""),
##   discharge_dttm = col_datetime(format = ""),
##   dob = col_date(format = ""),
##   vital_status = col_character(),
##   sex = col_character(),
##   id = col_character()
## )
## See spec(...) for full column specifications.

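Now we can call the my_file object back by typing its name. When my_file is assigned from file.choose(), it contains the address of the file; when it is assigned from read_csv(), as in the code above, it contains the data read from that address.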
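The difference between a file's address and its contents can be seen with a small self-contained sketch. It writes a made-up CSV to a temporary file, so the values and the name `demo` are purely illustrative; it assumes the readr package is installed.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age", "p001,F,64", "p002,M,71"), tmp)

# The address of the file is just a character string...
class(tmp)

# ...whereas read_csv() reads the file at that address into a tibble.
demo <- read_csv(tmp)
class(demo)
dim(demo)  # rows, then columns
```

Calling `demo` back at the console prints the table itself rather than the location it was read from.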


And this is what it should look like:

[Screenshot of the expected output]

Don’t worry if this doesn’t all make sense yet. It only means that you are paying attention! All will become clear over the course of the day.

Key Points

  • Data is structured and can be broken down into basic building blocks