This lesson is still being designed and assembled (Pre-Alpha version)

Importing Data into R

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • How do I import data into R?

Objectives
  • Successfully import data into R

Importing data into R

In order to use your data in R, you must import it and turn it into an R object. There are many ways to get data into R.

Before continuing, make sure the files you downloaded from the zip folder are organized correctly.

If you need to reorganize your files, follow the instructions below:

Create three folders on your computer desktop and name them data, anderson_naive, and search_results.

Add the following files you downloaded at the beginning of the lesson to the data folder.

  • anderson_refs.csv
  • anderson_refs.rda
  • anderson_studies.rda
  • suggested_keywords_grouped

Add the three downloaded files from the anderson_naive zip folder to the anderson_naive folder.

  • MEDLINE_1-500
  • MEDLINE_501-603
  • PsycINFO

Add the three downloaded savedrecs files to the search_results folder.

  • savedrecs(1)
  • savedrecs(2)
  • savedrecs(3)

Create a folder on your desktop called lc_litsearchr, then move your three new folders (data, anderson_naive, and search_results) into it.
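You can also create the folders from R instead of by hand, using base R's `dir.create()` and `file.path()`. This is a minimal sketch under a few assumptions. The helper name `make_lesson_dirs()` is hypothetical, and the desktop location (`~/Desktop`) varies by operating system. The helper only creates empty folders, so you still need to copy the downloaded files into them.

```r
# Hypothetical helper: create the lesson folder with its three subfolders
make_lesson_dirs <- function(base) {
  subfolders <- c("data", "anderson_naive", "search_results")
  paths <- file.path(base, subfolders)
  for (p in paths) {
    # recursive = TRUE also creates `base` if it does not exist yet;
    # showWarnings = FALSE stays quiet if a folder already exists
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Assumed desktop location; adjust this path for your computer
lesson_dir <- file.path("~", "Desktop", "lc_litsearchr")
# make_lesson_dirs(lesson_dir)
# setwd(lesson_dir)  # lets relative paths such as "./data/..." resolve
```

The last two lines are commented out so that nothing is created until you have checked the path.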

If you need to download the files, you can find them in Data.

Opening a .csv file

To open a .csv file, we will use the built-in read.csv(...) function, which reads the data in as a data frame. We then assign that data frame to a variable with the assignment operator <- so that it is stored in R’s memory.

## import the data and look at the first six rows
## the relative path assumes your working directory is the lc_litsearchr folder
## press the Tab key inside the quotes to autocomplete the file path
anderson_refs <- read.csv(file = "./data/anderson_refs.csv", header = TRUE, sep = ",")

head(anderson_refs)
## Check how R has stored a column by selecting it with the $ operator and passing it to class()

# use the column name `year`
class(anderson_refs$year)

# use the column name `source`
class(anderson_refs$source)
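class() checks one column at a time. To check every column at once, you can use base R's sapply() with class, or str(). The sketch below builds a small stand-in data frame from inline text so that it runs without the lesson files. Its columns and values are made up for illustration and are not taken from anderson_refs.csv. With the real data, pass anderson_refs instead of refs.

```r
# Small made-up stand-in for anderson_refs, read from inline text
refs <- read.csv(text = "title,year,source
Paper A,2001,Journal X
Paper B,2005,Journal Y")

# Class of every column at once (a named character vector)
sapply(refs, class)

# Compact overview: dimensions, column classes and first values
str(refs)
```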

The header Argument

By default, read.csv(...) sets the header argument to TRUE. This means that the first row of the .csv file is used as the column names. If your dataset does not have a header row, set the header argument to FALSE.
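Here is a minimal sketch of the difference, again using made-up inline text rather than the lesson files. With header = FALSE, read.csv() keeps the first line as data and names the columns V1, V2, and so on. The optional col.names argument lets you supply your own names instead.

```r
# Made-up data with no header row
no_header <- "Paper A,2001
Paper B,2005"

# header = FALSE keeps the first line as data
df <- read.csv(text = no_header, header = FALSE)
names(df)  # "V1" "V2"
nrow(df)   # 2

# Supply column names yourself
df_named <- read.csv(text = no_header, header = FALSE,
                     col.names = c("title", "year"))
names(df_named)  # "title" "year"
```

With the default header = TRUE, the first data line would have been used as column names, so that row would be lost from the data.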

The na.strings Argument

We often need to deal with missing data. A useful argument of the read.csv() function is na.strings. It lets you specify how missing values are represented in the file you’re importing, so that R recodes them as NA, its marker for missing values. For example, if the anderson_refs dataset coded missing values as lowercase ‘na’, we could recode them to R’s NA with the na.strings argument as follows:

## To import data, use the read.csv() function. To recode missing values to NA, use the `na.strings` argument

anderson_refs <- read.csv(file = "./data/anderson_refs.csv", na.strings = "na")
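You can check what na.strings does on a small made-up example (inline text, not the lesson data). Cells containing the lowercase string na become R’s missing value NA. is.na() detects these values, and a numeric column can hold them.

```r
raw <- "title,year
Paper B,2005
Paper C,na"

# Without na.strings, "na" is ordinary text, so year is not read as a number
without <- read.csv(text = raw)
is.numeric(without$year)  # FALSE

# With na.strings = "na", that cell becomes a real missing value
with_na <- read.csv(text = raw, na.strings = "na")
class(with_na$year)  # "integer"
is.na(with_na$year)  # FALSE TRUE
```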

The write.csv Function

After altering a dataset by replacing columns or updating values, you can save the new version with write.csv(...).

## To export the data, use the write.csv() function. To save to a file, it needs two arguments: the data to be saved and the name of the output file.

## For example, if we had edited the anderson_refs csv file we could use:
write.csv(anderson_refs, file = "./data/anderson-refs-cleaned.csv")
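Here is a minimal sketch of the edit-then-save workflow. It uses a made-up data frame and a temporary file from tempfile(), so nothing in your data folder is overwritten. Reading the file back with read.csv() is a quick way to confirm what was saved.

```r
refs <- data.frame(title = c("Paper A", "Paper B"), year = c(2001, 2050))

# Update a value (a made-up correction of the second year)
refs$year[2] <- 2005

# Save to a temporary file, then read it back to check the update
out_file <- tempfile(fileext = ".csv")
write.csv(refs, file = out_file)
check <- read.csv(out_file)
check$year  # 2001 2005
```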

The row.names Argument

The row.names argument of write.csv() controls whether the names of the rows are written to the output file. Its default is TRUE, so write.csv() adds the data frame’s row names as an extra first column. For data read from a .csv file, these row names are just the row numbers 1, 2, 3, and so on. To leave this column out, set row.names to FALSE:

## To export data, use the write.csv() function. To avoid an additional column with row numbers, set `row.names` to FALSE

write.csv(anderson_refs, file = 'data/anderson_refs_clean.csv', row.names = FALSE)
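To see what row.names changes, the sketch below writes the same made-up data frame to two temporary files and reads each one back. With the default row.names = TRUE, the row numbers come back as an extra first column, which read.csv() names X because its header cell is empty. With row.names = FALSE, the extra column is not written.

```r
refs <- data.frame(title = c("Paper A", "Paper B"), year = c(2001, 2005))

with_rows <- tempfile(fileext = ".csv")
without_rows <- tempfile(fileext = ".csv")

write.csv(refs, file = with_rows)                       # default: row.names = TRUE
write.csv(refs, file = without_rows, row.names = FALSE)

names(read.csv(with_rows))     # "X" "title" "year"
names(read.csv(without_rows))  # "title" "year"
```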

Key Points

  • There are many ways to get data into R. You can import data from a .csv file using the read.csv(…) function.