Importing Data into R

Overview

Teaching: 15 min
Exercises: 15 min

Questions

How do I import data into R?

Objectives

Successfully import data into R

Importing data into R

In order to use your data in R, you must import it and turn it into an R object. There are many ways to get data into R.

Manually: You can create the data yourself. To create a data frame, use the data.frame() function and specify your variables.
Import it from a file: Below is a very incomplete list of supported formats.
Text: TXT (readLines() function)
Tabular data: CSV, TSV (read.table() function or readr package)
Excel: XLSX (xlsx package)
Google sheets: (googlesheets package)
Statistics program: SPSS, SAS (haven package)
Databases: MySQL (RMySQL package)
Gather it from the web: You can connect to webpages, servers, or APIs directly from within R, or you can create a dataset scraped from HTML webpages using the rvest package.
For example, you can connect to the Twitter API with the twitteR package, retrieve Altmetrics data with rAltmetric, or retrieve the World Bank’s World Development Indicators with WDI.
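As a minimal sketch of the manual route, the following builds a small data frame with data.frame(). The object name refs_manual and the column names and values are invented for illustration; they are not part of the lesson data.

```r
# Hypothetical example data: names and values are invented for illustration
refs_manual <- data.frame(
  title = c("Paper A", "Paper B", "Paper C"),
  year = c(2015, 2018, 2020),
  stringsAsFactors = FALSE
)

# Inspect the structure: three rows and two columns
str(refs_manual)
```

Each argument to data.frame() becomes one column, so all vectors need the same length.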

Make sure the files you downloaded from the zip folder are organized correctly before continuing on.

If you need to reorganize your files the instructions are below:

Create three folders on your computer desktop and name them data, anderson_naive, and search_results

Add the following files you downloaded at the beginning of the lesson to the data folder.

anderson_refs.csv

anderson_refs.rda

anderson_studies.rda

suggested_keywords_grouped

Add the three downloaded files from the anderson_naive zip folder to the anderson_naive folder.

MEDLINE_1-500

MEDLINE_501-603

PsycINFO

Add the three downloaded savedrecs files to the search_results folder.

savedrecs(1)

savedrecs(2)

savedrecs(3)

Create a folder on your desktop called lc_litsearchr. Add your three new folders, data, anderson_naive, and search_results to the lc_litsearchr folder.
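If you prefer to create the folders from within R rather than by hand, a sketch like the following could work. It assumes a stand-in location (a temporary directory) instead of your real desktop, so replace root with the actual path on your computer; the downloaded files still need to be moved into the folders afterwards.

```r
# Stand-in for your desktop: replace with the real path on your computer
root <- file.path(tempdir(), "lc_litsearchr")

# Create the project folder and the three subfolders inside it
for (folder in c("data", "anderson_naive", "search_results")) {
  dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist inside lc_litsearchr
list.files(root)
```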

If you need to download the files they can be found in Data

Opening a .csv file

To open a .csv file we will use the built-in read.csv(...) function, which reads the data in as a data frame. We assign the data frame to a variable using the <- assignment operator so that it is stored in R’s memory.

## import the data and look at the first six rows
## use the tab key to get the file option
anderson_refs <- read.csv(file = "./data/anderson_refs.csv", header = TRUE, sep = ",")

head(anderson_refs)

## We can see what R thinks of the data in our dataset by using the class() function with $ operator.

# use the column name `year`
class(anderson_refs$year)

# use the column name `source`
class(anderson_refs$source)
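To check every column at once rather than one at a time, one option is to apply class() across the whole data frame with sapply(). The sketch below uses a small stand-in data frame (refs_demo, with invented values) so that it runs without the lesson files; with the real data you would pass anderson_refs instead.

```r
# Stand-in data frame with invented values, used in place of anderson_refs
refs_demo <- data.frame(
  title = c("Paper A", "Paper B"),
  year = c(2015, 2018),
  source = c("Journal X", "Journal Y"),
  stringsAsFactors = FALSE
)

# Apply class() to each column; the result is a named character vector
col_classes <- sapply(refs_demo, class)
col_classes
```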

The header Argument

The default for read.csv(...) is to set the header argument to TRUE. This means that the first row of values in the .csv is set as header information (column names). If your data set does not have a header, set the header argument to FALSE.
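The effect of the header argument can be seen in a small sketch. It writes a made-up two-row file without a header to a temporary location (the file and its contents are invented for illustration) and reads it both ways.

```r
# Write a small CSV without a header row to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Paper A,2015", "Paper B,2018"), tmp)

# With header = FALSE both rows are kept as data and columns are named V1, V2
no_header <- read.csv(file = tmp, header = FALSE)
names(no_header)
nrow(no_header)

# With the default header = TRUE the first row is used up as column names
with_header <- read.csv(file = tmp)
nrow(with_header)
```

Reading a headerless file with the default setting silently loses the first record, so it is worth checking the row count after import.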

The na.strings Argument

We often need to deal with missing data in our dataset. A useful argument for the read.csv() function is na.strings, which lets you specify how missing values are represented in the file you are importing, so that R reads those values as NA, which is how R recognizes missing values. For example, if our anderson_refs dataset had missing values coded as lowercase ‘na’, we could have R read them as NA using the na.strings argument as follows:

## To import data use the read.csv() function. To recode missing values to NA, use the `na.strings` argument

anderson_refs <- read.csv(file = "./data/anderson_refs.csv", na.strings = "na")
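To see what na.strings changes, the sketch below reads a tiny made-up file (written to a temporary location, with invented contents) with and without the argument.

```r
# Write a small CSV in which one missing year is coded as lowercase "na"
tmp <- tempfile(fileext = ".csv")
writeLines(c("title,year", "Paper A,2015", "Paper B,na"), tmp)

# Without na.strings, "na" is ordinary text, so the column is not numeric
# (character in R 4.0 and later, factor in earlier versions)
raw <- read.csv(file = tmp)
class(raw$year)

# With na.strings = "na" the value becomes NA and the column is read as integer
clean <- read.csv(file = tmp, na.strings = "na")
class(clean$year)
is.na(clean$year)
```

na.strings accepts a character vector, so several codings can be handled at once, for example na.strings = c("na", "NA", "").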

The write.csv Function

After altering a dataset by replacing columns or updating values you can save the new output with write.csv(...).

## To export the data use the write.csv() function. It requires a minimum of two arguments: the data to be saved and the name of the output file.

## For example, if we had edited the anderson_refs csv file we could use:
write.csv(anderson_refs, file = "./data/anderson-refs-cleaned.csv")

The row.names Argument

The row.names argument of the write.csv function controls whether row names are written to the output file. R’s default for this argument is TRUE, and since it does not know what else to name the rows, it writes the row numbers as an extra first column. To avoid this, we can set row.names to FALSE:

## To export data use the write.csv() function. To avoid an additional column with row numbers, set `row.names` to FALSE

write.csv(anderson_refs, file = 'data/anderson_refs_clean.csv', row.names = FALSE)
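The difference is easiest to see by writing the same data both ways and reading it back. This sketch uses a stand-in data frame (refs_demo, with invented values) and temporary files rather than the lesson data.

```r
# Stand-in data frame with invented values
refs_demo <- data.frame(title = c("Paper A", "Paper B"), year = c(2015, 2018))

# Write the same data with and without row names
with_rows <- tempfile(fileext = ".csv")
without_rows <- tempfile(fileext = ".csv")
write.csv(refs_demo, file = with_rows)
write.csv(refs_demo, file = without_rows, row.names = FALSE)

# Reading back: the default output has gained an extra first column
names(read.csv(with_rows))
names(read.csv(without_rows))
```

Setting row.names = FALSE keeps the exported file identical in shape to the data frame, which avoids accumulating extra index columns over repeated export and import cycles.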

Key Points

There are many ways to get data into R. You can import data from a .csv file using the read.csv(…) function.

Library Carpentry: Introduction to R and litsearchr
