Importing Data into R
Overview
Teaching: 15 min
Exercises: 15 min
Questions
How to import data into R?
Objectives
Successfully import data into R
Importing data into R
In order to use your data in R, you must import it and turn it into an R object. There are many ways to get data into R.
- Manually: You can create it by hand. To create a data.frame, use the data.frame() function and specify your variables.
- Import it from a file. Below is a very incomplete list:
  - Text: TXT (readLines() function)
  - Tabular data: CSV, TSV (read.table() function or readr package)
  - Excel: XLSX (xlsx package)
  - Google sheets: (googlesheets package)
  - Statistics program: SPSS, SAS (haven package)
  - Databases: MySQL (RMySQL package)
- Gather it from the web: You can connect to webpages, servers, or APIs directly from within R, or you can create a dataset scraped from HTML webpages using the rvest package.
  - For example, connect to the Twitter API with the twitteR package, or Altmetrics data with rAltmetric, or World Bank’s World Development Indicators with WDI.
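As a minimal sketch of the manual route, the snippet below builds a small data frame with data.frame(). The object name `refs`, the column names, and the values are made up for illustration and are not part of the lesson data.

```r
# Hypothetical example data; not taken from the lesson files
refs <- data.frame(
  title = c("Paper A", "Paper B", "Paper C"),
  year = c(2015, 2018, 2020),
  stringsAsFactors = FALSE
)

# Inspect the structure of the new object
str(refs)

# Count the rows
nrow(refs)
```

Each argument to data.frame() becomes one column, so all of the vectors need to have the same length.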
Make sure the files you downloaded from the zip folder are organized correctly before continuing on.
If you need to reorganize your files the instructions are below:
Create three folders on your computer desktop and name them data, anderson_naive, and search_results.
Add the following files you downloaded at the beginning of the lesson to the data folder.
- anderson_refs.csv
- anderson_refs.rda
- andersosn_studies.rda
- suggested_keywords_grouped
Add the three downloaded files from the anderson_naive zip folder to the anderson_naive folder.
- MEDLINE_1-500
- MEDLINE_501-603
- PsycINFO
Add the three downloaded savedrecs files to the search_results folder.
- savedrecs(1)
- savedrecs(2)
- savedrecs(3)
Create a folder on your desktop called lc_litsearchr. Add your three new folders, data, anderson_naive, and search_results, to the lc_litsearchr folder.
If you need to download the files, they can be found in Data.
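If you would rather set up the folders from within R than through your file browser, something like the following sketch could work. It only creates the empty folders; copying the downloaded files into them is still a manual step. `base_dir` is a placeholder that points at a temporary directory here, so replace it with the path to your own desktop.

```r
# Assumption: base_dir stands in for your desktop location.
# Replace tempdir() with your own path, for example "~/Desktop".
base_dir <- tempdir()

project_dir <- file.path(base_dir, "lc_litsearchr")
subfolders <- c("data", "anderson_naive", "search_results")

# Create the project folder and the three subfolders inside it
for (folder in subfolders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.files(project_dir)
```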
Opening a .csv file
To open a .csv file we will use the built-in read.csv(...) function, which reads the data in as a data frame. We assign that data frame to a variable with the <- assignment operator so that it is stored in R’s memory.
## import the data and look at the first six rows
## use the tab key to get the file option
anderson_refs <- read.csv(file = "./data/anderson_refs.csv", header = TRUE, sep = ",")
head(anderson_refs)
## We can see how R has classified a column in our dataset by using the class() function with the $ operator.
# use the column name `year`
class(anderson_refs$year)
# use the column name `source`
class(anderson_refs$source)
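To try the same steps without the lesson files, the sketch below writes a tiny made-up CSV to a temporary file and reads it back. The object name `toy_refs` and its contents are assumptions for illustration only.

```r
# Write a tiny, made-up CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("year,source",
             "2015,Journal A",
             "2018,Journal B"), tmp)

toy_refs <- read.csv(file = tmp, header = TRUE, sep = ",")

# The imported object is a data frame
class(toy_refs)

# Whole numbers are read in as integer
class(toy_refs$year)

# Text is read in as character in R 4.0.0 and later (factor in older versions)
class(toy_refs$source)
```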
The header Argument
The default for read.csv(...) is to set the header argument to TRUE. This means that the first row of values in the .csv is set as header information (column names). If your data set does not have a header, set the header argument to FALSE.
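The effect of the header argument is easiest to see on a file that has no header row. The sketch below uses a small made-up file written to a temporary location; the object names are illustrative.

```r
# A made-up CSV with no header row
tmp <- tempfile(fileext = ".csv")
writeLines(c("2015,Journal A",
             "2018,Journal B"), tmp)

# With header = FALSE, both rows are kept as data and
# R supplies default column names (V1, V2)
no_header <- read.csv(file = tmp, header = FALSE)
names(no_header)
nrow(no_header)

# With the default header = TRUE, the first row is used up as column names,
# so one row of data is lost
with_header <- read.csv(file = tmp)
nrow(with_header)
```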
The na.strings Argument
We often need to deal with missing data in our dataset. A useful argument for the read.csv() function is na.strings, which allows you to specify how you have represented missing values in the dataset you’re importing, and recode those values as NA, which is how R recognizes missing values. For example, if our anderson_refs dataset had missing values coded as lowercase ‘na’, we can recode these to NA using the na.strings argument as follows:
## To import data use the read.csv() function. To recode missing values to NA, use the `na.strings` argument
anderson_refs <- read.csv(file = "./data/anderson_refs.csv", na.strings = "na")
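A self-contained sketch of what na.strings changes, using a small made-up file in which missing values are coded as lowercase "na". The object names `raw` and `clean` are illustrative.

```r
# A made-up CSV in which missing values are written as "na"
tmp <- tempfile(fileext = ".csv")
writeLines(c("year,source",
             "2015,Journal A",
             "na,Journal B",
             "2018,na"), tmp)

# Without na.strings, "na" is ordinary text, so `year` is not read as a number
# (character in R 4.0.0 and later, factor in older versions)
raw <- read.csv(file = tmp)
class(raw$year)

# With na.strings = "na", those cells become NA and `year` is read as integer
clean <- read.csv(file = tmp, na.strings = "na")
class(clean$year)
is.na(clean$year)

# Count the missing values in the `source` column
sum(is.na(clean$source))
```

Note that na.strings replaces the default value of "NA" rather than adding to it; to treat several codes as missing, pass them together, for example na.strings = c("na", "NA").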
The write.csv Function
After altering a dataset by replacing columns or updating values, you can save the new output with write.csv(...).
## To export the data use the write.csv() function. It requires a minimum of two arguments: the data to be saved and the name of the output file.
## For example, if we had edited the anderson_refs csv file we could use:
write.csv(anderson_refs, file = "./data/anderson-refs-cleaned.csv")
The row.names Argument
This argument for the write.csv function allows us to set the names of the rows in the output data file. R’s default for this argument is TRUE, and since it does not know what else to name the rows for the dataset, it resorts to using row numbers. To correct this, we can set row.names to FALSE:
## To export data use the write.csv() function. To avoid an additional column with row numbers, set `row.names` to FALSE
write.csv(anderson_refs, file = 'data/anderson_refs_clean.csv', row.names = FALSE)
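To see the difference row.names makes, the sketch below writes the same small data frame twice, once with the default and once with row.names = FALSE, and compares the results. The data and file names are made up and written to temporary files.

```r
# Hypothetical example data; not taken from the lesson files
toy_refs <- data.frame(year = c(2015, 2018),
                       source = c("Journal A", "Journal B"))

with_rows <- tempfile(fileext = ".csv")
without_rows <- tempfile(fileext = ".csv")

# Default behaviour: row.names = TRUE adds a column of row numbers
write.csv(toy_refs, file = with_rows)

# row.names = FALSE writes only the data columns
write.csv(toy_refs, file = without_rows, row.names = FALSE)

# Compare the header line of each file
readLines(with_rows)[1]
readLines(without_rows)[1]

# Reading the files back in shows the extra, unnamed column in the first one
names(read.csv(with_rows))
names(read.csv(without_rows))
```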
Key Points
There are many ways to get data into R. You can import data from a .csv file using the read.csv(…) function.