This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

# Data Frame Manipulation

## Overview

Teaching: 10 min
Exercises: 10 min
Questions
• What are data-frames, and how do we manage them?

Objectives
• Understand what a data-frame is and learn to manipulate it.

## Data-frames: The power of interdisciplinarity

Data-frames are among the most powerful data structures in R. Let’s begin by creating a mock data set:

``````> musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
pieces = c(722,187,68),
likes = c(0,1,1))
> musician
``````

The content of our new object:

``````    people pieces likes
1  Medtner    722     0
2 Radwimps    187     1
3  Shakira     68     1
``````

We have just created our first data-frame. We can confirm this with the `class()` function:

``````> class(musician)
``````
``````[1] "data.frame"
``````

A data-frame is a collection of vectors of equal length (i.e. a list), one per column. All the elements within a vector must be of the same data type, but different vectors may hold different types:

Figure 3. Structure of the created data-frame.
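The structure described above can be checked directly from the console. The following is a minimal sketch that rebuilds the same `musician` object so it runs on its own; the helper name `column_types` is introduced here only for illustration.

```r
# Recreate the lesson's data-frame so this block is self-contained
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1))

# A data-frame is a list under the hood: one element per column
is.list(musician)   # TRUE
length(musician)    # 3, the number of columns

# Each column is a vector with a single class
column_types <- sapply(musician, class)
column_types

# str() prints a compact summary: dimensions, column names, and types
str(musician)
```

Note that the class reported for `people` depends on the R version: from R 4.0 onwards text columns stay as character vectors by default, whereas older versions convert them to factors.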

We can begin to explore our new object by pulling out columns using the `$` operator. To use it, write the name of your data-frame, followed by the `$` operator and the name of the column you want to extract:

``````> musician$people
``````
``````[1] "Medtner"  "Radwimps" "Shakira"
``````

We can do operations with the columns:

``````> musician$pieces + 20
``````
``````[1] 742 207  88
``````

Moreover, we can change the data type of one of the columns. First we check the current type of the `likes` column, then we convert it to logical values to see whether the musicians are popular or not:

``````> typeof(musician$likes)
``````
``````[1] "double"
``````
``````> musician$likes <- as.logical(musician$likes)
> paste("Is", musician$people, "popular? :", musician$likes, sep = " ")
``````
``````[1] "Is Medtner popular? : FALSE" "Is Radwimps popular? : TRUE" "Is Shakira popular? : TRUE"
``````

Finally, we can extract information from a specific place in our data by using the “matrix” nomenclature `[-,-]`, where the first number inside the brackets specifies the row number, and the second the column number:

Figure 4. Extraction of specific data in a data-frame and a matrix.

``````> musician[1,2]  # The number of pieces that Nikolai Medtner composed
``````
``````[1] 722
``````

We can also retrieve that value by referring to the column by its name:

``````> musician[1,"pieces"]  # The number of pieces that Nikolai Medtner composed
``````
``````[1] 722
``````

## Exercise 2:

Complete the lines of code to obtain the required information:

| Code | Information required |
|---|---|
| `> musician[__,__]` | Pieces composed by Shakira |
| `> (musician____)_2` | Pieces composed by all musicians if they were half as productive (half of their actual pieces) |
| `> musician$___ <- c(___,___,___)` | Redefine the `likes` column to make all the musicians popular! |

がんばって! (ganbatte; good luck):

## Solution

| Code | Information required |
|---|---|
| `> musician[3,"pieces"]` | Pieces composed by Shakira |
| `> (musician$pieces)/2` | Pieces composed by all musicians if they were half as productive (half of their actual pieces) |
| `> musician$likes <- c(TRUE,TRUE,TRUE)` | Redefine the `likes` column to make all the musicians popular! |

## Key Points

• Data-frames contain multiple columns; each column holds a single data type, but different columns can hold different types.