A Brief Introduction to Functions

Last updated on 2024-12-26 | Edit this page

Overview

Questions

  • What are functions?
  • Why should we know how to write them?
  • What are the main components of a function?

Objectives

  • Understand the usefulness of custom functions
  • Understand the basic concepts around writing functions

About functions


Functions in R are something we are used to thinking of as something that comes from a package. You find, install and use specialized functions from packages to get your work done.

But you can, and arguably should, be writing your own functions too! Functions are a great way of making it easy to repeat the same operation but with different settings. How many times have you copy-pasted the exact same code in your script, only to change a couple of things (a variable, an input etc.) before running it again? Only to then discover that there was an error in the code, and when you fix it, you need to remember to do so in all the places where you copied that code.

Through writing functions you can reduce this back and forth, and create a more efficient workflow for yourself. When you find the bug, you fix it in a single place, the function you made, and each subsequent call of that function will now be fixed.

Furthermore, targets makes extensive use of custom functions, so a basic understanding of how they work is very important to successfully using it.

Writing a function

There is not much difference between writing your own function and writing other code in R, you are still coding with R! Let’s imagine we want to convert the millimeter measurements in the penguins data to centimeters.

R

library(palmerpenguins)
library(tidyverse)

penguins |>
  mutate(
    bill_length_cm = bill_length_mm / 10,
    bill_depth_cm = bill_depth_mm / 10
  )

OUTPUT

# A tibble: 344 × 10
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 4 more variables: sex <fct>, year <int>, bill_length_cm <dbl>,
#   bill_depth_cm <dbl>

This is not a complicated operation, but we might want to make a convenient custom function that can do this conversion for us anyways.

To write a function, you need to use the function() function. With this function we provide what will be the input arguments of the function inside its parentheses, and what the function will subsequently do with those input arguments in curly braces {} after the function parentheses. The object name we assign this to, will become the function’s name.

R

my_function <- function(argument1, argument2) {
  # the things the function will do
}
# call the function
my_function(1, "something")

For our mm to cm conversion the function would look like so:

R

mm2cm <- function(x) {
  x / 10
}

Our custom function will now transform any numerical input by dividing it by 10.

Let’s try it out:

R

penguins |>
  mutate(
    bill_length_cm = mm2cm(bill_length_mm),
    bill_depth_cm = mm2cm(bill_depth_mm)
  )

OUTPUT

# A tibble: 344 × 10
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 4 more variables: sex <fct>, year <int>, bill_length_cm <dbl>,
#   bill_depth_cm <dbl>

Congratulations, you’ve created and used your first custom function!

Make a function from existing code

Many times, we might already have a piece of code that we’d like to use to create a function. For instance, we’ve copy-pasted a section of code several times and realize that this piece of code is repetitive, so a function is in order. Or, you are converting your workflow to targets, and need to change your script into a series of functions that targets will call.

Recall the code snippet we had to clean our penguins data:

R

penguins_data_raw |>
  select(
    species = Species,
    bill_length_mm = `Culmen Length (mm)`,
    bill_depth_mm = `Culmen Depth (mm)`
  ) |>
  drop_na()

We need to adapt this code to become a function, and this function needs a single argument, which is the dataset it should clean.

It should look like this:

R

clean_penguin_data <- function(penguins_data_raw) {
  penguins_data_raw |>
    select(
      species = Species,
      bill_length_mm = `Culmen Length (mm)`,
      bill_depth_mm = `Culmen Depth (mm)`
    ) |>
    drop_na()
}

Add this function to _targets.R after the part where you load packages with library() and before the list at the end.

RStudio function extraction

RStudio also has a handy helper to extract a function from a piece of code. Once you have basic familiarity with functions, it may help you figure out the necessary input when turning code into a function.

To use it, highlight the piece of code you want to make into a function. In our case that is the entire pipeline from penguins_data_raw to the drop_na() statement. Once you have done this, in RStudio go to the “Code” section in the top bar, and select “Extract function” from the list. A prompt will open asking you to hit enter, and you should have the following code in your script where the cursor was.

This function will not work however, because it contains more stuff than is needed as an argument. This is because tidyverse uses non-standard evaluation, and we can write unquoted column names inside the select(). The function extractor thinks that all unquoted (or back-ticked) text in the code is a reference to an object. You will need to do some manual cleaning to get the function working, which is why its more convenient if you have a little experience with functions already.

Challenge: Write a function that takes a numerical vector and returns its mean divided by 10.

R

vecmean <- function(x) {
  mean(x) / 10
}

Using functions in the workflow


Now that we’ve defined our custom data cleaning function, we can put it to use in the workflow.

Can you see how this might be done?

We need to delete the corresponding code from the last tar_target() and replace it with a call to the new function.

Modify the workflow to look like this:

R

library(targets)
library(tidyverse)
library(palmerpenguins)

clean_penguin_data <- function(penguins_data_raw) {
  penguins_data_raw |>
    select(
      species = Species,
      bill_length_mm = `Culmen Length (mm)`,
      bill_depth_mm = `Culmen Depth (mm)`
    ) |>
    drop_na()
}

list(
  tar_target(penguins_csv_file, path_to_file("penguins_raw.csv")),
  tar_target(penguins_data_raw, read_csv(
    penguins_csv_file, show_col_types = FALSE)),
  tar_target(penguins_data, clean_penguin_data(penguins_data_raw))
)

We should run the workflow again with tar_make() to make sure it is up-to-date:

R

tar_make()

OUTPUT

✔ skipped target penguins_csv_file
✔ skipped target penguins_data_raw
▶ dispatched target penguins_data
● completed target penguins_data [0.093 seconds, 1.614 kilobytes]
▶ ended pipeline [0.242 seconds]

We will learn more soon about the messages that targets() prints out.

Functions make it easier to reason about code


Notice that now the list of targets at the end is starting to look like a high-level summary of your analysis.

This is another advantage of using custom functions: functions allows us to separate the details of each workflow step from the overall workflow.

To understand the overall workflow, you don’t need to know all of the details about how the data were cleaned; you just need to know that there was a cleaning step. On the other hand, if you do need to go back and delve into the specifics of the data cleaning, you only need to pay attention to what happens inside that function, and you can ignore the rest of the workflow. This makes it easier to reason about the code, and will lead to fewer bugs and ultimately save you time and mental energy.

Here we have only scratched the surface of functions, and you will likely need to get more help in learning about them. For more information, we recommend reading this episode in the R Novice lesson from Carpentries that is all about functions.

Key Points

  • Functions are crucial when repeating the same code many times with minor differences
  • RStudio’s “Extract function” tool can help you get started with converting code into functions
  • Functions are an essential part of how targets works.