Writing our own functions

Last updated on 2024-11-19

Overview

Questions

  • How can I add functionality to my package?

Objectives

  • Create a custom package

So far, we have been playing with the example package that is built into RStudio. It only contains a function that says Hello, world!. It is extremely useful to have an example that gets us started, but this package is not very interesting yet. In this and the next episodes, we will create our own customized package, which in the previous episode we named mysterycoffee.

mysterycoffee, our virtual coffee room


Our package will try to mitigate a practical problem: social isolation in remote working environments. We want to create a software solution that simulates random encounters at the office’s coffee machine… when there is no office. The very first step is to answer three very important questions:

1. What will our package do?

As we said, our package will simulate random encounters between employees.

2. How will we do it?

After some thinking, we figured out that we can have a function that has the following input and output:

  • Input: vector of employees’ names.
  • Output: a two-column matrix, randomly grouping the employees’ names into couples.

The output table can be published, say, weekly, and each couple invited to have a videoconference-coffee chat, just as if they had randomly met at the coffee machine.

So this will be our starting point. In the next sections we’ll write an R function that does exactly this.

Later on, we’ll see that our original design may face unexpected challenges. For instance, what happens if the number of employees is not even?

3. Are we reinventing the wheel?

This is also an important question worth investing some time in when creating new functions or packages. Can we solve our problem with a software solution that someone else already wrote?

Well… if our problem really was to simulate random encounters, the answer is yes. There are already solutions for this. But our real problem is learning how to make R packages, isn’t it? So we’ll use this problem as a pedagogical example.

Getting our hands dirty


First step: remove hello.R

The first step is to remove the file hello.R, which lives in the R/ folder. This was just an example file, and we don’t need it anymore.
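
If you prefer to do this from the R console rather than from the file browser, the following is a minimal sketch. The helper name remove_example_files is hypothetical, and the two paths are an assumption based on the default RStudio package template, which also ships a help page for the example function under man/.

```r
# Hypothetical helper: delete the template's example files if they exist.
# The paths are assumed from the default RStudio package template.
remove_example_files <- function(pkg_dir = ".") {
  candidates <- file.path(pkg_dir, c("R/hello.R", "man/hello.Rd"))

  # Only try to remove files that are actually there
  existing <- candidates[file.exists(candidates)]
  removed <- file.remove(existing)

  # Return (invisibly) the paths that were deleted
  invisible(existing[removed])
}

# Run this from the root folder of your package:
# remove_example_files()
```

Checking with file.exists() first means the helper can be run more than once without raising warnings about missing files.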

Second step: edit DESCRIPTION

Now that we know what our package is expected to do, it is a perfect moment to edit the DESCRIPTION file.

The DESCRIPTION file

Open the DESCRIPTION file. What do you see here?

Take 5 minutes to edit this file with the information it asks for. In particular, edit the following fields (when needed): Title, Version, Author, Maintainer, Description. For now, ignore the rest.

After editing, your DESCRIPTION file should look similar to:

TXT

Package: mysterycoffee
Type: Package
Title: Simulation of random encounters between couples of persons
Version: 0.1.0
Author: Pablo Rodriguez-Sanchez
Maintainer: Pablo Rodriguez-Sanchez <p.rodriguez-sanchez@esciencecenter.nl>
Description: Simulates random encounters between couples of persons.
    This package was inspired by the need to mitigate social isolation in remote
    working environments. The idea is to simulate random encounters at the office's
    coffee machine... when there is no such office.
License: What license is it under?
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
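
One way to double-check your edits is to read the file back into R. The sketch below uses read.dcf() from base R, which parses the "Field: value" format used by DESCRIPTION files. To keep the snippet self-contained, it first writes an abbreviated copy of the file above to a temporary location; in your own package you would read "DESCRIPTION" from the package root instead. The list of required fields is our own choice for illustration, not an official one.

```r
# Write an abbreviated DESCRIPTION to a temporary file (illustration only)
desc_file <- tempfile()
writeLines(c(
  "Package: mysterycoffee",
  "Type: Package",
  "Title: Simulation of random encounters between couples of persons",
  "Version: 0.1.0",
  "Description: Simulates random encounters between couples of persons.",
  "    This package was inspired by the need to mitigate social isolation."
), desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file)

# Fields we chose to check for in this example
required <- c("Package", "Title", "Version", "Description")
missing_fields <- setdiff(required, colnames(desc))

desc[1, "Version"]   # the version string as written in the file
missing_fields       # empty if all the chosen fields are present
```

Note that continuation lines, such as the second line of Description, must start with whitespace to be read as part of the same field.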

More on DESCRIPTION

In the previous exercise we did not go into detail on some of the fields in the DESCRIPTION file. Two of them deserve a closer look: the Version and the License fields.

The Version field helps you and your users keep track of the package version. It is advisable to use semantic versioning. In a nutshell, semantic versioning means using a MAJOR.MINOR.PATCH structure for naming your versions. If your new version only fixes some bugs, increase PATCH by one. If your new version adds features in a backwards-compatible way, increase MINOR by one and reset PATCH to zero. If your new version includes changes that are not backwards compatible, increase MAJOR by one and reset MINOR and PATCH to zero.
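
To make the MAJOR.MINOR.PATCH rules concrete, here is a minimal sketch of a version "bumper". The function bump_version is hypothetical (it is not part of mysterycoffee or of any package we rely on), and it follows the usual semantic versioning convention of resetting the lower-order numbers to zero whenever a higher-order number is increased.

```r
# Hypothetical helper illustrating semantic versioning
bump_version <- function(version, part = c("patch", "minor", "major")) {
  part <- match.arg(part)

  # Split "MAJOR.MINOR.PATCH" into three integers
  v <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(v) == 3, !anyNA(v))

  # Increase the requested part and reset the lower-order parts
  v <- switch(part,
    major = c(v[1] + 1L, 0L, 0L),
    minor = c(v[1], v[2] + 1L, 0L),
    patch = c(v[1], v[2], v[3] + 1L)
  )

  paste(v, collapse = ".")
}

bump_version("0.1.0", "patch")  # "0.1.1": bug fixes only
bump_version("0.1.0", "minor")  # "0.2.0": new, backwards-compatible features
bump_version("0.1.0", "major")  # "1.0.0": backwards-incompatible changes
```

If you ever need to compare two versions, base R's package_version() compares them number by number rather than as text, so package_version("0.10.0") > package_version("0.9.0") is TRUE.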

Regarding the License: software licensing is a large and complicated field which lies at the intersection of programming and law. Its intricacies are far beyond the scope of this course. The good news is that, most likely, most of your research code can be released under a permissive license, typically MIT or Apache. If you want to know more, please take a look at these materials.

For more detailed information about DESCRIPTION files, see the R Packages documentation.

Third step: create a function

We came up with the following prototype function that will do the random grouping:

R

make_groups <- function(names) {
  # Shuffle the names
  names_shuffled <- sample(names)

  # Arrange it as a two-column matrix
  names_coupled <- matrix(names_shuffled, ncol = 2)

  return(names_coupled)
}

Please open an editor, copy the function above, and save it as R/functions.R. All the functions of the package have to be stored in .R files inside the R/ folder.
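
Before moving on, it can help to try the function in the console. The snippet below is a small usage sketch: it repeats the prototype so that it runs on its own, the employee names are made up for illustration, and the call to set.seed() is optional and only there to make the shuffle reproducible.

```r
# The prototype from this episode, repeated so the snippet runs on its own
make_groups <- function(names) {
  names_shuffled <- sample(names)
  names_coupled <- matrix(names_shuffled, ncol = 2)
  return(names_coupled)
}

# Made-up names, for illustration only
employees <- c("Alice", "Bob", "Carol", "Dave")

set.seed(42)  # optional: makes the random shuffle reproducible
groups <- make_groups(employees)

dim(groups)   # 2 rows (one per pair) and 2 columns
groups        # each row is one randomly formed pair
```

Every name appears exactly once in the result, and each row of the matrix is one pair to invite for a coffee chat.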

On the art of being tidy

So far, our package only has one function, and we have chosen a very boring name for the file where it is stored (R/functions.R).

In the future, keep in mind that you can use any valid filename for storing your functions. Additionally, such a file can contain one or many functions, and you can use multiple files if you want. Indeed, using multiple files with descriptive filenames is a good idea.

For instance, if your package has some functions for doing input, output and parsing of data, it could be a good idea to store those as R/io.R. You can later put your analysis functions in R/analysis.R, and the plotting ones under R/plotting.R.

Be creative and informative! The only rule is that your .R files should “live” inside the R/ folder.

Key Points

  • It is important to think about what we want our package to do (design) and how to do it (implementation). We also want to know why we need a new package (avoid reinventing the wheel).
  • Functions have to be saved in .R files in the R/ folder.