Identify the problem and make a plan
Last updated on 2025-05-27
Estimated time: 70 minutes
Overview
Questions
- What do I do when I encounter an error?
- What do I do when my code outputs something I don’t expect?
- Why do errors and warnings appear in R?
- How can I find which areas of code are responsible for errors?
- How can I fix my code? What other options exist if I can’t fix it?
Objectives
After completing this episode, participants should be able to…
- Describe how the desired code output differs from the actual output
- Categorize an error message (e.g. syntax error, semantic errors, package-specific errors, etc.)
- Decode/describe what an error message is trying to communicate
- Identify specific lines and/or functions generating the error message
- Use R Documentation to look up function syntax and examples
- Quickly fix commonly-encountered R errors using the internet
- Identify when a problem is better suited for asking for further help, including online help and reprex
The first step we’ll cover is what to do when encountering an error or other undesired output from your code. With this episode, we hope to teach you the basics about identifying errors, rectifying them if possible, and if not, how to isolate the problem for others to look at. This is the first step in our “roadmap” of how to solve coding problems – recognizing when something you don’t intend is happening with your code, and then identifying the problem (to a lesser or greater degree) in order to solve it yourself or be able to succinctly describe it to a helper.
3.1 What do I do when I encounter an error message?
While error messages can be frustrating to read, R will often let us know when a problem occurs by generating one that tells us why it was unable to run our code. This type of ‘error’ is often referred to as a syntax error. When R is unable to run your code, it returns this type of error message and stops the program (as opposed to issuing a warning, or attempting to run further lines despite the error). Errors can happen for many reasons, and deciphering the meaning of an error message is not always as easy as we might hope. While we can’t review every reason your code might generate an error, we will try to teach you some tools to interpret and resolve syntax errors for yourself.
The accompanying error message attempts to tell you exactly how your code failed. For example, consider the following error message that occurs when I run this command in the R console:
R
ggplot(x = taxa) + geom_bar()
ERROR
Error: object 'taxa' not found
Though we know somewhere there is an object called taxa (it is actually a column of the dataset rodents), R is trying to communicate that it cannot find any such object in the local environment. Let’s try again, appropriately pointing ggplot to the rodents dataset and taxa column using the $ operator. For the sake of argument, let’s say we also remember that geom_bar expects an aesthetic (aes).
R
ggplot(aes(x = rodents$taxa)) + geom_bar()
ERROR
Error in `fortify()`:
! `data` must be a <data.frame>, or an object coercible by `fortify()`,
or a valid <data.frame>-like object coercible by `as.data.frame()`, not a
<uneval> object.
ℹ Did you accidentally pass `aes()` to the `data` argument?
Whoops! Here we see another error message. This time, R responds with a message that is perhaps even harder to interpret.
Let’s go over each part briefly. First, we see an error from a function called fortify, which we didn’t even call! Then, there’s a more helpful informational message: Did we accidentally pass aes() to the data argument? This does seem to relate to our function call, as we do pass aes into the ggplot function. But what is this “data” argument? A helpful starting place when attempting to decipher an error message is checking the documentation for the function which caused the error:
?ggplot
Here, a Help window pops up in RStudio which provides some more information. Skipping the general description at the top, we see that ggplot takes positional arguments of data, then mapping, which uses the aes call. We can see in “Arguments” that fortify will attempt to convert whatever is passed as data, here the aes(x = rodents$taxa) object, to a data.frame: now the picture is clear! We accidentally passed our mapping argument (telling ggplot how to map variables to the plot) into the position where it expected data in the form of a data frame. And if we scroll down to “Examples”, to “Pattern 1”, we can see exactly how ggplot expects these arguments in practice. Let’s amend our result:
R
ggplot(rodents, aes(x = taxa)) + geom_bar()

Here we see our desired plot.
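The mix-up above comes down to how R matches unnamed arguments by position. The sketch below illustrates this in base R only; demo_plot, demo_rodents and demo_mapping are hypothetical stand-ins (they are not ggplot2 functions or the lesson's data), and the function simply reports what it received.

```r
# Hypothetical stand-in for a plotting function; it only reports what it received
demo_plot <- function(data = NULL, mapping = NULL) {
  list(data_class = class(data), mapping_class = class(mapping))
}

# Hypothetical toy data and a placeholder for an aes() call
demo_rodents <- data.frame(taxa = c("Rodent", "Bird", "Rodent"))
demo_mapping <- list(x = "taxa")

# The parameter order, as R sees it
arg_names <- names(formals(demo_plot))

# A single unnamed argument is matched to the first parameter, `data`
wrong_call <- demo_plot(demo_mapping)

# Supplying arguments in the documented order, or naming them, avoids the mix-up
right_call <- demo_plot(demo_rodents, demo_mapping)
named_call <- demo_plot(mapping = demo_mapping, data = demo_rodents)

print(arg_names)
print(wrong_call$data_class)
print(right_call$data_class)
```

Naming arguments explicitly is one way to make a call robust to their order, at the cost of a little extra typing.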
Stop no. 1 on our roadmap: Identifying the problem
Let’s pause here to highlight some patterns we’re starting to see in the course of fixing our code:
- Seeing a problem arise in our code (in this case, R is explicitly telling us it has a problem running it).
- Reading and interpreting the error message R gives us.
Other steps we might take then include:
- Acting on parts of the error we can understand, such as changing input to a function.
- Pulling up the R Documentation for that function, and reading the documentation’s Description, Usage, Arguments, Details and Examples entries for greater insight into our error.
- Copying and pasting the error message into a search engine / generative LLM for more interpretable explanations.
- And, when all else fails, we can prepare our code into a reproducible example for expert help.
While the above steps may be new or seem familiar, we formalize them a little to make one point explicit: recognizing when a problem arises and attempting to interpret what is going wrong is essential to fixing it. This is true whether you fix the problem on your own, or communicate it to an expert. The latter steps we listed might be categorized as attempts to immediately address the problem. We’ll call these code first aid: these steps might fix the problem, give you greater insight into what the problem is (and how R is interpreting your code), or not be helpful at all.
In any case, we want to emphasize that these skill sets are essential to being a practiced coder able to effectively seek help. While these examples may seem too trivial to warrant a whole checklist, below we will see examples of problems that are trickier to both recognize and interpret. In those cases, we’ll nonetheless apply the same framework.
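As a small illustration of reading an error message piece by piece, the sketch below uses base R's tryCatch() to keep the error as an object rather than stopping, so its message and the call that raised it can be inspected separately. The failing call, log("ten"), is just an arbitrary example chosen for this sketch and is not part of the rodent analysis.

```r
# Run a call that fails, but keep the condition object instead of stopping
caught <- tryCatch(
  log("ten"),
  error = function(e) e
)

# What R says went wrong
error_text <- conditionMessage(caught)

# Which call R reports as the source of the error
error_call <- deparse(conditionCall(caught))

print(error_text)
print(error_call)
```

Separating the message from the call mirrors the two questions asked above: what went wrong, and which function reported it.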
3.2 What do I do when my code outputs something I don’t expect?
Another type of problem you may encounter is when your R code runs without errors, but does not produce the desired output. You may sometimes see these called semantic errors. As with syntax errors, semantic errors may occur for a variety of non-intuitive reasons, and are often harder to solve as there is no description of the error – you must work out where your code is defective yourself!
Let’s go back to our rodent analysis. The next step in the plan is to subset to just the Rodent taxa (as opposed to other taxa: Bird, Rabbit, Reptile or NA). Let’s quickly check to see how much data we’d be throwing out by doing so:
R
table(rodents$taxa)
OUTPUT
   Bird  Rabbit Reptile  Rodent
    300      69       4   16148
We’re interested in the Rodents, and thankfully it seems like a majority of our observations will be maintained when subsetting to rodents. But wait… In our plot above, we can clearly see the presence of NA values. Why are we not seeing them here? Our command was correctly executed, but the output is not everything we intended. Having no error message to interpret, let’s jump straight to the function documentation:
R
?table
OUTPUT
Help on topic 'table' was found in the following packages:
Package Library
vctrs /home/runner/.local/share/renv/cache/v5/linux-ubuntu-jammy/R-4.5/x86_64-pc-linux-gnu/vctrs/0.6.5/c03fa420630029418f7e6da3667aac4a
base /home/runner/.cache/R/renv/sandbox/linux-ubuntu-jammy/R-4.5/x86_64-pc-linux-gnu/9a444a72
Using the first match ...
Here, the documentation provides some clues: there seems to be an argument called useNA that accepts “no”, “ifany”, and “always”, but it’s not immediately apparent which one we should use to show our NA values. As a second approach, let’s go to Examples to see if we can find any quick fixes. Here we see a couple lines further down:
R
table(a) # does not report NA's
table(a, exclude = NULL) # reports NA's
That seems like it should be inclusive. Let’s try again:
R
table(rodents$taxa, exclude = NULL)
OUTPUT
   Bird  Rabbit Reptile  Rodent    <NA>
    300      69       4   16148     357
Now our NA values show up in the table. We see that by subsetting to the “Rodent” taxa, we would be losing 357 NAs, which themselves could be rodents! However, in this case, it seems a small enough portion to safely omit. Let’s subset our data to the rodent taxon.
R
rodents <- rodents %>% filter(taxa == "Rodent")
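The useNA argument mentioned in the documentation can be explored the same way. The sketch below uses a small made-up vector, demo_taxa (hypothetical, not the lesson's data), to compare the default behavior of table() with useNA = "ifany" and useNA = "always".

```r
# Hypothetical toy vector with two missing values
demo_taxa <- c("Rodent", "Bird", NA, "Rodent", NA)

# Default: NA values are silently dropped from the counts
default_counts <- table(demo_taxa)

# "ifany": an NA column appears only when NA values are present
ifany_counts <- table(demo_taxa, useNA = "ifany")

# "always": an NA column appears even when nothing is missing
always_counts <- table(c("Rodent", "Bird"), useNA = "always")

print(default_counts)
print(ifany_counts)
print(always_counts)

# A quick cross-check that does not depend on table() at all
n_missing <- sum(is.na(demo_taxa))
print(n_missing)
```

Comparing the total of the counts against the length of the input is a cheap way to notice that observations were dropped.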
Challenge
There are 3 lines of code below, labeled A to C, and each attempts to create the same plot. Identify which produces a syntax error, which produces a semantic error, and which correctly creates the plot (hint: this may require you to infer what type of graph we’re trying to create!)
A. ggplot(rodents) + geom_bin_2d(aes(month, plot_type))
B. ggplot(rodents) + geom_tile(aes(month, plot_type), stat = "count")
C. ggplot(rodents) + geom_tile(aes(month, plot_type))
In this case, A correctly creates the graph, using the color of each tile to show the number of observations. It essentially runs the following lines of code:
R
rodents_summary <- rodents %>% group_by(plot_type, month) %>% summarize(count=n())
OUTPUT
`summarise()` has grouped output by 'plot_type'. You can override using the
`.groups` argument.
R
ggplot(rodents_summary) + geom_tile(aes(month, plot_type, fill=count))

B is a syntax error, and will produce the following error:
R
ggplot(rodents) + geom_tile(aes(month, plot_type), stat = "count")
ERROR
Error in `geom_tile()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `setup_params()`:
! `stat_count()` must only have an x or y aesthetic.
Finally, C is a semantic error. It produces a plot, but a rather meaningless one:
R
ggplot(rodents) + geom_tile(aes(month, plot_type))

Summary
In general, encountering semantic errors can make our job more difficult, but the roadmap remains the same:
- Seeing a problem arise in our code.
- Interpreting the problem.
Other steps we might take then include:
- Acting on parts of the error we can understand, such as changing input to a function.
- Pulling up the R Documentation for relevant functions, and reading the documentation’s Description, Usage, Arguments, Details and Examples entries for greater insight into our error.
- Describing our problem to a search engine / generative LLM for more interpretable explanations.
- And, when all else fails, we can prepare our code into a reproducible example for expert help.
The steps for identifying the problem and for code first aid match what we’ve seen above. However, here seeing the problem arise in our code may be much more subtle, and comes from us recognizing output we don’t expect or know to be wrong. Even when the code runs, R may give us warnings or informational messages while executing it. Most of the time, however, it’s up to the coder to be vigilant and make sure each step is running as it should. Interpreting the problem may also be more difficult, as R gives us little or no indication about how it’s misinterpreting our intent.
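Warnings are one of the few hints R gives when code runs but may not do what we intend. The sketch below, using a made-up vector raw_weights (hypothetical, not from the rodents data), shows one way to surface a warning with base R's withCallingHandlers() and then check the result for unexpected missing values.

```r
# Hypothetical raw input where one value is not a number
raw_weights <- c("12", "15", "n/a", "9")

# Convert to numeric, reporting any warning as a message and then continuing
weights <- withCallingHandlers(
  as.numeric(raw_weights),
  warning = function(w) {
    message("Caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# The code ran, but did it do what we intended? Count the values lost on the way
n_missing <- sum(is.na(weights))

print(weights)
print(n_missing)
```

The conversion finishes without an error, so only the warning and the follow-up check reveal that one value was lost.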
Callout
Generally, the more your code deviates from just using base R functions, or the more you use specific packages, the worse both the quality of documentation and the online help available from search engines tend to get. While base R errors will often be solvable in a couple of minutes from a quick ?help check or a long online discussion and solutions on a website like Stack Overflow, errors arising from little-used packages applied in bespoke analyses might merit isolating your specific problem to a reproducible example for online help, or even getting in touch with the developers! Such community input and questions are often the way packages and documentation improve over time.
3.3 How can I find where my code is failing?
Isolating your problem may not be as simple as assessing the output from a single function call on the line of code which produces your error. Often, it may be difficult to determine which lines or logical sections of code (e.g. functions) are producing the error.
Consider the example below, where we now are attempting to see which species of kangaroo rats appear in different plot types over the years. To do so, we’ll filter our dataset to just include the genus Dipodomys. Then we’ll plot a histogram of how many observations are seen in each plot type over an x axis of years.
R
krats <- rodents %>% filter(genus == "Dipadomys") #kangaroo rat genus
ggplot(krats, aes(year, fill=plot_type)) +
  geom_histogram() +
  facet_wrap(~species)
ERROR
Error in `combine_vars()`:
! Faceting variables must have at least one value.
Uh-oh. Another error here, when we try to make a ggplot. But what is “combine_vars”? And then: “Faceting variables must have at least one value.” What does that mean?
This is not an easily-interpretable error message from ggplot, and our code looks like it should run. Perhaps we can take a step back and see whether our error is actually not in the ggplot code itself. Often, when trying to isolate the problem area, it is a good idea to look back at the original input. So let’s take a look at our krats dataset.
R
krats
OUTPUT
# A tibble: 0 × 13
# ℹ 13 variables: record_id <dbl>, month <dbl>, day <dbl>, year <dbl>,
# plot_id <dbl>, species_id <chr>, sex <chr>, hindfoot_length <dbl>,
# weight <dbl>, genus <chr>, species <chr>, taxa <chr>, plot_type <chr>
It’s empty! What went wrong with our “Dipadomys” filter? Let’s use a print statement to see which genera are included in the original rodents dataset.
R
print(rodents %>% count(genus))
OUTPUT
# A tibble: 12 × 2
   genus                n
   <chr>            <int>
 1 Ammospermophilus   136
 2 Baiomys              3
 3 Chaetodipus        382
 4 Dipodomys         9573
 5 Neotoma            904
 6 Onychomys         1656
 7 Perognathus        553
 8 Peromyscus        1271
 9 Reithrodontomys   1412
10 Rodent               4
11 Sigmodon           103
12 Spermophilus       151
We see two things here. For one, we’ve misspelled Dipodomys, which we can now amend. This quick function call also tells us we should expect a data frame with 9573 rows after subsetting to the genus Dipodomys.
R
krats <- rodents %>% filter(genus == "Dipodomys") #kangaroo rat genus
dim(krats)
OUTPUT
[1] 9573 13
R
ggplot(krats, aes(year, fill=plot_type)) +
  geom_histogram() +
  facet_wrap(~species)
OUTPUT
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Our improved code here looks good. Checking the dimensions of our subsetted data frame using the dim() function confirms we now have all the Dipodomys observations, and our plot is looking better. In general, having a ‘print’ statement or some other output after we manipulate data or other major steps can be a good way to check your code is producing intermediate results consistent with your expectations.
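The checks described above can also be written down so they run every time. The sketch below is one possible version in base R, using a small made-up data frame, demo_rodents (hypothetical, not the real dataset). The closest-match step uses adist() from the utils package, which is an addition for illustration and not something the lesson itself relies on.

```r
# Hypothetical stand-in for the lesson's rodents data (not the real dataset)
demo_rodents <- data.frame(
  genus = c("Dipodomys", "Dipodomys", "Neotoma", "Onychomys"),
  year = c(1990, 1991, 1990, 1992)
)

wanted_genus <- "Dipadomys"  # misspelled on purpose
known_genera <- unique(demo_rodents$genus)

# Check 1: is the value we filter on actually present in the data?
genus_present <- wanted_genus %in% known_genera

# Check 2: if not, which known value is closest by edit distance?
closest_genus <- known_genera[which.min(adist(wanted_genus, known_genera))]

# Check 3: filter with the corrected value and stop early if nothing is left
krats_demo <- demo_rodents[demo_rodents$genus == closest_genus, ]
stopifnot(nrow(krats_demo) > 0)

print(genus_present)
print(closest_genus)
print(dim(krats_demo))
```

A guard like stopifnot() turns a silent empty result into an immediate, local error, which is usually easier to interpret than a failure several steps later.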
Callout
Often, giving your expert helpers access to the entire problem, along with a detailed description of your desired output, allows you to directly improve your coding skills and learn about new functions and techniques.
Summary
Our roadmap to identifying problems in our code may now look like:
- Seeing a problem arise in our code.
- Isolating our code to the problem area.
- Interpreting the problem.
Now we can see the need to isolate the specific areas of code causing the bug or problem. There is no general rule of thumb as to how large this needs to be. But, unless our problem occurs on the first line, we should be able to isolate our code a bit: any early lines which we know run correctly and as intended may not need to be included. Isolating the problem area as much as we can also makes it more understandable to others, even if that does not help us solve the problem ourselves.
Let’s add to our code first aid:
- Identifying the problem area: add print statements immediately upstream or downstream of problem areas, check the desired output from functions, and see whether any intermediate output can be further isolated.
- Acting on parts of the error we can understand, such as changing input to a function.
- Pulling up the R Documentation for relevant functions, and reading the documentation’s Description, Usage, Arguments, Details and Examples entries for greater insight into our error.
- Describing our problem to a search engine / generative LLM for more interpretable explanations.
- And, when all else fails, we can prepare our code into a reproducible example for expert help.
While this is similar to our previous checklists, we can now understand these steps as a continuous cycle of isolating the problem into more and more discrete chunks for a reproducible example. Whenever a step above helps us identify a specific area or aspect of our code that is failing, we can zoom in on it and restart the checklist. We can stop as soon as we no longer understand how our code fails. At this point, we can excise that area for further help using a reprex.
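One way to put print statements upstream and downstream of a suspect step is a small checkpoint helper. The sketch below is only an illustration: report_step and demo_rodents are hypothetical names invented here, and the filtering is done in base R rather than with the pipe used elsewhere in this episode.

```r
# Hypothetical toy data with a few missing values
demo_rodents <- data.frame(
  genus = c("Dipodomys", "Dipodomys", "Neotoma", NA),
  weight = c(40, NA, 150, 20)
)

# Hypothetical helper: report the size of the data, then hand it back unchanged
report_step <- function(data, label) {
  message(label, ": ", nrow(data), " rows x ", ncol(data), " columns")
  invisible(data)
}

# Each step is checked right after it runs, so a surprise shows up where it happens
step1 <- report_step(demo_rodents, "raw data")
step2 <- report_step(step1[!is.na(step1$weight), ], "after dropping missing weights")
step3 <- report_step(step2[which(step2$genus == "Dipodomys"), ], "after keeping Dipodomys")
```

If one of the reported sizes is not what we expect, the lines before that checkpoint can be set aside and only the step in question needs to go into a reproducible example.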
3.4 When should I prepare my code for a reprex?
There may be some point at which our code first aid does not help us anymore, and we still cannot figure out the problem our code is giving us – in that case, it may be time to turn to expert help, by asking a coworker, mentor, or someone online for aid in
While it is common practice in intro coding courses to call over the instructor with a raised hand and a statement such as “I don’t know what’s wrong,” in reality people have limited time, bandwidth, or requisite knowledge to be able to help out with any problem that might arise. Even if they can’t figure out a bug on their own, the practiced coder can identify and articulate the problem effectively such that someone with available time and expertise can help out