This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Developing Software In a Team: Code Review

Overview

Teaching: 30 min
Exercises: 30 min
Questions
  • How do we develop software in a team?

  • What is code review and how it can improve the quality of code?

Objectives
  • Describe commonly used code review techniques.

  • Understand how to do a pull request via GitHub to engage in code review with a team and contribute to a shared code repository.

Introduction

So far in this course we’ve focused on learning software design and (some) technical practices, tools and infrastructure that help the development of software in a team environment, but in an individual setting. Despite developing tests to check our code - no one else from the team had a look at our code before we merged it into the main development stream. Software is often designed and built as part of a team, so in this episode we’ll be looking at how to manage the process of team software development and improve our code by engaging in code review process with other team members.

Collaborative Code Development Models

The way a team provides contributions to a shared codebase depends on the type of development model used in a project. Two commonly used models are described below.

Fork and Pull Model

In the fork and pull model, anyone can fork an existing repository (to create their copy of the project linked to the source) and push changes to their personal fork. A contributor can work independently on their own fork as they do not need permissions on the source repository to push modifications to a fork they own. The changes from contributors can then be pulled into the source repository by the project maintainer on request and after a code review process. This model is popular with open source projects as it reduces the start up costs for new contributors and allows them to work independently without upfront coordination with source project maintainers. So, for example, you may use this model when you are an external collaborator on a project rather than a core team member.

Shared Repository Model

In the shared repository model, collaborators are granted push access to a single shared code repository. By default, collaborators have write access to the main branch. However, it is best practice to create feature branches for new developments and protect the main branch from direct and unreviewed commits to keep it stable - see GitHub’s documentation on how to do this. While this model of collaboration requires more upfront coordination, it makes it easier to share each other’s work. It works well for more stable teams and is more prevalent with teams and organisations collaborating on private projects.

Code Review

Regardless of the collaborative code development model your team uses, code review is one of the widely accepted best practices for software development in teams and something you should adopt in your development process too.

Code review is a software quality assurance practice where one or several people from the team (different from the code’s author) check the software by viewing parts of its source code at the point when the code changes. Code review is very useful for all parties involved - someone checks your design or code for errors and gets to learn from your solution; having to explain code to someone else clarifies your rationale and design decisions in your mind too.

Code review is universally applicable throughout the software development cycle - from design to development to maintenance. According to Michael Fagan, the author of the code inspection technique, rigorous inspections can remove 60-90% of errors from the code even before the first tests are run (Fagan, 1976). Furthermore, according to Fagan, the cost to remedy a defect in the early (design) stage is 10 to 100 times less compared to fixing the same defect in the development and maintenance stages, respectively. Since the cost of bug fixes grows in orders of magnitude throughout the software lifecycle, it is far more efficient to find and fix defects as close as possible to the point where they were introduced.

There are several code review techniques with various degree of formality and the use of a technical infrastructure, including:

You can read more about these techniques in the “Five Types of Review” section of the “Best Kept Secrets of Peer Code Review” eBook.

It is worth trying multiple code review techniques to see what works best for you and your team. We will have a look at the tool-assisted code review process, which is likely to be the most effective and efficient. We will use GitHub’s built-in code review tool - pull requests, or PRs. It is a lightweight tool, included with GitHub’s core service for free and has gained popularity within the software development community in recent years.

Code Reviews via GitHub’s Pull Requests

Pull requests are fundamental to how teams review and improve code on GitHub (and similar code sharing platforms) - they let you tell others about changes you’ve pushed to a branch in a repository on GitHub and that your code is ready for review. Once a pull request is opened, you can discuss and review the potential changes with others on the team and add follow-up commits based on the feedback before your changes are merged into the development branch. The name ‘pull request’ suggests you are requesting the codebase moderators to pull your changes into the codebase.

Such changes are normally done on a feature branch, to ensure that they are separate and self-contained, that the main branch only contains “production-ready” work, and that the development branch contains code that has already been extensively tested. You create a branch for your work based on one of the existing branches (typically the development branch but can be any other branch), do some commits on that branch, and, once you are ready to merge your changes, create a pull request to bring the changes back to the branch that you started from. In this context, the branch from which you branched off to do your work and where the changes should be applied back to is called the base branch, while the feature branch that contains changes you would like to be applied is the compare branch.

How you create your feature branches and open pull requests in GitHub will depend on your collaborative code development model:

In both development models, it is recommended to create a feature branch for your work and the subsequent pull request, even though you can submit pull requests from any branch or commit. This is because, with a feature branch, you can push follow-up commits as a response to feedback and update your proposed changes within a self-contained bundle. The only difference in creating a pull request between the two models is how you create the feature branch. In either model, once you are ready to merge your changes in - you will need to specify the base branch and the compare branch.

Let’s see this in action - you are going to act as a reviewer on a proposed change to the codebase contributed by a fictional colleague on one of your fellow learner’s repository. One of your fellow learners will review the proposed changes on your repository. Once the review is done, you will then take on the role of the fictional colleague and respond to the review on your repository. If you are completing the course by yourself, you can add the review on the proposed changes in your own repository and then respond to your own review comments by fixing the proposed code. This is actually a very sensible thing to do in general - looking at your own code in a review window will allow you to spot mistakes you have not seen before.

Here is an outline of the process of a tool assisted code review.

Code review process sequence

Recall solution requirement SR1.1.1 from an earlier episode. A fictional colleague has implemented it according to the specification and pushed it to a branch feature-std-dev of our software repository. You will turn this branch into a pull request on your repository. You will then engage in code review for the change (acting as a code reviewer) on a fellow learner’s repository. Once complete, you will respond to the comments from another team member on the pull request on your repository (acting as a code author).

Raising a Pull Request

  1. Head over to your software repository in GitHub.
  2. Navigate to the pull requests tab.
  3. Create a new pull request by clicking the green New pull request button. GitHub pull requests tab
  4. Select the base and the compare branch - main and feature-std-dev, respectively. Recall that the base branch is where you want your changes to be merged and the compare branch contains the changes.
  5. Click Create pull request button to open the request. Creating a new pull request.
  6. Add a comment describing the nature of the changes, and then submit the pull request by clicking the Create pull request button (in the new window). Submitting a pull request.
  7. At this point, the code review process is initiated.

We will now discuss what to look for in a code review, before practising it on this fictional change.

Reviewing a Pull Request

Once a review has been raised it is over to the reviewer to review the code and submit feedback.

Reviewing code effectively takes practice. However, here is some guidance on what you should be looking for when reviewing a piece of code.

Things to Look for in a Code Review

Start by understanding what the code should do, by reading the specification/user requirements, the pull request description or talking to the developer if need be. In this case, understand what SR1.1.1 means.

Once you’re happy, start reading the code (skip the test code for now - we will come back to it later). You’re going to be assessing the code in the following key areas.

Is the proposed code readable?
Is the proposed code a minimal change?
Is the structure of the code clear?
Is there an appropriate and up-to-date documentation for the proposed code?

Things Not to Look for in a Code Review

The overriding priority for reviewing code should be making sure progress is being made - do not let perfect be the enemy of the good here. According to “Best Kept Secrets of Peer Code Review” (Cohen, 2006) the first hour of reviewing code is the most effective, with diminishing returns after that.

To that end, here are a few things you should not be trying to spot when reviewing:

Adding a review comment

Here, we are outlining the process of adding a review to a pull request. There is doing to be an exercise next for you to practice it.

  1. Your fellow learner should add you as a collaborator on their repository to begin with. They can do that from the Settings tab of the repository, then Collaborators and teams tab on the left, then clicking the Add people button. Once they find you by your GitHub name - you will receive an invitation via email to join the repository as a collaborator. You will have to do the same for the collaborator doing the review on your repository.

    Code Review from External Contributors

    You do not have to be a collaborator on a public repository to do code reviews so this step is not strictly necessary. We are still asking you to do it now as we will get you working in teams for the rest of the course so it may make sense to start setting up your collaborators now.

    Adding a collaborator in GitHub

  2. Locate up the pull request from the GitHub’s Pull Requests tab on the home page of your fellow learner’s software repository, then head to the Files changed tab on the pull request. The files changed tab of a pull request
  3. When you find a line that you want to add a comment to, click on the blue plus (+) button next to the line. This will bring up a “Write” box to add your comment. Adding a review comment to a pull request You can also add comments referring to multiple lines by clicking the plus and dragging down over the relevant lines. If you want to make a concrete suggestion or a change to the code directly, such as renaming a variable, you can click the Add a suggestion button (which looks like a document with a plus and a minus in it). This will populate the comment with the existing code, and you can edit it to be what you think the code should be.

    Note: you can only make direct code suggestions if you are a collaborator on a repository. Otherwise, you can add comments only. Adding a suggestion to a pull request GitHub will then provide a button for the code author to apply your changes directly.

  4. Write your comment in the box, and then click Start review. This will save your comment, but not publish it yet. You can use Add single comment button to immediately post a comment. However, it is best to batch the comments into a single review, so that the author knows when you have finished adding comments (and avoid spamming their email with notifications).
  5. Continue adding comments in this way, if you have any, using the Add review comment button.

Effective Review Comments

  • Make sure your review comments are specific and actionable.
  • Try to be as specific as you can - instead of “this code is unclear” instead say “I do not understand what values this variable can hold”.
  • Make it clear in the comment if you want something to change as part of this pull request.
  • Ideally provide a concrete suggestion (e.g. a better variable name).

Exercise: Review Some Code

Pair up with a colleague from your group/team and go to the pull request your colleague created on their project repository. If there is an odd number of people in your group, three people can go in a round robin fashion (the first team member will review the pull request on the second member’s repository and will receive comments on the pull request on their repository from the third team member, and so on). If you are going through the material on your own and do not have a collaborator, you can be the reviewer on the pull requests on your own repository.

Review the code, looking for the kinds of problems that we have just discussed. There are examples of all four main problem areas in the pull request, so try to make at least one suggestion for each area.

Add your review comments but do not submit your review just yet.

Time: 10 min

Solution

Here are some of the things you might have found were wrong with the code.

Is the proposed code readable?
  • Function name s_dev is not the best or self-explanatory - it uses an uncommon abbreviation and does not make it clear immediately what the function does without reading the code. A better name is standard_deviation.
  • Not clear what variable number contains - better option is a business-logic name like mean or mean_of_data.
Is the proposed code a minimal change?
  • Could have used np.std to compute standard deviation of data without having to reimplement from scratch.
Is the structure of the proposed code clear?
  • Have the function return the data, rather than having the graph name (a view layer consideration) leak into the model code.
Is there an appropriate and up-to-date documentation for the proposed code?
  • The documentation say it returns the standard deviation, but it actually returns a dictionary containing the standard deviation.

Making Sure Code is Valid

The other key thing you want to verify in a code review is that the code is correct and well tested. One approach to do this is to build up a list of tests you would expect to see (and the results you would expect them to have), and then verify that all these tests are present and correct.

Start by listing out all the tests you would expect to see based on the specification. As you are going through the code, add to this list any other tests you can think of, making sure to add tests for:

Once you have the list, go through the tests in the pull request. Inspect them closely to make sure they test everything you expect them to.

Submitting a Review

Once you have a list of tests you want the author to add, it is time to submit your review.

  1. To do this, click the Finish your review button at the top of the Files changed tab. Using the finishing your review dialog In the comment box, you can add any other comments that are not associated with a specific line. For example, you can put the list of tests that you want to see added here.
  2. Next you will need select to one of Comment, Approve or Request changes. Using the finishing your review dialog
    • Use Approve if you would be happy for the code to go in with no further changes.
    • Use Request changes to communicate to the author that they should address your comments before you will approve it.
    • Use Comment if you do not want to express a decision on whether the code should be accepted. For example, if you have been asked to look at a specific part of the code, or if you are part way through a review, but wanted to share some comments sooner.
  3. Finally, you can press Submit review button. This will publish all the comments you have made as part of the review and let the author know that the review is complete and it is their turn for action.

Exercise: Review the Code for Suitable Tests

Remind yourself of the specification for SR1.1.1 and write a list of tests you would expect to see for this feature. Review the code again and expand this list to include any other edge cases the code makes you think of. Go through the tests in the pull request and work out which tests are present.

Once you are happy, you can submit your review. Select Request changes to let the author know they need to address your comments.

Time: 10 min

Solution

Your list might include the following:

  1. Standard deviation for one patient with multiple observations.
  2. Standard deviation for two patients.
  3. Graph includes a standard deviation graph.
  4. Standard deviation function should raise an error if given empty data.
  5. Computing standard deviation where deviation is different from variance.
  6. Standard deviation function should give correct result given negative inputs.
  7. Function should work with numpy arrays.

Looking at the tests in the PR, you might be content that tests for 1, 4 and 7 are present so you would request changes to add tests 2, 3, 5 and 6.

In looking at the tests, you may have noticed that the test for numpy arrays is currently spuriously passing as it does not use the return value from the function in the assert.

You may have spotted that the function actually computes the variance rather than the standard deviation. Perhaps that made you think to add the test for some data where the variance and standard deviation are different. In more complex examples, it is often easier to spot code that looks like it could be wrong and think of a test that will exercise it. This saves embarrassment if the code turns out to be right, means you have the missing test written if it is wrong, and is often quicker than trying to execute the code in your head to find out if it is correct.

Responding to Review Comments

Once you receive comments on your code, a few different scenarios can occur:

  1. You understand and agree with the reviewer’s comments. In this scenario, you should make the requested change to your branch (or accept the suggested change by the reviewer) and commit it. It might be helpful to add a thumbs up reaction to the comment, so the reviewer knows you have addressed it. Even better, leave a comment such as “Fixed via #commit_number” with a link to your commit that implemented the change. Responding to a review comment with an emoji Responding to a review comment with a link to commit
  2. It is not completely clear what the requested change should be - in this scenario you should reply to such a review to ask for clarification.
  3. You disagree with the reviewer - in this scenario, it might be best to talk to them in person. Discussions that happen on code reviews can often feel quite adversarial - discussing what the best solution is in person can help defuse this.

Exercise: Responding to Comments

Look at the pull request that you created on your repository. By now you should have someone else’s comments on it. For each comment, either reply explaining why you do not think the change is necessary or make the change and push a commit fixing it. Reply to each of the comments indicating you have addressed it.

At the same time, people will be addressing your comments on the pull request in their repository. If you are satisfied that your comment has been suitably addressed, you can mark it as resolved. Once all comments have been addressed, you can approve the pull request by submitting a new review and this time selecting Approve. This tells the author you are happy for them to merge the pull request.

Time: 10 min

Approving a Pull Request

  1. Once the reviewer approves the changes, the person whose repository it is can merge the changes onto the base branch. Typically, it is the code author’s responsibility to merge but this may differ from team to team. In our case, you will merge the changes on the PR on your repository. Merging a pull request in GitHub
  2. Delete the merged branch to reduce the clutter in the repository.

Writing Easy-To-Review Code

There are a few things you can do to make it as easy as possible for the reviewer to review your code:

The most important thing to keep in mind is how long your pull request is. Smaller changes that just make one small improvement will be much quicker and easier to review. There is no golden rule, but studies into code review show that you should not review more than 400 lines of code at a time, so this is a reasonable target to aim for. You can refer to some studies and Google recommendations as to what a “large pull request” is but be aware that it is not an exact science.

Try to keep each commit (within your pull request) to be making one logical change. This can especially help with larger pull requests that would otherwise be harder to review. In particular, if you have reformatted, refactored and changed the behavior of the code make sure each of these is in a separate commit (i.e reformat the code, commit, refactor the code, commit, alter the behavior of the code, commit).

Make sure you write a clear description of the content and purpose of the change. This should be provided as the pull request description. This should provide the context needed to read the code.

It is also a good idea to review your code yourself, before requesting a review. In doing this you will spot the more obvious issues with your code, allowing your reviewer to focus on the things you cannot spot.

Writing Effective Review Comments

Code is written by humans (mostly!), and code review is a form of communication. As such, empathy is important for effective reviewing.

When reviewing code, it can be sometimes frustrating when code is confusing, particularly as it is implemented differently to how you would have done it. However, it is important as a reviewer to be compassionate to the person whose code you are reviewing. Specifically:

Defining a Review Process For Your Team

To be effective, code review needs to be a process that is followed by everyone in the team developing the code. Everyone should believe that the process provides value. One way to foster this is to agree on the review process as a team and consider, e.g.:

You could also consider using pull request states in GitHub:

Once you have agreed on a review process, you should monitor (either formally or informally) how well it is working.

It is important that reviews are processed quickly, to avoid costly context switching for the code author moving on to work on other things and coming back to their PR. Try and set the targets for when you would want the first review submitted on a PR and the PR merged, based on how your team works. If you are regularly missing your targets, then you should review your process to identify where things are getting stuck and work out what you can do to move things along.

Optional Exercise: Code Review in Your Own Working Environment

In this episode we have looked at some best practices for code review and practiced tool assisted code review with GitHub’s pull requests.

Now think about how you and your collaborators typically develop code. What benefits do you see for introducing a code review process in your work environment? How might you institute code review practices within your environment? Write down a process for a tool assisted code review for your team, answering the questions above.

Once complete, discuss with the rest of the class what are the advantages of a code review process and what challenges you think you would face in implementing this process in your own working environment.

Solution

The purposes of code review include:

  • improving internal code readability, understandability, quality and maintainability,
  • checking for coding standards compliance, code uniformity and consistency,
  • checking for test coverage and detecting bugs and code defects early,
  • detecting performance problems and identifying code optimisation points,
  • finding alternative/better solutions,
  • sharing knowledge of the code, and of coding standards and expectations of quality.

Finally, it helps increase the sense of collective code ownership and responsibility, which in turn helps increase the “bus factor” and reduce the risk resulting from information and capabilities being held by a single person “responsible” for a certain part of the codebase and not being shared among team members.

Challenges you might face introducing a code review process:

  • complaints that it is a waste of time,
  • creating a negative atmosphere where people are overly critical of each others work, or are defensive of their own,
  • perfectionism leading to slower development,
  • people not sharing code to avoid the review process.

Make sure to monitor whether these are happening, and adjust the process accordingly.

Further Reading

There are multiple perspectives to a code review process - from general practices to technical details relating to different roles involved in the process. We have discussed the main points, but do check these useful code review blogs from Swarmia and Smartbear.

The key thing is to try it, and iterate the process until it works well for your team.

Key Points

  • Code review is a team software quality assurance practice where team members look at parts of the codebase in order to improve their code’s readability, understandability, quality and maintainability.

  • It is important to agree on a set of best practices and establish a code review process in a team to help to sustain a good, stable and maintainable code for many years.