Coding conventions and modular coding

Last updated on 2025-12-19 | Edit this page

Estimated time: 75 minutes

Overview

Questions

Why should you follow software code style conventions?
How can formatters and linters help you with this in Python and R?
How can nested code be targeted and improved through modularization?
How can I write a new function in R?

Objectives

Know how to write readable code
Know how to write modular code

Coding formatting and linting

Readable code - for others and our future selves - should be descriptive, cleanly and consistently formatted, and use sensible, descriptive names for variables, functions and modules. Furthermore, our code should be efficient and well designed.

Programs called formatters and linters can help us with the above. What do they do and how do they differ?

Formatters

With a light touch, formatters ensure the code adheres to standard style guides without altering functionality. A style guide is a set of conventions that we agree upon with our colleagues or community, to ensure that people produce code which looks similar in style. For examples, check out the PEP 8 Style Guide or the Tidyverse style guide for Python and R, respectively.

Good, consistent formatting makes the code easier to read and can prevent merge conflicts.

Linters

The work of “linters” is a bit more heavy-handed. These programs can check your code for common programming errors, convoluted syntax, unnecessary complexity, unused code, performance bottlenecks and other violations of best practices. Often they can fix the issues they encounter, or suggest appropriate improvements otherwise.

Improving your code

Discussion

Challenge

Install and use a formatter and a linter to improve the style of your code.

The program ruff can both format and lint Python code. Install ruff from PyPI or conda. Don’t forget to add it to your requirements.txt, pyproject.toml or whichever file you use to define your dependencies.

Formatting

First, use the formatter. Note that the use without the --check flag ruff automatically changes your file. This is the default, because formatting does not change the behaviour of your code.

BASH

# Check for formatting
ruff format --check

# Fix formatting, if desired
ruff format

Do you agree with the default choices ruff made? You can configure ruff to follow your choices if you need to, but be aware that the defaults were chosen for a reason.

Linting

Next, try out the linter. While fixing formatting is usually harmless, linting fixes change your code on a deeper level. By default, ruff only advises you. To automatically apply the linting, you need an explicit --fix flag. This is because linting can touch the functionality of your code, so make sure to review the changes ruff makes.

BASH

# Check for linting
ruff check

# Fix linting, if desired
ruff check --fix

What is your opinion on the linting suggestions? Again, you can configure the details ruff pays attention to when linting. Did you learn something new about the Python language?

The standard option for formatting R code is styler, for linting it’s lintr.

R

# Install the formatter and linter.
install.packages(c("styler", "lintr"))

If the installation fails due to a missing libxml-2.0 library, install it first (it’s called libxml2-dev in Ubuntu) and re-run the installation.

Formatting

You will notice that both new packages are listed under the “Addins” menu in RStudio. Open an R file and use

Addins ➜ STYLER ➜ "Style active file".

Do you agree with the default choices that styler made? You can configure styler to follow your choices if you need to, but be aware that the defaults were chosen for a reason. You can also configure it under

Addins ➜ STYLER ➜ "Set style".

Linting

Next, try out the linter with

Addins ➜ LINTR ➜ "Lint current file".

While fixing formatting is usually harmless, linting fixes change your code on a deeper level. Make sure to review the changes lintr makes.

What is your opinion on the linting suggestions? Again, you can configure the details lintr pays attention to when linting. Did you learn something new about the R language?

Further reading: R Coding Style

https://www.datanovia.com/en/blog/r-coding-style-best-practices/
https://rfortherestofus.com/2024/03/styler-package

(Optional) Git pre-commit hooks

So far, formatting and linting were conscious choices: you have to remember to execute them yourself every once in a while. A more robust approach would be to take away this mental load and automate linting and formatting. This can be achieved through “git hooks”, which are a set of scripts git can run every time a certain action is performed. Here we have a look at “pre-commit hooks” that check your code changes before you commit them.

Discussion

Challenge

The amount of git pre-commit hook scripts can grow rather large on bigger projects. pre-commit manages your commit hooks and helper programs in a declarative way and makes them easy to share between collaborators.

Use a git pre-commit hook to run formatters and linters every time before a git commit.

Install pre-commit and add it to your requirements.txt or pyproject.toml.

Create a pre-commit config file named .pre-commit-config.yaml.

YAML

repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
  # Ruff version.
  rev: v0.14.8
  hooks:
    # Run the linter.
    - id: ruff-check
      # Possible choices: [ python, pyi, jupyter ]
      # Here we exclude Jupyter notebooks
      types_or: [ python, pyi ]

      # Comment _out_ to only check, not fix
      args: [ --fix ]

    # Run the formatter.
    - id: ruff-format

      # Comment _in_ to only check, not fix
      # args: [ --check ]

      # Possible choices: [ python, pyi, jupyter ]
      # Here we exclude Jupyter notebooks
      types_or: [ python, pyi ]

If you would want to run additional checks, you need to add a new - repo: entry to the pre-commit config file.

Make sure to add and commit this file to your git repository, so your collaborators will use the exact same configurations and versions.

BASH

git add .pre-commit-config.yaml
git commit -m "add pre-commit hook for formatting and linting"

Install the new git hook.

BASH

pre-commit install

You can test your script by executing it manually.

BASH

pre-commit run --all-files

For updating your hooks to the latest version, run

BASH

pre-commit autoupdate

and make sure to add and commit your updated config file.

Install the package precommit and initialize it.

⚠️ This package is only supported on operating systems that are also officially supported by R. Please check the R Download page which ones are eligible. Non long-term-stable (LTS) Ubuntu distributions, for example, are not. Installing and running precommit on non-supported operating systems will probably require extra manual work.

R

install.packages("precommit")
precommit::use_precommit()

RStudio will open the .pre-commit-config.yaml file for you to review. It will look something like this:

YAML

# All available hooks: https://pre-commit.com/hooks.html
# R specific hooks: https://github.com/lorenzwalthert/precommit
repos:
-   repo: https://github.com/lorenzwalthert/precommit
    rev: v0.4.3.9017
    hooks: 
    -   id: style-files
        args: [--style_pkg=styler, --style_fun=tidyverse_style]    
    -   id: spell-check
        exclude: >
          (?x)^(
          .*\.[rR]|
          .*\.feather|
          .*\.jpeg|
          .*\.pdf|
          .*\.png|
          .*\.py|
          .*\.RData|
          .*\.rds|
          .*\.Rds|
          .*\.Rproj|
          .*\.sh|
          (.*/|)\.gitignore|
          (.*/|)\.gitlab-ci\.yml|
          (.*/|)\.lintr|
          (.*/|)\.pre-commit-.*|
          (.*/|)\.Rbuildignore|
          (.*/|)\.Renviron|
          (.*/|)\.Rprofile|
          (.*/|)\.travis\.yml|
          (.*/|)appveyor\.yml|
          (.*/|)NAMESPACE|
          (.*/|)renv/settings\.dcf|
          (.*/|)renv\.lock|
          (.*/|)WORDLIST|
          \.github/workflows/.*|
          data/.*|
          )$
    -   id: lintr
    -   id: readme-rmd-rendered
    -   id: parsable-R
    -   id: no-browser-statement
    -   id: no-debug-statement
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks: 
    -   id: check-added-large-files
        args: ['--maxkb=200']
    -   id: end-of-file-fixer
        exclude: '\.Rd'
-   repo: https://github.com/pre-commit-ci/pre-commit-ci-config
    rev: v1.6.1
    hooks:
    # Only required when https://pre-commit.ci is used for config validation
    -   id: check-pre-commit-ci-config
-   repo: local
    hooks:
    -   id: forbid-to-commit
        name: Don't commit common R artifacts
        entry: Cannot commit .Rhistory, .RData, .Rds or .rds.
        language: fail
        files: '\.(Rhistory|RData|Rds|rds)$'
        # `exclude: <regex>` to allow committing specific files

ci:
    autoupdate_schedule: monthly

You can comment out repo or id stanzas if you don’t want them, or add new ones if you want more checks to happen before you commit files.

https://www.r-bloggers.com/2022/09/enforcing-style-in-an-r-project/

⚠️ Failed pre-commit checks will also fail the commit! Make sure to resolve the issues - be it fixing the code, or excluding certain lines from being checked - and commit again.

Modular coding

What is modularity?

Modularity refers to the practice of building software from smaller, self-contained, and independent elements. Each element is designed to handle a specific set of tasks, contributing to the overall functionality of the system.

Modular coding is explained in more detail in these slides.

Writing functions

One of the best ways to improve your code and to make it more modular is to write functions. Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting. Writing a function has four big advantages over using copy-and-paste:

You can give a function an evocative name that makes your code easier to understand.
As requirements change, you only need to update code in one place, instead of many.
You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).
It makes it easier to reuse work from project-to-project, increasing your productivity over time.

A good rule of thumb is to consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code).

Defining a function

Let’s open a new R script file and call it functions-lesson.R.

The general structure of a function is:

R

my_function <- function(parameters) {
  # perform action
  # return value
}

Let’s define a function fahr_to_kelvin() that converts temperatures from Fahrenheit to Kelvin:

R

fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

We define fahr_to_kelvin() by assigning it to the output of function. The list of argument names are contained within parentheses. Next, the body of the function–the statements that are executed when it runs–is contained within curly braces ({}). The statements in the body are indented by two spaces. This makes the code easier to read but does not affect how the code operates.

It is useful to think of creating functions like writing a cookbook. First you define the “ingredients” that your function needs. In this case, we only need one ingredient to use our function: “temp”. After we list our ingredients, we then say what we will do with them, in this case, we are taking our ingredient and applying a set of mathematical operators to it.

When we call the function, the values we pass to it as arguments are assigned to those variables so that we can use them inside the function. Inside the function, we use a return statement to send a result back to whoever asked for it.

Let’s try running our function. Calling our own function is no different from calling any other function:

R

# freezing point of water
fahr_to_kelvin(32)

OUTPUT

[1] 273.15

R

# boiling point of water
fahr_to_kelvin(212)

OUTPUT

[1] 373.15

Let’s open a new Python script file and call it functions-lesson.py

The general structure of a function is:

PYTHON

def my_function(parameters):
  # perform action
  # return value

Let’s define a function fahr_to_kelvin() that converts temperatures from Fahrenheit to Kelvin:

PYTHON

def fahr_to_kelvin(temp):
    kelvin = ((temp - 32) * (5 / 9)) + 273.15
    return kelvin

We define fahr_to_kelvin() by using the def keyword. The list of argument names are contained within parentheses.

Next, the body of the function–the statements that are executed when it runs–is indicated with indentation. The statements in the body are indented by four spaces.

Let’s try running our function. Calling our own function is no different from calling any other function:

PYTHON

# freezing point of water
fahr_to_kelvin(32)

OUTPUT

[1] 273.15

PYTHON

# boiling point of water
fahr_to_kelvin(212)

OUTPUT

[1] 373.15

Discussion

Challenge: Identify code that can be put in a function

In your own project: identify code that would fit better in a function. Try to look for pieces of code that you repeat throughout your project.

Create an issue in your project for each possible function that you find. (Actually implementing the function is beyond the scope of this workshop).

GitHub issues are a good way to track your progress and to-do list. As well as a way for others to signal issues with your code.

(Optional) Modularity in Python

Challenge

Carefully review the following code snippet:

PYTHON

def convert_temperature(temperature, unit):
    if unit == "F":
        # Convert Fahrenheit to Celsius
        celsius = (temperature - 32) * (5 / 9)
        if celsius < -273.15:
            # Invalid temperature, below absolute zero
            return "Invalid temperature"
        else:
            # Convert Celsius to Kelvin
            kelvin = celsius + 273.15
            if kelvin < 0:
                # Invalid temperature, below absolute zero
                return "Invalid temperature"
            else:
                fahrenheit = (celsius * (9 / 5)) + 32
                if fahrenheit < -459.67:
                    # Invalid temperature, below absolute zero
                    return "Invalid temperature"
                else:
                    return celsius, kelvin
    elif unit == "C":
        # Convert Celsius to Fahrenheit
        fahrenheit = (temperature * (9 / 5)) + 32
        if fahrenheit < -459.67:
            # Invalid temperature, below absolute zero
            return "Invalid temperature"
        else:
            # Convert Celsius to Kelvin
            kelvin = temperature + 273.15
            if kelvin < 0:
                # Invalid temperature, below absolute zero
                return "Invalid temperature"
            else:
                return fahrenheit, kelvin
    elif unit == "K":
        # Convert Kelvin to Celsius
        celsius = temperature - 273.15
        if celsius < -273.15:
            # Invalid temperature, below absolute zero
            return "Invalid temperature"
        else:
            # Convert Celsius to Fahrenheit
            fahrenheit = (celsius * (9 / 5)) + 32
            if fahrenheit < -459.67:
                # Invalid temperature, below absolute zero
                return "Invalid temperature"
            else:
                return celsius, fahrenheit
    else:
        return "Invalid unit"

Refactor the code by extracting functions without altering its functionality.

What functions did you create?
What strategies did you use to identify them?

Share your answers in the collaborative document.

Solution 1 - Basic

PYTHON

def celsius_to_fahrenheit(celsius):
    """
    Converts a temperature from Celsius to Fahrenheit.

    Args:
        celsius (float): The temperature in Celsius.

    Returns:
        float: The temperature in Fahrenheit.
    """
    return (celsius * (9 / 5)) + 32


def fahrenheit_to_celsius(fahrenheit):
    """
    Converts a temperature from Fahrenheit to Celsius.

    Args:
        fahrenheit (float): The temperature in Fahrenheit.

    Returns:
        float: The temperature in Celsius.
    """
    return (fahrenheit - 32) * (5 / 9)


def celsius_to_kelvin(celsius):
    """
    Converts a temperature from Celsius to Kelvin.

    Args:
        celsius (float): The temperature in Celsius.

    Returns:
        float: The temperature in Kelvin.
    """
    return celsius + 273.15


def kelvin_to_celsius(kelvin):
    """
    Converts a temperature from Kelvin to Celsius.

    Args:
        kelvin (float): The temperature in Kelvin.

    Returns:
        float: The temperature in Celsius.
    """
    return kelvin - 273.15


def check_temperature_validity(temperature, unit):
    """
    Checks if a temperature is valid for a given unit.

    Args:
        temperature (float): The temperature to check.
        unit (str): The unit of the temperature. Must be "C", "F", or "K".

    Returns:
        bool: True if the temperature is valid, False otherwise.
    """
    abs_zero = {"C": -273.15, "F": -459.67, "K": 0}
    if temperature < abs_zero[unit]:
        return False
    return True


def check_unit_validity(unit):
    """
    Checks if a unit is valid.

    Args:
        unit (str): The unit to check. Must be "C", "F", or "K".

    Returns:
        bool: True if the unit is valid, False otherwise.
    """
    if not unit in ["C", "F", "K"]:
        return False
    return True


def convert_temperature(temperature, unit):
    """
    Converts a temperature from one unit to another.

    Args:
        temperature (float): The temperature to convert.
        unit (str): The unit of the temperature. Must be "C", "F", or "K".

    Returns:
        tuple: A tuple containing the converted temperature in Celsius and Kelvin units.

    Raises:
        ValueError: If the unit is not "C", "F", or "K".
        ValueError: If the temperature is below absolute zero for the given unit.

    Examples:
        >>> convert_temperature(32, "F")
        (0.0, 273.15)
        >>> convert_temperature(0, "C")
        (32.0, 273.15)
        >>> convert_temperature(273.15, "K")
        (0.0, -459.67)
    """
    if not check_unit_validity(unit):
        raise ValueError("Invalid unit")
    if not check_temperature_validity(temperature, unit):
        raise ValueError("Invalid temperature")
    if unit == "F":
        celsius = fahrenheit_to_celsius(temperature)
        kelvin = celsius_to_kelvin(celsius)
        return celsius, kelvin
    if unit == "C":
        fahrenheit = celsius_to_fahrenheit(temperature)
        kelvin = celsius_to_kelvin(temperature)
        return fahrenheit, kelvin
    if unit == "K":
        celsius = kelvin_to_celsius(temperature)
        fahrenheit = celsius_to_fahrenheit(celsius)
        return celsius, fahrenheit

if __name__ == "__main__":
    print(convert_temperature(0, "C"))
    print(convert_temperature(0, "F"))
    print(convert_temperature(0, "K"))
    print(convert_temperature(-500, "K"))
    print(convert_temperature(-500, "C"))
    print(convert_temperature(-500, "F"))
    print(convert_temperature(-500, "B"))

Solution 2 - Advanced

PYTHON

class TemperatureConverter:
    """
    A class for converting temperatures between Celsius, Fahrenheit, and Kelvin.
    """

    def __init__(self):
        """
        Initializes the TemperatureConverter object with a dictionary of absolute zero temperatures for each unit.
        """
        self.abs_zero = {"C": -273.15, "F": -459.67, "K": 0}

    def celsius_to_fahrenheit(self, celsius):
        """
        Converts a temperature from Celsius to Fahrenheit.

        Args:
            celsius (float): The temperature in Celsius.

        Returns:
            float: The temperature in Fahrenheit.
        """
        return (celsius * (9 / 5)) + 32

    def fahrenheit_to_celsius(self, fahrenheit):
        """
        Converts a temperature from Fahrenheit to Celsius.

        Args:
            fahrenheit (float): The temperature in Fahrenheit.

        Returns:
            float: The temperature in Celsius.
        """
        return (fahrenheit - 32) * (5 / 9)

    def celsius_to_kelvin(self, celsius):
        """
        Converts a temperature from Celsius to Kelvin.

        Args:
            celsius (float): The temperature in Celsius.

        Returns:
            float: The temperature in Kelvin.
        """
        return celsius + 273.15

    def kelvin_to_celsius(self, kelvin):
        """
        Converts a temperature from Kelvin to Celsius.

        Args:
            kelvin (float): The temperature in Kelvin.

        Returns:
            float: The temperature in Celsius.
        """
        return kelvin - 273.15

    def check_temperature_validity(self, temperature, unit):
        """
        Checks if a given temperature is valid for a given unit.

        Args:
            temperature (float): The temperature to check.
            unit (str): The unit to check the temperature against.

        Returns:
            bool: True if the temperature is valid for the unit, False otherwise.
        """
        if temperature < self.abs_zero[unit]:
            return False
        return True

    def check_unit_validity(self, unit):
        """
        Checks if a given unit is valid.

        Args:
            unit (str): The unit to check.

        Returns:
            bool: True if the unit is valid, False otherwise.
        """
        if unit not in ["C", "F", "K"]:
            return False
        return True

    def convert_temperature(self, temperature, unit):
        """
        Converts a temperature from one unit to another.

        Args:
            temperature (float): The temperature to convert.
            unit (str): The unit of the temperature.

        Returns:
            tuple: A tuple containing the converted temperature in the other two units.
        """
        if not self.check_unit_validity(unit):
            raise ValueError("Invalid unit")
        if not self.check_temperature_validity(temperature, unit):
            raise ValueError("Invalid temperature")
        if unit == "F":
            celsius = self.fahrenheit_to_celsius(temperature)
            kelvin = self.celsius_to_kelvin(celsius)
            return celsius, kelvin
        if unit == "C":
            fahrenheit = self.celsius_to_fahrenheit(temperature)
            kelvin = self.celsius_to_kelvin(temperature)
            return fahrenheit, kelvin
        if unit == "K":
            celsius = self.kelvin_to_celsius(temperature)
            fahrenheit = self.celsius_to_fahrenheit(celsius)
            return celsius, fahrenheit

if __name__ == "__main__":
    converter = TemperatureConverter()
    print(converter.convert_temperature(0, "C"))
    print(converter.convert_temperature(0, "F"))
    print(converter.convert_temperature(0, "K"))
    print(converter.convert_temperature(-500, "K"))
    print(convert_temperature(-500, "C"))
    print(convert_temperature(-500, "F"))
    print(convert_temperature(0, "X"))

(Optional): Writing good functions in R

Challenge

Challenge 1

Write a function called kelvin_to_celsius() that takes a temperature in Kelvin and returns that temperature in Celsius.

Hint: To convert from Kelvin to Celsius you subtract 273.15

Solution to challenge 1

Write a function called kelvin_to_celsius that takes a temperature in Kelvin and returns that temperature in Celsius

R

kelvin_to_celsius <- function(temp) {
 celsius <- temp - 273.15
 return(celsius)
}

Combining functions

The real power of functions comes from mixing, matching and combining them into ever-larger chunks to get the effect we want.

Let’s define two functions that will convert temperature from Fahrenheit to Kelvin, and Kelvin to Celsius:

R

fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

Challenge

Challenge 2

Define the function to convert directly from Fahrenheit to Celsius, by reusing the two functions above (or using your own functions if you prefer).

Solution to challenge 2

Define the function to convert directly from Fahrenheit to Celsius, by reusing these two functions above

R

fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}

The Modular coding section is based on the following sources:

Modular Code Development from Good practices in research software development
Functions explained from R for Reproducible Scientific Analysis Software Carpentry lesson
Functions chapter from R for Data Science (2e)

Key Points

Coding conventions help you create more readable code that is easier to reuse and contribute to.
Consistently formatted code including descriptive variable and function names is easier to read and write
Software is built from smaller, self-contained elements, each handling specific tasks.
Modular code enhances robustness, readability, and ease of maintenance.
Modules can be reused across projects, promoting efficiency.
Good modules perform limited, defined tasks and have descriptive names.

Coding conventions and modular coding

Overview

Questions

Objectives

Coding formatting and linting

Formatters

Linters

Improving your code

Challenge

BASH

BASH

R

(Optional) Git pre-commit hooks

Challenge

YAML

BASH

BASH

BASH

BASH

R

YAML

Further reading: CI/CD

Modular coding

What is modularity?

Writing functions

Defining a function

R

R

R

OUTPUT

R

OUTPUT

PYTHON

PYTHON

PYTHON

OUTPUT

PYTHON

OUTPUT

Challenge: Identify code that can be put in a function

(Optional) Modularity in Python

Challenge

PYTHON

Solution 1 - Basic

PYTHON

Solution 2 - Advanced

PYTHON

(Optional): Writing good functions in R

Challenge 1

Solution to challenge 1

R

Combining functions

R

Challenge 2

Solution to challenge 2

R