Tools and good practices for research software

Last updated on 2025-03-27 | Edit this page

Overview

Questions

  • What tools and practices are available to help us develop good quality and FAIR research software?
  • How do such tools and practices fit together to enable open and reproducible research?

Objectives

After completing this episode, participants should be able to:

  • Identify some key tools and practices to aid the development of quality research software
  • Explain how can these tools help us work in a FAIR way

There are various tools and practices for the development of research software that support open and reproducible research. These tools and practices work together - no single one will fully address one aspect of developing quality research software, and each one can contribute to multiple aspects simultaneously. It is important to understand that merely using these tools, without aligning them to good practices is not sufficient to create FAIR and quality research software.

You should already have some of these tools installed on your machine by following the setup instructions. Here, we will provide an overview and explain good practices for using them in your research software development worklflow. This list is not exhaustive — you should choose the best set of tools and practices based on your specific domain, problem and community.

Use version control


Using a version control system is a fundamental best practice in software development, ensuring that changes to code are tracked, managed, and reversible. Version control systems maintain a history of modifications and enable efficient collaboration - promoting code ownership, responsibility and credit. When combined with software sharing and collaborative platforms such as GitHub, GitLab or BitBucket, it facilitates code publication, sharing and findability, teamwork and discussions about software and design decisions, provides backup facilities for your code and speeds up collaboration on shared code by allowing edits by more than one person at a time.

Organise software project and structure code


Structure your software project following community best practices and write clean, modular, and reusable code to enhance readability, testing, extensibility, and reusability.

Use reproducible software environments


Using virtual development environments or containerisation ensures software can be developed and run consistently across different systems. Virtual software development environments (such as conda or venv) enable us to share our working code environments with others, making it easier to run, reuse and extend our code. Docker is an open-source containerisation platform that allows developers to package software applications along with their dependencies (libraries, configurations, etc.) into a container that can be deployed and run consistently on different operating systems. We will not cover Docker in this course, but will focus on virtual environments for developing software.

Comply with coding conventions


Applying relevant coding style guidelines (e.g. PEP8 for Python) is crucial for writing clean, readable, and maintainable code. These guidelines help ensure consistency across a codebase, making it easier for developers to understand and collaborate on projects.

Following coding conventions and guides for your programming language that is agreed upon by the community and other programmers are important practices to ensure that others find it easy to read your code, reuse or extend it in their own examples and applications.

Use standard data formats and communication protocols


Using standard data exchange, input and output formats and communication protocols helps create interoperable software that can more readily integrate with other tools into more complex pipelines - increasing its interoperability and reusability.

Test software


Implement tests to validate code functionality, ensure result accuracy, and maintain result consistency over time when updating the code. Testing ensures that your code is correct and does what it is set out to do. When you write code you often feel very confident that it is perfect, but when writing bigger codes or code that is meant to do complex operations it is very hard to consider all possible edge cases or notice every single typing mistake.

Testing also gives other people confidence in your code as they can see an example of how it is meant to run and be assured that it does work correctly on their machine - helping with code understanding and reusability.

Document software


Provide clear and comprehensive documentation, including code comments, API specifications, setup guides, and usage instructions, to ensure the software is easy to understand, use, and extend.

Software documentation varies in scope and intended audience (developers and contributors, end-users, system administrators, etc.), ranging from code-level documentation for developers and end-users looking to modify and extend the code to project and product-level documentation help different users discover the software, understand its functionality, legal terms, installation, contribution guidelines and project governance. More extensive documentation, including full websites with function definitions, usage examples, tutorials, and guides, can further support adoption.

While you may not need as much documentation as a large commercial product, ensuring your code is well-documented is essential for making it discoverable, reusable, and maintainable.

License software


A licence is a legal document which sets down the terms under which the creator of work (such as written text, photographs, films, music, software code) is releasing what they have created for others to use, modify, extend or exploit. It is important to state the terms under which software can be reused - the lack of a licence for the software implies that no one can reuse the software at all.

Use open-source licenses (e.g., MIT, BSD) to make your code freely accessible, modifiable, and redistributable.

Share software


Openly publish your research software with proper documentation and licence in an accessible repository to promote collaboration, transparency, and peer review to identify bugs, improve software correctness and quality, and promote knowledge sharing. Having somewhere to share or publish your code is fundamental to making it findable and accessible, and promote reuse of your software or reproducibility of your results.

Your institution might have a code repository, your research field may have a practice of sharing code via a specific website, archive or journal, or your version control system might include an online component that makes sharing different versions of your code easy. You should check the rules or guidelines of your institution, grant or domain on publishing code, as well as any licenses of the code your software depends on or reuses.

Some examples of commonly used software repositories and registries include:

  • general-purpose software repositories - GitHub, GitLab or BitBucket
  • programming language-specific software repositories for publishing software packages and libraries- PyPi (for Python) and CRAN (for R)
  • software registries - BioTools (for biosciences) and Awesome Research Software Registries, providing a list of research software registries (by country, organisation, domain and programming language) where research software can be registered to help promote its discovery

Use persistent identifiers for software


Using unique persistent identifiers, such as Digital Object Identifiers (DOIs) provided by Zenodo or FigShare, or SoftWare Heritage persistent IDentifiers (SWHID) provided by Software Heritage, and similar digital archiving services help with uniquely identifying your software and its various versions. Code sharing platforms such as GitHub, GitLab or BitBucket provide commit/tag/release identifiers which are very useful but typically only unique within a project or a repository and not globally.

Services like Zenodo provide permanent archiving (storage) for research artefacts (including software) ensuring long-term availability and accessibility of research outputs (even if code sharing services go offline for any reason). Zenodo also integrates with code sharing and development platform GitHub allowing for automatic archiving of software repositories.

These services improve the visibility, findability, citatiton and preservation of research software.

Provide citation for software


Provide identification and citation information in documentation for your software (including its versions) to ensure that software can be properly cited in publications and research outputs. Software citation help researchers and developers properly cite software in publications, ensuring credit and reproducibility. Citation tools can generate standardised citation formats and often integrate with repositories like GitHub, Zenodo and similar DOI services.

There are multiple formats for providing citation information for software, such as including plain text files (e.g. CITATION.txt) or Markdown files (e.g. CITATION.md) or BibTeX files within the software and its repository. Citation File Format (CFF) and CodeMeta are both standard formats for describing software metadata. CFF is a YAML format, while CodeMeta is a JSON format; there are related tools to generate them (e.g. cffinit) Using CITATION.cff or codemeta.json files for software citation offers advantages by enabling richer metadata for citing code, making citation information more accessible and usable by both humans and machines and interoperability with various repositories.

Build community & encourage collaboration


Foster a community and promote collaboration around your software by engaging with users, encouraging contributions, and providing clear guidelines for participation.

Platforms like GitHub, GitLab and BitBucket are more than just source code repositories - they provide social platforms to facilitate discussions, track issues, recognise contributors and support collaboration around your software projects and promote long-term involvement, maintenance and sustainability of your software.

Other software development tools


Integrated development environments (IDEs), such as Visual Studio Code (VS Code) or PyCharm, help with reading, running, testing, and debugging code. IDEs often provide integrations with other tools, e.g. version control and command line terminals, enabling you to do many tasks from a single environment, saving time in switching between different tools.

Command line terminals and tools (e.g. Bash, GitBash) enable us to run and test our code without graphical user interfaces (GUI) afforded to us by IDEs - this is sometimes needed for running our code remotely on servers and high-performance systems without a GUI provision, where time, memory and processing power are expensive or in high demand. Version control systems are typically provided as command line tools but many IDEs now provide integrations with version control systems too - making them accessible using a GUI instead of typing commands in a terminal.

Well designed command line tools support interoperability by using standard protocols for passing parameters, inputs and outputs. Such tools integrate well with one another - allowing us to chain them and build up complex and reproducible workflows and analysis pipelines using several tools in different steps. If we write our software in a way which provides such an interoperable Command Line Interface (CLI) - we will be able to integrate it with other existing command line tools to automate and speed up our work.

Challenge

Individually,

  • reflect on what practices (and tools) you are already using in your software development workflow,
  • list at least 3 new practices or tools that you would like to start employing or using.

Write your reflections in the shared collaborative document.

Tools & practices summary


The table below provides a summary of some good practices that you should be adhering to when developing research software, together with different tools that can help with such practices and how they contribute to the the FAIR and other good software principles.

Practices Tools FAIR Readable/Understandable Correct/Reliable Sustainable/Maintainable
Use version control git, GitHub, GitLab, BitBucket F
Write modular code with well defined interfaces R x x
Connect reusable software components into automated/reproducible software pipelines Command line tools, CLI, workflow tools (Galaxy, Snakemake, WorkflowHub, CWL) I, R
Use reproducible software development environments venv, conda, IDEs (integration with virtual envs.), Docker, etc. R
Use conventional folder structures, format your code to comply with coding conventions PEP8, IDEs (help with formatting and conforming with coding conventions) R
Use standard exchange data formats/communication protocols/interfaces CSV, YAML, JSON, CLI, REST, HTTP(S), etc. I x
Test your software Unit, functional, integration, regression, etc. tests, IDEs (testing and debugging), CI/CD (automation) x
Document your software Comments and documentation strings, README, guides, contributions guidelines R x
Share your software & encourage review Code sharing platforms/services (e.g. GitHub, GitLab, BitBucket) and their code review facilities F, A x
License your software Various open source licences for code (MIT, BSD, GPL, LGPL, etc.), LICENSE R
Use persistent identifiers for your software DOIs, SWHIDs, Zenodo, FigShare, Software Heritage, etc. F x
Provide citation & metadata info for your software CITATION, DOIs, Zenodo, CFF, cffinit, CodeMeta, etc. R x
Build community & encourage collaboration around your software Code of Conduct, README, CONTRIBUTING, open source project governance processes x

Let’s explore some of these practices and tools in detail over the next few episodes.

Further reading


We recommend the following resources for some additional reading on the topic of this episode:

Also check the full reference set for the course.

Key Points

  • Automating your analysis with shell scripts allows you to save and reproduce your methods.
  • Version control helps you back up your work, see how data and code change over time and identify which analysis used which data and code.
  • Programming languages each have advantages and disadvantages in different situations. Use the correct tools for your own work.
  • Integrated development environments (IDEs) automate many coding tasks, provide easy access to documentation, and can identify common errors.
  • Testing helps you check that your code is behaving as expected and will continue to do so in the future or when used by someone else.