Course introduction


FAIR research software


  • Open research means the outputs of publicly funded research are publicly accessible with no or minimal restrictions.
  • Reproducible research means the data and software are available to recreate the analysis.
  • FAIR data and software are Findable, Accessible, Interoperable and Reusable.
  • These principles support research and researchers by saving time, reducing barriers to discovery, and increasing the impact of the research output.

Tools and practices for FAIR research software development


  • Automating your analysis with shell scripts allows you to save and reproduce your methods.
  • Version control helps you back up your work, see how data and code change over time and identify which analysis used which data and code.
  • Programming languages each have advantages and disadvantages in different situations. Choose the tools that best suit your own work.
  • Integrated development environments (IDEs) automate many coding tasks, provide easy access to documentation, and can identify common errors.
  • Testing helps you check that your code is behaving as expected and will continue to do so in the future or when used by someone else.

Version control


  • A version control system is software that tracks and manages changes to a project over time.
  • Using version control aids reproducibility since the exact state of the software that produced an output can be recovered.
  • A commit represents the smallest unit of change to a project.
  • Commit messages describe what each commit contains and should be descriptive.
  • Logs can be used to get an overview of the history of a project.

Reproducible development environment


  • Virtual environments keep Python versions and dependencies required by different projects separate.
  • A Python virtual environment is itself a directory structure.
  • You can use venv to create and manage Python virtual environments, and pip to install and manage Python external (third-party) libraries.
  • By convention, you can save and export your Python virtual environment in a requirements.txt in your project’s root directory, which can then be shared with collaborators/users and used to replicate your virtual environment elsewhere.
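The points above can be sketched with Python's standard-library venv module. This is a minimal illustration, not a recommended workflow: the helper names create_environment and write_requirements, the .venv directory name, and the pinned package versions are all assumptions made for the example. On the command line, the usual equivalents are `python3 -m venv .venv`, `pip freeze > requirements.txt` and `pip install -r requirements.txt`.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(project_dir: Path) -> Path:
    """Create a virtual environment in project_dir/.venv and return its path."""
    env_dir = project_dir / ".venv"  # hypothetical but commonly used name
    # with_pip=False keeps this sketch fast and offline; pass with_pip=True
    # to have pip installed inside the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


def write_requirements(project_dir: Path, packages: list[str]) -> Path:
    """Write pinned package specifiers to requirements.txt, one per line."""
    requirements_file = project_dir / "requirements.txt"
    requirements_file.write_text("\n".join(sorted(packages)) + "\n")
    return requirements_file


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    env_dir = create_environment(project)
    # The environment is an ordinary directory tree with a small config file.
    env_has_config = (env_dir / "pyvenv.cfg").is_file()
    env_entries = sorted(p.name for p in env_dir.iterdir())
    # Illustrative pins only; a real project lists its own dependencies.
    requirements = write_requirements(project, ["numpy==1.26.4", "pandas==2.2.2"])
    requirements_text = requirements.read_text()
```

Listing env_entries shows that the environment is nothing more than folders and a pyvenv.cfg file, which is why it can be deleted and rebuilt from requirements.txt at any time.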

Code readability


  • Readable code is easier to understand, maintain, debug and extend (reuse) - saving time and effort.
  • Choosing descriptive variable and function names will communicate their purpose more effectively.
  • Using comments and docstrings to describe parts of the code will help transmit understanding and context.
  • Use libraries or packages for common functionality to avoid duplication.
  • Creating functions from the smallest, reusable units of code will make the code more readable, help compartmentalise which parts of the code perform which actions, and isolate specific code sections for re-use.
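As a small illustration of these points, the sketch below uses descriptive names, docstrings, a comment, a library function and two small reusable functions. The temperature example and every name in it are hypothetical and chosen only for demonstration.

```python
from statistics import mean


def celsius_to_kelvin(temperature_celsius: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return temperature_celsius + 273.15


def mean_temperature_kelvin(temperatures_celsius: list[float]) -> float:
    """Return the mean of a list of Celsius readings, expressed in kelvin."""
    # statistics.mean replaces a hand-written sum-and-divide loop.
    return celsius_to_kelvin(mean(temperatures_celsius))


daily_readings_celsius = [10.0, 20.0, 30.0]
mean_kelvin = mean_temperature_kelvin(daily_readings_celsius)
```

Compare this with a single function called f that takes an argument x: the behaviour could be identical, but a reader would need to study the body to learn what it does and in which units.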

Code structure


  • Good practices for code and project structure are essential for creating readable, accessible and reproducible projects.

Code correctness


  1. Code testing supports the FAIR principles by improving the accessibility and re-usability of research code.
  2. Unit testing is crucial as it ensures each function works correctly.
  3. Using the pytest framework, you can write basic unit tests for Python functions to verify their correctness.
  4. Identifying and handling edge cases in unit tests is essential to ensure your code performs correctly under a variety of conditions.
  5. Test coverage can help you to identify parts of your code that require additional testing.
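A minimal sketch of what such unit tests might look like is given below. The normalise function is a hypothetical example, not code from the course. The tests use plain assert statements and test_ prefixed names so that pytest can discover them; the empty-input check is written with try/except only to keep the sketch free of imports, and pytest.raises is the more idiomatic way to express it.

```python
def normalise(values):
    """Scale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    if not values:
        raise ValueError("values must not be empty")
    lowest, highest = min(values), max(values)
    if lowest == highest:
        # Edge case: all values are identical, so avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def test_normalise_typical_input():
    assert normalise([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_normalise_identical_values():
    # Edge case: a constant list would otherwise trigger a ZeroDivisionError.
    assert normalise([3, 3, 3]) == [0.0, 0.0, 0.0]


def test_normalise_empty_input_raises():
    # Edge case: empty input should fail loudly rather than return nonsense.
    try:
        normalise([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Saved in a file whose name starts with test_ (for instance test_normalise.py, a hypothetical name), these functions would be collected and run by the pytest command.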

Code documentation


  • Documentation allows users to run and understand software without having to work things out for themselves directly from the source code.
  • Software documentation supports the FAIR principles by improving the reusability of research code.
  • A (good) README, CITATION file and LICENSE file are the minimum documentation elements required to support FAIR research code.
  • Documentation can be provided to users in a variety of formats including a docs folder of Markdown files, a repository Wiki and static webpages.
  • A static documentation site can be created using the tool MkDocs.
  • Documentation frameworks such as Diataxis provide content and style guidelines that can help us write high-quality documentation.

Open collaboration on code


  • Open source projects use copyright licenses that permit others to reuse and adapt your code or data.
  • Permissive licenses allow code to be used in other products provided that the copyright statement is displayed.
  • Copyleft licenses require the source code of any modifications to be released under a copyleft license.
  • Creative Commons licenses are suitable for non-code files such as documentation and images.
  • Open source software can be sold, but you must supply the source code and the people you sell it to can give it away to somebody else.
  • Add a license file to your repository, and add a license notice to each file in case it gets detached.
  • Zenodo can be used to archive a GitHub repository and obtain a DOI for it.
  • We can include a CITATION file to tell people how to cite our code.
  • GitHub can track bugs or issues with a program.
  • Git branches can be used to allow multiple developers to work on the same part of a program in parallel.
  • The git branch command shows the list of branches and can create new branches.
  • The git switch command changes which branch we are working on.
  • The git merge command merges another branch into the current one.
  • Pull requests allow developers to work on their own branch/fork and then request other developers review their changes before they are merged.

Wrap-up


  • When developing software for your research, think about how it will help you and your team, your peers, your domain or community, and the world.