Course introduction
FAIR research software
- Open research means the outputs of publicly funded research are publicly accessible with no or minimal restrictions.
- Reproducible research means the data and software are available to recreate the analysis.
- FAIR data and software are Findable, Accessible, Interoperable and Reusable.
- These principles support research and researchers by saving time, reducing barriers to discovery, and increasing impact of the research output.
Tools and practices for FAIR research software development
- Automating your analysis with shell scripts allows you to save and reproduce your methods.
- Version control helps you back up your work, see how data and code change over time and identify which analysis used which data and code.
- Programming languages each have advantages and disadvantages in different situations. Use the correct tools for your own work.
- Integrated development environments (IDEs) automate many coding tasks, provide easy access to documentation, and can identify common errors.
- Testing helps you check that your code is behaving as expected and will continue to do so in the future or when used by someone else.
Version control
- A version control system is software that tracks and manages changes to a project over time.
- Using version control aids reproducibility, since the exact state of the software that produced an output can be recovered.
- A commit represents the smallest unit of change to a project.
- Commit messages describe what each commit contains and should be descriptive.
- Logs give an overview of the history of a project.
Reproducible development environment
- Virtual environments keep Python versions and dependencies required by different projects separate.
- A Python virtual environment is itself a directory structure.
- You can use `venv` to create and manage Python virtual environments, and `pip` to install and manage external (third-party) Python libraries.
- By convention, you can save and export your Python virtual environment in a `requirements.txt` file in your project's root directory, which can then be shared with collaborators/users and used to replicate your virtual environment elsewhere.
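The point that a virtual environment is itself a directory structure can be checked directly. The sketch below is only an illustration: it uses the standard library `venv` module (the same machinery behind `python -m venv`) to build an environment in a temporary folder and list what was created. The function name `create_environment` and the folder name `venv_demo` are hypothetical, and `with_pip=False` is an assumption made to keep the example quick and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir: Path) -> list[str]:
    """Create a virtual environment and return the names of its top-level entries."""
    # Assumption for this sketch: with_pip=False keeps creation fast and offline.
    # A real project environment would normally be created with pip available.
    venv.create(env_dir, with_pip=False)
    return sorted(entry.name for entry in env_dir.iterdir())


if __name__ == "__main__":
    # Build the environment in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        entries = create_environment(Path(tmp) / "venv_demo")
        # The listing contains ordinary files and folders, including pyvenv.cfg.
        print(entries)
```

The exact folder names differ between operating systems, but the listing should always include a `pyvenv.cfg` file, which records the settings of the environment.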
Code readability
- Readable code is easier to understand, maintain, debug and extend (reuse) - saving time and effort.
- Choosing descriptive variable and function names will communicate their purpose more effectively.
- Using comments and docstrings to describe parts of the code will help transmit understanding and context.
- Use libraries or packages for common functionality to avoid duplication.
- Creating functions from the smallest, reusable units of code will make the code more readable, help compartmentalise which parts of the code are doing what, and isolate specific code sections for reuse.
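As a small illustration of the points above, the following sketch combines descriptive names, docstrings, a comment, a library function and small reusable functions. The temperature-conversion task and the function names are hypothetical and chosen only for the example.

```python
from statistics import mean


def fahrenheit_to_celsius(temperature_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temperature_f - 32) * 5 / 9


def mean_temperature_celsius(temperatures_f: list[float]) -> float:
    """Return the mean of a list of Fahrenheit readings, expressed in Celsius."""
    # statistics.mean is part of the standard library, so there is no need
    # to write and maintain our own averaging loop.
    return mean(fahrenheit_to_celsius(reading) for reading in temperatures_f)
```

Because the conversion lives in its own function, it can be reused and tested separately from the averaging step.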
Code structure
- Good practices for code and project structure are essential for creating readable, accessible and reproducible projects.
Code correctness
- Code testing supports the FAIR principles by improving the accessibility and re-usability of research code.
- Unit testing is crucial as it ensures each function works correctly.
- Using the `pytest` framework, you can write basic unit tests for Python functions to verify their correctness.
- Identifying and handling edge cases in unit tests is essential to ensure your code performs correctly under a variety of conditions.
- Test coverage can help you to identify parts of your code that require additional testing.
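A minimal sketch of what such unit tests might look like is shown below. The function `rescale` is hypothetical and exists only for this example. The tests are plain functions whose names start with `test_` and that use bare `assert` statements, which is the form pytest collects and runs. Two of the tests target edge cases: an empty list and a list of identical values.

```python
def rescale(values: list[float]) -> list[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    if not values:
        # Edge case: nothing to rescale.
        return []
    lowest, highest = min(values), max(values)
    if lowest == highest:
        # Edge case: every value is identical, so avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


def test_rescale_typical_values():
    assert rescale([0, 5, 10]) == [0.0, 0.5, 1.0]


def test_rescale_empty_list():
    assert rescale([]) == []


def test_rescale_identical_values():
    assert rescale([3, 3, 3]) == [0.0, 0.0, 0.0]
```

Saved in a file such as `test_rescale.py` (a hypothetical name), these tests can be run by calling `pytest` from the same directory.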
Code documentation
- Documentation allows users to run and understand software without having to work things out for themselves directly from the source code.
- Software documentation supports the FAIR principles by improving the reusability of research code.
- A (good) README, CITATION file and LICENSE file are the minimum documentation elements required to support FAIR research code.
- Documentation can be provided to users in a variety of formats, including a `docs` folder of Markdown files, a repository Wiki and static webpages.
- A static documentation site can be created using the tool MkDocs.
- Documentation frameworks such as Diataxis provide content and style guidelines that can help us write high-quality documentation.
Open code & collaboration
- Zenodo can be used to archive a GitHub repository and obtain a DOI for it.
- We include a CITATION file with our code to tell people how to cite it.
- GitHub can help us track bugs or issues with software.
- Git branches can be used to allow multiple developers to work on the same part of code in parallel.
- The `git branch` command shows the list of branches and can create new branches.
- The `git switch` command changes which branch we are working on.
- The `git merge` command merges another branch into the current one.
- Pull requests allow developers to work on their own branch/fork and then request that other developers review their changes before they are merged.
Wrap-up
- When developing software for your research, think about how it will help you and your team, your peers and domain/community, and the world.