This lesson is still being designed and assembled (Pre-Alpha version)

Managing Open and Reproducible Computational Projects

The Managing Open and Reproducible Computational Projects training material covers best practices for managing and supervising computational projects in biology and related fields, spanning data science methods, analysis, interpretation and reporting. Through the lessons in this training, researchers will improve their understanding of rigorous and reproducible scientific methods and learn to guide their integration when designing reproducible, transparent and collaborative computational projects. The guidance on managing and supervising early career researchers who conduct computational (data-driven or data-informed) research will also help ensure transparency and research integrity throughout project design, methodology, analysis, interpretation and reporting.

This training material is developed under the Data Science for Biomedical Scientists project. It draws heavily on chapters from The Turing Way and builds on The Carpentries and Open Life Science practices. The project is hosted by the Tools, practices and systems (TPS) research team, and all materials are shared under a CC-BY 4.0 license. Although the training course is tailored to the biomedical sciences community, the materials are intended to be transferable and directly relevant to data science projects in other domains. Anyone interested in collaborating on or improving this material is welcome to connect with the development team on GitHub (see the repository).

Funding Acknowledgement: The first iteration of this project was funded by The Alan Turing Institute - AI for Science and Government (ASG) Research Programme from October 2021 to March 2022.


This resource is designed for experimental biologists, biomedical researchers and adjacent communities, with a focus on two key professional/career groups:

  • Group leaders or lab managers without prior experience with Data Science or management of computational projects
  • Postdocs and lab scientists (next-generation senior leaders) interested in enabling the integration of computational science into biosciences

In defining the scope of this project for our target audience, we make some assumptions about the learner groups:

  • Our learners have a good understanding of designing or contributing to a scientific project throughout its lifecycle.
  • They have a computational project in mind for which funding and research ethics approval have been received.
  • We also assume that a research team, of any size, is at least partially established.

This lesson is developed alongside the Introduction to Data Science and AI for senior researchers lesson. Learners are encouraged to go through that lesson to learn about data science and AI/ML practices relevant to life science domains, where the best practices for Managing Open and Reproducible Computational Projects can be applied in practice.


Setup: Download files required for the lesson

00:00  1. Introduction to this course
  • What is the purpose of this training?
  • Who is the target audience?
  • What will they have learned by the end of this training?

00:10  2. Better and faster research!
  • How does this training relate to your work?
  • What are the benefits of using data science skills?
  • What are the challenges for teams and management?

00:45  3. What is special in a data science project?
  • Get an overview of the training material
  • Understand how the different aspects of this material relate to one another

01:00  4. Reproducibility
  • How to build reproducible analysis?
  • How to deal with dependencies?

01:10  5. An introduction to version control
  • What is version control?
  • Why use Git?
  • How is a version control system relevant for biomedical research?

01:50  6. What IT tools can be used?
  • What IT tools are used in data science and how do they relate to research projects?

02:00  7. Setting up a computational project
  • How to set up a computational project?
  • What main concerns and challenges exist and how to address them?
  • How to create a project repository for sharing and collaboration, with the intention to release?

02:30  8. Implementing tools and methods during the project
  • How to manage and oversee tasks and track progress of your projects?
  • How do collaborative practices help ensure code quality, testing and reuse?
  • What is literate programming and how does it help with early communication, testing and collaboration?

03:00  9. Research Data Management
  • What is considered research data?
  • How to start building a research data management plan?
  • What are the FAIR principles for data management?
  • Why care about documentation and metadata standards?

03:50  10. Fostering documentation

04:00  11. Scientific rigour with code
  • Is analysis with code more rigorous?
  • What is p-hacking?

04:10  12. Coding basics
  • What is the role of data wrangling?
  • What is literate programming?
  • How to use data visualisation for insight and communication?

04:30  13. Code testing and Review
  • What are the main objectives and best practices for testing and reviewing code?
  • How can continuous integration help?
  • How can group leaders facilitate a collaborative environment for code review?

04:50  14. Code Modularity

05:00  15. Publication and release
  • Why should I make my research objects available?
  • What open source tools to use for applying data science practices in bioscience?
  • How to get your research work cited and invite more contributions to your project?

05:30  16. Open Science Practices
  • How to maintain a history of contributions and contributors?
  • How to apply open science practices to work transparently and collaborate openly?

05:50  17. Data and code citation
  • Why should I make my research objects available?
  • What open source tools to use for applying data science practices in bioscience?
  • How to get your research work cited and invite more contributions to your project?

06:00  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.