This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to Data Science and AI for senior researchers

The Data Science and AI for senior Biomedical Researchers project is funded by The Alan Turing Institute's AI for Science and Government Research Programme (ASG). This workshop gives an introduction to data science and Artificial Intelligence (AI). Providing context and examples from biomedical research, the workshop content includes AI for automation, the process of unsupervised and supervised machine learning, their practical applications, and common pitfalls that researchers should be aware of in order to maintain scientific rigour and research ethics.

This workshop builds on training resources and practices developed by the following communities: The Turing Way, The Carpentries, and Open Life Science. The workshop is hosted by The Alan Turing Institute's Tools, Practices and Systems (TPS) research team, and all materials are shared under the CC-BY 4.0 License. Although the workshop is tailored to experimental biologists and biomedical researchers, the materials will be generally transferable and directly relevant to data science projects across different research disciplines. Anyone interested in collaborating on or improving this workshop is welcome to connect with the development team on GitHub (see the repository).


This resource is designed for experimental biologists and biomedical researchers, with a focus on two key professional groups:

  • Group leaders and lab managers without prior experience with Data Science or management of computational projects.
  • Postdocs and lab scientists (next-generation senior leaders) interested in integrating computational science with research projects in the field of bioscience.

In defining the scope of this project, we make the following assumptions about our target audience:

  • They have a computational project in mind for which funding and research ethics approval have been received.
  • They have a good understanding of designing and contributing to a scientific project throughout its lifecycle.
  • The research team is either partially or fully established, consisting of 3 or more team members.

This workshop is developed alongside the Managing Open and Reproducible Computational Projects workshop, which includes selected practices and tools for senior researchers to manage and supervise data science and AI/ML projects in life science domains.


Setup   Download files required for the lesson

00:00   1. Welcome to this workshop
  • What is the purpose of this training?
  • What are the learning goals and objectives?
  • What will this workshop not cover?
  • What next steps should be taken after this course?

00:10   2. Data Science, AI, and Machine Learning
  • What is Data Science and Artificial Intelligence?
  • What is Machine Learning and how does it apply in biomedical research?
  • What are some relevant examples of Deep Learning and Large Language Models?

00:43   3. AI for Automation
  • How is AI used for automating tasks in biomedical experimental setups?
  • What are examples of biomedical AI-driven software packages and what can they be used for?

00:55   4. AI for Data Insights
  • How is AI used for data insights in biomedical experimental setups?
  • What types of data insights can be generated?

00:55   5. Problems with AI
  • What are the common pitfalls of using machine learning?
  • What are common limitations and pitfalls in ML applications?
  • What are conscious and unconscious biases that might influence ML algorithms?
  • How can data privacy and data security be ensured?
  • Who is responsible and accountable for any ethical issues implied by ML utilisation?

00:55   6. Practical Considerations for Researchers
  • What are the necessary steps before research data can be processed through ML pipelines?
  • What types of data cleaning can be applied to prepare raw data for ML?

01:45   7. Practical Considerations: Reporting Results
  • How do research results differ with regard to Supervised versus Unsupervised Learning?
  • What are best practices for responsibly reporting results from ML pipelines?

02:35   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.