Project management, data science

Last updated on 2023-09-06 | Edit this page

Overview

Questions

  • What are the benefits of using data science skills in research project management?
  • What are common challenges for research teams and research project management?

Objectives

  • Understanding what project management and data science entail.

  • Knowing the next steps to take after this workshop.

In this introduction, we will have an overview of what is meant with project management. Because we will use data science principles to tackle the question of open and reproducible research, we will also define what data science is. Reproducibility is introduced later in the course, while we expect participants to be quire knowledgable in open research. One can refer to the additional content if openness needs a definition.

Project Management in a research context


Discussion

What defines a research project ? What makes two projects in a lab different?

The answer may vary with domain and lab culture, but the main components are very similar to what defines one research publication:

  • The vision or specific research question
  • The team: different people may be involved inside and outside the lab
  • The funding source
  • The methods used

Different types of experiments can be part of the same project.

As lab resources are often pooled between project, this means that lab and project management are linked, while being independent.

The research workflow, also known as the research cycle, usually starts with a research idea, via a literature search on what is already known on the subject, to data collection and analysis, to writing, publishing, and the final assessment of a study. Each of these steps in a given research project involves aspects of project management that such as planning, coordination, execution and monitoring of the team, resources (budgets, materials, equipment), and timeliness according to schedule.

Project Design and Planning

It involves defining the project plan which includes the scope, objectives, milestones, and deliverables within the available timeline.

This includes the creation of a timetable or Gantt chart usually created during the grant application.

A great way to visualise the project plan is the Gantt chart, a horizontal bar chart on a time scale, reflecting all of a project’s components, dependencies, and responsibilities. GanttProject and GNOME Planner are two open-source project management applications that allow you to visualise your project plan in a Gantt chart.

Resource Management

It includes managing the budget, allocation of funds to each step over time, and ensuring that deliverables and objectives will be met within the available timeframe.

Team Management

It is the task of making people work together. It includes the definition of the responsibilities for every team member on specific steps and aspects of the research project, task assignments, progress monitoring, and team efficiency measures to thus produce results. It is core in meeting the objectives and goals of a research project.

Research teams often work remotely with team members being located in different parts of the world, and therefore need tools that allow for real-time collaboration, as well as access to process documentation, files, and data to all team members at any given time.

Data Management

The core of every research project, including the collection, organisation, analysis, and secure storing of research data. A Data Management Plan (DMP) is increasingly a necessary component of the project design process and in research proposals, and describes in detail how and where the collected research will be recorded, stored securely, and made accessible for analysis and reuse. Furthermore, the F.A.I.R. Data Principles are equally important to comply with to make research data human- as well as machine-readability to ensure they are findable, accessible, interoperable and reusable, and thus FAIR. We will discuss FAIR data in more detail in episode XX.

Communication and Reporting

Effective communication among team members and project stakeholders (project partners, funders, librarians, publishers, …) to keep everyone updated on the project’s progress. A key component here is the strategic documentation and reporting of the methodologies and any adjustments along the execution of the project. Electronic Lab Notebooks (ELNs) provide a certain amount of interoperability between systems to automate some of the documentation processes, thereby making it easier and more efficient to comply with the FAIR principles.

  • ELNs
  • Progress reports

Risk Management

It is an assessment and documentation of any foreseeable events that might arise and interfere with the success of the project. and adaption With a S.W.O.T. Analysis as part of the project plan, it is possible to describe strengths and weaknesses of the project idea, and also opportunities and threats that might impact the project through external factors. A thorough risk assessment allows you to make contingency plans to address any challenges or issues should they occur.

Monitoring and Evaluation

Monitoring and Evaluation involves ensuring quality control throughout the execution of the project and the adherence to research policies and procedures, lessons learned and conclusions that can be drawn from the results.

Dissemination

Making research results Findable, Accessible, Interoperable, and Reusable (see again the [F.A.I.R. principles[(https://www.go-fair.org/fair-principles/)) is key to the dissemination plan of a research project. Datasets and code should be archived in standardised repositories (see re3data.org for lists of region- and discipline-specific as well as generalist repositories you can use).

Testimonial

I often manage projects where I do not have decision power. My work usually starts by making sure decisions are taken and documented. The documentation is very important, because the implementation of the decision often requires me to remind people of their decision. Indeed, during the rest of the project, my role consist in monitoring progresses and remind people of the objectives, such that specific work is done at the right time. For instance, a team may well aim for publishing data at the start of a project, but this objective will not be met unless a specific workflow is implemented at the start of the project. My work is then to make sure the researchers are aware of the problems and that they do not take shortcuts that go against the long term objectives.

As a project manager, one needs to make sure the resources are sufficient to achieve the goals set in the plan. In research, it often consists in making sure the people are indeed working toward the goal and are not investing their time to other projects or other objectives of the lab. This is particularly important when short term individual incentives are not aligned with the project long term vision. For instance, data management is primordial in team science, but, too often, data collector cannot recognize its importance.

Project Management tools overview

Several digital project management tools exist that facilitate the remote coordination and management of research teams with their projects. Widely used proprietary examples include Trello, Asana, ClickUp, Notion, and Zoho Projects. Each of these has a different set of features, and all of them work with a Kanban board for process documentation.

Kanboard, WeKan, Open project, and Taiga are examples of open-source project management software that contain Kanban boards.

Discussion

What project management (digital) tools are you using ?

What is data science


Over the last decade, several tools, methods and training resources have been developed for early career researchers to learn about and apply data science skills in biomedicine. This is often referred to as biomedical data science, with the following definition.

Testimonial

Biosciences and biomedical researchers regularly combine mathematics and computational methods to interpret experimental data. The term “data science” describes expertise associated with taking (usually large) data sets and annotating, cleaning, organizing, storing, and analyzing them for the purposes of extracting knowledge. […] The terms “biomedical data science” and “biomedical data scientist” […] connote activities associated with the creation and application of methods to new and large sources of biological and medical data aimed at converting them into useful information and knowledge. They also connote technical activities that are data-intensive and require special skills in managing the large, noisy, and complex data typical of biology and medicine. They may also imply the application of these technologies in domains where their collaborators previously have not needed data-intensive computational approaches.

-- Russ B. Altman and Michael Levitt (2018). Annual Review of Biomedical Data Science

In contrast to the definition above (and as will be explained in the next chapters), we think research which is not data intensive would also gain in applying data science principles. However, to ensure that data science approaches are appropriately applied in domain research, such as in biosciences, there is a need to also engage and educate scientific group leaders and researchers in project leadership roles on best practices. Computational methods might indeed be as complex as a neural network, but even statistical tests and producing figures for a publication require data science and coding methods.

Researcher use data science skills to apply computation techniques and reproducible data analyses approaches to their research questions. In order to apply these tools, researcher first need to understand and apply the building blocks of data science, especially research data management, collaborative working and project management.

Two people with computational expertise holding a giant book towards two other people who conduct lab experiments. The book saya: how to apply data science in biology.

How to apply data science in biology. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807

Testimonial

In some instances, it has been argued that “data science” simply rebrands existing fields like statistics or computer science. Our view is that data science has gained traction as an overarching term due to increased data availability and complexity; development of computational methods; advances in computational infrastructure; growing concerns about scientific rigor and the reproducibility of research findings; and a recognition that new advances will result from interdisciplinary research and collaboration. These trends are not unique to data science, but their integration and consolidation under a single term, however broad, reflects an understanding of their interconnectedness and is a real shift in the scientific landscape

  • Goldsmith, J., Sun, Y., Fried, L. P., Wing, J., Miller, G. W., & Berhane, K. (2021). The Emergence and Future of Public Health Data Science. Public Health Reviews, 42. doi: 10.3389/phrs.2021.1604023

With new technologies supporting the generation of large-scale data as well as successful applications of data science, Machine Learning (ML) and Artificial Intelligence (AI) in biomedicine and related fields have recently shown huge potential to transform the way we conduct research. Recent groundbreaking research utilising AI technologies in biomedicine has led to an enormous interest among researchers in data science, ML and AI approach to extract useful insights from big data, make new discoveries and address biological questions. As pictured below, in order to apply these tools, researcher first need to understand and apply the building blocks of data science, especially research data management, collaborative working and project management.

Discussion

  • In what aspects of your projects do you already apply computational and statistical approaches?

  • Do you consider data science relevant for your project? Why/Why not?

  • To what extent do you apply data science practices in your research projects? (A) Not yet, (B) Sometimes, (C) In most projects, (D) In every project.

The Data Science for Biomedical Scientists project helps address this need in training by equipping experimental biomedical scientists with essential computational skills. In all the resources developed within this project, we consistently emphasise how computational and data science approaches can be applied while ensuring reproducibility, collaboration and transparent reporting.

Testimonial

The goal is to maintain the highest standards of research practice and integrity.

In this training material for learning how to manage computational projects, we discuss essential practices for computational reproducibility required for carrying out meaningful analyses of research datasets through data exploration, processing, visualisation and communication. We present unfamiliar and complex topics from computation and data science to biologists by providing examples and recommendations from their fields. The goal is to enable effective management and sharing of their computational projects. We therefore encourage you to go through this training material before taking our second workshop, more focused in AI and Data Science.

Project management in open and reproducible projects


This course aims at giving an overview of project management techniques particularly useful for open and reproducible computational project. This means we will not talk much about risk assessment, and resource management, but focus on team and data management. This has of course repercussion in the project design and planning, the communication and reporting strategy, and monitoring and quality control is at the core of the code management topic.

In this course, we will also present some software that may help to share project management tasks and results in a distributed team.

Some References and resources


General guides

Relevant turing way chapters

literate programing guide

Keypoints

  • Project management includes resource, team, data, communication and risk management.
  • Data science is dealing with large and diverse team, working remotely while using and re-using code to achieve reproducible analysis.