Why should we share long-term experiment data?

Last updated on 2024-11-19

Estimated time: 12 minutes

Overview

Questions

  • Why should you share data?
  • What are the motivations for sharing long-term experiment (LTE) data?

Objectives

  • Understand the wider context for data sharing.
  • Explain how sharing data can increase the impact of your data and research.
  • Understand how sharing data can increase the return on investment to funders.
  • Understand how sharing data supports scientific integrity.

What is open science?


Open science is the movement to make scientific research outputs accessible to all. Open science research outputs include articles, datasets, physical samples, protocols, and code. These outputs are the building blocks of the open science movement.

What is open data?


Open data is a key building block of the open science movement and key to supporting scientific integrity and reproducibility, but what does open data mean? The Open Knowledge Foundation’s Open Data Handbook defines Open Data as “data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike”.

The full open data definition gives precise details as to what this means in practice. These are:

  • Data must be freely available, and at no more than the cost of reproduction.
  • Data must be available in a modifiable and convenient form.
  • Data must be distributed with terms allowing for its reuse and redistribution.
  • Everyone must be able to use the data; there must be no restrictions based on intended use or user.

Does data have to be open to be shareable?


No. Data can be shared even if it isn’t open. ‘Data available on request’ is still a common practice in research article data availability statements. Such a statement means the data is not open, but it can still be shared. As we will see in the next episode on FAIR data, data can be FAIR but not open.

What are the motivations for open science and data sharing?


Discussion

Working in groups, identify reasons why many organisations, funders, and governments are promoting open science and data sharing.

  • Think about the different motivations and who benefits.
  • How might these motivations change for LTE data?

Return on Investment

A lot of research is funded by the public, either through charities or taxes. For example, Horizon Europe has a €95.5 billion budget for 2021-27, and the Bill and Melinda Gates Foundation aims to pay $9 billion by 2026. However, according to the EU report “Cost-benefit analysis for FAIR research data”, €10.2 billion is lost every year because data is not accessible. Funders, who are spending billions on research, have a self-interested motivation to see a return on their investments. One way to achieve that is to ensure the data they pay to generate is reused as widely as possible, beyond its original purpose. Funders may be especially keen to see data shared when it is either expensive to generate or difficult to replicate.

Many funders have therefore adopted policies that actively encourage, and in some cases mandate, open data sharing. For example, the following funders all promote open data sharing in their data policies:

  • Bill and Melinda Gates Foundation
  • National Science Foundation (US)
  • CGIAR (International)
  • USAID
  • Horizon Europe
  • Biodiversa
  • UKRI (UK)

Moral Imperative

For publicly funded research there is a moral imperative to make data openly available. Public money should be invested for public good: the public should benefit by not having to pay to access research they have already funded, and by reduced duplication of effort.

Reproducibility, transparency, and accountability

Open science by its nature helps to address reproducibility in science. Sharing data allows other researchers to replicate results and validate their provenance. This helps to build transparency and trust in research findings.

By making data openly available, researchers can more easily counteract narratives that deliberately misinterpret the data by analysing it selectively.

Personal reward

Open data works best when researchers are incentivised and rewarded for sharing their data. There is a growing body of evidence that sharing data can lead to increased citations. Allowing other researchers to reuse your data can increase the impact of your research, and if datasets are cited this impact can be measured and reported. Adding accessible data as a research output can therefore increase researcher reputation, and help to build communities around reuse of data.

Organisations are developing infrastructures to recognise open science practices. For example, the Declaration on Research Assessment (DORA) aims to recognise datasets as important research outputs which should be considered by funders and institutions in their assessment processes.

Long-term data

Long-term experiments pull together several motivations for making data accessible. Long-term data is costly to produce, unique and irreplaceable, and can be used to address a wide variety of research questions. The range of questions that can be asked of LTEs increases if there are opportunities to combine their data and analyse it using new methods. For example, Maclaren et al. (2022) combined data from 30 LTEs in Africa and Europe to analyse the interactions between different management practices.

In the UK, the data sharing policy of the Biotechnology and Biological Sciences Research Council (BBSRC), which funds the Rothamsted long-term experiments, identifies the low-throughput, cumulative, long time series data generated by long-term experiments as having an especially strong scientific case for data sharing.

Long-term experiments can directly benefit from sharing data by establishing new collaborations with researchers using the data. In our opinion, it is not necessary for an LTE manager to require co-authorship on every article authored by a researcher using LTE data. However, it is our experience from the Rothamsted LTEs that researchers will often seek support from experiment managers and data curators to interpret LTE data, and this naturally leads to co-authorship.

Who uses LTE data?

Use the Rothamsted Long-term Experiments Bibliography to see how many different research areas have used the Broadbalk experiment.

  • Agronomy
  • Weed ecology
  • Soil science
  • Soil metagenomics
  • Plant nutrition
  • Nitrogen use efficiency
  • Crop and soil model development
  • Soil carbon dynamics
  • Atmospheric sciences
  • Disease and disease resistance

LTE data is therefore inherently valuable to a wide research community. By making LTE data openly available, LTE managers can increase the impact of the experiment through data reuse and use this to evidence ongoing impact to funders.

Key Points

  • Open science aims to remove barriers to accessing research outputs.
  • Open data is a key building block of Open Science.
  • There are different motivations for sharing data, including return on investment, the moral imperative, reproducibility and transparency, and personal reward.
  • LTE data is inherently valuable with relevance to multiple research areas.