Essential Features of a Comparative Experiment

Last updated on 2025-06-30

Overview

Questions

  • How are comparative experiments structured?

Objectives

  • Describe the common features of comparative experiments.
  • Identify experimental units, treatments, response measurements, and ancillary and nuisance variables.

Well-designed experiments can deliver information that has a clear impact on human health. The main goals of good experimental design are to reach valid scientific conclusions and to use resources (time, money, and materials) efficiently to deliver meaningful results. Such results include estimates that are precise enough to be useful rather than fuzzy and uninformative, and statistical tests with enough power to detect a real effect if one exists. Ultimately, good design protects conclusions by ruling out extraneous factors and biases, so that we can claim with confidence that an observed effect is a direct result of the treatment and not of something else.

For example, suppose we want to compare enzyme levels measured in processed blood samples from laboratory mice, using either a kit from vendor A or a kit from a competitor, B. From 20 mice, we randomly select 10 whose blood samples are prepared with kit A, while the blood samples of the remaining 10 mice are prepared with kit B. The average enzyme level is 10.32 for kit A and 10.66 for kit B. We might interpret the difference of -0.34 as due to differences between the two preparation kits and conclude either that they give substantially equal results or, alternatively, that they systematically differ from one another.

Figure: Boxplots of enzyme levels for kits A and B.
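The random allocation in this example can be sketched in Python with the standard library's `random` module. This is a minimal illustration, not the procedure used to produce the data shown: the mouse labels (`mouse_01` to `mouse_20`), the helper `allocate`, and the seed are all hypothetical, and the seed only makes the split reproducible. The last lines check the arithmetic of the reported difference.

```python
import random


def allocate(units, treatments, seed=None):
    """Randomly split `units` into equal-sized groups, one per treatment (hypothetical helper)."""
    units = list(units)
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be divisible by the number of treatments")
    rng = random.Random(seed)  # seeded only so the allocation is reproducible
    rng.shuffle(units)
    size = len(units) // len(treatments)
    return {t: sorted(units[i * size:(i + 1) * size]) for i, t in enumerate(treatments)}


mice = [f"mouse_{i:02d}" for i in range(1, 21)]  # hypothetical labels for the 20 mice
groups = allocate(mice, ["kit A", "kit B"], seed=2025)
for kit, members in groups.items():
    print(kit, members)

# Difference in average enzyme level reported in the text (kit A minus kit B)
mean_a, mean_b = 10.32, 10.66
print(round(mean_a - mean_b, 2))  # -0.34
```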

Either interpretation is only valid if the two groups of mice and their measurements are identical in all aspects except for the sample preparation kit. If we use one strain of mice for kit A and another strain for kit B, any difference might also be attributed to inherent differences between the strains. Similarly, if the measurements using kit B were conducted much later than those using kit A, any observed difference might be attributed to changes in, e.g., personnel, device calibration, or any number of other influences. None of these competing explanations for an observed difference can be excluded from the given data alone, but good experimental design allows us to render them (almost) implausible. (Adapted from Statistical Design and Analysis of Biological Experiments by Hans-Michael Kaltenbach.)

The Generation 100 study evaluated the effects of exercise in more than 1500 elderly Norwegians from Trondheim, Norway, to determine whether exercise led to a longer active and healthy life. Specifically, the researchers investigated the relationship between exercise intensity and health and longevity. One group performed high-intensity interval training (a 10-minute warm-up followed by four 4-minute intervals at ∼90% of peak heart rate) twice a week for five years. A second group performed moderate exercise twice a week (50 minutes of continuous exercise at ∼70% of peak heart rate). A third, control group followed physical activity advice according to national recommendations. Clinical examinations and questionnaires were administered to all participants at the start of the study and after one, three, and five years. Heart rate, blood pressure, leg and grip strength, cognitive function, and other health indicators were measured during the clinical examinations.

Figure: Experimental design for the Generation 100 study.

Generation 100: Does exercise make older adults live longer?

Challenge 1: Raw ingredients of a comparative experiment

Discuss the following questions with your partner, then share your answers to each question in the collaborative document.

  1. What is the research question in this study? If you prefer to name a hypothesis, turn the research question into a declarative statement.
  2. What is the treatment (treatment factor)? How many levels are there for this treatment factor?
  3. What are the experimental units (the entities to which treatments are applied)?
  4. What are the responses (the measurements used to determine treatment effects)?
  5. Should participants have been allowed to choose which group (high-intensity, moderate exercise, or national standard) they wanted to join? Why or why not? Should the experimenters have assigned participants to treatment groups based on their judgment of each participant’s characteristics? Why or why not?

The research question asked whether exercise, specifically high-intensity exercise, would affect the healthspan and lifespan of elderly Norwegians. The treatment factor was the exercise program, with three levels: high-intensity interval training, moderate-intensity continuous training, and the control group following national recommendations. The experimental units are the individual participants. The responses measured included heart rate, blood pressure, strength, cognitive function, and other health indicators; the main response was 5-year survival. If participants had been allowed to choose their preferred exercise group, or if experimenters had chosen the groups based on participant characteristics, extraneous variables (e.g., depression or anxiety) could have differed systematically between the groups. When participants are randomly assigned to treatment groups, such variables tend to be spread evenly across the groups, so on average they do not favor any one treatment. Furthermore, if experimenters had used their own judgment to assign participants to groups, their own biases could have affected the results.

Conducting a Comparative Experiment

Comparative experiments apply treatments to experimental units, measure the responses, and then compare the responses to those treatments with statistical analysis. If the Generation 100 experimenters had used only one group (e.g., high-intensity training), they might have achieved good results but would have had no way of knowing whether either of the other treatments would have achieved the same or even better results. To know whether high-intensity training is better than moderate or low-intensity training, it was necessary to run an experiment in which some experimental units engaged in high-intensity training, others in moderate training, and still others in low-intensity training. Only then can the responses to those treatments be statistically analyzed to determine treatment effects.

An experimental unit is the entity to which treatment is applied, whether this is a person, a mouse, a cell line, or a sample.

Challenge 2: Which are the experimental units?

Identify the experimental units in each experiment described below, then share your answers in the collaborative document.

  1. Three hundred mice are individually housed in the same room. Half of them are fed a high-fat diet and the other half are fed regular chow.
  2. Three hundred mice are housed five per cage in the same room. Half of them are fed a high-fat diet and the other half are fed regular chow.
  3. Three hundred mice are individually housed in two different rooms. Those in the first room are fed a high-fat diet and those in the other room are fed regular chow.
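
  1. The individual animal is the experimental unit.
  2. The cage receives the treatment and is the experimental unit.
  3. The room receives the treatment and is the experimental unit. With only one room per diet, there is a single experimental unit per treatment, so the effect of diet cannot be separated from differences between the two rooms.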
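For the second housing arrangement, randomization has to happen at the level of the cage. A minimal Python sketch, assuming 60 cages of five mice with hypothetical labels such as `cage_01`, shows that all 300 mice receive a diet but only 60 independent experimental units (30 per diet) exist:

```python
import random

N_MICE, PER_CAGE = 300, 5
cages = [f"cage_{c:02d}" for c in range(1, N_MICE // PER_CAGE + 1)]  # 60 hypothetical cages

# Randomize whole cages, not individual mice, because the diet is given per cage
rng = random.Random(42)
shuffled = cages[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
diet = {cage: "high-fat" for cage in shuffled[:half]}
diet.update({cage: "regular chow" for cage in shuffled[half:]})

# Every mouse inherits the diet of its cage ...
mouse_diet = {
    f"{cage}_mouse_{m}": diet[cage]
    for cage in cages
    for m in range(1, PER_CAGE + 1)
}

# ... but the number of independent experimental units per diet is the number of cages
units_per_diet = {d: list(diet.values()).count(d) for d in ("high-fat", "regular chow")}
print(len(mouse_diet), units_per_diet)  # 300 {'high-fat': 30, 'regular chow': 30}
```

A common consequence of this design is that mouse-level measurements are summarized per cage (for example, by the cage mean) before the diets are compared, so that the analysis uses the cage as its unit.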

Reducing Bias with Randomization and Blinding

Randomized studies assign experimental units to treatment groups at random, for example by pulling numbers out of a hat or by using a computer's random number generator. The main purpose of randomization comes later, during statistical analysis, where we compare the data we have with the distribution of data we might have obtained by random chance. Randomization provides a way to construct that distribution and ensures that our comparisons between treatment groups are valid. Random assignment (allocation) of experimental units to treatment groups also prevents the subjective bias that might be introduced by an experimenter who selects, even in good faith and with good intentions, which experimental units should get which treatment. For example, if the experimenter selected which people would do high-, moderate-, and low-intensity training, they might unconsciously bias the groups by body size or shape. This selection bias would influence the outcome of the experiment.
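One common way to make "the distribution of data we might have obtained by random chance" concrete is a permutation (randomization) test: shuffle the group labels many times, recompute the difference in means each time, and ask how often a difference at least as large as the observed one appears. The Python sketch below assumes a two-group comparison of means; the function name `permutation_test` is hypothetical, and the input values are simulated with arbitrary parameters rather than taken from the kit experiment.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=10_000, seed=None):
    """Two-sided permutation test for a difference in means.

    Reshuffles the group labels many times to approximate the distribution of
    mean differences that random allocation alone could have produced.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    # Count the observed allocation itself as one possible outcome
    p_value = (n_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Simulated measurements (NOT the kit data from the text): 10 values per group
sim = random.Random(1)
kit_a = [sim.gauss(10.3, 0.5) for _ in range(10)]
kit_b = [sim.gauss(10.7, 0.5) for _ in range(10)]
observed, p = permutation_test(kit_a, kit_b, n_perm=5_000, seed=2)
print(f"observed difference = {observed:.2f}, permutation p-value = {p:.3f}")
```

A small p-value means that few re-randomizations produce a difference as large as the observed one, which argues against chance allocation alone as the explanation.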

Randomization also helps to balance the effects of “nuisance” variables, such as the time or day of the experiment, the investigator or technician, equipment calibration, exposure to light or ventilation in animal rooms, or other variables that are not being studied but that do influence the responses. Because every experimental unit has an equal probability of being assigned to any treatment group, no nuisance variable can systematically line up with one treatment, and its effects are balanced between the groups on average, although any single randomization can still leave some imbalance by chance.
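A small simulation can illustrate this balancing. The Python sketch below assumes a hypothetical nuisance variable (processing day, with 10 units processed on each of two days) and compares a systematic allocation with many random allocations of 20 units to treatments A and B:

```python
import random
from statistics import mean

# Hypothetical nuisance variable: units 1-10 are processed on day 1, units 11-20 on day 2
day = [1] * 10 + [2] * 10


def share_day1_in_a(labels):
    """Fraction of treatment-A units that were processed on day 1."""
    a_days = [d for d, t in zip(day, labels) if t == "A"]
    return a_days.count(1) / len(a_days)


# Systematic allocation: the first 10 units get A, so treatment and day are fully confounded
systematic = ["A"] * 10 + ["B"] * 10
print(share_day1_in_a(systematic))  # 1.0

# Random allocation, repeated many times to see how day-1 units are spread between groups
rng = random.Random(7)
shares = []
for _ in range(10_000):
    labels = ["A"] * 10 + ["B"] * 10
    rng.shuffle(labels)
    shares.append(share_day1_in_a(labels))

print(f"average share of day-1 units in group A: {mean(shares):.2f}")  # close to 0.5
print(f"range over randomizations: {min(shares):.1f} to {max(shares):.1f}")
```

Averaged over randomizations, group A receives about half of the day-1 units, whereas the systematic allocation puts all of them in group A. The printed range shows that an individual randomization can still be unbalanced by chance.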

Blinding (also known as masking) prevents the experimenter from influencing the outcome of an experiment to suit their expectations or preferred hypothesis. Ideally, experimenters should not know which treatment the experimental units have received or will receive, from the beginning of the experiment through to the statistical analysis stage. This might require additional personnel, such as technicians or other colleagues, to perform some tasks, and should be planned during experimental design. If ideal circumstances can't be arranged, it should be possible to carry out at least some of the stages blind. Blinding during allocation (assignment of experimental units to treatment groups), treatment, data collection, or data analysis can reduce experimental bias.
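One way to keep allocation blind is to have a colleague generate the allocation and replace treatment names with neutral codes. The Python sketch below is a hypothetical workflow, not a prescribed procedure: `blinded_allocation`, the `group 1`/`group 2` codes, and the subject labels are illustrative. The experimenter works only from `sheet`, while `key` is kept by someone not involved in treatment or data collection until the analysis is complete.

```python
import random


def blinded_allocation(units, treatments, seed=None):
    """Return (coded_sheet, key) for a balanced random allocation.

    coded_sheet maps each unit to a neutral code; key maps codes back to
    treatments. Both names are illustrative, not a standard API.
    """
    rng = random.Random(seed)
    # Randomly pair neutral codes with treatments so a code reveals nothing
    shuffled_treatments = list(treatments)
    rng.shuffle(shuffled_treatments)
    key = {f"group {i + 1}": t for i, t in enumerate(shuffled_treatments)}
    codes = list(key)
    # Balanced allocation: cycle through the codes, then shuffle the order
    labels = [codes[i % len(codes)] for i in range(len(units))]
    rng.shuffle(labels)
    coded_sheet = dict(zip(units, labels))
    return coded_sheet, key


units = [f"subject_{i:02d}" for i in range(1, 13)]  # hypothetical subject labels
sheet, key = blinded_allocation(units, ["treatment", "control"], seed=11)
print(sheet)  # given to the people who treat and measure the subjects
# `key` stays with a colleague until the data have been collected and analyzed
```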

Challenge 3: How does bias enter an experiment?

Identify ways that bias enters into each experiment described below, then share your answers in the collaborative document.

  1. A clinician perceives increased aggression in subjects given testosterone.
  2. A clinician concludes that mood of each subject has improved in the treatment group given a new antidepressant.
  3. A researcher unintentionally treats subjects differently based on their treatment group by providing more food to control group animals.
  4. A clinician gives different nonverbal cues to patients in the treatment group of a clinical trial than to the control group patients.

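Items 1 and 2 describe non-blinded data collection, in which knowing each subject's treatment can lead the observer to report larger treatment effects; inflated effect sizes are a common problem in non-blinded studies. In 3 and 4 the experimenter knows each subject's treatment group and treats the groups differently, so the groups differ in more than just the treatment itself.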

Key Points

  • The raw ingredients of comparative experiments are experimental units, treatments and responses.