This lesson is being piloted (Beta version)

Understanding Machine Learning Literature

Overview

Duration: 60 min
Questions
  • How are machine learning workflows presented in research papers?

Objectives
  • Assess a typical machine learning methodology presented in an academic paper

This lesson focuses on understanding and evaluating machine learning workflows as they are presented in the literature. For this lesson, you will choose one of the papers below to read. Please read your chosen paper before day 2 of the workshop, so that you are familiar with its layout. Alternatively, if you brought a paper to day 1 that you think is a good candidate, we can add it to the list of papers people can choose from. We will ask you to switch papers if you end up being the only person in the workshop who chose that paper.

We will work through filling out this chart for the paper you select. The chart is a tool to help you think about how machine learning is used in a paper and to decide whether you think the claims the paper makes using machine learning are valid.

Choosing a paper to read

The paper you choose should use supervised learning in some form. While the paper should contain a classification or regression task, this task does not have to be the main goal of the paper. If the paper you choose has multiple machine learning models or tasks, choose one of them to focus on.

When searching for a paper, look out for terms related to the machine learning workflow, such as training, testing, holdout, features, classifier, or regression. You can also look for the name of a specific classifier, such as random forest or neural network. Papers that use deep learning methods are also usable for this activity.
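If these workflow terms are unfamiliar, the following minimal sketch may help you recognize them in a paper. It is not taken from any of the papers in this lesson: the dataset is randomly generated, the function names are made up for illustration, and the classifier is a simple nearest-centroid rule standing in for whatever model a paper actually uses (for example, a random forest or a neural network). It uses only the Python standard library.

```python
import random


def make_toy_data(n=200, seed=0):
    """Generate a made-up dataset: two numeric features and a binary label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Samples of class 1 are shifted, so the classes differ on average.
        features = (rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0))
        data.append((features, label))
    return data


def holdout_split(data, test_fraction=0.25, seed=0):
    """Shuffle the data and set aside a holdout (test) set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def train_centroid_classifier(train):
    """'Training' here means computing the mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in train:
        s = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            s[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def predict(model, features):
    """Predict the class whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((x - c) ** 2 for x, c in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def accuracy(model, dataset):
    """Fraction of samples in the dataset that are classified correctly."""
    correct = sum(predict(model, f) == y for f, y in dataset)
    return correct / len(dataset)


data = make_toy_data()
train, test = holdout_split(data)         # the test set is never used to train
model = train_centroid_classifier(train)  # fit the classifier on training data
print("training accuracy:", accuracy(model, train))
print("test accuracy:", accuracy(model, test))
```

When reading a paper, it can be useful to ask which of these steps are described explicitly: what the features and labels are, how the holdout set was chosen, and whether the reported performance was measured on data the model did not see during training.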

It’s okay if the paper is not an exact fit, especially if it is a technique used in your field which you want to understand.

Example/Backup Papers

Here is a list of papers which you can choose for this activity.

Example Chart

While it might be helpful to look at this chart while reading the paper, you do not need to fill it out before day 2 of the workshop. You will fill out the chart for the paper you chose during day 2 of the workshop.

Here is an example partially filled-out chart from Predictive Models for Breast Cancer Susceptibility from Multiple Single Nucleotide Polymorphisms. Note that this paper uses 3 different classifiers, but we are just going to focus on the decision tree:

Example Chart Activity

Look in the paper for where the filled-in parts of the chart came from.

Based on the paper and the filled-in parts of the chart, try to fill in the Your Conclusions section.

Charting Your Paper

Now, working with your group and without the instructor, see if you can fill in a blank chart for the paper you brought today.

Key Points

  • Research workflows for machine learning are often not straightforward

  • Published papers often omit details which can make it difficult to evaluate machine learning workflows

  • Machine learning is used in a large variety of ways in biology