Understanding Machine Learning Literature
Overview
Duration: 60 min
Questions
How are machine learning workflows presented in research papers?
Objectives
Assess a typical machine learning methodology presented in an academic paper
This lesson focuses on understanding and evaluating machine learning workflows as they are presented in the literature. For this lesson, you will choose one of the papers below to read. Please read it before day 2 of the workshop so that you are familiar with its layout. Alternatively, if you brought a paper to day 1 that you think is a good candidate, we can add it to the list of papers people can choose from. We will ask you to switch papers if you end up being the only person in the workshop who chooses that paper.
We will practice filling out this chart for the paper you select. The chart is a tool to help you think about how machine learning is used in a paper and to reach a conclusion about whether the claims the paper makes using machine learning are valid.
Choosing a paper to read
The paper you choose should use supervised learning in some form. While the paper should contain a classification or regression task, this task does not have to be the main goal of the paper. If the paper you choose has multiple machine learning models or tasks, choose one of them to focus on.
When searching for a paper, look for terms related to the machine learning workflow, such as training, testing, holdout, features, classifier, or regression. You can also look for the name of a specific classifier, such as random forest or neural network. Papers that use deep learning methods are also suitable for this activity.
It’s okay if the paper is not an exact fit, especially if it uses a technique from your field that you want to understand.
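To make the workflow terms listed above (training, testing, holdout, features, classifier) more concrete, here is a minimal sketch in Python using only the standard library. It uses toy synthetic data, not data from any of the papers below, and the function names (`holdout_split`, `fit_stump`, `predict`, `accuracy`) are hypothetical illustrations rather than any library's API. The classifier is a one-split decision tree (a "stump"), chosen only because it is short to write out. The key point for reading papers: the model is fit on the training set only, and its reported performance should come from the held-out test set.

```python
import random

# Toy, synthetic data (hypothetical): each sample has two numeric features,
# and the label is 1 when the first feature exceeds 0.5.
random.seed(0)
samples = [[random.random(), random.random()] for _ in range(200)]
labels = [1 if x[0] > 0.5 else 0 for x in samples]


def holdout_split(X, y, test_fraction=0.25, seed=42):
    """Shuffle sample indices and hold out a fraction of samples for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


def fit_stump(X, y):
    """Train a one-split decision tree that predicts 1 when the chosen feature
    is above the threshold; keep the split that classifies the most training
    samples correctly."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            correct = sum((row[f] > t) == bool(label) for row, label in zip(X, y))
            if best is None or correct > best[0]:
                best = (correct, f, t)
    _, feature, threshold = best
    return feature, threshold


def predict(model, X):
    feature, threshold = model
    return [1 if row[feature] > threshold else 0 for row in X]


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


X_train, y_train, X_test, y_test = holdout_split(samples, labels)
model = fit_stump(X_train, y_train)                    # fit on training data only
train_acc = accuracy(y_train, predict(model, X_train))
test_acc = accuracy(y_test, predict(model, X_test))    # evaluate on held-out data
print(f"feature={model[0]} threshold={model[1]:.3f} "
      f"train={train_acc:.2f} test={test_acc:.2f}")
```

When a paper reports only training performance, or tunes its model using the same data it reports as the test set, the numbers can look better than the model would perform on new data. Noticing which of these a paper describes is part of what the chart asks you to assess.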
Example/Backup Papers
Here is a list of papers that you can choose from for this activity.
- Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors
- DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning
- Gradient modeling of conifer species using random forests
- A Logistic Regression Model Based on the National Mammography Database Format to Aid Breast Cancer Diagnosis
- Potential neutralizing antibodies discovered for novel corona virus using machine learning
- seqQscorer: automated quality control of next-generation sequencing data using machine learning
- Identifying mouse developmental essential genes using machine learning
- A deep learning approach to antibiotic discovery
Example Chart
While it might be helpful to look at this chart while reading the paper, you do not need to fill it out before day 2 of the workshop. During day 2, you will fill out this chart for the paper you chose.
Here is a partially filled-out example chart for Predictive Models for Breast Cancer Susceptibility from Multiple Single Nucleotide Polymorphisms. Note that this paper uses 3 different classifiers, but we will focus only on the decision tree:
Example Chart Activity
Find where in the paper the filled-in parts of the chart came from.
Based on the paper and the filled-out parts of the chart, try to fill in the Your Conclusions section.
Charting Your Paper
Now, without the example to guide you, work with your group to fill in a blank chart for the paper you brought today.
Key Points
Research workflows for machine learning are often not straightforward
Published papers often omit details which can make it difficult to evaluate machine learning workflows
Machine learning is used in a wide variety of ways in biology