This lesson is still being designed and assembled (Pre-Alpha version)

Trimming and Filtering

Overview

Teaching: 30 min
Exercises: 25 min
Questions
  • How can I get rid of sequence data that doesn’t meet my quality standards?

Objectives
  • Clean FASTQ reads using Trimmomatic.

  • Select and set multiple options for command-line bioinformatic tools.

  • Write for loops with two variables.

# Cleaning Reads In the previous episode, we took a high-level look at the quality of each of our samples using FastQC. We visualized per-base quality graphs showing the distribution of read quality at each base across all reads in a sample and extracted information about which samples fail which quality checks. Some of our samples failed quite a few quality metrics used by FastQC. This doesn't mean, though, that our samples should be thrown out! It's very common to have some quality metrics fail, and this may or may not be a problem for your downstream application. For our variant calling workflow, we will be removing some of the low quality sequences to reduce our false positive rate due to sequencing error. We will use a program called [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic) to filter poor quality reads and trim poor quality bases from our samples. ## Trimmomatic Options Trimmomatic has a variety of options to trim your reads. If we run the following command, we can see some of our options.

Key Points

  • The options you set for the command-line tools you use are important!

  • Data cleaning is an essential step in a genomics workflow.