Trimming and Filtering
# Cleaning Reads

In the previous episode, we took a high-level look at the quality of each of our samples using FastQC. We visualized per-base quality graphs showing the distribution of read quality at each base across all reads in a sample and extracted information about which samples fail which quality checks. Some of our samples failed quite a few quality metrics used by FastQC.

This doesn't mean, though, that our samples should be thrown out! It's very common to have some quality metrics fail, and this may or may not be a problem for your downstream application. For our variant calling workflow, we will be removing some of the low quality sequences to reduce our false positive rate due to sequencing error.

We will use a program called [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic) to filter poor quality reads and trim poor quality bases from our samples.

## Trimmomatic Options

Trimmomatic has a variety of options to trim your reads. If we run the following command, we can see some of our options.

Overview
Teaching: 30 min
Exercises: 25 min
Questions
How can I get rid of sequence data that doesn’t meet my quality standards?
Objectives
Clean FASTQ reads using Trimmomatic.
Select and set multiple options for command-line bioinformatic tools.
Write `for` loops with two variables.
Key Points
The options you set for the command-line tools you use are important!
Data cleaning is an essential step in a genomics workflow.
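
The objective of writing `for` loops with two variables can be illustrated with a minimal sketch. It assumes a hypothetical naming scheme in which input files end in `_1.fastq.gz` and cleaned files should end in `_1.trim.fastq.gz`; the function name `plan_trim_names` and the sample file names are made up for illustration. The loop only prints the derived names and does not run Trimmomatic itself.

```bash
# Sketch only: file names and the function name are hypothetical.
plan_trim_names() {
    # First variable: the input file name, taken from the arguments.
    for infile in "$@"
    do
        # Second variable: the sample name, with the assumed suffix removed.
        base=$(basename "${infile}" _1.fastq.gz)
        # Show which output name would be paired with each input.
        echo "${infile} -> ${base}_1.trim.fastq.gz"
    done
}

plan_trim_names sample_A_1.fastq.gz sample_B_1.fastq.gz
```

Printing the names before running a real tool is a cheap way to check that each input is paired with the output file you expect.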