Trimming and Filtering

Overview

Teaching: 30 min
Exercises: 25 min

Questions

How can I get rid of sequence data that doesn’t meet my quality standards?

Objectives

Clean FASTQ reads using Trimmomatic.

Select and set multiple options for command-line bioinformatic tools.

Write for loops with two variables.

# Cleaning Reads

In the previous episode, we took a high-level look at the quality of each of our samples using FastQC. We visualized per-base quality graphs showing the distribution of read quality at each base across all reads in a sample and extracted information about which samples fail which quality checks. Some of our samples failed quite a few quality metrics used by FastQC. This doesn't mean, though, that our samples should be thrown out! It's very common to have some quality metrics fail, and this may or may not be a problem for your downstream application.

For our variant calling workflow, we will be removing some of the low quality sequences to reduce our false positive rate due to sequencing error. We will use a program called [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic) to filter poor quality reads and trim poor quality bases from our samples.

## Trimmomatic Options

Trimmomatic has a variety of options to trim your reads. If we run the following command, we can see some of our options.
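The command itself does not appear in this copy of the page. As a minimal sketch, assuming Trimmomatic was installed with a `trimmomatic` wrapper script on the PATH (as a conda install typically provides), running the program with no arguments should print its usage summary. The helper name `show_trimmomatic_usage` is hypothetical and only guards against the program being absent.

```bash
# Assumption: a `trimmomatic` wrapper script is on the PATH (e.g. a conda install).
# With a bare JAR download you would run `java -jar` on the JAR file instead.
show_trimmomatic_usage() {
    if command -v trimmomatic >/dev/null 2>&1; then
        # With no arguments, Trimmomatic prints a usage summary; it may exit
        # with a non-zero status, so the status is ignored here.
        trimmomatic 2>&1 || true
    else
        echo "trimmomatic not found on PATH; install it or adjust the command" >&2
    fi
}

show_trimmomatic_usage
```

If the wrapper is available, the summary should list two modes, `PE` for paired-end reads and `SE` for single-end reads, each followed by the trimming steps that can be applied.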

Key Points

The options you set for the command-line tools you use are important!

Data cleaning is an essential step in a genomics workflow.
