This lesson is in the early stages of development (Alpha version)

Data processing and visualization for metagenomics

A lot of metagenomics analysis is done using command-line tools for three reasons:

1) You will often be working with a large number of files, and working at the command line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks.

2) You will often need more compute power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface.

3) You will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if a GUI tool even exists).

In a previous lesson, you learned how to use the bash shell to interact with your computer through a command-line interface. In this lesson, you will apply that knowledge to carry out a common metagenomics workflow: identifying Operational Taxonomic Units (OTUs) among samples taken from two metagenomes within a location. We will start with a set of sequenced reads (.fastq files), perform some quality control steps, assemble those reads into contigs, and end by identifying and visualizing the OTUs among these samples.

As you progress through this lesson, keep in mind that, even if you aren’t going to be doing this same workflow in your research, you will be learning some very important lessons about using command-line bioinformatics tools. What you learn here will enable you to use a variety of bioinformatics tools with confidence and greatly enhance your research efficiency and productivity.


This lesson assumes a working understanding of the bash shell. If you haven’t already completed the Shell metagenomics lesson, and you aren’t familiar with the bash shell, please review those materials before starting this lesson.

This lesson also assumes some familiarity with biological concepts, including the structure of DNA, nucleotide abbreviations, and the concept of a microbiome.

This lesson uses data hosted on an Amazon Machine Image (AMI). Workshop participants will be given information on how to log in to the AMI during the workshop. Learners using these materials for self-directed study will need to set up their own AMI. Information on setting up an AMI and accessing the required data is provided on the Metagenomics Workshop setup page.

Things You Need To Know

  1. Stay calm, don’t panic.
  2. Everything is going to be fine.
  3. We are learning together.


Setup Download files required for the lesson
00:00 1. Starting a Metagenomics Project How do you plan a metagenomics experiment?
What does a metagenomics project look like?
00:30 2. Assessing Read Quality How can I describe the quality of my data?
01:20 3. Trimming and Filtering How can we get rid of sequence data that doesn’t meet our quality standards?
02:15 4. Metagenome Assembly Why should genomic data be assembled?
What is the difference between reads and contigs?
How can we assemble a metagenome?
02:55 5. Metagenome Binning How can we obtain the original genomes from a metagenome?
03:55 6. Taxonomic Assignment How can I assign a taxonomy to my contigs?
04:40 7. Diversity Tackled With R How can I obtain the abundance of the reads?
How can I use R to explore diversity?
05:30 8. Taxonomic Analysis with R How can we compare depth-contrasting samples?
How can we manipulate our data to deliver a message?
06:30 9. Other Resources Where are other metagenomic resources?
How can lessons be previewed?
06:35 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.