This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Metagenomics Workshop Overview

Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop uses Data Carpentry’s approach to teach data management and analysis for metagenomics research including: best practices for organization of bioinformatics projects and data, use of command-line utilities, use of command-line tools to analyze sequence quality, use of R studio and use of R libraries to compare diversity between samples, and connecting to and using cloud computing. This workshop is designed to be taught over two full days of instruction.

Please note that workshop materials for working with Metagenomics data are in “beta” development. These lessons are available for review and for informal teaching experiences, but are not yet part of The Carpentries’ official lesson offerings.

Interested in teaching these materials? We have an Slack channel were we will be happy to help you!

Frequently Asked Questions

Read our FAQ to learn more about Data Carpentry’s Metagenomics workshop, as an Instructor or a workshop host.

Getting Started

This lesson assumes that learners have no prior experience with the tools covered in the workshop.

However, learners are expected to have some familiarity with biological concepts, including the concept of DNA sequencing, nucleotide abreviations, genome, microbiome and taxonomy. Participants should bring their own laptops and plan to participate actively.

To get started, follow the directions in the Setup tab to get access to the required software and data for this workshop.


This workshop uses data from enviromental experiment: Genomic adaptations in information processing underpin trophic strategy in a whole-ecosystem nutrient enrichment experiment, by Jordan G Okie et al. In this research, authors compared the differences between the microbial community in its natural, oligotrophic, phosphorus-deficient environment, a pond from the Cuatro Ciénegas Basin (CCB), and the same microbial community under a fertilization treatment.

All of the data used in this workshop can be downloaded from DOI More information about this data is available on the Data page.

Workshop Overview

Lesson Overview Estimated time
Project Organization and Management Learn how to structure your metadata, organize and document your metagenomics data and bioinformatics workflow, and access data on the NCBI sequence read archive (SRA) database. 1:30 hr
Introduction to the Command Line Learn to navigate your file system, create, copy, move, and remove files and directories, and automate repetitive tasks using scripts and wildcards. 4:00 hr
Introduction to R Use R studio to manage several data types and data structures. 1:00 hr
Data Processing and Visualization for Metagenomics Use command-line tools to perform quality control, metagenomic assembly, metagenomic binning, taxonomic assignment, and diversity exploration. 6:30 hr

Lessons Reference

The content of this page and three of the lessons presented on this workshop are adapted from lessons on the Data Carpenty Genomics Workshop.

Teaching Platform

This workshop is designed to be run on pre-imaged Amazon Web Services (AWS) instances. All the software and data used in the workshop are hosted on an Amazon Machine Image (AMI). If you want to run your own instance of the server used for this workshop, follow the directions in the Setup tab.


Preparing to submit to JOSE