This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to Genome Mining

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • What is Genome Mining?

Objectives
  • Understand that natural products are encoded in Biosynthetic Gene Clusters.

  • Understand that Biosynthetic Gene Clusters can be identified in the genomic material.

  • Discuss good bioinformatics practices with your colleagues.

Genome mining aims to find BGCs

In bacteria, natural products are encoded in Biosynthetic Gene Clusters (BGCs): groups of genes located together in the same region of the genome. A BGC includes the genes encoding the biosynthetic enzymes as well as genes related to the metabolite’s transport or to resistance against antibacterial metabolites. Genome mining consists of analyzing genomes with specialized algorithms designed to find BGCs. During the last century, chemists diligently characterized some of these clusters, and there are now extensive databases that record which genes belong to which BGCs, along with control sets of genes that do not belong to any. Since the advent of next-generation sequencing, genomes have been explored as a source of new BGCs, and genome mining methodologies make it easier to prioritize BGCs in the search for novel metabolites.
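To make the idea of "genes placed together in the same genome region" concrete, here is a toy sketch in Python. It is not how real genome mining tools detect BGCs, since their rules are far more elaborate. The `Gene` class, the `biosynthetic` flag, the gene names and coordinates, and the `max_gap` and `min_genes` thresholds are all assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int          # start coordinate in base pairs
    end: int            # end coordinate in base pairs
    biosynthetic: bool  # hypothetical flag: gene resembles a known biosynthetic enzyme


def group_into_clusters(genes, max_gap=10_000, min_genes=2):
    """Group flagged genes that lie within max_gap bp of the previous flagged gene."""
    hits = sorted((g for g in genes if g.biosynthetic), key=lambda g: g.start)
    clusters, current = [], []
    for gene in hits:
        # A large gap closes the current candidate cluster and starts a new one.
        if current and gene.start - current[-1].end > max_gap:
            if len(current) >= min_genes:
                clusters.append(current)
            current = []
        current.append(gene)
    if len(current) >= min_genes:
        clusters.append(current)
    return clusters


# Made-up genes on a single contig (illustrative values only).
genes = [
    Gene("geneA", 1_000, 2_500, True),
    Gene("geneB", 3_000, 4_200, True),
    Gene("geneC", 5_000, 6_000, False),
    Gene("geneD", 6_500, 8_000, True),
    Gene("geneE", 90_000, 91_000, True),
]

clusters = group_into_clusters(genes)
for cluster in clusters:
    print([g.name for g in cluster], cluster[0].start, cluster[-1].end)
```

In this made-up example, the isolated `geneE` is not reported because a single flagged gene does not form a cluster under the assumed `min_genes` threshold.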

Complete pipeline of genome mining. From a single genome, this example obtains its BGCs and compares them with BGCs from related genomes. (Source: Genome Mining, Wikipedia)

Chloramphenicol is a known antibiotic produced in a BGC

For example, let’s look into the BGC responsible for chloramphenicol biosynthesis. This BGC was first described in a Streptomyces venezuelae genome.

MIBiG layout of the chloramphenicol gene cluster from _Streptomyces venezuelae_, comprising 17 genes. Explore the BGC.

Exercise 1: Sort the Steps to Identify BGCs Similar to Clavulanic Acid

Below is a list of steps in disarray that are part of the process of identifying BGCs similar to clavulanic acid. Your task is to logically order them to establish a coherent methodology that allows the effective identification of such BGCs.

Disordered Steps:

a. Annotate the genes within the identified BGCs to predict their function.

b. Compare the identified BGCs against databases of known BGCs to find similarities to clavulanic acid.

c. Extract DNA from samples of interest, such as microorganism-rich soils or specific bacterial cultures.

d. Conduct phylogenetic analysis of the BGCs to explore their evolutionary relationship.

e. Use bioinformatics tools to assemble DNA sequences and detect potential BGCs.

f. Sequence the extracted DNA using next-generation sequencing (NGS) techniques.

Solution

  1. c. Extract DNA from samples of interest, such as microorganism-rich soils or specific bacterial cultures.
  2. f. Sequence the extracted DNA using next-generation sequencing (NGS) techniques.
  3. e. Use bioinformatics tools to assemble DNA sequences and detect potential BGCs.
  4. a. Annotate the genes within the identified BGCs to predict their function.
  5. d. Conduct phylogenetic analysis of the BGCs to explore their evolutionary relationship.
  6. b. Compare the identified BGCs against databases of known BGCs to find similarities to clavulanic acid.

Planning a genome mining project

Here we provide tips and tricks for planning and executing a genome mining project. First, choose a set of genomes from the taxa you are interested in. In this lesson we will work with S. agalactiae genomes (Tettelin et al., 2005). Although the genus Streptococcus is not known for its potential as a natural products producer, it is good enough to demonstrate different approaches to genome mining. Recently, metagenomes have also been considered in genome mining studies. Here are some considerations that might be useful to you as a genome miner:

Discussion 1: Describe your project

FIXME

Solution

a. FIX ME.

  1. Download Genomes, Dat

Starting a genome mining project

Once you have chosen your set of genomes, you need to annotate the sequences. Genome annotation takes two steps. The first is gene calling (structural annotation), which looks for CDSs or RNAs within the DNA sequences. Once these features have been detected, you need to assign a function to each CDS (functional annotation), which is usually done by comparison against protein databases. There are dozens of bioinformatics tools for annotating genomes, but two of the most broadly used are RAST (Aziz et al., 2008) and Prokka (Seemann, 2014). Here, we will start the genome mining lesson with S. agalactiae genomes already annotated by Prokka. You can download this data from this repository. The annotated genomes are written in GenBank format (extension “.gbk”). To learn more about the basic annotation of genomes, see the lesson named “Pangenome Analysis in Prokaryotes: Annotating Genomic Data”. These files are also accessible in the… FIXME: insert an introduction on access to the server.
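Before mining an annotated genome, it can help to check what the annotation contains. The sketch below counts the feature types (CDS, tRNA, and so on) listed in the FEATURES table of a GenBank-formatted text, using only the Python standard library. It is a minimal illustration, not a full GenBank parser: the `count_features` function and the `SAMPLE` record are made up for this example, and for real work a dedicated parser such as Biopython's `Bio.SeqIO.parse(handle, "genbank")` would be a more robust choice.

```python
from collections import Counter


def count_features(genbank_text):
    """Count feature keys (CDS, tRNA, ...) in the FEATURES tables of GenBank text."""
    counts = Counter()
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif in_features and line[:1] != " ":
            # A new top-level keyword (e.g. ORIGIN) or "//" ends the feature table.
            in_features = False
        elif in_features and line.startswith("     ") and line[5:6] != " ":
            # Feature lines are indented by five spaces; qualifier lines
            # (e.g. /product=...) are indented further and are skipped.
            counts[line.split()[0]] += 1
    return counts


# Made-up, heavily shortened record used only to exercise the function.
SAMPLE = """\
LOCUS       contig_1   900 bp    DNA     linear
FEATURES             Location/Qualifiers
     source          1..900
                     /organism="Streptococcus agalactiae"
     CDS             10..300
                     /product="hypothetical protein"
     CDS             complement(350..700)
                     /product="hypothetical protein"
     tRNA            750..825
                     /product="tRNA-Ala"
ORIGIN
//
"""

feature_counts = count_features(SAMPLE)
for key, n in sorted(feature_counts.items()):
    print(key, n)
```

To try this on one of the annotated genomes, read the file contents into a string first, for example with `open("my_genome.gbk").read()`, where `my_genome.gbk` is a placeholder file name.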

References

Key Points

  • Natural products are encoded in Biosynthetic Gene Clusters (BGCs)

  • Genome mining describes the exploitation of genomic information with specialized algorithms intended to discover and study BGCs