This lesson is still being designed and assembled (Pre-Alpha version)

Genome Mining in Prokaryotes: Glossary

Key Points

Introduction to Genome Mining
  • Natural products are encoded in Biosynthetic Gene Clusters (BGCs)

  • Genome mining describes the exploitation of genomic information with specialized algorithms intended to discover and study BGCs

Secondary metabolite biosynthetic gene cluster identification
  • antiSMASH is a bioinformatic tool capable of identifying, annotating and analysing secondary metabolite BGCs

  • antiSMASH can be used as a web-based tool or as a stand-alone command-line tool

  • The input file formats accepted by antiSMASH are GenBank, FASTA and EMBL
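
To make the three input formats concrete, here is a minimal sketch that guesses the format of a sequence file from its first non-empty line. It relies on the usual conventions that FASTA records start with `>`, GenBank records with `LOCUS`, and EMBL records with `ID`. The function name `guess_sequence_format` is hypothetical and is not part of antiSMASH, which performs its own input validation.

```python
def guess_sequence_format(text: str) -> str:
    """Guess the sequence format from the first non-empty line of a file.

    Illustrative helper only (not part of antiSMASH).
    """
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"
        if line.startswith("LOCUS"):
            return "genbank"
        if line.startswith("ID "):
            return "embl"
        break  # first non-empty line did not match any known header
    return "unknown"


# Toy inputs, not real records
fasta_text = ">contig_1\nATGCATGC\n"
genbank_text = "LOCUS       contig_1    8 bp    DNA\n"
embl_text = "ID   contig_1; SV 1; linear; genomic DNA; STD; PRO; 8 BP.\n"
```

Checking the content rather than the file name helps when a file carries an unexpected extension, for example a GenBank file saved as `.txt`.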

Genome Mining Databases
  • MIBiG provides BGCs that have been experimentally characterized

  • The antiSMASH database comprises the predicted BGCs of each organism

BGC Similarity Networks
  • BGC similarity is measured by BiG-SCAPE according to protein domain content, adjacency and sequence identity.

  • The GenBank (.gbk) files of the regions identified by antiSMASH are the input for BiG-SCAPE.

  • BiG-SCAPE builds BGC similarity networks, uses them to delimit Gene Cluster Families (GCFs), and creates a phylogeny of the BGCs in each GCF.
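
As a rough illustration of two of the similarity components named above (domain content and adjacency), the sketch below computes a Jaccard index over the sets of domains and over the sets of neighbouring domain pairs for two toy BGCs. This is a simplified teaching example, not BiG-SCAPE's actual implementation: the function names and domain labels are assumptions, and the sequence identity component is left out.

```python
def jaccard_index(domains_a, domains_b):
    """Shared distinct domains divided by all distinct domains."""
    set_a, set_b = set(domains_a), set(domains_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def adjacent_pairs(domains):
    """Unordered pairs of domains that sit next to each other."""
    return {frozenset(pair) for pair in zip(domains, domains[1:])}


def adjacency_index(domains_a, domains_b):
    """Shared neighbouring pairs divided by all neighbouring pairs."""
    pairs_a, pairs_b = adjacent_pairs(domains_a), adjacent_pairs(domains_b)
    if not pairs_a and not pairs_b:
        return 0.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


# Toy BGCs written as ordered lists of domain labels (illustrative only)
bgc_a = ["KS", "AT", "PP-binding", "Thioesterase"]
bgc_b = ["KS", "AT", "PP-binding", "Condensation"]

content_similarity = jaccard_index(bgc_a, bgc_b)      # 3 shared of 5 distinct domains
neighbour_similarity = adjacency_index(bgc_a, bgc_b)  # 2 shared of 4 distinct pairs
```

Two BGCs can share most of their domains and still differ in gene order, which is why a content score and an adjacency score are both informative.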

Homologous BGC Clusterization
  • BiG-SLiCE and BiG-FAM are software tools that are useful for comparing the metabolic diversity of bacterial lineages with each other and against a large database

  • An input folder containing the BGCs from antiSMASH and the taxonomic information of each genome is needed to run BiG-SLiCE

  • The results from the antiSMASH web-tool are needed to run BiG-FAM

  • Gene Cluster Families can help us to compare the metabolic capabilities of a set of bacterial lineages

  • We can use BiG-FAM to compare a BGC against the whole database and predict its Gene Cluster Family
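
The comparison of metabolic capabilities through Gene Cluster Families can be sketched as a presence/absence summary. The example below assumes we already have a list of (genome, GCF) assignments, for instance parsed from clustering results; the genome names, GCF labels and function names are hypothetical and do not reflect the output format of BiG-SLiCE or BiG-FAM.

```python
from collections import defaultdict


def gcf_presence(assignments):
    """Map each genome to the set of GCFs found in it."""
    presence = defaultdict(set)
    for genome, gcf in assignments:
        presence[genome].add(gcf)
    return dict(presence)


def compare_genomes(presence, genome_a, genome_b):
    """Split the GCFs of two genomes into shared and genome-specific sets."""
    a = presence.get(genome_a, set())
    b = presence.get(genome_b, set())
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Toy assignments (hypothetical genomes and families)
assignments = [
    ("genome_A", "GCF_1"),
    ("genome_A", "GCF_2"),
    ("genome_B", "GCF_2"),
    ("genome_B", "GCF_3"),
    ("genome_B", "GCF_3"),  # two BGCs of the same family count once
]

presence = gcf_presence(assignments)
comparison = compare_genomes(presence, "genome_A", "genome_B")
```

Families present in only one lineage are natural candidates for lineage-specific metabolites, while shared families point to conserved capabilities.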

Finding Variation on Genomic Vicinities
  • CORASON is a command-line tool that finds BGC-families

  • Genomic vicinity variation is organized phylogenetically according to the conserved genes in the BGC-family

Evolutionary Genome Mining
  • EvoMining is a command-line tool that performs evolutionary genome mining over gene families

  • EvoMining hits can belong to new BGCs

  • MicroReact is an interactive genomic visualizer compatible with EvoMining output

GATOR-GC: Genomic Assessment Tool for Orthologous Regions and Gene Clusters
  • GATOR-GC is an innovative tool that uses an enzyme-aware scoring system and evolutionary principles to explore BGC diversity.

  • Unlike traditional methods, GATOR-GC offers flexibility in defining the taxonomic scope and prioritizes the identification of novel biosynthetic pathways.

  • GATOR-GC can be customized to search for essential and optional enzymes, making it a powerful tool for targeted exploration.

  • Dynamic gene cluster diagrams and GATOR neighborhood visualizations provide clear insights into gene conservation and genomic relationships.

Metabolomics workshop
  • Data is generated using Liquid Chromatography coupled to a tandem mass spectrometer (LC-MS/MS or MS2).

  • Dereplication is the process of identifying previously known compounds.

  • Molecular networking is a computational method that organizes MS2 data based on spectral similarity, allowing us to infer relationships between chemical structures

  • Feature-Based Molecular Networking (FBMN) enhances classical molecular networking by integrating relative quantitative data, enabling more robust metabolomics statistical analysis.
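
To give an idea of what "spectral similarity" means in molecular networking, the sketch below computes a plain cosine score between two MS2 spectra after grouping peaks into fixed-width m/z bins. This is a deliberate simplification: networking platforms typically use more elaborate scores (for example a modified cosine that accounts for precursor mass shifts), and the bin width, function names and peak lists here are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.5):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(peaks_a, peaks_b, bin_width=0.5):
    """Cosine score between two binned spectra (0 = no overlap, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy spectra as (m/z, intensity) pairs, not real measurements
spectrum_a = [(100.1, 50.0), (150.2, 100.0), (200.3, 25.0)]
spectrum_b = [(100.1, 40.0), (150.2, 90.0), (250.4, 30.0)]
spectrum_c = [(300.3, 10.0)]

score_ab = cosine_similarity(spectrum_a, spectrum_b)
score_ac = cosine_similarity(spectrum_a, spectrum_c)
```

In a molecular network, each spectrum becomes a node and an edge is drawn between two nodes when their score exceeds a chosen threshold, so related structures end up in the same cluster.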

Other Resources

Glossary

cd: command to change directory