This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Pangenome Analysis in Prokaryotes: Glossary

Key Points

Introduction to Pangenomics
  • A pangenome encompasses the complete collection of genes found in all genomes within a specific group, typically a species.

  • Comparing the complete genome sequences of all members within a clade allows for the construction of a pangenome.

  • The pangenome consists of two main components: the core genome and the accessory genome.

  • The accessory genome can be further divided into the shell genome and the cloud genome.

  • In an open pangenome, the size of the pangenome significantly increases with the addition of each new genome.

  • In a closed pangenome, only a few gene families are added to the pangenome when a new genome is introduced.
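
The partitions and the open/closed distinction above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy strains, the family names, and the 25% cut-off that separates cloud from shell families (the lesson does not fix a threshold, and real tools use their own criteria). Here a family is called core only if it is present in every genome.

```python
from typing import Dict, List, Set

# Hypothetical toy data: genome name -> set of gene-family identifiers.
genomes: Dict[str, Set[str]] = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam3", "fam5"},
    "strain_C": {"fam1", "fam2", "fam6"},
    "strain_D": {"fam1", "fam2", "fam3", "fam7"},
}


def partition_families(
    genomes: Dict[str, Set[str]], cloud_max_fraction: float = 0.25
) -> Dict[str, Set[str]]:
    """Split gene families into core, shell and cloud by how many genomes carry them.

    cloud_max_fraction is an illustrative threshold, not a standard value.
    """
    n_genomes = len(genomes)
    counts: Dict[str, int] = {}
    for families in genomes.values():
        for family in families:
            counts[family] = counts.get(family, 0) + 1

    partitions: Dict[str, Set[str]] = {"core": set(), "shell": set(), "cloud": set()}
    for family, count in counts.items():
        if count == n_genomes:
            partitions["core"].add(family)       # present in every genome
        elif count / n_genomes <= cloud_max_fraction:
            partitions["cloud"].add(family)      # rare accessory family
        else:
            partitions["shell"].add(family)      # intermediate frequency
    return partitions


def accumulation_curve(genomes: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Return the pangenome size after adding each genome in the given order."""
    seen: Set[str] = set()
    sizes: List[int] = []
    for name in order:
        seen |= genomes[name]
        sizes.append(len(seen))
    return sizes


partitions = partition_families(genomes)
print(sorted(partitions["core"]))   # ['fam1', 'fam2']
print(sorted(partitions["shell"]))  # ['fam3']
print(accumulation_curve(genomes, ["strain_A", "strain_B", "strain_C", "strain_D"]))
# [4, 5, 6, 7]
```

In this toy data set every added genome contributes a new family, so the curve keeps rising, which is the behaviour expected of an open pangenome. A curve that flattens after the first few genomes would suggest a closed one.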

Downloading Genomic Data
  • The ncbi-genome-download package is a set of scripts designed to download genomes from the NCBI database.

Annotating Genomic Data
  • Prokka is a command line utility that provides rapid prokaryotic genome annotation.

  • The output files of annotation software sometimes need manual curation.

  • Specialized software exists to annotate specific genomic elements.

Measuring Sequence Similarity
  • To build a pangenome, you need to compare the genes of all genomes and group them into gene families.

  • BLAST scores the similarity between two sequences and reports, among other values, a bit score and an E-value.
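
As a rough illustration of how a similarity score relates to the E-value used in the next section, the sketch below applies the standard relation E = m · n · 2^(−S'), where S' is the bit score and m and n are the query and database lengths. The function name and the example numbers are hypothetical, and BLAST itself uses effective (edge-corrected) lengths, so the values it reports will differ somewhat.

```python
def evalue_from_bit_score(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate the expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S'), with raw lengths standing in for the
    effective lengths that BLAST actually uses (an assumption of this sketch).
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers: a 300-residue query against a database of one million residues.
weak_hit = evalue_from_bit_score(30.0, 300, 1_000_000)
strong_hit = evalue_from_bit_score(200.0, 300, 1_000_000)
print(weak_hit > strong_hit)  # True: a higher bit score gives a smaller E-value
```

The E-value also grows with the size of the search space, so the same alignment is less significant in a larger database.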

Clustering with BLAST Results
  • The Bidirectional Best-Hit (BDBH) algorithm groups sequences into families when they are each other's best BLAST hit, with hits ranked by E-value.
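
The idea of a bidirectional best hit can be sketched for a single pair of genomes. This is a minimal illustration, not the implementation used by any particular tool: the hit tuples, the sequence names, and the tie-breaking rule (lowest E-value first, then highest bit score) are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# One BLAST hit reduced to (query, subject, evalue, bitscore); a hypothetical layout.
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the best subject for each query: lowest E-value, then highest bit score."""
    best: Dict[str, Tuple[Tuple[float, float], str]] = {}
    for query, subject, evalue, bitscore in hits:
        rank = (evalue, -bitscore)  # smaller tuple means a better hit
        if query not in best or rank < best[query][0]:
            best[query] = (rank, subject)
    return {query: subject for query, (_, subject) in best.items()}


def bidirectional_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Keep only the pairs in which each sequence is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy results of searching genome A against genome B and the reverse.
a_vs_b = [
    ("A1", "B1", 1e-50, 200.0),
    ("A1", "B2", 1e-10, 60.0),
    ("A2", "B2", 1e-40, 150.0),
    ("A3", "B1", 1e-20, 90.0),
]
b_vs_a = [
    ("B1", "A1", 1e-50, 200.0),
    ("B1", "A3", 1e-20, 90.0),
    ("B2", "A2", 1e-40, 150.0),
    ("B2", "A1", 1e-10, 60.0),
]

print(bidirectional_best_hits(a_vs_b, b_vs_a))
# [('A1', 'B1'), ('A2', 'B2')]
```

A3 finds B1 as its best hit, but B1 prefers A1, so A3 is left out of the family. Extending this to many genomes requires further choices, such as a reference genome, that this sketch does not cover.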

Clustering Protein Sequences
  • Clustering protein sequences refers to the process of grouping similar sequences into distinct clusters or families.

  • GET_HOMOLOGUES is a software package for microbial pangenome analysis.

  • GET_HOMOLOGUES supports three sequence clustering algorithms: BDBH, COGtriangles, and OrthoMCL.
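
To make the notion of grouping similar sequences into families concrete, here is a generic single-linkage sketch: two proteins end up in the same cluster if a chain of sufficiently similar pairs connects them. It is not one of the three algorithms named above, and the protein names, identity values, and 70% threshold are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_by_similarity(
    sequences: List[str],
    pairs: Iterable[Tuple[str, str, float]],
    min_identity: float = 70.0,
) -> List[List[str]]:
    """Single-linkage clustering with a union-find structure.

    pairs holds (sequence_a, sequence_b, percent_identity); the threshold
    is an illustrative value, not a recommended setting.
    """
    parent: Dict[str, str] = {name: name for name in sequences}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for seq_a, seq_b, identity in pairs:
        if identity >= min_identity:
            root_a, root_b = find(seq_a), find(seq_b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for name in sequences:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


proteins = ["p1", "p2", "p3", "p4", "p5"]
similarities = [
    ("p1", "p2", 95.0),
    ("p2", "p3", 80.0),
    ("p3", "p4", 40.0),  # below the threshold, so it does not link the clusters
    ("p4", "p5", 88.0),
]
print(cluster_by_similarity(proteins, similarities))
# [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

Single linkage is easy to follow but can chain distantly related proteins together, which is one reason dedicated orthology algorithms apply stricter criteria.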

Exploring Pangenome Graphs
  • PPanGGOLiN is a software suite for creating and manipulating prokaryotic pangenomes.

  • PPanGGOLiN integrates gene families and their genomic neighborhood to build a graph and define the partitions.

  • PPanGGOLiN is designed to scale up to tens of thousands of genomes.
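
The idea of combining gene families with their genomic neighborhood can be sketched as a small graph in which nodes are gene families and an edge links two families whose genes sit next to each other in at least one genome. This is only an illustration of the concept, not PPanGGOLiN's API or data model. The toy gene orders are hypothetical, and each genome is treated as a single linear contig.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical gene order per genome, with each gene replaced by its family name.
gene_orders: Dict[str, List[str]] = {
    "strain_A": ["fam1", "fam2", "fam3"],
    "strain_B": ["fam1", "fam2", "fam4", "fam3"],
    "strain_C": ["fam1", "fam2", "fam3"],
}

Edge = Tuple[str, str]


def neighborhood_graph(gene_orders: Dict[str, List[str]]) -> Dict[Edge, int]:
    """Count, for each pair of adjacent families, how many genomes contain that adjacency."""
    edge_counts: Counter = Counter()
    for order in gene_orders.values():
        edges_in_genome: Set[Edge] = set()
        for left, right in zip(order, order[1:]):
            family_a, family_b = sorted((left, right))  # undirected: store in a fixed order
            edges_in_genome.add((family_a, family_b))
        edge_counts.update(edges_in_genome)  # each genome counts once per edge
    return dict(edge_counts)


graph = neighborhood_graph(gene_orders)
for edge in sorted(graph):
    print(edge, graph[edge])
# ('fam1', 'fam2') 3
# ('fam2', 'fam3') 2
# ('fam2', 'fam4') 1
# ('fam3', 'fam4') 1
```

Edges supported by every genome trace the conserved backbone, while low-count edges such as those around fam4 mark an insertion found in only one strain.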

Interactive Pangenome Plots
  • Anvi’o can build a pangenome starting from genomes, metagenomes, or a combination of both.

  • Anvi’o allows you to interactively visualize your pangenomes.

  • The Anvi’o platform includes additional scripts that, among other tasks, explore the geometric and biochemical homogeneity of the gene clusters, compute and visualize the ANI values of the genomes, and conduct a functional enrichment analysis in a group of genomes.

Other Resources
  • Downstream analysis of pangenomes could be focused on describing the core or the accessory genome of the organism studied.

  • Examples using information from the core genome:

      a) Selecting a conserved gene as the target for a molecular diagnostic test or a vaccine.

      b) Reconstructing a species phylogenetic tree from all the core genes.

  • Examples using information from the accessory genome:

      a) Describing niche-specific genes among the strains compared.

      b) Analyzing horizontal gene transfer or genetic recombination.

      c) Studying the evolution of genes (duplication, gene gain and loss, etc.).

Glossary

Pangenome: The complete repertoire of genes of a group of organisms.