This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository.

Pangenome Analysis in Prokaryotes

Welcome to this lesson on the principles of Pangenomics, a rapidly advancing field in bioinformatics. Throughout this course, you will study the theory that underpins pangenome analysis. Using command-line software, you will gain hands-on experience downloading and annotating public bacterial genomes, acquiring essential genomic analysis skills.

One of the highlights of this course is the opportunity to work with specialized programs designed for pangenomic analysis. You will learn to cluster genes into families and to construct interactive pangenome graphs and plots, which are powerful visualization tools for studying the overall structure of a pangenome and the gene families composing it. Finally, you will explore how to apply Topological Data Analysis to the study of pangenomes.

The analyses presented here were curated to equip you with the tools needed to run a basic pangenomics pipeline. By refining your bioinformatics skills through practical application, you will gain confidence in your abilities and be well prepared to explore further resources (see Other Resources). From there, you can develop a personalized workflow tailored to the specific objectives of your pangenomics research.

Get ready to embark on this exciting journey into the world of Pangenomics, where you will unlock new insights and unravel the complexities of genomic variation!

Pre-requisites

Before diving into this lesson on Pangenomics, it is essential to have a working understanding of the Bash shell and the Python language. If you are not already familiar with them, we recommend completing the Introduction to the Command Line for Pangenomics and Introduction to Python for Pangenomics lessons before starting this one.

Additionally, some familiarity with biological concepts is assumed for this lesson. Having a basic understanding of prokaryotes, genomes, genes, and orthology is beneficial. If you are new to these concepts, we encourage you to review relevant materials to ensure you have a solid foundation for this course.

Throughout this lesson, we will use data hosted on an Amazon Machine Image (AMI). Workshop participants will receive information on how to log in to the AMI during the workshop. If you are studying independently, you must set up your own AMI or install the necessary programs on your computer. Detailed instructions on setting up an AMI and accessing the required data can be found on the Pangenomics Workshop Setup page.
If you are taking this workshop at UNAM-CCM, you will use the shell, Python, and all the bioinformatics programs through a JupyterHub server.

This lesson is the third part of the Pangenomics Workshop, which also includes Introduction to the Command Line for Pangenomics and Introduction to Python for Pangenomics.

Schedule

Setup Download files required for the lesson
00:00 1. Introduction to Pangenomics What is a pangenome?
What are the components of a pangenome?
00:25 2. Downloading Genomic Data How can I download public genomes using the command line?
01:10 3. Annotating Genomic Data How can I identify the genes in a genome?
02:15 4. Measuring Sequence Similarity How can we measure differences in gene sequences?
02:55 5. Clustering with BLAST Results How can we use the BLAST results to form families?
03:30 6. Clustering Protein Sequences Can I cluster my sequences automatically?
04:10 7. Exploring Pangenome Graphs How can I build a pangenome of thousands of genomes?
How can I visualize the spatial relationship between gene families?
04:50 8. Interactive Pangenome Plots How can I obtain an interactive pangenome plot?
How can I measure the homogeneity of the gene families?
How can I obtain an enrichment analysis of the gene families?
How can I compute the ANI values between the genomes of the pangenome?
05:20 9. Other Resources What can I do after I have built a pangenome?
What bioinformatic tools are available for downstream analysis of pangenomes?
05:40 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.