Topological Data Analysis for Pangenomics

Welcome to this lesson on the fundamental principles of Topological Data Analysis (TDA). TDA is used to analyze large datasets and to discover structural features that distinguish one dataset from another. The main objects in TDA are simplicial complexes, which generalize graphs. They are built from vertices (0-simplices), edges (1-simplices), triangles (2-simplices), and higher-dimensional simplices.
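To make the idea of a simplicial complex concrete, here is a minimal sketch in plain Python. It is not the tooling used later in the lesson. The helper names `closure` and `count_by_dimension` and the toy vertices A to D are hypothetical, chosen only for illustration. The sketch relies on one defining property: whenever a simplex belongs to a complex, all of its faces belong to it too.

```python
from itertools import combinations


def closure(maximal_simplices):
    """Return every non-empty face of the given simplices as sorted tuples."""
    faces = set()
    for simplex in maximal_simplices:
        vertices = sorted(set(simplex))
        # A simplex on n vertices contributes every non-empty subset as a face.
        for k in range(1, len(vertices) + 1):
            faces.update(combinations(vertices, k))
    return faces


def count_by_dimension(simplices):
    """Count simplices per dimension (a simplex on k vertices has dimension k - 1)."""
    counts = {}
    for simplex in simplices:
        dim = len(simplex) - 1
        counts[dim] = counts.get(dim, 0) + 1
    return counts


# Hypothetical toy data: one filled triangle A-B-C plus an extra edge C-D.
toy = closure([("A", "B", "C"), ("C", "D")])
counts = count_by_dimension(toy)

for simplex in sorted(toy, key=lambda s: (len(s), s)):
    print(len(simplex) - 1, simplex)
print(counts)
```

A graph is the special case in which no simplex has more than two vertices. Here the triangle `("A", "B", "C")` is what makes this toy complex more than a graph.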

We will begin by exploring the basic concepts of TDA and the computational tools that allow us to use it. In later episodes, we will use the mini Streptococcus database from the Pangenome Analysis in Prokaryotes lesson to reconstruct a pangenome using simplicial persistence techniques, which will help us identify gene families. We will also study horizontal gene transfer by searching for 1-holes in two different simplicial complexes.

One of the key highlights of this course is the opportunity to explore how to apply TDA methods to comparative genomics analyses. Get ready to embark on this exciting journey into the world of TDA.

Prerequisites

Before diving into this lesson, it is essential to have a working understanding of the Bash shell and the Python language. If you are not already familiar with these tools, we recommend completing the Introduction to the Command Line for Pangenomics and Introduction to Python lessons.

This lesson is the third part of the Pangenomics Workshop.

Schedule

	Setup	Download files required for the lesson
00:00	1. Introduction to Topological Data Analysis	What is topological data analysis?
00:50	2. Computational Tools for TDA	How can I computationally manipulate simplicial complexes?
01:35	3. Detecting horizontal gene transfer	How can I detect HGT with TDA?
02:35	4. Persistence Simplices Give Rise to Gene Families	How can I apply TDA to describe pangenomes? How can persistence simplices be related to gene families?
03:25	5. Other Resources	What other tools are available in TDA? What are the limitations of those tools?
03:45	Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.