This lesson is in the early stages of development (Alpha version)

Pangenomics Workshop Overview: Data

Getting the dataset

wget -O pan_workshop.zip 'https://zenodo.org/record/7974915/files/pan_workshop.zip?download=1'
unzip pan_workshop.zip
rm pan_workshop.zip

Features of the dataset

The dataset consists of six genomes of Streptococcus agalactiae. The nucleotide files analyzed in this lesson are available in the DDBJ/EMBL/GenBank database under accession numbers AAJO01000000 (18RS21), AAJP01000000 (515), AAJQ01000000 (CJB111), AAJR01000000 (COH1), AAJS01000000 (H36B), and CP000114 (A909). These sequences have also been deposited in Zenodo under a permanent DOI (record 7974915, the one used in the download command above).

Introduction to the dataset

In 2005, Tettelin and collaborators were working on a vaccine against Streptococcus agalactiae, the main cause of neonatal infections in humans [1]. When they compared these six genomes, they found that a single genome does not contain the entire genetic repertoire of a species. These data led to the recognition of intra-species genomic variation. The genetic content was described as the pan-genome, consisting of a core genome shared by all isolates plus a dispensable genome made up of partially shared and strain-specific genes.
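
The core/dispensable split can be illustrated with plain set operations. The sketch below is a toy example, not the method used in the original study: the strain names are those of the six genomes in this dataset, but the gene labels (geneA to geneE) and their presence/absence pattern are invented purely for illustration.

```python
# Toy illustration of the pan-genome partition.
# Strain names come from the dataset; the gene labels and which strain
# carries which gene are hypothetical, made up for this example.
gene_content = {
    "18RS21": {"geneA", "geneB", "geneC"},
    "515":    {"geneA", "geneB", "geneD"},
    "CJB111": {"geneA", "geneB"},
    "COH1":   {"geneA", "geneB", "geneC"},
    "H36B":   {"geneA", "geneB", "geneE"},
    "A909":   {"geneA", "geneB", "geneC"},
}

# Pan-genome: every gene seen in at least one strain.
pan_genome = set.union(*gene_content.values())

# Core genome: genes present in every strain.
core_genome = set.intersection(*gene_content.values())

# Dispensable genome: pan-genome genes that are not core.
dispensable_genome = pan_genome - core_genome


def strains_with(gene):
    """Return the strains whose gene set contains the given gene."""
    return [strain for strain, genes in gene_content.items() if gene in genes]


# Within the dispensable genome, separate genes found in exactly one
# strain (strain-specific) from genes found in several but not all.
strain_specific = {g for g in dispensable_genome if len(strains_with(g)) == 1}
partially_shared = dispensable_genome - strain_specific

print("pan-genome:      ", sorted(pan_genome))
print("core genome:     ", sorted(core_genome))
print("partially shared:", sorted(partially_shared))
print("strain-specific: ", sorted(strain_specific))
```

In this toy table, geneA and geneB form the core genome, geneC is partially shared, and geneD and geneE are strain-specific. Real analyses first have to decide which genes in different genomes count as "the same" gene, a step this sketch deliberately skips.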

References

[1] Tettelin, H., et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”. PNAS, 2005. https://doi.org/10.1073/pnas.0506758102