This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Genome Mining Workshop Overview: Data

Features of the dataset

The dataset consists of six genomes of Streptococcus agalactiae. The nucleotide files analyzed in this lesson are available in the DDBJ/EMBL/GenBank databases under accession nos. AAJO01000000 (18RS21), AAJP01000000 (515), AAJQ01000000 (CJB111), AAJR01000000 (COH1), AAJS01000000 (H36B), and CP000114 (A909). These sequences have also been deposited in Zenodo under a permanent DOI:
DOI

Introduction to the dataset

In 2005, Tettelin and collaborators [1] were working on a vaccine against Streptococcus agalactiae, the leading cause of neonatal infections in humans. When they compared these six genomes, they discovered that a single genome does not contain the entire genetic repertoire of a species. These data led to the recognition of intraspecies (within-species) genomic variation. The genetic content was described as the pan-genome, consisting of a core genome shared by all isolates plus a dispensable genome made up of partially shared and strain-specific genes.
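
The definitions above can be illustrated with a minimal Python sketch. It treats each genome as a set of gene families and derives the pan-genome, core genome, dispensable genome, and strain-specific genes with set operations. The gene family names (`famA`, `famB`, ...) and their distribution across strains are invented for illustration and are not taken from the real dataset; only the strain labels come from the lesson. The function name `pan_genome_partition` is likewise hypothetical.

```python
# Toy gene-family content per strain. The family names and their
# distribution are ASSUMPTIONS made up for illustration, not real data.
genomes = {
    "18RS21": {"famA", "famB", "famC", "famD"},
    "515": {"famA", "famB", "famC", "famE"},
    "A909": {"famA", "famB", "famD", "famF"},
}


def pan_genome_partition(genomes):
    """Split gene families into pan, core, dispensable, and strain-specific sets."""
    gene_sets = list(genomes.values())

    # Pan-genome: every gene family seen in at least one genome.
    pan = set().union(*gene_sets)

    # Core genome: gene families present in all genomes.
    core = set.intersection(*gene_sets)

    # Dispensable genome: everything in the pan-genome that is not core
    # (partially shared plus strain-specific gene families).
    dispensable = pan - core

    # Strain-specific: gene families found in exactly one genome.
    strain_specific = {}
    for strain, genes in genomes.items():
        others = set().union(
            *(g for s, g in genomes.items() if s != strain)
        )
        strain_specific[strain] = genes - others

    return pan, core, dispensable, strain_specific


pan, core, dispensable, strain_specific = pan_genome_partition(genomes)
print("pan-genome size:", len(pan))
print("core genome:", sorted(core))
print("dispensable genome:", sorted(dispensable))
print("strain-specific:", {s: sorted(g) for s, g in strain_specific.items()})
```

In real analyses, gene families are first inferred by clustering homologous genes across genomes; this sketch assumes that step has already been done and only shows how the partition follows from the definitions.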

References

[1] Hervé Tettelin et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”. PNAS, 2005. https://doi.org/10.1073/pnas.0506758102