Features of the dataset

The dataset consists of six genomes of Streptococcus agalactiae. The nucleotide files analyzed in this lesson are available in the DDBJ/EMBL/GenBank database under accession nos. AAJO01000000 (18RS21), AAJP01000000 (515), AAJQ01000000 (CJB111), AAJR01000000 (COH1), AAJS01000000 (H36B), and CP000114 (A909). These sequences have also been deposited in Zenodo with permanent doi:

Introduction to the dataset

In 2005, Tettelin and collaborators [1] were working on a vaccine against Streptococcus agalactiae, the main cause of neonatal infections in humans. Comparing these six genomes revealed that a single genome does not contain the entire genetic repertoire of a species. These data led to the recognition of intra-species genomic variation. The genetic content was described as the pan-genome, consisting of a core genome shared by all isolates plus a dispensable genome made up of partially shared and strain-specific genes.
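
The pan-genome partition described above can be sketched with plain Python sets. This is a toy illustration only: the gene names are invented placeholders, the function name partition_pangenome is hypothetical, and only the strain labels come from the dataset. Real analyses first cluster genes into homologous families before comparing presence and absence.

```python
from collections import Counter

# Toy presence/absence data. Strain labels come from the dataset;
# the gene names are invented placeholders, not real annotations.
genomes = {
    "A909": {"geneA", "geneB", "geneC", "geneD"},
    "515": {"geneA", "geneB", "geneC", "geneE"},
    "COH1": {"geneA", "geneB", "geneF"},
}


def partition_pangenome(genomes):
    """Split the gene content of several genomes into pan-genome parts."""
    gene_sets = list(genomes.values())
    # Pan-genome: every gene seen in at least one genome.
    pan = set().union(*gene_sets)
    # Core genome: genes present in every genome.
    core = set.intersection(*gene_sets)
    # Dispensable genome: everything that is not core.
    dispensable = pan - core
    # Count in how many genomes each gene occurs.
    counts = Counter(gene for genes in gene_sets for gene in genes)
    # Strain-specific genes occur in exactly one genome.
    strain_specific = {gene for gene in dispensable if counts[gene] == 1}
    # Partially shared genes occur in several genomes, but not all.
    partially_shared = dispensable - strain_specific
    return {
        "pan": pan,
        "core": core,
        "dispensable": dispensable,
        "partially_shared": partially_shared,
        "strain_specific": strain_specific,
    }


result = partition_pangenome(genomes)
for name, genes in result.items():
    print(name, sorted(genes))
```

In this toy example the core genome holds the two genes common to all three strains, while the dispensable genome holds the remaining four. Adding a further genome can only enlarge the pan-genome and can only shrink or preserve the core.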

References

[1] Hervé Tettelin, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”. PNAS, 2005. https://doi.org/10.1073/pnas.0506758102

Genome Mining Workshop Overview: Data
