Getting the dataset
wget -O pan_workshop.zip 'https://zenodo.org/record/7974915/files/pan_workshop.zip?download=1'
unzip pan_workshop.zip
rm pan_workshop.zip
Features of the dataset
The dataset consists of six genomes of Streptococcus agalactiae. The nucleotide files analyzed in this lesson are available in the DDBJ/EMBL/GenBank database under accession nos. AAJO01000000 (18RS21), AAJP01000000 (515), AAJQ01000000 (CJB111), AAJR01000000 (COH1), AAJS01000000 (H36B), and CP000114 (A909). These sequences have also been deposited in Zenodo with permanent doi:
Introduction to the dataset
In 2005, Tettelin and collaborators [1] were working on a vaccine against Streptococcus agalactiae, the main cause of neonatal infections in humans. Comparing these six genomes revealed that a single genome does not contain the entire genetic repertoire of a species. These data led to the recognition of intraspecies genomic variation. The genetic content was described as the pan-genome, consisting of a core genome shared by all isolates plus a dispensable genome made up of partially shared and strain-specific genes.
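The core/dispensable split described above can be illustrated with a small shell sketch. It is only a toy example, not the method used in the paper: the gene names (geneA to geneD), the two-column "strain gene" input layout, and the function name classify_genes are all assumptions made for illustration, and none of them come from the dataset.

```shell
#!/usr/bin/env bash
# Toy illustration of core vs. dispensable genes.
# ASSUMPTION: the input has one "strain gene" pair per line, separated by
# whitespace. Gene names below are hypothetical, not from the real genomes.

# classify_genes [FILE]: read strain/gene pairs from FILE (or stdin) and
# print "gene<TAB>category" for every gene, sorted by gene name.
classify_genes() {
  awk '
    # Count each strain at most once per gene, ignoring malformed lines.
    NF >= 2 && !seen[$1, $2]++ { strains[$1]; count[$2]++ }
    END {
      total = 0
      for (s in strains) total++          # number of distinct strains
      for (g in count) {
        if (count[g] == total)  category = "core"
        else if (count[g] == 1) category = "strain-specific"
        else                    category = "partially-shared"
        print g "\t" category
      }
    }
  ' "$@" | sort
}

# Three strains with a hypothetical gene content.
printf '%s %s\n' \
  A909 geneA  A909 geneB  A909 geneC \
  515  geneA  515  geneB \
  COH1 geneA  COH1 geneD |
  classify_genes
```

In this toy input, geneA is present in all three strains and is reported as core, geneB is partially shared, and geneC and geneD are strain-specific. The last two categories together make up the dispensable genome.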
References
[1] Hervé Tettelin, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”. PNAS, 2005. https://doi.org/10.1073/pnas.0506758102