EukRef Pipeline

The PR2 database was initiated in 2010 within the framework of the BioMarks project, building on work carried out over the previous ten years in the Plankton Group of the Station Biologique of Roscoff. Its aim is to provide a reference database of carefully annotated 18S rRNA sequences using eight unique taxonomic fields (from kingdom to species). At present it contains about 184,000 sequences. Metadata fields are available for many sequences, including geolocation, whether the sequence originates from a culture or a natural sample, and host type. The annotation of PR2 is performed by experts for each taxonomic group. One very important project in this respect is EukRef, which has recently decided to merge its efforts with PR2. EukRef has built bioinformatics pipelines that have been used during three workshops dedicated to specific taxonomic groups. For example, part of the ciliate annotation originates from the first EukRef workshop.

Prerequisites

This lesson is intended for microbial ecologists or genomicists at the doctoral level or above.

Schedule

	Setup	Download files required for the lesson
00:00	1. Getting Started	Key question (FIXME)
00:00	2. Retrieve an Initial Set of Sequences and Cluster	Key question (FIXME)
00:00	3. Building Initial Alignment	Key question (FIXME)
00:00	4. Build an Initial Tree	Key question (FIXME)
00:00	5. Download Databases	Key question (FIXME)
00:00	6. Retrieve All Sequences That Belong To Your Clade	Key question (FIXME)
00:00	7. Build an Alignment with the Reference Sequences	Key question (FIXME)
00:00	8. Build RAxML Trees and Clean Up	Key question (FIXME)
00:00	9. Visualize Your Tree and Remove Errant Sequences	Key question (FIXME)
00:00	10. Build Reference Tree	Key question (FIXME)
00:00	11. Annotation	Key question (FIXME)
00:00	12. Getting Started	Key question (FIXME)
00:00	Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.