This lesson is still being designed and assembled (Pre-Alpha version)



Teaching: 0 min
Exercises: 0 min
  • Key question (FIXME)

  • First learning objective. (FIXME)

Trim your metadata

Use the script to trim your metadata file so that it includes only the representative sequences of your clusters; these are the sequences in your tree.


-i current_DB.cleaned.fasta is the FASTA file of clustered sequences, containing only the sequences used in your reference tree.
-m metadata.txt is your initial metadata file.

python -i current_DB.cleaned.fasta -m metadata.txt -o metadata_ref.txt


-o metadata_ref.txt is the output metadata file, which contains only the accessions in your reference tree. This will make annotating easier.
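To make the trimming step concrete, here is a minimal sketch of the idea, not the lesson's actual script. It assumes that the accession is the first whitespace-delimited token of each FASTA header, and that the metadata file is tab-delimited with a header row and the accession in the first column. The function names and the toy data are hypothetical.

```python
import csv
import io


def fasta_accessions(fasta_text):
    """Collect the accession from every FASTA header line.

    Assumption: the accession is the first token after '>'.
    """
    accessions = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                accessions.add(tokens[0])
    return accessions


def trim_metadata(metadata_text, keep):
    """Keep the header row plus every row whose first column is in `keep`.

    Assumption: tab-delimited text, header in row 1, accession in column 1.
    """
    rows = list(csv.reader(io.StringIO(metadata_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    return [header] + [row for row in body if row and row[0] in keep]


# Toy data standing in for current_DB.cleaned.fasta and metadata.txt
fasta = ">AB000001 some description\nACGT\n>AB000003\nGGCC\n"
metadata = (
    "Accession\tTaxonomy\n"
    "AB000001\tCiliophora\n"
    "AB000002\tCiliophora\n"
    "AB000003\tApicomplexa\n"
)

trimmed = trim_metadata(metadata, fasta_accessions(fasta))
for row in trimmed:
    print("\t".join(row))
```

In this toy example AB000002 is dropped because it is not a cluster representative, so it does not appear in the FASTA file.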

Annotate classification

You need to annotate the classification in your metadata_ref.txt file. Note that metadata_ref.txt should already contain both the GenBank taxonomy and the SILVA taxonomy, which give you a starting point. In many cases, it will be easiest to annotate clades directly on your reference tree. You can annotate the tree and then export the annotations to your metadata_ref.txt file using the instructions below.

Download your reference tree, e.g. “RAxML_bestTree.clade”. Open the tree in FigTree and annotate each taxon with the name of the group it belongs to. You can only annotate taxa, NOT nodes or clades, but you can select an entire clade to annotate all of its taxa at once. Do not collapse annotated nodes: annotations within a collapsed node will not be reported. Use “Save as” to save the annotated tree, for example as “RAxML_bestTree.clade_annotated.tre”, which is a Nexus file.

Run the script to add the annotations from your tree to your metadata_ref.txt file.


-t RAxML_bestTree.clade_annotated.tre is the annotated tree you saved from FigTree.
-m metadata_ref.txt is the trimmed metadata file.

python -t RAxML_bestTree.clade_annotated.tre -m metadata_ref.txt


The output, “metadata_ref_out.txt”, is a tab-delimited metadata file that includes the accession number of the representative sequence of each cluster, with the annotations you made on the tree in the last column.
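The export step can be pictured with the sketch below. It is an illustration under assumptions, not the lesson's script. It assumes that FigTree stores taxon annotations in the "taxlabels" block of the Nexus file as `label[&name="value"]`, that taxon labels are plain accessions, and that each taxon carries a single quoted annotation. The function names, the annotation name `clade`, and the new column name `annotation` are hypothetical. Check a saved tree in a text editor before relying on this layout.

```python
import re

# Matches e.g.  AB000001[&clade="Ciliophora"]  (label optionally in single quotes)
TAXON_LINE = re.compile(r"^\s*'?([^'\[\s]+)'?\[&([^\]]*)\]")
# First double-quoted value inside the [&...] comment
QUOTED_VALUE = re.compile(r'=\s*"([^"]*)"')


def tree_annotations(nexus_text):
    """Return {taxon label: annotation} read from the taxlabels block."""
    annotations = {}
    in_taxlabels = False
    for line in nexus_text.splitlines():
        stripped = line.strip()
        if stripped.lower() == "taxlabels":
            in_taxlabels = True
            continue
        if in_taxlabels and stripped.startswith(";"):
            break  # end of the taxlabels block
        if in_taxlabels:
            match = TAXON_LINE.match(line)
            if match:
                value = QUOTED_VALUE.search(match.group(2))
                if value:
                    annotations[match.group(1)] = value.group(1)
    return annotations


def add_annotation_column(rows, annotations, column_name="annotation"):
    """Append the annotation as the last column; blank if a taxon has none."""
    out = [rows[0] + [column_name]]
    for row in rows[1:]:
        out.append(row + [annotations.get(row[0], "")])
    return out


# Toy stand-ins for the annotated tree and metadata_ref.txt
nexus = """#NEXUS
begin taxa;
\tdimensions ntax=3;
\ttaxlabels
\tAB000001[&clade="Ciliophora"]
\tAB000002
\tAB000003[&!color=#ff0000,clade="Apicomplexa"]
;
end;
"""
metadata_ref = [["Accession", "Source"], ["AB000001", "GenBank"],
                ["AB000002", "GenBank"], ["AB000003", "GenBank"]]

annotated = add_annotation_column(metadata_ref, tree_annotations(nexus))
for row in annotated:
    print("\t".join(row))
```

Taxa left unannotated on the tree (AB000002 here) get an empty last column, which makes them easy to spot and fix.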

Expand your curation

Use the script to expand your curation to all sequences within the clusters.


-i metadata.txt is the initial metadata file you got.
-r metadata_ref_out.txt is the reference metadata file you just added annotations to.
-c current_DB.clusters.uc is the cluster file (.uc) from usearch. Use the most recent version if you clustered multiple times.

python -i metadata.txt -r metadata_ref_out.txt -c current_DB.clusters.uc -o metadata_ref_expanded.txt


-o metadata_ref_expanded.txt is the output: the expanded, annotated metadata file.
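The expansion can be sketched as follows; this illustrates the logic and is not the lesson's script. It relies on the usearch .uc layout of ten tab-separated fields, where field 1 is the record type (S for a cluster centroid, H for a hit, C for a cluster summary), field 9 is the query label, and field 10 is the label of the centroid that a hit was assigned to. It assumes that the labels in the .uc file are the same accessions used in the first column of the metadata files, and that the annotation is the last column of the reference metadata. The function names and toy data are hypothetical.

```python
def members_to_centroids(uc_text):
    """Map every sequence label to the label of its cluster centroid."""
    centroid_of = {}
    for line in uc_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record_type, query, target = fields[0], fields[8], fields[9]
        if record_type == "S":    # centroid: represents itself
            centroid_of[query] = query
        elif record_type == "H":  # hit: query belongs to the target's cluster
            centroid_of[query] = target
        # C records only summarise clusters, so they are skipped
    return centroid_of


def expand_annotations(all_rows, ref_rows, centroid_of):
    """Copy each centroid's annotation (last column) to all cluster members."""
    annotation_of = {row[0]: row[-1] for row in ref_rows[1:]}
    out = [all_rows[0] + [ref_rows[0][-1]]]
    for row in all_rows[1:]:
        centroid = centroid_of.get(row[0], row[0])
        out.append(row + [annotation_of.get(centroid, "")])
    return out


# Toy stand-ins for current_DB.clusters.uc, metadata.txt, metadata_ref_out.txt
uc = "\n".join("\t".join(fields) for fields in [
    ["S", "0", "1500", "*", "*", "*", "*", "*", "AB000001", "*"],
    ["H", "0", "1490", "98.5", "+", "0", "0", "1490M", "AB000002", "AB000001"],
    ["S", "1", "1600", "*", "*", "*", "*", "*", "AB000003", "*"],
    ["C", "0", "2", "*", "*", "*", "*", "*", "AB000001", "*"],
])
all_rows = [["Accession", "Taxonomy"], ["AB000001", "Ciliophora"],
            ["AB000002", "Ciliophora"], ["AB000003", "Apicomplexa"]]
ref_rows = [["Accession", "Taxonomy", "annotation"],
            ["AB000001", "Ciliophora", "Oligohymenophorea"],
            ["AB000003", "Apicomplexa", "Coccidia"]]

expanded = expand_annotations(all_rows, ref_rows, members_to_centroids(uc))
for row in expanded:
    print("\t".join(row))
```

Here AB000002 was never on the tree, but it inherits the annotation of AB000001 because the .uc file records AB000001 as its centroid.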

Key Points

  • First key point. Brief Answer to questions. (FIXME)