Build an Alignment with the Reference Sequences

Overview

Teaching: 0 min
Exercises: 0 min

Questions

Key question (FIXME)

Objectives

First learning objective. (FIXME)

Cluster sequences using usearch

usearch -sortbylength annotated_DB_for_tree.fasta -fastaout current_DB.sorted.fasta -minseqlength 64 -notrunclabels

If the FASTA file to be clustered contains long (e.g. genomic) sequences, the clustering step is likely to run out of memory. Check the screen output of this “usearch -sortbylength” step, which reports the length of the longest sequence found in the file; this should not exceed 5,000 (the hard limit imposed in Step 5).
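The longest sequence length can also be checked directly from the FASTA file. The following is a minimal sketch, assuming a standard FASTA layout (header lines start with ">", sequences may wrap over several lines); the function name max_seq_length is hypothetical and not part of usearch.

```shell
# Hypothetical helper: print the length of the longest sequence in a FASTA file.
max_seq_length() {
    awk '
        /^>/ { if (len > max) max = len; len = 0; next }   # new record: close the previous one
        { gsub(/[ \t\r]/, ""); len += length($0) }         # add this line to the current sequence
        END { if (len > max) max = len; print max + 0 }    # close the last record
    ' "$1"
}

# Run only if the sorted file from the step above exists.
if [ -f current_DB.sorted.fasta ]; then
    max_seq_length current_DB.sorted.fasta
fi
```

If the number printed is above 5,000, consider removing or shortening the long sequences before clustering.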

NOTE: Choose the -id value (the similarity threshold for clustering) that is appropriate for your group; the command below uses 0.97.

usearch -cluster_smallmem current_DB.sorted.fasta -id 0.97 -centroids current_DB.clustered.fasta -uc current_DB.clusters.uc -notrunclabels
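To judge whether the chosen -id value is sensible, it can help to summarize the cluster file. The sketch below assumes the tab-separated UC format in which "C" records carry the cluster size in the third column; the function name uc_summary is hypothetical.

```shell
# Hypothetical helper: summarize a UC file by reading its "C" (cluster) records.
uc_summary() {
    awk -F '\t' '
        $1 == "C" { n++; total += $3; if ($3 > max) max = $3 }   # column 3 = cluster size
        END { printf "clusters=%d sequences=%d largest=%d\n", n, total, max }
    ' "$1"
}

# Run only if the cluster file from the step above exists.
if [ -f current_DB.clusters.uc ]; then
    uc_summary current_DB.clusters.uc
fi
```

A very small number of clusters relative to the number of input sequences may suggest that the threshold is too low for the group.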

Align using MAFFT

Unless there is a specific reason to use a different approach, use the default MAFFT algorithm. See the instructions below for aligning groups known to be difficult.

mafft --reorder --auto current_DB.clustered.fasta > current_DB_aligned.fasta

Alternative: Align using INFERNAL

cmalign --outformat AFA -o current_DB_aligned_infernal.fasta SSUref.cmm current_DB.clustered.fasta
sed 's/\./-/g' current_DB_aligned_infernal.fasta > current_DB_aligned.fasta
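Note that the sed pattern above applies to every line, so any dots in the FASTA header lines (for example in accession numbers) are rewritten as well. If the labels need to stay unchanged, a variant restricted to sequence lines can be sketched as follows; the function name dots_to_dashes is hypothetical.

```shell
# Hypothetical helper: replace '.' with '-' on sequence lines only,
# leaving FASTA header lines (those starting with '>') untouched.
dots_to_dashes() {
    sed '/^>/!s/\./-/g' "$1"
}

# Run only if the INFERNAL alignment from the step above exists.
if [ -f current_DB_aligned_infernal.fasta ]; then
    dots_to_dashes current_DB_aligned_infernal.fasta > current_DB_aligned.fasta
fi
```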

Alternative: Align using SINA*

./sina -i NAME.clustered.fasta -o NAME_aligned.fasta --outtype fasta --ptdb REFERENCE_ALN.ARB

If you choose to align with SINA, you have to provide a reference alignment in ARB format. The easiest way to get one is to download it from SILVA’s ARB database page.

It is critical to open and check your alignment in AliView (or similar software) to see whether there are misaligned blocks or sequences. Make sure you look at the entire length of the alignment. A poor alignment will not produce a good tree. If your alignment is poor, you can 1) realign against a template (seed) alignment using the command below, or 2) edit your alignment by hand.

If you already have a curated alignment for your group, you can use it as a template (seed) to align your new sequences with the following command.

mafft --reorder --genafpair --maxiterate 1000 --retree --anysymbol --seed template_curated_alignment.fasta current_DB.OUTGROUP.clustered.fasta > current_DB_aligned.fasta

Remember to open your alignment file and check it before proceeding.
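A quick scripted sanity check can complement (but not replace) the visual inspection. The sketch below only tests that every sequence in the aligned FASTA file has the same length, which any valid alignment must satisfy; it says nothing about alignment quality. The function name alignment_is_rectangular is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) only if all sequences
# in an aligned FASTA file have the same length.
alignment_is_rectangular() {
    awk '
        function check_row() {
            if (!started) return                 # nothing read yet
            if (nrow == 0) ncol = len            # first sequence sets the expected width
            else if (len != ncol) bad = 1        # any other width is an error
            nrow++
        }
        /^>/ { check_row(); started = 1; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { check_row(); if (nrow == 0) bad = 1; exit (bad ? 1 : 0) }
    ' "$1"
}

# Run only if the alignment from the steps above exists.
if [ -f current_DB_aligned.fasta ]; then
    if alignment_is_rectangular current_DB_aligned.fasta; then
        echo "All sequences have the same aligned length."
    else
        echo "Sequences differ in length: the file is not a valid alignment."
    fi
fi
```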

Trim the alignment using trimal

trimal -in current_DB_aligned.fasta -out current_DB_trim.fasta -gt 0.3 -st 0.001
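Comparing the dimensions of the alignment before and after trimming gives a quick sense of how many columns the thresholds removed. This is a minimal sketch, assuming both files are valid aligned FASTA files (the column count is taken from the first sequence only); the function name aln_dims is hypothetical.

```shell
# Hypothetical helper: report the number of sequences and alignment columns.
aln_dims() {
    awk '
        /^>/ { nseq++; next }                                   # count header lines
        nseq == 1 { gsub(/[ \t\r]/, ""); ncol += length($0) }   # width of the first sequence
        END { printf "%d sequences, %d columns\n", nseq, ncol }
    ' "$1"
}

# Report on the untrimmed and trimmed alignments if they exist.
for f in current_DB_aligned.fasta current_DB_trim.fasta; do
    if [ -f "$f" ]; then
        printf '%s: ' "$f"
        aln_dims "$f"
    fi
done
```

If trimming removes nearly all columns, or almost none, the -gt and -st thresholds may need adjusting for your group.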

Key Points

First key point. Brief Answer to questions. (FIXME)
