This lesson is still being designed and assembled (Pre-Alpha version)

Download Databases


Teaching: 0 min
Exercises: 0 min
  • Key question (FIXME)

  • First learning objective. (FIXME)

Download and unpack databases

tar -xvf databases.tar.gz

Inside the unpacked “databases” directory you will find three files: Reference_DB.fas and gb203_pr2_all_10_28_97p_noorg.fasta, and tax_d.bin. These files are used for filtering results within All three files must be in the current working directory where you run the script in Step 6.

Get GenBank nt database

Locate a copy of GenBank nt database in use at your institution or on your server. At the EukRef workshop there is a copy on the Amazon server. GenBank NT should ideally be downloaded to a server. There will be more than 40 files totaling more than 30 GB. Downloading can take several hours. You may wish to install and use a tool like wget. In a workshop setting download one time and allow access for all participants.

Make a new folder called DATABASEFOLDER. From download taxdb.tar.gz and all nt.tar.gz files. Also, download all nt_.tar.gz.md5 files.

Unpack with the following command:

for i in nt*; do tar -xvf $i ; done

Run md5 checksum to ensure all files are fully downloaded.

(NOTE: Add explanation of what checksums are and instructions on how to run.)

Export database

Before running the pipeline run the following commands


(you can also add BLASTDB to your PATH permanently)

(NOTE - Add instructions on how to modify PATH.)

Key Points

  • First key point. Brief Answer to questions. (FIXME)