Download Databases


Download and unpack databases

tar -xvf databases.tar.gz

Inside the unpacked “databases” directory you will find three files: Reference_DB.fas, gb203_pr2_all_10_28_97p_noorg.fasta, and tax_d.bin. These files are used for filtering results. All three files must be in the current working directory where you run the script in Step 6.
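The following is a minimal check that the three files are where the pipeline expects them. It assumes you copy them out of the unpacked databases directory into the directory you will run the Step 6 script from. The function name check_filter_files is hypothetical and is not part of the pipeline.

```shell
#!/bin/sh
# Hypothetical helper: report any of the three filtering files that are
# missing from the current working directory. Returns 0 only if all are present.
check_filter_files() {
  missing=0
  for f in Reference_DB.fas gb203_pr2_all_10_28_97p_noorg.fasta tax_d.bin; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use, from the directory where you will run the Step 6 script:
#   cp databases/Reference_DB.fas databases/gb203_pr2_all_10_28_97p_noorg.fasta databases/tax_d.bin .
#   check_filter_files && echo "all filter files present"
```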

Get GenBank nt database

Locate a copy of the GenBank nt database in use at your institution or on your server. At the EukRef workshop there is a copy on the Amazon server. If you need your own copy, GenBank nt should ideally be downloaded to a server: it consists of more than 40 files totaling more than 30 GB, and downloading can take several hours. You may wish to install and use a tool like wget. In a workshop setting, download the database once and give all participants access to it.

Make a new folder called DATABASEFOLDER. From …, download taxdb.tar.gz and all nt.*.tar.gz files into it. Also download all nt.*.tar.gz.md5 files.

Unpack with the following commands:

for i in nt.*.tar.gz; do tar -xvf "$i"; done
tar -xvf taxdb.tar.gz

Check each archive against its .md5 file to make sure it downloaded completely, and download again any archive that fails. Ideally, do this before unpacking.
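Here is a sketch of checking each archive before unpacking it. It assumes GNU coreutils md5sum is available (macOS ships md5 instead) and that each .md5 file contains a line of the form "<hash>  <filename>", which is the format md5sum -c expects. The function name verify_and_unpack is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for every nt.*.tar.gz in the current directory, check it
# against its .md5 file and unpack it only if the checksum matches.
# Returns non-zero if any archive fails the check or fails to unpack.
verify_and_unpack() {
  status=0
  for archive in nt.*.tar.gz; do
    [ -e "$archive" ] || continue   # the pattern matched no files
    if md5sum -c "$archive.md5" >/dev/null 2>&1; then
      tar -xzf "$archive" || status=1
    else
      echo "checksum failed (or .md5 missing) for $archive; download it again" >&2
      status=1
    fi
  done
  return "$status"
}
```

Archives that fail are left packed, so rerunning the function after re-downloading them only needs the bad files replaced. taxdb.tar.gz is not covered by the loop and is unpacked separately with tar.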

Export database

Before running the pipeline, run the following commands:
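BLAST+ programs look for databases, including the taxdb files, in the directory named by the BLASTDB environment variable. If the pipeline finds nt through BLASTDB (an assumption; check the pipeline's own instructions), the export step could look like the sketch below. export_blastdb and the file name export_db.sh are hypothetical; the helper refuses to export a folder that does not exist.

```shell
#!/bin/sh
# Hypothetical helper: export BLASTDB (read by BLAST+ programs such as blastn)
# so it points at the folder holding the unpacked nt and taxdb files.
# Refuses to export a path that is not an existing directory.
export_blastdb() {
  if [ -d "$1" ]; then
    BLASTDB="$1"
    export BLASTDB
  else
    echo "database folder not found: $1" >&2
    return 1
  fi
}

# Example use; source the (hypothetical) file so the variable persists in your shell:
#   . ./export_db.sh
#   export_blastdb "$HOME/DATABASEFOLDER"    # adjust to where the folder really is
```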


