In a Data Carpentry Workshop
Free server access is offered upon request for workshops, and self-learners can also request temporary accounts. Servers are provided by “The Collaborative Community for Genomic Bacterial Analysis and Practice” and the sub-community “The Carpentries México”. For individuals choosing to follow the lessons on a different computer, step-by-step instructions for configuring remote machines with complete conda environments, as well as a Docker image, are provided here.
This workshop can also be run on Amazon Web Services (AWS) instances using the Docker image. An instance is a remote computer with all the required programs and files, which you access from your own computer. If you are signed up to take a Pangenomics Data Carpentry Workshop, you do not need to worry about setting up an AMI instance: The Carpentries México staff will create an instance for you on our local servers, free of charge. This applies to both self-organized and centrally-organized workshops. Your instructor will provide instructions on how to connect to the AMI instance at the workshop.
If you are in a Carpentries workshop, you do not even need to install a Bash terminal; the Jupyter Notebook terminal provided in the AMI is enough to run all the commands in the lesson. Instead of connecting by ssh, users can simply use the Jupyter Notebook AMI terminal.
What you’ll need is an up-to-date web browser (Firefox, Safari, Chrome or Edge), and a working spreadsheet program. If you don’t have the latter, you can install LibreOffice, a free and open source office suite which includes a spreadsheet program called Calc.
Running the lesson by yourself (Not in a Data Carpentry Workshop)
Required software and packages
The following table lists all the required software for the workshop.
Software | Version | Manual | Available for | Description |
---|---|---|---|---|
BLAST | 2.12 | BLAST Command Line Applications User Manual | Linux, MacOS & Windows | Finds regions of similarity between biological sequences |
Prokka | 1.14.6 | GitHub | Linux, MacOS & Windows | Bacterial, archaeal and viral assembly annotation |
ncbi-genome-download | 0.3.3 | GitHub | Linux, MacOS & Windows | Downloading genomes from the NCBI |
Anvi’o | 7.1 | Pangenomics Workflow Manual | Linux & MacOS | Multi-omics analysis including pangenomics |
GET_HOMOLOGUES | 3.6.2 | GitHub | Linux, MacOS & Windows | Sequence clustering |
PPanGGOLiN | 2.0.5 | GitHub Wiki | Linux & MacOS | Pangenomics |
RGI | 6.0.3 | GitHub | Linux, MacOS & Windows | Resistome annotation |
Python | At least 3.7 | Python Docs | Linux, MacOS & Windows | General-purpose programming language |
You will also need the following Python packages to be available: gudhi, jupyter, matplotlib, networkx, numpy, pandas, plotly, requests, scikit-learn, scipy, and seaborn.
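One way to see which of these packages are already available is to try importing each one. The sketch below assumes a `python3` interpreter on your PATH; the function name `missing_python_packages` and the `PYTHON` variable are illustrative and not part of the workshop materials. Note that scikit-learn is imported under the module name `sklearn`, and that the check for jupyter uses `jupyter_core`, which the jupyter package depends on.

```shell
# Hypothetical helper: print the name of every module that fails to import.
# PYTHON can be overridden to point at a specific interpreter (assumption: python3).
PYTHON="${PYTHON:-python3}"

missing_python_packages() {
    for module in "$@"; do
        # Try the import quietly; report the module only when it fails.
        if ! "$PYTHON" -c "import $module" >/dev/null 2>&1; then
            echo "$module"
        fi
    done
}

# Module names for the packages listed above
# (scikit-learn imports as sklearn; jupyter is checked through jupyter_core).
missing_python_packages gudhi jupyter_core matplotlib networkx numpy pandas \
    plotly requests sklearn scipy seaborn
```

An empty result means every module could be imported; any name printed still needs to be installed.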
While you could install each of these requirements manually, it is a highly laborious process; thus we provide a ready-to-use Docker image with all of these dependencies included. A Docker image is a file used to create a Docker container, which is an encapsulated environment storing everything a project or software needs for it to function. Follow the instructions of the section labeled Option A to learn how to set up a Docker container for our workshop, or read the section Option B if you prefer to install everything by yourself.
Option A: Running the Docker image
- If you haven’t done so already, install Docker Engine or Docker Desktop on your system by following these instructions.
- Next, open a terminal (in Linux and MacOS) or PowerShell (in Windows) in an empty folder, and type the following command to start a container from our image, replacing NAME with any memorable name you wish (the first time you execute this command, it will take a while as it will download the entire image):
  docker run -itv $(pwd):/root --name NAME aapashkov/panworkshop
- You are now connected to the Docker container, and are ready to get started with the workshop! Type exit to leave the container. If you wish to connect back to it, please type the following two commands (replacing NAME with the name you chose in step 2):
  docker start NAME
  docker attach NAME
- To delete a container after you have exited it, run this command (once again, use the same NAME as from step 2):
  docker rm NAME
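The run and start/attach steps above can be folded into one small helper that picks the right command depending on whether the container already exists. This is only a sketch and not part of the workshop image: the function name `panworkshop_command` and the container name `pangenomics` are hypothetical, and the helper prints the command it would use rather than running it, so you can inspect it first.

```shell
# Hypothetical helper: print the Docker command needed to enter the workshop
# container called NAME, creating it only if it does not exist yet.
panworkshop_command() {
    name="$1"
    # `docker ps -a` lists running and stopped containers; --format prints names only.
    if docker ps -a --format '{{.Names}}' 2>/dev/null | grep -Fxq "$name"; then
        # The container exists: start it again and attach to it.
        echo "docker start $name && docker attach $name"
    else
        # First use: create the container from the workshop image.
        echo "docker run -itv \$(pwd):/root --name $name aapashkov/panworkshop"
    fi
}

panworkshop_command pangenomics
```

If Docker is not installed, the lookup fails quietly and the helper falls back to printing the `docker run` command.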
Option B: Installing dependencies manually
Data
The data used in this workshop is available on Zenodo. Please read the Zenodo page linked below for information about the data and access to the data files. Because this workshop works with real data, be aware that file sizes for the data are large.
More information about these data will be presented in the first episode of the Pangenome Analysis in Prokaryotes lesson.
Install a Bash terminal
Windows
- Download the Git for Windows installer. Run the installer and follow the steps below:
- Click on “Next” four times (two times if you’ve previously installed Git). You don’t need to change anything in the information, location, components, and start menu screens.
- Select “Use the nano editor by default” and click on “Next”.
- Keep “Use Git from the Windows Command Prompt” selected and click on “Next”. If you forget to do this, the programs that you need for the workshop will not work properly. If this happens, rerun the installer and select the appropriate option.
- Select “Use bundled OpenSSH” and click on “Next”.
- Select “Use the OpenSSL Library” and click “Next”.
- Keep “Checkout Windows-style, commit Unix-style line endings” selected and click on “Next”.
- Select “Use Windows’ default console window” and click on “Next”.
- Select “Default (fast-forward on merge)” and click on “Next”.
- Select “None” (Do not use a credential helper) and click on “Next”.
- Select “Enable file system caching” and click on “Next”.
- Ignore “Configuring experimental options” and click on “Install”.
- Click on “Install”.
- Click on “Finish”.
- If your “HOME” environment variable is not set (or you don’t know what this is):
- Open command prompt (Open Start Menu, then type cmd and press [Enter])
- Type the following line into the command prompt window exactly as shown:
  setx HOME "%USERPROFILE%"
- Press [Enter], and you should see SUCCESS: Specified value was saved.
- Quit the command prompt by typing exit and then pressing [Enter]
- See the video tutorial for an example of how to install Git on Windows 11.
- An alternative option is to install PuTTY by going to the installation page. For most newer computers, click on putty-64bit-X.XX-installer.msi to download the 64-bit version. If you have an older laptop, you may need to get the 32-bit version putty-X.XX-installer.msi. If you aren’t sure whether you need the 64 or 32-bit version, you can check your laptop version by following the instructions here. Once the installer is downloaded, double-click on it, and PuTTY should install.
- Another alternative option is to use the Windows Subsystem for Linux (WSL). This option is available for Windows 10 and Windows 11; detailed instructions are available here. See the video tutorial for an example of how to install WSL with Ubuntu 22.04 on Windows 11.
macOS
- The default shell in some versions of macOS is Bash, and Bash is available in all versions, so no need to install anything. You access Bash from the Terminal Application (found in /Applications/Utilities). See how to open the terminal in the video tutorial. You may want to keep the terminal in your dock for this workshop.
Linux
- The default shell is usually Bash, and there is usually no need to install anything. To see if your default shell is Bash, type echo $SHELL in a terminal and press the Enter key. If the message printed does not end with /bash, then your default is something else, and you can run Bash by typing bash.
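The check described above can also be scripted. The following is a minimal sketch; the function name `shell_is_bash` is made up for illustration, and it only inspects the path stored in the `SHELL` variable.

```shell
# Hypothetical helper: succeed when the given shell path ends in /bash.
shell_is_bash() {
    case "$1" in
        */bash) return 0 ;;
        *)      return 1 ;;
    esac
}

# SHELL holds the path of your login shell; it may be empty in some environments.
if shell_is_bash "${SHELL:-}"; then
    echo "Your default shell is Bash."
else
    echo "Your default shell is not Bash; type 'bash' to start it."
fi
```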
Install Miniconda3
These instructions assume familiarity with the command line and with installation in general. There are different operating systems and many different versions of operating systems and environments, so these may not work on your computer. If an installation doesn’t work for you, please refer to the user guide for the tool listed in the table above. If you have difficulties with the installations or find better ways to install things in your operating system, please raise an Issue to let us know.
To make a Conda environment, you first need to install Conda. We recommend installing the Miniconda3 version. Miniconda is a minimal installer that includes Conda and its dependencies and simplifies the installation process. Please install Miniconda3 first (installation instructions below) and then proceed to the installation of the environments.
Linux
To install miniconda3, see the video tutorial
MacOSX
In a terminal type:
$ curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
$ bash Miniconda3-latest-MacOSX-x86_64.sh
Then, follow the instructions that you are prompted with on the screen to install Miniconda3.
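The commands above download the installer for Intel (x86_64) Macs. As a hedged sketch, the snippet below picks an installer file name from the architecture reported by `uname -m`. It assumes that an `arm64` installer for Apple Silicon is published under the same URL prefix with the name shown; please confirm this on the Miniconda download page. The function name `miniconda_installer` is hypothetical, and nothing is downloaded.

```shell
# Hypothetical helper: build the installer file name for an architecture
# string as reported by `uname -m` on macOS.
miniconda_installer() {
    case "$1" in
        arm64)  echo "Miniconda3-latest-MacOSX-arm64.sh" ;;   # Apple Silicon (assumed name)
        x86_64) echo "Miniconda3-latest-MacOSX-x86_64.sh" ;;  # Intel, as used above
        *)      echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the download URL for this machine without downloading anything.
if installer="$(miniconda_installer "$(uname -m)")"; then
    echo "https://repo.anaconda.com/miniconda/$installer"
fi
```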
WSL
See the video tutorial, installing Miniconda3 on WSL Ubuntu
Create the virtual environments
First, make sure to set a faster dependency solver, add necessary channels, update conda, and install BLAST into your base environment.
conda config --set solver libmamba
conda config --add channels conda-forge
conda config --add channels bioconda
conda update --quiet --yes --all
conda install --yes --name base blast=2.12
Download every .yml file found in this link somewhere easily accessible. These files are environment definitions for the virtual environments to be used during the workshop. Next, open a terminal inside the directory where you downloaded these files, and create the virtual environments using the following command:
for file in *.yml; do
    conda env create --quiet --yes --file "$file"
done
After this step, if you list your conda environments, you should expect something like the following:
conda env list
# conda environments:
#
base * /miniconda3
Pangenomics_Global /miniconda3/envs/Pangenomics_Global
TDA /miniconda3/envs/TDA
anvio-7.1 /miniconda3/envs/anvio-7.1
ncbi-genome-download /miniconda3/envs/ncbi-genome-download
rgi /miniconda3/envs/rgi
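To confirm that none of the environments in the listing above is missing, you can compare the output of `conda env list` against the expected names. This is a sketch under the assumption that the environment name is the first field of each line, as in the listing; the function name `missing_envs` is hypothetical.

```shell
# Hypothetical helper: read `conda env list` output on standard input and
# print each expected environment name that is absent from it.
missing_envs() {
    listing="$(cat)"
    for env in Pangenomics_Global TDA anvio-7.1 ncbi-genome-download rgi; do
        # The environment name is the first whitespace-separated field of a line.
        if ! printf '%s\n' "$listing" |
            awk -v name="$env" '$1 == name { found = 1 } END { exit !found }'; then
            echo "$env"
        fi
    done
}

# Only query conda when it is installed, so the snippet is safe to run anywhere.
if command -v conda >/dev/null 2>&1; then
    conda env list | missing_envs
fi
```

No output means all five environments were found.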
You should now perform two more steps to get up and running with your environments.
Download and extract databases
Download the panworkshop-databases.tgz file from this Zenodo link and decompress it using the following command:
tar -C / -zxf panworkshop-databases.tgz
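Because the command above extracts relative to the root directory, you may want to look at the archive contents first. The sketch below lists the first entries without extracting anything; the function name `preview_archive` and the default of ten entries are illustrative choices.

```shell
# Hypothetical helper: list the first entries of a gzip-compressed tar archive
# without extracting anything. The second argument limits the number of lines.
preview_archive() {
    tar -tzf "$1" | head -n "${2:-10}"
}

# Only preview when the file is present in the current directory.
if [ -f panworkshop-databases.tgz ]; then
    preview_archive panworkshop-databases.tgz
fi
```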
Test run RGI
Test RGI by running the following commands:
conda activate /miniconda3/envs/rgi/
rgi main --help
If you get an error stating “ImportError: libffi.so.6: cannot open shared
object file: No such file or directory”, you will have to install the libffi6
package on your system. On Ubuntu and derivatives, you may install it as
follows:
wget https://mirrors.kernel.org/ubuntu/pool/main/libf/libffi/libffi6_3.2.1-8_amd64.deb
apt install ./libffi6_3.2.1-8_amd64.deb
rm libffi6_3.2.1-8_amd64.deb
Retry test running RGI; you should not see the same error again.
Connection to JupyterHub in servers from CCM UNAM (Notebooks and Terminal)
User credentials
As stated at the beginning of this page, during a workshop your Instructor will provide you with user credentials for servers from CCM UNAM. If you wish to run the lessons by yourself, or are planning to be an Instructor of this workshop, please contact nselem@matmor.unam.mx to request user credentials for the servers.
Open a Bash Terminal
Click on the button “New” (upper right within the “Files” section) and choose the option “Terminal” from the drop-down menu. A new tab with a terminal will open.
Open a Jupyter Notebook with the TDA environment
Click on the button “New” (upper right within the “Files” section) and choose the option “TDA” from the drop-down menu. A new tab with a notebook will open.
Activating Conda environments
To activate a Conda environment in the Terminal of JupyterHub you need to
specify the absolute path of the environment:
conda activate /miniconda3/envs/my-environment-name
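If you are unsure which environment names are available, you can list the directories under the environments folder and print the matching activation command for each. This is a sketch that assumes the `/miniconda3/envs` prefix shown above; the function name `activation_commands` is hypothetical.

```shell
# Hypothetical helper: print one `conda activate` command per environment
# directory found under the given prefix (default: /miniconda3/envs).
activation_commands() {
    prefix="${1:-/miniconda3/envs}"
    for dir in "$prefix"/*/; do
        # Skip the unexpanded pattern when the prefix has no subdirectories.
        [ -d "$dir" ] || continue
        echo "conda activate ${dir%/}"
    done
}

activation_commands
```

Copy the line for the environment you need and run it in the JupyterHub terminal.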