This lesson is still being designed and assembled (Pre-Alpha version)

Metabolomics workshop

Overview

Teaching: 15 min
Exercises: 75 min
Questions
  • How can I evaluate the similarity between MS spectra?

Objectives
  • Understand how GNPS molecular networks work.

  • Use raw mzML metabolomic data for annotation.

  • Visualize and explore molecular networks.

Introduction

In the previous sections, we explored the biosynthetic potential of various strains by analyzing the BGCs within their genomes. We used BiG-SCAPE to create BGC networks, which compare all the BGCs detected by antiSMASH to determine their relatedness. Similarly, metabolomics lets us investigate the metabolites that selected strains produce in the laboratory. Crude extracts can be prepared to analyze the metabolites of specific strains, but it is also possible to study the metabolites present in complex samples, such as soil, marine environments, host organisms, or stool samples.

There are two approaches to annotating metabolomics data. The first uses the information obtained by LC-MS, comparing the exact mass and UV spectrum of each detected metabolite against natural product databases. The second is molecular networking: the introduction of the Global Natural Products Social Molecular Networking (GNPS) platform (M. Wang et al., 2016) significantly influenced dereplication techniques. Molecular networking groups metabolites into molecular families (MFs), thereby improving the annotation of unknown metabolites.

GNPS output can be visualized directly on the GNPS webpage, or with other visualization tools such as [Cytoscape](https://cytoscape.org/).

Creating a GNPS account

Before starting this tutorial, it will be useful to have a GNPS account. To create one, go to the GNPS webpage and select “Create a new account”.

Create an account in GNPS

Then fill in the following information:

Wait for the confirmation email. Once it arrives, your GNPS account is ready.

Download MZMine v3.9

  1. First, go to the MZMine 3.9 release page.

  2. Select the installer file that matches your computer.

  3. Double-click on the file and install the software.

Download Cytoscape

Cytoscape is software frequently used to visualize networks, such as BGC networks or molecular networks.

  1. Go to the Cytoscape webpage.

  2. Click on the option to download Cytoscape for your operating system.

  3. Install Cytoscape on your computer.

Download the metabolomics dataset

First, we need to go to Zenodo and download all the mzML raw data collected from the described strains. https://zenodo.org/api/records/13352458/files-archive

After downloading the compressed file, we need to decompress it and store the files in a folder on our computer.
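If you prefer to decompress the download from a script, the step above can be sketched in Python as follows. This is a minimal sketch that assumes the download is a zip archive; the function name and the paths in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_and_list_mzml(archive_path, target_dir):
    """Unpack a zip archive and return the sorted names of the mzML files inside."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # rglob also finds files that the archive stored inside subfolders.
    return sorted(path.name for path in target.rglob("*.mzML"))


# Hypothetical usage (adjust both paths to your own computer):
# files = extract_and_list_mzml("files-archive.zip", "metabolomics_data")
# print(len(files), "mzML files found")
```

Counting the files this way is a quick check that the whole dataset was unpacked before it is imported into MZMine.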

Import the dataset, and use the batch file to analyze your data

These data were collected from crude extracts of two marine Streptomyces strains: Streptomyces sp. H-KF8 and Streptomyces sp. Vc74B-19. Two media were used, ISP2 and ISP2 prepared with artificial seawater (ASW), referred to below as ISP2-ASW, to evaluate the effect of replicating the natural environment from which these strains were isolated.

Data collection from *Streptomyces* sp. H-KF8, and *Streptomyces* sp. Vc74B-19.

We downloaded 18 LC-MS/MS-derived files in mzML format. These data were collected by Dr. Mauricio Caraballo-Rodriguez in the Dorrestein Lab at the University of California San Diego. The files cover each strain cultured in ISP2 and in ISP2-ASW, as well as crude extracts of the culture media themselves, all in triplicate: 2 strains × 2 media × 3 replicates = 12 files, plus 2 media blanks × 3 replicates = 6 files.

Besides the mzML files, there is a file named metadata_table.tsv that contains all the relevant information about this dataset. It includes the names of the samples, relevant data collection details, and taxonomic information.

In addition, it holds information relevant to the analysis, such as the names of the strains, the media used for culturing, and the antimicrobial activity. All this information is stored in columns named with the format ATTRIBUTE_*.
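To see which ATTRIBUTE_* columns a metadata table contains, and which values each one takes, a short script can help. The sketch below uses only the Python standard library. The two-row excerpt is invented for illustration: the file names are hypothetical, and only the column names ATTRIBUTE_strain and ATTRIBUTE_media are taken from this lesson.

```python
import csv
import io


def attribute_values(tsv_text):
    """Map each ATTRIBUTE_* column to the sorted list of values it contains."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = [c for c in (reader.fieldnames or []) if c.startswith("ATTRIBUTE_")]
    values = {c: set() for c in columns}
    for row in reader:
        for c in columns:
            values[c].add(row[c])
    return {c: sorted(v) for c, v in values.items()}


# Invented excerpt; the real table has one row per mzML file.
example = (
    "filename\tATTRIBUTE_strain\tATTRIBUTE_media\n"
    "sample_01.mzML\tH-KF8\tISP2\n"
    "sample_02.mzML\tH-KF8\tISP2-ASW\n"
)
print(attribute_values(example))
```

For the real file, the same function can be called on the text read from metadata_table.tsv.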

Finally, there is a file named MZMine_FBMN_batch.xml that collects all the processing steps and parameters needed for the analysis in MZMine.

Analysis using MZMine

Load batch file

Open MZMine 3, click on “Open”, and then on “Batch Mode”.

Here you should select “Load” and search for your downloaded files on your computer. Then select the MZMine_FBMN_batch.xml file. When asked for confirmation, choose to replace the batch steps.

Then double-click on “Import MS data”.

Select the 18 mzML files of this dataset from your computer.

After this, every file should be included in the batch-processing mode. Then select “OK” to start processing the files.

Briefly, MZMine is now detecting all the masses present in your samples, grouping them, and then aligning them across samples, so you can see in which samples each detected feature is present.

Note that the parameters used are specific to this dataset; you might need to change some of the values when analyzing your own samples.

For more information: Nothias, LF., Petras, D., Schmid, R. et al. Feature-based molecular networking in the GNPS analysis environment. Nat Methods 17, 905–908 (2020). https://doi.org/10.1038/s41592-020-0933-6

Explore the structure of our data

In the “MS data files” section of MZMine, you can see all 18 LC-MS/MS files in mzML format that we loaded.

Let’s inspect what two of these files look like. We could select one file from Streptomyces sp. H-KF8 in ISP2 and one in ISP2-ASW. Select both files, then right-click and select “Show chromatograms”.

Here we can select the mass range that we want to observe. Since we want to see all the ions detected, click “Auto range”. It will automatically select masses ranging from 100 m/z to almost 3,749 m/z. Then click “OK”.

The software will display the Total Ion Chromatogram (TIC) of both samples. In this case, strain H-KF8 is displayed in pink when cultured in ISP2-ASW, and in black when cultured in ISP2.
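A TIC is built by summing the intensities of all the ions recorded in each scan and plotting that sum against retention time. The sketch below illustrates only this arithmetic. The data structure and the values are invented for the example, and it does not reproduce how MZMine reads mzML files.

```python
def total_ion_chromatogram(scans):
    """Return (retention_time, summed_intensity) pairs, ordered by retention time.

    `scans` is a list of (retention_time, [(mz, intensity), ...]) tuples.
    """
    tic = [(rt, sum(intensity for _mz, intensity in peaks)) for rt, peaks in scans]
    return sorted(tic)


# Invented scans for illustration: (retention time in minutes, peaks)
scans = [
    (1.00, [(100.0, 10.0), (200.2, 5.0)]),
    (0.50, [(150.1, 3.0)]),
]
print(total_ion_chromatogram(scans))
```

Each pair in the result corresponds to one point of the chromatogram trace.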

We can select a section of the chromatogram to inspect the differences between the metabolomic profiles of these samples.

We can observe that several peaks appear exclusively when strain H-KF8 is cultured in ISP2-ASW.

Analyze the final output from the analysis

After all the files have been processed, we should look at the “Feature lists” tab of MZMine. There we can see a feature list called “Aligned feature list 13C gaps”. Double-click on it.

Here we can observe the feature list, where each row is one detected feature with its m/z and retention time (RT). Each column is one of the 18 samples. If a feature is detected in a sample, the height of its peak is displayed in the table.

Remove media blanks MS spectra

Now we want to remove all the features that come from the culture media and are not produced by our strains.

For that, we need to go to “Feature List Methods”, then click on “Feature List Filtering”, and then on “Feature List Blank Subtraction”.

In the “Blank/Control raw data files” section, we need to select “Specific raw data files”, and then select all the samples that correspond to the crude extracts of the media. There are 6 in total. Press “OK” afterward.

Now we have two feature lists:

  1. Aligned feature list 13C gaps. This is the original feature list, including the media features.
  2. Aligned feature list 13C gaps subtracted. This is the feature list with the media blanks removed.
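The idea behind blank subtraction can be sketched in a few lines of Python. This is a simplified illustration and not the algorithm that MZMine implements: the function name, the `min_blank_detections` parameter, and the sample names are assumptions made for the example, and the real module has its own parameters that are not reproduced here.

```python
def subtract_blanks(features, blank_samples, min_blank_detections=1):
    """Keep only the features detected in fewer than `min_blank_detections` blanks.

    `features` maps a feature ID to {sample_name: peak_height}; a height of 0
    (or a missing sample) means the feature was not detected in that sample.
    """
    kept = {}
    for feature_id, heights in features.items():
        detections = sum(1 for s in blank_samples if heights.get(s, 0) > 0)
        if detections < min_blank_detections:
            kept[feature_id] = heights
    return kept


# Invented feature table with one strain sample and one media blank.
features = {
    1: {"strain_rep1": 5000.0, "media_rep1": 0.0},     # only in the strain
    2: {"strain_rep1": 3000.0, "media_rep1": 2800.0},  # also in the medium
}
print(sorted(subtract_blanks(features, ["media_rep1"])))
```

In this toy table, feature 2 is dropped because it is also detected in the media blank, while feature 1 is kept.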

Export Feature lists in GNPS format

We are going to export both feature lists.

First, select “Aligned feature list 13C gaps”. Then go to “Feature List Methods”, “Export Feature List”, and select “Molecular networking files”.

Then click “Select”, and in “File name” write the name that you want to give your files. In this case, I chose “GM_workshop_Featurelist_complete”, so I know that this file comes from the Latin American genome mining workshop and that the feature list still includes the media blank features. Then press “Save”.

Make sure that in “Filter rows” you select “MS2 or ION IDENTITY”, so only features that have MS2 spectra are selected.

Then, in your selected folder, you should have two files:

  1. GM_workshop_Featurelist_complete_quant.csv

This file is a table of all the features detected in your samples. Again, each row is a feature, and each column is one of the 18 samples.

  2. GM_workshop_Featurelist_complete.mgf

This file contains the MS2 spectrum of each feature: the parent mass in m/z, and the m/z value and peak intensity of each fragment in that spectrum.
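To get a feeling for what an .mgf file holds, it can be read with a few lines of Python. MGF is a plain-text format in which each spectrum sits between a `BEGIN IONS` and an `END IONS` line, with `KEY=VALUE` header lines followed by one "m/z intensity" pair per line. The parser below is a minimal sketch that ignores the less common variants of the format, and the example spectrum is invented.

```python
def parse_mgf(text):
    """Parse MGF text into a list of {"params": {...}, "peaks": [(mz, intensity), ...]}."""
    spectra, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None and line:
            if "=" in line:
                # Header line such as PEPMASS=487.16
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: fragment m/z followed by its intensity
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


# Invented spectrum for illustration.
example = """BEGIN IONS
PEPMASS=487.16
MSLEVEL=2
105.03 1200.0
487.16 560.0
END IONS
"""
for spectrum in parse_mgf(example):
    print(spectrum["params"]["PEPMASS"], len(spectrum["peaks"]))
```

Opening the exported file in a text editor shows the same structure, repeated once per feature.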

Now we need to repeat the export step with the feature list that has the media blanks removed. This time, name the files “GM_workshop_Featurelist_filtered”, so we know that they contain no features that originate from the culture media.

After this, we have 4 files, and we are done with the processing steps in MZMine 3.

Create a molecular network

Go to the GNPS webpage and log in using your username and password.

Then we should go to “Advanced Analysis Tools” and select “Analyze” under Feature Networking.

Then we should write a title for our network. We could use something like “GM_workshop_FBMN_filtered”, because we are going to use the feature list with the media blanks removed.

Following that, in “File Selection”, click on “Select Input File”.

We now need to upload our feature list files and our metadata table. Click on “Upload files”.

Here we can create a folder in our GNPS account. I created a new folder called “LATAM_GM_workshop”. Click on that folder. Then, in the “File Drag and Drop” area, drag and drop the following files:

  1. GM_workshop_Featurelist_filtered.mgf
  2. GM_workshop_Featurelist_filtered_quant.csv
  3. metadata_table.tsv

Return to “Select Input Files” after uploading your files. You should be able to see the three uploaded files in your directory.

We are going to select the “GM_workshop_Featurelist_filtered.mgf” file as the “MS2 file in MGF format”.

Then we are going to select “GM_workshop_Featurelist_filtered_quant.csv” as the “Feature Quantification Table”.

Finally, we are going to select “metadata_table.tsv” as our “Sample Metadata Table”.

In the “Selected Files” section, we should be able to see the three files in their corresponding sections. After checking that everything is correct, click on “Finish Selection”.

After selecting the files, we need to adjust the parameters for our network. Since our data was collected using a high-resolution LC-MS/MS, we could adjust the

Then we need to select our thresholds to create the molecular network.
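To make the role of these thresholds concrete, the sketch below computes a plain cosine score between two MS2 spectra and applies two cutoffs to decide whether the nodes would be connected. It is an illustration under stated assumptions and not the scoring that GNPS implements, which also accounts for the precursor mass difference. The function names, the fragment tolerance, and the default cutoffs (`min_cosine=0.7`, `min_matched_peaks=6`) are placeholder values, not the settings of this workshop.

```python
import math


def cosine_score(spec_a, spec_b, tolerance=0.02):
    """Return (score, matched_peaks) for two spectra given as [(mz, intensity), ...]."""
    # Collect every pair of peaks whose m/z values agree within the tolerance.
    candidates = [
        (int_a * int_b, i, j)
        for i, (mz_a, int_a) in enumerate(spec_a)
        for j, (mz_b, int_b) in enumerate(spec_b)
        if abs(mz_a - mz_b) <= tolerance
    ]
    # Greedy matching: take the largest products first, using each peak once.
    used_a, used_b, dot, matched = set(), set(), 0.0, 0
    for product, i, j in sorted(candidates, reverse=True):
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product
            matched += 1
    norm_a = math.sqrt(sum(inten * inten for _mz, inten in spec_a))
    norm_b = math.sqrt(sum(inten * inten for _mz, inten in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    return dot / (norm_a * norm_b), matched


def connected(spec_a, spec_b, min_cosine=0.7, min_matched_peaks=6):
    """Decide whether two spectra would be linked by an edge in the network."""
    score, matched = cosine_score(spec_a, spec_b)
    return score >= min_cosine and matched >= min_matched_peaks
```

A higher cosine cutoff or a higher number of required matched peaks yields fewer, more reliable edges; lower values yield larger but noisier molecular families.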

After this, enter your email address so you are notified when your network is finished. Then click “Submit”.

For more information about the rest of the parameters used for molecular networking:

Statistical analysis using the FBMN STATS guide web server

Now we want to see how different the metabolic profiles of our samples are. For that, we are going to perform a Principal Coordinate Analysis (PCoA).
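A PCoA starts from a matrix of pairwise dissimilarities between samples and then looks for a few axes that preserve those dissimilarities as well as possible. The sketch below covers only the first step, using the Bray-Curtis dissimilarity as an example. Bray-Curtis is an assumption made for illustration, one commonly used choice; it is not necessarily the metric selected in the web server, and the sample names and intensities are invented.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equally long intensity vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def dissimilarity_matrix(profiles):
    """Return (sample_names, matrix) for a {sample_name: intensities} mapping."""
    names = sorted(profiles)
    matrix = [[bray_curtis(profiles[r], profiles[c]) for c in names] for r in names]
    return names, matrix


# Invented peak heights of three features in three samples.
profiles = {
    "strain_ISP2": [100.0, 0.0, 50.0],
    "strain_ISP2_ASW": [20.0, 80.0, 50.0],
    "media": [0.0, 0.0, 10.0],
}
names, matrix = dissimilarity_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

A value of 0 means that two samples have identical profiles, and a value of 1 means that they share no features.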

Go to the FBMN STATS guide.

Select “Data Preparation”.

In “File origin”, select “Quantification table and metadata files”.

After loading your files, click “Submit Data for Statistics!”.

Check that your data have been properly submitted by confirming that the message “Data preparation was successful!” appears.

Now we should go to the “PERMANOVA & PCoA” section.

Select “Principal Coordinate Analysis”.

Now we can change how our samples are colored. In “attribute for multivariate analysis”, select ATTRIBUTE_media. This will color our samples according to the media (ISP2 or ISP2-ASW).

We can observe that there is a difference between the samples prepared with ISP2 and those prepared with ISP2-ASW.

Now we can color our samples by strain by selecting ATTRIBUTE_strain in “attribute for multivariate analysis”.

We can observe that the metabolic profile of Streptomyces sp. Vc74B-19 is quite different from those of Streptomyces sp. H-KF8 and the culture media.

It is also possible to observe that the metabolomic profile of Streptomyces sp. H-KF8 differs more from the media when it is cultured in ISP2-ASW.

There are several other statistical analyses that you can run with the FBMN STATS guide; you can check the preprint here.

Visualize the network using Cytoscape

We need to check whether our GNPS network has finished processing. You might have received an email; otherwise, go to your GNPS account and click on “Jobs”.

If your network is marked as Done, click on it. Afterward, click on “Direct Cytoscape Preview/Download”.

If your network is not Done yet, we can use a previously computed network that contains the same data that we used in this workshop.

Cytoscape

Click on “Download Cytoscape File” and save it on your computer.

Open the network in Cytoscape. We want to change the style of the network and color the nodes according to the strains. Click on “Style”.

Select the pie chart icon in “Image/Chart”. This will help us see the abundance of each feature across samples, depending on the metadata parameter that we choose.

In this case, we will select the two strains and also the media. Although we removed all the features present in the media, there might still be some nodes detected in some of the media samples.
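The slices of each pie chart reflect how the abundance of a node is distributed among the selected groups. The arithmetic can be sketched as follows. This is an illustration only: Cytoscape draws the slices from the columns that you select, and the function name, sample names, and values below are invented for the example.

```python
def group_fractions(sample_heights, sample_to_group):
    """Return each group's share of a node's summed peak height."""
    totals = {}
    for sample, height in sample_heights.items():
        group = sample_to_group[sample]
        totals[group] = totals.get(group, 0.0) + height
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {group: 0.0 for group in totals}
    return {group: value / grand_total for group, value in totals.items()}


# Invented peak heights of one node in three samples.
heights = {"sample_1": 30.0, "sample_2": 10.0, "sample_3": 60.0}
groups = {"sample_1": "H-KF8", "sample_2": "H-KF8", "sample_3": "Vc74B-19"}
print(group_fractions(heights, groups))
```

A node whose pie chart is almost entirely one color is therefore detected almost exclusively in that group.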

If we click “Apply”, we can observe which nodes are present in each strain and in the media. In this case, light blue marks the features detected in Streptomyces sp. Vc74B-19.

In this molecular family, we can observe some nodes with annotations. Some are similar to urdamycinone B, dehydroxyaquayamycin, and other angucycline-related compounds.

Now we can color the nodes according to the media in which they are detected. Select the pie chart again in “Image/Chart”, and now select ISP2 and ISP2-ASW. This will color ISP2 in red and ISP2-ASW in light blue.

We can observe that some nodes are detected mostly in ISP2. However, several angucycline-related compounds are detected almost exclusively in ISP2-ASW.

References

In the future it will include:

Key Points

  • Data is generated using Liquid Chromatography coupled to a tandem mass spectrometer (LC-MS/MS or MS2).

  • Dereplication is the process of identifying previously known compounds.

  • Molecular networking is a computational method that organizes MS2 data based on spectral similarity, allowing us to infer relationships between chemical structures.

  • Feature-Based Molecular Networking (FBMN) enhances classical molecular networking by integrating relative quantitative data, enabling more robust metabolomics statistical analysis.