
Welcome to NOODAI!
The NOODAI software offers a method to collectively analyze multiple omics datasets by integrating them into a unified framework. As input, NOODAI takes the biological entities (i.e. genes/proteins/small molecules) found in at least one omics dataset comparing two conditions. In a nutshell, the analysis steps for each comparison and omics dataset are:
- Merging all genes/proteins/small molecules into a joint protein-protein and protein-small molecule interaction network.
- Computing multiple centrality scores for each node, highlighting the most important proteins/small molecules.
- Decomposing the network into modules, where each module consists of upregulated or downregulated omics elements belonging to a specific signaling pathway. These subnetworks characterize a specific biological function, in their entirety defining the analyzed conditions.
How to cite
Totu, Tiberiu; Riudavets Puig, Rafael; Häuser, Lukas Jonathan; Tomasoni, Mattia; Bolck, Hella Anna; Buljan, Marija.
NOODAI: A webserver for network-oriented multi-omics data analysis and integration pipeline.
https://doi.org/10.1101/2024.11.08.622488
Getting started
- Upload your data as a zipped folder containing Excel files.
- Each Excel file should represent a different omics layer.
- Create a different sheet within the file for each condition comparison.
- Include a column with representative UniProt IDs (i.e. significant gene sets) or ChEBI IDs.
- Download the Demo Dataset to understand formatting requirements, sheet names (compared conditions), and column names (containing UniProt or ChEBI IDs).
- In the “Condition comparison (contrasts)” field, specify the conditions you are comparing.
- Example:
DISEASEvsHEALTHY
- These must match exactly the sheet names in your Excel files.
- Run the NOODAI pipeline. The default configuration performs a non-weighted MONET network analysis with STRING, BioGRID and IntAct knowledge-base interaction datasets, and REACTOME-based pathway analysis. Optionally, provide your email address to get notified when the results are ready.
- Save the shown Results job ID.
- Go to the NOODAI results page and provide the Results ID. There you can follow the analysis progress and download the results when they are ready. When the analyses are complete you can choose conditions you wish to examine on the left “Selector” panel and then assess on the right the most central network elements together with connections among them visualised as a circos plot, as well as major pathways found enriched in the largest network modules. Information is also provided on the entities that constitute circos plots or individual modules.
- By providing your email, you get notified when the analysis is complete and the result files can be easily downloaded from the link provided within the email. The results folder contains output Excel files that list centrality values for all analysed entities, provide information on the extracted network modules and describe biological roles of most central regulatory proteins (kinases and transcription factors). Please check the “Results_Interpretation” summary there!
Checklist: Was This Analysis Suitable for Your Data?
Ask yourself the following:
- Do most central entities point to biologically meaningful processes?
- Are there modules with 10 or more members identified?
- Are reported functional enrichments biologically meaningful?
- Does network integration bring added value?
Note: Output files also include results for individual omics layers, so you can compare them with the integrated analysis. For instance, a higher average node betweenness value in the integrated analysis can point to a more structured architecture of the integrated network, with a higher capacity for information flow centralized around key regulatory nodes.
Overview
The NOODAI algorithm allows the collective analysis of multiple omics datasets by integrating them into a unified framework. The pipeline starts from lists of upregulated (or downregulated) proteins, genes, and/or small molecules extracted from different omics analyses, organized in separate Excel tables containing multiple sheets corresponding to different comparisons of interest (contrasts).
For each contrast, the algorithm is organized into 4 analysis segments as follows:
- Segment I: all proteins, genes, and/or small molecules from the different omics analyses are merged into a unified interaction network using filtered knowledge from the STRING, BioGRID, and IntAct databases. After the network is built, NOODAI computes several centrality metrics for each node (e.g. each protein), with particular emphasis on the current-flow betweenness centrality. These centrality scores measure the importance of each node in the context of the overall network.
- Segment II: the full-size network is decomposed into subnetworks (also known as modules) using the MONET decomposition tool. In theory, each module is expected to be related to a specific biological function. Such expectation can be checked by evaluating how well the members of the subnetwork associate with a particular signaling pathway.
- Segment III: the associated signaling pathways are extracted for each module containing more than 10 members. Subsequently, their over-representation against a specified pathway database is evaluated relative to a selected background.
- Segment IV: three types of outputs are created:
  - Circular diagrams: the plots highlight the most important transcription factors and their connecting proteins in the full-size network. Proteins are marked according to the module in which they are found. “Most important” is defined by default as being in the top 30% of most central nodes sorted by the current-flow betweenness centrality.
  - Enriched pathways: the top 3 pathways (based on FDR levels) for the first 5 modules (based on their size) are represented. Notably, these pathways may overlap, and for publication purposes it is recommended to recreate the plot so that it emphasizes unique pathways supported by a strong biological rationale.
  - Summary report: the report highlights the most important and robust nodes, pathways, transcription factors and kinases. All the definitions and the entire folder structure are presented in the summary report.
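The default "most important" cutoff used for the circular diagrams can be sketched as follows. This is only an illustration, not NOODAI's implementation: the function name `top_central_nodes`, the toy centrality values, and the rounding rule (round the cutoff up) are assumptions made for the example.

```python
def top_central_nodes(scores, percent=30):
    """Return the names of the top `percent`% of nodes, ranked by score.

    `scores` maps node name -> centrality value (for instance the
    current-flow betweenness). Rounding the cutoff up is an assumption
    of this sketch.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = -(-len(ranked) * percent // 100)  # integer ceiling division
    return ranked[:keep]

# Hypothetical centrality values for ten nodes.
example_scores = {
    "STAT1": 0.42, "STAT3": 0.38, "MAPK1": 0.31, "JUN": 0.22, "FOS": 0.18,
    "AKT1": 0.15, "TP53": 0.12, "EGFR": 0.09, "MYC": 0.05, "SRC": 0.02,
}
top_nodes = top_central_nodes(example_scores)
```

With ten nodes and the default 30% cutoff, three nodes are kept.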
Web interface
You can run your NOODAI analysis by navigating to the Run NOODAI tab on the side panel. There, the analysis is divided into three main panels:
- Run the full pipeline: this is the main tab where you can run the entire analysis pipeline with a few input parameters.
- Custom algorithms: in this tab, each segment of the analysis pipeline can be customized depending on the needs. Requires a thorough understanding of the analysis pipeline!
- Results download: in this tab you can download your analysis results using a unique identifier that was given to you after submitting an analysis request.
Input description
Below we describe the required input parameters to be able to run NOODAI:
- Condition comparison (contrasts): as the analysis is based on the upregulated/downregulated elements when comparing one condition to another, such a comparison is termed here as a “contrast” (ex. the comparison of M1 to M2a has two contrasts: M1vsM2a contrast corresponding to the upregulated elements of M1 compared to M2a and M2avsM1 corresponding to the upregulated elements of M2a compared to M1). Each contrast should match an Excel sheet with an identical name in the tables from the uploaded ZIP archive. Please enter the contrasts separated by commas (ex: M1vsM2a,M2avsM1,M1vsM2c).
- Omics files archive: the web server accepts as input an archive that contains one Excel table for each of the omics profiles of interest. Each table should have at least one sheet, named after a unique analyzed contrast (ex: M1vsM2a), containing the upregulated elements of the first condition compared to the second one (ex: the M1vsM2a sheet in the Proteomics.xlsx file contains the upregulated proteins found in M1 when compared to M2a, considering a log2FC threshold of 1 and an FDR threshold of 0.05). In every sheet, the column that lists the upregulated proteins/small molecules MUST be named UniProt_ChEBI, and its entries MUST be UniProt or ChEBI IDs. All tables must contain the same number of sheets with the same names, even if some sheets are empty! All sheets that are to be analyzed must contain the UniProt_ChEBI column.
Example of table format:
UniProt_ChEBI Optional Optional
P42224 STAT1 0.05
P40763 STAT3 0.05
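Before uploading, it can help to check that every contrast you plan to enter has a sheet of the same name in each table. The sketch below shows one way to do this in plain Python. The helper names and the `Transcriptomics.xlsx` file are hypothetical, and the sheet names are typed in by hand here rather than read from real Excel files.

```python
def parse_contrasts(field):
    """Split the comma-separated contrast field into a list of names."""
    return [name.strip() for name in field.split(",") if name.strip()]

def missing_sheets(contrasts, sheets_by_file):
    """Return {file name: [contrasts that have no sheet of the same name]}."""
    problems = {}
    for file_name, sheets in sheets_by_file.items():
        absent = [c for c in contrasts if c not in sheets]
        if absent:
            problems[file_name] = absent
    return problems

contrasts = parse_contrasts("M1vsM2a,M2avsM1,M1vsM2c")

# Hypothetical sheet names, as they might be listed for two uploaded tables.
sheets_by_file = {
    "Proteomics.xlsx": ["M1vsM2a", "M2avsM1", "M1vsM2c"],
    "Transcriptomics.xlsx": ["M1vsM2a", "M2avsM1"],
}
problems = missing_sheets(contrasts, sheets_by_file)
```

Here `problems` reports that the second table lacks an `M1vsM2c` sheet. According to the rules above, that sheet should be added even if it stays empty.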
Optional inputs (all have default values):
Besides the fields described in the Mandatory input fields description section, all fields are optional and will use default values when not specified. These fields are described below:
- Weight factor: if you want to analyze weighted interaction networks and have weights available for your input features, you can include an additional column named Weight in the Excel tables. This column should contain log fold changes or another relevant metric for your input features. This approach applies to any of the input omics layers or contrasts. In this case, edge weights will be computed using the NetWalk approach, and some centrality measures (excluding current-flow betweenness centrality) will incorporate these weights. The MONET decomposition tool will also take the edge weights into account. The weight parameter represents the NetWalk restart probability and should be a value greater than 0 but smaller than 1. By default, it is set to 0.1.
- MONET method: the MONET method parameters must be specified following the MONET tool instructions. The format must remain the same as the pre-loaded one. The available MONET methods are M1, K1 and R1. By default, the networks are undirected, with a desired average node degree of 10 in the output.
- Pathway databases: used to specify which pathway databases to use to find the over-represented pathways associated with each identified MONET module. Possible databases are Reactome, Wikipathways, BioCarta, PID, NetPath, HumanCyc, INOH and SMPDB. Note that licensing restrictions may apply to some databases, and you are responsible for compliance. The developers assume no liability in the event of a legal dispute. Currently, Reactome and Wikipathways are released under public-domain (CC0) licenses and can be used freely.
- BioMart dataset: the conversion of input UniProt IDs to other identifiers is facilitated through the BioMart database service. The pre-selected dataset is the one corresponding to humans. If you are using data from other organisms, please provide the correct identifier. Some example identifiers are:
  - Mus musculus: mmusculus_gene_ensembl
  - Bos taurus: btaurus_gene_ensembl
  - Caenorhabditis elegans: celegans_gene_ensembl
  - Canis familiaris: clfamiliaris_gene_ensembl
  - Danio rerio: drerio_gene_ensembl
  - Drosophila melanogaster: dmelanogaster_gene_ensembl
  - Gallus gallus: ggallus_gene_ensembl
  - Rattus norvegicus: rnorvegicus_gene_ensembl
  - Saccharomyces cerevisiae: scerevisiae_gene_ensembl
  - Xenopus tropicalis: xtropicalis_gene_ensembl
  - Sus scrofa: sscrofa_gene_ensembl
  - Schizosaccharomyces pombe: spombe_gene_ensembl
If you have data from an organism other than the pre-loaded ones described above, you are required to provide a pre-formatted interaction table, as well as pathway, TF and kinome databases, depending on the analysis segment you are interested in.
- Interaction table file: activating the Add custom databases button highlights the interaction database fields, which depend on the option selected in the Use Pre-compiled Interaction file field. Default protein-protein interaction databases are already loaded on the server. However, due to technical limitations, all databases are already filtered to include only the protein-protein and protein-small molecule interactions of the selected species. The pre-formatted interaction file that you can upload has the following format: 2 columns named Interactor1 and Interactor2, followed by lines with 2 proteins/small molecules per line, given as NCBI or ChEBI IDs. It must be a tab-separated text file. YOU MUST provide this table if you have data from organisms other than the selected ones. Formatting example:
  Interactor1  Interactor2
  10421        23020
  10755        4646
- BioGRID database file: one of the protein-protein interaction sources is the BioGRID database. The available version is 4.4.218 in mitab format, filtered to include only entries with a confidence score. If you would like to use another version, you can upload it here. The database will be filtered automatically for humans!
- STRING database file: another protein-protein interaction database is STRING. The version available on the server is 11.5, containing the complete interaction data from all sources and filtered to include only entries with a combined score above 0.7. The server copy is filtered a priori for the interactions of the selected organisms. A database that you provide will not be filtered, so please filter it a priori for your organism of interest. Do not upload the entire database, as it is too big.
- IntAct database file: the IntAct database is also used for the analysis. The psimitab from 13/07/2022 is already loaded and filtered to keep only interactions with a confidence score above 0.7 for the selected organisms. If you choose to upload another database, please filter it a priori.
- Interaction table file: the protein-protein interactions from the BioGRID, STRING and IntAct databases, filtered as above, and the small molecule interactions from IntAct are merged into a final pre-compiled interaction table.
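A custom interaction table in the two-column, tab-separated layout described above can be produced with Python's standard `csv` module. This is a minimal sketch: the function name `write_interaction_table` is made up for the example, and the identifiers are the ones from the formatting example.

```python
import csv
import io

def write_interaction_table(pairs, handle):
    """Write (Interactor1, Interactor2) pairs as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Interactor1", "Interactor2"])
    for first, second in pairs:
        writer.writerow([first, second])

# NCBI (or ChEBI) identifiers, one interaction per pair.
pairs = [("10421", "23020"), ("10755", "4646")]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_interaction_table(pairs, buffer)
table_text = buffer.getvalue()
```

To write a real file instead, pass a handle opened with `open(path, "w", newline="")`.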
Running the pipeline with the demo dataset
Within the Run the full pipeline tab, you can run an analysis using Demo data by clicking on the Demo button.
The Demo dataset consists of proteomics, phosphoproteomics and transcriptomics measurements of 3 macrophage phenotypes derived from primary cells: M1, M2a and M2c (Original publication). In the paper, differential expression analysis (DEA) was performed for all possible comparisons. Each comparison is denoted as a string following the convention Condition1vsCondition2. For example, the upregulated elements of M1 compared to M2a from the DEA are denoted as M1vsM2a.
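The Condition1vsCondition2 naming convention can be generated programmatically. The short sketch below lists every ordered comparison of the three demo phenotypes. The helper name `all_contrasts` is hypothetical and is not part of NOODAI.

```python
from itertools import permutations

def all_contrasts(conditions):
    """Return every ordered pair of conditions as 'Condition1vsCondition2'."""
    return [f"{first}vs{second}" for first, second in permutations(conditions, 2)]

demo_contrasts = all_contrasts(["M1", "M2a", "M2c"])

# Comma-separated form, as expected by the contrasts input field.
contrast_field = ",".join(demo_contrasts)
```

Three conditions give six contrasts, because each pair is listed in both directions (for example M1vsM2a and M2avsM1).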
The Demo dataset can be downloaded from the NOODAI platform using the Download the Demo data button. It includes an archive containing formatted Excel table examples (Uploaded_Data_Archive.zip) and a text file specifying the setup values for the fields in the NOODAI platform.
Analysis submission and results download tab
After providing the input data, you can submit an analysis request using the Submit button. If there are no evident errors related to the data formatting, the output panel containing your analysis ID will appear. Please make sure to save this results folder ID, as it cannot be retrieved later. If you provide an email address in the Email address (Optional) field, you will receive an email notification once the analysis is finished.
To download your results, navigate to the Results download tab and enter your results folder ID in the Results directory index field. Finally, use the button on the right-hand side to initiate the download of your results. From this tab, you can also download a pre-compiled version of the Demo dataset.
Custom algorithms tab
The options in this tab are tailored for advanced users who possess a thorough understanding of all analysis steps. Users have the possibility to submit an initial analysis or, for an existing analysis, redo specific segments of the analysis pipeline, thereby changing the original results. To ensure full flexibility, error monitoring is kept to a minimum.
To run any code segment, please provide inputs for all non-transparent fields. Below is an explanation of the fields that were not covered under the Run the full pipeline section:
- Results directory index: provide the folder associated with your results in which you would like to redo some analysis. The original analysis for the respective segment will be discarded. If no folder is provided, a new one will be created.
- DTU file id: in this field, you can specify the name of the omics table that contains the alternative splicing DTU results. The NOODAI algorithm was designed to include splicing data as an input omics type. Compared to proteomics or transcriptomics data, differentially used transcripts (DTU) extracted from analyzing spliced data have some particular characteristics. Firstly, DTU hits often form large, well-connected networks that are disconnected from any proteomics or transcriptomics hits. To address this, NOODAI restricts the analysis of networks composed of more than 75% DTU hits, ensuring the meaningfulness of the resulting networks. Moreover, DTU hits are the same for symmetric contrasts (M1vsM2a is equivalent to M2avsM1). However, to keep the networks meaningful, common elements between DTU and other analyses are merged only for that specific contrast. Example: If MAPK1 is identified as alternatively spliced between M1 and M2a and is also found to be upregulated in proteomics for M1, it will be retained only for M1. To achieve this, the uploaded data must have values for all the contrasts symmetrically (for both M1vsM2a and M2avsM1). You can upload any omics data meeting these characteristics (the entries are the same for both M1vsM2a and M2avsM1) as splicing data.
- Edge file path: by default, the names used for the nodes during the MONET decomposition and pathway analysis are sourced from the Symbol folder as gene names. If you prefer to use UniProt IDs instead of gene names, you can specify the folder here.
- Temporary folder: this is the temporary folder for the MONET analysis. By default, it is deleted after MONET results are successfully copied to their default location. If you run the interface locally, you can specify your desired temporary folder location.
- MONET path: this is the path to the MONET executable. Change only if you run the analysis locally!
- MONET background file: by default, the over-representation analysis of members associated with each MONET cluster uses a background consisting of all proteins/small molecules from the full-size network, stored in Background_total.xlsx. If you prefer to use a different background, you can load an Excel table formatted similarly to Background_total.xlsx (no header, only NCBI IDs, and each sheet representing a contrast).
- CPDB database file: the signaling pathway knowledge is sourced mainly from CPDB-provided files and Reactome. If you prefer to utilize different databases or newer versions, please supply an updated CPDB pathway database or a file with a similar format. You must provide this table if you have species other than the pre-loaded ones and wish to perform this analysis segment.
- Edge files directory: this must be consistent with the Edge file path. By default, the circular plots and the final summary report use gene names. If you prefer to use UniProt IDs, you can switch to the Uniprot folder (formatted identically to the Edge file path).
- TF database: if you wish to utilize your custom transcription factor dataset, you can upload it here. Ensure it follows the same format as the dataset available on GitHub (Databases/_TF with the mandatory column Symbol). You must provide this table if you have species other than the pre-loaded ones and wish to perform this analysis segment. Formatting example:
  Species  Symbol  Family
  9606     PAX8    PAX
  9606     STAT3   STAT
- Centralities file: by default, circular diagrams are generated for the combination of all omics datasets. However, because every omics dataset is also analyzed individually, you can create circular plots and pathway plots for individual omics datasets. By uploading your own file, you can customize the input and thereby the final plots. This is particularly beneficial if the representation of the top 30% most central nodes does not align with your data.
- Kinome Database: in the summary report, kinases are identified using a pre-compiled database. If you wish to highlight elements other than kinases, you have the option to change this database to one of your preference. Ensure that the format of the uploaded text file matches that of the one provided on GitHub (Databases/kinome.txt with mandatory columns Uniprot and Gene_name). You must provide this table if you have species other than the pre-loaded ones and wish to perform this analysis segment.
- File_ending: by default, circular diagrams, pathway plots, and summary reports are generated using the network composed of all omics data (files ending with Total). If you prefer to generate plots only for a specific omics dataset, you can change the file ending here to match the one corresponding to your input omics file name.
When changing the parameters in the Custom algorithms tab, ensure consistency by updating all relevant fields! For example, do not change the File_ending without uploading another Centralities file!
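The symmetry requirement for splicing (DTU) data described under DTU file id can be checked before upload. The sketch below assumes the sheets of the DTU table have already been read into a dictionary of contrast name to identifiers. The function names are hypothetical, and the UniProt IDs are only example entries.

```python
def mirror(contrast):
    """Turn 'AvsB' into 'BvsA' (assumes condition names do not contain 'vs')."""
    first, second = contrast.split("vs", 1)
    return f"{second}vs{first}"

def asymmetric_contrasts(sheets):
    """Return contrasts whose mirrored sheet is missing or lists different entries.

    `sheets` maps contrast name -> iterable of identifiers.
    """
    bad = []
    for contrast, entries in sheets.items():
        other = sheets.get(mirror(contrast))
        if other is None or set(other) != set(entries):
            bad.append(contrast)
    return sorted(bad)

# Example DTU sheets: M1vsM2a and M2avsM1 match, M1vsM2c has no mirror sheet.
dtu_sheets = {
    "M1vsM2a": ["P28482", "P42224"],
    "M2avsM1": ["P42224", "P28482"],
    "M1vsM2c": ["P40763"],
}
issues = asymmetric_contrasts(dtu_sheets)
```

An empty result means every contrast has a mirrored sheet with the same entries.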
Purpose of the Custom algorithms:
- Integrate splicing data or data with similar properties with other omics measurements.
- Run MONET using another algorithm (such as R1 or K1) without waiting for the computation of all centralities.
- Provide a different background for the pathway over-representation analysis.
- Provide a different signaling pathways knowledge database for the pathway over-representation analysis.
- Instead of highlighting the kinases in the summary report, you can leverage the algorithm and replace the Kinome database with one containing elements you wish to search and highlight. For instance, you can search for important phosphatases by providing a phosphatase database structured similarly to the Kinome database.
- Modify the centrality file used for generating the circular diagram plot to emphasize different hits.
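To see why the choice of background matters for the pathway over-representation analysis, consider the sketch below. It uses a one-sided hypergeometric test, which is a common choice for over-representation analysis. This is an assumption: the text above does not state which test NOODAI applies, and no FDR correction is included here. All identifiers are invented, and the toy module is smaller than the module sizes NOODAI analyzes.

```python
from math import comb

def hypergeom_pvalue(hits, module_size, pathway_size, background_size):
    """P(X >= hits) when drawing module_size items from the background."""
    upper = min(module_size, pathway_size)
    total = comb(background_size, module_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, module_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

def enrichment_pvalue(module, pathway, background):
    """Over-representation p-value of `pathway` members within `module`.

    Both sets are first restricted to the chosen background.
    """
    background = set(background)
    module = set(module) & background
    pathway = set(pathway) & background
    return hypergeom_pvalue(
        len(module & pathway), len(module), len(pathway), len(background)
    )

# Invented identifiers: 20 background entries, a 5-member pathway, a 4-member module.
background = [str(i) for i in range(1, 21)]
pathway = {"1", "2", "3", "4", "5"}
module = {"1", "2", "3", "10"}
p_value = enrichment_pvalue(module, pathway, background)
```

Enlarging or shrinking `background` changes `background_size` and therefore the p-value, which is why a custom background file can alter which pathways appear enriched.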
Miscellaneous
NOODAI relies mainly on protein-protein and protein-small molecule interaction networks. Certain omics data, such as transcriptomics, can be used directly to construct such a network. However, some omics measurements (such as miRNAs) are not inherently linked to proteins/small molecules, which makes a protein-protein interaction network uninformative for them. These should be mapped to either a ChEBI ID or a UniProt ID.
Small molecule interactions are taken from IntAct. If you want to use another interaction source, you should provide it as a custom-made “Interaction File”.
If you analyze species other than the pre-loaded ones, you must provide all database files, and you can use the platform only for protein-protein interaction networks (due to ID conversion restrictions from BiomaRt).
NOODAI version 2.0.0 relies on STRING v11.5, BioGRID v4.4.218, IntAct v245, Ensembl BioMart v114 and Reactome v90. Alternative knowledge databases can be provided by the user.