Draghici et al. (2007): A systems biology approach for pathway level analysis
The authors describe a systems biology approach which they used to develop "an impact analysis that includes the classical statistics but also considers other crucial factors such as the magnitude of each gene's expression change, their type and position in the given pathways, their interactions, etc." with the aim being "to understand the underlying phenomenon in the context of all complex interactions taking place on various signaling pathways".
The impact analysis is an attempt to a deeper level of statistical analysis, informed by more pathway-specific biology than the existing techniques. On several illustrative data sets, the classical analysis produces both false positives and false negatives, while the impact analysis provides biologically meaningful results. This analysis method has been implemented as a Web-based tool, Pathway-Express, freely available as part of the Onto-Tools.
In 2002, the authors of this paper published "a computerized analysis approach using the Gene Ontology (GO)", which "takes a list of differentially expressed genes and uses a statistical analysis to identify the GO categories (e.g. biological processes, etc.) that are over- or under-represented in the condition under study". According to the authors, more than 20 tools were using this overrepresentation approach (ORA) by the time the 2007 paper was written.
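To make the ORA idea concrete, here is a minimal sketch of an over-representation test. The quoted description only says "a statistical analysis", so the one-sided hypergeometric test used here is an assumption (it is one common choice for ORA, not necessarily the test the authors used). The function name `ora_p_value` and its parameters are hypothetical.

```python
from math import comb


def ora_p_value(total_genes, pathway_genes, de_genes, de_in_pathway):
    """One-sided hypergeometric p-value for over-representation.

    Assumption: differentially expressed (DE) genes are modeled as a random
    draw of size `de_genes` from `total_genes`, of which `pathway_genes`
    belong to the category. Returns P(X >= de_in_pathway).
    """
    upper = min(pathway_genes, de_genes)
    denominator = comb(total_genes, de_genes)
    tail = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, de_genes - k)
        for k in range(de_in_pathway, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical numbers: 1000 measured genes, 40 in the category,
    # 50 DE genes overall, 8 of them in the category.
    print(ora_p_value(1000, 40, 50, 8))
```

Note that the test only counts genes: it sees neither how strongly each gene changed nor where it sits in the pathway.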
An alternative approach, which "considers the distribution of the pathway genes in the entire list of genes and performs a functional class scoring (FCS)", was proposed by Goeman et al. in 2004. The state of the art in this category is considered to be Gene Set Enrichment Analysis (GSEA), which was described in several publications between 2003 and 2005.
However, both of these approaches share a limitation:
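The FCS idea of looking at where the pathway genes fall in the entire ranked gene list can be illustrated with a simplified running-sum score. This is a sketch in the spirit of an unweighted, Kolmogorov-Smirnov-style enrichment score. It is not the full GSEA procedure (no weighting by correlation, no permutation-based significance), and the function name `enrichment_score` is hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list.

    Walks down the list, stepping up for genes in the set and down for
    genes outside it, and returns the running value with the largest
    absolute deviation from zero. Positive scores mean the set is
    concentrated near the top of the list, negative near the bottom.
    """
    gene_set = set(gene_set)
    hits = sum(1 for gene in ranked_genes if gene in gene_set)
    misses = len(ranked_genes) - hits
    if hits == 0 or misses == 0:
        raise ValueError("need at least one gene inside and one outside the set")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in gene_set:
            running += 1.0 / hits
        else:
            running -= 1.0 / misses
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical ranking
    print(enrichment_score(ranked, {"g1", "g2", "g4"}))
```

Unlike ORA, no cutoff for "differentially expressed" is needed, but each gene set is still scored on its own, without reference to pathway structure.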
Both ORA and FCS techniques currently used are limited by the fact that each functional category is analyzed independently without a unifying analysis at a pathway or system level. This approach is not well suited for a systems biology approach that aims to account for system level dependencies and interactions as well as to identify perturbations and modifications at the pathway or organism level.
As might be expected, several solutions to this problem have been proposed, some of which are listed in the paper. However:
The approaches currently available for the analysis of gene signaling networks share a number of important limitations. First, these approaches consider only the set of genes on any given pathway and ignore their position in those pathways. This may be unsatisfactory from a biological point of view. If a pathway is triggered by a single gene product or activated through a single receptor and if that particular protein is not produced, the pathway will be greatly impacted, probably completely shut off. A good example is the insulin pathway. If the insulin receptor (INSR) is not present, the entire pathway is shut off. Conversely, if several genes are involved in a pathway but they only appear somewhere downstream, changes in their expression levels may not affect the given pathway as much.
Second, some genes have multiple functions and are involved in several pathways but with different roles. For instance, the above INSR is also involved in the adherens junction pathway as one of the many receptor protein tyrosine kinases. However, if the expression of INSR changes, this pathway is not likely to be heavily perturbed because INSR is just one of many receptors on this pathway. Once again, all these aspects are not considered by any of the existing approaches.
Also, "the existing analysis approaches consider only the sets of genes involved on these pathways, without taking into consideration their topology".
Therefore, the authors "propose a radically different approach for pathway analysis that attempts to capture all aspects above":
An impact factor (IF) is calculated for each pathway incorporating parameters such as the normalized fold change of the differentially expressed genes, the statistical significance of the set of pathway genes, and the topology of the signaling pathway. We show on a number of real data sets that the intrinsic limitations of the classical analysis produce both false positives and false negatives while the impact analysis provides biologically meaningful results.
The authors then describe how this impact factor is calculated and present some results from using this pathway analysis approach to analyze several data sets. Finally, they conclude:
The impact analysis incorporates the classical probabilistic component but also includes important biological factors that are not captured by the existing techniques: the magnitude of the expression changes of each gene, the position of the differentially expressed genes on the given pathways, the topology of the pathway that describes how these genes interact, and the type of signaling interactions between them. The results obtained on several independent data sets show that the proposed approach is very promising.
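The ingredients named in this conclusion can be sketched in code. This summary does not reproduce the formula itself, so the form used here is an assumption and should be checked against the paper: a probabilistic term log(1/p) plus a sum of per-gene perturbation factors, where each gene's perturbation is its own expression change plus the signed perturbation inherited from its upstream genes, split across each upstream gene's downstream targets. All function and parameter names are hypothetical, and the sketch assumes an acyclic pathway.

```python
from math import log


def perturbation_factors(delta_e, edges):
    """Propagate expression changes down an acyclic pathway.

    delta_e: dict mapping gene -> signed expression change (0.0 if unchanged).
    edges:   list of (upstream, downstream, beta), with beta = +1 for
             activation and -1 for inhibition. All genes must appear in delta_e.
    Assumed form: PF(g) = delta_e[g] + sum(beta * PF(u) / n_downstream(u)).
    """
    upstream = {gene: [] for gene in delta_e}
    n_downstream = {gene: 0 for gene in delta_e}
    for source, target, beta in edges:
        upstream[target].append((source, beta))
        n_downstream[source] += 1

    pf = {}

    def visit(gene):
        # Memoized recursion; terminates only if the pathway has no cycles.
        if gene not in pf:
            pf[gene] = delta_e[gene] + sum(
                beta * visit(u) / n_downstream[u] for u, beta in upstream[gene]
            )
        return pf[gene]

    for gene in delta_e:
        visit(gene)
    return pf


def impact_factor(p_value, delta_e, edges, mean_abs_delta_e):
    """Assumed form: log(1/p) + sum|PF(g)| / (mean |delta E| * number of DE genes)."""
    pf = perturbation_factors(delta_e, edges)
    n_de = sum(1 for change in delta_e.values() if change != 0)
    if n_de == 0:
        return log(1.0 / p_value)
    total = sum(abs(value) for value in pf.values())
    return log(1.0 / p_value) + total / (mean_abs_delta_e * n_de)


if __name__ == "__main__":
    # Toy chain: receptor R activates A, which activates B.
    chain = [("R", "A", 1), ("A", "B", 1)]
    receptor_changed = {"R": 2.0, "A": 0.0, "B": 0.0}
    downstream_changed = {"R": 0.0, "A": 0.0, "B": 2.0}
    print(impact_factor(0.05, receptor_changed, chain, 2.0))
    print(impact_factor(0.05, downstream_changed, chain, 2.0))
```

In this toy chain, the same expression change yields a larger score when it hits the receptor than when it hits the last gene downstream, which mirrors the insulin receptor argument quoted earlier. A count-based analysis would score both cases identically.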