Applications of genome-scale metabolic reconstructions
Oberhardt et al. (2009): Applications of genome-scale metabolic reconstructions
This review examines "the many uses and future directions of genome-scale metabolic reconstructions" and highlights "trends and opportunities in the field that will make the greatest impact on many fields of biology" ten years after the publication of the first genome-scale metabolic reconstruction, a metabolic model of Haemophilus influenzae (Edwards et al. (1999): Systems properties of the Haemophilus influenzae Rd metabolic genotype).
[T]oday [more than] 50 genome-scale metabolic reconstructions have been published[.] [...] Of all organisms that have been analyzed through a constraint-based metabolic reconstruction, Escherichia coli has gained the most attention as a model organism.
Because a review focusing on E. coli already exists (Feist et al. (2008): The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli), this paper excludes E. coli and focuses on the other organisms.
The studies covered by this review fall into five categories:
(1) contextualization of high-throughput data, (2) guidance of metabolic engineering, (3) directing hypothesis-driven discovery, (4) interrogation of multi-species relationships, and (5) network property discovery[.]
The authors summarize the process of metabolic reconstruction as follows:
First, an initial reconstruction is built from gene-annotation data coupled with information from online databases such as KEGG and EXPASY, which link known genes to functional categories and help bridge the genotype-phenotype gap. Second, the initial reconstruction is curated through an examination of the primary literature. Then, the reconstruction as a knowledge base is converted into a mathematical model that can be analyzed through constraint-based approaches. Third, the reconstruction is validated through comparison of model predictions to phenotypic data. In a final fourth step, a metabolic reconstruction is subjected to continued wet- and dry-lab cycles, which improve accuracy and allow investigation of key hypotheses.
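To make the "converted into a mathematical model" step concrete, here is a minimal sketch. It assumes the standard constraint-based formulation, which the quoted passage does not spell out: the reconstruction is encoded as a stoichiometric matrix S (metabolites by reactions), and a flux vector v is admissible at steady state only if S·v = 0. The three-reaction network and all names in it are hypothetical and chosen only for illustration.

```python
# Hypothetical toy reconstruction: each reaction maps metabolite -> coefficient
# (negative = consumed, positive = produced). Not taken from the review.
reactions = {
    "uptake_A": {"A": 1},                  # -> A
    "convert_A_to_B": {"A": -1, "B": 1},   # A -> B
    "export_B": {"B": -1},                 # B ->
}


def stoichiometric_matrix(reactions):
    """Return (metabolite ids, reaction ids, S) with S[i][j] the coefficient
    of metabolite i in reaction j."""
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    rxn_ids = list(reactions)
    S = [[reactions[r].get(m, 0) for r in rxn_ids] for m in metabolites]
    return metabolites, rxn_ids, S


def is_steady_state(S, v, tol=1e-9):
    """Check the mass-balance constraint S . v = 0 for every metabolite."""
    return all(
        abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v))) <= tol
        for row in S
    )


metabolites, rxn_ids, S = stoichiometric_matrix(reactions)

# Equal flux through all three reactions keeps A and B balanced.
balanced = is_steady_state(S, [2.0, 2.0, 2.0])
# Taking up more A than is converted lets A accumulate, violating S . v = 0.
unbalanced = is_steady_state(S, [2.0, 1.0, 1.0])

print(metabolites, rxn_ids, S, balanced, unbalanced)
```

Constraint-based analyses then search this space of balanced flux vectors, for example by optimizing an objective subject to flux bounds; that optimization step is left out here to keep the sketch dependency-free.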
What data does this process deliver to us? The authors write:
Through gap analysis and subsequent pathway analysis, studies have elucidated both the stoichiometry of certain reactions and the most efficient pathways for production of certain metabolites, and in some cases have even proposed methods for engineering more efficient strains. Also, it is common for reconstruction efforts to provide high-quality estimates of cellular parameters such as growth yield, specific fluxes, P/O ratio, and ATP maintenance costs, and these theoretical values are often used for hypothesis building or validation in biological studies. Several published metabolic reconstruction studies also include in silico predictions for minimal medium design.
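The gap analysis mentioned in the quote can be illustrated with a small sketch. It assumes one simple notion of a gap: a dead-end metabolite that the network only produces or only consumes. It also assumes that all reactions are irreversible. The toy network and the function name are hypothetical; real gap-analysis tools are considerably more elaborate.

```python
def dead_end_metabolites(reactions):
    """Find metabolites that are only produced or only consumed.

    `reactions` maps a reaction id to {metabolite: coefficient}, with
    negative coefficients for substrates and positive ones for products.
    All reactions are treated as irreversible (an assumption of this sketch).
    """
    produced, consumed = set(), set()
    for stoich in reactions.values():
        for met, coeff in stoich.items():
            if coeff > 0:
                produced.add(met)
            elif coeff < 0:
                consumed.add(met)
    return {
        "only_produced": sorted(produced - consumed),
        "only_consumed": sorted(consumed - produced),
    }


# Hypothetical draft network with two deliberate gaps.
toy = {
    "uptake_A": {"A": 1},
    "A_to_B": {"A": -1, "B": 1},
    "B_to_C": {"B": -1, "C": 1},  # nothing consumes C
    "D_to_B": {"D": -1, "B": 1},  # nothing produces D
}

gaps = dead_end_metabolites(toy)
print(gaps)
```

Each reported metabolite points at a place where a reaction or transporter may be missing from the draft, which is the kind of lead that curation against the primary literature can then follow up.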
Which organisms have been reconstructed, and what data has this yielded? The paper provides the following answer:
Metabolic GENREs of prokaryotes encompass an average of 600 metabolites, 650 genes, and 800 reactions, whereas metabolic GENREs of eukaryotes include on average 1200 metabolites, 1000 genes, and 1500 reactions. Excluding the two existing reconstructions of Homo sapiens metabolism lowers the average eukaryotic network size to 800, 800, and 1300, metabolites, genes, and reactions, respectively, a closer but still higher distribution to that of prokaryotes. [...] Existing reconstructions span the domains Eukaryota, Bacteria, and Archaea. The most represented domain is bacteria, with 25 species reconstructed.
Of particular interest to us is the relationship with computational systems biology:
With biology increasingly becoming a data-rich field, an emerging challenge has been determining how to organize, sort, interrelate, and contextualize all of the high-throughput datasets now available. This challenge has motivated the field of top down systems biology, wherein statistical analyses of high-throughput data are used to infer biochemical network structures and functions.
This metabolic data is "often linked with other data types, such as protein expression data, protein-protein interaction data, protein-metabolite interaction data, and physical interaction data." It can also be used for metabolic engineering, which is "the use of recombinant DNA technology to selectively alter cell metabolism and improve a targeted cellular function".
Regarding hypothesis-driven discovery, the authors write:
Gene microarrays serve as a prime example; a traditional hypothesis-driven study might include examination of 1 or 2 genes in a microarray that are of particular interest. This approach would ignore the thousands of other genes on the chip, however, and could miss important information or trends embedded in those data. Therefore, a systematic framework for incorporating genome-scale data available from multiple high-throughput methods would allow hypothesis-driven biology to benefit from the full range of tools available today. Metabolic GENREs represent concise collections of existing hypotheses, and taken together as a broad context they enable systematic identification of new hypotheses that can be tested and resolved. Therefore, they represent a crucial framework for incorporating the flood of biological data now available into the biological discovery process.
Metabolic GENREs intrinsically represent a simplification of cellular function. The distinct biochemical networks categorized by scientists (e.g. metabolism, regulation, and signaling) blend together in a living cell, creating a far more complicated web of interactions than is convenient or possible to model. This web is fundamentally stochastic, and co-habits the cell with many other simultaneous phenomena including transcription and translation, protein modification, cell division, adhesion, motility, and mechanical transduction of external forces. The very simplifications that make metabolic GENREs powerful tools also make them challenging to use for the study of totally unknown or novel phenomena.
On the interrogation of multi-species relationships, the authors write:
A promising direction for computational systems biology is the incorporation of network-level analysis into the field of comparative genomics, which is currently driven by bioinformatics. [...] However, most multi-species analyses reported to date have involved either sub-genome-scale metabolic models or models that have not been carefully annotated. [...] Of the five categories of uses of metabolic GENREs described in this paper, multi-species studies have been represented the least in literature so far. With more genome-scale metabolic models being built and an increased focus on studying multicellular systems, however, we anticipate that this field will see a major increase in activity in the coming years.
Finally, regarding the fifth category, network property discovery, the authors' main point is:
The field of computational systems biology has produced a rich array of methods for network-based analysis, offering tremendous insight into the functioning of metabolic networks. However, many of these methods produce results that can be difficult to link to observable phenotypes. Forging this link poses the greatest challenge toward development of useful network-based tools. For instance, several methods exist to analyze redundancy in metabolic networks. Although these techniques define redundancy intuitively in terms of the number of available paths between a given set of inputs and outputs, relating redundancy to an observable phenotype poses a difficult challenge.
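The intuitive definition of redundancy in the quote, the number of available paths between a given input and output, can be sketched as follows. This is a deliberately simplified stand-in, not one of the methods the review refers to: the network is reduced to a directed graph of metabolites, and only simple paths (no repeated nodes) are counted. The graph and all node names are hypothetical.

```python
def count_simple_paths(graph, source, target, visited=frozenset()):
    """Count paths from source to target that never revisit a node."""
    if source == target:
        return 1
    visited = visited | {source}
    return sum(
        count_simple_paths(graph, nxt, target, visited)
        for nxt in graph.get(source, ())
        if nxt not in visited
    )


def without_node(graph, node):
    """Return a copy of the graph with one node (and its edges) removed,
    mimicking the loss of an intermediate."""
    return {
        u: [v for v in vs if v != node]
        for u, vs in graph.items()
        if u != node
    }


# Hypothetical network: substrate S reaches product P via A, via B,
# or via B and then A.
toy_graph = {
    "S": ["A", "B"],
    "A": ["P"],
    "B": ["P", "A"],
}

n_paths = count_simple_paths(toy_graph, "S", "P")
n_paths_without_B = count_simple_paths(without_node(toy_graph, "B"), "S", "P")
print(n_paths, n_paths_without_B)
```

The count drops when B is removed, yet the number alone says nothing about whether growth or any other observable phenotype changes, which is the difficulty the authors point out.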
Each section of the paper comes with a wealth of examples and references to concrete research projects that illustrate what has been done in the respective area so far.