Web Portal on Computational Biology


Systems Biology, Proteomics, and the Future of Health Care: Toward Predictive, Preventative, and Personalized Medicine

Weston et al. (2004): Systems Biology, Proteomics, and the Future of Health Care: Toward Predictive, Preventative, and Personalized Medicine

This paper is about the "paradigm changes in health care" that the authors expect to happen soon and that will lead to "predictive medicine":

We predict that a paradigm shift in medicine will take place within the next two decades replacing the current approach, which is predominantly reactive, to one that can increasingly predict and prevent cellular dysfunction and disease. Within the next 10-15 years, a predictive medicine will emerge, capable of determining a probabilistic, individualized future health history. [...] [P]redictive medicine will involve analyzing the individual genome for disease-susceptibilities and following pathogenic environmental exposures by multiparameter blood analyses.

According to the authors, these paradigm changes will be driven primarily by innovations in systems biology. Accordingly, at the beginning of their paper, they provide a definition of systems biology:

Systems biology is the analysis of the relationships among the elements in a system in response to genetic or environmental perturbations, with the goal of understanding the system or the emergent properties of the system. [...] A biological system may encompass molecules, cells, organs, individuals, or even ecosystems. [...] One of the major challenges of systems biology is to determine the architecture of protein and gene regulatory networks and to understand how their behaviors are integrated to carry out biological functions. [...] [T]o do systems biology, as many levels of information as possible must be gathered and integrated. [...] In summary, systems biology is hypothesis-driven, in that systems approaches always begin with a model (descriptive, graphical or mathematical) and the model is tested with hypotheses that require systems perturbations and the gathering of dynamic global data sets. Different data types are integrated and compared against the model. At each turn of the hypothesis-driven process, the model is reformulated. This process is continued until the experimental data and the model are brought into juxtaposition.

What effects will systems biology have on clinical medicine? The authors name two:

First, systems biology will continually improve our capacity to understand and model biological systems on a more global and in-depth scale than ever before. [...] The second major impact of systems biology in medicine will be the continual spawning of new technologies, which will enhance the efficiency, scale and precision with which cellular measurements are made.

As examples of such new technologies, microfluidics and nanotechnology are mentioned.

The importance of systems biology for future healthcare is stressed with the following arguments:

[T]he behaviors of most biological systems, including those affected in cancer, cannot be attributed to a single molecule or pathway, rather they emerge as a result of interactions at multiple levels, and among many cellular components. [...] Understanding the design principles of biomodules and protein and gene regulatory networks during normal physiology and disease will lead to more rationalized and efficacious treatment strategies, as the actual nodal points or direct underlying causes of diseases will be pinpointed.

Two Examples

In the next two sections, the authors outline the application of systems biology to two model systems: galactose utilization in yeast and endomesoderm specification in the sea urchin.

Regarding galactose utilization in yeast, the authors write:

The systems biology approach has provided a wealth of new information even for the relatively simple system whereby yeast utilize galactose as a carbon source - a system that has been intensely studied for decades and which represents one of the best-characterized systems of gene regulation. [...] Until recently, many have regarded galactose utilization as a simple regulatory network. [...] Further studies, however, have established additional regulatory roles for [various] events[.] [...] All of these events take place during galactose induction. Despite these additional insights, however, it was not until the galactose system was interrogated using a large-scale, systems biology approach, that the complexity of this system and its interconnections with other cellular functions became apparent.

What new insights did the systems biology study of galactose utilization provide?

The systems biology study of galactose utilization provided a number of new insights. First, this was the earliest study to report, on a global level, a poor correlation between changes in mRNA levels and changes in protein expression. This suggests that posttranscriptional regulatory mechanisms are important for changing patterns of protein expression. Second, it was demonstrated, unequivocally, that although the galactose pathway itself involves a well-characterized transcriptional network controlling the genes required for galactose utilization, the cellular response to galactose extends well beyond the activation of these genes.

In the course of these studies, various methods were used, such as "microarray expression analysis, genome-wide binding analysis, the use of search algorithms on a defined list of sequences, and comparative genomics". By integrating all of these with "computational approaches", the researchers generated "accurate models of gene modules", "in which the targets of a transcription factor are defined, as are the cis-elements to which these factors bind".
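How binding, expression and sequence data might be combined to define the targets of a transcription factor can be illustrated with a minimal Python sketch. It assumes three hypothetical inputs per gene: a p-value from a genome-wide binding assay, a log2 expression change after perturbing the factor, and a promoter sequence. A gene is called a target only if all three lines of evidence agree. The motif is the commonly cited yeast Gal4 consensus CGG-N11-CCG; the cutoffs, gene names and sequences are illustrative assumptions, not values from the studies described.

```python
import re

# Assumed Gal4 consensus CGG-N11-CCG. Its reverse complement has the same
# CGG...CCG form, so scanning one strand is enough for this motif.
GAL4_SITE = re.compile(r"CGG[ACGT]{11}CCG")


def infer_targets(binding_p, expression_lfc, promoters,
                  p_cutoff=0.001, lfc_cutoff=1.0, motif=GAL4_SITE):
    """Return {gene: motif start position} for genes supported by all three data types."""
    targets = {}
    for gene, p_value in binding_p.items():
        bound = p_value < p_cutoff                                      # binding evidence
        responsive = abs(expression_lfc.get(gene, 0.0)) >= lfc_cutoff  # expression evidence
        site = motif.search(promoters.get(gene, ""))                    # sequence evidence
        if bound and responsive and site:
            targets[gene] = site.start()
    return targets


# Hypothetical toy inputs (not real measurements).
binding_p = {"gene_a": 1e-5, "gene_b": 1e-4, "gene_c": 0.2}
expression_lfc = {"gene_a": 3.2, "gene_b": 0.1, "gene_c": 2.5}
promoters = {
    "gene_a": "TTCGGAGGACTGTCCTCCGAA",
    "gene_b": "TTCGGAGGACTGTCCTCCGAA",
    "gene_c": "ATATATATATATATATATAT",
}
print(infer_targets(binding_p, expression_lfc, promoters))
```

Here gene_b is bound but not responsive and gene_c is responsive but not bound, so only gene_a is reported, together with the position of its putative cis-element.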

While "[t]he analysis of the galactose utilization system in yeast displays a systems approach to understanding a simple physiological response[,] [t]he studies carried out by Eric Davidson and colleagues, to understand endomesoderm specification in sea urchin larva, demonstrate the power of a systems approach to understanding developmental processes".

Davidson and co-workers have extensively analyzed the regulatory gene network underlying endomesodermal specification in sea urchin embryos. In one approach, they focused on the cis-regulatory system of the developmentally regulated endo 16 genes - a marker of endoderm cell fate specification. [...] In addition, Davidson and colleagues constructed a gene regulatory network for endomesodermal development in the larva.

What conclusions could be drawn?

First, there appear to be a variety of subcircuits similar to those found in engineering (feedforward, feed-backward, positive feedback loops, negative feedback loops, etc). [...] Second, the network is designed to move development forward inexorably, in keeping with the fact that development is, under most conditions, irreversible. Finally, a careful examination of the network suggests perturbations that may change fundamental emergent properties of the system. Indeed, one such perturbation has been carried out to generate a larva with two guts.
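Recurring subcircuits such as the feed-forward loop can be found in a regulatory network by a simple graph search. In a feed-forward loop, a regulator X controls Y, and X and Y both control a common target Z. The following Python sketch lists every such triple in a directed network given as (regulator, target) edges. The node names are hypothetical, and the sketch ignores whether an edge activates or represses, which a real analysis of the sea urchin network would have to take into account.

```python
def feed_forward_loops(edges):
    """Return sorted (x, y, z) triples with edges x->y, y->z and x->z."""
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # skip self-regulation
            for z in targets.get(y, ()):
                # z must be distinct from x and y and also directly regulated by x
                if z not in (x, y) and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Hypothetical toy network: A regulates B and C, B regulates C, C regulates D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(feed_forward_loops(edges))
```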

The significance of all of this is that "these model systems are providing fundamental new strategies for thinking about drug and drug target discovery".

Proteomics

Next, the authors talk about proteomics. Their motivation for doing so is as follows:

[P]roteins are the actual effectors driving cell behavior, and they cannot be studied simply by looking at the genes or mRNAs that encode them, thus warranting the establishment of a field, now termed proteomics, devoted entirely to their study.

Regarding the goal of proteomics research, they write:

The goal of proteomics research is to understand the expression and function of proteins on a global level. More than simply cataloguing the proteomes (a quantitative assessment of the full complement of proteins within cells), the field of proteomics strives to characterize protein structure and function, protein-protein, protein-nucleic acid, protein-lipid, and enzyme-substrate interactions, post-translational modifications, protein processing and folding, protein activation, cellular and sub-cellular localization, protein turnover and synthesis rates, and even alternative isoforms caused by differential splicing and promoter usage. In addition, the ability to capture and compare all of this information between two cellular states is essential for understanding cellular responses.

Two mass-spectrometry-based approaches to "global quantitative protein profiling" are introduced:

The more established and most widespread method uses high-resolution, two-dimensional electrophoresis (2DE) to separate proteins from two different samples in parallel, followed by staining and selection of differentially expressed proteins to be identified by mass spectrometry. [...] A second quantitative approach, which is gaining in popularity, uses stable isotope tags to differentially label proteins from two different complex mixtures. In this method, proteins within a complex mixture are first labeled isotopically then digested to yield labeled peptides. The two differentially labeled peptide mixtures are then combined, peptides separated by multidimensional liquid chromatography (LC) and analyzed by tandem mass spectrometry. Peptides are identified by automated database searches, and relative protein abundances are obtained from the mass spectra.
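In the isotope-tag approach, the same peptide from the two samples appears in the mass spectrum as a light and a heavy peak, and the ratio of their intensities gives its relative abundance. The following Python sketch shows only the final step: turning already-paired peptide intensities into one ratio per protein. Summarizing with the median of peptide ratios is an assumption (one common, outlier-tolerant choice), and the protein identifiers and intensities are hypothetical; peak detection and pairing are omitted.

```python
from collections import defaultdict
from statistics import median


def protein_ratios(peptide_measurements):
    """Map protein id -> median heavy/light ratio of its quantified peptides.

    peptide_measurements: iterable of (protein_id, light_intensity, heavy_intensity),
    where light and heavy come from the two differentially labeled samples.
    """
    per_protein = defaultdict(list)
    for protein, light, heavy in peptide_measurements:
        if light > 0 and heavy > 0:  # skip peptides not seen in both samples
            per_protein[protein].append(heavy / light)
    return {protein: median(ratios) for protein, ratios in per_protein.items()}


# Hypothetical paired peak intensities (arbitrary units).
measurements = [
    ("P1", 100.0, 200.0),
    ("P1", 50.0, 100.0),
    ("P1", 10.0, 40.0),   # an outlying peptide ratio, damped by the median
    ("P2", 80.0, 40.0),
    ("P3", 0.0, 30.0),    # missing light peak, so P3 is not quantified
]
print(protein_ratios(measurements))
```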

The authors further write that "[t]he identification of biomarkers is an area in which proteomics will undoubtedly have a significant impact - a prospect that has not gone unnoticed by the proteomics community".

The authors raise two concerns about traditional biomarkers:

First, of the biomarkers routinely used to diagnose disease, most are capable of detecting the onset or advanced progression of disease, but have little, if any, predictive power. [...] The second concern with respect to the use of single molecule biomarkers is that it is based on the expectation that an increase in the concentration of a single protein can unambiguously specify diseases, a dangerous and unrealistic assumption. Diseases are characterized by heterogeneity between individuals; the same disease can be initiated by numerous factors and can cause a range of molecular changes.

For these reasons, the authors want to replace traditional biomarkers with multiparameter analyses, aided by systems biology:

Just as normal physiology and disease arise from protein and gene regulatory networks, normal and perturbed, and these require analyses of all the elements in the system, diagnostics will also require the analysis of multicomponents to reflect the true complexity of the disease process. Moreover, multiparameter analyses will be able to (1) predict the onset of disease, (2) stratify disease (e.g., prostate cancer is probably three or four diseases and not just a single one), (3) indicate the progression of the disease, (4) follow the course of treatment, and (5) make predictions about the effectiveness of a drug or adverse reactions, etc. By this view, multiparameter analyses of the serum or blood will provide a window into health and disease.

This will eventually lead to a new way of diagnosing diseases, by means of "serum proteome patterns":

The concept behind pattern diagnostics is that the blood plasma proteome reflects tissue and organ pathology, causing patterns of protein changes that have diagnostic potential without even knowing the identities of the individual proteins. Since MS-based approaches provide a pattern of peaks, the idea is that these patterns can discriminate certain diseases.

The authors describe a study intended to demonstrate that this principle works:

In the first proof-of-principle study, a new computer-based artificial intelligence algorithm was used to identify patterns among a training set of mass spectral data[.] [...] The algorithm generated a proteomic pattern that was then used to identify ovarian cancer in individuals from a second independent group[.] [...] [T]his technique can be used alongside any number of other indicators such as genetic defects, or histopathological findings, to make more accurate diagnoses.
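The summary does not describe how the study's algorithm works, so the following Python sketch is only a stand-in for the general idea: learn a pattern from labeled training spectra, then apply it to an independent test set. It uses a nearest-centroid classifier on peak-intensity vectors, which is not the algorithm used in the study. Each spectrum is assumed to have already been reduced to intensities at the same few m/z positions; all values and labels are hypothetical.

```python
import math


def centroid(vectors):
    """Element-wise mean of equally long intensity vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(spectra, labels):
    """Learn one mean peak pattern (centroid) per class label."""
    groups = {}
    for spectrum, label in zip(spectra, labels):
        groups.setdefault(label, []).append(spectrum)
    return {label: centroid(vectors) for label, vectors in groups.items()}


def classify(model, spectrum):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(spectrum, model[label]))


def accuracy(model, spectra, labels):
    """Fraction of spectra in an independent set that are classified correctly."""
    correct = sum(classify(model, s) == label for s, label in zip(spectra, labels))
    return correct / len(labels)


# Hypothetical intensities at three m/z positions.
train_spectra = [[1.0, 0.2, 0.5], [0.9, 0.3, 0.4], [0.3, 1.1, 0.5], [0.2, 0.9, 0.6]]
train_labels = ["control", "control", "cancer", "cancer"]
test_spectra = [[0.95, 0.25, 0.5], [0.25, 1.0, 0.55]]
test_labels = ["control", "cancer"]

model = train(train_spectra, train_labels)
print(accuracy(model, test_spectra, test_labels))
```

Note that nothing in this procedure requires knowing which proteins produce the peaks, which is exactly the property that makes pattern diagnostics both attractive and contentious.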

Again, there are some concerns about this technique:

Some researchers contest that [this technique] is not sensitive enough, and captures only high abundance proteins, and therefore is not suitable for measuring true cancer biomarkers. Of equal concern is the reproducibility of the technique. [...] In addition to these concerns, the concept of using a pattern of MS peaks to diagnose disease without knowing the identities of the proteins responsible for those peaks is a foreign one and a major point of contention for many researchers.
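Reproducibility of a peak pattern is often checked by measuring the same sample several times and computing each peak's coefficient of variation (CV, the standard deviation divided by the mean). The following Python sketch does this; the intensities are hypothetical, and the 20% cutoff used to flag unreliable peaks is an arbitrary assumption rather than an accepted standard.

```python
from statistics import mean, stdev


def peak_cvs(replicate_runs):
    """Per-peak coefficient of variation (sample stdev / mean) across replicate runs.

    replicate_runs: list of runs; each run lists intensities of the same peaks
    in the same order.
    """
    return [stdev(values) / mean(values) for values in zip(*replicate_runs)]


def unreliable_peaks(replicate_runs, max_cv=0.2):
    """Indices of peaks whose CV exceeds max_cv (hypothetical cutoff)."""
    return [i for i, cv in enumerate(peak_cvs(replicate_runs)) if cv > max_cv]


# Hypothetical intensities for three peaks in three replicate runs.
runs = [
    [10.0, 100.0, 40.0],
    [10.0, 110.0, 15.0],
    [10.0, 90.0, 60.0],
]
print(peak_cvs(runs))
print(unreliable_peaks(runs))
```

A diagnostic pattern that depends on peaks like the third one would be hard to reproduce across runs or laboratories.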

The authors conclude that we need "to obtain more data on this approach to evaluate its predictive power".

The next section deals with protein chips:

[T]he goal behind protein microarrays is to print thousands of protein-detecting features, for the interrogation of biological samples. An example is antibody arrays (also referred to as protein profiling arrays), in which a host of different antibodies (e.g., monoclonal, polyclonal, antibody fragments) are arrayed to detect their respective antigens from a sample of human blood. [...] [T]he implementation of protein arrays is a much greater challenge than DNA arrays for a number of reasons. Proteins are inherently much more difficult to work with than DNA, their solubility varies widely, they have a broad dynamic range, they are much less stable than DNA, and their structure can be difficult to preserve on a glass slide, but is essential for most assays (unlike DNA, in which only the sequence order needs to be maintained). Finally, there is no technique, analogous to PCR, that exists for amplifying proteins, and thus the starting material is much more of a limiting factor.

Then, reverse phase microarrays are described, which are "particularly useful for profiling the status of cellular signaling molecules, or post-translational modifications, among a cross-section of tissue that includes both normal and cancerous cells".

This method can track all kinds of molecular events and can compare diseased and healthy tissues within the same patient, enabling the development of individualized diagnosis and treatment strategies. The ability to acquire proteomic snapshots of neighboring cell populations, using multiplexed reverse phase microarrays in conjunction with LCM [laser capture microdissection], will have applications in a number of areas beyond the study of tumors. The approach can provide insights into normal physiology and pathology of all tissues, and will be invaluable for characterizing developmental processes and anomalies. It should be emphasized, however, that beyond reverse phase microarrays, the marriage of LCM with any refined proteomics platform offers great promise for extracting information from pure cell populations, in turn decreasing some of the limitations imposed by tissue heterogeneity.

Next, the authors write about emerging trends in proteomics, and among other things, they state:

Advances in quantitative proteomics would clearly enable more in-depth analyses of cellular systems. However, for many cellular events, protein concentrations likely do not change significantly, rather their function is modulated by posttranslational modifications (PTMs). Over 400 PTMs have been described, many with important influences on cell function. Methods of monitoring PTMs are sorely needed in proteomics, but to date, this remains an underdeveloped area.
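Mass spectrometry detects many PTMs as characteristic mass shifts relative to the unmodified peptide. The following Python sketch matches an observed mass difference against a small selection of monoisotopic shifts; the table is a hypothetical subset chosen for illustration, whereas real PTM searches consider hundreds of modifications, combinations of them, and the modified residue's position.

```python
# Monoisotopic mass shifts (Da) for a few common modifications (illustrative subset).
PTM_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "trimethylation": 42.04695,
    "oxidation": 15.99491,
    "methylation": 14.01565,
}


def explain_mass_shift(observed_mass, unmodified_mass, tol=0.01):
    """Names of PTMs whose shift matches observed - unmodified within tol (Da)."""
    delta = observed_mass - unmodified_mass
    return [name for name, shift in PTM_SHIFTS.items() if abs(delta - shift) <= tol]


print(explain_mass_shift(1079.9663, 1000.0))            # a phosphorylation-sized shift
print(explain_mass_shift(1042.0106, 1000.0))            # high mass accuracy
print(explain_mass_shift(1042.0106, 1000.0, tol=0.05))  # lower mass accuracy
```

Acetylation and trimethylation differ by only about 0.036 Da, so whether they can be told apart depends directly on the instrument's mass accuracy, one reason PTM monitoring remains technically demanding.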

According to the authors, one of the main goals in proteomics is "characterizing the human plasma proteome" because the blood should "contain information on the physiological state of all tissues in the body". The authors discuss some of the obstacles in this endeavour.

Further, they write about micro- and nanoscale technology:

Developing any technology intended for clinical use will require the miniaturization, integration and automation of the procedures for sample analyses. This in turn will lead to more sensitive and cost-effective analyses. [...] Biological systems are made up of individual molecules operating on a nanoscale, whereas current tools used in medicine are much larger and thus inadequate for fully characterizing cellular function at the molecular level.

Quantum dots, microfluidics, microcantilevers and nanowire sensors are given as examples of these technologies. The authors conclude this section with the following words:

With these devices, one can eventually imagine analyzing 100s, 1000s, or even 10,000s blood elements. In addition, we predict that individuals will have their genomes sequenced relatively inexpensively within the next 10-15 years, making it possible to provide each individual with a probabilistic future health history. Thus, the predictive medicine will assess the digital information of the genome and the pathological cues of the environment. Another area for which nanotechnology has an application is that of drug delivery systems. It is conceivable that in the future, drugs will be delivered to specific targets in the body via biodegradable devices. Implantable biosensors can also be foreseen, which can monitor sugar levels in the cells of diabetics, and release insulin as needed, resulting in much more precise control of blood sugar levels than is currently attainable in diabetics. Finally, an exciting possibility is the use of microrobots and probes, which can target and destroy tumors.

Then comes a section on bioinformatics for proteomics, in which the database search programs SEQUEST, MASCOT, ProFound, MS-Tag and Sonar are mentioned, along with PeptideProphet, a tool for statistically validating the peptide assignments such programs produce.

These programs are used to determine the amino acid sequence, and thus the protein(s), corresponding to a given mass spectrum, but in many cases they generate a large number of incorrect assignments. Improvements to the current capabilities for tandem-MS identification are continually being developed.
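The programs named above differ in how they score matches, but the core idea of tandem-MS database searching can be sketched: keep database peptides whose predicted precursor m/z matches the measured one, predict their b- and y-fragment ions, and rank them by how many predicted fragments appear in the observed spectrum. The following Python sketch uses a simple shared-peak count, not the scoring of SEQUEST, MASCOT or any other tool; the tolerances and peptide sequences are hypothetical.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def precursor_mz(peptide, charge):
    """m/z of the protonated, unmodified peptide at the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    ions = []
    for i in range(1, len(peptide)):
        b_ion = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y_ion = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        ions.extend((b_ion, y_ion))
    return ions


def shared_peak_score(peptide, observed_peaks, tol=0.5):
    """Number of predicted fragments matched by some observed peak within tol."""
    return sum(any(abs(ion - peak) <= tol for peak in observed_peaks)
               for ion in fragment_ions(peptide))


def search(database, observed_mz, charge, observed_peaks,
           precursor_tol=1.0, fragment_tol=0.5):
    """Return (score, peptide) pairs for precursor-matching peptides, best first."""
    candidates = [pep for pep in database
                  if abs(precursor_mz(pep, charge) - observed_mz) <= precursor_tol]
    scored = [(shared_peak_score(pep, observed_peaks, fragment_tol), pep)
              for pep in candidates]
    return sorted(scored, reverse=True)


# Hypothetical example: a simulated spectrum of PEPTIDE searched against a tiny database.
database = ["PEPTIDE", "PEPTLDE", "SAMPLER"]
spectrum = fragment_ions("PEPTIDE")
print(search(database, precursor_mz("PEPTIDE", 1), 1, spectrum))
```

Even this idealized case returns a tie: leucine and isoleucine have identical masses, so PEPTIDE and PEPTLDE cannot be distinguished, a small illustration of why search results need careful validation.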

What is also needed, according to the authors, is a unified database format.

Fortunately, the Human Proteome Organization (HUPO), which was formed to coordinate worldwide proteomic efforts, has taken on this challenge and through the Proteomic Standards Initiative (PSI) group, established in 2002, is developing a common data standard which will enable users to retrieve data from different sites and perform comparative analyses of different data sets. [...] In a similar manner, a standard format has been adapted by the Microarray Gene Expression Data Society (MGEDS) for depositing microarray expression data. As a result, MAGE-ML (MicroArray Gene Expression Markup Language) was designed to describe and communicate information about microarray experiments, incorporating the principles outlined by an earlier standard, MIAME (Minimum Information About a Microarray Experiment). [...] Parenthetically, we are attempting to develop a database at the Institute for Systems Biology, (Systems Biology Expression and Management system, or SBEAMs), that will be able to acquire all relevant types of global data sets (DNA, RNA, proteins, interactions, phenotypic data, etc.) and begin to do the integrations that are an essential part of systems biology.

The next step is computational integration:

The goal of cataloguing all of the cellular elements under various conditions and in various organisms is well underway, and becoming increasingly possible as global technologies mature. The next phase is to understand how these elements are coordinated to form functional biological systems. Systems-level integration of data is still in its infancy, but a number of new concepts have emerged. [...] The second benefit of data integration is that it serves to reveal new biological phenomena, which would not be readily apparent from any single analysis. [...] The ultimate goal is to characterize the information flow through protein networks that interconnect the extracellular microenvironment with the control specified by gene regulatory networks which, in turn, activate the peripheral batteries of genes to execute the effector functions of development and physiological responses. To successfully understand the interfacing of these protein and gene regulatory networks will require, ultimately, the integrations of many of the different data types arising from DNA, RNA, protein, metabolites, small molecules, and many different aspects of phenotype.

Summary

Finally, the authors summarize the benefits systems biology will bring to the future of medicine.

In conclusion, the emerging fields of systems biology and proteomics offer exciting and promising advances toward predictive, preventative, and personalized medicine. [...] Understanding protein and gene regulatory networks of biological systems will improve drug development efforts and eventually will lead to preventive drugs. [...] Proteomics will play a major role both in developing better multiparameter diagnostics and in the search for new therapeutic targets. [...] Integrating different types of biological information will be critical both for understanding biological systems and for accurately diagnosing and monitoring disease.

Contact: cdvolko (at) gmail (dot) com

Imprint: This website is owned by Claus Volko, Vienna, Austria. No liability is taken for the contents of any of the linked websites. http://www.cdvolko.net/