Computational Prediction of Proteotypic Peptides for Quantitative Proteomics
Mallick et al. (2006)
The authors empirically identified more than 16,000 proteotypic peptides, whose properties "were used to develop a computational tool that can predict proteotypic peptides for any protein from any organism".
Possible applications of proteotypic peptides include validation of protein identifications, absolute quantification of proteins, annotation of coding sequences in genomes, and characterization of the physical principles governing key elements of mass spectrometric workflows (e.g., digestion, chromatography, ionization and fragmentation).
The authors' research is based on the assumption that "estimates of protein identification probabilities and quantification may be based on the detection of one or a few preferentially observed, or proteotypic, peptides".
Although proteotypic peptides may be collected empirically in databases such as PeptideAtlas, GPM, SBEAMS and PRIDE [15-18], such collections are typically not available for many recently sequenced genomes, including some bacteria and archaea that are important model organisms in systems biology. Therefore, it is desirable to be able to predict proteotypic peptides for any protein from any organism whether or not these have previously been reported. [...] We reasoned that prediction of proteotypic peptides might be achieved by first determining the physicochemical peptide properties that distinguish proteotypic peptides from less frequently or unobserved peptides. Towards this goal, we extracted the proteotypic peptides from four large and well-curated archives of yeast proteomic data representing four of the commonly used proteomic platforms (Supplementary Table 1 online) and containing >600,000 peptide identifications covering 4,030 distinct proteins (61% of the yeast proteome, which contained 6,604 proteins as of this writing, http://www.yeastgenome.org ). A peptide was classified as proteotypic if observed in >50% of all identifications of the corresponding protein (for lists of proteins and proteotypic peptides, see Supplementary Tables 2 and 3 online). [...] Having discovered the most discriminating physicochemical properties for each experimental platform, we applied these predictors to score the proteotypic propensity of each protein's theoretical tryptic peptides.
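The two mechanical steps mentioned in this passage, generating a protein's theoretical tryptic peptides and applying the >50% rule, can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that each "identification" of a protein is represented as the set of peptides observed for it in one experiment, and it uses a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages). The function names, the example sequence and the example observations are hypothetical.

```python
from collections import Counter


def tryptic_peptides(sequence):
    """Simplified in silico trypsin digest (assumption: cleave after K or R,
    but not before P; missed cleavages are ignored)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def proteotypic_peptides(identifications, threshold=0.5):
    """Return peptides seen in more than `threshold` of a protein's identifications.

    `identifications` is a list of collections of peptide sequences, one per
    identification of the protein (an assumed data layout, not the paper's).
    """
    if not identifications:
        return set()
    # Count each peptide at most once per identification.
    counts = Counter(pep for observed in identifications for pep in set(observed))
    total = len(identifications)
    return {pep for pep, n in counts.items() if n / total > threshold}


# Hypothetical toy protein and four hypothetical identifications of it.
candidates = tryptic_peptides("MKAARPLKGG")
observations = [{"AARPLK", "MK"}, {"AARPLK"}, {"AARPLK", "GG"}, {"MK"}]
frequent = proteotypic_peptides(observations)
```

In this toy case only the peptide seen in three of the four identifications passes the threshold; a peptide seen in exactly half of them does not, because the rule is strictly greater than 50%. The predictors described in the paper go further and score unobserved candidates from their physicochemical properties, which this sketch does not attempt.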
The authors conclude:
The excellent performance of our predictors opens up the exciting possibility that these can be applied universally to any protein of any species of origin as long as DNA sequence information for this protein is available. Predictors derived from physicochemical properties of the constituent amino acids of proteotypic peptides should enable rational selection of synthetic peptides for absolute protein quantification as well as improve label-free quantification algorithms that rely on correlating spectra of observed peptides with protein quantity (Supplementary Results online). We are making the predictors publicly available and anticipate that the community will find other creative uses for proteotypic peptides.