Advancing bioinformatics methods for in-depth proteome analysis based on high-resolution mass spectrometry
Mass spectrometry-based shotgun proteomics has become one of the essential techniques for comprehensive studies of living systems. Owing to the inherent complexity of proteomes and of the resulting data, bioinformatics plays a critical role in translating mass spectra into biological information and knowledge. As high-resolution mass analyzers become more widely available, computational strategies for processing shotgun proteomics data need to be adapted to exploit the advantages of modern instruments. This thesis presents five constituent papers that illustrate methodological advances in analyzing shotgun proteomics data generated by high-resolution mass spectrometry. Paper-I describes the DeMix workflow for protein identification, in which we broke with an old paradigm of tandem mass spectrometry by turning unwanted co-fragmentation events into an advantage: natural multiplexing. DeMix simplifies the data processing procedure and significantly improves protein identification outcomes. Paper-III describes a label-free extension of the DeMix workflow, termed DeMix-Q, which uses the quantitative features of extracted ion chromatograms (XICs) to reliably propagate peptide identifications across LC-MS/MS experiments. DeMix-Q improves the reproducibility of peptide quantification by addressing the missing-value problem caused by data-dependent acquisition of MS/MS spectra. Based on these results, the concept of quantification-centered proteomics has been proposed. Within the practice of quantification-centered proteomics, Paper-V describes a flexible proteome summarization approach termed Diffacto, which uses the covariation of peptide abundances to improve the relative quantification of proteins. Diffacto offers automatic quality control that removes inconsistent and unreliable quantitative data on peptides.
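The covariation idea behind Diffacto can be illustrated with a deliberately simplified sketch. This is not the Diffacto algorithm, whose statistical model is described in Paper-V and is not reproduced here. Instead, it assumes a hypothetical scheme: each peptide's log2 intensity profile is centered across samples and compared with the sample-wise median profile of all peptides of the same protein. Each peptide is then weighted by its Pearson correlation with that reference and discarded when the weight falls below an assumed threshold (`min_weight`). The function names, the threshold and the example intensities are illustrative only.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def summarize_protein(peptides, min_weight=0.5):
    """Hypothetical covariation-weighted protein summary (NOT the published Diffacto model).

    peptides: dict mapping peptide -> list of positive intensities, one per sample.
    Returns (relative log2 protein profile or None, dict of peptide weights).
    """
    # log2-transform each peptide and center it on its own mean across samples
    centered = {}
    for pep, values in peptides.items():
        logs = [math.log2(v) for v in values]
        m = mean(logs)
        centered[pep] = [x - m for x in logs]

    n_samples = len(next(iter(centered.values())))
    # reference profile: sample-wise median of the centered peptide profiles
    reference = [median(p[i] for p in centered.values()) for i in range(n_samples)]

    # weight = correlation with the reference; anti-correlated peptides get 0
    weights = {pep: max(0.0, pearson(p, reference)) for pep, p in centered.items()}
    kept = {pep: w for pep, w in weights.items() if w >= min_weight}
    if not kept:
        return None, weights  # no consistent peptides: protein not quantified

    total = sum(kept.values())
    profile = [sum(w * centered[pep][i] for pep, w in kept.items()) / total
               for i in range(n_samples)]
    return profile, weights


peptides = {"A": [100, 200, 400], "B": [50, 100, 200], "C": [300, 150, 300]}
profile, weights = summarize_protein(peptides)
print(profile, weights)
```

In this toy example, peptides A and B double between consecutive samples and receive weights close to 1. Peptide C does not follow that pattern, so it gets a weight near 0 and is excluded. The summarized log2 profile therefore reports a fourfold change between the first and last sample. Weighting by agreement with the other peptides, rather than simply averaging, is what lets inconsistent peptides be filtered out automatically.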
The combination of a weighted summarization method and efficient FDR estimation substantially enhances data utility for large-scale comparative proteomics. In Paper-II, an improved pI estimation method was introduced for a novel sample fractionation device based on isoelectric focusing. In Paper-IV and Paper-V, applications of peptide de novo sequencing are demonstrated for analyzing complex proteomes in the absence of reference databases.
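For context on the pI estimation mentioned above, the sketch below shows the generic baseline calculation that such methods build on. The improved method of Paper-II is not described here and is not reproduced. The sketch computes a peptide's net charge from a Henderson–Hasselbalch model and finds the pH at which that charge is zero by bisection. The pKa values are an assumed, illustrative set. Published pKa sets differ, and the resulting pI estimates differ with them.

```python
# Illustrative pKa values (assumed); other published sets give different pI estimates.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch model)."""
    counts = {aa: sequence.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1  # one free amino terminus
    counts["Cterm"] = 1  # one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """pH at which the net charge is zero, found by bisection.

    Net charge decreases monotonically with pH, so the zero crossing is unique.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


for peptide in ("DDDDE", "ACDEFGHIK", "KKKKR"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

Acidic peptides such as `DDDDE` land at low pI and basic ones such as `KKKKR` at high pI. In an isoelectric focusing fractionator, this predicted value is what lets each identified peptide be checked against the fraction it was recovered from.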
List of scientific papers
I. Zhang, B., Pirmoradian, M., Chernobrovkin, A., and Zubarev, R. A. (2014). DeMix Workflow for Efficient Identification of Cofragmented Peptides in High Resolution Data-dependent Tandem Mass Spectrometry. Mol Cell Proteomics. 13:11–17.
https://doi.org/10.1074/mcp.O114.038877
II. Pirmoradian, M., Zhang, B., Chingin, K., Astorga-Wells, J., and Zubarev, R. A. (2014). Membrane-assisted isoelectric focusing device as a micropreparative fractionator for two-dimensional shotgun proteomics. Anal Chem. 86:5728–5732.
https://doi.org/10.1021/ac404180e
III. Zhang, B., Käll, L., and Zubarev, R. A. (2016). DeMix-Q: Quantification-centered Data Processing Workflow. Mol Cell Proteomics. 15:1467–1478.
https://doi.org/10.1074/mcp.O115.055475
IV. Lundström, S. L., Zhang, B., Rutishauser, D., Aarsland, D., and Zubarev, R. A. (2017). SpotLight Proteomics: uncovering the hidden blood proteome improves diagnostic power of proteomics. Sci Rep. 7:41929.
https://doi.org/10.1038/srep41929
V. Zhang, B., Pirmoradian, M., Zubarev, R. A., and Käll, L. (2017). Covariation of Peptide Abundances Accurately Reflects Protein Concentration Differences. Mol Cell Proteomics. 2017 Mar 16.
https://doi.org/10.1074/mcp.O117.067728
Defence date: 2017-05-12
Department: Department of Medical Biochemistry and Biophysics
Publisher/Institution: Karolinska Institutet
Main supervisor: Zubarev, Roman
Co-supervisors: Sonnhammer, Erik; Käll, Lukas
Publication year: 2017
Thesis type: Doctoral thesis
ISBN: 978-91-7676-617-0
Number of supporting papers: 5
Language: eng