Protein Identification Algorithms Developed from Statistical Analysis of MS/MS Fragmentation Patterns

Persistent Link:
http://hdl.handle.net/10150/242432
Title:
Protein Identification Algorithms Developed from Statistical Analysis of MS/MS Fragmentation Patterns
Author:
Li, Wenzhou
Issue Date:
2012
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Tandem mass spectrometry is widely used in proteomic studies because of its ability to identify large numbers of peptides from complex mixtures. In a typical LC-MS/MS experiment, thousands of tandem mass spectra are collected, and peptide identification algorithms are essential for translating them into peptide sequences. Though these spectra contain both m/z and intensity values, most popular protein identification algorithms rely primarily on predicted fragment m/z values to assign peptide sequences to fragmentation spectra. Intensity information is often undervalued because it is harder to predict and to incorporate into algorithms. Nevertheless, using intensity to assist peptide identification is an attractive prospect that can potentially improve the confidence of matches and generate more identifications. In this dissertation, an unsupervised statistical method, K-means clustering, was used to study peptide fragmentation patterns in both CID and ETD data, and many unique fragmentation features were discovered. For instance, strong c(n-1) ions were observed in ETD, indicating that the fragmentation site in ETD is highly related to the amino acid residue location. Based on the fragmentation patterns observed through data mining, a peptide identification algorithm that makes use of these patterns was developed. The program is named SQID, and it is the first algorithm in our bioinformatics project. Our testing results using multiple public datasets indicated an improvement in the number of identified peptides compared with popular proteomics algorithms such as Sequest or X!Tandem. SQID was further extended to improve cross-linked peptide identification (SQID-XLink) as well as blind modification identification (SQID-Mod), and both showed significant improvement over existing methods. In this dissertation, the SQID algorithm was also successfully applied to a mosquito proteomics project.
We are incorporating new features and algorithms into our software, such as support for more fragmentation methods, more accurate spectrum prediction, and a more user-friendly interface. We hope the SQID project will continue to benefit researchers and help improve data analysis in the proteomics community.
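As a point of reference for the m/z-only matching that the abstract describes as the conventional approach, the following is a minimal Python sketch. It is not the SQID algorithm, and it ignores intensity entirely. The function names `fragment_mz` and `count_matches`, the 0.5 Da tolerance, and the restriction to singly charged b and y ions are assumptions made for illustration. The residue masses are standard monoisotopic values.

```python
# Illustrative sketch only: predicts singly charged b/y fragment m/z values
# and counts matches to observed peaks by m/z alone (intensity is ignored).
# This is NOT the SQID scoring scheme described in the dissertation.

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton
WATER = 18.010565   # mass of H2O


def fragment_mz(peptide):
    """Return (b_ions, y_ions): singly charged m/z lists for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:              # b1 .. b(n-1), from the N-terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):     # y1 .. y(n-1), from the C-terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def count_matches(predicted, observed, tol=0.5):
    """Count predicted m/z values that have an observed peak within tol (Da)."""
    return sum(any(abs(p - o) <= tol for o in observed) for p in predicted)


if __name__ == "__main__":
    b, y = fragment_mz("PEPTIDE")
    # Hypothetical observed peak list, made up for this example.
    observed = [98.06, 148.06, 500.0]
    print(count_matches(b + y, observed))
```

A score built this way treats every matched peak equally, which is the limitation the abstract points to: an intensity-aware method would additionally weight each match by how likely that fragment is to be abundant.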
Type:
text; Electronic Dissertation
Keywords:
peptide fragmentation pattern; Chemistry; algorithm; mass spectrometry
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Chemistry
Degree Grantor:
University of Arizona
Advisor:
Wysocki, Vicki H.

Full metadata record

DC Field | Value | Language
dc.language.iso | en | en_US
dc.title | Protein Identification Algorithms Developed from Statistical Analysis of MS/MS Fragmentation Patterns | en_US
dc.creator | Li, Wenzhou | en_US
dc.contributor.author | Li, Wenzhou | en_US
dc.date.issued | 2012 | -
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.type | text | en_US
dc.type | Electronic Dissertation | en_US
dc.subject | peptide fragmentation pattern | en_US
dc.subject | Chemistry | en_US
dc.subject | algorithm | en_US
dc.subject | mass spectrometry | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Chemistry | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Wysocki, Vicki H. | en_US
dc.contributor.committeemember | Saavedra, S. Scott | en_US
dc.contributor.committeemember | Brown, Michael F. | en_US
dc.contributor.committeemember | Montfort, William R. | en_US
dc.contributor.committeemember | Wysocki, Vicki H. | en_US
dc.contributor.committeemember | Cordes, Matthew H. J. | en_US
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.