A UNIQUOME-BASED METHOD FOR PROTEIN IDENTIFICATION BY MASS SPECTROMETRY
Protein identification by mass spectrometry (MS) is a pivotal step in proteomics. Numerous methods have been developed to reliably and efficiently identify proteins from peptides detected experimentally by MS. The dominant approach assumes that each experimentally identified peptide can be matched to a peptide in a database of peptide sequences generated by in silico digestion of proteins with a specific proteolytic enzyme; the protein containing the matched peptide is thereby identified. In a more advanced approach, the proteins and their in silico digested peptides in the database are converted into theoretical mass spectra, and search engines match the experimentally obtained spectra against these theoretical spectra. We developed an alternative method for protein identification that uses Core Unique Peptides (CrUPs) and the Uniquome, termed the Uniquome-Based Protein Identification Method (UB-PIM). In this method, instead of searching for experimental peptides in a database of in silico digested peptides, we search for CrUPs within the peptides obtained experimentally by MS. If a peptide contains at least one CrUP, it can be directly assigned to the protein from which that CrUP is derived. Because of the unique nature of CrUPs, a peptide obtained by MS that contains one can securely and uniquely identify its protein of origin. This provides a reference space in which even single-peptide identifications can achieve high specificity, reducing the ambiguity caused by shared or homologous sequences and improving the interpretability of MS data. Furthermore, UB-PIM can be applied to any type of peptide and is effective with both Data-Independent Acquisition (DIA) and Data-Dependent Acquisition (DDA) approaches, as well as with top-down and bottom-up proteomics.
This allows confident protein identification from minimal evidence, expands the scope of detectable proteins, and remains computationally efficient, rapid, and universally applicable.
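The reversed search direction described above (looking for CrUPs inside experimental peptides, rather than looking up peptides in a digest database) can be illustrated with a minimal Python sketch. It is not the UB-PIM implementation: the function name `identify_proteins`, the `crup_to_protein` dictionary layout, the protein labels, and the toy sequences are all hypothetical, and the sketch assumes that each CrUP maps to exactly one protein and that a plain substring match is sufficient.

```python
from collections import defaultdict


def identify_proteins(peptides, crup_to_protein):
    """Map each protein to the experimental peptides that contain one of its CrUPs.

    peptides        -- iterable of peptide sequences (strings) obtained by MS
    crup_to_protein -- hypothetical dict: CrUP sequence -> protein identifier
                       (assumes every CrUP belongs to exactly one protein)
    """
    # Only window lengths that actually occur among the CrUPs need to be scanned.
    lengths = sorted({len(crup) for crup in crup_to_protein})
    hits = defaultdict(set)

    for peptide in peptides:
        for k in lengths:
            # Slide a window of length k over the peptide and look each
            # substring up in the CrUP dictionary (constant-time lookup).
            for i in range(len(peptide) - k + 1):
                protein = crup_to_protein.get(peptide[i:i + k])
                if protein is not None:
                    hits[protein].add(peptide)

    return dict(hits)


# Toy data: sequences and protein labels are invented for illustration only.
toy_crups = {"MKWVT": "PROT_A", "GHLLQ": "PROT_B"}
toy_peptides = ["AAMKWVTFISLK", "LLGHLLQDE", "AAAAAA"]

result = identify_proteins(toy_peptides, toy_crups)
for protein, matched in sorted(result.items()):
    print(protein, sorted(matched))
```

Because the lookup does not depend on where the peptide was cleaved, the same scan applies to peptides of any origin or length. A peptide that contains no CrUP (the last toy peptide) simply contributes no identification.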