MASS SPECTROMETRY-CLEAVABLE CROSS-LINKING AGENTS TO FACILITATE STRUCTURAL ANALYSIS OF PROTEINS AND PROTEIN COMPLEXES, AND METHOD OF USING SAME

Novel cross-linking compounds that can be used in mass spectrometry, tandem mass spectrometry, and multi-stage tandem mass spectrometry to facilitate structural analysis of proteins and protein complexes are provided and have the formula: where X is an N-hydroxy-succinimidyl or similar heterocyclic group. Also provided is a method of mapping protein-protein interactions of protein complexes using various mass spectrometry techniques.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 13/471,365, filed May 14, 2012, and issued as U.S. Pat. No. 9,222,943 on Dec. 29, 2015, which was based on U.S. provisional patent application No. 61/486,260, filed May 14, 2011, the entire contents of which are incorporated by reference herein.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with Government support under Grant No. GM074830, awarded by the National Institutes of Health. The Government has certain rights in this invention.

SEQUENCE LISTING IN ELECTRONIC FORMAT

A Sequence Listing in electronic format is provided as a file entitled UCI012_001C1_SEQLIST.TXT which is 32,341 bytes in size, which was created on May 10, 2016, and which was last modified on May 10, 2016.

FIELD OF THE INVENTION

The invention relates to the field of cross-linking agents and, more specifically, MS-cleavable cross-linkers that are diester derivatives of 3,3′-sulfinylbispropanoic acid, and the use of such compounds to facilitate structural analysis of proteins and protein complexes.

BACKGROUND OF THE INVENTION

Knowledge of elaborate structures of protein complexes is fundamental for understanding their functions and regulations. Although cross-linking coupled with mass spectrometry (MS) has been presented as a feasible strategy for structural elucidation of large multi-subunit protein complexes, this method has proven challenging due to technical difficulties in unambiguous identification of cross-linked peptides and determination of cross-linked sites by MS analysis.

Proteins form stable and dynamic multi-subunit complexes under different physiological conditions to maintain cell viability and normal cell homeostasis. Detailed knowledge of protein interactions and protein complex structures is fundamental to understanding how individual proteins function within a complex and how the complex functions as a whole. However, structural elucidation of large multi-subunit protein complexes has been difficult due to lack of technologies which can effectively handle their dynamic and heterogeneous nature. Traditional methods such as nuclear magnetic resonance (NMR) analysis and X-ray crystallography can yield detailed information on protein structures; however, NMR spectroscopy requires large quantities of pure protein in a specific solvent while X-ray crystallography is often limited by the crystallization process.

In recent years, chemical cross-linking coupled with mass spectrometry (MS) has become a powerful method for studying protein interactions. See for example the disclosures of Sinz, A. (2003) Chemical Cross-Linking and Mass Spectrometry for Mapping Three-Dimensional Structures of Proteins and Protein Complexes. J Mass Spectrom. 38, 1225-1237; Sinz, A. (2006) Chemical Cross-Linking and Mass Spectrometry to Map Three-Dimensional Protein Structures and Protein-Protein Interactions. Mass Spectrom Rev 25, 663-682; and Leitner, A., Walzthoeni, T., Kahraman, A., Herzog, F., Rinner, O., Beck, M., and Aebersold, R. (2010) Probing Native Protein Structures by Chemical Cross-Linking, Mass Spectrometry and Bioinformatics. Molecular & Cellular Proteomics 9, 1634-1649. Chemical cross-linking stabilizes protein interactions through the formation of covalent bonds and allows the detection of stable, weak and/or transient protein-protein interactions in native cells or tissues. See for example the disclosures of Sinz, A. (2010) Investigation of Protein-Protein Interactions in Living Cells by Chemical Crosslinking and Mass Spectrometry. Anal Bioanal Chem 397, 3433-3440; Vasilescu, J., Guo, X., and Kast, J. (2004) Identification of Protein-Protein Interactions Using in Vivo Cross-Linking and Mass Spectrometry. Proteomics 4, 3845-3854; Guerrero, C., Tagwerker, C., Kaiser, P., and Huang, L. (2006) An Integrated Mass Spectrometry-Based Proteomic Approach: Quantitative Analysis of Tandem Affinity-Purified in Vivo Cross-Linked Protein Complexes (Qtax) to Decipher the 26 S Proteasome-Interacting Network. Mol Cell Proteomics 5, 366-378; Tagwerker, C., Flick, K., Cui, M., Guerrero, C., Dou, Y., Auer, B., Baldi, P., Huang, L., and Kaiser, P. (2006) A Tandem Affinity Tag for Two-Step Purification under Fully Denaturing Conditions: Application in Ubiquitin Profiling and Protein Complex Identification Combined with in Vivo Cross-Linking. Mol Cell Proteomics 5, 737-748; Guerrero, C., Milenkovic, T., Przulj, N., Kaiser, P., and Huang, L. (2008) Characterization of the Proteasome Interaction Network Using a Qtax-Based Tag-Team Strategy and Protein Interaction Network Analysis. Proc Natl Acad Sci USA 105, 13333-13338; and Kaake, R. M., Milenkovic, T., Przulj, N., Kaiser, P., and Huang, L. (2010) Characterization of Cell Cycle Specific Protein Interaction Networks of the Yeast 26s Proteasome Complex by the Qtax Strategy. J Proteome Res 9, 2016-2019. In addition to capturing protein interacting partners, many studies have shown that chemical cross-linking can yield low-resolution structural information about the constraints within a molecule (see for example the disclosures of Sinz, A. (2006) Chemical Cross-Linking and Mass Spectrometry to Map Three-Dimensional Protein Structures and Protein-Protein Interactions. Mass Spectrom Rev 25, 663-682; Leitner, A., Walzthoeni, T., Kahraman, A., Herzog, F., Rinner, O., Beck, M., and Aebersold, R. (2010) Probing Native Protein Structures by Chemical Cross-Linking, Mass Spectrometry and Bioinformatics. Molecular & Cellular Proteomics 9, 1634-1649; and Back, J. W., de Jong, L., Muijsers, A. O., and de Koster, C. G. (2003) Chemical Cross-Linking and Mass Spectrometry for Protein Structural Modeling. J Mol Biol. 331, 303-313) or within a protein complex (as disclosed in Rappsilber, J., Siniossoglou, S., Hurt, E. C., and Mann, M. (2000) A Generic Strategy to Analyze the Spatial Organization of Multi-Protein Complexes by Cross-Linking and Mass Spectrometry. Anal Chem. 72, 267-275; Maiolica, A., Cittaro, D., Borsotti, D., Sennels, L., Ciferri, C., Tarricone, C., Musacchio, A., and Rappsilber, J. (2007) Structural Analysis of Multiprotein Complexes by Cross-Linking, Mass Spectrometry, and Database Searching. Mol Cell Proteomics 6, 2200-2211; and Chen, Z. A., Jawhari, A., Fischer, L., Buchen, C., Tahir, S., Kamenski, T., Rasmussen, M., Lariviere, L., Bukowski-Wills, J. C., Nilges, M., Cramer, P., and Rappsilber, J. (2010) Architecture of the RNA Polymerase II-TFIIF Complex Revealed by Cross-Linking and Mass Spectrometry. EMBO J 29, 717-726). The application of chemical cross-linking, enzymatic digestion, and subsequent mass spectrometric and computational analysis for the elucidation of three-dimensional protein structures offers distinct advantages over traditional methods due to its speed, sensitivity, and versatility. Identification of cross-linked peptides provides distance constraints that aid in constructing the structural topology of proteins and/or protein complexes. Although this approach has been successful, effective detection and accurate identification of cross-linked peptides as well as unambiguous assignment of cross-linked sites remain extremely challenging due to their low abundance and complicated fragmentation behavior in MS analysis. See for example the disclosures of Sinz, A. (2006) Chemical Cross-Linking and Mass Spectrometry to Map Three-Dimensional Protein Structures and Protein-Protein Interactions. Mass Spectrom Rev 25, 663-682; Leitner, A., Walzthoeni, T., Kahraman, A., Herzog, F., Rinner, O., Beck, M., and Aebersold, R. (2010) Probing Native Protein Structures by Chemical Cross-Linking, Mass Spectrometry and Bioinformatics. Molecular & Cellular Proteomics 9, 1634-1649; Back, J. W., de Jong, L., Muijsers, A. O., and de Koster, C. G. (2003) Chemical Cross-Linking and Mass Spectrometry for Protein Structural Modeling. J Mol Biol. 331, 303-313; and Schilling, B., Row, R. H., Gibson, B. W., Guo, X., and Young, M. M. (2003) Ms2assign, Automated Assignment and Nomenclature of Tandem Mass Spectra of Chemically Crosslinked Peptides. J Am Soc Mass Spectrom. 14, 834-850.
Therefore, new reagents and methods are urgently needed to allow unambiguous identification of cross-linked products and to improve the speed and accuracy of data analysis to facilitate its application in structural elucidation of large protein complexes.

A number of approaches have been developed to facilitate MS detection of low abundance cross-linked peptides from complex mixtures. These include selective enrichment using affinity purification with biotinylated cross-linkers, for example, as described in Trester-Zedlitz, M., Kamada, K., Burley, S. K., Fenyo, D., Chait, B. T., and Muir, T. W. (2003) A Modular Cross-Linking Approach for Exploring Protein Interactions. J Am Chem Soc. 125, 2416-2425; Tang, X., Munske, G. R., Siems, W. F., and Bruce, J. E. (2005) Mass Spectrometry Identifiable Cross-Linking Strategy for Studying Protein-Protein Interactions. Anal Chem 77, 311-318; and Chu, F., Mahrus, S., Craik, C. S., and Burlingame, A. L. (2006) Isotope-Coded and Affinity-Tagged Cross-Linking (Icatxl): An Efficient Strategy to Probe Protein Interaction Surfaces. J Am Chem Soc 128, 10362-10363, and click chemistry with alkyne-tagged (Chowdhury, S. M., Du, X., Tolic, N., Wu, S., Moore, R. J., Mayer, M. U., Smith, R. D., and Adkins, J. N. (2009) Identification of Cross-Linked Peptides after Click-Based Enrichment Using Sequential Collision-Induced Dissociation and Electron Transfer Dissociation Tandem Mass Spectrometry. Anal Chem 81, 5524-5532) or azide tagged cross-linkers, see for example Kasper, P. T., Back, J. W., Vitale, M., Hartog, A. F., Roseboom, W., de Koning, L. J., van Maarseveen, J. H., Muijsers, A. O., de Koster, C. G., and de Jong, L. (2007) An Aptly Positioned Azido Group in the Spacer of a Protein Cross-Linker for Facile Mapping of Lysines in Close Proximity. Chembiochem 8, 1281-1292; and Nessen, M. A., Kramer, G., Back, J., Baskin, J. M., Smeenk, L. E., de Koning, L. J., van Maarseveen, J. H., de Jong, L., Bertozzi, C. R., Hiemstra, H., and de Koster, C. G. (2009) Selective Enrichment of Azide-Containing Peptides from Complex Mixtures. J Proteome Res 8, 3702-3711. 
In addition, Staudinger ligation has recently been shown to be effective for selective enrichment of azide-tagged cross-linked peptides (Vellucci, D., Kao, A., Kaake, R. M., Rychnovsky, S. D., and Huang, L. (2010) Selective Enrichment and Identification of Azide-Tagged Cross-Linked Peptides Using Chemical Ligation and Mass Spectrometry. J Am Soc Mass Spectrom 21, 1432-1445). Apart from enrichment, detection of cross-linked peptides can be achieved by isotope-labeled, as described in Collins, C. J., Schilling, B., Young, M., Dollinger, G., and Guy, R. K. (2003) Isotopically Labeled Crosslinking Reagents: Resolution of Mass Degeneracy in the Identification of Crosslinked Peptides. Bioorg Med Chem Lett. 13, 4023-4026; Petrotchenko, E. V., Olkhovik, V. K., and Borchers, C. H. (2005) Isotopically Coded Cleavable Cross-Linker for Studying Protein-Protein Interaction and Protein Complexes. Mol Cell Proteomics 4, 1167-1179; and Petrotchenko, E., and Borchers, C. (2010) Icc-Class: Isotopically-Coded Cleavable Crosslinking Analysis Software Suite. BMC bioinformatics 11, 64, fluorescently labeled (Sinz, A., and Wang, K. (2004) Mapping Spatial Proximities of Sulfhydryl Groups in Proteins Using a Fluorogenic Cross-Linker and Mass Spectrometry. Anal Biochem. 331, 27-32), and mass-tag labeled cross-linking reagents, for example as described in Tang, X., Munske, G. R., Siems, W. F., and Bruce, J. E. (2005) Mass Spectrometry Identifiable Cross-Linking Strategy for Studying Protein-Protein Interactions. Anal Chem 77, 311-318; and Back, J. W., Hartog, A. F., Dekker, H. L., Muijsers, A. O., de Koning, L. J., and de Jong, L. (2001) A New Crosslinker for Mass Spectrometric Analysis of the Quaternary Structure of Protein Complexes. J. Am. Soc. Mass Spectrom. 12, 222-227. 
These methods can identify cross-linked peptides with MS analysis, but interpretation of the data generated from inter-linked peptides (two peptides connected with the cross-link) by automated database searching remains difficult. Several bioinformatics tools have thus been developed to interpret MS/MS data and determine inter-linked peptide sequences from complex mixtures, as described in Maiolica, A. et al.; Schilling, B. et al.; Chu, F., Baker, P. R., Burlingame, A. L., and Chalkley, R. J. (2009) Finding Chimeras: A Bioinformatic Strategy for Identification of Cross-Linked Peptides. Mol Cell Proteomics 9, 25-31; Gao, Q., Xue, S., Shaffer, S. A., Doneanu, C. E., Goodlett, D. R., and Nelson, S. D. (2008) Minimize the Detection of False Positives by the Software Program Detectshift for 18o-Labeled Cross-Linked Peptide Analysis. Eur J Mass Spectrom (Chichester, Eng) 14, 275-280; Singh, P., Shaffer, S. A., Scherl, A., Holman, C., Pfuetzner, R. A., Larson Freeman, T. J., Miller, S. I., Hernandez, P., Appel, R. D., and Goodlett, D. R. (2008) Characterization of Protein Cross-Links Via Mass Spectrometry and an Open-Modification Search Strategy. Anal Chem 80, 8799-8806; Rinner, O., Seebacher, J., Walzthoeni, T., Mueller, L. N., Beck, M., Schmidt, A., Mueller, M., and Aebersold, R. (2008) Identification of Cross-Linked Peptides from Large Sequence Databases. Nat Methods 5, 315-318; Lee, Y. J., Lackner, L. L., Nunnari, J. M., and Phinney, B. S. (2007) Shotgun Cross-Linking Analysis for Studying Quaternary and Tertiary Protein Structures. J Proteome Res 6, 3908-3917; and Nadeau, O. W., Wyckoff, G. J., Paschall, J. E., Artigues, A., Sage, J., Villar, M. T., and Carlson, G. M. (2008) Crosssearch, a User-Friendly Search Engine for Detecting Chemically Cross-Linked Peptides in Conjugated Proteins. Mol Cell Proteomics 7, 739-749. 
Although promising, further developments are still needed to make such data analyses as robust and reliable as analyzing MS/MS data of single peptide sequences using existing database searching tools (e.g. Protein Prospector, Mascot or SEQUEST).

Various types of cleavable cross-linkers with distinct chemical properties have been developed to facilitate MS identification and characterization of cross-linked peptides. These include UV photocleavable (Nadeau, O. W., Wyckoff, G. J., Paschall, J. E., Artigues, A., Sage, J., Villar, M. T., and Carlson, G. M. (2008) Crosssearch, a User-Friendly Search Engine for Detecting Chemically Cross-Linked Peptides in Conjugated Proteins. Mol Cell Proteomics 7, 739-749), chemical cleavable (Kasper, P. T., et al.), isotopically-coded cleavable (Petrotchenko, E. V., et al.), and MS-cleavable reagents, as described in Tang, X, et. al.; Back, J. W., et. al.; Zhang, H., Tang, X., Munske, G. R., Tolic, N., Anderson, G. A., and Bruce, J. E. (2009) Identification of Protein-Protein Interactions and Topologies in Living Cells with Chemical Cross-Linking and Mass Spectrometry. Mol Cell Proteomics 8, 409-420; Soderblom, E. J., and Goshe, M. B. (2006) Collision-Induced Dissociative Chemical Cross-Linking Reagents and Methodology: Applications to Protein Structural Characterization Using Tandem Mass Spectrometry Analysis. Anal Chem 78, 8059-8068; Soderblom, E. J., Bobay, B. G., Cavanagh, J., and Goshe, M. B. (2007) Tandem Mass Spectrometry Acquisition Approaches to Enhance Identification of Protein-Protein Interactions Using Low-Energy Collision-Induced Dissociative Chemical Crosslinking Reagents. Rapid Commun Mass Spectrom 21, 3395-3408; Lu, Y., Tanasova, M., Borhan, B., and Reid, G. E. (2008) Ionic Reagent for Controlling the Gas-Phase Fragmentation Reactions of Cross-Linked Peptides. Anal Chem 80, 9279-9287; and Gardner, M. W., Vasicek, L. A., Shabbir, S., Anslyn, E. V., and Brodbelt, J. S. (2008) Chromogenic Cross-Linker for the Characterization of Protein Structure by Infrared Multiphoton Dissociation Mass Spectrometry. Anal Chem 80, 4807-4819. 
MS-cleavable cross-linkers have received considerable attention since the resulting cross-linked products can be identified based on their characteristic fragmentation behavior observed during MS analysis. Gas-phase cleavage sites result in the detection of a “reporter” ion (Back, J. W., et al.), single peptide chain fragment ions (Soderblom, E. J., and Goshe, M. B.; Soderblom, E. J., Bobay, B. G., et al.; Lu, Y., et al.; and Gardner, M. W., et al.), or both reporter and fragment ions (Tang, X., et al.; and Zhang, H., et al.). In each case, further structural characterization of the peptide product ions generated during the cleavage reaction can be accomplished by subsequent MSn analysis. Among these linkers, the “fixed charge” sulfonium ion-containing cross-linker developed by Lu et al. appears to be the most attractive because, based on their studies with model peptides, it allows specific and selective fragmentation of cross-linked peptides regardless of their charge and amino acid composition.

Despite the availability of multiple types of cleavable cross-linkers, most applications have been limited to the study of model peptides and single proteins. Additionally, complicated synthesis and fragmentation patterns have kept most of the known MS-cleavable cross-linkers from being widely adopted by the community.

SUMMARY OF THE INVENTION

The present invention provides novel cross-linking compounds that can be coupled with multi-stage tandem mass spectrometry (MSn) to facilitate structural analysis of proteins and protein complexes. In a first aspect of the invention, a new crosslinking compound is provided and has the formula:

where x is selected from the group consisting of

wherein R is methyl or ethyl,
and

Compounds of the general formula shown above are symmetric diester derivatives of 3,3′-sulfinylbispropanoic acid, also known as 3,3′-sulfinyldipropanoic acid, C6H10O5S. Like the diacid, the diesters have two symmetric collision-induced dissociation (CID)-cleavable sites that allow effective identification of diester cross-linked peptides based on their distinct fragmentation patterns unique to cross-linking types (i.e. inter-link, intra-link, and dead-end).

In a second aspect of the invention, the new cross-linking agents are used to facilitate mapping of protein-protein interactions of protein complexes. In one embodiment, the method comprises the steps of providing a MS-cleavable cross-linker having the formula described above; forming a cross-linked protein complex by cross-linking proteins with the MS-cleavable cross-linker; forming protein and/or peptide fragments that are chemically bound to the MS-cleavable cross-linker by digesting the cross-linked protein complex with an enzyme such as trypsin; and using mass spectrometry (MS) and MSn analysis to identify the protein and/or peptide fragments.

In another aspect of the invention, a method for integrated data analysis work flow for identification of cross-linked peptides is provided and comprises the steps of providing cross-linked peptides, each cross-linked peptide comprising an MS-cleavable cross-linker as described above; performing mass spectrometry on the cross-linked peptides to obtain MS data, MS/MS data, and MS3 data; identifying the MS/MS data comprising characteristic fragmentation profiles of MS-cleavable cross-linker-containing cross-linked peptides to obtain an MS/MS result comprising a list of parent ions corresponding to cross-linked peptide candidates; peptide sequencing the cross-linked peptides using the MS3 data to obtain an MS3 result comprising identities of cleaved cross-linked peptide fragments generated during MS/MS analysis; mass mapping the MS data using the list of parent ions corresponding to the cross-linked peptide candidates against a database comprising known protein sequences and the MS-cleavable cross-linker to obtain an MS result comprising possible cross-linked peptide sequences based on theoretical masses; and integrating the MS result, the MS/MS result, and MS3 result to identify cross-linked peptides.
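By way of illustration only, the mass-mapping step of the work flow described above may be sketched as follows. The sketch compares the theoretical mass of one candidate inter-linked peptide pair with an observed parent ion and reports the error in ppm. The function names are hypothetical, the residue masses are standard monoisotopic values, and the net mass added by one DSSO inter-link (158.00376 Da) is an assumption that is not stated in this section. The example peptides and the parent ion (m/z 419.9716, 4+) are those described for FIG. 4.

```python
"""Mass-mapping sketch for one inter-linked peptide pair (illustrative only)."""

# Standard monoisotopic residue masses (Da) for the residues used below.
RESIDUE_MASS = {
    "G": 57.02146, "D": 115.02694, "V": 99.06841,
    "E": 129.04259, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056         # H2O
PROTON = 1.00728         # mass of one proton
ACETYL = 42.01056        # N-terminal acetylation
LINKER_DELTA = 158.00376 # ASSUMPTION: net mass added by one DSSO inter-link


def peptide_mass(sequence: str, n_acetyl: bool = False) -> float:
    """Neutral monoisotopic mass of a linear peptide."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return mass + ACETYL if n_acetyl else mass


def neutral_mass(mz: float, z: int) -> float:
    """Convert an observed m/z at charge z to a neutral mass."""
    return z * (mz - PROTON)


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Candidate pair: Ac-GDVEKGKK inter-linked to KKGER.
theoretical = (peptide_mass("GDVEKGKK", n_acetyl=True)
               + peptide_mass("KKGER")
               + LINKER_DELTA)
observed = neutral_mass(419.9716, 4)
error_ppm = ppm_error(observed, theoretical)
print(f"theoretical={theoretical:.4f} observed={observed:.4f} "
      f"error={error_ppm:.2f} ppm")
```

In a full search, the same comparison would be repeated for every peptide pair generated from the protein sequence database, keeping only candidates within a chosen ppm tolerance.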

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

FIG. 1 shows exemplary Compounds 1 and 3-9 and General Structure 2 according to the invention.

FIG. 2 shows proposed fragmentation schemes of DSSO-cross-linked peptides. A, DSSO synthesis and structure. B-D, MS/MS fragmentation patterns of the three types of DSSO-cross-linked peptides: interlinked (B), dead end (C), and intralinked (D). E, conversion of a sulfenic acid-modified fragment to an unsaturated thiol-modified fragment after a water loss. F, mass relationships between MS/MS fragment ions shown in B-D and their parent ions. DCC, N,N′-dicyclohexylcarbodiimide; MCPBA, m-chloroperbenzoic acid.
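By way of illustration only, the mass relationships referred to in FIG. 2F can be used to screen an MS/MS spectrum for characteristic fragment pairs. The sketch below assumes, based on the description of panels B-E, that an alkene fragment and a sulfenic acid fragment together account for the parent mass, and that an alkene fragment and an unsaturated thiol fragment account for the parent mass less one water. The function names and the tolerance are hypothetical. The parent ion and the two true fragments are taken from the Ac-GDVEKGKK/KKGER entry of Table 1, and the third peak is an invented decoy.

```python
"""Screen MS/MS peaks for characteristic fragment pairs (illustrative only)."""

WATER = 18.01056   # H2O
PROTON = 1.00728   # mass of one proton


def neutral_mass(mz, z):
    """Convert an observed m/z at charge z to a neutral mass."""
    return z * (mz - PROTON)


def find_fragment_pairs(parent, fragments, tol=0.05):
    """Return (fragment, fragment, label) tuples for complementary pairs.

    ASSUMED relationships:
      alkene + sulfenic acid     = parent          -> label "A+S"
      alkene + unsaturated thiol = parent - water  -> label "A+T"
    """
    parent_mass = neutral_mass(*parent)
    pairs = []
    for i, first in enumerate(fragments):
        for second in fragments[i + 1:]:
            total = neutral_mass(*first) + neutral_mass(*second)
            if abs(total - parent_mass) <= tol:
                pairs.append((first, second, "A+S"))
            elif abs(total + WATER - parent_mass) <= tol:
                pairs.append((first, second, "A+T"))
    return pairs


parent = (419.97, 4)                           # (m/z, charge) from Table 1
peaks = [(478.76, 2), (352.18, 2), (300.00, 1)]  # last peak is a made-up decoy
pairs = find_fragment_pairs(parent, peaks)
print(pairs)
```

A pairwise scan is quadratic in the number of peaks, which is acceptable for a sketch; a production tool would more likely sort peaks by mass and search for each complement directly.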

FIG. 3 is an exemplary MSn analysis of DSSO-cross-linked model peptides. A-E, MSn analysis of the DSSO-interlinked Ac-IR7 (α-α). A, MS spectrum of α-α: [α-α]³⁺ (m/z 615.97³⁺) and [α-α]²⁺ (m/z 923.46²⁺). B and C, MS/MS spectra of [α-α]³⁺ (B) and [α-α]²⁺ (C) in which alkene (αA) and sulfenic acid (αS) fragments were detected. D and E, MS3 spectra of αA (m/z 449.66²⁺) (D) and αS (m/z 948.43) (E). F-I, MSn analysis of DSSO-interlinked Ac-myelin (β-β). F, MS spectrum of β-β: [β-β]⁶⁺ (m/z 458.23⁶⁺), [β-β]⁵⁺ (m/z 549.68⁵⁺), and [β-β]⁴⁺ (m/z 686.84⁴⁺). G-I, MS/MS spectra of [β-β]⁶⁺ in which the βA/βT pair was observed (G), [β-β]⁵⁺ in which the βA/βS pair was observed (H), and [β-β]⁴⁺ in which the βA/βS pair was observed (I). J-L, MSn analysis of DSSO dead end-modified substance P peptide γDN. J, MS spectrum of γDN (m/z 538.76²⁺). K, MS/MS spectrum of γDN in which two fragments, γA (m/z 478.03²⁺) and γS (m/z 502.95²⁺), were detected. L, MS3 spectrum of γA (m/z 478.03²⁺). Sequences of Ac-IR7, Ac-myelin, and substance P are Ac-IEAEKGR (SEQ ID NO: 2), Ac-ASQKRPSQRHG (SEQ ID NO: 6), and RPKPQQF (SEQ ID NO: 7), respectively.

FIG. 4 is an exemplary MSn analysis of DSSO heterodimeric interlinked peptide of cytochrome c (α-β: Ac-GDVEKGKK (SEQ ID NO: 11) interlinked to KKGER (SEQ ID NO: 13)). A, MS/MS spectrum of [α-β]⁴⁺ (m/z 419.9716⁴⁺) in which two fragment pairs were observed: αA (m/z 478.99²⁺)/βT (m/z 352.40²⁺) and αT (m/z 494.96²⁺)/βA (m/z 336.42²⁺). B, MS3 spectrum of αA (m/z 478.99²⁺) in which detection of y1-y7 and b2-b7 determined the sequence unambiguously as Ac-GDVEKAGKK (SEQ ID NO: 12). C, MS3 spectrum of βT (m/z 352.40²⁺) in which detection of y1-y4, a1, and b2-b7 ions determined the sequence unambiguously as KTKGER (SEQ ID NO: 14). KA is modified with the alkene moiety, and KT is modified with the unsaturated thiol moiety.

FIG. 5 is an exemplary MSn analysis of DSSO heterodimeric interlinked peptide of cytochrome c (α-β: HKTGPNLHGLFGR (SEQ ID NO: 16) interlinked to GKK). This peptide was detected in MS as triply charged [α-β]³⁺ (m/z 641.6730³⁺), quadruply charged [α-β]⁴⁺ (m/z 481.5069⁴⁺), and quintuply charged [α-β]⁵⁺ (m/z 385.4070⁵⁺) ions. A, MS/MS spectrum of [α-β]³⁺ (m/z 641.6730³⁺) in which two fragment pairs were observed: αA (m/z 744.40²⁺)/βT (m/z 418.21) and αT (m/z 760.38²⁺)/βA (m/z 386.24). B, MS/MS spectrum of [α-β]⁴⁺ (m/z 481.5069⁴⁺) in which two fragment pairs were observed: αA (m/z 496.60³⁺)/βT (m/z 418.21) and αT (m/z 507.26³⁺)/βA (m/z 386.24). C, MS/MS spectrum of [α-β]⁵⁺ (m/z 385.4070⁵⁺) in which two fragment pairs were observed: αA/βT (m/z 496.60³⁺/209.61²⁺ and 372.70⁴⁺/418.21) and αT (m/z 507.26³⁺)/βA (m/z 193.62²⁺). D, MS3 spectrum of αA fragment (m/z 496.60³⁺) in which detection of a series of y and b ions determined its sequence unambiguously as HKATGPNLHGLFGR (SEQ ID NO: 17). KA is modified with the alkene moiety.
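By way of illustration only, the three charge states reported for the interlinked peptide of FIG. 5 can be checked for consistency by converting each observed m/z value to a neutral mass. The sketch below uses the standard relationship M = z × (m/z − proton mass); the function and variable names are hypothetical.

```python
"""Charge-state consistency check for one precursor (illustrative only)."""

PROTON = 1.00728  # mass of one proton, in Da


def neutral_mass(mz, z):
    """Convert an observed m/z at charge z to a neutral mass."""
    return z * (mz - PROTON)


# (m/z, charge) values reported for the interlinked peptide of FIG. 5.
observations = [(641.6730, 3), (481.5069, 4), (385.4070, 5)]

masses = [neutral_mass(mz, z) for mz, z in observations]
mean_mass = sum(masses) / len(masses)
spread_ppm = (max(masses) - min(masses)) / mean_mass * 1e6

for (mz, z), mass in zip(observations, masses):
    print(f"m/z {mz:.4f} ({z}+) -> neutral mass {mass:.4f} Da")
print(f"spread across charge states: {spread_ppm:.2f} ppm")
```

A small spread indicates that the three ions are charge states of the same species, which is the premise for treating their MS/MS spectra in panels A-C as views of one cross-linked peptide.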

FIG. 6 is an exemplary MSn analysis of DSSO dead end-modified peptide (A and B) and intralinked peptide of cytochrome c (C and D). A, MS/MS spectrum of a dead end-modified peptide (αDN; m/z 880.8975²⁺, KDNTGQAPGFSYTDANK (SEQ ID NO: 20)) in which two fragment ions were determined as αA (m/z 820.20²⁺) and αT (m/z 835.88²⁺). B, MS3 spectrum of αA (m/z 820.20²⁺) in which detection of a series of y and b ions determined its sequence unambiguously as KATGQAPGFSYTDANK (SEQ ID NO: 21). C, MS/MS spectrum of an intralinked peptide (αintra; m/z 611.9802³⁺, GGK*HK*TGPNLHGLFGR (SEQ ID NO: 24)) in which one fragment ion was observed and determined as αA+T (m/z 606.24³⁺). D, MS3 spectrum of αA+T (m/z 606.24³⁺) in which detection of a series of y and b ions determined the presence of a mixture of GGKAHKTTGPNLHGLFGR (SEQ ID NO: 25) and GGKTHKATGPNLHGLFGR (SEQ ID NO: 26). KA is modified with the alkene moiety, and KT is modified with the unsaturated thiol moiety.

FIG. 7 shows A, the integrated data analysis work flow for identifying DSSO-crosslinked peptides by LC MSn and B, the work flow for the Link-Finder program.

FIG. 8 is an exemplary MSn analysis of DSSO heterodimeric interlinked peptide of the yeast 20 S proteasome complex (α-β: NKPELYQIDYLGTK (SEQ ID NO: 27) interlinked to LGSQSLGVSNKFEK (SEQ ID NO: 29)) with intersubunit link between 20 S subunit β4 and β3. A, MS/MS spectrum of [α-β]⁴⁺ (m/z 833.9231⁴⁺) in which two fragment pairs were detected and determined as αA (m/z 868.52²⁺)/βT (m/z 790.55²⁺) and αT (m/z 884.98²⁺)/βA (m/z 774.32²⁺). B, MS3 spectrum of αA (m/z 868.52²⁺) in which detection of a series of y and b ions determined its sequence unambiguously as NKAPELYQIDYLGTK (SEQ ID NO: 28). C, MS3 spectrum of βT (m/z 790.55²⁺) in which detection of a series of y and b ions determined its sequence unambiguously as LGSQSLGVSNKTFEK (SEQ ID NO: 30). KA is modified with the alkene moiety, and KT is modified with the unsaturated thiol moiety.

FIG. 9 shows a mapping of the identified DSSO-interlinked lysines onto the crystal structure of the yeast 20 S proteasome. The lysines forming intrasubunit cross-links appear space-filled in blue, and those forming intersubunit cross-links appear space-filled in red.

FIG. 10 is a flowchart showing a general technique for identifying crosslinked peptides according to one embodiment of the invention.

FIG. 11A is an exemplary MS3 analysis of DSSO inter-linked peptides of cytochrome c.

FIG. 11B is an exemplary MS3 analysis of ubiquitin.

FIG. 12 is an exemplary SDS-PAGE gel picture of the 20S proteasome cross-linked with various molar ratios of cross-linker DSSO, i.e. 1:100, 1:500 and 1:1000. The 20S proteasome without cross-linking served as a control. The cross-linked proteasome complex was separated using a 4-20% gradient gel.

FIG. 13 is an exemplary MS3 analysis of DSSO inter-linked peptides of the yeast 20S proteasome complex.

FIG. 14 is an exemplary MSn analysis of a DSSO dead-end peptide of the yeast 20S proteasome complex. A) MS/MS spectrum of a dead-end (DN) peptide (αDN, m/z 693.0078³⁺, AELEKDNLVDHHPEGLSAR (SEQ ID NO: 110)), in which two fragment ions were determined as αA (m/z 652.67³⁺) and αT (m/z 663.33³⁺); B) MS3 spectrum of αA (m/z 652.67³⁺), in which detection of a series of y and b ions determined its sequence unambiguously as AELEKALVDHHPEGLSAR (SEQ ID NO: 111), in which KA is modified with the alkene moiety. The sequence matched to subunit α7; C) MS3 spectrum of αT (m/z 663.33³⁺), in which detection of a series of y and b ions determined its sequence unambiguously as AELEKTLVDHHPEGLSAR (SEQ ID NO: 112), in which KT is modified with the unsaturated thiol moiety.

DESCRIPTION OF THE TABLES

TABLE 1 Summary of DSSO-interlinked peptides of cytochrome c identified by LC MSn.

TABLE 2 Summary of DSSO-interlinked peptides of the yeast 20 S proteasome complex identified by LC MSn.

TABLE 3 Summary of DSSO cross-linked peptides—DSSO dead-end, intra-linked and multilinked peptides—of cytochrome c by LC MSn.

TABLE 4 Summary of DSSO cross-linked peptides of ubiquitin by LC MSn.

TABLE 5 Summary of DSSO inter-linked and dead-end peptides of the yeast 20S proteasome complex by LC MSn.

TABLE 6 Peptide sequences with their corresponding SEQ ID NOs.

TABLE 1 m/z Peptide AA MS m/z Δ Mod. sequenced Distance Type Sequence Location (Observed) z (PPM) Position in MS3 z (Cα-Cα) References 2 Ac-GDVEKGK G1-K7  565.30 3 1 KT5  860.38 1 5.3 A 19, 20, 21, KIFVQK K8-K13 KA8  408.75 2 31 2 Ac-GDVEKGK G1-K7  603.81 2 0 KA5  828.41 1 13.0 A 21, 31, 43 KK K87-K88 K87*  860.38 2 Ac-GDVEKGK G1-K7  516.93 3 0 KT5  660.38 1 13.0 A 21, 31 KKGER K87-R91 KA87  336.20 2 2 Ac-GDVEKGK G1-K7  474.23 3 2 KA5  414.71 2 13.0 A N/A KGER K88-R91 K88* 2 Ac-GDVEKGK G1-K7  675.35 3 4 KT5  860.38 1 13.2 A N/A EDLIAYLKK E92-K100 KA99  573.83 2 2 Ac-GDVEKGKK G1-K8  445.57 3 1 KA7  478.76 2 15.7 A 21, 31 KK K87-K88 K87* 2 Ac-GDVEKGKK G1-K8  419.97 4 0 KA7  478.76 2 15.7 A 21, 31 KKGER K87-K91 KT87 352.418 2 2 GKK G6-K8  641.67 3 0 K7*  760.39 2 18.7 A 14, 31, 43 HKTGPNLHGLFGR H26-R38 KT27 2 GKK G6-K8  526.26 2 0 K7*  616.29 1 9.9 A 21, 43 KATNE K100-E104 KA100 2 KIFVQK K8-K13  398.90 3 2 KT8  424.74 2 14.8 A 31 KK K87-K88 K87* 2 KIFVQK K8-K13  384.97 4 2 KA8  408.75 2 14.8 A 31 KKGER K87-R91 KT87  352.18 2 2 KIFVQK K8-K13  494.59 3 2 KA8  406.75 2 13.7 A 21, 31 KATNE K100-E104 K100* 2 GGKHK G23-K27  756.70 3 2 KT25  612.29 2 19.3 A N/A KTGQAPGFSYTDANK K39-K53 KA39  819.89 2 KTGQAPGFSYTDANK K39-K53  945.47 3 3 KA39  819.89 2 15.1 A 31 EDLIAYLKK E92-K100 KT99 1178.62 1 2 KTGQAPGFSYTDANK K39-K53  768.69 3 0 KT99  835.88 2 18.0 A 21, 31, 43 KATNE K100-E104 K100* 2 TGQAPGFSYTDANKNK T40-K55 1104.21 3 2 KT53  892.90 2 11.6 A 31 YIPGTKMoxIFAGIK Y74-K86 KA79 1508.82 1 2 KYIPGTK K73-K79  629.68 3 2 KT73  892.43 1 13.2 A 31 MoxIFAGIKK M80-K87 KT86 1009.52 1 2 MIFAGIKK M80-K87  389.21 4 2 KT86  497.27 2 6.4 A 31 KGER K88-R91 K88* 2 MoxIFGIKK M80-K87  393.21 4 2 KT86  505.27 2 6.4 A 31 KGER K88-R91 K88* *They were identified from different pair ions by MS3. They were identifed from different fragment pair ions by MS3. Note: All of the inter-linked peptides displayed characteristic fragment pairs and were identified by Batch-Tag, MS-Bridge, and Link-Finder.

TABLE 2 m/z Peptide AA MS m/z Δ Mod. sequenced Distance Type Sequence Subunit Location (Observed) z (PPM) Position in MS3 z (Cα-Cα) 2 ATATGPKQQEITTNLENHFK α1  A168-K187 595.10 5 2 KA174  571.29 4 14.8 Å (PRS2/SCL1) KVPDK α1  K58-K62 KT58  672.34 1 (PRS2/SCL1) 2 KVAHTSYK α2 (PRE8) K91-K98 477.51 4 2 KT91  510.25 2  5.1 Å VLVDKSR α2 (PRE8) V84-R90 KA88  435.76 2 2 IFKPQEIK α3 (PRE9) I229-K236 514.03 4 0 KT231  544.80 2 14.2 Å LYKLNDK α3 (PRE9) L66-K72 KA68  474.26 2 2 IHAQNYLKTYNEDIPVEILVR α3 (PRE9) I93-R113 904.47 4 1 KT100 1307.58 2 10.6 Å YKTNLYK β3 (PUP3) Y69-K75 KA70  492.27 2 2 EFLKNYDR α4 (PRE6) E173-R181 692.33 3 2 KA177  634.30 2 13.1 Å NSKTVR α4 (PRE6) N167-R172 KA169  379.71 2 2 ILKQVMEEK α5 (PUP2) I203-K211 641.01 3 0 KT205  602.31 2 10.5 Å ELEK α5 (PUP2) E242-K246 K244* 2 SYKFPR β2 (PUP1) S202-R207 539.26 3 1 KA204  426.23 2 12.1 Å EEKQK β2 (PUP1) E197-K201 KT199  747.34 1 2 YKTNLYK β3 (PUP3) Y69-K42 587.64 3 2 KA70  492.26 2 10.7 Å LKEER β3 (PUP3) Y199-R203 KA77  364.70 2 2 LGSQSLGVSNKFEK β3 (PUP3) L29-K42 595.05 4 2 KT39  790.40 2 13.2 Å YLKMoxR β3 (PUP3) Y199-R203 KA201  390.71 2 2 NKPELYQIDYLGTK β4 (PRE1) N112-R203 833.92 4 0 KA113  868.45 2 19.1 Å LGSQSLGVSNKFEK β3 (PUP3) L29-K42 KT39  790.39 2 2 VQDSVILASSKAVTR β4 (PRE1) V9-R23 633.74 5 1 KA19  543.30 3  7.8 Å GISVLKDSDDKTR β4 (PRE1) G24-R36 KT29  460.38 2 2 FKNSVK β6 (PRE7) F59-K64 532.29 3 2 KT60  808.40 1 16.2 Å KLAVER α6 (PRE5) K102-R107 KA102  385.23 2 2 NQYEPGTNGKVK β6 (PRE7) N149-K160 659.68 3 0 KA158  694.84 2  9.8 Å KPLK β6 (PRE7) K161-K164 K161* *Peptide fragments containing these sites were not sequenced by MS3. They were identified from different fragment pair ions by MS3. Mature sequence from crystal data was used for data analysis. Note: All of the inter-linked peptides displayed characteristic fragment pairs and were identified by Batch-tag, MS-Bridge and Link-Finder.

TABLE 3 MS m/z Expect- Identified Peptide AA m/z Δ Mod. sequenced Peptide ation in other Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value Refs 0 Ac-GDVEKGKK G1-K8 539.76 2 1 KT5 494.74 2 22.7 1.90E-05 21 (SEQ ID NO: 11) 0 KIFVQK K8-K13 469.76 2 2 KA8 408.75 2 19.1 1.00E-04 19, 20, 21 (SEQ ID NO: 35) 31 0 KTGQAPGFSYTDANK K39-K53 880.90 2 2 KT39 838.88 2 41.5 2.10E-10 19, 20, 21 (SEQ ID NO: 19) 41 0 TGQAPGFSYTDANKNK T40-K55 937.92 2 0 KT53 892.90 2 28.8 4.60E-08 19, 31 (SEQ ID NO: 46) 0 KYIPGTK K73-K79 491.75 2 2 KA73 430.75 2 23.9 1.40E-05 20, 21, 31 (SEQ ID NO: 51) 0 YIPGTKMoxIFAGIK y74-K86 815.92 2 2 KT79 770.90 2 18.3 5.00E-06 19, 31 (SEQ ID NO: 49) 0 MoxIFAGIKK M80-K87 550.28 2 1 KT86 505.27 2 22.0 4.20E-06 31 (SEQ ID NO: 54) 0 EDLIAYLKK E92-K100 634.83 2 1 KA99 573.83 2 32.9 2.70E-07 21, 31 (SEQ ID NO: 39) MS m/z Expect- Identified Peptide AA m/z Δ Mod. sequenced Peptide ation Distance in other Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value (Cα-Cα) Refs 1 Ac-GDVEKGKK G1-K8 530.75 2 2 KA5, KT7 521.76 2 19.5 6.20E-05  5.4 Å 21 (SEQ ID NO: 11) 1 GGKHKTGPNLHGLFGR G23-R38 611.98 3 0 KA25, KT27 605.98 3 37.7 2.80E-08  6.3 Å 14, 19, 20, (SEQ ID NO: 23) 21, 31, 42 1 KYIPGTKMoxIFAGIK K73-K86 870.96 2 2 K73, K79* 12.1 Å 31 (SEQ ID NO: 114) 1 KYIPGTKMoxIFAGIKK K73-K87 623.67 3 2 K73, K86* 13.2 Å 31 (SEQ ID NO: 116) 1 MoxIFAGIKKK M80-K88 605.32 2 2 KA86, KT87 596.32 2 29.5 1.10E-08 14, 19, 20, (SEQ ID NO: 118) 21, 31, 42 1 KKGER K87-R91 388.19 2 1 KA87, KS88* 20, 21 (SEQ ID NO: 13) 1 EDLIAYLKKATNE E92-E104 833.41 2 3 K99, KT100 824.40 2 28.7 1.50E-06 20, 21 (SEQ ID NO: 119) MS m/z Best Best Ex- Peptide AA m/z Δ Mod. 
sequenced Discovery pectation Distance Ref- Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value (Cα-Cα) erences 2 Ac-GDVEKGK G1-K7 565.30 3 1 KT5 860.38 1 19.7 2.70E-05  5.3 Å 19, 20,  (SEQ ID NO: 32) 21, 31 KIFVQK K8-K13 KA8 408.75 2 20.3 1.90E-05 (SEQ ID NO: 35) 2 Ac-GDVEKGK G1-K7 603.81 2 0 KA5 828.41 1 23.1 2.70E-06 13.0 Å 21, 31,  (SEQ ID NO: 32) 43 KK K87-K88 K87* 2 Ac-GDVEKGK G1-K7 516.93 3 0 KT5 860.38 1 19.7 2.70E-06 13.0 Å 21, 31 (SEQ ID NO: 32) KKGER K87-R91 KA87 336.20 2 14.8 1.50E-04 (SEQ ID NO: 23) 2 Ac-GDVEKGK G1-K7 474.23 3 2 KA5 414.71 2 25.5 8.60E-07 13.0 Å (SEQ ID NO: 32) KGER K88-R91 k88* (SEQ ID NO: 38) 2 Ac-GDVEKGK G1-K7 675.35 3 4 KT5 860.38 1 19.7 2.70E-05 13.2 Å (SEQ ID NO: 32) EDLIAYLKK E92-K10 KA99 573.83 2 32.9 2.10E-07 (SEQ ID NO: 30) 2 Ac-GDVEKGKK G1-K8 445.57 3 1 KA7 478.76 2 23.1 7.50E-06 15.7 Å 21, 31 (SEQ ID NO: 11) KK K87-K88 K87* 2 Ac-GDVEKGKK G1-K8 419.97 4 0 KA7 478.76 2 22.0 2.20E-05 15.7 Å 21, 31 (SEQ ID NO: 11) KKGER K87-K91 KT87 352.18 2 15.5 1.40E-03 (SEQ ID NO: 13) 2 GKK G6-K8 641.67 3 0 K7* 18.7 Å 14, 31, HKTGPNLHGLFGR H26-R38 KT27 760.39 2 35.0 7.10E-11 43 (SEQ ID NO: 16) 2 GKK G6-K8 526.26 2 0 KT*  9.9 Å 21, 43 KATNE K100-E104 KA100 616.29 1 14.2 2.40E-09 (SEQ ID NO: 42) 2 KIFVQK K8-K13 398.90 3 2 KT8 424.74 2 19.4 1.40E-04 14.8 Å 31 (SEQ ID NO: 35) KK K87-K88 KT87* 2 KIFVQK K8-K13 384.97 4 2 KA8 408.75 2 20.3 1.90E-05 14.8 Å 31 (SEQ ID NO: 35) KKGER K87-K91 KT87 352.18 2 15.0 1.00E-04 (SEQ ID NO: 13) 2 KIFVQK K8-K13 494.59 3 2 KA8 408.75 2 20.6 3.20E-05 13.7 Å 21, 31 (SEQ ID NO: 35) KATNE K100-E104 K100* (SEQ ID NO: 42) 2 GGKHK G23-K27 756.70 3 2 KT25 612.29 1 9.0# 8.00E-03 19.3 Å (SEQ ID NO: 44) KTGQAPGFSYTDANK K39-K53 KA39 819.89 2 44.7 5.70E-11 (SEQ ID NO: 19) 2 KTGQAPGFSYTDANK K39-K53 945.47 3 3 KA39 819.89 2 42.5 2.50E-10 15.1 Å 31 (SEQ ID NO: 19) EDLIAYLKK E92-K100 KT99 1178.62 1 22.9 1.80E-05 (SEQ ID NO: 39) 2 KTGQAPGFSYTDANK K39-K53 768.69 3 0 KT39 835.88 2 39.9 1.20E-09 18.0 Å 21, 31, (SEQ 
ID NO: 19) 43 KATNE K100-E104 K100* (SEQ ID NO: 42) 2 TGQAPGFSYTDANKNK T40-K55 1104.21 3 2 KT53 892.90 2 28.8 4.60E-08 11.6 Å 31 (SEQ ID NO: 46) YIPGTKMoxIFAGIK Y74-K86 KA79 1508.82 1 9.3# 1.00E-03 (SEQ ID NO: 49) 2 KYIPGTK K73-K79 629.68 3 2 KT73 892.46 1 17.6 2.00E-05 13.2 Å 31 (SEQ ID NO: 51) MoxIFAGIKK M80-K87 KT86 1009.52 1 15.0 2.10E-05 (SEQ ID NO: 54) 2 MIFAGIKK M80-K87 389.21 4 2 KT86 497.27 2 18.9 5.00E-05  6.4 Å 31 (SEQ ID NO: 53) KGER K88-R91 K88* 505.27 (SEQ ID NO: 38) 2 MoxIFAGIKK M80-K87 393.21 4 2 KT86 2 24.0 4.20E-07  6.4 Å 31 (SEQ ID NO: 54) KGER K88-R91 K88* MS m/z Expect- Dis- Identified Peptide AA m/z Δ Mod. sequenced Peptide ation tance In Other Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value (Cα-Cα) Refs 0,0 GGKHKTGPNLGHLFGR G23-R38 507.74 4 -2 KA28, KA27 446.74 4 28.0 1.10E-06 (SEQ ID NO: 23) 0,1 YIPGTKMoxIFAGIKKK Y74-588 682.34 3 1 KA79, KA86,  635.67 3 24.6 3.60E-05 (SEQ ID NO: 121) KT87 0,1 MoxIFAGIKKKGER M80-R91 576.61 3 2 KA86, KA87, 529.94 3 31.8 1.20E-05 (SEQ ID NO: 123) KT88 0,1 MoxIFAGIKKKGER M80-R91 864.41 2 1 KT86, KA87, 794.41 2 34.0 2.00E-08 (SEQ ID NO: 323) KA88 0,2 Ac-GDVEKGKK G1-K8 899.40 2 1 K5, K7* ~11.3 Å (SEQ ID NO: 11) KATNE K100-E104 KA100 616.29 1 14.2 2.60E-08 (SEQ ID NO: 42) 0,2 GKK G6-K8 469.04 5 0 K7* ~18.7 Å GGKHKTGPNLHGLFGR G23-R38 KA25,  446.74 4 22.3 4.20E-06 (SEQ ID NO: 23) KA27 0,2 GKKIFVQK G6-K13 519.28 3 2 KT7,  KA8 544.30 2 23.1 1.90E-05 ~15.3 Å (SEQ ID NO: 124) KK K87-K88 K87* 1,2 Ac-GDVEVGK G1-K7 828.40 3 0 KT5 860.38 1 19.5 3.20E-05 ~13.8 Å (SEQ ID NO: 32) MoxIFAGIKKKGER M80-R91 KA86, KA87, 794.41 2 36.3 2.00E-09 (SEQ ID NO: 123) KT88 1,2 Ac-GDVEKGKKIFVQK G1-K13 799.06 3 2 KT5, KT7, 872.43 2 18.7 1.20E-04 ~12.1 Å (SEQ ID NO: 126) KA8 KATNE K100-E104 KA100 616.30 1 14.2 2.40E-09 (SEQ ID NO: 42) 1,2 KYIPGTK K73-K79 839.10 3 1 KT73 892.46 1 17.6 2.00E-05 ~15.3 Å (SEQ ID NO: 53) MoxIFAGIKKKGER M80-R91 KA86, KT87, 794.41 2 36.3 2.00E-09 (SEQ ID NO: 323) KA88 2,2 Ac-GDVEKGKK G1-K8 
599.79 4 0 K5, K7* ~14.38  Å (SEQ ID NO: 11) KKGER K87-R91 KA87 336.20 2 14.8 1.50E-04 ~11.3  Å (SEQ ID NO: 13) KATNE K100-E104 K100* (SEQ ID NO: 42)
*Peptide fragments containing these sites were not sequenced by MS3.
**These intra-linked peptides were identified by MS/MS.
#These MS3 data were considered due to the presence of other lines of evidence for identifying the cross-linked peptides.
They were identified from different charged fragment pair ions by MS3.
Note: Type 0: dead-end; Type 1: intra-linked; Type 0,1; 0,2; 1,2; 2,2: multi-linked. All of the peptides displayed characteristic fragment pairs. All of the cross-linked peptides were identified by Link-Finder, Batch-tag and MS-Bridge.

TABLE 4

Type | Peptide Sequence | AA Location | MS m/z (Observed) | z | Δ (PPM) | Mod. Position | m/z sequenced in MS3 | z | Peptide Score | Expectation Value | Identified in other Refs
0 | MQIFVKTLTGK (SEQ ID NO: 127) | M1-K11 | 721.38 | 2 | 9 | KT6 | 676.36 | 2 | 30.1 | 5.40E-08 | 19, 38
0 | AKIQDK (SEQ ID NO: 128) | A28-K33 | 439.72 | 2 | 7 | KT29 | 394.71 | 2 | 18.0 | 2.40E-04 |
0 | LIFAGKQLEDGR (SEQ ID NO: 60) | L43-R54 | 761.89 | 2 | 10 | KT48 | 716.87 | 2 | 35.1 | 1.10E-07 | 19, 38
0 | LIFAGKQLEDGRTLSDYNIQK (SEQ ID NO: 129) | L43-K62 | 862.44 | 3 | 8 | KT48 | 832.43 | 3 | 34.1 | 1.20E-07 |
0 | TLSDYNIQKESTLHLVLR (SEQ ID NO: 64) | T55-R72 | 769.40 | 3 | 10 | KA63 | 728.73 | 3 | 36.1 | 1.40E-07 | 19, 38

Type | Peptide Sequence | AA Location | MS m/z (Observed) | z | Δ (PPM) | Mod. Position | m/z sequenced in MS3 | z | Peptide Score | Expectation Value | Distance (Cα-Cα) | Identified in Other Refs
1 | AKIQDKEGIPPDQQR (SEQ ID NO: 130) | A28-R42 | 940.97 | 2 | 5 | K29, K33 | 940.97 |  | 28.5 | 4.40E-07 | 6.42 Å | 19

Type | Peptide Sequence | AA Location | MS m/z (Observed) | z | Δ (PPM) | Mod. Position | m/z sequenced in MS3 | z | Peptide Score | Expectation Value | Distance (Cα-Cα) | Identified in Other Refs
2 | TLTGKTITLEVEPSDTIENVK (SEQ ID NO: 57) | T7-K27 | 993.01 | 4 | 5 | K11* |  |  |  |  | 13.3 Å | 38
  | IQDKEGIPPDQQR (SEQ ID NO: 58) | I30-R42 |  |  |  | KA33 | 789.41 | 2 | 28.6 | 3.20E-08 |  |
2 | LIFAGKQLEDGR (SEQ ID NO: 60) | L43-R54 | 713.38 | 4 | 5 | KA48 | 700.88 | 2 | 39.2 | 1.00E-08 | 15.3 Å | 19
  | LIFAGKQLEDGR (SEQ ID NO: 60) | L43-R54 |  |  |  | KA48 | 716.87 | 2 | 36.4 | 1.90E-08 |  |
2 | LIFAGKQLEDGR (SEQ ID NO: 60) | L43-R54 | 909.24 | 4 | 9 | KA48 | 700.89 | 2 | 35.5 | 1.80E-08 | 15.4 Å | 19, 38
  | TLSDYNIQKESTLHLVLR (SEQ ID NO: 64) | T55-R72 |  |  |  | KT63 | 1108.58 | 2 | 31.3 | 1.20E-08 |  |

TABLE 5 MS m/z Expect- Peptide AA m/z Δ Mod. sequenced Peptide ation Type Sequence Subunit Loctation (Observed z (PPM) Position in MS3 z Score Value 0 AKAEAAEFR α1(PRS2/SCL1) A97-R105 584.77 2 -1 KT98 539.75 2 35.0 1.50E-04 (SEQ ID NO: 131) 0 VLVDKSR α2 (PRE8) V84-R90 496.76 2 0 KA88 435.76 2 23.8 4.60E-04 (SEQ ID NO: 72) 0 TFLEKR α2 (PRE8) T173-R178 485.24 2 1 KA177 424.24 2 22.9 3.30E-04 (SEQ ID NO: 132) 0 KVTSTLLEQDTSTEK α3 (PRE9) K51-65 928.45 2 0 KA51 867.45 2 47.2 3.50E-09 (SEQ ID NO: 133) 0 STLKLQDTR α4 (PRE6) S50-R58 619.31 2 1 KA53 558.31 2 36.3 3.90E-05 (SEQ ID NO: 134) 0 ITPSKVSK α4 (PRE6) I59-K66 518.27 2 1 KT63 473.26 2 21.3 2.30E-03 (SEQ ID NO: 135) 0 ILIEKAR α4 (PRE6) I84-R90 509.78 2 -1 KT88 464.77 2 27.4 1.40E-03 (SEQ ID NO: 136) 0 NSKTVR α4 (PRE6) N157-R172 440.71 2 1 KT176 395.70 2 22.1 5.90E-03 (SEQ ID NO: 85) 0 EFLEKNYDR α4 (PRE6) E173-R181 695.30 2 -1 KT177 650.29 2 30.9 1.40E-05 (SEQ ID NO: 83) 0 TAELIKELK α5 (PUP2) T236-K244 610.82 2 -4 KT241 565.81 2 36.3 1.80E-04 (SEQ ID NO: 137) 0 KLAVER α6 (PRE5) K12-R107 446.23 2 2 KA102 385.23 2 18.1 3.00E-04 (SEQ ID NO: 305) 0 LLVPQNKVK α7 (PRE10) L58-K66 607.84 2 1 KT63 562.83 2 24.2 9.20E-05 (SEQ ID NO: 138) 0 AELEKLVDHHPEGLSAR α7 (PRE10) A174-R190 693.00 3 -2 KT178 663.00 2 33.5 6.70E-06 (SEQ ID NO: 109) 0 EAVKQAAK α7 (PRE10) E191-K198 510.76 2 2 KT194 465.74 2 26.9 7.10E-04 (SEQ ID NO: 139) 0 YKTNLYK β3 (PUP3) Y69-K75 553.27 2 3 KT70 508.25 2 25.7 9.60E-05 (SEQ ID NO: 82) 0 TNLYKLK β3 (PUP3) T71-K77 528.27 2 -5 KA75 467.27 2 25.9 2.50E-03 (SEQ ID NO: 140) 0 QELAKSIR β4 (PRE1) Q86-R93 560.79 2 2 KA90 499.79 2 22.5 4.00E-03 (SEQ ID NO: 141) 0 IVDKDGIR β4 (PRE1) I183-R190 546.27 2 1 KT186 501.26 2 30.6 1.40E-03 (SEQ ID NO: 142) 0 FKNSVK β6 (PRE7) F59-K64 449.72 2 1 KT60 404.71 2 19.0 1.90E-02 (SEQ ID NO: 103) 0 KLSINSAAR β6 (PRE7) K74-R82 568.29 2 3 KA74 507.29 2 32.8 2.00E-04 (SEQ ID NO: 143) 0 KEFYELK β6 (PRE7) K205-K211 566.77 2 2 KA205 505.77 2 24.7 5.40E-03 (SEQ ID NO: 144) MS m/z Expect- 
Peptide AA m/z Δ Mod. sequenced Peptide ation Distance Type Sequence Subunit Location (Observed) z (PPM) Position in MS3 z Score Value (Cα-Cα) 2 ATATGPKQQEITTNLENHFK α1(PRS2/SCL1) A168-K187 595.10 5 2 KA174  571.29 4 24.5 2.90E-04 14.8  Å (SEQ ID NO: 66) KVPDK α1(PRS2/SCL1) K58-K62 KT58  672.34 1 12.3 0.71** (SEQ ID NO: 68) 2 KVAHTSYK α2 (PRE8) K91-K98 477.51 4 2 KT91  510.25 2 29.9 7.60E-05  5.1  Å (SEQ ID NO: 70) VLVLDKSR α2 (PRE8) V84-R90 KA88  435.76 2 27.6 2.50E-03 (SEQ ID NO: 72) KVAHTSYK α2 (PRE8) K91-K98 382.21 5 1 KA91  329.85 3 19.3 1.90E-02 (SEQ ID NO: 70) VLVDKSR α2 (PRE8) V84-R90 KT88  451.74 2 25.4 2.50E-04 (SEQ ID NO: 72) 2 IFKPQEIK α3 (PRE9) I229-K236 514.03 4 0 KT231  544.80 2 23.6 1.50E-02 14.2  Å (SEQ ID NO: 74) LYKLNDK α3 (PRE9 L66-K72 KA68  474.26 2 25.5 5.50E-03 (SEQ ID NO: 86) 2 IHAQNYLKTYNEDIPVEILVR α3 (PRE9) I93-R113 904.47 4 1 KT100 1307.68 2 26.6 7.90E-05 10.6  Å (SEQ ID NO: 78) YKTNLYK β3 (PUP3)  Y69-K75 KA70  492.27 2 23.9 3.00E-03 (SEQ ID NO: 80) 2 IHAQNYLKTYNEDIPVEILVR α3 (PRE9) I93-R113 723.78 5 5 K100* (SEQ ID NO: 78) YKTNLYK β3 (PUP3)  Y69-K75 KA70  492.27 2 24.2 2.90E-03 (SEQ ID NO: 80) 2 EFLEKNYDR α4 (PRE6) E173-R181 692.33 3 2 KA177*  634.30 2 23.6 2.60E-04 13.1  Å (SEQ ID NO: 83) NSKTVR α4 (PRE6) N167-R172 KA169  379.71 2 22.6 2.80E-03 (SEQ ID NO: 85) EFLEKNYDR α4 (PRE6) E173-R181 519.50 4 2 KT177  650.29 2 33.2 1.70E-05 (SEQ ID NO: 83) NSKTVR α4 (PRE6) N167-R172 KA169  379.71 2 22.6 2.80E-03 (SEQ ID NO: 85) 2 ILKQVMEEK α5 (PUP2) I203-K211 641.01 3 0 KT205  602.31 2 29.2 3.50E-03 10.5 Å (SEQ ID NO: 87) ELKEK α5 (PUP2) E242-K246 K244* (SEQ ID NO: 89) ILKQVMEEK α5 (PUP2) I203-K211 481.01 4 0 KT205  602.31 2 27.6 2.60E-04 (SEQ ID NO: 87) ELKEK α5 (PUP2) E242-K246 K244* (SEQ ID NO: 89) 2 SYKFPR β2 (PUP1) S202-R207 539.26 3 1 KA204  426.23 2 23.1 6.40E-03 12.1  Å (SEQ ID NO: 90) EEKQK β2 (PUP1) E197-K201 KT199  747.34 1 10.4 0.3** (SEQ ID NO: 92) SYKFPR β2 (PUP1) S202-R207 404.70 4 2 KT204  442.21 2 21.1 8.20E-04 (SEQ ID NO: 90) 
EEKQK β2 (PUP1) E197-K201 K199* (SEQ ID NO: 92) 2 YKTNLYK β3 (PUP3) Y69-K75 587.64 3 2 KA70  492.26 2 23.8 4.60E-04 10.7  Å (SEQ ID NO: 30) LKEER β3 (PUP3) L76-R80 KA77  364.70 2 17.0 2.70E-02 (SEQ ID NO: 94) YKTNLYK β3 (PUP3) Y69-K75 440.98 4 2 KT70  508.25 2 25.7 1.10E-04 (SEQ ID NO: 30) LKEER β3 (PUP3) L76-R80 KA77  364.70 2 16.5 8.40E-03 (SEQ ID NO: 94) 2 LGSQSLGVSNKFEK β3 (PUP3) L29-K42 793.07 3 2 KA39  774.41 2 42.0 5.30E-07 13.2  Å (SEQ ID NO: 29) YLKMoxR β3 (PUP3) Y199-R203 KT201  406.69 2 16.2 1.10E-03 (SEQ ID NO: 97) LGSQSLGVSNKFEK β3 (PUP3) L29-K42 595.05 4 2 KT39  790.40 2 40.7 8.40E-07 (SEQ ID NO: 29) YLKMoxR β3 (PUP3) Y199-R203 KA201  390.71 2 18.1 6.10E-03 (SEQ ID NO: 97) 2 NKPWLYQIDYLGTK β4 (PRE1) N112-K125 833.92 4 0 KA113  868.45 2 32.0 9.50E-08 19.1  Å (SEQ ID NO: 27) LGSQSLGVSNKFEK β3 (PUP3) L29-K42 KT39  790.39 2 26.5 3.90E-05 (SEQ ID NO: 29) 2 VQDSVILASSKAVTR β4 (PRE1) V9-R23 633.74 5 1 KA19  543.30 3 23.0 4.90E-03  7.8  Å (SEQ ID NO: 99) GISVLKDSDDKTR β4 (PRE1) G24-R36 KT29  760.38 2 35.4 2.40E-05 (SEQ ID NO: 101) 2 FKNSVK β6 (PRE7) F59-K64 532.29 3 2 KT60  808.40 1 16.2 2.00E-02 16.2  Å (SEQ ID NO: 103) KLAVER α6 (PRE5) K102-R107 KA102  385.23 2 21.2 9.80E-04 (SEQ ID NO: 105) 4 2 FKNSVK β6 (PRE7) F59-K64 399.47 KT60  404.71 2 16.5 1.10E-02 (SEQ ID NO: 103) KLAVER α6 (PRE5) K102-R107 KA102  385.23 2 18.3 1.60E-04 (SEQ ID NO: 105) 2 NQYEPGTNGKVK β6 (PRE7) N149-K160 659.68 3 0 KA158  694.84 2 29.8 4.20E-05  9.8 Å (SEQ ID NO: 106) KPLK β6 (PRE7) K161-K164 K161* (SEQ ID NO: 108) NQYEPGTNGKVK β6 (PRE7) N149-K160 495.01 4 2 KT158  710.83 26.3 3.00E-04 (SEQ ID NO: 106) KPLK β6 (PRE7) K161-K164 K161* (SEQ ID NO: 108) *Peptide fragment containing theses sites were not sequenced by MS3. **The peptide identification was above 1% false positive rat but MS3 was validated manually. They were identified from different fragment pair ions by MS3. Mature sequence from crystal data was used for data analysis. 
Note: Type 0: dead-end All of the peptides displayed characteristic fragment pairs. All of the cross-linked peptides were identified by Link-Finder, Batch-tag, MS-Bridge.

TABLE 6 SEQ ID NO: Sequence SEQ ID NO: 1 IEAEKGR SEQ ID NO: 2 Ac-IEAEKGR SEQ ID NO: 3 Ac-IEAEKAGR SEQ ID NO: 4 Ac-IEAEKSGR SEQ ID NO: 5 ASQKRPSQRHG SEQ ID NO: 6 Ac-ASQKRPSQRHG SEQ ID NO: 7 RPKPQQF SEQ ID NO: 8 RPKAPQQF SEQ ID NO: 9 RPKDNPQQF SEQ ID NO: 10 GDVEKGKK SEQ ID NO: 11 Ac-GDVEKGKK SEQ ID NO: 12 Ac-GDVEKAGKK SEQ ID NO: 13 KKGER SEQ ID NO: 14 KTKGER SEQ ID NO: 15 KAKGER SEQ ID NO: 16 HKTGPNLHGLFGR SEQ ID NO: 17 HKATGPNLHGLFGR SEQ ID NO: 18 HKTTGPNLHGLFGR SEQ ID NO: 19 KTGQAPGFSYTDANK SEQ ID NO: 20 KDNTGQAPGFSYTDANK SEQ ID NO: 21 KATGQAPGFSYTDANK SEQ ID NO: 22 KTTGQAPGFSYTDANK SEQ ID NO: 23 GGKHKTGPNLHGLFGR SEQ ID NO: 24 GGK*HK*TGPNLHGLFGR SEQ ID NO: 25 GGKAHKTTGPNLHGLFGR SEQ ID NO: 26 GGKTHKATGPNLHGLFGR SEQ ID NO: 27 NKPELYQIDYLGTK SEQ ID NO: 28 NKAPELYQIDYLGTK SEQ ID NO: 29 LGSQSLGVSNKFEK SEQ ID NO: 30 LGSQSLGVSNKTFEK SEQ ID NO: 31 GDVEKGK SEQ ID NO: 32 Ac-GDVEKGK SEQ ID NO: 33 Ac-GDVEKTGK SEQ ID NO: 34 Ac-GDVEKAGK SEQ ID NO: 35 KIFVQK SEQ ID NO: 36 KAIFVQK SEQ ID NO: 37 KTIFVQK SEQ ID NO: 38 KGER SEQ ID NO: 39 EDLIAYLKK SEQ ID NO: 40 EDLIAYLKAK SEQ ID NO: 41 EDLIAYLKTK SEQ ID NO: 42 KATNE SEQ ID NO: 43 KAATNE SEQ ID NO: 44 GGKHK SEQ ID NO: 45 GGKTHK SEQ ID NO: 46 TGQAPGFSYTDANKNK SEQ ID NO: 47 TGQAPGFSYTDANKTNK SEQ ID NO: 48 YIPGTKMIFAGIK SEQ ID NO: 49 YIPGTKMOXIFAGIK SEQ ID NO: 50 YIPGTKAMOXIFAGIK SEQ ID NO: 51 KYIPGTK SEQ ID NO: 52 KTYIPGTK SEQ ID NO: 53 MIFAGIKK SEQ ID NO: 54 MOXIFAGIKK SEQ ID NO: 55 MOXIFAGIKTK SEQ ID NO: 56 MIFAGIKTK SEQ ID NO: 57 TLTGKTITLEVEPSDTIENVK SEQ ID NO: 58 IQDKEGIPPDQQR SEQ ID NO: 59 IQDKAEGIPPDQQR SEQ ID NO: 60 LIFAGKQLEDGR SEQ ID NO: 61 LIFAGKAQLEDGR SEQ ID NO: 62 LIFAGKTQLEDGR SEQ ID NO: 63 LIFAGK48QLEDGR SEQ ID NO: 64 TLSDYNIQKESTLHLVLR SEQ ID NO: 65 TLSDYNIQKTESTLHLVLR SEQ ID NO: 66 ATATGPKQQEITTNLENHFK SEQ ID NO: 67 ATATGPKAQQEITTNLENHFK SEQ ID NO: 68 KVPDK SEQ ID NO: 69 KTVPDK SEQ ID NO: 70 KVAHTSYK SEQ ID NO: 71 KTVAHTSYK SEQ ID NO: 72 VLVDKSR SEQ ID NO: 73 VLVDKASR SEQ ID NO: 74 IFKPQEIK SEQ ID NO: 75 IFKTPQEIK SEQ 
ID NO: 76 LYKLNDK SEQ ID NO: 77 LYKALNDK SEQ ID NO: 78 IHAQNYLKTYNEDIPVEILVR SEQ ID NO: 79 IHAQNYLKTTYNEDIPVEILVR SEQ ID NO: 80 YKTNLYK SEQ ID NO: 81 YKATNLYK SEQ ID NO: 82 YKTTNLYK SEQ ID NO: 83 EFLEKNYDR SEQ ID NO: 84 EFLEKANYDR SEQ ID NO: 85 NSKTVR SEQ ID NO: 86 NSKAkTVR SEQ ID NO: 87 ILKQVMEEK SEQ ID NO: 88 ILKTQVMEEK SEQ ID NO: 89 ELKEK SEQ ID NO: 90 SYKFPR SEQ ID NO: 91 SYKAFPR SEQ ID NO: 92 EEKQK SEQ ID NO: 93 EEKTQK SEQ ID NO: 94 LKEER SEQ ID NO: 95 LKAEER SEQ ID NO: 96 YLKMR SEQ ID NO: 97 YLKMOXR SEQ ID NO: 98 YLKAMOXR SEQ ID NO: 99 VQDSVILASSKAVTR SEQ ID NO: 100 VQDSVILASSKAkAVTR SEQ ID NO: 101 GISVLKDSDDKTR SEQ ID NO: 102 GISVLKTDSDDKTR SEQ ID NO: 103 FKNSVK SEQ ID NO: 104 FKANSVK SEQ ID NO: 105 KLAVER SEQ ID NO: 106 NQYEPGTNGKVK SEQ ID NO: 107 NQYEPGTNGKAVK SEQ ID NO: 108 KPLK SEQ ID NO: 109 AELEKLVDHHPEGLSAR SEQ ID NO: 110 AELEKDNLVDHHPEGLSAR SEQ ID NO: 111 AELEKALVDHHPEGLSAR SEQ ID NO: 112 AELEKTLVDHHPEGLSAR SEQ ID NO: 113 KYIPGTKMIFAGIK SEQ ID NO: 114 KYIPGTKMoxIFAGIK SEQ ID NO: 115 KYIPGTKMIFAGIKK SEQ ID NO: 116 KYIPGTKMoxIFAGIKK SEQ ID NO: 117 MIFAGIKKK SEQ ID NO: 118 MoxIFAGIKKK SEQ ID NO: 119 EDLIAYLKKATNE SEQ ID NO: 120 YIPGTKMIFAGIKKK SEQ ID NO: 121 YIPGTKMoxIFAGIKKK SEQ ID NO: 122 MIFAGIKKKGER SEQ ID NO: 123 MoxIFAGIKKKGER SEQ ID NO: 124 GKKIFVQK SEQ ID NO: 125 GDVEKGKKIFVQK SEQ ID NO: 126 Ac-GDVEKGKKIFVQK SEQ ID NO: 127 MQIFVKTLTGK SEQ ID NO: 128 AKIQDK SEQ ID NO: 129 LIFAGKQLEDGRTLSDYNIQK SEQ ID NO: 130 AKIQDKEGIPPDQQR SEQ ID NO: 131 AKAEAAEFR SEQ ID NO: 132 TFLEKR SEQ ID NO: 133 KVTSTLLEQDTSTEK SEQ ID NO: 134 STLKLQDTR SEQ ID NO: 135 ITPSKVSK SEQ ID NO: 136 ILIEKAR SEQ ID NO: 137 TAELIKELK SEQ ID NO: 138 LLVPQKNVK SEQ ID NO: 139 EAVKQAAK SEQ ID NO: 140 TNLYKLK SEQ ID NO: 141 QELAKSIR SEQ ID NO: 142 IVDKDGIR SEQ ID NO: 143 KLSINSAAR SEQ ID NO: 144 KEFYELK Ac-Acetyl XaaA-Alkene modification XaaAk-Alkane modification XaaDN-Dead-end modification XaaT-Thiol modification XaaS-Sulfenic acid modification XaaOX-Oxidation *-Intra-peptide linkage 
Xaa48-Inter-peptide linkage

DETAILED DESCRIPTION

In a first aspect of the invention, a new crosslinking compound is provided and has the formula:

where X is selected from the group consisting of

wherein R is methyl or ethyl,
and

A particularly preferred cross-linking agent is bis(2,5-dioxopyrrolidin-1-yl) 3,3′-sulfinyldipropanoate (“DSSO”):

In a second aspect of the invention, the new cross-linking agents are used to facilitate mapping of protein-protein interactions of protein complexes. In one embodiment, the method comprises the steps of providing an MS-cleavable cross-linker having the formula described above; forming a cross-linked protein complex by cross-linking proteins with the MS-cleavable cross-linker; forming cross-linked peptide fragments that are chemically bound to the MS-cleavable cross-linker by digesting the cross-linked protein complex with an enzyme such as trypsin; and using mass spectrometry (MS) and MSn analysis to identify the protein and/or peptide fragments. For convenience, in the discussion that follows, reference is sometimes made to the particular cross-linker, DSSO. It will be understood, however, that any of the other MS-cleavable cross-linkers that fit the general formula may also be used. Thus, DSSO fragments, DSSO remnants, DSSO cross-linked peptides, and like language apply equally to other cross-linkers as described herein.

Abbreviations

MS: mass spectrometry
MS/MS: tandem mass spectrometry
MSn: multi-stage tandem mass spectrometry (n=2, 3, . . . )
LC MSn: liquid chromatography multi-stage tandem mass spectrometry
CID: collision induced dissociation
DSSO: bis(2,5-dioxopyrrolidin-1-yl) 3,3′-sulfinyldipropanoate
NMR: nuclear magnetic resonance

The CID-induced separation of inter-linked peptides in MS/MS permits MS3 analysis of single peptide chain fragment ions with defined modifications (due to diamide remnants) for easy interpretation and unambiguous identification using existing database searching tools. Integration of data analyses from the three generated datasets (MS, MS/MS and MS3) allows high confidence identification of DSSO cross-linked peptides. The efficacy of the newly developed DSSO-based cross-linking strategy has been demonstrated using model peptides and proteins. In addition, this method has been successfully employed for structural characterization of the yeast 20S proteasome complex. In total, 13 non-redundant inter-linked peptides of the 20S proteasome have been identified, representing the first application of an MS-cleavable cross-linker for the characterization of a multi-subunit protein complex. Given its effectiveness and simplicity, this cross-linking strategy can find a broad range of applications in elucidating structural topology of proteins and protein complexes.

In combination with new software developed for data integration, the inventors were able to identify DSSO cross-linked peptides from complex peptide mixtures with speed and accuracy. Given its effectiveness and simplicity, the inventors anticipate a broader application of this MS-cleavable cross-linker in the study of structural topology of other protein complexes using cross-linking and mass spectrometry.

Experimental Procedures

Materials and Reagents—

General chemicals were purchased from Fisher Scientific (Hampton, N.H.) or VWR International (West Chester, Pa.). Bovine heart cytochrome c (98% purity) and bovine erythrocyte ubiquitin (98% purity) were purchased from Sigma Aldrich (St. Louis, Mo.). Synthetic peptide Ac-IR7 (Ac-IEAEKGR (SEQ ID NO: 2), 98.1% purity) was synthesized by GL Biochem (Shanghai, China). Sequencing grade modified trypsin was purchased from Promega (Fitchburg, Wis.). The 20S proteasome core particle was affinity purified using a Pre1-TAP expressing yeast strain as previously described in Leggett, D. S., Hanna, J., Borodovsky, A., Crosas, B., Schmidt, M., Baker, R. T., Walz, T., Ploegh, H., and Finley, D. (2002) Multiple Associated Proteins Regulate Proteasome Structure and Function. Mol Cell. 10, 495-507.

Synthesis and Characterization of DSSO—

FIG. 2A displays a two-step synthesis scheme of DSSO with an extended spacer length of 10.1 Å. Sulfide S-1 was first synthesized by mixing 3,3′-thiodipropionic acid (2.50 g, 14.0 mmol) with N-hydroxysuccinimide (3.30 g, 28.6 mmol) in dioxane (60 ml). The reaction mixture was stirred under an atmosphere of argon, and a solution of DCC (5.79 g, 28.1 mmol) in dioxane (20 ml) was added drop-wise. After 12 h, the insoluble urea was filtered from the reaction. The filtrate was concentrated to form a white solid. The solid residue was washed with cold diethyl ether followed by cold hexanes. After drying under reduced pressure, 5.20 g (70%) of sulfide S-1 was recovered and used without further purification: 1H (500 MHz, DMSO-d6) δ 3.02 (t, J=7.0 Hz, 4H), 2.86 (t, J=7.0 Hz, 4H), 2.81 (s, 8H); 13C (125 MHz, DMSO-d6) δ 170.1, 167.8, 31.4, 25.6, 25.4; IR (KBr pellet) 1801, 1732 cm−1; HRMS (ES/MeOH) m/z calcd for C14H16N2O8SNa [M+Na]+ 395.0525. found 395.0531.

To synthesize DSSO, a solution of sulfide S-1 (0.600 g, 1.61 mmol) in CHCl3 (30 ml) at 0° C. was mixed with a solution of m-chloroperbenzoic acid (MCPBA) (0.371 g, 1.61 mmol) in CHCl3 (10 ml). The reaction product was filtered and washed with cold CHCl3 (10 ml) and cold MeOH (10 ml). The filtrate was cooled to −10° C. for 1 h, washed again with CHCl3 and MeOH, and dried under reduced pressure to yield 0.400 g (64%) of DSSO: 1H (600 MHz, DMSO-d6) δ 3.28-3.21 (m, 2H), 3.17-3.13 (m, 4H), 3.08-2.99 (m, 2H), 2.88-2.75 (s, 8H); 13C (125 MHz, DMSO-d6) δ 170.08, 167.74, 44.62, 25.46, 23.41; IR (KBr pellet) 2943, 1786, 1720 cm−1; HRMS (ES/MeOH) m/z calcd for C14H16N2O9SNa [M+Na]+ 411.0474, found 411.0471.
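The calculated [M+Na]+ values quoted for sulfide S-1 and DSSO can be checked from the elemental compositions. The following is a minimal sketch, not part of the original procedure. It assumes standard monoisotopic atomic masses, takes DSSO to be C14H16N2O9S (S-1 plus one oxygen), and adds the mass of a neutral sodium atom without an electron-mass correction, which appears to be how the quoted values were obtained. The function names are illustrative only.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
    "Na": 22.9897693,
}

# Elemental compositions of the neutral molecules (assumed from the text).
S1 = {"C": 14, "H": 16, "N": 2, "O": 8, "S": 1}    # sulfide S-1
DSSO = {"C": 14, "H": 16, "N": 2, "O": 9, "S": 1}  # S-1 plus one oxygen


def monoisotopic_mass(composition):
    """Sum the monoisotopic atomic masses for an {element: count} mapping."""
    return sum(MONO[element] * count for element, count in composition.items())


def sodium_adduct(composition):
    """Mass of [M+Na]+, ignoring the ~0.0005 u electron mass (an assumption)."""
    return monoisotopic_mass(composition) + MONO["Na"]


if __name__ == "__main__":
    print(f"S-1  [M+Na]+ = {sodium_adduct(S1):.4f}")
    print(f"DSSO [M+Na]+ = {sodium_adduct(DSSO):.4f}")
```

The 15.9949 u difference between the two adducts corresponds to the single oxygen added when the sulfide is oxidized to the sulfoxide.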

A similar synthetic approach is used to make the other symmetric diesters identified above and having the general structure 2, where X is as defined above. Thus, the symmetric sulfide is prepared by reacting 3,3′-thiodipropionic acid with the appropriate N-hydroxyamine (e.g., a functionalized analogue of N-hydroxysuccinimide (compounds 4-7), or other N-hydroxy-functionalized heterocycle (compounds 3, 8, and 9)), and then the sulfinyl group is made by treating the symmetric sulfide with MCPBA in CHCl3 or another appropriate solvent.

Cross-Linking of Synthetic Peptides with DSSO—

Synthetic peptides Ac-IR7, Ac-myelin and substance P were dissolved in DMSO to 1 mM and cross-linked with DSSO dissolved in DMSO at a ratio of 1:1 in the presence of 1 equivalent of diisopropylethylamine, as described in Vellucci, D., et al. The cross-linked peptide solution was then diluted to 1 pmol/μl in 4% ACN, 0.1% formic acid for liquid chromatography multi-stage tandem mass spectrometry (LC MSn) analysis.

Cross-Linking of Cytochrome C and Ubiquitin with DSSO—

Lyophilized bovine cytochrome c or ubiquitin was reconstituted in 1×PBS (pH 7.5) to 200 μM, 20 μl of which was mixed with 2 μl of 20 mM DSSO (in DMSO) at a molar ratio of 1:10 (protein:cross-linker) for the cross-linking reaction as described in Vellucci, D., et al. The cross-linked protein was digested with trypsin (1% w/w) overnight at 37° C. The cross-linked peptide digest was then diluted to 1 pmol/μl in 4% ACN, 0.1% formic acid for LC MSn analysis.

Cross-Linking of the Yeast 20S Proteasome with DSSO—

Affinity purified yeast 20S proteasome complex was concentrated by Microcon (Billerica, Mass.) to ~1.2 μM in 1×PBS buffer (pH 7.5). Typically 50 μl of the 20S proteasome was cross-linked with 3 μl DSSO (20 mM) dissolved in DMSO (final concentration ~1 mM) at a molar ratio of 1:1000 (protein:cross-linker). Cross-linking was performed for a half hour or overnight and quenched with excess ammonium bicarbonate buffer. Cysteine residues were reduced with 5 mM DTT at 56° C. for 30 min, and alkylated with 10 mM chloroacetamide for 30 min at room temperature. The cross-linked protein complex was digested with trypsin (2% w/w) overnight at 37° C. Digested peptides were desalted by C18 OMIX ZipTip (Varian, Palo Alto, Calif.) prior to LC MSn analysis.
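The protein:cross-linker molar ratios stated for the cytochrome c/ubiquitin and 20S proteasome reactions follow directly from the quoted concentrations and volumes. The short sketch below reproduces that arithmetic. It is an illustration only: the function name is hypothetical, and the proteasome concentration is taken as approximately 1.2 μM.

```python
def crosslink_mix(protein_uM, protein_ul, linker_mM, linker_ul):
    """Return the cross-linker:protein molar ratio and final linker concentration.

    Concentrations are converted to mol/L and volumes to litres before use.
    """
    protein_mol = (protein_uM * 1e-6) * (protein_ul * 1e-6)
    linker_mol = (linker_mM * 1e-3) * (linker_ul * 1e-6)
    total_litres = (protein_ul + linker_ul) * 1e-6
    return {
        "linker_per_protein": linker_mol / protein_mol,
        "linker_final_mM": linker_mol / total_litres * 1e3,
    }


if __name__ == "__main__":
    # 20 ul of 200 uM cytochrome c or ubiquitin + 2 ul of 20 mM DSSO
    print(crosslink_mix(200, 20, 20, 2))
    # 50 ul of ~1.2 uM 20S proteasome + 3 ul of 20 mM DSSO
    print(crosslink_mix(1.2, 50, 20, 3))
```

With these inputs the ratios come out at about 10 and 1000 cross-linker molecules per protein, and the final DSSO concentration in the proteasome reaction is a little over 1 mM, consistent with the values given in the text.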

For some analyses, 2-dimensional LC MSn analysis was carried out. Off-line strong cation exchange (SCX) chromatography was performed as the first dimension of separation using an ÄKTA HPLC system (GE Healthcare Life Sciences, Uppsala, Sweden) as described in Kaake, R. M., et al. Each fraction was desalted by ZipTip prior to LC MSn analysis.

LC MSn Analysis—

LC MSn analysis of DSSO cross-linked peptides was performed using an LTQ-Orbitrap XL MS (Thermo Scientific, San Jose, Calif.) with an on-line Eksigent NanoLC system (Eksigent, Dublin, Calif.). The LC separation was the same as previously described by Vellucci, D., et al. The MSn method was set specifically for analyzing DSSO cross-linked peptides. Each acquisition cycle of an MSn experiment includes one MS scan in FT mode (350-1800 m/z, resolution of 60,000 at m/z 400) followed by two data-dependent MS/MS scans with normalized collision energy at 10 or 15% on the top two peaks from the MS scan, and then three MS3 scans operated in LTQ with normalized collision energy at 29% on the top three peaks from each of the MS/MS scans. For initial analyses, MS/MS spectra were acquired in LTQ in LC MSn experiments. For automated data analysis, MS/MS spectra were obtained in FT mode (resolution of 7500).
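The acquisition cycle described above can be laid out as a simple scan list, which makes the number of scans per cycle explicit. This is a schematic sketch only and is not instrument control code. The ordering of the MS3 scans relative to the MS/MS scans and the tuple layout are assumptions made for illustration.

```python
def acquisition_cycle(n_msms=2, n_ms3_per_msms=3):
    """List the scans of one cycle as (level, analyzer, precursor) tuples."""
    # One survey MS scan in FT mode.
    scans = [("MS", "FT", None)]
    # Data-dependent MS/MS on the top peaks of the MS scan.
    for i in range(1, n_msms + 1):
        scans.append(("MS/MS", "LTQ or FT", f"MS peak #{i}"))
    # MS3 in the LTQ on the top peaks of each MS/MS scan.
    for i in range(1, n_msms + 1):
        for j in range(1, n_ms3_per_msms + 1):
            scans.append(("MS3", "LTQ", f"MS/MS #{i} peak #{j}"))
    return scans


if __name__ == "__main__":
    for level, analyzer, precursor in acquisition_cycle():
        print(level, analyzer, precursor)
```

With the settings in the text (two MS/MS scans, three MS3 scans each), one cycle contains nine scans, six of which are MS3 scans.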

Data Analysis of DSSO Cross-Linked Peptides—

Monoisotopic masses of parent ions and corresponding fragment ions, parent ion charge states and ion intensities from LC MS/MS and LC MS3 spectra were extracted using in-house software based on the Raw_Extract script from Xcalibur v2.4 (Thermo Scientific, San Jose, Calif.). Database searching was performed with a developmental version of Protein Prospector (v. 5.5.0, University of California, San Francisco) (http://prospector.ucsf.edu/prospector/mshome.htm) using its software suite, i.e. Batch-Tag and MS-Bridge, as described in Chu, F., et al. Using in-house scripts, extracted MS3 data were reformatted such that MS3 fragment ions were directly linked to their MS/MS parent ions. For cytochrome c (P62894) and ubiquitin (P62990) analyses, database searching of MS3 spectra was performed using Batch-Tag against their accession numbers in the SwissProt.2009.09.01 database. For the 20S proteasome, the Batch-Tag search of MS3 data was performed against a decoy database consisting of a normal SGD yeast database concatenated with its reversed version (total 13490 protein entries). The mass tolerances for parent ions and fragment ions were set as ±20 ppm and 0.6 Da, respectively. Trypsin was set as the enzyme and a maximum of two missed cleavages was allowed. Protein N-terminal acetylation, methionine oxidation, and N-terminal conversion of glutamine to pyroglutamic acid were selected as variable modifications. In addition, three defined modifications on uncleaved lysines were chosen, including alkene (C3H2O, +54 Da), sulfenic acid (C3H4O2S, +104 Da), and thiol (C3H2SO, +86 Da) modifications due to remnants of the cross-linker (FIG. 1). Initial acceptance criteria for peptide identification required a reported expectation value ≤0.05. For the 20S proteasome analysis, the false positive rate for peptide identification is less than 1%.
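The nominal remnant masses (+54, +104 and +86 Da) and the ±20 ppm parent-ion tolerance can be made concrete with a short calculation. The sketch below is illustrative and is not the in-house software or Protein Prospector. It assumes standard monoisotopic atomic masses, and the helper names are hypothetical.

```python
# Monoisotopic atomic masses (u).
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.9949146, "S": 31.9720707}

# Remnant compositions as given in the text.
REMNANTS = {
    "alkene": {"C": 3, "H": 2, "O": 1},
    "sulfenic_acid": {"C": 3, "H": 4, "O": 2, "S": 1},
    "thiol": {"C": 3, "H": 2, "O": 1, "S": 1},
}


def remnant_mass(name):
    """Monoisotopic mass added to a lysine by the named cross-linker remnant."""
    return sum(MONO[el] * n for el, n in REMNANTS[name].items())


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=20.0):
    """True if the observed mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


if __name__ == "__main__":
    for name in REMNANTS:
        print(f"{name}: +{remnant_mass(name):.4f} Da")
```

The sulfenic acid and thiol remnants differ by the mass of one water molecule (about 18.0106 Da), which is why both appear as lysine modifications in the search.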

The Link-Finder program (http://www.ics.uci.edu/~baldig/Link-Finder/) was developed to search MS/MS data and identify the list of putative DSSO inter-linked and dead-end products based on their unique MS fragmentation patterns as illustrated in FIG. 2 (see the Results section for details). For example, one embodiment of the invention includes identifying the MS/MS data that display characteristic fragmentation profiles of DSSO cross-linked peptides based on the unique mass relationships between parent ions of cross-linked peptides and their fragment ions to obtain an MS/MS result including a list of parent ions corresponding to cross-linked peptide candidates (e.g., the putative or potential identities of the cross-linked peptides being analyzed). In one embodiment, analysis of the MS/MS data is carried out using the Link-Finder program. Monoisotopic masses and charges of parent ions measured in MS scans for those putative cross-linked peptides identified by the Link-Finder program were subsequently submitted to MS-Bridge to determine cross-linked peptide sequences by mass mapping with a given cross-linker (i.e. DSSO) and protein sequences (see Chu, F., et al.). For example, one embodiment of the invention further includes mass mapping the MS data using the list of parent ions corresponding to the cross-linked peptide candidates and the MS-cleavable cross-linker against known protein sequences to obtain an MS result comprising possible cross-linked peptide sequences. In one embodiment, the mass mapping is carried out using MS-Bridge. The parent mass error for the MS-Bridge search was set as ±10 ppm and only one cross-link was allowed in the cross-linked peptides for the general search. All three types of cross-linked peptides (Schilling, B., et al.), i.e. inter-linked (type 2), intra-linked (type 1) and dead-end modified (type 0), can be computed and matched in MS-Bridge (see Chu, F., et al.).
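One mass relationship that such a search can exploit is that the two fragments produced by a single cleavage of the linker together account for the whole parent, so their neutral masses sum to the neutral mass of the parent ion. The sketch below illustrates that idea only. It is not the Link-Finder implementation, whose actual rules are not reproduced here. The function names, the 20 ppm default tolerance and the synthetic peak list are assumptions.

```python
from itertools import combinations

PROTON = 1.00727647  # mass of a proton (u)


def neutral_mass(mz, z):
    """Convert an observed m/z and charge to a neutral monoisotopic mass."""
    return (mz - PROTON) * z


def to_mz(mass, z):
    """Convert a neutral mass to the m/z of its [M+zH]z+ ion."""
    return mass / z + PROTON


def find_fragment_pairs(parent_mz, parent_z, fragments, tol_ppm=20.0):
    """Return pairs of (m/z, z) fragments whose neutral masses sum to the parent."""
    parent = neutral_mass(parent_mz, parent_z)
    tolerance = parent * tol_ppm * 1e-6
    pairs = []
    for (mz1, z1), (mz2, z2) in combinations(fragments, 2):
        total = neutral_mass(mz1, z1) + neutral_mass(mz2, z2)
        if abs(total - parent) <= tolerance:
            pairs.append(((mz1, z1), (mz2, z2)))
    return pairs


if __name__ == "__main__":
    # Synthetic example: two hypothetical peptides plus alkene/sulfenic remnants.
    alpha, beta = 800.4, 1000.5
    alkene, sulfenic = 54.01056, 103.99320
    parent_mz = to_mz(alpha + beta + alkene + sulfenic, 4)
    peaks = [
        (to_mz(alpha + alkene, 2), 2),
        (to_mz(beta + sulfenic, 2), 2),
        (to_mz(alpha + sulfenic, 2), 2),
        (to_mz(beta + alkene, 2), 2),
        (500.0, 1),  # unrelated peak
    ]
    for pair in find_fragment_pairs(parent_mz, 4, peaks):
        print(pair)
```

In this synthetic case the search returns the two complementary pairs and ignores the unrelated peak. A fuller implementation would also need to handle the thiol form of the sulfenic acid fragment and charge-state bookkeeping.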

The search results from the Link-Finder, Batch-Tag and MS-Bridge programs were integrated using in-house scripts to compile a list of cross-linked peptides identified with high confidence. The final results were validated manually by examining the MS/MS and MS3 spectra.

Results

Development of a Novel Sulfoxide Containing MS-Cleavable Cross-Linker—

In order to develop a robust MS-cleavable cross-linking reagent, the incorporated MS-labile bond must have the ability to selectively and preferentially fragment prior to peptide backbone breakage independent of peptide charges and sequences. It is well documented that methionine sulfoxide containing peptides have preferential fragmentation at the C-S bond adjacent to the sulfoxide during collision induced dissociation (CID) analysis (see Reid, G. E., Roberts, K. D., Kapp, E. A., and Simpson, R. I. (2004) Statistical and Mechanistic Approaches to Understanding the Gas-Phase Fragmentation Behavior of Methionine Sulfoxide Containing Peptides. J Proteome Res 3, 751-759), and this fragmentation is dominant because the C-S bond is much more labile than peptide bonds. Such labile fragmentation has often been observed as the loss of 64 Da (-SOCH4) from oxidized methionine containing peptides in our routine peptide analysis. Therefore, the inventors expected that if a sulfoxide is incorporated in the spacer region of an NHS ester, the C-S bond adjacent to the sulfoxide will be MS-labile and prone to preferential fragmentation. To test this, the inventors have designed and synthesized a CID cleavable cross-linker derived from 3,3′-sulfinylbispropanoic acid, also known as 3,3′-sulfinyldipropanoic acid. The molecular formula of this acid is C6H10O5S, and it has a general structure as shown in General Structure 2 of FIG. 1 where X = -OH. More specific cross-linking agents are shown in FIG. 1, including Compound 1, namely Disuccinimidyl Sulfoxide (sometimes hereinafter referred to as “DSSO”), which is one exemplary compound of the invention. Other compounds in which the X in General Structure 2 is substituted are shown as Compounds 3-6 in FIG. 1. Hereinafter, while reference is made to DSSO, other MS-cleavable cross-linkers having the general structure shown in General Structure 2 of FIG. 1 are included as MS-cleavable cross-linkers of the invention. Turning back to disuccinimidyl sulfoxide (DSSO), it contains two NHS ester functional groups and two symmetric MS-labile C-S bonds adjacent to the sulfoxide (FIG. 2A). DSSO has a spacer length of 10.1 Å, making it well suited for detecting protein interaction interfaces of protein complexes and generating highly informative distance constraints. In comparison to existing MS-cleavable cross-linkers, DSSO can be easily synthesized in a two-step process as shown in FIG. 2A.

Proposed CID Fragmentation Pattern of DSSO Cross-Linked Peptides—

Three types of cross-linked peptides can be formed during the cross-linking reaction: inter-linked (type 2), intra-linked (type 1) and dead-end (type 0) modified peptides (Schilling, B., et al.), among which inter-linked peptides are the most informative for generating distance constraints. FIGS. 2B-D show the proposed fragmentation schemes of DSSO cross-linked peptides. As shown in FIG. 2B, during CID analysis of a DSSO inter-linked peptide α-β, the cleavage of one C—S bond next to the sulfoxide separates the inter-linked peptide into a pair of peptide fragments, i.e. αA/βS, in which the α peptide fragment is modified with the alkene (A) moiety (+54 Da) and the β peptide fragment is modified with the sulfenic acid (S) moiety (+104 Da). If peptides α and β have different sequences, two possible pairs of fragments (i.e. αA/βS and αS/βA) will be observed due to the breakage of either of the two symmetric C—S bonds next to the sulfoxide in the spacer region of DSSO (FIG. 2B), thus resulting in four individual peaks in the MS/MS spectrum. But if peptides α and β have the same sequence, only one fragment pair, i.e. two peaks, will be detected in the MS/MS spectrum. To determine sequences of inter-linked peptides and assign the cross-linking site, the resulting peptide fragments (i.e. αA, βS, αS, or βA) generated in MS/MS can be further subjected to MS3 analysis in the LTQ-Orbitrap XL MS. Because these fragments represent single peptide sequences, the interpretation of the MS3 spectra by the Batch-Tag program in Protein Prospector is identical to the identification of a single peptide with a defined modification (remnant of the cross-linker). This will dramatically simplify data interpretation and improve the identification accuracy of cross-linked products.

DSSO dead-end modified peptides have a defined mass modification (+176 Da) due to the half-hydrolyzed DSSO (FIG. 2C). MS/MS analysis of a dead-end modified peptide αDN would result in two possible fragment ions, i.e. αA and αS, due to the cleavage of the C—S bond on either side of the sulfoxide. The inventors name the αA and αS fragments as the dead end fragment pair and the mass difference between these fragments correlates to the difference between the remnants of DSSO attached to the fragments. Similarly, intra-linked peptides (e.g. αintra) also have a defined mass modification (+158 Da) due to DSSO cross-linking of two distinct lysines in the same peptide sequence (FIG. 2D). The cleavage of the C—S bond will result in only one fragment peak in MS/MS with the same mass as the parent ion observed in MS. MS3 analysis of fragment ions detected in MS/MS will lead to the detection of y or b ions containing either alkene (A) or sulfenic acid (S) modifications.

As shown in FIG. 2E, the sulfenic acid containing fragment (e.g. αS, βS, or αA+S) may undergo further fragmentation and lose a water molecule (−18 Da) to generate a new fragment containing an unsaturated thiol (T) moiety (+86 Da) (e.g. αT, βT, or αA+T). The inventors do not expect any complication with data analysis, as the thiol-containing fragment ion will become the dominant ion instead of the sulfenic acid modified fragment ion in the MS/MS spectrum. Thus the inventors anticipate that the total number of pairs and peaks will remain similar to that shown in FIGS. 2B-D. Due to specific and unique MS/MS fragmentation patterns for different types of DSSO cross-linked peptides, there are fixed mass relationships between parent ions and their fragment ions as listed in FIG. 2F. For DSSO inter-linked peptides (α-β), the mass sum of each fragment pair (αA/βS or αS/βA) is equivalent to the mass of the parent ion (FIG. 2F, Eq. 1). If αS or βS loses a water and becomes αT or βT respectively, the fragment pairs will be αA/βT and αT/βA, and the mass sum of each fragment pair plus a water will be the same as the parent mass (FIG. 2F, Eq. 2). As for the dead-end (DN) modified peptide αDN, each fragment (i.e. αA, αS or αT) has a distinct mass difference from the parent ion (FIG. 2F, Eq. 3). For the intra-linked peptide αintra, the fragment mass could be either the same as the parent mass (i.e. αA+S), or 18 Da less than the parent mass (i.e. αA+T) (FIG. 2F, Eq. 4). Moreover, there is a definite mass difference (Δ 32 Da) between the thiol (T) and alkene (A) modified forms of the same sequence (FIG. 2F, Eq. 5). These characteristic mass relationships have been incorporated into the Link-Finder program to identify DSSO cross-linked peptides.
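The mass relationships described above can be illustrated with a minimal Python sketch. It is not the Link-Finder code. It uses the nominal remnant masses quoted in the text (+54, +104, +86, +158, +176 and 18 Da), whereas real data would call for monoisotopic values; the function names and the peptide masses in the usage note are hypothetical.

```python
# Nominal remnant masses (Da) as quoted in the text; an assumption for
# illustration, since real data would use monoisotopic values.
ALKENE = 54.0      # A remnant
SULFENIC = 104.0   # S remnant
THIOL = 86.0       # T remnant (S minus water)
WATER = 18.0
INTER_LINK = 158.0  # intact DSSO bridge between two lysines (A + S)
DEAD_END = 176.0    # half-hydrolyzed DSSO on one lysine


def inter_link_fragments(m_alpha, m_beta):
    """Parent mass and possible fragment pairs of an inter-linked peptide."""
    parent = m_alpha + m_beta + INTER_LINK
    pairs = {
        "aA/bS": (m_alpha + ALKENE, m_beta + SULFENIC),
        "aS/bA": (m_alpha + SULFENIC, m_beta + ALKENE),
        "aA/bT": (m_alpha + ALKENE, m_beta + THIOL),
        "aT/bA": (m_alpha + THIOL, m_beta + ALKENE),
    }
    return parent, pairs


def dead_end_fragments(m_alpha):
    """Parent mass and fragments of a dead-end modified peptide."""
    parent = m_alpha + DEAD_END
    frags = {
        "aA": m_alpha + ALKENE,
        "aS": m_alpha + SULFENIC,
        "aT": m_alpha + THIOL,
    }
    return parent, frags


def intra_link_fragments(m_alpha):
    """Parent mass and fragments of an intra-linked peptide."""
    parent = m_alpha + INTER_LINK
    frags = {
        "aA+S": m_alpha + ALKENE + SULFENIC,
        "aA+T": m_alpha + ALKENE + THIOL,
    }
    return parent, frags
```

For two hypothetical peptides of 900 Da and 600 Da, `inter_link_fragments(900.0, 600.0)` gives a parent of 1658 Da. The αA/βS pair sums to the parent mass, and the αA/βT pair sums to the parent mass minus one water, in line with the relationships described for Eqs. 1 and 2.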

Characterization of DSSO Cross-Linked Model Peptides by MSn Analysis—

To characterize the new DSSO linker, the inventors first cross-linked several model peptides including Ac-IR7, Ac-myelin, and substance P. Under the experimental conditions, the major cross-linked products for Ac-IR7 and Ac-myelin are inter-linked, whereas substance P mostly formed dead-end modified peptides. All of the cross-linked model peptides were subjected to LC MSn analysis. The inter-linked Ac-IR7 peptide (α-α) was detected as doubly charged (m/z 923.46²⁺) and triply charged (m/z 615.97³⁺) ions (FIG. 3A). MS/MS analyses of the two differently charged parent ions each resulted in two dominant fragment ions (FIGS. 3B-C). Since the two inter-linked sequences are identical, only one fragment pair (i.e. αA/αS) was observed as expected. The results suggest that MS/MS fragmentation of inter-linked peptides is independent of peptide charges. It should be noted that besides unique mass relationships, the fragment ions in each pair have a defined charge relationship associated with the charge of the parent ion. In other words, the sum of the observed charges for each fragment in a pair equals the charge of the parent ion. For example, the triply charged parent ion (m/z 615.97³⁺) generated the fragment pair with one doubly charged (αA²⁺) and one singly charged (αS¹⁺) ion, whereas the doubly charged parent ion (m/z 923.46²⁺) only produced a fragment pair with two singly charged (αA¹⁺ and αS¹⁺) ions. This information can be used to validate the fragment pairs identified by masses. The respective MS3 analysis of αA and αS ions (FIGS. 3D-E) allowed unambiguous identification of the peptide sequence and cross-linked site based on a series of y and b ions. Similar analysis was carried out for inter-linked Ac-myelin (β-β), and a characteristic fragment pair was observed in MS/MS spectra of the parent ion (β-β) at three different charge states (m/z 458.23⁶⁺, 549.68⁵⁺, 686.84⁴⁺) respectively (FIGS. 3F-I), which represent the expected fragmentation of two identical inter-linked peptides. While the fragment pair βA/βS was detected in MS/MS spectra of quintuply and quadruply charged inter-linked Ac-myelin (β-β) (m/z 549.68⁵⁺, 686.84⁴⁺) (FIGS. 3H-I), the fragment pair βA/βT was observed in the MS/MS spectrum of sextuply charged inter-linked Ac-myelin (β-β) (m/z 458.23⁶⁺) (FIG. 3G). The βT fragment, namely the β peptide fragment containing an unsaturated thiol (T) moiety (+86 Da), was generated due to the loss of H2O from the sulfenic acid moiety on the βS fragment (FIG. 2E). This is likely due to excess collision energy deposited on the highest charged species, as the collision energy chosen for CID analysis in the LTQ-Orbitrap XL MS does not change with peptide charges during LC MSn runs.
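The charge relationship described above (fragment charges in a pair sum to the parent charge) and the conversion from m/z to neutral mass can be sketched in a few lines of Python. This is a hedged illustration only: the function names are hypothetical, and the 0.05 Da tolerance in the usage note is an assumption rather than a value from the text.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z `mz` with charge `z`."""
    return z * (mz - PROTON)


def charges_consistent(parent_z, fragment_charges):
    """True if the fragment charges in a pair add up to the parent charge."""
    return sum(fragment_charges) == parent_z


# Inter-linked Ac-IR7 observed at two charge states (values from the text).
m_2plus = neutral_mass(923.46, 2)
m_3plus = neutral_mass(615.97, 3)
same_species = abs(m_2plus - m_3plus) < 0.05  # assumed tolerance in Da
```

Both charge states give a neutral mass near 1844.9 Da, so `same_species` evaluates to true. `charges_consistent(3, (2, 1))` and `charges_consistent(2, (1, 1))` reproduce the two fragment-pair examples given for the triply and doubly charged parent ions.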

In addition to inter-linked peptides, dead-end modified peptides were analyzed. FIG. 3J displays the MS spectrum of the dead-end (DN) modified substance P (γDN, m/z 538.76²⁺). As predicted in FIG. 2C, MS/MS analysis of γDN led to two major fragments, the alkene (γA, m/z 478.03²⁺) and sulfenic acid (γS, m/z 502.95²⁺) containing peptide fragments, representing the characteristic feature of dead-end modified peptides. The fragment ions carry the same charge state as the parent ion, and MS3 analysis of the γA fragment confirmed its sequence unambiguously (FIG. 3L). Taken together, the results clearly demonstrate that the new MS-cleavable bonds in DSSO are labile and can be preferentially fragmented prior to peptide bond breakage, and the desired fragmentation is independent of peptide charge states and sequences.

Characterization of DSSO Cross-Linked Peptides of Model Proteins by MSn Analysis—

The inventors next evaluated the applicability of DSSO for protein cross-linking under physiological conditions. Model proteins cytochrome c (previously described in Sinz, A. (2003); Kasper, P. T., et al.; Nessen, M. A., et al.; Vellucci, D., et al.; Lee, Y. J., et al.; Pearson, K. M., Pannell, L. K., and Fales, H. M. (2002) Intramolecular Cross-Linking Experiments on Cytochrome C and Ribonuclease A Using an Isotope Multiplet Method. Rapid Commun. Mass Spectrom. 16, 149-159; Dihazi, G. H., and Sinz, A. (2003) Mapping Low-Resolution Three-Dimensional Protein Structures Using Chemical Cross-Linking and Fourier Transform Ion-Cyclotron Resonance Mass Spectrometry. Rapid Commun. Mass Spectrom. 17, 2005-2014; and Guo, X., Bandyopadhyay, P., Schilling, B., Young, M. M., Fujii, N., Aynechi, T., Guy, R. K., Kuntz, I. D., and Gibson, B. W. (2008) Partial Acetylation of Lysine Residues Improves Intraprotein Cross-Linking. Anal Chem 80, 951-960) and ubiquitin (Chowdhury, S. M., et al.; and Gardner, M. W., et al.) have been extensively utilized to test various new cross-linking strategies since they have a relatively large number of lysine residues accessible for cross-linking. Based on our previous work (see Vellucci, D., et al.), cytochrome c was cross-linked with a 10-fold excess of DSSO. The cytochrome c cross-linking efficiency using DSSO was comparable to the efficiency using DSG or our previously developed Azide-DSG cross-linkers (see Vellucci, D., et al.), indicating that DSSO is as effective for protein cross-linking reactions. The DSSO cross-linked cytochrome c was then digested with trypsin and analyzed by LC MSn. Three types of cross-linked peptides of cytochrome c (i.e. inter-link, intra-link and dead-end) have been observed. FIG. 4A displays the MS/MS spectrum of a tryptic peptide of cytochrome c with m/z 419.9716⁴⁺, in which only four abundant fragment ions (m/z 336.42²⁺, 352.40²⁺, 478.99²⁺, 494.96²⁺) were detected, suggesting this peptide as a potential heterodimeric inter-linked peptide (α-β). Two possible fragment pairs, αA/βS/T and αS/T/βA, are thus expected, in which S/T means that either the S (sulfenic acid) or the T (unsaturated thiol) containing fragment ion will be observed. Using the mass relationship between the pairs and the parent ion of inter-linked peptides (Eqs. 1, 2, 5 in FIG. 2F), the inventors identified two fragment pairs as αA/βT (478.99²⁺/352.40²⁺) and αT/βA (494.96²⁺/336.42²⁺), confirming that this peptide is a heterodimeric inter-linked peptide (α-β). Mass mapping of the parent ion (m/z 419.9716⁴⁺) by MS-Bridge revealed that it matches an inter-linked peptide [Ac-GDVEKGKK (SEQ ID NO: 11) inter-linked to KKGER (SEQ ID NO: 13)] with an error of 0.48 ppm. The fragment ions αA (m/z 478.99²⁺) and βT (m/z 352.40²⁺) were further subjected to MS3 sequencing and their MS3 spectra are illustrated in FIGS. 4B-C. Based on the series of y (i.e. y1-7) and b (i.e. b2-7) ions, the sequence of the MS/MS fragment ion αA (m/z 478.99²⁺) was unambiguously identified as Ac-GDVEKAGKK (SEQ ID NO: 12), in which the K (Lys) at the 5th position from the N-terminus was determined to be modified with the alkene moiety. MS3 analysis of the corresponding fragment pair ion βT (m/z 352.40²⁺) determined its sequence as KTKGER (SEQ ID NO: 14). Although there are two lysine residues in the sequence, the occurrence of y4 and a1 ions indicates that the first N-terminal K is modified with an unsaturated thiol moiety. Taken together, the identity and cross-linking site of the inter-linked peptide [Ac-GDVEKGKK (SEQ ID NO: 11) inter-linked to KKGER (SEQ ID NO: 13)] were determined unambiguously.

FIGS. 5A-C display MS/MS spectra of triply (m/z 641.6730³⁺), quadruply (m/z 481.5069⁴⁺), and quintuply (m/z 385.4070⁵⁺) charged ions of a cytochrome c cross-linked peptide. The MS/MS spectrum of the triply charged ion (m/z 641.6730³⁺) resulted in four dominant fragment ions (m/z 386.24, 418.21, 744.40²⁺, 760.38²⁺), which have been determined as the two fragment pairs αA/βT (744.40²⁺/418.21) and αT/βA (760.38²⁺/386.24), indicating this peptide is a heterodimeric inter-linked peptide. The same characteristic fragment pairs, i.e. αA/βT and αT/βA, have also been identified but with different charges in the MS/MS spectra of the quadruply (m/z 481.5069⁴⁺) and quintuply (m/z 385.4070⁵⁺) charged parent ions respectively (FIGS. 5B-C). It is noted that some charge distribution of fragment ions was observed in the pairs (FIG. 5C) due to the high charge state of the parent ion. Nevertheless, the dominant ions are the characteristic fragment ions of the inter-linked peptide. MS3 analysis of the αA (m/z 496.60³⁺) fragment has revealed its sequence identity unambiguously as HKATGPNLHGLFGR (SEQ ID NO: 17), in which the K (Lys) at position 2 from the N-terminus was modified with the alkene moiety (FIG. 5D). In combination with the MS-Bridge result, the inter-linked peptide is identified as [HKTGPNLHGLFGR (SEQ ID NO: 16) inter-linked to GKK]. These results demonstrate that preferred fragmentation of the C—S bonds in DSSO inter-linked peptides of cytochrome c occurs as expected and is independent of peptide charge states and sequences.

To show how dead-end modified peptides of cytochrome c behave in MSn analysis, FIG. 6A illustrates the MS/MS spectrum of a selected dead-end modified peptide (m/z 880.8975²⁺). As shown, two major fragment ions (m/z 820.20²⁺ and 835.88²⁺) were detected and they are 122 and 90 Da less than the parent ion respectively. Such mass differences between the parent ion and its fragment ions fit well with those predicted for DSSO dead-end modified peptides (Eq. 3 in FIG. 2F), identifying the ion m/z 820.20²⁺ as the αA fragment and 835.88²⁺ as the αT fragment. MS3 analysis of the αA fragment (m/z 820.20²⁺) (FIG. 6B) as well as the MS-Bridge result of the parent ion (m/z 880.8975²⁺) identified its sequence as KDNTGQAPGFSYTDANK (SEQ ID NO: 20).

As discussed above (FIG. 2D), the inventors predict that MS/MS analysis of the intra-linked peptide (αintra) will lead to either a fragment ion (αA+S) containing one KA (LysA) and one KS (LysS) with the same mass as the parent ion, or a fragment ion (αA+T) containing one KA (LysA) and one KT (LysT) with a mass 18 Da less than the original parent ion. FIG. 6C displays the MS/MS spectrum of a cytochrome c tryptic peptide with m/z 611.9802³⁺ in which only one major fragment ion (m/z 606.24³⁺) was detected with a mass 18 Da less than the parent ion. This suggests that the peptide is potentially an intra-linked peptide of cytochrome c and its MS/MS fragment ion (m/z 606.24³⁺) can be labeled as αA+T. Mass mapping of the parent ion m/z 611.9802³⁺ using MS-Bridge matched an intra-linked peptide, GGK*HK*TGPNLHGLFGR (SEQ ID NO: 24), where the two N-terminal K* (Lys*) are linked. Since the CID-induced C—S bond breakage can occur at either side of the sulfoxide, a mixture of two fragments with identical masses but with alkene (A) or thiol (T) moieties at either K can be generated. FIG. 6D illustrates the MS3 spectrum of the MS/MS fragment ion (m/z 606.24³⁺), with a series of y and b ions confirming its identity as GGKTHKATGPNLHGLFGR (SEQ ID NO: 26) and/or GGKAHKTTGPNLHGLFGR (SEQ ID NO: 25). The detection of y13 (760.43²⁺) and b3 (297.34) ions indicates the presence of the peptide fragments from the sequence of GGKTHKATGPNLHGLFGR (SEQ ID NO: 26), and the detection of b3* (329.37), b4* (466.33), y12* (692.10²⁺), and y13* (744.51²⁺) identified the peptide fragments from the GGKAHKTTGPNLHGLFGR (SEQ ID NO: 25) sequence.

Development of an Integrated Workflow for Fast and Accurate Identification of DSSO Cross-Linked Peptides by LC MSn—

In order to facilitate data analysis for the identification of DSSO cross-linked peptides from complex mixtures, the inventors have developed an integrated workflow for processing LC MSn data acquired by LTQ-Orbitrap XL MS (FIG. 7A). During LC MSn analysis, three types of data are collected, i.e. MS, MS/MS and MS3 spectra, in which MS and MS/MS are acquired in FT mode to allow accurate mass measurement and charge determination of both parent ions in MS and their fragment ions in MS/MS spectra. MS3 is obtained in the LTQ to achieve the highest sensitivity. As shown, the first data extraction step is to generate the text files containing peak lists of MS/MS and MS3 data respectively. Based on the unique MS/MS fragmentation profiles of DSSO cross-linked peptides and the defined mass relationships between parent ions and their fragment ions (FIG. 2), the Link-Finder program was developed to automatically search MS/MS data to identify putative DSSO cross-linked peptides (FIG. 7B). As discussed above, the inter-linked products produce distinct MS/MS spectra with two pairs of dominant peptide fragments (αA/βS/T and αS/T/βA). For each MS/MS scan, among the top eight most abundant peaks, if there is a fragment pair with a mass sum equal to the parent mass with or without a water loss (−18 Da), the parent ion will be categorized as a possible inter-linked peptide. If two of those pairs can be found, and the mass difference between any two fragments from the two distinct pairs is 32 Da, i.e., the mass difference between the thiol and alkene moieties, then it is almost certain that the parent ion is a true inter-linked product. The dead-end product typically has two major fragment ions representing the parent peptide attached with either a thiol or an alkene moiety. Among the top three peaks, if there are two peaks with a mass difference of 32 Da, and one of them is 90 Da less than the parent mass, then the parent ion is categorized as a possible dead-end peptide. Using the Link-Finder program, a list of parent ions is identified as putative inter-linked or dead-end modified peptides. The generated list of parent ion masses is then subjected to MS-Bridge to identify putative cross-linked peptides of all types by mass matching with high mass accuracy (<10 ppm).
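The screening rules described above can be sketched in Python as follows. This is a minimal illustration and not the Link-Finder program itself. It assumes that each peak has already been converted to a neutral mass, uses nominal mass differences from the text, and applies a hypothetical tolerance `tol` (0.5 Da here) that the text does not specify. The function name and return labels are likewise illustrative.

```python
from itertools import combinations

WATER = 18.0            # water loss (S -> T)
THIOL_MINUS_ALKENE = 32.0
DEAD_END_T_LOSS = 90.0  # parent minus thiol-modified dead-end fragment


def classify_scan(parent_mass, peaks, tol=0.5):
    """Classify one MS/MS scan.

    peaks: list of (neutral_mass, intensity) tuples (assumed input format).
    tol:   hypothetical mass tolerance in Da.
    """
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    top8 = [mass for mass, _ in ranked[:8]]
    top3 = [mass for mass, _ in ranked[:3]]

    def close(a, b):
        return abs(a - b) <= tol

    # Fragment pairs whose summed mass equals the parent mass,
    # with or without a water loss.
    pairs = [
        (a, b)
        for a, b in combinations(top8, 2)
        if close(a + b, parent_mass) or close(a + b + WATER, parent_mass)
    ]

    # Two distinct pairs with fragments 32 Da apart: high-confidence inter-link.
    for p, q in combinations(pairs, 2):
        if any(close(abs(x - y), THIOL_MINUS_ALKENE) for x in p for y in q):
            return "inter-link"
    if pairs:
        return "possible inter-link"

    # Dead-end rule: two of the top three peaks 32 Da apart,
    # one of them 90 Da below the parent mass.
    for a, b in combinations(top3, 2):
        if close(abs(a - b), THIOL_MINUS_ALKENE) and (
            close(parent_mass - a, DEAD_END_T_LOSS)
            or close(parent_mass - b, DEAD_END_T_LOSS)
        ):
            return "possible dead-end"
    return "unclassified"
```

As a usage example with made-up masses, a parent of 1658 Da with dominant fragments at 954, 686, 986 and 654 Da is classified as `"inter-link"`, while a parent of 1176 Da with dominant fragments at 1054 and 1086 Da is classified as `"possible dead-end"`. The inter-link rule is checked first so that the more specific two-pair signature takes precedence.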

For MS3 data, only the original parent ion observed in MS scan is listed as the precursor ion during database searching. In order to extract the MS3 parent ion (fragment ions in MS/MS), for Batch-Tag search, the second data extraction step is carried out using in-house scripts to generate a modified MS3-txt file. The Batch-Tag search result provides high confidence identification of single peptide fragments generated in MS/MS that are initially cross-linked. Finally, the results from three different types of searches, i.e. Batch-Tag (MS3 data), Link-Finder (MS/MS data), and MS-Bridge (MS data) are integrated using in-house scripts within Link-Finder program to obtain accurate and reliable identification of cross-linked peptides. Among them, MS3 sequencing with Batch-Tag searching is essential for unambiguous identification of cross-linking sites.

Identification of DSSO Cross-Linked Peptides of Model Proteins by Automated Database Searching—

The newly developed integrated workflow was first employed to identify DSSO cross-linked peptides of cytochrome c. In total, 19 inter-linked peptides have been unambiguously identified and summarized in TABLE 1 (for details see TABLE 3 and FIG. 11). Each peptide has characteristic fragment pairs in MS/MS spectra and was identified by the Link-Finder program. In addition, one or two MS/MS fragment pair ions have been sequenced by MS3 to provide unambiguous identification. Moreover, all of the parent masses fit well with cross-linked peptides identified by the MS-Bridge program with high mass accuracy. In comparison to reported cross-linking studies of cytochrome c (Schilling, B., et al.; Kasper, P. T., et al.; Nessen, M. A., et al.; Vellucci, D., et al.; Lee, Y. J., et al.; Pearson, K. M., et al.; Dihazi, G. H., et al.; and Guo, X., et al.), three novel inter-links have been identified in this work. Besides the inter-linked peptides, 7 intra-linked and 8 dead-end peptides have also been identified (see TABLE 3). For the dead-end modified peptides, each has a dead-end fragment pair and at least one of the fragment ions has been sequenced, which correlates very well with MS-Bridge and Batch-Tag results. The intra-linked peptides were mainly identified by Batch-Tag and MS-Bridge results.

In addition to products with one cross-link (i.e. type 0, 1 and 2), peptides containing two cross-links have also been identified using this integrated workflow. In this work, 11 non-redundant DSSO cross-linked peptides with two links (e.g. one inter-link with one dead-end, one inter-link with one intra-link, or one intra-link with one dead-end) have been identified and summarized in TABLE 3. This type of information is not commonly reported since peptide sequencing of multi-linked peptides is highly complicated. This demonstrates the ability of our new cross-linking strategy for identifying such complex products.

Based on the crystal structure of bovine heart cytochrome c (PDB ID: 2B4Z), the inventors have calculated the distances between alpha carbons of the identified cross-linked lysine residues (TABLE 1 and TABLE 3). Among the 26 non-redundant inter-linked lysines in cytochrome c identified in this work (excluding linkages between two adjacent lysines), all of the linkages have distances between their alpha carbons within the range of 5.3 Å to 19.3 Å. This is consistent not only with the length of a fully expanded DSSO (10.1 Å spacer length) and two lysine side chains, but also with the previous results using NHS ester cross-linkers of similar lengths (see Vellucci, D., et al.; Lee, Y. J., et al.; Guo, X., et al.; and Kruppa, G. H., Schoeniger, J., and Young, M. M. (2003) A Top Down Approach to Protein Structural Studies Using Chemical Cross-Linking and Fourier Transform Mass Spectrometry. Rapid Commun Mass Spectrom 17, 155-162). The results suggest that our cross-linking conditions did not induce significant disturbance to cytochrome c structural conformations.

In addition to cytochrome c, the same strategy has been successfully applied to identify DSSO cross-linked peptides of ubiquitin. Using the same analysis strategy, 3 inter-linked, 1 intra-linked, and 5 dead-end peptides have been identified as summarized in TABLE 4 and FIG. 11. Based on the crystal structure of bovine ubiquitin (PDB ID: 1AAR), all of the identified inter-/intra-linked lysines in ubiquitin have distances between their alpha carbons within the range of 6 to 18 Å. The identified cross-linked lysines are consistent with the known structure of ubiquitin and previous reports (Chowdhury, S. M., et al.; and Gardner, M. W., et al.). It is interesting to note that one of the identified inter-linked peptides is [LIFAGK48QLEDGR (SEQ ID NO: 63) inter-linked to LIFAGK48QLEDGR (SEQ ID NO: 63)], which is a cross-link formed within the ubiquitin dimer. Residue K48 is located at a hydrophobic patch important for protein interactions, and K48 is also an in vivo chain linkage site for polyubiquitination required for ubiquitin/ATP dependent proteasomal degradation (Pickart, C. M., and Cohen, R. E. (2004) Proteasomes and Their Kin: Proteases in the Machine Age. Nat Rev Mol Cell Biol. 5, 177-187). The same K48-K48 (Lys48-Lys48) cross-link was identified previously using an alkyne-tagged NHS ester, but only after selective enrichment coupled with CID and ETD analyses (Chowdhury, S. M., et al.). In comparison, the inventors were able to identify the K48 inter-linked peptide without any enrichment, thus further demonstrating the effectiveness of our approach to identify DSSO cross-linked peptides from complex mixtures.

Structural Elucidation of the Yeast 20 S Proteasome Complex Using DSSO Cross-Linking—

The ubiquitin-proteasome degradation pathway plays an important role in regulating many biological processes (Pickart, C. M., et al.). The 26 S proteasome complex is the macromolecular machine responsible for ubiquitin/ATP dependent protein degradation, and it is composed of two subcomplexes: the 20 S core particle and the 19 S regulatory complex. To date, only the crystal structure of the 20 S proteasome complex has been resolved. However, structures of the 19 S and 26 S remain elusive, thus hindering the understanding of the structure and functional relationship of the 26 S proteasome complex. To develop an effective cross-linking strategy to elucidate structures of the 19 S and 26 S proteasome complexes, the inventors have therefore investigated the structure of the yeast 20 S proteasome complex using the DSSO cross-linking approach. The cross-linking of the 20 S proteasome complex was carried out in PBS buffer under conditions allowing efficient cross-linking of all subunits as judged by 1-D SDS-PAGE (FIG. 12). The tryptic digest of the cross-linked proteasome complex was subjected to LC MSn analysis and the data were analyzed using the integrated workflow described above (FIG. 7). In total, 13 unique inter-linked peptides were identified, including 10 intra-subunit and 3 inter-subunit heterodimeric inter-links as summarized in TABLE 2 (for details see TABLE 5), which were determined unambiguously by integration of Link-Finder, Batch-Tag (MS3 sequencing, see FIG. 13), and MS-Bridge (mass mapping of the cross-linked peptides) results. As an example, FIG. 8A displays the MS/MS spectrum of a DSSO heterodimeric inter-linked peptide α-β (m/z 833.9231⁴⁺) of the yeast 20 S proteasome complex, in which two fragment pairs were detected and determined as αA/βT (868.45²⁺/790.39²⁺) and αT/βA (884.44²⁺/774.41²⁺). MS3 analysis of the αA fragment (m/z 868.45²⁺) identified the α chain unambiguously as NKAPELYQIDYLGTK (SEQ ID NO: 28), which matched to 20 S subunit β4. In this sequence, KA is modified with the alkene moiety. In addition, MS3 analysis of the βT fragment (m/z 790.39²⁺) identified the β chain unambiguously as LGSQSLGVSNKTFEK (SEQ ID NO: 30), which matched to 20 S subunit β3. Here, KT is modified with an unsaturated thiol moiety. Mass mapping by MS-Bridge further confirmed this inter-subunit (β4-β3) inter-linked peptide as [NKPELYQIDYLGTK (SEQ ID NO: 27) inter-linked to LGSQSLGVSNKFEK (SEQ ID NO: 29)].

In addition, 21 dead-end modified peptides were identified by multiple lines of evidence as illustrated in TABLE 5. The fragmentation behavior for the dead-end modified peptides of the 20 S subunits is the same as that of cytochrome c showing two distinct dead-end pairs in MS/MS spectra. This is illustrated with an example shown in FIG. 14.

The experimentally determined structure of the yeast 20 S proteasome holocomplex (Protein Data Bank code 1RYP) was utilized to assess the cross-linked lysine pairs identified in this study. For each identified cross-link the distance between the alpha carbons was calculated and the results are summarized in TABLE 2. Considering the spacer length of DSSO and lysine side chains, the theoretical upper limit for the distance between the alpha carbon atoms of paired lysines is approximately 26 Å. The inventors' reported distances are within this upper limit, providing some evidence that the proteasome cross-links are formed in the native state. The quaternary proteasome structure is formed by four stacked seven-member rings in the order αββα. The side view and basal view of the arrangement among one set of the symmetric αβ rings and their subunits are shown in FIG. 9. The alpha carbon trace is shown for all subunits and the cross-linked lysines are shown in space fill representation. Lysines forming intra-subunit cross-links appear in blue and those forming inter-subunit cross-links appear in red. The images in FIG. 9 were generated using UCSF Chimera visualization software (Pettersen, E., Goddard, T., Huang, C., Couch, G., Greenblatt, D., Meng, E., and Ferrin, T. (2004) UCSF Chimera—a Visualization System for Exploratory Research and Analysis. J Comput Chem 25, 1605-1612).
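The distance check described above amounts to a Euclidean distance between two alpha carbon positions compared against the stated upper limit of approximately 26 Å. The Python sketch below illustrates this under stated assumptions: the coordinates are made-up values rather than entries from 1RYP, and the function names are hypothetical.

```python
import math

UPPER_LIMIT = 26.0  # approximate Ca-Ca upper limit (angstroms) stated in the text


def ca_distance(ca_1, ca_2):
    """Euclidean distance between two alpha carbon positions (x, y, z)."""
    return math.dist(ca_1, ca_2)


def within_limit(ca_1, ca_2, limit=UPPER_LIMIT):
    """True if the lysine pair is compatible with a DSSO cross-link."""
    return ca_distance(ca_1, ca_2) <= limit


# Hypothetical coordinates in angstroms, not taken from any PDB entry.
lys_a = (0.0, 0.0, 0.0)
lys_b = (3.0, 4.0, 12.0)
distance = ca_distance(lys_a, lys_b)
```

In practice the coordinates would be read from the CA atoms of the relevant lysine residues in the structure file; that parsing step is omitted here.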

DISCUSSION

The inventors have presented a novel cross-linking strategy for structural analysis of model proteins and the yeast 20 S proteasome complex by combining a newly designed MS-cleavable cross-linker DSSO with an integrated data analysis workflow. As noted above, while this discussion has centered around DSSO (shown as Compound 1 in FIG. 1), other compounds having the General Structure 2, such as Compounds 3-6, can also be used. This approach is effective and facilitates fast and accurate identification of DSSO cross-linked peptides by LC MSn. The new MS-cleavable cross-linker DSSO is attractive for cross-linking studies of protein complexes for a number of reasons: 1) it can be easily synthesized and can cross-link protein complexes effectively at concentrations of about 1 μM or less; 2) it has two symmetric CID labile C—S bonds that preferentially fragment prior to peptide backbone breakage; 3) the CID-induced cleavage of inter-linked peptides is specific and independent of peptide charges and sequences; 4) DSSO cross-linked peptides can generate characteristic fragmentation patterns in MS/MS spectra that are unique to different types of cross-linked peptides for easy identification; 5) there are unique mass and charge relationships between MS/MS peptide fragment ions and their parent ions, permitting automated data processing. In comparison to existing MS-cleavable cross-linkers (Tang, X., et al.; Zhang, H., et al.; Soderblom, E. J., and Goshe, M. B. et al.; Soderblom, E. J., Bobay, B. G., et al.; and Gardner, M. W., et al.), the DSSO cross-linker can provide a specific and selective fragmentation of cross-linked peptides for identification. The fragmentation patterns of DSSO cross-linked peptides are similar to those of “fixed charge” sulfonium ion containing cross-linked model peptides developed by Lu, Y. et al.
Although DSSO does not carry a fixed charge, our results have demonstrated that the preferential cleavage of C—S bond adjacent to the sulfoxide in DSSO is as effective as cleavage of the C—S bond in the sulfonium ion containing cross-linker (i.e. S-methyl 5,5′-thiodipentanoylhydroxysuccinimide) (Lu, Y. et al.). However, fragmentation of the sulfonium ion containing cross-linked peptide requires the formation of a five-membered ring with the sulfonium ion and the amide of the linker such that it is not feasible to change spacer lengths in these cross-linkers. In contrast, the simple fragmentation mechanism gives DSSO the flexibility of changing its spacer lengths to accommodate cross-linking lysines at different distances while maintaining the symmetry of the linker with easily interpretable fragmentation patterns. In addition, DSSO has better potential for studying protein interactions by in vivo cross-linking. It is well known that cross-linking study of protein complexes is extremely challenging due to the inherent limitations of current cross-linkers. With the improvement on database searching of non-cleavable inter-linked peptides, it is possible to identify cross-linked peptides of protein complexes using non-cleavable cross-linkers (Maiolica, A., et al.; and Chen, Z. A. et al.). However, this requires a special program for data interpretation and the false positive rate of identifying inter-linked sequences is higher than that of identifying single sequences. Here the inventors have demonstrated the feasibility of using novel DSSO cross-linking strategy to study the structure of the yeast 20S proteasome complex. This work represents a major advancement in structural elucidation of multi-subunit protein complexes with improved data analysis and accuracy as such application of MS-cleavable cross-linkers has not been reported before.

In addition to the design of this novel MS-cleavable linker, the inventors have developed an integrated data analysis workflow to achieve fast, easy and accurate identification of cross-linked peptides and the cross-linking sites. Identification of DSSO cross-linked peptides from complex mixtures has been accomplished with high confidence by integrating data analyses of three different datasets: MS, MS/MS and MS3 data. Due to the difficulty in interpreting MS/MS spectra of unseparated inter-linked peptides, many previously reported inter-linked products were determined based only on parent masses. In contrast, all of the inter-linked peptides of cytochrome c, ubiquitin and the yeast 20S proteasome complex have been identified in this work with three lines of evidence: characteristic fragmentation pairs (Link-Finder), peptide sequence determination by MS3 sequencing (Batch-Tag), and mass mapping (MS-Bridge). This procedure permits the identification of cross-linked peptides with high accuracy, reliability and speed. It is important to note that existing database search programs can be easily adapted for analyzing DSSO cross-linked peptides; thus, a broad application of the DSSO-based cross-linking strategy is foreseeable. Furthermore, cross-linked peptides of cytochrome c with two links can be identified, suggesting the capability of the new cross-linking strategy for identifying more complex cross-linked products.
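
As a minimal sketch only, the integration of the three lines of evidence described above might be expressed as follows. The `Candidate` record, its field names, the 10 ppm tolerance, and the rule that all three lines of evidence must agree are hypothetical simplifications; the actual outputs and scoring of Link-Finder, Batch-Tag and MS-Bridge are not reproduced here. The DSSO cross-link mass is an assumption computed from the elemental composition C6H6O3S.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed monoisotopic mass added by an intact DSSO cross-link (C6H6O3S), in Da.
DSSO_LINK_MASS = 158.00377


@dataclass
class Candidate:
    """One cross-linked peptide candidate (hypothetical record layout)."""
    parent_mass: float                     # neutral mass measured in the MS scan
    fragment_pair_found: bool              # MS/MS evidence (stand-in for Link-Finder)
    ms3_peptide_masses: Tuple[float, ...]  # unmodified peptides sequenced at MS3
                                           # (stand-in for Batch-Tag results)


def mass_matches(c: Candidate, tol_ppm: float = 10.0) -> bool:
    """Mass mapping (stand-in for MS-Bridge): peptide A + peptide B + linker."""
    if len(c.ms3_peptide_masses) != 2:
        return False
    expected = sum(c.ms3_peptide_masses) + DSSO_LINK_MASS
    return abs(c.parent_mass - expected) / expected * 1e6 <= tol_ppm


def evidence_lines(c: Candidate) -> int:
    """Count how many of the three lines of evidence support the candidate."""
    return sum([
        c.fragment_pair_found,            # characteristic MS/MS fragment pair
        len(c.ms3_peptide_masses) == 2,   # both peptides sequenced at MS3
        mass_matches(c),                  # parent mass consistent with the link
    ])


def is_confident(c: Candidate) -> bool:
    """Simplified acceptance rule: all three lines of evidence must agree."""
    return evidence_lines(c) == 3
```

Requiring all three lines of evidence is a deliberately strict choice for the sketch; a practical workflow could, for instance, accept a candidate in which only one peptide is sequenced at MS3 and the other is inferred by mass mapping.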

Cross-linking/mass spectrometry has been previously attempted to study the yeast 20S proteasome complex using Ru(II)(bpy)₃²⁺ (tris(2,2′-bipyridyl)ruthenium(II) dication)/ammonium persulfate/light-mediated cross-linking (Denison, C., and Kodadek, T. (2004) Toward a General Chemical Method for Rapidly Mapping Multi-Protein Complexes. J Proteome Res 3, 417-425), in which the interconnectivity of multiple subunits was determined based on MS identification of subunits that co-migrated on SDS-PAGE after cross-linking. No cross-linked peptides were identified due to the complicated chemistry of the radical-based cross-linking reaction. Therefore, the inventors' work describes the first successful use of a cross-linking/mass spectrometry strategy to determine inter-subunit and intra-subunit interaction interfaces of the yeast 20S proteasome complex. Although only 13 inter-linked peptides of the yeast 20S proteasome have been identified and reported here, this work presents the first step toward full characterization of proteasome structures using cross-linking/mass spectrometry in the future. The feasibility of using the DSSO-based cross-linking strategy to identify cross-linked peptides of a large protein complex at a concentration of 1 μM or less is very significant and of great promise to structural studies of protein complexes, since purifying protein complexes at high concentrations is technically challenging.

During LC MSn analysis using the LTQ-Orbitrap XL MS, collision energy cannot be adjusted on the fly to account for differences in peptide charge states; therefore, a single compromise collision energy is used for the entire LC MSn run. Thus, there exists a possibility that the collision energy may be too high for highly charged ions while too low for peptides with lower charges. Future improvements in charge selection and energy adjustment during LC MSn data acquisition may be needed to further enhance the quality of the results. Additionally, optimized peptide separation prior to LC MSn analysis will be necessary to improve the dynamic range of peptide analysis and allow the detection of low abundance cross-linked peptides. Moreover, refinement of the Link-Finder program is needed to improve the identification of intra-linked peptides. Lastly, the addition of an affinity tag to the sulfoxide-containing cross-linker will improve detection of cross-linked peptides, which will be the subject of our future study.

In summary, the inventors have developed a new family of MS-cleavable cross-linker compounds, including DSSO, that are applicable to model peptides, proteins and a multi-subunit protein complex. The unique MS features of DSSO cross-linked peptides, together with our integrated data analysis workflow for analyzing LC MSn data, greatly reduce the time spent identifying cross-linked peptides. Given its simplicity, speed and accuracy, the inventors believe that this cross-linking strategy will have a broad application in elucidating structures of proteins and protein complexes in the future.

Although embodiments of the present invention have been described in detail herein in connection with certain exemplary embodiments, it will be understood that the invention is not limited to the disclosed exemplary embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements. The invention is limited only by the appended claims and their equivalents.

Claims

1. A MS-cleavable cross-linker for proteins and protein complexes, the crosslinker having two symmetric collision-induced dissociation (CID) cleavable sites and the formula: where X is selected from the group consisting of wherein R is methyl or ethyl, and

2. A MS-cleavable cross-linker as recited in claim 1, having the structure:

3. A method for mapping protein-protein interactions of protein complexes, comprising:

providing a MS-cleavable cross-linker as recited in claim 1;
forming a cross-linked protein complex by cross-linking proteins with the MS-cleavable cross-linker;
forming protein and/or peptide fragments that are chemically bound to the MS-cleavable cross-linker by digesting the cross-linked protein complex with an enzyme such as trypsin; and
using mass spectrometry (MS) and MSn to identify the protein and/or peptide fragments.

4. A method for integrated data analysis workflow for identification of cross-linked peptides, comprising:

providing cross-linked peptides, each cross-linked peptide comprising an MS-cleavable cross-linker;
performing mass spectrometry on the cross-linked peptides to obtain MS data, MS/MS data, and MS3 data;
identifying the MS/MS data comprising characteristic fragmentation profiles of MS-cleavable cross-linked peptides to obtain an MS/MS result comprising a list of parent ions corresponding to cross-linked peptide candidates;
mass mapping the MS data using the list of parent ions corresponding to the cross-linked peptide candidates and the MS-cleavable cross-linker against known protein sequences to obtain an MS result;
peptide sequencing the cross-linked peptides using the MS3 data to obtain an MS3 result; and
integrating the MS result, the MS/MS result, and MS3 result to identify at least one of the cross-linked peptides.

5. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the MS-cleavable cross-linker is DSSO.

6. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the MS data is obtained in Fourier transform (FT) mode.

7. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the MS/MS data is obtained in Fourier transform (FT) mode.

8. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the MS3 data is obtained using a linear trap quadrupole (LTQ).

9. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the comparing the MS/MS data is carried out using Link-Finder.

10. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the mass mapping is carried out using Protein Prospector MS-Bridge.

11. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the peptide sequencing is carried out using Protein Prospector Batch-Tag.

12. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, further comprising reformatting the MS3 data such that data from MS3 fragment ions is linked to data from MS/MS parent ions.

13. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the performing mass spectrometry on the cross-linked peptides to obtain the MS data, the MS/MS data, and the MS3 data comprises:

obtaining an MS spectrum;
obtaining an MS/MS spectrum;
obtaining an MS3 spectrum;
extracting the MS data from the MS spectrum;
extracting the MS/MS data from MS/MS spectrum; and
extracting the MS3 data from the MS3 spectrum.

14. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the MS-cleavable cross-linker has the formula: where X is selected from the group consisting of wherein R is methyl or ethyl, and

Patent History
Publication number: 20160245822
Type: Application
Filed: Oct 29, 2015
Publication Date: Aug 25, 2016
Inventors: Scott D. Rychnovsky (Irvine, CA), Lan Huang (Irvine, CA)
Application Number: 14/927,332
Classifications
International Classification: G01N 33/68 (20060101); H01J 49/42 (20060101); C07D 207/46 (20060101); H01J 49/00 (20060101); G06F 19/16 (20060101); G06F 19/26 (20060101);