Method of verifying the synthesis of organic molecules using nuclear magnetic resonance spectroscopy

- Bruker Analytik GmbH

An NMR method to verify the presence of organic molecular compounds consisting of repetitively occurring sub-structures is presented. The method comprises the steps of assigning sub-structure codes to the selected compounds, in accordance with the respective starting compounds used, measuring multi-dimensional NMR spectra from at least some of the compounds, uniquely assigning signal groups of NMR spectra to the individual sub-structures, checking the NMR spectra of the compounds for the presence of all assigned signal groups, and characterizing a particular compound as being TRUE if the check of its particular combination of sub-structures yields the result that the signal groups of the sub-structures contained in its total code have been observed. The method permits rapid and accurate verification of the presence of compounds having repetitive sub-structures.

Description

[0001] This application is a continuation-in-part of Ser. No. 09/422,639 filed Oct. 22, 1999 and claims Paris Convention Priority of German patent application 198 49 231.6 filed Oct. 26, 1998, the complete disclosures of which are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

[0002] The invention relates to a method of verifying the synthesis of organic molecules using nuclear magnetic resonance spectroscopy.

[0003] A large number of new organic molecules can be automatically synthesized from a smaller number of compounds using the techniques of combinatorial chemistry. The structural parts of the product structure that result from specific starting compounds are named sub-structures and are each assigned a sub-structure code. Methods have been proposed for the subsequent verification of the success or failure of the synthesis (see for example “COMBINATORIAL” by A. W. Czarnik, Analytical Chemistry News & Features, pages 378 A to 386 A, Jun. 1, 1998).

[0004] Combinatorial chemistry methods aim to obtain a large number of well-defined products by reacting a small number of chemical reactants in all combinations defined by a given reaction scheme. NMR methods can be used to verify synthesis of these products with high throughput. The assessment of the measured NMR spectra has conventionally been carried out “manually” and mainly intuitively by highly specialized chemists and has also been based on relatively inaccurate model calculations.

[0005] The purity control and structure verification of compound libraries produced by automated synthesis and combinatorial chemistry both play an essential role in the success of medicinal chemistry programs. High performance liquid chromatography (HPLC), mass spectrometry (MS) and liquid chromatography-mass spectrometry (LC-MS) techniques are generally accepted as the most appropriate means of characterization. Although these analytical methods are fast and easy to automate, they do not provide sufficient structural and quantitative data about the desired products.

[0006] Nuclear magnetic resonance (NMR) spectroscopy is the most informative analytical technique and is widely applied in combinatorial chemistry. However, an automated interpretation of the NMR spectral results is difficult. The interpretation can usually be supported by use of spectrum calculation and structure generator programs. Automated structure validation methods rely on 13C NMR signal comparison using sub-structure/sub-spectra correlated databases or shift prediction methods.

[0007] In view of these aspects of prior art, it is the object of the present invention to present an NMR method which permits rapid, reproducible and reliable verification of a large number of molecular products produced by combinatorial chemistry.

SUMMARY OF THE INVENTION

[0008] In combinatorial chemistry, large numbers of compounds are synthesized by systematic combination of a relatively small number of molecules. For example, a three component reaction may involve linking three classes of molecules (building blocks) A, B, C to form a product denoted ABC. Each class may contain several species (A1, A2 . . . Ai; B1, B2 . . . Bj; C1, C2 . . . Ck). With just 10 building blocks in each class, 10×10×10=1000 different products can be formed. Thus, the structures of the synthesized products can be formally represented as a combination of individual molecular fragments (sub-structures) with one fragment coming from each class of building block. In many cases, a non-variable region (core) occurs in all molecules. A sub-structure code AxByCz defined by the synthesis can be assigned to each product. Both spectroscopic and chromatographic data can be regarded as the sum of data belonging to the sub-structures of a molecule.
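
The enumeration of sub-structure codes described above can be sketched in a few lines of Python. This is only an illustration: the function name `enumerate_library` and the dictionary layout are assumptions made for this example and do not appear in the original disclosure.

```python
from itertools import product


def enumerate_library(class_sizes):
    """Return every sub-structure code (e.g. 'A1B1C1') for the given class sizes.

    class_sizes maps a class label to its number of species, e.g. {"A": 10}.
    One species is taken from each class, in the order the classes are listed.
    """
    classes = [
        [f"{label}{index}" for index in range(1, size + 1)]
        for label, size in class_sizes.items()
    ]
    return ["".join(combination) for combination in product(*classes)]


# Ten building blocks in each of three classes give 10 x 10 x 10 products.
codes = enumerate_library({"A": 10, "B": 10, "C": 10})
print(len(codes))   # 1000
print(codes[0])     # A1B1C1
```

The same function applied to class sizes of 6, 4 and 4 yields ninety-six codes.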

[0009] Since synthesized products can be formally represented as a combination of individual molecular fragments, 2D NMR spectra can be regarded as the sum of spectra of sub-structures. The key idea of the invention is to systematically examine 2D C,H correlated NMR spectra and to derive subspectra of the individual molecular fragments. The subspectra are managed as spectral patterns.

[0010] Once the spectral patterns of all individual sub-structures have been defined, all available spectra can be tested for the presence of a particular sub-structure in the synthesized compounds. The proposed structure is verified (true) if all expected fragments are found. If at least one of the expected patterns is not found, then the spectrum is not verified (false). Spectra with a low signal-to-noise ratio, or with large amounts of impurities, are automatically assigned a “vague” category and should be checked manually. In the simplest case the verification procedure can be based on the integration of spectral patterns and comparison to an automatically detected noise level. Better results are obtained if a signal (e.g., from the core) can be defined as an internal reference signal to normalize all integrals. A reference spectrum is then defined for each sub-structure pattern. The corresponding integrals of the reference spectrum are defined as 100% and corresponding integral values of all other spectra are re-scaled accordingly. During the verification it is then possible to apply an additional threshold which expresses the minimum signal intensity of identified patterns. Example: A spectrum related to the structure code A1B1C1 would be classified as true if A1, B1, and C1 are identified and each integral exceeds at least 30%.
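
A minimal Python sketch of this classification follows. It is an illustration under stated assumptions: the function name, the argument layout, the core-intensity cutoff `min_core` and all example numbers are hypothetical. Only the 30% threshold and the categories true, false and vague are taken from the text above.

```python
def classify_spectrum(expected, integrals, core_integral, reference,
                      min_percent=30.0, min_core=1.0):
    """Classify one spectrum as 'true', 'false' or 'vague'.

    expected      -- fragment codes of the proposed structure, e.g. ["A1", "B1", "C1"]
    integrals     -- noise-corrected pattern integrals measured in this spectrum
    core_integral -- integral of the core (internal reference) signal
    reference     -- core-normalized integral of each pattern in its reference
                     spectrum; these values are defined as 100%
    """
    # Assumption: a weak core signal indicates too little substance ("vague").
    if core_integral < min_core:
        return "vague"
    for fragment in expected:
        normalized = integrals.get(fragment, 0.0) / core_integral
        percent = 100.0 * normalized / reference[fragment]
        if percent <= min_percent:
            return "false"   # at least one expected pattern is missing or too weak
    return "true"


reference = {"A1": 1.0, "B1": 0.5, "C1": 0.4}
code = ["A1", "B1", "C1"]
good = classify_spectrum(code, {"A1": 8.0, "B1": 5.0, "C1": 2.0}, 10.0, reference)
bad = classify_spectrum(code, {"A1": 8.0, "B1": 5.0, "C1": 1.0}, 10.0, reference)
weak = classify_spectrum(code, {"A1": 0.4, "B1": 0.2, "C1": 0.1}, 0.5, reference)
print(good, bad, weak)   # true false vague
```

In the second call the C1 pattern reaches only 25% of its reference value, so the structure is rejected even though a C1 signal is present.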

[0011] These basic principles and strategies can also be applied to 1D NMR spectra. The 1D NMR spectra are translated into 1-dimensional peak lists and clusters. Appropriate changes in various calculation techniques such as peak picking, cluster analysis, spectral pattern definition, noise level estimation, and pattern integration must be made. Again, individual structural fragments yield signals in different spectral regions and spectra are classified as “true” if all requested spectral patterns can be verified. Although there is usually more overlap of signals in the 1D spectra, good verification results can be obtained in many cases.

[0012] More specifically, the invention comprises the following steps:

[0013] (A) Selecting at least two classes of molecules, each class having a plurality of species;

[0014] (B) Assigning a sub-structure code to each species;

[0015] (C) Selecting compounds containing these species;

[0016] (D) Measuring multi-dimensional NMR spectra of at least a subset of the compounds selected in step C);

[0017] (E) Uniquely correlating signal groups in the NMR spectra with individual species and individual sub-structure codes;

[0018] (F) Checking the NMR spectra for the presence of all correlated signal groups; and

[0019] (G) If the check of a particular NMR spectrum yields the result that the signal groups of the sub-structures of a particular compound have been observed, this particular compound is characterized to be “true”.

[0020] The sequence of steps (A) to (G) is preferably in the above-mentioned order, but may proceed in a reasonably modified different order. For example, correlation of signal groups in the NMR spectra to individual sub-structures may be effected on the basis of previously obtained information, even prior to step (A). The expression “multi-dimensional NMR spectra” includes one-dimensional spectra.

[0021] In a preferred variant of the inventive method, step (G) is followed by step (H):

[0022] (H) If the check of a particular compound yields the result that at least one signal group of the species contained in the molecule was not observed in the NMR spectra, this compound is characterized “false”.

[0023] In this manner, proper synthesis according to plan is characterized as “true” in step (G) and, from the remaining spectra, those molecules for which the synthesis did not work out (at least not completely) are recognized through the absence of at least one of the signal groups in the NMR spectrum. For the combinations characterized “false” one can try to repeat the synthesis using the associated initial chemical substances to confirm that incomplete synthesis was not due to an error occurring during the initial synthesis procedure. Alternatively, particularly in the event of very large libraries of molecules, further observation of the molecules characterized to be “false” can be completely omitted since such molecules are difficult to synthesize.

[0024] In a further improvement, prior to steps (G) and (H), the NMR spectra are examined for a signal to noise ratio and/or a core signal intensity, and a combination of sub-structures is characterized as “vague” if the signal to noise ratio or the core signal intensity is less than a certain threshold value.

[0025] The classification of “vague” is generally given when too little substance was available in the sample for the measuring time, leading to poor signal to noise ratios. For spectra exhibiting a core, “vague” results can be associated with core signal intensities which are below a certain threshold value.

[0026] Of particular importance is a variant of the inventive method in which steps (E) and (F) are realized as follows:

[0027] (E1′) Selecting a subset of compounds which includes all species of sub-structures;

[0028] (E2′) Uniquely correlating signal groups in NMR spectra of the subset to individual sub-structures;

[0029] (E3′) If a unique signal group cannot be correlated with each sub-structure, an amended subset of compounds is selected and steps (E2′) and (E3′) are repeated;

[0030] (F′) Checking of the NMR spectra of all other compounds, not present in the subset, for the presence of all expected signal groups.

[0031] Since the subset of compounds is generally considerably smaller than the entire library of all possible combinations, the checking of the NMR spectra for the remaining combinations can be considerably accelerated. Clearly, the prerequisite for this is that a unique assignment of each sub-structure contained in the subset to a signal group is actually possible. If this is not the case, the subset has to be changed in step (E3′) and a new attempt at unique assignment in step (E2′) must be made. Only when the assignment is unique can all remaining NMR spectra be checked in step (F′) for the signal groups determined with the assistance of the subset to verify synthesis of the remaining compounds in the library.

[0032] A further embodiment of this preferred variant of the method is characterized in that the subset in step (E1′) is derived through modification of a subset of compounds used in a previous measurement series. Using previous results and assignments as a guide, a new subset of compounds can be selected which is more likely to meet the criterion of a unique assignment of signal groups to the individual sub-structures.

[0033] The number of compounds in the subset is preferably minimized by grouping sub-structures into classes having identical or similar chemical behavior. The smallest possible number of compounds in the subset is equal to the number of sub-structures in the largest class. This measure accelerates verification of the remaining NMR spectra.
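
One way to construct a subset of this minimum size is sketched below in Python. The cyclic assignment used here is an assumption chosen for simplicity, and the function name is hypothetical. The sketch only guarantees that every sub-structure occurs at least once; it does not guarantee that the signal groups can be assigned uniquely, so the subset may still have to be amended.

```python
def minimal_covering_subset(class_sizes):
    """Return a smallest set of codes in which every sub-structure occurs.

    class_sizes maps a class label to its number of species. The subset has
    as many compounds as the largest class has species; smaller classes are
    cycled through repeatedly.
    """
    largest = max(class_sizes.values())
    subset = []
    for k in range(largest):
        code = "".join(
            f"{label}{k % size + 1}" for label, size in class_sizes.items()
        )
        subset.append(code)
    return subset


subset = minimal_covering_subset({"A": 6, "B": 4, "C": 4})
print(subset)
# ['A1B1C1', 'A2B2C2', 'A3B3C3', 'A4B4C4', 'A5B1C1', 'A6B2C2']
```

Other subsets of the same size are equally valid, since the last compounds may use any species of the smaller classes.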

[0034] The assignment of signal groups in the NMR spectra of the subset to the individual sub-structures in step (E2′) can also be carried out manually. In this case, no special assessment software is required.

[0035] Alternatively, the assignment can be carried out automatically via computer which is considerably faster than “manual assessment”. Moreover, costly personnel are not required for the actual assessment and the verification is carried out objectively, in a reproducible fashion, and largely error-free.

[0036] In a further preferred variant of the inventive method, the assignment of signal groups to the sub-structures in step (E) is carried out using NMR spectra from a previous measurement series and/or on the basis of theoretically calculated spectral data. Using preexisting information, the assessment can be accelerated by using a selected subset of sub-structure combinations and the assignment of signal groups to the sub-structures can be carried out directly.

[0037] The organic molecules to be synthesized are preferably of low molecular weight, in a molecular weight range of approximately 100 u to approximately 2000 u. This is a mass range which is preferred in combinatorial chemistry. There are sufficient numbers of sub-structures in this mass range for carrying out the NMR measurements. Furthermore, two-dimensional NMR spectra are still relatively easy to analyze in this molecular weight range.

[0038] In a particularly preferred manner, the synthesized organic total molecule contains a section referred to as a “core”. The core is present in all molecules of the library and can be consequently characterized in the NMR spectra by a common signal group. It can be used as an internal reference for normalizing intensities. The core may be added prior to synthesis as an independent reactant, wherein the other sub-structures couple thereto. Alternatively, a core can be formed in a coupling range of the starting compounds themselves as a common section of the compound.

[0039] The core is preferably a sub-structure having between two and six chemical coupling points. In this case, the possible number of combinations remains sufficiently clear. Moreover, a multitude of commercial substances can be used as a core with this kind of core sub-structure.

[0040] In a particularly preferred further development of the method, the sub-structures of a class of molecules are assigned to a respective common coupling point of the core.

[0041] The number of sub-structures should be considerably greater than three to make a combinatorial approach reasonable at all.

[0042] In the simplest case the multi-dimensional NMR spectra are one-dimensional. Preferably, however, the NMR spectra are two-dimensional, in particular 13C/1H correlated spectra (e.g. HSQC spectra, i.e. heteronuclear single quantum coherence spectra; see e.g. J. Magn. Reson. B108, pages 94-98 (1995)). Two-dimensional NMR spectra can be generated in rather short measuring times on the order of minutes with a resolution which is substantially better than that of one-dimensional spectra. The multi-dimensional NMR spectrum preferably comprises signals stemming from coupling between 1H and 13C nuclei. In this way, the two most important atomic species of organic chemistry are included.

[0043] The assignment of signal groups in the NMR spectra to the individual sub-structures can be carried out particularly easily by formal addition and subtraction of normalized spectra of the associated structure codes. In a computer automated application of the method, the corresponding data can be quickly processed in this manner with the assistance of cluster algorithms.

[0044] In order to reduce the information content of the NMR spectra to the essential relevant features, a further preferred variant of the method provides peak lists which are established from the multi-dimensional NMR spectra to define the signal groups.

[0045] In a particularly easy standard for the recognition of peaks, a data point of the multi-dimensional, preferably two-dimensional, NMR spectrum is recognized as a “peak” if its value is larger than those of the n neighboring data points, wherein e.g. 4≦n≦12, preferably n=8.
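
The peak criterion for n=8 can be sketched in Python as follows. The function name and the small example grid are hypothetical. The sketch also assumes that points on the border of the spectrum are skipped and that no additional intensity floor is applied, neither of which is specified in the text.

```python
def pick_peaks(grid):
    """Return (row, column) of every interior data point that is strictly
    larger than all 8 of its nearest neighbours."""
    peaks = []
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            value = grid[r][c]
            neighbours = [
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            if all(value > other for other in neighbours):
                peaks.append((r, c))
    return peaks


grid = [
    [0, 0, 0, 0, 0],
    [0, 5, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0],
]
print(pick_peaks(grid))   # [(1, 1), (3, 3)]
```

The shoulder points of value 1 are not reported because each of them has a larger neighbour.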

[0046] In a preferred further development, neighboring peaks are combined into clusters and are assessed by means of cluster analysis, wherein one or more clusters are assigned to a given sub-structure as a signal group. In this manner, a two-dimensional definition of the signal groups is possible. This method is insensitive to the fine structure of the individual peaks, which can be neglected. The analysis of the cluster as such is described e.g. in K.-P. Neidig et al., Journal of Magnetic Resonance 89, pages 543 to 552 (1990).
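
As an illustration of combining neighboring peaks into clusters, the following Python sketch groups peaks by simple distance-based linkage and reports a bounding rectangle per cluster. This is a deliberately simplified stand-in and not the cluster analysis of the cited reference. The function names, the plain Euclidean distance on unscaled axes and the example coordinates are all assumptions.

```python
from math import hypot


def cluster_peaks(peaks, radius):
    """Group peaks so that each peak lies within `radius` of at least one
    other member of its cluster (single-linkage grouping)."""
    unvisited = list(peaks)
    clusters = []
    while unvisited:
        stack = [unvisited.pop()]
        members = []
        while stack:
            p = stack.pop()
            members.append(p)
            close = [q for q in unvisited
                     if hypot(p[0] - q[0], p[1] - q[1]) <= radius]
            for q in close:
                unvisited.remove(q)
            stack.extend(close)
        clusters.append(sorted(members))
    return sorted(clusters)


def bounding_box(cluster):
    """Return (x_min, x_max, y_min, y_max) enclosing all peaks of a cluster."""
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return (min(xs), max(xs), min(ys), max(ys))


peaks = [(3.80, 55.0), (3.81, 55.2), (7.20, 128.0), (7.22, 128.1)]
clusters = cluster_peaks(peaks, radius=0.5)
print(clusters)
print(bounding_box(clusters[0]))   # (3.8, 3.81, 55.0, 55.2)
```

In practice the two axes would likely need separate scaling, since proton and carbon chemical shifts span very different ranges.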

[0047] In a particularly preferred further development, the method assigns a cluster surface to each cluster in the two-dimensional NMR spectrum (more generally: a hyper surface in a multi-dimensional spectrum) and a sub-structure is regarded as recognized if, for all cluster surfaces assigned thereto, the NMR signal integrated over the cluster surface exceeds a predetermined threshold value. In this way, a highly reliable pattern recognition of sub-structures in the NMR spectrum is possible.

[0048] The threshold value can be defined as a normalized constant. The threshold value is preferably chosen normalized to the integral of the NMR signal over cluster surfaces assigned to other sub-structures. This enables normalization of the respective signal group and is particularly useful for a core sub-structure. The threshold value is then independent of the absolute intensities of the spectra.

[0049] In a further preferred variant of the method, a table is established in steps (G) to (I) to display the results of analysis of NMR spectra measured for the remaining compounds in the library, recording sub-structures (columns) and whether they were recognized (+) or not (−). In three additional columns, the sum of the recognized sub-structures, a total assessment (“true” or “false”, possibly “vague”) and the required combined sub-structure code is indicated. In this manner, the total result of the combinatorial measuring series can be conveniently summarized.

[0050] Further advantages of the invention can be derived from the description and the drawing. The features mentioned above and below can be used individually or in any arbitrary combination. The embodiments shown and described are not to be understood as exhaustive enumeration but rather have exemplary character for illustrating the invention.

[0051] The invention is shown in the drawing and is further explained by means of an embodiment.

BRIEF DESCRIPTION OF THE DRAWING

[0052] FIG. 1a shows a schematic representation of an organic molecular compound formed from the sub-structures Ax+By+Cz wherein the sub-structures form a common core;

[0053] FIG. 1b shows a schematic representation of an organic molecular compound formed from the sub-structures Ax+By+Cz and a core molecule;

[0054] FIG. 2 shows the sub-structures used in the spectra of FIGS. 3 to 7;

[0055] FIGS. 3 through 7 each show an NMR spectrum of a combination of three sub-structures Ax, By, Cz with a core molecule, namely

[0056] FIG. 3 A2+B2+C2;

[0057] FIG. 4 A2+B1+C1;

[0058] FIG. 5 A2+B1+C3;

[0059] FIG. 6 A2+B2+C3;

[0060] FIG. 7 A1+B2+C2;

[0061] FIG. 8 shows a second example of the invention, having a library of ninety-six 4-phenylbenzopyrans generated in a three component reaction;

[0062] FIG. 9 shows how linear combinations of spectra can be used to extract pattern box C3;

[0063] FIG. 10 shows decomposition of a 2D HSQC spectrum of a compound into subspectra corresponding to each of the three molecular fragments A2, B1, and C1;

[0064] FIG. 11 shows a 1D spectrum of a synthesis product;

[0065] FIG. 12 illustrates synthesis of 4-phenylbenzopyran library 1; and

[0066] FIG. 13 illustrates results of automated NMR analysis in accordance with the invention in comparison to ESIMS and HPLC analysis.

DESCRIPTION OF THE PREFERRED EMBODIMENT

[0067] Chemical compounds of a combinatorial series are particularly useful for automatic or at least partially automatic interpretation, since the signals of the structures in the combinatorial series can be separated formally into a core module, which is identical for all members of the series, and into a few variable module classes which are varied systematically via a limited number of structural fragment species in the class.

[0068] FIG. 1a shows such a compound in a highly schematic fashion. The combined organic molecule consists of three sub-structures Ax, By and Cz which form a common core section in the range of their mutual connections (dashed lines in FIG. 1a).

[0069] FIG. 1b shows an alternative in which the combined organic molecule is formed with its own core sub-molecule and having three attached sub-structures Ax, By and Cz.

[0070] These combined molecules can be described by structure codes which consist of a sub-structure class with a corresponding sub-structure index Ax, By and Cz etc. The indices x, y, z each represent a species or sub-structure and are successive integers (1, 2, 3 . . . ).

[0071] Such sub-structure elements can be identified as signals or signal groups in two-dimensional HSQC spectra as shown below. The examples shown in FIGS. 3 to 7 are chemical substances represented in FIG. 2. Referring to FIG. 2 one can define:
(1) 4-nitrophenyl = B2
(2) phenyl = B1
(3) 3,4-methylenedioxy-phenoxy = A2
(4) 3-hydroxy,4-bromo-phenoxy = A1
(5) tert-butyloxycarbonyl-piperazyl = C2
(6) morpholinyl = C1
(7) 2-methoxy-piperazyl = C3

[0072] The results of NMR experiments are shown in FIGS. 3 to 7. The spectra represent two-dimensional so-called HSQC (heteronuclear single quantum coherence) experiments. Applied to protons and carbon (13C), the signals in those spectra show the correlation between carbon atoms and protons chemically bound thereto, i.e. the carbon signals in the direction δ1 and the proton signals in the direction δ2.

[0073] Identification of signal groups belonging to a certain sub-structure can be carried out manually or automatically. For automatic analysis, one performs formal algebraic additions and subtractions on the spectra associated with specific structure codes to isolate signals originating from a particular sub-structure. For analysis purposes one can assign the value “1” to each sub-structure present in a particular combination and use a threshold to extract a particular sub-structure. Consider the following sub-structure combinations:

[0074] A2 B2 C2

[0075] A2 B1 C1

[0076] A2 B1 C3

[0077] A2 B2 C3

[0078] A1 B2 C2

[0079] The addition of

[0080] A2 B1 C3

[0081] A2 B2 C3

[0082] and

[0083] subtraction of

[0084] A2 B2 C2

[0085] A2 B1 C1

[0086] A1 B2 C2

[0087] yields the following sums for the sub-structures

[0088] A1=−1

[0089] A2=0

[0090] B1=0

[0091] B2=−1

[0092] C1=−1

[0093] C2=−2

[0094] C3=2

[0095] If one sets the threshold value at 2, only C3 remains.

[0096] The general rule is as follows: Add all N structure codes which contain the desired sub-structure, subtract all others, and set the threshold value to ≤N (e.g. N/2).
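
The formal addition and subtraction of structure codes can be reproduced with a short Python sketch. The function names are hypothetical, and the regular expression assumes that each sub-structure is written as one capital letter followed by its index.

```python
import re
from collections import Counter


def fragment_sums(codes, target):
    """Add every code that contains `target`, subtract every other code,
    and return the resulting sum for each sub-structure."""
    sums = Counter()
    for code in codes:
        fragments = re.findall(r"[A-Z]\d+", code)
        sign = 1 if target in fragments else -1
        for fragment in fragments:
            sums[fragment] += sign
    return dict(sums)


def surviving(codes, target, threshold):
    """Return the sub-structures whose sum reaches the threshold."""
    return [f for f, s in fragment_sums(codes, target).items() if s >= threshold]


codes = ["A2B2C2", "A2B1C1", "A2B1C3", "A2B2C3", "A1B2C2"]
print(fragment_sums(codes, "C3"))
# {'A2': 0, 'B2': -1, 'C2': -2, 'B1': 0, 'C1': -1, 'C3': 2, 'A1': -1}
print(surviving(codes, "C3", threshold=2))   # ['C3']
```

Here N=2 codes contain C3, and both the threshold N and the threshold N/2 leave only C3.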

[0097] This formal operation can be carried out in practice on peak lists obtained from the corresponding spectra. A data point is thereby recognized as a two-dimensional peak if its value is larger than each of its 8 nearest neighbors. Neighboring peaks can be combined by means of a cluster analysis which evaluates distances and intensities to form groups (clusters).

[0098] When a peak of a spectrum is added to the peaks of another spectrum, it is included in the associated list with an increase in intensity for all peaks which are within a pre-defined radius.

[0099] When a peak of a spectrum is subtracted from the peaks of another spectrum, it is removed from the respective list and the intensities of all peaks which are within a pre-defined radius are reduced.

[0100] The result is a list of peaks which originate from the signals of the desired sub-structure. Since these signals may be slightly different in various spectra, the peaks appear several times and in groups. The groups or clusters are determined by a cluster analysis.
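
The peak-list operations of the preceding paragraphs can be approximated in Python as shown below. This is a simplified reading, not the actual procedure: each peak from a spectrum containing the desired sub-structure is scored by counting the spectra with that sub-structure that have a peak within the radius, minus the spectra without it that have a peak within the radius. The function names, the plain Euclidean distance and the example peak positions are assumptions.

```python
from math import hypot


def near(p, q, radius):
    """True if peaks p and q lie within `radius` of each other."""
    return hypot(p[0] - q[0], p[1] - q[1]) <= radius


def score_peaks(with_target, without_target, radius):
    """Score each peak of the spectra containing the desired sub-structure.

    with_target / without_target are lists of peak lists, one per spectrum.
    The score is +1 per target spectrum and -1 per non-target spectrum that
    has at least one peak within `radius`.
    """
    scored = []
    for peaks in with_target:
        for p in peaks:
            plus = sum(any(near(p, q, radius) for q in other)
                       for other in with_target)
            minus = sum(any(near(p, q, radius) for q in other)
                        for other in without_target)
            scored.append((p, plus - minus))
    return scored


with_target = [[(3.80, 55.0), (7.20, 128.0)], [(3.81, 55.2), (7.20, 128.1)]]
without_target = [[(7.21, 128.0), (1.40, 28.0)]]
scored = score_peaks(with_target, without_target, radius=0.5)
kept = [p for p, score in scored if score >= 2]
print(kept)   # [(3.8, 55.0), (3.81, 55.2)]
```

The peaks near 7.2/128 also occur in the spectrum without the desired sub-structure and therefore fall below the threshold, while the two surviving peaks appear as one group at slightly different positions.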

[0101] The signals obtained for the desired sub-structure are represented by small rectangles in the spectra, with each rectangle containing exactly one cluster. The width and height of these areas correspond to the expected variations of the signals in the given set of spectra.

[0102] When the remaining spectra are checked, integration over the areas of all sub-structures is carried out (summation of all corresponding data points). Furthermore, for each spectrum, a pre-defined area which does not contain any signals is integrated and a noise value is calculated therefrom. The noise value is subtracted from all integrals.

[0103] Signals of the “core” sub-structure can be defined as a reference and integrated separately. The integral ratios between all areas of all sub-structures can also be calculated.

[0104] A sub-structure is regarded as recognized if all of its areas have an integration value >0. A sub-structure can also be regarded as recognized if all its areas exceed a defined integration value, compared to a reference value.

[0105] A sub-structure could also be regarded as recognized if all ratios of the integrals of all of its areas to the integrals of all other areas of all other sub-structures exceed a defined value.
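
The integration and the simplest recognition criterion (all areas have an integration value greater than zero after noise subtraction) can be sketched in Python. The function names and the example grid are hypothetical. Scaling the noise value by the number of data points in each area is an assumption, since the text does not state how the noise value is calculated.

```python
def integrate_box(grid, box):
    """Sum all data points inside box = (row0, row1, col0, col1), bounds inclusive."""
    row0, row1, col0, col1 = box
    return sum(grid[r][c]
               for r in range(row0, row1 + 1)
               for c in range(col0, col1 + 1))


def box_size(box):
    """Number of data points inside the box."""
    row0, row1, col0, col1 = box
    return (row1 - row0 + 1) * (col1 - col0 + 1)


def is_recognized(grid, boxes, noise_box, min_integral=0.0):
    """A sub-structure is recognized if every one of its areas has a
    noise-corrected integral above `min_integral`."""
    # Assumed noise model: mean level of a signal-free area, per data point.
    noise_per_point = integrate_box(grid, noise_box) / box_size(noise_box)
    for box in boxes:
        corrected = integrate_box(grid, box) - noise_per_point * box_size(box)
        if corrected <= min_integral:
            return False
    return True


grid = [
    [1, 1, 1, 1, 1, 1],
    [1, 9, 8, 1, 1, 1],
    [1, 1, 1, 1, 6, 1],
    [1, 1, 1, 1, 1, 1],
]
noise_box = (3, 3, 0, 5)                      # bottom row contains no signal
present = is_recognized(grid, [(1, 1, 1, 2), (2, 2, 4, 4)], noise_box)
absent = is_recognized(grid, [(0, 0, 0, 1)], noise_box)
print(present, absent)   # True False
```

A threshold relative to a reference, as in the second criterion above, could be modelled by passing a non-zero `min_integral`.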

[0106] In the embodiment of FIGS. 2 through 7, x=2, y=2 and z=3. This results in 2×2×3 =12 possible combinatorial combinations (Ax By Cz). The minimum subset for correlating the signals of each sub-structure would have to comprise at least three molecules to assure that C1, C2 and C3 are all present.

[0107] The results of the measurements are summarized in the following table:

Combinations  A2B2C2  A2B1C1  A2B1C3  A2B2C3  A1B2C2
A1            −       −       −       −       +
A2            +       +       +       +       −
B1            −       +       +       −       −
B2            +       −       −       +       +
C1            −       +       −       −       −
C2            +       −       −       −       +
C3            −       −       +       +       −
Core          +       +       +       +       +
Sum           4       4       4       4       4
Result        +       +       +       +       +

[0108] FIGS. 8 to 14 illustrate results for another library. FIG. 8 shows ninety-six 4-phenylbenzopyrans generated in a three component reaction. For x=6, y=4 and z=4, x × y × z = 96 different compounds with the sub-structure codes AxByCz are obtained. Library 1 is characterized by x+y+z+core=15 different structural fragments, and a subset of six of the ninety-six compounds contains all structural fragments (e.g. A1B1C1, A2B2C2, A3B3C3, A4B4C4, A5B1C3 and A6B2C4).

[0109] FIG. 9 illustrates a linear combination of spectra to extract pattern box C3. Signals are peak picked and transformed into peak areas. Overlapping peak areas of spectra containing the structural fragment code C3 are added (counted) and peak areas of spectra not containing C3 are subtracted. The threshold is adjusted so that only peak areas of C3 remain, and after a clustering step, boxes are defined for each remaining peak area.

[0110] FIG. 10 illustrates decomposition of a 2D HSQC spectrum of a compound into sub-spectra corresponding to each of the three molecular fragments A2, B1, and C1. The width and height of the boxes indicate the expected range of chemical shift for the signals of a given fragment. A spectral pattern is defined by the combination of the corresponding boxes. In FIG. 10A the spectral patterns of each fragment are found and the structure of the expected compound A2B1C1 is therefore validated. In FIG. 10B the structure of compound A2B1C1 is not verified because the spectral patterns of both A2 and C1 are missing.

[0111] FIG. 11 shows a 1D spectrum of a synthesis product. Different signals are related to different molecular fragments.

[0112] FIG. 12 illustrates synthesis of 4-phenylbenzopyran library 1.

[0113] FIG. 13 illustrates results of the automated NMR method of the invention in comparison to ESIMS and HPLC analysis. Each cell contains the expected structure code, the final assignment, and the data for NMR (top left), ESIMS (top middle), and HPLC (top right). Light grey coloration means that the proposed structure is “true” in NMR, gives the expected molecular ion in ESIMS, and shows the expected retention time in HPLC. Dark grey means that the proposed structure is “false” following NMR, does not give a diagnostic molecular ion in ESIMS, or the retention time differs from the expected one. White is given for “vague” results in both NMR and ESIMS. HPLC purity is given in % (top right). Combined results are given in the structure code field (light grey: “true”, dark grey: “false”, white: “vague”). The classification “true” of the HPLC analysis was not taken into consideration for the final assignment. Contradictory results lead to the final category “vague”. Eighteen compounds were not obtained by the synthesis procedure (B10, C1, C6, C12, D1, D3, D4, D8, D9, E12, F12, G1, G9, G11, H1, H7, H8, H11).

[0114] The 4-phenylbenzopyran library 1 was synthesized using a multi-component reaction by the combination of phenols, unsaturated aldehydes and secondary amines (FIG. 12). The products were purified before analysis. The 1H NMR and 2D HSQC spectra of the ninety-six 4-phenylbenzopyrans were measured using standard NMR probes (5-mm) within sixteen hours.

[0115] The software analysis of the spectra includes the following steps:

[0116] 1. Enter into the software:

[0117] a) list of codes for the possible molecular fragments involved in the combinatorial reaction

[0118] b) the paths to the recorded spectra and the associated structure codes.

[0119] 2. Perform calculation step to define the integration boxes for each molecular fragment. The output is a set of boxes assigned to each fragment.

[0120] 3. Perform calculation step to determine appropriate reference spectra.

[0121] 4. Perform calculation step to integrate all spectra. As output, a graphical display in rack format is shown, optionally using three colors (red, green, and yellow) to characterize the samples, and a textual result list is written to disk.

[0122] The following table summarizes verification results for the example of FIGS. 8 through 14.

Compound  NMR result  A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  C1  C2  C3  C4
A1B1C1    +           +   −   −   −   −   −   +   −   −   −   +   −   −   −
A1B2C1    +           +   −   −   −   −   −   −   +   −   −   +   −   −   −
A1B3C1    −           +   −   −   −   −   −   −   −   −   −   +   −   −   −
A1B4C1    ?           +   −   −   −   −   −   −   −   −   +   −   −   −   −
A1B1C2    +           +   −   −   −   −   −   +   −   −   −   −   +   −   −
A1B2C2    +           +   −   −   −   −   −   −   +   −   −   −   +   −   −
A1B3C2    −           +   −   −   −   −   −   −   +   −   −   −   +   −   −
A1B4C2    −           +   −   −   −   −   −   −   −   −   +   −   −   −   −
A2B1C1    +           −   +   −   −   −   −   +   −   −   −   +   −   −   −
A2B2C1    +           −   +   −   −   −   −   −   +   −   −   +   −   −   −

[0123] In the columns labeled with fragment codes, the “+” and “−” entries indicate whether or not the corresponding spectral pattern was identified in a given spectrum. The column labeled NMR result indicates whether the structure is verified (+), false (−) or vague (?). For example, for compound A1B3C2 the patterns A1, B2, and C2 were identified and the compound was assigned false. In this case the sample had been exchanged and the correct structure code would be A1B2C2.
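
The evaluation of a single table row can be sketched in Python as follows. The function name and return layout are hypothetical. The sketch covers only the verified and false outcomes, since the vague category depends on signal-to-noise information that is not part of the table, and the suggested code is merely the concatenation of the patterns that were found.

```python
import re


def evaluate_row(expected_code, found):
    """Compare the expected structure code with the set of identified patterns.

    Returns the NMR result ('+' or '-'), the list of expected fragments that
    were not found, and the code formed by the patterns actually identified.
    """
    expected = re.findall(r"[A-Z]\d+", expected_code)
    missing = [fragment for fragment in expected if fragment not in found]
    result = "+" if not missing else "-"
    observed_code = "".join(sorted(found))
    return result, missing, observed_code


print(evaluate_row("A1B3C2", {"A1", "B2", "C2"}))
# ('-', ['B3'], 'A1B2C2')
```

For the exchanged sample the expected fragment B3 is missing, and the identified patterns point to the code A1B2C2.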

Claims

1. A nuclear magnetic resonance (NMR) method to verify the presence of compounds from a library of organic molecular compounds, the compounds consisting of molecular sub-structures, the method comprising the following steps:

(A) selecting at least two classes of molecular sub-structures, each class having a plurality of sub-structure species;
(B) assigning a sub-structure code to each species, said code comprising a sub-structure class and a sub-structure index;
(C) selecting compounds containing these species;
(D) measuring multi-dimensional NMR spectra of at least a subset of the selected organic molecular compounds in the library;
(E) uniquely assigning signal groups in said NMR spectra to individual sub-structures and individual sub-structure codes;
(F) checking said NMR spectra for the presence of all assigned signal groups;
(G) if said checking of a particular spectrum indicates that the signal groups of sub-structures contained in one compound had been observed, this particular compound is characterized to be TRUE.

2. The method of claim 1, comprising the additional step:

(H) if the check of a particular compound yields the result that at least one of the signal groups of the sub-structures contained in a total code of that compound had not been observed, this particular combination is characterized to be FALSE.

3. The method of claim 2, comprising the additional steps of examining, prior to steps (G) and (H), the NMR spectra for at least one of a signal to noise ratio and a core signal intensity and characterizing a combination of sub-structures as VAGUE if at least one of said signal to noise ratio and said core signal intensity is less than a threshold value.

4. The method of claim 1, wherein the steps (E) and (F) comprise:

(E1′) selecting a subset of compounds, wherein each sub-structure is present in at least one of the compounds in the subset;
(E2′) uniquely assigning signal groups of the NMR spectra of said subset to individual sub-structures;
(E3′) if it is not possible to uniquely assign one signal group to each sub-structure, selection of a modified subset is performed and the process repeated starting from step (E2′); and
(F′) checking NMR spectra of remaining compounds, that do not belong to said subset, for presence of all assigned signal groups.

5. The method of claim 2, wherein the steps (E) and (F) comprise:

(E1′) selecting a subset of compounds, wherein each sub-structure is present in at least one compound;
(E2′) uniquely assigning signal groups of the NMR spectra of said subset to individual sub-structures;
(E3′) if it is not possible to uniquely assign one signal group to each sub-structure, a selection of a modified subset of compounds is performed and the process repeated starting from step (E2′);
(F′) checking NMR spectra of remaining compounds, that do not belong to said subset, for a presence of all assigned signal groups.

6. The method of claim 3, wherein the steps (E) and (F) comprise in detail:

(E1′) selecting a subset of compounds, wherein each sub-structure is present in at least one compound;
(E2′) uniquely assigning signal groups of NMR spectra of said subset to individual sub-structures;
(E3′) if it is not possible to uniquely assign one signal group to each sub-structure, a modified subset is selected and the procedure repeated starting from step (E2′);
(F′) checking NMR spectra of remaining compounds, that do not belong to said subset, for the presence of all assigned signal groups.

7. The method of claim 4, wherein said subset of step (E1′) is derived by modifying a subset of compounds of an earlier measuring series.

8. The method of claim 5, wherein said subset of step (E1′) is derived by modifying a subset of compounds of an earlier measuring series.

9. The method of claim 6, wherein said subset of step (E1′) is derived by modifying a subset of compounds of an earlier measuring series.

10. The method of claim 4, wherein a number of compounds contained in said subset is minimized.

11. The method of claim 5, wherein a number of compounds contained in said subset is minimized.

12. The method of claim 9, wherein a number of compounds contained in said subset is minimized.

13. The method of claim 1, wherein the organic molecular compounds have molecular weights in a range from 100 u to 2000 u.

14. The method of claim 3, wherein the organic molecules have molecular weights in the range from 100 u to 2000 u.

15. The method of claim 1, wherein a common core is present in all compounds.

16. The method of claim 3, wherein a common core is present in all compounds.

17. The method of claim 15, wherein said core is a sub-structure with between 2 and 6 binding sites.

18. The method of claim 1, wherein a number of sub-structures is between 5 and 500.

19. The method of claim 1, wherein said multi-dimensional NMR spectrum is a two-dimensional 1H and 13C correlated spectrum.

20. The method of claim 1, wherein said assignment of signal groups in the NMR spectra to individual sub-structures is achieved by formal addition and subtraction of normalized structure codes.

21. The method of claim 3, further comprising generating peak lists of multi-dimensional NMR spectra for definition of signal groups.

22. The method of claim 21, wherein a data point of a multi-dimensional NMR spectrum is recognized to be a peak if its intensity is larger than that of n neighboring data points, wherein 4≦n≦12.

23. The method of claim 22, wherein neighboring peaks are combined into clusters and are analyzed by means of cluster analysis, wherein one or more clusters are assigned to the sub-structures as signal groups.

24. The method of claim 23, wherein a cluster area is assigned to each cluster inside a two-dimensional NMR spectrum and wherein a particular sub-structure is recognized to be identified if an NMR signal, integrated over said cluster area, is greater than a predetermined limit for all cluster areas assigned to said particular sub-structure.

25. The method of claim 24, wherein said limit is a ratio to an integral of NMR signal over cluster areas which are assigned to other sub-structures.

26. The method of claim 3, wherein, in steps (G) to (I), a table is created containing symbols indicating, for all sub-structures (columns), whether they were identified (+) in the NMR spectra or not (−), and wherein, in three additional columns, a sum of the identified sub-structures, a total classification (TRUE, FALSE, VAGUE), and a required total code are displayed.

27. The method of claim 25, wherein, in steps (G) to (I), a table is created containing symbols indicating, for all sub-structures (columns), whether they were identified (+) in the NMR spectra or not (−), and wherein, in three additional columns, a sum of the identified sub-structures, a total classification (TRUE, FALSE, VAGUE), and a required total code are displayed.

28. A nuclear magnetic resonance method to verify the presence of organic molecular compounds in a library of compounds having molecular weights in a range from 100 u to 2000 u and having between 5 and 500 sub-structures of interest, the method comprising the following steps:

(A) selecting at least two classes of molecular sub-structures, each class comprising a plurality of sub-structure species;
(B) assigning a sub-structure class and a sub-structure code to each species;
(C) selecting organic molecular compounds of the library containing the sub-structures, wherein a total code is assigned to each compound in the library;
(D) measuring n-dimensional (1≦n≦3) NMR spectra, comprising at least one of 1H and 13C, of at least some of the selected organic molecular compounds in the library;
(E1′) selecting a subset of compounds, wherein each sub-structure is present in at least one compound in said subset, wherein a number of compounds contained in said subset is minimized;
(E2′) uniquely assigning signal groups of the NMR spectra to individual sub-structures by formal addition and subtraction on respective structure codes;
(E3′) if it is not possible to uniquely assign one signal group to each sub-structure, a modified subset of compounds is selected and the process repeated starting from step (E2′);
(F′) checking the NMR spectra of remaining compounds, that do not belong to said subset, for a presence of all assigned signal groups;
(G) if said checking of a particular compound indicates that exactly the signal groups of sub-structures contained in said compound had been observed, this particular compound is characterized to be TRUE;
(H) if said checking of a particular compound indicates that at least one of the signal groups of sub-structures contained in that compound had not been observed, this particular compound is characterized to be FALSE;
(I) if said checking of a particular compound indicates that neither (G) nor (H) is the case, this particular compound is characterized to be VAGUE.

29. The method of claim 28, further comprising constructing peak lists of multi-dimensional NMR spectra for definition of signal groups.

30. The method of claim 29, wherein a data point of an n-dimensional NMR spectrum is recognized to be a peak if its intensity is larger than that of n neighboring data points, with 4≦n≦12.

31. The method of claim 30, wherein neighboring peaks are combined into clusters and are analyzed by means of cluster analysis, wherein one or more clusters are assigned to sub-structures as signal groups.

32. The method of claim 31, wherein a cluster range is assigned to each cluster inside an n-dimensional NMR spectrum and wherein a particular sub-structure is recognized to be identified if an NMR signal, integrated over a cluster range, is greater than a predetermined limit for all cluster ranges assigned to that particular sub-structure.

33. The method of claim 32, wherein said limit is a ratio to an integral of NMR signal over cluster ranges which are assigned to other sub-structures.

34. The method of claim 33, wherein, in steps (G) to (I), a table is created containing symbols indicating, for all sub-structures (columns), whether they were identified (+) or not (−) in the NMR spectra of compounds not present in said subset, and wherein, in three additional columns, a sum of identified sub-structures, a total classification (TRUE, FALSE, VAGUE), and an associated total code are displayed.

Patent History
Publication number: 20020001816
Type: Application
Filed: Jun 26, 2001
Publication Date: Jan 3, 2002
Applicant: Bruker Analytik GmbH (Rheinstetten)
Inventors: Harald Shroeder (D-79576 Weil am Rhein), Klaus-Peter Neidig (D-76275 Ettlingen)
Application Number: 09888596