COMPUTATIONAL FILTERING OF METHYLATED SEQUENCE DATA FOR PREDICTIVE MODELING
Computational techniques are disclosed for using methylation profiles to classify the medical condition of a person. Initial sequence data is obtained containing sequences of an initial set of nucleic acids from a biological sample of a person. The initial sequence data is filtered to generate filtered sequence data that describes sequences of a filtered subset of nucleic acids from the biological sample. A methylation profile is determined for the filtered subset of nucleic acids from the biological sample. The methylation profile can be processed to determine a likelihood that the person has a specified medical condition. The system outputs an indication of the likelihood that the person has the specified medical condition.
This application claims the benefit of priority to U.S. Application Ser. No. 62/832,157, filed on Apr. 10, 2019, U.S. Application Ser. No. 62/882,215, filed on Aug. 2, 2019, U.S. Application Ser. No. 62/928,156, filed on Oct. 30, 2019, U.S. Application Ser. No. 63/007,204, filed on Apr. 8, 2020, U.S. Application Ser. No. 63/007,208 filed on Apr. 8, 2020, and U.S. Application Ser. No. 63/007,218, filed on Apr. 8, 2020. The disclosures of each of these applications are considered part of the disclosure of the present document, and are each incorporated by reference in their entireties.
GOVERNMENT FUNDING STATEMENT
This invention was made with government support under grant number HD068578 awarded by the National Institutes of Health. The government has certain rights in the invention.
BACKGROUND
1. Technical Field
This document relates to systems and methods for classifying a condition of a mammal (e.g., a human) using predictive models (e.g., machine-learning models) that process methylation patterns of DNA obtained from a biological sample. Certain implementations of the techniques described herein employ computational methods to perform filtering operations on an input data set that can improve efficiency of the prediction or classification task while increasing model sensitivity.
2. Background Information
Machine-learning involves the analysis of data samples to identify features and patterns in the data samples that can be employed to perform tasks such as classification and prediction without explicit instructions. Some machine-learning techniques determine a model for performing the desired task, such as a neural network, a regression model, a decision tree, a support vector machine, and a naïve Bayes machine.
Methylation patterns in a mammal's DNA have been correlated with certain medical conditions or phenotypes of the mammal. However, data sets describing DNA sequences and/or methylation patterns are typically quite large and can be computationally expensive to process.
SUMMARY
This document describes systems, methods, devices, and other techniques for training and applying models to classify a medical condition of a person or other mammal based on methylation patterns occurring in DNA sequences of the person. In some aspects, the disclosed techniques employ a filtering operation to enrich a subset of sequences represented in an initial data set based on methylation characteristics and/or copy number characteristics of the nucleic acids. The filtering operation can, in some embodiments, achieve advantages including reducing the size of the input data set (e.g., a methylation profile) provided to the classifier (e.g., a machine-learning model), decreasing the time and computational expense required to process the input data set, and/or improving the sensitivity of the model to patterns that have the highest predictive power in differentiating persons with normal from abnormal medical conditions.
In some aspects, the disclosed techniques can involve identifying a set of reference CpG sites of abnormal individuals. Computing systems can estimate either restricted reference component methylomes or mixture methylomes that are independent linear combinations of certain reference component methylomes. The proportions of these components at the reference CpG sites for the tested biological samples can further be estimated, and the system can then predict the methylation level of the tested biological samples at a target set of CpG sites under the hypothesis that the sample is from a normal individual. The predicted methylation levels can then be compared against the observed methylation levels, and a classification for the individual as either exhibiting a normal or abnormal condition with respect to the specified medical condition can then be determined.
Further implementations of the disclosed subject matter include methods performed by a computing system. The system can obtain initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from multiple different tissues of the person. The system can filter the initial sequence data to generate filtered sequence data that describes sequences of a filtered subset of nucleic acids from the biological sample. Filtering can include (i) selecting target nucleic acids from the initial set of nucleic acids based on at least one of a methylation characteristic or a copy number characteristic of the target nucleic acids and (ii) enriching the target nucleic acids in the filtered subset. A methylation profile can be determined for the filtered subset of nucleic acids from the biological sample. The system processes the methylation profile for the filtered subset of nucleic acids to determine a likelihood that the person has a specified medical condition, and an indication of the likelihood that the person has the specified medical condition can then be provided as an output of the system.
These and other implementations can further include one or more of the following features.
The system can identify a pre-defined set of genomic regions (i.e., genomic loci). Selecting target nucleic acids from the initial set of nucleic acids can include comparing nucleic acid sequences from the initial set of nucleic acids to sequences from the pre-defined set of genomic regions. Enriching the target nucleic acids in the filtered subset can include discarding nucleic acid sequences from the initial sequence data that are not among the sequences from the pre-defined set of genomic regions, while retaining nucleic acid sequences from the initial sequence data that are among the sequences from the pre-defined set of genomic regions. At least a first subset of the pre-defined set of genomic regions can be defined based on those regions in the first subset exhibiting a minimum level of stability with respect to at least one of the methylation characteristic or the copy number characteristic in a population of individuals.
The biological sample can be plasma, and the initial set of nucleic acids can include cell-free DNA in the plasma.
The method can further include actions of identifying a set of restricted reference component methylomes in the initial set or filtered subset of nucleic acids; identifying a set of reference component methylomes; determining a proportion of the reference component methylomes at a reference set of CpG sites in the initial set or filtered subset of nucleic acids; generating predictions of methylation levels at a target set of CpG sites in the initial set or filtered subset of nucleic acids; comparing the predictions of methylation levels at the target set of CpG sites to observed methylation levels; and determining whether the person likely has or does not have the specified medical condition based on the comparison.
The biological sample can be a stool sample.
The biological sample can be cerebrospinal fluid.
The initial set of nucleic acids can be treated to facilitate detection of methylated sites before sequencing.
The specified medical condition can be ovarian cancer, endometriosis, necrotizing enterocolitis, fetal aneuploidy, preeclampsia, or a brain condition.
The methylation profile for the filtered subset of nucleic acids can indicate, for each of a set of multiple genomic loci, a methylation level of the locus. Each genomic locus can be a CpG site, CpG island, differentially methylated region (DMR), promoter region, enhancer region, or CpG island shore.
Determining the likelihood that the person has the specified medical condition can include determining a probability that the person has the specified medical condition.
Determining the likelihood that the person has the specified medical condition can include generating a binary indication that the person either likely has the specified medical condition or likely does not have the specified medical condition.
Processing the methylation profile can include providing data representing the methylation profile as input to a machine-learning model, and obtaining the likelihood, or a value from which the likelihood is derived, as an output of the machine-learning model.
The machine-learning model can include at least one of a classifier, an artificial neural network, a support vector machine, a decision tree, or a regression model.
The machine-learning model can define reference or predicted methylation profiles against which the methylation profile for the filtered subset are compared to determine the likelihood that the person has the specified medical condition.
The determined likelihood that the person has the specified medical condition can be used by a medical provider to assess whether to perform additional diagnostic testing on the person.
The determined likelihood that the person has the specified medical condition can be used by a medical provider to at least one of diagnose the person or treat the person for the specified medical condition.
Outputting the indication of the likelihood that the person has the specified medical condition can include at least one of presenting the indication on an electronic display, audibly playing the indication through a speaker, storing the indication in a memory of a computing system for subsequent retrieval, or transmitting the indication in an electronic message to one or more users.
Enriching the target nucleic acids in the filtered subset can include generating the filtered subset so that a fraction of the target nucleic acids that occur in the filtered subset is greater than a fraction of the target nucleic acids that occur in the initial set of nucleic acids.
The filtered subset can consist exclusively of the target nucleic acids. Alternatively, the filtered subset can include both the target nucleic acids and non-targeted nucleic acids.
Some implementations include yet another method performed by a computing system. The method can include actions of obtaining initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from multiple different tissues of the person; filtering, by the computing system, the initial sequence data to identify a first subset of sequences from the initial sequence data that correspond to a first pre-defined set of genomic regions; filtering, by the computing system, the initial sequence data to identify a second subset of sequences from the initial sequence data that correspond to a second pre-defined set of genomic regions; processing, by the computing system, data that includes an observed methylation profile of the first subset of sequences to generate a predicted methylation profile of the second subset of sequences; comparing, by the computing system, an observed methylation profile of the second subset of sequences to the predicted methylation profile of the second subset of sequences to determine whether the person has a specified medical condition, wherein the person is deemed to have the specified medical condition if a difference between the observed methylation profile of the second subset of sequences and the predicted methylation profile of the second subset of sequences meets a minimum difference criterion; and outputting, by the computing system, an indication of whether the person was determined to have the specified medical condition.
These and other implementations can further include one or more of the following features.
The first pre-defined set of genomic regions can be regions that exhibit a minimum level of stability with respect to at least one of a methylation characteristic or a copy number characteristic in a population of individuals. The second pre-defined set of genomic regions can be regions that exhibit a minimum difference with respect to at least one of the methylation characteristic or the copy number characteristic between a first sub-population of individuals who have the specified medical condition and a second sub-population of individuals who do not have the specified medical condition.
The first pre-defined set of genomic regions can be a first reference set of genomic regions, and the second pre-defined set of genomic regions can be a first target set of genomic regions. The actions can further include selecting the first reference set of genomic regions as the first pre-defined set of genomic regions from a database that includes multiple reference sets of genomic regions, wherein different ones of the multiple reference sets of genomic regions correspond to different medical conditions; and selecting the first target set of genomic regions as the second pre-defined set of genomic regions from the database, wherein the database further includes multiple target sets of genomic regions, wherein different ones of the multiple target sets of genomic regions correspond to different medical conditions.
The specified medical condition is preeclampsia, endometriosis, ovarian cancer, necrotizing enterocolitis, or a brain condition.
Some implementations include yet another method performed by a computing system. The method can include actions of obtaining, by a computing system, initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from multiple different tissues of the person; filtering, by the computing system, the initial sequence data to identify a target subset of sequences from the initial sequence data that correspond to a pre-defined set of genomic regions; comparing, by the computing system, an observed methylation profile of the target subset of sequences to a pre-defined methylation profile to determine whether the person has a specified medical condition, wherein the person is deemed to have the specified medical condition if a difference between the observed methylation profile of the target subset of sequences and the pre-defined methylation profile meets a minimum difference criterion; and outputting, by the computing system, an indication of whether the person was determined to have the specified medical condition.
Additional aspects of the disclosed subject matter include a computing system having one or more processors and one or more computer-readable media having instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to perform actions of any of the methods/processes disclosed herein. Further aspects include one or more computer-readable media having instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform actions of any of the methods/processes disclosed herein.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.
Environment 100 includes a sample analyzer 104 that processes and sequences biological samples 150 from a person. In some implementations, the system is configured to perform “liquid biopsies” on liquid-based biological samples 150 from a person, e.g., plasma samples, stool samples, saliva samples, cerebrospinal fluid samples, urine samples, or cervical swab samples. The plasma (or other biological samples) may include cell-free DNA from the person (and, in some cases, cell-free DNA from a fetus if the person is a pregnant female). Cell-free DNA can originate from various tissues in a person's body. For example, cell-free DNA in a biological sample 150 can include fragments of DNA that were released from cells as a result of processes such as active secretion, necrosis, apoptosis, or a combination of these. The level of cell-free DNA in a sample 150 originating from certain tissue(s) can be correlated with a given medical condition (e.g., disease). Moreover, the methylation patterns of cell-free DNA originating from certain tissue(s), and which occur in the biological sample 150, can be correlated with a medical condition (e.g., disease) of the person. To analyze the level of cell-free DNA or other nucleic acids associated with a specified medical condition, and to analyze the methylation patterns of these nucleic acids, the sample analyzer 104 may process the biological sample 150 to generate initial sequence data for the fragments of extracellular DNA and/or other nucleic acids in the sample 150.
The initial sequence data describes sequences of nucleotides occurring in nucleic acids from the sample 150, e.g., sequences of bases along fragments of cell-free DNA. Typically, the initial sequence data includes sequence descriptions for both targeted nucleic acids and background (non-targeted) nucleic acids in the biological sample 150. The targeted nucleic acids are those that are deemed significant to detection of a specified medical condition, while the background nucleic acids are deemed insignificant or less significant to detection of the specified medical condition. The mixture of targeted and background nucleic acids may vary for different medical conditions. For example, some nucleic acids may be classified in the target set for detection of ovarian cancer, while those same acids may be classified in the background set for detection of necrotizing enterocolitis. In some implementations, the set of target nucleic acids that are deemed significant to a particular medical condition are those originating from particular tissue(s) affected by a specified medical condition. For instance, the target nucleic acids associated with endometriosis may include or consist of cell-free DNA from the endometrium or uterus. Likewise, the target nucleic acids associated with necrotizing enterocolitis may be based on intestinal tissue. Often, the fraction of targeted nucleic acids in the biological sample is small in relation to the fraction of background nucleic acids, and the fraction of sequences reflected in the initial sequence data for targeted nucleic acids may also be small in relation to sequences of background nucleic acids. As a result, the initial sequence data may contain substantial levels of “noise” relative to the signals present in the sequences for targeted nucleic acids, which can degrade the performance of models in predicting whether or not a patient likely has a specified medical condition.
The sample analyzer 104 is configured to sequence nucleic acids in the biological sample 150 using any suitable technique, including polymerase chain reaction (PCR)-based methods such as droplet digital PCR, or next-generation sequencing (NGS). In some implementations, nucleic acids from the biological sample 150 undergo bisulfite conversion before sequencing in order to facilitate subsequent detection of methylated sites. Bisulfite treatment has the effect of converting unmethylated cytosines (C) to uracil (U), which in turn are converted to thymine (T) in the course of DNA amplification. In contrast, bisulfite treatment does not affect methylated cytosines (C). As a result, bisulfite conversion enables differentiation of methylated from non-methylated cytosines, which appear as different bases in the sequencing data. Methylation arrays, methylation-specific PCR, enrichment, and/or additional methods may also or alternatively be applied.
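The effect of bisulfite conversion on sequencing output can be illustrated with a minimal Python sketch. It is a simplified, single-strand illustration only: the function names `bisulfite_convert` and `call_cytosines`, the example sequence, and the assumption that the read aligns to the reference without mismatches or indels are hypothetical and are not part of the disclosed system.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment plus amplification on one strand.

    Unmethylated C is read out as T; methylated C is left as C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence)
    )


def call_cytosines(reference, converted_read):
    """Label each reference cytosine by comparing it to the converted read."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base != "C":
            continue  # only cytosines carry a methylation call
        if read_base == "C":
            calls[i] = "methylated"
        elif read_base == "T":
            calls[i] = "unmethylated"
        else:
            calls[i] = "ambiguous"  # sequencing error or variant
    return calls


# Hypothetical 10-base fragment with methylated cytosines at positions 1 and 8.
reference = "ACGTCGACCG"
read = bisulfite_convert(reference, methylated_positions=[1, 8])
calls = call_cytosines(reference, read)
print(read)
print(calls)
```

In this sketch the cytosines at positions 4 and 7 read out as T and are therefore called unmethylated, while those at positions 1 and 8 remain C.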
Sample analyzer 104 can output initial sequencing data for a biological sample 150 for receipt by a user's computing system 102. The user's computing system 102 can comprise one or more computers in one or more locations. System 102 can be, for example, a desktop computer, notebook computer, tablet computer, or smartphone. System 102 includes a network interface for communicating over one or more networks 106 such as a local area network (LAN), a wireless LAN (WLAN), the Internet, or a combination of these. System 102 may include peripherals such as a keyboard, pointing device, and display screen to enable user interaction with the system 102. In some implementations, the user may coordinate activities at the system 102 for obtaining a classification result related to a specified medical condition based on sequencing data from biological sample 150. For example, upon obtaining initial sequencing data from sample analyzer 104, the user may instruct system 102 to send a classification request 152 to classification system 120. Classification system 120 can comprise one or more computers in one or more locations. System 120 may be located remotely from the user's system 102, or may be located at the same premises as system 102. In some implementations, the capabilities of systems 102, 120, 110, or any two of these, are consolidated in a single, integrated system. In general, classification system 120 processes a classification request 152 and returns a classification result 158 that indicates a predicted likelihood that the person who provided biological sample 150 either has or does not have a specified medical condition (e.g., preeclampsia, necrotizing enterocolitis, endometriosis, ovarian cancer, fetal aneuploidy, an abnormal brain condition, or others).
In more detail, classification system 120 can include a package selector 122, filtering engine 124, methylation profiler 126, and model evaluator 128. Each of these components 122, 124, 126, and 128 may be implemented on one or more computers using a combination of software, hardware, or firmware. Package selector 122 is operable to select, from a library of medical-condition packages 140a-n, a particular package 140 that corresponds to the medical condition specified in classification request 152. Each package 140a-n defines information or instructions usable by classification system 120 to generate a classification result as to a different medical condition. In some implementations, the package 140 for a given medical condition can include one or more sequence filters 142, a loci list 144, and a machine-learning (ML) model 146, each of which is specific to the corresponding medical condition. Classification system 120 can load a retrieved package 140 to facilitate generation of a classification result 158 responsive to request 152. In some implementations, package selector 122 can select individual components of a package 140 as needed, such as filters 142, loci list 144, or ML model 146, to the exclusion of the others.
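One way to picture the library of medical-condition packages is as a mapping from condition name to a record holding the filters, loci list, and model. The following Python sketch is illustrative only: the class `ConditionPackage`, the function `select_package`, the example regions and loci, and the constant-valued placeholder models are all assumptions made for this example rather than details taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open interval
Locus = Tuple[str, int]        # (chromosome, position)


@dataclass
class ConditionPackage:
    """Hypothetical container mirroring a package 140."""
    condition: str
    sequence_filters: Dict[str, List[Region]]  # e.g. {"whitelist": [...]}
    loci_list: List[Locus]
    model: Callable[[List[float]], float]      # methylation levels -> likelihood


def select_package(library: Dict[str, ConditionPackage], condition: str) -> ConditionPackage:
    """Return the package for the requested condition, as a package selector might."""
    try:
        return library[condition]
    except KeyError:
        raise ValueError(f"no package for condition {condition!r}") from None


# Placeholder library; the regions, loci, and models are made up for illustration.
library = {
    "endometriosis": ConditionPackage(
        condition="endometriosis",
        sequence_filters={"whitelist": [("chr1", 200, 400)]},
        loci_list=[("chr1", 250), ("chr1", 310)],
        model=lambda levels: 0.5,
    ),
    "preeclampsia": ConditionPackage(
        condition="preeclampsia",
        sequence_filters={"blacklist": [("chr2", 0, 100)]},
        loci_list=[("chr3", 42)],
        model=lambda levels: 0.5,
    ),
}

package = select_package(library, "endometriosis")
print(package.condition, package.loci_list)
```

Keeping the three components as separate fields reflects the statement that a selector can retrieve individual components as needed, to the exclusion of the others.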
Sequence filters 142 provide information that enables a filtering engine 124 of the classification system 120 to filter initial sequence data by retaining sequences for target nucleic acids corresponding to the specified medical condition and discarding sequences for background (non-targeted) nucleic acids for the specified medical condition. In some implementations, filters 142 provide a whitelist that identifies target nucleic acids that should be retained in a filtering operation (while other nucleic acids not specified in the whitelist are discarded), such as cell-free DNA fragments that originate uniquely from a particular tissue associated with the specified medical condition. In some implementations, filters 142 provide a blacklist that identifies background nucleic acids that should be discarded in a filtering operation, such as cell-free DNA fragments that do not originate uniquely from a particular tissue associated with the specified medical condition. The whitelist, blacklist, or other information in filters 142 can define a set of genomic regions or sequences of nucleic acids against which sequences in the initial set of nucleic acids from the classification request 152 are compared to assess whether to retain or discard them for further processing. For example, all nucleic acids in a plasma sample can be sequenced and the filtering can remove certain sequences identified in the blacklist. Alternatively, all nucleic acids in the sample can be sequenced and the filtering can discard everything but the sequences identified in a whitelist.
In some implementations, the filtering engine 124 is configured to perform a filtering operation that involves (i) selecting target nucleic acids from the initial set of nucleic acids based on a methylation characteristic (e.g., methylation status), a copy number characteristic of the target nucleic acids, or both, and (ii) enriching the target nucleic acids, e.g., to increase a fraction of the target nucleic acids after filtering relative to their fraction in a pre-filtered set. In some examples, filters 142 identify a set of genomic regions in a whitelist. Selecting target nucleic acids from the initial set of nucleic acids can include comparing sequences of nucleic acids from the initial set of nucleic acids to sequences in the pre-defined set of genomic regions. Enriching the target nucleic acids in the filtered subset can include discarding sequences of nucleic acids from the initial set of nucleic acids that do not appear in the pre-defined set of regions, while retaining nucleic acids that do appear in the pre-defined set of regions. The pre-defined set of genomic regions can identify target nucleic acids having at least a minimum level of stability (e.g., minimum threshold stability) with respect to the methylation characteristic and the copy number characteristic, wherein the identified target nucleic acids originate from multiple different tissues. Additionally or alternatively, the pre-defined set of genomic regions can identify target nucleic acids originating from a subset of the multiple different tissues for which at least one of the methylation characteristic or the copy number characteristic differs by at least a minimum amount (e.g., minimum threshold) between individuals who have the specified medical condition and individuals who do not have the specified medical condition.
In other examples, filters 142 identify a pre-defined set of genomic regions or sequences in a blacklist/exclude list, and corresponding operations can apply to select and enrich target nucleic acids except that the target nucleic acids are identified by discarding nucleic acids within the blacklist/exclude list.
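The whitelist and blacklist filtering described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: reads are represented as already-aligned `(chromosome, start, end)` half-open intervals, a read is treated as matching a region if the two overlap at all, and the names `overlaps`, `filter_reads`, and `target_fraction` as well as the example coordinates are hypothetical.

```python
def overlaps(read, region):
    """True if an aligned read and a genomic region share at least one base."""
    chrom, start, end = read
    r_chrom, r_start, r_end = region
    return chrom == r_chrom and start < r_end and r_start < end


def filter_reads(reads, regions, mode="whitelist"):
    """Keep reads that hit the regions (whitelist) or that miss them (blacklist)."""
    if mode not in ("whitelist", "blacklist"):
        raise ValueError("mode must be 'whitelist' or 'blacklist'")
    keep_on_hit = mode == "whitelist"
    return [
        read for read in reads
        if any(overlaps(read, region) for region in regions) == keep_on_hit
    ]


def target_fraction(reads, target_regions):
    """Fraction of reads that overlap the target regions."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if any(overlaps(read, g) for g in target_regions))
    return hits / len(reads)


# Hypothetical aligned reads and regions.
reads = [("chr1", 100, 250), ("chr1", 900, 1050), ("chr2", 40, 190), ("chr3", 500, 650)]
targets = [("chr1", 200, 400), ("chr2", 0, 100)]
background = [("chr1", 800, 1200)]

kept_whitelist = filter_reads(reads, targets, mode="whitelist")
kept_blacklist = filter_reads(reads, background, mode="blacklist")

print(target_fraction(reads, targets))           # fraction before filtering
print(target_fraction(kept_whitelist, targets))  # fraction after whitelist filtering
print(target_fraction(kept_blacklist, targets))  # fraction after blacklist filtering
```

In this example the whitelist yields a filtered subset consisting exclusively of target reads, while the blacklist yields a subset that still contains some non-targeted reads but at a higher target fraction than the initial set.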
Loci list 144 identifies the set of genomic loci whose methylation statuses are processed to generate a classification result with respect to a particular medical condition. The methylation profiler 126 within classification system 120 can use the loci list 144 to construct a methylation profile for a set of nucleic acids. For example, the loci list 144 may identify specific nucleotides, CpG sites, CpG islands, differentially methylated regions, promoter regions, enhancer regions, and/or CpG island shores whose methylation statuses can be processed to inform a classification with respect to the corresponding medical condition. In some implementations, the loci list 144 identifies genomic loci that occur only within the set of target nucleic acids for the medical condition. The loci list 144 can provide that the methylation statuses of all CpG sites within the target nucleic acids should be processed to generate a classification result. Alternatively, the loci list 144 can provide that the methylation statuses of only a subset of CpG sites within the target nucleic acids should be processed to generate a classification result. The subset of CpG sites (or other genomic loci) can be deemed the most statistically significant, or those that have the highest predictive power, for accurately classifying whether a patient has or does not have a specified medical condition. In some implementations, the loci list 144 identifies genomic loci that occur anywhere in the genome, regardless of whether the loci occur in target or background nucleic acid. The genomic loci in this embodiment may be processed without a separate filtering step that discards all or some of the background nucleic acid sequences, and the loci may have been identified as the most statistically significant, or those having the highest predictive power across the genome for accurately classifying whether a patient has or does not have a specified medical condition.
The methylation profiler 126 can analyze the initial set of sequence data from the classification request 152 or the filtered set of sequence data from filtering engine 124 to determine the methylation status of all or some of the loci identified in list 144. The methylation status can be expressed in a number of ways, such as a binary value indicating whether the methylation level at a locus is above or below a pre-defined threshold, or a normalized value within a pre-defined range of values indicating a relative methylation level at the locus across multiple DNA fragments encompassing the locus.
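The two ways of expressing methylation status mentioned above can be sketched in Python. This is an illustrative example only: the function name `methylation_profile`, the input layout (one boolean call per DNA fragment covering a locus), the default threshold of 0.5, and the choice to skip loci without coverage are assumptions, not details of the disclosed profiler.

```python
def methylation_profile(calls_by_locus, threshold=0.5):
    """Summarize per-fragment methylation calls at each locus.

    calls_by_locus maps a locus to a list of booleans, one per DNA fragment
    covering that locus (True = methylated on that fragment).
    Returns, per covered locus, a normalized level in [0, 1] and a binary
    indicator of whether that level exceeds the threshold.
    """
    profile = {}
    for locus, calls in calls_by_locus.items():
        if not calls:
            continue  # locus has no coverage, so no status is reported
        level = sum(calls) / len(calls)
        profile[locus] = {"level": level, "above_threshold": level > threshold}
    return profile


# Hypothetical calls at three loci from a loci list.
calls_by_locus = {
    ("chr1", 101): [True, True, False, True],
    ("chr1", 350): [False, False, False, True],
    ("chr2", 77): [],
}

profile = methylation_profile(calls_by_locus)
for locus, status in profile.items():
    print(locus, status)
```

Here the first locus has a level of 0.75 and is flagged as above the threshold, the second has a level of 0.25 and is not, and the uncovered third locus is omitted from the profile.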
As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status,” and “methylation level” refer to the presence, absence, percentage, and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, indicate the methylation state of any of the nucleotides (e.g., cytosines) in the sequence, can indicate the methylation state of a subset of the nucleotides (e.g., of cytosines), can indicate the percentage or fraction of methylated cytosines at any particular stretch of nucleotides within the sequence or can indicate the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more methylated nucleotides.
As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro. A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., DNA sequence, that has a high frequency of CpG dinucleotide repeats. A “CpG island shore,” as used herein, refers to methylation hotspots that are present a short distance, e.g., less than 2 kb, from CpG islands.
Machine-learning model 146 is a model that correlates methylation patterns from a methylation profile to likelihoods that a person has or does not have a specified medical condition. The machine-learning model 146 can be loaded by model evaluator 128 in classification system 120. Further details on training and applying machine-learning models are described with respect to
Environment 100 can further include a machine-learning system 110 for training the machine-learning models 146 in the library of packages 140a-n. System 110 can be implemented on one or more computers in one or more locations, and may be accessible to the user's system 102 and classification system 120 via a direct connection or by connection over a network 106. In some implementations, the functionality of system 110 can be integrated with the user's system 102, classification system 120, or both. System 110 can include a training data receiver 112, training engine 114, and model provider 116. Receiver 112 is configured to receive training data 132 for training a machine-learning model. Different training data sets 132 can be used to train different models corresponding to different medical conditions. For example, a first training data set may be constructed for training a model that screens for pre-eclampsia, while a second training data set may be constructed for training a model that screens for endometriosis. The training data 132 can include a collection of training samples 130, each training sample 130 comprising (i) a set of filtered or unfiltered sequence data describing sequences of nucleic acids from a biological sample of a person and (ii) a label indicating whether or not the person exhibits a specified medical condition. The label can serve as a target output for the model 146 when evaluated on a methylation profile derived from the sequence data in the training sample. Further details of a process for training a machine-learning model 146 are described below with respect to
Referring to
In some implementations, the systems and methods disclosed herein can be applied to generate a classification result indicating whether a patient likely does or does not have a specified medical condition according to the process 500 depicted in
The process 500 was developed to identify changes of methylation patterns in the methylome of a biological sample caused by phenotypes of certain tissues affected by the abnormal medical condition (e.g., intestinal tissue for necrotizing enterocolitis or uterine tissue for endometriosis). One insight behind this process 500 was that the methylome of the DNA fragments in these biological samples is a mixture of a variety of component methylomes, and that the proportion of these different component methylomes in the mixture varies from subject to subject, even among the population with normal tissue phenotype. By constructing a model methylome for a biological sample as a linear combination of various component methylomes, the process 500 can accurately predict the methylation patterns of a new biological sample under the hypothesis that it is from a normal individual. Consequently, the process 500 can exhibit high sensitivity in detecting abnormal methylation patterns in a biological sample caused by changes of the methylomes of some tissues (e.g., intestinal tissues) when the sample is from an affected individual. The process 500 can be performed by any of the computing systems described herein, such as systems 102, 110, and 120 shown in the environment of
Let $i$ be any CpG site in the human genome, $z_{i,j}$ be the methylation level of CpG site $i$ in a biological sample $j$, $p_{i,r,j}$ be the proportion of the $r$th component methylome $m_{r,j}$ of particular tissue origin in sample $j$ at site $i$, and $m_{i,r,j}$ be the methylation level of CpG site $i$ in methylome $m_{r,j}$. The system models the scenario as follows:
$z_{i,j} = \sum_{r=1}^{R} p_{i,r,j}\, m_{i,r,j}$ (1)
where $p_{i,r,j} \ge 0$, $0 \le m_{i,r,j} \le 1$, and $p_{i,1,j} + \dots + p_{i,R,j} = 1$.
The model assumes that there is a set of CpG sites $S$ such that, for any CpG site $i$ in $S$ and any biological sample $j$ of a particular type (e.g., plasma, stool, cerebrospinal fluid, saliva, or urine) from a normal individual, $m_{i,r,j} = m_{i,r}$ and $p_{i,r,j} = p_{r,j}$.
That is, the model assumes that in any biological sample from a normal individual, the proportions of different component methylomes in the mixture are the same for all CpG sites in $S$. The model further assumes that, when restricted to the set of CpG sites $S$, biological samples from all normal individuals have the same set of component methylomes. These are called restricted reference component methylomes (RRCM), and are labeled $m_1^S, \dots, m_R^S$, or simply $m_1, \dots, m_R$ when there is no confusion. For any biological sample $j$ from a normal individual, its methylome restricted to the set of CpG sites in $S$ can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^S$ be the methylome of biological sample $j$ restricted to $S$; then, for some mixture vector $p_j = [p_{1,j}, \dots, p_{R,j}]^T$:
$z_j^S = [m_1^S, \dots, m_R^S]\, p_j$ (2)
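The weighted-average relationship of equation (2) can be illustrated with a short NumPy sketch. The function name `mix_methylome`, the array layout (one row per CpG site in S, one column per reference component methylome), and the numeric values are illustrative assumptions, not part of the disclosure.

```python
import numpy as np


def mix_methylome(reference: np.ndarray, mixture: np.ndarray) -> np.ndarray:
    """Compute z = [m_1, ..., m_R] p for a restricted set of CpG sites.

    reference: (n_sites, R) matrix whose columns are the reference component methylomes.
    mixture:   length-R mixture vector p (non-negative, summing to 1).
    """
    reference = np.asarray(reference, dtype=float)
    mixture = np.asarray(mixture, dtype=float)
    if np.any(mixture < 0) or not np.isclose(mixture.sum(), 1.0):
        raise ValueError("mixture vector must be non-negative and sum to 1")
    # Each site's methylation level is the weighted average across components.
    return reference @ mixture


# Hypothetical example: three CpG sites, two component methylomes.
M = np.array([[0.0, 1.0],
              [0.2, 0.8],
              [1.0, 0.0]])
p = np.array([0.25, 0.75])
print(mix_methylome(M, p))
```

Because the mixture vector is shared by every site in S, a single vector p is enough to predict the whole restricted methylome of a normal sample under this model.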
The model also assumes that the set $S$ is the union of two disjoint subsets $C$ and $T$, where $T$ is a union of $K$ non-empty sets $T_k$ such that $T = \bigcup_{k=1}^{K} T_k$, and the index $k$ represents the $k$th type of abnormal tissue (e.g., intestinal tissue) phenotype. The sets $T_k$ do not need to be disjoint. Moreover, $T_k$ itself is the union of two disjoint sets $D_k$ and $V_k$. Either $D_k$ or $V_k$ could be empty, but not both. It is assumed that for any biological sample, including one from an abnormal individual, when restricted to CpG sites in $C$, its methylome can always be expressed as a weighted average of the restricted reference component methylomes. That is, $z_j^C = [m_1^C, \dots, m_R^C]\, p_j$ regardless of whether $j$ is from an abnormal individual. $C$ is called the set of reference CpG sites. On the other hand, for a biologic sample $l$ from an abnormal individual, when restricted to CpG sites in $S = C \cup T$, its methylome can no longer be expressed as a weighted average of the restricted reference component methylomes. That is, $w_l^S \ne [m_1^S, \dots, m_R^S]\, p_l$ for any mixture vector $p_l$. More specifically, for a biologic sample $l$ from the $k$th type of abnormal individual: 1) $w_l^C = [m_1^C, \dots, m_R^C]\, p_l$, 2) if $D_k$ is non-empty, then wl
$T$ is called the target set of CpG sites, $D_k$ is called the differential methylation target set, $V_k$ is called the copy number variation target set, and $T_k$ is called the target set for the $k$th type of abnormal individual.
Certain operations of the process 500 are depicted in the flowchart of
Process 500 of
In some implementations of the presently disclosed process 500, it is assumed the restricted methylome of a biologic sample from a normal individual can be approximated by a mixture of two restricted reference methylomes, one representing the DNA fragments from a first specific tissue region (e.g., intestinal tissue region for necrotizing enterocolitis), another representing the DNA fragments from a second specific tissue region. It is further assumed that the estimations of these two reference component methylomes are available. The implementation of the process 500 includes the following steps.
To begin, identify the reference set $C$ and the target sets $T_1, \dots, T_K$ (502). First, collect the methylation data for a set of first cell type samples, a set of second cell type samples, and a set of biologic samples, all from normal individuals. For each type of abnormal individual, collect a set of first cell type samples, a set of second cell type samples, and a set of biologic samples from that type of abnormal individual. All these samples should be matched for age, race, and other relevant parameters. These are the training data. Next, let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a normal first cell type sample $j$, $y_{i,l}$ the observed methylation level of CpG site $i$ in a normal second cell type sample $l$, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all normal first cell type samples, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all normal second cell type samples. Identify the set of CpG sites $S_0$ such that for any $i \in S_0$, both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$. These are CpG sites with stable methylation levels in each type of normal cells. Next, redefine these quantities over all samples, normal and abnormal: let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a first cell type sample $j$, $y_{i,l}$ the observed methylation level of CpG site $i$ in a second cell type sample $l$, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all first cell type samples, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all second cell type samples.
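The variance screen that defines S0 can be sketched as follows. This is an illustrative fragment only: the function name `stable_sites`, the array layout (rows are CpG sites, columns are samples), and the example threshold are assumptions, and the disclosure does not prescribe a value for the constant c0.

```python
import numpy as np


def stable_sites(x: np.ndarray, y: np.ndarray, c0: float) -> np.ndarray:
    """Return indices of CpG sites whose sample variance is below c0 in both cell types.

    x: (n_sites, n_first_samples) methylation levels in first cell type samples.
    y: (n_sites, n_second_samples) methylation levels in second cell type samples.
    """
    # ddof=1 gives the unbiased sample variance across samples for each site.
    s2_x = np.var(np.asarray(x, dtype=float), axis=1, ddof=1)
    s2_y = np.var(np.asarray(y, dtype=float), axis=1, ddof=1)
    return np.flatnonzero((s2_x < c0) & (s2_y < c0))


# Hypothetical data: site 0 is stable in both cell types; sites 1 and 2 are not.
x = np.array([[0.1, 0.1, 0.1],
              [0.0, 0.5, 1.0],
              [0.9, 0.9, 0.8]])
y = np.array([[0.8, 0.8, 0.9],
              [0.5, 0.5, 0.5],
              [0.0, 1.0, 0.5]])
print(stable_sites(x, y, c0=0.01))
```

The same function could be reused for the second pass over the pooled normal and abnormal samples by passing the pooled matrices instead.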
Identify the set of CpG sites $S_1$ such that for any $i \in S_1$, both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$, and such that the statistical test for the difference between $\{x_{i,j_0}: j_0$ is a normal first cell type sample$\}$ and $\{x_{i,j_k}: j_k$ is a first cell type sample of the $k$th abnormal phenotype$\}$ is not significant for all abnormal phenotypes of the first cell type, and the statistical test for the difference between $\{y_{i,j_0}: j_0$ is a normal second cell type sample$\}$ and $\{y_{i,j_k}: j_k$ is a second cell type sample of the $k$th abnormal phenotype$\}$ is not significant for all abnormal phenotypes of the second cell type. These are CpG sites with stable methylation levels in each type of cells, and with no difference in methylation level between normal and any abnormal samples. Let $\bar{x}_i$ be the sample mean of $x_{i,j}$ over all first cell type samples, including normal and abnormal, and $\bar{y}_i$ the sample mean of $y_{i,l}$ over all second cell type samples, including normal and abnormal. Identify the subset $C_0$ of $S_1$ such that for any $i \in C_0$, $|\bar{x}_i - \bar{y}_i| > c_1$ for some constant $c_1$. These are CpG sites that are stably methylated in each cell type, with no difference between the normal and abnormal samples of the same cell type, and differentially methylated between different types of cells. Next, let xR
and $e_{i,k}^2 < c_3$ for some constants $c_2$ and $c_3$, where $e_{i,k}^2$ is the mean of the squared difference between estimated and observed methylation levels of CpG site $i$ in all biologic samples of the $k$th abnormal type, and $s_{i,k}^2$ is the sample variance of the methylation levels of CpG site $i$ in the same set of biologic samples. Repeat the above procedure for each type of abnormal biologic sample. The intersection of the resulting subsets, $C = \bigcap_{k=0}^{K} C_0^k$, is the reference set of CpG sites. These are CpG sites whose methylation levels in both normal and any type of abnormal biologic samples can be accurately predicted by the reference component methylomes from normal individuals.
Next, let T0=S0\S1. Let xC and xT
Next, the system estimates the cell-type fractions of each new biologic sample to be tested. Recall that $x^C$ and $y^C$ are the mean vectors of the methylation levels of the training first cell type and training second cell type data for the CpG sites in the reference set $C$. For any new biologic sample $t$ to be tested, let $z_t^C$ be the observed methylation levels of CpG sites in $C$. Regress $z_t^C$ against $x^C$ and $y^C$, with the constraints that the intercept must be 0 and that the coefficients must be non-negative and add to 1. The estimated coefficients for $x^C$ and $y^C$ are the estimated fractions of the two cell types for the biologic sample $t$.
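For the two-component case, the constrained regression described above reduces to a one-dimensional least-squares problem, which the following sketch solves in closed form. Writing the coefficients as a and 1 - a, the objective becomes the squared norm of (z - y) - a(x - y); the unconstrained minimizer is then clipped to [0, 1]. The function name `estimate_fractions` and the example vectors are illustrative assumptions, and a general quadratic-programming solver would be needed for more than two components.

```python
import numpy as np


def estimate_fractions(z_c, x_c, y_c):
    """Least-squares fit of z ~ a*x + (1 - a)*y with zero intercept and 0 <= a <= 1.

    Returns (a, 1 - a): the estimated fractions of the first and second cell types.
    """
    z, x, y = (np.asarray(v, dtype=float) for v in (z_c, x_c, y_c))
    d = x - y
    denom = d @ d
    if denom == 0.0:
        raise ValueError("reference vectors are identical; fractions are not identifiable")
    # Unconstrained minimizer of ||(z - y) - a * d||^2, then projection onto [0, 1].
    a = float(np.clip(((z - y) @ d) / denom, 0.0, 1.0))
    return a, 1.0 - a


# Hypothetical reference means over three reference CpG sites.
x_c = np.array([0.9, 0.1, 0.8])
y_c = np.array([0.1, 0.7, 0.2])
z_t = 0.3 * x_c + 0.7 * y_c  # noiseless sample built with a known 30% / 70% mixture
print(estimate_fractions(z_t, x_c, y_c))
```

Clipping is exact here because the objective is a convex quadratic in a single variable, so the constrained optimum is the nearest feasible point to the unconstrained one.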
The system can then test if the new biologic samples are from the kth type of abnormal individual. For the new biologic sample (e.g., plasma) t, let xT
Other ways of implementing the process 500 can be developed by modifying the implementation presented above. Specifically, the process does not need to assume that only two component reference methylomes make up the biologic methylomes, nor does it need to estimate them directly. Instead, a set of predictor methylomes can be collected that are mixtures of the component reference methylomes, as long as the number of predictor methylomes is the same as the number of reference component methylomes and the mixture vectors of the predictor methylomes are linearly independent. For example, they can be methylomes of biologic samples with known, different proportions of first and second cell type DNA.
In process 500, the difference between the observed methylation levels in certain target regions and the predicted methylation levels is used as the test statistic to determine whether the methylome in a biologic sample has been affected by some type of tissue abnormality. To illustrate the advantage of this approach, assume that the mixture vector $p_j$ for the methylome of a normal biologic sample $j$ follows a Dirichlet distribution with parameters $\alpha_1 = \dots = \alpha_R$. Furthermore, assume that for CpG site $i$, the methylation levels in the $R$ reference component methylomes are $m_{i,r} = (r-1)/(R-1)$. It can be shown that the methylation level of $i$ in sample $j$ then has a mean of 0.5, and a variance of
If there is a methyl-seq library in sample j with a coverage of N for CpG site i, the variance of the measured methylation level zi,j is
In other words, if $z_{i,j}$ is used as a test statistic to detect abnormal intestinal tissue using a biologic sample, then under the null hypothesis the test statistic has a variance of $\sigma_1^2$. In process 500, however, the mixture vector $p_j$ is first estimated, and $z_{i,j}$ is then predicted by $\sum_r m_{i,r}\, p_{r,j}$. Note that in methyl-seq data, each library can cover millions of CpG sites, and that the variance of the coefficients in a linear regression model is inversely proportional to sample size. Thus it is possible to obtain a highly accurate estimate of the mixture vector $p_j$, even after accounting for the tendency of adjacent CpG sites to have correlated methylation levels. Assuming an accurate estimate of $\sum_r m_{i,r}\, p_{r,j}$ can be obtained, that is, the error of the estimation can be ignored, the variance of the difference $z_{i,j} - \sum_r m_{i,r}\, p_{r,j}$ between the observed methylation level and the prediction will be
In other words, under the null hypothesis, the test statistic $z_{i,j} - \sum_r m_{i,r}\, p_{r,j}$ used in process 500 has a much smaller variance than the other candidate test statistic $z_{i,j}$. This in turn means that the presently disclosed test will achieve a higher power at the same level of type I error.
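The variance comparison can be checked numerically with a small Monte Carlo sketch. It draws mixture vectors from a symmetric Dirichlet distribution, uses component levels (r-1)/(R-1), and compares the variance of the raw measurement with the variance of the residual after subtracting the prediction. Binomial read sampling at coverage N, perfect knowledge of the mixture vector, and the specific parameter values (R = 3, alpha = 1, N = 50) are assumptions made for this illustration only.

```python
import numpy as np


def simulate_variances(R=3, alpha=1.0, coverage=50, n_samples=200_000, seed=0):
    """Return (mean of z, variance of z, variance of z minus its prediction)."""
    rng = np.random.default_rng(seed)
    m = np.linspace(0.0, 1.0, R)                 # m_{i,r} = (r-1)/(R-1)
    p = rng.dirichlet(np.full(R, alpha), size=n_samples)
    mu = np.clip(p @ m, 0.0, 1.0)                # true level: sum_r m_{i,r} p_{r,j}
    # Assumed measurement model: methylated read count is binomial at the given coverage.
    z = rng.binomial(coverage, mu) / coverage
    return z.mean(), z.var(), (z - mu).var()


mean_z, var_z, var_resid = simulate_variances()
print(f"mean={mean_z:.3f}  var(z)={var_z:.4f}  var(z - prediction)={var_resid:.4f}")
```

Under these assumptions the residual retains only the read-sampling noise, while the raw measurement also carries the subject-to-subject variation in the mixture vector, which is the effect the paragraph above describes.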
Additional techniques for detecting, assessing, monitoring, or treating preeclampsia can include those set forth in U.S. Application Ser. No. 62/832,157, which is incorporated by reference in this disclosure. Additional techniques for detecting, assessing, monitoring, or treating CNS conditions can include those set forth in U.S. Application Ser. No. 62/882,215, which is incorporated by reference in this disclosure. Additional techniques for detecting, assessing, or monitoring fetal aneuploidy can include those set forth in U.S. Application Ser. No. 62/928,156, which is incorporated by reference in this disclosure. Additional techniques for detecting, assessing, monitoring, or treating ovarian cancer can include those set forth in U.S. Application Ser. No. 63/007,218, which is incorporated by reference in this disclosure. Additional techniques for detecting, assessing, monitoring, or treating endometriosis can include those set forth in U.S. Application Ser. No. 63/007,204, which is incorporated by reference in this disclosure. Additional techniques for detecting, assessing, monitoring, or treating necrotizing enterocolitis can include those set forth in U.S. Application Ser. No. 63/007,208, which is incorporated by reference in this disclosure.
The process 600 can include obtaining an initial set of sequence data (602). The initial sequence data describes sequences of nucleic acids from a biologic sample of a person or other mammal. In some examples, the nucleic acids characterized in the initial sequence data include cell-free DNA. The initial sequence data may include cell-free DNA from various tissues of the person, including tissues affected by a specified medical condition and tissues that are not affected, or are substantially less affected, by the specified medical condition. DNA fragments corresponding to the affected tissues may be deemed target DNA (e.g., DNA from the intestinal region when assessing necrotizing enterocolitis), while fragments corresponding to the other tissues may be deemed background or non-targeted DNA. The system receives an indication of the specified medical condition that is to be screened, e.g., based on user input provided into a computing terminal (604). The initial sequence data can be filtered using a selected filter corresponding to the specified medical condition (606). In some implementations, the filtering is operable to increase a fraction of sequences for target DNA relative to other DNA. In some implementations, the filtering includes selecting target nucleic acids from an initial set of nucleic acids based on a methylation characteristic (e.g., methylation status), a copy number characteristic of the target nucleic acids, or both, and enriching the target nucleic acids, e.g., to increase a fraction of the target nucleic acids after filtering relative to their fraction in a pre-filtered set. A methylation profile can be generated from the filtered sequence data (608), and the methylation profile can be processed with an appropriate machine-learning model corresponding to the specified medical condition to generate a classification result (610).
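The filtering (606) and profiling (608) operations can be sketched in a few lines of Python. The record layout (a locus identifier paired with a methylation flag), the function names, and the idea of representing a condition-specific filter as a set of target loci are all hypothetical simplifications; the classification step (610) is omitted because it depends on a trained model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical read record: (locus identifier, whether the read is methylated there).
Read = Tuple[str, bool]


def filter_reads(reads: Iterable[Read], target_loci: Set[str]) -> List[Read]:
    """Keep only reads that fall on loci selected for the specified condition."""
    return [read for read in reads if read[0] in target_loci]


def methylation_profile(reads: Iterable[Read]) -> Dict[str, float]:
    """Fraction of methylated reads per locus in the filtered data."""
    counts = defaultdict(lambda: [0, 0])  # locus -> [methylated reads, total reads]
    for locus, methylated in reads:
        counts[locus][0] += int(methylated)
        counts[locus][1] += 1
    return {locus: meth / total for locus, (meth, total) in counts.items()}


reads = [("L1", True), ("L1", False), ("L2", True), ("L9", True)]
filtered = filter_reads(reads, target_loci={"L1", "L2"})
print(methylation_profile(filtered))
```

In this simplified form, filtering raises the share of reads from target loci from three of four to three of three, which mirrors the enrichment effect described above.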
In some implementations, the machine-learning model is a model corresponding to those described with respect to the process 500 of
The computing device 800 includes a processor 802, a memory 804, a storage device 806, a high-speed interface 808 connecting to the memory 804 and multiple high-speed expansion ports 810, and a low-speed interface 812 connecting to a low-speed expansion port 814 and the storage device 806. Each of the processor 802, the memory 804, the storage device 806, the high-speed interface 808, the high-speed expansion ports 810, and the low-speed interface 812, are interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 802 can process instructions for execution within the computing device 800, including instructions stored in the memory 804 or on the storage device 806 to display graphical information for a GUI on an external input/output device, such as a display 816 coupled to the high-speed interface 808. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
The memory 804 stores information within the computing device 800. In some implementations, the memory 804 is a volatile memory unit or units. In some implementations, the memory 804 is a non-volatile memory unit or units. The memory 804 may also be another form of computer-readable medium, such as a magnetic or optical disk.
The storage device 806 is capable of providing mass storage for the computing device 800. In some implementations, the storage device 806 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The computer program product can also be tangibly embodied in a computer- or machine-readable medium, such as the memory 804, the storage device 806, or memory on the processor 802.
The high-speed interface 808 manages bandwidth-intensive operations for the computing device 800, while the low-speed interface 812 manages lower bandwidth-intensive operations. Such allocation of functions is exemplary only. In some implementations, the high-speed interface 808 is coupled to the memory 804, the display 816 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 810, which may accept various expansion cards (not shown). In the implementation, the low-speed interface 812 is coupled to the storage device 806 and the low-speed expansion port 814. The low-speed expansion port 814, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet) may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
The computing device 800 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 820, or multiple times in a group of such servers. In addition, it may be implemented in a personal computer such as a laptop computer 822. It may also be implemented as part of a rack server system 824. Alternatively, components from the computing device 800 may be combined with other components in a mobile device (not shown), such as a mobile computing device 850. Each of such devices may contain one or more of the computing device 800 and the mobile computing device 850, and an entire system may be made up of multiple computing devices communicating with each other.
The mobile computing device 850 includes a processor 852, a memory 864, an input/output device such as a display 854, a communication interface 866, and a transceiver 868, among other components. The mobile computing device 850 may also be provided with a storage device, such as a micro-drive or other device, to provide additional storage. Each of the processor 852, the memory 864, the display 854, the communication interface 866, and the transceiver 868, are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
The processor 852 can execute instructions within the mobile computing device 850, including instructions stored in the memory 864. The processor 852 may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor 852 may provide, for example, for coordination of the other components of the mobile computing device 850, such as control of user interfaces, applications run by the mobile computing device 850, and wireless communication by the mobile computing device 850.
The processor 852 may communicate with a user through a control interface 858 and a display interface 856 coupled to the display 854. The display 854 may be, for example, a TFT (Thin-Film-Transistor Liquid Crystal Display) display or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 856 may comprise appropriate circuitry for driving the display 854 to present graphical and other information to a user. The control interface 858 may receive commands from a user and convert them for submission to the processor 852. In addition, an external interface 862 may provide communication with the processor 852, so as to enable near area communication of the mobile computing device 850 with other devices. The external interface 862 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.
The memory 864 stores information within the mobile computing device 850. The memory 864 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. An expansion memory 874 may also be provided and connected to the mobile computing device 850 through an expansion interface 872, which may include, for example, a SIMM (Single In Line Memory Module) card interface. The expansion memory 874 may provide extra storage space for the mobile computing device 850, or may also store applications or other information for the mobile computing device 850. Specifically, the expansion memory 874 may include instructions to carry out or supplement the processes described above, and may include secure information also. Thus, for example, the expansion memory 874 may be provided as a security module for the mobile computing device 850, and may be programmed with instructions that permit secure use of the mobile computing device 850. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.
The memory may include, for example, flash memory and/or NVRAM memory (non-volatile random access memory), as discussed below. The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The computer program product can be a computer- or machine-readable medium, such as the memory 864, the expansion memory 874, or memory on the processor 852. In some implementations, the computer program product can be received in a propagated signal, for example, over the transceiver 868 or the external interface 862.
The mobile computing device 850 may communicate wirelessly through the communication interface 866, which may include digital signal processing circuitry where necessary. The communication interface 866 may provide for communications under various modes or protocols, such as GSM voice calls (Global System for Mobile communications), SMS (Short Message Service), EMS (Enhanced Messaging Service), or MMS messaging (Multimedia Messaging Service), CDMA (code division multiple access), TDMA (time division multiple access), PDC (Personal Digital Cellular), WCDMA (Wideband Code Division Multiple Access), CDMA2000, or GPRS (General Packet Radio Service), among others. Such communication may occur, for example, through the transceiver 868 using a radio-frequency. In addition, short-range communication may occur, such as using a Bluetooth, WiFi, or other such transceiver (not shown). In addition, a GPS (Global Positioning System) receiver module 870 may provide additional navigation- and location-related wireless data to the mobile computing device 850, which may be used as appropriate by applications running on the mobile computing device 850.
The mobile computing device 850 may also communicate audibly using an audio codec 860, which may receive spoken information from a user and convert it to usable digital information. The audio codec 860 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the mobile computing device 850. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on the mobile computing device 850.
The mobile computing device 850 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 880. It may also be implemented as part of a smart-phone 882, personal digital assistant, or other similar mobile device.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms machine-readable medium and computer-readable medium refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Although various implementations have been described in detail above, other modifications are possible. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.
OTHER EMBODIMENTS AND EXAMPLES
Example 1: Methods of Assessing DNA Methylation for Pregnancy Phenotyping and Disease Diagnosis
Introduction of this Example
The example relates to methods for diagnosing, prognosing, monitoring and/or treating pregnancy-associated disorders in a pregnant subject.
Background of this Example
Early detection of pregnancy-associated conditions and prenatal disorders, including potential complications during pregnancy or delivery, is crucial, as it allows early medical intervention necessary for the safety of both the mother and the fetus. For example, the pregnancy-associated condition preeclampsia affects 2%-8% of all pregnancies and contributes to 15% of preterm deliveries and between 9% and 26% of maternal deaths worldwide. A number of risk factors for preeclampsia have been identified including hypertension and diabetes, and the consequences of preeclampsia can be far reaching and can include an elevated lifetime risk of cardiovascular disease both in the mother and the infant.
Prenatal diagnosis is typically reliant on invasive procedures such as chorionic villus sampling and amniocentesis. However, such early gestational placental biopsies for prediction of complex gestational diseases are not feasible due to the costly and invasive nature of such procedures. In addition, these invasive procedures are associated with higher risks of spontaneous abortion or fetal death, procedure-induced limb defects and oromandibular hypogenesis, fetomaternal hemorrhage, persistent leakage of amniotic fluid and vertical transmission of infection, and can also cause maternal discomfort and anxiety. With regard to preeclampsia, the only reliable predictor is a previous occurrence.
Alternatives to such invasive approaches have been developed for prenatal screening following the discovery that plasma from pregnant women contains significant numbers of genome equivalents that are derived from the fetoplacental unit. It has also been demonstrated that fetoplacental RNAs are detectable in maternal plasma and that these can be targeted for non-invasive early gestational phenotyping (Koh et al., Proc Natl Acad Sci USA. 2014; 111:7361-7366 and Ngo et al., Science. 2018; 360:1133-1136). However, very little is understood regarding the potential of DNA methylation in maternal plasma to provide non-invasive diagnostic and phenotypic information during pregnancy. This is significant because numerous studies have identified altered DNA methylation in the placentas of mothers affected by complex gestational diseases (Lapaire et al., Fetal Diagn Ther. 2012; 31:147-153; Winn et al., Pregnancy Hypertens. 2011; 1:100-108; Kang et al., J Hypertens. 2011; 29:928-936; Chaouat et al., J Reprod Immunol. 2011; 89:163-172; Varkonyi et al., Placenta. 2011; 32 Suppl:S21-29; Yuen et al., Eur J Hum Genet. 2010; 18:1006-1012; Blair et al., Mol Hum Reprod. 2013; 19:697-708; and Chu et al., PLoS One. 2014; 9:e107318). However, it is not known to what extent the maternal plasma provides a window into the fetoplacental DNA methylome and therefore little progress has been made with respect to whether this approach has value for diagnosis and phenotyping of the fetoplacental unit during early gestation. Therefore, there is a need for non-invasive methods for diagnosis of pregnancy-associated conditions, e.g., preeclampsia, and prenatal conditions.
Summary of this Example
This Example relates to methods for diagnosing, prognosing, monitoring and/or treating pregnancy-associated disorders, e.g., preeclampsia, in a pregnant subject. It is based, at least in part, on the discovery that DNA is differentially methylated in blood samples from pregnant women with preeclampsia as compared to pregnant women that do not have preeclampsia. This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying and/or treating pregnancy-associated disorders.
In certain embodiments, a method for diagnosing, prognosing, classifying and/or monitoring a pregnancy-associated disorder in a pregnant subject can include obtaining a biological sample from the subject, determining the methylation status and/or level of one or more genomic loci in the biological sample, comparing the methylation status and/or level of the one or more genomic loci to a reference and diagnosing a pregnancy-associated disorder in the subject. In certain embodiments, a difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a pregnant subject that does not have the pregnancy-associated disorder. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from the pregnant subject being tested for the pregnancy-associated disorder. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a non-pregnant subject. In certain embodiments, the pregnancy-associated disorder is preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy or intrauterine growth retardation. For example, but not by way of limitation, the pregnancy-associated disorder is preeclampsia. In certain embodiments, the pregnancy-associated disorder is preterm birth.
This Example further provides methods for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder, e.g., preeclampsia in a pregnant subject. For example, but not by way of limitation, the methods can include obtaining a biological sample from the subject, determining the methylation status and/or level of one or more genomic loci in the biological sample, comparing the methylation status and/or level of the one or more genomic loci to a reference and diagnosing the pregnancy-associated disorder, e.g., preeclampsia, in the subject. In certain embodiments, a difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the pregnancy-associated disorder, e.g., preeclampsia, in the subject. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a pregnant subject that does not have the pregnancy-associated disorder, e.g., preeclampsia. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from the pregnant subject being tested for the pregnancy-associated disorder, e.g., preeclampsia. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a non-pregnant subject.
This Example further provides methods for determining if a pregnant subject has an increased risk of having a preterm birth. In certain embodiments, the method includes obtaining a biological sample from the subject, determining the methylation status and/or level of one or more genomic loci in the biological sample, comparing the methylation status and/or level of the one or more genomic loci to a reference and determining that the subject is at an increased risk of preterm birth. In certain embodiments, a difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates that the subject is at an increased risk of preterm birth.
In certain embodiments, a method for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder in a pregnant subject can include obtaining a biological sample from the subject, determining the fraction of fetal nucleic acid (fetal fraction) in the biological sample, determining the methylation status of one or more genomic loci in placental nucleic acids present in the biological sample and diagnosing the subject with the pregnancy-associated disorder by analyzing the fetal fraction and the methylation status of the genomic loci in the placental nucleic acid. In certain embodiments, the methylation status is the methylation rate of the one or more genomic loci. In certain embodiments, the fetal fraction is determined by: analyzing the methylation status of one or more reference genomic loci in the biological sample, analyzing the methylation status of the one or more reference genomic loci in a reference sample of maternal blood cells and analyzing the methylation status of the one or more reference genomic loci in a reference sample of placental nucleic acids. In certain embodiments, the methylation status of the one or more genomic loci is determined by: analyzing the methylation status of the one or more genomic loci in the biological sample and analyzing the methylation status of the one or more genomic loci in a reference sample of maternal blood cells. In certain embodiments, the methylation status of the one or more genomic loci is determined by: analyzing the methylation status of the one or more genomic loci in the biological sample and analyzing the methylation status of the one or more genomic loci in a reference sample of blood cells from a non-pregnant individual. 
In certain embodiments, the methylation status of the one or more genomic loci is determined by: analyzing the methylation status of the one or more genomic loci in the biological sample and analyzing the methylation status of the one or more genomic loci in a reference sample of plasma from a non-pregnant individual.
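For illustration only, the fetal fraction determination described above can be sketched as a two-component mixture. The sketch assumes, as a simplification not stated in this Example, that the methylation rate observed at each reference genomic locus in the biological sample is a linear mixture of the rate in the placental reference and the rate in the maternal blood cell reference, and it solves for the mixing fraction by least squares. The function name, argument names and numeric values are hypothetical.

```python
def estimate_fetal_fraction(sample, maternal, placental):
    """Least-squares estimate of f in: sample ~= f * placental + (1 - f) * maternal.

    Each argument is a list of methylation rates (0.0 to 1.0), one entry per
    reference genomic locus, in the same locus order. The linear-mixture model
    is an assumption made for this sketch.
    """
    if not (len(sample) == len(maternal) == len(placental)):
        raise ValueError("all inputs must cover the same reference loci")

    numerator = 0.0
    denominator = 0.0
    for s, m, p in zip(sample, maternal, placental):
        contrast = p - m              # how strongly this locus separates the two sources
        numerator += (s - m) * contrast
        denominator += contrast * contrast

    if denominator == 0.0:
        raise ValueError("placental and maternal references do not differ at any locus")

    # Clip to the physically meaningful range [0, 1].
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical rates at three reference loci.
maternal_ref = [0.9, 0.1, 0.8]
placental_ref = [0.1, 0.9, 0.2]
plasma_sample = [0.82, 0.18, 0.74]
fetal_fraction = estimate_fetal_fraction(plasma_sample, maternal_ref, placental_ref)
```

With these hypothetical inputs the estimate is approximately 0.1. Loci at which the two references differ most contribute most to the estimate, which is why reference loci with a strong placental-versus-maternal contrast would be preferred under this model.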
This Example also provides for methods of treating a pregnancy-associated disorder in a pregnant subject. In certain embodiments, the method can include obtaining a biological sample from the subject, determining the methylation status and/or level of one or more genomic loci present in the biological sample, comparing the methylation status and/or level of the one or more genomic loci to a reference, diagnosing a pregnancy-associated disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject, and treating the subject diagnosed with the pregnancy-associated disorder. In certain embodiments, a method of treating a pregnancy-associated disorder in a pregnant subject can include diagnosing a pregnancy-associated disorder in the subject by utilization of the algorithm disclosed in Example Embodiment B and treating the subject diagnosed with the pregnancy-associated disorder. In certain embodiments, the pregnancy-associated disorder is preeclampsia, preterm labor, hyperemesis gravidarum, ectopic pregnancy or intrauterine growth retardation. For example, but not by way of limitation, the pregnancy-associated disorder is preeclampsia. In certain embodiments, the pregnancy-associated disorder is preterm birth. In certain embodiments, the method of treating preeclampsia can include any method known in the art, e.g., can include one or more of the following treatments: administration of an anti-hypertensive medication, administration of HMG-CoA reductase inhibitors, delivery, administration of a corticosteroid and/or administration of an anti-convulsant medication.
In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder or preeclampsia in the subject. In certain embodiments, a decrease in the level of methylation of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder or preeclampsia in the subject. Alternatively and/or additionally, a decrease in the level of methylation of at least one of the one or more genomic loci in the biological sample and the increase in the level of methylation of at least one of the one or more different genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder or preeclampsia in the subject.
In certain embodiments, the subject is human. In certain embodiments, the biological sample is a blood sample, stool sample, saliva sample and/or urine sample obtained from the subject. For example, but not by way of limitation, the biological sample can be obtained from the pregnant subject anytime during the pregnancy but prior to onset of clinical symptoms. In certain embodiments, the biological sample can be obtained from the pregnant subject between week 10 and week 13 of gestation, e.g., for early diagnosis of preeclampsia. In certain embodiments, the one or more genomic loci are present within maternal nucleic acids isolated from the biological sample, e.g., the maternal nucleic acids are obtained from cells, e.g., leukocytes, in the biological sample or are cell-free nucleic acids, e.g., placental nucleic acids, in the biological sample. Alternatively and/or additionally, the one or more genomic loci are present within fetal nucleic acids isolated from the biological sample, e.g., the fetal nucleic acids are cell-free nucleic acids in the biological sample. In certain embodiments, the one or more genomic loci comprise one or more CpG sites.
This Example provides for algorithms for diagnosing and/or monitoring a subject with a pregnancy-associated disorder. In certain embodiments, the algorithm can be used to classify a pregnancy-associated disorder of a subject.
This Example provides for kits for diagnosing, monitoring, classifying and/or treating a subject with a pregnancy-associated disorder. In certain embodiments, a kit of this Example includes a means for determining and/or detecting the methylation status of one or more genomic loci. In certain embodiments, the kit can further include instructions for diagnosing, monitoring and/or treating a subject with a pregnancy-associated disorder, e.g., preeclampsia.
Description of this Example
This Example provides methods for diagnosing, prognosing, monitoring, classifying and/or treating pregnancy-associated disorders, e.g., preeclampsia, in a pregnant subject. It is based, at least in part, on the discovery that DNA is differentially methylated in blood samples from pregnant women with preeclampsia as compared to pregnant women that do not have preeclampsia. For example, but not by way of limitation, the methods disclosed herein include determining the methylation status of one or more genomic loci in a biological sample of a pregnant subject. In certain embodiments, the methods disclosed herein include the use of an algorithm to diagnose, prognose, monitor, classify and/or treat pregnancy-associated disorders, e.g., preeclampsia, in a pregnant subject.
Definitions of this Example
Unless defined otherwise, all technical and scientific terms used in this Example have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them below, unless specified otherwise.
The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. This Example also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.
As used herein, the term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.
As used herein, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows detection of a disease and/or disorder in an individual, including detection of disease in its early stages. Early stage of a disease, as used herein, refers to the time period between the onset of the disease and the time point that signs or symptoms of the disease emerge. In certain non-limiting embodiments, the presence, absence and/or level of a biomarker in a biological sample of a subject is compared to a reference control.
The terms “reference sample,” “reference control,” “control” or “reference,” as used interchangeably herein, refer to a control for a methylation status of a genomic locus that is to be detected in a biological sample of a subject. In certain embodiments, a reference sample can be a sample from a healthy pregnant individual, e.g., an individual that does not have a pregnancy-associated disorder. In certain embodiments, a reference sample can be a sample from an individual that is not pregnant. In certain embodiments, a reference sample can be a sample from an individual that does not have preeclampsia. In certain embodiments, the reference sample can be an earlier sample taken from the same subject, e.g., while they were not pregnant or during an earlier healthy pregnancy. In certain embodiments, a control or reference can be the presence, absence and/or particular level of a methylation state of a genomic locus in a healthy pregnant individual. In certain embodiments, a control can be the presence, absence and/or particular level of a methylation state of a genomic locus in a healthy individual that underwent treatment for a pregnancy-associated disorder, wherein the healthy individual is non-symptomatic. In certain embodiments, a reference can be the presence, absence and/or particular level of a methylation state of a genomic locus in a healthy individual that has never had the disease. In certain embodiments, the reference can be a predetermined presence, absence and/or particular level of a methylation state of a genomic locus that indicates a subject does not have preeclampsia.
The term “pregnancy-associated disorder,” as used herein, refers to any condition or disease that may affect a pregnant woman or both the woman and the fetus. Such a condition or disease may manifest its symptoms during a limited time period, e.g., during pregnancy or delivery, or may last the entire life span of the fetus following its birth. Non-limiting examples of pregnancy-associated disorders include preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy, intrauterine growth retardation and genomic abnormalities (e.g., aneuploidy and chromosomal abnormalities such as trisomy 21, trisomy 18, trisomy 13 and extra or missing copies of the X chromosome and Y chromosome). In certain embodiments, the pregnancy-associated disorder is preeclampsia. In certain embodiments, the pregnancy-associated disorder is preterm birth.
As used herein, “preeclampsia” refers to a pregnancy-associated disorder that manifests as new onset hypertension. For example, but not by way of limitation, a subject suffering from preeclampsia can have a systolic blood pressure greater than or equal to 140 mmHg and a diastolic pressure greater than or equal to 90 mmHg and protein in the urine at a concentration greater than 300 mg/dL/24 hr after 20 weeks gestation (i.e., arterial pressure >140/90 mmHg and proteinuria >300 mg/dL/24 hours after 20 weeks gestation). See, e.g., Chu et al., PLoS One. 2014; 9:e107318.
The term “slightly invasive or non-invasive method” refers to a method that does not involve the removal of tissues or fetal cells by biopsy and/or effraction from the placental barrier. In certain embodiments, slightly invasive or non-invasive methods, as disclosed herein, include the extraction of a biological sample from a subject by venipuncture.
The term “patient” or “subject,” as used interchangeably herein, refers to any pregnant warm-blooded animal, e.g., human or non-human. Non-limiting examples of non-human subjects include non-human primates, dogs, cats, mice, rats, guinea pigs, rabbits, fowl, pigs, horses, cows, goats, sheep, etc. In certain embodiments, the subject is human.
As used herein, the term “biological sample” refers to a sample of biological material obtained from a pregnant subject. In certain embodiments, the biological sample is a sample of biological material obtained from a pregnant human subject, including a biological fluid. Non-limiting examples of a biological fluid include urine, amniotic fluid, saliva, tears, sweat, blood, plasma and serum. In certain embodiments, the biological sample is a peripheral blood sample from a pregnant subject. In certain embodiments, the blood sample can be a fractionated portion of peripheral blood, such as a plasma sample. In certain embodiments, the biological sample can be a tissue sample obtained from the fetoplacental unit, e.g., as obtained by chorionic villus sampling and/or amniocentesis. In certain embodiments, the biological sample can be maternal leukocytes obtained from a blood sample. In certain embodiments, the biological sample can be circulating fetal cells obtained from a maternal blood sample. In certain embodiments, the biological sample can be a stool sample, a sample from the embryo and/or a sample from the blastocyst.
The term “nucleic acid,” “nucleic acid molecule” or “polynucleotide” includes any compound and/or substance that comprises a polymer of nucleotides. Each nucleotide is composed of a base, specifically a purine- or pyrimidine base (i.e., cytosine (C), guanine (G), adenine (A), thymine (T) or uracil (U)), a sugar (i.e., deoxyribose or ribose) and a phosphate group. In certain embodiments, the nucleic acid molecule is described by the sequence of bases, whereby said bases represent the primary structure (linear structure) of a nucleic acid molecule. The sequence of bases is typically represented from 5′ to 3′. These terms encompass deoxyribonucleic acid (DNA) including, e.g., complementary DNA (cDNA) and genomic DNA, ribonucleic acid (RNA), in particular messenger RNA (mRNA), synthetic forms of DNA or RNA, and mixed polymers comprising two or more of these molecules. The herein described nucleic acid molecule can contain naturally occurring or non-naturally occurring nucleotides. Examples of non-naturally occurring nucleotides include modified nucleotide bases with derivatized sugars or phosphate backbone linkages or chemically modified residues.
The term “isolated” (e.g., isolated genomic DNA) refers to a biological component that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, e.g., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acids, e.g., DNA, that have been “isolated” include nucleic acids purified by standard purification methods.
The term “genomic locus” or “genomic DNA locus,” as used herein, refers to any fixed position in a genome. For example, but not by way of limitation, a genomic locus can refer to a genomic element, a chromosomal region, a gene, a region of a gene, e.g., an exon or intron, a regulatory region of a gene, e.g., a promoter or enhancer, a CpG site, a CpG island or a CpG island shore. For example, but not by way of limitation, a genomic locus can include one or more CpG sites, e.g., between about 1 to about 100 CpG sites. In certain embodiments, a genomic locus can be of any particular length, e.g., between about 1 to about 10,000 nucleotides in length.
As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status” and “methylation level” refer to the presence, absence, percentage and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, indicate the methylation state of any of the nucleotides (e.g., cytosines) in the sequence, can indicate the methylation state of a subset of the nucleotides (e.g., of cytosines), can indicate the percentage or fraction of methylated cytosines at any particular stretch of nucleotides within the sequence or can indicate the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
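As a minimal sketch of two of the summaries named in this definition, the following computes a per-cytosine methylated fraction and an average methylation rate across the cytosines of a locus. The input format, a list of (methylated count, total count) pairs per cytosine, is a hypothetical choice made for this illustration and is not specified by this Example, as are the function names.

```python
def site_fractions(counts):
    """Fraction of methylated observations at each cytosine.

    `counts` is a list of (methylated, total) pairs, one per cytosine
    (hypothetical input format). Cytosines with no observations yield None.
    """
    fractions = []
    for methylated, total in counts:
        if methylated < 0 or total < methylated:
            raise ValueError("need 0 <= methylated <= total")
        fractions.append(methylated / total if total > 0 else None)
    return fractions


def average_methylation_rate(counts):
    """Mean of the per-cytosine fractions, ignoring cytosines with no observations."""
    observed = [f for f in site_fractions(counts) if f is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


# Hypothetical locus with three cytosines; the third has no coverage.
locus_counts = [(8, 10), (2, 10), (0, 0)]
per_site = site_fractions(locus_counts)
locus_rate = average_methylation_rate(locus_counts)
```

Averaging per-cytosine fractions weights every covered cytosine equally. Pooling the counts before dividing would instead weight cytosines by how often they were observed; either reading fits the definition above, and the choice here is only one option.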
As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more nucleotides that is/are methylated.
As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.
A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., DNA sequence, that has a high frequency of CpG dinucleotide repeats. See, e.g., Illingworth and Bird, FEBS Letters 2009; 583:1713-1720. For example, but not by way of limitation, Yamada et al. (Genome Research 2004; 14:247-266) have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length, have a GC content greater than 50% and have an OCF/ECF ratio greater than 0.6. Others (Takai et al., Proc. Natl. Acad. Sci. U.S.A. 2002; 99:3740-3745) have defined a CpG island less stringently as a sequence of at least 200 nucleotides in length, having a greater than 50% GC content and an OCF/ECF ratio greater than 0.6.
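The three criteria quoted above (length, GC content and OCF/ECF ratio) can be checked for a candidate sequence as sketched below. The sketch assumes the OCF/ECF ratio is the commonly used observed-to-expected CpG ratio, computed as (number of CpG dinucleotides × sequence length) / (number of C × number of G); this Example does not spell out the formula. The function names are hypothetical, and the default thresholds are the less stringent values quoted above.

```python
def cpg_island_stats(seq):
    """Return (length, GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")   # "CG" cannot overlap itself, so this counts every CpG

    gc_content = (c_count + g_count) / length if length else 0.0
    expected_cpg = (c_count * g_count) / length if length else 0.0
    ratio = cpg_count / expected_cpg if expected_cpg else 0.0
    return length, gc_content, ratio


def meets_island_criteria(seq, min_length=200, min_gc=0.5, min_ratio=0.6):
    """Apply the length, GC-content and ratio thresholds to one candidate sequence."""
    length, gc_content, ratio = cpg_island_stats(seq)
    return length >= min_length and gc_content > min_gc and ratio > min_ratio


# Hypothetical 200-nucleotide sequences.
cpg_rich = "CG" * 100
at_rich = "AT" * 100
```

Calling `meets_island_criteria(cpg_rich)` passes with the default thresholds, while `meets_island_criteria(cpg_rich, min_length=400)` fails on length alone, which illustrates how the stricter and looser definitions can disagree on the same sequence.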
A “CpG island shore,” as used herein, refers to methylation hotspots that are present a short distance, e.g., less than 2 kb, from CpG islands.
The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome) or a portion of the subset (e.g., those areas found to be associated with a pregnancy-associated disorder). A “fetal methylome” corresponds to a methylome of a fetus of a pregnant female. The fetal methylome can be determined using a variety of fetal tissues or sources of fetal DNA, including placental tissues and cell-free fetal DNA in maternal plasma. A methylome from plasma can be referred to as a “plasma methylome.” The plasma methylome is an example of a cell-free methylome since plasma and serum include cell-free DNA (cfDNA). The plasma methylome is also an example of a mixed population methylome because it is a mixture of the fetal/maternal methylome. A “maternal methylome” corresponds to a methylome of a pregnant female. The maternal methylome can be determined using a variety of maternal tissues or sources of maternal DNA, including cell-free maternal DNA in maternal plasma and DNA from leukocytes.
As used herein, the term “increase” refers to alter positively by at least about 2%, including, but not limited to, alter positively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
As used herein, the terms “reduce,” “reduction” or “decrease” refer to alter negatively by at least about 2%, including, but not limited to, alter negatively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
Methods of this Example
This Example provides methods for diagnosing, monitoring, classifying and/or treating pregnancy-associated disorders in a pregnant subject, e.g., by analyzing the methylation status of one or more genomic loci in a biological sample of a pregnant subject. In certain embodiments, the methods can utilize the algorithm disclosed herein. For example, but not by way of limitation, methods of this Example can be used to diagnose, monitor and/or treat pregnancy-associated disorders including, but not limited to, preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy and/or intrauterine growth retardation. In certain embodiments, methods of this Example allow the early diagnosis of a pregnant subject with preeclampsia, e.g., within the first trimester.
In certain embodiments, the biological sample can be a blood sample (including plasma or serum), stool sample, saliva sample and/or urine sample. In certain embodiments, the biological sample can be obtained from an embryo or blastocyst, e.g., during an in vitro fertilization procedure. The step of collecting a biological sample can be carried out either directly or indirectly by any suitable technique. For example, and not by way of limitation, collection of a blood sample from a subject can be carried out by phlebotomy or any other suitable technique, with the blood sample processed further to provide a serum sample or other suitable blood fraction for analysis.
In certain embodiments, the biological sample, e.g., blood, is extracted from a pregnant subject at any time during the pregnancy. For example, but not by way of limitation, the biological sample can be obtained during the first trimester, second trimester or third trimester of the subject's pregnancy. In certain embodiments, the biological sample can be obtained any time during the pregnancy before the onset of pregnancy-associated disorder and/or before the onset of symptoms of the pregnancy-associated disorder. In certain embodiments, the biological sample is obtained during the first trimester of a subject's pregnancy for early diagnosis of the subject with a pregnancy-associated disorder, e.g., preeclampsia. In certain embodiments, a biological sample can be extracted from a pregnant subject between the 10th and 34th week of the pregnancy. In certain embodiments, the biological sample is obtained before the 34th week of pregnancy, before the 33rd week of pregnancy, before the 32nd week of pregnancy, before the 31st week of pregnancy, before the 30th week of pregnancy, before the 29th week of pregnancy, before the 28th week of pregnancy, before the 27th week of pregnancy, before the 26th week of pregnancy, before the 25th week of pregnancy, before the 24th week of pregnancy, before the 23rd week of pregnancy, before the 22nd week of pregnancy, before the 21st week of pregnancy, before the 20th week of pregnancy, before the 19th week of pregnancy, before the 18th week of pregnancy, before the 17th week of pregnancy, before the 16th week of pregnancy, before the 15th week of pregnancy, before the 14th week of pregnancy, before the 13th week of pregnancy, before the 12th week of pregnancy, before the 11th week of pregnancy or before the 10th week of pregnancy.
For example, but not by way of limitation, the biological sample can be obtained from a pregnant subject between about week 10 and about week 13 of pregnancy, e.g., to diagnose a subject with preeclampsia.
In certain embodiments, multiple biological samples (e.g., two or more, three or more, four or more, five or more, six or more or seven or more biological samples) can be obtained during a subject's pregnancy (e.g., serially obtained samples). For example, but not by way of limitation, multiple biological samples can be obtained during the first trimester. In certain embodiments, one or more samples can be obtained during the first trimester and one or more samples can be obtained during the second or third trimester.
Diagnostic, Prognostic, Classification and Monitoring Methods of this Example
This Example provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, but not by way of limitation, this Example provides methods for diagnosing, prognosing, classifying and/or monitoring a pregnancy-associated disorder in a subject that includes analyzing the methylation status of certain genomic loci. In certain embodiments, the pregnancy-associated disorder is preeclampsia. In certain embodiments, the pregnancy-associated disorder is preterm birth.
In certain embodiments, the analyzed genomic loci can include one or more genomic loci that exhibit differential methylation in a biological sample from a subject that has a pregnancy-associated disorder compared to a reference sample. For example, but not by way of limitation, the methods of this Example include assessing the methylation status of one or more genomic loci, e.g., about 5 or more, about 10 or more, about 50 or more, about 100 or more, about 500 or more, about 1,000 or more, about 5,000 or more, about 10,000 or more, about 25,000 or more, about 50,000 or more or about 100,000 or more genomic loci in a biological sample of a subject. In certain embodiments, the genomic loci can be selected from the genes, or a region within the genes, provided in
In certain embodiments, the one or more genomic loci can be one or more promoter regions of one or more genes, one or more exons of one or more genes, one or more introns of one or more genes, one or more CpG sites, one or more CpG islands, one or more CpG island shores, one or more enhancers of one or more genes or a combination thereof. In certain embodiments, the genomic loci are present on a particular chromosome. For example, but not by way of limitation, the genomic loci are present on chromosome 19.
In certain embodiments, this Example provides for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder in a pregnant subject by detecting the DNA methylation profiles associated with the pregnancy-associated disorder. For example, and not by way of limitation, the method can include (a) obtaining a biological sample from the subject, (b) determining the methylation status of one or more genomic loci present in the biological sample, (c) comparing the methylation status of the one or more genomic loci to a reference and (d) diagnosing a pregnancy-associated disorder in the subject, wherein the difference in the methylation status of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject. In certain embodiments, the difference in the methylation status can also indicate the severity of the pregnancy-associated disorder.
In certain embodiments, a method for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder in a pregnant subject includes (a) obtaining a biological sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the biological sample, (c) comparing the level of methylation of the one or more genomic loci to a reference and (d) diagnosing a pregnancy-associated disorder in the subject, wherein the difference in the level of methylation of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject. In certain embodiments, the difference in the methylation level can also indicate the severity of the pregnancy-associated disorder.
In certain embodiments, this Example further provides methods for monitoring a subject at risk of developing preeclampsia. In certain embodiments, a subject at risk of developing preeclampsia is an individual that suffered from preeclampsia in an earlier pregnancy. For example, and not by way of limitation, the method can include determining the methylation status of one or more genomic loci in the biological sample obtained from the pregnant subject prior to a diagnosis of preeclampsia and determining the methylation status of the one or more genomic loci in a biological sample obtained from the subject at one or more later timepoints during the subject's pregnancy. In certain embodiments, a change in the methylation status of the one or more genomic loci in the second or subsequent samples, relative to the first sample, can indicate that the subject has developed preeclampsia.
In certain embodiments, this Example further provides methods for determining if a pregnant subject has an increased risk of having a preterm birth. In certain embodiments, the method includes obtaining a biological sample from the subject, determining the methylation status and/or level of one or more genomic loci in the biological sample, comparing the methylation status and/or level of the one or more genomic loci to a reference and determining that the subject is at an increased risk of preterm birth. In certain embodiments, a difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates that the subject is at an increased risk of preterm birth.
In certain embodiments, diagnosis of a subject with a pregnancy-associated disorder, monitoring a subject at increased risk of developing a pregnancy-associated disorder or determining if a subject is at risk of having a preterm birth can be based on a higher or lower methylation level of the genomic locus in the biological sample of the subject relative to the methylation level in a reference sample, e.g., a biological sample from a non-pregnant woman or a pregnant woman that does not have the pregnancy-associated disorder. In certain embodiments, the reference sample can be a biological sample from the subject prior to being pregnant and/or during an earlier pregnancy where the subject did not have a pregnancy-associated disorder. In certain embodiments, a difference of greater than about 5%, greater than about 10%, greater than about 15%, greater than about 20%, greater than about 25%, greater than about 30%, greater than about 35%, greater than about 40%, greater than about 45%, greater than about 50%, greater than about 55%, greater than about 60%, greater than about 65%, greater than about 70%, greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90% or greater than about 95% in the methylation (e.g., level, percentage and/or fraction) of the one or more genomic loci in a biological sample obtained from a subject compared to a control is indicative that the subject has the pregnancy-associated disorder or is at risk of developing the pregnancy-associated disorder. In certain embodiments, the difference can be a decrease in methylation (e.g., level, percentage and/or fraction) of the genomic loci in the biological sample of the subject. Alternatively, the difference can be an increase in methylation (e.g., level, percentage and/or fraction) of the genomic loci in the biological sample of the subject.
In certain embodiments, the difference can be a decrease in the methylation of a genomic locus and an increase in the methylation of a different genomic locus in the sample obtained from the subject. In certain embodiments, a decrease in the level of methylation of one or more genomic loci in the biological sample and the increase in the level of methylation of one or more different genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder or preeclampsia in the subject.
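By way of illustration only, the comparison of per-locus methylation levels against a reference can be sketched in Python as follows. The sketch assumes that methylation is expressed as a fraction between 0 and 1 and that the "difference" is an absolute difference in that fraction; the function name, locus identifiers and numerical values are hypothetical and are not taken from this disclosure.

```python
from typing import Dict, List, Tuple


def flag_differential_loci(
    sample: Dict[str, float],
    reference: Dict[str, float],
    min_difference: float = 0.10,
) -> List[Tuple[str, float, str]]:
    """Return (locus, difference, direction) for each locus whose methylation
    fraction differs from the reference by more than min_difference.

    Assumption: the difference is absolute (sample minus reference), not relative.
    """
    flagged = []
    for locus, sample_level in sample.items():
        if locus not in reference:
            continue  # no reference value, so no comparison is possible
        diff = sample_level - reference[locus]
        if abs(diff) > min_difference:
            direction = "increase" if diff > 0 else "decrease"
            flagged.append((locus, round(diff, 4), direction))
    return flagged


# Hypothetical methylation fractions for three loci.
sample = {"locus_A": 0.82, "locus_B": 0.15, "locus_C": 0.50}
reference = {"locus_A": 0.60, "locus_B": 0.40, "locus_C": 0.52}

result = flag_differential_loci(sample, reference, min_difference=0.10)
for locus, diff, direction in result:
    print(locus, diff, direction)
```

In this hypothetical input, one locus shows an increase and another shows a decrease relative to the reference, which corresponds to the mixed pattern described above; the third locus differs by less than the chosen threshold and is not flagged.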
In certain embodiments, diagnosis of a subject with a pregnancy-associated disorder, e.g., preeclampsia, can be based on the methylated or unmethylated state of a genomic locus, e.g., a CpG site. In certain embodiments, a genomic locus, e.g., a CpG site, in a sample from a subject diagnosed with a pregnancy-associated disorder can be methylated and the genomic locus, e.g., the CpG site, in a reference sample can be unmethylated. Alternatively, a genomic locus in a sample from a subject diagnosed with a pregnancy-associated disorder can be unmethylated and the genomic locus in a reference sample can be methylated.
In certain embodiments, this Example provides for diagnosing, prognosing, classifying and/or monitoring preeclampsia in a subject by assessing the DNA methylation profiles associated with preeclampsia. For example, and not by way of limitation, the method can include (a) obtaining a biological sample from the subject, (b) determining the methylation status and/or level of one or more genomic loci present in the biological sample, (c) comparing the methylation status and/or level of the one or more genomic loci to a reference and (d) diagnosing preeclampsia in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of preeclampsia or a sub-type of preeclampsia in the subject. For example, but not by way of limitation, the method can be used to indicate whether the subject has mild preeclampsia or severe preeclampsia.
Diagnostic, Prognostic, Classification and Monitoring Methods Using an Algorithm of this Example
This Example further provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci by using an algorithm, as disclosed in Example Embodiment B. For example, but not by way of limitation, this Example provides methods for diagnosing, prognosing, classifying and/or monitoring a pregnancy-associated disorder in a subject that includes analyzing the methylation status of certain genomic loci and/or genomic fractions. In certain embodiments, the pregnancy-associated disorder is preeclampsia. In certain embodiments, the pregnancy-associated disorder is preterm birth.
By accurately representing the methylome of the maternal plasma as a mixture of multiple DNA methylomes of both fetal and maternal origin, the algorithm of this Example is able to provide a patient's specific reference methylation pattern for all genomic loci used as a biomarker, reduce the variance of the estimated deviation of the methylation pattern of a genomic locus in a test sample from the reference pattern, and achieve higher power when testing if the deviation is statistically significant.
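For illustration, one simple way to picture a patient-specific reference is as a weighted average of component methylomes. The Python sketch below assumes a linear mixture in which each component contributes in proportion to its fraction of the plasma DNA; this is an assumption made for the sketch and is not a statement of the algorithm of Example Embodiment B. The function name and all numerical values are hypothetical.

```python
from typing import Sequence


def mixture_reference(fractions: Sequence[float],
                      component_levels: Sequence[float]) -> float:
    """Expected methylation fraction at one locus under an assumed linear
    mixture: the sum over components of (fraction * component methylation)."""
    if len(fractions) != len(component_levels):
        raise ValueError("fractions and component_levels must have equal length")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(f * m for f, m in zip(fractions, component_levels))


# Hypothetical values: 10% fetal (placental) DNA and 90% maternal DNA.
fetal_fraction = 0.10
placental_level = 0.30   # assumed methylation fraction of the fetal component
maternal_level = 0.90    # assumed methylation fraction of the maternal component

expected = mixture_reference(
    [fetal_fraction, 1.0 - fetal_fraction],
    [placental_level, maternal_level],
)

observed = 0.78                   # hypothetical measurement in the test sample
deviation = observed - expected   # deviation from the patient-specific reference
print(round(expected, 4), round(deviation, 4))
```

Because the expected value depends on the subject's own fetal fraction, the same observed methylation level can correspond to different deviations in different subjects, which is the motivation for a patient-specific reference.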
Methods of Treatment of this Example
This Example further provides methods of treating a subject with a pregnancy-associated disorder, e.g., preeclampsia. In certain embodiments, a method of treating a pregnancy-associated disorder can include diagnosing a subject with the pregnancy-associated disorder as disclosed herein, followed by the treatment of the subject. Any of the diagnosis methods disclosed herein can be used in treating a subject.
In certain embodiments, the treatment method can include (a) obtaining a biological sample from the subject, (b) determining the methylation status and/or level of one or more genomic loci present in the biological sample, (c) comparing the methylation status and/or level of the one or more genomic loci to a reference, (d) diagnosing a pregnancy-associated disorder in the subject, where the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject and (e) treating the subject diagnosed with the pregnancy-associated disorder.
In certain embodiments, the methods of treatment can include diagnosing the subject with a pregnancy-associated disorder by use of the algorithm of this Example. For example, but not by way of limitation, the treatment method can include (a) diagnosing the subject with a pregnancy-associated disorder by use of the algorithm of this Example and (b) treating the subject diagnosed with the pregnancy-associated disorder.
In certain embodiments, the treatment method can include (a) obtaining a biological sample from the subject, e.g., a blood sample, (b) determining the fraction of fetal nucleic acid in the biological sample, (c) determining the methylation status of one or more genomic loci in placental nucleic acids present in the biological sample, (d) diagnosing the subject with the pregnancy-associated disorder by analyzing the fetal fraction and the methylation status of the genomic loci in the placental nucleic acid and (e) treating the subject diagnosed with the pregnancy-associated disorder.
In certain embodiments, this Example provides methods for treating preeclampsia. For example, but not by way of limitation, if a subject is diagnosed with preeclampsia, the subject can be treated by any method known in the art to treat preeclampsia. For example, but not by way of limitation, the subject can be treated by administration of medication to reduce the blood pressure of the subject, e.g., by administration of an anti-hypertensive medication. In certain embodiments, delivery can be used as a treatment for preeclampsia. Additional methods of treatment include administration of a corticosteroid, administration of HMG-CoA reductase inhibitors and/or administration of an anti-convulsant medication.
In certain embodiments, the information provided by the methods described in this Example can be used by a physician in determining the most effective course of treatment (e.g., preventative or therapeutic) for the subject. A course of treatment refers to the measures taken for a patient after the prognosis or the assessment of increased risk for development of a pregnancy-associated disorder is made. For example, when a subject is identified to have an increased risk of developing a pregnancy-associated disorder, e.g., preeclampsia, the physician can determine whether frequent monitoring of DNA methylation changes can be performed as a prophylactic measure. Also, when a subject is diagnosed with a pregnancy-associated disorder, e.g., preeclampsia (e.g., based on the presence of a DNA methylation pattern in a sample from a subject), it can be advantageous to follow such detection with a therapeutic treatment.
In certain embodiments, this Example further provides methods for assessing the efficacy of a therapeutic or prophylactic therapy for preventing, inhibiting or treating a pregnancy-associated disorder in a subject, comprising determining the methylation status of one or more genomic loci obtained from a subject prior to therapy and determining methylation status of the one or more genomic loci present in a biological sample obtained from the subject at one or more time points during therapeutic or prophylactic therapy, wherein the therapy is efficacious for preventing, inhibiting and/or treating a pregnancy-associated disorder in a subject when there is a change in the presence and/or level of methylation of the one or more genomic loci in the second or subsequent samples, relative to the first sample. In certain embodiments, the first sample is obtained after therapeutic treatment has begun.
In certain embodiments, the methods for monitoring the response in a subject to prophylactic or therapeutic treatment (for example, administration of an anti-hypertensive to a subject diagnosed with preeclampsia) can include measuring the methylation status and/or level of one or more genomic loci in a biological sample of a subject at a first timepoint, administering a therapeutic agent, re-measuring the methylation status and/or level of the one or more genomic loci at a second timepoint, comparing the results of the first and second measurements and optionally modifying the treatment regimen based on the comparison. In certain embodiments, the first timepoint is prior to an administration of the therapeutic agent, and the second timepoint is after said administration of the therapeutic agent. In certain embodiments, the first timepoint is prior to the administration of the therapeutic agent to the subject for the first time. In certain embodiments, the dose (defined as the quantity of therapeutic agent administered at any one administration) is increased or decreased in response to the comparison. In certain embodiments, the dosing interval (defined as the time between successive administrations) can be increased or decreased in response to the comparison, including total discontinuation of treatment. In addition, the method of the present disclosure can be used to determine the efficacy of the therapeutic treatment, wherein a change in the methylation status of certain genomic loci present in a biological sample of a subject can indicate that the therapeutic treatment regimen can be altered, reduced and/or stopped.
Assays of this Example
This Example further provides assays and/or methods for determining the DNA methylation status and/or level of genomic loci that correlates with the presence, absence and/or severity of a pregnancy-associated disorder. For example, but not by way of limitation, the pregnancy-associated disorder is preeclampsia. In certain embodiments, a method can include comparing the methylation status and/or level of genomic loci present in a biological sample from a subject that has a pregnancy-associated disorder to the methylation status and/or level of genomic loci in a biological sample from a healthy subject to determine the methylation pattern, as disclosed above, that correlates with the presence of the pregnancy-associated disorder. In certain embodiments, a method can include comparing the methylation status and/or level of genomic loci in a biological sample from a subject that has a pregnancy-associated disorder at an early stage (or less severe case, e.g., mild preeclampsia) to the methylation status and/or level of genomic loci in a biological sample from a subject that has the pregnancy-associated disorder at a late stage (or a more severe case, e.g., severe preeclampsia), as disclosed above, to determine the methylation status and/or level that correlates with the different stages and/or severity of the pregnancy-associated disorder. Non-limiting examples of pregnancy-associated disorders include preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy, intrauterine growth retardation and genomic abnormalities (e.g., aneuploidy and chromosomal abnormalities such as trisomy 21, trisomy 18, trisomy 13 and extra or missing copies of the X chromosome and Y chromosome).
DNA Isolation Techniques of this Example
In certain embodiments, the methods of this Example include obtaining nucleic acid from a biological sample from a subject, e.g., a blood sample. There are several platforms that are known in the art and currently available to isolate nucleic acids from biological samples. For example, but not by way of limitation, isolation of DNA from a biological sample can be performed by extraction methods using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (see, for example, J. Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Additional non-limiting examples include salting out DNA extraction (see, for example, P. Sunnucks et al., Genetics, 1996, 144:747-756; and S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25:4692-4693), the trimethylammonium bromide salts DNA extraction method (see, for example, S. Gustincich et al., BioTechniques, 1991, 11:298-302) and the guanidinium thiocyanate DNA extraction method (see, for example, J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). There are also numerous commercially available kits that can be used to extract DNA from biological fluids or cells. For example, Qiagen's Gentra PureGene Cell Kit, QIAamp Circulating Nucleic Acid Kit, QiaAmp DNA Mini Kit, DNeasy Blood and Tissue Kit or QiaAmp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.) and GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.) can be used to obtain DNA from a biological sample, e.g., a blood sample or a tissue sample, from a pregnant woman.
In certain embodiments, the biological sample can be enriched or relatively enriched for maternal nucleic acids (e.g., maternal DNA) by one or more methods. For example, but not by way of limitation, maternal peripheral blood can be collected from a pregnant subject by venipuncture, e.g., during the first trimester of pregnancy, and DNA from leukocytes obtained from the blood sample, i.e., maternal DNA, can be isolated in the disclosed methods. In certain embodiments, placental nucleic acids can be isolated from a biological sample and used in the methods of this Example.
In certain embodiments, the biological sample can be first enriched or relatively enriched for fetal nucleic acids (e.g., fetal DNA) by one or more methods. For example, but not by way of limitation, discrimination between fetal and maternal DNA can be performed by detecting one or more of the following: single nucleotide differences between chromosome X and Y, chromosome Y-specific sequences, polymorphisms located elsewhere in the genome, size differences between fetal and maternal DNA and differences in the methylation pattern between maternal and fetal tissues. For example, but not by way of limitation, fetal nucleic acids can be enriched from a biological sample based on the differential methylation between fetal and maternal nucleic acids. In certain embodiments, separation of fetal and maternal nucleic acids can be based on the methylation status of a genomic locus, e.g., a CpG site or a genomic locus that includes a CpG site. In certain embodiments, the genomic locus of the maternal DNA is methylated and the genomic locus of the fetal DNA is unmethylated. In certain embodiments, the genomic locus of the maternal DNA is unmethylated and the genomic locus of the fetal DNA is methylated. In certain embodiments, the genomic locus of the maternal DNA is hypomethylated compared to the genomic locus of the fetal DNA. In certain embodiments, the genomic locus of the maternal DNA is less methylated compared to the genomic locus of the fetal DNA. In certain embodiments, the genomic locus of the maternal DNA is hypermethylated compared to the genomic locus of the fetal DNA. In certain embodiments, the genomic locus of the maternal DNA is more methylated compared to the genomic locus of the fetal DNA. See, e.g., U.S. 2010/0105049, the contents of which are incorporated by reference in their entirety.
Methylation Detection Techniques of this Example
Various methylation analysis procedures are known in the art, and can be used in the methods of this Example. These assays allow for determination of the methylation state of one or more genomic loci, e.g., one or more CpG sites or islands within a nucleic acid obtained from a biological sample. In addition, the methods can be used to quantify the methylation of a genomic locus. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), digital PCR and use of methylation-sensitive restriction enzymes.
In certain embodiments, methylation-specific PCR can be used to determine the methylation status of a genomic locus. Methylation-specific PCR is based on a chemical reaction of sodium bisulfite with DNA that converts unmethylated cytosines, e.g., of CpG dinucleotides, to uracil or UpG, followed by traditional PCR. Methylated cytosines will not be converted in this process and primers can be designed to overlap the methylation site, e.g., CpG site, of interest, thereby allowing one to determine the methylation status of the methylation site as methylated or unmethylated. Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA may be used, e.g., by using the method described by Sadri & Hornsby (Nucl. Acids Res. 1996; 24:5058-5059) or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 1997; 25:2532-2534).
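The effect of bisulfite conversion on a sequence can be illustrated with a short in-silico sketch in Python. The sketch considers a single strand only and represents the converted base as T, because uracil is copied as thymine during subsequent PCR amplification. The function name, the example sequence and the choice of methylated positions are hypothetical.

```python
from typing import Set


def bisulfite_convert(sequence: str, methylated_positions: Set[int]) -> str:
    """Simulate bisulfite conversion followed by PCR on a single strand.

    Unmethylated cytosines are read out as T (C -> U -> T after PCR);
    cytosines at the 0-based positions in methylated_positions remain C.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical template with CpG cytosines at 0-based positions 1 and 5.
template = "ACGTCCGA"

fully_methylated_cpgs = bisulfite_convert(template, {1, 5})
unmethylated = bisulfite_convert(template, set())

print(fully_methylated_cpgs)  # ACGTTCGA
print(unmethylated)           # ATGTTTGA
```

The two converted sequences differ only at the CpG cytosines, which is why a primer that overlaps a CpG site can be designed to anneal to only one of the two versions.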
In certain embodiments, whole genome bisulfite sequencing, which is a high-throughput genome-wide analysis of DNA methylation, can be used to determine the methylation status of multiple genomic loci. It is based on sodium bisulfite conversion of genomic DNA, as described above, which is then sequenced on a next-generation sequencing platform. The sequences obtained are then re-aligned to the reference genome to determine the methylation states of cytosines, e.g., of CpG dinucleotides, present within the analyzed genomic loci based on mismatches resulting from the conversion of unmethylated cytosines into uracil.
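A minimal sketch of the mismatch-based methylation call is given below in Python. It assumes forward-strand reads with ungapped alignments supplied as (start position, read sequence) pairs, and it scores only reference CpG cytosines; production pipelines additionally handle both strands, base qualities and insertions or deletions. The function name and the toy reference and reads are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def call_cpg_methylation(
    reference: str,
    aligned_reads: Iterable[Tuple[int, str]],
) -> Dict[int, float]:
    """Return the methylated fraction at each covered reference CpG cytosine.

    A read base of C at a reference CpG cytosine is counted as methylated;
    a read base of T is counted as unmethylated (bisulfite-converted).
    Assumption: reads are ungapped and come from the forward strand.
    """
    reference = reference.upper()
    cpg_positions = {
        i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"
    }
    methylated = defaultdict(int)
    covered = defaultdict(int)
    for start, read in aligned_reads:
        for offset, base in enumerate(read.upper()):
            position = start + offset
            if position not in cpg_positions:
                continue
            if base == "C":
                methylated[position] += 1
                covered[position] += 1
            elif base == "T":
                covered[position] += 1
            # Any other base is treated as an error or variant and ignored.
    return {pos: methylated[pos] / covered[pos] for pos in sorted(covered)}


# Toy reference with CpG cytosines at 0-based positions 1 and 5.
reference = "ACGTCCGA"
reads = [(0, "ACGTTCGA"), (0, "ATGTTTGA"), (2, "GTTCGA")]

levels = call_cpg_methylation(reference, reads)
print(levels)
```

In this toy input, the CpG at position 1 is covered by two reads, one of which retains the C, and the CpG at position 5 is covered by three reads, two of which retain the C.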
In certain embodiments, genome-wide DNA methylation profiling can be performed using commercially-available arrays, thereby allowing the interrogation of multiple genomic loci, e.g., multiple CpG sites. Non-limiting examples of such arrays include HumanMethylation BeadChips (Illumina, San Diego, Calif.) and Infinium MethylationEPIC kit (Illumina). Additional methods for analyzing the methylation state of multiple genomic loci are provided in Yong et al., Epigenetics & Chromatin 2016; 9:26, which is incorporated by reference herein.
Kits of this Example
This Example provides kits for diagnosing, monitoring, classifying and/or treating a subject with a pregnancy-associated disorder that comprise a means for determining and/or detecting the methylation status of one or more genomic loci. For example, but not by way of limitation, a kit of this Example can be used to diagnose, monitor and/or treat a subject with preeclampsia. In certain embodiments, a kit of this Example can be used to diagnose, monitor and/or treat a subject with preterm labor and/or preterm birth.
Types of kits include, but are not limited to, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays, which further contain one or more probes, primers or other detection reagents for determining the methylation state and/or level of one or more genomic loci. For example, but not by way of limitation, a kit of this Example can include one or more probes or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the one or more genomic loci comprise a CpG site. In certain embodiments, one or more of the genomic loci do not comprise a CpG site. For example, but not by way of limitation, about 5% or more, about 10% or more, about 15% or more, about 20% or more, about 25% or more, about 30% or more, about 35% or more, about 40% or more, about 45% or more, about 50% or more, about 55% or more, about 60% or more, about 65% or more or about 70% or more of the one or more genomic loci detected by the primers or probes of this Example comprise one or more CpG sites.
In certain non-limiting embodiments, a primer and/or probe of this Example can be at least about 10 nucleotides or at least about 15 nucleotides or at least about 20 nucleotides in length and/or up to about 200 nucleotides or up to about 150 nucleotides or up to about 100 nucleotides or up to about 75 nucleotides or up to about 50 nucleotides in length.
In a further non-limiting embodiment, the oligonucleotide primers and/or probes can be immobilized on a solid surface or support, for example, on a nucleic acid microarray, wherein the position of each oligonucleotide primer and/or probe bound to the solid surface or support is known and identifiable.
In certain non-limiting embodiments, a kit of this Example can additionally include other components such as, but not limited to, a buffer, enzymes such as DNA polymerases or ligases, nucleotides such as deoxynucleotide triphosphates, positive control sequences, negative control sequences and the like necessary to carry out an assay or reaction to detect the methylation state of a genomic locus.
In certain embodiments, this Example provides for a kit that includes a container comprising one or more probes and/or primers for detecting the methylation state of one or more genomic loci. The kit can further include instructions for use, e.g., the instructions can describe that a particular methylation status of a genomic locus is indicative of a pregnancy-associated disorder in a subject. The instructions can be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card or folder supplied in or with the container.
Reports, Programmed Computers and Systems of this Example
In certain embodiments, the diagnosis and/or monitoring of a pregnancy-associated disorder, e.g., preeclampsia, in a subject based on the methylation status of one or more genomic loci, can be referred to herein as a “report.” A tangible report can optionally be generated as part of a testing process (which can be interchangeably referred to herein as “reporting,” or as “providing” a report, “producing” a report or “generating” a report).
Examples of tangible reports can include, but are not limited to, reports in paper (such as computer-generated printouts of test results) or equivalent formats and reports stored on computer readable medium (such as a CD, USB flash drive or other removable storage device, computer hard drive, or computer network server, etc.). Reports, particularly those stored on computer readable medium, can be part of a database, which can optionally be accessible via the internet (such as a database of patient records or genetic information stored on a computer network server, which can be a “secure database” that has security features that limit access to the report, such as to allow only the patient and the patient's medical practitioners to view the report while preventing other unauthorized individuals from viewing the report, for example). In addition to, or as an alternative to, generating a tangible report, reports can also be displayed on a computer screen (or the display of another electronic device or instrument).
A report can include, for example, an individual's medical history, or can just include size, presence, absence or levels of one or more markers (for example, a report on computer readable medium such as a network server can include hyperlink(s) to one or more journal publications or websites that describe the medical/biological implications). Thus, for example, the report can include information of medical/biological significance as well as optionally also including information regarding the methylation status of relevant genomic loci, or the report can just include information regarding the methylation status of relevant genomic loci without other medical/biological significance.
A report can further be “transmitted” or “communicated” (these terms can be used herein interchangeably), such as to the individual who was tested, a medical practitioner (e.g., a doctor, nurse, clinical laboratory practitioner, genetic counselor, etc.), a healthcare organization, a clinical laboratory, and/or any other party or requester intended to view or possess the report. The act of “transmitting” or “communicating” a report can be by any means known in the art, based on the format of the report. Furthermore, “transmitting” or “communicating” a report can include delivering a report (“pushing”) and/or retrieving (“pulling”) a report. For example, reports can be transmitted/communicated by various means, including being physically transferred between parties (such as for reports in paper format) such as by being physically delivered from one party to another, or by being transmitted electronically or in signal form (e.g., via e-mail or over the internet, by facsimile, and/or by any wired or wireless communication methods known in the art) such as by being retrieved from a database stored on a computer network server, etc.
In certain embodiments, this Example provides computers (or other apparatus/devices such as biomedical devices or laboratory instrumentation) programmed to carry out the methods of this Example, e.g., to perform the algorithm of this Example (see Example Embodiment B). In certain embodiments, the system can be controlled by the individual and/or their medical practitioner in that the individual and/or their medical practitioner requests the test, receives the test results back and (optionally) acts on the test results to reduce the individual's pregnancy-associated disorder risk or treat the individual, such as by implementing a disorder management system.
The following Example Embodiments are offered to more fully illustrate the disclosure of the Example, but are not to be construed as limiting the scope thereof.
Example Embodiment A of Example 1: DNA is Differentially Methylated in Healthy Pregnant Females and Pregnant Females with Preeclampsia
DNA methylation patterns are associated with cellular phenotypes, are altered in complex gestational disease states and are cell lineage-specific. Thus, DNA methylation signatures identified in plasma potentially contain information relating to both pathobiology and the cell lineage-specific origins of the signal.
The maternal plasma compartment contains biomarkers derived from both mother and fetus and offers several distinct theoretical advantages. Obtaining maternal plasma is less invasive than obtaining amniotic fluid, placental biopsy or fetal blood. Maternal blood is drawn routinely at several points in time during prenatal care and, thus, a plasma biomarker could be incorporated into the usual provision of care. Using maternal plasma for biomarkers of preterm birth (PTB) also facilitates central processing and analysis. After minimal processing at the point of care, blood specimens may be transported to a central facility for processing and analysis.
The present Example Embodiment A used solution phase hybridization to undertake targeted region capture of bisulfite-converted DNA obtained from the plasma of pregnant women in early gestation and non-pregnant female controls as disclosed in Example Embodiment C. The present Example Embodiment A performed targeted sequencing of 80.4 Mb of the plasma methylome and generated an average genome read depth of ˜42× in 18 plasma samples. The present Example Embodiment A used these data to identify the pregnancy-specific characteristics of cell-free DNA (cfDNA) methylation in plasma and found that pregnancy resulted in clearly detectable global alterations in DNA methylation patterns that were modulated by genomic location. Similar data was analyzed from first trimester maternal leukocyte populations and gestational age-matched chorionic villus (CV) and confirmed that tissue-specific DNA methylation signatures in these samples had a significant influence on global and gene-specific methylation in the plasma of pregnant versus non-pregnant women. The subject matter disclosed in this Example can be used in the context of non-invasive prenatal testing with respect to phenotypic pregnancy monitoring and the early detection of complex gestational phenotypes such as preeclampsia and preterm birth.
Results
DNA was extracted from the plasma of both pregnant and non-pregnant women. Blood was drawn from the pregnant women between 10-13 weeks gestation. Plasma DNA was subjected to solution-phase hybridization capture and DNA sequencing. Summary sequencing data are shown in Table 1. It was previously found that CpG methylation levels in early gestational maternal leukocytes were distributed in a biphasic manner with two peaks reflecting largely low methylation (LM) (<20%) and high methylation (HM) (>80%) states respectively. Relatively few CpG sites existed in an intermediate methylation (IM) (20-80%) state in these samples (Chu et al., PLoS One. 2011; 6:e14723). Previous studies have also determined that, compared with maternal leukocytes, the early gestational chorionic villus (CV) contained relatively fewer HM CpG sites, more IM sites and a slight increase in LM sites (Chu et al., PLoS One. 2011; 6:e14723).
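The three methylation states described above can be illustrated with a minimal Python sketch that bins per-site CpG methylation percentages into LM (<20%), IM (20-80%) and HM (>80%) and reports the fraction of sites in each state. The function names and the example values are hypothetical, and assigning values of exactly 20% or 80% to the IM state is an assumption, since the boundary handling is not specified.

```python
from collections import Counter


def methylation_state(percent):
    """Return 'LM', 'IM', or 'HM' for a CpG methylation level given in percent."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("methylation level must be between 0 and 100 percent")
    if percent < 20.0:
        return "LM"  # low methylation
    if percent > 80.0:
        return "HM"  # high methylation
    return "IM"      # intermediate; exact boundaries included here by assumption


def state_distribution(levels):
    """Fraction of CpG sites in each methylation state for one sample."""
    counts = Counter(methylation_state(v) for v in levels)
    total = len(levels)
    return {state: counts.get(state, 0) / total for state in ("LM", "IM", "HM")}


# Hypothetical per-site methylation levels (percent), for illustration only.
example_levels = [2.0, 5.5, 12.0, 50.0, 85.0, 91.0, 97.5, 99.0]
distribution = state_distribution(example_levels)
print(distribution)
```

A biphasic sample of the kind described for maternal leukocytes would show most of its mass in the LM and HM entries of the returned dictionary, with a small IM fraction.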
Differences in the methylation of DNA obtained from pregnant women compared to non-pregnant women were analyzed. As shown in
Methylation distributions were influenced by the location of the CpG sites of interest. Sites present in specific genomic elements (exons, introns, promoters, CpG islands (CGIs), CpG island shores and enhancers) were examined. To reduce potential bias created by the presence of CGIs within other elements, these regions of interest were filtered to exclude those that overlapped with CGIs (with the exception, of course, of CGIs themselves). As shown in
The spatial differences between CpG methylation in cell-free plasma DNA from pregnant and non-pregnant women were examined. CpG methylation levels were plotted spatially both in a genome-wide fashion and with respect to each autosome.
The impact of genomic location on the ability of the presently disclosed subject matter to detect spatial differences in DNA methylation signals in plasma from pregnant and non-pregnant women was examined. The methylation rates of CpG sites present in each of the same structural and regulatory genomic elements (see above) between plasma DNA from pregnant and non-pregnant women were compared. As shown in
To identify specific CpG sites whose methylation levels are altered between pregnant and non-pregnant plasma, a multiple testing significance filter of q≤0.1 and a percent methylation difference filter of more than ±5% were used. These analyses confirmed the overall reduction in CpG methylation levels observed in maternal plasma compared to plasma from non-pregnant women. Specifically, 24,398 CpG sites were identified whose methylation levels were significantly different between pregnant and non-pregnant subjects. A significant majority of these (19,470 sites or ˜80%) displayed lower cfDNA methylation levels in plasma from pregnant women than non-pregnant women. There were 10.35-fold more CpG sites in exons (2463/2702 or ˜91%) that were less methylated in the plasma of pregnant compared to non-pregnant women. Similar ratios were identified for introns, with 8.82-fold more CpG sites in introns (6032/6803 or ˜89%) that were less methylated in the plasma of pregnant compared to non-pregnant women. There were 4.53-times more CpG sites in promoters that were less methylated in pregnant versus non-pregnant plasma (2078/2536 or ˜82%) and similar numbers for CGI (4.32-fold, 2987/3671, ˜81%). Similarly, of 78 CpGs in enhancers (excluding those overlapping with CGIs), 63 (˜81%) (4.2-fold) were less methylated in pregnant versus non-pregnant plasma. Notably, there were only 2.14-fold more CpGs in CGIs (510/1092 or ˜55%) that were less methylated in cfDNA from the plasma of pregnant versus non-pregnant women. These results are reflected in the data presented in
It was next determined whether the specific loci that are known to have distinct epigenetic signatures in the CV genome may influence maternal plasma DNA methylation profiles in a predictable fashion. To identify such loci, CpG methylation levels in early gestational CV tissue samples (11-13 weeks) and gestational age-matched maternal leukocytes (MBC) were compared. cfDNA from CV and MBC contributed the majority of circulating DNA fragments to maternal plasma during pregnancy. These assays were performed using solution-phase hybridization followed by bisulfite DNA sequencing. A multiple testing significance filter of q≤0.1 was used for these analyses. The CpG sites that were differentially methylated between the CV and MBC genomes and that were also differentially methylated between plasma samples from the pregnant and non-pregnant women were identified. It was found, in every instance, that if a CpG site was differentially methylated between CV and MBC and between plasma from pregnant and non-pregnant women, the direction of change in each comparison was the same. That is, if a CpG site was less methylated in CV compared to MBC then it was also less methylated in the plasma of pregnant women compared to non-pregnant women. A total of 6,558 CpG sites followed this trend. Examples of such genomic loci were shown in
Differences between the distribution of DNA methylation in chorionic villus sampling (CVS) samples and maternal leukocyte samples obtained from pregnant women between 10-13 weeks gestation were also observed. As shown in
To determine if genomic regions are differentially methylated in preeclampsia patients, maternal plasma samples were collected between 10-13 weeks of gestation from women who later developed severe preeclampsia or mild preeclampsia. As shown in
Discussion of this Example
This Example relates to a high read depth comprehensive bisulfite sequencing analysis of methylation in cfDNA from the plasma of pregnant and non-pregnant women. This Example also shows the global impact of pregnancy on the plasma DNA methylome and provides methylated cfDNA biomarkers of complex gestational diseases such as preeclampsia and preterm birth, and non-invasive biomarkers that are able to capture both fetal and maternal influences on the molecular phenotype during early gestation.
This Example discovers that pregnancy has a profound impact on methylation signatures in plasma cfDNA. This Example observed an overall reduction in DNA methylation that was influenced significantly by the location of the CpG sites of interest. This Example also discovered that, in general, exonic and intronic sequences in cfDNA displayed significant reductions in DNA methylation levels in pregnant versus non-pregnant women, whereas CpG islands displayed relatively fewer differences. Given that the early gestational CV is known to be generally hypomethylated compared to gestational age-matched MBCs, the pregnancy-specific reductions in global DNA methylation levels in maternal plasma are highly influenced by methylated cfDNA (mcfDNA) fragments originating in the chorionic villus. Comparing the presently disclosed plasma data with similar data from a comparison of gestational age-matched CV and maternal leukocyte samples (MBC) provided further support for this finding. It was found that when CpGs were differentially methylated between CV and MBC AND between pregnant and non-pregnant plasma, the direction of change in plasma was, in every case, predicted by that in CV/MBC. In other words, if a CpG site was hypomethylated in CV compared to MBC, its methylation levels were reduced in the plasma of pregnant women compared to non-pregnant women. The opposite was also true.
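The direction-of-change comparison discussed above can be sketched in Python as follows. This is an illustrative sketch only: the function names, the site identifiers and all numeric values are hypothetical, the q-values are assumed to have been computed beforehand by whatever multiple-testing procedure was used, and the thresholds simply mirror the q≤0.1 and ±5% filters mentioned in the Results.

```python
def significant_sites(deltas, qvalues, q_max=0.1, min_abs_delta=0.0):
    """Keep sites passing a q-value filter and a methylation-difference filter.

    deltas:  {site: methylation difference in percentage points}
    qvalues: {site: multiple-testing-adjusted q-value (assumed precomputed)}
    """
    return {
        site: delta
        for site, delta in deltas.items()
        if qvalues[site] <= q_max and abs(delta) > min_abs_delta
    }


def direction_concordance(tissue_delta, plasma_delta):
    """Sites significant in both comparisons, and those changing in the same direction."""
    shared = sorted(set(tissue_delta) & set(plasma_delta))
    concordant = [s for s in shared if (tissue_delta[s] < 0) == (plasma_delta[s] < 0)]
    return shared, concordant


# Hypothetical data: CV minus MBC, and pregnant minus non-pregnant plasma.
cv_minus_mbc = {"cpg_a": -30.0, "cpg_b": 22.0, "cpg_c": -15.0, "cpg_d": -40.0}
cv_q = {"cpg_a": 0.001, "cpg_b": 0.01, "cpg_c": 0.03, "cpg_d": 0.5}
plasma_delta = {"cpg_a": -8.0, "cpg_b": 6.5, "cpg_c": -2.0, "cpg_e": -9.0}
plasma_q = {"cpg_a": 0.01, "cpg_b": 0.05, "cpg_c": 0.02, "cpg_e": 0.2}

tissue_hits = significant_sites(cv_minus_mbc, cv_q)
plasma_hits = significant_sites(plasma_delta, plasma_q, min_abs_delta=5.0)
shared, concordant = direction_concordance(tissue_hits, plasma_hits)
print(shared, concordant)
```

In the finding reported above, every shared site was concordant, which in this sketch corresponds to the two returned lists being identical.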
High throughput genomic approaches including the ones disclosed herein have demonstrated enormous potential both with regard to unraveling complex heterogeneous phenotypes at the molecular level and generating novel hypotheses to improve understanding of complex pathophysiology including in the context of gestational disease (Hong et al., Epigenetics. 2018; 13:163-172; Strauss et al., Am J Obstet Gynecol. 2018; 218:294-314 e292; Heng et al., PLoS One. 2016; 11:e0155191). The high throughput genomic approaches have also been used with great success to develop biomarkers, including those consisting of cell-free methylated DNA signatures (Ngo, Science. 2018; 360:1133-1136; Karlas, J Transl Med. 2017; 15:106; Hardy et al., Gut. 2017; 66:1321-1328; Liggett et al., J Neurol Sci. 2010; 290:16-21). These approaches have been used in oncology, where a particularly promising application involves plasma-based detection of DNA methylation (Shen et al., Nature. 2018; 563:579-583).
Example Embodiment B of Example 1: Method for DNA-Methylation-Based Liquid Biopsy for Non-Invasive Pregnancy Phenotyping and Disease Diagnosis
Provided below is an algorithm that can be used to diagnose a subject with a pregnancy-associated disorder and/or pregnancy abnormality.
The methylomes of either maternal tissues or the placenta could be affected by certain types of pregnancy abnormality, and the changes of these methylomes can lead to changes in the methylation patterns of the DNA fragments found in maternal plasma, which are released by the maternal tissues and the placenta. An algorithm was developed to identify the changes of methylation patterns in the methylome of maternal plasma caused by the abnormal pregnancies. The main insight behind this algorithm is that the methylome of the DNA fragments in a maternal plasma sample is a mixture of a variety of component methylomes with either maternal or fetal origin, and that the proportion of these different component methylomes in the mixture varies from subject to subject, even among the population with normal pregnancies. By constructing a model of maternal plasma methylome as a linear combination of various component methylomes of fetal and maternal origins, the algorithm can accurately predict the methylation patterns of a new maternal plasma sample under the hypothesis that it is from a normal pregnancy. Consequently, the algorithm exhibits high sensitivity in detecting abnormal methylation patterns of a maternal plasma sample caused by changes of the methylomes of some fetal/maternal tissues when the sample is from an abnormal pregnancy.
Let $i$ be any CpG site in the human genome, $z_{i,j}$ be the methylation level of CpG site $i$ in a maternal plasma sample $j$, $p_{i,r,j}$ be the proportion of a component methylome $m_{r,j}$ of either fetal or maternal origin in maternal plasma sample $j$ at site $i$, and $m_{i,r,j}$ be the methylation level of CpG site $i$ in methylome $m_{r,j}$. Our hypothesis is:
$$z_{i,j}=\sum_{r=1}^{R} p_{i,r,j}\,m_{i,r,j} \qquad (1)$$
where $p_{i,r,j}\ge 0$, $0\le m_{i,r,j}\le 1$, and $p_{i,1,j}+\dots+p_{i,R,j}=1$.
It is assumed that there is a set of CpG sites $S$ such that, for any CpG site $i$ in $S$, and any maternal plasma sample $j$ from a normal pregnancy, we have $m_{i,r,j}=m_{i,r}$ and $p_{i,r,j}=p_{r,j}$.
That is, it is assumed that in any maternal plasma sample from a normal pregnancy, the proportions of different component methylomes in the mixture are the same for all CpG sites in $S$. It is also assumed that, by restricting to the set of CpG sites $S$, maternal plasma samples from all normal pregnancies have the same set of component methylomes. We call them restricted reference component methylomes (RRCM), and label them as $m_1^S,\ldots,m_R^S$, or simply $m_1,\ldots,m_R$ when there is no confusion. For any maternal plasma sample $j$ from a normal pregnancy, its methylome restricted to the set of CpG sites in $S$ can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^S$ be the methylome of maternal plasma sample $j$ restricted to $S$; then for some mixture vector $p_j=[p_{1,j},\ldots,p_{R,j}]^T$, we have:
$$z_j^S=[m_1^S,\ldots,m_R^S]\,p_j \qquad (2)$$
Finally, it is assumed that the set $S$ is the union of two disjoint subsets $C$ and $T$, where $T$ is a union of $K$ non-empty sets $T_k$ such that $T=\bigcup_{k=1}^{K}T_k$, where the index $k$ represents the $k$th type of abnormal pregnancy. The sets $T_k$ do not need to be disjoint. Moreover, $T_k$ itself is the union of two disjoint sets $D_k$ and $V_k$. Either $D_k$ or $V_k$ could be empty, but not both. It is assumed that for any maternal plasma sample, including one from an abnormal pregnancy, when restricted to CpG sites in $C$, its methylome can always be expressed as a weighted average of the restricted reference component methylomes. Therefore: $z_j^C=[m_1^C,\ldots,m_R^C]\,p_j$ regardless of whether $j$ is from an abnormal pregnancy, where $C$ refers to the set of reference CpG sites. On the other hand, for a maternal plasma sample $l$ from an abnormal pregnancy, when restricted to CpG sites in $S=C\cup T$, its methylome can no longer be expressed as a weighted average of the restricted reference component methylomes. Therefore, we have: $w_l^S\neq[m_1^S,\ldots,m_R^S]\,p_l$ for any mixture vector $p_l$. More specifically, for a maternal plasma sample $l$ from the $k$th type of abnormal pregnancy, we have: 1) $w_l^C=[m_1^C,\ldots,m_R^C]\,p_l$; 2) if $D_k$ is non-empty, then $w_l$
$T$ is the target set of CpG sites, $D_k$ is the differential methylation target set, $V_k$ is the copy number variation target set, and $T_k$ is the target set for the $k$th type of abnormal pregnancy.
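The mixture hypothesis of equations (1) and (2) can be illustrated with a short Python sketch that computes the methylome of a sample as a weighted average of component methylomes. The function name and the numeric values are hypothetical and chosen only to make the arithmetic easy to follow; two components are used here, but the function accepts any number R.

```python
def mix_methylomes(components, proportions):
    """Methylation level at each CpG site as a weighted average of components.

    components:  list of R component methylomes, each a list of methylation
                 levels in [0, 1] over the same CpG sites
    proportions: mixture vector of length R (non-negative, summing to 1)
    """
    if len(components) != len(proportions):
        raise ValueError("one proportion is required per component methylome")
    if any(p < 0 for p in proportions) or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must be non-negative and sum to 1")
    n_sites = len(components[0])
    return [
        sum(p * m[i] for p, m in zip(proportions, components))
        for i in range(n_sites)
    ]


# Hypothetical component methylomes over three CpG sites.
fetal_component = [0.2, 0.9, 0.5]
maternal_component = [0.8, 0.9, 0.1]
mixture_vector = [0.1, 0.9]  # 10% fetal-origin, 90% maternal-origin fragments

plasma_methylome = mix_methylomes([fetal_component, maternal_component], mixture_vector)
print(plasma_methylome)
```

Because the same mixture vector is applied at every site, this corresponds to the assumption that, within the set of sites considered, the proportions do not depend on the site index.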
The main steps of the algorithm of this Example are as follows:
- 1) Identify the sets of CpG sites $C$, and $T_1,\ldots,T_K$ for the list of $K$ types of abnormal pregnancies.
- 2) Estimate the restricted reference component methylomes $m_1,\ldots,m_R$, or $R$ predictor methylomes $n_1,\ldots,n_R$ that are independent linear combinations of the reference component methylomes such that $n_r=[m_1,\ldots,m_R]\,q_r$ for $R$ linearly independent mixture vectors $q_1,\ldots,q_R$.
- 3) (Optional) If the reference component methylomes are available, estimate the proportions of these components at the reference CpG sites $C$ for the test maternal plasma samples.
- 4) Predict the methylation level of the test maternal plasma samples at the target set $T_k$ of CpG sites, under the hypothesis that the sample is from a normal pregnancy.
- 5) Compare the predicted methylation levels at $D_k$ and $V_k$ against the observed methylation levels, and reject the null hypothesis that a test sample is from a normal pregnancy if the observed methylation levels are significantly different from the predicted levels.
This algorithm can be implemented in a variety of ways. For example, given the methyl-seq data for a set of maternal plasma samples from normal pregnancies, the EM algorithm or the data augmentation method can be applied to estimate the component methylomes, and the maximum likelihood method can then be used to estimate the proportion of these component methylomes in the test sample. Below we will present some simple implementations that use linear regression.
In the first simple implementation of the algorithm of this Example, it is assumed that the restricted methylome of a maternal plasma sample from normal pregnancy can be approximated by a mixture of two restricted reference methylomes, one representing the DNA fragments from the fetus, another representing the DNA fragments from maternal tissues. It is further assumed that the estimations of these two reference component methylomes are available. For example, in the implementation below, the methylome of chorionic villi samples (CVS) will be used as an approximation to the fetal methylome in the maternal plasma sample, the methylome of plasma samples from healthy non-pregnant women (NPP) as an approximation to the maternal methylome in the maternal plasma sample. The implementation of the algorithm includes the following steps:
1. Identify the reference set $C$, and the target sets $T_1,\ldots,T_K$.
- 1.1. Collect the methylation data for a set of CVS samples, a set of NPP samples, and a set of maternal plasma samples from normal pregnancies. For each type of abnormal pregnancy, collect a set of maternal plasma samples from that type of abnormal pregnancy. All these samples should be matched for age, race, gestational age (if applicable) and other relevant parameters. These are the training data.
- 1.2. Let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a CVS sample $j$, $y_{i,l}$ the observed methylation level of CpG site $i$ in an NPP sample $l$, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all CVS samples, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all NPP samples. Identify the set of CpG sites $S_0$ such that for any $i\in S_0$, we have both $s_{x,i}^2<c_0$ and $s_{y,i}^2<c_0$ for some constant $c_0$.
- 1.3. Let $x_i$ be the sample mean of $x_{i,j}$ over all CVS samples, and $y_i$ the sample mean of $y_{i,l}$ over all NPP samples. Identify the subset $C_0$ of $S_0$ such that for any $i\in C_0$, we have for some constant
- 1.4. Let $x^{C_0}$ be the vector of $x_i$ for all $i\in C_0$, and $y^{C_0}$ be the vector of $y_i$ for all $i\in C_0$. Let $z_j^{C_0}$ be the observed methylation levels of CpG sites in $C_0$ for a normal maternal plasma sample $j$, and $w_l^{C_0}$ the observed methylation levels of CpG sites in $C_0$ for a maternal sample $l$ from some abnormal pregnancy. For each $j$, regress $z_j^{C_0}$ against $x^{C_0}$ and $y^{C_0}$, with the constraints that the intercept must be 0, and the coefficients must be non-negative and add to 1, and get the residual $e_j^{C_0}$. Similarly, for each $l$, regress $w_l^{C_0}$ against $x^{C_0}$ and $y^{C_0}$, with the constraints that the intercept must be 0, and the coefficients must be non-negative and add to 1, and get the residual $e_l^{C_0}$. Identify the subset $C$ of $C_0$ such that for any $i\in C$, we have
- and $s_{i,z}^2<c_3$ for some constants $c_2$ and $c_3$, where $e_{i,z}$ and $e_{i,w}$ are means of methylation levels of CpG site $i$ in all the normal samples and all the abnormal pregnancy samples respectively, and $s_{i,z}^2$ and $s_{i,w}^2$ are sample variances of methylation levels of CpG site $i$ in all the normal samples and all the abnormal pregnancy samples respectively. $C$ is the reference set of CpG sites.
- 1.5. Let $T_0=S_0\setminus C$. Let $x^C$ and $x^{T_0}$ be the vectors of $x_i$ and $x_h$ for all $i\in C$ and $h\in T_0$ respectively, and $y^C$ and $y^{T_0}$ be the vectors of $y_i$ and $y_h$ for all $i\in C$ and $h\in T_0$ respectively. Let $z_j^C$ and $z_j^{T_0}$ be the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a normal maternal plasma sample $j$, $w_{l_k}^C$ and $w_{l_k}^{T_0}$ the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a maternal sample $l_k$ from a pregnancy with the $k$th type of abnormality, and $w_{l_g}^C$ and $w_{l_g}^{T_0}$ the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a maternal sample $l_g$ from a pregnancy with the $g$th type of abnormality, where $g\neq k$. For each $j$, $l_k$, and $l_g$, regress $z_j^C$, $w_{l_k}^C$, and $w_{l_g}^C$ respectively against $x^C$ and $y^C$, with the constraints that the intercept must be 0, and the coefficients must be non-negative and add to 1. Apply the fitted models respectively to $x^{T_0}$ and $y^{T_0}$ to predict $z_j^{T_0}$, $w_{l_k}^{T_0}$, and $w_{l_g}^{T_0}$ respectively, and get the differences $e_j^{T_0}$, $e_{l_k}^{T_0}$ and $e_{l_g}^{T_0}$ between the predicted values and observed values. Let $e_i$, $e_{i,k}$, and $e_{i,g}$ be the sample means, and $s_i^2$, $s_{i,k}^2$, and $s_{i,g}^2$ the sample variances, of the entries in $e_j^{T_0}$, $e_{l_k}^{T_0}$, and $e_{l_g}^{T_0}$ for CpG site $i$ over the normal samples, the samples from the $k$th type of abnormal pregnancy, and the samples from the $g$th type of abnormal pregnancy respectively. Identify the subset $T_k$ of $T_0$ such that for any $i\in T_k$, we have
for some constants $c_2$, $c_{2,k}$, $c_k$, and $c_{k,g}$. $T_k$ is the target set for the $k$th type of abnormal pregnancy.
- 2. Estimate the fetal fraction of the new maternal plasma samples to be tested. Recall that $x^C$ and $y^C$ are mean vectors of the methylation levels of the training CVS and training NPP data for the CpG sites in the reference set $C$. For any new maternal plasma sample $t$ to be tested, let $z_t^C$ be the observed methylation levels of CpG sites in $C$. Regress $z_t^C$ against $x^C$ and $y^C$, with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1. The estimated coefficient for $x^C$ is the estimated fetal fraction for the maternal plasma sample $t$.
- 3. Test if the new maternal plasma samples are from the $k$th type of abnormal pregnancy. For the new maternal plasma sample $t$, let $x^{T_k}$ and $y^{T_k}$ be mean vectors of the methylation levels of the training CVS and training NPP data for the CpG sites in the target set $T_k$ identified in step 1 of this algorithm, and apply the fitted regression model obtained from step 2 of this algorithm to $x^{T_k}$ and $y^{T_k}$ to predict the methylation levels of CpG sites in $T_k$ for sample $t$ under the hypothesis that sample $t$ is from a normal pregnancy. Let $n_k$ be the number of CpG sites in $T_k$. Define functions
- and
- where $I_-(x)=I_{(-\infty,0)}(x)$, that is, the indicator function for the interval $(-\infty,0)$, and $e_i$ and $e_{i,k}$ are estimations obtained from step 1.5 of the algorithm. We will say the sample is from the $k$th type of abnormal pregnancy if $f_k(e_{1,r},\ldots,e_{n_k,r})>c_k$ and $f_{k,g}(e_{1,r},\ldots,e_{n_k,r})>c_{k,g}$ for all $g\neq k$, where $e_{i,r}$ is the difference between the observed methylation level of the CpG site $i\in T_k$ for sample $t$ and the value predicted by the fitted model obtained from step 2, and $g$ is any type of abnormal pregnancy that is different from the $k$th type of abnormal pregnancy.
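The regression-based steps of the simple implementation above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: all function names and numeric values are hypothetical, only the variance filter of step 1.2, the fetal fraction estimate of step 2 and the prediction of step 3 are shown, and the decision functions of step 3 are not implemented because their definitions are not reproduced in the text. With two reference methylomes, the constraint that the coefficients are non-negative and add to 1 leaves a single free coefficient $f$, so the constrained least-squares fit reduces to a one-dimensional problem whose solution is the unconstrained optimum clipped to the interval [0, 1].

```python
from statistics import mean, variance


def low_variance_sites(cvs_samples, npp_samples, c0):
    """Step 1.2: indices of sites whose sample variance is below c0 in both sets."""
    n_sites = len(cvs_samples[0])
    return [
        i for i in range(n_sites)
        if variance([s[i] for s in cvs_samples]) < c0
        and variance([s[i] for s in npp_samples]) < c0
    ]


def site_means(samples, sites):
    """Mean methylation level per selected site over a set of training samples."""
    return [mean(s[i] for s in samples) for i in sites]


def fetal_fraction(z, x, y):
    """Step 2: fit z ~ f*x + (1-f)*y with no intercept and 0 <= f <= 1."""
    num = sum((xi - yi) * (zi - yi) for zi, xi, yi in zip(z, x, y))
    den = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    if den == 0:
        raise ValueError("reference methylomes are identical at these sites")
    return min(1.0, max(0.0, num / den))  # clipping enforces non-negativity


def predict_levels(f, x, y):
    """Step 3: predicted levels at target sites under the normal-pregnancy hypothesis."""
    return [f * xi + (1.0 - f) * yi for xi, yi in zip(x, y)]


# Hypothetical training data: two CVS and two NPP samples over two sites.
cvs = [[0.10, 0.50], [0.12, 0.90]]
npp = [[0.90, 0.40], [0.88, 0.40]]
stable_sites = low_variance_sites(cvs, npp, c0=0.01)

# Hypothetical mean reference vectors at reference sites C and target sites T_k.
x_ref, y_ref = [0.1, 0.2, 0.9, 0.3], [0.9, 0.8, 0.1, 0.7]
x_tgt, y_tgt = [0.2, 0.6], [0.8, 0.4]
z_ref = [0.82, 0.74, 0.18, 0.66]   # test sample observed at reference sites
z_tgt = [0.74, 0.30]               # test sample observed at target sites

f_hat = fetal_fraction(z_ref, x_ref, y_ref)
predicted = predict_levels(f_hat, x_tgt, y_tgt)
differences = [obs - pred for obs, pred in zip(z_tgt, predicted)]
print(stable_sites, round(f_hat, 3), [round(d, 3) for d in differences])
```

In this hypothetical example the second target site is observed well below its predicted level, which is the kind of difference the test in step 3 is designed to detect.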
Other ways of implementing the algorithm of this Example can be developed by modifying the simple implementation presented above. Specifically, we do not need to assume that there are only two component reference methylomes that make up the maternal plasma methylomes, nor do we need to approximate them by the CVS and NPP methylomes. Instead, we can collect a set of predictor methylomes that are mixtures of component reference genomes, as long as the number of the predictor methylomes is the same as the number of the reference component methylomes, and the mixture vectors of the predictor methylomes are linearly independent. For example, they can be methylomes of maternal mononuclear cells (MBC), or even normal maternal plasma samples. The only shortcoming of using the predictor methylomes instead of the reference component methylomes is that a straightforward estimation of the fraction of fetal DNA fragments in a maternal plasma sample cannot always be provided.
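The point about predictor methylomes can be illustrated with a small Python sketch for the two-component case. The values and function names are hypothetical, and one modeling choice is an assumption made here for illustration: the regression on predictor methylomes is constrained so that the coefficients add to 1 but is not restricted to non-negative coefficients, because a test sample may lie outside the range spanned by two predictors that are themselves mixtures.

```python
def blend(weight, a, b):
    """Site-wise combination weight*a + (1 - weight)*b of two methylomes."""
    return [weight * ai + (1.0 - weight) * bi for ai, bi in zip(a, b)]


def affine_weight(z, n1, n2):
    """Least-squares weight w in z ~ w*n1 + (1-w)*n2 (no intercept, no sign constraint)."""
    num = sum((p - q) * (zi - q) for zi, p, q in zip(z, n1, n2))
    den = sum((p - q) ** 2 for p, q in zip(n1, n2))
    return num / den


# Hypothetical reference component methylomes at reference (C) and target (T) sites.
fetal_c, maternal_c = [0.1, 0.2, 0.9, 0.3], [0.9, 0.8, 0.1, 0.7]
fetal_t, maternal_t = [0.2, 0.6], [0.8, 0.4]

# Two predictor methylomes that are mixtures with linearly independent
# mixture vectors (0.30, 0.70) and (0.05, 0.95), e.g. normal plasma samples.
n1_c, n1_t = blend(0.30, fetal_c, maternal_c), blend(0.30, fetal_t, maternal_t)
n2_c, n2_t = blend(0.05, fetal_c, maternal_c), blend(0.05, fetal_t, maternal_t)

# A normal test sample with a true fetal-origin proportion of 0.10.
z_c = blend(0.10, fetal_c, maternal_c)

w = affine_weight(z_c, n1_c, n2_c)
predicted_from_predictors = blend(w, n1_t, n2_t)
predicted_from_references = blend(0.10, fetal_t, maternal_t)
print(round(w, 3), predicted_from_predictors, predicted_from_references)
```

Both routes give the same predicted methylation levels at the target sites, but the fitted weight (0.2 here) is a coefficient on the predictors and is not itself the fetal fraction (0.1 here), which is the shortcoming noted above.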
In this algorithm, we choose the difference between observed methylation levels in certain target regions and the predicted methylation levels as the test statistic to determine if the methylome in a maternal plasma sample has been affected by some type of pregnancy abnormality. To illustrate the advantage of this approach, let us assume that the mixture vector $p_j$ for the methylome of a normal maternal plasma sample $j$ follows a Dirichlet distribution with parameters $\alpha_1=\dots=\alpha_R$. Furthermore, for CpG site $i$, its methylation levels in the $R$ reference component methylomes are $m_{i,r}=(r-1)/(R-1)$. It can be shown that the methylation level of $i$ in sample $j$ then has a mean of 0.5, and a variance of
If we have a methyl-seq library of sample $j$ with a coverage of $N$ for CpG site $i$, the variance of the measured methylation level $z_{i,j}$ is
In other words, if we use $z_{i,j}$ as a test statistic to detect abnormal pregnancy using a maternal plasma sample, under the null hypothesis, the test statistic has a variance of $\sigma_1^2$. However, in our algorithm, we first estimate the mixture vector $p_j$, then predict $z_{i,j}$ by $\sum_r m_{i,r}\,p_{r,j}$. Note that in methyl-seq data, we can get millions of CpG sites covered in each library, and that the variance of the coefficients in a linear regression model is inversely proportional to sample size. Thus it is possible to get a highly accurate estimation of the mixture vector $p_j$, even if we take into account that adjacent CpG sites tend to have correlated methylation levels. Assuming we can get an accurate estimate of $\sum_r m_{i,r}\,p_{r,j}$, the variance of the difference $z_{i,j}-\sum_r m_{i,r}\,p_{r,j}$ between the observed methylation level and our prediction will be
In other words, under the null hypothesis, the test statistic $z_{i,j}-\sum_r m_{i,r}\,p_{r,j}$ used in our algorithm has a much smaller variance than the other candidate test statistic $z_{i,j}$. This in turn means that our test will achieve a higher power at the same level of type I error.
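The variance comparison can be made concrete with a short Python sketch using exact rational arithmetic. It is a sketch under stated assumptions rather than a reproduction of the expressions referred to in the text: the mixture vector is taken to follow a symmetric Dirichlet distribution with common parameter alpha, the reference levels are $m_{i,r}=(r-1)/(R-1)$ as stated above, and, as an additional assumption made here, the measured level at coverage $N$ is modeled as a binomial proportion whose success probability is the true mixture level. Under these assumptions the standard Dirichlet covariance gives the variance $v$ of the true level, the law of total variance gives $v+(1/4-v)/N$ for the measured level, and the difference between the measured level and an exactly known prediction has variance $(1/4-v)/N$. All function names are hypothetical.

```python
from fractions import Fraction


def mixture_variance(R, alpha):
    """Variance of sum_r m_r * p_r for p ~ symmetric Dirichlet(alpha), m_r = (r-1)/(R-1)."""
    alpha = Fraction(alpha)
    m = [Fraction(r - 1, R - 1) for r in range(1, R + 1)]
    scale = R * alpha + 1
    total = Fraction(0)
    for r in range(R):
        for s in range(R):
            # Dirichlet covariance with equal parameters:
            # Cov(p_r, p_s) = (delta_rs / R - 1 / R^2) / (R * alpha + 1)
            delta = Fraction(1, R) if r == s else Fraction(0)
            total += m[r] * m[s] * (delta - Fraction(1, R * R)) / scale
    return total


def measured_variance(v, N):
    """Variance of the measured level at coverage N (binomial read sampling assumed)."""
    return v + (Fraction(1, 4) - v) / N


def residual_variance(v, N):
    """Variance of measured minus predicted level when the prediction is exact."""
    return (Fraction(1, 4) - v) / N


v = mixture_variance(R=2, alpha=1)
var_z = measured_variance(v, N=30)
var_diff = residual_variance(v, N=30)
print(v, var_z, var_diff, var_z / var_diff)
```

For these hypothetical settings (two components, alpha of 1, coverage of 30) the variance of the difference statistic is one sixteenth of the variance of the raw measured level, consistent with the qualitative conclusion above.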
Example Embodiment C of Example 1: Non-Invasive Analysis of Methylated Cell-Free DNA in Preterm Birth
Preterm birth (PTB) is one of the most important problems in modern obstetrics. In 2010, more than 1 million infants born preterm (at less than 37 weeks of gestation) died worldwide, making it the second leading cause of death in children under the age of 5 years. Preterm infants who survive are at risk of chronic lung disease, deafness, blindness or other visual impairment, and learning and cognitive disabilities. The 12% rate of preterm birth in the United States ranks 131st of 184 countries, behind many developing nations. The past 3 decades in the United States have seen little decline in preterm births, including the earliest deliveries, which cause the most morbidity and mortality. Identifying potential targets for preterm birth prevention is a public health priority. One reason for the lack of predictive biomarkers for PTB is the difficulty associated with analyzing tissue samples from ongoing pregnancies. Therefore, relevant fetal tissues (cord blood and placenta) have largely been characterized at birth such that many studies are confounded by comparisons between samples collected at premature birth with controls that are collected at normal term. Thus, it is challenging, if not impossible, to separate gestational age differences from true markers of disease pathology.
Spontaneous preterm birth is a physiologically heterogeneous syndrome. The cascade of events that culminate in spontaneous preterm birth has several possible underlying pathways. Four of these pathways are supported by a considerable body of clinical and experimental evidence: excessive myometrial and fetal membrane overdistention, decidual hemorrhage, precocious fetal endocrine activation, and intrauterine infection or inflammation. These pathways may be initiated weeks to months before clinically apparent preterm labor. The processes leading to preterm parturition may originate from one or more of these pathways; for example, intrauterine infection or inflammation and placental abruption often coexist in preterm births. Decidual hemorrhage and intrauterine infection share several inflammatory molecular mechanisms that contribute to parturition. The understanding of the nature of the molecular cross-talk among these pathways is in its infancy.
The etiologic heterogeneity of preterm birth adds complexity to therapeutic approaches. Although the ultimate clinical presentation of women with preterm labor may appear to be homogeneous, the antecedent contributing factors probably differ considerably from woman to woman. A key feature of pathologic parturition is the dynamic interplay of the mother and the fetus/placenta. A critical element of the present disclosure is the notion that the approach disclosed herein permits both cross-sectional and longitudinal assessment of maternal and placental “signatures” from the readily accessible maternal blood compartment.
First, by delineating the factors predictive of PTB, the present disclosure obtains a better understanding of the mechanisms and biologic pathways that lead to spontaneous preterm parturition. Second, the use of predictors of spontaneous preterm birth (sPTB) permits the identification of a group of women at the highest risk for whom an intervention may be tested and for whom intervention is most critically needed. Third, by identifying women at low risk of PTB, unnecessary, costly, and sometimes hazardous interventions might be avoided.
Two fundamental compartments, the maternal and the fetal, provide the sources of biomarkers. The maternal compartment may be subdivided into blood, saliva, urine, cervix, and vagina. The fetal compartment may be subdivided into placenta, cord blood and amniotic fluid. The maternal plasma compartment contains biomarkers derived from both mother and fetus and offers several distinct theoretical advantages. Obtaining maternal plasma is less invasive than obtaining amniotic fluid, placental biopsy or fetal blood. Maternal blood is drawn routinely at several points in time during prenatal care and, thus, a plasma biomarker could be incorporated into the usual provision of care. Using maternal plasma for biomarkers of PTB also facilitates central processing and analysis. After minimal processing at the point of care, blood specimens may be transported to a central facility for processing and analysis.
It is known that plasma contains fragmented “cell-free” DNA that exists outside of intact cells. In addition to containing cell-free maternal (self) DNA, plasma from pregnant mothers contains significant amounts of DNA from the developing fetus. Methods of non-invasive detection of fetal aneuploidy have been developed using maternal plasma DNA in early gestation. This was achieved via the DNA sequencing of circulating cell-free fetal DNA in maternal plasma. Recent advances in the genomic analysis of circulating cell-free DNA (cfDNA) have enabled the development of sophisticated methods for detecting fetal aneuploidy. A small number of groups have extended these methods to enable the quantification of DNA methylation levels in cfDNA from maternal plasma. This is an attractive avenue of research because DNA methylation patterns are associated with cellular phenotype, are altered in complex gestational disease states, and are cell lineage-specific. Thus, DNA methylation signatures identified in plasma potentially contain information relating to both pathobiology and the cell lineage-specific origins of the signal. However, previous analyses of methylated cell-free DNA (mcfDNA) in maternal plasma have consisted only of proof of concept studies performed at low-resolution. None have generated detailed insights into normal temporal or pathobiological changes in DNA methylation that may occur during pregnancy.
Thus, despite its biological importance and potential clinical utility, this field is in its infancy. Critical knowledge gaps include an understanding of how methylated cell-free DNA signatures are modulated during gestation in normal pregnancies, how these are influenced by both maternal and fetal cell lineages and how they are influenced in the context of disease.
Because DNA methylation is intimately linked to cellular phenotype and modified in the context of gestational diseases and environmental exposures, the characterization of methylated cell-free DNA (mcfDNA) provides an opportunity to identify altered epigenetic signatures in early gestation that are associated with gestational disease and other negative pregnancy outcomes. This is particularly relevant given that PTB is frequently associated with abnormal placental phenotypes including malperfusion and inflammation.
This Example Embodiment C relates to novel methodology that addresses these gaps. The presently disclosed novel methodology assesses methylated cell-free DNA (mcfDNA) signatures in maternal plasma at high resolution, and experimental and computational tools have been generated to dissect these signatures such that their maternal and fetal inputs may be distinguished and quantified. The presently disclosed methodology has identified putative DNA methylation signatures in early gestation (11-13 weeks) that are associated with an eventual diagnosis of preeclampsia as disclosed in Example Embodiment A.
The present disclosure of this Example establishes a paradigm for non-invasive pregnancy monitoring in which phenotypic information relating to both the mother and the fetus is ascertained both in the context of normal development and spontaneous preterm birth (sPTB). Importantly, novel molecular and computational methods have been developed to enable these goals. This creates an opportunity in which epigenomic signatures in maternal plasma can be exploited to define biomarkers for sPTB without the need for risky and invasive tissue biopsy. The present disclosure of this Example applies the presently disclosed methodology in the area of sPTB. The present disclosure of this Example has the ability to distinguish the cell lineage-specific contributions to plasma DNA methylation signals. This requires DNA methylation data for key reference cell lineages, for example cytotrophoblasts and syncytiotrophoblasts. Therefore, the present disclosure of this Example relates to optimized methods to assess genome-wide DNA methylation signatures in specific cell lineages, in which dilution effects caused by heterogeneous samples are avoided. These methods can be used in combination to perform high-resolution analysis of DNA methylation in the context of normal development and spontaneous preterm birth.
The present disclosure of this Example relates to novel approaches for non-invasive pregnancy monitoring and early detection (or exclusion) of sPTB. A major barrier to progress has been that methylation profiling of maternal plasma DNA is technically challenging and expensive. Previous genome-wide bisulfite sequencing approaches in pregnancy have resulted in low coverage whole-genome data from low numbers of samples. However, the presently disclosed molecular and computational methods of this Example allow accurate bisulfite sequencing of plasma cell-free DNA at high read depth. The present disclosure of this Example also relates to methods for whole-genome bisulfite DNA sequencing of ultra-low input DNA samples obtained via laser capture microdissection. The present disclosure of this Example further relates to reference data from homogeneous populations of cytotrophoblasts and syncytiotrophoblasts and other placental cell types. The present disclosure of this Example can be used to determine DNA methylation differences between mother and fetus and to identify biomarkers for the non-invasive prediction of sPTB.
The present disclosure of this Example further relates to clinical liquid biopsy assays for non-invasive pregnancy monitoring and risk assessment for sPTB.
Study 1
To characterize temporal changes in cfDNA methylation signatures in maternal plasma across the gestational age span, dynamic changes in methylated cell-free DNA signatures in maternal plasma across the gestational age range in primigravids who experience normal pregnancy are profiled. Novel computational tools and genetic approaches are used to further characterize these signatures with respect to their maternal and/or fetal origins. To conduct parallel analysis of DNA methylation in maternal leukocytes, dynamic changes in DNA methylation patterns of maternal leukocytes (from the same blood samples processed above) are profiled to understand how these are modulated throughout gestation. These data illuminate the leukocyte contribution to methylated cell-free signatures identified in maternal plasma.
A high-resolution temporal analysis of the circulating cell-free DNA methylome during gestation in maternal plasma from normal outcome pregnancies is performed. Such analysis improves the understanding of the biology of cell-free nucleic acids throughout gestation and provides a framework from which to explore pathogenesis of PTB.
The temporal dynamic changes in mcfDNA in all three trimesters of pregnancy are characterized in matched primigravid women who undergo normal pregnancy and deliver at term. Genotypic differences between mother and fetus, resulting from the fetal inheritance of uniquely paternal variants not possessed by the mother, are explored to identify the origin (fetal, maternal or both) of mcfDNA in maternal plasma and to determine the fraction of fetal DNA fragments (Koh et al., Proc Natl Acad Sci USA. 2014; 111:7361-7366). This allows the identification of both paternally- and maternally-inherited sequence variants in circulating methylated DNA fragments.
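By way of non-limiting illustration, the following Python sketch shows one way a fetal fraction could be estimated from reads covering informative variants. It assumes that, at a site where the mother is homozygous and the fetus is heterozygous, about half of the fetal fragments carry the non-maternal allele, so the fetal fraction is approximately twice the fraction of reads carrying that allele. The function name and the read counts are hypothetical, and effects of bisulfite conversion on allele calling are ignored.

```python
def estimate_fetal_fraction(site_counts):
    """Estimate fetal fraction from informative SNPs.

    site_counts: iterable of (paternal_allele_reads, total_reads) pairs, one
    per SNP at which the mother is homozygous and the fetus is heterozygous.
    """
    site_counts = list(site_counts)  # allow generators to be passed in
    paternal = sum(p for p, _ in site_counts)
    total = sum(t for _, t in site_counts)
    if total == 0:
        raise ValueError("no reads at informative sites")
    # Only one of the two fetal alleles differs from the mother, so the
    # paternal-allele read fraction is about half of the fetal fraction.
    return 2.0 * paternal / total


# Hypothetical read counts for three informative SNPs.
example_counts = [(6, 100), (4, 120), (5, 80)]
fetal_fraction = estimate_fetal_fraction(example_counts)
```

Pooling counts across sites, as done here, weights each site by its coverage; other estimators are possible.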
Identification of Fetoplacental Methylation Patterns in Maternal Plasma:
mcfDNA patterns in plasma samples from n=6 pregnant women who had normal pregnancy outcomes (gestational age 11-13 weeks) were compared with those in plasma of n=12 non-pregnant controls. mcfDNA patterns that are associated with pregnancy were identified. This is important with respect to the understanding of the epigenomic changes that occur during early gestation and is a first step towards the characterization of the early gestational methylome in maternal plasma. Such pregnancy-specific signals can originate in maternal hematopoietic cells, can be fetoplacental, and/or can be derived from other (non-hematopoietic) adult organ systems. As shown in
These CpG methylation distributions are influenced by the location of the CpG sites of interest, as determined by examining, in turn, those sites that are present in defined genomic elements, specifically exons, introns, promoters, CpG islands (CGIs), CpG island shores and enhancers. To reduce potential bias created by the presence of CGIs within other elements, regions of interest were filtered to exclude those that overlap with CGIs (with the exception, of course, of CGIs themselves). As shown in
Distributions in CGIs appeared similar to those in promoters, except that even fewer HM sites exist (
To examine the spatial differences in CpG methylation between cell-free plasma DNA from pregnant and non-pregnant women, CpG methylation levels were plotted relative to genomic coordinates both in a genome-wide fashion and with respect to each autosome. As shown in
The impact of genomic location on the ability to detect pregnancy-specific DNA methylation signals in maternal plasma was also examined. The methylation rates of CpG sites present in each of the same structural and regulatory genomic elements (see above) were compared between pregnant and non-pregnant control plasma DNA. As before, regions of interest were filtered to exclude those that overlapped with CGIs. As shown in
Bisulfite DNA sequencing data is generated from serial samples of maternal plasma and paired leukocytes obtained from 6 normal term primigravid pregnancies within gestational age windows centered at weeks 12, 26, and 39 of gestation. All samples are obtained from women with normal healthy pregnancies who went on to deliver at term. Six matched non-pregnant controls are analyzed. Individuals are matched for gestational age, race/ethnicity, fetal sex, smoking history, and BMI. DNA is extracted from maternal plasma as previously described (Chu et al., PLoS One. 2017; 12:e0171882). Bisulfite-converted DNA libraries undergo targeted capture using the SeqCap Epi CpGiant Enrichment Kit (Roche, Pleasanton, Calif.) and are sequenced to a read depth of ~150×. This approach targets 80.4 Mb of the human genome and 5.5 million individual CpG sites. The Bismark Bisulfite Read Mapper (Krueger et al., Bioinformatics. 2011; 27:1571-1572) is used to align the libraries against the GRCh38 reference genome and determine the methylation status for each CpG site. The beta-binomial test implemented in the R packages methylSig (Park et al., Bioinformatics. 2014; 30:2414-2422) and DSS (Park et al., Bioinformatics. 2016; 32:1446-1453) is used to compare the methylation status of groups of samples. Specifically, for each pair of sample groups and each CpG site with sufficient coverage, it is tested whether the methylation rate of that CpG site is the same in cases and controls. The origin (maternal or fetal) of mcfDNA species is determined via the detection of polymorphic sequence variants that are uniquely paternal and therefore inherited by the fetus. Informative variants are single nucleotide polymorphisms (SNPs) for which the mother is a homozygote and the fetus is a heterozygote (Koh et al., Proc Natl Acad Sci USA. 2014; 111:7361-7366). Reference genotypes are obtained for mothers and fathers using oligonucleotide microarrays.
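As a non-limiting sketch of how parental reference genotypes might be used to pick candidate informative variants, the following Python example labels each SNP from a maternal and a paternal genotype. The two-letter genotype encoding, the SNP identifiers, and the three labels are assumptions made for illustration only. A site is labeled informative when the mother is homozygous and the father carries no maternal allele, so that the fetus is necessarily heterozygous; a site where the father is heterozygous for a non-maternal allele is labeled as only possibly informative.

```python
def classify_snp(maternal_gt, paternal_gt):
    """Return 'informative', 'possible', or 'uninformative' for one SNP.

    Genotypes are two-letter strings such as "AA" or "AG" (hypothetical encoding).
    """
    m, p = set(maternal_gt), set(paternal_gt)
    if len(m) != 1:
        return "uninformative"   # mother is heterozygous
    if not (p - m):
        return "uninformative"   # father carries no allele absent from the mother
    if not (p & m):
        return "informative"     # every paternal allele differs: fetus must be heterozygous
    return "possible"            # fetus is heterozygous only if it inherits the paternal-only allele


# Hypothetical genotype calls: SNP id -> (maternal genotype, paternal genotype).
genotypes = {
    "snp_a": ("AA", "GG"),
    "snp_b": ("AA", "AG"),
    "snp_c": ("AG", "GG"),
    "snp_d": ("CC", "CC"),
}
labels = {snp: classify_snp(m, p) for snp, (m, p) in genotypes.items()}
```

Sites labeled as possible could be resolved from the plasma reads themselves, for example by checking whether the paternal-only allele is observed.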
The p values of the beta-binomial tests for methylation states are adjusted using the Benjamini and Hochberg method to control the false discovery rate (Benjamini et al., J Roy Stat Soc B Met. 1995; 57:289-300).
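For illustration only, the Benjamini and Hochberg adjustment can be sketched in a few lines of Python. The function name and the example p values are hypothetical; in practice an existing implementation in a statistics package would typically be used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that adjusted values
    # never increase as the raw p values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values from per-site tests.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = benjamini_hochberg(raw)
```

Sites whose adjusted value falls at or below the chosen false discovery rate threshold are then reported as differentially methylated.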
Besides characterizing the temporal dynamic changes in levels of mcfDNA, pregnancy-specific circulating mcfDNA is defined and, in many cases, it is determined whether that mcfDNA is maternally or fetally derived. This insight improves the understanding of the biology and epigenomic landscape of circulating cell-free nucleic acids in pregnancy.
Study 2
To identify DNA methylation signatures in placenta associated with cell lineage, gestational age and spontaneous preterm birth, cell-type-specific DNA methylation signatures are characterized in both early gestational chorionic villus biopsies (12 weeks) and placental villus samples obtained at normal delivery (39 weeks). Placental and paired plasma and leukocyte samples obtained at delivery are also investigated in a group of women whose pregnancies resulted in preterm birth with confirmed placental malperfusion and inflammation. Cell type-specific analysis of these placentas improves the understanding of DNA methylation signatures associated with defined placental pathology in sPTB. Furthermore, plasma and leukocytes from these sPTB individuals are compared with plasma and leukocytes from gestational age-matched controls whose pregnancies are progressing normally and later deliver at term.
There are limited publicly available data describing the genome-wide DNA methylation architecture of specific human cell types in reproductive systems, particularly in early gestation. A number of studies (Chu et al., PLoS One. 2011; 6:e14723; Bunce, Prenat Diagn. 2012; 32:542-549; Chu et al., Prenat Diagn. 2009; 29:1020-1030) have provided insight but these are generally restricted to analyses of heterogeneous tissue biopsies containing multiple cell types. Given that previous evidence points towards trophoblasts as the primary contributors to fetal DNA fragments in maternal plasma (Alberry, Prenatal diagnosis. 2007; 27:415-418; Lo et al., Pediatr Res. 2003; 53:16-17), it is essential to definitively characterize DNA methylation in purified cytotrophoblasts and syncytiotrophoblasts. This generates critical information relating to the cell-type specificity of these epigenetic profiles and their key differences compared to maternal leukocytes. It also provides fundamental insight into their epigenomic architecture in early gestation, at term, and in the context of sPTB pathobiology with placental malperfusion and inflammation. Plasma and leukocytes are collected from gestational age-matched controls with ongoing normal pregnancies that later deliver at term. These are compared with plasma and leukocytes obtained at delivery from mothers who experience sPTB. The presently disclosed subject matter of this Example can characterize placental and non-hematopoietic maternal DNA methylation signals in plasma, which enables insights into the sPTB phenotype in a developmentally appropriate gestational age-matched context. This has not been achieved in prior studies using conventional approaches because of the need to compare sPTB samples with normal term controls (at term).
Comprehensive comparisons of DNA methylation between first trimester chorionic villus (CV) tissue and maternal leukocytes were performed previously. These experiments were first performed using a novel custom microarray assay that was developed to provide a relatively unbiased assessment of methylation across chromosomes 13, 18 and 21 (Chu et al., PLoS One. 2011; 6:e14723; Bunce, Prenat Diagn. 2012; 32:542-549). These experiments did not specifically focus on regions of perceived biological significance, and so the data reveal unbiased insight into the epigenetic architecture of the first trimester chorionic villus. The results confirmed that CV tissue is generally hypomethylated relative to maternal leukocytes (
In order to identify broad regions of interest, a “sliding windows” approach was used (Chu et al., PLoS One. 2011; 6:e14723). It was observed that the CV and maternal leukocyte genomes shared common features but that tissue-specific differentially methylated regions (T-DMRs) tend to cluster together in distinct chromosomal locations. It was notable that regions dense in T-DMRs were also those that encode the fewest structural genes (compare the top and middle panels of
Bisulfite Sequencing for Comprehensive Analysis of DNA Methylation
To further develop the understanding of DNA methylation in the early gestational placenta and demonstrate the proficiency of relevant methodology, a detailed analysis of first trimester CV and maternal leukocyte methylation profiles was performed using targeted bisulfite sequencing. A commercially available oligonucleotide panel (Agilent Methylseq) was used to target an 80.4 Mb region covering all human chromosomes, with emphasis on the capture of the exome, promoters, and known CpG islands. For each sample, an average sequencing depth of 29× coverage was achieved. The distributions of single CpG sites identified as differentially methylated (DM) between CV and maternal blood cells (MBC) were examined. A multiple testing significance filter of q≤0.1 and a methylation difference filter of more than ±10% were used for these analyses.
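The two filters described above can be sketched as follows. This Python example is illustrative only: the record layout (keys `id`, `q`, and `meth_diff`) and the example values are hypothetical, and `meth_diff` is assumed to be the difference in percent methylation between the two tissues.

```python
def select_dm_sites(sites, q_max=0.1, min_abs_diff=10.0):
    """Keep sites with q <= q_max and |methylation difference| > min_abs_diff.

    sites: iterable of dicts with hypothetical keys 'id', 'q', and 'meth_diff'
    (difference in percent methylation between the two groups).
    """
    return [
        s for s in sites
        if s["q"] <= q_max and abs(s["meth_diff"]) > min_abs_diff
    ]


# Hypothetical per-site test results.
candidates = [
    {"id": "cpg1", "q": 0.02, "meth_diff": -25.0},  # passes both filters
    {"id": "cpg2", "q": 0.30, "meth_diff": 40.0},   # fails the q filter
    {"id": "cpg3", "q": 0.05, "meth_diff": 4.0},    # fails the difference filter
    {"id": "cpg4", "q": 0.10, "meth_diff": 12.5},   # passes both filters
]
dm_ids = [s["id"] for s in select_dm_sites(candidates)]
```

Taking the absolute value of the difference retains both hypomethylated and hypermethylated sites.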
Laser Capture Microdissection and Whole Genome Bisulfite DNA Sequencing
Laser capture microdissection (LCM) followed by whole genome bisulfite sequencing (WGBS) in limited tissue samples was performed as shown in
Placental tissues are obtained in early gestation (week 12) via chorionic villus sampling and also, from the same women, after normal term delivery (week 39). Blood samples (plasma and leukocytes) are collected at the same time points from the same individuals. These individuals will overlap with those sampled in Study 1. Placental and paired plasma and leukocyte samples obtained at delivery are further investigated from a group of women whose pregnancies resulted in preterm birth with confirmed placental malperfusion and inflammation. Cell type-specific analysis of these placentas improves the understanding of DNA methylation signatures associated with defined placental pathology in sPTB. Furthermore, plasma and leukocytes from these sPTB individuals are compared with plasma and leukocytes from gestational age-matched normal controls whose pregnancies are ongoing.
DNA methylation analysis methods for leukocytes and plasma are disclosed in the above Study 1. Placental tissue samples are snap frozen on dry ice and stored at −80° C. Blood samples are centrifuged, and buffy coat and plasma are snap frozen and stored at −80° C. Laser capture is performed using protocols for microdissection, DNA extraction and WGBS as shown in
Standardized placental pathologic examination and diagnostic classification were performed as disclosed in Catov et al., Placenta. 2015; 36:687-692. Placental malperfusion and inflammation are among the most common lesions associated with preterm birth, and standardized and validated approaches to the classification and subclassification of placental pathologic findings have been incorporated into the presently disclosed perinatal research. Using this validated system, it is straightforward to define the eligible cases of preterm birth with placental malperfusion and/or inflammation.
The sizes of tissue samples can be limited when targeting gestational week 12, but they are sufficient to generate enough DNA for effective bisulfite conversion and sequencing (see
Study 3
To identify non-invasive biomarkers for the prediction of spontaneous preterm birth, existing banked plasma samples were used to compare methylation signatures in cell-free DNA obtained in early gestation (11-13 weeks) from women whose pregnancies later ended in sPTB with those from women who later delivered at term.
Study 3 identifies associations between an eventual outcome of sPTB and early gestational DNA methylation signatures in maternal plasma. This is achieved using a large cohort of plasma samples obtained from mothers who are enrolled in the ongoing NIPT research program. These samples, obtained between 11 and 13 weeks of gestation, have all been collected with IRB approval for use in the context of the current proposal and have been handled and stored in a manner that optimizes their use for non-invasive prenatal testing.
Bisulfite sequencing was performed following solution-phase hybridization capture of cell-free DNA from maternal plasma obtained between 11-13 weeks gestation from pregnant women who later developed either preterm severe preeclampsia <32 weeks (SPE<32) (n=5) or term preeclampsia without severe symptoms (MPE) (n=5). A normal control (NC) group (n=6) was also analyzed. Individuals were matched for gestational age and fetal gender (male) and were all non-smokers. Methylation signatures present in mcfDNA were identified in early gestation, before the onset of preeclampsia symptoms. Using the logistic regression test as implemented in the R package methylKit (Akalin et al., Genome Biol. 2012; 13:R87), after adjusting the p values to control the false discovery rate, n=552 significantly differentially methylated CpG sites were identified that distinguish SPE<32 plasma from NC and n=549 that distinguish MPE plasma from NC. The most significant of these are shown in
DNA is extracted and genome-wide bisulfite sequencing carried out as disclosed in Study 1. Plasma DNA methylation is analyzed in a minimum of n=50 sPTB cases (<34 weeks gestation) with confirmed placental malperfusion and inflammation, and n=50 normal controls (NC). n=3750 plasma samples have been collected and stored, and sample acquisition continues at a rate of n=1150 per year with an anticipated total of >8000. The bisulfite sequencing libraries are prepared and processed as disclosed in Study 1. The beta-binomial test, implemented in the R packages methylSig and DSS, is used to identify CpG sites differentially methylated in plasma cfDNA samples collected between weeks 11-13 of gestation from women who later experienced sPTB compared to normal controls. Cases and controls are matched with the following considerations: gestational age, ethnicity, fetal gender, smoking history, BMI, and gravidity. Exclusion criteria include multiple gestation, maternal autoimmune diseases, anti-phospholipid antibody syndrome, IVF conception, maternal pre-pregnancy diabetes and maternal pre-pregnancy chronic hypertension. The differentially methylated CpG sites are intersected with those whose methylation states have changed during the pregnancy, as identified in Study 1 disclosed herein. Those differentially methylated CpG sites that are maternal or fetal specific are identified, as determined in Study 1 disclosed herein. A minimum of 50 differentially methylated sites is selected for multiplex PCR-targeted bisulfite sequencing using methods described in Chu et al., PLoS One. 2014; 9:e107318. Both the elastic net and the support vector machine algorithms are employed to predict the risk of sPTB using the plasma DNA methylation levels at those selected sites. Leave-one-out cross validation is used to evaluate the performance of the elastic net models and support vector machine models. The false positive rate and the false negative rate, as well as their 95% confidence intervals, are estimated.
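As a non-limiting sketch of the evaluation step, the following Python example runs leave-one-out cross validation and reports false positive and false negative rates with approximate 95% confidence intervals. Several choices here are assumptions made for illustration: a simple nearest-centroid rule stands in for the elastic net and support vector machine models (which would normally come from a machine-learning library), the Wilson score interval is used as one common choice of interval, and the methylation values are invented.

```python
from math import sqrt


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: assign x to the class with the closer mean profile."""
    def centroid(label):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return 1 if dist2(x, centroid(1)) < dist2(x, centroid(0)) else 0


def leave_one_out(xs, ys, predict):
    """Predict each sample from a model fit on all of the other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(predict(train_x, train_y, xs[i]))
    return preds


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval (about 95% for z=1.96) for the proportion errors/n."""
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical methylation levels at two selected CpG sites; 1 = sPTB, 0 = control.
xs = [[0.62, 0.30], [0.58, 0.35], [0.66, 0.28], [0.45, 0.50],
      [0.40, 0.52], [0.38, 0.55], [0.42, 0.49], [0.60, 0.47]]
ys = [1, 1, 1, 1, 0, 0, 0, 0]

preds = leave_one_out(xs, ys, nearest_centroid_predict)
false_pos = sum(1 for p, y in zip(preds, ys) if y == 0 and p == 1)
false_neg = sum(1 for p, y in zip(preds, ys) if y == 1 and p == 0)
n_neg, n_pos = ys.count(0), ys.count(1)
fpr, fnr = false_pos / n_neg, false_neg / n_pos
fpr_ci = wilson_interval(false_pos, n_neg)
fnr_ci = wilson_interval(false_neg, n_pos)
```

Any feature selection or tuning of model parameters would need to be repeated inside each leave-one-out fold to avoid optimistic error estimates.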
Power Analysis of Study 3
Using the logit transformation, the methylation levels of 0.375 and 0.625 are transformed to about −0.51 and 0.51. Assuming the within-group variance of the logit transformed methylation levels is 1, then with 50 samples per group, the difference between the methylation levels of 0.375 and 0.625 for a CpG site can be detected at the significance level of 0.0004 with a power of 0.9. Assuming that for only 1% of the targeted CpG sites the difference in methylation levels is about 0.25 or more between the two groups, by setting the significance level at 0.0004, the false discovery rate can be controlled at <0.05 when detecting these differentially methylated CpG sites.
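The arithmetic of this power analysis can be checked with a short Python sketch. It is an approximation under stated assumptions: a two-sided, two-sample comparison of group means on the logit scale using the normal distribution (a t-based calculation would give a slightly lower power), with variable names chosen here for illustration.

```python
from math import log, sqrt
from statistics import NormalDist


def logit(p):
    """Log-odds transform of a methylation level in (0, 1)."""
    return log(p / (1 - p))


std_normal = NormalDist()

n_per_group = 50
variance = 1.0   # assumed within-group variance on the logit scale
alpha = 0.0004   # per-site two-sided significance level

delta = logit(0.625) - logit(0.375)            # group difference on the logit scale
std_error = sqrt(2 * variance / n_per_group)   # standard error of the difference in means
z_crit = std_normal.inv_cdf(1 - alpha / 2)

# Normal approximation to the power of a two-sided two-sample test.
power = (std_normal.cdf(delta / std_error - z_crit)
         + std_normal.cdf(-delta / std_error - z_crit))

# Expected false discovery rate if 1% of sites are truly differentially methylated.
prior_dm = 0.01
false_hits = (1 - prior_dm) * alpha
true_hits = prior_dm * power
expected_fdr = false_hits / (false_hits + true_hits)
```

Under these assumptions the computed power exceeds 0.9 and the expected false discovery rate falls below 0.05, consistent with the figures stated above.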
One potential challenge is that phenotypic heterogeneity may reduce the sensitivity of the methods disclosed herein in identifying differentially methylated CpG sites with the false discovery rate controlled at 0.05. To address this problem, large numbers of appropriate maternal plasma samples and associated clinical outcome data are obtained. This allows the application of stringent inclusion/exclusion criteria when selecting the study cohorts. To further mitigate this possibility, the logit transformed preeclamptic methylation data is clustered using the non-negative matrix factorization method as implemented in the Python module Nimfa, then a beta-binomial test is performed between each subgroup of preeclamptic plasma samples and the normal plasma samples to identify differentially methylated CpG sites.
REFERENCES
- 1. Chu T, Shaw P A, Yeniterzi S, Dunkel M, Rajkovic A, Hogge W A, et al. Comparative evaluation of the minimally-invasive karyotyping (mink) algorithm for non-invasive prenatal testing. PLoS One. 2017; 12:e0171882
- 2. Rabinowitz M, Valenti E, Pettersen B, Sigurjonsson S, Hill M, Zimmermann B. Noninvasive aneuploidy detection by multiplexed amplification and sequencing of polymorphic loci. Obstet Gynecol. 123 Suppl 1:167S
- 3. Jiang P, Tong Y K, Sun K, Cheng S H, Leung T Y, Chan K C, et al. Gestational age assessment by methylation and size profiling of maternal plasma DNA: A feasibility study. Clin Chem. 2017; 63:606-608
- 4. Sun K, Lun F M F, Leung T Y, Chiu R W K, Lo Y M D, Sun H. Noninvasive reconstruction of placental methylome from maternal plasma DNA: Potential for prenatal testing and monitoring. Prenat Diagn. 2018; 38:196-203
- 5. Chu T, Bunce K, Shaw P, Shridhar V, Althouse A, Hubel C, et al. Comprehensive analysis of preeclampsia-associated DNA methylation in the placenta. PLoS One. 2014; 9:e107318
- 6. Han Y, Yang Z, Ding X, Yu H, Yi Y. Variation of long-chain 3-hydroxyacyl-coa dehydrogenase DNA methylation in placenta of different preeclampsia-like mouse models. Zhonghua Fu Chan Ke Za Zhi. 2015; 50:740-746
- 7. Mayne B T, Leemaqz S Y, Smith A K, Breen J, Roberts C T, Bianco-Miotto T. Accelerated placental aging in early onset preeclampsia pregnancies identified by DNA methylation. Epigenomics. 2017; 9:279-289
- 8. van den Berg C B, Chaves I, Herzog E M, Willemsen S P, van der Horst G T J, Steegers-Theunissen R P M. Early- and late-onset preeclampsia and the DNA methylation of circadian clock and clock-controlled genes in placental and newborn tissues. Chronobiol Int. 2017; 34:921-932
- 9. Xuan J, Jing Z, Yuanfang Z, Xiaoju H, Pei L, Guiyin J, et al. Comprehensive analysis of DNA methylation and gene expression of placental tissue in preeclampsia patients. Hypertens Pregnancy. 2016; 35:129-138
- 10. Ye Y, Tang Y, Xiong Y, Feng L, Li X. Bisphenol a exposure alters placentation and causes preeclampsia-like features in pregnant mice involved in reprogramming of DNA methylation of wnt2. FASEB J. 2019; 33:2732-2742
- 11. Yeung K R, Chiu C L, Pidsley R, Makris A, Hennessy A, Lind J M. DNA methylation profiles in preeclampsia and healthy control placentas. Am J Physiol Heart Circ Physiol. 2016; 310:H1295-1303
- 12. Zhang L, Leng M, Li Y, Yuan Y, Yang B, Li Y, et al. Altered DNA methylation and transcription of wnt2 and dkk1 genes in placentas associated with early-onset preeclampsia. Clin Chim Acta. 2019; 490:154-160
- 13. Barcelona de Mendoza V, Wright M L, Agaba C, Prescott L, Desir A, Crusto C A, et al. A systematic review of DNA methylation and preterm birth in african american women. Biol Res Nurs. 2017; 19:308-317
- 14. Behnia F, Parets S E, Kechichian T, Yin H, Dutta E H, Saade G R, et al. Fetal DNA methylation of autism spectrum disorders candidate genes: Association with spontaneous preterm birth. Am J Obstet Gynecol. 2015; 212:533 e531-539
- 15. Burris H H, Rifas-Shiman S L, Baccarelli A, Tarantini L, Boeke C E, Kleinman K, et al. Associations of line-1 DNA methylation with preterm birth in a prospective cohort study. J Dev Orig Health Dis. 2012; 3:173-181
- 16. Hong X, Sherwood B, Ladd-Acosta C, Peng S, Ji H, Hao K, et al. Genome-wide DNA methylation associations with spontaneous preterm birth in us blacks: Findings in maternal and cord blood samples. Epigenetics. 2018; 13:163-172
- 17. Liu Y, Hoyo C, Murphy S, Huang Z, Overcash F, Thompson J, et al. DNA methylation at imprint regulatory regions in preterm birth and infection. Am J Obstet Gynecol. 2013; 208:395 e391-397
- 18. Menon R, Conneely K N, Smith A K. DNA methylation: An epigenetic risk factor in preterm birth. Reprod Sci. 2012; 19:6-13
- 19. Parets S E, Conneely K N, Kilaru V, Fortunato S J, Syed T A, Saade G, et al. Fetal DNA methylation associates with early spontaneous preterm birth and gestational age. PLoS One. 2013; 8:e67489
- 20. Parets S E, Conneely K N, Kilaru V, Menon R, Smith A K. DNA methylation provides insight into intergenerational risk for preterm birth in african americans. Epigenetics. 2015; 10:784-792
- 21. Vidal A C, Benjamin Neelon S E, Liu Y, Tuli A M, Fuemmeler B F, Hoyo C, et al. Maternal stress, preterm birth, and DNA methylation at imprint regulatory sequences in humans. Genet Epigenet. 2014; 6:37-44
- 22. Wang X M, Tian F Y, Fan L J, Xie C B, Niu Z Z, Chen W Q. Comparison of DNA methylation profiles associated with spontaneous preterm birth in placenta and cord blood. BMC Med Genomics. 2019; 12:1
- 23. Chu T, Handley D, Bunce K, Surti U, Hogge W A, Peters D G. Structural and regulatory characterization of the placental epigenome at its maternal interface. PLoS One. 2011; 6:e14723
- 24. Strauss J F, 3rd, Romero R, Gomez-Lopez N, Haymond-Thornburg H, Modi B P, Teves M E, et al. Spontaneous preterm birth: Advances toward the discovery of genetic predisposition. Am J Obstet Gynecol. 2018; 218:294-314 e292
- 25. Heng Y J, Pennell C E, McDonald S W, Vinturache A E, Xu J, Lee M W, et al. Maternal whole blood gene expression at 18 and 28 weeks of gestation associated with spontaneous preterm birth in asymptomatic women. PLoS One. 2016; 11:e0155191
- 26. Ngo T T M, Moufarrej M N, Rasmussen M H, Camunas-Soler J, Pan W, Okamoto J, et al. Noninvasive blood tests for fetal development predict gestational age and preterm delivery. Science. 2018; 360:1133-1136
- 27. Karlas T, Weise L, Kuhn S, Krenzien F, Mehdorn M, Petroff D, et al. Correlation of cell-free DNA plasma concentration with severity of non-alcoholic fatty liver disease. J Transl Med. 2017; 15:106
- 28. Hardy T, Zeybel M, Day C P, Dipper C, Masson S, McPherson S, et al. Plasma DNA methylation: A potential biomarker for stratification of liver fibrosis in non-alcoholic fatty liver disease. Gut. 2017; 66:1321-1328
- 29. Liggett T, Melnikov A, Tilwalli S, Yi Q, Chen H, Replogle C, et al. Methylation patterns of cell-free plasma DNA in relapsing-remitting multiple sclerosis. J Neurol Sci. 2010; 290:16-21
- 30. Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang C H, Angelo M, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA. 2001; 98:15149-15154
- 31. DeRisi J, Penland L, Brown P O, Bittner M L, Meltzer P S, Ray M, et al. Use of a cdna microarray to analyse gene expression patterns in human cancer. Nat Genet. 1996; 14:457-460
- 32. Chen B, Dias P, Jenkins J J, 3rd, Savell V H, Parham D M. Methylation alterations of the myod1 upstream region are predictive of subclassification of human rhabdomyosarcomas. Am J Pathol. 1998; 152:1071-1079
- 33. Ghosh R K, Pandey T, Dey P. Liquid biopsy: A new avenue in pathology. Cytopathology. 2018
- 34. Shen S Y, Singhania R, Fehringer G, Chakravarthy A, Roehrl M H A, Chadwick D, et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature. 2018; 563:579-583
- 35. Koh W, Pan W, Gawad C, Fan H C, Kerchner G A, Wyss-Coray T, et al. Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proc Natl Acad Sci USA. 2014; 111:7361-7366
- 36. Lapaire O, Grill S, Lalevee S, Kolla V, Hosli I, Hahn S. Microarray screening for novel preeclampsia biomarker candidates. Fetal Diagn Ther. 2012; 31:147-153
- 37. Winn V D, Gormley M, Fisher S J. The impact of preeclampsia on gene expression at the maternal-fetal interface. Pregnancy Hypertens. 2011; 1:100-108
- 38. Kang J H, Song H, Yoon J A, Park D Y, Kim S H, Lee K J, et al. Preeclampsia leads to dysregulation of various signaling pathways in placenta. J Hypertens. 2011; 29:928-936
- 39. Chaouat G, Rodde N, Petitbarat M, Bulla R, Rahmati M, Dubanchet S, et al. An insight into normal and pathological pregnancies using large-scale microarrays: Lessons from microarrays. J Reprod Immunol. 2011; 89:163-172
- 40. Varkonyi T, Nagy B, Fule T, Tarca A L, Karaszi K, Schonleber J, et al. Microarray profiling reveals that placental transcriptomes of early-onset hellp syndrome and preeclampsia are similar. Placenta. 2011; 32 Suppl:S21-29
- 41. Yuen R K, Penaherrera M S, von Dadelszen P, McFadden D E, Robinson W P. DNA methylation profiling of human placentas reveals promoter hypomethylation of multiple genes in early-onset preeclampsia. Eur J Hum Genet. 2010; 18:1006-1012
- 42. Blair J D, Yuen R K, Lim B K, McFadden D E, von Dadelszen P, Robinson W P. Widespread DNA hypomethylation at gene enhancer regions in placentas associated with early-onset pre-eclampsia. Mol Hum Reprod. 2013; 19:697-708
- 43. Fan H C, Blumenfeld Y J, Chitkara U, Hudgins L, Quake S R. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci USA. 2008; 105:16266-16271
- 44. Lo Y M, Lun F M, Chan K C, Tsui N B, Chong K C, Lau T K, et al. Digital PCR for the molecular detection of fetal chromosomal aneuploidy. Proc Natl Acad Sci USA. 2007; 104:13116-13121
- 45. Lun F M, Chiu R W, Allen Chan K C, Yeung Leung T, Kin Lau T, Dennis Lo Y M. Microfluidics digital PCR reveals a higher than expected fraction of fetal DNA in maternal plasma. Clin Chem. 2008; 54:1664-1672
- 46. Chu T, Yeniterzi S, Rajkovic A, Hogge W A, Dunkel M, Shaw P, et al. High resolution non-invasive detection of a fetal microdeletion using the gcrem algorithm. Prenat Diagn. 2014; 34:469-477
- 47. Peters D, Chu T, Yatsenko S A, Hendrix N, Hogge W A, Surti U, et al. Noninvasive prenatal diagnosis of a fetal microdeletion syndrome. N Engl J Med. 2011; 365:1847-1848
- 48. Yatsenko S A, Peters D G, Saller D N, Chu T, Clemens M, Rajkovic A. Maternal cell-free DNA-based screening for fetal microdeletion and the importance of careful diagnostic follow-up. Genetics in medicine: official journal of the American College of Medical Genetics. 2015
- 49. Lau T K, Cheung S W, Lo P S, Pursley A N, Chan M K, Jiang F, et al. Non-invasive prenatal testing for fetal chromosomal abnormalities by low-coverage whole-genome sequencing of maternal plasma DNA: Review of 1982 consecutive cases in a single center. Ultrasound Obstet Gynecol. 2014; 43:254-264
- 50. Catov J M, Peng Y, Scifres C M, Parks W T. Placental pathology measures: Can they be rapidly and reliably integrated into large-scale perinatal studies? Placenta. 2015; 36:687-692
- 51. Catov J M, Scifres C M, Caritis S N, Bertolet M, Larkin J, Parks W T. Neonatal outcomes following preterm birth classified according to placental features. Am J Obstet Gynecol. 2017; 216:411 e411-411 e414
- 52. Krueger F, Andrews S R. Bismark: A flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics. 2011; 27:1571-1572
- 53. Park Y, Figueroa M E, Rozek L S, Sartor M A. Methylsig: A whole genome DNA methylation analysis pipeline. Bioinformatics. 2014; 30:2414-2422
- 54. Park Y, Wu H. Differential methylation analysis for bs-seq data under general experimental design. Bioinformatics. 2016; 32:1446-1453
- 55. Benjamini Y, Hochberg Y. Controlling the false discovery rate—a practical and powerful approach to multiple testing. J Roy Stat Soc B Met. 1995; 57:289-300
- 56. Bunce K, Chu T, Surti U, Hogge W A, Peters D G. Discovery of epigenetic biomarkers for the noninvasive diagnosis of fetal disease. Prenat Diagn. 2012; 32:542-549
- 57. Chu T, Burke B, Bunce K, Surti U, Allen Hogge W, Peters D G. A microarray-based approach for the identification of epigenetic biomarkers for the noninvasive diagnosis of fetal disease. Prenat Diagn. 2009; 29:1020-1030
- 58. Alberry M, Maddocks D, Jones M, Abdel Hadi M, Abdel-Fattah S, Avent N, et al. Free fetal DNA in maternal plasma in anembryonic pregnancies: Confirmation that the origin is the trophoblast. Prenatal diagnosis. 2007; 27:415-418
- 59. Lo Y M. Fetal DNA in maternal plasma/serum: The first 5 years. Pediatr Res. 2003; 53:16-17
- 60. Akalin A, Kormaksson M, Li S, Garrett-Bakelman F E, Figueroa M E, Melnick A, et al. methylKit: A comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 2012; 13:R87
A1. A method for diagnosing, prognosing, classifying and/or monitoring a pregnancy-associated disorder in a pregnant subject comprising:
(a) obtaining a biological sample from the subject;
(b) determining the methylation status and/or level of one or more genomic loci in the biological sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and
(d) diagnosing a pregnancy-associated disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject.
A2. The method of embodiment A1, wherein an increase in the level of methylation of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder in the subject or a decrease in the level of methylation of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder in the subject.
A3. The method of embodiment A1, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the biological sample and an increase in the level of methylation of at least one of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder in the subject.
A4. The method of any one of embodiments A1-A3, wherein the biological sample is selected from the group consisting of a blood sample, stool sample, saliva sample and urine sample.
A5. The method of any one of embodiments A1-A4, wherein the biological sample is obtained from the pregnant subject between week 10 and week 13 of gestation.
A6. The method of any one of embodiments A1-A5, wherein the one or more genomic loci comprise one or more CpG sites.
A7. The method of any one of embodiments A1-A6, wherein the pregnancy-associated disorder is selected from the group consisting of preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy and intrauterine growth retardation.
A8. The method of any one of embodiments A1-A7, wherein the pregnancy-associated disorder is preeclampsia.
A9. A method for diagnosing, prognosing and/or monitoring preeclampsia in a pregnant subject comprising:
(a) obtaining a biological sample from the subject;
(b) determining the methylation status and/or level of one or more genomic loci in the biological sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and
(d) diagnosing preeclampsia in the subject,
wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates preeclampsia in the subject.
A10. The method of embodiment A9, wherein an increase in the level of methylation of at least one of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder in the subject or a decrease in the level of methylation of at least one of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder in the subject.
A11. The method of embodiment A9, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the biological sample and an increase in the level of methylation of at least one of the one or more genomic loci in the biological sample indicates preeclampsia in the subject.
A12. The method of any one of embodiments A9-A11, wherein the biological sample is selected from the group consisting of a blood sample, stool sample, saliva sample and urine sample.
A13. The method of any one of embodiments A9-A12, wherein the biological sample is obtained from the pregnant subject between week 10 and week 13 of gestation.
A14. The method of any one of embodiments A9-A13, wherein the one or more genomic loci comprise one or more CpG sites.
A15. A method for determining if a pregnant subject is at increased risk of having a preterm birth comprising:
(a) obtaining a biological sample from the subject;
(b) determining the methylation status and/or level of one or more genomic loci in the biological sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and
(d) determining that the subject is at an increased risk of preterm birth,
wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates that the subject is at an increased risk of preterm birth.
A16. The method of embodiment A15, wherein an increase in the level of methylation of the one or more genomic loci in the biological sample indicates that the subject is at an increased risk of preterm birth or a decrease in the level of methylation of the one or more genomic loci in the biological sample indicates that the subject is at an increased risk of preterm birth.
A17. The method of embodiment A15, wherein an increase in the level of methylation of at least one of the one or more genomic loci in the biological sample indicates that the subject is at an increased risk of preterm birth and a decrease in the level of methylation of at least one of the one or more genomic loci in the biological sample indicates that the subject is at an increased risk of preterm birth.
A18. The method of any one of embodiments A15-A17, wherein the biological sample is selected from the group consisting of a blood sample, stool sample, saliva sample and urine sample.
A19. The method of any one of embodiments A15-A18, wherein the biological sample is obtained from the pregnant subject between week 10 and week 13 of gestation.
A20. The method of any one of embodiments A15-A19, wherein the one or more genomic loci comprise one or more CpG sites.
A21. A method of treating a pregnancy-associated disorder in a pregnant subject comprising:
(a) obtaining a biological sample from the subject;
(b) determining the methylation status and/or level of one or more genomic loci present in the biological sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference;
(d) diagnosing a pregnancy-associated disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject; and
(e) treating the subject diagnosed with the pregnancy-associated disorder.
A22. A method of treating a pregnancy-associated disorder in a pregnant subject comprising:
(a) diagnosing a pregnancy-associated disorder in the subject by utilization of the algorithm disclosed in Example Embodiment C; and
(b) treating the subject diagnosed with the pregnancy-associated disorder.
A23. The method of embodiment A21 or A22, wherein the pregnancy-associated disorder is selected from the group consisting of preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy and intrauterine growth retardation.
A24. The method of embodiment A21, A22 or A23, wherein the pregnancy-associated disorder is preeclampsia.
A25. The method of embodiment A24, wherein treating the subject diagnosed with preeclampsia comprises one or more of the following:
(a) administration of an anti-hypertensive medication;
(b) delivery;
(c) administration of a corticosteroid;
(d) administration of an HMG-CoA reductase inhibitor; and/or
(e) administration of an anti-convulsant medication.
A26. The method of any one of embodiments A8-A14, A24 and A25, wherein the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a pregnant subject that does not have preeclampsia or from a non-pregnant subject.
A27. The method of any one of embodiments A1-A7 and A21, wherein the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a pregnant subject that does not have the pregnancy-associated disorder or from a non-pregnant subject.
A28. The method of any one of embodiments A1-A27, wherein the subject is human.
A29. The method of any one of embodiments A1-A28, wherein the one or more genomic loci are present within maternal nucleic acids isolated from the biological sample.
A30. The method of embodiment A29, wherein the maternal nucleic acids are obtained from leukocytes in the biological sample.
A31. The method of embodiment A29, wherein the maternal nucleic acids are cell-free nucleic acids in the biological sample.
A32. The method of embodiment A31, wherein the maternal nucleic acids are placental nucleic acids.
A33. The method of any one of embodiments A1-A28, wherein the one or more genomic loci are present within fetal nucleic acids isolated from the biological sample.
A34. The method of embodiment A33, wherein the fetal nucleic acids are cell-free nucleic acids in the biological sample.
A35. A method for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder in a pregnant subject comprising:
(a) obtaining a biological sample from the subject;
(b) determining the fraction of fetal nucleic acid (fetal fraction) in the biological sample;
(c) determining the methylation status of one or more genomic loci in placental nucleic acids present in the biological sample; and
(d) diagnosing the subject with the pregnancy-associated disorder by analyzing the fetal fraction and the methylation status of the genomic loci in the placental nucleic acid.
A36. The method of embodiment A35, wherein the pregnancy-associated disorder is selected from the group consisting of preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy and intrauterine growth retardation.
A37. The method of embodiment A35 or A36, wherein the pregnancy-associated disorder is preeclampsia.
A38. The method of any one of embodiments A35-A37, wherein the biological sample is selected from the group consisting of a blood sample, stool sample, saliva sample and urine sample.
A39. The method of any one of embodiments A35-A37, wherein the methylation status is the methylation rate of the one or more genomic loci.
A40. The method of any one of embodiments A35-A39, wherein the fetal fraction is determined by:
(a) analyzing the methylation status of one or more reference genomic loci in the biological sample;
(b) analyzing the methylation status of the one or more reference genomic loci in a reference sample of maternal blood cells; and
(c) analyzing the methylation status of the one or more reference genomic loci in a reference sample of placental nucleic acids.
A41. The method of any one of embodiments A35-A40, wherein the methylation status of the one or more genomic loci is determined by:
(a) analyzing the methylation status of the one or more genomic loci in the biological sample; and
(b) analyzing the methylation status of the one or more genomic loci in a reference sample of maternal blood cells.
A42. A kit for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder in a pregnant subject comprising a means for determining and/or detecting the methylation status of one or more genomic loci.
A43. The kit of embodiment A42, wherein the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.
This Example provides methods for diagnosing, prognosing, monitoring and/or treating pregnancy-associated disorders, e.g., preeclampsia, in a pregnant subject. For example, but not by way of limitation, the methods disclosed in this Example include determining the methylation status of one or more genomic loci in a biological sample of a pregnant subject. This Example further provides algorithms and kits for performing such methods.
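As an illustration of the fetal-fraction determination recited in embodiment A40, the following minimal sketch assumes a simple two-component mixing model in which the methylation rate observed at each reference locus in the biological sample is a weighted average of the rates in placental nucleic acids and in maternal blood cells. The mixing model, the function name `estimate_fetal_fraction`, the `min_gap` cutoff and the use of a median are illustrative assumptions and are not taken from this disclosure.

```python
from statistics import median


def estimate_fetal_fraction(sample, maternal, placental, min_gap=0.2):
    """Estimate fetal fraction f from per-locus methylation rates in [0, 1].

    Assumed (hypothetical) model at each reference locus:
        sample = f * placental + (1 - f) * maternal
    so that f = (sample - maternal) / (placental - maternal).
    """
    estimates = []
    for s, m, p in zip(sample, maternal, placental):
        gap = p - m
        # Skip loci where the two references are too similar to be informative.
        if abs(gap) < min_gap:
            continue
        f = (s - m) / gap
        # Clamp to the physically meaningful range.
        estimates.append(min(1.0, max(0.0, f)))
    if not estimates:
        raise ValueError("no informative reference loci")
    # The median limits the influence of a few noisy loci.
    return median(estimates)
```

Under these assumptions, loci that are strongly differentially methylated between placenta and maternal blood cells contribute the most stable estimates, which is why near-identical reference loci are skipped.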
Example 2—Methods of Assessing DNA Methylation in Cerebrospinal Fluid for Phenotyping and Diagnosis of Central Nervous System Disorders
Introduction of this Example
This Example provides methods, algorithms and kits for diagnosing, prognosing, monitoring, classifying and/or treating central nervous system (“CNS”) disorders.
Background of this Example
Brain and spinal cord biopsies have been used to diagnose CNS tumors, infections, inflammation, and other CNS disorders. This procedure involves the removal of a small piece of brain or spinal cord tissue for the diagnosis of abnormalities of the CNS. The procedure can also be used for identifying brain- or spinal-cord specific molecular phenotypes. However, brain and spinal cord biopsy is an invasive procedure that carries the risk associated with anesthesia and surgery, for example, CNS injury, seizure, death, and complications.
Liquid biopsy is a non-invasive method for diagnosis and phenotyping via the analysis of circulating cell-free nucleic acids. Liquid biopsy has been used in reproductive genetics for detecting fetal aneuploidy (Chu T et al., Bioinformatics. 2009; 25(10):1244-50; Fan H C et al., Proc Natl Acad Sci USA. 2008; 105(42):16266-71; Chiu R W et al., Proc Natl Acad Sci USA. 2008; 105(51):20458-63), fetal microdeletions and duplications (Chu T et al., Bioinformatics. 2009; 25(10):1244-50; Peters D et al., N Engl J Med. 2011; 365(19):1847-8; Chu T et al., Prenat Diagn. 2014; 34(5):469-77; Chu T et al., PLoS One. 2016; 11(6):e0153182), and single nucleotide variants (Camunas-Soler J et al., Clin Chem. 2018; 64(2):336-45) by sequencing DNA obtained from maternal plasma. Liquid biopsy has also been used for detecting and quantifying mutations in tumor-derived DNA obtained from plasma, and other fluid reservoirs including cerebrospinal fluid (“CSF”) (De Rubis G et al., Trends Pharmacol Sci. 2019; Muinelo-Romay L et al., Int J Mol Sci. 2018; 19(8); Seoane J et al., Ann Oncol. 2018; Hiemcke-Jiwa L et al., Crit Rev Oncol Hematol. 2018; 127:56-65; Hiemcke-Jiwa L S et al., Hematol Oncol. 2018; 36(2):429-35).
DNA methylation is involved in the regulation of gene expression (Ball M P et al., Nature biotechnology. 2009; 27(4):361-8). DNA methylation patterns can be altered as a consequence of environmental exposure and are a central component of disease pathogenesis (van Vliet J et al., CMLS. 2007; 64(12):1531-8; Abdolmaleky H M et al., Human molecular genetics. 2006; 15(21):3132-45). However, very little is understood regarding the potential of DNA methylation in cerebrospinal fluid to provide non-invasive diagnostic and phenotypic information on the CNS.
Therefore, there remains a need for non-invasive methods for diagnosing CNS disorders.
Summary of this Example
This Example provides methods for diagnosing, prognosing, monitoring, classifying and/or treating CNS disorders, e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases. For example, but not by way of limitation, the methods of this Example include determining the methylation status of one or more genomic loci in a cerebrospinal fluid sample of a subject. This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying and/or treating CNS disorders.
In one aspect, this Example provides a method for diagnosing, prognosing, classifying and/or monitoring a CNS disorder in a subject comprising: (a) obtaining a cerebrospinal fluid sample from the subject; (b) determining the methylation status and/or level of one or more genomic loci in the cerebrospinal fluid sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and (d) diagnosing the CNS disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject.
In another aspect, this Example provides a method of treating a CNS disorder in a subject comprising: (a) obtaining a cerebrospinal fluid sample from the subject; (b) determining the methylation status and/or level of one or more genomic loci present in the cerebrospinal fluid sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference; (d) diagnosing a CNS disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject; and (e) treating the subject diagnosed with the CNS disorder.
In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject.
In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a cerebrospinal fluid sample obtained from a subject that does not have the CNS disorder.
In another aspect, this Example provides a method of treating a CNS disorder in a subject comprising: (a) measuring the methylation status and/or level of one or more genomic loci present in a cerebrospinal fluid sample from the subject prior to a treatment of the CNS disorder; (b) measuring the methylation status and/or level of one or more genomic loci present in a cerebrospinal fluid sample from the subject during the treatment of the CNS disorder; and (c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the cerebrospinal fluid samples from prior to and during the treatment of the CNS disorder indicates the subject is responsive to the treatment. In certain embodiments, the method further comprises (d) administering a different treatment to the subject if the difference in the methylation status and/or level of the one or more genomic loci between the cerebrospinal fluid samples from prior to and during the treatment of the CNS disorder indicates the subject is not responsive to the treatment.
In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment. In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is not responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is not responsive to the treatment.
This Example further provides for algorithms for diagnosing and/or monitoring a subject with a CNS disorder. In certain embodiments, the algorithm can be used to classify a CNS disorder of a subject.
In another aspect, this Example provides a kit for diagnosing, prognosing and/or monitoring a CNS disorder in a subject comprising a means for determining and/or detecting the methylation status of one or more genomic loci. In certain embodiments, the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.
In certain embodiments, the one or more genomic loci are present within nucleic acids isolated from the cerebrospinal fluid sample. In certain embodiments, the one or more genomic loci are present within cell-free nucleic acids isolated from the cerebrospinal fluid sample.
In certain embodiments, the CNS disorder is selected from the group consisting of brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases. In certain embodiments, the subject is human.
Description of this Example
This Example provides methods for diagnosing, prognosing, monitoring, classifying and/or treating CNS disorders, e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases. For example, but not by way of limitation, the methods of this Example include determining the methylation status of one or more genomic loci in a cerebrospinal fluid sample of a subject. In certain embodiments, the methods of this Example include the use of an algorithm to diagnose, prognose, monitor, classify and/or treat CNS disorders.
Definitions of this Example
Unless defined otherwise, all technical and scientific terms used in this Example generally have their ordinary meanings in the art, within the context of this invention and in the specific context where each term is used. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Margham, The Harper Collins Dictionary of Biology (1991). Certain terms are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the compositions and methods of the invention and how to make and use them.
The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. The present disclosure also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.
As used herein, the use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” Still further, the terms “having,” “including,” “containing” and “comprising” are interchangeable and one of skill in the art is cognizant that these terms are open ended terms.
The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.
In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows detection of a disease and/or disorder in an individual, including detection of the disease or the disorder in its early stages. In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows the characterization of a phenotype of a disease and/or a disorder in an individual. Early stage of a disease, as used herein, refers to the time period between the onset of the disease and the time point that signs or symptoms of the disease emerge. In certain non-limiting embodiments, the presence, absence and/or level of a biomarker in a cerebrospinal fluid sample of a subject is determined by comparing to a reference control.
The terms “reference sample,” “reference control,” “control” or “reference,” as used interchangeably herein, refer to a control for a methylation status of a genomic locus that is to be detected in a cerebrospinal fluid sample of a subject. In certain embodiments, a reference sample can be a sample from a healthy individual, e.g., an individual that does not have a CNS disorder (e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases). In certain embodiments, a reference sample can be a sample from a control individual that does not have the disease or phenotype to be detected by a biomarker disclosed herein. In certain embodiments, a control or reference can be the presence, absence and/or a particular level of a methylation state of a genomic locus in a healthy individual. In certain embodiments, a reference can be a predetermined presence, absence and/or particular level of a methylation state of a genomic locus that indicates a subject does not have a CNS disorder. In certain embodiments, a reference can be the methylation status of a locus in an individual having a disease or a phenotype, e.g., an individual that has a CNS disorder (e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases), where the methylation status of the locus is known to be not associated with the disease or the phenotype.
The term “CNS disorder” or “central nervous system disorder,” as used herein, refers to any condition or disease that is caused by damage or disruption to the CNS, including the brain and spinal cord, and results in alterations of the function or structure of the CNS. Non-limiting exemplary CNS disorders include brain and spinal cord tumors, brain and spinal cord infections (e.g., meningitis, brain abscesses and encephalitis), neurodegenerative diseases (e.g., Parkinson's disease, Alzheimer's disease, Huntington's disease, spinocerebellar ataxia, spinal muscular atrophy, motor neuron diseases and prion disease), CNS inflammations (e.g., CNS vasculitis, antibody-mediated inflammatory brain diseases, demyelinating conditions, Rasmussen's encephalitis and neurosarcoidosis, and secondary inflammation that occurs secondary to another disease, e.g., meningitis, in the body), and neuropsychiatric disorders (e.g., depression, schizophrenia and autism spectrum disorders).
The term “slightly invasive or non-invasive method” refers to a method that does not involve the removal of tissues by biopsy from brain or spinal cord. In certain embodiments, slightly invasive or non-invasive methods, as disclosed herein, include obtaining cerebrospinal fluid from a subject by lumbar puncture or spinal tap, ventricular puncture, cisternal puncture or from a shunt or ventricular drain. In certain embodiments, slightly invasive or non-invasive methods, as disclosed herein, include obtaining cerebrospinal fluid from the lymphatic system, e.g., the lymphatic system around nose and pharynx, of a subject.
The term “patient” or “subject,” as used interchangeably herein, refers to any warm-blooded animal, e.g., human or non-human. Non-limiting examples of non-human subjects include non-human primates, dogs, cats, mice, rats, guinea pigs, rabbits, fowl, pigs, horses, cows, goats, sheep, etc. In certain embodiments, the subject is human.
The term “nucleic acid,” “nucleic acid molecule” or “polynucleotide” includes any compound and/or substance that comprises a polymer of nucleotides. Each nucleotide is composed of a base, specifically a purine- or pyrimidine base (i.e., cytosine (C), guanine (G), adenine (A), thymine (T) or uracil (U)), a sugar (i.e., deoxyribose or ribose) and a phosphate group. In certain embodiments, the nucleic acid molecule is described by the sequence of bases, whereby said bases represent the primary structure (linear structure) of a nucleic acid molecule. The sequence of bases is typically represented from 5′ to 3′. These terms encompass deoxyribonucleic acid (DNA) including, e.g., complementary DNA (cDNA) and genomic DNA, ribonucleic acid (RNA), in particular messenger RNA (mRNA), synthetic forms of DNA or RNA, and mixed polymers comprising two or more of these molecules. The herein described nucleic acid molecule can contain naturally occurring or non-naturally occurring nucleotides. Examples of non-naturally occurring nucleotides include modified nucleotide bases with derivatized sugars or phosphate backbone linkages or chemically modified residues.
The term “isolated” (e.g., isolated genomic DNA) refers to a biological component that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, e.g., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acids, e.g., DNA, that have been “isolated” include nucleic acids purified by standard purification methods.
The term “genomic locus” or “genomic DNA locus,” as used herein, refers to any fixed position in a genome. For example, but not by way of limitation, a genomic locus can refer to a genomic element, a chromosomal region, a gene, a region of a gene, e.g., an exon or intron, a regulatory region of a gene, e.g., a promoter or enhancer, a CpG site, a CpG island or a CpG island shore. For example, but not by way of limitation, a genomic locus can include one or more CpG sites, e.g., between about 1 to about 100 CpG sites. In certain embodiments, a genomic locus can be of any particular length, e.g., between about 1 to about 10,000 nucleotides in length.
As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status” and “methylation level” refer to the presence, absence, percentage and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, indicate the methylation state of any of the nucleotides (e.g., cytosines) in the sequence, can indicate the methylation state of a subset of the nucleotides (e.g., of cytosines), can indicate the percentage or fraction of methylated cytosines at any particular stretch of nucleotides within the sequence or can indicate the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
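By way of illustration only, the following minimal sketch shows one way a methylation level could be expressed as a fraction of methylated cytosine calls, both per CpG site and pooled over a genomic locus. The record layout (`CpGCount`) and the function names are hypothetical and are not taken from this disclosure; the read counts are made-up values.

```python
from dataclasses import dataclass


@dataclass
class CpGCount:
    """Read counts at one CpG site (hypothetical record layout)."""
    position: int
    methylated: int    # reads supporting a methylated cytosine
    unmethylated: int  # reads supporting an unmethylated cytosine


def site_fraction(site):
    """Fraction of reads at a single site that support methylation."""
    total = site.methylated + site.unmethylated
    return site.methylated / total if total else None


def locus_methylation_level(sites):
    """Pooled fraction of methylated calls across all sites in a locus."""
    methylated = sum(s.methylated for s in sites)
    total = sum(s.methylated + s.unmethylated for s in sites)
    return methylated / total if total else None


# Illustrative counts only; not data from this disclosure.
sites = [CpGCount(100, 8, 2), CpGCount(120, 5, 5), CpGCount(150, 0, 10)]
per_site = [site_fraction(s) for s in sites]
pooled = locus_methylation_level(sites)
```

Pooling read counts weights each site by its coverage; averaging the per-site fractions instead would weight every site equally. Either choice is consistent with the definition above.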
As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more nucleotides that is/are methylated.
As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.
A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., DNA sequence, that has a high frequency of CpG dinucleotide repeats. See, e.g., Illingworth and Bird, FEBS Letters 2009; 583:1713-1720. For example, but not by way of limitation, Yamada et al. (Genome Research 2004; 14:247-266) have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length and have a GC content greater than 50% and an OCF/ECF ratio greater than 0.6. Others (Takai et al., Proc. Natl. Acad. Sci. U.S.A. 2002; 99:3740-3745) have defined a CpG island less stringently as a sequence of at least 200 nucleotides in length, having a greater than 50% GC content and an OCF/ECF ratio greater than 0.6.
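The criteria above can be sketched as a simple check on a candidate sequence. This sketch assumes that the OCF/ECF ratio denotes the observed-to-expected CpG ratio and computes it with a commonly used formula (number of CpG dinucleotides divided by (number of C × number of G) / sequence length); that formula is an assumption and is not stated in this disclosure. The function names and default thresholds are illustrative.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio (assumed formula, see lead-in)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")        # CpG dinucleotides on this strand
    expected = c * g / len(seq)       # expectation under independence
    return observed / expected


def meets_island_criteria(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the length, GC-content and ratio thresholds to one sequence."""
    return (
        len(seq) >= min_length
        and gc_content(seq) > min_gc
        and cpg_obs_exp(seq) > min_obs_exp
    )


# Synthetic 200-nucleotide sequences, for illustration only.
cpg_rich = "CG" * 100
at_rich = "AT" * 100
less_stringent = meets_island_criteria(cpg_rich)                  # 200-nt minimum
more_stringent = meets_island_criteria(cpg_rich, min_length=400)  # 400-nt minimum
```

Setting `min_length=400` corresponds to the more stringent length standard described above, and the default of 200 to the less stringent one.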
A “CpG island shore,” as used herein, refers to methylation hotspots that are present a short distance, e.g., less than 2 kb, from CpG islands.
The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome) or a portion of the subset (e.g., those areas found to be associated with a CNS disorder). A methylome from cerebrospinal fluid can be referred to as a “cerebrospinal fluid methylome,” or a “cerebrospinal fluid DNA methylome.” The cerebrospinal fluid methylome is an example of a cell-free methylome since cerebrospinal fluid includes cell-free DNA (cfDNA).
As used herein, the term “increase” refers to alter positively by at least about 2%, including, but not limited to, alter positively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
As used herein, the terms “reduce,” “reduction” or “decrease” refer to alter negatively by at least about 2%, including, but not limited to, alter negatively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
Methods of this Example
This Example provides methods for diagnosing, monitoring, classifying and/or treating CNS disorders in a subject, e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases, by analyzing the methylation status of one or more genomic loci in a cerebrospinal fluid sample of the subject. In certain embodiments, the methods can utilize the algorithm disclosed herein. In certain embodiments, methods of this Example allow the early diagnosis or screening of a subject with a CNS disorder, e.g., the subject does not have any symptoms, or only has early symptoms of the CNS disorder, e.g., headache, vomiting, and nausea.
In certain embodiments, the cerebrospinal fluid samples obtained for use in this Example comprise cfDNA, which carries DNA methylation information from the cell of origin. cfDNA can arise from cellular apoptosis and necrosis, and can be generated from active secretory processes, with the formation of extracellular vesicles. DNA signatures are highly tissue-specific, and include in vivo information relating to the tissue source of cfDNA. In certain embodiments, this Example comprises analyzing cfDNA in a cerebrospinal fluid sample, to identify genetic phenotypes that are drivers and/or consequences of CNS disorders.
The cerebrospinal fluid sample from the subject can be collected using any suitable methods known in the art. In certain embodiments, the cerebrospinal fluid is collected by a lumbar puncture or a spinal tap, where a hollow needle is inserted into the spinal canal to collect the cerebrospinal fluid. In certain embodiments, the cerebrospinal fluid is collected by a cisternal puncture, where a needle is placed below the occipital bone to collect the cerebrospinal fluid. In certain embodiments, the cerebrospinal fluid is collected by a ventricular puncture, where a needle is inserted directly into one of the brain's ventricles. In certain embodiments, a fluoroscopy is used to assist in guiding the insertion of the needle. In certain embodiments, the cerebrospinal fluid is collected from a tube that is part of a ventriculoperitoneal shunt or an external ventricular drain.
In certain embodiments, the cerebrospinal fluid sample is collected from the subject before the subject has any symptom of the CNS disorder, i.e., a non-symptomatic subject. In certain embodiments, the cerebrospinal fluid sample is collected from the non-symptomatic subject who is at high risk of the CNS disorder, e.g., multiple members of the subject's family have a history of cancer or the subject suffered a brain trauma before. In certain embodiments, the cerebrospinal fluid sample is collected from the subject who has at least one early symptom of the CNS disorder, e.g., headache, vomiting, and nausea. In certain embodiments, the cerebrospinal fluid sample is collected from the subject who has at least one symptom of the CNS disorder, e.g., persistent headache; pain in the face, back, arms, or legs; an inability to concentrate; loss of feeling; memory loss; loss of muscle strength; tremors; seizures; increased reflexes, spasticity, tics; paralysis; and slurred speech. In certain embodiments, the cerebrospinal fluid sample is collected from the subject who has previously received or is currently receiving a treatment for a CNS disorder. In certain embodiments, two or more cerebrospinal fluid samples (e.g., two or more, three or more, four or more, five or more, six or more or seven or more cerebrospinal fluid samples) can be obtained before and while the subject is receiving treatment for the CNS disorder (e.g., serially obtained samples).
Diagnostic, Prognostic, Classification, and Monitoring Methods of this Example
This Example provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, but not by way of limitation, this Example provides methods for diagnosing, prognosing, classifying and/or monitoring a CNS disorder in a subject that includes analyzing the methylation status of certain genomic loci.
Non-limiting examples of CNS disorders that can be diagnosed, monitored and/or treated by the presently disclosed subject matter include brain and spinal cord tumors, brain and spinal cord infections (e.g., meningitis, brain abscesses and encephalitis), neurodegenerative diseases (e.g., Parkinson's disease, Alzheimer's disease, Huntington's disease, spinocerebellar ataxia, spinal muscular atrophy, motor neuron diseases and prion disease), CNS inflammations (e.g., CNS vasculitis, antibody-mediated inflammatory brain diseases, demyelinating conditions, Rasmussen's encephalitis and neurosarcoidosis and secondary inflammation that occurs second to another disease, e.g., meningitis, in the body), and neuropsychiatric disorders (e.g., depression, schizophrenia and autism spectrum disorders).
In certain embodiments, the analyzed genomic loci can include one or more genomic loci that exhibit differential methylation in a cerebrospinal fluid sample from a subject that has a CNS disorder compared to a reference sample. For example, but not by way of limitation, the methods of this Example include assessing the methylation status of one or more genomic loci, e.g., about 5 or more, about 10 or more, about 50 or more, about 100 or more, about 500 or more, about 1,000 or more, about 5,000 or more, about 10,000 or more, about 25,000 or more, about 50,000 or more or about 100,000 or more genomic loci in a cerebrospinal fluid sample of a subject. In certain embodiments, the genomic loci can be selected from the genes, or a region within the genes, provided in Tables 8 & 9.
In certain embodiments, this Example provides for diagnosing, prognosing and/or monitoring a CNS disorder in a subject by detecting the DNA methylation profiles associated with the CNS disorder. In certain embodiments, the methods of this Example include (a) obtaining a cerebrospinal fluid sample from the subject, (b) determining the methylation status of one or more genomic loci present in the cerebrospinal fluid sample, e.g., present within cfDNA in the cerebrospinal fluid sample, (c) comparing the methylation status of the one or more genomic loci to a reference and (d) diagnosing a CNS disorder in the subject. In certain embodiments, the difference in the methylation status of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject. In certain embodiments, the difference in the methylation status can also indicate the severity of the CNS disorder.
In certain embodiments, the methods disclosed herein for diagnosing, prognosing and/or monitoring a CNS disorder in a subject can include (a) obtaining a cerebrospinal fluid sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the cerebrospinal fluid sample, (c) comparing the level of methylation of the one or more genomic loci to a reference; and (d) diagnosing a CNS disorder in the subject. In certain embodiments, the difference in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject. In certain embodiments, the difference in the methylation level can also indicate the severity of the CNS disorder.
In certain embodiments, diagnosing a CNS disorder in the subject includes characterizing a phenotype of the CNS disorder, wherein the difference in the methylation status of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the phenotype of the CNS disorder. In certain embodiments, the phenotype of the CNS disorder includes the severity of the CNS disorder, prognosis of the CNS disorder, molecular expression profile of the CNS disorder, responsiveness of the CNS disorder to certain treatments, or any combinations thereof.
In certain embodiments, the methods disclosed herein for determining if a subject is at risk of developing a CNS disorder can include (a) obtaining a cerebrospinal fluid sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the cerebrospinal fluid sample, (c) comparing the level of methylation of the one or more genomic loci to a reference; and (d) determining that the subject is at risk of developing a CNS disorder, wherein the difference in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates that the subject is at risk.
In certain embodiments, diagnosing, prognosing and/or monitoring of a subject with a CNS disorder can be based on a higher or lower methylation level of the genomic locus in the cerebrospinal fluid sample of the subject relative to the methylation level in a reference sample, e.g., a cerebrospinal fluid sample from a subject that does not have the CNS disorder. In certain embodiments, a difference of greater than about 5%, greater than about 10%, greater than about 15%, greater than about 20%, greater than about 25%, greater than about 30%, greater than about 35%, greater than about 40%, greater than about 45%, greater than about 50%, greater than about 55%, greater than about 60%, greater than about 65%, greater than about 70%, greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90% or greater than about 95% in the methylation (e.g., level, percentage and/or fraction) of the one or more genomic loci in a cerebrospinal fluid sample obtained from a subject compared to a control is indicative that the subject has the CNS disorder or is at risk of developing the CNS disorder. In certain embodiments, the difference can be a decrease in methylation (e.g., level, percentage and/or fraction) of the genomic loci in the cerebrospinal fluid sample of the subject. Alternatively, the difference can be an increase in methylation (e.g., level, percentage and/or fraction) of the genomic loci in the cerebrospinal fluid sample of the subject. In certain embodiments, the difference can be a decrease in the methylation of a genomic locus and an increase in the methylation of a different genomic locus in the sample obtained from the subject. In certain embodiments, a decrease in the level of methylation of one or more genomic loci in the cerebrospinal fluid sample and the increase in the level of methylation of one or more different genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder.
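As a non-limiting illustration, the comparison of a subject's methylation levels to a reference can be sketched as follows. The sketch assumes that the stated difference is an absolute difference between methylation fractions on a 0 to 1 scale (the text above does not specify absolute versus relative difference), and the function name, locus labels and values are hypothetical.

```python
def differential_loci(sample, reference, min_abs_diff=0.10):
    """Return {locus: signed difference} for loci whose methylation level
    differs from the reference by more than min_abs_diff.

    A positive value is an increase in methylation relative to the
    reference; a negative value is a decrease.
    """
    flagged = {}
    for locus, level in sample.items():
        if locus not in reference:
            continue  # no reference value available for this locus
        diff = level - reference[locus]
        if abs(diff) > min_abs_diff:
            flagged[locus] = diff
    return flagged


# Made-up methylation fractions, for illustration only.
subject_levels = {"locus_A": 0.80, "locus_B": 0.20, "locus_C": 0.50}
reference_levels = {"locus_A": 0.30, "locus_B": 0.55, "locus_C": 0.52}

flagged = differential_loci(subject_levels, reference_levels)
hypermethylated = sorted(k for k, d in flagged.items() if d > 0)
hypomethylated = sorted(k for k, d in flagged.items() if d < 0)
```

Keeping the sign of each difference allows a pattern in which one locus is decreased and a different locus is increased, as described above, to be recognized from a single pass.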
In certain embodiments, diagnosis of a subject with a CNS disorder can be based on the methylated or unmethylated state of a genomic locus, e.g., a CpG site. In certain embodiments, a genomic locus, e.g., a CpG site, in a sample from a subject diagnosed with a CNS disorder can be methylated and the genomic locus, e.g., the CpG site, in a reference sample can be unmethylated. In certain embodiments, a genomic locus in a sample from a subject diagnosed with a CNS disorder can be unmethylated and the genomic locus in a reference sample can be methylated.
Diagnostic, Prognostic, Classification and Monitoring Methods Using an Algorithm of this Example
This Example further provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci by using an algorithm, as disclosed in Example Embodiment E. For example, but not by way of limitation, this Example provides methods for diagnosing, prognosing, classifying and/or monitoring a CNS disorder in a subject that includes analyzing the methylation status of certain genomic loci and/or genomic fractions.
Methods of Treatment and/or Prevention of this Example
This Example further provides methods of treating a subject with a CNS disorder. Non-limiting examples of CNS disorders that can be treated by the presently disclosed methods include brain and spinal cord tumors, brain and spinal cord infections (e.g., meningitis, brain abscesses and encephalitis), neurodegenerative diseases (e.g., Parkinson's disease, Alzheimer's disease, Huntington's disease, spinocerebellar ataxia, spinal muscular atrophy, motor neuron diseases and prion disease), CNS inflammations (e.g., CNS vasculitis, antibody-mediated inflammatory brain diseases, demyelinating conditions, Rasmussen's encephalitis and neurosarcoidosis and secondary inflammation that occurs second to another disease, e.g., meningitis, in the body), and neuropsychiatric disorders (e.g., depression, schizophrenia and autism spectrum disorders). This Example also provides methods of preventing the development of a CNS disorder in a subject.
In certain embodiments, the methods disclosed herein can include diagnosing a subject with a CNS disorder as disclosed herein, followed by treatment of the subject. Any diagnosis methods of this Example can be used with the methods for treating a subject. In certain embodiments, the methods of this Example can include determining that a subject is at risk of developing a CNS disorder as disclosed herein, followed by a method for preventing the development of the CNS disorder in the subject.
In certain embodiments, the treatment methods can include (a) obtaining a cerebrospinal fluid sample from the subject, (b) determining the methylation status and/or level of one or more genomic loci present in the cerebrospinal fluid sample, (c) comparing the methylation status and/or level of the one or more genomic loci to a reference, (d) diagnosing a CNS disorder in the subject, where the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject and (e) treating the subject diagnosed with the CNS disorder.
In certain embodiments, the methods of treatment can include diagnosing the subject with a CNS disorder by use of the algorithm disclosed herein. For example, but not by way of limitation, the treatment method can include (a) diagnosing the subject with a CNS disorder by use of the disclosed algorithm and (b) treating the subject diagnosed with the CNS disorder.
In certain embodiments, the prophylactic methods can include (a) obtaining a cerebrospinal fluid sample from the subject, (b) determining the methylation status and/or level of one or more genomic loci present in the cerebrospinal fluid sample, (c) comparing the methylation status and/or level of the one or more genomic loci to a reference, (d) determining that the subject is at risk of developing a CNS disorder, where the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates that the subject is at risk of developing a CNS disorder and (e) preventing the subject from developing the CNS disorder.
In certain embodiments, the prophylactic methods can include (a) determining that the subject is at risk of developing a CNS disorder by use of the disclosed algorithm of this Example and (b) preventing the subject from developing the CNS disorder.
Any suitable treatment methods and preventative methods known in the art can be used with the presently disclosed methods for treating and preventing CNS disorders. For example, but not by way of limitation, the subject can be treated by administration of a medication to treat the CNS disorder. Non-limiting examples of treatment methods for CNS disorders including depression, epilepsy, multiple sclerosis (MS), neurodegenerative diseases (e.g., Alzheimer's disease), neuropathic pain and schizophrenia are disclosed in DiNunzio and Williams, “CNS Disorders—Current Treatment Options and the Prospects for Advanced Therapies,” Drug Development and Industrial Pharmacy, 34:11:1141-1167 (2008).
In certain embodiments, the information provided by the methods described herein can be used by a physician in determining the most effective course of treatment (e.g., preventative or therapeutic) for the subject. A course of treatment refers to the measures taken for a patient after the prognosis or the assessment of increased risk for development of a CNS disorder is made. For example, when a subject is identified to have an increased risk of developing a CNS disorder, the physician can determine whether frequent monitoring of DNA methylation changes can be performed as a prophylactic measure. Also, when a subject is diagnosed with a CNS disorder (e.g., based on the presence of a DNA methylation pattern in a sample from a subject), it can be advantageous to follow such detection with a therapeutic treatment.
In certain embodiments, this Example further provides methods for assessing the efficacy of a therapeutic or prophylactic therapy for preventing, inhibiting or treating a CNS disorder in a subject, comprising determining the methylation status of one or more genomic loci present in a cerebrospinal fluid sample obtained from a subject prior to the therapy and determining methylation status of the one or more genomic loci present in a cerebrospinal fluid sample obtained from the subject at one or more time points during the therapeutic or prophylactic therapy, wherein the therapy is efficacious for preventing, inhibiting and/or treating a CNS disorder in a subject when there is a change in the presence and/or level of methylation of the one or more genomic loci in the second or subsequent samples, relative to the first sample.
In certain embodiments, the first sample is obtained after therapeutic treatment has begun.
In certain embodiments, the methods for monitoring the response in a subject to prophylactic or therapeutic treatment can include measuring the methylation status and/or level of one or more genomic loci in a cerebrospinal fluid sample of a subject at a first timepoint, administering a therapeutic agent, re-measuring the methylation status and/or level of the one or more genomic loci at a second timepoint, comparing the results of the first and second measurements and optionally modifying the treatment regimen based on the comparison. In certain embodiments, the first timepoint is prior to an administration of the therapeutic agent, and the second timepoint is after said administration of the therapeutic agent. In certain embodiments, the first timepoint is prior to the administration of the therapeutic agent to the subject for the first time. In certain embodiments, the dose (defined as the quantity of therapeutic agent administered at any one administration) is increased or decreased in response to the comparison. In certain embodiments, the dosing interval (defined as the time between successive administrations) can be increased or decreased in response to the comparison, including total discontinuation of treatment. In addition, the method of the present disclosure can be used to determine the efficacy of the therapeutic treatment, wherein a change in the methylation status of certain genomic loci present in a cerebrospinal fluid sample of a subject can indicate that the therapeutic treatment regimen can be altered, reduced and/or stopped.
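The comparison of serially obtained samples can be sketched, under stated assumptions, as a per-locus change relative to the first sample. The data layout (an ordered list of labeled samples, each mapping a locus to a methylation fraction), the function name and the values are hypothetical, and how a given change should inform dosing is a clinical judgment that this sketch does not attempt to encode.

```python
def change_from_baseline(samples):
    """Compute per-locus change of each later sample relative to the first.

    samples: list of (label, {locus: methylation_fraction}) in collection
    order; the first entry is treated as the baseline (first timepoint).
    Only loci measured in both samples are compared.
    """
    if not samples:
        return []
    _, baseline = samples[0]
    changes = []
    for label, levels in samples[1:]:
        shared = baseline.keys() & levels.keys()
        delta = {locus: levels[locus] - baseline[locus] for locus in sorted(shared)}
        changes.append((label, delta))
    return changes


# Made-up values for a first timepoint and two later timepoints.
series = [
    ("first_timepoint", {"locus_A": 0.8, "locus_B": 0.2}),
    ("second_timepoint", {"locus_A": 0.6, "locus_B": 0.2}),
    ("third_timepoint", {"locus_A": 0.4}),
]
changes = change_from_baseline(series)
```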
Assays of this Example
This Example further provides assays and/or methods for determining the DNA methylation status and/or level of genomic loci that correlates with the presence, absence and/or severity of a CNS disorder.
In certain embodiments, the assay method of this Example can include comparing the methylation status and/or level of genomic loci present in a cerebrospinal fluid sample from a subject that has a CNS disorder to the methylation status and/or level of genomic loci in a cerebrospinal fluid sample from a healthy subject to determine the methylation pattern, as disclosed above, that correlates with the presence of the CNS disorder.
In certain embodiments, the assay methods of this Example can include comparing the methylation status and/or level of genomic loci in a cerebrospinal fluid sample from a subject that has a CNS disorder at an early stage to the methylation status and/or level of genomic loci in a cerebrospinal fluid sample from a subject that has the CNS disorder at a late stage to determine the methylation status and/or level that correlates with the different stages and/or severity of the CNS disorder.
Non-limiting examples of CNS disorders include brain and spinal cord tumors, brain and spinal cord infections (e.g., meningitis, brain abscesses and encephalitis), neurodegenerative diseases (e.g., Parkinson's disease, Alzheimer's disease, Huntington's disease, spinocerebellar ataxia, spinal muscular atrophy, motor neuron diseases and prion disease), CNS inflammations (e.g., CNS vasculitis, antibody-mediated inflammatory brain diseases, demyelinating conditions, Rasmussen's encephalitis and neurosarcoidosis, and secondary inflammation that occurs second to another disease, e.g., meningitis, in the body), and neuropsychiatric disorders (e.g., depression, schizophrenia and autism spectrum disorders).
DNA Isolation Techniques of this Example
In certain embodiments, the methods of this Example include isolating nucleic acid from a cerebrospinal fluid sample obtained from a subject. There are several platforms that are known in the art and currently available to isolate nucleic acids from cerebrospinal fluid samples. For example, but not by way of limitation, isolation of DNA from a cerebrospinal fluid sample can be performed by extraction methods using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (see, for example, J. Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Additional non-limiting examples include salting out DNA extraction (see, for example, P. Sunnucks et al., Genetics, 1996, 144:747-756; and S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25:4692-4693), the trimethylammonium bromide salts DNA extraction method (see, for example, S. Gustincich et al., BioTechniques, 1991, 11:298-302) and the guanidinium thiocyanate DNA extraction method (see, for example, J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). There are also numerous commercially available kits that can be used to extract DNA from biological fluids (e.g., cerebrospinal fluid) or cells. For example, Qiagen's Gentra PureGene Cell Kit, QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, DNeasy Blood and Tissue Kit or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.) and GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.) can be used to obtain DNA from a cerebrospinal fluid sample from a subject.
Methylation Detection Techniques of this Example
Various methylation analysis procedures are known in the art, and can be used with the methods of this Example. These assays allow for determination of the methylation state of a genomic locus, e.g., one or more CpG sites or islands within a nucleic acid obtained from a cerebrospinal fluid sample. In addition, the methods can be used to quantify the methylation of a genomic locus. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), digital PCR and use of methylation-sensitive restriction enzymes.
In certain embodiments, methylation-specific PCR can be used to determine the methylation status of a genomic locus. Methylation-specific PCR is based on a chemical reaction of sodium bisulfite with DNA that converts unmethylated cytosines, e.g., of CpG dinucleotides, to uracil or UpG, followed by traditional PCR. Methylated cytosines will not be converted in this process and primers can be designed to overlap the methylation site, e.g., CpG site, of interest, thereby allowing one to determine the methylation status of the methylation site as methylated or unmethylated. Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA may be used, e.g., by using the method described by Sadri & Hornsby (Nucl. Acids Res. 1996; 24:5058-5059) or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 1997; 25:2532-2534).
In certain embodiments, whole genome bisulfite sequencing, which is a high-throughput genome-wide analysis of DNA methylation, can be used to determine the methylation status of multiple genomic loci. It is based on sodium bisulfite conversion of genomic DNA, as described above, which is then sequenced on a next-generation sequencing platform. The sequences obtained are then re-aligned to the reference genome to determine the methylation states of cytosines, e.g., of CpG dinucleotides, present within the analyzed genomic loci based on mismatches resulting from the conversion of unmethylated cytosines into uracil.
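The principle of reading methylation states from conversion mismatches can be illustrated with a toy sketch. It simulates conversion on a single forward strand only, represents each converted cytosine as T (uracil is read as thymine after amplification and sequencing), and assumes the read is already aligned to the reference without gaps. The function names and the example sequence are hypothetical; a real pipeline would rely on a dedicated bisulfite-aware aligner and would also handle the reverse strand.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite conversion of one forward-strand sequence.

    Unmethylated C is emitted as T; methylated C (0-based positions in
    methylated_positions) is protected and remains C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_cpg_methylation(reference, read, offset=0):
    """Call methylation at reference CpG cytosines covered by a read.

    Returns {reference_position: True/False}, where True means the read
    retained C (methylated) and False means it shows T (unmethylated).
    Other bases at a CpG cytosine are ignored as uninformative.
    """
    reference = reference.upper()
    calls = {}
    for i, base in enumerate(read.upper()):
        pos = offset + i
        if reference[pos:pos + 2] == "CG":
            if base == "C":
                calls[pos] = True
            elif base == "T":
                calls[pos] = False
    return calls


# Toy reference with CpG cytosines at positions 1, 5 and 9 and a
# non-CpG cytosine at position 8; positions 1 and 9 are methylated.
reference = "ACGTTCGACCGA"
read = bisulfite_convert(reference, methylated_positions={1, 9})
calls = call_cpg_methylation(reference, read)
```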
In certain embodiments, genome-wide DNA methylation profiling can be performed using commercially-available arrays, thereby allowing the interrogation of multiple genomic loci, e.g., multiple CpG sites. Non-limiting examples of such arrays include HumanMethylation BeadChips (Illumina, San Diego, Calif.) and Infinium MethylationEPIC kit (Illumina). Additional methods for analyzing the methylation state of multiple genomic loci are provided in Yong et al., Epigenetics & Chromatin 2016; 9:26, which is incorporated by reference herein.
Kits of this Example
This Example provides kits for diagnosing, monitoring, classifying and/or treating a subject with a CNS disorder. The kits of this Example comprise a means for determining and/or detecting the methylation status of one or more genomic loci.
Types of kits of this Example include, but are not limited to, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays, which further contain one or more probes, primers or other detection reagents for determining the methylation state and/or level of one or more genomic loci. For example, but not by way of limitation, a kit of this Example can include one or more probes or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the one or more genomic loci comprise a CpG site. In certain embodiments, one or more of the genomic loci do not comprise a CpG site. For example, but not by way of limitation, about 5% or more, about 10% or more, about 15% or more, about 20% or more, about 25% or more, about 30% or more, about 35% or more, about 40% or more, about 45% or more, about 50% or more, about 55% or more, about 60% or more, about 65% or more or about 70% or more of the one or more genomic loci detected by the primers or probes of this Example comprise one or more CpG sites.
In certain non-limiting embodiments, a primer and/or probe of this Example can be at least about 10 nucleotides or at least about 15 nucleotides or at least about 20 nucleotides in length and/or up to about 200 nucleotides or up to about 150 nucleotides or up to about 100 nucleotides or up to about 75 nucleotides or up to about 50 nucleotides in length.
In a further non-limiting embodiment, the oligonucleotide primers and/or probes can be immobilized on a solid surface or support, for example, on a nucleic acid microarray, wherein the position of each oligonucleotide primer and/or probe bound to the solid surface or support is known and identifiable.
In certain non-limiting embodiments, the kits of this Example additionally include other components such as, but not limited to, a buffer, enzymes such as DNA polymerases or ligases, nucleotides such as deoxynucleotide triphosphates, positive control sequences, negative control sequences and the like necessary to carry out an assay or reaction to detect the methylation state of a genomic locus.
In certain embodiments, the kits of this Example include a container comprising one or more probes and/or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the kits further include instructions for use, e.g., the instructions can describe that a particular methylation status of a genomic locus is indicative of a CNS disorder in a subject. The instructions can be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card or folder supplied in or with the container.
Reports, Programmed Computers, and Systems of this Example
In certain embodiments, the presently disclosed diagnosis and/or monitoring of a CNS disorder in a subject based on the methylation status of one or more genomic loci can be referred to herein as a “report.” A tangible report can optionally be generated as part of a testing process (which can be interchangeably referred to herein as “reporting,” or as “providing” a report, “producing” a report or “generating” a report).
Examples of tangible reports can include, but are not limited to, reports in paper (such as computer-generated printouts of test results) or equivalent formats and reports stored on computer readable medium (such as a CD, USB flash drive or other removable storage device, computer hard drive, or computer network server, etc.). Reports, particularly those stored on computer readable medium, can be part of a database, which can optionally be accessible via the internet (such as a database of patient records or genetic information stored on a computer network server, which can be a “secure database” that has security features that limit access to the report, such as to allow only the patient and the patient's medical practitioners to view the report while preventing other unauthorized individuals from viewing the report, for example). In addition to, or as an alternative to, generating a tangible report, reports can also be displayed on a computer screen (or the display of another electronic device or instrument).
A report can include, for example, an individual's medical history, or can just include size, presence, absence or levels of one or more markers (for example, a report on computer readable medium such as a network server can include hyperlink(s) to one or more journal publications or websites that describe the medical/biological implications). Thus, for example, the report can include information of medical/biological significance as well as optionally also including information regarding the methylation status of relevant genomic loci, or the report can just include information regarding the methylation status of relevant genomic loci without other medical/biological significance.
A report can further be “transmitted” or “communicated” (these terms can be used herein interchangeably), such as to the individual who was tested, a medical practitioner (e.g., a doctor, nurse, clinical laboratory practitioner, genetic counselor, etc.), a healthcare organization, a clinical laboratory, and/or any other party or requester intended to view or possess the report. The act of “transmitting” or “communicating” a report can be by any means known in the art, based on the format of the report. Furthermore, “transmitting” or “communicating” a report can include delivering a report (“pushing”) and/or retrieving (“pulling”) a report. For example, reports can be transmitted/communicated by various means, including being physically transferred between parties (such as for reports in paper format) such as by being physically delivered from one party to another, or by being transmitted electronically or in signal form (e.g., via e-mail or over the internet, by facsimile, and/or by any wired or wireless communication methods known in the art) such as by being retrieved from a database stored on a computer network server, etc.
In certain embodiments, the disclosed subject matter provides computers (or other apparatus/devices such as biomedical devices or laboratory instrumentation) programmed to carry out the methods described herein, e.g., to perform the algorithm of this Example (see Example Embodiment E). In certain embodiments, the system can be controlled by the individual and/or their medical practitioner in that the individual and/or their medical practitioner requests the test, receives the test results back and (optionally) acts on the test results to reduce the individual's CNS disorder risk or treat the individual, such as by implementing a disorder management system.
The following Example Embodiments are offered to more fully illustrate the disclosure of this Example, but are not to be construed as limiting the scope thereof.
Example Embodiment D of Example 2: Non-Invasive Molecular Phenotyping of the Human Brain Via Epigenomic Liquid Biopsy of Cerebrospinal Fluid
Example Embodiment D discovered methods of epigenomic liquid biopsy for the comprehensive analysis of cell-free DNA (cfDNA) methylation signatures in the human central nervous system (CNS). Example Embodiment D used solution phase hybridization and high throughput bisulfite sequencing to compare DNA methylation signatures of cfDNA obtained from cerebrospinal fluid (CSF) and plasma. Recovery of cfDNA from CSF was relatively low (68 to 840 pg/mL CSF) compared to plasma. Distributions of CpG methylation were significantly altered between CSF and plasma, both generally and at the level of specific functional elements such as exons, introns, CpG islands and shores. Sliding window analysis was used to identify differentially methylated CpG sites. Example Embodiment D found numerous gene/locus-specific differences in CpG methylation between cfDNA from CSF and plasma. These loci were more likely to be hypomethylated in CSF compared to plasma. Differentially methylated CpGs in CSF were identified in genes related to branching of neurites and neuronal development. Example Embodiment D found a clear association between tissue-specific gene expression in the CNS and cfDNA methylation patterns in CSF. Ingenuity Pathways Analysis (IPA) of differentially methylated regions identified an enrichment of functional pathways related to neurobiology. The GTEx RNA expression database was used to analyze the presently disclosed data in the context of CNS-specific gene expression. In conclusion, Example Embodiment D was the first comprehensive quantitative genome-wide analysis of DNA methylation in human CSF. The presently disclosed methods of this Example can be used for epigenomic liquid biopsy of the human CNS for molecular phenotyping of brain-derived DNA methylation signatures.
Example Embodiment D obtained CSF from healthy volunteers, and recovered brain-derived cfDNA from CSF. Example Embodiment D also quantified and catalogued the brain-derived cfDNA in a genome-wide fashion at the level of DNA methylation via bisulfite sequencing. In order to generate preliminary functional insights into brain-specific molecular phenotypes, solution-phase hybridization coupled with high-throughput sequencing was used to compare DNA methylation of cfDNA from CSF with that from plasma.
Methods
Human Samples
Plasma and CSF samples were processed for cell-free nucleic acid analysis as previously described (Chu T et al., PLoS One. 2017; 12(3):e0171882). Lumbar puncture for CSF collection from healthy human volunteers was performed under fluoroscopic guidance by the Neuroradiology Department at UPMC. All participants provided informed consent. CSF donors had no personal or first-degree relative history of psychiatric disorder or suicidal behaviour.
Preparation and Analysis of Bisulfite DNA Sequencing Libraries
DNA sequencing libraries were prepared using the SeqCap-Epi kit (Roche). Libraries were sequenced on an Illumina HiSeq 2500 instrument. Reads were quality trimmed and adaptor sequences were removed using Trim-Galore. Reads were aligned using Bismark in paired-end, bowtie2 mode. Unmapped reads were then aligned using Bismark in single-end, bowtie2 mode. Duplicates were removed (Bismark). Methylation was called on the paired-end and single-end files, and the calls were then merged. CpG methylation information for autosomes was read into MethylSig for differential methylation analysis.
Results
Example Embodiment D examined the physical properties of cfDNA in CSF. Following extraction, DNA amount was quantified by real time PCR. The resulting yields for a series of CSF samples are shown in Table 5A.
Quantities of cfDNA are low, ranging from 68 pg to 840 pg per mL of input CSF (mean 268 pg/mL). This is significantly lower than cfDNA recovery from plasma (Table 5B).
Example Embodiment D then explored the size distribution of cfDNA from CSF and compared this to plasma. It was found that cfDNA fragments recovered from CSF had a mean size of approximately 155 bp, which was approximately 20 bp less than the mean fragment size of cfDNA recovered from plasma (Table 6).
Example Embodiment D performed solution phase hybridization capture of bisulfite-converted cfDNA from two CSF and two plasma samples. To further explore size differences between CSF and plasma, fragment size was plotted against read count, which confirmed that cfDNA fragments from CSF were significantly shorter in length than those from plasma (
To explore the distributions of CpG methylation in cfDNA from CSF, Example Embodiment D examined the percentages of CpGs within cfDNA fragments that are methylated at low methylation (LM) levels (<20%), high methylation (HM) levels (>80%) or at intermediate methylation (IM) levels (20-80%) in both CSF and plasma. It was found that CpG sites in CSF, as in plasma, are largely distributed in a biphasic manner with large numbers of LM sites and very few IM sites. This is consistent with previous reports from DNA methylation analysis in differentiated intact adult cells (Chu T et al., PLoS One. 2011; 6(2):e14723). cfDNA in CSF is represented by fewer IM sites than plasma and greater numbers of LM and HM sites (
Example Embodiment D further explored distributions of CpG methylation in the context of distinct genomic elements, specifically introns, exons, promoters, CpG islands and CpG island shores. The relationship between Group (CSF/Plasma) and Response (<20%, 20-80% and >80%) was tested, after controlling for Category (intergenic region, intron, exon or promoter) using Cochran-Mantel-Haenszel test. The p-value (<0.0001) indicates that the numbers of sites at <20%, 20-80% and >80% are significantly different between CSF and Plasma after adjusting for category. Specifically, distribution of DNA methylation varies across CSF and plasma samples adjusted for categories (intergenic, intron, exon or promoter) such that the CSF DNA methylome contains a greater percentage of >80% methylation in exons and introns compared to plasma (
Example Embodiment D further explored how the methylation signatures in cfDNA fragments from CSF differ from those of plasma in a gene/locus-specific fashion. A sliding windows analysis was performed to identify genomic regions (250 bp) whose CpG methylation characteristics differ significantly between cfDNA from CSF and plasma. As shown in
To search for biological themes within the data, Ingenuity Pathways Analysis (IPA) was performed to search for ontological patterns that characterize the sample-specific differentially methylated regions. The most significant canonical pathway identified was “axonal guidance”, containing a number of differentially methylated loci including those that map to ARHGEF7, HKR1, ITGB1, PLCH2, PRKCZ, RGS3, and WNT8A. All 45 “top diseases and bio-functions” identified by IPA were related to neurobiology, with the top 3 consisting of the following: 1) branching of neurites, 2) development of neurons, and 3) developmental process of synapse (Table 8).
To further explore the functional significance of the data, Example Embodiment D used the Genotype-Tissue Expression (GTEx) project database. GTEx is an on-going effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Example Embodiment D identified genes whose expressions are low in whole blood and highly expressed in neuronal tissues. It was rationalized that leukocytes are the major (though probably not exclusive) contributors to cfDNA in plasma whereas neuronal tissues are the primary contributors to cfDNA in CSF. Thus, by identifying tissue-specific differences in gene expression in this context, it was hoped to illuminate the DNA methylation differences identified between cfDNA from plasma and CSF. As expected, a significant number of the differentially methylated regions overlap with genes whose expressions are high in the brain and low in whole blood. Examples of these genes are shown in Table 9 and
Of twenty-nine windows that map to genes whose expression is at least 5-fold higher in brain than in whole blood, 26 were found to be significantly hypomethylated in cfDNA from CSF compared to plasma. Examples of genes displaying elevated gene expression and altered DNA methylation in brain versus whole blood include: KLF15, which is thought to play a critical role in the maintenance of neural stem cells at late embryonic stages and functions as a transcriptional activator to promote dopamine D2 receptor expression in neurons (Ohtsuka T et al., Stem Cells. 2011; 29(11):1817-28; Zhou J et al., Biochem Biophys Res Commun. 2017; 492(2):269-74); KCNK9, which encodes the protein TASK-3 (a potassium channel protein containing a two pore-forming P domain) and is highly expressed in the cerebellum; the synaptic protein-encoding gene ADGRA1 (Pandya N J et al., Sci Rep. 2017; 7(1):12107), which is highly expressed in the frontal cortex; MYRIP, which encodes a scaffolding protein involved in exocytosis and is expressed most highly in the amygdala, anterior cingulate cortex and cerebellum; the transcriptional repressor FOXN3, which is very highly expressed in the cerebellum; and RAPGEF2, which is involved in cerebral cortex development and D1 dopamine receptor-dependent ERK phosphorylation in brain (Jiang S Z et al., eNeuro. 2017; 4(5); Ye T et al., Nat Commun. 2014; 5:4826).
Discussion of this Example
The presently disclosed subject matter of this Example relates to a comprehensive quantitative genome-wide analysis of DNA methylation in human cerebrospinal fluid. The presently disclosed methods and the resulting data of this Example demonstrate that epigenomic liquid biopsy of the human central nervous system can be used for molecular phenotyping of brain DNA methylation signatures.
The presently disclosed subject matter of this Example reveals differences in the physical properties of cfDNA from CSF compared to that from plasma, with fragments of the former existing in a notably shorter state. This is reminiscent of cfDNA from the fetus during pregnancy, which is known to be represented by fragments of lower molecular weight than maternal cfDNA (Yu S C et al., Proc Natl Acad Sci USA. 2014; 111(23):8583-8). The presently disclosed subject matter of this Example further identified clear periodicity in fragment size of cfDNA in CSF, which suggests the presence of a nucleosome footprint that provides further information about the molecular phenotype of the CNS, as has been suggested in the context of plasma cfDNA (Teo Y V et al., Aging Cell. 2019; 18(1):e12890; Ulz P et al., Nat Genet. 2016; 48(10):1273-8).
Further evidence that information relating to the molecular phenotype of the CNS may be obtained via epigenomic analysis of CSF cfDNA comes from the observation that DNA methylation signatures showed clear differences between cfDNA from CSF and plasma. Direct comparison of specific loci to identify differentially methylated CpG sites revealed that CSF-derived fragments were generally more likely to be less methylated than their plasma-derived counterparts. Furthermore, differentially methylated sites appeared to be correlated with tissue-specific gene expression patterns of the CNS.
Example Embodiment D reveals notable differences in the global distribution of DNA methylation between cfDNA from CSF and plasma, with fewer numbers of CpG sites existing in an intermediately methylated state (20-80%) in CSF versus plasma. This may reflect the fact that cfDNA in CSF is derived from relatively fewer distinct cell lineages than cfDNA in plasma which presumably consists of multiple contributions from a large and diverse range of different organ systems and cell-lineages. This is logical if one considers that, in its simplest form, a haploid CpG site in a single cell exists in either a state of complete hyper or hypo methylation. Thus, the % methylation or methylation rate at a given CpG site in a population of cfDNA fragments represents some complex combination of this binary state that is likely the result of the contributions of many millions or even billions of cells.
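The reasoning above, that pooling binary per-cell states from more lineages yields more intermediately methylated sites, can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the Example: each lineage is taken to be fully methylated or fully unmethylated at each site with equal probability, lineages are independent, and every lineage contributes equally to the cfDNA pool. The function name and parameters are hypothetical.

```python
import random


def intermediate_fraction(n_lineages, n_sites=20000, seed=0):
    """Share of simulated CpG sites whose pooled methylation is 20-80%.

    Assumptions (illustrative only): each lineage is either fully
    methylated (1) or fully unmethylated (0) at a site with probability
    0.5, lineages are independent, and all lineages contribute equally.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sites):
        # Pooled level = fraction of lineages that are methylated here.
        methylated = sum(rng.random() < 0.5 for _ in range(n_lineages))
        level = methylated / n_lineages
        if 0.2 <= level <= 0.8:
            hits += 1
    return hits / n_sites


for n in (1, 2, 10):
    print(n, "lineage(s):", intermediate_fraction(n))
```

Under these assumptions a single lineage produces no intermediate sites at all, and the intermediate share grows as more lineages are pooled, which is the direction of the CSF versus plasma difference described above.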
Although DNA methylation is, at some level, an indirect corollary to gene expression in the context of non-invasive molecular phenotyping, it has the distinct advantage of being relatively stable compared to cell-free RNA and thus represents a potentially attractive substrate for analysis. Sample stability means that any future clinical analysis of DNA methylation in cfDNA from CSF could be centralized in a specialized testing laboratory following sample transit.
One advantage of the presently disclosed method of this Example is its targeted nature. The solution-phase hybridization of ˜80 Mb of the human genome enables a systematic analysis of known structural genes and regulatory elements at a read depth that is higher relative to cost than could be achieved by whole genome shotgun bisulfite sequencing. Even though the yield of cfDNA per mL of CSF was lower than that of plasma, the recent emergence of new approaches to DNA methylation analysis at single base resolution will likely increase the efficiency of these assays and enable the generation of richer data sets from many more individual samples that reflect a range of pathobiological and normal phenotypes.
The presently disclosed subject matter of this Example relating to epigenomic liquid biopsy can be used for the molecular profiling of CNS phenotypes in a variety of research and clinical settings.
Example Embodiment E of Example 2: Estimation of Abnormal Spinal Fluid Methylome Variation in Targeted Regions for Diagnosis of CNS Abnormality
Provided below is an algorithm that can be used to diagnose a subject with a CNS disorder. The presently disclosed subject matter of this Example Embodiment E provides that the methylome(s) of the central nervous system, or structures therein, could be affected by certain abnormalities, and that the changes of these methylomes can lead to changes in the methylation patterns of the DNA fragments found in cerebrospinal fluid (CSF), which are released by CNS tissues. An algorithm was developed to identify the changes of methylation patterns in the methylome of CSF caused by CNS phenotypes. The main insight behind the algorithm of this Example was that the methylome of the DNA fragments in CSF is a mixture of a variety of component methylomes of CNS origin, and that the proportion of these different component methylomes in the mixture varies from subject to subject, even among the population with normal CNS phenotype. By constructing a model of the CSF methylome as a linear combination of various component methylomes of CNS origin, the algorithm of this Example can accurately predict the methylation patterns of a new CSF sample under the hypothesis that it is from a normal individual. Consequently, the algorithm exhibited high sensitivity in detecting abnormal methylation patterns in a CSF sample caused by changes of the methylomes of some CNS tissues when the sample is from an affected individual.
Let $i$ be any CpG site in the human genome, $z_{i,j}$ be the methylation level of CpG site $i$ in a CSF sample $j$, $p_{i,r,j}$ be the proportion of the $r$th component methylome $m_{r,j}$ of CNS origin in CSF sample $j$ at site $i$, and $m_{i,r,j}$ be the methylation level of CpG site $i$ in methylome $m_{r,j}$. The hypothesis is:
$$z_{i,j} = \sum_{r=1}^{R} p_{i,r,j}\, m_{i,r,j} \qquad (1)$$
where $p_{i,r,j} \ge 0$, $0 \le m_{i,r,j} \le 1$, and $p_{i,1,j} + \cdots + p_{i,R,j} = 1$.
It is further assumed that there is a set of CpG sites $S$ such that, for any CpG site $i$ in $S$ and any CSF sample $j$ from a normal individual, $m_{i,r,j} = m_{i,r}$ and $p_{i,r,j} = p_{r,j}$.
That is, it is assumed that in any CSF sample from a normal individual, the proportions of different component methylomes in the mixture are the same for all CpG sites in $S$. It is also assumed that, by restricting to the set of CpG sites $S$, CSF samples from all normal individuals have the same set of component methylomes. They are called restricted reference component methylomes (RRCM), and are labeled as $m_1^S, \ldots, m_R^S$, or simply $m_1, \ldots, m_R$ when there is no confusion. For any CSF sample $j$ from a normal individual, its methylome restricted to the set of CpG sites in $S$ can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^S$ be the methylome of CSF sample $j$ restricted to $S$; then for some mixture vector $p_j = [p_{j,1}, \ldots, p_{j,R}]^T$:
$$z_j^S = [m_1^S, \ldots, m_R^S]\, p_j \qquad (2)$$
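As a concrete reading of equation (2), the following minimal sketch computes a restricted CSF methylome as a weighted average of component methylomes. The function name, the list-of-lists layout, and the toy values are assumptions chosen for illustration; they are not taken from this Example.

```python
def mix_methylome(components, proportions):
    """Return z = [m_1, ..., m_R] p for one sample, site by site.

    components:  R lists of methylation levels in [0, 1], all covering
                 the same CpG sites in the same order (hypothetical
                 restricted reference component methylomes).
    proportions: R non-negative mixture weights that sum to 1.
    """
    if len(components) != len(proportions):
        raise ValueError("need one proportion per component methylome")
    if any(p < 0 for p in proportions):
        raise ValueError("proportions must be non-negative")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n_sites = len(components[0])
    if any(len(m) != n_sites for m in components):
        raise ValueError("components must cover the same CpG sites")
    return [
        sum(p * m[i] for p, m in zip(proportions, components))
        for i in range(n_sites)
    ]


# Toy example: two components, three CpG sites.
m1 = [0.0, 1.0, 0.5]
m2 = [1.0, 1.0, 0.0]
z = mix_methylome([m1, m2], [0.25, 0.75])
print(z)
```

Because the weights are shared across all sites in the set, a site where the components agree (the second site here) keeps the same level whatever the mixture, while sites where they differ move with the proportions.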
Finally, it is assumed that the set $S$ is the union of two disjoint subsets $C$ and $T$, where $T$ is a union of $K$ non-empty sets $T_k$ such that $T = \bigcup_{k=1}^{K} T_k$, where the index $k$ represents the $k$th type of abnormal CNS phenotype. The sets $T_k$ do not need to be disjoint. Moreover, $T_k$ itself is the union of two disjoint sets $D_k$ and $V_k$. Either $D_k$ or $V_k$ could be empty, but not both. It is assumed that for any CSF sample, including one from an abnormal individual, when restricted to CpG sites in $C$, its methylome can always be expressed as a weighted average of the restricted reference component methylomes. That is, $z_j^C = [m_1^C, \ldots, m_R^C]\, p_j$ regardless of whether $j$ is from an abnormal individual. $C$ is called the set of reference CpG sites. On the other hand, for a CSF sample $l$ from an abnormal individual, when restricted to CpG sites in $S = C \cup T$, its methylome can no longer be expressed as a weighted average of the restricted reference component methylomes. That is, $w_l^S \ne [m_1^S, \ldots, m_R^S]\, p_l$ for any mixture vector $p_l$. More specifically, for a CSF sample $l$ from the $k$th type of abnormal individual: 1) $w_l^C = [m_1^C, \ldots, m_R^C]\, p_l$; 2) if $D_k$ is non-empty, then wl
$T$ is called the target set of CpG sites, $D_k$ is called the differential methylation target set, $V_k$ is called the copy number variation target set, and $T_k$ is called the target set for the $k$th type of abnormal individual.
The main steps of the presently disclosed algorithm are:
- 1) Identify the set of reference CpG sites $C$, and the target sets $T_1, \ldots, T_K$ for the list of $K$ types of abnormal individuals.
- 2) Estimate the restricted reference component methylomes $m_1, \ldots, m_R$, or $R$ predictor methylomes $n_1, \ldots, n_R$ that are independent linear combinations of the reference component methylomes such that $n_r = [m_1, \ldots, m_R]\, q_r$ for $R$ linearly independent mixture vectors $q_1, \ldots, q_R$.
- 3) (Optional) If the reference component methylomes are available, estimate the proportions of these components at the reference CpG sites $C$ for the test CSF samples.
- 4) Predict the methylation level of the test CSF samples at the target set $T_k$ of CpG sites, under the hypothesis that the sample is from a normal individual.
- 5) Compare the predicted methylation levels at $D_k$ and $V_k$ against the observed methylation levels, and reject the null hypothesis that a test sample is from a normal individual if the observed methylation levels are significantly different from the predicted levels.
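Steps 3 to 5 can be sketched for the special case of exactly two reference components, where the constrained regression (zero intercept, non-negative coefficients summing to 1) reduces to a single fraction with a closed-form least-squares solution clipped to [0, 1]. The function names and toy numbers below are assumptions for illustration, and the sketch stops at computing residuals rather than applying any particular significance test.

```python
def estimate_fraction(z_ref, x_ref, y_ref):
    """Least-squares a in [0, 1] such that z ~ a*x + (1 - a)*y.

    z_ref: observed levels of the test sample at the reference sites.
    x_ref, y_ref: levels of the two reference components at those sites.
    """
    num = sum((z - y) * (x - y) for z, x, y in zip(z_ref, x_ref, y_ref))
    den = sum((x - y) ** 2 for x, y in zip(x_ref, y_ref))
    if den == 0:
        raise ValueError("components are identical at every reference site")
    # The objective is a convex quadratic in a, so clipping the
    # unconstrained minimiser gives the constrained optimum.
    return min(1.0, max(0.0, num / den))


def predict(a, x_sites, y_sites):
    """Predicted levels under the normal hypothesis: a*x + (1 - a)*y."""
    return [a * x + (1 - a) * y for x, y in zip(x_sites, y_sites)]


def residuals(observed, predicted):
    """Observed minus predicted methylation level at each target site."""
    return [o - p for o, p in zip(observed, predicted)]


# Toy data: four reference sites and two target sites (all hypothetical).
x_ref, y_ref = [0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.3, 0.7]
x_tgt, y_tgt = [0.5, 0.6], [0.2, 0.9]
z_ref = predict(0.3, x_ref, y_ref)           # sample with a true fraction of 0.3
a_hat = estimate_fraction(z_ref, x_ref, y_ref)
expected = predict(a_hat, x_tgt, y_tgt)
observed = [expected[0] + 0.2, expected[1]]  # first target site is shifted
print(a_hat, residuals(observed, expected))
```

With more than two components the same idea applies, but the fit becomes a least-squares problem over the probability simplex and needs a general constrained solver instead of this one-dimensional shortcut.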
The presently disclosed algorithm of this Example can be implemented in a variety of ways. For example, given the methyl-seq data for a set of CSF samples from normal individuals, the presently disclosed EM algorithm or the data augmentation method can be applied to estimate the component methylomes, and the maximum likelihood method can then be used to estimate the proportion of these component methylomes in the test sample. Below are exemplary simple implementations of the presently disclosed algorithm that use linear regression.
In the first simple implementation of the presently disclosed algorithm of this Example, it is assumed the restricted methylome of a CSF sample from a normal individual can be approximated by a mixture of two restricted reference methylomes, one representing the DNA fragments from a first specific CNS region, another representing the DNA fragments from a second specific CNS region. It is further assumed that the estimations of these two reference component methylomes are available. For example, in the implementation below, the methylome of oligodendrocytes is used as an approximation to the oligodendrocyte methylome in the CSF sample, and the methylome of neuronal cell samples from healthy individuals (HI) is used as an approximation to the neuronal methylome in the CSF sample. The implementation of the algorithm includes the following steps:
- 1. Identify the reference set $C$, and the target sets $T_1, \ldots, T_K$.
- 1.1 Collect the methylation data for a set of oligodendrocyte samples, a set of neuronal cell samples, and a set of CSF samples, all from normal individuals. For each type of abnormal individuals, collect a set of oligodendrocyte samples, a set of neuronal cell samples, and a set of CSF samples from that type of abnormal individuals. All these samples should have matched age, race, and other relevant parameters. These are the training data.
- 1.2 Let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a normal oligodendrocyte sample $j$, $y_{i,l}$ the observed methylation level of CpG site $i$ in a normal neuronal cell sample $l$, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all normal oligodendrocyte samples, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all normal neuronal cell samples. Identify the CpG sites $S_0$ such that for any $i \in S_0$, both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$. These are CpG sites with stable methylation levels in each type of normal cells.
- 1.3 Let $x_{i,j}$ be the observed methylation level of CpG site $i$ in an oligodendrocyte sample $j$, including normal and abnormal, $y_{i,l}$ the observed methylation level of CpG site $i$ in a neuronal cell sample $l$, including normal and abnormal, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all oligodendrocyte samples, including normal and abnormal, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all neuronal cell samples, including normal and abnormal. Identify the CpG sites $S_1$ such that for any $i \in S_1$, both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$, the statistical test for the difference between $\{x_{i,j_0}: j_0$ is a normal oligodendrocyte sample$\}$ and $\{x_{i,j_k}: j_k$ is an abnormal oligodendrocyte sample of type $k\}$ is not significant for all abnormal types of oligodendrocyte, and the statistical test for the difference between $\{y_{i,j_0}: j_0$ is a normal neuronal cell sample$\}$ and $\{y_{i,j_k}: j_k$ is an abnormal neuronal cell sample of type $k\}$ is not significant for all abnormal types of neuronal cell. These are CpG sites with stable methylation levels in each type of cells, and with no difference in methylation level between normal and any abnormal samples. Let $x_i$ be the sample mean of $x_{i,j}$ over all oligodendrocyte samples, including normal and abnormal, and $y_i$ the sample mean of $y_{i,l}$ over all neuronal cell samples, including normal and abnormal. Identify the subset $C_0$ of $S_1$ such that for any $i \in C_0$, $|x_i - y_i| > c_1$ for some constant $c_1$. These are CpG sites that are stably methylated in each cell type, with no difference between the normal and abnormal samples of the same cell type, and differentially methylated between different types of cells.
- 1.4 Let $x^{C_0}$ be the vector of $x_i$ for all $i \in C_0$, and $y^{C_0}$ be the vector of $y_i$ for all $i \in C_0$, where $x_i$ is the mean methylation at site $i$ in all oligodendrocyte samples and $y_i$ the mean methylation at site $i$ in all neuronal cell samples. Note that by the way the set $C_0$ is selected, there is no difference in the methylation level of any CpG site in $C_0$ between normal and abnormal oligodendrocyte samples, or between normal and abnormal neuronal cell samples. Let $z_j^{C_0}$ be the observed methylation levels of CpG sites in $C_0$ for a CSF sample $j$ of the $k$th abnormal type. (For convenience, a normal CSF sample is referred to as a sample of the 0th abnormal type.) For each sample $j$ belonging to the $k$th abnormal type, regress $z_j^{C_0}$ against $x^{C_0}$ and $y^{C_0}$, with the constraints that the intercept must be 0, and the coefficients must be non-negative and add to 1, and get the residual $e_j^{C_0}$. Identify the subset $C_0^k$ of $C_0$ such that for any CpG site $i$ in $C_0^k$, it has
- and $e_{i,k}^2 < c_3$ for some constants $c_2$ and $c_3$, where $e_{i,k}^2$ is the mean of the squared difference between estimated and observed methylation levels of CpG site $i$ in all CSF samples of the $k$th abnormal type, and $s_{i,k}^2$ is the sample variance of the methylation levels of CpG site $i$ in the same set of CSF samples. Repeat the above procedure for each type of abnormal CSF sample. The intersection of the subsets, $C = \bigcap_{k=0}^{K} C_0^k$, is the reference set of $J$ CpG sites. These are CpG sites whose methylation levels in both normal and any type of abnormal CSF samples can be accurately predicted by the reference component methylomes from normal individuals.
- 1.5 Let $T_0 = S_0 \setminus S_1$. Let $x^C$ and $x^{T_0}$ be the vectors of $x_i$ and $x_h$ for all $i \in C$ and $h \in T_0$ respectively, and $y^C$ and $y^{T_0}$ be the vectors of $y_i$ and $y_h$ for all $i \in C$ and $h \in T_0$ respectively, where $x_i$, $x_h$, $y_i$, and $y_h$ are the mean methylation levels for normal oligodendrocyte or neuronal cell samples at sites $i$ and $h$ respectively. Let $z_j^C$ and $z_j^{T_0}$ be the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a normal CSF sample $j$, $w_{l_k}^C$ and $w_{l_k}^{T_0}$ the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a CSF sample $l_k$ from an individual with the $k$th type of abnormality, and $w_{l_g}^C$ and $w_{l_g}^{T_0}$ the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a CSF sample $l_g$ from an individual with the $g$th type of abnormality, where $g \ne k$. For each $j$, $l_k$, and $l_g$, regress $z_j^C$, $w_{l_k}^C$, and $w_{l_g}^C$ respectively against $x^C$ and $y^C$, with the constraints that the intercept must be 0, and the coefficients must be non-negative and add to 1. Apply the fitted models respectively to $x^{T_0}$ and $y^{T_0}$ to predict $z_j^{T_0}$, $w_{l_k}^{T_0}$, and $w_{l_g}^{T_0}$ respectively, and get the differences $e_j^{T_0}$, $e_{l_k}^{T_0}$ and $e_{l_g}^{T_0}$ between the predicted values and observed values. Let $e_i$, $e_{i,k}$, and $e_{i,g}$ be the means of the sets of differences $\{e_{j,i} \in e_j^{T_0}: j$ is a normal CSF sample$\}$, $\{e_{l_k,i} \in e_{l_k}^{T_0}: l_k$ is a CSF sample of the $k$th abnormal type$\}$ and $\{e_{l_g,i} \in e_{l_g}^{T_0}: l_g$ is a CSF sample of the $g$th abnormal type$\}$ for CpG site $i$ respectively. Identify the subset $T_k$ of $T_0$ such that for any $i \in T_k$, $|e_i| < c_{2,0}$, $|e_{i,k}| > c_{2,k}$, and $|e_{i,k} - e_{i,g}| > c_{3,k}$, for some constants $c_{2,0}$, $c_{2,k}$, and $c_{3,k}$, for all $g \ne k$. $T_k$ is the target set for the $k$th type of abnormal individual. These are the sites where the methylation of a normal CSF sample can be accurately predicted, the observed methylation in a CSF sample of the $k$th abnormal type will deviate from the prediction, and the deviation will be different from that of a CSF sample of any other abnormal type.
- 2. Estimate the oligodendrocyte fraction of the new CSF samples to be tested. Recall that $x^C$ and $y^C$ are mean vectors of the methylation levels of the training oligodendrocyte and training neuronal cell data for the CpG sites in the reference set $C$. For any new CSF sample $t$ to be tested, let $z_t^C$ be the observed methylation levels of CpG sites in $C$. Regress $z_t^C$ against $x^C$ and $y^C$, with the constraints that the intercept must be 0, and the coefficients must be non-negative and add to 1. The estimated coefficient for $x^C$ is the estimated oligodendrocyte fraction for the CSF sample $t$.
- 3. Test if the new CSF samples are from the $k$th type of abnormal individual. For the new CSF sample $t$, let $x^{T_k}$ and $y^{T_k}$ be mean vectors of the methylation levels of the training oligodendrocyte and training neuronal cell data for the CpG sites in the target set $T_k$ identified in step 1 of this algorithm. Apply the fitted regression model obtained from step 2 of this algorithm to $x^{T_k}$ and $y^{T_k}$ to predict the methylation levels of CpG sites in $T_k$ for sample $t$ under the hypothesis that sample $t$ is from a normal individual. Let $n_k$ be the number of CpG sites in $T_k$. Define functions
- where I_(⋅)=I(−∞,0)(⋅), that is, the indicator function for the interval (−∞, 0), and ei, eik and eig are the estimates obtained from step 1.5 of the algorithm. The sample is said to be from the kth type of abnormal individual if fk(e1t−e1, . . . , enk,t−enk)>c4,k, and fk,g(e1t−e1g, . . . , enk,t−enk,g)>c5,g for all g≠k, where eit is the difference between the observed methylation level of the CpG site i∈Tk for sample t and the value predicted by the fitted model obtained from step 2, and g is any type of abnormal individual that is different from the kth type of abnormal individual.
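The regression steps above can be sketched for the two-component case. This is a minimal illustration, not the disclosed implementation: it assumes exactly two reference profiles (oligodendrocyte x and neuronal y), uses the closed-form least-squares solution for a single fraction clipped to [0, 1], and defines residuals as observed minus predicted. The function names and the example numbers are hypothetical. The decision functions fk and fk,g are not reproduced in the text, so the sketch stops at the residuals and the target-site selection.

```python
def estimate_fraction(z, x, y):
    """Fit z ~ a*x + (1-a)*y with no intercept and 0 <= a <= 1.

    With two components, the constraints (non-negative coefficients that
    sum to 1) reduce to one unknown a in [0, 1]; the constrained
    least-squares solution is the unconstrained one clipped to [0, 1].
    """
    num = sum((zi - yi) * (xi - yi) for zi, xi, yi in zip(z, x, y))
    den = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    if den == 0:
        raise ValueError("reference profiles are identical at every site")
    return min(1.0, max(0.0, num / den))


def residuals(z_target, x_target, y_target, a):
    """Observed minus predicted methylation at the target sites."""
    return [zi - (a * xi + (1 - a) * yi)
            for zi, xi, yi in zip(z_target, x_target, y_target)]


def select_target_sites(e_normal, e_k, e_others, c2_0, c2_k, c3_k):
    """Indices i with |e_i| < c2_0, |e_ik| > c2_k and
    |e_ik - e_ig| > c3_k for every other abnormal type g."""
    return [i for i in range(len(e_normal))
            if abs(e_normal[i]) < c2_0
            and abs(e_k[i]) > c2_k
            and all(abs(e_k[i] - e_g[i]) > c3_k for e_g in e_others)]


# Hypothetical reference-site profiles and a sample that is 25% oligodendrocyte.
x_C = [0.9, 0.1, 0.8, 0.2]
y_C = [0.1, 0.7, 0.3, 0.6]
z_C = [0.25 * xi + 0.75 * yi for xi, yi in zip(x_C, y_C)]
fraction = estimate_fraction(z_C, x_C, y_C)
print(round(fraction, 6))
```

With more than two component methylomes, the same constraints would call for a general constrained least-squares solver rather than this closed form.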
Other ways of implementing the presently disclosed algorithm of this Example can be developed by modifying the simple implementation presented above. Specifically, the algorithm need not assume that only two component reference methylomes make up the CSF methylomes, nor need it approximate them by the oligodendrocyte and HI methylomes. Instead, a set of predictor methylomes can be collected that are mixtures of the component reference methylomes, as long as the number of predictor methylomes is the same as the number of component reference methylomes, and the mixture vectors of the predictor methylomes are linearly independent. For example, they can be methylomes of CSF samples with known, different proportions of oligodendrocyte and neuronal cell DNA.
In the presently disclosed algorithm of this Example, the difference between the observed methylation levels in certain target regions and the predicted methylation levels is used as the test statistic to determine whether the methylome of a CSF sample has been affected by some type of CNS abnormality. To illustrate the advantage of this approach, it is assumed that the mixture vector pj for the methylome of a normal CSF sample j follows a Dirichlet distribution with parameters α1= . . . =αR. Furthermore, for CpG site i, its methylation levels in the R component reference methylomes are mi,r=(r−1)/(R−1). It can be shown that the methylation level of i in sample j then has a mean of 0.5, and a variance of
If there is a methyl-seq library of sample j with a coverage of N for CpG site i, the variance of the measured methylation level zi,j is
In other words, if zi,j is used as a test statistic to detect an abnormal CNS using a CSF sample, then under the null hypothesis the test statistic has a variance of σ1². However, in the presently disclosed algorithm of this Example, the mixture vector pj is first estimated, and zi,j is then predicted by Σr mi,r pr,j. Note that a methyl-seq library can cover millions of CpG sites, and that the variance of the coefficients in a linear regression model is inversely proportional to sample size. Thus it is possible to get a highly accurate estimate of the mixture vector pj, even after taking into account that adjacent CpG sites tend to have correlated methylation levels. Assuming an accurate estimate of Σr mi,r pr,j can be obtained, the variance of the difference zi,j−Σr mi,r pr,j between the observed methylation level and the prediction will be
In other words, under the null hypothesis, the test statistic zi,j−Σr mi,r pr,j used in the presently disclosed algorithm has a much smaller variance than the other candidate test statistic zi,j. This in turn means that the presently disclosed test will achieve a higher power at the same level of type I error.
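The variance comparison above can be checked numerically. The following is a minimal sketch, not the disclosed implementation, under assumptions that the text does not state: all Dirichlet parameters share one value `alpha` (default 1), reads at a site are binomial draws with coverage N, and the mixture vector is known exactly. Under those assumptions the closed forms in the code follow from standard Dirichlet and binomial moments; they are offered as an illustrative reconstruction, and all function names are hypothetical.

```python
import random
import statistics


def mixture_variance(R, alpha=1.0):
    """Variance of sum_r m_r * p_r for p ~ Dirichlet(alpha, ..., alpha)
    and m_r = (r - 1) / (R - 1), r = 1..R (assumed common alpha)."""
    return (R + 1) / (12.0 * (R - 1) * (R * alpha + 1))


def raw_statistic_variance(R, N, alpha=1.0):
    """Variance of the measured level z: mixture variance plus
    binomial sampling noise at coverage N."""
    s0 = mixture_variance(R, alpha)
    return s0 + (0.25 - s0) / N


def residual_statistic_variance(R, N, alpha=1.0):
    """Variance of z minus its prediction when the mixture is known:
    only the binomial sampling noise remains."""
    return (0.25 - mixture_variance(R, alpha)) / N


def simulate(R, N, alpha=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of both variances under the same assumptions."""
    rng = random.Random(seed)
    m = [r / (R - 1) for r in range(R)]          # 0, 1/(R-1), ..., 1
    raw, resid = [], []
    for _ in range(n_samples):
        g = [rng.gammavariate(alpha, 1.0) for _ in range(R)]
        total = sum(g)
        mu = sum(mr * gr / total for mr, gr in zip(m, g))  # true level
        k = sum(1 for _ in range(N) if rng.random() < mu)  # methylated reads
        z = k / N
        raw.append(z)
        resid.append(z - mu)
    return statistics.pvariance(raw), statistics.pvariance(resid)


print(raw_statistic_variance(2, 30), residual_statistic_variance(2, 30))
```

For R=2, alpha=1 and N=30, the closed forms give a variance of about 0.0889 for the raw statistic and about 0.0056 for the residual statistic, which illustrates the reduction described in the text under these assumptions.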
Although the presently disclosed subject matter of this Example and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention. Moreover, the scope of this Example is not intended to be limited to the particular embodiments of the process, machine, manufacture, and composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the invention of the presently disclosed subject matter, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the presently disclosed subject matter. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.
Example of Embodiments for Example 2
B1. A method for diagnosing, prognosing, classifying and/or monitoring a CNS disorder in a subject comprising:
(a) obtaining a cerebrospinal fluid sample from the subject;
(b) determining the methylation status and/or level of one or more genomic loci in the cerebrospinal fluid sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and
(d) diagnosing the CNS disorder in the subject,
wherein the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject.
B2. The method of embodiment B1, wherein an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject.
B3. The method of embodiment B1, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject.
B4. A method of treating a CNS disorder in a subject comprising:
(a) obtaining a cerebrospinal fluid sample from the subject;
(b) determining the methylation status and/or level of one or more genomic loci present in the cerebrospinal fluid sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference;
(d) diagnosing a CNS disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject; and
(e) treating the subject diagnosed with the CNS disorder.
B5. The method of any one of embodiments B1-B4, wherein the reference is the methylation status and/or level of the one or more genomic loci in a cerebrospinal fluid sample obtained from a subject that does not have the CNS disorder.
B6. A method of treating a CNS disorder in a subject comprising:
(a) measuring the methylation status and/or level of one or more genomic loci present in a cerebrospinal fluid sample from the subject prior to a treatment of the CNS disorder;
(b) measuring the methylation status and/or level of one or more genomic loci present in a cerebrospinal fluid sample from the subject during the treatment of the CNS disorder;
(c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the cerebrospinal fluid samples from prior to and during the treatment of the CNS disorder indicates the subject is responsive to the treatment.
B7. The method of embodiment B6, further comprising (d) administering a different treatment to the subject if the difference in the methylation status and/or level of the one or more genomic loci between the cerebrospinal fluid samples from prior to and during the treatment of the CNS disorder indicates the subject is not responsive to the treatment.
B8. The method of embodiment B6, wherein an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment.
B9. The method of embodiment B6, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment.
B10. The method of embodiment B7, wherein an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is not responsive to the treatment.
B11. The method of embodiment B7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is not responsive to the treatment.
B12. The method of any one of embodiments B1-B11, wherein the one or more genomic loci comprise one or more CpG sites.
B13. The method of any one of embodiments B1-B12, wherein the CNS disorder is selected from the group consisting of brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases.
B14. The method of any one of embodiments B1-B13, wherein the one or more genomic loci are present within nucleic acids isolated from the cerebrospinal fluid sample.
B15. The method of any one of embodiments B1-B14, wherein the one or more genomic loci are present within cell-free nucleic acids isolated from the cerebrospinal fluid sample.
B16. A method of treating a CNS disorder in a subject comprising:
(a) diagnosing a CNS disorder in the subject by utilization of the algorithm disclosed in Example Embodiment E; and
(b) treating the subject diagnosed with the CNS disorder.
B17. The method of any one of embodiments B1-B16, wherein the subject is human.
B18. A kit for diagnosing, prognosing and/or monitoring a CNS disorder in a subject comprising a means for determining and/or detecting the methylation status of one or more genomic loci.
B19. The kit of embodiment B18, wherein the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.
Example 2 provides methods for diagnosing, prognosing, monitoring, classifying and/or treating central nervous system disorders, e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases. This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying and/or treating central nervous system disorders.
Example 3—Method for Non-Invasive Detection of Fetal Aneuploidy and/or Sub-Chromosomal Fetal Copy Number Variations by Bisulfite Sequencing of Maternal Plasma
Field of this Example
The methods disclosed in this Example are related to the field of prenatal diagnosis, specifically to non-invasive methods to detect fetal aneuploidy in a biological sample including maternal plasma DNA.
Background of this Example
Definitive prenatal diagnosis is currently performed via amniocentesis (AF) or chorionic villus sampling (CVS) to obtain fetal or placental cells, respectively. Chromosome analysis is then achieved using conventional karyotyping or, more recently, array comparative genomic hybridization (aCGH). AF and CVS are invasive procedures that carry a significant risk of miscarriage and fetal morbidity and cause considerable parental stress, and there have been intense efforts to develop non-invasive alternatives.
Summary of this Example
Disclosed in this Example are diagnostic methods that can be used to detect fetal aneuploidy and/or sub-chromosomal fetal copy number variations (e.g., microdeletions and/or microduplications) in maternal plasma while reducing cost and complexity. The method uses DNA methylation signatures identified using biochemical methods with a computational approach to detect fetal aneuploidy and/or sub-chromosomal fetal copy number variations.
Description of this Example
Proof of concept has been demonstrated for the detection of pregnancy related disease via bisulfite sequencing of maternal plasma. One intermediate step of the method includes the estimation of a percentage of DNA fragments in the maternal plasma that originate from the fetus. The method has been modified to detect fetal aneuploidy and/or sub-chromosomal fetal copy number variations.
The approach of this Example involves the targeted methylation of specific regions of genomic DNA such as commonly aneuploid chromosomes and/or copy number variation regions (such as microdeletion regions). Such an approach would involve a parallel strategy in which regions of interest are targeted in a multiplex fashion.
Terms of this Example
Unless otherwise noted, technical terms are used according to conventional usage. Definitions of common terms in molecular biology may be found in Benjamin Lewin, Genes V, published by Oxford University Press, 1994 (ISBN 0-19-854287-9); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8).
In order to facilitate review of the various embodiments of this Example, the following explanations of specific terms are provided:
Amplification: To increase the number of copies of a nucleic acid molecule. The resulting amplification products are called “amplicons.” Amplification of a nucleic acid molecule (such as a DNA or RNA molecule) refers to use of a technique that increases the number of copies of a nucleic acid molecule in a sample. An example of amplification is the polymerase chain reaction (PCR), in which a sample is contacted with a pair of oligonucleotide primers under conditions that allow for the hybridization of the primers to a nucleic acid template in the sample. The primers are extended under suitable conditions, dissociated from the template, re-annealed, extended, and dissociated to amplify the number of copies of the nucleic acid. This cycle can be repeated. The product of amplification can be characterized by such techniques as electrophoresis, restriction endonuclease cleavage patterns, oligonucleotide hybridization or ligation, and/or nucleic acid sequencing.
Other examples of in vitro amplification techniques include quantitative real-time PCR; reverse transcriptase PCR (RT-PCR); real-time PCR (rt PCR); real-time reverse transcriptase PCR (rt RT-PCR); nested PCR; strand displacement amplification (see U.S. Pat. No. 5,744,311); transcription-free isothermal amplification (see U.S. Pat. No. 6,033,881); repair chain reaction amplification (see PCT Publication No. WO 90/01069); ligase chain reaction amplification (see European patent publication No. EP-A-320 308); gap filling ligase chain reaction amplification (see U.S. Pat. No. 5,427,930); coupled ligase detection and PCR (see U.S. Pat. No. 6,027,889); and NASBA™ RNA transcription-free amplification (see U.S. Pat. No. 6,025,134), amongst others.
Allele (Haplotype): A 5′ to 3′ sequence of nucleotides found at a set of one or more polymorphic sites in a locus on a single chromosome from a single individual. “Allelic pair” is the two alleles found for a locus in a single individual. With regard to a population, alleles are the ordered, linear combination of polymorphisms (e.g., single nucleotide polymorphisms (SNPs)) in the sequence of each form of a gene (on individual chromosomes) that exist in the population. “Haplotyping” is a process for determining one or more alleles in an individual and includes use of family pedigrees, molecular techniques and/or statistical inference. “Haplotype data” or “allele data” is the information concerning one or more of the following for a specific gene: a listing of the allelic pairs in an individual or in each individual in a population; a listing of the different alleles in a population; frequency of each allele in that or other populations, and any known associations between one or more alleles and a trait.
Bisulfite: All types of bisulfites, such as sodium bisulfite, that are capable of chemically converting a cytosine (C) to a uracil (U) without chemically modifying a methylated cytosine and therefore can be used to differentially modify a DNA sequence based on the methylation status of the DNA.
Bisulfite treatment: The treatment of DNA with bisulfite or a salt thereof, such as sodium bisulfite (NaHSO3). Bisulfite reacts readily with the 5,6-double bond of cytosine, but poorly with methylated cytosine. Cytosine reacts with the bisulfite ion to form a sulfonated cytosine reaction intermediate which is susceptible to deamination, giving rise to a sulfonated uracil. The sulfonate group can be removed under alkaline conditions, resulting in the formation of uracil. Uracil is recognized as a thymine by polymerases and amplification will result in an adenine-thymine base pair instead of a cytosine-guanine base pair.
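The effect of bisulfite treatment on a read can be illustrated in silico. This is a minimal sketch, assuming complete conversion of every unmethylated cytosine and a single forward strand; the function names and the example sequence are hypothetical and are not taken from the disclosure.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Return the sequence as read after bisulfite treatment and amplification.

    Unmethylated C is converted to U and read as T; methylated C
    (0-based positions in methylated_positions) is left unchanged.
    """
    methylated = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylated_positions(reference, converted):
    """Positions where the reference has C and the converted read still has C."""
    return [i for i, (r, c) in enumerate(zip(reference.upper(), converted.upper()))
            if r == "C" and c == "C"]


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions=[1])
print(read)                                        # ACGTTTGA
print(call_methylated_positions(reference, read))  # [1]
```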
Cell-free DNA: DNA which is no longer fully contained within an intact cell, for example DNA found in plasma or serum.
Chromosomal abnormality: A chromosome, or a segment of a chromosome, with DNA deletions or duplications, such as chromosomal aneuploidy. The term also encompasses translocation of extra chromosomal sequences to other chromosomes.
Chromosomal aneuploidy or aneuploidy: The abnormal presence (hyperploidy) or absence (hypoploidy) of a chromosome, such as chromosome 13, 18 or 21. In some cases, the abnormality can involve more than one chromosome, or more than one portion of one or more chromosomes. The most common chromosome aneuploidy is trisomy, such as trisomy 21, where the genome of an afflicted patient has three chromosomes 21, as compared to two chromosomes 21. In rarer cases, the patient may have an extra piece of chromosome 21 (less than full length) in addition to the normal pair. In yet other cases, a portion of chromosome 21 may be translocated to another chromosome, such as chromosome 14. In this example, chromosome 21 is referred to as the “chromosome relevant to the chromosomal aneuploidy” and a second chromosome that is present in the normal pair in the patient's genome, for example chromosome 1, is a “reference chromosome.” There are also cases where the number of a relevant chromosome is less than the normal number of 2. Turner syndrome is one example of a chromosomal aneuploidy where the number of X chromosomes in a female subject has been reduced from two to one.
Chromosomal Copy number variation: A “microdeletion” is a small deletion in a chromosome, such as a deletion of about 10 kilobases to about 5 million base pairs, for example a deletion of about 100 to about 5 million base pairs, such as about 1,000 kilobases, such as about 100 kilobases, 200 kilobases, 400 kilobases, 500 kilobases, 750 kilobases, or about 1 million base pairs. Similarly, a “microduplication” is a small duplication in a chromosome, such as a duplication of about 10 kilobases to about 5 million base pairs, for example a duplication of about 100 to about 5 million base pairs, such as about 1,000 kilobases, such as about 100 kilobases, 200 kilobases, 400 kilobases, 500 kilobases, 750 kilobases, or about 1 million base pairs. A chromosomal fetal copy number variation is either a microdeletion or a microduplication in genomic DNA of a fetus of a pregnant woman.
CpG-containing genomic sequence: A segment of DNA sequence at a defined location in the genome of an individual such as a human fetus or a pregnant woman. Typically, a “CpG-containing genomic sequence” is at least 15 nucleotides in length and contains at least one cytosine. In some embodiments, a CpG-containing sequence can be at least 30, 50, 80, 100, 150, 200, 250, or 300 nucleotides in length and contains at least 2, 5, 10, 15, 20, 25, or 30 cytosines. For any specific “CpG-containing genomic sequence” at a given location, for example, within a region centering around a given genetic locus on chromosome 21, nucleotide sequence variations can exist from individual to individual and from allele to allele even for the same individual. Typically, such a region centering around a defined genetic locus (e.g., a CpG island) contains the locus as well as upstream and/or downstream sequences. Each of the upstream or downstream sequences (counting from the 5′ or 3′ boundary of the genetic locus, respectively) can be as long as 1 kb, and in other cases may be as long as 5 kb, 2 kb, 750 bp, 500 bp, 200 bp, or 100 bp. A “CpG-containing genomic sequence” can encompass a coding or a non-coding nucleic acid sequence, and thus can include a nucleotide sequence transcribed (or not transcribed) for protein production. Thus, a CpG-containing genomic sequence can be a protein-coding sequence, a non-protein-coding sequence, or a combination thereof.
Control DNA: Genomic DNA obtained from an individual that is used for comparative purposes, such as DNA from a healthy individual who does not have a chromosomal abnormality. In some embodiments, a control DNA sample can be obtained from plasma of a female carrying a healthy fetus who does not have a chromosomal abnormality, which can serve as a negative control. When certain chromosome anomalies are known, the control can also be established standards that are indicative of a specific disease or condition.
To screen for three different chromosomal aneuploidies in a maternal plasma of a pregnant female, a panel of control DNAs that have been isolated from plasma of mothers who are known to carry a fetus with, for example, chromosome 13, 18, or 21 trisomy, and a mother who is pregnant with a fetus who does not have a chromosomal abnormality can be used as a control.
Copy number: The number of copies of a section of DNA in a genome. Copy number analysis usually refers to the process of analyzing data produced by a test for DNA copy number variation in a patient's sample. Such analysis helps detect chromosomal copy number variation that may cause or may increase risks of various critical disorders. Copy number variation can be detected with various types of tests, including, but not limited to, methylation status, fluorescent in situ hybridization, comparative genomic hybridization, and high-resolution array-based tests based on array comparative genomic hybridization and SNP array technologies. The methods disclosed herein can be used to determine the copy number of a specific locus of interest.
DNA (deoxyribonucleic acid): DNA is a long chain polymer which comprises the genetic material of most living organisms (some viruses have genes comprising ribonucleic acid (RNA)). The repeating units in DNA polymers are four different nucleotides, each of which comprises one of the four bases, adenine, guanine, cytosine and thymine bound to a deoxyribose sugar to which a phosphate group is attached. Triplets of nucleotides (referred to as codons) code for each amino acid in a polypeptide, or for a stop signal (termination codon). The term codon is also used for the corresponding (and complementary) sequences of three nucleotides in the mRNA into which the DNA sequence is transcribed.
Unless otherwise specified, any reference to a DNA molecule is intended to include the reverse complement of that DNA molecule. Except where single-strandedness is required by the text herein, DNA molecules, though written to depict only a single strand, encompass both strands of a double-stranded DNA molecule. Thus, a reference to the nucleic acid molecule that encodes a protein, or a fragment thereof, encompasses both the sense strand and its reverse complement. Thus, for instance, it is appropriate to generate probes or primers from the reverse complement sequence of the disclosed nucleic acid molecules.
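As a small illustration of the reverse-complement convention described above, the following sketch derives the complementary strand of a written sequence. The helper name and the example sequences are hypothetical, and the sketch handles only the four unambiguous DNA bases.

```python
# Translation table for the four DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
print(reverse_complement("CCGG"))  # CCGG (palindromic site)
```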
DNA sequencing: The process of determining the nucleotide order of a given DNA molecule. The general characteristics of “parallel or massively parallel sequencing” are that the sequencing of the target genetic material is performed in parallel and the sequence information is captured by a computer. For example, the sequencing can be performed using sequencing-by-synthesis with reversible terminators (ILLUMINA® Genome Analyzer, HiSeq, NextSeq, NovaSeq, etc.), semiconductor sequencing (Thermo Fisher Ion Torrent), or nanopore sequencing (Oxford Nanopore).
Differentially Modifies (methylated or non-methylated DNA): A reagent that modifies methylated or non-methylated DNA, respectively, in a process through which distinguishable products result from methylated and non-methylated DNA, thereby allowing the identification of the DNA methylation status. Such processes may include, but are not limited to, chemical reactions (such as conversion by bisulfite) and enzymatic treatment (such as cleavage by a methylation-dependent endonuclease), or an antibody that specifically binds a methylated (or non-methylated) DNA sequence. Thus, an enzyme that preferentially cleaves or digests methylated DNA is one capable of cleaving or digesting a DNA molecule at a significantly higher efficiency when the DNA is methylated, whereas an enzyme that preferentially cleaves or digests unmethylated DNA exhibits a significantly higher efficiency when the DNA is not methylated.
Gene: A segment of DNA that contains the coding sequence for a protein, wherein the segment may include promoters, exons, introns, and other untranslated regions that control expression.
Genotype: An unphased 5′ to 3′ sequence of nucleotide pair(s) found at a set of one or more polymorphic sites in a locus on a pair of homologous chromosomes in an individual. “Genotyping” is a process for determining a genotype of an individual.
Genomic segment: A contiguous sequence of genomic DNA no more than 2000 bases in length.
Hybridization: Oligonucleotides and their analogs hybridize by hydrogen bonding, which includes Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding, between complementary bases. Generally, nucleic acids consist of nitrogenous bases that are either pyrimidines (cytosine (C), uracil (U), and thymine (T)) or purines (adenine (A) and guanine (G)). These nitrogenous bases form hydrogen bonds between a pyrimidine and a purine, and the bonding of the pyrimidine to the purine is referred to as “base pairing.” More specifically, A will hydrogen bond to T or U, and G will bond to C. “Complementary” refers to the base pairing that occurs between two distinct nucleic acid sequences or two distinct regions of the same nucleic acid sequence. For example, an oligonucleotide can be complementary to a specific genetic locus, so it specifically hybridizes with a mutant allele (and not the reference allele) or so that it specifically hybridizes with a reference allele (and not the mutant allele).
“Specifically hybridizable” and “specifically complementary” are terms that indicate a sufficient degree of complementarity such that stable and specific binding occurs between the oligonucleotide (or its analog) and the DNA or RNA target, such that the target can be distinguished. The oligonucleotide or oligonucleotide analog need not be 100% complementary to its target sequence to be specifically hybridizable. An oligonucleotide or analog is specifically hybridizable when binding of the oligonucleotide or analog to the target DNA or RNA molecule interferes with the normal function of the target DNA or RNA, and there is a sufficient degree of complementarity to avoid non-specific binding of the oligonucleotide or analog to non-target sequences under conditions where specific binding is desired, for example under physiological conditions in the case of in vivo assays or systems. Such binding is referred to as specific hybridization. In one example, an oligonucleotide is specifically hybridizable to DNA or RNA nucleic acid sequences including an allele of a gene, wherein it will not hybridize to nucleic acid sequences containing a polymorphism.
Hybridization conditions resulting in particular degrees of stringency will vary depending upon the nature of the hybridization method of choice and the composition and length of the hybridizing nucleic acid sequences. Generally, the temperature of hybridization and the ionic strength (especially the Na+ concentration) of the hybridization buffer will determine the stringency of hybridization, though wash times also influence stringency. Calculations regarding hybridization conditions required for attaining particular degrees of stringency are discussed by Sambrook et al. (ed.), Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989, chapters 9 and 11.
Increase or a Decrease: A statistically significant positive or negative change, respectively, in quantity from a control value. An increase is a positive change, such as a 50%, 100%, 200%, 300%, 400% or 500% increase as compared to the control value. A decrease is a negative change, such as a 50%, 100%, 200%, 300%, 400% or 500% decrease as compared to a control value.
Isolated: An “isolated” biological component (such as a nucleic acid molecule, protein or organelle) has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, i.e., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acids and proteins that have been “isolated” include nucleic acids and proteins purified by standard purification methods. The term also embraces nucleic acids and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids.
Locus: A location on a chromosome or DNA molecule corresponding to a gene or a physical or phenotypic feature, where physical features include polymorphic sites.
Methylation: The addition of a methyl group (—CH3) to cytosine nucleotides of CpG sites in DNA. DNA methylation, the addition of a methyl group onto a nucleotide, is a post-replicative covalent modification of DNA that is catalyzed by a DNA methyltransferase enzyme. In biological systems, DNA methylation can serve as a mechanism for changing the structure of DNA without altering its coding function or its sequence.
Methylation sequencing assay: A sequencing assay that detects the methylation status of one or more CpG sites in DNA. A non-limiting example of a methylation sequencing assay is a sequencing assay performed on bisulfite-treated and amplified genomic DNA.
Methylation status: The state of methylation of a genomic sequence. This refers to the characteristics of a DNA segment at a particular genomic locus relevant to methylation. Such characteristics include, but are not limited to, whether any of the cytosine (C) residues within this DNA sequence are methylated, location of methylated C residue(s), percentage of methylated C at any particular stretch of residues, and allelic differences in methylation. The methylation profile affects the relative or absolute concentration of methylated C or unmethylated C at any particular stretch of residues in a biological sample.
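One of the characteristics listed above, the percentage of methylated C over a stretch of residues, can be computed from per-site read counts. This is a minimal sketch that assumes each CpG site is summarized by a pair of (methylated reads, unmethylated reads); the function names and the counts are hypothetical.

```python
def methylation_level(methylated_reads, unmethylated_reads):
    """Fraction of reads supporting methylation at one site, or None if uncovered."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        return None
    return methylated_reads / total


def region_methylation_percentage(site_counts):
    """Pooled percentage of methylated C over all covered sites in a region."""
    methylated = sum(m for m, _ in site_counts)
    total = sum(m + u for m, u in site_counts)
    if total == 0:
        return None
    return 100.0 * methylated / total


counts = [(8, 2), (0, 10), (5, 5)]  # hypothetical (methylated, unmethylated) reads
print([methylation_level(m, u) for m, u in counts])  # [0.8, 0.0, 0.5]
print(region_methylation_percentage(counts))
```

Pooling reads across sites weights each site by its coverage; averaging the per-site levels instead would weight all sites equally, and which is preferable depends on the application.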
Methyl-sensitive enzymes: DNA restriction endonucleases that are dependent on the methylation state of their DNA recognition site for activity. For example, there are methyl-sensitive enzymes that cleave at their DNA recognition sequence only if it is not methylated. Thus, an unmethylated DNA sample will be cut into smaller fragments than a methylated DNA sample. Similarly, a hypermethylated DNA sample will not be cleaved. In contrast, there are methyl-sensitive enzymes that cleave at their DNA recognition sequence only if it is methylated. As used herein, the terms “cleave”, “cut” and “digest” are used interchangeably.
Methyl-sensitive enzymes that digest unmethylated DNA suitable for use in methods of the invention include, but are not limited to, HpaII, HhaI, MaeII, BstUI and AciI. One enzyme is HpaII that cuts only the unmethylated sequence CCGG. Enzymes that digest only methylated DNA include, but are not limited to, DpnI, which cuts at a recognition sequence GATC, and McrBC, which belongs to the family of AAA+ proteins (New England BioLabs, Inc., Beverly, Mass.).
Cleavage methods and procedures for selected restriction enzymes for cutting DNA at specific sites are well known to the skilled artisan. For example, many suppliers of restriction enzymes provide information on conditions and types of DNA sequences cut by specific restriction enzymes, including New England BioLabs, Promega Corporation, Boehringer-Mannheim, and the like. Sambrook et al. (See Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y. 1989) provide a general description of methods for using restriction enzymes and other enzymes.
Oligonucleotide: An oligonucleotide is a plurality of nucleotides joined by native phosphodiester bonds, between about 6 and about 300 nucleotides in length. An oligonucleotide analog refers to moieties that function similarly to oligonucleotides but have non-naturally occurring portions. For example, oligonucleotide analogs can contain non-naturally occurring portions, such as altered sugar moieties or inter-sugar linkages, such as a phosphorothioate oligodeoxynucleotide. Functional analogs of naturally occurring polynucleotides can bind to RNA or DNA, and include peptide nucleic acid (PNA) molecules.
In several examples, oligonucleotides and oligonucleotide analogs can include linear sequences up to about 200 nucleotides in length, for example a sequence (such as DNA or RNA) that is at least 6 bases, for example at least 8, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100 or even 200 bases long, or from about 6 to about 70 bases, for example about 10-25 bases, such as 12, 15 or 20 bases.
Polymorphic marker: A segment of genomic DNA that exhibits heritable variation in a DNA sequence between individuals. Such markers include, but are not limited to, single nucleotide polymorphisms (SNPs), restriction fragment length polymorphisms (RFLPs), short tandem repeats, such as di-, tri- or tetra-nucleotide repeats (STRs), and the like. Polymorphic markers can be used to specifically differentiate between a maternal and paternal allele in the enriched fetal nucleic acid sample.
Polymorphism: A variation in a gene sequence. The polymorphisms can be those variations (DNA sequence differences) which are generally found between individuals or different ethnic groups and geographic locations which, while having a different sequence, produce functionally equivalent gene products. Typically, the term can also refer to variants in the sequence which can lead to gene products that are not functionally equivalent. Polymorphisms also encompass variations which can be classified as alleles and/or mutations which can produce gene products which may have an altered function. Polymorphisms also encompass variations which can be classified as alleles and/or mutations which either produce no gene product or an inactive gene product or an active gene product produced at an abnormal rate or in an inappropriate tissue or in response to an inappropriate stimulus. Alleles are the alternate forms that occur at the polymorphism.
Polymorphisms can be referred to, for instance, by the nucleotide position at which the variation exists, by the change in amino acid sequence caused by the nucleotide variation, or by a change in some other characteristic of the nucleic acid molecule or protein that is linked to the variation.
Primers: Primers are nucleic acid molecules, usually DNA oligonucleotides of about 10-50 nucleotides in length (longer lengths are also possible). Typically, primers are at least about 15 nucleotides in length, such as at least about 20, 25, 30, or 40 nucleotides in length. For example, a primer can be about 10-50 nucleotides in length, such as, 10-30, 15-20, 15-25, 15-30, or 20-30 nucleotides in length. Primers can also be of a maximum length, for example no more than 25, 30, 40, or 50 nucleotides in length. Forward and reverse primers may be annealed to a complementary target DNA strand by nucleic acid hybridization to form hybrids between the primers and the target DNA strand, and then extended along the target DNA strand by a DNA polymerase enzyme to form an amplicon. One of skill in the art will appreciate that the hybridization specificity of a particular probe or primer typically increases with its length. Thus, for example, a probe or primer including 20 consecutive nucleotides typically will anneal to a target with a higher specificity than a corresponding probe or primer of only 15 nucleotides. In some embodiments, forward and reverse primers are used in combination in a bisulfite amplicon sequencing assay.
Sample: A sample, such as a biological sample, is a sample obtained from a subject. As used herein, biological samples include all clinical samples useful for detection of fetal aneuploidy, including, but not limited to, cells, tissues, and bodily fluids, such as: blood; derivatives and fractions of blood, such as serum; urine; sputum; or CVS samples. In a particular example, a sample includes blood obtained from a human subject, such as whole blood or serum. Microdeletions and microduplications can be measured in samples isolated from a subject.
Sensitivity and specificity: Statistical measurements of the performance of a binary classification test. Sensitivity measures the proportion of actual positives which are correctly identified. Specificity measures the proportion of actual negatives which are correctly identified.
Sequence Read: A sequence (e.g., of about 300 bp) of contiguous base pairs of a nucleic acid molecule. The sequence read may be represented symbolically by the base pair sequence (in ATCG) of the sample portion. It may be stored in a memory device and processed as appropriate to determine whether it matches a reference sequence or meets other criteria. A sequence read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning a sample.
Standard control: A value reflective of the ratio, or the amount or concentration of a fetal genomic sequence located on a chromosome relevant to a particular chromosomal aneuploidy (such as trisomy 13, 18, or 21) over the amount or concentration of a fetal genetic marker located on a reference chromosome, as the amounts or concentrations are found in a biological sample (for example, blood, plasma, or serum) from an average, healthy pregnant woman carrying a chromosomally normal fetus. A “standard control” can be determined differently and represent different values depending on the context in which it is used. For instance, when used in an epigenetic-genetic dosage method where an epigenetic marker is measured against a genetic marker, the “standard control” is a value reflective of the ratio, or the amount or concentration of a fetal genomic sequence located on a chromosome relevant to a particular chromosomal aneuploidy (for example, trisomy 13, 18, or 21) over the amount or concentration of a fetal genetic marker located on a reference chromosome, as the amounts or concentrations are found in a biological sample (such as blood, plasma, or serum) from an average, healthy pregnant woman carrying a chromosomally normal fetus. In some embodiments, a standard control is determined based on an average healthy pregnant woman at a certain gestational age.
Subject: Living multi-cellular vertebrate organisms, a category that includes human and non-human mammals (such as laboratory or veterinary subjects).
Target sequence or region: A sequence of nucleotides located in a particular region in the human genome. The target can be for instance a coding sequence; it can also be the non-coding strand that corresponds to a coding sequence. The target can also be a non-coding sequence, such as an intronic sequence.
Unless otherwise explained, all technical and scientific terms used in this Example have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The singular terms “a,” “an,” and “the” include plural referents unless context clearly indicates otherwise. Similarly, the word “or” is intended to include “and” unless the context clearly indicates otherwise. It is further to be understood that all base sizes or amino acid sizes, and all molecular weight or molecular mass values, given for nucleic acids or polypeptides are approximate, and are provided for description. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of this disclosure, suitable methods and materials are described below. The term “comprises” means “includes.”
This Example provides an improved method for the detection of fetal aneuploidy and/or the detection of sub-chromosomal fetal copy number variations via bisulfite sequencing of maternal plasma. One intermediate step of the method is to estimate the percentage of DNA fragments in the maternal plasma that originate from the fetus. In some embodiments, the method for the detection of fetal aneuploidy includes the following steps.
First, the method comprises estimating the percentage of fetal DNA fragments in the sample using exclusively the reads aligned to the target chromosome for which we would like to test fetal aneuploidy, e.g., chromosome 21. Herein, this estimation is called FA.
Next, the method comprises estimating the percentage of fetal DNA fragments in the sample using the reads aligned to the reference chromosomes that we believe are not affected by fetal aneuploidy, e.g., chromosome 1, 2, 3, etc. Herein, this estimation is called FD.
The next step comprises calculating the difference FA−FD.
If |FA−FD|>c1, for some threshold c1>0, we claim that the fetus has aneuploidy in the target chromosome. Specifically, if FA−FD>c1, it is indicated that the fetus has one extra copy of the target chromosome. If FA−FD<−c1, it is indicated that the fetus has only one copy of the target chromosome.
If |FA−FD|>c2, for some threshold c2 such that c1>c2>0, and FD<c3 for threshold c3>0, it is undecided whether the fetus has aneuploidy in the target chromosome.
If |FA−FD|<c2, it is undecided whether the fetus is not affected by aneuploidy in the target chromosome.
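The decision rules above can be sketched in code. The following is a minimal illustration rather than the disclosed implementation: the function name, the returned labels, and the example threshold values are hypothetical, and FA and FD are assumed to be expressed as fractions. The text does not state an outcome for the case where c2 < |FA−FD| ≤ c1 and FD ≥ c3, so the sketch reports that case as "unspecified" instead of guessing.

```python
def classify_aneuploidy(fa, fd, c1, c2, c3):
    """Apply the FA/FD threshold rules to one sample.

    fa, fd     -- fetal fraction estimates from target / reference chromosomes
    c1, c2, c3 -- thresholds with c1 > c2 > 0 and c3 > 0 (values are assumptions)
    Returns a hypothetical label string.
    """
    if not (c1 > c2 > 0 and c3 > 0):
        raise ValueError("thresholds must satisfy c1 > c2 > 0 and c3 > 0")

    diff = fa - fd
    if diff > c1:
        return "extra_copy"      # FA - FD > c1
    if diff < -c1:
        return "single_copy"     # FA - FD < -c1
    if abs(diff) > c2 and fd < c3:
        return "undecided"       # intermediate difference, low FD
    if abs(diff) < c2:
        return "below_c2"        # the text's |FA - FD| < c2 case
    return "unspecified"         # case not covered by the stated rules


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the disclosure.
    print(classify_aneuploidy(fa=0.15, fd=0.10, c1=0.03, c2=0.01, c3=0.04))
```

The same structure would apply to the sub-chromosomal case by substituting FAM and FDM for FA and FD.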
In some embodiments, the method for the detection of sub-chromosomal fetal copy number variations includes the following steps:
First, the method comprises estimating the percentage of fetal DNA fragments in the sample using exclusively the reads aligned to a target genomic copy number variation region. The target copy number variation region may be on a target chromosome, or any chromosome. Herein, this estimation is called FAM.
Next, the method comprises estimating the percentage of fetal DNA fragments in the sample using the reads aligned to a reference or control copy number variation region that is known to not be significantly affected by fetal copy number variations. The reference copy number variation region may be on a reference chromosome or any chromosome. Herein, this estimation is called FDM.
The next step comprises calculating the difference FAM−FDM.
If |FAM−FDM|>c1, for some threshold c1>0, we claim that the sample has the presence of sub-chromosomal fetal microdeletions. Specifically, if FAM−FDM>c1, it is indicated that the fetal DNA in the sample has sub-chromosomal copy number variations. If FAM−FDM<−c1, it is indicated that the fetal DNA in the sample does not have sub-chromosomal copy number variations.
If |FAM−FDM|>c2, for some threshold c2 such that c1>c2>0, and FDM<c3 for constant c3>0, it is undecided whether the fetus has sub-chromosomal copy number variations.
If |FAM−FDM|<c2, it is undecided whether the fetus is not affected by sub-chromosomal copy number variations.
Example Embodiment F of Example 3: Detection of Trisomy 21 by Bisulfite Sequencing of Maternal Plasma
Sequencing Datasets Used Were as Follows:
CM dataset: CVS and MBC genome-wide targeted bisulfite sequencing data: We have 5 bisulfite sequencing libraries from 5 normal CVS samples and 6 bisulfite sequencing libraries from 6 normal MBC samples. Genome-wide targeting was achieved via solution-phase hybridization using the Agilent kit. There are about 2.8 million CpG sites (ranging from 1.2 to 3.5 million) covered by 5 or more reads in each library. In total we have 2.75 million CpG sites that are covered by 5 or more reads in at least 3 libraries in each group.
PnP dataset: Normal pregnant and non-pregnant plasma targeted bisulfite sequencing data: We have 6 bisulfite sequencing libraries from 6 normal maternal plasma samples and 9 bisulfite sequencing libraries from 9 normal non-pregnant women's plasma samples. Genome-wide targeting was achieved via solution-phase hybridization using the Roche kit. There are about 7.3 million CpG sites (ranging from 5.3 to 8.8 million) covered by 5 or more reads in each library. We have in total 6.84 million CpG sites that are covered by 5 or more reads in at least 4 libraries in each group.
T21N dataset: Maternal plasma (Normal and Trisomy 21) targeted bisulfite sequencing data included 14 libraries from 12 normal samples, and 12 libraries from 12 T21 samples. Genome-wide targeting was achieved via solution-phase hybridization using the Roche kit. In each library, there are about 7.7 million CpG sites covered by 5 or more reads (ranging from 4.86 million to 9.95 million). We have in total about 6.86 million CpG sites that are covered by 5 or more reads in at least 8 libraries in each group.
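The coverage criterion used for all three datasets (a CpG site must be covered by 5 or more reads in at least a minimum number of libraries in each group) can be illustrated with a short sketch. The data layout, the function name, and the site identifiers below are assumptions made for illustration; the disclosure does not describe how coverage is stored.

```python
from collections import Counter


def well_covered_sites(groups, min_reads=5, min_libraries=3):
    """Return CpG sites covered by >= min_reads reads in >= min_libraries
    libraries of every group.

    groups -- dict mapping a group name to a list of libraries, where each
              library is a dict {site_id: read_count} (hypothetical layout)
    """
    shared = None
    for libraries in groups.values():
        # Count, per site, how many libraries in this group meet the read depth.
        counts = Counter()
        for coverage in libraries:
            for site, reads in coverage.items():
                if reads >= min_reads:
                    counts[site] += 1
        passing = {site for site, n in counts.items() if n >= min_libraries}
        # A site must pass in every group, so intersect across groups.
        shared = passing if shared is None else shared & passing
    return shared or set()


if __name__ == "__main__":
    # Toy read counts, not data from the disclosure.
    toy = {
        "CVS": [{"chr21:100": 6, "chr21:200": 4}, {"chr21:100": 5, "chr21:200": 9}],
        "MBC": [{"chr21:100": 7}, {"chr21:100": 5, "chr21:200": 8}],
    }
    print(well_covered_sites(toy, min_reads=5, min_libraries=2))
```

Under this sketch, the CM, PnP, and T21N criteria would correspond to `min_libraries` values of 3, 4, and 8 respectively.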
Model selection: The method first comprises determining the most appropriate predictive model for analyzing the methylation levels of libraries in the T21N data set.
Maternal plasma DNA can be considered as a mixture of the DNA from maternal tissues and fetal tissues. If we assume that among the fetal tissues, placenta contributes the most to the maternal plasma DNA, and among the maternal tissues, maternal white blood cells contribute the most to the maternal plasma DNA, we can approximate the fetal tissue signature using placental reference data, and the maternal tissue signature using maternal white blood cell reference data. This suggests that, for the first predictive model, we can predict the maternal plasma DNA methylation signature as a mixture of the average placental (CVS) DNA methylation signature and the average maternal white blood cells (MBC) DNA methylation signature. Herein this is referred to as the CM model.
Secondly, we can use the DNA from non-pregnant women's plasma to approximate the maternal plasma DNA coming from the maternal tissues. Therefore, as the second predictive model, we can predict the maternal plasma DNA methylation as a mixture of the average placenta (CVS) DNA methylation and the average non-pregnant women plasma (nP) DNA methylation. Herein this is referred to as the CnP model.
Finally, as discussed in Example 1, we can use mixtures of the maternal and fetal tissues to predict the maternal plasma DNA methylation, as long as the signals in these mixtures have different mix proportions. Therefore, as the third predictive model, we can predict the maternal plasma DNA methylation as a mixture of a certain average maternal plasma (P) DNA methylation and the average non-pregnant women plasma (nP) DNA methylation. (Note that because different maternal plasma samples have different proportions of fetal DNA, average maternal plasma methylation will be different for different sets of samples). Herein this is referred to as the PnP model.
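All three models treat the maternal plasma methylation level at each CpG site as a mixture of two reference signatures. One way to make this concrete, offered here only as a sketch, is to write plasma ≈ f·A + (1−f)·B at each site and solve for the mix proportion f by ordinary least squares. The least-squares estimator, the function name, and the toy values are assumptions; the text does not state which estimator is used. Methylation levels are assumed to be on a 0 to 100 scale.

```python
def estimate_mix_fraction(plasma, ref_a, ref_b):
    """Least-squares estimate of f in: plasma[i] ~= f * ref_a[i] + (1 - f) * ref_b[i].

    plasma -- methylation levels of the selected CpG sites in one plasma library
    ref_a  -- average levels in the fetal-like reference (e.g., CVS, or P)
    ref_b  -- average levels in the maternal-like reference (e.g., MBC, or nP)
    """
    numerator = 0.0
    denominator = 0.0
    for p, a, b in zip(plasma, ref_a, ref_b):
        delta = a - b                  # per-site contrast between the references
        numerator += (p - b) * delta
        denominator += delta * delta
    if denominator == 0.0:
        raise ValueError("references are identical at every site; f is not identifiable")
    return numerator / denominator


if __name__ == "__main__":
    # Toy signatures, not data from the disclosure.
    cvs = [80.0, 10.0, 50.0]
    mbc = [20.0, 70.0, 50.0]
    plasma = [26.0, 64.0, 50.0]        # constructed as 0.1 * cvs + 0.9 * mbc
    print(estimate_mix_fraction(plasma, cvs, mbc))
```

For the CM and CnP models, f can be read as an approximate fetal fraction. For the PnP model, where one reference is itself a mixture, f is a proportion relative to that reference rather than a direct fetal fraction. Sites where the two references agree contribute nothing to the estimate, which is one motivation for restricting the analysis to differentially methylated sites.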
Selection of CpG sites: CpG sites were selected in the target chromosome (chr21) and reference chromosomes (chr1, . . . , 12, 14, 15). We experimented with several different ways of identifying informative CpG sites for the detection of aneuploidy. In addition to the use of all CpG sites, we also considered the set of CpG sites that are differentially methylated between the maternal tissues (those that have the most significant influence on the maternal plasma DNA methylation pattern) and the fetal tissues (those that have the most significant influence on the maternal plasma methylation pattern).
For the CM model, the CpG sites must be shared by the CM data and the T21N data. In total we found 2031580 shared CpG sites. Among them, 20859 are located in the target chromosome chr21, 1431135 are located in the reference chromosomes (chr1, . . . , 12, 14, 15). Furthermore, if we require the CpG sites to be differentially methylated between the CVS and MBC samples, with p value <=0.05 and difference in methylation level >=20, the number of CpG sites reduces to 648617, with 7409 in the target chromosome and 455660 in the reference chromosomes.
For the CnP model, the CpG sites must be shared by all three data sets: CM data, PnP data, and T21N data. In total we found 1970505 shared CpG sites. Among them, 20015 are located in the target chromosome chr21, 1391135 are located in the reference chromosomes (chr1, . . . , 12, 14, 15). Furthermore, if we require the CpG sites to be differentially methylated between the CVS and MBC samples, with p value <=0.05 and difference in methylation level >=20, the number of CpG sites reduces to 637599, with 7243 in the target chromosome and 448842 in the reference chromosomes.
For the PnP model, the CpG sites must be shared by two data sets: PnP data, and T21N data. In total we found 6386312 shared CpG sites. Among them, 69344 are located in the target chromosome chr21, 4498308 are located in the reference chromosomes (chr1, . . . , 12, 14, 15). Furthermore, if we require the CpG sites to be differentially methylated between the pregnant and non-pregnant plasma samples, with p value <=0.05 and difference in methylation level between 2 and 20, the number of CpG sites reduces to 546671, with 6316 in the target chromosome and 392185 in the reference chromosomes.
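The differential-methylation criteria described for the three models can be expressed as a simple filter. This is a minimal sketch under stated assumptions: per-site group means and p values are taken as precomputed inputs (the statistical test is not named in the text), the difference is interpreted as an absolute difference on a 0 to 100 scale, and the function name and tuple layout are hypothetical.

```python
def select_differential_sites(stats, max_p=0.05, min_diff=20.0, max_diff=None):
    """Keep sites whose group means differ significantly and by enough.

    stats    -- iterable of (site_id, mean_group_a, mean_group_b, p_value)
    max_p    -- keep sites with p_value <= max_p
    min_diff -- keep sites with |mean_a - mean_b| >= min_diff
    max_diff -- optional upper bound on the absolute difference
    """
    selected = []
    for site, mean_a, mean_b, p_value in stats:
        diff = abs(mean_a - mean_b)
        if p_value > max_p:
            continue
        if diff < min_diff:
            continue
        if max_diff is not None and diff > max_diff:
            continue
        selected.append(site)
    return selected


if __name__ == "__main__":
    # Toy statistics, not data from the disclosure.
    toy = [
        ("chr21:100", 80.0, 20.0, 0.001),
        ("chr21:200", 55.0, 50.0, 0.010),
        ("chr21:300", 90.0, 30.0, 0.200),
    ]
    # CM / CnP style criterion: p <= 0.05 and difference >= 20.
    print(select_differential_sites(toy))
    # PnP style criterion: p <= 0.05 and difference between 2 and 20.
    print(select_differential_sites(toy, min_diff=2.0, max_diff=20.0))
```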
Table 10 below summarizes the CpG sites used in our study:
Application of the modified algorithm: The modified algorithm of this Example was applied to test which libraries in T21N are from trisomy 21 pregnancies.
We predicted the fetal frequency using the CpG sites shown in Table 10. For the CM model, we used the average methylation level of the selected CpG sites respectively in the 5 CVS libraries and 6 MBC libraries from the CM data set. For the CnP model, we used the average methylation level of the selected CpG sites respectively in the 5 CVS libraries from the CM data set and 6 maternal plasma libraries from the PnP data set. For the PnP model, we used the average methylation level of the selected CpG sites respectively in the 6 maternal plasma libraries and 9 nonpregnant plasma libraries from the PnP data set. The results of these analyses, shown at
Results: In each figure for the CM models (
For the PnP models (
C1. A method, comprising:
obtaining sequence reads of a methylation sequencing assay covering genomic segments of a maternal plasma sample from a pregnant subject;
estimating a first percentage of fetal DNA molecules (FA) in the maternal plasma sample based on sequence reads that are aligned to a target chromosome;
estimating a second percentage of fetal DNA molecules (FD) in the maternal plasma sample based on sequence reads that are aligned to one or more reference chromosomes; and
indicating the presence of a fetal aneuploidy in the pregnant subject responsive to an absolute value difference between the first percentage and the second percentage being larger than a threshold c1, wherein c1>0,
wherein the maternal plasma sample comprises maternal DNA and fetal DNA.
C2. The method of embodiment C1, wherein the target chromosome is chromosome 21, chromosome 13, chromosome 18, or chromosome X.
C3. The method of embodiments C1 or C2, wherein the one or more reference chromosomes include one or more of chromosomes 1-12, 14-17, 19, 20 and 22.
C4. The method of any of embodiments C1-C3, wherein indicating fetal aneuploidy comprises indicating that a fetus of the pregnant subject has an abnormal number of copies of the target chromosome.
C5. The method of any of embodiments C1-C4, wherein the pregnant subject is a human.
C6. A method for detecting fetal aneuploidy in a maternal plasma DNA sample obtained from a pregnant woman as described herein.
C7. A method for detecting sub-chromosomal fetal copy number variation in a maternal plasma DNA sample obtained from a pregnant woman as described herein.
C8. A method, comprising:
obtaining sequence reads of a methylation sequencing assay covering genomic segments of a maternal plasma sample from a pregnant subject;
estimating a first percentage of fetal DNA molecules (FAM) in the maternal plasma sample based on sequence reads that are aligned to a target copy number variation region of a chromosome of a genome;
estimating a second percentage of fetal DNA molecules (FDM) in the maternal plasma sample based on sequence reads that are aligned to one or more reference regions of the genome; and
indicating the presence of a sub-chromosomal fetal copy number variation in the pregnant subject responsive to an absolute value difference between the first percentage and the second percentage being larger than a threshold c1, wherein c1>0, wherein the maternal plasma sample comprises maternal DNA and fetal DNA.
Abstract of this Example (Example 3)
Non-invasive methods are disclosed in this Example for prenatal diagnosis. These methods can be used to detect fetal aneuploidy and/or sub-chromosomal fetal copy number variations. The methods are performed on a biological sample including maternal plasma DNA, such as a plasma or blood sample from a pregnant woman.
Example 4: Methods and Materials for Assessing and Treating Endometriosis
Field of this Example
This Example relates to methods and materials involved in assessing a mammal (e.g., a human) for and/or treating a mammal (e.g., human) having or developing endometriosis. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) to determine whether or not a mammal has, or is developing, endometriosis. This Example also provides methods, algorithms, and kits for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis.
Background of this Example
Endometriosis is a debilitating disease involving the growth of uterine tissue outside the uterus. The primary symptoms are pelvic pain and infertility. Nearly half of affected women have chronic pelvic pain, and in 70 percent of those, the pain occurs during menstruation. Pain with sex also is common. Infertility occurs in up to half of women affected. Less common symptoms include urinary or bowel symptoms. About 25 percent of women have no symptoms. Endometriosis can have both social and psychological effects.
Currently, definitive diagnosis requires a surgical biopsy, which is performed laparoscopically. Because this method is invasive, diagnosis, and therefore treatment, is frequently delayed considerably, resulting in prolonged and progressive pain and a risk of infertility.
Summary of this Example
This Example provides methods and materials involved in assessing a mammal (e.g., a human) for and/or treating a mammal (e.g., human) having or developing endometriosis. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) to determine whether or not a mammal has, or is developing, endometriosis. Determining if a mammal (e.g., a human) has, or is likely to develop, endometriosis by assessing DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical swab sample) can aid in the identification of mammals (e.g., humans) that should be treated in a particular manner (e.g., by administering a hormone therapy, by administering a pain medication, and/or by performing a surgical treatment), for example, early in the disease process.
This Example also provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis. For example, the methods described in this Example can include determining the methylation status of one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical swab sample) of a mammal (e.g., a female human). This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis.
In one aspect, this Example provides a method for diagnosing, prognosing, classifying, and/or monitoring endometriosis in a mammal (e.g., a human female) comprising: (a) obtaining a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and (d) identifying the mammal as having endometriosis, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference indicates the presence of endometriosis in the mammal.
In another aspect, this Example provides a method of treating endometriosis in a mammal (e.g., a human female) comprising: (a) obtaining a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci present in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; (d) identifying the mammal as having endometriosis, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or the set of predicted values indicates the presence of endometriosis in the mammal; and (e) treating the mammal by administering a hormone therapy, by administering a pain medication, and/or by performing a surgical treatment.
In some cases, an increase in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) can indicate the presence of endometriosis in the mammal, or a decrease in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical swab sample) can indicate the presence of endometriosis in the mammal. In some cases, a decrease in the level of methylation of at least one of the one or more genomic loci in a sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample can indicate the presence of endometriosis in the mammal.
In some cases, the reference can be the methylation status and/or level of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical swab sample) obtained from a mammal (e.g., a human female) that does not have endometriosis.
In another aspect, this Example provides a method of treating endometriosis in a mammal (e.g., a human female) comprising: (a) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) from the mammal prior to a treatment of endometriosis; (b) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) from the mammal during the treatment of endometriosis; and (c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of endometriosis indicates the subject is responsive to the treatment. In some cases, the method further comprises (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of endometriosis indicates the subject is not responsive to the treatment.
In some cases, an increase in the level of methylation of the one or more genomic loci in the sample indicates that the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the subject is not responsive to the treatment.
This Example further provides algorithms for diagnosing and/or monitoring a mammal having endometriosis. In certain embodiments, the algorithm of this Example can be used to classify endometriosis of a mammal (e.g., a human female).
In another aspect, this Example provides a kit for diagnosing, prognosing, and/or monitoring endometriosis in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci. In certain embodiments, the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.
In certain embodiments, the one or more genomic loci are present within nucleic acids isolated from the sample. In certain embodiments, the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.
Description of this Example
This Example provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis. For example, the methods described herein can include determining the methylation status of one or more genomic loci in a sample of a mammal (e.g., a human female). In some cases, the methods described herein can include the use of an algorithm to diagnose, prognose, monitor, classify, and/or assist in the treatment of endometriosis.
Unless defined otherwise, all technical and scientific terms used in this Example generally have their ordinary meanings in the art, within the context of this Example and in the specific context where each term is used. The following references provide one of skill with a general definition of many of the terms used in this Example: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). Certain terms are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the compositions and methods of this Example and how to make and use them.
The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. The present Example also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.
As used herein, the use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” Still further, the terms “having,” “including,” “containing,” and “comprising” are interchangeable, and one of skill in the art is cognizant that these terms are open ended terms.
The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.
In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows detection of a disease and/or disorder in an individual, including detection of the disease or the disorder in its early stages. In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows the characterization of a phenotype of a disease and/or a disorder in an individual. Early stage of a disease, as used herein, refers to the time period between the onset of the disease and the time point that signs or symptoms of the disease emerge. In certain non-limiting embodiments, the presence, absence, and/or level of a biomarker in a sample of a mammal (e.g., a human) is determined by comparing to a reference control.
The terms “reference sample,” “reference control,” “control,” or “reference,” as used interchangeably herein, refer to a control for a methylation status of a genomic locus that is to be detected in a sample of a mammal. In certain embodiments, a reference sample can be a sample from a healthy individual, e.g., an individual that does not have endometriosis. In certain embodiments, a reference sample can be a sample from a control individual that does not have the disease or phenotype to be detected by a biomarker disclosed herein. In certain embodiments, a control or reference can be the presence, absence, and/or a particular level of a methylation state of a genomic locus in a healthy individual. In certain embodiments, a reference can be a predetermined presence, absence, and/or particular level of a methylation state of a genomic locus that indicates a subject does not have endometriosis. In certain embodiments, a reference can be the methylation status of a locus in an individual having a disease or a phenotype, e.g., an individual that has endometriosis, where the methylation status of the locus is known to be not associated with the disease or the phenotype.
The term “a set of predicted values” refers to the methylation status of certain genomic loci for a sample. The status of those loci is not directly measured from that sample. Rather, it is inferred from measurements of other loci for that sample and/or measurements of other samples. The inference of the predicted values is based on some mathematical/statistical models. The models usually assume that the sample for which the methylation status of those loci is to be predicted has a normal phenotype. This assumption may be either correct or wrong, but its correctness is not required for the inference of the predicted values.
The term “slightly invasive or non-invasive method” refers to a method that does not involve the removal of tissues by biopsy from the uterus or endometrium. In certain embodiments, slightly invasive or non-invasive methods, as described herein, include obtaining plasma, urine, or a cervical or vaginal swab from a subject.
The term “patient” or “subject,” as used interchangeably herein, refers to any warm-blooded animal, e.g., human or non-human. Non-limiting examples of non-human subjects include mammals, non-human primates, dogs, cats, mice, rats, guinea pigs, rabbits, fowl, pigs, horses, cows, goats, sheep, etc. In certain embodiments, the subject is human.
The term “nucleic acid,” “nucleic acid molecule,” or “polynucleotide” includes any compound and/or substance that comprises a polymer of nucleotides. Each nucleotide is composed of a base, specifically a purine- or pyrimidine base (i.e., cytosine (C), guanine (G), adenine (A), thymine (T), or uracil (U)), a sugar (i.e., deoxyribose or ribose), and a phosphate group. In certain embodiments, the nucleic acid molecule is described by the sequence of bases, whereby said bases represent the primary structure (linear structure) of a nucleic acid molecule. The sequence of bases is typically represented from 5′ to 3′. These terms encompass deoxyribonucleic acid (DNA) including, e.g., complementary DNA (cDNA) and genomic DNA, ribonucleic acid (RNA), in particular messenger RNA (mRNA), synthetic forms of DNA or RNA, and mixed polymers comprising two or more of these molecules. The herein described nucleic acid molecule can contain naturally occurring or non-naturally occurring nucleotides. Examples of non-naturally occurring nucleotides include modified nucleotide bases with derivatized sugars or phosphate backbone linkages or chemically modified residues.
The term “isolated” (e.g., isolated genomic DNA) refers to a biological component that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, e.g., other chromosomal and extra-chromosomal DNA and RNA, proteins, and organelles. Nucleic acids, e.g., DNA, that have been “isolated” include nucleic acids purified by standard purification methods.
The term “genomic locus” or “genomic DNA locus,” as used herein, refers to any fixed position in a genome. For example, a genomic locus can refer to a genomic element, a chromosomal region, a gene, a region of a gene, e.g., an exon or intron, a regulatory region of a gene, e.g., a promoter or enhancer, a CpG site, a CpG island, or a CpG island shore. For example, a genomic locus can include one or more CpG sites, e.g., between about 1 to about 100 CpG sites. In certain embodiments, a genomic locus can be of any particular length, e.g., between about 1 to about 10,000 nucleotides in length.
As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status,” and “methylation level” refer to the presence, absence, percentage, and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, indicate the methylation state of any of the nucleotides (e.g., cytosines) in the sequence, can indicate the methylation state of a subset of the nucleotides (e.g., of cytosines), can indicate the percentage or fraction of methylated cytosines at any particular stretch of nucleotides within the sequence or can indicate the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more methylated nucleotides.
As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.
A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., DNA sequence, that has a high frequency of CpG dinucleotide repeats. See, e.g., Illingworth and Bird, FEBS Letters, 2009; 583:1713-1720. For example, Yamada et al. (Genome Research, 2004; 14:247-266) have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length, have a GC content greater than 50%, and have an OCF/ECF ratio greater than 0.6. Others (Takai et al., Proc. Natl. Acad. Sci. U.S.A., 2002; 99:3740-3745) have defined a CpG island less stringently as a sequence of at least 200 nucleotides in length, having a greater than 50% GC content and an OCF/ECF ratio greater than 0.6.
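The CpG island criteria above can be illustrated with a short sketch. It assumes that “OCF/ECF” denotes the ratio of observed to expected CpG frequency, computed as (number of CpG dinucleotides × sequence length) / (number of C × number of G); the function names and default thresholds are hypothetical and chosen only to mirror the less stringent definition quoted above.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction, and observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently (assumed formula).
    expected = (c * g) / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_cpg": ratio}


def is_cpg_island(seq: str, min_length: int = 200,
                  min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply length, GC-content, and observed/expected CpG thresholds."""
    s = cpg_stats(seq)
    return (s["length"] >= min_length
            and s["gc_fraction"] > min_gc
            and s["obs_exp_cpg"] > min_ratio)
```

Passing min_length=400 applies the more stringent length requirement quoted above without changing the other thresholds.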
A “CpG island shore,” as used herein, refers to methylation hotspots that are present a short distance, e.g., less than 2 kb, from CpG islands.
The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome), or a portion of the subset (e.g., those areas found to be associated with endometriosis). A methylome from plasma can be referred to as a “plasma fluid methylome,” or a “plasma fluid DNA methylome.” The plasma fluid methylome is an example of a cell-free methylome that includes cell-free DNA (cfDNA).
As used herein, the term “increase” refers to alter positively by at least about 2%, including, but not limited to, alter positively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
As used herein, the terms “reduce,” “reduction,” or “decrease” refer to alter negatively by at least about 2%, including, but not limited to, alter negatively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
As described herein, this Example provides methods for diagnosing, monitoring, classifying, and/or treating endometriosis by analyzing the methylation status of one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) of a mammal (e.g., a human female). In certain embodiments, the methods can include using an algorithm described herein. In certain embodiments, the methods described herein can allow for the early diagnosis or screening of a subject with endometriosis, e.g., the subject does not have any symptoms, or only has early symptoms of endometriosis.
In certain embodiments, samples obtained for use in the methods described herein can include cfDNA, which carries DNA methylation information from the cell of origin. cfDNA can arise from cellular apoptosis and necrosis, and can be generated from active secretory processes, with the formation of extracellular vesicles. DNA methylation signatures are highly tissue-specific, and include in vivo information relating to the tissue source of cfDNA. In certain embodiments, the methods described herein can include analyzing cfDNA in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample), to identify genetic phenotypes that are drivers and/or consequences of endometriosis.
The sample from the subject can be collected using any appropriate technique. For example, a blood sample, a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample can be collected using standard methods. In some cases, the sample can be collected from the subject before the subject has any symptom of endometriosis, i.e., a non-symptomatic subject. In certain embodiments, the sample can be collected from the non-symptomatic subject who is at high risk of endometriosis. In certain embodiments, the sample can be collected from the subject who has previously received or is currently receiving a treatment for endometriosis. In certain embodiments, two or more samples (e.g., two or more, three or more, four or more, five or more, six or more or seven or more samples) can be obtained before and while the subject is receiving a treatment for endometriosis (e.g., serially obtained samples).
Diagnostic, Prognostic, Classification, and Monitoring Methods of this Example
This Example provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring endometriosis in a subject that includes analyzing the methylation status of certain genomic loci.
In certain embodiments, the analyzed genomic loci can include one or more genomic loci that exhibit differential methylation in a sample from a subject that has endometriosis compared to a reference sample. For example, the methods described herein can include assessing the methylation status of one or more genomic loci, e.g., about 5 or more, about 10 or more, about 50 or more, about 100 or more, about 500 or more, about 1,000 or more, about 5,000 or more, about 10,000 or more, about 25,000 or more, about 50,000 or more or about 100,000 or more genomic loci in a sample of a subject. In certain embodiments, the one or more genomic loci can be one or more promoter regions of one or more genes, one or more exons of one or more genes, one or more introns of one or more genes, one or more CpG sites, one or more CpG islands, one or more CpG island shores, one or more enhancers of one or more genes, or a combination thereof. In certain embodiments, the genomic loci are present in intergenic regions. In certain embodiments, the genomic loci are present on a particular chromosome.
In certain embodiments, this Example provides methods for diagnosing, prognosing, and/or monitoring endometriosis in a subject by detecting the DNA methylation profiles associated with endometriosis. In certain embodiments, the methods described herein can include (a) obtaining a sample from the subject, (b) determining the methylation status of one or more genomic loci present in the sample, e.g., present within cfDNA in a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample, (c) comparing the methylation status of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing endometriosis in the subject. In certain embodiments, the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of endometriosis in the subject. In certain embodiments, the difference in the methylation status also can indicate the severity of endometriosis.
In certain embodiments, the methods described herein for diagnosing, prognosing, and/or monitoring endometriosis in a subject can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing endometriosis in the subject. In certain embodiments, the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of endometriosis in the subject. In certain embodiments, the difference in the methylation level also can indicate the severity of endometriosis.
In certain embodiments, diagnosing endometriosis in the subject can include characterizing a phenotype of the endometriosis, wherein the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the phenotype of the endometriosis. In certain embodiments, the phenotype of the endometriosis can include the severity of the endometriosis, prognosis of the endometriosis, molecular expression profile of the endometriosis, responsiveness of the endometriosis to certain treatments, or any combination thereof.
In certain embodiments, the methods described herein for determining if a subject is at risk of developing endometriosis can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) determining that the subject is at risk of developing endometriosis, wherein the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates that the subject is at risk.
In certain embodiments, diagnosing, prognosing, and/or monitoring of a subject with endometriosis can be based on a higher or lower methylation level of the genomic locus in the sample of the subject relative to the methylation level in a reference sample, e.g., a sample from a subject that does not have endometriosis. In certain embodiments, a difference of greater than about 5%, greater than about 10%, greater than about 15%, greater than about 20%, greater than about 25%, greater than about 30%, greater than about 35%, greater than about 40%, greater than about 45%, greater than about 50%, greater than about 55%, greater than about 60%, greater than about 65%, greater than about 70%, greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90% or greater than about 95% in the methylation (e.g., level, percentage and/or fraction) of the one or more genomic loci in a sample obtained from a subject compared to a control can be indicative that the subject has endometriosis or is at risk of developing endometriosis. In certain embodiments, the difference can be a decrease in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. Alternatively, the difference can be an increase in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. In certain embodiments, the difference can be a decrease in the methylation of a genomic locus and an increase in the methylation of a different genomic locus in the sample obtained from the subject. In certain embodiments, a decrease in the level of methylation of one or more genomic loci in the sample and the increase in the level of methylation of one or more different genomic loci in the sample can indicate the presence of endometriosis.
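The comparison of per-locus methylation levels against a reference can be sketched as follows. This is a minimal illustration only: it assumes methylation levels are expressed as fractions between 0 and 1, that the percentage differences above are read as absolute differences in that fraction, and that loci are keyed by hypothetical identifiers. The function name and the 10% default threshold are likewise hypothetical.

```python
def differential_loci(sample: dict, reference: dict,
                      min_abs_diff: float = 0.10) -> dict:
    """Label loci whose methylation fraction differs from the reference.

    sample and reference map a locus identifier to a methylation fraction
    in [0, 1]. Loci missing from the reference are skipped. Returns a dict
    mapping each differing locus to "increase" or "decrease".
    """
    calls = {}
    for locus, level in sample.items():
        if locus not in reference:
            continue
        diff = level - reference[locus]
        # Assumption: threshold is an absolute difference in methylation fraction.
        if abs(diff) > min_abs_diff:
            calls[locus] = "increase" if diff > 0 else "decrease"
    return calls
```

A sample in which one locus is labeled "decrease" and a different locus is labeled "increase" corresponds to the mixed pattern described above.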
In certain embodiments, diagnosis of a subject with endometriosis can be based on the methylated or unmethylated state of a genomic locus, e.g., a CpG site. In certain embodiments, a genomic locus, e.g., a CpG site, in a sample from a subject diagnosed with endometriosis can be methylated and the genomic locus, e.g., the CpG site, in a reference sample can be unmethylated. In certain embodiments, a genomic locus in a sample from a subject diagnosed with endometriosis can be unmethylated and the genomic locus in a reference sample can be methylated.
Diagnostic, Prognostic, Classification, and Monitoring Methods Using an Algorithm of this Example
This Example also provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci by using an algorithm as described, for example, in Example Embodiment H. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring endometriosis in a subject that includes analyzing the methylation status of certain genomic loci and/or genomic fractions.
Methods for Treating Endometriosis
This Example also provides methods for treating a subject having endometriosis. For example, a mammal (e.g., a human female) that was identified as having endometriosis as described herein (or identified as being at risk of developing endometriosis as described herein) can be administered one or more hormone therapies, one or more pain medications, or a combination thereof to treat endometriosis. Examples of hormone therapies that can be used as described herein include, without limitation, gonadotropin-releasing hormone therapies (e.g., elagolix), estrogen therapies, progestin therapies, estrogen and progestin combination therapies, progesterone therapies, progesterone and progestin combination therapies, danazol therapies, and gestrinone therapies. Examples of pain medications that can be used as described herein include, without limitation, nonsteroidal anti-inflammatory drugs. In some cases, a mammal (e.g., a human female) that was identified as having endometriosis as described herein (or identified as being at risk of developing endometriosis as described herein) can be treated using a surgical procedure to treat endometriosis. Examples of surgical procedures that can be used as described herein include, without limitation, laparoscopic surgeries to remove one or more endometriosis patches, laparotomy surgeries to remove one or more endometriosis patches, and surgeries to sever pelvic nerves (e.g., presacral neurectomy or laparoscopic uterine nerve ablation surgeries).
In some cases, the information provided by the methods described herein can be used by a clinician or physician in determining the most effective course of treatment (e.g., preventative or therapeutic) for the subject. A course of treatment refers to the measures taken for a patient after the prognosis or the assessment of increased risk for development of endometriosis is made. For example, when a subject is identified to have an increased risk of developing endometriosis, the physician can determine whether frequent monitoring of DNA methylation changes can be performed as a prophylactic measure. Also, when a subject is diagnosed with endometriosis (e.g., based on the presence of a DNA methylation pattern in a sample from a subject), it can be advantageous to follow such detection with a therapeutic treatment.
In some cases, this Example provides methods for assessing the efficacy of a therapeutic or prophylactic therapy for treating endometriosis in a subject, comprising determining the methylation status of one or more genomic loci present in a sample obtained from a subject prior to the therapy and determining methylation status of the one or more genomic loci present in a sample obtained from the subject at one or more time points during the therapeutic or prophylactic therapy, wherein the therapy is efficacious for treating endometriosis in a subject when there is a change in the presence and/or level of methylation of the one or more genomic loci in the second or subsequent samples, relative to the first sample. In certain embodiments, the first sample is obtained after therapeutic treatment has begun.
In certain embodiments, the methods for monitoring the response in a subject to prophylactic or therapeutic treatment of endometriosis can include measuring the methylation status and/or level of one or more genomic loci in a sample of a subject at a first time-point, administering a therapeutic agent, re-measuring the methylation status and/or level of the one or more genomic loci at a second time-point, comparing the results of the first and second measurements and optionally modifying the treatment regimen based on the comparison. In certain embodiments, the first time-point can be prior to an administration of the therapeutic agent, and the second time-point can be after said administration of the therapeutic agent. In certain embodiments, the first time-point can be prior to the administration of the therapeutic agent to the subject for the first time. In certain embodiments, the dose (defined as the quantity of therapeutic agent administered at any one administration) can be increased or decreased in response to the comparison. In certain embodiments, the dosing interval (defined as the time between successive administrations) can be increased or decreased in response to the comparison, including total discontinuation of treatment. In addition, the methods described herein can be used to determine the efficacy of the therapeutic treatment, wherein a change in the methylation status of certain genomic loci present in a sample of a subject can indicate that the therapeutic treatment regimen can be altered, reduced, and/or stopped.
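The comparison of first and second time-point measurements can be sketched as a per-locus difference. This is an illustrative sketch under stated assumptions, not a clinical decision rule: methylation levels are assumed to be fractions between 0 and 1, and the function names and the 10% change threshold are hypothetical.

```python
def methylation_change(first: dict, second: dict) -> dict:
    """Per-locus change (second minus first) for loci measured at both time-points."""
    return {locus: second[locus] - first[locus]
            for locus in first if locus in second}


def changed_loci(first: dict, second: dict,
                 min_abs_change: float = 0.10) -> list:
    """Sorted loci whose methylation fraction changed by more than the threshold."""
    changes = methylation_change(first, second)
    return sorted(locus for locus, delta in changes.items()
                  if abs(delta) > min_abs_change)
```

The resulting list of changed loci is the kind of comparison result that could inform whether a treatment regimen is modified, as described above.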
Assays of this Example
This Example also provides assays and/or methods for determining the DNA methylation status and/or level of genomic loci that correlates with the presence, absence, and/or severity of endometriosis. In some cases, the assay method can include comparing the methylation status and/or level of genomic loci present in a sample from a subject that has endometriosis to the methylation status and/or level of genomic loci in a sample from a healthy subject to determine the methylation pattern, as described above, that correlates with the presence of endometriosis. In some cases, the assay methods can include comparing the methylation status and/or level of genomic loci in a sample from a subject that has endometriosis at an early stage to the methylation status and/or level of genomic loci in a sample from a subject that has endometriosis at a late stage to determine the methylation status and/or level that correlates with the different stages and/or severity of endometriosis.
DNA Isolation Techniques of this Example
In certain embodiments, the methods described herein can include isolating nucleic acid from a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) obtained from a subject. Any appropriate technique can be used to isolate nucleic acids from a sample. For example, isolation of DNA from a plasma sample can be performed by extraction methods using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (see, for example, J. Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Additional non-limiting examples include salting out DNA extraction (see, for example, P. Sunnucks et al., Genetics, 1996, 144:747-756; and S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25:4692-4693), the trimethylammonium bromide salts DNA extraction method (see, for example, S. Gustincich et al., BioTechniques, 1991, 11:298-302), and the guanidinium thiocyanate DNA extraction method (see, for example, J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). There are also numerous commercially available kits that can be used to extract DNA from biological fluids (e.g., plasma samples) or cells. For example, Qiagen's Gentra PureGene Cell Kit, QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, DNeasy Blood and Tissue Kit or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.) and GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.) can be used to obtain DNA from a sample from a subject.
Methylation Detection Techniques of this Example
Various methylation analysis procedures are known in the art, and can be used with the methods described herein. These assays allow for determination of the methylation state of a genomic locus, e.g., one or more CpG sites or islands within a nucleic acid obtained from a sample. In addition, the methods can be used to quantify the methylation of a genomic locus. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), digital PCR and use of methylation-sensitive restriction enzymes.
In certain embodiments, methylation-specific PCR can be used to determine the methylation status of a genomic locus. Methylation-specific PCR is based on a chemical reaction of sodium bisulfite with DNA that converts unmethylated cytosines, e.g., of CpG dinucleotides, to uracil or UpG, followed by traditional PCR. Methylated cytosines will not be converted in this process, and primers can be designed to overlap the methylation site, e.g., CpG site, of interest, thereby allowing one to determine the methylation status of the methylation site as methylated or unmethylated. Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA may be used, e.g., by using the method described by Sadri & Hornsby (Nucl. Acids Res. 1996; 24:5058-5059) or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 1997; 25:2532-2534).
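The effect of bisulfite conversion on a sequence can be simulated in silico, which is one way to see why primers overlapping a CpG site distinguish methylated from unmethylated templates. The sketch below is illustrative only: it models a single strand, assumes complete conversion, and represents converted cytosines as T because uracil is amplified as thymine during PCR. The function name and the position-based input format are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions=()) -> str:
    """Simulate bisulfite conversion plus PCR on one strand.

    Unmethylated C is read as T after amplification; C at any 0-based
    index listed in methylated_positions is protected and remains C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")  # unmethylated cytosine -> uracil -> thymine
        else:
            converted.append(base)
    return "".join(converted)
```

For the template ACGTCGA, a primer designed against the methylated form of the first CpG would match ACG..., whereas the fully unmethylated template reads ATG... at the same position.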
In certain embodiments, whole genome bisulfite sequencing, which is a high-throughput genome-wide analysis of DNA methylation, can be used to determine the methylation status of multiple genomic loci. It is based on sodium bisulfite conversion of genomic DNA, as described above, which is then sequenced on a next-generation sequencing platform. The sequences obtained are then re-aligned to the reference genome to determine the methylation states of cytosines, e.g., of CpG dinucleotides, present within the analyzed genomic loci based on mismatches resulting from the conversion of unmethylated cytosines into uracil.
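Determining methylation states from mismatches between aligned bisulfite reads and the reference can be sketched as follows. This is a simplified illustration under several assumptions: reads are already aligned without gaps to the forward strand, each read is given as a hypothetical (start, sequence) pair with a 0-based start, and sequencing errors and reverse-strand reads are ignored. Production pipelines use dedicated bisulfite-aware aligners and methylation callers.

```python
def cpg_methylation_levels(reference: str, aligned_reads) -> dict:
    """Estimate the methylated fraction at each covered CpG cytosine.

    reference: forward-strand reference sequence.
    aligned_reads: iterable of (start, read_sequence) pairs, ungapped,
    with 0-based start positions on the reference.
    Returns a dict mapping the 0-based position of each covered CpG
    cytosine to the fraction of reads supporting methylation.
    """
    reference = reference.upper()
    cpg_positions = [i for i in range(len(reference) - 1)
                     if reference[i:i + 2] == "CG"]
    counts = {pos: [0, 0] for pos in cpg_positions}  # [methylated, unmethylated]
    for start, read in aligned_reads:
        read = read.upper()
        for pos in cpg_positions:
            offset = pos - start
            if 0 <= offset < len(read):
                base = read[offset]
                if base == "C":      # protected from conversion: methylated
                    counts[pos][0] += 1
                elif base == "T":    # converted: unmethylated
                    counts[pos][1] += 1
    return {pos: m / (m + u) for pos, (m, u) in counts.items() if m + u > 0}
```

The per-position fractions returned here correspond to the “percentage or fraction of methylated cytosines” sense of methylation level defined earlier in this Example.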
In certain embodiments, genome-wide DNA methylation profiling can be performed using commercially-available arrays, thereby allowing the interrogation of multiple genomic loci, e.g., multiple CpG sites. Non-limiting examples of such arrays include HumanMethylation BeadChips (Illumina, San Diego, Calif.) and Infinium MethylationEPIC kit (Illumina). Additional methods for analyzing the methylation state of multiple genomic loci are provided in Yong et al., Epigenetics & Chromatin 2016; 9:26, which is incorporated by reference herein.
Kits of this Example
This Example provides kits for diagnosing, monitoring, classifying, and/or treating a subject with endometriosis. The kits described herein can comprise a means for determining and/or detecting the methylation status of one or more genomic loci.
Kits of this Example can include, without limitation, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays, which further contain one or more probes, primers, or other detection reagents for determining the methylation state and/or level of one or more genomic loci. For example, a kit described herein can include one or more probes or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the one or more genomic loci comprise a CpG site. In certain embodiments, one or more of the genomic loci do not comprise a CpG site. For example, about 5% or more, about 10% or more, about 15% or more, about 20% or more, about 25% or more, about 30% or more, about 35% or more, about 40% or more, about 45% or more, about 50% or more, about 55% or more, about 60% or more, about 65% or more or about 70% or more of the one or more genomic loci detected by the primers or probes can comprise one or more CpG sites.
In certain non-limiting embodiments, a primer and/or probe described herein can be at least about 10 nucleotides or at least about 15 nucleotides or at least about 20 nucleotides in length, and/or up to about 200 nucleotides or up to about 150 nucleotides or up to about 100 nucleotides or up to about 75 nucleotides or up to about 50 nucleotides in length.
In a further non-limiting embodiment, the oligonucleotide primers and/or probes can be immobilized on a solid surface or support, for example, on a nucleic acid microarray, wherein the position of each oligonucleotide primer and/or probe bound to the solid surface or support is known and identifiable.
In certain non-limiting embodiments, the kits described herein can additionally include other components such as a buffer, enzymes such as DNA polymerases or ligases, nucleotides such as deoxynucleotide triphosphates, positive control sequences, and/or negative control sequences necessary to carry out an assay or reaction to detect the methylation state of a genomic locus.
In certain embodiments, the kits described herein can include a container comprising one or more probes and/or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the kits further include instructions for use, e.g., the instructions can describe that a particular methylation status of a genomic locus is indicative of endometriosis in a subject. The instructions can be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card or folder supplied in or with the container.
Reports, Programmed Computers, and Systems of this Example
In certain embodiments, a diagnosis and/or monitoring of endometriosis in a subject based on the methylation status of one or more genomic loci as described herein can be referred to herein as a “report.” A tangible report can optionally be generated as part of a testing process (which can be interchangeably referred to herein as “reporting,” or as “providing” a report, “producing” a report or “generating” a report).
Examples of tangible reports can include, without limitation, reports in paper (such as computer-generated printouts of test results) or equivalent formats and reports stored on computer readable medium (such as a CD, USB flash drive or other removable storage device, computer hard drive, or computer network server, etc.). Reports, particularly those stored on computer readable medium, can be part of a database, which can optionally be accessible via the internet (such as a database of patient records or genetic information stored on a computer network server, which can be a “secure database” that has security features that limit access to the report, such as to allow only the patient and the patient's medical practitioners to view the report while preventing other unauthorized individuals from viewing the report, for example). In addition to, or as an alternative to, generating a tangible report, reports can also be displayed on a computer screen (or the display of another electronic device or instrument).
A report can include, for example, an individual's medical history, or can just include size, presence, absence, or levels of one or more markers (for example, a report on computer readable medium such as a network server can include hyperlink(s) to one or more journal publications or websites that describe the medical/biological implications). Thus, for example, the report can include information of medical/biological significance as well as optionally also including information regarding the methylation status of relevant genomic loci, or the report can just include information regarding the methylation status of relevant genomic loci without other medical/biological significance.
A report can further be “transmitted” or “communicated” (these terms can be used herein interchangeably), such as to the individual who was tested, a medical practitioner (e.g., a doctor, nurse, clinical laboratory practitioner, genetic counselor, etc.), a healthcare organization, a clinical laboratory, and/or any other party or requester intended to view or possess the report. The act of “transmitting” or “communicating” a report can be by any means known in the art, based on the format of the report. Furthermore, “transmitting” or “communicating” a report can include delivering a report (“pushing”) and/or retrieving (“pulling”) a report. For example, reports can be transmitted/communicated by various means, including being physically transferred between parties (such as for reports in paper format) such as by being physically delivered from one party to another, or by being transmitted electronically or in signal form (e.g., via e-mail or over the internet, by facsimile, and/or by any wired or wireless communication methods known in the art) such as by being retrieved from a database stored on a computer network server, etc.
In certain embodiments, the disclosed subject matter provides computers (or other apparatus/devices such as biomedical devices or laboratory instrumentation) programmed to carry out the methods described herein, e.g., to perform the algorithm disclosed herein (see Example Embodiment H). In certain embodiments, the system can be controlled by the individual and/or their medical practitioner in that the individual and/or their medical practitioner requests the test, receives the test results back and (optionally) acts on the test results to reduce the individual's endometriosis risk or treat the individual, such as by implementing an endometriosis management system.
Example Embodiment G of Example 4: Discovery of Putative Uterine/Endometrial-Derived Nucleic Acids in Human Plasma Via Epigenomic Liquid Biopsy
Example Embodiment G provides an analysis of epigenomic liquid biopsy of human plasma for the discovery of putative uterine/endometrial-derived cell-free DNA (cfDNA) methylation signatures. Example Embodiment G used solution phase hybridization and high throughput bisulfite sequencing to compare DNA methylation signatures of cfDNA obtained from the plasma samples of women who had previously had a hysterectomy with those from control women who had not previously had a hysterectomy.
A total of n=9 women (the case group) who had previously had a hysterectomy and n=11 women (the control group) who had not previously had a hysterectomy were included. No women in the control group were pregnant at the time of analysis.
Methods
DNA was extracted from plasma volumes ranging from approximately 6.3 to 8.1 mL using the NucleoSnap DNA Plasma Kit (Macherey-Nagel) and quantified by Agilent Bioanalyzer. The average yield of cfDNA was 5.625 ng/mL plasma. DNA sequencing libraries were prepared using the Kapa Hyper Prep Kit (Roche). The Zymo DNA Methylation Direct Kit was used for bisulfite conversion, and targeted libraries were prepared using SeqCap-Epi (Roche). Libraries were sequenced on an Illumina HiSeq 2500 instrument using 150 bp, paired-end reads.
Reads were trimmed for quality and adaptor sequences using Trim-Galore. The reads were aligned to the human reference sequence (GRCh38/hg38) using Bismark with bowtie2, first in paired-end mode and then in single-end mode for reads that did not align as pairs. Duplicate reads were removed using Bismark. Methylation was called separately on the paired-end and single-end alignments, and the calls were then merged. Average on-target coverage was 44.49×.
CpG methylation calls with read depth of at least 10× were read into MethylSig for differential methylation analysis. MethylKit was used for sliding window analysis with a window size of 250 bp and step size of 50 bp. Differentially methylated regions were identified according to (insert paper reference).
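The depth filter and sliding-window tiling described above can be sketched as follows. This is an illustration only and is not the MethylKit implementation; the function name, the input layout of `(position, methylated_reads, total_reads)` tuples for a single chromosome, and the choice to pool read counts within each window are assumptions.

```python
def sliding_window_methylation(sites, window=250, step=50, min_depth=10):
    """Summarize CpG calls in overlapping windows.

    sites: iterable of (position, methylated_reads, total_reads).
    Returns (window_start, window_end, methylation_fraction) for every
    window that contains at least one CpG passing the depth filter.
    """
    # Keep only CpG sites with read depth of at least min_depth.
    kept = sorted(s for s in sites if s[2] >= min_depth)
    if not kept:
        return []
    first, last = kept[0][0], kept[-1][0]
    # First step-aligned window start whose window can contain `first`.
    start = max(0, ((first - window) // step + 1) * step)
    results = []
    while start <= last:
        end = start + window
        meth = sum(m for p, m, n in kept if start <= p < end)
        total = sum(n for p, m, n in kept if start <= p < end)
        if total > 0:
            results.append((start, end, meth / total))
        start += step
    return results
```

With a 250 bp window and a 50 bp step, each CpG site contributes to up to five overlapping windows, which is why window-level significance requires multiple-testing correction.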
Results
Analysis of plasma epigenomic liquid biopsy data using MethylSig revealed 42,403 significant CpG sites that were differentially methylated between cases and controls. Similarly, analysis of the same plasma epigenomic liquid biopsy data using MethylKit revealed 16,562 significant 250 bp sliding windows (identified by Bonferroni corrected p-value and a confidence interval methylation difference of at least 5%) that were differentially methylated between cases and controls.
To further explore the data, Example Embodiment G used the Genotype-Tissue Expression (GTEx) project database. GTEx is an on-going effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Example Embodiment G identified genes whose expression is low in whole blood and comparatively high in uterine tissues. It was reasoned that leukocytes are a major (though not exclusive) contributor to cfDNA in plasma, and that the uterus is likely to be the sole contributor of uterine/endometrial-derived cfDNA in plasma. Thus, tissue-specific differences in gene expression were identified to help interpret the DNA methylation data and assist in the discovery of putative uterine/endometrial-derived cfDNA in plasma. Specifically, genes were identified whose expression was elevated by a minimum of 5-fold in uterus compared to whole blood and whose expression in whole blood was below two transcripts per million (TPM) (from GTEx). These genes were then merged with the list of n=16,562 significant windows identified via sliding window analysis (see above). This identified 3,538 significantly differentially methylated windows (not shown) located within and adjacent to the tissue-specific differentially expressed genes. The top n=30 most significantly differentially methylated windows from this output are listed in Table 11. This list represents examples of specific DNA methylation differences that may be detected in the relevant biological samples listed herein, demonstrating that DNA methylation differences such as those listed, and many others, exist and can be used to differentiate samples as described herein, for example, using procedures similar to those set forth in Example Embodiment H.
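The expression-based gene filter described above can be sketched as follows. The function name, the dictionary inputs, and the gene labels in the usage note are hypothetical, and the sketch assumes that the two-TPM ceiling applies to whole-blood expression; it does not reproduce the GTEx data access or the merge with the methylation windows.

```python
def select_uterus_enriched_genes(tpm_uterus, tpm_blood,
                                 min_fold=5.0, max_blood_tpm=2.0):
    """Return genes with low whole-blood and comparatively high uterine expression.

    tpm_uterus, tpm_blood: dict mapping gene name to expression in TPM.
    """
    selected = []
    for gene, uterus in tpm_uterus.items():
        blood = tpm_blood.get(gene)
        if blood is None:
            continue  # gene not measured in whole blood
        low_in_blood = blood < max_blood_tpm
        # uterus > 0 guards against genes that are silent in both tissues.
        enriched = uterus > 0 and uterus >= min_fold * blood
        if low_in_blood and enriched:
            selected.append(gene)
    return sorted(selected)
```

For example, with hypothetical inputs `{"GENE_A": 12.0, "GENE_B": 3.0}` for uterus and `{"GENE_A": 1.5, "GENE_B": 1.0}` for whole blood, only `GENE_A` passes both criteria.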
This example demonstrates a comprehensive quantitative genome-wide analysis of DNA methylation in human plasma from women who have previously had a hysterectomy and women who have not previously had a hysterectomy. These results demonstrate that epigenomic liquid biopsy of human plasma can be used for quantitative analysis of cfDNA methylation and, in this example, the discovery of putative uterine/endometrial-derived DNA methylation signatures.
Example Embodiment H of Example 4: Estimation of Abnormal Plasma Methylome Variation in Targeted Regions for Diagnosis of Endometriosis
Provided below is an algorithm that can be used to diagnose a subject with endometriosis. The presently disclosed subject matter provides that the methylome(s) of the uterus of a mammal, or structures therein (e.g., endometrium), could be affected by certain abnormalities (e.g., endometriosis), and that the changes of these methylomes can lead to changes in the methylation patterns of the DNA fragments found in plasma, which are released by uterus/endometrium-derived tissues. An algorithm was developed to identify the changes of methylation patterns in the methylome of plasma caused by uterine/endometrial phenotypes.
The main insight behind this algorithm was that the methylome of the DNA fragments in plasma is a mixture of a variety of component methylomes of uterine/endometrial origin, and that the proportion of these different component methylomes in the mixture varies from subject to subject, even among the population with normal uterine/endometrial phenotypes. By constructing a model of plasma methylome as a linear combination of various component methylomes of uterine/endometrial and other origins, the algorithm can accurately predict the methylation patterns of a new plasma sample under the hypothesis that it is from a normal individual.
Consequently, the algorithm of this Example has high sensitivity for detecting abnormal methylation patterns in a plasma sample caused by changes of the methylomes of some uterine/endometrial or other relevant tissues when the sample is from an affected individual.
The procedure can be applied with little modification to the diagnosis and phenotyping of endometriosis using other types of biopsy samples, such as cervical swabs, vaginal swabs, urine, and tampon blood, provided that the DNA fragments from the tissues affected by endometriosis can be found in those biopsy samples.
Let $i$ be any CpG site in the human genome, $z_{i,j}$ be the methylation level of CpG site $i$ in a plasma sample $j$, $p_{i,r,j}$ be the proportion of the $r$th component methylome $m_{r,j}$ (of uterine/endometrial or other origin) in plasma sample $j$ at site $i$, and $m_{i,r,j}$ be the methylation level of CpG site $i$ in methylome $m_{r,j}$. The hypothesis is:
$$z_{i,j}=\sum_{r=1}^{R} p_{i,r,j}\,m_{i,r,j} \qquad (1)$$
where $p_{i,r,j}\geq 0$, $0\leq m_{i,r,j}\leq 1$, and $p_{i,1,j}+\dots+p_{i,R,j}=1$.
It is further assumed that there is a set of CpG sites $S$ such that, for any CpG site $i$ in $S$ and any plasma sample $j$ from a normal individual, $m_{i,r,j}=m_{i,r}$ and $p_{i,r,j}=p_{r,j}$.
That is, it is assumed that in any plasma sample from a normal individual, the proportions of different component methylomes in the mixture are the same for all CpG sites in $S$. It is also assumed that, by restricting to the set of CpG sites $S$, plasma samples from all normal individuals have the same set of component methylomes. They are called restricted reference component methylomes (RRCM), and are labeled as $m_1^S,\dots,m_R^S$, or simply $m_1,\dots,m_R$ when there is no confusion. For any plasma sample $j$ from a normal individual, its methylome restricted to the set of CpG sites in $S$ can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^S$ be the methylome of plasma sample $j$ restricted to $S$; then, for some mixture vector $p_j=[p_{1,j},\dots,p_{R,j}]^T$:
$$z_j^S=[m_1^S,\dots,m_R^S]\,p_j \qquad (2)$$
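Equation (2) can be illustrated numerically with a minimal sketch. The function name `mix_methylomes` and the list-based inputs are hypothetical; each inner list of `components` holds the methylation levels of one restricted reference component methylome at the sites in S, and `proportions` is the mixture vector for one plasma sample.

```python
def mix_methylomes(components, proportions, tol=1e-9):
    """Return the plasma methylome as a weighted average of component methylomes.

    components: list of R lists, each with one methylation level per CpG site.
    proportions: list of R non-negative weights that sum to 1.
    """
    if len(components) != len(proportions):
        raise ValueError("one proportion is required per component methylome")
    if any(p < 0 for p in proportions) or abs(sum(proportions) - 1.0) > tol:
        raise ValueError("proportions must be non-negative and sum to 1")
    n_sites = len(components[0])
    if any(len(m) != n_sites for m in components):
        raise ValueError("all component methylomes must cover the same sites")
    # z_i = sum over r of p_r * m_{i,r}, computed site by site.
    return [sum(p * m[i] for p, m in zip(proportions, components))
            for i in range(n_sites)]
```

Because the weights are non-negative and sum to 1, every predicted level lies between the smallest and largest component level at that site.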
Finally, it is assumed that the set $S$ is the union of two disjoint subsets $C$ and $T$, where $T$ is a union of $K$ non-empty sets $T_k$ such that $T=\bigcup_{k=1}^{K}T_k$, where the index $k$ represents the $k$th type of abnormal uterine/endometrial phenotype. The sets $T_k$ do not need to be disjoint. Moreover, $T_k$ itself is the union of two disjoint sets $D_k$ and $V_k$. Either $D_k$ or $V_k$ could be empty, but not both. It is assumed that for any plasma sample, including one from an abnormal individual, when restricted to CpG sites in $C$, its methylome can always be expressed as a weighted average of the restricted reference component methylomes. That is, $z_j^C=[m_1^C,\dots,m_R^C]\,p_j$ regardless of whether $j$ is from an abnormal individual. $C$ is called the set of reference CpG sites. On the other hand, for a plasma sample $l$ from an abnormal individual, when restricted to CpG sites in $S=C\cup T$, its methylome can no longer be expressed as a weighted average of the restricted reference component methylomes. That is, $w_l^S\neq[m_1^S,\dots,m_R^S]\,p_l$ for any mixture vector $p_l$. More specifically, for a plasma sample $l$ from an individual with the $k$th type of abnormal phenotype: 1) $w_l^C=[m_1^C,\dots,m_R^C]\,p_l$; 2) if $D_k$ is non-empty, then $w_l$
T is called the target set of CpG sites, Dk is called the differential methylation target set, Vk is called the copy number variation target set, and Tk is called the target set for the kth type of abnormal phenotype.
The main steps of the algorithm of this Example are:
- 1) Identify the sets of reference CpG sites C, and T1, . . . , TK for the list of K types of abnormal individuals.
- 2) Estimate the restricted reference component methylomes m1, . . . , mR, or R predictor methylomes n1, . . . , nR that are independent linear combinations of the reference component methylomes such that nr=[m1, . . . , mR]qr for R linearly independent mixture vectors q1, . . . , qR.
- 3) (Optional) If the reference component methylomes are available, estimate the proportions of these components at the reference CpG sites C for the test plasma samples.
- 4) Predict the methylation level of the test plasma samples at the target set Tk of CpG sites, under the hypothesis that the sample is from a normal individual.
- 5) Compare the predicted methylation levels at Dk and Vk against the observed methylation levels, and reject the null hypothesis that a test sample is from a normal individual if the observed methylation levels are significantly different from the predicted levels.
The algorithm of this Example can be implemented in a variety of ways. For example, given the methyl-seq data for a set of plasma samples from normal individuals, the presently disclosed EM algorithm or the data augmentation method can be applied to estimate the component methylomes, and the maximum likelihood method can then be used to estimate the proportions of these component methylomes in the test sample. Below are exemplary simple implementations of the presently disclosed algorithm that use linear regression.
In the simplest implementation of the algorithm of this Example, it is assumed the restricted methylome of a plasma sample from a normal individual can be approximated by a mixture of two restricted reference methylomes. It is further assumed that the estimations of these two reference component methylomes are available. For example, in the implementation below, for the genomic loci of interest, the plasma methylome is approximated by the mixture of leukocyte and uterine/endometrial- or other relevant tissue/cell-derived methylomes. The implementation of the algorithm includes the following steps:
1. Identify the Reference Set C, and the Target Sets T1, . . . , TK.
- 1.1 Collect the methylation data for a set of leukocyte samples, a set of uterine/endometrial- or other relevant tissue/cell-derived samples, and a set of plasma samples, all from normal individuals. For each type of abnormal individuals, collect a set of leukocyte-derived samples, a set of uterine/endometrial- or other relevant tissue/cell-derived samples, and a set of plasma samples from that type of abnormal individuals. All these samples should have matched age, race, and other relevant parameters. These are the training data.
- 1.2 Let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a normal leukocyte-derived sample $j$, $y_{i,l}$ the observed methylation level of CpG site $i$ in a normal uterine/endometrial- or other relevant tissue/cell-derived sample $l$, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all normal leukocyte-derived samples, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all normal uterine/endometrial- or other relevant tissue/cell-derived samples. Identify the set of CpG sites $S_0$ such that for any $i\in S_0$, both $s_{x,i}^2<c_0$ and $s_{y,i}^2<c_0$ for some constant $c_0$. These are CpG sites with stable methylation levels in each type of normal cells.
- 1.3 Let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a leukocyte-derived sample $j$, including normal and abnormal, $y_{i,l}$ the observed methylation level of CpG site $i$ in a uterine/endometrial- or other relevant tissue/cell-derived sample $l$, including normal and abnormal, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all leukocyte-derived samples, including normal and abnormal, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all uterine/endometrial- or other relevant tissue/cell-derived samples, including normal and abnormal. Identify the set of CpG sites $S_1$ such that for any $i\in S_1$: (a) both $s_{x,i}^2<c_0$ and $s_{y,i}^2<c_0$ for some constant $c_0$; (b) the statistical test for the difference between $\{x_{i,j_0}: j_0$ is a normal leukocyte-derived sample$\}$ and $\{x_{i,j_k}: j_k$ is an abnormal leukocyte-derived sample of type $k\}$ is not significant for all abnormal types of leukocyte-derived samples; and (c) the statistical test for the difference between $\{y_{i,j_0}: j_0$ is a normal uterine/endometrial- or other relevant tissue/cell-derived sample$\}$ and $\{y_{i,j_k}: j_k$ is an abnormal uterine/endometrial- or other relevant tissue/cell-derived sample of type $k\}$ is not significant for all abnormal types of uterine/endometrial- or other relevant tissue/cell-derived samples. These are CpG sites with stable methylation levels in each type of cells, and with no difference in methylation level between normal and any abnormal samples. Let $x_i$ be the sample mean of $x_{i,j}$ over all leukocyte-derived samples, including normal and abnormal, and $y_i$ the sample mean of $y_{i,l}$ over all uterine/endometrial- or other relevant tissue/cell-derived samples, including normal and abnormal. Identify the subset $C_0$ of $S_1$ such that for any $i\in C_0$, $|x_i-y_i|>c_1$ for some constant $c_1$. These are CpG sites that are stably methylated in each cell type, with no difference between the normal and abnormal samples of the same cell type, and differentially methylated between different types of cells.
- 1.4 Let $x^{C_0}$ be the vector of $x_i$ for all $i\in C_0$, and $y^{C_0}$ be the vector of $y_i$ for all $i\in C_0$, where $x_i$ is the mean methylation at site $i$ in all leukocyte-derived samples and $y_i$ is the mean methylation at site $i$ in all uterine/endometrial- or other relevant tissue/cell-derived samples. Note that, by the way the set $C_0$ is selected, there is no difference in the methylation level of any CpG site in $C_0$ between normal and abnormal leukocyte-derived samples, or between normal and abnormal uterine/endometrial- or other relevant tissue/cell-derived samples. Let $z_j^{C_0}$ be the observed methylation levels of CpG sites in $C_0$ for a plasma sample $j$ of the $k$th abnormal type. (For convenience, a normal plasma sample is referred to as a sample of the 0th abnormal type.) For each sample $j$ belonging to the $k$th abnormal type, regress $z_j^{C_0}$ against $x^{C_0}$ and $y^{C_0}$, with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1, and obtain the residual $e_j^{C_0}$. Identify the subset $C_{0k}$ of $C_0$ such that for any CpG site $i$ in $C_{0k}$, it has
-
- and $e_{i,k}^2<c_3$ for some constants $c_2$ and $c_3$, where $e_{i,k}^2$ is the mean of the squared difference between estimated and observed methylation levels of CpG site $i$ in all plasma samples of the $k$th abnormal type, and $s_{i,k}^2$ is the sample variance of the methylation levels of CpG site $i$ in the same set of plasma samples. Repeat the above procedure for each type of abnormal plasma sample. The intersection of the subsets, $C=\bigcap_{k=0}^{K}C_{0k}$, is the reference set of CpG sites. These are CpG sites whose methylation levels in both normal and any type of abnormal plasma samples can be accurately predicted by the reference component methylomes from normal individuals.
- 1.5 Let $T_0=S_0\setminus S_1$. Let $x^C$ and $x^{T_0}$ be the vectors of $x_i$ and $x_h$ for all $i\in C$ and $h\in T_0$ respectively, and $y^C$ and $y^{T_0}$ be the vectors of $y_i$ and $y_h$ for all $i\in C$ and $h\in T_0$ respectively, where $x_i$, $x_h$, $y_i$, and $y_h$ are the mean methylation levels of normal leukocyte-derived or uterine/endometrial- or other relevant tissue/cell-derived samples at sites $i$ and $h$ respectively. Let $z_j^C$ and $z_j^{T_0}$ be the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a normal plasma sample $j$, $w_{l_k}^C$ and $w_{l_k}^{T_0}$ the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a plasma sample $l_k$ from an individual with the $k$th type of abnormality, and $w_{l_g}^C$ and $w_{l_g}^{T_0}$ the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a plasma sample $l_g$ from an individual with the $g$th type of abnormality, where $g\neq k$. For each $j$, $l_k$, and $l_g$, regress $z_j^C$, $w_{l_k}^C$, and $w_{l_g}^C$ respectively against $x^C$ and $y^C$, with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1. Apply the fitted models respectively to $x^{T_0}$ and $y^{T_0}$ to predict $z_j^{T_0}$, $w_{l_k}^{T_0}$, and $w_{l_g}^{T_0}$ respectively, and obtain the differences $e_j^{T_0}$, $e_{l_k}^{T_0}$, and $e_{l_g}^{T_0}$ between the predicted values and observed values. Let $e_i$, $e_{i,k}$, and $e_{i,g}$ be the means of the sets of differences $\{e_j^{T_0}: j$ is a normal plasma sample$\}$, $\{e_{l_k}^{T_0}: l_k$ is a plasma sample of the $k$th abnormal type$\}$, and $\{e_{l_g}^{T_0}: l_g$ is a plasma sample of the $g$th abnormal type$\}$ for CpG site $i$ respectively. Identify the subset $T_k$ of $T_0$ such that for any $i\in T_k$, $|e_i|<c_{2,0}$, $|e_{i,k}|>c_{2,k}$, and $|e_{i,k}-e_{i,g}|>c_{3,k}$ for all $g\neq k$, for some constants $c_{2,0}$, $c_{2,k}$, and $c_{3,k}$. $T_k$ is the target set for the $k$th type of abnormal individual. These are the sites where the methylation of a normal plasma sample can be accurately predicted, the observed methylation in a plasma sample of the $k$th abnormal type deviates from the prediction, and the deviation differs from that of a plasma sample of any other abnormal type.
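The threshold-based parts of step 1 can be sketched as follows. This is a simplified illustration, not a complete implementation: it covers the variance filter of step 1.2, the mean-difference criterion at the end of step 1.3, and the three thresholds of step 1.5, and it omits the statistical tests of step 1.3 and the residual-based refinement of step 1.4. All function names and the dictionary-based inputs are hypothetical.

```python
from statistics import mean, variance


def stable_sites(x, y, c0):
    """Step 1.2: sites whose sample variance is below c0 in both cell types.

    x, y: dict mapping CpG site to a list of methylation levels
    (at least two samples per site) in leukocyte and tissue samples.
    """
    return {i for i in x.keys() & y.keys()
            if variance(x[i]) < c0 and variance(y[i]) < c0}


def differential_sites(x, y, candidates, c1):
    """End of step 1.3: candidates whose mean levels differ by more than c1."""
    return {i for i in candidates if abs(mean(x[i]) - mean(y[i])) > c1}


def target_sites(e0, ek, e_others, c20, c2k, c3k):
    """Step 1.5: apply the three thresholds to mean prediction errors.

    e0, ek: dict mapping site to mean error in normal and type-k plasma.
    e_others: list of such dicts, one per other abnormal type g != k.
    """
    return {i for i in e0
            if abs(e0[i]) < c20                 # normal plasma is well predicted
            and abs(ek[i]) > c2k                # type-k plasma deviates
            and all(abs(ek[i] - eg[i]) > c3k    # and differs from other types
                    for eg in e_others)}
```

The constants `c0`, `c1`, `c20`, `c2k`, and `c3k` correspond to the unspecified constants in the text and would need to be chosen for a given data set.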
2. Estimate the Fractions of the Component Methylomes for the New Plasma Samples.
Recall that $x^C$ and $y^C$ are the mean vectors of the methylation levels of the training leukocyte-derived and training uterine/endometrial- or other relevant tissue/cell-derived sample data for the CpG sites in the reference set $C$. For any new plasma sample $t$ to be tested, let $z_t^C$ be the observed methylation levels of CpG sites in $C$. Regress $z_t^C$ against $x^C$ and $y^C$, with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1. The estimated coefficients are the estimated fractions of the component methylomes for the plasma sample $t$.
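For the two-component case, the constrained regression above reduces to a one-parameter problem and can be sketched as follows. Writing the coefficients as a and 1 - a, the least-squares objective is a convex quadratic in a, so the constrained solution is the unconstrained minimizer clipped to [0, 1]. The function names are hypothetical, and ordinary (unweighted) least squares is an assumption.

```python
def estimate_leukocyte_fraction(z_c, x_c, y_c):
    """Fit z ≈ a*x + (1 - a)*y over the reference sites with 0 <= a <= 1.

    z_c: observed plasma methylation levels at the reference sites.
    x_c, y_c: mean leukocyte and tissue methylation levels at the same sites.
    Returns a, the estimated leukocyte fraction; 1 - a is the tissue fraction.
    """
    num = sum((z - y) * (x - y) for z, x, y in zip(z_c, x_c, y_c))
    den = sum((x - y) ** 2 for x, y in zip(x_c, y_c))
    if den == 0:
        raise ValueError("reference methylomes are identical at every site")
    # Clip the unconstrained least-squares solution to the feasible interval.
    return min(1.0, max(0.0, num / den))


def prediction_errors(a, z_t, x_t, y_t):
    """Observed minus predicted methylation levels at the target sites."""
    return [z - (a * x + (1 - a) * y) for z, x, y in zip(z_t, x_t, y_t)]
```

The requirement that reference sites be differentially methylated between cell types keeps the denominator away from zero and makes the estimate of a stable.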
3. Test if the New Plasma Samples are from the kth Type of Abnormal Individual.
For the new plasma sample t, let xT
Other ways of implementing the algorithm of this Example can be developed by modifying the simple implementation presented above. Specifically, it is not necessary to assume that only two component reference methylomes make up the plasma methylomes, nor is it necessary to approximate them by mixtures of the component methylomes. Instead, a set of predictor methylomes can be collected that are themselves mixtures of component reference methylomes, as long as the number of the predictor methylomes is the same as the number of the reference component methylomes, and the mixture vectors of the predictor methylomes are linearly independent. For example, they can be methylomes of plasma samples with known, different proportions of leukocyte-derived and uterine/endometrial- or other relevant tissue/cell-derived sample DNAs.
In the algorithm of this Example, the difference between the observed methylation levels in certain target regions and the predicted methylation levels is used as the test statistic to determine whether the methylome in a plasma sample has been affected by endometriosis. To illustrate the advantage of this approach, it is assumed that the mixture vector $p_j$ for the methylome of a normal plasma sample $j$ follows a Dirichlet distribution with parameters $\alpha_1=\dots=\alpha_R$. Furthermore, for CpG site $i$, its methylation levels in the $R$ reference component methylomes are $m_{i,r}=(r-1)/(R-1)$. It can be shown that the methylation level of $i$ in sample $j$ then has a mean of 0.5, and a variance of
If there is a methyl-seq library of sample j with a coverage of N for CpG site i, the variance of the measured methylation level zi,j is
In other words, if $z_{i,j}$ is used as a test statistic to detect and phenotype endometriosis using a plasma sample, under the null hypothesis, the test statistic has a variance of $\sigma_1^2$. However, in the presently disclosed algorithm, the mixture vector $p_j$ is first estimated, and $z_{i,j}$ is then predicted by $\sum_r m_{i,r}p_{r,j}$. Note that in methyl-seq data, millions of CpG sites can be contained in each library, and that the variance of the coefficients in a linear regression model is inversely proportional to sample size. Thus, it is possible to obtain a highly accurate estimate of the mixture vector $p_j$, even when taking into account that adjacent CpG sites tend to have correlated methylation levels. Assuming an accurate estimate of $\sum_r m_{i,r}p_{r,j}$ can be obtained, that is, the error of the estimation can be ignored, the variance of the difference $z_{i,j}-\sum_r m_{i,r}p_{r,j}$ between the observed methylation level and the prediction will be
In other words, under the null hypothesis, the test statistic $z_{i,j}-\sum_r m_{i,r}p_{r,j}$ used in the presently disclosed algorithm has a much smaller variance than the other candidate test statistic $z_{i,j}$. This in turn means that the presently disclosed test will achieve a higher power at the same level of type I error.
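The variance comparison can be checked numerically with a short sketch. It is derived only from the stated assumptions (a symmetric Dirichlet mixture vector and component levels equally spaced between 0 and 1) together with one further assumption that is not stated in the text: that, given the true level, the measured level at coverage N follows binomial sampling. The function names are hypothetical, and the expressions are not taken from the original formulas.

```python
from fractions import Fraction


def mixture_variance(R, alpha):
    """Variance of z = sum_r m_r * p_r for p ~ Dirichlet(alpha, ..., alpha).

    Component levels are m_r = (r - 1) / (R - 1) for r = 1..R (R >= 2).
    Uses the standard Dirichlet variances and covariances.
    """
    alpha = Fraction(alpha)
    a0 = R * alpha
    denom = a0 ** 2 * (a0 + 1)
    var_p = alpha * (a0 - alpha) / denom
    cov_p = -alpha * alpha / denom
    m = [Fraction(r, R - 1) for r in range(R)]
    return sum(m[r] * m[s] * (var_p if r == s else cov_p)
               for r in range(R) for s in range(R))


def observed_variance(R, alpha, N):
    """Variance of the measured level: mixture variance plus sampling noise."""
    s0 = mixture_variance(R, alpha)
    # E[z(1 - z)] = 1/4 - Var(z) when E[z] = 1/2 (binomial sampling assumed).
    return s0 + (Fraction(1, 4) - s0) / N


def residual_variance(R, alpha, N):
    """Variance of observed minus predicted level when the prediction is exact."""
    return (Fraction(1, 4) - mixture_variance(R, alpha)) / N
```

Under these assumptions, the residual keeps only the sampling term, which shrinks with coverage N, whereas the raw measured level also carries the between-sample mixture variance, which does not.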
Examples of Embodiments for Example 4
D1. A method for diagnosing, prognosing, classifying, and/or monitoring endometriosis in a mammal, comprising:
(a) obtaining a sample from the mammal;
(b) determining the methylation status and/or level of one or more genomic loci in the sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; and
(d) diagnosing endometriosis in the mammal,
wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of endometriosis in the mammal.
D2. The method of embodiment D1, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the presence of endometriosis in the mammal or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the presence of endometriosis in the mammal.
D3. The method of embodiment D1, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the presence of endometriosis in the mammal.
D4. A method of treating endometriosis in a mammal, comprising:
(a) obtaining a sample from the mammal;
(b) determining the methylation status and/or level of one or more genomic loci present in the sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values;
(d) diagnosing endometriosis in the mammal, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference indicates the presence of endometriosis in the mammal; and
(e) administering a hormone therapy, a pain medication, or both to said mammal.
D5. The method of any one of embodiments D1-D4, wherein the reference is the methylation status and/or level of the one or more genomic loci in a sample obtained from a mammal that does not have endometriosis.
D6. The method of any one of embodiments D1-D5, wherein said sample is a plasma sample.
D7. A method of treating endometriosis comprising:
(a) measuring the methylation status and/or level of one or more genomic loci present in a sample from a mammal prior to a treatment of endometriosis;
(b) measuring the methylation status and/or level of one or more genomic loci present in a sample from the mammal during the treatment of endometriosis; and
(c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of endometriosis indicates the mammal is responsive to the treatment.
D8. The method of embodiment D7, further comprising (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of endometriosis indicates the mammal is not responsive to the treatment.
D9. The method of embodiment D7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.
D10. The method of embodiment D7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.
D11. The method of embodiment D7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.
D12. The method of embodiment D7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.
D13. The method of any one of embodiments D7-D12, wherein said sample is a plasma sample.
D14. The method of any one of embodiments D1-D13, wherein the one or more genomic loci comprise one or more CpG sites.
D15. The method of any one of embodiments D1-D14, wherein the one or more genomic loci are present within nucleic acids isolated from the sample.
D16. The method of any one of embodiments D1-D15, wherein the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.
D17. A method of treating endometriosis, comprising:
(a) diagnosing endometriosis in a mammal by utilization of the algorithm disclosed in Example Embodiment H; and
(b) administering a hormone therapy, a pain medication, or both to said mammal to treat said endometriosis.
D18. The method of any one of embodiments D1-D17, wherein said mammal is a human.
D19. A kit for diagnosing, prognosing, and/or monitoring endometriosis in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci.
D20. The kit of embodiment D19, wherein the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.
This Example provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis. For example, algorithms, kits, and methods for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis are provided.
Example 5—Methods and Materials for Assessing and Treating Necrotizing Enterocolitis
Field of this Example
This Example relates to methods and materials involved in assessing a mammal (e.g., a human infant) for and/or treating a mammal (e.g., a human infant) having or developing necrotizing enterocolitis. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a biopsy (e.g., a plasma sample, a blood sample, or a stool sample) to determine whether or not a mammal has, or is developing, necrotizing enterocolitis. This Example also provides methods, algorithms, and kits for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis.
Background of this Example
Necrotizing enterocolitis affects low birth weight infants in the first weeks of life, with a reported frequency of between 1 and 5 percent of neonatal intensive care unit admissions and mortality rates of between 15 and 30 percent. Necrotizing enterocolitis currently is suspected through the visualization of a distended abdomen and confirmed via x-ray. By the time final diagnosis occurs through this process, however, the disease has already progressed to a point that typically requires surgical intervention, which carries a 50 percent mortality rate.
Summary of this Example
This Example provides methods and materials involved in assessing a mammal (e.g., a human) for and/or treating a mammal (e.g., a human) having or developing necrotizing enterocolitis. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a biopsy (e.g., a plasma sample, a blood sample, or a stool sample) to determine whether or not a mammal has, or is developing, necrotizing enterocolitis. Determining if a mammal (e.g., a human) has, or is likely to develop, necrotizing enterocolitis by assessing DNA methylation profiles of nucleic acid within a biopsy (e.g., a plasma sample, a blood sample, or a stool sample) can aid in the identification of mammals (e.g., humans) that should be treated in a particular manner (e.g., by administering an antibiotic therapy, by feeding intravenously as opposed to by mouth, and/or by performing a surgical treatment), for example, early in the disease process.
This Example also provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis. For example, the methods described herein can include determining the methylation status of one or more genomic loci in a sample (e.g., a plasma sample, blood sample, or a stool sample) of a mammal (e.g., a human infant). This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis.
In one aspect, the present disclosure provides a method for diagnosing, prognosing, classifying, and/or monitoring necrotizing enterocolitis in a mammal (e.g., a human infant) comprising: (a) obtaining a sample (e.g., a plasma sample, a blood sample, or a stool sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; and (d) identifying the mammal as having necrotizing enterocolitis, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the mammal.
In another aspect, this Example provides a method of treating necrotizing enterocolitis in a mammal (e.g., a human infant) comprising: (a) obtaining a sample (e.g., a plasma sample, a blood sample, or a stool sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci present in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; (d) identifying the mammal as having necrotizing enterocolitis, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the mammal; and (e) treating the mammal by administering an antibiotic therapy, by feeding intravenously as opposed to by mouth, and/or by performing a surgical treatment.
In some cases, an increase in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a blood sample, or a stool sample) can indicate the presence of necrotizing enterocolitis in the mammal, or a decrease in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a blood sample, or a stool sample) can indicate the presence of necrotizing enterocolitis in the mammal. In some cases, a decrease in the level of methylation of at least one of the one or more genomic loci in a sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample can indicate the presence of necrotizing enterocolitis in the mammal.
In some cases, the reference can be the methylation status and/or level of the one or more genomic loci in a sample (e.g., a plasma sample, a blood sample, or a stool sample) obtained from a mammal (e.g., a human infant) that does not have necrotizing enterocolitis.
In another aspect, this Example provides a method of treating necrotizing enterocolitis in a mammal (e.g., a human infant) comprising: (a) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a blood sample, or a stool sample) from the mammal prior to a treatment of necrotizing enterocolitis; (b) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a blood sample, or a stool sample) from the mammal during the treatment of necrotizing enterocolitis; and (c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of necrotizing enterocolitis indicates the subject is responsive to the treatment. In some cases, the method further comprises (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of necrotizing enterocolitis indicates the subject is not responsive to the treatment.
In some cases, an increase in the level of methylation of the one or more genomic loci in the sample indicates that the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the subject is not responsive to the treatment.
This Example further provides algorithms for diagnosing and/or monitoring a mammal having necrotizing enterocolitis. In certain embodiments, the algorithm can be used to classify necrotizing enterocolitis of a mammal (e.g., a human infant).
In another aspect, this Example provides a kit for diagnosing, prognosing, and/or monitoring necrotizing enterocolitis in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci. In certain embodiments, the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.
In certain embodiments, the one or more genomic loci are present within nucleic acids isolated from the sample. In certain embodiments, the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.
Description of this Example
This Example provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis. For example, the methods described herein can include determining the methylation status of one or more genomic loci in a sample of a mammal (e.g., a human infant). In some cases, the methods described herein can include the use of an algorithm to diagnose, prognose, monitor, classify, and/or assist in the treatment of necrotizing enterocolitis. Non-limiting embodiments of this Example are described by the present specification and Examples.
Unless defined otherwise, all technical and scientific terms used in this Example generally have their ordinary meanings in the art, within the context of this Example and in the specific context where each term is used. The following references provide one of skill with a general definition of many of the terms used in this Example: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Margham, The Harper Collins Dictionary of Biology (1991). Certain terms are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the compositions and methods of this Example and how to make and use them.
The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. The present Example also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.
As used herein, the use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” Still further, the terms “having,” “including,” “containing,” and “comprising” are interchangeable, and one of skill in the art is cognizant that these terms are open ended terms.
The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.
In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows detection of a disease and/or disorder in an individual, including detection of the disease or the disorder in its early stages. In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows the characterization of a phenotype of a disease and/or a disorder in an individual. Early stage of a disease, as used herein, refers to the time period between the onset of the disease and the time point that signs or symptoms of the disease emerge. In certain non-limiting embodiments, the presence, absence, and/or level of a biomarker in a sample of a mammal (e.g., a human) is determined by comparing to a reference control.
The terms “reference sample,” “reference control,” “control,” or “reference,” as used interchangeably herein, refer to a control for a methylation status of a genomic locus that is to be detected in a sample of a mammal. In certain embodiments, a reference sample can be a sample from a healthy individual, e.g., an individual that does not have necrotizing enterocolitis. In certain embodiments, a reference sample can be a sample from a control individual that does not have the disease or phenotype to be detected by a biomarker disclosed herein. In certain embodiments, a control or reference can be the presence, absence, and/or a particular level of a methylation state of a genomic locus in a healthy individual. In certain embodiments, a reference can be a predetermined presence, absence, and/or particular level of a methylation state of a genomic locus that indicates a subject does not have necrotizing enterocolitis. In certain embodiments, a reference can be the methylation status of a locus in an individual having a disease or a phenotype, e.g., an individual that has necrotizing enterocolitis, where the methylation status of the locus is known to be not associated with the disease or the phenotype.
The term “a set of predicted values” refers to the methylation status of certain genomic loci for a sample. The status of those loci is not directly measured from that sample. Rather, it is inferred from measurements of other loci for that sample and/or measurements of other samples. The inference of the predicted values is based on mathematical/statistical models. The models are designed under the null hypothesis that the sample for which the methylation status of those loci is to be predicted has a normal phenotype.
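As one illustrative, non-limiting sketch of such an inference, the predicted value for a locus can be taken as the mean methylation level observed at that locus in other samples having a normal phenotype, and a test sample can then be scored by how far it deviates from that prediction. The function names, the choice of a simple mean as the statistical model, and the numeric values below are assumptions made for illustration only; the specification does not prescribe a particular model.

```python
from statistics import mean, stdev
from typing import Sequence


def predicted_value(normal_levels: Sequence[float]) -> float:
    """Predicted methylation level at a locus under the null hypothesis
    of a normal phenotype (assumed model: mean of normal samples)."""
    if not normal_levels:
        raise ValueError("at least one normal sample is required")
    return mean(normal_levels)


def deviation_from_prediction(observed: float,
                              normal_levels: Sequence[float]) -> float:
    """Number of standard deviations by which an observed level
    departs from the predicted value (a z-score)."""
    if len(normal_levels) < 2:
        raise ValueError("at least two normal samples are required")
    spread = stdev(normal_levels)
    if spread == 0:
        raise ValueError("normal samples show no variation at this locus")
    return (observed - predicted_value(normal_levels)) / spread


# Hypothetical methylation fractions at one locus in three normal samples.
normal = [0.4, 0.5, 0.6]
z = deviation_from_prediction(0.8, normal)
```

A large absolute deviation suggests that the sample departs from the normal phenotype at that locus; any cutoff applied to the deviation would be a design choice outside what is stated here.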
The term “slightly invasive or non-invasive method” refers to a method that does not involve the removal of tissues by biopsy from the intestinal tract. In certain embodiments, slightly invasive or non-invasive methods, as described herein, include obtaining plasma or stool from a subject.
The term “patient” or “subject,” as used interchangeably herein, refers to any warm-blooded animal, e.g., human or non-human. Non-limiting examples of non-human subjects include mammals, non-human primates, dogs, cats, mice, rats, guinea pigs, rabbits, fowl, pigs, horses, cows, goats, sheep, etc. In certain embodiments, the subject is human.
The term “nucleic acid,” “nucleic acid molecule,” or “polynucleotide” includes any compound and/or substance that comprises a polymer of nucleotides. Each nucleotide is composed of a base, specifically a purine- or pyrimidine base (i.e., cytosine (C), guanine (G), adenine (A), thymine (T), or uracil (U)), a sugar (i.e., deoxyribose or ribose), and a phosphate group. In certain embodiments, the nucleic acid molecule is described by the sequence of bases, whereby said bases represent the primary structure (linear structure) of a nucleic acid molecule. The sequence of bases is typically represented from 5′ to 3′. These terms encompass deoxyribonucleic acid (DNA) including, e.g., complementary DNA (cDNA) and genomic DNA, ribonucleic acid (RNA), in particular messenger RNA (mRNA), synthetic forms of DNA or RNA, and mixed polymers comprising two or more of these molecules. The herein described nucleic acid molecule can contain naturally occurring or non-naturally occurring nucleotides. Examples of non-naturally occurring nucleotides include modified nucleotide bases with derivatized sugars or phosphate backbone linkages or chemically modified residues.
The term “isolated” (e.g., isolated genomic DNA) refers to a biological component that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, e.g., other chromosomal and extra-chromosomal DNA and RNA, proteins, and organelles. Nucleic acids, e.g., DNA, that have been “isolated” include nucleic acids purified by standard purification methods.
The term “genomic locus” or “genomic DNA locus,” as used herein, refers to any fixed position in a genome. For example, a genomic locus can refer to a genomic element, a chromosomal region, a gene, a region of a gene, e.g., an exon or intron, a regulatory region of a gene, e.g., a promoter or enhancer, a CpG site, a CpG island, or a CpG island shore. For example, a genomic locus can include one or more CpG sites, e.g., between about 1 to about 100 CpG sites. In certain embodiments, a genomic locus can be of any particular length, e.g., between about 1 to about 10,000 nucleotides in length.
As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status,” and “methylation level” refer to the presence, absence, percentage, and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, indicate the methylation state of any of the nucleotides (e.g., cytosines) in the sequence, can indicate the methylation state of a subset of the nucleotides (e.g., of cytosines), can indicate the percentage or fraction of methylated cytosines at any particular stretch of nucleotides within the sequence or can indicate the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
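By way of a minimal, non-limiting sketch, the fraction of methylated cytosines at a site and the average rate of methylation across the cytosines of a locus can be computed from per-site read counts. The representation of each site as a pair of (methylated reads, total reads) and the function names are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def methylation_fraction(methylated_reads: int, total_reads: int) -> float:
    """Fraction of reads covering one cytosine that report it as methylated."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= methylated_reads <= total_reads:
        raise ValueError("methylated_reads must be between 0 and total_reads")
    return methylated_reads / total_reads


def locus_methylation_level(site_counts: Sequence[Tuple[int, int]]) -> float:
    """Average methylation rate over the cytosines of a locus.

    Each element of site_counts is (methylated_reads, total_reads)
    for one cytosine within the locus.
    """
    if not site_counts:
        raise ValueError("a locus needs at least one covered cytosine")
    fractions = [methylation_fraction(m, t) for m, t in site_counts]
    return sum(fractions) / len(fractions)


# Hypothetical locus with two CpG sites, each covered by 10 reads.
level = locus_methylation_level([(8, 10), (2, 10)])
```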
As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more nucleotides that is/are methylated.
As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.
A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., a DNA sequence, that has a high frequency of CpG dinucleotide repeats. See, e.g., Illingworth and Bird, FEBS Letters, 2009; 583:1713-1720. For example, Yamada et al. (Genome Research, 2004; 14:247-266) have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length and have a GC content greater than 50% and an OCF/ECF ratio greater than 0.6. Others (Takai et al., Proc. Natl. Acad. Sci. U.S.A., 2002; 99:3740-3745) have defined a CpG island less stringently as a sequence of at least 200 nucleotides in length, having a greater than 50% GC content and an OCF/ECF ratio greater than 0.6.
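The criteria above can be sketched as a simple sequence check. This sketch assumes that the OCF/ECF ratio denotes the observed-to-expected CpG frequency and computes it with a commonly used formula, (number of CpG dinucleotides × sequence length) / (number of C × number of G); the function names and default thresholds are illustrative and are not taken from the cited references beyond the values recited above.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def observed_expected_cpg(seq: str) -> float:
    """Observed-to-expected CpG ratio (assumed meaning of OCF/ECF)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG dinucleotide is possible
    cpg_count = seq.count("CG")
    return cpg_count * len(seq) / (c_count * g_count)


def is_cpg_island(seq: str, min_length: int = 200,
                  min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply length, GC-content, and observed/expected CpG thresholds."""
    return (len(seq) >= min_length
            and gc_content(seq) > min_gc
            and observed_expected_cpg(seq) > min_ratio)


# Hypothetical 300-nucleotide sequence made only of CpG dinucleotides.
candidate = "CG" * 150
passes_200 = is_cpg_island(candidate)                  # 200-nucleotide minimum
passes_400 = is_cpg_island(candidate, min_length=400)  # 400-nucleotide minimum
```

Passing `min_length=400` applies the more stringent length requirement recited above, while the defaults apply the 200-nucleotide requirement.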
A “CpG island shore,” as used herein, refers to methylation hotspots that are present a short distance, e.g., less than 2 kb, from CpG islands.
The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome), or a portion of the subset (e.g., those areas found to be associated with necrotizing enterocolitis). A methylome from plasma can be referred to as a “plasma fluid methylome,” or a “plasma fluid DNA methylome.” The plasma fluid methylome is an example of a cell-free methylome that includes cell-free DNA (cfDNA). A methylome from stool can be referred to as a “stool fluid methylome,” or a “stool fluid DNA methylome.” The stool fluid methylome is an example of a cell-free methylome that includes cell-free DNA (cfDNA).
As used herein, the term “increase” refers to altering positively by at least about 2%, including, but not limited to, altering positively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
As used herein, the terms “reduce,” “reduction,” or “decrease” refer to altering negatively by at least about 2%, including, but not limited to, altering negatively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
As described herein, this Example provides methods for diagnosing, monitoring, classifying, and/or treating necrotizing enterocolitis by analyzing the methylation status of one or more genomic loci in a sample (e.g., a plasma sample or a stool sample) of a mammal (e.g., a human infant). In certain embodiments, the methods can include using an algorithm described herein. In certain embodiments, the methods described herein can allow for the early diagnosis or screening of a subject with necrotizing enterocolitis, e.g., a subject that does not have any symptoms, or only has early symptoms, of necrotizing enterocolitis.
In certain embodiments, samples obtained for use in the methods described herein can include cfDNA, which carries DNA methylation information from the cell of origin. cfDNA can arise from cellular apoptosis and necrosis, and can be generated from active secretory processes, with the formation of extracellular vesicles. DNA signatures are highly tissue-specific, and include in vivo information relating to the tissue source of cfDNA. In certain embodiments, the methods described herein can include analyzing cfDNA in a sample (e.g., a plasma sample or a stool sample), to identify genetic phenotypes that are drivers and/or consequences of necrotizing enterocolitis.
The sample from the subject can be collected using any appropriate technique. For example, a blood sample, a plasma sample, or a stool sample can be collected using standard methods. In some cases, the sample can be collected from the subject before the subject has any symptom of necrotizing enterocolitis, i.e., a non-symptomatic subject. In certain embodiments, the sample can be collected from a non-symptomatic subject who is at high risk of necrotizing enterocolitis (e.g., a preterm baby). In certain embodiments, the sample can be collected from a subject who has previously received or is currently receiving a treatment for necrotizing enterocolitis. In certain embodiments, two or more samples (e.g., two or more, three or more, four or more, five or more, six or more or seven or more samples) can be obtained before and while the subject is receiving a treatment for necrotizing enterocolitis (e.g., serially obtained samples).
Diagnostic, Prognostic, Classification, and Monitoring Methods of this Example
This Example provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring necrotizing enterocolitis in a subject that includes analyzing the methylation status of certain genomic loci.
In certain embodiments, the analyzed genomic loci can include one or more genomic loci that exhibit differential methylation in a sample from a subject that has necrotizing enterocolitis compared to a reference sample. For example, the methods described herein can include assessing the methylation status of one or more genomic loci, e.g., about 5 or more, about 10 or more, about 50 or more, about 100 or more, about 500 or more, about 1,000 or more, about 5,000 or more, about 10,000 or more, about 25,000 or more, about 50,000 or more or about 100,000 or more genomic loci in a sample of a subject. In certain embodiments, the genomic loci can be selected from the genes, or a region within the genes, provided in Example Embodiment I. In certain embodiments, the one or more genomic loci can be one or more promoter regions of one or more genes, one or more exons of one or more genes, one or more introns of one or more genes, one or more CpG sites, one or more CpG islands, one or more CpG island shores, one or more enhancers of one or more genes, or a combination thereof. In certain embodiments, the genomic loci are present on a particular chromosome.
In certain embodiments, this Example provides methods for diagnosing, prognosing, and/or monitoring necrotizing enterocolitis in a subject by detecting the DNA methylation profiles associated with necrotizing enterocolitis. In certain embodiments, the methods described herein can include (a) obtaining a sample from the subject, (b) determining the methylation status of one or more genomic loci present in the sample, e.g., present within cfDNA in a plasma sample or a stool sample, (c) comparing the methylation status of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing necrotizing enterocolitis in the subject. In certain embodiments, the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the subject. In certain embodiments, the difference in the methylation status also can indicate the severity of necrotizing enterocolitis.
In certain embodiments, the methods described herein for diagnosing, prognosing, and/or monitoring necrotizing enterocolitis in a subject can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing necrotizing enterocolitis in the subject. In certain embodiments, the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the subject. In certain embodiments, the difference in the methylation level also can indicate the severity of necrotizing enterocolitis.
In certain embodiments, diagnosing necrotizing enterocolitis in the subject can include characterizing a phenotype of the necrotizing enterocolitis, wherein the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the phenotype of the necrotizing enterocolitis. In certain embodiments, the phenotype of the necrotizing enterocolitis can include the severity of the necrotizing enterocolitis, prognosis of the necrotizing enterocolitis, molecular expression profile of the necrotizing enterocolitis, responsiveness of the necrotizing enterocolitis to certain treatments, or any combinations thereof.
In certain embodiments, the methods described herein for determining if a subject is at risk of developing necrotizing enterocolitis can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) determining that the subject is at risk of developing necrotizing enterocolitis, wherein the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates that the subject is at risk.
In certain embodiments, diagnosing, prognosing, and/or monitoring of a subject with necrotizing enterocolitis can be based on a higher or lower methylation level of the genomic locus in the sample of the subject relative to the methylation level in a reference sample, e.g., a sample from a subject that does not have necrotizing enterocolitis. In certain embodiments, a difference of greater than about 5%, greater than about 10%, greater than about 15%, greater than about 20%, greater than about 25%, greater than about 30%, greater than about 35%, greater than about 40%, greater than about 45%, greater than about 50%, greater than about 55%, greater than about 60%, greater than about 65%, greater than about 70%, greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90% or greater than about 95% in the methylation (e.g., level, percentage and/or fraction) of the one or more genomic loci in a sample obtained from a subject compared to a control can be indicative that the subject has necrotizing enterocolitis or is at risk of developing necrotizing enterocolitis. In certain embodiments, the difference can be a decrease in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. Alternatively, the difference can be an increase in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. In certain embodiments, the difference can be a decrease in the methylation of a genomic locus and an increase in the methylation of a different genomic locus in the sample obtained from the subject. In certain embodiments, a decrease in the level of methylation of one or more genomic loci in the sample and the increase in the level of methylation of one or more different genomic loci in the sample can indicate the presence of necrotizing enterocolitis.
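The comparison described above can be sketched as follows. This is an illustrative sketch only: it assumes that methylation levels are expressed as fractions between 0 and 1, that the recited percentage difference is an absolute difference in that fraction (the text does not state whether the difference is absolute or relative), and that the locus names, function name, and example values are hypothetical.

```python
from typing import Dict


def call_differential_loci(sample: Dict[str, float],
                           reference: Dict[str, float],
                           min_abs_diff: float = 0.05) -> Dict[str, str]:
    """Label each locus whose methylation level differs from the reference
    by more than min_abs_diff as an 'increase' or a 'decrease'.

    Loci missing from the sample are skipped rather than guessed.
    """
    calls: Dict[str, str] = {}
    for locus, ref_level in reference.items():
        if locus not in sample:
            continue
        diff = sample[locus] - ref_level
        if abs(diff) > min_abs_diff:
            calls[locus] = "increase" if diff > 0 else "decrease"
    return calls


# Hypothetical methylation fractions for three loci.
reference_levels = {"locus_a": 0.50, "locus_b": 0.50, "locus_c": 0.50}
sample_levels = {"locus_a": 0.90, "locus_b": 0.10, "locus_c": 0.52}
calls = call_differential_loci(sample_levels, reference_levels)
```

In this hypothetical example, one locus shows an increase and a different locus shows a decrease, which corresponds to the mixed pattern described above; how such calls are combined into a determination is left to the particular embodiment.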
In certain embodiments, diagnosis of a subject with necrotizing enterocolitis can be based on the methylated or unmethylated state of a genomic locus, e.g., a CpG site. In certain embodiments, a genomic locus, e.g., a CpG site, in a sample from a subject diagnosed with necrotizing enterocolitis can be methylated and the genomic locus, e.g., the CpG site, in a reference sample can be unmethylated. In certain embodiments, a genomic locus in a sample from a subject diagnosed with necrotizing enterocolitis can be unmethylated and the genomic locus in a reference sample can be methylated.
Diagnostic, Prognostic, Classification, and Monitoring Methods Using an Algorithm of this Example
This Example also provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci by using an algorithm as described, for example, in Example Embodiment J. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring necrotizing enterocolitis in a subject that includes analyzing the methylation status of certain genomic loci and/or genomic fractions.
Methods for Treating Necrotizing Enterocolitis
This Example also provides methods for treating a subject having necrotizing enterocolitis. For example, a mammal (e.g., a human infant) that was identified as having necrotizing enterocolitis as described herein (or identified as being at risk of developing necrotizing enterocolitis as described herein) can be administered one or more antibiotic therapies, can be fed intravenously as opposed to feeding by mouth, or a combination thereof to treat necrotizing enterocolitis. Examples of antibiotic therapies that can be used as described herein include, without limitation, ampicillin, gentamicin, vancomycin, cefepime, and metronidazole. In some cases, a mammal (e.g., a human infant) that was identified as having necrotizing enterocolitis as described herein (or identified as being at risk of developing necrotizing enterocolitis as described herein) can be treated using a surgical procedure to treat necrotizing enterocolitis. Examples of surgical procedures that can be used as described herein include, without limitation, exploratory laparotomy, resection of affected intestine, creation of a stoma, and reanastomosis.
In some cases, the information provided by the methods described herein can be used by a clinician or physician in determining the most effective course of treatment (e.g., preventative or therapeutic) for the subject. A course of treatment refers to the measures taken for a patient after the prognosis or the assessment of increased risk for development of necrotizing enterocolitis is made. For example, when a subject is identified to have an increased risk of developing necrotizing enterocolitis, the physician can determine whether frequent monitoring of DNA methylation changes can be performed as a prophylactic measure. Also, when a subject is diagnosed with necrotizing enterocolitis (e.g., based on the presence of a DNA methylation pattern in a sample from a subject), it can be advantageous to follow such detection with a therapeutic treatment.
In some cases, this Example provides methods for assessing the efficacy of a therapeutic or prophylactic therapy for treating necrotizing enterocolitis in a subject, comprising determining the methylation status of one or more genomic loci present in a sample obtained from a subject prior to the therapy and determining methylation status of the one or more genomic loci present in a sample obtained from the subject at one or more time points during the therapeutic or prophylactic therapy, wherein the therapy is efficacious for treating necrotizing enterocolitis in a subject when there is a change in the presence and/or level of methylation of the one or more genomic loci in the second or subsequent samples, relative to the first sample. In certain embodiments, the first sample is obtained after therapeutic treatment has begun.
In certain embodiments, the methods for monitoring the response in a subject to prophylactic or therapeutic treatment of necrotizing enterocolitis can include measuring the methylation status and/or level of one or more genomic loci in a sample of a subject at a first time-point, administering a therapeutic agent, re-measuring the methylation status and/or level of the one or more genomic loci at a second time-point, comparing the results of the first and second measurements and optionally modifying the treatment regimen based on the comparison. In certain embodiments, the first time-point can be prior to an administration of the therapeutic agent, and the second time-point can be after said administration of the therapeutic agent. In certain embodiments, the first time-point can be prior to the administration of the therapeutic agent to the subject for the first time. In certain embodiments, the dose (defined as the quantity of therapeutic agent administered at any one administration) can be increased or decreased in response to the comparison. In certain embodiments, the dosing interval (defined as the time between successive administrations) can be increased or decreased in response to the comparison, including total discontinuation of treatment. In addition, the methods described herein can be used to determine the efficacy of the therapeutic treatment, wherein a change in the methylation status of certain genomic loci present in a sample of a subject can indicate that the therapeutic treatment regimen can be altered, reduced, and/or stopped.
Assays of this Example
This Example also provides assays and/or methods for determining the DNA methylation status and/or level of genomic loci that correlates with the presence, absence, and/or severity of necrotizing enterocolitis. In some cases, the assay method can include comparing the methylation status and/or level of genomic loci present in a sample from a subject that has necrotizing enterocolitis to the methylation status and/or level of genomic loci in a sample from a healthy subject to determine the methylation pattern, as described above, that correlates with the presence of necrotizing enterocolitis. In some cases, the assay methods can include comparing the methylation status and/or level of genomic loci in a sample from a subject that has necrotizing enterocolitis at an early stage to the methylation status and/or level of genomic loci in a sample from a subject that has necrotizing enterocolitis at a late stage to determine the methylation status and/or level that correlates with the different stages and/or severity of necrotizing enterocolitis.
DNA Isolation Techniques of this Example
In certain embodiments, the methods described herein can include isolating nucleic acid from a sample (e.g., a plasma sample or a stool sample) obtained from a subject. Any appropriate technique can be used to isolate nucleic acids from a sample. For example, isolation of DNA from a plasma sample can be performed by extraction methods using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (see, for example, J. Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Additional non-limiting examples include salting out DNA extraction (see, for example, P. Sunnucks et al., Genetics, 1996, 144:747-756; and S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25:4692-4693), the trimethylammonium bromide salts DNA extraction method (see, for example, S. Gustincich et al., BioTechniques, 1991, 11:298-302), and the guanidinium thiocyanate DNA extraction method (see, for example, J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). There are also numerous commercially available kits that can be used to extract DNA from biological fluids (e.g., plasma samples) or cells. For example, Qiagen's Gentra PureGene Cell Kit, QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, DNeasy Blood and Tissue Kit or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.) and GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.) can be used to obtain DNA from a sample from a subject.
Methylation Detection Techniques of this Example
Various methylation analysis procedures are known in the art, and can be used with the methods described herein. These assays allow for determination of the methylation state of one genomic locus, e.g., one or more CpG sites or islands within a nucleic acid obtained from a sample. In addition, the methods can be used to quantify the methylation of a genomic locus. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), digital PCR and use of methylation-sensitive restriction enzymes.
In certain embodiments, methylation-specific PCR can be used to determine the methylation status of a genomic locus. Methylation-specific PCR is based on a chemical reaction of sodium bisulfite with DNA that converts unmethylated cytosines, e.g., of CpG dinucleotides, to uracil or UpG, followed by traditional PCR. Methylated cytosines will not be converted in this process, and primers can be designed to overlap the methylation site, e.g., CpG site, of interest, thereby allowing one to determine the methylation status of the methylation site as methylated or unmethylated. Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA may be used, e.g., by using the method described by Sadri & Hornsby (Nucl. Acids Res. 1996; 24:5058-5059) or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 1997; 25:2532-2534).
In certain embodiments, whole genome bisulfite sequencing, which is a high-throughput genome-wide analysis of DNA methylation, can be used to determine the methylation status of multiple genomic loci. It is based on sodium bisulfite conversion of genomic DNA, as described above, which is then sequenced on a next-generation sequencing platform. The sequences obtained are then re-aligned to the reference genome to determine the methylation states of cytosines, e.g., of CpG dinucleotides, present within the analyzed genomic loci based on mismatches resulting from the conversion of unmethylated cytosines into uracil.
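The logic of calling methylation from bisulfite-converted reads can be illustrated with a toy Python sketch. It assumes forward-strand reads that are already aligned to the reference without gaps, and it ignores sequencing errors, incomplete conversion, and reverse-strand (G-to-A) reads, all of which a production aligner must handle. The function names and data layout are hypothetical and chosen for illustration only.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment plus PCR on the forward strand.

    Unmethylated C is read as T; methylated C stays C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence)
    )


def cpg_methylation(reference, reads):
    """Return {CpG position: methylated fraction} from forward-strand reads.

    Each read is a (start_offset, read_sequence) pair aligned to the
    reference without gaps.
    """
    cpg_sites = [i for i in range(len(reference) - 1)
                 if reference[i:i + 2] == "CG"]
    counts = {site: [0, 0] for site in cpg_sites}  # [methylated, total]
    for start, read in reads:
        for site in cpg_sites:
            offset = site - start
            if 0 <= offset < len(read):
                base = read[offset]
                if base == "C":      # protected from conversion: methylated
                    counts[site][0] += 1
                    counts[site][1] += 1
                elif base == "T":    # converted: unmethylated
                    counts[site][1] += 1
    return {site: meth / total
            for site, (meth, total) in counts.items() if total > 0}
```

In this sketch, the methylation level at each CpG site is the fraction of covering reads that retain a C where the reference has a C, which is the mismatch-based inference described above.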
In certain embodiments, genome-wide DNA methylation profiling can be performed using commercially-available arrays, thereby allowing the interrogation of multiple genomic loci, e.g., multiple CpG sites. Non-limiting examples of such arrays include HumanMethylation BeadChips (Illumina, San Diego, Calif.) and Infinium MethylationEPIC kit (Illumina). Additional methods for analyzing the methylation state of multiple genomic loci are provided in Yong et al., Epigenetics & Chromatin 2016; 9:26, which is incorporated by reference herein.
Kits of this Example
This Example provides kits for diagnosing, monitoring, classifying, and/or treating a subject with necrotizing enterocolitis. The kits described herein can comprise a means for determining and/or detecting the methylation status of one or more genomic loci.
Kits described herein can include, without limitation, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays, which further contain one or more probes, primers, or other detection reagents for determining the methylation state and/or level of one or more genomic loci. For example, a kit described herein can include one or more probes or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the one or more genomic loci comprise a CpG site. In certain embodiments, one or more of the genomic loci do not comprise a CpG site. For example, about 5% or more, about 10% or more, about 15% or more, about 20% or more, about 25% or more, about 30% or more, about 35% or more, about 40% or more, about 45% or more, about 50% or more, about 55% or more, about 60% or more, about 65% or more or about 70% or more of the one or more genomic loci detected by the primers or probes can comprise one or more CpG sites.
In certain non-limiting embodiments, a primer and/or probe described herein can be at least about 10 nucleotides or at least about 15 nucleotides or at least about 20 nucleotides in length, and/or up to about 200 nucleotides or up to about 150 nucleotides or up to about 100 nucleotides or up to about 75 nucleotides or up to about 50 nucleotides in length.
In a further non-limiting embodiment, the oligonucleotide primers and/or probes can be immobilized on a solid surface or support, for example, on a nucleic acid microarray, wherein the position of each oligonucleotide primer and/or probe bound to the solid surface or support is known and identifiable.
In certain non-limiting embodiments, the kits described herein can additionally include other components such as a buffer, enzymes such as DNA polymerases or ligases, nucleotides such as deoxynucleotide triphosphates, positive control sequences, and/or negative control sequences necessary to carry out an assay or reaction to detect the methylation state of a genomic locus.
In certain embodiments, the kits described herein can include a container comprising one or more probes and/or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the kits further include instructions for use, e.g., the instructions can describe that a particular methylation status of a genomic locus is indicative of necrotizing enterocolitis in a subject. The instructions can be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card or folder supplied in or with the container.
Reports, Programmed Computers, and Systems of this Example
In certain embodiments, a diagnosis and/or monitoring of necrotizing enterocolitis in a subject based on the methylation status of one or more genomic loci as described herein can be referred to herein as a “report.” A tangible report can optionally be generated as part of a testing process (which can be interchangeably referred to herein as “reporting,” or as “providing” a report, “producing” a report or “generating” a report).
Examples of tangible reports can include, without limitation, reports in paper (such as computer-generated printouts of test results) or equivalent formats and reports stored on computer readable medium (such as a CD, USB flash drive or other removable storage device, computer hard drive, or computer network server, etc.). Reports, particularly those stored on computer readable medium, can be part of a database, which can optionally be accessible via the internet (such as a database of patient records or genetic information stored on a computer network server, which can be a “secure database” that has security features that limit access to the report, such as to allow only the patient and the patient's medical practitioners to view the report while preventing other unauthorized individuals from viewing the report, for example). In addition to, or as an alternative to, generating a tangible report, reports can also be displayed on a computer screen (or the display of another electronic device or instrument).
A report can include, for example, an individual's medical history, or can just include size, presence, absence, or levels of one or more markers (for example, a report on computer readable medium such as a network server can include hyperlink(s) to one or more journal publications or websites that describe the medical/biological implications). Thus, for example, the report can include information of medical/biological significance as well as optionally also including information regarding the methylation status of relevant genomic loci, or the report can just include information regarding the methylation status of relevant genomic loci without other medical/biological significance.
A report can further be “transmitted” or “communicated” (these terms can be used herein interchangeably), such as to the individual who was tested, a medical practitioner (e.g., a doctor, nurse, clinical laboratory practitioner, genetic counselor, etc.), a healthcare organization, a clinical laboratory, and/or any other party or requester intended to view or possess the report. The act of “transmitting” or “communicating” a report can be by any means known in the art, based on the format of the report. Furthermore, “transmitting” or “communicating” a report can include delivering a report (“pushing”) and/or retrieving (“pulling”) a report. For example, reports can be transmitted/communicated by various means, including being physically transferred between parties (such as for reports in paper format) such as by being physically delivered from one party to another, or by being transmitted electronically or in signal form (e.g., via e-mail or over the internet, by facsimile, and/or by any wired or wireless communication methods known in the art) such as by being retrieved from a database stored on a computer network server, etc.
In certain embodiments, the disclosed subject matter provides computers (or other apparatus/devices such as biomedical devices or laboratory instrumentation) programmed to carry out the methods described herein, e.g., to perform the algorithm disclosed in this Example (see Example Embodiment J). In certain embodiments, the system can be controlled by the individual and/or their medical practitioner in that the individual and/or their medical practitioner requests the test, receives the test results back and (optionally) acts on the test results to reduce the individual's necrotizing enterocolitis risk or treat the individual, such as by implementing a necrotizing enterocolitis management system.
Example Embodiment I of Example 5: Whole Genome Bisulfite Sequencing of Laser-Captured Enterocytes from NEC-Affected Colon and Ileum
This Example Embodiment I identifies DNA methylation differences in enterocyte genomic DNA that may serve as biomarkers for NEC and also provides new insights into disease pathogenesis as well as tissue origin (colon or ileum). These experiments confirm that completely unbiased analysis of DNA methylation in an enterocyte cell-type-specific fashion can be performed. The goal of these studies was to differentiate patients with NEC from controls, thereby identifying putative diagnostic biomarkers that may be present in stool.
It is challenging to perform laser capture microdissection (LCM) followed by whole genome bisulfite sequencing (WGBS) in limited tissue samples. However,
Note that control tissue samples were in fact healed NEC tissue, obtained during surgical re-anastomosis. These samples were the only readily available source of human tissue for this study design, and the practice of using these controls is appropriate in the NEC field. When evaluating premature intestine, it is noted that the average premature infant does not undergo a bowel resection or a routine endoscopy, as older children and adults may. Control bowel is never resected from a healthy infant, and it is very rare that surgeons will take non-necrotic margins from patients with NEC, as they want to preserve as much bowel as possible. Bowel can be resected for non-inflamed conditions such as intestinal atresia, but age-matching those patients with an infant with NEC is nearly impossible, as those patients usually deliver at term, and collecting these samples would further require multi-center funding.
NEC samples were obtained across a corrected gestational span of approximately 5 weeks. Given the challenges associated with consenting subjects and obtaining samples, samples were combined for analysis into case versus control, and corrected gestational age variables were ignored. Furthermore, because of the manner in which individuals had, prior to the beginning of this study, been consented and samples obtained, complete information regarding neonatal sex for all samples was not obtained. The focus, therefore, was only on autosomal data to avoid the complication of X-inactivation and minimize the impact of sex differences.
Analysis of the data identified differentially-methylated CpG sites that differ between NEC and control colon and NEC and control ileum. These are summarized in
Further analysis identified the top 30 CpG sites, determined by LCM-coupled WGBS, with the most highly significant differences (q value) between NEC and control colon and the top 30 CpG sites with the most highly significant differences between NEC and control ileum. These are shown in Table 12A and Table 12B. The lists of CpG sites represent examples of specific DNA methylation differences that may be detected in the relevant biological samples listed herein, demonstrating that DNA methylation differences such as those listed and many others exist and can be used to differentiate samples as described herein, for example, as described using procedures similar to those set forth in Example Embodiment J.
Targeted Genome-Wide Analysis of NEC-Specific DNA Methylation in Tissue Sections from Neonatal Ileum and Colon
Targeted genome-wide bisulfite DNA sequencing was performed on histopathological sections obtained from the tips of the villi or the crypts of NEC and control neonatal gut samples (both colon and ileum). This method delivered higher sequencing read depth per dollar than WGBS in well-characterized regions of the human genome including promoters, exons, introns, CpG islands, CpG island shores and enhancers. This included about 5.5 million CpG sites. This approach quantified DNA methylation levels within genes involved in known biological pathways.
Specifically, targeted genome-wide bisulfite sequencing was carried out on DNA extracted from histological tissue sections obtained from n=8 NEC colon, n=13 control colon, n=5 NEC ileum and n=9 control ileum samples. Bisulfite-converted DNA libraries underwent targeted capture using the SeqCap Epi CpGiant Enrichment Kit (Roche, Pleasanton, Calif.) and were sequenced to a read depth of ˜50×. The Bismark Bisulfite Read Mapper was used to align the libraries against the GRCh38 reference genome and determine the methylation status for each CpG site. DNA methylation signatures were identified using the beta-binomial test implemented in the R packages methylSig and DSS. A total of 3,126,811,295 aligned read pairs was sequenced across all samples.
Note that control tissue samples were in fact healed NEC tissue, obtained during surgical re-anastomosis. These samples were the only readily available source of human tissue for this study design, and the practice of using these controls is appropriate in the NEC field. NEC samples were obtained across a corrected gestational span of approximately 5 weeks. Given the challenges associated with consenting subjects and obtaining samples, samples were combined for analysis into case versus control, and corrected gestational age variables were ignored. Furthermore, because of the manner in which individuals had, prior to the beginning of this study, been consented and samples obtained, complete information regarding neonatal sex for all samples was not obtained. The focus, therefore, was only on autosomal data to avoid the complication of X-inactivation and minimize the impact of sex differences.
As shown in
The top 30 CpG sites, determined via targeted genome-wide analysis, with the most highly significant differences (q value) between NEC and control colon and the top 30 CpG sites with the most highly significant differences (q value) between NEC and control ileum were identified. These are shown in Table 13A and Table 13B. Again, the lists of CpG sites represent examples of specific DNA methylation differences that may be detected in the relevant biological samples listed herein, demonstrating that DNA methylation differences such as those listed and many others exist and can be used to differentiate samples as described herein, for example, as described using procedures similar to those set forth in Example Embodiment J.
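Ranking CpG sites by q value can be sketched as follows. This passage does not state how the q values were computed, so the sketch assumes the widely used Benjamini-Hochberg adjustment of per-site p values. The function names and the site identifiers in the usage note are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values (q values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def top_sites(site_pvalues, n=30):
    """Return the n most significant (site, q value) pairs.

    Sorting by p value gives the same order as sorting by q value,
    because the adjustment is monotone in p.
    """
    sites = list(site_pvalues)
    qvalues = benjamini_hochberg([site_pvalues[s] for s in sites])
    ranked = sorted(zip(sites, qvalues), key=lambda sq: site_pvalues[sq[0]])
    return ranked[:n]
```

For example, `top_sites({"site_a": 0.01, "site_b": 0.04}, n=1)` returns a single pair for `site_a`, the site with the smaller p value.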
Targeted Genome-Wide Analysis of DNA Methylation in Whole Blood from NEC Cases and Controls
Blood samples from NEC patients (n=6) were compared with blood samples from control premature infants (n=6). Further samples from NEC cases (n=9) and from controls (n=7) were sequenced, and the data were analyzed. These samples did not fully overlap with the tissue samples analyzed. This was partly because of the availability of the tissue and blood samples and partly because of the challenges associated with gathering data from laser-captured tissue, which meant that some tissue samples could not be assayed effectively. Data presented herein were from surgical NEC samples only. The goal was to identify DNA methylation changes in blood samples from affected individuals and advance the understanding of epigenomic dysregulation in the circulating hematopoietic cell compartment during NEC. As before, the SeqCap Epi CpGiant Enrichment Kit and related protocols as described above were used to generate an average of 42× read depth coverage across ˜80.5 Mb of the human genome. A total of 1,671,368,776 aligned read pairs was sequenced across all samples.
Analysis of the data identified differentially-methylated CpG sites that differ between NEC and control blood samples. These are summarized in
Targeted Genome-Wide Analysis of DNA Methylation in Stool Samples from NEC Cases and Controls
A discovery-based analysis of DNA methylation in stool samples from NEC affected neonates and controls was performed. The data was gathered from four cases and four controls. The SeqCap Epi CpGiant Enrichment Kit and related protocols as described above were used to generate bisulfite sequencing data.
Analysis of the stool data identified differentially-methylated CpG sites that differ between NEC and control samples. These are summarized in
Identification of Differentially Methylated Tissue Markers in Stool and Blood
One goal was to explore the overlap between DNA methylation differences identified in NEC versus control intestinal tissue and NEC versus control stool. It was hypothesized that differentially methylated CpG sites in intestinal tissue would be detectable in stool. Using the pilot data generated using the small number of stool samples from NEC patients and controls, the relationship between NEC-specific biomarkers identified in tissue with those identified in stool was explored. A total of n=41 autosomal CpG sites with adjusted p values of <0.05 and at least 20% methylation that are differentially methylated in both stool and tissue (ileum or colon) were identified. These were markers of interest because of their potential for translation into a stool-based assay for NEC (Table 16).
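The overlap analysis described above amounts to filtering each result set and intersecting the survivors. The following Python sketch illustrates this under stated assumptions: "at least 20% methylation" is read as an absolute methylation difference of at least 0.20 between cases and controls, sites are keyed by a hypothetical "chromosome:position" string, and all names and data layouts are illustrative.

```python
NON_AUTOSOMES = {"chrX", "chrY", "chrM"}


def significant_sites(results, max_padj=0.05, min_abs_diff=0.20):
    """Return {site: methylation difference} for autosomal sites passing both cutoffs.

    results maps "chrom:position" to (adjusted_p, methylation_difference),
    where the difference is case minus control, expressed as a fraction.
    """
    return {
        site: diff
        for site, (padj, diff) in results.items()
        if site.split(":")[0] not in NON_AUTOSOMES
        and padj < max_padj
        and abs(diff) >= min_abs_diff
    }


def shared_markers(stool, tissue, max_padj=0.05, min_abs_diff=0.20):
    """Sites significant in both sample types, with (stool, tissue) differences."""
    stool_hits = significant_sites(stool, max_padj, min_abs_diff)
    tissue_hits = significant_sites(tissue, max_padj, min_abs_diff)
    return {site: (stool_hits[site], tissue_hits[site])
            for site in sorted(stool_hits.keys() & tissue_hits.keys())}
```

The sketch does not require the direction of the difference to agree between stool and tissue; a stricter variant could add that condition.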
Some overlap of NEC-specific differentially methylated sequences between stool and blood samples was identified (
Provided below is an algorithm that can be used to diagnose a subject with necrotizing enterocolitis. The presently disclosed subject matter provides that the methylome(s) of intestinal tissue of a mammal, or structures therein (e.g., small intestines), could be affected by certain abnormalities (e.g., necrotizing enterocolitis), and that the changes of these methylomes can lead to changes in the methylation patterns of the DNA fragments found in stool, which are released by intestinal tissue. An algorithm was developed to identify the changes of methylation patterns in the methylome of stool caused by intestinal tissue phenotypes. The main insight behind this algorithm was that the methylome of the DNA fragments in stool is a mixture of a variety of component methylomes of intestinal origin, and that the proportion of these different component methylomes in the mixture varies from subject to subject, even among the population with normal intestinal tissue phenotype. By constructing a model of stool methylome as a linear combination of various component methylomes of intestinal tissue origins, the algorithm can accurately predict the methylation patterns of a new stool sample under the hypothesis that it is from a normal individual. Consequently, the algorithm has high sensitivity for detecting abnormal methylation patterns in a stool sample caused by changes of the methylomes of some intestinal tissues when the sample is from an affected individual. The procedure can be applied with little modification to the diagnosis of NEC using other types of biopsy samples, such as plasma, provided that the DNA fragments from the tissues affected by NEC can be found in those biopsy samples.
Let $i$ be any CpG site in the human genome, $z_{i,j}$ be the methylation level of CpG site $i$ in a stool sample $j$, $p_{i,r,j}$ be the proportion of the $r$th component methylome $m_{r,j}$ of intestinal tissue origin in stool sample $j$ at site $i$, and $m_{i,r,j}$ be the methylation level of CpG site $i$ in methylome $m_{r,j}$. The hypothesis is:
$$z_{i,j} = \sum_{r=1}^{R} p_{i,r,j}\, m_{i,r,j} \qquad (1)$$
where $p_{i,r,j} \ge 0$, $0 \le m_{i,r,j} \le 1$, and $p_{i,1,j} + \cdots + p_{i,R,j} = 1$.
It is further assumed that there is a set of CpG sites $S$ such that, for any CpG site $i$ in $S$ and any stool sample $j$ from a normal individual, $m_{i,r,j} = m_{i,r}$ and $p_{i,r,j} = p_{r,j}$.
That is, it is assumed that in any stool sample from a normal individual, the proportions of different component methylomes in the mixture are the same for all CpG sites in $S$. It is also assumed that, by restricting to the set of CpG sites $S$, stool samples from all normal individuals have the same set of component methylomes. They are called restricted reference component methylomes (RRCM), and are labeled as $m_1^S, \ldots, m_R^S$, or simply $m_1, \ldots, m_R$ when there is no confusion. For any stool sample $j$ from a normal individual, its methylome restricted to the set of CpG sites in $S$ can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^S$ be the methylome of stool sample $j$ restricted to $S$; then, for some mixture vector $p_j = [p_{j,1}, \ldots, p_{j,R}]^T$, whose $r$th entry is the proportion of the $r$th component in sample $j$:
$$z_j^S = [m_1^S, \ldots, m_R^S]\, p_j \qquad (2)$$
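Equation (2) can be checked numerically with a small Python sketch. The component values and proportions in the usage note are invented for illustration and are not data from this Example; the function name is hypothetical.

```python
def mix_methylomes(components, proportions):
    """Evaluate Equation (2): z = [m_1, ..., m_R] p, site by site.

    components is a list of R methylomes, each a list of methylation levels
    over the same CpG sites in S; proportions is the mixture vector p.
    """
    if len(components) != len(proportions):
        raise ValueError("need one proportion per component methylome")
    if any(p < 0 for p in proportions) or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must be non-negative and sum to 1")
    n_sites = len(components[0])
    if any(len(m) != n_sites for m in components):
        raise ValueError("all component methylomes must cover the same sites")
    return [sum(p * m[i] for p, m in zip(proportions, components))
            for i in range(n_sites)]
```

With two made-up components `[0.9, 0.2, 0.5]` and `[0.1, 0.6, 0.5]` mixed at proportions `[0.25, 0.75]`, the predicted levels are approximately 0.3, 0.5, and 0.5. Because the proportions are non-negative and sum to 1, every predicted level lies between the smallest and largest component level at that site.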
Finally, it is assumed that the set $S$ is the union of two disjoint subsets $C$ and $T$, where $T$ is a union of $K$ non-empty sets $T_k$ such that $T = \bigcup_{k=1}^{K} T_k$, where the index $k$ represents the $k$th type of abnormal intestinal tissue phenotype. The sets $T_k$ do not need to be disjoint. Moreover, $T_k$ itself is the union of two disjoint sets $D_k$ and $V_k$. Either $D_k$ or $V_k$ could be empty, but not both. It is assumed that for any stool sample, including one from an abnormal individual, when restricted to CpG sites in $C$, its methylome can always be expressed as a weighted average of the restricted reference component methylomes. That is, $z_j^C = [m_1^C, \ldots, m_R^C]\, p_j$ regardless of whether $j$ is from an abnormal individual. $C$ is called the set of reference CpG sites. On the other hand, for a stool sample $l$ from an abnormal individual, when restricted to CpG sites in $S = C \cup T$, its methylome can no longer be expressed as a weighted average of the restricted reference component methylomes. That is, $w_l^S \neq [m_1^S, \ldots, m_R^S]\, p_l$ for any mixture vector $p_l$. More specifically, for a stool sample $l$ from an individual with the $k$th type of abnormal phenotype: 1) $w_l^C = [m_1^C, \ldots, m_R^C]\, p_l$; 2) if $D_k$ is non-empty, then wiDK=[m1,kD
T is called the target set of CpG sites, Dk is called the differential methylation target set, Vk is called the copy number variation target set, and Tk is called the target set for the kth type of abnormal phenotype.
The main steps of the algorithm of this Example are:
- 1) Identify the sets of reference CpG sites C, and T1, . . . , TK for the list of K types of abnormal individuals.
- 2) Estimate the restricted reference component methylomes $m_1, \ldots, m_R$, or $R$ predictor methylomes $n_1, \ldots, n_R$ that are independent linear combinations of the reference component methylomes such that $n_r = [m_1, \ldots, m_R]\, q_r$ for $R$ linearly independent mixture vectors $q_1, \ldots, q_R$.
- 3) (Optional) If the reference component methylomes are available, estimate the proportions of these components at the reference CpG sites C for the test stool samples.
- 4) Predict the methylation level of the test stool samples at the target set Tk of CpG sites, under the hypothesis that the sample is from a normal individual.
- 5) Compare the predicted methylation levels at Dk and Vk against the observed methylation levels, and reject the null hypothesis that a test sample is from a normal individual if the observed methylation levels are significantly different from the predicted levels.
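Steps 4 and 5 above can be sketched as follows, assuming the mixture fractions have already been estimated. This Example does not specify the statistical test, so the sketch assumes a normal approximation to binomial read sampling and a hypothetical threshold of 3 standard deviations, with no correction for multiple sites. All function names and input values are illustrative.

```python
import math

def predict_levels(components_at_target, fractions):
    """Step 4: predicted methylation at each target site under the normal hypothesis."""
    n_sites = len(components_at_target[0])
    return [sum(f * comp[i] for f, comp in zip(fractions, components_at_target))
            for i in range(n_sites)]

def z_scores(observed, predicted, coverage):
    """Step 5: standardized difference per site (assumed binomial read sampling)."""
    scores = []
    for obs, pred, n in zip(observed, predicted, coverage):
        # Clamp the predicted level so the binomial variance never collapses to zero.
        p = min(max(pred, 1e-6), 1 - 1e-6)
        scores.append((obs - pred) / math.sqrt(p * (1 - p) / n))
    return scores

def reject_normal(scores, threshold=3.0):
    """Reject the null hypothesis if any target site deviates beyond the threshold."""
    return any(abs(s) > threshold for s in scores)

# Hypothetical inputs: two components at three target sites, estimated fractions,
# and per-site read coverage.
components_T = [[0.8, 0.2, 0.6], [0.2, 0.6, 0.4]]
fractions = [0.5, 0.5]
coverage = [100, 100, 100]

predicted = predict_levels(components_T, fractions)
normal_like = reject_normal(z_scores(predicted, predicted, coverage))
abnormal_like = reject_normal(z_scores([0.9, 0.4, 0.5], predicted, coverage))
```

A sample whose observed levels match the prediction is not rejected, while a sample that deviates strongly at one target site is.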
The algorithm of this Example can be implemented in a variety of ways. For example, given the methyl-seq data for a set of stool samples from normal individuals, the presently disclosed EM algorithm or the data augmentation method can be applied to estimate the component methylomes, and the maximum likelihood method can then be used to estimate the proportions of these component methylomes in the test sample. Below are exemplary simple implementations of the presently disclosed algorithm that use linear regression.
In the simplest implementation of the algorithm of this Example, it is assumed the restricted methylome of a stool sample from a normal individual can be approximated by a mixture of two restricted reference methylomes. It is further assumed that the estimations of these two reference component methylomes are available. For example, in the implementation below, for the genomic loci of interest, the stool methylome is approximated by the mixture of ileum and colon methylomes. The implementation of the algorithm includes the following steps:
1. Identify the Reference Set C, and the Target Sets T1, . . . , TK.
- 1.1 Collect the methylation data for a set of colon-derived cell samples, a set of ileum-derived cell samples, and a set of stool samples, all from normal individuals. For each type of abnormal individuals, collect a set of colon-derived samples, a set of ileum-derived cell samples, and a set of stool samples from that type of abnormal individuals. All these samples should have matched age, race, and other relevant parameters. These are the training data.
- 1.2 Let xi,j be the observed methylation level of CpG site i in a normal colon-derived sample j, and yi,l the observed methylation level of CpG site i in a normal ileum-derived cell sample l; let sx,i2 be the sample variance of xi,j over all normal colon-derived samples, and sy,i2 the sample variance of yi,l over all normal ileum-derived cell samples. Identify the set of CpG sites S0 such that for any i∈S0, both sx,i2<c0 and sy,i2<c0 hold for some constant c0. These are CpG sites with stable methylation levels in each type of normal cells.
- 1.3 Let xi,j be the observed methylation level of CpG site i in a colon-derived sample j, including normal and abnormal, and yi,l the observed methylation level of CpG site i in an ileum-derived cell sample l, including normal and abnormal; let sx,i2 be the sample variance of xi,j over all colon-derived samples, including normal and abnormal, and sy,i2 the sample variance of yi,l over all ileum-derived cell samples, including normal and abnormal. Identify the CpG sites S1 such that for any i∈S1, both sx,i2<c0 and sy,i2<c0 hold for some constant c0, the statistical test for the difference between {xi,j0: j0 is a normal colon-derived sample} and {xi,jk: jk is an abnormal colon-derived sample of type k} is not significant for any abnormal type of colon-derived sample, and the statistical test for the difference between {yi,j0: j0 is a normal ileum-derived cell sample} and {yi,jk: jk is an abnormal ileum-derived cell sample of type k} is not significant for any abnormal type of ileum-derived cell sample. These are CpG sites with stable methylation levels in each type of cell, and with no difference in methylation level between normal and any abnormal samples. Let xi be the sample mean of xi,j over all colon-derived samples, including normal and abnormal, and yi the sample mean of yi,l over all ileum-derived cell samples, including normal and abnormal. Identify the subset C0 of S1 such that for any i∈C0, |xi−yi|>c1 for some constant c1. These are CpG sites that are stably methylated in each cell type, with no difference between the normal and abnormal samples of the same cell type, and differentially methylated between different types of cells.
- 1.4 Let xC0 be the vector of xi for all i∈C0, and yC0 be the vector of yi for all i∈C0, where xi is the mean methylation at site i in all colon-derived samples and yi the mean methylation at site i in all ileum-derived cell samples. Note that, by the way the set C0 is selected, there is no difference in the methylation level of any CpG site in C0 between normal and abnormal colon-derived samples, or between normal and abnormal ileum-derived cell samples. Let zjC0 be the observed methylation levels of CpG sites in C0 for a stool sample j of the kth abnormal type. (For convenience, a normal stool sample is called a sample of the 0th abnormal type.) For each sample j belonging to the kth abnormal type, regress zjC0 against xC0 and yC0, with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1, and obtain the residual ejC0. Identify the subset C0k of C0 such that for any CpG i in C0k, it has
and ei,k2<c3 for some constants c2 and c3, where ei,k2 is the mean of the squared difference between the estimated and observed methylation levels of CpG site i in all stool samples of the kth abnormal type, and si,k2 is the sample variance of the methylation levels of CpG site i in the same set of stool samples. Repeat the above procedure for each type of abnormal stool sample; the intersection of the subsets, C=∩k=0KC0k, is the reference set of CpG sites. These are CpG sites whose methylation levels in both normal and any type of abnormal stool samples can be accurately predicted by the reference component methylomes from normal individuals.
- 1.5 Let T0=S0\S1. Let xC and xT0 be the vectors of xi and xh for all i∈C and h∈T0 respectively, and yC and yT0 be the vectors of yi and yh for all i∈C and h∈T0 respectively, where xi, xh, yi, and yh are the mean methylation levels of normal colon-derived or ileum-derived cells at sites i and h respectively. Let zjC and zjT0 be the observed methylation levels of CpG sites in C and T0 respectively for a normal stool sample j, wlk C and wlk T0 the observed methylation levels of CpG sites in C and T0 respectively for a stool sample lk from an individual with the kth type of abnormality, and wlg C and wlg T0 the observed methylation levels of CpG sites in C and T0 respectively for a stool sample lg from an individual with the gth type of abnormality, where g≠k. For each j, lk, and lg, regress zjC, wlk C, and wlg C respectively against xC and yC, with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1. Apply the fitted models to xT0 and yT0 to predict zjT0, wlk T0, and wlg T0 respectively, and obtain the differences ejT0, elk T0, and elg T0 between the predicted values and the observed values. Let ei, ei,k, and ei,g be the means of the sets of differences {ejT0: j is a normal stool sample}, {elk T0: lk is a stool sample of the kth abnormal type}, and {elg T0: lg is a stool sample of the gth abnormal type} for CpG site i respectively. Identify the subset Tk of T0 such that for any i∈Tk, |ei|<c2,0, |ei,k|>c2,k, and |ei,k−ei,g|>c3,k for some constants c2,0, c2,k, and c3,k, for all g≠k. Tk is the target set for the kth type of abnormal individual. These are the sites where the methylation of a normal stool sample can be accurately predicted, the observed methylation in a stool sample of the kth abnormal type deviates from the prediction, and the deviation differs from that of a stool sample of any other abnormal type.
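The variance and mean-difference filters used in steps 1.2 and 1.3 can be sketched as below. This is a simplified illustration only: it omits the normal-versus-abnormal significance tests of step 1.3 and the residual-based refinements of steps 1.4 and 1.5, and the function names, toy data, and threshold values are hypothetical.

```python
from statistics import mean, variance

def stable_sites(colon, ileum, c0):
    """Indices whose sample variance is below c0 in both cell types (as for S0)."""
    n_sites = len(colon[0])
    keep = set()
    for i in range(n_sites):
        var_x = variance([sample[i] for sample in colon])
        var_y = variance([sample[i] for sample in ileum])
        if var_x < c0 and var_y < c0:
            keep.add(i)
    return keep

def informative_sites(colon, ileum, candidates, c1):
    """Candidates whose mean levels differ by more than c1 between cell types (as for C0)."""
    keep = set()
    for i in candidates:
        x_bar = mean(sample[i] for sample in colon)
        y_bar = mean(sample[i] for sample in ileum)
        if abs(x_bar - y_bar) > c1:
            keep.add(i)
    return keep

# Hypothetical toy data: rows are samples, columns are CpG sites.
colon = [[0.90, 0.50, 0.20], [0.88, 0.10, 0.22], [0.92, 0.90, 0.18]]
ileum = [[0.10, 0.50, 0.21], [0.12, 0.52, 0.19], [0.08, 0.48, 0.20]]

s0 = stable_sites(colon, ileum, c0=0.01)               # site 1 is too variable in colon
c0_set = informative_sites(colon, ileum, s0, c1=0.3)   # site 2 does not separate the tissues
```

Only site 0 survives both filters in this toy example: it is stable within each tissue and clearly different between tissues, which is what makes it useful for estimating mixture proportions.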
Recall that xC and yC are the mean vectors of the methylation levels of the training colon-derived and training ileum-derived cell data for the CpG sites in the reference set C. For any new stool sample t to be tested, let ztC be the observed methylation levels of CpG sites in C. Regress ztC against xC and yC, with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1. The estimated coefficients are the estimated fractions of the component methylomes for the stool sample t.
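With only two component methylomes, the constrained regression described above has a simple closed form, sketched here. Because the coefficients must be non-negative and add to 1 with zero intercept, a single free parameter a remains, and the constrained least-squares solution is the unconstrained one clipped to [0, 1]. The function name and the toy vectors are hypothetical.

```python
def estimate_colon_fraction(z, x, y):
    """Least-squares fraction a in [0, 1] such that z ~ a*x + (1 - a)*y."""
    # Rewrite the model as (z - y) = a * (x - y) and solve for the single coefficient.
    num = sum((zi - yi) * (xi - yi) for zi, xi, yi in zip(z, x, y))
    den = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    if den == 0:
        raise ValueError("x and y are identical on the reference set")
    # The objective is a convex quadratic in a, so clipping enforces the constraints.
    return min(max(num / den, 0.0), 1.0)

# Hypothetical mean methylation levels on the reference set C.
x_C = [0.9, 0.1, 0.8, 0.2]   # colon-derived
y_C = [0.1, 0.9, 0.2, 0.8]   # ileum-derived

# Noiseless test sample built with a colon fraction of 0.3.
z_t = [0.3 * xi + 0.7 * yi for xi, yi in zip(x_C, y_C)]

a_hat = estimate_colon_fraction(z_t, x_C, y_C)
fractions = (a_hat, 1.0 - a_hat)
```

For more than two components the same constraints would call for a general constrained least-squares solver, which this sketch does not attempt.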
3. Test if the New Stool Samples are from the kth Type of Abnormal Individual.
For the new stool sample t, let XT
Other ways of implementing the algorithm of this Example can be developed by modifying the simple implementation presented above. Specifically, it is not necessary to assume that only two reference component methylomes make up the stool methylomes, nor to approximate the stool methylomes by mixtures of the component methylomes. Instead, a set of predictor methylomes can be collected that are themselves mixtures of the reference component methylomes, as long as the number of predictor methylomes is the same as the number of reference component methylomes and the mixture vectors of the predictor methylomes are linearly independent. For example, they can be methylomes of stool samples with known, different proportions of colon-derived and ileum-derived cell DNA.
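The use of predictor methylomes in place of the component methylomes can be illustrated for the two-component case. In this sketch, the component methylomes, the predictor proportions q1 and q2, and the sample proportion p are hypothetical. The two predictors have linearly independent mixture vectors exactly when q1 differs from q2.

```python
def blend(weight, first, second):
    """Site-wise combination weight*first + (1 - weight)*second."""
    return [weight * u + (1 - weight) * v for u, v in zip(first, second)]

# Hypothetical component methylomes (unknown in practice in this variant).
m_colon = [0.9, 0.1, 0.8, 0.2]
m_ileum = [0.1, 0.9, 0.2, 0.8]

# Two predictor methylomes with different colon proportions.
q1, q2 = 0.8, 0.4
n1 = blend(q1, m_colon, m_ileum)
n2 = blend(q2, m_colon, m_ileum)

# A stool sample with colon proportion p is also a combination of the predictors.
p = 0.9
z = blend(p, m_colon, m_ileum)
a = (p - q2) / (q1 - q2)            # weight on n1; the weight on n2 is 1 - a
z_from_predictors = blend(a, n1, n2)
```

The weights on the predictors still add to 1, but a weight can fall outside [0, 1] when the sample's proportion lies outside the range spanned by the predictors (here a is 1.25), so this sketch does not clip it.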
In the algorithm of this Example, the difference between the observed methylation levels in certain target regions and the predicted methylation levels is used as the test statistic to determine whether the methylome of a stool sample has been affected by some type of intestinal tissue abnormality. To illustrate the advantage of this approach, assume that the mixture vector pj for the methylome of a normal stool sample j follows a Dirichlet distribution with parameters α1= . . . =αR. Furthermore, assume that for CpG site i, the methylation levels in the R reference component methylomes are mi,r=(r−1)/(R−1). It can be shown that the methylation level of i in sample j then has a mean of 0.5, and a variance of
If there is a methyl-seq library of sample j with a coverage of N for CpG site i, the variance of the measured methylation level zi,j is
In other words, if zi,j is used as a test statistic to detect abnormal intestinal tissue using a stool sample, then under the null hypothesis the test statistic has a variance of σ12. In the presently disclosed algorithm, however, the mixture vector pj is estimated first, and xi,j is then predicted by Σrmi,rpr,j. Note that methyl-seq data can cover millions of CpG sites in each library, and that the variance of the coefficients in a linear regression model is inversely proportional to the sample size. Thus it is possible to obtain a highly accurate estimate of the mixture vector pj, even when it is taken into account that adjacent CpG sites tend to have correlated methylation levels. Assuming that an accurate estimate of Σrmi,rpr,j can be obtained, that is, that the error of the estimate can be ignored, the variance of the difference zi,j−Σrmi,rpr,j between the observed methylation level and the prediction will be
In other words, under the null hypothesis, the test statistic zi,j−Σrmi,rpr,j used in the presently disclosed algorithm has a much smaller variance than the other candidate test statistic zi,j. This in turn means that the presently disclosed test will achieve a higher power at the same level of type I error.
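The variance comparison above can be checked with a small simulation. This sketch assumes a common Dirichlet parameter alpha (its value is not taken from this Example), binomial read sampling at coverage N, and a prediction with negligible error. The closed-form values in `theoretical_variances` were derived for this sketch from those assumptions: with equal parameters, the variance of the true level is (R+1)/(12(R−1)(R·alpha+1)), and the read-sampling variance is E[x(1−x)]/N. All names and default values are hypothetical.

```python
import random
from statistics import variance

def theoretical_variances(R, alpha, N):
    """Variances of z and of z minus the prediction, under the sketch's assumptions."""
    var_x = (R + 1) / (12 * (R - 1) * (R * alpha + 1))   # variance of the true level
    sampling = (0.25 - var_x) / N                         # E[x(1-x)]/N with E[x] = 0.5
    return var_x + sampling, sampling

def simulate(R=5, alpha=1.0, N=30, n_samples=20000, seed=0):
    """Empirical variances of the raw level and of the observed-minus-predicted level."""
    rng = random.Random(seed)
    m = [r / (R - 1) for r in range(R)]    # component levels (r-1)/(R-1) for r = 1..R
    raw, residual = [], []
    for _ in range(n_samples):
        g = [rng.gammavariate(alpha, 1.0) for _ in range(R)]
        total = sum(g)
        p = [v / total for v in g]                        # Dirichlet(alpha, ..., alpha) draw
        x = sum(mr * pr for mr, pr in zip(m, p))          # true level = exact prediction
        methylated = sum(rng.random() < x for _ in range(N))
        z = methylated / N                                # measured level at coverage N
        raw.append(z)
        residual.append(z - x)
    return variance(raw), variance(residual)

raw_var, resid_var = simulate()
theory_raw, theory_resid = theoretical_variances(R=5, alpha=1.0, N=30)
```

Under these assumptions the residual statistic removes the between-sample variation in mixture proportions and leaves only the read-sampling variation, which is why its variance is smaller.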
Examples of Embodiments for Example 5
E1. A method for diagnosing, prognosing, classifying, and/or monitoring necrotizing enterocolitis in a mammal, comprising:
(a) obtaining a sample from the mammal;
(b) determining the methylation status and/or level of one or more genomic loci in the sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; and
(d) diagnosing necrotizing enterocolitis in the mammal,
wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the mammal.
E2. The method of embodiment E1, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the presence of necrotizing enterocolitis in the mammal or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the presence of necrotizing enterocolitis in the mammal.
E3. The method of embodiment E1, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the presence of necrotizing enterocolitis in the mammal.
E4. A method of treating necrotizing enterocolitis in a mammal, comprising:
(a) obtaining a sample from the mammal;
(b) determining the methylation status and/or level of one or more genomic loci present in the sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values;
(d) diagnosing necrotizing enterocolitis in the mammal, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the mammal; and
(e) administering an antibiotic therapy.
E5. The method of any one of embodiments E1-E4, wherein the reference is the methylation status and/or level of the one or more genomic loci in a sample obtained from a mammal that does not have necrotizing enterocolitis.
E6. The method of any one of embodiments E1-E5, wherein said sample is a stool sample.
E7. A method of treating necrotizing enterocolitis comprising:
(a) measuring the methylation status and/or level of one or more genomic loci present in a sample from a mammal prior to a treatment of necrotizing enterocolitis;
(b) measuring the methylation status and/or level of one or more genomic loci present in a sample from the mammal during the treatment of necrotizing enterocolitis; and
(c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of necrotizing enterocolitis indicates the mammal is responsive to the treatment.
E8. The method of embodiment E7, further comprising (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of necrotizing enterocolitis indicates the mammal is not responsive to the treatment.
E9. The method of embodiment E7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.
E10. The method of embodiment E7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.
E11. The method of embodiment E7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.
E12. The method of embodiment E7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.
E13. The method of any one of embodiments E7-E12, wherein said sample is a stool sample.
E14. The method of any one of embodiments E1-E13, wherein the one or more genomic loci comprise one or more CpG sites.
E15. The method of any one of embodiments E1-E14, wherein the one or more genomic loci are present within nucleic acids isolated from the sample.
E16. The method of any one of embodiments E1-E15, wherein the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.
E17. A method of treating necrotizing enterocolitis, comprising:
(a) diagnosing necrotizing enterocolitis in a mammal by utilization of the algorithm disclosed in Example Embodiment J; and
(b) administering an antibiotic therapy to said mammal to treat said necrotizing enterocolitis.
E18. The method of any one of embodiments E1-E17, wherein said mammal is a human.
E19. A kit for diagnosing, prognosing, and/or monitoring necrotizing enterocolitis in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci.
E20. The kit of embodiment E19, wherein the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.
This Example provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis. For example, algorithms, kits, and methods for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis are provided.
Example 6—Methods and Materials for Assessing and Treating Ovarian Cancer
Field of this Example
This Example relates to methods and materials involved in assessing a mammal (e.g., a human) for and/or treating a mammal (e.g., human) having or developing ovarian cancer. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) to determine whether or not a mammal has, or is developing, ovarian cancer. This Example also provides methods, algorithms, and kits for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer.
Background of this Example
Ovarian cancer often remains undetected until it becomes incurable or challenging to treat. Early diagnosis is frequently impossible or dangerously invasive by current methods. There is therefore great interest in the development of practical and accurate methods for the non-invasive detection and phenotyping of ovarian cancer.
Summary of this Example
This Example provides methods and materials involved in assessing a mammal (e.g., a human) for and/or treating a mammal (e.g., human) having or developing ovarian cancer. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) to determine whether or not a mammal has, or is developing, ovarian cancer. Determining if a mammal (e.g., a human) has, or is likely to develop, ovarian cancer by assessing DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) can aid in the identification of mammals (e.g., humans) that should be treated in a particular manner (e.g., by administering chemotherapy, by administering immunotherapy, and/or by performing a surgical treatment), for example, early in the disease process.
This Example also provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer. For example, the methods described herein can include determining the methylation status of one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) of a mammal (e.g., a female human). This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer.
In one aspect, this Example provides a method for diagnosing, prognosing, classifying, and/or monitoring ovarian cancer in a mammal (e.g., a human female) comprising: (a) obtaining a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; and (d) identifying the mammal as having ovarian cancer, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of ovarian cancer in the mammal.
In another aspect, this Example provides a method of treating ovarian cancer in a mammal (e.g., a human female) comprising: (a) obtaining a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci present in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; (d) identifying the mammal as having ovarian cancer, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of ovarian cancer in the mammal; and (e) treating the mammal by administering a chemotherapy, by administering an immunotherapy, and/or by performing a surgical treatment.
In some cases, an increase in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) can indicate the presence of ovarian cancer in the mammal or a decrease in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) indicates the presence of the ovarian cancer in the mammal. In some cases, a decrease in the level of methylation of at least one of the one or more genomic loci in a sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample can indicate the presence of ovarian cancer in the mammal.
In some cases, the reference can be the methylation status and/or level of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) obtained from a mammal (e.g., a human female) that does not have ovarian cancer.
In another aspect, this Example provides a method of treating ovarian cancer in a mammal (e.g., a human female) comprising: (a) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) from the mammal prior to a treatment of ovarian cancer; (b) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) from the mammal during the treatment of ovarian cancer; and (c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of ovarian cancer indicates the subject is responsive to the treatment. In some cases, the method further comprises (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of ovarian cancer indicates the subject is not responsive to the treatment.
In some cases, an increase in the level of methylation of the one or more genomic loci in the sample indicates that the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the subject is not responsive to the treatment.
This Example further provides algorithms for diagnosing and/or monitoring a mammal having ovarian cancer. In certain embodiments, the algorithm can be used to classify ovarian cancer of a mammal (e.g., a human female).
In another aspect, this Example provides a kit for diagnosing, prognosing, and/or monitoring ovarian cancer in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci. In certain embodiments, the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.
In certain embodiments, the one or more genomic loci are present within nucleic acids isolated from the sample. In certain embodiments, the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.
Description of this Example
This Example provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer. For example, the methods described herein can include determining the methylation status of one or more genomic loci in a sample of a mammal (e.g., a human female). In some cases, the methods described herein can include the use of an algorithm to diagnose, prognose, monitor, classify, and/or assist in the treatment of ovarian cancer. Non-limiting embodiments of this Example are described by the present specification and Examples.
Unless defined otherwise, all technical and scientific terms used in this Example generally have their ordinary meanings in the art, within the context of this Example and in the specific context where each term is used. The following references provide one of skill with a general definition of many of the terms used in this Example: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). Certain terms are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the compositions and methods of this Example and how to make and use them.
The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. The present Example also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.
As used herein, the use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” Still further, the terms “having,” “including,” “containing,” and “comprising” are interchangeable, and one of skill in the art is cognizant that these terms are open ended terms.
The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.
In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows detection of a disease and/or disorder in an individual, including detection of the disease or the disorder in its early stages. In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows the characterization of a phenotype of a disease and/or a disorder in an individual. Early stage of a disease, as used herein, refers to the time period between the onset of the disease and the time point that signs or symptoms of the disease emerge. In certain non-limiting embodiments, the presence, absence, and/or level of a biomarker in a sample of a mammal (e.g., a human) is determined by comparing to a reference control.
The terms “reference sample,” “reference control,” “control,” or “reference,” as used interchangeably herein, refer to a control for a methylation status of a genomic locus that is to be detected in a sample of a mammal. In certain embodiments, a reference sample can be a sample from a healthy individual, e.g., an individual that does not have ovarian cancer. In certain embodiments, a reference sample can be a sample from a control individual that does not have the disease or phenotype to be detected by a biomarker disclosed herein. In certain embodiments, a control or reference can be the presence, absence, and/or a particular level of a methylation state of a genomic locus in a healthy individual. In certain embodiments, a reference can be a predetermined presence, absence, and/or particular level of a methylation state of a genomic locus that indicates a subject does not have ovarian cancer. In certain embodiments, a reference can be the methylation status of a locus in an individual having a disease or a phenotype, e.g., an individual that has ovarian cancer, where the methylation status of the locus is known to be not associated with the disease or the phenotype.
The term “a set of predicted values” refers to the methylation status of certain genomic loci for a sample. The status of those loci is not directly measured from that sample. Rather, it is inferred from measurements of other loci for that sample and/or measurements of other samples. The inference of the predicted values is based on some mathematical/statistical models. The models usually assume that the sample for which the methylation status of those loci is to be predicted has a normal phenotype. This assumption may be either correct or wrong, but its correctness is not required for the inference of the predicted values.
The term “slightly invasive or non-invasive method” refers to a method that does not involve the removal of tissues by biopsy from the ovaries. In certain embodiments, slightly invasive or non-invasive methods, as described herein, include obtaining plasma, urine, a peritoneal fluid sample, or a cervical swab from a subject.
The term “patient” or “subject,” as used interchangeably herein, refers to any warm-blooded animal, e.g., human or non-human. Non-limiting examples of non-human subjects include mammals, non-human primates, dogs, cats, mice, rats, guinea pigs, rabbits, fowl, pigs, horses, cows, goats, sheep, etc. In certain embodiments, the subject is human.
The term “nucleic acid,” “nucleic acid molecule,” or “polynucleotide” includes any compound and/or substance that comprises a polymer of nucleotides. Each nucleotide is composed of a base, specifically a purine- or pyrimidine base (i.e., cytosine (C), guanine (G), adenine (A), thymine (T), or uracil (U)), a sugar (i.e., deoxyribose or ribose), and a phosphate group. In certain embodiments, the nucleic acid molecule is described by the sequence of bases, whereby said bases represent the primary structure (linear structure) of a nucleic acid molecule. The sequence of bases is typically represented from 5′ to 3′. These terms encompass deoxyribonucleic acid (DNA) including, e.g., complementary DNA (cDNA) and genomic DNA, ribonucleic acid (RNA), in particular messenger RNA (mRNA), synthetic forms of DNA or RNA, and mixed polymers comprising two or more of these molecules. The herein described nucleic acid molecule can contain naturally occurring or non-naturally occurring nucleotides. Examples of non-naturally occurring nucleotides include modified nucleotide bases with derivatized sugars or phosphate backbone linkages or chemically modified residues.
The term “isolated” (e.g., isolated genomic DNA) refers to a biological component that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, e.g., other chromosomal and extra-chromosomal DNA and RNA, proteins, and organelles. Nucleic acids, e.g., DNA, that have been “isolated” include nucleic acids purified by standard purification methods.
The term “genomic locus” or “genomic DNA locus,” as used herein, refers to any fixed position in a genome. For example, a genomic locus can refer to a genomic element, a chromosomal region, a gene, a region of a gene, e.g., an exon or intron, a regulatory region of a gene, e.g., a promoter or enhancer, a CpG site, a CpG island, or a CpG island shore. For example, a genomic locus can include one or more CpG sites, e.g., between about 1 and about 100 CpG sites. In certain embodiments, a genomic locus can be of any particular length, e.g., between about 1 and about 10,000 nucleotides in length.
As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status,” and “methylation level” refer to the presence, absence, percentage, and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, indicate the methylation state of any of the nucleotides (e.g., cytosines) in the sequence, can indicate the methylation state of a subset of the nucleotides (e.g., of cytosines), can indicate the percentage or fraction of methylated cytosines at any particular stretch of nucleotides within the sequence or can indicate the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more nucleotides that is/are methylated.
As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.
A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., a DNA sequence, that has a high frequency of CpG dinucleotide repeats. See, e.g., Illingworth and Bird, FEBS Letters, 2009; 583:1713-1720. For example, Yamada et al. (Genome Research, 2004; 14:247-266) have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length, have a GC content greater than 50%, and have an OCF/ECF ratio (observed CpG frequency to expected CpG frequency) greater than 0.6. Others (Takai et al., Proc. Natl. Acad. Sci. U.S.A., 2002; 99:3740-3745) have defined a CpG island less stringently as a sequence of at least 200 nucleotides in length, having a greater than 50% GC content and an OCF/ECF ratio greater than 0.6.
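As a non-limiting illustration, the three criteria recited above (minimum length, GC content, and OCF/ECF ratio) can be evaluated for a candidate sequence roughly as follows. This is a minimal sketch only. The function names are hypothetical, and the expected CpG count is assumed here to be (number of C × number of G) / sequence length, which is one commonly used convention and is not necessarily the exact calculation used in the cited references.

```python
def cpg_island_metrics(seq: str) -> dict:
    """Return length, GC fraction, and observed/expected CpG ratio for a sequence."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_fraction = (c + g) / n if n else 0.0
    # Assumption: expected CpG count derived from base composition alone.
    expected = (c * g) / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_ratio": ratio}


def is_cpg_island(seq: str, min_length: int = 200,
                  min_gc: float = 0.50, min_ratio: float = 0.6) -> bool:
    """Apply the thresholds; defaults follow the less stringent definition above."""
    m = cpg_island_metrics(seq)
    return (m["length"] >= min_length
            and m["gc_fraction"] > min_gc
            and m["obs_exp_ratio"] > min_ratio)
```

Passing min_length=400 applies the more stringent length criterion described above while leaving the other two thresholds unchanged.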
A “CpG island shore,” as used herein, refers to methylation hotspots that are present a short distance, e.g., less than 2 kb, from CpG islands.
The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome), or a portion of the subset (e.g., those areas found to be associated with ovarian cancer). A methylome from plasma can be referred to as a “plasma fluid methylome,” or a “plasma fluid DNA methylome.” The plasma fluid methylome is an example of a cell-free methylome that includes cell-free DNA (cfDNA).
As used herein, the term “increase” refers to altering positively by at least about 2%, including, but not limited to, altering positively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
As used herein, the terms “reduce,” “reduction,” or “decrease” refer to altering negatively by at least about 2%, including, but not limited to, altering negatively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
As described herein, this Example provides methods for diagnosing, monitoring, classifying, and/or treating ovarian cancer by analyzing the methylation status of one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) of a mammal (e.g., a human female). In certain embodiments, the methods can include using an algorithm described herein. In certain embodiments, the methods described herein can allow for the early diagnosis or screening of a subject with ovarian cancer, e.g., a subject that does not have any symptoms, or has only early symptoms, of ovarian cancer.
In certain embodiments, samples obtained for use in the methods described herein can include cfDNA, which carries DNA methylation information from the cell of origin. cfDNA can arise from cellular apoptosis and necrosis, and can be generated from active secretory processes, with the formation of extracellular vesicles. DNA signatures are highly tissue-specific, and include in vivo information relating to the tissue source of cfDNA. In certain embodiments, the methods described herein can include analyzing cfDNA in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample), to identify genetic phenotypes that are drivers and/or consequences of ovarian cancer.
The sample from the subject can be collected using any appropriate technique. For example, a blood sample, a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample can be collected using standard methods. In some cases, the sample can be collected from the subject before the subject has any symptom of ovarian cancer, i.e., a non-symptomatic subject. In certain embodiments, the sample can be collected from a non-symptomatic subject who is at high risk of ovarian cancer. In certain embodiments, the sample can be collected from a subject who has previously received or is currently receiving a treatment for ovarian cancer. In certain embodiments, two or more samples (e.g., two or more, three or more, four or more, five or more, six or more or seven or more samples) can be obtained before and while the subject is receiving a treatment for ovarian cancer (e.g., serially obtained samples).
Diagnostic, Prognostic, Classification, and Monitoring Methods of this Example
This Example provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring ovarian cancer in a subject that includes analyzing the methylation status of certain genomic loci.
In certain embodiments, the analyzed genomic loci can include one or more genomic loci that exhibit differential methylation in a sample from a subject that has ovarian cancer compared to a reference sample. For example, the methods described herein can include assessing the methylation status of one or more genomic loci, e.g., about 5 or more, about 10 or more, about 50 or more, about 100 or more, about 500 or more, about 1,000 or more, about 5,000 or more, about 10,000 or more, about 25,000 or more, about 50,000 or more or about 100,000 or more genomic loci in a sample of a subject. In certain embodiments, the genomic loci can be selected from the genes, or a region within the genes, provided in Table 18 and/or Example Embodiment K. In certain embodiments, the one or more genomic loci can be one or more promoter regions of one or more genes, one or more exons of one or more genes, one or more introns of one or more genes, one or more CpG sites, one or more CpG islands, one or more CpG island shores, one or more enhancers of one or more genes, or a combination thereof. In certain embodiments, the genomic loci are present on a particular chromosome.
In certain embodiments, this Example provides methods for diagnosing, prognosing, and/or monitoring ovarian cancer in a subject by detecting the DNA methylation profiles associated with ovarian cancer. In certain embodiments, the methods described herein can include (a) obtaining a sample from the subject, (b) determining the methylation status of one or more genomic loci present in the sample, e.g., present within cfDNA in a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample, (c) comparing the methylation status of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing ovarian cancer in the subject. In certain embodiments, the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of ovarian cancer in the subject. In certain embodiments, the difference in the methylation status also can indicate the severity of ovarian cancer.
In certain embodiments, the methods described herein for diagnosing, prognosing, and/or monitoring ovarian cancer in a subject can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing ovarian cancer in the subject. In certain embodiments, the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of ovarian cancer in the subject. In certain embodiments, the difference in the methylation level also can indicate the severity of ovarian cancer.
In certain embodiments, diagnosing ovarian cancer in the subject can include characterizing a phenotype of the ovarian cancer, wherein the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the phenotype of the ovarian cancer. In certain embodiments, the phenotype of the ovarian cancer can include the severity of the ovarian cancer, prognosis of the ovarian cancer, molecular expression profile of the ovarian cancer, responsiveness of the ovarian cancer to certain treatments, or any combinations thereof.
In certain embodiments, the methods described herein for determining if a subject is at risk of developing ovarian cancer can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) determining that the subject is at risk of developing ovarian cancer, wherein the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates that the subject is at risk.
In certain embodiments, diagnosing, prognosing, and/or monitoring of a subject with ovarian cancer can be based on a higher or lower methylation level of the genomic locus in the sample of the subject relative to the methylation level in a reference sample, e.g., a sample from a subject that does not have ovarian cancer. In certain embodiments, a difference of greater than about 5%, greater than about 10%, greater than about 15%, greater than about 20%, greater than about 25%, greater than about 30%, greater than about 35%, greater than about 40%, greater than about 45%, greater than about 50%, greater than about 55%, greater than about 60%, greater than about 65%, greater than about 70%, greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90% or greater than about 95% in the methylation (e.g., level, percentage and/or fraction) of the one or more genomic loci in a sample obtained from a subject compared to a control can be indicative that the subject has ovarian cancer or is at risk of developing ovarian cancer. In certain embodiments, the difference can be a decrease in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. Alternatively, the difference can be an increase in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. In certain embodiments, the difference can be a decrease in the methylation of a genomic locus and an increase in the methylation of a different genomic locus in the sample obtained from the subject. In certain embodiments, a decrease in the level of methylation of one or more genomic loci in the sample and the increase in the level of methylation of one or more different genomic loci in the sample can indicate the presence of ovarian cancer.
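As a non-limiting illustration, the comparison of per-locus methylation levels in a sample against a reference can be sketched as follows. This is a minimal sketch under stated assumptions: methylation levels are expressed as fractions between 0 and 1, the threshold is interpreted as an absolute difference in that fraction (the text above does not fix whether the difference is absolute or relative), and the function name, locus labels, and default threshold are hypothetical.

```python
def classify_loci(sample: dict, reference: dict, min_difference: float = 0.05) -> dict:
    """Label each shared locus as an increase, a decrease, or unchanged.

    sample, reference: mapping of locus label -> methylation fraction in [0, 1].
    min_difference: assumed absolute-difference threshold (0.05 = 5 points).
    """
    calls = {}
    for locus, level in sample.items():
        if locus not in reference:
            continue  # no reference value, so no comparison is possible
        delta = level - reference[locus]
        if delta > min_difference:
            calls[locus] = "increase"
        elif delta < -min_difference:
            calls[locus] = "decrease"
        else:
            calls[locus] = "unchanged"
    return calls


sample = {"locus_a": 0.80, "locus_b": 0.10, "locus_c": 0.52}
reference = {"locus_a": 0.40, "locus_b": 0.60, "locus_c": 0.50}
calls = classify_loci(sample, reference)
```

In this toy input, one locus is labeled as an increase and a different locus as a decrease within the same sample, which corresponds to the mixed pattern described above. The labels alone are not a diagnosis; how such calls are combined is left to the methods described elsewhere herein.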
In certain embodiments, diagnosis of a subject with ovarian cancer can be based on the methylated or unmethylated state of a genomic locus, e.g., a CpG site. In certain embodiments, a genomic locus, e.g., a CpG site, in a sample from a subject diagnosed with ovarian cancer can be methylated and the genomic locus, e.g., the CpG site, in a reference sample can be unmethylated. In certain embodiments, a genomic locus in a sample from a subject diagnosed with ovarian cancer can be unmethylated and the genomic locus in a reference sample can be methylated.
Diagnostic, Prognostic, Classification, and Monitoring Methods Using an Algorithm of this Example
This Example also provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci by using an algorithm as described, for example, in Example Embodiment L. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring ovarian cancer in a subject that includes analyzing the methylation status of certain genomic loci and/or genomic fractions.
Methods for Treating Ovarian Cancer
This Example also provides methods for treating a subject having ovarian cancer. For example, a mammal (e.g., a human female) that was identified as having ovarian cancer as described herein (or identified as being at risk of developing ovarian cancer as described herein) can be administered one or more chemotherapies, one or more immunotherapies, or a combination thereof to treat ovarian cancer. Examples of chemotherapies that can be used as described herein to treat ovarian cancer include, without limitation, carboplatin, cisplatin, paclitaxel, and docetaxel. Examples of immunotherapies that can be used as described herein to treat ovarian cancer include, without limitation, bevacizumab and durvalumab. In some cases, a mammal (e.g., a human female) that was identified as having ovarian cancer as described herein (or identified as being at risk of developing ovarian cancer as described herein) can be treated using a surgical procedure to treat ovarian cancer. Examples of surgical procedures that can be used as described herein to treat ovarian cancer include, without limitation, surgeries to remove one or both ovaries with or without removing the uterus.
In some cases, the information provided by the methods described herein can be used by a clinician or physician in determining the most effective course of treatment (e.g., preventative or therapeutic) for the subject. A course of treatment refers to the measures taken for a patient after the prognosis or the assessment of increased risk for development of ovarian cancer is made. For example, when a subject is identified to have an increased risk of developing ovarian cancer, the physician can determine whether frequent monitoring of DNA methylation changes can be performed as a prophylactic measure. Also, when a subject is diagnosed with ovarian cancer (e.g., based on the presence of a DNA methylation pattern in a sample from a subject), it can be advantageous to follow such detection with a therapeutic treatment.
In some cases, this Example provides methods for assessing the efficacy of a therapeutic or prophylactic therapy for treating ovarian cancer in a subject, comprising determining the methylation status of one or more genomic loci present in a sample obtained from a subject prior to the therapy and determining methylation status of the one or more genomic loci present in a sample obtained from the subject at one or more time points during the therapeutic or prophylactic therapy, wherein the therapy is efficacious for treating ovarian cancer in a subject when there is a change in the presence and/or level of methylation of the one or more genomic loci in the second or subsequent samples, relative to the first sample. In certain embodiments, the first sample is obtained after therapeutic treatment has begun.
In certain embodiments, the methods for monitoring the response in a subject to prophylactic or therapeutic treatment of ovarian cancer can include measuring the methylation status and/or level of one or more genomic loci in a sample of a subject at a first time-point, administering a therapeutic agent, re-measuring the methylation status and/or level of the one or more genomic loci at a second time-point, comparing the results of the first and second measurements and optionally modifying the treatment regimen based on the comparison. In certain embodiments, the first time-point can be prior to an administration of the therapeutic agent, and the second time-point can be after said administration of the therapeutic agent. In certain embodiments, the first time-point can be prior to the administration of the therapeutic agent to the subject for the first time. In certain embodiments, the dose (defined as the quantity of therapeutic agent administered at any one administration) can be increased or decreased in response to the comparison. In certain embodiments, the dosing interval (defined as the time between successive administrations) can be increased or decreased in response to the comparison, including total discontinuation of treatment. In addition, the methods described herein can be used to determine the efficacy of the therapeutic treatment, wherein a change in the methylation status of certain genomic loci present in a sample of a subject can indicate that the therapeutic treatment regimen can be altered, reduced, and/or stopped.
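As a non-limiting illustration, the comparison of serially obtained measurements against a first time-point can be sketched as follows. This is a minimal sketch only: the function name, the time-point labels, and the threshold for treating a locus as changed are hypothetical, and the sketch reports changes without encoding any decision about dose or dosing interval.

```python
def change_from_baseline(series: list, min_change: float = 0.05) -> list:
    """Report, for each later time-point, the loci whose methylation changed.

    series: list of (label, {locus: methylation fraction}) in time order;
            the first entry is treated as the first time-point (baseline).
    min_change: assumed absolute change needed to report a locus.
    """
    _, baseline = series[0]
    report = []
    for label, profile in series[1:]:
        shared = sorted(set(baseline) & set(profile))
        changed = {
            locus: round(profile[locus] - baseline[locus], 6)
            for locus in shared
            if abs(profile[locus] - baseline[locus]) > min_change
        }
        report.append((label, changed))
    return report


series = [
    ("pre_treatment", {"x": 0.9, "y": 0.2}),
    ("week_4", {"x": 0.7, "y": 0.21}),
    ("week_8", {"x": 0.5, "y": 0.4}),
]
report = change_from_baseline(series)
```

Each entry of the report pairs a later time-point with the signed change, relative to the first time-point, for every locus that moved by more than the threshold.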
Assays of this Example
This Example also provides assays and/or methods for determining the DNA methylation status and/or level of genomic loci that correlates with the presence, absence, and/or severity of ovarian cancer. In some cases, the assay method can include comparing the methylation status and/or level of genomic loci present in a sample from a subject that has ovarian cancer to the methylation status and/or level of genomic loci in a sample from a healthy subject to determine the methylation pattern, as described above, that correlates with the presence of ovarian cancer. In some cases, the assay methods can include comparing the methylation status and/or level of genomic loci in a sample from a subject that has ovarian cancer at an early stage to the methylation status and/or level of genomic loci in a sample from a subject that has ovarian cancer at a late stage to determine the methylation status and/or level that correlates with the different stages and/or severity of ovarian cancer.
DNA Isolation Techniques of this Example
In certain embodiments, the methods described herein can include isolating nucleic acid from a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) obtained from a subject. Any appropriate technique can be used to isolate nucleic acids from a sample. For example, isolation of DNA from a plasma sample can be performed by extraction methods using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (see, for example, J. Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Additional non-limiting examples include salting out DNA extraction (see, for example, P. Sunnucks et al., Genetics, 1996, 144:747-756; and S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25:4692-4693), the trimethylammonium bromide salts DNA extraction method (see, for example, S. Gustincich et al., BioTechniques, 1991, 11:298-302), and the guanidinium thiocyanate DNA extraction method (see, for example, J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). There are also numerous commercially available kits that can be used to extract DNA from biological fluids (e.g., plasma samples) or cells. For example, Qiagen's Gentra PureGene Cell Kit, QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, DNeasy Blood and Tissue Kit or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.) and GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.) can be used to obtain DNA from a sample from a subject.
Methylation Detection Techniques of this Example
Various methylation analysis procedures are known in the art, and can be used with the methods described herein. These assays allow for determination of the methylation state of one genomic locus, e.g., one or more CpG sites or islands within a nucleic acid obtained from a sample. In addition, the methods can be used to quantify the methylation of a genomic locus. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), digital PCR and use of methylation-sensitive restriction enzymes.
In certain embodiments, methylation-specific PCR can be used to determine the methylation status of a genomic locus. Methylation-specific PCR is based on a chemical reaction of sodium bisulfite with DNA that converts unmethylated cytosines, e.g., of CpG dinucleotides, to uracil or UpG, followed by traditional PCR. Methylated cytosines will not be converted in this process, and primers can be designed to overlap the methylation site, e.g., CpG site, of interest, thereby allowing one to determine the methylation status of the methylation site as methylated or unmethylated. Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA may be used, e.g., by using the method described by Sadri & Hornsby (Nucl. Acids Res. 1996; 24:5058-5059) or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 1997; 25:2532-2534).
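As a non-limiting illustration, the effect of bisulfite treatment on a single DNA strand can be modeled in silico as follows. This is a minimal sketch only: the function names are hypothetical, only one strand is modeled, conversion is assumed to be complete, and methylated positions are supplied as 0-based indices.

```python
def bisulfite_convert(seq: str, methylated_positions=frozenset()) -> str:
    """Convert unmethylated C to U; methylated C (0-based positions) stays C."""
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("U")  # unmethylated cytosine is deaminated to uracil
        else:
            converted.append(base)  # methylated cytosine and other bases unchanged
    return "".join(converted)


def after_pcr(converted: str) -> str:
    """Uracil is copied as thymine during amplification."""
    return converted.replace("U", "T")


template = "ACGTCG"                          # CpG cytosines at positions 1 and 4
treated = bisulfite_convert(template, {1})   # only position 1 is methylated
amplified = after_pcr(treated)
```

After amplification, the methylated CpG still reads as CG while the unmethylated CpG reads as TG, which is the sequence difference that primers overlapping the site can distinguish.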
In certain embodiments, whole genome bisulfite sequencing, which is a high-throughput genome-wide analysis of DNA methylation, can be used to determine the methylation status of multiple genomic loci. It is based on sodium bisulfite conversion of genomic DNA, as described above, which is then sequenced on a next-generation sequencing platform. The sequences obtained are then re-aligned to the reference genome to determine the methylation states of cytosines, e.g., of CpG dinucleotides, present within the analyzed genomic loci based on mismatches resulting from the conversion of unmethylated cytosines into uracil.
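As a non-limiting illustration, the determination of methylation states from mismatches between aligned bisulfite reads and the reference can be sketched as follows. This is a minimal sketch under simplifying assumptions: reads are already aligned to the forward strand without gaps, each read is given as a 0-based start position and a sequence, and the function names are hypothetical. Dedicated bisulfite aligners handle both strands, indels, and quality filtering, none of which is modeled here.

```python
from collections import defaultdict


def call_cpg_methylation(reference: str, reads) -> dict:
    """Count methylated and unmethylated observations at each reference CpG.

    reads: iterable of (start, read_sequence) pairs, gapless forward-strand alignments.
    Returns {cpg_position: (methylated_count, unmethylated_count)}.
    """
    ref = reference.upper()
    counts = defaultdict(lambda: [0, 0])
    for start, read in reads:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            if ref[pos:pos + 2] != "CG":
                continue  # only cytosines in a CpG context are scored
            if base == "C":
                counts[pos][0] += 1  # C retained: cytosine was protected (methylated)
            elif base == "T":
                counts[pos][1] += 1  # C read as T: cytosine was converted (unmethylated)
    return {pos: tuple(c) for pos, c in sorted(counts.items())}


def methylation_fraction(methylated: int, unmethylated: int):
    """Fraction of informative reads supporting methylation, or None if no coverage."""
    total = methylated + unmethylated
    return methylated / total if total else None


reference = "ACGTCGA"  # CpG cytosines at positions 1 and 4
reads = [(0, "ACGTTGA"), (0, "ACGTCGA"), (1, "TGTTG")]
calls = call_cpg_methylation(reference, reads)
```

The per-site counts can then be summarized as a fraction with methylation_fraction, giving one form of the methylation level discussed above.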
In certain embodiments, genome-wide DNA methylation profiling can be performed using commercially-available arrays, thereby allowing the interrogation of multiple genomic loci, e.g., multiple CpG sites. Non-limiting examples of such arrays include HumanMethylation BeadChips (Illumina, San Diego, Calif.) and Infinium MethylationEPIC kit (Illumina). Additional methods for analyzing the methylation state of multiple genomic loci are provided in Yong et al., Epigenetics & Chromatin 2016; 9:26, which is incorporated by reference herein.
Kits of this Example
This Example provides kits for diagnosing, monitoring, classifying, and/or treating a subject with ovarian cancer. The kits described herein can comprise a means for determining and/or detecting the methylation status of one or more genomic loci.
Kits of this Example can include, without limitation, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays, which further contain one or more probes, primers, or other detection reagents for determining the methylation state and/or level of one or more genomic loci. For example, a kit described herein can include one or more probes or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the one or more genomic loci comprise a CpG site. In certain embodiments, one or more of the genomic loci do not comprise a CpG site. For example, about 5% or more, about 10% or more, about 15% or more, about 20% or more, about 25% or more, about 30% or more, about 35% or more, about 40% or more, about 45% or more, about 50% or more, about 55% or more, about 60% or more, about 65% or more or about 70% or more of the one or more genomic loci detected by the primers or probes can comprise one or more CpG sites.
In certain non-limiting embodiments, a primer and/or probe described herein can be at least about 10 nucleotides or at least about 15 nucleotides or at least about 20 nucleotides in length, and/or up to about 200 nucleotides or up to about 150 nucleotides or up to about 100 nucleotides or up to about 75 nucleotides or up to about 50 nucleotides in length.
In a further non-limiting embodiment, the oligonucleotide primers and/or probes can be immobilized on a solid surface or support, for example, on a nucleic acid microarray, wherein the position of each oligonucleotide primer and/or probe bound to the solid surface or support is known and identifiable.
In certain non-limiting embodiments, the kits described herein can additionally include other components such as a buffer, enzymes such as DNA polymerases or ligases, nucleotides such as deoxynucleotide triphosphates, positive control sequences, and/or negative control sequences necessary to carry out an assay or reaction to detect the methylation state of a genomic locus.
In certain embodiments, the kits described herein can include a container comprising one or more probes and/or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the kits further include instructions for use, e.g., the instructions can describe that a particular methylation status of a genomic locus is indicative of ovarian cancer in a subject. The instructions can be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card or folder supplied in or with the container.
Reports, Programmed Computers, and Systems of this Example
In certain embodiments, a diagnosis and/or monitoring of ovarian cancer in a subject based on the methylation status of one or more genomic loci as described herein can be referred to herein as a “report.” A tangible report can optionally be generated as part of a testing process (which can be interchangeably referred to herein as “reporting,” or as “providing” a report, “producing” a report or “generating” a report).
Examples of tangible reports can include, without limitation, reports in paper (such as computer-generated printouts of test results) or equivalent formats and reports stored on computer readable medium (such as a CD, USB flash drive or other removable storage device, computer hard drive, or computer network server, etc.). Reports, particularly those stored on computer readable medium, can be part of a database, which can optionally be accessible via the internet (such as a database of patient records or genetic information stored on a computer network server, which can be a “secure database” that has security features that limit access to the report, such as to allow only the patient and the patient's medical practitioners to view the report while preventing other unauthorized individuals from viewing the report, for example). In addition to, or as an alternative to, generating a tangible report, reports can also be displayed on a computer screen (or the display of another electronic device or instrument).
A report can include, for example, an individual's medical history, or can just include size, presence, absence, or levels of one or more markers (for example, a report on computer readable medium such as a network server can include hyperlink(s) to one or more journal publications or websites that describe the medical/biological implications). Thus, for example, the report can include information of medical/biological significance as well as optionally also including information regarding the methylation status of relevant genomic loci, or the report can just include information regarding the methylation status of relevant genomic loci without other medical/biological significance.
A report can further be “transmitted” or “communicated” (these terms can be used herein interchangeably), such as to the individual who was tested, a medical practitioner (e.g., a doctor, nurse, clinical laboratory practitioner, genetic counselor, etc.), a healthcare organization, a clinical laboratory, and/or any other party or requester intended to view or possess the report. The act of “transmitting” or “communicating” a report can be by any means known in the art, based on the format of the report. Furthermore, “transmitting” or “communicating” a report can include delivering a report (“pushing”) and/or retrieving (“pulling”) a report. For example, reports can be transmitted/communicated by various means, including being physically transferred between parties (such as for reports in paper format) such as by being physically delivered from one party to another, or by being transmitted electronically or in signal form (e.g., via e-mail or over the internet, by facsimile, and/or by any wired or wireless communication methods known in the art) such as by being retrieved from a database stored on a computer network server, etc.
In certain embodiments, the disclosed subject matter provides computers (or other apparatus/devices such as biomedical devices or laboratory instrumentation) programmed to carry out the methods described herein, e.g., to perform the algorithm of this Example (see Example Embodiment L). In certain embodiments, the system can be controlled by the individual and/or their medical practitioner in that the individual and/or their medical practitioner requests the test, receives the test results back and (optionally) acts on the test results to reduce the individual's ovarian cancer risk or treat the individual, such as by implementing an ovarian cancer management system.
Example Embodiment K of Example 6: Non-Invasive Molecular Phenotyping of Human Plasma for Ovarian Cancer
An epigenomic analysis was performed using liquid biopsy of plasma DNA samples obtained from human ovarian cancer patients (n=9) and human healthy controls (n=11). Bisulfite-converted DNA libraries underwent targeted capture using the SeqCap Epi CpGiant Enrichment Kit (Roche, Pleasanton, Calif.) and were sequenced to a read depth of ˜50×. This approach targets 80.5 Mb of the human genome and ˜5.5 million individual CpG sites. The Bismark Bisulfite Read Mapper 34 was used to align the libraries against the GRCh38 reference genome and determine the methylation status for each CpG site.
These analyses demonstrated the ability to generate high resolution epigenomic liquid biopsy data in cell free DNA samples obtained from plasma. Examples of ovarian cancer-specific differentially methylated loci are shown in Table 18. The list of CpG sites represents examples of specific DNA methylation differences that may be detected in the relevant biological samples listed herein, demonstrating that DNA methylation differences such as those listed and many others exist and can be used to differentiate samples as described herein, for example, as described using procedures similar to those set forth in Example Embodiment L.
Methylation density plots of data are shown in
Provided below is an algorithm that can be used to diagnose a subject with ovarian cancer. The presently disclosed subject matter provides that the methylome(s) of ovaries of a mammal, or structures therein, could be affected by certain abnormalities (e.g., ovarian cancer), and that the changes of these methylomes can lead to changes in the methylation patterns of the DNA fragments found in plasma, which are released by ovary tissues. An algorithm was developed to identify the changes of methylation patterns in the methylome of plasma caused by ovary phenotypes. One insight behind this algorithm was that the methylome of the DNA fragments in plasma is a mixture of a variety of component methylomes of ovarian and other origins, and that the proportion of these different component methylomes in the mixture varies from subject to subject, even among the population with normal ovary phenotype. By constructing a model of plasma methylome as a linear combination of various component methylomes of ovary and other origins, the algorithm can accurately predict the methylation patterns of a new plasma sample under the hypothesis that it is from a normal individual. Consequently, the algorithm has high sensitivity for detecting abnormal methylation patterns in a plasma sample caused by changes of the methylomes of some ovarian/fallopian or other relevant tissue when the sample is from an affected individual.
The procedure can be applied with little modification to the diagnosis and phenotyping of ovarian cancer using other types of biopsy samples, such as cervical swabs, urine and peritoneal fluid, provided that the DNA fragments from the tissues affected by ovarian cancer can be found in those biopsy samples.
Let $i$ be any CpG site in the human genome, $z_{i,j}$ be the methylation level of CpG site $i$ in a plasma sample $j$, $p_{i,r,j}$ be the proportion of the $r$th component methylome $m_{r,j}$ of ovarian/fallopian or other relevant tissue origin in plasma sample $j$ at site $i$, and $m_{i,r,j}$ be the methylation level of CpG site $i$ in methylome $m_{r,j}$. The hypothesis is:
$$z_{i,j} = \sum_{r=1}^{R} p_{i,r,j}\, m_{i,r,j} \qquad (1)$$
where $p_{i,r,j} \ge 0$, $0 \le m_{i,r,j} \le 1$, and $p_{i,1,j} + \dots + p_{i,R,j} = 1$.
It is further assumed that there is a set of CpG sites $S$ such that, for any CpG site $i$ in $S$ and any plasma sample $j$ from a normal individual, $m_{i,r,j} = m_{i,r}$ and $p_{i,r,j} = p_{r,j}$.
That is, it is assumed that in any plasma sample from a normal individual, the proportions of different component methylomes in the mixture are the same for all CpG sites in $S$. It is also assumed that, by restricting to the set of CpG sites $S$, plasma samples from all normal individuals have the same set of component methylomes. They are called restricted reference component methylomes (RRCM), and are labeled $m_1^S, \dots, m_R^S$, or simply $m_1, \dots, m_R$ when there is no confusion. For any plasma sample $j$ from a normal individual, its methylome restricted to the set of CpG sites in $S$ can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^S$ be the methylome of plasma sample $j$ restricted to $S$; then, for some mixture vector $p_j = [p_{1,j}, \dots, p_{R,j}]^T$, it has:
$$z_j^S = [m_1^S, \dots, m_R^S]\, p_j \qquad (2)$$
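As a minimal sketch of equation (2), the snippet below builds a restricted plasma methylome as a weighted average of reference component methylomes. The function name `mix_methylomes` and the toy methylation values are illustrative assumptions, not data from this disclosure.

```python
def mix_methylomes(components, mixture):
    """Return z = [m_1, ..., m_R] p for the CpG sites in S.

    components: list of R lists; each holds the methylation levels (0..1)
                of one reference component methylome at the same sites.
    mixture:    list of R non-negative proportions that sum to 1.
    """
    if len(components) != len(mixture):
        raise ValueError("need one proportion per component methylome")
    if any(p < 0 for p in mixture) or abs(sum(mixture) - 1.0) > 1e-9:
        raise ValueError("proportions must be non-negative and sum to 1")
    n_sites = len(components[0])
    return [
        sum(p * m[i] for p, m in zip(mixture, components))
        for i in range(n_sites)
    ]


# Toy example (hypothetical values): two components, three CpG sites.
leukocyte = [0.90, 0.10, 0.50]
ovarian = [0.20, 0.80, 0.50]
plasma = mix_methylomes([leukocyte, ovarian], [0.75, 0.25])
```

Sites where the components agree (the third site here) carry no information about the mixture vector, which is one reason the later steps select sites that differ between cell types.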
Finally, it is assumed that the set $S$ is the union of two disjoint subsets $C$ and $T$, where $T$ is a union of $K$ non-empty sets $T_k$ such that $T = \bigcup_{k=1}^{K} T_k$, where the index $k$ represents the $k$th type of abnormal tissue phenotype. The sets $T_k$ do not need to be disjoint. Moreover, $T_k$ itself is the union of two disjoint sets $D_k$ and $V_k$. Either $D_k$ or $V_k$ could be empty, but not both. It is assumed that for any plasma sample, including one from an abnormal individual, when restricted to CpG sites in $C$, its methylome can always be expressed as a weighted average of the restricted reference component methylomes. That is, it has: $z_j^C = [m_1^C, \dots, m_R^C]\, p_j$ regardless of whether $j$ is from an abnormal individual. $C$ is called the set of reference CpG sites. On the other hand, for a plasma sample $l$ from an abnormal individual, when restricted to CpG sites in $S = C \cup T$, its methylome can no longer be expressed as a weighted average of the restricted reference component methylomes. That is, it has: $w_l^S \ne [m_1^S, \dots, m_R^S]\, p_l$ for any mixture vector $p_l$. More specifically, for a plasma sample $l$ from an individual with the $k$th type of abnormal phenotype, it has: 1) $w_l^C = [m_1^C, \dots, m_R^C]\, p_l$, 2) if $D_k$ is non-empty wl
T is called the target set of CpG sites, Dk is called the differential methylation target set, Vk is called the copy number variation target set, and Tk is called the target set for the kth type of abnormal phenotype.
The main steps of the algorithm of this Example are:
- 1) Identify the set of reference CpG sites $C$ and the target sets $T_1, \dots, T_K$ for the list of $K$ types of abnormal individuals.
- 2) Estimate the restricted reference component methylomes $m_1, \dots, m_R$, or $R$ predictor methylomes $n_1, \dots, n_R$ that are independent linear combinations of the reference component methylomes such that $n_r = [m_1, \dots, m_R]\, q_r$ for $R$ linearly independent mixture vectors $q_1, \dots, q_R$.
- 3) (Optional) If the reference component methylomes are available, estimate the proportions of these components at the reference CpG sites $C$ for the test plasma samples.
- 4) Predict the methylation level of the test plasma samples at the target set $T_k$ of CpG sites, under the hypothesis that the sample is from a normal individual.
- 5) Compare the predicted methylation levels at $D_k$ and $V_k$ against the observed methylation levels, and reject the null hypothesis that a test sample is from a normal individual if the observed methylation levels are significantly different from the predicted levels.
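Steps 4 and 5 above can be sketched as follows, assuming the mixture vector has already been estimated at the reference sites. The text does not specify the statistical test, so the root-mean-square z-score, the binomial standard error, and the `threshold` value used here are assumptions chosen only for illustration; all function names are hypothetical.

```python
import math


def predict_levels(components, mixture):
    """Step 4: predicted level at each target site under the normal hypothesis."""
    n_sites = len(components[0])
    return [
        sum(p * m[i] for p, m in zip(mixture, components))
        for i in range(n_sites)
    ]


def deviation_score(observed, predicted, coverage):
    """Step 5 (assumed test): root-mean-square z-score of observed vs. predicted.

    Each residual is scaled by a binomial standard error,
    sqrt(pred * (1 - pred) / coverage); this scaling is an assumption.
    """
    total = 0.0
    for obs, pred, n in zip(observed, predicted, coverage):
        var = max(pred * (1.0 - pred), 1e-6) / n  # floor avoids division by zero
        total += (obs - pred) ** 2 / var
    return math.sqrt(total / len(observed))


def reject_normal(observed, components, mixture, coverage, threshold=3.0):
    """Reject the null hypothesis when the deviation score exceeds the threshold."""
    predicted = predict_levels(components, mixture)
    return deviation_score(observed, predicted, coverage) > threshold


# Hypothetical reference methylomes at three target sites and a 80/20 mixture.
x_target = [0.90, 0.10, 0.80]
y_target = [0.20, 0.70, 0.30]
mixture = [0.8, 0.2]
depth = [50, 50, 50]

normal_like = reject_normal([0.75, 0.24, 0.69], [x_target, y_target], mixture, depth)
abnormal_like = reject_normal([0.40, 0.60, 0.35], [x_target, y_target], mixture, depth)
```

In practice the threshold would be calibrated on normal plasma samples so that the test holds a chosen type I error rate.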
The algorithm of this Example can be implemented in a variety of ways. For example, given the methyl-seq data for a set of plasma samples from normal individuals, the presently disclosed EM algorithm or the data augmentation method can be applied to estimate the component methylomes, and the maximum likelihood method can then be used to estimate the proportion of these component methylomes in the test sample. Below are exemplary simple implementations of the presently disclosed algorithm that use linear regression.
In the simplest implementation of the algorithm of this Example, it is assumed the restricted methylome of a plasma sample from a normal individual can be approximated by a mixture of two restricted reference methylomes. It is further assumed that the estimations of these two reference component methylomes are available. For example, in the implementation below, for the genomic loci of interest, the plasma methylome is approximated by the mixture of leukocyte and ovarian/fallopian or other relevant tissue methylomes. The implementation of the algorithm includes the following steps:
1. Identify the Reference Set C, and the Target Sets T1, . . . , TK.
- 1.1 Collect the methylation data for a set of leukocyte samples, a set of ovarian/fallopian or other relevant tissue/cell samples, and a set of plasma samples, all from normal individuals. For each type of abnormal individual, collect a set of leukocyte-derived samples, a set of ovarian/fallopian or other relevant tissue/cell samples, and a set of plasma samples from individuals of that type. All these samples should be matched for age, race, and other relevant parameters. These are the training data.
- 1.2 Let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a normal leukocyte-derived sample $j$, and $y_{i,l}$ the observed methylation level of CpG site $i$ in a normal ovarian/fallopian or other relevant tissue/cell sample $l$. Let $s_{x,i}^2$ be the sample variance of $x_{i,j}$ over all normal leukocyte-derived samples, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all normal ovarian/fallopian or other relevant tissue/cell samples. Identify the set of CpG sites $S_0$ such that for any $i \in S_0$, it has both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$. These are CpG sites with stable methylation levels in each type of normal cells.
- 1.3 Let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a leukocyte-derived sample $j$, including normal and abnormal, and $y_{i,l}$ the observed methylation level of CpG site $i$ in an ovarian/fallopian or other relevant tissue/cell sample $l$, including normal and abnormal. Let $s_{x,i}^2$ be the sample variance of $x_{i,j}$ over all leukocyte-derived samples, including normal and abnormal, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all ovarian/fallopian or other relevant tissue/cell samples, including normal and abnormal. Identify the set of CpG sites $S_1$ such that for any $i \in S_1$: (a) it has both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$; (b) the statistical test for the difference between {$x_{i,j_0}$: $j_0$ is a normal leukocyte-derived sample} and {$x_{i,j_k}$: $j_k$ is an abnormal leukocyte-derived sample of type $k$} is not significant for all abnormal types of leukocyte-derived samples; and (c) the statistical test for the difference between {$y_{i,l_0}$: $l_0$ is a normal ovarian/fallopian or other relevant tissue/cell sample} and {$y_{i,l_k}$: $l_k$ is an abnormal ovarian/fallopian or other relevant tissue/cell sample of type $k$} is not significant for all abnormal types of ovarian/fallopian or other relevant tissue/cell samples. These are CpG sites with stable methylation levels in each type of cells, and with no difference in methylation level between normal and any abnormal samples. Let $x_i$ be the sample mean of $x_{i,j}$ over all leukocyte-derived samples, including normal and abnormal, and $y_i$ the sample mean of $y_{i,l}$ over all ovarian/fallopian or other relevant tissue/cell samples, including normal and abnormal. Identify the subset $C_0$ of $S_1$ such that for any $i \in C_0$, it has $|x_i - y_i| > c_1$ for some constant $c_1$. These are CpG sites that are stably methylated in each cell type, with no difference between the normal and abnormal samples of the same cell type, and differentially methylated between different types of cells.
- 1.4 Let $x^{C_0}$ be the vector of $x_i$ for all $i \in C_0$, and $y^{C_0}$ be the vector of $y_i$ for all $i \in C_0$, where $x_i$ is the mean methylation at site $i$ in all leukocyte-derived samples and $y_i$ the mean methylation at site $i$ in all ovarian/fallopian or other relevant tissue/cell samples. Note that by the way the set $C_0$ is selected, there is no difference in the methylation level of any CpG site in $C_0$ between normal and abnormal leukocyte-derived samples, or between normal and abnormal ovarian/fallopian or other relevant tissue/cell samples. Let $z_j^{C_0}$ be the observed methylation levels of CpG sites in $C_0$ for a plasma sample $j$ of the $k$th abnormal type. (For convenience, a normal plasma sample is called a sample of the 0th abnormal type.) For each sample $j$ belonging to the $k$th abnormal type, regress $z_j^{C_0}$ against $x^{C_0}$ and $y^{C_0}$, with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1, and get the residual $e_j^{C_0}$. Identify the subset $C_0^k$ of $C_0$ such that for any CpG site $i$ in $C_0^k$, it has
and $e_{i,k}^2 < c_3$ for some constants $c_2$ and $c_3$, where $e_{i,k}^2$ is the mean of the squared difference between estimated and observed methylation levels of CpG site $i$ in all plasma samples of the $k$th abnormal type, and $s_{i,k}^2$ the sample variance of methylation levels of CpG site $i$ in the same set of plasma samples. Repeat the above procedure for each type of abnormal plasma sample; the intersection of the subsets, $C = \bigcap_{k=0}^{K} C_0^k$, is the reference set of CpG sites. These are CpG sites whose methylation levels in both normal and any type of abnormal plasma samples can be accurately predicted by the reference component methylomes from normal individuals.
- 1.5 Let $T_0 = S_0 \setminus S_1$. Let $x^C$ and $x^{T_0}$ be the vectors of $x_i$ and $x_h$ for all $i \in C$ and $h \in T_0$ respectively, and $y^C$ and $y^{T_0}$ be the vectors of $y_i$ and $y_h$ for all $i \in C$ and $h \in T_0$ respectively, where $x_i$, $x_h$, $y_i$, and $y_h$ are the mean methylation levels of normal leukocyte-derived or ovarian/fallopian or other relevant tissue/cell samples at sites $i$ and $h$ respectively. Let $z_j^C$ and $z_j^{T_0}$ be the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a normal plasma sample $j$, $w_{l_k}^C$ and $w_{l_k}^{T_0}$ the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a plasma sample $l_k$ from an individual with the $k$th type of abnormality, and $w_{l_g}^C$ and $w_{l_g}^{T_0}$ the observed methylation levels of CpG sites in $C$ and $T_0$ respectively for a plasma sample $l_g$ from an individual with the $g$th type of abnormality, where $g \ne k$. For each $j$, $l_k$, and $l_g$, regress $z_j^C$, $w_{l_k}^C$, and $w_{l_g}^C$ respectively against $x^C$ and $y^C$, with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1. Apply the fitted models respectively to $x^{T_0}$ and $y^{T_0}$ to predict $z_j^{T_0}$, $w_{l_k}^{T_0}$, and $w_{l_g}^{T_0}$ respectively, and get the differences $e_j^{T_0}$, $e_{l_k}^{T_0}$, and $e_{l_g}^{T_0}$ between the predicted values and observed values. Let $e_i$, $e_{i,k}$, and $e_{i,g}$ be the means of the sets of differences {$e_j^{T_0}$: $j$ is a normal plasma sample}, {$e_{l_k}^{T_0}$: $l_k$ is a plasma sample of the $k$th abnormal type}, and {$e_{l_g}^{T_0}$: $l_g$ is a plasma sample of the $g$th abnormal type} for CpG site $i$ respectively. Identify the subset $T_k$ of $T_0$ such that for any $i \in T_k$, it has $|e_i| < c_{2,0}$, $|e_{i,k}| > c_{2,k}$, and $|e_{i,k} - e_{i,g}| > c_{3,k}$, for some constants $c_{2,0}$, $c_{2,k}$, and $c_{3,k}$, for all $g \ne k$. $T_k$ is the target set for the $k$th type of abnormal individual. These are the sites where the methylation of a normal plasma sample can be accurately predicted, the observed methylation in a plasma sample of the $k$th abnormal type will deviate from the prediction, and the deviation will be different from that of a plasma sample of any other abnormal type.
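The variance and mean-difference filters used in steps 1.2 and 1.3 can be sketched as below. This is a simplified illustration: it omits the normal-versus-abnormal significance tests of step 1.3, and the function names, site identifiers, thresholds, and methylation values are all hypothetical.

```python
from statistics import mean, variance


def stable_sites(x, y, c0):
    """Sites whose sample variance is below c0 in both sample sets.

    x, y: dict mapping a CpG site id to a list of methylation levels,
          one value per sample (at least two samples per site).
    """
    return {
        site
        for site in x
        if site in y and variance(x[site]) < c0 and variance(y[site]) < c0
    }


def differential_sites(x, y, sites, c1):
    """Subset of `sites` whose mean levels differ between cell types by more than c1."""
    return {site for site in sites if abs(mean(x[site]) - mean(y[site])) > c1}


# Hypothetical training data: three sites, three samples per cell type.
leuko = {
    "cg1": [0.90, 0.92, 0.88],
    "cg2": [0.10, 0.60, 0.30],   # unstable in leukocytes
    "cg3": [0.50, 0.52, 0.49],
}
tissue = {
    "cg1": [0.20, 0.22, 0.18],
    "cg2": [0.15, 0.12, 0.14],
    "cg3": [0.51, 0.50, 0.53],   # stable, but not cell-type specific
}

stable = stable_sites(leuko, tissue, c0=0.01)
candidates = differential_sites(leuko, tissue, stable, c1=0.3)
```

Only sites that are both stable within each cell type and clearly different between cell types survive, which is what makes them informative for estimating the mixture fractions.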
2. Estimate Fraction of the New Plasma Samples to be Tested
Recall that $x^C$ and $y^C$ are the mean vectors of the methylation levels of the training leukocyte-derived data and the training ovarian/fallopian or other relevant tissue/cell data for the CpG sites in the reference set $C$. For any new plasma sample $t$ to be tested, let $z_t^C$ be the observed methylation levels of CpG sites in $C$. Regress $z_t^C$ against $x^C$ and $y^C$, with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1. The estimated coefficients are the estimated fractions of the component methylomes for the plasma sample $t$.
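For the two-component case, the constrained regression described above has a simple closed form, sketched below. With a zero intercept and coefficients that sum to 1, only one coefficient is free, so the fit is a one-dimensional least-squares problem whose solution is clipped to [0, 1]. The function name `estimate_fraction` and the example values are assumptions for illustration.

```python
def estimate_fraction(z, x, y):
    """Constrained least squares for z ≈ a*x + (1 - a)*y with 0 <= a <= 1.

    z: observed plasma methylation levels at the reference sites.
    x, y: mean reference methylation levels (e.g. leukocyte and tissue)
          at the same sites.
    Returns (a, 1 - a), the estimated fractions of the two components.
    """
    num = sum((zi - yi) * (xi - yi) for zi, xi, yi in zip(z, x, y))
    den = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    if den == 0.0:
        raise ValueError("reference methylomes are identical at all sites")
    # The objective is a convex quadratic in a, so clipping the
    # unconstrained minimizer to [0, 1] gives the constrained optimum.
    a = min(1.0, max(0.0, num / den))
    return a, 1.0 - a


# Hypothetical reference vectors and a plasma sample mixed at 80/20.
x_ref = [0.90, 0.10, 0.80, 0.30]
y_ref = [0.20, 0.70, 0.30, 0.60]
z_obs = [0.8 * xi + 0.2 * yi for xi, yi in zip(x_ref, y_ref)]
leuko_frac, tissue_frac = estimate_fraction(z_obs, x_ref, y_ref)
```

With more than two component methylomes the same constraints call for a general constrained least-squares solver rather than this closed form.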
3. Test if the New Plasma Samples are from the kth Type of Abnormal Individual.
For the new plasma sample t, let xT
Other ways of implementing the algorithm of this Example can be developed by modifying the simple implementation presented above. Specifically, it does not need to assume that there are only two component reference methylomes that make up the plasma methylomes, nor does it need to approximate them by mixtures of the component methylomes. Instead, a set of predictor methylomes can be collected that are themselves mixtures of component reference methylomes, as long as the number of the predictor methylomes is the same as the number of the reference component methylomes, and the mixture vectors of the predictor methylomes are linearly independent. For example, they can be methylomes of plasma samples with known, different proportions of leukocyte-derived and ovarian/fallopian or other relevant tissue/cell DNAs.
In the algorithm of this Example, the difference between the observed methylation levels in certain target regions and the predicted methylation levels is used as the test statistic to determine whether the methylome in a plasma sample has been affected by ovarian cancer. To illustrate the advantage of this approach, it is assumed that the mixture vector $p_j$ for the methylome of a normal plasma sample $j$ follows a Dirichlet distribution with parameters $\alpha_1 = \dots = \alpha_R$. Furthermore, for CpG site $i$, its methylation levels in the $R$ reference component methylomes are $m_{i,r} = (r-1)/(R-1)$. It can be shown that the methylation level of $i$ in sample $j$ then has a mean of 0.5, and a variance of
If there is a methyl-seq library of sample j with a coverage of N for CpG site i, the variance of the measured methylation level zi,j is
In other words, if $z_{i,j}$ is used as a test statistic to detect and phenotype ovarian cancer using a plasma sample, under the null hypothesis, the test statistic has a variance of $\sigma_1^2$. However, in the algorithm of this Example, the mixture vector $p_j$ is first estimated, and $z_{i,j}$ is then predicted by $\sum_r m_{i,r}\, p_{r,j}$. Note that in methyl-seq data, millions of CpG sites can be contained in each library, and that the variance of the coefficients in a linear regression model is inversely proportional to sample size. Thus it is possible to get a highly accurate estimate of the mixture vector $p_j$, even if it is taken into account that adjacent CpG sites tend to have correlated methylation levels. Assuming an accurate estimate of $\sum_r m_{i,r}\, p_{r,j}$ can be obtained, that is, the error of the estimation can be ignored, the variance of the difference $z_{i,j} - \sum_r m_{i,r}\, p_{r,j}$ between the observed methylation level and the prediction will be
In other words, under the null hypothesis, the test statistic $z_{i,j} - \sum_r m_{i,r}\, p_{r,j}$ used in the presently disclosed algorithm has a much smaller variance than the other candidate test statistic $z_{i,j}$. This in turn means that the presently disclosed test will achieve a higher power at the same level of type I error.
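The variance expressions referred to above are not legible in this text, so what follows is a hedged reconstruction under stated assumptions rather than the original formulas. Assume $p_j \sim \mathrm{Dirichlet}(\alpha, \dots, \alpha)$ with a common parameter $\alpha$, $m_{i,r} = (r-1)/(R-1)$, and methylated read counts at site $i$ drawn binomially with coverage $N$ (the binomial read model is an assumption). A direct calculation then gives a true-level variance of $\sigma_0^2 = (R+1) / \big(12 (R-1)(R\alpha+1)\big)$, a variance of $\sigma_0^2 + (0.25 - \sigma_0^2)/N$ for the measured level, and a variance of $(0.25 - \sigma_0^2)/N$ for the difference between the measured level and an exact prediction. The simulation below checks these expressions numerically; the function names and parameter values are illustrative.

```python
import random
from statistics import pvariance


def mixture_variance(R, alpha):
    """Variance of sum_r m_r p_r for p ~ Dirichlet(alpha, ..., alpha), m_r = (r-1)/(R-1)."""
    return (R + 1) / (12.0 * (R - 1) * (R * alpha + 1))


def simulate(R=4, alpha=2.0, N=50, n_samples=20000, seed=0):
    """Return (variance of measured level, variance of measured minus predicted)."""
    rng = random.Random(seed)
    m = [r / (R - 1) for r in range(R)]  # component levels 0, 1/(R-1), ..., 1
    raw, resid = [], []
    for _ in range(n_samples):
        # A Dirichlet draw is a vector of normalized gamma variates.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(R)]
        total = sum(g)
        z_true = sum(mr * gr / total for mr, gr in zip(m, g))
        # Binomial sampling of N reads (assumed read model).
        reads = sum(rng.random() < z_true for _ in range(N))
        z_obs = reads / N
        raw.append(z_obs)
        resid.append(z_obs - z_true)
    return pvariance(raw), pvariance(resid)


sigma0_sq = mixture_variance(4, 2.0)
raw_var, resid_var = simulate()
expected_raw = sigma0_sq + (0.25 - sigma0_sq) / 50
expected_resid = (0.25 - sigma0_sq) / 50
```

Under these assumptions the residual statistic removes the between-sample mixture variance $\sigma_0^2$ entirely, leaving only read-sampling noise, which is consistent with the claim that it has the smaller variance under the null hypothesis.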
Examples of Embodiments of Example 6
F1. A method for diagnosing, prognosing, classifying, and/or monitoring ovarian cancer in a mammal, comprising:
(a) obtaining a sample from the mammal;
(b) determining the methylation status and/or level of one or more genomic loci in the sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; and
(d) diagnosing ovarian cancer in the mammal,
wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or the set of predicted values indicates the presence of ovarian cancer in the mammal.
F2. The method of embodiment F1, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the presence of ovarian cancer in the mammal or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the presence of ovarian cancer in the mammal.
F3. The method of embodiment F1, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the presence of ovarian cancer in the mammal.
F4. A method of treating ovarian cancer in a mammal, comprising:
(a) obtaining a sample from the mammal;
(b) determining the methylation status and/or level of one or more genomic loci present in the sample;
(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values;
(d) diagnosing ovarian cancer in the mammal, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or the set of predicted values indicates the presence of ovarian cancer in the mammal; and
(e) administering a chemotherapy, an immunotherapy, or both to said mammal.
F5. The method of any one of embodiments F1-F4, wherein the reference is the methylation status and/or level of the one or more genomic loci in a sample obtained from a mammal that does not have ovarian cancer.
F6. The method of any one of embodiments F1-F5, wherein said sample is a plasma sample.
F7. A method of treating ovarian cancer comprising:
(a) measuring the methylation status and/or level of one or more genomic loci present in a sample from a mammal prior to a treatment of ovarian cancer;
(b) measuring the methylation status and/or level of one or more genomic loci present in a sample from the mammal during the treatment of ovarian cancer; and
(c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of ovarian cancer indicates the mammal is responsive to the treatment.
F8. The method of embodiment F7, further comprising (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of ovarian cancer indicates the mammal is not responsive to the treatment.
F9. The method of embodiment F7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.
F10. The method of embodiment F7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.
F11. The method of embodiment F7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.
F12. The method of embodiment F7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.
F13. The method of any one of embodiments F7-F12, wherein said sample is a plasma sample.
F14. The method of any one of embodiments F1-F13, wherein the one or more genomic loci comprise one or more CpG sites.
F15. The method of any one of embodiments F1-F14, wherein the one or more genomic loci are present within nucleic acids isolated from the sample.
F16. The method of any one of embodiments F1-F15, wherein the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.
F17. A method of treating ovarian cancer, comprising:
(a) diagnosing ovarian cancer in a mammal by utilization of the algorithm disclosed in Example Embodiment L; and
(b) administering a chemotherapy, an immunotherapy, or both to said mammal to treat said ovarian cancer.
F18. The method of any one of embodiments F1-F17, wherein said mammal is a human.
F19. A kit for diagnosing, prognosing, and/or monitoring ovarian cancer in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci.
F20. The kit of embodiment F19, wherein the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.
Example 6 provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer. For example, algorithms, kits, and methods for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer are provided.
Other Embodiments
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
Claims
1. A computer-implemented method, comprising:
- obtaining, by a computing system, initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from a plurality of different tissues of the person;
- filtering, by the computing system, the initial sequence data to generate filtered sequence data that describes sequences of a filtered subset of nucleic acids from the biological sample, wherein the filtering includes (i) selecting target nucleic acids from the initial set of nucleic acids based on at least one of a methylation characteristic or a copy number characteristic of the target nucleic acids and (ii) enriching the target nucleic acids in the filtered subset;
- determining, by the computing system, a methylation profile for the filtered subset of nucleic acids from the biological sample;
- processing, by the computing system, the methylation profile for the filtered subset of nucleic acids to determine a likelihood that the person has a specified medical condition; and
- outputting, by the computing system, an indication of the likelihood that the person has the specified medical condition.
2. The computer-implemented method of claim 1, further comprising identifying a pre-defined set of genomic regions;
- wherein selecting target nucleic acids from the initial set of nucleic acids comprises comparing nucleic acid sequences from the initial set of nucleic acids to sequences from the pre-defined set of genomic regions; and
- wherein enriching the target nucleic acids in the filtered subset comprises discarding nucleic acid sequences from the initial sequence data that are not among the sequences from the pre-defined set of genomic regions, while retaining nucleic acid sequences from the initial sequence data that are among the sequences from the pre-defined set of genomic regions.
3. The computer-implemented method of claim 2, wherein at least a first subset of the pre-defined set of genomic regions are defined based on the regions in the first subset exhibiting a minimum level of stability with respect to at least one of the methylation characteristic or the copy number characteristic in a population of individuals.
4. The computer-implemented method of claim 3, wherein at least a second subset of the pre-defined set of genomic regions are defined based on the regions in the second subset exhibiting at least a minimum difference with respect to the methylation characteristic or the copy number characteristic between individuals who have the specified medical condition and individuals who do not have the specified medical condition.
5. The computer-implemented method of claim 1, wherein the biological sample comprises plasma, and the initial set of nucleic acids comprises cell-free DNA in the plasma.
6. The computer-implemented method of claim 1, further comprising:
- identifying a set of restricted reference component methylomes in the initial set or filtered subset of nucleic acids;
- identifying a set of reference component methylomes;
- determining a proportion of the reference component methylomes at a reference set of CpG sites in the initial set or filtered subset of nucleic acids;
- generating predictions of methylation levels at a target set of CpG sites in the initial set or filtered subset of nucleic acids;
- comparing the predictions of methylation levels at the target set of CpG sites to observed methylation levels; and
- determining whether the person likely has or does not have the specified medical condition based on the comparison.
7. The computer-implemented method of claim 1, wherein the biological sample comprises a stool sample or cerebrospinal fluid.
8. The computer-implemented method of claim 1, further comprising:
- determining, by the computing system, a copy number profile for the filtered subset of nucleic acids from the biological sample; and
- processing, by the computing system, the copy number profile along with the methylation profile for the filtered subset of nucleic acids to determine the likelihood that the person has the specified medical condition.
9. The computer-implemented method of claim 1, wherein the initial set of nucleic acids were treated to facilitate detection of methylated sites before sequencing.
10. The computer-implemented method of claim 1, wherein the specified medical condition is ovarian cancer, endometriosis, necrotizing enterocolitis, fetal aneuploidy, preeclampsia, or a brain condition.
11. The computer-implemented method of claim 1, wherein the methylation profile for the filtered subset of nucleic acids indicates, for each of a plurality of genomic loci, a methylation level of the locus.
12. The computer-implemented method of claim 1, wherein the genomic loci is a CpG site, CpG island, differentially methylated region (DMR), promoter region, enhancer region, or CpG island shore.
13. The computer-implemented method of claim 1, wherein determining the likelihood that the person has the specified medical condition comprises determining a probability that the person has the specified medical condition.
14. The computer-implemented method of claim 1, wherein determining the likelihood that the person has the specified medical condition comprises generating a binary indication that the person either likely has the specified medical condition or likely does not have the specified medical condition.
15. The computer-implemented method of claim 1, wherein processing the methylation profile comprises providing data representing the methylation profile as input to a machine-learning model, and obtaining the likelihood, or a value from which the likelihood is derived, as an output of the machine-learning model.
16. The computer-implemented method of claim 15, wherein the machine-learning model comprises at least one of a classifier, an artificial neural network, a support vector machine, a decision tree, or a regression model.
17. The computer-implemented method of claim 15, wherein the machine-learning model defines reference or predicted methylation profiles against which the methylation profile for the filtered subset are compared to determine the likelihood that the person has the specified medical condition.
18. The computer-implemented method of claim 1, wherein the determined likelihood that the person has the specified medical condition is used by a medical provider to assess whether to perform additional diagnostic testing on the person.
19. The computer-implemented method of claim 1, wherein the determined likelihood that the person has the specified medical condition is used by a medical provider to at least one of diagnose the person or treat the person for the specified medical condition.
20. The computer-implemented method of claim 1, wherein outputting the indication of the likelihood that the person has the specified medical condition comprises at least one of presenting the indication on an electronic display, audibly playing the indication through a speaker, storing the indication in a memory of a computing system for subsequent retrieval, or transmitting the indication in an electronic message to one or more users.
21. The computer-implemented method of claim 1, wherein enriching the target nucleic acids in the filtered subset comprises generating the filtered subset so that a fraction of the target nucleic acids that occur in the filtered subset is greater than a fraction of the target nucleic acids that occur in the initial set of nucleic acids.
22. The computer-implemented method of claim 1, wherein the filtered subset consists exclusively of the target nucleic acids.
23. The computer-implemented method of claim 1, wherein the filtered subset comprises the target nucleic acids and non-targeted nucleic acids.
24-25. (canceled)
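By way of non-limiting illustration, the enrichment condition recited in claim 21 and the binary indication recited in claim 14 can be sketched in a few lines of Python. This is a hypothetical sketch and not the claimed implementation: the function names, the representation of each nucleic acid as a boolean "is target" flag, and the 0.5 cutoff are assumptions introduced only for the example.

```python
from typing import Sequence


def target_fraction(is_target: Sequence[bool]) -> float:
    """Fraction of nucleic acids in a set that are flagged as target."""
    if not is_target:
        return 0.0
    return sum(is_target) / len(is_target)


def is_enriched(initial: Sequence[bool], filtered: Sequence[bool]) -> bool:
    """True when targets make up a larger share of the filtered subset
    than of the initial set (the condition described in claim 21)."""
    return target_fraction(filtered) > target_fraction(initial)


def binary_indication(probability: float, cutoff: float = 0.5) -> bool:
    """Collapse a probability into a likely / not-likely indication.
    The cutoff value is an assumption, not taken from the claims."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return probability >= cutoff
```

Under this representation, a filtered subset that consists exclusively of target nucleic acids (claim 22) has a target fraction of 1.0, while a subset that also retains non-targeted nucleic acids (claim 23) has a fraction below 1.0 and can still satisfy the enrichment condition.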
26. A computer-implemented method, comprising:
- obtaining, by a computing system, initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from a plurality of different tissues of the person;
- filtering, by the computing system, the initial sequence data to identify a first subset of sequences from the initial sequence data that correspond to a first pre-defined set of genomic regions;
- filtering, by the computing system, the initial sequence data to identify a second subset of sequences from the initial sequence data that correspond to a second pre-defined set of genomic regions;
- processing, by the computing system, data that includes an observed methylation profile of the first subset of sequences to generate a predicted methylation profile of the second subset of sequences;
- comparing, by the computing system, an observed methylation profile of the second subset of sequences to the predicted methylation profile of the second subset of sequences to determine whether the person has a specified medical condition, wherein the person is deemed to have the specified medical condition if a difference between the observed methylation profile of the second subset of sequences and the predicted methylation profile of the second subset of sequences meets a minimum difference criterion; and
- outputting, by the computing system, an indication of whether the person was determined to have the specified medical condition.
27. The computer-implemented method of claim 26, wherein:
- the first pre-defined set of genomic regions are regions that exhibit a minimum level of stability with respect to at least one of a methylation characteristic or a copy number characteristic in a population of individuals; and
- the second pre-defined set of genomic regions are regions that exhibit a minimum difference with respect to at least one of the methylation characteristic or the copy number characteristic between a first sub-population of individuals who have the specified medical condition and a second sub-population of individuals who do not have the specified medical condition.
28. The computer-implemented method of claim 26, wherein the first pre-defined set of genomic regions is a first reference set of genomic regions, and the second pre-defined set of genomic regions is a first target set of genomic regions;
- the method further comprising: selecting the first reference set of genomic regions as the first pre-defined set of genomic regions from a database that includes a plurality of reference sets of genomic regions, wherein different ones of the plurality of reference sets of genomic regions correspond to different medical conditions; and selecting the first target set of genomic regions as the second pre-defined set of genomic regions from the database, wherein the database further includes a plurality of target sets of genomic regions, wherein different ones of the plurality of target sets of genomic regions correspond to different medical conditions.
29. The computer-implemented method of claim 26, wherein the specified medical condition is preeclampsia, endometriosis, ovarian cancer, necrotizing enterocolitis, or a brain condition.
30-31. (canceled)
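By way of non-limiting illustration, the steps of claim 26 can be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: the `Read` record, the half-open region tuples, the linear predictor in `predict_target_profile` (with its `slope` and `intercept` parameters), and the use of a mean absolute difference as the minimum difference criterion are all hypothetical choices made for the example. The claims do not specify a particular predictive model or difference measure.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open interval


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int
    end: int
    methylated_cpgs: int  # CpG sites called methylated on this read
    total_cpgs: int       # CpG sites covered by this read


def overlaps(read: Read, region: Region) -> bool:
    chrom, start, end = region
    return read.chrom == chrom and read.start < end and start < read.end


def filter_reads(reads: Sequence[Read], regions: Sequence[Region]) -> List[Read]:
    """Keep only reads that overlap at least one pre-defined region."""
    return [r for r in reads if any(overlaps(r, g) for g in regions)]


def methylation_profile(reads: Sequence[Read],
                        regions: Sequence[Region]) -> Dict[Region, float]:
    """Per-region methylation level: methylated CpGs / covered CpGs."""
    profile: Dict[Region, float] = {}
    for region in regions:
        hits = [r for r in reads if overlaps(r, region)]
        covered = sum(r.total_cpgs for r in hits)
        if covered:
            profile[region] = sum(r.methylated_cpgs for r in hits) / covered
    return profile


def predict_target_profile(reference_profile: Dict[Region, float],
                           target_regions: Sequence[Region],
                           slope: float = 1.0,
                           intercept: float = 0.0) -> Dict[Region, float]:
    """Hypothetical predictor: a linear function of the mean reference level."""
    baseline = mean(reference_profile.values())
    return {region: intercept + slope * baseline for region in target_regions}


def has_condition(observed: Dict[Region, float],
                  predicted: Dict[Region, float],
                  min_difference: float) -> bool:
    """Assumed criterion: mean absolute difference over shared regions."""
    shared = observed.keys() & predicted.keys()
    if not shared:
        raise ValueError("no target regions with observed and predicted levels")
    return mean(abs(observed[r] - predicted[r]) for r in shared) >= min_difference


# Example with made-up reads and regions.
reference_regions = [("chr1", 0, 100)]
target_regions = [("chr2", 0, 100)]
reads = [Read("chr1", 10, 60, 5, 10),
         Read("chr2", 20, 70, 9, 10),
         Read("chr3", 0, 50, 1, 10)]
first_subset = filter_reads(reads, reference_regions)
second_subset = filter_reads(reads, target_regions)
predicted = predict_target_profile(
    methylation_profile(first_subset, reference_regions), target_regions)
observed = methylation_profile(second_subset, target_regions)
indication = has_condition(observed, predicted, min_difference=0.2)
```

In this sketch the reference regions play the role of the first pre-defined set of genomic regions and the target regions play the role of the second; in practice both sets could be looked up per medical condition, as described in claim 28.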
32. A computer-implemented method, comprising:
- obtaining, by a computing system, initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from a plurality of different tissues of the person;
- filtering, by the computing system, the initial sequence data to identify a target subset of sequences from the initial sequence data that correspond to a pre-defined set of genomic regions;
- comparing, by the computing system, an observed methylation profile of the target subset of sequences to a pre-defined methylation profile to determine whether the person has a specified medical condition, wherein the person is deemed to have the specified medical condition if a difference between the observed methylation profile of the target subset of sequences and the pre-defined methylation profile meets a minimum difference criterion; and
- outputting, by the computing system, an indication of whether the person was determined to have the specified medical condition.
33-34. (canceled)
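By way of non-limiting illustration, the comparison of claim 32 against a pre-defined methylation profile can be sketched as follows. This is a hypothetical sketch and not the claimed implementation: the per-CpG call tuples, the function names, and the use of the largest per-region absolute difference as the minimum difference criterion are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

CpgCall = Tuple[str, int, bool]  # (chromosome, position, methylated?)
Region = Tuple[str, int, int]    # (chromosome, start, end), half-open interval


def filter_calls(calls: Sequence[CpgCall],
                 regions: Sequence[Region]) -> List[CpgCall]:
    """Keep only CpG calls that fall inside a pre-defined region."""
    return [
        call for call in calls
        if any(call[0] == chrom and start <= call[1] < end
               for chrom, start, end in regions)
    ]


def observed_profile(calls: Sequence[CpgCall],
                     regions: Sequence[Region]) -> Dict[Region, float]:
    """Fraction of methylated CpG calls in each region that has coverage."""
    profile: Dict[Region, float] = {}
    for region in regions:
        chrom, start, end = region
        flags = [m for c, pos, m in calls if c == chrom and start <= pos < end]
        if flags:
            profile[region] = sum(flags) / len(flags)
    return profile


def meets_difference_criterion(observed: Dict[Region, float],
                               predefined: Dict[Region, float],
                               min_difference: float) -> bool:
    """Assumed criterion: largest per-region absolute difference."""
    shared = observed.keys() & predefined.keys()
    if not shared:
        raise ValueError("no regions in common")
    return max(abs(observed[r] - predefined[r]) for r in shared) >= min_difference
```

Unlike the sketch for claim 26, no prediction step is involved here: the observed profile of the target subset is compared directly with a profile fixed in advance.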
Type: Application
Filed: Apr 10, 2020
Publication Date: Aug 18, 2022
Inventors: David Gerard Peters (Pittsburgh, PA), Tianjiao Chu (Pittsburgh, PA), Lisa Ann Pan (Pittsburgh, PA), David N. Finegold (Pittsburgh, PA)
Application Number: 17/602,553