METHOD AND KIT FOR DETERMINING THE TISSUE OR CELL ORIGIN OF DNA

A method of detecting death of a cell type or tissue in a subject is disclosed. The method comprises determining whether cell-free DNA comprised in a fluid sample of the subject is derived from the cell type or tissue, wherein the determining is effected by ascertaining the methylation status of at least four methylation sites on a continuous sequence of the cell-free DNA, the sequence comprising no more than 300 nucleotides, wherein a methylation status of each of the at least four methylation sites on the continuous sequence of the DNA characteristic of the cell type or tissue is indicative of death of the cell type or tissue. Kits for detecting cell death are also disclosed.

Description
CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 17/528,699, filed Nov. 17, 2021, which is a division of U.S. application Ser. No. 15/303,762, filed Oct. 13, 2016, now U.S. Pat. No. 11,203,784, which is a National Phase of PCT Patent Application No. PCT/IL2015/050403 having International filing date of Apr. 14, 2015, which claims the benefit of priority under 35 USC § 119(e) of U.S. Provisional Patent Application No. 61/979,233 filed on Apr. 14, 2014. The contents of each of the above applications are all incorporated by reference to the instant application.

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

The contents of the electronic sequence listing (4005.xml; Size: 2,033,797 bytes; and Date of Creation: Feb. 14, 2023) are herein incorporated by reference in their entirety.

FIELD AND BACKGROUND OF THE INVENTION

The present invention, in some embodiments thereof, relates to a method of determining the source of cell-free DNA and use thereof for diagnosing pathological processes associated with cell death, monitoring therapeutic regimes such as drugs intended to change cell death and in studying for clinical and research purposes processes affecting cell death levels.

It has been known for decades that plasma contains small fragments of cell-free circulating DNA (cfDNA) derived from dead cells (on average 5000 genome equivalents per ml). While the mechanisms underlying the release and clearance of cfDNA remain obscure, the phenomenon is rapidly being exploited for a variety of applications with clinical relevance. The recognition that fragments of fetal DNA travel briefly in maternal circulation has opened the way for next generation sequencing (NGS)-based prenatal testing to identify fetal trisomies and other genetic aberrations, potentially replacing amniocentesis. In cancer biology, tumors are known to release DNA (including tumor-specific somatic mutations) into the circulation, providing means for liquid biopsies to monitor tumor dynamics and genomic evolution. In addition, cfDNA has been used to detect graft cell death after kidney, liver or heart transplantation, based on single nucleotide polymorphisms (SNPs) distinguishing the DNA of donor from that of recipients. In all these cases, genetic differences exist between the DNA sequence of the tissue of interest (fetus, tumor or graft) and that of the host, providing the basis for highly specific assays.

Blood levels of cfDNA are known to increase under multiple additional conditions such as traumatic brain injury, cardiovascular disease, sepsis and intensive exercise. However, in these cases the source of elevated cfDNA is unknown, greatly compromising the utility of cfDNA as a diagnostic or prognostic tool. For example, cfDNA could originate from parenchymal cells of the injured tissue, but also from dying inflammatory cells.

Despite having an identical nucleotide sequence, the DNA of each cell type in the body carries unique epigenetic marks correlating with its gene expression profile. In particular, DNA methylation, serving to repress nontranscribed genes, is a fundamental aspect of tissue identity. Methylation patterns are unique to each cell type, conserved among cells of the same type in the same individual and between individuals, and are highly stable under physiologic or pathologic conditions. Therefore, it may be possible to use the DNA methylation pattern of cfDNA to determine its tissue of origin and hence to infer cell death in the source organ.

Theoretically, such an approach could identify the rate of cell death in a tissue of interest, taking into account the total amount of cfDNA, the fraction derived from a tissue of interest, and the estimated half life of cfDNA (15-120 minutes). Note that since the approach relies on normal, stable markers of cell identity, it cannot identify the nature of the pathology (e.g. distinguishing cfDNA derived from dead tumor cells or dead wild type cells due to trauma or inflammation in the same tissue). The potential uses of a highly sensitive, minimally invasive assay of tissue specific cell death include early, precise diagnosis as well as monitoring response to therapy in both a clinical and drug-development setting.
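The calculation outlined in the preceding paragraph can be sketched numerically. This is a minimal illustration only: it assumes first-order clearance of cfDNA at steady state (an assumption not stated in the text), so that the release rate equals the tissue-derived concentration multiplied by ln 2 and divided by the half-life. The function names and the input values are hypothetical.

```python
import math


def tissue_cfdna(total_cfdna_ng_per_ml, tissue_fraction):
    """Concentration of cfDNA attributed to the tissue of interest (ng/ml)."""
    if not 0.0 <= tissue_fraction <= 1.0:
        raise ValueError("tissue_fraction must be between 0 and 1")
    return total_cfdna_ng_per_ml * tissue_fraction


def steady_state_release_rate(conc_ng_per_ml, half_life_min):
    """Release rate (ng/ml per minute) that balances first-order clearance.

    Assumption (not from the source): clearance is first order, so at steady
    state release = concentration * ln(2) / half-life.
    """
    if half_life_min <= 0:
        raise ValueError("half_life_min must be positive")
    return conc_ng_per_ml * math.log(2) / half_life_min


# Hypothetical inputs: 10 ng/ml total cfDNA, 2% derived from the tissue.
conc = tissue_cfdna(10.0, 0.02)
# Bracket the estimate with the two ends of the quoted half-life range.
for half_life in (15, 120):
    rate = steady_state_release_rate(conc, half_life)
    print(f"half-life {half_life} min: {rate:.5f} ng/ml per minute")
```

Because the half-life is only known to within a range (15-120 minutes), the sketch reports the rate at both ends of that range rather than a single value.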

A classic example of tissue-specific DNA methylation is provided by the insulin gene promoter, which is unmethylated in insulin-producing pancreatic β-cells and methylated elsewhere. Recent studies have identified unmethylated insulin promoter DNA in the circulation of newly diagnosed T1D patients as well as in islet graft recipients, likely reflecting both autoimmune and alloimmune destruction of β-cells (Akirav E. M. et al., Proceedings of the National Academy of Sciences of the United States of America, 108, 19018-19023 (2011); Lebastchi J. et al., Diabetes 62, 1676-1680 (2013); Husseiny M. I., PLoS One 9, e94591 (2014); and Herold K. C. et al., J Clin Invest. doi: 10.1172/JCI78142 (2015)).

Additional background art includes International PCT Publication No. WO2013131083, WO 2014138133 and WO201101728.

SUMMARY OF THE INVENTION

According to an aspect of some embodiments of the present invention there is provided a method of detecting death of a cell type or tissue in a subject comprising determining whether cell-free DNA comprised in a fluid sample of the subject is derived from the cell type or tissue, wherein the determining is effected by ascertaining the methylation status of at least four methylation sites on a continuous sequence of the cell-free DNA, the sequence comprising no more than 300 nucleotides, wherein a methylation status of each of the at least four methylation sites on the continuous sequence of the DNA characteristic of the cell type or tissue is indicative of death of the cell type or tissue.

According to an aspect of some embodiments of the present invention there is provided a method of identifying a methylation signature for a cell type or tissue of interest comprising identifying in the DNA of the cell type of interest a continuous sequence of no more than 300 nucleotides which comprises at least 4 methylation sites, wherein each of the sites is differentially methylated with respect to a second non-identical cell type or tissue, thereby identifying the methylation signature for the cell type or tissue of interest.

According to an aspect of some embodiments of the present invention there is provided a method of determining whether DNA is derived from a cell type or tissue of interest in a sample, the method comprising:

ascertaining the methylation status of at least four methylation sites on a continuous sequence of the DNA, the sequence comprising no more than 300 nucleotides, wherein a methylation status of each of the at least four methylation sites on the continuous sequence characteristic of the cell of interest, is indicative that the DNA is derived from the cell of interest.
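The determination described above, in which a DNA molecule is attributed to the cell of interest only when every one of the at least four sites carries the characteristic status, can be sketched as follows. This is an illustrative sketch, not the inventors' implementation: the data layout (a dictionary of per-site calls for each sequenced molecule), the function names and the CpG positions are all hypothetical.

```python
def matches_signature(read_calls, signature, min_sites=4):
    """Return True if every signature site on the molecule has the
    status characteristic of the cell of interest.

    read_calls: dict mapping CpG position -> True (methylated) or
                False (unmethylated) for one sequenced molecule.
    signature:  dict mapping CpG position -> characteristic status.
    """
    if len(signature) < min_sites:
        raise ValueError("signature must cover at least min_sites CpG sites")
    # A site missing from the read yields None, which never equals a status.
    return all(read_calls.get(pos) == status for pos, status in signature.items())


def fraction_from_cell_type(reads, signature):
    """Fraction of molecules whose sites all match the signature."""
    if not reads:
        return 0.0
    hits = sum(matches_signature(r, signature) for r in reads)
    return hits / len(reads)


# Hypothetical signature: six CpG sites, all unmethylated in the cell of interest.
signature = {12: False, 40: False, 77: False, 101: False, 133: False, 160: False}
reads = [
    dict(signature),                 # all six sites unmethylated: a match
    {**signature, 77: True},         # one site methylated: not a match
    {12: False, 40: False},          # sites missing: not a match
]
print(fraction_from_cell_type(reads, signature))
```

Requiring agreement at every site on the same molecule, rather than averaging each site across molecules, is what distinguishes this check from a single-CpG readout.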

According to an aspect of some embodiments of the present invention there is provided a kit for identifying the source of DNA in a sample comprising oligonucleotides which are capable of detecting the methylation status of at least four methylation sites in a nucleic acid sequence, the nucleic acid sequence being no longer than 300 base pairs and comprising at least four methylation sites which are differentially methylated in a first cell of interest with respect to a second cell which is non-identical to the first cell of interest.

According to an aspect of some embodiments of the present invention there is provided a kit for identifying the source of DNA in a sample comprising at least two oligonucleotides which are capable of amplifying a DNA having a nucleic acid sequence no longer than 300 base pairs, wherein the nucleic acid sequence comprises at least four methylation sites which are differentially methylated in a first cell of interest with respect to a second cell which is non-identical to the first cell of interest.

According to some embodiments of the invention, the methylation status is characteristic of a non-diseased cell type or tissue of interest.

According to some embodiments of the invention, the sequence comprises between 50-250 nucleotides.

According to some embodiments of the invention, when death of the cell type is associated with a pathological process, the method further comprises diagnosing the pathological process.

According to some embodiments of the invention, the DNA is cell-free DNA.

According to some embodiments of the invention, the DNA is cellular DNA.

According to some embodiments of the invention, the method further comprises lysing the cells of the cellular DNA prior to the determining.

According to some embodiments of the invention, the at least four methylation sites comprise at least five methylation sites.

According to some embodiments of the invention, the cell type of interest is comprised in a body fluid.

According to some embodiments of the invention, the sample comprises a body fluid.

According to some embodiments of the invention, the fluid is selected from the group consisting of blood, plasma, sperm, milk, urine, saliva and cerebrospinal fluid.

According to some embodiments of the invention, the ascertaining is effected using at least one methylation-dependent oligonucleotide.

According to some embodiments of the invention, the methylation-dependent oligonucleotide hybridizes to at least one of the four methylation sites, the site being methylated.

According to some embodiments of the invention, the methylation-dependent oligonucleotide hybridizes to at least one of the four methylation sites, the site being non-methylated.

According to some embodiments of the invention, the ascertaining is effected using a methylation-independent oligonucleotide.

According to some embodiments of the invention, the ascertaining is effected using at least two methylation-independent oligonucleotides.

According to some embodiments of the invention, the ascertaining is effected by:

(a) contacting the DNA in the sample with bisulfite to convert unmethylated cytosines of the DNA to uracils;
(b) amplifying the continuous sequence of DNA using oligonucleotides that hybridize to a nucleic acid sequence adjacent to the first and last of the at least four methylation sites on the continuous sequence of the DNA; and
(c) sequencing the continuous sequence of DNA.
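Steps (a) and (c) above can be illustrated in silico. The sketch below is a simplified model under stated assumptions: only the top strand is considered, the converted read is already aligned to the reference and has the same length, and uracil is read as thymine after amplification. The amplification of step (b) is not modeled. All function names and the example sequence are hypothetical.

```python
def cpg_positions(seq):
    """Indices of the cytosine in every CpG dinucleotide of seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by amplification (top strand).

    Unmethylated cytosines become uracil and are read as T after
    amplification; methylated cytosines are protected and remain C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted_read):
    """Call each CpG as methylated (True) or unmethylated (False)."""
    if len(reference) != len(converted_read):
        raise ValueError("read must be aligned to the reference, same length")
    calls = {}
    for pos in cpg_positions(reference):
        base = converted_read[pos].upper()
        if base == "C":
            calls[pos] = True
        elif base == "T":
            calls[pos] = False
        # Any other base (sequencing error or variant) is left uncalled.
    return calls


# Hypothetical 16-nt reference with CpGs at indices 1, 5, 9 and 13.
reference = "ACGTTCGACCGTACGA"
read = bisulfite_convert(reference, methylated_positions=[1, 9])
print(read)
print(call_methylation(reference, read))
```

The per-site calls returned by `call_methylation` are the kind of input from which the methylation status of each of the at least four sites on a single molecule can be read.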

According to some embodiments of the invention, the sample comprises cell-free DNA which is derived from a second cell which is non-identical to the cell type or tissue.

According to some embodiments of the invention, the method further comprises analyzing the ratio of the amount of cell-free DNA derived from the cell type or tissue to the amount of cell-free DNA derived from the second cell.

According to some embodiments of the invention, the method further comprises analyzing the ratio of the amount of cell-free DNA derived from the cell type or tissue to the total amount of cell-free DNA in the sample.

According to some embodiments of the invention, the cell type is selected from the group consisting of a pancreatic beta cell, a pancreatic exocrine cell, a hepatocyte, a brain cell, a lung cell, a uterus cell, a kidney cell, a breast cell, an adipocyte, a colon cell, a rectum cell, a cardiomyocyte, a skeletal muscle cell, a prostate cell and a thyroid cell.

According to some embodiments of the invention, the tissue is selected from the group consisting of pancreatic tissue, liver tissue, lung tissue, brain tissue, uterus tissue, renal tissue, breast tissue, fat, colon tissue, rectum tissue, heart tissue, skeletal muscle tissue, prostate tissue and thyroid tissue.

According to some embodiments of the invention, the sample is a blood sample.

According to some embodiments of the invention, the method further comprises quantitating the amount of cell-free DNA which is derived from the cell type or tissue.

According to some embodiments of the invention, the sequence comprises between 50-250 nucleotides.

According to some embodiments of the invention, the DNA is cell-free DNA.

According to some embodiments of the invention, the DNA sequence is comprised in a sequence set forth in any one of SEQ ID NOs: 1-1484.

According to some embodiments of the invention, the kit further comprises at least one agent for sequencing the DNA sequence.

According to some embodiments of the invention, the kit further comprises DNA having the nucleic acid sequence, wherein the DNA is derived from a known cell of interest.

According to some embodiments of the invention, the kit is for diagnosing a pathological process.

According to some embodiments of the invention, the kit is for monitoring a treatment for a pathological process.

According to some embodiments of the invention, the kit is for monitoring death of a cell type or tissue.

According to some embodiments of the invention, the kit further comprises bisulfite.

According to some embodiments of the invention, at least one of the at least two oligonucleotides encodes a bar-code sequence.

According to some embodiments of the invention, at least one of the at least two oligonucleotides is labeled with an identifiable moiety.

According to some embodiments of the invention, the at least one oligonucleotide is labeled with a detectable moiety.

According to some embodiments of the invention, the two oligonucleotides encode sequences which allow for attaching to a flow cell surface.

Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings and images. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.

In the drawings:

FIGS. 1A-D: beta cell-derived DNA in the circulation of type 1 diabetes (T1D) patients.

A, structure of the insulin gene promoter fragment used as marker. Black dots represent CpG sites, arrows mark positions of PCR primers.
B, methylation status of individual CpG sites in the insulin gene promoter in multiple tissues. Graph shows the percentage of unmethylated molecules in DNA from each tissue. The set of columns on the right describe the percentage of molecules in which all 6 CpG sites are unmethylated.
C, beta-cell derived DNA in the plasma of healthy controls (c) and recently diagnosed T1D patients. The fraction of fully unmethylated insulin promoter DNA molecules (reflective of the fraction of beta-cell derived DNA) was multiplied by the absolute level of cfDNA measured in each individual. Mann Whitney test for controls vs patients, p<0.0001.
D, beta-cell derived DNA in the circulation of long-time T1D patients sampled at the indicated time points after intrahepatic islet transplantation.

FIGS. 2A-C: identification of oligodendrocyte-derived cfDNA in multiple sclerosis.

A, methylation status of MBP3 and WM1 in multiple tissues. Note lack of methylation in total brain but not in neuron-rich cerebellum, suggesting that the unmethylated molecules are derived from glia (see also FIGS. 8A-E and 9A-E).
B, oligodendrocyte-derived DNA in the plasma of healthy individuals.
C, oligodendrocyte-derived DNA in the plasma of remitting and relapsing MS/NMO patients. Graph shows the cumulative values of unmethylated MBP3 and WM1 in each sample. Controls vs stable disease, p=0.6; controls vs relapsing disease, p<0.0001; stable vs relapsing disease, p<0.0001; controls vs all patients, p=0.021.

FIGS. 3A-D: identification of brain-derived cfDNA after brain damage.

A, methylation status of CpG sites at the CG09787504 locus (Brain1) in multiple tissues, as determined by deep sequencing. Bars represent the percentage of molecules in which all 9 CpGs of the locus are unmethylated.
B, brain-derived DNA in the plasma of 12 healthy volunteers, calculated by multiplying the fraction of fully unmethylated Brain1 molecules by the amount of cfDNA in each individual.
C, brain-derived DNA in the plasma of 10 patients after cardiac arrest. Each patient was sampled immediately after resuscitation (“acute”) and at subsequent time points. Labels mark patients that survived (“alive”) or died, and the cause of death (cerebral, cardiac or respiratory). Controls vs patients (all time points), p<0.0001.
D, brain-derived DNA in the plasma of 5 patients after traumatic brain injury, sampled at different days after admission to a neurotrauma unit. After one year, two patients remained with impaired neurological score, two patients recovered, and one patient did not survive, as indicated. Controls vs patients (all time points), p=0.005.

FIGS. 4A-C: identification of exocrine pancreas-derived cfDNA in patients with pancreatic cancer or pancreatitis.

A, methylation status of CpG clusters in the CUX2 and REG1A loci, in multiple tissues. While CUX2 appears to be unmethylated selectively in ducts, REG1A is unmethylated in both ducts and acinar cells, but also in ˜30% of colon cells.
B, levels of unmethylated CUX2 and REG1A DNA fragments in the plasma of healthy individuals.
C, levels of unmethylated exocrine pancreas markers in the plasma of patients with pancreatic cancer or chronic pancreatitis. The graph shows the intensity of signal from each marker for each patient, after reducing the background signal (the highest signal seen among healthy controls). Controls vs all cancer patients, p<0.0001; controls vs localized cancer, p<0.0001; controls vs metastatic disease, p<0.0001; localized vs metastatic cancer, p=0.047; controls vs pancreatitis, p<0.0001.

FIGS. 5A-B: flow chart of method to detect circulating DNA derived from a specific tissue.

A, procedure to identify tissue-specific methylation markers.
B, procedure to determine levels of tissue-specific DNA in plasma.

FIGS. 6A-B: methylation of the insulin gene promoter in the plasma of healthy volunteers and recently diagnosed T1D patients.

A, methylation status of individual CpG sites at the insulin gene promoter.
B, methylation status of an expanded window of 4-6 CpGs expressed as % unmethylated DNA, in the same patients as in panel A.

FIGS. 7A-E: methylation of the 3′ UTR of MBP3.

A, structure of the MBP3 3′ UTR fragment used as marker. Lollipops represent CpGs. Empty lollipop represents the CpG detected in the Illumina 450k array. Arrows mark positions of PCR primers.
B, methylation status of the individual CpG site at the MBP3 locus that is captured in the Illumina 450k array. Data from publicly available 450k arrays.
C, methylation status of individual CpG sites and an expanded window of multiple CpGs from the MBP3 locus, in multiple tissues, as determined by deep sequencing.
D, methylation of individual CpG sites from the MBP3 locus in the plasma of healthy controls and relapsing MS/NMO patients.
E, fraction of fully unmethylated MBP3 locus fragment in the plasma of healthy volunteers and MS/NMO patients (same patients as in panel D). Total unmethylated MBP3 locus DNA expressed in ng/ml plasma is shown in FIG. 2C.

FIGS. 8A-E: methylation of CG10809560 and adjacent CpG sites (the WM1 locus).

A, structure of the WM1 locus fragment used as marker. Lollipops represent CpGs. Empty lollipop represents the CpG detected in the Illumina 450k array. Arrows mark positions of PCR primers.
B, methylation status of WM1 in multiple tissues as recorded in publicly available Illumina 450k arrays.
C, methylation status of individual CpG sites and expanded window of multiple CpGs from the WM1 locus, in multiple tissues, as determined by deep sequencing.
D, methylation of individual CpG sites from the WM1 locus in the plasma of healthy controls and relapsing MS/NMO patients.
E, fraction of fully unmethylated WM1 DNA fragments in the plasma of healthy volunteers and MS/NMO patients (same patients as in panel D).

FIGS. 9A-E: methylation of brain marker CG09787504 (Brain1) and adjacent CpG sites.

A, structure of Brain1 locus fragment used as marker. Lollipops represent CpGs. Empty lollipop represents the CpG detected in the Illumina 450k array. Arrows mark positions of PCR primers.
B, methylation status of Brain1 in multiple tissues as recorded in publicly available Illumina 450k arrays.
C, methylation status of individual CpG sites and expanded window of multiple CpGs from the Brain1 locus, in multiple tissues, as determined by deep sequencing.
D, methylation of individual CpG sites from the Brain1 locus in the plasma of healthy controls and patients after cardiac arrest.
E, fraction of fully unmethylated Brain1 DNA fragments in the plasma of healthy volunteers and patients after cardiac arrest (same patients as in panel D).

FIGS. 10A-E: methylation of the CpG cluster near the REG1A gene.

A, structure of the REG1A fragment used as marker. Lollipops represent CpGs. Empty lollipop represents the CpG detected in the Illumina 450k array. Arrows mark positions of PCR primers.
B, methylation status of the individual CpG site in the REG1A locus that is captured in the Illumina 450k array. Data from publicly available 450k arrays.
C, methylation status of individual CpG sites and expanded window of multiple CpGs from the REG1A locus in multiple tissues, as determined by deep sequencing.
D, methylation of individual CpG sites from the REG1A locus in the plasma of healthy controls and patients with pancreatic cancer.
E, fraction of fully unmethylated REG1A fragment in the plasma of healthy volunteers and patients with pancreatic cancer (same patients as in panel D).

FIGS. 11A-E: methylation of the CpG cluster near the CUX2 gene.

A, structure of the CUX2 fragment used as marker. Lollipops represent CpGs. Empty lollipop represents the CpG detected in the Illumina 450k array. Arrows mark positions of PCR primers.
B, methylation status of the individual CpG site at the CUX2 locus that is captured in the Illumina 450k array. Data from publicly available 450k arrays.
C, methylation status of individual CpG sites and expanded window of multiple CpGs from the CUX2 locus, in multiple tissues, as determined by deep sequencing.
D, methylation of individual CpG sites from the CUX2 locus in the plasma of healthy controls and patients with pancreatic cancer.
E, fraction of fully unmethylated CUX2 fragment in the plasma of healthy volunteers and patients with pancreatic cancer (same patients as in panel D).

FIG. 12 is a graph illustrating the existence of brain DNA fragments in plasma of some ALS patients.

FIG. 13 is a graph illustrating the existence of oligodendrocyte DNA (MBP; SEQ ID NO: 1248) and white matter DNA (WM1; SEQ ID NO: 1247) in plasma of ALS patients (ng/mL serum).

FIGS. 14A-D are graphs illustrating methylation markers for colon epithelial cells (FIG. 14A: sequence comprised in SEQ ID NO: 1258; FIG. 14B: sequence comprised in SEQ ID NO: 1259; FIG. 14C: sequence comprised in SEQ ID NO: 1260; and FIG. 14D: sequence comprised in SEQ ID NO: 1257).

FIG. 15 is a graph illustrating there are no colon DNA fragments in blood of healthy subjects (normal turnover to lumen), whilst there is a presence of colon DNA in blood of patients with colon cancer and Crohn's disease.

FIG. 16 is a graph illustrating the tissue distribution of unmethylated SFTP/A1 (the sequence comprised in SEQ ID NO: 1268), a lung marker in various tissues.

FIG. 17 is a graph illustrating the tissue distribution of unmethylated SFTP/C (the sequence comprised in SEQ ID NO: 1273), a lung marker in various tissues.

FIG. 18 is a graph illustrating the tissue distribution of unmethylated CHST (the sequence comprised in SEQ ID NO: 1272), a lung marker in various tissues.

FIG. 19 is a graph illustrating the tissue distribution of unmethylated RAB4 (the sequence comprised in SEQ ID NO: 1271), a lung marker in various tissues.

FIGS. 20A-C are graphs illustrating the tissue distribution of three skeletal muscle markers (TNNI2, the sequence comprised in SEQ ID NO: 1278; TPO, the sequence comprised in SEQ ID NO: 1277; and MAD1L1, the sequence comprised in SEQ ID NO: 1276), unmethylated in muscle and methylated elsewhere.

FIG. 21 is a graph illustrating the levels of skeletal muscle-derived DNA in plasma of healthy individuals after intensive exercise and in patients with Duchenne or Becker Muscular Dystrophy.

FIG. 22 is a graph illustrating the presence of a methylation marker of vascular endothelial cells (comprised in the sequence as set forth in SEQ ID NO: 1261 of the DCUN1D2 gene).

FIG. 23 is a graph illustrating the presence of a methylation marker of hepatocytes (comprised in the sequence as set forth in SEQ ID NO: 1267 of the ALB gene).

FIGS. 24A-B are graphs illustrating the presence of a methylation marker of lymphocytes (comprised in the sequence as set forth in SEQ ID NO: 1275 of the PTPRCAP gene and the sequence as set forth in SEQ ID NO: 1274 of the AGAP2 gene).

FIG. 25 is a graph illustrating the presence of a methylation marker of kidney (comprised in the sequence as set forth in SEQ ID NO: 1266 of the AQP2 gene).

FIGS. 26A-D are graphs illustrating the presence of methylation markers of adipocytes (comprised in the sequence as set forth in SEQ ID NO: 1262 of the ACOT7 gene, the sequence as set forth in SEQ ID NO: 1263 of the COL4A1 gene, the sequence as set forth in SEQ ID NO: 1264 of the FRMD4A gene, the sequence as set forth in SEQ ID NO: 1265 of the NNMT gene).

FIGS. 27A-D are graphs illustrating adipocyte-derived DNA in the circulation of healthy individuals (C = control).

DESCRIPTION OF SPECIFIC EMBODIMENTS OF THE INVENTION

The present invention, in some embodiments thereof, relates to a method of determining the source of cell-free DNA and use thereof for diagnosing pathological processes associated with cell death, monitoring therapeutic regimes such as drugs intended to change cell death and in studying for clinical and research purposes processes affecting cell death levels.

Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details set forth in the following description or exemplified by the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.

The analysis of circulating DNA is beginning to revolutionize prenatal diagnosis, tumor diagnosis and the monitoring of graft rejection. However, a major limitation of all these applications is the dependence on the presence of identifiable genetic differences between the tissue of interest and the host. The present inventors have conceived of a novel approach for detecting the tissue origins of cell-free circulating DNA which overcomes this limitation.

The present inventors propose analysis of tissue-specific methylation patterns (comprising 4 or more methylation sites) present in circulating DNA fragments to provide a robust tool to non-invasively detect cell death in essentially every tissue of the human body. The present methods show drastically reduced noise (level of signal in healthy individuals) compared with other approaches, to the degree that clinical utility can be envisaged.

Whilst reducing the present invention to practice, the present inventors detected circulating plasma DNA derived from specific human tissues in specific pathologies on the basis of analyzing such methylation patterns. Examples include the detection of circulating pancreatic beta cell DNA in type 1 diabetes (FIGS. 1A-D), exocrine pancreas DNA in pancreatic ductal adenocarcinoma and pancreatitis (FIGS. 4A-C), brain DNA after traumatic brain injury or an ischemic insult (FIGS. 3A-D), oligodendrocyte DNA in patients with relapsing multiple sclerosis (FIGS. 2A-C) and oligodendrocyte DNA and white matter DNA in patients with ALS (FIGS. 12 and 13). In addition, the present inventors detected colon DNA in blood of colon cancer and Crohn's disease patients, skeletal muscle DNA in blood of healthy individuals after exercise, adipocyte DNA in blood of healthy individuals, endothelial cell DNA in blood of cancer patients and oligodendrocyte DNA in blood of patients with glioblastoma multiforme.

The approach enables the detection, in a minimally invasive yet highly sensitive and specific manner, of acute cell death in normal and pathologic human tissues. The potential utility of this approach is very broad. In the study of normal physiology, this method can be used to monitor tissue dynamics during development and during physiologic perturbations such as dietary changes, pregnancy and aging. In a wide variety of pathologies, including but not limited to cancers, trauma, infections and autoimmune diseases, the method can be used for early diagnosis, monitoring disease progression and assessment of response to therapy. In the setting of new drug development, the method can be adapted to identify signals of efficacy or toxicity, potentially streamlining the long and expensive process of drug development.

Thus, according to one aspect of the present invention there is provided a method of identifying a methylation signature for a cell type or tissue of interest comprising identifying in the DNA of the cell type or tissue of interest a continuous sequence of no more than 300 nucleotides which comprises at least 4 methylation sites, wherein each of the sites is differentially methylated with respect to a second non-identical cell, thereby identifying the methylation signature for the cell type or tissue of interest.

The present invention contemplates identifying methylation signatures in any cell of interest, including but not limited to pancreatic cells (such as pancreatic beta cells and exocrine pancreatic cells (e.g. acinar cells)), brain cells, oligodendrocytes, cardiac cells (cardiomyocytes), liver cells (hepatocytes), kidney cells, vascular endothelial cells, lymphocytes, lung cells, uterus cells, breast cells, adipocytes, colon cells, rectum cells, prostate cells, thyroid cells and skeletal muscle cells.

As used herein, the term “methylation site” refers to a cytosine residue adjacent to a guanine residue (CpG site) that has a potential of being methylated.

The continuous sequence is preferably no longer than 300 nucleotides, 295 nucleotides, 290 nucleotides, 285 nucleotides, 280 nucleotides, 275 nucleotides, 270 nucleotides, 265 nucleotides, 260 nucleotides, 255 nucleotides, 250 nucleotides, 245 nucleotides, 240 nucleotides, 235 nucleotides, 230 nucleotides, 225 nucleotides, 220 nucleotides, 215 nucleotides, 210 nucleotides, 205 nucleotides, 200 nucleotides, 195 nucleotides, 190 nucleotides, 185 nucleotides, 180 nucleotides, 175 nucleotides, 170 nucleotides, 165 nucleotides, 160 nucleotides, 155 nucleotides, 150 nucleotides, 145 nucleotides, 140 nucleotides, 135 nucleotides, 130 nucleotides, 125 nucleotides, 120 nucleotides, 115 nucleotides, 110 nucleotides, 105 nucleotides, 100 nucleotides, 95 nucleotides, 90 nucleotides, 85 nucleotides, 80 nucleotides, 75 nucleotides, 70 nucleotides, 65 nucleotides, 60 nucleotides, 55 nucleotides, or 50 nucleotides.

According to a particular embodiment, the sequence is between 50-300 nucleotides, e.g. between 50-250, between 50-200, between 100-300 nucleotides, or between 100-250 nucleotides.

The sequence may be of a coding or non-coding region.

According to a particular embodiment, the sequence is not derived from a gene which is differentially expressed in the cell of interest. Thus, for example in the case of identifying a methylation pattern for a pancreatic beta cell, the DNA sequence may not be part of a gene encoding insulin or another pancreatic beta cell protein.

In accordance with another particular embodiment, the methylation pattern characterizes the normal cell of interest and is not a methylation pattern characterizing a diseased cell (is not for example a methylation pattern characterizing cancer cells of a specific type).

The continuous nucleic acid sequences comprise at least 4 methylation sites, although at least 5, at least 6, at least 7, at least 8, at least 9 or even at least 10 or more methylation sites are contemplated.

In order to be considered a methylation signature for a particular cell of interest, each of the at least four methylation sites has to be differentially methylated in that cell of interest with respect to a second non-identical cell.
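The criterion above (at least four differentially methylated sites within a continuous stretch of no more than 300 nucleotides) can be sketched as a simple scan over per-site methylation levels. This is a minimal illustration under stated assumptions, not the inventors' procedure: the function name `candidate_windows`, the representation of methylation as a fraction between 0 and 1 keyed by the 0-based position of the cytosine, and the `min_delta` threshold used to call a site "differentially methylated" are all hypothetical.

```python
from typing import Dict, List, Tuple


def candidate_windows(
    target_levels: Dict[int, float],
    reference_levels: Dict[int, float],
    min_sites: int = 4,
    max_length: int = 300,
    min_delta: float = 0.5,  # assumed cut-off for "differentially methylated"
) -> List[Tuple[int, int]]:
    """Return (start, end) spans, 0-based inclusive, that hold at least
    `min_sites` differentially methylated CpG sites within `max_length` nt."""
    shared = sorted(set(target_levels) & set(reference_levels))
    differential = [
        pos for pos in shared
        if abs(target_levels[pos] - reference_levels[pos]) >= min_delta
    ]
    windows = []
    for i in range(len(differential) - min_sites + 1):
        first = differential[i]
        last = differential[i + min_sites - 1]
        # The span runs from the C of the first CpG to the G of the last CpG.
        if (last + 1) - first + 1 <= max_length:
            windows.append((first, last + 1))
    return windows


# Hypothetical levels: cell of interest (low) versus a second, non-identical cell (high).
target = {100: 0.05, 150: 0.10, 220: 0.00, 390: 0.05, 900: 0.10}
reference = {100: 0.95, 150: 0.90, 220: 0.90, 390: 0.95, 900: 0.90}
print(candidate_windows(target, reference))  # [(100, 391)]
```

The sketch treats a site as differential regardless of direction; restricting the scan to sites that are all unmethylated, or all methylated, in the cell of interest would require one extra comparison on the sign of the difference.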

According to a particular embodiment, each of the at least four methylation sites is unmethylated in the cell of interest (the cell for which the methylation pattern is being determined), whereas in the second non-identical cell each of the sites is methylated.

According to another embodiment, each of the at least four methylation sites is methylated in the cell of interest, whereas in the second non-identical cell each of the sites is unmethylated.

According to another embodiment, at least one of the four methylation sites is unmethylated in the cell of interest, whereas in the second non-identical cell that site is methylated.

According to another embodiment, at least two of the four methylation sites are unmethylated in the cell of interest, whereas in the second non-identical cell those sites are methylated.

According to another embodiment, at least three of the four methylation sites are unmethylated in the cell of interest, whereas in the second non-identical cell those sites are methylated.

The second non-identical cell may be of any source including for example blood cells.

Using this method, the present inventors have identified methylation signatures of DNA derived from pancreatic beta cells, acinar cells, brain cells, neurons, oligodendrocytes, cardiomyocytes, hepatocytes, kidney cells and skeletal muscle cells and have shown that these signatures can successfully distinguish between DNA derived from those cells and DNA derived from blood cells.

Thus, according to another aspect of the present invention there is provided a method of determining whether DNA is derived from a cell of interest in a sample, the method comprising:

determining the methylation status of at least four methylation sites on a continuous sequence of the DNA, the sequence comprising no more than 300 nucleotides, wherein a methylation status of each of the at least four methylation sites on the continuous sequence characteristic of the cell of interest, is indicative that the DNA is derived from the cell of interest.
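The determination step above can be illustrated with a short sketch in which a DNA molecule is attributed to the cell of interest only when every site of the signature shows the characteristic status. The data layout (a mapping from site position to `True` for methylated and `False` for unmethylated) and the names `read_matches_signature` and `fraction_from_cell_of_interest` are assumptions made for this example; how the per-site calls are obtained is outside its scope.

```python
from typing import Dict, List


def read_matches_signature(
    read_calls: Dict[int, bool],
    signature: Dict[int, bool],
    min_sites: int = 4,
) -> bool:
    """True only if every signature site is covered by the molecule and
    carries the status characteristic of the cell of interest."""
    if len(signature) < min_sites:
        raise ValueError("a signature needs at least %d methylation sites" % min_sites)
    # A site missing from the read yields None, which never equals True or False.
    return all(read_calls.get(pos) == status for pos, status in signature.items())


def fraction_from_cell_of_interest(
    reads: List[Dict[int, bool]], signature: Dict[int, bool]
) -> float:
    """Fraction of molecules whose full pattern matches the signature."""
    if not reads:
        return 0.0
    matches = sum(read_matches_signature(read, signature) for read in reads)
    return matches / len(reads)


# Hypothetical signature: four sites, all unmethylated in the cell of interest.
signature = {10: False, 25: False, 40: False, 62: False}
reads = [
    {10: False, 25: False, 40: False, 62: False},  # full match
    {10: False, 25: True, 40: False, 62: False},   # one site methylated
    {10: True, 25: True, 40: True, 62: True},      # fully methylated
    {10: False, 25: False, 40: False},             # one site not covered
]
print(fraction_from_cell_of_interest(reads, signature))  # 0.25
```

Requiring all sites on the same molecule to agree is what distinguishes this approach from scoring each site independently: a molecule with a single discordant site is not counted.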

It will be appreciated that the method is appropriate for examining if the investigated DNA is derived from a particular cell type or tissue type since the sequences analyzed are specific for particular cell/tissue types.

Thus, for example if the investigator wishes to determine if the DNA present in a sample is derived from pancreatic beta cells, he/she needs to analyze sequences which have a methylation pattern characteristic of pancreatic beta cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 1-50 and 1241-1244.

SEQ ID NOs: 1-27 and 1241-1244 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in pancreatic beta cells and methylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in pancreatic beta cells and methylated in other cells (e.g. blood cells).
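The positional constraint in the embodiment above can be checked mechanically: a candidate stretch must be no longer than 300 nucleotides, must span the CG at positions 250 and 251 of the listed sequence, and must contain at least four CpG sites. The following sketch assumes 1-based inclusive coordinates; the function name `is_valid_window` is hypothetical.

```python
CENTRAL_C = 250  # 1-based position of the C, as stated in the text
CENTRAL_G = 251  # 1-based position of the G, as stated in the text


def is_valid_window(
    sequence: str,
    start: int,
    end: int,
    min_sites: int = 4,
    max_length: int = 300,
) -> bool:
    """Check a 1-based inclusive window [start, end] of a listed sequence."""
    seq = sequence.upper()
    if not (1 <= start <= end <= len(seq)):
        return False
    # The listed sequence itself must carry CG at positions 250 and 251.
    if seq[CENTRAL_C - 1:CENTRAL_G] != "CG":
        return False
    if end - start + 1 > max_length:
        return False
    # The window must include the central CG.
    if not (start <= CENTRAL_C and end >= CENTRAL_G):
        return False
    # "CG" cannot overlap itself, so str.count gives the number of CpG sites.
    return seq[start - 1:end].count("CG") >= min_sites
```

For example, a window running from position 150 to position 400 is 251 nucleotides long and includes the central CG, so it qualifies whenever at least four CpG sites fall inside it.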

SEQ ID NOs: 28-50 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in pancreatic beta cells and unmethylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in pancreatic beta cells and unmethylated in other cells (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from pancreatic ductal cells, he/she needs to analyze sequences which have a methylation pattern characteristic of ductal cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 51-150.

SEQ ID NOs: 51-100 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in pancreatic ductal cells and methylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in pancreatic ductal cells and methylated in other cells (e.g. blood cells).

SEQ ID NOs: 101-150 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in pancreatic ductal cells and unmethylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in pancreatic ductal cells and unmethylated in other cells (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from liver cells, he/she needs to analyze sequences which have a methylation pattern characteristic of liver cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 151-197 and 1267.

SEQ ID NOs: 151-173 and 1267 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in liver cells and methylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in liver cells and methylated in other cells (e.g. blood cells).

SEQ ID NOs: 174-197 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in liver cells and unmethylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in liver cells and unmethylated in other cells (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from lung tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of lung cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 198-203 and 1268-1273.

SEQ ID NOs: 198-200 and 1268-1273 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in lung cells and methylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in lung cells and methylated in other cells (e.g. blood cells).

SEQ ID NOs: 201-203 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in lung cells and unmethylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in lung cells and unmethylated in other cells (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from uterine tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of uterus cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 204-237.

SEQ ID NOs: 204-227 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in uterus cells and methylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in uterus cells and methylated in other cells (e.g. blood cells).

SEQ ID NOs: 228-237 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in uterus cells and unmethylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in uterus cells and unmethylated in other cells (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from kidney cells, he/she needs to analyze sequences which have a methylation pattern characteristic of kidney cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 238-273 and 1266.

SEQ ID NOs: 238-254 and 1266 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in kidney cells and methylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in kidney cells and methylated in other cells (e.g. blood cells).

SEQ ID NOs: 255-273 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in kidney cells and unmethylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in kidney cells and unmethylated in other cells (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from breast tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of breast cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 274-290.

SEQ ID NOs: 274-277 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in breast tissue and methylated in other cells of non-breast tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in breast cells and methylated in cells of non-breast tissue (e.g. blood cells).

SEQ ID NOs: 278-290 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in breast tissue and unmethylated in cells of non-breast tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in breast tissue and unmethylated in cells of non-breast tissue (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from adipocytes, he/she needs to analyze sequences which have a methylation pattern characteristic of adipocytes.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 291-338 and 1262-1265.

SEQ ID NOs: 291-337 and 1262-1265 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in adipocytes and methylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in adipocytes and methylated in other cells (e.g. blood cells).

SEQ ID NO: 338 comprises a sequence which includes at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in adipocytes and unmethylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of this sequence. This CG is methylated in adipocytes and unmethylated in other cells (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from colon tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of cells derived from colon tissue.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 339-377 and 1257-1260.

SEQ ID NOs: 339-351 and 1257-1260 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in cells of colon tissue and methylated in other cells of non-colon tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in colon cells and methylated in cells of non-colon tissue (e.g. blood cells).

SEQ ID NOs: 352-377 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in colon tissue and unmethylated in cells of non-colon tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in colon tissue and unmethylated in cells of non-colon tissue (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from prostate tissue cells, he/she needs to analyze sequences which have a methylation pattern characteristic of prostate tissue cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 378-443.

SEQ ID NOs: 378-409 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in prostate tissue and methylated in other cells of non-prostate tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in prostate cells and methylated in cells of non-prostate tissue (e.g. blood cells).

SEQ ID NOs: 410-443 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in prostate tissue and unmethylated in cells of non-prostate tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in prostate tissue and unmethylated in cells of non-prostate tissue (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from thyroid tissue cells, he/she needs to analyze sequences which have a methylation pattern characteristic of thyroid tissue cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 444-501.

SEQ ID NOs: 444-455 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in thyroid tissue and methylated in other cells of non-thyroid tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in thyroid cells and methylated in cells of non-thyroid tissue (e.g. blood cells).

SEQ ID NOs: 456-501 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in thyroid tissue and unmethylated in cells of non-thyroid tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in thyroid tissue and unmethylated in cells of non-thyroid tissue (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from bladder tissue cells, he/she needs to analyze sequences which have a methylation pattern characteristic of bladder tissue cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 502-509.

SEQ ID NOs: 502-506 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in bladder tissue and methylated in other cells of non-bladder tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in bladder cells and methylated in cells of non-bladder tissue (e.g. blood cells).

SEQ ID NOs: 507-509 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in bladder tissue and unmethylated in cells of non-bladder tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in bladder tissue and unmethylated in cells of non-bladder tissue (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from islets, he/she needs to analyze sequences which have a methylation pattern characteristic of islets.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 510-746.

SEQ ID NOs: 510-650 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in islets and methylated in cells of non-islet tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in islets and methylated in cells of non-islet tissue (e.g. blood cells).

SEQ ID NOs: 651-746 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in islets and unmethylated in cells of non-islet tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in islets and unmethylated in cells of non-islet tissue (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from skeletal muscle, he/she needs to analyze sequences which have a methylation pattern characteristic of skeletal muscle.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 747-817 and 1276-1279.

SEQ ID NOs: 747-767 and 1276-1279 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in skeletal muscle and methylated in cells of non-skeletal muscle tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in skeletal muscle and methylated in cells of non-skeletal muscle tissue (e.g. blood cells).

SEQ ID NOs: 768-817 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in skeletal muscle and unmethylated in cells of non-skeletal muscle tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in skeletal muscle and unmethylated in cells of non-skeletal muscle tissue (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from pancreatic tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of pancreatic tissue.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 818-863 and 1280-1284.

SEQ ID NOs: 818-835 and 1280-1284 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in pancreatic tissue and methylated in cells of non-pancreatic tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in pancreatic tissue and methylated in cells of non-pancreatic tissue (e.g. blood cells).

SEQ ID NOs: 836-863 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in pancreatic tissue and unmethylated in cells of non-pancreatic tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in pancreatic tissue and unmethylated in cells of non-pancreatic tissue (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from brain white matter tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of brain white matter tissue.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 864-1012.

SEQ ID NOs: 864-963 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in brain white matter tissue and methylated in cells of non-brain white matter tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in white matter tissue and methylated in cells of non-white matter tissue (e.g. blood cells).

SEQ ID NOs: 964-1012 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in white matter tissue and unmethylated in cells of non-white matter tissue (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in white matter tissue and unmethylated in cells of non-white matter tissue (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from blood cells, he/she needs to analyze sequences which have a methylation pattern characteristic of blood cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 1013-1137 and 1274-1275.

SEQ ID NOs: 1013-1112 and 1274-1275 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in blood cells and methylated in non-blood cells. According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in blood cells and methylated in non-blood cells.

SEQ ID NOs: 1113-1137 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in blood cells and unmethylated in non-blood cells. According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in blood cells and unmethylated in non-blood cells.

If the investigator wishes to determine if the DNA present in a sample is derived from cervical tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of cervical cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 1138-1216.

SEQ ID NOs: 1138-1173 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in cervical tissue cells and methylated in non-cervical tissue cells. According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in cervical cells and methylated in non-cervical tissue cells.

SEQ ID NOs: 1174-1216 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in cervical tissue cells and unmethylated in non-cervical tissue cells. According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in cervical tissue cells and unmethylated in non-cervical tissue cells.

If the investigator wishes to determine if the DNA present in a sample is derived from retinal tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of retinal cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 1217-1240.

SEQ ID NOs: 1217-1240 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in retinal tissue cells and methylated in non-retinal tissue cells. According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in retinal cells and methylated in non-retinal tissue cells (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from brain tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of brain tissue cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 1245-1256 and 1285-1364.

SEQ ID NOs: 1245-1256 and 1285-1316 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in brain tissue cells and methylated in non-brain tissue cells. According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in brain cells and methylated in non-brain tissue cells (e.g. blood cells).

SEQ ID NOs: 1317-1364 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in brain tissue cells and unmethylated in non-brain tissue cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in brain tissue cells and unmethylated in non-brain tissue cells (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from rectal tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of rectal tissue cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 1365-1385.

SEQ ID NOs: 1365-1373 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in rectal tissue cells and methylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in rectal tissue cells and methylated in other cells (e.g. blood cells).

SEQ ID NOs: 1374-1385 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in rectal tissue cells and unmethylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in rectal tissue cells and unmethylated in other cells (e.g. blood cells).

If the investigator wishes to determine if the DNA present in a sample is derived from cardiac tissue, he/she needs to analyze sequences which have a methylation pattern characteristic of cardiac tissue cells.

Such sequences are comprised for example in sequences as set forth in SEQ ID NOs: 1386-1484.

SEQ ID NOs: 1386-1435 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are unmethylated in cardiac tissue cells and methylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is unmethylated in cardiac tissue cells and methylated in other cells (e.g. blood cells).

SEQ ID NOs: 1436-1484 comprise sequences which include at least 4 methylation sites in a continuous sequence of no more than 300 nucleotides that are methylated in cardiac tissue cells and unmethylated in other cells (e.g. blood cells). According to a particular embodiment, the continuous sequence comprises the nucleotides CG which are at position 250 and 251 of each of these sequences. This CG is methylated in cardiac tissue cells and unmethylated in other cells (e.g. blood cells).

The sequences which have been identified by the present inventors as being candidates for determining the cellular source of DNA may be stored in a database. Exemplary sequences include those described herein above, such as those set forth in SEQ ID NOs: 1-1484.

The database may be divided into sequences which are relevant for identification of a particular cell type or tissue source. Additionally, or alternatively, the database may be divided into sequences which when non-methylated are indicative of a particular cell type/tissue source and into sequences which, when methylated are indicative of a particular cell type/tissue source.
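One possible layout for such a database is sketched below, purely as an illustration. The SEQ ID NO ranges are copied from the description above for three cell/tissue sources; the dictionary structure, the names `SIGNATURE_DB` and `lookup`, and the labels "unmethylated" and "methylated" (meaning the state that is indicative of the source) are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Inclusive (first, last) SEQ ID NO ranges, grouped first by cell/tissue source
# and then by the methylation state that is indicative of that source.
SIGNATURE_DB: Dict[str, Dict[str, List[Tuple[int, int]]]] = {
    "pancreatic beta cells": {
        "unmethylated": [(1, 27), (1241, 1244)],
        "methylated": [(28, 50)],
    },
    "liver cells": {
        "unmethylated": [(151, 173), (1267, 1267)],
        "methylated": [(174, 197)],
    },
    "kidney cells": {
        "unmethylated": [(238, 254), (1266, 1266)],
        "methylated": [(255, 273)],
    },
}


def lookup(seq_id_no: int) -> Optional[Tuple[str, str]]:
    """Return (source, indicative state) for a SEQ ID NO, or None if absent."""
    for source, groups in SIGNATURE_DB.items():
        for state, ranges in groups.items():
            if any(first <= seq_id_no <= last for first, last in ranges):
                return source, state
    return None


print(lookup(1242))  # ('pancreatic beta cells', 'unmethylated')
print(lookup(180))   # ('liver cells', 'methylated')
```

Keying the table by source first and by indicative state second mirrors the two ways of dividing the database described above; the remaining cell and tissue sources would be added in the same manner.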

The database may also contain the methylation status of the sequences in subjects who have been tested. The subjects may be classified as being healthy or diseased.

The database may be stored in a computer readable format on a computer readable medium, and is optionally and preferably accessed by a data processor, such as a general purpose computer or dedicated circuitry.

Samples which may be analyzed are generally fluid samples derived from mammalian subjects and include, for example, blood, plasma, sperm, milk, urine, saliva or cerebrospinal fluid.

Samples which are analyzed typically comprise DNA from at least two cell/tissue sources, as further described herein below.

According to one embodiment, a sample of blood is obtained from a subject according to methods well known in the art. Plasma or serum may be isolated according to methods known in the art.

DNA may be isolated from the blood immediately or within 1 hour, 2 hours, 3 hours, 4 hours, 5 hours or 6 hours. Optionally the blood is stored at temperatures such as 4° C., or at −20° C. prior to isolation of the DNA. In some embodiments, a portion of the blood sample is used in accordance with the invention at a first instance of time whereas one or more remaining portions of the blood sample (or fractions thereof) are stored for a period of time for later use.

According to one embodiment, the DNA is cellular DNA (i.e. comprised in a cell).

According to still another embodiment, the DNA is comprised in a shed cell or non-intact cell.

Methods of DNA extraction are well-known in the art. A classical DNA isolation protocol is based on extraction using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (J. Sambrook et al., “Molecular Cloning: A Laboratory Manual”, 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Other methods include: salting out DNA extraction (P. Sunnucks et al., Genetics, 1996, 144: 747-756; S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25: 4692-4693), trimethylammonium bromide salts DNA extraction (S. Gustincich et al., BioTechniques, 1991, 11: 298-302) and guanidinium thiocyanate DNA extraction (J. B. W. Hammond et al., Biochemistry, 1996, 240: 298-300).

There are also numerous versatile kits that can be used to extract DNA from tissues and bodily fluids and that are commercially available from, for example, BD Biosciences Clontech (Palo Alto, Calif.), Epicentre Technologies (Madison, Wis.), Gentra Systems, Inc. (Minneapolis, Minn.), MicroProbe Corp. (Bothell, Wash.), Organon Teknika (Durham, N.C.), and Qiagen Inc. (Valencia, Calif.). User Guides that describe in great detail the protocol to be followed are usually included in all these kits. Sensitivity, processing time and cost may be different from one kit to another. One of ordinary skill in the art can easily select the kit(s) most appropriate for a particular situation.

According to another embodiment, the DNA is cell-free DNA. For this method, cell lysis is not performed on the sample. Methods of isolating cell-free DNA from body fluids are also known in the art. For example, the QIAquick kit, manufactured by Qiagen, may be used to extract cell-free DNA from plasma or serum.

The sample may be processed before the method is carried out, for example DNA purification may be carried out following the extraction procedure. The DNA in the sample may be cleaved either physically or chemically (e.g. using a suitable enzyme). Processing of the sample may involve one or more of: filtration, distillation, centrifugation, extraction, concentration, dilution, purification, inactivation of interfering components, addition of reagents, and the like.

It will be appreciated that the present invention contemplates analyzing more than one target sequence (each one comprising at least four methylation sites on a continuous sequence of the DNA). Thus, for example 2, 3, 4, 5, 6, 7, 8, 9, 10 or more target sequences (serving as tissue or cell-specific markers) may be analyzed. This may be effected in parallel using the same DNA preparation or on a plurality of DNA preparations.

Methods of determining the methylation status of a methylation site are known in the art and include the use of bisulfite.

In this method, DNA is treated with bisulfite, which converts cytosine residues to uracil (read as thymine following PCR) but leaves 5-methylcytosine residues unaffected. Thus, bisulfite treatment introduces specific changes in the DNA sequence that depend on the methylation status of individual cytosine residues, yielding single-nucleotide resolution information about the methylation status of a segment of DNA. Various analyses can be performed on the altered sequence to retrieve this information. The objective of this analysis is therefore reduced to differentiating between single nucleotide polymorphisms (cytosine and thymine) resulting from bisulfite conversion.
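The effect of bisulfite conversion on a sequence, and the way methylation calls are read back from it, can be sketched in silico. This is a simplified illustration that assumes complete conversion, a single strand, and a read aligned to the reference without gaps; the function names are hypothetical.

```python
from typing import Dict, Set


def bisulfite_convert(seq: str, methylated_positions: Set[int]) -> str:
    """Simulate conversion followed by PCR: unmethylated C reads as T.

    methylated_positions holds 0-based indices of 5-methylcytosines,
    which are left unchanged.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_cpg_methylation(reference: str, converted_read: str) -> Dict[int, bool]:
    """Call each reference CpG as methylated (True) if the read still shows C."""
    ref = reference.upper()
    calls = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            calls[i] = converted_read[i] == "C"
    return calls
```

For the reference `ACGTCGAC` with a methylated cytosine at index 1 only, the converted read is `ACGTTGAT`; the CpG at index 1 is called methylated and the CpG at index 4 is called unmethylated.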

During the bisulfite reaction, care should be taken to minimize DNA degradation, for example by cycling the incubation temperature.

Bisulfite sequencing relies on the conversion of every single unmethylated cytosine residue to uracil. If conversion is incomplete, the subsequent analysis will incorrectly interpret the unconverted unmethylated cytosines as methylated cytosines, resulting in false positive results for methylation. Only cytosines in single-stranded DNA are susceptible to attack by bisulfite, therefore denaturation of the DNA undergoing analysis is critical. It is important to ensure that reaction parameters such as temperature and salt concentration are suitable to maintain the DNA in a single-stranded conformation and allow for complete conversion.

According to a particular embodiment, an oxidative bisulfite reaction is performed. 5-methylcytosine and 5-hydroxymethylcytosine both read as a C in bisulfite sequencing. Oxidative bisulfite reaction allows for the discrimination between 5-methylcytosine and 5-hydroxymethylcytosine at single base resolution. The method employs a specific chemical oxidation of 5-hydroxymethylcytosine to 5-formylcytosine, which subsequently converts to uracil during bisulfite treatment. The only base that then reads as a C is 5-methylcytosine, giving a map of the true methylation status in the DNA sample. Levels of 5-hydroxymethylcytosine can also be quantified by measuring the difference between bisulfite and oxidative bisulfite sequencing.
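The subtraction described above can be written as a short sketch. It assumes that, for a given site, the fraction of reads showing C has been measured in both a bisulfite and an oxidative bisulfite experiment; clamping negative differences to zero is an assumption made here to absorb measurement noise and is not taken from the source.

```python
def hydroxymethylation_level(bs_fraction_c: float, oxbs_fraction_c: float) -> float:
    """Estimate the 5-hydroxymethylcytosine level at one site.

    bs_fraction_c:   fraction of reads with C after bisulfite (5mC + 5hmC)
    oxbs_fraction_c: fraction of reads with C after oxidative bisulfite (5mC only)
    """
    for value in (bs_fraction_c, oxbs_fraction_c):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    # Assumption: a negative difference is treated as noise and reported as 0.
    return max(0.0, bs_fraction_c - oxbs_fraction_c)
```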

Prior to analysis (or concomitant therewith), the bisulfite-treated DNA sequence which comprises the at least four methylation sites may be subjected to an amplification reaction. If amplification of the sequence is required care should be taken to ensure complete desulfonation of pyrimidine residues. This may be effected by monitoring the pH of the solution to ensure that desulfonation is complete.

As used herein, the term “amplification” refers to a process that increases the representation of a population of specific nucleic acid sequences in a sample by producing multiple (i.e., at least 2) copies of the desired sequences. Methods for nucleic acid amplification are known in the art and include, but are not limited to, polymerase chain reaction (PCR) and ligase chain reaction (LCR). In a typical PCR amplification reaction, a nucleic acid sequence of interest is often amplified at least fifty thousand fold in amount over its amount in the starting sample. A “copy” or “amplicon” does not necessarily mean perfect sequence complementarity or identity to the template sequence. For example, copies can include nucleotide analogs such as deoxyinosine, intentional sequence alterations (such as sequence alterations introduced through a primer comprising a sequence that is hybridizable but not complementary to the template), and/or sequence errors that occur during amplification.
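As a rough guide to the fold-amplification figure mentioned above, the number of PCR cycles needed to reach a given fold increase can be estimated under an idealized model in which the product grows by a constant factor of (1 + efficiency) per cycle. The model and the function name are assumptions for illustration; real reactions plateau and vary in efficiency.

```python
import math


def cycles_for_fold(fold: float, efficiency: float = 1.0) -> int:
    """Smallest cycle count n with (1 + efficiency) ** n >= fold.

    efficiency = 1.0 corresponds to perfect doubling in every cycle.
    """
    if fold < 1 or not 0.0 < efficiency <= 1.0:
        raise ValueError("fold must be >= 1 and efficiency in (0, 1]")
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))
```

Under perfect doubling, a fifty-thousand-fold increase requires 16 cycles, since 2 to the power 15 is 32,768 and 2 to the power 16 is 65,536.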

A typical amplification reaction is carried out by contacting a forward and reverse primer (a primer pair) to the sample DNA together with any additional amplification reaction reagents under conditions which allow amplification of the target sequence.

The terms “forward primer” and “forward amplification primer” are used herein interchangeably, and refer to a primer that hybridizes (or anneals) to the target (template strand). The terms “reverse primer” and “reverse amplification primer” are used herein interchangeably, and refer to a primer that hybridizes (or anneals) to the complementary target strand. The forward primer hybridizes with the target sequence 5′ with respect to the reverse primer.

The term “amplification conditions”, as used herein, refers to conditions that promote annealing and/or extension of primer sequences. Such conditions are well-known in the art and depend on the amplification method selected. Thus, for example, in a PCR reaction, amplification conditions generally comprise thermal cycling, i.e., cycling of the reaction mixture between two or more temperatures. In isothermal amplification reactions, amplification occurs without thermal cycling although an initial temperature increase may be required to initiate the reaction. Amplification conditions encompass all reaction conditions including, but not limited to, temperature and temperature cycling, buffer, salt, ionic strength, and pH, and the like.

As used herein, the term “amplification reaction reagents” refers to reagents used in nucleic acid amplification reactions and may include, but are not limited to, buffers, reagents, enzymes having reverse transcriptase and/or polymerase activity or exonuclease activity, enzyme cofactors such as magnesium or manganese, salts, nicotinamide adenine dinucleotide (NAD) and deoxynucleoside triphosphates (dNTPs), such as deoxyadenosine triphosphate, deoxyguanosine triphosphate, deoxycytidine triphosphate and thymidine triphosphate. Amplification reaction reagents may readily be selected by one skilled in the art depending on the amplification method used.

According to this aspect of the present invention, the amplifying may be effected using techniques such as polymerase chain reaction (PCR), which includes, but is not limited to Allele-specific PCR, Assembly PCR or Polymerase Cycling Assembly (PCA), Asymmetric PCR, Helicase-dependent amplification, Hot-start PCR, Intersequence-specific PCR (ISSR), Inverse PCR, Ligation-mediated PCR, Methylation-specific PCR (MSP), Miniprimer PCR, Multiplex Ligation-dependent Probe Amplification, Multiplex-PCR, Nested PCR, Overlap-extension PCR, Quantitative PCR (Q-PCR), Reverse Transcription PCR (RT-PCR), Solid Phase PCR: encompasses multiple meanings, including Polony Amplification (where PCR colonies are derived in a gel matrix, for example), Bridge PCR (primers are covalently linked to a solid-support surface), conventional Solid Phase PCR (where Asymmetric PCR is applied in the presence of solid support bearing primer with sequence matching one of the aqueous primers) and Enhanced Solid Phase PCR (where conventional Solid Phase PCR can be improved by employing high Tm and nested solid support primer with optional application of a thermal ‘step’ to favour solid support priming), Thermal asymmetric interlaced PCR (TAIL-PCR), Touchdown PCR (Step-down PCR), PAN-AC and Universal Fast Walking.

The PCR (or polymerase chain reaction) technique is well-known in the art and has been disclosed, for example, in K. B. Mullis and F. A. Faloona, Methods Enzymol., 1987, 155: 350-355 and U.S. Pat. Nos. 4,683,202; 4,683,195; and 4,800,159 (each of which is incorporated herein by reference in its entirety). In its simplest form, PCR is an in vitro method for the enzymatic synthesis of specific DNA sequences, using two oligonucleotide primers that hybridize to opposite strands and flank the region of interest in the target DNA. A plurality of reaction cycles, each cycle comprising a denaturation step, an annealing step, and a polymerization step, results in the exponential accumulation of a specific DNA fragment (“PCR Protocols: A Guide to Methods and Applications”, M. A. Innis (Ed.), 1990, Academic Press: New York; “PCR Strategies”, M. A. Innis (Ed.), 1995, Academic Press: New York; “Polymerase chain reaction: basic principles and automation in PCR: A Practical Approach”, McPherson et al. (Eds.), 1991, IRL Press: Oxford; R. K. Saiki et al., Nature, 1986, 324: 163-166). The termini of the amplified fragments are defined as the 5′ ends of the primers. Examples of DNA polymerases capable of producing amplification products in PCR reactions include, but are not limited to: E. coli DNA polymerase I, Klenow fragment of DNA polymerase I, T4 DNA polymerase, thermostable DNA polymerases isolated from Thermus aquaticus (Taq), available from a variety of sources (for example, Perkin Elmer), Thermus thermophilus (United States Biochemicals), Bacillus stearothermophilus (Bio-Rad), or Thermococcus litoralis (“Vent” polymerase, New England Biolabs). RNA target sequences may be amplified by reverse transcribing the mRNA into cDNA, and then performing PCR (RT-PCR), as described above. Alternatively, a single enzyme may be used for both steps as described in U.S. Pat. No. 5,322,770.

The duration and temperature of each step of a PCR cycle, as well as the number of cycles, are generally adjusted according to the stringency requirements in effect. Annealing temperature and timing are determined both by the efficiency with which a primer is expected to anneal to a template and the degree of mismatch that is to be tolerated. The ability to optimize the reaction cycle conditions is well within the knowledge of one of ordinary skill in the art. Although the number of reaction cycles may vary depending on the detection analysis being performed, it usually is at least 15, more usually at least 20, and may be as high as 60 or higher. However, in many situations, the number of reaction cycles typically ranges from about 20 to about 45.

The denaturation step of a PCR cycle generally comprises heating the reaction mixture to an elevated temperature and maintaining the mixture at the elevated temperature for a period of time sufficient for any double-stranded or hybridized nucleic acid present in the reaction mixture to dissociate. For denaturation, the temperature of the reaction mixture is usually raised to, and maintained at, a temperature ranging from about 85° C. to about 100° C., usually from about 90° C. to about 98° C., and more usually from about 93° C. to about 96° C. for a period of time ranging from about 3 to about 120 seconds, usually from about 5 to about 30 seconds.

Following denaturation, the reaction mixture is subjected to conditions sufficient for primer annealing to template DNA present in the mixture. The temperature to which the reaction mixture is lowered to achieve these conditions is usually chosen to provide optimal efficiency and specificity, and generally ranges from about 50° C. to about ° C., usually from about 55° C. to about 70° C., and more usually from about 60° C. to about 68° C. Annealing conditions are generally maintained for a period of time ranging from about 15 seconds to about 30 minutes, usually from about 30 seconds to about 5 minutes.

Following annealing of primer to template DNA or during annealing of primer to template DNA, the reaction mixture is subjected to conditions sufficient to provide for polymerization of nucleotides to the primer's end in such a manner that the primer is extended in a 5′ to 3′ direction using the DNA to which it is hybridized as a template (i.e., conditions sufficient for enzymatic production of primer extension product). To achieve primer extension conditions, the temperature of the reaction mixture is typically raised to a temperature ranging from about 65° C. to about 75° C., usually from about 67° C. to about 73° C., and maintained at that temperature for a period of time ranging from about 15 seconds to about 20 minutes, usually from about 30 seconds to about 5 minutes.

The above cycles of denaturation, annealing, and polymerization may be performed using an automated device typically known as a thermal cycler or thermocycler. Thermal cyclers that may be employed are described in U.S. Pat. Nos. 5,612,473; 5,602,756; 5,538,871; and 5,475,610 (each of which is incorporated herein by reference in its entirety). Thermal cyclers are commercially available, for example, from Perkin Elmer-Applied Biosystems (Norwalk, Conn.), BioRad (Hercules, Calif.), Roche Applied Science (Indianapolis, Ind.), and Stratagene (La Jolla, Calif.).

According to one embodiment, the primers which are used in the amplification reaction are methylation independent primers. These primers flank the first and last of the at least four methylation sites (but do not hybridize directly to the sites) and in a PCR reaction, are capable of generating an amplicon which comprises all four or more methylation sites.
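The placement rule for methylation-independent primers can be checked computationally. The sketch below works on top-strand coordinates only and makes several assumptions that are not in the source: primer footprints are given as start and length on the same strand, the 300-nucleotide limit is applied to the whole amplicon, and CpG sites are located in the unconverted template. The function names are hypothetical.

```python
from typing import List


def cpg_sites(seq: str) -> List[int]:
    """0-based start positions of every CG dinucleotide in seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def primers_flank_sites(template: str, fwd_start: int, fwd_len: int,
                        rev_start: int, rev_len: int,
                        min_sites: int = 4, max_amplicon: int = 300) -> bool:
    """True if the primers avoid all CpGs and enclose at least min_sites CpGs."""
    fwd_end = fwd_start + fwd_len
    rev_end = rev_start + rev_len
    if fwd_end > rev_start or rev_end > len(template):
        return False  # primers overlap each other or run off the template
    if rev_end - fwd_start > max_amplicon:
        return False  # amplicon longer than the assumed limit

    def touches(site: int, start: int, end: int) -> bool:
        # A CG occupies positions site and site + 1.
        return site + 1 >= start and site < end

    sites = cpg_sites(template)
    inner = [s for s in sites if s >= fwd_end and s + 2 <= rev_start]
    on_primer = [s for s in sites
                 if touches(s, fwd_start, fwd_end) or touches(s, rev_start, rev_end)]
    return len(inner) >= min_sites and not on_primer
```

A primer pair that passes this check amplifies methylated and unmethylated templates alike, so the methylation pattern is read from the amplicon interior rather than selected for by the primers.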

The methylation-independent primers of this aspect of the present invention may comprise adaptor sequences which include barcode sequences. The adaptors may further comprise sequences which are necessary for attaching to a flow cell surface (P5 and P7 sites, for subsequent sequencing), a sequence which encodes for a promoter for an RNA polymerase and/or a restriction site. The barcode sequence may be used to identify a particular molecule, sample or library. The barcode sequence may be between 3-400 nucleotides, more preferably between 3-200 and even more preferably between 3-100 nucleotides. Thus, the barcode sequence may be 6 nucleotides, 7 nucleotides, 8 nucleotides, 9 nucleotides or 10 nucleotides. The barcode is typically 4-15 nucleotides.
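The use of barcodes to assign reads to samples can be illustrated with a minimal demultiplexing sketch. It assumes, for illustration only, that the barcode occupies the first bases of each read, that all barcodes have the same length, and that only exact matches are accepted; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str],
                barcode_to_sample: Dict[str, str],
                barcode_len: int) -> Dict[str, List[str]]:
    """Assign each read to a sample by its leading barcode and trim the barcode.

    Reads whose leading bases match no known barcode are discarded.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(tag)
        if sample is not None:
            bins[sample].append(insert)
    return dict(bins)
```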

The methylation-independent oligonucleotide of this aspect of the present invention need not reflect the exact sequence of the target nucleic acid sequence (i.e. need not be fully complementary), but must be sufficiently complementary so as to hybridize to the target site under the particular experimental conditions. Accordingly, the sequence of the oligonucleotide typically has at least 70% homology, preferably at least 80%, 90%, 95%, 97%, 99% or 100% homology, for example over a region of at least 13 or more contiguous nucleotides with the target sequence. The conditions are selected such that hybridization of the oligonucleotide to the target site is favored and hybridization to the non-target site is minimized.
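The homology thresholds above can be checked with a simple calculation. The sketch below interprets "homology" as ungapped percent identity between the oligonucleotide and an equal-length stretch of the sequence it is designed to match; that interpretation, and the function names, are assumptions made for illustration.

```python
def percent_identity(oligo: str, site: str) -> float:
    """Ungapped percent identity between two sequences of equal length."""
    if len(oligo) != len(site) or not oligo:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(oligo.upper(), site.upper()))
    return 100.0 * matches / len(oligo)


def meets_threshold(oligo: str, site: str,
                    min_identity: float = 70.0, min_len: int = 13) -> bool:
    """True if the compared region is long enough and similar enough."""
    return len(oligo) >= min_len and percent_identity(oligo, site) >= min_identity
```

A 20-nucleotide oligonucleotide with two mismatches against its target site has 90% identity under this measure and therefore passes the 70% and 80% thresholds but not the 95% threshold.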

Various considerations must be taken into account when selecting the stringency of the hybridization conditions. For example, the more closely the oligonucleotide (e.g. primer) reflects the target nucleic acid sequence, the higher the stringency of the assay conditions can be, although the stringency must not be too high so as to prevent hybridization of the oligonucleotides to the target sequence. Further, the lower the homology of the oligonucleotide to the target sequence, the lower the stringency of the assay conditions should be, although the stringency must not be too low to allow hybridization to non-specific nucleic acid sequences.

As mentioned, the present invention contemplates analyzing more than one target sequence (each one comprising at least four methylation sites on a continuous sequence of the DNA). The sequences may be analyzed individually or as part of a multiplex reaction.

The DNA may be sequenced using any method known in the art—e.g. massively parallel DNA sequencing, sequencing-by-synthesis, sequencing-by-ligation, 454 pyrosequencing, cluster amplification, bridge amplification, and PCR amplification, although preferably, the method comprises a high throughput sequencing method. Typical methods include the sequencing technology and analytical instrumentation offered by Roche 454 Life Sciences™, Branford, Conn., which is sometimes referred to herein as “454 technology” or “454 sequencing.”; the sequencing technology and analytical instrumentation offered by Illumina, Inc, San Diego, Calif. (their Solexa Sequencing technology is sometimes referred to herein as the “Solexa method” or “Solexa technology”); or the sequencing technology and analytical instrumentation offered by ABI, Applied Biosystems, Indianapolis, Ind., which is sometimes referred to herein as the ABI-SOLiD™ platform or methodology.

Other known methods for sequencing include, for example, those described in: Sanger, F. et al., Proc. Natl. Acad. Sci. U.S.A. 75, 5463-5467 (1977); Maxam, A. M. & Gilbert, W. Proc Natl Acad Sci USA 74, 560-564 (1977); Ronaghi, M. et al., Science 281, 363-365 (1998); Lysov, I. et al., Dokl Akad Nauk SSSR 303, 1508-1511 (1988); Bains W. & Smith G. C. J. Theor Biol 135, 303-307 (1988); Drmanac, R. et al., Genomics 4, 114-128 (1989); Khrapko, K. R. et al., FEBS Lett 256, 118-122 (1989); Pevzner P. A. J Biomol Struct Dyn 7, 63-73 (1989); and Southern, E. M. et al., Genomics 13, 1008-1017 (1992). Pyrophosphate-based sequencing reaction as described, e.g., in U.S. Pat. Nos. 6,274,320, 6,258,568 and 6,210,891, may also be used.

The Illumina or Solexa sequencing is based on reversible dye-terminators. DNA molecules are typically attached to primers on a slide and amplified so that local clonal colonies are formed. Subsequently one type of nucleotide at a time may be added, and non-incorporated nucleotides are washed away. Subsequently, images of the fluorescently labeled nucleotides may be taken and the dye is chemically removed from the DNA, allowing a next cycle. The Applied Biosystems' SOLiD technology employs sequencing by ligation. This method is based on the use of a pool of all possible oligonucleotides of a fixed length, which are labeled according to the sequenced position. Such oligonucleotides are annealed and ligated. Subsequently, the preferential ligation by DNA ligase for matching sequences typically results in a signal informative of the nucleotide at that position. Since the DNA is typically amplified by emulsion PCR, the resulting beads, each containing only copies of the same DNA molecule, can be deposited on a glass slide resulting in sequences of quantities and lengths comparable to Illumina sequencing. Another example of an envisaged sequencing method is pyrosequencing, in particular 454 pyrosequencing, e.g. based on the Roche 454 Genome Sequencer. This method amplifies DNA inside water droplets in an oil solution with each droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. Pyrosequencing uses luciferase to generate light for detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs. A further method is based on Helicos' Heliscope technology, wherein fragments are captured by polyT oligomers tethered to an array. At each sequencing cycle, polymerase and single fluorescently labeled nucleotides are added and the array is imaged. The fluorescent tag is subsequently removed and the cycle is repeated.

Further examples of sequencing techniques encompassed within the methods of the present invention are sequencing by hybridization, sequencing by use of nanopores, microscopy-based sequencing techniques, microfluidic Sanger sequencing, or microchip-based sequencing methods. The present invention also envisages further developments of these techniques, e.g. further improvements of the accuracy of the sequence determination, or the time needed for the determination of the genomic sequence of an organism etc.

According to one embodiment, the sequencing method comprises deep sequencing.

As used herein, the term “deep sequencing” and variations thereof refers to the number of times a nucleotide is read during the sequencing process. Deep sequencing indicates that the coverage, or depth, of the process is many times larger than the length of the sequence under study.
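Sequencing depth in the sense defined above can be computed from read alignments. The sketch below assumes each alignment is given as a half-open interval of 0-based coordinates on the region under study; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def per_base_depth(region_len: int,
                   alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Number of reads covering each position of the region.

    Each alignment is (start, end) with start inclusive and end exclusive.
    Portions of a read falling outside the region are ignored.
    """
    depth = [0] * region_len
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, region_len)):
            depth[pos] += 1
    return depth


def mean_depth(region_len: int, alignments: Iterable[Tuple[int, int]]) -> float:
    """Average number of times a nucleotide in the region was read."""
    if region_len <= 0:
        raise ValueError("region_len must be positive")
    return sum(per_base_depth(region_len, alignments)) / region_len
```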

It will be appreciated that any of the analytical methods described herein can be embodied in many forms. For example, it can be embodied on a tangible medium such as a computer for performing the method operations. It can be embodied on a computer readable medium, comprising computer readable instructions for carrying out the method operations. It can also be embodied in an electronic device having digital computer capabilities arranged to run the computer program on the tangible medium or execute the instructions on a computer readable medium.

Computer programs implementing the analytical method of the present embodiments can commonly be distributed to users on a distribution medium such as, but not limited to, CD-ROMs or flash memory media. From the distribution medium, the computer programs can be copied to a hard disk or a similar intermediate storage medium. In some embodiments of the present invention, computer programs implementing the method of the present embodiments can be distributed to users by allowing the user to download the programs from a remote location, via a communication network, e.g., the internet. The computer programs can be run by loading the computer instructions either from their distribution medium or their intermediate storage medium into the execution memory of the computer, configuring the computer to act in accordance with the method of this invention. All these operations are well-known to those skilled in the art of computer systems.

Additional methods which rely on the use of bisulfite that may be used to analyze the methylation pattern as described herein are described herein below:

Methylation-Sensitive Single-Nucleotide Primer Extension:

DNA is bisulfite-converted, and bisulfite-specific primers are annealed to the sequence up to the base pair immediately before the CpG of interest. The primer is allowed to extend one base pair into the C (or T) using DNA polymerase and terminating dideoxynucleotides, and the ratio of C to T is determined quantitatively. A number of methods can be used to determine this C:T ratio, such as the use of radioactive ddNTPs as the reporter of the primer extension; fluorescence-based methods or pyrosequencing can also be used. Matrix-assisted laser desorption ionization/time-of-flight (MALDI-TOF) mass spectrometry analysis can be used to differentiate between the two polymorphic primer extension products, in essence based on the GOOD assay designed for SNP genotyping. Ion pair reverse-phase high-performance liquid chromatography (IP-RP-HPLC) can also be used to distinguish primer extension products.
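The quantitative C:T readout can be turned into a methylation estimate with a one-line calculation. The sketch assumes the C and T signals (for example, incorporated label intensities) are on the same scale and respond equally to both nucleotides; the function name is hypothetical.

```python
def methylation_fraction(c_signal: float, t_signal: float) -> float:
    """Fraction of templates methylated at the interrogated CpG.

    After bisulfite conversion a methylated site extends with C and an
    unmethylated site extends with T, so the fraction is C / (C + T).
    """
    if c_signal < 0 or t_signal < 0:
        raise ValueError("signals must be non-negative")
    total = c_signal + t_signal
    if total == 0:
        raise ValueError("no signal detected for either nucleotide")
    return c_signal / total
```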

Base-Specific Cleavage/MALDI-TOF:

This method takes advantage of bisulfite-conversions by adding a base-specific cleavage step to enhance the information gained from the nucleotide changes. By first using in vitro transcription of the region of interest into RNA (by adding an RNA polymerase promoter site to the PCR primer in the initial amplification), RNase A can be used to cleave the RNA transcript at base-specific sites. As RNase A cleaves RNA specifically at cytosine and uracil ribonucleotides, base-specificity is achieved by incorporating cleavage-resistant dTTP when cytosine-specific (C-specific) cleavage is desired, and incorporating dCTP when uracil-specific (U-specific) cleavage is desired. The cleaved fragments can then be analyzed by MALDI-TOF. Bisulfite treatment results in either introduction/removal of cleavage sites by C-to-U conversions or shift in fragment mass by G-to-A conversions in the amplified reverse strand. C-specific cleavage will cut specifically at all methylated CpG sites. By analyzing the sizes of the resulting fragments, it is possible to determine the specific pattern of DNA methylation of CpG sites within the region.

The present inventors further contemplate analyzing the methylation status of the at least four sites including the use of methylation-dependent oligonucleotides.

Methylation dependent oligonucleotides hybridize to either the methylated form of the at least one methylation site or the unmethylated form of the at least one methylation site.

According to one embodiment, the methylation dependent oligonucleotide is a probe. In one embodiment, the probe hybridizes to the methylated site to provide a detectable signal under experimental conditions and does not hybridize to the non-methylated site to provide a detectable signal under identical experimental conditions. In another embodiment, the probe hybridizes to the non-methylated site to provide a detectable signal under experimental conditions and does not hybridize to the methylated site to provide a detectable signal under identical experimental conditions. The probes of this embodiment of this aspect of the present invention may be, for example, affixed to a solid support (e.g., arrays or beads).

According to another embodiment, the methylation dependent oligonucleotide is a primer which, when used in an amplification reaction, is capable of amplifying the target sequence when the methylation site is methylated. According to another embodiment, the methylation dependent oligonucleotide is a primer which, when used in an amplification reaction, is capable of amplifying the target sequence when the methylation site is unmethylated (see, for example, International PCT Publication No. WO2013131083, the contents of which are incorporated herein by reference).

The methylation-dependent oligonucleotide of this aspect of the present invention need not reflect the exact sequence of the target nucleic acid sequence (i.e. need not be fully complementary), but must be sufficiently complementary so as to distinguish between a methylated and non-methylated site under the particular experimental conditions. Accordingly, the sequence of the oligonucleotide typically has at least 70% homology, preferably at least 80%, 90%, 95%, 97%, 99% or 100% homology, for example over a region of at least 13 or more contiguous nucleotides with the target sequence. The conditions are selected such that hybridization of the oligonucleotide to the methylated site is favored and hybridization to the non-methylated site is minimized (and vice versa).

By way of example, hybridization of short nucleic acids (below 200 bp in length, e.g. 13-50 bp in length) can be effected by the following hybridization protocols depending on the desired stringency; (i) hybridization solution of 6×SSC and 1% SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS, 100 mg/ml denatured salmon sperm DNA and 0.1% nonfat dried milk, hybridization temperature of 1-1.5° C. below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS at 1-1.5° C. below the Tm (stringent hybridization conditions) (ii) hybridization solution of 6×SSC and 0.1% SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS, 100 mg/ml denatured salmon sperm DNA and 0.1% nonfat dried milk, hybridization temperature of 2-2.5° C. below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS at 1-1.5° C. below the Tm, final wash solution of 6×SSC, and final wash at 22° C. (stringent to moderate hybridization conditions); and (iii) hybridization solution of 6×SSC and 1% SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS, 100 mg/ml denatured salmon sperm DNA and 0.1% nonfat dried milk, hybridization temperature at 2.5-3° C. below the Tm and final wash solution of 6×SSC at 22° C. (moderate hybridization solution).

Oligonucleotides of the invention may be prepared by any of a variety of methods (see, for example, J. Sambrook et al., “Molecular Cloning: A Laboratory Manual”, 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.; “PCR Protocols: A Guide to Methods and Applications”, 1990, M. A. Innis (Ed.), Academic Press: New York, N.Y.; P. Tijssen “Hybridization with Nucleic Acid Probes—Laboratory Techniques in Biochemistry and Molecular Biology (Parts I and II)”, 1993, Elsevier Science; “PCR Strategies”, 1995, M. A. Innis (Ed.), Academic Press: New York, N.Y.; and “Short Protocols in Molecular Biology”, 2002, F. M. Ausubel (Ed.), 5th Ed., John Wiley & Sons: Secaucus, N.J.). For example, oligonucleotides may be prepared using any of a variety of chemical techniques well-known in the art, including, for example, chemical synthesis and polymerization based on a template as described, for example, in S. A. Narang et al., Meth. Enzymol. 1979, 68: 90-98; E. L. Brown et al., Meth. Enzymol. 1979, 68: 109-151; E. S. Belousov et al., Nucleic Acids Res. 1997, 25: 3440-3444; D. Guschin et al., Anal. Biochem. 1997, 250: 203-211; M. J. Blommers et al., Biochemistry, 1994, 33: 7886-7896; and K. Frenkel et al., Free Radic. Biol. Med. 1995, 19: 373-380; and U.S. Pat. No. 4,458,066.

For example, oligonucleotides may be prepared using an automated, solid-phase procedure based on the phosphoramidite approach. In such a method, each nucleotide is individually added to the 5′-end of the growing oligonucleotide chain, which is attached at the 3′-end to a solid support. The added nucleotides are in the form of trivalent 3′-phosphoramidites that are protected from polymerization by a dimethoxytrityl (or DMT) group at the 5′-position. After base-induced phosphoramidite coupling, mild oxidation to give a pentavalent phosphotriester intermediate and DMT removal provides a new site for oligonucleotide elongation. The oligonucleotides are then cleaved off the solid support, and the phosphodiester and exocyclic amino groups are deprotected with ammonium hydroxide. These syntheses may be performed on oligo synthesizers such as those commercially available from Perkin Elmer/Applied Biosystems, Inc. (Foster City, Calif.), DuPont (Wilmington, Del.) or Milligen (Bedford, Mass.). Alternatively, oligonucleotides can be custom made and ordered from a variety of commercial sources well-known in the art, including, for example, the Midland Certified Reagent Company (Midland, Tex.), ExpressGen, Inc. (Chicago, Ill.), Operon Technologies, Inc. (Huntsville, Ala.), and many others.

Purification of the oligonucleotides of the invention, where necessary or desirable, may be carried out by any of a variety of methods well-known in the art. Purification of oligonucleotides is typically performed either by native acrylamide gel electrophoresis, by anion-exchange HPLC as described, for example, by J. D. Pearson and F. E. Regnier (J. Chrom., 1983, 255: 137-149) or by reverse phase HPLC (G. D. McFarland and P. N. Borer, Nucleic Acids Res., 1979, 7: 1067-1080).

The sequence of oligonucleotides can be verified using any suitable sequencing method including, but not limited to, chemical degradation (A. M. Maxam and W. Gilbert, Methods in Enzymology, 1980, 65: 499-560), matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (U. Pieles et al., Nucleic Acids Res., 1993, 21: 3191-3196), mass spectrometry following a combination of alkaline phosphatase and exonuclease digestions (H. Wu and H. Aboleneen, Anal. Biochem., 2001, 290: 347-352), and the like.

In certain embodiments, the detection probes or amplification primers or both probes and primers are labeled with a detectable agent or moiety before being used in amplification/detection assays. In certain embodiments, the detection probes are labeled with a detectable agent. Preferably, a detectable agent is selected such that it generates a signal which can be measured and whose intensity is related (e.g., proportional) to the amount of amplification products in the sample being analyzed.

The association between the oligonucleotide and detectable agent can be covalent or non-covalent. Labeled detection probes can be prepared by incorporation of or conjugation to a detectable moiety. Labels can be attached directly to the nucleic acid sequence or indirectly (e.g., through a linker). Linkers or spacer arms of various lengths are known in the art and are commercially available, and can be selected to reduce steric hindrance, or to confer other useful or desired properties to the resulting labeled molecules (see, for example, E. S. Mansfield et al., Mol. Cell. Probes, 1995, 9: 145-156).

Methods for labeling nucleic acid molecules are well-known in the art. For a review of labeling protocols, label detection techniques, and recent developments in the field, see, for example, L. J. Kricka, Ann. Clin. Biochem. 2002, 39: 114-129; R. P. van Gijlswijk et al., Expert Rev. Mol. Diagn. 2001, 1: 81-91; and S. Joos et al., J. Biotechnol. 1994, 35: 135-153. Standard nucleic acid labeling methods include: incorporation of radioactive agents, direct attachments of fluorescent dyes (L. M. Smith et al., Nucl. Acids Res., 1985, 13: 2399-2412) or of enzymes (B. A. Connoly and O. Rider, Nucl. Acids. Res., 1985, 13: 4485-4502); chemical modifications of nucleic acid molecules making them detectable immunochemically or by other affinity reactions (T. R. Broker et al., Nucl. Acids Res. 1978, 5: 363-384; E. A. Bayer et al., Methods of Biochem. Analysis, 1980, 26: 1-45; R. Langer et al., Proc. Natl. Acad. Sci. USA, 1981, 78: 6633-6637; R. W. Richardson et al., Nucl. Acids Res. 1983, 11: 6167-6184; D. J. Brigati et al., Virol. 1983, 126: 32-50; P. Tchen et al., Proc. Natl. Acad. Sci. USA, 1984, 81: 3466-3470; J. E. Landegent et al., Exp. Cell Res. 1984, 15: 61-72; and A. H. Hopman et al., Exp. Cell Res. 1987, 169: 357-368); and enzyme-mediated labeling methods, such as random priming, nick translation, PCR and tailing with terminal transferase (for a review on enzymatic labeling, see, for example, J. Temsamani and S. Agrawal, Mol. Biotechnol. 1996, 5: 223-232). More recently developed nucleic acid labeling systems include, but are not limited to: ULS (Universal Linkage System), which is based on the reaction of mono-reactive cisplatin derivatives with the N7 position of guanine moieties in DNA (R. J. Heetebrij et al., Cytogenet. Cell. Genet. 1999, 87: 47-52), psoralen-biotin, which intercalates into nucleic acids and upon UV irradiation becomes covalently bonded to the nucleotide bases (C. Levenson et al., Methods Enzymol. 1990, 184: 577-583; and C. Pfannschmidt et al., Nucleic Acids Res. 1996, 24: 1702-1709), photoreactive azido derivatives (C. Neves et al., Bioconjugate Chem. 2000, 11: 51-55), and DNA alkylating agents (M. G. Sebestyen et al., Nat. Biotechnol. 1998, 16: 568-576).

Any of a wide variety of detectable agents can be used in the practice of the present invention. Suitable detectable agents include, but are not limited to, various ligands, radionuclides (such as, for example, 32P, 35S, 3H, 14C, 125I, 131I, and the like); fluorescent dyes (for specific exemplary fluorescent dyes, see below); chemiluminescent agents (such as, for example, acridinium esters, stabilized dioxetanes, and the like); spectrally resolvable inorganic fluorescent semiconductor nanocrystals (i.e., quantum dots), metal nanoparticles (e.g., gold, silver, copper and platinum) or nanoclusters; enzymes (such as, for example, those used in an ELISA, i.e., horseradish peroxidase, beta-galactosidase, luciferase, alkaline phosphatase); colorimetric labels (such as, for example, dyes, colloidal gold, and the like); magnetic labels (such as, for example, Dynabeads™); and biotin, digoxigenin or other haptens and proteins for which antisera or monoclonal antibodies are available.

In certain embodiments, the inventive detection probes are fluorescently labeled. Numerous known fluorescent labeling moieties of a wide variety of chemical structures and physical characteristics are suitable for use in the practice of this invention. Suitable fluorescent dyes include, but are not limited to, fluorescein and fluorescein dyes (e.g., fluorescein isothiocyanate or FITC, naphthofluorescein, 4′,5′-dichloro-2′,7′-dimethoxy-fluorescein, 6-carboxyfluorescein or FAM), carbocyanine, merocyanine, styryl dyes, oxonol dyes, phycoerythrin, erythrosin, eosin, rhodamine dyes (e.g., carboxytetramethylrhodamine or TAMRA, carboxyrhodamine 6G, carboxy-X-rhodamine (ROX), lissamine rhodamine B, rhodamine 6G, rhodamine Green, rhodamine Red, tetramethylrhodamine or TMR), coumarin and coumarin dyes (e.g., methoxycoumarin, dialkylaminocoumarin, hydroxycoumarin and aminomethylcoumarin or AMCA), Oregon Green Dyes (e.g., Oregon Green 488, Oregon Green 500, Oregon Green 514), Texas Red, Texas Red-X, Spectrum Red™, Spectrum Green™, cyanine dyes (e.g., Cy-3™, Cy-5™, Cy-3.5™, Cy-5.5™), Alexa Fluor dyes (e.g., Alexa Fluor 350, Alexa Fluor 488, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 633, Alexa Fluor 660 and Alexa Fluor 680), BODIPY dyes (e.g., BODIPY FL, BODIPY R6G, BODIPY TMR, BODIPY TR, BODIPY 530/550, BODIPY 558/568, BODIPY 564/570, BODIPY 576/589, BODIPY 581/591, BODIPY 630/650, BODIPY 650/665), IRDyes (e.g., IRD40, IRD 700, IRD 800), and the like. For more examples of suitable fluorescent dyes and methods for linking or incorporating fluorescent dyes to nucleic acid molecules see, for example, “The Handbook of Fluorescent Probes and Research Products”, 9th Ed., Molecular Probes, Inc., Eugene, Oreg. Fluorescent dyes as well as labeling kits are commercially available from, for example, Amersham Biosciences, Inc. (Piscataway, N.J.), Molecular Probes Inc. (Eugene, Oreg.), and New England Biolabs Inc. (Beverly, Mass.).

Another contemplated method of analyzing the methylation status of the sequences is by analysis of the DNA following exposure to methylation-sensitive restriction enzymes; see, for example, US Application Nos. 20130084571 and 20120003634, the contents of which are incorporated herein.

It will be appreciated that analysis of the methylation status according to methods described herein allows for the accurate determination of cellular source of a DNA molecule, even when the majority of the DNA of the sample is derived from a different cellular source. The present inventors have shown that they are able to determine the cellular source of a particular DNA even when its contribution to the total amount of DNA in the population is less than 1:1000, less than 1:5,000, 1:10,000 or even 1:100,000.

Pathological and disease conditions that involve cell death cause the release of degraded DNA from dying cells into body fluids (blood, plasma, urine, cerebrospinal fluid). Thus, the methods described herein may be used to analyze the amount of cell death of a particular cell population in those body fluids. The amount of cell death of a particular cell population can then be used to diagnose a particular pathological state (e.g. disease) or condition (e.g. trauma).

Thus, according to another aspect of the present invention there is provided a method of detecting death of a cell type or tissue in a subject comprising determining whether cell-free DNA comprised in a fluid sample of the subject is derived from the cell type or tissue, wherein the determining is effected by ascertaining the methylation status of at least four methylation sites on a continuous sequence of the cell-free DNA, the sequence comprising no more than 300 nucleotides, wherein a methylation status of each of the at least four methylation sites on the continuous sequence of the DNA characteristic of the cell type or tissue is indicative of death of the cell type or tissue.

It will be appreciated that death of a particular cell type may be associated with a pathological state—e.g. disease or trauma.

The monitoring of the death of a particular cell type may also be used for monitoring the efficiency of a therapeutic regime expected to effect cell death of a specific cell type.

The determination of death of a specific cell type may also be used in the clinical or scientific study of various mechanisms in healthy or diseased subjects.

Thus, for example, measurement of pancreatic beta cell death is important in cases of diabetes, hyperinsulinism and islet cell tumors, in monitoring beta cell survival after islet transplantation, in determining the efficacy of various treatment regimes used to protect beta cells from death, and in determining the efficacy of treatments aimed at causing islet cell death in islet cell tumors. Similarly, the method allows the identification and quantification of DNA derived from dead kidney cells (diagnostic of kidney failure); dead neurons (diagnostic of traumatic brain injury, amyotrophic lateral sclerosis (ALS), stroke, Alzheimer's disease, Parkinson's disease or brain tumors, with or without treatment); dead pancreatic acinar cells (diagnostic of pancreatic cancer or pancreatitis); dead lung cells (diagnostic of lung pathologies including lung cancer); dead adipocytes (diagnostic of altered fat turnover); dead hepatocytes (indicative of liver failure, liver toxicity or liver cancer); dead cardiomyocytes (indicative of cardiac disease, or graft failure in the case of cardiac transplantation); dead skeletal muscle cells (diagnostic of muscle injury and myopathies); and dead oligodendrocytes (indicative of relapsing multiple sclerosis, white matter damage in amyotrophic lateral sclerosis, or glioblastoma).

According to one embodiment, the sequences analyzed have methylation patterns which characterize the normal cell of interest and are not methylation patterns characterizing a diseased cell (for example, they are not methylation patterns characterizing cancer cells of a specific type). Exemplary sequences which may be analyzed are comprised in sequences set forth in SEQ ID NOs: 1-1484. These sequences are further described herein above.

As used herein, the term “diagnosing” refers to determining the presence of a disease, classifying a disease, determining a severity of the disease (grade or stage), monitoring disease progression and response to therapy, forecasting an outcome of the disease and/or prospects of recovery.

The method comprises quantifying the amount of cell-free DNA which is comprised in a fluid sample (e.g. a blood sample) of the subject which is derived from a cell type or tissue. When the amount of cell-free DNA derived from the cell type or tissue is above a predetermined level, it is indicative that there is a predetermined level of cell death. When the level of cell death is above a predetermined level, it is indicative that the subject has the disease or pathological state. Determining the predetermined level may be carried out by analyzing the amount of cell-free DNA present in a sample derived from a subject known not to have the disease/pathological state. If the level of the cell-free DNA derived from a cell type or tissue associated with the disease in the test sample is statistically significantly higher than the level of cell-free DNA derived from the same cell type or tissue in the sample obtained from the healthy (non-diseased) subject, it is indicative that the subject has the disease. Alternatively, or additionally, determining the predetermined level may be carried out by analyzing the amount of cell-free DNA present in a sample derived from a subject known to have the disease. If the level of the cell-free DNA derived from a cell type or tissue associated with the disease in the test sample is statistically significantly similar to the level of the cell-free DNA derived from a cell type or tissue associated with the disease in the sample obtained from the diseased subject, it is indicative that the subject has the disease.
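By way of a non-limiting illustration, the comparison against a predetermined level derived from healthy subjects may be sketched as follows. The rule used here (mean of the control values plus two sample standard deviations), the function names and the numerical values are assumptions made for illustration only; the present description does not prescribe a particular rule for setting the predetermined level.

```python
from statistics import mean, stdev


def cutoff_from_controls(control_levels, n_sd=2.0):
    """Hypothetical rule: cutoff = mean + n_sd * sample SD of healthy-control levels."""
    return mean(control_levels) + n_sd * stdev(control_levels)


def exceeds_cutoff(test_level, control_levels, n_sd=2.0):
    """Return True when the test sample lies above the control-derived cutoff."""
    return test_level > cutoff_from_controls(control_levels, n_sd)


# Illustrative values only (arbitrary units of tissue-specific cfDNA);
# these are not measurements from the examples herein.
healthy_levels = [10.0, 12.0, 8.0, 11.0, 9.0]
cutoff = cutoff_from_controls(healthy_levels)
```

A test sample would then be flagged by calling `exceeds_cutoff(level, healthy_levels)`; a formal statistical test across groups may be substituted for this single-sample rule.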

The severity of disease may be determined by quantifying the amount of DNA molecules having the specific methylation pattern of a cell population associated with the disease. Quantifying the amount of DNA molecules having the specific methylation pattern of a target tissue may be achieved using a calibration curve produced by using known and varying numbers of cells from the target tissue.
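By way of a non-limiting illustration, the use of a calibration curve may be sketched as follows. The sketch assumes a linear relationship between the number of target-tissue cells used to prepare each standard and the measured signal, and fits it by ordinary least squares; the linearity assumption, the function names and the numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_cells(signal, slope, intercept):
    """Invert the calibration line to estimate a cell-equivalent amount."""
    return (signal - intercept) / slope


# Illustrative standards: known cell numbers and the signal measured for each.
known_cells = [0, 100, 200, 400]
measured_signal = [5.0, 55.0, 105.0, 205.0]
slope, intercept = fit_line(known_cells, measured_signal)
estimated = estimate_cells(155.0, slope, intercept)
```

In practice the standards would span the range expected in test samples, and a non-linear fit may be more appropriate if the assay saturates.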

According to one embodiment, the method comprises determining the ratio of the amount of cell free DNA derived from a cell of interest in the sample: amount of overall cell free DNA.

According to still another embodiment, the method comprises determining the ratio of the amount of cell free DNA derived from a cell of interest in the sample: amount of cell free DNA derived from a second cell of interest.
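By way of a non-limiting illustration, the two ratios described above may be computed as follows. The function names and the molecule counts are hypothetical and serve only to show the arithmetic.

```python
def fraction_of_total(target_molecules, total_molecules):
    """Ratio of cfDNA from the cell of interest to overall cfDNA."""
    if total_molecules <= 0:
        raise ValueError("total_molecules must be positive")
    return target_molecules / total_molecules


def ratio_between_cell_types(first_molecules, second_molecules):
    """Ratio of cfDNA from a first cell of interest to cfDNA from a second."""
    if second_molecules <= 0:
        raise ValueError("second_molecules must be positive")
    return first_molecules / second_molecules


# Illustrative counts only: 5 target molecules among 5,000 sequenced molecules
# corresponds to a contribution of 1:1000.
target_fraction = fraction_of_total(5, 5000)
cell_type_ratio = ratio_between_cell_types(30, 10)
```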

The methods described herein may also be used to determine the efficacy of a therapeutic agent or treatment, wherein when the amount of DNA associated with a cell population associated with the disease is decreased following administration of the therapeutic agent, it is indicative that the agent or treatment is therapeutic.

Kits

Any of the components described herein may be comprised in a kit. In a non-limiting example the kit comprises at least one primer pair capable of amplifying a DNA sequence whose methylation status is indicative of a disease, as described herein above. According to one embodiment, the primers further comprise barcode sequences and/or sequences which allow for downstream sequencing, as further described herein above. Such primer sequences include for example those set forth in SEQ ID NOs: 1485-1496. According to one embodiment, each primer of the primer pair is comprised in a suitable container. According to another embodiment, the kit comprises two primer pairs capable of amplifying two different DNA sequences whose methylation status is indicative of a disease, as described herein above. According to another embodiment, the kit comprises three primer pairs capable of amplifying three different DNA sequences whose methylation status is indicative of a disease, as described herein above. According to another embodiment, the kit comprises four primer pairs capable of amplifying four different DNA sequences whose methylation status is indicative of a disease, as described herein above. According to another embodiment, the kit comprises five or more primer pairs capable of amplifying the five or more different DNA sequences whose methylation status is indicative of a disease, as described herein above.

In another non-limiting example the kit comprises oligonucleotides which are capable of detecting the methylation status of at least four methylation sites in a nucleic acid sequence, the nucleic acid sequence being no longer than 300 base pairs and comprising at least four methylation sites which are differentially methylated in a first cell of interest with respect to a second cell which is non-identical to the first cell of interest. The kit may comprise one oligonucleotide which is capable of detecting the methylation status of the at least four methylation sites in a nucleic acid sequence. The kit may comprise two oligonucleotides which, in combination are capable of detecting the methylation status of the at least four methylation sites in a nucleic acid sequence. The kit may comprise three oligonucleotides which, in combination are capable of detecting the methylation status of the at least four methylation sites in a nucleic acid sequence. The kit may comprise four oligonucleotides which, in combination are capable of detecting the methylation status of the at least four methylation sites in a nucleic acid sequence. The oligonucleotides of this aspect of the present invention may be labeled with a detectable moiety as further described herein above.

Additional components that may be included in any of the above described kits include at least one of the following components: bisulfite (and other reagents necessary for the bisulfite reaction), a polymerase enzyme, reagents for purification of DNA, MgCl2. The kit may also comprise reaction components for sequencing the amplified or non-amplified sequences.

The kits may also comprise DNA sequences which serve as controls. Thus, for example, the kit may comprise a DNA having the same sequence as the amplified sequence derived from a healthy subject (to serve as a negative control) and/or a DNA having the same sequence as the amplified sequence derived from a subject known to have the disease which is being investigated (to serve as a positive control).

In addition, the kits may comprise known quantities of DNA such that calibration and quantification of the test DNA may be carried out.

The containers of the kits will generally include at least one vial, test tube, flask, bottle, syringe or other containers, into which a component may be placed, and preferably, suitably aliquoted. Where there is more than one component in the kit, the kit also will generally contain a second, third or other additional container into which the additional components may be separately placed. However, various combinations of components may be comprised in a container.

When the components of the kit are provided in one or more liquid solutions, the liquid solution can be an aqueous solution. However, the components of the kit may be provided as dried powder(s). When reagents and/or components are provided as a dry powder, the powder can be reconstituted by the addition of a suitable solvent.

A kit will preferably include instructions for employing the kit components as well as the use of any other reagent not included in the kit. Instructions may include variations that can be implemented.

It is expected that during the life of a patent maturing from this application many relevant sequencing technologies will be developed (including those that will be able to determine methylation status without bisulfite treatment) and the scope of the term sequencing is intended to include all such new technologies a priori.

As used herein the term “about” refers to ±10%.

The terms “comprises”, “comprising”, “includes”, “including”, “having” and their conjugates mean “including but not limited to”.

The term “consisting of” means “including and limited to”.

The term “consisting essentially of” means that the composition, method or structure may include additional ingredients, steps and/or parts, but only if the additional ingredients, steps and/or parts do not materially alter the basic and novel characteristics of the claimed composition, method or structure.

As used herein, the singular form “a”, “an” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a compound” or “at least one compound” may include a plurality of compounds, including mixtures thereof.

Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

As used herein the term “method” refers to manners, means, techniques and procedures for accomplishing a given task including, but not limited to, those manners, means, techniques and procedures either known to, or readily developed from known manners, means, techniques and procedures by practitioners of the chemical, pharmacological, biological, biochemical and medical arts.

As used herein, the term “treating” includes abrogating, substantially inhibiting, slowing or reversing the progression of a condition, substantially ameliorating clinical or aesthetical symptoms of a condition or substantially preventing the appearance of clinical or aesthetical symptoms of a condition.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.

Various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below find experimental support in the following examples.

EXAMPLES

Reference is now made to the following examples, which together with the above descriptions illustrate some embodiments of the invention in a non-limiting fashion.

Generally, the nomenclature used herein and the laboratory procedures utilized in the present invention include molecular, biochemical, microbiological and recombinant DNA techniques. Such techniques are thoroughly explained in the literature. See, for example, “Molecular Cloning: A Laboratory Manual” Sambrook et al., (1989); “Current Protocols in Molecular Biology” Volumes I-III Ausubel, R. M., ed. (1994); Ausubel et al., “Current Protocols in Molecular Biology”, John Wiley and Sons, Baltimore, Md. (1989); Perbal, “A Practical Guide to Molecular Cloning”, John Wiley & Sons, New York (1988); Watson et al., “Recombinant DNA”, Scientific American Books, New York; Birren et al. (eds) “Genome Analysis: A Laboratory Manual Series”, Vols. 1-4, Cold Spring Harbor Laboratory Press, New York (1998); methodologies as set forth in U.S. Pat. Nos. 4,666,828; 4,683,202; 4,801,531; 5,192,659 and 5,272,057; “Cell Biology: A Laboratory Handbook”, Volumes I-III Cellis, J. E., ed. (1994); “Culture of Animal Cells—A Manual of Basic Technique” by Freshney, Wiley-Liss, N.Y. (1994), Third Edition; “Current Protocols in Immunology” Volumes I-III Coligan J. E., ed. (1994); Stites et al. (eds), “Basic and Clinical Immunology” (8th Edition), Appleton & Lange, Norwalk, Conn. (1994); Mishell and Shiigi (eds), “Selected Methods in Cellular Immunology”, W. H. Freeman and Co., New York (1980); available immunoassays are extensively described in the patent and scientific literature, see, for example, U.S. Pat. Nos. 3,791,932; 3,839,153; 3,850,752; 3,850,578; 3,853,987; 3,867,517; 3,879,262; 3,901,654; 3,935,074; 3,984,533; 3,996,345; 4,034,074; 4,098,876; 4,879,219; 5,011,771 and 5,281,521; “Oligonucleotide Synthesis” Gait, M. J., ed. (1984); “Nucleic Acid Hybridization” Hames, B. D., and Higgins S. J., eds. (1985); “Transcription and Translation” Hames, B. D., and Higgins S. J., eds. (1984); “Animal Cell Culture” Freshney, R. I., ed. (1986); “Immobilized Cells and Enzymes” IRL Press, (1986); “A Practical Guide to Molecular Cloning” Perbal, B., (1984) and “Methods in Enzymology” Vol. 1-317, Academic Press; “PCR Protocols: A Guide To Methods And Applications”, Academic Press, San Diego, Calif. (1990); Marshak et al., “Strategies for Protein Purification and Characterization—A Laboratory Course Manual” CSHL Press (1996); all of which are incorporated by reference as if fully set forth herein. Other general references are provided throughout this document. The procedures therein are believed to be well known in the art and are provided for the convenience of the reader. All the information contained therein is incorporated herein by reference.

Materials and Methods

Patients: All clinical studies were approved by the relevant local ethics committees and informed consent was obtained from all subjects or their legal guardians prior to blood sampling.

1. Recently diagnosed T1D patients: plasma was prepared from 11 patients (ages 4-20, mean 9.5 years) who had been diagnosed with T1D 1-4 months earlier.

2. Islet graft recipients: Patients were 44-57 years old, with T1D duration of 10-36 years and poorly controlled blood glucose levels (HbA1C 6.4-10). Anti-rejection therapy included Alemtuzumab, Etanercept and Anakinra. Maintenance therapy included tacrolimus and MMF.

3. MS/NMO patients: MS and NMO patients were diagnosed according to the 2010 McDonald criteria (1) and the NMO diagnostic criteria (2), respectively. Patient characteristics were as follows: Relapsing-remitting MS patients, n=49, 74% females, average age=36±12.5, range 18-68 years, disease duration 4±4.5 years, range 0-14 years. Expanded Disability Status Scale (EDSS)=2.8±1.8, range 1-7.5. None of the patients had received steroid treatment for 2 months prior to testing. In patients in relapse, blood was drawn prior to IV steroid therapy. At the time of sampling one patient was treated with Copaxone and 4 with Imuran.

4. Cardiac arrest patients: samples were collected at the intensive care unit (ICU) at Uppsala University Hospital, Sweden. Unconscious patients with cardiac arrest were resuscitated with restoration of spontaneous circulation (ROSC). Hypothermia treatment to a body temperature of 32-34° C. for 24 h, ventilation, and pharmacologic support were administered immediately after resuscitation as described (Mortberg E et al., Resuscitation 2011; 82:26-31). Patients were defined as comatose if they were (i) not awake, (ii) not following any commands, and (iii) not responding to any stimuli. All patients received an arterial line in the radial or femoral artery for blood sampling. Serial blood samples were collected, starting as soon as possible in the emergency phase and continuing at 24, 48, 72, 96 and 108 hours after cardiac arrest. Serum aliquots were frozen at −70° C. until analysis.

5. Traumatic brain injury: 5 patients (3 males and 2 females, all Caucasian, average age 39) with isolated severe traumatic brain injury (TBI) were enrolled from a clinical study at the Neurointensive Care Unit (NICU) at the Sahlgrenska University Hospital, Gothenburg, Sweden. All patients had a mixture of focal contusions and general oedema. All had severe TBI according to the following criteria: 1) Reaction Level Scale (RLS) 4, corresponding to a score of 8 on the Glasgow Coma Scale (3=no response, 15=awake) (3); 2) need of ventilator treatment; 3) monitoring of intracranial pressure (ICP). Venous blood samples were taken on serial days after trauma. After clinical and radiologic evaluations, the patients underwent neurosurgical intervention within hours after admission, to receive an indwelling ventricular catheter for ICP monitoring/therapeutic CSF drainage. When appropriate, space-occupying lesions like hemorrhages and contusions were surgically removed. Patients were then treated in accordance with a standardized protocol, “the Lund concept”, with the aim of maintaining cerebral perfusion pressure above 60 mm Hg and intracranial pressure below 20 mm Hg (4). Data collected included demographic and clinical variables such as age, sex and time of injury. Physiologic and laboratory variables were continuously recorded throughout the study period and concomitantly adjusted to be kept within the following limits: Hemoglobin >120 g/L, serum sodium >135 to <150 mmol/L, serum potassium >4.0 to <5.0 mmol/L, serum albumin >35 to <50 g/L, core temperature 37±0.5° C., mean arterial blood pressure (MABP) between 70 and 100 mm Hg, ICP <20 mm Hg, cerebral perfusion pressure CPP (MABP-ICP) >60 mm Hg, PO2 >12 to <18 kPa, PCO2 around 4.5 kPa and normalized pH. Blood glucose was kept between 4 and 6 mmol/L according to NICU routine. No patient received steroids.

6. Pancreatic cancer and chronic pancreatitis: Plasma was prepared from 42 patients with pathologically confirmed pancreatic adenocarcinoma (28 males and 14 females; average age 68, range 41-87) and 10 patients with chronic pancreatitis. The American Joint Committee on Cancer TNM staging of pancreatic cancer (2010) was used. At the time of plasma collection 29 patients had localized disease (4 at Stage I and 25 at Stage II, all pre-operative) and 13 patients had metastatic disease, Stage IV.

7. Healthy Controls: A total of 40 healthy individuals (50% females; ages 22-60) volunteered to participate in the study as unpaid healthy controls. All denied having any signs or symptoms related to the disease state studied.

DNA processing: Cell-free DNA was isolated from plasma or serum using a kit (Qiaquick, Qiagen), and treated with bisulfite (Zymo Research). Bisulfite-treated DNA was PCR-amplified using primers specific for bisulfite-treated DNA but independent of methylation status at the monitored CpG sites. Primers were bar-coded, allowing samples from different individuals to be mixed when sequencing the products using MiSeq (Illumina).
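By way of a non-limiting illustration, assignment of pooled reads back to individual samples by barcode may be sketched as follows. The sketch assumes a fixed-length barcode at the start of each read and exact matching; the barcode sequences, sample names and reads are hypothetical and are not the barcodes used in the examples.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by the sample whose barcode prefixes the read.

    Assumes all barcodes share one length and sit at the 5' end of the read.
    Reads with an unrecognized prefix are collected under "unassigned".
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len])
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            # Strip the barcode so only the amplicon sequence is kept.
            by_sample[sample].append(read[barcode_len:])
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
pooled_reads = ["ACGTTTGGA", "TGCATTGGA", "ACGTTTGGG", "GGGGTTTTT"]
assigned = demultiplex(pooled_reads, barcodes)
```

Production pipelines typically tolerate a limited number of barcode mismatches; exact matching is used here only to keep the sketch short.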

Methylome of pancreatic acinar and ductal cells: Duct and acinar cells were isolated from dissociated cadaveric human pancreata as previously described. Live cells were stained with cell surface markers and sorted. Genomic DNA was isolated using phenol/chloroform and processed for Illumina 450k arrays per manufacturer instructions.

Insulin

    • Primer sequences: SEQ ID NOs: 1485 and 1486

MBP3

    • Primer sequences: SEQ ID NOs: 1495 and 1496

CG10809560 (WM1)

    • Primer sequences: SEQ ID NOs: 1491 and 1492

CG09787504

    • Primer sequences: SEQ ID NOs: 1493 and 1494

Pancreas (CUX2)

    • Primer sequences: SEQ ID NOs: 1487 and 1488

Pancreas (REG1A)

    • Primer sequences: SEQ ID NOs: 1489 and 1490

Statistical analysis: To assess the significance of differences between groups, a two-tailed Mann-Whitney test was used, based on the values of unmethylated tissue-specific DNA in each patient.
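A two-tailed Mann-Whitney comparison of this kind can be sketched as follows. This is an illustration only: it uses the large-sample normal approximation without tie or continuity correction, whereas a statistics package would normally compute an exact p-value for small groups. The function name `mann_whitney_u` and all sample values are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, two-tailed p) for samples x and y.

    U counts the pairs (xi, yj) with xi > yj, with ties counted as 0.5.
    The p-value uses the normal approximation (no tie or continuity correction).
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    n, m = len(x), len(y)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_two_tailed

# Hypothetical values of unmethylated tissue-specific DNA (ng/ml).
patients = [1.1, 2.5, 4.0, 8.6]
controls = [0.0, 0.01, 0.02, 0.05, 0.06]
u_stat, p_value = mann_whitney_u(patients, controls)
```

Here every patient value exceeds every control value, so U takes its maximum of 4 x 5 = 20.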

Example 1 Identification of Tissue-Specific Methylation Markers

The present inventors identified tissue-specific DNA methylation markers, distinguishing individual tissues or cell types from other tissues. Particular attention was given to markers that differ between a tissue of interest and hematopoietic cells, which are thought to be the main contributor of cfDNA and therefore the major potential source of noise in the system. They analyzed publicly available methylomes (mostly Illumina 450k array data from The Cancer Genome Atlas [TCGA] and Gene Expression Omnibus [GEO]) to identify individual CpG dinucleotides with differential methylation patterns, unmethylated specifically in one tissue and methylated elsewhere (see schematic of procedure, FIGS. 5A-B).

The Illumina arrays provide information on the methylation status of individual CpG dinucleotides. The discriminatory power of each site is limited, since it can be randomly methylated or unmethylated in a small fraction of molecules from tissues where it is typically unmethylated or methylated, respectively. To increase the signal to noise ratio of the assay, the present inventors took advantage of the regional nature of DNA methylation. They defined an “expanded window” of 4-9 CpG sites adjacent to the original CpG marker site, reasoning that the chances are small for accidental methylation or demethylation of multiple adjacent cytosines in the same molecule. To determine the status of methylation of these expanded windows, they obtained DNA from different human tissues, and treated it with bisulfite to convert unmethylated cytosines to uracils. Short fragments containing the signature CpG site and multiple adjacent CpGs were then PCR-amplified, and multiple molecules from the PCR product were sequenced using Illumina MiSeq.
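The per-molecule readout described above can be illustrated with a short sketch. After bisulfite conversion and PCR, an unmethylated cytosine reads as T, whereas a methylated cytosine still reads as C. The sketch assumes that each read covers the amplicon exactly, without indels or mismatches outside the CpG sites; the reference sequence, the reads and the function names are hypothetical.

```python
def cpg_positions(reference):
    """Indices of the C in every CpG dinucleotide of the unconverted reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]

def methylation_pattern(read, reference):
    """Per-CpG status of one bisulfite read: M (C retained), U (C read as T), ? (other)."""
    if len(read) != len(reference):
        raise ValueError("sketch assumes reads span the amplicon without indels")
    calls = []
    for i in cpg_positions(reference):
        if read[i] == "C":
            calls.append("M")
        elif read[i] == "T":
            calls.append("U")
        else:
            calls.append("?")
    return "".join(calls)

def fraction_fully_unmethylated(reads, reference):
    """Fraction of reads in which every CpG of the window is unmethylated."""
    if not reads:
        return 0.0
    target = "U" * len(cpg_positions(reference))
    hits = sum(1 for r in reads if methylation_pattern(r, reference) == target)
    return hits / len(reads)

# Hypothetical 17-nt window with four CpG sites.
reference = "ATCGGACGTTCGAACGT"
reads = [
    "ATTGGATGTTTGAATGT",  # all four CpGs unmethylated
    "ATCGGATGTTCGAATGT",  # mixed pattern
    "ATCGGACGTTCGAACGT",  # all four CpGs methylated
    "ATTGGATGTTTGAATGT",  # all four CpGs unmethylated
]
patterns = [methylation_pattern(r, reference) for r in reads]
fraction = fraction_fully_unmethylated(reads, reference)
```

Only molecules scored as unmethylated at every site of the window count toward the tissue signal in this sketch; the mixed read is excluded.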

As an alternative approach to the comparisons between Illumina methylome arrays, in some cases the present inventors selected and validated tissue-specific markers based on promoters of known tissue-specific genes (which might not be adequately represented in the Illumina arrays) (FIGS. 5A-B). As shown in the examples below, scoring for DNA molecules in which multiple adjacent CpG sites share the same tissue-specific methylation pattern gave a dramatically higher discriminatory power between the tissue of interest and other tissues, compared with the information content of individual CpG sites. Thus, the present inventors have defined short sequences of DNA, containing 4-9 CpG sites, whose combined methylation status constitutes an epigenetic signature unique to a tissue of interest relative to blood cells and other tissues.

Example 2 Presence of Unmethylated Insulin Gene Promoter in the Circulation of T1D Patients

To detect cfDNA derived from beta-cells, the insulin gene promoter was used as a beta-cell specific methylation marker. Previous studies seeking to identify DNA derived from beta-cells in peripheral blood samples have utilized methylation-specific PCR based on the methylation status of 2-3 CpG dinucleotides in the insulin promoter (J et al., Diabetes 62, 1676-1680 (2013)). However, the insulin promoter contains additional CpG sites in close proximity, which can be used to improve the distinction between DNA of beta-cells and other tissues (FIG. 1A). To test this concept, a 160 bp fragment of the insulin gene promoter was amplified from bisulfite-treated DNA obtained from multiple tissues, and the product was sequenced to determine the methylation status of each CpG in each tissue. As shown in FIG. 1B, each CpG was unmethylated in 90-95% of the DNA molecules from human beta-cells, and in 5-15% of the DNA molecules from other tissues. However, when this information was combined and the fraction of DNA molecules in which all 6 CpG sites were unmethylated was calculated, the difference between beta-cells and all other tissues became dramatically larger: while ˜80% of DNA molecules from beta-cells were fully unmethylated, <0.01% of the molecules from any other tissue were fully unmethylated. Thus a stretch of 6 adjacent unmethylated CpG sites in the insulin gene promoter (comprised in SEQ ID NO: 1241) robustly distinguishes beta-cells from other tissues with a signal to noise ratio close to 10,000:1.
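The gain from combining sites can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that the sites are unmethylated independently of one another in non-beta-cell tissues; the per-site and per-molecule percentages are those reported above, while the independence assumption and the variable names are not from the source.

```python
N_SITES = 6

# Per-site upper bound reported for non-beta-cell tissues (15% unmethylated).
per_site_background = 0.15

# Under the independence assumption, the chance that all six sites are
# unmethylated in a single background molecule is the product of the per-site values.
background_all_sites = per_site_background ** N_SITES

# Reported per-molecule values: ~80% in beta-cells, <0.01% elsewhere.
beta_cell_fully_unmethylated = 0.80
other_tissue_upper_bound = 0.0001

# Signal to noise ratio implied by the reported per-molecule values.
signal_to_noise = beta_cell_fully_unmethylated / other_tissue_upper_bound
```

Under this assumption the expected background is about 0.001% of molecules, which is in line with the reported bound of <0.01%, and the reported per-molecule values give a ratio of at least 8,000:1. The same independence calculation for beta-cells (0.90 to 0.95 per site) would give 53-74%, somewhat below the ˜80% observed, which would be expected if adjacent sites on the same molecule tend to share their methylation status.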

This information was then used to look for beta-cell derived cfDNA in the circulation of T1D patients. Plasma DNA from patients was treated with bisulfite, PCR-amplified and sequenced to determine the fraction of molecules containing fully unmethylated insulin promoter DNA. The fraction obtained was multiplied by the concentration of cfDNA measured in each sample, to obtain a value of beta-cell derived DNA (ng/ml) circulating in the blood of each patient (FIGS. 5A-B). The cfDNA of healthy volunteers (n=25) had an extremely low frequency of fully unmethylated insulin gene promoter molecules (up to 0.12% of circulating insulin promoter fragments). When this frequency was multiplied by the total amount of cfDNA in each individual, it was found that <0.06 ng/ml of circulating DNA was derived from beta-cells (equivalent to 10 genomes), consistent with a very low rate of beta-cell turnover in healthy adults (FIG. 1C). Plasma from T1D patients (n=11), sampled 2-16 weeks after diagnosis, showed a clear signal of unmethylated insulin promoter DNA in cfDNA (1.06-8.6 ng beta-cell DNA per 1 ml of plasma), indicative of ongoing autoimmune destruction of beta-cells (FIG. 1C).
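The conversion from a sequencing fraction to an absolute amount can be sketched as follows. The total cfDNA concentration of 50 ng/ml is a hypothetical input chosen for illustration, and the mass of about 6.6 pg per diploid human genome is a commonly used approximation rather than a value stated in this specification.

```python
# Approximate mass of one diploid human genome in picograms (assumption).
PG_PER_DIPLOID_GENOME = 6.6

def tissue_cfdna_ng_per_ml(fraction_with_signature, total_cfdna_ng_per_ml):
    """Fraction of molecules with the tissue signature times total cfDNA (ng/ml)."""
    return fraction_with_signature * total_cfdna_ng_per_ml

def genome_equivalents_per_ml(ng_per_ml):
    """Convert ng/ml of DNA to diploid genome equivalents per ml."""
    return ng_per_ml * 1000.0 / PG_PER_DIPLOID_GENOME

# 0.12% fully unmethylated insulin promoter molecules (upper value in healthy
# volunteers) and a hypothetical total cfDNA concentration of 50 ng/ml.
beta_cell_ng_per_ml = tissue_cfdna_ng_per_ml(0.0012, 50.0)
beta_cell_genomes_per_ml = genome_equivalents_per_ml(beta_cell_ng_per_ml)
```

With these inputs the result is about 0.06 ng/ml, or roughly 9 genome equivalents per ml, in line with the "equivalent to 10 genomes" figure given above.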

Plasma samples taken from long-time T1D patients transplanted with cadaveric allogeneic islets and treated with immune suppressants were also studied. As shown in FIG. 1D, the plasma of all patients had a high signal (unmethylated insulin DNA) 1-2 hours after transplantation, which dramatically declined in the hours and days that followed. The extensive loss of grafted beta-cells immediately after transplantation, potentially resulting from acute ischemia, is consistent with a previous imaging study of a transplanted patient. In most patients, signals clearly above background were detected at 7 days and even 1 month after transplantation, suggesting continuous, low level loss of beta-cells despite immune suppression.

To confirm that the combined methylation pattern of multiple CpG sites at the insulin gene promoter was necessary to detect beta-cell derived DNA in the circulation, the methylation status of each individual CpG in the plasma of healthy individuals and recently diagnosed T1D patients was examined. No individual CpG showed a different pattern between the plasma of healthy controls and that of T1D patients (unmethylated in ˜15% of cfDNA molecules). However, collectively the 6 CpG sites yielded a clear signal in the plasma of T1D patients that was absent in healthy controls (FIGS. 6A-B).

These results support the very high sensitivity and specificity of this next-generation sequencing (NGS)-based method for the detection of cfDNA derived from specific tissues. With respect to T1D, the complete separation between the signal in healthy controls and recently diagnosed patients compared favorably with previous reports which demonstrated a significant signal overlap between healthy controls and diabetic patients. This suggests that it may be possible to use this assay to identify beta cell death prior to clinical diagnosis as well as in additional settings of interest, such as monitoring of efficacy of immune suppression used to prevent destruction of transplanted beta-cells.

Example 3

Identification of Oligodendrocyte-Derived cfDNA in Multiple Sclerosis

Non-invasive detection of brain cell death is particularly challenging. In theory, brain-specific methylation patterns can be used to identify brain-derived cfDNA. The present inventors looked for evidence of oligodendrocyte/glial DNA circulating in the blood of patients with multiple sclerosis (MS) and neuromyelitis optica (NMO), autoimmune diseases in which myelin-producing oligodendrocytes in the white matter and astrocytes are destroyed. The published methylome of normal human white matter was analyzed, and clusters of adjacent CpG sites were identified in the 3′ UTR of Myelin Basic Protein (termed here MBP3) and around an un-annotated locus (CG10809560 in the Illumina array, which was termed WM1 for White Matter 1) which were unmethylated selectively in oligodendrocytes (FIG. 2A). As with the insulin gene promoter, individual CpGs in these clusters showed a moderate signal to noise ratio: they were unmethylated in 60-85% of DNA molecules derived from sources rich for oligodendrocytes (glial preps, white matter and whole brain), and in 2-35% of DNA from other tissues (FIGS. 7A-E and 8A-E). Combining all CpGs at the MBP3 and WM1 loci greatly increased the discrimination between DNA enriched for oligodendrocytes and DNA from other sources including blood (FIGS. 7A-E and 8A-E). Thus DNA from the MBP3 (comprised in SEQ ID NO: 1248) or WM1 (comprised in SEQ ID NO: 1247) loci unmethylated in all adjacent CpG sites can serve as an exclusive marker of oligodendrocytes.

Healthy individuals (n=19) had negligible levels of unmethylated MBP3 or WM1 in their plasma, suggesting minimal basal turnover of oligodendrocytes (FIG. 2B). Most stable MS patients (n=30) had no or very low signal. However, most patients during disease exacerbation (a relapse documented both clinically and using brain MRI close to the time of sampling, n=19) displayed in their plasma unmethylated DNA of either MBP3, WM1 or both (FIG. 2B). This observation is consistent with the notion that short-lived circulating unmethylated MBP3 or WM1 DNA reflects acute oligodendrocyte cell death. Initial analysis did not reveal clinical correlates to the lack of signal in some relapsing patients. No correlation was observed between the signal in blood and age, sex, EDSS (Expanded Disability Status Scale) or disease duration. These results indicate that acute autoimmune destruction of oligodendrocytes can be manifested as increased circulating levels of fully unmethylated DNA fragments from the MBP3 or WM1 loci, opening the way to develop a sensitive test for the diagnosis and monitoring of demyelinating diseases. Additional methylation markers of oligodendrocytes can be developed to further increase the specificity and sensitivity of the assay.

Example 4

Identification of Brain-Derived cfDNA after Acute Brain Damage

To obtain a more general marker for brain injury, the Illumina arrays were scanned for loci whose methylation status distinguished brain DNA from other tissues. A cluster of 9 CpG sites around locus CG09787504 (termed here Brain1; comprised in SEQ ID NO: 1251) was fully unmethylated in 70% of DNA from various sources of brain tissue (enriched for either neurons or glia), and in <5% of DNA molecules from other tissues (likely reflecting DNA of peripheral neurons present in these tissues). Importantly, <0.03% of molecules in blood were unmethylated, providing a >2000 fold difference in methylation of this locus between brain and blood (FIG. 3A and FIGS. 9A-E).

Healthy individuals had extremely low levels of fully unmethylated Brain1 in the plasma (FIG. 3B). This low baseline may reflect either neuronal turnover below the assay's sensitivity limit or an alternative mechanism for the clearance of DNA from dying brain cells. Plasma samples were examined from patients in two settings of brain damage, both known to involve neuronal injury in combination with disruption of the blood-brain barrier. Strikingly, patients (n=10) sampled at multiple time points after cardiac arrest with documented ischemic brain damage showed high levels of unmethylated Brain1 in plasma (FIG. 3C). Similarly, patients (n=5) hospitalized in an intensive care unit for neurotrauma after severe traumatic brain injury (TBI) had elevated unmethylated Brain1 in plasma (FIG. 3D). Both sets of results are consistent with circulating DNA fragments derived from dead brain cells (neurons and/or glia) in these patients. The amount and temporal patterns of brain-derived cfDNA varied between patients. In the group with cardiac arrest, the strongest signals were observed in the first time point, shortly after resuscitation, with a decline in subsequent days in most patients. In the group of patients with TBI, a more delayed pattern of brain-derived cfDNA was observed. These findings indicate that brain-specific DNA as well as oligodendrocyte-specific DNA can be identified in the circulation of patients with neuroinflammatory, traumatic and ischemic brain pathologies, based on unique methylation markers.

Example 5

Identification of Exocrine Pancreas-Derived cfDNA in Pancreatic Cancer and Pancreatitis

The present inventors tested whether the approach can be used to detect cfDNA in the context of cancer. While tumors present extensive changes in the methylome compared with normal tissue, the majority of tissue-specific methylation sites are thought to remain intact in tumors. Thus cell death in tumors should give rise to cfDNA carrying the normal methylation patterns of the tumor's tissue of origin. Pancreatic ductal adenocarcinoma is thought to originate from either acinar or duct cells in the exocrine pancreas. Antibodies were used to FACS-purify duct and acinar cells from cadaveric human material, and their methylomes were obtained using Illumina 450k arrays (unpublished results). Analysis of these data revealed multiple CpGs that were unmethylated in the exocrine pancreas and methylated in most other tissues. Two sites were selected for further analysis, and clusters of adjacent CpGs were identified that could distinguish acinar and ductal cells from other cell types and could therefore be used as markers for the exocrine pancreas (FIG. 4A, FIGS. 10A-E and 11A-E). Healthy subjects (n=25) had very low levels of unmethylated exocrine pancreas markers in their cfDNA, consistent with a low turnover of this tissue (FIG. 4B). Nearly half the patients with pancreatic cancer (20 out of 42) displayed exocrine pancreas-derived cfDNA above background level (FIG. 4C). There was a trend to a stronger signal in patients with advanced disease, and these patients were more likely to show a signal above background. Nonetheless, some patients at stage I and II (localized disease) had a clear signal (11 out of 29), suggesting that this method can in principle identify cell death in pancreatic cancer at a resectable stage.

In order to further test the hypothesis that all etiologies of cell death lead to increases in tissue specific cfDNA, the plasma of patients with chronic pancreatitis was examined. Indeed, 7 out of 10 patients with this non-malignant disease had elevated pancreas-derived cfDNA levels (FIG. 4C). It was noted that patients with pancreatitis had a clearer signal with a marker (REG1A) that was unmethylated in both acinar and ductal cells, while patients with pancreatic cancer had a stronger signal with a marker (CUX2) that was preferentially unmethylated in ductal cells, potentially reflecting the epigenetic identity of dying cells in each pathology. In summary, cfDNA carrying methylation patterns of the exocrine pancreas is present in the blood of patients with pancreatic cancer and pancreatitis, reflecting death of exocrine cells in these conditions.
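The detection rates reported in this example can be tabulated with a few lines of arithmetic. The counts for all cancer patients, localized disease and chronic pancreatitis are taken from the text; the count for metastatic disease is derived here by subtraction and assumes that the localized and metastatic groups together account for all 42 cancer patients, as described in the patient cohort above.

```python
def percent(positive, total):
    """Detection rate as a percentage."""
    return 100.0 * positive / total

# Counts reported in the text.
cancer_positive, cancer_total = 20, 42
localized_positive, localized_total = 11, 29
pancreatitis_positive, pancreatitis_total = 7, 10

# Derived by subtraction (assumption: the two stage groups partition the cohort).
metastatic_total = cancer_total - localized_total
metastatic_positive = cancer_positive - localized_positive

rates = {
    "all pancreatic cancer": percent(cancer_positive, cancer_total),
    "localized (stage I-II)": percent(localized_positive, localized_total),
    "metastatic (stage IV)": percent(metastatic_positive, metastatic_total),
    "chronic pancreatitis": percent(pancreatitis_positive, pancreatitis_total),
}
```

On these counts the derived rate for metastatic disease (9 out of 13, about 69%) exceeds the rate for localized disease (about 38%), which matches the stated trend toward more frequent signal in advanced disease.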

Example 6

Identification of Brain-Derived DNA in Plasma of Patients with ALS

In the neurodegenerative disease amyotrophic lateral sclerosis (ALS), motor neurons die, followed by death of muscle cells. The present inventors tested whether brain-derived DNA fragments could be identified in the circulation of patients with ALS. As shown in FIG. 12, 40% of ALS patients (n=29) showed measurable levels of brain-derived DNA in their plasma (based on fraction of unmethylated CG0978 [Brain1], multiplied by the total amount of cell-free DNA in their plasma), compared with near zero signal in healthy individuals (n=12). In addition, the present inventors looked for the presence of glial DNA in the plasma of ALS patients. FIG. 13 shows that 70% of ALS patients tested (n=10) had in their plasma at least one of two glial markers (WM1 and MBP3), compared with no signal in healthy individuals. These findings are consistent with clinical reports on damage to white matter in ALS patients. In preliminary analysis the present inventors also identified muscle DNA in the blood of ALS patients (data not shown). Thus, the assay can be used to detect and monitor neuronal, glial and muscle cell death in patients with ALS, for the purpose of diagnosis, monitoring of disease progression and assessment of drug activity.

Example 7

Identification of Glial DNA in the Plasma of Patients with Glioblastoma

Glioblastoma originates from glial cells. The present inventors examined the plasma of patients with glioblastoma for presence of glial DNA (unmethylated fragments of WM1 or MBP3). It was found that 30% of glioblastoma patients tested (n=10) had a clear signal above the minimal level in healthy individuals. Thus the method can be used to identify and monitor cell death in glioblastomas.

Example 8

Identification of Colon Epithelial DNA in Plasma of Patients with Colon Cancer or Crohn's Disease

The present inventors identified several markers of colon DNA, all having a cluster of CpGs that was unmethylated in colon and methylated in other tissues. The tissue distribution of unmethylated molecules is shown in FIGS. 14A-D.

The present inventors then determined if these markers are present in the plasma of patients with diseases of the colon. As shown in FIG. 15, healthy individuals (n=8) had very low levels of unmethylated DNA from any of the four markers. This is consistent with the idea that cells that die during normal turnover of the colon are shed to the lumen and their DNA does not reach the circulation. In contrast, most patients with colon cancer had high levels of one to four of the colon markers in their plasma. This finding is consistent with extensive death of colon cancer cells and release of their DNA to the circulation, as shown previously using identification of tumor-specific mutations in blood. A single patient with Crohn's disease that was tested was also positive for a colon marker, suggesting extensive pathological colon cell death and release of DNA to blood.

Thus the method can identify colon DNA in the blood of patients with either colon cancer or Crohn's disease. Note that while existing methods for detection of colon cancer using blood rely on patient-specific somatic mutations, the present method is universal as it relies on colon markers that are conserved among individuals and apparently remain stable even in cancer.

Example 9

Identification of Lung DNA in Plasma of Patients with Lung Cancer

The present inventors identified and validated several markers of lung cells, which were unmethylated in DNA of lung epithelium and methylated in other tissues, and tested their levels in the plasma of healthy individuals and in the plasma of patients with lung cancer.

FIG. 16 shows the tissue distribution of unmethylated SFTP/A1, a gene that is expressed specifically in lung cells and is also unmethylated specifically in lung. It was found that unmethylated SFTP/A1 does not exist in plasma of most healthy individuals, but is present in the plasma of most patients with lung cancer.

FIG. 17 shows the tissue distribution of unmethylated SFTP/C, another gene that is expressed specifically in lung cells and is also unmethylated specifically in lung. It was found that unmethylated SFTP/C does not exist in plasma of most healthy individuals, but is present in the plasma of many patients with lung cancer.

FIG. 18 shows the tissue distribution of unmethylated CHST, a gene that is unmethylated specifically in lung. It was found that unmethylated CHST does not exist in plasma of most healthy individuals, but is present in the plasma of many patients with lung cancer.

FIG. 19 shows the tissue distribution of unmethylated RAB4, a gene that is unmethylated specifically in lung. It was found that unmethylated RAB4 does not exist in plasma of healthy individuals, but is present in the plasma of some patients with lung cancer.

It was found that the plasma of lung cancer patients tends to contain more than one lung marker. This feature distinguishes between plasma of lung cancer patients and healthy individuals.
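The multi-marker observation above suggests a simple counting rule, sketched below. The marker names are those used in this example, but the threshold, the numeric levels and the function names are hypothetical; the specification does not state a cut-off.

```python
def positive_markers(levels, threshold):
    """Names of markers whose unmethylated level exceeds the threshold."""
    return sorted(name for name, value in levels.items() if value > threshold)

def more_than_one_marker(levels, threshold):
    """True when at least two markers are above the threshold."""
    return len(positive_markers(levels, threshold)) > 1

# Hypothetical threshold and levels of unmethylated marker DNA (arbitrary units).
THRESHOLD = 0.05
patient = {"SFTP/A1": 0.40, "SFTP/C": 0.22, "CHST": 0.01, "RAB4": 0.0}
control = {"SFTP/A1": 0.02, "SFTP/C": 0.0, "CHST": 0.0, "RAB4": 0.0}

patient_hits = positive_markers(patient, THRESHOLD)
patient_flag = more_than_one_marker(patient, THRESHOLD)
control_flag = more_than_one_marker(control, THRESHOLD)
```

Requiring more than one positive marker is one possible way to use the observation; in practice a cut-off would have to be calibrated against the background levels seen in healthy individuals.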

Example 10

Identification of Skeletal Muscle DNA in Plasma after Exercise and in Muscular Dystrophies

The present inventors identified and validated three methylation markers of skeletal muscle that are unmethylated in skeletal muscle and methylated in other tissues (including heart). FIGS. 20A-C show the tissue distribution of these markers (TNN, TPO and MAD1).

FIG. 21 shows that the levels of two of these markers (unmethylated TPO and TNN) in the plasma of healthy controls are very low, reflective of very low turnover of muscle at baseline (with one healthy control showing a signal, for unclear reasons). Three healthy individuals sampled shortly after intensive physical exercise all showed a clear signal in both markers, suggesting that these markers can capture exercise-induced muscle cell death. Among 15 patients with Duchenne or Becker Muscular Dystrophy (DMD or BMD), 5 showed a clear signal reflective of muscle cell death. Thus the assay can be used to detect and monitor skeletal muscle cell death after exercise and in pathologic degenerative conditions.

Example 11 Identification of Vascular Endothelial-Derived DNA in the Circulation

To identify methylation markers of endothelial cells, the present inventors determined the methylome of human endothelial cells sorted from umbilical cords. FIG. 22 shows one marker of endothelial cells, which is unmethylated in endothelial cells (from umbilical cords) but completely methylated in all specific cell types tested (lymphocytes, and pancreatic acinar cells, alpha cells and duct cells). The DNA from tissue biopsies showed about 10% lack of methylation in this marker, reflective of the presence of endothelial cells in all tissues. Most healthy individuals have no signal in their serum, suggesting low baseline turnover of endothelial cells. The blood of a female during the menstrual cycle showed a high signal, as expected due to extensive vascular collapse. Pregnant women had a signal, which was stronger in women with preeclampsia. All cancer patients tested showed a clear signal.

These findings suggest that the method allows for the identification of endothelial cell death in various settings, for example to assess the activity of anti-angiogenesis drugs in cancer or other pathologies, and in pathologies such as preeclampsia.

Example 12 Identification of Liver-Derived DNA in the Circulation

The present inventors identified and validated hepatocyte markers. FIG. 23 shows that the promoter of Albumin (ALB) is unmethylated in hepatocytes (and to some extent in kidney and pancreas) but is methylated elsewhere. The blood of healthy individuals contained either no signal or a relatively high level of unmethylated albumin promoter DNA, suggesting fluctuating baseline turnover or clearance of hepatocytes. All hepatitis patients tested (n=7) were positive, suggesting that the method can detect pathologic hepatocyte death.

Example 13

Identification of Cell Free Circulating DNA Derived from Non-Lymphocytes

FIGS. 24A-B show markers that were identified as unmethylated in lymphocytes and methylated elsewhere. In the blood of most healthy individuals, most molecules are unmethylated, reflecting the fact that most cell-free circulating DNA under baseline conditions is derived from dying blood cells. However, in the serum of healthy individuals shortly after intensive exercise, and in the serum of patients with cancer, a high proportion of molecules at these markers was methylated, indicative of cell-free circulating DNA derived from non-lymphocytes (apparently muscle DNA after exercise and tumor DNA in cancer patients). These markers can be used to assess, in a broad way, a deviation from normal in an individual: evidence of cell death in tissues other than blood.

Example 14 Identification of Kidney-Derived DNA in the Circulation

FIG. 25 shows that AQP (a sequence from one of the aquaporin genes) is unmethylated in kidney cells and methylated in blood. Healthy individuals do not have unmethylated AQP in blood (likely reflecting the shedding of dying cells in the normal kidney to the urine rather than blood). Pregnant women with preeclampsia (known to have kidney damage) showed a strong signal. These findings indicate that the method can be used to detect kidney cell death in various pathologic conditions (for example, acute tubular necrosis in some patients with sepsis).

Example 15 Identification of Adipocyte DNA in the Circulation

The total number of adipocytes has been shown to remain stable during adult life, with evidence of considerable formation of new adipocytes. Thus, it is predicted that there must be adipocyte death at a rate that equals the rate of adipocyte formation.

FIGS. 26A-D show four different loci that are unmethylated in adipocytes and methylated elsewhere (here shown only for blood).

FIGS. 27A-D show that the plasma of most healthy individuals contains multiple methylation markers of adipocyte DNA, indicative of ongoing adipocyte turnover, consistent with continuous formation and destruction of these cells. Thus this method can be used to monitor adipocyte death, for example to study physiological and pathological conditions as well as drugs that affect this process.

Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.

All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

Claims

1. A method for quantifying molecules of cell-free DNA having a methylation pattern of a cell- or tissue-type of interest in a sample derived from a subject, the method comprising:

(a) identifying a DNA sequence of 50-200 bp that has at least four methylation sites, wherein a methylation pattern of said at least four methylation sites in the cell- or tissue-type of interest is different as compared to a methylation pattern of said at least four methylation sites in a same DNA sequence in a second different cell- or tissue-type, whereby the methylation pattern of said at least four methylation sites in the cell- or tissue-type of interest is characteristic of said cell- or tissue-type of interest, wherein a methylation pattern comprises a methylation status of each of said at least four methylation sites on a given DNA molecule;
(b) amplifying said identified DNA sequence in the cell-free DNA of said sample to obtain amplified DNA molecules;
(c) sequencing each of a plurality of said amplified DNA molecules to obtain a sequence that includes consecutive nucleotides encompassing the at least four methylation sites;
(d) ascertaining from each of said sequences the methylation status of each of said at least four methylation sites, thereby obtaining a methylation pattern for each of said sequenced DNA molecules; and
(e) quantifying the amount of sequenced DNA molecules having a specific methylation pattern corresponding to the cell- or tissue-type of interest that is present in the total of said sequenced DNA molecules.

2. The method of claim 1, wherein said sample is derived from a subject having a disease.

3. The method of claim 1, wherein said sample is a body fluid selected from the group consisting of blood, plasma, serum, sperm, milk, urine, saliva and cerebral spinal fluid.

4. The method of claim 1, wherein said amplifying step (b) is conducted using primers that are methylation-independent.

5. The method of claim 1, further comprising contacting the DNA of the sample with bisulfite to convert unmethylated cytosines of the DNA to uracils prior to said amplifying step (b).

6. The method of claim 1, wherein the sample comprises cell-free DNA which is derived from said second different cell- or tissue-type.

7. The method of claim 6, comprising repeating said step (e) using the second different cell- or tissue-type as the cell- or tissue-type of interest so as to quantify both the amount of sequenced DNA molecules having the specific methylation pattern of DNA molecules of the cell- or tissue-type of interest and the amount of sequenced DNA molecules having the specific methylation pattern of DNA molecules of the second different cell- or tissue-type in the total sequenced DNA molecules.

8. The method of claim 7, further comprising determining a ratio of the amount of DNA molecules having the specific methylation pattern of said cell- or tissue-type of interest to the amount of DNA molecules having the specific methylation pattern of said second cell- or tissue-type.

9. The method of claim 6, wherein said step (e) quantifies the amount of sequenced DNA molecules having the specific methylation pattern of DNA molecules of both the cell or tissue-type of interest and the second different cell- or tissue-type that are present in the total of said sequenced DNA molecules.

10. The method of claim 1, further comprising, after said quantifying step (e), determining a ratio of the amount of cell-free DNA derived from said cell- or tissue-type of interest to the total amount of cell-free DNA in the sequenced DNA molecules.

11. The method of claim 1, wherein said cell- or tissue-type of interest is selected from the group consisting of a pancreatic beta cell, a pancreatic exocrine cell, a hepatocyte, a brain cell, a lung cell, a uterus cell, a kidney cell, a breast cell, an adipocyte, a colon cell, a rectum cell, a cardiomyocyte, a skeletal muscle cell, a prostate cell and a thyroid cell.

12. The method of claim 1, wherein said cell- or tissue-type of interest is selected from the group consisting of pancreatic tissue, liver tissue, lung tissue, brain tissue, uterus tissue, renal tissue, breast tissue, fat, colon tissue, rectum tissue, heart tissue, skeletal muscle tissue, prostate tissue and thyroid tissue.

13. The method of claim 1, wherein the sample is a blood sample.

14. The method of claim 1, wherein said ascertaining step (d) is effected using a multiplex reaction.

15. The method of claim 1, wherein said cell- or tissue-type of interest is a non-diseased cell-type.

16. The method of claim 1, wherein said second different cell- or tissue-type is a hematopoietic cell.

17. The method of claim 1, wherein said DNA sequence of 50-200 bp has at least five methylation sites and said methylation pattern comprises the methylation status of each of the at least five methylation sites on a given DNA molecule.

18. The method of claim 1, wherein said DNA sequence is no longer than 170 bp.

19. The method of claim 1, wherein the amplified sequences in said amplifying step (b) are no larger than the size of said DNA sequence identified in step (a).

20. A method for quantifying molecules of cell-free DNA having a methylation pattern of a cell- or tissue-type of interest in a sample derived from a subject, the method comprising:

(a) identifying a DNA sequence of 50-200 bp that has at least four methylation sites, wherein a methylation pattern of said at least four methylation sites in the cell- or tissue-type of interest is different as compared to a methylation pattern of said at least four methylation sites in a same DNA sequence in a second different cell- or tissue-type, whereby the methylation pattern of said at least four methylation sites in the cell- or tissue-type of interest is characteristic of said cell- or tissue-type of interest, wherein a methylation pattern comprises a methylation status of each of said at least four methylation sites on a given DNA molecule;
(b) chemically converting unmethylated cytosines of said identified DNA sequence in the cell-free DNA of said sample to uracils, to obtain converted DNA molecules;
(c) amplifying, with at least 15 cycles of polymerase chain reaction (PCR), said converted DNA molecules to obtain amplified DNA molecules;
(d) sequencing each of a plurality of said amplified DNA molecules to obtain a sequence that includes consecutive nucleotides encompassing the at least four methylation sites;
(e) ascertaining from each of said sequences the methylation status of each of said at least four methylation sites, thereby obtaining a methylation pattern for each of said sequenced DNA molecules; and
(f) quantifying the amount of sequenced DNA molecules having a specific methylation pattern corresponding to the cell- or tissue-type of interest that is present in the total of said sequenced DNA molecules.

21. The method of claim 20, wherein, in said step (d), at least 10,000 amplified DNA molecules are sequenced.
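
For illustration only, and not as part of the claims, the conversion, ascertaining and quantifying steps of claim 20 can be sketched in silico as follows. The toy reference sequence, the site positions, the read counts and the choice of a fully unmethylated target pattern are all assumptions made for this example. Read alignment, strand handling and sequencing-error filtering, which a practical workflow would need, are omitted.

```python
from collections import Counter


def bisulfite_convert(seq, methylated_positions):
    """Simulate chemical conversion: an unmethylated C becomes U, which is
    read as T after PCR; a methylated C is retained as C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def methylation_pattern(read, site_positions):
    """Ascertain the status of every site on one molecule:
    'M' = methylated (C retained), 'U' = unmethylated (read as T)."""
    calls = {"C": "M", "T": "U"}
    return "".join(calls.get(read[p], "?") for p in site_positions)


def quantify(reads, site_positions, target_pattern):
    """Count molecules whose whole pattern matches the target pattern,
    together with the total number of sequenced molecules."""
    counts = Counter(methylation_pattern(r, site_positions) for r in reads)
    return counts[target_pattern], sum(counts.values())


# Hypothetical 15-nt reference with four CpG sites (cytosines at 1, 5, 9, 12).
REF = "ACGTTCGAACGTCGA"
SITES = [1, 5, 9, 12]

# Hypothetical mixture: 3 fully unmethylated, 6 fully methylated and
# 1 partially methylated molecule.
reads = (
    [bisulfite_convert(REF, set())] * 3
    + [bisulfite_convert(REF, set(SITES))] * 6
    + [bisulfite_convert(REF, {1, 5})] * 1
)

matching, total = quantify(reads, SITES, "UUUU")
print(matching, total)
```

Because the pattern is evaluated per molecule across all four sites, the partially methylated molecule is not counted toward the target, which is the point of requiring the status of each site on a given DNA molecule.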

Patent History
Publication number: 20230242985
Type: Application
Filed: Feb 15, 2023
Publication Date: Aug 3, 2023
Inventors: Yuval DOR (Jerusalem), Ruth SHEMER (Mevasseret Zion), Benjamin GLASER (Jerusalem)
Application Number: 18/169,821
Classifications
International Classification: C12Q 1/6881 (20060101); C12Q 1/6886 (20060101); C12Q 1/6883 (20060101)