METHOD FOR IDENTIFICATION AND ENUMERATION OF NUCLEIC ACID SEQUENCE, EXPRESSION, COPY, OR DNA METHYLATION CHANGES USING COMBINED NUCLEASE, LIGASE, POLYMERASE, AND SEQUENCING REACTIONS
The present invention relates to a method for the highly specific, targeted capture of regions of human genomes and transcriptomes from the blood, i.e. from cell free circulating DNA, exosomes, microRNA, circulating tumor cells, or total blood cells, to allow for the highly sensitive detection of mutation, expression, copy number, translocation, alternative splicing, and methylation changes using combined nuclease, ligation, polymerase, and massively parallel sequencing reactions. The method generates a collection of different circular chimeric single-stranded nucleic acid constructs, suitable for sequencing on multiple platforms. In some embodiments, each construct of the collection comprises a first single stranded segment of original genomic DNA from a host organism and a second single stranded synthetic nucleic acid segment that is linked to the first single stranded segment and comprises a nucleotide sequence that is exogenous to the host organism. These chimeric constructs are suitable for identifying and enumerating mutations, copy changes, translocations, and methylation changes. In other embodiments, input mRNA, lncRNA, or miRNA is used to generate circular DNA products that reflect the presence and copy number of specific mRNAs, lncRNAs, splice-site variants, translocations, and miRNAs.
This application is a continuation of U.S. patent application Ser. No. 16/524,527, filed Jul. 29, 2019, which is a division of U.S. patent application Ser. No. 15/316,778, filed Dec. 6, 2016, now issued as U.S. Pat. No. 10,407,722, which is a national stage application under 35 U.S.C. § 371 of PCT Application No. PCT/US2015/034724, filed Jun. 8, 2015, which claims the benefit of U.S. Provisional Patent Application Serial Nos. 62/009,047, filed Jun. 6, 2014, and 62/136,093, filed Mar. 20, 2015, each of which is hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
The present invention relates to a method for the highly specific, targeted capture of regions of human genomes and transcriptomes from the blood, i.e. from cell free circulating DNA, exosomes, microRNA, circulating tumor cells, or total blood cells, to allow for the highly sensitive detection of mutation, expression, copy number, translocation, alternative splicing, and methylation changes using combined nuclease, ligation, polymerase, and sequencing reactions.
SEQUENCE LISTING
This application contains a computer readable Sequence Listing, which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on Oct. 6, 2022, is named 147402.007849.xml and is 374,350 bytes in size.
BACKGROUND OF THE INVENTION
Advances in DNA sequencing hold the promise to standardize and develop non-invasive molecular diagnosis to improve prenatal care, transplantation efficacy, cancer and other disease detection, and individualized treatment. Currently, patients with predisposing or early disease are not identified, and those with disease are not given the best treatment, all because of failures at the diagnostic level.
In the cancer field, there is a need to develop such technology for early detection, guiding therapy, and monitoring for recurrence—all from a blood sample. This includes the need to develop: (i) high sensitivity detection of single base mutation, small insertion, and small deletion mutations in known genes (when present at 1% to 0.01% of cell-free DNA); (ii) high sensitivity detection of promoter hypermethylation and hypomethylation (when present at 1% to 0.01% of cell-free DNA); (iii) accurate quantification of tumor-specific mRNA, lncRNA, and miRNA isolated from tumor-derived exosomes or RISC complex, or circulating tumor cells in blood; (iv) accurate quantification of tumor-specific copy changes in DNA isolated from circulating tumor cells; (v) accurate quantification of mutations, promoter hypermethylation and hypomethylation in DNA isolated from circulating tumor cells. All these (except quantification of tumor-specific copy changes in DNA isolated from circulating tumor cells) require focusing the sequencing on targeted genes or regions of the genome. Further, determination of the sequence information or methylation status from both strands of the original fragment provides critically needed confirmation of rare events.
Normal plasma contains nucleic acids released from normal cells undergoing normal physiological processes (e.g. exosome release, apoptosis). There may be additional release of nucleic acids under conditions of stress, inflammation, infection, or injury. In general, DNA released from apoptotic cells is an average of 160 bp in length, while DNA from fetal cells is an average of about 140 bp. Plasma from a cancer patient contains nucleic acids released from cancer cells undergoing abnormal physiological processes, as well as nucleic acids within circulating tumor cells (CTCs). Likewise, plasma from a pregnant woman contains nucleic acids released from fetal cells.
There are a number of challenges for developing reliable diagnostic and screening tests. The first challenge is to distinguish those markers emanating from the tumor or fetus that are indicative of disease (i.e. early cancer) vs. presence of the same markers emanating from normal tissue. There is also a need to balance the number of markers examined and the cost of the test, with the specificity and sensitivity of the assay. This is a challenge that needs to address the biological variation in diseases such as cancer. In many cases the assay should serve as a screening tool, requiring the availability of secondary diagnostic follow-up (i.e. colonoscopy, amniocentesis). Compounding the biological problem is the need to reliably detect nucleic acid sequence mutation or promoter methylation differences, or reliably quantify DNA or RNA copy number from either a very small number of initial cells (i.e. from CTCs), or when the cancer or fetus-specific signal is in the presence of a majority of nucleic acid emanating from normal cells. Finally, there is the technical challenge to distinguish true signal resulting from detecting the desired disease-specific nucleic acid differences vs. false signal generated from normal nucleic acids present in the sample vs. false signal generated in the absence of the disease-specific nucleic acid differences.
By way of an example, consider the challenge of detecting, in plasma, the presence of circulating tumor DNA harboring a mutation in the p53 gene or a methylated promoter region. Such a sample will contain a majority of cell-free DNA arising from normal cells, where the tumor DNA may only comprise 0.01% of the total cell-free DNA. Thus, if one were to attempt to find the presence of such mutant DNA by total sequencing, one would need to sequence 100,000 genomes to identify 10 genomes harboring the mutations. This would require sequencing 300,000 gigabases (Gb) of DNA, a task beyond the reach of current sequencing technology, not to mention the enormous data-management issues. To circumvent this problem, many groups have attempted to capture specific target regions or to PCR amplify the regions in question. Sequence capture has suffered from dropout, such that perhaps 90-95% of the desired sequences are captured, but the remaining desired fragments are missing. Alternatively, PCR amplification carries the risk of introducing a rare error that is indistinguishable from a true mutation. Further, PCR loses methylation information. While bisulfite treatment has been traditionally used to determine the presence of promoter methylation, it is also destructive of the DNA sample and lacks the ability to identify multiple methylation changes in cell-free DNA.
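The arithmetic behind the total-sequencing estimate above can be checked with a short sketch. The genome size of 3 gigabases is an assumption inferred from the totals in the text, and the function names are illustrative only.

```python
# Back-of-the-envelope check of the whole-genome sequencing burden.
# Assumption: one human genome is ~3 gigabases (inferred from the text's totals).
GENOME_GIGABASES = 3


def genomes_needed(mutant_fraction: float, mutant_genomes_wanted: int) -> int:
    """Genomes to sequence so the expected count of mutant genomes hits the target."""
    return round(mutant_genomes_wanted / mutant_fraction)


def total_gigabases(n_genomes: int, genome_gb: float = GENOME_GIGABASES) -> float:
    """Total bases (in gigabases) read when sequencing n_genomes once each."""
    return n_genomes * genome_gb


if __name__ == "__main__":
    n = genomes_needed(mutant_fraction=0.0001, mutant_genomes_wanted=10)  # 0.01%
    print(n, "genomes,", total_gigabases(n), "gigabases")
```

At a mutant fraction of 0.01%, ten expected mutant genomes correspond to 100,000 genomes in total, which is why targeted capture rather than total sequencing is pursued.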
There are a number of different approaches for reducing error rate and improving the accuracy of sequencing runs. A consensus accuracy may be achieved in the presence of high error rates by sequencing the same region of DNA over and over again. However, a high error rate makes it extremely difficult to identify a sequence variant in low abundance, for example when trying to identify a cancer mutation in the presence of normal DNA. Therefore, a low error rate is required to detect a mutation in relatively low abundance.
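The effect of repeated sequencing on consensus accuracy can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not a description of any particular platform: errors are taken to be independent between reads, and every error at a position is pessimistically assumed to produce the same wrong base, so that a wrong consensus requires more than half of the reads to be in error.

```python
from math import comb


def consensus_error(per_read_error: float, n_reads: int) -> float:
    """Probability that a strict majority of n_reads is wrong at one position.

    Assumes independent errors that all yield the same incorrect base.
    """
    k_min = n_reads // 2 + 1  # smallest number of wrong reads that forms a majority
    return sum(
        comb(n_reads, k) * per_read_error**k * (1 - per_read_error) ** (n_reads - k)
        for k in range(k_min, n_reads + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 9):
        print(n, consensus_error(0.01, n))
```

The model does not cover errors shared by all copies of a fragment, such as a polymerase error in the first amplification cycle, which is the limitation the approaches described below try to address.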
The first approach, termed the tagged-amplicon deep sequencing (TAm-Seq) method (Forshew et al., “Noninvasive Identification and Monitoring of Cancer Mutations by Targeted Deep Sequencing of Plasma DNA,” Sci Transl Med. 4(136):136 (2012)), is based on designing primers to amplify 5995 bases that cover select regions of cancer-related genes, including TP53, EGFR, BRAF, and KRAS. This approach is able to identify mutations in the p53 gene at frequencies of 2% to 65%. In this approach, primers are designed to pre-amplify the DNA (for 15 cycles) in a multiplexed reaction with many PCR primers. This creates both desired and undesired products, so it is followed with single-plex PCR to further amplify each of the desired products. The fragments are subjected to a final barcoding PCR prior to standard next-generation sequencing. The advantage of this approach is that it uses the time-tested multiplexed PCR-PCR, which is unparalleled for amplification of low numbers of starting nucleic acids. The disadvantage is that this approach is unable to distinguish a true mutation from a PCR error in the early rounds of amplification. Thus, while the sensitivity of 2% (i.e. detecting one mutant allele in 50 wild-type alleles) is sufficient for evaluating late-stage cancers prior to making a treatment decision, it is not sensitive enough for early detection.
A variation of the first approach is termed Safe-Sequencing System “Safe-SeqS” (Kinde et al., “Detection and Quantification of Rare Mutations with Massively Parallel Sequencing,” Proc Natl Acad Sci USA 108(23):9530-5 (2011)), where randomly sheared genomic DNA is appended onto the ends of linkers ligated to genomic DNA. The approach demonstrated that the vast majority of mutations described from genomic sequencing are actually errors, and reduced presumptive sequencing errors by at least 70-fold. Likewise, an approach called ultrasensitive deep sequencing (Narayan et al., “Ultrasensitive Measurement of Hotspot Mutations in Tumor DNA in Blood Using Error-suppressed Multiplexed Deep Sequencing,” Cancer Res. 72(14):3492-8 (2012)) appends bar codes onto primers for a nested PCR amplification. Presumably, a similar system of appending barcodes was developed to detect rare mutations and copy number variations that depends on bioinformatics tools (Talasaz, A.; Systems and Methods to Detect Rare Mutations and Copy Number Variation, US Patent Application US 2014/0066317 A1, Mar. 6, 2014). Paired-end reads are used to cover the region containing the presumptive mutation. This method was used to track known mutations in plasma of patients with late stage cancer. These approaches require many reads to establish consensus sequences. These methods also require extending across the target DNA, and thus it would be impossible to distinguish a true mutation from a polymerase-generated error, especially when copying across a damaged base, such as deaminated cytosine. Finally, these methods do not provide information on methylation status of CpG sites within the fragment.
The second approach, termed Duplex sequencing (Schmitt et al., “Detection of Ultra-Rare Mutations by Next-Generation Sequencing,” Proc Natl Acad Sci USA 109(36):14508-13 (2012)), is based on using duplex linkers containing 12-base randomized tags. By amplifying both top and bottom strands of input target DNA, a given fragment obtains a unique identifier (composed of 12 bases on each end) such that it may be tracked via sequencing. Sequence reads sharing a unique set of tags are grouped into paired families with members having strand identifiers in either the top-strand or bottom-strand orientation. Each family pair reflects the amplification of one double-stranded DNA fragment. Mutations present in only one or a few family members represent sequencing mistakes or PCR-introduced errors occurring late in amplification. Mutations occurring in many or all members of one family in a pair arise from PCR errors during the first round of amplification such as might occur when copying across sites of mutagenic DNA damage. On the other hand, true mutations present on both strands of a DNA fragment appear in all members of a family pair. Whereas artifactual mutations may co-occur in a family pair with a true mutation, all except those arising during the first round of PCR amplification can be independently identified and discounted when producing an error-corrected single-strand consensus sequence. The sequences obtained from each of the two strands of an individual DNA duplex can then be compared to obtain the duplex consensus sequence, which eliminates remaining errors that occurred during the first round of PCR. The advantage of this approach is that it unambiguously distinguishes true mutations from PCR errors or from mutagenic DNA damage, and achieves an extraordinarily low error rate of 3.8×10⁻¹⁰. The disadvantage of this approach is that many fragments need to be sequenced in order to get at least five members of each strand in a family pair (i.e. minimum of 10 sequence reads per original fragment, but often requiring far more due to fluctuations). Further, the method has not been tested on cfDNA, which tends to be smaller than fragments generated from intact genomic DNA, and thus would require sequencing more fragments to cover all potential mutations. Finally, the method does not provide information on methylation status of CpG sites within the fragment.
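The two-stage consensus logic described for Duplex sequencing (single-strand consensus within each family, then comparison of the two strands) can be sketched as follows. This is a minimal illustration, not the published pipeline. It assumes reads are already aligned, of equal length, reported in top-strand orientation, and labeled with a normalized tag and a strand of "top" or "bottom". The 90% agreement threshold and all names are hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


def strand_consensus(seqs, min_members=5, min_agreement=0.9):
    """Single-strand consensus; returns None when the family is too small."""
    if len(seqs) < min_members:
        return None
    out = []
    for column in zip(*seqs):  # one column per aligned position
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_agreement else "N")
    return "".join(out)


def duplex_consensus(reads, min_members=5, min_agreement=0.9):
    """reads: iterable of (tag, strand, sequence), strand in {"top", "bottom"}."""
    families = defaultdict(lambda: {"top": [], "bottom": []})
    for tag, strand, seq in reads:
        families[tag][strand].append(seq)
    result = {}
    for tag, fam in families.items():
        top = strand_consensus(fam["top"], min_members, min_agreement)
        bottom = strand_consensus(fam["bottom"], min_members, min_agreement)
        if top is None or bottom is None:
            continue  # both strands must be represented by a full family
        # A base is kept only when both strand consensuses agree on it.
        result[tag] = "".join(a if a == b else "N" for a, b in zip(top, bottom))
    return result
```

With the default of five members per strand, each reported fragment consumes at least ten reads, which is the read-depth cost noted above.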
The third approach, termed smMIP for single molecule molecular inversion probes (Hiatt et al., “Single Molecule Molecular Inversion Probes for Targeted, High-Accuracy Detection of Low-Frequency Variation,” Genome Res. 23(5):843-54 (2013)), combines single molecule tagging with multiplex capture to enable highly sensitive detection of low-frequency subclonal variation. The method claims an error rate of 2.6×10⁻⁵ in clinical specimens. The disadvantage of this approach is that many fragments need to be sequenced in order to get at least five members of each strand in a family pair (i.e. minimum of 10 sequence reads per original fragment, but often requiring far more due to fluctuations). Also, the method requires extending across the target DNA, and thus it would be impossible to distinguish a true mutation from a polymerase-generated error, especially when copying across a damaged base, such as deaminated cytosine. Further, the method has not been tested on cfDNA, which tends to be smaller than fragments generated from intact genomic DNA, and thus would require sequencing more fragments to cover all potential mutations. Finally, the method does not provide information on methylation status of CpG sites within the fragment.
The fourth approach, termed circle sequencing (Lou et al., “High-throughput DNA Sequencing Errors are Reduced by Orders of Magnitude Using Circle Sequencing,” Proc Natl Acad Sci USA 110(49):19872-7 (2013); see also Acevedo et al., “Mutational and Fitness Landscapes of an RNA Virus Revealed Through Population Sequencing,” Nature 505(7485):686-90 (2014); and Acevedo and Andino, “Library Preparation for Highly Accurate Population Sequencing of RNA Viruses,” Nat Protoc. 9(7):1760-9 (2014)), is based on shearing DNA or RNA to about 150 bases, denaturing to form single strands, circularizing those single strands, using random hexamer primers and phi29 DNA polymerase for rolling circle amplification (in the presence of Uracil-DNA glycosylase and Formamidopyrimidine-DNA glycosylase), re-shearing the products to about 500 bases, and then proceeding with standard next generation sequencing. The advantage of this approach is that the rolling circle amplification makes multiple tandem copies of the original target DNA, such that a polymerase error may appear in only one copy, but a true mutation appears in all copies. The read families average 3 copies in size because the copies are physically linked to each other. The method also uses Uracil-DNA glycosylase and Formamidopyrimidine-DNA glycosylase to remove targets containing damaged bases, to eliminate such errors. This technology takes the sequencing error rate from a current level of about 0.1 to 1×10⁻² to a rate as low as 7.6×10⁻⁶. The latter error rate is now sufficient to distinguish cancer mutations in plasma in the presence of 100 to 10,000-fold excess of wild-type DNA. A further advantage is that 2-3 copies of the same sequence are physically linked, allowing for verification of a true mutation from sequence data generated from a single fragment, as opposed to at least 10 fragments using the Duplex sequencing approach.
However, the method does not provide the ability to determine copy number changes, nor provide information on methylation status of CpG sites within the fragment.
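The benefit of physically linked tandem copies can be illustrated with a small consensus routine. This is a sketch only: it assumes the repeat unit length is already known and that the read begins at a copy boundary, whereas a real pipeline would have to infer the periodicity from the read itself. The function name is hypothetical.

```python
from collections import Counter


def tandem_consensus(read: str, unit_len: int) -> str:
    """Majority-vote consensus across complete tandem copies within one read.

    Positions without a strict majority are reported as "N".
    """
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    # Keep only complete copies; a trailing partial copy is dropped.
    copies = [
        read[i : i + unit_len]
        for i in range(0, len(read) - unit_len + 1, unit_len)
    ]
    if not copies:
        raise ValueError("read is shorter than one repeat unit")
    consensus = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count > len(column) / 2 else "N")
    return "".join(consensus)
```

With three copies, an error present in a single copy is outvoted, while a true mutation appears in every copy and is retained. With only two copies, a disagreement cannot be resolved and is masked.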
The fifth approach, developed by Complete Genomics (Drmanac et al., “Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA Nanoarrays,” Science 327(5961):78-81 (2010)) is based on using ligation reads on nanoball arrays. About 400 nucleotides of genomic DNA are circularized with linkers, cleaved, recircularized with additional linkers, and ultimately recircularized to contain about four linkers. The DNA undergoes rolling circle amplification using phi 29 DNA polymerase to generate nanoballs. These are then placed onto an array, and sequenced using a ligation-based approach. The salient point of this approach, of relevance herein, is that multiple tandem copies of the same sequence may be generated and subsequently sequenced off a single rolling circle amplification product. Since the same sequence is interrogated multiple times by either ligase or polymerase (by combining rolling circle with other sequencing by synthesis approaches), the error rate per base may be significantly reduced. As such, sequencing directly off a rolling circle product provides many of the same advantages of the circle sequencing approach described above.
The sixth approach, termed SMRT—single molecule real time—sequencing (Flusberg et al., “Direct Detection of DNA Methylation During Single-Molecule, Real-Time Sequencing,” Nat Methods 7(6):461-5 (2010)) is based on adding hairpin loops onto the ends of a DNA fragment, and allowing a DNA polymerase with strand-displacement activity to extend around the covalently closed loop, providing sequence information on the two complementary strands. Specifically, single molecules of polymerase catalyze the incorporation of fluorescently labeled nucleotides into complementary nucleic acid strands. The polymerase slows down or “stutters” when incorporating a nucleotide opposite a methylated base, and the resulting fluorescence pulses allow direct detection of modified nucleotides in the DNA template, including N6-methyladenine, 5-methylcytosine and 5-hydroxymethylcytosine. The accuracy of the approach has improved, especially as the polymerase may traverse around the closed loop several times, allowing for determination of a consensus sequence. Although the technique is designed to provide sequence information on “dumbbell” shaped substrates (containing mostly the two complementary sequences of a linear fragment of DNA), it may also be applied to single-stranded circular substrates.
The present invention is directed at overcoming these and other deficiencies in the art.
SUMMARY OF THE INVENTION
A first aspect of the present invention is directed to a collection of different circular chimeric single-stranded nucleic acid constructs. Each construct of the collection comprises a first single stranded segment of original genomic DNA from a host organism and a second single stranded synthetic nucleic acid segment that is linked to the first single stranded segment and comprises a nucleotide sequence that is exogenous to the host organism. The second single stranded synthetic nucleic acid segment comprises a unique identifier portion, wherein the nucleotide sequence of both the unique identifier portion and the segment of original genomic DNA distinguishes one chimeric single-stranded nucleic acid construct in the collection from every other chimeric single-stranded nucleic acid construct in the collection. The chimeric single-stranded nucleic acid constructs of the collection are circularized and suitable for rolling circle amplification and/or sequencing.
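Because each construct is distinguished by the combination of its unique identifier portion and its genomic segment, sequence reads can in principle be collapsed to distinct original molecules for enumeration. The following is a hedged sketch of that bookkeeping; the tuple layout, the target labels, and the function name are assumptions made for illustration and are not taken from the specification.

```python
from collections import defaultdict


def count_original_molecules(reads):
    """Count distinct constructs per target.

    reads: iterable of (target_name, unique_identifier, genomic_segment).
    Reads sharing both identifier and segment are treated as copies of
    one original construct.
    """
    molecules = defaultdict(set)
    for target, uid, segment in reads:
        molecules[target].add((uid, segment))
    return {target: len(keys) for target, keys in molecules.items()}


if __name__ == "__main__":
    example = [
        ("KRAS", "AACGT", "GGTGGC"),
        ("KRAS", "AACGT", "GGTGGC"),  # duplicate read of the same construct
        ("KRAS", "TTGCA", "GGTGGC"),  # same segment, different identifier
        ("TP53", "AACGT", "CGTGTT"),
    ]
    print(count_original_molecules(example))
```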
Another aspect of the present invention is directed to a system comprising a collection of different circular chimeric single-stranded nucleic acid constructs. Each construct of the collection comprises a first single stranded segment of original genomic DNA from a host organism and a second single stranded nucleic acid segment that is linked to the first single stranded segment and comprises a nucleotide sequence that is exogenous to the host organism. The nucleotide sequence of the second single stranded nucleic acid segment comprises a first solid support primer-specific portion, a second solid support primer-specific portion, and a patient identifier sequence. The chimeric single-stranded nucleic acid constructs of the collection are circularized and suitable for rolling circle amplification and/or sequencing. The system further comprises a collection of extension products, each extension product comprising two or more tandem linear sequences complementary to the chimeric single-stranded nucleic acid construct from the collection. Each extension product in the collection is hybridized to its complementary circular chimeric single-stranded nucleic acid construct of the collection.
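The relationship between a circular construct and its extension product (two or more tandem linear sequences complementary to the circle) can be modeled at the sequence level. This is an illustrative sketch: the circle is represented as a linear string, the `start` parameter is a hypothetical way of choosing where the circle is opened, and no claim is made about actual primer placement.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rolling_circle_product(circle: str, n_copies: int, start: int = 0) -> str:
    """Model an extension product as n tandem complements of the circle.

    circle: the circular construct written 5'->3' as a linear string.
    start:  index at which the circle is opened for this representation.
    """
    if n_copies < 2:
        raise ValueError("an extension product has two or more tandem copies")
    linear = circle[start:] + circle[:start]  # rotate the circle
    return reverse_complement(linear) * n_copies


if __name__ == "__main__":
    print(rolling_circle_product("AACG", 2))
```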
Another aspect of the present invention is directed to a system comprising a collection of different circular chimeric single-stranded nucleic acid constructs. Each construct comprises a first single stranded segment of original genomic DNA from a host organism and a second single stranded nucleic acid segment that is linked to the first single stranded segment and comprises a nucleotide sequence that is exogenous to the host organism. The nucleotide sequence of the second single stranded nucleic acid segment comprises a first solid support primer-specific portion, a second solid support primer-specific portion, and a patient identifier sequence. The chimeric single-stranded nucleic acid constructs of the collection are suitable for rolling circle amplification and/or sequencing. The system further comprises one or more oligonucleotide amplification primers, each primer comprising at least a first nucleotide sequence portion that is complementary to the first solid support primer-specific portion or the second solid support primer-specific portion of the chimeric single-stranded nucleic acid constructs of the collection; and a polymerase suitable for rolling circle amplification.
Other aspects of the present invention are directed to methods of sequencing a plurality of nucleic acid molecules in a sample using the collection and systems of the present invention.
Another aspect of the present invention is directed to a method for identifying, in a sample, one or more target ribonucleic acid molecules differing from other nucleic acid molecules in the sample by one or more bases. This method involves providing a sample containing one or more target ribonucleic acid molecules potentially containing one or more base differences and generating, in the sample, cDNA of the one or more target ribonucleic acid molecules, if present in the sample. The method further involves providing one or more first oligonucleotide probes, each first oligonucleotide probe comprising (a) a 3′ cDNA target-specific sequence portion, (b) a 5′ cDNA target-specific portion, and (c) a further portion, said further portion comprising (i) a unique identifier sequence, (ii) a patient identifier sequence, (iii) one or more primer binding sequences, or any combination of (i), (ii), and (iii), and contacting the sample and the one or more first oligonucleotide probes under conditions effective for 3′ and 5′ target-specific portions of the first oligonucleotide probes to hybridize in a base specific manner to complementary regions of the cDNA. One or more ligation competent junctions suitable for coupling 3′ and 5′ ends of a first oligonucleotide probe hybridized to its complementary cDNA are generated, and the one or more first oligonucleotide probes are ligated at the one or more ligation junctions to form circular ligated products comprising a deoxyribonucleic acid copy of the target ribonucleic acid sequence coupled to the further portion of the first oligonucleotide probe. The method further involves detecting and distinguishing the circular ligated products in the sample to identify the presence of one or more target ribonucleic acid molecules differing from other ribonucleic acid molecules in the sample by one or more bases.
Another aspect of the present invention is directed to a method for identifying, in a sample, one or more nucleic acid molecules potentially comprising distinct first target and second target regions coupled to each other. This method involves providing a sample potentially containing one or more nucleic acid molecules comprising distinct first target and second target regions coupled to each other, and providing one or more oligonucleotide probe sets, each probe set comprising (i) a first oligonucleotide probe comprising a 5′ first target-specific portion, a 3′ second target specific portion, and a further portion, and (ii) a second oligonucleotide probe comprising a 5′ second target specific portion, a 3′ first target specific portion, and a further portion, wherein the further portion of the first or second oligonucleotide probes of a probe set comprises (i) a unique identifier sequence, (ii) a patient identifier sequence, (iii) one or more primer binding sequences, or any combination of (i), (ii), and (iii). This method further involves contacting the sample and the one or more oligonucleotide probe sets under conditions effective for first and second oligonucleotide probes of a probe set to hybridize in a base specific manner to their corresponding first and second target regions of the nucleic acid molecule, if present in the sample, and generating one or more ligation competent junctions suitable for coupling 3′ ends of first oligonucleotide probes to 5′ ends of second oligonucleotide probes of a probe set and for coupling 5′ ends of first oligonucleotide probes to 3′ ends of second oligonucleotide probes of a probe set when said probe sets are hybridized to complementary first and second target regions of a nucleic acid molecule. 
The first and second oligonucleotides of a probe set are ligated at the one or more ligation competent junctions to form circular ligated products comprising a nucleotide sequence corresponding to the first and second distinct target regions of a nucleic acid molecule coupled to a further portion, and the circular ligated products are detected and distinguished in the sample thereby identifying the presence, if any, of one or more nucleic acid molecules comprising distinct first target and second target regions coupled to each other in the sample.
Another aspect of the present invention is directed to a method for identifying, in a sample, one or more target ribonucleic acid molecules differing from other nucleic acid molecules in the sample by one or more bases. This method involves providing a sample containing one or more target ribonucleic acid molecules potentially containing one or more base differences, and appending nucleotide linkers to 3′ and 5′ ends of the target ribonucleic acid molecules in the sample. This method further involves providing one or more oligonucleotide probes, each oligonucleotide probe comprising (a) a 3′ portion complementary to the 3′ nucleotide linker of the target ribonucleic acid molecule, (b) a 5′ portion complementary to the 5′ nucleotide linker of the target ribonucleic acid molecules, and (c) a further portion, said further portion comprising (i) a unique identifier sequence, (ii) a patient identifier sequence, (iii) one or more primer binding sequences, or any combination of (i), (ii), and (iii). The sample is contacted with the one or more oligonucleotide probes under conditions effective for the 3′ and 5′ portions of the oligonucleotide probes to hybridize in a base specific manner to complementary nucleotide linkers on the target ribonucleic acid molecules, if present in the sample. The 3′ end of the hybridized oligonucleotide probe is extended to generate a complement of the one or more target ribonucleic acid molecules, and the 3′ extended end of the oligonucleotide probe is ligated to the 5′ end of the oligonucleotide probe to form a circular ligated product comprising a sequence complementary to the 3′ nucleotide linker of the target ribonucleic acid molecule, a sequence complementary to the 5′ nucleotide linker of the target ribonucleic acid molecule, the complement of the one or more target ribonucleic acid molecules, and the further portion of the oligonucleotide probe. 
This method further involves detecting and distinguishing the circular ligated products in the sample thereby identifying the presence of one or more target ribonucleic acid molecules differing from other nucleic acid molecules in the sample by one or more bases.
Another aspect of the present invention is directed to a method for identifying, in a sample, one or more target ribonucleic acid molecules differing from other nucleic acid molecules in the sample by one or more bases. This method involves providing a sample containing one or more target ribonucleic acid molecules potentially containing one or more base differences, and ligating nucleotide linkers to 3′ and 5′ ends of the target ribonucleic acid molecules in the sample, wherein said nucleotide linkers are coupled to each other by a further portion, said further portion comprising (i) a unique identifier sequence, (ii) a patient identifier sequence, (iii) one or more primer binding sequences, or any combination of (i), (ii), and (iii), whereby said ligating forms a circular ligation product comprising the target ribonucleic acid molecule, the 3′ and 5′ nucleotide linker sequences, and the further portion. This method further involves providing one or more first oligonucleotide primers comprising a nucleotide sequence that is complementary to a 3′ or 5′ nucleotide linker sequence of the circular ligation product, and hybridizing the one or more first oligonucleotide primers to the circular ligation product in a base specific manner. The 3′ end of the first oligonucleotide primer is extended to generate a complement of the circular ligation product, and the circular ligation product complements in the sample are detected and distinguished, thereby identifying the presence of one or more target ribonucleic acid molecules differing from other nucleic acid molecules in the sample by one or more bases.
Another aspect of the present invention is directed to a method for identifying, in a sample, one or more target ribonucleic acid molecules differing from other nucleic acid molecules in the sample by one or more bases. This method involves providing a sample containing one or more target ribonucleic acid molecules potentially containing one or more base differences, and providing one or more oligonucleotide probe sets, each set comprising (a) a first oligonucleotide probe having a 5′ stem-loop portion and a 3′ portion complementary to a 3′ portion of the target ribonucleic acid molecule, (b) a second oligonucleotide probe having a 3′ portion complementary to a copy of the 5′ end of the target ribonucleic acid molecule, a 5′ portion complementary to the 5′ stem-loop portion of the first oligonucleotide probe, and a further portion comprising (i) a unique target identifier sequence, (ii) a patient identifier sequence, (iii) a primer binding sequence, or any combination of (i), (ii), and (iii). The method further involves blending the sample, the one or more first oligonucleotide probes from a probe set, and a reverse transcriptase to form a reverse transcriptase reaction, and extending the 3′ end of the first oligonucleotide probe hybridized to its complementary target ribonucleic acid molecule to generate a complement of the target ribonucleotide molecule, if present in the sample. The one or more second oligonucleotide probes of a probe set are hybridized to the extended first oligonucleotide probes comprising a complement of the target ribonucleotide sequences, and one or more ligation competent junctions are generated between 3′ and 5′ ends of each second oligonucleotide probe hybridized to an extended first oligonucleotide probe. 
The method further involves ligating the 3′ and 5′ ends of each second oligonucleotide probe to form circular ligated products comprising a deoxyribonucleic acid copy of the target ribonucleic acid sequence coupled to the further portion of the second oligonucleotide probe. The circular ligated products in the sample are detected and distinguished, thereby identifying the presence of one or more target ribonucleic acid molecules differing from other nucleic acid molecules in the sample by one or more bases.
The significance of the invention is that it teaches a method for the highly specific, targeted capture of regions of human genomes and transcriptomes from the blood, i.e. from cell free circulating DNA, exosomes, microRNA, circulating tumor cells, or total blood cells, to allow for the highly sensitive detection of mutation, expression, copy number, translocation, alternative splicing, and methylation changes suitable for use in a high-throughput diagnostics mode. The single-stranded constructs of the invention are suitable for readout by a number of different technologies including Next Generation Sequencing and are designed to be readout technology agnostic. In order to maximize the sensitivity of capture, the invention utilizes not just hybridization but polymerase extension and ligation to decrease false positives. In order to maximize specificity, the un-ligated probes are exonuclease digested preserving the circularized target sequences. Additional specificity can be obtained by the sequencing of tandem repeats of the original genomic sequence produced by rolling circle amplification. Finally, the invention teaches a method of capturing tandem repeats produced from the original genomic sequence, generated by rolling circle amplification on a surface and then subjecting them to sequencing by synthesis thereby allowing highly accurate mutation detection, methylation status and transcriptome enumeration. The method is especially suited for identifying cancer-specific markers directly from the blood, for cancer screening as well as monitoring treatment efficacy and recurrence. The method is also suited for prenatal diagnosis of copy abnormalities and Mendelian diseases directly from the maternal blood.
A first aspect of the present invention is directed to a collection of different circular chimeric single-stranded nucleic acid constructs. Each construct of the collection comprises a first single stranded segment of original genomic DNA from a host organism and a second single stranded synthetic nucleic acid segment that is linked to the first single stranded segment and comprises a nucleotide sequence that is exogenous to the host organism. The second single stranded synthetic nucleic acid segment comprises a unique identifier portion, wherein the nucleotide sequence of both the unique identifier portion and the segment of original genomic DNA distinguishes one chimeric single-stranded nucleic acid construct in the collection from every other chimeric single-stranded nucleic acid construct in the collection. The chimeric single-stranded nucleic acid constructs of the collection are circularized and suitable for rolling circle amplification and/or sequencing.
In accordance with this aspect of the present invention, the collection generally contains between 1,000 and 4,000,000,000 circular chimeric single-stranded nucleic acid constructs. Methods of making the circular chimeric single-stranded nucleic acid constructs of the collection are described in detail infra.
The first single stranded segment of the chimeric nucleic acid construct comprises a segment of original genomic DNA from a host. This segment of original genomic DNA may be a nucleic acid fragment that is a direct product of fragmentation of genomic DNA (i.e., without addition of one or more linker portions to the ends of the fragment), or a nucleic acid fragment of genomic DNA to which linkers have been added. The original genomic DNA segment may be derived from any genome, e.g., an animal, plant, protozoa, fungus, bacteria, or virus genome such as the genome of a human, apple tree, giardia, yeast, Staphylococcus aureus, or papillomavirus. The segment of DNA can be isolated from any fresh, frozen, or fixed (e.g., formalin-fixed and paraffin embedded) biological source, including, without limitation, tissue, cells, serum, plasma, blood, or exosomes.
The second single stranded synthetic nucleic acid segment of the nucleic acid constructs of the collection is covalently linked to the first single stranded segment, e.g., via a phosphodiester bond. The second single stranded synthetic nucleic acid segment contains an identifier sequence. In one embodiment, the identifier sequence is a barcode. The barcode is generally an 8-12 nucleotide base sequence that is used in conjunction with the sequence of the original genomic DNA segment to distinguish each nucleic acid construct from another in the collection. When the fragments of genomic DNA are of similar sequence, it is important that the identifier or barcode sequence is sufficiently divergent, such that no two nucleic acid constructs are the same. For collections comprising about 10,000 genome equivalents (the approximate genomes in 1 ml of cell-free DNA), unique identifier sequences of 8 nucleotides contain sufficient diversity (65,536) to assure each circular construct is unique. For collections comprising about 1,000,000 genome equivalents (the approximate genomes from total cells in 10 ml of blood), unique identifier sequences of 12 nucleotides contain sufficient diversity (16,777,216) to assure each circular construct is unique. In addition, when the genomic DNA is either randomly fragmented, or biologically fragmented (i.e. as in cell-free DNA), the junctions between the genomic DNA and the synthetic nucleic acid will provide additional unique sequences to assist in distinguishing each circular construct.
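By way of illustration only, the diversity figures recited above follow from the four possible bases at each identifier position, i.e., 4^n distinct sequences for an identifier of n nucleotides. The following minimal Python sketch reproduces that arithmetic. The function names are hypothetical and are not part of the described method.

```python
def identifier_diversity(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct identifier sequences of the given length (4^n for DNA)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


def identifiers_per_genome_equivalent(length: int, genome_equivalents: int) -> float:
    """Ratio of available identifier sequences to genome equivalents in the collection."""
    return identifier_diversity(length) / genome_equivalents


# Figures recited in the text: 8-mer and 12-mer identifiers.
diversity_8 = identifier_diversity(8)    # 65,536
diversity_12 = identifier_diversity(12)  # 16,777,216

# Assumed inputs taken from the text: about 10,000 and about 1,000,000 genome equivalents.
ratio_cfdna = identifiers_per_genome_equivalent(8, 10_000)
ratio_blood = identifiers_per_genome_equivalent(12, 1_000_000)
```

In both recited cases the number of available identifier sequences exceeds the number of genome equivalents, and the junction sequences described above add further distinguishing information.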
In another embodiment, the identifier sequence comprises one or more primer binding sites and/or patient identifier sequences as described infra. The primer binding sites are used not only to facilitate amplification, but alone or in combination with the patient identifier sequence can serve as an identifying sequence segment for purposes of distinguishing individual circular constructs within the collection. Accordingly, the identifier sequence of the second single stranded segment of the nucleic acid constructs of the collection may comprise a barcode sequence, one or more primer sequences, a patient identifier sequence, or any combination thereof.
The chimeric nucleic acid constructs of the collection are circularized, i.e., covalently closed circularized nucleic acid molecules. In accordance with this aspect of the present invention, the circularized constructs are completely single-stranded. In other aspects of the present invention, the circularized constructs may be partially double stranded or completely double-stranded.
Another aspect of the present invention is directed to a method for amplifying nucleic acid molecules. This method involves providing the collection of different circular chimeric single-stranded nucleic acid constructs of the present invention as described supra, and blending the collection with a polymerase and a plurality of short primers (6 to 10 nucleotides wherein a portion of said sequence comprises random bases and/or nucleotide analogues; e.g. random hexamers, heptamers, octamers), to form an amplification reaction. The one or more short primers are complementary to a portion of one or more circular chimeric single stranded nucleic acid constructs of the collection. The amplification reaction mixture is subjected to one or more hybridization and extension treatments, wherein the one or more short primers hybridize to the circular chimeric single-stranded nucleic acid constructs and the polymerase extends the hybridized primers to produce a plurality of extension products. Each extension product comprises two or more tandem linear sequences that are complementary to a chimeric single-stranded nucleic acid construct from the collection.
The collection of double stranded extension products formed via this amplification method is suitable for any one of a broad range of next generation sequencing (NGS) protocols (e.g., 454 Pyrosequencing, Ion Torrent™ sequencing by synthesis, SOLiD™ sequencing by ligation, or MiSeq™ or HiSeq™ sequencing by synthesis systems). In accordance with NGS protocols, the double-stranded extension products are fragmented such that one or more tandem complementary copies of the target DNA are within a single fragment. Linkers are appended to the 5′ and 3′ ends of the fragmented DNA to allow for standard library preparation and template generation using cluster or bead amplification. A consensus sequence of all physically linked complementary copies of the original DNA molecule within the circular target is generated during sequencing. Because true mutations (*) will be present 2 or 3 times within a fragment, they are easily distinguishable from polymerase error.
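One way such a consensus could be computed, offered as a non-limiting sketch, is a per-position majority vote across the tandem copies contained in a single read. The sketch assumes that the repeat unit length is known and that the copies are aligned without insertions or deletions. The function name and the example sequences are hypothetical.

```python
from collections import Counter


def tandem_consensus(read: str, unit_length: int) -> str:
    """Majority-vote consensus across the complete tandem copies in one read.

    Positions without a strict majority are reported as 'N'.
    """
    copies = [read[i:i + unit_length]
              for i in range(0, len(read) - unit_length + 1, unit_length)]
    if not copies:
        raise ValueError("read is shorter than one repeat unit")
    consensus = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count * 2 > len(copies) else "N")
    return "".join(consensus)


# A true mutation (T to A at the fourth base) appears in every tandem copy.
true_mutation_read = "ACGATGCA" * 3
# A polymerase error (G to C at the third base) appears in only one copy.
polymerase_error_read = "ACGTTGCA" + "ACGTTGCA" + "ACCTTGCA"

mutant_consensus = tandem_consensus(true_mutation_read, 8)
corrected_consensus = tandem_consensus(polymerase_error_read, 8)
```

In this sketch the change present in all three copies is retained in the consensus, whereas the change present in a single copy is outvoted by the other two copies.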
In accordance with this and all aspects of the present invention, sequencing of the chimeric circular nucleic acid constructs of the present invention and extension products thereof can be carried out using any sequencing method known in the art, including, without limitation, sequencing by fluorescent primer hybridization, molecular beacon hybridization, primer extension, ligase detection reaction, ligase chain reaction, pyrosequencing, exonuclease-based sequencing, fluorescence-based sequencing-by-synthesis, fluorescence-based sequencing-by-ligation, nanopore and nanotube based sequencing, ion-based sequencing-by-synthesis, and ion-based sequencing-by-ligation. As used herein, “sequencing” encompasses a method by which the identity of at least 10 consecutive nucleotides of a polynucleotide target or template is obtained.
In another aspect of the present invention, each circular nucleic acid construct of the collection comprises a first single stranded segment of original genomic DNA from a host organism and a second single stranded synthetic nucleic acid segment that is linked to the first single stranded segment and comprises a nucleotide sequence that is exogenous to the host organism. The second single stranded synthetic nucleic acid segment comprises a unique identifier portion and a primary primer binding site, wherein the nucleotide sequence of the unique identifier portion, the primary primer binding site and the segment of original genomic DNA distinguishes one chimeric single-stranded nucleic acid construct in the collection from every other chimeric single-stranded nucleic acid construct in the collection. The chimeric single-stranded nucleic acid constructs of the collection are circularized and suitable for rolling circle amplification and/or sequencing.
Another aspect of the present invention is directed to a method for generating tandem linear copies of nucleic acid molecules that are suitable for sequencing. This method involves providing the collection of different circular chimeric single-stranded nucleic acid constructs of the present invention as described supra, and blending the collection with a polymerase with strand displacement activity, and one or more primary primers, to form a rolling circle extension reaction. The one or more primary primers are complementary to a portion of one or more circular chimeric single stranded nucleic acid constructs of the collection. The rolling circle extension reaction mixture is subjected to one or more hybridization and extension treatments, wherein the one or more primers hybridize to the circular chimeric single-stranded nucleic acid constructs and the polymerase extends the hybridized primers to produce a plurality of extension products. Each extension product comprises two or more tandem linear sequences that are complementary to a chimeric single-stranded nucleic acid construct from the collection.
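The relationship between a circular construct and its rolling circle extension product can be sketched in silico as follows. This is a simplified model only: it assumes a single primary primer that is perfectly complementary to one site on the circle, and it ignores reaction kinetics. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle_extension(circle: str, primer: str, copies: int) -> str:
    """Extension product (5' to 3') after `copies` trips around the circle.

    `circle` is the circular construct written 5' to 3' from an arbitrary start.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    site = reverse_complement(primer)
    # Search a doubled circle so that a site spanning the arbitrary start is found.
    start = (circle + circle).find(site)
    if start == -1 or start >= len(circle):
        raise ValueError("primer is not complementary to the circle")
    end = (start + len(primer)) % len(circle)
    # Rotate the circle so that the primer binding site is at its 3' end.
    unit = circle[end:] + circle[:end]
    # Each trip around the circle appends one complementary copy.
    return reverse_complement(unit) * copies


circle = "ACGGTCATTGCA"                       # hypothetical 12-nt circular construct
primary_primer = reverse_complement("TCATTG")  # complementary to a site on the circle
extension_product = rolling_circle_extension(circle, primary_primer, copies=2)
```

The resulting product begins with the primer sequence and consists of tandem linear copies that are each complementary to the circular construct.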
The first and second secondary primers hybridize to complementary regions of the single-stranded extension products as shown in
Another aspect of the present invention is directed to a method for generating tandem linear copies of nucleic acid molecules, if the target was methylated in the original genomic DNA (methylated base is depicted as “m”, see
In another aspect of the present invention, each circular nucleic acid construct of the collection comprises a first single stranded segment of original genomic DNA from a host organism linked to the second single stranded synthetic nucleic acid segment, where the second single stranded segment comprises one or more primer-specific sequences (e.g., a first and/or second solid support primer-specific portions), and optionally, a patient identifier sequence. In accordance with this aspect of the present invention, the second single stranded segment may or may not contain a unique identifier portion. The chimeric single-stranded nucleic acid constructs of the collection are circularized and suitable for rolling circle amplification and/or sequencing as described herein.
The patient identifier sequence of the second single stranded segment serves to identify the patient source of the original genomic DNA. The patient identifier sequence is generally about 5 to 8 nucleotides in length and is designed to distinguish sequences arising from different patients. In the preferred embodiment, the patient identifier sequences differ from each other in at least 3 positions, such that a single base error in sequencing still allows for positive identification of the correct patient identifier sequence. In current clinical laboratory practice, batch workup of samples is usually designed to be compatible with 96 and 384 well plate formats, such that 8, 16, 24, 48, 96, or 384 samples are processed simultaneously.
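The property that identifiers differing in at least 3 positions tolerate a single base error can be illustrated with the following sketch. It uses a simple greedy selection, which is an assumption made for illustration and not a design procedure recited herein. The function names are hypothetical, and an identifier length of 8 and a batch of 96 samples are chosen from the ranges given above.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def build_patient_identifiers(length: int, count: int, min_distance: int = 3) -> list:
    """Greedily pick `count` identifiers that differ pairwise in >= min_distance positions."""
    chosen = []
    for candidate in map("".join, product("ACGT", repeat=length)):
        if all(hamming(candidate, c) >= min_distance for c in chosen):
            chosen.append(candidate)
            if len(chosen) == count:
                return chosen
    raise ValueError("not enough identifiers of this length at this distance")


def assign_patient(read_tag: str, identifiers: list):
    """Return the single identifier within one mismatch of the read tag, else None."""
    matches = [i for i in identifiers if hamming(read_tag, i) <= 1]
    return matches[0] if len(matches) == 1 else None


identifiers = build_patient_identifiers(length=8, count=96)
# A read carrying one sequencing error in the tag still maps to its identifier.
recovered = assign_patient("AAAACAAA", identifiers)
```

Because any two identifiers differ in at least 3 positions, a tag with one erroneous base remains within one mismatch of its true identifier and at least two mismatches from every other identifier.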
Another aspect of the present invention is directed to a system comprising a collection of different circular chimeric single-stranded nucleic acid constructs. Each construct of the collection comprises a first single stranded segment of original genomic DNA from a host organism and a second single stranded nucleic acid segment that is linked to the first single stranded segment and comprises a nucleotide sequence that is exogenous to the host organism. The nucleotide sequence of the second single stranded nucleic acid segment comprises a first solid support primer-specific portion, a second solid support primer-specific portion, and a patient identifier sequence. The chimeric single-stranded nucleic acid constructs of the collection are circularized and suitable for rolling circle amplification and/or sequencing. The system further comprises a collection of extension products, each extension product comprising two or more tandem linear sequences that are complementary to the chimeric single-stranded nucleic acid construct from the collection. Each extension product in the collection is hybridized to its complementary circular chimeric single-stranded nucleic acid construct of the collection.
This system of the present invention may further comprise a solid support having a plurality of immobilized first oligonucleotide primers. Each first oligonucleotide primer on the solid support has a nucleotide sequence that is the same as the nucleotide sequence of the first solid support primer-specific portion of the chimeric single stranded nucleic acid constructs of the collection, and that is complementary to the first solid support primer-specific portion of the extension products. Accordingly, one or more of the first oligonucleotide primers on the solid support can hybridize to an extension product of the collection of extension products via the first solid support primer-specific portions.
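The sequence relationship described above can be checked with a short sketch: an immobilized primer having the same sequence as the first solid support primer-specific portion of the construct is complementary to each tandem copy of the extension product. All sequences and function names below are hypothetical and serve only to illustrate the relationship.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def capture_sites(strand: str, immobilized_primer: str) -> list:
    """Start positions on `strand` where the immobilized primer can hybridize."""
    site = reverse_complement(immobilized_primer)
    positions = []
    index = strand.find(site)
    while index != -1:
        positions.append(index)
        index = strand.find(site, index + 1)
    return positions


genomic_segment = "TTGACCAGTA"       # hypothetical genomic DNA segment
first_primer_portion = "CTGAAGTCCA"  # hypothetical first solid support primer-specific portion
construct = genomic_segment + first_primer_portion

# Extension product: three tandem copies complementary to the construct.
extension_product = reverse_complement(construct) * 3

sites_on_product = capture_sites(extension_product, first_primer_portion)
sites_on_construct = capture_sites(construct, first_primer_portion)
```

In this example the primer finds one hybridization site per tandem copy of the extension product and none on the construct itself, which carries the same sequence as the primer rather than its complement.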
The solid support can be made from a wide variety of materials. The substrate may be biological, nonbiological, organic, inorganic, or a combination of any of these, existing as particles, strands, precipitates, gels, sheets, tubing, spheres, beads, containers, capillaries, pads, slices, films, plates, slides, discs, membranes, etc. The substrate may have any convenient shape, such as a disc, square, circle, etc. The substrate is preferably flat but may take on a variety of alternative surface configurations. For example, the substrate may contain raised or depressed regions on which the hybridization takes place. The substrate and its surface preferably form a rigid support on which to carry out sequencing reactions described herein.
Commercially available next generation sequencing solid support platforms used for template preparation can be utilized in the system and methods of the present invention. For example, the Illumina® Flow Cell, Life Technologies® IonSphere™ and emulsion PCR beads, and 454 emulsion PCR beads can be used in the system and methods of the present invention. Accordingly, the first solid support primer-specific portion of the circular chimeric single stranded nucleic acid constructs is designed to be the same as the primers immobilized on a commercially available NGS solid support. Therefore, the extension products containing the complement of the first solid support primer-specific portion are capable of hybridizing to primers on the NGS solid support surface.
This system of the present invention may further comprise a collection of crosslinking oligonucleotides as shown in
As shown in
Another aspect of the present invention is directed to a system comprising a collection of different circular chimeric single-stranded nucleic acid constructs. Each construct comprises a first single stranded segment of original genomic DNA from a host organism and a second single stranded nucleic acid segment that is linked to the first single stranded segment and comprises a nucleotide sequence that is exogenous to the host organism. The nucleotide sequence of the second single stranded nucleic acid segment comprises a first solid support primer-specific portion, a second solid support primer-specific portion, and a patient identifier sequence. The chimeric single-stranded nucleic acid constructs of the collection are suitable for rolling circle amplification and/or sequencing. The system further comprises one or more oligonucleotide amplification primers, each primer comprising at least a first nucleotide sequence portion that is complementary to the first solid support primer-specific portion or the second solid support primer-specific portion of the chimeric single-stranded nucleic acid constructs of the collection. Finally, this system also has a polymerase suitable for rolling circle amplification.
Exemplary depictions of this system of the present invention are shown in
In another embodiment, the one or more oligonucleotide amplification primers comprise (i) a first nucleotide sequence that is complementary to the original genomic DNA segment of the chimeric single stranded nucleic acid constructs of the collection, (ii) a 3′ portion comprising a cleavable nucleotide or nucleotide analogue and a blocking group that blocks 3′ polymerase extension of said oligonucleotide amplification primer, and (iii) a 5′ portion comprising a cleavable nucleotide or nucleotide analogue and a capture group, where the capture group is capable of being immobilized to a solid support.
In another embodiment, the one or more oligonucleotide amplification primers comprise (a) a first oligonucleotide amplification primer having (i) a first nucleotide sequence that is complementary to a first portion of the original genomic DNA segment of the chimeric single stranded nucleic acid constructs of the collection, and (ii) a 3′ portion comprising a cleavable nucleotide or nucleotide analogue and a blocking group that blocks 3′ polymerase extension of said first oligonucleotide amplification primer, and (b) a second oligonucleotide amplification primer having (i) a first nucleotide sequence that is complementary to a second portion of the original genomic DNA segment of the chimeric single stranded nucleic acid constructs of the collection, and (ii) a 5′ portion comprising a cleavable nucleotide or nucleotide analogue and a capture group, where the capture group is capable of being immobilized to a solid support.
In accordance with this aspect of the present invention, the system may further comprise a solid support having a plurality of immobilized first oligonucleotide primers. The first oligonucleotide primers on the solid support have a nucleotide sequence that is the same as the nucleotide sequence of the first solid support primer-specific portion of the chimeric single stranded nucleic acid constructs of the collection (see e.g.,
The system of the present invention may also comprise a collection of crosslinking oligonucleotides as described supra (i.e., an oligonucleotide having two or more repeats of a nucleotide sequence, where the repeated nucleotide sequence has the same sequence as at least a portion of the second single stranded nucleic acid segment of the chimeric single-stranded nucleic acid constructs in the collection).
In accordance with this aspect of the present invention, the polymerase of this system is a strand-displacing polymerase that is suitable for rolling circle amplification. Exemplary strand-displacing polymerases include, without limitation, phi29 DNA polymerase, Bst DNA polymerase (large fragment or 5′→3′ exo-), Bsu DNA Polymerase (large fragment or 5′→3′ exo-), DeepVentR® (exo-) polymerase, Klenow Fragment (3′→5′ exo-), DNA Polymerase I (5′→3′ exo-), M-MuLV Reverse Transcriptase, VentR® (exo-) DNA Polymerase, and PyroPhage 3173 DNA Polymerase. Other exemplary strand-displacing polymerases include those having thermostability and strand-displacing activity, such as SD DNA polymerase (a mutant Taq DNA polymerase) (see U.S. Patent Application Publication No. 2012/0115145 to Fu, WO2014/161712 to Ignatov et al., and Ignatov et al., “A Strong Strand Displacement Activity of Thermostable DNA Polymerase Markedly Improves the Results of DNA Amplification,” BioTechniques 57:81-87 (2014), which are hereby incorporated by reference in their entirety); AptaHotTaq Polymerase (thermostable 5′→3′ polymerase activity with a 5′ flap endonuclease activity); polymerases derived from thermophilic viruses and microbes (see U.S. Patent Application Publication 2012/0083018 to Schoenfeld et al., and U.S. Pat. No. 8,093,030 to Schoenfeld et al., which are hereby incorporated by reference in their entirety); polymerases derived from Thermus antranikianii and Thermus brockianus as disclosed in WO2006/030455 to Hjorleifsdottir et al., and U.S. Patent Application Publication No. 2008/0311626 to Hjorleifsdottir et al., which are hereby incorporated by reference in their entirety; the thermostable polymerase derived from Thermus scotoductus (see WO2007/076461 to Rech et al., which is hereby incorporated by reference in its entirety); and Type I DNA polymerase derived from Bacillus pallidus (see U.S. Pat. No. 5,736,373 to Hamilton, which is hereby incorporated by reference in its entirety).
Other strand-displacing polymerases known in the art are also suitable for this system and related methods of the present invention.
Another aspect of the present invention is directed to a method of sequencing a plurality of nucleic acid molecules using this system. In accordance with this method, the oligonucleotide amplification primers hybridized to complementary circular chimeric single-stranded nucleic acid constructs of the collection are blended with the polymerase to form a rolling circle amplification reaction mixture. The rolling circle amplification reaction mixture is subject to an extension treatment where the polymerase extends the one or more hybridized oligonucleotide amplification primers to produce a plurality of primary extension products. Each primary extension product comprises one or more tandem linear sequences, each tandem linear sequence being complementary to a circular chimeric single-stranded nucleic acid construct in the collection. The circular chimeric single-stranded nucleic acid constructs can be sequenced directly, e.g., the circular chimeric construct is the template for sequence-by-synthesis. Alternatively, the primary extension products formed from the rolling circle amplification reaction are the templates used for sequencing. As noted above, any sequencing method known in the art can be utilized to sequence the circular nucleic acid constructs or the primary extension products thereof.
In one embodiment, the primary extension products are immobilized on a solid support prior to sequencing. A suitable solid support comprises at least a plurality of first oligonucleotide primers each having a nucleotide sequence that is the same as the nucleotide sequence of the first solid support primer-specific portion of the chimeric single stranded nucleic acid constructs of the collection (i.e., having a sequence complementary to the first solid support primer-specific portion of the primary extension products). Suitable solid supports include, without limitation, those that are commercially available and utilized for template preparation in next generation sequencing platforms, e.g., Illumina® flow cell, Life Technologies® IonSphere™. The primary extension products hybridize to the first oligonucleotide primers on the solid support and are sequenced on the support as described in more detail below.
The solid support may further comprise a plurality of second oligonucleotide primers. The second primers have a nucleotide sequence that is complementary to the second solid support primer-specific portion of the chimeric single stranded nucleic acid constructs. As depicted in
As depicted in
Once the primary extension product is immobilized on the solid support (
To sequence the opposite strand (i.e. to obtain the sequence of the primary extension products), the first oligonucleotide primer on solid support is unblocked. A polymerase (filled diamonds) with 5′-3′ nuclease activity extends the tethered sequencing primer on the solid support while digesting product from the sequencing by synthesis reaction as shown in
As shown in
Sequencing of the secondary and tertiary extension products can be achieved using sequence-by-synthesis as described and depicted herein. Sequence-by-synthesis includes fluorescence-based sequencing-by-synthesis and ion-based sequencing-by-synthesis. Other suitable sequencing methods can also be employed, including, for example and without limitation, fluorescent primer hybridization, molecular beacon hybridization, primer extension, exonuclease-based sequencing, ligase detection reaction, ligase chain reaction, pyrosequencing, fluorescence-based sequencing-by-ligation, nanopore and nanotube based sequencing, and ion-based sequencing-by-ligation.
Another aspect of the present invention is directed to methods of making the circular chimeric single stranded nucleic acid constructs that are a component of the systems and methods described herein. A number of exemplary methods are described below and depicted in the accompanying figures.
One suitable method for making the circular chimeric single stranded nucleic acid constructs involves providing a sample containing one or more target genomic DNA segments. The target genomic DNA segments may potentially contain one or more base differences or one or more methylated residues of interest for detection. The method further involves providing one or more first oligonucleotide probes, each first oligonucleotide probe comprising (a) a 5′ target-specific portion, (b) a 3′ target-specific portion, and (c) a further portion. The further portion is a nucleotide sequence comprising (i) a patient identifier sequence, (ii) a first solid support primer-specific portion, and (iii) a second solid support primer-specific portion. The sample and the one or more first oligonucleotide probes are contacted under conditions effective for the 3′ target-specific portion of a first oligonucleotide probe to hybridize in a base specific manner to a complementary 3′ end of a target genomic DNA segment, and for the 5′ target-specific portion of the first oligonucleotide probe to hybridize in a base specific manner to a complementary 5′ end of the target genomic DNA segment, if present in the sample. Following hybridization, one or more ligation competent junctions suitable for coupling the 3′ and 5′ ends of the target genomic DNA segment hybridized to the first oligonucleotide probe are generated and the target genomic DNA segment is ligated together at the one or more ligation junctions to form a circular chimeric single-stranded nucleic acid construct of the collection.
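The arrangement of the probe portions relative to the target ends can be sketched as follows. This is an illustrative layout only: it assumes fixed-length target-specific portions with perfect complementarity, and the sequences used for the patient identifier and the two solid support primer-specific portions are placeholders rather than sequences disclosed herein. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_first_probe(target: str, arm_length: int, further_portion: str) -> str:
    """Probe (5' to 3') whose ends are complementary to the two ends of the target.

    The 5' target-specific portion pairs with the 5' end of the target and the
    3' target-specific portion pairs with the 3' end of the target, so that both
    target ends are held on the same probe molecule.
    """
    if arm_length < 1 or 2 * arm_length > len(target):
        raise ValueError("arm_length must fit twice within the target")
    five_prime_portion = reverse_complement(target[:arm_length])
    three_prime_portion = reverse_complement(target[-arm_length:])
    return five_prime_portion + further_portion + three_prime_portion


# Placeholder components of the further portion (assumed sequences).
patient_identifier = "AACCGGTT"
first_support_portion = "CTGAAGTCCA"
second_support_portion = "GTCAGTCTAC"
further = first_support_portion + patient_identifier + second_support_portion

target_segment = "ACGTTGCAAGGCTTACCGAT"  # hypothetical target genomic DNA segment
probe = design_first_probe(target_segment, arm_length=6, further_portion=further)
```

The order of the components within the further portion is arbitrary in this sketch; the text above requires only that the further portion comprise the three recited components.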
The further portion of the oligonucleotide probe (shown as a thick black bar) may also contain a unique identifier sequence, a patient identifier sequence, a primer binding sequence, and/or a cleavable link (
The process depicted in
The further portion of the oligonucleotide probes (shown as a thick black bar) may contain a unique identifier sequence, a patient identifier sequence, a primer binding sequence, and/or a cleavable link (depicted within the thick bar as “U” (
The process depicted in
The further portion of the oligonucleotide probes (shown as a thick black bar) may contain a unique identifier sequence, a patient identifier sequence, a primer binding sequence, and/or a cleavable link (within the thick bar labelled as “U”) (
The process depicted in
The further portion of the oligonucleotide probes (shown as a thick black bar) may contain a unique identifier sequence, a patient identifier sequence, a primer binding sequence, and/or a cleavable link (within the thick bar labelled as “U” (
The 3′ and 5′ target specific portions of the oligonucleotide probes, e.g., the oligonucleotide probes used in the process depicted in
The process depicted in
In the embodiment depicted in
In the embodiment depicted in
In the embodiment of
In the embodiment depicted in
In the embodiment depicted in
In the embodiment of
In the embodiment depicted in
In the embodiment of
In the embodiment depicted in
Another suitable approach for generating the different circular chimeric single stranded nucleic acid constructs of the collection involves providing a sample containing one or more target genomic DNA segments potentially containing one or more base differences or one or more methylated residues and appending nucleotide linker sequences to 3′ and 5′ ends of the target genomic DNA segments. The appended nucleotide linkers optionally comprise (i) the patient identifier sequence, (ii) the first solid support primer-specific portion, (iii) the second solid support primer-specific portion, and/or (iv) a unique identifier sequence. One or more first oligonucleotide probes are provided where each first oligonucleotide probe comprises (a) a portion complementary to a 3′ linker portion of the linker appended target genomic DNA segment, (b) a portion complementary to the 5′ linker portion of the linker appended target genomic DNA segment, and (c) optionally a further portion. The further portion optionally comprises (i) the patient identifier sequence, (ii) the first solid support primer-specific portion, (iii) the second solid support primer-specific portion, and/or (iv) a unique identifier sequence. The sample and the one or more first oligonucleotide probes are contacted under conditions effective for the 3′ and 5′ portions of the first oligonucleotide probes to hybridize in a base specific manner to complementary linkers of the linker appended target genomic DNA segments, if present in the sample. One or more ligation competent junctions suitable for coupling the 3′ and 5′ ends of the linker appended target genomic DNA segment hybridized to the first oligonucleotide probe are generated. Ligation of the linker appended target genomic DNA segment at the one or more ligation junctions forms the different circular chimeric single-stranded nucleic acid constructs of the collection.
As shown in
The standard approach for appending linkers is illustrated in
While the standard approach provides the opportunity for introducing unique sequences on the single-stranded portions of the linkers, these sequences do not allow for unambiguous matching of a top strand sequence with a bottom strand sequence. To achieve this type of construct, the standard approach is modified as illustrated in
In this embodiment, the linker sequence comprises a first, second and third oligonucleotide. The first oligonucleotide comprises the first solid support primer-specific portion, one or more first patient identifier sequences, a first sequencing primer binding site, and a 3′ end that is complementary to the third oligonucleotide. The second oligonucleotide comprises a region complementary to the 5′ end of the third oligonucleotide, and the third oligonucleotide comprises a 5′ end that is complementary to the second oligonucleotide, a second sequencing primer binding site, a portion complementary to the 3′ end of the first oligonucleotide, a second patient identifier sequence, and the second solid support primer-specific portion. The third oligonucleotide hybridizes to complementary portions of the first and second oligonucleotides to form composite linkers. As shown in
In this embodiment, the linker sequences comprise (i) a first linker oligonucleotide comprising the first solid support primer-specific portion, one or more first patient identifier sequences, a first sequencing primer binding site, and riboguanosine bases on the 3′ end, and (ii) a second linker oligonucleotide comprising a second sequencing primer binding site, a second patient identifier sequence, the second solid support primer-specific portion, and a 5′ portion that is complementary to the first oligonucleotide. The double-stranded target genomic DNA segments are blended with the linker sequences, a reverse transcriptase, and a ligase to form a reverse-transcription—ligation reaction mixture. The riboguanosine bases of the first linker oligonucleotide hybridize to the 3′ cytosine overhang of the double stranded target genomic DNA segment (
The procedures exemplified in
Thus, the procedures exemplified in
A polymerase with strand-displacement activity is introduced to extend the second primer to generate single stranded extension product containing tandem copies of the target DNA and liberate extension product from the solid support (
The oligonucleotide probe design shown in
The oligonucleotide probe design shown in
The oligonucleotide probe design shown in
The oligonucleotide probe design shown in
The oligonucleotide probe design shown in
The oligonucleotide probe design shown in
Another aspect of the present invention is directed to a method for identifying, in a sample, one or more target ribonucleic acid molecules differing from other nucleic acid molecules in the sample by one or more bases. This method involves providing a sample containing one or more target ribonucleic acid molecules potentially containing one or more base differences and generating, in the sample, cDNA of the one or more target ribonucleic acid molecules, if present in the sample. The method further involves providing one or more first oligonucleotide probes, each first oligonucleotide probe comprising (a) a 3′ cDNA target-specific sequence portion, (b) a 5′ cDNA target-specific portion, and (c) a further portion, said further portion comprising (i) a unique identifier sequence, (ii) a patient identifier sequence, (iii) one or more primer binding sequences, or any combination of (i), (ii), and (iii). The sample is contacted with the one or more first oligonucleotide probes under conditions effective for 3′ and 5′ target-specific portions of the first oligonucleotide probes to hybridize in a base specific manner to complementary regions of the cDNA. One or more ligation competent junctions suitable for coupling 3′ and 5′ ends of a first oligonucleotide probe hybridized to its complementary cDNA are generated, and the first oligonucleotide probe is ligated at the one or more ligation junctions to form a ligated circular product comprising a deoxyribonucleic acid copy of the target ribonucleic acid sequence coupled to the further portion of the first oligonucleotide probe. The method further involves detecting and distinguishing the circular ligated products in the sample to identify the presence of one or more target ribonucleic acid molecules differing from other ribonucleic acid molecules in the sample by one or more bases.
In accordance with this aspect of the present invention,
Another aspect of the present invention is directed to a method for identifying, in a sample, one or more nucleic acid molecules potentially comprising distinct first target and second target regions coupled to each other (e.g., putative gene fusions). This method involves providing a sample potentially containing one or more nucleic acid molecules comprising distinct first target and second target regions coupled to each other, and providing one or more oligonucleotide probe sets, each probe set comprising (i) a first oligonucleotide probe comprising a 5′ first target-specific portion, a 3′ second target specific portion, and a further portion, and (ii) a second oligonucleotide probe comprising a 5′ second target specific portion, a 3′ first target specific portion, and a further portion, wherein the further portion of the first or second oligonucleotide probes of a probe set comprises (i) a unique identifier sequence, (ii) a patient identifier sequence, (iii) one or more primer binding sequences, or any combination of (i), (ii), and (iii). This method further involves contacting the sample and the one or more oligonucleotide probe sets under conditions effective for first and second oligonucleotide probes of a probe set to hybridize in a base specific manner to their corresponding first and second target regions of the nucleic acid molecule, if present in the sample, and generating one or more ligation competent junctions suitable for coupling 3′ ends of first oligonucleotide probes to 5′ ends of second oligonucleotide probes of a probe set and for coupling 5′ ends of first oligonucleotide probes to 3′ ends of second oligonucleotide probes of a probe set when said probe sets are hybridized to complementary first and second target regions of a nucleic acid molecule. 
The first and second oligonucleotides of a probe set are ligated at the one or more ligation competent junctions to form circular ligated products comprising a nucleotide sequence corresponding to the first and second distinct target regions of a nucleic acid molecule coupled to a further portion. The circular ligated products are detected and distinguished in the sample thereby identifying the presence, if any, of one or more nucleic acid molecules comprising distinct first target and second target regions coupled to each other in the sample.
In accordance with this aspect of the present invention,
Another aspect of the present invention is directed to a method for identifying, in a sample, one or more target miRNA molecules differing from other nucleic acid molecules in the sample by one or more bases. This method involves providing a sample containing one or more target miRNA molecules potentially containing one or more base differences, and appending nucleotide linkers to 3′ and 5′ ends of the target miRNA molecules in the sample. This method further involves providing one or more oligonucleotide probes, each oligonucleotide probe comprising (a) a 3′ portion complementary to the 3′ nucleotide linker of the target miRNA molecule, (b) a 5′ portion complementary to the 5′ nucleotide linker of the target miRNA molecules, and (c) a further portion, said further portion comprising (i) a unique identifier sequence, (ii) a patient identifier sequence, (iii) one or more primer binding sequences, or any combination of (i), (ii), and (iii). The sample is contacted with the one or more oligonucleotide probes under conditions effective for the 3′ and 5′ portions of the oligonucleotide probes to hybridize in a base specific manner to complementary nucleotide linkers on the target miRNA molecules, if present in the sample. The 3′ end of the hybridized oligonucleotide probe is extended to generate a complement of the one or more target miRNA molecules, and the 3′ extended end of the oligonucleotide probe is ligated to the 5′ end of the oligonucleotide probe to form a circular ligated product comprising a sequence complementary to the 3′ nucleotide linker of the target miRNA molecule, a sequence complementary to the 5′ nucleotide linker of the target miRNA molecule, the complement of the one or more target miRNA molecules, and the further portion of the oligonucleotide probe. 
This method further involves detecting and distinguishing the circular ligated products in the sample thereby identifying the presence of one or more target miRNA molecules differing from other nucleic acid molecules in the sample by one or more bases.
In accordance with this aspect of the present invention,
Another aspect of the present invention is directed to a method for identifying, in a sample, one or more target miRNA molecules differing from other nucleic acid molecules in the sample by one or more bases. This method involves providing a sample containing one or more target miRNA molecules potentially containing one or more base differences, and ligating nucleotide linkers to 3′ and 5′ ends of the target miRNA molecules in the sample. The nucleotide linkers are coupled to each other by a further portion, said further portion comprising (i) a unique identifier sequence, (ii) a patient identifier sequence, (iii) one or more primer binding sequences, or any combination of (i), (ii), and (iii), whereby said ligating forms a circular ligation product comprising the target miRNA molecule, the 3′ and 5′ nucleotide linker sequences, and the further portion. This method further involves providing one or more first oligonucleotide primers comprising a nucleotide sequence that is complementary to a 3′ or 5′ nucleotide linker sequence of the circular ligation product, and hybridizing the one or more first oligonucleotide primers to the circular ligation product in a base specific manner. The 3′ end of the first oligonucleotide primer is extended to generate a complement of the circular ligation product, and the circular ligation product complements in the sample are detected and distinguished, thereby identifying the presence of one or more target miRNA molecules differing from other nucleic acid molecules in the sample by one or more bases.
In accordance with this aspect of the present invention,
Another aspect of the present invention is directed to a method for identifying, in a sample, one or more target miRNA molecules differing from other nucleic acid molecules in the sample by one or more bases. This method involves providing a sample containing one or more target miRNA molecules potentially containing one or more base differences, and providing one or more oligonucleotide probe sets, each set comprising (a) a first oligonucleotide probe having a 5′ stem-loop portion and a 3′ portion complementary to a 3′ portion of the target miRNA molecule, (b) a second oligonucleotide probe having a 3′ portion complementary to a copy of the 5′ end of the target miRNA molecule, a 5′ portion complementary to the 5′ stem-loop portion of the first oligonucleotide probe, and a further portion comprising (i) a unique target identifier sequence, (ii) a patient identifier sequence, (iii) a primer binding sequence, or any combination of (i), (ii), and (iii). The method further involves blending the sample, the one or more first oligonucleotide probes from a probe set, and a reverse transcriptase to form a reverse transcriptase reaction, and extending the 3′ end of the first oligonucleotide probe hybridized to its complementary target miRNA molecule to generate a complement of the target miRNA molecule, if present in the sample. The one or more second oligonucleotide probes of a probe set are hybridized to the extended first oligonucleotide probes comprising a complement of the target miRNA sequences, and one or more ligation competent junctions are generated between 3′ and 5′ ends of each second oligonucleotide probe hybridized to an extended first oligonucleotide probe. The method further involves ligating the 3′ and 5′ ends of each second oligonucleotide probe to form circular ligated products, each product comprising a deoxyribonucleic acid copy of the target miRNA sequence coupled to the further portion of the second oligonucleotide probe. 
The circular ligated products in the sample are detected and distinguished, thereby identifying the presence of one or more target miRNA molecules differing from other nucleic acid molecules in the sample by one or more bases.
In accordance with this aspect of the present invention,
Mutational changes in oncogenes are usually in discrete regions or positions and can often drive tumor progression. A list of these genes and their mutations may be found in public databases such as the Sanger Genome Center “COSMIC” database. Presence of such mutations in serum is a strong indicator of some tumor tissue in the body. Traditionally such mutations have been identified using allele-specific PCR amplification. This approach is susceptible to an initial false-amplification, followed by amplification of the false product. Others have used digital PCR to try to quantify mutant DNA in the serum. However, mutational changes in tumor suppressor genes such as p53 and APC are too numerous to cover using allele-specific PCR approaches. Thus, the approach has shifted to deep sequencing across all exons of the gene. When input DNA is limiting, it is important to achieve equal amplification of different regions to assure the same general depth of coverage. There have been attempts to achieve equal amplifications using molecular inversion probes (MIP), see e.g., Akhras et al., “Connector Inversion Probe Technology: A Powerful One-Primer Multiplex DNA Amplification System for Numerous Scientific Applications,” PLoS One 2(9):e915 (2007); Hiatt et al., “Single Molecule Molecular Inversion Probes for Targeted, High-Accuracy Detection of Low-Frequency Variation,” Genome Res. 23(5):843-54 (2013), which are hereby incorporated by reference in their entirety. However, these techniques depend on making a copy of genomic DNA, and thus risk introducing additional errors.
Overview of approach: The idea is to faithfully capture every fragment of target DNA that covers the regions or exons of interest, append a unique identifier sequence and an optional patient identifier, and circularize them for subsequent sequencing, such as circle sequencing. The original target DNA strand is circularized and sequenced. This avoids polymerase-induced errors that arise from extending (and circularizing) hybridized oligonucleotides, which would lead to false-positives. This approach provides the advantage of obtaining both copy number information when needed (i.e. viral load, chromosomal imbalance in tumors, detection of aneuploidy in non-invasive prenatal diagnosis), and mutational data with the minimum of sequencing required.
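As an illustration of how a unique identifier sequence can yield both copy number and mutational data from the same reads, the following is a minimal sketch only. It assumes that each sequencing read has already been reduced to a hypothetical tuple of (unique identifier, target name, base observed at the position of interest); this layout and the function name are illustrative and are not part of the described method. Reads sharing a unique identifier are treated as derived from one original circularized molecule.

```python
from collections import Counter, defaultdict

def summarize_by_uid(reads):
    """Group reads by (target, unique identifier) and summarize per target.

    reads: iterable of (uid, target, base) tuples (hypothetical layout).
    Returns (copy_number, calls): the number of distinct original molecules
    per target, and the per-molecule majority base tallied per target.
    """
    molecules = defaultdict(Counter)
    for uid, target, base in reads:
        molecules[(target, uid)][base] += 1

    copy_number = Counter()
    calls = defaultdict(Counter)
    for (target, uid), bases in molecules.items():
        copy_number[target] += 1                      # one UID = one molecule
        majority_base, _ = bases.most_common(1)[0]    # majority vote within molecule
        calls[target][majority_base] += 1
    return dict(copy_number), {t: dict(c) for t, c in calls.items()}

reads = [
    ("AAGT", "KRAS", "G"),
    ("AAGT", "KRAS", "G"),
    ("AAGT", "KRAS", "A"),  # lone disagreeing read within one molecule
    ("CCTA", "KRAS", "A"),
    ("CCTA", "KRAS", "A"),
    ("GGAC", "TP53", "C"),
]
copies, calls = summarize_by_uid(reads)
print(copies)
print(calls)
```

In this toy input, the single disagreeing read within molecule AAGT is outvoted, whereas the base seen consistently in molecule CCTA is retained as a candidate variant.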
Variation 1.1: (
The hybridization conditions are chosen such that hybridization of the 3′ probe region to the complementary sequence on the 3′ side of the target increases the local concentration of the 3′ end of the linker on the target near its complement, such that it hybridizes correctly and is readily extended by polymerase. However, if there is insufficient complementarity between the oligonucleotide probe and the target DNA, then the 3′ end of the linker on the target will not hybridize to its complement, and will rarely be extended by polymerase. Extension of the 3′ end of the oligonucleotide on the target enhances association of the probe to the target, and thus increases the ability of the 3′ end of the linker to hybridize correctly to its complement and be extended by polymerase. The 5′→3′ nuclease cleavage activity of polymerase (or Fen nuclease) cleaves a matching 5′-overlapping base of target at or near the position where the 5′ side of the target is complementary to the 5′ portion of the probe, leaving ligation-competent 5′-phosphate from the authentic target. Polymerase also extends oligonucleotide on target, and either generates a ligation-competent 5′-phosphate (left side of
Ligase covalently seals the extended 3′ ends to the ligation-competent 5′-phosphate to create circular ligation products. Blocking group prevents circularization of oligonucleotide probe (
This approach requires an enzyme with target-dependent 5′→3′ nuclease activity such that the ligation-competent 5′-phosphate is generated only when there is proper hybridization and complementarity between the 5′ probe region and the 5′ target region. When using polymerase with 5′→3′ nuclease cleavage activity, the challenge is to avoid having polymerase extend the short linker along the 5′ probe region in such a way that it destroys the 5′ target region (i.e. nick-translation) without a ligation step. Nick-translation that destroys original target DNA and replaces it with extended DNA may inadvertently introduce a polymerase error that would be propagated and miscalled as a mutation. This may be minimized by using a mixture of polymerases, both with and without 5′→3′ nuclease cleavage activity (e.g. in a ratio of 1:20) under conditions of distributive extension in the presence of ligase such that most extension is by polymerase without nuclease activity until polymerase with nuclease activity is required to create the ligation competent junction, followed by polymerase dissociation, and a ligation event to generate the desired circular ligation product.
cfDNA, with sizes in the 140 to 170 nucleotide range, reflects cleavage of chromosomal DNA around nucleosome units. Such cleavage events may be somewhat phased, or more random, generating a number of fragments that may be evenly or unevenly distributed. Thus, there is a need to design oligonucleotide probes that will give coverage independent of where the fragment breaks.
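One way to reason about coverage that is independent of fragment break points is to tile staggered probes across the region and check every possible fragment position. The sketch below is a simplified model under stated assumptions: each probe is reduced to a single contiguous footprint on the target, and the footprint length (80), stagger (40), region length (400), and function names are hypothetical values chosen for illustration only.

```python
def probe_starts(region_start, region_end, footprint, step):
    """Start coordinates of staggered probe footprints tiled across a region."""
    return list(range(region_start, region_end - footprint + 1, step))

def probes_within(frag_start, frag_end, starts, footprint):
    """Probes whose whole footprint lies inside the fragment [frag_start, frag_end)."""
    return [s for s in starts if s >= frag_start and s + footprint <= frag_end]

# Hypothetical design: 400-nt region, 80-nt footprints staggered every 40 nt.
starts = probe_starts(0, 400, footprint=80, step=40)

# Slide the shortest expected cfDNA fragment (140 nt) across every break position
# and record any position at which no probe footprint fits entirely inside it.
shortest = 140
uncovered = [
    f for f in range(0, 400 - shortest + 1)
    if not probes_within(f, f + shortest, starts, 80)
]
print(uncovered)
```

In this simplified model, a fragment of length L always contains a complete footprint when L is at least the footprint length plus the stagger minus one, which is one way to choose the stagger for a given minimum fragment size.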
In
In
Detailed Protocol (V1.1) for High Sensitivity Detection of Single Base Mutation, Small Insertion, or Small Deletion Mutations (when Present at 1% to 0.01%) in Known Genes (e.g. Braf, K-Ras, p53):
1.1.a. Starting with cfDNA (or for example genomic DNA isolated from CTC, sheared to about 150 bp), repair ends with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is added with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. Optionally, purify target DNA from unligated linker.
1.1.b. Denature target DNA containing linkers on both ends (94° C., 1 minute) in the presence of oligonucleotide probes (comprising a 5′ probe region complementary to the sequences on the 5′ side of the targets, a unique identifier sequence, an optional patient identifier sequence, a sequence complementary to the 3′ end of the linker, and a 3′ probe region complementary to the sequences on the 3′ side of the targets), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50° C. for 2 hours). Taq polymerase, thermostable ligase (preferably from strain AK16D), dNTPs, and optional KlenTaq (Taq polymerase lacking nuclease activity) are added either subsequent to the annealing step or at the start of the procedure. Allow for extension and ligation at the hybridization temperature, and optionally raise the temperature (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
1.1.c. Optionally, cleave the oligonucleotide probe strand at a cleavable link (e.g. U cleaved using UDG and AP endonuclease). Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the linker sequence as a primer binding site.
Note 1: Oligonucleotide probe may contain an optional blocking group on the 5′ side to interfere with subsequent 5′-3′ nuclease activity of polymerase, such that the oligonucleotide probe does not circularize. Alternatively, a cleavable link may be included in the original oligonucleotide probe.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: A 1:20 mixture of Taq polymerase (with 5′→3′ nuclease activity) and KlenTaq (Taq polymerase without 5′→3′ nuclease cleavage activity) may be used under conditions of distributive extension (i.e. higher salt concentration) to minimize degradation of target DNA by nick translation.
Note 4: The ligation extension process may be carried out in two or more reaction tubes, one set containing “odd” numbered probes, the second set containing “even” numbered probes to avoid probes competing for the same target DNA strands. The separate reactions may be subsequently pooled.
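The "odd" and "even" split described in Note 4 can be sketched as follows. This is a minimal illustration that assumes each probe is represented by a hypothetical (name, start coordinate) pair and that neighboring probes along the target are the ones most likely to compete for the same strand; the probe names and coordinates are invented for the example.

```python
def split_into_pools(probes, n_pools=2):
    """Assign probes to reaction pools by alternating along the target.

    probes: list of (name, start) tuples (hypothetical representation).
    Probes are ordered by start coordinate and dealt out in turn, so that
    adjacent (potentially overlapping) probes land in different pools.
    """
    ordered = sorted(probes, key=lambda p: p[1])
    pools = [[] for _ in range(n_pools)]
    for i, (name, _start) in enumerate(ordered):
        pools[i % n_pools].append(name)
    return pools

probes = [("P3", 80), ("P1", 0), ("P4", 120), ("P2", 40)]
odd_pool, even_pool = split_into_pools(probes)
print(odd_pool)   # 1st, 3rd, ... probes along the target
print(even_pool)  # 2nd, 4th, ... probes along the target
```

Raising `n_pools` above two corresponds to the "two or more reaction tubes" option, with the separate reactions pooled afterwards.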
Variation 1.2. (
The hybridization conditions are chosen such that hybridization of the 3′ probe region to the complementary sequences on the 3′ side of the target increases the local concentration of the 3′ end of the linker on the target near its complement, such that it hybridizes correctly and is readily extended by polymerase. The 5′ end of the linker may be designed to be slightly longer, such that once polymerase extends the 3′ linker, it doesn't extend right through the 5′ linker complementary sequence until it hits the 5′ target portion that is hybridized to the 5′ probe region. However, the “combined” hybridization effect of both the 3′ linker and 5′ linker regions to their complements on the oligonucleotide probe should still have a Tm below the hybridization temperature, such that random hybridization of incorrect target to oligonucleotide probe rarely if ever results in extension and ligation to give a circular product.
This variation requires that the oligonucleotide probe contain sequences complementary to both the 3′ and 5′ linker strands, with those sequences being separated by only about 20 bases. Since the sequences are complementary to the linker strands, which in turn are complementary to each other, self-pairing of the oligonucleotide probe needs to be avoided. One solution is to ligate linkers that contain an internal bubble, such that the two linkers retain double stranded character at the low temperature used for linker ligation (16° C. or even 4° C. with T4 ligase). In addition the 5′ linker may be designed to be longer than the 3′ linker. Finally, the regions of complementarity within the linker may be designed to have subtle mismatches or a nucleotide analogue (i.e. G:T and T:G; or I:A) which are more destabilizing in the complementary oligonucleotide probe (i.e. C:A and A:C; or C:T) such that the oligonucleotide probe is less likely to form the internal self-pairing with a loop at the overall hybridization temperature (i.e. 50° C.).
For example, an I:A mismatch in the middle of an A tract in a linker would lower the Tm by 2.4° C. compared to a T:A match, but the complementary mismatch in the oligonucleotide, C:T, lowers the Tm by 10.1° C. (a difference of 7.7° C.). Likewise a G:T mismatch in the middle of an A tract in a linker would lower the Tm by 10.2° C. compared to a G:C match, but the complementary mismatch in the oligonucleotide, A:C, lowers the Tm by 18.0° C. (a difference of 7.8° C.). (See Kawase et al., “Studies on nucleic acid interactions. I. Stabilities of mini-duplexes (dG2A4XA4G2-dC2T4YT4C2) and self-complementary d(GGGAAXYTTCCC) containing deoxyinosine and other mismatched bases,” Nucleic Acids Res. 14(19):7727-36 (1986), which is hereby incorporated by reference in its entirety.) Thus placing both such mismatches within the linker would be predicted to lower the Tm of oligonucleotide self-pairing by about 15.5° C. compared to the linker self-pairing. The exact amount may vary based on sequence context; nevertheless, it should be apparent to those of skill in the art that use of I:A and G:T mismatches in a linker DNA duplex that remains double stranded at the ligation temperature of either 4° C. or 16° C. would be more than sufficient to assure that the complementary sequences, separated by about 20 bases within an oligonucleotide, would not be self-pairing (i.e. hairpin with a 20 base loop) at the higher hybridization temperature of 50° C.
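The arithmetic behind the approximately 15.5° C. figure can be checked with a short sketch. The Tm reductions are the values quoted above from Kawase et al.; the dictionary and function names are illustrative only, and simple additivity of the two differences is the same approximation made in the text.

```python
# Tm reductions (deg C) quoted in the text from Kawase et al. (1986).
LINKER_DROP = {"I:A": 2.4, "G:T": 10.2}   # mismatch as it appears in the linker duplex
PROBE_DROP = {"C:T": 10.1, "A:C": 18.0}   # complementary mismatch in the probe hairpin

# Each linker mismatch paired with its complementary mismatch in the probe.
PAIRS = [("I:A", "C:T"), ("G:T", "A:C")]

def differential(pairs):
    """Extra destabilization of probe self-pairing relative to the linker duplex."""
    return [round(PROBE_DROP[probe] - LINKER_DROP[linker], 1) for linker, probe in pairs]

diffs = differential(PAIRS)
total = round(sum(diffs), 1)   # assumes the two effects are simply additive
print(diffs, total)
```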
Extension of the 3′ end of the oligonucleotide on the target enhances association of the probe to the target, and thus increases the ability of the 3′ end of the linker to hybridize correctly to its complement and be extended by polymerase. The 5′→3′ nuclease cleavage activity of polymerase (or Fen nuclease) cleaves a matching 5′-overlapping base of the 5′ linker, leaving ligation-competent 5′-phosphate on the linker. Polymerase also extends oligonucleotide on target, and either generates a ligation-competent 5′-phosphate or does not cleave the blocking group on the 5′ end of the oligonucleotide.
Ligase covalently seals the extended 3′ ends to the ligation-competent 5′-phosphate to create circular ligation products. A blocking group prevents circularization of the oligonucleotide probe, or alternatively, a nick is introduced at the cleavable link (e.g. UDG cleavage of dU, followed by cleavage of the apurinic backbone with AP endonuclease). Optional addition of Uracil-DNA glycosylase (UDG) and Formamidopyrimidine-DNA glycosylase (Fpg, also known as 8-oxoguanine DNA glycosylase, which acts both as an N-glycosylase and an AP-lyase) may be used to nick targets containing damaged bases. Exonuclease(s) are then added to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA with unique identifier sequence. This product is suitable for rolling circle amplification and circle sequencing, or direct SMRT sequencing.
The challenge here is to avoid having polymerase extend the 3′ linker in such a way that it destroys the 5′ linker without a ligation step (i.e. nick-translation). This may be accomplished by incorporating thiophosphate linkages in the 2nd and 3rd position from the 5′ phosphate end, (which will be liberated by the 5′→3′ nuclease activity of the polymerase). To minimize polymerase displacement of those bases as it extends one base too many (which would make it impossible to ligate to the downstream linker), the target bases at the ligation junction would preferentially be AT rich on the 3′ side, and GC rich on the 5′ side.
An alternative approach is to use a 5′ linker containing an apurinic (AP) site at the position adjacent to the desired 5′ phosphate. This 5′ phosphate is liberated using a thermostable EndoIII (such as Tma EndoIII). This enzyme cleaves AP sites leaving a 5′ phosphate when the linker is bound to the target. The endonuclease also cleaves single-stranded linker, but with lower efficiency, and thus linker hybridized to template would be the preferred substrate. When using thermostable EndoIII, the PCR polymerase used would lack the 5′→3′ exonuclease activity.
As mentioned above, the nick-translation problem may also be minimized by using a mixture of polymerases, both with and without 5′→3′ nuclease cleavage activity (e.g. in a ratio of 1:20) under conditions of distributive extension in the presence of ligase such that most extension is by polymerase without nuclease activity until polymerase with nuclease activity is required to create the ligation competent junction, followed by polymerase dissociation, and a ligation event to generate the desired circular ligation product.
In
In
Detailed Protocol (V1.2) for High Sensitivity Detection of Single Base Mutation, Small Insertion, or Small Deletion Mutations (when Present at 1% to 0.01%) in Known Genes (e.g. Braf, K-Ras, p53):
1.2.a. Starting with cfDNA (or for example genomic DNA isolated from circulating tumor cells (CTC), sheared to about 150 bp), repair ends with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is added with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase at 4° C. appends linkers on both ends of the fragment. Optionally, purify target DNA from unligated linker.
1.2.b. Denature target DNA containing linkers on both ends (94° C., 1 minute) in the presence of oligonucleotide probes (comprising a 5′ probe region complementary to the sequences on the 5′ side of the targets, a sequence complementary to the 5′ end of the linker, a unique identifier sequence, an optional patient identifier sequence, a sequence complementary to the 3′ end of the linker, and a 3′ probe region complementary to the sequences on the 3′ side of the targets), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50° C. for 2 hours). Taq polymerase and/or KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are added either subsequent to the annealing step or at the start of the procedure. Allow for extension and ligation at the hybridization temperature, and optionally raise the temperature (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
1.2.c. Optionally, cleave the oligonucleotide probe at a cleavable link (e.g. U cleaved using UDG and AP endonuclease). Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the linker sequence as a primer binding site.
Note 1: Oligonucleotide may contain an optional blocking group on the 5′ side to interfere with subsequent 5′-3′ nuclease activity of polymerase, such that the oligonucleotide probe does not circularize. Alternatively, a cleavable link may be included in the original oligonucleotide.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: The 5′ end linker may be synthesized to contain thiophosphate linkages in the 2nd and 3rd position from the 5′ phosphate end, (which will be liberated by the 5′→3′ nuclease activity of the polymerase). To minimize polymerase displacement of those bases as it extends one base too many (which would make it impossible to ligate to the downstream linker), the target bases at the ligation junction would preferentially be AT rich on the 3′ side, and GC rich on the 5′ side.
Note 4: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain an apurinic (AP) site at the position adjacent to the desired 5′ phosphate. This 5′ phosphate is liberated using a thermostable EndoIII (such as Tma EndoIII). This enzyme cleaves AP sites leaving a 5′ phosphate when bound to the target. The endonuclease also cleaves single-stranded oligonucleotide, but with lower efficiency, and thus linker hybridized to template would be the preferred substrate.
Note 5: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain a 5′ phosphate. Alternatively, the 5′ phosphate may be added using T4 kinase either prior to ligating to the target DNA, or after that ligation step.
Note 6: A 1:20 mixture of Taq polymerase (with 5′→3′ nuclease activity) and KlenTaq (Taq polymerase without 5′→3′ nuclease cleavage activity) may be used under conditions of distributive extension (i.e. higher salt concentration) to minimize degradation of target DNA by nick translation.
Note 7: The ligation extension process may be carried out in two or more reaction tubes, one set containing “odd” numbered probes, the second set containing “even” numbered probes to avoid probes competing for the same target DNA strands. The separate reactions may be subsequently pooled.
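The odd/even split described in Note 7 can be illustrated with a minimal Python sketch. It assumes each probe is described by the target coordinates it covers; the dictionary layout, the function names, and the example coordinates are hypothetical and serve only to show that alternating assignment keeps neighbouring, overlapping probes in separate reaction tubes.

```python
def split_probes_into_pools(probes):
    """Assign probes, ordered by target start coordinate, alternately to two pools."""
    ordered = sorted(probes, key=lambda p: p["start"])
    odd_pool = ordered[0::2]   # 1st, 3rd, 5th ... probes along the target
    even_pool = ordered[1::2]  # 2nd, 4th, 6th ... probes along the target
    return odd_pool, even_pool


def pool_has_overlap(pool):
    """Return True if any two neighbouring probes in a pool cover overlapping coordinates."""
    ordered = sorted(pool, key=lambda p: p["start"])
    return any(a["end"] > b["start"] for a, b in zip(ordered, ordered[1:]))


# Hypothetical tiling of four probes, each overlapping its neighbour by 60 bases.
probes = [
    {"name": "probe_1", "start": 0, "end": 160},
    {"name": "probe_2", "start": 100, "end": 260},
    {"name": "probe_3", "start": 200, "end": 360},
    {"name": "probe_4", "start": 300, "end": 460},
]
odd_pool, even_pool = split_probes_into_pools(probes)
```

In this illustrative tiling the full probe set overlaps, whereas neither the odd pool nor the even pool does, so probes within one tube would not compete for the same target strand.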
Note 8: Example of linkers containing single-base 3′ “T” overhangs is provided as oligonucleotides iSx-003-ShAdT (Top strand), and iSx-004-pShAdB (Bottom strand) (see Table 1). Example of slightly longer linkers, to enhance binding to the linker region, is provided as oligonucleotides iSx-006-MdAdT (Top strand), and iSx-007-pMdAdB (Bottom strand) (see Table 1). Linkers iSx-003-ShAdT and iSx-006-MdAdT (see Table 1) may contain optional thiophosphate groups near the 5′ end to prevent nick-translation and facilitate circularization of targets with linker ends. Linkers iSx-004-pShAdB and iSx-007-pMdAdB (see Table 1) may contain optional thiophosphate groups near or at the 3′ end to prevent degradation if using a polymerase with proofreading activity, and facilitate circularization of targets with linker ends.
Note 9: If an extra base is not added to the target DNA (i.e. skipping the Klenow step), then a blunt end ligation is used. To avoid linker-to-linker ligation, the blunt end of the linker is un-phosphorylated. Example of linkers containing blunt ends is provided as oligonucleotides iSx-003-ShAdT (Top strand), and iSx-005-ShAdB (Bottom strand) (see Table 1).
Note 10: When designing oligonucleotides for use with longer linker sequences, comprising “barcoding” or “indexing” sequences for use with commercial instruments, it may be necessary to assemble the oligonucleotide using PCR, strand-displacement amplification, or a combination thereof. During PCR, use of dUTP instead of TTP incorporates uracil, suitable for subsequent cleavage by UDG. The reverse-strand primer may be phosphorylated, allowing for its digestion using lambda exonuclease or a similar 5′→3′ exonuclease. A dA30 sequence may be appended to the 5′ end of the forward primer, enabling strand displacement amplification. Examples of oligonucleotides suitable for assembly amplification for sequencing regions of KRAS, BRAF, and TP53 exons 5-8 containing hotspot mutations are shown in Table 1 (below) and include the following: (i) KRAS forward and reverse target regions (iSx-001-bkA30, iSx-016-bkA30-KRSF11, iSx-017-pKRSF12, iSx-008-701F, iSx-009-501R, iSx-018-pKRSF13, and iSx-018-pKRSR14) and (iSx-015-pKRSF10, iSx-017-pKRSF12, iSx-008-701F, iSx-009-501R, iSx-018-pKRSF13, iSx-019-bkA30-KRSR15, and iSx-001-bkA30); (ii) BRAF forward and reverse target regions (iSx-001-bkA30, iSx-023-bkA30-BRF-F11, iSx-024-pBRF-F12, iSx-008-701F, iSx-009-501R, iSx-025-pBRF-F13, and iSx-026-pBRF-R14) and (iSx-022-pBRF-F10, iSx-024-pBRF-F12, iSx-008-701F, iSx-009-501R, iSx-025-pBRF-F13, iSx-027-bkA30-pBRF-R15, and iSx-001-bkA30); (iii) TP53 Exon 5 upstream forward and reverse target regions (iSx-001-bkA30, iSx-035-bkA30-pTP53e5F11, iSx-036-pTP53e5F12, iSx-008-701F, iSx-009-501R, iSx-037-pTP53e5F13, iSx-038-pTP53e5R14) and (iSx-034-pTP53e5F10, iSx-036-pTP53e5F12, iSx-008-701F, iSx-009-501R, iSx-037-pTP53e5F13, iSx-039-bkA30-TP53e5R15, and iSx-001-bkA30); (iv) TP53 Exon 5 downstream forward and reverse target regions (iSx-001-bkA30, iSx-043-bkA30-TP53e5F21, iSx-044-pTP53e5F22, iSx-008-701F, iSx-009-501R, iSx-045-pTP53e5F23, and iSx-046-pTP53e5R24) and (iSx-042-pTP53e5F20, iSx-044-pTP53e5F22, iSx-008-701F, iSx-009-501R, iSx-045-pTP53e5F23, iSx-047-bkA30-TP53e5R25 and iSx-001-bkA30); (v) TP53 Exon 6 forward and reverse target regions (iSx-001-bkA30, iSx-053-bkA30-TP53e6F31, iSx-054-pTP53e6F32, iSx-008-701F, iSx-009-501R, iSx-055-pTP53e6F33, and iSx-056-pTP53e6R34) and (iSx-052-pTP53e6F30, iSx-054-pTP53e6F32, iSx-008-701F, iSx-009-501R, iSx-055-pTP53e6F33, iSx-057-bkA30-TP53e6R35, and iSx-001-bkA30); (vi) TP53 Exon 7 forward and reverse target regions (iSx-001-bkA30, iSx-063-bkA30-TP53e7F41, iSx-064-pTP53e7F42, iSx-008-701F, iSx-009-501R, iSx-065-pTP53e7F43, and iSx-066-pTP53e7R44) and (iSx-062-pTP53e7F40, iSx-064-pTP53e7F42, iSx-008-701F, iSx-009-501R, iSx-065-pTP53e7F43, iSx-067-pTP53e7R45, and iSx-001-bkA30); and (vii) TP53 Exon 8 forward and reverse target regions (iSx-001-bkA30, iSx-073-bkA30-TP53e8F51, iSx-074-pTP53e8F52, iSx-008-701F, iSx-009-501R, iSx-075-pTP53e8F53, and iSx-076-pTP53e8R54) and (iSx-072-pTP53e8F50, iSx-074-pTP53e8F52, iSx-008-701F, iSx-009-501R, iSx-075-pTP53e8F53, iSx-077-bkA30-TP53e8R55, and iSx-001-bkA30).
Note 11: After generating the circular products comprising target regions of KRAS, BRAF, and TP53 exons 5-8 containing hotspot mutations, these regions may be subject to rolling circle amplification using target-specific primers to generate tandem-repeat products. These products may be generated either prior to, or after capture of desired targets with target-specific oligonucleotides on a solid support (See note 12 below). Primers may contain an internal cleavable nucleotide base or abasic site such as 1′,2′-Dideoxyribose (dSpacer), enabling incorporation of dUTP during rolling circle amplification for protection against carryover contamination. Examples of such primers are shown in Table 1 below and include the following: (i) KRAS forward and reverse primers (iSx-108-KRS-rcF26, iSx-109-KRS-rcR27); (ii) BRAF forward and reverse primers (iSx-118-BRF-rcF26, iSx-119-BRF-rcR27); (iii) TP53 Exon 5 forward and reverse primers (iSx-128-TP53e5-rcF66, iSx-129-TP53e5-rcR67; iSx-130-TP53e5-rcF68, iSx-131-TP53e5-rcR69); (iv) TP53 Exon 6 forward and reverse primers (iSx-138-TP53e6-rcF76, iSx-139-TP53e6-rcR77); (v) TP53 Exon 7 forward and reverse primers (iSx-148-TP53e7-rcF86, iSx-149-TP53e7-rcR87); and (vi) TP53 Exon 8 forward and reverse primers (iSx-158-TP53e8-rcF96, iSx-159-TP53e8-rcR97).
Note 12: After generating the circular products comprising target regions of KRAS, BRAF, and TP53 exons 5-8 containing hotspot mutations, and/or generating tandem-repeat products, these products may be captured by hybridizing to longer oligonucleotides, which contain a capture group suitable for subsequent capture on a solid support. Examples of such primers, containing biotin groups suitable for capture via streptavidin-coated solid surfaces are shown in Table 1 below and include the following: (i) KRAS forward and reverse capture oligonucleotides (iSx-013-KRS-bcF1, iSx-014-KRS-bcR2); (ii) BRAF forward and reverse capture oligonucleotides (iSx-020-BRF-bcF1, iSx-021-BRF-bcR2); (iii) TP53 Exon 5 forward and reverse capture oligonucleotides (iSx-030-TP53e5-bcF1, iSx-031-TP53e5-bcR2; iSx-032-TP53e5-bcF3, iSx-033-TP53e5-bcR4); (iv) TP53 Exon 6 forward and reverse capture oligonucleotides (iSx-050-TP53e6-bcF5, iSx-051-TP53e6-bcR6); (v) TP53 Exon 7 forward and reverse capture oligonucleotides (iSx-060-TP53e7-bcF7, iSx-061-TP53e7-bcR8); and (vi) TP53 Exon 8 forward and reverse capture oligonucleotides (iSx-070-TP53e8-bcF9, iSx-071-TP53e8-bcR10).
Note 13: With the aforementioned products generated using the above primer and linker designs, after cluster or bead amplification, or capture within a well, address, or surface of a flow cell on a commercial instrument, the following primers may be used to initiate sequencing reactions: (i) iLx-003-PEsqP1, Paired End sequencing primer 1; (ii) iLx-004-BrCdR1, Indexing primer, Barcode Read 1; (iii) iLx-001-P5-BrCdR2, Barcode Read 2; and (iv) iLx-005-PEsqP2, Paired End sequencing primer 2 (see Table 1 for primer sequences).
Variation 1.3: (see e.g.,
Variation 1.4: (see e.g.,
The hybridization conditions are chosen such that hybridization of the 3′ probe region complementary to the sequence of the 3′ side of the target and hybridization of the 5′ probe region complementary to the sequence of the 5′ side of the target are enriched over targets that hybridize to only one side (and would form an unproductive extension product that would not circularize). Extension of the 3′ end of the oligonucleotide probe on the target enhances association of the probe to the target. The 5′→3′ nuclease cleavage activity of polymerase (or Fen nuclease) cleaves a matching 5′-overlapping base of target at or near the position where the 5′ side of the target is complementary to the 5′ portion of the probe, leaving a ligation-competent 5′-phosphate from the authentic target. Polymerase also extends the oligonucleotide probe using the target as template, and either generates a ligation-competent 5′-phosphate (left side of
Ligase covalently seals the extended 3′ ends to the ligation-competent 5′-phosphate to create circular ligation products. A blocking group prevents circularization of the oligonucleotide probe (right side), or alternatively, a nick is introduced at the cleavable link (e.g. UDG cleavage of dU, followed by cleavage of the apurinic backbone with AP endonuclease, left side). Optional addition of Uracil-DNA glycosylase (UDG) and Formamidopyrimidine-DNA glycosylase (Fpg, also known as 8-oxoguanine DNA glycosylase, which acts both as an N-glycosylase and an AP-lyase) may be used to nick targets containing damaged bases. Exonuclease(s) are then added to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA with the unique identifier sequence. This product is suitable for rolling circle amplification and circle sequencing, or direct SMRT sequencing.
This approach requires an enzyme or set of enzymes with target-dependent 5′→3′ and 3′→5′ nuclease activity such that the ligation-competent 5′-phosphate is generated only when there is proper hybridization and complementarity between the 5′ probe region and the 5′ target region. When using polymerase with 5′→3′ nuclease cleavage activity, the challenge is to avoid having polymerase extend the short linker along the 5′ probe region in such a way that it destroys the 5′ target region (i.e. nick-translation) without a ligation step. Nick-translation that destroys original target DNA and replaces it with extended DNA may inadvertently introduce a polymerase error that would be propagated and miscalled as a mutation. This may be minimized by using a mixture of polymerases, both with and without 5′→3′ nuclease cleavage activity (e.g. in a ratio of 1:20) under conditions of distributive extension in the presence of ligase such that most extension is by polymerase without nuclease activity until polymerase with nuclease activity is required to create the ligation competent junction, followed by polymerase dissociation, and a ligation event to generate the desired circular ligation product. Use of oligonucleotide probe design as depicted in
In
In
Detailed Protocol for High Sensitivity Detection of Single Base Mutation, Small Insertion, or Small Deletion Mutations (when Present at 1% to 0.01%) in Known Genes (e.g. Braf, K-Ras, p53)
1.4.a. Starting with cfDNA (or for example genomic DNA isolated from CTC, sheared to about 150 bp), denature target DNA (94° C. 1 minute) in the presence of oligonucleotides (comprising a 5′ probe region complementary to the sequences to the 5′ side of the targets, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and a 3′ probe region complementary to the sequences to the 3′ side of the targets), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 40° C. or 45° C. for 2 hours). The reaction is then lowered in temperature, and DNA polymerases (i.e. T4 polymerase, which has strong 3′→5′ proofreading activity; DNA polymerase I, which has weak 3′→5′ proofreading but good 5′→3′ activity; and optionally Klenow fragment, which has no 5′→3′ activity), DNA ligase (T4 ligase or E. coli ligase), and dNTPs are added subsequent to the annealing step. Allow for extension and ligation at 23° C. or 30° C., and optionally raise the temperature (e.g. 37° C.) to assure completion of extension and ligation, to generate circular products.
1.4.b. Optionally, cleave the oligonucleotide probe at a cleavable link (e.g. U cleaved using UDG and AP endonuclease). Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the linker sequence as a primer binding site.
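Because rolling circle amplification places several copies of the same circular template in one tandem-repeat read, an error confined to a single copy can be outvoted by the others. The following Python sketch illustrates that idea only. It assumes the repeat unit length is already known and that copies differ only by substitutions (no insertions or deletions); in practice unit boundaries would more plausibly be located by aligning to the linker sequence. The function name and example sequences are hypothetical.

```python
from collections import Counter


def consensus_from_tandem_repeats(read, unit_length):
    """Majority-vote consensus across the complete copies in a tandem-repeat read."""
    n_copies = len(read) // unit_length
    if n_copies == 0:
        raise ValueError("read is shorter than one repeat unit")
    # Slice the read into consecutive full-length copies; a trailing partial copy is ignored.
    copies = [read[i * unit_length:(i + 1) * unit_length] for i in range(n_copies)]
    consensus = []
    for column in zip(*copies):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


unit = "ACGTTGCAGG"                       # hypothetical circle sequence
read = unit + "ACGTAGCAGG" + unit         # middle copy carries one substitution error
result = consensus_from_tandem_repeats(read, len(unit))

# A change present in every copy is retained, as expected for a true variant.
variant_result = consensus_from_tandem_repeats("ACGTAGCAGG" * 3, len(unit))
```

With three copies, a substitution seen in one copy is discarded, while a substitution seen in all three copies survives into the consensus.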
Note 1: Oligonucleotide may contain an optional blocking group on the 5′ side to interfere with subsequent 5′-3′ nuclease activity of polymerase, such that the oligonucleotide probe does not circularize. Alternatively, a cleavable link may be included in the original oligonucleotide.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: A 1:20 mixture of T4 polymerase (strong 3′→5′ proofreading activity), DNA polymerase I (weak 3′→5′ proofreading, but good 5′→3′ activity) and Klenow (DNA polymerase I without 5′→3′ nuclease cleavage activity) may be used under conditions of distributive extension (i.e. higher salt concentration) to minimize degradation of target DNA by nick translation.
Note 4: The ligation extension process may be carried out in two or more reaction tubes, one set containing “odd” numbered probes, the second set containing “even” numbered probes to avoid probes competing for the same target DNA strands. The separate reactions may be subsequently pooled.
Note 5: Use of mismatch bases in the probe sequence at regular intervals (i.e. 10, 12, or 15 bases) allows for distinguishing authentic target DNA sequence from sequence generated by copying the probe strand. The oligonucleotides listed in Note 6 below contain mismatch bases approximately every 15 bases in the target sequence regions.
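The use of regularly spaced mismatch bases as a marker of probe-derived sequence can be sketched in Python as follows. This is an illustrative model under stated assumptions: all sequences are written in the same orientation and are already aligned at equal length, the substitution table is arbitrary, and the function names, the 45-base reference, and the 15-base interval default are hypothetical choices made for the example.

```python
def probe_marker_positions(length, interval=15):
    """0-based positions at which the probe carries a deliberate mismatch."""
    return list(range(interval - 1, length, interval))


def introduce_markers(reference, interval=15):
    """Return a probe-side sequence that differs from the reference at each marker position."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}  # arbitrary illustrative substitution
    seq = list(reference)
    for pos in probe_marker_positions(len(reference), interval):
        seq[pos] = swap[seq[pos]]
    return "".join(seq)


def classify_read(read, reference, probe_like, interval=15):
    """Classify a read by whether marker positions agree with the target or with the probe."""
    positions = probe_marker_positions(len(reference), interval)
    target_votes = sum(read[p] == reference[p] for p in positions)
    probe_votes = sum(read[p] == probe_like[p] for p in positions)
    if target_votes > probe_votes:
        return "target-derived"
    if probe_votes > target_votes:
        return "probe-derived"
    return "ambiguous"


reference = "ACGT" * 11 + "A"              # hypothetical 45-base target region
probe_like = introduce_markers(reference)  # same region with a mismatch every 15 bases

# A read carrying a real substitution away from the marker positions is still target-derived.
mutant_read = reference[:5] + "T" + reference[6:]
```

Sequence that reproduces the mismatch bases was copied from the probe strand, whereas sequence showing the reference base at those positions came from authentic target DNA, independent of any mutation elsewhere in the read.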
Note 6: When designing oligonucleotides for use with longer linker sequences, comprising “barcoding” or “indexing” sequences for use with commercial instruments, it may be necessary to assemble the oligonucleotide using PCR, strand-displacement amplification, or a combination thereof. During PCR, use of dUTP instead of TTP incorporates uracil, suitable for subsequent cleavage by UDG. The reverse-strand primer may be phosphorylated, allowing for its digestion using lambda exonuclease or a similar 5′→3′ exonuclease. A dA30 sequence may be appended to the 5′ end of the forward primer, enabling strand displacement amplification. Examples of oligonucleotides suitable for assembly amplification for sequencing regions of KRAS, BRAF, and TP53 exons 5-8 containing hotspot mutations are shown in Table 1 (below) and include the following: (i) KRAS forward and reverse target regions (iSx-001-bkA30, iSx-103-bkA30-KRSF21, iSx-104-pKRSF22, iSx-100-705F, iSx-101-502R, iSx-105-pKRSF23, and iSx-106-pKRSR24) and (iSx-102-pKRSF20, iSx-104-pKRSF22, iSx-100-705F, iSx-101-502R, iSx-105-pKRSF23, iSx-107-bkA30-KRSR25, and iSx-001-bkA30); (ii) BRAF forward and reverse target regions (iSx-001-bkA30, iSx-113-bkA30-BRF-F21, iSx-114-pBRF-F22, iSx-100-705F, iSx-101-502R, iSx-115-pBRF-F23, and iSx-116-pBRF-R24) and (iSx-112-pBRF-F20, iSx-114-pBRF-F22, iSx-100-705F, iSx-101-502R, iSx-115-pBRF-F23, iSx-117-bkA30-BRF-R25 and iSx-001-bkA30); (iii) TP53 Exon 5 forward and reverse target regions (iSx-001-bkA30, iSx-123-bkA30-TP53e5F61, iSx-124-pTP53e5F62, iSx-100-705F, iSx-101-502R, iSx-125-pTP53e5F63, and iSx-126-pTP53e5R64) and (iSx-122-pTP53e5F60, iSx-124-pTP53e5F62, iSx-100-705F, iSx-101-502R, iSx-125-pTP53e5F63, iSx-127-bkA30-TP53e5F65, and iSx-001-bkA30); (iv) TP53 Exon 6 forward and reverse target regions (iSx-001-bkA30, iSx-133-bkA30-TP53e6F71, iSx-134-pTP53e6F72, iSx-100-705F, iSx-101-502R, iSx-135-pTP53e6F73, and iSx-136-pTP53e6R74) and (iSx-132-pTP53e6F70, iSx-134-pTP53e6F72, iSx-100-705F, 
iSx-101-502R, iSx-135-pTP53e6F73, iSx-137-bkA30-TP53e6F75, and iSx-001-bkA30); (v) TP53 Exon 7 forward and reverse target regions (iSx-001-bkA30, iSx-143-bkA30-TP53e7F81, iSx-144-pTP53e7F82, iSx-100-705F, iSx-101-502R, iSx-145-pTP53e7F83, and iSx-146-pTP53e7R84) and (iSx-142-pTP53e7F80, iSx-144-pTP53e7F82, iSx-100-705F, iSx-101-502R, iSx-145-pTP53e7F83, iSx-147-bkA30-TP53e7R85, and iSx-001-bkA30); and (vi) TP53 Exon 8 forward and reverse target regions (iSx-001-bkA30, iSx-153-bkA30-TP53e8F91, iSx-154-pTP53e8F92, iSx-100-705F, iSx-101-502R, iSx-155-pTP53e8F93, and iSx-156-pTP53e8R94) and (iSx-152-pTP53e8F90, iSx-154-pTP53e8F92, iSx-100-705F, iSx-101-502R, iSx-155-pTP53e8F93, iSx-157-bkA30-TP53e8R95 and iSx-001-bkA30).
Note 7: After generating the circular products comprising target regions of KRAS, BRAF, and TP53 exons 5-8 containing hotspot mutations, these regions may be subject to rolling circle amplification using target-specific primers to generate tandem-repeat products. These products may be generated either prior to, or after capture of desired targets with target-specific oligonucleotides on a solid support (See note 8 below). Primers may contain an internal cleavable nucleotide base or abasic site such as 1′,2′-Dideoxyribose (dSpacer), enabling incorporation of dUTP during rolling circle amplification for protection against carryover contamination. Examples of such primers are shown in Table 1 (below) and include the following: (i) KRAS forward and reverse primers (iSx-108-KRS-rcF26, iSx-109-KRS-rcR27); (ii) BRAF forward and reverse primers (iSx-118-BRF-rcF26, iSx-119-BRF-rcR27); (iii) TP53 Exon 5 forward and reverse primers (iSx-128-TP53e5-rcF66, iSx-129-TP53e5-rcR67; iSx-130-TP53e5-rcF68, iSx-131-TP53e5-rcR69); (iv) TP53 Exon 6 forward and reverse primers (iSx-138-TP53e6-rcF76, iSx-139-TP53e6-rcR77); (v) TP53 Exon 7 forward and reverse primers (iSx-148-TP53e7-rcF86, iSx-149-TP53e7-rcR87); and (vi) TP53 Exon 8 forward and reverse primers (iSx-158-TP53e8-rcF96, iSx-159-TP53e8-rcR97).
Note 8: After generating the circular products comprising target regions of KRAS, BRAF, and TP53 exons 5-8 containing hotspot mutations, and/or generating tandem-repeat products, these products may be captured by hybridizing to longer oligonucleotides, which contain a capture group suitable for subsequent capture on a solid support. Examples of such capture oligonucleotides, containing biotin groups suitable for capture via streptavidin-coated solid surfaces are shown in Table 1 (below) and include the following: (i) KRAS forward and reverse capture oligonucleotides (iSx-013-KRS-bcF1, iSx-014-KRS-bcR2); (ii) BRAF forward and reverse capture oligonucleotides (iSx-020-BRF-bcF1, iSx-021-BRF-bcR2); (iii) TP53 Exon 5 forward and reverse capture oligonucleotides (iSx-030-TP53e5-bcF1, iSx-031-TP53e5-bcR2; iSx-032-TP53e5-bcF3, iSx-033-TP53e5-bcR4); (iv) TP53 Exon 6 forward and reverse capture oligonucleotides (iSx-050-TP53e6-bcF5, iSx-051-TP53e6-bcR6); (v) TP53 Exon 7 forward and reverse capture oligonucleotides (iSx-060-TP53e7-bcF7, iSx-061-TP53e7-bcR8); and (vi) TP53 Exon 8 forward and reverse capture oligonucleotides (iSx-070-TP53e8-bcF9, iSx-071-TP53e8-bcR10).
Note 9: With the aforementioned products generated using the above primer and linker designs, after cluster or bead amplification, or capture within a well, address, or surface of a flow cell on a commercial instrument, the following primers may be used to initiate sequencing reactions: (i) iLx-003-PEsqP1, Paired End sequencing primer 1; (ii) iLx-004-BrCdR1, Indexing primer, Barcode Read 1; (iii) iLx-001-P5-BrCdR2, Barcode Read 2; and (iv) iLx-005-PEsqP2, Paired End sequencing primer 2 (primer sequences are provided in Table 1 below).
Prophetic Example 2—High Sensitivity Methylation Marker Detection for Promoter Hypermethylation (when Present at 1% to 0.01%) in Plasma DNA
Promoter methylation plays an important role in regulating gene expression. Promoters for genes often have regions of high CpG content known as “CpG Islands”. When genes, such as tumor suppressor genes, with promoter CpG islands are turned off, this is usually accompanied by methylation of most CpG sequences within the promoter and 1st exon regions. There have been two traditional approaches to detecting methylation changes.
The first takes advantage of methyl-sensitive restriction enzymes, wherein genomic DNA is cleaved when unmethylated, and this is followed by a PCR amplification using primers that flank the site(s). If the DNA was methylated, it should amplify, if unmethylated, it should not amplify. This technique has the disadvantage that digestions do not always go to completion, and further, it is not accurate for finding low levels of methylated DNA when the majority of the same sequence is unmethylated, as would be the case with plasma detection.
The second approach is known as “Methyl-specific PCR” and is based on bisulfite treatment of DNA, which converts unmethylated C's to U's. If the base is methylated, then it is not converted. Methyl-specific PCR is based on using primers and TaqMan probes that are specific for the resultant converted sequence if it were methylated, but not unmethylated. Methyl-specific PCR has the advantage of being able to detect very low levels of methylated DNA. A further improvement of this technique employs a blocking oligonucleotide that hybridizes to the sequence for bisulfite-converted unmethylated DNA, thus enriching for amplification of bisulfite-converted methylated DNA. The disadvantage is that bisulfite treatment destroys from 50% to 90% of the original DNA integrity by nicking it. When starting with DNA from the plasma (with average length of about 160 bases), this can be a significant problem. Further, converting C's to U's reduces the complexity of the sequence from 4 bases to 3 bases. Since the converted sequence is now more A:T rich, longer PCR primers are also required. Thus, non-specific amplifications can occur, as primers are more likely to mis-prime at closely related but incorrect sequences. This usually necessitates a nested-PCR approach, which runs the risk of carryover contamination and is generally not ideal for multiplexed amplifications.
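The effect of bisulfite treatment on sequence complexity can be made concrete with a short Python sketch. It models only the base conversion described above (unmethylated C becomes U, methylated C is unchanged) and ignores strand, nicking, and incomplete conversion. The function name, the example sequence, and the methylated positions are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Convert unmethylated C to U; C at a methylated position is left unchanged."""
    methylated = set(methylated_positions)
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence)
    )


seq = "ACGCGTTCGA"                                    # C at 0-based positions 1, 3 and 7
fully_methylated = bisulfite_convert(seq, [1, 3, 7])  # every C protected
unmethylated = bisulfite_convert(seq, [])             # every C converted

# After PCR, U is copied as T, so the unmethylated strand uses only three bases.
amplified = unmethylated.replace("U", "T")
alphabet = set(amplified)
```

In this example the fully methylated sequence is unchanged, while the unmethylated sequence loses every C and, once amplified, contains only A, G, and T, which is the loss of complexity that drives the need for longer primers.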
Overview of approach: The idea is to faithfully copy every fragment of target DNA that contains methylation at adjacent restriction sites for the regions of interest, append a unique identifier sequence and an optional patient identifier, and circularize them for subsequent sequencing. The oligonucleotide DNA strand is circularized and sequenced. This approach provides the advantage of obtaining both copy number information when needed, and methylation data with the minimum of sequencing required.
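The way an appended unique identifier sequence supports copy number enumeration can be sketched in Python: reads that share both target and identifier are treated as amplification copies of one original molecule and are counted once. This is a simplified illustration that assumes error-free identifiers; the function name, target names, and identifier sequences are hypothetical.

```python
from collections import defaultdict


def count_original_molecules(reads):
    """Count distinct unique identifiers per target from (target, identifier) pairs."""
    identifiers = defaultdict(set)
    for target, uid in reads:
        identifiers[target].add(uid)
    return {target: len(uids) for target, uids in identifiers.items()}


reads = [
    ("promoter_A", "AACGTTGC"),
    ("promoter_A", "AACGTTGC"),  # same identifier: amplification copy, counted once
    ("promoter_A", "GGTCATAC"),
    ("promoter_B", "TTAGCCGA"),
]
counts = count_original_molecules(reads)
```

Here four reads collapse to two original molecules for the first target and one for the second.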
Detection of Methylation at Adjacent Sites
Variation 2.1: (see e.g.,
By insisting on having the restriction endonuclease generate both the 3′OH and the 5′ phosphate, this avoids false signal, and should get rid of any non-specific ligation signal as well. Thus, any rare fragment of genomic DNA that was single-stranded after purification, or did not get cleaved will not form a productive substrate and will be destroyed by the exonuclease treatment step.
Detailed Protocol for Highly Sensitive Detection of Promoter Methylation:
2.1.a. Cleave isolated cfDNA, or methyl enriched DNA, with one or more methyl sensitive enzymes (AciI, HinP1I, Hpy99I, and HpyCH4IV). In this example, HinP1I is used. Heat kill endonuclease(s) (65° C. for 15 minutes) and denature DNA (94° C. 1 minute).
2.1.b. Denature target DNA (94° C. 1 minute) in the presence of oligonucleotide probes (comprising a 5′ probe region complementary to the sequence of the 5′ side of the target, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding site, and a 3′ probe region complementary to the sequence of the 3′ side of the target), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 40-50° C. for 2 hours). The oligonucleotide contains unmethylated HinP1I sequences near both the 3′ and 5′ ends of the probes, which are designed to contain either mismatches, blocking groups, or lack phosphorylation, such that they are not substrates for either polymerase or ligase. Cool to 37° C. and add HinP1I, which will nick the unmethylated strands of the probe target hybrid if the target DNA was methylated, liberating an extension-competent 3′OH end, and a ligation competent 5′-phosphate. Heat kill endonuclease(s) (65° C. for 15 minutes). KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are added either subsequent to the annealing step or subsequent to the restriction endonuclease nicking step. Allow for extension and ligation at 50° C., and optionally raise the temperature (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
2.1.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the methylated DNA, the unique identifier sequence, and an optional patient identifier sequence. This circular product is suitable for optional additional steps and subsequent sequencing.
Note 1: Oligonucleotide may lack a 5′ phosphate, or contain an optional blocking group on the 5′ side, such that the 5′ end of the oligonucleotide is not suitable for ligation. Oligonucleotide may contain 3′ mismatches, 3′ hairpin, or an optional blocking group on the 3′ side, such that the 3′ end of the oligonucleotide is not suitable for extension on the target. In both cases, the block is only liberated by restriction enzyme nicking of the probe when hybridized to methylated target.
Note 2: The above example uses KlenTaq, a polymerase lacking strand displacing activity as well as both 3′→5′ and 5′→3′ nuclease activity. If the oligonucleotide has a blocking group on the 3′ side, then one can use polymerase with 3′→5′ nuclease activity, while if the blocking group is on the 5′ side, then one can use polymerase with 5′→3′ nuclease activity.
Note 3: In the example, the restriction endonuclease (HinP1I) is heat inactivated by incubating at 65° C. for 15 minutes. An alternative approach, (or when using a heat insensitive enzyme like BstUI) is to extend with nucleotide analogues (i.e. 5-methyl-dCTP, or alpha-thiophosphate dCTP) that render a restriction site resistant to re-cleavage with the cognate endonuclease. If the restriction site on the 5′ side of the oligonucleotide is not converted to a resistant form by use of nucleotide analogues (i.e. HinP1I), then this case may be solved by using an oligonucleotide with a blocking group on the 5′ side, as well as a polymerase containing 5′→3′ nuclease activity. Initially, the blocking group is removed by the nicking activity of the restriction endonuclease. Subsequently during the extension step using nucleotide analogues, the polymerase containing 5′→3′ nuclease activity nick translates through the recognition site, replacing it with nucleotides that render the site refractory to further cleavage.
Note 4: In the above example, to further minimize nick-translation well past the recognition sequence, the portion of the oligonucleotide directly adjacent to the restriction site of the 5′ probe portion may be synthesized to contain thiophosphate linkages. To minimize polymerase displacement of those bases as it extends one base too many (which would make it impossible to ligate to the downstream primer), the target bases at the ligation junction would preferentially be AT rich on the 3′ side, and GC rich on the 5′ side.
Note 5: The target-specific portions of the oligonucleotide probe are designed such that they will remain hybridized to the target even after liberation of the non-productive 3′ and 5′ end. If the target contains additional restriction sites that overlap with the probe portions, then the probe may be synthesized with 5-methyl-dC in those positions. Should the target be methylated at those positions, then the site will be methylated at both the target and the oligonucleotide probe, and hence be refractory to nicking. However, should the target be unmethylated at those positions, then the site will be nicked on the target strand, thus interfering with probe hybridization to the (shortened) target.
Note 6: The circular product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats complementary to the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 7: If the oligonucleotide probes also comprise optional primer binding sites (i.e. universal primer binding sites), the circular product would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Database of Methylation Status at Adjacent Methyl-Sensitive Restriction Sites.
The current TCGA database contains information on the methylation status of about 450,000 CpG sites on the human genome, both for normal and for tumor of many different tissue sites. However, it does not cover the methylation status of all adjacent HinP1I sites, nor would it distinguish that both sites are methylated on the same piece of genomic DNA.
Consequently, for the above assay method to be most useful, it would be helpful to create a database of methylation status at adjacent methyl-sensitive restriction sites. One such approach is illustrated in
Overview of approach: The idea is to generate a library of small fragments that could only have been formed if both ends of the fragment contained restriction sites that were methylated in the original genomic DNA. The fragments have linkers appended with optional unique identifier and optional patient identifier sequences that are now amenable for ligation to create fragment multimers that are then substrates for additional steps and subsequent sequencing.
Variation 2.1.1: (see e.g.,
2.1.1.a. Cleave isolated genomic DNA, or methyl enriched DNA with one or more methyl sensitive enzymes (AciI, HinP1I, Hpy99I, and HpyCH4IV). In this example, HinP1I is used. Optionally, heat kill endonuclease(s) (65° C. for 15 minutes). Ligate on linkers that are blocked on the 5′ end of the non-ligating end. Target ends are repaired with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is generated with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. (See Note 1, below.)
2.1.1.b. Add 5′ blocked primers, Taq Polymerase, and dNTPs and perform a few cycles of PCR to generate products that are now unmethylated at the remaining HinP1I sites.
2.1.1.c. Add HinP1I to cleave products containing such sites. Only PCR amplicons containing adjacent HinP1I sites (GCGC) that were methylated in the original target will generate fragments that are unblocked (i.e. ligation competent for linkers) on both ends, when cleaved with HinP1I. Residual Taq polymerase will fill in the 2-base overhang (optionally raise temperature to 60° C.). Remove dNTPs (via a spin column), and add back only dATP to generate a single base 3′ A overhang.
2.1.1.d. Using T4 ligase, append linkers (containing optional unique identifier sequence, and optional patient identifier sequence) with a single base 3′ T overhang on the ligating end to the single base A overhang of the cleaved and filled in target sequences. The 5′ non-ligating side of the linker contains a 5′ overhang, and optionally is not phosphorylated. The 3′ non-ligating side of the linker contains a 3′ blocking group and/or thiophosphates to inhibit digestion with 3′ exonuclease.
2.1.1.e. Add a 3′ exonuclease (i.e. Exonuclease III) and digest at 37° C. Fragments containing the original short linker on one or both sides will be digested and rendered single-stranded. Only fragments with the new linker ligated to both sides will remain double-stranded. Exonuclease will also render the linkers single-stranded. Optionally, remove digestion products from desired fragments with a spin column.
2.1.1.f. The free ends of the remaining linker-containing fragments are rendered competent for ligation, either by (i) phosphorylating 5′ end using T4 kinase, (ii) removing blocked 3′ group, or (iii) using 5′→3′ nuclease activity of Taq polymerase to cleave off matching 5′-overlapping base or flap, leaving ligation-competent 5′-phosphate, or any combination thereof. Ligation conditions are designed to favor multimerization. The ligation products comprise multimers of target fragments with adjacent HinP1I sequences originally methylated in target DNA with optional unique identifier and/or patient identifier sequence. This product is suitable for optional additional steps and subsequent sequencing.
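The selection logic of Variation 2.1.1 can be sketched in silico. The following is a minimal illustration, not part of the protocol: it assumes a single top-strand sequence, represents methylation as a set of 0-based start positions of GCGC sites (a convention invented here), and reports the top-strand span between cut positions before fill-in. A fragment is kept only when two adjacent HinP1I sites were both methylated, because an unmethylated site would already have been cut in the first digest and capped with a blocked linker.

```python
import re


def hinp1i_sites(seq: str) -> list[int]:
    """0-based start positions of every GCGC site (overlapping matches included)."""
    return [m.start() for m in re.finditer("(?=GCGC)", seq)]


def doubly_methylated_fragments(seq: str, methylated_starts: set[int]) -> list[str]:
    """Top-strand spans bounded by two adjacent HinP1I sites that were both methylated.

    Sites that were unmethylated are cut in the first digest and receive the
    blocked linker, so a fragment touching such a site is not returned.
    """
    sites = hinp1i_sites(seq)
    fragments = []
    for left, right in zip(sites, sites[1:]):
        if left in methylated_starts and right in methylated_starts:
            # HinP1I cuts G^CGC, one base into the site on the top strand.
            fragments.append(seq[left + 1 : right + 1])
    return fragments


# Hypothetical test sequence with GCGC sites starting at positions 2, 10 and 18.
seq = "AA" + "GCGC" + "TTTT" + "GCGC" + "AAAA" + "GCGC" + "TT"
print(hinp1i_sites(seq))
print(doubly_methylated_fragments(seq, {2, 10}))
```

With sites 2 and 10 methylated and site 18 unmethylated, only the span between the first two sites is reported.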
Note 1: Regarding linker ligations: As an alternative for filling in, repairing ends, and A tailing prior to linker ligation with linkers containing a single base T overhang, one can take advantage of the 5′ CpG overhang generated by HinP1I. One can generate a single base 3′ C overhang using Klenow (exo-) and dCTP. Linkers have a single base 3′ C overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. Alternatively, linkers with a CG overhang may be used directly, with no fill-in of sites, but the linkers are designed such that if they self-ligate, they create an AclI site (AA^CGTT). Thus, genomic DNA is cleaved with HinP1I (G^CGC) and AclI (AA^CGTT) in the presence of both linkers and T4 ligase at 37° C. Ligation of fragments to each other is cleaved by HinP1I, and ligation of linkers to each other is cleaved by AclI; however, ligation of linkers to the ends of fragments is not cleaved by either enzyme, so this 3-enzyme ligation mix serves as a “biochemical selection” to enrich for the desired products.
Note 2: The 3′ end of the blocked linker may be synthesized to contain a Uracil or an apurinic (AP) site at a position internal to the block, and after treatment with UDG and AP endonuclease, liberate a ligation competent 3′ end.
Note 3: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 4: Ligation will favor formation of multimers by using crowding agents (i.e. 20% PEG), and/or by mixing two sets of ligation products, wherein the non-palindromic 5′ linker overhang of the first set is complementary to the non-palindromic 5′ linker overhang of the second set.
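The “biochemical selection” described in Note 1 can be checked with a short sketch. It assumes, for illustration only, that each duplex end is described by the bases immediately flanking its 5′ CG overhang: G for a HinP1I-cut genomic fragment and AA for a linker designed so that self-ligation recreates an AclI site. The names `STUB`, `junction`, and `cutters` are hypothetical, and only the bases directly at the junction are modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


OVERHANG = "CG"  # 5' overhang shared by HinP1I-cut fragments and the linkers
STUB = {"fragment": "G", "linker": "AA"}  # duplex bases next to the overhang (assumed)
ENZYME_SITES = {"HinP1I": "GCGC", "AclI": "AACGTT"}


def junction(left: str, right: str) -> str:
    """Top-strand sequence formed when two CG-overhang ends are ligated."""
    return STUB[left] + OVERHANG + revcomp(STUB[right])


def cutters(seq: str) -> list[str]:
    """Enzymes in the mix whose recognition site is present at the junction."""
    return [name for name, site in ENZYME_SITES.items() if site in seq]


for left in STUB:
    for right in STUB:
        j = junction(left, right)
        print(f"{left}-{right}: {j} cut by {cutters(j) or 'neither enzyme'}")
```

Under these assumptions, fragment-fragment junctions regenerate GCGC, linker-linker junctions regenerate AACGTT, and the mixed junctions carry neither site, which is why they accumulate in the mix.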
Detection of Methylation at Adjacent Sites Using Bisulfite Treatment
The above approach is ideal for identifying and enumerating fragments containing adjacently methylated HinP1I sites as a surrogate for methylation within that fragment of DNA. However, for some applications, it is important to identify methylation status of individual CpG sites within a given region. Thus it may be necessary to treat the input DNA with bisulfite, which converts regular C, but not methyl-C, to U.
Overview of approach using bisulfite: The idea is to faithfully copy every fragment of target DNA that is methylated at adjacent AciI restriction sites of the sequence (G*CGG) for the regions of interest, treat with bisulfite, append a unique identifier sequence, an optional primer binding sequence, and an optional patient identifier, and circularize them for subsequent sequencing. The oligonucleotide DNA strand is circularized and sequenced. This approach provides the advantage of obtaining both copy number information when needed, and methylation data with the minimum of sequencing required.
The idea takes advantage of a unique property of the recognition sequence for AciI. The enzyme cleaves the 3.5 base recognition sequence G^CGG in one orientation, and C^CGC in the second orientation. If a methylated AciI site is treated with bisulfite in the first orientation (G*CGG), the site will remain unchanged (G*CGG), while in the second orientation (C*CGC), it will be changed (U*CGU, where *C denotes 5-meC). After a few rounds of PCR amplification, the 5-methyl C is converted to C, while the U is converted to T. Thus, a methylated AciI site in the first orientation remains as GCGG, an unmethylated AciI site in the first orientation is converted to GTGG, a methylated AciI site in the second orientation is converted to TCGT, and an unmethylated AciI site in the second orientation is converted to TTGT. When cleaved with AciI, only adjacent AciI sites methylated in the original target will create fragments that are ligation competent on both the 3′ and 5′ ends.
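The four outcomes described above can be reproduced with a small sketch. The notation is an assumption made for this illustration: uppercase C is unmethylated cytosine and lowercase m stands for 5-methylcytosine on the top strand. Bisulfite followed by PCR is modeled as C to T and m to C.

```python
def bisulfite_pcr(seq: str) -> str:
    """Top-strand sequence after bisulfite treatment and a few PCR cycles.

    'C' (unmethylated) is converted to U and read as T after PCR;
    'm' (5-methylcytosine) is protected and copied as plain C.
    """
    return seq.replace("C", "T").replace("m", "C")


def acii_cleavable(seq: str) -> bool:
    """True if the converted sequence still reads as an AciI site (GCGG or CCGC)."""
    return "GCGG" in seq or "CCGC" in seq


sites = {
    "first orientation, methylated (G*CGG)": "GmGG",
    "first orientation, unmethylated (GCGG)": "GCGG",
    "second orientation, methylated (C*CGC)": "CmGC",
    "second orientation, unmethylated (CCGC)": "CCGC",
}
converted = {label: bisulfite_pcr(site) for label, site in sites.items()}
for label, site in converted.items():
    print(f"{label}: {site} cleavable={acii_cleavable(site)}")
```

Only the methylated site in the first orientation survives conversion as GCGG, matching the description in the text.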
One unique feature of this approach is that bisulfite conversion creates two non-complementary strands. Thus the top strand will have adjacent G*CGG sequences, and even if there is an intervening C*CGC sequence, it doesn't matter because it is converted to U*CGU, which after PCR conversion to TCGT is not recognized as an AciI site. Further, even if there is an intervening G*CGG or GCGG (i.e. unmethylated) site, it will not be nicked by AciI, since it will be in a region that is single-stranded. Even better, sequences on the top strand of the form C*CGC will be G*CGG on the bottom strand. The bottom strand may also be used to query methylation status, and since the two sequences are now very different, the oligonucleotide probes to the top and bottom strand will not hybridize to each other, only to the converted targets. Thus, this approach allows for obtaining detailed methylation status on the promoter using information from both top and bottom strands.
The principle of treating DNA with bisulfite so that an unmethylated site is converted into a sequence that is no longer cleavable by a restriction enzyme, while a methylated site remains cleavable by that same enzyme, may be extended to some additional restriction sites. For example, a BstUI site (CG^CG) retains its same sequence after bisulfite conversion provided both CG sites were methylated (i.e. *CG*CG). Likewise, an Hpy99I site (CGWCG^) also retains its same sequence after bisulfite conversion provided both CG sites were methylated (*CGW*CG). In a different type of example, HpyCH4IV (A^CGT) will cleave both sequences of the form A*CGT and A*CGC after bisulfite conversion, as both become A*CGT. Thus, the variations considered below for AciI are equally valid for, but not limited to, restriction endonucleases BstUI, Hpy99I, HpyCH4IV and their isoschizomers.
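As a hedged illustration of this extension, the sketch below applies the same conversion model (uppercase C unmethylated, lowercase m for 5-methylcytosine, both conventions assumed here) to BstUI (CGCG), Hpy99I (CGWCG), and HpyCH4IV (ACGT) sites and reports whether the recognition sequence is present after bisulfite treatment and PCR.

```python
import re

# Recognition sequences written as regular expressions; W = A or T.
RECOGNITION = {
    "BstUI": re.compile("CGCG"),
    "Hpy99I": re.compile("CG[AT]CG"),
    "HpyCH4IV": re.compile("ACGT"),
}


def bisulfite_pcr(seq: str) -> str:
    """'C' (unmethylated) becomes T; 'm' (5-methylcytosine) is copied as C."""
    return seq.replace("C", "T").replace("m", "C")


def cut_after_conversion(enzyme: str, seq: str) -> bool:
    """True if the enzyme's site is present after bisulfite conversion and PCR."""
    return RECOGNITION[enzyme].search(bisulfite_pcr(seq)) is not None


examples = [
    ("BstUI", "mGmG"),      # both CpGs methylated
    ("BstUI", "mGCG"),      # only the first CpG methylated
    ("Hpy99I", "mGAmG"),    # both CpGs methylated
    ("Hpy99I", "CGACG"),    # unmethylated
    ("HpyCH4IV", "AmGT"),   # methylated A*CGT
    ("HpyCH4IV", "AmGC"),   # methylated A*CGC, converted to ACGT
    ("HpyCH4IV", "ACGT"),   # unmethylated
]
for enzyme, site in examples:
    print(enzyme, site, "->", bisulfite_pcr(site), cut_after_conversion(enzyme, site))
```

Note that the HpyCH4IV case differs from the others: conversion of A*CGC creates a cleavable site that was not present in the input sequence.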
Variation 2.2: (see e.g.,
By insisting on having the restriction endonuclease generate both the 3′OH and the 5′ phosphate, this avoids false signal, and should get rid of any non-specific ligation signal as well. Thus, any rare fragment of genomic DNA that was single-stranded after purification, or did not get cleaved will not form a productive substrate and will be destroyed by the exonuclease treatment step.
Detailed Protocol for Highly Sensitive Detection of Promoter Methylation:
2.2.a. Ligate on linkers that contain 5-methyl C to retain sequence after bisulfite conversion. Target ends are repaired with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is generated with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. Incubate cfDNA with bisulfite, which converts regular C, but not methyl-C, to U. Bisulfite treatment converts the top and bottom strand differently, such that after treatment, the strands will no longer be complementary to each other, and melt apart.
2.2.b. Add primers, Taq Polymerase, and dNTPs and perform a few cycles of PCR to generate products that are now unmethylated at the remaining AciI sites.
2.2.c. Add AciI to cleave products containing such sites. Only PCR amplicons containing adjacent AciI sites (GCGG) that were methylated in the original target will generate fragments that are unblocked (i.e. ligation competent for linkers) on both ends, when cleaved with AciI. Residual Taq polymerase will fill in the 2-base overhang (optionally raise temperature to 60° C.).
2.2.d. Denature target DNA (94° C. 1 minute) in the presence of oligonucleotide probes (comprising a 5′ probe region complementary to the sequence of the 5′ side of the targets, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding site, and a 3′ probe region complementary to the sequence of the 3′ side of the targets), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50°-60° C. for 2 hours). KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are added to allow for extension and ligation at 50° C., and the temperature is optionally raised (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
2.2.e. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the methylated DNA, the unique identifier sequence, and an optional patient identifier sequence. This circular product is suitable for optional additional steps and subsequent sequencing.
Note 1. Oligonucleotide may lack a 5′ phosphate, or contain an optional blocking group on the 5′ side, such that the 5′ end of the oligonucleotide is not suitable for ligation.
Note 2. The above example uses KlenTaq, a polymerase lacking strand displacing activity as well as both 3′→5′ and 5′→3′ nuclease activity. If the oligonucleotide has a blocking group on the 5′ side, then one can use polymerase with 5′→3′ nuclease activity.
Note 3. The circular product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 4. If the oligonucleotide(s) also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
In variation 2.2 both a polymerase and a ligase are used to form a covalently closed circle containing a unique identifier sequence, an optional patient identifier sequence, and an optional primer binding site. This may also be accomplished using just a ligase.
Variation 2.3: (see e.g.,
By insisting on having the restriction endonuclease generate both the 3′OH and the 5′ phosphate, this avoids false signal, and should get rid of any non-specific ligation signal as well. Thus, any rare fragment of genomic DNA that was single-stranded after purification, or did not get cleaved will not form a productive substrate and will be destroyed by the exonuclease treatment step.
Detailed Protocol for Highly Sensitive Detection of Promoter Methylation:
2.3.a. Ligate on linkers that contain 5-methyl C to retain sequence after bisulfite conversion. Target ends are repaired with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is generated with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. Incubate cfDNA with bisulfite, which converts regular C, but not methyl-C, to U. Bisulfite treatment converts the top and bottom strand differently, such that after treatment, the strands will no longer be complementary to each other, and melt apart.
2.3.b. Add primers, Taq Polymerase, and dNTPs and perform a few cycles of PCR to generate products that are now unmethylated at the remaining AciI sites.
2.3.c. Add AciI to cleave products containing such sites. Only PCR amplicons containing adjacent AciI sites (GCGG) that were methylated in the original target will generate fragments that are unblocked (i.e. ligation competent for linkers) on both ends, when cleaved with AciI. Residual Taq polymerase will fill in the 2-base overhang (optionally raise temperature to 60° C.).
2.3.d. Denature target DNA (94° C. 1 minute) in the presence of partially double-stranded oligonucleotide pairs (comprising a first oligonucleotide probe with a 5′ probe region complementary to the sequence of the 5′ side of the target, an optional primer binding sequence, an optional patient identifier sequence, and a 3′ probe region complementary to the sequence of the 3′ side of the target, and a second oligonucleotide probe comprising a 5′ region complementary to the first oligonucleotide probe, a unique identifier sequence and a 3′ region complementary to the first oligonucleotide probe), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50°-60° C. for 2 hours). A thermostable ligase (preferably from strain AK16D), is added to allow for ligation at 60° C. to generate circular products.
2.3.e. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the methylated DNA, the unique identifier sequence, and an optional patient identifier sequence. This circular product is suitable for optional additional steps and subsequent sequencing.
Note 1. The circular product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 2. If the oligonucleotide(s) also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Database of Methylation Status at Adjacent Methyl-Sensitive Restriction Sites (AciI Sites).
The current TCGA database contains information on the methylation status of about 450,000 CpG sites on the human genome, both for normal and for tumor of many different tissue sites. However, it does not cover the methylation status of all adjacent AciI sites, nor does it indicate whether both sites are methylated on the same piece of genomic DNA.
Consequently, for the above assay method to be most useful, it would be helpful to create a database of methylation status at adjacent AciI restriction sites of the same orientation (GCGG). This approach will also include adjacent AciI restriction sites of the other orientation (CCGC), since they will be GCGG on the opposite strand after bisulfite conversion. One such approach is illustrated in
Overview of approach: The idea is to generate a library of small fragments that could only have been formed if both ends of the fragment contained restriction sites that were methylated in the original genomic DNA. This idea takes advantage of a unique property of the recognition sequence for AciI. The enzyme cleaves the 3.5 base recognition sequence G^CGG in one orientation, and C^CGC in the other orientation. Start with a methylated AciI site and treat with bisulfite. In the first orientation (G*CGG), the site will remain unchanged, while in the second orientation, it will be changed (U*CGU, where *C denotes 5-meC). After a few rounds of PCR amplification, the first site is converted to GCGG, which is recognized by AciI, while the second site is converted to TCGT, which is not cleaved. Thus, AciI may be used to identify uniquely methylated sequences after bisulfite treatment. The fragments have linkers appended with optional unique identifier and optional patient identifier sequences that are now amenable for ligation to create fragment multimers that are then substrates for additional steps and subsequent sequencing.
Variation 2.3.1: (see e.g.,
2.3.1.a. Ligate on linkers that are blocked on the 5′ end of the non-ligating end. Target ends are repaired with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is generated with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. Incubate cfDNA with bisulfite, which converts regular C, but not methyl-C, to U. Bisulfite treatment converts the top and bottom strand differently, such that after treatment, the strands will no longer be complementary to each other, and melt apart.
2.3.1.b. Add 5′ blocked primers, Taq Polymerase, and dNTPs and perform a few cycles of PCR to generate products that are now unmethylated at the remaining AciI sites.
2.3.1.c. Add AciI to cleave products containing such sites. Only PCR amplicons containing adjacent AciI sites (GCGG) that were methylated in the original target will generate fragments that are unblocked (i.e. ligation competent for linkers) on both ends, when cleaved with AciI. Residual Taq polymerase will fill in the 2-base overhang (optionally raise temperature to 60° C.). Remove dNTPs (via a spin column), and add back only dATP to generate a single base 3′ A overhang.
2.3.1.d. Using T4 ligase, append linkers (containing optional unique identifier sequence, and optional patient identifier sequence) with a single base 3′ T overhang on the ligating end to the single base A overhang of the cleaved and filled in target sequences. The 5′ non-ligating side of the linker contains a 5′ overhang, and optionally is not phosphorylated. The 3′ non-ligating side of the linker contains a 3′ blocking group and/or thiophosphates to inhibit digestion with 3′ exonuclease.
2.3.1.e. Add a 3′ exonuclease (i.e. Exonuclease III) and digest at 37° C. Fragments containing the original short linker on one or both sides will be digested and rendered single-stranded. Only fragments with the new linker ligated to both sides will remain double-stranded. Exonuclease will also render the linkers single-stranded. Optionally, remove digestion products from desired fragments with a spin column.
2.3.1.f. The free ends of the remaining linker-containing fragments are rendered competent for ligation, either by (i) phosphorylating 5′ end using T4 kinase, (ii) removing blocked 3′ group, or (iii) using 5′→3′ nuclease activity of Taq polymerase to cleave off matching 5′-overlapping base or flap, leaving ligation-competent 5′-phosphate, or any combination thereof. Ligation conditions are designed to favor multimerization. The ligation products comprise multimers of target fragments with adjacent AciI sequences originally methylated in target DNA with optional unique identifier and/or patient identifier sequence. This product is suitable for optional additional steps and subsequent sequencing.
Note 1: The 3′ end of the blocked linker may be synthesized to contain a Uracil or an apurinic (AP) site at a position internal to the block, and after treatment with UDG and AP endonuclease, liberate a ligation competent 3′ end.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: Ligation will favor formation of multimers by using crowding agents (i.e. 20% PEG), and/or by mixing two sets of ligation products, wherein the non-palindromic 5′ linker overhang of the first set is complementary to the non-palindromic 5′ linker overhang of the second set.
Prophetic Example 3—High Sensitivity Unmethylated Marker for Promoter Hypomethylation (when Present at 1% to 0.1%) in Total Plasma DNA
The majority of methylation changes in tumors are due to hypomethylation. When such hypomethylation occurs in a promoter region that was previously methylated, it may cause increased expression of a gene, such as an oncogene. Further, repetitive element regions and mobile elements are generally silenced by overall methylation, but such silencing is lost when the tumor becomes hypomethylated.
While methyl-sensitive restriction enzymes may be used to help selectively amplify and identify low levels of methylated sequences, the approach does not work for identifying low levels of unmethylated sequences. Bisulfite treatment and use of PCR primers directed to bisulfite modified unmethylated DNA may be used, although such primers are very AT rich and there may be difficulties amplifying all desired fragments, especially when attempting multiplexed PCR.
Overview of approach: The idea is to faithfully copy every fragment of target DNA that is unmethylated at adjacent restriction sites for the regions of interest, append a unique identifier sequence, an optional primer binding sequence, and an optional patient identifier, and circularize them for subsequent sequencing. The oligonucleotide DNA strand is circularized and sequenced. This approach provides the advantage of obtaining both copy number information when needed, and unmethylation data with the minimum of sequencing required.
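The copy number information mentioned above comes from counting distinct unique identifier sequences per target rather than counting raw reads. A minimal sketch follows, assuming reads have already been reduced to (target, unique identifier) pairs; the target names and identifier strings are hypothetical.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct unique identifiers per target.

    `reads` is an iterable of (target_name, uid) pairs, one per sequenced
    product. Reads sharing both target and uid are treated as copies of
    the same original molecule.
    """
    uids_by_target = defaultdict(set)
    for target, uid in reads:
        uids_by_target[target].add(uid)
    return {target: len(uids) for target, uids in uids_by_target.items()}


# Hypothetical reads: the second entry duplicates the first molecule.
reads = [
    ("promoter_A", "ACGTAC"),
    ("promoter_A", "ACGTAC"),
    ("promoter_A", "TTGCAA"),
    ("promoter_B", "GGATCC"),
]
print(count_unique_molecules(reads))
```

This sketch ignores sequencing errors within the identifier; a real pipeline would likely need some tolerance for near-identical identifiers.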
Variation 3.1: (see e.g.,
By insisting on having the restriction endonuclease generate both the 3′OH and the 5′ phosphate, this avoids false signal, and should get rid of any non-specific ligation signal as well. Thus, any rare fragment of genomic DNA that was single-stranded after purification, or did not get cleaved will not form a productive substrate and will be destroyed by the exonuclease treatment step.
Detailed Protocol for Highly Sensitive Detection of Promoter Unmethylation:
3.1.a. Cleave isolated cfDNA with one or more methyl sensitive enzymes (AciI, HinP1I, Hpy99I, and HpyCH4IV). This example illustrates the use of HinP1I. Heat kill endonuclease(s) (65° C. for 15 minutes) and denature DNA (94° C. 1 minute).
3.1.b. Denature target DNA (94° C. 1 minute) in the presence of oligonucleotides (comprising a 5′ probe region complementary to the sequence of the 5′ side of the targets, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding site, and a 3′ probe region complementary to the sequence of the 3′ side of the targets), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50°-60° C. for 2 hours). KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are added to allow for extension and ligation at 50° C., and the temperature is optionally raised (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
3.1.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the unmethylated target DNA, the unique identifier sequence, and an optional patient identifier sequence. This circular product is suitable for optional additional steps and subsequent sequencing.
Note 1. Oligonucleotide may lack a 5′ phosphate, or contain an optional blocking group on the 5′ side, such that the 5′ end of the oligonucleotide is not suitable for ligation.
Note 2. The above example uses KlenTaq, a polymerase lacking strand displacing activity as well as both 3′→5′ and 5′→3′ nuclease activity. If the oligonucleotide has a blocking group on the 5′ side, then one can use polymerase with 5′→3′ nuclease activity.
Note 3. The circular product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site. When using direct SMRT sequencing, loss of methylation status of the original template DNA may be directly determined.
Note 4. If the oligonucleotide(s) also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Note 5. The same approach may be used to identify regions that are methylated at promoter regions (see e.g.,
Note 6. The same approach may be used to identify regions that are methylated at Bsh1236I sites in promoter regions (see e.g.,
In variation 3.1, both a polymerase and a ligase are used to form a covalently closed circle containing a unique identifier sequence, an optional patient identifier sequence, and an optional primer binding site. This may also be accomplished using just a ligase.
Variation 3.2: (see e.g.,
By insisting on having the restriction endonuclease generate both the 3′OH and the 5′ phosphate, this avoids false signal, and should get rid of any non-specific ligation signal as well. Thus, any rare fragment of genomic DNA that was single-stranded after purification, or did not get cleaved will not form a productive substrate and will be destroyed by the exonuclease treatment step.
Detailed Protocol for Highly Sensitive Detection of Promoter Unmethylation:
3.2.a. Cleave isolated cfDNA with one or more methyl sensitive enzymes (AciI, HinP1I, Hpy99I, and HpyCH4IV). In this example, HinP1I is used. Heat kill endonuclease(s) (65° C. for 15 minutes) and denature DNA (94° C. 1 minute).
3.2.b. Denature target DNA (94° C. 1 minute) in the presence of partially double-stranded oligonucleotide pairs (comprising a first oligonucleotide probe with a 5′ probe region complementary to the sequence of the 5′ side of the target, an optional primer binding sequence, an optional patient identifier sequence, and a 3′ probe region complementary to the sequence of the 3′ side of the target, and a second oligonucleotide probe comprising a 5′ region complementary to the first strand, a unique identifier sequence and a 3′ region complementary to the first strand), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50°-60° C. for 2 hours). A thermostable ligase (preferably from strain AK16D), is added to allow for ligation at 60° C. to generate circular products.
3.2.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the unmethylated target DNA, the unique identifier sequence, and an optional patient identifier sequence. This circular product is suitable for optional additional steps and subsequent sequencing.
Note 1. The circular product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site. When using direct SMRT sequencing, loss of methylation status of the original template DNA may be directly determined.
Note 2. If the oligonucleotide(s) also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Note 3. The same approach may be used to identify regions that are methylated at promoter regions (see e.g.,
The current TCGA database contains information on the methylation status of about 450,000 CpG sites on the human genome, both for normal and for tumor of many different tissue sites. However, it does not cover all the unmethylation status of adjacent HinP1I sites, nor would it distinguish that both sites are unmethylated on the same piece of genomic DNA.
Consequently, for the above assay method to be most useful, it would be helpful to create a database of unmethylation status at adjacent methyl-sensitive restriction sites. One such approach is illustrated in
Overview of approach: The idea is to generate a library of small fragments that could only have been formed if both ends of the fragment contained restriction sites that were unmethylated in the original genomic DNA. The fragments have linkers appended with optional unique identifier and optional patient identifier sequences that are now amenable for ligation to create fragment multimers that are then substrates for additional steps and subsequent sequencing.
Variation 3.2.1: (see e.g.,
3.2.1.a. Ligate on linkers that are blocked on the 5′ end of the non-ligating end. Target ends are repaired with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is generated with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. Cleave isolated genomic DNA, or methyl enriched DNA with one or more methyl sensitive enzymes (AciI, HinP1I, Hpy99I, and HpyCH4IV). In this example, HinP1I is used. Optionally, heat kill endonuclease(s) (65° C. for 15 minutes).
3.2.1.b. Add HinP1I to cleave products containing such sites. Only targets containing adjacent HinP1I sites (GCGC) that were unmethylated in the original target will generate fragments that are unblocked (i.e. ligation competent for linkers) on both ends, when cleaved with HinP1I. Residual Taq polymerase will fill in the 2-base overhang (optionally raise temperature to 60° C.). Remove dNTPs (via a spin column) and add back only dATP to generate a single base 3′ A overhang.
3.2.1.c. Using T4 ligase, append linkers (containing optional unique identifier sequence, and optional patient identifier sequence) with a single base 3′ T overhang on the ligating end to the single base A overhang of the cleaved and filled in target sequences. The 5′ non-ligating side of the linker contains a 5′ overhang, and optionally is not phosphorylated. The 3′ non-ligating side of the linker contains a 3′ blocking group and/or thiophosphates to inhibit digestion with 3′ exonuclease.
3.2.1.d. Add a 3′ exonuclease (i.e. Exonuclease III) and digest at 37° C. Fragments containing the original short linker on one or both sides will be digested and rendered single-stranded. Only fragments with the new linker ligated to both sides will remain double-stranded. Exonuclease will also render the linkers single-stranded. Optionally, remove digestion products from desired fragments with a spin column.
3.2.1.e. The free ends of the remaining linker-containing fragments are rendered competent for ligation, either by (i) phosphorylating the 5′ end using T4 kinase, (ii) removing the blocked 3′ group, or (iii) using the 5′→3′ nuclease activity of Taq polymerase to cleave off the matching 5′-overlapping base or flap, leaving a ligation-competent 5′-phosphate, or any combination thereof. Ligation conditions are designed to favor multimerization. The ligation products comprise multimers of target fragments with adjacent HinP1I sequences originally unmethylated in target DNA with optional unique identifier and/or patient identifier sequence. This product is suitable for optional additional steps and subsequent sequencing.
Note 1: The 3′ end of the blocked linker may be synthesized to contain a Uracil or an apurinic (AP) site at a position internal to the block, and after treatment with UDG and AP endonuclease, liberate a ligation competent 3′ end.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: Ligation will favor formation of multimers by using crowding agents (i.e. 20% PEG), and/or by mixing two sets of ligation products, wherein the non-palindromic 5′ linker overhang of the first set is complementary to the non-palindromic 5′ linker overhang of the second set.
Note 4. The same approach may be used to identify regions that are methylated at promoter regions (see e.g.,
There may be a desire to determine unmethylation status of CpG sequences in between two methyl sensitive restriction sites in a promoter region. This is similar to the approaches above, and includes an extra bisulfite step.
Overview of approach: The idea is to faithfully copy every fragment of target DNA that is unmethylated at adjacent restriction sites for the regions of interest, treat with bisulfite, append a unique identifier sequence, an optional primer binding sequence, and an optional patient identifier, and circularize them for subsequent sequencing. The oligonucleotide DNA strand is circularized and sequenced. This approach provides the advantage of obtaining both copy number information when needed, and unmethylation data with the minimum of sequencing required.
Variation 3.3: (see e.g.,
By insisting on having the restriction endonuclease generate both the 3′OH and the 5′ phosphate, this avoids false signal, and should get rid of any non-specific ligation signal as well. Thus, any rare fragment of genomic DNA that was single-stranded after purification, or did not get cleaved will not form a productive substrate and will be destroyed by the exonuclease treatment step.
Detailed Protocol for Highly Sensitive Detection of Promoter Unmethylation:
3.3.a. Cleave isolated cfDNA with one or more methyl sensitive enzymes (AciI, HinP1I, Hpy99I, and HpyCH4IV). This example describes HinP1I. Heat kill endonuclease(s) (65° C. for 15 minutes) and denature DNA (94° C. 1 minute). Incubate cfDNA with bisulfite, which converts regular C, but not methyl-C, to U. Bisulfite treatment converts the top and bottom strand differently, such that after treatment, the strands will no longer be complementary to each other, and melt apart.
3.3.b. Denature target DNA (94° C. 1 minute, if needed) in the presence of oligonucleotides (comprising a 5′ probe region complementary to the sequences to the 5′ side of the targets, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding site, and a 3′ probe region complementary to the sequences to the 3′ side of the targets), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50°-60° C. for 2 hours). KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are added to allow for extension and ligation at 50° C., and optionally the temperature is raised (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
3.3.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the methylated DNA, the unique identifier sequence, and an optional patient identifier sequence. This circular product is suitable for optional additional steps and subsequent sequencing.
Note 1. Oligonucleotide may lack a 5′ phosphate, or contain an optional blocking group on the 5′ side, such that the 5′ end of the oligonucleotide is not suitable for ligation.
Note 2. The above example uses KlenTaq, a polymerase lacking strand displacing activity as well as both 3′→5′ and 5′→3′ nuclease activity. If the oligonucleotide has a blocking group on the 5′ side, then one can use a polymerase with 5′→3′ nuclease activity.
Note 3. The circular product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 4. If the oligonucleotide(s) also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
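The statement in step 3.3.a that bisulfite-treated strands are no longer complementary can be checked with a short model. This is a sketch under simplifying assumptions: conversion is treated as complete, G:U wobble pairing is ignored, and methylated positions are supplied as hypothetical 0-based indices on each strand; none of the function names come from the protocol itself.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement(seq):
    """Reverse complement, reading U as pairing with A."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def bisulfite_convert(strand, methylated=frozenset()):
    """Convert unmethylated C to U; C at a methylated index (5-methyl-C) is kept."""
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )


def mismatches_when_annealed(top, bottom):
    """Count positions where bottom (5'->3') no longer pairs with top (5'->3')."""
    perfect_partner_of_bottom = reverse_complement(bottom)
    top_as_dna = top.replace("U", "T")
    return sum(1 for a, b in zip(top_as_dna, perfect_partner_of_bottom) if a != b)


top = "ACGTCCGA"
bottom = reverse_complement(top)
print(mismatches_when_annealed(top, bottom))
print(mismatches_when_annealed(bisulfite_convert(top), bisulfite_convert(bottom)))
```

Every unmethylated C on either strand contributes one mismatch, so the two converted strands diverge in proportion to their unmethylated C content.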
In variation 3.3 both a polymerase and a ligase were used to form a covalently closed circle containing a unique identifier sequence, an optional patient identifier sequence, and an optional primer binding site. This may also be accomplished using just a ligase.
Variation 3.4: (see e.g.,
By insisting on having the restriction endonuclease generate both the 3′OH and the 5′ phosphate, this avoids false signal, and should get rid of any non-specific ligation signal as well. Thus, any rare fragment of genomic DNA that was single-stranded after purification, or did not get cleaved will not form a productive substrate and will be destroyed by the exonuclease treatment step.
Detailed Protocol for Highly Sensitive Detection of Promoter Unmethylation:
3.4.a. Cleave isolated cfDNA with one or more methyl sensitive enzymes (AciI, HinP1I, Hpy99I, and HpyCH4IV). This example uses HinP1I. Heat kill endonuclease(s) (65° C. for 15 minutes) and denature DNA (94° C. 1 minute). Incubate cfDNA with bisulfite, which converts regular C, but not methyl-C, to U. Bisulfite treatment converts the top and bottom strand differently, such that after treatment, the strands will no longer be complementary to each other, and melt apart.
3.4.b. Denature target DNA (94° C. 1 minute) in the presence of partially double-stranded oligonucleotide pairs (comprising a first oligonucleotide probe with a 5′ probe region complementary to the sequences to the 5′ side of the target, an optional primer binding sequence, an optional patient identifier sequence, and a 3′ probe region complementary to the sequences to the 3′ side of the target, and a second oligonucleotide probe comprising a 5′ region complementary to the first strand, a unique identifier sequence and a 3′ region complementary to the first strand), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50°-60° C. for 2 hours). A thermostable ligase (preferably from strain AK16D) is added to allow for ligation at 60° C. to generate circular products.
3.4.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the methylated DNA, the unique identifier sequence, and an optional patient identifier sequence. This circular product is suitable for optional additional steps and subsequent sequencing.
Note 1. The circular product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site. When using direct SMRT sequencing, loss of methylation status of the original template DNA may be directly determined.
Note 2. If the oligonucleotide(s) also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Database of Unmethylation Status at Adjacent Methyl-Sensitive Restriction Sites
The current TCGA database contains information on the methylation status of about 450,000 CpG sites on the human genome, both for normal and for tumor of many different tissue sites. However, it does not cover all the unmethylation status of adjacent HinP1I sites, nor the CpG sites between those sites, nor would it distinguish that both sites are unmethylated on the same piece of genomic DNA.
Consequently, for the above assay method to be most useful, it would be helpful to create a database of unmethylation status at adjacent methyl-sensitive restriction sites. One such approach is illustrated in
Overview of approach: The idea is to generate a library of small fragments that could only have been formed if both ends of the fragment contained restriction sites that were unmethylated in the original genomic DNA. The fragments have linkers appended with optional unique identifier and optional patient identifier sequences that are now amenable for ligation to create fragment multimers, then treated with bisulfite to reveal positions of methylation or unmethylation, that are then substrates for additional steps and subsequent sequencing.
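One way the resulting observations might be organized is sketched below. The record layout is purely an assumption for illustration: each entry is keyed by chromosome and the coordinates of the two flanking restriction sites, and unique identifier sequences are stored per sample so that repeated reads of the same original molecule are counted once. The class name, method names, and sample labels are hypothetical.

```python
from collections import defaultdict


class UnmethylationDatabase:
    """Toy store of fragments observed with unmethylated sites at both ends."""

    def __init__(self):
        # (chrom, left_site, right_site) -> sample -> set of unique identifiers
        self._records = defaultdict(lambda: defaultdict(set))

    def add_read(self, chrom, left_site, right_site, sample, uid):
        """Record one sequenced read; duplicate UIDs collapse to one molecule."""
        self._records[(chrom, left_site, right_site)][sample].add(uid)

    def molecule_count(self, chrom, left_site, right_site, sample):
        """Number of distinct original molecules seen for this site pair and sample."""
        key = (chrom, left_site, right_site)
        if key not in self._records:
            return 0
        return len(self._records[key].get(sample, ()))


db = UnmethylationDatabase()
db.add_read("chr7", 1000, 1180, "tumor_01", "ACGTACGT")
db.add_read("chr7", 1000, 1180, "tumor_01", "ACGTACGT")  # same molecule read twice
db.add_read("chr7", 1000, 1180, "tumor_01", "TTGACCAT")
print(db.molecule_count("chr7", 1000, 1180, "tumor_01"))
```

Keying on the site pair rather than on single CpG positions keeps the information that both sites were unmethylated on the same piece of genomic DNA.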
Variation 3.4.1: (see e.g.,
3.4.1.a. Ligate on linkers that are blocked on the 5′ end of the non-ligating end. Target ends are repaired with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is generated with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. Cleave isolated genomic DNA, or methyl enriched DNA with one or more methyl sensitive enzymes (AciI, HinP1I, Hpy99I, and HpyCH4IV). In this example, HinP1I is used. Optionally, heat kill endonuclease(s) (65° C. for 15 minutes).
3.4.1.b. Add HinP1I to cleave products containing such sites. Only targets containing adjacent HinP1I sites (GCGC) that were unmethylated in the original target will generate fragments that are unblocked (i.e. ligation competent for linkers) on both ends, when cleaved with HinP1I. Residual polymerase activity from Taq polymerase will fill in the 2-base overhang (optionally raise temperature to 60° C.). Remove dNTPs (via a spin column) and add back only dATP to generate a single base 3′ A overhang.
3.4.1.c. Using T4 ligase, append linkers (containing optional unique identifier sequence, and optional patient identifier sequence) with a single base 3′ T overhang on the ligating end to the single base A overhang of the cleaved and filled in target sequences. The 5′ non-ligating side of the linker contains a 5′ overhang, and optionally is not phosphorylated. The 3′ non-ligating side of the linker contains a 3′ blocking group and/or thiophosphates to inhibit digestion with 3′ exonuclease.
3.4.1.d. Add a 3′ exonuclease (i.e. Exonuclease III) and digest at 37° C. Fragments containing the original short linker on one or both sides will be digested and rendered single-stranded. Only fragments with the new linker ligated to both sides will remain double-stranded. Exonuclease will also render the linkers single-stranded. Optionally, remove digestion products from desired fragments with a spin column.
3.4.1.e. The free ends of the remaining linker-containing fragments are rendered competent for ligation, either by (i) phosphorylating the 5′ end using T4 kinase, (ii) removing the blocked 3′ group, or (iii) using the 5′→3′ nuclease activity of Taq polymerase to cleave off the matching 5′-overlapping base or flap, leaving a ligation-competent 5′-phosphate, or any combination thereof. Ligation conditions are designed to favor multimerization. The ligation products comprise multimers of target fragments with adjacent HinP1I sequences originally unmethylated in target DNA with optional unique identifier and/or patient identifier sequence. Incubate ligation products with bisulfite, which converts regular C, but not methyl-C, to U. Bisulfite treatment converts the top and bottom strand differently, such that after treatment, the strands will no longer be complementary to each other, and melt apart. These single-stranded products are suitable for optional additional steps and subsequent sequencing.
Note 1: The 3′ end of the longer blocked linker may be synthesized to contain a Uracil or an apurinic (AP) site at a position internal to the block, and after treatment with UDG and AP endonuclease, liberate a ligation competent 3′ end. Further, these linkers may be synthesized with 5-methyl C, such that after treatment with bisulfite they retain their original status.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: Ligation will favor formation of multimers by using crowding agents (i.e. 20% PEG), and/or by mixing two sets of ligation products, wherein the non-palindromic 5′ linker overhang of the first set is complementary to the non-palindromic 5′ linker overhang of the second set.
Detection of Unmethylation Status of Individual CpG Sites within a Given Region
The above approach is ideal for identifying and enumerating fragments containing adjacently unmethylated HinP1I sites as a surrogate for unmethylation within that fragment of DNA. However, for some applications, it is important to identify unmethylation status of individual CpG sites within a given region. Thus it may be necessary to treat the input DNA with bisulfite, which converts regular C, but not methyl-C, to U.
Overview of approach using bisulfite: The idea is to faithfully copy every fragment of target DNA that is unmethylated at adjacent HphI restriction sites of the sequence (GGTGA) for the regions of interest, treat with bisulfite (convert GGCGA to GGTGA), append a unique identifier sequence, an optional primer binding sequence, and an optional patient identifier, and circularize them for subsequent sequencing. The oligonucleotide DNA strand is circularized and sequenced. This approach provides the advantage of obtaining both copy number information when needed, and methylation data with the minimum of sequencing required.
The idea takes advantage of a unique property of the recognition sequence for HphI. The enzyme cleaves the 4.5 base recognition sequence GGTGA in one orientation, and TCACC in the other orientation. If unmethylated GGCGA site is treated with bisulfite, in the first orientation, the site will become an HphI recognition sequence GGTGA. After a few rounds of PCR amplification, the 5-methyl C is converted to C, while the U is converted to T. When cleaved with HphI, only adjacent sites unmethylated in the original target will create fragments that are ligation competent on both the 3′ and 5′ ends.
One unique feature of this approach is that bisulfite conversion creates two non-complementary strands. Thus the top strand will have adjacent GGCGA or GGTGA sequences, and even if there is an intervening TCACC sequence, it doesn't matter because it is converted to TUAUU, which is not recognized as an HphI site. Even better, sequences on the top strand of the form TCGCC will be GGCGA on the bottom strand. The bottom strand may also be used to query methylation status, and since the two sequences are now very different, the oligonucleotide probes to the top and bottom strand will not hybridize to each other, only to the converted targets. Thus, this approach allows for obtaining detailed methylation status on the promoter using information from both top and bottom strands.
The principle of treating DNA with bisulfite to convert unmethylated DNA into a sequence that is cleavable by a restriction enzyme, but if methylated, it is not cleavable by that same restriction enzyme may be extended to some additional restriction sites. For example, the sequence CCGTC will be converted to a BccI site (CCATC), provided the internal CG is not methylated. (This results from conversion of the opposite strand, i.e. GACGG into the opposite strand recognition sequence of BccI; i.e. GATGG). Likewise, the sequence GGACG will be converted to a FokI site (GGATG) provided the internal CG is not methylated. Thus, the variations considered below for HphI are equally valid for, but not limited to, restriction endonucleases BccI, FokI, and their isoschizomers.
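The sequence conversions described above for HphI, BccI, and FokI can be verified with a small model. It is a sketch only, assuming complete bisulfite conversion followed by PCR (so unmethylated C reads as T and 5-methyl-C reads as C) and assuming that, because the PCR product is double-stranded, a site may be found in either orientation. Methylated positions are hypothetical 0-based indices; the function names are illustrative.

```python
SITES = {"HphI": "GGTGA", "BccI": "CCATC", "FokI": "GGATG"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def bisulfite_then_pcr(strand, methylated=frozenset()):
    """Sequence of one strand after conversion and amplification.

    Unmethylated C is read as T; C at a methylated index stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )


def enzymes_cutting(strand, methylated=frozenset()):
    """Names of the enzymes whose site is present after conversion of this strand."""
    converted = bisulfite_then_pcr(strand, methylated)
    return sorted(
        name
        for name, site in SITES.items()
        if site in converted or reverse_complement(site) in converted
    )


print(enzymes_cutting("GGCGA"))                      # unmethylated: site created
print(enzymes_cutting("GGCGA", methylated={2}))      # methylated: no site
print(enzymes_cutting("TCACC"))                      # pre-existing site destroyed
print(enzymes_cutting(reverse_complement("CCGTC")))  # opposite strand of CCGTC
```

The last call converts the strand opposite CCGTC, mirroring the explanation that the BccI site arises from conversion of GACGG to GATGG.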
Variation 3.5: (see e.g.,
By insisting on having the restriction endonuclease generate both the 3′OH and the 5′ phosphate, this avoids false signal, and should get rid of any non-specific ligation signal as well. Thus, any rare fragment of genomic DNA that was single-stranded after purification, or did not get cleaved will not form a productive substrate and will be destroyed by the exonuclease treatment step.
Detailed Protocol for Highly Sensitive Detection of Promoter Unmethylation:
3.5.a. Ligate on linkers that contain 5-methyl C to retain sequence after bisulfite conversion. Target ends are repaired with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is generated with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. Incubate cfDNA with bisulfite, which converts regular C, but not methyl-C, to U. Bisulfite treatment converts the top and bottom strand differently, such that after treatment, the strands will no longer be complementary to each other, and melt apart.
3.5.b. Add primers, Taq Polymerase, and dNTPs and perform a few cycles of PCR to generate products that are now unmethylated at the remaining HphI sites.
3.5.c. Add HphI to cleave products containing such sites. Only PCR amplicons containing adjacent HphI sites (GGCGA or GGTGA) that were unmethylated in the original target will generate fragments that are unblocked (i.e. ligation competent for linkers) on both ends, when cleaved with HphI.
3.5.d. Denature target DNA (94° C. 1 minute) in the presence of oligonucleotides (comprising a 5′ probe region complementary to the sequences to the 5′ side of the targets, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding site, and a 3′ probe region complementary to the sequences to the 3′ side of the targets), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50°-60° C. for 2 hours). KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are added to allow for extension and ligation at 50° C., and optionally the temperature is raised (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
3.5.e. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the methylated DNA, the unique identifier sequence, and an optional patient identifier sequence. This circular product is suitable for optional additional steps and subsequent sequencing.
Note 1. Oligonucleotide may lack a 5′ phosphate, or contain an optional blocking group on the 5′ side, such that the 5′ end of the oligonucleotide is not suitable for ligation.
Note 2. The above example uses KlenTaq, a polymerase lacking strand displacing activity as well as both 3′→5′ and 5′→3′ nuclease activity. If the oligonucleotide has a blocking group on the 5′ side, then one can use a polymerase with 5′→3′ nuclease activity.
Note 3. The circular product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 4. If the oligonucleotide probe also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
In variation 3.5, a polymerase and a ligase are used to form a covalently closed circle containing a unique identifier sequence, an optional patient identifier sequence, and an optional primer binding site. This may also be accomplished using just a ligase.
Variation 3.6: (see e.g.,
By insisting on having the restriction endonuclease generate both the 3′OH and the 5′ phosphate, this avoids false signal, and should get rid of any non-specific ligation signal as well. Thus, any rare fragment of genomic DNA that was single-stranded after purification, or did not get cleaved will not form a productive substrate and will be destroyed by the exonuclease treatment step.
Detailed Protocol for Highly Sensitive Detection of Promoter Unmethylation:
3.6.a. Ligate on linkers that contain 5-methyl C to retain sequence after bisulfite conversion. Target ends are repaired with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is generated with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. Incubate cfDNA with bisulfite, which converts regular C, but not methyl-C, to U. Bisulfite treatment converts the top and bottom strand differently, such that after treatment, the strands will no longer be complementary to each other, and melt apart.
3.6.b. Add primers, Taq Polymerase, and dNTPs and perform a few cycles of PCR to generate products that are now unmethylated at the remaining HphI sites.
3.6.c. Add HphI to cleave products containing such sites. Only PCR amplicons containing adjacent HphI sites (GGCGA or GGTGA) that were unmethylated in the original target will generate fragments that are unblocked (i.e. ligation competent for linkers) on both ends, when cleaved with HphI.
3.6.d. Denature target DNA (94° C. 1 minute) in the presence of partially double-stranded oligonucleotide pairs (comprising a first oligonucleotide probe with a 5′ probe region complementary to the sequence of the 5′ side of the target, an optional primer binding sequence, an optional patient identifier sequence, and a 3′ probe region complementary to the sequence of the 3′ side of the target, and a second oligonucleotide probe comprising a 5′ region complementary to the first strand, a unique identifier sequence and a 3′ region complementary to the first strand), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50°-60° C. for 2 hours). A thermostable ligase (preferably from strain AK16D) is added to allow for ligation at 60° C. to generate circular products.
3.6.e. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the methylated DNA, the unique identifier sequence, and an optional patient identifier sequence. This circular product is suitable for optional additional steps and subsequent sequencing.
Note 1. The circular product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 2. If the oligonucleotide probe also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Database of Methylation Status at Adjacent HphI Restriction Sites
The current TCGA database contains information on the methylation status of about 450,000 CpG sites on the human genome, both for normal and for tumor of many different tissue sites. However, it does not cover all the unmethylation status of adjacent HphI sites, nor would it distinguish that both sites are methylated on the same piece of genomic DNA.
Consequently, for the above assay method to be most useful, it would be helpful to create a database of methylation status at adjacent HphI restriction sites of the same orientation (converted GGCGA to GGTGA). This approach will also include adjacent HphI restriction sites of the other orientation (TCACC and TCGCC), since they will be GGTGA on the opposite strand after bisulfite conversion. One such approach is illustrated in
Overview of approach: The idea is to generate a library of small fragments that could only have been formed if both ends of the fragment contained converted GGCGA (to GGTGA) that were unmethylated in the original genomic DNA. The idea takes advantage of a unique property of the recognition sequence for HphI. The enzyme cleaves the 4.5 base recognition sequence GGTGA in one orientation, and TCACC in the other orientation. If an unmethylated GGCGA site is treated with bisulfite, in the first orientation, the site will become an HphI recognition sequence GGTGA. After a few rounds of PCR amplification, the 5-methyl C is converted to C, while the U is converted to T. When cleaved with HphI, only adjacent sites unmethylated in the original target will create fragments that are ligation competent on both the 3′ and 5′ ends. The fragments have linkers appended with optional unique identifier and optional patient identifier sequences that are now amenable for ligation to create fragment multimers that are then substrates for additional steps and subsequent sequencing.
Variation 3.6.1: (see e.g.,
3.6.1.a. Ligate on linkers that are blocked on the 5′ end of the non-ligating end. Target ends are repaired with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is generated with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase appends linkers on both ends of the fragment. Incubate cfDNA with bisulfite, which converts regular C, but not methyl-C, to U. Bisulfite treatment converts the top and bottom strand differently, such that after treatment, the strands will no longer be complementary to each other, and melt apart.
3.6.1.b. Add 5′ blocked primers, Taq Polymerase, and dNTPs and perform a few cycles of PCR to generate products that are now unmethylated at the remaining HphI sites.
3.6.1.c. Add HphI to cleave products containing such sites. Only PCR amplicons containing adjacent HphI sites (GGCGA or GGTGA) that were unmethylated in the original target will generate fragments that are unblocked (i.e. ligation competent for linkers) on both ends, when cleaved with HphI. Target ends are repaired with T4 polymerase and T4 Kinase. Remove dNTPs (via a spin column), add back only dATP to generate a single base 3′ A overhang.
3.6.1.d. Using T4 ligase, append linkers (containing optional unique identifier sequence, and optional patient identifier sequence) with a single base 3′ T overhang on the ligating end to the single base A overhang of the cleaved and filled in target sequences. The 5′ non-ligating side of the linker contains a 5′ overhang, and optionally is not phosphorylated. The 3′ non-ligating side of the linker contains a 3′ blocking group and/or thiophosphates to inhibit digestion with 3′ exonuclease.
3.6.1.e. Add a 3′ exonuclease (i.e. Exonuclease III) and digest at 37° C. Fragments containing the original short linker on one or both sides will be digested and rendered single-stranded. Only fragments with the new linker ligated to both sides will remain double-stranded. Exonuclease will also render the linkers single-stranded. Optionally, remove digestion products from desired fragments with a spin column.
3.6.1.f. The free ends of the remaining linker-containing fragments are rendered competent for ligation, either by (i) phosphorylating the 5′ end using T4 kinase, (ii) removing the blocked 3′ group, or (iii) using the 5′→3′ nuclease activity of Taq polymerase to cleave off the matching 5′-overlapping base or flap, leaving a ligation-competent 5′-phosphate, or any combination thereof. Ligation conditions are designed to favor multimerization. The ligation products comprise multimers of target fragments with adjacent HphI sequences originally unmethylated in target DNA with optional unique identifier and/or patient identifier sequence. This product is suitable for optional additional steps and subsequent sequencing.
Note 1: The 3′ end of the blocked linker may be synthesized to contain a Uracil or an apurinic (AP) site at a position internal to the block, and after treatment with UDG and AP endonuclease, liberate a ligation competent 3′ end.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: Ligation will favor formation of multimers by using crowding agents (i.e. 20% PEG), and/or by mixing two sets of ligation products, wherein the non-palindromic 5′ linker overhang of the first set is complementary to the non-palindromic 5′ linker overhang of the second set.
Prophetic Example 4—Accurate Quantification of Tumor-Specific Copy Changes in DNA Isolated from Circulating Tumor Cells
Copy changes in tumor DNA can be a strong predictor of outcome. Over the last several years, most copy number work has been performed on SNP chips, where bioinformatic approaches average the signal across a region to determine relative copy number. For low numbers of cells, digital PCR approaches are used to obtain an accurate count of starting molecules.
Overview of approach: Generally, copy changes occur over large regions of DNA, such as chromosomal arms. Since there are very low numbers of tumor cells, one could improve accuracy by interrogating multiple regions of a given chromosomal arm simultaneously, and adding or averaging the resultant signal. Likewise, specific genes are amplified in some tumors (i.e. Her2-neu, IGF2), which may predict outcome or guide therapy.
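The averaging idea can be illustrated numerically. The following sketch assumes that each interrogated region yields a count of distinct unique identifier sequences, that capture efficiency is similar across regions, and that a control arm with a known copy number (two by default) is available for normalization. These assumptions, the function name, and the example counts are hypothetical and are not taken from the protocol.

```python
def estimated_copies(test_counts, control_counts, control_copies=2):
    """Estimate copy number of a chromosomal arm from per-region molecule counts.

    test_counts:    molecule counts for regions on the arm of interest
    control_counts: molecule counts for regions on an arm assumed unchanged
    """
    test_mean = sum(test_counts) / len(test_counts)
    control_mean = sum(control_counts) / len(control_counts)
    return control_copies * test_mean / control_mean


# Three regions per arm; averaging smooths region-to-region counting noise.
print(estimated_copies([30, 28, 32], [20, 19, 21]))
```

With few starting molecules, any single region gives a noisy estimate, which is why the signal from several regions of the same arm is combined.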
In addition to copy changes, loss of heterozygosity may be followed not only by counting chromosomes, but also by looking for traditional loss of heterozygosity among polymorphic SNPs or markers within the region of interest. The most heterogeneous markers are in repetitive sequences. The concepts are illustrated herein using tetranucleotide repeat sequences, although tri- and di-nucleotide repeat sequences may also be considered.
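Scoring loss of heterozygosity at a tetranucleotide repeat marker can be sketched as below. This is a simplified illustration under stated assumptions: each input sequence is taken to represent one original molecule from a single locus, an allele is defined only by the length of its longest uninterrupted repeat run, and the names `longest_repeat_run` and `allele_balance` are hypothetical.

```python
import re
from collections import Counter


def longest_repeat_run(sequence, unit="AGAT"):
    """Return the number of repeat units in the longest tandem run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


def allele_balance(sequences, unit="AGAT"):
    """Return minor/major molecule ratio for the two most common alleles.

    Returns None when fewer than two alleles are observed; this sketch
    cannot distinguish a homozygous marker from complete allele loss.
    """
    alleles = Counter(longest_repeat_run(seq, unit) for seq in sequences)
    alleles.pop(0, None)  # discard sequences with no repeat at all
    top_two = alleles.most_common(2)
    if len(top_two) < 2:
        return None
    (_, major), (_, minor) = top_two
    return minor / major


if __name__ == "__main__":
    balanced = ["CC" + "AGAT" * 8 + "TT"] * 5 + ["CC" + "AGAT" * 10 + "TT"] * 5
    skewed = ["CC" + "AGAT" * 8 + "TT"] * 9 + ["CC" + "AGAT" * 10 + "TT"] * 1
    print(allele_balance(balanced), allele_balance(skewed))
```

A ratio near 1 is consistent with retention of both alleles, while a ratio near 0 suggests that one allele is under-represented in the captured cells.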
Detailed protocol for quantification of tumor-specific copy changes in DNA isolated from circulating tumor cells will be addressed in the next section that scores for mutations from circulating tumor cells, since the overall approach is very similar.
Prophetic Example 5—Detection of Mutations in DNA Isolated from Circulating Tumor Cells or cfDNA
Circulating tumor cells provide the advantage of concentrating the mutation-containing DNA, so there is no longer a need to find low-level mutations in an excess of wild-type sequence. However, since there are a low number of starting DNA molecules, it is important to amplify all regions accurately, and verify mutations are truly present.
Overview of approach: The approach here is similar to that for finding known common point mutations, or for sequencing multiple exons, as outlined above. However, when dealing with low amounts of input DNA, there is the potential for polymerase error. This problem is addressed by using circle sequencing or SMRT sequencing.
Since the DNA is being obtained from a few captured tumor cells, if a mutation is present, it should be present in some if not most of the captured cells.
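The way circle sequencing suppresses polymerase error can be illustrated with a per-position majority vote over the tandem copies of one original molecule. This is a sketch under simplifying assumptions: the copies are assumed to be already split and aligned to equal length, and the function name `consensus_from_copies` and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter


def consensus_from_copies(copies, min_fraction=0.6):
    """Build a consensus from aligned tandem copies of one original molecule.

    A base is called only if it is seen in at least `min_fraction` of the
    copies at that position; otherwise "N" is reported.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("copies must be aligned to the same length")
    consensus = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(copies) >= min_fraction else "N")
    return "".join(consensus)


if __name__ == "__main__":
    # The third copy carries an isolated error at position 3 (G read as T).
    print(consensus_from_copies(["ACGT", "ACGT", "ACTT"]))
```

An error introduced in a single copy is outvoted, whereas a true mutation in the original molecule appears in every copy and is retained.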
Detailed protocol for detection of mutations in DNA isolated from circulating tumor cells is described below, including quantifying copy number, since the approach is very similar.
Variation 5.1: (see e.g.,
In this variation, the aim is to capture all target and non-target DNA sequence, without using probes to bind to the target. Linkers are ligated to the DNA target (cfDNA of average length of about 160 bases). To capture all sequences, oligonucleotide probes comprising a 5′ sequence complementary to the 5′ end of the linker, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and a sequence complementary to the 3′ end of the linker, are hybridized to the target. The oligonucleotide may contain an optional blocking group on one end (5′ end shown). Addition of a polymerase allows extension of the 3′ end of the oligonucleotide on the target, as well as extension of the 3′ end linker through the unique identifier sequence, the optional primer binding sequence, and the optional patient identifier sequence until it is adjacent to the 5′ linker on the target.
The 5′ end of the linker may be designed to be slightly longer, such that once polymerase extends the 3′ linker, it doesn't extend right through the 5′ linker complementary sequence until it hits the 5′ target portion that is hybridized to the 5′ probe region.
This variation requires that the oligonucleotide probe contain sequences complementary to both the 3′ and 5′ linker strands, with those sequences being separated by only about 20-40 bases. Since the sequences are complementary to the linker strands, which in turn are complementary to each other, it is important to keep the two sequences within the oligonucleotide probe from simply forming an internal hairpin with a 20-40 base loop. One solution is to ligate linkers that contain an internal bubble, such that the two linkers retain double-stranded character at the low temperature used for linker ligation (16° C. or even 4° C. with T4 ligase). In addition, the 5′ linker may be designed to be longer than the 3′ linker. Finally, the regions of complementarity within the linker may be designed to have subtle mismatches (e.g. G:T and T:G) which are more destabilizing in the complementary oligonucleotide probe (e.g. C:A and A:C), such that the oligonucleotide probe is less likely to form the internal hairpin at the overall hybridization temperature (e.g. 40-50° C.).
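The hairpin risk described above can be screened for computationally by measuring the longest perfectly paired stem that the two linker-complementary arms of a probe could form with each other. The sketch below is a rough check only: it counts perfect Watson-Crick pairs, ignores wobble pairs and thermodynamics, and uses hypothetical names (`longest_stem`) and made-up sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def longest_stem(arm_5prime, arm_3prime):
    """Length of the longest perfect stem the two probe arms could form.

    Computed as the longest common substring between the 5' arm and the
    reverse complement of the 3' arm (dynamic programming, row by row).
    """
    target = reverse_complement(arm_3prime)
    best = 0
    previous = [0] * (len(target) + 1)
    for base_a in arm_5prime:
        current = [0] * (len(target) + 1)
        for j, base_b in enumerate(target, start=1):
            if base_a == base_b:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    # Fully complementary arms versus arms with one designed mismatch.
    print(longest_stem("GATTACA", "TGTAATC"), longest_stem("GATTACA", "TGTGATC"))
```

In this toy case a single designed mismatch shortens the longest uninterrupted stem from 7 to 3 base pairs, which mirrors the rationale for placing mismatches within the regions of complementarity.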
Extension of the 3′ end of the oligonucleotide on the target enhances association of the probe to the target, and thus increases the ability of the 3′ end of the linker to hybridize correctly to its complement and be extended by polymerase. The 5′→3′ nuclease cleavage activity of polymerase (or Fen nuclease) cleaves a matching 5′-overlapping base of the 5′ linker, leaving a ligation-competent 5′-phosphate on the linker. Polymerase also extends the oligonucleotide on the target, and does not cleave the blocking group on the 5′ end of the oligonucleotide.
Ligase covalently seals the extended 3′ ends to the ligation-competent 5′-phosphate to create circular ligation products. The blocking group prevents circularization of the oligonucleotide probe. Optional addition of Uracil-DNA glycosylase (UDG) and Formamidopyrimidine-DNA glycosylase (Fpg, also known as 8-oxoguanine DNA glycosylase, which acts both as an N-glycosylase and an AP-lyase) may be used to nick targets containing damaged bases. Exonuclease(s) are then added to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA with the unique identifier sequence. This product is suitable for sequence-specific capture of desired targets using biotinylated probes, combined with rolling circle amplification and circle sequencing, or direct SMRT sequencing.
The challenge here is to avoid having polymerase extend the 3′ linker in such a way that it destroys the 5′ linker without a ligation step (i.e. nick-translation). This may be accomplished by incorporating thiophosphate linkages in the 2nd and 3rd position from the 5′ phosphate end, (which will be liberated by the 5′→3′ nuclease activity of the polymerase). To minimize polymerase displacement of those bases as it extends one base too many (which would make it impossible to ligate to the downstream primer), the target bases at the ligation junction would preferentially be AT rich on the 3′ side, and GC rich on the 5′ side.
An alternative approach is to use a 5′ linker containing an apurinic (AP) site at the position adjacent to the desired 5′ phosphate. This 5′ phosphate is liberated using a thermostable EndoIII (such as Tma EndoIII). This enzyme cleaves AP sites leaving a 5′ phosphate when the primer is bound to the target. The endonuclease also cleaves single-stranded primer, but with lower efficiency, and thus primer hybridized to template would be the preferred substrate. When using thermostable EndoIII, the PCR polymerase used would lack the 5′→3′ exonuclease activity.
As mentioned above, the nick-translation problem may also be minimized by using a mixture of polymerases, both with and without 5′→3′ nuclease cleavage activity (e.g. in a ratio of 1:20) under conditions of distributive extension in the presence of ligase such that most extension is by polymerase without nuclease activity until polymerase with nuclease activity is required to create the ligation competent junction, followed by polymerase dissociation, and a ligation event to generate the desired circular ligation product.
Detailed Protocol for Accurate Quantification of Tumor-Specific Copy Changes or Detection of Mutations in Known Genes (e.g. Braf, K-Ras, p53) in DNA Isolated from Circulating Tumor Cells or cfDNA:
5.1.a. Starting with cfDNA or genomic DNA isolated from CTC, (sheared to about 150 bp), repair ends with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is added with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase at 4° C. appends linkers on both ends of the fragment. Optionally, purify target DNA from unligated linker.
5.1.b. Denature target DNA containing linkers on both ends (94° C. 1 minute) in the presence of oligonucleotide probes (comprising a 5′ sequence complementary to the 5′ end of the linker, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and a sequence complementary to the 3′ end of the linker), and allow the oligonucleotide probes to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 40-50° C. for 30 min). Oligonucleotide probes may contain an optional blocking group at the 5′ end. Taq polymerase and/or KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are added either subsequent to the annealing step or at the start of the procedure. Allow for extension and ligation at the hybridization temperature, and optionally raise the temperature (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
5.1.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for sequence-specific capture of desired targets using biotinylated probes, or for rolling circle amplification (using primer complementary to the primer binding sequence with phi-29 polymerase) to create tandem repeats of the desired sequence, followed by enhanced capture of desired targets using biotinylated probes. This may be followed by identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the linker sequence or optional primer binding sequence as a primer binding site.
Note 1: Oligonucleotide may contain an optional blocking group on the 5′ side to interfere with subsequent 5′-3′ nuclease activity of polymerase, such that the oligonucleotide probe does not circularize.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: The 5′ end linker may be synthesized to contain thiophosphate linkages in the 2nd and 3rd position from the 5′ phosphate end (which will be liberated by the 5′→3′ nuclease activity of the polymerase). To minimize polymerase displacement of those bases as it extends one base too many (which would make it impossible to ligate to the downstream primer), the target bases at the ligation junction would preferentially be AT rich on the 3′ side, and GC rich on the 5′ side.
Note 4: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain an apurinic (AP) site at the position adjacent to the desired 5′ phosphate. This 5′ phosphate is liberated using a thermostable EndoIII (such as Tma EndoIII). This enzyme cleaves AP sites leaving a 5′ phosphate when the primer is bound to the target. The endonuclease also cleaves single-stranded primer, but with lower efficiency, and thus primer hybridized to template would be the preferred substrate.
Note 5: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain a 5′ phosphate. Alternatively, the 5′ phosphate may be added using T4 kinase either prior to ligating to the target DNA, or after that ligation step.
Note 6: A 1:20 mixture of Taq polymerase (with 5′→3′ nuclease activity) and KlenTaq (Taq polymerase without 5′→3′ nuclease cleavage activity) may be used under conditions of distributive extension (i.e. higher salt concentration) to minimize degradation of target DNA by nick translation.
Note 7: Capture of tandem repeat copies of the target (generated through rolling circle amplification) provides the unique advantage of multivalent binding to enhance capture of the correct target. This allows for more stringent hybridization/capture conditions resulting in higher efficiencies of capture and less capture of unwanted sequences. This approach may also be used to capture all targets containing repetitive sequences (e.g. AGAT tetra-nucleotide repeat), allowing scoring for loss of heterozygosity, copy number variation, haplotyping, establishing paternity, and other applications.
Note 8: The unique sequence identifier assures that even after rolling circle or other selection/amplification steps, each original target sequence may be uniquely scored, allowing for accurate quantification of the original copy number of each target.
Note 9: The unique identifier sequence may be added on both sides of the linker. If an extra base is not added to the target DNA (i.e. skipping the Klenow step), then a blunt end ligation is used. To avoid linker-to-linker ligation, the blunt end of the linker is un-phosphorylated. An example of linkers containing unique identifiers and blunt ends is provided as oligonucleotides iSx-201-MdAdT (Top strand), and iSx-202-MdLgAdB-bk (Bottom strand) (see Table 1). The top strand is longer, creating a 5′ single-stranded overhang on the non-ligating side. The bottom strand has a 5′-OH, and optionally contains a 3′ blocking group to avoid extension, as well as internal 5-nitroindole groups to act as universal bases when pairing with the unique identifier sequences of the top strand. After ligation of the blunt end linker, appending unique sequences to the 5′ ends of the double-stranded DNA target, those sequences are copied by extending the 3′ end with polymerase. Optionally, the top strand contains dU bases for subsequent cleavage with UDG and FPG to liberate a 5′ phosphate, suitable for ligation and circularization in subsequent steps. Linker iSx-201-MdAdT (see Table 1) may contain optional thiophosphate groups adjacent to the newly liberated 5′ phosphate to prevent nick-translation and facilitate circularization of targets with linker ends.
Note 10: When designing oligonucleotides for use with longer linker sequences, comprising “barcoding” or “indexing” sequences for use with commercial instruments, it may be necessary to assemble the oligonucleotide using PCR, strand-displacement amplification, or a combination thereof. During PCR, use of dUTP instead of TTP incorporates uracil, suitable for subsequent cleavage by UDG. The reverse-strand primer may be phosphorylated, allowing for its digestion using lambda exonuclease or a similar 5′→3′ exonuclease. A dA30 sequence may be appended to the 5′ end of the forward primer, enabling strand displacement amplification. Examples of oligonucleotides for assembly of a reverse complementary sequence, which are suitable for use with the above linkers, are: iSx-204-bkA30-Lk-F2, iSx-205-r503-F3, iSx-206-d701-R4, and iSx-207-Lk-R5 (see Table 1).
Note 11: In another variation in which the unique identifier sequence is added on both sides of the linker, one strand also comprises a “barcode” or “index” sequence for use with commercial instruments (see e.g.,
Note 12: With the aforementioned products generated using the above primer and linker designs, after cluster or bead amplification, or capture within a well, address, or surface of a flow cell on a commercial instrument, the following primers may be used to initiate sequencing reactions: (i) iLx-003-PEsqP1, Paired End sequencing primer 1; (ii) iLx-004-BrCdR1, Indexing primer, Barcode Read 1; (iii) iLx-001-P5-BrCdR2, Barcode Read 2; and (iv) iLx-005-PEsqP2, Paired End sequencing primer 2 (Primer sequences provided in Table 1 below).
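The role of the unique identifier described in Note 8 can be illustrated by collapsing sequencing reads that share the same target and the same identifier, so that each original molecule is counted once regardless of how many copies amplification produced. This is a minimal sketch: the function name `count_original_molecules`, the read representation as `(target, uid)` pairs, and the example identifiers are hypothetical, and sequencing errors within the identifier are not handled.

```python
from collections import defaultdict


def count_original_molecules(reads):
    """Count distinct unique identifiers observed per target.

    reads: iterable of (target_name, unique_identifier) pairs, one per
    sequencing read. Reads sharing both values are treated as copies of
    the same original molecule.
    """
    identifiers_by_target = defaultdict(set)
    for target, uid in reads:
        identifiers_by_target[target].add(uid)
    return {target: len(uids) for target, uids in identifiers_by_target.items()}


if __name__ == "__main__":
    reads = [
        ("KRAS", "ACGTAC"),
        ("KRAS", "ACGTAC"),  # amplification duplicate of the read above
        ("KRAS", "TTGCAA"),
        ("BRAF", "GGATCC"),
    ]
    print(count_original_molecules(reads))
```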
The above procedure uses linkers with a 5′OH. Variation 5.2 uses linkers with a 5′ phosphate to avoid some potential problems associated with using polymerase with a 5′-3′ nuclease activity (see e.g.,
Detailed Protocol for Accurate Quantification of Tumor-Specific Copy Changes or Detection of Mutations in Known Genes (e.g. Braf, K-Ras, p53) in DNA Isolated from Circulating Tumor Cells or cfDNA:
5.2.a. Starting with cfDNA or genomic DNA isolated from CTC, (sheared to about 150 bp), repair ends with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is added with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase at 4° C. appends linkers on both ends of the fragment. The 5′ linker contains a phosphate group (optionally added using T4 kinase).
5.2.b. Denature target DNA containing linkers on both ends (94° C. 1 minute) in the presence of oligonucleotide probes (comprising a 5′ sequence complementary to the 5′ end of the linker, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and a sequence complementary to the 3′ end of the linker), and allow the oligonucleotide probes to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 40-50° C. for 30 min). Oligonucleotide probes may contain an optional blocking group at the 5′ end. KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are added either subsequent to the annealing step or at the start of the procedure. Allow for extension and ligation at the hybridization temperature, and optionally raise the temperature (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
5.2.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for sequence-specific capture of desired targets using biotinylated probes, or for rolling circle amplification (using primer complementary to the primer binding sequence with phi-29 polymerase) to create tandem repeats of the desired sequence, followed by enhanced capture of desired targets using biotinylated probes. This may be followed by identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the linker sequence or optional primer binding sequence as a primer binding site.
Note 1: Oligonucleotide may contain an optional blocking group on the 5′ side, or a 5′ OH, such that the oligonucleotide probe does not circularize.
Note 2: Linkers may be synthesized with 5′ phosphate, or the phosphate may be appended using T4 kinase.
Note 3: Capture of tandem repeat copies of the target (generated through rolling circle amplification) provides the unique advantage of multivalent binding to enhance capture of the correct target. This allows for more stringent hybridization/capture conditions resulting in higher efficiencies of capture and less capture of unwanted sequences. This approach may also be used to capture all targets containing repetitive sequences (e.g. AGAT tetra-nucleotide repeat), allowing scoring for loss of heterozygosity, copy number variation, haplotyping, establishing paternity, and other applications.
Note 4: The unique sequence identifier assures that even after rolling circle or other selection/amplification steps, each original target sequence may be uniquely scored, allowing for accurate quantification of the original copy number of each target.
Note 5: Examples of linkers and complementary oligonucleotide to facilitate circularization are: iSx-220-d503-AdT1, iSx-221-pd707-AdB2, and iSx-222-Lk-uRC1, respectively (see Table 1). Linker iSx-220-d503-AdT1 (see Table 1) may contain optional thiophosphate groups at the 2nd and 3rd position from the 5′ end to prevent nick-translation and facilitate circularization of targets with linker ends.
The approaches described in
Variation 5.3 describes an alternative approach, wherein the capture step occurs in liquid, by using probes designed to hybridize near or adjacent to each other on the target, and then create a “landing pad” for the linkers on the ends of the target to hybridize, extend, and ligate to create the covalently closed circular target(s). This approach may be particularly useful when selecting for targets containing repetitive DNA.
Detailed Protocol for Accurate Quantification of Tumor-Specific Copy Changes or Detection of Mutations in Known Genes (e.g. Braf, K-Ras, p53) in DNA Isolated from Circulating Tumor Cells or cfDNA:
5.3.a. Starting with cfDNA or genomic DNA isolated from CTC, (sheared to about 150 bp), repair ends with T4 polymerase and T4 Kinase, and subsequently a single base 3′ A overhang is added with Klenow (exo-) and dATP. Linkers have a single base 3′ T overhang, such that ligation using T4 ligase at 4° C. appends linkers on both ends of the fragment. Linkers may be synthesized with 5′ phosphate, or the phosphate may be appended using T4 kinase. Optionally, purify target DNA from unligated linker.
5.3.b. Denature target DNA containing linkers on both ends (94° C. 1 minute) in the presence of oligonucleotide probes (comprising a 5′ sequence complementary to a unique or repetitive portion (i.e. AGAT repeat) of the target, an optional spacer region, a sequence complementary to the 5′ end of the linker, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, an optional spacer region, a sequence complementary to the 3′ end of the linker, and a 3′ sequence complementary to a unique or repetitive portion (i.e. AGAT repeat) of the target, and adjacent to the 5′ sequence complementary to the target), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50° C. for 2 hours). Oligonucleotide may contain an optional blocking group on the 5′ side. Taq polymerase and/or KlenTaq (Taq polymerase lacking nuclease activity), and thermostable ligase (preferably from strain AK16D), dNTPs are either added subsequent to the annealing step, or at the start of the procedure. In the case where the linker sequence has a 5′ phosphate, KlenTaq extends the 3′ end until it is directly adjacent to the ligation-competent 5′ end (left side of
5.3.c. Optionally, cleave the oligonucleotide probe at a cleavable link (e.g. U cleaved using UDG and AP endonuclease). Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for sequence-specific capture of desired targets using biotinylated probes, or for rolling circle amplification (using primer complementary to the primer binding sequence with phi-29 polymerase) to create tandem repeats of the desired sequence, followed by enhanced capture of desired targets using biotinylated probes. This may be followed by identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the linker sequence or optional primer binding sequence as a primer binding site.
Note 1: Oligonucleotide may contain an optional blocking group on the 5′ side to interfere with subsequent 5′-3′ nuclease activity of polymerase, such that the oligonucleotide probe does not circularize. Alternatively, a cleavable link may be included in the original oligonucleotide.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: The 5′ end linker may be synthesized to contain thiophosphate linkages in the 2nd and 3rd position from the 5′ phosphate end (which will be liberated by the 5′→3′ nuclease activity of the polymerase). To minimize polymerase displacement of those bases as it extends one base too many (which would make it impossible to ligate to the downstream primer), the target bases at the ligation junction would preferentially be AT rich on the 3′ side, and GC rich on the 5′ side.
Note 4: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain an apurinic (AP) site at the position adjacent to the desired 5′ phosphate. This 5′ phosphate is liberated using a thermostable EndoIII (such as Tma EndoIII). This enzyme cleaves AP sites leaving a 5′ phosphate when the primer is bound to the target. The endonuclease also cleaves single-stranded primer, but with lower efficiency, and thus primer hybridized to template would be the preferred substrate.
Note 5: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain a 5′ phosphate. Alternatively, the 5′ phosphate may be added using T4 kinase either prior to ligating to the target DNA, or after that ligation step.
Note 6: A 1:20 mixture of Taq polymerase (with 5′→3′ nuclease activity) and KlenTaq (Taq polymerase without 5′→3′ nuclease cleavage activity) may be used under conditions of distributive extension (i.e. higher salt concentration) to minimize degradation of target DNA by nick translation.
Note 7: Capture of tandem repeat copies of the target (generated through rolling circle amplification) provides the unique advantage of multivalent binding to enhance capture of the correct target. This allows for more stringent hybridization/capture conditions resulting in higher efficiencies of capture and less capture of unwanted sequences. This approach may also be used to capture all targets containing repetitive sequences (e.g. AGAT tetra-nucleotide repeat), allowing scoring for loss of heterozygosity, copy number variation, haplotyping, establishing paternity, and other applications.
Note 8: The unique sequence identifier assures that even after rolling circle or other selection/amplification steps, each original target sequence may be uniquely scored, allowing for accurate quantification of the original copy number of each target.
Note 9: The unique identifier sequence may be added on both sides of the linkers comprising “index” or “barcode” sequences, using a single base T overhang ligation (see e.g.,
Note 10: The unique identifier sequence may be added on both sides of the linkers comprising “index” or “barcode” sequences, using blunt end ligation. Repair target ends with T4 polymerase. Ligate on 3-piece “gap” linkers with 3′ phosphate and blunt overhang using T4 ligase at 30° C. Use Klenow fragment lacking 3′→5′ nuclease activity to fill in the gap at 37° C., then heat to 50° C. to denature off the small linker. Use Taq DNA polymerase (lacking 3′→5′ nuclease activity, but with 5′-3′ nuclease activity) to fill in the gap, and ligate to the target with thermostable ligase. Examples of such 3-piece linker oligonucleotides are: iSx-226-d505-AdT6, iSx-227-SmAdT7p, and iSx-225-pd708-N6AdB5 (same as above; see Table 1). Linker iSx-226-d505-AdT6 (see Table 1) may contain optional thiophosphate groups at the 2nd and 3rd position from the 5′ end to prevent nick-translation and facilitate circularization of targets with linker ends.
Note 11: The unique identifier sequence may be added on both sides of the linkers comprising “index” or “barcode” sequences, using reverse transcription. Repair target ends with T4 polymerase, phosphorylate 5′ end with T4 kinase (see e.g.,
Note 12: One variation is to use oligonucleotides comprising multiple cleavable linkages, such that after extension/ligation and treating with the cleaving agent, a unique primer is generated on the target strand, suitable for rolling circle amplification to generate tandem-repeat products. Examples of such oligonucleotides suitable for generating the circular products comprising target regions of KRAS, BRAF, and TP53 exons 5-8 containing hotspot mutations are shown in Table 1 below and include: (i) KRAS forward and reverse target extension/ligation oligonucleotides (iSx-232-KRS-T32, iSx-233-KRS-B33); (ii) BRAF forward and reverse target extension/ligation oligonucleotides (iSx-234-BRF-T34, iSx-235-BRF-B35); (iii) TP53 Exon 5 forward and reverse target extension/ligation oligonucleotides (iSx-241-TP53e5-T41, iSx-242-TP53e5-B42, iSx-243-TP53e5-T43, iSx-244-TP53e5-B44); (iv) TP53 Exon 6 forward and reverse target extension/ligation oligonucleotides (iSx-245-TP53e6-T45, iSx-246-TP53e6-B46); (v) TP53 Exon 7 forward and reverse target extension/ligation oligonucleotides (iSx-247-TP53e7-T47, iSx-248-TP53e7-B48); and (vi) TP53 Exon 8 forward and reverse target extension/ligation oligonucleotides (iSx-249-TP53e8-T49, iSx-250-TP53e8-B50). The above-mentioned extension/ligation oligonucleotides may contain optional thiophosphate groups at the 2nd and 3rd position from the 5′ end to prevent nick-translation and facilitate circularization on complementary targets.
Note 13: Alternatively, after generating the circular products comprising target regions of KRAS, BRAF, and TP53 exons 5-8 containing hotspot mutations, these regions may be subject to rolling circle amplification using newly added target-specific primers to generate tandem-repeat products. These products may be generated either prior to, or after capture of desired targets with target-specific oligonucleotides on a solid support (See note 8 below). Primers may contain an internal cleavable nucleotide base or abasic site such as 1′,2′-Dideoxyribose (dSpacer), enabling incorporation of dUTP during rolling circle amplification for protection against carryover contamination. Examples of such primers are shown in Table 1 below and include the following: (i) KRAS forward and reverse primers (iSx-108-KRS-rcF26, iSx-109-KRS-rcR27); (ii) BRAF forward and reverse primers (iSx-118-BRF-rcF26, iSx-119-BRF-rcR27); (iii) TP53 Exon 5 forward and reverse primers (iSx-128-TP53e5-rcF66, iSx-129-TP53e5-rcR67; iSx-130-TP53e5-rcF68, iSx-131-TP53e5-rcR69); (iv) TP53 Exon 6 forward and reverse primers (iSx-138-TP53e6-rcF76, iSx-139-TP53e6-rcR77); (v) TP53 Exon 7 forward and reverse primers (iSx-148-TP53e7-rcF86, iSx-149-TP53e7-rcR87); and (vi) TP53 Exon 8 forward and reverse primers (iSx-158-TP53e8-rcF96, iSx-159-TP53e8-rcR97).
Note 14: After generating the circular products comprising target regions of KRAS, BRAF, and TP53 exons 5-8 containing hotspot mutations, and/or generating tandem-repeat products, these products may be captured by hybridizing to longer oligonucleotides, which contain a capture group suitable for subsequent capture on a solid support. Examples of such capture oligonucleotides, containing biotin groups suitable for capture via streptavidin-coated solid surfaces are shown in Table 1 below and include the following: (i) KRAS forward and reverse capture oligonucleotides (iSx-013-KRS-bcF1, iSx-014-KRS-bcR2); (ii) BRAF forward and reverse capture oligonucleotides (iSx-020-BRF-bcF1, iSx-021-BRF-bcR2); (iii) TP53 Exon 5 forward and reverse capture oligonucleotides (iSx-030-TP53e5-bcF1, iSx-031-TP53e5-bcR2; iSx-032-TP53e5-bcF3, iSx-033-TP53e5-bcR4); (iv) TP53 Exon 6 forward and reverse capture oligonucleotides (iSx-050-TP53e6-bcF5, iSx-051-TP53e6-bcR6); (v) TP53 Exon 7 forward and reverse capture oligonucleotides (iSx-060-TP53e7-bcF7, iSx-061-TP53e7-bcR8); and (vi) TP53 Exon 8 forward and reverse capture oligonucleotides (iSx-070-TP53e8-bcF9, iSx-071-TP53e8-bcR10).
Note 15: With the aforementioned products generated using the above primer and linker designs, after cluster or bead amplification, or capture within a well, address, or surface of a flow cell on a commercial instrument, the following primers may be used to initiate sequencing reactions: (i) iLx-003-PEsqP1, Paired End sequencing primer 1; (ii) iLx-004-BrCdR1, Indexing primer, Barcode Read 1; (iii) iLx-001-P5-BrCdR2, Barcode Read 2; and (iv) iLx-005-PEsqP2, Paired End sequencing primer 2 (primer sequences are provided in Table 1 below).
The approach described in
Detailed Protocol for Accurate Quantification of Tumor-Specific Copy Changes or Detection of Mutations in Known Genes (e.g. Braf, K-Ras, p53) in DNA Isolated from Circulating Tumor Cells or cfDNA:
5.4.a. Starting with cfDNA or genomic DNA isolated from CTCs (sheared to about 150 bp), repair the ends with T4 polymerase and T4 kinase, then add a single-base 3′ A overhang with Klenow (exo-) and dATP. Linkers have a single-base 3′ T overhang, such that ligation using T4 ligase at 4° C. appends linkers on both ends of the fragment. Linkers may be synthesized with a 5′ phosphate, or the phosphate may be appended using T4 kinase. Optionally, purify target DNA from unligated linker.
5.4.b. Denature target DNA containing linkers on both ends (94° C. 1 minute) in the presence of oligonucleotide probes (comprising a 5′ sequence complementary to the 5′ end of the linker, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, an optional spacer region, a sequence complementary to the 3′ end of the linker, and a 3′ sequence complementary to a unique or repetitive portion (i.e. AGAT repeat) of the target), and allow the oligonucleotide probes to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50° C. for 2 hours). Oligonucleotide probes may contain an optional blocking group on the 5′ side. Taq polymerase and/or KlenTaq (Taq polymerase lacking nuclease activity), and thermostable ligase (preferably from strain AK16D), and for the example of AGAT repeats, only the three complementary nucleotides (i.e. dTTP, dCTP, dATP) are either added subsequent to the annealing step, or at the start of the procedure. In the case where the linker sequence has a 5′ phosphate, KlenTaq extends the 3′ end until it is directly adjacent to the ligation-competent 5′ end (left side of
5.4.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for sequence-specific capture of desired targets using biotinylated probes, or for rolling circle amplification (using a primer complementary to the primer binding sequence with phi-29 polymerase) to create tandem repeats of the desired sequence, followed by enhanced capture of desired targets using biotinylated probes. This may be followed by identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the linker sequence or optional primer binding sequence as a primer binding site.
Note 1: Oligonucleotide may contain an optional blocking group on the 5′ side to interfere with subsequent 5′-3′ nuclease activity of polymerase, such that the oligonucleotide probe does not circularize.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: The 5′ end linker may be synthesized to contain thiophosphate linkages in the 2nd and 3rd position from the 5′ phosphate end, (which will be liberated by the 5′→3′ nuclease activity of the polymerase). To minimize polymerase displacement of those bases as it extends one base too many (which would make it impossible to ligate to the downstream primer), the target bases at the ligation junction would preferentially be AT rich on the 3′ side, and GC rich on the 5′ side.
Note 4: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain an apurinic (AP) site at the position adjacent to the desired 5′ phosphate. This 5′ phosphate is liberated using a thermostable EndoIII (such as Tma EndoIII). This enzyme cleaves AP sites leaving a 5′ phosphate when the primer is bound to the target. The endonuclease also cleaves single-stranded primer, but with lower efficiency, and thus primer hybridized to template would be the preferred substrate.
Note 5: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain a 5′ phosphate. Alternatively, the 5′ phosphate may be added using T4 kinase either prior to ligating to the target DNA, or after that ligation step.
Note 6: A 1:20 mixture of Taq polymerase (with 5′→3′ nuclease activity) and KlenTaq (Taq polymerase without 5′→3′ nuclease cleavage activity) may be used under conditions of distributive extension (i.e. higher salt concentration) to minimize degradation of target DNA by nick translation.
Note 7: Capture of tandem repeat copies of the target (generated through rolling circle amplification) provides the unique advantage of multivalent binding to enhance capture of the correct target. This allows for more stringent hybridization/capture conditions resulting in higher efficiencies of capture and less capture of unwanted sequences. This approach may also be used to capture all targets containing repetitive sequences (i.e. AGAT tetra-nucleotide repeat), allowing scoring for loss of heterozygosity, copy number variation, haplotyping, establishing paternity, and other applications.
Note 8: The unique sequence identifier assures that even after rolling circle or other selection/amplification steps, each original target sequence may be uniquely scored, allowing for accurate quantification of the original copy number of each target.
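Step 5.4.b supplies only dTTP, dCTP, and dATP when the target is an AGAT repeat. The following is a minimal sketch of one plausible reading of that choice: the strand copied from an AGAT template needs only A, T, and C, so synthesis stalls at the first template C, which would require the withheld dGTP. The function name `extend_with_limited_dntps` and the toy template are illustrative assumptions and are not part of the protocol.

```python
# Watson-Crick pairing used to pick the incoming nucleotide.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def extend_with_limited_dntps(template_3to5, available):
    """Copy a template until a required nucleotide is missing.

    template_3to5: template bases in the order the polymerase meets them
                   (that is, the template strand read 3' to 5').
    available:     set of bases supplied as dNTPs, e.g. {"A", "C", "T"}.
    Returns the newly synthesized strand, written 5' to 3'.
    """
    product = []
    for base in template_3to5:
        needed = COMPLEMENT[base]
        if needed not in available:
            break  # polymerase stalls: the required dNTP was withheld
        product.append(needed)
    return "".join(product)

# Hypothetical template written 5'->3': flank "GC" followed by three AGAT repeats.
template_5to3 = "GC" + "AGAT" * 3
# The polymerase reads the template 3'->5', so reverse the string first.
limited = extend_with_limited_dntps(template_5to3[::-1], {"A", "C", "T"})
full = extend_with_limited_dntps(template_5to3[::-1], {"A", "C", "G", "T"})
print(limited)  # copy of the repeat region only
print(full)     # copy that runs on into the flank
```

In this toy case the restricted reaction copies the twelve repeat bases and stops, while the four-nucleotide reaction continues into the flanking sequence.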
To avoid problems with polymerase extension and nick-translation, the procedure may be performed without a polymerase extension step, using just ligase or ligase combined with the 5′-3′ nuclease activity of polymerase, as illustrated in
Detailed Protocol for Accurate Quantification of Tumor-Specific Copy Changes or Detection of Mutations in Known Genes (e.g. Braf, K-Ras, p53) in DNA Isolated from Circulating Tumor Cells or cfDNA:
5.5.a. Starting with cfDNA or genomic DNA isolated from CTCs (sheared to about 150 bp), repair the ends with T4 polymerase and T4 kinase, then add a single-base 3′ A overhang with Klenow (exo-) and dATP. Linkers have a single-base 3′ T overhang, such that ligation using T4 ligase at 4° C. appends linkers on both ends of the fragment. Linkers contain a unique identifier sequence on the 5′ single-stranded side of the linker, and may be synthesized with a 5′ phosphate, or the phosphate may be appended using T4 kinase. Optionally, purify target DNA from unligated linker and/or dNTPs.
5.5.b. Denature target DNA containing linkers on both ends (94° C. 1 minute) in the presence of oligonucleotides (comprising a 5′ sequence complementary to a unique or repetitive portion (i.e. AGAT repeat) of the target, an optional spacer region, a sequence complementary to the 5′ end of the linker, an optional patient identifier sequence, an optional primer binding sequence, an optional spacer region, a sequence complementary to the 3′ end of the linker, and a 3′ sequence complementary to a unique or repetitive portion (i.e. AGAT repeat) of the target, and directly adjacent to, or overlapping by a single base or flap, the 5′ sequence complementary to the target), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50° C. for 2 hours). Oligonucleotide may contain an optional blocking group on the 5′ side. Optionally, Taq polymerase, and thermostable ligase (preferably from strain AK16D), are either added subsequent to the annealing step, or at the start of the procedure. In the case where the linker sequence has a 5′ phosphate, the 3′ linker is directly adjacent to the ligation-competent 5′ end (left side of
5.5.c. Optionally, cleave the oligonucleotide probe at a cleavable link (e.g. U cleaved using UDG and AP endonuclease). Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for sequence-specific capture of desired targets using biotinylated probes, or for rolling circle amplification (using a primer complementary to the primer binding sequence with phi-29 polymerase) to create tandem repeats of the desired sequence, followed by enhanced capture of desired targets using biotinylated probes. This may be followed by identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the linker sequence or optional primer binding sequence as a primer binding site.
Note 1: Oligonucleotide may contain an optional blocking group on the 5′ side to interfere with subsequent 5′-3′ nuclease activity of polymerase, such that the oligonucleotide probe does not circularize. Alternatively, a cleavable link may be included in the original oligonucleotide.
Note 2: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 3: Capture of tandem repeat copies of the target (generated through rolling circle amplification) provides the unique advantage of multivalent binding to enhance capture of the correct target. This allows for more stringent hybridization/capture conditions resulting in higher efficiencies of capture and less capture of unwanted sequences. This approach may also be used to capture all targets containing repetitive sequences (i.e. AGAT tetra-nucleotide repeat), allowing scoring for loss of heterozygosity, copy number variation, haplotyping, establishing paternity, and other applications.
Note 4: The unique sequence identifier assures that even after rolling circle or other selection/amplification steps, each original target sequence may be uniquely scored, allowing for accurate quantification of the original copy number of each target.
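The notes above state that the unique sequence identifier lets each original target be scored once, even after rolling circle or other amplification. A minimal sketch of that counting step follows, assuming each sequenced read has already been reduced to a (target, identifier) pair. The function name, the read tuples, and the exact-match comparison of identifiers are illustrative assumptions; a practical pipeline would likely also need to tolerate sequencing errors within the identifier.

```python
from collections import defaultdict

def count_original_molecules(reads):
    """Count distinct unique identifiers per target.

    reads: iterable of (target_name, unique_identifier) tuples, one per read.
    Returns a dict mapping target_name to the number of distinct identifiers,
    which serves here as the estimate of original molecules for that target.
    """
    identifiers = defaultdict(set)
    for target, uid in reads:
        identifiers[target].add(uid)  # amplified copies collapse into one entry
    return {target: len(uids) for target, uids in identifiers.items()}

# Hypothetical reads: KRAS seen from two original molecules, BRAF from one
# molecule that was amplified three times.
reads = [
    ("KRAS", "ACGTACGT"),
    ("KRAS", "ACGTACGT"),
    ("KRAS", "TTGACCAA"),
    ("BRAF", "GGCATCAG"),
    ("BRAF", "GGCATCAG"),
    ("BRAF", "GGCATCAG"),
]
counts = count_original_molecules(reads)
print(counts)
```

Counting identifiers rather than reads is what makes the estimate independent of how unevenly each circle was amplified.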
Prophetic Example 6—Accurate Quantification of Tumor-Specific mRNA or lncRNA Isolated from Exosomes, Circulating Tumor Cells, or Total Blood Cells that Include Circulating Tumor Cells. (e.g. a Dozen Expression Markers that Predict Outcome or Guide Treatment)
Changes in tumor-specific mRNA or lncRNA expression are a powerful tool for classifying disease status in a tissue-specific fashion. This includes accurate detection and quantitation of specific exons of mRNAs or lncRNAs (see e.g.,
Oligonucleotides containing sequences complementary to adjacent but separate regions are hybridized to the cDNA targets.
Enumeration of these reporter circles can be carried out by a variety of quantitative methods including but not limited to next generation sequencing. In addition to allowing the simultaneous multiplex detection of many target sequences by circle formation, next generation sequencing as the readout provides additional information (two unique identifiers) about the reporter circles which readily distinguishes legitimate ligation products from ligation artifacts thus increasing the sensitivity of the assay.
Detailed Protocol for Detection and Quantification of Exons in mRNA or lncRNA Isolated from Exosomes, Circulating Tumor Cells, or Total Blood Cells Including Circulating Tumor Cells.
Variation 6.1, see e.g.,
6.1.b. Denature target cDNA (94° C. 1 minute) in the presence of oligonucleotides (comprising a 5′ probe region complementary to a portion of the cDNA towards the 5′ side, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and a 3′ probe region complementary to another portion of the cDNA towards the 3′ side), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 40-50° C. for 30 min). Oligonucleotides may contain an optional phosphate group at the 5′ end or a matching 5′-overlapping base or flap. Optionally, Taq polymerase and/or KlenTaq (Taq polymerase lacking nuclease activity), and thermostable ligase (preferably from strain AK16D), and optionally dNTPs are either added subsequent to the annealing step, or at the start of the procedure. In the case where the oligonucleotides have a 5′ phosphate directly adjacent to the 3′OH at the junction, the two ends may be directly joined using ligase (right side of
6.1.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of a short stretch of the original target cDNA, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 1: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 2: The 5′ end of the oligonucleotide primer may be synthesized to contain thiophosphate linkages in the 2nd and 3rd position from the 5′ phosphate end, (which will be liberated by the 5′→3′ nuclease activity of the polymerase). To minimize polymerase displacement of those bases as it extends one base too many (which would make it impossible to ligate to the downstream primer), the target bases at the ligation junction would preferentially be AT rich on the 3′ side, and GC rich on the 5′ side.
Note 3: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain an apurinic (AP) site at the position adjacent to the desired 5′ phosphate. This 5′ phosphate is liberated using a thermostable EndoIII (such as Tma EndoIII). This enzyme cleaves AP sites leaving a 5′ phosphate when the primer is bound to the target. The endonuclease also cleaves single-stranded primer, but with lower efficiency, and thus primer hybridized to template would be the preferred substrate.
Note 4: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain a 5′ phosphate. Alternatively, the 5′ phosphate may be added using T4 kinase either prior to ligating to the target DNA, or after that ligation step.
Note 5: A 1:20 mixture of Taq polymerase (with 5′→3′ nuclease activity) and KlenTaq (Taq polymerase without 5′→3′ nuclease cleavage activity) may be used under conditions of distributive extension (i.e. higher salt concentration) to minimize degradation of target DNA by nick translation.
Note 6: If the oligonucleotide(s) also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Note 7: MMLV reverse transcriptase may be engineered to synthesize cDNA at 50-60° C., from total input RNA (Invitrogen Superscript III). Alternatively, Tth or Tma DNA polymerases have been engineered to improve their reverse-transcriptase activity (may require addition of Mn cofactor). Finally, thermophilic PyroPhage 3173 DNA Polymerase has both strand-displacement and reverse-transcription activity, and may also be used.
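The probe geometry of step 6.1.b can be sketched in silico. The example below assumes a single probe whose 5′ arm pairs with the cDNA region toward the cDNA's 5′ side and whose 3′ arm pairs with the region toward its 3′ side, so that polymerase extension of the probe's 3′ end copies the intervening bases before ligase closes the circle. All sequences, the `backbone` content, and the function names are hypothetical and chosen only to make the orientation explicit.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]

def build_probe(upstream, downstream, backbone):
    """Assemble a probe written 5'->3'.

    upstream:   cDNA region toward the cDNA 5' side (bound by the probe 5' arm).
    downstream: cDNA region toward the cDNA 3' side (bound by the probe 3' arm).
    backbone:   synthetic segment (identifier, optional primer binding site).
    """
    return revcomp(upstream) + backbone + revcomp(downstream)

def circularize(probe, gap=""):
    """Extend the probe 3' end across the gap, then ligate to the probe 5' end.

    Returns the circle as a linear string that starts at the original probe
    5' end; the last base is covalently joined to the first.
    """
    return probe + revcomp(gap)

# Hypothetical cDNA segment: upstream region, 5-base gap, downstream region.
upstream, gap, downstream = "GATTACAGGC", "TTGCA", "CCGTAAGCTA"
backbone = "ACGTTGCA" + "TCTAGACT"  # made-up identifier + made-up primer site
probe = build_probe(upstream, downstream, backbone)
circle = circularize(probe, gap)

# Reading across the ligation junction should give the complement of the
# contiguous cDNA stretch; doubling the string emulates the circular wrap.
copied = revcomp(upstream + gap + downstream)
junction_ok = copied in circle + circle
print(junction_ok)
```

Passing an empty `gap` models the case in which the 5′ phosphate is directly adjacent to the 3′ OH and the ends are joined by ligase alone.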
Detailed Protocol for Detection and Quantification of Gene Fusions in mRNA Isolated from Exosomes, Circulating Tumor Cells, or Total Blood Cells Including Circulating Tumor Cells:
Variation 6.2 (see e.g.,
6.2.b. Denature target cDNA (94° C. 1 minute) in the presence of two oligonucleotides (the first comprising a 5′-sequence complementary to a unique portion of target exon #1, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and a 3′-sequence complementary to a unique portion of target exon #2, and the second comprising a 5′-sequence complementary to a unique portion of target exon #2, a unique identifier sequence, an optional patient identifier sequence, and a 3′-sequence complementary to a unique portion of target exon #1), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50° C. for 2 hours). Thermostable ligase (preferably from strain AK16D), and optionally Taq polymerase and dNTPs are either added subsequent to the annealing step, or at the start of the procedure. In the case where both oligonucleotide sequences have a 5′ phosphate, no polymerase is required (left side of
6.2.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 1: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 2: If the oligonucleotide(s) also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Note 3: In systems that use two bridging oligonucleotide probes, each containing unique identifier sequence, optional patient identifier sequence and optional primer binding site, it is recommended that only one but not both of the oligonucleotides contain a primer binding site (either the upstream or downstream oligonucleotide). For an additional level of false-positive detection, it is recommended that the unique identifier sequence be different in the upstream and downstream probes.
Note 4: MMLV reverse transcriptase may be engineered to synthesize cDNA at 50-60° C., from total input RNA (Invitrogen Superscript III). Alternatively, Tth or Tma DNA polymerases have been engineered to improve their reverse-transcriptase activity (may require addition of Mn cofactor). Finally, thermophilic PyroPhage 3173 DNA Polymerase has both strand-displacement and reverse-transcription activity, and may also be used.
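The design recommendations of Note 3 for two-probe systems lend themselves to a simple automated check before oligonucleotides are ordered. The sketch below is one hypothetical way to encode them; the `BridgingProbe` record, its field names, and the warning wording are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class BridgingProbe:
    """Hypothetical record of the synthetic segments carried by one probe."""
    name: str
    unique_identifier: str
    primer_binding_site: Optional[str] = None
    patient_identifier: Optional[str] = None

def check_probe_pair(upstream: BridgingProbe, downstream: BridgingProbe) -> List[str]:
    """Return warnings for departures from the Note 3 recommendations."""
    warnings = []
    sites = [p for p in (upstream, downstream) if p.primer_binding_site is not None]
    if len(sites) == 2:
        warnings.append("both probes carry a primer binding site; only one is recommended")
    if upstream.unique_identifier == downstream.unique_identifier:
        warnings.append("probes share a unique identifier; different identifiers are recommended")
    return warnings

# Hypothetical probe pair with made-up sequences.
good = check_probe_pair(
    BridgingProbe("exon1-exon2", "ACGTTGCA", primer_binding_site="TCTAGACT"),
    BridgingProbe("exon2-exon1", "GGTCAATC"),
)
bad = check_probe_pair(
    BridgingProbe("exon1-exon2", "ACGTTGCA", primer_binding_site="TCTAGACT"),
    BridgingProbe("exon2-exon1", "ACGTTGCA", primer_binding_site="TCTAGACT"),
)
print(good)
print(bad)
```

A pair with no primer binding site at all is not flagged here, since the site is described as optional.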
Detailed Protocol for Another Variation for Detection and Quantification of Gene Fusions in mRNA Isolated from Exosomes, Circulating Tumor Cells, or Total Blood Cells Including Circulating Tumor Cells:
Variation 6.3 (see e.g.,
6.3.b. Denature target cDNA (94° C. 1 minute) in the presence of two oligonucleotides (the first comprising a 5′-sequence complementary to a unique portion of target exon #1, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and a 3′-sequence complementary to a unique portion of target exon #2, and the second comprising a 5′-sequence complementary to a unique portion of target exon #2, a unique identifier sequence, an optional patient identifier sequence, and a 3′-sequence complementary to a unique portion of target exon #1), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50° C. for 2 hours). Taq polymerase and/or KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are either added subsequent to the annealing step, or at the start of the procedure. In the case where both oligonucleotide sequences have a 5′ phosphate, KlenTaq extends the 3′ end until it is directly adjacent to the ligation-competent 5′ end (left side of
6.3.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 1: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 2: The 5′ end of the oligonucleotide primer may be synthesized to contain thiophosphate linkages in the 2nd and 3rd position from the 5′ phosphate end, (which will be liberated by the 5′→3′ nuclease activity of the polymerase). To minimize polymerase displacement of those bases as it extends one base too many (which would make it impossible to ligate to the downstream primer), the target bases at the ligation junction would preferentially be AT rich on the 3′ side, and GC rich on the 5′ side.
Note 3: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain an apurinic (AP) site at the position adjacent to the desired 5′ phosphate. This 5′ phosphate is liberated using a thermostable EndoIII (such as Tma EndoIII). This enzyme cleaves AP sites leaving a 5′ phosphate when the primer is bound to the target. The endonuclease also cleaves single-stranded primer, but with lower efficiency, and thus primer hybridized to template would be the preferred substrate.
Note 4: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain a 5′ phosphate. Alternatively, the 5′ phosphate may be added using T4 kinase either prior to ligating to the target DNA, or after that ligation step.
Note 5: A 1:20 mixture of Taq polymerase (with 5′→3′ nuclease activity) and KlenTaq (Taq polymerase without 5′→3′ nuclease cleavage activity) may be used under conditions of distributive extension (i.e. higher salt concentration) to minimize degradation of target DNA by nick translation.
Note 6: If the oligonucleotide(s) also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Note 7: In systems that use two bridging oligonucleotide probes, each containing unique identifier sequence, optional patient identifier sequence and optional primer binding site, it is recommended that only one but not both of the oligonucleotides contain a primer binding site (either the upstream or downstream oligonucleotide). For an additional level of false-positive detection, it is recommended that the unique identifier sequence be different in the upstream and downstream probes.
Note 8: MMLV reverse transcriptase may be engineered to synthesize cDNA at 50-60° C., from total input RNA (Invitrogen Superscript III). Alternatively, Tth or Tma DNA polymerases have been engineered to improve their reverse-transcriptase activity (may require addition of Mn cofactor). Finally, thermophilic PyroPhage 3173 DNA Polymerase has both strand-displacement and reverse-transcription activity, and may also be used.
Detailed Protocol for Detection and Quantification of mRNA Containing Specific Exons (which May not be Adjacent to Each Other) in mRNA Isolated from Exosomes, Circulating Tumor Cells, or Total Blood Cells Including Circulating Tumor Cells:
Variation 6.4 (see e.g.,
6.4.b. Denature target cDNA (94° C. 1 minute) in the presence of two oligonucleotides (the first comprising a 5′-sequence complementary to a unique portion of target exon #1, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and a 3′-sequence complementary to a unique portion of target exon #2, and the second comprising a 5′-sequence complementary to a unique portion of target exon #2, a unique identifier sequence, an optional patient identifier sequence, and a 3′-sequence complementary to a unique portion of target exon #1), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50° C. for 2 hours). Thermostable ligase (preferably from strain AK16D), and optionally Taq polymerase and dNTPs are either added subsequent to the annealing step, or at the start of the procedure. In the case where both oligonucleotide sequences have a 5′ phosphate, no polymerase is required (left side of
6.4.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 1: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 2: If the oligonucleotide(s) also comprises optional primer binding sites (i.e. universal primer binding sites), it would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Note 3: In systems that use two bridging oligonucleotide probes, each containing unique identifier sequence, optional patient identifier sequence and optional primer binding site, it is recommended that only one but not both of the oligonucleotides contain a primer binding site (either the upstream or downstream oligonucleotide). For an additional level of false-positive detection, it is recommended that the unique identifier sequence be different in the upstream and downstream probes.
Note 4: MMLV reverse transcriptase may be engineered to synthesize cDNA at 50-60° C., from total input RNA (Invitrogen Superscript III). Alternatively, Tth or Tma DNA polymerases have been engineered to improve their reverse-transcriptase activity (may require addition of Mn cofactor). Finally, thermophilic PyroPhage 3173 DNA Polymerase has both strand-displacement and reverse-transcription activity, and may also be used.
Detailed Protocol for Detection and Quantification of mRNA Containing Specific Exons (which May not be Adjacent to Each Other) in mRNA Isolated from Exosomes, Circulating Tumor Cells, or Total Blood Cells Including Circulating Tumor Cells:
Variation 6.5 (see e.g.,
6.5.b. Denature target cDNA (94° C. 1 minute) in the presence of two oligonucleotides (the first comprising a 5′-sequence complementary to a unique portion of target exon #1, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and a 3′-sequence complementary to a unique portion of target exon #2, and the second comprising a 5′-sequence complementary to a unique portion of target exon #2, a unique identifier sequence, an optional patient identifier sequence, and a 3′-sequence complementary to a unique portion of target exon #1), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50° C. for 2 hours). Taq polymerase and/or KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are either added subsequent to the annealing step, or at the start of the procedure. In the case where both oligonucleotide sequences have a 5′ phosphate, KlenTaq extends the 3′ end until it is directly adjacent to the ligation-competent 5′ end (left side of
6.5.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 1: Fen nuclease may be used instead of polymerase with 5′-3′ nuclease activity to generate the ligation-competent 5′ phosphate on the 5′ side of the target.
Note 2: The 5′ end of the oligonucleotide primer may be synthesized to contain thiophosphate linkages in the 2nd and 3rd position from the 5′ phosphate end, (which will be liberated by the 5′→3′ nuclease activity of the polymerase). To minimize polymerase displacement of those bases as it extends one base too many (which would make it impossible to ligate to the downstream primer), the target bases at the ligation junction would preferentially be AT rich on the 3′ side, and GC rich on the 5′ side.
Note 3: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain an apurinic (AP) site at the position adjacent to the desired 5′ phosphate. This 5′ phosphate is liberated using a thermostable EndoIII (such as Tma EndoIII). This enzyme cleaves AP sites leaving a 5′ phosphate when the primer is bound to the target. The endonuclease also cleaves single-stranded primer, but with lower efficiency, and thus primer hybridized to template would be the preferred substrate.
Note 4: When using KlenTaq polymerase (Taq polymerase without 5′→3′ nuclease cleavage activity), the 5′ end linker may be synthesized to contain a 5′ phosphate. Alternatively, the 5′ phosphate may be added using T4 kinase either prior to ligating to the target DNA, or after that ligation step.
Note 5: A 1:20 mixture of Taq polymerase (with 5′→3′ nuclease activity) and KlenTaq (Taq polymerase without 5′→3′ nuclease cleavage activity) may be used under conditions of distributive extension (i.e. higher salt concentration) to minimize degradation of target DNA by nick translation.
Note 6: If the oligonucleotide(s) also comprise optional primer binding sites (i.e. universal primer binding sites), the product would be suitable for PCR amplification, followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, Real-Time PCR assays, digital PCR, microarray, hybridization, or other detection method.
Note 7: In systems that use two bridging oligonucleotide probes, each containing unique identifier sequence, optional patient identifier sequence and optional primer binding site, it is recommended that only one but not both of the oligonucleotides contain a primer binding site (either the upstream or downstream oligonucleotide). For an additional level of false-positive detection, it is recommended that the unique identifier sequence be different in the upstream and downstream probes.
Note 8: MMLV reverse transcriptase may be engineered to synthesize cDNA at 50-60° C., from total input RNA (Invitrogen Superscript III). Alternatively, Tth or Tma DNA polymerases have been engineered to improve their reverse-transcriptase activity (may require addition of Mn cofactor). Finally, thermophilic PyroPhage 3173 DNA Polymerase has both strand-displacement and reverse-transcription activity, and may also be used.
Prophetic Example 7—Accurate Quantification of Tumor-Specific miRNA Isolated from Exosomes or Argonaute Proteins (e.g. a Dozen microRNA Markers that Predict Outcome or Guide Treatment)
MicroRNAs (miRNAs) have been identified as potential tissue-specific markers of the presence of tumors, their classification, and prognostication. miRNAs exist in serum and plasma either as complexes with Ago2 proteins or encapsulated in exosomes.
Detailed Protocol for the Capture, Identification, and Quantification of all miRNA Species which are Present in Serum, Plasma or Exosomes without any Selection:
Variation 7.1 (see e.g.,
7.1.b. Phosphorylate the 5′-end of the modified miRNA with T4 kinase. Ligate a universal linker to the 5′-end of the modified linker using a DNA linker with a 5′-phosphate and a blocked 3′-end.
7.1.c. After removal of the excess linkers, denature the nucleic acids (94° C. for 1 minute) in the presence of oligonucleotides (comprising sequences complementary to the 5′- and 3′-ends of the linkers, a unique identifier sequence, an optional patient identifier sequence, and an optional primer binding sequence), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 40-50° C. for 30 min). Add a Reverse Transcriptase lacking 5′-3′ exonuclease activity (i.e. Tth DNA polymerase using Mn2+ cofactor), a thermostable ligase (preferably from strain AK16D) and dNTPs; the components are either added subsequent to the annealing step, or at the start of the procedure. Allow for extension and ligation at the hybridization temperature, and optionally raise the temperature (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
7.1.d. Add UDG and AP endonucleases to nick the miRNA in the original target, add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the original target miRNA, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using primer complementary to the primer binding sequence with phi-29 polymerase) to create tandem repeats of the desired sequence. This may be followed by identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the optional primer binding sequence as a primer binding site.
Note 1. If the oligonucleotide(s) also comprise optional primer binding sites (i.e., universal primer binding sites), the circular DNA would be suitable for PCR amplification followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, real-time PCR assays, digital PCR, microarray, hybridization or other detection methods.
The next variants of the method (see e.g.,
Detailed Protocol for the Capture, Identification, and Quantification of Specific miRNA Species which are Present in Serum, Plasma or Exosomes without any Selection:
Variation 7.2 (see e.g.,
7.2.b. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated oligonucleotides, leaving only the desired single-stranded circular miRNA-DNA chimera. Heat inactivate exonucleases at 80° C. for 20 minutes.
7.2.c. Add a universal 5′-phosphorylated primer to the circular products, and hybridize at 37° C. Add Reverse Transcriptase lacking a 5′-3′ exonuclease activity (Tth DNA polymerase using Mn2+ cofactor), thermostable ligase (preferably from strain AK16D) and dNTPs; the components are either added subsequent to the annealing step, or at the start of the procedure. Allow for extension and ligation at the hybridization temperature, and optionally raise the temperature (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products that now contain a copy of the original target miRNA.
7.2.d. Add UDG and AP endonucleases to nick the miRNA in the original target, add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the original target miRNA, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using primer complementary to the primer binding sequence with phi-29 polymerase) to create tandem repeats of the desired sequence. This may be followed by identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the optional primer binding sequence as a primer binding site.
Note 1. If the oligonucleotide(s) also comprise optional primer binding sites (i.e., universal primer binding sites), the circular DNA would be suitable for PCR amplification followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, real-time PCR assays, digital PCR, microarray, hybridization or other detection methods.
The next variation of the method makes use of a C6 or C18 spacer that allows copying of the target miRNA but prevents completion of a circular product, thus eliminating false positive detection of low levels of the desired miRNA. Nick translation of a universal primer containing thiophosphates at the 2nd and 3rd positions relative to the 5′-end is unable to cross the C6 or C18 spacer in the middle of the target miRNA binding complement sequence, enabling destruction of any unligated miRNA probes.
Detailed Protocol for the Capture, Identification, and Quantification of Specific miRNA Species which are Present in Serum, Plasma or Exosomes without any Selection:
Variation 7.3 (see e.g.,
7.3.b. Add Reverse Transcriptase possessing a 5′-3′ exonuclease activity (Tth DNA polymerase using Mn2+ cofactor), thermostable ligase (preferably from strain AK16D), and dNTPs; the components are either added subsequent to the annealing step, or at the start of the procedure. Allow for extension and ligation at the hybridization temperature, and optionally raise the temperature (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products that now contain a copy of the original target miRNA.
7.3.c. Add UDG and AP endonucleases to nick the miRNA in the original target, then add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising a copy of the original target miRNA, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using primer complementary to the primer binding sequence with phi-29 polymerase) to create tandem repeats of the desired sequence. This may be followed by identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the optional primer binding sequence as a primer binding site.
Note 1. If the oligonucleotide(s) also comprise optional primer binding sites (i.e., universal primer binding sites), the circular DNA would be suitable for PCR amplification followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, real-time PCR assays, digital PCR, microarray, hybridization or other detection methods.
Another variant (
Detailed Protocol for the Capture, Identification, and Quantification of Specific miRNA Species which are Present in Serum, Plasma or Exosomes without any Selection:
Variation 7.4 (see e.g.,
7.4.b. Denature the nucleic acids (94° C. 1 minute) in the presence of oligonucleotides (comprising sequences complementary to the 5′-stem-loop sequence and the 3′-ends of the extended cDNA. In addition, the 5′-end is blocked to prevent nick translation.) and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 40-50° C. for 30 min).
7.4.c. Add KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs; the components are either added subsequent to the annealing step, or at the start of the procedure. KlenTaq extends the 3′ end until it is directly adjacent to the ligation-competent 5′ end. Allow for extension and ligation at the hybridization temperature, and optionally raise the temperature (e.g. 60° C.) to assure completion of extension and ligation, to generate circular products.
7.4.d. Cleave the oligonucleotide probe at a cleavable link (e.g. U cleaved using UDG and AP endonuclease). Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated products and excess probes, leaving only the desired single-stranded circular DNA comprising a copy of a short stretch of the original target cDNA, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 1. If the oligonucleotide(s) also comprise optional primer binding sites (i.e., universal primer binding sites), the circular DNA would be suitable for PCR amplification followed by subsequent identification of the targets using next generation sequencing, TaqMan assays, UniTaq assays, real-time PCR assays, digital PCR, microarray, hybridization or other detection methods.
Prophetic Example 8—Clinical Need in Prenatal Diagnosis from Maternal Plasma Sample
In the prenatal care field, there is an urgent need to develop non-invasive assays for: common aneuploidies, such as trisomy 21, 18, or 13; small deletions, such as those arising from deletions in the Duchenne muscular dystrophy (DMD) gene; other small copy number anomalies, such as those responsible for autism; balanced translocations, to determine potential clinical manifestations; methylation changes, which may result in diseases associated with imprinting, such as Angelman's syndrome or Prader-Willi syndrome; triplet repeat changes, responsible for diseases such as Huntington's disease; and point mutations, such as those in the CFTR gene responsible for cystic fibrosis.
Overview: Recent work has shown that fetal DNA as a percentage of maternal DNA in the plasma is at approximately 6%, 20%, and 26% in the 1st, 2nd, and 3rd trimester respectively. Due to how DNA is degraded, maternal DNA is usually about 160 bases and still associated with the H1 histone, while fetal DNA is about 140 bases and not associated with histone. Depending on the clinical need, and where the knowledge will provide the best care, tests may be developed with sufficient sensitivity to detect fetal DNA in the appropriate trimester.
There are approximately 3,500 recessive genetic disorders where the gene is known. The most common disorders result from DNA copy anomalies, either an extra chromosome such as in Trisomy 21, or deletion of a portion of a gene, such as in the Duchenne muscular dystrophy (DMD) gene. In considering prenatal screening, one needs to balance the probability of a genetic disorder vs. the risk of the procedure. Currently, the standard of care recommends amniocentesis during week 17 for expectant mothers at age 35, since the risk of Trisomy 21 or other chromosomal aneuploidy at 1 in 200 now matches the risk of spontaneous abortion after the procedure.
In considering the use of the methods of nucleic acid sequencing described herein for prenatal screening, two levels of testing are recommended. For low-cost screening of all pregnancies for Trisomy 21, 13, and 18, the sequencing methods of the present invention may be used to rapidly identify differentially expressed genes on chromosomes 21, 13, and 18, e.g., identify those genes that are turned off in the fetus as a consequence of methylation silencing, but are on in the adult. Similar regions are identified on three control chromosomes, i.e. 2, 5, 7. Even when isolating DNA from the serum of a mother in the first trimester, one can rapidly calculate the percentage of DNA arising from the fetus by comparing methylated to unmethylated DNA among control chromosomal regions—in the example herein, that would be 6%. If there is trisomy at any of the other chromosomes, i.e. Trisomy 21, then the promoters from that chromosome will show methylation at about 9%, in other words, some 50% higher than for the normal disomy case. Scoring 1,000 genome equivalents is recommended, such that a count of 90 methylated copies for the trisomy case is easily distinguished from 60 methylated copies for the normal sample.
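The counting argument above can be sketched numerically. The following is a minimal illustration, not the analysis method of the assay: it uses the simplification in the text that the methylated (fetal) signal scales with fetal chromosome copy number. The function name and the Poisson counting-noise estimate are assumptions added here.

```python
import math


def expected_methylated_copies(genome_equivalents, fetal_fraction, fetal_copies=2):
    """Expected count of fetal-specific methylated copies for one chromosome.

    Follows the simplification in the text: the methylated signal is the
    fetal fraction scaled by fetal copy number relative to disomy (2 copies).
    """
    return genome_equivalents * fetal_fraction * (fetal_copies / 2)


# Values taken from the text: 1,000 genome equivalents, 6% fetal fraction.
disomy = expected_methylated_copies(1000, 0.06, fetal_copies=2)
trisomy = expected_methylated_copies(1000, 0.06, fetal_copies=3)

# Assumption (not from the text): counting noise is roughly Poisson, so the
# standard deviation of the disomy count is about sqrt(count).
separation_in_sd = (trisomy - disomy) / math.sqrt(disomy)

print(f"disomy: {disomy:.0f} copies, trisomy: {trisomy:.0f} copies")
print(f"separation: {separation_in_sd:.1f} standard deviations")
```

Under the Poisson assumption, the 30-copy difference corresponds to a separation of several standard deviations, which is one way to read "easily distinguished" in the paragraph above.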
As a first step in such a procedure, it is necessary to identify those promoter regions which are methylated in fetal DNA during the earliest stages of development, but never methylated in maternal DNA (i.e. WBC). This needs to be performed empirically, by comparing methylation patterns in cfDNA isolated from women who are pregnant with a diploid fetus, a fetus containing a trisomy, and no pregnancy. For a generalized review on using epigenetic markers for NIPD see Patsalis et al., “A New Non-invasive Prenatal Diagnosis of Down syndrome through Epigenetic Markers and Real-time qPCR,” Expert Opin Biol Ther. 12 Suppl 1:S155-61 (2012), which is hereby incorporated by reference in its entirety. As an example of detecting methylation status at a single marker, PDE9A, see Lim et al., “Non-invasive Epigenetic Detection of Fetal Trisomy 21 in First Trimester Maternal Plasma,” PLoS One 6(11):e27709 (2011), which is hereby incorporated by reference in its entirety. The approaches described herein will be able to identify such markers far more rapidly. To identify methylation at adjacent HinP1I sites throughout the genome, the approach outlined in
Alternatively, there are certain genes that are turned on during fetal development that are off in adult tissue or blood. Under these conditions, it would be important to identify unmethylated promoter regions, by comparing unmethylation pattern in cfDNA isolated from women who are pregnant with a diploid fetus, a fetus containing a trisomy, and no pregnancy. To identify genome-wide loss of methylation patterns, the methods described in
Further, the above approach will be able to accurately quantify methylation changes at other positions in the genome, which may result in diseases associated with imprinting, such as Angelman's syndrome or Prader-Willi syndrome. The ability of the present invention to determine methylation status and at the same time to determine if the deletion is on the paternal or maternal chromosome by SNP or repetitive sequence polymorphisms detection (i.e., detection of upstream or downstream cis-located maternal or paternal identifying SNPs or repetitive sequence polymorphisms) will enhance its diagnostic discrimination of imprinting diseases.
A ligase-based approach to count chromosomal fragments for non-invasive prenatal diagnosis, termed DANSR, has been recently reported (Sparks et al., “Selective Analysis of Cell-free DNA in Maternal Blood for Evaluation of Fetal Trisomy,” Prenat Diagn. 32(1):3-9 (2012), which is hereby incorporated by reference in its entirety). This approach is based on using 3 fragments in a ligation reaction on the chromosomal arm: a left fragment containing an upstream universal sequence, a middle fragment, and a right fragment containing a downstream universal sequence. After a ligation reaction, the products are separated from the input primers, PCR amplified, and sequenced. Spurious ligations (i.e. upstream directly to wrong downstream primer, no middle piece) are easily distinguished by sequencing, and were less than 5%. The present invention can also use this approach, either by capturing specific targets from the ligated products of the entire genome, as described in
Since the above approach depends on counting chromosomal regions, the accuracy of the technique may be improved by taking advantage of polymorphisms in such regions.
The example of 1,000 genome equivalents for the maternal DNA, and 100 genome equivalents for the fetal DNA can be used to illustrate the difference in distinguishing by scoring for presence of copy vs. presence of polymorphism. When using highly polymorphic markers such as with tetranucleotide repeats, it is not unusual for 3 or all 4 of the polymorphisms to be different. Results will be compared with these cases, where either the markers for the maternal or paternal chromosome are the same or different. Chromosome 2 will be used as the control, and 21 as the example for trisomy
Case 1 (Copy Number Only)
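A rough sketch of the copy-number-only comparison, assuming the 1,000 maternal and 100 fetal genome equivalents stated above and two chromosome copies per disomic genome equivalent. The function name is hypothetical.

```python
def chromosome_copies(maternal_ge, fetal_ge, fetal_copies=2, maternal_copies=2):
    """Total copies of one chromosomal region in a maternal plasma sample."""
    return maternal_ge * maternal_copies + fetal_ge * fetal_copies


control_chr2 = chromosome_copies(1000, 100)                   # disomic control
chr21_disomy = chromosome_copies(1000, 100, fetal_copies=2)   # normal fetus
chr21_trisomy = chromosome_copies(1000, 100, fetal_copies=3)  # trisomy 21

print(control_chr2, chr21_disomy, chr21_trisomy)
print(f"trisomy/control ratio: {chr21_trisomy / control_chr2:.3f}")
```

Under these assumptions a trisomy changes the total by 100 copies out of about 2,200, a difference of under 5%, which illustrates why copy counting alone is demanding.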
From this analysis, it is clear that to distinguish trisomy, the most useful markers are those where the maternal locus is polymorphic, and the paternal locus is different from both maternal alleles. The paternal locus does not need to be polymorphic. For X-linked deletions, such as found in Duchenne's muscular dystrophy, the approach is simpler, since the disease is mostly manifest in boys. Again, comparing chromosome 2 with the X chromosome in the area of the deletion would yield:
Case 10 (Maternal Markers Heterozygous, Inherited X-Linked Deletion)
Case 11 (Maternal Markers Heterozygous, Fetus does not Contain X-Linked Deletion)
Case 10 shows inherited Duchenne's muscular dystrophy, where the mother is a carrier. Under these conditions, the total amount of the two X-chromosome alleles appears to be half that of other positions. In case 11, the fetus does not have the disease, and in case 12 the disease is a sporadic mutation that appears in the fetus. The Y chromosome marker confirms the fetus is male. The above shows how genotyping would be performed at the DMD locus. If there is prior knowledge that the mother is a carrier, then phasing of the deletion with neighboring polymorphisms can be determined (see below), and these neighboring polymorphisms may also be used to verify whether the fetus also carries the deletion, and whether the fetus is male and therefore susceptible to the disease. This approach may be used to find both X-linked and autosomal dominant changes.
To determine if the fetus contains an inherited or sporadic mutation for one of the roughly 3,500 other disorders, including deletions, point mutations, or abnormal methylation, a more sophisticated analysis would be recommended. Sequence analysis readily determines presence of the recessive allele in both parents. If the mutation is different in the parents, it is possible to determine if the child is a compound heterozygote for the disease by evaluating cell-free DNA from the maternal serum. Obtaining the full answer from analysis of fetal DNA in the maternal serum may require a two-part assay. The first part is to establish phase for the maternal SNPs or polymorphisms in repeat regions that surround the disease gene. This may be accomplished by isolating high molecular weight DNA from white blood cells of the mother, or from saliva of the father.
Determining accurate haplotype or phase may be accomplished by using a variation of an approach developed by Complete Genomics (see Peters et al., “Accurate Whole-genome Sequencing and Haplotyping from 10 to 20 Human Cells,” Nature 487(7406):190-5 (2012), which is hereby incorporated by reference in its entirety). For the present application HMW DNA is distributed into 96 or 384 well plates such that there is less than one chromosome per well. Subsequently, whole genome amplification is used to determine which wells contain the chromosome, and then the phase of 96 neighboring SNPs or repetitive sequence polymorphisms to the maternal disease allele are determined for the gene in question. Once this is accomplished, one scores for presence of the disease allele from the father (as described in approach 4 above), and using sequencing, verifies that the chromosome that is inherited from the mother also contains a disease allele. Multiple approaches for capturing repetitive sequences of DNA throughout the genome are described herein, which may be used to identify polymorphisms from original or whole-genome-amplified DNA.
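To illustrate why limiting dilution separates the two homologs, the sketch below estimates well occupancy. It assumes, beyond what the text states, that fragments carrying a given locus are distributed into wells independently (Poisson loading). The function name and the example loading value of 0.2 copies per well are hypothetical.

```python
import math


def well_occupancy(mean_copies_per_well):
    """Return (P(exactly one homolog present), P(both homologs present)).

    Assumes each of the two homologs lands in a well independently, with a
    Poisson mean of mean_copies_per_well / 2 copies per homolog.
    """
    p_homolog_present = 1.0 - math.exp(-mean_copies_per_well / 2.0)
    p_one = 2.0 * p_homolog_present * (1.0 - p_homolog_present)
    p_both = p_homolog_present ** 2
    return p_one, p_both


p_one, p_both = well_occupancy(0.2)  # hypothetical loading value
print(f"single-homolog wells: {p_one:.3f}, mixed wells: {p_both:.3f}")
print(f"expected phase-informative wells in a 96-well plate: {96 * p_one:.1f}")
```

Only wells holding a single homolog report phase directly. Lowering the loading reduces mixed wells but also reduces the number of informative wells, so the loading is a trade-off to be tuned empirically.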
The ability to determine haplotypes from diploid genomes remains a very technically challenging and expensive task that essentially relies on the physical isolation of chromosomes or sub-chromosomal fragments prior to genotyping by sequencing or some other technology. The following describes a rapid and easy procedure for capturing two regions on the same strand of genomic DNA to allow determination of the haplotype structure defined by the two regions.
For each potential target, two oligonucleotides are hybridized simultaneously, each containing sequences complementary to upstream and downstream portions flanking each of two polymorphisms on the target. These polymorphisms may be tetranucleotide, trinucleotide, or dinucleotide repeats, or SNPs. The most informative polymorphisms are ones that are polymorphic in both the maternal chromosomes, as well as polymorphic in the father's chromosome that is transferred to the fetus. Polymerase is used to extend from each of the two 3′-ends to close the gap and determine the genotype of the polymorphisms located between the two pairs of flanking binding sequences. The two targeted polymorphisms need not necessarily be adjacent to each other, and can be separated by other polymorphisms. The distance between the two polymorphisms of interest is limited only by the probability of being bridged by the four binding sequences contained in the two oligonucleotides. Each of the oligonucleotides contains a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and an optional phosphate on the 5′ end.
Detailed Protocol for Determining the Haplotype Between Two Known Polymorphisms:
Variation 8.1 (see e.g.,
8.1.b. Denature target DNA (94° C. for 1 minute) in the presence of two oligonucleotides (the first comprising a 5′-sequence complementary to a unique region upstream of target Polymorphism #1, a unique identifier sequence, an optional patient identifier sequence, an optional primer binding sequence, and a 3′-sequence complementary to a unique region downstream of target Polymorphism #2, and the second comprising a 5′-sequence complementary to a unique region upstream of target Polymorphism #2, a unique identifier sequence, an optional patient identifier sequence, and a 3′-sequence complementary to a unique region downstream of target Polymorphism #1), and allow the oligonucleotides to hybridize to their complementary regions on the desired fragments by cooling to a desired temperature (e.g. 50° C. for 2 hours). Taq polymerase and/or KlenTaq (Taq polymerase lacking nuclease activity), thermostable ligase (preferably from strain AK16D), and dNTPs are either added subsequent to the annealing step, or at the start of the procedure. In the case where both oligonucleotide sequences have a 5′ phosphate, KlenTaq extends the 3′ end until it is directly adjacent to the ligation-competent 5′ end (left side of
8.1.c. Add Exonuclease I (digests single-stranded DNA in the 3′→5′ direction), and Exonuclease III (digests double-stranded DNA in the 3′→5′ direction), to digest all unligated or nicked products, leaving only the desired single-stranded circular DNA comprising the original target DNA, the linker sequence, the unique identifier sequence, an optional primer binding sequence, and an optional patient identifier sequence. This product is suitable for rolling circle amplification (using random hexamer primers with phi-29 polymerase) to create tandem repeats of the desired sequence, and subsequent identification of the targets using next generation sequencing, or alternatively, direct SMRT sequencing on the covalently closed template, using the primer binding sequence as a primer binding site.
Note 1: This approach also easily allows establishing haplotype (i.e. phase) with the actual disease-causing mutation. Primers are designed such that one of the polymorphic sites is the actual disease-causing mutation.
Note 2: In systems that use two bridging oligonucleotide probes, each containing unique identifier sequence, optional patient identifier sequence and optional primer binding site, it is recommended that only one but not both of the oligonucleotides contain a primer binding site (either the upstream or downstream oligonucleotide). For an additional level of false-positive detection, it is recommended that the unique identifier sequence be different in the upstream and downstream probes.
Note 3: It is important to maximize formation of ligation products where the two oligo binding sequences are actually joined by a contiguous stretch of DNA, and minimize formation of ligation products where the bridged fragments are not contiguous.
Consider the following example for illustrative purposes. Upstream region is designated “X” and has an AGAT tetranucleotide repeat, while downstream region is designated “Y” and has a CA dinucleotide repeat. The upstream region X is illustrated below as “X-up”, the upstream primer binding site, AGAT (n), the tetranucleotide repeat region, and “X-dn”, the downstream primer binding site. The downstream region Y is illustrated below as “Y-up”, the upstream primer binding site, CA (n), the dinucleotide repeat region, and “Y-dn”, the downstream primer binding site. In this example, the two regions are 10 kb apart, and the two Maternal chromosomes are of the form:
X-up AGAT(12) X-dn . . . (10 kb) . . . Y-up CA(23) Y-dn
X-up AGAT(16) X-dn . . . (10 kb) . . . Y-up CA(18) Y-dn
Then add Ligation-extension primers that bind to the lower strand (as in
(Left) 5′ X-dn-primer site—Identifier #1—Y-up 3′
(Right) 5′ Y-dn-identifier #2—X-up 3′
(In this example, the primer binding site is only in the left primer.)
Then the 2 possible ligation products defined by the haplotypes (linearized at the primer binding site so it's easier to follow) would be of the form:
primer site—Identifier #1—Y-up CA(23) Y-dn-identifier #2—X-up AGAT(12) X-dn
primer site—Identifier #1—Y-up CA(18) Y-dn-identifier #2—X-up AGAT(16) X-dn
One can maximize formation of the correct products arising from contiguous sites on the same chromosomal fragments by diluting the reaction, such that the probability of products arising from non-contiguous sites becomes infinitesimally small. Further, since the primer is in vast excess of the chromosomal DNA, if two fragments are not contiguous, then there is a far higher probability that each fragment will already have a “Left” and “Right” composite primer-binding to it (i.e. 4 primers binding to two separate fragments). Only when the two fragments are contiguous is there a higher probability the two regions will be bound together by one each of the “Left” and “Right” composite primers. It is a straightforward experiment to optimize yield of correct ligation products by generating a matrix of conditions of different target concentrations vs. different dilutions of ligation-extension primers.
If incorrect ligation products arise from two different chromosomal regions coming together, they would erroneously create products where CA(23) is together with AGAT(16), and CA(18) is together with AGAT(12).
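The distinction between correct and incorrect products can be made explicit by enumerating the repeat-length pairings. This is an illustrative sketch only: the tuple representation and the function name are assumptions, and the repeat lengths are those of the example above.

```python
from itertools import product


def classify_pairings(haplotypes):
    """Split all (AGAT, CA) repeat-length pairings into in-phase and chimeric.

    haplotypes: list of (agat_repeats, ca_repeats) tuples, one per chromosome.
    """
    in_phase = set(haplotypes)
    agat_alleles = {agat for agat, _ in haplotypes}
    ca_alleles = {ca for _, ca in haplotypes}
    chimeric = set(product(agat_alleles, ca_alleles)) - in_phase
    return in_phase, chimeric


maternal = [(12, 23), (16, 18)]  # AGAT(12)/CA(23) and AGAT(16)/CA(18)
in_phase, chimeric = classify_pairings(maternal)
for agat, ca in sorted(chimeric):
    print(f"chimeric product: CA({ca}) with AGAT({agat})")
```

Any sequenced product whose pairing falls in the chimeric set would have to have come from bridging of two non-contiguous fragments.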
As an example, consider a worst case scenario, where 80% of the ligation product arises from bridged fragments that are not contiguous. For simplicity, assume 1,000 genome equivalents. Note, if the product arises from bridged fragments, then there is an equal chance they will come from the same maternal chromosome as from the opposite ones (i.e. 400 each). But those products arising from contiguous DNA will only arise from the same maternal chromosome (i.e. 200 each). Thus, there should be the following combinations:
Even if these numbers fluctuated by 50 in the least favorable direction (equivalent to 90% of the ligation products arising from bridged fragments), it would still be straightforward to distinguish the haplotype at each chromosome as CA(23) and AGAT(12), as well as CA(18) and AGAT(16).
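The worst-case arithmetic can be checked with a short calculation. This sketch reads the figures above as 1,000 genome equivalents contributing 2,000 chromosome copies, with bridged (non-contiguous) products spread evenly over all four repeat pairings and contiguous products split between the two true haplotypes. That reading, and the function name, are assumptions.

```python
def expected_product_counts(genome_equivalents, bridged_fraction):
    """Expected ligation products per (CA, AGAT) pairing.

    Returns (count per true haplotype pairing, count per false pairing).
    """
    total = 2 * genome_equivalents        # two maternal chromosomes each
    bridged = total * bridged_fraction    # non-contiguous, random pairing
    contiguous = total - bridged          # always in phase
    per_false_pairing = bridged / 4       # 4 equally likely pairings
    per_true_pairing = bridged / 4 + contiguous / 2
    return per_true_pairing, per_false_pairing


for fraction in (0.8, 0.9):
    true_count, false_count = expected_product_counts(1000, fraction)
    print(f"{fraction:.0%} bridged: true pairing {true_count:.0f}, "
          f"false pairing {false_count:.0f}")
```

With this reading, 80% bridging gives 600 versus 400 products per pairing and 90% bridging gives 550 versus 450, a shift of 50 in each count, so the true haplotypes remain the majority pairings in both scenarios.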
Alternatively, in a second approach, the disease genes may be divided into the 20 most common inherited diseases, as well as autosomal dominant diseases, and then divided into 17 groups of less commonly mutated sequences covering an average of 200 genes each. Each group of genes would be covered by sets of capture probes, and then depending on the results from the parental sequencing analysis, the maternal blood would be given proper patient identifiers and evaluated on one or more of the 17 specialty probe capture sets.
The first of the above approaches will identify both inherited and sporadic mutations, as well as determine if the fetus inherited a mutation-bearing region from the mother. This approach should also be able to determine the presence of deletions for x-linked inherited diseases, other chromosomal deletions, aberrant methylation in the fetus, diseases arising from triplet repeats, and diseases arising from chromosomal translocations or other rearrangements.
The second approach will identify disease conditions for the genes interrogated. The key issue will be how important it is for the family to get the right answer. It is straightforward to determine if both parents are carriers, and if the mutations are different, relatively straightforward to determine if the father's disease allele is present in the fetus. If it is absent, then the fetus will be either disease free or a carrier. If it is present, then the chances of inheriting the maternal allele and getting the disease are 50%. If haplotype for the maternal allele has been determined, then haplotype markers may be used to verify presence or absence of the inherited maternal allele. It may also be prudent to do an amniocentesis and directly test for the presence of the maternal allele. The current recommendation is to sequence the gene as outlined above, and score for the paternal disease allele. If present, or if the paternal and maternal disease-specific mutations are identical, then the physician recommends amniocentesis.
The methods of the present invention can also be used for non-invasive prenatal diagnosis and preimplantation genetic diagnosis (PGD) of unbalanced chromosomal translocations. Individuals who carry chromosomal translocations are at increased risk for infertility, miscarriage, stillbirth, and/or having a child with birth defects. Preimplantation genetic diagnosis is able to distinguish between embryos that have the correct amount of genetic material (balanced/normal) and embryos that are missing genetic material as a result of the translocation (unbalanced). Many couples in which one member is a translocation carrier have experienced miscarriages or have had to face difficult decisions when learning about a pregnancy with an unbalanced set of chromosomes. PGD based on the methods of the present invention would reduce the likelihood of having to deal with these particular circumstances by knowing prior to conception that the embryo(s) transferred have balanced chromosomal translocations.
The approach of scoring chromosome-specific SNPs or repetitive sequence polymorphisms may also be employed in pre-implantation screening. In pregnancy, only trisomy 21, 18, and 13 (as well as X and Y chromosome anomalies) come to term. In contrast, with in vitro fertilization, other chromosomal abnormalities may still allow for growth to the 16-cell, 32-cell, or higher cell number stage. Consequently, polymorphisms throughout the genome will be needed to account for both proper copy number and loss or gain of chromosomal regions. The greater the number of polymorphisms being interrogated, the finer the copy number changes that can be determined.
Prophetic Example 9—Paternity Testing of the Fetus (e.g. Prenatal Care Support)
Overview: The basic approach is to look for the presence of alleles present in the father, but absent in the mother. There are three general ways to approach this. In a first approach, one can start with SNPs where the common allele has a frequency around 70-75%, so that there is about a 50% chance the mother is homozygous for the major allele. One starts with about 48 SNPs; for about half of them (24) the mother will be homozygous for the common allele, and at each of these there is about a 50% chance the father will be either heterozygous or homozygous for the minority allele. One simply scores for the presence of the minority allele in the maternal blood, similar to looking for mutations, but one also quantifies the amount present to confirm it is a minority allele from the father. A second approach is to start with alleles with a frequency around 50%; then there is a 50% chance the mother is homozygous for one of the alleles, and a 75% chance the father will have the other allele at that position. Each position is a little less informative in differentiating between fathers, but more positions will be informative. A third approach is to use repetitive sequence polymorphisms, where there is a high degree of polymorphism, such that the father's allele has a high probability of being different from either of the mother's alleles at a given position. This would require the fewest alleles. Under all these conditions, care must be taken by the physician to respect the mother's privacy in case the husband is not the father of the fetus (non-paternity).
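The percentages quoted for the first two approaches can be reproduced with a short calculation. The sketch below assumes biallelic SNPs, unrelated parents, and genotype frequencies at Hardy-Weinberg equilibrium; these assumptions, and the function names, are illustrative and are not stated in the text above.

```python
def mother_homozygous(p: float, either_allele: bool = False) -> float:
    """Chance the mother is homozygous at a SNP with major-allele frequency p.

    With either_allele=False only homozygosity for the major allele counts
    (first approach); with either_allele=True both homozygous genotypes
    count (second approach).
    """
    q = 1.0 - p
    return p * p + q * q if either_allele else p * p


def father_has_other_allele(p_mother_allele: float) -> float:
    """Chance the father carries at least one copy of the allele the
    homozygous mother lacks, given her allele has this frequency."""
    return 1.0 - p_mother_allele * p_mother_allele


def informative_fraction(p: float, either_allele: bool = False) -> float:
    """Fraction of SNPs where the mother is homozygous and the father
    carries an allele she lacks."""
    q = 1.0 - p
    fraction = p * p * father_has_other_allele(p)
    if either_allele:
        fraction += q * q * father_has_other_allele(q)
    return fraction


def expected_informative(n_snps: int, p: float,
                         either_allele: bool = False) -> float:
    """Expected number of informative positions in a panel of n_snps."""
    return n_snps * informative_fraction(p, either_allele)
```

Under these assumptions, a 48-SNP panel gives roughly 12 informative positions at a major-allele frequency of 0.7 (first approach) and 18 at a frequency of 0.5 (second approach), which is consistent with the statement that the second approach yields more informative positions. Whether the fetus actually inherits the informative paternal allele is a separate question not modeled here.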
Although preferred embodiments have been depicted and described in detail herein, it will be apparent to those skilled in the relevant art that various modifications, additions, substitutions, and the like can be made without departing from the spirit of the invention and these are therefore considered to be within the scope of the invention as defined in the claims which follow.
Claims
1.-82. (canceled)
83. A method for sequencing a plurality of target nucleic acid molecules, said method comprising:
- providing a sample containing a plurality of target nucleic acid molecules, wherein said target nucleic acid molecules have 3′ and 5′ ends;
- appending a 3′ linker sequence and a 5′ linker sequence to the 3′ and 5′ ends of the plurality of target nucleic acid molecules, respectively, to form 3′ and 5′ linker-appended target nucleic acid molecules, wherein said 3′ and 5′ linker sequences collectively comprise (i) one or more patient identifier sequences, (ii) a first solid support primer-specific portion, (iii) a second solid support primer-specific portion, (iv) one or more sequencing primer binding sequences, or any combination of (i), (ii), (iii), and (iv);
- providing one or more first oligonucleotide probes, each first oligonucleotide probe comprising a portion complementary to the 3′ linker-appended target nucleic acid molecules, and a portion complementary to the 5′ linker-appended target nucleic acid molecules;
- hybridizing the 3′ and 5′ linker-appended target nucleic acid molecules and the one or more first oligonucleotide probes, wherein the first oligonucleotide probes hybridize in a base specific manner to complementary sequences of the 3′ and 5′ linker-appended target nucleic acid molecules to form one or more ligation-competent junctions suitable for coupling 3′ and 5′ ends of the 3′ and 5′ linker-appended target nucleic acid molecules;
- ligating the 3′ and 5′ linker-appended target nucleic acid molecules at the one or more ligation-competent junctions to form a plurality of circular chimeric single-stranded nucleic acid constructs;
- providing a solid support on which a plurality of first oligonucleotide primers is immobilized, each first oligonucleotide primer comprising a nucleotide sequence that is complementary to the first solid support primer-specific portion of the plurality of circular chimeric single-stranded nucleic acid constructs;
- hybridizing the plurality of circular chimeric single-stranded nucleic acid constructs to the plurality of first oligonucleotide primers immobilized on the solid support;
- blending the plurality of circular chimeric single-stranded nucleic acid constructs hybridized to the plurality of first oligonucleotide primers immobilized on the solid support with a polymerase to form a rolling circle amplification reaction mixture;
- subjecting the rolling circle amplification reaction mixture to an extension treatment where the polymerase extends the one or more hybridized oligonucleotide primers to produce a plurality of primary extension products, each primary extension product comprising one or more tandem linear sequences, wherein each tandem linear sequence is complementary to one of the circular chimeric single-stranded nucleic acid constructs in the collection; and
- sequencing the circular chimeric single-stranded nucleic acid constructs directly or said plurality of primary extension products thereof.
84. The method of claim 83 further comprising:
- removing unligated target nucleic acid molecules, first oligonucleotide probes, and other non-circularized nucleic acid molecules from the sample after said ligating.
85. The method of claim 83, wherein the polymerase is a strand-displacing polymerase.
86. The method of claim 83 further comprising:
- providing a collection of crosslinking oligonucleotides, each crosslinking oligonucleotide comprising two or more repeats of a nucleotide sequence, wherein said nucleotide sequence is the same as at least a portion of the nucleotide sequence of the circular chimeric single-stranded nucleic acid constructs of the collection; and
- capturing one or more tandem linear sequences of a primary extension product on a crosslinking oligonucleotide of the collection, thereby condensing the primary extension product into a compact structure prior to said sequencing.
87. The method of claim 83, wherein said sequencing comprises:
- hybridizing a sequencing primer to each tandem linear sequence of the primary extension product on the solid support;
- extending the sequencing primer; and
- sequencing each tandem linear sequence of the primary extension product on the solid support based on said extending.
88. The method of claim 83, wherein said sequencing is carried out using a method selected from the group consisting of fluorescent primer hybridization, molecular beacon hybridization, primer extension, exonuclease-based sequencing, ligase detection reaction, ligase chain reaction, pyrosequencing, fluorescence-based sequencing-by-synthesis, fluorescence-based sequencing-by-ligation, nanopore based sequencing, ion-based sequencing-by-synthesis, and ion-based sequencing-by-ligation.
89. The method of claim 83, wherein the solid support further comprises:
- a plurality of second oligonucleotide primers immobilized on the solid support, each second oligonucleotide primer comprising a nucleotide sequence that is the same as the second solid support primer-specific portion of the chimeric single stranded nucleic acid constructs of the collection.
90. The method of claim 89, wherein two or more second oligonucleotide primers immobilized on the solid support hybridize to two or more tandem linear sequences of a primary extension product, said method further comprising:
- extending the two or more hybridized second oligonucleotide primers using a polymerase to form a plurality of immobilized secondary extension products, each secondary extension product comprising a nucleotide sequence that is complementary to a portion of the primary extension product; and
- removing the primary extension products from the solid support, whereby said sequencing involves sequencing the plurality of immobilized secondary extension products.
91. A method for sequencing a plurality of target nucleic acid molecules, said method comprising:
- providing a sample containing a plurality of target nucleic acid molecules, wherein said target nucleic acid molecules have 3′ and 5′ ends;
- appending a 3′ linker sequence and a 5′ linker sequence to the 3′ and 5′ ends of the plurality of target nucleic acid molecules, respectively, to form one or more 3′ and 5′ linker-appended target nucleic acid molecules, wherein said 3′ and 5′ linker sequences collectively comprise (i) one or more patient identifier sequences, (ii) one or more sequencing primer binding sequences, or both (i) and (ii);
- providing (a) one or more first oligonucleotide probes, wherein the first oligonucleotide probes comprise a portion complementary to the 3′ linker-appended target nucleic acid molecules, a portion complementary to the 5′ linker-appended target nucleic acid molecules, and a further portion comprising a first solid support primer-specific portion and optionally a second solid support primer-specific portion and (b) one or more second oligonucleotide probes comprising a nucleotide sequence complementary to the further portion of the first oligonucleotide probe(s);
- hybridizing the 3′ and 5′ linker-appended target nucleic acid molecules, the one or more first oligonucleotide probes, and the one or more second oligonucleotide probes, under conditions effective for the one or more first oligonucleotide probes to hybridize in a base specific manner to complementary sequences of the 3′ and 5′ linker-appended target nucleic acid molecules and to hybridize to the second oligonucleotide probe to form one or more ligation competent junctions suitable for coupling 3′ and 5′ ends of the 3′ and 5′ linker-appended target nucleic acid molecules with the second oligonucleotide probe;
- ligating the 3′ and 5′ linker-appended target nucleic acid molecules at the one or more ligation junctions to form a plurality of circular chimeric single-stranded nucleic acid constructs;
- providing a solid support on which a plurality of first oligonucleotide primers is immobilized, each first oligonucleotide primer comprising a nucleotide sequence that is complementary to the first solid support primer-specific portion of the plurality of circular chimeric single-stranded nucleic acid constructs;
- hybridizing the plurality of circular chimeric single-stranded nucleic acid constructs to the plurality of first oligonucleotide primers immobilized on the solid support;
- blending the plurality of circular chimeric single-stranded nucleic acid constructs hybridized to the plurality of first oligonucleotide primers immobilized on the solid support with a polymerase to form a rolling circle amplification reaction mixture;
- subjecting the rolling circle amplification reaction mixture to an extension treatment where the polymerase extends the one or more hybridized oligonucleotide primers to produce a plurality of primary extension products, each primary extension product comprising one or more tandem linear sequences, wherein each tandem linear sequence is complementary to one of the circular chimeric single-stranded nucleic acid constructs in the collection; and
- sequencing the circular chimeric single-stranded nucleic acid constructs directly or said plurality of primary extension products thereof.
92. The method of claim 91 further comprising:
- removing unligated target nucleic acid molecules, first oligonucleotide probes, and other non-circularized nucleic acid molecules from the sample after said ligating.
93. The method of claim 91, wherein the polymerase is a strand-displacing polymerase.
94. The method of claim 91 further comprising:
- providing a collection of crosslinking oligonucleotides, each crosslinking oligonucleotide comprising two or more repeats of a nucleotide sequence, wherein said nucleotide sequence is the same as at least a portion of the nucleotide sequence of the circular chimeric single-stranded nucleic acid constructs of the collection; and
- capturing one or more tandem linear sequences of a primary extension product on a crosslinking oligonucleotide of the collection, thereby condensing the primary extension product into a compact structure prior to said sequencing.
95. The method of claim 91, wherein said sequencing comprises:
- hybridizing a sequencing primer to each tandem linear sequence of the primary extension product on the solid support;
- extending the sequencing primer; and
- sequencing each tandem linear sequence of the primary extension product on the solid support based on said extending.
96. The method of claim 91, wherein said sequencing is carried out using a method selected from the group consisting of fluorescent primer hybridization, molecular beacon hybridization, primer extension, exonuclease-based sequencing, ligase detection reaction, ligase chain reaction, pyrosequencing, fluorescence-based sequencing-by-synthesis, fluorescence-based sequencing-by-ligation, nanopore based sequencing, ion-based sequencing-by-synthesis, and ion-based sequencing-by-ligation.
97. The method of claim 91, wherein the solid support further comprises:
- a plurality of second oligonucleotide primers immobilized on the solid support, each second oligonucleotide primer comprising a nucleotide sequence that is the same as the second solid support primer-specific portion of the chimeric single stranded nucleic acid constructs of the collection.
98. The method of claim 97, wherein two or more second oligonucleotide primers immobilized on the solid support hybridize to two or more tandem linear sequences of a primary extension product, said method further comprising:
- extending the two or more hybridized second oligonucleotide primers using a polymerase to form a plurality of immobilized secondary extension products, each secondary extension product comprising a nucleotide sequence that is complementary to a portion of the primary extension product; and
- removing the primary extension products from the solid support, whereby said sequencing involves sequencing the plurality of immobilized secondary extension products.
99. A system comprising:
- a collection of different chimeric nucleic acid constructs, each construct comprising: one or more single-stranded linker-appended nucleic acid molecules comprising a target nucleic acid molecule from a host organism, wherein said single-stranded linker-appended nucleic acid molecule has 3′ and 5′ ends; wherein the 3′ and 5′ ends of said single-stranded linker-appended nucleic acid molecule have appended 3′ linker sequences and appended 5′ linker sequences on the 3′ and 5′ ends, respectively, which are exogenous to the host organism; and wherein said 3′ and 5′ linker sequences collectively comprise (i) one or more patient identifier sequences, (ii) a first solid support primer-specific portion, (iii) a second solid support primer-specific portion, (iv) one or more sequencing primer binding sequences, or any combination of (i), (ii), (iii), and (iv); one or more first oligonucleotide probes hybridized to the single-stranded linker-appended nucleic acid molecule(s), wherein the one or more first oligonucleotide probes comprises a nucleotide sequence that is exogenous to the host organism, said nucleotide sequence comprising a portion complementary to the 3′ end of the linker-appended nucleic acid molecule(s), and a portion complementary to the 5′ end of the linker-appended nucleic acid molecule(s), wherein there is a ligation competent junction between the 3′ end of the linker-appended nucleic acid molecule(s) and the 5′ end of the linker-appended nucleic acid molecule(s).
100. A system comprising:
- a collection of different chimeric nucleic acid constructs, each construct comprising: one or more single-stranded linker-appended nucleic acid molecules comprising a target nucleic acid molecule from a host organism, wherein said single-stranded linker-appended nucleic acid molecule has 3′ and 5′ ends; wherein the 3′ and 5′ ends of said single-stranded linker-appended nucleic acid molecule have appended 3′ linker sequences and appended 5′ linker sequences on the 3′ and 5′ ends, respectively, which are exogenous to the host organism; and wherein said 3′ and 5′ linker sequences collectively comprise (i) one or more patient identifier sequences, (ii) one or more sequencing primer binding sequences, or both; one or more first oligonucleotide probes hybridized to the single-stranded linker-appended nucleic acid molecule(s), wherein the one or more first oligonucleotide probes comprise a nucleotide sequence that is exogenous to the host organism, said nucleotide sequence comprising a portion complementary to the 3′ sequence of the linker-appended nucleic acid molecule(s), a portion complementary to the 5′ sequence of the linker-appended nucleic acid molecule(s), and a further portion comprising a first solid support primer-specific portion and optionally a second solid support primer-specific portion; one or more second oligonucleotide probes hybridized to the further portion of the first oligonucleotide probe, wherein the second oligonucleotide probe comprises a sequence complementary to the further portion of the first oligonucleotide probe, wherein the 3′ and 5′ ends of the second oligonucleotide probe are adjacent to the 5′ and 3′ ends of the single-stranded linker-appended nucleic acid molecule with a junction suitable for ligation between them.
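By way of illustration only, and not as part of the claims, the rolling circle extension step recited in claims 83 and 91 can be modeled on plain strings. The sketch assumes an idealized polymerase that copies whole laps of the circle without error; the function names and the example sequences a user would supply are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle_product(circle: str, primer: str, laps: int) -> str:
    """Model the primary extension product of rolling circle amplification.

    circle : sequence of the circular single-stranded construct, written
             5'->3' from an arbitrary start (its two ends are joined).
    primer : immobilized primer, complementary to a portion of the circle.
    laps   : number of complete trips the polymerase makes around the circle.

    Returns the product 5'->3': the primer followed by tandem copies that
    are complementary to the circle.
    """
    site = reverse_complement(primer)       # region of the circle bound by the primer
    start = (circle + circle).find(site)    # doubled string handles wrap-around sites
    if start == -1:
        raise ValueError("primer does not hybridize to the circle")
    end = (start + len(site)) % len(circle)
    # Rotate the circle so that it ends with the primer-binding site; the
    # reverse complement of that rotation then begins with the primer.
    rotated = circle[end:] + circle[:end]
    unit = reverse_complement(rotated)
    return unit * laps
```

Each repeat unit in the returned string is the reverse complement of one rotation of the circle, which corresponds to the "tandem linear sequences" that are complementary to the circular construct.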
Type: Application
Filed: Oct 7, 2022
Publication Date: Sep 21, 2023
Inventors: FRANCIS BARANY (New York, NY), John William Efcavitch (San Carlos, CA)
Application Number: 18/045,032