Methods and Uses of Introducing Mutations into Genetic Material for Genome Assembly
Methods are provided for sequencing and assembling a nucleic acid sequence from a nucleic acid sample containing repetitive or low-information regions, which are typically difficult to sequence and/or assemble. The methods introduce mutations into the sample to increase sequence diversity between the repetitive regions present in the nucleic acid sample. This sequence diversity allows each segment to assemble independently of different but similar sequences present in the nucleic acid sample.
The current application claims priority to U.S. Provisional Patent Application No. 62/751,469 entitled “Methods and Uses of Introducing Mutations into Genetic Material for Genome Assembly” to Endlich et al., filed Oct. 26, 2018, the disclosure of which is hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
The present invention is directed to nucleic acid sequencing, including methods and applications thereof, and more particularly to genome sequencing of organisms, including organisms possessing complex genomes that are traditionally difficult to sequence and assemble. The present invention is also directed to methods of assembling genomic sequences derived from complex genomes.
BACKGROUND OF THE INVENTION
The cost of nucleic acid sequencing has decreased dramatically as sequencing and data analysis technologies have improved. This cost reduction has made individualized or personalized medicine based on a person's own genetic sequence attainable. However, typical biological sequences (such as genome sequences) can have repetitive and low-information regions that make assembly a much more difficult and often an impossible task. In large eukaryotic genomes, such as human and plant genomes, these problems can be especially severe and make the assembly of these genomes (or genomic subregions) much more difficult than the assembly of smaller, more information-rich genomes, such as that of Escherichia coli. The most obvious way to combat these difficulties is to increase read length. Unfortunately, contemporary sequencing platforms capable of long reads are accompanied by very high error rates (as compared to short-read platforms), in addition to limiting sample requirements, such as large amounts of input DNA. Thus, alternative methods that allow genome sequencing and assembly across repetitive and low-information regions will improve genome assembly.
SUMMARY OF THE INVENTION
Methods and uses of introducing mutations into genetic material for genome sequencing and assembly are disclosed.
In one embodiment, a method of assembling a genome sequence includes obtaining a nucleic acid sample, mutating the nucleic acid sample, sequencing the mutated nucleic acid sample, and assembling the sequenced mutated nucleic acid sample to build a genome sequence.
In a further embodiment, the method further includes performing size selection on the nucleic acid sample to select a desired size of fragments and generating a sequencing library for the size selected nucleic acid sample, and the mutating step is accomplished by performing a mutagenic reaction on the nucleic acid sample.
In another embodiment, the method further includes the steps of quantifying the size selected nucleic acid sample, and changing the concentration of the size selected nucleic acid sample to a desired concentration.
In a still further embodiment, the changing the concentration step comprises diluting the size selected nucleic acid sample.
In still another embodiment, the method further includes the step of amplifying the size selected nucleic acid sample to generate additional copies of the size selected nucleic acid sample.
In a yet further embodiment, the amplifying step uses a multiple strand displacement amplification reaction.
In yet another embodiment, the amplifying step uses approximately 0.5-10 ng of input nucleic acid.
In a further embodiment again, the introducing mutations step of the method includes performing a multiple displacement amplification reaction using a nucleotide analog.
In another embodiment again, the multiple strand displacement amplification reaction uses Phi29 DNA polymerase.
In a further additional embodiment, the nucleotide analog is selected from the group consisting of deoxy-inosine triphosphate, deoxy-8-oxoguanine triphosphate, and deoxy-2′-Deoxy-P-nucleoside triphosphate.
In another additional embodiment, the nucleotide analog is deoxy-2′-Deoxy-P-nucleoside triphosphate.
In a still yet further embodiment, the generating a sequencing library step generates a sequencing library for an Illumina sequencing platform, and the sequencing step uses an Illumina sequencing platform to sequence the nucleic acid sample.
In still yet another embodiment, a method for producing a sequencing library includes obtaining a template nucleic acid, introducing mutations into the template nucleic acid to create a mutated sample, and generating a sequencing library from the mutated sample.
In a still further embodiment again, the mutations are introduced via a multiple strand displacement amplification.
In still another embodiment again, the multiple strand displacement amplification incorporates a nucleotide analog during the amplification.
In a still further additional embodiment, the nucleotide analog is selected from the group consisting of deoxy-inosine triphosphate, deoxy-8-oxoguanine triphosphate, and deoxy-2′-Deoxy-P-nucleoside triphosphate.
In still another additional embodiment, the multiple strand displacement amplification uses Phi29 DNA polymerase and the nucleotide analog is deoxy-2′-Deoxy-P-nucleoside triphosphate.
In a yet further embodiment again, the mutations are introduced using a chemical mutagen.
In yet another embodiment again, the method further includes amplifying the mutated sample and the sequencing library is generated from the amplified mutated sample.
In a yet further additional embodiment, the method further includes size selecting the mutated sample and quantifying the size selected mutated sample.
These and other features and advantages of the present invention will be better understood by reference to the following detailed description when considered in conjunction with the accompanying drawings where:
Turning now to the diagrams and figures, embodiments of the invention generally directed to genome sequencing and assembly are illustrated. In various embodiments, mutations are introduced into isolated DNA. In some embodiments, the introduced mutations create differences in the isolated DNA that yield unique templates. In certain embodiments, the unique templates allow for the assembly of repetitive regions. In some embodiments, the mutated template is created using a nucleotide analog incorporated into replicated DNA. In some such embodiments, the nucleotide analog is dPTP.
Current genome sequencing typically uses “shotgun” sequencing, where small segments of nucleic acids (typically 100-200 base pairs each) are sequenced and assembled into larger sequences based on sequence similarity between segments. However, many genomes contain large, repetitive regions throughout the genome. Sequence similarity in these repetitive regions is very high, and the small segments are not long enough to span many of these repetitive regions. As such, many of the smaller segments assemble to sequences that are similar, but may not be in the proper location of the genome. This mis-assembly can result in large gaps in an assembled genome. Additionally, some of these regions are difficult to sequence because of inherent biases in current sequencing platforms. Since sequencing these regions is not always possible, a full, definitive assembly cannot be built, which results in additional gaps in an assembled genome. Because of these gaps, assembled genomes are missing key pieces of information about the structure of a genome, including structural abnormalities, such as insertions, deletions, and translocations, as well as physical distances between certain regions, including distances between genes and elements capable of regulating the gene, such as a promoter, an enhancer, an insulator, or other regulatory region. Some genetic diseases are linked to structural variation; thus, it is important to unlock the knowledge and information contained in the regions that are difficult to sequence and/or assemble.
Some current methodologies to sequence across these regions use long-range sequencing, such as the PacBio platform, which can generate longer sequencing reads. However, these platforms possess large error rates, which reduce accuracy and can result in a poor-quality genome assembly. Additionally, these platforms require large quantities of input nucleic acids (e.g., DNA) in order to build sequencing libraries. Thus, to assemble a genome fully and accurately, new methods must be developed that overcome the difficulties in the genome itself and the biases in sequencing platforms.
Embodiments of the present invention overcome these challenges by introducing random mutations into a nucleic acid sample to increase sequence diversity between regions that show high levels of sequence similarity. By doing so, the mutated sequences are readable by a nucleic acid sequencer and can be assembled using sequence assembly software. Embodiments disclosed herein demonstrate techniques to insert stochastic mutations at a tunable rate that is large enough that difficult-to-assemble regions become assemblable.
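The effect of random mutation on repeat distinguishability can be illustrated with a small simulation. The following is a minimal sketch and not the disclosed wet-lab method: it assumes a transition-only mutation model, a 5% per-base rate, and a synthetic 500-base repeat unit, all of which are illustrative choices. The helper names `mutate` and `hamming` are hypothetical.

```python
import random

# Assumed model: each base is swapped for its transition partner
# (A<->G, C<->T) with a fixed per-base probability.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def mutate(seq, rate, rng):
    """Return a copy of seq with each base mutated with probability `rate`."""
    return "".join(TRANSITION[b] if rng.random() < rate else b for b in seq)


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
repeat_unit = "".join(rng.choice("ACGT") for _ in range(500))

# Three identical copies of a repeat: indistinguishable before mutation.
copies = [repeat_unit, repeat_unit, repeat_unit]
mutated = [mutate(c, 0.05, rng) for c in copies]

pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
before = [hamming(copies[i], copies[j]) for i, j in pairs]
after = [hamming(mutated[i], mutated[j]) for i, j in pairs]
print("pairwise differences before:", before)
print("pairwise differences after: ", after)
```

Before mutation the copies differ at zero positions; after independent mutation each pair differs at some positions, which is the kind of diversity an assembler can use to keep the copies apart.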
In process 100, an initial step is to obtain a nucleic acid sample 102. In embodiments, the nucleic acid sample is deoxyribonucleic acid (DNA), while in other embodiments, the nucleic acid sample is ribonucleic acid (RNA). In embodiments, the nucleic acid sample is genomic DNA, while other embodiments will obtain a smaller fragment of DNA, such as a plastid DNA, mitochondrial DNA, or DNA isolated in the form of a plasmid, a fosmid, a cosmid, a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), and/or any other sub-genome segment of DNA.
Additionally, at step 102, the DNA may be obtained in any number of ways. In certain embodiments, DNA will be obtained from amplifying DNA present in an environment without an isolation step, while certain embodiments will enrich specific sequences (e.g., targeted enrichment). In certain embodiments, the DNA is isolated from an entire organism or a subpart of an organism, such as an organ or tissue. Isolated DNA can be obtained in any suitable method to isolate DNA, such as using a published protocol or a commercial DNA isolation kit. In some embodiments, the DNA may already be isolated, and the obtaining DNA step 102 is merely to select or remove a portion of the DNA sample for further use in the process 100.
At step 104, mutations are introduced into the DNA obtained in step 102. At this step, the DNA is mutated to increase sequence differences in the obtained DNA. Mutations may be introduced in any number of ways, such as with chemical mutagens, ionizing mutagens, or biochemical mutagens.
At step 106, the mutated nucleic acid is sequenced in some embodiments. At sequencing step 106, sequencing can occur via map-based sequencing, such that sequence is read in an organized fashion, or via shotgun sequencing, where a large number of fragments are sequenced and then assembled at a later step (e.g., step 108, described below). Sequencing at step 106 can be to any suitable depth for genome assembly, such that some embodiments will sequence to a depth of approximately 1x, where x is the size of the template or reference nucleic acid. In some embodiments, a greater depth may be necessary to fully assemble the sequence. Thus, various embodiments will sequence the sample to a depth of approximately 2x, approximately 3x, approximately 4x, approximately 5x, approximately 10x, approximately 15x, approximately 20x, approximately 25x, approximately 30x, approximately 40x, approximately 50x, or approximately 100x.
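The relationship between depth, read count, and template size can be sketched as simple arithmetic. This is an illustrative calculation that assumes uniform read length and ignores unmapped or discarded reads; the function names and the example numbers are hypothetical.

```python
import math


def reads_for_depth(template_size_bp, read_length_bp, depth):
    """Number of reads needed so that total sequenced bases = depth x template size."""
    return math.ceil(depth * template_size_bp / read_length_bp)


def depth_from_reads(n_reads, read_length_bp, template_size_bp):
    """Average depth achieved by n_reads reads of a given length."""
    return n_reads * read_length_bp / template_size_bp


# Example with assumed values: a 1 Mb template sequenced with 100-base reads.
n = reads_for_depth(1_000_000, 100, 30)
print(n, depth_from_reads(n, 100, 1_000_000))
```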
Sequencing step 106 can be performed on any suitable sequencing platform or platforms suitable for further assembly. In some embodiments, sequencing is performed on a platform such as an ABI 3730, an ABI SOLiD, an Illumina HiSeq, an Illumina MiSeq, an Illumina MiniSeq, an Illumina iSeq, an Illumina NextSeq, an Illumina NovaSeq, an MGISEQ-T7, a Roche 454, an Ion Torrent PGM, an Ion Torrent Proton, a Helicos platform, a Pacific Biosciences RSII, a Pacific Biosciences Sequel, an Oxford Nanopore MinION, an Oxford Nanopore GridION, an Oxford Nanopore PromethION, and/or a combination thereof. To sequence the mutated nucleic acid, various embodiments will generate a sequencing library suitable for the sequencing platform or platforms performing the sequencing. In some embodiments, the sequencing library will be built for single-end sequencing, while certain embodiments will build a library for paired-end sequencing, and various embodiments will build a library for mate-pair sequencing.
Further, process 100 assembles the nucleic acid sequence at step 108. In some embodiments, assembly is performed as a single step with the entire reference being assembled at once, while other embodiments will perform multiple rounds of assembly to allow for fragments to assemble, which are then assembled into a full reference sequence.
At step 108, assembly can use one or more algorithms or software packages suited to the needs of the genome and the sequencing approach, such as short read sequencing (e.g., 100-300 base reads) or long read sequencing (10,000+ base reads). For example, various embodiments will use AFEAP cloning Lasergene Genomics Suite, DNASTAR Lasergene Genomics Suite, Newbler, Phrap, Plass, SPAdes, Velvet, HGAP, Falcon, Canu, MaSuRCA, Hinge, ABySS, Bowtie, and/or a combination thereof, as suitable for the read length or fragment size. In embodiments where short reads are assembled into larger sequences before a full assembly, a short-read assembler is used to assemble short reads into larger fragments, followed by a long-read assembler to assemble the larger fragments into a full reference sequence. Some embodiments using a combination of short- and long-read assemblers will use at least one of ABySS and SPAdes for short-read assembly and Canu for long-read assembly.
While process 100 generally describes the process of various embodiments disclosed herein, certain embodiments will include additional, specific steps as part of some of the steps described above, which are described in depth below.
Additionally, the above steps of the flow diagram of
Turning now to
In embodiments utilizing chemical mutagenesis, the nucleic acid sample is exposed to a chemical mutagen, which alters a base and/or an interaction between bases, which in turn allows a different base to be introduced into the nucleic acid. Various embodiments will use ethyleneimine (EI), nitrogen mustard, sulphur mustard, sodium bisulfite, diethylnitrosamine (DMN), diethylsulphonate (DES), nitrosomethylurea (NMU), ethyleneoxide (EO), diepoxybutane (DEB), methylmethanesulphonate (MMS), ethylmethanesulphonate (EMS), nitrous acid, maleic hydrazide, hydroxylamine, and/or combinations thereof to alter the nucleic acid. In various embodiments, alternate bases are introduced during replication of the sample nucleic acid, based on the alteration caused by one or more of the above mutagens.
Similar to chemical mutagenesis, certain embodiments will use ionizing radiation to alter bases and/or base interactions. Various embodiments will expose the nucleic acid sample to radiation, such as UV radiation, gamma radiation, alpha particles, beta particles, and/or combinations thereof to create base pair changes in the nucleic acid sample. As noted above, alternate bases are introduced during replication of the sample nucleic acid, based on the alteration caused by one or more of the above ionizing radiation methods.
Further, some embodiments will utilize biochemical mutagenesis, which utilizes DNA replication machinery, such as a polymerase, to introduce mutations. In certain embodiments, an error-prone polymerase will be used to introduce base pair mismatches. Additional embodiments will utilize nucleotide analogs to substitute for bases during replication, which allow for different bases to be introduced during replication of the sample nucleic acid, thus creating a mutated version of the sample nucleic acid. These biochemical methods can utilize techniques, such as polymerase chain reaction (PCR), multiple strand displacement amplification (MDA) methods, rolling circle amplification (RCA), and any other known method of amplifying and/or replicating nucleic acids in vitro. Because certain embodiments will introduce mutations using amplification and/or replication reactions, a smaller amount of input nucleic acid can be utilized as compared to long-read methodologies. As such, a number of embodiments will use approximately nanogram levels of input nucleic acid, rather than the microgram starting amounts of nucleic acid in long-read platforms. For example, several embodiments will use approximately 0.5-10 ng of starting nucleic acid, while additional embodiments will use less than 0.5 ng of starting nucleic acid.
Various embodiments will use 5-fluoro uracil, 5-iodo deoxyuridine, 6-mercaptopurine, 6-thioguanine, 8-azaguanine, 5-azauridine, 6-azauridine, 6-azacytidine, 4-hydroxypyrazolopyrimidine, inosine, 8-oxoguanine, 2′-Deoxy-P-nucleoside, and/or combinations thereof. When using a nucleotide analog, the above bases will be attached to a ribose-triphosphate or deoxyribose-triphosphate in order to be incorporated into a new strand during the polymerization of RNA or DNA, respectively.
In embodiments using a polymerase to introduce mutations, the polymerase is a standard polymerase used for molecular replication, such as DNA polymerase I, DNA polymerase II, DNA polymerase III, DNA polymerase IV, DNA polymerase V, RNA polymerase I, RNA polymerase II, RNA polymerase III, RNA polymerase IV, RNA polymerase V, Taq polymerase, Phi29 polymerase, Bst polymerase, Bsu DNA polymerase, Vent exo-DNA polymerase, T4 DNA polymerase, T7 DNA polymerase, T7 RNA polymerase, any other applicable polymerase, enzymatic variants of polymerases (e.g., EquiPhi29), and/or a combination thereof. In some embodiments, the polymerase is selected for processivity, such that polymerases that have high levels of processivity can be beneficial for generating long segments of replicated DNA and/or RNA. In additional embodiments, the polymerase can be selected for exonuclease activity. Exonuclease activity is typically associated with error correcting in strand replication, such that a base pair mismatch is excised from the replicated strand and polymerization continues. As such, various embodiments will select polymerases exhibiting reduced exonuclease activity. Further, various embodiments will select a polymerase for strand displacement properties, which allow the polymerase to displace a complementary segment of DNA or RNA that is bound to a template or reference segment of DNA or RNA. Strand displacement properties allow for a polymerase to continually polymerize a growing strand of DNA despite any prior existing pieces of DNA or RNA. As such, polymerases exhibiting strand displacement properties may allow for longer pieces of replicated DNA or RNA to be generated. In many embodiments, Phi29 DNA polymerase is used as the polymerase to introduce mutations, because Phi29 DNA polymerase exhibits strand displacement properties and a high level of processivity. Although Phi29 DNA polymerase can be a good option, any DNA polymerase exhibiting similar characteristics may also be used.
It should be noted that a suitable polymerase may not possess all beneficial characteristics, such as high processivity, strand displacement, and low exonuclease activity. In situations such as this, various embodiments will select a nucleotide analog that is not susceptible to exonuclease activity of the selected polymerase. For example, certain embodiments will utilize Phi29 DNA polymerase along with 2′-Deoxy-P-nucleoside-5′-Triphosphate (dPTP). It should be noted that this combination of Phi29 DNA polymerase and dPTP is only one example of a possible polymerase and nucleotide analog within the scope of this disclosure and is not limiting on the scope of this disclosure.
Additionally, certain enzymes may not represent all regions of a genome, due to biases against GC-rich regions, due to stronger bonding between the GC base pairs. As such, alterations to amplification protocols, including amplification at higher temperatures, enzymes that do not show an anti-GC bias, a change in primer mixture and concentration, and/or a combination thereof can be used to assure amplification and sequencing of these regions. An example of higher temperature enzymes is the EquiPhi29 enzyme. Additionally, MDA generally has a bias against the ends of linear DNA fragments, causing an underrepresentation of these regions. Certain embodiments will incorporate non-random primers that increase the amplification of fragment ends to increase the read depth and assembly of these regions. Additionally, using non-random primers in some embodiments will allow for targeted sequencing of specific regions, genes, and/or other panel of interest in a target species.
Some embodiments will clean or purify the mutated sample at step 204. In embodiments including this step, the mutated nucleic acid sample is isolated from other components that persist from step 202. As such, remnant mutagens, nucleotides, enzymes, buffer, salt, and/or other remnants will be removed through known means, such as column purification, gel purification, alcohol precipitation, salt precipitation, and/or a combination thereof. Additionally, non-mutated template DNA may coprecipitate with the mutated DNA during purification. This non-mutated template can be a contaminant for downstream sequencing and assembly. Certain embodiments will utilize selective methods to filter out the non-mutated template. Such selective methods include incorporating a tag or other moiety onto certain nucleotides during amplification, such that a selection column will hold the tag or other moiety and thus allow the non-mutated template to flow through into waste. The selection column would then allow for the elution of the mutated amplification product.
In certain embodiments, the purified sample will be quantified and diluted to a desired concentration for further use. In embodiments that quantify the sample, known methods of quantifying nucleic acids will be utilized, such as light absorption or fluorescence using a dye that binds to nucleic acids. For example, when quantifying a nucleic acid sample using absorption, a spectrophotometer capable of measuring absorption in the UV-Vis range of light is used, such as a ThermoScientific NanoDrop 2000. When using fluorescence, a suitable dye is used to bind the nucleic acid, which is then excited, and the emission wavelength is measured using a fluorometer, such as a ThermoScientific NanoDrop 3300 or Qubit. A suitable dye must be excitable by the specific fluorometer, and the fluorometer must be able to read the dye's emission wavelength. In various embodiments, the suitable dye is selected from ethidium bromide, propidium iodide, crystal violet, 4′,6-diamidino-2-phenylindole (DAPI), 7-aminoactinomycin D (7-AAD), Hoechst 33258, Hoechst 33342, Hoechst 34580, PicoGreen, Helixyte, YOYO-1, DiYO-1, TOTO-1, DiTO-1, and/or SYBR. It should be noted that additional spectrophotometers, fluorometers, and dyes are known in the art, which are suitable for quantification of nucleic acids.
As noted above, various embodiments will dilute the mutated sample to a desired concentration. In embodiments that dilute the sample, an amount of water, buffer, or other diluent is added to bring the sample to a final concentration. Typical dilution follows formula (1):
Ci × Vi = Cf × Vf (1)
Where Ci represents initial concentration, Vi represents initial volume, Cf represents final (or desired) concentration, and Vf represents final volume. Following this formula, the volume of the diluent (Vf − Vi) is calculated to decrease the concentration to the desired concentration.
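The dilution calculation can be sketched in a few lines using the standard relation Ci × Vi = Cf × Vf. The function name and the example values are hypothetical; units only need to be consistent between the initial and final quantities.

```python
def diluent_volume(c_initial, v_initial, c_final):
    """Volume of diluent to add so that Ci * Vi = Cf * Vf.

    Solves for Vf = Ci * Vi / Cf and returns Vf - Vi.
    """
    if c_final <= 0 or c_final > c_initial:
        raise ValueError("final concentration must be positive and no greater than initial")
    v_final = c_initial * v_initial / c_final
    return v_final - v_initial


# Example with assumed values: 2 uL at 10 ng/uL diluted to 1 ng/uL.
print(diluent_volume(10.0, 2.0, 1.0))
```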
At step 206, some embodiments will perform a clean amplification. The purpose of a clean amplification is to replace any mutated or non-canonical bases with canonical bases (e.g., cytosine, guanine, thymine, and adenine). For example, in embodiments incorporating nucleotide analogs, a clean amplification will replace the analog with a canonical base. This clean amplification step can be performed in accordance with a relevant amplification method, such as those described in step 202, with the exception that the amplification reaction will add only the canonical bases, without the inclusion of base analogs or other mutagens.
At step 208, various embodiments will select fragments of a specific size. In some embodiments, size selection allows the isolation of fragments of a specific size, which can be assembled prior to a full assembly of the reference sequence. The specific size used for this step can vary depending on the number and size of repetitive regions. As such, large fragments may be necessary for genomes or other samples with large repetitive regions prior to a full assembly, while genomes or samples with smaller repetitive regions may be able to be assembled with relatively smaller fragments. In some embodiments, size selection will select for fragments in the range of approximately 5,000 base pairs to approximately 10,000 base pairs, while other embodiments will select for fragments in the range of approximately 10,000 base pairs to approximately 20,000 base pairs. In further embodiments, fragments will be selected in the range of approximately 20,000 base pairs to approximately 30,000 base pairs. Additionally, in yet other embodiments, fragments will be selected in the range of approximately 30,000 base pairs to approximately 50,000 base pairs. In even more embodiments, fragments will be selected in the range of approximately 50,000 base pairs to approximately 100,000 base pairs. Certain embodiments will size select for more than one size range, e.g., these embodiments can select for fragments in the approximately 20,000 to 30,000 base pair range as well as the approximately 50,000 to 100,000 base pair range.
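As a computational analogy to physical size selection, fragment lengths can be filtered against one or more size windows. This is only an illustrative sketch; the function name `size_select` and the example lengths are hypothetical, and the windows are treated as hard inclusive bounds even though the ranges above are approximate.

```python
def size_select(fragment_lengths, ranges):
    """Keep the lengths that fall inside any (low, high) window, inclusive."""
    return [n for n in fragment_lengths
            if any(low <= n <= high for low, high in ranges)]


# Example: select two windows at once, as in the multi-range embodiment.
lengths = [4_000, 7_500, 25_000, 40_000, 60_000]
windows = [(20_000, 30_000), (50_000, 100_000)]
print(size_select(lengths, windows))
```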
The specific method to select for fragments of a specific size range can vary based on the limitations of the method. In various embodiments, size selection will utilize gel electrophoresis, such as using an agarose gel or an acrylamide gel. In embodiments using gel electrophoresis, the mutated sample is electrophoresed through the gel to allow separation of fragments based on size, where smaller fragments will travel further through the gel than larger fragments. When selecting a specific size, a piece of the gel representing the desired size range will be removed, and the nucleic acid sample will be isolated from the gel through any suitable means known in the art, including using commercial gel extraction kits, beta-agarase I digestion, and/or a freeze-n-squeeze method. Additional methods of performing size selection are known in the art and can be used. For example, some embodiments will utilize a bead or column capture technique, which is commercially available as a kit or can be prepared in a lab. Additional embodiments will use specialized machinery, such as a Sage Science Pippin Prep, to size select fragments of the desired size.
At step 210, various embodiments will quantify the size-selected sample. Additional embodiments will also dilute the size-selected sample to a desired concentration. Means for performing both of these processes are known in the art and discussed above in regard to step 204.
At step 212, the size-selected sample is amplified to increase the concentration of the sample in a number of embodiments. At this step in certain embodiments, the mutated or analog bases create sequence diversity by being replaced with native bases, such as adenine, guanine, thymine, and cytosine. Methods to amplify the sample are known in the art and include such methods as polymerase chain reaction (PCR) or multiple displacement amplification (MDA). After amplification, various embodiments will clean or purify the sample to remove remnants of the reaction. Means for cleaning or purifying the sample are discussed above in regard to step 204. Further embodiments will also quantify and/or dilute the sample to place the sample at a desired concentration for further use. Methods for quantifying and diluting nucleic acid samples are discussed above in regard to step 204.
Further, various embodiments will generate a sequencing library at step 212. Numerous methods are known in the art to generate a sequencing library. Sequencing libraries are typically specific to a single sequencing platform, such that specific features or adapters are necessary on sequencing fragments in order for a sequencer to produce a sequence from a fragment. Various embodiments will utilize commercial kits to generate libraries while other embodiments will utilize known techniques to generate the sequencing libraries using protocols to introduce adapters or primers to fragments through PCR or ligation.
In certain embodiments, sequencing libraries include specific tags or barcodes to identify specific samples. In this way, some embodiments that size select fragments can utilize different barcodes for individual fragments within a single sequencing reaction. By differentially barcoding individual fragments, these embodiments can further isolate fragments to assure assembly of unique fragments in the genome.
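The idea of separating differentially barcoded fragments after sequencing can be sketched as a simple demultiplexing step. This sketch assumes, purely for illustration, that the barcode occupies the first few bases of each read and must match exactly; real platforms may report index sequences separately, and the function name `demultiplex`, the barcode sequences, and the fragment labels are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, barcode_len):
    """Group reads by an exact-match barcode assumed to sit at the read start.

    barcodes maps barcode sequence -> fragment label. Reads whose prefix is
    not a known barcode are kept whole under the label "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        tag = read[:barcode_len]
        if tag in barcodes:
            bins[barcodes[tag]].append(read[barcode_len:])  # strip the barcode
        else:
            bins["unassigned"].append(read)
    return dict(bins)


# Example with made-up 4-base barcodes for two size-selected fragments.
barcodes = {"ACGT": "fragment_1", "TTAG": "fragment_2"}
reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTCCCGGG", "GGGGAAAACC"]
print(demultiplex(reads, barcodes, 4))
```

Reads binned this way could then be assembled per fragment before a full assembly, so that reads from similar regions on different fragments are not mixed.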
Additionally, the above steps of the flow diagram of
The ability of some embodiments to select a specific mutation rate can be accomplished in a number of ways. For example, in chemical mutagenesis, buffer concentration, mutagen concentration, reaction temperature, reaction time, additional reagents, template DNA concentration, and/or a combination thereof can be adjusted. By increasing or decreasing these parameters, the mutation rate can be resolved and/or identified. For example, increasing mutagen concentration may increase the likelihood of mutations, thus increasing mutation rate.
In biochemical mutagenesis, the reaction parameters may also be adjusted, such that reaction temperature, adding additional reagents, base analog concentration, DNA concentration, canonical nucleotide concentration, and/or a combination thereof can be adjusted in many embodiments.
Turning to
Turning to
Turning to
Turning to
Although the following embodiments provide details on certain embodiments of the inventions, it should be understood that these are only exemplary in nature, and are not intended to limit the scope of the invention.
Example 1: Generating Mutations in Sample Nucleic Acid
Methods: In one exemplary embodiment, genomic DNA from Arabidopsis thaliana was acquired and mutated using MDA. In this exemplary embodiment, 1 μL of 2× alkaline denaturation solution and 1 μL of genomic DNA (at a concentration of ~5-10 ng/μL) were added to a reaction tube, mixed gently, and incubated at room temperature for 3 minutes. The reaction tube was then placed on ice, where 2 μL of 2× alkaline denaturation solution was added and mixed gently. Next, 16 μL of a master mix was added and mixed gently. The master mix consisted of Phi29 DNA polymerase, polymerase buffer, bovine serum albumin (BSA), random exo-resistant hexamer primers, 100 μM dNTP mix, and 200 μM dPTP. This reaction solution was incubated at 30° C. for 3 hours and 30 minutes. The reaction was stopped by increasing the temperature of the reaction to 65° C. for 3 minutes in order to denature the polymerase, followed by a 12° C. hold until further processing. The sample was then amplified by a clean MDA amplification following the same reaction conditions as the mutational MDA, with the exception that no dPTP was included in the clean MDA reaction solution. This was followed by size selection and dilution. Sequencing libraries were generated from the mutated sample and a non-mutated control genomic DNA sample and sequenced on an Illumina MiSeq to a length of 98 base pairs. The reads were then aligned to the A. thaliana reference genome sequence.
Results:
Conclusion: This exemplary embodiment shows how embodiments can reliably produce large fragments of mutated DNA with a desirable mutation frequency.
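As a rough illustration of how a mutation frequency could be estimated from read alignments such as those described in the Methods, the following minimal sketch counts mismatches between aligned read and reference segments and subtracts the rate seen in the control. The function names (`mismatch_rate`, `net_mutation_rate`) and the toy sequences are hypothetical and are not taken from this disclosure. The sketch also assumes gap-free, equal-length aligned segments, whereas real aligner output contains indels and clipping.

```python
def mismatch_rate(pairs):
    """Return mismatches per aligned base over (read, reference) pairs.

    Assumption: each pair holds two equal-length, gap-free strings.
    """
    mismatches = 0
    aligned = 0
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference segments must have equal length")
        for r, g in zip(read.upper(), ref.upper()):
            if r == "N" or g == "N":
                continue  # skip ambiguous bases
            aligned += 1
            if r != g:
                mismatches += 1
    return mismatches / aligned if aligned else 0.0


def net_mutation_rate(mutated_pairs, control_pairs):
    """Mismatch rate of the mutated sample minus the control background.

    The result is floored at zero so noise cannot yield a negative rate.
    """
    return max(0.0, mismatch_rate(mutated_pairs) - mismatch_rate(control_pairs))


# Toy data for illustration only (not experimental results).
control = [("ACGTACGTAC", "ACGTACGTAC"), ("TTGACCGTAA", "TTGACCGTAT")]
mutated = [("ACGTATGTAC", "ACGTACGTAC"), ("TTGGCCGTAA", "TTGACCGTAT")]
```

Subtracting the control rate is one simple way to separate introduced mutations from sequencing and alignment error; other background models are equally plausible.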
Example 2: Mutational Spectrum
Methods: In another exemplary embodiment, mutational MDA was performed on A. thaliana DNA according to the methods in Example 1, and the mutational spectrum was investigated for the purpose of distinguishing it from sequencing error and misalignment background noise. Additionally, the concentration of dPTP in the reaction mix was varied across three samples: a first sample used 200 μM dPTP, a second sample used 400 μM dPTP, and a third sample used 600 μM dPTP. Read alignments to the reference assembly were performed using the Bowtie 2 software.
Results: The mutation type generated in this embodiment is predictably purine-to-purine or pyrimidine-to-pyrimidine (i.e., a transition).
Additionally, the position of the mutation is relatively equal across the length of a sequencing read coming from a non-control sample, as illustrated in
Conclusion: The presence of dPTP in the reaction generates a clear mutational spectrum relative to background sequencing and alignment error rates. Further, mutations generated by mutational MDA are evenly distributed across a sequencing read without bias toward any specific location on the read. Knowledge of this particular spectrum can aid in downstream bioinformatics and error correction.
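The spectrum and positional analysis described in this example can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the example: the function names and the input format (equal-length, gap-free read and reference strings) are assumptions. It labels each substitution as a transition (purine-to-purine or pyrimidine-to-pyrimidine) or a transversion and tallies the read position at which each substitution occurs.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref_base, read_base):
    """Label a substitution as 'transition' or 'transversion'."""
    ref_base, read_base = ref_base.upper(), read_base.upper()
    valid = PURINES | PYRIMIDINES
    if ref_base not in valid or read_base not in valid:
        raise ValueError("only A, C, G, T are supported")
    if ref_base == read_base:
        raise ValueError("not a substitution")
    both = {ref_base, read_base}
    same_class = both <= PURINES or both <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def mutation_spectrum(pairs):
    """Tally substitutions over (read, reference) pairs.

    Returns three Counters: substitution types such as 'C>T',
    transition/transversion classes, and 0-based read positions.
    Assumption: segments are gap-free and of equal length.
    """
    substitutions = Counter()
    classes = Counter()
    positions = Counter()
    for read, ref in pairs:
        for pos, (r, g) in enumerate(zip(read.upper(), ref.upper())):
            if r != g and r in "ACGT" and g in "ACGT":
                substitutions[f"{g}>{r}"] += 1
                classes[classify_substitution(g, r)] += 1
                positions[pos] += 1
    return substitutions, classes, positions
```

A transition-dominated class count in a mutated sample, compared against a control, and a roughly flat position histogram would correspond to the behavior reported in the Results and Conclusion.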
Example 3: Assembling Mutated Fragments
Methods: In one exemplary embodiment, the sequence of chromosome III of Caenorhabditis elegans was computationally mutated at a rate comparable to what can be generated by the methods described above. Synthetic sequencing reads were generated from a control (non-mutated) chromosome III sequence and from simulated mutated chromosome III sequences using the ART read simulation program. The simulated mutated sequences were selected by sampling the chromosome randomly with a window length centered around 30,000 bp (with a 3,000 bp standard deviation) for a total of 0.5× coverage. The control and mutated sequencing reads were assembled using the ABySS assembler to recreate large fragments. The assembled contigs were then aligned to the chromosome III reference sequence. Finally, this process was repeated 21 times, generating mutated and non-mutated assembled contigs for a total of 10.5× coverage (21 rounds × 0.5× per round). These assembled contigs were subsequently assembled into a final assembly using the Canu assembler.
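The sampling and mutation steps of this simulation can be sketched in a few lines. The sketch below is illustrative only and rests on several assumptions not stated in the Methods: the names `mutate`, `sample_fragments`, and `simulate_rounds` are hypothetical, mutations are restricted to transitions (following the spectrum in Example 2), each sampled fragment is mutated independently, and the mutation rate is a free parameter because no value is given here. Read simulation and assembly are outside the scope of the sketch.

```python
import random

# Transition partner for each canonical base (assumption: transitions only).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def mutate(seq, rate, rng):
    """Substitute each base with its transition partner with probability `rate`."""
    return "".join(
        TRANSITION[b] if b in TRANSITION and rng.random() < rate else b
        for b in seq
    )


def sample_fragments(seq, coverage, mean_len, sd_len, rng):
    """Draw random windows until their summed length reaches coverage * len(seq).

    Window lengths are normally distributed and clamped to [1, len(seq)].
    """
    target = coverage * len(seq)
    fragments, total = [], 0
    while total < target:
        length = max(1, min(len(seq), round(rng.gauss(mean_len, sd_len))))
        start = rng.randrange(len(seq) - length + 1)
        fragments.append((start, seq[start:start + length]))
        total += length
    return fragments


def simulate_rounds(seq, rate, rounds, coverage, mean_len, sd_len, seed=0):
    """Return, per round, a list of (start, independently mutated fragment)."""
    rng = random.Random(seed)
    result = []
    for _ in range(rounds):
        frags = sample_fragments(seq, coverage, mean_len, sd_len, rng)
        result.append([(start, mutate(frag, rate, rng)) for start, frag in frags])
    return result
```

With the parameters from the Methods, a call would look like `simulate_rounds(chromosome, rate, rounds=21, coverage=0.5, mean_len=30000, sd_len=3000)`, where `chromosome` and `rate` are placeholders. Each round yields at least 0.5× coverage, so 21 rounds give at least 10.5× in total.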
Results: The size of assembled contigs from the non-mutated and mutated samples is illustrated in
Further,
Also, very few contigs do not map to the reference sequence, indicating few complete misassemblies. Out of a total of 5,209 contigs generated from the mutated sample, only 8 contigs did not align. These non-mapping contigs are illustrated in
Additionally, the final assembly of the control showed numerous gaps in the genome, as illustrated in
The final assembly of the mutated contigs shows significant homology with the reference chromosome III sequence, as illustrated in
Conclusion: Mutated sequence produces larger contigs with lower misassembly rates, and these contigs subsequently assemble into a final assembly that covers more of the genome with far fewer gaps and with a high level of fidelity to the sequence from which it originates. Thus, mutating a sample prior to sequencing provides a better assembly than a non-mutated sample.
Example 4: Assembling a Repetitive BAC Sequence
Methods: In another exemplary embodiment, a repetitive bacterial artificial chromosome (BAC) was assembled. In this embodiment, BAC 2-5C-13-12 from maize (Zea mays) was isolated, mutated, sequenced, and assembled using a combination of the SPAdes, ABySS, and Canu assemblers in accordance with the methods described herein. A control (non-mutated) sample was separately sequenced and assembled using the SPAdes and ABySS assemblers.
Results:
Additionally,
Conclusion: Introducing mutations into repetitive sequences allows for the assembly of larger fragments of a genome (e.g., chromosome, subregion, etc.) without gaps.
DOCTRINE OF EQUIVALENTS
Although the invention has been described in detail with particular reference to these preferred embodiments, other embodiments can achieve the same results. Variations and modifications of the present invention will be obvious to those skilled in the art and it is intended to cover all such modifications and equivalents. The entire disclosures of all references, applications, patents, and publications cited above, and of the corresponding application(s), are hereby incorporated by reference.
Claims
1.-20. (canceled)
21. A method of assembling a genome sequence comprising:
- obtaining a nucleic acid sample;
- mutating the nucleic acid sample;
- sequencing the mutated nucleic acid sample; and
- assembling the sequenced mutated nucleic acid sample to build a genome sequence.
22. The method of claim 21, wherein the mutating is done by ionizing radiation.
23. The method of claim 22, wherein the ionizing radiation is UV radiation, gamma radiation, alpha particles, beta particles, and/or combinations thereof.
24. The method of claim 21, wherein the mutating is done by an amplification-based technique.
25. The method of claim 24, wherein the amplification-based technique is polymerase chain reaction (PCR), multiple strand displacement amplification (MDA), and/or rolling circle amplification (RCA).
26. The method of claim 24, wherein the amplification-based technique is polymerase chain reaction (PCR).
27. The method of claim 24, wherein the amplification-based technique is multiple strand displacement amplification (MDA).
28. The method of claim 24, wherein the amplification-based technique is rolling circle amplification (RCA).
29. The method of claim 25, further comprising introducing mutations into the sample during the amplification by incorporating 2′-Deoxy-P-nucleoside-5′-Triphosphate (dPTP).
30. The method of claim 25, wherein the polymerase exhibits strand displacement properties and a high level of processivity.
31. The method of claim 30, wherein the polymerase is Phi29 DNA polymerase or an enzymatic variant thereof.
32. The method of claim 30, wherein the polymerase is Phi29 DNA polymerase or EquiPhi29.
33. The method of claim 25, further comprising introducing mutations into the sample during the amplification by incorporating 2′-Deoxy-P-nucleoside-5′-Triphosphate (dPTP).
34. The method of claim 25, wherein the polymerase exhibits strand displacement properties and a high level of processivity.
35. The method of claim 30, wherein the polymerase is Phi29 DNA polymerase or an enzymatic variant thereof.
36. The method of claim 30, wherein the polymerase is Phi29 DNA polymerase or EquiPhi29.
37. A method comprising:
- obtaining a sample comprising a template nucleic acid;
- introducing mutations into the sample via amplification of the template nucleic acid to create a mutated sample, wherein the amplification comprises contacting the template nucleic acid with a polymerase under conditions that promote amplification of the template nucleic acid;
- sequencing the amplified nucleic acids comprising the mutated sample; and
- assembling the sequences of the amplified nucleic acids comprising the mutated sample to build a genome or genomic region sequence;
- wherein the polymerase is Phi29 DNA polymerase or an enzymatic variant thereof; and
- wherein introducing mutations into the sample comprises incorporating 2′-Deoxy-P-nucleoside-5′-Triphosphate (dPTP) during the amplification;
- wherein introducing the mutations with the polymerase and the dPTP facilitates the assembly and lowers the rate of misassembly.
38. The method of claim 37, wherein the enzymatic variant thereof is EquiPhi29.
39. The method of claim 37, wherein the amplification is polymerase chain reaction (PCR), multiple strand displacement amplification (MDA), and/or rolling circle amplification (RCA).
40. The method of claim 38, wherein the amplification is polymerase chain reaction (PCR), multiple strand displacement amplification (MDA), and/or rolling circle amplification (RCA).
Type: Application
Filed: Sep 9, 2021
Publication Date: Jun 2, 2022
Applicant: The Board of Trustees of the Leland Stanford Junior University (Stanford, CA)
Inventors: Solomon Endlich (Palo Alto, CA), Devin King (La Cañada, CA), Ashby J. Morrison (Stanford, CA)
Application Number: 17/471,084