SEQUENTIAL SEQUENCING

Info

Publication number: 20140274738
Type: Application
Filed: Mar 14, 2014
Publication Date: Sep 18, 2014
Applicant: NuGEN Technologies, Inc. (San Carlos, CA)
Inventors: Doug Amorese (Los Altos, CA), Benjamin G. Schroeder (San Mateo, CA), Jonathan Scolnick (San Francisco, CA)
Application Number: 14/211,261

Abstract

The present invention provides improved methods, compositions and kits for short read next generation sequencing (NGS). The methods, compositions and kits of the present invention enable phasing of two or more nucleic acid sequences in a sample, i.e. determining whether the nucleic acid sequences (typically comprising regions of sequence variation) are located on the same chromosome and/or the same chromosomal fragment. Phasing information is obtained by performing multiple, successive sequencing reactions from the same immobilized nucleic acid template. The methods, compositions and kits provided herein are useful, for example, for haplotyping, SNP phasing, or for determining downstream exons in RNA-seq.

Description

CROSS-REFERENCE

This application claims the benefit of U.S. Provisional Application No. 61/801,600, filed Mar. 15, 2013, which application is incorporated herein by reference.

BACKGROUND OF THE INVENTION

Short read next generation sequencing (NGS) analysis has some limitations in both research and diagnostics. One key drawback is the problem of phasing. That is, when interrogating multiple loci of sequence variation, it is often impossible to determine which loci are co-located on the same chromosome or on the same chromosomal fragment. One example of a phasing problem occurs in diploid organisms in which two parental chromosomes, one from the mother and one from the father, are inherited, resulting in two copies of each gene (except for the genes carried on the sex chromosomes). Within each copy of the two copies of a gene in a diploid cell are regions of sequence variation, or loci, that fall within distinct sequence types known as alleles. Thus, allelic variation across different loci might exist within a single chromosome (maternal or paternal) of a chromosome pair, or across both chromosomes of a chromosome pair. Determining which loci or regions of sequence variation are co-located on the same (maternal or paternal) chromosome is useful for a variety of reasons, as discussed further below.

The pattern of alleles within each individual chromosome is referred to as a haplotype. Haplotyping has many diagnostic and clinical applications. For example, two inactivating mutations across different loci within a single gene might be of little or no consequence if present on the same individual chromosome (i.e. chromosome of either maternal or paternal origin), because the other copy of the gene will remain functional. On the other hand, if one of the inactivating mutations is present in the maternal chromosome and the other in the paternal chromosome, there is no functional copy of the gene product, resulting in a negative phenotype (non-viability, increased risk for disease and others). Haplotyping is also used to predict risk or susceptibility to specific genetic diseases, as many genetic associations are tied to haplotypes. For example, the various haplotypes of the human leukocyte antigen (HLA) system are associated with genetic diseases ranging from autoimmune disease to cancers.
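The cis/trans distinction in the preceding paragraph can be illustrated with a minimal Python sketch. The function name `functional_copies` and the boolean encoding of alleles are assumptions made for illustration only; they are not part of the disclosed methods.

```python
def functional_copies(maternal, paternal):
    """Count gene copies that carry no inactivating allele.

    Each haplotype is a tuple of booleans, one per locus, where True
    means an inactivating mutation is present at that locus
    (hypothetical encoding used only for this illustration).
    """
    return sum(1 for haplotype in (maternal, paternal) if not any(haplotype))


# Both mutations on the same (maternal) chromosome: one copy stays functional.
cis = functional_copies(maternal=(True, True), paternal=(False, False))

# One mutation on each chromosome: no functional copy remains.
trans = functional_copies(maternal=(True, False), paternal=(False, True))

print(cis, trans)
```

The genotype at each locus is the same in both cases (one mutant and one wild-type allele), which is why unphased short reads alone cannot separate them.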

Another instance in which phasing information is useful is distinguishing between functional genes and their non-functional pseudogene counterparts within the genome. One well known functional gene/pseudogene pair is the genes SMN1 and SMN2, which differ in sequence by only five nucleotides over many Kb of sequence, yet one of the nucleotide differences renders the SMN2 gene almost completely non-functional. Using short read sequencing, a mutation may be found in one of the two genes, but unless the mutation happens to occur within the sequencing read that also covers one of the known nucleotide differences between SMN1 and SMN2, it will be impossible to know which of the genes (the functional gene, or the nonfunctional pseudogene) is mutated.

The present NGS methods employ short read sequencing to query regions of variable DNA sequence (polymorphisms etc.) interspersed within regions of conserved DNA sequence. As significant blocks of conserved sequence are typically interspersed between the variable regions, short read sequencing does not lend itself to phasing analysis. Although methods have been developed to obtain phasing information, these methods (for example, Sanger sequencing and subcloning), are typically labor intensive and/or costly.

There is a need for improved NGS methods that provide phasing information. Such methods would ideally provide a highly parallel platform for performing multiple sequencing reactions from the same immobilized templates. The invention described herein fulfills this need.

SUMMARY OF THE INVENTION

The present invention provides novel methods, compositions and kits for phasing two or more nucleic acid sequences in a sample. Specifically, an important aspect of this invention is the methods and compositions that allow for determining whether two or more nucleic acid sequences (typically comprising regions of sequence variation) are located on the same nucleic acid template, such as a chromosome or a chromosomal fragment. The methods and compositions of the invention can also be used to distinguish and differentiate between two closely related nucleic acid sequences by compiling and aligning data from sequential sequencing reads.

The methods, kits and compositions of the present invention employ sequential paired sequencing reads from the same immobilized nucleic acid template. The reads are generated by successive rounds of priming, sequencing, denaturing and repriming, and the results from multiple reads originating from the same template are compiled to obtain phasing information.

Additionally, the methods, kits and compositions of the present invention employ pools of oligonucleotides used as priming sites in sequencing by synthesis reactions that target specific regions of specific DNAs for sequencing. These oligonucleotide pools can be used onboard a sequencer to extend the sequencing of DNAs that have already undergone a first round of sequencing.

In one aspect, the invention provides a method for relating multiple nucleic acid sequences (typically comprising regions of sequence variation) to the same nucleic acid template. In some embodiments, the method comprises: a) creating a directional nucleic acid library; b) sequencing the library with an oligonucleotide primer; c) denaturing the first strand; d) performing a second round of sequencing by introducing a new oligonucleotide primer containing sequence complementary to conserved regions present in some of the nucleic acid templates within the nucleic acid library; e) repeating steps c) and d) as needed; and f) compiling sequencing data from the successive sequencing reads to differentiate between closely related nucleic acid sequences.

In some embodiments, the directional nucleic acid library comprises closely related nucleic acid sequences as inserts. In some embodiments, the conserved regions within the nucleic acid inserts are located adjacent to variable regions. In some embodiments, alignment of multiple variable regions enables differentiating between and/or typing of related transcripts. In some embodiments, alignment of multiple variable regions enables differentiating between and/or typing of related micro-organisms.

In another aspect, the invention provides a method for differentiating between closely related nucleic acid sequences (such as genes and pseudogenes) by using specific sets of oligonucleotide primers containing sequence complementary to a common region shared by the closely related sequences. In some embodiments, the method comprises: a) creating a directional sequencing library with closely related nucleic acid sequences as inserts; b) sequencing the library with an oligonucleotide primer; c) denaturing the first strand; d) performing a second round of sequencing by introducing a new oligonucleotide primer containing sequence complementary to conserved regions present in some of the nucleic acid templates within the nucleic acid library; e) repeating steps c) and d) as needed; and f) compiling sequencing data from the successive sequencing reads to differentiate between closely related nucleic acid sequences.

Kits for performing any of the methods described herein are another feature of the invention. Such kits may include reagents, enzymes and platforms for amplification and sequencing of nucleic acids. In one embodiment, a kit is provided comprising: a) an adaptor or several adaptors, b) one or more oligonucleotide primers, and c) reagents for amplification. In another embodiment, the kit further comprises reagents for sequencing. A kit will preferably include instructions for employing the kit components as well as the use of any other reagent not included in the kit.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1 depicts the sequential sequencing method as applied to 16S microbial rRNA characterization, as described in Example 1.

FIG. 2 depicts the use of specific oligonucleotide pools and the generation of mated pairs of sequencing reads to differentiate between two closely related nucleotide sequences, such as a gene/pseudogene pair.

DETAILED DESCRIPTION OF THE INVENTION

General

The methods of the invention can be used for determining whether two or more nucleic acid sequences (typically comprising regions of variable sequence) in a sample are located on the same nucleic acid template, such as a chromosome or a chromosomal fragment. The methods of the invention can be further used to differentiate between closely related nucleic acid sequences. Such methods are useful, for example, for haplotyping, SNP phasing, determining downstream exons in RNA-seq, and in genetic diagnostics applications. The methods, kits and compositions of the present invention employ sequential paired sequencing reads from the same immobilized nucleic acid template. Altogether, the methods of the present invention provide an improvement over the existing methods by offering a highly parallel, efficient method for obtaining phasing information.

Reference will now be made in detail to exemplary embodiments of the invention. While the disclosed methods and compositions will be described in conjunction with the exemplary embodiments, it will be understood that these exemplary embodiments are not intended to limit the invention. On the contrary, the invention is intended to encompass alternatives, modifications and equivalents, which may be included in the spirit and scope of the invention.

Unless otherwise specified, terms and symbols of genetics, molecular biology, biochemistry and nucleic acid used herein follow those of standard treatises and texts in the field, e.g. Kornberg and Baker, DNA Replication, Second Edition (W. H. Freeman, New York, 1992); Lehninger, Biochemistry, Second Edition (Worth Publishers, New York, 1975); Strachan and Read, Human Molecular Genetics, Second Edition (Wiley-Liss, New York, 1999); Eckstein, editor, Oligonucleotides and Analogs: A Practical Approach (Oxford University Press, New York, 1991); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984); and the like.

Phasing and Haplotype

As used herein, the term “phasing” refers to the process of determining whether two or more nucleic acid sequences (typically comprising regions of sequence variation) are located on the same nucleic acid template, such as a chromosome or a chromosomal fragment. Phasing may refer to resolving two or more single-nucleotide variants or polymorphisms within a single sequencing read. Alternatively, phasing may refer to resolving sequencing data over a large genomic region, or resolving a whole genome sequence.

As used herein, the term “haplotype” refers to the pattern of alleles within each individual chromosome. Alternatively, haplotype may refer to a set of single-nucleotide polymorphisms (SNPs) that are linked or present together on a single chromosome. The term haplotype may be used to refer to as few as two alleles or SNPs that are linked or present together on a single chromosome.

Oligonucleotides of the Invention

As used within the invention, the term “oligonucleotide” refers to a polynucleotide chain, typically less than 200 residues long, most typically between 15 and 100 nucleotides long, but also intended to encompass longer polynucleotide chains. Oligonucleotides may be single- or double-stranded. The terms “oligonucleotide probe” or “probe”, as used in this invention, refer to an oligonucleotide capable of hybridizing to a complementary nucleotide sequence. As used in this invention, the term “oligonucleotide” may be used interchangeably with the terms “primer”, “adaptor” and “probe”.

As used herein, the terms “hybridization”, “hybridizing” and “annealing” are used interchangeably and refer to the pairing of complementary nucleic acids.

The term “primer”, as used herein, refers to an oligonucleotide, generally with a free 3′ hydroxyl group, that is capable of hybridizing with a template (such as a target polynucleotide, target DNA, target RNA or a primer extension product) and is also capable of promoting polymerization of a polynucleotide complementary to the template. A primer may contain a non-hybridizing sequence that constitutes a tail of the primer. A primer may still be hybridizing to a target even though its sequences are not fully complementary to the target.

The primers of the invention are generally oligonucleotides that are employed in an extension reaction by a polymerase along a polynucleotide template, such as in PCR or cDNA synthesis, for example. The oligonucleotide primer is often a synthetic polynucleotide that is single stranded, containing a sequence at its 3′-end that is capable of hybridizing with a sequence of the target polynucleotide. Normally, the 3′ region of the primer that hybridizes with the target nucleic acid has at least 80%, preferably 90%, more preferably 95%, most preferably 100%, complementarity to a sequence or primer binding site.
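As a rough illustration of the complementarity thresholds above, the following Python sketch scores the 3′ region of a primer against a primer binding site. It assumes an ungapped, position-by-position comparison of equal-length sequences, which is a simplification: actual hybridization depends on reaction conditions and thermodynamics. The function names and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(primer_3prime, binding_site):
    """Percent of primer positions that base-pair with the binding site.

    Both sequences are written 5' to 3' and must have equal length
    (ungapped comparison; an assumption made for this sketch).
    """
    if len(primer_3prime) != len(binding_site):
        raise ValueError("sequences must have the same length")
    perfect_primer = reverse_complement(binding_site)
    matches = sum(p == q for p, q in zip(primer_3prime.upper(), perfect_primer))
    return 100.0 * matches / len(primer_3prime)


site = "ATGCGTACGTTAGCCTAGGA"          # hypothetical 20-nt binding site
perfect = reverse_complement(site)     # fully complementary primer region
one_mismatch = ("A" if perfect[0] != "A" else "C") + perfect[1:]

print(percent_complementarity(perfect, site))       # full complementarity
print(percent_complementarity(one_mismatch, site))  # 19 of 20 positions pair
```

Under this simplified measure, a 20-nucleotide hybridizing region with a single mismatch sits at the 95% level named in the text.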

“Complementary”, as used herein, refers to complementarity to all or only to a portion of a sequence. The number of nucleotides in the hybridizable sequence of a specific oligonucleotide primer should be such that stringency conditions used to hybridize the oligonucleotide primer will prevent excessive random non-specific hybridization. Usually, the number of nucleotides in the hybridizing portion of the oligonucleotide primer will be at least as great as the defined sequence on the target polynucleotide that the oligonucleotide primer hybridizes to, namely, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least about 20, and generally from about 6 to about 10 or 6 to about 12 or 12 to about 200 nucleotides, usually about 10 to about 50 nucleotides. In general, the target polynucleotide is larger than the oligonucleotide primer or primers as described previously.

In some cases, the identity of the investigated target polynucleotide sequence is known, and hybridizable primers can be synthesized precisely according to the antisense sequence of the aforesaid target polynucleotide sequence. In other cases, when the target polynucleotide sequence is unknown, the hybridizable sequence of an oligonucleotide primer is a random sequence. Oligonucleotide primers comprising random sequences may be referred to as “random primers”, as described below. In yet other cases, an oligonucleotide primer such as a first primer or a second primer comprises a set of primers such as for example a set of first primers or a set of second primers. In some cases, the set of first or second primers may comprise a mixture of primers designed to hybridize to a plurality (e.g. 2, 3, 4, about 6, 8, 10, 20, 40, 80, 100, 125, 150, 200, 250, 300, 400, 500, 600, 800, 1000, 1500, 2000, 2500, 3000, 4000, 5000, 6000, 7000, 8000, 10,000, 20,000, 25,000 or more) of target sequences. In some cases, the plurality of target sequences may comprise a group of related sequences, random sequences, a whole transcriptome or fraction (e.g. substantial fraction) thereof, or any group of sequences such as mRNA.

In some embodiments of the invention, random priming is used. A “random primer”, as used herein, is a primer that generally comprises a sequence that is not designed based on a particular or specific sequence in a sample, but rather is based on a statistical expectation (or an empirical observation) that a sequence of the random primer is hybridizable, under a given set of conditions, to one or more sequences in a sample. A random primer will generally be an oligonucleotide or a population of oligonucleotides comprising a random sequence(s) in which the nucleotides at a given position on the oligonucleotide can be any of the four nucleotides A, T, G, C or any of their analogs. A random primer may comprise a 5′ or 3′ region that is a specific, non-random sequence. In some embodiments of the invention, the random primers comprise tailed primers with a 3′ random sequence region and a 5′ non-hybridizing region that comprises a specific, common adaptor sequence. The sequence of a random primer, or its complement, may or may not be naturally occurring, and may or may not be present in a pool of sequences in a sample of interest. A “random primer” can also refer to a primer that is a member of a population of primers (a plurality of random primers) which are collectively designed to hybridize to a desired target sequence or sequences.
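A tailed random primer population of the kind described above can be sketched in Python as follows. The adaptor tail sequence, the random-region length of six and the function name are illustrative assumptions, not values taken from this disclosure.

```python
import random


def tailed_random_primers(adaptor_tail, random_length, count, seed=None):
    """Build primers with a fixed 5' adaptor tail and a random 3' region.

    Each position of the 3' region is drawn uniformly from A, C, G, T.
    """
    rng = random.Random(seed)
    return [
        adaptor_tail + "".join(rng.choice("ACGT") for _ in range(random_length))
        for _ in range(count)
    ]


TAIL = "GACTGGAGTTCAG"  # hypothetical common adaptor sequence
primers = tailed_random_primers(TAIL, random_length=6, count=5, seed=1)
for primer in primers:
    print(primer)
```

Every primer in the population shares the same 5′ tail, so products primed from different random sites all carry a common sequence at one end.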

In some embodiments of the invention, standard or universal sequencing primers are used. In some embodiments of the invention, sequence-specific primers that hybridize to a conserved region or conserved regions within the nucleic acid inserts in the sequencing library are used. In some embodiments of the invention, the sequence-specific primers are designed to hybridize to conserved regions adjacent to regions of variable sequence within the nucleic acid inserts, thereby enabling differentiating between closely related sequences. In some embodiments of the invention, a set of oligonucleotide primers that hybridize to sequences shared in closely related sequences, such as gene/pseudogene pairs, are used.

The term “adaptor”, as used herein, refers to an oligonucleotide of known sequence, the ligation of which to a target polynucleotide or a target polynucleotide strand of interest enables the generation of amplification-ready products of the target polynucleotide or the target polynucleotide strand of interest. Various adaptor designs are envisioned. Various ligation processes and reagents are known in the art and can be useful for carrying out the methods of the invention. For example, blunt ligation can be employed. Similarly, a single dA nucleotide can be added to the 3′-end of the double-stranded DNA product, by a polymerase lacking 3′-exonuclease activity and can anneal to an adaptor comprising a dT overhang (or the reverse). This design allows the hybridized components to be subsequently ligated (e.g., by T4 DNA ligase). Other ligation strategies and the corresponding reagents are known in the art, and kits and reagents for carrying out efficient ligation reactions are commercially available (e.g., from New England Biolabs, Roche).

Input Nucleic Acid

The input is a nucleic acid. The input nucleic acid can be DNA, or complex DNA, for example genomic DNA. The input DNA may also be cDNA. The cDNA can be generated from RNA, e.g., mRNA. The input DNA can be of a specific species, for example, human, rat, mouse, other animals, specific plants, bacteria, algae, viruses, and the like. The input complex also can be from a mixture of genomes of different species such as host-pathogen, bacterial populations and the like. The input DNA can be cDNA made from a mixture of genomes of different species. Alternatively, the input nucleic acid can be from a synthetic source. The input DNA can be mitochondrial DNA. The input DNA can be cell-free DNA. The cell-free DNA can be obtained from, e.g., a serum or plasma sample. The input DNA can comprise one or more chromosomes. For example, if the input DNA is from a human, the DNA can comprise one or more of chromosome 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. The DNA can be from a linear or circular genome. The DNA can be plasmid DNA, cosmid DNA, bacterial artificial chromosome (BAC), or yeast artificial chromosome (YAC). The input DNA can be from more than one individual or organism. The input DNA can be double stranded or single stranded. The input DNA can be part of chromatin. The input DNA can be associated with histones.

Directional Library Construction

The term “strand specific” or “directional”, as used herein, refers to the ability to differentiate in a double-stranded polynucleotide between the original template strand and the strand that is complementary to the original template strand.

In some embodiments, the methods of the invention contemplate preserving information about the direction of single-stranded nucleic acid molecules while generating double-stranded polynucleotides. One of the strands of the double-stranded polynucleotide is synthesized so that it has at least one modified nucleotide incorporated into it along the entire length of the strand. In some embodiments, the incorporation of the modified nucleotide marks the strand for degradation or removal.

In some embodiments, the methods of the invention contemplate construction of directional nucleic acid libraries as described in pending U.S. application Ser. No. 13/643,056, titled COMPOSITIONS AND METHODS FOR DIRECTIONAL NUCLEIC ACID AMPLIFICATION AND SEQUENCING.

Methods of Amplification

Methods of amplification are well known in the art. In some embodiments, the amplification is exponential, e.g. in the enzymatic amplification of specific double stranded sequences of DNA by a polymerase chain reaction (PCR). In other embodiments the amplification method is linear. In other embodiments the amplification method is isothermal.

Methods of Sequencing

The methods of the invention contemplate sequential sequencing of directional NGS libraries. Sequencing methods are also well known in the art.

For example, a sequencing technique that can be used in the methods of the provided invention is the method commercialized by Illumina, as described in U.S. Pat. Nos. 5,750,341; 6,306,597; and 5,969,119. Directional (strand-specific) libraries are prepared, and the selected single-stranded nucleic acid is amplified, for example, by PCR. The resulting nucleic acid is then denatured and the single-stranded amplified polynucleotides are randomly attached to the inside surface of flow-cell channels. Unlabeled nucleotides are added to initiate solid-phase bridge amplification to produce dense clusters of double-stranded DNA. To initiate the first base sequencing cycle, four labeled reversible terminators, primers, and DNA polymerase are added. After laser excitation, fluorescence from each cluster on the flow cell is imaged. The identity of the first base for each cluster is then recorded. Cycles of sequencing are performed to determine the fragment sequence one base at a time.

In some embodiments, the methods of the present invention may employ sequencing by ligation methods commercialized by Applied Biosystems (e.g., SOLiD sequencing). In other embodiments, the methods of the present invention may employ sequencing by synthesis using the methods commercialized by 454/Roche Life Sciences, including but not limited to the methods and apparatus described in Margulies et al., Nature (2005) 437:376-380; and U.S. Pat. Nos. 7,244,559; 7,335,762; 7,211,390; 7,244,567; 7,264,929; and 7,323,305. In other embodiments, the methods of the present invention may employ the sequencing methods commercialized by Helicos BioSciences Corporation (Cambridge, Mass.) as described in U.S. application Ser. No. 11/167,046, and U.S. Pat. Nos. 7,501,245; 7,491,498; 7,276,720; and in U.S. Patent Application Publication Nos. US20090061439; US20080087826; US20060286566; US20060024711; US20060024678; US20080213770; and US20080103058. In other embodiments, the methods of the present invention may employ sequencing by the methods commercialized by Pacific Biosciences as described in U.S. Pat. Nos. 7,462,452; 7,476,504; 7,405,281; 7,170,050; 7,462,468; 7,476,503; 7,315,019; 7,302,146; 7,313,308; and US Application Publication Nos. US20090029385; US20090068655; US20090024331; and US20080206764.

Another example of a sequencing technique that can be used in the methods of the provided invention is nanopore sequencing (see e.g. Soni G V and Meller A. (2007) Clin Chem 53: 1996-2001). A nanopore can be a small hole of the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential across it can result in a slight electrical current due to conduction of ions through the nanopore. The amount of current that flows is sensitive to the size of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree. Thus, the change in the current passing through the nanopore as the DNA molecule passes through the nanopore can represent a reading of the DNA sequence.

Another example of a sequencing technique that can be used in the methods of the provided invention is semiconductor sequencing provided by Ion Torrent (e.g., using the Ion Personal Genome Machine (PGM)). Ion Torrent technology can use a semiconductor chip with multiple layers, e.g., a layer with micro-machined wells, an ion-sensitive layer, and an ion sensor layer. Nucleic acids can be introduced into the wells, e.g., a clonal population of a single nucleic acid can be attached to a single bead, and the bead can be introduced into a well. To initiate sequencing of the nucleic acids on the beads, one type of deoxyribonucleotide (e.g., dATP, dCTP, dGTP, or dTTP) can be introduced into the wells. When one or more nucleotides are incorporated by DNA polymerase, protons (hydrogen ions) are released in the well, which can be detected by the ion sensor. The semiconductor chip can then be washed and the process can be repeated with a different deoxyribonucleotide. A plurality of nucleic acids can be sequenced in the wells of a semiconductor chip. The semiconductor chip can comprise chemical-sensitive field effect transistor (chemFET) arrays to sequence DNA (for example, as described in U.S. Patent Application Publication No. 20090026082). Incorporation of one or more triphosphates into a new nucleic acid strand at the 3′ end of the sequencing primer can be detected by a change in current by a chemFET. An array can have multiple chemFET sensors.
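The flow-by-flow logic described above, in which one deoxyribonucleotide type is introduced at a time and one or more incorporations may occur per flow, can be sketched in Python as an idealized model. The flow order "TACG", the function name and the noise-free incorporation counts are assumptions made for illustration; they do not describe any particular instrument.

```python
def flowgram(synthesized, flow_order="TACG", max_flows=400):
    """Idealized incorporation counts per nucleotide flow.

    `synthesized` is the strand being built from the sequencing primer,
    written 5' to 3'. Each flow offers one nucleotide type; a homopolymer
    run is incorporated within a single flow, and a flow that does not
    match the next base incorporates nothing.
    """
    counts = []
    position = 0
    flow = 0
    while position < len(synthesized) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        incorporated = 0
        while position < len(synthesized) and synthesized[position] == base:
            incorporated += 1
            position += 1
        counts.append((base, incorporated))
        flow += 1
    return counts


signal = flowgram("TTAGC")
print(signal)
```

For the strand TTAGC, the first T flow incorporates two bases, the A flow one, the C flow none (the next base is G), and so on until the final C is incorporated on the seventh flow.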

Kits

Any of the compositions described herein may be comprised in a kit. In a non-limiting example, the kit, in a suitable container, comprises: an adaptor or several adaptors, one or more oligonucleotide primers and reagents for ligation, primer extension and amplification. The kit may also comprise means for purification, such as a bead suspension, and nucleic acid modifying enzymes.

Products Based on the Methods of the Invention

Products based on the methods of the invention may be commercialized by the Applicants under the Encore® Complete family. Encore is a registered trademark of NuGEN Technologies, Inc.

EXAMPLES

Example 1—Characterization of the Human Oral Microbiome by Sequential Sequencing of Bacterial 16S Ribosomal DNA

This example describes the characterization of the human oral microbiome by sequencing of the 16S rRNA gene sequences of a number of related bacterial organisms. 16S rRNA gene sequences contain species-specific hypervariable regions that can provide means for bacterial identification.

Sample nucleic acid

Microbial genomic DNA is isolated from human saliva using the OMNIgene-DISCOVER sample collection kit (DNA Genotek) according to the manufacturer's instructions. Extracted DNA is then fragmented via sonication to an average length of 400 bp and purified using Agencourt AMPure XP beads (Beckman Coulter Genomics).

Generation of control and test 16S libraries with ligated adapters

The NuGEN Ovation Ultralow Library System (NuGEN Technologies) is used to generate two directional next generation sequencing libraries from 100 ng of the purified sample according to manufacturer's instructions.

Ligation products of at least 100 bp in length are purified by selective binding to Agencourt AMPure XP beads.

Cyclic primer sequencing

16S ribosomal DNA fragments from the test library are sequenced on an Illumina sequencing system using standard forward primers. Alternatively, a custom primer may be used. Following the first sequencing read, the DNA is denatured to wash away the first strand. A second primer that hybridizes to conserved regions within the 16S library inserts is injected into the sequencer to act as a priming site for a second sequencing read. This second primer is designed to hybridize to conserved regions that are adjacent to variable regions within the inserts. Successive rounds of denaturation, re-priming and sequencing are performed with primers that hybridize to additional conserved regions. Sequence reads from successive priming and sequencing are compiled and aligned to map reads originating from the same nucleic acid fragments.
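The compiling step can be sketched in Python as grouping reads by the immobilized template (cluster) that produced them. The record layout of (cluster identifier, sequencing round, sequence) and the function name are hypothetical; how a given sequencer reports cluster identity across rounds is not specified here.

```python
from collections import defaultdict


def compile_reads(reads):
    """Group reads from successive sequencing rounds by originating cluster.

    `reads` is an iterable of (cluster_id, round_number, sequence) tuples
    (assumed record layout). Returns a dict mapping each cluster to its
    reads ordered by round.
    """
    by_cluster = defaultdict(dict)
    for cluster_id, round_number, sequence in reads:
        by_cluster[cluster_id][round_number] = sequence
    return {
        cluster_id: [rounds[r] for r in sorted(rounds)]
        for cluster_id, rounds in by_cluster.items()
    }


reads = [
    ("c1", 1, "ACGT"),   # round 1: standard forward primer
    ("c2", 1, "TTGA"),
    ("c1", 2, "GGCA"),   # round 2: conserved-region primer hybridized to c1 only
    ("c1", 3, "CCTA"),   # round 3: a further conserved-region primer
]
compiled = compile_reads(reads)
linked = {cid: seqs for cid, seqs in compiled.items() if len(seqs) > 1}
print(linked)
```

Clusters whose inserts lack the conserved region yield no read in later rounds, so only clusters with two or more reads contribute linked variable regions.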

Example 2—Genomic DNA Sequencing—Distinguishing Between the SMN1 Gene and SMN2 Pseudogene Using Sequential Sequencing

Genomic DNA sequencing libraries are made using NuGEN's Encore system. These libraries are sequenced on a DNA sequencing system such as those made by Illumina, Ion Torrent, Pacific Biosciences, or Complete Genomics. Following a first sequencing read, the DNA is denatured to wash away the first strand. A pool of primers that hybridize to common sequences in gene/pseudogene pairs is injected into the sequencer to act as a priming site for a second sequencing read. A primer set may include primers that will sequence through one of the nucleotide differences between SMN1 and SMN2 as well as primers that will generate sequence to read nucleotide differences, and therefore determine whether a sequencing read is from a globin gene or pseudogene. A combination of such primers will allow multiple gene/pseudogene pairs across the genome to be analyzed simultaneously for genetic mutations.
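Assigning a first read to a gene or its pseudogene from the mated second read can be sketched in Python as below. The diagnostic bases, the read offset, the read sequences and the function name are illustrative placeholders and are not taken from this disclosure.

```python
def assign_paralog(second_read, offset, diagnostic_bases):
    """Return the gene whose diagnostic base matches the second read.

    `offset` is the position in the second read that covers a known
    nucleotide difference between the related sequences; returns None
    if the read is too short or the base matches neither sequence.
    """
    if offset < 0 or offset >= len(second_read):
        return None
    observed = second_read[offset].upper()
    for gene, expected in diagnostic_bases.items():
        if observed == expected:
            return gene
    return None


DIAGNOSTIC = {"SMN1": "C", "SMN2": "T"}  # illustrative placeholder values
OFFSET = 5                               # hypothetical position in read 2

# (cluster id, first read, second read from the conserved-region primer)
mated_pairs = [
    ("cluster_1", "GATTACAGG", "AAGGACAGT"),
    ("cluster_2", "GATTGCAGG", "AAGGATAGT"),
    ("cluster_3", "GATTACAGG", "AAGG"),
]
assignments = {
    cluster: assign_paralog(second, OFFSET, DIAGNOSTIC)
    for cluster, _first, second in mated_pairs
}
print(assignments)
```

Any variant observed in the first read of a cluster can then be attributed to the gene assigned from that cluster's second read; pairs that return None remain unassigned.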

Example 3—Targeted DNA Sequencing Library

A targeted DNA sequencing library is made using a target enrichment product from NuGEN, Agilent, Illumina, or Nimblegen. These libraries are sequenced on a DNA sequencing system such as those made by Illumina, Ion Torrent, Pacific Biosciences, or Complete Genomics. Following a first sequencing read, the DNA is denatured to wash away the first strand. A pool of primers that hybridize to common sequences in gene/pseudogene pairs is injected into the sequencer to act as a priming site for a second sequencing read. A primer set may include primers that will sequence through one of the nucleotide differences between SMN1 and SMN2 as well as primers that will generate sequence to read nucleotide differences, and therefore determine whether a sequencing read is from a globin gene or pseudogene. A combination of such primers will allow multiple gene/pseudogene pairs across the genome to be analyzed simultaneously for genetic mutations. This type of technology is useful for genetic diagnostics.

Example 4—RNA-Sequencing Library

An RNA sequencing library is made using NuGEN's Encore Complete RNA-Seq Library System. The library is sequenced on an Illumina DNA sequencer. Following the first sequencing read, a pool of primers that will hybridize to specific exons of interest is injected into the sequencing machine. These primers are used to generate a second sequencing read in a downstream exon. The second, targeted sequencing read provides information about which exons have been spliced together to generate a particular RNA transcript.
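The way the first and second reads are combined in this example can be illustrated with a short sketch. The Python code below is not part of the disclosed method. It assumes that each read has already been aligned and assigned an exon label, and that reads from the same immobilized template share a cluster identifier. The function name link_exons, the cluster identifiers, and the exon labels are hypothetical.

```python
from collections import Counter
from typing import Dict


def link_exons(first_reads: Dict[str, str],
               second_reads: Dict[str, str]) -> Counter:
    """Count (first-read exon, second-read exon) pairs seen on the same cluster.

    Both arguments map a cluster identifier to an exon label. Clusters
    that lack a second read are skipped.
    """
    pairs: Counter = Counter()
    for cluster_id, upstream_exon in first_reads.items():
        downstream_exon = second_reads.get(cluster_id)
        if downstream_exon is not None:
            pairs[(upstream_exon, downstream_exon)] += 1
    return pairs


# Toy data: cluster identifiers and exon labels are invented for illustration.
toy_first = {"c1": "exon2", "c2": "exon2", "c3": "exon2", "c4": "exon3"}
toy_second = {"c1": "exon5", "c2": "exon6", "c3": "exon5"}
toy_links = link_exons(toy_first, toy_second)
```

In this toy input, the pair (exon2, exon5) is observed on two clusters and (exon2, exon6) on one, suggesting two transcript forms that share exon2 but differ in the downstream exon.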

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

1. A method for relating at least two nucleic acid sequences or regions of sequence variation to the same nucleic acid template, the method comprising:

a. creating a strand-oriented (i.e. directional) nucleic acid library;

b. sequencing the strand-oriented library with an oligonucleotide primer;

c. denaturing the first strands of the nucleic acid fragments in the library;

d. annealing a new oligonucleotide primer that is complementary to a conserved region or conserved regions within the nucleic acid fragments in the nucleic acid library;

e. sequencing the nucleic acid library with the new oligonucleotide primer; and

f. compiling data from first and second sequencing reads to map reads originating from the same nucleic acid fragments.

2. The method of claim 1, wherein the nucleic acid libraries are amplicons originating from conserved regions of sequence.

3. The method of claim 2, wherein the conserved regions are adjacent to variable regions.

4. The method of claim 3, wherein alignment of multiple variable regions enables differentiation and/or typing of related transcripts.

5. The method of claim 3, wherein alignment of multiple variable regions enables differentiation and/or typing of related micro-organisms.

6. The method of claim 1, wherein the libraries are of reduced complexity.

7. The method of claim 6, wherein reduced complexity is achieved by target capture.

8. A method for distinguishing between two closely related nucleic acid sequences, the method comprising:

a. creating a strand-oriented nucleic acid library with closely related nucleic acid sequences as inserts;

b. sequencing the strand-oriented library with an oligonucleotide primer;

c. denaturing the first strands of the nucleic acid fragments in the library;

d. annealing a new oligonucleotide primer that is complementary to a conserved region or conserved regions within the nucleic acid fragments in the nucleic acid library;

e. sequencing the nucleic acid library with the new oligonucleotide primer; and

f. compiling data from first and second sequencing reads to map reads originating from the same nucleic acid fragments.