OFF-TARGET CAPTURE REDUCTION IN SEQUENCING TECHNIQUES
Presented herein are methods and compositions for enhancing specific enrichment of target sequences in a nucleic acid library. Off-target hybridization probes may be used to reduce binding and/or capture of off-target regions of a nucleic acid library in a targeted sequencing workflow. The off-target hybridization probes may be specific for locations known to generate off-target sequencing reads for a particular set of hybridization probes.
The present application claims priority to U.S. Provisional Application No. 62/238,411, entitled “DATA-GUIDED DESIGN OF HYBRID CAPTURE OFF-TARGET REDUCERS” and filed Oct. 7, 2015, the disclosure of which is incorporated herein by reference for all purposes.
BACKGROUND
The present disclosure relates generally to the field of nucleic acid sequencing techniques. More particularly, the disclosure relates to techniques for enriching target capture and reducing off-target capture of nucleic acids to be sequenced in a targeted sequencing workflow.
Sequencing methodology of next-generation sequencing (NGS) platforms typically makes use of nucleic acid fragment libraries. In targeted sequencing techniques, a subset of fragments containing genes or regions of interest of the genome are isolated from the nucleic acid library and sequenced. Targeted approaches using NGS allow researchers to focus time, expenses, and data analysis on specific areas of interest. Such targeted analysis can include the exome (the protein-coding portion of the genome), specific genes of interest (custom content), targets within genes, or mitochondrial DNA. Targeted approaches contrast with whole genome sequencing approaches that are more comprehensive, but that also involve sequencing regions of the genome that may not be of interest to all users.
In one example of a targeted sequencing technique, hybrid capture methods use a panel or set of probes that hybridize to target sequences in the nucleic acid library. Hybridization of the probes to the target sequences allows these sequences to be separated from the rest of the fragments in the library for sequencing. By targeting only a portion of the nucleic acid library, hybrid capture methods avoid sequencing of off-target nucleic acid fragments that do not contain sequences of interest. However, compared with amplicon-based target enrichment methods, hybrid capture methods have a higher rate of off-target sequencing and, in turn, lower on-target specificity. For example, certain hybrid capture methods generally achieve only 40%-60% efficiency, despite the use of commercial hybridization blockers such as Cot1, tRNA, salmon sperm DNA, poly(dI-dC) and blockers targeting the universal adapters of library fragments. The off-target reads not only waste sequencing yield, but also potentially compromise variant calling for somatic mutations of low frequency. Therefore, there is a need for improved enrichment methods that provide for higher specificity in targeted sequencing techniques.
BRIEF SUMMARY
Presented herein are techniques for enrichment of target sequences in a nucleic acid library and reducing the capture of off-target sequences by a set of target hybridization probes. Because target hybridization probes have imperfect specificity for their nucleic acid targets, a sequencing run using a set of target hybridization probes may also include a certain percentage of reads that represent sequences that are off-target. For example, in an exome sequencing reaction, certain hybridization probes may pull down intronic or intergenic sequences from a nucleic acid library along with target sequences. These off-target fragments, once pulled down, are then present in the pool of nucleic acid fragments that are sequenced. While the sequencing information representative of the off-target reads is typically discarded, the present techniques use acquired sequencing information of these off-target reads to design hybridization probes that are specific for the off-target sequences and that are used to separate and/or remove fragments that include these sequences from the pool of fragments captured by the target-specific hybridization probes. The off-target hybridization probes are designed based on analysis of the off-target reads of a hybrid capture sequencing run that is performed with a set of target hybridization probes. In certain embodiments, the on-target probe design may also be based on systematic off-target analysis across samples to improve the specificity of the target hybridization probes for their desired targets.
Presented herein is a method of reducing off-target capture in a targeted sequencing reaction. The method includes the steps of providing a set of off-target hybridization probes that specifically bind to a plurality of off-target sequences present in a nucleic acid library generated from a sample, the nucleic acid library comprising a plurality of nucleic acid fragments and providing a set of target-specific hybridization probes that specifically bind to a plurality of target sequences present in the nucleic acid library. The method also includes the steps of contacting the off-target hybridization probes with the nucleic acid library under conditions whereby the off-target hybridization probes hybridize to the off-target sequences and contacting the target-specific hybridization probes with the nucleic acid library under conditions whereby the target-specific hybridization probes hybridize to the target sequences. The method also includes the steps of selecting a group of nucleic acid fragments from the nucleic acid library bound to the target-specific hybridization probes; and sequencing the group of nucleic acid fragments bound to the target-specific hybridization probes.
Presented herein is also a method of providing probes for off-target sequence capture in a targeted sequencing reaction. The method includes the steps of receiving a request for a set of target-specific hybridization probes. The method also includes the steps of contacting the target-specific hybridization probes with a reference nucleic acid library generated from a reference sample, the nucleic acid library comprising a plurality of nucleic acid fragments, to generate a reference group of target-specific and off-target nucleic acid fragments bound to the target-specific hybridization probes and separating the reference group of nucleic acid fragments bound to the target-specific hybridization probes from unbound nucleic acid fragments. The method also includes the steps of sequencing the reference group of nucleic acid fragments to generate reference sequencing data; identifying off-target sequences in the reference sequencing data; and providing a set of off-target hybridization probes based on the identified off-target sequences.
Presented herein is also a sequencing kit for reducing off-target capture in a targeted sequencing reaction that includes a set of off-target hybridization probes that specifically bind to a plurality of off-target sequences present in a nucleic acid library generated from a sample, the nucleic acid library comprising a plurality of nucleic acid fragments and a set of target-specific hybridization probes that specifically bind to a plurality of target sequences present in the nucleic acid library.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Hybrid capture methods in which target sequences are selected via binding by hybridization probes are associated with a high off-target binding rate and low on-target specificity. The present techniques improve sequencing efficiency by reducing the presence of off-target sequences in a hybrid capture sequencing workflow using a data-guided approach. While certain techniques may use blockers or binding attenuators to influence probe binding, such approaches are not data-guided. For example, salmon sperm DNA may be used to prevent non-specific binding of probes to reaction surfaces. However, nonspecific blockers do not prevent the binding of target-specific probes to off-target sequences with similarity to target sequences. Target-specific probes have specificity for their intended targets. However, sequences present in off-target regions may be sufficiently similar to the target sequences (e.g., having short stretches of homology with the target, high string similarity) to permit at least some off-target binding of a target probe, albeit with lower specificity relative to the target sequence binding. Off-target binding is more prevalent in hybrid capture techniques relative to other targeted sequencing methodologies, in part because target-specific hybridization probes are typically longer oligonucleotides (80-120mer) relative to the primers (25-30mer) in PCR-based methods, which may facilitate probe binding to off-target sequences having sufficient similarity to the target sequences. PCR-based targeted sequencing typically requires both ends of the primer binding to a specific area. The double binding need makes random off-target binding slower to amplify compared to on-target binding, which in turn reduces off-target amplification. In another example, longer oligonucleotides are statistically more likely than shorter oligonucleotides to have contiguous base stretches within the oligonucleotides that are similar to the off-target sequences. 
Such complementary or high similarity contiguous stretches may contribute to off-target binding.
The present techniques use information about off-target sequences to improve hybrid capture and decrease the percentage of off-target capture. A hybrid capture sequencing reaction may acquire sequence data from off-target sequences as a result of undesired off-target binding of target-specific hybridization probes. While such off-target sequencing data is typically discarded, the present techniques harness the sequencing information of the off-target sequences for use in designing probes specific for these off-target regions. Using probes with high specificity for the off-target sequences facilitates a reduction in the total number of off-target regions present in a pool of sequenced fragments. As a result of the data-guided approach, the percentage of off-target sequencing reads in a given sequencing run will be reduced. Accordingly, the present techniques provide the benefit of improving the efficiency of a sequencing device by reducing the total amount of raw data generated in a sequencing run. Further, the reduction in off-target reads present in the sequencing data also improves the efficiency of data analysis by reducing the amount of off-target sequence data to be identified and excluded from analysis.
Turning to the figures, embodiments of the present techniques include acquisition of off-target sequence data as an input for data-guided design of off-target hybridization probes.
As provided herein, a target sequence 14 is a nucleic acid sequence present in a nucleic acid library that is complementary to a target-specific hybridization probe 20. Depending on the desired sequencing outcome, the target sequences 14 may be exonic sequences for exome sequencing. Accordingly, in some embodiments, the target-specific hybridization probes 20 are directed to target sequences 14 of exons. In another embodiment, the target sequences 14 may be custom sequences, or disease or allele-specific sequences. The target sequence 14 may be part of a region of interest in a nucleic acid sample, and the target-specific hybridization probe 20 may be designed based on various metrics to be specific for a portion of the region of interest.
As provided herein, a probe (e.g., a target-specific hybridization probe 20) is an oligonucleotide, such as a single-stranded nucleic acid molecule. The target-specific hybridization probe 20 may be part of a set or panel of target-specific hybridization probes 20. The target-specific hybridization probes 20 may be 80-120 bases in length, 80-100 bases in length, 90-110 bases in length, 100-120 bases in length, etc. In certain embodiments, if the target-specific hybridization probe 20 is 80-120 bases in length, at least 30-50 of the bases of the target-specific hybridization probe are complementary to the target sequence 14. It should be understood that a hybrid capture sequencing reaction may be performed using a set of target-specific hybridization probes 20, wherein different probes are representative of different target sequences 14 in the nucleic acid library. For example, the set of target-specific hybridization probes 20 may be representative of at least 2000 different target sequences 14, at least 5000 different target sequences 14, at least 10,000 different target sequences 14, and so on. Further, while the disclosed embodiments are discussed with regard to hybrid capture technologies, incorporation of the techniques provided herein may also be implemented with PCR or amplicon-based sequencing techniques. In such embodiments, the target-specific hybridization probes 20 may be on the order of 20-40 bases in length.
In certain embodiments, the target-specific hybridization probes 20 may have modifications that facilitate separation of bound fragments 12 from the unbound fragments 12. Such modifications may include biotinylation of the probe to facilitate selection via streptavidin (e.g., streptavidin beads). However, it should be understood that the probes as provided herein may be coupled to another affinity binding molecule that is part of a binding pair. For example, biotin and streptavidin, biotin and avidin, or digoxigenin and a specific antibody that binds digoxigenin are examples of specific binding pairs. The affinity binding molecule may be an antibody ligand capable of being conjugated to a nucleotide. In certain embodiments, the modification is provided at the 5′ or the 3′ end of the probe. Further, in other embodiments, the probes may be unmodified. The target-specific hybridization probes 20 may also include unique barcodes or sequences that facilitate identification. Such sequences may be part of a region of the probe 20 that is non-complementary to the target sequence 14. The target-specific hybridization probes 20 may be in solution or immobilized on a solid support (e.g., an array).
As shown in
As provided herein, an off-target sequence 16 is a sequence that is not an intended target of one or more of the target-specific hybridization probes 20. In one example, if the target-specific hybridization probes 20 are for exome sequencing, an off-target sequence 16 may be an intronic or intergenic sequence. In certain embodiments, a target-specific hybridization probe 20 is capable of binding to an off-target sequence 16 with lower specificity than for the intended target sequence.
An examination of off-target sequences was performed to demonstrate that the off-target sequences are relatively stable between samples.
The sequence similarities between the off-target regions and capture probes also indicate that off-target reads were likely pulled down by probes, rather than by random binding.
As provided herein, sequencing data may include raw data as well as base call data for the sequenced fragments of the nucleic acid library. Further, the sequencing data may have undergone alignment and assembly so that the genome loci of the assembled fragments can be identified. Accordingly, the sequence data may include sequence information and location information for the assembled fragments such that off-target data is identifiable based at least in part on the location of the sequenced fragments. In addition, the sequencing data may include coverage data of off-target sequence reads so that the off-target prevalence as well as locations may be assessed. In this manner, the highest prevalence sequence reads (i.e., highest coverage) for various off-target loci may be identified. In certain embodiments, the off-target reads are ranked according to coverage to identify the highest frequency off-target loci. The off-target hybridization probes may be designed based on the highest 50, 100, 1000, or 2000 loci. In one embodiment, the design is based on a user-specified number of the ranked sequences.
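The ranking step described above can be illustrated with a minimal sketch. The record type `OffTargetLocus`, the function `top_off_target_loci`, and the example coordinates and coverage values are hypothetical and are introduced only for illustration; the sketch assumes that a mean coverage value has already been computed for each off-target locus from aligned sequencing data.

```python
from typing import List, NamedTuple


class OffTargetLocus(NamedTuple):
    """Hypothetical record for one off-target locus found in aligned reads."""
    chrom: str
    start: int
    end: int
    coverage: float  # mean read depth over the locus


def top_off_target_loci(loci: List[OffTargetLocus], n: int) -> List[OffTargetLocus]:
    """Return the n loci with the highest coverage, highest first."""
    return sorted(loci, key=lambda locus: locus.coverage, reverse=True)[:n]


# Illustrative values only; these are not measured data.
loci = [
    OffTargetLocus("chr1", 1_000, 1_200, 35.0),
    OffTargetLocus("chr2", 5_000, 5_150, 310.0),
    OffTargetLocus("chr7", 9_400, 9_650, 120.0),
]

# A user-specified number of top-ranked loci (here 2) is kept for probe design.
selected = top_off_target_loci(loci, 2)
print([locus.chrom for locus in selected])
```

In practice the value of `n` would correspond to the highest 50, 100, 1000, or 2000 loci, or to a user-specified number, as described above.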
In one embodiment, the method 30 may be performed as part of a workflow for generating a panel of target-specific hybridization probes. Based on a request for a particular panel of target-specific hybridization probes, the method 30 is initiated on a reference sample to identify and assess the off-target sequences. The reference sample may be an internal standard that is known to be a high quality sample. In another embodiment, the method 30 is initiated upon receipt of a customer request for a custom panel of target-specific hybridization probes. As part of synthesizing the custom panel, the method 30 is performed to identify potential off-target sequences. Accordingly, the method 30 may be performed in response to a user or customer input.
Based on the identified off-target sequences, a set of off-target hybridization probes may be identified and synthesized to be provided as part of a sequencing kit. The off-target hybridization probes may be an optional add-on item to improve sequencing yield and reduce off-target sequence capture. In another embodiment, the method 30 may also include generating an estimate of sequencing cost reduction for the reference sample based on an estimated reduction in off-target sequencing reads. For example, if a typical hybrid capture sequencing run generates 60% target reads and 40% off-target reads, then 40% of the cost of sequencing is attributable to off-target sequences. If the set of off-target hybridization probes is designed to correspond to off-target sequences that represent about 50% of the off-target coverage in the reference sequencing data, then the off-target hybridization probes are capable of reducing off-target reads by 50%. Accordingly, a sequencing run using the off-target hybridization probes to reduce off-target capture may be estimated to lower costs by 20% relative to the control. In this manner, a user may determine if the cost of the off-target hybridization probes will generate sufficient savings on sequencing. The method 30 may also permit dynamic estimates based on variable user inputs. For example, reducing the total number of off-target sequences of the off-target hybridization probes will reduce probe cost, but may be associated with a slight increase in off-target sequence capture, resulting in an associated rise in estimated sequencing costs relative to a selection of a higher number of off-target sequences of the off-target hybridization probes. In another embodiment, the user may provide a total sequencing budget, including any target and off-target probe costs, for a given sample, and a determination may be made if cost savings can be achieved using the off-target hybridization probes.
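The cost estimate in the example above can be written out as a short calculation. The sketch below is a simplified linear model, not a definitive costing method: it assumes that sequencing cost scales with the fraction of reads and that the off-target hybridization probes remove all reads at the loci they cover. The function names and the dollar amounts in the usage lines are hypothetical.

```python
def estimated_cost_reduction(off_target_fraction: float, covered_fraction: float) -> float:
    """Fraction of sequencing cost saved under the simplified linear model.

    off_target_fraction: share of reads that are off-target in the control run.
    covered_fraction: share of off-target coverage addressed by the probe set.
    """
    for value in (off_target_fraction, covered_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return off_target_fraction * covered_fraction


def net_saving(sequencing_cost: float, probe_cost: float,
               off_target_fraction: float, covered_fraction: float) -> float:
    """Estimated saving on one run after subtracting the cost of the probes."""
    reduction = estimated_cost_reduction(off_target_fraction, covered_fraction)
    return sequencing_cost * reduction - probe_cost


# Example from the text: 40% off-target reads, probes covering 50% of them.
saving = estimated_cost_reduction(0.40, 0.50)
print(f"{saving:.0%}")

# Hypothetical budget figures, for illustration only.
print(net_saving(sequencing_cost=1000.0, probe_cost=150.0,
                 off_target_fraction=0.40, covered_fraction=0.50))
```

A positive net saving suggests that the probe set pays for itself on that run; varying `covered_fraction` gives the kind of dynamic estimate described above.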
As provided herein, an off-target hybridization probe (e.g., off-target hybridization probe 60, see
It should be understood that a targeted sequencing reaction may be performed using a set of target-specific hybridization probes 20 together with (e.g., in parallel or in sequence) off-target hybridization probes, wherein off-target hybridization probes are representative of different off-target sequences in the nucleic acid library. For example, the set of off-target hybridization probes may be representative of at least 50 different off-target sequences, at least 100 different off-target sequences, at least 10000 different off-target sequences, and so on. In another embodiment, a set of target sequences represents a greater number of different sequences than a set of off-target sequences for the probes used in a hybrid capture sequencing as provided herein. For example, a ratio of the number of different target sequences in the target-specific hybridization probes to the number of different off-target sequences in the off-target hybridization probes may be 2:1, 3:1, 4:1, 5:1 or greater in certain embodiments. There are certain advantages to providing a limited number of off-target hybridization probes due to the cost of manufacturing additional probes for use. Accordingly, the ranking of the prevalence of off-target sequences may be used to permit user selection of a number of desired off-target hybridization probes. Further, certain highly prevalent off-target sequences may be present in the total pool of off-target sequences to such a high degree that having a limited number of off-target hybridization probes specific for highly prevalent off-targets may nonetheless yield a high reduction in off-target sequence capture.
In certain embodiments, the off-target hybridization probes may have modifications that facilitate separation of bound fragments from the unbound fragments. Such modifications may include biotinylation of the probe to facilitate selection via streptavidin (e.g., streptavidin beads). However, it should be understood that the probes as provided herein may be coupled to another affinity binding molecule that is part of a binding pair. For example, biotin and streptavidin, biotin and avidin, or digoxigenin and a specific antibody that binds digoxigenin are examples of specific binding pairs. In certain embodiments, the modification is provided at the 5′ or the 3′ end of the probe. Further, in other embodiments, the probes may be unmodified.
The off-target hybridization probes may also include unique barcodes or sequences that facilitate identification. Such sequences may be part of a region of the probe that is non-complementary to the off-target sequences. The off-target hybridization probes may be in solution or immobilized on a solid support (e.g., an array). In another embodiment, the target-specific hybridization probes and the off-target hybridization probes are provided as similar length probes, i.e., all within a certain range. Accordingly, in a specific embodiment, the target-specific hybridization probes and the off-target hybridization probes are all in a range of 80-120 bases in length. In another embodiment, the target-specific hybridization probes and the off-target hybridization probes are all in a range of 20-40 bases in length. In yet another embodiment, the target-specific hybridization probes have a length all in a first range and the off-target hybridization probes have a length all in a second range, whereby the first range and the second range are different. In one embodiment, the first range encompasses longer probe lengths than the second range. In another embodiment, the first range encompasses shorter probe lengths than the second range.
In certain embodiments of the disclosure, providing the off-target hybridization probes comprises providing the off-target hybridization probes as part of a sequencing kit for use with the target-specific hybridization probes. The off-target hybridization probes may be specific for only certain types of off-target sequences (e.g., introns, intergenic regions). In this manner, a user may select the off-target sequences of interest. In another embodiment, providing the off-target hybridization probes comprises providing the off-target hybridization probes as part of a request or order for a custom target-specific hybridization probe panel. When the request for the custom panel is received, the synthesis facility may also perform the steps of the method 30 to determine the off-target sequences of concern (e.g., highly prevalent off-target sequences) for the custom panel and provide off-target hybridization probes to reduce off-target reads from these identified off-target sequences.
In another embodiment, a universal set of off-target hybridization probes may be provided. That is, regardless of the particular panel of target-specific hybridization probes used, certain off-target reads may be common across a species. In one implementation, a species-specific set of off-target hybridization probes may be used to de-host a sample, such as in microbiology, infectious disease, food safety, and quality monitoring. A universal set and/or a species-specific set may be determined using the data-guided techniques as provided herein. For example, the universal set or the species-specific set may be selected by performing sequencing on reference samples using different panels of target-specific hybridization probes (e.g., using a plurality of human-specific panels or using a plurality of cancer-specific panels) and selecting the top-ranked (i.e., most prevalent) off-target sequences from the sequencing data from all of the different panels to design the off-target hybridization probes. In one embodiment, the top-ranked set may include only the off-target sequences that are common between samples sequenced using different panels. In another embodiment, the top-ranked set may be representative of a pool of all of the off-target sequences in the sequencing data using the different panels, such that some sequences in the pool are only off-target for a given panel. However, the top-ranked set will nonetheless include a number of off-target reads represented in the sequencing data for each sample such that the universal set, when used, will reduce off-target capture when used in conjunction with any of the panels.
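The two ways of assembling a universal or species-specific set described above (only the off-target sequences common to all panels, or the top-ranked sequences from a pool of all panels) can be sketched as follows. The data layout, a mapping from panel name to per-locus off-target coverage, and the function names are assumptions made for illustration, and the values are not measured data.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical layout: panel name -> {locus identifier: off-target coverage}.
PanelCoverage = Dict[str, Dict[str, float]]


def common_off_targets(per_panel: PanelCoverage) -> Set[str]:
    """Loci observed as off-target with every panel."""
    locus_sets = [set(coverage) for coverage in per_panel.values()]
    return set.intersection(*locus_sets) if locus_sets else set()


def pooled_top_ranked(per_panel: PanelCoverage, n: int) -> List[str]:
    """The n loci with the highest coverage summed over all panels."""
    pooled: Counter = Counter()
    for coverage in per_panel.values():
        pooled.update(coverage)  # adds coverage values locus by locus
    return [locus for locus, _ in pooled.most_common(n)]


per_panel = {
    "panel_A": {"chr1:1000-1200": 300.0, "chr2:500-700": 80.0},
    "panel_B": {"chr1:1000-1200": 250.0, "chr9:40-260": 120.0},
}

print(common_off_targets(per_panel))
print(pooled_top_ranked(per_panel, 2))
```

The pooled variant can include a locus that is off-target for only one panel (here `chr9:40-260`), which matches the second embodiment described above.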
Also provided herein are methods of implementing targeted sequencing using the off-target hybridization probes as provided herein.
It should be understood that the target-specific hybridization probes 20 and the off-target hybridization probes 60 as provided herein may be used in conjunction with blockers or other approaches used in hybrid capture to reduce probe self-annealing, sticky probes, or nonspecific binding.
In one embodiment, the off-target hybridization probes are specific for the highly enriched off-target regions to provide reduction of the most-prevalent off-target reads. Where an off-target sequence has a highly similar sequence to the actual target region, use of an off-target hybridization probe specific for that highly similar sequence could cause an unintended coverage drop for the target region having the similar sequence. To prevent this from happening, in one embodiment, off-target hybridization probes may be selected only from off-target regions having less than a threshold similarity with a target sequence according to one or more similarity metrics (e.g., Damerau-Levenshtein distance, Needleman-Wunsch algorithm, BLAST score). In one embodiment, a threshold percent identity or identity score is used to qualify off-target hybridization probes, with only off-target sequences having less than a predetermined percent identity (e.g., less than 50%, less than 25%) with a target sequence being qualified. For example, in one embodiment, only off-target sequences that do not contain matches of 15 or more contiguous bases with a target sequence will be qualified for off-target hybridization probe design. Those off-target sequences with 15 or more contiguous bases in common with a target sequence are not used as the basis for any off-target hybridization probes, even if such off-target sequences are highly prevalent. In another example, because the loci of off-target sequences are known, the sequence for which the off-target hybridization probe is specific can be shifted 5′ or 3′ away from the highly similar region, e.g., moved 20-50 bases 5′ or 3′ such that the targeted region has a lower similarity score.
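The contiguous-match qualification described above can be sketched as a k-mer comparison: two sequences share a run of 15 or more contiguous bases exactly when they share at least one 15-base substring. The function names and the example sequences below are hypothetical. The sketch compares sequences on the same strand only and requires exact matches; handling of reverse complements or mismatches is left out and would need to be added for real use.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def qualifies_for_probe_design(off_target: str, targets: Iterable[str], k: int = 15) -> bool:
    """True if off_target shares no run of k or more contiguous bases with any target."""
    target_kmers: Set[str] = set()
    for target in targets:
        target_kmers |= kmers(target, k)
    return kmers(off_target, k).isdisjoint(target_kmers)


# Made-up sequences for illustration only.
target = "ACGTTGCAAGGCTTACCGATGCA"
similar = "TTT" + target[2:18] + "GGG"       # shares 16 contiguous bases with target
distinct = "GATTACAGATTACAGATTACAGATTACA"     # shares no 15-base run with target

print(qualifies_for_probe_design(similar, [target]))
print(qualifies_for_probe_design(distinct, [target]))
```

An off-target sequence that fails this check could either be excluded or, as described above, have its probe shifted 5′ or 3′ and be tested again.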
As shown in
In another embodiment, it may be desirable to retain the off-target group 82 to assess probe quality. The pre-clearing technique (see
In one example, to find the consistent off-target regions, a set of representative samples, e.g., a set of samples of different cell lines/tissues sequenced with good quality, was selected. First, on-target reads were filtered out of the sequencing data; then regions highly enriched for off-target reads were called using the peak-calling tool GEM used for the ENCODE project. However, other peak-calling algorithms may also be used. Overlapping peaks from different samples were then extracted, peaks within 50 bp of each other were merged, and only those that were 400 bp or more away from the targets were kept. The off-target peaks thus identified were sorted by average coverage. According to the ranking, those with significantly high coverage were chosen as the basis for reducer design. Off-target hybridization probes were designed to be specific for off-target regions that contributed to about 50% of the total off-target reads. DesignStudio (Illumina Inc.) was utilized to design the off-target hybridization probes, representative of approximately 2000 off-target sequences.
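The merging and distance-filtering steps of this example can be sketched with plain interval arithmetic. This is not the GEM peak caller and not the pipeline actually used; the function names, the tuple layout `(chromosome, start, end)`, and the coordinates are assumptions for illustration, and the peaks are taken as already called.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


def merge_nearby(peaks: List[Interval], max_gap: int = 50) -> List[Interval]:
    """Merge peaks on the same chromosome that overlap or lie within max_gap bp."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def distance_to_targets(peak: Interval, targets: List[Interval]) -> float:
    """Smallest gap in bp between the peak and any target on its chromosome."""
    chrom, start, end = peak
    gaps = [max(t_start - end, start - t_end, 0)
            for t_chrom, t_start, t_end in targets if t_chrom == chrom]
    return min(gaps) if gaps else float("inf")


def far_from_targets(peaks: List[Interval], targets: List[Interval],
                     min_distance: int = 400) -> List[Interval]:
    """Keep only peaks at least min_distance bp away from every target."""
    return [p for p in peaks if distance_to_targets(p, targets) >= min_distance]


# Illustrative coordinates only.
peaks = [("chr1", 100, 300), ("chr1", 330, 500), ("chr1", 2000, 2200), ("chr2", 50, 250)]
targets = [("chr1", 2500, 2700)]

merged = merge_nearby(peaks)
kept = far_from_targets(merged, targets)
print(merged)
print(kept)
```

The retained peaks would then be ranked by average coverage across samples before probes are designed against the highest-coverage regions.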
Using off-target probe design as outlined herein,
The techniques provided herein address the problem of a high off-target capture rate by using guided information from data analysis on the off-target regions. Prior attempts to solve this issue have utilized Cot1, tRNA, poly(dI-dC), adapter blockers and blockers for high-representation genes (e.g., anti-mitochondrial gene blockers). In contrast to those methodologies, the methods presented herein represent the first data-driven approach. Furthermore, using off-target hybridization probes to clean or remove the unwanted DNA fragments out of sample libraries prior to target-specific binding is a novel approach. Further, the identified systematic off-target regions that are stable between samples as well as different sets or panels of hybridization probes may not necessarily be identified by the conventional wisdom. For example, they may not necessarily be identifiable repetitive elements such as Alu, SINE, LINE, etc. In some embodiments, the approach described herein can be applied to other genomes to develop species-specific off-target hybridization probes for metagenomic applications or contamination elimination in sample prep.
The techniques disclosed herein may be implemented in conjunction with a sequencing device and/or a sequence analysis device.
In the depicted embodiment, the sequencing device 120 includes a separate sample processing device 122 and an associated sequence analysis device 124. Further, it is contemplated that the sequence analysis device 124 may be implemented separately from and not associated with the sample processing device 122. Accordingly, in such an embodiment, sequence analysis device 124 receives data from a remote sample processing device 122. However, these may be implemented as a single device. Further, the associated sequence analysis device 124 may be local to or networked with the sample processing device 122. In the depicted embodiment, the biological sample may be loaded into the sample processing device 122 as a sample slide 126 that is imaged to generate sequence data. For example, reagents that interact with the biological sample fluoresce at particular wavelengths in response to an excitation beam generated by an imaging module 128 and thereby return radiation for imaging. For instance, the fluorescent components may be generated by fluorescently tagged nucleic acids that hybridize to complementary molecules of the components or by fluorescently tagged nucleotides that are incorporated into an oligonucleotide using a polymerase. As will be appreciated by those skilled in the art, the wavelength at which the dyes of the sample are excited and the wavelength at which they fluoresce will depend upon the absorption and emission spectra of the specific dyes. Such returned radiation may propagate back through the directing optics. This retrobeam may generally be directed toward detection optics of the imaging module 128.
The imaging module detection optics may be based upon any suitable technology, and may be, for example, a charge-coupled device (CCD) sensor that generates pixelated image data based upon photons impacting locations in the device. However, it will be understood that any of a variety of other detectors may also be used including, but not limited to, a detector array configured for time delay integration (TDI) operation, a complementary metal oxide semiconductor (CMOS) detector, an avalanche photodiode (APD) detector, a Geiger-mode photon counter, or any other suitable detector. TDI mode detection can be coupled with line scanning as described in U.S. Pat. No. 7,329,860, which is incorporated herein by reference. Other useful detectors are described, for example, in the references provided previously herein in the context of various nucleic acid sequencing methodologies.
The imaging module 128 may be under processor control, e.g., via a processor 130, and the sample processing device 122 may also include I/O controls 132, an internal bus 134, non-volatile memory 136, RAM 138, any other memory structure capable of storing executable instructions, and other suitable hardware components that may be similar to those described with regard to
The sequencing device 120 may be used to request target-specific hybridization probes. Further, the sequencing device 120 may be used to provide user inputs for off-target hybridization probe preparation. The user may provide inputs specifying a desired number of highest ranked sequences to be prepared as the set of off-target hybridization probes. The selections may alternatively or additionally be based on a desired percentage of off-target reduction.
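The two user inputs described above (a desired number of highest ranked sequences, and a desired percentage) can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The function name, the parameter names, the region labels, and the read counts are all hypothetical. The sketch also assumes that the desired percentage is expressed as the fraction of off-target reads accounted for by the selected sequences, which is only an approximation of the reduction actually achieved once the probes are used.

```python
from typing import Dict, List, Optional


def select_off_target_regions(
    read_counts: Dict[str, int],
    top_n: Optional[int] = None,
    min_read_fraction: Optional[float] = None,
) -> List[str]:
    """Pick off-target regions for probe design from ranked read counts.

    read_counts maps a (hypothetical) region label to the number of
    off-target reads observed for that region in reference sequencing data.
    top_n caps the selection at the N highest ranked regions.
    min_read_fraction stops the selection once the chosen regions account
    for at least that fraction of all off-target reads (an assumed proxy
    for the desired off-target reduction).
    """
    # Rank by prevalence, highest first; ties are broken by label so the
    # result is deterministic.
    ranked = sorted(read_counts.items(), key=lambda kv: (-kv[1], kv[0]))

    if top_n is not None:
        ranked = ranked[:top_n]
    if min_read_fraction is None:
        return [region for region, _ in ranked]

    total = sum(read_counts.values())
    selected: List[str] = []
    covered = 0
    for region, count in ranked:
        # Stop as soon as the requested share of reads is covered.
        if total == 0 or covered / total >= min_read_fraction:
            break
        selected.append(region)
        covered += count
    return selected


# Illustrative, made-up counts of off-target reads per region.
example_counts = {
    "chr1:1000-1120": 50,
    "chr2:500-620": 30,
    "chr3:10-130": 15,
    "chr4:70-190": 5,
}
by_number = select_off_target_regions(example_counts, top_n=2)
by_fraction = select_off_target_regions(example_counts, min_read_fraction=0.9)
```

When both inputs are supplied, the sketch treats the number as an upper bound and the fraction as an early stopping point, which is one possible reading of "alternatively or additionally."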
Throughout this application various publications, patents and/or patent applications have been referenced. The disclosure of these publications in their entireties is hereby incorporated by reference in this application. The term “comprising” is intended herein to be open-ended, including not only the recited elements, but further encompassing any additional elements. While only certain features of the invention have been illustrated and described herein, many modifications and changes will occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention. Further, elements of the disclosed embodiments may be combined or exchanged. Accordingly, other embodiments are within the scope of the following claims.
Claims
1-31. (canceled)
32. A sequencing kit for reducing off-target capture in a targeted sequencing reaction, comprising:
- a set of off-target hybridization probes that specifically bind to a plurality of off-target sequences present in a nucleic acid library generated from a sample, the nucleic acid library comprising a plurality of nucleic acid fragments; and
- a set of target-specific hybridization probes that specifically bind to a plurality of target sequences present in the nucleic acid library.
33. The kit of claim 32, wherein the off-target hybridization probes specifically bind to intronic sequences and the target-specific hybridization probes specifically bind to exonic sequences.
34. The kit of claim 32, wherein the set of off-target hybridization probes comprises a universal set that is configured to be used with samples from a particular species.
35. The kit of claim 32, wherein the probes of the set of off-target hybridization probes are between 80 and 120 bases in length.
36. The kit of claim 32, wherein the probes of the set of target-specific hybridization probes are between 80 and 120 bases in length.
37. The kit of claim 32, wherein the set of off-target hybridization probes comprises probes specific for a host sequence of a host, wherein the sample is not of the same species as the host.
38. The kit of claim 32, wherein the set of off-target hybridization probes is specific for 5000 or fewer different off-target sequences.
39. The kit of claim 32, wherein the set of target-specific hybridization probes is specific for 10,000 or more different target sequences.
40. The kit of claim 32, wherein the probes of the set of off-target hybridization probes and/or the set of target-specific hybridization probes comprise an affinity binding molecule of a binding pair.
41. The kit of claim 40, wherein the probes of the set of off-target hybridization probes comprise the affinity binding molecule of the binding pair and the probes of the set of target-specific hybridization probes do not comprise the affinity binding molecule of the binding pair.
42. The kit of claim 40, wherein the affinity binding molecule comprises biotin.
43. A device for identifying probes for off-target sequence capture in a targeted sequencing reaction, the device comprising:
- a sample processing device configured to receive a reference group of target-specific and off-target nucleic acid fragments of a reference nucleic acid library generated from a reference sample, the reference nucleic acid library comprising a plurality of nucleic acid fragments, wherein the reference group of target-specific and off-target nucleic acid fragments are separated from the reference nucleic acid library based on binding to a set of target-specific hybridization probes;
- an imager configured to image a substrate loaded with the reference group of target-specific and off-target nucleic acid fragments to generate sequencing data; and
- a sequence analysis device configured to: receive the sequencing data to generate reference sequencing data; identify off-target sequences in the reference sequencing data; and identify a set of off-target hybridization probes based on the identified off-target sequences.
44. The device of claim 43, wherein the sequence analysis device is configured to identify the set of off-target hybridization probes by ranking a prevalence of a plurality of off-target sequences in the sequencing data and selecting a plurality of highest prevalence off-target sequences to design the off-target hybridization probes such that the off-target hybridization probes are specific for the highest prevalence off-target sequences.
45. The device of claim 44, wherein selecting the plurality of highest prevalence off-target sequences comprises selecting a predetermined number of off-target sequences according to the ranking.
46. The device of claim 45, wherein the predetermined number is 5000 or fewer different off-target sequences.
47. The device of claim 44, wherein selecting the plurality of highest prevalence off-target sequences comprises selecting a subset of off-target sequences associated with at least 50% of off-target sequence reads in the reference sequencing data.
48. The device of claim 43, wherein the target-specific hybridization probes are specific for 10,000 or more different target sequences.
49. The device of claim 43, wherein the sequence analysis device is configured to communicate the identified set of off-target hybridization probes to a synthesis facility.
50. The device of claim 43, wherein the sequence analysis device is configured to generate instructions to synthesize the identified set of off-target hybridization probes.
51. The device of claim 43, wherein the set of target-specific hybridization probes is a user-defined custom set of target-specific hybridization probes.
52. The device of claim 43, wherein the sequence analysis device is configured to provide an estimated reduction in sequencing cost associated with using the off-target hybridization probes with the reference sample, wherein the estimated reduction in sequencing cost is based on a reduction in off-target sequences.
Type: Application
Filed: Feb 6, 2023
Publication Date: Jun 22, 2023
Inventors: Li Teng (San Diego, CA), Chia-Ling Hsieh (San Diego, CA), Charles Lin (San Diego, CA), Han-Yu Chuang (San Diego, CA)
Application Number: 18/165,153