ENRICHMENT OF MUTATED CELL FREE NUCLEIC ACIDS FOR CANCER DETECTION
Provided herein are methods of enriching mutated cell free nucleic acids for detection and diagnosis of cancer. Also provided are methods using a CRISPR-Cas system to target and deplete unwanted more abundant cell free nucleic acid sequences thereby enriching for less abundant sequences.
This application claims priority benefit of the filing date of U.S. Provisional Patent Application Ser. No. 62/349,514, filed on Jun. 13, 2016, the disclosure of which application is herein incorporated by reference in its entirety. This application also claims priority benefit of the filing date of U.S. Provisional Patent Application Ser. No. 62/357,812, filed on Jul. 1, 2016, the disclosure of which application is herein incorporated by reference in its entirety.
INTRODUCTION
Provided herein are methods of enriching mutated cell free nucleic acids for detection and diagnosis of cancer. Also provided are methods of using a CRISPR-Cas system to target and deplete unwanted, more abundant cell free nucleic acid sequences, thereby enriching for less abundant sequences.
BACKGROUND
Analysis of circulating cell free nucleic acids (e.g., cell free DNA (cfDNA) and/or cell free RNA (cfRNA)) using next generation sequencing (NGS) is recognized as a valuable diagnostic tool for detection and diagnosis of cancer. However, in a complex sample, such as a plasma sample, sequences from a wild-type allele, for example, can overwhelm detection of a mutant allele during NGS analysis of cfDNA. In another example, transcripts from highly expressed genes can overwhelm detection of less abundant transcripts during NGS analysis of an RNA-Seq library prepared from cfRNA. There is a need for new methods for depleting unwanted, more abundant cell free nucleic acid sequences and enriching mutated nucleic acid sequences from a population of cell free nucleic acids for detection and diagnosis of cancer.
SUMMARY
Aspects of the invention include methods for enriching a plurality of target nucleic acids in a sample, the methods comprising providing an endonuclease system, wherein each of the plurality of target nucleic acids comprises a first variant and a second variant, wherein the endonuclease system comprises a plurality of clustered regularly interspaced short palindromic repeat (CRISPR) RNAs (crRNAs), or derivatives thereof, each crRNA comprising a targeting sequence, and a plurality of CRISPR-associated (Cas) proteins, or variants thereof, each Cas protein capable of binding to a protospacer adjacent motif (PAM) site on a target nucleic acid, wherein the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and wherein the second variant does not comprise the PAM site or does not comprise the region complementary to the crRNA targeting sequence adjacent to the PAM site, and contacting the sample with the endonuclease system, thereby depleting the first variant and enriching the second variant of each of the plurality of target nucleic acids in the sample.
In some embodiments, the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and the second variant does not comprise the PAM site. In some embodiments, the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and the second variant does not comprise the region complementary to the crRNA targeting sequence adjacent to the PAM site. In some embodiments, the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and the second variant does not comprise the region complementary to the crRNA targeting sequence. In some embodiments, the methods comprise amplifying the enriched second variants of the plurality of target nucleic acids to produce an enriched sequencing library. In some embodiments, the methods comprise sequencing the enriched sequencing library to detect structural rearrangements or mutations in the target nucleic acids in the sample.
In some embodiments, the first variant of each of the plurality of target nucleic acids is depleted by more than 50%, more than 60%, more than 70%, more than 80%, more than 90%, more than 95%, more than 98%, more than 99%, more than 99.9%, more than 99.99%, or more than 99.999% after contacting the sample with the endonuclease system, relative to the first variant level in the sample prior to contacting the sample with the endonuclease system. In some embodiments, the plurality of target nucleic acids is between 2 and 100, between 2 and 80, between 2 and 60, between 2 and 40, between 2 and 20, or between 2 and 10 target nucleic acids. In some embodiments, the plurality of target nucleic acids is between 2 and 100, between 10 and 100, between 20 and 100, between 30 and 100, between 40 and 100, between 50 and 100, between 60 and 100, between 70 and 100, between 80 and 100, or between 90 and 100 target nucleic acids.
In some embodiments, the first variant of each of the plurality of target nucleic acids is more abundant in the sample than the second variant. In some embodiments, the first variant of each of the plurality of target nucleic acids in the sample comprises at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, at least 99.9%, at least 99.99%, or at least 99.999% of each of the target nucleic acids in the sample. In some embodiments, the first variant of each of the plurality of target nucleic acids comprises a wild-type allele sequence. In some embodiments, the wild-type allele sequence comprises an AKT1, BRAF, EGFR, KRAS, MAP2K1, NRAS, PIK3CA or PTEN wild-type allele sequence.
In some embodiments, the second variant of each of the plurality of target nucleic acids comprises a mutant allele sequence. In some embodiments, the mutant allele sequence comprises an AKT1, BRAF, EGFR, KRAS, MAP2K1, NRAS, PIK3CA or PTEN mutant allele sequence. In some embodiments, the mutant allele sequence comprises an AKT1-E17K, BRAF-V600E, BRAF-L597V, BRAF-G469A, BRAF-G466V, EGFR-E709 T710delins, EGFR-G719S, EGFR-G719C, EGFR-G719A, EGFR-Exon19del, EGFR-T790M, EGFR-L858R, EGFR-L861Q, KRAS-Q61H, KRAS-Q61L, KRAS-Q61R, KRAS-Q61K, KRAS-G13A, KRAS-G13D, KRAS-G13C, KRAS-G13R, KRAS-G13S, KRAS-G12V, KRAS-G12A, KRAS-G12D, KRAS-G12C, KRAS-G12R, KRAS-G12S, MAP2K1-Q56P, NRAS-Q61H, NRAS-Q61L, NRAS-Q61R, NRAS-Q61K, NRAS-G12A, NRAS-G12D, NRAS-G12C, NRAS-G12R, NRAS-G12S, PIK3CA-E542K, PIK3CA-E545Q, PIK3CA-E545K, PIK3CA-H1047R, PIK3CA-H1047L, or PTEN-R233* mutant allele sequence.
In some embodiments, the mutant allele sequence comprises a mutant allele sequence according to
In some embodiments, the first variant of one or more of the plurality of target nucleic acid sequences comprises a PAM site comprising the sequence 5′-NGG-3′, wherein N comprises A, G, C, or T, and wherein the second variant does not comprise the PAM site. In some embodiments, the first variant of one or more of the plurality of target nucleic acid sequences comprises a PAM site comprising the sequence 5′-TTN-3′, wherein N comprises A, G, C, or T, and wherein the second variant does not comprise the PAM site. In some embodiments, the first variant of one or more of the plurality of target nucleic acid sequences comprises a PAM site, and wherein the second variant comprises a deletion of the PAM site. In some embodiments, the first variant of one or more of the plurality of target nucleic acid sequences comprises a region complementary to a crRNA targeting sequence adjacent to a PAM site and the second variant comprises an insertion of 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 12 or more, 14 or more, 16 or more, 18 or more, or 20 or more base pairs (bps) within 50 bps, 40 bps, 30 bps, 20 bps, or 10 bps upstream of the PAM site.
In some embodiments, the first variant of one or more of the plurality of target nucleic acid sequences comprises a region complementary to a crRNA targeting sequence adjacent to a PAM site, and the second variant does not comprise the region complementary to the crRNA targeting sequence adjacent to the PAM site. In some embodiments, in the second variant the region adjacent to the PAM site comprises a point mutation. In some embodiments, in the second variant the region adjacent to the PAM site comprises the junction of a fusion gene. In some embodiments, the endonuclease system further comprises a crRNA and Cas protein targeting an abundant wild-type target nucleic acid in the sample. In some embodiments, the abundant wild-type target nucleic acid is selected from the group consisting of a ribosomal RNA and a globin RNA.
Aspects of the invention include methods for analyzing the genome of a cancer patient, the methods comprising providing an endonuclease system, wherein the endonuclease system comprises a plurality of crRNAs, or derivatives thereof, each crRNA comprising a targeting sequence, and a plurality of Cas proteins, or variants thereof, each Cas protein capable of binding to a PAM site on a target nucleic acid; contacting a sample obtained from the cancer patient comprising a plurality of target nucleic acids with the endonuclease system to obtain a pool of target nucleic acid fragments; sequencing the pool of target nucleic acid fragments to obtain sequencing data from the cancer patient, and comparing the sequencing data from the cancer patient with sequencing data from a reference genome fragmented by the endonuclease system to detect structural rearrangements and mutations in the genome of the cancer patient.
In some embodiments, comparing the sequencing data comprises comparing the fragmentation pattern of the pool of target nucleic acids from the cancer patient with the fragmentation pattern of a pool of target nucleic acids in the reference genome.
In some embodiments, the method comprises contacting a sample obtained from a healthy subject comprising a plurality of target nucleic acids with the endonuclease system to obtain a pool of target nucleic acid fragments, and sequencing the pool of target nucleic acid fragments to obtain the sequencing data from the reference genome. In some embodiments, the endonuclease system cleaves target nucleic acids in a sample from a healthy subject at a predetermined interval. In some embodiments, the predetermined interval is about 300 bp.
Provided herein are methods of enriching mutated cell free nucleic acids for detection and diagnosis of cancer. In various embodiments, the methods provided herein use a CRISPR-Cas system to target and deplete unwanted more abundant cell free nucleic acid sequences, thereby enriching for less abundant sequences.
Class 2 CRISPR-Cas systems generally include a CRISPR (“clustered regularly interspaced short palindromic repeat”) RNA (crRNA) and a CRISPR-associated (Cas) protein, wherein the crRNA is a guide RNA that contains a target-specific nucleotide sequence (“targeting sequence”) complementary to a region of a target nucleic acid. The targeting sequence of a guide RNA can be designed to target any DNA sequence that is adjacent to a PAM (protospacer adjacent motif) site. Cas protein binding at a target site is initiated by recognition of the PAM site that is adjacent to a targeted DNA sequence. Subsequent cleavage of the targeted site occurs if the guide RNA targeting sequence successfully hybridizes to the target sequence.
In some embodiments, the methods provided herein are used to enhance the detection of mutant alleles in a next generation sequencing (NGS) library prepared from cell free DNA (cfDNA) isolated from a biological sample (e.g., a plasma sample). In one example, the cfDNA includes circulating tumor DNA (ctDNA) that includes a mutation (e.g., a single nucleotide mutation, an insertion, or a deletion).
In some embodiments, the methods provided herein are used to enhance the detection of less abundant mutant transcripts in an RNA sequencing (RNA-Seq) library prepared from cell free RNA (cfRNA) isolated from a biological sample (e.g., a plasma sample). In some embodiments, the methods provided herein are used for enriching RNA fusions in cell free nucleic acids isolated from a biological sample (e.g., a plasma sample).
In some embodiments, a CRISPR-Cas system includes a Cas9 protein or variant thereof. Cas9 is guided to a target nucleic acid site by a crRNA that includes a targeting sequence of about 20 bases, and a trans-activating crRNA (tracrRNA). In some embodiments, the Cas9 crRNA and the tracrRNA are fused to form a hybrid single guide RNA (sgRNA). Binding of a target site by Cas9 is initiated by recognition of a PAM site 3′ to the target DNA site. Cas9 then produces a blunt-end double-stranded break in the target DNA if the guide RNA (e.g., sgRNA) successfully hybridizes with the adjacent target sequence.
In some embodiments, a CRISPR-Cas system includes a Cpf1 protein or variant thereof. Cpf1 is guided to a target nucleic acid region by a single CRISPR guide RNA that includes a targeting sequence of about 24 bases. Binding of a target site by Cpf1 is initiated by recognition of a PAM site 5′ to the target DNA site. Cpf1 then produces a staggered double-stranded break in the target DNA if the guide RNA (e.g., sgRNA) successfully hybridizes with the adjacent target sequence.
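The PAM-first targeting logic described for Cas9 and Cpf1 can be illustrated with a short sketch. The Python below is a minimal, hypothetical illustration and not part of the disclosed methods: it scans only the strand it is given, it uses the approximate guide lengths stated above (20 and 24 bases), and the function names `find_cas9_sites` and `find_cpf1_sites` are assumptions introduced here.

```python
def find_cas9_sites(seq, guide_len=20):
    """List candidate Cas9 sites on the given strand.

    A site is a guide_len-base target immediately followed (3') by a
    5'-NGG-3' PAM. Returns (target_start, target, pam) tuples.
    """
    seq = seq.upper()
    sites = []
    for pam_start in range(guide_len, len(seq) - 2):
        pam = seq[pam_start:pam_start + 3]
        if pam[1:] == "GG":  # N can be any base
            target = seq[pam_start - guide_len:pam_start]
            sites.append((pam_start - guide_len, target, pam))
    return sites


def find_cpf1_sites(seq, guide_len=24):
    """List candidate Cpf1 sites on the given strand.

    A site is a 5'-TTN-3' PAM immediately followed (3') by a
    guide_len-base target. Returns (target_start, target, pam) tuples.
    """
    seq = seq.upper()
    sites = []
    for pam_start in range(0, len(seq) - 3 - guide_len + 1):
        pam = seq[pam_start:pam_start + 3]
        if pam[:2] == "TT":  # N can be any base
            target_start = pam_start + 3
            target = seq[target_start:target_start + guide_len]
            sites.append((target_start, target, pam))
    return sites
```

For example, `find_cas9_sites("A" * 20 + "TGG")` reports a single site whose PAM is `TGG`, and `find_cpf1_sites("TTA" + "C" * 24)` reports a single site whose PAM is `TTA`. A practical design tool would also scan the reverse complement and screen guides for off-target matches, neither of which is attempted here.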
In some embodiments, provided herein is a “tool box” of Cas proteins (e.g., Cas9 or variants thereof, Cas9 orthologs or variants thereof, Cpf1 or variants thereof) that are programmed with a set of guide RNAs having different target specificities. The tool box (or pool or library) of Cas proteins programmed with a set of guide RNAs having different target specificities constitutes an “enzymatic enrichment panel” that can be used for multiplexed enrichment of multiple target sequences. In one example, the pool of Cas proteins can be programmed with a set of guide RNAs designed to target abundant library fragments representative of rRNA allowing enrichment of lower abundance transcripts during whole transcriptome analysis. The tool box can include, for example, (1) Cas9 proteins programmed to target a set of wild-type alleles that include a specific PAM site; (2) Cas9 ortholog(s) programmed to target a set of wild-type alleles that include different PAM sites; (3) Cas9 proteins (or orthologs) programmed with guide RNAs having targeting sequences with enhanced specificity for wild-type alleles; and (4) variants of Cas9 proteins or orthologs with enhanced ability to target single base mutations.
CRISPR-Cas Mediated Enrichment of Mutated Cell Free Nucleic Acids
In one aspect, provided herein are methods for enriching a plurality of target nucleic acids in a sample, comprising providing an endonuclease system, wherein each of the plurality of target nucleic acids comprises a first variant and a second variant, wherein the endonuclease system comprises a plurality of clustered regularly interspaced short palindromic repeat (CRISPR) RNAs (crRNAs), or derivatives thereof, each crRNA comprising a targeting sequence, and a plurality of CRISPR-associated (Cas) proteins, or variants thereof, each Cas protein capable of binding to a protospacer adjacent motif (PAM) site on a target nucleic acid, wherein the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and wherein the second variant does not comprise the PAM site or does not comprise the region complementary to the crRNA targeting sequence adjacent to the PAM site, and contacting the sample with the endonuclease system, thereby depleting the first variant and enriching the second variant of each of the plurality of target nucleic acids in the sample.
In some embodiments, the methods comprise amplifying the enriched second variants of the plurality of target nucleic acids to produce an enriched sequencing library. In some embodiments, the methods comprise sequencing the enriched sequencing library to detect structural rearrangements or mutations in the target nucleic acids in the sample.
In some embodiments, the first variant of each of the plurality of target nucleic acids is depleted by more than 50%, more than 60%, more than 70%, more than 80%, more than 90%, more than 95%, more than 98%, more than 99%, more than 99.9%, more than 99.99%, or more than 99.999% after contacting the sample with the endonuclease system, relative to the first variant level in the sample prior to contacting the sample with the endonuclease system. In some embodiments, the plurality of target nucleic acids is between 2 and 100, between 2 and 80, between 2 and 60, between 2 and 40, between 2 and 20, or between 2 and 10 target nucleic acids. In some embodiments, the plurality of target nucleic acids is between 2 and 100, between 10 and 100, between 20 and 100, between 30 and 100, between 40 and 100, between 50 and 100, between 60 and 100, between 70 and 100, between 80 and 100, or between 90 and 100 target nucleic acids.
In some embodiments, the first variant of each of the plurality of target nucleic acids is more abundant in the sample than the second variant. In some embodiments, the first variant of each of the plurality of target nucleic acids in the sample comprises at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, at least 99.9%, at least 99.99%, or at least 99.999% of each of the target nucleic acids in the sample.
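The relationship between the starting abundance of the first variant, the degree of depletion, and the resulting share of the second variant can be made concrete with a small calculation. The sketch below is illustrative only and rests on an idealizing assumption not stated in the text, namely that the second variant is not cleaved at all; the function name `second_variant_fraction` is hypothetical.

```python
def second_variant_fraction(first_fraction, depletion):
    """Fraction of molecules that are second variant after depletion.

    first_fraction: share of the first (e.g., wild-type) variant before
        treatment, between 0 and 1.
    depletion: share of first-variant molecules removed, between 0 and 1.
    Assumes (idealization) that no second-variant molecule is cleaved.
    """
    if not 0.0 <= first_fraction <= 1.0:
        raise ValueError("first_fraction must be between 0 and 1")
    if not 0.0 <= depletion <= 1.0:
        raise ValueError("depletion must be between 0 and 1")
    remaining_first = first_fraction * (1.0 - depletion)
    second = 1.0 - first_fraction
    total = remaining_first + second
    if total == 0.0:
        raise ValueError("no molecules remain after depletion")
    return second / total
```

Under these assumptions, a sample in which the first variant makes up 99.9% of the molecules and is depleted by 99% goes from a second-variant share of 0.1% to roughly 9%, since 0.001 / (0.001 + 0.999 × 0.01) ≈ 0.091.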
In some embodiments, the plurality of Cas proteins comprises Cas9, or a variant thereof, a Cas9 ortholog, or a variant thereof, or Cpf1, or a variant thereof. In some embodiments, the Cas9, or variant thereof, is derived from Streptococcus pyogenes. In some embodiments, the Cpf1, or variant thereof, is derived from Francisella novicida U112. In some embodiments, each of the plurality of Cas proteins comprises two active nuclease domains. In some embodiments, each of the plurality of target nucleic acids is a double-stranded nucleic acid. In some embodiments, the endonuclease system produces a blunt-ended double-strand break in one or more target nucleic acids. In some embodiments, the endonuclease system produces a staggered double-strand break in one or more target nucleic acids.
In some embodiments, the sample is a blood, serum, plasma, urine, or cerebrospinal fluid sample. In some embodiments, the sample was obtained from a human cancer patient. In some embodiments, the plurality of target nucleic acids comprise cell-free DNA (cfDNA) or cell-free RNA (cfRNA).
In a step 110, a blood sample is obtained and circulating cell-free nucleic acids are isolated from the plasma fraction.
In a step 115, a sequencing library is prepared. In one example, the sequencing library is an NGS library prepared from cell free DNA. In another example, the sequencing library is an RNA-Seq library prepared from cell free RNA.
In a step 120, a CRISPR-Cas system is provided and targeted nucleic acid sequences are cleaved. The CRISPR-Cas system binds to and cleaves the targeted sequence(s). Because the system creates a double-stranded break in the targeted sequence(s), these sequences cannot serve as templates for subsequent amplification reactions. Examples of targeted depletion of unwanted sequences for enrichment of mutated sequences are described in more detail with reference to
In a step 125, non-targeted sequences are amplified to produce an enriched sequencing library.
In a step 130, the enriched library is sequenced. In one example, sequencing is performed using a MiSeq, NextSeq or HiSeq system (Illumina, Inc.).
Depletion of Wild-Type Allele for Detection of Mutant Allele
Wild-type DNA sequences can overwhelm detection of mutant DNA during analysis of cell free nucleic acids (e.g., ctDNA) from a biological sample. Provided herein are methods of enriching a mutant DNA sequence in a population of wild-type DNA sequences, wherein the wild-type allele of a target nucleic acid is depleted using differential CRISPR-Cas binding and cleavage. Cas protein binding at a target site is initiated by recognition of a PAM site that is adjacent to a targeted DNA sequence. Subsequent cleavage of the targeted site occurs if there is complete or near complete homology between the crRNA target sequence and the targeted DNA sequence. Mutations that ablate a PAM site prevent Cas protein binding and subsequent cleavage of the mutant DNA sequence.
In some embodiments, the wild-type allele sequence comprises an AKT1, BRAF, EGFR, KRAS, MAP2K1, NRAS, PIK3CA or PTEN wild-type allele sequence.
In some embodiments, the second variant of each of the plurality of target nucleic acids comprises a mutant allele sequence.
In some embodiments, the mutant allele sequence comprises an AKT1, BRAF, EGFR, KRAS, MAP2K1, NRAS, PIK3CA or PTEN mutant allele sequence.
PAM Site Recognition for Differential Cleavage and Enrichment of Mutations in ctDNA
In some embodiments, the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and the second variant does not comprise the PAM site.
In some embodiments, the region in the second variant corresponding to the PAM site in the first variant comprises a point mutation (e.g., 1 bp, 2 bps, or 3 bps point mutation). In some embodiments, the region in the second variant corresponding to the PAM site comprises a deletion mutation (e.g., deletion of 1 bp, 2 bps, 3 bps, 4 bps, 5 bps, 6 bps, 7 bps, 8 bps, 9 bps, 10 bps, or more).
In some embodiments, the target nucleic acid comprising a mutation in a PAM site comprises AKT1 (e.g., AKT1-E17K), BRAF (e.g., BRAF-G469A, BRAF-G466V), EGFR (e.g., EGFR-G719S, EGFR-G719C, EGFR-G719A, EGFR-L858R, EGFR-L861Q), KRAS (e.g., KRAS-G13A, KRAS-G13D, KRAS-G13C, KRAS-G13R, KRAS-G13S, KRAS-G12V, KRAS-G12A, KRAS-G12D, KRAS-G12C, KRAS-G12R, KRAS-G12S), or NRAS (e.g., NRAS-G12A, NRAS-G12D, NRAS-G12C, NRAS-G12R, NRAS-G12S).
In some embodiments, the mutation in the PAM site (e.g., mutation in a region in the second variant corresponding to the PAM site of the first variant) comprises a C>A, C>G, C>T, G>A, G>C, G>T, T>A, or T>G mutation.
In some embodiments, the target nucleic acid comprising a mutation in a PAM site comprises a target gene or mutation as shown in
In some embodiments, the first variant of one or more of the plurality of target nucleic acid sequences comprises a PAM site comprising the sequence 5′-NGG-3′, wherein N comprises A, G, C, or T, and wherein the second variant does not comprise the PAM site.
Referring now to
Cas9 (from S. pyogenes) can be used for depletion of a wild-type allele and enrichment of a mutant allele wherein a mutation changes the G residue in a PAM site (5′-NGG-3′). For example, a Cas9 (from S. pyogenes) can be used for depletion of a wild-type allele and enrichment of a mutant allele wherein a mutation changes a glycine or proline residue to any other amino acid. Cas9 (from S. pyogenes) can also be used for depletion of a wild-type allele and enrichment of a mutant allele wherein a C>T mutation has occurred.
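How a single base change in the PAM spares the mutant allele can be sketched in a few lines. The Python below is a simplified illustration under stated assumptions: it requires an exact match to the guide on the given strand followed directly by 5′-NGG-3′, and the 20-base sequence `GUIDE`, the flanking bases, and the function name `is_cleavable` are made up for this example rather than taken from any gene discussed here.

```python
def is_cleavable(seq, guide):
    """Return True if seq contains the guide-matching region immediately
    followed by a 5'-NGG-3' PAM (given strand only, exact match only)."""
    seq, guide = seq.upper(), guide.upper()
    start = seq.find(guide)
    while start != -1:
        pam = seq[start + len(guide):start + len(guide) + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            return True
        start = seq.find(guide, start + 1)
    return False


# Hypothetical sequences for illustration only.
GUIDE = "GACCTTAGCAAGTCCATGAC"
wild_type = "TT" + GUIDE + "TGG" + "ACTA"   # intact PAM (TGG)
mutant = "TT" + GUIDE + "TGA" + "ACTA"      # G>A in the PAM (TGA)

wild_type_cut = is_cleavable(wild_type, GUIDE)   # True
mutant_cut = is_cleavable(mutant, GUIDE)         # False
```

In this toy model the wild-type fragment is cut and the mutant fragment is left intact, so only the mutant remains available as an amplification template.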
In some embodiments, the Cas9, or variant thereof, is derived from Streptococcus pyogenes.
CRISPR systems from other bacterial species (i.e., Cas9 orthologs) that recognize alternative PAM sequences can also be used for depletion of a wild-type allele and enrichment of a mutant allele. For example, Cpf1 derived from Francisella novicida U112 recognizes a PAM sequence 5′-TTN-3′.
In some embodiments, the first variant of one or more of the plurality of target nucleic acid sequences comprises a PAM site comprising the sequence 5′-TTN-3′, wherein N comprises A, G, C, or T, and wherein the second variant does not comprise the PAM site.
In some embodiments, the Cpf1, or variant thereof, is derived from Francisella novicida U112.
Referring now to
In some embodiments, the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and the second variant does not comprise the region complementary to the crRNA targeting sequence adjacent to the PAM site.
PAM recognition and Cas cleavage can also be used for enrichment and detection of a deletion mutation. For example, a wild-type allele has a PAM site which allows recognition and cleavage by a Cas protein (e.g., Cas9). Deletion of the PAM site in a mutant sequence ablates the PAM site thereby preventing Cas recognition and cleavage.
In some embodiments, the first variant of one or more of the plurality of target nucleic acid sequences comprises a PAM site, and wherein the second variant comprises a deletion of the PAM site.
In some embodiments, the deletion of the PAM site comprises a deletion of 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 12 or more, 14 or more, 16 or more, 18 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, or 50 or more base pairs (bps).
In some embodiments, the target nucleic acid comprising a second variant with a PAM deletion mutation comprises EGFR-Exon19del.
In some embodiments, the target nucleic acid comprising a second variant with a PAM deletion mutation comprises a target gene or mutation as shown in
Referring now to
PAM recognition and Cas cleavage can also be used for enrichment and detection of an insertion mutation. For example, an existing PAM site and adjacent target site present in a wild-type allele are separated by an insertion in a mutated sequence. Because the PAM site is separated from the target site, Cas recognition and cleavage of the mutant sequence does not occur.
In some embodiments, the first variant of one or more of the plurality of target nucleic acid sequences comprises a region complementary to a crRNA targeting sequence adjacent to a PAM site and the second variant comprises an insertion of 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 12 or more, 14 or more, 16 or more, 18 or more, or 20 or more base pairs (bps) within 50 bps, 40 bps, 30 bps, 20 bps, or 10 bps upstream (on the 5′-end) of the PAM site.
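The effect of an insertion that separates the PAM from its target region can be illustrated by measuring the gap between the two. The following Python is a hedged sketch rather than a description of the disclosed method: it assumes a Cas9-style 5′-NGG-3′ PAM on the given strand, and the guide sequence, the inserted bases, and the function name `pam_gap` are hypothetical.

```python
def pam_gap(seq, guide):
    """Number of bases between the guide-matching region and the first
    downstream 5'-NGG-3' PAM on the given strand.

    Returns 0 when the PAM is directly adjacent (cleavable in this model),
    a positive number when an insertion separates them, and None when the
    guide-matching region or a downstream PAM is not found.
    """
    seq, guide = seq.upper(), guide.upper()
    start = seq.find(guide)
    if start == -1:
        return None
    end = start + len(guide)
    for pam_start in range(end, len(seq) - 2):
        if seq[pam_start + 1:pam_start + 3] == "GG":
            return pam_start - end
    return None


# Hypothetical sequences for illustration only.
GUIDE = "GACCTTAGCAAGTCCATGAC"
wild_type = "TT" + GUIDE + "TGG" + "ACTA"
mutant = "TT" + GUIDE + "ACAT" + "TGG" + "ACTA"   # 4-base insertion

wild_type_gap = pam_gap(wild_type, GUIDE)   # 0
mutant_gap = pam_gap(mutant, GUIDE)         # 4
```

A gap of zero corresponds to the wild-type arrangement that is recognized and cleaved; any positive gap means the PAM is no longer adjacent to the target region, so the mutant is spared in this simplified model.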
Referring now to
In some embodiments, the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and the second variant comprises the PAM site and does not comprise the region complementary to the crRNA targeting sequence.
In some embodiments, the region in the second variant corresponding to the crRNA targeting sequence in the first variant comprises a point mutation (e.g., 1 bp, 2 bps, or 3 bps point mutation). In some embodiments, the region in the second variant corresponding to the crRNA targeting sequence comprises a deletion mutation (e.g., deletion of 1 bp, 2 bps, 3 bps, 4 bps, 5 bps, 6 bps, 7 bps, 8 bps, 9 bps, 10 bps, or more). In some embodiments, the region in the second variant corresponding to the crRNA targeting sequence comprises an insertion mutation (e.g., insertion of 1 bp, 2 bps, 3 bps, 4 bps, 5 bps, 6 bps, 7 bps, 8 bps, 9 bps, 10 bps, or more).
In some embodiments, the target nucleic acid comprising a mutation in the crRNA targeting sequence comprises BRAF (e.g., BRAF-V600E, BRAF-L597V), EGFR (e.g., EGFR-T790M, EGFR-E709 T710delins), KRAS (e.g., KRAS-Q61H, KRAS-Q61L, KRAS-Q61R, KRAS-Q61K), MAP2K1 (e.g., MAP2K1-Q56P), NRAS (e.g., NRAS-Q61H, NRAS-Q61L, NRAS-Q61R, NRAS-Q61K), PIK3CA (e.g., PIK3CA-E542K, PIK3CA-E545Q, PIK3CA-E545K, PIK3CA-H1047R, PIK3CA-H1047L), or PTEN (e.g., PTEN-R233* (stop codon)).
In some embodiments, the mutation in the crRNA targeting sequence comprises a A>C, A>G, A>T, C>A, C>G, C>T, G>A, G>C, G>T, T>A, T>C, or T>G mutation.
In some embodiments, the target nucleic acid comprising a mutation in a crRNA targeting sequence comprises a target gene or mutation as shown in
Cas cleavage of a targeted DNA sequence occurs if there is sufficient complementarity between the guide RNA (crRNA) targeting sequence and the targeted DNA. The targeting sequence of a guide RNA can be designed to discriminate between a wild-type allele and a mutant allele(s), wherein the mutant allele includes a mutation in the region targeted by the guide RNA.
In one example, addition of one or more base mismatches in the targeting region of a guide RNA can be used to enhance the specificity of a guide RNA and discriminate between a wild-type allele and a mutant allele(s).
In another example, truncation of the targeting region of a guide RNA can be used to enhance the specificity of a guide RNA and discriminate between a wild-type allele and a mutant allele(s).
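The first of these two strategies, a deliberate mismatch in the guide, can be illustrated with a toy model. The Python below assumes, purely for illustration, that cleavage occurs whenever the guide and target differ at no more than a fixed number of positions (`max_mismatches=1`); real mismatch tolerance depends on position and sequence context, and truncation is not modeled. All sequences and function names are hypothetical, and the guide is written as the DNA sequence of the strand it matches.

```python
def count_mismatches(guide, target):
    """Count positions at which guide and target differ."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return sum(g != t for g, t in zip(guide.upper(), target.upper()))


def would_cleave(guide, target, max_mismatches=1):
    """Toy rule (assumption): cleave if mismatches <= max_mismatches."""
    return count_mismatches(guide, target) <= max_mismatches


# Hypothetical 20-base target regions; the mutant differs at index 10.
wild_type = "ACGTTGCAAGCTTGCATGCA"
mutant = wild_type[:10] + "T" + wild_type[11:]

# A perfectly matched guide tolerates the single mutant mismatch,
# so it cuts both alleles in this model.
perfect_guide = wild_type
perfect_result = (would_cleave(perfect_guide, wild_type),
                  would_cleave(perfect_guide, mutant))

# A guide carrying one deliberate mismatch (index 5) still cuts the
# wild type (1 mismatch) but spares the mutant (2 mismatches).
engineered_guide = wild_type[:5] + "C" + wild_type[6:]
engineered_result = (would_cleave(engineered_guide, wild_type),
                     would_cleave(engineered_guide, mutant))
```

Under the assumed tolerance, the perfectly matched guide cannot tell the two alleles apart, whereas the engineered guide pushes the mutant allele over the mismatch limit while leaving the wild-type allele within it.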
Referring now to
In some embodiments, provided herein is a “tool box” of Cas proteins that are programmed with a set of guide RNAs having different wild-type specificities. The tool box of Cas proteins constitutes an “enzymatic enrichment panel” that can be used for multiplexed enrichment of multiple mutated sequences. The tool box can include, for example, Cas proteins (e.g., Cas9 and Cas9 orthologs) with different PAM site recognition (e.g., Cas9: 5′-NGG-3′; Cpf1: 5′-TTN-3′) and guide RNAs having targeting sequences with enhanced single base specificity for discriminating between a wild-type allele and a mutant allele. In one example, the enzymatic enrichment panel can be used alone for enrichment of mutant alleles for cancer detection and diagnosis. In another example, the enzymatic enrichment panel can be used in combination with an oligonucleotide probe-based enrichment system.
In another example (B), in the EGFR gene, a deletion in exon 19 (Exon19del) ablates a Cas9 PAM site in a mutant allele. Because the Cas9 PAM site is disrupted in the mutant allele, Cas9 and a guide RNA with a targeting sequence complementary to the sequence adjacent to the PAM site can be used to target the wild-type EGFR allele for cleavage as described above with reference to
In yet another example (C), in the KRAS gene, the G12D mutation results in an amino acid substitution at position 12 (from a glycine (G) to an aspartic acid (D)), wherein a G to A base change ablates a Cas9 PAM site. Because the Cas9 PAM site is disrupted in the mutant allele, Cas9 and a guide RNA with a targeting sequence complementary to the sequence adjacent to the PAM site can be used to target the wild-type KRAS allele for selective cleavage as described above with reference to
In some embodiments, the mutant allele sequence comprises an AKT1-E17K, BRAF-V600E, BRAF-L597V, BRAF-G469A, BRAF-G466V, EGFR-E709 T710delins, EGFR-G719S, EGFR-G719C, EGFR-G719A, EGFR-Exon19del, EGFR-T790M, EGFR-L858R, EGFR-L861Q, KRAS-Q61H, KRAS-Q61L, KRAS-Q61R, KRAS-Q61K, KRAS-G13A, KRAS-G13D, KRAS-G13C, KRAS-G13R, KRAS-G13S, KRAS-G12V, KRAS-G12A, KRAS-G12D, KRAS-G12C, KRAS-G12R, KRAS-G12S, MAP2K1-Q56P, NRAS-Q61H, NRAS-Q61L, NRAS-Q61R, NRAS-Q61K, NRAS-G12A, NRAS-G12D, NRAS-G12C, NRAS-G12R, NRAS-G12S, PIK3CA-E542K, PIK3CA-E545Q, PIK3CA-E545K, PIK3CA-H1047R, PIK3CA-H1047L, or PTEN-R233* mutant allele sequence.
In some embodiments, the mutant allele sequence comprises a mutant allele sequence according to
Abundant transcripts can overwhelm detection of less abundant transcripts during analysis of an RNA-Seq library prepared from cell free RNA isolated from a biological sample. In some embodiments, the methods provided herein are used to deplete unwanted high-abundance sequences from an RNA-Seq library. Examples of unwanted high-abundance sequences include, but are not limited to, ribosomal RNA and globin RNA.
In some embodiments, the methods provided herein are used to enrich for a fusion transcript (RNA fusions). A fusion transcript is a chimeric RNA encoded, for example, by a fusion gene created by DNA translocation between two genes in their introns and subsequent splicing to remove the intron, or by the trans-splicing of exons in two different transcripts. Certain fusion transcripts are commonly produced by cancer cells, and detection of fusion transcripts is part of routine diagnostics of certain cancer types (e.g., TMPRSS2-ERG translocations in prostate cancer).
In some embodiments, the endonuclease system further comprises a crRNA and Cas protein targeting an abundant wild-type target nucleic acid in the sample. In some embodiments, the abundant wild-type target nucleic acid is a ribonucleic acid (e.g., a rRNA, mRNA (e.g., a globin mRNA), or tRNA). In some embodiments, the abundant wild-type target nucleic acid is selected from the group consisting of a ribosomal RNA and a globin RNA.
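The effect of depleting an abundant sequence on the representation of a less abundant one can be illustrated with a small in-silico model. This is a sketch under stated assumptions, not a description of the actual library preparation: the reads, labels, and targeting sequence are invented, a read is treated as lost whenever it contains a targeting sequence directly followed by an NGG PAM, and only one strand is examined.

```python
import re

def is_cut(read, targets):
    """True if any targeting sequence is followed directly by an NGG PAM in the read."""
    return any(re.search(re.escape(t) + "[ACGT]GG", read) for t in targets)

def deplete(library, targets):
    """Return the (label, read) entries that survive cleavage (toy model: cut reads are lost)."""
    return [(label, read) for label, read in library if not is_cut(read, targets)]

def fraction(library, label):
    """Fraction of library entries carrying the given label."""
    return sum(1 for entry_label, _ in library if entry_label == label) / len(library)

# Hypothetical targeting sequence against an abundant transcript (assumption).
TARGET = "GACCTTAGCAATCGGTACTA"
abundant_read = "TT" + TARGET + "AGG" + "CT"   # contains target + PAM
rare_read = "CCATTGACGTTACAGATACCGTTAGC"       # does not contain the target

# Toy library: nine copies of the abundant read for every rare read.
library = [("abundant", abundant_read)] * 9 + [("rare", rare_read)]
survivors = deplete(library, [TARGET])

print(fraction(library, "rare"))    # share of rare reads before depletion
print(fraction(survivors, "rare"))  # share of rare reads after depletion
```

Because the abundant reads are removed before sequencing, a larger share of the remaining reads comes from the less abundant transcript.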
In some embodiments, in the second variant the region adjacent to the PAM site comprises the junction of a fusion gene.
Referring to
CRISPR-Cas systems can be used to mediate differential nucleic acid fragmentation for preparation of a genomic library for sequencing. For example, a pool of Cas9 proteins that are programmed with guide RNAs targeting regions across a genome (e.g., a reference genome) can be used for targeted fragmentation of genomic DNA for subsequent sequencing.
In another aspect, provided herein is a method for analyzing the genome of a cancer patient, comprising providing an endonuclease system, wherein the endonuclease system comprises a plurality of crRNAs, or derivatives thereof, each crRNA comprising a targeting sequence, and a plurality of Cas proteins, or variants thereof, each Cas protein capable of binding to a PAM site on a target nucleic acid; contacting a sample obtained from the cancer patient comprising a plurality of target nucleic acids with the endonuclease system to obtain a pool of target nucleic acid fragments; sequencing the pool of target nucleic acid fragments to obtain sequencing data from the cancer patient, and comparing the sequencing data from the cancer patient with sequencing data from a reference genome fragmented by the endonuclease system to detect structural rearrangements and mutations in the genome of the cancer patient.
In some embodiments, comparing the sequencing data comprises comparing the fragmentation pattern of the pool of target nucleic acids from the cancer patient with the fragmentation pattern of a pool of target nucleic acids in the reference genome.
In some embodiments, the method comprises contacting a sample obtained from a healthy subject comprising a plurality of target nucleic acids with the endonuclease system to obtain a pool of target nucleic acid fragments, and sequencing the pool of target nucleic acid fragments to obtain the sequencing data from the reference genome.
In some embodiments, the endonuclease system cleaves target nucleic acids in a sample from a healthy subject at a predetermined interval.
In some embodiments, the predetermined interval is about 300 bp.
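One way to picture how a guide pool might be chosen so that cleavage occurs at roughly a predetermined interval is the following sketch. It is an assumption-laden illustration, not the disclosed design procedure: the candidate cut positions are invented, the molecule is treated as linear, and a simple nearest-site rule is used in place of any real guide-scoring scheme.

```python
from bisect import bisect_left

def choose_cut_sites(candidates, genome_length, interval=300):
    """For each multiple of `interval`, pick the nearest candidate cut site.

    `candidates` are positions where a guide could direct cleavage (hypothetical input).
    """
    candidates = sorted(candidates)
    chosen = []
    for ideal in range(interval, genome_length, interval):
        i = bisect_left(candidates, ideal)
        # Only the neighbours on either side of the ideal position can be nearest.
        nearby = candidates[max(i - 1, 0): i + 1]
        if not nearby:
            continue
        best = min(nearby, key=lambda pos: abs(pos - ideal))
        # Skip a site that was already selected for an earlier interval.
        if not chosen or best > chosen[-1]:
            chosen.append(best)
    return chosen

def fragment_lengths(cut_sites, genome_length):
    """Lengths of the fragments produced by cutting a linear molecule at each site."""
    bounds = [0] + sorted(cut_sites) + [genome_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Invented candidate positions on a 1,200 bp toy molecule.
candidate_sites = [50, 295, 310, 620, 880, 1190]
cuts = choose_cut_sites(candidate_sites, genome_length=1200)
print(cuts)
print(fragment_lengths(cuts, 1200))
```

The resulting fragments cluster around the 300 bp target without matching it exactly, which is consistent with an interval of "about" 300 bp.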
In a step 910, a pool of Cas9 proteins that are programmed with guide RNAs targeting regions across a reference genome is prepared. In one example, the pool of Cas9 proteins is prepared such that genomic DNA is cleaved about every 300 bp.
In a step 915, genomic DNA samples from a healthy control subject and a cancer patient are obtained.
In a step 920, genomic DNA from the control subject and from the cancer patient is fragmented (in separate reactions) using the pool of Cas9 proteins.
In a step 925, sequencing libraries are prepared from the fragmented control DNA and the patient DNA.
In a step 930, the libraries are sequenced.
In a step 935, the sequencing data is analyzed to reveal structural rearrangements and mutations in the cancer genome.
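The comparison in steps 920 through 935 can be sketched as a toy in-silico digest. The coordinates are invented, the function names are hypothetical, and the sketch assumes fragments have already been mapped to reference coordinates; a real analysis would derive them from aligned sequencing reads.

```python
def digest(length, cut_sites):
    """Return (start, end) fragments produced by cutting a linear molecule at each site."""
    bounds = [0] + sorted(cut_sites) + [length]
    return list(zip(bounds, bounds[1:]))

def compare_patterns(control, patient):
    """Report fragments unique to each sample; differences flag candidate variants."""
    control_set, patient_set = set(control), set(patient)
    return {
        "control_only": sorted(control_set - patient_set),
        "patient_only": sorted(patient_set - control_set),
    }

# Toy example: the control DNA is cut at every expected site, while the patient DNA
# has lost the site at position 600 (e.g., through a mutation in the target or PAM).
control_fragments = digest(1200, [300, 600, 900])
patient_fragments = digest(1200, [300, 900])

differences = compare_patterns(control_fragments, patient_fragments)
print(differences)
```

Here the loss of one cut site in the patient sample appears as a single longer fragment replacing two control fragments, which marks the region for closer inspection.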
Sample Types and Disease Conditions
Methods in accordance with embodiments of the invention can be carried out on any suitable sample type that contains a plurality of nucleic acids, as described above. For example, in some embodiments, a sample comprises a biological fluid. Non-limiting examples of biological fluids include, e.g., blood, plasma, serum, urine, saliva, pleural fluid, pericardial fluid, cerebrospinal fluid (CSF), peritoneal fluid, amniotic fluid, and combinations thereof. In some embodiments, a sample comprises a non-liquid biological sample. Non-limiting examples of non-liquid biological samples include tissue biopsies, such as, e.g., a cancerous tissue biopsy, a healthy tissue biopsy, a fetal tissue sample, or a combination thereof. In some embodiments, a sample comprises a liquid or a non-liquid biological sample collected from a transplanted organ.
In some embodiments, the subject methods are carried out on a patient having or suspected of having a cancer. In some embodiments, a patient is a human or non-human animal. Non-limiting examples of cancers include, e.g., a carcinoma, a sarcoma, a myeloma, a leukemia, a lymphoma, a blastoma, a germ cell tumor, or any combination thereof. In some embodiments, a carcinoma is an adenocarcinoma. In some embodiments, a carcinoma is a squamous cell carcinoma. In some embodiments, a carcinoma is a small cell lung cancer, non-small-cell lung, nasopharyngeal, colorectal, anal, liver, urinary bladder, testicular, cervical, ovarian, gastric, esophageal, head-and-neck, pancreatic, prostate, renal, thyroid, melanoma, or breast carcinoma. In some embodiments, a breast cancer is hormone receptor negative breast cancer or triple negative breast cancer.
In some embodiments, a sarcoma is an osteosarcoma, chondrosarcoma, leiomyosarcoma, rhabdomyosarcoma, mesothelial sarcoma (mesothelioma), fibrosarcoma, angiosarcoma, liposarcoma, glioma, or astrocytoma.
In some embodiments, a leukemia is a myelogenous, granulocytic, lymphatic, lymphocytic, or lymphoblastic leukemia. In some embodiments, a lymphoma is selected from the group consisting of: Hodgkin's lymphoma and Non-Hodgkin's lymphoma.
In some embodiments, the subject methods are carried out on a sample that is obtained from a pregnant female patient. In some embodiments, a sample is obtained from a fetus that is gestating within a pregnant female patient. In some embodiments, a pregnant female patient is a human.
In some embodiments, the subject methods are carried out on a sample that is obtained from a patient that has undergone an organ transplantation procedure, and the sample is obtained from the patient, or is obtained directly from the transplanted organ.
In some embodiments, the subject methods are carried out on a sample that is obtained from a healthy patient, or from a patient with a known disease condition (e.g., a previously diagnosed cancer).
The methods of the present disclosure find use in connection with any of a variety of healthy and/or disease conditions. For example, in some embodiments, the subject methods are carried out on a sample that is obtained from a healthy subject. In some embodiments, the subject methods are carried out on a sample that is obtained from a subject that is suspected of having an unknown disease or condition, e.g., an unknown cancer, or an unknown genetic abnormality. In some embodiments, the subject methods are carried out on a sample that is obtained from a subject that is known to have a specific disease or condition, e.g., a specific type of cancer.
In some embodiments, the subject methods are carried out on a sample from a pregnant female patient, and the methods involve analyzing a sample that is obtained from the pregnant female patient, from a gestating fetus within the pregnant female patient, or both.
In some embodiments, the subject methods are carried out on a sample from a patient that has undergone an organ transplantation procedure, and the methods involve analyzing a sample that is obtained from the patient, from the transplanted organ, or both.
Concluding Remarks
The foregoing detailed description of embodiments refers to the accompanying drawings, which illustrate specific embodiments of the present disclosure. Other embodiments having different structures and operations do not depart from the scope of the present disclosure. This specification is divided into sections for the convenience of the reader only. Headings should not be construed as limiting of the scope of the methods or compositions provided herein. The definitions are intended as a part of the description of the methods or compositions provided herein. It will be understood that various details of the methods or compositions provided herein can be changed without departing from the scope of the methods or compositions provided herein. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation.
Claims
1-40. (canceled)
41. A method for analyzing the genome of a cancer patient, comprising:
- providing an endonuclease system that comprises a plurality of crRNAs, or derivatives thereof, and a plurality of Cas proteins, or variants thereof; wherein each crRNA comprises a targeting sequence; and wherein each Cas protein is capable of binding to a PAM site on a target nucleic acid;
- contacting a patient sample, obtained from the cancer patient and comprising a plurality of target nucleic acids, with the endonuclease system to obtain a pool of target nucleic acid fragments associated with the patient sample, wherein each target nucleic acid fragment of the pool of target nucleic acid fragments is a DNA fragment of cell-free DNA (cfDNA);
- sequencing the pool of target nucleic acid fragments to obtain first sequencing data from the cancer patient; and
- comparing the first sequencing data from the cancer patient with second sequencing data from a reference genome fragmented by the endonuclease system to detect structural rearrangements and mutations in the genome of the cancer patient.
42. The method of claim 41, wherein comparing the sequencing data comprises:
- comparing a first fragmentation pattern of the pool of target nucleic acid fragments associated with the patient sample with a second fragmentation pattern of a pool of target nucleic acid fragments from the reference genome.
43. The method of claim 42, further comprising:
- contacting a control sample, obtained from a control subject and comprising a plurality of target nucleic acids, with the endonuclease system to obtain the pool of target nucleic acid fragments from the reference genome and associated with the control sample; and
- sequencing the pool of target nucleic acid fragments associated with the control sample to obtain the second sequencing data from the reference genome.
44. The method of claim 43, wherein the endonuclease system cleaves target nucleic acids in the control sample at a predetermined interval.
45. The method of claim 44, wherein the predetermined interval is about 300 bp.
46. The method of claim 41, wherein the plurality of Cas proteins comprises Cas9, or a variant thereof, a Cas9 ortholog, or a variant thereof, or Cpf1, or a variant thereof.
47. The method of claim 46, wherein the Cas9, or variant thereof, is derived from Streptococcus pyogenes, and wherein the Cpf1, or variant thereof, is derived from Francisella novicida U112.
48. The method of claim 41, wherein the patient sample comprises a blood sample, a serum sample, a plasma sample, a urine sample, or a cerebrospinal fluid sample.
49. The method of claim 41, wherein a sequence in the plurality of target nucleic acids in the patient sample comprises a mutant allele sequence selected from a group consisting of: AKT1, BRAF, EGFR, KRAS, MAP2K1, NRAS, PI3KCA and PTEN.
50. The method of claim 41, wherein a sequence in the plurality of target nucleic acids in the patient sample comprises a point mutation of at least one base pair.
51. The method of claim 41, wherein the cfDNA comprises circulating tumor DNA (ctDNA) that includes a mutation selected from a group consisting of: a single nucleotide mutation, an insertion, and a deletion.
52. The method of claim 41, wherein a target nucleic acid fragment of the pool of target nucleic acid fragments has either a blunt end or a staggered end.
53. The method of claim 41, wherein a sequence in the plurality of target nucleic acids in the patient sample comprises a mutant allele sequence selected from a group consisting of: AKT1-E17K, BRAF-V600E, BRAF-L597V, BRAF-G469A, BRAF-G466V, EGFR-E709 T710delins, EGFR-G719S, EGFR-G719C, EGFR-G719A, EGFR-Exon19del, EGFR-T790M, EGFR-L858R, EGFR-L861Q, KRAS-Q61H, KRAS-Q61L, KRAS-Q61R, KRAS-Q61K, KRAS-G13A, KRAS-G13D, KRAS-G13C, KRAS-G13R, KRAS-G13D, KRAS-G13C, KRAS-G13R, KRAS-G13S, KRAS-G12V, KRAS-G12A, KRAS-G12D, KRAS-G12D, KRAS-G12C, KRAS-G12R, KRAS-G12S, MAP2K1-Q56P, NRAS-Q61H, NRAS-Q61L, NRAS-Q61R, NRAS-Q61K, NRAS-G12A, NRAS-G12D, NRAS-G12C, NRAS-G12R, NRAS-G12S, PI3KCA-E542K, PI3KCA-E545Q, PI3KCA-E545K, PI3KCA-H1047R, PI3KCA-H1047L, and PTEN-R233*.
54. The method of claim 41, wherein a sequence in the plurality of target nucleic acids in the patient sample comprises a mutation identified in FIG. 7.
Type: Application
Filed: Apr 10, 2023
Publication Date: Aug 31, 2023
Inventors: Gordon Cann (Redwood City, CA), Alex Aravanis (San Mateo, CA), Arash Jamshidi (Menlo Park, CA), Rick Klausner (Los Altos Hills, CA), Richard Rava (Redwood City, CA)
Application Number: 18/298,043