IDENTIFICATION AND USE OF CIRCULATING TUMOR MARKERS

Info

Publication number: 20140296081
Type: Application
Filed: Mar 13, 2014
Publication Date: Oct 2, 2014
Inventors: Maximilian Diehn (Stanford, CA), Arash Ash Alizadeh (San Mateo, CA), Aaron M. Newman (Palo Alto, CA), Scott V. Bratman (Palo Alto, CA)
Application Number: 14/209,807

Abstract

Methods for creating a library of recurrently mutated genomic regions and for using the library to analyze cancer-specific and patient-specific genetic alterations in a patient are provided. The methods can be used to measure tumor-derived nucleic acids in patient blood and thus to monitor the progression of disease. The methods can also be used for cancer screening.

Description

STATEMENT OF GOVERNMENTAL SUPPORT

This invention was made with government support under grant number W81XWH-12-1-0285 awarded by the Department of Defense. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

Analysis of cancer-derived cell-free DNA (cfDNA) has the potential to revolutionize detection and monitoring of cancer. Noninvasive access to malignant DNA is particularly attractive for solid tumors, which cannot be repeatedly sampled without invasive procedures. In non-small cell lung cancer (NSCLC), PCR-based assays have been used previously to detect recurrent point mutations in genes such as KRAS or EGFR in plasma DNA (Taniguchi et al. (2011) Clin. Cancer Res. 17:7808-7815; Gautschi et al. (2007) Cancer Lett. 254:265-273; Kuang et al. (2009) Clin. Cancer Res. 15:2630-2636; Rosell et al. (2009) N. Engl. J. Med. 361:958-967), but the majority of patients lack mutations in these genes. Other studies have proposed identifying patient-specific chromosomal rearrangements in tumors via whole genome sequencing (WGS), followed by breakpoint qPCR from cfDNA (Leary et al. (2010) Sci. Transl. Med. 2:20ra14; McBride et al. (2010) Genes Chrom. Cancer 49:1062-1069). While sensitive, such methods require optimization of molecular assays for each patient, limiting their widespread clinical application. More recently, several groups have reported amplicon-based deep sequencing methods to detect cfDNA mutations in up to 6 recurrently mutated genes (Forshew et al. (2012) Sci. Transl. Med. 4:136ra168; Narayan et al. (2012) Cancer Res. 72:3492-3498; Kinde et al. (2011) Proc. Natl Acad. Sci. USA 108:9530-9535). While powerful, these approaches are limited by the number of mutations that can be interrogated (Rachlin et al. (2005) BMC Genomics 6:102) and the inability to detect genomic fusions.

PCT International Patent Publication No. 2011/103236 describes methods for identifying personalized tumor markers in a cancer patient using “mate-paired” libraries. The methods are limited to monitoring somatic chromosomal rearrangements, however, and must be personalized for each patient, thus limiting their applicability and increasing their cost.

U.S. Patent Application Publication No. 2010/0041048 A1 describes the quantitation of tumor-specific cell-free DNA in colorectal cancer patients using the “BEAMing” technique (Beads, Emulsion, Amplification, and Magnetics). While this technique provides high sensitivity and specificity, it assays single mutations, so any given assay can be applied only to a subset of patients and/or requires patient-specific optimization. U.S. Patent Application Publication No. 2012/0183967 A1 describes additional methods to identify and quantify genetic variations, including the analysis of minor variants in a DNA population, using the “BEAMing” technique.

U.S. Patent Application Publication No. 2012/0214678 A1 describes methods and compositions for detecting fetal nucleic acids and determining the fraction of cell-free fetal nucleic acid circulating in a maternal sample. While sensitive, these methods analyze polymorphisms occurring between maternal and fetal nucleic acids rather than polymorphisms that result from somatic mutations in tumor cells. In addition, methods that detect fetal nucleic acids in maternal circulation require much less sensitivity than methods that detect tumor nucleic acids in cancer patient circulation, because fetal nucleic acids are much more abundant than tumor nucleic acids.

U.S. Patent Application Publication Nos. 2012/0237928 A1 and 2013/0034546 describe methods for determining copy number variations of a sequence of interest in a test sample comprising a mixture of nucleic acids. While potentially applicable to the analysis of cancer, these methods are directed to measuring major structural changes in nucleic acids, such as translocations, deletions, and amplifications, rather than single nucleotide variations.

U.S. Patent Application Publication No. 2012/0264121 A1 describes methods for estimating a genomic fraction, for example, a fetal fraction, from polymorphisms such as small base variations or insertions-deletions. These methods do not, however, make use of optimized libraries of polymorphisms, such as, for example, libraries containing recurrently-mutated genomic regions.

U.S. Patent Application Publication No. 2013/0024127 A1 describes computer-implemented methods for calculating a percent contribution of cell-free nucleic acids from a major source and a minor source in a mixed sample. The methods do not, however, provide any advantages in identifying or making use of optimized libraries of polymorphisms in the analysis.

PCT International Publication No. WO 2010/141955 A2 describes methods of detecting cancer by analyzing panels of genes from a patient-obtained sample and determining the mutational status of the genes in the panel. The methods rely on a relatively small number of known cancer genes, however, and they do not provide any ranking of the genes according to effectiveness in detection of relevant mutations. In addition, the methods were unable to detect the presence of mutations in the majority of serum samples from actual cancer patients.

There is thus a need for new and improved methods to detect and monitor tumor-related nucleic acids in cancer patients.

SUMMARY OF THE INVENTION

The present invention addresses these and other problems by providing novel methods and systems relating to the characterization, diagnosis, and monitoring of cancer. In particular, according to one aspect, the invention provides methods for creating a library of recurrently mutated genomic regions comprising:

identifying a plurality of genomic regions from a group of genomic regions that are recurrently mutated in a specific cancer;

wherein the library comprises the plurality of genomic regions;

the plurality of genomic regions comprises at least 10 different genomic regions; and

at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.

In specific embodiments of these methods, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.

In other specific method embodiments, at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.

In still other specific method embodiments, at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.
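The coverage criterion above (at least one, two, or three mutations within the library in at least a given percentage of subjects) can be checked against a cohort of sequenced tumors. The following Python sketch is illustrative only: the cohort encoding, the function names `fraction_covered` and `meets_criteria`, and the toy data are hypothetical and are not part of the disclosed methods.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: each mutation is (region_id, description).
Mutation = Tuple[str, str]


def fraction_covered(cohort: Dict[str, List[Mutation]],
                     library: Set[str],
                     min_mutations: int = 1) -> float:
    """Fraction of subjects with at least `min_mutations` mutations in library regions."""
    if not cohort:
        raise ValueError("cohort must contain at least one subject")
    covered = 0
    for mutations in cohort.values():
        # Count only mutations that fall in regions included in the library.
        hits = sum(1 for region_id, _ in mutations if region_id in library)
        if hits >= min_mutations:
            covered += 1
    return covered / len(cohort)


def meets_criteria(cohort: Dict[str, List[Mutation]],
                   library: Set[str],
                   min_regions: int = 10,
                   min_mutations: int = 1,
                   min_fraction: float = 0.60) -> bool:
    """True if the library has enough regions and covers enough subjects."""
    return (len(library) >= min_regions
            and fraction_covered(cohort, library, min_mutations) >= min_fraction)
```

Raising `min_mutations` to 2 or 3 corresponds to the embodiments requiring at least two or three mutations per subject; raising `min_fraction` corresponds to the 70% to 99.9% embodiments.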

In some embodiments, the identifying step comprises, for each genomic region in the plurality of genomic regions, ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.

In other embodiments, the identifying step comprises, for each genomic region in the plurality of genomic regions, ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.
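The two ranking strategies can be sketched as follows, assuming a hypothetical `Region` record and a mapping from each region to the set of subjects carrying at least one mutation in it. The first strategy sorts by the number of such subjects; the second divides that number by the region length, which favors short, densely mutated regions. This is an illustration of the ranking idea, not the exact procedure used to build any particular selector.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Region:
    region_id: str
    length_bp: int


def rank_regions(regions: List[Region],
                 mutated_subjects: Dict[str, Set[str]],
                 per_length: bool = False) -> List[Region]:
    """Rank regions by subjects mutated (or subjects mutated per bp), best first."""
    def score(region: Region) -> float:
        n_subjects = len(mutated_subjects.get(region.region_id, set()))
        if per_length:
            return n_subjects / region.length_bp
        return float(n_subjects)

    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(regions, key=score, reverse=True)
```

For example, a 1,000 bp region mutated in five subjects outranks a 100 bp region mutated in two subjects by count, but the 100 bp region ranks first by ratio (0.02 versus 0.005 subjects per bp).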

In some embodiments, the library comprises a plurality of genomic regions encoding a plurality of driver sequences, more specifically known driver sequences or driver sequences that are recurrently mutated in the specific cancer.

In some embodiments, the library comprises a plurality of genomic regions that are recurrently rearranged in the specific cancer.

In preferred embodiments, the specific cancer is a carcinoma, and in more preferred embodiments, the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.

In specific embodiments, the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.
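One way to honor a cap on cumulative length is to walk a ranked list of regions and keep each region that still fits. The sketch below assumes a greedy rule and a hypothetical function name, `select_within_budget`; the disclosure does not specify how a length cap is enforced.

```python
from typing import List, Tuple


def select_within_budget(ranked: List[Tuple[str, int]],
                         max_total_bp: int) -> List[str]:
    """Greedily keep ranked regions whose cumulative length fits the budget.

    `ranked` holds (region_id, length_bp) pairs ordered best first. A region
    that would overflow the budget is skipped, and smaller regions further
    down the list are still considered.
    """
    selected: List[str] = []
    total_bp = 0
    for region_id, length_bp in ranked:
        if total_bp + length_bp <= max_total_bp:
            selected.append(region_id)
            total_bp += length_bp
    return selected
```

Skipping an oversized region, rather than stopping at the first overflow, uses more of the budget at the cost of no longer following the ranking strictly; either rule is a reasonable design choice.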

In another aspect, the invention provides methods for analyzing a cancer-specific genetic alteration in a subject comprising the steps of:

obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer;

sequencing a plurality of target regions in the tumor nucleic acid sample and in the genomic nucleic acid sample to obtain a plurality of tumor nucleic acid sequences and a plurality of genomic nucleic acid sequences; and

comparing the plurality of tumor nucleic acid sequences to the plurality of genomic nucleic acid sequences to identify a patient-specific genetic alteration in the tumor nucleic acid sample;

wherein the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer;

the plurality of genomic regions comprises at least 10 different genomic regions; and

at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.
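The comparing step can be illustrated with a minimal tumor-versus-germline comparison over per-position base counts. This sketch covers single nucleotide variants only; the pileup encoding, the function name `call_somatic_snvs`, and all thresholds (`min_depth`, `min_tumor_af`, `max_germline_af`) are hypothetical and are not the parameters of the pipeline shown in FIG. 2.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]               # (chromosome, position); hypothetical encoding
Pileup = Dict[Site, Dict[str, int]]  # site -> {base: read count}


def call_somatic_snvs(tumor: Pileup,
                      germline: Pileup,
                      reference: Dict[Site, str],
                      min_depth: int = 20,
                      min_tumor_af: float = 0.05,
                      max_germline_af: float = 0.01) -> List[Tuple[str, int, str, str, float]]:
    """Return (chrom, pos, ref, alt, tumor_af) for candidate tumor-only SNVs."""
    calls = []
    for site, t_counts in tumor.items():
        g_counts = germline.get(site)
        if g_counts is None or site not in reference:
            continue  # site not assessable in both samples
        t_depth = sum(t_counts.values())
        g_depth = sum(g_counts.values())
        if t_depth < min_depth or g_depth < min_depth:
            continue  # too few reads to call reliably
        ref = reference[site]
        for base, n_alt in t_counts.items():
            if base == ref:
                continue
            t_af = n_alt / t_depth
            g_af = g_counts.get(base, 0) / g_depth
            # Somatic: well supported in the tumor, (nearly) absent from germline DNA.
            if t_af >= min_tumor_af and g_af <= max_germline_af:
                calls.append((site[0], site[1], ref, base, t_af))
    return sorted(calls)
```

A variant seen at similar frequency in both samples (for example, a heterozygous germline polymorphism) is excluded, which is the point of sequencing the genomic sample alongside the tumor.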

In specific embodiments of this aspect of the invention, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.

In other specific embodiments, at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.

In still other specific embodiments, at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.

In some embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.

In other embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.

In some embodiments, the plurality of genomic regions comprises genomic regions encoding a plurality of driver sequences, more specifically known driver sequences or driver sequences that are recurrently mutated in the specific cancer.

In some embodiments, the plurality of genomic regions comprises genomic regions that are recurrently rearranged in the specific cancer.

In preferred embodiments, the specific cancer is a carcinoma, and in more preferred embodiments, the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.

In specific embodiments, the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.

In some embodiments, the methods further comprise the steps of:

obtaining a cell-free nucleic acid sample from the subject; and

identifying the patient-specific genetic alteration in the cell-free nucleic acid sample.

In specific embodiments, the step of identifying the patient-specific genetic alteration in the cell-free nucleic acid sample comprises sequencing a genomic region comprising the patient-specific genetic alteration in the cell-free sample.
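Once patient-specific alterations ("reporters") are known from the tumor, identifying them in cell-free DNA amounts to counting reads that carry each reporter allele. The sketch below assumes the same hypothetical pileup encoding as a simple dictionary, and a hypothetical function name, `genotype_reporters`. A single supporting read may reflect sequencing or PCR error, so a positive result here is a candidate to be weighed against background, as analyzed for FIG. 9.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]               # (chromosome, position); hypothetical encoding
Pileup = Dict[Site, Dict[str, int]]  # site -> {base: read count}


def genotype_reporters(cfdna: Pileup,
                       reporters: List[Tuple[Site, str]],
                       min_alt_reads: int = 1) -> Dict[Tuple[Site, str], Tuple[int, int, bool]]:
    """For each (site, alt) reporter, return (alt reads, depth, observed?)."""
    result = {}
    for site, alt in reporters:
        counts = cfdna.get(site, {})  # uncovered sites yield zero depth
        depth = sum(counts.values())
        alt_reads = counts.get(alt, 0)
        result[(site, alt)] = (alt_reads, depth, alt_reads >= min_alt_reads)
    return result
```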

In other specific embodiments, the step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample comprises the step of enriching the plurality of target regions in the tumor nucleic acid sample and the genomic nucleic acid sample, and in more specific embodiments, the enriching step comprises use of a custom library of biotinylated DNA.

In still other specific embodiments, the step of obtaining a cell-free nucleic acid sample comprises the step of enriching the plurality of target regions in the cell-free nucleic acid sample, and in still more specific embodiments, the enriching step comprises use of a custom library of biotinylated DNA.

In some embodiments, the methods further comprise the step of quantifying the cancer-specific genetic alteration in the cell-free sample.
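A simple quantification pools mutant and total reads across all reporters and reports the resulting allele fraction with a 95% Wilson score interval. Pooling is an assumption chosen for brevity: it ignores differences in zygosity between reporters, and the function name `pooled_fraction` is hypothetical.

```python
import math
from typing import List, Tuple


def pooled_fraction(alt_reads: List[int], depths: List[int],
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Pooled mutant allele fraction across reporters with a Wilson score interval."""
    if len(alt_reads) != len(depths):
        raise ValueError("alt_reads and depths must have the same length")
    k, n = sum(alt_reads), sum(depths)
    if n == 0:
        raise ValueError("total depth must be positive")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return p, max(0.0, center - half), min(1.0, center + half)
```

The Wilson interval is used instead of the normal approximation because it stays sensible when only a handful of mutant reads are observed.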

In yet another aspect, the invention provides methods for screening for a cancer-specific genetic alteration in a subject comprising the steps of:

obtaining a cell-free nucleic acid sample from a subject;

sequencing a plurality of target regions in the cell-free sample to obtain a plurality of cell-free nucleic acid sequences; and

identifying a cancer-specific genetic alteration in the cell-free sample;

wherein the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer;

the plurality of genomic regions comprises at least 10 different genomic regions; and

at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.
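In screening, the tumor's mutations are not known in advance, so candidate alterations must stand out from background without a reference. As a stand-in for the outlier detection described for FIG. 17, whose exact procedure is given in the Detailed Methods rather than here, the sketch below flags positions whose allele fraction lies far above the spread seen at the same position in background samples. The z-score rule, the threshold, the `min_sd` floor, and the function name `flag_outlier_sites` are all assumptions.

```python
import statistics
from typing import Dict, List


def flag_outlier_sites(sample_af: Dict[str, float],
                       background_af: Dict[str, List[float]],
                       z_threshold: float = 5.0,
                       min_sd: float = 1e-6) -> List[str]:
    """Return sites whose allele fraction is an upper outlier versus background samples."""
    flagged = []
    for site, af in sample_af.items():
        background = background_af.get(site, [])
        if len(background) < 2:
            continue  # need at least two background values to estimate spread
        mu = statistics.mean(background)
        sd = max(statistics.stdev(background), min_sd)  # floor avoids division by zero
        if (af - mu) / sd >= z_threshold:
            flagged.append(site)
    return sorted(flagged)
```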

In specific embodiments, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.

In other specific embodiments, at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.

In still other specific embodiments, at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.

In particular embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.

In other particular embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.

In still other particular embodiments, the plurality of genomic regions comprises genomic regions encoding a plurality of driver sequences, and, more particularly, the driver sequences are known driver sequences or are recurrently mutated in the specific cancer.

In yet still other particular embodiments, the plurality of genomic regions comprises genomic regions that are recurrently rearranged in the specific cancer.

In some embodiments, the specific cancer is a carcinoma, including, for example, an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.

In specific embodiments, the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.

In other specific embodiments, the step of obtaining a cell-free nucleic acid sample comprises the step of enriching the plurality of target regions in the cell-free nucleic acid sample, and, in some embodiments, the enriching step comprises use of a custom library of biotinylated DNA.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Development of CAncer Personalized Profiling by Deep Sequencing (CAPP-Seq). (a) Schematic depicting design of CAPP-Seq selectors and their application for assessing circulating tumor DNA. (b) Multi-phase design of the NSCLC CAPP-Seq selector. (c) Analysis of the number of SNVs per lung adenocarcinoma covered by the NSCLC CAPP-Seq selector in the TCGA WES cohort (Training; N=229) and an independent lung adenocarcinoma WES data set (Validation; N=183) (Imielinski et al. (2012) Cell 150:1107-1120). (d) Number of SNVs per patient identified by the NSCLC CAPP-Seq selector in TCGA WES data from three adenocarcinoma types: colon (COAD), rectal (READ), and endometrioid (UCEC). (e-f) Quality parameters from a representative CAPP-Seq analysis of plasma cfDNA, including length distribution of sequenced cfDNA fragments (e) and depth of sequencing coverage across all genomic regions in the selector (f). (g) Variation in sequencing depth across cfDNA samples from 4 patients.
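Counting how many of a patient's SNVs a selector covers, as in panels (c) and (d), reduces to testing whether each mutation position falls inside any selector interval. The sketch below assumes non-overlapping, half-open [start, end) intervals as in BED files and uses binary search; the function names are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(intervals: List[Tuple[str, int, int]]) -> Index:
    """Index non-overlapping half-open [start, end) intervals by chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        index[chrom] = ([s for s, _ in spans], [e for _, e in spans])
    return index


def in_selector(index: Index, chrom: str, pos: int) -> bool:
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def snvs_per_patient(index: Index,
                     patient_snvs: Dict[str, List[Tuple[str, int]]]) -> Dict[str, int]:
    """Number of each patient's SNVs that fall inside the selector."""
    return {patient: sum(in_selector(index, chrom, pos) for chrom, pos in snvs)
            for patient, snvs in patient_snvs.items()}
```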

FIG. 2. CAPP-Seq computational pipeline. Major steps of the bioinformatics pipeline for mutation discovery and quantitation in plasma are schematically illustrated.

FIG. 3. Statistical enrichment of recurrently mutated NSCLC exons captures known drivers.

FIG. 4. Development of the FACTERA algorithm. Major steps used by FACTERA (see Detailed Methods) to precisely identify genomic breakpoints from aligned paired-end sequencing data are illustrated using two hypothetical genes, w and v. (a) Improperly paired, or “discordant,” reads (indicated in yellow) are used to locate genes involved in a potential fusion (in this case, w and v). (b) Because truncated (i.e., soft-clipped) reads may indicate a fusion breakpoint, any such reads within genomic regions delineated by w and v are also analyzed. (c) Consider soft-clipped reads, R1 and R2, whose non-clipped segments map to w and v, respectively. If R1 and R2 derive from a fragment encompassing a true fusion between w and v, then the mapped portion of R1 should match the soft-clipped portion of R2, and vice versa. This is assessed by FACTERA using fast k-mer indexing and comparison. (d) Four possible orientations of R1 and R2 are depicted. However, only Cases 1a and 2a can generate valid fusions (see Detailed Methods). Thus, prior to k-mer comparison (panel c), the reverse complement of R1 is taken for Cases 1b and 2b, converting them into Cases 1a and 2a, respectively. (e) In some cases, short sequences immediately flanking the breakpoint are identical, preventing unambiguous determination of the breakpoint. Let iterators i and j denote the first matching sequence positions between R1 and R2. To reconcile sequence overlap, FACTERA arbitrarily adjusts the breakpoint in R2 (i.e., bp2) to match R1 (i.e., bp1) using the sequence offset determined by differences in distance between bp2 and i, and bp1 and j. Two cases are illustrated, corresponding to sequence orientations described in (d).
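The reciprocal check in panel (c) can be illustrated with exact k-mer set intersection. This is not FACTERA's implementation: the function names, the default `k`, and the `min_shared` rule are hypothetical, orientation handling (panel d) is limited to a reverse-complement helper the caller applies first, and breakpoint reconciliation (panel e) is omitted.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement, used to bring Case b orientations into Case a."""
    return seq.translate(_COMPLEMENT)[::-1]


def supports_fusion(r1_mapped: str, r1_clipped: str,
                    r2_mapped: str, r2_clipped: str,
                    k: int = 10, min_shared: int = 1) -> bool:
    """True if R1's mapped part matches R2's clipped part and vice versa."""
    forward = len(kmers(r1_mapped, k) & kmers(r2_clipped, k)) >= min_shared
    backward = len(kmers(r2_mapped, k) & kmers(r1_clipped, k)) >= min_shared
    return forward and backward
```

Both directions must match: a clipped tail that happens to resemble one partner gene is not enough on its own to support a fusion.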

FIG. 5. Application of FACTERA to NSCLC cell lines NCI-H3122 and HCC78, and Sanger validation of breakpoints. (a) Pile-up of a subset of soft-clipped reads mapping to the EML4-ALK fusion identified in NCI-H3122, along with the corresponding Sanger chromatogram. (b) Same as (a), but for the SLC34A2-ROS1 translocation identified in HCC78.

FIG. 6. Improvements in CAPP-Seq performance with optimized library preparation procedures.

FIG. 7. Optimizing allele recovery from low input cfDNA during Illumina library preparation.

FIG. 8. CAPP-Seq performance with various amounts of input cfDNA.

FIG. 9. Analysis of CAPP-Seq background, allele detection threshold, and linearity. (a) Analysis of background rate for 6 NSCLC patient plasma samples and a healthy individual (Detailed Methods). (b) Analysis of biological background in (a) focusing on 107 recurrent somatic mutations from a previously reported SNaPshot panel (Su et al. (2011) J. Mol. Diagn. 13:74-84). Mutations found in a given patient's tumor were excluded. The mean frequency for each patient (horizontal red line) was within confidence limits of the mean background limit of 0.007% (horizontal blue line; panel a). A single outlier mutation (TP53 R175H) is indicated by an orange diamond. (c) Individual mutations from (b) ranked from most to least recurrent, according to median frequency across the 7 samples. (d) Dilution series analysis of expected versus observed frequencies of mutant alleles using CAPP-Seq. Dilution series were generated by spiking fragmented HCC78 DNA into control cfDNA. (e) Analysis of the effect of the number of SNVs considered on the estimates of fractional abundance (95% confidence intervals shown in gray). (f) Analysis of the effect of the number of SNVs considered on the mean correlation coefficient between expected and observed cancer fractions (blue dashed line) using data from panel (d). 95% confidence intervals are shown for (a)-(c). Statistical variation for (d) is shown as s.e.m.
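Whether a reporter's mutant read count exceeds background can be framed as a one-sided binomial test: given a per-read background rate, how likely is it to see at least the observed number of mutant reads by chance? The sketch below uses the 0.007% mean background from panel (a) as a single assumed rate for every position; a real analysis might use position-specific rates. The function names and the `alpha` cutoff are hypothetical.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing terms in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def exceeds_background(alt_reads: int, depth: int,
                       background_rate: float = 7e-5, alpha: float = 0.05) -> bool:
    """True if alt_reads is unlikely under an assumed per-read background rate."""
    return binom_sf(alt_reads, depth, background_rate) < alpha
```

At a depth of 10,000 reads and a 0.007% background, about 0.7 background reads are expected, so a single mutant read is unremarkable while ten are highly unlikely to arise from background alone.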

FIG. 10. Empirical spiking analysis of CAPP-Seq using two NSCLC cell lines. (a) Expected and observed (by CAPP-Seq) fractions of NCI-H3122 DNA spiked into control HCC78 DNA are linear for all fractions tested (0.1%, 1%, and 10%; R²=1). (b) Using data from (a), analysis of the effect of the number of SNVs considered on the estimates of fractional abundance (95% confidence intervals shown in gray). (c) Analysis of the effect of the number of SNVs considered on the mean correlation coefficient and coefficient of variation between expected and observed cancer fractions (dashed lines) using data from panel (a). (d) Expected and observed fractions of the EML4-ALK fusion present in NCI-H3122 are linear (R²=0.995) over all spiking concentrations tested (see FIG. 5(a) for breakpoint verification). The observed EML4-ALK fractions were normalized based on the relative abundance of the fusion in 100% H3122 DNA (see Detailed Methods). Moreover, a single heterozygous insertion (indel) discovered within the selector space of NCI-H3122 (chr7: 107416855, +T) was concordant with defined concentrations (observed fractions shown are adjusted for zygosity).

FIG. 11. Application of CAPP-Seq for noninvasive detection and monitoring of circulating tumor DNA. (a) Characteristics of 11 patients included in this study (Table 3). P-values reflect a two-sided paired t-test for patients with reporter SNVs detected at both time points; other p-values were determined as described in the Detailed Methods. ND, mutant DNA was not detected above background. Dashes, plasma sample not available. Smoking history, ≥20 pack years (heavy), >0 pack years (light). (b-d) Disease monitoring using CAPP-Seq. Mutant allele frequencies (left y-axis) and absolute concentrations (right y-axis) are shown. The lower limit of detection (defined in FIG. 9(a)-(b)) is indicated by the dashed lines. (b) Pre- and post-surgery circulating tumor DNA levels quantified by CAPP-Seq in a Stage IB and a Stage IIIA NSCLC patient. Complete resections were achieved in both cases. (c) Disease burden changes in response to chemotherapy in a Stage IV NSCLC patient with three rearrangement breakpoints identified by CAPP-Seq. Tumor volume based on CT measurements and CAPP-Seq mutant allele frequencies are shown. Tu, tumor; Ef, pleural effusion. (d) Detection and monitoring of a subclonal EGFR T790M resistance mutation in a patient with Stage IV NSCLC. The fractional abundances of the dominant clone and the T790M-containing clone are shown in the primary tumor (left) and plasma samples (right). (e) Predicted transcripts of three fusion genes detected in case P9. (f) Statistically significant co-occurrence of ROS1 fusions and U2AF1 S34F mutations in NSCLC (P=0.0019; two-sided Fisher's exact test). (g) Exploratory analysis of the potential application of CAPP-Seq for cancer screening. Pre-treatment plasma samples from panel (a) and a plasma sample from a healthy individual were examined for the presence of mutant allele outliers without knowledge of the primary tumor mutations (see Detailed Methods). Error bars represent s.e.m.

FIG. 12. Base-pair resolution breakpoint mapping for all patients and cell lines enumerated by FACTERA. Gene fusions involving ALK (a) and ROS1 (b) are graphically depicted. Schematics in the top panels indicate the exact genomic positions (HG19 NCBI Build 37.1/GRCh37) of the breakpoints in ALK, ROS1, EML4, KIF5B, SLC34A2, CD74, MKX, and FYN. Bottom panels depict exons flanking the predicted gene fusions, with notation indicating the 5′ fusion partner gene and last fused exon followed by the 3′ fusion partner gene and first fused exon. For example, in S13del37; R34, exons 1-13 of SLC34A2 (excluding the 3′ 37 nucleotides of exon 13) are fused to exons 34-43 of ROS1. Exons in FYN are from its 5′ UTR and precede the first coding exon. The green dotted line in the predicted FYN-ROS1 fusion indicates the first in-frame methionine in ROS1 exon 33, which preserves an open reading frame encoding the ROS1 kinase domain. Each rearrangement was independently confirmed by PCR and/or FISH.

FIG. 13. Presence of fusions is inversely related to the number of SNVs detected by CAPP-Seq. For each patient listed in FIG. 11(a) the number of identified SNVs versus the presence or absence of detected genomic fusions are plotted. The shading of the symbols is identical to FIG. 11(a), and indicates smoking history. Statistical significance was determined using a two-sided Wilcoxon rank sum test, and error bars indicate s.e.m.

FIG. 14. Different types of reporters are similarly useful for disease monitoring. Three SNVs and an ALK translocation identified in patient 6 are concordant at each time point, showing a comparable drop in fractional abundance after treatment with the ALK kinase inhibitor Crizotinib. Due to small differences in measured allele frequencies at each time point, linear regression was used to fit all allele frequencies to their adjusted mutant cfDNA concentrations (R²=0.93). Thus, the scale on the right y-axis is interpolated. To accurately quantify disease burden, translocation and SNV frequencies were adjusted based on differences in zygosity and sequencing depth in the tumor sample (see Detailed Methods).
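The zygosity adjustment can be sketched as follows, under simplifying assumptions: tumor and normal cells have the same copy number at the locus, and a reporter carried on `mutant_copies` of `total_copies` alleles in tumor cells has a plasma allele fraction equal to the tumor DNA fraction times `mutant_copies / total_copies`. The helper for normalizing to a reporter's abundance in pure tumor or cell line DNA mirrors the normalization described for FIG. 10(d). All function names are hypothetical, and the depth-based adjustment mentioned above is not modeled.

```python
from statistics import mean
from typing import List


def tumor_fraction_from_af(plasma_af: float, mutant_copies: int = 1,
                           total_copies: int = 2) -> float:
    """Convert a reporter's allele fraction into a tumor DNA fraction.

    A heterozygous event in a diploid tumor (1 mutant of 2 copies) is carried
    by only half of the tumor-derived molecules at that locus, so the allele
    fraction is scaled up by total_copies / mutant_copies.
    """
    return plasma_af * total_copies / mutant_copies


def normalize_to_pure(observed_fraction: float, fraction_in_pure: float) -> float:
    """Rescale by the reporter's measured abundance in 100% tumor (or cell line) DNA."""
    return observed_fraction / fraction_in_pure


def combined_estimate(adjusted_fractions: List[float]) -> float:
    """Average adjusted fractions across reporters (one simple choice of summary)."""
    return mean(adjusted_fractions)
```

Once each reporter is placed on the same tumor-fraction scale, SNVs and a translocation can be compared directly, which is what allows them to track together across time points.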

FIG. 15. Flow cytometry analysis of P9 pleural effusion. Flow cytometry of cryopreserved cells from a pleural effusion revealed that only 0.22% of cells stained positive for the epithelial marker EpCAM and negative for the lineage markers CD31 (endothelial cells) and CD45 (immune cells). FACS was used to enrich tumor cells, and analysis of tumor-enriched genomic DNA identified 3 fusions (FIG. 11(e)), whereas the low purity of the unsorted tumor specimen hampered de novo fusion discovery using FACTERA (Detailed Methods).

FIG. 16. Analysis of RNA-Seq data from lung adenocarcinoma patients in TCGA identifies 2 candidate cases with ROS1 rearrangements. (a) ROS1 fusions are known to result in over-expression of the C-terminal kinase domain, and breakpoints typically occur downstream of exon 31 (Bergethon et al. (2012) J. Clin. Oncol. 30:863-870; Rikova et al. (2007) Cell 131:1190-1203; Takeuchi et al. (2012) Nat. Med. 18:378-381). Exon-level RPKM values for ROS1 are plotted for 163 LUAD patients. Two patients (TCGA-05-4426 and TCGA-64-1680) have expression patterns suggestive of ROS1 fusions. (b,c) Pileups of RNA-Seq reads in these two patients illustrate an abundance of reads mapping to regions surrounding ROS1 exon boundaries. Colored reads indicate discordant pairs, consistent with ROS1 fusions. Such pairs map to SLC34A2 for patient TCGA-05-4426 (b) and CD74 for patient TCGA-64-1680 (c). A single soft-clipped RNA-Seq read supports a ROS1-CD74 fusion event in TCGA-64-1680.

FIG. 17. Non-invasive cancer screening with CAPP-Seq, related to FIG. 11(g). (a) Steps to identify candidate SNVs in plasma cfDNA demonstrated using a patient sample with NSCLC (P6, see Table 3). Following stepwise filtration, outlier detection is applied (Detailed Methods). (b) Same as (a), but using a plasma cfDNA sample from a patient who had their tumor surgically removed. No SNVs are identified, as expected. (c) Three additional representative samples applying retrospective screening to patients analyzed in this study. P2 and P5 samples have confirmed tumor-derived SNVs, while P9 is cancer positive but lacks tumor-derived SNVs. Red points, confirmed tumor-derived SNVs; Green points, background noise.

DETAILED DESCRIPTION OF THE INVENTION

Tumors continually shed DNA into the circulation, where it is readily accessible. Stroun et al. (1987) Eur J Cancer Clin Oncol 23:707-712. Provided herein are methods for the ultrasensitive detection of circulating tumor DNA called CAncer Personalized Profiling by Deep Sequencing (CAPP-Seq). Also provided are methods for creating libraries of recurrently mutated genomic regions used in the CAPP-Seq methods. CAPP-Seq targets hundreds of recurrently mutated genomic regions and simultaneously detects point mutations, insertions/deletions, and rearrangements. CAPP-Seq for non-small cell lung cancer has been demonstrated herein with a design that identified mutations in >95% of tumors. CAPP-Seq accurately quantified circulating tumor DNA from early and advanced stage tumors and identified mutant alleles down to 0.025% with a detection limit of <0.01%. Tumor-derived DNA levels paralleled clinical responses to diverse therapies and CAPP-Seq identified actionable mutations in plasma. Moreover, CAPP-Seq identified significant co-occurrence of ROS1 translocations with U2AF1 splicing factor mutations. Finally, the utility of CAPP-Seq for cancer screening is also described. CAPP-Seq can be routinely applied to noninvasively detect and monitor tumors, thus facilitating personalized cancer therapy.

Methods for Creating Libraries

According to one aspect of the invention, methods for creating a library of recurrently mutated genomic regions are provided. The methods comprise the step of identifying a plurality of genomic regions from a group of genomic regions that are recurrently mutated in a specific cancer, wherein the library comprises the plurality of genomic regions, the plurality of genomic regions comprises at least 10 different genomic regions, and at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.

It should be understood that the term “library” represents a compilation or collection of individual components. Thus, a library of recurrently mutated genomic regions is a compilation or collection of recurrently mutated genomic regions. The libraries of the instant disclosure are useful because they include a large number of potentially mutated genomic regions within a minimal length of genomic sequence. Use of these libraries to identify genetic alterations in specific patient samples is particularly advantageous because the libraries do not need to be optimized on a patient-by-patient basis.

The libraries created according to the instant methods comprise genomic regions that are recurrently mutated in a specific cancer. The identification of these recurrent mutations benefits greatly from the availability of databases such as The Cancer Genome Atlas (TCGA) and its subsets (http://cancergenome.nih.gov/). Such databases serve as the starting point for identifying the recurrently mutated genomic regions of the instant libraries. The databases also indicate the percentage of subjects with a specific cancer in which a given mutation occurs.

The libraries created according to the instant methods comprise a plurality of genomic regions, wherein the plurality of genomic regions comprises at least 10 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, at least 500, or even more different genomic regions.

It should be understood that the inclusion of larger numbers of genomic regions generally increases the likelihood that a unique mutation will be identified to distinguish tumor nucleic acid in a subject from the subject's genomic nucleic acid. Including too many genomic regions in the library is not without a cost, however, since the number of genomic regions is directly related to the length of nucleic acids that must be sequenced in the analysis. At the extreme, the entire genome of a tumor sample and a genomic sample could be sequenced, and the resulting sequences could be compared to note any differences. Such a brute force approach is not possible, however, with the vanishingly small quantities of tumor nucleic acid present in a cell-free sample.

The libraries of the instant disclosure address this problem by identifying genomic regions that are recurrently mutated in a particular cancer, and then ranking those regions to maximize the likelihood that the region will include a distinguishing genetic alteration in a particular tumor. The library of recurrently mutated genomic regions, or “selectors”, can be used across an entire population for a given cancer, and does not need to be optimized for each subject.

The term “mutation”, as used herein, refers to a genetic alteration in the genome of an organism, specifically to a change in the nucleotide sequence of the organism. Examples of mutations include point mutations, where a single nucleotide is changed in the genome, and larger-scale changes in the genome, such as rearrangements, insertions, deletions, and amplifications. A recurrent mutation is a mutation that has been identified in more than one individual.

The terms “patient” and “subject” are used interchangeably. These are typically individuals that suffer from the cancer of interest. While the individuals are typically human individuals, the methods and systems of the instant disclosure could also be applied to other species, in particular, to other animal species, for example, livestock animals and pets.

The libraries of recurrently mutated genomic regions disclosed herein are created for a given type of cancer using one or more of the following design phases:

Phase 1: Identify known “driver” genes, i.e., genes that are known to be mutated frequently in the particular cancer.
Phase 2: Maximize patient coverage by selecting genomic regions that contain recurrent mutations in multiple subjects with the particular cancer and ranking those selections to maximize the number of patients identified by mutations in those regions.
Phases 3 and 4: Further ranking of genomic regions containing recurrent mutations by maximizing the “recurrence index”.
Phase 5: Add genomic regions from genes predicted to harbor “driver” mutations in the particular cancer.
Phase 6: Add genomic regions covering fusions and their flanking regions.

It should be understood, however, that the above-described phases of selector design are independent of one another and may be applied separately, or in a different order, within the methods of library creation and still achieve the desired result.
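Phase 2 is, in effect, a maximum-coverage problem. The sketch below shows one common heuristic for it: a greedy loop that repeatedly adds the region covering the most patients not yet covered. It is a minimal illustration under stated assumptions, not the procedure used to build Table 1. The input format (a mapping from hypothetical region IDs to patient IDs) and the function name `greedy_select_regions` are illustrative.

```python
from typing import Dict, List, Set


def greedy_select_regions(
    region_patients: Dict[str, Set[str]],
    max_regions: int,
) -> List[str]:
    """Greedy maximum-coverage selection of candidate regions.

    region_patients maps a (hypothetical) region ID to the set of patient IDs
    carrying at least one mutation in that region, e.g. as tallied from a
    cohort such as TCGA.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(region_patients)
    while remaining and len(chosen) < max_regions:
        # Region adding the most not-yet-covered patients; iterating over the
        # sorted IDs makes max() break ties toward the smallest ID.
        best = max(sorted(remaining), key=lambda r: len(remaining[r] - covered))
        if not remaining[best] - covered:
            break  # no remaining region covers a new patient
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Toy cohort with hypothetical region and patient IDs.
cohort = {
    "A": {"p1", "p2"},
    "B": {"p2", "p3", "p4"},
    "C": {"p1"},
    "D": {"p5"},
}
print(greedy_select_regions(cohort, max_regions=10))  # ['B', 'A', 'D']
```

Greedy selection does not guarantee an optimal set, but it is simple and yields a ranked order in which each added region contributes the largest available gain in patient coverage.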

Applying the above approaches to recurrently mutated genomic regions in non-small cell lung cancer results in the library shown in Table 1. Table 1 lists all genomic regions included in the selector, along with their corresponding HUGO gene symbols and genomic coordinates, as well as patient statistics for NSCLC and a variety of other cancers, organized by selector design phase. The percentage of NSCLC patients covered as the Table 1 library was built is shown in FIG. 1(b); the bottom panel of this figure shows the cumulative length of the genomic regions (in kb) as the library grows through the above phases. The three curves in the top panel show the percentage of patients with at least one (≧1 SNVs), at least two (≧2 SNVs), and at least three (≧3 SNVs) mutations distinguishing tumor sequences from genomic sequences. As these graphs show, the library created according to the instant methods identifies genomic regions that are highly likely to include identifiable mutations in tumor sequences. The library contains a relatively small number of genomic regions, and thus a relatively short cumulative length, yet provides high overall coverage of likely mutations in a population. The library does not, therefore, need to be optimized on a patient-by-patient basis. The short cumulative length also makes analysis of cancer-derived cell-free DNA with these libraries highly sensitive, because it allows this DNA to be sequenced to great depth.
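The coverage curves described above can be computed by walking through the library in design order and, after each added region, counting the patients with at least k distinguishing mutations. The following sketch assumes that per-patient SNV counts per region are already available (the names `coverage_curve`, `ordered_regions`, and `patient_mutations` are hypothetical) and sums region lengths directly, which assumes non-overlapping regions. The last line checks the “at least 60% of all subjects” criterion used throughout this disclosure.

```python
from typing import Dict, List, Tuple


def coverage_curve(
    ordered_regions: List[Tuple[str, int]],
    patient_mutations: Dict[str, Dict[str, int]],
    thresholds: Tuple[int, ...] = (1, 2, 3),
) -> List[Tuple[str, int, Dict[int, float]]]:
    """Track cumulative length and patient coverage as regions are added.

    ordered_regions: (region ID, length in bp) in the order they are added.
    patient_mutations: patient ID -> {region ID: number of SNVs there}.
    Returns, after each region, (region ID, cumulative bp, {k: fraction of
    patients with >= k SNVs in the regions selected so far}).
    Lengths are simply summed, i.e. regions are assumed not to overlap.
    """
    counts = {patient: 0 for patient in patient_mutations}
    n_patients = len(patient_mutations)
    total_bp = 0
    curve = []
    for region, length_bp in ordered_regions:
        total_bp += length_bp
        for patient, per_region in patient_mutations.items():
            counts[patient] += per_region.get(region, 0)
        fractions = {
            k: sum(c >= k for c in counts.values()) / n_patients
            for k in thresholds
        }
        curve.append((region, total_bp, fractions))
    return curve


patients = {"p1": {"A": 1, "B": 1}, "p2": {"B": 2}, "p3": {}, "p4": {"C": 3}}
curve = coverage_curve([("A", 100), ("B", 150), ("C", 50)], patients)
region, length_bp, fractions = curve[-1]
print(length_bp, fractions)  # 300 {1: 0.75, 2: 0.75, 3: 0.25}
print(fractions[1] >= 0.60)  # True: >= 1 SNV in at least 60% of patients
```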

Accordingly, the libraries of recurrently mutated genomic regions created using the instant methods comprise a plurality of genomic regions that are recurrently mutated in a specific cancer, and the plurality of genomic regions comprises at least 10 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 25 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 50 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 100 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 150 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 200 different genomic regions. In some embodiments, the plurality of genomic regions comprises at least 500 different genomic regions or even more.

In some embodiments, the plurality of genomic regions comprises at most 5000 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 2000 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 1000 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 500 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 200 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 150 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 100 different genomic regions. In some embodiments, the plurality of genomic regions comprises at most 50 different genomic regions or even fewer.

Importantly, the libraries of recurrently mutated genomic regions created according to the instant methods enable the identification of patient- and tumor-specific mutations within the genomic regions in a high percentage of subjects. Specifically, in these libraries, at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer. In some embodiments, at least two mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer. In specific embodiments, at least three mutations, or even more, within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.

In some embodiments, in the libraries of recurrently mutated genomic regions created according to these methods, at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, 99.9% or even higher percentages of all subjects with the specific cancer.

In specific embodiments, at least two mutations within the plurality of genomic regions are present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, 99.9% or even higher percentages of all subjects with the specific cancer.

In more specific embodiments, at least three mutations, or even more, within the plurality of genomic regions are present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, 99.9% or even higher percentages of all subjects with the specific cancer.

As previously noted, the cumulative length of the genomic regions in the libraries of recurrently mutated genomic regions created according to the instant methods is relatively short, which minimizes the sequencing costs of analytical methods relying on these libraries and maximizes their sensitivity. In some embodiments, the cumulative length of genomic regions is at most 30 megabases (Mb). In some embodiments, the cumulative length of genomic regions is at most 20 Mb, 10 Mb, 5 Mb, 2 Mb, or 1 Mb. In some embodiments, the cumulative length of genomic regions is at most 500 kilobases (kb), 200 kb, 100 kb, 50 kb, 20 kb, 10 kb, or even less.
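When regions overlap, the cumulative length should count each base only once; Table 1, for example, contains two overlapping CDKN2A regions on chr9. Below is a minimal sketch of an interval-union calculation. It assumes inclusive coordinates, which Table 1 appears to follow (the AKT1 region 105246424-105246553 is listed as 130 bp). `cumulative_length_bp` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def cumulative_length_bp(regions: List[Tuple[str, int, int]]) -> int:
    """Total bases covered by (chrom, start, end) regions.

    Coordinates are treated as inclusive, so a region spans end - start + 1
    bases; overlapping or abutting regions on a chromosome are merged so
    each base is counted once.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:
                cur_end = max(cur_end, end)  # extend the current block
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total


# Two overlapping CDKN2A regions and one AKT1 region from Table 1.
regions = [
    ("chr9", 21970900, 21971207),
    ("chr9", 21971001, 21971207),
    ("chr14", 105246424, 105246553),
]
size = cumulative_length_bp(regions)
print(size)  # 438
print(size <= 30_000_000)  # True: within a 30 Mb ceiling
```

Naively summing the three listed lengths (308 + 207 + 130) would give 645 bp. Merging the overlap gives 438 bp.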

In some embodiments, the library of recurrently mutated genomic regions created according to the instant methods comprises the genomic regions displayed in Table 1, or a subset of those genomic regions.

The instant methods include the step of identifying a plurality of genomic regions from a group of genomic regions that are recurrently mutated in a specific cancer. As noted elsewhere, the libraries are particularly useful in methods for analyzing cancer-specific gene alterations in solid tumors, because those alterations can be detected in cell-free nucleic acids present in blood samples. Accordingly, the libraries created according to these methods include genomic regions that are recurrently mutated in a solid tumor. In some embodiments, the solid tumor is a carcinoma. In specific embodiments, the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma. The methods are also applicable to genomic regions that are recurrently mutated in other cancers, however. Specifically, the other cancer may be, for example, a sarcoma, a leukemia, a lymphoma, or a myeloma.

Systems

The methods for creating a library of recurrently mutated genomic regions, as disclosed herein, are typically implemented by a programmed computer system. Therefore, according to another aspect, the instant disclosure provides computer systems for creating a library of recurrently mutated genomic regions. Such systems comprise at least one processor and a non-transitory computer-readable medium storing computer-executable instructions that, when executed by the at least one processor, cause the computer system to carry out the above-described methods for creating a library.

Methods for Analyzing Genetic Alterations

The libraries created according to the above-described methods are useful in the analysis of genetic alterations, particularly in comparing tumor and genomic sequences in a patient with cancer. As shown in FIG. 2, a tissue biopsy sample from the patient may be used to discover mutations in the tumor by sequencing the genomic regions of the selector library in tumor and genomic nucleic acid samples and comparing the results. Because the selector libraries are designed to identify mutations in tumors from a large percentage of all patients, it is not necessary to optimize the library for each patient.

Accordingly, in this aspect of the invention, methods are provided for analyzing a cancer-specific genetic alteration in a subject comprising the steps of:

obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer;

sequencing a plurality of target regions in the tumor nucleic acid sample and in the genomic nucleic acid sample to obtain a plurality of tumor nucleic acid sequences and a plurality of genomic nucleic acid sequences; and

comparing the plurality of tumor nucleic acid sequences to the plurality of genomic nucleic acid sequences to identify a patient-specific genetic alteration in the tumor nucleic acid sample.
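At its simplest, the comparing step amounts to a set difference: alterations called in the tumor sample, minus those also called in the genomic (germline) sample, restricted to the target regions. The sketch below assumes variant calling has already been performed and represents each call as a hypothetical `Variant` tuple. Real data would also require filtering of sequencing errors, which this illustration omits. Only the KRAS target coordinates come from Table 1; the variant positions are invented.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def patient_specific_alterations(
    tumor_variants: Set[Variant],
    germline_variants: Set[Variant],
    target_regions: Iterable[Tuple[str, int, int]],
) -> List[Variant]:
    """Variants present in the tumor sample but absent from the matched
    germline sample, restricted to the sequenced target regions."""
    targets = list(target_regions)

    def in_targets(v: Variant) -> bool:
        return any(c == v.chrom and s <= v.pos <= e for c, s, e in targets)

    somatic = tumor_variants - germline_variants
    return sorted(v for v in somatic if in_targets(v))


# KRAS target region from Table 1; variant positions are made up.
targets = [("chr12", 25398207, 25398318)]
tumor = {
    Variant("chr12", 25398250, "C", "A"),  # tumor only, inside target
    Variant("chr12", 25398300, "G", "A"),  # also in germline
    Variant("chr1", 1000, "A", "T"),       # outside the targets
}
germline = {Variant("chr12", 25398300, "G", "A")}
print(patient_specific_alterations(tumor, germline, targets))
# [Variant(chrom='chr12', pos=25398250, ref='C', alt='A')]
```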

In these methods, the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer; the plurality of genomic regions comprises at least 10 different genomic regions; and at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer. More specifically, the plurality of target regions may correspond to the plurality of genomic regions found in the libraries of recurrently mutated genomic regions created using the above-described methods. In other words, in various embodiments, the number of different genomic regions in the plurality of genomic regions, the number of mutations within the plurality of genomic regions that are present in a specific percentage of all subjects with the specific cancer, the percentage of all subjects with the specific cancer with at least one mutation within the plurality of genomic regions, the specific composition of the plurality of genomic regions, the types of cancer, and the cumulative length of the plurality of genomic regions have the values disclosed above for the methods of creating a library.

In some embodiments, the plurality of target regions used in the methods for analyzing a cancer-specific genetic alteration in a subject corresponds to the library of recurrently mutated genomic regions displayed in Table 1, or a subset of those genomic regions.

It should be understood that the step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer may occur in a single step or in separate steps. For example, a single tissue sample obtained from a patient, such as a biopsy sample, may include both tumor nucleic acids and genomic nucleic acids. It is also within the scope of this step to obtain the tumor nucleic acid sample and the genomic nucleic acid sample from the subject in separate samples, in separate tissues, or even at separate times.

The step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer may also include the process of extracting a biological fluid or tissue sample from the subject with the specific cancer. These particular steps are well understood by those of ordinary skill in the medical arts, particularly by those working in the medical laboratory arts.

The step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer may additionally include procedures to improve the yield or recovery of the nucleic acids in the sample. For example, the step may include laboratory procedures to separate the nucleic acids from other cellular components and contaminants that may be present in the biological fluid or tissue sample. As noted, such steps may improve the yield and/or may facilitate the sequencing reactions.

It should also be understood that the step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer may be performed by a commercial laboratory that does not even have direct contact with the subject. For example, the commercial laboratory may obtain the nucleic acid samples from a hospital or other clinical facility where, for example, a biopsy or other procedure is performed to obtain tissue from a subject. The commercial laboratory may thus carry out all the steps of the instantly-disclosed methods at the request of, or under the instructions of, the facility where the subject is being treated or diagnosed.

Methods for Screening

The methods of the instant invention may also be applied to the detection of cancer in a patient, where there is no prior knowledge of the presence of a tumor in the patient. Accordingly, in this aspect of the invention are provided methods for screening a cancer-specific genetic alteration in a subject comprising the steps of:

obtaining a cell-free nucleic acid sample from a subject;

sequencing a plurality of target regions in the cell-free sample to obtain a plurality of cell-free nucleic acid sequences; and

identifying a cancer-specific genetic alteration in the cell-free sample.
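Without a matched tumor sample, one way to sketch the identifying step is to look for cell-free variants that match a catalog of recurrent mutations for the cancer of interest. The following example makes several assumptions: the catalog entries, the read counts, and the `min_alt_reads` cutoff are all hypothetical. A practical screen would need a proper background error model rather than a fixed read threshold.

```python
from typing import Dict, List, Set, Tuple

Mutation = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def screen_cfdna(
    alt_read_counts: Dict[Mutation, int],
    recurrent_catalog: Set[Mutation],
    min_alt_reads: int = 3,
) -> List[Mutation]:
    """Flag cell-free variants that match a catalog of recurrent mutations
    for the cancer of interest and have at least min_alt_reads supporting
    reads (min_alt_reads is a hypothetical cutoff)."""
    return sorted(
        m for m, n_alt in alt_read_counts.items()
        if m in recurrent_catalog and n_alt >= min_alt_reads
    )


# Hypothetical catalog entries and cfDNA read counts.
catalog = {("chr12", 25398250, "C", "A"), ("chr17", 7577100, "G", "T")}
cfdna = {
    ("chr12", 25398250, "C", "A"): 5,  # in catalog, enough reads
    ("chr17", 7577100, "G", "T"): 1,   # in catalog, too few reads
    ("chr2", 5000, "T", "C"): 9,       # not a catalogued recurrent mutation
}
print(screen_cfdna(cfdna, catalog))  # [('chr12', 25398250, 'C', 'A')]
```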

In these methods, the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer. In some embodiments, the plurality of genomic regions comprises at least 10 different genomic regions, and at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer. More specifically, the plurality of target regions may correspond to the plurality of genomic regions found in the libraries of recurrently mutated genomic regions created using the above-described methods. In other words, in various embodiments, the number of different genomic regions in the plurality of genomic regions, the number of mutations within the plurality of genomic regions that are present in a specific percentage of all subjects with the specific cancer, the percentage of all subjects with the specific cancer with at least one mutation within the plurality of genomic regions, the specific composition of the plurality of genomic regions, the types of cancer, and the cumulative length of the plurality of genomic regions have the values disclosed above for the methods of creating a library.

In some embodiments, the plurality of target regions used in the methods for screening a cancer-specific genetic alteration in a subject corresponds to the library of recurrently mutated genomic regions displayed in Table 1, or a subset of those genomic regions.

It will be readily apparent to one of ordinary skill in the relevant arts that other suitable modifications and adaptations to the methods and applications described herein may be made without departing from the scope of the invention or any embodiment thereof. Having now described the present invention in detail, the same will be more clearly understood by reference to the following Examples, which are included herewith for purposes of illustration only and are not intended to be limiting of the invention.

Examples

Noninvasive and Ultrasensitive Quantitation of Circulating Tumor DNA by Hybrid Capture and Deep Sequencing

To overcome the limitations of prior methods, we developed an ultrasensitive and specific strategy for analyzing cancer-derived cfDNA, CAncer Personalized Profiling by Deep Sequencing (CAPP-Seq), which can simultaneously detect single nucleotide variants (SNVs), insertions/deletions (indels), and rearrangements without the need for patient-specific optimization. CAPP-Seq employs an adaptable “selector” to enrich regions that are recurrently mutated in the cancer of interest, using a custom library of biotinylated DNA oligonucleotides (Ng et al. (2010) Nat. Genetics 42:30-35). To use CAPP-Seq for monitoring circulating tumor DNA, the selector is typically applied first to matched tumor and normal genomic DNA to identify a patient's cancer-specific genetic aberrations, and then directly to cfDNA to quantify these mutations (FIG. 1a and FIG. 2).
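Once a patient's tumor-specific mutations are known, their levels in cfDNA can be summarized in several ways. The option shown below is an assumption rather than the CAPP-Seq statistic: it pools alternate-allele reads and total depth across all of the patient's mutations. Pooling several mutations increases the number of informative reads compared with tracking any single mutation. The function name and read counts are hypothetical.

```python
from typing import Dict, Tuple


def pooled_allele_fraction(reporters: Dict[str, Tuple[int, int]]) -> float:
    """Pool (alt reads, total depth) over a patient's tumor-specific
    mutations and return total alt reads / total depth."""
    alt = sum(a for a, _ in reporters.values())
    depth = sum(d for _, d in reporters.values())
    if depth == 0:
        raise ValueError("no sequencing depth at the reporter positions")
    return alt / depth


# Hypothetical cfDNA read counts at three mutations found in the tumor.
reporters = {"mut1": (5, 1000), "mut2": (3, 1000), "mut3": (2, 1000)}
print(round(pooled_allele_fraction(reporters), 5))  # 0.00333
```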

The design of an NSCLC CAPP-Seq selector is shown in FIG. 1(b). Phase 1: Genomic regions harboring known/suspected driver mutations in NSCLC. Phases 2-4: Addition of exons containing recurrent SNVs, using WES data from lung adenocarcinomas and squamous cell carcinomas from TCGA (N=407). Regions were selected iteratively to maximize the number of mutations per tumor while minimizing selector size. Recurrence index = total unique patients with mutations covered per kb of exon. Phases 5-6: Exons of predicted NSCLC drivers (Ding et al. (2008) Nature 455:1069-1075; Youn & Simon (2011) Bioinformatics 27:175-181) and introns/exons harboring breakpoints in rearrangements involving ALK, ROS1, and RET were added. Bottom: increase of selector length during each design phase. FIG. 1(c) shows an analysis of the number of SNVs per lung adenocarcinoma covered by the NSCLC CAPP-Seq selector in the TCGA WES cohort (Training; N=229) and in an independent lung adenocarcinoma WES data set (Validation; N=183) (Imielinski et al. (2012) Cell 150:1107-1120). Results are compared to selectors randomly sampled from the exome (P<1.0×10⁻⁶ for the difference between random selectors and the NSCLC CAPP-Seq selector). FIG. 1(d) shows the number of SNVs per patient identified by the NSCLC CAPP-Seq selector in WES data from three TCGA adenocarcinoma cohorts: colon (COAD), rectal (READ), and endometrioid (UCEC) cancers. FIGS. 1(e) and 1(f) show quality parameters from a representative CAPP-Seq analysis of plasma cfDNA, including the length distribution of sequenced cfDNA fragments (FIG. 1(e)) and the depth of sequencing coverage across all genomic regions in the selector (FIG. 1(f)). FIG. 1(g) illustrates the variation in sequencing depth across cfDNA samples from 4 patients; the envelope above and below the solid line represents the s.e.m. FIG. 2 illustrates the CAPP-Seq computational pipeline. See the Detailed Methods section for details.
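The comparison against randomly sampled selectors in FIG. 1(c) can be framed as an empirical (Monte Carlo) test. The sketch below is a generic version, not the authors' analysis. The score function, the toy exome, and the function name are hypothetical, and the text does not state how its P value was computed. With the add-one estimator used here, the smallest attainable P is 1/(n_random + 1). Supporting P<1.0×10⁻⁶ this way would therefore require on the order of a million random selectors.

```python
import random
from typing import Callable, List, Sequence


def empirical_p_value(
    observed: float,
    exome_regions: Sequence[str],
    selector_size: int,
    score: Callable[[List[str]], float],
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """One-sided empirical P value: how often a random selector with the same
    number of regions, drawn from the exome, scores at least `observed`."""
    rng = random.Random(seed)
    pool = list(exome_regions)
    hits = sum(
        score(rng.sample(pool, selector_size)) >= observed
        for _ in range(n_random)
    )
    # The +1 terms count the observed selector itself, so P is never 0.
    return (hits + 1) / (n_random + 1)


# Toy exome of 100 regions; the score counts "hot" regions in a selector.
EXOME = [f"r{i}" for i in range(100)]
HOT = {"r0", "r1", "r2", "r3", "r4"}


def hot_count(regions: List[str]) -> float:
    return float(len(HOT & set(regions)))


designed = ["r0", "r1", "r2", "r3", "r4"]
p = empirical_p_value(hot_count(designed), EXOME, len(designed), hot_count)
print(p)
```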

For the initial implementation of CAPP-Seq we focused on NSCLC, although our approach is generalizable to any cancer for which a comprehensive list of recurrent mutations has been identified. We employed a multi-phase approach to design an NSCLC-specific selector, aiming to identify genomic regions recurrently mutated in this disease (FIG. 1b, Table 1, and Methods). We began by including exons covering recurrent mutations in potential driver genes (e.g., KRAS, EGFR, TP53) from the Catalogue of Somatic Mutations in Cancer (COSMIC) database (Forbes et al. (2010) Nucleic Acids Res. 38:D652-657) as well as other sources (Ding et al. (2008) Nature 455:1069-1075; Youn & Simon (2011) Bioinformatics 27:175-181). Next, using whole exome sequencing (WES) data from 407 NSCLC patients profiled by The Cancer Genome Atlas (TCGA), we applied an iterative algorithm to maximize the number of mutations per patient while minimizing selector size. The approach relied on a recurrence index that identified known driver mutations as well as uncharacterized genes that are frequently mutated and therefore likely to be involved in NSCLC pathogenesis (FIG. 3 and Table 1).
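The recurrence index defined in the FIG. 1(b) description (unique patients with mutations per kb of exon) can be checked against Table 1. Dividing “No. patients per exon” by the region length in kb reproduces the RI column for the rows used below. The ranking and the `min_ri` filter are an illustrative assumption that only loosely mirrors the “RI ≧ 30” and “RI ≧ 20” phase labels. `Region`, `recurrence_index`, and `rank_by_ri` are hypothetical names.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    gene: str
    length_bp: int
    patients: int  # unique patients with mutations in the region


def recurrence_index(region: Region) -> float:
    """Unique patients with mutations in the region per kb of region."""
    return region.patients / (region.length_bp / 1000)


def rank_by_ri(regions: List[Region], min_ri: float = 0.0) -> List[Region]:
    """Keep regions with RI >= min_ri, highest RI first."""
    kept = [r for r in regions if recurrence_index(r) >= min_ri]
    return sorted(kept, key=recurrence_index, reverse=True)


# Length and "No. patients per exon" values taken from Table 1.
rows = [Region("AKT1", 130, 1), Region("KRAS", 112, 56), Region("TP53", 138, 56)]
for r in rows:
    print(r.gene, round(recurrence_index(r), 1))
# AKT1 7.7
# KRAS 500.0
# TP53 405.8
print([r.gene for r in rank_by_ri(rows, min_ri=30)])  # ['KRAS', 'TP53']
```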

TABLE 1. Recurrently mutated genomic regions in NSCLC.
Columns 1-4 describe the selector design; columns 5-9 describe the genomic region; columns 10-12 give coverage of unique LUAD and SCC patients (n = 407); RI is the recurrence index.

Design phase | Regions covered | Genes covered | Selector length (bp) | Gene | Chr | Start (bp) | End (bp) | Length (bp) | Patients covered | Patients gained | No. patients per exon | RI
Known drivers | 1 | 1 | 130 | AKT1 | chr14 | 105246424 | 105246553 | 130 | 1 | 1 | 1 | 7.7
Known drivers | 2 | 2 | 250 | BRAF | chr7 | 140453074 | 140453192 | 120 | 9 | 8 | 8 | 66.7
Known drivers | 3 | 2 | 369 | BRAF | chr7 | 140481375 | 140481493 | 119 | 16 | 7 | 7 | 58.8
Known drivers | 4 | 3 | 677 | CDKN2A | chr9 | 21970900 | 21971207 | 308 | 46 | 30 | 30 | 97.4
Known drivers | 5 | 3 | 1029 | CDKN2A | chr9 | 21974475 | 21974826 | 352 | 53 | 7 | 7 | 19.9
Known drivers | 6 | 4 | 1258 | CTNNB1 | chr3 | 41266016 | 41266244 | 229 | 57 | 4 | 6 | 26.2
Known drivers | 7 | 5 | 1382 | EGFR | chr7 | 55241613 | 55241736 | 124 | 58 | 1 | 3 | 24.2
Known drivers | 8 | 5 | 1482 | EGFR | chr7 | 55242414 | 55242513 | 100 | 65 | 7 | 8 | 80.0
Known drivers | 9 | 5 | 1669 | EGFR | chr7 | 55248985 | 55249171 | 187 | 69 | 4 | 5 | 26.7
Known drivers | 10 | 5 | 1826 | EGFR | chr7 | 55259411 | 55259567 | 157 | 81 | 12 | 14 | 89.2
Known drivers | 11 | 6 | 1926 | ERBB2 | chr17 | 37880164 | 37880263 | 100 | 81 | 0 | 0 | 0.0
Known drivers | 12 | 6 | 2113 | ERBB2 | chr17 | 37880978 | 37881164 | 187 | 85 | 4 | 4 | 21.4
Known drivers | 13 | 7 | 2293 | HRAS | chr11 | 533765 | 533944 | 180 | 87 | 2 | 3 | 16.7
Known drivers | 14 | 7 | 2405 | HRAS | chr11 | 534211 | 534322 | 112 | 90 | 3 | 3 | 26.8
Known drivers | 15 | 8 | 2583 | KEAP1 | chr19 | 10599867 | 10600044 | 178 | 93 | 3 | 3 | 16.9
Known drivers | 16 | 8 | 2790 | KEAP1 | chr19 | 10600323 | 10600529 | 207 | 108 | 15 | 15 | 72.5
Known drivers | 17 | 8 | 3477 | KEAP1 | chr19 | 10602252 | 10602938 | 687 | 128 | 20 | 25 | 36.4
Known drivers | 18 | 8 | 4117 | KEAP1 | chr19 | 10610070 | 10610709 | 640 | 141 | 13 | 18 | 28.1
Known drivers | 19 | 8 | 4285 | KEAP1 | chr19 | 10597327 | 10597494 | 168 | 143 | 2 | 2 | 11.9
Known drivers | 20 | 9 | 4465 | KRAS | chr12 | 25380167 | 25380346 | 180 | 147 | 4 | 4 | 22.2
Known drivers | 21 | 9 | 4577 | KRAS | chr12 | 25398207 | 25398318 | 112 | 191 | 44 | 56 | 500.0
Known drivers | 22 | 10 | 4789 | MEK1 | chr15 | 66727364 | 66727575 | 212 | 191 | 0 | 0 | 0.0
Known drivers | 23 | 11 | 4931 | MET | chr7 | 116411902 | 116412043 | 142 | 193 | 2 | 2 | 14.1
Known drivers | 24 | 12 | 5199 | NFE2L2 | chr2 | 178098732 | 178098998 | 268 | 212 | 19 | 31 | 115.7
Known drivers | 25 | 13 | 5417 | NOTCH1 | chr9 | 139396723 | 139396940 | 218 | 212 | 0 | 1 | 4.6
Known drivers | 26 | 13 | 5850 | NOTCH1 | chr9 | 139399124 | 139399556 | 433 | 212 | 0 | 0 | 0.0
Known drivers | 27 | 13 | 7339 | NOTCH1 | chr9 | 139390522 | 139392010 | 1489 | 214 | 2 | 3 | 2.0
Known drivers | 28 | 13 | 7489 | NOTCH1 | chr9 | 139397633 | 139397782 | 150 | 214 | 0 | 0 | 0.0
Known drivers | 29 | 14 | 7669 | NRAS | chr1 | 115256420 | 115256599 | 180 | 217 | 3 | 5 | 27.8
Known drivers | 30 | 14 | 7781 | NRAS | chr1 | 115258670 | 115258781 | 112 | 217 | 0 | 0 | 0.0
Known drivers | 31 | 15 | 7907 | PIK3CA | chr3 | 178935997 | 178936122 | 126 | 225 | 8 | 19 | 150.8
Known drivers | 32 | 15 | 8179 | PIK3CA | chr3 | 178951881 | 178952152 | 272 | 228 | 3 | 4 | 14.7
Known drivers | 33 | 16 | 8259 | PTEN | chr10 | 89624226 | 89624305 | 80 | 229 | 1 | 1 | 12.5
Known drivers | 34 | 16 | 8345 | PTEN | chr10 | 89653781 | 89653866 | 86 | 229 | 0 | 0 | 0.0
Known drivers | 35 | 16 | 8391 | PTEN | chr10 | 89685269 | 89685314 | 46 | 231 | 2 | 3 | 65.2
Known drivers | 36 | 16 | 8436 | PTEN | chr10 | 89690802 | 89690846 | 45 | 231 | 0 | 0 | 0.0
Known drivers | 37 | 16 | 8676 | PTEN | chr10 | 89692769 | 89693008 | 240 | 234 | 3 | 5 | 20.8
Known drivers | 38 | 16 | 8819 | PTEN | chr10 | 89711874 | 89712016 | 143 | 235 | 1 | 3 | 21.0
Known drivers | 39 | 16 | 8987 | PTEN | chr10 | 89717609 | 89717776 | 168 | 238 | 3 | 6 | 35.7
Known drivers | 40 | 16 | 9213 | PTEN | chr10 | 89720650 | 89720875 | 226 | 239 | 1 | 3 | 13.3
Known drivers | 41 | 17 | 9504 | STK11 | chr19 | 1206912 | 1207202 | 291 | 240 | 1 | 4 | 13.7
Known drivers | 42 | 17 | 9589 | STK11 | chr19 | 1218415 | 1218498 | 85 | 241 | 1 | 2 | 23.5
Known drivers | 43 | 17 | 9680 | STK11 | chr19 | 1219322 | 1219412 | 91 | 242 | 1 | 1 | 11.0
Known drivers | 44 | 17 | 9814 | STK11 | chr19 | 1220371 | 1220504 | 134 | 242 | 0 | 4 | 29.9
Known drivers | 45 | 17 | 9952 | STK11 | chr19 | 1220579 | 1220716 | 138 | 242 | 0 | 4 | 29.0
Known drivers | 46 | 17 | 10081 | STK11 | chr19 | 1221211 | 1221339 | 129 | 242 | 0 | 4 | 31.0
Known drivers | 47 | 17 | 10140 | STK11 | chr19 | 1221947 | 1222005 | 59 | 242 | 0 | 0 | 0.0
Known drivers | 48 | 17 | 10329 | STK11 | chr19 | 1222983 | 1223171 | 189 | 242 | 0 | 0 | 0.0
Known drivers | 49 | 17 | 10524 | STK11 | chr19 | 1226452 | 1226646 | 195 | 242 | 0 | 0 | 0.0
Known drivers | 50 | 18 | 10662 | TP53 | chr17 | 7577018 | 7577155 | 138 | 264 | 22 | 56 | 405.8
Known drivers | 51 | 18 | 10773 | TP53 | chr17 | 7577498 | 7577608 | 111 | 286 | 22 | 50 | 450.5
Known drivers | 52 | 18 | 10887 | TP53 | chr17 | 7578176 | 7578286 | 114 | 300 | 14 | 39 | 342.1
Known drivers | 53 | 18 | 11167 | TP53 | chr17 | 7579311 | 7579590 | 280 | 312 | 12 | 31 | 110.7
Known drivers | 54 | 18 | 11352 | TP53 | chr17 | 7578370 | 7578554 | 185 | 340 | 28 | 68 | 367.6
Max coverage | 55 | 19 | 11472 | REG1B | chr2 | 79313937 | 79314056 | 120 | 341 | 1 | 10 | 83.3
Max coverage | 56 | 20 | 11527 | TPTE | chr21 | 10970008 | 10970062 | 55 | 343 | 2 | 4 | 72.7
Max coverage | 57 | 21 | 11641 | CSMD3 | chr8 | 113246593 | 113246706 | 114 | 345 | 2 | 8 | 70.2
Max coverage | 58 | 21 | 11749 | TP53 | chr17 | 7573926 | 7574033 | 108 | 348 | 3 | 9 | 83.3
Max coverage | 59 | 22 | 11861 | FAM135B | chr8 | 139151228 | 139151339 | 112 | 350 | 2 | 8 | 71.4
Max coverage | 60 | 23 | 11950 | U2AF1 | chr21 | 44524424 | 44524512 | 89 | 351 | 1 | 5 | 56.2
Max coverage | 61 | 24 | 12084 | THSD7A | chr7 | 11501637 | 11501770 | 134 | 352 | 1 | 9 | 67.2
Max coverage | 62 | 25 | 12257 | MLL3 | chr7 | 151962122 | 151962294 | 173 | 353 | 1 | 11 | 63.6
Max coverage | 63 | 26 | 12339 | EYA4 | chr6 | 133849862 | 133849943 | 82 | 354 | 1 | 5 | 61.0
Max coverage | 64 | 27 | 12505 | HCN1 | chr5 | 45267190 | 45267355 | 166 | 355 | 1 | 9 | 54.2
Max coverage | 65 | 28 | 12590 | AKR1B10 | chr7 | 134222945 | 134223029 | 85 | 357 | 2 | 5 | 58.8
Max coverage | 66 | 29 | 12692 | SLC6A5 | chr11 | 20668379 | 20668480 | 102 | 358 | 1 | 5 | 49.0
Max coverage | 67 | 30 | 12801 | DPP10 | chr2 | 116525872 | 116525980 | 109 | 360 | 2 | 6 | 55.0
Max coverage | 68 | 31 | 12894 | SCN7A | chr2 | 167327124 | 167327216 | 93 | 361 | 1 | 4 | 43.0
Max coverage | 69 | 32 | 12988 | SNTG1 | chr8 | 51621445 | 51621538 | 94 | 362 | 1 | 5 | 53.2
Max coverage | 70 | 33 | 13093 | VPS13A | chr9 | 79946925 | 79947029 | 105 | 363 | 1 | 5 | 47.6
Max coverage | 71 | 34 | 13240 | IL1RAPL1 | chrX | 29938065 | 29938211 | 147 | 364 | 1 | 7 | 47.6
Max coverage | 72 | 35 | 13408 | CTNNA2 | chr2 | 80085138 | 80085305 | 168 | 365 | 1 | 8 | 47.6
Max coverage | 73 | 35 | 13598 | CSMD3 | chr8 | 113323206 | 113323395 | 190 | 366 | 1 | 9 | 47.4
Max coverage | 74 | 36 | 13705 | FAM5C | chr1 | 190203501 | 190203607 | 107 | 367 | 1 | 5 | 46.7
Max coverage | 75 | 37 | 13813 | CACNA1E | chr1 | 181708282 | 181708389 | 108 | 368 | 1 | 4 | 37.0
Max coverage | 76 | 38 | 14528 | KRTAP5-5 | chr11 | 1651070 | 1651784 | 715 | 371 | 3 | 31 | 43.4
Max coverage | 77 | 39 | 14650 | PDE1C | chr7 | 31864480 | 31864601 | 122 | 372 | 1 | 5 | 41.0
Max coverage | 78 | 40 | 14772 | RYR2 | chr1 | 237808626 | 237808747 | 122 | 373 | 1 | 5 | 41.0
Max coverage | 79 | 41 | 14896 | NRXN1 | chr2 | 50733632 | 50733755 | 124 | 374 | 1 | 5 | 40.3
Max coverage | 80 | 42 | 15021 | COL19A1 | chr6 | 70637800 | 70637924 | 125 | 375 | 1 | 5 | 40.0
Max coverage | 81 | 42 | 15349 | CSMD3 | chr8 | 113697634 | 113697961 | 328 | 376 | 1 | 13 | 39.6
Max coverage | 82 | 43 | 15551 | LRP1B | chr2 | 141665445 | 141665646 | 202 | 377 | 1 | 7 | 34.7
Max coverage | 83 | 44 | 15709 | GKN2 | chr2 | 69173435 | 69173592 | 158 | 378 | 1 | 6 | 38.0
Max coverage | 84 | 45 | 16031 | CD5L | chr1 | 157805624 | 157805945 | 322 | 379 | 1 | 12 | 37.3
Max coverage | 85 | 46 | 16250 | SPTA1 | chr1 | 158627266 | 158627484 | 219 | 380 | 1 | 8 | 36.5
Max coverage | 86 | 47 | 16392 | DHX9 | chr1 | 182812428 | 182812569 | 142 | 381 | 1 | 5 | 35.2
Max coverage | 87 | 48 | 16535 | ADAMTS20 | chr12 | 43858393 | 43858535 | 143 | 382 | 1 | 5 | 35.0
Max coverage | 88 | 49 | 16707 | NLRP4 | chr19 | 56382192 | 56382363 | 172 | 382 | 0 | 6 | 34.9
Max coverage | 89 | 50 | 17199 | CDH18 | chr5 | 19473334 | 19473825 | 492 | 384 | 2 | 17 | 34.6
Max coverage | 90 | 51 | 17344 | MYH2 | chr17 | 10450791 | 10450935 | 145 | 386 | 2 | 5 | 34.5
RI ≧ 30 | 91 | 52 | 18281 | OR5L2 | chr11 | 55594694 | 55595630 | 937 | 386 | 0 | 30 | 32.0
RI ≧ 30 | 92 | 53 | 19317 | OR4A15 | chr11 | 55135359 | 55136394 | 1036 | 386 | 0 | 32 | 30.9
RI ≧ 30 | 93 | 54 | 20245 | OR6F1 | chr1 | 247875130 | 247876057 | 928 | 386 | 0 | 26 | 28.0
RI ≧ 30 | 94 | 55 | 21176 | OR4C6 | chr11 | 55432642 | 55433572 | 931 | 387 | 1 | 27 | 29.0
RI ≧ 30 | 95 | 56 | 22224 | OR2T4 | chr1 | 248524882 | 248525929 | 1048 | 387 | 0 | 33 | 31.5
RI ≧ 30 | 96 | 56 | 23342 | FAM5C | chr1 | 190067147 | 190068264 | 1118 | 387 | 0 | 35 | 31.3
RI ≧ 30 | 97 | 57 | 23598 | PSG2 | chr19 | 43575851 | 43576106 | 256 | 387 | 0 | 9 | 35.2
RI ≧ 30 | 98 | 58 | 23797 | ITM2A | chrX | 78618438 | 78618636 | 199 | 387 | 0 | 6 | 30.2
RI ≧ 30 | 99 | 59 | 24062 | TNN | chr1 | 175092535 | 175092799 | 265 | 387 | 0 | 12 | 45.3
RI ≧ 30 | 100 | 60 | 24206 | GATA3 | chr10 | 8105958 | 8106101 | 144 | 387 | 0 | 3 | 20.8
RI ≧ 30 | 101 | 60 | 24369 | HCN1 | chr5 | 45461947 | 45462109 | 183 | 387 | 0 | 5 | 30.7
RI ≧ 30 | 102 | 61 | 24503 | OCA2 | chr15 | 28211835 | 28211968 | 134 | 387 | 0 | 6 | 44.8
RI ≧ 30 | 103 | 61 | 24686 | CTNNA2 | chr2 | 80816428 | 80816610 | 183 | 387 | 0 | 5 | 27.3
RI ≧ 30 | 104 | 62 | 24863 | CNTN5 | chr11 | 99715818 | 99715994 | 177 | 387 | 0 | 5 | 33.9
RI ≧ 30 | 105 | 63 | 25755 | POM121L12 | chr7 | 53103364 | 53104255 | 892 | 387 | 0 | 28 | 31.4
RI ≧ 30 | 106 | 64 | 25945 | LRRC7 | chr1 | 70225887 | 70226076 | 190 | 387 | 0 | 5 | 26.3
RI ≧ 30 | 107 | 65 | 26165 | CNTNAP5 | chr2 | 125530375 | 125530594 | 220 | 387 | 0 | 8 | 36.4
RI ≧ 30 | 108 | 66 | 26313 | SLC4A10 | chr2 | 162751188 | 162751335 | 148 | 387 | 0 | 5 | 33.8
RI ≧ 30 | 109 | 67 | 26412 | SETD2 | chr3 | 47142947 | 47143045 | 99 | 387 | 0 | 3 | 30.3
RI ≧ 30 | 110 | 68 | 26744 | GFRAL | chr6 | 55216050 | 55216381 | 332 | 387 | 0 | 10 | 30.1
RI ≧ 30 | 111 | 69 | 26837 | SORCS3 | chr10 | 106927015 | 106927107 | 93 | 388 | 1 | 3 | 32.3
RI ≧ 30 | 112 | 70 | 27359 | POTEG | chr14 | 19553416 | 19553937 | 522 | 388 | 0 | 17 | 32.6
RI ≧ 30 | 113 | 71 | 27489 | F9 | chrX | 138630521 | 138630650 | 130 | 389 | 1 | 4 | 30.8
RI ≧ 30 | 114 | 72 | 27583 | SLC26A3 | chr7 | 107416896 | 107416989 | 94 | 389 | 0 | 2 | 21.3
RI ≧ 30 | 115 | 73 | 27753 | UNC5D | chr8 | 35806044 | 35806213 | 170 | 389 | 0 | 5 | 29.4
RI ≧ 30 | 116 | 74 | 27860 | PDE4DIP | chr1 | 144882775 | 144882881 | 107 | 389 | 0 | 4 | 37.4
RI ≧ 30 | 117 | 75 | 27943 | MRPL1 | chr4 | 78870950 | 78871032 | 83 | 389 | 0 | 4 | 48.2
RI ≧ 30 | 118 | 76 | 28013 | COL25A1 | chr4 | 109784474 | 109784543 | 70 | 389 | 0 | 3 | 42.9
RI ≧ 30 | 119 | 76 | 28161 | SPTA1 | chr1 | 158650372 | 158650519 | 148 | 389 | 0 | 5 | 33.8
RI ≧ 30 | 120 | 77 | 28309 | TNR | chr1 | 175331798 | 175331945 | 148 | 389 | 0 | 5 | 33.8
RI ≧ 30 | 121 | 78 | 28491 | GALNT13 | chr2 | 155157921 | 155158102 | 182 | 389 | 0 | 6 | 33.0
RI ≧ 30 | 122 | 79 | 28618 | EIF3E | chr8 | 109241298 | 109241424 | 127 | 389 | 0 | 5 | 39.4
RI ≧ 30 | 123 | 80 | 28691 | SLC5A1 | chr22 | 32445929 | 32446001 | 73 | 389 | 0 | 4 | 54.8
RI ≧ 30 | 124 | 81 | 28757 | COASY | chr17 | 40717000 | 40717065 | 66 | 389 | 0 | 3 | 45.5
RI ≧ 30 | 125 | 82 | 28930 | TBX15 | chr1 | 119467268 | 119467440 | 173 | 389 | 0 | 7 | 40.5
RI ≧ 30 | 126 | 83 | 29099 | PYHIN1 | chr1 | 158908869 | 158909037 | 169 | 389 | 0 | 6 | 35.5
RI ≧ 30 | 127 | 84 | 29164 | PSG5 | chr19 | 43690493 | 43690557 | 65 | 389 | 0 | 3 | 46.2
RI ≧ 30 | 128 | 85 | 29262 | BTRC | chr10 | 103290993 | 103291090 | 98 | 389 | 0 | 2 | 20.4
RI ≧ 30 | 129 | 86 | 29394 | MDGA2 | chr14 | 47324226 | 47324357 | 132 | 389 | 0 | 4 | 30.3
RI ≧ 30 | 130 | 87 | 29454 | GUCY1A3 | chr4 | 156629387 | 156629446 | 60 | 389 | 0 | 2 | 33.3
RI ≧ 30 | 131 | 88 | 29570 | HGF | chr7 | 81386504 | 81386619 | 116 | 389 | 0 | 4 | 34.5
RI ≧ 30 | 132 | 89 | 29656 | TIMD4 | chr5 | 156346467 | 156346552 | 86 | 389 | 0 | 3 | 34.9
RI ≧ 30 | 133 | 90 | 29844 | AK5 | chr1 | 77752625 | 77752812 | 188 | 389 | 0 | 6 | 31.9
RI ≧ 30 | 134 | 91 | 30077 | ODZ3 | chr4 | 183245173 | 183245405 | 233 | 389 | 0 | 7 | 30.0
RI ≧ 30 | 135 | 92 | 30177 | COL5A2 | chr2 | 189927897 | 189927996 | 100 | 389 | 0 | 3 | 30.0
RI ≧ 30 | 136 | 93 | 30299 | NTM | chr11 | 132180005 | 132180126 | 122 | 389 | 0 | 4 | 32.8
RI ≧ 30 | 137 | 94 | 30426 | LTBP1 | chr2 | 33500031 | 33500157 | 127 | 389 | 0 | 5 | 39.4
RI ≧ 30 | 138 | 95 | 30587 | PRSS1 | chr7 | 142458405 | 142458565 | 161 | 389 | 0 | 5 | 31.1
RI ≧ 30 | 139 | 95 | 30794 | CDKN2A | chr9 | 21971001 | 21971207 | 207 | 389 | 0 | 26 | 125.6
RI ≧ 30 | 140 | 96 | 30922 | CNGB3 | chr8 | 87738758 | 87738885 | 128 | 389 | 0 | 4 | 31.3
RI ≧ 30 | 141 | 97 | 31049 | SI | chr3 | 164777689 | 164777815 | 127 | 389 | 0 | 4 | 31.5
RI ≧ 30 | 142 | 97 | 31135 | SI | chr3 | 164767578 | 164767663 | 86 | 389 | 0 | 4 | 46.5
RI ≧ 30 | 143 | 98 | 31320 | TMEM132D | chr12 | 129822176 | 129822362 | 185 | 389 | 0 | 6 | 32.4
RI ≧ 30 | 144 | 99 | 31429 | ASTN1 | chr1 | 176998769 | 176998877 | 109 | 389 | 0 | 3 | 27.5
RI ≧ 30 | 145 | 100 | 31571 | SAGE1 | chrX | 134987410 | 134987551 | 142 | 389 | 0 | 6 | 42.3
RI ≧ 30 | 146 | 100 | 31709 | THSD7A | chr7 | 11464322 | 11464459 | 138 | 389 | 0 | 5 | 36.2
RI ≧ 30 | 147 | 101 | 31907 | ADAMTS12 | chr5 | 33683963 | 33684160 | 198 | 389 | 0 | 6 | 30.3
RI ≧ 30 | 148 | 101 | 32090 | NRXN1 | chr2 | 50463926 | 50464108 | 183 | 389 | 0 | 8 | 43.7
RI ≧ 30 | 149 | 101 | 32294 | CSMD3 | chr8 | 113562899 | 113563102 | 204 | 389 | 0 | 7 | 34.3
RI ≧ 30 | 150 | 101 | 32414 | CSMD3 | chr8 | 113364644 | 113364763 | 120 | 389 | 0 | 5 | 41.7
RI ≧ 30 | 151 | 102 | 32504 | EPB41L4B | chr9 | 112018415 | 112018504 | 90 | 389 | 0 | 2 | 22.2
RI ≧ 30 | 152 | 103 | 32687 | POLR3B | chr12 | 106820974 | 106821136 | 163 | 389 | 0 | 4 | 24.5
RI ≧ 30 | 153 | 104 | 32873 | ATP10B | chr5 | 160097469 | 180097674 | 208 | 389 | 0 | 7 | 34.0
RI ≧ 30 | 154 | 105 | 33001 | CSMD1 | chr8 | 3165216 | 3165343 | 128 | 389 | 0 | 4 | 31.3
RI ≧ 30 | 155 | 106 | 33164 | FBN2 | chr5 | 127648325 | 127648487 | 163 | 389 | 0 | 5 | 30.7
RI ≧ 30 | 156 | 107 | 33252 | EXOC5 | chr14 | 57684699 | 57684786 | 88 | 389 | 0 | 2 | 22.7
RI ≧ 30 | 157 | 108 | 33315 | ANKRD30A | chr10 | 37440987 | 37441049 | 63 | 389 | 0 | 3 | 47.6
RI ≧ 30 | 158 | 109 | 33414 | TRIML1 | chr4 | 189065189 | 189065287 | 99 | 389 | 0 | 4 | 40.4
RI ≧ 30 | 159 | 109 | 33538 | SPTA1 | chr1 | 158631076 | 158631199 | 124 | 389 | 0 | 4 | 32.3
RI ≧ 30 | 160 | 110 | 33699 | POLDIP2 | chr17 | 26684313 | 26684473 | 161 | 389 | 0 | 5 | 31.1
RI ≧ 30 | 161 | 111 | 33863 | KLHL1 | chr13 | 70314525 | 70314688 | 164 | 389 | 0 | 5 | 30.5
RI ≧ 20 | 162 | 112 | 34454 | TRIM58 | chr1 | 248039201 | 248039791 | 591 | 389 | 0 | 14 | 23.7
RI ≧ 20 | 163 | 113 | 34563 | GRIA3 | chrX | 122537262 | 122537370 | 109 | 389 | 0 | 3 | 27.5
RI ≧ 20 | 164 | 114 | 34777 | CNOT4 | chr7 | 135048605 | 135048818 | 214 | 389 | 0 | 5 | 23.4
RI ≧ 20 | 165 | 115 | 34947 | NAV3 | chr12 | 78582388 | 78582557 | 170 | 389 | 0 | 4 | 23.5
RI ≧ 20 | 166 | 115 | 35975 | NAV3 | chr12 | 78400198 | 78401225 | 1028 | 389 | 0 | 22 | 21.4
RI ≧ 20 | 167 | 116 | 36354 | TRPC5 | chrX | 111195270 | 111195648 | 379 | 389 | 0 | 8 |
21.1 RI ≧ 20 168 117 36480 LRRC2 chr3 46592956 46593081 126 389 0 3 23.8 RI ≧ 20 169 118 36726 ADAMTS16 chr5 5239793 5240038 246 389 0 6 24.4 RI ≧ 20 170 119 36869 ACER2 chr9 19424697 19424839 143 389 0 3 21.0 RI ≧ 20 171 120 37103 AMOT chrX 112024113 112024346 234 389 0 5 21.4 RI ≧ 20 172 121 37215 OBP2A chr9 138439716 138439827 112 389 0 3 26.8 Predicted drivers 173 122 38109 INHBA chr7 41729247 41730140 894 389 0 17 19.0 Predicted drivers 174 122 38498 INHBA chr7 41739584 41739972 389 389 0 3 7.7 Predicted drivers 175 123 38605 EPHA5 chr4 66189831 66189937 107 389 0 3 28.0 Predicted drivers 176 123 38762 EPHA5 chr4 66197690 66197846 157 389 0 2 12.7 Predicted drivers 177 123 38957 EPHA5 chr4 66201649 66201843 195 389 0 2 10.3 Predicted drivers 178 123 39108 EPHA5 chr4 66213771 66213921 151 389 0 3 19.9 Predicted drivers 179 123 39319 EPHA5 chr4 66217106 66217316 211 389 0 4 19.0 Predicted drivers 180 123 39420 EPHA5 chr4 66218740 66218840 101 389 0 2 19.8 Predicted drivers 181 123 39607 EPHA5 chr4 66230734 66230920 187 389 0 3 16.0 Predicted drivers 182 123 39734 EPHA5 chr4 66231649 66231775 127 389 0 3 23.6 Predicted drivers 183 123 39835 EPHA5 chr4 66233058 66233158 101 389 0 2 19.8 Predicted drivers 184 123 39936 EPHA5 chr4 66242698 66242798 101 389 0 0 0.0 Predicted drivers 185 123 40040 EPHA5 chr4 66270091 66270194 104 389 0 2 19.2 Predicted drivers 186 123 40201 EPHA5 chr4 66280001 66280161 161 389 0 1 6.2 Predicted drivers 187 123 40327 EPHA5 chr4 66286158 66286283 126 389 0 0 0.0 Predicted drivers 188 123 40664 EPHA5 chr4 66356094 66356430 337 389 0 5 14.8 Predicted drivers 189 123 40821 EPHA5 chr4 66361105 66361261 157 389 0 1 6.4 Predicted drivers 190 123 41486 EPHA5 chr4 66467358 86468022 665 389 0 6 9.0 Predicted drivers 191 123 41588 EPHA5 chr4 66509062 66509163 102 389 0 0 0.0 Predicted drivers 192 123 41770 EPHA5 chr4 66535279 66535460 182 389 0 1 5.5 Predicted drivers 193 124 41871 EPHA3 chr3 89156892 89156992 101 389 0 0 0.0 Predicted drivers 
194 124 41973 EPHA3 chr3 89176340 89176441 102 389 0 2 19.6 Predicted drivers 195 124 42635 EPHA3 chr3 89259009 89259670 662 389 0 6 9.1 Predicted drivers 196 124 42792 EPHA3 chr3 89390065 89390221 157 389 0 4 25.5 Predicted drivers 197 124 43129 EPHA3 chr3 89390904 89391240 337 389 0 3 8.9 Predicted drivers 198 124 43255 EPHA3 chr3 89444986 89445111 126 389 0 2 15.9 Predicted drivers 199 124 43445 EPHA3 chr3 89448467 89448656 190 389 0 1 5.3 Predicted drivers 200 124 43549 EPHA3 chr3 89456418 89456521 104 389 0 0 0.0 Predicted drivers 201 124 43651 EPHA3 chr3 89457198 89457299 102 389 0 0 0.0 Predicted drivers 202 124 43778 EPHA3 chr3 89462290 89462416 127 389 0 3 23.6 Predicted drivers 203 124 43965 EPHA3 chr3 89468354 89468540 187 389 0 1 5.3 Predicted drivers 204 124 44066 EPHA3 chr3 89478236 89478336 101 389 0 0 0.0 Predicted drivers 205 124 44277 EPHA3 chr3 89480299 89480509 211 389 0 4 19.0 Predicted drivers 206 124 44428 EPHA3 chr3 89498374 89498524 151 389 0 1 6.6 Predicted drivers 207 124 44623 EPHA3 chr3 89499326 89499520 185 389 0 2 10.3 Predicted drivers 208 124 44780 EPHA3 chr3 89521613 89521769 157 389 0 3 19.1 Predicted drivers 209 124 44887 EPHA3 chr3 89528546 89528652 107 389 0 1 9.3 Predicted drivers 210 125 44989 PTPRD chr9 8317857 8317958 102 389 0 2 19.6 Predicted drivers 211 125 45126 PTPRD chr9 8319830 8319966 137 389 0 0 0.0 Predicted drivers 212 125 45282 PTPRD chr9 8331581 8331736 156 389 0 1 6.4 Predicted drivers 213 125 45409 PTPRD chr9 8338921 8339047 127 389 0 2 15.7 Predicted drivers 214 125 45537 PTPRD chr9 8340342 8340469 128 389 0 1 7.8 Predicted drivers 215 125 45717 PTPRD chr9 8341089 8341268 180 389 0 0 0.0 Predicted drivers 216 125 46004 PTPRD chr9 8341692 8341978 287 389 0 2 7.0 Predicted drivers 217 125 46160 PTPRD chr9 8375935 8376090 156 389 0 1 6.4 Predicted drivers 218 125 46281 PTPRD chr9 8376606 8376726 121 389 0 1 8.3 Predicted drivers 219 125 46458 PTPRD chr9 8389231 8389407 177 389 0 0 0.0 Predicted drivers 220 125 
46583 PTPRD chr9 8404536 8404660 125 389 0 0 0.0 Predicted drivers 221 125 46684 PTPRD chr9 8436590 8436690 101 389 0 1 9.9 Predicted drivers 222 125 46785 PTPRD chr9 8437168 8437268 101 389 0 0 0.0 Predicted drivers 223 125 46899 PTPRD chr9 8449724 8449837 114 389 0 3 26.3 Predicted drivers 224 125 47001 PTPRD chr9 8454536 8454637 102 389 0 0 0.0 Predicted drivers 225 125 47163 PTPRD chr9 8460410 8460571 162 389 0 5 18.5 Predicted drivers 226 125 47374 PTPRD chr9 8465465 8465675 211 389 0 6 28.4 Predicted drivers 227 125 47476 PTPRD chr9 8470989 8471090 102 389 0 1 9.8 Predicted drivers 228 125 47737 PTPRD chr9 8484118 8484378 261 389 0 5 19.2 Predicted drivers 229 125 47839 PTPRD chr9 8485226 8485327 102 389 0 0 0.0 Predicted drivers 230 125 48428 PTPRD chr9 8485761 8436349 589 389 0 4 6.8 Predicted drivers 231 125 48547 PTPRD chr9 8492861 8492979 119 389 0 1 8.4 Predicted drivers 232 125 48649 PTPRD chr9 8497204 8497305 102 389 0 1 9.8 Predicted drivers 233 125 48844 PTPRD chr9 8499646 8499840 195 389 0 2 10.3 Predicted drivers 234 125 49151 PTPRD chr9 8500753 8501059 307 389 0 3 9.8 Predicted drivers 235 125 49297 PTPRD chr9 8504260 8504405 146 389 0 1 6.8 Predicted drivers 236 125 49432 PTPRD chr9 8507300 8507434 135 389 0 1 7.4 Predicted drivers 237 125 50015 PTPRD chr9 8517847 8518429 583 389 0 9 15.4 Predicted drivers 238 125 50286 PTPRD chr9 8521276 8521546 271 389 0 5 18.5 Predicted drivers 239 125 50387 PTPRD chr9 8523468 8523568 101 389 0 1 9.9 Predicted drivers 240 125 50499 PTPRD chr9 8524924 8525035 112 389 0 1 8.9 Predicted drivers 241 125 50600 PTPRD chr9 8526585 8526685 101 389 0 0 0.0 Predicted drivers 242 125 50702 PTPRD chr9 8527298 8527399 102 389 0 2 19.6 Predicted drivers 243 125 50892 PTPRD chr9 8528590 8528779 190 389 0 4 21.1 Predicted drivers 244 125 51035 PTPRD chr9 8633316 8633458 143 389 0 2 13.6 Predicted drivers 245 125 51182 PTPRD chr9 8636698 8636644 147 389 0 2 13.6 Predicted drivers 246 125 51283 PTPRD chr9 8733761 8733861 101 
389 0 0 0.0 Predicted drivers 247 126 51507 KDR chr4 55946107 55946330 224 389 0 1 4.5 Predicted drivers 248 126 51608 KDR chr4 55948115 55948215 101 389 0 0 0.0 Predicted drivers 249 126 51709 KDR chr4 55948702 55948802 101 389 0 2 19.8 Predicted drivers 250 126 51862 KDR chr4 55953773 55953925 153 389 0 3 19.6 Predicted drivers 251 126 51969 KDR chr4 55955034 55955140 107 389 0 2 18.7 Predicted drivers 252 126 52070 KDR chr4 55955540 55955640 101 389 0 0 0.0 Predicted drivers 253 126 52183 KDR chr4 55955857 55955969 113 389 0 1 8.8 Predicted drivers 254 126 52307 KDR chr4 55956122 55956245 124 389 0 0 0.0 Predicted drivers 255 126 52408 KDR chr4 55958782 55958882 101 389 0 2 19.8 Predicted drivers 256 128 52563 KDR chr4 55960968 55961122 155 389 0 2 12.9 Predicted drivers 257 126 52665 KDR chr4 55961737 55961838 102 389 0 2 19.6 Predicted drivers 258 126 52780 KDR chr4 55962395 55962509 115 389 0 1 8.7 Predicted drivers 259 126 52886 KDR chr4 55963828 55963933 106 389 0 3 28.3 Predicted drivers 260 126 53023 KDR chr4 55964303 55964439 137 389 0 0 0.0 Predicted drivers 261 126 53131 KDR chr4 55964863 55964970 108 389 0 2 18.5 Predicted drivers 262 126 53264 KDR chr4 55968063 55968195 133 389 0 1 7.5 Predicted drivers 263 126 53412 KDR chr4 55968528 55968675 148 389 0 2 13.5 Predicted drivers 264 126 53755 KDR chr4 55970809 55971151 343 389 0 5 14.6 Predicted drivers 265 126 53865 KDR chr4 55971998 55972107 110 389 0 2 18.2 Predicted drivers 266 126 53990 KDR chr4 55972853 55972977 125 389 0 1 8.0 Predicted drivers 267 126 54148 KDR chr4 55973903 55974060 158 389 0 2 12.7 Predicted drivers 268 126 54313 KDR chr4 55976569 55976733 165 389 0 2 12.1 Predicted drivers 269 126 54429 KDR chr4 55976820 55976935 116 389 0 1 8.6 Predicted drivers 270 126 54608 KDR chr4 55979470 55979648 179 389 0 2 11.2 Predicted drivers 271 128 54749 KDR chr4 55980292 55980432 141 389 0 0 0.0 Predicted drivers 272 126 54919 KDR chr4 55981040 55981209 170 389 0 1 5.9 Predicted drivers 273 
126 55051 KDR chr4 55981447 55981578 132 389 0 4 30.3 Predicted drivers 274 126 55249 KDR chr4 55984770 55984967 198 389 0 0 0.0 Predicted drivers 275 126 55350 KDR chr4 55987260 55987360 101 389 0 1 9.9 Predicted drivers 276 126 55452 KDR chr4 55991376 55991477 102 389 0 0 0.0 Predicted drivers 277 127 55639 NTRK3 chr15 88420165 88420351 187 389 0 0 0.0 Predicted drivers 278 127 55799 NTRK3 chr15 88423500 88423659 160 389 0 1 6.3 Predicted drivers 279 127 55900 NTRK3 chr15 88428895 88428995 101 389 0 0 0.0 Predicted drivers 280 127 56145 NTRK3 chr15 88472421 88472665 245 389 0 1 4.1 Predicted drivers 281 127 56319 NTRK3 chr15 88476242 88476415 174 389 0 4 23.0 Predicted drivers 282 127 56451 NTRK3 chr15 88483853 88483984 132 389 0 1 7.6 Predicted drivers 283 127 56571 NTRK3 chr15 88522575 88522694 120 389 0 0 0.0 Predicted drivers 284 127 56707 NTRK3 chr15 88524456 88524591 136 389 0 0 0.0 Predicted drivers 285 127 56897 NTRK3 chr15 88576087 88576276 190 389 0 2 10.5 Predicted drivers 286 127 57001 NTRK3 chr15 88669501 88669604 104 389 0 3 28.8 Predicted drivers 287 127 57103 NTRK3 chr15 88670374 88670475 102 389 0 0 0.0 Predicted drivers 288 127 57204 NTRK3 chr15 88671903 88672003 101 389 0 0 0.0 Predicted drivers 289 127 57502 NTRK3 chr15 88678331 88878628 298 389 0 7 23.5 Predicted drivers 290 127 57645 NTRK3 chr15 88679129 88679271 143 389 0 1 7.0 Predicted drivers 291 127 57789 NTRK3 chr15 88679697 88679840 144 389 0 2 13.9 Predicted drivers 292 127 57948 NTRK3 chr15 88680634 88680792 159 389 0 0 0.0 Predicted drivers 293 127 58050 NTRK3 chr15 88690549 88690650 102 389 0 0 0.0 Predicted drivers 294 127 58151 NTRK3 chr15 88726634 88726734 101 389 0 1 9.9 Predicted drivers 295 127 58253 NTRK3 chr15 88727442 88727543 102 389 0 1 9.8 Predicted drivers 296 126 58391 RB1 chr13 48878048 48878185 138 389 0 0 0.0 Predicted drivers 297 128 56519 RB1 chr13 48881415 48881542 128 389 0 3 23.4 Predicted drivers 298 128 58636 RB1 chr13 48916734 48916850 117 389 0 1 8.5 
Predicted drivers 299 128 58757 RB1 chr13 48919215 48919335 121 389 0 1 8.3 Predicted drivers 300 128 58859 RB1 chr13 48921929 48922030 102 389 0 0 0.0 Predicted drivers 301 128 58960 RB1 chr13 48923075 48923175 101 389 0 0 0.0 Predicted drivers 302 128 59072 RB1 chr13 48934152 48934283 112 389 0 2 17.9 Predicted drivers 303 128 59216 RB1 chr13 48936950 48937093 144 389 0 0 0.0 Predicted drivers 304 128 59317 RB1 chr13 48939018 48939118 101 389 0 0 0.0 Predicted drivers 305 128 59428 RB1 chr13 48941629 48941739 111 389 0 3 27.0 Predicted drivers 306 128 59529 RB1 chr13 48942651 48942751 101 389 0 0 0.0 Predicted drivers 307 128 59630 RB1 chr13 48947534 48947634 101 389 0 2 19.8 Predicted drivers 308 128 59748 RB1 chr13 48951053 48951170 118 389 0 0 0.0 Predicted drivers 309 128 59850 RB1 chr13 48953707 48953808 102 389 0 2 19.6 Predicted drivers 310 128 59951 RB1 chr13 48954154 48954254 101 389 0 0 0.0 Predicted drivers 311 128 60053 RB1 chr13 48954288 48954389 102 389 0 1 9.8 Predicted drivers 312 128 60251 RB1 chr13 48955382 48955579 198 389 0 0 0.0 Predicted drivers 313 128 60371 RB1 chr13 49027128 49027247 120 389 0 0 0.0 Predicted drivers 314 128 60518 RB1 chr13 49030339 49030485 147 389 0 3 20.4 Predicted drivers 315 128 60665 RB1 chr13 49033823 49033969 147 389 0 1 6.8 Predicted drivers 316 128 60771 RB1 chr13 49037866 49037971 106 389 0 0 0.0 Predicted drivers 317 128 60886 RB1 chr13 49039133 49039247 115 389 0 1 8.7 Predicted drivers 318 128 61051 RB1 chr13 49039340 49039504 165 389 0 2 12.1 Predicted drivers 319 128 61153 RB1 chr13 49047460 49047561 102 389 0 0 0.0 Predicted drivers 320 128 61297 RB1 chr13 49050836 49050979 144 389 0 0 0.0 Predicted drivers 321 128 61398 RB1 chr13 49051465 49051565 101 389 0 0 0.0 Predicted drivers 322 128 61499 RB1 chr13 49054120 49054220 101 389 0 0 0.0 Predicted drivers 323 129 61946 ERBB4 chr2 212248339 212248785 447 389 0 3 6.7 Predicted drivers 324 129 62245 ERBB4 chr2 212251577 212251875 299 389 0 3 10.0 Predicted 
drivers 325 129 62346 ERBB4 chr2 212252643 212252743 101 389 0 0 0.0 Predicted drivers 326 129 62518 ERBB4 chr2 212285165 212285336 172 389 0 2 11.6 Predicted drivers 327 129 62619 ERBB4 chr2 212286730 212286830 101 389 0 1 9.9 Predicted drivers 328 129 62787 ERBB4 chr2 212288879 212289026 148 389 0 1 6.8 Predicted drivers 329 129 62868 ERBB4 chr2 212293120 212293220 101 389 0 0 0.0 Predicted drivers 330 129 63025 ERBB4 chr2 212295669 212295825 157 389 0 2 12.7 Predicted drivers 331 129 63212 ERBB4 chr2 212426627 212426813 187 389 0 1 5.3 Predicted drivers 332 129 63312 ERBB4 chr2 212483901 212484000 100 389 0 0 0.0 Predicted drivers 333 129 63436 ERBB4 chr2 212488646 212488769 124 389 0 0 0.0 Predicted drivers 334 129 63570 ERBB4 chr2 212495186 212495319 134 389 0 0 0.0 Predicted drivers 335 129 63672 ERBB4 chr2 212522465 212522566 102 389 0 2 19.6 Predicted drivers 336 129 63828 ERBB4 chr2 212530047 212530202 156 389 0 1 6.4 Predicted drivers 337 129 63929 ERBB4 chr2 212537885 212537985 101 389 0 1 9.9 Predicted drivers 338 129 64063 ERBB4 chr2 212543776 212543909 134 389 0 1 7.5 Predicted drivers 339 129 64264 ERBB4 chr2 212566691 212566891 201 389 0 2 10.0 Predicted drivers 340 129 64366 ERBB4 chr2 212568823 212568924 102 389 0 0 0.0 Predicted drivers 341 129 64467 ERBB4 chr2 212570029 212570129 101 389 0 1 9.8 Predicted drivers 342 129 64595 ERBB4 chr2 212576774 212576901 128 389 0 1 7.8 Predicted drivers 343 129 64710 ERBB4 chr2 212578259 212578373 115 389 0 1 8.7 Predicted drivers 344 129 64853 ERBB4 chr2 212587117 212587259 143 389 0 0 0.0 Predicted drivers 345 129 64973 ERBB4 chr2 212589800 212589919 120 389 0 2 16.7 Predicted drivers 348 129 65074 ERBB4 chr2 212615346 212615446 101 389 0 0 0.0 Predicted drivers 347 129 65210 ERBB4 chr2 212652749 212652884 136 389 0 1 7.4 Predicted drivers 348 129 65398 ERBB4 chr2 212812154 212812341 188 390 1 4 21.3 Predicted drivers 349 129 65551 ERBB4 chr2 212989476 212989628 153 390 0 2 13.1 Predicted drivers 350 129 
65652 ERBB4 chr2 213403163 213403263 101 390 0 0 0.0 Predicted drivers 351 130 65754 NTRK1 chr1 156785575 156785676 102 390 0 0 0.0 Predicted drivers 352 130 65868 NTRK1 chr1 156811872 156811985 114 390 0 0 0.0 Predicted drivers 353 130 66061 NTRK1 chr1 156830726 156830938 213 390 0 0 0.0 Predicted drivers 354 130 66183 NTRK1 chr1 156834132 156834233 102 390 0 1 9.8 Predicted drivers 355 130 66284 NTRK1 chr1 156834505 156834605 101 390 0 0 0.0 Predicted drivers 356 130 66386 NTRK1 chr1 156836685 156836786 102 390 0 0 0.0 Predicted drivers 357 130 66533 NTRK1 chr1 156837895 156838041 147 390 0 1 6.8 Predicted drivers 358 130 66677 NTRK1 chr1 156838296 156838439 144 390 0 0 0.0 Predicted drivers 359 130 66811 NTRK1 chr1 156841414 156841547 134 390 0 0 0.0 Predicted drivers 360 130 67139 NTRK1 chr1 156843424 156843751 328 390 0 1 3.0 Predicted drivers 361 130 67240 NTRK1 chr1 156844133 156844233 101 390 0 0 0.0 Predicted drivers 362 130 67341 NTRK1 chr1 156844340 156844440 101 390 0 0 0.0 Predicted drivers 363 130 67445 NTRK1 chr1 156844697 156844800 104 390 0 0 0.0 Predicted drivers 364 130 67593 NTRK1 chr1 156845311 156845458 148 390 0 2 13.5 Predicted drivers 365 130 67725 NTRK1 chr1 156845871 156846002 132 390 0 3 22.7 Predicted drivers 366 130 67899 NTRK1 chr1 156846191 156846364 174 390 0 2 11.5 Predicted drivers 367 130 68141 NTRK1 chr1 156848913 156849154 242 390 0 4 16.5 Predicted drivers 368 130 68301 NTRK1 chr1 156849790 156849949 160 390 0 0 0.0 Predicted drivers 369 130 68488 NTRK1 chr1 156851248 156851434 187 390 0 0 0.0 Predicted drivers 370 131 68589 NF1 chr17 29422307 29422407 101 390 0 0 0.0 Predicted drivers 371 131 68734 NF1 chr17 29483000 29483144 145 390 0 0 0.0 Predicted drivers 372 131 68835 NF1 chr17 29486019 29486119 101 390 0 1 9.9 Predicted drivers 373 131 69027 NF1 chr17 29490203 29490394 192 390 0 1 5.2 Predicted drivers 374 131 89135 NF1 chr17 29496908 29497015 108 390 0 1 9.3 Predicted drivers 375 131 69236 NF1 chr17 29508423 29508523 
101 390 0 0 0.0 Predicted drivers 376 131 69337 NF1 chr17 29508715 29508815 101 390 0 0 0.0 Predicted drivers 377 131 69496 NF1 chr17 29509525 29509683 159 390 0 1 6.3 Predicted drivers 378 131 69671 NF1 chr17 29527439 29527613 175 390 0 3 17.1 Predicted drivers 379 131 69795 NF1 chr17 29528054 29528177 124 390 0 0 0.0 Predicted drivers 380 131 69897 NF1 chr17 29528415 29528516 102 390 0 0 0.0 Predicted drivers 381 131 70030 NF1 chr17 29533257 29533389 133 390 0 0 0.0 Predicted drivers 382 131 70166 NF1 chr17 29541468 29541603 136 390 0 1 7.4 Predicted drivers 383 131 70281 NF1 chr17 29546022 29546136 115 390 0 1 8.7 Predicted drivers 384 131 70423 NF1 chr17 29548867 29549008 142 390 0 1 7.0 Predicted drivers 385 131 70548 NF1 chr17 29550461 29550585 125 390 0 0 0.0 Predicted drivers 386 131 70705 NF1 chr17 29552112 29552268 157 390 0 0 0.0 Predicted drivers 387 131 70956 NF1 chr17 29553452 29553702 251 390 0 1 4.0 Predicted drivers 386 131 71057 NF1 chr17 29554222 29554322 101 390 0 0 0.0 Predicted drivers 389 131 71158 NF1 chr17 29554532 29554632 101 390 0 1 9.9 Predicted drivers 390 131 71600 NF1 chr17 29556042 29556483 442 390 0 2 4.5 Predicted drivers 391 131 71741 NF1 chr17 29556852 29556992 141 390 0 1 7.1 Predicted drivers 392 131 71865 NF1 chr17 29557277 29557400 124 390 0 1 8.1 Predicted drivers 393 131 71966 NF1 chr17 29557851 29557951 101 390 0 0 0.0 Predicted drivers 394 131 72084 NF1 chr17 29559090 29559207 118 390 0 0 0.0 Predicted drivers 395 131 72267 NF1 chr17 29559717 29559899 183 390 0 2 10.9 Predicted drivers 396 131 72480 NF1 chr17 29560019 29560231 213 390 0 1 4.7 Predicted drivers 397 131 72643 NF1 chr17 29562628 29562790 163 390 0 2 12.3 Predicted drivers 398 131 72748 NF1 chr17 29562935 29563039 105 390 0 0 0.0 Predicted drivers 399 131 72885 NF1 chr17 29576001 29576137 137 390 0 0 0.0 Predicted drivers 400 131 72987 NF1 chr17 29579936 29580037 102 390 0 0 0.0 Predicted drivers 401 131 73147 NF1 chr17 29585361 29585520 160 390 0 0 0.0 
Predicted drivers 402 131 73248 NF1 chr17 29588048 29586148 101 390 0 1 9.9 Predicted drivers 403 131 73396 NF1 chr17 29587386 29587533 148 390 0 2 13.5 Predicted drivers 404 131 73544 NF1 chr17 29588728 29588875 148 390 0 0 0.0 Predicted drivers 405 131 73656 NF1 chr17 29592246 29592357 112 390 0 0 0.0 Predicted drivers 406 131 74090 NF1 chr17 29652837 29653270 434 390 0 2 4.6 Predicted drivers 407 131 74432 NF1 chr17 29654516 29654857 342 390 0 3 8.8 Predicted drivers 408 131 74636 NF1 chr17 29657313 29657516 204 390 0 2 9.8 Predicted drivers 409 131 74831 NF1 chr17 29661855 29662049 195 390 0 3 15.4 Predicted drivers 410 131 74973 NF1 chr17 29663350 29683491 142 390 0 2 14.1 Predicted drivers 411 131 75254 NF1 chr17 29663652 29663932 281 390 0 0 0.0 Predicted drivers 412 131 75470 NF1 chr17 29664385 29664600 216 390 0 1 4.6 Predicted drivers 413 131 75571 NF1 chr17 29664817 29664917 101 390 0 1 9.9 Predicted drivers 414 131 75687 NF1 chr17 29665042 29665157 116 390 0 0 0.0 Predicted drivers 415 131 75790 NF1 chr17 29665721 29665823 103 390 0 2 19.4 Predicted drivers 416 131 75932 NF1 chr17 29667522 29667663 142 390 0 1 7.0 Predicted drivers 417 131 76060 NF1 chr17 29670026 29670153 128 390 0 2 15.6 Predicted drivers 418 131 76193 NF1 chr17 29676137 29676269 133 390 0 2 15.0 Predicted drivers 419 131 76330 NF1 chr17 29677200 29677336 137 390 0 0 0.0 Predicted drivers 420 131 76489 NF1 chr17 29679274 29679432 159 390 0 2 12.6 Predicted drivers 421 131 76613 NF1 chr17 29683477 29683600 124 390 0 0 0.0 Predicted drivers 422 131 76745 NF1 chr17 29683977 29684108 132 390 0 1 7.6 Predicted drivers 423 131 76847 NF1 chr17 29684286 29684387 102 390 0 1 9.8 Predicted drivers 424 131 76991 NF1 chr17 29685497 29685640 144 390 0 1 6.9 Predicted drivers 425 131 77093 NF1 chr17 29685959 29686060 102 390 0 0 0.0 Predicted drivers 426 131 77311 NF1 chr17 29687504 29687721 216 390 0 0 0.0 Predicted drivers 427 131 77455 NF1 chr17 29701030 29701173 144 390 0 1 6.9 Predicted 
drivers 428 132 77621 APC chr5 112043414 112043579 166 390 0 0 0.0 Predicted drivers 429 132 77757 APC chr5 112090587 112090722 136 390 0 0 0.0 Predicted drivers 430 132 77859 APC chr5 112102014 112102115 102 390 0 1 9.8 Predicted drivers 431 132 78062 APC chr5 112102885 112103087 203 390 0 2 9.9 Predicted drivers 432 132 78172 APC chr5 112111325 112111434 110 390 0 1 9.1 Predicted drivers 433 132 78287 APC chr5 112116486 112116600 115 390 0 0 0.0 Predicted drivers 434 132 78388 APC chr5 112128134 112128234 101 390 0 0 0.0 Predicted drivers 435 132 78494 APC chr5 112136975 112137080 106 390 0 0 0.0 Predicted drivers 436 132 78594 APC chr5 112151191 112151290 100 390 0 0 0.0 Predicted drivers 437 132 78974 APC chr5 112154662 112155041 380 390 0 1 2.6 Predicted drivers 438 132 79075 APC chr5 112157590 112157690 101 390 0 0 0.0 Predicted drivers 439 132 79216 APC chr5 112162804 112162944 141 390 0 0 0.0 Predicted drivers 440 132 79317 APC chr5 112163614 112163714 101 390 0 0 0.0 Predicted drivers 441 132 79435 APC chr5 112164552 112164669 118 390 0 2 16.9 Predicted drivers 442 132 79651 APC chr5 112170647 112170862 216 390 0 0 0.0 Predicted drivers 443 132 86226 APC chr5 112173249 112179823 6575 391 1 23 3.5 Predicted drivers 444 133 86327 ATM chr11 108098337 108096437 101 391 0 0 0.0 Predicted drivers 445 133 86441 ATM chr11 108098502 108098615 114 391 0 1 8.8 Predicted drivers 446 133 86588 ATM chr11 108099904 108100050 147 391 0 0 0.0 Predicted drivers 447 133 86754 ATM chr11 108106396 108106561 168 391 0 0 0.0 Predicted drivers 448 133 86921 ATM chr11 108114679 108114845 167 391 0 0 0.0 Predicted drivers 449 133 87161 ATM chr11 108115514 108115753 240 391 0 1 4.2 Predicted drivers 450 133 87326 ATM chr11 108117690 108117854 165 391 0 0 0.0 Predicted drivers 451 133 87497 ATM chr11 108119659 108119829 171 391 0 1 5.8 Predicted drivers 452 133 87870 ATM chr11 108121427 108121799 373 391 0 0 0.0 Predicted drivers 453 133 88066 ATM chr11 108122563 108122758 196 391 0 
0 0.0 Predicted drivers 454 133 88187 ATM chr11 108123541 108123641 101 391 0 1 9.9 Predicted drivers 455 133 88394 ATM chr11 108124540 108124766 227 391 0 0 0.0 Predicted drivers 456 133 88521 ATM chr11 108126941 108127067 127 391 0 1 7.9 Predicted drivers 457 133 88648 ATM chr11 108128207 108128333 127 391 0 0 0.0 Predicted drivers 458 133 88749 ATM chr11 108129707 108129807 101 391 0 0 0.0 Predicted drivers 459 133 88922 ATM chr11 108137897 108138069 173 391 0 1 5.8 Predicted drivers 460 133 89123 ATM chr11 108139136 108139336 201 391 0 0 0.0 Predicted drivers 461 133 89225 ATM chr11 108141781 108141882 102 391 0 0 0.0 Predicted drivers 462 133 89382 ATM chr11 108141977 108142133 157 391 0 0 0.0 Predicted drivers 463 133 89483 ATM chr11 108143246 108143346 101 391 0 0 0.0 Predicted drivers 464 133 89615 ATM chr11 108143448 108143579 132 391 0 1 7.6 Predicted drivers 465 133 89734 ATM chr11 108150217 108150335 119 391 0 0 0.0 Predicted drivers 466 133 89909 ATM chr11 108151721 108151895 175 391 0 0 0.0 Predicted drivers 467 133 90080 ATM chr11 108153436 108153606 171 391 0 2 11.7 Predicted drivers 468 133 90328 ATM chr11 108154953 108155200 248 391 0 1 4.0 Predicted drivers 469 133 90445 ATM chr11 108158326 108158442 117 391 0 0 0.0 Predicted drivers 470 133 90573 ATM chr11 108159703 108159830 128 391 0 1 7.8 Predicted drivers 471 133 90774 ATM chr11 108160328 108160528 201 391 0 1 5.0 Predicted drivers 472 133 90950 ATM chr11 108163345 108163520 176 391 0 0 0.0 Predicted drivers 473 133 91116 ATM chr11 108164039 108164204 166 391 0 0 0.0 Predicted drivers 474 133 91250 ATM chr11 108165653 108165786 134 391 0 0 0.0 Predicted drivers 475 133 91351 ATM chr11 108168011 108168111 101 391 0 1 9.9 Predicted drivers 476 133 91524 ATM chr11 108170440 108170612 173 391 0 1 5.8 Predicted drivers 477 133 91667 ATM chr11 108172374 108172516 143 391 0 0 0.0 Predicted drivers 478 133 91845 ATM chr11 108173579 108173756 178 391 0 0 0.0 Predicted drivers 479 133 92024 ATM chr11 
108175401 108175579 179 391 0 2 11.2 Predicted drivers 480 133 92125 ATM chr11 108178617 108178717 101 391 0 0 0.0 Predicted drivers 481 133 92282 ATM chr11 108180886 108181042 157 391 0 0 0.0 Predicted drivers 482 133 92383 ATM chr11 108183131 108183231 101 391 0 1 9.9 Predicted drivers 483 133 92485 ATM chr11 108186543 108186644 102 391 0 0 0.0 Predicted drivers 484 133 92589 ATM chr11 108186737 108186840 104 391 0 1 9.6 Predicted drivers 485 133 92739 ATM chr11 108188099 108188248 150 391 0 0 0.0 Predicted drivers 486 133 92845 ATM chr11 108190680 108190785 106 391 0 0 0.0 Predicted drivers 487 133 92966 ATM chr11 108192027 108192147 121 391 0 0 0.0 Predicted drivers 488 133 93202 ATM chr11 108196036 108196271 236 391 0 1 4.2 Predicted drivers 489 133 93371 ATM chr11 108196784 108196952 169 391 0 0 0.0 Predicted drivers 490 133 93486 ATM chr11 108198371 108198485 115 391 0 0 0.0 Predicted drivers 491 133 93705 ATM chr11 108199747 108199965 218 391 0 1 4.6 Predicted drivers 492 133 93914 ATM chr11 108200940 108201148 209 391 0 0 0.0 Predicted drivers 493 133 94029 ATM chr11 108202170 108202284 115 391 0 0 0.0 Predicted drivers 494 133 94189 ATM chr11 108202605 108202764 160 391 0 0 0.0 Predicted drivers 495 133 94329 ATM chr11 106203488 108203627 140 391 0 0 0.0 Predicted drivers 496 133 94431 ATM chr11 108204603 108204704 102 391 0 1 9.8 Predicted drivers 497 133 94573 ATM chr11 108205695 108205836 142 391 0 3 21.1 Predicted drivers 498 133 94691 ATM chr11 108206571 108206688 118 391 0 1 8.5 Predicted drivers 499 133 94842 ATM chr11 108213948 108214098 151 391 0 0 0.0 Predicted drivers 500 133 95009 ATM chr11 108216469 108216635 167 391 0 0 0.0 Predicted drivers 501 133 95111 ATM chr11 108217998 108218099 102 391 0 1 9.8 Predicted drivers 502 133 95227 ATM chr11 108224492 108224607 116 391 0 1 8.6 Predicted drivers 503 133 95328 ATM chr11 108225519 108225619 101 391 0 0 0.0 Predicted drivers 504 133 95466 ATM chr11 108235808 108235945 138 391 0 1 7.2 Predicted 
drivers 505 133 95651 ATM chr11 108236051 108236235 185 391 0 2 10.8 Predicted drivers 506 134 95753 FGFR4 chr5 176516598 176516699 102 391 0 0 0.0 Predicted drivers 507 134 960718 FGFR4 chr5 176517390 176517654 265 391 0 1 3.8 Predicted drivers 508 134 96120 FGFR4 chr5 176517735 176517836 102 391 0 1 9.8 Predicted drivers 509 134 96288 FGFR4 chr5 176517938 176518105 168 391 0 0 0.0 Predicted drivers 510 134 96413 FGFR4 chr5 176518685 176518809 125 391 0 0 0.0 Predicted drivers 511 134 96605 FGFR4 chr5 176519321 176519512 192 391 0 0 0.0 Predicted drivers 512 134 96745 FGFR4 chr5 176519646 176519785 140 391 0 0 0.0 Predicted drivers 513 134 97160 FGFR4 chr5 176520138 176520552 415 391 0 2 4.8 Predicted drivers 514 134 97283 FGFR4 chr5 176520654 176520776 123 391 0 0 0.0 Predicted drivers 515 134 97395 FGFR4 chr5 176522330 176522441 112 391 0 1 8.9 Predicted drivers 516 134 97587 FGFR4 chr5 176522533 176522724 192 391 0 0 0.0 Predicted drivers 517 134 97711 FGFR4 chr5 176523057 176523180 124 391 0 0 0.0 Predicted drivers 518 134 97813 FGFR4 chr5 176523272 176523373 102 391 0 0 0.0 Predicted drivers 519 134 97952 FGFR4 chr5 176523604 176523742 139 391 0 0 0.0 Predicted drivers 520 134 98059 FGFR4 chr5 176524292 176524398 107 391 0 0 0.0 Predicted drivers 521 134 98210 FGFR4 chr5 176524527 176524677 151 391 0 0 0.0 Add fusions 522 135 100435 ALK chr2 29446207 29448431 2225 — — — — Add fusions 523 136 117908 ROS1 chr6 117641031 117658503 17473 — — — — Add fusions 524 137 123433 RET chr10 43606655 43612179 5525 — — — — Add fusions 525 138 123876 POGFRA chr4 55140698 55141140 443 — — — — Add fusions 526 139 125384 FGFR1 chr8 38275746 38277253 1508 — — — — Coverage (unique LUAD & SCC patients; n = 407) Coverage (all LUAD & SCC samples; n = 419) No. pa- % pa- % pa- % pa- No. No. 
Design phase | Patients w/1 SNV | % patients ≧1 SNV | % patients ≧2 SNVs | % patients ≧3 SNVs | Samples covered | Samples gained | Samples per exon | RI | Samples w/1 SNV | % samples ≧1 SNV | % samples ≧2 SNVs | % samples ≧3 SNVs
Known drivers | 1 | 0.25 | 0.00 | 0.00 | 1 | 1 | 1 | 7.7 | 1 | 0.24 | 0.00 | 0.00
Known drivers | 9 | 2.21 | 0.00 | 0.00 | 11 | 10 | 10 | 83.3 | 11 | 2.63 | 0.00 | 0.00
Known drivers | 16 | 3.93 | 0.00 | 0.00 | 18 | 7 | 7 | 58.8 | 18 | 4.30 | 0.00 | 0.00
Known drivers | 46 | 11.30 | 0.00 | 0.00 | 48 | 30 | 30 | 97.4 | 48 | 11.46 | 0.00 | 0.00
Known drivers | 53 | 13.02 | 0.00 | 0.00 | 55 | 7 | 7 | 19.9 | 55 | 13.13 | 0.00 | 0.00
Known drivers | 55 | 14.00 | 0.49 | 0.00 | 59 | 4 | 6 | 26.2 | 57 | 14.08 | 0.48 | 0.00
Known drivers | 54 | 14.25 | 0.98 | 0.00 | 60 | 1 | 3 | 24.2 | 56 | 14.32 | 0.95 | 0.00
Known drivers | 60 | 15.97 | 1.23 | 0.00 | 67 | 7 | 8 | 80.0 | 62 | 15.99 | 1.19 | 0.00
Known drivers | 64 | 16.95 | 1.23 | 0.25 | 71 | 4 | 5 | 26.7 | 66 | 16.95 | 1.19 | 0.24
Known drivers | 74 | 19.90 | 1.72 | 0.25 | 84 | 13 | 15 | 95.5 | 77 | 20.05 | 1.67 | 0.24
Known drivers | 74 | 19.90 | 1.72 | 0.25 | 84 | 0 | 0 | 0.0 | 77 | 20.05 | 1.67 | 0.24
Known drivers | 78 | 20.88 | 1.72 | 0.25 | 88 | 4 | 4 | 21.4 | 81 | 21.00 | 1.67 | 0.24
Known drivers | 79 | 21.38 | 1.87 | 0.25 | 90 | 2 | 3 | 16.7 | 82 | 21.48 | 1.91 | 0.24
Known drivers | 82 | 22.11 | 1.97 | 0.25 | 93 | 3 | 3 | 26.8 | 85 | 22.20 | 1.91 | 0.24
Known drivers | 85 | 22.85 | 1.97 | 0.25 | 96 | 3 | 3 | 16.3 | 88 | 22.91 | 1.91 | 0.24
Known drivers | 100 | 26.54 | 1.97 | 0.25 | 111 | 15 | 15 | 72.5 | 103 | 26.49 | 1.91 | 0.24
Known drivers | 117 | 31.45 | 2.70 | 0.74 | 131 | 20 | 25 | 36.4 | 120 | 31.26 | 2.63 | 0.72
Known drivers | 126 | 34.64 | 3.69 | 0.98 | 145 | 14 | 19 | 29.7 | 130 | 34.81 | 3.58 | 0.95
Known drivers | 128 | 35.14 | 3.69 | 0.98 | 147 | 2 | 2 | 11.9 | 132 | 35.08 | 3.58 | 0.95
Known drivers | 132 | 36.12 | 3.69 | 0.98 | 151 | 4 | 4 | 22.2 | 136 | 36.04 | 3.58 | 0.95
Known drivers | 164 | 46.93 | 6.63 | 0.98 | 196 | 45 | 57 | 508.9 | 169 | 46.78 | 6.44 | 0.95
Known drivers | 164 | 46.93 | 6.63 | 0.98 | 196 | 0 | 0 | 0.0 | 169 | 46.78 | 6.44 | 0.95
Known drivers | 166 | 47.42 | 6.63 | 0.98 | 198 | 2 | 2 | 14.1 | 171 | 47.26 | 6.44 | 0.95
Known drivers | 174 | 52.09 | 9.34 | 0.98 | 217 | 19 | 31 | 115.7 | 179 | 51.79 | 9.07 | 0.95
Known drivers | 173 | 52.09 | 9.58 | 0.98 | 217 | 0 | 1 | 4.6 | 178 | 51.79 | 9.31 | 0.95
Known drivers | 173 | 52.09 | 9.58 | 0.98 | 217 | 0 | 0 | 0.0 | 178 | 51.79 | 9.31 | 0.95
Known drivers | 174 | 52.58 | 9.83 | 0.98 | 219 | 2 | 3 | 2.0 | 179 | 52.27 | 9.55 | 0.95
Known drivers | 174 | 52.58 | 9.83 | 0.98 | 219 | 0 | 0 | 0.0 | 179 | 52.27 | 9.55 | 0.95
Known drivers | 175 | 53.32 | 10.32 | 0.98 | 222 | 3 | 5 | 27.8 | 180 | 52.98 | 10.02 | 0.95
Known drivers | 175 | 53.32 | 10.32 | 0.98 | 222 | 0 | 0 | 0.0 | 180 | 52.98 | 10.02 | 0.95
Known drivers | 174 | 55.28 | 12.53 | 1.47 | 230 | 8 | 19 | 150.8 | 179 | 54.89 | 12.17 | 1.43
Known drivers | 176 | 56.02 | 12.78 | 1.47 | 233 | 3 | 4 | 14.7 | 181 | 55.61 | 12.41 | 1.43
Known drivers | 177 | 56.27 | 12.78 | 1.47 | 234 | 1 | 1 | 12.5 | 182 | 55.85 | 12.41 | 1.43
Known drivers | 177 | 56.27 | 12.78 | 1.47 | 234 | 0 | 0 | 0.0 | 182 | 55.85 | 12.41 | 1.43
Known drivers | 178 | 56.76 | 13.02 | 1.47 | 236 | 2 | 3 | 65.2 | 183 | 56.32 | 12.65 | 1.43
Known drivers | 178 | 56.76 | 13.02 | 1.47 | 236 | 0 | 0 | 0.0 | 183 | 56.32 | 12.65 | 1.43
Known drivers | 179 | 57.49 | 13.51 | 1.47 | 239 | 3 | 5 | 20.8 | 184 | 57.04 | 13.13 | 1.43
Known drivers | 179 | 57.74 | 13.76 | 1.72 | 240 | 1 | 3 | 21.0 | 184 | 57.28 | 13.37 | 1.67
Known drivers | 179 | 58.48 | 14.50 | 1.72 | 243 | 3 | 6 | 35.7 | 184 | 58.00 | 14.08 | 1.67
Known drivers | 179 | 58.72 | 14.74 | 1.97 | 244 | 1 | 3 | 13.3 | 184 | 58.23 | 14.32 | 1.91
Known drivers | 179 | 58.97 | 14.99 | 2.46 | 245 | 1 | 4 | 13.7 | 184 | 58.47 | 14.56 | 2.39
Known drivers | 179 | 59.21 | 15.23 | 2.46 | 246 | 1 | 2 | 23.5 | 184 | 58.71 | 14.80 | 2.39
Known drivers | 180 | 59.46 | 15.23 | 2.46 | 247 | 1 | 1 | 11.0 | 185 | 58.95 | 14.80 | 2.39
Known drivers | 177 | 59.46 | 15.97 | 2.70 | 247 | 0 | 4 | 29.9 | 182 | 58.95 | 15.51 | 2.63
Known drivers | 174 | 59.46 | 16.71 | 2.95 | 247 | 0 | 4 | 29.0 | 179 | 58.95 | 16.23 | 2.86
Known drivers | 171 | 59.46 | 17.44 | 3.19 | 247 | 0 | 4 | 31.0 | 176 | 58.95 | 16.95 | 3.10
Known drivers | 171 | 59.46 | 17.44 | 3.19 | 247 | 0 | 0 | 0.0 | 178 | 58.95 | 16.95 | 3.10
Known drivers | 171 | 59.46 | 17.44 | 3.19 | 247 | 0 | 0 | 0.0 | 176 | 58.95 | 16.95 | 3.10
Known drivers | 171 | 59.46 | 17.44 | 3.19 | 247 | 0 | 0 | 0.0 | 176 | 58.95 | 16.95 | 3.10
Known drivers | 168 | 64.86 | 23.59 | 5.16 | 269 | 22 | 58 | 420.3 | 171 | 64.20 | 23.39 | 5.01
Known drivers | 167 | 70.27 | 29.24 | 6.14 | 292 | 23 | 51 | 459.5 | 171 | 69.69 | 28.88 | 5.97
Known drivers | 164 | 73.71 | 33.42 | 8.11 | 306 | 14 | 39 | 342.1 | 168 | 73.03 | 32.94 | 7.88
Known drivers | 164 | 76.66 | 36.36 | 9.58 | 319 | 13 | 32 | 114.3 | 169 | 76.13 | 35.80 | 9.31
Known drivers | 167 | 83.54 | 42.51 | 12.04 | 347 | 28 | 69 | 373.0 | 171 | 82.62 | 42.00 | 11.69
Max coverage | 163 | 83.78 | 43.73 | 12.78 | 349 | 2 | 11 | 91.7 | 168 | 83.29 | 43.20 | 12.41
Max coverage | 165 | 84.28 | 43.73 | 13.02 | 352 | 3 | 5 | 90.9 | 171 | 84.01 | 43.20 | 12.65
Max coverage | 164 | 84.77 | 44.47 | 13.76 | 354 | 2 | 10 | 87.7 | 169 | 84.49 | 44.15 | 13.60
Max coverage | 164 | 85.50 | 45.21 | 14.50 | 357 | 3 | 9 | 83.3 | 169 | 85.20 | 44.87 | 14.32
Max coverage | 162 | 86.00 | 46.19 | 14.99 | 360 | 3 | 9 | 80.4 | 168 | 85.92 | 45.82 | 14.80
Max coverage | 163 | 86.24 | 46.19 | 15.72 | 362 | 2 | 6 | 67.4 | 170 | 86.40 | 45.82 | 15.51
Max coverage | 161 | 86.49 | 46.93 | 16.46 | 363 | 1 | 9 | 67.2 | 168 | 86.63 | 46.54 | 16.23
Max coverage | 160 | 86.73 | 47.42 | 17.69 | 364 | 1 | 11 | 63.6 | 167 | 86.37 | 47.02 | 17.42
Max coverage | 161 | 86.98 | 47.42 | 18.43 | 365 | 1 | 5 | 61.0 | 168 | 87.11 | 47.02 | 18.14
Max coverage | 161 | 87.22 | 47.67 | 19.16 | 366 | 1 | 10 | 60.2 | 168 | 87.35 | 47.26 | 18.85
Max coverage | 163 | 87.71 | 47.67 | 19.66 | 368 | 2 | 5 | 58.8 | 170 | 87.83 | 47.26 | 19.33
Max coverage | 163 | 87.96 | 47.91 | 20.15 | 369 | 1 | 6 | 58.8 | 170 | 88.07 | 47.49 | 20.05
Max coverage | 164 | 88.45 | 48.16 | 20.39 | 371 | 2 | 6 | 55.0 | 171 | 88.54 | 47.73 | 20.29
Max coverage | 164 | 88.70 | 48.40 | 20.64 | 372 | 1 | 5 | 53.8 | 170 | 88.78 | 48.21 | 20.53
Max coverage | 163 | 88.94 | 48.89 | 20.64 | 373 | 1 | 5 | 53.2 | 169 | 89.02 | 48.69 | 20.53
Max coverage | 162 | 89.19 | 49.39 | 20.88 | 374 | 1 | 5 | 47.6 | 168 | 89.26 | 49.16 | 20.76
Max coverage | 161 | 89.43 | 49.88 | 21.87 | 375 | 1 | 7 | 47.6 | 167 | 89.50 | 49.64 | 21.72
Max coverage | 161 | 89.68 | 50.12 | 22.85 | 376 | 1 | 8 | 47.6 | 167 | 89.74 | 49.88 | 22.67
Max coverage | 160 | 89.93 | 50.61 | 23.83 | 377 | 1 | 9 | 47.4 | 166 | 89.98 | 50.36 | 23.63
Max coverage | 159 | 90.17 | 51.11 | 24.32 | 378 | 1 | 5 | 46.7 | 165 | 90.21 | 50.84 | 24.11
Max coverage | 158 | 90.42 | 51.60 | 24.57 | 379 | 1 | 5 | 46.3 | 163 | 90.45 | 51.55 | 24.34
Max coverage | 152 | 91.15 | 53.81 | 26.78 | 382 | 3 | 32 | 44.8 | 157 | 91.17 | 53.70 | 26.73
Max coverage | 153 | 91.40 | 53.81 | 27.03 | 383 | 1 | 5 | 41.0 | 158 | 91.41 | 53.70 | 28.97
Max coverage | 153 | 91.65 | 54.05 | 27.03 | 384 | 1 | 5 | 41.0 | 158 | 91.85 | 53.94 | 26.97
Max coverage | 152 | 91.89 | 54.55 | 27.52 | 385 | 1 | 5 | 40.3 | 157 | 91.89 | 54.42 | 27.45
Max coverage | 152 | 92.14 | 54.79 | 28.01 | 386 | 1 | 5 | 40.0 | 157 | 92.12 | 54.65 | 27.92
Max coverage | 151 | 92.38 | 55.28 | 28.99 | 387 | 1 | 13 | 39.6 | 156 | 92.36 | 55.13 | 28.88
Max coverage | 150 | 92.63 | 55.77 | 29.48 | 388 | 1 | 8 | 39.6 | 155 | 92.60 | 55.61 | 29.59
Max coverage | 149 | 92.87 | 56.27 | 29.98 | 389 | 1 | 6 | 38.0 | 154 | 92.84 | 56.09 | 30.07
Max coverage | 147 | 93.12 | 57.00 | 30.96 | 390 | 1 | 12 | 37.3 | 152 | 93.08 | 56.80 | 31.03
Max coverage | 144 | 93.37 | 57.99 | 30.96 | 391 | 1 | 8 | 36.5 | 149 | 93.32 | 57.76 | 31.03
Max coverage | 143 | 93.61 | 58.48 | 31.20 | 392 | 1 | 5 | 35.2 | 148 | 93.56 | 58.23 | 31.26
Max coverage | 144 | 93.86 | 58.48 | 31.20 | 393 | 1 | 5 | 35.0 | 149 | 93.79 | 58.23 | 31.26
Max coverage | 143 | 93.86 | 58.72 | 31.94 | 394 | 1 | 6 | 34.9 | 150 | 94.03 | 58.23 | 31.98
Max coverage | 140 | 94.35 | 59.95 | 32.68 | 396 | 2 | 17 | 34.6 | 147 | 94.51 | 59.43 | 32.70
Max coverage | 142 | 94.84 | 59.95 | 32.92 | 398 | 2 | 5 | 34.5 | 149 | 94.99 | 59.43 | 32.94
RI ≧ 30 | 134 | 94.84 | 61.92 | 35.63 | 398 | 0 | 30 | 32.0 | 141 | 94.99 | 61.34 | 35.56
RI ≧ 30 | 126 | 94.84 | 63.88 | 37.59 | 398 | 0 | 34 | 32.8 | 133 | 94.99 | 63.25 | 37.71
RI ≧ 30 | 121 | 94.84 | 65.11 | 38.33 | 398 | 0 | 28 | 30.2 | 127 | 94.99 | 64.68 | 38.42
RI ≧ 30 | 117 | 95.09 | 66.34 | 39.80 | 399 | 1 | 28 | 30.1 | 123 | 95.23 | 65.87 | 39.86
RI ≧ 30 | 113 | 95.09 | 67.32 | 42.01 | 399 | 0 | 33 | 31.5 | 119 | 95.23 | 66.83 | 42.00
RI ≧ 30 | 109 | 95.09 | 68.30 | 43.24 | 399 | 0 | 36 | 32.2 | 115 | 95.23 | 67.78 | 43.20
RI ≧ 30 | 105 | 95.09 | 69.29 | 43.24 | 399 | 0 | 9 | 35.2 | 111 | 95.23 | 68.74 | 43.20
RI ≧ 30 | 102 | 95.09 | 70.02 | 43.49 | 399 | 0 | 6 | 30.2 | 108 | 95.23 | 69.45 | 43.44
RI ≧ 30 | 99 | 95.09 | 70.76 | 43.73 | 399 | 0 | 12 | 45.3 | 105 | 95.23 | 70.17 | 43.68
RI ≧ 30 | 97 | 95.09 | 71.25 | 43.73 | 399 | 0 | 5 | 34.7 | 102 | 95.23 | 70.88 | 43.68
RI ≧ 30 | 94 | 95.09 | 71.99 | 44.23 | 399 | 0 | 5 | 30.7 | 99 | 95.23 | 71.80 | 44.15
RI ≧ 30 | 91 | 95.09 | 72.73 | 44.23 | 399 | 0 | 7 | 52.2 | 96 | 95.23 | 72.32 | 44.15
RI ≧ 30 | 88 | 95.09 | 73.46 | 44.23 | 399 | 0 | 6 | 32.8 | 93 | 95.23 | 73.03 | 44.15
RI ≧ 30 | 85 | 95.09 | 74.20 | 44.23 | 399 | 0 | 6 | 33.9 | 90 | 95.23 | 73.75 | 44.15
RI ≧ 30 | 82 | 95.09 | 74.94 | 45.21 | 399 | 0 | 29 | 32.5 | 87 | 95.23 | 74.46 | 45.11
RI ≧ 30 | 80 | 95.09 | 75.43 | 45.45 | 399 | 0 | 6 | 31.6 | 84 | 95.23 | 75.18 | 45.35
RI ≧ 30 | 77 | 95.09 | 76.17 | 45.70 | 399 | 0 | 8 | 36.4 | 81 | 95.23 | 75.89 | 45.58
RI ≧ 30 | 75 | 95.09 | 76.66 | 45.70 | 399 | 0 | 5 | 33.8 | 79 | 95.23 | 76.37 | 45.58
RI ≧ 30 | 73 | 95.09 | 77.15 | 45.95 | 399 | 0 | 3 | 30.3 | 77 | 95.23 | 76.85 | 45.82
RI ≧ 30 | 71 | 95.09 | 77.64 | 45.95 | 399 | 0 | 11 | 33.1 | 75 | 95.23 | 77.33 | 45.82
RI ≧ 30 | 70 | 95.33 | 78.13 | 45.95 | 400 | 1 | 3 | 32.3 | 74 | 95.47 | 77.80 | 45.82
RI ≧ 30 | 68 | 95.33 | 78.62 | 47.17 | 400 | 0 | 17 | 32.6 | 72 | 95.47 | 78.28 | 47.02
RI ≧ 30 | 67 | 95.58 | 79.12 | 47.17 | 401 | 1 | 4 | 30.8 | 71 | 95.70 | 78.76 | 47.02
RI ≧ 30 | 67 | 95.58 | 79.12 | 47.42 | 401 | 0 | 3 | 31.9 | 69 | 95.70 | 79.24 | 47.02
RI ≧ 30 | 65 | 95.58 | 79.61 | 47.42 | 401 | 0 | 6 | 35.3 | 67 | 95.70 | 79.71 | 47.02
RI ≧ 30 | 63 | 95.58 | 80.10 | 47.42 | 401 | 0 | 4 | 37.4 | 65 | 95.70 | 80.19 | 47.02
RI ≧ 30 | 61 | 95.58 | 80.59 | 47.42 | 401 | 0 | 4 | 48.2 | 63 | 95.70 | 80.67 | 47.02
RI ≧ 30 | 59 | 95.58 | 81.08 | 47.42 | 401 | 0 | 3 | 42.9 | 61 | 95.70 | 81.15 | 47.02
RI ≧ 30 | 57 | 95.58 | 81.57 | 47.42 | 401 | 0 | 5 | 33.8 | 59 | 95.70 | 81.62 | 47.02
RI ≧ 30 | 56 | 95.58 | 81.82 | 47.42 | 401 | 0 | 7 | 47.3 | 57 | 95.70 | 82.10 | 47.26
RI ≧ 30 | 54 | 95.58 | 82.31 | 47.42 | 401 | 0 | 6 | 33.0 | 55 | 95.70 | 82.58 | 47.26
RI ≧ 30 | 52 | 95.58 | 82.80 | 47.67 | 401 | 0 | 5 | 39.4 | 53 | 95.70 | 83.05 | 47.49
RI ≧ 30 | 51 | 95.58 | 83.05 | 47.67 | 401 | 0 | 4 | 54.8 | 52 | 95.70 | 83.29 | 47.49
RI ≧ 30 | 51 | 95.58 | 83.05 | 48.16 | 401 | 0 | 3 | 45.5 | 51 | 95.70 | 83.53 | 47.73
RI ≧ 30 | 50 | 95.58 | 83.29 | 48.65 | 401 | 0 | 7 | 40.5 | 50 | 95.70 | 83.77 | 48.21
RI ≧ 30 | 49 | 95.58 | 83.54 | 48.89 | 401 | 0 | 6 | 35.5 | 49 | 95.70 | 84.01 | 48.45
RI ≧ 30 | 48 | 95.58 | 83.78 | 48.89 | 401 | 0 | 3 | 46.2 | 46 | 95.70 | 84.25 | 48.45
RI ≧ 30 | 47 | 95.58 | 84.03 | 48.89 | 401 | 0 | 3 | 30.6 | 47 | 95.70 | 84.49 | 48.45
RI ≧ 30 | 46 | 95.58 | 84.28 | 48.89 | 401 | 0 | 4 | 30.3 | 46 | 95.70 | 84.73 | 48.45
RI ≧ 30 | 45 | 95.58 | 84.52 | 48.89 | 401 | 0 | 3 | 50.0 | 45 | 95.70 | 84.96 | 48.45
RI ≧ 30 | 44 | 95.58 | 84.77 | 49.14 | 401 | 0 | 4 | 34.5 | 44 | 95.70 | 85.20 | 48.69
RI ≧ 30 | 43 | 95.58 | 85.01 | 49.14 | 401 | 0 | 3 | 34.9 | 43 | 95.70 | 85.44 | 48.69
RI ≧ 30 | 42 | 95.58 | 85.26 | 49.63 | 401 | 0 | 6 | 31.9 | 42 | 95.70 | 85.68 | 49.16
RI ≧ 30 | 41 | 95.58 | 85.50 | 50.61 | 401 | 0 | 7 | 30.0 | 41 | 95.70 | 85.92 | 50.12
RI ≧ 30 | 40 | 95.58 | 85.75 | 50.86 | 401 | 0 | 3 | 30.0 | 40 | 95.70 | 86.16 | 50.36
RI ≧ 30 | 39 | 95.58 | 86.00 | 50.86 | 401 | 0 | 4 | 32.8 | 39 | 95.70 | 86.40 | 50.36
RI ≧ 30 | 38 | 95.58 | 86.24 | 51.11 | 401 | 0 | 5 | 39.4 | 38 | 95.70 | 86.63 | 50.60
RI ≧ 30 | 37 | 95.58 | 86.49 | 51.35 | 401 | 0 | 5 | 31.1 | 37 | 95.70 | 86.87 | 50.84
RI ≧ 30 | 36 | 95.58 | 86.73 | 51.60 | 401 | 0 | 26 | 125.6 | 36 | 95.70 | 87.11 | 51.07
RI ≧ 30 | 35 | 95.58 | 86.98 | 51.60 | 401 | 0 | 4 | 31.3 | 35 | 95.70 | 87.35 | 51.07
RI ≧ 30 | 34 | 95.58 | 87.22 | 51.84 | 401 | 0 | 4 | 31.5 | 34 | 95.70 | 87.59 | 51.31
RI ≧ 30 | 33 | 95.58 | 87.47 | 52.09 | 401 | 0 | 4 | 46.5 | 33 | 95.70 | 87.83 | 51.55
RI ≧ 30 | 32 | 95.58 | 87.71 | 52.09 | 401 | 0 | 6 | 32.4 | 32 | 95.70 | 88.07 | 51.55
RI ≧ 30 | 31 | 95.58 | 87.96 | 52.09 | 401 | 0 | 4 | 36.7 | 31 | 95.70 | 88.31 | 51.55
RI ≧ 30 | 30 | 95.58 | 88.21 | 52.33 | 401 | 0 | 6 | 42.3 | 30 | 95.70 | 88.54 | 51.79
RI ≧ 30 | 29 | 95.58 | 88.45 | 52.33 | 401 | 0 | 5 | 36.2 | 29 | 95.70 | 88.76 | 51.79
RI ≧ 30 | 28 | 95.58 | 88.70 | 52.58 | 401 | 0 | 6 | 30.3 | 28 | 95.70 | 89.02 | 52.03
RI ≧ 30 | 27 | 95.58 | 88.94 | 52.83 | 401 | 0 | 8 | 43.7 | 27 | 95.70 | 89.26 | 52.27
RI ≧ 30 | 26 | 95.58 | 89.19 | 52.83 | 401 | 0 | 7 | 34.3 | 26 | 95.70 | 89.50 | 52.27
RI ≧ 30 | 25 | 95.58 | 89.43 | 53.07 | 401 | 0 | 5 | 41.7 | 25 | 95.70 | 89.74 | 52.51
RI ≧ 30 | 24 | 95.58 | 89.68 | 53.07 | 401 | 0 | 3 | 33.3 | 24 | 95.70 | 89.96 | 52.51
RI ≧ 30 | 23 | 95.58 | 89.93 | 53.56 | 401 | 0 | 5 | 30.7 | 23 | 95.70 | 90.21 | 53.22
RI ≧ 30 | 22 | 95.58 | 90.17 | 53.56 | 401 | 0 | 7 | 34.0 | 22 | 95.70 | 90.45 | 53.22
RI ≧ 30 | 21 | 95.58 | 90.42 | 53.81 | 401 | 0 | 4 | 31.3 | 21 | 95.70 | 90.69 | 53.46
RI ≧ 30 | 20 | 95.58 | 90.66 | 53.81 | 401 | 0 | 5 | 30.7 | 20 | 95.70 | 90.93 | 53.46
RI ≧ 30 | 19 | 95.58 | 90.91 | 53.81 | 401 | 0 | 3 | 34.1 | 19 | 95.70 | 91.17 | 53.46
RI ≧ 30 | 18 | 95.58 | 91.15 | 54.05 | 401 | 0 | 3 | 47.6 | 18 | 95.70 | 91.41 | 53.70
RI ≧ 30 | 17 | 95.58 | 91.40 | 54.30 | 401 | 0 | 4 | 40.4 | 17 | 95.70 | 91.65 | 53.94
RI ≧ 30 | 16 | 95.58 | 91.65 | 54.55 | 401 | 0 | 4 | 32.3 | 16 | 95.70 | 91.89 | 54.18
RI ≧ 30 | 15 | 95.58 | 91.89 | 54.55 | 401 | 0 | 5 | 31.1 | 15 | 95.70 | 92.12 | 54.18
RI ≧ 30 | 14 | 95.58 | 92.14 | 54.55 | 401 | 0 | 6 | 36.6 | 14 | 95.70 | 92.36 | 54.18
RI ≧ 20 | 12 | 95.58 | 92.63 | 55.53 | 401 | 0 | 14 | 23.7 | 12 | 95.70 | 92.84 | 55.13
RI ≧ 20 | 11 | 95.58 | 92.87 | 55.53 | 401 | 0 | 3 | 27.5 | 11 | 95.70 | 93.08 | 55.13
RI ≧ 20 | 10 | 95.58 | 93.12 | 55.77 | 401 | 0 | 5 | 23.4 | 10 | 95.70 | 93.32 | 55.37
RI ≧ 20 | 9 | 95.58 | 93.37 | 56.27 | 401 | 0 | 4 | 23.5 | 9 | 95.70 | 93.56 | 55.85
RI ≧ 20 | 8 | 95.58 | 93.61 | 57.00 | 401 | 0 | 22 | 21.4 | 8 | 95.70 | 93.79 | 56.56
RI ≧ 20 | 7 | 95.58 | 93.86 | 57.49 | 401 | 0 | 8 | 21.1 | 7 | 95.70 | 94.03 | 57.04
RI ≧ 20 | 6 | 95.58 | 94.10 | 57.74 | 401 | 0 | 3 | 23.8 | 6 | 95.70 | 94.27 | 57.28
RI ≧ 20 | 5 | 95.58 | 94.35 | 57.99 | 401 | 0 | 6 | 24.4 | 5 | 95.70 | 94.51 | 57.52
RI ≧ 20 | 4 | 95.58 | 94.59 | 57.99 | 401 | 0 | 4 | 28.0 | 4 | 95.70 | 94.75 | 57.52
RI ≧ 20 | 3 | 95.58 | 94.84 | 58.23 | 401 | 0 | 6 | 25.6 | 3 | 95.70 | 94.99 | 57.76
RI ≧ 20 | 2 | 95.58 | 95.09 | 58.23 | 401 | 0 | 3 | 26.8 | 2 | 95.70 | 95.23 | 57.76
Predicted drivers | 2 | 95.58 | 95.09 | 58.97 | 401 | 0 | 17 | 19.0 | 2 | 95.70 | 95.23 | 56.47
Predicted drivers | 2 | 95.58 | 95.09 | 59.46 | 401 | 0 | 3 | 7.7 | 2 | 95.70 | 95.23 | 58.95
Predicted drivers | 2 | 95.58 | 95.09 | 59.46 | 401 | 0 | 3 | 28.0 | 2 | 95.70 | 95.23 | 58.95
Predicted drivers | 2 | 95.58 | 95.09 | 59.46 | 401 | 0 | 2 | 12.7 | 2 | 95.70 | 95.23 | 58.95
Predicted drivers | 2 | 95.58 | 95.09 | 59.71 | 401 | 0 | 2 | 10.3 | 2 | 95.70 | 95.23 | 59.19
Predicted drivers | 2 | 95.58 | 95.09 | 59.71 | 401 | 0 | 3 | 19.9 | 2 | 95.70 | 95.23 | 59.19
Predicted drivers | 2 | 95.58 | 95.09 | 59.95 | 401 | 0 | 4 | 19.0 | 2 | 95.70 | 95.23 | 59.43
Predicted drivers | 2 | 95.58 | 95.09 | 60.44 | 401 | 0 | 2 | 19.8 | 2 | 95.70 | 95.23 | 59.90
Predicted drivers | 2 | 95.58 | 95.09 | 60.44 | 401 | 0 | 4 | 21.4 | 2 | 95.70 | 95.23 | 59.90
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 3 | 23.6 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 2 | 19.8 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 2 | 19.2 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 1 | 6.2 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 5 | 14.8 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 1 | 6.4 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 6 | 9.0 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 1 | 5.5 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 2 | 19.6 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 60.93 | 401 | 0 | 6 | 9.1 | 2 | 95.70 | 95.23 | 60.38
Predicted drivers | 2 | 95.58 | 95.09 | 61.18 | 401 | 0 | 4 | 25.5 | 2 | 95.70 | 95.23 | 60.62
Predicted drivers | 2 | 95.58 | 95.09 | 61.43 | 401 | 0 | 3 | 8.9 | 2 | 95.70 | 95.23 | 60.86
Predicted drivers | 2 | 95.58 | 95.09 | 61.67 | 401 | 0 | 2 | 15.9 | 2 | 95.70 | 95.23 | 61.10
Predicted drivers | 2 | 95.58 | 95.09 | 61.92 | 401 | 0 | 1 | 5.3 | 2 | 95.70 | 95.23 | 61.34
Predicted drivers | 2 | 95.58 | 95.09 | 61.92 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 61.34
Predicted drivers | 2 | 95.58 | 95.09 | 61.92 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 61.34
Predicted drivers | 2 | 95.58 | 95.09 | 61.92 | 401 | 0 | 3 | 23.6 | 2 | 95.70 | 95.23 | 61.34
Predicted drivers | 2 | 95.58 | 95.09 | 61.92 | 401 | 0 | 1 | 5.3 | 2 | 95.70 | 95.23 | 61.34
Predicted drivers | 2 | 95.58 | 95.09 | 61.92 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 61.34
Predicted drivers | 2 | 95.58 | 95.09 | 61.92 | 401 | 0 | 5 | 23.7 | 2 | 95.70 | 95.23 | 61.34
Predicted drivers | 2 | 95.58 | 95.09 | 61.92 | 401 | 0 | 1 | 6.6 | 2 | 95.70 | 95.23 | 61.34
Predicted drivers | 2 | 95.58 | 95.09 | 62.16 | 401 | 0 | 2 | 10.3 | 2 | 95.70 | 95.23 | 61.58
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 3 | 19.1 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 1 | 9.3 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 2 | 19.6 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 1 | 6.4 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 2 | 15.7 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 1 | 7.8 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 2 | 7.0 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 1 | 6.4 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 1 | 8.3 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 1 | 9.9 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.65 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 62.05
Predicted drivers | 2 | 95.58 | 95.09 | 62.90 | 401 | 0 | 3 | 26.3 | 2 | 95.70 | 95.23 | 62.29
Predicted drivers | 2 | 95.58 | 95.09 | 62.90 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 62.29
Predicted drivers | 2 | 95.58 | 95.09 | 62.90 | 401 | 0 | 4 | 24.7 | 2 | 95.70 | 95.23 | 62.29
Predicted drivers | 2 | 95.58 | 95.09 | 62.90 | 401 | 0 | 7 | 33.2 | 2 | 95.70 | 95.23 | 62.29
Predicted drivers | 2 | 95.58 | 95.09 | 62.90 | 401 | 0 | 1 | 9.8 | 2 | 95.70 | 95.23 | 62.29
Predicted drivers | 2 | 95.58 | 95.09 | 62.90 | 401 | 0 | 5 | 19.2 | 2 | 95.70 | 95.23 | 62.29
Predicted drivers | 2 | 95.58 | 95.09 | 62.90 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 62.29
Predicted drivers | 2 | 95.58 | 95.09 | 63.14 | 401 | 0 | 5 | 8.5 | 2 | 95.70 | 95.23 | 62.77
Predicted drivers | 2 | 95.58 | 95.09 | 63.14 | 401 | 0 | 1 | 8.4 | 2 | 95.70 | 95.23 | 62.77
Predicted drivers | 2 | 95.58 | 95.09 | 63.14 | 401 | 0 | 1 | 9.8 | 2 | 95.70 | 95.23 | 62.77
Predicted drivers | 2 | 95.58 | 95.09 | 63.14 | 401 | 0 | 2 | 10.3 | 2 | 95.70 | 95.23 | 62.77
Predicted drivers | 2 | 95.58 | 95.09 | 63.14 | 401 | 0 | 3 | 9.8 | 2 | 95.70 | 95.23 | 62.77
Predicted drivers | 2 | 95.58 | 95.09 | 63.14 | 401 | 0 | 1 | 6.8 | 2 | 95.70 | 95.23 | 62.77
Predicted drivers | 2 | 95.58 | 95.09 | 63.14 | 401 | 0 | 1 | 7.4 | 2 | 95.70 | 95.23 | 62.77
Predicted drivers | 2 | 95.58 | 95.09 | 63.88 | 401 | 0 | 9 | 15.4 | 2 | 95.70 | 95.23 | 63.48
Predicted drivers | 2 | 95.58 | 95.09 | 64.13 | 401 | 0 | 5 | 18.5 | 2 | 95.70 | 95.23 | 63.72
Predicted drivers | 2 | 95.58 | 95.09 | 64.37 | 401 | 0 | 1 | 9.9 | 2 | 95.70 | 95.23 | 63.96
Predicted drivers | 2 | 95.58 | 95.09 | 64.37 | 401 | 0 | 1 | 8.9 | 2 | 95.70 | 95.23 | 63.96
Predicted drivers | 2 | 95.58 | 95.09 | 64.37 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 63.96
Predicted drivers | 2 | 95.58 | 95.09 | 64.37 | 401 | 0 | 2 | 19.6 | 2 | 95.70 | 95.23 | 63.96
Predicted drivers | 2 | 95.58 | 95.09 | 64.62 | 401 | 0 | 4 | 21.1 | 2 | 95.70 | 95.23 | 64.20
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 3 | 21.0 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 2 | 13.6 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 1 | 4.5 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 2 | 19.8 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 3 | 19.6 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 2 | 18.7 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 1 | 8.8 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 2 | 19.8 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 2 | 12.9 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 64.86 | 401 | 0 | 3 | 29.4 | 2 | 95.70 | 95.23 | 64.44
Predicted drivers | 2 | 95.58 | 95.09 | 65.11 | 401 | 0 | 1 | 8.7 | 2 | 95.70 | 95.23 | 64.68
Predicted drivers | 2 | 95.58 | 95.09 | 65.11 | 401 | 0 | 3 | 28.3 | 2 | 95.70 | 95.23 | 64.68
Predicted drivers | 2 | 95.58 | 95.09 | 65.11 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 64.68
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 2 | 18.5 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 1 | 7.5 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 2 | 13.5 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 5 | 14.6 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 66.36 | 401 | 0 | 2 | 18.2 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 1 | 8.0 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 2 | 12.7 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 2 | 12.1 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 1 | 8.6 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 2 | 11.2 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 1 | 5.9 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 4 | 30.3 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 1 | 9.9 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.36 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 64.92
Predicted drivers | 2 | 95.58 | 95.09 | 65.60 | 401 | 0 | 1 | 6.3 | 2 | 95.70 | 95.23 | 65.16
Predicted drivers | 2 | 95.58 | 95.09 | 65.60 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 65.16
Predicted drivers | 2 | 95.58 | 95.09 | 65.60 | 401 | 0 | 2 | 8.2 | 2 | 95.70 | 95.23 | 65.16
Predicted drivers | 2 | 95.58 | 95.09 | 65.60 | 401 | 0 | 4 | 23.0 | 2 | 95.70 | 95.23 | 65.16
Predicted drivers | 2 | 95.58 | 95.09 | 65.60 | 401 | 0 | 1 | 7.6 | 2 | 95.70 | 95.23 | 65.16
Predicted drivers | 2 | 95.58 | 95.09 | 65.60 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 65.16
Predicted drivers | 2 | 95.58 | 95.09 | 65.60 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 65.16
Predicted drivers | 2 | 95.58 | 95.09 | 65.60 | 401 | 0 | 2 | 10.5 | 2 | 95.70 | 95.23 | 65.16
Predicted drivers | 2 | 95.58 | 95.09 | 66.09 | 401 | 0 | 3 | 28.8 | 2 | 95.70 | 95.23 | 65.63
Predicted drivers | 2 | 95.58 | 95.09 | 66.09 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 65.63
Predicted drivers | 2 | 95.58 | 95.09 | 66.09 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 65.63
Predicted drivers | 2 | 95.58 | 95.09 | 66.09 | 401 | 0 | 8 | 26.8 | 2 | 95.70 | 95.23 | 65.63
Predicted drivers | 2 | 95.58 | 95.09 | 66.34 | 401 | 0 | 1 | 7.0 | 2 | 95.70 | 95.23 | 65.87
Predicted drivers | 2 | 95.58 | 95.09 | 66.34 | 401 | 0 | 2 | 13.9 | 2 | 95.70 | 95.23 | 65.87
Predicted drivers | 2 | 95.58 | 95.09 | 66.34 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 65.87
Predicted drivers | 2 | 95.58 | 95.09 | 66.34 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 65.87
Predicted drivers | 2 | 95.58 | 95.09 | 66.58 | 401 | 0 | 1 | 9.9 | 2 | 95.70 | 95.23 | 66.11
Predicted drivers | 2 | 95.58 | 95.09 | 66.83 | 401 | 0 | 1 | 9.8 | 2 | 95.70 | 95.23 | 66.35
Predicted drivers | 2 | 95.58 | 95.09 | 66.83 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 66.35
Predicted drivers | 2 | 95.58 | 95.09 | 67.57 | 401 | 0 | 3 | 23.4 | 2 | 95.70 | 95.23 | 67.06
Predicted drivers | 2 | 95.58 | 95.09 | 67.57 | 401 | 0 | 1 | 8.5 | 2 | 95.70 | 95.23 | 67.06
Predicted drivers | 2 | 95.58 | 95.09 | 67.57 | 401 | 0 | 1 | 8.3 | 2 | 95.70 | 95.23 | 67.06
Predicted drivers | 2 | 95.58 | 95.09 | 67.57 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.06
Predicted drivers | 2 | 95.58 | 95.09 | 67.57 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.06
Predicted drivers | 2 | 95.58 | 95.09 | 67.57 | 401 | 0 | 2 | 17.9 | 2 | 95.70 | 95.23 | 67.06
Predicted drivers | 2 | 95.58 | 95.09 | 67.57 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.06
Predicted drivers | 2 | 95.58 | 95.09 | 67.57 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.06
Predicted drivers | 2 | 95.58 | 95.09 | 68.06 | 401 | 0 | 3 | 27.0 | 2 | 95.70 | 95.23 | 67.54
Predicted drivers | 2 | 95.58 | 95.09 | 68.06 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.54
Predicted drivers | 2 | 95.58 | 95.09 | 68.06 | 401 | 0 | 2 | 19.8 | 2 | 95.70 | 95.23 | 67.54
Predicted drivers | 2 | 95.58 | 95.09 | 68.06 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.54
Predicted drivers | 2 | 95.58 | 95.09 | 68.06 | 401 | 0 | 2 | 19.6 | 2 | 95.70 | 95.23 | 67.54
Predicted drivers | 2 | 95.58 | 95.09 | 68.06 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.54
Predicted drivers | 2 | 95.58 | 95.09 | 68.06 | 401 | 0 | 1 | 9.8 | 2 | 95.70 | 95.23 | 67.54
Predicted drivers | 2 | 95.58 | 95.09 | 68.06 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.54
Predicted drivers | 2 | 95.58 | 95.09 | 68.06 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.54
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 3 | 20.4 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 1 | 6.8 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 1 | 8.7 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 2 | 12.1 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 3 | 6.7 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 3 | 10.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 2 | 11.6 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 1 | 9.9 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 1 | 6.8 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 2 | 12.7 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 1 | 5.3 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 2 | 19.6 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 1 | 6.4 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 1 | 9.9 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 1 | 7.5 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 2 | 10.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 1 | 9.9 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 1 | 7.8 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 2 | 17.4 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 2 | 16.7 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.30 | 401 | 0 | 0 | 0.0 | 2 | 95.70 | 95.23 | 67.78
Predicted drivers | 2 | 95.58 | 95.09 | 68.55 | 401 | 0 | 1 | 7.4 | 2 | 95.70 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 1 | 4 | 21.3 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 2 | 13.1 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 1 | 9.8 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 1 | 6.8 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 1 | 3.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 2 | 13.5 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 3 | 22.7 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 2 | 11.5 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 4 | 16.5 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.55 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.02
Predicted drivers | 3 | 95.82 | 95.09 | 68.80 | 402 | 0 | 1 | 9.9 | 3 | 95.94 | 95.23 | 68.26
Predicted drivers | 3 | 95.82 | 95.09 | 68.80 | 402 | 0 | 1 | 5.2 | 3 | 95.94 | 95.23 | 68.26
Predicted drivers | 3 | 95.82 | 95.09 | 68.80 | 402 | 0 | 1 | 9.3 | 3 | 95.94 | 95.23 | 68.26
Predicted drivers | 3 | 95.82 | 95.09 | 68.80 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.26
Predicted drivers | 3 | 95.82 | 95.09 | 68.80 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.26
Predicted drivers | 3 | 95.82 | 95.09 | 69.04 | 402 | 0 | 1 | 6.3 | 3 | 95.94 | 95.23 | 68.50
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 3 | 17.1 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 1 | 7.4 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 1 | 8.7 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 1 | 7.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 1 | 4.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 1 | 9.9 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 2 | 4.5 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 1 | 7.1 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 1 | 8.1 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 2 | 10.9 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 1 | 4.7 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 2 | 12.3 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.29 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 68.74
Predicted drivers | 3 | 95.82 | 95.09 | 69.53 | 402 | 0 | 1 | 9.9 | 3 | 95.94 | 95.23 | 68.97
Predicted drivers | 3 | 95.82 | 95.09 | 69.78 | 402 | 0 | 2 | 13.5 | 3 | 95.94 | 95.23 | 69.21
Predicted drivers | 3 | 95.82 | 95.09 | 69.78 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 69.21
Predicted drivers | 3 | 95.82 | 95.09 | 69.78 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 69.21
Predicted drivers | 3 | 95.82 | 95.09 | 69.78 | 402 | 0 | 2 | 4.6 | 3 | 95.94 | 95.23 | 69.21
Predicted drivers | 3 | 95.82 | 95.09 | 69.78 | 402 | 0 | 3 | 8.8 | 3 | 95.94 | 95.23 | 69.21
Predicted drivers | 3 | 95.82 | 95.09 | 69.78 | 402 | 0 | 3 | 14.7 | 3 | 95.94 | 95.23 | 69.21
Predicted drivers | 3 | 95.82 | 95.09 | 70.02 | 402 | 0 | 3 | 15.4 | 3 | 95.94 | 95.23 | 69.45
Predicted drivers | 3 | 95.82 | 95.09 | 70.27 | 402 | 0 | 2 | 14.1 | 3 | 95.94 | 95.23 | 69.69
Predicted drivers | 3 | 95.82 | 95.09 | 70.27 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 69.69
Predicted drivers | 3 | 95.82 | 95.09 | 70.52 | 402 | 0 | 1 | 4.6 | 3 | 95.94 | 95.23 | 69.93
Predicted drivers | 3 | 95.82 | 95.09 | 70.52 | 402 | 0 | 1 | 9.9 | 3 | 95.94 | 95.23 | 69.93
Predicted drivers | 3 | 95.82 | 95.09 | 70.52 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 69.93
Predicted drivers | 3 | 95.82 | 95.09 | 70.76 | 402 | 0 | 2 | 19.4 | 3 | 95.94 | 95.23 | 70.17
Predicted drivers | 3 | 95.82 | 95.09 | 71.01 | 402 | 0 | 1 | 7.0 | 3 | 95.94 | 95.23 | 70.41
Predicted drivers | 3 | 95.82 | 95.09 | 71.01 | 402 | 0 | 2 | 15.6 | 3 | 95.94 | 95.23 | 70.41
Predicted drivers | 3 | 95.82 | 95.09 | 71.01 | 402 | 0 | 2 | 15.0 | 3 | 95.94 | 95.23 | 70.41
Predicted drivers | 3 | 95.82 | 95.09 | 71.01 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 70.41
Predicted drivers | 3 | 95.82 | 95.09 | 71.25 | 402 | 0 | 2 | 12.6 | 3 | 95.94 | 95.23 | 70.64
Predicted drivers | 3 | 95.82 | 95.09 | 71.25 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 70.64
Predicted drivers | 3 | 95.82 | 95.09 | 71.25 | 402 | 0 | 1 | 7.6 | 3 | 95.94 | 95.23 | 70.64
Predicted drivers | 3 | 95.82 | 95.09 | 71.50 | 402 | 0 | 1 | 9.8 | 3 | 95.94 | 95.23 | 70.88
Predicted drivers | 3 | 95.82 | 95.09 | 71.50 | 402 | 0 | 1 | 6.9 | 3 | 95.94 | 95.23 | 70.88
Predicted drivers | 3 | 95.82 | 95.09 | 71.50 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 70.88
Predicted drivers | 3 | 95.82 | 95.09 | 71.50 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 70.88
Predicted drivers | 3 | 95.82 | 95.09 | 71.50 | 402 | 0 | 1 | 6.9 | 3 | 95.94 | 95.23 | 70.88
Predicted drivers | 3 | 95.82 | 95.09 | 71.50 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 70.88
Predicted drivers | 3 | 95.82 | 95.09 | 71.50 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 70.88
Predicted drivers | 3 | 95.82 | 95.09 | 71.50 | 402 | 0 | 1 | 9.8 | 3 | 95.94 | 95.23 | 70.88
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 2 | 9.9 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 1 | 9.1 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 1 | 2.6 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 2 | 16.9 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 3 | 95.82 | 95.09 | 71.74 | 402 | 0 | 0 | 0.0 | 3 | 95.94 | 95.23 | 71.12
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 1 | 23 | 3.5 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 0 | 1 | 8.8 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 0 | 1 | 4.2 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 0 | 1 | 5.8 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 72.97 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.32
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 9.9 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 7.9 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 5.8 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 7.6 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 2 | 11.7 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 4.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 7.8 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 5.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 9.9 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 5.8 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 2 | 11.2 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 9.9 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 9.6 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 4.2 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 4.6 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.22 | 403 | 0 | 1 | 9.8 | 4 | 96.18 | 95.23 | 72.55
Predicted drivers | 4 | 96.07 | 95.09 | 73.46 | 403 | 0 | 3 | 21.1 | 4 | 96.18 | 95.23 | 72.79
Predicted drivers | 4 | 96.07 | 95.09 | 73.46 | 403 | 0 | 1 | 8.5 | 4 | 96.18 | 95.23 | 72.79
Predicted drivers | 4 | 96.07 | 95.09 | 73.46 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.79
Predicted drivers | 4 | 96.07 | 95.09 | 73.46 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 72.79
Predicted drivers | 4 | 96.07 | 95.09 | 73.46 | 403 | 0 | 1 | 9.8 | 4 | 96.18 | 95.23 | 72.79
Predicted drivers | 4 | 96.07 | 95.09 | 73.71 | 403 | 0 | 1 | 8.6 | 4 | 96.18 | 95.23 | 73.03
Predicted drivers | 4 | 96.07 | 95.09 | 73.71 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.03
Predicted drivers | 4 | 96.07 | 95.09 | 73.96 | 403 | 0 | 1 | 7.2 | 4 | 96.18 | 95.23 | 73.27
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 2 | 10.8 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 1 | 3.8 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 1 | 9.8 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 2 | 4.6 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 1 | 8.9 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Predicted drivers | 4 | 96.07 | 95.09 | 74.45 | 403 | 0 | 0 | 0.0 | 4 | 96.18 | 95.23 | 73.75
Add fusions | — | — | — | — | — | — | — | — | — | — | — | —
Add fusions | — | — | — | — | — | — | — | — | — | — | — | —
Add fusions | — | — | — | — | — | — | — | — | — | — | — | —
Add fusions | — | — | — | — | — | — | — | — | — | — | — | —
Add fusions | — | — | — | — | — | — | — | — | — | — | — | —
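In the "Max coverage" rows of the table above, every added exon brings at least one new sample into coverage ("Samples gained" ≧1). The Python sketch below shows one way such a phase could work: a greedy maximum-coverage pass that repeatedly adds the exon whose mutated samples include the most samples not yet covered. This is an assumption made for illustration; the text does not state the exact selection rule. The function name `greedy_max_coverage`, its inputs, and the alphabetical tie-breaking are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_max_coverage(
    exon_to_samples: Dict[str, Set[str]],
    already_covered: Set[str],
    max_exons: int,
) -> List[Tuple[str, int]]:
    """Hypothetical greedy pass: repeatedly take the exon that adds the most
    not-yet-covered samples. Returns (exon_id, samples_gained) in pick order."""
    covered = set(already_covered)
    remaining = dict(exon_to_samples)
    picked: List[Tuple[str, int]] = []
    while remaining and len(picked) < max_exons:
        # max() keeps the first maximal item, so sorted order breaks ties by lowest ID.
        best = max(sorted(remaining), key=lambda e: len(remaining[e] - covered))
        gain = len(remaining[best] - covered)
        if gain == 0:
            break  # no remaining exon adds any new sample
        covered |= remaining.pop(best)
        picked.append((best, gain))
    return picked


# Toy input (hypothetical exon and sample IDs); sample s1 is already covered.
exons = {"exA": {"s1", "s2", "s3"}, "exB": {"s3", "s4"}, "exC": {"s5"}, "exD": {"s1"}}
picks = greedy_max_coverage(exons, already_covered={"s1"}, max_exons=10)
print(picks)  # [('exA', 2), ('exB', 1), ('exC', 1)]
```

Greedy selection is the standard simple heuristic for maximum coverage. Exon exD is never picked because it adds no new sample, which mirrors how this phase, as sketched, would stop adding exons once coverage stops improving.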

FIG. 3 illustrates how statistical enrichment of recurrently mutated NSCLC exons captures known drivers. Two metrics were used to prioritize exons with recurrent mutations for inclusion in the CAPP-Seq NSCLC selector. The first, termed the Recurrence Index (RI), is the number of unique patients (i.e., tumors) with somatic mutations per kilobase of a given exon; the second is based on the minimum number of unique patients (i.e., tumors) with mutations in a given kb of exon. Exons containing at least one non-silent SNV genotyped by TCGA (n=47,769) in a combined cohort of 407 lung adenocarcinoma (LUAD) and squamous cell carcinoma (SCC) patients were analyzed. As shown in FIG. 3(a), known/suspected NSCLC drivers are highly enriched at RI≥30 (inset), a threshold met by 1.8% (n=861) of analyzed exons. As shown in FIG. 3(b), known/suspected NSCLC drivers are also highly enriched among exons with ≥3 mutated patients (inset), which encompass 16% of analyzed exons.
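The two per-exon metrics can be illustrated with a minimal sketch. It is not the authors' pipeline: the input layout (one `(patient_id, exon_id)` pair per non-silent SNV) and the helper names `exon_metrics` and `prioritize` are hypothetical, and the "pass either threshold" rule in `prioritize` is an assumption that merely reuses the cut-offs highlighted in FIG. 3.

```python
from collections import defaultdict


def exon_metrics(mutations, exon_lengths_bp):
    """Return exon_id -> (unique mutated patients, Recurrence Index).

    mutations: iterable of (patient_id, exon_id) pairs, one per non-silent SNV.
    exon_lengths_bp: dict exon_id -> exon length in base pairs.
    """
    patients = defaultdict(set)
    for patient_id, exon_id in mutations:
        patients[exon_id].add(patient_id)  # a patient counts once per exon
    metrics = {}
    for exon_id, pts in patients.items():
        n = len(pts)
        ri = n * 1000 / exon_lengths_bp[exon_id]  # unique patients per kb of exon
        metrics[exon_id] = (n, ri)
    return metrics


def prioritize(metrics, min_ri=30.0, min_patients=3):
    # Hypothetical combination rule: keep exons passing either FIG. 3 threshold
    return sorted(e for e, (n, ri) in metrics.items() if ri >= min_ri or n >= min_patients)
```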

Approximately 8% of NSCLCs contain clinically actionable rearrangements involving the receptor tyrosine kinases, ALK, ROS1 and RET (Bergethon et al. (2012) J. Clin. Oncol. 30:863-870; Kwak et al. (2010) N. Engl. J. Med. 363:1693-1703; Pao & Hutchinson (2012) Nat. Med. 18:349-351). To utilize the personalized nature and low false detection rate of structural rearrangements (Leary et al. (2010) Sci. Transl. Med. 2:20ra14; McBride et al. (2010) Genes Chrom. Cancer 49:1062-1069), introns and exons spanning recurrent fusion breakpoints in these genes were included in the final design phase (FIG. 1b). To detect fusions in tumor and plasma DNA, a breakpoint-mapping algorithm called FACTERA was developed (FIG. 4). Application of FACTERA to next generation sequencing (NGS) data from 2 NSCLC cell lines known to harbor fusions with previously uncharacterized breakpoints (Koivunen et al. (2008) Clin. Cancer Res. 14:4275-4283; Rikova et al. (2007) Cell 131:1190-1203) readily identified the breakpoints in both cases (FIG. 5).

Collectively, the NSCLC CAPP-Seq selector design targets 521 exons and 13 introns from 139 recurrently mutated genes, in total covering ~125 kb (FIG. 1b). Within this small target (0.004% of the human genome), CAPP-Seq identifies a median of 4 point mutations per patient and covers 96% of patients with lung adenocarcinoma or squamous cell carcinoma. To validate the number of mutations covered per tumor, we examined the selector region in WES data from an independent cohort of 183 lung adenocarcinoma patients (Imielinski et al. (2012) Cell 150:1107-1120). The selector covered 88% of patients with a median of 4 SNVs per patient, thus validating our selector design algorithm (P<1.0×10⁻⁶; FIG. 1c). Compared with random sampling of the exome, regions targeted by CAPP-Seq captured ~4-fold as many mutations per patient (at the median; FIG. 1c). Owing to similarities in key oncogenic machinery across cancers (Hanahan & Weinberg (2011) Cell 144:646-674), we hypothesized that our NSCLC selector would perform favorably on other carcinomas. Indeed, when applied to TCGA WES data, the selector captured 99% of colon, 98% of rectal, and 97% of endometrioid uterine carcinomas, with a median of 12, 7, and 3 mutations per patient, respectively (FIG. 1d). This demonstrates the value of targeting hundreds of recurrently mutated genomic regions and suggests that a CAPP-Seq selector could be designed to cover mutations simultaneously across a wide variety of human malignancies.
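The in silico coverage check described above (fraction of patients with at least one SNV in the selector, and median SNVs per patient) can be sketched as follows. This is an illustration under stated assumptions: the selector is held as sorted, non-overlapping half-open intervals per chromosome, the helper names are hypothetical, and the median here is taken over all patients, including those with zero selector SNVs, which may differ from the convention used in FIG. 1c.

```python
from bisect import bisect_right
from statistics import median


def in_selector(selector, chrom, pos):
    """selector: dict chrom -> sorted, non-overlapping (start, end) half-open intervals."""
    intervals = selector.get(chrom, [])
    starts = [s for s, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < intervals[i][1]


def selector_performance(selector, patient_snvs):
    """patient_snvs: dict patient_id -> list of (chrom, pos) somatic SNVs.

    Returns (fraction of patients with >=1 SNV in the selector,
             median number of selector SNVs per patient).
    """
    counts = [sum(in_selector(selector, c, p) for c, p in snvs)
              for snvs in patient_snvs.values()]
    covered = sum(1 for n in counts if n > 0) / len(counts)
    return covered, median(counts)
```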

Using this CAPP-Seq selector, we profiled a total of 52 samples, including NSCLC cell lines, primary tumor specimens, peripheral blood leukocytes (PBLs), and cfDNA isolated from plasma of patients with NSCLC before and after various cancer therapies (Table 2). To assess and optimize the performance of CAPP-Seq, we first applied it to cfDNA purified from healthy control plasma. Approximately 60% of reads mapped within the selector target region (Table 2). Sequenced cfDNA fragments had a median length of 169 bp (FIG. 1e), closely corresponding to the length of DNA contained within a chromatosome (Fan et al. (2008) Proc. Natl Acad. Sci. USA 105:16266-16271). To optimize library preparation from small quantities of cfDNA, we explored a variety of modifications to the ligation and post-ligation amplification steps, including temperature, incubation time, enzyme source, and “with-bead” clean-up. The optimized protocol increased recovery efficiency by >300% and decreased bias for libraries constructed from as little as 4 ng of cfDNA (FIGS. 6-8). Consequently, fluctuations in sequencing depth were minimal (FIG. 1f-g) and unlikely to impact performance.

TABLE 2 Profile of samples using NSCLC CAPP-Seq selector

Sample | DNA mass used for library (ng) | Library mass used for capture (ng) | Fraction of total reads mapped | Properly paired reads | Read on-target rate | Median depth | Median fragment length (bp)
H3122 0.1% into HCC78 | 128 | 111 | 99.0% | 96.8% | 69.5% | 8688 | 173
H3122 1% into HCC78 | 128 | 111 | 98.9% | 96.7% | 69.8% | 8657 | 171
H3122 10% into HCC78 | 128 | 111 | 98.9% | 96.5% | 69.8% | 6890 | 170
H3122 100% | 128 | 111 | 99.0% | 96.8% | 68.6% | 6739 | 174
HCC78 100% | 128 | 111 | 99.0% | 96.9% | 69.7% | 7602 | 172
cfDNA 100% 6 cycles | 32 | 83.3 | 97.5% | 86.7% | 60.3% | 8280 | 168
HCC78 10% into cfDNA 4 cycles | 128 | 83.3 | 97.5% | 83.3% | 59.3% | 2682 | 170
HCC78 10% into cfDNA 8 cycles SigmaWGA | 624 | 83.3 | 79.5% | 72.0% | 50.4% | 15 | 158
HCC78 10% into cfDNA 6 cycles | 32 | 83.3 | 97.7% | 87.2% | 60.4% | 8261 | 169
HCC78 10% into cfDNA 8 cycles NEBNextOvernightBead | 32 | 83.3 | 96.9% | 91.8% | 61.1% | 6258 | 166
HCC78 10% into cfDNA 8 cycles OrigNEBNext 15 minLig | 32 | 83.3 | 98.0% | 93.1% | 60.9% | 9862 | 167
HCC78 10% into cfDNA 4 ng 9 cycles | 4 | 83.3 | 97.6% | 87.6% | 60.5% | 11630 | 169
P11 PBL | 500 | 83.3 | 96.7% | 93.8% | 59.0% | 6970 | 169
P11 Tumor | 500 | 83.3 | 93.4% | 88.3% | 61.3% | 7700 | 156
P6 PBL | 500 | 83.3 | 96.7% | 92.6% | 67.2% | 3848 | 152
P6 Tumor | 1000 | 83.3 | 87.0% | 81.8% | 64.7% | 2445 | 158
P8 PBL | 500 | 83.3 | 96.9% | 93.0% | 65.8% | 4021 | 154
P8 Tumor | 500 | 83.3 | 91.7% | 85.4% | 63.6% | 5331 | 151
P10 PBL | 400 | 83.3 | 96.9% | 93.6% | 65.3% | 4572 | 161
P10 Tumor | 500 | 83.3 | 94.0% | 89.6% | 65.1% | 5335 | 157
P7 PBL | 500 | 83.3 | 97.1% | 93.5% | 67.1% | 3552 | 155
P7 Tumor | 500 | 83.3 | 94.1% | 89.3% | 64.0% | 4793 | 162
HCC78 0.025% into cfDNA | 32 | 83.3 | 98.2% | 87.0% | 46.3% | 3913 | 169
HCC78 0.05% into cfDNA | 32 | 83.3 | 98.1% | 86.1% | 44.7% | 6549 | 169
HCC78 0.1% into cfDNA | 32 | 83.3 | 98.4% | 88.1% | 44.9% | 6897 | 169
HCC78 0.5% into cfDNA | 32 | 83.3 | 98.8% | 89.8% | 46.2% | 8096 | 169
HCC78 1% into cfDNA | 32 | 83.3 | 98.5% | 89.8% | 46.5% | 7779 | 171
P6-1 cfDNA | 17 | 83.3 | 98.6% | 91.3% | 46.4% | 11172 | 166
P6-2 cfDNA | 20 | 83.3 | 98.5% | 92.0% | 46.6% | 8455 | 166
P9 PBL | 500 | 83.3 | 97.0% | 94.4% | 59.2% | 5441 | 172
P9 Tumor | 69 | 83.3 | 99.2% | 97.3% | 55.3% | 7312 | 239
P3 PBL | 500 | 83.3 | 99.3% | 97.8% | 57.0% | 8838 | 235
P3 Tumor | 500 | 83.3 | 99.3% | 98.0% | 66.0% | 9562 | 204
P2 PBL | 500 | 83.3 | 99.2% | 97.5% | 57.7% | 7680 | 235
P2 Tumor | 500 | 83.3 | 99.0% | 97.1% | 62.3% | 7247 | 204
P4 PBL | 500 | 83.3 | 99.1% | 96.5% | 56.5% | 7331 | 227
P4 Tumor | 200 | 83.3 | 97.5% | 94.1% | 60.0% | 3968 | 189
P1 PBL | 500 | 83.3 | 99.3% | 97.1% | 57.1% | 7336 | 220
P1 Tumor | 500 | 83.3 | 94.6% | 90.1% | 60.9% | 976 | 192
P5 PBL | 500 | 83.3 | 99.2% | 97.2% | 58.7% | 8155 | 219
P5 Tumor | 100 | 83.3 | 98.8% | 97.0% | 63.5% | 6930 | 187
P9-1 cfDNA | 12 | 83.3 | 99.1% | 84.2% | 65.6% | 6839 | 172
P9-2 cfDNA | 17 | 83.3 | 98.4% | 83.9% | 65.2% | 6043 | 169
P9-3 cfDNA | 16 | 83.3 | 99.4% | 88.7% | 67.6% | 8141 | 167
P3-1 cfDNA | 15 | 83.3 | 99.2% | 86.0% | 63.5% | 7057 | 170
P3-2 cfDNA | 16 | 83.3 | 99.3% | 86.5% | 63.5% | 10089 | 171
P2-1 cfDNA | 13 | 83.3 | 99.4% | 86.9% | 67.3% | 6876 | 172
P2-2 cfDNA | 16 | 83.3 | 99.5% | 96.4% | 63.6% | 5248 | 185
P1-1 cfDNA | 13 | 83.3 | 99.0% | 85.0% | 64.6% | 5079 | 171
P1-2 cfDNA | 7 | 83.3 | 99.4% | 84.7% | 64.1% | 6487 | 172
P5-1 cfDNA | 9 | 83.3 | 99.3% | 87.8% | 66.6% | 7604 | 169
P5-2 cfDNA | 15 | 83.3 | 99.4% | 88.0% | 67.5% | 10451 | 170

FIG. 6 illustrates the improvements in CAPP-Seq performance achieved with optimized library preparation procedures. Using 32 ng of input cfDNA from plasma, standard and “with bead” (Fisher et al. (2011) Genome Biol. 12:R1) library preparation methods were compared, as were two commercially available DNA polymerases (Phusion and KAPA HiFi). Template pre-amplification by whole genome amplification (WGA) using degenerate oligonucleotide-primed PCR (DOP) was also evaluated. Indices considered for these comparisons included (a) length of the captured cfDNA fragments sequenced, (b) depth and uniformity of sequencing coverage across all genomic regions in the selector, and (c) sequence mapping and capture statistics, including uniqueness. Collectively, these comparisons identified KAPA HiFi polymerase and a “with bead” protocol as having the most robust and uniform performance.

FIG. 7 illustrates the optimization of allele recovery from low-input cfDNA during Illumina library preparation. Bars reflect the relative yield of CAPP-Seq libraries constructed from 4 ng cfDNA, calculated by averaging quantitative PCR measurements of 4 pre-selected reporters within CAPP-Seq with pre-defined amplification efficiencies. (a) A sixteen-hour ligation at 16° C. increases ligation efficiency and reporter recovery. (b) Adapter ligation volume did not have a significant effect on ligation efficiency and reporter recovery. (c) Performing enzymatic reactions “with-bead” to minimize tube transfer steps increases reporter recovery. (d) Increasing adapter concentration during ligation increases ligation efficiency and reporter recovery. Reporter recovery is also higher when using KAPA HiFi DNA polymerase compared to Phusion DNA polymerase (e) and when using the KAPA Library Preparation Kit with the modifications in a-d compared to the NuGEN SP Ovation Ultralow Library System with automation on a Mondrian SP Workstation (f). Relative reporter abundance was determined by qPCR using the 2^−ΔCt method. All values are mean±s.d. N.S., not significant. Based on these results, combining the methodological modifications in a and c-e was estimated to improve NGS library yield 3.3-fold.
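The 2^−ΔCt calculation behind these relative yields can be sketched as below. This is a hedged illustration, not the authors' script: the helper name `relative_yield` is hypothetical, ΔCt is taken per reporter as Ct(test condition) minus Ct(baseline condition), and the plain 2^−ΔCt form assumes roughly 100% amplification efficiency rather than the reporter-specific efficiencies mentioned above.

```python
from statistics import mean


def relative_yield(ct_test, ct_baseline):
    """Average 2^-dCt across reporters, where dCt = Ct(test) - Ct(baseline).

    Assumes ~100% PCR efficiency (one doubling per cycle); a lower Ct in the
    test library therefore means more recovered template.
    """
    if len(ct_test) != len(ct_baseline):
        raise ValueError("need one Ct pair per reporter")
    return mean(2 ** -(t - b) for t, b in zip(ct_test, ct_baseline))
```

For example, a condition that lowers every reporter's Ct by one cycle relative to baseline yields a relative value of 2.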

FIG. 8 illustrates the performance of CAPP-Seq with various amounts of input cfDNA. (a) Length of the captured cfDNA fragments sequenced. (b) Depth of sequencing coverage across all genomic regions in the selector. (c) Sequence mapping and capture statistics. As expected, greater input cfDNA mass correlates with more unique fragments sequenced.

The detection limit of CAPP-Seq is affected by the absolute number of available cfDNA molecules in a given volume of peripheral blood, as well as by PCR and sequencing errors (i.e., “technical” background). The latter primarily affects substitutions/SNVs as opposed to other CAPP-Seq reporters (i.e., indels (Minoche et al. (2011) Genome Biol. 12:R112) and rearrangements). Separately, mutant cfDNA could be present in the absence of cancer due to contributions from pre-neoplastic cells from diverse tissues (i.e., “biological” background). The combined background from these sources was measured by assessing the error rate at each nucleotide position across the selector in plasma cfDNA from 6 patients and a healthy individual, excluding tumor-derived mutations. Mean and median background rates of ~0.007% and ~0% (not detected, N.D.), respectively, were found (FIG. 9(a)). We next hypothesized that if significant biological background were present, it should be highest at recurrently mutated positions in cancer driver genes. We therefore analyzed mutation rates of 107 recurrent cancer-associated SNVs (Su et al. (2011) J. Mol. Diagn. 13:74-84) in the same 7 plasma samples, again excluding SNVs found in the corresponding tumors. Though the median fractional abundance was comparable (~0%, N.D.), the mean was marginally higher at 0.012% (FIG. 9(b)). However, only one cancer-associated mutation (TP53 R175H) was detectable in plasma at levels significantly above global background (P<0.01). Since this allele was detected at a median frequency of ~0.3% across all samples (FIG. 9(c)), we hypothesized that it reflects true biological background and therefore excluded it as a potential CAPP-Seq reporter. Collectively, this analysis suggests that biological background is not a significant factor for disease monitoring at the current detection limits of CAPP-Seq.
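The per-position background measurement can be sketched as follows. This is a minimal illustration under assumptions: each selector position is summarized as `(ref_count, depth)` after base-quality filtering, the background rate is the non-reference fraction at that position, and `background_rates` is a hypothetical helper rather than the authors' code.

```python
from statistics import mean, median


def background_rates(pileup, excluded):
    """Per-position non-reference fractions, excluding known variant sites.

    pileup: dict (chrom, pos) -> (ref_count, depth), counted after Q30 filtering.
    excluded: set of (chrom, pos) carrying germline or tumor-derived variants.
    Returns (rates, mean rate, median rate).
    """
    rates = [
        (depth - ref_count) / depth
        for key, (ref_count, depth) in pileup.items()
        if key not in excluded and depth > 0
    ]
    return rates, mean(rates), median(rates)
```

Because most positions show no non-reference reads at all, the median of such a distribution is typically 0 while a few noisy positions pull the mean slightly above it, which mirrors the mean/median pattern reported above.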

Next, the allele frequency detection limit and linearity of CAPP-Seq were benchmarked by spiking defined concentrations of fragmented genomic DNA from an NSCLC cell line into cfDNA from a healthy individual (FIG. 9(d)) or into genomic DNA from a second NSCLC line (FIG. 10(a)). CAPP-Seq accurately detected variants at fractional abundances between 0.025% and 10% with high linearity (R²≥0.994). Analysis of the influence of the number of SNV reporters on error metrics showed only marginal improvements above a threshold of 4 reporters per tumor (FIGS. 9(e)-(f), 10(b)-(c)), equivalent to the median number of SNVs per NSCLC identified by the NSCLC selector. Finally, we tested whether fusion breakpoints and indels could also serve as linear reporters and found that their fractional abundance correlated highly with expected concentrations (R²≥0.995; FIG. 10(d)).
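The linearity metric can be sketched as the squared Pearson correlation between expected and observed fractional abundances. This is a hedged illustration: the helper name `linearity_r2` is hypothetical, and the source does not state whether R² was computed on a linear or a logarithmic scale. The sketch uses the linear scale.

```python
def linearity_r2(expected, observed):
    """Squared Pearson correlation between expected and observed abundances."""
    n = len(expected)
    mx = sum(expected) / n
    my = sum(observed) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(expected, observed))
    sxx = sum((x - mx) ** 2 for x in expected)
    syy = sum((y - my) ** 2 for y in observed)
    return sxy * sxy / (sxx * syy)
```

A constant recovery bias (e.g., every observed value at 90% of expectation) leaves R² at 1, so R² measures linearity rather than absolute accuracy.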

Having designed, optimized, and benchmarked CAPP-Seq, we applied it to the discovery of somatic mutations in tumors collected from a diverse group of NSCLC patients (n=11; FIG. 11(a) and Table 3). To test the breakpoint enumeration capability of CAPP-Seq, 6 patients with clinically confirmed fusions were included. These translocations served as positive controls, along with SNVs in other tumors previously identified by clinical assays (N=9; Table 3). Tumor samples included formalin-fixed surgical or biopsy specimens and pleural fluid. At a mean sequencing depth of ~6,000× in tumor and paired germline samples, CAPP-Seq confirmed all previously identified SNVs and fusions (3 and 8, respectively) and discovered many additional somatic variants (FIG. 11(a) and Table 3). Moreover, CAPP-Seq characterized breakpoints and partner genes at base-pair resolution for each of the 8 rearrangements (FIG. 12). Tumors containing fusions were almost exclusively from never-smokers and, as expected (Govindan et al. (2012) Cell 150:1121-1134), contained fewer SNVs than those lacking fusions (FIG. 13). Excluding patients with fusions (<10% of the TCGA design cohort), CAPP-Seq identified a median of 4 SNVs per patient, as predicted (FIG. 1(b)-(c)).

TABLE 3 Characteristics of patients used for noninvasive detection and monitoring of circulating tumor DNA by CAPP-Seq.

Case | Age | Sex | Histology | Grade and other histological features | TNM stage | Stage group | Smoker | Pack-years | Tumor source | Germline source | SNVs by clinical assays | Fusions detected by FISH
P1 | 66 | M | Adenocarcinoma | Papillary type | T2aN0M0 | IB | Yes | 20 | FFPE cores | Frozen PBL | — | —
P2 | 61 | M | Large cell | NOS | T3N1M0 | IIIA | Yes | 80 | FFPE cores | Frozen PBL | — | —
P3 | 67 | F | Adenocarcinoma | Acinar type | T1bN3M0 | IIIB | Yes | 15 | FFPE cores | Frozen PBL | — | —
P4 | 47 | F | Adenocarcinoma | Micropapillary and papillary type | T2aN2M1b | IV | Yes | 45 | FFPE cores | Frozen PBL | KRAS G13D | —
P5 | 49 | F | Adenocarcinoma | Well differentiated | T1bN0M1a | IV | No | 0 | FFPE cores | Frozen PBL | EGFR L858R; EGFR T790M | —
P6 | 54 | M | Adenocarcinoma | NOS | T3N2M1b | IV | No | 0 | Fresh | Frozen PBL | — | ALK
P7 | 50 | M | Adenocarcinoma | Poorly differentiated | T1aN2M1b | IV | Yes | 4 | FFPE cores | Frozen PBL | — | ALK
P8 | 48 | F | Adenocarcinoma | Mucinous type | T4N0M1b | IV | No | 0 | FFPE cores | Frozen PBL | — | ALK
P9 | 49 | M | Adenocarcinoma | Not otherwise specified (NOS) | T4N3M1a | IV | No | 0 | Fresh | Frozen PBL | — | ALK
P10 | 35 | F | Adenocarcinoma | NOS | T4N0M0 | IIIA | No | 0 | FFPE cores | Frozen PBL | — | ROS1
P11 | 38 | F | Adenocarcinoma | Well-to-moderately differentiated | T3N2M0 | IIIA | No | 0 | FFPE cores | Frozen PBL | — | ROS1

Related to FIGS. 11(a) and 14: regarding smoking history, ≥20 pack-years was considered heavy and >0 pack-years was considered light.

To explore the potential clinical utility of CAPP-Seq for disease monitoring and minimal residual disease detection, we next applied CAPP-Seq to serial plasma samples collected from a subset of these same 11 patients (N=6), all of whom had pre- and post-treatment samples available (FIG. 11; Table 4). Starting from ~15 ng of plasma cfDNA (~3 mL of peripheral blood) sequenced to a mean depth of nearly 8,000× (Table 2), CAPP-Seq detected cancer-derived cfDNA in both early and advanced stage patients (Table 4). Among patients with SNV or indel reporters, all showed a significant reduction in cancer cfDNA burden following treatment, consistent with radiographic response assessment by computed tomography (CT) (FIG. 11(a)). These included two patients, one with stage IB adenocarcinoma (P1) and another with stage IIIA large cell carcinoma (P2), who underwent surgery with complete tumor resection (FIG. 11(b)). Post-treatment cancer-derived cfDNA was undetectable in the stage I patient but was above background in the stage IIIA patient, suggesting that residual cancer cells remained after surgery even though a complete resection was thought to have been achieved. In a third case (P6), CAPP-Seq detected 3 SNVs and a KIF5B-ALK fusion, and both mutation types reported similar fractional abundances of mutant cfDNA (FIG. 14). Next, we analyzed a patient (P9) with 3 fusions and no detectable SNVs/indels, from whom 3 serial cfDNA samples were collected. The abundance of fusion products in the plasma was highly correlated with tumor burden and correctly indicated initial response to therapy followed by relapse (R²=0.97; FIG. 11(c)). Finally, in a fifth patient (P5), CAPP-Seq identified a subclonal population harboring the EGFR T790M gatekeeper mutation (Kobayashi et al. (2005) N. Engl. J. Med. 352:786-792) (FIG. 11(d)).
The ratio between clones was identical in the tumor and pre-treatment plasma cfDNA but changed after treatment with cytotoxic chemotherapy followed by a third-generation EGFR inhibitor (FIG. 11(d), inset), suggesting that CAPP-Seq can detect clinically relevant subclones and monitor clonal dynamics during therapy. Taken together, these data demonstrate the potential utility of CAPP-Seq as a noninvasive clinical assay for measuring tumor burden in early and advanced stage NSCLC and for monitoring tumor-derived cfDNA during therapy.

TABLE 4 Monitoring of cfDNA in patients using CAPP-Seq. Each time-point cell lists mutant allele depth / total depth / mutant allele % / final allele %.

Case | Mutant allele | Ref. allele | Chr | Position | Time point 1 | Time point 2 | Time point 3
P1 | A | G | chr1 | 156785560 | 0 / 4572 / 0.000 / 0.000 | 3 / 6202 / 0.048 / 0.048 | —
P1 | T | G | chr1 | 157806043 | 0 / 1838 / 0.000 / 0.000 | 0 / 2266 / 0.000 / 0.000 | —
P1 | G | C | chr1 | 248525206 | 0 / 2828 / 0.000 / 0.000 | 0 / 4529 / 0.000 / 0.000 | —
P1 | C | T | chr2 | 33500291 | 1 / 943 / 0.106 / 0.106 | 0 / 943 / 0.000 / 0.000 | —
P1 | A | C | chr4 | 55946307 | 0 / 6856 / 0.000 / 0.000 | 0 / 8817 / 0.000 / 0.000 | —
P1 | G | A | chr4 | 55963949 | 0 / 5742 / 0.000 / 0.000 | 0 / 7335 / 0.000 / 0.000 | —
P1 | A | C | chr4 | 55968672 | 0 / 5856 / 0.000 / 0.000 | 0 / 7431 / 0.000 / 0.000 | —
P1 | C | T | chr6 | 117642146 | 0 / 5266 / 0.000 / 0.000 | 4 / 6849 / 0.058 / 0.058 | —
P1 | T | G | chr9 | 8376700 | 3 / 5535 / 0.054 / 0.054 | 0 / 7322 / 0.000 / 0.000 | —
P1 | T | C | chr9 | 8733625 | 1 / 827 / 0.121 / 0.121 | 0 / 1398 / 0.000 / 0.000 | —
P1 | T | G | chr10 | 43611663 | 0 / 3722 / 0.000 / 0.000 | 0 / 4565 / 0.000 / 0.000 | —
P1 | T | G | chr15 | 88522525 | 1 / 4919 / 0.020 / 0.020 | 4 / 6736 / 0.059 / 0.059 | —
P1 | +G | C | chr17 | 7578474 | 0 / 1762 / 0.000 / 0.000 | 0 / 2373 / 0.000 / 0.000 | —
P1 | −A | G | chr17 | 29552244 | 1 / 4484 / 0.022 / 0.022 | 0 / 6485 / 0.000 / 0.000 | —
P1 | +T | C | chr17 | 29553484 | 0 / 3657 / 0.000 / 0.000 | 0 / 4713 / 0.000 / 0.000 | —
P1 | −T | C | chr17 | 29592185 | 3 / 3694 / 0.081 / 0.081 | 0 / 3247 / 0.000 / 0.000 | —
P2 | A | C | chr2 | 50463926 | 49 / 6724 / 0.729 / 1.457 | 0 / 4981 / 0.000 / 0.000 | —
P2 | G | A | chr3 | 89457148 | 40 / 4838 / 0.827 / 0.827 | 0 / 4311 / 0.000 / 0.000 | —
P2 | T | G | chr3 | 89468286 | 5 / 4667 / 0.107 / 0.214 | 2 / 3625 / 0.055 / 0.110 | —
P2 | T | A | chr3 | 89480240 | 15 / 5073 / 0.296 / 0.591 | 0 / 4321 / 0.000 / 0.000 | —
P2 | T | A | chr4 | 66189669 | 4 / 950 / 0.421 / 0.842 | 5 / 1436 / 0.348 / 0.696 | —
P2 | T | G | chr4 | 66242868 | 16 / 2107 / 0.759 / 0.759 | 0 / 1655 / 0.000 / 0.000 | —
P2 | A | C | chr5 | 176522747 | 46 / 2220 / 2.072 / 2.072 | 0 / 1377 / 0.000 / 0.000 | —
P2 | C | T | chr6 | 117648229 | 70 / 7819 / 0.895 / 1.791 | 0 / 5985 / 0.000 / 0.000 | —
P2 | A | C | chr12 | 78400637 | 35 / 7907 / 0.443 / 0.885 | 1 / 6326 / 0.016 / 0.032 | —
P2 | T | G | chr12 | 78400910 | 106 / 8211 / 1.291 / 2.582 | 1 / 6289 / 0.016 / 0.032 | —
P2 | T | C | chr17 | 7577551 | 112 / 5629 / 1.990 / 1.990 | 2 / 3814 / 0.052 / 0.052 | —
P2 | T | G | chr19 | 1207247 | 15 / 1124 / 1.335 / 2.669 | 0 / 747 / 0.000 / 0.000 | —
P2 | +A | C | chr2 | 79314100 | 16 / 3280 / 0.488 / 0.98 | 0 / 2390 / 0.000 / 0.000 | —
P3 | A | C | chr17 | 7578253 | 6 / 6345 / 0.095 / 0.095 | 0 / 8583 / 0.000 / 0.000 | —
P5 | T | C | chr7 | 55249071 | 42 / 4736 / 0.887 / 0.887 | 10 / 5597 / 0.179 / 0.179 | —
P5 | G | T | chr7 | 55259515 | 503 / 11349 / 4.432 / 4.432 | 58 / 12222 / 0.475 / 0.475 | —
P5 | A | G | chr11 | 55135338 | 86 / 4063 / 2.117 / 2.117 | 10 / 4798 / 0.208 / 0.208 | —
P5 | T | C | chr17 | 7577097 | 227 / 7429 / 3.056 / 3.056 | 36 / 9723 / 0.370 / 0.370 | —
P6 | A | G | chr12 | 78400791 | 84 / 13970 / 0.601 / 1.203 | 28 / 10128 / 0.276 / 0.553 | —
P6 | T | G | chr12 | 129822187 | 78 / 8680 / 0.899 / 1.797 | 9 / 6604 / 0.136 / 0.273 | —
P6 | A | G | chr17 | 7576275 | 140 / 9376 / 1.493 / 1.493 | 22 / 7897 / 0.279 / 0.279 | —
P6 | KIF5B-ALK | — | chr10/chr2 | — | 28 / 15006 / 0.187 / 3.116 | 2 / 9989 / 0.020 / 0.334 | —
P9 | EML4-ALK | — | chr2/chr2 | — | 0 / 10688 / 0.000 / 0.000 | 0 / 13647 / 0.000 / 0.000 | 0 / 13521 / 0.000 / 0.000
P9 | FYN-ROS1 | — | chr6/chr6 | — | 0 / 9261 / 0.000 / 0.000 | 0 / 6826 / 0.000 / 0.000 | 2 / 10693 / 0.019 / 0.019
P9 | ROS1-MKX | — | chr6/chr10 | — | 10 / 8029 / 0.125 / 0.125 | 1 / 6485 / 0.015 / 0.015 | 13 / 9943 / 0.131 / 0.131

Bolded reporters indicate potential homozygous alleles (see Table 3 and Detailed Methods). Note that mutant cfDNA percentages for P5 were calculated from the 3 SNVs representing the dominant clone (see FIGS. 11(a) and 11(d)); EGFR T790M (chr7:55249071 C>T) was not included. Final allelic percentages reflect any adjustments made based on estimated zygosity (using inferred homozygous reporters) and/or sequencing coverage. See Detailed Methods for details.
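The "mutant allele %" entries in Table 4 follow directly from the listed depths (mutant allele depth divided by total depth, times 100); for example, 503 mutant reads out of 11,349 gives 4.432%. The sketch below reproduces that arithmetic and, as an assumption, summarizes a patient's burden as the unweighted mean across reporters. The exact aggregation, and the zygosity and coverage adjustments behind the "final" column, are described only in the Detailed Methods and are not reproduced here; the helper names are hypothetical.

```python
def mutant_allele_pct(mutant_depth, total_depth):
    """Mutant allele percentage at one reporter (0.0 when there is no coverage)."""
    return 100 * mutant_depth / total_depth if total_depth else 0.0


def mean_burden_pct(reporters):
    """Unweighted mean mutant allele % over (mutant_depth, total_depth) reporters.

    Assumption: a simple mean; zygosity/coverage adjustments are not applied.
    """
    pcts = [mutant_allele_pct(m, t) for m, t in reporters]
    return sum(pcts) / len(pcts)


# P5 dominant-clone SNVs at time point 1 (EGFR T790M excluded, as in the table note)
p5_tp1 = [(503, 11349), (86, 4063), (227, 7429)]
```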

In addition to its potential clinical utility, CAPP-Seq analysis promises to yield novel biological insights. For example, in one patient's tumor (P9), we identified both a classic EML4-ALK fusion and two previously unreported fusions involving ROS1: FYN-ROS1 and ROS1-MKX (FIG. 11(e), FIG. 15). While the potential function of these novel ROS1 fusions is unknown, to the best of our knowledge this is the first observation of ROS1 and ALK fusions in the same NSCLC patient. All fusions were confirmed by qPCR amplification of genomic DNA and were independently recovered in plasma samples (Table 4). Separately, among cases with a ROS1 rearrangement, we found an unexpected enrichment for S34F missense mutations in U2AF1, the 35 kDa subunit of the U2 spliceosomal complex auxiliary factor. This SNV was initially described as a recurrent heterozygous mutation in myelodysplastic syndrome (Graubert et al. (2012) Nat. Genet. 44:53-57; Yoshida et al. (2011) Nature 478:64-69). While U2AF1 mutations (Imielinski et al. (2012) Cell 150:1107-1120) and ROS1 translocations (Bergethon et al. (2012) J. Clin. Oncol. 30:863-870) were recently reported to occur individually in ~3% and ~1.7% of lung adenocarcinomas, respectively, by combining the samples we profiled with publicly available data (Detailed Methods), we observed a significant enrichment for U2AF1 S34F mutations in tumors harboring ROS1 fusions (3 of 6; P=0.0019; FIG. 11(f), FIG. 16 and Detailed Methods).

Finally, we explored whether CAPP-Seq analysis of cfDNA could potentially be used for cancer screening. As proof-of-principle, we blinded ourselves to the mutations present in each patient's tumor and developed a statistical method to test for the presence of cancer DNA in each pre-treatment plasma sample in our cohort (FIG. 17). This method identified mutant DNA in all plasma samples containing tumor-derived mutant alleles above fractional abundances of 0.5%. Mutant DNA below this level could not be detected by our algorithm, but no mutations were falsely called, indicating the high specificity of this approach (FIG. 11(g) and Detailed Methods). Since ~95% of nodules identified in patients at high risk for NSCLC by low-dose CT are false positives (Aberle et al. (2011) N. Engl. J. Med. 365:395-409), CAPP-Seq could potentially serve as a complementary noninvasive screening test. However, methodological improvements to further lower the detection threshold will be required to detect early stage tumors.

In conclusion, we have developed a flexible method for ultrasensitive and specific assessment of circulating tumor DNA. CAPP-Seq overcomes limitations of previously proposed methods for cfDNA analysis by simultaneously measuring multiple types of mutations without patient-specific optimization and by covering mutations in the majority of patients. Moreover, due to multiplexing, CAPP-Seq is highly economical, and per sample costs for plasma cfDNA are expected to drop further as NGS costs continue to fall. Our method has the potential to accelerate the personalized detection, therapy, and monitoring of cancer patients. We anticipate that CAPP-Seq will prove valuable in a variety of clinical settings, including the assessment of cancer DNA in alternative biological fluids and specimens with low cancer cell content.

Methods

Patient Selection

Between April 2010 and June 2012, patients undergoing treatment for newly diagnosed or recurrent NSCLC were enrolled in a study approved by the Stanford University Institutional Review Board. Enrolled patients had not received blood transfusions within 3 months of blood collection. Patient characteristics are summarized in Table 3.

Sample Collection and Processing

Peripheral blood from consented patients was collected in EDTA Vacutainer tubes (BD). Blood samples were processed within 3 hours of collection. Plasma was separated by centrifugation at 2,500×g for 10 min, transferred to microcentrifuge tubes, and centrifuged at 16,000×g for 10 min to remove cell debris. The cell pellet from the initial spin was used for isolation of germline genomic DNA from PBLs (peripheral blood leukocytes) with the DNeasy Blood & Tissue Kit (Qiagen). Matched tumor DNA was isolated from FFPE specimens or from the cell pellet of pleural effusions. Genomic DNA was quantified by Quant-iT PicoGreen dsDNA Assay Kit (Invitrogen).

Cell-Free DNA Purification and Quantification

Cell-free DNA (cfDNA) was isolated from 1-5 mL plasma with the QIAamp Circulating Nucleic Acid Kit (Qiagen). Absolute quantification of purified cfDNA was determined by quantitative PCR (qPCR) using an 81 bp amplicon on chromosome 1 (Fan et al. (2008) Proc. Natl Acad. Sci. USA 105:16266-16271) and a dilution series of intact male human genomic DNA (Promega) as a standard curve. Power SYBR Green was used for qPCR on a 7900HT Real-Time PCR System (Applied Biosystems). Standard PCR thermal cycling parameters were used.
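Standard-curve quantification of this kind can be sketched as a linear fit of Ct against log10(input mass) over the genomic DNA dilution series, inverted to read off unknown samples. This is a generic illustration, not the authors' analysis: the helper names are hypothetical, and the conversion to haploid genome equivalents assumes the commonly used approximation of ~3.3 pg of DNA per haploid human genome.

```python
import math


def fit_standard_curve(quantities_ng, cts):
    """Least-squares fit of Ct = slope * log10(ng) + intercept."""
    xs = [math.log10(q) for q in quantities_ng]
    n = len(xs)
    mx, my = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(ct, slope, intercept):
    """Invert the standard curve to estimate input mass (ng) from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency from the curve slope; a slope near -3.32 is ~100% (value ~1.0)."""
    return 10 ** (-1 / slope) - 1


def haploid_genome_equivalents(ng, pg_per_haploid_genome=3.3):
    # Assumption: ~3.3 pg DNA per haploid human genome
    return ng * 1000 / pg_per_haploid_genome
```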

Illumina NGS Library Construction

Indexed Illumina NGS libraries were prepared from cfDNA and shorn tumor, germline, and cell line genomic DNA. For patient cfDNA, 7-32 ng DNA was used for library construction without additional shearing or fragmentation. For tumor, germline, and cell line genomic DNA, 69-1000 ng DNA was sheared prior to library construction with a Covaris S2 instrument using the recommended settings for 200 bp fragments. See Table 2 for details.

The NGS libraries were constructed using the KAPA Library Preparation Kit (Kapa Biosystems), employing a DNA polymerase possessing strong 3′-5′ exonuclease (or proofreading) activity and displaying the lowest published error rate (i.e., highest fidelity) of all commercially available B-family DNA polymerases (Quail et al. (2012) Nat. Methods 9:10-11; Oyola et al. (2012) BMC Genomics 13:1). The manufacturer's protocol was modified to incorporate with-bead enzymatic and cleanup steps (Fisher et al. (2011) Genome Biol. 12:R1). Briefly, following the end repair reaction, Agencourt AMPure XP beads (Beckman-Coulter) were added to bind and wash the DNA fragments. The DNA was then eluted directly into 50 μL 1× A-tailing buffer containing the A-tailing enzyme. Following the A-tailing reaction, the DNA fragments were rebound to the same AMPure XP beads by adding 90 μL (1.8×) of PEG buffer (20% PEG-8000 in 2.5M NaCl). After washing, the DNA was eluted into 50 μL 1× ligation buffer with ligase and a 100-fold molar excess of indexed Illumina TruSeq adapters. Ligation was performed for 16 hours at 16° C. Single-step size selection was performed by adding 40 μL (0.8×) of PEG buffer to enrich for ligated DNA fragments. The ligated fragments were then amplified using 500 nM Illumina backbone oligonucleotides and a variable number of PCR cycles (between 4 and 9) depending on input DNA mass. To minimize bias and maximize recovery of GC-rich templates, all PCR reactions were carried out in a BioRad DNA Engine Thermal Cycler with a ramp rate of 2.2° C./sec or an Eppendorf Vapo Protect Mastercycler with the Safe ramp rate setting.
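The molar-excess and bead-ratio arithmetic in this protocol can be made explicit with a short sketch. It rests on assumptions that are not in the source: an average mass of ~650 g/mol per double-stranded base pair, and a single representative fragment length (e.g., ~170 bp for cfDNA). The helper names are hypothetical.

```python
BP_MASS_G_PER_MOL = 650.0  # assumption: approximate mass of one double-stranded bp


def dna_pmol(mass_ng, mean_length_bp):
    """Picomoles of DNA fragments: ng -> g (1e-9), / (g/mol), mol -> pmol (1e12)."""
    return mass_ng * 1e3 / (mean_length_bp * BP_MASS_G_PER_MOL)


def adapter_pmol(mass_ng, mean_length_bp, fold_excess=100):
    """Adapter amount needed for a given molar excess over insert fragments."""
    return fold_excess * dna_pmol(mass_ng, mean_length_bp)


def peg_volume_ul(sample_volume_ul, ratio):
    """PEG/bead buffer volume for a ratio such as 1.8x or 0.8x of the sample volume."""
    return sample_volume_ul * ratio
```

Under these assumptions, 4 ng of ~170 bp cfDNA is roughly 0.036 pmol of fragments, so a 100-fold excess corresponds to about 3.6 pmol of adapter; the 1.8× and 0.8× ratios on a 50 μL reaction give the 90 μL and 40 μL volumes stated above.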

Library purity and concentration were assessed by spectrophotometer (NanoDrop 2000) and qPCR (KAPA Biosystems), respectively. Fragment length was determined on a 2100 Bioanalyzer using the DNA 1000 Kit (Agilent).

Design of Library for Hybrid Selection

Custom hybrid selection was performed with the SeqCap EZ Choice Library, v2.0 (Roche NimbleGen). The custom SeqCap library was designed through the NimbleDesign portal (v1.2.R1) using genome build HG19 NCBI Build 37.1/GRCh37 and with Maximum Close Matches set to 1. Input genomic regions were selected according to the most frequently mutated genes and exons in NSCLC. These regions were identified from the COSMIC database, TCGA, and other published sources as described in the Detailed Materials. Final selector coordinates are provided in Table 1.

Hybrid Selection and High Throughput Sequencing

NimbleGen SeqCap EZ Choice was used according to the manufacturer's protocol with modifications. Between 9 and 12 indexed Illumina libraries were included in a single capture reaction. Prior to hybrid selection, the libraries were quantified with a NanoDrop 2000 spectrophotometer, and 83-111 ng of each library was added (1 μg total DNA per capture reaction). Following hybrid selection, the captured DNA fragments were amplified with 12-to-14 cycles of PCR using 1× KAPA HiFi Hot Start Ready Mix and 2 μM Illumina backbone oligonucleotides in 4-to-6 separate 50 μL reactions. The reactions were then pooled and processed with the QIAquick PCR Purification Kit (Qiagen). Multiplexed libraries were sequenced using 2×100 bp paired-end runs on an Illumina HiSeq 2000.
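The 83-111 ng range follows from splitting a fixed 1 μg capture input evenly across 9 to 12 libraries. A minimal sketch of that arithmetic (the helper name is hypothetical):

```python
def ng_per_library(n_libraries, total_ng=1000.0):
    """Equal-mass share of a fixed capture input (1 ug = 1000 ng) per library."""
    return total_ng / n_libraries


# 12 libraries -> ~83.3 ng each; 9 libraries -> ~111.1 ng each
shares = {n: round(ng_per_library(n), 1) for n in range(9, 13)}
```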

Mapping and Quality Control of NGS Data

Paired-end reads were mapped to the hg19 reference genome with BWA 0.6.2 (default parameters) (Li & Durbin (2009) Bioinformatics 25:1754-1760), and sorted/indexed with SAMtools (Li et al. (2009) Bioinformatics 25:2078-2079). QC was assessed using a custom Perl script to collect a variety of statistics, including mapping characteristics, read quality, and selector on-target rate (i.e., number of unique reads that intersect the selector space divided by all aligned reads), generated respectively by SAMtools flagstat, FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and BEDTools coverageBed (Quinlan & Hall (2010) Bioinformatics 26:841-842). Importantly, we used a custom version of coverageBed modified to count each read at most once. Plots of fragment length distribution and sequence depth/coverage were automatically generated for visual QC assessment. To mitigate the impact of sequencing errors, analyses not involving fusions were restricted to properly paired reads, and only high-quality bases with a Phred quality score of at least 30 (≤0.1% probability of a sequencing error) were analyzed further.
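Two of the QC quantities above can be sketched in a few lines: the Phred-to-error conversion behind the Q30 cut-off, and an on-target rate that counts each read at most once. This is an illustration, not the authors' Perl script or modified coverageBed: reads and selector intervals are assumed to be plain 0-based, half-open tuples, and the helper names are hypothetical.

```python
def phred_to_error(q):
    """Phred Q -> probability of a base-calling error (Q30 -> 0.001)."""
    return 10 ** (-q / 10)


def overlaps(start, end, intervals):
    return any(start < i_end and i_start < end for i_start, i_end in intervals)


def on_target_rate(reads, selector):
    """Fraction of unique aligned reads intersecting the selector.

    reads: iterable of (read_id, chrom, start, end); a read_id seen several
    times (e.g., split records) still counts once.
    selector: dict chrom -> list of (start, end) intervals.
    """
    all_ids, on_ids = set(), set()
    for read_id, chrom, start, end in reads:
        all_ids.add(read_id)
        if overlaps(start, end, selector.get(chrom, [])):
            on_ids.add(read_id)
    return len(on_ids) / len(all_ids) if all_ids else 0.0
```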

Analysis of Detection Thresholds by CAPP-Seq

Two dilution series were performed to assess the linearity and accuracy of CAPP-Seq for quantitating tumor-derived cfDNA. In one experiment, shorn genomic DNA from a NSCLC cell line (HCC78) was spiked into cfDNA from a healthy individual, while in a second experiment, shorn genomic DNA from one NSCLC cell line (NCI-H3122) was spiked into shorn genomic DNA from a second NSCLC line (HCC78). A total of 32 ng DNA was used for library construction. Following mapping and quality control, homozygous reporters were identified as alleles unique to each sample with at least 20× sequencing depth at an allelic fraction >80%. Fourteen such reporters were identified between HCC78 genomic DNA and plasma cfDNA (FIG. 9 (d), (e)), whereas 24 reporters were found between NCI-H3122 and HCC78 genomic DNA (FIG. 10).
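The homozygous-reporter criterion (at least 20× depth, allelic fraction >80%, unique to one sample) can be sketched as follows. The input layout and the helper name are hypothetical, and "unique" is interpreted here, as an assumption, to mean no supporting reads for the same allele in the other sample. Once such reporters are chosen, the expected fractional abundance of each reporter in a spike-in mixture is simply the spike-in fraction, which is what makes them suitable for linearity testing.

```python
def homozygous_reporters(sample, other, min_depth=20, min_af=0.8):
    """Alleles homozygous in `sample` and absent from `other`.

    sample, other: dict variant_key -> (alt_depth, total_depth).
    Assumption: "absent" means zero alt-supporting reads in `other`.
    """
    reporters = []
    for key, (alt, depth) in sample.items():
        if depth < min_depth or alt / depth <= min_af:
            continue  # insufficient depth or not homozygous-like
        if other.get(key, (0, 0))[0] == 0:
            reporters.append(key)
    return sorted(reporters)
```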

CAPP-Seq Bioinformatics Pipeline

Details of bioinformatics methods are supplied in the Detailed Methods, and a graphical schematic is provided in FIG. 2. Briefly, for detection of SNVs and indels, we employed VarScan 2 (Koboldt et al. (2012) Genome Res. 22:568-576) with strict post-processing filters to improve variant call confidence, and for fusion identification and breakpoint characterization we used a novel algorithm, termed FACTERA (Detailed Methods). To quantify tumor burden in plasma cfDNA, allele frequencies of reporter SNVs/indels were assessed using the output of SAMtools mpileup (Li et al. (2009) Bioinformatics 25:2078-2079), and fusions, if detected, were enumerated with FACTERA.

Statistical Analysis

The NSCLC selector was validated in silico using an independent cohort of lung adenocarcinomas (Imielinski et al. (2012) Cell 150:1107-1120) (FIG. 1(c)). To assess statistical significance, we analyzed the same cohort using 10,000 random selectors sampled from the exome, each with an identical size distribution to the CAPP-Seq NSCLC selector. The performance of random selectors had a Gaussian distribution, and p-values were calculated accordingly. Note that all identified somatic lesions were considered in this analysis.
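Because the random-selector scores were Gaussian, a p-value can be read from a normal distribution fitted to them. A minimal sketch, assuming a one-sided (upper-tail) test and a scalar score per selector (for example, the number of patients with at least one covered SNV); the function name is hypothetical.

```python
from statistics import NormalDist


def selector_p_value(observed, random_scores):
    """One-sided p-value of an observed selector score under a Gaussian null.

    The null mean and standard deviation are estimated from the scores of
    randomly sampled, size-matched selectors (at least two are required).
    """
    null = NormalDist.from_samples(random_scores)
    return 1.0 - null.cdf(observed)
```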

We used Monte Carlo sampling to estimate the distribution of background alleles across the NSCLC selector (FIG. 9 (a), (c); Detailed Methods). For each plasma sample, background alleles were defined as alleles remaining after exclusion of germline and/or somatic variant calls made by VarScan 2 (Koboldt et al. (2012) Genome Res. 22:568-576) (somatic p-value=0.01; otherwise, default parameters), and with a Phred quality score ≧30. To evaluate the impact of reporter number on tumor burden estimates, we also performed Monte Carlo sampling (1,000×), varying the number of reporters available {1, 2, . . . , max n} in two spiking experiments (FIG. 9 (d)-(f); FIG. 10 (b)-(d)).
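The reporter-number experiment can be sketched as repeated subsampling. The summary returned here (mean and standard deviation of the estimates for each n) and the use of the mean reporter allele fraction as the tumor-burden estimate are assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def reporter_number_simulation(reporter_afs, iterations=1000, seed=0):
    """For n = 1..len(reporter_afs), sample n reporters without replacement
    `iterations` times and summarize the resulting tumor-burden estimates.

    Returns {n: (mean_of_estimates, sd_of_estimates)}.
    """
    rng = random.Random(seed)
    summary = {}
    for n in range(1, len(reporter_afs) + 1):
        estimates = [mean(rng.sample(reporter_afs, n)) for _ in range(iterations)]
        summary[n] = (mean(estimates), pstdev(estimates))
    return summary
```

As expected, the spread of the estimates shrinks as more reporters become available; with all reporters sampled it collapses to zero.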

To assess the significance of tumor burden estimates in plasma cfDNA, we compared patient-specific SNV frequencies against the null distribution of background SNVs across the selector. Briefly, patient-specific background was quantified using the method described for FIG. 9 (a) (Detailed Methods), but using the number of SNVs identified in the patient's tumor. For patients with at least 1 SNV, but no other reporter types, tumor-derived cfDNA was considered not detectable if mean SNV fractions fell below the 95th percentile of background alleles (i.e., P≧0.05) (FIG. 11 (a)). (Due to the ultra-low false detection rate for indels (Minoche et al. (2011) Genome Biol. 12:R112) and fusion breakpoints, these mutation types were considered detected when present with >0 read support.) For patients with detectable disease in only 1 time point, the corresponding empirical p-value is shown in FIG. 11 (a). To assess normality, we analyzed the patient with the most reporter alleles (i.e., P2; FIG. 11 (a)), and found that fractional abundance measurements fit a normal distribution (D'Agostino and Pearson omnibus normality test). Thus, for patients with detectable tumor-derived cfDNA in two time points and with at least 3 cfDNA SNVs/indels, the change in tumor burden was statistically assessed using a two-sided paired t-test. For P9, who lacked reporter SNVs/indels, statistical significance was estimated by correlation of CAPP-Seq measurements with known tumor volume (as measured by CT scans).
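The detection rule can be sketched as an empirical test against a patient-matched null. This assumes the null is built by averaging `n_snvs` background allele frequencies drawn without replacement (matching the number of tumor SNVs), and that P is the fraction of null means at least as large as the observed mean; function names are hypothetical.

```python
import random


def empirical_p_value(observed_mean_af, background_afs, n_snvs,
                      iterations=10_000, seed=0):
    """Fraction of null draws whose mean background allele frequency is
    >= the observed mean reporter frequency."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        draw = rng.sample(background_afs, n_snvs)
        if sum(draw) / n_snvs >= observed_mean_af:
            hits += 1
    return hits / iterations


def snv_detected(observed_mean_af, background_afs, n_snvs, alpha=0.05):
    """SNV-only patients: detectable when P < alpha (above the 95th percentile)."""
    return empirical_p_value(observed_mean_af, background_afs, n_snvs) < alpha


def indel_or_fusion_detected(supporting_reads):
    """Indels and fusion breakpoints count as detected with >0 read support."""
    return supporting_reads > 0
```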

Additional details on cell lines, tumor cell sorting, optimizations of library preparation, mutation/translocation validation, CAPP-Seq design and analytical pipelines including FACTERA translocation detection tool, and additional statistical methods are presented in the Detailed Methods.

Detailed Methods

A. Molecular Biology Methods

A1. Cell Lines

The lung adenocarcinoma cell lines NCI-H3122 and HCC78 were obtained from ATCC and DSMZ, respectively, and grown in RPMI 1640 with L-glutamine (Gibco) supplemented with 10% fetal bovine serum (Gembio) and 1% penicillin/streptomycin cocktail. Cells were maintained in mid-log-phase growth in a 37° C. incubator with 5% CO₂. Genomic DNA was purified from freshly harvested cells with the DNeasy Blood & Tissue Kit (Qiagen).

A2. Pleural Fluid Processing, Flow Cytometry, and Cell Sorting

Cells from pleural fluid from patients P9 and P6 were harvested by centrifugation at 300×g for 5 min at 4° C. and washed in FACS staining buffer (HBSS+2% heat-inactivated calf serum [HICS]). Red blood cells were lysed with ACK Lysing Buffer (Invitrogen), and clumps were removed by passing through a 100 μm nylon filter. Filtered cells were spun down and resuspended in staining buffer. While on ice, the cell suspension was blocked for 20 min with 10 μg/mL rat IgG and then stained for 20 min with APC-conjugated mouse anti-human EpCAM (BioLegend, clone 9C4), PerCP-Cy5.5-conjugated mouse anti-human CD45 (eBioscience, clone 2D1), and PerCP-eFluor710-conjugated mouse anti-human CD31 (eBioscience, clone WM59). After staining, cells were washed and resuspended with staining buffer containing 1 μg/mL DAPI, analyzed, and sorted with a FACSAria II cell sorter (BD Biosciences). Cell doublets and DAPI-positive cells were excluded from analysis and sorting. CD31⁻CD45⁻EpCAM⁺ cells were sorted into staining buffer, spun down, and flash frozen in liquid nitrogen. DNA was isolated with the QIAamp DNA Micro Kit (Qiagen).

A3. Optimization of NGS Library Preparation from Low Input cfDNA

To maximize sensitivity, any method for detecting mutant cfDNA must interrogate as many of the cfDNA molecules in a sample as possible. For this reason, we used the QIAamp Circulating Nucleic Acid kit (Qiagen) with carrier RNA according to the manufacturer's protocol to isolate cfDNA. We also took specific steps to improve the Illumina library preparation workflow.

Protocols for Illumina library construction were compared in a step-wise manner with the goal of (1) optimizing adapter ligation efficiency, (2) reducing the necessary number of PCR cycles following adapter ligation, (3) preserving the naturally occurring size distribution of cfDNA fragments, and (4) minimizing variability in depth of sequencing coverage across all captured genomic regions. Initial optimization was done with NEBNext DNA Library Prep Reagent Set for Illumina (New England BioLabs), which includes reagents for end-repair of the cfDNA fragments, A-tailing, adapter ligation, and amplification of ligated fragments with Phusion High-Fidelity PCR Master Mix. Input was 4 ng cfDNA (obtained from plasma of the same healthy volunteer) for all conditions. Relative allelic abundance in the constructed libraries was assessed by qPCR of 4 genomic loci (Roche NimbleGen: NSC-0237, NSC-0247, NSC-0268, and NSC-0272) and compared by the 2^−ΔCt method.
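The 2^−ΔCt comparison reduces to a one-line formula. A minimal sketch, in which ΔCt is taken as Ct(test library) − Ct(reference library) and the summary across loci is a geometric mean; both choices are assumptions, since the text does not specify how the 4 loci were combined.

```python
from statistics import geometric_mean


def relative_abundance(ct_test, ct_reference):
    """Fold abundance of a locus in a test library relative to a reference
    library: 2^-(Ct_test - Ct_reference). A lower Ct means more template."""
    return 2.0 ** -(ct_test - ct_reference)


def mean_relative_abundance(ct_pairs):
    """Geometric mean fold change across loci, given [(ct_test, ct_ref), ...]."""
    return geometric_mean(relative_abundance(t, r) for t, r in ct_pairs)
```

For example, a library that reaches threshold 2 cycles earlier than the reference at a locus (Ct 20 vs. 22) contains about 4-fold more amplifiable template there.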

Ligations were performed at 20° C. for 15 min (as per the manufacturer's protocol), at 16° C. for 16 hours, or with temperature cycling for 16 hours as previously described (Lund et al. (1996) Nucl. Acids Res. 24:800-801). Ligation volumes were varied from the standard (50 μL) down to 10 μL while maintaining a constant concentration of DNA ligase, cfDNA fragments, and Illumina adapters. Subsequent optimizations incorporated ligation at 16° C. for 16 hours in 50 μL reaction volumes.

Next, we compared standard SPRI bead processing procedures, in which new AMPure XP beads are added after each enzymatic reaction and DNA is eluted from the beads for the next reaction, to the with-bead protocol modifications previously described (Fisher et al. (2011) Genome Biol. 12:R1). We also compared 2 concentrations of Illumina adapters in the ligation reaction: 12 nM (10-fold molar excess over cfDNA fragments) and 120 nM (100-fold molar excess).

Using the optimized library preparation procedures, we next compared the NEBNext DNA Library Prep Reagent Set (with Phusion DNA Polymerase) to the KAPA Library Preparation Kit (with KAPA HiFi DNA Polymerase). The KAPA Library Preparation Kit with our modifications was also compared to the NuGEN SP Ovation Ultralow Library System with automation on Mondrian SP Workstation.

A4. Evaluation of Library Preparation Modifications on CAPP-Seq Performance

We performed CAPP-Seq on 32 ng cfDNA using standard library preparation procedures with the NEBNext kit, or with optimized procedures using either the NEBNext kit or the KAPA Library Preparation Kit. In parallel we performed CAPP-Seq on 4 ng and 128 ng cfDNA using the KAPA kit with our optimized procedures. Indexed libraries were constructed, and hybrid selection was performed in multiplex. The post-capture multiplexed libraries were amplified with Illumina backbone primers for 14 cycles of PCR and then sequenced on a paired-end 100 bp lane of an Illumina HiSeq 2000.

We also evaluated CAPP-Seq on ultralow input following whole genome amplification (WGA). For WGA we chose not to use multiple displacement amplification with Φ29 DNA polymerase, given the small size of cfDNA fragments in plasma (FIG. 1(e)) and concern for chimera formation, which would confound analysis of recurrent gene fusions in NSCLC by CAPP-Seq. Instead we used the SeqPlex DNA Amplification Kit (Sigma-Aldrich), which employs degenerate oligonucleotide-primed PCR. We used the upper limit of input into this kit (1 ng) and performed whole genome amplification according to the manufacturer's protocol. Briefly, 1 ng cfDNA was amplified with real-time monitoring with SYBR Green I (Sigma-Aldrich) on a HT7900 Real Time PCR machine (Applied Biosystems). The amplification was terminated after 17 cycles, yielding 2.8 μg DNA. The primer removal step yielded ~600 ng DNA, and this entire amount was used for library preparation using the NEBNext kit with optimized procedures as described above.

A5. Validation of Variants Detected by CAPP-Seq

All structural rearrangements and a subset of tumoral SNVs detected by CAPP-Seq were independently confirmed by qPCR and/or Sanger sequencing of amplified fragments. For HCC78, a 120 bp fragment containing the SLC34A2-ROS1 breakpoint was amplified from genomic DNA using the primers: 5′-AGACGGGAGAAAATAGCACC-3′ and 5′-ACCAAGGGTTGCAGAAATCC-3′. A 141 bp fragment containing exon 2 of U2AF1 was amplified using the primers: 5′-CATGTGTTTGATATCTTCCCAGC-3′ and 5′-CTGGCTAAACGTCGGTTTATTG-3′. For NCI-H3122, a 143 bp fragment containing the EML4-ALK breakpoint was amplified using the primers: 5′-GAGATGGAGTTTCACTCTTGTTGC-3′ and 5′-GAACCTTTCCATCATACTTAGAAATAC-3′. Genomic DNA (5 ng) was used as template with 250 nM oligos and 1× Phusion PCR Master Mix (NEB) in 50 μL reactions. Products were resolved on a 2.5% agarose gel, and bands of the expected size were excised. The amplified DNA fragments were purified using the Qiaquick Gel Extraction Kit (Qiagen) and submitted for Sanger sequencing (Elim Biopharm). For P9, genomic DNA breakpoints were confirmed by qPCR using the primers: 5′-TCCATGGAAGCCAGAAC-3′ and 5′-ATGCTAAGATGTGTCTGTCA-3′ for EML4-ALK; 5′-CCTTAACACAGATGGCTCTTGATGC-3′ and 5′-TCCTCTTTCCACCTTGGCTTTCC-3′ for ROS1-MKX; and 5′-GGTTCAGAACTACCAATAACAAG-3′ and 5′-ACCTGATGTGTGACCTGATTGATG-3′ for FYN-ROS1. For qPCR, 10 ng of pre-amplified genomic DNA was used as template with 250 nM oligos and 1× Power SYBR Green Master Mix in 10 μL reactions performed in triplicate on a HT7900 Real Time PCR machine (Applied Biosystems). Standard PCR thermal cycling parameters were used. Amplicons spanning all 3 breakpoints detected in P9 were confirmed in tumor genomic DNA as well as plasma cfDNA, and PBL genomic DNA was used as a negative control.
Separately, at least 88% of SNVs and indels detected were bona fide somatic mutations in tumors, as 38 of 46 of them were independently observed above 0.025% allele frequency in plasma cfDNA and/or were independently confirmed by SNaPshot clinical assays.

B. Bioinformatics and Statistical Methods

B1. Analysis of CAPP-Seq Background

The CAPP-Seq background rate was estimated by Monte Carlo sampling of allelic frequencies across the NSCLC selector (FIG. 9 (a)). Plasma cfDNA samples were pre-filtered to remove all variant calls and dominant alleles. Specifically, for each patient, we excluded germline, loss of heterozygosity (LOH), and/or somatic variant calls made by VarScan 2 (Koboldt et al. (2012) Genome Res. 22:568-576) (somatic p-value=0.01; otherwise, default parameters). We sampled 4 random background alleles across this subset of the selector (equal to the median number of SNVs per NSCLC patient detected by CAPP-Seq) and calculated their mean allelic frequency, only considering bases discordant with the prevailing genotype of the plasma sample at those 4 positions. This process was iterated 10,000 times, and mean, median, and 75th percentile statistics were collected. The entire procedure was then repeated for 5 total simulations, shown in FIG. 2a.

We likewise applied Monte Carlo simulation to estimate the probability of finding a background allele in plasma cfDNA at a given fractional abundance (FIG. 9 (c)). For consistency with the ranking of alleles in FIG. 9 (c), we populated a vector containing the mean background allele frequency for each genomic position across 7 plasma cfDNA samples, each filtered to remove dominant alleles as described above. Alleles were randomly sampled from this vector 10,000 times to identify the allele frequency with an empirical p-value of 0.01.
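Both background simulations in this subsection can be sketched in a few lines. The inputs are assumed to be already filtered as described (variant calls and dominant alleles removed, only bases discordant with the prevailing genotype counted); the function names, sampling without replacement in the first simulation, and the quantile convention in the second are assumptions.

```python
import random
from statistics import mean, median, quantiles


def background_summary(background_afs, n_alleles=4, iterations=10_000, seed=0):
    """Repeatedly average the frequencies of `n_alleles` randomly chosen
    background positions; return (mean, median, 75th percentile) of the means."""
    rng = random.Random(seed)
    means = [mean(rng.sample(background_afs, n_alleles)) for _ in range(iterations)]
    q75 = quantiles(means, n=4)[2]  # third quartile cut point
    return mean(means), median(means), q75


def af_threshold(per_position_mean_afs, p=0.01, iterations=10_000, seed=0):
    """Allele frequency with an empirical p-value of `p`: the (1 - p) quantile of
    single alleles drawn at random from the vector of per-position mean
    background frequencies (averaged across plasma samples beforehand)."""
    rng = random.Random(seed)
    draws = sorted(rng.choice(per_position_mean_afs) for _ in range(iterations))
    return draws[min(iterations - 1, round((1 - p) * iterations))]
```

Repeating `background_summary` with different seeds mirrors the 5 independent simulations described above.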

B2. ROS1 and U2AF1 Co-Association Analysis

B2.1. Assembly of ROS1 and U2AF1 Mutant NSCLC

We included only cases in which both ROS1 fusion status and U2AF1 S34 mutation status were known. There were 163 such cases from TCGA (genotyped for U2AF1 by whole exome sequencing and for ROS1 fusions by RNA-Seq as detailed below), 23 cases from Imielinski et al. (2012) Cell 150:1107-1120, 17 cases from Govindan et al. (2012) Cell 150:1121-1134, and 13 cases from the present study (11 patients and 2 NSCLC cell lines). U2AF1 S34F mutations were detected in 11 cases (5 from TCGA, 3 from Imielinski et al., 1 from Govindan et al., and 2 from the present study), and ROS1 fusions were detected in 6 cases (2 from TCGA, described below, and 4 from the present study). Significance testing was performed using Fisher's exact test, and a two-tailed P-value is reported.
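For reference, a self-contained two-tailed Fisher's exact test on a 2×2 table (for example, U2AF1 S34F status by ROS1 fusion status). It uses the common "sum all tables no more likely than the observed one" definition of the two-tailed P-value; the example counts in the test are hypothetical and are not the study data.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the observed margins
    that are no more likely than the observed table. Integer weights are used
    so that ties are compared exactly.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def weight(x):
        # Number of ways to place x of the col1 "events" in row 1.
        return comb(row1, x) * comb(n - row1, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / total
```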

B2.2. Analysis of Whole Transcriptome Sequencing Data from TCGA for ROS1 Fusions

We identified two TCGA lung adenocarcinoma patients, TCGA-05-4426 and TCGA-64-1680, harboring candidate ROS1 fusions (FIG. 16 (a)). Importantly, the latter patient also has the U2AF1 S34F missense mutation reported in this study and in prior literature (see above). To further analyze both patients' putative rearrangements, whole transcriptome RNA-Seq data (.bam files) were obtained using the UCSC GeneTorrent system (https://cghub.ucsc.edu/downloads.html) and realigned to hg19 using BWA 0.6.2 with default parameters (Li & Durbin (2009) Bioinformatics 25:1754-1760). Notably, mapped RNA-Seq reads extended significantly past coding regions, allowing for improved assessment of fusion events (FIG. 16 (b), (c)). From a manual inspection of associated RPKM expression data across ROS1 exons (FIG. 16 (a)), we suspected that breakpoint sites for these fusions may lie directly upstream of ROS1 exons 32 and 35, respectively. Using the Integrative Genomics Viewer (IGV) (Robinson et al. (2011) Nat. Biotechnol. 29:24-26), we found improperly paired (or discordant) reads near these exons that link ROS1 to its well-described partners, SLC34A2 and CD74, respectively (FIG. 16 (b), (c)). Indeed, by applying FACTERA's templated fusion discovery (detailed below) to patient TCGA-64-1680, we recovered a single read near ROS1 exon 35 that also maps to CD74 (FIG. 16 (c)). Collectively, these data strongly support the existence of expressed ROS1 fusions in these two TCGA patients.

B3. CAPP-Seq Selector Design

Most human cancers are relatively heterogeneous for somatic mutations in individual genes. Specifically, in most human tumors, recurrent somatic alterations of single genes account for a minority of patients, and only a minority of tumor types can be defined using a small number of recurrent mutations (<5-10) at predefined positions. Therefore, the design of the selector is vital to the CAPP-Seq method because (1) it dictates which mutations can be detected with high probability for a patient with a given cancer, and (2) the selector size (in kb) directly impacts the cost and depth of sequence coverage. For example, the hybrid selection libraries available in current whole exome capture kits range from 51-71 Mb, providing ~40- to 60-fold maximum theoretical enrichment versus whole genome sequencing. The degree of potential enrichment is inversely proportional to the selector size such that for a ~100 kb selector, >10,000-fold enrichment should be achievable.
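The enrichment figures follow from dividing genome size by target size. A minimal sketch, assuming a haploid human genome of about 3.1 Gb (the constant is an approximation introduced here, not a value from the text).

```python
GENOME_SIZE_BP = 3.1e9  # assumed approximate haploid human genome size


def max_fold_enrichment(selector_size_bp, genome_size_bp=GENOME_SIZE_BP):
    """Maximum theoretical enrichment of hybrid capture over WGS at equal
    sequencing output: genome size divided by targeted (selector) size."""
    return genome_size_bp / selector_size_bp


for label, size in [("51 Mb exome", 51e6), ("71 Mb exome", 71e6), ("100 kb selector", 100e3)]:
    print(f"{label}: ~{max_fold_enrichment(size):,.0f}-fold")
```

This gives roughly 44- to 61-fold for the exome kits and about 31,000-fold for a 100 kb selector, consistent with the ranges stated above.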

We employed a six-phase design strategy to identify and prioritize genomic regions for the CAPP-Seq NSCLC selector as detailed below. Three phases were used to incorporate known and suspected NSCLC driver genes, as well as genomic regions known to participate in clinically actionable fusions (phases 1, 5, 6), while another three phases employed an algorithmic approach to maximize both the number of patients covered and SNVs per patient (phases 2-4). The latter relied upon a metric that we termed “Recurrence Index” (RI), defined as the number of NSCLC patients with SNVs that occur within a given kilobase of exonic sequence (i.e., No. of patients with mutations/exon length in kb). RI thus serves to measure patient-level recurrence frequency at the exon level, while simultaneously normalizing for gene/exon size. As a source of somatic mutation data uniformly genotyped across a large cohort of patients, in phases 2-4, we analyzed non-silent SNVs identified in TCGA whole exome sequencing data from 178 patients in the Lung Squamous Cell Carcinoma dataset (SCC) (Hammerman et al. (2012) Nature 489:519-525) and from 229 patients in the Lung Adenocarcinoma (LUAD) datasets (TCGA query date was Mar. 13, 2012). Thresholds for each metric (i.e. RI and patients per exon) were selected to statistically enrich for known/suspected drivers in SCC and LUAD data (FIG. 9). RefSeq exon coordinates (hg19) were obtained via the UCSC Table Browser (query date was Apr. 11, 2012).
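The Recurrence Index defined above can be computed directly. In this sketch the input maps each exon to the IDs of patients carrying non-silent SNVs in it, and a patient with several SNVs in one exon is counted once; the data layout and function names are hypothetical.

```python
def recurrence_index(patients_with_snvs, exon_length_bp):
    """RI = number of patients with SNVs in the exon / exon length in kb."""
    return len(set(patients_with_snvs)) / (exon_length_bp / 1000)


def rank_exons(exon_mutations, exon_lengths):
    """Rank exons by RI (highest first).

    exon_mutations: {exon_id: iterable of patient IDs with non-silent SNVs}
    exon_lengths:   {exon_id: exon length in bp}
    """
    ri = {e: recurrence_index(p, exon_lengths[e]) for e, p in exon_mutations.items()}
    return sorted(ri.items(), key=lambda kv: kv[1], reverse=True)
```

For example, 10 mutated patients in a 500 bp exon give RI = 20, the phase 4 threshold.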

The following algorithm was used to design the CAPP-Seq selector (parenthetical descriptions match design phases noted in FIG. 1 (b)).

Phase 1 (Known Drivers)

Initial seed genes were chosen based on their frequency of mutation in NSCLCs.

Analysis of COSMIC (v57) (Forbes et al. (2010) Nucl. Acids Res. 38:D652-657) identified known driver genes that are recurrently mutated in ≧9% of NSCLC (denominator ≧500 cases). Specific exons from these genes were selected based on the pattern of SNVs previously identified in NSCLC. The seed list also included single exons from genes with recurrent mutations that occurred at low frequency but had strong evidence for being driver mutations, such as BRAF exon 15, which harbors V600E mutations in <2% of NSCLC (Ding et al. (2008) Nature 455:1069-1075; Youn & Simon (2011) Bioinformatics 27:175-181; Okuda et al. (2008) Cancer Sci. 99:2280-2285; Su et al. (2011) J. Mol. Diagn. 13:74-84; Tsao et al. (2007) J. Clin. Oncol. 25:5240-5247; Chaft et al. (2012) Mol. Cancer Ther. 11:485-491; Paik et al. (2011) J. Clin. Oncol. 29:2046-2051; Stephens et al. (2004) Nature 431:525-526; Jin et al. (2010) Lung Cancer 69:279-283; Malanga et al. (2008) Cell Cycle 7:665-669).

Phase 2 (Max. Coverage)

For each exon with SNVs covering ≧5 patients in LUAD and SCC, we selected the exon with highest RI that identified at least 1 new patient when compared to the prior phase. Among exons with equally high RI, we added the exon with minimum overlap among patients already captured by the selector. This was repeated until no further exons met these criteria.
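Phase 2 is a greedy set-cover step. A minimal sketch, assuming "new patient" means a patient not yet captured by the selector built so far and that each exon is summarized by its RI and the set of patients with SNVs in it; the data layout and function name are hypothetical.

```python
def phase2_max_coverage(exons, covered, min_patients=5):
    """Greedy phase-2 sketch.

    exons:   {exon_id: (ri, set_of_patient_ids)}
    covered: patients already captured by earlier phases.
    Repeatedly adds the highest-RI exon (with >= min_patients patients) that
    captures at least one new patient; among equal RI, prefers the exon
    overlapping the fewest already-covered patients.
    """
    covered = set(covered)
    chosen = []
    candidates = {e: v for e, v in exons.items() if len(v[1]) >= min_patients}
    while True:
        eligible = [e for e, (_ri, pts) in candidates.items() if pts - covered]
        if not eligible:
            return chosen, covered
        best = max(eligible,
                   key=lambda e: (candidates[e][0], -len(candidates[e][1] & covered)))
        chosen.append(best)
        covered |= candidates.pop(best)[1]
```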

Phase 3 (RI≧30)

For each remaining exon with an RI≧30 and with SNVs covering ≧3 patients in LUAD and SCC, we identified the exon that would result in the largest reduction in patients with only 1 SNV. To break ties among equally best exons, the exon with highest RI was chosen. This was repeated until no additional exons satisfied these criteria.

Phase 4 (RI≧20)

Same procedure as phase 3, but using RI≧20.
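Phases 3 and 4 can share one greedy routine that differs only in the RI cutoff. In this sketch the loop stops once no eligible exon reduces the number of single-SNV patients; that stopping rule, the per-exon `Counter` of SNVs per patient, and the function names are assumptions.

```python
from collections import Counter


def singletons(snv_counts):
    """Number of patients with exactly one SNV in the current selector."""
    return sum(1 for n in snv_counts.values() if n == 1)


def phase_ri_threshold(exons, snv_counts, ri_min, min_patients=3):
    """Greedy sketch for phase 3 (ri_min=30) or phase 4 (ri_min=20).

    exons:      {exon_id: (ri, Counter(patient -> SNVs in that exon))}
    snv_counts: Counter(patient -> SNVs already covered by the selector)
    Adds the eligible exon giving the largest drop in single-SNV patients,
    breaking ties by higher RI.
    """
    counts = Counter(snv_counts)
    remaining = {e: v for e, v in exons.items()
                 if v[0] >= ri_min and len(v[1]) >= min_patients}
    chosen = []
    while remaining:
        base = singletons(counts)

        def gain(e):
            return base - singletons(counts + remaining[e][1]), remaining[e][0]

        best = max(remaining, key=gain)
        if gain(best)[0] <= 0:
            break
        counts += remaining.pop(best)[1]
        chosen.append(best)
    return chosen, counts
```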

Phase 5 (Predicted Drivers)

We included all exons from additional genes previously predicted to harbor driver mutations in NSCLC (Ding et al. (2008) Nature 455:1069-1075; Youn & Simon (2011) Bioinformatics 27:175-181).

Phase 6 (Add Fusions)

For recurrent rearrangements in NSCLC involving the receptor tyrosine kinases ALK, ROS1, and RET, the introns most frequently implicated in the fusion event and the flanking exons were included.

All exons included in the selector, along with their corresponding HUGO gene symbols and genomic coordinates, as well as patient statistics for NSCLC and a variety of other cancers, are provided in Table 1, organized by selector design phase.

C. CAPP-Seq Computational Pipeline

C1. Mutation Discovery: SNVs/Indels

For detection of somatic SNV and insertion/deletion events, we employed VarScan 2 (Koboldt et al. (2012) Genome Res. 22:568-576) (somatic p-value=0.01, minimum variant frequency=5%, and otherwise default parameters). Somatic variant calls (SNV or indel) present at less than 0.5% mutant allelic frequency in the paired normal sample (PBLs), but at a position with at least 1000× overall depth in PBLs and 100× depth in the tumor, and with at least 1× read depth on each strand, were retained (Table 3). While the selector was designed to predominantly capture exons, in practice, it also captures limited sequence content flanking each targeted region. For instance, this phenomenon is the basis for the (thus far) uniformly successful recovery by CAPP-Seq of fusion partners (which are not included within the selector) for kinase genes such as ALK and ROS1 recurrently rearranged in NSCLC. As such, we also considered variant calls detected within 500 bp of defined selector coordinates. These calls were eliminated if present in non-coding repeat regions, since repeats may confound mapping accuracy. Repeat sequence coordinates were obtained using the RepeatMasker track in the UCSC table browser (hg19). Variant annotation was performed using the SeattleSeq Annotation 137 web server (http://snp.gs.washington.edu/SeattleSeqAnnotation137/). Complete details for all identified SNVs and indels are provided in Table 2.
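The post-VarScan filters can be expressed as a single predicate. Assumptions in this sketch: "1× read depth on each strand" is read as at least one variant-supporting read per strand, the repeat filter is applied only to calls in the ±500 bp flanks, and intervals are treated as inclusive; the `Call` record and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Call:
    chrom: str
    pos: int
    tumor_depth: int
    normal_depth: int
    normal_alt_frac: float  # mutant allele fraction in PBLs
    alt_fwd: int            # variant-supporting reads, forward strand
    alt_rev: int            # variant-supporting reads, reverse strand


def near_selector(call, selector, flank=0):
    """True if the call lies within `flank` bp of a selector interval (chrom, start, end)."""
    return any(c == call.chrom and s - flank <= call.pos <= e + flank
               for c, s, e in selector)


def in_repeat(call, repeats):
    return any(c == call.chrom and s <= call.pos <= e for c, s, e in repeats)


def keep_call(call, selector, noncoding_repeats, flank=500):
    """Depth, normal-contamination, and strand filters, then location filters."""
    if not (call.normal_alt_frac < 0.005 and call.normal_depth >= 1000
            and call.tumor_depth >= 100 and call.alt_fwd >= 1 and call.alt_rev >= 1):
        return False
    if near_selector(call, selector):
        return True  # inside the selector proper
    return near_selector(call, selector, flank) and not in_repeat(call, noncoding_repeats)
```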

By manual inspection, two patients (P2 and P6) had SNVs with frequencies consistent with potential heterozygous and homozygous alleles. We labeled these alleles accordingly (Table 3), and based on our assumption of zygosity in these two patients, we adjusted measured fractions of heterozygous reporters in plasma cfDNA to better estimate tumor burden (Table 4).

C2. Mutation Discovery: Fusions

For practical and robust de novo enumeration of genomic fusion events and breakpoints from paired-end next-generation sequencing data, we developed a novel heuristic approach, termed FACTERA (FACile Translocation Enumeration and Recovery Algorithm). FACTERA has minimal external dependencies, works directly on a preexisting .bam alignment file, and produces easily interpretable output. Major steps of the algorithm are summarized below, and are complemented by a graphical schematic to illustrate key elements of the breakpoint identification process (FIG. 4).

As input, FACTERA requires a .bam alignment file of paired-end reads produced by BWA (Li & Durbin (2009) Bioinformatics 25:1754-1760), exon coordinates in .bed format (e.g., hg19 RefSeq coordinates), and a .2bit reference genome to enable fast sequence retrieval (e.g., hg19). In addition, the analysis can be optionally restricted to reads that overlap particular genomic regions (.bed file), such as the CAPP-Seq selector used in this work.

FACTERA processes the input in three sequential phases: identification of discordant reads, detection of breakpoints at base pair-resolution, and in silico validation of candidate fusions. Each phase is described in detail below.

C2.1. Identification of Discordant Reads

To iteratively reduce the sequence space for gene fusion identification, FACTERA, like other algorithms (e.g., BreakDancer (Chen et al. (2009) Nat. Methods 6:677-681)), identifies and classifies discordant read pairs. Such reads indicate a nearby fusion event since they either map to different chromosomes or are separated by an unexpectedly large insert size (i.e., total fragment length), as determined by the BWA mapping algorithm. The bitwise flag accompanying each aligned read encodes a variety of mapping characteristics (e.g., improperly paired, unmapped, wrong orientation, etc.) and is leveraged to rapidly filter the input for discordant pairs. The closest exon of each discordant read is subsequently identified and used to cluster discordant pairs into distinct gene-gene groups, yielding a list of genomic regions R adjacent to candidate fusion sites. For each member gene of a discordant gene pair, the genomic region R_i is defined by taking the minimum of all 3′ exon/read coordinates in the cluster, and the maximum of all 5′ exon/read coordinates in the cluster. These regions are used to prioritize the search for breakpoints in the next phase (FIG. 4 (a)).
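The flag-based filtering and gene-pair clustering can be sketched as follows. The FLAG bit values (0x1 paired, 0x2 proper pair, 0x4 unmapped, 0x8 mate unmapped) come from the SAM specification; the record layout and the `nearest_gene` lookup are hypothetical stand-ins for reading the .bam file and the exon .bed file, and computation of the R_i regions is omitted.

```python
from collections import defaultdict

# SAM FLAG bits
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8


def is_discordant(flag):
    """Paired read with both mates mapped that the aligner did not mark as a
    proper pair (different chromosome, unexpected insert size, or orientation)."""
    return bool(flag & PAIRED and not flag & UNMAPPED
                and not flag & MATE_UNMAPPED and not flag & PROPER_PAIR)


def group_by_gene_pair(records, nearest_gene):
    """Cluster discordant reads into gene-gene groups.

    records: iterable of (flag, chrom, pos, mate_chrom, mate_pos)
    nearest_gene: callable (chrom, pos) -> gene symbol of the closest exon
    Returns {(gene_a, gene_b): [(chrom, pos, mate_chrom, mate_pos), ...]}.
    """
    groups = defaultdict(list)
    for flag, chrom, pos, mchrom, mpos in records:
        if not is_discordant(flag):
            continue
        pair = tuple(sorted((nearest_gene(chrom, pos), nearest_gene(mchrom, mpos))))
        groups[pair].append((chrom, pos, mchrom, mpos))
    return groups
```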

C2.2 Detection of Breakpoints at Base Pair-Resolution

Discordant read pairs may be introduced by NGS library preparation and/or sequencing artifacts (e.g., jumping PCR). However, they are also likely to flank the breakpoints of bona fide fusion events. As such, all discordant gene pairs identified in the preceding of one read matches the soft-clipped region of the other, FACTERA records a putative fusion event. To assess inter-read concordance (e.g., see reads 1 and 2 in FIG. 4 (c)), FACTERA employs the following algorithm. The mapped region of read 1 is parsed into all possible subsequences of length k (i.e., k-mers) using a sliding window (k=10, by default). Each k-mer, along with its lowest sequence index in read 1, is stored in a hash table data structure, allowing k-mer membership to be assessed in constant time (FIG. 4 (c), left panel). Subsequently, the soft-clipped sequence of read 2 is parsed into non-overlapping subsequences of length k, and the hash table is interrogated for matching k-mers (FIG. 4 (c), right panel). If a minimum matching threshold is achieved (=0.5 × the minimum length of the two compared subsequences), then the two reads are considered concordant. FACTERA will process at most 1000 (by default) putative breakpoint pairs for each discordant gene pair. Moreover, for each gene pair, FACTERA will only compare reads whose orientations are compatible with valid fusions. Such reads have soft-clipped sequences facing opposite directions (FIG. 4 (d), top panel). When this condition is not satisfied, FACTERA uses the reverse complement of read 1 for k-mer analysis (FIG. 4 (d), bottom panel).
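The k-mer concordance test above can be sketched as follows. One assumption needs flagging: the text compares a match count against a length, so here the threshold is applied to matched bases (matching k-mers × k). Function names are hypothetical, and the returned index of the first match in read 1 is the quantity used for the breakpoint-offset adjustment described next.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def kmer_index(seq, k):
    """Hash table of all overlapping k-mers of seq -> lowest start index."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], i)
    return index


def concordant(mapped1, clipped2, k=10, frac=0.5):
    """Compare the mapped part of read 1 with the soft-clipped part of read 2.

    Returns (is_concordant, index in read 1 of the first matching k-mer).
    """
    index = kmer_index(mapped1, k)
    matches, first = 0, None
    for j in range(0, len(clipped2) - k + 1, k):  # non-overlapping k-mers
        hit = index.get(clipped2[j:j + k])
        if hit is not None:
            matches += 1
            if first is None:
                first = hit
    threshold = frac * min(len(mapped1), len(clipped2))
    return matches * k >= threshold, first


def concordant_any_orientation(mapped1, clipped2, same_direction, k=10):
    """If both soft clips face the same direction, test read 1's reverse complement."""
    return concordant(revcomp(mapped1) if same_direction else mapped1, clipped2, k)
```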

In some instances, genomic subsequences flanking the true breakpoint may be nearly or completely identical, causing the aligned portions of soft-clipped reads to overlap. Unfortunately, this prevents an unambiguous determination of the breakpoint. As such, FACTERA incorporates a simple algorithm to arbitrarily adjust the breakpoint in one read (i.e., read 2) to match the other (i.e., read 1). Depending upon read orientation, there are two ways this can occur, both of which are illustrated in FIG. 4 (e). For each read, FACTERA calculates the distance between the breakpoint and the read coordinate corresponding to the first k-mer match between reads. For example, as illustrated in FIG. 4 (e), x is defined as the distance between the breakpoint coordinate of read 1 and the index of the first matching k-mer, j, whereas y denotes the corresponding distance for read 2. The offset is estimated as the difference between these two distances, x and y (see FIG. 4 (e)).

C2.3. In Silico Validation of Candidate Fusions

To confirm each candidate breakpoint in silico, FACTERA performs a local realignment of reads against a template fusion sequence (±500 bp around the putative breakpoint) extracted from the .2bit reference genome. BLAST is currently employed for this purpose, although BLAT or other fast aligners could be substituted. A BLAST database is constructed by collecting all reads that map to each candidate fusion sequence, including discordant reads and soft-clipped reads, as well as all unmapped reads in the original input .bam file. All reads that map to a given fusion candidate with at least 95% identity and a minimum length of 90% of the input read length (by default) are retained, and reads that span or flank the breakpoint are counted. As a final step, output redundancies are minimized by removing fusion sequences within a 20 bp interval of any fusion sequence with greater read support and with the same sequence orientation (to avoid removing reciprocal fusions).
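The read-retention and redundancy-removal rules can be sketched on already-parsed alignment hits. Simplifications: each fusion is represented by a single breakpoint coordinate, spanning and flanking reads are not distinguished, and the `Hit` record, tuple layout, and function names are hypothetical rather than FACTERA's own data structures.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    read_id: str
    fusion_id: str
    pct_identity: float  # percent identity of the alignment
    aln_length: int
    read_length: int


def supporting_reads(hits, min_identity=95.0, min_len_frac=0.9):
    """Reads per candidate fusion passing the identity and aligned-length filters."""
    support = {}
    for h in hits:
        if h.pct_identity >= min_identity and h.aln_length >= min_len_frac * h.read_length:
            support.setdefault(h.fusion_id, set()).add(h.read_id)
    return support


def remove_redundant(fusions, window=20):
    """fusions: [(fusion_id, chrom, breakpoint, orientation, read_support), ...].

    Drops a fusion if another fusion with greater support and the same
    orientation lies within `window` bp; reciprocal fusions are kept.
    """
    return [f for f in fusions
            if not any(g is not f and g[1] == f[1] and g[3] == f[3]
                       and abs(g[2] - f[2]) <= window and g[4] > f[4]
                       for g in fusions)]
```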

FACTERA produces a simple output text file, which includes for each fusion sequence, the gene pair, the chromosomal sequence coordinates of the breakpoint, the fusion orientation (e.g., forward-forward or forward-reverse), the genomic sequences within 50 bp of the breakpoint, and depth statistics for reads spanning and flanking the breakpoint. Fusions identified in patients analyzed in this work are provided in Table 3.

C2.4. Experimental Validation of FACTERA

To experimentally evaluate the performance of FACTERA, we generated NGS data from two NSCLC cell lines, HCC78 (21.5M×100 bp paired-end reads) and NCI-H3122 (19.4M×100 bp paired-end reads), each of which has a known rearrangement (ROS1 and ALK, respectively) (Bergethon et al. (2012) J. Clin. Oncol. 30:863-870; McDermott et al. (2008) Cancer Res. 68:3389-3395) with a breakpoint that has, to the best of our knowledge, not been previously published. FACTERA readily revealed evidence for a reciprocal SLC34A2-ROS1 translocation in the former and an EML4-ALK fusion in the latter. Precise breakpoints predicted by FACTERA were experimentally validated by PCR amplification and Sanger sequencing (FIG. 5; see also Validation of Variants Detected by CAPP-Seq). Importantly, FACTERA completed each run in practical time (~90 sec), using only a single thread on a hexa-core 3.4 GHz Intel Xeon E5690 chip. These initial results illustrate the utility of FACTERA as part of the CAPP-Seq analysis pipeline.

C2.5. Templated Fusion Discovery

We implemented a user-directed option to “hunt” for fusions within expected candidate genes. A fusion could be missed by FACTERA if the fusion detection criteria employed by FACTERA are incompletely satisfied—such as if discordant reads, but not soft-clipped reads, are identified—and will most likely occur when fusion allele frequency in the tumor is extremely low. As input, the method is supplied with candidate fusion gene sequences as “baits”. All unmapped and soft-clipped reads in the input .bam file are subsequently aligned to these templates (using blastn) to identify reads that have sufficient similarity to both (for each read, 95% identity, e-value <1.0e-5, and at least 30% of the read length must map to the template, by default). Such reads are output as a list to the user for manual analysis.
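Templated discovery amounts to keeping reads with a qualifying alignment to both baits. A minimal sketch operating on already-parsed blastn hits; the tuple layout and function name are hypothetical, and the default thresholds mirror those stated above.

```python
def templated_fusion_reads(hits, template_a, template_b,
                           min_identity=95.0, max_evalue=1e-5, min_read_frac=0.3):
    """Reads with a qualifying alignment to BOTH bait templates.

    hits: iterable of (read_id, template_id, pct_identity, evalue,
                       aln_length, read_length)
    Returns a sorted list of read IDs for manual review.
    """
    passing = {}
    for read_id, template, pident, evalue, aln_len, read_len in hits:
        if pident >= min_identity and evalue < max_evalue and aln_len >= min_read_frac * read_len:
            passing.setdefault(read_id, set()).add(template)
    return sorted(r for r, t in passing.items() if {template_a, template_b} <= t)
```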

We tested this simple approach on a low-purity tumor sample found to harbor an ALK fusion by FISH but not by FACTERA (i.e., case P9). Using templates for ALK and its common fusion partner, EML4, we identified 4 reads that mapped to both, in a region with an overall depth of ~1900×. The estimated allele frequency of 0.21% is strikingly similar to the 0.22% tumor purity measured by FACS (FIG. 15), confirming the utility of the templated fusion discovery method. We subsequently FACS-depleted CD45+ immune populations and re-sequenced this patient's tumor. In the enriched tumor sample, FACTERA identified the EML4-ALK fusion, along with two novel ROS1 fusions (FIG. 4 (e), Table 3).

C3. Mutation Recovery: SNVs/Indels

Using a custom Perl script, previously identified reporter alleles were intersected with a SAMtools mpileup file generated for each plasma cfDNA sample, and the number and frequency of supporting reads were calculated for each reporter allele. Only reporters in properly paired reads at positions with at least 500× overall depth were considered.
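The original step used a custom Perl script that is not reproduced here. As a rough Python equivalent, the sketch below counts reads supporting a reporter base from one line of `samtools mpileup` output (columns: chromosome, position, reference base, depth, read bases, base qualities). It assumes the proper-pair restriction was applied upstream, for example by filtering the BAM with `samtools view -f 2`, and it handles SNV reporters only. The function names are hypothetical.

```python
import re


def parse_pileup_bases(bases):
    """Count reference matches and A/C/G/T mismatches in an mpileup read-base string."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0, "REF": 0}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":  # start of a read; the next character is its mapping quality
            i += 2
            continue
        if c == "$":  # end of a read
            i += 1
            continue
        if c in "+-":  # indel following the previous base: +<len><seq> or -<len><seq>
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        if c in ".,":  # match to reference on forward / reverse strand
            counts["REF"] += 1
        elif c.upper() in "ACGT":
            counts[c.upper()] += 1
        i += 1  # '*', '#', '<', '>' and 'N' are skipped
    return counts


def reporter_support(mpileup_line, alt, min_depth=500):
    """Return (chrom, pos, supporting_reads, frequency), or None below min_depth."""
    cols = mpileup_line.rstrip("\n").split("\t")
    chrom, pos, depth, bases = cols[0], int(cols[1]), int(cols[3]), cols[4]
    if depth < min_depth:
        return None
    support = parse_pileup_bases(bases)[alt.upper()]
    return chrom, pos, support, support / depth


# Usage on a toy position (depth 7, far below the 500x cutoff unless overridden).
line = "chr7\t55259515\tT\t7\t.,^]G$G+2AC,-1Tg*\tIIIIIII"
print(reporter_support(line, "G", min_depth=5))
```

Case is folded so that forward- and reverse-strand mismatches both count as support; a strand-aware count would keep them separate.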

C4. Mutation Recovery: Fusions

For enumeration of fusion frequency in sequenced plasma DNA, FACTERA executes the last step of the discovery phase (i.e., in silico validation of candidate fusions, above) using the set of previously identified fusion templates. The fusion allele frequency is calculated as α/β, where α is the number of breakpoint-spanning reads and β is the mean overall depth within a genomic region ±5 bp around the breakpoint. For the NSCLC selector described in this work, only one partner gene of each fusion was contained in the selector library, so β was always calculated on that single gene. If both fusion genes are targeted within a selector library, overall depth is estimated as the mean of the depths calculated for both genes.
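The α/β calculation can be sketched as follows, assuming per-position depths are available as a dictionary for each targeted partner gene. The input structure and function names are hypothetical, and treating an uncovered position as depth 0 is an assumption of this sketch, not something stated in the text.

```python
from statistics import mean


def mean_depth_around(breakpoint, depth_by_pos, window=5):
    """Mean depth over breakpoint +/- window bp; missing positions count as 0 (assumption)."""
    return mean(depth_by_pos.get(p, 0) for p in range(breakpoint - window, breakpoint + window + 1))


def fusion_allele_frequency(alpha, targeted_partners, window=5):
    """alpha / beta for one fusion.

    alpha: number of breakpoint-spanning reads.
    targeted_partners: list of (breakpoint_position, {position: depth}) for each
        partner gene present in the selector (one entry, or two if both are targeted).
    beta is the per-gene mean depth, averaged over the targeted partners.
    """
    beta = mean(mean_depth_around(bp, d, window) for bp, d in targeted_partners)
    return alpha / beta


# Usage: one targeted partner at ~1000x depth and 5 spanning reads -> 0.005.
gene_a = {p: 1000 for p in range(90, 111)}
print(fusion_allele_frequency(5, [(100, gene_a)]))
```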

Notably, in some cases we observed lower fusion allele frequencies than would be expected for heterozygous alleles (e.g., see cell line fusions in Table 3). This was seen in cell lines, in an empirical spiking experiment, and in one patient's tumor and plasma samples (i.e., P6), and could result from inefficient “pull-down” of fusions whose partners are not represented in the selector. Regardless, fusions are useful reporters: they have virtually no background signal and show linear behavior over defined concentrations in a spiking experiment (FIG. 10 (d)). Moreover, plasma allelic frequencies are easily adjusted for such inefficiencies by dividing the measured frequency in plasma by the corresponding frequency in the tumor. When sequenced tumor tissue is impure, tumor content can be estimated using the frequencies of SNVs (or indels) as a reference frame, allowing the fusion fraction to be normalized accordingly (Table 4). As for SNVs/indels, only fusions present in at least one plasma sample were included in calculations of tumor burden.
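One possible reading of this normalization is sketched below; it is an assumption, since the exact procedure behind Table 4 is not given here. The plasma fusion frequency is divided by the tumor fusion frequency. If tumor SNV allele frequencies are supplied, the ratio is then rescaled by their mean so that the fusion is expressed on the same allele-frequency scale as the SNV reporters. The function name is hypothetical.

```python
from statistics import mean


def normalized_plasma_fusion_fraction(plasma_fusion_af, tumor_fusion_af, tumor_snv_afs=None):
    """Correct a plasma fusion allele frequency for inefficient capture (sketch).

    Dividing by the tumor fusion frequency cancels a capture inefficiency shared by
    tumor and plasma. Multiplying by the mean tumor SNV frequency (optional,
    hypothetical reference frame) restores an SNV-equivalent allele frequency.
    """
    ratio = plasma_fusion_af / tumor_fusion_af
    if tumor_snv_afs:
        return ratio * mean(tumor_snv_afs)
    return ratio


# Usage: fusion at 0.2% in plasma and 5% in tumor; tumor SNVs near 20%.
print(normalized_plasma_fusion_fraction(0.002, 0.05, [0.18, 0.22]))
```

The rationale is as follows. Suppose the tumor fusion frequency is e·s, where e is capture efficiency and s is the tumor SNV frequency, and the plasma fusion frequency is e·c, where c is the plasma SNV-equivalent frequency. Then (e·c)/(e·s)·s = c, so the efficiency term cancels.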

C5. Screening Plasma cfDNA without Knowledge of Tumor DNA

We devised the following statistical algorithm as a first step toward non-invasive cancer screening with plasma cfDNA. The method identifies candidate SNVs using iterative models of (i) background noise in paired germline DNA (in this work, PBLs), (ii) base-pair-resolution background frequencies in plasma cfDNA across the selector, and (iii) sequencing error in cfDNA. Illustrative examples are provided in FIG. 17. The algorithm works in four main steps, detailed below.

As input, the algorithm takes allele frequencies from a single plasma cfDNA sample. In a first step, it defines the background allele at each genomic position as the non-dominant base with the highest fractional abundance and analyzes only high-quality background alleles, i.e., those with a depth of at least 500× and strand bias <90% (a conservative default). For consistency with variant calling, we allowed the screening approach to interrogate selector regions within 500 bp of the defined coordinates, expanding the effective sequence space from ~125 kb to ~600 kb.
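The first step might look like the following sketch for a single position. The text does not define strand bias precisely. Here it is assumed to be the share of the allele's supporting reads on the more heavily used strand, and an allele with zero supporting reads is kept as a zero-frequency background. Both choices are assumptions, and the function name is hypothetical.

```python
def background_allele(fwd_counts, rev_counts, min_depth=500, max_strand_bias=0.90):
    """Select the non-dominant base with the highest fractional abundance.

    fwd_counts / rev_counts: dicts base -> read count on the forward / reverse strand.
    Returns (base, fraction, depth), or None if the position fails the QC filters.
    """
    bases = "ACGT"
    totals = {b: fwd_counts.get(b, 0) + rev_counts.get(b, 0) for b in bases}
    depth = sum(totals.values())
    if depth < min_depth:
        return None
    ranked = sorted(bases, key=lambda b: totals[b], reverse=True)
    allele = ranked[1]  # ranked[0] is the dominant (usually reference) base
    support = totals[allele]
    if support > 0:
        # assumed definition: fraction of supporting reads on the busier strand
        bias = max(fwd_counts.get(allele, 0), rev_counts.get(allele, 0)) / support
        if bias >= max_strand_bias:
            return None
    return allele, support / depth, depth


# Usage: 597 reads, G is the most abundant non-dominant base, split 4/3 across strands.
print(background_allele({"A": 300, "G": 4}, {"A": 290, "G": 3}))
```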

Second, the binomial distribution is used to test whether a given input cfDNA allele differs significantly from the corresponding paired germline allele (FIG. 17 (a)-(b)). Here the probability of success is taken to be the frequency of the background allele in PBLs, and the number of trials is the allele's corresponding depth in plasma cfDNA. To avoid contributions from alleles in rare circulating tumor cells that might contaminate PBLs, input alleles with a fractional abundance greater than 0.5% in paired PBLs (by default), or with a binomial probability greater than the Bonferroni-corrected threshold of 2.08×10⁻⁸ (alpha of 0.05/[~600 kb × 4 alleles per position]), are not further considered.
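The germline comparison can be sketched with only the Python standard library. An upper-tail binomial probability is summed in log space so that very small values do not underflow. The per-test threshold is 0.05/(600,000 × 4) ≈ 2.08×10⁻⁸, as in the text. Note that with a PBL background of exactly 0, any supporting read is significant; a real pipeline may need a floor there, which is an assumption not addressed in the text. The function names are hypothetical.

```python
import math


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log1p(-p)
        for j in range(k, n + 1)
    ]
    top = max(log_terms)  # log-sum-exp for numerical stability
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)


def passes_germline_filter(alt_reads, plasma_depth, pbl_fraction,
                           max_pbl_fraction=0.005, alpha=0.05, n_tests=600_000 * 4):
    """Keep a plasma allele only if it is rare in PBLs and enriched beyond PBL background."""
    if pbl_fraction > max_pbl_fraction:
        return False
    return binomial_sf(alt_reads, plasma_depth, pbl_fraction) <= alpha / n_tests


# Usage: 20 supporting reads at 2000x against a 0.1% PBL background -> kept.
print(passes_germline_filter(20, 2000, 0.001))
```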

Third, a database of cfDNA background allele frequencies is assembled. Here, we used the samples analyzed in the present study (i.e., pre-treatment NSCLC samples and 1 sample from a healthy volunteer), leaving out the input sample to avoid bias. Assuming that background allele fractions follow a normal distribution, a Z-test is used to test whether a given input allele differs significantly from typical cfDNA background at the same position (FIG. 17 (a)-(b)). All alleles within the selector are evaluated, and those with a mean background frequency of 5% or greater (by default) or a single-tailed Z-score below the Bonferroni-adjusted cutoff of 5.6 are not further considered (alpha of 0.05, adjusted as above).
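A leave-one-out version of this Z-test is sketched below using Python's `statistics` module. The defaults of 5% and 5.6 come from the text. The `sd_floor` parameter is a hypothetical guard against positions where every reference sample has the same background, and the function names are illustrative.

```python
from statistics import mean, stdev


def leave_one_out(af_by_sample, input_id):
    """Background fractions at one position from every sample except the input."""
    return [af for sample, af in af_by_sample.items() if sample != input_id]


def passes_background_filter(input_af, cohort_afs, max_mean_background=0.05,
                             z_cutoff=5.6, sd_floor=1e-4):
    """Single-tailed Z-test of an input allele against cfDNA background at the same position.

    cohort_afs needs at least two values; sd_floor is a hypothetical safeguard
    against zero variance and is not part of the described method.
    """
    mu = mean(cohort_afs)
    if mu >= max_mean_background:
        return False
    sd = max(stdev(cohort_afs), sd_floor)
    return (input_af - mu) / sd >= z_cutoff


# Usage: background near 0.1% in five reference samples; input allele at 1%.
cohort = [0.001, 0.0012, 0.0009, 0.0011, 0.001]
print(passes_background_filter(0.01, cohort))
```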

Finally, candidate alleles are tested for remaining possible sequencing errors. This step leverages the observation that non-tumor variants (i.e., “errors”) in plasma cfDNA tend to have a higher duplication rate than bona fide variants detectable in the patient's tumor (data not shown). Accordingly, for each input allele, the number of supporting reads is compared between non-deduplicated data (all fragments meeting QC criteria; see Methods) and deduplicated data (only unique fragments meeting QC criteria). An outlier analysis is then used to distinguish candidate tumor-derived SNVs from remaining background noise (FIG. 17 (a)-(c)). Specifically, to reveal outlier tendency in the data, the square root of the robust (Mahalanobis) distance Rd is compared against the square root of the quantiles of a chi-squared distribution, Cs. This transformation reveals a natural separation between true SNVs and false positives in cancer patients (FIG. 17 (a), (c)) and, notably, an absence of outlier structure in patient samples lacking tumor-derived SNVs (FIG. 17 (b), (c)). To call SNVs automatically without prior knowledge, the screening approach iterates through data points in order of decreasing Rd, recalculating the Pearson correlation coefficient Rho between Rd and Cs for points 1 to i, where Rd_i is the current maximum Rd. The algorithm reports outliers (i.e., candidate SNVs) iteratively and terminates when Rho ≥ 0.85.

Importantly, this approach positively identified 60% of the cancer samples with tumor-derived SNVs analyzed in this study, with no false positive calls (FIG. 11 (g)). When corresponding germline DNA from PBLs is unavailable, the second step of this screening routine can be skipped. After removal of germline SNVs with an allelic fraction >20%, this modified approach identified no SNVs when applied to a healthy volunteer.

All patents, patent publications, and other published references mentioned herein are hereby incorporated by reference in their entireties as if each had been individually and specifically incorporated by reference herein.

While specific examples have been provided, the above description is illustrative and not restrictive. Any one or more of the features of the previously described embodiments can be combined in any manner with one or more features of any other embodiments in the present invention. Furthermore, many variations of the invention will become apparent to those skilled in the art upon review of the specification. The scope of the invention should, therefore, be determined by reference to the appended claims, along with their full scope of equivalents.

Claims

1. A method for creating a library of recurrently mutated genomic regions comprising:

identifying a plurality of genomic regions from a group of genomic regions that are recurrently mutated in a specific cancer;

wherein the library comprises the plurality of genomic regions;

the plurality of genomic regions comprises at least 10 different genomic regions; and

at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.

2. The method of claim 1, wherein the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.

3. The method of claim 1, wherein at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.

4. The method of claim 1, wherein at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.

5. The method of claim 1, wherein the identifying step comprises for each genomic region in the plurality of genomic regions, ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.

6. The method of claim 1, wherein the identifying step comprises for each genomic region in the plurality of genomic regions, ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.

7. The method of claim 1, wherein the library comprises a plurality of genomic regions encoding a plurality of driver sequences.

8. The method of claim 7, wherein the driver sequences are known driver sequences.

9. The method of claim 7, wherein the driver sequences are recurrently mutated in the specific cancer.

10. The method of claim 1, wherein the library comprises a plurality of genomic regions that are recurrently rearranged in the specific cancer.

11. The method of claim 1, wherein the specific cancer is a carcinoma.

12. The method of claim 11, wherein the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.

13. The method of claim 1, wherein the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.

14. A method for analyzing a cancer-specific genetic alteration in a subject comprising the steps of:

obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer;

sequencing a plurality of target regions in the tumor nucleic acid sample and in the genomic nucleic acid sample to obtain a plurality of tumor nucleic acid sequences and a plurality of genomic nucleic acid sequences; and

comparing the plurality of tumor nucleic acid sequences to the plurality of genomic nucleic acid sequences to identify a patient-specific genetic alteration in the tumor nucleic acid sample;

wherein the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer;

the plurality of genomic regions comprises at least 10 different genomic regions; and

at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.

15. The method of claim 14, wherein the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.

16. The method of claim 14, wherein at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.

17. The method of claim 14, wherein at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.

18. The method of claim 14, wherein each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.

19. The method of claim 14, wherein each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.

20. The method of claim 14, wherein the plurality of genomic regions comprises genomic regions encoding a plurality of driver sequences.

21. The method of claim 20, wherein the driver sequences are known driver sequences.

22. The method of claim 20, wherein the driver sequences are recurrently mutated in the specific cancer.

23. The method of claim 14, wherein the plurality of genomic regions comprises genomic regions that are recurrently rearranged in the specific cancer.

24. The method of claim 14, wherein the specific cancer is a carcinoma.

25. The method of claim 24, wherein the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.

26. The method of claim 14, wherein the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.

27. The method of any one of claims 14-26, further comprising the steps of:

obtaining a cell-free nucleic acid sample from the subject; and

identifying the patient-specific genetic alteration in the cell-free nucleic acid sample.

28. The method of claim 27, wherein the step of identifying the patient-specific genetic alteration in the cell-free nucleic acid sample comprises sequencing a genomic region comprising the patient-specific genetic alteration in the cell-free sample.

29. The method of claim 27, wherein the step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample comprises the step of enriching the plurality of target regions in the tumor nucleic acid sample and the genomic nucleic acid sample.

30. The method of claim 29, wherein the enriching step comprises use of a custom library of biotinylated DNA.

31. The method of claim 27, wherein the step of obtaining a cell-free nucleic acid sample comprises the step of enriching the plurality of target regions in the cell-free nucleic acid sample.

32. The method of claim 27, further comprising the step of quantifying the cancer-specific genetic alteration in the cell-free sample.

33. A method for screening a cancer-specific genetic alteration in a subject comprising the steps of:

obtaining a cell-free nucleic acid sample from a subject;

sequencing a plurality of target regions in the cell-free sample to obtain a plurality of cell-free nucleic acid sequences; and

identifying a cancer-specific genetic alteration in the cell-free sample;

wherein the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer;

the plurality of genomic regions comprises at least 10 different genomic regions; and

at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.

34. The method of claim 33, wherein the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.

35. The method of claim 33, wherein at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.

36. The method of claim 33, wherein at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.

37. The method of claim 33, wherein each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.

38. The method of claim 33, wherein each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.

39. The method of claim 33, wherein the plurality of genomic regions comprises genomic regions encoding a plurality of driver sequences.

40. The method of claim 39, wherein the driver sequences are known driver sequences.

41. The method of claim 39, wherein the driver sequences are recurrently mutated in the specific cancer.

42. The method of claim 33, wherein the plurality of genomic regions comprises genomic regions that are recurrently rearranged in the specific cancer.

43. The method of claim 33, wherein the specific cancer is a carcinoma.

44. The method of claim 43, wherein the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.

45. The method of claim 33, wherein the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.

46. The method of claim 33, wherein the step of obtaining a cell-free nucleic acid sample comprises the step of enriching the plurality of target regions in the cell-free nucleic acid sample.

47. The method of claim 46, wherein the enriching step comprises use of a custom library of biotinylated DNA.