SYSTEMS AND METHODS FOR COMPREHENSIVE ANALYSIS OF MOLECULAR PROFILES ACROSS MULTIPLE TUMOR AND GERMLINE EXOMES
Omics patient data are analyzed using sequences or diff objects of tumor and matched normal tissue to identify patient and disease specific mutations, using transcriptomic data to identify expression levels of the mutated genes, and pathway analysis based on the so obtained omic data to identify specific pathway characteristics for the diseased tissue. Most notably, many different tumors have shared pathway characteristics, and identification of a pathway characteristic of a tumor may thus indicate effective treatment options ordinarily not considered when tumor analysis is based on anatomical tumor type only.
This application is a continuation application of allowed U.S. patent application with the Ser. No. 14/726,930, which was filed Jun. 1, 2015, which claims the benefit of priority to U.S. provisional application having Ser. No. 62/005,766, filed May 30, 2014, all of which are incorporated by reference herein.
FIELD OF THE INVENTION
The field of the invention is computational omics, especially as it relates to analysis of molecular profiles across a large number of tumor and germline exomes from multiple patient and tumor samples.
BACKGROUND OF THE INVENTION
The background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
While the clinical world is familiar with genomic assays targeted to a limited number of mutations as a means to derive molecular insight to therapies, the power to deliver more comprehensive, non-assumptive, and stochastic molecular analysis is sorely needed to guide treatment decisions that are unbiased to traditional tissue-by-tissue anatomical assignment of therapeutics, or a priori assumptions that a few hundred DNA mutations are drivers of cancer. Indeed, most clinicians today are challenged by a deluge of rapidly advancing science with which it becomes increasingly difficult to keep pace. In this era of personalized medicine, there are nearly 800 drugs in development targeted against specific protein targets driving the growth of the tumor. This cognitive overload may have significant consequences in decision making in life-threatening diseases as complex as cancer.
Today the approach most widely used by oncologists to guide treatment selection of drugs that are targeted against altered proteins is to identify gene DNA mutations in tumor samples deploying panels of fewer than 500 “actionable” genes. Such actionable genes are typically identified from large-scale studies of various cancers (see e.g., Nature Genetics 45, 1127-1133 (2013)). All publications and applications herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
Unfortunately, the current reliance on genotyping of tumor samples to drive treatment decisions is largely based on the assumption that identification of mutated DNA routinely translates downstream (from “DNA to protein expression”) to an alteration in the underlying protein pathways that are targeted by the therapy to be selected, and these identified DNA mutations are thus nominated as clinically actionable. However, exclusive analysis of genetic mutations in tumor genomes fails to take into account whether or not the mutated genes are transcribed at all, whether changes in the genome are variants or disease-drivers, what the functional context of such mutations is, and whether or not compensatory mechanisms exist in a cell affected by such mutations.
Therefore, analysis of selected mutations with disregard of the above drawbacks will likely lead to various false-positive, false-negative, and non-relevant results that in turn may misdirect treatment of a patient. Thus, there remains a need for improved systems and methods for comprehensive analysis of molecular profiles.
SUMMARY OF THE INVENTION
The inventive subject matter is drawn to systems and methods of omics analysis in which shared pathway characteristics are obtained from various distinct tumor samples. Most preferably, omics analysis includes analysis of tumor and matched normal tissue to identify patient and tumor specific changes, which is further refined using transcriptomics data. Based on such analysis, a treatment recommendation is then prepared that is typically independent of the anatomical tumor type but that takes into account a molecular signature characteristic of shared pathway characteristics.
In one aspect of the inventive subject matter, the inventors contemplate a method of identifying a molecular signature for a tumor cell that includes a step of using an analysis engine to receive a plurality of data sets from a respective plurality of patients, wherein at least two (or at least three, or at least five) of the plurality of patients are diagnosed with different tumors, and wherein each data set is representative of genomics information from tumor and matched normal cells. In another step, the analysis engine receives transcriptomics information for the at least two patients, and in yet another step, the analysis engine identifies shared pathway characteristics among the tumor cells of the at least two patients using the genomics information and the transcriptomics information. In a still further step, the analysis engine is then used to assign, on the basis of the shared pathway characteristics, a molecular signature to the tumor cells, wherein the molecular signature is assigned independently of an anatomical tumor type, and a patient record is then generated or updated using the molecular signature.
While not limiting to the inventive subject matter, it is generally contemplated that the data sets are in a BAMBAM format, a SAMBAM format, a FASTQ format, or a FASTA format, and it is typically preferred that the data sets are BAMBAM diff objects. Therefore, in further contemplated aspects, the data sets will preferably comprise mutation information, copy number information, insertion information, deletion information, orientation information and/or breakpoint information.
With respect to the genomics information it is contemplated that such information may be whole genome sequencing information or exome sequencing information, and that the transcriptomics information comprises information on transcription level and/or sequence information. Most typically, the transcriptomics information will cover at least 50% (or at least 80%) of all exomes in the genomics information from the tumor cells. Furthermore, it is contemplated that the transcriptomics information is used in the step of identifying to infer reduced function or absence of function of a protein encoded by a mutated gene.
Therefore, the inventors contemplate that the shared pathway characteristics will include a constitutively activated pathway, a functionally impaired pathway, and a dysregulated pathway, and/or that the shared pathway characteristics may be characterized by a mutated non-functional protein, mutated dysfunctional protein, an overexpressed protein, or an under-expressed protein. In still further preferred aspects, the step of identifying is performed using PARADIGM or other pathway-centric method of analysis.
Additionally, it is contemplated that the molecular signature comprises information about one or more pathway elements, and especially drug identification and type of interaction with the one or more pathway elements. Therefore, it should be appreciated that the patient record may also include a treatment recommendation based on the molecular signature of the tumor cells (e.g., treatment recommendation for a first patient with a first tumor may be based on shared pathway characteristics with a second patient with a distinct second tumor).
Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.
The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
The inventive subject matter provides apparatus, systems, and methods for improved omics analysis of various tumors. More specifically, the inventors discovered that omics data analysis can be significantly improved by first identifying patient and tumor relevant changes in the genome, typically via comparison of tumor and matched normal samples. Once such differences are ascertained, further transcriptomic data of the same patient are used to identify whether the changed sequences are expressed in the tumor. The so obtained patient data are then subjected to pathway analysis to identify pathway characteristics of the tumor, and particularly shared pathway characteristics of the tumor with various other types of tumors. As should be readily appreciated, shared pathway characteristics may be employed to inform treatment using one or more treatment modalities from anatomically unrelated tumors that would otherwise not have been identified. Viewed from a different perspective, different tumor types share pathway characteristics irrespective of the anatomical tumor type, and the knowledge of shared pathway characteristics with respective molecular signatures may identify drug treatment strategies that had not been appreciated for a particular tumor type.
Consequently, in one aspect of the inventive subject matter, the inventors contemplate a method of identifying a molecular signature for a tumor cell, and especially a molecular signature of a cell signaling pathway. Most typically, identification and analysis is performed using a fully integrated, cloud-based, supercomputer-driven, genomic, and transcriptomic analytic engine. It should be noted that any language directed to a computer should be read to include any suitable combination of computing devices, including servers, interfaces, systems, databases, agents, peers, engines, controllers, or other types of computing devices operating individually or collectively. One should appreciate that the computing devices comprise a processor configured to execute software instructions stored on a tangible, non-transitory computer readable storage medium (e.g., hard drive, solid state drive, RAM, flash, ROM, etc.). The software instructions preferably configure the computing device to provide the roles, responsibilities, or other functionality as discussed below with respect to the disclosed apparatus. In especially preferred embodiments, the various servers, systems, databases, or interfaces exchange data using standardized protocols or algorithms, possibly based on HTTP, HTTPS, AES, public-private key exchanges, web service APIs, known financial transaction protocols, or other electronic information exchanging methods. Data exchanges preferably are conducted over a packet-switched network, the Internet, LAN, WAN, VPN, or other type of packet-switched network.
In especially preferred methods, an analysis engine receives a plurality of data sets from a respective plurality of patients, wherein at least two of the plurality of patients are diagnosed with different tumors, and wherein each data set is representative of genomics information from tumor and matched normal cells. In a further step, the analysis engine receives transcriptomics information for the at least two patients and identifies shared pathway characteristics among the tumor cells of the at least two patients using the genomics information and the transcriptomics information (of course, it should be noted that shared pathway characteristics may also be identified only for a single patient sample while pathway characteristics of other tumors may be obtained from a pathway database). In yet another step, the analysis engine is then used to assign, on the basis of the shared pathway characteristics, a molecular signature to the tumor cells, wherein the molecular signature is assigned independently (i.e., in an agnostic manner) of an anatomical tumor type. In a still further step, a patient record may be generated or updated using the molecular signature.
With respect to the data sets from the plurality of patients it is contemplated that the type of data sets may vary considerably and that numerous types of data sets are deemed suitable for use herein. Therefore, data sets may include unprocessed or processed data sets, and exemplary data sets include those having BAMBAM format, SAMBAM format, FASTQ format, or FASTA format. However, it is especially preferred that the data sets are provided in BAMBAM format or as BAMBAM diff objects (see e.g., US2012/0059670A1 and US2012/0066001A1). Therefore, and viewed from another perspective, it should be noted that the data sets are reflective of a tumor and a matched normal sample of the same patient to so obtain patient and tumor specific information. Thus, genetic germ line alterations not giving rise to the tumor (e.g., silent mutation, SNP, etc.) can be excluded. Of course, it should be recognized that the tumor sample may be from an initial tumor, from the tumor upon start of treatment, from a recurrent tumor or metastatic site, etc. In most cases, the matched normal sample of the patient may be blood, or non-diseased tissue from the same tissue type as the tumor.
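By way of a non-limiting illustration, the exclusion of germ line alterations via a matched normal sample can be sketched as a set difference over already-called variants. This is a simplified sketch and not the BAMBAM method of the cited references; the `Variant` type, the function name, and the toy coordinates are hypothetical and chosen for illustration only.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record (illustrative only)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def tumor_specific_variants(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Return variants observed in the tumor but not in the matched normal."""
    return tumor - normal


# Toy data: one germ line variant shared by both samples, one tumor-only variant.
normal_calls = {Variant("chr1", 100, "A", "G")}
tumor_calls = {Variant("chr1", 100, "A", "G"), Variant("chr7", 5000, "C", "T")}

somatic = tumor_specific_variants(tumor_calls, normal_calls)
print(sorted(somatic))
```

In this sketch the variant shared by tumor and normal is treated as a germ line alteration and dropped, so only the tumor-specific change remains for downstream analysis.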
It should also be noted that the data sets may be streamed from a data set generating device (e.g., sequencer, qPCR machine, etc.) or provided from a database storing the data sets. For example, suitable data sets may be derived from a BAM server (e.g., as described in US2012/0059670A1 and US2012/0066001A1) and/or a pathway analysis engine (e.g., as described in WO2011/139345A2 and WO2013/062505A1). Such is particularly true where the data sets from a tumor and matched normal sample are not derived from the patient. Thus, at least some of the data sets may be independently stored and provided, and analysis may be performed on a newly obtained patient sample (e.g., within one week of obtaining patient tissue samples) using data sets from the patient's tumor and matched normal sample and previously stored tumor and matched normal sample not derived from the patient.
With further respect to the data sets it is noted that the data sets from all tumors are in a format that allows ready comparison without further conversion and/or processing. Thus, the data sets will preferably comprise mutation information, methylation status information, copy number information, insertion/deletion information, orientation information, and/or breakpoint information specific to the tumor and the patient. It is still further contemplated that the data set is representative of at least a portion of the entire genome, and most typically the whole genome. Therefore, the data sets are preferably prepared from whole genome sequencing covering the entire genome (or at least 50%, or at least 70%, or at least 90% of the entire genome). Alternatively, exome sequencing is also contemplated, and in most cases it is contemplated that at least 50%, more typically at least 70%, and most typically at least 90% of the entire exome is sequenced.
Moreover, and with respect to the origin of the data sets it should be appreciated that numerous non-patient tumor data are used. Therefore, it is contemplated that data sets other than a patient data set will be derived from at least two different tumors, and more preferably from at least three, or at least five different tumor types to identify shared pathway characteristics. Data sets from different tumor types can be obtained from different patient samples as such samples are available (e.g., from a hospital, clinical trial, epidemiological study, etc.) and/or can be provided from previously acquired analyses or data. For example, the TCGA provides a good sample of well-characterized omic information useful to prepare data sets suitable for use herein and Table 1 below exemplarily illustrates data used in the present analysis.
With reference to the TCGA data it was further observed that different tumor types had multiple mutations in multiple genes. As such, it is apparent that simple targeting of an individual druggable target is in most circumstances not a viable option. Indeed,
As can be taken from
In particularly preferred aspects, transcription information is obtained to cover at least 50%, or at least 70%, or at least 80%, or at least 90% of all exomes in the genomics information from the tumor cells. Thus, it is contemplated that transcripts of a tumor cell or tissue may also be analyzed for their quantity (and optionally also for sequence information to identify RNA editing and/or RNA splicing). Such analysis may include threshold values that are typically user defined as further described in copending US provisional application with the Ser. No. 62/162,530, filed May 15, 2015.
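As a hedged illustration of the coverage criterion, the following sketch computes the fraction of sequenced genes for which transcript data are available and checks it against a threshold. Treating coverage as a per-gene fraction is an assumption made here for simplicity; the function names and gene labels are hypothetical.

```python
from typing import Iterable


def transcript_coverage(genomic_genes: Iterable[str], transcript_genes: Iterable[str]) -> float:
    """Fraction of genes in the genomics data that also have transcript data."""
    genomic = set(genomic_genes)
    if not genomic:
        raise ValueError("genomics information contains no genes")
    return len(genomic & set(transcript_genes)) / len(genomic)


def meets_coverage(coverage: float, threshold: float = 0.5) -> bool:
    """Check coverage against a threshold (e.g., 0.5, 0.7, 0.8, or 0.9)."""
    return coverage >= threshold


# Toy data: four sequenced genes, transcripts available for three of them.
# GENE_9 has transcript data but is absent from the genomics data and is ignored.
sequenced = ["GENE_1", "GENE_2", "GENE_3", "GENE_4"]
with_transcripts = ["GENE_1", "GENE_2", "GENE_3", "GENE_9"]

coverage = transcript_coverage(sequenced, with_transcripts)
print(coverage, meets_coverage(coverage, 0.5), meets_coverage(coverage, 0.8))
```

With these toy inputs the data set would satisfy a 50% criterion but not an 80% criterion.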
In addition to the lack of consideration of transcriptomics data, the functional impact of a mutation within a cell signaling network has not been appreciated in most heretofore known systems and methods, especially where multiple mutations are present in multiple genes associated with a tumor. To overcome this shortcoming, the inventors used the patient and tumor specific mutation information and associated expression levels in an analysis of cell signaling pathways to thereby obtain information on pathway usage and compensation where a pathway function was compromised. Therefore, it is noted that the transcriptomics information is preferably used to infer reduced function or absence of function of a protein encoded by a mutated gene, and with that the influence on a particular pathway.
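One possible way to picture the inference of reduced or absent function is a threshold rule applied to the expression of each mutated gene in tumor versus matched normal tissue. The sketch below is an assumption-laden simplification: the status labels, the `infer_function` name, the default thresholds, and the expression values are all hypothetical, and in practice the thresholds would be user defined.

```python
from enum import Enum


class FunctionStatus(Enum):
    ABSENT = "absent"
    REDUCED = "reduced"
    RETAINED = "retained"


def infer_function(tumor_expr: float, normal_expr: float,
                   absent_below: float = 1.0, reduced_ratio: float = 0.5) -> FunctionStatus:
    """Classify a mutated gene from its expression level (hypothetical thresholds)."""
    if tumor_expr < absent_below:
        # Essentially no transcript: the encoded protein is presumed absent.
        return FunctionStatus.ABSENT
    if normal_expr > 0 and tumor_expr / normal_expr < reduced_ratio:
        # Markedly lower than matched normal: function presumed reduced.
        return FunctionStatus.REDUCED
    # No expression-based evidence of loss; the mutation itself may still impair the protein.
    return FunctionStatus.RETAINED


# Toy data: mutated gene -> (tumor expression, matched normal expression).
mutated_gene_expression = {
    "GENE_A": (0.2, 10.0),
    "GENE_B": (3.0, 10.0),
    "GENE_C": (9.0, 10.0),
}

status = {gene: infer_function(t, n) for gene, (t, n) in mutated_gene_expression.items()}
print(status)
```

The resulting per-gene status could then serve as one input to a pathway-level analysis.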
While various pathway analytical tools are known in the art, the inventors especially contemplate use of dynamic pathway maps in which pathways are expressed as a probabilistic pathway model. For example, pathway analyses may be performed using PARADIGM, as described in WO2011/139345, WO2013/062505, WO2014/059036, or WO2014/193982, using the data sets and transcriptomics information to so arrive at the particular pathway usage of a specific tumor. As will be readily appreciated, where multiple data sets from multiple patients having distinct tumors are employed, the analysis engine will be able to identify for each tumor particular pathway characteristics with a molecular signature of the tumor cells. For example, the analysis engine may identify shared pathway characteristics among multiple tumor types where such shared characteristics may include a constitutively activated pathway, a functionally impaired pathway, and a dysregulated pathway. Such shared pathway characteristics may be characterized by or due to a variety of factors, and exemplary factors leading to a particular pathway characteristic include a mutated non-functional protein, mutated dysfunctional protein, an overexpressed protein, or an underexpressed protein in a pathway, etc. Of course, it should be noted that at least some of the pathway characteristics may be previously determined and stored in a database or that at least some of the pathway characteristics may also be determined de novo. Therefore, it should be recognized that new patient data may be compared against already obtained data from a database.
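The identification of shared pathway characteristics across tumor types may be illustrated, under simplifying assumptions, as a grouping step over per-sample pathway states. The sketch below does not implement PARADIGM; it assumes that a pathway-centric analysis has already assigned a state to each pathway, and all sample, pathway, and tumor type labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def shared_pathway_characteristics(
    samples: List[dict], min_tumor_types: int = 2
) -> Dict[Tuple[str, str], Set[str]]:
    """Map (pathway, state) to the tumor types showing it, keeping only
    characteristics found in at least `min_tumor_types` distinct tumor types."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for sample in samples:
        for pathway, state in sample["pathways"].items():
            if state != "normal":
                seen[(pathway, state)].add(sample["tumor_type"])
    return {key: types for key, types in seen.items() if len(types) >= min_tumor_types}


# Toy data: pathway states as they might be produced by an upstream pathway analysis.
samples = [
    {"patient": "P1", "tumor_type": "tumor_type_A",
     "pathways": {"pathway_X": "constitutively_activated", "pathway_Y": "normal"}},
    {"patient": "P2", "tumor_type": "tumor_type_B",
     "pathways": {"pathway_X": "constitutively_activated", "pathway_Y": "functionally_impaired"}},
    {"patient": "P3", "tumor_type": "tumor_type_B",
     "pathways": {"pathway_Y": "functionally_impaired"}},
]

shared = shared_pathway_characteristics(samples)
print(shared)
```

Here the impaired state of `pathway_Y` occurs in two patients but only one tumor type, so it is not reported as shared across tumor types, whereas the activated state of `pathway_X` is.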
Among other benefits of integrated genomics, transcriptomics, and pathway analysis for multiple tumor types of multiple patients, it should be appreciated that various subsequent analyses are now possible to group or classify certain molecular events into otherwise not observable categories. For example, as is illustrated in
Most notably, and as exemplarily shown in
In another manner of classification, the inventors contemplate that selected pathways and/or pathway elements may be analyzed from multiple different tumors as is exemplarily shown in
Therefore, the inventors contemplate that a patient record will typically include one or more treatment recommendations based on the molecular signature of the tumor cells (and with that based on the shared pathway characteristics with other unrelated tumors). In other words, a treatment recommendation for a first patient with a first tumor may be based on shared pathway characteristics with a second patient with a distinct second tumor.
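As a non-limiting sketch of how a patient record might be updated, the example below looks up treatments associated with each characteristic in a molecular signature and flags those known from a different tumor type. The record layout, the `recommend` function, and the drug, pathway, and tumor type labels are placeholders invented for illustration and do not refer to any actual drug or database.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Characteristic = Tuple[str, str]  # (pathway, state)


@dataclass
class PatientRecord:
    patient_id: str
    tumor_type: str
    signature: FrozenSet[Characteristic]
    recommendations: List[dict] = field(default_factory=list)


def recommend(record: PatientRecord,
              known_treatments: Dict[Characteristic, List[Tuple[str, str]]]) -> PatientRecord:
    """Append treatment options matched on pathway characteristics, not on tumor type."""
    for characteristic in sorted(record.signature):
        for drug, source_tumor in known_treatments.get(characteristic, []):
            record.recommendations.append({
                "drug": drug,
                "basis": characteristic,
                "known_from": source_tumor,
                # True when the option originates from an anatomically different tumor.
                "cross_tumor": source_tumor != record.tumor_type,
            })
    return record


# Toy data: a placeholder drug known from tumor_type_B, applied to a tumor_type_A patient.
known = {("pathway_X", "constitutively_activated"): [("drug_1", "tumor_type_B")]}
record = PatientRecord("P1", "tumor_type_A",
                       frozenset({("pathway_X", "constitutively_activated")}))

recommend(record, known)
print(record.recommendations)
```

The lookup is keyed on the pathway characteristic alone, which is what allows an option known from one tumor type to surface for a patient with another.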
As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise. Moreover, as used herein, and unless the context dictates otherwise, the term “coupled to” is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms “coupled to” and “coupled with” are used synonymously. Moreover, all methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g. “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.
It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the scope of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C . . . and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.
Claims
1. A computer-executed method for identification of a treatment option based on omics data of tumor cells, comprising:
- identifying, by an analysis engine, shared pathway characteristics among tumor cells of a plurality of patients using a plurality of respective omics data sets of the tumor cells;
- wherein each omics data set belongs to each of the plurality of patients, respectively, and wherein at least two of the plurality of patients are diagnosed with different tumors;
- wherein each set of omics data comprises genomics information and transcriptomics information from tumor and matched normal cells of each of the plurality of patients;
- wherein the transcriptomics information comprises expression levels or sequences of transcripts;
- stratifying the tumor cells as belonging to a class of tumors based on the shared pathway characteristics;
- using the analysis engine to provide, based on the class, a treatment option to the tumor cells, wherein the treatment option is common to the tumor cells independently of anatomical types of the tumor cells;
- wherein the treatment option includes identification of a drug and indication that the tumor cells are treatable with the drug, or wherein the treatment option is selected based on known or available treatments for the anatomically unrelated and distinct tumors; and
- treating the tumor cells using the drug or known or available treatment.
2. The method of claim 1 wherein the at least two sets of omics data are in a BAMBAM format, a SAMBAM format, a FASTQ format, or a FASTA format.
3. The method of claim 1 wherein the at least two sets of omics data are BAMBAM diff objects.
4. The method of claim 1 wherein the genomics information comprises mutation information, copy number information, insertion information, deletion information, orientation information and/or breakpoint information.
5. The method of claim 1 wherein the genomics information is whole genome sequencing information.
6. The method of claim 1 wherein the genomics information is exome sequencing information.
7. The method of claim 1 wherein the transcriptomics information covers at least 50% of all exomes in the genomics information from the tumor cells.
8. The method of claim 1 wherein the transcriptomics information covers at least 80% of all exomes in the genomics information from the tumor cells.
9. The method of claim 1 wherein the shared pathway characteristics are selected from the group consisting of a constitutively activated pathway, a functionally impaired pathway, and a dysregulated pathway.
10. The method of claim 1 wherein the shared pathway characteristics are characterized by a mutated non-functional protein, mutated dysfunctional protein, an overexpressed protein, or an underexpressed protein in a pathway.
11. The method of claim 1 wherein the transcriptomics information is used in the step of identifying to infer reduced or absence of function of a protein encoded by a mutated gene.
12. The method of claim 1 wherein the step of identifying is performed using PARADIGM.
13. The method of claim 1, wherein the treatment option includes identification of a drug and indication that the tumor cells are treatable with the drug.
14. The method of claim 1, wherein the treatment option for a first patient with a first tumor is based on shared pathway characteristics with a second patient with a distinct second tumor.
15. The method of claim 1, further comprising analyzing information, by an analysis engine, on pathway usage or compensation where a pathway function is compromised.
16. The method of claim 1, wherein the sequences of transcripts are used to identify RNA editing or RNA splicing.
17. The method of claim 1, wherein the treatment option is selected based on known or available treatments for the anatomically unrelated and distinct tumors.
18. The method of claim 1, wherein the treatment option targets a mutated element common to the tumor cells.
19. The method of claim 1, wherein the treatment option targets a non-mutated element, which compensates for a defect of a pathway where a mutated element common to the tumor cells is disposed.
20. The method of claim 1, wherein the expression level comprises quantity of the transcripts.
Type: Application
Filed: Jan 2, 2020
Publication Date: Jun 11, 2020
Inventors: Shahrooz Rabizadeh (Los Angeles, CA), John Zachary Sanborn (Santa Cruz, CA), Charles Joseph Vaske (Santa Cruz, CA), Stephen Charles Benz (Santa Cruz, CA), Patrick Soon-Shiong (Los Angeles, CA)
Application Number: 16/733,013