IDENTIFYING PRESENCE AND COMPOSITION OF CELL-FREE NUCLEIC ACIDS

Info

Publication number: 20230095082
Type: Application
Filed: Sep 30, 2022
Publication Date: Mar 30, 2023
Inventors: Jaime F. Modiano (Minneapolis, MN), Milcah C. Scott (Minneapolis, MN), John R. Garbe (Woodbury, MN)
Application Number: 17/937,259

Abstract

This disclosure describes example techniques and systems for identifying the presence and/or composition of nucleic acids in the blood of a host organism of a model species harboring tissue of a donor organism of another species. For example, the technique may involve identifying the presence and composition of nucleic acids in the blood of a mouse harboring tissue of a human or a companion animal. The cell-free nucleic acids that are identified can be used as biomarkers to determine the presence of a disease, its biological behavior, its rate of progression, and/or the response of the disease to one or more unique therapies. In other examples, the cell-free nucleic acids may be used as biomarkers to determine a response of the host species to the tissue of the donor organism or a response of tissue derived from the second organism to transplantation within the first organism of the first species.

Description

This application is a continuation of U.S. patent application Ser. No. 15/783,776, filed Oct. 13, 2017, which claims the benefit of U.S. Provisional Patent Application No. 62/407,987, filed Oct. 13, 2016, the entire content of each of which is incorporated herein by reference.

TECHNICAL FIELD

This disclosure relates generally to methods for identifying and analyzing nucleic acids and, more particularly, methods for identifying and analyzing cell-free nucleic-acid biomarkers.

BACKGROUND

The composition and abundance of an organism's nucleic acids provide biomarkers indicative of various aspects of the organism's genome and transcriptional expression, including the organism's predisposition toward particular biological states, as well as the presence and progression of such biological states. Much of a living, multicellular organism's total nucleic acid complement is located intracellularly: DNA is chiefly located within the nuclei of the cells, whereas RNA of numerous types is abundant within the various organelles and cytoplasm of cells. Thus, nucleic acids may be derived from cells and used as biomarkers to determine a biological state of an organism, such as the presence of a disease or the biological behavior of the disease. In applications such as xenografting, tissue from a donor animal may be grafted into a host animal, and then the biological behavior of the donor tissue may be evaluated in the host animal by analyzing nucleic-acid biomarkers derived from cells of the donor tissue.

SUMMARY

This disclosure describes example techniques and systems for determining the composition and abundance of nucleic acids in the blood of a host animal that has received a tissue xenograft and detecting biomarkers of a biological state via identification of the nucleic acids. Such techniques may include creating a combined reference genome that incorporates gene sequences from both the genome of the host animal and the genome of the xenograft donor animal. Cell-free nucleic acid sequences isolated from a blood sample of the host animal may then be aligned with the combined reference genome. In order to distinguish between sequences originating from the donor animal and those originating from the host animal, those sequences that align with a single region of the combined reference genome may be retained for further analysis. The retained sequences may then be analyzed to determine their identity, species of origin, abundance, and association with predetermined gene clusters that represent known biochemical pathways. In some examples, the techniques described herein may enable accurate identification and analysis of biomarkers associated with a biological state of xenograft donor tissue or xenograft host tissue based on a blood sample of the host animal, in part by eliminating from consideration confounding sequences originating from one of the host animal or the donor animal.
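As a minimal sketch of the retention step described above, and not the disclosed implementation, the following Python example keeps only reads having exactly one alignment to a combined reference and assigns each retained read a species of origin. The record layout, the `host_` and `donor_` chromosome-name prefixes, and the function name `retain_unique` are assumptions introduced for illustration.

```python
from collections import Counter

def retain_unique(alignments):
    """Keep reads that align to exactly one region and label their species.

    `alignments` is an iterable of (read_id, chromosome) pairs, one pair per
    reported alignment. Chromosome names are assumed to carry a species
    prefix such as "host_" or "donor_" (a hypothetical convention).
    """
    alignments = list(alignments)
    # Count how many alignment records each read produced.
    hits_per_read = Counter(read_id for read_id, _ in alignments)
    retained = {}
    for read_id, chromosome in alignments:
        if hits_per_read[read_id] == 1:
            species = chromosome.split("_", 1)[0]
            retained[read_id] = (species, chromosome)
    return retained

example = [
    ("read1", "donor_chr5"),
    ("read2", "host_chr11"),
    ("read3", "donor_chr2"),
    ("read3", "host_chr7"),  # aligns in both species, so it is discarded
]
unique = retain_unique(example)
```

In this toy input, `read3` aligns to both species and is dropped, which mirrors the stated goal of eliminating confounding sequences before further analysis.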

In one example, a method comprises obtaining a plurality of exosomes from a sample of bodily fluid derived from a first organism of a first species, wherein the first organism of the first species comprises tissue derived from both the first organism of the first species and tissue derived from a second organism of a second species, and wherein the plurality of exosomes comprises a plurality of molecules of ribonucleic acid (RNA); determining, for substantially each molecule of the plurality of molecules of RNA, a corresponding RNA sequence; determining, for each corresponding RNA sequence, whether the RNA sequence is substantially aligned with exactly one corresponding gene sequence of a combined reference genome; determining one or more characteristics of each RNA sequence substantially aligned with exactly one corresponding gene sequence of the combined reference genome, wherein the one or more characteristics include at least one of: a gene name of the corresponding gene sequence; or a species associated with the corresponding gene sequence, wherein the species is one of the first species and the second species; determining an approximate number of times that each RNA sequence substantially aligned with exactly one corresponding gene occurs in the sample of bodily fluid; and determining, based on one or more of the one or more characteristics of each RNA sequence substantially aligned with exactly one corresponding gene sequence of the combined reference genome or the approximate number of times each RNA sequence substantially aligned with exactly one corresponding gene occurs in the sample of bodily fluid, whether the tissue derived from the second organism of the second species contains a biomarker indicative of at least one of a disease status, a response of the first organism of the first species to the tissue derived from the second organism of the second species, or a response of tissue derived from the second organism to transplantation within the first organism of the first species.

In another example, a method comprises determining a corresponding RNA sequence for substantially each molecule of a plurality of molecules of ribonucleic acid (RNA), wherein a plurality of exosomes from a sample of bodily fluid derived from a first organism of a first species comprises the plurality of molecules of RNA, and wherein the first organism of the first species comprises tissue derived from both the first organism of the first species and tissue derived from a second organism of a second species; determining, for each corresponding RNA sequence, whether the RNA sequence is substantially aligned with exactly one corresponding gene sequence of a combined reference genome; determining one or more characteristics of each RNA sequence substantially aligned with exactly one corresponding gene sequence of the combined reference genome; and determining an approximate number of times that each RNA sequence substantially aligned with exactly one corresponding gene occurs in the sample of bodily fluid; and determining, based on one or more of the one or more characteristics of each RNA sequence substantially aligned with exactly one corresponding gene sequence of the combined reference genome or the approximate number of times each RNA sequence substantially aligned with exactly one corresponding gene occurs in the sample of bodily fluid, whether the tissue derived from the second organism of the second species contains a biomarker indicative of at least one of a disease status, a response of the first organism of the first species to the tissue derived from the second organism of the second species, or a response of tissue derived from the second organism to transplantation within the first organism of the first species.

In another example, a system comprises a reservoir configured to receive a sample of bodily fluid; and processing circuitry configured to: determine a corresponding RNA sequence for substantially each molecule of a plurality of molecules of ribonucleic acid (RNA), wherein a plurality of exosomes from the sample of bodily fluid is derived from a first organism of a first species and comprises the plurality of molecules of RNA, and wherein the first organism of the first species comprises tissue derived from both the first organism of the first species and tissue derived from a second organism of a second species; determine, for each corresponding RNA sequence, whether the RNA sequence is substantially aligned with exactly one corresponding gene sequence of a combined reference genome; determine one or more characteristics of each RNA sequence substantially aligned with exactly one corresponding gene sequence of the combined reference genome; determine an approximate number of times that each RNA sequence substantially aligned with exactly one corresponding gene occurs in the sample of bodily fluid; and determine, based on one or more of the one or more characteristics of each RNA sequence substantially aligned with exactly one corresponding gene sequence of the combined reference genome or the approximate number of times each RNA sequence substantially aligned with exactly one corresponding gene occurs in the sample of bodily fluid, whether the tissue derived from the second organism of the second species contains a biomarker indicative of at least one of a disease status, a response of the first organism of the first species to the tissue derived from the second organism of the second species, or a response of tissue derived from the second organism to transplantation within the first organism of the first species.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a flow diagram illustrating an example technique in accordance with the examples of this disclosure.

FIG. 2 is a flow diagram illustrating an example technique in accordance with the examples of this disclosure.

FIG. 3 is a flow diagram illustrating an example technique in accordance with the examples of this disclosure.

FIG. 4 is a functional block diagram illustrating an example system that may be used to implement the techniques described herein, which may include remote computing devices, such as a server and one or more other computing devices, that are connected to one or more external devices via a network.

FIG. 5 is a functional block diagram further illustrating the external server in the example system of FIG. 4 that may be used to implement the techniques described herein.

FIGS. 6A and 6B are graphical representations of differences in gene expression and patient survival times between OS-1 and OS-2 phenotypes.

FIGS. 7A and 7B are photographic representations of data pertaining to the application of the techniques described herein to the OS-1/OS-2 xenograft example.

FIGS. 8A-8C are photographic and graphical representations of data pertaining to the application of the techniques described herein to the OS-1/OS-2 xenograft example.

FIGS. 9A-9J are graphical representations of data pertaining to the application of the techniques described herein to the OS-1/OS-2 xenograft example.

FIGS. 10A and 10B are graphical representations of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example.

FIG. 11 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example.

FIG. 12 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example.

FIG. 13 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example.

FIG. 14 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example.

FIG. 15 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, indicating predicted upstream regulators identified from the data analysis technique of FIG. 14.

FIG. 16 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, indicating biological processes and canonical pathways associated with gene clusters identified from the data analysis technique of FIG. 14.

FIG. 17 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example.

FIG. 18 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example.

FIG. 19 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example.

FIGS. 20A and 20B are graphical representations of a data gathering and analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example.

FIGS. 21A-21C are graphical representations of data gathering and analysis techniques in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example.

FIGS. 22-27 illustrate tables providing additional information pertaining to the application of the techniques described herein to the OS-1/OS-2 xenograft example.

FIGS. 28A-28C are graphical representations of a workflow by which RNA contents of OS-derived exosomes from cultured cells may be defined using next-generation sequencing, and outcomes of example data analyses performed on data derived from the workflow indicating that exosomes from OS-1 and OS-2 contain transcripts involved in different cell behaviors.

FIGS. 29A-29C are graphical representations of a workflow by which RNA contents of OS-derived exosomes from cultured cells may be defined, and outcomes of example data analyses performed on data derived from the workflow indicating that decreased expression of cytokines may be found in fibroblasts treated with OS-2 derived exosomes.

FIGS. 30A and 30B are graphical representations of differentially expressed mouse genes.

FIGS. 31A and 31B are graphical representations of differentially expressed mouse genes and canine orthologs of the mouse genes.

FIG. 32 is a graphical illustration of a bioinformatics method that shows the number of transcripts at each step of differential expression analysis.

FIGS. 33A-33C are graphical representations of 198 differentially expressed transcripts.

FIGS. 34A-34D are graphical representations of the detection of biomarkers of disease and host response.

DETAILED DESCRIPTION

Favorable clinical outcomes of many medical conditions depend, to varying degrees, on factors such as the accurate prediction of a patient's risk of a condition or disease, reliable testing for early detection of the condition or disease, accurate prediction of disease progression, or the selection and administration of appropriate therapies. In some cases, the composition and abundance of nucleic acids present in the patient's tissue may be used as one or more biomarkers corresponding to the patient's risk of the condition or disease, or to the presence, progression, or potential response of the condition or disease to a particular therapy. Identification of such biomarkers present in the patient's tissue thus may facilitate one or more interventions associated with a favorable clinical outcome of the patient.

At present, diagnostics and interventions based on nucleic-acid biomarkers largely rely on the identification of single nucleic-acid biomarkers, which may be identified based on cells derived from tissue samples (e.g., tumor biopsies) obtained from multiple patients having a particular disease. Tissue samples from another patient then may be analyzed to determine whether the biomarker is present in the sample. While some such biomarkers pass predetermined statistical thresholds for biomarker identification, the use of single biomarkers in diagnosis and treatment-selection is subject to several drawbacks. For example, in the case of a particular type of cancer, there may be inherent genetic heterogeneity within tumors in individual patients and among different patients with the same type of cancer. Due to such heterogeneity, analysis of a tissue sample from a patient may lead to a false-negative diagnosis that the patient does not have the particular type of cancer because the single biomarker was not detected in the tissue sample.

In addition, methods for identifying and detecting biomarkers based on nucleic acids from cells derived from tissue samples are subject to their own limitations. For example, such methods limit the scope of inquiry to the tissue samples themselves. Although nucleic-acid biomarkers identified in tissue samples (e.g., tumor biopsies) may indicate the presence of a disease or condition, such biomarkers do not reflect aspects of the disease or condition that occur outside of the sampled tissue. For example, while much of a cell's nucleic acids are located within the cell, some nucleic acids, such as RNA, can be transported out of the cell inside vesicles called exosomes. In particular, cell-free RNA may be found in the bloodstream of animals inside exosomes.

The RNA contained within exosomes may be indicative of its parent cell's transcription profile, and may provide biomarkers indicative of a patient's biological state. For example, the RNA contained within exosomes found within a patient's bloodstream may be indicative of one or more of the patient's risk of developing a condition or disease, the presence of the condition or disease, the progress of the condition or disease, or the potential future progress of the condition or disease. Indeed, some species of RNA contained within exosomes may be indicative of a metastatic cancer phenotype (i.e., that a cancerous condition is likely to metastasize), and may indicate that a process of metastasis of a primary tumor has begun. Such species of RNA may be microRNAs, which may be packaged into exosomes, secreted from a tumor cell into the bloodstream of an organism, and disseminated to distant tissue sites. Once such microRNAs have infiltrated a distant tissue site, they may perform a signaling or conditioning function that causes non-cancerous cells at the distant tissue site to undergo changes that make the tissue site more favorable to colonization by tumor cells.

In some examples, monitoring of the composition and abundance of such RNA species present within the bloodstream of a patient may reveal a progressive increase in the abundance of such RNA species, which in turn may serve as a biomarker of progress of a cancerous condition toward metastasis. Such findings may inform a clinician's decision to select a particular type of therapy over another type of therapy, as different types of therapy may be predictably more appropriate at different stages of the condition or disease.

Thus, a complement of multiple cell-free nucleic acid biomarkers derived from blood may provide a better indicator of the presence, progress and future progression of a condition or disease than nucleic acid biomarkers derived from cells taken from a tissue sample. The collection of blood samples for the detection and analysis of cell-free nucleic acid biomarkers may be less invasive, costly, and time consuming than tissue biopsies and/or imaging procedures that may be involved in the detection of nucleic acid biomarkers from tissue samples. Thus, diagnostic and treatment-selection methods based on the detection of cell-free nucleic acid biomarkers derived from exosomes present in a patient's bloodstream may provide numerous clinical benefits over interventions based on the detection of a single nucleic acid biomarker derived from cells.

Described herein are example techniques and systems for cell-free nucleic acid biomarker identification that may be used to account for the genetic heterogeneity of many conditions and diseases, as well as example methods for detecting such biomarkers within the blood of a patient. Such methods may be used for virtually any disease or condition for which there are cell lines that grow as xenografts, or patient-derived xenografts that reflect the expected variance in a disease or condition. One method involves the use of gene cluster expression summary scores. These scores account for coordinated transcriptional regulation of multiple genes, overcoming deficiencies of single biomarkers. Another method includes the use of reconstructed hybrid genomes (e.g., mouse host and tumor donor species) and bioinformatics approaches to identify mRNAs expressed exclusively in the tumor cells (donor) and mRNAs expressed exclusively in supporting stromal cells (host). Such methods may distinguish mRNAs present in donor-derived exosomes from mRNAs present in host-derived exosomes, enabling a novel means to identify candidate biomarkers. Because the approach is not restricted to identification of donor-derived exosomes, it can also measure biomarkers of host response as well as biomarkers that can define response to therapy. The example methods described herein thus may provide efficient and cost-effective ways to discover biomarkers that can inform the risk of diseases or conditions and their diagnosis and prognosis, that may be used to predict response to therapy, and that may be rapidly validated in patient samples.
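This passage does not specify how a gene cluster expression summary score is computed. One common approach, assumed here purely for illustration, is to standardize each gene across samples and then average the standardized values over the genes of a cluster. The function name `cluster_summary_scores`, the gene names, and the toy expression values below are hypothetical.

```python
from statistics import mean, pstdev

def cluster_summary_scores(expression, cluster_genes):
    """Return one score per sample: the mean z-score over the cluster's genes.

    `expression` maps gene name -> list of expression values, one per sample.
    Genes with zero variance contribute a z-score of 0 for every sample.
    This scoring rule is an assumption, not a definition from the disclosure.
    """
    n_samples = len(next(iter(expression.values())))
    z_by_gene = []
    for gene in cluster_genes:
        values = expression[gene]
        mu, sigma = mean(values), pstdev(values)
        z_by_gene.append([(v - mu) / sigma if sigma else 0.0 for v in values])
    # Average across genes, sample by sample.
    return [mean(z[i] for z in z_by_gene) for i in range(n_samples)]

toy = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [2.0, 4.0, 6.0],
    "geneC": [5.0, 5.0, 5.0],
}
scores = cluster_summary_scores(toy, ["geneA", "geneB"])
```

Because the score pools several co-regulated genes, a sample can still score high when any single gene of the cluster is weakly detected, which is the stated advantage over a single biomarker.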

In some methods described herein, identification of cell-free nucleic acid biomarkers may be conducted via xenograft procedures. In such procedures, a tissue sample may be obtained from a donor animal of a first species. Whole tissue or cell lines cultured from the tissue sample may then be grafted into a host animal of another species. In some examples, the tissue sample or cell lines derived from the donor animal may be derived from tissue harboring a disease or condition, such as tissue from a cancerous tumor. In such examples, healthy tissue or cell lines derived from another donor animal may be introduced into another host animal as a control. In other examples, the tissue derived from the donor animal may be healthy organ tissue, and it may be desirable to identify cell-free nucleic acid biomarkers associated with a response of the host tissue and/or the donor organ tissue to a graft procedure. In either example, blood samples subsequently may be obtained from the host animal, from which exosomes containing RNA may be extracted. The RNA then may be sequenced, quantified, and aligned with a hybrid genome prepared by combining the genomes of the donor-animal species and the host-animal species. In order to distinguish RNA sequences derived from the donor species from RNA sequences derived from the host species, sequences derived from the respective species may be analyzed separately. In examples in which cell-free nucleic acid biomarkers of a disease or condition are to be determined, donor sequences resulting from control donor animals also may be compared to donor sequences resulting from test donor animals. RNA sequences that are differentially expressed in the tissue derived from the control donor animals and the tissue derived from the test donor animals may be identified as biomarker sequences for use in disease diagnosis or analysis of disease behavior. Similarly, RNA sequences that are differentially expressed in instances of organ-tissue acceptance or rejection by the host animals may be identified as biomarker sequences for use in analysis of organ-transplant feasibility.

In other examples, a biological status associated with a particular biomarker may be associated with acceptance or rejection of the tissue of the donor animal by the body of a host animal. Example techniques described herein also may include identifying a predisposition toward a particular biological state of the donor animal based on the presence of a biomarker, such as a predisposition toward acceptance or rejection of donor tissue. In other examples, such techniques may include identifying targeted therapies based on the presence of a biomarker that indicates that the body of the host animal is accepting or rejecting the tissue of the donor animal, such as in the case of organ tissue transplanted into the host animal to assess the long-term feasibility of organ transplant from the donor animal into a different host animal of yet another species.

For the sake of illustration, the example techniques described herein are described within the context of an example in which the host animal is a mouse harboring cells associated with a disease of humans or companion animals, such as cells derived from a canine osteosarcoma tumor. The xenografts described in the example presented herein represent two molecular phenotypes of osteosarcoma (OS-1 and OS-2) with distinct biological behavior that are highly conserved between dogs and humans. However, it should be understood that the example techniques may be used in the identification and detection of biomarkers associated with other diseases or conditions of humans or other animal species.

Osteosarcoma (OS) is a heterogeneous disease with a disproportionate human impact, as it mainly affects children and adolescents, and is the most common malignant pediatric tumor of bone. Standard therapy for OS comprises neoadjuvant chemotherapy, surgery, and adjuvant chemotherapy. The 5-year survival rate of OS patients with localized and operable OS is 60-70%, but the outcome of patients with non-resectable or metastatic OS is poor, as more than half of patients with OS succumb to metastatic disease.

OS is also the most common primary malignant tumor of bone in dogs, and it is particularly prevalent in large and giant breeds. OS is an incurable, highly prevalent cancer of large and giant breed dogs that has been identified as a high priority for health research by over 25 AKC Parent breed clubs. In contrast to humans, OS occurs most commonly in older dogs. The number of diagnoses per year has been estimated at >8,000, and possibly as high as 80,000 in the US, with the lifetime risk for OS in some breeds being as high as 1 in 5 to 1 in 7. Similarly to humans, the outcome of canine patients with metastatic OS is poor. Tumors at the primary site may be managed surgically, but most dogs with OS die from metastasis to lungs or to other bones or organs.

These collective statistics illustrate that progress in managing OS has been hindered by its heterogeneity in both humans and in dogs. For example, neither the histological appearance nor the propensity of the tumor cells to elaborate bone, cartilage, or collagen matrices is predictive of behavior, and while recurrent molecular events have been described, these are yet to be adopted as prognostic or predictive biomarkers for this disease. Thus, clarification of the etiology of the disease, development of better strategies to manage disease progression, and methods to guide personalized treatments are among the unmet health needs for both human OS patients and canine patients. These needs may be met by models (e.g., models in species other than humans or canines) that accurately recapitulate the natural heterogeneity of OS in both humans and in dogs. Such models may provide a better understanding of the events that underlie OS tumor heterogeneity and contribute to disease progression, which in turn may enable the development of effective strategies to manage OS and to improve outcomes. In some cases, a single model may be applicable to both humans and dogs, because spontaneous OS may be a homologous cellular and molecular disease of humans and dogs. For example, prognostically significant gene- and microRNA-expression signatures have been discovered that are evolutionarily conserved in human and canine OS. Such expression signatures may predict both the biological behavior of OS and patient survival. While not necessarily linked to metastatic potential, the molecular components of such gene and microRNA expression profiles may reflect tumor growth, invasive potential, time to metastasis, or patient response to therapy.

Techniques for modeling OS to obtain a better understanding of OS disease events are within the scope of this disclosure and are described in further detail below. Such techniques also may enable a better understanding of events that occur in other diseases, such as other diseases that may affect dogs, other non-human animals, or humans. In addition, such techniques may enable a better understanding of events that occur in other medical situations, such as in tissue-transplant or other situations.

In the example described below with respect to FIGS. 7A-34D, the techniques may be illustrated with respect to orthotopic xenografts of canine osteosarcoma in nude mice, or with respect to cells cultured with exosomes derived from OS-1 or OS-2 tumors. In this case, potential biomarkers for disease include nucleic acids (genes) indicative of osteosarcoma (canine origin), nucleic acids indicative of biological behavior and/or progression for specific osteosarcomas (canine origin), and nucleic acids indicative of host response to bone invasion, host response to osteosarcoma in general, and response to distinct osteosarcomas with different biological behavior in particular (all of murine origin).

In some examples, cells used for xenografts are called OS-1 (OSCA-32) and OS-2 (OSCA-40). Such cells may be derived from canine tumors with distinct biological behavior and recapitulate this behavior in xenografts. In this example, the cross-species hybrid genome approach may be used to identify separate canine and mouse sequences from tumor xenografts that inform the progression of disease (in the mouse). Thus, it is possible to use tumor samples grown in mice to determine the contribution of dog sequences (derived from the implanted, growing tumor cells) and mouse sequences (derived from infiltrating stroma) to define features of progression for tumors arising from implantation of the different cell lines. In the following description, references are made to illustrative examples. It is understood that other examples may be utilized without departing from the scope of the disclosure.

FIG. 1 is a flow diagram illustrating an example technique according to this disclosure. At block 102, serum may be isolated from blood collected from mice at a “time 0,” i.e., prior to any manipulation. In some examples, experimental groups may include: mice injected intra-tibially with PBS (phosphate-buffered saline), with no cells, i.e., control for host response to intratibial injection and possible consequent inflammation; mice injected intra-tibially with OS-1 cells; and mice injected intra-tibially with OS-2 cells. In this example, serum may be isolated from blood collected from mice in each group every two weeks for up to 8 weeks. For each group, there may be two cages of 4 mice each. Each cage may be an experimental replicate (blood pooled from all the mice in the cage to isolate sufficient serum for exosomes; furthermore, blood may be pooled for analysis from weeks 2, 4, 6, and 8 for each cage, although aliquots may be preserved from the pool for each week for validation by qRT-PCR).

Exosomes may first be isolated from the serum (102). In some examples, this may be accomplished by using ExoQuick kits from System Biosciences, Inc. (SBI), although other suitable techniques may be used.

Next, total RNA may be isolated from the exosomes (104). For example, the RNA may be isolated by using the Complete SeraMir Exosome RNA Amplification kit from SBI and precipitated with the Dr. GenTLE (Gene Trapping by Liquid Extraction) System from SBI, although other suitable techniques may be used.

Sequencing libraries may be generated (108) from the RNA by using Nextera XT DNA Library Preparation Kit (Clontech) at the University of Minnesota Genomics Center (UMGC), although other suitable techniques and facilities may be used. In some examples, sequencing-library preparation may include RNA purification, reverse-transcriptase PCR production of cDNA from the RNA molecules, PCR amplification of the resulting cDNA molecules, and transcription of the cDNA molecules into RNA. Sequencing may be done at UMGC on a 50 base-pair paired-end (PE) run on a HiSeq 2500 nucleic acid sequencing instrument using Rapid chemistry. In some examples, it may be desirable to use 8 samples per lane and generate >120 M reads, which may be fairly well balanced across projects. Preferably, average quality scores may be above Q30 for all PE reads.

The sequences obtained at block 106 are then compared to a cross-species hybrid genome, followed by bioinformatic analyses (110). A summary of example bioinformatics methods for creation of and mapping to the cross-species hybrid genome, and the workflow of data analysis steps with illustrations, is described below.

FIG. 2 is a flowchart illustrating an example bioinformatics method according to this disclosure. It will be summarized here with respect to FIG. 2 and described in greater detail below. First, a single hybrid reference genome for two species may be created by combining the reference sequences of all chromosomes of each species into one file, with chromosome names modified to indicate the species of origin (202). Next, a single hybrid genome annotation file describing the locations of genes in the genome may be created by combining the annotation of each species into one file, with chromosome and gene names modified to indicate the species of origin (204). A sequence alignment program such as HISAT2 may be used to align RNA-Seq sequence reads to the hybrid genome (206). Most reads will map uniquely to a chromosome of one of the species. Some parts of the genomes will be identical in both species resulting in a small number of multi-mapped reads mapping to two chromosomes, one from each species, although longer sequence reads reduce the number of multi-mapped reads. The presence and abundance levels of genes are determined by comparing the genomic location of each uniquely aligned read with the genomic locations of genes in the hybrid annotation file and summing the number of reads aligning to each gene.
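By way of illustration only, the creation of the hybrid reference genome (202) might be sketched as follows. This is a minimal sketch and not the disclosed implementation: the function names, the species tags "dog_" and "mouse_", and the toy sequences are assumptions introduced for this example.

```python
from typing import Dict, Iterable, Iterator, List


def prefix_fasta_records(lines: Iterable[str], species_tag: str) -> Iterator[str]:
    """Yield FASTA lines with each sequence name prefixed by a species tag.

    The tag format (e.g. "dog_") is a hypothetical naming convention.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header line: ">chr1 description" becomes ">dog_chr1 description".
            yield ">" + species_tag + line[1:]
        elif line:
            # Sequence line: passed through unchanged; blank lines are dropped.
            yield line


def build_hybrid_reference(references: Dict[str, Iterable[str]]) -> List[str]:
    """Concatenate the tagged records of every species into one reference."""
    hybrid: List[str] = []
    for species_tag, lines in references.items():
        hybrid.extend(prefix_fasta_records(lines, species_tag))
    return hybrid


# Toy sequences, invented for illustration only.
dog_fasta = [">chr1", "ACGTACGT", ">chr2", "GGGTTTAA"]
mouse_fasta = [">chr1", "TTGACCAA"]

hybrid = build_hybrid_reference({"dog_": dog_fasta, "mouse_": mouse_fasta})
print("\n".join(hybrid))
```

Because both species have a chromosome named "chr1" in this toy input, the tag is what keeps the two records distinguishable after they are combined into one file.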

Next, multi-mapped reads are excluded from the analysis (208). Excluding multi-mapped reads from the abundance estimation step may be useful to help avoid incorrectly identifying the presence of graft-derived nucleic acids. Aligning RNA-Seq reads only to the reference genome of the graft species may result in the spurious identification of graft-derived genes in cases where the genes have identical sequences in both species. Comparing gene expression levels from a xenograft sample with a negative control sample may provide further power to reduce false positives. The identity and abundance of genes originating from the donor animal, which in this example may be a dog, are then determined (210). As described in further detail below, the determined identity and abundance of genes originating from the donor animal may be used to determine the presence of disease and disease progression, and may inform treatment decisions.
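The exclusion of multi-mapped reads (208) and the comparison against a negative control might be sketched as follows. This is a simplified illustration under stated assumptions: alignments are represented as hypothetical (read identifier, chromosome) pairs, the species is read from an assumed "dog_" or "mouse_" chromosome prefix, and the rule used in `graft_candidates` (a gene is kept only if it has no reads in the control) is one possible comparison, not the disclosed one.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One entry per reported alignment; a read listed more than once is multi-mapped.
Alignment = Tuple[str, str]  # (read_id, chromosome name in the hybrid genome)


def uniquely_mapped(alignments: List[Alignment]) -> Dict[str, str]:
    """Return {read_id: chromosome} for reads with exactly one alignment."""
    hits = Counter(read_id for read_id, _ in alignments)
    return {read_id: chrom for read_id, chrom in alignments if hits[read_id] == 1}


def reads_per_species(unique: Dict[str, str]) -> Counter:
    """Count uniquely mapped reads by the species prefix of the chromosome."""
    return Counter(chrom.split("_", 1)[0] for chrom in unique.values())


def graft_candidates(xenograft_counts: Dict[str, int],
                     control_counts: Dict[str, int],
                     min_reads: int = 1) -> List[str]:
    """Genes seen in the xenograft sample but absent from the negative control."""
    return sorted(
        gene for gene, n in xenograft_counts.items()
        if n >= min_reads and control_counts.get(gene, 0) == 0
    )


# Toy alignments, invented for illustration only.
alignments = [
    ("read1", "dog_chr1"),
    ("read2", "mouse_chr5"),
    ("read3", "dog_chr7"),     # read3 aligns to both species,
    ("read3", "mouse_chr11"),  # so it is multi-mapped and excluded
    ("read4", "mouse_chr2"),
]
unique = uniquely_mapped(alignments)
print(reads_per_species(unique))
print(graft_candidates({"dog_GENE_A": 12, "dog_GENE_B": 3}, {"dog_GENE_B": 2}))
```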

FIG. 3 is a graphical representation related to the techniques described with respect to the flowcharts of FIGS. 1 and 2. Specifically, FIG. 3 provides an overview of the techniques described with respect to the flowcharts of FIGS. 1 and 2 within the context of a dog-to-mouse xenograft, where the dog harbors a primary tumor. However, FIG. 3 is illustrative in nature, and provides a broad overview of the techniques described herein. Other species may be substituted for the dog and mouse illustrated in FIG. 3, for example, such as other murine species, rodent species, feline, porcine, or non-human primate species. In addition, in some examples, the donor organism may be a human. In the example of FIG. 3, a first organism of a first species (e.g., a dog donor-organism) may harbor diseased tissue such as a primary tumor. A clinician or experimenter may obtain cells from the primary tumor of the dog donor-organism and introduce the cells into a second organism of a second species (e.g., a mouse host-organism) in a xenograft process. Next, the cells from the primary tumor of the dog donor-organism may be allowed to grow and form a xenograft tumor within the mouse host-organism. Thereafter, the clinician or experimenter may obtain a sample of bodily fluid (e.g., blood or blood serum) from the mouse host-organism. Exosomes containing RNA molecules then may be isolated from the sample of bodily fluid, such as by the clinician or experimenter, or by a suitable instrument, and the RNA molecules contained within the exosomes may be sequenced (e.g., by next-generation or other sequencing techniques) to determine a sequence for substantially each molecule of RNA. Bioinformatics analysis then may be performed on the RNA sequences. In some examples, the bioinformatics analysis may be performed by a system that includes processing circuitry (as described below with respect to FIGS. 4 and 5), as well as a suitable receptacle or reservoir configured to receive the sample of bodily fluid.

The bioinformatics analysis illustrated in FIG. 3 may include determining whether each RNA sequence is substantially aligned with exactly one corresponding gene sequence (e.g., a coding or regulatory sequence) of a combined reference genome that includes the genomes of both the dog donor-organism and the mouse host-organism, and determining an approximate number of times that each RNA sequence that aligns with the combined reference genome occurs in the sample of bodily fluid. One or more of the characteristics (e.g., a gene name or a species) of the RNA sequences that align with the combined reference genome or the number of times that such RNA sequences occur in the sample of bodily fluid then may be used to determine whether the sample of bodily fluid contains a biomarker (e.g., a nucleic acid sequence) associated with a disease status of the dog donor-organism.

In some examples, a disease status may be a predisposition to a disease, the presence of a disease, or the progress or potential progression of a disease of the dog donor-organism. In some examples, the disease status may enable a clinician or experimenter to select an appropriate treatment for the dog donor-organism or the mouse host-organism. In other examples, instead of a disease status, the bioinformatics analysis may indicate a response of the mouse host-organism to non-diseased cells or tissue derived from the dog donor-organism, or a response of the non-diseased cells or tissue derived from the dog donor-organism to transplantation within the mouse host-organism. In such examples, the characteristics of the RNA sequences that align with the combined reference genome or the number of times that such RNA sequences occur in the sample of bodily fluid then may be used to determine whether the dog donor-organism is a good candidate to receive a tissue transplant (e.g., an organ transplant).

In some examples, the bioinformatics analysis illustrated in FIG. 3 may include a determination that one or more of the RNA sequences that correspond to exactly one gene sequence are associated with a predetermined cluster of genes. Such clusters of genes may be groups of genes that share one or more functional characteristics. The functional characteristics of the gene clusters may include one or more biological processes or canonical pathways, such as one or more of transcriptional regulation, intracellular signaling, intercellular signaling, cell apoptosis, biomolecule metabolism, biomolecule synthesis, RNA processing, or macromolecule assembly. The example technique broadly illustrated in FIG. 3 will be discussed in greater detail below with respect to FIGS. 7A-34D.

As illustrated in FIGS. 4 and 5, various aspects of the techniques described herein may be implemented within one or more processors, including one or more microprocessors, DSPs, ASICs, FPGAs, or any other equivalent integrated or discrete logic circuitry, as well as any combinations of such components, embodied in programmers, such as physician or patient programmers, electrical stimulators, or other devices. The term “processor” or “processing circuitry” may generally refer to any of the foregoing logic circuitry, alone or in combination with other logic circuitry, or any other equivalent circuitry.

FIGS. 4 and 5 are functional block diagrams of an example system 218 configured to perform the techniques described in accordance with the disclosure. In the example illustrated in FIG. 4, one or more computing devices 230A-230N are connected to network 222. In some examples, an external server device, such as server device 224, may also be connected to network 222. The server device 224 shown in FIGS. 4 and 5 may include processing circuitry 228, memory 226, user interface 242, communication module 244, and power source 240. Processing circuitry 228 may include one or more processors. In one example, processing circuitry 228 is configured to run the software instructions in order to control operation of system 218. Processing circuitry 228 can include one or more processors, including one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or any other equivalent integrated or discrete logic circuitry, as well as any suitable combination of such components. The term “processor” or “processing circuitry” may generally refer to any of the foregoing logic circuitry, alone or in combination with other logic circuitry, or any other equivalent circuitry.

Memory 226 may include any volatile or non-volatile media, such as a random access memory (RAM), read only memory (ROM), non-volatile RAM (NVRAM), electrically erasable programmable ROM (EEPROM), flash memory, and the like. As mentioned above, memory 226 may store information including instructions for execution by processing circuitry 228 such as, but not limited to, instructions for performing the techniques described herein. Communication module 244 may provide one or more channels for receiving and/or transmitting information. Communication module 244 may be configured to perform wired and/or wireless communication with other devices, such as radio frequency communications. In other examples, communication module 244 may not be implemented, and instead, memory 226 may be removable (e.g., a removable flash memory).

Power source 240 delivers operating power to various components of system 218. Power source 240 may generate operational power from an alternating current source (e.g., residential or commercial electrical power outlet) or direct current source such as a rechargeable or non-rechargeable battery and a power generation circuit to produce the operating power. In other examples, non-rechargeable storage devices may be used for a limited period of time.

In one or more examples, the functions described in this disclosure may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored, as one or more instructions or code, on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media forming a tangible, non-transitory medium. Instructions may be executed by one or more processors, such as one or more DSPs, ASICs, FPGAs, general purpose microprocessors, or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to one or more of any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.

In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules. Depiction of different features as modules or units is intended to highlight different functional aspects and does not necessarily imply that such modules or units must be realized by separate hardware or software components. Rather, functionality associated with one or more modules or units may be performed by separate hardware or software components, or integrated within common or separate hardware or software components. Also, the techniques could be fully implemented in one or more circuits or logic elements. The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including an IMD, an external programmer, a combination of an IMD and external programmer, an integrated circuit (IC) or a set of ICs, and/or discrete electrical circuitry, residing in an IMD and/or external programmer.

Further aspects of the disclosure will now be discussed, including further details of the techniques described herein. The example laboratory techniques described herein for accomplishing routine laboratory tasks, such as the collection of blood and the isolation of serum from blood, as well as others, are not intended to be limiting and may be performed by any suitable laboratory techniques. In addition to the techniques described above, supplementary techniques, as described below, may be employed. The example techniques described herein may be used to identify cell-free transcripts or other nucleic acids in blood that are specifically associated with the particular tumor and the host response, thereby creating sets of biomarkers with distinct diagnostic utility. In some examples, the human or animal disease may be re-created in a mouse (xenograft, xenotransplants), with the recognition that certain components of the response will be absent when immunodeficient mice are used (for example, the host T-cell component in athymic nude mice).

Tumor cells may regularly enter the circulation. Most of these cells die because they fail to adapt to conditions of growth outside the tumor niche. However, rare cells survive at distant sites and re-establish a niche that is suitable for tumor growth. It is known that tumors with different metastatic potential (including OS) display distinct patterns of gene expression, which may be derived from tumor cell-autonomous properties, but may be influenced by the local tumor microenvironment (TME). Stratification of OS into prognostically significant groups: Canine and human OS may be stratified into more and less aggressive groups based on their molecular signatures. The differences in behavior among these subtypes of OS are intrinsic; that is, they are not entirely due to therapeutic management. Moreover, while OS generally has high potential to metastasize, approximately 30-40% of children and 10-25% of dogs treated for this disease will survive more than 10 years and more than 2 years, respectively. Thus, some tumors metastasize more slowly than others, and this may be true for tumor xenografts in immunodeficient mice.

The example described herein illustrates that OS-2 xenografts may have a greater propensity to disseminate to the lung than OS-1 xenografts. Highly expressed genes present in stromal cells of OS-2 xenograft tumors are associated with inflammation and immune response. In addition, pulmonary fibroblasts treated with OS-2 exosomes show decreased expression of chemokines that are strongly chemotactic for lymphocytes and that establish an environment favoring polarization to the Th17 phenotype. OS-2 derived exosomes contain genes involved in immune regulation and inflammation. Both OS-1 and OS-2 exosomes may enable re-programming of the tumor micro-environment and establishment of a favorable niche for tumor dissemination and metastases.

FIGS. 6A and 6B are graphical representations of differences in gene expression and patient survival times between OS-1 and OS-2 phenotypes. FIG. 6A is a sample dendrogram indicating that representative sample canine cell lines OSCA-40 (associated with the OS-2 phenotype) and OSCA-32 (associated with the OS-1 phenotype) were used in the analysis of the example described herein. FIG. 6B is a chart that depicts a Kaplan-Meier analysis of overall survival time (ST) of canine patients according to the sample clustering shown in FIG. 6A. As shown in FIG. 6B, the median survival time of patients with OS-1 is 13.97 months, whereas the median survival time of patients with OS-2 is 2.83 months. The difference in survival times between the OS-1 and OS-2 phenotypes may be at least partially attributable to the greater metastatic potential of OS-2 relative to OS-1. In the example of OS-1 and OS-2, the OSCA-40 cell line may be rapidly metastatic, whereas the OSCA-32 cell line may be poorly or non-metastatic in vivo. Gene expression may vary between different OS cell lines. For example, miR-20a and miR-135b are expressed at higher levels in more aggressive cell lines; this pattern is conserved in human OS. In order to determine or anticipate the biological behavior of OS, OS xenografts can be established in the tibia or tarsus of immunocompromised mice, and the gene expression of the OS xenografts may be analyzed. In addition, such techniques may enable determination of biological interactions between the xenograft tissue (e.g., OS-1 or OS-2 cell lines) and the TME within the host tissue. These experiments may be of relatively short duration, and pain associated with tumor growth may be managed by using medication and/or amputation as appropriate.

Tumor exosomes package bioactive molecules such as mRNAs and microRNAs. Thus, nucleic acids enriched in OS tumors with distinct biological behaviors may be contained in exosomes. Exosomes in canine OS cell lines have been characterized in situ, by flow cytometry, and by biochemical analysis. The data suggest there may be preferential accumulation of mRNA species in cells derived from tumors with different behavior. Thus, at least some of the in vivo behavior of OS may be associated with an altered composition of exosomes that enter the circulation and influence distant environments in preparation for tumor dissemination.

For this study, cell lines derived from two spontaneous canine OS with distinctly different biological behavior (OS-1 and OS-2) were used for heterotypic in vivo modeling that recapitulates the heterogeneous biology and behavior of this disease. Both cell lines demonstrated stability of the transcriptome when grown as orthotopic xenografts in athymic nude mice. Consistent with the behavior of the original tumors, OS-2 xenografts grew more rapidly at the primary site and had greater propensity to disseminate to lung and establish microscopic metastasis. Moreover, OS-2 promoted formation of a different tumor-associated stromal environment than OS-1 xenografts. In addition to comprising a larger fraction of the tumors, a robust pro-inflammatory population dominated the stromal cell infiltrates in OS-2 xenografts, while a mesenchymal population with a gene signature reflecting myogenic signaling dominated those in the OS-1 xenografts. The studies described herein show that canine OS cell lines maintain intrinsic features of the tumors from which they were derived and recapitulate the heterogeneous biology and behavior of bone cancer in mouse models. This system provides a resource to understand interactions between tumor cells and the stromal environment that may drive progression and metastatic propensity of OS.

Understanding the heterogeneous biology and behavior of OS may be useful to fully elucidate the pathogenesis of osteosarcoma and other diseases. As described herein, orthotopic canine OS xenografts preserve the biological, molecular, and heterotypic heterogeneity observed in the tumors from which they were derived. Moreover, transcriptome analysis of xenograft tumors revealed a strong OS cell specific stromal response, which may provide evidence that intrinsic genetic tumor characteristics and cross-talk between tumor and stromal cells might underlie heterogeneity of biological behavior in OS patients. These data may provide insight into tumor-host interactions and identify targets that may play a role in treatment strategies for OS patients.

Results

FIGS. 7A and 7B are photographic representations of data pertaining to the application of the techniques described herein to the OS-1/OS-2 xenograft example. FIG. 7A includes photographs of mice injected with OS-1 and OS-2 cells at day 1 post-injection and again at day 8 post-injection. FIG. 7B includes photographs of the mice injected with OS-1 and OS-2 cells at days 8, 15, and 49 post-injection. Luminescence resulting from luciferase activity is shown in radiance (p/sec/cm²/sr). Differential metastatic propensity in orthotopic canine OS-1 and OS-2 xenografts: Luciferase activity was observed in the lungs of mice receiving intratibial OS-2 cells, but not in mice injected with OS-1 cells, within 6 hours of injections (FIG. 7A). This was interpreted as evidence of systemic dissemination of OS-2 cells with accumulation in the lungs. The luciferase signal disappeared from the lungs within one week after tumor administration, but the presence of OS-2 cells was evident focally in the lungs of one mouse from this group again within two weeks after tumor administration, and the luciferase activity in this area continued to increase until the end of the experiment (FIG. 7B). When the mice from all the experiments were considered together, OS-2 cells achieved metastatic dissemination more rapidly than OS-1 cells (by 15, 22, and 29 days), although the rate of microscopic and macroscopic metastasis between the two groups on Day 36 when the experiments were terminated was not different based on imaging (p=0.35) or histopathology (p=0.77; see also the table illustrated in FIG. 22).

FIGS. 8A-8C are photographic and graphical representations of data pertaining to the application of the techniques described herein to the OS-1/OS-2 xenograft example, and indicate differential growth rates at the primary site in orthotopic canine OS-1 and OS-2 xenografts. FIG. 8A includes photographs of mice injected with OS-1 and OS-2 cells at days 1, 29, and 57 post-injection. Luminescence resulting from luciferase activity is shown in radiance (p/sec/cm²/sr). FIG. 8B illustrates in vivo luciferase activity at different times from 1-59 days post-injection. FIG. 8C illustrates disease progression over time as indicated by tibia volume. Development and progression of primary tumors were examined using in vivo imaging starting six hours after orthotopic cell injections and then weekly for the duration of the study. Luciferase activity was detectable within 6 hours in many of the mice receiving OS-1 or OS-2 cells, and all of the mice showed disease progression over time. Expansion of tumor cells may be inferred from the increased luciferase emission over time. FIG. 8B shows that OS-2 intratibial xenografts had grown significantly faster than OS-1 intratibial xenografts by day 22 and this difference persisted until day 50. The results shown in FIG. 8C encompass a more complex process, as the physical size of the tumors in the proximal tibia would be influenced by infiltrating host stromal cells and swelling. The data confirm that OS-2 intratibial xenografts grew significantly faster than OS-1 intratibial xenografts in this example, although the effect was delayed (detectable by day 29), with this relative difference persisting until day 50 (FIGS. 8B and 8C; see also the table illustrated in FIG. 22). It is worth noting that neither indirect imaging measurements nor direct physical measurements may necessarily account for tumor invasion and loss of periosteal integrity, as is described below. Nevertheless, the data shown in FIGS. 8A-8C and FIG. 23 indicate that, in this example, disease progression was significantly faster in animals harboring OS-2 xenografts than in animals harboring OS-1 xenografts.

FIGS. 9A-9J are graphical representations of data pertaining to the application of the techniques described herein to the OS-1/OS-2 xenograft example. Primary and metastatic tumors derived from orthotopic implantation of OS-1 and OS-2 cells show histological features and organization that are characteristic of canine OS: All of the mice injected with OS-1 or OS-2 cells had evidence of gross tumor burden in the proximal tibia at necropsy on the eighth week after injection (FIGS. 9A and 9B). Histologically, OS-1-derived tumor xenografts were characterized by relatively well-differentiated, polygonal to spindle-shaped cells that had round to oval nuclei, mild to moderate anisocytosis and anisokaryosis, and infrequent mitotic activity (FIGS. 9C and 9E). These tumors contained organized osteoid ribbons and showed limited destruction of cortical bone and epiphyseal invasion (FIG. 9C).

In contrast, OS-2 tumors had a more aggressive appearance with spindle-shaped, anaplastic cells that had round to elongate nuclei, moderate anisocytosis and anisokaryosis, and frequent mitotic activity (FIGS. 9D and 9F). The cells in these tumors were embedded in a poorly organized, pale eosinophilic matrix and they showed extensive necrosis with marked destruction of cortical bone and epiphyseal invasion (FIG. 9D).

The different metastatic propensity of OS-1 and OS-2 was confirmed histologically (FIGS. 9G-9J; also see the table illustrated in FIG. 22). Fewer than 20% of the mice injected with OS-2 and 7% of the mice injected with OS-1 developed metastasis by Day 36. When lung metastasis was present, the histological appearance of the metastatic tumors recapitulated that of the parent tumors, as illustrated by the photomicrographs of one mouse receiving OS-2 orthotopically in FIGS. 9H and 9J. In these animals, the morphology and mitotic activity of the cells and their residence in a poorly organized, pale eosinophilic matrix with extensive areas of necrosis and frequent mitotic activity were comparable to those seen in the primary tumors.

FIGS. 10A-19 illustrate bioinformatics methods that may be used to carry out one or more portions of the methods described herein as applied to the OS-1/OS-2 example. Generally, FIGS. 10A-19 illustrate that a single hybrid reference genome for two species is created by combining the reference sequences of all chromosomes of each species into one file, with chromosome names modified to indicate the species of origin. A single hybrid genome annotation file describing the locations of genes in the genome is created by combining the annotation of each species into one file, with chromosome and gene names modified to indicate the species of origin. A sequence alignment program such as HISAT2 is used to align RNA-Seq sequence reads to the hybrid genome. Most reads may map uniquely to a chromosome of one of the species. Some parts of the genomes may be identical in both species, resulting in a small number of multi-mapped reads mapping to two chromosomes, one from each species, although longer sequence reads reduce the number of multi-mapped reads. The presence and abundance levels of genes may be determined by comparing the genomic location of each uniquely aligned read with the genomic locations of genes in the hybrid annotation file and summing the number of reads aligning to each gene. Excluding multi-mapped reads from this abundance estimation step may help avoid incorrectly identifying the presence of graft-derived nucleic acids. Aligning RNA-Seq reads only to the reference genome of the graft species will result in the spurious identification of graft-derived genes in cases where the genes have identical sequences in both species. Comparing gene expression levels from a xenograft sample with a negative control sample provides further power to reduce false positives.
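The hybrid annotation file described above might, for example, be produced along the following lines. This sketch assumes that the per-species annotations are well-formed GTF records (nine tab-separated columns, with `gene_id` and `gene_name` attributes in the ninth column); the function names, the species tags, and the toy records are assumptions introduced for illustration.

```python
import re
from typing import Dict, Iterable, List

# Attributes whose values receive the species tag; this choice is an assumption.
_NAME_ATTRS = re.compile(r'(gene_id|gene_name) "([^"]*)"')


def prefix_gtf_line(line: str, species_tag: str) -> str:
    """Prefix the chromosome and gene names of one GTF record with a tag."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return line  # comments and blank lines are left unchanged
    fields = line.split("\t")  # assumes a well-formed nine-column record
    fields[0] = species_tag + fields[0]  # column 1: chromosome name
    fields[8] = _NAME_ATTRS.sub(         # column 9: attribute list
        lambda m: '{} "{}{}"'.format(m.group(1), species_tag, m.group(2)),
        fields[8],
    )
    return "\t".join(fields)


def build_hybrid_annotation(annotations: Dict[str, Iterable[str]]) -> List[str]:
    """Combine the tagged annotation records of every species into one list."""
    hybrid: List[str] = []
    for species_tag, lines in annotations.items():
        hybrid.extend(prefix_gtf_line(line, species_tag) for line in lines)
    return hybrid


# Toy records with placeholder gene names, invented for illustration only.
dog_gtf = ['chr1\texample\tgene\t100\t500\t.\t+\t.\tgene_id "GENE_A"; gene_name "GeneA";']
mouse_gtf = ['chr1\texample\tgene\t900\t1500\t.\t-\t.\tgene_id "GENE_B"; gene_name "GeneB";']

hybrid_gtf = build_hybrid_annotation({"dog_": dog_gtf, "mouse_": mouse_gtf})
print("\n".join(hybrid_gtf))
```

Using the same tags for the annotation as for the reference sequences keeps the chromosome names in the two hybrid files consistent, which the read-counting step depends on.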

FIG. 10A is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure. As shown in FIG. 10A, Phred scores may be used for a quality control check to ensure that the raw data fall within acceptable parameters and that there are no problems or biases in the data. In FIG. 10A, a plot of mean quality values (Phred score) across all bases at each position in the sequence read is prepared. A Phred score >28 indicates good calling performance. Greyscale shading indicates quality: very good quality calls (246), and calls of reasonable quality (248).
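The per-position quality summary plotted in FIG. 10A might be computed as in the following sketch. It assumes that quality strings use the Phred+33 ASCII encoding (an assumption about the input files, exposed as the `offset` parameter); the function names and the toy quality strings are hypothetical, and the threshold of 28 is taken from the description above.

```python
from typing import List


def mean_phred_per_position(quality_strings: List[str], offset: int = 33) -> List[float]:
    """Return the mean Phred score at each read position.

    Reads may differ in length; each position is averaged over the reads
    that are long enough to cover it.
    """
    if not quality_strings:
        return []
    max_len = max(len(q) for q in quality_strings)
    means: List[float] = []
    for pos in range(max_len):
        scores = [ord(q[pos]) - offset for q in quality_strings if pos < len(q)]
        means.append(sum(scores) / len(scores))
    return means


def low_quality_positions(means: List[float], threshold: float = 28.0) -> List[int]:
    """Return 0-based positions whose mean score does not exceed the threshold."""
    return [pos for pos, score in enumerate(means) if score <= threshold]


# Toy quality strings: "I" encodes 40, "?" encodes 30, "5" encodes 20, "+" encodes 10.
quals = ["IIII", "II5I", "I?+I"]
means = mean_phred_per_position(quals)
low = low_quality_positions(means)
print(means, low)
```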

FIG. 10B is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and further illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure. As shown in FIG. 10B, a percent of sequences aligned to the cross-species hybrid genome, such as by using a HISAT2 aligner to align the RNA sequences to the cross-species hybrid genome, is determined.

FIG. 11 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and further illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure. As shown in FIG. 11, gene abundance values may be generated and graphically rendered. The presence and abundance levels of genes are determined by comparing the genomic location of each uniquely aligned read with the genomic locations of genes in the hybrid annotation file and summing the number of reads aligning to each gene. For this analysis raw counts may be generated by a feature counts summarization program as the abundance value. Such a program does not count reads overlapping with more than one genomic region. FIG. 11 illustrates the violin plots that show the distributions of gene abundance values, although other types of plots also may be used to illustrate gene abundance values.
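The read-summing step described above might be sketched as follows. This is a deliberately simplified stand-in for a feature counts summarization program, not a description of any particular program: it uses inclusive coordinates, ignores strand and splicing, and counts a read only when it overlaps exactly one gene. The `Gene` and `Read` types and all coordinates are assumptions invented for this example.

```python
from collections import Counter
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


class Read(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def count_reads_per_gene(reads: List[Read], genes: List[Gene]) -> Counter:
    """Sum uniquely aligned reads per gene, skipping ambiguous reads."""
    counts: Counter = Counter()
    for read in reads:
        hits = [
            gene.name for gene in genes
            if gene.chrom == read.chrom
            and gene.start <= read.end
            and read.start <= gene.end
        ]
        if len(hits) == 1:  # reads overlapping zero or several genes are not counted
            counts[hits[0]] += 1
    return counts


# Toy annotation and reads, invented for illustration only.
genes = [
    Gene("dog_GENE_A", "dog_chr1", 100, 200),
    Gene("dog_GENE_B", "dog_chr1", 180, 300),
    Gene("mouse_GENE_C", "mouse_chr5", 50, 150),
]
reads = [
    Read("dog_chr1", 110, 159),    # GENE_A only
    Read("dog_chr1", 120, 169),    # GENE_A only
    Read("dog_chr1", 170, 219),    # overlaps GENE_A and GENE_B: skipped
    Read("dog_chr1", 250, 299),    # GENE_B only
    Read("mouse_chr5", 60, 109),   # GENE_C only
    Read("mouse_chr5", 500, 549),  # no gene: skipped
]
counts = count_reads_per_gene(reads, genes)
print(counts)
```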

FIG. 12 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and further illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure. Specifically, FIG. 12 is a plot from multidimensional scaling of gene abundance values, which is examined to explore relationships between samples and to identify any potential outliers, and illustrates a multidimensional scaling plot for samples in this experiment based on the gene abundance values illustrated in FIG. 11. Note that the scale has been shrunken to appreciate separation. Samples from controls and mice without tumors form a tight cluster, with some separation along the first dimension for tumor samples, and separation along the second dimension between the tumor samples.

FIG. 13 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and further illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure. Specifically, FIG. 13 is a hierarchical clustering heatmap illustrating that samples that are most similar occupy closer positions in the tree, while samples that are less similar are separated by larger numbers of branch points. Hierarchical clustering heatmaps, such as the heatmap of FIG. 13, may be generated after converting gene abundance values to z-scores. A color or greyscale scheme may be applied for the visualization of “high” and “low” gene abundance values in the samples, as shown in FIG. 13. For the sake of illustration, an area indicating low z-scores is depicted at 250, and an area of high z-scores at 252. The rows of the heatmap identify the names of the genes represented in the heatmap. In the heatmap of FIG. 13, the dog genes are identifiable by having an Ensembl gene name (e.g., ENSCAFG . . . ID), while the mouse genes are identifiable by murine gene symbol. OSCA-40 replicate samples cluster together away from OSCA-32 and the control samples, indicating that they are more similar to one another than to other samples.
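The conversion of gene abundance values to z-scores prior to clustering might be sketched as follows. The sketch standardizes each gene (row) across samples using the population standard deviation; that choice, the handling of constant rows, the function name, and the toy values are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def row_z_scores(values: Sequence[float]) -> List[float]:
    """Standardize one gene's abundance values across samples."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # A gene with identical abundance in every sample carries no contrast.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy abundance values for one gene across three samples.
z = row_z_scores([2, 4, 6])
print(z)
```

After this transformation each gene has a mean of zero across samples, so "high" and "low" shading in a heatmap reflects a gene's abundance relative to its own average rather than its absolute read count.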

FIG. 14 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and further illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure. As illustrated in FIG. 14, identification of canine and murine genes may be performed to identify genes that are differentially abundant in: controls versus xenograft samples (canine specific sequences), and OS-1 versus OS-2 xenograft samples. FIG. 14 illustrates a hierarchical clustering heatmap of log transformed and mean centered gene abundance values. The heatmap represents clustered gene-level counts with lower than mean (values of −3.00 to −1.00; area 254), higher than the mean (values of 1.00 to 3.00; areas 256 and 258), and mean (value of 0.00) levels of expression. Each row of the heat map represents a single gene. As shown in FIG. 14, there are a number of highly correlated genes that are more abundant in xenograft samples, OS-1 (OSCA-32) and OS-2 (OSCA-40), compared to control samples (denoted as Cluster 1). These 1078 genes all had canine gene IDs. There are a number of highly correlated genes that are more abundant in OS-2 xenografts compared to OS-1 xenograft samples (denoted as Cluster 4). In the example illustrated in FIG. 14, such genes all had canine gene IDs.

FIG. 15 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and further illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure, including the determination of biological processes and canonical pathways associated with gene clusters identified from the hierarchical clustering heatmap of FIG. 14. FIG. 15 depicts the heatmap of FIG. 14 overlaid with the outcome of a technique that includes predicting upstream regulators for the seven gene clusters that were identified from the hierarchical clustering heatmap by the data analysis technique of FIG. 14. FIG. 15 illustrates that Cluster 7 consists exclusively of mouse genes.

FIG. 16 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and further illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure. As illustrated in FIG. 16, biological processes and canonical pathways associated with gene clusters identified from the hierarchical clustering heatmap of FIGS. 14 and 15 may be determined. The heat map pictured in FIG. 16 is the same as shown in FIGS. 14 and 15. Top predicted transcriptional regulators of genes for each gene cluster defined by hierarchical clustering are shown in the heatmap. FIG. 16 illustrates that Cluster 7 consists of exclusively mouse genes. ErbB-1 also is named epidermal growth factor receptor (EGFR), and ErbB-2 is also named HER2 in humans and neu in rodents.

FIG. 17 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and further illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure. FIG. 17 illustrates a determination of canine and murine genes that have statistically different abundances and identification of predicted upstream regulators of these genes with respect to (1) controls versus xenograft samples, and (2) OS-1 xenograft samples versus OS-2 xenograft samples. As illustrated in FIG. 17, predicted upstream regulators for 125 statistically different mouse (host) genes between controls and xenograft samples were identified using the data analysis technique of FIG. 17.

FIG. 18 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and further illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure. At FIG. 18, predicted upstream regulators for 325 statistically (p<0.005) different mouse (host) genes between OS-1 and OS-2 xenograft samples are determined. Predicted activity is in OS-2 with respect to OS-1 xenograft samples.

FIG. 19 is a graphical representation of a data analysis technique in accordance with the examples of this disclosure, as applied to the OS-1/OS-2 xenograft example, and further illustrates a portion of the bioinformatics workflow analysis in accordance with an example technique of this disclosure. FIG. 19 illustrates predicted upstream regulators of 530 statistically (p<0.05) different canine genes between OS-1 and OS-2 xenograft samples. Predicted activity is in OS-2 with respect to OS-1 xenograft samples. In some examples, canine genes may validate previously reported data indicating differences between tumors arising from OS-1 and OS-2, although canine genes isolated from blood may be different from genes obtained from tissue (e.g., from tumor tissue).

FIGS. 20A and 20B are graphical representations of a data gathering and analysis technique in accordance with the examples of this disclosure. FIG. 20A illustrates that gene signatures of tumor cells in OS xenografts resemble those of parent cell lines. FIG. 20B illustrates hierarchical clustering of tumor xenografts and parent cell lines with canine and murine genes. One obstacle to using xenograft models to understand the heterogeneity of genetically complex tumors is the presumption that these tumors are unstable and will drift rapidly as they adapt to the host microenvironment. Indeed, previous data suggest that altered genomic signatures due to tumor cell plasticity and/or harsh clonal selection lead to unpredictable behavior of tumor cell lines after being transplanted into mice. Here, RNA sequencing was used to examine the stability of key transcriptomic properties between the parental OS cell lines and their corresponding tumor xenografts. The tumor xenografts were more similar to their corresponding parent cell lines than to each other or to the alternative cell line based on principal components analysis and by unsupervised clustering (FIG. 20A), where tumor xenografts were assigned to the same group as their corresponding parent cell line based on the expression signatures from canine genes.

As shown in FIG. 20A, gene signatures of parent tumor cell lines were maintained in OS-1 and OS-2 xenograft tumors. 24,579 total canine genes were filtered to remove genes that did not have a log 2 counts per million (CPM) mean-centered value ≥1 in at least two samples. 13,141 genes remained after filtering. The heatmap represents clustered gene-level counts with lower than mean (values of −2.00 to ˜−0.10; e.g., areas 266 in FIG. 20B), higher than the mean (˜0.10 to 2.00; e.g., area 267 in FIG. 20B), and mean (0.00) levels of expression. Each row represents a single gene. The dendrogram represents the distance or dissimilarity between sample clusters, calculated using unsupervised hierarchical clustering on CPM values for the 13,141 filtered genes. In this dendrogram, there are two sample clusters, shown as two branches that occur at about the same vertical distance. One of the sample clusters consists of four OS-1 (260) xenograft tumors (261) and two parental cell line replicates (262), and the other consists of four OS-2 (263) xenograft tumors (264) and two parental cell line replicates (265). All replicates are biological replicates.
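The filtering step described above may be sketched, for illustration only, as follows. The sketch assumes that log 2 CPM values have already been computed and that mean-centering is performed per gene across samples; the function names, gene names, and values are hypothetical and are not taken from the data of this example.

```python
def mean_center(row):
    """Subtract the gene's mean across samples from each value."""
    mu = sum(row) / len(row)
    return [v - mu for v in row]


def keep_gene(log2_cpm_row, threshold=1.0, min_samples=2):
    """Keep a gene if its mean-centered log2 CPM is >= threshold
    in at least min_samples samples."""
    centered = mean_center(log2_cpm_row)
    return sum(v >= threshold for v in centered) >= min_samples


# Hypothetical log2 CPM values for four samples.
log2_cpm = {
    "gene_a": [0.0, 0.0, 4.0, 4.0],  # centered: -2, -2, 2, 2 (two samples pass)
    "gene_b": [3.0, 3.0, 3.0, 3.0],  # centered: all 0 (no sample passes)
    "gene_c": [0.0, 0.0, 0.0, 4.0],  # centered: -1, -1, -1, 3 (one sample passes)
}
filtered = {gene: row for gene, row in log2_cpm.items() if keep_gene(row)}
```

A filter of this form removes genes whose abundance is essentially flat across samples, since such genes contribute little to sample-to-sample clustering.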

When dog and mouse genes were analyzed together, expression of mouse-specific genes was not detected in the canine cell lines (FIG. 20B), indicating that the mouse genes present in the tumor xenograft tissues could be accurately differentiated from the dog genes using the comparative bioinformatics approach. Furthermore, significantly larger numbers of mouse genes were detectable in OS-2 than in OS-1 xenografts, suggesting the former tumors were more heavily infiltrated by host stroma (FIG. 20B).

FIG. 20B illustrates log-transformed and mean-centered counts per million (CPM) values for 47,997 canine and murine genes in xenograft and parental cell line samples that were filtered to remove genes that did not have a log 2 CPM mean-centered value ≥3 in at least two samples. This filtering step excluded most of the canine genes while including the murine genes. Unsupervised hierarchical clustering with counts per million values for the remaining 13,968 canine and murine genes indicated that expression of murine genes was absent in the canine cell lines in comparison to the tumor xenografts, illustrating that murine genes can be properly differentiated from the canine genes in xenografts.

FIGS. 21A-21C are graphical representations of data gathering and analysis techniques in accordance with the examples of this disclosure, and illustrate that OS-1 and OS-2 xenografts may promote distinct tumor-associated stromal environments. To determine the nature of the stromal interactions and the identity of the infiltrating cells in the xenografts, pair-wise Exact Test comparisons were performed, with TMM normalization of gene counts, to identify the differentially expressed murine genes in tumors from each group (OS-1 and OS-2). Four biological replicates were used for each OS subtype. Common dispersion across all genes was calculated as 0.079 and the biological coefficient of variation (BCV) as 0.23. Mean tag-wise dispersion (individual dispersion for each gene) was calculated as 0.095. Using a false discovery rate (FDR)-adjusted p-value of <0.005 and a log₂ fold change >2, 482 genes were identified that were expressed at significantly different levels between the two groups (FIG. 21A; Table S2). After identifying differentially expressed genes (DEG), log transformed and mean-centered counts per million (CPM) values for 47,997 canine and murine genes were generated. The Pearson distance similarity metric and the average linkage clustering method were used for hierarchical clustering of log₂CPM values for the 482 differentially expressed murine genes (see table S2 below for detailed gene lists).
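For illustration only, the Pearson distance metric and average linkage criterion named above may be sketched as follows. The sketch assumes the common definition of Pearson distance as one minus the Pearson correlation coefficient, which the disclosure does not itself spell out; the function names and input values are hypothetical.

```python
import math


def pearson_distance(x, y):
    """Pearson distance, taken here as 1 - r (an assumed definition).
    Inputs must vary across samples; a constant row would divide by zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(cluster_a, cluster_b):
    """Distance between two clusters: the mean of all pairwise
    Pearson distances between their members."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(pearson_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical log2 CPM profiles across three samples.
same_trend = pearson_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])      # close to 0
opposite_trend = pearson_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])  # close to 2
```

Under this metric, genes whose abundance rises and falls together across samples are close regardless of absolute level, which is why rows of such a heatmap tend to separate into co-regulated groups.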

The heatmap in FIG. 21A shows clustered gene-level counts with lower than mean (negative values; e.g., area 270), higher than the mean (positive values; e.g., area 272), and mean (value=zero) levels of expression. Each row represents a single gene. The dendrogram of the horizontal axis of the heat map shows two sample clusters; OS-1 and OS-2 xenografts are in separate sample groups (FIG. 21A). The rows of the heat map (vertical axis) cluster into two highly correlated groups. Rows corresponding to a positive value in the vertical dendrogram are murine genes that are upregulated in OS-2 xenografts (e.g., a majority of the rows in region 270), whereas rows corresponding to a negative value are downregulated relative to OS-1 xenografts (e.g., a majority of the rows in region 272). Enriched pathway and functional classification analyses of DEGs were performed using QIAGEN's Ingenuity® Pathway Analysis (IPA®, QIAGEN Redwood City, www.qiagen.com/ingenuity) according to row cluster designation. Upregulated genes are identified in FIG. 21B, and downregulated genes are identified in FIG. 21C.

To better understand the differences between OS-1 and OS-2, the upregulated and downregulated murine genes in OS-2 were considered as separate lists, and IPA was used to identify enriched biological functions and transcription factors that regulate these genes. The 482 differentially expressed murine genes included 240 that were upregulated and 242 that were downregulated in OS-2 tumor xenografts relative to OS-1 tumor xenografts. The most upregulated murine gene in the OS-2 xenografts was Mcpt1 (+11.25 fold), whereas the most downregulated murine gene was Nkx2-1 (−10.97 fold) (see table S2, which is shown below at paragraph [0140]).

Based on biological function and processes, the most upregulated murine genes in OS-2 tumors were proteases, metallopeptidases, cytokines, and chemokines involved in cell movement, leukocyte migration, inflammation, and angiogenesis (FIG. 21B, Table S2). On the other hand, the most downregulated genes in OS-2 tumor xenografts were transcriptional regulators of cellular differentiation and cell cycle involved in formation and morphology of muscle (FIG. 21C, Table S2).

FIGS. 22-27 are tables providing additional information pertaining to the application of the techniques described herein to the OS-1/OS-2 xenograft example. For example, the tables illustrated in FIGS. 22-27 provide information pertaining to the metastatic rates, rates of tumor progression, and pathways for differentially expressed murine genes and their upstream regulators for this particular example.

FIG. 22 is a table that illustrates the metastatic properties of the OS-1 xenografts as compared to the OS-2 xenografts at 15-57 days post-injection. When the mice from all the experiments were considered together, OS-2 cells achieved metastatic dissemination more rapidly than OS-1 cells (by 15, 22, and 29 days), although the rate of microscopic and macroscopic metastasis between the two groups on Day 36 when the experiments were terminated was not different based on imaging (p=0.35) or histopathology (p=0.77). The different metastatic propensity of OS-1 and OS-2 was confirmed histologically as illustrated in FIG. 9.

FIG. 23 is a table that illustrates that the OS-1 and OS-2 xenografts show differential rates of tumor progression. As shown in FIG. 23, the progress of OS-2 xenograft tumors (as determined by the measured change in tumor volume) was significantly more rapid than the progress of the OS-1 xenograft tumors from 22-43 days post-injection.

FIG. 24 is a table that illustrates a MetaCore analysis identifying pathways for murine genes that are differentially expressed between OS-1 and OS-2 xenograft tumors. The top 10 most enriched pathways, shown in FIG. 24, suggest immune and inflammatory themes that modulate IL-17, TGF-beta signaling, the complement system, and patterning behavior and cytoskeletal remodeling with involvement of Rho GTPases. Analysis of the 482 murine genes identified as differentially expressed between OS-1 and OS-2 xenograft tumors was performed with MetaCore software (https://portal.genego.com/) to show the top 10 processes and pathways ranked in terms of the enrichment of the common target-related genes (p-value).

FIG. 25 is a table identifying upstream regulators of differentially-expressed murine genes in OS-2 xenograft tumors as compared to OS-1 xenograft tumors. These upstream regulators of the 482 differentially expressed murine genes were identified using IPA. The most significant predicted activated upstream regulators in OS-2 (worse prognosis), relative to OS-1 tumor xenografts, were CEBPB and NFKB1 (p-value 5.54E-10 and 3.94E-09, respectively), whereas the most significant predicted inhibited upstream regulator was MEF2C (p-value 2.54E-23) (FIG. 25). The retinoblastoma tumor suppressor gene (RB1) was also among the predicted significant upstream regulators (p-value 1.25E-04), showing inactivation in OS-2 xenograft tumors, as may otherwise have been predicted based on previous studies. The differentially expressed murine genes were determined by pair-wise Exact Test comparisons in EdgeR. IPA was used to determine upstream modulators of the 482 differentially expressed genes and their predicted activities. Predicted activity is based on gene expression values in OS-2 xenografts relative to OS-1 xenografts. A Z-score >2 indicates activation, while a Z-score <−2 indicates inactivation.
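The Z-score convention stated above may be expressed, purely as an illustrative sketch, as a small classification rule. The function name, the label returned for Z-scores between −2 and 2, and the example values are assumptions of the sketch; the disclosure states only the two cutoffs.

```python
def predicted_activity(z_score, cutoff=2.0):
    """Classify a predicted upstream regulator by its activation Z-score."""
    if z_score > cutoff:
        return "activated"
    if z_score < -cutoff:
        return "inactivated"
    # Scores within the cutoffs are left uncalled (label is an assumption).
    return "no call"


# Hypothetical Z-scores, not values taken from FIG. 25.
calls = [predicted_activity(z) for z in (3.1, -2.6, 0.4, 2.0)]
```

Note that a Z-score exactly equal to 2 or −2 is not called under the strict inequalities stated above.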

FIG. 26 is a table identifying upstream regulators of the upregulated murine genes in OS-2. The upstream regulators predicted to modulate expression and activity of the 240 upregulated murine genes expressed in the OS-2 tumor xenografts included the T-helper cell type-17 (Th17) activating cytokines TGF-β (p-value 1.26E-27), IL-1B (p-value 9.07E-25), and IL-6 (p-value 9.03E-22). Differentially expressed genes were determined by pair-wise Exact Test comparisons in EdgeR. IPA was used to determine upstream modulators of the 240 upregulated murine genes in OS-2 xenografts. Predicted activity is based on gene expression values in OS-2 xenografts. A Z-score >2 indicates activation, while a Z-score <−2 indicates inactivation.

FIG. 27 provides a table indicating upstream regulators of downregulated murine genes in OS-2 xenografts relative to OS-1 xenografts. The top upstream regulators predicted to modulate expression and activity of the 242 downregulated murine genes in the OS-2 xenografts were MEF2C and MYOD1 (p-value 1.15E-24 and 2.16E-15, respectively). MEF2C and MYOD1, both predicted as being inhibited in OS-2 xenografts and activated in OS-1 tumors, are important in promoting transcription of muscle-specific target genes and play a role in muscle differentiation. Differentially expressed genes were determined by pair-wise Exact Test comparisons in EdgeR. IPA was used to determine upstream modulators of the 242 downregulated murine genes in OS-2 xenografts. Predicted activity is based on gene expression values in OS-2 xenografts. A Z-score >2 indicates activation, while a Z-score <−2 indicates inactivation.

FIGS. 28A-28C are graphical representations of a workflow by which RNA contents of OS-derived exosomes from cultured cells may be defined using next-generation sequencing, and outcomes of example data analyses performed on data derived from the workflow indicating that exosomes from OS-1 and OS-2 contain transcripts involved in different cell behaviors. FIG. 28A illustrates that an example workflow may include establishing the presence of exosomes in OS cultured cells, purifying the exosomes from OS cell culture and validating the purification, isolating RNA molecules, sequencing the RNA molecules, and then performing bioinformatics analysis on the resulting sequences to determine the identity of the RNA molecules, as well as other characteristics of the RNA molecules. For example, as shown in FIGS. 28B and 28C, determining other characteristics of the RNA molecules by the bioinformatics analysis of FIG. 28A may include identifying pathways and cell behaviors associated with genes corresponding to the sequenced RNA molecules. For example, the heatmap and table of FIG. 28B indicate that OS-1 derived exosomes may contain RNA associated with genes involved in cellular signaling and metabolism, whereas the heatmap and table of FIG. 28C indicate that OS-2 derived exosomes may contain RNA associated with genes involved in communication between innate and adaptive immune cells.

FIGS. 29A-29C are graphical representations of a workflow by which RNA contents of OS-derived exosomes from cultured cells may be defined, and outcomes of example data analyses performed on data derived from the workflow indicating that decreased expression of cytokines may be found in fibroblasts treated with OS-2 derived exosomes. FIG. 29A illustrates that an example workflow may include culturing a test group of fibroblasts with exosomes derived from OS cells (with a phenotype of either OS-1 or OS-2), and culturing a control group of fibroblasts without exosomes derived from OS cells. The workflow may further include isolating RNA from the groups of fibroblasts, sequencing the RNA molecules, and then performing bioinformatics analysis on the resulting sequences to determine the identity of the RNA molecules and differences in gene expression between the test fibroblasts that were cultured with OS exosomes and the control fibroblasts that were cultured without OS exosomes. FIG. 29B is a table indicating differentially-expressed genes identified in fibroblasts that were cultured with OS-1 exosomes, as compared to fibroblasts cultured with OS-2 exosomes and/or as compared to fibroblasts cultured without OS exosomes. FIG. 29C is a table indicating differentially-expressed genes identified in fibroblasts that were cultured with OS-2 exosomes, as compared to fibroblasts cultured with OS-1 exosomes and/or as compared to fibroblasts cultured without OS exosomes. In the illustrated example of FIGS. 29A-29C, the RNA sequences derived from the fibroblasts cultured with OS-2 exosomes indicated that such fibroblasts had decreased expression of cytokines as compared to fibroblasts cultured with OS-1 exosomes and/or as compared to fibroblasts cultured without OS exosomes.

FIGS. 30A and 30B are graphical representations of differentially expressed mouse genes. The heatmaps depicted in FIGS. 30A and 30B illustrate top differentially expressed genes (mouse) identified by MetaCore Analysis, and indicate that different platforms and different methods of measuring gene expression and analyzing resulting data may produce consistent results.

FIGS. 31A and 31B are graphical representations of differentially expressed mouse genes and canine orthologs of the mouse genes. The differentially-expressed mouse genes are depicted by the heatmap in FIG. 31A, and the canine orthologs of the differentially-expressed mouse genes are depicted in the same order by the heatmap in FIG. 31B. FIGS. 31A and 31B illustrate that the differential expression of mouse transcripts (e.g., mRNA profiles) in exosomes isolated from mice harboring OS-1 or OS-2 xenografts was not due to spurious mapping of canine genes to the mouse genome.

FIG. 32 is a graphical illustration of a bioinformatics method that shows the number of transcripts at each step of differential expression analysis. The ‘DESeq2’ package in RStudio (version 0.99.491) was used for differential analysis of transcript counts obtained from Kallisto data. Transcript counts were first summarized to gene counts and then DESeq2 was used to convert count values to integer mode, correct for library size, and estimate dispersions and log 2 fold changes between comparison groups. Genes with a Benjamini-Hochberg (BH) adjusted p-value <0.05 and an absolute log 2 fold change >2 between control and xenograft samples were called significant. Statistically differentially expressed canine genes were removed if they had a DESeq2 normalized value of greater than zero in the control (mouse sequences), as these may be genes that are highly homologous between the mouse and canine.
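For illustration only, the significance call and the subsequent removal of potentially homologous canine genes may be sketched as follows. This sketch does not reproduce DESeq2: it assumes that log 2 fold changes, raw p-values, and normalized control values have already been obtained, and it implements only the Benjamini-Hochberg adjustment and the filtering rules stated above. The function names, gene names, and values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(results, control_norm, alpha=0.05, min_abs_lfc=2.0):
    """results: list of (gene, species, log2_fold_change, raw_p_value).
    control_norm: gene -> normalized value in the mouse-only control."""
    padj = bh_adjust([r[3] for r in results])
    kept = []
    for (gene, species, lfc, _), q in zip(results, padj):
        if q >= alpha or abs(lfc) <= min_abs_lfc:
            continue
        # A canine gene detected in the control may reflect mouse homology.
        if species == "canine" and control_norm.get(gene, 0.0) > 0:
            continue
        kept.append(gene)
    return kept


# Hypothetical inputs.
results = [
    ("dog_gene_1", "canine", 3.0, 0.001),
    ("dog_gene_2", "canine", 4.0, 0.002),
    ("mouse_gene_1", "murine", -2.5, 0.003),
    ("mouse_gene_2", "murine", 0.5, 0.9),
]
control_norm = {"dog_gene_2": 12.0, "mouse_gene_1": 30.0}
kept = significant_genes(results, control_norm)
```

In this sketch, the homology filter is applied only to canine genes, because mouse genes are expected to be present in the control samples.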

FIGS. 33A-33C are graphical representations of 198 differentially expressed transcripts. FIG. 33A is a heatmap representing all 198 differentially expressed transcripts. FIGS. 33B and 33C are close-up portions of the heatmap of FIG. 33A that respectively represent mouse- and dog-specific transcripts.

FIGS. 34A-34D are graphical representations of the detection of biomarkers of disease and host response. FIG. 34A is a heatmap of the 25 most differentially expressed dog transcripts identified by statistical analysis with DESeq2, which may be validated in dog patients with spontaneous osteosarcoma, and includes a graphical representation of a cycle in which exosomes from dogs with osteosarcoma and healthy dogs are used for in-species validation of transcripts identified from serum exosomes of xenograft models. FIG. 34B is a heatmap of 38 differentially expressed mouse genes. FIG. 34C illustrates significant pathways, and FIG. 34D illustrates a top network (cell cycle) identified by IPA as being associated with differentially expressed host (mouse) genes shown in FIG. 34B.

In the example described herein, mouse xenografts were used to study the heterogeneity and biological behavior of OS in vivo. Specifically, this approach creates opportunities to examine tumor-intrinsic properties, as well as organotypic, tumor-stromal interactions that influence tumor progression. Cells were injected at the orthotopic site to simulate the biology of the spontaneous disease. The anatomical site of implantation may be considered carefully, as the biological behavior of tumors is dependent on both the intrinsic properties of tumor cells and host factors that differ between tissues and organs. The microenvironment in subcutaneous xenografts consists of desmoplastic mouse stromal cells that do not resemble the organization seen in autochthonous tumors. These properties also apply to OS: orthotopic canine OS xenografts in nude mice produced osteoid matrix and metastasized spontaneously, while subcutaneous xenografts did not.

The data show that heterogeneity of biological behavior (including metastatic propensity) can be recapitulated to a limited extent in tumors from cell lines, but more readily by utilizing multiple cell lines that cover the spectrum of tumor behavior. Further, the data show that the major genetic drivers that distinguish the two canine OS cell lines in vitro were retained in the orthotopic xenografts. In addition to stability of the transcriptome, the cell lines show stable morphology from the primary canine tumors to the primary orthotopic tumors, and to the metastatic tumors. Confirmation of genetic and morphologic stability over many passages validated the utility of the present model to understand OS tumor heterogeneity.

As predicted from the original behavior of the spontaneous tumors in the dogs and from their gene and microRNA expression signatures, the logarithmic expansion phase of OS-2 primary xenografts was faster than that of OS-1 primary xenografts. However, both cell lines seemed to reach the tumor endpoints at approximately the same time. Two factors might account for this. First, the tumors are growing within a cavity surrounded by bone, and despite the fact that OS-2 xenografts showed greater epiphyseal destruction and invasion, the bone constrains the maximum size achievable by the primary tumors within the experimental time frame. Second, mice with OS-2 xenografts did not show greater morbidity than mice with OS-1 xenografts, determined by the absence of lameness, ambulatory deficits, and other behaviors associated with chronic pain. This could be due to adaptive behavior of prey species to hide pain; however, previous work has shown that painful intramedullary bone tumors produce behavioral changes in mice. It should be noted that these cell lines accurately represent the biological behavior of the tumors from which they were originally derived, and more broadly the classification of more aggressive and less aggressive tumors. Furthermore, such properties have been verified independently by other groups using one of these cell lines, and they generally extend to human and murine osteosarcoma.

Beyond growth at the primary site, biological behavior can be quantified by metastatic propensity and successful spread to distant sites. Again, the predictions from the original spontaneous tumors were confirmed experimentally in the models described herein. OS-2 cells were a representative example from a group of highly aggressive tumors (worse prognosis) that showed high expression of cell cycle and DNA damage repair associated genes, with concomitant reduced expression of a complement of genes that defined “microenvironment interactions.” This reduced expression of molecules that mediate local cell communication could explain, at least in part, the observation that cells injected intratibially achieved rapid systemic distribution, spreading to the lungs within 6 hours; i.e., there was nothing to hold the cells in place, and they had no preference to remain in the local bone environment.

The results suggest that even though both OS-1 and OS-2 cell lines can establish a metastatic niche, they do so with different kinetics, creating a suitable model to study intrinsic differences in metastatic propensity, as well as host-related factors that contribute to the metastatic niche in OS.

Based on these observations, two distinct mechanisms for the different metastatic potential of OS-1 and OS-2 xenografts are herein proposed. OS-2 cells might have greater metastatic potential due to their interaction with the local microenvironment in the bone, which leads to reduced retention, and potentially to an increased capability to condition the distant site. The alternative possibility is that OS-2 cells seed the lungs shortly after inoculation, and even though many of these cells might leave the lungs or die, accounting for the loss of luciferase signal by 24 hours, some cells remain and eventually form the pulmonary lesions (i.e., equivalent to seeding or colonization by intravenous inoculation). Preliminary experiments suggest that OS-1 and OS-2 cells have low efficiency of pulmonary colonization upon intravenous injection, which would indicate that the first possibility occurs.

Highly expressed mouse genes present in the OS-2 xenografts were associated with B cell signaling, inflammation, and immune response, whereas mouse genes in the OS-1 xenografts were associated with patterning, and especially with muscle formation. Increased expression of myogenic regulators in mouse stromal cells in OS-1 xenografts raises interesting questions regarding possible effects of OS-1 tumor cells on marrow derived mesenchymal stromal cells.

Intriguingly, the most downregulated murine gene in the OS-2 xenografts was the transcription factor Nkx2-1, which is known to regulate lung epithelial cell morphogenesis and differentiation. Down-regulation of NKX2-1 has been shown to precede dissemination of lung adenocarcinoma cells. NKX2-1 amplification has been reported in one human OS patient but there are no reports of down regulation or loss of NKX2-1 in OS patients.

Thus, xenograft models that recapitulated the heterogeneous biological behavior of OS have been developed and have been described herein. These models may be useful to understand the mechanisms that drive progression and metastasis of OS, as they are expandable into additional cell lines to represent a wider spectrum of disease.

Materials and Methods

Cells and culture conditions: Two canine OS cell lines representing previously described “less aggressive” and “highly aggressive” molecular phenotypes (OS-1 and OS-2, respectively) were used in this study. OS-1 and OS-2 are derivatives of the OSCA-32 and OSCA-40 cell lines, respectively. Specifically, OS-1 represents a subline that successfully established tumors after orthotopic implantation, as the parental OSCA-32 did not establish heterotopic or orthotopic tumors on every occasion. OS-2 represents the parental OSCA-40, which reliably formed tumors after orthotopic implantation in every experiment done.

Cell lines were validated using short tandem repeat (STR) profiles by DNA Diagnostics Center (DDC Medical) (Fairfield, Ohio). OS-1 and OS-2 cells were modified to stably express green fluorescent protein (GFP) and firefly luciferase as described (Scott et al., 2015) and used for orthotopic injections in mice. After transfection and selection, it was confirmed that the GFP/luciferase construct was stably integrated in each cell line by fluorescence in situ hybridization, and it was corroborated that the two cell lines had approximately equivalent luciferase activity on a per cell basis using conventional luciferase assays. All cell lines were grown in DMEM (Gibco, Grand Island, N.Y.) containing 5% glucose and L-glutamine, supplemented with 10% fetal bovine serum (Atlas Biologicals, Fort Collins, Colo.), 10 mM 4-(2-hydroxyethyl)-1-piperazine ethanesulphonic acid buffer (HEPES) and 0.1% Primocin (Invivogen, San Diego, Calif.) and cultured at 37° C. in a humidified atmosphere of 5% CO₂. Canine OS cell lines are available for distribution through Kerafast, Inc. (Boston, Mass.). Each cell line was passaged more than 15 times before being inoculated into mice.

Mice: Six-week old, female, athymic nude mice (strain NCr^nu/nu) were obtained from Charles River Laboratories (Wilmington, Mass.). The University of Minnesota Institutional Animal Care and Use Committee approved protocols for mouse experiments of this study (Protocol No.: 1307-30806A).

Tumor xenografts: Eight animals per group provide >95% power to identify a 15% change in the median time to tumor when the standard deviation (σ) for both populations is <2.0 and the acceptable α error is 5% (p<0.05). Experimental replicates increased statistical robustness, accounting for the expected heterogeneity.

Four replicate experiments were done to assess orthotopic growth and metastatic dissemination of OS-1 and OS-2 cells. For the first pilot experiment, groups of three mice were used to validate the approach. All of the mice receiving OS-1 xenografts showed successful implantation, but only two of the three mice receiving OS-2 xenografts showed successful implantation. For the second experiment, groups of 16 mice were used to establish significance. In this experiment, all of the mice receiving OS-2 xenografts showed successful implantation, but eight mice injected with OS-1 xenografts had significant adverse effects during anesthesia and were not recovered (i.e., they were humanely euthanatized). For the third experiment, nine mice were inoculated with OS-2 cells to verify the unexpected effects of rapid dissemination to the lung. No mice received OS-1 for this experiment. Finally, for the fourth experiment, five mice were inoculated with each cell line (OS-1 and OS-2) to achieve a biological replicate of experiment two, maintaining the sample size at a number to maximize a positive outcome. Appropriate censoring was used to include all animals in the analyses, only excluding any which succumbed acutely or subacutely during the intratibial injection procedure. Thus, 16 mice inoculated with OS-1 were included in the analyses of tumor growth, and 32 mice inoculated with OS-2 were included in the analyses of tumor growth.
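The totals stated above can be checked with a short tally. The following is a minimal sketch that simply restates the per-experiment numbers from the preceding paragraph. It assumes that only the two successfully implanted OS-2 mice from the pilot experiment were counted and that the eight OS-1 mice lost during anesthesia in the second experiment were excluded; the variable names are illustrative only.

```python
# Mice included in the tumor-growth analyses, per experiment and cell line.
# Assumption: the pilot counts only the 2 OS-2 mice with successful implantation,
# and experiment 2 excludes the 8 OS-1 mice that did not recover from anesthesia.
included = {
    1: {"OS-1": 3, "OS-2": 2},
    2: {"OS-1": 16 - 8, "OS-2": 16},
    3: {"OS-1": 0, "OS-2": 9},
    4: {"OS-1": 5, "OS-2": 5},
}

# Sum across the four experiments for each cell line.
totals = {
    line: sum(per_experiment[line] for per_experiment in included.values())
    for line in ("OS-1", "OS-2")
}
print(totals)
```

Under these assumptions the tally reproduces the 16 OS-1 and 32 OS-2 animals reported above.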

It was previously determined that four samples per group approximate the point of minimal returns using large genomic datasets for gene expression profiling, and these estimates hold true from microarrays to RNAseq where the fidelity of replication within samples is high, despite orders of magnitude more data (see analysis of RNA sequencing below).

Animals were assigned to separate cages (4 animals each) in random order for each experiment. All of the animals in each cage received the same treatment. OS-1 and OS-2 cells expressing GFP and firefly luciferase were injected intratibially. Mice were anesthetized with xylazine (10 mg/kg, I.P.) and ketamine (100 mg/kg, I.P.), and 1×10⁵ cells suspended in 10 μl of sterile PBS were injected into the left tibia using a tuberculin syringe with a 29-gauge needle. Buprenorphine (0.075 mg/kg, I.P. q.8 hours; Buprenex®, Reckitt Benckiser Healthcare, Richmond, Va.) was used for pain control over the first 24 hours after injection of tumor cells, and prophylactic ibuprofen administered in the water was used over the next 3.

Tumor growth was monitored weekly by measuring the width and length of the proximal tibia and the stifle joint using calipers, as well as by in vivo imaging as described (Kim et al., 2014). Bioluminescence imaging (Xenogen IVIS spectrum, Caliper Life Sciences, Hopkinton, Mass.) was done after injection of D-luciferin (Gold Biotechnology, St. Louis, Mo.) following isoflurane inhalant anesthesia and analyzed with Living Image Software (Caliper Life Sciences). Bone tissue volume was calculated for both tibiae using the equation V=L×W²×0.52 (Banerjee et al., 2013), and tumor volume was estimated by subtracting the normal bone tissue volume of the contralateral unaffected (right) tibia from the volume of the affected (left) tibia.
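The volume calculation above can be illustrated as follows. This is a minimal sketch of the stated equation V=L×W²×0.52 and of the contralateral subtraction; the function names, the millimeter units, and the example caliper readings are assumptions made for illustration and are not measurements from the study.

```python
def bone_volume(length, width):
    """Volume estimate V = L x W^2 x 0.52 from caliper length and width."""
    return length * width ** 2 * 0.52


def tumor_volume(affected, contralateral):
    """Affected-tibia volume minus the normal volume of the contralateral tibia.

    Each argument is a (length, width) pair taken with calipers.
    """
    return bone_volume(*affected) - bone_volume(*contralateral)


# Hypothetical readings in millimeters: left (injected) and right (normal) tibia.
left_tibia = (10.0, 5.0)
right_tibia = (8.0, 3.0)
estimate = tumor_volume(left_tibia, right_tibia)
print(round(estimate, 2))
```

Because the normal tibia is subtracted per animal, each mouse serves as its own control for baseline bone size.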

Mice were observed for up to 8 weeks or until tumor endpoint criteria were reached (ill thrift, tumor reaching 1 cm in the largest diameter, visible lameness, pain, or severe weight loss), at which time they were humanely euthanized with pentobarbital sodium and sodium phenytoin solution (Beuthanasia-D Special®, Schering-Plough Animal Health, Union, N.J.). Primary bone tumors and lung tissues were dissected and a portion of each was stored at −80° C. for RNA extraction. The remaining tissues were fixed in 10% neutral-buffered formalin, and processed for routine histological examination.

Luciferase activity and tumor sizes were compared using multiple t-tests with the Holm-Sidak method for multiple comparisons in Prism 6 software (GraphPad). A value of p<0.05 was used as the level of significance.
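For readers unfamiliar with the Holm-Sidak procedure named above, the following is a minimal sketch of the standard step-down adjustment applied to a set of p values. It is a generic illustration of the method and is not a reproduction of the Prism implementation; the function name and the example p values are hypothetical.

```python
def holm_sidak(pvalues):
    """Return Holm-Sidak step-down adjusted p values in the input order."""
    m = len(pvalues)
    # Visit p values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak correction over the (m - rank) hypotheses not yet processed.
        candidate = 1.0 - (1.0 - pvalues[idx]) ** (m - rank)
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p values from three weekly comparisons.
raw = [0.01, 0.04, 0.03]
adj = holm_sidak(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

In this hypothetical example only the first comparison remains below the 0.05 threshold after adjustment.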

RNA extraction, library preparation, and RNA sequencing: Total RNA was extracted from primary intratibial tumors and from cell lines using miRNeasy Mini Kit (QIAGEN, Valencia, Calif.). RNA integrity was examined using Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, Calif.) and RIN values of all samples were >8.0. Sequencing libraries were prepared with TruSeq Library Preparation Kit (Illumina, San Diego, Calif.). RNA sequencing (100-bp paired-end) with HiSeq 2500 (Illumina) was done at the University of Minnesota Genomics Center (UMGC). A minimum of ten million read-pairs was generated for each sample.

Analysis of RNA sequencing data: Initial quality control analysis of RNA sequencing (FASTQ) data for each sample was performed using the FastQC software (version 0.11.2) (Andrews). FASTQ data were trimmed with Trimmomatic (Bolger, 2014). HISAT2 (Kim et al., 2015) was used to map paired-end reads from eight xenograft tumors (four tumors of OS-1 and four tumors of OS-2) and four parental cell line samples (two each for OS-1 and OS-2 cell lines). For accurate alignment of sequencing reads to canine and murine genes within xenograft tumors, a HISAT2 index for mapping was built from a multi-sequence fasta file containing both the canine (canFam3) and murine (mm10) genomes. Insertion size metrics were calculated for each sample using Picard software (version 1.126) (http://picard.sourceforge.net). Samtools (version 1.0_BCFTools_HTSLib) was used to sort and index the bam files (Li et al., 2009). Transcript abundance estimates were generated using the Rsubread featureCounts program for differential gene expression analysis (Liao et al., 2014).
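One practical detail of building a single index from two genomes is that both assemblies name their sequences chr1, chr2, and so on, so the sequence names must be made unique before the files are concatenated. The following is a minimal sketch of one way to do this, by prefixing each FASTA header with a species tag and later reading the species back from the reference name of an alignment. The tagging scheme and the function names are assumptions for illustration; the disclosure does not state how the multi-sequence file was prepared.

```python
def tag_fasta_headers(lines, tag):
    """Yield FASTA lines with '<tag>_' prepended to every sequence name."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + tag + "_" + line[1:]
        else:
            yield line


def species_of(reference_name):
    """Recover the species tag from a tagged reference name."""
    return reference_name.split("_", 1)[0]


# Toy stand-ins for the canine and murine genome files (hypothetical content).
canine = [">chr1", "ACGT"]
murine = [">chr1", "TTGA"]

# Concatenate the tagged records into one multi-sequence FASTA.
combined = list(tag_fasta_headers(canine, "canFam3"))
combined += list(tag_fasta_headers(murine, "mm10"))
print(combined)
print(species_of("mm10_chr1"))
```

With names tagged this way, each aligned read can be attributed to the donor (canine) or host (murine) genome directly from the reference name in the alignment record.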

Gene counts for each xenograft sample were imported into RStudio (v. 3.2.3) for differential gene expression (DGE) analysis with EdgeR. Lowly expressed genes were removed by filtering. A gene was considered expressed if it had log2-transformed read counts per million (CPM)>1 in at least two of the eight xenograft tumors. Biological variation within xenograft sample groups was estimated by common dispersion and biological coefficient of variation (BCV) calculations. Pair-wise empirical analysis of differential gene expression was performed on sample groups (OS-1 and OS-2) using the ‘Exact Test’ for two-group comparisons with trimmed mean of M-values (TMM) normalization (Robinson and Oshlack, 2010). Tagwise dispersion (individual dispersion for each gene) was used to adjust for abundance differences across biological replicates (n=4) within each xenograft group (OS-1 and OS-2). Gene counts, expressed as CPM, were imported into Partek Genomic Suite for clustering analysis and visualization. The Pearson similarity metric and average linkage clustering method were used for hierarchical clustering of mean-centered CPM values. Enriched pathway and functional classification analyses of DGEs were performed using IPA. The reference set for all IPA analyses was the Ingenuity Knowledge Base (genes only), and human Entrez gene names were used as the output format. To understand the high-level functions and utilities associated with each gene identified as differentially expressed between OS-1 and OS-2, Metacore software (Thomson Reuters) was used to identify statistically over-represented cellular processes in the dataset.
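The expression filter described above can be sketched as follows. This is a simplified illustration of the stated rule (log2 CPM greater than 1 in at least two samples) and is not the EdgeR code used in the study: it computes CPM from raw library sizes without the prior count or normalization factors that EdgeR may apply, and the function name and toy counts are hypothetical.

```python
import math


def expressed_genes(counts, min_log2_cpm=1.0, min_samples=2):
    """Return genes whose log2(CPM) exceeds the threshold in enough samples.

    counts maps each gene name to a list of raw read counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size of each sample is the sum of its counts over all genes.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    kept = []
    for gene, row in counts.items():
        passing = 0
        for count, lib in zip(row, lib_sizes):
            cpm = count / lib * 1e6
            # Zero counts have no defined log2 and never pass the threshold.
            if cpm > 0 and math.log2(cpm) > min_log2_cpm:
                passing += 1
        if passing >= min_samples:
            kept.append(gene)
    return kept


# Toy count matrix with three samples (hypothetical values).
toy_counts = {
    "geneA": [499997, 500000, 500000],
    "geneB": [3, 0, 0],
    "geneC": [500000, 500000, 500000],
}
kept = expressed_genes(toy_counts)
print(kept)
```

In the toy matrix, geneB exceeds the threshold in only one sample and is therefore removed, which mirrors the requirement that a gene be detected in at least two tumors.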

The following table, referred to above as Table S2, provides a list of differentially-expressed murine genes in OS-2 relative to OS-1 xenografts. Fold change (FC), p values, and FDR-adjusted p values were calculated by pair-wise Exact Test comparisons in EdgeR. Genes were annotated with the Ingenuity Knowledge Base of IPA.

Gene | logFC | PValue | FDR | Entrez Gene Name | Location | Type(s)
Nkx2-1 | −10.97163756 | 3.67E−43 | 2.51E−39 | NK2 homeobox 1 | Nucleus | transcription regulator
Zic1 | −10.37720965 | 2.80E−28 | 5.47E−25 | Zic family member 1 | Nucleus | transcription regulator
Kcnq2 | −7.93642633 | 1.90E−18 | 1.37E−15 | potassium channel, voltage gated KQT-like subfamily Q, member 2 | Plasma Membrane | ion channel
Smpd3 | −5.937398394 | 4.58E−11 | 4.93E−09 | sphingomyelin phosphodiesterase 3, neutral membrane (neutral sphingomyelinase II) | Cytoplasm | enzyme
Phex | −5.479136674 | 3.98E−13 | 7.78E−11 | phosphate regulating endopeptidase homolog, X-linked | Cytoplasm | peptidase
Fam43b | −5.16551588 | 7.72E−13 | 1.37E−10 | family with sequence similarity 43, member B | Other | other
Myoz2 | −5.031353137 | 1.21E−14 | 3.75E−12 | myozenin 2 | Other | other
Myh2 | −5.016278218 | 1.42E−10 | 1.37E−08 | myosin, heavy chain 2, skeletal muscle, adult | Cytoplasm | enzyme
Hoxd13 | −5.005651203 | 6.42E−16 | 2.83E−13 | homeobox D13 | Nucleus | transcription regulator
Slc13a5 | −4.983558827 | 3.64E−11 | 4.15E−09 | solute carrier family 13 (sodium-dependent citrate transporter), member 5 | Plasma Membrane | transporter
Panx3 | −4.824377849 | 2.74E−06 | 7.53E−05 | pannexin 3 | Plasma Membrane | other
Tmem145 | −4.602792152 | 2.02E−10 | 1.92E−08 | transmembrane protein 145 | Other | other
Lect1 | −4.567514167 | 5.15E−09 | 3.25E−07 | leukocyte cell derived chemotaxin 1 | Extracellular Space | other
Asic3 | −4.473594151 | 2.73E−14 | 7.19E−12 | acid sensing (proton gated) ion channel 3 | Plasma Membrane | ion channel
Ankrd2 | −4.445048104 | 2.61E−19 | 2.23E−16 | ankyrin repeat domain 2 (stretch responsive muscle) | Nucleus | transcription regulator
Fam180a | −4.444919786 | 1.04E−07 | 4.35E−06 | family with sequence similarity 180, member A | Other | other
Col9a1 | −4.397295279 | 2.52E−11 | 3.08E−09 | collagen, type IX, alpha 1 | Extracellular Space | other
Fabp3 | −4.367492217 | 2.88E−18 | 1.97E−15 | fatty acid binding protein 3, muscle and heart | Cytoplasm | transporter
Lyz1 | −4.308114309 | 0.000102599 | 0.001637874 | lysozyme | Extracellular Space | enzyme
Adamts18 | −4.303169665 | 3.57E−09 | 2.42E−07 | ADAM metallopeptidase with thrombospondin type 1 motif, 18 | Extracellular Space | peptidase
Sp7 | −4.224816491 | 1.29E−08 | 7.16E−07 | Sp7 transcription factor | Nucleus | transcription regulator
Omd | −4.186098129 | 2.89E−17 | 1.58E−14 | osteomodulin | Extracellular Space | other
1700101I11Rik | −4.155168387 | 2.75E−07 | 9.92E−06 | RIKEN cDNA 1700101I11 gene | Other | other
Dlx6 | −4.109636472 | 3.22E−11 | 3.76E−09 | distal-less homeobox 6 | Nucleus | transcription regulator
Actn2 | −4.069827028 | 5.15E−17 | 2.71E−14 | actinin, alpha 2 | Nucleus | transcription regulator
Tceal7 | −4.038267592 | 4.23E−06 | 0.000108501 | transcription elongation factor A (SII)-like 7 | Nucleus | transcription regulator
Xirp2 | −4.014810326 | 6.23E−10 | 5.33E−08 | xin actin binding repeat containing 2 | Other | other
Paqr6 | −3.961900754 | 2.42E−10 | 2.28E−08 | progestin and adipoQ receptor family member VI | Plasma Membrane | other
Csrp3 | −3.936461238 | 8.12E−11 | 8.23E−09 | cysteine and glycine-rich protein 3 (cardiac LIM protein) | Nucleus | other
Fcrls | −3.810649791 | 2.99E−37 | 1.36E−33 | Fc receptor-like S, scavenger receptor | Plasma Membrane | other
Ckmt2 | −3.806425077 | 2.10E−09 | 1.55E−07 | creatine kinase, mitochondrial 2 (sarcomeric) | Cytoplasm | kinase
Myl6b | −3.787544034 | 9.99E−09 | 5.82E−07 | myosin, light chain 6B, alkali, smooth muscle and non-muscle | Cytoplasm | other
Gli1 | −3.740040126 | 9.38E−17 | 4.58E−14 | GLI family zinc finger 1 | Nucleus | transcription regulator
Zdbf2 | −3.73484272 | 1.50E−07 | 6.03E−06 | zinc finger, DBF-type containing 2 | Other | other
Opn1mw | −3.726345151 | 8.04E−07 | 2.55E−05 | opsin 1 (cone pigments), long-wave-sensitive | Plasma Membrane | G-protein coupled receptor
Gsg1l | −3.72152491 | 9.85E−10 | 8.07E−08 | GSG1-like | Plasma Membrane | other
Abra | −3.69798677 | 9.51E−13 | 1.65E−10 | actin binding Rho activating protein | Cytoplasm | other
Myom3 | −3.681916158 | 4.38E−08 | 2.07E−06 | myomesin 3 | Other | other
Serpinb1c | −3.657917675 | 4.90E−11 | 5.20E−09 | serine (or cysteine) peptidase inhibitor, clade B, member 1c | Extracellular Space | other
Foxg1 | −3.648321574 | 1.21E−19 | 1.14E−16 | forkhead box G1 | Nucleus | transcription regulator
Ifitm5 | −3.617680152 | 2.81E−09 | 2.03E−07 | interferon induced transmembrane protein 5 | Plasma Membrane | other
9130024F11Rik | −3.490015735 | 2.34E−12 | 3.72E−10 | RIKEN cDNA 9130024F11 gene | Other | other
Csrnp3 | −3.450251912 | 1.48E−06 | 4.37E−05 | cysteine-serine-rich nuclear protein 3 | Nucleus | transcription regulator
Slc47a1 | −3.443994582 | 2.09E−06 | 5.96E−05 | solute carrier family 47 (multidrug and toxin extrusion), member 1 | Plasma Membrane | transporter
2410137F16Rik | −3.441108061 | 6.33E−07 | 2.07E−05 | #N/A | #N/A | #N/A
A930003A15Rik | −3.43849311 | 7.11E−08 | 3.21E−06 | RIKEN cDNA A930003A15 gene | Other | other
Col11a2 | −3.43289575 | 9.67E−09 | 5.68E−07 | collagen, type XI, alpha 2 | Extracellular Space | other
Cst6 | −3.420908168 | 2.33E−07 | 8.66E−06 | cystatin E/M | Extracellular Space | other
Actc1 | −3.400606916 | 1.73E−07 | 6.78E−06 | actin, alpha, cardiac muscle 1 | Cytoplasm | enzyme
Atp1b4 | −3.352991208 | 2.08E−05 | 0.000425726 | ATPase, Na+/K+ transporting, beta 4 polypeptide | Plasma Membrane | transporter
Alpk2 | −3.341104185 | 9.43E−05 | 0.001527325 | alpha-kinase 2 | Nucleus | kinase
Myh1 | −3.324236394 | 1.55E−09 | 1.18E−07 | myosin, heavy chain 1, skeletal muscle, adult | Plasma Membrane | enzyme
Hspb3 | −3.317439048 | 1.86E−08 | 9.63E−07 | heat shock 27 kDa protein 3 | Cytoplasm | other
Itgb1bp2 | −3.311620649 | 2.66E−09 | 1.93E−07 | integrin beta 1 binding protein (melusin) 2 | Other | other
Casq2 | −3.292086149 | 1.53E−08 | 8.17E−07 | calsequestrin 2 (cardiac muscle) | Cytoplasm | other
Dmp1 | −3.257994756 | 1.22E−09 | 9.51E−08 | dentin matrix acidic phosphoprotein 1 | Extracellular Space | other
Plb1 | −3.242320263 | 4.00E−07 | 1.39E−05 | phospholipase B1 | Cytoplasm | enzyme
Cacna2d2 | −3.241001086 | 1.16E−09 | 9.20E−08 | calcium channel, voltage-dependent, alpha 2/delta subunit 2 | Plasma Membrane | ion channel
AU022793 | −3.198508581 | 2.72E−06 | 7.49E−05 | expressed sequence AU022793 | Other | other
Mylpf | −3.196418301 | 5.32E−07 | 1.77E−05 | myosin light chain, phosphorylatable, fast skeletal muscle | Cytoplasm | other
Xirp1 | −3.193088545 | 6.65E−12 | 9.38E−10 | xin actin binding repeat containing 1 | Plasma Membrane | other
Dok7 | −3.156216975 | 4.84E−09 | 3.08E−07 | docking protein 7 | Extracellular Space | other
Hsd11b2 | −3.155020704 | 5.80E−10 | 5.06E−08 | hydroxysteroid (11-beta) dehydrogenase 2 | Cytoplasm | enzyme
Fat3 | −3.116929351 | 9.82E−07 | 3.04E−05 | FAT atypical cadherin 3 | Plasma Membrane | other
Bex1 | −3.098852594 | 5.93E−07 | 1.96E−05 | brain expressed gene 1 | Other | other
Siglec1 | −3.056322266 | 5.56E−36 | 1.52E−32 | sialic acid binding Ig-like lectin 1, sialoadhesin | Plasma Membrane | other
Klhl30 | −3.042670774 | 3.66E−12 | 5.38E−10 | kelch-like family member 30 | Other | other
Srpk3 | −3.036559842 | 8.83E−10 | 7.33E−08 | SRSF protein kinase 3 | Cytoplasm | kinase
Nmrk2 | −3.036504708 | 9.23E−08 | 3.94E−06 | nicotinamide riboside kinase 2 | Plasma Membrane | kinase
Hspb7 | −3.035223935 | 4.33E−12 | 6.23E−10 | heat shock 27 kDa protein family, member 7 (cardiovascular) | Cytoplasm | other
Hspb6 | −3.010038437 | 3.90E−11 | 4.38E−09 | heat shock protein, alpha-crystallin-related, B6 | Cytoplasm | other
Myh3 | −3.005412878 | 3.96E−09 | 2.63E−07 | myosin, heavy chain 3, skeletal muscle, embryonic | Cytoplasm | enzyme
Zim1 | −2.999002281 | 5.28E−07 | 1.77E−05 | zinc finger, imprinted 1 | Nucleus | other
Fhl1 | −2.971716832 | 7.50E−12 | 1.05E−09 | four and a half LIM domains 1 | Cytoplasm | other
Cryab | −2.942079665 | 4.57E−11 | 4.93E−09 | crystallin, alpha B | Nucleus | other
Col26a1 | −2.922599266 | 0.000140087 | 0.002118637 | collagen, type XXVI, alpha 1 | Extracellular Space | other
Tpm2 | −2.901026545 | 3.06E−11 | 3.67E−09 | tropomyosin 2, beta | Cytoplasm | other
Smpx | −2.8905869 | 3.29E−15 | 1.25E−12 | small muscle protein, X-linked | Cytoplasm | other
Mybpc1 | −2.889275402 | 7.51E−07 | 2.41E−05 | myosin binding protein C, slow type | Cytoplasm | other
Car3 | −2.883690634 | 3.67E−09 | 2.48E−07 | carbonic anhydrase III | Cytoplasm | enzyme
Myl3 | −2.879903468 | 1.51E−05 | 0.000327184 | myosin, light chain 3, alkali; ventricular, skeletal, slow | Cytoplasm | other
Acta1 | −2.858497666 | 2.23E−07 | 8.33E−06 | actin, alpha 1, skeletal muscle | Cytoplasm | other
Adprhl1 | −2.839594188 | 3.39E−09 | 2.36E−07 | ADP-ribosylhydrolase like 1 | Other | enzyme
Robo2 | −2.838080162 | 1.25E−06 | 3.77E−05 | roundabout guidance receptor 2 | Plasma Membrane | transmembrane receptor
Col9a2 | −2.836541279 | 9.14E−11 | 9.13E−09 | collagen, type IX, alpha 2 | Extracellular Space | other
Frzb | −2.822407749 | 2.37E−07 | 8.77E−06 | frizzled-related protein | Extracellular Space | other
Matn3 | −2.818890834 | 2.10E−05 | 0.000429131 | matrilin 3 | Extracellular Space | other
Vgll2 | −2.779983734 | 3.56E−09 | 2.42E−07 | vestigial-like family member 2 | Nucleus | transcription regulator
Alpl | −2.774474841 | 3.78E−13 | 7.49E−11 | alkaline phosphatase, liver/bone/kidney | Plasma Membrane | phosphatase
Cdo1 | −2.770636161 | 1.67E−08 | 8.76E−07 | cysteine dioxygenase type 1 | Cytoplasm | enzyme
Mfsd7c | −2.762027564 | 6.69E−07 | 2.18E−05 | feline leukemia virus subgroup C cellular receptor family, member 2 | Plasma Membrane | transporter
Crhr2 | −2.736336233 | 1.20E−11 | 1.54E−09 | corticotropin releasing hormone receptor 2 | Plasma Membrane | G-protein coupled receptor
Myadml2 | −2.724262402 | 4.46E−06 | 0.00011354 | myeloid-associated differentiation marker-like 2 | Cytoplasm | other
Pax2 | −2.719600255 | 9.94E−06 | 0.000228268 | paired box 2 | Nucleus | transcription regulator
Zic2 | −2.715698644 | 1.17E−05 | 0.000263232 | Zic family member 2 | Nucleus | transcription regulator
S100b | −2.709990606 | 1.76E−14 | 5.13E−12 | S100 calcium binding protein B | Cytoplasm | other
Synpo2l | −2.701441875 | 3.57E−09 | 2.42E−07 | synaptopodin 2-like | Cytoplasm | other
Cox6a2 | −2.693936484 | 4.57E−07 | 1.54E−05 | cytochrome c oxidase subunit VIa polypeptide 2 | Cytoplasm | enzyme
Gm6524 | −2.692410726 | 5.56E−05 | 0.000970923 | katanin p60 (ATPase-containing) subunit A1 pseudogene | Other | other
Ccrl1 | −2.676579274 | 2.47E−06 | 6.85E−05 | #N/A | #N/A | #N/A
Col22a1 | −2.674727987 | 3.90E−10 | 3.47E−08 | collagen, type XXII, alpha 1 | Extracellular Space | other
Cav3 | −2.673548764 | 1.73E−07 | 6.78E−06 | caveolin 3 | Plasma Membrane | enzyme
Slc38a3 | −2.654180586 | 6.41E−05 | 0.001092134 | solute carrier family 38, member 3 | Plasma Membrane | transporter
Tmem8c | −2.65326667 | 8.29E−06 | 0.000193267 | transmembrane protein 8C | Plasma Membrane | other
Klhl41 | −2.642347539 | 2.57E−07 | 9.39E−06 | kelch-like family member 41 | Cytoplasm | other
Des | −2.619769533 | 8.19E−12 | 1.11E−09 | desmin | Cytoplasm | other
Ldb3 | −2.619385875 | 7.34E−06 | 0.000173688 | LIM domain binding 3 | Cytoplasm | transporter
Sbk2 | −2.606775291 | 8.68E−05 | 0.001420753 | SH3 domain binding kinase family, member 2 | Other | other
Popdc2 | −2.58927308 | 2.98E−06 | 8.02E−05 | popeye domain containing 2 | Other | other
Snca | −2.588807158 | 4.55E−06 | 0.000115237 | synuclein, alpha (non A4 component of amyloid precursor) | Cytoplasm | other
Ogn | −2.586436671 | 3.77E−10 | 3.37E−08 | osteoglycin | Extracellular Space | growth factor
Lmod2 | −2.574611571 | 7.58E−07 | 2.43E−05 | leiomodin 2 (cardiac) | Other | other
Lepr | −2.568144399 | 1.03E−08 | 5.96E−07 | leptin receptor | Plasma Membrane | transmembrane receptor
Lrrc30 | −2.564534101 | 0.000103946 | 0.001655518 | leucine rich repeat containing 30 | Other | other
Tuba8 | −2.563554458 | 0.000319672 | 0.004213331 | tubulin, alpha 8 | Cytoplasm | other
Tceal5 | −2.557601315 | 5.74E−05 | 0.000995638 | transcription elongation factor A (SII)-like 5 | Other | other
Myot | −2.538095094 | 1.07E−05 | 0.000242622 | myotilin | Cytoplasm | other
Ndnf | −2.537156126 | 6.13E−10 | 5.28E−08 | neuron-derived neurotrophic factor | Extracellular Space | other
Ch25h | −2.531462006 | 3.17E−12 | 4.77E−10 | cholesterol 25-hydroxylase | Cytoplasm | enzyme
Lrtm1 | −2.531422723 | 0.000304181 | 0.004044222 | leucine-rich repeats and transmembrane domains 1 | Other | other
Yipf7 | −2.520519196 | 3.17E−06 | 8.47E−05 | Yip1 domain family, member 7 | Other | other
Rsad2 | −2.52017011 | 3.91E−08 | 1.88E−06 | radical S-adenosyl methionine domain containing 2 | Cytoplasm | enzyme
Myl1 | −2.508451421 | 8.20E−05 | 0.001356344 | myosin, light chain 1, alkali; skeletal, fast | Cytoplasm | other
Gm10767 | −2.507994364 | 1.62E−05 | 0.000344605 | predicted gene 10767 | Other | other
Col9a3 | −2.503274415 | 0.00020731 | 0.002966738 | collagen, type IX, alpha 3 | Extracellular Space | other
Pdlim3 | −2.502660067 | 9.22E−09 | 5.50E−07 | PDZ and LIM domain 3 | Plasma Membrane | other
Tnnc2 | −2.502251052 | 2.88E−05 | 0.000568941 | troponin C type 2 (fast) | Cytoplasm | other
Myom2 | −2.481140713 | 6.30E−07 | 2.07E−05 | myomesin 2 | Cytoplasm | other
Ccl4 | −2.473254523 | 1.56E−09 | 1.18E−07 | chemokine (C-C motif) ligand 4 | Extracellular Space | cytokine
Fgfr4 | −2.467767258 | 5.43E−05 | 0.000952702 | fibroblast growth factor receptor 4 | Plasma Membrane | kinase
Hand2 | −2.466208146 | 5.90E−05 | 0.001018934 | heart and neural crest derivatives expressed 2 | Nucleus | transcription regulator
Ppargc1a | −2.461171212 | 2.96E−05 | 0.000580045 | peroxisome proliferator-activated receptor gamma, coactivator 1 alpha | Nucleus | transcription regulator
Asb12 | −2.455387214 | 9.25E−05 | 0.001503012 | ankyrin repeat and SOCS box containing 12 | Nucleus | transcription regulator
Klhl40 | −2.453997338 | 1.87E−08 | 9.68E−07 | kelch-like family member 40 | Other | other
Hspa1l | −2.448496231 | 7.22E−10 | 6.10E−08 | heat shock 70 kDa protein 1-like | Cytoplasm | other
Srl | −2.437036897 | 2.82E−06 | 7.69E−05 | sarcalumenin | Cytoplasm | other
Fndc5 | −2.434034744 | 6.89E−06 | 0.000164402 | fibronectin type III domain containing 5 | Other | other
Tnnt3 | −2.43394736 | 4.26E−05 | 0.000786196 | troponin T type 3 (skeletal, fast) | Cytoplasm | other
Greb1 | −2.426503014 | 1.19E−08 | 6.75E−07 | growth regulation by estrogen in breast cancer 1 | Cytoplasm | other
I830012O16Rik | −2.419437095 | 0.000180549 | 0.002630558 | #N/A | #N/A | #N/A
Sox11 | −2.416593585 | 3.33E−09 | 2.34E−07 | SRY (sex determining region Y)-box 11 | Nucleus | transcription regulator
Nrcam | −2.409994027 | 1.07E−05 | 0.000242622 | neuronal cell adhesion molecule | Plasma Membrane | other
Foxl1 | −2.40842338 | 5.02E−06 | 0.000124633 | forkhead box L1 | Nucleus | transcription regulator
Foxc1 | −2.3807345 | 1.25E−19 | 1.14E−16 | forkhead box C1 | Nucleus | transcription regulator
Tuba4a | −2.367985915 | 4.07E−07 | 1.41E−05 | tubulin, alpha 4a | Cytoplasm | other
Tcap | −2.362652632 | 7.01E−07 | 2.27E−05 | titin-cap | Cytoplasm | other
B430306N03Rik | −2.351177563 | 4.46E−07 | 1.51E−05 | RIKEN cDNA B430306N03 gene | Other | other
Cap2 | −2.351093169 | 7.43E−08 | 3.31E−06 | CAP, adenylate cyclase-associated protein, 2 (yeast) | Plasma Membrane | other
Ucp3 | −2.345743121 | 3.11E−05 | 0.00060655 | uncoupling protein 3 (mitochondrial, proton carrier) | Cytoplasm | transporter
Dmrta2 | −2.338686972 | 1.16E−06 | 3.53E−05 | DMRT-like family A2 | Nucleus | transcription regulator
Fgfr3 | −2.337098522 | 2.07E−11 | 2.60E−09 | fibroblast growth factor receptor 3 | Plasma Membrane | kinase
Mapt | −2.331242707 | 2.08E−08 | 1.06E−06 | microtubule-associated protein tau | Plasma Membrane | other
Fgfr2 | −2.32142915 | 2.02E−05 | 0.000418907 | fibroblast growth factor receptor 2 | Plasma Membrane | kinase
Hhatl | −2.311519505 | 2.06E−05 | 0.000424294 | hedgehog acyltransferase-like | Cytoplasm | enzyme
Jsrp1 | −2.307700559 | 7.35E−08 | 3.30E−06 | junctional sarcoplasmic reticulum protein 1 | Cytoplasm | other
Ppm1e | −2.302246505 | 7.68E−07 | 2.45E−05 | protein phosphatase, Mg2+/Mn2+ dependent, 1E | Nucleus | phosphatase
Flnc | −2.296173473 | 1.55E−07 | 6.21E−06 | filamin C, gamma | Cytoplasm | other
Smad9 | −2.295442355 | 7.21E−06 | 0.000171229 | SMAD family member 9 | Nucleus | transcription regulator
Alpk3 | −2.286390928 | 7.03E−07 | 2.27E−05 | alpha-kinase 3 | Nucleus | kinase
Npr3 | −2.28173834 | 5.37E−07 | 1.79E−05 | natriuretic peptide receptor 3 | Plasma Membrane | G-protein coupled receptor
Fras1 | −2.279123226 | 1.29E−07 | 5.32E−06 | Fraser extracellular matrix complex subunit 1 | Extracellular Space | other
Cmpk2 | −2.278794093 | 6.99E−06 | 0.000166661 | cytidine monophosphate (UMP-CMP) kinase 2, mitochondrial | Cytoplasm | kinase
Rbp7 | −2.276049923 | 1.43E−10 | 1.38E−08 | retinol binding protein 7, cellular | Cytoplasm | other
Popdc3 | −2.270239846 | 1.86E−07 | 7.13E−06 | popeye domain containing 3 | Other | other
Dusp26 | −2.262773674 | 1.97E−05 | 0.000409563 | dual specificity phosphatase 26 (putative) | Cytoplasm | enzyme
Slc28a2 | −2.26166738 | 3.16E−06 | 8.47E−05 | solute carrier family 28 (concentrative nucleoside transporter), member 2 | Plasma Membrane | transporter
Smyd1 | −2.257820234 | 1.29E−05 | 0.000287231 | SET and MYND domain containing 1 | Nucleus | transcription regulator
Tbx1 | −2.253565715 | 3.52E−09 | 2.42E−07 | T-box 1 | Nucleus | transcription regulator
Tnni2 | −2.250955521 | 0.000114204 | 0.001787671 | troponin I type 2 (skeletal, fast) | Cytoplasm | enzyme
Ccl3 | −2.249366721 | 9.28E−05 | 0.001506348 | chemokine (C-C motif) ligand 3-like 3 | Extracellular Space | cytokine
Slc16a4 | −2.247378871 | 7.91E−05 | 0.0013143 | solute carrier family 16, member 4 | Plasma Membrane | transporter
3425401B19Rik | −2.245394181 | 7.12E−06 | 0.000169305 | chromosome 10 open reading frame 71 | Other | other
Lrtm2 | −2.237129762 | 0.000265996 | 0.003635462 | leucine-rich repeats and transmembrane domains 2 | Other | other
Sult1a1 | −2.236886147 | 3.44E−06 | 9.05E−05 | sulfotransferase family 1A, phenol-preferring, member 1 | Cytoplasm | enzyme
Nrap | −2.225366834 | 3.52E−06 | 9.21E−05 | nebulin-related anchoring protein | Cytoplasm | other
Cacna1s | −2.216075785 | 1.08E−05 | 0.000245159 | calcium channel, voltage-dependent, L type, alpha 1S subunit | Plasma Membrane | ion channel
Mum1l1 | −2.213497637 | 0.000179172 | 0.002616065 | melanoma associated antigen (mutated) 1-like 1 | Cytoplasm | other
Hk3 | −2.213406939 | 2.77E−12 | 4.31E−10 | hexokinase 3 (white cell) | Cytoplasm | kinase
Camk2b | −2.209496356 | 6.55E−09 | 3.98E−07 | calcium/calmodulin-dependent protein kinase II beta | Cytoplasm | kinase
Lamc3 | −2.208122416 | 5.95E−05 | 0.001026687 | laminin, gamma 3 | Extracellular Space | other
Wnt10b | −2.205958028 | 2.75E−06 | 7.53E−05 | wingless-type MMTV integration site family, member 10B | Extracellular Space | other
Fam107a | −2.204108844 | 0.000258693 | 0.003553395 | family with sequence similarity 107, member A | Nucleus | other
2310002L09Rik | −2.194169008 | 5.30E−05 | 0.000933799 | RIKEN cDNA 2310002L09 gene | Cytoplasm | other
Meis1 | −2.193357038 | 1.42E−08 | 7.72E−07 | Meis homeobox 1 | Nucleus | transcription regulator
Trdn | −2.192261248 | 4.81E−06 | 0.000120606 | triadin | Cytoplasm | other
Mlip | −2.187237572 | 3.41E−06 | 8.99E−05 | muscular LMNA-interacting protein | Nucleus | other
Sh3bgr | −2.183033333 | 0.000248973 | 0.003433665 | SH3-binding domain glutamic acid-rich protein | Cytoplasm | other
Prkag3 | −2.181939396 | 0.0001064 | 0.001682844 | protein kinase, AMP-activated, gamma 3 non-catalytic subunit | Cytoplasm | other
Cacng1 | −2.168663866 | 1.05E−06 | 3.25E−05 | calcium channel, voltage-dependent, gamma subunit 1 | Plasma Membrane | ion channel
Sypl2 | −2.167562914 | 0.000345737 | 0.004491952 | synaptophysin-like 2 | Other | other
Hspb1 | −2.159559855 | 4.85E−08 | 2.26E−06 | heat shock 27 kDa protein 1 | Cytoplasm | other
Dusp27 | −2.133894504 | 4.49E−09 | 2.91E−07 | dual specificity phosphatase 27 (putative) | Other | phosphatase
Notum | −2.119495301 | 6.47E−06 | 0.000156084 | notum pectinacetylesterase homolog (Drosophila) | Extracellular Space | other
Pdk4 | −2.119339262 | 9.60E−08 | 4.04E−06 | pyruvate dehydrogenase kinase, isozyme 4 | Cytoplasm | kinase
Myo18b | −2.119301176 | 1.98E−06 | 5.66E−05 | myosin XVIIIB | Cytoplasm | other
Trim72 | −2.115143227 | 1.68E−07 | 6.67E−06 | tripartite motif containing 72, E3 ubiquitin protein ligase | Cytoplasm | enzyme
1500017E21Rik | −2.111456775 | 0.00022252 | 0.003148182 | RIKEN cDNA 1500017E21 gene | Other | other
Cnih2 | −2.099012367 | 9.24E−06 | 0.000213856 | cornichon family AMPA receptor auxiliary protein 2 | Extracellular Space | other
Mustn1 | −2.092811522 | 1.30E−06 | 3.91E−05 | musculoskeletal, embryonic nuclear protein 1 | Nucleus | other
Rbm20 | −2.092417242 | 1.29E−05 | 0.000287162 | RNA binding motif protein 20 | Nucleus | other
Casq1 | −2.090786079 | 0.000388851 | 0.004925805 | calsequestrin 1 (fast-twitch, skeletal muscle) | Cytoplasm | other
H19 | −2.08747205 | 7.92E−13 | 1.39E−10 | H19, imprinted maternally expressed transcript (non-protein coding) | Cytoplasm | other
Tlr7 | −2.071501831 | 8.33E−08 | 3.64E−06 | toll-like receptor 7 | Plasma Membrane | transmembrane receptor
Kcnc3 | −2.069589796 | 1.37E−06 | 4.09E−05 | potassium channel, voltage gated Shaw related subfamily C, member 3 | Plasma Membrane | ion channel
Twist1 | −2.066523744 | 1.16E−16 | 5.49E−14 | twist family bHLH transcription factor 1 | Nucleus | transcription regulator
Galnt3 | −2.060068511 | 4.79E−05 | 0.000861365 | polypeptide N-acetylgalactosaminyltransferase 3 | Cytoplasm | enzyme
Aldoart2 | −2.056238854 | 0.000286733 | 0.003853432 | aldolase 1 A, retrogene 2 | Other | enzyme
Bves | −2.052626225 | 1.58E−05 | 0.00033814 | blood vessel epicardial substance | Plasma Membrane | other
Myf6 | −2.041661831 | 0.000242965 | 0.0033727 | myogenic factor 6 (herculin) | Nucleus | transcription regulator
Sgms2 | −2.041025291 | 2.31E−05 | 0.000467309 | sphingomyelin synthase 2 | Plasma Membrane | enzyme
Mrc1 | −2.038914921 | 2.65E−15 | 1.04E−12 | mannose receptor, C type 1 | Plasma Membrane | transmembrane receptor
Slc8a3 | −2.036140803 | 1.48E−06 | 4.37E−05 | solute carrier family 8 (sodium/calcium exchanger), member 3 | Plasma Membrane | transporter
Mx1 | −2.032963249 | 4.78E−05 | 0.000859613 | MX dynamin-like GTPase 1 | Nucleus | enzyme
Dlx5 | −2.013607351 | 9.59E−05 | 0.00154859 | distal-less homeobox 5 | Nucleus | transcription regulator
Cd180 | −1.999988899 | 3.23E−09 | 2.28E−07 | CD180 molecule | Plasma Membrane | other
Hspb2 | −1.999144835 | 2.17E−06 | 6.09E−05 | heat shock 27 kDa protein 2 | Cytoplasm | other
Penk | −1.992073589 | 5.76E−05 | 0.000997614 | proenkephalin | Extracellular Space | other
Phospho1 | −1.974838655 | 3.91E−05 | 0.000737067 | phosphatase, orphan 1 | Extracellular Space | enzyme
Colq | −1.971439149 | 2.18E−05 | 0.000441925 | collagen-like tail subunit (single strand of homotrimer) of asymmetric acetylcholinesterase | Extracellular Space | other
Myom1 | −1.964431143 | 8.37E−05 | 0.001377131 | myomesin 1 | Cytoplasm | other
Eef1a2 | −1.963924398 | 6.39E−06 | 0.00015481 | eukaryotic translation elongation factor 1 alpha 2 | Cytoplasm | translation regulator
Ovol1 | −1.96297186 | 3.25E−05 | 0.000631034 | ovo-like zinc finger 1 | Nucleus | transcription regulator
Lrrc2 | −1.962718812 | 3.67E−06 | 9.57E−05 | leucine rich repeat containing 2 | Other | other
Ccl12 | −1.960416694 | 4.35E−09 | 2.83E−07 | chemokine (C-C motif) ligand 2 | Extracellular Space | cytokine
Otud1 | −1.9582093 | 1.43E−08 | 7.74E−07 | OTU deubiquitinase 1 | Other | peptidase
Lonrf3 | −1.955839816 | 6.45E−09 | 3.96E−07 | LON peptidase N-terminal domain and ring finger 3 | Other | other
Bai1 | −1.955724654 | 0.000100996 | 0.001616059 | #N/A | #N/A | #N/A
Hoxc9 | −1.955660858 | 1.01E−12 | 1.72E−10 | homeobox C9 | Nucleus | transcription regulator
Arpp21 | −1.945185646 | 0.000233277 | 0.003266598 | cAMP-regulated phosphoprotein, 21 kDa | Cytoplasm | other
Obscn | −1.939029162 | 0.000206389 | 0.002959758 | obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF | Cytoplasm | kinase
Trem2 | −1.936446063 | 4.42E−08 | 2.09E−06 | triggering receptor expressed on myeloid cells 2 | Plasma Membrane | transmembrane receptor
Tpm1 | −1.933315987 | 6.35E−05 | 0.001083589 | tropomyosin 1, alpha | Plasma Membrane | other
Mb | −1.927240981 | 4.29E−10 | 3.79E−08 | myoglobin | Cytoplasm | transporter
Coro6 | −1.923052355 | 9.30E−05 | 0.001506695 | coronin 6 | Extracellular Space | other
Satb2 | −1.922158855 | 0.000113421 | 0.001777453 | SATB homeobox 2 | Nucleus | transcription regulator
Dlgap3 | −1.921373713 | 0.00034172 | 0.00444821 | discs, large (Drosophila) homolog-associated protein 3 | Cytoplasm | other
Ptn | −1.91531434 | 0.000166962 | 0.002471403 | pleiotrophin | Extracellular Space | growth factor
Bmp5 | −1.905399922 | 3.70E−05 | 0.000700196 | bone morphogenetic protein 5 | Extracellular Space | growth factor
Ttn | −1.901348241 | 0.000364585 | 0.004670309 | titin | Cytoplasm | kinase
Art1 | −1.901283303 | 1.60E−05 | 0.000340711 | ADP-ribosyltransferase 1 | Plasma Membrane | enzyme
Sybu | −1.900896598 | 7.80E−06 | 0.000182995 | syntabulin (syntaxin-interacting) | Other | other
Tex15 | −1.900378417 | 1.57E−05 | 0.000337967 | testis expressed 15 | Extracellular Space | other
Wnt5a | 1.906940595 | 1.25E−07 | 5.19E−06 | wingless-type MMTV integration site family, member 5A | Extracellular Space | cytokine
Ero1l | 1.910974733 | 1.72E−11 | 2.18E−09 | endoplasmic reticulum oxidoreductase alpha | Cytoplasm | enzyme
Cyp7b1 | 1.913128075 | 6.13E−05 | 0.001053765 | cytochrome P450, family 7, subfamily B, polypeptide 1 | Cytoplasm | enzyme
Timp1 | 1.914391071 | 1.11E−09 | 8.91E−08 | TIMP metallopeptidase inhibitor 1 | Extracellular Space | cytokine
Bhlhe22 | 1.920077447 | 0.00024332 | 0.0033727 | basic helix-loop-helix family, member e22 | Nucleus | transcription regulator
Clca5 | 1.922252646 | 0.000128997 | 0.001980076 | #N/A | #N/A | #N/A
Nos2 | 1.93449052 | 9.69E−06 | 0.000223586 | nitric oxide synthase 2, inducible | Cytoplasm | enzyme
Sdc1 | 1.934765043 | 1.90E−12 | 3.13E−10 | syndecan 1 | Plasma Membrane | enzyme
Ccl11 | 1.935685485 | 1.01E−05 | 0.00023188 | chemokine (C-C motif) ligand 11 | Extracellular Space | cytokine
Sfrp2 | 1.937570432 | 0.00010539 | 0.001670733 | secreted frizzled-related protein 2 | Plasma Membrane | transmembrane receptor
Adora2b | 1.937719318 | 1.43E−06 | 4.25E−05 | adenosine A2b receptor | Plasma Membrane | G-protein coupled receptor
C1rb | 1.948311191 | 6.64E−06 | 0.000159392 | complement component 1, r subcomponent | Extracellular Space | peptidase
Cadm3 | 1.954868801 | 1.65E−06 | 4.81E−05 | cell adhesion molecule 3 | Plasma Membrane | other
Gcnt4 | 1.959146422 | 5.60E−05 | 0.000974542 | glucosaminyl (N-acetyl) transferase 4, core 2 | Cytoplasm | enzyme
AA467197 | 1.95975912 | 0.000134462 | 0.002053093 | chromosome 15 open reading frame 48 | Nucleus | other
Adamts5 | 1.96550313 | 1.70E−12 | 2.84E−10 | ADAM metallopeptidase with thrombospondin type 1 motif, 5 | Extracellular Space | peptidase
Il6 | 1.96970782 | 3.32E−05 | 0.000642426 | interleukin 6 | Extracellular Space | cytokine
Acp5 | 1.97208448 | 1.73E−05 | 0.000365845 | acid phosphatase 5, tartrate resistant | Cytoplasm | phosphatase
Plac8 | 1.972654229 | 1.03E−06 | 3.18E−05 | placenta-specific 8 | Nucleus | other
Hic1 | 1.977666371 | 2.86E−10 | 2.64E−08 | hypermethylated in cancer 1 | Nucleus | transcription regulator
Il18rap | 1.988047069 | 1.49E−05 | 0.000324778 | interleukin 18 receptor accessory protein | Plasma Membrane | transmembrane receptor
Prss46 | 2.005016082 | 0.000152543 | 0.002288306 | protease, serine, 46 | Other | peptidase
Csgalnact1 | 2.006832892 | 2.07E−12 | 3.33E−10 | chondroitin sulfate N-acetylgalactosaminyltransferase 1 | Cytoplasm | enzyme
Phlda2 | 2.012095452 | 0.000118738 | 0.001848071 | pleckstrin homology-like domain, family A, member 2 | Cytoplasm | other
Barx2 | 2.013964382 | 1.83E−06 | 5.25E−05 | BARX homeobox 2 | Nucleus | transcription regulator
Kctd11 | 2.020123243 | 8.08E−12 | 1.11E−09 | potassium channel tetramerization domain containing 11 | Cytoplasm | other
Hilpda | 2.022779424 | 7.35E−08 | 3.30E−06 | hypoxia inducible lipid droplet-associated | Cytoplasm | other
Klhdc8a | 2.029163571 | 7.64E−06 | 0.000180115 | kelch domain containing 8A | Other | other
Crabp2 | 2.041000695 | 3.54E−05 | 0.000676149 | cellular retinoic acid binding protein 2 | Cytoplasm | transporter
Medag | 2.044971831 | 1.99E−12 | 3.24E−10 | mesenteric estrogen-dependent adipogenesis | Cytoplasm | other
Napsa | 2.050808167 | 6.69E−08 | 3.03E−06 | napsin A aspartic peptidase | Extracellular Space | peptidase
Col23a1 | 2.074615476 | 6.01E−09 | 3.72E−07 | collagen, type XXIII, alpha 1 | Plasma Membrane | other
Wnt2b | 2.077564459 | 1.78E−05 | 0.000373778 | wingless-type MMTV integration site family, member 2B | Extracellular Space | other
Lgi3 | 2.083185898 | 0.000201013 | 0.002897856 | leucine-rich repeat LGI family, member 3 | Extracellular Space | other
Il33 | 2.084646455 | 5.04E−11 | 5.30E−09 | interleukin 33 | Extracellular Space | cytokine
H2-Ab1 | 2.087468297 | 1.75E−09 | 1.30E−07 | major histocompatibility complex, class II, DQ beta 1 | Plasma Membrane | other
4930502E18Rik | 2.087590606 | 5.50E−05 | 0.000961458 | RIKEN cDNA 4930502E18 gene | Other | other
Osr1 | 2.114093041 | 1.84E−08 | 9.56E−07 | odd-skipped related transcription factor 1 | Nucleus | other
Serping1 | 2.116258617 | 1.60E−13 | 3.43E−11 | serpin peptidase inhibitor, clade G (C1 inhibitor), member 1 | Extracellular Space | other
P2ry10 | 2.117660014 | 6.16E−05 | 0.001055499 | purinergic receptor P2Y, G-protein coupled, 10 | Plasma Membrane | G-protein coupled receptor
Ddit4 | 2.120727355 | 1.04E−15 | 4.32E−13 | DNA-damage-inducible transcript 4 | Cytoplasm | other
Tmeff2 | 2.123849758 | 0.000286592 | 0.003853432 | transmembrane protein with EGF-like and two follistatin-like domains 2 | Cytoplasm | other
Pthlh | 2.12599575 | 3.81E−05 | 0.000719195 | parathyroid hormone-like hormone | Extracellular Space | other
Pla1a | 2.128297502 | 3.15E−12 | 4.77E−10 | phospholipase A1 member A | Extracellular Space | enzyme
Cwc22 | 2.131128484 | 0.000289077 | 0.003873516 | CWC22 spliceosome-associated protein | Nucleus | other
Adamts4 | 2.131910626 | 9.89E−12 | 1.30E−09 | ADAM metallopeptidase with thrombospondin type 1 motif, 4 | Extracellular Space | peptidase
Ocstamp | 2.133707622 | 0.000285909 | 0.003849919 | osteoclast stimulatory transmembrane protein | Other | other
Avpr1a | 2.135058799 | 3.05E−08 | 1.49E−06 | arginine vasopressin receptor 1A | Plasma Membrane | G-protein coupled receptor
Sphk1 | 2.137577627 | 5.04E−10 | 4.42E−08 | sphingosine kinase 1 | Cytoplasm | kinase
Alox12 | 2.147703459 | 7.02E−05 | 0.001179092 | arachidonate 12-lipoxygenase | Cytoplasm | enzyme
Cd74 | 2.154265386 | 8.23E−10 | 6.87E−08 | CD74 molecule, major histocompatibility complex, class II invariant chain | Plasma Membrane | transmembrane receptor
Ier3 | 2.156413161 | 6.99E−10 | 5.94E−08 | immediate early response 3 | Cytoplasm | other
Niacr1 2.161017459 4.17E−06 0.000107108 #N/A #N/A #N/A Galnt16 2.163332213 1.33E−11 1.70E−09 polypeptide N- Cytoplasm enzyme acetylgalactosaminyl transferase 16 Fam83f 2.163464457 9.66E−05 0.001557314 family with Other other sequence similarity 83, member F Phyhipl 2.166920709 0.000352603 0.004552696 phytanoyl-CoA 2- Cytoplasm other hydroxylase interacting protein- like H2-Aa 2.16974298 3.79E−09 2.53E−07 major Plasma transmembrane histocompatibility Membrane receptor complex, class II, DQ alpha 1 Il1rl1 2.175512643 4.51E−06 0.000114652 interleukin 1 Plasma transmembrane receptor-like 1 Membrane receptor Dpt 2.180012546 2.29E−13 4.74E−11 dermatopontin Extracellular other Space Kcnjl5 2.180673205 1.37E−08 7.48E−07 potassium channel, Plasma ion channel inwardly rectifying Membrane subfamily J, member 15 Rnd1 2.181967661 9.57E−09 5.64E−07 Rho family GTPase Cytoplasm enzyme 1 Gpr114 2.189646297 1.94E−07 7.41E−06 #N/A #N/A #N/A Ccbp2 2.193904635 2.20E−07 8.27E−06 #N/A #N/A #N/A Elfn1 2.199467366 4.19E−05 0.000776232 extracellular leucine- Plasma other rich repeat and Membrane fibronectin type III domain containing 1 Cxadr 2.20082858 0.000332993 0.004351168 coxsackie virus and Plasma transmembrane adenovirus receptor Membrane receptor Mcpt4 2.206254485 0.000169343 0.002496531 mast cell protease 4 Other peptidase Stac2 2.212366039 9.24E−09 5.50E−07 SH3 and cysteine Other other rich domain 2 Cxcr7 2.216406879 4.59E−14 1.16E−11 #N/A #N/A #N/A Foxd1 2.217141687 2.09E−05 0.000427216 forkhead box D1 Nucleus transcription regulator Cd209f 2.232430603 0.000140148 0.002118637 CD209f antigen Other other Crabp1 2.235053806 0.000377869 0.004813427 cellular retinoic acid Cytoplasm transporter binding protein 1 Rtn4rl2 2.236586861 9.06E−08 3.91E−06 reticulon 4 receptor- Plasma other like 2 Membrane Slc39a14 2.238458836 2.66E−14 7.14E−12 solute carrier family Plasma transporter 39 (zinc transporter), Membrane member 14 Ifnlr1 2.242774024 1.94E−05 0.000403681 interferon, lambda Plasma 
transmembrane receptor 1 Membrane receptor 5730416F02Rik 2.253801229 5.43E−06 0.00013307 capping protein Other other (actin filament), gelsolin-like pseudogene Trpm6 2.25769376 2.03E−07 7.73E−06 transient receptor Plasma kinase potential cation Membrane channel, subfamily M, member 6 Gfra1 2.258361982 1.82E−06 5.23E−05 GDNF family Plasma transmembrane receptor alpha 1 Membrane receptor Egln3 2.26022722 4.82E−15 1.69E−12 egl-9 family Cytoplasm enzyme hypoxia-inducible factor 3 S100a9 2.261077077 0.000236631 0.003306796 S100 calcium Cytoplasm other binding protein A9 Fbln2 2.261656935 3.57E−10 3.21E−08 fibulin 2 Extracellular other Space Tnfsf11 2.262220766 0.000168565 0.002489291 tumor necrosis Extracellular cytokine factor (ligand) Space superfamily, member 11 S1pr3 2.26806489 8.22E−08 3.62E−06 sphingosine-1- Plasma G-protein phosphate receptor 3 Membrane coupled receptor Acsbg1 2.27206321 2.08E−05 0.000425956 acyl-CoA synthetase Cytoplasm enzyme bubblegum family member 1 Kcne3 2.28228483 7.11E−11 7.32E−09 potassium channel, Plasma ion channel voltage gated Membrane subfamily E regulatory beta subunit 3 Lmx1a 2.286295067 8.74E−05 0.001429283 LIM homeobox Nucleus transcription transcription factor regulator 1, alpha Sfrp1 2.286645932 0.000382833 0.004863077 secreted frizzled- Plasma transmembrane related protein 1 Membrane receptor Aqp2 2.294873359 3.68E−05 0.000699026 aquaporin 2 Plasma transporter (collecting duct) Membrane 1810033B17Rik 2.303750907 1.76E−05 0.000371982 #N/A #N/A #N/A Tmem178 2.30838751 8.85E−06 0.000205126 transmembrane Other other protein 178A Figf 2.324615912 3.70E−09 2.48E−07 c-fos induced Extracellular growth factor growth factor Space (vascular endothelial growth factor D) Slc6a2 2.327386448 5.24E−05 0.000926129 solute carrier family Plasma transporter 6 (neurotransmitter Membrane transporter), member 2 Gpr123 2.333408054 2.32E−07 8.66E−06 #N/A #N/A #N/A Ces2g 2.337823509 2.07E−05 0.000424294 carboxylesterase 2G Other enzyme Treml4 
2.356184647 7.19E−05 0.00120497 triggering receptor Other other expressed on myeloid cells-like 4 Doc2b 2.356263534 0.000143893 0.002170448 double C2-like Cytoplasm transporter domains, beta Lbp 2.35803444 5.08E−13 9.65E−11 lipopolysaccharide Plasma transporter binding protein Membrane Ifi205 2.360747496 5.29E−14 1.32E−11 interferon, gamma- Nucleus transcription inducible protein 16 regulator Rgs9 2.360817111 6.48E−05 0.001101485 regulator of G- Cytoplasm enzyme protein signaling 9 Arsi 2.371268625 2.16E−08 1.09E−06 arylsulfatase family, Extracellular enzyme member I Space Ciita 2.373772401 6.36E−07 2.08E−05 class II, major Nucleus transcription histocompatibility regulator complex, transactivator Dusp4 2.379388346 2.36E−06 6.57E−05 dual specificity Nucleus phosphatase phosphatase 4 Rorb 2.38089784 0.000295309 0.003937748 RAR-related orphan Nucleus ligand- receptor B dependent nuclear receptor Sbsn 2.383926086 1.80E−05 0.000377684 suprabasin Cytoplasm other Cdh1 2.392769181 3.51E−08 1.69E−06 cadherin 1, type 1 Plasma other Membrane Fgr 2.395997722 1.73E−17 9.84E−15 FGR proto- Nucleus kinase oncogene, Src family tyrosine kinase Kcnip1 2.401474217 1.95E−05 0.000405805 Kv channel Plasma ion channel interacting protein 1 Membrane Ak4 2.401846699 5.33E−08 2.46E−06 adenylate kinase 4 Cytoplasm kinase A630023A22Rik 2.414937431 7.63E−05 0.001272883 RIKEN cDNA Other other A630023A22 gene Has1 2.417424502 1.62E−08 8.57E−07 hyaluronan synthase Plasma enzyme 1 Membrane Sdk1 2.424720594 1.24E−08 6.91E−07 sidekick cell Plasma other adhesion molecule 1 Membrane Gjb5 2.429127176 6.04E−06 0.000146855 gap junction protein, Plasma transporter beta 5, 31.1kDa Membrane 5730559C18Rik 2.434932765 5.17E−05 0.000919826 chromosome 1 open Other other reading frame 106 Adm 2.437796473 3.10E−09 2.21E−07 adrenomedullin Extracellular other Space Hmga2 2.445141593 1.35E−05 0.000297813 high mobility group Nucleus enzyme AT-hook 2 Itgax 2.449058631 3.26E−16 1.49E−13 integrin, alpha X Plasma 
transmembrane (complement Membrane receptor component 3 receptor 4 subunit) Itln1 2.454578666 3.52E−07 1.24E−05 intelectin 1 Plasma other (galactofuranose Membrane binding) Ifitm1 2.454902535 1.49E−13 3.24E−11 interferon induced Other other transmembrane protein 1 Sh2d5 2.457966113 5.17E−13 9.68E−11 SH2 domain Plasma other containing 5 Membrane Ndufa4l2 2.461273301 4.04E−12 5.88E−10 NADH Other enzyme dehydrogenase (ubiquinone) 1 alpha subcomplex, 4-like 2 Ffar2 2.472209214 0.000226332 0.003185641 free fatty acid Plasma G-protein receptor 2 Membrane coupled receptor Scd1 2.488851454 3.66E−07 1.28E−05 stearoyl-CoA Cytoplasm enzyme desaturase (delta-9- desaturase) Mmp9 2.499142569 4.69E−05 0.000845283 matrix Extracellular peptidase metallopeptidase 9 Space Cxcl1 2.51537654 3.67E−05 0.000696707 chemokine (C-X-C Extracellular cytokine motif) ligand 2 Space Nrn1 2.520515557 2.96E−06 8.01E−05 neuritin 1 Cytoplasm other Inhbb 2.523010814 5.62E−13 1.04E−10 inhibin, beta B Extracellular growth factor Space Col28a1 2.531182459 9.55E−07 2.97E−05 collagen, type Extracellular other XXVIII, alpha 1 Space Dnmt31 2.532511137 0.000269506 0.003668764 DNA (cytosine-5-)- Nucleus transcription methyltransferase 3- regulator like Fcrla 2.535289213 0.000140017 0.002118637 Fc receptor-like A Plasma other Membrane Cxcl14 2.538668396 1.33E−09 1.04E−07 chemokine (C-X-C Extracellular cytokine motif) ligand 14 Space Pi16 2.545417685 9.68E−08 4.06E−06 peptidase inhibitor Extracellular other 16 Space C4b 2.554327389 3.52E−09 2.42E−07 complement Extracellular other component 4B Space (Chido blood group) Gzmc 2.586719596 2.23E−06 6.24E−05 granzyme C Cytoplasm peptidase Car9 2.588221576 5.94E−09 3.70E−07 carbonic anhydrase Nucleus enzyme IX H2-Eb1 2.589361992 4.28E−11 4.72E−09 major Plasma transmembrane histocompatibility Membrane receptor complex, class II, DR beta 5 Cpa3 2.592224708 0.000349947 0.004527105 carboxypeptidase A3 Extracellular peptidase (mast cell) Space Rhov 2.597158136 3.97E−05 
0.000744269 ras homolog family Plasma enzyme member V Membrane Smoc1 2.609276659 1.13E−17 7.01E−15 SPARC related Extracellular other modular calcium Space binding 1 Cd244 2.610625963 2.48E−09 1.82E−07 CD244 molecule, Plasma transmembrane natural killer cell Membrane receptor receptor 2B4 Serpina3h 2.626086546 9.48E−16 4.05E−13 serine (or cysteine) Extracellular other peptidase inhibitor, Space clade A, member 3H Dpp6 2.627269895 3.28E−06 8.69E−05 dipeptidyl-peptidase Plasma other 6 Membrane Tmem95 2.630604846 3.26E−06 8.64E−05 transmembrane Other other protein 95 Rgs16 2.641503944 7.07E−14 1.61E−11 regulator of G- Cytoplasm other protein signaling 16 Mmp12 2.661387138 8.47E−12 1.14E−09 matrix Extracellular peptidase metallopeptidase 12 Space Ttyh1 2.666310269 1.17E−08 6.67E−07 tweety family Plasma ion channel member 1 Membrane Tmem125 2.684592738 0.00020899 0.002984538 transmembrane Other other protein 125 Pcsk5 2.688181091 2.88E−12 4.42E−10 proprotein Extracellular peptidase convertase Space subtilisin/kexin type 5 Slc2a1 2.701076956 2.95E−13 5.94E−11 solute carrier family Plasma transporter 2 (facilitated glucose Membrane transporter), member 1 Frmd5 2.704255771 7.70E−05 0.001282835 FERM domain Other other containing 5 Col5a3 2.706994884 9.13E−22 9.61E−19 collagen, type V, Extracellular other alpha 3 Space Dmkn 2.711140688 0.00034439 0.004478701 dermokine Extracellular other Space Lrrc15 2.71712803 3.40E−12 5.06E−10 leucine rich repeat Plasma other containing 15 Membrane C3 2.726900819 2.87E−10 2.64E−08 complement Extracellular peptidase component 3 Space Nt5e 2.732947213 7.62E−12 1.05E−09 5′-nucleotidase, ecto Plasma phosphatase (CD73) Membrane Serpind1 2.762623986 9.29E−07 2.91E−05 serpin peptidase Extracellular other inhibitor, clade D Space (heparin cofactor), member 1 Unc13a 2.772685031 1.69E−09 1.27E−07 unc-13 homolog A Plasma other (C. 
elegans) Membrane Tpsb2 2.780651006 8.40E−08 3.65E−06 tryptase alpha/beta 1 Extracellular peptidase Space Inhba 2.795338332 3.18E−08 1.54E−06 inhibin, beta A Extracellular growth factor Space C4a 2.798731104 2.61E−10 2.44E−08 complement Extracellular other component 4B Space (Chido blood group) Slc2a3 2.810697868 2.17E−13 4.56E−11 solute carrier family Plasma transporter 2 (facilitated glucose Membrane transporter), member 3 Wt1 2.82046663 0.000169678 0.002498782 Wilms tumor 1 Nucleus transcription regulator 1300002K09Rik 2.834274508 7.10E−07 2.29E−05 #N/A #N/A #N/A Vat1l 2.842318796 6.74E−05 0.001138678 vesicle amine Other enzyme transport 1-like Il1b 2.842564699 3.03E−24 3.76E−21 interleukin 1, beta Extracellular cytokine Space Gjb3 2.850664786 2.13E−06 6.01E−05 gap junction protein, Plasma transporter beta 3, 31 kDa Membrane Sfrp4 2.873431041 5.56E−14 1.35E−11 secreted frizzled- Plasma transmembrane related protein 4 Membrane receptor Osbp2 2.894267579 8.23E−11 8.28E−09 oxysterol binding Cytoplasm other protein 2 Serpina3i 2.896733041 3.59E−11 4.13E−09 serine (or cysteine) Other other peptidase inhibitor, clade A, member 3G Ccbe1 2.899591596 1.80E−14 5.13E−12 collagen and Extracellular other calcium binding Space EGF domains 1 Dnase1l3 2.914095612 4.71E−11 5.03E−09 deoxyribonuclease I- Nucleus enzyme like 3 Prg4 2.920652095 1.48E−13 3.24E−11 proteoglycan 4 Extracellular other (megakaryocyte Space stimulating factor, articular superficial zone protein) Serpine1 2.943112329 2.65E−13 5.41E−11 serpin peptidase Extracellular other inhibitor, clade E Space (nexin, plasminogen activator inhibitor type 1), member 1 Nfasc 2.955386081 4.40E−11 4.81E−09 neurofascin Plasma other Membrane Tnfsf8 2.958098025 7.42E−08 3.31E−06 tumor necrosis Plasma cytokine factor (ligand) Membrane superfamily, member 8 Adra2a 2.963089237 3.96E−36 1.36E−32 adrenoceptor alpha Plasma G-protein 2A Membrane coupled receptor Syt5 2.967862699 1.03E−09 8.32E−08 synaptotagmin V Cytoplasm transporter 
Erv3 2.976253995 1.57E−05 0.000337967 endogenous Other other retroviral sequence 3 Lgi2 2.980280668 7.82E−07 2.49E−05 leucine-rich repeat Extracellular other LGI family, member Space 2 Adcy5 2.989745548 1.35E−15 5.42E−13 adenylate cyclase 5 Plasma enzyme Membrane Lcn2 3.00569291 4.88E−09 3.09E−07 lipocalin 2 Extracellular transporter Space Syt17 3.012074261 1.31E−06 3.93E−05 synaptotagmin XVII Plasma other Membrane Efemp1 3.013217113 2.56E−07 9.39E−06 EGF containing Extracellular enzyme fibulin-like Space extracellular matrix protein 1 Fam5c 3.02276903 1.82E−07 7.05E−06 #N/A #N/A #N/A Sorcs1 3.045355688 1.60E−05 0.000340968 sortilin-related Plasma transporter VPS 10 domain Membrane containing receptor 1 Adamts15 3.054035878 7.61E−15 2.54E−12 ADAM Extracellular peptidase metallopeptidase Space with thrombospondin type 1 motif, 15 Clec2e 3.07635781 6.84E−06 0.000163535 C-type lectin domain Plasma transmembrane family 2, member h Membrane receptor Chl1 3.084606232 5.81E−07 1.92E−05 cell adhesion Plasma other molecule L1-like Membrane Mmrn1 3.110168932 0.000104343 0.001656047 multimerin 1 Extracellular other Space Gpr35 3.118726492 3.52E−19 2.84E−16 G protein-coupled Plasma G-protein receptor 35 Membrane coupled receptor Rarres2 3.147276259 1.10E−09 8.87E−08 retinoic acid Plasma transmembrane receptor responder Membrane receptor (tazarotene induced) 2 Pgf 3.150550679 8.07E−17 4.09E−14 placental growth Extracellular growth factor factor Space Serpina3f 3.155375641 5.61E−14 1.35E−11 serine (or cysteine) Other other peptidase inhibitor, clade A, member 3G Il1r2 3.155818097 9.76E−15 3.11E−12 interleukin 1 Plasma transmembrane receptor, type II Membrane receptor Il13ra2 3.208712372 4.27E−05 0.000786196 interleukin 13 Plasma transmembrane receptor, alpha 2 Membrane receptor Nxph4 3.212429551 1.20E−08 6.76E−07 neurexophilin 4 Extracellular other Space Slit1 3.221521866 8.08E−08 3.56E−06 slit guidance ligand Extracellular other 1 Space Col10a1 3.255299689 8.78E−08 3.80E−06 
collagen, type X, Extracellular other alpha 1 Space Grem1 3.306998841 4.81E−09 3.07E−07 gremlin 1, DAN Extracellular other family BMP Space antagonist Rpl21 3.319158494 0.000321595 0.004226455 ribosomal protein Cytoplasm other L21 Ly6k 3.330210263 1.32E−05 0.000292565 lymphocyte antigen Nucleus other 6 complex, locus K Pcsk9 3.343662872 1.11E−05 0.000249753 proprotein Extracellular peptidase convertase Space subtilisin/kexin type 9 Dbx2 3.374124712 8.16E−10 6.85E−08 developing brain Nucleus transcription homeobox 2 regulator B3galt5 3.42887914 3.17E−06 8.47E−05 UDP- Cytoplasm enzyme Gal:betaGlcNAc beta 1,3- galactosyltransferase , polypeptide 5 Il11 3.446479515 1.36E−08 7.48E−07 interleukin 11 Extracellular cytokine Space Htr1b 3.47009247 2.52E−14 6.90E−12 5- Plasma G-protein hydroxytryptamine Membrane coupled (serotonin) receptor receptor 1B, G protein- coupled Cxcl13 3.554799387 3.92E−05 0.000737235 chemokine (C-X-C Extracellular cytokine motif) ligand 13 Space 9330182L06Rik 3.599154487 3.76E−06 9.79E−05 KIAA1324-like Other other Cd207 3.698500979 6.80E−11 7.05E−09 CD207 molecule, Plasma other langerin Membrane Serpina3n 3.699329372 1.12E−13 2.51E−11 serpin peptidase Extracellular other inhibitor, clade A Space (alpha-1 antiproteinase, antitrypsin), member 3 Tmem132e 3.706736426 9.24E−08 3.94E−06 transmembrane Other other protein 132E Serpina3m 3.722226604 5.48E−18 3.57E−15 serpin peptidase Extracellular other inhibitor, clade A Space (alpha-1 antiproteinase, antitrypsin), member 3 Kcnmb1 3.771552594 2.30E−08 1.15E−06 potassium channel Plasma ion channel subfamily M Membrane regulatory beta subunit 1 Gpr141 3.898353543 1.01E−10 9.98E−09 G protein-coupled Plasma G-protein receptor 141 Membrane coupled receptor Arg1 3.924911517 2.76E−08 1.37E−06 arginase 1 Cytoplasm enzyme Tpsab1 3.958781996 9.04E−15 2.94E−12 tryptase alpha/beta 1 Nucleus peptidase Ereg 3.993604416 9.00E−07 2.82E−05 epiregulin Extracellular growth factor Space Mmp13 4.025132705 3.95E−15 1.42E−12 
matrix Extracellular peptidase metallopeptidase 13 Space Tnfrsf9 4.100043244 1.10E−24 1.67E−21 tumor necrosis Plasma transmembrane factor receptor Membrane receptor superfamily, member 9 Slc7a11 4.123064166 1.29E−17 7.68E−15 solute carrier family Plasma transporter 7 (anionic amino Membrane acid transporter light chain, xc- system), member 11 Akr1c18 4.133960273 1.07E−11 1.40E−09 aldo-keto reductase Cytoplasm enzyme family 1, member C3 Mgarp 4.215853911 3.91E−11 4.38E−09 mitochondria- Cytoplasm other localized glutamic acid-rich protein Serpina3k 4.258478095 6.90E−13 1.26E−10 serpin peptidase Extracellular other inhibitor, clade A Space (alpha-1 antiproteinase, antitrypsin), member 3 Ccl20 4.325657841 1.88E−10 1.80E−08 chemokine (C-C Extracellular cytokine motif) ligand 20 Space Cfi 4.589933583 1.17E−09 9.20E−08 complement factor I Extracellular peptidase Space Reg3g 4.66412117 1.46E−12 2.47E−10 regenerating islet- Extracellular other derived 3 gamma Space Krt19 4.779241895 1.21E−05 0.000271903 keratin 19, type I Cytoplasm other Ptprn 4.824685996 6.13E−22 6.98E−19 protein tyrosine Plasma phosphatase phosphatase, Membrane receptor type, N A2m 4.936644365 1.29E−07 5.33E−06 alpha-2- Extracellular transporter macroglobulin Space Saa3 4.938190554 6.34E−08 2.88E−06 serum amyloid A 3 Extracellular other Space Gzme 5.281856145 6.57E−14 1.52E−11 granzyme H Cytoplasm peptidase (cathepsin G-like 2, protein h-CCPX) Mmp3 5.447660714 4.84E−28 8.28E−25 matrix Extracellular peptidase metallopeptidase 3 Space Prokr2 5.903313554 2.25E−14 6.29E−12 prokineticin receptor Plasma G-protein 2 Membrane coupled receptor Fgf23 6.223273913 4.42E−14 1.14E−11 fibroblast growth Extracellular growth factor factor 23 Space Mcpt2 6.857304981 4.93E−30 1.12E−26 mast cell protease 2 Extracellular peptidase Space Gzmd 7.248393542 7.17E−13 1.29E−10 granzyme H Cytoplasm peptidase (cathepsin G-like 2, protein h-CCPX) Cldn10 7.636808366 5.34E−09 3.35E−07 claudin 10 Plasma other Membrane Mmp10 7.64543229 
2.07E−24 2.84E−21 matrix Extracellular peptidase metallopeptidase 10 Space Gm9992 7.648664919 4.61E−08 2.16E−06 unc-93 homolog A Plasma other (C. elegans) Membrane Mcpt8 8.11716942 5.31E−12 7.57E−10 mast cell protease 8 Cytoplasm other Reg1 10.74685846 8.53E−19 6.49E−16 regenerating islet- Extracellular growth factor derived 1 alpha Space Mcpt1 11.25382227 2.86E−47 3.92E−43 mast cell protease 1 Other peptidase

Various aspects of the disclosure have been described. These and other aspects are within the scope of the following claims.

Claims

1. A method comprising:

xenografting tissue from a donor organism of a second species onto a host organism of a first species;

obtaining a sample derived from the host organism, wherein the sample comprises a plurality of molecules of messenger ribonucleic acid (mRNA);

determining, for substantially each molecule of the plurality of molecules of mRNA, a corresponding RNA sequence;

generating a combined dataset of RNA sequence reads by aligning each RNA sequence to a combined reference genome, wherein aligning each RNA sequence to the combined reference genome includes comparing a genomic location of each corresponding RNA sequence with a genomic location of a gene sequence of the combined reference genome, wherein the combined reference genome includes one or more gene sequences from at least a portion of a first genome derived from the first species and at least a portion of a second genome derived from the second species, and wherein the combined dataset of RNA sequence reads includes RNA sequence reads from both the first species and the second species;

filtering non-unique RNA sequence reads from the combined dataset by identifying species-specific RNA sequences exclusive to either the first species or the second species, wherein identifying species-specific RNA sequences includes determining, for each corresponding RNA sequence, whether the RNA sequence is substantially aligned with exactly one corresponding gene sequence of the combined reference genome;

at least one of: differentiating an origin species of each species-specific RNA sequence in the filtered combined dataset by determining whether the corresponding RNA sequence is aligned to a gene sequence of the combined reference genome associated with the first genome of the first species or the second genome of the second species, or quantifying an abundance level of each species-specific RNA sequence in the sample by determining an approximate number of times that each RNA sequence substantially aligned with exactly one corresponding gene occurs in the sample; and

determining, based on one or more of the differentiation of the origin species or the quantification of the abundance level, that the tissue derived from the donor organism contains a biomarker indicative of at least one of: a disease status, a response of the host organism to the tissue derived from the donor organism, a response of tissue derived from the donor organism to transplantation within the host organism, or a response of the host organism to therapy administered to the host organism.
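
As one non-limiting illustration, the filtering, differentiating, and quantifying steps recited above may be sketched in Python as follows. The sketch assumes that alignment has already been performed and that each alignment is available as a (read identifier, gene identifier) pair. The `species|gene` naming scheme, the function names, and the sample records are hypothetical and are used only for illustration; they do not represent the output format of any particular alignment tool.

```python
from collections import Counter, defaultdict


def filter_unique(alignments):
    """Keep only reads aligned to exactly one gene sequence of the combined reference."""
    hits = defaultdict(set)
    for read_id, gene_id in alignments:
        hits[read_id].add(gene_id)
    # Reads with more than one distinct hit are non-unique and are dropped.
    return {read: next(iter(genes)) for read, genes in hits.items() if len(genes) == 1}


def origin_species(gene_id):
    """Read the origin species from the (assumed) 'species|gene' identifier."""
    return gene_id.split("|", 1)[0]


def quantify(unique_reads):
    """Count how many uniquely aligned reads were assigned to each gene sequence."""
    return Counter(unique_reads.values())


# Hypothetical alignment records; read r3 aligns to both genomes and is filtered out.
alignments = [
    ("r1", "mouse|Il1b"),
    ("r2", "human|MYC"),
    ("r3", "mouse|Actb"),
    ("r3", "human|ACTB"),
    ("r4", "mouse|Il1b"),
]

unique = filter_unique(alignments)
species_by_read = {read: origin_species(gene) for read, gene in unique.items()}
counts = quantify(unique)
```

In this sketch, a read that aligns to gene sequences of both genomes is treated as non-unique and removed, so every remaining read can be attributed to a single origin species before its abundance is counted.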

2. The method of claim 1, wherein the one or more gene sequences comprise at least one of one or more coding sequences or one or more regulatory sequences.

3. The method of claim 1, further comprising generating the combined reference genome.

4. The method of claim 3, wherein generating the combined reference genome comprises:

identifying, for each of the one or more gene sequences, a corresponding location within the combined reference genome; and

annotating, for each of the one or more gene sequences, the corresponding location to indicate the origin species of the corresponding gene sequence.

5. The method of claim 1, further comprising, for each species-specific RNA sequence, determining that the exactly one corresponding gene sequence is associated with a predetermined cluster of gene sequences.

6. The method of claim 5, wherein the predetermined cluster of gene sequences comprises a group of genes sharing one or more functional characteristics.

7. The method of claim 6, wherein the one or more functional characteristics comprise one or more biological processes or canonical pathways.

8. The method of claim 7, wherein the one or more biological processes or canonical pathways comprise one or more of transcriptional regulation, intracellular signaling, intercellular signaling, cell apoptosis, biomolecule metabolism, biomolecule synthesis, RNA processing, or macromolecule assembly.
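
As one non-limiting illustration, associating a gene sequence with a predetermined cluster may be sketched in Python as a simple membership lookup. The cluster names below are taken from the list of biological processes recited above, but the assignment of particular genes to each cluster is an assumption made only for this example, as are the names `CLUSTERS` and `clusters_for`.

```python
# Hypothetical predetermined clusters: cluster name -> set of gene symbols.
# The gene memberships are illustrative assumptions, not curated annotations.
CLUSTERS = {
    "transcriptional regulation": {"Wt1", "Foxd1", "Ciita"},
    "intercellular signaling": {"Il1b", "Il6", "Il33"},
}


def clusters_for(gene):
    """Return the sorted names of every predetermined cluster that contains the gene."""
    return sorted(name for name, members in CLUSTERS.items() if gene in members)


il6_clusters = clusters_for("Il6")
unassigned = clusters_for("Rpl21")
```

A gene that belongs to no predetermined cluster yields an empty list, which lets downstream steps distinguish clustered from unclustered gene sequences.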

9. The method of claim 1,

wherein the donor organism contains a biomarker indicative of the disease status, and

wherein the biomarker comprises a nucleic acid sequence associated with a disease.

10. The method of claim 1,

wherein the donor organism contains a biomarker indicative of the disease status, and

wherein the disease status comprises at least one of: the presence or absence of a disease state, one or more characteristics of an existing disease state, a likelihood of a future progression of an existing disease state, or one or more characteristics of a predicted future progression of an existing disease state.

11. The method of claim 1, further comprising determining, based on determining that the tissue derived from the donor organism contains the biomarker indicative of the disease status, a therapy to be administered to at least one of the host organism or the donor organism.

12. The method of claim 11, further comprising administering the determined therapy to the at least one of the host organism or the donor organism.

13. The method of claim 1,

wherein the donor organism contains a biomarker indicative of the response of the host organism to the tissue derived from the donor organism, and

wherein the response of the host organism to the tissue derived from the donor organism corresponds to one of acceptance or rejection of the tissue derived from the donor organism by the host organism.

14. The method of claim 1, wherein obtaining the sample of bodily fluid derived from the host organism comprises:

obtaining a sample of blood;

isolating, from the sample of blood, a volume of blood serum; and

isolating, from the volume of blood serum, a plurality of exosomes.

15. The method of claim 1, further comprising:

isolating, from the sample of bodily fluid, the plurality of molecules of mRNA;

purifying the molecules of mRNA;

performing a reverse-transcriptase polymerase chain reaction using the molecules of mRNA to produce a plurality of molecules of complementary deoxyribonucleic acid (cDNA), wherein each molecule of the plurality of molecules of cDNA corresponds to one of the plurality of molecules of mRNA;

performing a polymerase chain reaction to amplify the molecules of cDNA;

transcribing substantially each of the molecules of cDNA into RNA; and

determining the nucleic acid sequence of substantially each of the molecules of mRNA.

16. The method of claim 1,

wherein the first species comprises one of a rodent species or a non-human primate species, and

wherein the second species comprises one of a canine, feline, porcine, or human species.

17. A method comprising:

xenografting tissue from a donor organism of a second species onto a host organism of a first species;

obtaining a sample derived from the host organism, wherein the sample comprises a plurality of molecules of messenger ribonucleic acid (mRNA);

generating a combined reference genome, wherein the combined reference genome comprises one or more gene sequences from at least a portion of a first genome derived from the first species and at least a portion of a second genome derived from the second species;

determining, for substantially each molecule of the plurality of molecules of mRNA, a corresponding RNA sequence;

generating a combined dataset of RNA sequence reads by aligning each RNA sequence to the combined reference genome, wherein aligning each RNA sequence to the combined reference genome includes comparing a genomic location of each corresponding RNA sequence with a genomic location of a gene sequence of the combined reference genome, and wherein the combined dataset of RNA sequence reads includes RNA sequence reads from both the first species and the second species; and

filtering non-unique RNA sequence reads from the combined dataset by identifying species-specific RNA sequences exclusive to either the first species or the second species, wherein identifying species-specific RNA sequences includes determining, for each corresponding RNA sequence, whether the RNA sequence is substantially aligned with exactly one corresponding gene sequence of the combined reference genome.

18. The method of claim 17, wherein generating the combined reference genome further comprises:

identifying, for each of the one or more gene sequences of the combined reference genome, a corresponding location within the combined reference genome; and

annotating, for each of the one or more gene sequences of the combined reference genome, the corresponding location to indicate the origin species of the corresponding gene sequence.

19. The method of claim 17, wherein generating the combined reference genome further comprises:

receiving data indicating gene sequences of at least a portion of the first genome derived from the first species;

receiving data indicating gene sequences of at least a portion of the second genome derived from the second species; and

outputting one or more computer files representing the one or more gene sequences of the combined reference genome.
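
As one non-limiting illustration, generating the combined reference genome from data for two species may be sketched in Python as follows. The sketch assumes that the sequence data for each species is received as FASTA-formatted text and that the origin species is recorded by prefixing each sequence name with a species label. The `species|name` convention, the function names, and the toy sequences are hypothetical.

```python
def tag_fasta(fasta_text, species):
    """Prefix every FASTA header with a species label so origin is readable from the name."""
    tagged = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tagged.append(f">{species}|{line[1:].strip()}")
        elif line.strip():
            tagged.append(line.strip())
    return tagged


def build_combined_reference(first_fasta, second_fasta, first_species, second_species):
    """Concatenate two species-tagged FASTA inputs into one combined reference text."""
    lines = tag_fasta(first_fasta, first_species) + tag_fasta(second_fasta, second_species)
    return "\n".join(lines) + "\n"


# Toy inputs: both species have a sequence named chr1, so tagging avoids a name clash.
host_fasta = ">chr1\nACGTACGT\n"
donor_fasta = ">chr1\nTTGACCAA\n"

combined = build_combined_reference(host_fasta, donor_fasta, "mouse", "human")
```

The resulting text may then be written to one or more files for use by an alignment tool. Tagging the sequence names keeps identically named sequences from the two genomes distinct and allows the origin species of any aligned read to be recovered from the name of the sequence to which it aligned.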

20. The method of claim 17, further comprising at least one of:

differentiating an origin species of each species-specific RNA sequence in the filtered combined dataset by determining whether the corresponding RNA sequence is aligned to a gene sequence of the combined reference genome associated with the first genome of the first species or the second genome of the second species, or

quantifying an abundance level of each species-specific RNA sequence in the sample by determining an approximate number of times that each RNA sequence substantially aligned with exactly one corresponding gene occurs in the sample.