Normalization of gene expression data

Info

Publication number: 20050221357
Type: Application
Filed: Mar 22, 2005
Publication Date: Oct 6, 2005
Inventors: Mark Shannon (Livermore, CA), Mark Oldham (Los Gatos, CA), David Ruff (San Francisco, CA)
Application Number: 11/086,253

Abstract

A method for determining bias across two domains comprising gene expression data. The method can comprise (a) providing a first domain and a second domain; (b) obtaining information indicative of a bias within the first domain; (c) obtaining information indicative of a bias within the second domain; and (d) using the information indicative of the bias within the first domain and the information indicative of the bias within the second domain to produce an indication of bias across the two domains.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of U.S. patent application Ser. No. 10/944,673 filed on Sep. 17, 2004, and U.S. patent application Ser. No. 10/944,668 filed on Sep. 17, 2004. U.S. patent application Ser. No. 10/944,673 claims the benefit of U.S. Provisional Application No. 60/504,500 filed on Sep. 19, 2003; U.S. Provisional Application No. 60/504,052 filed on Sep. 19, 2003; U.S. Provisional Application No. 60/589,224 filed on Jul. 19, 2004; U.S. Provisional Application No. 60/589,225 filed on Jul. 19, 2004; and U.S. Provisional Application No. 60/601,716 filed on Aug. 13, 2004. U.S. patent application Ser. No. 10/944,668 is a continuation-in-part of U.S. patent application Ser. No. 10/913,601 filed on Aug. 5, 2004 and further claims the benefit of U.S. Provisional Application No. 60/504,052 filed on Sep. 19, 2003; U.S. Provisional Application No. 60/589,224 filed on Jul. 19, 2004; and U.S. Provisional Application No. 60/601,716 filed on Aug. 13, 2004.

All literature and similar materials cited in this application, including but not limited to, patents, patent applications, articles, books, treatises, and internet web pages, regardless of the format of such literature and similar materials, are expressly incorporated by reference in their entirety for any purpose. In the event that one or more of the incorporated literature and similar materials differs from or contradicts this application, including but not limited to defined terms, term usage, described techniques, or the like, this application controls.

INTRODUCTION

Currently, genomic analysis, including that of the estimated 30,000 human genes, is a major focus of basic and applied biochemical and pharmaceutical research. Such analysis may aid in developing diagnostics, medicines, and therapies for a wide variety of disorders. However, the complexity of the human genome and the interrelated functions of genes often make this task difficult. There is a continuing need for methods and apparatus to aid in such analysis.

DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

The skilled artisan will understand that the drawings, described herein, are for illustration purposes only. The drawings are not intended to limit the scope of the present teachings in any way.

FIG. 1 is a flowchart illustrating the use of a database system according to some embodiments.

FIG. 2 is a flowchart illustrating a process for determining bias.

FIG. 3 is a graph exemplifying a comparison of amplification with IVT and multiplex preamplification.

FIG. 4 is a graph exemplifying a ΔΔC_T comparison of a brain and a liver sample.

FIG. 5 is a graph exemplifying a ΔΔC_T comparison between four different sample inputs.

FIG. 6 is a graph exemplifying a ΔΔΔC_T of a liver and a brain sample with IVT preamplification.

FIG. 7 is a graph exemplifying a ΔΔΔC_T of a liver and a brain sample with multiplex preamplification.

FIG. 8 is a flowchart illustrating a process for determining bias between two gene expression platforms.

DESCRIPTION OF VARIOUS EMBODIMENTS

The following description of various embodiments is merely exemplary in nature and is in no way intended to limit the present teachings, applications, or uses. Although the present teachings will be discussed in various embodiments as relating to polynucleotide amplification, such as PCR, such discussion should not be regarded as limiting the present teachings to only such applications.

In general, gene expression is a process by which a gene's coded information is converted into the structures present and operating in the cell. Gene expression is a multi-step process that begins with transcription and translation and is followed by folding, post-translational modification and targeting. The amount of protein that a cell expresses depends on the tissue, the developmental stage of the organism and the metabolic or physiologic state of the cell. Expressed genes can include those that are transcribed into mRNA and then translated into protein and those that are transcribed into RNA but not translated into protein. In various embodiments, gene expression can be studied using analytical techniques such as polymerase chain reaction (PCR), Northern blots, serial analysis of gene expression (SAGE), microarrays, hybridization arrays, and high density oligonucleotide arrays.

Briefly, by way of background, PCR can be used to amplify a sample of target deoxyribonucleic acid (DNA) for analysis. Typically, the PCR reaction involves copying the strands of the target DNA and then using the copies to generate additional copies in subsequent cycles. Ideally, each cycle doubles the amount of the target DNA present, thereby resulting in a geometric progression in the number of copies of the target DNA. The temperature of a double-stranded target DNA is elevated to denature the DNA, and the temperature is then reduced to anneal at least one primer to each strand of the denatured target DNA. In some embodiments, the target DNA can be a cDNA. In some embodiments, primers are used as a pair (a forward primer and a reverse primer), which can be referred to as a primer pair or primer set. In some embodiments, the primer set comprises a 5′ upstream primer that can bind with the 5′ end of one strand of the denatured target DNA and a 3′ downstream primer that can bind with the 3′ end of the other strand of the denatured target DNA. Once a given primer binds to a strand of the denatured target DNA, the primer can be extended by the action of a polymerase. In some embodiments, the polymerase can be a thermostable DNA polymerase, for example, a Taq polymerase. The product of this extension, which is sometimes referred to as an amplicon, can then be denatured from the resultant strands and the process can be repeated. Temperatures suitable for carrying out the reactions are well known in the art. Certain basic principles of PCR are set forth in U.S. Pat. Nos. 4,683,195, 4,683,202, 4,800,159, and 4,965,188, each issued to Mullis et al.
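In some embodiments, the geometric progression of PCR can be illustrated with a short Python sketch. This is a minimal illustration rather than part of the original teachings: it assumes a constant per-cycle efficiency E (1.0 meaning perfect doubling), so that the copy number after n cycles is N_n = N_0(1 + E)^n, and the function names are hypothetical.

```python
import math


def copies_after_cycles(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` rounds of PCR.

    Assumes a constant per-cycle efficiency: 1.0 means perfect doubling,
    0.9 means each cycle multiplies the copy number by 1.9.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_for_fold(fold: float, efficiency: float = 1.0) -> float:
    """Number of cycles needed to reach a given fold amplification."""
    return math.log(fold) / math.log(1.0 + efficiency)


print(copies_after_cycles(100, 10))     # 102400.0 (2**10 = 1024, i.e. about 1000-fold)
print(round(cycles_for_fold(1000), 2))  # 9.97
```

Under these assumptions, a boost of about 100- to 1000-fold corresponds to roughly 7 to 10 cycles at perfect efficiency.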

In some embodiments, PCR can be conducted under conditions allowing for quantitative and/or qualitative analysis of one or more target DNA. Accordingly, detection probes can be used for detecting the presence of the target DNA in an assay. In some embodiments, the detection probes can comprise physical (e.g., fluorescent) or chemical properties that change upon binding of the detection probe to the target DNA. Some embodiments of the present teaching can provide real time fluorescence-based detection and analysis of amplicons as described, for example, in PCT Publication No. WO 95/30139 and U.S. patent application Ser. No. 08/235,411.

In some embodiments, the assay can be a homogeneous polynucleotide amplification assay for coupled amplification and detection, wherein the process of amplification generates a detectable signal and the need for subsequent sample handling and manipulation to detect the amplified product is minimized or eliminated. Homogeneous assays can provide for amplification that is detectable without opening a sealed well or performing further processing steps once amplification is initiated. Such homogeneous assays can be suitable for use in conjunction with detection probes. For example, in some embodiments, an oligonucleotide detection probe specific for a particular target DNA can be included in an amplification reaction in addition to a DNA binding agent of the present teachings. Homogeneous assays useful herein are described, for example, in commonly assigned U.S. Pat. No. 6,814,934.

In some embodiments, methods are provided for detecting a plurality of targets. Such methods include those comprising forming an initial mixture comprising an analyte sample suspected of comprising the plurality of targets, a polymerase, and a plurality of primer sets. In some embodiments, each primer set comprises a forward primer and a reverse primer and at least one detection probe unique for one of the plurality of primer sets. In some embodiments, the initial mixture can be formed under conditions in which one primer elongates if hybridized to a target.

In some embodiments for amplification of a polynucleotide, the assay can comprise a preamplification product, wherein one or more polynucleotides in an analyte have been amplified prior to being deposited in at least one of the plurality of wells. In some embodiments, these methods can further comprise forming a plurality of preamplification products by subjecting an initial analyte comprising a plurality of polynucleotides to at least one cycle of PCR to form a detection mixture comprising a plurality of preamplification products. The detection mixture of preamplification products can then be used for further amplification using PCR. In some embodiments, preamplification comprises the use of isothermal methods.

In some embodiments, a two-step multiplex amplification reaction can be performed wherein the first step truncates a standard multiplex amplification round to boost the copy number of the DNA target by about 100- to 1000-fold or more. Following the first step, the resulting product can be divided into optimized secondary single amplification reactions, each containing one or more of the primer sets that were used previously in the first, or multiplexed booster, step. The booster step can occur, for example, using an aqueous target or using a solid phase archived nucleic acid. See, for example, U.S. Pat. No. 6,605,452 to Marmaro.

In some embodiments, preamplification methods can employ in vitro transcription (IVT), comprising amplifying at least one sequence in a collection of nucleic acid sequences. The processes can comprise synthesizing a nucleic acid by hybridizing a primer complex to the sequence and extending the primer to form a first strand complementary to the sequence and a second strand complementary to the first strand. The primer complex can comprise a primer complementary to the sequence and a promoter region in anti-sense orientation with respect to the sequence. Copies of anti-sense RNA can be transcribed off the second strand. The promoter region, which can be single or double stranded, can be capable of inducing transcription from an operably linked DNA sequence in the presence of ribonucleotides and an RNA polymerase under suitable conditions. Suitable promoter regions can be derived from prokaryotic viruses, such as T3 or T7 bacteriophage. In some embodiments, the primer can be a single-stranded oligonucleotide of sufficient length to prime the synthesis of extension products under suitable conditions and can be poly(T) or a collection of degenerate sequences. In some embodiments, the methods involve incorporating an RNA polymerase promoter into selected cDNA molecules by priming cDNA synthesis with a primer complex comprising a synthetic oligonucleotide containing the promoter. Following synthesis of double-stranded cDNA, a polymerase generally specific for the promoter can be added, and anti-sense RNA can be transcribed from the cDNA template. The progressive synthesis of multiple RNA molecules from a single cDNA template results in amplified anti-sense RNA (aRNA) that serves as starting material for cloning procedures using random primers. The amplification, which will typically be at least about 20- to 40-fold, often 50- to 100- or 250-fold, but can be 500- to 1000-fold or more, can be achieved from nanogram quantities or less of cDNA.

In some embodiments, a two-stage preamplification method can be used. In the first stage, the assay can be preamplified in one vessel by IVT; for example, this preamplification stage can be 100× sample. In the second stage, the preamplified product can be divided into aliquots and further preamplified by PCR; for example, this preamplification stage can be 16,000× sample or more.

In some embodiments, the preamplification can be a multiplex preamplification, wherein the analyte sample can be divided into a plurality of aliquots. Each aliquot can then be subjected to preamplification using a plurality of primer sets for DNA targets. In some embodiments, the primer sets in at least some of the plurality of aliquots differ from the primer sets in the remaining aliquots. Each resulting preamplification product detection mixture can then be dispersed into at least some of a plurality of wells of a microplate comprising an assay having corresponding primer sets and detection probes for further amplification and detection according to the methods described herein. In some embodiments, the primer sets of the assay in each of the plurality of wells can correspond to the primer sets used in making the preamplification product detection mixture. The resulting assay in each of the plurality of wells thus can comprise a preamplification product, together with primer sets and detection probes for amplification of DNA targets, which, if present in the analyte sample, have been preamplified.

Since a plurality of different sequences can be amplified simultaneously in a single reaction, multiplex preamplification can be used in a variety of contexts to effectively increase the concentration or quantity of a sample available for downstream analysis and/or assays. In some embodiments, because of the increased concentration or quantity of target DNA, significantly more analyses can be performed with multiplex amplified samples than with the original sample. In many embodiments, multiplex amplification further permits analyses that require more sample, or a higher concentration of sample, than was originally available. In such embodiments, multiplex amplification enables downstream assays that would not have been possible with the original sample due to its limited quantity. In some embodiments, the plurality of aliquots can comprise 16 aliquots, each comprising about 1536 primer sets. In such embodiments, a sample comprising a whole genome for a species, for example a human genome, can be preamplified. In some embodiments, the plurality of aliquots can comprise more than 16 aliquots. In some embodiments, the number of primer sets can be greater than 1536 primer sets. In some embodiments, the plurality of aliquots can comprise fewer than 16 aliquots and the number of primer sets can be greater than 1536 primer sets. For examples of such embodiments, see PCT Publication No. WO 2004/051218 to Andersen and Ruff.
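As a rough illustration of the aliquot arithmetic, 16 aliquots of about 1536 primer sets each cover about 24,576 primer sets. The Python sketch below is hypothetical and is not drawn from the cited publication: it simply splits a list of primer-set identifiers into consecutive pools of at most 1536, assumes identifiers are plain strings, and ignores primer-primer interactions that a real pool design would need to consider.

```python
from typing import List, Sequence


def pool_primer_sets(primer_set_ids: Sequence[str], pool_size: int = 1536) -> List[List[str]]:
    """Split primer-set identifiers into consecutive pools of at most pool_size.

    Each pool corresponds to one aliquot of the analyte sample. The grouping
    is purely positional; no primer-compatibility checks are performed.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return [list(primer_set_ids[i:i + pool_size])
            for i in range(0, len(primer_set_ids), pool_size)]


# Hypothetical identifiers for 20,000 target assays.
ids = [f"assay_{n:05d}" for n in range(20000)]
pools = pool_primer_sets(ids)
print(len(pools), 16 * 1536)  # 14 24576
```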

In some embodiments, multiplex methods are provided wherein the assay comprises a first universal primer that binds to a complement of a first target, a second universal primer that binds to a complement of a second target, a first detection probe comprising a sequence that binds to a sequence comprised by the first target, and a second detection probe comprising a sequence that binds to a sequence comprised by the second target. In some embodiments, at least some of a plurality of wells of a microplate comprise a solution operable to perform multiplex PCR. The first and second detection probes can comprise different labels, for example, different fluorophores such as, by way of non-limiting example, VIC and FAM. Sequences of the first and second detection probes can differ by as little as one nucleotide, two nucleotides, three nucleotides, four nucleotides, or more, provided that hybridization occurs under conditions that allow each detection probe to hybridize specifically to its corresponding target.

In some embodiments, multiplex PCR can be used for relative quantification, where one primer set and detection probe amplify the target DNA and another primer set and detection probe amplify an endogenous reference. In some embodiments, the present teachings provide for analysis of at least four DNA targets in each of a plurality of wells and/or analysis of a plurality of DNA targets and a reference in each of a plurality of wells.

In some embodiments, as seen in FIG. 1, a plurality of microplates filled with assay can be analyzed as described herein with a sequence detection system, such as a PCR system, to generate data. In some embodiments, this data can be stored in a gene expression analysis system database 736. Software can then be used to generate gene expression analysis information 738.

In some embodiments, a gene expression analysis system can utilize computer software that organizes analysis sessions into studies and stores them in database 736. An analysis session can comprise the results of running a microplate in the sequence detection system. To analyze session data, one can load an existing study that contains analysis session data or create a new study and attach analysis session data to it. Studies can be opened and reexamined an unlimited number of times to reanalyze the analysis session data or to add other analysis sessions to the analysis.

In some embodiments, gene expression analysis system database 736 stores the analyzed data for each microplate run on the sequence detection system as an analysis session. The software can identify each analysis session by the indicia of the associated microplate and the date on which the session was created. Once analysis sessions have been assigned to a study, various functions can be performed. These functions include, but are not limited to, designating replicates, removing outliers, filtering data out of a particular view or report, correcting preamplification values using stored values, and computing gene expression values.
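A minimal Python sketch of how studies and analysis sessions might be organized in software is shown below. The class names, fields, and the outlier rule (dropping replicates more than 0.5 cycles from the median C_T) are assumptions for illustration only; the present description does not specify a data model or an outlier criterion.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import median
from typing import Dict, List


@dataclass
class AnalysisSession:
    """Results of one microplate run (hypothetical structure)."""
    plate_id: str                      # indicia of the associated microplate
    created: date                      # date the session was created
    ct_values: Dict[str, List[float]]  # target name -> replicate C_T values


@dataclass
class Study:
    """A named collection of analysis sessions that can be reanalyzed."""
    name: str
    sessions: List[AnalysisSession] = field(default_factory=list)

    def attach(self, session: AnalysisSession) -> None:
        self.sessions.append(session)

    def replicate_cts(self, target: str, max_dev: float = 0.5) -> List[float]:
        """Pool replicate C_T values for a target across all sessions,
        dropping replicates more than max_dev cycles from the median
        (an assumed outlier rule)."""
        cts = [ct for s in self.sessions for ct in s.ct_values.get(target, [])]
        if not cts:
            return []
        m = median(cts)
        return [ct for ct in cts if abs(ct - m) <= max_dev]


study = Study("liver_vs_brain")
study.attach(AnalysisSession("plate_001", date(2005, 3, 22), {"GAPDH": [18.1, 18.2, 21.0]}))
print(study.replicate_cts("GAPDH"))  # [18.1, 18.2]
```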

In various embodiments, real time PCR is adapted to perform quantitative real time PCR (qRT-PCR). In various embodiments, two different methods of analyzing data from qRT-PCR experiments can be used: absolute quantification and relative quantification. In some embodiments, absolute quantification can determine an input copy number of the target DNA of interest. This can be accomplished by relating a signal from a detection probe to a standard curve. In various embodiments, relative quantification can describe the change in expression of the target DNA relative to a reference or a group of references such as, for example, an untreated control, an endogenous control, a passive internal reference, a universal reference RNA, or a sample at time zero in a time course study. When using absolute quantification, the expression of the target DNA can be compared across many samples, for example, from different individuals, from different tissues, from multiple replicates, and/or serial dilutions of standards in one or more matrices. In various embodiments of the present teachings, qRT-PCR can be performed using relative quantification, in which case a standard curve is not required. Relative quantification can compare the changes in steady state target DNA levels of two or more genes to each other, with one of the genes acting as an endogenous reference that can be used to normalize a signal from a sample gene. In various embodiments, in order to compare between experiments, the fold differences resulting from normalizing the sample to the reference can be expressed relative to a calibrator sample. In some embodiments, the calibrator sample is included in each assay 1000. The gene expression analysis system can determine the amount of target DNA, normalized to a reference, by determining:
$$\Delta C_T = C_{T,q} - C_{T,\mathrm{endo}}$$
where $C_T$ is the threshold cycle for detection of a fluorophore in real time PCR; $C_{T,q}$ is the threshold cycle for detection of a fluorophore for a target DNA in assay 1000; and $C_{T,\mathrm{endo}}$ is the threshold cycle for detection of a fluorophore for an endogenous reference or a passive internal reference in the assay.
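By way of illustration only, ΔC_T might be computed in Python as follows. The sketch assumes that replicate wells are combined by taking their mean C_T, which is an illustrative choice rather than a requirement of the present teachings; the function name and values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_cts: Sequence[float], endo_cts: Sequence[float]) -> float:
    """Delta C_T = C_T,q - C_T,endo, using the mean of the replicate wells."""
    return mean(target_cts) - mean(endo_cts)


print(round(delta_ct([24.0, 24.2], [18.0, 18.2]), 2))  # 6.0
```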

In some embodiments, a gene expression analysis system can determine the amount of target DNA, normalized to a reference and relative to a calibrator, by determining:
$$\Delta\Delta C_T = \Delta C_{T,q} - \Delta C_{T,\mathrm{cb}}$$

where $C_{T,q}$ is the threshold cycle for detection of a fluorophore for the target DNA in assay 1000; $C_{T,\mathrm{cb}}$ is the threshold cycle for detection of a fluorophore for a calibrator sample; $\Delta C_{T,q}$ is the difference in threshold cycles between the target DNA and an endogenous reference; and $\Delta C_{T,\mathrm{cb}}$ is the difference in threshold cycles between the calibrator sample and the endogenous reference. Once ΔΔC_T is determined, the relative quantity of the target DNA can be computed as $2^{-\Delta\Delta C_T}$. In various embodiments, ΔΔC_T can be about zero. In some embodiments, ΔΔC_T can be within ±1. In various embodiments, the above calculations can be adapted for use in multiplex PCR (see, for example, Livak et al., Applied Biosystems User Bulletin #2, updated October 2001, and Livak and Schmittgen, Methods (25) 402-408 (2001)).
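The comparative C_T calculation can be sketched in Python as follows. Like the 2^−ΔΔC_T relationship itself, the sketch assumes that the target and the reference amplify with approximately 100% efficiency; the function names and C_T values are hypothetical.

```python
def delta_delta_ct(ct_target_sample: float, ct_endo_sample: float,
                   ct_target_cal: float, ct_endo_cal: float) -> float:
    """Delta-delta C_T = Delta C_T,q - Delta C_T,cb (comparative C_T method)."""
    delta_q = ct_target_sample - ct_endo_sample  # target vs. reference in the sample
    delta_cb = ct_target_cal - ct_endo_cal       # target vs. reference in the calibrator
    return delta_q - delta_cb


def relative_quantity(ddct: float) -> float:
    """Fold change of the target in the sample relative to the calibrator,
    assuming ~100% amplification efficiency for target and reference."""
    return 2.0 ** (-ddct)


ddct = delta_delta_ct(22.0, 18.0, 25.0, 18.0)  # Delta C_T,q = 4, Delta C_T,cb = 7
print(ddct, relative_quantity(ddct))           # -3.0 8.0
```

A negative ΔΔC_T means the target is detected earlier in the sample than in the calibrator, that is, it is more abundant.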

In some embodiments, the assay can be preamplified, as discussed herein, in order to increase the amount of target DNA prior to distribution into a plurality of wells of a microplate. In some embodiments, the sample can be collected, for example, via a needle biopsy, which typically yields a small amount of sample. Distributing this sample across a large number of wells can result in variances in sample distribution that can affect the accuracy of subsequent gene expression computations. In such situations, the sample can be preamplified using, for example, a pooled primer set to increase the number of copies of all target DNA simultaneously.

In various embodiments, preamplification processes can be non-biased, such that all target DNA are amplified similarly and to about the same extent. In such embodiments, each target DNA can be amplified reproducibly from one input sample to the next. For example, if target DNA X is initially present in sample A at 100 target molecules, then after 10 cycles of PCR amplification (about 1000-fold), 100,000 target molecules should be present. Continuing the example, if target DNA X is initially present in sample B at 500 target molecules, then after 10 cycles of PCR amplification (about 1000-fold), 500,000 target molecules should be present. In this example, the ratio of target DNA X in samples A/B (1:5) remains constant before and after the amplification procedure.

In various embodiments, a minor proportion of all target DNA can have an observed preamplification efficiency of less than 100%. In such embodiments, if the amplification bias is reproducible and consistent from one input sample to another, then relative quantities between any two samples containing different amounts of target can still be computed accurately. Continuing the example from above and assuming a reproducible amplification efficiency of 50%, if target DNA X is initially present in sample A at 100 target molecules, then after 10 cycles of PCR amplification (50% of 1000-fold), 50,000 target molecules should be present. Further continuing the example, if target DNA X is initially present in sample B at 500 target molecules, then after 10 cycles of PCR amplification (50% of 1000-fold), 250,000 target molecules should be present. In this example, the ratio of target DNA X in samples A/B remains constant before and after the amplification procedure and is the same as in the 100% efficiency scenario.
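The worked example of reproducible and non-reproducible efficiency can be checked with a few lines of Python. The sketch is hypothetical and models efficiency as a simple multiplier on the nominal 1000-fold boost, matching the "50% of 1000-fold" convention used in the example; the last line shows that the A/B ratio is distorted only when the efficiency differs between samples.

```python
def preamplify(copies: float, fold: float = 1000.0, efficiency: float = 1.0) -> float:
    """Copies after preamplification, with the nominal fold scaled by a fixed
    efficiency factor (the '50% of 1000-fold' convention)."""
    return copies * fold * efficiency


for eff in (1.0, 0.5):
    a = preamplify(100, efficiency=eff)  # sample A: 100 starting molecules
    b = preamplify(500, efficiency=eff)  # sample B: 500 starting molecules
    print(eff, a, b, a / b)
# 1.0 100000.0 500000.0 0.2
# 0.5 50000.0 250000.0 0.2

# Bias that is NOT reproducible between samples changes the ratio.
print(preamplify(100, efficiency=0.5) / preamplify(500, efficiency=1.0))  # 0.1
```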

In various embodiments, whether each target DNA (x, y, z, etc.) is amplified without bias can be determined by calculating the difference between the C_T value of the target DNA and the C_T value of a selected endogenous reference; this difference is referred to as the ΔC_T value for each given target DNA, as described above. In various embodiments, the reference for a bias calculation can be non-preamplified, amplified target DNA and the experimental sample can be preamplified, amplified target DNA. In some embodiments, the reference sample and the experimental sample can originate from the same source, for example, the same tissue, the same individual and/or the same species. In various embodiments, comparing the ΔC_T values of the non-preamplified amplified target DNA and the preamplified amplified target DNA can provide a measure of the bias of the preamplification process between the endogenous reference and the target DNA (x, y, z, etc.).

In various embodiments, the difference between the two ΔC_T values (ΔΔC_T) can be zero, in which case there is no bias from preamplification. This is explained in greater detail below with reference to FIG. 2. In some embodiments, the gene expression analysis system can be calibrated for potential differences in preamplification efficiency that can arise from a variety of sources, such as the effects of multiple primer sets in the same reaction. In some embodiments, calibration can be performed by computing a reference number that reflects preamplification bias. Similar reference numbers for a given target DNA across different samples indicate that the ΔC_T values from the preamplification reaction can be used to achieve reliable gene expression computations.

In various embodiments of the present teachings, a gene expression analysis system can compute these reference numbers by collecting a sample (designated as Sample A (S_A)) and processing it with one or more protocols. A first protocol comprises running individual PCR gene expression reactions for each target DNA (T_x) relative to an endogenous reference (endo), such as, for example, 18S or GAPDH. These reactions can yield cycle threshold values for each target DNA relative to the endogenous control, as computed by:
$$\Delta C_{T,\text{not preamplified}}(T_x, S_A) = C_{T,\text{not preamplified}}(T_x, S_A) - C_{T,\text{not preamplified}}(\text{endo}, S_A)$$

A second protocol can comprise running a single PCR preamplification step on the sample with, for example, a pooled primer set. In various embodiments, the pooled primer set can contain primers for each target DNA. Subsequently, the preamplified product can be distributed among a plurality of wells of a microplate. PCR gene expression reactions can be run for each preamplified target DNA (T_x) relative to an endogenous reference (endo). These reactions can yield cycle threshold values for each preamplified target DNA relative to the endogenous control, as computed by:
$$\Delta C_{T,\text{preamplified}}(T_x, S_A) = C_{T,\text{preamplified}}(T_x, S_A) - C_{T,\text{preamplified}}(\text{endo}, S_A)$$
The difference between the not-preamplified and preamplified ΔC_T values for T_x in S_A can be computed by:
$$\Delta\Delta C_T(T_x, S_A) = \Delta C_{T,\text{not preamplified}}(T_x, S_A) - \Delta C_{T,\text{preamplified}}(T_x, S_A)$$
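A minimal Python sketch of the two-protocol bias calculation is shown below. It assumes one C_T value per target per protocol (for example, a replicate mean) and uses GAPDH as the endogenous reference, which must be present in both inputs; the gene names and C_T values are made up for illustration and are not experimental data.

```python
from typing import Dict


def preamp_bias(ct_not_preamp: Dict[str, float], ct_preamp: Dict[str, float],
                endo: str = "GAPDH") -> Dict[str, float]:
    """Per-target bias: Delta C_T (not preamplified) - Delta C_T (preamplified).

    Each Delta C_T is taken relative to the endogenous reference `endo`
    measured under the same protocol. Targets missing from either protocol
    are skipped.
    """
    bias = {}
    for target in ct_not_preamp:
        if target == endo or target not in ct_preamp:
            continue
        d_not = ct_not_preamp[target] - ct_not_preamp[endo]
        d_pre = ct_preamp[target] - ct_preamp[endo]
        bias[target] = d_not - d_pre
    return bias


not_pre = {"GAPDH": 18.0, "geneX": 26.0, "geneY": 28.0}  # protocol 1
pre = {"GAPDH": 8.0, "geneX": 16.0, "geneY": 20.0}       # protocol 2
print(preamp_bias(not_pre, pre))  # {'geneX': 0.0, 'geneY': -2.0}
```

Here geneX shows no preamplification bias, while the negative value for geneY indicates it was preamplified less efficiently than the reference.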

In various embodiments, a value of ΔΔC_T(T_x, S_A) at or close to zero can indicate that there is no bias in the preamplification of target DNA T_x. In various embodiments, a negative ΔΔC_T(T_x, S_A) value can indicate that the preamplification process was less than 100% efficient for a given target DNA (T_x). For example, when using an IVT preamplification process, the percentage of target DNA with a ΔΔC_T within ±1 C_T of zero can be ~50%, as shown in FIG. 3. In another example, when using a multiplex preamplification process, the percentage of target DNA with a ΔΔC_T within ±1 C_T of zero can be ~90%, as shown in FIG. 4.
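The ±1 C_T summary statistic might be computed in Python as follows. The function name and input values are hypothetical, and the tolerance is left as a parameter so that other cutoffs can be tried.

```python
from typing import Iterable


def fraction_unbiased(ddct_values: Iterable[float], tolerance: float = 1.0) -> float:
    """Fraction of targets whose delta-delta C_T lies within +/- tolerance C_T of zero."""
    values = list(ddct_values)
    if not values:
        raise ValueError("no delta-delta C_T values supplied")
    return sum(abs(v) <= tolerance for v in values) / len(values)


print(fraction_unbiased([0.2, -0.4, -1.8, 0.9, -3.1]))  # 0.6
```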

In various embodiments, the amplification efficiency can be less than 100% for a particular target DNA, in which case ΔΔC_T is less than zero for that target DNA. An example, shown in FIG. 5, is an evaluation of ΔΔC_T values for a group of target DNA from a 1536-plex multiplex preamplification process using four different human sample input sources: liver, lung, brain and a universal reference tissue composite. In this example, most ΔΔC_T values are near zero; some of the target DNA have negative ΔΔC_T values, but these negative values are reproducible from one sample input source to another. In various embodiments, a gene expression analysis system can determine whether a bias exists for target DNA analyzed for different sample inputs.

In various embodiments of the present teachings, a gene expression analysis system can use ΔΔC_T values computed for the same target DNA but in different samples (Sample A (S_A) and Sample B (S_B)) to determine the accuracy of subsequent relative expression computations, as follows:
$$\Delta\Delta\Delta C_T(T_x) = \Delta\Delta C_T(T_x, S_A) - \Delta\Delta C_T(T_x, S_B)$$

In various embodiments, a value of ΔΔΔC_T(T_x) at or reasonably close to zero can indicate that the preamplified ΔC_T values for T_x (ΔC_T,preamplified(T_x, S_A) and ΔC_T,preamplified(T_x, S_B)) can be used for relative gene expression computation between different samples via a standard relative gene expression calculation.
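The cross-sample check can be sketched in Python as follows. The text only requires ΔΔΔC_T to be "reasonably close to zero"; the 0.5-cycle tolerance used here is an assumption, and the tissue labels and values are illustrative, not measured.

```python
from typing import Dict, List


def triple_delta(bias_a: Dict[str, float], bias_b: Dict[str, float]) -> Dict[str, float]:
    """Triple-delta C_T(T_x) = DDC_T(T_x, S_A) - DDC_T(T_x, S_B) for targets in both samples."""
    return {t: bias_a[t] - bias_b[t] for t in bias_a.keys() & bias_b.keys()}


def consistent_targets(dddct: Dict[str, float], tolerance: float = 0.5) -> List[str]:
    """Targets whose preamplification bias is reproducible across the two samples."""
    return sorted(t for t, v in dddct.items() if abs(v) <= tolerance)


liver = {"geneX": 0.1, "geneY": -2.0, "geneZ": -0.3}  # DDC_T per target, sample A
brain = {"geneX": -0.1, "geneY": -1.8, "geneZ": 1.2}  # DDC_T per target, sample B
d = triple_delta(liver, brain)
print(consistent_targets(d))  # ['geneX', 'geneY']
```

Note that geneY passes even though it is biased in both tissues, because the bias is reproducible; only geneZ, whose bias differs between samples, is excluded.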

In some embodiments, a standard relative gene expression calculation can determine the amount of the target DNA. In some embodiments, a standard relative gene expression calculation employs a comparative C_T method. In various embodiments, the above methods can be practiced during experimental design, and once the conditions have been optimized so that ΔΔΔC_T(T_x) is reasonably close to zero, subsequent experiments only require computation of the ΔC_T values for the preamplified reactions. In various embodiments, ΔΔC_T(T_x, S_A) values can be stored in a database or other storage medium. In such embodiments, these values can then be used to convert preamplified ΔC_T values to not-preamplified ΔC_T values, so that preamplified values for T_x in any sample S_y can be mapped back to a common domain. In various embodiments, the not-preamplified domain can be calculated using other gene expression instrument platforms such as, for example, a microarray. In various embodiments, the ΔΔC_T(T_x, S_A) values need not be stored for all different sample source inputs (S_A) if it can be shown that ΔΔC_T(T_x) is reasonably consistent across different sample source inputs. For example, the distributions of ΔΔΔC_T for two different sample inputs (liver and brain) are shown in FIG. 6 (IVT preamplification) and FIG. 7 (multiplex preamplification).
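Because ΔΔC_T(T_x, S_A) is defined as the not-preamplified ΔC_T minus the preamplified ΔC_T, a stored bias value can be added back to a new preamplified ΔC_T to estimate the not-preamplified value. The Python sketch below assumes the stored bias is reasonably consistent across sample inputs, as discussed above; the names and numbers are hypothetical.

```python
from typing import Dict


def to_not_preamplified(delta_ct_preamp: Dict[str, float],
                        stored_bias: Dict[str, float]) -> Dict[str, float]:
    """Rearrange DDC_T = DC_T,not - DC_T,pre to recover DC_T,not preamplified.

    Targets without a stored bias value are skipped rather than guessed.
    """
    return {t: dct + stored_bias[t]
            for t, dct in delta_ct_preamp.items() if t in stored_bias}


stored = {"geneX": 0.0, "geneY": -2.0}                   # from a calibration run
new_sample = {"geneX": 7.5, "geneY": 11.0, "geneQ": 9.0}  # preamplified Delta C_T values
print(to_not_preamplified(new_sample, stored))            # {'geneX': 7.5, 'geneY': 9.0}
```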

In various embodiments, gene expression can be assessed with microarray technology, which can provide a measure of the cellular concentration of different mRNAs. In some embodiments, a microarray can be a piece of glass or plastic on which single-stranded pieces of DNA are affixed in a microscopic array as probes. In some embodiments, thousands of identical probes can be affixed at each point in the array, which can make each point an effective detector.

Typically, arrays can be used to detect the presence of mRNAs that may have been transcribed from different genes and that encode different proteins. The RNA can be extracted from many cells, ideally from a single cell type, and then converted to cDNA. In various embodiments, the cDNA may be amplified in quantity by PCR. Fluorescent tags can be enzymatically incorporated into, or chemically attached to, strands of cDNA. In various embodiments, a cDNA molecule that contains a sequence complementary to one of the probes will hybridize via base pairing to the point at which the complementary probes are affixed. In such embodiments, that point on the array can then fluoresce when examined using a microarray scanner. In some embodiments, the intensity of the fluorescence can be proportional to the number of copies of a particular mRNA that were present and thus roughly indicates the activity or expression level of that gene.

In various embodiments, a microarray can be, for example, a cDNA array, a hybridization array, a DNA microchip, a high-density sequence oligonucleotide array, or the like. In various embodiments, a microarray can be available from a commercial source such as, for example, Applied Biosystems, Affymetrix, Agilent, Illumina, or Xeotron. In various embodiments, a microarray can be made by any number of technologies, including printing with fine-pointed pins onto glass slides, photolithography using pre-made masks, photolithography using dynamic micromirror devices, or ink-jet printing. The lack of standardization among microarrays can present an interoperability problem in bioinformatics, since it can limit the exchange of array data.

In various embodiments, microarray output data can be in a format of fluorescence intensity and, in other embodiments, in a format of chemiluminescence intensity. In various embodiments, an intensity value from microarray output data can be globally normalized. In some embodiments, fold difference values can be determined by subtracting background noise and normalizing the array signal intensity to yield a net sample intensity, then dividing the net experimental sample signal intensity by the net control sample signal intensity. In some embodiments, a control sample used to generate the control sample signal intensity can be, for example, Stratagene® Universal Human Reference (UHR) RNA or the like. In some embodiments, a fold difference can be converted to a log₂ value by the following equation:
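$$\log_2(\text{fold difference}) = 3.3\,\log_{10}\!\left(\frac{\text{net intensity}_{\text{sample 1}}}{\text{net intensity}_{\text{sample 2}}}\right)$$
where the factor 3.3 approximates 1/log₁₀(2) ≈ 3.32.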
In such embodiments, the microarray output data is in a ΔΔC_T format. In some embodiments, microarray output data can be converted into a ΔΔC_T format by the following equation:
$$R = \left(\tfrac{1}{2}\right)^{\Delta\Delta C_T}$$
where R is the resulting measurement from a microarray. Software for such calculations is available commercially, for example GeneSpring from Silicon Genetics. Other embodiments include converting microarray output data into a ΔΔC_T format using a Global Pattern Recognition (GPR) algorithm, which can convert intensity values generated from microarrays from linear values to logarithmic values and can use transformed intensity cutoffs to apply gene and normalizer filters. In such embodiments, GPR, a software algorithm for gene expression analysis, is available from The Jackson Laboratory. In various embodiments, microarray output data can be in a standard language or format such as MAGE-ML (microarray and gene expression markup language), MAML (microarray markup language), or MIAME (minimum information about a microarray experiment). In various embodiments, data in such standardized formats and languages can be converted to a ΔΔC_T format.
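
The intensity-to-ΔΔC_T path described above can be sketched as follows. This is a hedged illustration, not GeneSpring's or GPR's implementation: the median-based global normalization, the floor of 1.0 intensity unit after background subtraction, and all intensity values are assumptions chosen for the example.

```python
import math
import statistics


def net_intensities(raw, background):
    """Background-subtract one array, then globally normalize it to a median of 1.0.

    The floor of 1.0 intensity unit and median scaling are illustrative assumptions.
    """
    net = [max(x - background, 1.0) for x in raw]
    med = statistics.median(net)
    return [x / med for x in net]


def fold_differences(net_experimental, net_control):
    """Probe-by-probe ratio of net experimental to net control intensity."""
    return [e / c for e, c in zip(net_experimental, net_control)]


def log2_by_factor(ratio):
    """log2 via the 3.3 * log10 shortcut (3.3 approximates 1 / log10(2) ≈ 3.32)."""
    return 3.3 * math.log10(ratio)


def ratio_to_ddct(ratio):
    """Invert R = (1/2) ** ΔΔC_T, giving ΔΔC_T = -log2(R)."""
    return -math.log2(ratio)


# Hypothetical raw intensities for four probes on an experimental and a control array.
experimental = net_intensities([1200.0, 5400.0, 860.0, 3100.0], background=100.0)
control = net_intensities([1100.0, 1400.0, 900.0, 3000.0], background=100.0)

for r in fold_differences(experimental, control):
    print(f"R={r:.3f}  log2~{log2_by_factor(r):+.2f}  ΔΔC_T={ratio_to_ddct(r):+.2f}")
```

Under R = (½)^ΔΔC_T, the ΔΔC_T value is the negative of log₂(R), so an up-regulated probe (R > 1) maps to a negative ΔΔC_T, matching the real-time PCR convention in which higher expression gives a lower C_T.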

In various embodiments, once microarray output data is in a ΔΔC_T format, real-time PCR data can be directly compared to data from microarray platforms, as shown in FIG. 8. In various embodiments, a ΔΔΔC_T calculation can be a validation tool to confirm that relative quantitation data can be compared from one amplification/detection process to another. In various embodiments, a ΔΔΔC_T calculation can be a validation tool to confirm that relative quantitation data can be compared from one sample input source to another sample input source, for example, comparing a sample from liver to a sample from brain in the same individual. In various embodiments, a ΔΔΔC_T calculation can be a validation tool to confirm that relative quantitation data can be compared from one high-density sequence detection system to another. In various embodiments, a ΔΔΔC_T calculation can be a validation tool to confirm that relative quantitation data can be compared from one platform to another; for example, comparing data from real-time PCR to data from a hybridization array is especially valuable for cross-platform validation. In various embodiments, real-time PCR and hybridization array data can be directly compared. In various embodiments, a TaqMan ΔΔC_T can be compared to a microarray output converted to the ΔΔC_T format. In such embodiments, a resulting ΔΔΔC_T within ±1 C_T of zero can indicate with a high degree of confidence that the fold differences observed on the two platforms are correlated.
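
A cross-platform check along these lines might look like the sketch below. It assumes ΔΔΔC_T = ΔΔC_T(TaqMan) − ΔΔC_T(converted microarray) per gene and applies the ±1 C_T window from the text; the gene names and values are hypothetical.

```python
def delta3_ct(ddct_reference_platform: float, ddct_other_platform: float) -> float:
    """ΔΔΔC_T for one gene: difference between two platforms' ΔΔC_T values."""
    return ddct_reference_platform - ddct_other_platform


def concordance_report(paired, tolerance=1.0):
    """Flag genes whose cross-platform ΔΔΔC_T lies within ±tolerance C_T of zero.

    paired maps gene name -> (TaqMan ΔΔC_T, microarray ΔΔC_T after conversion).
    """
    return {
        gene: abs(delta3_ct(pcr, array)) <= tolerance
        for gene, (pcr, array) in paired.items()
    }


# Hypothetical gene names and ΔΔC_T values, for illustration only.
paired = {
    "GENE_A": (2.1, 1.6),
    "GENE_B": (-0.8, -0.5),
    "GENE_C": (3.4, 1.9),
}
report = concordance_report(paired)
print(report)
```

Keeping the tolerance as a parameter lets a stricter or looser window be tried without changing the comparison itself.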

In various embodiments, a correction, which can be a quantity added to a calculated or observed value to obtain the true value, may be used so that data generated on two different platforms can be used together in further calculations and analysis. Such embodiments allow for larger and sometimes more complete data sets to be used in gene expression studies. In some embodiments, the correction can be calculated from a resulting ΔΔΔC_T. In various embodiments, a correction can be a bias correction.
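
One way a correction might be derived from ΔΔΔC_T values and applied is sketched below. The choice of a single global correction equal to the median ΔΔΔC_T over genes measured on both platforms is an assumption made for illustration, as are all gene names and values; the correction is added to the other platform's ΔΔC_T values so they can be pooled with the reference platform's data.

```python
import statistics


def global_bias_correction(paired):
    """Median ΔΔΔC_T over genes measured on both platforms.

    paired: list of (reference-platform ΔΔC_T, other-platform ΔΔC_T) tuples.
    Using the median is an illustrative assumption, not a prescribed choice.
    """
    return statistics.median([ref - other for ref, other in paired])


def apply_correction(values, correction):
    """Add the correction to each other-platform ΔΔC_T so it can be pooled."""
    return {gene: v + correction for gene, v in values.items()}


# Hypothetical overlap genes (TaqMan ΔΔC_T, microarray ΔΔC_T).
overlap = [(2.1, 1.6), (-0.8, -0.5), (1.2, 0.9)]
correction = global_bias_correction(overlap)  # median of 0.5, -0.3, 0.3 -> about 0.3

# Hypothetical genes measured only on the microarray.
array_only = {"GENE_D": 0.7, "GENE_E": -1.2}
print(apply_correction(array_only, correction))
```

A median is less sensitive than a mean to a few discordant genes, but a per-gene or intensity-dependent correction may suit some data sets better.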

Claims

1. A method for determining bias across two domains comprising gene expression data, the method comprising:

providing a first domain and a second domain;

obtaining information indicative of a bias within the first domain;

obtaining information indicative of a bias within the second domain; and

using the information indicative of the bias within the first domain and the information indicative of the bias within the second domain to produce an indication of bias across the two domains.

2. The method according to claim 1, wherein the providing a first domain and a second domain further comprises providing at least one of the first domain and the second domain comprising information collected from a polynucleotide amplification instrument.

3. The method according to claim 2 wherein the polynucleotide amplification instrument is capable of running polymerase chain reactions.

4. The method according to claim 1 wherein the providing a first domain and a second domain further comprises providing at least one of the first domain and the second domain comprising information generated by gene expression analysis.

5. The method according to claim 4 wherein the providing at least one of the first domain and the second domain comprises data generated by gene expression analysis comprising generating gene expression data from a polynucleotide amplification instrument.

6. The method according to claim 4 wherein the providing at least one of the first domain and the second domain comprises data generated by gene expression analysis comprising generating gene expression data from a microarray.

7. The method according to claim 4 wherein the providing at least one of the first domain and the second domain comprises data generated by gene expression analysis comprising generating gene expression data from a hybridization chip.

8. The method according to claim 1 wherein the providing a first domain and a second domain further comprises providing information from a preamplified and amplified sample and providing information from an amplified sample.

9. The method according to claim 1 further comprising amplifying a polynucleotide sample.

10. The method according to claim 9 further comprising collecting information indicative of the amplifying a polynucleotide sample.

11. The method according to claim 10 further comprising organizing the information indicative of the amplifying a polynucleotide sample into the first domain.

12. The method according to claim 9 further comprising preamplifying a polynucleotide sample.

13. The method according to claim 12 further comprising collecting information indicative of the preamplifying a polynucleotide sample and amplifying a polynucleotide sample.

14. The method according to claim 13 further comprising organizing the information indicative of the preamplifying a polynucleotide sample and amplifying a polynucleotide sample into the first domain.

15. The method according to claim 14 wherein the obtaining information indicative of bias within the first domain comprises evaluating a bias between the amplifying a polynucleotide sample and the preamplifying a polynucleotide sample and amplifying a polynucleotide sample.

16. The method according to claim 15 wherein the obtaining information indicative of bias within the second domain comprises evaluating a bias between the amplifying a second polynucleotide sample and the preamplifying a second polynucleotide sample and amplifying a second polynucleotide sample.

17. The method according to claim 16 wherein the polynucleotide sample and the second polynucleotide sample are from different tissues.

18. The method according to claim 17 wherein the different tissues are from the same individual.

19. The method according to claim 1 wherein the obtaining information indicative of bias within the first domain further comprises performing a ΔΔCT calculation.

20. The method according to claim 1 wherein the obtaining information indicative of bias within the second domain further comprises performing a ΔΔCT calculation.

21. The method according to claim 1 wherein the using the information indicative of bias within the first domain and the information indicative of bias within the second domain to produce an indication of bias across the two domains further comprises performing a ΔΔΔCT calculation.

22. The method according to claim 1 wherein the obtaining information indicative of bias within the first domain comprises information from a first amplification device and the obtaining information indicative of bias within the second domain comprises information from a second amplification device.

23. The method according to claim 1 further comprising converting information from a microarray device into a ΔΔCT format.

24. The method according to claim 23 wherein the obtaining information indicative of bias within the first domain comprises information from a first amplification device and the obtaining information indicative of bias within the second domain comprises information from a microarray device.

25. The method according to claim 1 wherein the obtaining information indicative of bias within the first domain comprises information from a first tissue type and the obtaining information indicative of bias within the second domain comprises information from a second tissue type.

26. The method according to claim 1 wherein the obtaining information indicative of bias within the first domain comprises information from gene expression of a first individual and the obtaining information indicative of bias within the second domain comprises information from gene expression of a second individual.

27. The method according to claim 26 wherein the first individual and the second individual are from the same species.

28. The method according to claim 26 wherein the first individual and the second individual are from different species.

29. The method according to claim 1 further comprising applying a correction to at least one of the first domain and the second domain.

30. The method according to claim 29 wherein the applying a correction to at least one of the first domain and the second domain corrects for a bias across the first domain and the second domain.

31. The method according to claim 1 wherein at least one of the first domain and the second domain comprises genomic information.

32. A method for determining bias across two domains comprising gene expression data, the method comprising:

providing a first domain comprising preamplified gene expression data and non-preamplified gene expression data from a first sample input;

determining a bias between the preamplified gene expression data and the non-preamplified gene expression data of the first domain;

providing a second domain comprising preamplified gene expression data and non-preamplified gene expression data from a second sample input;

determining a bias between the preamplified gene expression data and the non-preamplified gene expression data of the second domain; and

using the bias of the first domain and the bias of the second domain to produce an indication of bias across the first domain and the second domain.

33. The method according to claim 32 further comprising amplifying the first sample input.

34. The method according to claim 33 further comprising collecting data from the amplifying the first sample input.

35. The method according to claim 33 further comprising adding a first reference to the first sample input.

36. The method according to claim 35 further comprising using the first reference to normalize data from the amplifying the first sample input.

37. The method according to claim 34 further comprising using a ΔCT calculation to normalize data from the amplifying the first sample input.

38. The method according to claim 32 further comprising preamplifying the first sample input.

39. The method according to claim 38 further comprising amplifying the first sample input.

40. The method according to claim 39 further comprising collecting data from the amplifying the first sample input.

41. The method according to claim 39 further comprising adding a first reference to the first sample input.

42. The method according to claim 41 further comprising using the first reference to normalize data from the preamplifying and amplifying the first sample input.

43. The method according to claim 41 further comprising using a ΔCT calculation to normalize data from the preamplifying and amplifying the first sample input.

44. The method according to claims 37 and 43 further comprising using a ΔΔCT calculation to determine bias.

45. The method according to claim 32 further comprising amplifying the second sample input.

46. The method according to claim 45 further comprising collecting data from the amplifying the second sample input.

47. The method according to claim 45 further comprising adding a second reference to the second sample input.

48. The method according to claim 47 further comprising using the second reference to normalize data from the amplifying the second sample input.

49. The method according to claim 46 further comprising using a ΔCT calculation to normalize data from the amplifying the second sample input.

50. The method according to claim 32 further comprising preamplifying the second sample input.

51. The method according to claim 50 further comprising amplifying the second sample input.

52. The method according to claim 51 further comprising collecting data from the amplifying the second sample input.

53. The method according to claim 50 further comprising adding a second reference to the second sample input.

54. The method according to claim 53 further comprising using the second reference to normalize data from the preamplifying and amplifying the second sample input.

55. The method according to claim 53 further comprising using a ΔCT calculation to normalize data from the preamplifying and amplifying the second sample input.

56. The method according to claims 49 and 55 further comprising using a ΔΔCT calculation to determine bias.

57. The method according to claim 32 wherein the using the bias of the first domain and the bias of the second domain to produce an indication of bias across the first domain and the second domain further comprises performing a ΔΔΔCT calculation.

58. A system for determining bias across two domains, the system comprising:

a first domain stored on a media;

a second domain stored on a media;

a first algorithm for obtaining information indicative of a bias within the first domain;

a second algorithm for obtaining information indicative of a bias within the second domain;

a third algorithm for using the information indicative of the bias within the first domain and the information indicative of the bias within the second domain to produce an indication of bias across the two domains; and

an output.

59. The system according to claim 58 further comprising at least one computer.

60. The system according to claim 58 further comprising at least one PCR device.

61. The system according to claim 58 further comprising at least one microarray device.

62. The system according to claim 58 further comprising a high-density sequence detection system.

63. The system according to claim 58 further comprising at least one sample.

64. The system according to claim 58 further comprising a graphical user interface.

65. The system according to claim 58 further comprising a network.

66. The system according to claim 65 wherein the first domain and the second domain are at different loci on the network.

67. The system according to claim 58 further comprising a hybridization chip device.

68. The system according to claim 58 wherein at least one of the first algorithm, the second algorithm, and the third algorithm is a comparative method.

69. The system according to claim 58 wherein at least one of the first algorithm and the second algorithm is a ΔΔCT calculation.

70. The system according to claim 58 wherein the third algorithm is a ΔΔΔCT calculation.

71. The system of claim 58 wherein at least one of the first domain and the second domain comprises polynucleotide information.

72. The system of claim 58 wherein at least one of the first domain and the second domain comprises gene expression information.

73. The system of claim 58 wherein at least one of the first domain and the second domain comprises polynucleotide amplification information.

74. The system of claim 58 wherein at least one of the first domain and the second domain comprises genomic information.

75. The system of claim 58 wherein the first domain comprises information on a first tissue type and the second domain comprises information on a second tissue type.

76. The system according to claim 58 wherein at least one of the first domain and the second domain comprises information from a preamplified and amplified sample and information from an amplified sample.

77. The system according to claim 63 wherein the at least one sample is analyzed for gene expression.

78. The system according to claim 58 further comprising a data bank of reference information.

79. The system according to claim 78 further comprising obtaining information indicative of a bias within at least one of the first domain and the second domain using the data bank of reference information.