Composition for the detection of blood cell and immunological response gene expression
The present invention relates to a composition comprising a plurality of polynucleotide probes. The composition can be used as hybridizable array elements in a microarray. The present invention also relates to a method for selecting polynucleotide probes for the composition.
 This application is a continuation of U.S. application Ser. No. 09/023,655, filed Feb. 9, 1998, the entire contents of which are hereby incorporated by reference.
 FIELD OF THE INVENTION
 The present invention relates to a composition comprising a plurality of polynucleotide probes for use in research and diagnostic applications.
 BACKGROUND OF THE INVENTION
 DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that predispose them to cancer. The array has over 50,000 DNA probes to analyze more than 400 distinct mutations of p53. A cytochrome p450 gene array is useful to determine whether individuals have one of a number of specific mutations that could result in increased drug metabolism, drug resistance or drug toxicity.
 DNA-based array technology is especially relevant for the rapid screening of expression of a large number of genes. There is a growing awareness that gene expression is affected in a global fashion. A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes. In some cases the interactions may be expected, such as where the genes are part of the same signaling pathway. In other cases, such as when the genes participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the expression of a large number of genes.
 cDNA-based arrays have been used in discovery and analysis of inflammatory disease related genes (Heller et al. (1997) Proc. Natl. Acad. Sci. USA 94: 2150-2155). A first type of array was employed to characterize the expression patterns of a class of 96 genes coding for polypeptides known to be involved in rheumatoid arthritis. This array contained preselected probes for the 96 genes. A second type of array was used to investigate gene expression patterns characteristic of blood cells. This array contained probes for 1,000 human genes randomly selected from a human blood cell cDNA library.
 Current cDNA-based arrays suffer from a variety of limitations. One is that the first type of array can only detect the expression patterns of a limited number of genes already associated with a disease; the expression of other, yet to be identified, relevant genes is not detected. Another is that the second type of array contains probes for genes that have very little to do with the regulation of inflammation. Also, high abundance genes are likely to be overrepresented and low abundance genes are likely to be underrepresented. The present invention provides a way to overcome such limitations.
 SUMMARY OF THE INVENTION
 In one aspect, the present invention provides a composition comprising a plurality of polynucleotide probes, wherein each of said polynucleotide probes comprises at least a portion of a gene implicated in blood cell biology. The plurality of polynucleotide probes can be selected from I) first polynucleotide probes, wherein each of said first polynucleotide probes comprises at least a portion of a gene differentially expressed in an immunological response; II) second polynucleotide probes, wherein each of said second polynucleotide probes comprises at least a portion of a gene abundantly expressed in an immunological response; III) third polynucleotide probes, wherein each of said third polynucleotide probes comprises at least a portion of a gene coding for a polypeptide known to regulate blood cell biology; and IV) combinations of first, second or third polynucleotide probes.
 Preferably, the plurality of polynucleotide probes comprises:
 I) first polynucleotide probes, wherein each of said first polynucleotide probes comprises at least a portion of a gene differentially expressed in an immunological response; II) second polynucleotide probes, wherein each of said second polynucleotide probes comprises at least a portion of a gene abundantly expressed in an immunological response; and III) third polynucleotide probes, wherein each of said third polynucleotide probes comprises at least a portion of a gene coding for a polypeptide known to regulate blood cell biology.
 Generally, first polynucleotide probes are selected by a) preparing at least one first target transcript profile from a first biological sample selected from the group consisting of hematopoietic cells and inflamed tissue and at least one first subtraction transcript profile from a noninflamed, nonhematopoietic biological sample; b) subtracting said first subtraction transcript profile from said first target profile to detect a plurality of genes that are differentially expressed in an immunological response; and c) identifying one of said detected genes that are differentially expressed in an immunological response. Second polynucleotide probes are selected by a) preparing at least one second target transcript profile from a second biological sample selected from the group consisting of hematopoietic cells and inflamed tissue to detect genes that are abundantly expressed in said second biological sample; and b) identifying one of said detected genes that are abundantly expressed. Third polynucleotide probes are selected by a third method comprising identifying a gene coding for a polypeptide with a known function in immunological responses.
 In one preferred embodiment, the composition comprises a plurality of polynucleotide probes, wherein each polynucleotide probe comprises at least a portion of a sequence selected from the group consisting of SEQ ID Nos:1-1508. In a second preferred embodiment, the composition comprises a plurality of polynucleotide probes comprising at least a portion of at least about 1000 of the sequences of SEQ ID Nos: 1-1508. In yet another embodiment, the composition comprises a plurality of polynucleotide probes wherein said polynucleotide probes comprise at least a portion of substantially all the sequences of SEQ ID Nos: 1-1508. The polynucleotide probes can be cDNAs, clone DNAs and the like.
 The composition is particularly useful as hybridizable array elements in a microarray for monitoring the expression of a plurality of target polynucleotides. The microarray comprises a substrate and the hybridizable array elements. The microarray can be used, for example, in the diagnosis and treatment of an immunopathology.
 In another aspect, the present invention provides an expression profile that can reflect the levels of a plurality of target polynucleotides in a sample. The expression profile comprises a microarray and a plurality of detectable complexes. Each detectable complex is formed by hybridization of at least one of said target polynucleotides to at least one of said polynucleotide probes and further comprises a labeling moiety for detection.
 In yet another aspect, the present invention provides a method for identifying a plurality of polynucleotide probes. The method comprises selecting I) first polynucleotide probes, wherein each of said first polynucleotide probes comprises at least a portion of a gene differentially expressed in an immunological response; II) second polynucleotide probes, wherein each of said second polynucleotide probes comprises at least a portion of a gene abundantly expressed in an immunological response; and III) third polynucleotide probes, wherein each of said third polynucleotide probes comprises at least a portion of a gene coding for a polypeptide known to regulate blood cell biology.
 DESCRIPTION OF THE SEQUENCE LISTING AND TABLES
 A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
 The Sequence Listing is a compilation of nucleotide sequences obtained by sequencing clone inserts (isolates) of different cDNA libraries. Each sequence is identified by a sequence identification number (SEQ ID No:), by the clone number from which it was obtained and by the cDNA library from which the sequence was obtained.
 In accordance with the requirements of 37 CFR §1.821-1.825, and by waiver of 37 CFR § 1.2 and in compliance with new rule 37 CFR §1.52(e), Applicants hereby submit (in duplicate) two (2) identical Compact Disk-Recordable (CD-R) discs, each containing the Sequence Listing for the application. These CD-Rs are provided in compliance with 37 CFR §1.821 et seq. and as permitted by 37 CFR §1.52(e). Each file recorded on the CD-R is formatted in plain ASCII text. CD-R 1 is identified as “Copy 1 of 2”, file name: pa00011con_seqlist, created Aug. 11, 2003 and is 3.25 MB in size. CD-R 2 is identified as “Copy 2 of 2” and is an exact copy of CD-R 1. CD-R 3 is identified as “Copy 3” and is also an exact copy of CD-R 1 in the computer readable form (the CRF) of the Sequence Listing. The contents of each CD-R identified above are incorporated by reference herein in their entirety.
 Table 1 is a list of the sequences according to the SEQ ID Nos. For SEQ ID Nos: 1-854 (homologous to polypeptide or nucleotide sequences found in GenBank, GenPept or BLOCKS databases), the first column contains Incyte clone numbers. The second column contains relevant GenBank identification numbers, if any. The last column contains an annotation associated with the referenced GenBank identification numbers, if any. For SEQ ID Nos: 855-1508 (exact matches to GenBank), the first column contains the GenBank identification number and the second column contains an annotation associated with the referenced GenBank identification number.
 Table 2 is a list of the cDNA libraries and a description of the preparation of the cDNA libraries.
 FIG. 1 is an image derived from an experiment where gene expression of untreated THP1 cells is investigated using a microarray comprising cDNA polynucleotide probes.
 FIG. 2 is an image derived from an experiment where gene expression of treated THP cells is investigated using a microarray comprising cDNA polynucleotides.
 DESCRIPTION OF THE INVENTION
 The term “microarray” refers to an ordered arrangement of hybridizable array elements. The array elements are arranged so that there are preferably at least one or more different array elements, more preferably at least 100 array elements, and most preferably at least 1,000 array elements, on a 1 cm² substrate surface. The maximum number of array elements is unlimited, but is at least 100,000 array elements. Furthermore, the hybridization signal from each of the array elements is individually distinguishable. In a preferred embodiment, the array elements comprise polynucleotide probes.
 A “polynucleotide” refers to a chain of nucleotides. Preferably, the chain has from about 100 to 10,000 nucleotides, more preferably from about 150 to 3,500 nucleotides. The term “probe” refers to a polynucleotide sequence capable of hybridizing with a target sequence to form a polynucleotide probe/target complex. A “target polynucleotide” refers to a chain of nucleotides to which a polynucleotide probe can hybridize by base pairing. In some instances, the sequences will be complementary (no mismatches). In other instances, there may be a 10% mismatch.
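The distinction between a fully complementary probe/target pair and one with, for example, a 10% mismatch can be illustrated with a minimal sketch. It assumes an ungapped, equal-length, antiparallel alignment with both strands written 5'→3'; the function name `mismatch_fraction` is hypothetical and is not part of the disclosure.

```python
# Watson-Crick complements for DNA bases.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_fraction(probe, target):
    """Return the fraction of probe positions that do not base pair with the target.

    Assumption: both strands are written 5'->3', have equal length, and pair
    antiparallel with no gaps. Real hybridization is more permissive than this.
    """
    if not probe or len(probe) != len(target):
        raise ValueError("probe and target must be non-empty and of equal length")
    # Reverse the target so that position i of the probe faces position i of the target.
    target_reversed = target.upper()[::-1]
    mismatches = sum(
        1
        for p, t in zip(probe.upper(), target_reversed)
        if _COMPLEMENT.get(p) != t
    )
    return mismatches / len(probe)
```

Under these assumptions, a return value of 0.0 corresponds to the "complementary (no mismatches)" case, and a value of 0.1 corresponds to a 10% mismatch.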
 A “plurality” refers preferably to a group of at least one or more members, more preferably to a group of at least about 100, and even more preferably to a group of at least about 1,000, members. The maximum number of members is unlimited, but is at least about 100,000 members.
 A “portion” means a stretch of at least about 100 consecutive nucleotides. A “portion” can also mean a stretch of at least 100 consecutive nucleotides that contains one or more deletions, insertions or substitutions. A “portion” can also mean the whole coding sequence of a gene. Preferred portions are those that lack secondary structure as identified by using computer software programs such as OLIGO 4.06 Primer Analysis Software (National Biosciences), Lasergene (DNASTAR), MacDNAsis (Hitachi Software Engineering Co., Ltd.) and the like.
 The term “gene” or “genes” refers to the partial or complete coding sequence of a gene. The phrase “genes implicated in blood cell biology” refers to genes that code for polypeptides that are known to regulate blood cell biology and genes of unknown function which are differentially or abundantly expressed in hematopoiesis or immunological responses and include those listed in the Sequence Listing and in Table 1.
 The phrase “differentially expressed gene” refers to a gene whose abundance in a target transcript profile is preferably at least about 1.5× higher, more preferably about 2× higher, than that in a subtraction transcript profile. The phrase also refers to genes that are not detectable in the subtraction transcript profile but are preferably at levels of at least about 2 copies per cell, more preferably at least about 3 copies per cell, in the target transcript profile. “Abundantly expressed gene” refers to a gene which represents preferably at least about 0.01% of the transcripts in a transcript profile.
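The thresholds in these definitions can be written as two small predicates. This is a sketch only: the function and parameter names are hypothetical, the default values are the broader "preferably" figures from the text (1.5×, 2 copies per cell, 0.01%), and treating a subtraction abundance of zero as "not detectable" is an assumption.

```python
def is_differentially_expressed(target_abundance, subtraction_abundance,
                                target_copies_per_cell=None,
                                fold_threshold=1.5, min_copies_per_cell=2.0):
    """Apply the 'differentially expressed gene' definition to one gene.

    Assumption: a subtraction abundance of 0 means 'not detectable'.
    """
    if subtraction_abundance > 0:
        # Detectable in both profiles: require the fold difference.
        return target_abundance >= fold_threshold * subtraction_abundance
    # Not detectable in the subtraction profile: require a minimum copy number.
    return (target_copies_per_cell is not None
            and target_copies_per_cell >= min_copies_per_cell)


def is_abundantly_expressed(transcript_count, total_transcripts,
                            min_fraction=0.0001):
    """Apply the 'abundantly expressed gene' definition (0.01% == 0.0001)."""
    if total_transcripts <= 0:
        return False
    return transcript_count / total_transcripts >= min_fraction
```

Passing `fold_threshold=2.0` or `min_copies_per_cell=3.0` gives the stricter "more preferably" variants.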
 As used herein, the profile of transcripts which reflect gene expression in a particular tissue, at a particular time, is defined as a “transcript profile”. Such profiles can be generated by naming, matching, and counting all copies of related clone inserts and arranging them in order of abundance. A “target transcript profile” refers to a profile derived from a biological sample that contains transcripts of interest alongside transcripts which are not of interest. A “subtraction transcript profile” refers to a profile derived from a biological sample that contains predominantly transcripts that are not of interest.
 The phrase “blood cell biology” encompasses hematopoiesis and all variety of immunological responses, including T cell and B cell activation, monocyte activation, and the like, and immunopathology.
 “Hematopoiesis” refers to the process of blood cell growth and differentiation. “Immunological response” refers to responses elicited from blood cells including normal and immunopathological responses.
 The phrase “genes coding for a polypeptide known to regulate blood cell biology” refers to genes whose known function is related to immunological responses, such as cytokines, chemokines, growth factors, transcription factors, leukotrienes, cell surface receptors, phosphatases and the like.
 The term “hematopoietic cells” includes erythrocytes, neutrophils, eosinophils, basophils, mast cells, megakaryocytes, platelets, monocytes, macrophages, dendritic cells, T lymphocytes, B lymphocytes, natural killer cells and the like. Furthermore, the term includes cells from tissues such as spleen, thymus, adenoid gland, fetal liver tissue and the like.
 THE INVENTION
 The present invention provides a composition comprising a plurality of polynucleotide probes comprising at least a portion of genes implicated in blood cell biology. Preferably, the polynucleotide probes comprise at least a portion of one or more of the sequences (SEQ ID Nos: 1-1508) presented in the Sequence Listing. In one preferred embodiment, the composition comprises a plurality of polynucleotide probes, wherein each polynucleotide probe comprises at least a portion of a sequence selected from the group consisting of SEQ ID Nos: 1-1508. In a second preferred embodiment, the composition comprises a plurality of polynucleotide probes comprising at least a portion of at least about 1000 of the sequences of SEQ ID Nos: 1-1508. In yet another embodiment, the composition comprises a plurality of polynucleotide probes wherein said polynucleotide probes comprise at least a portion of substantially all the sequences of SEQ ID Nos: 1-1508.
 The composition is particularly useful when it is used as hybridizable array elements in a microarray. Such a microarray can be employed to monitor the expression of genes of unknown function, but which are differentially or abundantly expressed in an immunological response or an immunopathology. In addition, the microarray can be used to monitor the expression of genes with a known function in blood cell biology.
 The microarray can be used for large scale genetic or gene expression analysis of a large number of target polynucleotides. The microarray can be used in the diagnosis of diseases and in the monitoring of treatments where altered expression of genes implicated in blood cell biology causes disease, such as cancer, an immunopathology and the like. The microarray can also be used to investigate an individual's predisposition to a disease, such as cancer, an immunopathology and the like. Furthermore, the microarray can be employed to investigate cellular responses, such as stress responses, apoptosis, cell proliferation and the like.
 When the composition of the invention is employed as hybridizable array elements in a microarray, the array elements are organized in an ordered fashion so that each element is present at a specified location on the substrate. Because the array elements are at specified locations on the substrate, the hybridization patterns and intensities (which together create a unique expression profile) can be interpreted in terms of expression levels of particular genes and can be correlated with a particular disease or condition or treatment.
 The composition comprising a plurality of polynucleotide probes can also be used to purify a subpopulation of mRNAs, cDNAs, genomic fragments and the like, in a sample. Typically, samples will include the target polynucleotides of interest and other nucleic acids which may enhance the hybridization background in the sample. Therefore it may be advantageous to remove these nucleic acids. One method for removing the additional nucleic acids is by hybridizing the sample containing target polynucleotides with immobilized polynucleotide probes under hybridizing conditions. Those nucleic acids that do not hybridize to the polynucleotide probes are washed away. At a later point, the target polynucleotides bound to the immobilized polynucleotide probes can be released in the form of purified target polynucleotides.
 Method for Selecting Polynucleotide Probes
 This section describes the selection of probe sequences for the plurality of polynucleotide probes. The probe sequences are selected by identifying genes coding for polypeptides with a known function in immunological responses or genes which are abundantly or differentially expressed in specific biological samples. Since some of the probe sequences are identified solely based on expression levels, it is not essential to know a priori the function of a particular gene in blood cell biology.
 The selection method is based, in part, on expressed sequence tag (EST) analysis. EST analysis entails sequencing, in whole or in part, isolated clone inserts from a complementary DNA (cDNA) library, clustering overlapping sequences and determining the clustered sequences' frequency in the cDNA library.
 ESTs are sequenced by methods well known in the art. The methods can employ such enzymes as the Klenow fragment of DNA polymerase I, Taq polymerase, thermostable T7 polymerase, or combinations of polymerases and proofreading exonucleases. Preferably, the process is automated. ESTs derived from the same transcript can be combined to form a cluster of ESTs. Clusters are formed by identifying overlapping EST sequences and assembling the ESTs. A nucleic acid fragment assembly tool, such as the Phrap tool (WashU-Merck) and the GELVIEW Fragment Assembly system (Genetics Computer Group), can be used for this purpose. Clones can be arranged in clusters in descending order of abundance. The minimum number of clones necessary to constitute a cluster is two.
 After assembling EST clusters, a transcript profile for a particular biological sample is generated and the frequency or abundance of a given EST cluster can be determined. The frequency of an EST cluster in a clone population is correlated to the level of expression of a particular gene. By this process those genes that are abundantly expressed in a biological sample can be identified.
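As an illustration of how cluster frequencies yield a transcript profile, the following sketch counts the clones assigned to each EST cluster and ranks the clusters in descending order of abundance. The input format (a mapping from clone identifier to cluster identifier) and the function name are assumptions made for the example; the clustering step itself is taken as already done.

```python
from collections import Counter


def transcript_profile(clone_to_cluster):
    """Build a ranked transcript profile from clone-to-cluster assignments.

    Returns a list of (cluster_id, clone_count, fraction_of_library) tuples,
    most abundant first; ties are broken by cluster identifier.
    """
    counts = Counter(clone_to_cluster.values())
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(cluster, n, n / total) for cluster, n in ranked]


# Hypothetical clone and cluster identifiers, for illustration only.
example = {"clone1": "clusterA", "clone2": "clusterA",
           "clone3": "clusterB", "clone4": "clusterA"}
profile = transcript_profile(example)
```

The fraction in each tuple is the cluster's frequency in the clone population, which the text correlates with the expression level of the corresponding gene.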
 Furthermore, EST analysis can be employed to identify genes that are differentially expressed in one biological sample (from which a target cDNA library and a target transcript profile are derived) but not in another biological sample (from which a subtraction cDNA library and a subtraction transcript profile are derived). For this purpose, transcript profiles from both biological samples are generated and compared. By comparing transcript profiles, those genes that are differentially expressed in a target biological sample can be identified.
 With a large enough number of transcript profiles derived from different biological samples, a statistically significant correlation can emerge between cell and tissue source information, such as disease states, treatment outcomes, exposure to various environmental factors or genotypes, and the expression levels of particular genes or groups of genes. Comparisons between transcript profiles of different cells or tissues or of the same cells or tissues under different conditions can be used to discern differences in transcriptional activities. For example, a transcript profile can show differences occurring between two different tissues, such as liver and prostate; between normal and diseased tissue, such as normal prostate and prostate tumor; or between untreated and treated tissues, such as prostate tumor and irradiated prostate tumor.
 The biological samples from which transcript profiles are derived can be from a variety of sources. For purposes of this invention, since the intent is to select polynucleotide probes useful for investigating gene expression as it relates to blood cell biology, biological samples include those derived from hematopoietic and inflamed samples and nonhematopoietic, noninflamed biological samples.
 In particular, where probe sequences are derived from genes differentially expressed in an immunological response, the transcript profiles of hematopoietic cells or tissues associated with an immunological response (normal or inflamed) are compared to those of noninflamed nonhematopoietic samples. Examples of hematopoietic cells or tissues associated with an immunological response include inflamed adenoid, bone marrow, macrophages, lymphocytes, granulocytes, spleen, tonsil, eosinophils, asthmatic lung tissue, diabetic pancreas, colon tissue derived from an individual suffering from Crohn's disease and the like. Examples of noninflamed nonhematopoietic tissue include fibroblasts, keratinocytes, fetal lung, brain, melanocytes and the like. Only those genes that are differentially expressed, i.e., the transcript levels are preferably at least about 1.5× higher, more preferably at least about 2× higher, in the hematopoietic sample than that in the nonhematopoietic sample, are selected. Additionally, genes that are not detectable in the nonhematopoietic sample but which have transcript levels of preferably at least about two (2) copies per cell, more preferably at least about three (3) copies per cell, in the hematopoietic sample are selected.
 Where probe sequences are derived from genes that are abundantly expressed in an immunological response, the transcript profiles of hematopoietic cells or tissues associated with an inflammatory process are obtained. Only those genes whose transcripts represent preferably at least 0.01% of the transcripts in a biological sample are selected.
 For purposes of this invention, transcript profile comparisons can be obtained by methods well known to those skilled in the art. Transcript levels and profiles can be obtained and compared, for example, by a differential gene expression assay based on a quantitative hybridization of arrayed cDNA clones (Nguyen, et al. (1995) Genomics 29: 207-216), based on the serial analysis of gene expression (SAGE) technology (Velculescu et al. (1995) Science 270: 484-487), based on the polymerase chain reaction (Peng et al. (1992) Science 257: 967-971, Prashar et al. (1996) Proc. Natl. Acad. Sci. USA 93: 659-663), by a differential amplification protocol (Van Gelder et al. U.S. Pat. No. 5,545,522) or based on electronic analysis, such as the Lifeseq® Transcript Imaging tool (Incyte) or the GeneCalling and Quantitative Expression Analysis technology (CuraGen). Comparisons (subtractions) between two or more transcript profiles are preferably performed electronically using the Lifeseq® Multiple Transcript Subsetting tool (Incyte).
 Selection Protocols
 The method of selecting polynucleotide probe sequences is based on three selection protocols. The polynucleotide probes of the composition can be selected by employing any one of these three selection protocols, the combination of any two of these protocols, or all the protocols.
 A first selection protocol (I) can provide for first polynucleotide probes derived from genes differentially expressed in an immunological response. A number of target cDNA libraries are prepared from first biological samples. The first biological sample can be hematopoietic cells or tissues associated with an immunological response, such as inflamed tissues. Preferably at least one cDNA library, more preferably at least four cDNA libraries, are selected from hematopoietic cells or tissues associated with an immunological response. First target transcript profiles are generated from each of the libraries. Transcript profiles can be combined to obtain an averaged transcript profile. An averaged transcript profile entails adding up the transcript abundances for each transcript from each biological sample and then dividing the summed transcript abundances by the total number of biological samples.
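The averaging step described above amounts to a per-transcript arithmetic mean across samples. A minimal sketch follows, assuming each transcript profile is represented as a mapping from a transcript identifier to its abundance and that a transcript absent from a sample contributes an abundance of zero; both the representation and the function name are assumptions made for the example.

```python
def average_profiles(profiles):
    """Average several transcript profiles into one.

    profiles: non-empty list of dicts mapping transcript id -> abundance.
    For each transcript, the abundances are summed over all samples and the
    sum is divided by the total number of samples.
    """
    if not profiles:
        raise ValueError("at least one transcript profile is required")
    # Collect every transcript seen in any sample.
    transcripts = set().union(*profiles)
    n_samples = len(profiles)
    return {
        t: sum(p.get(t, 0.0) for p in profiles) / n_samples
        for t in transcripts
    }
```

Dividing by the total number of samples, rather than by the number of samples in which a transcript was observed, follows the wording of the text.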
 A number of subtraction cDNA libraries are prepared from biological samples that are noninflamed and nonhematopoietic. Preferably at least one cDNA library, more preferably at least four cDNA libraries, are selected from noninflamed nonhematopoietic biological samples. First subtraction transcript profiles are generated from these cDNA libraries. Preferably, these transcript profiles are combined to obtain an averaged transcript profile.
 In one embodiment, the averaged transcript image from the subtraction cDNA libraries is subtracted from each target cDNA library. In another embodiment, the averaged transcript image from the subtraction cDNA libraries is subtracted from an averaged transcript image of the target cDNA libraries.
 In either case, a transcript profile is obtained showing the genes that are differentially expressed in biological samples consisting of hematopoietic cells or tissues associated with an immunological response rather than with noninflamed and nonhematopoietic biological samples. In one embodiment, the top 100 most abundant transcripts, more preferably the top 40 most abundant transcripts, are selected to generate first polynucleotide probes. In a second embodiment all upregulated transcripts are selected. By upregulated is meant that the genes are not detectable in the subtraction transcript profile but are preferably at levels of at least about 2 copies per cell, more preferably at least about 3 copies per cell, in the target transcript profile.
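One way to picture the subtraction and the subsequent selection of the most abundant transcripts is the sketch below. The source does not specify how the electronic subtraction is computed, so the rule used here is an assumption: a transcript is kept if it is absent from the subtraction profile or if its target abundance is at least `fold` times its subtraction abundance. The copies-per-cell criterion for undetectable transcripts is omitted for brevity, and all names are hypothetical.

```python
def subtract_and_select(target, subtraction, fold=1.5, top_n=100):
    """Subtract a (possibly averaged) subtraction profile from a target profile.

    target, subtraction: dicts mapping transcript id -> abundance.
    Returns up to top_n (transcript id, target abundance) pairs, most
    abundant first; ties are broken by transcript identifier.
    """
    kept = {}
    for transcript, abundance in target.items():
        background = subtraction.get(transcript, 0.0)
        # Assumed rule: keep transcripts absent from the subtraction profile
        # or enriched at least `fold`-fold in the target profile.
        if background == 0.0 or abundance >= fold * background:
            kept[transcript] = abundance
    ranked = sorted(kept.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical abundances, for illustration only.
target_profile = {"geneA": 12.0, "geneB": 30.0, "geneC": 5.0}
subtraction_profile = {"geneB": 28.0, "geneC": 2.0}
selected = subtract_and_select(target_profile, subtraction_profile, top_n=40)
```

In this example `geneB` is highly abundant in the target profile but is removed by the subtraction because it is nearly as abundant in the subtraction profile.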
 A second selection protocol (II) can provide for second polynucleotide probes derived from genes abundantly expressed in an immunological response. A number of target cDNA libraries are prepared from second biological samples. The second biological sample can be hematopoietic cells or tissues associated with an inflammatory process (normal or diseased). Preferably at least one cDNA library, more preferably at least four cDNA libraries, are selected from hematopoietic cells or tissues associated with an inflammatory process. Second target transcript profiles are generated from such libraries. The transcripts are ranked according to abundance. Those transcripts that are most abundant are selected. Preferably the top 100 most abundant transcripts are selected from the remaining transcripts; more preferably the top 40 most abundant transcripts, including the top 20 novel sequences (i.e., those not in a public database), are selected to generate second polynucleotide probes.
 In a third selection protocol (III), the literature is surveyed and sequences in GenBank and Lifeseq® (Incyte) screened to identify genes coding for polypeptides whose function is related to immunological responses. These genes can be selected to generate third polynucleotide probes.
 The resulting composition can comprise polynucleotide probes that are not redundant, i.e., there is no more than one polynucleotide probe to represent a particular gene. Alternatively, the composition can contain polynucleotide probes that are redundant, i.e., a gene is represented by more than one polynucleotide probe.
 The selected polynucleotide probes may be manipulated further to optimize the performance of the polynucleotide probes as hybridization probes. Some probes may not hybridize effectively under hybridization conditions due to secondary structure. To optimize probe hybridization, the probe sequences are examined using a computer algorithm to identify portions of genes without potential secondary structure. Such computer algorithms are well known in the art, such as OLIGO 4.06 Primer Analysis Software (National Biosciences) or Lasergene (DNASTAR). These programs can search nucleotide sequences to identify stem loop structures and tandem repeats and to analyze G+C content of the sequence (those sequences with a G+C content greater than 60% are excluded). Alternatively, the probes can be optimized by trial and error. Experiments can be performed to determine whether probes and complementary target polynucleotides hybridize optimally under experimental conditions.
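The G+C content criterion is simple enough to sketch directly. The code below is not the algorithm of the commercial programs named above; it only illustrates the stated exclusion rule (G+C content greater than 60%), and it does not attempt stem loop or tandem repeat detection. The function names are hypothetical.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_filter(sequence, max_gc=0.60):
    """Return True if the sequence is retained, i.e. its G+C content is not above max_gc."""
    return gc_fraction(sequence) <= max_gc


def filter_probes(sequences, max_gc=0.60):
    """Keep only candidate probe sequences that pass the G+C filter."""
    return [s for s in sequences if passes_gc_filter(s, max_gc)]
```

For instance, `filter_probes(["ATGCATGC", "GGGCCCAT"])` retains only the first sequence, since the second has a G+C content of 75%.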
 Where the number of different polynucleotide probes is desired to be greatest, the probe sequences are extended to assure that different polynucleotide probes are not derived from the same gene, i.e., the polynucleotide probes are not redundant. The probe sequences may be extended utilizing the partial nucleotide sequences derived from EST sequencing by employing various methods known in the art. For example, one method which may be employed, “restriction-site” PCR, uses universal primers to retrieve unknown sequence adjacent to a known locus (Sarkar, G. (1993) PCR Methods Applic. 2: 318-322).
 Polynucleotide Probes
 This section describes the polynucleotide probes. The polynucleotide probes can be DNA or RNA, or any RNA-like or DNA-like material, such as peptide nucleic acids, branched DNAs and the like.
 The polynucleotide probes can be sense or antisense polynucleotide probes. Where target polynucleotides are double stranded, the probes may be either sense or antisense strands. Where the target polynucleotides are single stranded, the nucleotide probes are complementary single strands.
 In one embodiment, the polynucleotide probes are cDNAs. The size of the DNA sequence of interest may vary, and is preferably from about 100 to 10,000 nucleotides, more preferably from about 150 to 3,500 nucleotides.
 In a second embodiment, the polynucleotide probes are clone DNAs. In this case the size of the DNA sequence of interest, i.e., the insert sequence excluding the vector DNA, may vary from 100 to 10,000 nucleotides, more preferably from 150 to 3,500 nucleotides.
 The polynucleotide probes can be prepared by a variety of synthetic or enzymatic schemes which are well known in the art. The probes can be synthesized, in whole or in part, using chemical methods well known in the art. (Caruthers et al. (1980) Nucl. Acids Res. Symp. Ser. 215-233). Alternatively, the probes can be generated, in whole or in part, enzymatically.
 Nucleotide analogues can be incorporated into the polynucleotide probes by methods well known in the art. The only requirement is that the incorporated nucleotide analogues must serve to base pair with target polynucleotide sequences. For example, certain guanine nucleotides can be substituted with hypoxanthine, which base pairs with cytosine residues. However, these base pairs are less stable than those between guanine and cytosine. Alternatively, adenine nucleotides can be substituted with 2,6-diaminopurine, which can form stronger base pairs than those between adenine and thymidine.
 Additionally, the polynucleotide probes can include nucleotides that have been derivatized chemically or enzymatically. Typical chemical modifications include derivatization with acyl, alkyl, aryl or amino groups.
 The polynucleotide probes can be immobilized on a substrate. Preferred substrates are any suitable rigid or semirigid support including membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles and capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which the polynucleotide probes are bound. Preferably, the substrates are optically transparent.
 Probes can be synthesized, in whole or in part, on the surface of a substrate by using a chemical coupling procedure and a piezoelectric printing apparatus, such as that described in PCT publication WO95/251116 (Baldeschweiler et al.). Alternatively, the probe can be synthesized using a self-addressable electronic device that controls when reagents are added (Heller et al. U.S. Pat. No. 5,605,662) or by photolysis using imaging fibers for light delivery (Healey et al. (1995) Science 269: 1078-80).
 Complementary DNA (cDNA) can be arranged and then immobilized on a substrate. The probes can be immobilized by covalent means such as by chemical bonding procedures or UV. In one such method, a cDNA is bound to a glass surface which has been modified to contain epoxide or aldehyde groups. In another case, a cDNA probe is placed on a polylysine coated surface and then UV cross-linked (Shalon et al. PCT publication WO95/35505, herein incorporated by reference). In yet another method, a DNA is actively transported from a solution to a given position on a substrate by electrical means (Heller et al. U.S. Pat. No. 5,605,662). Alternatively, individual DNA clones can be gridded on a filter. Cells are lysed, proteins and cellular components degraded and the DNA coupled to the filter by UV cross-linking.
 Furthermore, the probes do not have to be directly bound to the substrate, but rather can be bound to the substrate through a linker group. The linker groups are typically about 6 to 50 atoms long to provide exposure to the attached polynucleotide probe. Preferred linker groups include ethylene glycol oligomers, diamines, diacids and the like. Reactive groups on the substrate surface react with one of the terminal portions of the linker to bind the linker to the substrate. The other terminal portion of the linker is then functionalized for binding the polynucleotide probe.
 The polynucleotide probes can be attached to a substrate by dispensing reagents for probe synthesis on the substrate surface or by dispensing preformed DNA fragments or clones on the substrate surface. Typical dispensers include a micropipette delivering solution to the substrate with a robotic system to control the position of the micropipette with respect to the substrate. There can be a multiplicity of dispensers so that reagents can be delivered to the reaction regions simultaneously.
 Sample Preparation
 In order to conduct sample analysis, a sample containing target polynucleotides is provided. The samples can be any sample containing target polynucleotides and obtained from any bodily fluid (blood, urine, saliva, phlegm, gastric juices, etc.), cultured cells, biopsies, or other tissue preparations. DNA or RNA can be isolated from the sample according to any of a number of methods well known to those of skill in the art. For example, methods of purification of nucleic acids are described in Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, P. Tijssen, ed. Elsevier (1993). In one case, total RNA is isolated using the TRIZOL total RNA isolation reagent (Life Technologies) and mRNA is isolated using oligo d(T) column chromatography or glass beads. Alternatively, when target polynucleotides are derived from an mRNA, the target polynucleotides can be a cDNA reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA amplified from that cDNA, an RNA transcribed from the amplified DNA, and the like. When the target polynucleotide is derived from DNA, the target polynucleotide can be DNA amplified from DNA or RNA reverse transcribed from DNA. In yet another alternative, the targets are target polynucleotides prepared by more than one method.
 When target polynucleotides are amplified it is desirable to amplify the nucleic acid sample and maintain the relative abundances of the original sample, including low abundance transcripts. Total mRNA can be amplified by reverse transcription using a reverse transcriptase and a primer consisting of oligo d(T) and a sequence encoding the phage T7 promoter to provide a single stranded DNA template.
 The second cDNA strand is polymerized using a DNA polymerase and an RNase which assists in breaking up the DNA/RNA hybrid. After synthesis of the double stranded cDNA, T7 RNA polymerase can be added and RNA transcribed from the second cDNA strand template (Van Gelder et al. U.S. Pat. No. 5,545,522). RNA can be amplified in vitro, in situ or in vivo (See Eberwine U.S. Pat. No. 5,514,545).
 It is also advantageous to include quantitation controls within the sample to assure that amplification and labeling procedures do not change the true distribution of target polynucleotides in a sample. For this purpose, a sample is spiked with a known amount of a control target polynucleotide and the composition of polynucleotide probes includes reference polynucleotide probes which specifically hybridize with the control target polynucleotides. After hybridization and processing, the hybridization signals obtained should reflect accurately the amounts of control target polynucleotide added to the sample.
 Prior to hybridization, it may be desirable to fragment the nucleic acid target polynucleotides. Fragmentation improves hybridization by minimizing secondary structure and cross-hybridization to other nucleic acid target polynucleotides in the sample or noncomplementary polynucleotide probes. Fragmentation can be performed by mechanical or chemical means.
 The target polynucleotides may be labeled with one or more labeling moieties to allow for detection of hybridized probe/target polynucleotide complexes. The labeling moieties can include compositions that can be detected by spectroscopic, photochemical, biochemical, bioelectronic, immunochemical, electrical, optical or chemical means. The labeling moieties include radioisotopes, such as 32P, 33P or 35S, chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers, such as fluorescent markers and dyes, magnetic labels, linked enzymes, mass spectrometry tags, spin labels, electron transfer donors and acceptors, and the like.
 Exemplary dyes include quinoline dyes, triarylmethane dyes, phthaleins, azo dyes, cyanine dyes and the like. Preferably, fluorescent markers absorb light above about 300 nm, preferably above 400 nm, and usually emit light at wavelengths at least 10 nm above the wavelength of the light absorbed. Specific preferred fluorescent markers include fluorescein, phycoerythrin, rhodamine, lissamine, and Cy3 and Cy5 available from Amersham.
 Labeling can be carried out during an amplification reaction, such as polymerase chain reaction and in vitro transcription reactions, or by nick translation or 5′ or 3′-end-labeling reactions. In one case, labeled nucleotides are used in an in vitro transcription reaction. When the label is incorporated after or without an amplification step, the label is incorporated by using terminal transferase or by kinasing the 5′ end of the target polynucleotide and then incubating overnight with a labeled oligonucleotide in the presence of T4 RNA ligase.
 Alternatively, the labeling moiety can be incorporated after hybridization once a probe/target complex has formed. In one case, biotin is first incorporated during an amplification step as described above. After the hybridization reaction, unbound nucleic acids are rinsed away so that the only biotin remaining bound to the substrate is that attached to target polynucleotides that are hybridized to the polynucleotide probes. Then, an avidin-conjugated fluorophore, such as avidin-phycoerythrin, that binds with high affinity to biotin is added. In another case, the labeling moiety is incorporated by intercalation into preformed target/polynucleotide probe complexes. In this case, an intercalating dye such as a psoralen-linked dye can be employed.
 Under some circumstances it may be advantageous to immobilize the target polynucleotides on a substrate and have the polynucleotide probes bind to the immobilized target polynucleotides. In such cases the target polynucleotides can be attached to a substrate as described above.
 Hybridization and Detection
 Hybridization causes a denatured polynucleotide probe and a denatured complementary target to form a stable duplex through base pairing. Hybridization methods are well known to those skilled in the art (See, for example, Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y. (1993)). Conditions can be selected for hybridization where only exactly complementary target and polynucleotide probe can hybridize, i.e., each base must pair with its complementary base. Alternatively, conditions can be selected where target and polynucleotide probes have mismatches but are still able to hybridize. Suitable conditions can be selected, for example, by varying the concentrations of salt or formamide in the prehybridization, hybridization and wash solutions, or by varying the hybridization and wash temperatures.
 Hybridization can be performed at low stringency with buffers, such as 6× SSPE with 0.005% Triton X-100 at 37° C., which permits hybridization between target and polynucleotide probes that contain some mismatches to form target polynucleotide/probe complexes. Subsequent washes are performed at higher stringency with buffers, such as 0.5×SSPE with 0.005% Triton X-100 at 50° C., to retain hybridization of only those target/probe complexes that contain exactly complementary sequences. Alternatively, hybridization can be performed with buffers, such as 5×SSC/0.2% SDS at 60° C. and washes are performed in 2×SSC/0.2% SDS and then in 0.1×SSC. Stringency can also be increased by adding agents such as formamide. Background signals can be reduced by the use of detergent, such as sodium dodecyl sulfate, Sarcosyl or Triton X-100, or a blocking agent, such as sperm DNA.
 Hybridization specificity can be evaluated by comparing the hybridization of specificity-control polynucleotide probes to specificity-control target polynucleotides that are added to a sample in a known amount. The specificity-control target polynucleotides may have one or more sequence mismatches compared with the corresponding polynucleotide probes. In this manner, whether only complementary target polynucleotides are hybridizing to the polynucleotide probes or whether mismatched hybrid duplexes are forming is determined.
 Hybridization reactions can be performed in absolute or differential hybridization formats. In the absolute hybridization format, target polynucleotides from one sample are hybridized to the probes in a microarray format and signals detected after hybridization complex formation correlate to target polynucleotide levels in a sample. In the differential hybridization format, the differential expression of a set of genes in two biological samples is analyzed. For differential hybridization, target polynucleotides from both biological samples are prepared and labeled with different labeling moieties. A mixture of the two labeled target polynucleotides is added to a microarray. The microarray is then examined under conditions in which the emissions from the two different labels are individually detectable. Probes in the microarray that are hybridized to substantially equal numbers of target polynucleotides derived from both biological samples give a distinct combined fluorescence (Shalon et al. PCT publication WO95/35505). In a preferred embodiment, the labels are fluorescent labels with distinguishable emission spectra, such as a lissamine conjugated nucleotide analog and a fluorescein conjugated nucleotide analog. In another embodiment Cy3/Cy5 fluorophores (Amersham) are employed.
 After hybridization, the microarray is washed to remove nonhybridized nucleic acids and complex formation between the hybridizable array elements and the target polynucleotides is detected.
 Methods for detecting complex formation are well known to those skilled in the art. In a preferred embodiment, the target polynucleotides are labeled with a fluorescent label and measurement of levels and patterns of fluorescence indicative of complex formation is accomplished by fluorescence microscopy, preferably confocal fluorescence microscopy. An argon ion laser excites the fluorescent label, emissions are directed to a photomultiplier and the amount of emitted light detected and quantitated. The detected signal should be proportional to the amount of probe/target polynucleotide complex at each position of the microarray. The fluorescence microscope can be associated with a computer-driven scanner device to generate a quantitative two-dimensional image of hybridization intensity. The scanned image is examined to determine the abundance/expression level of each hybridized target polynucleotide.
 In a differential hybridization experiment, target polynucleotides from two or more different biological samples are labeled with two or more different fluorescent labels with different emission wavelengths. Fluorescent signals are detected separately with different photomultipliers set to detect specific wavelengths. The relative abundances/expression levels of the target polynucleotides in two or more samples are obtained.
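 By way of illustration only, the relative abundance of a target in two samples can be expressed as a log ratio of the signals in the two detection channels. The sketch below assumes per-probe intensities are already available as dictionaries; the function name `log2_ratios` and the `floor` parameter (a guard against division by zero for very weak signals) are hypothetical and are not part of the described method.

```python
import math


def log2_ratios(channel_a: dict, channel_b: dict, floor: float = 1.0) -> dict:
    """Return log2(A/B) for every probe measured in both channels.

    Intensities below `floor` are raised to `floor` so the ratio is defined.
    A positive value means the target is more abundant in sample A.
    """
    ratios = {}
    for probe in channel_a.keys() & channel_b.keys():
        a = max(channel_a[probe], floor)
        b = max(channel_b[probe], floor)
        ratios[probe] = math.log2(a / b)
    return ratios


if __name__ == "__main__":
    sample_a = {"probe1": 800.0, "probe2": 150.0}
    sample_b = {"probe1": 200.0, "probe2": 150.0}
    for probe, value in sorted(log2_ratios(sample_a, sample_b).items()):
        print(probe, value)
```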
 Typically, microarray fluorescence intensities can be normalized to take into account variations in hybridization intensities when more than one microarray is used under similar test conditions. In a preferred embodiment, individual polynucleotide probe/target complex hybridization intensities are normalized using the intensities derived from internal normalization controls contained on each microarray.
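 One simple way to carry out such a normalization, offered only as a sketch, is to scale every intensity on a microarray by the mean intensity of that microarray's internal control elements. The scaling rule, the function name `normalize_array` and the `reference_level` parameter are assumptions made for illustration; the text does not specify the normalization formula.

```python
def normalize_array(intensities: dict, control_ids: list,
                    reference_level: float = 1.0) -> dict:
    """Scale all intensities so the mean control intensity equals reference_level.

    intensities: probe id -> raw hybridization intensity for one microarray.
    control_ids: ids of the internal normalization control probes on that array.
    """
    if not control_ids:
        raise ValueError("at least one control probe is required")
    mean_control = sum(intensities[c] for c in control_ids) / len(control_ids)
    if mean_control <= 0:
        raise ValueError("control intensities must have a positive mean")
    return {probe: value / mean_control * reference_level
            for probe, value in intensities.items()}


if __name__ == "__main__":
    array_1 = {"ctrl1": 200.0, "ctrl2": 400.0, "geneA": 600.0}
    array_2 = {"ctrl1": 100.0, "ctrl2": 200.0, "geneA": 300.0}
    controls = ["ctrl1", "ctrl2"]
    # After scaling, geneA is directly comparable across the two arrays.
    print(normalize_array(array_1, controls)["geneA"])
    print(normalize_array(array_2, controls)["geneA"])
```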
 Expression Profiles
 This section describes an expression profile using the composition of this invention. The expression profile can be used to detect changes in the expression of genes implicated in blood cell biology. These genes include genes whose altered expression is correlated with cancer, immunopathology, apoptosis and the like.
 The expression profile comprises the polynucleotide probes of the invention. The expression profile also includes a plurality of detectable complexes. Each complex is formed by hybridization of one or more polynucleotide probes to one or more complementary target polynucleotides. At least one of the polynucleotide probes, preferably a plurality of polynucleotide probes, is hybridized to a complementary target polynucleotide, forming at least one complex and preferably a plurality of complexes. A complex is detected by incorporating at least one labeling moiety in the complex. The labeling moiety has been described above.
 The expression profiles provide “snapshots” that can show unique expression patterns that are characteristic of a disease or condition.
 UTILITY OF THE INVENTION
 The composition comprising a plurality of polynucleotide probes can be used as hybridizable elements in a microarray. Such a microarray can be employed in several applications including diagnostics, prognostics and treatment regimens, drug discovery and development, toxicological and carcinogenicity studies, forensics, pharmacogenomics and the like.
 In one situation, the microarray is used to monitor the progression of disease. Researchers can assess and catalog the differences in gene expression between healthy and diseased tissues or cells. By analyzing changes in patterns of gene expression, disease can be diagnosed at earlier stages before the patient is symptomatic.
 Similarly, the invention can be used to monitor the progression of disease or the efficacy of treatment. For some treatments with known side effects, the microarray is employed to “fine tune” the treatment regimen. A dosage is established that causes a change in genetic expression patterns indicative of successful treatment. Expression patterns associated with undesirable side effects are avoided. This approach may be more sensitive and rapid than waiting for the patient to show inadequate improvement, or manifest symptoms, before altering the course of treatment.
 Alternatively, animal models which mimic a disease, rather than patients, can be used to characterize expression profiles associated with a particular disease or condition. For example, a characteristic gene expression pattern for the graft versus host reaction can be generated using analogous reactions that occur when lymphocytes from one donor are mixed with lymphocytes from another donor. This gene expression data may be useful in diagnosing and monitoring the course of graft versus host reaction in a patient, in determining gene targets for intervention, and in testing novel immunosuppressants.
 The composition is particularly useful for diagnosing and monitoring the progression of diseases that are associated with the altered expression of genes implicated in blood cell biology. The expression of these genes is associated with cellular processes such as hematopoiesis, immunological responses, immunopathology, cell proliferation, apoptosis, and the like. Thus, the microarray is particularly useful to diagnose immunopathologies including, but not limited to, AIDS, Addison's disease, adult respiratory distress syndrome, allergies, anemia, asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease, ulcerative colitis, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, atrophic gastritis, glomerulonephritis, gout, Graves' disease, hypereosinophilia, irritable bowel syndrome, lupus erythematosus, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis, rheumatoid arthritis, scleroderma, Sjogren's syndrome, and autoimmune thyroiditis; viral, bacterial, fungal, parasitic, and protozoal infections and trauma.
 The invention also allows researchers to develop sophisticated profiles of the effects of currently available therapeutic drugs. Tissues or cells treated with these drugs can be analyzed and compared to untreated samples of the same tissues or cells. In this way, an expression profile of known therapeutic agents will be developed. Knowing the identity of sequences that are differentially regulated in the presence and absence of a drug will allow researchers to elucidate the molecular mechanisms of action of that drug.
 Also, researchers can use the microarray to rapidly screen large numbers of candidate drugs, looking for ones that have an expression profile similar to those of known therapeutic drugs, with the expectation that molecules with the same expression profile will likely have similar therapeutic effects. Thus, the invention provides the means to determine the molecular mode of action of a drug.
 It is understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary. It is also understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims. The examples below are provided to illustrate the subject invention and are not included for the purpose of limiting the invention.
 EXAMPLES
 I cDNA Library Construction
 For purposes of example, the preparation and sequencing of the LNODNOT03 cDNA library, from which Incyte Clones 1573272, 1573553, 1574415, 1574617, 1574637, and 1576661 were isolated, is described in detail. Preparation and sequencing of cDNA libraries in the LifeSeq® database have varied over time, and the gradual changes involved use of kits, plasmids, and machinery available at the particular time the library was made and analyzed.
 The LNODNOT03 cDNA library was constructed from microscopically normal lymph node tissue excised from a 67-year-old Caucasian male. This tissue was associated with tumorous lung tissue. The patient history included squamous cell carcinoma of the lower lobe, benign hypertension, arteriosclerotic vascular disease, and tobacco abuse. The patient was taking Doxycycline, a tetracycline, to treat an infection.
 The frozen tissue was homogenized and lysed using a Brinkmann Homogenizer Polytron PT-3000 (Brinkmann Instruments, Westbury, N.J.) in guanidinium isothiocyanate solution. The lysate was centrifuged over a 5.7 M CsCl cushion using a Beckman SW28 rotor in a Beckman L8-70M Ultracentrifuge (Beckman Instruments) for 18 hours at 25,000 rpm at ambient temperature. The RNA was extracted with acid phenol pH 4.7, precipitated using 0.3 M sodium acetate and 2.5 volumes of ethanol, resuspended in RNase-free water, and DNase treated at 37° C. The RNA extraction was repeated with acid phenol pH 4.7 and precipitated with sodium acetate and ethanol as before. The mRNA was then isolated using the Qiagen Oligotex kit (QIAGEN, Inc., Chatsworth, Calif.) and used to construct the cDNA library.
 The mRNA was handled according to the recommended protocols in the SuperScript Plasmid System for cDNA Synthesis and Plasmid Cloning (Cat. #18248-013, Gibco/BRL). cDNAs were fractionated on a Sepharose CL4B column (Cat. #275105-01, Pharmacia), and those cDNAs exceeding 400 bp were ligated into pSPORT I. The plasmid pSPORT I was subsequently transformed into DH5α competent cells (Cat. #18258-012, Gibco/BRL).
 II cDNA Library Normalization
 In some cases, cDNA libraries have been normalized in a single round according to the procedure of Soares et al. ((1994), Proc. Natl. Acad. Sci. USA 91: 9928-9932) with the following modifications. The primer to template ratio in the primer extension reaction was increased from 2:1 to 10:1. The dNTP concentration in the reaction was reduced to 150 μM each dNTP, allowing the generation of longer (400-1000 nt) primer extension products. The reannealing hybridization was extended from 13 to 19 hours. The single stranded DNA circles of the normalized library were purified by hydroxyapatite chromatography and converted to partially double-stranded by random priming, followed by electroporation into DH10B competent bacteria (Gibco/BRL).
 The Soares normalization procedure is designed to reduce the initial variation in individual cDNA frequencies to achieve abundances within one order of magnitude while maintaining the overall sequence complexity of the library. In the normalization process, the prevalence of high-abundance cDNA clones decreases significantly, clones with mid-level abundance are relatively unaffected, and clones for rare transcripts are effectively increased in abundance. In the modified Soares normalization procedure, significantly longer hybridization times are used which allows for the increase of gene discovery rates by biasing the normalized libraries toward low-abundance cDNAs that are well represented in a standard transcript image.
 III Isolation and Sequencing of cDNA Clones
 Plasmid cDNA was released from the cells and purified using the REAL Prep 96 plasmid kit (Catalog #26173, QIAGEN). The recommended protocol was employed except for the following changes: 1) the bacteria were cultured in 1 ml of sterile Terrific Broth (Catalog #22711, GIBCO-BRL) with carbenicillin at 25 mg/L and glycerol at 0.4%; 2) after inoculation, the cultures were incubated for 19 hours and at the end of incubation, the cells were lysed with 0.3 ml of lysis buffer; and 3) following isopropanol precipitation, the plasmid DNA pellet was resuspended in 0.1 ml of distilled water. After the last step in the protocol, samples were transferred to a 96-well block for storage at 4° C.
 cDNAs were sequenced according to the method of Sanger et al. ((1975), J. Mol. Biol. 94: 441f), using the Perkin Elmer Catalyst 800 or a Hamilton Micro Lab 2200 (Hamilton, Reno, Nev.) in combination with Peltier Thermal Cyclers (PTC200 from MJ Research, Watertown, Mass.) and Applied Biosystems 377 DNA Sequencing Systems or the Perkin Elmer 373 DNA Sequencing System and the reading frame was determined.
 IV Homology Searching of cDNA Clones and Their Deduced Proteins
 As used herein, “homology” refers to sequence similarity between a reference sequence and at least a portion of a newly sequenced clone insert, and can refer to either a nucleic acid or amino acid sequence. The GenBank databases, which contain previously identified and annotated sequences, were searched for regions of homology using BLAST (Basic Local Alignment Search Tool). (See, e.g., Altschul, S. F. (1993) J. Mol. Evol. 36: 290-300; and Altschul et al. (1990) J. Mol. Biol. 215: 403-410.)
 BLAST involves first finding similar segments between the query sequence and a database sequence, then evaluating the statistical significance of any matches that are found and finally reporting only those matches that satisfy a user-selectable threshold of significance. BLAST produces alignments of both nucleotide and amino acid sequences to determine sequence similarity. The fundamental unit of the BLAST algorithm output is the High-scoring Segment Pair (HSP). An HSP consists of two sequence fragments of arbitrary, but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user.
 The basis of the search is the product score, which is defined as:
 $$\text{product score} = \frac{(\%\ \text{sequence identity}) \times (\%\ \text{maximum BLAST score})}{100}$$
 The product score takes into account both the degree of similarity (identity) between two sequences and the length of the sequence match as reflected in the BLAST score. The BLAST score is calculated by scoring +5 for every base that matches in an HSP and -4 for every mismatch. For example, with a product score of 40, the match will be exact within a 1% to 2% error, and, with a product score of 70, the match will be exact. Homologous molecules are usually identified by selecting those which show product scores between 15 and 40, although lower scores may identify related molecules. The P-value for any given HSP is a function of its expected frequency of occurrence and the number of HSPs observed against the same database sequence with scores at least as high.
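 The product score calculation can be sketched as follows. Two points are assumptions made for illustration and are not stated in the text: the mismatch penalty is taken as -4 (the conventional nucleotide BLAST penalty paired with a +5 match reward), and the "% maximum BLAST score" is interpreted as the HSP score expressed as a percentage of the score that a perfect match over the full query length would receive. The function names are hypothetical.

```python
MATCH_SCORE = 5      # reward per matching base, as stated in the text
MISMATCH_SCORE = -4  # penalty per mismatched base (assumed sign convention)


def blast_score(matches: int, mismatches: int) -> int:
    """Ungapped HSP score from counts of matching and mismatching bases."""
    return MATCH_SCORE * matches + MISMATCH_SCORE * mismatches


def product_score(matches: int, mismatches: int, query_length: int) -> float:
    """(% sequence identity x % maximum BLAST score) / 100.

    The maximum BLAST score is assumed to be a perfect match over the
    whole query, i.e. MATCH_SCORE * query_length.
    """
    aligned = matches + mismatches
    if aligned == 0 or query_length <= 0:
        return 0.0
    pct_identity = 100.0 * matches / aligned
    pct_max_score = 100.0 * blast_score(matches, mismatches) / (MATCH_SCORE * query_length)
    return pct_identity * pct_max_score / 100.0


if __name__ == "__main__":
    # A perfect match over the whole query and one over half the query.
    print(product_score(100, 0, 100))
    print(product_score(50, 0, 100))
```

 Under this interpretation an exact match that covers only part of the query scores below 100, which is consistent with the statement that the product score reflects both identity and match length.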
 V Transcript Imaging
 Transcript profiles were generated using the Lifeseq® Transcript Imaging tool (Incyte). To identify genes that are differentially expressed in hematopoietic or inflamed biological samples, reverse transcript profiles from specific target cDNA library pools derived from either hematopoietic or inflamed biological samples were obtained. The number of cDNA libraries in a cDNA library pool varied from one cDNA library member to 6 cDNA library members. For library pools which contained more than one member, an averaged reverse transcript image was obtained. The target library pools were the following: 1. ADENINB01; 2. COLSUCT01, COLNNOT23, SINTNOT13; 3. COLNCRT01, COLNNOT27, SINTBST01; 4. ENDCNOT01, ENDCNOT02, ENDCNOT03, HUVELPB01, HUVENOB01, HUVESTB01; 5. EOSIHET02, UCMCL5T01; 6. HMC1NOT01; 7. LNODNOT02, LNODNOT03; 8. LUNGAST01; 9. MMLR1DT01, MMLR2DT01, MMLR3DT01; 10. NEUTFMT01, NEUTGMT01, NEUTLPT01; 11. PANCDIT01; 12. TONSNOT01; 13. SYNOOAT01; 14. SYNORAB01, SYNORAT01, SYNORAT03, SYNORAT04, SYNORAT05; 15. THP1PLB01, THP1PLB02; and 16. TMLR2DT01, TMLR3DT01, TMLR3DT02.
 Reverse transcript profiles were also derived from 39 subtraction cDNA libraries which were derived from predominantly nonhematopoietic noninflamed biological samples. The following is a list of the cDNA libraries: FIBRAGT01, FIBRAGT02, FIBRANT01, FIBRNGT01, FIBRNGT02, FIBRSEM01, KERANOT01, COLNNOM01, COLNTUM01, EYECNOM01, FIBRFEM01, HNT2NOM01, OVARTUM02, PANCTUM01, UTRPNOM01, CARDFEM01, FIBRSEM01, PANCISM01, LUNGFEM01, PTHYTUM01, BRAINOM01, BRAINOM02, BRAINOM03, BRSTNOM01, BRSTNOM02, COCHFEM01, LIVRNOM01, LUNGNOM01, MELANOM01, NERVMSM01, OLFENOM01, OVARNOM01, OVARTUM01, PINENOM01, PLACNOM01, PLACNOM02, PLACNOM03, RETNNOM01, and RETNNOM02. In some cases, one or more subtraction libraries were derived from hematopoietic or inflamed biological samples, such as the NERVMSM01 library derived from the tissue of a patient suffering from multiple sclerosis. An averaged subtraction transcript profile was obtained by pooling the transcript information from all 39 libraries.
 The LifeSeq Multiple Transcript Subsetting tool (Incyte) was used to subtract the averaged subtraction transcript profile from each target cDNA library pool. A list of subtracted transcripts which consisted of clustered ESTs was generated. The subtracted transcripts were ranked according to abundance. The 40 most abundant transcripts were selected from each set of subtracted library pools.
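 The averaging, subtraction and ranking steps described above can be sketched in outline. This is not the LifeSeq tool itself, whose internal calculation is not described here. The sketch assumes each library is represented as a mapping from transcript identifier to an abundance value, that subtraction is a simple difference of averaged abundances, and that only transcripts with a positive difference are retained; all names are hypothetical.

```python
from collections import defaultdict


def average_profile(libraries: list) -> dict:
    """Average transcript abundance over the libraries in a pool.

    A transcript absent from a library contributes zero for that library.
    """
    totals = defaultdict(float)
    for library in libraries:
        for transcript, abundance in library.items():
            totals[transcript] += abundance
    return {t: total / len(libraries) for t, total in totals.items()}


def subtract_and_rank(target_pool: list, subtraction_pool: list,
                      top_n: int = 40) -> list:
    """Subtract the averaged subtraction profile and rank by remaining abundance."""
    target = average_profile(target_pool)
    background = average_profile(subtraction_pool)
    differences = {t: a - background.get(t, 0.0) for t, a in target.items()}
    enriched = [(t, d) for t, d in differences.items() if d > 0]
    # Highest difference first; ties broken by identifier for reproducibility.
    enriched.sort(key=lambda item: (-item[1], item[0]))
    return enriched[:top_n]


if __name__ == "__main__":
    target_pool = [{"g1": 10.0, "g2": 2.0}, {"g1": 6.0, "g3": 4.0}]
    subtraction_pool = [{"g1": 1.0, "g2": 3.0}]
    print(subtract_and_rank(target_pool, subtraction_pool))
```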
 To identify additional genes that are differentially expressed in immunological responses, reverse transcript profiles from specific target cDNA libraries derived from inflamed biological samples were obtained. Subtraction cDNA library transcript profiles were generated from healthy counterparts. In some cases the subtraction transcript profile was created by averaging transcript images from a pool of cDNA libraries. Target and subtraction cDNA libraries are listed in the following table.
 TARGET LIBRARY | SUBTRACTION LIBRARIES
 LUNGAST01 | LUNGFEM01, LUNGFET03, LUNGNOM01, LUNGNOT01, LUNGNOT02, LUNGNOT03, LUNGNOT04, LUNGNOT09, LUNGNOT10, LUNGNOT12, LUNGNOT14, LUNGNOT15, LUNGNOT18, LUNGNOT20
 COLNCRT01 | COLNNOT05
 HUVELPB01 | HUVENOB01, HUVESTB01
 PANCDIT01 | PANCNOT01, PANCNOT04, PANCNOT05
 A list of subtracted transcripts which consisted of clustered ESTs was generated. After subtracting transcript profiles, all upregulated sequences were selected.
 To identify novel genes that are abundantly expressed in an immunological response, 43 cDNA libraries derived from hematopoietic or inflamed biological samples were picked. These libraries were: ADENINB01, AMLBNOT01, BMARNOR02, BMARNOT02, BMARNOT03, EOSIHET02, HMC1NOT01, LEUKNT02, LEUKNOT03, LNODNOT02, LNODNOT03, MMLR1DT01, MMLR2DT01, MMLR3DT01, MPHGLPT02, MPHGNOT02, MPHGNOT03, NEUTFMT01, NEUTGMT01, NEUTLPT01, THYMNOT02, TLYMNOT01, TLYMNOT02, TMLR2DT01, TMLR3DT01, TMLR3DT02, SPLNFET01, SPLNFET02, SPLNNOT02, SPLNNOT04, TBLYNOT01, THP1NOB01, THP1NOT01, THP1NOT03, THP1AZT01, THP1PEB01, THP1PLB01, THP1PLB02, THP1T7T01, TONSNOT01, U937NOT01, UCMCL5T01, and UCMCNOT02.
 Reverse transcript profiles were generated from the 43 cDNA libraries using a product score of 100 as the maximum cutoff. This setting returns all the sequences in the transcript profile. The most abundant sequences were selected. Reverse transcript profiles were also generated from the chosen libraries using a product score of 70 as the maximum cutoff. This procedure effectively removed exact matches to gene sequences found in GenBank. A list of transcripts which consisted of clustered ESTs was generated. The 20 most abundant sequences were selected.
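The effect of the two cutoffs can be sketched as a simple filter. The sketch assumes that the product score runs from 0 to 100 and that an exact match to a GenBank sequence scores above 70, which is consistent with the statement that a cutoff of 70 removed exact matches. The record layout, the function name, and the sample values are hypothetical.

```python
def select_abundant(transcripts, max_product_score, top_n=None):
    """Keep transcripts at or below the product-score cutoff, most abundant first.

    Each transcript is a dict with hypothetical keys "id", "product_score",
    and "abundance". With top_n=None every retained transcript is returned.
    """
    kept = [t for t in transcripts if t["product_score"] <= max_product_score]
    kept.sort(key=lambda t: t["abundance"], reverse=True)
    return kept if top_n is None else kept[:top_n]


# Hypothetical records, for illustration only.
profile = [
    {"id": "cluster1", "product_score": 100, "abundance": 9.0},  # exact match
    {"id": "cluster2", "product_score": 65, "abundance": 7.5},
    {"id": "cluster3", "product_score": 0, "abundance": 3.0},    # no match
    {"id": "cluster4", "product_score": 40, "abundance": 8.0},
]

all_sequences = select_abundant(profile, max_product_score=100)
without_exact = select_abundant(profile, max_product_score=70, top_n=20)
print([t["id"] for t in without_exact])
```

With a cutoff of 100 nothing is excluded, so the list is simply the full profile ranked by abundance. With a cutoff of 70 the exact match drops out before the 20 most abundant sequences are taken.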
 To identify genes known to be associated with the regulation of blood cell biology, the literature was surveyed and relevant sequences in GenBank and Lifeseq® databases were identified. Genes were selected on the following basis. First, genes involved in blood cell biology were identified from the literature. Then GenBank and Lifeseq® databases were screened using BLAST to identify other genes containing homologous sequences. Additionally, the Lifeseq® database was searched using a group of key words including cyclin, cul, phosphatase, apoptosis, kinase, serine kinase, tyrosine kinase, cdk, phosphodiesterase, protease, protease inhibitor, metalloproteinase, cathepsin, phospholipase, E2F, integrin, receptor, cytochrome, p450, cox, lipocortin, retinoic acid, CD, cdc, fas, TNF, gadd, cytokine, chemokine, growth factor, interleukin, heat shock protein, HSP, stress, STAT, myb, jun, fos, dpl, myc, bak, bcl, p53, phox, inflammation, oxidase, and glutathione.
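A key word search of this kind can be sketched as a case-insensitive substring match against annotation text. The sketch does not reproduce the Lifeseq® search interface, which is not described here. The function name and the annotation table are hypothetical, and only a few of the key words listed above are included.

```python
# A small subset of the key words listed in the text.
KEYWORDS = ["cyclin", "kinase", "cathepsin", "interleukin", "TNF"]


def keyword_hits(annotations, keywords=KEYWORDS):
    """Return {sequence id: matched key words} for annotations that match.

    Matching is a case-insensitive substring test, which is an assumption.
    """
    hits = {}
    for seq_id, description in annotations.items():
        text = description.lower()
        matched = [k for k in keywords if k.lower() in text]
        if matched:
            hits[seq_id] = matched
    return hits


# Hypothetical annotation table, for illustration only.
annotations = {
    "cluster1": "Human cathepsin G",
    "cluster2": "Human interleukin 6",
    "cluster3": "Human alpha-tubulin",
    "cluster4": "Human MAP kinase",
}

selected = keyword_hits(annotations)
print(selected)
```

Substring matching is deliberately permissive. Very short key words such as CD or fas would match many unrelated annotations under this scheme, so a whole-word match may be preferable for them.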
 After selecting the transcripts (ESTs) of interest, partial sets of ESTs representing a single gene were identified. Partial sets of ESTs for the same gene but identified by different selection methods were clustered. Sets of partial sequences (and their quality scores) were used to assemble contiguous sequences using the Phrap program with default settings. The longest “high quality” set of partial sequences was chosen. “High quality” is defined as sequences starting and ending with at least 10 contiguous base calls with quality scores above 12. When performing the clustering process, the full Lifeseq® database (Incyte) was searched in order to obtain the longest “high quality” set of partial sequences.
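The “high quality” criterion can be expressed as a short check. The sketch takes one reading of the definition: the first 10 and the last 10 base calls must each have a quality score above 12. It operates on a list of per-base quality scores such as those supplied to Phrap, and it does not call Phrap. The function names and the sample data are hypothetical.

```python
def is_high_quality(quality_scores, run_length=10, min_score=12):
    """True if the sequence starts and ends with run_length calls above min_score."""
    if len(quality_scores) < run_length:
        return False
    head = quality_scores[:run_length]
    tail = quality_scores[-run_length:]
    return all(q > min_score for q in head) and all(q > min_score for q in tail)


def longest_high_quality(candidates):
    """Return the longest high-quality candidate, or None if there is none.

    Each candidate is a dict with hypothetical keys "id" and "quality".
    """
    good = [c for c in candidates if is_high_quality(c["quality"])]
    return max(good, key=lambda c: len(c["quality"]), default=None)


# Hypothetical quality scores, for illustration only.
candidates = [
    {"id": "setA", "quality": [20] * 50},
    {"id": "setB", "quality": [5] + [20] * 79},  # longer, but starts poorly
    {"id": "setC", "quality": [20] * 30},
]

best = longest_high_quality(candidates)
print(best["id"])
```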
 After clustering related ESTs, EST clusters were checked for redundancy. The cDNAs were compared to each other using the BLASTn database search program. Any two sequences with similarity scores greater than 250 and percent identities greater than 95% were considered redundant. The representative cDNA sequence chosen from each redundancy set was the EST with the longest read sequence. In some cases, the representative cDNA sequence did not originate from the original cDNA libraries used in the selection processes.
 Full length cDNAs (identified from GenBank or Lifeseq®) and EST sets were also compared with each other to remove redundant sequences.
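The redundancy check can be sketched as a grouping step over pairwise comparison results. The sketch assumes the BLASTn comparisons have already been run and are available as tuples of two sequence identifiers, a similarity score, and a percent identity. It also assumes that redundancy is transitive, so that sequences linked through a chain of redundant pairs fall into one set. The function names and the sample data are hypothetical.

```python
def redundancy_sets(sequence_ids, hits, min_score=250, min_identity=95.0):
    """Group sequences into redundancy sets using a union-find structure.

    hits: iterable of (id_a, id_b, similarity_score, percent_identity).
    A pair is redundant only if both values exceed their thresholds.
    """
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent links to the set root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score, identity in hits:
        if score > min_score and identity > min_identity:
            parent[find(a)] = find(b)

    groups = {}
    for s in sequence_ids:
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())


def representatives(groups, read_lengths):
    """Pick the sequence with the longest read from each redundancy set."""
    return [max(group, key=lambda s: read_lengths[s]) for group in groups]


# Hypothetical comparison results and read lengths, for illustration only.
ids = ["est1", "est2", "est3", "est4"]
hits = [
    ("est1", "est2", 300, 98.0),
    ("est2", "est3", 260, 96.5),
    ("est3", "est4", 400, 90.0),  # identity too low, so not redundant
]
lengths = {"est1": 300, "est2": 450, "est3": 410, "est4": 380}

groups = redundancy_sets(ids, hits)
print(representatives(groups, lengths))
```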
 Illustrative polynucleotide probes for use in this invention are provided in the Sequence Listing and are SEQ ID NOs: 1-1508. The polynucleotide probes are derived from genes implicated in blood cell biology, including hematopoiesis, immunological responses and immuno-pathology. Of the 1,508 nonredundant polynucleotide probes, 43% were exact matches to sequences in the public domain, and 57% were homologous to public domain sequences or unique sequences which are abundantly or differentially expressed in blood cell biology. Some of the public domain sequences were not known to be abundantly expressed or differentially expressed in hematopoiesis or immunological responses.
 VI Preparation of a Microarray
 A microarray was prepared as follows: 96 different PCR polynucleotide probes were laid down in quadruplicate (4 arrays with 100 spots each) on an aldehyde-derivatized slide available from Cel Associates. The glass slide had dimensions of 18×24 mm. The distance between the spots on the array was 500 microns. Samples were printed on the glass slide from the left upper corner to the right lower corner in the following order: g177865 (Human tumor necrosis factor alpha); g177869 (Human alpha-2-macroglobulin); g178163 (Human ADP-ribosylation factor 1); g219475 (Human immediate-early-response); g179699 (Human C5a anaphylatoxin receptor); g179892 (Human cAMP phosphodiesterase); g184840 (Human Fc-gamma receptor I); g181181 (Human cathepsin G); g181485 (Human DNA-binding protein B); g182487 (Human Fc-epsilon-receptor gamma-chain); g182504 (Human ferritin H chain mRNA); g182632 (Human FKBP-12 protein); g182976 (Human glyceraldehyde-3-phosphate dehydrogenase); g183063 (Human glia-derived nexin); g183067 (Human mRNA sequence with homology to GDP binding protein); g31914 (Human mRNA for coupling protein G(s) alpha); g184420 (Human 90-kDa heat-shock protein); g186264 (Human gamma-interferon-inducible protein); g179579 (Human beta-thromboglobulin-like protein); g187172 (Human leukotriene A-4 hydrolase); g187220 (Human L-plastin gene); g187243 (Human lysozyme mRNA); g188255 (Human MHC class II HLA-DR-alpha); g189150 (Human nephropontin); g190813 (Human Wilm's tumor-related protein); g189267 (Human neutrophil oxidase factor); g219868 (Human HM89); g182482 (Human fibroblast collagenase inhibitor); g189546 (Human plasminogen activator inhibitor); g899458 (Human 14-3-3 protein); g184628 (Human interleukin 6); g250802 (cathepsin S); g264772 (thymosin beta-10); g28251 (beta-actin); g291926 (Human cystatin B); g292416 (Human macrophage inflammatory protein); g29508 (Human BTG1); g181179 (Human cathepsin D); g179952 (Human cathepsin L); g186933 (Human leukocyte adhesion protein); g29793 (leukocyte antigen CD37); g184628 (Human interleukin 6); g29850 (Human CDw40 nerve growth factor); g238776 (p55); g306467 (Human binding protein mRNA); g306486 (Human cap-binding protein); g306773 (Human GM-CSF receptor); g307165 (Human myeloid cell differentiation protein); g307374 (Human RHOA proto-oncogene multi-drug-resistance); g31097 (Human mRNA for elongation factor 1 alpha); g32576 (Human mRNA for interleukin-1 receptor); g220063 (Human sphingolipid activator proteins); g186283 (Human interleukin 1-beta); g190419 (Human secretory granule proteoglycan peptide); g33917 (Human mRNA for gamma-interferon inducible protein); g339420 (Human T cell-specific protein (RANTES)); g339688 (Human thymosin beta4 mRNA); g339690 (Human prothymosin alpha); g339737 (Human tumor necrosis factor (TNF)); g340020 (Human alpha-tubulin); g34312 (Human mRNA for lactate dehydrogenase-A); g187434 (Human monocyte chemotactic and activating factor); g179579 (Human beta-thromboglobulin-like protein); g34625 (Human gene for melanoma growth stimulator); g188558 (Human macrophage inflammatory protein); g181191 (Human cathepsin B proteinase); g348911 (Human glycoprotein); g337494 (Human ribosomal protein L7a); g35517 (Human mRNA for pleckstrin (P47)); g1562497 (Human poly(A)-binding protein); g338285 (Human manganese-containing superoxide dismutase); g339737 (Human tumor necrosis factor (TNF)); g404012 (Human pre-B cell enhancing factor); g495286 (Human melanoma differentiation associated factor); g416368 (Human CTLA4 counter-receptor); g433415 (Human mRNA for DNA-binding protein, TAXR); g434760 (Human mRNA for ORF); g598867 (Human HepG2 partial cDNA); g450280 (Human suilisol); g517197 (Human urokinase-type plasminogen activator); g187151 (Human lysosomal acid lipase); g468150 (Human MAP kinase); g496975 (Human cyclooxygenase-2); g186512 (Human (clone 1950.2) interferon-gamma); g560790 (Human mRNA for calgizzarin); g1255239 (Human lysosomal-associated multitransmembrane protein); g245388 (beta 2-microglobulin); g178019 (Human cytokine (SCYA2) gene); g1304482 (Human tissue inhibitor of metalloprotein); g886049 (Human Ich-2 cysteine protease); g189177 (Human nuclear factor kappa-B DNA binding protein); g37983 (Human mRNA of X-CGD gene); g178083 (Human adenylyl cyclase-associated protein); g29899 (Human mRNA for c-fms proto-oncogene); and g36606 (Human spermidine/spermine N1-acetyltransferase). The 9th and the 18th rows contained controls derived from Drosophila melanogaster and Arabidopsis thaliana.
 An experiment was performed to measure gene expression in (A) untreated THP1 cells and (B) THP1 cells first treated with 100 ng/ml PMA (phorbol myristate acetate) for 48 hours and then treated with 1 microgram/ml LPS (lipopolysaccharide) for 48 hours. THP1 cells were obtained from the American Type Culture Collection. Treated and untreated cells were lysed, and mRNA was purified using oligo d(T) chromatography. The mRNA was labeled by adding oligo(dT)20 (5 micrograms) to 2 micrograms mRNA and heating the reaction to 70° C. Hybridization was performed with 5×SSC/0.2% SDS for six (6) hours at 60° C. The slides were washed for five (5) minutes in 1×SSC/0.2% SDS at 30° C., followed by five (5) minutes in 0.1×SSC/0.2% SDS at room temperature, followed by 30 minutes in 0.1×SSC at room temperature. Signals were detected using a custom-made confocal fluorescence microscope.
 Gene expression analysis was performed to identify gene sequences which were expressed at higher levels in treated THP1 cells than in untreated THP1 cells. FIG. 1 shows the gene expression pattern observed in untreated THP1 cells. FIG. 2 shows the gene expression pattern observed in treated THP1 cells. By a comparison of both expression patterns, those genes differentially expressed in treated THP1 cells can be identified. For example, genes that are highly expressed in treated THP1 cells include the macrophage inflammatory protein gene, the cytokine (SCYA2) gene, and beta-thromboglobulin gene among others.
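A comparison of the two expression patterns can be sketched as a ratio of spot signals. The sketch assumes that each pattern has been reduced to one signal value per probe. The two-fold threshold, the signal floor, the function name, and all signal values are hypothetical and are not measurements from FIG. 1 or FIG. 2. The probe identifiers are taken from the printing order listed above.

```python
def upregulated(untreated, treated, min_ratio=2.0, floor=1.0):
    """Return (probe, ratio) pairs with treated/untreated >= min_ratio.

    Baseline signals below floor are raised to floor so that the ratio
    stays defined for probes with little or no untreated signal.
    """
    results = []
    for probe, treated_signal in treated.items():
        baseline = max(untreated.get(probe, 0.0), floor)
        ratio = treated_signal / baseline
        if ratio >= min_ratio:
            results.append((probe, ratio))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical signal values, for illustration only.
untreated_signal = {
    "g292416": 100.0,   # macrophage inflammatory protein
    "g178019": 50.0,    # cytokine (SCYA2)
    "g179579": 200.0,   # beta-thromboglobulin-like protein
    "g340020": 1000.0,  # alpha-tubulin
}
treated_signal = {
    "g292416": 800.0,
    "g178019": 600.0,
    "g179579": 1000.0,
    "g340020": 1100.0,
}

for probe, ratio in upregulated(untreated_signal, treated_signal):
    print(probe, ratio)
```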
1. A composition comprising a plurality of polynucleotide probes, wherein said plurality of polynucleotide probes comprise:
- I) first polynucleotide probes, wherein each of said first polynucleotide probes comprises at least a portion of a gene differentially expressed in an immunological response;
- II) second polynucleotide probes, wherein each of said second polynucleotide probes comprises at least a portion of a gene abundantly expressed in an immunological response; and
- III) third polynucleotide probes, wherein each of said third polynucleotide probes comprises at least a portion of a gene coding for a polypeptide known to regulate blood cell biology.
2. The composition of claim 1, wherein each of said first polynucleotide probes is selected by a first method comprising:
- a) preparing at least one first target transcript profile from a first biological sample selected from the group consisting of hematopoietic cells and inflamed tissue and at least one first subtraction transcript profile from a noninflamed, nonhematopoietic biological sample;
- b) subtracting said first subtraction transcript profile from said first target profile to detect a plurality of genes that are differentially expressed in an immunological response; and
- c) identifying one of said detected genes that are differentially expressed in an immunological response.
3. The composition of claim 1, wherein each of said second polynucleotide probes is selected by a second method comprising:
- a) preparing at least one second target transcript profile from a second biological sample selected from the group consisting of hematopoietic cells and inflamed tissue to detect genes that are abundantly expressed in said second biological sample; and
- b) identifying one of said detected genes that are abundantly expressed.
4. The composition of claim 1, wherein each of said third polynucleotide probes is selected by a third method comprising identifying a gene coding for a polypeptide with a known function in immunological responses.
5. An isolated polynucleotide selected from the group consisting of SEQ ID NOs:1-1508, or the complete complement thereof.
6. The polynucleotide of claim 5, wherein said polynucleotide is immobilized on a substrate.
7. The polynucleotide of claim 5, wherein said polynucleotide is a hybridizable element on a microarray.
8. A composition comprising the polynucleotide of claim 5 and a labeling moiety.
9. A method for using a polynucleotide to detect expression of a nucleic acid in a sample comprising:
- a) hybridizing the composition of claim 8 to nucleic acids of the sample under conditions to form at least one hybridization complex; and
- b) detecting hybridization complex formation, wherein complex formation indicates expression of the nucleic acid in the sample.
10. The method of claim 9 further comprising amplifying the nucleic acids of the sample prior to hybridization.
11. The method of claim 10 wherein complex formation is compared to a standard and is diagnostic of an immunopathological condition.
International Classification: C12Q001/68; C07H021/04;