Microarray probe Tm matching by selective destabilization

Info

Publication number: 20070087370
Type: Application
Filed: Oct 20, 2006
Publication Date: Apr 19, 2007
Inventor: Bo Curry (Redwood City, CA)
Application Number: 11/584,110

Abstract

Methods and systems for designing oligonucleotide probes for use in microarray applications are provided herein. The described methods use duplex melting temperature (T_m) matching to destabilize the hybridization of oligonucleotide probes to non-target sequences as compared to a target nucleotide sequence. Nucleic acid arrays containing probes selected by the described methods are also described.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This Application is a continuation-in-part of, and claims priority to, U.S. patent application Ser. No. 10/996,323, filed Nov. 23, 2004, and now published as US 2006/0110744.

BACKGROUND

Comparative genomic hybridization (CGH) and location analysis are important applications that allow scientists to improve their understanding of the expression and regulation of genes in biological systems. Both CGH and location analysis entail quantifying or measuring changes in copy number of genomic sequences. CGH is particularly important in the study of developmental biology and the causes of cancer, and offers great potential in the diagnosis of cancer and developmental diseases. Recently, cDNA microarrays have been used for CGH studies. An oligo-array based approach has several substantial advantages over other technologies, in that it allows the designer to position the probes anywhere within the genomic or polynucleotide sequence of interest. The probes can be placed at whatever density is commensurate with the real estate or area available on the microarray (in terms of number of features), and the genomic regions of interest can be evaluated by analyzing the hybridization of target sequences to the surface-bound probes. The oligonucleotide probe approach also offers the flexibility of focusing on regions within exons or introns of expressed sequences, or on intergenic regions and regulatory regions for location analysis, as well as on any desirable admixture of the aforementioned.

The oligonucleotide probe-based approach requires hybridizing many thousands of probe-target hybrids under uniform conditions, and this requirement is a known source of error in microarray measurements. The discrimination between a target sequence of interest and competing sequences in a sample is greatest when the hybridization conditions are such that the hybrid formed between the probe and the desired target is stable, while hybrids between the probe and competing sequences are melted off. Probes are designed to maximize the differential stability between the hybrids with the desired targets and with competing sequences. The assay conditions must be chosen such that the melting point (T_m) of the desired hybrid is above the temperature of the assay, and the T_m for competing hybrids is below assay temperature. This is difficult to achieve for thousands of probe-target hybrid pairs, but can be addressed through various methods of T_m-matching.

For CGH arrays, where appropriate T_m-matched probes cannot always be found, it is usual to destabilize the more tightly bound hybrids (high T_m probes) so as to reduce their T_m to equal those of less tightly bound hybrids (low T_m probes). Typical methods include introducing arbitrarily selected mismatches or deletions into the probe sequence, or shortening the length of the probes. Such methods are effective for short probes (12-24 mers), but less effective for the 60-mer or longer probes used in microarray applications. The probe sequence shortening, or the mismatches in the sequence, may not correspond or coincide with the subsequence in the region of interest that hybridizes with the competing sequence. As a result, the modification of the probe (via truncation, deletion or substitution) destabilizes the target hybrid more than the competing hybrid, and fails to accomplish its purpose.

SUMMARY

This disclosure is directed to methods for designing and/or modifying a target-specific oligonucleotide probe. The methods as described herein provide for modifications of target-specific probes by substituting or deleting at least one nucleotide in a region of the probe that is complementary to a region of the target sequence that has the most homology with a non-target sequence. In some embodiments, the modification of the target-specific oligonucleotide probe decreases the computed T_m of the hybridization of the probe to the target sequence. In some embodiments, the computed T_m of the target-specific oligonucleotide probe to at least one non-target sequence is also reduced. In some embodiments, the decrease in computed T_m of the modified probe may decrease the stability of hybrids formed with non-target sequences that compete with the target sequence for binding to the oligonucleotide probe.

In some embodiments, the methods comprise identifying a target-specific oligonucleotide probe comprising a sequence complementary to a target nucleotide sequence and that has a computed T_m of about 65° C. or greater, and modifying the target-specific oligonucleotide probe to decrease the T_m so that the modified target-specific probe hybridizes to at least one non-target sequence with a T_m the same as or lower than the computed T_m of the hybridization of the modified target-specific nucleotide probe to the target nucleotide sequence.

In aspects, the present description provides methods for modifying or designing target-specific oligonucleotide probes for microarray applications. Candidate probes with sequences complementary to a target region of interest are identified. Using a computerized search engine, the sequence of the entire genome is searched to find all sequences that can form stable hybrids with the candidate probes (i.e., sequences with homology to the candidate probes). The most homologous sequences are selected, and the candidate probes are modified by deletion or substitution of one or more nucleotides in the candidate probe sequence. The deletion or substitution destabilizes the hybrid pair formed between the candidate probe and the undesired sequences by reducing the T_m for the hybrid pairs below the computed T_m of the hybrid between the probe and the desired target sequence. In some embodiments, candidate probes are selected such that (a) the hybrid between the destabilized probe and the desired target is not melted at the chosen assay temperature, (b) the hybrids between the probe and all undesired homologous targets are melted at the chosen assay temperature, and (c) the melting temperatures of the desired and undesired hybrids are as different as possible.
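Selection criteria (a) through (c) can be sketched as a filter-and-rank step. The following Python sketch is illustrative only and is not taken from the disclosure: the `Candidate` class, the `select_probe` function, and the idea of ranking by the gap between the target T_m and the highest non-target T_m are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical record for one candidate probe (names are illustrative)."""
    name: str
    target_tm: float             # computed T_m of the probe/target hybrid, deg C
    non_target_tms: List[float]  # computed T_m of each probe/non-target hybrid


def select_probe(candidates: List[Candidate], assay_temp: float) -> Optional[Candidate]:
    """Pick the candidate that satisfies (a) and (b) and maximizes (c)."""
    best = None
    best_margin = float("-inf")
    for cand in candidates:
        worst_competitor = max(cand.non_target_tms, default=float("-inf"))
        # (a) the desired hybrid must not melt at the assay temperature
        if cand.target_tm <= assay_temp:
            continue
        # (b) every undesired hybrid must melt at the assay temperature
        if worst_competitor >= assay_temp:
            continue
        # (c) prefer the largest separation between desired and undesired T_m
        margin = cand.target_tm - worst_competitor
        if margin > best_margin:
            best, best_margin = cand, margin
    return best
```

With an assay temperature of 65° C., a candidate whose target hybrid melts at 68° C. and whose only competitor melts at 50° C. would be preferred over one with values of 70° C. and 58° C., because the separation is larger.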

Algorithms for performing the described methods recorded on a computer-readable medium, as well as computational analysis systems that include the same are provided. Also provided are nucleic acid arrays with oligonucleotide probes selected according to the subject methods.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart generally depicting the methods described herein.

DETAILED DESCRIPTION

Various embodiments of the present invention will be described in detail with reference to the drawings, wherein like reference numerals represent like parts throughout the several views. Reference to various embodiments does not limit the scope of the invention, which is limited only by the scope of the claims attached hereto. Additionally, any examples set forth in this specification are not intended to be limiting and merely set forth some of the many possible embodiments for the claimed invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. Although any methods, devices and material similar or equivalent to those described herein can be used in practice or testing, the methods, devices and materials are now described.

All publications and patent applications in this specification are indicative of the level of ordinary skill in the art and are incorporated herein by reference in their entireties.

In this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural reference, unless the context clearly dictates otherwise.

Definitions

The terms “nucleic acid” and “polynucleotide” are used interchangeably herein to describe a polymer of any length, e.g., greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, usually up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, or compounds produced synthetically (e.g., PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions. The term “hybrid” refers to a double-stranded nucleic acid molecule formed by hybridization between complementary nucleotides. The terms “hybrid” and “hybrid pair” are used interchangeably herein.

The term “complementary,” “complement,” or “complementary nucleic acid sequence” refers to the nucleic acid strand that is related to the base sequence in another nucleic acid strand by the Watson-Crick base-pairing rules. In general, two sequences are complementary when the sequence of one can bind to the sequence of the other in an anti-parallel sense wherein the 3′-end of each sequence binds to the 5′-end of the other sequence and each A, T(U), G, and C of one sequence is then aligned with a T(U), A, C, and G respectively, of the other sequence. RNA sequences can also include complementary G/U or U/G basepairs.
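As a minimal illustration of the Watson-Crick rules just described, the following Python sketch computes the anti-parallel complement of a DNA sequence. The helper names `reverse_complement` and `is_complementary` are hypothetical, and the sketch handles only the four DNA bases (no RNA G/U pairing, no ambiguity codes).

```python
# Hypothetical helpers illustrating Watson-Crick complementarity for DNA only.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq` in the anti-parallel sense."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_complementary(a: str, b: str) -> bool:
    """True when `a` pairs base-for-base with `b` read in the opposite direction."""
    return a.upper() == reverse_complement(b).upper()
```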

The terms “ribonucleic acid” and “RNA” as used herein mean a polymer composed of ribonucleotides.

The terms “deoxyribonucleic acid” and “DNA” as used herein mean a polymer composed of deoxyribonucleotides.

The term “oligonucleotide” as used herein denotes single stranded nucleotide multimers of from about 10 to 100 nucleotides and up to 200 nucleotides in length. Oligonucleotides are usually synthetic and, in many embodiments, are under 50 nucleotides in length.

The term “oligomer” is used herein to indicate a chemical entity that contains a plurality of nucleotide monomers, i.e., a nucleotide multimer. As used herein, the terms “oligomer” and “polymer” are used interchangeably, as it is generally, although not necessarily, smaller “polymers” that are prepared using the functionalized substrates of the invention, particularly in conjunction with combinatorial chemistry techniques. Examples of oligomers and polymers include polydeoxyribonucleotides (DNA), polyribonucleotides (RNA), other nucleic acids that are C-glycosides of a purine or pyrimidine base, polypeptides (proteins), polysaccharides (starches, or polysugars), and other chemical entities that contain repeating units of like chemical structure.

The term “sample” as used herein relates to a material or mixture of materials, typically, although not necessarily, in fluid form, containing one or more components of interest. Samples include, but are not limited to, biological samples obtained from natural biological sources, such as cells or tissue. The samples may also be derived from tissue biopsies and other clinical procedures.

The terms “nucleoside” and “nucleotide” are intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the terms “nucleoside” and “nucleotide” include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.

The phrase “surface-bound polynucleotide” refers to a polynucleotide that is immobilized on a surface of a solid substrate, where the substrate can have a variety of configurations, e.g., a sheet, bead, or other structure. In certain embodiments, the collections of oligonucleotide probe elements employed herein are present on a surface of the same planar support, e.g., in the form of an array.

The phrase “labeled population of nucleic acids” refers to a mixture of nucleic acids that are detectably labeled, e.g., fluorescently labeled, such that the presence of the nucleic acids can be detected by assessing the presence of the label. A labeled population of nucleic acids is often “made from” a biological DNA sample.

The term “array” encompasses the term “microarray” and refers to an ordered array presented for binding to nucleic acids and the like. An “array,” includes any two-dimensional or substantially two-dimensional (as well as a three-dimensional) arrangement of spatially addressable regions bearing nucleic acids, particularly oligonucleotides or synthetic mimetics thereof, and the like. Where the arrays are arrays of nucleic acids, the nucleic acids may be adsorbed, physisorbed, chemisorbed, or covalently attached to the arrays at any point or points along the nucleic acid chain.

In those embodiments where an array includes two or more features immobilized on the same surface of a solid support, the array may be referred to as addressable. An array is “addressable” when it has multiple regions of different moieties (e.g., different oligonucleotide sequences) such that a region (i.e., a “feature” or “spot” of the array) at a particular predetermined location (i.e., an “address”) on the array will detect a particular sequence. Array features are typically, but need not be, separated by intervening spaces. In the case of an array in the context of the present application, the “population of labeled nucleic acids” will be referenced as a moiety in a mobile phase (typically fluid), to be detected by “surface-bound polynucleotides” which are bound to the substrate at the various regions. These phrases are synonymous with the arbitrary terms “target” and “probe”, or “probe” and “target”, respectively, as they are used in other publications.

A “scan region” refers to a contiguous (preferably, rectangular) area in which the array spots or features of interest, as defined above, are found or detected. Where fluorescent labels are employed, the scan region is that portion of the total area illuminated from which the resulting fluorescence is detected and recorded. Where other detection protocols are employed, the scan region is that portion of the total area queried from which resulting signal is detected and recorded. For the purposes of this invention and with respect to fluorescent detection embodiments, the scan region includes the entire area of the slide scanned in each pass of the lens, between the first feature of interest, and the last feature of interest, even if there are intervening areas that lack features of interest.

The term “substrate” as used herein refers to a surface upon which marker molecules or probes, e.g., an array, may be adhered. Glass slides are the most common substrate for biochips, although fused silica, silicon, plastic, flexible web and other materials are also suitable.

An “array layout” refers to one or more characteristics of the features, such as feature positioning on the substrate, one or more feature dimensions, and an indication of a moiety at a given location. “Hybridizing” and “binding”, with respect to nucleic acids, are used interchangeably. The terms “hybridizing,” “hybridizing specifically to,” and “specific hybridization” as used herein, refer to the binding, duplexing, or hybridizing of a nucleic acid molecule preferentially to a particular nucleotide sequence under stringent conditions.

The term “stringent assay conditions” as used herein refers to conditions that are compatible with producing binding pairs of nucleic acids, e.g., probes and targets, of sufficient complementarity to provide for the desired level of specificity in the assay, while being incompatible with the formation of binding pairs between binding members of insufficient complementarity to provide for the desired specificity. The term stringent assay conditions refers to the combination of hybridization and wash conditions.

“Stringent hybridization” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization (e.g., as in array, Southern or Northern hybridizations) are sequence dependent, and are different under different environmental parameters. Stringent hybridization conditions that can be used to identify nucleic acids within the scope of the invention can include, e.g., hybridization in a buffer comprising 50% formamide, 5×SSC, and 1% SDS at 42° C., or hybridization in a buffer comprising 5×SSC and 1% SDS at 65° C., both with a wash of 0.2×SSC and 0.1% SDS at 65° C. Exemplary stringent hybridization conditions can also include a hybridization in a buffer of 40% formamide, 1 M NaCl, and 1% SDS at 37° C., and a wash in 0.1×SSC at 45° C. Alternatively, hybridization to filter-bound DNA in 0.5 M NaHPO₄, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65° C., and washing in 0.1×SSC/0.1% SDS at 68° C. can be employed. Yet additional stringent hybridization conditions include hybridization at 60° C. or higher and 3×SSC (450 mM sodium chloride/45 mM sodium citrate) or incubation at 42° C. in a solution containing 30% formamide, 1 M NaCl, 0.5% sodium sarcosine, 50 mM MES, pH 6.5. Those of ordinary skill will readily recognize that alternative but comparable hybridization and wash conditions can be utilized to provide conditions of similar stringency.

In certain embodiments, the stringency of the wash conditions determines whether a nucleic acid is specifically hybridized to a probe. Wash conditions used to identify nucleic acids may include, e.g.: a salt concentration of about 0.02 M at pH 7 and a temperature of about 20° C. to about 40° C.; or, a salt concentration of about 0.15 M NaCl at 72° C. for about 15 minutes; or, a salt concentration of about 0.2×SSC at a temperature of about 30° C. to about 50° C. for about 2 to about 20 minutes; or, the hybridization complex is washed twice with a solution with a salt concentration of about 2×SSC containing 1% SDS at room temperature for 15 minutes and then washed twice by 0.1×SSC containing 0.1% SDS at 37° C. for 15 minutes; or, equivalent conditions. Stringent conditions for washing can also be, e.g., 0.2×SSC/0.1% SDS at 42° C. See Sambrook, Ausubel, or Tijssen (cited below) for detailed descriptions of equivalent hybridization and wash conditions and for reagents and buffers, e.g., SSC buffers and equivalent reagents and conditions.

A specific example of stringent assay conditions is rotating hybridization at 65° C. in a salt based hybridization buffer with a total monovalent cation concentration of 1.5M (e.g., as described in U.S. patent application Ser. No. 09/655,482 filed on Sep. 5, 2000, the disclosure of which is herein incorporated by reference) followed by washes of 0.5×SSC and 0.1×SSC at room temperature.

Stringent hybridization conditions may also include a “prehybridization” of aqueous phase nucleic acids with complexity-reducing nucleic acids to suppress repetitive sequences. For example, certain stringent hybridization conditions include, prior to any hybridization to surface-bound polynucleotides, hybridization with Cot-1 DNA, or the like.

Stringent assay conditions are hybridization conditions that are at least as stringent as the above representative conditions, where a given set of conditions is considered to be at least as stringent if substantially no additional binding complexes that lack sufficient complementarity to provide for the desired specificity are produced in the given set of conditions as compared to the above specific conditions, where by “substantially no additional” is meant less than about 5-fold more, typically less than about 3-fold more. Other stringent hybridization conditions are known in the art and may also be employed, as appropriate.

The term “mixture”, as used herein, refers to a combination of elements, that are interspersed and not in any particular order. A mixture is heterogeneous and not spatially separable into its different constituents. Examples of mixtures of elements include a number of different elements that are dissolved in the same aqueous solution, or a number of different elements attached to a solid support at random or in no particular order in which the different elements are not especially distinct. In other words, a mixture is not addressable. To be specific, an array of surface-bound polynucleotides, as is commonly known in the art and described below, is not a mixture of capture agents because the species of surface-bound polynucleotides are spatially distinct and the array is addressable.

“Isolated” or “purified” generally refers to isolation of a substance (compound, polynucleotide, protein, polypeptide, chromosome, etc.) such that the substance comprises the majority percent of the sample in which it resides. Typically in a sample a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample. Techniques for purifying polynucleotides, polypeptides and intact chromosomes of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography, sorting, and sedimentation according to density.

The terms “assessing” and “evaluating” are used interchangeably to refer to any form of measurement, and include determining if an element is present or not. The terms “determining,” “measuring,” “assessing,” and “assaying” are used interchangeably and include both quantitative and qualitative determinations. Assessing may be relative or absolute. “Assessing the presence of” includes determining the amount of something present, as well as determining whether it is present or absent.

The term “using” has its conventional meaning, and, as such, means employing, e.g., putting into service, a method or composition to attain an end. For example, if a program is used to create a file, a program is executed to make a file, the file usually being the output of the program. In another example, if a computer file is used, it is usually accessed, read, and the information stored in the file employed to attain an end. Similarly if a unique identifier, e.g., a barcode is used, the unique identifier is usually read to identify, for example, an object or file associated with the unique identifier.

A “target-specific oligonucleotide probe” means a polynucleotide which can specifically hybridize to a target nucleotide sequence, either in solution or as a surface-bound polynucleotide. A “non-target sequence” is a sequence sufficiently related (i.e., homologous) to the target nucleic acid sequence, such that the non-target nucleotide sequence interferes with the hybridization of the target-specific oligonucleotide probe with the target nucleotide sequence in the region of interest.

The term “T_m” refers to the melting temperature of two oligonucleotides which have formed a duplex structure (i.e., either hybrids of perfectly matched sequences, or hybrids of sequences containing mismatches or deletions). Computed T_m can be calculated using available software and methods known to the art. In some embodiments, the duplex T_m is measured empirically, by measuring the degree of hybridization at various temperatures.
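The disclosure does not specify a formula for the computed T_m. Purely as an illustration, the Python sketch below uses one commonly quoted salt-adjusted empirical approximation, T_m ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 675/N, together with an assumed rule of thumb of roughly 1° C. lost per 1% of mismatched positions. The function name `approx_tm`, its parameters, and both constants are assumptions of this example; published variants of the formula differ, and nearest-neighbor thermodynamic models are generally more accurate.

```python
import math


def approx_tm(seq: str, na_molar: float = 1.0, mismatches: int = 0) -> float:
    """Rough duplex T_m estimate in deg C (illustrative approximation only).

    seq        -- probe sequence (A/C/G/T)
    na_molar   -- monovalent cation concentration in mol/L (assumed default)
    mismatches -- number of mismatched positions in the hybrid
    """
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    gc_percent = 100.0 * sum(base in "GCgc" for base in seq) / n
    tm = 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n
    # Assumed rule of thumb: about 1 deg C lost per 1% mismatched positions.
    return tm - 100.0 * mismatches / n
```

Under these assumptions, a 60-mer with no G or C bases at 1 M salt evaluates to 70.25° C., and six mismatches lower that estimate by 10° C.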

Approaches and Methods for Target-Specific Probe Selection

The present methods provide alternative and novel methods and systems for designing or modifying target-specific probes for analysis such as CGH and location analysis in microarray applications that overcome the drawbacks of existing microarray probe selection techniques. General methods that utilize probe/target hybridization experiments and/or unique data analysis techniques to identify and select nucleotide probe(s) targeting polynucleotide fragments in a region of interest are described in U.S. Patent Publication No. 2006/0110744. The methods described herein provide for a more efficient design or modification of target-specific probes, thereby reducing to a minimum the number of such probes that are utilized in analyzing a target sequence within a region of interest.

The present description provides methods, systems and computer readable media for identifying and modifying nucleic acid probes for detecting a target with a nucleic acid probe array or microarray. In some embodiments, the methods comprise, in general terms: the selection of genomic nucleotide ranges of interest, determining appropriate target sequences for CGH and/or location analysis, generating candidate probes specific for the target sequences and analyzing candidate probes for specific probe properties by computational and/or experimental processes to optimize probe selection and reduce the number of probes to a value appropriate for placement on a microarray. The description also provides microarrays comprising probes designed or modified by the methods described herein. The microarrays comprise a solid support and a plurality of surface bound probes, the surface bound probes having very similar thermodynamic properties as well as similar GC content. In some embodiments, more specifically, a large portion of the probes utilized in the microarrays of the invention have duplex melting temperatures (T_m) which are within a narrow temperature range compared to the T_m range of possible genomic probes, such as, for example, genomic probes selected at random for a particular application.

The methods provided herein are particularly useful with comparative genome hybridization microarrays, such as microarrays based on the human or mouse genome. These methods permit more cost-effective and efficient identification of gene regions or sections which can be associated with human disease, points of therapeutic intervention, and potential toxic side-effects of proposed therapeutic entities.

The methods as described herein are methods for designing or modifying a target-specific oligonucleotide probe to increase the specific hybridization of the probe to the target sequence by choosing or designing probes that discriminate between the target nucleotide sequences and competing non-target sequences. In some embodiments, a method comprises designing or modifying a target-specific oligonucleotide probe comprising identifying a target-specific oligonucleotide probe comprising a sequence complementary to a target nucleotide sequence of interest. In some embodiments, the target-specific oligonucleotide and the target nucleotide sequence pair has a computed T_m of 65° C. or greater. A method further comprises modifying the sequence of the identified target-specific oligonucleotide probe to decrease the computed T_m so that the modified target-specific oligonucleotide probe hybridizes to at least one non-target sequence with a computed T_m lower than the computed T_m of the hybridization of the modified target-specific oligonucleotide probe to the target nucleotide sequence.

Identifying Target-Specific Oligonucleotide Probes

Target-specific oligonucleotide probes can be designed or selected by first identifying a target nucleotide sequence of interest. Target nucleotide sequences of interest include genomic nucleic acid sequences, RNA sequences from a cell, a particular gene, one or more regions of a chromosome, and the like.

Candidate target-specific probe sequences can be identified from a plurality of candidate probe sequences by searching for sequences with a high homology to the target sequence of interest and identifying probes that can hybridize to the target sequence as well as one or more homologous non-target sequences. In some embodiments, a candidate target-specific oligonucleotide probe hybridizes or is complementary to the target sequence and at least two non-target sequences. In some embodiments, candidate probes that are too closely homologous to the non-target sequences, for example, those with greater than 50% sequence identity to one or more non-target sequences, are excluded. If a candidate target-specific probe sequence has a higher than desired computed T_m or higher homology than desired to non-target sequences, the probe can be designed or modified using the methods described herein.

As indicated in operation 102 in FIG. 1, a computerized algorithm can be used to search for all sequences homologous to the candidate probe. In an aspect, homology algorithms can be used to interrogate known gene databases for naturally occurring sequences that are closest to the original (or candidate) probe sequence. Known homology algorithms or search engines that can be used with the present methods include BLAST (from NCBI, see S. F. Altschul et al., J. Mol. Biol. 215:403-10 (1990)), MegaBLAST (a variation on the BLAST search engine), BLAT, etc. A number of other homology-based algorithms are also known, such as thermodynamically-scored homology programs, for example, as described in U.S. Pat. No. 5,556,749.

In embodiments, the methods described herein use a homology search engine or algorithm or database that returns sequences with the lowest number of mismatched nucleotides against the candidate probe being subjected to T_m-matching. In an aspect, priority is given to those sequences within reasonably homologous regions or strains of the same or similar genomes. The search may also compare sequences based on specific factors, including thermodynamic factors, such as free energy. In an aspect, the search comprises homologous sequences with a computed T_m substantially the same as the predetermined T_m for the hybrid pair formed between the candidate probe and the target sequence.
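To make the idea of "sequences with the lowest number of mismatched nucleotides" concrete, the following Python sketch performs a brute-force, ungapped scan of a sequence for the windows closest to a query. This is a toy stand-in chosen for illustration, not BLAST, MegaBLAST, or BLAT, and it would be far too slow for a real genome; the function name `closest_windows` and its parameters are hypothetical.

```python
from typing import List, Tuple


def closest_windows(query: str, genome: str, top_n: int = 3) -> List[Tuple[int, int]]:
    """Return up to `top_n` (mismatches, offset) pairs, fewest mismatches first.

    Every window of len(query) bases in `genome` is compared position by
    position with `query`; gaps are not considered.
    """
    k = len(query)
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        mismatches = sum(a != b for a, b in zip(query, window))
        hits.append((mismatches, start))
    hits.sort()  # orders by mismatch count, then by offset
    return hits[:top_n]
```

The query passed in would be the target-strand sequence corresponding to the candidate probe, so the best hit (zero mismatches) is the intended target and the next hits are the closest potential competitors.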

Within the complex sample mixture, there may be nucleotides or nucleotide sequences that are most likely to form hybrids with the candidate probes (i.e. most likely to interfere or compete with probe-target hybridization), and thereby cause problems in probe design. In an aspect, using a homology search engine to identify sequences with homology to the candidate probe is advantageous, because all potential sequences within sequenced genomes can be identified. This allows a searcher to take into account all sequences that would potentially occur in an actual biological sample.

Various homology scoring mechanisms can be used to evaluate whether a particular competitor sequence is sufficiently homologous to the candidate probe. These include, without limitation, symbolic match score (score is based on the number of identical bases in a position-by-position comparison between the sequence of interest and a putative homologous sequence), ungapped BLAST score, gapped BLAST scores, thermodynamic scores, and score thresholds, for example. In aspects, duplex melting temperature or T_m is used to determine homology. In embodiments, sequences are selected based on the homology score. In an aspect, where symbolic match scores or BLAST scores are used, the sequences with the highest score are selected. Where T_m is used, sequences that show T_m differences of less than about 30° C. are selected for further analysis.
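Two of the scoring mechanisms named above are simple enough to sketch directly. The Python below shows a symbolic match score for equal-length sequences and a T_m-difference filter using the 30° C. figure from the text. The function names, the dictionary-based input, and the use of an absolute difference are assumptions of this example.

```python
from typing import Dict, List


def symbolic_match_score(a: str, b: str) -> int:
    """Count identical bases in a position-by-position comparison."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


def select_by_tm_difference(target_tm: float,
                            competitor_tms: Dict[str, float],
                            max_delta: float = 30.0) -> List[str]:
    """Keep competitors whose hybrid T_m is within `max_delta` deg C of the target's."""
    return [name for name, tm in competitor_tms.items()
            if abs(target_tm - tm) < max_delta]
```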

In some embodiments, candidate target-specific oligonucleotide probes are identified from a plurality of candidate probe sequences by searching for sequence homology to the target sequence and then further filtered using one or more criteria. For comparative genomic hybridization and location analysis, the candidate target-specific probes are identified and filtered as described in U.S. Publication No. 2006/0110744 which is hereby incorporated by reference.

In some embodiments, the candidate oligonucleotide probes have a nucleotide length in the range of at least 25 to about 200 nucleotides. In some embodiments, the length of the probes ranges from 30 to 100 nucleotides. In some embodiments, at least 50% of the nucleotide probes on the solid support have the same length and the length may be about 60 nucleotides. In some embodiments, candidate probes may have a subsequence less than the full length sequence that has a greater degree of homology to the target sequence and/or one or more target sequences. The subsequences can range from about 15 to 190 nucleotides, about 25 to 150 nucleotides, or about 25 to 55 nucleotides.

In some embodiments, a candidate target-specific oligonucleotide probe has a computed T_m of 65° C. or greater to the target nucleotide sequence. In some embodiments, a candidate probe may have a higher than desired T_m due to a subsequence that has a higher percentage of GC content, for example, 40% or greater GC content over at least 15-25 nucleotides of the probe. Subsequences of higher GC % content can serve as nucleation sites for non-specific binding to non-target nucleotide sequences. In some embodiments, a candidate probe has a T_m for the target sequence of 65 to 95° C. In other embodiments, a candidate target-specific oligonucleotide probe has a T_m of 70 to 95° C., about 75 to 95° C., about 80 to 95° C., or about 85° C. or greater.
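As an illustration of how GC-rich subsequences of a probe might be located, the following sketch slides a fixed-length window along the probe and reports the windows whose GC fraction meets a threshold. The 15-nucleotide window and the 40% threshold are taken from the ranges given above; the function names and the example probe are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(probe: str, window: int = 15, min_gc: float = 0.40) -> list:
    """Return (start, GC fraction) for every window that meets the threshold."""
    hits = []
    for start in range(len(probe) - window + 1):
        frac = gc_fraction(probe[start:start + window])
        if frac >= min_gc:
            hits.append((start, frac))
    return hits


# Hypothetical 35-nucleotide probe with a 15-nucleotide GC-only core.
probe = "A" * 10 + "GC" * 7 + "G" + "T" * 10
hits = gc_rich_windows(probe)
richest = max(hits, key=lambda hit: hit[1])
print(richest)
```

The richest window is a natural place to look for candidate positions to substitute or delete.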

In a method of the disclosure, the sequence of a target-specific oligonucleotide probe is designed or modified to decrease computed T_m to the target nucleotide sequence so that the modified target-specific oligonucleotide probe hybridizes to at least one non-target nucleotide sequence with a computed T_m the same as or lower than the computed T_m for hybridization to the target nucleotide sequence. In some embodiments, the computed T_m of the modified or designed probes with at least one non-target sequence is 65° C. or less.

In some embodiments, designing or modifying the target-specific oligonucleotide probe further comprises identifying the at least one non-target nucleotide sequence that is homologous to the complement of the probe sequence, which is the same length as the region of the target sequence that is complementary to the probe sequence. At least one nucleotide in the target-specific oligonucleotide probe is deleted or substituted in a region of the target-specific oligonucleotide probe that is complementary to or hybridizes to a sequence of the non-target sequence that has the highest homology to the target nucleotide sequence to form a first modified probe. In some embodiments, the sequence of highest homology is a region or subsequence of the non-target sequence. In some embodiments, the non-target sequence is longer than the target sequence, the same length as the target sequence or shorter than the target sequence. The % homology between target and at least one non-target nucleotide sequence can be determined using standard methods, such as BLAST. When the target and non-target sequence are the same length, the region of highest homology between them can include about 15 nucleotides up to the full-length sequence.
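When the target and non-target sequences are the same length, the region of highest homology can be located with a simple sliding-window count of identical positions. The sketch below is a simplification that assumes an ungapped, position-by-position alignment; the text above names BLAST as a standard method, and this code does not reproduce BLAST. The function name and sequences are hypothetical.

```python
def best_homology_window(target: str, non_target: str, window: int = 15) -> tuple:
    """Return (start, matches) for the aligned window with the most identical bases."""
    if len(target) != len(non_target):
        raise ValueError("this sketch assumes sequences of equal length")
    if window > len(target):
        raise ValueError("window is longer than the sequences")
    matches = [int(a == b) for a, b in zip(target.upper(), non_target.upper())]
    current = sum(matches[:window])
    best_start, best = 0, current
    for start in range(1, len(matches) - window + 1):
        # Slide the window one position: add the new base, drop the old one.
        current += matches[start + window - 1] - matches[start - 1]
        if current > best:
            best_start, best = start, current
    return best_start, best


# Hypothetical sequences: the first 10 positions differ, the last 15 are identical.
target = "ACGTACGTAC" + "GGCCGGCCGGCCGGC"
non_target = "TGCATGCATG" + "GGCCGGCCGGCCGGC"
region = best_homology_window(target, non_target, window=15)
print(region)
```

The returned start position identifies the part of the probe in which a deletion or substitution would affect the competing hybrid.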

In some embodiments, a subsequence is at least about 15 nucleotides, about 20 nucleotides, or about 25 nucleotides or greater. In some embodiments, the subsequence can have a % GC content of at least 40% to about 100%, 50% to 100%, 60 to 100%, 70 to 100%, 80 to 100%, or 90 to 100%. In some embodiments, the at least one nucleotide that is deleted or substituted is located in the middle of the subsequence or the non-target sequence. In other embodiments, the at least one nucleotide is substituted to eliminate a GC pair. In some embodiments, the at least one nucleotide to be substituted or deleted in the target-specific oligonucleotide probe is complementary to both the target nucleotide sequence and the at least one non-target sequence. The nucleotide can be substituted or deleted. More than one nucleotide in the target-specific oligonucleotide probe can be substituted or deleted until the desired number of non-target sequences have a computed T_m lower than the computed T_m for the target sequence.

In some embodiments of a method, the target-specific oligonucleotide can be designed or modified to decrease the computed T_m to at least one non-target sequence to the same or lower than the computed T_m to the target sequence. Decreasing the T_m to non-target sequences increases the specific hybridization of the probe to the target sequence by decreasing the number of non-target sequences that compete with the target sequence for hybridization to the probe.
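The effect of a single substitution that eliminates a GC pair can be sketched with a deliberately crude stability score. The score below is a Wallace-rule style stand-in (4 units per matched G/C position, 2 per matched A/T position, 0 per mismatch), not the T_m computation used in the described methods, which is not specified here; a practical design would use a proper thermodynamic model. As a further assumption, the target and non-target sites are written in the same sense as the probe, so that identical letters represent a base pair. All names, sequences, and the assay threshold are hypothetical.

```python
def toy_tm(probe: str, site: str) -> int:
    """Crude stability score: 4 per matched G/C, 2 per matched A/T, 0 per mismatch."""
    return sum(4 if p in "GC" else 2 for p, s in zip(probe, site) if p == s)


def substitute(probe: str, position: int, new_base: str) -> str:
    """Return a copy of probe with one base replaced."""
    return probe[:position] + new_base + probe[position + 1:]


# Sites written in probe sense, so identity stands for a base pair (assumption).
target_site = "ATGGCGCGCCAT"
non_target_site = "TTGGCGCGCCAA"   # mismatches the probe at both ends
assay_threshold = 35               # arbitrary units, hypothetical

probe = target_site                # the unmodified probe matches the target fully
before = (toy_tm(probe, target_site), toy_tm(probe, non_target_site))

middle = len(probe) // 2           # position 6 holds a C paired in both hybrids
modified = substitute(probe, middle, "A")
after = (toy_tm(modified, target_site), toy_tm(modified, non_target_site))

print(before, after)
```

In this toy case both hybrids lose the same amount of stability, but only the competing hybrid, which already carried mismatches, falls below the threshold.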

In some embodiments, the first modified probe may be further modified by substituting or deleting at least one nucleotide in a region of the first modified probe that is complementary to a region of a second non-target sequence that has the second most homology to the target nucleotide sequence. For example, once a target nucleotide sequence is identified, the sequence can be used to search for and identify homologous sequences that form a plurality of target-specific candidate probe sequences that are complementary to the target and non-target sequences. The plurality of homologous sequences includes both target and non-target sequences. The plurality of homologous sequences comprises sequences that have at least 50%, 60%, 70%, 80%, 90%, 95% or 100% sequence identity to the target nucleotide sequence. In some embodiments, the region of greatest homology between a target and a non-target sequence is at least 15 nucleotides to 200 nucleotides, 20 nucleotides to 100 nucleotides, or 25 to 50 nucleotides. Such searching for sequence identity can be conducted using methods and databases known in the art. From the plurality of candidate probe sequences, candidate probes can be designed or modified to have optimal hybridization to the target sequence while minimizing hybridization to non-target sequences that might be in the sample. In some embodiments, the probe is designed or modified so that at least one nucleotide in a region of the probe that is complementary to a region of high homology between at least one or two non-target sequences and the target sequence is substituted or deleted.

In some embodiments, a method further comprises identifying a second non-target sequence that has the second most homology to the target nucleotide sequence and modifying the first modified target-specific oligonucleotide probe by substituting or deleting at least one nucleotide in the probe sequence that is complementary to a region of the second non-target sequence that has the second most homology to the target nucleotide sequence to form a second modified probe. The second modified probe has a computed T_m to the second non-target nucleotide sequence that is the same or lower than the computed T_m of the second modified probe to the target nucleotide sequence.

Multiple modifications can be made in accord with the method described above until the desired number of non-target sequences no longer hybridize to the target-specific oligonucleotide under conditions of the assay or have a lower computed T_m to the designed or modified probe. Typically, that means that the computed T_m of the target-specific oligonucleotide to at least one of the non-target nucleotide sequences is less than 65° C. In some embodiments, the computed T_m of the target-specific oligonucleotide probe to the target nucleotide sequence is at least 65° C. or greater, 65 to 95° C., 70 to 95° C., about 75 to 95° C., 80 to 95° C., or 85° C. or greater. In some embodiments of the method described above, modifications are made until the computed T_m of the target-specific oligonucleotide probe to at least one non-target nucleotide sequence decreases at least 1° C., 2° C., 3° C., 4° C. or 5° C. as compared to the computed T_m to the target sequence. In some embodiments, the computed T_m to at least one non-target sequence is decreased at least 1 to about 25° C., 1 to about 20° C., 1 to about 15° C., 1 to about 10° C., or 1 to about 5° C. as compared to that of the computed T_m to the target nucleotide sequence.

In some embodiments, a method further comprises identifying at least one homologous nucleotide sequence to the first modified target-specific oligonucleotide probe and identifying a second non-target sequence that has the most homology to the complement of the first modified target-specific oligonucleotide probe. The method further comprises modifying the first modified probe by substituting or deleting at least one nucleotide in the sequence of the first modified probe that is complementary to a region of the second non-target sequence that has the most homology to the complement of the first modified probe to form a second modified probe having a computed T_m for hybridization to the second non-target sequence that is the same or lower than that of the T_m to the target nucleotide sequence. Multiple modifications to the target-specific oligonucleotide probe can be made until the desired T_m to at least one non-target sequence is obtained as described above.
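The repeated modification described above can be sketched as a small loop: re-rank the competing sequences against the current probe, pick the strongest competitor, substitute one base that pairs with both the target and that competitor, and stop once every competitor falls below the assay threshold. This is a sketch under stated assumptions, not the method as implemented by the inventor. The stability score is a crude Wallace-rule style stand-in for a computed T_m, sites are written in probe sense so that identity stands for a base pair, and the position-selection rule (prefer G/C, then the position nearest the middle) is one hypothetical reading of the preferences stated above. All names and sequences are invented.

```python
def toy_tm(probe: str, site: str) -> int:
    """Crude stability score: 4 per matched G/C, 2 per matched A/T, 0 per mismatch."""
    return sum(4 if p in "GC" else 2 for p, s in zip(probe, site) if p == s)


def destabilize(probe, target_site, non_targets, threshold, max_edits=5):
    """Substitute bases until all non-target scores drop below threshold."""
    edits = []
    for _ in range(max_edits):
        # Re-rank the competitors against the current (possibly modified) probe.
        worst = max(non_targets, key=lambda site: toy_tm(probe, site))
        if toy_tm(probe, worst) < threshold:
            break
        # Positions still paired in both the target and the competing hybrid.
        shared = [i for i, (p, t, n) in enumerate(zip(probe, target_site, worst))
                  if p == t == n]
        if not shared:
            break
        centre = (len(probe) - 1) / 2
        # Prefer eliminating a G/C pair, then the position closest to the middle.
        pos = min(shared, key=lambda i: (probe[i] not in "GC", abs(i - centre)))
        new_base = "T" if probe[pos] == "A" else "A"
        candidate = probe[:pos] + new_base + probe[pos + 1:]
        if toy_tm(candidate, target_site) < threshold:
            break  # a further edit would also lose the target hybrid
        edits.append((pos, probe[pos], new_base))
        probe = candidate
    return probe, edits


target_site = "ATGGCGCGCCAT"
non_targets = ["TTGGCGCGCCAA", "ATGGCGCGTTAT"]
final_probe, edits = destabilize(target_site, target_site, non_targets, threshold=34)
print(final_probe, edits)
```

In this example one substitution is enough: the target score falls from 40 to 36 and stays above the threshold of 34, while the two competitors fall to 32 and 28.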

Arrays

The present description also provides nucleic acid microarrays produced using the subject methods, as described herein. The subject arrays include at least two distinct nucleic acids that differ by monomeric sequence immobilized on, e.g., covalently to, different and known locations on the substrate surface. In certain embodiments, each distinct nucleic acid sequence of the array is typically present as a composition of multiple copies of the polymer on the substrate surface, e.g., as a spot on the surface of the substrate. The number of distinct nucleic acid sequences, and hence spots or similar structures, present on the array may vary, but is generally at least 2, usually at least 5 and more usually at least 10, where the number of different spots on the array may be as high as 50, 100, 500, 1000, 10,000 or higher, depending on the intended use of the array. The spots of distinct polymers present on the array surface are generally present as a pattern, where the pattern may be in the form of organized rows and columns of spots, e.g., a grid of spots, across the substrate surface, a series of curvilinear rows across the substrate surface, e.g., a series of concentric circles or semi-circles of spots, and the like. The density of spots present on the array surface may vary, but will generally be at least about 10 and usually at least about 100 spots/cm², where the density may be as high as 10⁶ or higher, but will generally not exceed about 10⁵ spots/cm². In other embodiments, the polymeric sequences are not arranged in the form of distinct spots, but may be positioned on the surface such that there is substantially no space separating one polymer sequence/feature from another. An exemplary array is described in U.S. Patent Publication No. 20050095596, which is incorporated herein by reference.

Arrays can be fabricated using drop deposition from pulsejets of either polynucleotide precursor units (such as monomers) in the case of in situ fabrication, or the previously obtained polynucleotide. Such methods are described in detail in, for example, the previously cited references including U.S. Pat. Nos. 6,242,266, 6,232,072, 6,180,351, 6,171,797, 6,323,043, U.S. patent application Ser. No. 09/302,898 filed Apr. 30, 1999 by Caren et al., and the references cited therein. These references are incorporated herein by reference. Other drop deposition methods can be used for fabrication, as previously described herein.

A feature of the subject arrays is that they include one or more, usually a plurality of, oligonucleotide probes. The oligonucleotide probes selected according to the subject methods are suitable for use in a plurality of different gene expression or genomic microarray applications.

In using an array, the array will typically be exposed to a sample (for example, a fluorescently labeled analyte, such as a sample containing genomic DNA) and the array then read. Reading of the array may be accomplished by illuminating the array and reading the location and intensity of resulting fluorescence at each feature of the array to detect any binding complexes on the surface of the array. For example, a scanner may be used for this purpose that is similar to the AGILENT MICROARRAY SCANNER available from Agilent Technologies, Palo Alto, Calif. Other suitable apparatus and methods are described in U.S. patent applications: Ser. No. 09/846,125 “Reading Multi-Featured Arrays” by Dorsel et al.; and Ser. No. 09/430,214 “Interrogating Multi-Featured Arrays” by Dorsel et al. As previously mentioned, these references are incorporated herein by reference. However, arrays may be read by any other method or apparatus than the foregoing, with other reading methods including other optical techniques (for example, detecting chemiluminescent or electroluminescent labels) or electrical techniques (where each feature is provided with an electrode to detect hybridization at that feature in a manner disclosed in U.S. Pat. No. 6,221,583 and elsewhere). Results from the reading may be raw results (such as fluorescence intensity readings for each feature in one or more color channels) or may be processed results such as obtained by rejecting a reading for a feature which is below a predetermined threshold and/or forming conclusions based on the pattern read from the array (such as whether or not a particular target sequence may have been present in the sample or an organism from which a sample was obtained exhibits a particular condition).

The results of the reading (processed or not) may be forwarded (such as by communication) to a remote location if desired, and received there for further use (such as further processing). By “remote location” is meant a location other than the location at which the array is present and hybridization occurs. For example, a remote location could be another location (e.g. office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. As such, when one item is indicated as being “remote” from another, what is meant is that the two items are at least in different buildings, and may be at least one mile, ten miles, or at least one hundred miles apart. “Communicating” information means transmitting the data representing that information as electrical signals over a suitable communication channel (for example, a private or public network). “Forwarding” an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible) and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data. The data may be transmitted to the remote location for further evaluation and/or use. Any convenient telecommunications means may be employed for transmitting the data, e.g., facsimile, modem, internet, etc.

Designing a microarray involves determining the amount of “real estate” (number of probes) that is available for the final array. The array designer also determines the amount of probes or “real estate” to use for specified regulatory regions and intergenic regions, as well as the amount of probes necessary to adequately cover introns and exons of the chromosomes of interest. Initially, a designer will generate 20 to 40 million candidate probes and will need to filter the probes for certain probe properties or parameters to obtain a final array with approximately 250,000 probes. Intermediate arrays are manufactured in some embodiments of the methods of the invention, which have a redundancy of 3 or 4 fold over the number of probes selected for the final array. These intermediate arrays are utilized to screen candidate probes for certain probe properties by direct or indirect experimentation.
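The scale of this filtering can be made concrete with a short calculation. It uses only the figures given above and assumes that a redundancy of 3 or 4 fold means three or four times the number of probes on the final array; the variable names are illustrative.

```python
final_probes = 250_000
candidate_range = (20_000_000, 40_000_000)
redundancy_range = (3, 4)

# Probes screened on the intermediate arrays, under the stated assumption.
intermediate_range = tuple(final_probes * fold for fold in redundancy_range)

# Fraction of the initial candidates that reaches the final array.
kept_fraction = tuple(final_probes / n for n in candidate_range)

print(intermediate_range)
print([f"{fraction:.3%}" for fraction in kept_fraction])
```

Under that reading, the intermediate arrays carry 750,000 to 1,000,000 probes, and roughly 0.6% to 1.25% of the initial candidates reach the final array.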

Standard hybridization techniques (using high stringency hybridization conditions) are used to probe the subject array. Suitable methods are described in references describing, for example, CGH techniques (Kallioniemi et al., Science 258:818-821 (1992) and WO 93/18186). Several guides to general techniques are available, e.g., Tijssen, Hybridization with Nucleic Acid Probes, Parts I and II (Elsevier, Amsterdam 1993). For descriptions of techniques suitable for in situ hybridizations, see Gall et al. Meth. Enzymol. 21:470-480 (1981) and Angerer et al. in Genetic Engineering: Principles and Methods (Setlow and Hollander, eds.), vol. 7, pp. 43-65 (Plenum Press, New York 1985). See also U.S. Pat. Nos. 6,335,167; 6,197,501; 5,830,645; and 5,665,549; the disclosures of which are incorporated herein by reference.

In embodiments, the present description provides methods for selecting oligonucleotide probes that are specific to a target nucleic acid sequence within a region of interest. The probes are selected by a method that optimally discriminates between desired homologous target sequences and competing undesired sequences within a genome, transcription, or other known complex background, for example. Discrimination between the target of interest and competing sequences is optimal (i.e. most sensitive) when the hybridization conditions are such that the hybrid formed between the probe and the target sequence is stable, while hybrids between the probe and one or more competing sequences are less stable. Such unstable hybrids are melted off at the temperature at which the hybridization is performed. A hybrid pair with a melting temperature (T_m) less than the temperature of the stringent hybridization conditions of the assay (e.g. 65° C.) is considered unstable, and hybrid pairs melting above the temperature of the assay are stable. In many cases, however, the undesired targets form hybrid pairs with the probe which melt at too high a temperature (i.e. high T_m), and interfere with the assay. This occurs particularly in GC-rich (“hot”) regions of the genome, where in many cases no probes can be found with the desired probe-target T_m. One method of designing probes for targets in hot regions is to destabilize the probe-target hybrid by truncation, deletion of nucleotides, or mismatching nucleotides. It is important, however, to destabilize not only the desired probe-target hybrid, but also hybrids with undesired competing targets.
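The stability rule stated above can be expressed as a simple comparison between a computed T_m and the assay temperature. In the sketch below the T_m values are invented for illustration, the names are hypothetical, and a hybrid whose T_m equals the assay temperature is treated as stable, which is an assumption because the text does not address that boundary case.

```python
ASSAY_TEMP_C = 65.0  # example stringent hybridization temperature from the text


def is_stable(tm_c: float, assay_temp_c: float = ASSAY_TEMP_C) -> bool:
    """A hybrid is treated as stable if it melts at or above the assay temperature."""
    return tm_c >= assay_temp_c


# Hypothetical computed T_m values (degrees C) for one probe against three sequences.
computed_tm_c = {"target": 78.0, "competitor_a": 71.5, "competitor_b": 58.0}

interfering = [name for name, tm in computed_tm_c.items()
               if name != "target" and is_stable(tm)]
print(interfering)
```

Any competitor that appears in the resulting list is a candidate for selective destabilization of the probe.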

Systems

The methods described herein are carried out in part with the aid of a computer-based system, driven by software specific to the methods. A “computer-based system” refers to the hardware, software, and data storage used to analyze the information of the present disclosure. Typical hardware of the computer-based systems of the present disclosure comprises a central processing unit (CPU), input, output, and data storage. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present disclosure. The data storage means may comprise any manufacture comprising a recording of the present information as described above, or a memory access means that can access such a manufacture. In certain instances a computer-based system may include one or more wireless devices.

To “record” data, programming or other information on a computer-readable medium refers to a process for storing information on a recordable storage medium, using any such methods as known in the art. Examples include magnetic media such as hard drives, tapes, disks, and the like. Optical media can include CDs, DVDs, and the like. Any convenient data storage structure may be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g., word processing text file, database format, etc.

A “processor” references any hardware and/or software combination that will perform the functions required of it. For example, any processor herein may be a programmable digital microprocessor such as available in the form of an electronic controller, mainframe, server or personal computer (desktop or portable). Where the processor is programmable, suitable programming can be communicated from a remote location to the processor, or previously saved in a computer program product (such as a portable or fixed computer readable storage medium, whether magnetic, optical or solid state device based). For example, a magnetic medium or optical disk may carry the programming, and can be read by a suitable reader communicating with each processor at its corresponding station.

In aspects, the methods described herein are performed using computer-readable media containing programming stored thereon implementing the subject methods. The computer-readable media may be, for example, in the form of a computer disk or CD, a floppy disk, a magnetic “hard card”, a server, or any other computer-readable media capable of containing data or the like, stored electronically, magnetically, optically or by other means. Accordingly, stored programming embodying steps for carrying out the subject methods may be transferred to a computer such as a personal computer (PC), (i.e. accessible by a researcher or the like), by physical transfer of a CD, floppy disk, or like medium, or may be transferred using a computer network, server, or any other interface connection, e.g., the Internet.

In an embodiment, the system described herein may include a single computer or the like with a stored algorithm capable of evaluating probe performance, as described herein, i.e. a computational analysis system that performs statistical regression analysis on a set of training data. In certain embodiments, the system is further characterized in that it provides a user interface, where the user interface presents to a user the option of selecting among one or more different, or multiple different inputs. For example, in the systems described herein, the user has the option of selecting various predictive parameters, such as composition factors, thermodynamic factors, kinetic factors, and mathematical combinations of such factors, as well as analogous parameters for the intended genomic targets. Computational systems that may be readily modified to become systems of the subject invention include those described in U.S. Pat. No. 6,251,588, the disclosure of which is incorporated herein by reference.

The various embodiments described above are provided by way of illustration only and should not be construed to limit the invention. Those skilled in the art will readily recognize various modifications and changes that may be made to the present invention without following the example embodiments and applications illustrated and described herein, and without departing from the true spirit and scope of the present invention, which is set forth in the following claims.

Claims

1. A method for designing a target-specific oligonucleotide probe comprising:

a) identifying a target-specific oligonucleotide probe comprising a sequence complementary to a target nucleotide sequence of interest and that has a computed Tm of about 65° C. or greater; and

b) modifying the sequence of the identified target-specific oligonucleotide probe to decrease the computed Tm so that the modified target-specific oligonucleotide probe hybridizes to at least one non-target nucleotide sequence with a computed Tm lower than the computed Tm of the hybridization of the modified target-specific oligonucleotide probe to the target nucleotide sequence.

2. The method of claim 1, wherein the target-specific oligonucleotide probe hybridizes to the target nucleotide sequence at a computed Tm at about 75° C. or greater.

3. The method of claim 1, wherein the target-specific oligonucleotide probe can hybridize to the target nucleotide sequence and one or more non-target nucleotide sequences.

4. The method of claim 1, wherein designing the target-specific oligonucleotide probe further comprises:

a) identifying the at least one non-target nucleotide sequence that is homologous to the complement of the probe sequence; and

b) deleting or substituting at least one nucleotide in a region of the target-specific oligonucleotide probe that is complementary to a region of the non-target nucleotide sequence that has the highest homology to the target nucleotide sequence to form a first modified target-specific oligonucleotide probe having a computed Tm to the at least one non-target nucleotide sequence that is the same or lower than the computed Tm of the first modified target-specific oligonucleotide probe to the target nucleotide sequence.

5. The method of claim 4, further comprising:

c) identifying a second non-target nucleotide sequence that has the second most homology to the target nucleotide sequence; and

d) modifying the first modified target-specific oligonucleotide probe by substituting or deleting at least one nucleotide in the sequence that is complementary to a region of the second non-target nucleotide sequence that has the second most homology to the target nucleotide sequence to form a second modified probe having a computed Tm to the second non-target nucleotide sequence that is the same or lower than the computed Tm of the second modified probe to the target nucleotide sequence.

6. The method of claim 4, further comprising:

a) identifying at least one homologous nucleotide sequence to the first modified target-specific oligonucleotide probe and identifying a second non-target nucleotide sequence that has the most homology to the complement of the first modified target-specific oligonucleotide probe; and

b) modifying the first modified target-specific oligonucleotide probe by substituting or deleting at least one nucleotide in the sequence of the first modified probe that is complementary to a region of the second non-target sequence that has the most homology to the complement of the first modified target specific oligonucleotide probe to form a second modified probe having a computed Tm to the second non-target nucleotide sequence the same or lower than that of the computed Tm of the second modified probe to the target nucleotide sequence.

7. The method of claim 1, wherein the target-specific oligonucleotide probe is complementary to the target nucleotide sequence and at least two other non-target nucleotide sequences.

8. The method of claim 1, wherein at least one nucleotide is deleted.

9. The method of claim 1, wherein at least one nucleotide is substituted.

10. The method of claim 1, wherein the target-specific oligonucleotide probe is at least 25 nucleotides long.

11. The method of claim 7, wherein the target nucleotide sequence and the at least two other non-target nucleotide sequences have at least 80% homology over at least 25 nucleotides.

12. The method of claim 1, wherein steps a) and b) are repeated until a computed Tm of the modified target-specific oligonucleotide probe to at least one of the non-target sequences has a decrease in computed Tm of at least 1° C.

13. The method of claim 1, wherein the region of the non-target sequence that has the most sequence homology to the target sequence has a % GC content of at least 40%.

14. The method of claim 13, wherein at least one nucleotide is substituted in the target-specific oligonucleotide probe to eliminate a GC pair.

15. The method of claim 1, wherein at least one nucleotide that is modified is located in the middle of the region of the target-specific oligonucleotide probe that is complementary to the region of most homology between the target and non-target sequence.