Scanned image alignment systems and methods
Systems and methods for aligning scanned images are provided. A pattern is included in the scanned image so that when the image is convolved with a filter, a recognizable pattern is generated in the convolved image. The scanned image may then be aligned according to the position of the recognizable pattern in the convolved image. The filter may also act to remove the portions of the scanned image that do not correspond to the pattern in the scanned image.
The present Application is a continuation of U.S. patent application Ser. No. 10/648,819, filed on Aug. 25, 2003, which is a continuation of U.S. patent application Ser. No. 09/542,151, filed on Apr. 4, 2000, now U.S. Pat. No. 6,611,767, issued on Aug. 26, 2003, which is a continuation of U.S. patent application Ser. No. 08/996,737, filed on Dec. 23, 1997, now U.S. Pat. No. 6,090,555, issued on Jul. 18, 2000, which claims the benefit of U.S. Provisional Application No. 60/069,032, filed on Dec. 11, 1997. U.S. patent application Ser. No. 10/648,819, filed on Aug. 25, 2003, is also a Continuation-in-Part of U.S. patent application Ser. No. 09/699,852, filed Oct. 30, 2000 (now U.S. Pat. No. 6,741,344), which is a Continuation of U.S. patent application Ser. No. 08/823,824, filed Mar. 25, 1997 (now U.S. Pat. No. 6,141,096), which is a Continuation of U.S. patent application Ser. No. 08/195,889, filed Feb. 10, 1994, (now U.S. Pat. No. 5,631,734). Each of the disclosures of the above-applications is incorporated by reference herein in its entirety.
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
SOFTWARE APPENDICES
A Software Appendix of source code for an embodiment of the invention including two (2) sheets is included herewith.
BACKGROUND OF THE INVENTION
The present invention relates to the field of image processing. More specifically, the present invention relates to computer systems for aligning grids on a scanned image of a chip including hybridized nucleic acid sequences.
Devices and computer systems for forming and using arrays of materials on a chip or substrate are known. For example, PCT applications WO92/10588 and 95/11995, both incorporated herein by reference for all purposes, describe techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to the methods of, for example, the pioneering techniques disclosed in U.S. Pat. Nos. 5,445,934, 5,384,261 and 5,571,639, each incorporated herein by reference for all purposes.
According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip. A labeled nucleic acid is then brought into contact with the chip and a scanner generates an image file (also called a cell file) indicating the locations where the labeled nucleic acids are bound to the chip. Based upon the image file and identities of the probes at specific locations, it becomes possible to extract information such as the nucleotide or monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to genetic diseases, cancers, infectious diseases, HIV, and other genetic characteristics.
The VLSIPS™ technology provides methods of making very large arrays of oligonucleotide probes on very small chips. See U.S. Pat. No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092, each of which is incorporated by reference for all purposes. The oligonucleotide probes on the DNA probe array are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the “target” nucleic acid).
For sequence checking applications, the chip may be tiled for a specific target nucleic acid sequence. As an example, the chip may contain probes that are perfectly complementary to the target sequence and probes that differ from the target sequence by a single base mismatch. For de novo sequencing applications, the chip may include all the possible probes of a specific length. The probes are tiled on a chip in rows and columns of cells, where each cell includes multiple copies of a particular probe. Additionally, “blank” cells may be present on the chip which do not include any probes. As the blank cells contain no probes, labeled targets should not bind specifically to the chip in this area. Thus, a blank cell provides a measure of the background intensity.
In the scanned image file, a cell is typically represented by multiple pixels. Although a visual inspection of the scanned image file may be performed to identify the individual cells in the scanned image file, it would be desirable to utilize computer-implemented image processing techniques to align the scanned image file.
SUMMARY OF THE INVENTION
Embodiments of the present invention provide innovative techniques for aligning scanned images. A pattern is included in the scanned image so that when the image is convolved with a filter, a recognizable pattern is generated in the convolved image. The scanned image may then be aligned according to the position of the recognizable pattern in the convolved image. The filter may also act to remove or “filter out” the portions of the scanned image that do not correspond to the pattern in the scanned image. Several embodiments of the invention are described below.
In one embodiment, the invention provides a computer-implemented method of aligning scanned images. The scanned image is convolved with a filter. The scanned image includes a first pattern that the filter will convolve into a second pattern in the convolved image. The scanned image is then aligned according to the position of the second pattern in the convolved image. In a preferred embodiment, the first pattern may be a checkerboard pattern that is convolved into a grid pattern in the convolved image.
In another embodiment, the invention provides a method of aligning scanned images of chips with hybridized nucleic acid sequences. A chip having attached nucleic acid sequences (probes) is synthesized, with the chip including a first pattern of nucleic acid sequences. Labeled nucleic acid sequences are hybridized to nucleic acid sequences on the chip and the hybridized chip is scanned to produce a scanned image. The scanned image is convolved with a filter that will convolve the first pattern into a second pattern in the convolved image. The scanned image is then aligned according to the position of the second pattern in the convolved image. In a preferred embodiment, the first pattern may be a checkerboard pattern that is generated by control nucleic acid sequences that hybridize to alternating squares in the checkerboard pattern.
Other features and advantages of the invention will become readily apparent upon review of the following detailed description in association with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Overview
In the description that follows, the present invention will be described in reference to preferred embodiments that utilize VLSIPS™ technology for making very large arrays of oligonucleotide probes on chips. However, the invention is not limited to images produced in this fashion and may be advantageously applied to other hybridization technologies or images in other technology areas. Therefore, the description of the embodiments that follows is for purposes of illustration and not limitation.
The system bus architecture of computer system 1 is represented by arrows 67. However, these arrows are illustrative of any interconnection scheme serving to link the subsystems. For example, a local bus could be utilized to connect the central processor to the system memory and display adapter. Computer system 1 shown in
The present invention provides methods of aligning scanned images or image files of hybridized chips including nucleic acid probes. In a representative embodiment, the scanned image files include fluorescence data from a biological array, but the files may also represent other data such as radioactive intensity, light scattering, refractive index, conductivity, electroluminescence, or large molecule detection data. Therefore, the present invention is not limited to analyzing fluorescence measurements of hybridization but may be readily utilized to analyze other measurements of hybridization.
For purposes of illustration, the present invention is described as being part of a computer system that designs a chip mask, synthesizes the probes on the chip, labels the nucleic acids, and scans the hybridized nucleic acid probes. Such a system is fully described in U.S. Pat. No. 5,571,639 that has been incorporated by reference for all purposes. However, the present invention may be used separately from the overall system for analyzing data generated by such systems.
The chip design files are provided to a system 106 that designs the lithographic masks used in the fabrication of arrays of molecules such as DNA. The system or process 106 may include the hardware necessary to manufacture masks 110 and also the computer hardware and software 108 necessary to lay the mask patterns out on the mask in an efficient manner. As with the other features in
The masks 110, as well as selected information relating to the design of the chips from system 100, are used in a synthesis system 112. Synthesis system 112 includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip 114. For example, synthesizer 112 includes a light source 116 and a chemical flow cell 118 on which the substrate or chip 114 is placed. Mask 110 is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through flow cell 118 for coupling to deprotected regions, as well as for washing and other operations. All operations are preferably directed by an appropriately programmed computer 119, which may or may not be the same computer as the computer(s) used in mask design and mask making.
The substrates fabricated by synthesis system 112 are optionally diced into smaller chips and exposed to marked targets. The targets may or may not be complementary to one or more of the molecules on the substrate. The targets are marked with a label such as a fluorescein label (indicated by an asterisk in
The image file 124 is provided as input to an analysis system 126 that incorporates the scanned image alignment techniques of the present invention. Again, the analysis system may be any one of a wide variety of computer system(s), but in a preferred embodiment the analysis system is based on a WINDOWS NT workstation or equivalent. The analysis system may analyze the image file(s) to generate appropriate output 128, such as the identity of specific mutations in a target such as DNA or RNA.
For de novo sequencing, a chip may be synthesized to include cells containing all the possible probes of a specific length. For example, a chip may be synthesized that includes all the possible 8-mer DNA probes. Such a chip would have 65,536 cells (4*4*4*4*4*4*4*4), with each cell corresponding to a particular probe. A chip may also include other probes including all the probes of other lengths.
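The cell count above can be checked with a short sketch. The Python below is illustrative only and is not part of the disclosed embodiment; the names BASES and all_probes are hypothetical. It enumerates every probe of a given length over the four DNA bases and counts them.

```python
from itertools import product

# The four DNA bases; each position in a probe can hold any one of them.
BASES = "ACGT"

def all_probes(length):
    """Yield every possible DNA probe of the given length as a string."""
    for combo in product(BASES, repeat=length):
        yield "".join(combo)

# One cell per probe, so the number of 8-mer cells is 4**8.
count = sum(1 for _ in all_probes(8))
print(count)  # 65536
```

The same count follows directly from 4 raised to the probe length, which is why the number of cells grows quickly as the probe length increases.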
At a step 203 the system determines which probes would be desirable on the chip, and provides an appropriate “layout” on the chip for the probes. The layout implements desired characteristics such as an arrangement on the chip that permits “reading” of genetic sequence and/or minimization of edge effects, ease of synthesis, and the like.
The masks for the chip synthesis are designed at a step 205. The masks are designed according to the desired chip characteristics and layout. At a step 207, the system synthesizes the DNA or other polymer chips. Software controls, among other things, the relative translation of the substrate and mask, the flow of the desired reagents through a flow cell, the synthesis temperature of the flow cell, and other parameters.
As shown, when the fluorescein-labeled (or otherwise marked) target 5′-TCTTGCA is exposed to the array, it is complementary only to the probe 3′-AGAACGT, and fluorescein will be primarily found on the surface of the chip where 3′-AGAACGT is located. The chip contains cells that include multiple copies of a particular probe. Thus, the image file will contain fluorescence intensities, one for each probe (or cell). By analyzing the fluorescence intensities associated with a specific probe, it becomes possible to extract sequence information from such arrays using the methods of the invention disclosed herein.
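As a minimal sketch of the base pairing in this example, the following Python derives the complementary probe for a target. The function name complementary_probe and the COMPLEMENT table are hypothetical and are introduced here only for illustration; standard Watson-Crick pairing (A with T, C with G) is assumed.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complementary_probe(target_5_to_3):
    """Return the probe, written 3'->5', that pairs base-by-base
    with a target written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in target_5_to_3)

probe = complementary_probe("TCTTGCA")
print("3'-" + probe)  # 3'-AGAACGT
```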
For ease of reference, one may call bases by assigning the bases the following codes:
Most of the codes conform to the IUPAC standard. However, code N has been redefined and code X has been added.
Scanned Image Alignment
Before the scanned image alignment techniques of the invention are discussed, it may be helpful to provide an overview of the overall process in one embodiment.
The hybridized chip is scanned at a step 259. For example, the hybridized chip may be laser scanned to detect where fluorescein-labeled sample fragments have hybridized to the chip. Numerous techniques may be utilized to label the sample fragments and the scanning process will typically be performed according to the type of label utilized. The scanning step produces a digital image of the chip.
In preferred embodiments, the scanned image of the chip includes varying fluorescent intensities that correspond to the hybridization intensity or affinity of the sample to the probes in a cell. In order to achieve more accurate results, it is beneficial to identify the pixels that belong to each cell on the chip. At an image alignment step 263, the scanned image is aligned so that the pixels that correspond to each cell can be identified. Optionally, the image alignment step includes the alignment of a grid over the scanned image (see
At a step 267, the analysis system analyzes the scanned image to calculate the relative hybridization intensities for each cell of interest on the chip. For example, the hybridization intensity for a cell, and therefore the relative hybridization affinity between the probe of the cell and the sample sequence, may be calculated as the mean of the pixel values within the cell. The pixel values may correspond to photon counts from the labeled hybridized sample fragments.
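The mean-of-pixels calculation described for step 267 might be sketched as follows. This is an illustration under stated assumptions, not the code of the Software Appendix: the image is assumed to be a list of rows of pixel values, each cell is assumed to be an axis-aligned rectangle whose position is already known from alignment, and the name cell_mean_intensity is hypothetical.

```python
def cell_mean_intensity(image, top, left, height, width):
    """Mean of the pixel values inside one rectangular cell.

    image is a list of rows of pixel values (e.g., photon counts);
    (top, left) is the cell's upper-left pixel.
    """
    pixels = [image[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    return sum(pixels) / len(pixels)

# A 4x4 image holding four 2x2 cells.
image = [
    [10, 12, 1, 2],
    [14, 12, 3, 2],
    [0, 1, 20, 22],
    [1, 2, 18, 20],
]
bright = cell_mean_intensity(image, 0, 0, 2, 2)  # 12.0
dark = cell_mean_intensity(image, 0, 2, 2, 2)    # 2.0
print(bright, dark)
```

The result is only as good as the cell boundaries passed in, which is why the alignment step precedes the intensity calculation.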
The cell intensities may be stored as a cell intensity file 269. In preferred embodiments, the cell intensity file includes a list of cell intensities for the cells. At an analysis step 271, the analysis system may analyze the cell intensity file and chip characteristics to generate results 273. The chip characteristics may be utilized to identify the probes that have been synthesized at each cell on the chip. By analyzing both the sequence of the probes and their hybridization intensities from the cell intensity file, the system is able to extract sequence information such as the location of mutations, deletions or insertions, or the sequence of the sample nucleic acid. Accordingly, the results may include sequence information, graphs of the hybridization intensities of probe(s), graphs of the differences between sequences, and the like. See U.S. patent application Ser. No. 08/327,525, which is hereby incorporated by reference for all purposes.
In order to align the scanned image, the invention provides a pattern in the scanned image that will be convolved into a recognizable pattern. In preferred embodiments, the pattern in the scanned image is a checkerboard pattern that is generated by synthesizing alternating cells that include probes that are complementary to a control nucleic acid sequence. The control nucleic acid sequence may be a known sequence that is labeled and hybridized to the chip for the purpose of aligning the scanned image. Additionally, the brightness of the cells complementary to the control nucleic acid sequence may be utilized as a baseline or for comparison to other intensities.
As an example,
With regard to
At a step 353, the convolved image is searched for bright areas. When the scanned image is convolved, the pattern(s) in the scanned image will be convolved into a recognizable pattern or patterns of bright areas. Accordingly, once bright areas are identified in the convolved image, the system confirms that the bright areas are in the expected recognizable pattern (e.g., a grid pattern) at a step 355.
In order to better understand what is meant by the different patterns,
Once a pixel is selected, neighbor pixels may then be selected at a step 503. By neighbor pixels, it is meant pixels that are near, but not necessarily adjacent to, the selected pixel. For example,
At a step 505, the average of the odd pixels and the average of the even pixels are determined. Referring again to
Pixel 1 is convolved into a convolved pixel in a convolved image by determining if the average of the odd pixels is greater than the average of the even pixels at a step 507. If the average of the odd pixels is greater, the convolved pixel is set equal to the intensity of the minimum of the odd pixels minus the intensity of the maximum of the even pixels at a step 509. Otherwise, the convolved pixel is set equal to the intensity of the minimum of the even pixels minus the intensity of the maximum of the odd pixels at a step 511.
Conceptually, the neighbor pixels may be thought of as being filtered, such as by a software filter in preferred embodiments. With the filter, the system is searching for a checkerboard pattern where all the odd pixels are either darker or lighter than the even pixels. Accordingly, averages of the odd and even pixels are calculated at step 505. Step 507 acts to determine if the pixels likely reflect a checkerboard pattern where the odd pixels, and therefore squares, are light (e.g., high intensity) or dark (e.g. low intensity). If the odd pixels likely reflect a checkerboard pattern where the odd pixels are light, step 509 sets the convolved pixel to the difference between selected odd and even pixels, where the selected odd pixel is the minimum of the odd pixels and the selected even pixel is the maximum of the even pixels. Step 511 is similar but reversed.
Therefore, at step 509, if all the odd pixels are much brighter than all the even pixels, the difference will be a larger value. Hence, the convolved pixel will be relatively bright (e.g., high intensity). The convolved pixel will also be relatively bright if all the even pixels are much brighter than all the odd pixels at step 511. However, if the difference at step 509 or 511 is very small (or negative), the convolved pixel will be set to a relatively dark intensity. Convolved pixels with negative pixel values may be set to zero in preferred embodiments. In short, if the filter finds a checkerboard pattern, the convolved pixel will be bright and if the filter finds a relatively random pattern, the convolved pixel will be dark (thus, filtering out “noise” that is not the desired pattern).
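Steps 505 through 511 can be sketched for a single pixel as follows. This Python is a minimal illustration, not the code of the Software Appendix. It assumes the intensities of the odd and even pixels have already been gathered at step 503 (the neighbor geometry is not reproduced here), and the function name convolve_pixel is hypothetical.

```python
def convolve_pixel(odd, even):
    """Compute one convolved pixel from the odd and even pixel intensities.

    odd and even are non-empty lists of intensities selected at step 503.
    """
    # Step 505: average each group.
    odd_mean = sum(odd) / len(odd)
    even_mean = sum(even) / len(even)
    # Step 507: decide which group appears to be the light squares.
    if odd_mean > even_mean:
        # Step 509: dimmest odd pixel minus brightest even pixel.
        value = min(odd) - max(even)
    else:
        # Step 511: dimmest even pixel minus brightest odd pixel.
        value = min(even) - max(odd)
    # Negative values are set to zero, as in preferred embodiments.
    return max(value, 0)

# Checkerboard-like neighborhood: every odd pixel is brighter than every even pixel.
print(convolve_pixel([200, 210, 190, 205, 198], [20, 25, 30, 22]))  # 160
# Random-looking neighborhood: the groups overlap, so the result is dark.
print(convolve_pixel([200, 20, 190, 30, 100], [50, 180, 40, 150]))  # 0
```

Using the minimum of one group and the maximum of the other makes the result large only when the two groups are fully separated, so a single outlier pixel is enough to darken the convolved pixel.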
The recognizable pattern in
Additionally, as the software filter of
The following shows how well an embodiment of the invention aligned scanned images of hybridized chips:
The previous method was to analyze the scanned image (unfiltered) to locate bright areas or spots in a checkerboard pattern. As shown, an embodiment of the invention was able to dramatically increase the accuracy of scanned image alignment.
Refined Grid Alignment
In preferred embodiments, refined image alignment may be performed to further increase the accuracy of the scanned image alignment.
At a step 551, pixel intensities on grid lines in the grid are summed. For example, the intensities of the grid in a vertical direction in the checkerboard pattern in the scanned image may be summed.
Then, at a step 553, the system may determine if there are more positions of the grid to analyze. If there are, the position of the grid may be adjusted at a step 555. Therefore, the grid may be moved left and right by one or more pixels before the intensities are summed along grid lines at step 551. Once all the positions of the grid have been analyzed, the system selects a grid position where pixel intensities (e.g., the sum calculated at step 551) are at a minimum. Therefore, if the pixel intensities for grid lines are lower at another position, the grid is adjusted accordingly. This refinement will work well if the cells are typically separated by a darker area or line.
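The loop over steps 551 through 555 might look like the following sketch for the horizontal direction. It is illustrative only: the image is assumed to be a list of rows of pixel values, the vertical grid lines are assumed to be given as column indices from the initial alignment, candidate positions that would move a grid line outside the image are skipped, and the names refine_horizontal_shift, line_columns, and max_shift are hypothetical.

```python
def refine_horizontal_shift(image, line_columns, max_shift=2):
    """Return the left/right shift (in pixels) that minimizes the summed
    intensity along the vertical grid lines."""
    width = len(image[0])
    best_shift, best_sum = 0, None
    for shift in range(-max_shift, max_shift + 1):  # steps 553 and 555
        cols = [c + shift for c in line_columns]
        if min(cols) < 0 or max(cols) >= width:
            continue  # a grid line would fall outside the image
        # Step 551: sum pixel intensities on the grid lines.
        total = sum(row[c] for row in image for c in cols)
        if best_sum is None or total < best_sum:
            best_shift, best_sum = shift, total
    return best_shift

# Cells are bright (9) and separated by dark lines (1) at columns 3 and 7.
image = [[9, 9, 9, 1, 9, 9, 9, 1, 9] for _ in range(4)]
# The initial grid is one pixel too far left.
shift = refine_horizontal_shift(image, [2, 6])
print(shift)  # 1
```

The same routine could be applied to horizontal grid lines by summing along rows instead of columns.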
Although the process in
The following shows how well an embodiment of the invention aligned scanned images of hybridized chips utilizing the refined grid alignment:
Once again, the previous method was to analyze the scanned image (unfiltered) to locate bright areas or spots in a checkerboard pattern. As shown, an embodiment of the invention was able to dramatically increase the accuracy of scanned image alignment. Furthermore, refining grid alignment increased the percentage of scanned images that were perfectly aligned with the invention from 4% to 64%. Therefore, performing a refinement of grid alignment can significantly increase the accuracy of the grid alignment.
Conclusion
While the above is a complete description of preferred embodiments of the invention, various alternatives, modifications, and equivalents may be used. It should be evident that the invention is equally applicable by making appropriate modifications to the embodiments described above. For example, the invention has been described in reference to a checkerboard pattern in the scanned image. However, the invention is not limited to any one pattern and may be advantageously applied to other patterns including those described herein. Therefore, the above description should not be taken as limiting the scope of the invention that is defined by the metes and bounds of the appended claims along with their full scope of equivalents.
SOFTWARE APPENDIX
Claims
1. An array comprising one or more array information features.
2. The array of claim 1, wherein said one or more array information features comprises at least 4 features.
3. The array of claim 2, wherein said features are positioned in a defined pattern on said array.
4. The array of claim 3, wherein said defined pattern provides a symbol when specifically bound to target.
5. The array of claim 1, wherein said one or more features provides coded information when specifically bound to target.
6. The array of claim 5, wherein said coded information is binary or non-binary coded information.
7. A method for providing information about an array, said method comprising: contacting an array of claim 1 with a sample comprising a target that binds to at least one of said one or more information features to produce at least one signal that provides information about said microarray.
8. The method of claim 7, wherein said target is spiked into said sample prior to contacting of said array with said sample.
9. The method of claim 7, wherein said information is provided by assessing binding of said target to said one or more array information probes.
10. The method of claim 9, wherein said assessing is by determining the presence, absence or level of binding to control levels of binding.
11. The method of claim 7, further comprising determining the presence, absence or level of at least one signal that provides said information.
12. The method of claim 11, wherein said at least one signal provides a binary code, where 0 is represented by no detectable signal and 1 is represented by a detectable signal.
13. The method of claim 11, wherein said at least one signal provides a binary code, where 1 is represented by no detectable signal and 0 is represented by a detectable signal.
14. The method of claim 11, wherein said at least one signal provides a binary code, where 0 is represented by a signal generated by a first label and 1 is represented by a signal generated by a second label that is detectably distinguishable from the first label.
15. The method of claim 7, further comprising determining a level of said at least one signal to provide a non-binary code that provides said information.
16. The method of claim 15, wherein said non-binary code is represented by levels of signal relative to a control level of signal.
17. A composition comprising a labeled array information target that specifically binds to an array information probe.
18. A kit comprising:
- (a) an array information probe; and
- (b) a target that binds to said array information probe under specific binding conditions to produce a signal and thereby provide information about an array.
19. The kit of claim 18, further comprising instructions for using said array information probe and said target to provide information about a microarray.
20. The kit of claim 19, wherein said probe is present in one or more array information elements on the surface of an array.
21. The kit of claim 18, wherein said instructions include a protocol for spiking a sample with said target prior to contacting said array with said sample.
22. A system for providing information about an array, said system comprising:
- a) an array comprising one or more array information features; and
- b) a target that specifically binds to at least one of said one or more array information features.
23. A method of detecting the presence of an analyte in a sample, said method comprising:
- (a) contacting a sample suspected of containing said analyte with an array of claim 1, wherein said array comprises a probe for said analyte;
- (b) detecting any resultant binding complexes on the surface of said array to obtain binding complex data to determine whether said analyte is present in said sample.
24. The method of claim 23, further comprising obtaining information about said array by assessing binding of target to said one or more array information features.
25. The method of claim 23, wherein said analyte is a nucleic acid and said array is an array of nucleic acid probes.
26. A method comprising transmitting a result obtained from a method of claim 23 from a first location to a second location.
27. The method of claim 26, wherein said second location is a remote location.
28. A method comprising receiving a result of a method of claim 23.
29. A hybridization assay comprising the steps of:
- (a) contacting at least one sample containing nucleic acids labeled with a detectable label with a nucleic acid array comprising one or more array information features to produce a hybridization pattern for said nucleic acid sample; and
- (b) analyzing said hybridization pattern for each detectable label to produce data on the amounts of said target nucleic acid in said sample and provide information about the array.
30. A computer readable medium comprising programming to obtain information about an array from data obtained using the array.
31. A computer-readable medium comprising: information for decoding encoded array information obtained from an array comprising one or more array information features.
32. The computer readable medium of claim 31, wherein said array is an array of nucleic acids.
33. The computer-readable medium of claim 31, wherein said information comprises a table that contains: a list of feature identifiers; and a list of probe identifiers corresponding to said feature identifiers.
34. The computer-readable medium of claim 33, wherein said table indicates that certain features of said array are array information features.
35. The computer readable medium of claim 33, wherein said table indicates which features correspond to which bit of a code.
36. The computer-readable medium of claim 31, wherein said information indicates an executable program for decoding said encoded array information.
37. The computer-readable medium of claim 31, wherein said information is a file that has a unique identifier that corresponds to a unique identifier of an array.
38. The computer-readable medium of claim 31, wherein said array information features encode binary coded information, and said file contains information for decoding said binary coded information.
39. A method for obtaining information about an array, comprising: reading an array comprising one or more array information features to provide encoded information for said array; and decoding said encoded information using a computer readable medium of claim 31 to provide information about said array.
40. The method of claim 39, wherein said array is a nucleic acid array.
41. The method of claim 39, wherein said scanning provides a data file comprising feature identifiers and numerical assessments of the brightness of said array information features.
42. The method of claim 39, wherein said decoding comprises identifying an executable program using said file, and executing said program, to decode said encoded information and provide information about said array.
43. The method of claim 42, wherein said executable program is obtained from a remote location.
44. A method for obtaining information about an array, comprising: encoding information on an array using one or more array information features; and providing information for decoding said encoded information.
45. The method of claim 44, wherein said array information is provided from a location remote to said array.
46. A method of assaying a sample, said method comprising:
- (a) contacting said sample with an array comprising one or more array information features,
- (b) reading said array with an array scanner to obtain data, and
- (c) decoding said data using a computer readable medium of claim 31 to provide information for said array.
47. The method according to claim 46, wherein said array is a nucleic acid array.
48. A method comprising transmitting a result obtained from a method of claim 46 from a first location to a second location.
49. The method of claim 48, where said second location is a remote location.
50. A method comprising receiving data representing said data obtained by the method of claim 46.
51. A kit for use with an array scanner, said kit comprising:
- (a) a computer-readable medium according to claim 31; and
- (b) instructions for operating said scanner according to said programming.
52. The kit of claim 51, further comprising an array.
53. A kit for use with an array scanner, said kit comprising:
- (a) an array comprising array information features; and
- (b) instructions for obtaining information for decoding encoded array information encoded by said array information features.
54. A method, embodied in a computer program, for automated extraction of data from a molecular array having features arranged in a regular pattern, the method comprising:
- receiving a number of images of the molecular array, each produced by scanning the molecular array to determine intensities of data signals emanating from discrete positions on a surface of the molecular array;
- estimating initial positions of selected marker features within an image of the molecular array;
- calculating refined positions of the selected marker features within the image of the molecular array;
- using the refined positions of the selected marker features to compute an initial coordinate system for locating features of the molecular array in the number of images of the molecular array;
- using the initial coordinate system to locate positions of strong features within one or more images of the molecular array;
- refining the positions of strong features within the one or more images of the molecular array by analyzing data signal intensity values in regions of the one or more images of the molecular array that contain the strong features;
- using the refined positions of strong features in the one or more images of the molecular array to calculate a refined coordinate system to locate positions of weak features within the number of images of the molecular array;
- using the refined positions of strong features in the one or more images of the molecular array to calculate a refined coordinate system to locate positions of local background regions surrounding all strong and weak features within the number of images of the molecular array; and
- extracting data from strong features, and their respective local background regions, within the number of images of the molecular array using the refined positions of strong features within the number of images of the molecular array and extracting data from weak features, and their respective local background regions, within the number of images of the molecular array using locations for the weak features calculated from the refined coordinate system.
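The refinement step recited above (analyzing data signal intensity values in the region that contains a strong feature) can be illustrated with a short sketch. The claim does not say how the refinement is computed; the intensity-weighted centroid used here is only one plausible choice, and the function name `refine_by_centroid` and the `half_window` parameter are hypothetical names introduced for illustration.

```python
def refine_by_centroid(image, x_est, y_est, half_window):
    """Refine an estimated feature position using an intensity-weighted centroid.

    ASSUMPTION: the centroid is an illustrative technique, not one named in
    the claims. `image` is a list of rows, each row a list of pixel
    intensities, so image[y][x] is the intensity at column x, row y.
    """
    height, width = len(image), len(image[0])
    total = sum_x = sum_y = 0.0
    # Visit only the pixels in a square window around the estimate,
    # clipped to the image bounds.
    for y in range(max(0, y_est - half_window), min(height, y_est + half_window + 1)):
        for x in range(max(0, x_est - half_window), min(width, x_est + half_window + 1)):
            weight = image[y][x]
            total += weight
            sum_x += weight * x
            sum_y += weight * y
    if total <= 0:
        # No signal in the window: keep the original estimate.
        return float(x_est), float(y_est)
    return sum_x / total, sum_y / total


# Usage: a 7x7 image with a two-pixel-wide bright spot in row 3.
image = [[0] * 7 for _ in range(7)]
image[3][4] = 10
image[3][5] = 10
refined = refine_by_centroid(image, x_est=4, y_est=3, half_window=2)
```

A centroid gives sub-pixel positions, which is why it is a common choice when a feature spans several pixels; a weak feature with little signal in the window falls back to its estimated position, consistent with the claim's use of a coordinate system rather than local refinement for weak features.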
55. The method of claim 54 wherein data signals emanating from discrete positions on the surface of the molecular array include:
- fluorescent emission from fluorophores incorporated into molecules bound to features of the molecular array;
- radiation emitted by radioisotopes incorporated into molecules bound to features of the molecular array; and
- light emission from chemoluminescent moieties incorporated into molecules bound to features of the molecular array.
56. The method of claim 54 wherein each image of the number of images comprises an array of pixels, each pixel having a data signal intensity value.
57. The method of claim 56 wherein the features of the molecular array are arranged in a rectilinear grid, wherein corner features are selected as marker features, and wherein estimating initial positions of selected marker features within an image of the molecular array further includes:
- calculating row and column vectors by considering the values of pixels in rows and columns of the image;
- determining a first and last peak in the row and column vectors; and
- using pixel coordinates of the first and last peaks in the row vector to determine horizontal coordinates of the corner features and using pixel coordinates of the first and last peaks in the column vector to determine vertical coordinates of the corner features.
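The corner estimation of claim 57 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the "row vector" is taken to be the sum of each pixel column (one entry per horizontal position) and the "column vector" the sum of each pixel row (one entry per vertical position), and a peak is taken to be a local maximum above a fraction of the vector's largest value. The function names and the `rel_threshold` parameter are hypothetical.

```python
def projection_vectors(image):
    """Return (row_vector, column_vector) for an image given as a list of rows.

    ASSUMPTION: row_vector[x] sums pixel column x and column_vector[y] sums
    pixel row y; the claim only says the vectors consider pixel values in
    rows and columns.
    """
    row_vector = [sum(column) for column in zip(*image)]
    column_vector = [sum(row) for row in image]
    return row_vector, column_vector


def first_last_peak(vector, threshold):
    """Return the indices of the first and last local maxima above threshold."""
    peaks = []
    for i, value in enumerate(vector):
        left = vector[i - 1] if i > 0 else float("-inf")
        right = vector[i + 1] if i < len(vector) - 1 else float("-inf")
        # On a flat plateau only the rightmost sample counts as the peak.
        if value > threshold and value >= left and value > right:
            peaks.append(i)
    if not peaks:
        raise ValueError("no peak above threshold")
    return peaks[0], peaks[-1]


def estimate_corners(image, rel_threshold=0.5):
    """Estimate (x, y) pixel coordinates of the four corner features."""
    row_vector, column_vector = projection_vectors(image)
    x_first, x_last = first_last_peak(row_vector, rel_threshold * max(row_vector))
    y_first, y_last = first_last_peak(column_vector, rel_threshold * max(column_vector))
    # Order: top-left, top-right, bottom-left, bottom-right.
    return [(x_first, y_first), (x_last, y_first), (x_first, y_last), (x_last, y_last)]


# Usage: an 8x8 image with bright features at columns 1 and 6, rows 2 and 5.
image = [[0] * 8 for _ in range(8)]
for y in (2, 5):
    for x in (1, 6):
        image[y][x] = 10
corners = estimate_corners(image)
```

Summing along rows and columns reduces the two-dimensional search to two one-dimensional peak searches, which works when the grid is roughly aligned with the pixel axes; a strongly rotated array would smear the peaks, and this sketch does not handle that case.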
58. A system for automated extraction of data from a molecular array having features arranged in a regular pattern, the system comprising:
- a scanning component that produces images of the molecular array representing intensities of data signals emitted from discrete positions on a surface of the molecular array;
- a computer program that processes the images of the molecular array produced by the scanning component to index features in the images of the molecular array corresponding to molecules bound to features of the molecular array and that extracts data from the indexed features within images of the molecular array; and
- a computer for executing the computer program.
59. The system of claim 58 wherein data signal intensities emanating from discrete positions on the surface of the molecular array include:
- radiation emitted by radioisotopes incorporated into molecules bound to features of the molecular array;
- fluorescent emission from fluorophores incorporated into molecules bound to features of the molecular array; and
- light emission from chemoluminescent moieties incorporated into molecules bound to features of the molecular array.
60. The system of claim 58 wherein the computer program processes the images of the molecular array and extracts data from indexed features within images of the molecular array by:
- receiving a number of images of the molecular array produced by the scanning component;
- estimating initial positions of selected marker features within an image of the molecular array;
- calculating refined positions of the selected marker features within the image of the molecular array;
- using the refined positions of the selected marker features to compute an initial coordinate system for locating features of the molecular array in the number of images of the molecular array;
- using the initial coordinate system to locate positions of strong features within one or more images of the molecular array;
- refining the positions of strong features within the one or more images of the molecular array by analyzing data signal intensity values in regions of the one or more images of the molecular array that contain the strong features;
- using the refined positions of strong features in the one or more images of the molecular array to calculate a refined coordinate system to locate positions of weak features within the number of images of the molecular array;
- using the refined positions of strong features in the one or more images of the molecular array to calculate a refined coordinate system to locate positions of local background regions surrounding all strong and weak features within the number of images of the molecular array; and
- extracting data from strong features, and their respective local background regions, within the number of images of the molecular array using the refined positions of strong features within the number of images of the molecular array and extracting data from weak features, and their respective local background regions, within the number of images of the molecular array using locations for the weak features calculated from the refined coordinate system.
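The coordinate-system steps recited in claims 54 and 60 (using marker positions to compute a coordinate system that locates other features) can be pictured with a small sketch. The claims do not specify the form of the coordinate system; bilinear interpolation between four corner markers is an assumption made here for illustration, and `feature_position` and its parameters are hypothetical names.

```python
def feature_position(corners, n_rows, n_cols, row, col):
    """Locate the feature at grid index (row, col) from four corner positions.

    ASSUMPTION: the coordinate system is a bilinear interpolation between
    corner markers, given as (x, y) pixel pairs in the order top-left,
    top-right, bottom-left, bottom-right. The claims do not prescribe this.
    """
    top_left, top_right, bottom_left, bottom_right = corners
    # Fractional position of the feature along each grid axis, in [0, 1].
    u = col / (n_cols - 1) if n_cols > 1 else 0.0
    v = row / (n_rows - 1) if n_rows > 1 else 0.0
    # Interpolate along the top and bottom edges, then between the two.
    top_x = top_left[0] + u * (top_right[0] - top_left[0])
    top_y = top_left[1] + u * (top_right[1] - top_left[1])
    bottom_x = bottom_left[0] + u * (bottom_right[0] - bottom_left[0])
    bottom_y = bottom_left[1] + u * (bottom_right[1] - bottom_left[1])
    return top_x + v * (bottom_x - top_x), top_y + v * (bottom_y - top_y)


# Usage: a grid of 6 rows by 11 columns whose corner markers were found
# at these pixel positions.
corners = [(10.0, 20.0), (110.0, 20.0), (10.0, 70.0), (110.0, 70.0)]
x, y = feature_position(corners, n_rows=6, n_cols=11, row=3, col=5)
```

Because the position of every feature follows from the grid indices alone, a weak feature can be located without any detectable signal of its own, which is the role the refined coordinate system plays in the claims.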
Type: Application
Filed: Mar 3, 2006
Publication Date: Nov 16, 2006
Applicant: Affymetrix, Inc. (Santa Clara, CA)
Inventors: Peter Fiekowsky (Los Altos, CA), Dan Bartell (Palo Alto, CA), David Stern (Mt. View, CA)
Application Number: 11/366,515
International Classification: G06F 19/00 (20060101);