Secure coordinate identification method, system and program

- IBM

A method, program and system (10) for processing data are disclosed. The method, program and system comprise the steps of: (a) receiving data representing a location of an item (e.g., people, personal property, real property, organizations, chemical compounds, organic compounds, proteins, biological structures, biometric values or atomic structures), (b) determining a plurality of fixed coordinates that represent the location (e.g., by “rounding” and/or comparing to a reference grid), (c) utilizing an algorithm (e.g., encryption, encoding and/or one-way function) to process the plurality of fixed coordinates (each separately or together), and (d) comparing the processed data to at least a portion of secondary data (perhaps comprising data previously stored in a database).

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit of provisional application No. 60/457,119, filed in the United States Patent Office on Mar. 24, 2003.

FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

Not Applicable.

TECHNICAL FIELD

This invention generally relates to processing data and, more particularly, to the identification, processing, and comparison of location coordinates in a confidential and anonymous manner.

BACKGROUND

Identifying and sharing a location of an item (e.g., individual, personal property, or real property) in a confidential manner is an important goal in various situations. For example, United States army personnel may have identified the location of a first target and may wish to determine whether a second target identified by a foreign army's personnel is the same target, as part of a greater effort to coordinate strike options, while at the same time not disclosing: (a) to the foreign army's personnel the location of the first target if the second target is not the same, (b) to the United States army personnel the location of the second target if the second target is not the same as the first target and/or (c) to any third person either the United States army personnel's knowledge of the first target or the foreign army's personnel's knowledge of the second target.

However, there are no existing systems that use a cryptographic algorithm to identify, disclose and compare location coordinates representing the locations of particular items in a secure and confidential manner.

The present invention is provided to address these and other issues.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram of the system in accordance with the invention;

FIG. 2 is a functional block diagram of the System block of FIG. 1; and,

FIG. 3 is a representation of a non-uniform grid system.

DETAILED DESCRIPTION

While this invention is susceptible of embodiment in many different forms, there is shown in the drawing, and will be described herein in detail, specific embodiments thereof with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and is not intended to limit the invention to the specific embodiments illustrated.

A data processing system 10 for processing data is illustrated in FIG. 1. The system 10 includes at least one conventional computer 12 having a processor 14 and memory 16. The memory 16 is used both for storage of the executable software to operate the system 10 and for storage of the data in a database and random access memory. All or part of the software may be embodied within various applications and equipment, depending upon the relevant confidentiality and security requirements. For example, the software may be embodied, stored or provided on any computer readable medium utilizing any of the following, at a minimum: (a) an installed software application on the source system, (b) a box unit that self-destructs upon any tampering, and/or (c) a CD, DVD, floppy disc or other removable media. The system 10 may effect all or part of the processing of the item location data on one or more computers 12 at the source location and/or may effect all or part of the processing with one or more computers 12 at a location different from the source (e.g., a central processing system).

To keep the item location more secure, the item location data can be encrypted. However, due to the nature of the mathematics, when values are processed, for example, by encryption or hashing, and are compared to other location data, there will be a match only when the two item location data being compared are identical. That is, data of two item locations that vary by only one unit of measurement will not be identifiable as having a potential relationship.
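The exact-match behavior described above can be seen in a minimal sketch. SHA-256 from Python's standard `hashlib` module is used here only as a stand-in one-way function; the helper name `digest`, the six-decimal formatting and the sample coordinates are assumptions made for illustration and do not come from the disclosure.

```python
import hashlib

def digest(lat: float, lon: float) -> str:
    """Hash a raw coordinate pair formatted to six decimal places (assumed format)."""
    return hashlib.sha256(f"{lat:.6f},{lon:.6f}".encode()).hexdigest()

a = digest(36.114647, -115.172813)
b = digest(36.114648, -115.172813)  # differs by one unit in the last decimal place
c = digest(36.114647, -115.172813)  # identical input to the first call

print(a == c)  # identical inputs give identical digests
print(a == b)  # a one-unit difference gives an unrelated digest
```

Because the two digests for the nearby points share nothing recognizable, closeness of the underlying locations cannot be inferred from the processed values alone.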

To overcome this potential misidentification, a system can determine a fixed coordinate grid point corresponding to the item location, with the fixed coordinate grid point then processed for use in the comparison process. This ensures that the encrypted values are matchable to previously stored data (i.e., previously stored fixed coordinate grid points). However, this raises the possibility that two item locations that are close enough to warrant a match may be assigned to two different fixed coordinate grid points and would fail to match after being processed.

Assigning the item location to more than one fixed coordinate grid point addresses the issue of incorrectly failed matching. By determining more than one of the nearest fixed coordinates to a given item location and using each in a comparison process, failed matches can be reduced. When each of the fixed coordinate grid points is processed, a set of results is available for comparison. If any element of the set matches with known data, the item location may be worthy of further investigation.
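As a rough illustration of why more than one fixed coordinate helps, the sketch below compares a single-nearest-point assignment with a three-nearest-point assignment on a unit-spaced rectangular grid. The function names and the sample points are hypothetical, and the unit grid is an assumption chosen for simplicity.

```python
import math

def nearest_single(x, y):
    """Assign a location to the one nearest grid point on a unit grid."""
    return (math.floor(x + 0.5), math.floor(y + 0.5))

def nearest_three(x, y):
    """Assign a location to the three nearest corners of its enclosing unit cell."""
    x0, y0 = math.floor(x), math.floor(y)
    corners = [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]
    corners.sort(key=lambda c: math.dist(c, (x, y)))
    return set(corners[:3])

p, q = (0.49, 0.2), (0.51, 0.2)  # two locations 0.02 units apart

print(nearest_single(*p) == nearest_single(*q))  # False: single points differ
print(nearest_three(*p) & nearest_three(*q))     # shared grid points remain
```

With a single grid point per location the two nearby points would fail to match after processing, whereas the three-point sets still have elements in common.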

As illustrated in FIG. 2, in a step 18 the system 10 receives data representing the location of a particular item (e.g., natural person, organization, chemical compound, organic compound, protein, biological structure, biometric value, atomic structure, inventory item, real property, personal property). In a step 20 the system then determines a plurality of fixed coordinates that represent the location by several processes, for example, rounding or comparison to a predetermined grid. Rounding calculates values on a virtual grid based upon the location. Comparing the location to the predetermined grid finds the nearest and/or surrounding fixed coordinates. The predetermined grid may be uniform (e.g., equal spacing between grid points), non-uniform (e.g., equal spacing in a first portion of grid points, but differential spacing in a second portion of grid points), multiple, tiered and/or three-dimensional, four-dimensional or higher-dimensional. For example, where the data representing the location consists of latitude (x), longitude (y), height (z) and time (t) variables, the system in a multiple grid circumstance compares the data to a four-dimensional non-uniform grid representing latitude, longitude, height and time dimensions to establish a plurality of fixed coordinates, which would also allow for comparisons of moving targets.
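The two approaches named above, rounding onto a virtual grid and comparison to a predetermined grid, might be sketched as follows. All names, axis values and the sample location are hypothetical; the per-axis bracketing strategy is one possible reading of "surrounding fixed coordinates", not a method stated in the disclosure.

```python
import bisect
import itertools
import math

def round_to_virtual_grid(value, spacing):
    """Rounding: compute the virtual grid values just below and above `value`."""
    lower = math.floor(value / spacing) * spacing
    return (lower, lower + spacing)

def surrounding_on_axis(axis, value):
    """Comparison: find the stored grid values on a sorted axis that bracket `value`."""
    i = bisect.bisect_right(axis, value)
    i = min(max(i, 1), len(axis) - 1)  # clamp values outside the axis to the end cell
    return (axis[i - 1], axis[i])

# Hypothetical non-uniform axes for a four-dimensional grid.
AXES = {
    "lat": [0, 10, 20, 25, 30],
    "lon": [0, 5, 10, 20, 40],
    "height": [0, 100, 1000],
    "time": [0, 1, 2, 5, 10, 20],
}

def surrounding_fixed_coordinates(location):
    """Return every corner of the 4-D grid cell that encloses `location`."""
    brackets = [surrounding_on_axis(AXES[name], location[name]) for name in AXES]
    return list(itertools.product(*brackets))

corners = surrounding_fixed_coordinates({"lat": 22, "lon": 7, "height": 50, "time": 7})
print(len(corners))  # 2 choices on each of 4 axes
```

Including time as an axis means a moving item yields different fixed coordinates at different times, which is what allows moving targets to be compared.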

While the embodiment can use just two fixed coordinates, using only two fixed coordinates creates a greater possibility that two item locations, which may be infinitesimally close to each other, may be determined to be near separate pairs of fixed coordinates. For example, the two fixed coordinates corresponding with one of the two item locations may be determined to be different from, and perhaps significantly distant from, the two fixed coordinates corresponding with the other of the two item locations. As such, the grid would preferably have at least three (3) fixed coordinates (creating a triangle-type shape if lines were to connect the fixed coordinates on the grid), with scaled positioning of other coordinates through the grid based upon a user-defined criterion, such as spacing of a particular distance or time (e.g., one (1) minute) and potentially subdividing the coordinates according to quantity (e.g., population density). In addition, the coordinates may cover several areas or layers; for example, the system can determine the nearest three fixed coordinates and an additional three fixed coordinates surrounding those, creating a broader matching region.

Several grids and grid combinations may be used in determining fixed coordinates. For example, an item location on a rectangular grid could be assigned to the three nearest grid coordinates. Similarly, an item location could be assigned to all the coordinates of the grid encompassing the item location. Both of these examples involve simple geometric and trigonometric calculations. When the grid system is more complex, as discussed above, these simple techniques may not be sufficient.

Referring briefly to FIG. 3, an illustration of a non-uniform grid system 30 is shown. The non-uniform grid system 30 has a plurality of triangular grids that may relate to population density, terrain features or other criteria. Coordinates x, y, z, m, n, o, p and q represent a plurality of fixed coordinates. Item location A is within a bounding triangle defined by the fixed coordinates x, y and z. The closest fixed grid points are, in this case, x, y and z, which would be used for comparison with previously stored data. It can be seen that point B, while within a bounding triangle m, n, o, may actually be closer to fixed coordinates outside that bounding triangle, such as points x, z and q. If so, point B, when using nearest fixed coordinates, would be associated with the fixed coordinate grid points x, z and q when being compared to previously stored data. Point C appears closest to point m of its bounding triangle m, o, p. The system, if using nearest fixed coordinates, can use mathematical evaluation to determine the other two fixed coordinates closest to C.
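One simple form of the "mathematical evaluation" mentioned above is a brute-force distance ranking over all fixed coordinates, sketched below. FIG. 3 is not reproduced here, so the positions assigned to the labels are invented purely for illustration, and Euclidean distance is an assumed metric.

```python
import heapq
import math

# Hypothetical positions; the labels echo FIG. 3 but the values are invented.
GRID = {
    "x": (0.0, 0.0),
    "y": (4.0, 0.0),
    "z": (2.0, 3.0),
    "q": (6.0, 3.0),
    "m": (4.0, 6.0),
    "n": (0.0, 6.0),
}

def nearest_fixed(grid, location, count=3):
    """Return the labels of the `count` fixed coordinates closest to `location`."""
    ranked = heapq.nsmallest(
        count, grid, key=lambda label: math.dist(grid[label], location)
    )
    return set(ranked)

print(sorted(nearest_fixed(GRID, (2.0, 1.0))))  # ['x', 'y', 'z']
```

A ranking of this kind ignores bounding triangles entirely, which is why a point such as B can be associated with fixed coordinates outside the triangle that contains it.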

The use of more complex mathematics can help ensure that the most relevant fixed coordinates represent an item location, particularly in the case of non-uniform or high-dimension grid systems. One useful technique is the affine transform, which allows transformation to a coordinate system that preserves linearity and spacing. A high level overview of the use of an affine transform in this respect is illustrated below.

By way of a detailed example of one embodiment of how the system determines, in step 20, three (3) fixed coordinates that represent the location, given a uniform triangular grid with spacing of one (1) minute, the system processes data by:
(a) taking a given (x, y), where x is longitude and y is latitude in degrees, corresponding to the location;
(b) multiplying the given (x, y) by sixty (60) to scale to minutes;
(c) applying the affine transformation (x′, y′)=(x−½y, y), which transforms the uniform triangular grid into a uniform rectangular grid (i.e., creating a rectangle-type shape if lines were to connect four (4) fixed coordinates on the uniform rectangular grid), so that the point (x′, y′) falls within a cell of the uniform rectangular grid that corresponds to two (2) three-fixed-coordinate areas in the uniform triangular grid;
(d) setting (x0, y0)=(⌊x′⌋, ⌊y′⌋) to establish the lower left corner on the uniform rectangular grid;
(e) setting P1=(x0+1, y0) and P2=(x0, y0+1) to determine two (2) fixed coordinates on the uniform rectangular grid;
(f) calculating (x′−x0)+(y′−y0) to determine a third fixed coordinate on the uniform rectangular grid, which, depending upon whether the third fixed coordinate is in the top right or lower left area of the uniform rectangular grid, is P0=(x0+1, y0+1) if the result of the calculation is greater than 1 or P0=(x0, y0) if the result of the calculation is less than 1; and
(g) transforming the resulting three (3) fixed coordinates back to the uniform triangular grid by applying the inverse affine transformation (x, y)=(x′+½y′, y′) to each of P0, P1 and P2 (which may be implemented using integers), resulting in an integral number of half minutes, which may be converted to a number from 0 to 43199 to take into account the international date line.
P0, P1 and P2 are then the three (3) nearest fixed coordinates on the uniform triangular grid representing the location.
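The steps above can be sketched in a few lines. This is an illustrative reading rather than a definitive implementation: the function name is hypothetical, longitude is returned as an integer count of half minutes wrapped into 0 to 43199 while latitude is returned in whole minutes, and the case where the calculation in step (f) equals exactly 1 (not specified in the text) is assigned to the lower left corner.

```python
import math

HALF_MINUTES_AROUND = 360 * 60 * 2  # 43200 half minutes of longitude

def three_fixed_coordinates(lon_deg, lat_deg):
    """Return P0, P1, P2 as (longitude in half minutes, latitude in minutes)."""
    # (a)-(b) scale degrees to minutes
    x, y = lon_deg * 60.0, lat_deg * 60.0
    # (c) affine transformation onto the rectangular grid
    xp, yp = x - 0.5 * y, y
    # (d) lower left corner of the enclosing rectangular cell
    x0, y0 = math.floor(xp), math.floor(yp)
    # (e) two corners shared by both triangles of the cell
    p1, p2 = (x0 + 1, y0), (x0, y0 + 1)
    # (f) third corner; a sum of exactly 1 falls to the lower left (assumption)
    s = (xp - x0) + (yp - y0)
    p0 = (x0 + 1, y0 + 1) if s > 1 else (x0, y0)
    # (g) inverse transformation in integers: 2 * (x' + y'/2) = 2x' + y'
    return [((2 * gx + gy) % HALF_MINUTES_AROUND, gy) for gx, gy in (p0, p1, p2)]

print(three_fixed_coordinates(0.01, 0.005))   # [(0, 0), (2, 0), (1, 1)]
print(three_fixed_coordinates(-0.01, 0.005))  # [(43198, 0), (0, 0), (43199, 1)]
```

The second call shows the wraparound: grid points just west of zero longitude map to values near 43199 instead of negative numbers. Working in integers at step (g) also means the values passed to the cryptographic algorithm have a single canonical form.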

The system then: (a) processes each of the plurality of fixed coordinates through a cryptographic algorithm (e.g., encryption, encoding, one-way function such as MD-5) to render the plurality of fixed coordinates confidential (“Processed Coordinates”) in step 22 and (b) compares the Processed Coordinates to secondary data (e.g., previously saved data) and matches any data reflecting one or more identical fixed coordinates in step 24. For example, where the plurality of fixed coordinates associated with a first location is determined to be 1, 2, 3 and the plurality of fixed coordinates associated with a second location is determined to be 2, 3, 5, the system 10 adds salt to each of the plurality of fixed coordinates and processes each through the cryptographic algorithm, such as MD-5, in step 22, causing each resulting Processed Coordinate to be confidential. The comparison between the resulting Processed Coordinates would then identify the matches of the Processed Coordinates associated with fixed coordinates 2 and 3, which are common to the first location and the second location.
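The example with fixed coordinates 1, 2, 3 and 2, 3, 5 can be reproduced in a short sketch. SHA-256 is substituted here for the MD-5 named in the text (`hashlib.md5` would mirror the text more literally); the function name `process`, the salt value, and the assumption that both parties use the same salt are illustrative choices, not details from the disclosure.

```python
import hashlib

def process(coordinates, salt: bytes) -> set:
    """Salt and hash each fixed coordinate separately (step 22)."""
    return {hashlib.sha256(salt + str(c).encode()).hexdigest() for c in coordinates}

salt = b"shared-secret-salt"          # hypothetical; must be common to both parties
first = process([1, 2, 3], salt)      # Processed Coordinates for the first location
second = process([2, 3, 5], salt)     # Processed Coordinates for the second location

matches = first & second              # comparison (step 24)
print(len(matches))  # 2
```

Only the digests are compared, so neither side learns the other's non-matching coordinates (1 and 5 in this example) from the comparison itself.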

Thereafter, the Processed Coordinates and any matches are stored in a database in step 26 and the system issues a signal (e.g., match or no match) based upon user-defined rules and policies in step 28, such as transferring the Processed Coordinates to other systems for analysis and coordination.
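Steps 26 and 28 might look like the following sketch, in which an in-memory set stands in for the database and a minimum match count stands in for a user-defined rule. Both are assumptions made for illustration, as are the function and parameter names.

```python
def store_and_signal(database: set, processed: set, min_matches: int = 1) -> str:
    """Store Processed Coordinates (step 26) and issue a signal (step 28)."""
    matched = processed & database   # compare against previously stored data
    database |= processed            # keep the new values for later comparisons
    return "match" if len(matched) >= min_matches else "no match"

db = set()
print(store_and_signal(db, {"a", "b", "c"}))                 # no match
print(store_and_signal(db, {"b", "c", "e"}, min_matches=2))  # match
```

A stricter rule, such as requiring two or three shared Processed Coordinates, trades fewer false matches for a narrower matching region.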

From the foregoing, it will be observed that numerous variations and modifications may be effected without departing from the spirit and scope of the invention. It is to be understood that no limitation with respect to the specific apparatus illustrated herein is intended or should be inferred. It is, of course, intended to cover by the appended claims all such modifications as fall within the scope of the claims.

Claims

1. A computer-implemented method for identification, processing, and comparison of location coordinate data in a confidential and anonymous manner, comprising:

receiving, in a computer, a plurality of fixed coordinates, each of the fixed coordinates representing a location of an item comprising an individual, personal property or real property, and the plurality of fixed coordinates being generated by more than one process;
utilizing, in the computer, a cryptographic algorithm to encrypt the plurality of fixed coordinates, thereby forming a processed data; and
comparing, in the computer, the encrypted fixed coordinates of the processed data to at least a portion of secondary data that comprises one or more encrypted fixed coordinates to determine whether a relationship exists between the encrypted fixed coordinates of the processed data and the encrypted fixed coordinates of the secondary data.

2. The method of claim 1 further comprising the step of receiving data representing the location of the item and determining the plurality of fixed coordinates that represent the location of the item prior to receiving the plurality of fixed coordinates.

3. The method of claim 1 further comprising the step of storing the processed data in a database.

4. The method of claim 1 wherein the step of comparing the processed data to at least a portion of secondary data includes the secondary data comprising data previously stored in a database.

5. The method of claim 1 further comprising the step of matching the processed data to the at least a portion of secondary data that is determined to reflect an identical one of the plurality of fixed coordinates.

6. The method of claim 1 further comprising the step of issuing a signal based upon a user-defined rule.

7. The method of claim 1 wherein the step of determining the plurality of fixed coordinates that represent the location occurs in relation to a grid.

8. The method of claim 7 wherein the grid comprises a uniform grid.

9. The method of claim 7 wherein the grid comprises a non-uniform grid.

10. The method of claim 7 wherein the grid is a multi-dimensional grid.

11. The method of claim 7 wherein the grid is based upon a user-defined criterion.

12. The method of claim 11 wherein the user-defined criterion corresponds with quantity.

13. The method of claim 11 wherein the user-defined criterion corresponds to time.

14. The method of claim 1 wherein the step of determining the plurality of fixed coordinates that represent the location includes the step of determining a nearest of the plurality of fixed coordinates.

15. The method of claim 1 wherein the step of determining a plurality of fixed coordinates that represent the location includes the step of determining the plurality of fixed coordinates surrounding the location.

16. A non-transitory computer readable medium containing program instructions for execution by a computer for performing a method of identification, processing, and comparison of location coordinate data in a confidential and anonymous manner, comprising:

receiving, in a computer, a plurality of fixed coordinates, each of the fixed coordinates representing a location of an item comprising an individual, personal property or real property, and the plurality of fixed coordinates being generated by more than one process;
utilizing, in the computer, a cryptographic algorithm to encrypt the plurality of fixed coordinates, thereby forming a processed data; and
comparing, in the computer, the encrypted fixed coordinates of the processed data to at least a portion of secondary data that comprises one or more encrypted fixed coordinates to determine whether a relationship exists between the encrypted fixed coordinates of the processed data and the encrypted fixed coordinates of the secondary data.

17. The computer readable medium for performing the method of claim 16 further comprising the step of receiving data representing the location of the item and determining the plurality of fixed coordinates that represent the location of the item prior to receiving the plurality of fixed coordinates.

18. The computer readable medium for performing the method of claim 16 further comprising the step of storing the processed data in a database.

19. The computer readable medium for performing the method of claim 16 wherein the step of comparing the processed data to at least a portion of secondary data includes the secondary data comprising data previously stored in a database.

20. The computer readable medium for performing the method of claim 16 further comprising the step of matching the processed data to the at least a portion of secondary data that is determined to reflect an identical one of the plurality of fixed coordinates.

21. The computer readable medium for performing the method of claim 16 further comprising the step of issuing a signal based upon a user-defined rule.

22. The computer readable medium for performing the method of claim 16 wherein the step of determining the plurality of fixed coordinates that represent the location occurs in relation to a grid.

23. The computer readable medium for performing the method of claim 22 wherein the grid comprises a uniform grid.

24. The computer readable medium for performing the method of claim 22 wherein the grid comprises a non-uniform grid.

25. The computer readable medium for performing the method of claim 22 wherein the grid is a multi-dimensional grid.

26. The computer readable medium for performing the method of claim 22 wherein the grid is based upon a user-defined criterion.

27. The computer readable medium for performing the method of claim 26 wherein the user-defined criterion corresponds with quantity.

28. The computer readable medium for performing the method of claim 26 wherein the user-defined criterion corresponds to time.

29. The computer readable medium for performing the method of claim 16 wherein the step of determining the plurality of fixed coordinates that represent the location includes the step of determining the nearest of the plurality of fixed coordinates.

30. The computer readable medium for performing the method of claim 16 wherein the step of determining a plurality of fixed coordinates that represent the location includes the step of determining the plurality of fixed coordinates surrounding the location.

Patent History
Patent number: 7962757
Type: Grant
Filed: Mar 24, 2004
Date of Patent: Jun 14, 2011
Patent Publication Number: 20050066182
Assignee: International Business Machines Corporation (Armonk, NY)
Inventors: Jeffrey J. Jonas (Las Vegas, NV), Steven Bruce Dunham (Las Vegas, NV)
Primary Examiner: Nasser Moazzami
Assistant Examiner: Fikremariam Yalew
Attorney: Gates & Cooper LLP
Application Number: 10/807,826