Abstract: The merge/purge problem, the semantic integration problem that arises when merging multiple very large databases, can be solved by multiple runs of the sorted neighborhood method or the clustering method with small windows, followed by computation of the transitive closure over the results of each run. The sorted neighborhood method works well under this scheme but is computationally expensive due to the sorting phase. An alternative method based on data clustering reduces the complexity to linear time, making multiple runs followed by transitive closure feasible and efficient.
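The scheme described in the abstract can be illustrated with a minimal Python sketch. It covers only the sorted neighborhood variant, not the clustering variant, and it is not the patented implementation. The function names (`sorted_neighborhood`, `transitive_closure`, `merge_purge`), the sample records, and the matching rule `is_match` are all hypothetical and chosen for illustration; in particular, the matching rule is a toy stand-in for whatever record-comparison logic a real system would use.

```python
def sorted_neighborhood(records, key, is_match, window):
    """One run: sort by `key`, then compare each record only with the
    `window - 1` records that precede it in sorted order."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[max(0, pos - window + 1):pos]:
            if is_match(records[i], records[j]):
                pairs.add((min(i, j), max(i, j)))
    return pairs


def transitive_closure(n, pairs):
    """Group record indices 0..n-1 so that matched pairs, and anything
    linked to them through other pairs, end up together (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def merge_purge(records, keys, is_match, window=2):
    """Run one small-window pass per sort key, then close transitively."""
    pairs = set()
    for key in keys:
        pairs |= sorted_neighborhood(records, key, is_match, window)
    return transitive_closure(len(records), pairs)


# Hypothetical sample data: (last name, first name, id number).
records = [
    ("Stolfo", "Salvatore", "111"),
    ("Stolfo", "Sal", "119"),
    ("Stolpho", "Sal", "119"),
    ("Smith", "Anna", "222"),
]


def is_match(a, b):
    # Toy rule (an assumption): same id, or same last name and first initial.
    return a[2] == b[2] or (a[0] == b[0] and a[1][0] == b[1][0])


# One run sorted on last name, one run sorted on id number.
keys = [lambda r: r[0], lambda r: r[2]]
groups = merge_purge(records, keys, is_match, window=2)
```

In this sketch records 0 and 2 never match each other directly, yet they land in the same group because each matches record 1; that is the role the transitive closure step plays. Each run costs a sort plus a number of comparisons proportional to the record count times the window size, which is why small windows over several keys are attractive.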
March 15, 1994
Date of Patent:
March 5, 1996
Salvatore J. Stolfo
Salvatore J. Stolfo, Mauricio A. Hernández