Abstract: An end-to-end data curation system and the methods it uses to link, match, and clean large-scale data sources. The goal of the system is to provide scalable and efficient record deduplication. A crowd of experts trains the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problems of schema mapping and record deduplication in a holistic way by treating both as a single linkage problem.
Type:
Application
Filed:
March 28, 2014
Publication date:
October 1, 2015
Applicant:
DATATAMER, INC.
Inventors:
Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker