Patents by Inventor Hugh CASSIDY

Hugh CASSIDY has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and system for cleansing and de-duplicating data

Patent number: 10558627

Abstract: A method and system for cleansing and de-duplicating data in a database are provided. The method includes filtering garbage records from a plurality of records based on data fields, and applying cleansing rules to create a cleansed database. Similarity vectors are generated, where each vector corresponds to a pairwise comparison of distinct data entries in the cleansed database. Matching rules are applied to label each vector as matched, unmatched, or unclassified. The method analyzes the vectors labeled as matched and unmatched to train a machine learning model to identify duplicates in the cleansed database. Unclassified vectors are then labeled as matched or unmatched by applying the machine learning model to them. Thereafter, the method processes all the vectors labeled as matched to create clusters of records that are duplicates of each other. Finally, the records in each cluster are merged using predefined consolidated rules to obtain a de-duplicated cleansed database.

Type: Grant

Filed: April 14, 2017

Date of Patent: February 11, 2020

Assignee: LeanTaas, Inc.

Inventors: Hugh Cassidy, Sofia DeMarco, Jayant Lakshmikanthan
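
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the field names (`name`, `phone`), the rule thresholds, the nearest-centroid classifier standing in for the machine learning model, the union-find clustering, and the "keep the longest value" merge rule are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

FIELDS = ("name", "phone")  # hypothetical schema, not taken from the patent

def cleanse(records):
    """Drop garbage records (here: no name) and normalize the remaining fields."""
    cleaned = []
    for rec in records:
        name = " ".join(rec.get("name", "").lower().split())
        phone = "".join(ch for ch in rec.get("phone", "") if ch.isdigit())
        if name:
            cleaned.append({"name": name, "phone": phone})
    return cleaned

def similarity_vector(a, b):
    """One similarity score per field for a pair of records."""
    name_sim = SequenceMatcher(None, a["name"], b["name"]).ratio()
    phone_sim = 1.0 if a["phone"] and a["phone"] == b["phone"] else 0.0
    return (name_sim, phone_sim)

def rule_label(vec):
    """Assumed matching rules: label only the confident cases."""
    name_sim, phone_sim = vec
    if name_sim >= 0.95 and phone_sim == 1.0:
        return "matched"
    if name_sim < 0.4 and phone_sim == 0.0:
        return "unmatched"
    return None  # unclassified

def centroid(vectors):
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def nearest(vec, centroids):
    """Nearest-centroid classifier, a simple stand-in for the trained model."""
    def dist(label):
        return sum((x - y) ** 2 for x, y in zip(vec, centroids[label]))
    return min(centroids, key=dist)

def deduplicate(records):
    recs = cleanse(records)
    vectors = {(i, j): similarity_vector(recs[i], recs[j])
               for i, j in combinations(range(len(recs)), 2)}
    labels = {pair: rule_label(vec) for pair, vec in vectors.items()}

    # "Train" on the rule-labeled vectors, then label the unclassified ones.
    centroids = {}
    for label in ("matched", "unmatched"):
        training = [vectors[p] for p, l in labels.items() if l == label]
        if training:
            centroids[label] = centroid(training)
    if len(centroids) == 2:
        for pair, label in labels.items():
            if label is None:
                labels[pair] = nearest(vectors[pair], centroids)

    # Cluster records connected by matched pairs (union-find).
    parent = list(range(len(recs)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (i, j), label in labels.items():
        if label == "matched":
            parent[find(j)] = find(i)
    clusters = {}
    for idx, rec in enumerate(recs):
        clusters.setdefault(find(idx), []).append(rec)

    # Assumed consolidation rule: keep the longest value of each field.
    return [{f: max((r[f] for r in group), key=len) for f in FIELDS}
            for group in clusters.values()]

records = [
    {"name": "John Smith", "phone": "555-0100"},
    {"name": "john  smith", "phone": "(555) 0100"},
    {"name": "J. Smith", "phone": "5550100"},
    {"name": "Alice Wong", "phone": "555-0199"},
    {"name": "", "phone": "000"},  # garbage record, filtered out
]
print(deduplicate(records))
```

In this toy data set the "J. Smith" pairs fall between the two rules, so they are the ones the stand-in model has to label. A real system would use richer per-field comparisons and a properly trained classifier.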

METHOD AND SYSTEM FOR CLEANSING AND DE-DUPLICATING DATA

Publication number: 20170308557

Abstract: A method and system for cleansing and de-duplicating data in a database are provided. The method includes filtering garbage records from a plurality of records based on data fields, and applying cleansing rules to create a cleansed database. Similarity vectors are generated, where each vector corresponds to a pairwise comparison of distinct data entries in the cleansed database. Matching rules are applied to label each vector as matched, unmatched, or unclassified. The method analyzes the vectors labeled as matched and unmatched to train a machine learning model to identify duplicates in the cleansed database. Unclassified vectors are then labeled as matched or unmatched by applying the machine learning model to them. Thereafter, the method processes all the vectors labeled as matched to create clusters of records that are duplicates of each other. Finally, the records in each cluster are merged using predefined consolidated rules to obtain a de-duplicated cleansed database.

Type: Application

Filed: April 14, 2017

Publication date: October 26, 2017

Inventors: Hugh CASSIDY, Sofia DeMARCO, Jayant LAKSHMIKANTHAN
