Patents Assigned to LeanTaas, Inc.
  • Patent number: 10558627
    Abstract: A method and system for cleansing and de-duplicating data in a database are provided. The method includes filtering garbage records from a plurality of records based on data fields, and applying cleansing rules to create a cleansed database. Similarity vectors are generated, where each vector corresponds to a pairwise comparison of distinct data entries in the cleansed database. Matching rules are applied to label each vector as matched, unmatched, or unclassified. The method analyzes the vectors labeled as matched and unmatched to train a machine learning model to identify duplicates in the cleansed database. Unclassified vectors are then labeled as matched or unmatched by applying the machine learning model to them. Thereafter, the method processes all the vectors labeled as matched to create clusters of records that are duplicates of each other. Finally, the records in each cluster are merged using predefined consolidated rules to obtain a de-duplicated cleansed database.
    Type: Grant
    Filed: April 14, 2017
    Date of Patent: February 11, 2020
    Assignee: LeanTaas, Inc.
    Inventors: Hugh Cassidy, Sofia DeMarco, Jayant Lakshmikanthan
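
The pipeline described in the abstract of patent 10558627 can be illustrated with a minimal Python sketch. This is not the patented implementation. The sample records, the field names (name, phone), the garbage rule, the 0.9 and 0.5 thresholds, the nearest-centroid classifier standing in for the machine learning model, and the "keep the longest value" merge rule are all assumptions made for illustration. The sketch also assumes the rules produce at least one matched and one unmatched vector to train on.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample data; the field names are illustrative only.
RAW = [
    {"name": "John Smith", "phone": "555-0101"},
    {"name": "Jon  Smith", "phone": "555 0101"},
    {"name": "J. Smith", "phone": "(555) 0101"},
    {"name": "Mary Jones", "phone": "555-9876"},
    {"name": "", "phone": ""},  # garbage record
]

def cleanse(records):
    """Filter garbage records, then apply simple cleansing rules."""
    out = []
    for r in records:
        if not r["name"].strip():
            continue  # assumed garbage rule: a record needs a name
        out.append({
            "name": " ".join(r["name"].lower().split()),
            "phone": "".join(ch for ch in r["phone"] if ch.isdigit()),
        })
    return out

def similarity_vector(a, b):
    """One similarity score in [0, 1] per field, in sorted field order."""
    return tuple(SequenceMatcher(None, a[k], b[k]).ratio() for k in sorted(a))

def rule_label(vec, hi=0.9, lo=0.5):
    """Assumed matching rules: all fields close, all fields far, or undecided."""
    if min(vec) >= hi:
        return "matched"
    if max(vec) <= lo:
        return "unmatched"
    return "unclassified"

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def deduplicate(records):
    clean = cleanse(records)
    pairs = {p: similarity_vector(clean[p[0]], clean[p[1]])
             for p in combinations(range(len(clean)), 2)}
    labels = {p: rule_label(v) for p, v in pairs.items()}

    # "Train" a nearest-centroid model on the rule-labeled vectors,
    # then use it to resolve the unclassified ones.
    c_match = centroid([pairs[p] for p, l in labels.items() if l == "matched"])
    c_non = centroid([pairs[p] for p, l in labels.items() if l == "unmatched"])
    for p, l in labels.items():
        if l == "unclassified":
            closer = sq_dist(pairs[p], c_match) <= sq_dist(pairs[p], c_non)
            labels[p] = "matched" if closer else "unmatched"

    # Cluster records connected by matched pairs, using union-find.
    parent = list(range(len(clean)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), l in labels.items():
        if l == "matched":
            parent[find(j)] = find(i)
    clusters = {}
    for i in range(len(clean)):
        clusters.setdefault(find(i), []).append(i)

    # Assumed consolidation rule: keep the longest value of each field.
    return [{k: max((clean[i][k] for i in members), key=len) for k in clean[0]}
            for members in clusters.values()]

if __name__ == "__main__":
    for record in deduplicate(RAW):
        print(record)
```

In this sketch the rules alone decide only the clear cases. The pairs involving "J. Smith" share a phone number but have a weaker name match, so they start as unclassified and are resolved by the model trained on the rule-labeled vectors. A production system would likely use a richer classifier and blocking to avoid comparing every pair.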