Patents Assigned to Tamr, Inc.
  • Patent number: 11948055
    Abstract: Record clustering is performed for a collection of records using training rules, training-rule labels, training data created from a sample of pairs of records, a pair-wise classifier, and a clustering algorithm. Record clustering is also performed for a collection of records using prediction rules, prediction-rule labels, a pair-wise classifier, and a clustering algorithm.
    Type: Grant
    Filed: March 1, 2023
    Date of Patent: April 2, 2024
    Assignee: TAMR, INC.
    Inventors: George Anwar Dany Beskales, Nikolaus Bates-Haus, Ihab F. Ilyas
  • Patent number: 11782966
    Abstract: Given a number of records and a number of target classes to which these records belong, a (weakly) supervised machine learning classification method leverages known, possibly dirty, classification rules, efficiently and accurately learns a classification model from training data, and applies the learned model to the data records to predict their classes.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: October 10, 2023
    Assignee: TAMR, INC.
    Inventors: George Beskales, John Kraemer, Ihab F. Ilyas, Liam Cleary, Paul Roome
  • Patent number: 11500818
    Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.
    Type: Grant
    Filed: February 19, 2021
    Date of Patent: November 15, 2022
    Assignee: TAMR, INC.
    Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
  • Patent number: 11416780
    Abstract: A collection of clusters is selected to be used in training in an active learning workflow when using clusters to train supervised entity resolution in data sets. A collection of records is provided wherein each record in the collection has a cluster membership. A collection of record pairs is also provided, each record pair containing two distinct records from the collection of records, and each record pair having a similarity score. A collection of clusters is generated with uncertainty from the collection of records and the collection of record pairs. A subset of the collection of clusters with uncertainty is then selected using weighted sampling, wherein a function of the cluster uncertainty is used as the weight in the weighted sampling. The subset of the collection of clusters with uncertainty is the collection of clusters for training in an active learning workflow when using clusters to train supervised entity resolution in data sets.
    Type: Grant
    Filed: September 22, 2021
    Date of Patent: August 16, 2022
    Assignee: TAMR, INC.
    Inventor: George Anwar Dany Beskales
  • Patent number: 11321359
    Abstract: Methods are provided to represent proposed changes to clusterings for ease of review, as well as tools to help subject matter experts identify clusters that warrant review versus those that do not. These tools make overall assessment of proposed clustering changes and targeted curation practical at large scale. Use of these tools and methods enables efficient data management operations when dealing with extreme scale, such as where entity resolution involves clusterings created from data sources involving millions of entities.
    Type: Grant
    Filed: December 6, 2019
    Date of Patent: May 3, 2022
    Assignee: TAMR, INC.
    Inventors: Timothy Kwok Webber, George Anwar Dany Beskales, Dennis Cunningham, Alan Benjamin Wagner Rodriguez, Liam Cleary
  • Patent number: 11294937
    Abstract: A method is provided for producing a record clustering with estimated accuracy metrics with confidence intervals. These metrics can be used to determine whether a clustering should be accepted as the output of the system, and whether model training is necessary to meet desired clustering accuracy. A collection of test records is used in the process, wherein each test record is a member of a collection of input records.
    Type: Grant
    Filed: October 4, 2021
    Date of Patent: April 5, 2022
    Assignee: TAMR, INC.
    Inventors: George Anwar Dany Beskales, Alexandra V. Batchelor, Brian A. Long
  • Patent number: 11232143
    Abstract: Given a number of records and a number of target classes to which these records belong, a (weakly) supervised machine learning classification method leverages known, possibly dirty, classification rules, efficiently and accurately learns a classification model from training data, and applies the learned model to the data records to predict their classes.
    Type: Grant
    Filed: October 12, 2020
    Date of Patent: January 25, 2022
    Assignee: TAMR, INC.
    Inventors: George Beskales, John Kraemer, Ihab F. Ilyas, Liam Cleary, Paul Roome
  • Patent number: 11204707
    Abstract: Fast record deduplication is accomplished by providing, as input, data records having multiple attributes, and local similarity functions of individual attributes with local similarity thresholds. Bin IDs are then generated based on the local similarity functions and the local similarity thresholds. The Bin IDs are unique identifiers of a respective bin of records, and the bin of records is a set of records that are possibly pairwise similar. Local candidate pairs are identified based on data records that share Bin IDs. The local candidate pairs are aggregated to produce a set of global candidate pairs. The set of global candidate pairs is filtered by deciding whether a pair of data records represents a duplicate.
    Type: Grant
    Filed: April 3, 2020
    Date of Patent: December 21, 2021
    Assignee: TAMR, INC.
    Inventors: George Beskales, Ihab F. Ilyas
  • Patent number: 11049028
    Abstract: Record clustering is performed by learning from verified clusters which are used as the source of training data in a deduplication workflow utilizing supervised machine learning.
    Type: Grant
    Filed: March 9, 2021
    Date of Patent: June 29, 2021
    Assignee: TAMR, INC.
    Inventors: George Anwar Dany Beskales, Pedro Giesemann Cattori, Alexandra V. Batchelor, Brian A. Long, Nikolaus Bates-Haus
  • Patent number: 11042523
    Abstract: A data curation system is provided that includes various methods to enable efficient reuse of human and machine effort. To reuse effort, various facilities are presented that model, save, and allow for querying of provenance and state information of a curation workflow and allow for incremental, stateful transitions of the data and metadata thereof.
    Type: Grant
    Filed: December 11, 2019
    Date of Patent: June 22, 2021
    Assignee: TAMR, INC.
    Inventors: Vladimir Gluzman Peregrine, Ihab F. Ilyas, Michael Ralph Stonebraker, Stan Zdonik, Andrew H. Palmer, Alexander Richter Pagan, Daniel Meir Bruckner, George Beskales, Aizana Turmukhametova, Tianyu Zhu, Kanak Kshetri, Jason Liu, Nikolaus Bates-Haus
  • Patent number: 11003636
    Abstract: A system and method of use resolves the frustration of repeated manual work during schema mapping. The system utilizes a transformation graph—a collection of nodes (unified attributes) and edges (transformations) in which source attributes are mapped and transformed. The system further leverages existing mappings and transformations for the purpose of suggesting to a user the optimal paths (i.e., the lowest cost paths) for mapping new sources, which is particularly useful when new sources share similarity with previously mapped sources and require the same transformations. As such, the system also promotes an evolving schema by allowing users to select which unified attributes they want to include in a target schema at any time. The system addresses the technical challenge of finding optimal transformation paths and how to present these to the user for evaluation.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: May 11, 2021
    Assignee: TAMR, INC.
    Inventors: Sharon Roth, Ihab F. Ilyas, Daniel Meir Bruckner, Gideon Goldin
  • Patent number: 10929348
    Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.
    Type: Grant
    Filed: November 23, 2016
    Date of Patent: February 23, 2021
    Assignee: TAMR, INC.
    Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
  • Patent number: 10877948
    Abstract: Given a local distance metric for geospatial features, a binning is produced that is guaranteed to label features within a given distance threshold with the same bin, while labeling a minimum number of features separated by a distance that is greater than the threshold with the same bin.
    Type: Grant
    Filed: July 1, 2020
    Date of Patent: December 29, 2020
    Assignee: TAMR, INC.
    Inventors: George Anwar Dany Beskales, Nikolaus Bates-Haus
  • Patent number: 10860548
    Abstract: A system and method of use resolves the frustration of repeated manual work during schema mapping. The system utilizes a transformation graph—a collection of nodes (unified attributes) and edges (transformations) in which source attributes are mapped and transformed. The system further leverages existing mappings and transformations for the purpose of suggesting to a user the optimal paths (i.e., the lowest cost paths) for mapping new sources, which is particularly useful when new sources share similarity with previously mapped sources and require the same transformations. As such, the system also promotes an evolving schema by allowing users to select which unified attributes they want to include in a target schema at any time. The system addresses the technical challenge of finding optimal transformation paths and how to present these to the user for evaluation.
    Type: Grant
    Filed: December 5, 2019
    Date of Patent: December 8, 2020
    Assignee: TAMR, INC.
    Inventors: Sharon Roth, Ihab F. Ilyas, Daniel Meir Bruckner, Gideon Goldin
  • Patent number: 10817362
    Abstract: Structured metadata is automatically captured regarding issues reported by a user when the user interacts with application software for presentation, analysis, or management of structured data. The reported issues correspond to structured data that is displayed by the application software. During user interaction with the application software, a user interface display screen is presented that includes one or more fields for reporting an issue with respect to structured data that is presently being displayed by the application software. Structured metadata is then automatically captured related to the reported issue. The structured metadata includes at least a location within the structured data for the reported issue. A record of each reported issue is stored in a database. Each record includes the reported issue and the automatically captured structured metadata related to the reported issue.
    Type: Grant
    Filed: February 25, 2020
    Date of Patent: October 27, 2020
    Assignee: TAMR, INC.
    Inventors: Daniel Meir Bruckner, Gideon Goldin, Matthew Holzapfel, Nicolas Malfroy-Camine
  • Patent number: 10803105
    Abstract: Given a number of records and a number of target classes to which these records belong, a (weakly) supervised machine learning classification method leverages known, possibly dirty, classification rules, efficiently and accurately learns a classification model from training data, and applies the learned model to the data records to predict their classes.
    Type: Grant
    Filed: December 5, 2019
    Date of Patent: October 13, 2020
    Assignee: Tamr, Inc.
    Inventors: George Beskales, John Kraemer, Ihab F. Ilyas, Liam Cleary, Paul Roome
  • Patent number: 10613785
    Abstract: A very efficient computer system is presented to generate all pairs of records that have a certain similarity. Similarity is defined in terms of the textual similarity of the record attributes and/or absolute difference for numeric record attributes. Software assigns each record to a number of bins, and then compares pairs of records that belong to the same bin. This is more efficient than comparing all pairs of records since the number of records compared to each other is much smaller.
    Type: Grant
    Filed: October 11, 2017
    Date of Patent: April 7, 2020
    Assignee: Tamr, Inc.
    Inventors: George Beskales, Ihab F. Ilyas
  • Patent number: 9542412
    Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.
    Type: Grant
    Filed: March 28, 2014
    Date of Patent: January 10, 2017
    Assignee: Tamr, Inc.
    Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
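
Illustrative note: the abstracts for patents 10613785 and 11204707 above describe assigning records to bins and comparing only the pairs of records that share a bin, rather than all pairs. The Python sketch below shows that general idea only and is not the patented method. It rests on assumptions that do not come from the abstracts: each lowercased whitespace token of a record serves as a bin ID, Jaccard similarity over token sets is the similarity function, and the names `tokens`, `jaccard`, `candidate_pairs`, and `similar_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tokens(text):
    """Return the set of lowercased whitespace tokens of a string."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(records):
    """Yield each index pair (i, j), i < j, of records sharing a bin.

    Assumption for illustration: every token of a record is a bin ID.
    """
    bins = defaultdict(list)
    for idx, text in enumerate(records):
        for tok in tokens(text):
            bins[tok].append(idx)
    seen = set()
    for members in bins.values():
        # Indices were appended in increasing order, so i < j holds.
        for pair in combinations(members, 2):
            if pair not in seen:
                seen.add(pair)
                yield pair


def similar_pairs(records, threshold=0.5):
    """Return sorted candidate pairs whose similarity meets the threshold."""
    token_sets = [tokens(r) for r in records]
    return sorted(
        (i, j)
        for i, j in candidate_pairs(records)
        if jaccard(token_sets[i], token_sets[j]) >= threshold
    )


records = [
    "Acme Corp Boston",
    "ACME Corp. Boston",
    "Globex Inc Springfield",
    "Globex Incorporated Springfield",
]
pairs = similar_pairs(records)
print(pairs)
```

With token bins and a positive threshold, any pair that meets the threshold shares at least one token, so this particular sketch skips no qualifying pair while never comparing records that have no token in common. Other bin definitions would trade off differently.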