Patents by Inventor George Beskales

George Beskales has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Computer-implemented method for performing hierarchical classification

Patent number: 11782966

Abstract: Given a number of records and a number of target classes to which these records belong, a (weakly) supervised machine learning classification method leverages known, possibly dirty, classification rules, efficiently and accurately learns a classification model from training data, and applies the learned model to the data records to predict their classes.
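The abstract does not specify the learning method. As an illustration only, and not the patented method, the sketch below assumes three things: (a) classes form a hierarchy written as paths such as ("Electronics", "Laptops"), (b) the "possibly dirty" rules are hypothetical keyword-to-class mappings whose outputs become extra training examples with a lower weight (`RULE_WEIGHT`, an assumed value), and (c) each internal node of the hierarchy gets its own multinomial Naive Bayes model, applied top-down at prediction time. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

RULE_WEIGHT = 0.3  # hypothetical: rule-derived labels count less than curated labels


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing and per-example weights."""

    def __init__(self):
        self.class_weight = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        # examples: iterable of (text, label, weight)
        for text, label, weight in examples:
            self.class_weight[label] += weight
            for tok in tokenize(text):
                self.token_counts[label][tok] += weight
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.class_weight.values())
        best, best_score = None, -math.inf
        for label, weight in self.class_weight.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(weight / total)
            for tok in tokenize(text):
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def weak_labels(texts, rules):
    """Label unlabeled texts with the first (keyword, class_path) rule that fires."""
    labeled = []
    for text in texts:
        for keyword, path in rules:
            if keyword in tokenize(text):
                labeled.append((text, path, RULE_WEIGHT))
                break
    return labeled


def train_hierarchical(examples):
    """Train one classifier per internal node of the hierarchy; () is the root."""
    by_node = defaultdict(list)
    for text, path, weight in examples:
        for depth in range(len(path)):
            by_node[path[:depth]].append((text, path[depth], weight))
    return {node: NaiveBayes().fit(exs) for node, exs in by_node.items()}


def predict_hierarchical(models, text):
    """Descend from the root, choosing one child per level until a leaf is reached."""
    node = ()
    while node in models:
        node = node + (models[node].predict(text),)
    return node


labeled = [
    ("dell laptop 16gb ram", ("Electronics", "Laptops"), 1.0),
    ("usb c charging cable", ("Electronics", "Cables"), 1.0),
    ("cotton t shirt blue", ("Apparel", "Shirts"), 1.0),
]
rules = [("jeans", ("Apparel", "Pants")), ("hdmi", ("Electronics", "Cables"))]
unlabeled = ["slim fit jeans black", "hdmi cable 2m"]

models = train_hierarchical(labeled + weak_labels(unlabeled, rules))
print(predict_hierarchical(models, "black denim jeans"))  # ('Apparel', 'Pants')
```

In this toy run the only Pants example comes from the jeans rule, so the weakly labeled data is what lets the top-down classifier reach that leaf. Down-weighting rule output is one simple way to let curated labels outweigh noisy rules.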

Type: Grant

Filed: January 21, 2022

Date of Patent: October 10, 2023

Assignee: TAMR, INC.

Inventors: George Beskales, John Kraemer, Ihab F. Ilyas, Liam Cleary, Paul Roome
Method and system for large scale data curation

Patent number: 11500818

Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.
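To illustrate how operator hints can reduce the number of expert questions, here is a minimal sketch under stated assumptions: pairs are scored with token Jaccard similarity, confident scores are decided automatically, only the uncertain middle band goes to experts, and a hint is a hypothetical callable that may settle a pair before scoring. The thresholds and all names are assumptions, not details from the patent.

```python
from itertools import combinations


def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def link(items, hints=(), low=0.2, high=0.8):
    """Return (matches, expert_questions) over all pairs of items.

    Each hint is a callable (a, b) -> True / False / None; the first non-None
    answer settles the pair. Otherwise the similarity score decides confident
    cases, and only pairs with low < score < high are sent to experts.
    """
    matches, questions = [], []
    for a, b in combinations(items, 2):
        verdict = None
        for hint in hints:
            verdict = hint(a, b)
            if verdict is not None:
                break
        if verdict is None:
            score = jaccard(a, b)
            if score >= high:
                verdict = True
            elif score <= low:
                verdict = False
        if verdict is None:
            questions.append((a, b))  # uncertain: ask the expert crowd
        elif verdict:
            matches.append((a, b))
    return matches, questions


def abbreviation_hint(a, b):
    """Hypothetical operator hint: 'corp' and 'corporation' mean the same thing."""
    def normalize(s):
        return s.replace("corporation", "corp")
    return True if normalize(a) == normalize(b) else None


records = ["acme corp boston", "acme corporation boston", "globex inc springfield"]
print(link(records))                             # one pair left for the experts
print(link(records, hints=[abbreviation_hint]))  # the hint answers it instead
```

The same `link` routine could also be run over attribute names to propose schema mappings, which is one possible reading of treating schema mapping and deduplication as a single linkage problem; that pairing is an assumption, not something the abstract details.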

Type: Grant

Filed: February 19, 2021

Date of Patent: November 15, 2022

Assignee: TAMR, INC.

Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
Computer-implemented method for performing hierarchical classification

Patent number: 11232143

Abstract: Given a number of records and a number of target classes to which these records belong, a (weakly) supervised machine learning classification method leverages known, possibly dirty, classification rules, efficiently and accurately learns a classification model from training data, and applies the learned model to the data records to predict their classes.

Type: Grant

Filed: October 12, 2020

Date of Patent: January 25, 2022

Assignee: TAMR, INC.

Inventors: George Beskales, John Kraemer, Ihab F. Ilyas, Liam Cleary, Paul Roome
Scalable binning for big data deduplication

Patent number: 11204707

Abstract: Fast record deduplication is accomplished by providing, as input, data records having multiple attributes, and local similarity functions of individual attributes with local similarity thresholds. Bin IDs are then generated based on the local similarity functions and the local similarity thresholds. The Bin IDs are unique identifiers of a respective bin of records, and the bin of records is a set of records that are possibly pairwise similar. Local candidate pairs are identified based on data records that share Bin IDs. The local candidate pairs are aggregated to produce a set of global candidate pairs. The set of global candidate pairs is filtered by deciding whether each pair of data records represents a duplicate.
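A minimal sketch of the pipeline this abstract describes (bin IDs, local candidates, aggregation, filtering), under loud assumptions: every local similarity function is token-set Jaccard similarity, bin IDs come from prefix filtering (a standard technique swapped in here for illustration; the abstract does not say how bin IDs are generated), and a duplicate must satisfy every local predicate, so aggregation is a set intersection. Function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def tokens(text):
    # A fixed global token order is required; lexicographic order is assumed here.
    return sorted(set(text.lower().split()))


def jaccard(a, b):
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def bin_ids(text, t):
    """Prefix-filtering bin IDs for the local predicate jaccard >= t.

    Two token sets s with Jaccard >= t always share a token within their
    prefixes of length |s| - ceil(t * |s|) + 1, so no similar pair misses a bin.
    """
    toks = tokens(text)
    # The small epsilon can only lengthen the prefix, guarding against float round-off.
    prefix_len = len(toks) - math.ceil(t * len(toks) - 1e-9) + 1
    return toks[:prefix_len]


def candidate_pairs(records, thresholds):
    """records: {record_id: {attribute: text}}; thresholds: {attribute: t}."""
    global_pairs = None
    for attr, t in thresholds.items():
        bins = defaultdict(set)  # bin ID (attribute, token) -> record ids
        for rid, rec in records.items():
            for tok in bin_ids(rec[attr], t):
                bins[(attr, tok)].add(rid)
        local = {p for ids in bins.values() for p in combinations(sorted(ids), 2)}
        # Assumed matching rule: all local predicates must hold (a conjunction),
        # so global candidates are the intersection of the local candidate sets.
        global_pairs = local if global_pairs is None else global_pairs & local
    return global_pairs or set()


def deduplicate(records, thresholds):
    """Filter global candidates by actually evaluating the local similarities."""
    return sorted(
        (a, b)
        for a, b in candidate_pairs(records, thresholds)
        if all(jaccard(records[a][k], records[b][k]) >= t for k, t in thresholds.items())
    )


records = {
    1: {"name": "acme widget co", "city": "boston ma"},
    2: {"name": "acme widget company", "city": "boston ma"},
    3: {"name": "globex corp", "city": "springfield il"},
    4: {"name": "acme widget co", "city": "springfield il"},
}
print(deduplicate(records, {"name": 0.5, "city": 0.9}))  # [(1, 2)]
```

Records 1 and 4 share a name bin but no city bin, so the intersection drops them without ever computing their similarity.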

Type: Grant

Filed: April 3, 2020

Date of Patent: December 21, 2021

Assignee: TAMR, INC.

Inventors: George Beskales, Ihab F. Ilyas
Data curation system with version control for workflow states and provenance

Patent number: 11042523

Abstract: A data curation system is provided that includes various methods to enable efficient reuse of human and machine effort. To reuse effort, various facilities are presented that model, save, and allow for querying of provenance and state information of a curation workflow and allow for incremental, stateful transitions of the data and metadata thereof.
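The idea of saving workflow state and provenance so that earlier human and machine work is reused can be sketched as a tiny versioned store. This is a minimal illustration, assuming full-state snapshots per version (a real system would more likely store deltas) and hypothetical class and method names; it is not the patented design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    id: int
    parent: Optional[int]
    state: dict      # snapshot of curated data and metadata after this step
    operation: str   # provenance: what produced this version
    actor: str       # provenance: which human or machine did the work


class CurationRepo:
    def __init__(self, initial_state):
        self.versions = [Version(0, None, dict(initial_state), "import", "system")]

    def commit(self, changes, operation, actor, parent=None):
        """Record an incremental transition on top of a parent (default: latest)."""
        base = self.versions[-1 if parent is None else parent]
        state = {**base.state, **changes}  # earlier work carries forward untouched
        self.versions.append(Version(len(self.versions), base.id, state, operation, actor))
        return len(self.versions) - 1

    def lineage(self, version_id):
        """Provenance query: the operations that led to a version, oldest first."""
        chain, v = [], self.versions[version_id]
        while True:
            chain.append((v.id, v.operation, v.actor))
            if v.parent is None:
                return chain[::-1]
            v = self.versions[v.parent]

    def changed_by(self, key):
        """Provenance query: which versions (and actors) changed a given key."""
        return [
            (v.id, v.actor)
            for v in self.versions[1:]
            if self.versions[v.parent].state.get(key) != v.state.get(key)
        ]


repo = CurationRepo({("r1", "r2"): None, ("r1", "r3"): None})
v1 = repo.commit({("r1", "r2"): "match"}, "expert label", "alice")
v2 = repo.commit({("r1", "r3"): "non-match"}, "model prediction", "classifier")
print(repo.versions[v2].state)  # alice's label is reused in the newer version
print(repo.lineage(v2))
print(repo.changed_by(("r1", "r2")))  # [(1, 'alice')]
```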

Type: Grant

Filed: December 11, 2019

Date of Patent: June 22, 2021

Assignee: TAMR, INC.

Inventors: Vladimir Gluzman Peregrine, Ihab F. Ilyas, Michael Ralph Stonebraker, Stan Zdonik, Andrew H. Palmer, Alexander Richter Pagan, Daniel Meir Bruckner, George Beskales, Aizana Turmukhametova, Tianyu Zhu, Kanak Kshetri, Jason Liu, Nikolaus Bates-Haus
METHOD AND SYSTEM FOR LARGE SCALE DATA CURATION

Publication number: 20210173817

Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.

Type: Application

Filed: February 19, 2021

Publication date: June 10, 2021

Inventors: Nikolaus BATES-HAUS, George BESKALES, Daniel Meir BRUCKNER, Ihab F. ILYAS, Alexander Richter PAGAN, Michael Ralph STONEBRAKER
Method and system for large scale data curation

Patent number: 10929348

Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.

Type: Grant

Filed: November 23, 2016

Date of Patent: February 23, 2021

Assignee: TAMR, INC.

Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
Computer-implemented method for performing hierarchical classification

Patent number: 10803105

Abstract: Given a number of records and a number of target classes to which these records belong, a (weakly) supervised machine learning classification method leverages known, possibly dirty, classification rules, efficiently and accurately learns a classification model from training data, and applies the learned model to the data records to predict their classes.

Type: Grant

Filed: December 5, 2019

Date of Patent: October 13, 2020

Assignee: Tamr, Inc.

Inventors: George Beskales, John Kraemer, Ihab F. Ilyas, Liam Cleary, Paul Roome
SCALABLE BINNING FOR BIG DATA DEDUPLICATION

Publication number: 20200233597

Abstract: Fast record deduplication is accomplished by providing, as input, data records having multiple attributes, and local similarity functions of individual attributes with local similarity thresholds. Bin IDs are then generated based on the local similarity functions and the local similarity thresholds. The Bin IDs are unique identifiers of a respective bin of records, and the bin of records is a set of records that are possibly pairwise similar. Local candidate pairs are identified based on data records that share Bin IDs. The local candidate pairs are aggregated to produce a set of global candidate pairs. The set of global candidate pairs is filtered by deciding whether each pair of data records represents a duplicate.

Type: Application

Filed: April 3, 2020

Publication date: July 23, 2020

Inventors: George BESKALES, Ihab F. ILYAS
DATA CURATION SYSTEM WITH VERSION CONTROL FOR WORKFLOW STATES AND PROVENANCE

Publication number: 20200117643

Abstract: A data curation system is provided that includes various methods to enable efficient reuse of human and machine effort. To reuse effort, various facilities are presented that model, save, and allow for querying of provenance and state information of a curation workflow and allow for incremental, stateful transitions of the data and metadata thereof.

Type: Application

Filed: December 11, 2019

Publication date: April 16, 2020

Inventors: Vladimir Gluzman PEREGRINE, Ihab F. ILYAS, Michael Ralph STONEBRAKER, Stan ZDONIK, Andrew H. PALMER, Alexander Richter PAGAN, Daniel Meir BRUCKNER, George BESKALES, Aizana TURMUKHAMETOVA, Tianyu ZHU, Kanak KSHETRI, Jason LIU, Nikolaus BATES-HAUS
Scalable binning for big data deduplication

Patent number: 10613785

Abstract: A very efficient computer system is presented to generate all pairs of records that have at least a given similarity. Similarity is defined in terms of the textual similarity of the record attributes and/or the absolute difference for numeric record attributes. Software assigns each record to a number of bins, and then compares only pairs of records that belong to the same bin. This is more efficient than comparing all pairs of records because far fewer record pairs are compared.
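The efficiency argument is easiest to see on the numeric case this abstract mentions. A minimal sketch, assuming one numeric attribute with an absolute-difference threshold `d` and a hypothetical floor-based bin assignment (the abstract does not give the actual scheme); it counts how many pairs are compared against the all-pairs baseline. Integer example values avoid floating-point edge cases at bin boundaries.

```python
import math
from collections import defaultdict
from itertools import combinations


def numeric_bins(x, d):
    """Bin IDs for the predicate |x - y| <= d.

    If |x - y| <= d then floor(x / d) and floor(y / d) differ by at most 1, so
    placing each value in bins k and k + 1 (k = floor(x / d)) guarantees that
    every qualifying pair shares at least one bin.
    """
    k = math.floor(x / d)
    return (k, k + 1)


def similar_pairs(values, d):
    bins = defaultdict(list)
    for i, x in enumerate(values):
        for b in numeric_bins(x, d):
            bins[b].append(i)
    # A pair can share two bins, so candidates are collected in a set.
    candidates = {p for ids in bins.values() for p in combinations(ids, 2)}
    matches = sorted((i, j) for i, j in candidates if abs(values[i] - values[j]) <= d)
    return matches, len(candidates)


prices = [100, 103, 250, 252, 900, 1000]
matches, compared = similar_pairs(prices, d=5)
all_pairs = len(prices) * (len(prices) - 1) // 2
print(matches, compared, all_pairs)  # [(0, 1), (2, 3)] 2 15
```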

Type: Grant

Filed: October 11, 2017

Date of Patent: April 7, 2020

Assignee: Tamr, Inc.

Inventors: George Beskales, Ihab F. Ilyas
DATA CURATION SYSTEM WITH VERSION CONTROL FOR WORKFLOW STATES AND PROVENANCE

Publication number: 20180341667

Abstract: A data curation system is provided that includes various methods to enable efficient reuse of human and machine effort. To reuse effort, various facilities are presented that model, save, and allow the querying of provenance and state information of a curation workflow and allow for incremental, stateful transitions of the data and the metadata.

Type: Application

Filed: August 2, 2018

Publication date: November 29, 2018

Inventors: Vladimir Gluzman Peregrine, Ihab F. Ilyas, Michael Ralph Stonebraker, Stan Zdonik, Andrew H. Palmer, Alexander Richter Pagan, Daniel Meir Bruckner, George Beskales, Aizana Turmukhametova, Tianyu Zhu, Kanak Kshetri, Jason Liu, Nikolaus Bates-Haus
Method and system for integrating data into a database

Patent number: 9720986

Abstract: A method for integrating data into a database comprises storing data comprising a plurality of records which each comprise a plurality of attributes; analyzing a sample of records from the plurality of records by: identifying duplicate pairs of records in the sample records; analyzing each attribute of each record of the duplicate pairs of records to identify a respective attribute condition which is indicative that the pairs of records are duplicates; wherein the method further comprises: comparing each attribute of a record with the respective attribute condition and, if the attribute satisfies the attribute condition, allocating the record to a disjoint group which comprises records with an attribute that satisfies the same respective attribute condition; identifying duplicate pairs of records in the records in each disjoint group; identifying duplicate pairs of records in records that are not allocated to a disjoint group; and consolidating each duplicate pair of records into one consolidated record and s
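The steps listed in this (truncated) abstract can be sketched as follows. This is a minimal illustration under assumptions that do not come from the patent text: an "attribute condition" is modeled as exact agreement on an attribute across every sampled duplicate pair, disjoint groups are keyed by that attribute's value, the pairwise matcher `is_duplicate` is a hypothetical name comparison, and consolidation keeps the first non-empty value per attribute.

```python
from collections import defaultdict
from itertools import combinations


def learn_conditions(sample_duplicates, attributes):
    """Assumed condition: an attribute on which every known duplicate pair agrees exactly."""
    return [
        a for a in attributes
        if all(r1.get(a) and r1.get(a) == r2.get(a) for r1, r2 in sample_duplicates)
    ]


def allocate(records, conditions):
    """Put each record in one disjoint group keyed by the first condition it satisfies."""
    groups, unallocated = defaultdict(list), []
    for rec in records:
        for a in conditions:
            if rec.get(a):
                groups[(a, rec[a])].append(rec)
                break
        else:
            unallocated.append(rec)
    return groups, unallocated


def is_duplicate(r1, r2):
    # Hypothetical pairwise matcher: same normalized name.
    return r1["name"].strip().lower() == r2["name"].strip().lower()


def integrate(records, sample_duplicates, attributes):
    conditions = learn_conditions(sample_duplicates, attributes)
    groups, unallocated = allocate(records, conditions)
    # Compare only within each group, plus among the unallocated records.
    pairs = [
        (a, b)
        for members in [*groups.values(), unallocated]
        for a, b in combinations(members, 2)
        if is_duplicate(a, b)
    ]
    # Union-find so that chains of duplicate pairs collapse into one record.
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a["id"])] = find(b["id"])
    consolidated = defaultdict(dict)
    for rec in records:
        target = consolidated[find(rec["id"])]
        for key, value in rec.items():
            if value and key not in target:  # keep the first non-empty value
                target[key] = value
    return list(consolidated.values())


sample = [({"name": "Acme", "phone": "555-1", "city": "Boston"},
           {"name": "ACME", "phone": "555-1", "city": "Cambridge"})]
records = [
    {"id": 1, "name": "Acme", "phone": "555-1", "city": "Boston"},
    {"id": 2, "name": "acme ", "phone": "555-1", "city": ""},
    {"id": 3, "name": "Globex", "phone": "", "city": "Springfield"},
    {"id": 4, "name": "globex", "phone": "", "city": ""},
    {"id": 5, "name": "Initech", "phone": "555-9", "city": "Austin"},
]
for rec in integrate(records, sample, ["phone", "city"]):
    print(rec)
```

Here the sample shows duplicates agreeing on phone but not on city, so only phone becomes a grouping condition; records without a phone fall into the unallocated set and are still compared among themselves.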

Type: Grant

Filed: June 27, 2013

Date of Patent: August 1, 2017

Assignee: QATAR FOUNDATION

Inventors: George Beskales, Ihab Francis Ilyas Kaldas
METHOD AND SYSTEM FOR LARGE SCALE DATA CURATION

Publication number: 20170075918

Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.

Type: Application

Filed: November 23, 2016

Publication date: March 16, 2017

Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
Method and system for large scale data curation

Patent number: 9542412

Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.

Type: Grant

Filed: March 28, 2014

Date of Patent: January 10, 2017

Assignee: Tamr, Inc.

Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
DATA CURATION SYSTEM WITH VERSION CONTROL FOR WORKFLOW STATES AND PROVENANCE

Publication number: 20160048542

Abstract: A data curation system is provided that includes various methods to enable efficient reuse of human and machine effort. To reuse effort, various facilities are presented that model, save, and allow the querying of provenance and state information of a curation workflow and allow for incremental, stateful transitions of the data and the metadata.

Type: Application

Filed: September 2, 2014

Publication date: February 18, 2016

Inventors: Vladimir Gluzman Peregrine, Ihab F. Ilyas, Michael Ralph Stonebraker, Stan Zdonik, Andrew H. Palmer, Alexander Richter Pagan, Daniel Meir Bruckner, George Beskales, Aizana Turmukhametova, Tianyu Zhu, Kanak Kshetri, Jason Liu, Nikolaus Bates-Haus
METHOD AND SYSTEM FOR LARGE SCALE DATA CURATION

Publication number: 20150278241

Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.

Type: Application

Filed: March 28, 2014

Publication date: October 1, 2015

Applicant: DATATAMER, INC.

Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
Data cleaning

Patent number: 8805798

Abstract: A computer-implemented method comprising partitioning data representing an input instance of a database including multiple tuples into multiple fragments of tuples, detecting tuples which violate a data quality specification in respective ones of the fragments, selecting a data cleaning asset on the basis of characteristics of errors in detected tuples for a fragment and based on declared asset capabilities, assigning a selected data cleaning asset to the fragment, the selected data cleaning asset to provide a set of candidate corrections for the detected tuples in the fragment, providing data representing an output instance of the database in which detected tuples are replaced with selected candidate corrections.
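A minimal sketch of the per-fragment flow in this abstract, with every concrete choice assumed for illustration: tuples are partitioned by zip code, the data quality specification is "city is non-empty and zip determines city", two hypothetical assets declare which error types they can repair, and the selection policy picks the most specialized asset whose declared capabilities cover all error types found in a fragment.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Asset:
    name: str
    capabilities: frozenset  # error types this asset declares it can repair
    repair: Callable         # (tuples, fragment indexes, errors) -> {index: corrected tuple}


def partition(tuples, key):
    fragments = defaultdict(list)
    for i, t in enumerate(tuples):
        fragments[t[key]].append(i)
    return list(fragments.values())


def detect(tuples, idxs):
    """Assumed quality spec: city is non-empty, and zip -> city holds in the fragment."""
    cities = {tuples[i]["city"] for i in idxs if tuples[i]["city"]}
    errors = {}
    for i in idxs:
        if not tuples[i]["city"]:
            errors[i] = "missing"
        elif len(cities) > 1:
            errors[i] = "fd"
    return errors


def majority_vote(tuples, idxs, errors):
    counts = Counter(tuples[i]["city"] for i in idxs if tuples[i]["city"])
    best = counts.most_common(1)[0][0] if counts else "UNKNOWN"
    return {i: {**tuples[i], "city": best} for i in errors}


def default_fill(tuples, idxs, errors):
    return {i: {**tuples[i], "city": "UNKNOWN"} for i in errors}


ASSETS = [
    Asset("majority_vote", frozenset({"fd", "missing"}), majority_vote),
    Asset("default_fill", frozenset({"missing"}), default_fill),
]


def clean(tuples, key="zip"):
    output = list(tuples)  # the input instance is left unmodified
    for idxs in partition(tuples, key):
        errors = detect(tuples, idxs)
        needed = set(errors.values())
        usable = [a for a in ASSETS if needed and needed <= a.capabilities]
        if not usable:
            continue  # clean fragment, or no declared asset covers its errors
        asset = min(usable, key=lambda a: len(a.capabilities))  # most specialized wins
        for i, fixed in asset.repair(tuples, idxs, errors).items():
            output[i] = fixed
    return output


data = [
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Boston"},
    {"zip": "10001", "city": ""},
    {"zip": "60601", "city": "Chicago"},
]
print([t["city"] for t in clean(data)])
```

The fragment for 02139 has a dependency violation, which only `majority_vote` declares it can handle; the 10001 fragment has only a missing value, so the narrower `default_fill` is chosen.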

Type: Grant

Filed: May 10, 2012

Date of Patent: August 12, 2014

Assignee: Qatar Foundation

Inventors: Ihab Francis Ilyas Kaldas, George Beskales, Ahmed Elmagarmid
Method and System for Integrating Data Into a Database

Publication number: 20140156606

Abstract: A method for integrating data into a database comprises storing data comprising a plurality of records which each comprise a plurality of attributes; analysing a sample of records from the plurality of records by: identifying duplicate pairs of records in the sample records; analysing each attribute of each record of the duplicate pairs of records to identify a respective attribute condition which is indicative that the pairs of records are duplicates; wherein the method further comprises: comparing each attribute of a record with the respective attribute condition and, if the attribute satisfies the attribute condition, allocating the record to a disjoint group which comprises records with an attribute that satisfies the same respective attribute condition; identifying duplicate pairs of records in the records in each disjoint group; identifying duplicate pairs of records in records that are not allocated to a disjoint group; and consolidating each duplicate pair of records into one consolidated record and s

Type: Application

Filed: June 27, 2013

Publication date: June 5, 2014

Inventors: George BESKALES, Ihab Francis Ilyas KALDAS
DATA CLEANING

Publication number: 20130275393

Abstract: A computer-implemented method comprising partitioning data representing an input instance of a database including multiple tuples into multiple fragments of tuples, detecting tuples which violate a data quality specification in respective ones of the fragments, selecting a data cleaning asset on the basis of characteristics of errors in detected tuples for a fragment and based on declared asset capabilities, assigning a selected data cleaning asset to the fragment, the selected data cleaning asset to provide a set of candidate corrections for the detected tuples in the fragment, providing data representing an output instance of the database in which detected tuples are replaced with selected candidate corrections.

Type: Application

Filed: May 10, 2012

Publication date: October 17, 2013

Applicant: Qatar Foundation

Inventors: Ihab Francis Ilyas Kaldas, George Beskales, Ahmed Elmagarmid