Patents by Inventor Ihab F. Ilyas
Ihab F. Ilyas has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11948055
Abstract: Record clustering is performed for a collection of records using training rules, training-rule labels, training data created from a sample of pairs of records, a pair-wise classifier, and a clustering algorithm. Record clustering is also performed for a collection of records using prediction rules, prediction-rule labels, a pair-wise classifier, and a clustering algorithm.
Type: Grant
Filed: March 1, 2023
Date of Patent: April 2, 2024
Assignee: TAMR, INC.
Inventors: George Anwar Dany Beskales, Nikolaus Bates-Haus, Ihab F. Ilyas
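The abstract names the ingredients (training rules, labels on sampled pairs, a pair-wise classifier, a clustering algorithm) but not how they are implemented. The following is a minimal sketch of how such a pipeline can fit together, not the patented method. Every specific here is an assumption: the rule `training_rule` (shared email means match, different zip means non-match), the stand-in "classifier" (a single learned cut-off on token Jaccard similarity of names), and connected components via union-find as the clustering algorithm.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def training_rule(r1: dict, r2: dict):
    """Hypothetical training rule: 1 = match, 0 = non-match, None = no opinion."""
    if r1["email"] and r1["email"] == r2["email"]:
        return 1
    if r1["zip"] != r2["zip"]:
        return 0
    return None

def learn_threshold(labeled):
    """Stand-in pair-wise classifier: the similarity cut-off that best agrees with rule labels."""
    best_t, best_acc = 0.5, -1.0
    for t in sorted({s for s, _ in labeled}):
        acc = sum((s >= t) == bool(y) for s, y in labeled) / len(labeled)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def cluster(records, threshold):
    """Connected components (union-find) over pairs the classifier calls a match."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i]["name"], records[j]["name"]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

records = [
    {"name": "Acme Corp", "email": "info@acme.com", "zip": "02139"},
    {"name": "Acme Corporation", "email": "info@acme.com", "zip": "02139"},
    {"name": "Acme Corp Inc", "email": "", "zip": "02139"},
    {"name": "Globex Ltd", "email": "sales@globex.com", "zip": "10001"},
]
# Training data: label a sample of pairs (here, all pairs) with the rule; drop abstentions.
labeled = []
for i, j in combinations(range(len(records)), 2):
    y = training_rule(records[i], records[j])
    if y is not None:
        labeled.append((jaccard(records[i]["name"], records[j]["name"]), y))
threshold = learn_threshold(labeled)
clusters = cluster(records, threshold)
```

Note that records 0 and 2 are never labeled by the rule, yet they end up in the same cluster: the classifier trained on rule-labeled pairs generalizes to pairs the rules do not cover, and the clustering step then resolves transitivity.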
-
Patent number: 11782966
Abstract: Given a number of records and a number of target classes to which these records belong, a (weakly) supervised machine learning classification method leverages known, possibly dirty, classification rules, efficiently and accurately learns a classification model from training data, and applies the learned model to the data records to predict their classes.
Type: Grant
Filed: January 21, 2022
Date of Patent: October 10, 2023
Assignee: TAMR, INC.
Inventors: George Beskales, John Kraemer, Ihab F. Ilyas, Liam Cleary, Paul Roome
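To make "weakly supervised with possibly dirty rules" concrete, here is a minimal sketch, assuming keyword rules (`RULES`, hypothetical) whose outputs are used as noisy training labels, and a multinomial naive Bayes classifier as a swapped-in learner. The abstract does not say which model the patent uses or how it corrects rule errors; this sketch does no error correction at all, it only shows the rules-to-labels-to-model flow.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, possibly dirty rules: keyword -> class.
RULES = {"laptop": "electronics", "cable": "electronics", "shirt": "apparel", "apple": "grocery"}

def rule_label(text):
    """Majority vote of the rules that fire; None when no rule applies."""
    votes = Counter(RULES[t] for t in text.lower().split() if t in RULES)
    return votes.most_common(1)[0][0] if votes else None

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in learner)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            toks = text.lower().split()
            self.word_counts[label].update(toks)
            self.vocab.update(toks)
        self.total = len(labels)
        return self

    def predict(self, text):
        best, best_score = None, -math.inf
        for c, n in self.class_counts.items():
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(n / self.total)  # log prior
            for tok in text.lower().split():
                score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

records = [
    "dell laptop 16gb", "usb cable 2m", "cotton shirt blue",
    "apple fruit bag", "laptop sleeve cotton", "linen shirt white",
]
# Weak supervision: rule outputs become (noisy) training labels.
train = [(r, rule_label(r)) for r in records if rule_label(r) is not None]
model = NaiveBayes().fit([r for r, _ in train], [y for _, y in train])
```

The rule set is deliberately dirty ("laptop sleeve cotton" is labeled electronics), and the learned model can still assign a class to records no rule fires on, such as "wool sweater white".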
-
Patent number: 11500818
Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.
Type: Grant
Filed: February 19, 2021
Date of Patent: November 15, 2022
Assignee: TAMR, INC.
Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
-
Patent number: 11232143
Abstract: Given a number of records and a number of target classes to which these records belong, a (weakly) supervised machine learning classification method leverages known, possibly dirty, classification rules, efficiently and accurately learns a classification model from training data, and applies the learned model to the data records to predict their classes.
Type: Grant
Filed: October 12, 2020
Date of Patent: January 25, 2022
Assignee: TAMR, INC.
Inventors: George Beskales, John Kraemer, Ihab F. Ilyas, Liam Cleary, Paul Roome
-
Patent number: 11204707
Abstract: Fast record deduplication is accomplished by providing, as an input, data records having multiple attributes and local similarity functions of individual attributes with local similarity thresholds. Bin IDs are then generated based on the local similarity functions and the local similarity thresholds. The Bin IDs are unique identifiers of a respective bin of records, and the bin of records is a set of records that are possibly pairwise similar. Local candidate pairs are identified based on data records that share Bin IDs. The local candidate pairs are aggregated to produce a set of global candidate pairs. The set of global candidate pairs is filtered by deciding whether each pair of data records represents a duplicate.
Type: Grant
Filed: April 3, 2020
Date of Patent: December 21, 2021
Assignee: TAMR, INC.
Inventors: George Beskales, Ihab F. Ilyas
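The bin-ID pipeline in this abstract (bin IDs per attribute, local candidate pairs, aggregation into global candidates, final filtering) can be sketched as follows. This is an illustration under stated assumptions, not the patented scheme: every attribute is text compared with token Jaccard similarity, the bin-ID function simply emits one bin per token (so any pair meeting a threshold above 0 shares a bin), and aggregation is assumed to be an AND across attributes. The record data and thresholds are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def tokens(value: str) -> set:
    return set(value.lower().split())

def jaccard(a: str, b: str) -> float:
    """Local similarity function for a text attribute."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def bin_ids(value: str) -> set:
    # Assumed scheme: one bin per token. For any threshold > 0, two values that
    # reach the threshold share a token, so they are guaranteed to share a bin.
    return tokens(value)

def local_candidates(records: dict, attr: str) -> set:
    """Pairs of record IDs that share at least one bin ID on one attribute."""
    bins = defaultdict(set)
    for rid, rec in records.items():
        for b in bin_ids(rec[attr]):
            bins[b].add(rid)
    pairs = set()
    for members in bins.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

def deduplicate(records: dict, thresholds: dict) -> list:
    # Aggregate local candidates into global candidates (assumed: AND across attributes).
    global_pairs = set.intersection(*(local_candidates(records, a) for a in thresholds))
    # Filter: keep a global candidate only if every local similarity meets its threshold.
    return sorted(
        (i, j) for i, j in global_pairs
        if all(jaccard(records[i][a], records[j][a]) >= t for a, t in thresholds.items())
    )

records = {
    1: {"name": "john smith", "city": "new york"},
    2: {"name": "jon smith", "city": "new york city"},
    3: {"name": "john smith", "city": "boston"},
    4: {"name": "mary jones", "city": "new york"},
}
duplicates = deduplicate(records, {"name": 0.3, "city": 0.5})
```

Only pairs that share a bin on every attribute are ever scored, which is where the speed-up over comparing all pairs comes from; the exact similarity check is paid only for the surviving global candidates.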
-
Publication number: 20210334248
Abstract: A system and method of use resolves the frustration of repeated manual work during schema mapping. The system utilizes a transformation graph: a collection of nodes (unified attributes) and edges (transformations) in which source attributes are mapped and transformed. The system further leverages existing mappings and transformations for the purpose of suggesting to a user the optimal paths (i.e., the lowest cost paths) for mapping new sources, which is particularly useful when new sources share similarity with previously mapped sources and require the same transformations. As such, the system also promotes an evolving schema by allowing users to select which unified attributes they want to include in a target schema at any time. The system addresses the technical challenge of finding optimal transformation paths and how to present these to the user for evaluation.
Type: Application
Filed: May 10, 2021
Publication date: October 28, 2021
Inventors: Sharon ROTH, Ihab F. ILYAS, Daniel Meir BRUCKNER, Gideon GOLDIN
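Finding "the lowest cost paths" in a transformation graph is a shortest-path problem. A minimal sketch, assuming non-negative edge costs and using Dijkstra's algorithm (the abstract does not name the algorithm or the cost model): nodes are attribute names, each edge carries a cost and a transformation name, and the function returns the total cost plus the sequence of transformations to suggest. The node names, costs, and transformation names below are hypothetical.

```python
import heapq

def cheapest_path(graph, start, goal):
    """Dijkstra over graph: node -> list of (neighbor, cost, transformation_name).

    Returns (total_cost, [transformation names in order]) or None if unreachable.
    """
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost, name in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = (node, name)
                heapq.heappush(heap, (nd, nbr))
    if goal not in dist:
        return None
    steps, node = [], goal
    while node != start:
        node, name = prev[node]
        steps.append(name)
    return dist[goal], steps[::-1]

graph = {
    "source.full_name": [
        ("unified.name_parts", 1.0, "split_on_space"),
        ("unified.name", 5.0, "manual_map"),
    ],
    "unified.name_parts": [("unified.name", 1.0, "title_case_join")],
}
suggestion = cheapest_path(graph, "source.full_name", "unified.name")
```

Here the two-step route through an existing unified attribute (cost 2.0) beats the direct manual mapping (cost 5.0), which mirrors the reuse argument in the abstract: previously built transformations make new sources cheaper to map.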
-
Patent number: 11042523
Abstract: A data curation system is provided that includes various methods to enable efficient reuse of human and machine effort. To reuse effort, various facilities are presented that model, save, and allow for querying of provenance and state information of a curation workflow and allow for incremental, stateful transitions of the data and metadata thereof.
Type: Grant
Filed: December 11, 2019
Date of Patent: June 22, 2021
Assignee: TAMR, INC.
Inventors: Vladimir Gluzman Peregrine, Ihab F. Ilyas, Michael Ralph Stonebraker, Stan Zdonik, Andrew H. Palmer, Alexander Richter Pagan, Daniel Meir Bruckner, George Beskales, Aizana Turmukhametova, Tianyu Zhu, Kanak Kshetri, Jason Liu, Nikolaus Bates-Haus
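One simple way to picture "provenance and state information" with "incremental, stateful transitions" is an append-only log of deltas, each tagged with the human or machine actor that produced it. The sketch below is only that picture, under assumptions of our own: the `Transition` and `CurationLog` names and the actor labels are hypothetical, and state is rebuilt by replaying deltas, which is simple but not how an efficient system would materialize state.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    version: int
    actor: str                      # who made the change, e.g. an expert or a model
    upserts: dict                   # record_id -> new value
    deletes: frozenset = frozenset()

@dataclass
class CurationLog:
    transitions: list = field(default_factory=list)

    def apply(self, actor, upserts=None, deletes=()):
        """Record one incremental transition; returns its version number."""
        t = Transition(len(self.transitions) + 1, actor, dict(upserts or {}), frozenset(deletes))
        self.transitions.append(t)
        return t.version

    def state_at(self, version):
        """Replay deltas up to and including `version` to reconstruct the data."""
        state = {}
        for t in self.transitions[:version]:
            state.update(t.upserts)
            for rid in t.deletes:
                state.pop(rid, None)
        return state

    def provenance(self, record_id):
        """Which versions, and which actors, touched a record."""
        return [(t.version, t.actor) for t in self.transitions
                if record_id in t.upserts or record_id in t.deletes]

log = CurationLog()
log.apply("model:matcher", {"r1": "Acme Corp", "r2": "ACME Corporation"})
log.apply("expert:alice", {"r2": "Acme Corp"})
log.apply("model:matcher", deletes=["r1"])
```

Because an expert's correction is stored as its own transition, it can be queried and re-applied later instead of being redone, which is the kind of effort reuse the abstract describes.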
-
Publication number: 20210173817
Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.
Type: Application
Filed: February 19, 2021
Publication date: June 10, 2021
Inventors: Nikolaus BATES-HAUS, George BESKALES, Daniel Meir BRUCKNER, Ihab F. ILYAS, Alexander Richter PAGAN, Michael Ralph STONEBRAKER
-
Patent number: 11003636
Abstract: A system and method of use resolves the frustration of repeated manual work during schema mapping. The system utilizes a transformation graph: a collection of nodes (unified attributes) and edges (transformations) in which source attributes are mapped and transformed. The system further leverages existing mappings and transformations for the purpose of suggesting to a user the optimal paths (i.e., the lowest cost paths) for mapping new sources, which is particularly useful when new sources share similarity with previously mapped sources and require the same transformations. As such, the system also promotes an evolving schema by allowing users to select which unified attributes they want to include in a target schema at any time. The system addresses the technical challenge of finding optimal transformation paths and how to present these to the user for evaluation.
Type: Grant
Filed: July 18, 2018
Date of Patent: May 11, 2021
Assignee: TAMR, INC.
Inventors: Sharon Roth, Ihab F. Ilyas, Daniel Meir Bruckner, Gideon Goldin
-
Patent number: 10929348
Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.
Type: Grant
Filed: November 23, 2016
Date of Patent: February 23, 2021
Assignee: TAMR, INC.
Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
-
Patent number: 10860548
Abstract: A system and method of use resolves the frustration of repeated manual work during schema mapping. The system utilizes a transformation graph: a collection of nodes (unified attributes) and edges (transformations) in which source attributes are mapped and transformed. The system further leverages existing mappings and transformations for the purpose of suggesting to a user the optimal paths (i.e., the lowest cost paths) for mapping new sources, which is particularly useful when new sources share similarity with previously mapped sources and require the same transformations. As such, the system also promotes an evolving schema by allowing users to select which unified attributes they want to include in a target schema at any time. The system addresses the technical challenge of finding optimal transformation paths and how to present these to the user for evaluation.
Type: Grant
Filed: December 5, 2019
Date of Patent: December 8, 2020
Assignee: TAMR, INC.
Inventors: Sharon Roth, Ihab F. Ilyas, Daniel Meir Bruckner, Gideon Goldin
-
Patent number: 10803105
Abstract: Given a number of records and a number of target classes to which these records belong, a (weakly) supervised machine learning classification method leverages known, possibly dirty, classification rules, efficiently and accurately learns a classification model from training data, and applies the learned model to the data records to predict their classes.
Type: Grant
Filed: December 5, 2019
Date of Patent: October 13, 2020
Assignee: Tamr, Inc.
Inventors: George Beskales, John Kraemer, Ihab F. Ilyas, Liam Cleary, Paul Roome
-
Publication number: 20200233597
Abstract: Fast record deduplication is accomplished by providing, as an input, data records having multiple attributes and local similarity functions of individual attributes with local similarity thresholds. Bin IDs are then generated based on the local similarity functions and the local similarity thresholds. The Bin IDs are unique identifiers of a respective bin of records, and the bin of records is a set of records that are possibly pairwise similar. Local candidate pairs are identified based on data records that share Bin IDs. The local candidate pairs are aggregated to produce a set of global candidate pairs. The set of global candidate pairs is filtered by deciding whether each pair of data records represents a duplicate.
Type: Application
Filed: April 3, 2020
Publication date: July 23, 2020
Inventors: George BESKALES, Ihab F. ILYAS
-
Publication number: 20200117643
Abstract: A data curation system is provided that includes various methods to enable efficient reuse of human and machine effort. To reuse effort, various facilities are presented that model, save, and allow for querying of provenance and state information of a curation workflow and allow for incremental, stateful transitions of the data and metadata thereof.
Type: Application
Filed: December 11, 2019
Publication date: April 16, 2020
Inventors: Vladimir Gluzman PEREGRINE, Ihab F. ILYAS, Michael Ralph STONEBRAKER, Stan ZDONIK, Andrew H. PALMER, Alexander Richter PAGAN, Daniel Meir BRUCKNER, George BESKALES, Aizana TURMUKHAMETOVA, Tianyu ZHU, Kanak KSHETRI, Jason LIU, Nikolaus BATES-HAUS
-
Publication number: 20200110731
Abstract: A system and method of use resolves the frustration of repeated manual work during schema mapping. The system utilizes a transformation graph: a collection of nodes (unified attributes) and edges (transformations) in which source attributes are mapped and transformed. The system further leverages existing mappings and transformations for the purpose of suggesting to a user the optimal paths (i.e., the lowest cost paths) for mapping new sources, which is particularly useful when new sources share similarity with previously mapped sources and require the same transformations. As such, the system also promotes an evolving schema by allowing users to select which unified attributes they want to include in a target schema at any time. The system addresses the technical challenge of finding optimal transformation paths and how to present these to the user for evaluation.
Type: Application
Filed: December 5, 2019
Publication date: April 9, 2020
Inventors: Sharon ROTH, Ihab F. ILYAS, Daniel Meir BRUCKNER, Gideon GOLDIN
-
Patent number: 10613785
Abstract: A very efficient computer system is presented to generate all pairs of records that have a certain similarity. Similarity is defined in terms of the textual similarity of the record attributes and/or absolute difference for numeric record attributes. Software assigns each record to a number of bins, and then compares pairs of records that belong to the same bin. This is more efficient than comparing all pairs of records since the number of records compared to each other is much smaller.
Type: Grant
Filed: October 11, 2017
Date of Patent: April 7, 2020
Assignee: Tamr, Inc.
Inventors: George Beskales, Ihab F. Ilyas
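For the numeric case ("absolute difference for numeric record attributes"), a common binning trick shows why bins cut down comparisons without missing matches. This is a minimal sketch of that generic technique, not necessarily the patent's scheme: with bin width equal to the threshold `t`, any two values with |a - b| <= t land in the same or adjacent bins (their floors of a/t differ by at most 1), so it suffices to compare records within a bin and with the next bin up. The function name and data are hypothetical; `t` must be positive.

```python
import math
from collections import defaultdict
from itertools import combinations

def similar_pairs(values, t):
    """All index pairs (i, j), i < j, with |values[i] - values[j]| <= t.

    Also returns how many pairs were actually compared.
    """
    bins = defaultdict(list)
    for i, v in enumerate(values):
        bins[math.floor(v / t)].append(i)  # bin width equals the threshold
    pairs, comparisons = set(), 0
    for b, members in bins.items():
        # Pairs inside bin b, then pairs between bin b and bin b + 1
        # (each pair of adjacent bins is visited exactly once).
        candidates = list(combinations(members, 2))
        candidates += [(i, j) for i in members for j in bins.get(b + 1, [])]
        for i, j in candidates:
            comparisons += 1
            if abs(values[i] - values[j]) <= t:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs), comparisons

prices = [10.0, 10.4, 11.2, 25.0, 25.3]
result = similar_pairs(prices, 0.5)
```

For these five values the sketch performs 2 comparisons instead of the 10 an all-pairs scan would need, while still returning every pair within the threshold.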
-
Publication number: 20190384836
Abstract: A system and method of use resolves the frustration of repeated manual work during schema mapping. The system utilizes a transformation graph: a collection of nodes (unified attributes) and edges (transformations) in which source attributes are mapped and transformed. The system further leverages existing mappings and transformations for the purpose of suggesting to a user the optimal paths (i.e., the lowest cost paths) for mapping new sources, which is particularly useful when new sources share similarity with previously mapped sources and require the same transformations. As such, the system also promotes an evolving schema by allowing users to select which unified attributes they want to include in a target schema at any time. The system addresses the technical challenge of finding optimal transformation paths and how to present these to the user for evaluation.
Type: Application
Filed: July 18, 2018
Publication date: December 19, 2019
Inventors: Sharon Roth, Ihab F. Ilyas, Daniel Meir Bruckner, Gideon Goldin
-
Publication number: 20180341667
Abstract: A data curation system that includes various methods to enable efficient reuse of human and machine effort. To reuse effort, various facilities are presented that model, save, and allow the querying of provenance and state information of a curation workflow and allow for incremental, stateful transitions of the data and the metadata.
Type: Application
Filed: August 2, 2018
Publication date: November 29, 2018
Inventors: Vladimir Gluzman Peregrine, Ihab F. Ilyas, Michael Ralph Stonebraker, Stan Zdonik, Andrew H. Palmer, Alexander Richter Pagan, Daniel Meir Bruckner, George Beskales, Aizana Turmukhametova, Tianyu Zhu, Kanak Kshetri, Jason Liu, Nikolaus Bates-Haus
-
Publication number: 20170075918
Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.
Type: Application
Filed: November 23, 2016
Publication date: March 16, 2017
Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker
-
Patent number: 9542412
Abstract: An end-to-end data curation system and the various methods used in linking, matching, and cleaning large-scale data sources. The goal of this system is to provide scalable and efficient record deduplication. The system uses a crowd of experts to train the system. The system operator can optionally provide a set of hints to reduce the number of questions sent to the experts. The system solves the problem of schema mapping and record deduplication in a holistic way by unifying these problems into a unified linkage problem.
Type: Grant
Filed: March 28, 2014
Date of Patent: January 10, 2017
Assignee: Tamr, Inc.
Inventors: Nikolaus Bates-Haus, George Beskales, Daniel Meir Bruckner, Ihab F. Ilyas, Alexander Richter Pagan, Michael Ralph Stonebraker