Patents by Inventor Michael TANDECKI

Michael TANDECKI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SYSTEM FOR PREPARING MACHINE LEARNING TRAINING DATA FOR USE IN EVALUATION OF TERM DEFINITION QUALITY

Publication number: 20250131338

Abstract: A system for preparing machine learning training data for use in evaluation of term definition quality. The system can include a server having at least one server processor and at least one server memory for storing a plurality of terms with corresponding definitions, and a plurality of client devices each having at least one client memory device and at least one client processor. The client processor is programmed to receive at least one of the plurality of terms and its corresponding definition from the server, display the term and its corresponding definition, and receive an indication of whether the definition satisfies one or more definition quality guidelines. The server memory includes instructions for causing the at least one server processor to receive the indications from the plurality of client devices and label each definition as satisfying each of the definition quality guidelines or not based on the received indications.

Type: Application

Filed: December 31, 2024

Publication date: April 24, 2025

Inventors: Gretel De Paepe, Michael Tandecki
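The labeling step in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the server combines the indications, so the majority vote used here is an assumption, as are the function name `label_definitions`, the tuple layout, and the guideline names.

```python
from collections import defaultdict


def label_definitions(indications, threshold=0.5):
    """Label each (term, guideline) pair from reviewer indications.

    indications: iterable of (term, guideline, satisfied) tuples, one per
    client response. Majority voting is an assumed aggregation rule.
    """
    votes = defaultdict(list)
    for term, guideline, satisfied in indications:
        votes[(term, guideline)].append(bool(satisfied))
    # A pair is labeled as satisfied when the share of "yes" votes
    # exceeds the threshold.
    return {key: sum(v) / len(v) > threshold for key, v in votes.items()}


# Hypothetical indications collected from several client devices.
indications = [
    ("churn", "is_concise", True),
    ("churn", "is_concise", True),
    ("churn", "is_concise", False),
    ("churn", "avoids_circularity", False),
    ("churn", "avoids_circularity", False),
]
labels = label_definitions(indications)
```

The resulting dictionary could serve as labeled training data, with one boolean label per definition and guideline.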

Systems and methods for generating synthetic data

Patent number: 12282581

Abstract: The present disclosure is directed to systems and methods for generating synthetic data. Entities maintain large amounts of data, and conducting probability distribution and/or correlation analyses on these large datasets while maintaining data privacy for personally identifiable information (PII) is difficult. The present application describes methods for identifying data fields that comprise PII and synthesizing the data so that the PII is removed, but the integrity of the probability distribution and/or correlation metrics remains. Certain data is grouped into data fields based on a data table type, and each data type may be assigned a certain data analysis strategy, which may comprise a joint probability distribution, a characterbase data faker, a genetic regex generator, and/or a timeseries model. A table sketch may be generated that may comprise at least one synthesizer recipe to be used in future data queries.

Type: Grant

Filed: April 15, 2022

Date of Patent: April 22, 2025

Assignee: Collibra Belgium BV

Inventors: Gretel De Paepe, Vicky Froyen, Kelsey Schuster, Michael Tandecki, Anna Filipiak
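As a rough illustration of the idea in the abstract above, the sketch below replaces PII values with fake values of the same shape while keeping the joint distribution of the remaining fields. It is not the patented method: resampling whole rows stands in for the joint probability distribution, and a per-character substitution stands in for the data faker. The names `fake_like` and `synthesize` and the sample fields are hypothetical.

```python
import random
import string


def fake_like(value, rng):
    """Return a random string with the same character classes as value."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep separators such as "-" or "@"
    return "".join(out)


def synthesize(rows, pii_fields, n, seed=0):
    """Draw n synthetic rows; PII fields are faked, other fields are kept."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        # Sampling a whole row preserves correlations between non-PII fields.
        template = rng.choice(rows)
        synthetic.append({
            key: fake_like(value, rng) if key in pii_fields else value
            for key, value in template.items()
        })
    return synthetic


# Hypothetical source table with one PII column.
rows = [
    {"ssn": "123-45-6789", "plan": "basic"},
    {"ssn": "987-65-4321", "plan": "pro"},
]
synthetic_rows = synthesize(rows, pii_fields={"ssn"}, n=50)
```

Values in PII fields are assumed to be strings; other fields pass through unchanged.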

System for preparing machine learning training data for use in evaluation of term definition quality

Patent number: 12190207

Abstract: A system for preparing machine learning training data for use in evaluation of term definition quality. The system can include a server having at least one server processor and at least one server memory for storing a plurality of terms with corresponding definitions, and a plurality of client devices each having at least one client memory device and at least one client processor. The client processor is programmed to receive at least one of the plurality of terms and its corresponding definition from the server, display the term and its corresponding definition, and receive an indication of whether the definition satisfies one or more definition quality guidelines. The server memory includes instructions for causing the at least one server processor to receive the indications from the plurality of client devices and label each definition as satisfying each of the definition quality guidelines or not based on the received indications.

Type: Grant

Filed: December 22, 2020

Date of Patent: January 7, 2025

Assignee: Collibra Belgium BV

Inventors: Gretel De Paepe, Michael Tandecki

METHODS AND SYSTEMS FOR WORD EDIT DISTANCE EMBEDDING

Publication number: 20240395245

Abstract: A system for classifying words in a batch of words can include at least one memory device storing instructions for causing at least one processor to create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word, and create word vectors for each word in a batch of words for classification using the NN. The closest matching dictionary vectors are found for each word vector and the classification indicators of the closest matching dictionary vector for each word vector in the batch are reported.

Type: Application

Filed: August 5, 2024

Publication date: November 28, 2024

Inventors: Michael Tandecki, Michael Maes, Anna Filipiak
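The lookup flow described in the abstract above (embed dictionary words, embed the batch, report the label of the closest dictionary vector) can be sketched as follows. The abstract's neural network is not reproduced here: a character-bigram count vector with cosine similarity is swapped in as a stand-in embedding, and the names `embed`, `cosine`, and `classify_batch`, along with the sample dictionary, are assumptions.

```python
import math
from collections import Counter


def embed(word):
    """Stand-in for the NN: a sparse vector of character-bigram counts."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_batch(words, dictionary):
    """Report, for each word, the label of its closest dictionary vector."""
    # Dictionary vectors are computed once and stored with their labels.
    index = [(embed(entry), label) for entry, label in dictionary.items()]
    results = {}
    for word in words:
        vec = embed(word)
        _, label = max(index, key=lambda item: cosine(vec, item[0]))
        results[word] = label
    return results


# Hypothetical dictionary of words with classification indicators.
dictionary = {
    "belgium": "country",
    "germany": "country",
    "january": "month",
    "october": "month",
}
result = classify_batch(["belgum", "octobr"], dictionary)
```

Misspelled inputs such as "belgum" still land near their intended dictionary entry because most of their bigrams overlap, which is the kind of edit-distance tolerance the title refers to.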

Methods and systems for word edit distance embedding

Patent number: 12057108

Abstract: A system for classifying words in a batch of words can include at least one memory device storing instructions for causing at least one processor to create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word, and create word vectors for each word in a batch of words for classification using the NN. The closest matching dictionary vectors are found for each word vector and the classification indicators of the closest matching dictionary vector for each word vector in the batch are reported.

Type: Grant

Filed: October 15, 2020

Date of Patent: August 6, 2024

Assignee: Collibra Belgium BV

Inventors: Michael Tandecki, Michael Maes, Anna Filipiak

Bespoke transformation and quality assessment for term definition

Patent number: 11966696

Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.

Type: Grant

Filed: May 18, 2023

Date of Patent: April 23, 2024

Assignee: Collibra Belgium BV

Inventors: Gretel De Paepe, Michael Tandecki
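The scoring flow in the abstract above (derive features, score each guideline with its own model, combine into an overall score, compare with a threshold) can be sketched as below. The per-guideline "models" are simple hand-written rules standing in for trained machine learning models, and the mean is an assumed way of combining scores. All names, guidelines, and the 0.7 threshold are hypothetical.

```python
from statistics import mean


def features(term, definition):
    """Derive simple feature inputs from a definition (illustrative only)."""
    words = definition.lower().split()
    return {"n_words": len(words), "repeats_term": term.lower() in words}


# One scoring function per guideline; each stands in for a trained model.
GUIDELINE_MODELS = {
    "is_concise": lambda f: 1.0 if f["n_words"] <= 25 else 0.3,
    "avoids_circularity": lambda f: 0.0 if f["repeats_term"] else 1.0,
}


def assess(term, definition, threshold=0.7):
    """Score a definition per guideline and flag it if the overall score is low."""
    f = features(term, definition)
    scores = {name: model(f) for name, model in GUIDELINE_MODELS.items()}
    overall = mean(list(scores.values()))  # assumed aggregation: plain average
    return {
        "scores": scores,
        "overall": overall,
        "recommend_transformation": overall < threshold,
    }


report = assess("churn", "Churn is when churn happens.")
good = assess("churn", "The rate at which customers stop using a service.")
```

Here `report` flags the circular definition for transformation, while `good` passes both guidelines.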

BESPOKE TRANSFORMATION AND QUALITY ASSESSMENT FOR TERM DEFINITION

Publication number: 20230376683

Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.

Type: Application

Filed: May 18, 2023

Publication date: November 23, 2023

Inventors: Gretel De Paepe, Michael Tandecki

SYSTEMS AND METHODS FOR GENERATING SYNTHETIC DATA

Publication number: 20230334169

Abstract: The present disclosure is directed to systems and methods for generating synthetic data. Entities maintain large amounts of data, and conducting probability distribution and/or correlation analyses on these large datasets while maintaining data privacy for personally identifiable information (PII) is difficult. The present application describes methods for identifying data fields that comprise PII and synthesizing the data so that the PII is removed, but the integrity of the probability distribution and/or correlation metrics remains. Certain data is grouped into data fields based on a data table type, and each data type may be assigned a certain data analysis strategy, which may comprise a joint probability distribution, a characterbase data faker, a genetic regex generator, and/or a timeseries model. A table sketch may be generated that may comprise at least one synthesizer recipe to be used in future data queries.

Type: Application

Filed: April 15, 2022

Publication date: October 19, 2023

Inventors: Gretel De Paepe, Vicky Froyen, Kelsey Schuster, Michael Tandecki, Anna Filipiak

Bespoke transformation and quality assessment for term definition

Patent number: 11669682

Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.

Type: Grant

Filed: December 22, 2020

Date of Patent: June 6, 2023

Assignee: Collibra Belgium BV

Inventors: Gretel De Paepe, Michael Tandecki

SYSTEM FOR PREPARING MACHINE LEARNING TRAINING DATA FOR USE IN EVALUATION OF TERM DEFINITION QUALITY

Publication number: 20220198323

Abstract: A system for preparing machine learning training data for use in evaluation of term definition quality. The system can include a server having at least one server processor and at least one server memory for storing a plurality of terms with corresponding definitions, and a plurality of client devices each having at least one client memory device and at least one client processor. The client processor is programmed to receive at least one of the plurality of terms and its corresponding definition from the server, display the term and its corresponding definition, and receive an indication of whether the definition satisfies one or more definition quality guidelines. The server memory includes instructions for causing the at least one server processor to receive the indications from the plurality of client devices and label each definition as satisfying each of the definition quality guidelines or not based on the received indications.

Type: Application

Filed: December 22, 2020

Publication date: June 23, 2022

Inventors: Gretel De Paepe, Michael Tandecki

BESPOKE TRANSFORMATION AND QUALITY ASSESSMENT FOR TERM DEFINITION

Publication number: 20220198139

Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.

Type: Application

Filed: December 22, 2020

Publication date: June 23, 2022

Inventors: Gretel De Paepe, Michael Tandecki

METHODS AND SYSTEMS FOR WORD EDIT DISTANCE EMBEDDING

Publication number: 20210319785

Abstract: A system for classifying words in a batch of words can include at least one memory device storing instructions for causing at least one processor to create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word, and create word vectors for each word in a batch of words for classification using the NN. The closest matching dictionary vectors are found for each word vector and the classification indicators of the closest matching dictionary vector for each word vector in the batch are reported.

Type: Application

Filed: October 15, 2020

Publication date: October 14, 2021

Inventors: Michael Tandecki, Michael Maes, Anna Filipiak

Classification of data using aggregated information from multiple classification modules

Patent number: 11138477

Abstract: The present disclosure relates to methods and systems to classify data. A set of classification modules may inspect received data and identify proposed classifications and confidence values for the received data. An aggregation module may receive and aggregate the proposed classifications and confidence values. Based on the aggregated proposed classifications and the confidence values, the aggregation module may generate a final classification for the received data. An external device may perform an action with respect to the received data based on the final classification associated with the data. The action performed may include maintaining the data such that the data may be retrieved upon receipt of a request for the data. Any of the classification modules and the aggregation module may be based on training data that may be utilized in subsequent iterations of classifying data to increase classification accuracy.

Type: Grant

Filed: August 15, 2019

Date of Patent: October 5, 2021

Assignee: COLLIBRA NV

Inventors: Michael Tandecki, Michael Maes, Gretel De Paepe, Anna Filipiak
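The aggregation step in the abstract above can be illustrated with a small sketch. The abstract does not specify the aggregation rule, so summing confidence values per proposed label and picking the largest total is an assumption, as are the function name `aggregate` and the sample labels.

```python
from collections import defaultdict


def aggregate(proposals):
    """Combine (label, confidence) proposals from several modules.

    Returns the winning label and its share of the total confidence.
    Confidence-weighted voting is an assumed aggregation rule.
    """
    totals = defaultdict(float)
    for label, confidence in proposals:
        totals[label] += confidence
    total = sum(totals.values())
    if total == 0:
        return None, 0.0  # no usable proposals
    final_label = max(totals, key=totals.get)
    return final_label, totals[final_label] / total


# Hypothetical proposals from three classification modules.
proposals = [("email", 0.9), ("free_text", 0.4), ("email", 0.6)]
final_label, share = aggregate(proposals)
```

A downstream system could then act on `final_label`, for example by deciding how the data is stored or retrieved.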

CLASSIFICATION OF DATA USING AGGREGATED INFORMATION FROM MULTIPLE CLASSIFICATION MODULES

Publication number: 20210049421

Abstract: The present disclosure relates to methods and systems to classify data. A set of classification modules may inspect received data and identify proposed classifications and confidence values for the received data. An aggregation module may receive and aggregate the proposed classifications and confidence values. Based on the aggregated proposed classifications and the confidence values, the aggregation module may generate a final classification for the received data. An external device may perform an action with respect to the received data based on the final classification associated with the data. The action performed may include maintaining the data such that the data may be retrieved upon receipt of a request for the data. Any of the classification modules and the aggregation module may be based on training data that may be utilized in subsequent iterations of classifying data to increase classification accuracy.

Type: Application

Filed: August 15, 2019

Publication date: February 18, 2021

Inventors: Michael Tandecki, Michael Maes, Gretel De Paepe, Anna Filipiak