Patents by Inventor Michael TANDECKI
Michael TANDECKI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
SYSTEM FOR PREPARING MACHINE LEARNING TRAINING DATA FOR USE IN EVALUATION OF TERM DEFINITION QUALITY
Publication number: 20250131338
Abstract: A system for preparing machine learning training data for use in evaluation of term definition quality. The system can include a server having at least one server processor and at least one server memory for storing a plurality of terms with corresponding definitions, and a plurality of client devices each having at least one client memory device and at least one client processor. The client processor is programmed to receive at least one of the plurality of terms and its corresponding definition from the server, display the term and its corresponding definition, and receive an indication of whether the definition satisfies one or more definition quality guidelines. The server memory includes instructions for causing the at least one server processor to receive the indications from the plurality of client devices and label each definition as satisfying each of the definition quality guidelines or not based on the received indications.
Type: Application
Filed: December 31, 2024
Publication date: April 24, 2025
Inventors: Gretel De Paepe, Michael Tandecki
-
Patent number: 12282581
Abstract: The present disclosure is directed to systems and methods for generating synthetic data. Entities maintain large amounts of data, and conducting probability distribution and/or correlation analyses on these large datasets while maintaining data privacy for personally identifiable information (PII) is difficult. The present application describes methods for identifying data fields that comprise PII and synthesizing the data so that the PII is removed, but the integrity of the probability distribution and/or correlation metrics remains. Certain data is grouped into data fields based on a data table type, and each data type may be assigned a certain data analysis strategy, which may comprise a joint probability distribution, a characterbase data faker, a genetic regex generator, and/or a timeseries model. A table sketch may be generated that may comprise at least one synthesizer recipe to be used in future data queries.
Type: Grant
Filed: April 15, 2022
Date of Patent: April 22, 2025
Assignee: Collibra Belgium BV
Inventors: Gretel De Paepe, Vicky Froyen, Kelsey Schuster, Michael Tandecki, Anna Filipiak
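As a rough illustration of the "joint probability distribution" strategy named in the abstract above, the sketch below counts how often each combination of non-PII values occurs and samples synthetic rows in proportion to those counts. It is not the patented method: the function names, the sample rows, and the choice to simply omit the PII field are all assumptions made for illustration.

```python
import random
from collections import Counter


def fit_joint_distribution(rows, fields):
    """Count how often each combination of values occurs across the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def sample_synthetic(joint_counts, fields, n, seed=0):
    """Draw n synthetic rows with probability proportional to the observed counts."""
    rng = random.Random(seed)
    combos = list(joint_counts.keys())
    weights = list(joint_counts.values())
    draws = rng.choices(combos, weights=weights, k=n)
    return [dict(zip(fields, combo)) for combo in draws]


# Hypothetical source table; "name" plays the role of a PII field.
rows = [
    {"name": "Alice", "country": "BE", "plan": "pro"},
    {"name": "Bob", "country": "BE", "plan": "free"},
    {"name": "Carol", "country": "US", "plan": "pro"},
    {"name": "Dave", "country": "BE", "plan": "pro"},
]

# Only non-PII fields are modeled, so no name can leak into the output.
fields = ["country", "plan"]
joint = fit_joint_distribution(rows, fields)
synthetic = sample_synthetic(joint, fields, n=1000)
```

Because rows are sampled as whole combinations, the correlation between `country` and `plan` in the synthetic data approximates the correlation in the source, which is the property the abstract says should survive.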
-
System for preparing machine learning training data for use in evaluation of term definition quality
Patent number: 12190207
Abstract: A system for preparing machine learning training data for use in evaluation of term definition quality. The system can include a server having at least one server processor and at least one server memory for storing a plurality of terms with corresponding definitions, and a plurality of client devices each having at least one client memory device and at least one client processor. The client processor is programmed to receive at least one of the plurality of terms and its corresponding definition from the server, display the term and its corresponding definition, and receive an indication of whether the definition satisfies one or more definition quality guidelines. The server memory includes instructions for causing the at least one server processor to receive the indications from the plurality of client devices and label each definition as satisfying each of the definition quality guidelines or not based on the received indications.
Type: Grant
Filed: December 22, 2020
Date of Patent: January 7, 2025
Assignee: Collibra Belgium BV
Inventors: Gretel De Paepe, Michael Tandecki
-
Publication number: 20240395245
Abstract: A system for classifying words in a batch of words can include at least one memory device storing instructions for causing at least one processor to create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word, and create word vectors for each word in a batch of words for classification using the NN. The closest matching dictionary vectors are found for each word vector and the classification indicators of the closest matching dictionary vector for each word vector in the batch are reported.
Type: Application
Filed: August 5, 2024
Publication date: November 28, 2024
Inventors: Michael Tandecki, Michael Maes, Anna Filipiak
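The lookup described in the abstract above (embed the dictionary words, embed each word in the batch, report the label of the closest dictionary vector) can be sketched as follows. This is only an illustration under loud assumptions: the `embed` function is a character-bigram count vector standing in for the neural network, cosine similarity is an assumed distance measure, and the dictionary and labels are hypothetical.

```python
import math
from collections import Counter


def embed(word):
    """Stand-in for the neural network: a sparse character-bigram count vector."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(dictionary):
    """Store each dictionary vector together with its classification indicator."""
    return [(embed(word), label) for word, label in dictionary.items()]


def classify_batch(words, index):
    """Report, for each word, the label of its closest dictionary vector."""
    results = {}
    for word in words:
        vec = embed(word)
        _, label = max(index, key=lambda entry: cosine(vec, entry[0]))
        results[word] = label
    return results


# Hypothetical dictionary words and classification indicators.
dictionary = {
    "email": "contact",
    "phone": "contact",
    "salary": "financial",
    "invoice": "financial",
}
index = build_index(dictionary)
labels = classify_batch(["e-mail", "invoices"], index)
```

The linear scan over the index is fine for a small dictionary; a large one would normally call for an approximate nearest-neighbor index instead.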
-
Patent number: 12057108
Abstract: A system for classifying words in a batch of words can include at least one memory device storing instructions for causing at least one processor to create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word, and create word vectors for each word in a batch of words for classification using the NN. The closest matching dictionary vectors are found for each word vector and the classification indicators of the closest matching dictionary vector for each word vector in the batch are reported.
Type: Grant
Filed: October 15, 2020
Date of Patent: August 6, 2024
Assignee: Collibra Belgium BV
Inventors: Michael Tandecki, Michael Maes, Anna Filipiak
-
Patent number: 11966696
Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.
Type: Grant
Filed: May 18, 2023
Date of Patent: April 23, 2024
Assignee: Collibra Belgium BV
Inventors: Gretel De Paepe, Michael Tandecki
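The scoring flow in the abstract above (derive features, score each guideline with its own model, combine the scores, compare against a threshold) can be sketched roughly as below. Everything specific here is an assumption: the two guidelines, the rule-based functions standing in for trained machine learning models, the use of a plain mean for the overall score, and the 0.7 threshold are all hypothetical.

```python
from statistics import mean


def derive_features(definition):
    """Derive simple feature inputs from the definition text."""
    words = definition.split()
    return {
        "word_count": len(words),
        "ends_with_period": definition.strip().endswith("."),
    }


# Hypothetical stand-ins for the trained per-guideline models.
def conciseness_model(features):
    return 1.0 if features["word_count"] <= 25 else 0.4


def completeness_model(features):
    ok = features["word_count"] >= 5 and features["ends_with_period"]
    return 1.0 if ok else 0.3


GUIDELINE_MODELS = {
    "concise": conciseness_model,
    "complete sentence": completeness_model,
}


def assess_definition(definition, threshold=0.7):
    """Score every guideline, average the scores, and flag weak definitions."""
    features = derive_features(definition)
    scores = {name: model(features) for name, model in GUIDELINE_MODELS.items()}
    overall = mean(scores.values())  # assumed aggregation; the abstract does not specify one
    return {
        "scores": scores,
        "overall": overall,
        "recommend_rewrite": overall < threshold,
    }


weak = assess_definition("Customer")
good = assess_definition("A customer is a party that buys goods or services.")
```

Keeping one model per guideline, as the abstract describes, means the per-guideline scores can be shown next to the overall score so an author can see which guideline a definition fails.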
-
Publication number: 20230376683
Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.
Type: Application
Filed: May 18, 2023
Publication date: November 23, 2023
Inventors: Gretel De Paepe, Michael Tandecki
-
Publication number: 20230334169
Abstract: The present disclosure is directed to systems and methods for generating synthetic data. Entities maintain large amounts of data, and conducting probability distribution and/or correlation analyses on these large datasets while maintaining data privacy for personally identifiable information (PII) is difficult. The present application describes methods for identifying data fields that comprise PII and synthesizing the data so that the PII is removed, but the integrity of the probability distribution and/or correlation metrics remains. Certain data is grouped into data fields based on a data table type, and each data type may be assigned a certain data analysis strategy, which may comprise a joint probability distribution, a characterbase data faker, a genetic regex generator, and/or a timeseries model. A table sketch may be generated that may comprise at least one synthesizer recipe to be used in future data queries.
Type: Application
Filed: April 15, 2022
Publication date: October 19, 2023
Inventors: Gretel De Paepe, Vicky Froyen, Kelsey Schuster, Michael Tandecki, Anna Filipiak
-
Patent number: 11669682
Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.
Type: Grant
Filed: December 22, 2020
Date of Patent: June 6, 2023
Assignee: Collibra Belgium BV
Inventors: Gretel De Paepe, Michael Tandecki
-
SYSTEM FOR PREPARING MACHINE LEARNING TRAINING DATA FOR USE IN EVALUATION OF TERM DEFINITION QUALITY
Publication number: 20220198323
Abstract: A system for preparing machine learning training data for use in evaluation of term definition quality. The system can include a server having at least one server processor and at least one server memory for storing a plurality of terms with corresponding definitions, and a plurality of client devices each having at least one client memory device and at least one client processor. The client processor is programmed to receive at least one of the plurality of terms and its corresponding definition from the server, display the term and its corresponding definition, and receive an indication of whether the definition satisfies one or more definition quality guidelines. The server memory includes instructions for causing the at least one server processor to receive the indications from the plurality of client devices and label each definition as satisfying each of the definition quality guidelines or not based on the received indications.
Type: Application
Filed: December 22, 2020
Publication date: June 23, 2022
Inventors: Gretel De Paepe, Michael Tandecki
-
Publication number: 20220198139
Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.
Type: Application
Filed: December 22, 2020
Publication date: June 23, 2022
Inventors: Gretel De Paepe, Michael Tandecki
-
Publication number: 20210319785
Abstract: A system for classifying words in a batch of words can include at least one memory device storing instructions for causing at least one processor to create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word, and create word vectors for each word in a batch of words for classification using the NN. The closest matching dictionary vectors are found for each word vector and the classification indicators of the closest matching dictionary vector for each word vector in the batch are reported.
Type: Application
Filed: October 15, 2020
Publication date: October 14, 2021
Inventors: Michael Tandecki, Michael Maes, Anna Filipiak
-
Patent number: 11138477
Abstract: The present disclosure relates to methods and systems to classify data. A set of classification modules may inspect received data and identify proposed classifications and confidence values for the received data. An aggregation module may receive and aggregate the proposed classifications and confidence values. Based on the aggregated proposed classifications and the confidence values, the aggregation module may generate a final classification for the received data. An external device may perform an action with respect to the received data based on the final classification associated with the data. The action performed may include maintaining the data such that the data may be retrieved upon receipt of a request for the data. Any of the classification modules and the aggregation module may be based on training data that may be utilized in subsequent iterations of classifying data to increase classification accuracy.
Type: Grant
Filed: August 15, 2019
Date of Patent: October 5, 2021
Assignee: COLLIBRA NV
Inventors: Michael Tandecki, Michael Maes, Gretel De Paepe, Anna Filipiak
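One simple way to picture the aggregation step in the abstract above is confidence-weighted voting: each classification module proposes a label with a confidence value, and the aggregator picks the label with the largest total confidence. The abstract does not say how the aggregation module combines proposals, so the summing rule, the function name, and the sample proposals below are assumptions for illustration only.

```python
from collections import defaultdict


def aggregate(proposals):
    """Sum confidence per proposed label and return the winner with its share.

    `proposals` is a non-empty list of (label, confidence) pairs,
    one pair per classification module.
    """
    totals = defaultdict(float)
    for label, confidence in proposals:
        totals[label] += confidence
    final = max(totals, key=totals.get)
    return final, totals[final] / sum(totals.values())


# Hypothetical output of three classification modules for one data column.
proposals = [("email", 0.9), ("name", 0.4), ("email", 0.6)]
final_label, share = aggregate(proposals)
```

The returned share is the winning label's fraction of all confidence mass, which a downstream consumer could use to decide whether the final classification is strong enough to act on.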
-
Publication number: 20210049421
Abstract: The present disclosure relates to methods and systems to classify data. A set of classification modules may inspect received data and identify proposed classifications and confidence values for the received data. An aggregation module may receive and aggregate the proposed classifications and confidence values. Based on the aggregated proposed classifications and the confidence values, the aggregation module may generate a final classification for the received data. An external device may perform an action with respect to the received data based on the final classification associated with the data. The action performed may include maintaining the data such that the data may be retrieved upon receipt of a request for the data. Any of the classification modules and the aggregation module may be based on training data that may be utilized in subsequent iterations of classifying data to increase classification accuracy.
Type: Application
Filed: August 15, 2019
Publication date: February 18, 2021
Inventors: Michael TANDECKI, Michael MAES, Gretel DE PAEPE, Anna FILIPIAK