Patents Assigned to COLLIBRA NV

Systems and methods for predicting correct or missing data and data anomalies

Patent number: 11568328

Abstract: The present disclosure is directed to systems and methods for predicting and correcting data anomalies. In one example aspect, data is received by the system. The system may analyze the data by profiling the data for certain profiling statistics (e.g., min, max, mean, cardinality, etc.). At least one machine-learning algorithm (e.g., a Random-Forest algorithm) may be applied to the profiled data to identify potential relationships among certain data columns in the data. Once certain relationships are identified, the data that is related may be extracted to form an itemset. A second machine-learning algorithm (e.g., Frequent Pattern Growth algorithm) may be applied to the itemset to identify certain frequencies of related values in the itemset. Low frequency values may indicate anomalies in the dataset. If an anomaly is detected, the system may be configured to provide an intelligent remedial action, such as substituting certain values and/or filling in a missing value.
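
The flow in this abstract (profile the data, group related columns into an itemset, treat low-frequency value combinations as anomalies, then suggest a replacement) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the related column pair is already known, so the Random-Forest step is skipped, and it uses a plain frequency count in place of Frequent Pattern Growth. The names `profile_column`, `find_rare_pairs`, and `min_support` are hypothetical.

```python
from collections import Counter
from statistics import mean


def profile_column(values):
    """Return basic profiling statistics for a numeric column (None = missing)."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "cardinality": len(set(present)),
        "missing": len(values) - len(present),
    }


def find_rare_pairs(rows, col_a, col_b, min_support=0.2):
    """Flag rows whose (col_a, col_b) pair is rare or incomplete.

    Simplified stand-in for frequent-pattern mining: a pair seen in fewer
    than `min_support` of the complete rows is reported as an anomaly, and
    the most frequent col_b value for the same col_a value is suggested.
    """
    complete = [
        (r[col_a], r[col_b])
        for r in rows
        if r[col_a] is not None and r[col_b] is not None
    ]
    pair_counts = Counter(complete)
    total = len(complete)

    # Most frequent col_b partner for each col_a value.
    best = {}
    for (a, b), n in pair_counts.items():
        if a not in best or n > pair_counts[(a, best[a])]:
            best[a] = b

    findings = []
    for i, r in enumerate(rows):
        a, b = r[col_a], r[col_b]
        if a is None:
            continue
        if b is None:
            if a in best:
                findings.append((i, "missing", best[a]))
        elif pair_counts[(a, b)] / total < min_support and best[a] != b:
            findings.append((i, "anomaly", best[a]))
    return findings


rows = (
    [{"city": "Paris", "country": "FR"}] * 4
    + [{"city": "Berlin", "country": "DE"}] * 4
    + [{"city": "Paris", "country": "DE"}, {"city": "Berlin", "country": None}]
)
print(profile_column([2, 4, None, 6, 4]))
print(find_rare_pairs(rows, "city", "country"))
```

Each finding is a tuple of row index, finding type, and suggested value, which corresponds to the remedial action of substituting a value or filling in a missing one.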

Type: Grant

Filed: April 21, 2021

Date of Patent: January 31, 2023

Assignee: Collibra NV

Inventors: Kirk J. Haslbeck, Brian N. Mearns

Systems and method of contextual data masking for private and secure data linkage

Patent number: 11366928

Abstract: The present disclosure relates to methods and systems for contextual data masking and registration. A data masking process may include classifying ingested data, processing the data, and tokenizing the data while maintaining security/privacy of the ingested data. The data masking process may include data configuration that comprises generating anonymized labels of the ingested data, validating an attribute of the ingested data, standardizing the attribute into a standardized format, and processing the data via one or more rules engines. One rules engine can include an address standardization that generates a list of standard addresses that can provide insights into columns of the ingested data without externally transmitting the client data. The masked data can be tokenized as part of the data masking process to securely maintain an impression of the ingested data and generate insights into the ingested data.
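
As a rough illustration of the standardize-then-tokenize idea, the Python sketch below normalizes an address and replaces it with a keyed hash, so that equal addresses can be linked without exposing the raw value. The abstract does not specify a tokenization scheme: the use of HMAC-SHA256, the abbreviation table, and the names `standardize_address`, `tokenize`, and `mask_address` are all assumptions made for this example.

```python
import hashlib
import hmac
import re

# Hypothetical abbreviation table; a real rules engine would be far larger.
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD"}


def standardize_address(raw):
    """Strip punctuation, uppercase, collapse whitespace, expand abbreviations."""
    cleaned = re.sub(r"[^\w\s]", "", raw).upper()
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)


def tokenize(value, secret_key):
    """Replace a value with a keyed SHA-256 digest (assumed scheme)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_address(raw, secret_key):
    """Standardize first, so formatting variants map to the same token."""
    return tokenize(standardize_address(raw), secret_key)


key = b"example-key-not-for-production"
token_a = mask_address("12 Main St.", key)
token_b = mask_address("12  main street", key)
print(token_a == token_b)
```

Standardizing before tokenizing is what makes linkage possible in this sketch: two records with differently formatted but equivalent addresses produce the same token, while the token itself does not reveal the address to anyone without the key.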

Type: Grant

Filed: January 29, 2020

Date of Patent: June 21, 2022

Assignee: Collibra NV

Inventors: Satyender Goel, Upwan Chachra, James B. Cushman, II

Classification of data using aggregated information from multiple classification modules

Patent number: 11138477

Abstract: The present disclosure relates to methods and systems to classify data. A set of classification modules may inspect received data and identify proposed classifications and confidence values for the received data. An aggregation module may receive and aggregate the proposed classifications and confidence values. Based on the aggregated proposed classifications and the confidence values, the aggregation module may generate a final classification for the received data. An external device may perform an action with respect to the received data based on the final classification associated with the data. The action performed may include maintaining the data such that the data may be retrieved upon receipt of a request for the data. Any of the classification modules and the aggregation module may be based on training data that may be utilized in subsequent iterations of classifying data to increase classification accuracy.
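
One simple way to picture the aggregation step is a confidence-weighted vote across modules, sketched below in Python. The abstract does not state how the aggregation module combines its inputs, so the weighted-sum rule, the per-module `weights`, and the function name `aggregate` are assumptions for illustration only.

```python
from collections import defaultdict


def aggregate(proposals, weights=None):
    """Combine (module, label, confidence) proposals into one classification.

    Each label's score is the sum of module_weight * confidence over the
    modules that proposed it. Returns the top label and its share of the
    total score, or (None, 0.0) when there is nothing to aggregate.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for module, label, confidence in proposals:
        scores[label] += weights.get(module, 1.0) * confidence

    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


# Hypothetical module names and confidence values.
proposals = [
    ("regex_module", "email", 0.75),
    ("dictionary_module", "name", 0.5),
    ("ml_module", "email", 0.75),
]
print(aggregate(proposals))
```

The `weights` argument is one place where training feedback could enter in such a design: modules that have been more accurate in earlier iterations could be given larger weights.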

Type: Grant

Filed: August 15, 2019

Date of Patent: October 5, 2021

Assignee: COLLIBRA NV

Inventors: Michael Tandecki, Michael Maes, Gretel De Paepe, Anna Filipiak
