Abstract: The present disclosure is directed to systems and methods for predicting and correcting data anomalies. In one example aspect, data is received by the system. The system may analyze the data by profiling the data for certain profiling statistics (e.g., min, max, mean, cardinality, etc.). At least one machine-learning algorithm (e.g., a Random-Forest algorithm) may be applied to the profiled data to identify potential relationships among certain data columns in the data. Once certain relationships are identified, the data that is related may be extracted to form an itemset. A second machine-learning algorithm (e.g., a Frequent Pattern Growth algorithm) may be applied to the itemset to identify certain frequencies of related values in the itemset. Low-frequency values may indicate anomalies in the dataset. If an anomaly is detected, the system may be configured to provide an intelligent remedial action, such as substituting certain values and/or filling in a missing value.
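The low-frequency idea in the abstract above can be illustrated with a minimal sketch. It is not the patented method: plain pair counting stands in for the Frequent Pattern Growth step, the relationship between the two columns is assumed to be already known rather than learned by a Random-Forest model, and the function names, column names, sample rows, and the 0.2 threshold are all hypothetical.

```python
from collections import Counter

def find_rare_pairs(rows, col_a, col_b, min_support=0.1):
    """Return indices of rows whose (col_a, col_b) pair is rarer than min_support."""
    pairs = [(r[col_a], r[col_b]) for r in rows]
    counts = Counter(pairs)
    total = len(pairs)
    return [i for i, p in enumerate(pairs) if counts[p] / total < min_support]

def suggest_fix(rows, idx, col_a, col_b):
    """Suggest the col_b value most often seen with this row's col_a value."""
    key = rows[idx][col_a]
    seen = Counter(
        r[col_b] for i, r in enumerate(rows) if i != idx and r[col_a] == key
    )
    return seen.most_common(1)[0][0] if seen else None

# Hypothetical sample data: the last row pairs "Paris" with an unusual country.
rows = (
    [{"city": "Paris", "country": "FR"}] * 5
    + [{"city": "Berlin", "country": "DE"}] * 4
    + [{"city": "Paris", "country": "DE"}]
)

rare = find_rare_pairs(rows, "city", "country", min_support=0.2)
fixes = {i: suggest_fix(rows, i, "city", "country") for i in rare}
print(rare, fixes)
```

Here the rare pair is treated as a candidate anomaly and the most common companion value is offered as a substitute, which corresponds loosely to the remedial action the abstract mentions.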
Abstract: The present disclosure relates to methods and systems for contextual data masking and registration. A data masking process may include classifying ingested data, processing the data, and tokenizing the data while maintaining security/privacy of the ingested data. The data masking process may include data configuration that comprises generating anonymized labels of the ingested data, validating an attribute of the ingested data, standardizing the attribute into a standardized format, and processing the data via one or more rules engines. One rules engine can include an address standardization that generates a list of standard addresses that can provide insights into columns of the ingested data without externally transmitting the client data. The masked data can be tokenized as part of the data masking process to securely maintain an impression of the ingested data and generate insights into the ingested data.
Type:
Grant
Filed:
January 29, 2020
Date of Patent:
June 21, 2022
Assignee:
Collibra NV
Inventors:
Satyender Goel, Upwan Chachra, James B. Cushman, II
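The standardize-then-tokenize flow in the abstract of this record can be sketched as follows. This is an illustration under stated assumptions, not the patented process: a keyed HMAC-SHA256 digest stands in for the tokenization step, the standardization rule (trim, collapse whitespace, uppercase) is a hypothetical simplification of address standardization, and the key and sample values are made up.

```python
import hashlib
import hmac

def standardize(value: str) -> str:
    """Hypothetical standardization: trim, collapse whitespace, uppercase."""
    return " ".join(value.strip().upper().split())

def tokenize(value: str, key: bytes) -> str:
    """Return a keyed, non-reversible token for the standardized value."""
    normalized = standardize(value).encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()

# Hypothetical secret; a real deployment would load this from a key store.
key = b"example-secret-key"

token_a = tokenize("  12 main   st ", key)
token_b = tokenize("12 MAIN ST", key)
token_c = tokenize("14 Main St", key)
print(token_a == token_b, token_a == token_c)
```

Because equal standardized values map to equal tokens, masked columns can still be compared or joined without exposing the underlying client data.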
Abstract: The present disclosure relates to methods and systems to classify data. A set of classification modules may inspect received data and identify proposed classifications and confidence values for the received data. An aggregation module may receive and aggregate the proposed classifications and confidence values. Based on the aggregated proposed classifications and the confidence values, the aggregation module may generate a final classification for the received data. An external device may perform an action with respect to the received data based on the final classification associated with the data. The action performed may include maintaining the data such that the data may be retrieved upon receipt of a request for the data. Any of the classification modules and the aggregation module may be based on training data that may be utilized in subsequent iterations of classifying data to increase classification accuracy.
Type:
Grant
Filed:
August 15, 2019
Date of Patent:
October 5, 2021
Assignee:
COLLIBRA NV
Inventors:
Michael Tandecki, Michael Maes, Gretel De Paepe, Anna Filipiak
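The aggregation step in the abstract of this record can be sketched in a few lines. This is one possible reading, not the patented aggregation module: confidence values are simply summed per proposed label and the largest total wins, and the function name, labels, and scores are hypothetical.

```python
from collections import defaultdict

def aggregate(proposals):
    """Combine (label, confidence) proposals into one final classification.

    Sums confidence per label and returns the winning label together with
    its share of the total confidence.
    """
    totals = defaultdict(float)
    for label, confidence in proposals:
        totals[label] += confidence
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())

# Hypothetical output of three classification modules for one column.
proposals = [("email", 0.9), ("name", 0.4), ("email", 0.6)]

final_label, final_score = aggregate(proposals)
print(final_label, round(final_score, 3))
```

A weighted vote is only one option; a trained aggregator, as the abstract suggests, could instead learn how much to trust each module.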