Patents Assigned to COLLIBRA
-
Patent number: 12282581
Abstract: The present disclosure is directed to systems and methods for generating synthetic data. Entities maintain large amounts of data, and conducting probability distribution and/or correlation analyses on these large datasets while maintaining data privacy for personally identifiable information (PII) is difficult. The present application describes methods for identifying data fields that comprise PII and synthesizing the data so that the PII is removed, but the integrity of the probability distribution and/or correlation metrics remains. Certain data is grouped into data fields based on a data table type, and each data type may be assigned a certain data analysis strategy, which may comprise a joint probability distribution, a characterbase data faker, a genetic regex generator, and/or a timeseries model. A table sketch may be generated that may comprise at least one synthesizer recipe to be used in future data queries.
Type: Grant
Filed: April 15, 2022
Date of Patent: April 22, 2025
Assignee: Collibra Belgium BV
Inventors: Gretel De Paepe, Vicky Froyen, Kelsey Schuster, Michael Tandecki, Anna Filipiak
-
Patent number: 12235813
Abstract: The present disclosure is directed to continuous data profiling (CDP). Entities may house large amounts of disorganized and/or duplicative data. To organize and standardize data across a data set, the data may be profiled. However, profiling large data sets can be inefficient and give rise to security problems, as profiling datasets typically requires exporting a dataset to a third-party profiling runtime environment. To remedy these issues, the present disclosure is directed to a continuous data profiling platform that comprises a CDP manager communicatively coupled to a client's database. The CDP manager provides access to a CDP API that may install CDP tools on a client's native database environment, enabling the database management system to profile datasets within the client's native database environment, which results in a more efficient use of computing resources and a more secure process of profiling datasets.
Type: Grant
Filed: September 18, 2023
Date of Patent: February 25, 2025
Assignee: Collibra Belgium BV
Inventors: James B. Cushman, II, Vadim Vaks, Satyender Goel
-
System for preparing machine learning training data for use in evaluation of term definition quality
Patent number: 12190207
Abstract: A system for preparing machine learning training data for use in evaluation of term definition quality. The system can include a server having at least one server processor and at least one server memory for storing a plurality of terms with corresponding definitions, and a plurality of client devices each having at least one client memory device and at least one client processor. The client processor is programmed to receive at least one of the plurality of terms and its corresponding definition from the server, display the term and its corresponding definition, and receive an indication of whether the definition satisfies one or more definition quality guidelines. The server memory includes instructions for causing the at least one server processor to receive the indications from the plurality of client devices and label each definition as satisfying each of the definition quality guidelines or not based on the received indications.
Type: Grant
Filed: December 22, 2020
Date of Patent: January 7, 2025
Assignee: Collibra Belgium BV
Inventors: Gretel De Paepe, Michael Tandecki
-
Patent number: 12182300
Abstract: Systems and methods for policy management are described. In some implementations, a master policy management system can create a policy template in which all policies of a user can be built, monitored, and enforced. The master policy management system can create a taxonomy for the policy template and receive access and control settings for the policy template from the user. A user can generate policies in the policy template and the master policy management system can review and certify the policies based on the accuracy of the policies. Once a policy is built, the master policy management system can review and certify the policy, provide a quality score for the policy, perform lifecycle management, record the policy use, and report alerts regarding the policy.
Type: Grant
Filed: September 7, 2021
Date of Patent: December 31, 2024
Assignee: Collibra Belgium BV
Inventors: Hafeesmon Chett, James B. Cushman, II
-
Patent number: 12130777
Abstract: The present disclosure is directed to systems and methods for performant data matching. Entities maintain large amounts of data and desire to reconcile duplicative records. One way to solve this problem is through data matching. However, standard data matching at the record level can be laborious and inefficient. To remedy these inefficiencies in data matching, the present disclosure describes a system where the token records are tokenized a second time into token sets based on the token records satisfying at least one token set rule. A token set rule may be based on the common presence of multiple tokens in a token record. If multiple token records have the required tokens from the set rule, then those token records can be hashed and rolled-up into the token set (i.e., tokenized a second time into the token set). The token set allows for more efficient data matching.
Type: Grant
Filed: June 14, 2023
Date of Patent: October 29, 2024
Assignee: Collibra Belgium BV
Inventors: Curtiss W. Schuler, Brett A. Norris, Satyender Goel
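The roll-up described in this abstract can be pictured with a minimal sketch. It is not the patented implementation: the record layout (a dict of attribute name to token), the rule format (a tuple of required attribute names), the function name `roll_up`, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

def roll_up(token_records, set_rule):
    """Group record ids by the hash of the tokens named in set_rule (hypothetical layout)."""
    token_sets = defaultdict(list)
    for record_id, tokens in token_records.items():
        # Skip records that lack any token the rule requires.
        if not all(name in tokens for name in set_rule):
            continue
        # Hash the required tokens together: the "second tokenization" step.
        joined = "|".join(tokens[name] for name in set_rule)
        set_key = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        token_sets[set_key].append(record_id)
    return dict(token_sets)

# Illustrative token records; the token values are made up.
records = {
    "r1": {"name": "t_a1", "zip": "t_z9", "phone": "t_p3"},
    "r2": {"name": "t_a1", "zip": "t_z9"},
    "r3": {"name": "t_b7", "zip": "t_z9"},
    "r4": {"zip": "t_z9"},
}
sets = roll_up(records, ("name", "zip"))
# Records that share a token set are candidate duplicates.
candidate_matches = sorted(ids for ids in sets.values() if len(ids) > 1)
```

Comparing one hash per token set, rather than every pair of records, is what makes the matching cheaper in this sketch.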
-
Patent number: 12067141
Abstract: A data marketplace for enriching data records by matching, identifying composite data records, and utilizing Reference Source datasets. Customer data can be tokenized and then subsequently transmitted to a third-party Data Marketplace Platform. Similarly, a Reference Source dataset may be tokenized and transmitted to a Data Marketplace Platform. On the Data Marketplace Platform, the customer data and the reference source data may be compared, wherein certain data attributes (i.e., tokens on the Data Marketplace Platform) may be identified as missing in the customer dataset and present in the reference source dataset. The customer may then have the ability to acquire the missing and value-added data attributes by transacting with the reference source via a data broker, such as the Data Marketplace Platform.
Type: Grant
Filed: March 31, 2021
Date of Patent: August 20, 2024
Assignee: Collibra Belgium BV
Inventors: Satyender Goel, James B. Cushman, II
-
Patent number: 12057108
Abstract: A system for classifying words in a batch of words can include at least one memory device storing instructions for causing at least one processor to create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word, and create word vectors for each word in a batch of words for classification using the NN. The closest matching dictionary vectors are found for each word vector and the classification indicators of the closest matching dictionary vector for each word vector in the batch are reported.
Type: Grant
Filed: October 15, 2020
Date of Patent: August 6, 2024
Assignee: Collibra Belgium BV
Inventors: Michael Tandecki, Michael Maes, Anna Filipiak
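The nearest-dictionary-vector lookup in this abstract can be sketched as follows. The abstract says the vectors come from a neural network; this sketch swaps in character-bigram counts as a stand-in embedding so that it runs with the standard library alone. The function names, the cosine similarity measure, and the sample dictionary are assumptions, not details from the patent.

```python
import math
from collections import Counter

def embed(word):
    """Stand-in for the neural network: character-bigram counts (an assumption)."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(batch, dictionary):
    """Report the label of the closest dictionary vector for each word in the batch."""
    # Dictionary vectors are computed once and stored with their labels.
    index = [(embed(word), label) for word, label in dictionary.items()]
    return {w: max(index, key=lambda e: cosine(embed(w), e[0]))[1] for w in batch}

# Hypothetical dictionary words and classification indicators.
dictionary = {"street": "address", "avenue": "address",
              "surname": "person", "firstname": "person"}
result = classify(["streat", "lastname"], dictionary)
```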
-
Patent number: 12056763
Abstract: The present disclosure is directed to systems and methods for enriching data. Specifically, the systems and methods disclosed enable the enrichment of data via matching, identifying composite data records, and utilizing Reference Source datasets. In one example aspect, Customer data is tokenized and then subsequently transmitted to a third-party Consolidation Platform. The Customer tokens may comprise multiple token records, wherein the multiple token records are displayed in the form of a bitmap. The bitmap may indicate which attributes in a Customer record may be present or absent. The composited Customer token records may then be matched to a Reference Source token set, wherein the matching analysis identifies missing data attributes in the Customer token set that the Customer may or may not already possess. The missing data attributes may be populated and/or updated in a Customer environment based on the Reference Source token set.
Type: Grant
Filed: November 24, 2020
Date of Patent: August 6, 2024
Assignee: Collibra Belgium BV
Inventors: Satyender Goel, James B. Cushman
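One way to read the bitmap idea in this abstract is sketched below: each attribute gets one bit, several token records for the same customer are composited with a bitwise OR, and the gap against the reference source is a bitwise AND-NOT. The attribute list, its order, and all function names are hypothetical.

```python
ATTRIBUTES = ("name", "email", "phone", "address")  # hypothetical attribute order

def to_bitmap(token_record):
    """Set bit i when attribute i has a token in the record."""
    bits = 0
    for i, attr in enumerate(ATTRIBUTES):
        if token_record.get(attr):
            bits |= 1 << i
    return bits

def composite(bitmaps):
    """Combine several token records for one customer into a single bitmap."""
    out = 0
    for b in bitmaps:
        out |= b
    return out

def missing_attributes(customer_bits, reference_bits):
    """Attributes present in the reference source but absent for the customer."""
    gap = reference_bits & ~customer_bits
    return [attr for i, attr in enumerate(ATTRIBUTES) if (gap >> i) & 1]

customer = composite([to_bitmap({"name": "t1"}),
                      to_bitmap({"name": "t1", "email": "t2"})])
reference = to_bitmap({"name": "t1", "email": "t2", "phone": "t3"})
missing = missing_attributes(customer, reference)
```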
-
Patent number: 12026138
Abstract: The present disclosure is directed to systems and methods for reference source matching. Specifically, the systems and methods disclosed enable matching among tokens using a reference source. In one example, a Consolidation Platform may receive tokens from a customer environment and tokens from a reference source environment. The customer tokens may be compared to each other using AB matching. If a match does not occur, the customer tokens may further be compared to the reference source tokens via transitive matching. If a match does occur, then the customer tokens may be denoted as a match. In further example aspects, the reference source may be a universal reference token repository that comprises unique tokens. After a match is indicated, the matched token(s) may be compared to the universal reference token repository. If the matched token(s) does not exist, it may be added to the repository for future use.
Type: Grant
Filed: June 8, 2023
Date of Patent: July 2, 2024
Assignee: Collibra Belgium BV
Inventors: Satyender Goel, James B. Cushman
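The two-stage comparison in this abstract (direct AB matching first, then transitive matching through a reference source) might look like the sketch below. The `reference_links` mapping from a customer token to the reference token it matched, the return values, and the `register` helper for the universal repository are all assumptions introduced for illustration.

```python
def match(token_a, token_b, reference_links):
    """Return 'direct', 'transitive', or None for two customer tokens."""
    # Stage 1: AB matching, a direct comparison of the two customer tokens.
    if token_a == token_b:
        return "direct"
    # Stage 2: transitive matching, where A matches B if both match the same reference token.
    ref_a = reference_links.get(token_a)
    ref_b = reference_links.get(token_b)
    if ref_a is not None and ref_a == ref_b:
        return "transitive"
    return None

def register(token, repository):
    """Add a matched token to the universal repository if it is not there yet."""
    if token in repository:
        return False
    repository.add(token)
    return True

links = {"a": "R1", "b": "R1", "c": "R2"}  # hypothetical customer-to-reference links
repository = {"R1"}
```

For example, `match("a", "b", links)` returns `"transitive"` because both tokens link to `R1`, while `match("a", "c", links)` returns `None`.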
-
Patent number: 12008137
Abstract: The present disclosure relates to methods and systems for contextual data masking and registration. A data masking process may include classifying ingested data, processing the data, and tokenizing the data while maintaining security/privacy of the ingested data. The data masking process may include data configuration that comprises generating anonymized labels of the ingested data, validating an attribute of the ingested data, standardizing the attribute into a standardized format, and processing the data via one or more rules engines. One rules engine can include an address standardization that generates a list of standard addresses that can provide insights into columns of the ingested data without externally transmitting the client data. The masked data can be tokenized as part of the data masking process to securely maintain an impression of the ingested data and generate insights into the ingested data.
Type: Grant
Filed: June 28, 2023
Date of Patent: June 11, 2024
Assignee: Collibra Belgium BV
Inventors: Satyender Goel, Upwan Chachra, James B. Cushman, II
-
Patent number: 12008453
Abstract: The present disclosure is directed to systems and methods for predicting and correcting data anomalies. In one example aspect, data is received by the system. The system may analyze the data by profiling the data for certain profiling statistics (e.g., min, max, mean, cardinality, etc.). At least one machine-learning algorithm (e.g., a Random-Forest algorithm) may be applied to the profiled data to identify potential relationships among certain data columns in the data. Once certain relationships are identified, the data that is related may be extracted to form an itemset. A second machine-learning algorithm (e.g., Frequent Pattern Growth algorithm) may be applied to the itemset to identify certain frequencies of related values in the itemset. Low frequency values may indicate anomalies in the dataset. If an anomaly is detected, the system may be configured to provide an intelligent remedial action, such as substituting certain values and/or filling in a missing value.
Type: Grant
Filed: January 26, 2023
Date of Patent: June 11, 2024
Assignee: Collibra Belgium BV
Inventors: Kirk J. Haslbeck, Brian N. Mearns
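The frequency step of this abstract can be illustrated with a simplified sketch. It assumes the related columns have already been identified (the abstract mentions a Random-Forest algorithm for that) and replaces Frequent Pattern Growth with plain pair counting, which is enough to show how low-frequency value combinations surface as anomalies. The function names, the `min_support` threshold, and the sample rows are hypothetical.

```python
from collections import Counter

def rare_pairs(rows, col_a, col_b, min_support=0.2):
    """Value pairs whose relative frequency falls below min_support (assumed threshold)."""
    pairs = Counter((r[col_a], r[col_b]) for r in rows)
    total = len(rows)
    return sorted(p for p, n in pairs.items() if n / total < min_support)

def suggest(rows, col_a, col_b, value_a):
    """Remedial action: the most frequent col_b value seen together with value_a."""
    counts = Counter(r[col_b] for r in rows if r[col_a] == value_a)
    return counts.most_common(1)[0][0]

# Made-up sample data: one row pairs Ghent with the wrong country.
rows = (
    [{"city": "Ghent", "country": "BE"}] * 5
    + [{"city": "Paris", "country": "FR"}] * 4
    + [{"city": "Ghent", "country": "FR"}]
)
anomalies = rare_pairs(rows, "city", "country")
fix = suggest(rows, "city", "country", "Ghent")
```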
-
Patent number: 11966696
Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.
Type: Grant
Filed: May 18, 2023
Date of Patent: April 23, 2024
Assignee: Collibra Belgium BV
Inventors: Gretel De Paepe, Michael Tandecki
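The scoring flow in this abstract can be sketched under stated assumptions. The abstract does not say how the per-guideline scores are combined, so the arithmetic mean used here is an assumption, as are the guideline names, the threshold of 0.7, and the `assess` function. The per-guideline scores stand in for the outputs of the machine learning models.

```python
def assess(guideline_scores, threshold=0.7):
    """Combine per-guideline scores and flag definitions that fall below the threshold."""
    # Assumption: the overall score is the arithmetic mean of the guideline scores.
    overall = sum(guideline_scores.values()) / len(guideline_scores)
    return {
        "overall": overall,
        "per_guideline": dict(guideline_scores),
        "recommend_transformation": overall < threshold,
    }

# Hypothetical guideline names with made-up model scores.
report = assess({"is_concise": 0.9, "avoids_circularity": 0.5, "states_category": 0.4})
```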
-
Patent number: 11966402
Abstract: The present disclosure relates to methods and systems for processing data via a data profiling process. Data profiling can include modifying attributes included in source data and identifying aspects of the source data. The data profiling process can include processing an attribute according to a set of validation rules to validate information included in the attribute. The process can also include processing the attribute according to a set of standardization rules to modify the attribute into a standardized format. The process can also include processing the attribute according to a set of rules engines. The modified attributes can be outputted for further processing. The data profiling process can also include deriving a value score and usage rank of an attribute, which can be used in deriving insights into the source data.
Type: Grant
Filed: April 9, 2020
Date of Patent: April 23, 2024
Assignee: Collibra Belgium BV
Inventors: Satyender Goel, Aurko Joshi, Vicky Froyen, Upwan Chachra, Pieter De Leenheer, James B. Cushman
-
Patent number: 11949773
Abstract: The present disclosure is directed to systems and methods for securely managing and administering an encryption/decryption key using distributed ledger technology (DLT). In some examples, a client may possess a data attribute (or a dataset of data attributes). The client may receive tokenization parameters to apply to the data attribute to encrypt the data attribute. After tokenizing the data attribute, the client may then request the creation of an encryption key to be applied to the token. A third-party key management system (KMS) may create an encryption key and a salt. The salt may be applied to the token, and the salted token may then be encrypted. Additionally, a decryption key may be created and stored securely at the third-party KMS. The client may transmit the encrypted token to a third-party consolidation platform, wherein the consolidation platform requests access to the decryption key to unveil the underlying token.
Type: Grant
Filed: March 29, 2021
Date of Patent: April 2, 2024
Assignee: Collibra Belgium BV
Inventor: Satyender Goel
-
Patent number: 11782889
Abstract: The present disclosure is directed to continuous data profiling (CDP). Entities may house large amounts of disorganized and/or duplicative data. To organize and standardize data across a data set, the data may be profiled. However, profiling large data sets can be inefficient and give rise to security problems, as profiling datasets typically requires exporting a dataset to a third-party profiling runtime environment. To remedy these issues, the present disclosure is directed to a continuous data profiling platform that comprises a CDP manager communicatively coupled to a client's database. The CDP manager provides access to a CDP API that may install CDP tools on a client's native database environment, enabling the database management system to profile datasets within the client's native database environment, which results in a more efficient use of computing resources and a more secure process of profiling datasets.
Type: Grant
Filed: June 30, 2021
Date of Patent: October 10, 2023
Assignee: Collibra Belgium BV
Inventors: James B. Cushman, II, Vadim Vaks, Satyender Goel
-
Patent number: 11734361
Abstract: The present disclosure is directed to systems and methods for recognizing and categorizing documents. In some embodiments, a computing system can access an archetype template and a corresponding label for each targeted category. The computing system can analyze a set of target binary documents based on a set of sequenced and contextually triggered hashing operations. The target binary documents can be categorized based on comparing the analysis results to the archetype templates or results derived from the archetype templates.
Type: Grant
Filed: April 15, 2022
Date of Patent: August 22, 2023
Assignee: Collibra Belgium BV
Inventor: Sergio Lohengrin Castro Mejía
-
Patent number: 11704438
Abstract: The present disclosure relates to methods and systems for contextual data masking and registration. A data masking process may include classifying ingested data, processing the data, and tokenizing the data while maintaining security/privacy of the ingested data. The data masking process may include data configuration that comprises generating anonymized labels of the ingested data, validating an attribute of the ingested data, standardizing the attribute into a standardized format, and processing the data via one or more rules engines. One rules engine can include an address standardization that generates a list of standard addresses that can provide insights into columns of the ingested data without externally transmitting the client data. The masked data can be tokenized as part of the data masking process to securely maintain an impression of the ingested data and generate insights into the ingested data.
Type: Grant
Filed: June 21, 2022
Date of Patent: July 18, 2023
Assignee: Collibra Belgium BV
Inventors: Satyender Goel, Upwan Chachra, James B. Cushman, II
-
Patent number: 11693821
Abstract: The present disclosure is directed to systems and methods for performant data matching. Entities maintain large amounts of data and desire to reconcile duplicative records. One way to solve this problem is through data matching. However, standard data matching at the record level can be laborious and inefficient. To remedy these inefficiencies in data matching, the present disclosure describes a system where the token records are tokenized a second time into token sets based on the token records satisfying at least one token set rule. A token set rule may be based on the common presence of multiple tokens in a token record. If multiple token records have the required tokens from the set rule, then those token records can be hashed and rolled-up into the token set (i.e., tokenized a second time into the token set). The token set allows for more efficient data matching.
Type: Grant
Filed: July 7, 2021
Date of Patent: July 4, 2023
Assignee: Collibra Belgium BV
Inventors: Curtiss W. Schuler, Brett A. Norris, Satyender Goel
-
Patent number: 11675754
Abstract: The present disclosure is directed to systems and methods for reference source matching. Specifically, the systems and methods disclosed enable matching among tokens using a reference source. In one example, a Consolidation Platform may receive tokens from a customer environment and tokens from a reference source environment. The customer tokens may be compared to each other using AB matching. If a match does not occur, the customer tokens may further be compared to the reference source tokens via transitive matching. If a match does occur, then the customer tokens may be denoted as a match. In further example aspects, the reference source may be a universal reference token repository that comprises unique tokens. After a match is indicated, the matched token(s) may be compared to the universal reference token repository. If the matched token(s) does not exist, it may be added to the repository for future use.
Type: Grant
Filed: November 24, 2020
Date of Patent: June 13, 2023
Assignee: Collibra Belgium BV
Inventors: Satyender Goel, James B. Cushman
-
Patent number: 11669682
Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.
Type: Grant
Filed: December 22, 2020
Date of Patent: June 6, 2023
Assignee: Collibra Belgium BV
Inventors: Gretel De Paepe, Michael Tandecki