Patents Assigned to COLLIBRA
  • Patent number: 12282581
    Abstract: The present disclosure is directed to systems and methods for generating synthetic data. Entities maintain large amounts of data, and conducting probability distribution and/or correlation analyses on these large datasets while maintaining data privacy for personally identifiable information (PII) is difficult. The present application describes methods for identifying data fields that comprise PII and synthesizing the data so that the PII is removed, but the integrity of the probability distribution and/or correlation metrics remains. Certain data is grouped into data fields based on a data table type, and each data type may be assigned a certain data analysis strategy, which may comprise a joint probability distribution, a characterbase data faker, a genetic regex generator, and/or a timeseries model. A table sketch may be generated that may comprise at least one synthesizer recipe to be used in future data queries.
    Type: Grant
    Filed: April 15, 2022
    Date of Patent: April 22, 2025
    Assignee: Collibra Belgium BV
    Inventors: Gretel De Paepe, Vicky Froyen, Kelsey Schuster, Michael Tandecki, Anna Filipiak
  • Patent number: 12235813
    Abstract: The present disclosure is directed to continuous data profiling (CDP). Entities may house large amounts of disorganized and/or duplicative data. To organize and standardize data across a data set, the data may be profiled. However, profiling large data sets can be inefficient and give rise to security problems, as profiling datasets typically requires exporting a dataset to a third-party profiling runtime environment. To remedy these issues, the present disclosure is directed to a continuous data profiling platform that comprises a CDP manager communicatively coupled to a client's database. The CDP manager provides access to a CDP API that may install CDP tools on a client's native database environment, enabling the database management system to profile datasets within the client's native database environment, which results in a more efficient use of computing resources and a more secure process of profiling datasets.
    Type: Grant
    Filed: September 18, 2023
    Date of Patent: February 25, 2025
    Assignee: Collibra Belgium BV
    Inventors: James B. Cushman, II, Vadim Vaks, Satyender Goel
  • Patent number: 12190207
    Abstract: A system for preparing machine learning training data for use in evaluation of term definition quality. The system can include a server having at least one server processor and at least one server memory for storing a plurality of terms with corresponding definitions, and a plurality of client devices each having at least one client memory device and at least one client processor. The client processor is programmed to receive at least one of the plurality of terms and its corresponding definition from the server, display the term and its corresponding definition, and receive an indication of whether the definition satisfies one or more definition quality guidelines. The server memory includes instructions for causing the at least one server processor to receive the indications from the plurality of client devices and label each definition as satisfying each of the definition quality guidelines or not based on the received indications.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: January 7, 2025
    Assignee: Collibra Belgium BV
    Inventors: Gretel De Paepe, Michael Tandecki
  • Patent number: 12182300
    Abstract: Systems and methods for policy management are described. In some implementations, a master policy management system can create a policy template in which all policies of a user can be built, monitored, and enforced. The master policy management system can create a taxonomy for the policy template and receive access and control settings for the policy template from the user. A user can generate policies in the policy template and the master policy management system can review and certify the policies based on the accuracy of the policies. Once a policy is built, the master policy management system can review and certify the policy, provide a quality score for the policy, perform lifecycle management, record the policy use, and report alerts regarding the policy.
    Type: Grant
    Filed: September 7, 2021
    Date of Patent: December 31, 2024
    Assignee: Collibra Belgium BV
    Inventors: Hafeesmon Chett, James B. Cushman, II
  • Patent number: 12130777
    Abstract: The present disclosure is directed to systems and methods for performant data matching. Entities maintain large amounts of data and desire to reconcile duplicative records. One way to solve this problem is through data matching. However, standard data matching at the record level can be laborious and inefficient. To remedy these inefficiencies in data matching, the present disclosure describes a system where the token records are tokenized a second time into token sets based on the token records satisfying at least one token set rule. A token set rule may be based on the common presence of multiple tokens in a token record. If multiple token records have the required tokens from the set rule, then those token records can be hashed and rolled up into the token set (i.e., tokenized a second time into the token set). The token set allows for more efficient data matching.
    Type: Grant
    Filed: June 14, 2023
    Date of Patent: October 29, 2024
    Assignee: Collibra Belgium BV
    Inventors: Curtiss W. Schuler, Brett A. Norris, Satyender Goel
  • Patent number: 12067141
    Abstract: A data marketplace for enriching data records by matching, identifying composite data records, and utilizing Reference Source datasets. Customer data can be tokenized and then subsequently transmitted to a third-party Data Marketplace Platform. Similarly, a Reference Source dataset may be tokenized and transmitted to a Data Marketplace Platform. On the Data Marketplace Platform, the customer data and the reference source data may be compared, wherein certain data attributes (i.e., tokens on the Data Marketplace Platform) may be identified as missing in the customer dataset and present in the reference source dataset. The customer may then have the ability to acquire the missing and value-added data attributes by transacting with the reference source via a data broker, such as the Data Marketplace Platform.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: August 20, 2024
    Assignee: Collibra Belgium BV
    Inventors: Satyender Goel, James B. Cushman, II
  • Patent number: 12057108
    Abstract: A system for classifying words in a batch of words can include at least one memory device storing instructions for causing at least one processor to create dictionary vectors for each of a plurality of dictionary words using a neural network (NN), store each dictionary vector along with a classification indicator corresponding to the associated dictionary word, and create word vectors for each word in a batch of words for classification using the NN. The closest matching dictionary vectors are found for each word vector and the classification indicators of the closest matching dictionary vector for each word vector in the batch are reported.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: August 6, 2024
    Assignee: Collibra Belgium BV
    Inventors: Michael Tandecki, Michael Maes, Anna Filipiak
  • Patent number: 12056763
    Abstract: The present disclosure is directed to systems and methods for enriching data. Specifically, the systems and methods disclosed enable the enrichment of data via matching, identifying composite data records, and utilizing Reference Source datasets. In one example aspect, Customer data is tokenized and then subsequently transmitted to a third-party Consolidation Platform. The Customer tokens may comprise multiple token records, wherein the multiple token records are displayed in the form of a bitmap. The bitmap may indicate which attributes in a Customer record may be present or absent. The composited Customer token records may then be matched to a Reference Source token set, wherein the matching analysis identifies missing data attributes in the Customer token set that the Customer may or may not already possess. The missing data attributes may be populated and/or updated in a Customer environment based on the Reference Source token set.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: August 6, 2024
    Assignee: Collibra Belgium BV
    Inventors: Satyender Goel, James B. Cushman
  • Patent number: 12026138
    Abstract: The present disclosure is directed to systems and methods for reference source matching. Specifically, the systems and methods disclosed enable matching among tokens using a reference source. In one example, a Consolidation Platform may receive tokens from a customer environment and tokens from a reference source environment. The customer tokens may be compared to each other using AB matching. If a match does not occur, the customer tokens may further be compared to the reference source tokens via transitive matching. If a match does occur, then the customer tokens may be denoted as a match. In further example aspects, the reference source may be a universal reference token repository that comprises unique tokens. After a match is indicated, the matched token(s) may be compared to the universal reference token repository. If a matched token does not exist in the repository, it may be added to the repository for future use.
    Type: Grant
    Filed: June 8, 2023
    Date of Patent: July 2, 2024
    Assignee: Collibra Belgium BV
    Inventors: Satyender Goel, James B. Cushman
  • Patent number: 12008137
    Abstract: The present disclosure relates to methods and systems for contextual data masking and registration. A data masking process may include classifying ingested data, processing the data, and tokenizing the data while maintaining security/privacy of the ingested data. The data masking process may include data configuration that comprises generating anonymized labels of the ingested data, validating an attribute of the ingested data, standardizing the attribute into a standardized format, and processing the data via one or more rules engines. One rules engine can include an address standardization that generates a list of standard addresses that can provide insights into columns of the ingested data without externally transmitting the client data. The masked data can be tokenized as part of the data masking process to securely maintain an impression of the ingested data and generate insights into the ingested data.
    Type: Grant
    Filed: June 28, 2023
    Date of Patent: June 11, 2024
    Assignee: Collibra Belgium BV
    Inventors: Satyender Goel, Upwan Chachra, James B. Cushman, II
  • Patent number: 12008453
    Abstract: The present disclosure is directed to systems and methods for predicting and correcting data anomalies. In one example aspect, data is received by the system. The system may analyze the data by profiling the data for certain profiling statistics (e.g., min, max, mean, cardinality, etc.). At least one machine-learning algorithm (e.g., a Random-Forest algorithm) may be applied to the profiled data to identify potential relationships among certain data columns in the data. Once certain relationships are identified, the data that is related may be extracted to form an itemset. A second machine-learning algorithm (e.g., Frequent Pattern Growth algorithm) may be applied to the itemset to identify certain frequencies of related values in the itemset. Low frequency values may indicate anomalies in the dataset. If an anomaly is detected, the system may be configured to provide an intelligent remedial action, such as substituting certain values and/or filling in a missing value.
    Type: Grant
    Filed: January 26, 2023
    Date of Patent: June 11, 2024
    Assignee: Collibra Belgium BV
    Inventors: Kirk J. Haslbeck, Brian N. Mearns
  • Patent number: 11966696
    Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed and, if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.
    Type: Grant
    Filed: May 18, 2023
    Date of Patent: April 23, 2024
    Assignee: Collibra Belgium BV
    Inventors: Gretel De Paepe, Michael Tandecki
  • Patent number: 11966402
    Abstract: The present disclosure relates to methods and systems for processing data via a data profiling process. Data profiling can include modifying attributes included in source data and identifying aspects of the source data. The data profiling process can include processing an attribute according to a set of validation rules to validate information included in the attribute. The process can also include processing the attribute according to a set of standardization rules to modify the attribute into a standardized format. The process can also include processing the attribute according to a set of rules engines. The modified attributes can be outputted for further processing. The data profiling process can also include deriving a value score and usage rank of an attribute, which can be used in deriving insights into the source data.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: April 23, 2024
    Assignee: Collibra Belgium BV
    Inventors: Satyender Goel, Aurko Joshi, Vicky Froyen, Upwan Chachra, Pieter De Leenheer, James B. Cushman
  • Patent number: 11949773
    Abstract: The present disclosure is directed to systems and methods for securely managing and administering an encryption/decryption key using distributed ledger technology (DLT). In some examples, a client may possess a data attribute (or a dataset of data attributes). The client may receive tokenization parameters to apply to the data attribute to encrypt the data attribute. After tokenizing the data attribute, the client may then request the creation of an encryption key to be applied to the token. A third-party key management system (KMS) may create an encryption key and a salt. The salt may be applied to the token, and the salted token may then be encrypted. Additionally, a decryption key may be created and stored securely at the third-party KMS. The client may transmit the encrypted token to a third-party consolidation platform, wherein the consolidation platform requests access to the decryption key to unveil the underlying token.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: April 2, 2024
    Assignee: Collibra Belgium BV
    Inventor: Satyender Goel
  • Patent number: 11782889
    Abstract: The present disclosure is directed to continuous data profiling (CDP). Entities may house large amounts of disorganized and/or duplicative data. To organize and standardize data across a data set, the data may be profiled. However, profiling large data sets can be inefficient and give rise to security problems, as profiling datasets typically requires exporting a dataset to a third-party profiling runtime environment. To remedy these issues, the present disclosure is directed to a continuous data profiling platform that comprises a CDP manager communicatively coupled to a client's database. The CDP manager provides access to a CDP API that may install CDP tools on a client's native database environment, enabling the database management system to profile datasets within the client's native database environment, which results in a more efficient use of computing resources and a more secure process of profiling datasets.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: October 10, 2023
    Assignee: Collibra Belgium BV
    Inventors: James B. Cushman, II, Vadim Vaks, Satyender Goel
  • Patent number: 11734361
    Abstract: The present disclosure is directed to systems and methods for recognizing and categorizing documents. In some embodiments, a computing system can access an archetype template and a corresponding label for each targeted category. The computing system can analyze a set of target binary documents based on a set of sequenced and contextually triggered hashing operations. The target binary documents can be categorized based on comparing the analysis results to the archetype templates or results derived from the archetype templates.
    Type: Grant
    Filed: April 15, 2022
    Date of Patent: August 22, 2023
    Assignee: Collibra Belgium BV
    Inventor: Sergio Lohengrin Castro Mejía
  • Patent number: 11704438
    Abstract: The present disclosure relates to methods and systems for contextual data masking and registration. A data masking process may include classifying ingested data, processing the data, and tokenizing the data while maintaining security/privacy of the ingested data. The data masking process may include data configuration that comprises generating anonymized labels of the ingested data, validating an attribute of the ingested data, standardizing the attribute into a standardized format, and processing the data via one or more rules engines. One rules engine can include an address standardization that generates a list of standard addresses that can provide insights into columns of the ingested data without externally transmitting the client data. The masked data can be tokenized as part of the data masking process to securely maintain an impression of the ingested data and generate insights into the ingested data.
    Type: Grant
    Filed: June 21, 2022
    Date of Patent: July 18, 2023
    Assignee: Collibra Belgium BV
    Inventors: Satyender Goel, Upwan Chachra, James B. Cushman, II
  • Patent number: 11693821
    Abstract: The present disclosure is directed to systems and methods for performant data matching. Entities maintain large amounts of data and desire to reconcile duplicative records. One way to solve this problem is through data matching. However, standard data matching at the record level can be laborious and inefficient. To remedy these inefficiencies in data matching, the present disclosure describes a system where the token records are tokenized a second time into token sets based on the token records satisfying at least one token set rule. A token set rule may be based on the common presence of multiple tokens in a token record. If multiple token records have the required tokens from the set rule, then those token records can be hashed and rolled up into the token set (i.e., tokenized a second time into the token set). The token set allows for more efficient data matching.
    Type: Grant
    Filed: July 7, 2021
    Date of Patent: July 4, 2023
    Assignee: Collibra Belgium BV
    Inventors: Curtiss W. Schuler, Brett A. Norris, Satyender Goel
  • Patent number: 11675754
    Abstract: The present disclosure is directed to systems and methods for reference source matching. Specifically, the systems and methods disclosed enable matching among tokens using a reference source. In one example, a Consolidation Platform may receive tokens from a customer environment and tokens from a reference source environment. The customer tokens may be compared to each other using AB matching. If a match does not occur, the customer tokens may further be compared to the reference source tokens via transitive matching. If a match does occur, then the customer tokens may be denoted as a match. In further example aspects, the reference source may be a universal reference token repository that comprises unique tokens. After a match is indicated, the matched token(s) may be compared to the universal reference token repository. If a matched token does not exist in the repository, it may be added to the repository for future use.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: June 13, 2023
    Assignee: Collibra Belgium BV
    Inventors: Satyender Goel, James B. Cushman
  • Patent number: 11669682
    Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed and, if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: June 6, 2023
    Assignee: Collibra Belgium BV
    Inventors: Gretel De Paepe, Michael Tandecki
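
The definition quality assessment flow described in the abstract of patent 11669682 (derive features, score each guideline, combine into an overall score, compare against a threshold) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the features, the two guidelines, the rule-based functions standing in for trained machine learning models, the use of a simple mean as the overall score, and the 0.7 threshold. None of it is taken from the patent itself.

```python
from statistics import mean
from typing import Callable, Dict

Features = Dict[str, float]


def derive_features(definition: str) -> Features:
    """Hypothetical feature inputs derived from a definition's text."""
    words = definition.split()
    return {
        "word_count": float(len(words)),
        "ends_with_period": 1.0 if definition.strip().endswith(".") else 0.0,
    }


# Rule-based stand-ins for the per-guideline machine learning models.
# A real system would load one trained model per guideline instead.
def concise_model(features: Features) -> float:
    return 1.0 if features["word_count"] <= 30 else 0.5


def complete_sentence_model(features: Features) -> float:
    long_enough = features["word_count"] >= 5
    terminated = features["ends_with_period"] == 1.0
    return 1.0 if long_enough and terminated else 0.2


# Hypothetical guideline names mapped to their scoring models.
GUIDELINE_MODELS: Dict[str, Callable[[Features], float]] = {
    "concise": concise_model,
    "complete_sentence": complete_sentence_model,
}


def assess_definition(definition: str, threshold: float = 0.7) -> dict:
    """Score a definition per guideline and flag it if the overall score is low."""
    features = derive_features(definition)
    scores = {name: model(features) for name, model in GUIDELINE_MODELS.items()}
    # Assumption: the overall score is the unweighted mean of guideline scores.
    overall = mean(list(scores.values()))
    return {
        "guideline_scores": scores,
        "overall_score": overall,
        "recommend_transformation": overall < threshold,
    }


if __name__ == "__main__":
    print(assess_definition("A customer is a party that purchases goods or services."))
    print(assess_definition("Customer"))
```

In this sketch the one-word definition scores well on the hypothetical "concise" guideline but poorly on "complete_sentence", so its overall score falls below the assumed threshold and a transformation is recommended. Keeping one scoring function per guideline mirrors the abstract's one-model-per-guideline structure and lets each guideline's score be displayed separately.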