Patents Assigned to COLLIBRA
  • Patent number: 11966402
    Abstract: The present disclosure relates to methods and systems for processing data via a data profiling process. Data profiling can include modifying attributes included in source data and identifying aspects of the source data. The data profiling process can include processing an attribute according to a set of validation rules to validate information included in the attribute. The process can also include processing the attribute according to a set of standardization rules to modify the attribute into a standardized format. The process can also include processing the attribute according to a set of rules engines. The modified attributes can be outputted for further processing. The data profiling process can also include deriving a value score and usage rank of an attribute, which can be used in deriving insights into the source data.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: April 23, 2024
    Assignee: Collibra Belgium BV
    Inventors: Satyender Goel, Aurko Joshi, Vicky Froyen, Upwan Chachra, Pieter De Leenheer, James B. Cushman
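The validate-then-standardize flow in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the phone-number attribute, the 10-digit rule, and the function names `validate_phone`, `standardize_phone`, and `profile_attribute` are all assumptions made for the example.

```python
import re

# Hypothetical rules for a single "phone" attribute; not taken from the patent.
def validate_phone(value: str) -> bool:
    """Validation rule: accept values that contain exactly 10 digits."""
    return len(re.sub(r"\D", "", value)) == 10

def standardize_phone(value: str) -> str:
    """Standardization rule: render the digits as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", value)
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

def profile_attribute(values):
    """Validate each value, then standardize the valid ones; invalid values become None."""
    return [standardize_phone(v) if validate_phone(v) else None for v in values]

profiled = profile_attribute(["(555) 123-4567", "555.987.6543", "12345"])
print(profiled)  # ['555-123-4567', '555-987-6543', None]
```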
  • Patent number: 11966696
    Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.
    Type: Grant
    Filed: May 18, 2023
    Date of Patent: April 23, 2024
    Assignee: Collibra Belgium BV
    Inventors: Gretel De Paepe, Michael Tandecki
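The per-guideline scoring loop in this abstract might look something like the sketch below. The two guidelines, the hand-written scoring functions standing in for machine learning models, the feature names, the averaging step, and the 0.7 threshold are all hypothetical; the abstract does not specify any of them.

```python
from statistics import mean

# Hypothetical stand-ins for the per-guideline machine learning models.
def score_conciseness(features):
    return 1.0 if features["word_count"] <= 25 else 0.4

def score_non_circular(features):
    return 0.0 if features["repeats_term"] else 1.0

GUIDELINE_MODELS = {
    "concise": score_conciseness,
    "non_circular": score_non_circular,
}

def derive_features(term, definition):
    """Derive simple feature inputs from the definition text."""
    words = definition.lower().split()
    return {"word_count": len(words), "repeats_term": term.lower() in words}

def assess(term, definition, threshold=0.7):
    """Score each guideline, average the scores, and flag low-quality definitions."""
    features = derive_features(term, definition)
    scores = {name: model(features) for name, model in GUIDELINE_MODELS.items()}
    overall = mean(scores.values())
    return {
        "scores": scores,
        "overall": overall,
        "recommend_transformation": overall < threshold,
    }

print(assess("customer", "A customer is a customer"))
```

In this sketch the circular definition scores 0.5 overall, which falls below the assumed threshold and triggers the recommendation flag.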
  • Patent number: 11949773
    Abstract: The present disclosure is directed to systems and methods for securely managing and administering an encryption/decryption key using distributed ledger technology (DLT). In some examples, a client may possess a data attribute (or a dataset of data attributes). The client may receive tokenization parameters to apply to the data attribute to encrypt the data attribute. After tokenizing the data attribute, the client may then request the creation of an encryption key to be applied to the token. A third-party key management system (KMS) may create an encryption key and a salt. The salt may be applied to the token, and the salted token may then be encrypted. Additionally, a decryption key may be created and stored securely at the third-party KMS. The client may transmit the encrypted token to a third-party consolidation platform, wherein the consolidation platform requests access to the decryption key to unveil the underlying token.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: April 2, 2024
    Assignee: Collibra Belgium BV
    Inventor: Satyender Goel
  • Patent number: 11782889
    Abstract: The present disclosure is directed to continuous data profiling (CDP). Entities may house large amounts of disorganized and/or duplicative data. To organize and standardize data across a data set, the data may be profiled. However, profiling large data sets can be inefficient and give rise to security problems, as profiling datasets typically requires exporting a dataset to a third-party profiling runtime environment. To remedy these issues, the present disclosure is directed to a continuous data profiling platform that comprises a CDP manager communicatively coupled to a client's database. The CDP manager provides access to a CDP API that may install CDP tools on a client's native database environment, enabling the database management system to profile datasets within the client's native database environment, which results in a more efficient use of computing resources and a more secure process of profiling datasets.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: October 10, 2023
    Assignee: Collibra Belgium BV
    Inventors: James B. Cushman, II, Vadim Vaks, Satyender Goel
  • Patent number: 11734361
    Abstract: The present disclosure is directed to systems and methods for recognizing and categorizing documents. In some embodiments, a computing system can access an archetype template and a corresponding label for each targeted category. The computing system can analyze a set of target binary documents based on a set of sequenced and contextually triggered hashing operations. The target binary documents can be categorized based on comparing the analysis results to the archetype templates or results derived from the archetype templates.
    Type: Grant
    Filed: April 15, 2022
    Date of Patent: August 22, 2023
    Assignee: Collibra Belgium BV
    Inventor: Sergio Lohengrin Castro Mejía
  • Patent number: 11704438
    Abstract: The present disclosure relates to methods and systems for contextual data masking and registration. A data masking process may include classifying ingested data, processing the data, and tokenizing the data while maintaining security/privacy of the ingested data. The data masking process may include data configuration that comprises generating anonymized labels of the ingested data, validating an attribute of the ingested data, standardizing the attribute into a standardized format, and processing the data via one or more rules engines. One rules engine can include an address standardization that generates a list of standard addresses that can provide insights into columns of the ingested data without externally transmitting the client data. The masked data can be tokenized as part of the data masking process to securely maintain an impression of the ingested data and generate insights into the ingested data.
    Type: Grant
    Filed: June 21, 2022
    Date of Patent: July 18, 2023
    Assignee: Collibra Belgium BV
    Inventors: Satyender Goel, Upwan Chachra, James B. Cushman, II
  • Patent number: 11693821
    Abstract: The present disclosure is directed to systems and methods for performant data matching. Entities maintain large amounts of data and desire to reconcile duplicative records. One way to solve this problem is through data matching. However, standard data matching at the record level can be laborious and inefficient. To remedy these inefficiencies in data matching, the present disclosure describes a system where the token records are tokenized a second time into token sets based on the token records satisfying at least one token set rule. A token set rule may be based on the common presence of multiple tokens in a token record. If multiple token records have the required tokens from the set rule, then those token records can be hashed and rolled-up into the token set (i.e., tokenized a second time into the token set). The token set allows for more efficient data matching.
    Type: Grant
    Filed: July 7, 2021
    Date of Patent: July 4, 2023
    Assignee: Collibra Belgium BV
    Inventors: Curtiss W. Schuler, Brett A. Norris, Satyender Goel
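One way to picture the second-level tokenization in this abstract is shown below. This is only a sketch under assumptions: the record layout, the field names, the use of SHA-256, and the `roll_up` helper are illustrative choices, not details from the patent.

```python
import hashlib
from collections import defaultdict

def roll_up(token_records, set_rule):
    """Group record ids under a hash of the tokens that the set rule requires.

    token_records: {record_id: {field_name: token}}
    set_rule: tuple of field names that must all be present in a record.
    """
    token_sets = defaultdict(list)
    for record_id, tokens in token_records.items():
        if all(field in tokens for field in set_rule):
            key_material = "|".join(tokens[field] for field in set_rule)
            set_token = hashlib.sha256(key_material.encode()).hexdigest()
            token_sets[set_token].append(record_id)
    return dict(token_sets)

# Hypothetical token records; "tA", "tB", "tC" stand in for real tokens.
records = {
    "r1": {"name": "tA", "dob": "tB"},
    "r2": {"name": "tA", "dob": "tB", "zip": "tC"},
    "r3": {"name": "tA"},  # lacks "dob", so it does not satisfy the rule
}
token_sets = roll_up(records, ("name", "dob"))
print(list(token_sets.values()))  # [['r1', 'r2']]
```

Matching then compares one set token per group rather than every pair of records.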
  • Patent number: 11675754
    Abstract: The present disclosure is directed to systems and methods for reference source matching. Specifically, the systems and methods disclosed enable matching among tokens using a reference source. In one example, a Consolidation Platform may receive tokens from a customer environment and tokens from a reference source environment. The customer tokens may be compared to each other using AB matching. If a match does not occur, the customer tokens may further be compared to the reference source tokens via transitive matching. If a match does occur, then the customer tokens may be denoted as a match. In further example aspects, the reference source may be a universal reference token repository that comprises unique tokens. After a match is indicated, the matched token(s) may be compared to the universal reference token repository. If the matched token(s) does not exist, it may be added to the repository for future use.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: June 13, 2023
    Assignee: Collibra Belgium BV
    Inventors: Satyender Goel, James B. Cushman
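The direct-then-transitive comparison in this abstract can be illustrated with a small sketch. It assumes, purely for the example, that the reference source is available as a lookup from a customer token to a reference identifier; the `match` function and the token values are hypothetical.

```python
def match(token_a, token_b, reference_index):
    """Return 'direct', 'transitive', or None for a pair of customer tokens.

    reference_index: hypothetical lookup from a token to a reference-source id.
    """
    # AB matching: compare the two customer tokens to each other.
    if token_a == token_b:
        return "direct"
    # Transitive matching: both tokens resolve to the same reference entry.
    ref_a = reference_index.get(token_a)
    ref_b = reference_index.get(token_b)
    if ref_a is not None and ref_a == ref_b:
        return "transitive"
    return None

reference_index = {"tok_1": "ref_9", "tok_2": "ref_9", "tok_3": "ref_4"}
print(match("tok_1", "tok_1", reference_index))  # direct
print(match("tok_1", "tok_2", reference_index))  # transitive
print(match("tok_1", "tok_3", reference_index))  # None
```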
  • Patent number: 11669682
    Abstract: An enterprise data management system with definition quality assessment capabilities for automatically assessing the quality of definitions for terms stored in the enterprise data management system. The system can include a processor programmed to receive a term and a corresponding definition. The processor assesses the quality of the definition, including for each of a plurality of quantifiable definition guidelines: deriving feature inputs based on the definition; feeding the feature inputs into a machine learning model corresponding to the definition guideline; and receiving a quality score for the definition guideline from the corresponding machine learning model. An overall quality score is calculated based on the quality score for each of the definition guidelines. The overall quality score and the quality score for each of the plurality of definition guidelines are displayed, and if the overall quality score is less than a selected threshold score, a transformation of the definition is recommended.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: June 6, 2023
    Assignee: Collibra Belgium BV
    Inventors: Gretel De Paepe, Michael Tandecki
  • Patent number: 11568328
    Abstract: The present disclosure is directed to systems and methods for predicting and correcting data anomalies. In one example aspect, data is received by the system. The system may analyze the data by profiling the data for certain profiling statistics (e.g., min, max, mean, cardinality, etc.). At least one machine-learning algorithm (e.g., a Random-Forest algorithm) may be applied to the profiled data to identify potential relationships among certain data columns in the data. Once certain relationships are identified, the data that is related may be extracted to form an itemset. A second machine-learning algorithm (e.g., Frequent Pattern Growth algorithm) may be applied to the itemset to identify certain frequencies of related values in the itemset. Low frequency values may indicate anomalies in the dataset. If an anomaly is detected, the system may be configured to provide an intelligent remedial action, such as substituting certain values and/or filling in a missing value.
    Type: Grant
    Filed: April 21, 2021
    Date of Patent: January 31, 2023
    Assignee: Collibra NV
    Inventors: Kirk J. Haslbeck, Brian N. Mearns
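The idea that low-frequency value combinations may indicate anomalies can be shown with a simplified sketch. Note that this uses plain pair counting with `collections.Counter` rather than the Frequent Pattern Growth algorithm the abstract names, and the column names, sample rows, and `min_support` threshold are assumptions for illustration.

```python
from collections import Counter

def find_rare_pairs(rows, col_a, col_b, min_support=0.2):
    """Return value pairs from two related columns whose relative frequency is below min_support."""
    pairs = [(row[col_a], row[col_b]) for row in rows]
    counts = Counter(pairs)
    total = len(pairs)
    return [pair for pair, n in counts.items() if n / total < min_support]

# Hypothetical itemset: nine consistent rows and one suspicious row.
rows = [{"city": "Paris", "country": "FR"}] * 9 + [{"city": "Paris", "country": "DE"}]
print(find_rare_pairs(rows, "city", "country"))  # [('Paris', 'DE')]
```

A remedial step could then substitute the most frequent value for the flagged one, as the abstract suggests, but that is left out of this sketch.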
  • Patent number: 11366928
    Abstract: The present disclosure relates to methods and systems for contextual data masking and registration. A data masking process may include classifying ingested data, processing the data, and tokenizing the data while maintaining security/privacy of the ingested data. The data masking process may include data configuration that comprises generating anonymized labels of the ingested data, validating an attribute of the ingested data, standardizing the attribute into a standardized format, and processing the data via one or more rules engines. One rules engine can include an address standardization that generates a list of standard addresses that can provide insights into columns of the ingested data without externally transmitting the client data. The masked data can be tokenized as part of the data masking process to securely maintain an impression of the ingested data and generate insights into the ingested data.
    Type: Grant
    Filed: January 29, 2020
    Date of Patent: June 21, 2022
    Assignee: Collibra NV
    Inventors: Satyender Goel, Upwan Chachra, James B. Cushman, II
  • Patent number: 11138477
    Abstract: The present disclosure relates to methods and systems to classify data. A set of classification modules may inspect received data and identify proposed classifications and confidence values for the received data. An aggregation module may receive and aggregate the proposed classifications and confidence values. Based on the aggregated proposed classifications and the confidence values, the aggregation module may generate a final classification for the received data. An external device may perform an action with respect to the received data based on the final classification associated with the data. The action performed may include maintaining the data such that the data may be retrieved upon receipt of a request for the data. Any of the classification modules and the aggregation module may be based on training data that may be utilized in subsequent iterations of classifying data to increase classification accuracy.
    Type: Grant
    Filed: August 15, 2019
    Date of Patent: October 5, 2021
    Assignee: COLLIBRA NV
    Inventors: Michael Tandecki, Michael Maes, Gretel De Paepe, Anna Filipiak
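A minimal sketch of the aggregation step described in this abstract follows. The abstract does not say how proposals are combined, so summing confidence per label and picking the largest total is an assumption, as are the `aggregate` function and the sample labels.

```python
from collections import defaultdict

def aggregate(proposals):
    """Combine (label, confidence) proposals from several classification modules.

    Sums confidence per label, picks the label with the highest total, and
    reports that total averaged over all proposals as the final confidence.
    """
    totals = defaultdict(float)
    for label, confidence in proposals:
        totals[label] += confidence
    best = max(totals, key=totals.get)
    return best, totals[best] / len(proposals)

# Hypothetical output of three classification modules for one column.
proposals = [("email", 0.9), ("name", 0.6), ("email", 0.7)]
label, confidence = aggregate(proposals)
print(label, round(confidence, 2))  # email 0.53
```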
  • Patent number: 9171022
    Abstract: A method for modifying a mapping from at least one application path of a data system to a conceptual path of an ontology system, wherein the application path addresses a part of the structure of the data system and the conceptual path addresses a part of the structure of the ontology system. The method includes the steps of detecting a change to a part of the structure of the ontology system that one or more of the conceptual paths is addressing, and updating the mappings to reflect the change to the part of the structure of the ontology system.
    Type: Grant
    Filed: August 27, 2014
    Date of Patent: October 27, 2015
    Assignees: COLLIBRA NV/SA, VRIJE UNIVERSITEIT BRUSSEL
    Inventors: Damien Trog, Stijn Christiaens, Pieter Gaston Marguerite De Leenheer, Felix Urbain Yolande Van De Maele, Robert Alfons Meersman
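The mapping update in this abstract can be pictured with a small sketch. It assumes, only for illustration, that paths are slash-separated strings, that the detected ontology change is a rename of a path prefix, and that mappings live in a dictionary; `update_mappings` and the sample paths are hypothetical.

```python
def update_mappings(mappings, old_prefix, new_prefix):
    """Rewrite conceptual paths affected by renaming old_prefix to new_prefix.

    mappings: {application_path: conceptual_path}
    Only paths equal to old_prefix or nested under it are changed.
    """
    updated = {}
    for app_path, concept_path in mappings.items():
        if concept_path == old_prefix or concept_path.startswith(old_prefix + "/"):
            concept_path = new_prefix + concept_path[len(old_prefix):]
        updated[app_path] = concept_path
    return updated

mappings = {
    "db.customer.name": "Person/name",
    "db.staff.role": "Personnel/role",  # shares a text prefix but is a different concept
    "db.order.id": "Order/id",
}
print(update_mappings(mappings, "Person", "Party"))
```

Here only `db.customer.name` is remapped, to `Party/name`; the other two mappings are returned unchanged.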
  • Patent number: 8849874
    Abstract: A method for modifying a mapping from at least one application path of a data system to a conceptual path of an ontology system is provided. The application path addresses a part of the structure of the data system, and the conceptual path addresses a part of the structure of the ontology system. The method comprises the steps of detecting a change to a part of the structure of the ontology system that one or more of the conceptual paths is addressing, and updating the mappings to reflect the change to the part of the structure of the ontology system.
    Type: Grant
    Filed: April 27, 2010
    Date of Patent: September 30, 2014
    Assignees: Collibra NV/SA, Vrije Universiteit Brussel
    Inventors: Damien Trog, Stijn Christiaens, Pieter De Leenheer, Felix Urbain Yolande Van De Maele, Robert Alfons Meersman
  • Patent number: 8812553
    Abstract: A method for populating a data system is provided. The method includes the step of mapping at least one application path of the data system to at least one conceptual path of an ontology system. The application path addresses parts of the structure of the data system, and the conceptual path addresses parts of the structure of the ontology system. The method further includes the step of automatically populating the data system at a location addressed by the application path with data values contained in the conceptual path.
    Type: Grant
    Filed: April 29, 2010
    Date of Patent: August 19, 2014
    Assignees: Collibra NV/SA, Vrije Universiteit Brussel
    Inventors: Damien Trog, Stijn Christiaens, Pieter De Leenheer, Felix Urbain Yolande Van De Maele, Robert Alfons Meersman
  • Publication number: 20140108071
    Abstract: A method, apparatus, data model and computer program product for providing entities having transient properties. The method may be performed by a computerized device and comprises: receiving an initial entity type specification, the initial specification comprising lifecycle of the entity type; receiving an indication of a transient property for the entity type; and receiving a possession formula for the transient property, wherein the possession formula is associated with a stage or condition in the lifecycle of at least one entity type.
    Type: Application
    Filed: October 15, 2012
    Publication date: April 17, 2014
    Applicants: COLLIBRA, INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David Boaz, Pieter De Leenheer, Richard B. Hull, Lior Limonad, Mark H. Linehan