Patents by Inventor Mike W. Grasselt
Mike W. Grasselt has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11941056
Abstract: The present disclosure relates to a method for weighting a graph comprising nodes representing entities and edges representing relationships between entities in accordance with one or more domains. The method comprises: pre-processing the graph comprising assigning weights to the nodes and/or the edges of the graph in accordance with a specific domain of the domains, wherein the weight indicates a domain specific data quality problem of attribute values representing an edge of the edges and/or an entity involved in that edge. The weighted graph may be provided for enabling a processing of the graph in accordance with the specific domain.
Type: Grant
Filed: April 20, 2021
Date of Patent: March 26, 2024
Assignee: International Business Machines Corporation
Inventors: Martin Oberhofer, Mike W. Grasselt, Claudio Andrea Fanconi, Thuany Karoline Stuart, Yannick Saillet, Basem Elasioty, Hemanth Kumar Babu, Robert Kern
-
Patent number: 11748382
Abstract: A method provides for classifying data fields of a dataset. A classifier configured for determining confidence values for a plurality of data classes for the data fields may be applied. Using the confidence values, data class candidates may be identified. Data fields may be determined for which a plurality of data class candidates is identifiable. Using previous user-selected data class assignments, a probability may be determined for the data class candidates that the respective data class candidate is a data class to which the respective data field is to be assigned. The data fields may be classified using the probabilities to select for the data fields a data class from the data class candidates. The dataset may be provided with metadata identifying for the data fields the data classes to which the respective data fields are assigned.
Type: Grant
Filed: May 18, 2020
Date of Patent: September 5, 2023
Assignee: International Business Machines Corporation
Inventors: Yannick Saillet, Namit Kabra, Mike W. Grasselt, Krishna Kishore Bonagiri
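The abstract above can be read as a two-stage selection: a classifier proposes candidate classes by confidence, and when several candidates remain, earlier user choices decide between them. The following is a minimal Python sketch of that reading, not the patented method. The function name `pick_data_class`, the `threshold` cut-off, and the add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter

def pick_data_class(confidences, prior_choices, threshold=0.5):
    """Pick a data class for one data field (illustrative sketch only).

    confidences: dict mapping data class -> classifier confidence in [0, 1].
    prior_choices: list of data classes that users selected in the past.
    threshold: hypothetical cut-off for a class to count as a candidate.
    """
    candidates = [c for c, conf in confidences.items() if conf >= threshold]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # Several candidates: estimate a probability for each one from past
    # user selections (add-one smoothing is an assumption of this sketch).
    counts = Counter(prior_choices)
    total = sum(counts[c] for c in candidates) + len(candidates)
    probs = {c: (counts[c] + 1) / total for c in candidates}
    # Highest probability wins; classifier confidence breaks ties.
    return max(candidates, key=lambda c: (probs[c], confidences[c]))
```

In this sketch a field scored 0.8 for "phone" and 0.7 for "zip" is still assigned "zip" if users picked "zip" more often in the past.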
-
Publication number: 20230185786
Abstract: A computer-implemented method for detecting reference data standardization gaps in data sets is disclosed. The method comprises identifying at least one reference data candidate in a data set, using an index for values of the identified at least one reference data candidate, and determining a difference between an earlier version of a reference data set relating to the reference data candidate and a current version of the reference data set. Furthermore, the method comprises comparing the determined difference with values of the index, and identifying entries in the at least one reference data candidate having a value identical to a value of the difference as reference data standardization gap.
Type: Application
Filed: December 13, 2021
Publication date: June 15, 2023
Inventors: Albert Maier, Dennis Butterstein, Alexandre Luz Xavier Da Costa, Mike W. Grasselt, Timo Kussmaul, Yevgen Karpenko
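As a rough illustration of the steps in the abstract above, the Python sketch below builds an index over a column's values, computes the difference between two versions of a reference set, and reports the rows whose value falls in that difference. Treating the difference as "values present in the earlier version but not in the current one" is an assumption of this sketch, as are the function and parameter names.

```python
def standardization_gaps(column_values, old_reference, new_reference):
    """Return row positions whose value was dropped from the reference set.

    column_values: values of the reference data candidate column.
    old_reference / new_reference: earlier and current reference value sets.
    """
    # Index: value -> list of row positions holding that value.
    index = {}
    for row, value in enumerate(column_values):
        index.setdefault(value, []).append(row)
    # Difference between versions (assumed here: old minus new).
    difference = set(old_reference) - set(new_reference)
    # Rows whose value is identical to a value of the difference.
    return sorted(row for value in difference for row in index.get(value, []))
```

For a column holding "DE", "UK", "FR", "UK" and a reference set in which "UK" was replaced by "GB", the sketch flags the two rows that still hold "UK".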
-
Publication number: 20230177193
Abstract: A database system can comprise records, each record including a set of attributes. The database system can further comprise database views, each database view representing a subset of the set of attributes. Data purpose objects indicating a subset of attributes of the set of attributes and a processing purpose can be stored. Each processing purpose can be associated with one or more entities that authorized access to the subset of attributes of the processing purpose. A request for data for a specific processing purpose and a selected view of the database views can be received. A data purpose object that indicates the specific processing purpose can be retrieved. The subset of attributes represented by the selected view can be compared with the subset of the attributes indicated in the retrieved data purpose object. Values of the subset of attributes of the selected view can be provided.
Type: Application
Filed: December 8, 2021
Publication date: June 8, 2023
Inventors: Lars Bremer, Albert Maier, Mike W. Grasselt, Yannick Saillet, Michael Baessler
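The comparison step in the abstract above might look like the following Python sketch. It assumes that only attributes covered by both the selected view and the data purpose object are returned. The abstract does not spell out that outcome, so this rule and the name `values_for_purpose` are hypothetical.

```python
def values_for_purpose(record, view_attributes, purpose_attributes):
    """Return the view's attribute values that the purpose object also covers.

    record: dict mapping attribute name -> value.
    view_attributes: attributes represented by the selected database view.
    purpose_attributes: attributes listed in the retrieved data purpose object.
    """
    purpose = set(purpose_attributes)
    # Keep view order; skip attributes the purpose does not authorize.
    return {a: record[a] for a in view_attributes if a in purpose and a in record}
```

With a view exposing name, email, and date of birth, and a purpose object that lists only name and email, the sketch withholds the date of birth.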
-
Patent number: 11651055
Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a first graph comprising first nodes representing first entities and first edges representing relationships between first entities, the first nodes being associated with first entity attributes descriptive of the first entities represented by the first nodes, the first edges being associated with first edge attributes descriptive of the relationships represented by the first edges; determining a first subgraph for a certain node of the first nodes of the first graph, the first subgraph including the certain node and at least one neighboring node of the certain node; and determining a data quality issue regarding the certain node based, at least in part, on applying one or more applicable rules of a set of data quality rules to first entity attribute values and first edge attribute values of the first subgraph.
Type: Grant
Filed: October 29, 2020
Date of Patent: May 16, 2023
Assignee: International Business Machines Corporation
Inventors: Yannick Saillet, Claudio Andrea Fanconi, Martin Oberhofer, Hemanth Kumar Babu, Basem Elasioty, Mike W. Grasselt, Robert Kern, Thuany Karoline Stuart
-
Patent number: 11550813
Abstract: Techniques are described relating to automatic data standardization in a managed services domain of a cloud computing environment. An associated computer-implemented method includes receiving a dataset during a data onboarding procedure and classifying datapoints within the dataset. The method further includes applying a machine learning data standardization model to each classified datapoint within the dataset and deriving a proposed set of data standardization rules for the dataset based upon any standardization modification determined consequent to model application. Optionally, the method includes presenting the proposed set of data standardization rules for client review and, responsive to acceptance of the proposed set of data standardization rules, applying the proposed set of data standardization rules to the dataset. The method further includes, responsive to acceptance of the proposed set of data standardization rules, updating the machine learning data standardization model accordingly.
Type: Grant
Filed: February 24, 2021
Date of Patent: January 10, 2023
Assignee: International Business Machines Corporation
Inventors: Namit Kabra, Krishna Kishore Bonagiri, Mike W. Grasselt, Yannick Saillet
-
Patent number: 11537552
Abstract: A computer system, computer program product, and a computer-implemented method for supplementing a data governance framework with one or more new data governance technical rules is disclosed. The method comprises providing a plurality of expressions and a first mapping. The expressions assign natural language patterns to technical language patterns. The first mapping maps first terms to data sources. A rule generator receives a new natural language (NL) rule comprising one or more natural-language patterns and one or more first terms. The rule generator resolves the new NL rule into one or more new technical rules interpretable by a respective rule engine and stores the one or more technical rules in a rule repository.
Type: Grant
Filed: May 11, 2020
Date of Patent: December 27, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Mike W. Grasselt, Yannick Saillet, Marvin Schaefer
-
Publication number: 20220391848
Abstract: Embodiments of the present invention provide methods, computer program products, and systems. Embodiments of the present invention can condense a hierarchy in a data governance system, wherein the hierarchy comprises a root node and at least one child node comprising related sub-trees by determining, for a parent node in the hierarchy of governance system, governance terms and respective assignment relationships from a plurality of information assets, determining usage of the governance term in at least one of a plurality of governance rules, and marking a governance term of the plurality of governance terms for elimination based on the determined assignment relationships and the determined usage of the governance term in the plurality of governance rules. Embodiments of the present invention can then delete the governance term from the hierarchy if the governance term is marked for elimination.
Type: Application
Filed: June 7, 2021
Publication date: December 8, 2022
Inventors: Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer, Michael Baessler
-
Patent number: 11500876
Abstract: Embodiments of the present invention determine duplicates in a graph. The graph comprises nodes representing entities and edges representing relationships between the entities. The method comprises: identifying at least two nodes in the graph. A neighborhood subgraph may be determined for each of the two nodes. The neighborhood subgraph includes the respective node. The method further comprises determining whether the two nodes are duplicates with respect to each other, based on a result of a comparison between the two subgraphs.
Type: Grant
Filed: December 8, 2020
Date of Patent: November 15, 2022
Assignee: International Business Machines Corporation
Inventors: Thuany Karoline Stuart, Basem Elasioty, Claudio Andrea Fanconi, Mike W. Grasselt, Hemanth Kumar Babu, Yannick Saillet, Robert Kern, Martin Oberhofer, Lars Bremer, Jonathan Roesner, Jason Allen Woods
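The abstract above does not say how the two neighborhood subgraphs are compared. As one possible illustration, the Python sketch below uses one-hop neighborhoods and Jaccard similarity over the shared neighbors. The similarity measure, the `threshold` value, and the adjacency-dict representation are assumptions of this sketch rather than details from the patent.

```python
def neighborhood(adjacency, node):
    """Return the one-hop neighborhood of a node, including the node itself."""
    return {node} | set(adjacency.get(node, ()))

def likely_duplicates(adjacency, a, b, threshold=0.5):
    """Guess whether nodes a and b are duplicates by comparing neighborhoods.

    adjacency: dict mapping node -> iterable of neighboring nodes.
    threshold: hypothetical minimum Jaccard similarity to report a duplicate.
    """
    # Leave the two nodes themselves out of the comparison.
    na = neighborhood(adjacency, a) - {a, b}
    nb = neighborhood(adjacency, b) - {a, b}
    union = na | nb
    if not union:
        return False
    return len(na & nb) / len(union) >= threshold
```

Two person nodes that both link to the same employer and the same city would score 2/3 if one of them has a single extra neighbor, which passes the default threshold in this sketch.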
-
Patent number: 11487770
Abstract: A computer implemented method is used for sorting data elements of a given set. The method includes performing an evaluation of a first type of usage of each data element. The method includes determining a set of data element candidates dependent on the evaluation of the first type of usage. The method includes performing an evaluation of a second type of usage of each data element of the set of data element candidates. The method includes sorting the data elements of the set of data element candidates dependent on the evaluation of the second type of usage of each data element of the set of data element candidates. The method includes providing the sorted data elements of the set of data element candidates, and in response, receiving a request for a data processing based on the provided sorted data elements of the set of data element candidates.
Type: Grant
Filed: May 18, 2020
Date of Patent: November 1, 2022
Assignee: International Business Machines Corporation
Inventors: Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer, Michael Baessler
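The filter-then-sort structure of the abstract above can be sketched in a few lines of Python. The usage counts, the `min_first` cut-off, and the descending sort order are assumptions chosen for illustration. The abstract does not specify what the two usage types are or how they are scored.

```python
def rank_candidates(elements, first_usage, second_usage, min_first=1):
    """Filter elements by a first usage score, then sort by a second one.

    first_usage / second_usage: dicts mapping element -> usage score.
    min_first: hypothetical minimum first-usage score to become a candidate.
    """
    # Stage 1: candidates are elements with enough usage of the first type.
    candidates = [e for e in elements if first_usage.get(e, 0) >= min_first]
    # Stage 2: order the candidates by usage of the second type, highest first.
    return sorted(candidates, key=lambda e: second_usage.get(e, 0), reverse=True)
```

An element with no usage of the first type never reaches the sorted list, however high its second score.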
-
Publication number: 20220318028
Abstract: A database of deployed configurations, as well as attempted configurations that failed is maintained and used as reference to compare against configurations of attempted software deployments. Upon detecting a failed deployment, disclosed embodiments search the database for working configurations that most closely resemble the failed configuration, and rank the configurations based on various criteria. Disclosed embodiments may then automatically select a highest ranked working configuration, and perform an automatic upgrade of the necessary components to create a working configuration.
Type: Application
Filed: April 6, 2021
Publication date: October 6, 2022
Inventors: Krishna Kishore Bonagiri, Namit Kabra, Yannick Saillet, Mike W. Grasselt
-
Publication number: 20220277017
Abstract: Techniques are described relating to automatic data standardization in a managed services domain of a cloud computing environment. An associated computer-implemented method includes receiving a dataset during a data onboarding procedure and classifying datapoints within the dataset. The method further includes applying a machine learning data standardization model to each classified datapoint within the dataset and deriving a proposed set of data standardization rules for the dataset based upon any standardization modification determined consequent to model application. Optionally, the method includes presenting the proposed set of data standardization rules for client review and, responsive to acceptance of the proposed set of data standardization rules, applying the proposed set of data standardization rules to the dataset. The method further includes, responsive to acceptance of the proposed set of data standardization rules, updating the machine learning data standardization model accordingly.
Type: Application
Filed: February 24, 2021
Publication date: September 1, 2022
Inventors: Namit Kabra, Krishna Kishore Bonagiri, Mike W. Grasselt, Yannick Saillet
-
Publication number: 20220138512
Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a first graph comprising first nodes representing first entities and first edges representing relationships between first entities, the first nodes being associated with first entity attributes descriptive of the first entities represented by the first nodes, the first edges being associated with first edge attributes descriptive of the relationships represented by the first edges; determining a first subgraph for a certain node of the first nodes of the first graph, the first subgraph including the certain node and at least one neighboring node of the certain node; and determining a data quality issue regarding the certain node based, at least in part, on applying one or more applicable rules of a set of data quality rules to first entity attribute values and first edge attribute values of the first subgraph.
Type: Application
Filed: October 29, 2020
Publication date: May 5, 2022
Inventors: Yannick Saillet, Claudio Andrea Fanconi, Martin Oberhofer, Hemanth Kumar Babu, Basem Elasioty, Mike W. Grasselt, Robert Kern, Thuany Karoline Stuart
-
Publication number: 20220123935
Abstract: The exemplary embodiments disclose a method, a computer program product, and a computer system for protecting sensitive information. The exemplary embodiments may include using an inverted text index for evaluating one or more statistical measures of an index token of the inverted text index, using the one or more statistical measures for selecting a set of candidate tokens, extracting metadata from the inverted text index, associating the set of candidate tokens with respective token metadata, tokenizing at least one document resulting in one or more document tokens, comparing the one or more document tokens with the set of candidate tokens, selecting a set of document tokens to be masked, selecting at least part of the set of document tokens that comprises sensitive information according to the associated token metadata, masking the at least part of the set of document tokens, and providing one or more masked documents.
Type: Application
Filed: October 19, 2020
Publication date: April 21, 2022
Inventors: Michael Baessler, Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer
-
Publication number: 20220100899
Abstract: In an approach, a processor receives a request of a document. A processor identifies a set of datasets comprising a sensitive dataset, the set of datasets being interrelated in accordance with a relational model. A processor extracts attribute values of the document. A processor determines that a set of one or more attribute values of the extracted attribute values is in the set of datasets, the set of attribute values being values of a set of attributes. A processor determines that one or more entities of the sensitive dataset can be identified based on relations of the relational model between the set of attributes, where at least part of attribute values of the one or more entities comprises sensitive information. A processor, responsive to determining that the one or more entities can be identified, masks at least part of the set of one or more attribute values in the document.
Type: Application
Filed: September 25, 2020
Publication date: March 31, 2022
Inventors: Yannick Saillet, Albert Maier, Mike W. Grasselt, Michael Baessler, Lars Bremer
-
Publication number: 20210357699
Abstract: The invention relates to an approach for data quality assessment for data analytics, the approach comprising providing a data set, the data set comprising multiple data fields, predicting by a first trained machine learning model at least one usage type of the data set using characteristics of the data fields as input, for each usage type of the at least one usage type, determining a usage specific data quality score of each of the predicted usage types, and using the data set based on the at least one usage type and associated data quality score.
Type: Application
Filed: May 14, 2020
Publication date: November 18, 2021
Inventors: Yannick Saillet, Mike W. Grasselt, Namit Kabra, Krishna Kishore Bonagiri
-
Publication number: 20210357183
Abstract: A computer implemented method is used for sorting data elements of a given set. The method includes performing an evaluation of a first type of usage of each data element. The method includes determining a set of data element candidates dependent on the evaluation of the first type of usage. The method includes performing an evaluation of a second type of usage of each data element of the set of data element candidates. The method includes sorting the data elements of the set of data element candidates dependent on the evaluation of the second type of usage of each data element of the set of data element candidates. The method includes providing the sorted data elements of the set of data element candidates, and in response, receiving a request for a data processing based on the provided sorted data elements of the set of data element candidates.
Type: Application
Filed: May 18, 2020
Publication date: November 18, 2021
Inventors: Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer, Michael Baessler
-
Publication number: 20210342397
Abstract: The present disclosure relates to a method for weighting a graph comprising nodes representing entities and edges representing relationships between entities in accordance with one or more domains. The method comprises: pre-processing the graph comprising assigning weights to the nodes and/or the edges of the graph in accordance with a specific domain of the domains, wherein the weight indicates a domain specific data quality problem of attribute values representing an edge of the edges and/or an entity involved in that edge. The weighted graph may be provided for enabling a processing of the graph in accordance with the specific domain.
Type: Application
Filed: April 20, 2021
Publication date: November 4, 2021
Inventors: Martin Oberhofer, Mike W. Grasselt, Claudio Andrea Fanconi, Thuany Karoline Stuart, Yannick Saillet, Basem Elasioty, Hemanth Kumar Babu, Robert Kern
-
Publication number: 20210342352
Abstract: Embodiments of the present invention determine duplicates in a graph. The graph comprises nodes representing entities and edges representing relationships between the entities. The method comprises: identifying at least two nodes in the graph. A neighborhood subgraph may be determined for each of the two nodes. The neighborhood subgraph includes the respective node. The method further comprises determining whether the two nodes are duplicates with respect to each other, based on a result of a comparison between the two subgraphs.
Type: Application
Filed: December 8, 2020
Publication date: November 4, 2021
Inventors: Thuany Karoline Stuart, Basem Elasioty, Claudio Andrea Fanconi, Mike W. Grasselt, Hemanth Kumar Babu, Yannick Saillet, Robert Kern, Martin Oberhofer, Lars Bremer, Jonathan Roesner, Jason Allen Woods
-
Patent number: 11023497
Abstract: Data classification includes tracking classification of columns of data into data classes of a collection of classes available for classifying the columns, obtaining a target column of data, of a target dataset, to be classified into a data class of the collection of candidate classes, and classifying the target column of data into a data class of the collection of classes based on historical data classification characteristics provided by the tracking. The classifying includes selecting a group of candidate data classes of the collection of classes to compare to value(s) of the target column, the selecting excludes at least some candidate data classes of the collection from comparison to the value(s), and establishing a priority between the candidate data classes of the group of candidate classes in comparing the value(s) of the target column of data to the selected group of candidate classes.
Type: Grant
Filed: September 12, 2019
Date of Patent: June 1, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Namit Kabra, Krishna Kishore Bonagiri, Yannick Saillet, Mike W. Grasselt