Patents by Inventor Yannick Saillet

Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Condensing hierarchies in a governance system based on usage

Patent number: 12271425

Abstract: Embodiments of the present invention provide methods, computer program products, and systems. Embodiments of the present invention can condense a hierarchy in a data governance system, wherein the hierarchy comprises a root node and at least one child node comprising related sub-trees by determining, for a parent node in the hierarchy of the governance system, governance terms and respective assignment relationships from a plurality of information assets, determining usage of the governance term in at least one of a plurality of governance rules, and marking a governance term of the plurality of governance terms for elimination based on the determined assignment relationships and the determined usage of the governance term in the plurality of governance rules. Embodiments of the present invention can then delete the governance term from the hierarchy if the governance term is marked for elimination.

Type: Grant

Filed: June 7, 2021

Date of Patent: April 8, 2025

Assignee: International Business Machines Corporation

Inventors: Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer, Michael Baessler

Conditional access to data

Patent number: 12265636

Abstract: A database system can comprise records, each record including a set of attributes. The database system can further comprise database views, each database view representing a subset of the set of attributes. Data purpose objects indicating a subset of attributes of the set of attributes and a processing purpose can be stored. Each processing purpose can be associated with one or more entities that authorized access to the subset of attributes of the processing purpose. A request for data for a specific processing purpose and a selected view of the database views can be received. A data purpose object that indicates the specific processing purpose can be retrieved. The subset of attributes represented by the selected view can be compared with the subset of the attributes indicated in the retrieved data purpose object. Values of the subset of attributes of the selected view can be provided.

Type: Grant

Filed: December 8, 2021

Date of Patent: April 1, 2025

Assignee: International Business Machines Corporation

Inventors: Lars Bremer, Albert Maier, Mike W. Grasselt, Yannick Saillet, Michael Baessler
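The comparison step in the abstract above can be illustrated with a minimal sketch. It assumes that values are released only when every attribute of the selected view is covered by the data purpose object; the abstract does not state that rule explicitly. The names `fetch_for_purpose`, `views`, and `purposes` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Set


def fetch_for_purpose(
    records: List[Dict[str, str]],
    views: Dict[str, Set[str]],
    purposes: Dict[str, Set[str]],
    view_name: str,
    purpose: str,
) -> List[Dict[str, str]]:
    """Return the view's attribute values if the purpose covers them.

    Assumption: access is granted only when the view's attributes are a
    subset of the attributes authorized for the processing purpose.
    """
    allowed = purposes[purpose]    # attributes in the data purpose object
    requested = views[view_name]   # attributes represented by the view
    if not requested <= allowed:
        raise PermissionError(
            f"view {view_name!r} exposes attributes not authorized for {purpose!r}"
        )
    return [{attr: rec[attr] for attr in sorted(requested)} for rec in records]


records = [{"name": "Ann", "email": "ann@example.com", "ssn": "000-00-0000"}]
views = {"contact": {"name", "email"}, "full": {"name", "email", "ssn"}}
purposes = {"newsletter": {"name", "email"}}

contact_rows = fetch_for_purpose(records, views, purposes, "contact", "newsletter")
```

Requesting the hypothetical `full` view for the `newsletter` purpose raises `PermissionError` in this sketch, because `ssn` is not among the authorized attributes.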

DETECTING LABELS OF A DATA CATALOG INCORRECTLY ASSIGNED TO DATA SET FIELDS

Publication number: 20250013629

Abstract: Described are techniques for detecting labels incorrectly assigned to data set fields. The data of each data set field, such as those data set fields assigned to the same label, are represented using a set of characteristics. The data set fields are then clustered into clusters based on the characteristics of the data of the data set fields. Those clusters of data set fields with a homogeneity (being assigned the same label) that exceeds a first threshold value and is below a second threshold value are identified. One or more labels assigned to the data set fields of the identified clusters are identified as being suspect for incorrect assignments by having a frequency below a third threshold value (e.g., 3%), which may be user-designated. The label(s) identified as being suspect for incorrect assignment are then presented to a user for review.

Type: Application

Filed: July 8, 2023

Publication date: January 9, 2025

Inventors: Orna Raz, Yannick Saillet, Maya Zohar, Marcel Zalmanovici
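The thresholding logic described in the abstract above can be sketched as follows. The sketch assumes the clustering has already been done and takes each cluster as a list of the labels assigned to its fields. It also assumes that "homogeneity" means the share of fields carrying the cluster's most common label, which is one plausible reading rather than the patent's definition. The function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def suspect_labels(
    clusters: Dict[int, List[str]],
    low: float,
    high: float,
    min_freq: float,
) -> Dict[int, Set[str]]:
    """Flag rarely used labels inside mostly, but not fully, homogeneous clusters."""
    suspects: Dict[int, Set[str]] = {}
    for cluster_id, labels in clusters.items():
        if not labels:
            continue
        counts = Counter(labels)
        total = len(labels)
        # Assumed homogeneity measure: share of the dominant label.
        homogeneity = counts.most_common(1)[0][1] / total
        if low < homogeneity < high:
            # Labels whose relative frequency falls below the third threshold.
            rare = {label for label, n in counts.items() if n / total < min_freq}
            if rare:
                suspects[cluster_id] = rare
    return suspects


clusters = {
    0: ["email"] * 48 + ["phone", "name"],  # 96% homogeneous, two rare labels
    1: ["city"] * 10,                       # fully homogeneous, nothing to review
    2: ["a"] * 5 + ["b"] * 5,               # too mixed to draw a conclusion
}
flagged = suspect_labels(clusters, low=0.9, high=1.0, min_freq=0.03)
```

Only cluster 0 is flagged here: it is homogeneous enough to suggest a dominant label, and `phone` and `name` each account for 2% of its fields, below the 3% example threshold from the abstract.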

DETECTING HIGH-IMPACT DATA QUALITY RULES AND POLICIES FROM DATA CLEANSING STEPS

Publication number: 20240411736

Abstract: A method, computer system, and a computer program product are provided for cleansing steps. These are in accordance with existing data quality and rules in existence for using a plurality of different transformation assets. Information is obtained about a plurality of different transformation assets and their associated data quality and rules are extracted. A plurality of possible cleansing steps to be performed are identified for the plurality of different transformation assets. An analysis is performed for the identified cleansing steps, on impact on the different transformation assets. It is then determined when more than one identified step has a similar semantics across the plurality of different transformation assets and when more than any two of them need to perform a similar step across the same dataset. The relevance of each cleansing step to be performed is then determined and a cleansing step order of performance is provided.

Type: Application

Filed: June 12, 2023

Publication date: December 12, 2024

Inventors: Alexander Lang, Albert Maier, Werner Schuetz, Sergej Schuetz, Martin Anton Oberhofer, Yannick Saillet, Mike W. Grasselt

Masking sensitive information in a document

Patent number: 12088718

Abstract: The exemplary embodiments disclose a method, a computer program product, and a computer system for protecting sensitive information. The exemplary embodiments may include using an inverted text index for evaluating one or more statistical measures of an index token of the inverted text index, using the one or more statistical measures for selecting a set of candidate tokens, extracting metadata from the inverted text index, associating the set of candidate tokens with respective token metadata, tokenizing at least one document resulting in one or more document tokens, comparing the one or more document tokens with the set of candidate tokens, selecting a set of document tokens to be masked, selecting at least part of the set of document tokens that comprises sensitive information according to the associated token metadata, masking the at least part of the set of document tokens, and providing one or more masked documents.

Type: Grant

Filed: October 19, 2020

Date of Patent: September 10, 2024

Assignee: International Business Machines Corporation

Inventors: Michael Baessler, Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer

IMPACT SCORE FOR ONTOLOGY CHANGES

Publication number: 20240256591

Abstract: Described are techniques for a re-analysis of assignments of terms to assets. The techniques include detecting a change in a term ontology comprising a plurality of terms, and determining at least one selected from a group consisting of: a domain feature change vector (DFCV) for a domain of the term ontology affected by the change, and a term feature change vector (TFCV) for the term affected by the change. The techniques further include identifying assets for the re-analysis of the assignments of terms, wherein each of the identified assets is associated with an impact score value based on the DFCV and/or the TFCV, and performing the re-analysis of the assignments of terms for the identified assets ordered by the impact score value.

Type: Application

Filed: February 1, 2023

Publication date: August 1, 2024

Inventors: Oliver Suhre, Thomas Hampp-Bahnmueller, Peter Gerstl, Yannick Saillet, Albert Maier, Michael Baessler

DATA CURRENCY SCORES FOR AN ANALYTICS PLATFORM

Publication number: 20240242161

Abstract: An approach is provided for computing and using a currency score. A currency score of a data element is determined as a weighted average of scores of dimensions of the data element. The dimensions include a combination of change frequency, change size, outdated value, and sources score dimensions. Relative to the data element, the change frequency dimension indicates an update frequency, the change size dimension indicates amounts of data being created, updated, and deleted per time unit, the outdated value dimension indicates a portion of values that are not semantically correct, but were semantically correct in the past, and the sources score dimension indicates a currency of input source(s). Based on the currency score, a currency of data included in the data element is evaluated. Based on the currency of the data, a remedial action is performed to improve the currency of the data.

Type: Application

Filed: January 18, 2023

Publication date: July 18, 2024

Inventors: Albert Maier, Mike W. Grasselt, Martin Anton Oberhofer, Alexander Lang, Sergej Schuetz, Yannick Saillet, Werner Schuetz
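The weighted average described in the abstract above can be written out in a few lines. This is a minimal sketch: the dimension names follow the abstract, but the score range, the example weights, and the function name `currency_score` are assumptions made for illustration.

```python
from typing import Mapping


def currency_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of per-dimension scores for one data element."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical dimension scores in [0, 1] and illustrative weights.
example_scores = {
    "change_frequency": 0.8,
    "change_size": 0.6,
    "outdated_value": 0.4,
    "sources": 1.0,
}
example_weights = {
    "change_frequency": 1.0,
    "change_size": 1.0,
    "outdated_value": 2.0,
    "sources": 1.0,
}
example = currency_score(example_scores, example_weights)
```

With these numbers the weighted sum is 0.8 + 0.6 + 0.8 + 1.0 = 3.2 and the total weight is 5, giving a score of 0.64. How the resulting score maps to a remedial action is left open here, as it is in the abstract.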

Automatic application dependency management

Patent number: 12026522

Abstract: A database of deployed configurations, as well as attempted configurations that failed, is maintained and used as a reference to compare against configurations of attempted software deployments. Upon detecting a failed deployment, disclosed embodiments search the database for working configurations that most closely resemble the failed configuration, and rank the configurations based on various criteria. Disclosed embodiments may then automatically select a highest ranked working configuration, and perform an automatic upgrade of the necessary components to create a working configuration.

Type: Grant

Filed: April 6, 2021

Date of Patent: July 2, 2024

Assignee: International Business Machines Corporation

Inventors: Krishna Kishore Bonagiri, Namit Kabra, Yannick Saillet, Mike W. Grasselt

SMART IDENTIFICATION OF INDICATOR TEXT WITH FULL-TEXT SEARCH OR OPTIMIZED DOCUMENT ANALYSIS

Publication number: 20240202358

Abstract: Several aspects for optimizing unstructured document analysis comprise operating a document system, where the document system comprises a plurality of documents comprising unstructured content and a full-text index; receiving a request to identify documents comprising a type of data elements; selecting a sample out of the plurality of documents; determining data elements of the type in the sample of documents; determining an indicator context expression for the type of data elements out of the determined data elements of the type; determining a query for searching, using a search engine, the full-text index using the indicator context expression; and determining the documents in the document system being compliant to the determined query.

Type: Application

Filed: December 19, 2022

Publication date: June 20, 2024

Inventors: Thomas Hampp-Bahnmueller, Michael Baessler, Yannick Saillet

OPTIMIZING METADATA ENRICHMENT OF DATA ASSETS

Publication number: 20240152494

Abstract: The present disclosure relates to a method of metadata enrichment using an enrichment comprising multiple steps. The method comprises: determining for an input data asset a metadata value descriptive of the input data asset. Characteristics of the metadata value of the input data asset may be determined. At least one informativeness score of the metadata value of the input data asset may be computed using the determined characteristics. An execution of the enrichment step may be skipped in case an input characteristic of the enrichment step is not part of the determined characteristics. In case the input characteristic of the enrichment step is part of the determined characteristics, the enrichment step may be adapted and executed or the enrichment step may be executed without adaptation. Labels resulting from the executed enrichment steps may be combined for providing one or more labels of the data asset.

Type: Application

Filed: January 4, 2023

Publication date: May 9, 2024

Inventors: Thomas Hampp-Bahnmueller, Peter Gerstl, Yannick Saillet, Michael Baessler, Albert Maier, Oliver Suhre

Method for weighting a graph

Patent number: 11941056

Abstract: The present disclosure relates to a method for weighting a graph comprising nodes representing entities and edges representing relationships between entities in accordance with one or more domains. The method comprises: pre-processing the graph comprising assigning weights to the nodes and/or the edges of the graph in accordance with a specific domain of the domains, wherein the weight indicates a domain specific data quality problem of attribute values representing an edge of the edges and/or an entity involved in that edge. The weighted graph may be provided for enabling a processing of the graph in accordance with the specific domain.

Type: Grant

Filed: April 20, 2021

Date of Patent: March 26, 2024

Assignee: International Business Machines Corporation

Inventors: Martin Oberhofer, Mike W. Grasselt, Claudio Andrea Fanconi, Thuany Karoline Stuart, Yannick Saillet, Basem Elasioty, Hemanth Kumar Babu, Robert Kern

DETECTING AND RECTIFYING MODEL DRIFT USING GOVERNANCE

Publication number: 20240095547

Abstract: An embodiment for monitoring machine learning models to detect and rectify model drift using governance. The embodiment may receive a plurality of machine learning models and register the plurality of machine learning models to a governance dashboard. The embodiment may automatically monitor the received plurality of machine learning models to identify factors used by each of the received plurality of machine learning models and generate corresponding clusters of similar machine learning models. The embodiment may automatically detect an incorrect decision made by a target machine learning model and then automatically calculate a correlation score between the target machine learning model and machine learning models within an associated corresponding cluster of similar machine learning models. The embodiment may, in response to detecting a correlation score above a threshold, automatically determine and output a cluster reinforcement recommendation.

Type: Application

Filed: September 21, 2022

Publication date: March 21, 2024

Inventors: Neerju Gupta, Namit Kabra, Yannick Saillet

MANAGING DATA INGESTION AND STORAGE

Publication number: 20240078241

Abstract: An embodiment for managing data using machine learning models and information governance. The embodiment may automatically detect a data analysis request made within a system and identify subject datasets. The embodiment may automatically conduct shallow term assignments on each row and column of data in the subject datasets and automatically match the shallow term assignments for each row and column with a stored set of ranked terms, and automatically flag rows or columns matching with ranked terms above a predetermined threshold ranking for further analysis. The embodiment may automatically and continuously monitor and detect irrelevant metadata types to prevent subsequent analysis and storage of data including the irrelevant metadata types. The embodiment may automatically generate a criticality ranking for stored analysis datasets.

Type: Application

Filed: September 7, 2022

Publication date: March 7, 2024

Inventors: Neerju Gupta, Namit Kabra, Yannick Saillet

Analyzing deduplicated data blocks associated with unstructured documents

Patent number: 11921676

Abstract: Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition.

Type: Grant

Filed: November 29, 2021

Date of Patent: March 5, 2024

Assignee: International Business Machines Corporation

Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Yannick Saillet
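The loop described in the abstract above (sort blocks by frequency, analyze the most frequent unprocessed block, propagate the result to every document containing it, stop on a condition) can be sketched as follows. The sketch makes several assumptions: the block frequency metric is the number of documents containing a block, the stopping condition is a fixed budget of analyzed blocks, and `analyze` is a stand-in for whatever text analytics is applied. None of these names or choices come from the patent.

```python
from collections import Counter
from typing import Callable, Dict, List, Set


def analyze_blocks(
    doc_blocks: Dict[str, List[str]],
    analyze: Callable[[str], str],
    max_blocks: int,
) -> Dict[str, Set[str]]:
    """Analyze the most frequent deduplicated blocks first and share the results."""
    # Assumed frequency metric: number of documents that contain the block.
    freq = Counter(block for blocks in doc_blocks.values() for block in set(blocks))
    # Descending frequency; ties broken by block id to keep the order deterministic.
    ordered = sorted(freq, key=lambda block: (-freq[block], block))

    results: Dict[str, Set[str]] = {doc: set() for doc in doc_blocks}
    for processed, block in enumerate(ordered):
        if processed >= max_blocks:  # assumed stopping condition: analysis budget
            break
        finding = analyze(block)     # each block is analyzed once
        for doc, blocks in doc_blocks.items():
            if block in blocks:
                results[doc].add(finding)
    return results


docs = {"a": ["hdr", "x"], "b": ["hdr", "y"], "c": ["hdr", "x"]}
findings = analyze_blocks(docs, analyze=str.upper, max_blocks=2)
```

In this example `hdr` appears in three documents and `x` in two, so those two blocks are analyzed and the budget is exhausted before `y` is reached. Each block is analyzed once even though it is shared by several documents, which is the saving the approach aims for.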

Data classification

Patent number: 11748382

Abstract: A method provides for classifying data fields of a dataset. A classifier configured for determining confidence values for a plurality of data classes for the data fields may be applied. Using the confidence values, data class candidates may be identified. Data fields may be determined for which a plurality of data class candidates is identifiable. Using previous user-selected data class assignments, a probability may be determined for the data class candidates that the respective data class candidate is a data class to which the respective data field is to be assigned. The data fields may be classified using the probabilities to select for the data fields a data class from the data class candidates. The dataset may be provided with metadata identifying for the data fields the data classes to which the respective data fields are assigned.

Type: Grant

Filed: May 18, 2020

Date of Patent: September 5, 2023

Assignee: International Business Machines Corporation

Inventors: Yannick Saillet, Namit Kabra, Mike W. Grasselt, Krishna Kishore Bonagiri

Method for classifying an unmanaged dataset

Patent number: 11734233

Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining using at least the calculated first similarity score whether the source dataset is organized as the first reference table in accordance to the reference storage model.

Type: Grant

Filed: November 16, 2021

Date of Patent: August 22, 2023

Assignee: International Business Machines Corporation

Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert

AUTOMATICALLY ASSIGN TERM TO TEXT DOCUMENTS

Publication number: 20230186023

Abstract: In an approach, a processor receives an unstructured text document. A processor extracts at least one unrecognized token from the unstructured text document. A processor identifies at least one structured data element in a predefined set of data sources, where the at least one structured data element is related to the at least one extracted unrecognized token from the unstructured text document. A processor relates a label associated with the identified at least one structured data element to the unstructured text document.

Type: Application

Filed: December 13, 2021

Publication date: June 15, 2023

Inventors: Yannick Saillet, Alexander Lang, Robert Kern, Gudrun Kaufmann

Automatically completing a pipeline graph in an internet of things network

Patent number: 11675838

Abstract: An approach is provided for completing a pipeline graph. Using a deep learning based sequence model, an initial data pipeline having a sequence of nodes is generated. Mismatch(es) between data formats required by input and output in the sequence of nodes is identified. Virtual gap node(s) that correct the mismatch(es) are added to the initial data pipeline. For a given virtual gap node, tentative graph structures are determined using knowledge graphs and a crowd sourced validation system. Reuse forecast scores and performance scores for the tentative graph structures are calculated. Based on the reuse forecast scores and the performance scores, a final graph structure for implementing the given virtual gap node is determined.

Type: Grant

Filed: May 11, 2021

Date of Patent: June 13, 2023

Assignee: International Business Machines Corporation

Inventors: Namit Kabra, Ritesh Kumar Gupta, Yannick Saillet, Vijay Ekambaram

CONDITIONAL ACCESS TO DATA

Publication number: 20230177193

Abstract: A database system can comprise records, each record including a set of attributes. The database system can further comprise database views, each database view representing a subset of the set of attributes. Data purpose objects indicating a subset of attributes of the set of attributes and a processing purpose can be stored. Each processing purpose can be associated with one or more entities that authorized access to the subset of attributes of the processing purpose. A request for data for a specific processing purpose and a selected view of the database views can be received. A data purpose object that indicates the specific processing purpose can be retrieved. The subset of attributes represented by the selected view can be compared with the subset of the attributes indicated in the retrieved data purpose object. Values of the subset of attributes of the selected view can be provided.

Type: Application

Filed: December 8, 2021

Publication date: June 8, 2023

Inventors: Lars Bremer, Albert Maier, Mike W. Grasselt, Yannick Saillet, Michael Baessler

ANALYZING DEDUPLICATED DATA BLOCKS ASSOCIATED WITH UNSTRUCTURED DOCUMENTS

Publication number: 20230169041

Abstract: Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition.

Type: Application

Filed: November 29, 2021

Publication date: June 1, 2023

Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Yannick Saillet
