Patents by Inventor Yannick Saillet

Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11941056
    Abstract: The present disclosure relates to a method for weighting a graph comprising nodes representing entities and edges representing relationships between entities in accordance with one or more domains. The method comprises: pre-processing the graph comprising assigning weights to the nodes and/or the edges of the graph in accordance with a specific domain of the domains, wherein the weight indicates a domain specific data quality problem of attribute values representing an edge of the edges and/or an entity involved in that edge. The weighted graph may be provided for enabling a processing of the graph in accordance with the specific domain.
    Type: Grant
    Filed: April 20, 2021
    Date of Patent: March 26, 2024
    Assignee: International Business Machines Corporation
    Inventors: Martin Oberhofer, Mike W. Grasselt, Claudio Andrea Fanconi, Thuany Karoline Stuart, Yannick Saillet, Basem Elasioty, Hemanth Kumar Babu, Robert Kern
  • Publication number: 20240095547
    Abstract: An embodiment for monitoring machine learning models to detect and rectify model drift using governance. The embodiment may receive a plurality of machine learning models and register the plurality of machine learning models to a governance dashboard. The embodiment may automatically monitor the received plurality of machine learning models to identify factors used by each of the received plurality of machine learning models and generate corresponding clusters of similar machine learning models. The embodiment may automatically detect an incorrect decision made by a target machine learning model and then automatically calculate a correlation score between the target machine learning model and machine learning models within an associated corresponding cluster of similar machine learning models. The embodiment may, in response to detecting a correlation score above a threshold, automatically determine and output a cluster reinforcement recommendation.
    Type: Application
    Filed: September 21, 2022
    Publication date: March 21, 2024
    Inventors: Neerju Gupta, Namit Kabra, Yannick Saillet
  • Publication number: 20240078241
    Abstract: An embodiment for managing data using machine learning models and information governance. The embodiment may automatically detect a data analysis request made within a system and identify subject datasets. The embodiment may automatically conduct shallow term assignments on each row and column of data in the subject datasets and automatically match the shallow term assignments for each row and column with a stored set of ranked terms, and automatically flag rows or columns matching with ranked terms above a predetermined threshold ranking for further analysis. The embodiment may automatically and continuously monitor and detect irrelevant metadata types to prevent subsequent analysis and storage of data including the irrelevant metadata types. The embodiment may automatically generate a criticality ranking for stored analysis datasets.
    Type: Application
    Filed: September 7, 2022
    Publication date: March 7, 2024
    Inventors: Neerju Gupta, Namit Kabra, Yannick Saillet
  • Patent number: 11921676
    Abstract: Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition.
    Type: Grant
    Filed: November 29, 2021
    Date of Patent: March 5, 2024
    Assignee: International Business Machines Corporation
    Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Yannick Saillet
  • Patent number: 11748382
    Abstract: A method provides for classifying data fields of a dataset. A classifier configured for determining confidence values for a plurality of data classes for the data fields may be applied. Using the confidence values, data class candidates may be identified. Data fields may be determined for which a plurality of data class candidates is identifiable. Using previous user-selected data class assignments, a probability may be determined for the data class candidates that the respective data class candidate is a data class to which the respective data field is to be assigned. The data fields may be classified using the probabilities to select for the data fields a data class from the data class candidates. The dataset may be provided with metadata identifying for the data fields the data classes to which the respective data fields are assigned.
    Type: Grant
    Filed: May 18, 2020
    Date of Patent: September 5, 2023
    Assignee: International Business Machines Corporation
    Inventors: Yannick Saillet, Namit Kabra, Mike W. Grasselt, Krishna Kishore Bonagiri
  • Patent number: 11734233
    Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining using at least the calculated first similarity score whether the source dataset is organized as the first reference table in accordance with the reference storage model.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: August 22, 2023
    Assignee: International Business Machines Corporation
    Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
  • Publication number: 20230186023
    Abstract: In an approach, a processor receives an unstructured text document. A processor extracts at least one unrecognized token from the unstructured text document. A processor identifies at least one structured data element in a predefined set of data sources, where the at least one structured data element is related to the at least one extracted unrecognized token from the unstructured text document. A processor relates a label associated with the identified at least one structured data element to the unstructured text document.
    Type: Application
    Filed: December 13, 2021
    Publication date: June 15, 2023
    Inventors: Yannick Saillet, Alexander Lang, Robert Kern, Gudrun Kaufmann
  • Patent number: 11675838
    Abstract: An approach is provided for completing a pipeline graph. Using a deep learning based sequence model, an initial data pipeline having a sequence of nodes is generated. Mismatch(es) between data formats required by input and output in the sequence of nodes is identified. Virtual gap node(s) that correct the mismatch(es) are added to the initial data pipeline. For a given virtual gap node, tentative graph structures are determined using knowledge graphs and a crowd sourced validation system. Reuse forecast scores and performance scores for the tentative graph structures are calculated. Based on the reuse forecast scores and the performance scores, a final graph structure for implementing the given virtual gap node is determined.
    Type: Grant
    Filed: May 11, 2021
    Date of Patent: June 13, 2023
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Ritesh Kumar Gupta, Yannick Saillet, Vijay Ekambaram
  • Publication number: 20230177193
    Abstract: A database system can comprise records, each record including a set of attributes. The database system can further comprise database views, each database view representing a subset of the set of attributes. Data purpose objects indicating a subset of attributes of the set of attributes and a processing purpose can be stored. Each processing purpose can be associated with one or more entities that authorized access to the subset of attributes of the processing purpose. A request for data for a specific processing purpose and a selected view of the database views can be received. A data purpose object that indicates the specific processing purpose can be retrieved. The subset of attributes represented by the selected view can be compared with the subset of the attributes indicated in the retrieved data purpose object. Values of the subset of attributes of the selected view can be provided.
    Type: Application
    Filed: December 8, 2021
    Publication date: June 8, 2023
    Inventors: Lars Bremer, Albert Maier, Mike W. Grasselt, Yannick Saillet, Michael Baessler
  • Publication number: 20230169041
    Abstract: Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition.
    Type: Application
    Filed: November 29, 2021
    Publication date: June 1, 2023
    Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Yannick Saillet
  • Publication number: 20230153566
    Abstract: Classification of cell data includes obtaining a target dataset and an artificial intelligence (AI) model trained to identify relationship(s) between cells of a row and classify whether a focus cell of the row is erroneous based on the identified relationship(s), and applying the AI model to the target dataset to identify erroneous cell(s) thereof. The applying includes selecting a row of cells of the target dataset, inputting the selected row of cells to the AI model with an identification of a focus cell, the focus cell to be classified by the AI model, classifying the focus cell to obtain a classification of the focus cell, the classifying identifying whether the focus cell is erroneous, and outputting an indication of the classification of the focus cell.
    Type: Application
    Filed: November 18, 2021
    Publication date: May 18, 2023
    Inventors: Shaikh Shahriar Quader, Omar Al-Shamali, James Miller, Yannick Saillet, Albert Maier, Remus Lazar
  • Patent number: 11651055
    Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a first graph comprising first nodes representing first entities and first edges representing relationships between first entities, the first nodes being associated with first entity attributes descriptive of the first entities represented by the first nodes, the first edges being associated with first edge attributes descriptive of the relationships represented by the first edges; determining a first subgraph for a certain node of the first nodes of the first graph, the first subgraph including the certain node and at least one neighboring node of the certain node; and determining a data quality issue regarding the certain node based, at least in part, on applying one or more applicable rules of a set of data quality rules to first entity attribute values and first edge attribute values of the first subgraph.
    Type: Grant
    Filed: October 29, 2020
    Date of Patent: May 16, 2023
    Assignee: International Business Machines Corporation
    Inventors: Yannick Saillet, Claudio Andrea Fanconi, Martin Oberhofer, Hemanth Kumar Babu, Basem Elasioty, Mike W. Grasselt, Robert Kern, Thuany Karoline Stuart
  • Patent number: 11550813
    Abstract: Techniques are described relating to automatic data standardization in a managed services domain of a cloud computing environment. An associated computer-implemented method includes receiving a dataset during a data onboarding procedure and classifying datapoints within the dataset. The method further includes applying a machine learning data standardization model to each classified datapoint within the dataset and deriving a proposed set of data standardization rules for the dataset based upon any standardization modification determined consequent to model application. Optionally, the method includes presenting the proposed set of data standardization rules for client review and, responsive to acceptance of the proposed set of data standardization rules, applying the proposed set of data standardization rules to the dataset. The method further includes, responsive to acceptance of the proposed set of data standardization rules, updating the machine learning data standardization model accordingly.
    Type: Grant
    Filed: February 24, 2021
    Date of Patent: January 10, 2023
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Krishna Kishore Bonagiri, Mike W. Grasselt, Yannick Saillet
  • Publication number: 20220414401
    Abstract: A machine-learning model that is using production data and is operating in a production environment within a data-sensitive realm is analyzed, where this model was trained using a training dataset. An accuracy of the model is identified as falling below an accuracy threshold when providing one or more predictions of a subset of the production data. At least one characteristic of the production data that is used to predict the subset of the production data is determined to be underrepresented in the training dataset. The one or more predictions and the at least one characteristic are provided to a location outside of the production environment.
    Type: Application
    Filed: June 23, 2021
    Publication date: December 29, 2022
    Inventors: Yannick Saillet, Chris Immanuel Harlander
  • Patent number: 11537552
    Abstract: A computer system, computer program product, and a computer-implemented method for supplementing a data governance framework with one or more new data governance technical rules is disclosed. The method comprises providing a plurality of expressions and a first mapping. The expressions assign natural language patterns to technical language patterns. The first mapping maps first terms to data sources. A rule generator receives a new natural language (NL) rule comprising one or more natural-language patterns and one or more first terms. The rule generator resolves the new NL rule into one or more new technical rules interpretable by a respective rule engine and stores the one or more technical rules in a rule repository.
    Type: Grant
    Filed: May 11, 2020
    Date of Patent: December 27, 2022
    Assignee: International Business Machines Corporation
    Inventors: Mike W. Grasselt, Yannick Saillet, Marvin Schaefer
  • Publication number: 20220391848
    Abstract: Embodiments of the present invention provide methods, computer program products, and systems. Embodiments of the present invention can condense a hierarchy in a data governance system, wherein the hierarchy comprises a root node and at least one child node comprising related sub-trees by determining, for a parent node in the hierarchy of governance system, governance terms and respective assignment relationships from a plurality of information assets, determining usage of the governance term in at least one of a plurality of governance rules, and marking a governance term of the plurality of governance terms for elimination based on the determined assignment relationships and the determined usage of the governance term in the plurality of governance rules. Embodiments of the present invention can then delete the governance term from the hierarchy if the governance term is marked for elimination.
    Type: Application
    Filed: June 7, 2021
    Publication date: December 8, 2022
    Inventors: Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer, Michael Baessler
  • Patent number: 11500876
    Abstract: Embodiments of the present invention determine duplicates in a graph. The graph comprises nodes representing entities and edges representing relationships between the entities. The method comprises: identifying at least two nodes in the graph. A neighborhood subgraph may be determined for each of the two nodes. The neighborhood subgraph includes the respective node. The method further comprises determining whether the two nodes are duplicates with respect to each other, based on a result of a comparison between the two subgraphs.
    Type: Grant
    Filed: December 8, 2020
    Date of Patent: November 15, 2022
    Assignee: International Business Machines Corporation
    Inventors: Thuany Karoline Stuart, Basem Elasioty, Claudio Andrea Fanconi, Mike W. Grasselt, Hemanth Kumar Babu, Yannick Saillet, Robert Kern, Martin Oberhofer, Lars Bremer, Jonathan Roesner, Jason Allen Woods
  • Patent number: 11487770
    Abstract: A computer implemented method is used for sorting data elements of a given set. The method includes performing an evaluation of a first type of usage of each data element. The method includes determining a set of data element candidates dependent on the evaluation of the first type of usage. The method includes performing an evaluation of a second type of usage of each data element of the set of data element candidates. The method includes sorting the data elements of the set of data element candidates dependent on the evaluation of the second type of usage of each data element of the set of data element candidates. The method includes providing the sorted data elements of the set of data element candidates, and in response, receiving a request for a data processing based on the provided sorted data elements of the set of data element candidates.
    Type: Grant
    Filed: May 18, 2020
    Date of Patent: November 1, 2022
    Assignee: International Business Machines Corporation
    Inventors: Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer, Michael Baessler
  • Patent number: 11481368
    Abstract: In an approach for automatically ranking and routing data quality remediation tasks, a processor analyzes a data set ingested by a repository to produce a set of data quality problems. A processor computes a score for each data quality problem of the set of data quality problems. A processor identifies a route to send each data quality problem of the set of data quality problems. A processor exports each data quality problem according to the score and the route.
    Type: Grant
    Filed: June 20, 2019
    Date of Patent: October 25, 2022
    Assignee: International Business Machines Corporation
    Inventors: Yannick Saillet, Namit Kabra, Manish Anand Bhide
  • Publication number: 20220318028
    Abstract: A database of deployed configurations, as well as attempted configurations that failed, is maintained and used as a reference to compare against configurations of attempted software deployments. Upon detecting a failed deployment, disclosed embodiments search the database for working configurations that most closely resemble the failed configuration, and rank the configurations based on various criteria. Disclosed embodiments may then automatically select a highest ranked working configuration, and perform an automatic upgrade of the necessary components to create a working configuration.
    Type: Application
    Filed: April 6, 2021
    Publication date: October 6, 2022
    Inventors: Krishna Kishore Bonagiri, Namit Kabra, Yannick Saillet, Mike W. Grasselt
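
As a rough illustration of the idea in the last abstract above (publication 20220318028), the sketch below ranks known working configurations by how closely they resemble a failed one and lists the component changes needed to reach the best match. It is not the method claimed in the application: the `Config` representation (component name mapped to a version string), the similarity measure (share of components pinned to the same version), and all function names and sample data are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: component name -> version string.
Config = Dict[str, str]


def similarity(a: Config, b: Config) -> float:
    """Share of components (across both configs) pinned to the same version.

    This is an assumed, deliberately simple criterion; the abstract only
    says configurations are ranked "based on various criteria".
    """
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    same = sum(1 for k in keys if a.get(k) == b.get(k))
    return same / len(keys)


def rank_working_configs(failed: Config,
                         working: List[Config]) -> List[Tuple[float, Config]]:
    """Return (score, config) pairs, most similar working config first."""
    scored = [(similarity(failed, w), w) for w in working]
    # Sort on the score only, so the dicts themselves are never compared.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def required_changes(failed: Config,
                     target: Config) -> Dict[str, Tuple[Optional[str], str]]:
    """Components whose version must change to turn `failed` into `target`."""
    return {
        name: (failed.get(name), version)
        for name, version in target.items()
        if failed.get(name) != version
    }


# Invented sample data for the sketch.
failed_config = {"db": "11.1", "runtime": "3.8", "driver": "2.0"}
working_configs = [
    {"db": "11.5", "runtime": "3.9", "driver": "2.1"},
    {"db": "11.5", "runtime": "3.8", "driver": "2.0"},
]

ranked = rank_working_configs(failed_config, working_configs)
best_score, best_config = ranked[0]
changes = required_changes(failed_config, best_config)
print(best_config, changes)
```

In this sample, the second working configuration shares two of three component versions with the failed one, so it ranks first and only the `db` component would need to change. A real system would likely weight components differently and account for version ordering and compatibility constraints.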