Patents by Inventor Yannick Saillet

Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210342352
    Abstract: Embodiments of the present invention determine duplicates in a graph. The graph comprises nodes representing entities and edges representing relationships between the entities. The method comprises: identifying at least two nodes in the graph. A neighborhood subgraph may be determined for each of the two nodes. The neighborhood subgraph includes the respective node. The method further comprises determining whether the two nodes are duplicates of each other, based on a result of a comparison between the two subgraphs.
    Type: Application
    Filed: December 8, 2020
    Publication date: November 4, 2021
    Inventors: Thuany Karoline Stuart, Basem Elasioty, Claudio Andrea Fanconi, Mike W. Grasselt, Hemanth Kumar Babu, Yannick Saillet, Robert Kern, Martin Oberhofer, Lars Bremer, Jonathan Roesner, Jason Allen Woods
  • Publication number: 20210342397
    Abstract: The present disclosure relates to a method for weighting a graph comprising nodes representing entities and edges representing relationships between entities in accordance with one or more domains. The method comprises: pre-processing the graph, comprising assigning weights to the nodes and/or the edges of the graph in accordance with a specific domain of the domains, wherein the weight indicates a domain-specific data quality problem of attribute values representing an edge of the edges and/or an entity involved in that edge. The weighted graph may be provided to enable processing of the graph in accordance with the specific domain.
    Type: Application
    Filed: April 20, 2021
    Publication date: November 4, 2021
    Inventors: Martin Oberhofer, Mike W. Grasselt, Claudio Andrea Fanconi, Thuany Karoline Stuart, Yannick Saillet, Basem Elasioty, Hemanth Kumar Babu, Robert Kern
  • Patent number: 11106820
    Abstract: The present disclosure relates to a method for data anonymization of a database system. The method comprises: determining if a first dataset and a second dataset of the database system have a relationship indicative of an entity having values in the two datasets. A request may be received from a user for at least one of the first and second datasets. If the first and second datasets have the relationship, at least one of them may be modified such that the indication of the entity is not accessible to the user. The requested dataset may then be provided.
    Type: Grant
    Filed: March 19, 2018
    Date of Patent: August 31, 2021
    Assignee: International Business Machines Corporation
    Inventors: Martin Oberhofer, Albert Maier, Yannick Saillet
  • Publication number: 20210240677
    Abstract: Methods and systems for data quality evaluation are disclosed. A method includes: receiving, by a computing device, at least one data set and a list of rule expressions to bind; building, by the computing device, candidate binding combinations between columns of the at least one data set and variables of each rule expression in the list of rule expressions; building, by the computing device, a new bound rule expression candidate based on the candidate binding combinations; generating, by the computing device, a new bound rule expression based on the new bound rule expression candidate and a data transformation applied to at least one of the columns of the at least one data set; and storing, by the computing device, the new bound rule expression.
    Type: Application
    Filed: January 30, 2020
    Publication date: August 5, 2021
    Inventors: Kunjavihari Madhav Kashalikar, Yannick Saillet, Ketki Ramesh Purandare
  • Patent number: 11036701
    Abstract: A computer-implemented method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer has enough available space for storing the current record. If it does, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.
    Type: Grant
    Filed: January 6, 2020
    Date of Patent: June 15, 2021
    Assignee: International Business Machines Corporation
    Inventors: Albert Maier, Yannick Saillet, Damir Spisic
  • Patent number: 11023452
    Abstract: A processor receives statistical information about a data set included in a column of a data table. The processor receives additional information about the data set that indicates a data format utilized by the data set and a type of information represented by the data set. The processor generates a data dictionary for compression of the data set based, at least in part, on the statistical information and the additional information. The data dictionary is created such that the data dictionary is capable of compressing data that is statistically predicted to be received at a future point.
    Type: Grant
    Filed: June 8, 2015
    Date of Patent: June 1, 2021
    Assignee: International Business Machines Corporation
    Inventors: Martin A. Oberhofer, Yannick Saillet, Jens Seifert
  • Patent number: 11023483
    Abstract: Embodiments of the present invention disclose generating data profiling jobs for source data in a data processing system, the source data being described by at least one source functional data model. A target functional data model is provided for describing target data that can be generated from the source data. One or more source functional data models are identified that correspond to the target functional data model. At least one functional source-to-target model mapping is associated with at least one source-target pair based on the target functional data model and identified source functional data models. A physical source-to-target model mapping for at least one source-target pair based on the logical source-to-target model mapping is calculated. For all physical source attributes, the needed data profiling jobs are generated based on the target attribute for analyzing the physical source attributes.
    Type: Grant
    Filed: August 4, 2016
    Date of Patent: June 1, 2021
    Assignee: International Business Machines Corporation
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens P. Seifert
  • Patent number: 11023497
    Abstract: Data classification includes tracking classification of columns of data into data classes of a collection of classes available for classifying the columns, obtaining a target column of data, of a target dataset, to be classified into a data class of the collection of candidate classes, and classifying the target column of data into a data class of the collection of classes based on historical data classification characteristics provided by the tracking. The classifying includes selecting a group of candidate data classes of the collection of classes to compare to value(s) of the target column, the selecting excludes at least some candidate data classes of the collection from comparison to the value(s), and establishing a priority between the candidate data classes of the group of candidate classes in comparing the value(s) of the target column of data to the selected group of candidate classes.
    Type: Grant
    Filed: September 12, 2019
    Date of Patent: June 1, 2021
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Krishna Kishore Bonagiri, Yannick Saillet, Mike W. Grasselt
  • Patent number: 11023484
    Abstract: Embodiments of the present invention disclose generating data profiling jobs for source data in a data processing system, the source data being described by at least one source functional data model. A target functional data model is provided for describing target data that can be generated from the source data. One or more source functional data models are identified that correspond to the target functional data model. At least one functional source-to-target model mapping is associated with at least one source-target pair based on the target functional data model and identified source functional data models. A physical source-to-target model mapping for at least one source-target pair based on the logical source-to-target model mapping is calculated. For all physical source attributes, the needed data profiling jobs are generated based on the target attribute for analyzing the physical source attributes.
    Type: Grant
    Filed: December 6, 2017
    Date of Patent: June 1, 2021
    Assignee: International Business Machines Corporation
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens P. Seifert
  • Patent number: 11017874
    Abstract: A method and system for improving data and memory reorganization and storage technology is provided. The method includes configuring data capture and analysis settings of a database system resulting in configured data capture settings. A data and associated memory analysis request is received and specified test code is selected. A specified portion of data and associated memory is selected and the specified analysis code is executed resulting in execution of said specified type of analysis with respect to the specified portion of said data and associated memory. The specified portion of said data and associated memory is modified and stored.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: May 25, 2021
    Assignee: International Business Machines Corporation
    Inventors: Yannick Saillet, Namit Kabra, Likhitha Maddirala, Ritesh Kumar Gupta
  • Publication number: 20210124611
    Abstract: In an approach to dynamically identifying and modifying the parallelism of a particular task in a pipeline, the optimal execution time of each stage in a dynamic pipeline is calculated. The actual execution time of each stage in the dynamic pipeline is measured. Whether the actual time of completion of the data processing job will exceed a threshold is determined. If it is determined that the actual time of completion of the data processing job will exceed the threshold, then additional instances of the stages are created.
    Type: Application
    Filed: October 25, 2019
    Publication date: April 29, 2021
    Inventors: Yannick Saillet, Namit Kabra, Ritesh Kumar Gupta
  • Publication number: 20210081435
    Abstract: Data classification includes tracking classification of columns of data into data classes of a collection of classes available for classifying the columns, obtaining a target column of data, of a target dataset, to be classified into a data class of the collection of candidate classes, and classifying the target column of data into a data class of the collection of classes based on historical data classification characteristics provided by the tracking. The classifying includes selecting a group of candidate data classes of the collection of classes to compare to value(s) of the target column, the selecting excludes at least some candidate data classes of the collection from comparison to the value(s), and establishing a priority between the candidate data classes of the group of candidate classes in comparing the value(s) of the target column of data to the selected group of candidate classes.
    Type: Application
    Filed: September 12, 2019
    Publication date: March 18, 2021
    Inventors: Namit Kabra, Krishna Kishore Bonagiri, Yannick Saillet, Mike W. Grasselt
  • Publication number: 20210026872
    Abstract: A method provides for classifying data fields of a dataset. A classifier configured for determining confidence values for a plurality of data classes for the data fields may be applied. Using the confidence values, data class candidates may be identified. Data fields may be determined for which a plurality of data class candidates is identifiable. Using previous user-selected data class assignments, a probability may be determined for the data class candidates that the respective data class candidate is a data class to which the respective data field is to be assigned. The data fields may be classified using the probabilities to select for the data fields a data class from the data class candidates. The dataset may be provided with metadata identifying for the data fields the data classes to which the respective data fields are assigned.
    Type: Application
    Filed: May 18, 2020
    Publication date: January 28, 2021
    Inventors: Yannick Saillet, Namit Kabra, Mike W. Grasselt, Krishna Kishore Bonagiri
  • Publication number: 20200401565
    Abstract: In an approach for automatically ranking and routing data quality remediation tasks, a processor analyzes a data set ingested by a repository to produce a set of data quality problems. A processor computes a score for each data quality problem of the set of data quality problems. A processor identifies a route to send each data quality problem of the set of data quality problems. A processor exports each data quality problem according to the score and the route.
    Type: Application
    Filed: June 20, 2019
    Publication date: December 24, 2020
    Inventors: Yannick Saillet, Namit Kabra, Manish Anand Bhide
  • Publication number: 20200379803
    Abstract: According to a computer-implemented method, an available amount of each of multiple computing resources is determined by machine logic over a period of time at a computing device. The machine logic also determines an expected usage of each computing resource to execute each workflow in a queue. The machine logic also determines a time of execution of each workflow in the queue based on the available amount of each of the multiple computing resources over time and the expected usage of each computing resource to execute each workflow in the queue.
    Type: Application
    Filed: May 29, 2019
    Publication date: December 3, 2020
    Inventors: Yannick Saillet, Namit Kabra
  • Publication number: 20200350032
    Abstract: A method and system for improving data and memory reorganization and storage technology is provided. The method includes configuring data capture and analysis settings of a database system resulting in configured data capture settings. A data and associated memory analysis request is received and specified test code is selected. A specified portion of data and associated memory is selected and the specified analysis code is executed resulting in execution of said specified type of analysis with respect to the specified portion of said data and associated memory. The specified portion of said data and associated memory is modified and stored.
    Type: Application
    Filed: May 3, 2019
    Publication date: November 5, 2020
    Inventors: Yannick Saillet, Namit Kabra, Likhitha Maddirala, Ritesh Kumar Gupta
  • Patent number: 10789225
    Abstract: A method to identify potentially duplicative records in a data set is provided. A computer may collect a data profile for the data set that provides descriptive information with regard to attributes of the data set. Based, at least in part, on the data profile, weights are determined for the attributes. As values of a data record are compared to values of the same respective attributes in other records, the overall likelihood of a match or duplicate, as indicated by the degree of similarity between values, is modified based on the determined weights associated with the respective attributes.
    Type: Grant
    Filed: November 8, 2017
    Date of Patent: September 29, 2020
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20200272598
    Abstract: A computer system, computer program product, and a computer-implemented method for supplementing a data governance framework with one or more new data governance technical rules is disclosed. The method comprises providing a plurality of expressions and a first mapping. The expressions assign natural language patterns to technical language patterns. The first mapping maps first terms to data sources. A rule generator receives a new natural language (NL) rule comprising one or more natural-language patterns and one or more first terms. The rule generator resolves the new NL rule into one or more new technical rules interpretable by a respective rule engine and stores the one or more technical rules in a rule repository.
    Type: Application
    Filed: May 11, 2020
    Publication date: August 27, 2020
    Inventors: Mike W. Grasselt, Yannick Saillet, Marvin Schaefer
  • Patent number: 10747716
    Abstract: The invention relates to a computer-implemented method for supplementing a data governance framework with one or more new data governance technical rules. The method comprises providing a plurality of expressions and a first mapping. The expressions assign natural language patterns to technical language patterns. The first mapping maps first terms to data sources. A rule generator receives a new natural language (NL) rule comprising one or more natural-language patterns and one or more first terms. The rule generator resolves the new NL rule into one or more new technical rules interpretable by a respective rule engine and stores the one or more technical rules in a rule repository.
    Type: Grant
    Filed: December 11, 2017
    Date of Patent: August 18, 2020
    Assignee: International Business Machines Corporation
    Inventors: Mike W. Grasselt, Yannick Saillet, Marvin Schaefer
  • Patent number: 10740488
    Abstract: A computer-implemented method for data anonymization comprises: receiving a request for data that needs anonymization. The request comprises at least one field descriptor of data to be retrieved and a usage scenario of a user for the requested data. Then, based on the usage scenario, an anonymization algorithm to be applied to the data that is referred to by the field descriptor is determined. Subsequently, the determined anonymization algorithm is applied to the data that is referred to by the field descriptor. A test is performed as to whether the degree of anonymization fulfills a requirement that is related to the usage scenario. If the requirement is fulfilled, access to the anonymized data is provided.
    Type: Grant
    Filed: November 17, 2017
    Date of Patent: August 11, 2020
    Assignee: International Business Machines Corporation
    Inventors: Albert Maier, Martin Oberhofer, Yannick Saillet
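The neighborhood-subgraph comparison described in publication 20210342352 can be illustrated with a minimal sketch. It assumes an undirected graph stored as an adjacency dict, a fixed hop radius for the neighborhood, and Jaccard similarity over node labels as the comparison measure. The radius, the threshold and the function names are hypothetical choices for illustration, not the claimed method.

```python
from collections import deque

def neighborhood(adj, start, radius=1):
    """Nodes within `radius` hops of `start`; the start node itself is included."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def are_duplicates(adj, labels, n1, n2, radius=1, threshold=0.5):
    """Compare the label sets of the two neighborhood subgraphs (assumed measure)."""
    sub1 = {labels[n] for n in neighborhood(adj, n1, radius)}
    sub2 = {labels[n] for n in neighborhood(adj, n2, radius)}
    return jaccard(sub1, sub2) >= threshold

# Hypothetical example graph: persons linked to an address and an employer.
adj = {
    "p1": ["a1", "c1"], "p2": ["a1", "c1"], "p3": ["a2"],
    "a1": ["p1", "p2"], "c1": ["p1", "p2"], "a2": ["p3"],
}
labels = {
    "p1": "J. Smith", "p2": "John Smith", "p3": "Jane Doe",
    "a1": "10 Main St", "c1": "ACME Corp", "a2": "5 Oak Ave",
}

print(are_duplicates(adj, labels, "p1", "p2"))  # True  (Jaccard 2/4 = 0.5)
print(are_duplicates(adj, labels, "p1", "p3"))  # False (no shared labels)
```

The two person nodes have different names, yet their neighborhoods share the same address and employer, which is the kind of structural evidence that a node-by-node attribute comparison alone would miss.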
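Patent 10789225 derives attribute weights from a data profile and uses them to scale per-attribute similarity when looking for duplicate records. The sketch below is one possible reading under stated assumptions: the weight of an attribute is its distinct-value ratio (a hypothetical profiling rule), and string similarity comes from Python's difflib.SequenceMatcher. Neither choice is taken from the patent.

```python
from difflib import SequenceMatcher

def profile_weights(records, attributes):
    """Assumed profiling rule: weight = distinct values / row count, so
    near-unique attributes count more than low-cardinality ones."""
    n = len(records)
    return {a: len({r[a] for r in records}) / n for a in attributes}

def match_score(r1, r2, weights):
    """Weighted average of per-attribute string similarity, in [0, 1]."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for attr, w in weights.items():
        sim = SequenceMatcher(None, str(r1[attr]), str(r2[attr])).ratio()
        score += w * sim
    return score / total

records = [
    {"name": "Anna Meyer", "city": "Berlin", "gender": "F"},
    {"name": "Anna Meier", "city": "Berlin", "gender": "F"},
    {"name": "Paul Schmidt", "city": "Munich", "gender": "M"},
    {"name": "Lena Braun", "city": "Berlin", "gender": "F"},
]
weights = profile_weights(records, ["name", "city", "gender"])
print(weights)  # {'name': 1.0, 'city': 0.5, 'gender': 0.5}
print(round(match_score(records[0], records[1], weights), 2))  # 0.95
```

With these weights, agreeing on a low-cardinality attribute such as gender contributes less evidence of a duplicate than a near match on the name.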
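The single-pass sampling scheme in patent 11036701 (a random sample plus a bounded buffer of scored records, merged at the end) can be sketched as follows. Several details are assumptions: sampling is Bernoulli with a fixed rate, the buffer keeps the highest-scoring records and frees the lowest-scoring slot when full, and the completeness score is a hypothetical storage score.

```python
import heapq
import random

def subsample(records, score, sample_rate, buffer_size, seed=None):
    """One pass over `records`: keep a Bernoulli random sample plus a bounded
    buffer of the highest-scoring non-sampled records, then merge them."""
    rng = random.Random(seed)
    sampled = []
    buffer = []  # min-heap of (score, position, record); lowest score at buffer[0]
    for pos, rec in enumerate(records):
        if rng.random() < sample_rate:
            sampled.append(rec)
            continue
        entry = (score(rec), pos, rec)  # unique position avoids comparing records
        if len(buffer) < buffer_size:
            heapq.heappush(buffer, entry)  # buffer still has room
        elif buffer and entry[0] > buffer[0][0]:
            heapq.heapreplace(buffer, entry)  # free up the lowest-scoring slot
    # Restore the original scan order of the buffered records before merging.
    buffered = [rec for _, _, rec in sorted(buffer, key=lambda e: e[1])]
    return sampled + buffered

def completeness(rec):
    """Hypothetical storage score: number of populated attribute values."""
    return sum(v is not None for v in rec.values())

rows = [{"a": 1, "b": None}, {"a": 2, "b": 3}, {"a": None, "b": None}, {"a": 4, "b": 5}]
print(subsample(rows, completeness, sample_rate=0.0, buffer_size=2))
# [{'a': 2, 'b': 3}, {'a': 4, 'b': 5}]
```

A min-heap keeps the eviction check O(log k) per record for a buffer of size k, so the whole scan stays a single pass over the dataset.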
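Publication 20210026872 resolves ambiguous data class candidates using previous user-selected assignments. One way to sketch that step, assuming a confidence cutoff for candidates and a Laplace-smoothed frequency of past user choices as the probability, is shown below. The cutoff, the smoothing and the input format are all hypothetical.

```python
from collections import Counter

def classify_field(confidences, history, min_confidence=0.5):
    """Choose a data class for one field.

    confidences: data class -> classifier confidence for this field.
    history: data classes that users previously selected for comparable
             fields (a hypothetical input format).
    """
    candidates = [c for c, conf in confidences.items() if conf >= min_confidence]
    if not candidates:
        return None  # nothing confident enough; leave the field unclassified
    if len(candidates) == 1:
        return candidates[0]
    counts = Counter(history)
    # Laplace-smoothed probability that a user would pick each candidate.
    total = sum(counts[c] for c in candidates) + len(candidates)
    probs = {c: (counts[c] + 1) / total for c in candidates}
    # Highest probability wins; classifier confidence breaks ties.
    return max(candidates, key=lambda c: (probs[c], confidences[c]))

field = {"EMAIL": 0.9, "USERNAME": 0.85, "PHONE": 0.1}
print(classify_field(field, ["USERNAME", "USERNAME", "EMAIL"]))  # USERNAME
print(classify_field(field, []))  # EMAIL (equal priors, higher confidence)
```

Smoothing keeps a candidate that users have never chosen from getting a zero probability, so the history shifts the decision without fully overriding the classifier.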