Patents by Inventor Yannick Saillet

Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

METHOD FOR DUPLICATE DETERMINATION IN A GRAPH

Publication number: 20210342352

Abstract: Embodiments of the present invention determine duplicates in a graph. The graph comprises nodes representing entities and edges representing relationships between the entities. The method comprises identifying at least two nodes in the graph. A neighborhood subgraph may be determined for each of the two nodes. The neighborhood subgraph includes the respective node. The method further comprises determining whether the two nodes are duplicates with respect to each other, based on a result of a comparison between the two subgraphs.

Type: Application

Filed: December 8, 2020

Publication date: November 4, 2021

Inventors: Thuany Karoline Stuart, Basem Elasioty, Claudio Andrea Fanconi, Mike W. Grasselt, Hemanth Kumar Babu, Yannick Saillet, Robert Kern, Martin Oberhofer, Lars Bremer, Jonathan Roesner, Jason Allen Woods
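
As a rough illustration of the idea in the abstract above (not the claimed method), the neighborhood comparison can be sketched in Python. The adjacency-dict graph, the hop radius, the Jaccard overlap measure, and the 0.5 threshold are all assumptions made for this sketch; the abstract does not specify how the two subgraphs are compared.

```python
from collections import deque


def neighborhood(graph, node, radius=1):
    """Return the nodes within `radius` hops of `node`, including `node` itself."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == radius:
            continue  # do not expand beyond the chosen radius
        for neighbor in graph.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def likely_duplicates(graph, a, b, radius=1, threshold=0.5):
    """Compare two neighborhoods with Jaccard overlap (an assumed measure)."""
    # The two candidate nodes are excluded from the comparison so that
    # their differing identifiers do not lower the score.
    na = neighborhood(graph, a, radius) - {a, b}
    nb = neighborhood(graph, b, radius) - {a, b}
    union = na | nb
    if not union:
        return False
    return len(na & nb) / len(union) >= threshold


# Hypothetical graph: two person nodes share an employer and a city.
graph = {
    "p1": ["acme", "berlin"],
    "p2": ["acme", "berlin"],
    "p3": ["globex"],
    "acme": ["p1", "p2"],
    "berlin": ["p1", "p2"],
    "globex": ["p3"],
}
print(likely_duplicates(graph, "p1", "p2"))
print(likely_duplicates(graph, "p1", "p3"))
```

Here "p1" and "p2" share their whole neighborhood and are flagged, while "p1" and "p3" share nothing and are not.
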
METHOD FOR WEIGHTING A GRAPH

Publication number: 20210342397

Abstract: The present disclosure relates to a method for weighting a graph comprising nodes representing entities and edges representing relationships between entities in accordance with one or more domains. The method comprises: pre-processing the graph comprising assigning weights to the nodes and/or the edges of the graph in accordance with a specific domain of the domains, wherein the weight indicates a domain specific data quality problem of attribute values representing an edge of the edges and/or an entity involved in that edge. The weighted graph may be provided for enabling a processing of the graph in accordance with the specific domain.

Type: Application

Filed: April 20, 2021

Publication date: November 4, 2021

Inventors: Martin Oberhofer, Mike W. Grasselt, Claudio Andrea Fanconi, Thuany Karoline Stuart, Yannick Saillet, Basem Elasioty, Hemanth Kumar Babu, Robert Kern
Data anonymization

Patent number: 11106820

Abstract: The present disclosure relates to a method for data anonymization of a database system. The method comprises: determining if a first dataset and second dataset of the database system have a relationship indicative of an entity having values in the two datasets. A request may be received from a user for at least one of the first and second datasets. In case the first dataset and second dataset have the relationship, at least one of the first and second datasets may be modified such that the indication of the entity is not accessible to the user. The requested dataset may then be provided.

Type: Grant

Filed: March 19, 2018

Date of Patent: August 31, 2021

Assignee: International Business Machines Corporation

Inventors: Martin Oberhofer, Albert Maier, Yannick Saillet
DATA QUALITY EVALUATION

Publication number: 20210240677

Abstract: Methods and systems for data quality evaluation are disclosed. A method includes: receiving, by a computing device, at least one data set and a list of rule expressions to bind; building, by the computing device, candidate binding combinations between columns of the at least one data set and variables of each rule expression in the list of rule expressions; building, by the computing device, a new bound rule expression candidate based on the candidate binding combinations; generating, by the computing device, a new bound rule expression based on the new bound rule expression candidate and a data transformation applied to at least one of the columns of the at least one data set; and storing, by the computing device, the new bound rule expression.

Type: Application

Filed: January 30, 2020

Publication date: August 5, 2021

Inventors: Kunjavihari Madhav Kashalikar, Yannick Saillet, Ketki Ramesh Purandare
Data sampling in a storage system

Patent number: 11036701

Abstract: A computer-implemented method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.

Type: Grant

Filed: January 6, 2020

Date of Patent: June 15, 2021

Assignee: International Business Machines Corporation

Inventors: Albert Maier, Yannick Saillet, Damir Spisic
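
The scan described in the abstract above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `subsample`, the caller-supplied `score` function, and the policy of freeing buffer space by evicting the lowest-scored record are all hypothetical choices.

```python
import heapq
import random


def subsample(records, sample_rate, buffer_size, score, seed=0):
    """Scan records once; keep a random sample plus a bounded buffer of
    high-scoring records that the random sample missed."""
    rng = random.Random(seed)
    sampled = []  # the "first set of records" (random sample)
    buffer = []   # min-heap of (score, index, record)
    for index, record in enumerate(records):
        if rng.random() < sample_rate:
            sampled.append(record)
            continue
        # The index breaks ties so that records are never compared directly.
        entry = (score(record), index, record)
        if len(buffer) < buffer_size:
            heapq.heappush(buffer, entry)
        else:
            # Assumed eviction policy: free space by dropping the lowest entry,
            # which may be the new record itself.
            heapq.heappushpop(buffer, entry)
    # Merge the random sample with the buffered records.
    return sampled + [record for _, _, record in buffer]


# Hypothetical usage: favor records with a missing value.
records = [{"id": i, "value": None if i % 10 == 0 else i} for i in range(100)]
result = subsample(records, sample_rate=0.1, buffer_size=5,
                   score=lambda r: 1 if r["value"] is None else 0)
print(len(result))
```

A min-heap keeps eviction cheap: once the buffer is full, each further record costs one `heappushpop`.
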
Data dictionary with a reduced need for rebuilding

Patent number: 11023452

Abstract: A processor receives statistical information about a data set included in a column of a data table. The processor receives additional information about the data set that indicates a data format utilized by the data set and a type of information represented by the data set. The processor generates a data dictionary for compression of the data set based, at least in part, on the statistical information and the additional information. The data dictionary is created such that the data dictionary is capable of compressing data that is statistically predicted to be received at a future point.

Type: Grant

Filed: June 8, 2015

Date of Patent: June 1, 2021

Assignee: International Business Machines Corporation

Inventors: Martin A. Oberhofer, Yannick Saillet, Jens Seifert
Model-driven profiling job generator for data sources

Patent number: 11023483

Abstract: Embodiments of the present invention disclose generating data profiling jobs for source data in a data processing system, the source data being described by at least one source functional data model. A target functional data model is provided, for describing target data that can be generated from the source data. One or more source functional data models are identified that correspond to the target functional data model. At least one functional source-to-target model mapping is associated to at least one source-target pair based on the target functional data model and identified source functional data models. A physical source-to-target model mapping for at least one source-target pair based on the logical source-to-target model mapping is calculated. For all physical source attributes, the needed data profiling jobs are generated based on the target attribute for analyzing the physical source attributes.

Type: Grant

Filed: August 4, 2016

Date of Patent: June 1, 2021

Assignee: International Business Machines Corporation

Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens P. Seifert
Data classification

Patent number: 11023497

Abstract: Data classification includes tracking classification of columns of data into data classes of a collection of classes available for classifying the columns, obtaining a target column of data, of a target dataset, to be classified into a data class of the collection of candidate classes, and classifying the target column of data into a data class of the collection of classes based on historical data classification characteristics provided by the tracking. The classifying includes selecting a group of candidate data classes of the collection of classes to compare to value(s) of the target column, the selecting excludes at least some candidate data classes of the collection from comparison to the value(s), and establishing a priority between the candidate data classes of the group of candidate classes in comparing the value(s) of the target column of data to the selected group of candidate classes.

Type: Grant

Filed: September 12, 2019

Date of Patent: June 1, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Namit Kabra, Krishna Kishore Bonagiri, Yannick Saillet, Mike W. Grasselt
Model-driven profiling job generator for data sources

Patent number: 11023484

Abstract: Embodiments of the present invention disclose generating data profiling jobs for source data in a data processing system, the source data being described by at least one source functional data model. A target functional data model is provided, for describing target data that can be generated from the source data. One or more source functional data models are identified that correspond to the target functional data model. At least one functional source-to-target model mapping is associated to at least one source-target pair based on the target functional data model and identified source functional data models. A physical source-to-target model mapping for at least one source-target pair based on the logical source-to-target model mapping is calculated. For all physical source attributes, the needed data profiling jobs are generated based on the target attribute for analyzing the physical source attributes.

Type: Grant

Filed: December 6, 2017

Date of Patent: June 1, 2021

Assignee: International Business Machines Corporation

Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens P. Seifert
Data and memory reorganization

Patent number: 11017874

Abstract: A method and system for improving data and memory reorganization and storage technology is provided. The method includes configuring data capture and analysis settings of a database system resulting in configured data capture settings. A data and associated memory analysis request is received and specified test code is selected. A specified portion of data and associated memory is selected and the specified analysis code is executed resulting in execution of said specified type of analysis with respect to the specified portion of said data and associated memory. The specified portion of said data and associated memory is modified and stored.

Type: Grant

Filed: May 3, 2019

Date of Patent: May 25, 2021

Assignee: International Business Machines Corporation

Inventors: Yannick Saillet, Namit Kabra, Likhitha Maddirala, Ritesh Kumar Gupta
DYNAMICALLY MODIFYING THE PARALLELISM OF A TASK IN A PIPELINE

Publication number: 20210124611

Abstract: In an approach to dynamically identifying and modifying the parallelism of a particular task in a pipeline, the optimal execution time of each stage in a dynamic pipeline is calculated. The actual execution time of each stage in the dynamic pipeline is measured. Whether the actual time of completion of the data processing job will exceed a threshold is determined. If it is determined that the actual time of completion of the data processing job will exceed the threshold, then additional instances of the stages are created.

Type: Application

Filed: October 25, 2019

Publication date: April 29, 2021

Inventors: Yannick Saillet, Namit Kabra, Ritesh Kumar Gupta
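
The decision logic in the abstract above might look roughly like this in Python. It is a sketch only: the stage dictionaries, the `plan_instances` name, and the assumption that a stage's run time shrinks in proportion to its instance count are illustrative and not taken from the application.

```python
import math


def plan_instances(stages, deadline_s):
    """Return a mapping of stage name to instance count.

    Each stage is a dict with `name`, `optimal_s` (calculated time, > 0),
    `actual_s` (measured time), and `instances` (current parallelism).
    """
    plan = {stage["name"]: stage["instances"] for stage in stages}
    projected = sum(stage["actual_s"] for stage in stages)
    if projected <= deadline_s:
        return plan  # the job is on track; leave parallelism unchanged
    for stage in stages:
        if stage["actual_s"] > stage["optimal_s"]:
            # Assumption: run time scales inversely with instance count,
            # so scale instances by how far the stage lags its optimum.
            factor = math.ceil(stage["actual_s"] / stage["optimal_s"])
            plan[stage["name"]] = stage["instances"] * factor
    return plan


# Hypothetical pipeline in which only the transform stage is lagging.
stages = [
    {"name": "extract", "optimal_s": 10, "actual_s": 10, "instances": 1},
    {"name": "transform", "optimal_s": 20, "actual_s": 55, "instances": 2},
    {"name": "load", "optimal_s": 5, "actual_s": 5, "instances": 1},
]
print(plan_instances(stages, deadline_s=40))
```

With a 40-second threshold the projected 70 seconds is too slow, so only the lagging transform stage receives more instances.
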
DATA CLASSIFICATION

Publication number: 20210081435

Abstract: Data classification includes tracking classification of columns of data into data classes of a collection of classes available for classifying the columns, obtaining a target column of data, of a target dataset, to be classified into a data class of the collection of candidate classes, and classifying the target column of data into a data class of the collection of classes based on historical data classification characteristics provided by the tracking. The classifying includes selecting a group of candidate data classes of the collection of classes to compare to value(s) of the target column, the selecting excludes at least some candidate data classes of the collection from comparison to the value(s), and establishing a priority between the candidate data classes of the group of candidate classes in comparing the value(s) of the target column of data to the selected group of candidate classes.

Type: Application

Filed: September 12, 2019

Publication date: March 18, 2021

Inventors: Namit KABRA, Krishna Kishore BONAGIRI, Yannick SAILLET, Mike W. GRASSELT
DATA CLASSIFICATION

Publication number: 20210026872

Abstract: A method provides for classifying data fields of a dataset. A classifier configured for determining confidence values for a plurality of data classes for the data fields may be applied. Using the confidence values, data class candidates may be identified. Data fields may be determined for which a plurality of data class candidates is identifiable. Using previous user-selected data class assignments, a probability may be determined for the data class candidates that the respective data class candidate is a data class to which the respective data field is to be assigned. The data fields may be classified using the probabilities to select for the data fields a data class from the data class candidates. The dataset may be provided with metadata identifying for the data fields the data classes to which the respective data fields are assigned.

Type: Application

Filed: May 18, 2020

Publication date: January 28, 2021

Inventors: Yannick Saillet, Namit Kabra, Mike W. Grasselt, Krishna Kishore Bonagiri
AUTOMATICALLY RANK AND ROUTE DATA QUALITY REMEDIATION TASKS

Publication number: 20200401565

Abstract: In an approach for automatically ranking and routing data quality remediation tasks, a processor analyzes a data set ingested by a repository to produce a set of data quality problems. A processor computes a score for each data quality problem of the set of data quality problems. A processor identifies a route to send each data quality problem of the set of data quality problems. A processor exports each data quality problem according to the score and the route.

Type: Application

Filed: June 20, 2019

Publication date: December 24, 2020

Inventors: Yannick Saillet, Namit Kabra, Manish Anand Bhide
RESOURCE AVAILABILITY-BASED WORKFLOW EXECUTION TIMING DETERMINATION

Publication number: 20200379803

Abstract: According to a computer-implemented method, an available amount of each of multiple computing resources is determined by machine logic over a period of time at a computing device. The machine logic also determines an expected usage of each computing resource to execute each workflow in a queue. The machine logic also determines a time of execution of each workflow in the queue based on the available amount of each of the multiple computing resources over time and the expected usage of each computing resource to execute each workflow in the queue.

Type: Application

Filed: May 29, 2019

Publication date: December 3, 2020

Inventors: Yannick Saillet, Namit Kabra
DATA AND MEMORY REORGANIZATION

Publication number: 20200350032

Abstract: A method and system for improving data and memory reorganization and storage technology is provided. The method includes configuring data capture and analysis settings of a database system resulting in configured data capture settings. A data and associated memory analysis request is received and specified test code is selected. A specified portion of data and associated memory is selected and the specified analysis code is executed resulting in execution of said specified type of analysis with respect to the specified portion of said data and associated memory. The specified portion of said data and associated memory is modified and stored.

Type: Application

Filed: May 3, 2019

Publication date: November 5, 2020

Inventors: Yannick Saillet, Namit Kabra, Likhitha Maddirala, Ritesh Kumar Gupta
Column weight calculation for data deduplication

Patent number: 10789225

Abstract: A method to identify potentially duplicative records in a data set is provided. A computer may collect a data profile for the data set that provides descriptive information with regard to attributes of the data set. Based, at least in part, on the data profile, weights are determined for the attributes. As values of a data record are compared to values of the same respective attributes in other records, the overall likelihood of a match or duplicate, as indicated by the degree of similarity between values, is modified based on the determined weights associated with the respective attributes.

Type: Grant

Filed: November 8, 2017

Date of Patent: September 29, 2020

Assignee: International Business Machines Corporation

Inventors: Namit Kabra, Yannick Saillet
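
To make the weighting idea in the abstract above concrete, here is a small Python sketch. The profile statistics used (completeness and distinctness), the way they are multiplied into a weight, and the exact-match comparison are assumptions for illustration; the abstract states only that weights are derived from a data profile and modify the overall match likelihood.

```python
def profile_weights(records, attributes):
    """Derive a weight per attribute from simple profile statistics."""
    weights = {}
    for attr in attributes:
        values = [r.get(attr) for r in records]
        values = [v for v in values if v not in (None, "")]
        if not values:
            weights[attr] = 0.0
            continue
        completeness = len(values) / len(records)      # share of filled values
        distinctness = len(set(values)) / len(values)  # share of distinct values
        weights[attr] = completeness * distinctness
    return weights


def match_score(a, b, weights):
    """Weighted share of attributes on which two records agree exactly."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    agreed = sum(
        w for attr, w in weights.items()
        if a.get(attr) is not None and a.get(attr) == b.get(attr)
    )
    return agreed / total


# Hypothetical data: "country" is nearly constant, so it gets a low weight.
records = [
    {"name": "Ann Lee", "country": "US", "email": "ann@x.org"},
    {"name": "Ann Lee", "country": "US", "email": "ann@x.org"},
    {"name": "Bob Roy", "country": "US", "email": "bob@x.org"},
    {"name": "Cy Dunn", "country": "US", "email": "cy@x.org"},
]
weights = profile_weights(records, ["name", "country", "email"])
print(match_score(records[0], records[1], weights))
print(match_score(records[0], records[2], weights))
```

Agreement on a nearly constant column such as "country" contributes little, so records that match only there score low.
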
RULE GENERATION IN A DATA GOVERNANCE FRAMEWORK

Publication number: 20200272598

Abstract: A computer system, computer program product, and a computer-implemented method for supplementing a data governance framework with one or more new data governance technical rules is disclosed. The method comprises providing a plurality of expressions and a first mapping. The expressions assign natural language patterns to technical language patterns. The first mapping maps first terms to data sources. A rule generator receives a new natural language (NL) rule comprising one or more natural-language patterns and one or more first terms. The rule generator resolves the new NL rule into one or more new technical rules interpretable by a respective rule engine and stores the one or more technical rules in a rule repository.

Type: Application

Filed: May 11, 2020

Publication date: August 27, 2020

Inventors: Mike W. Grasselt, Yannick Saillet, Marvin Schaefer
Rule generation in a data governance framework

Patent number: 10747716

Abstract: The invention relates to a computer-implemented method for supplementing a data governance framework with one or more new data governance technical rules. The method comprises providing a plurality of expressions and a first mapping. The expressions assign natural language patterns to technical language patterns. The first mapping maps first terms to data sources. A rule generator receives a new natural language (NL) rule comprising one or more natural-language patterns and one or more first terms. The rule generator resolves the new NL rule into one or more new technical rules interpretable by a respective rule engine and stores the one or more technical rules in a rule repository.

Type: Grant

Filed: December 11, 2017

Date of Patent: August 18, 2020

Assignee: International Business Machines Corporation

Inventors: Mike W. Grasselt, Yannick Saillet, Marvin Schaefer
Cognitive data anonymization

Patent number: 10740488

Abstract: A computer implemented method for data anonymization comprises: receiving a request for data that needs anonymization. The request comprises at least one field descriptor of data to be retrieved and a usage scenario of a user for the requested data. Then, based on the usage scenario, an anonymization algorithm to be applied to the data that is referred to by the field descriptor is determined. Subsequently, the determined anonymization algorithm is applied to the data that is referred to by the field descriptor. A test is performed as to whether the degree of anonymization fulfills a requirement that is related to the usage scenario. If the requirement is fulfilled, access to the anonymized data is provided.

Type: Grant

Filed: November 17, 2017

Date of Patent: August 11, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Albert Maier, Martin Oberhofer, Yannick Saillet
