Patents by Inventor Yannick Saillet
Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210342352
Abstract: Embodiments of the present invention determines duplicates in a graph. The graph comprises nodes representing entities and edges representing relationships between the entities. The method comprises: identifying at least two nodes in the graph. A neighborhood subgraph may be determined for each of the two nodes. The neighborhood subgraph includes the respective node. The method further comprises determining whether the two nodes are duplicates with respect to each other, based on a result of a comparison between the two subgraphs.
Type: Application
Filed: December 8, 2020
Publication date: November 4, 2021
Inventors: Thuany Karoline Stuart, Basem Elasioty, Claudio Andrea Fanconi, Mike W. Grasselt, Hemanth Kumar Babu, Yannick Saillet, Robert Kern, Martin Oberhofer, Lars Bremer, Jonathan Roesner, Jason Allen Woods
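The neighborhood comparison described in this abstract can be illustrated with a small sketch. It is not the patented method: the adjacency-dict graph, the `radius` and `threshold` parameters, and the use of Jaccard overlap as the comparison are all assumptions made for illustration.

```python
from collections import deque


def neighborhood(graph, node, radius=1):
    """Return the set of nodes within `radius` hops of `node`, including `node`."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in graph.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def likely_duplicates(graph, a, b, radius=1, threshold=0.5):
    """Compare the two neighborhoods with Jaccard overlap (an assumed metric).

    The two candidate nodes themselves are left out of the comparison so that
    only their shared context counts.
    """
    na = neighborhood(graph, a, radius) - {a, b}
    nb = neighborhood(graph, b, radius) - {a, b}
    union = na | nb
    if not union:
        return False
    return len(na & nb) / len(union) >= threshold


# Hypothetical data: two person nodes sharing an address and a phone number.
graph = {
    "p1": ["addr", "phone"],
    "p2": ["addr", "phone"],
    "p3": ["other"],
    "addr": ["p1", "p2"],
    "phone": ["p1", "p2"],
    "other": ["p3"],
}
```

With this sample graph, `p1` and `p2` share their entire one-hop neighborhood, while `p1` and `p3` share nothing.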
-
Publication number: 20210342397
Abstract: The present disclosure relates to a method for a weighting graph comprising nodes representing entities and edges representing relationships between entities in accordance with one or more domains. The method comprises: pre-processing the graph comprising assigning weights to the nodes and/or the edges of the graph in accordance with a specific domain of the domains, wherein the weight indicates a domain specific data quality problem of attribute values representing an edge of the edges and/or an entity involved in that edge. The weighted graph may be provided for enabling a processing of the graph in accordance with the specific domain.
Type: Application
Filed: April 20, 2021
Publication date: November 4, 2021
Inventors: Martin Oberhofer, Mike W. Grasselt, Claudio Andrea Fanconi, Thuany Karoline Stuart, Yannick Saillet, Basem Elasioty, Hemanth Kumar Babu, Robert Kern
-
Patent number: 11106820
Abstract: The present disclosure relates to a method for data anonymization of a database system. The method comprises: determining if a first dataset and second dataset of the database system have a relationship indicative of an entity having values in the two datasets. A request may be received from a user for at least one of the first and second datasets. In case the first dataset and second dataset have the relationship, at least one of the first and second datasets may be modified such that the indication of the entity is not accessible to the user. And the requested dataset may be provided.
Type: Grant
Filed: March 19, 2018
Date of Patent: August 31, 2021
Assignee: International Business Machines Corporation
Inventors: Martin Oberhofer, Albert Maier, Yannick Saillet
-
Publication number: 20210240677
Abstract: Methods and systems for data quality evaluation are disclosed. A method includes: receiving, by a computing device, at least one data set and a list of rule expressions to bind; building, by the computing device, candidate binding combinations between columns of the at least one data set and variables of each rule expression in the list of rule expressions; building, by the computing device, a new bound rule expression candidate based on the candidate binding combinations; generating, by the computing device, a new bound rule expression based on the new bound rule expression candidate and a data transformation applied to at least one of the columns of the at least one data set; and storing, by the computing device, the new bound rule expression.
Type: Application
Filed: January 30, 2020
Publication date: August 5, 2021
Inventors: Kunjavihari Madhav Kashalikar, Yannick Saillet, Ketki Ramesh Purandare
-
Patent number: 11036701
Abstract: A computer-implemented method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be free up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.
Type: Grant
Filed: January 6, 2020
Date of Patent: June 15, 2021
Assignee: International Business Machines Corporation
Inventors: Albert Maier, Yannick Saillet, Damir Spisic
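The scan described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `sample_with_buffer`, the `sample_rate` and `buffer_size` parameters, the caller-supplied `score` function, and the choice to free buffer space by evicting the lowest-scoring record are all hypothetical.

```python
import heapq
import random


def sample_with_buffer(records, sample_rate, buffer_size, score, seed=0):
    """Single pass over `records`: random sample plus a bounded, scored buffer."""
    rng = random.Random(seed)
    sampled = []   # the "first set" of randomly selected records
    buffer = []    # min-heap of (score, index, record); index breaks ties
    for index, record in enumerate(records):
        if rng.random() < sample_rate:
            sampled.append(record)
            continue
        entry = (score(record), index, record)
        if len(buffer) < buffer_size:
            heapq.heappush(buffer, entry)
        else:
            # Buffer is full: push the new entry and drop the lowest-scoring
            # one, which frees the space (assumed eviction policy).
            heapq.heappushpop(buffer, entry)
    # Merge the random sample with the buffered records.
    return sampled + [record for _, _, record in buffer]


# Hypothetical usage: records are integers and the score is the value itself.
top_three = sample_with_buffer(range(10), sample_rate=0.0, buffer_size=3,
                               score=lambda r: r)
```

With `sample_rate=0.0` nothing enters the random sample, so the result consists only of the three highest-scoring records kept in the buffer.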
-
Patent number: 11023452
Abstract: A processor receives statistical information about a data set included in a column of a data table. The processor receives additional information about the data set that indicates a data format utilized by the data set and a type of information represented by the data set. The processor generates a data dictionary for compression of the data set based, at least in part, on the statistical information and the additional information. The data dictionary is created such that the data dictionary is capable of compressing data that is statistically predicted to be received at a future point.
Type: Grant
Filed: June 8, 2015
Date of Patent: June 1, 2021
Assignee: International Business Machines Corporation
Inventors: Martin A. Oberhofer, Yannick Saillet, Jens Seifert
-
Patent number: 11023483
Abstract: Embodiments of the present invention disclose generating a data profiling jobs for source data in a data processing system, the source data being described by at least one source functional data model. A target functional data model is provided, for describing target data that can be generated from the source data. One or more source functional data models are identified that correspond to the target functional data model. At least one functional source-to-target model mapping is associated to at least one source-target pair based on the target functional data model and identified source functional data models. A physical source-to-target model mapping for at least one source-target pair based on the logical source-to-target model mapping is calculated. For all physical source attributes, the needed data profiling jobs are generated based on the target attribute for analyzing the physical source attributes.
Type: Grant
Filed: August 4, 2016
Date of Patent: June 1, 2021
Assignee: International Business Machines Corporation
Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens P. Seifert
-
Patent number: 11023497
Abstract: Data classification includes tracking classification of columns of data into data classes of a collection of classes available for classifying the columns, obtaining a target column of data, of a target dataset, to be classified into a data class of the collection of candidate classes, and classifying the target column of data into a data class of the collection of classes based on historical data classification characteristics provided by the tracking. The classifying includes selecting a group of candidate data classes of the collection of classes to compare to value(s) of the target column, the selecting excludes at least some candidate data classes of the collection from comparison to the value(s), and establishing a priority between the candidate data classes of the group of candidate classes in comparing the value(s) of the target column of data to the selected group of candidate classes.
Type: Grant
Filed: September 12, 2019
Date of Patent: June 1, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Namit Kabra, Krishna Kishore Bonagiri, Yannick Saillet, Mike W. Grasselt
-
Patent number: 11023484
Abstract: Embodiments of the present invention disclose generating a data profiling jobs for source data in a data processing system, the source data being described by at least one source functional data model. A target functional data model is provided, for describing target data that can be generated from the source data. One or more source functional data models are identified that correspond to the target functional data model. At least one functional source-to-target model mapping is associated to at least one source-target pair based on the target functional data model and identified source functional data models. A physical source-to-target model mapping for at least one source-target pair based on the logical source-to-target model mapping is calculated. For all physical source attributes, the needed data profiling jobs are generated based on the target attribute for analyzing the physical source attributes.
Type: Grant
Filed: December 6, 2017
Date of Patent: June 1, 2021
Assignee: International Business Machines Corporation
Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens P. Seifert
-
Patent number: 11017874
Abstract: A method and system for improving data and memory reorganization and storage technology is provided. The method includes configuring data capture and analysis settings of a database system resulting in configured data capture settings. A data and associated memory analysis request is received and specified test code is selected. A specified portion of data and associated memory is selected and the specified analysis code is executed resulting in execution of said specified type of analysis with respect to the specified portion of said data and associated memory. The specified portion of said data and associated memory is modified and stored.
Type: Grant
Filed: May 3, 2019
Date of Patent: May 25, 2021
Assignee: International Business Machines Corporation
Inventors: Yannick Saillet, Namit Kabra, Likhitha Maddirala, Ritesh Kumar Gupta
-
Publication number: 20210124611
Abstract: In an approach to dynamically identifying and modifying the parallelism of a particular task in a pipeline, the optimal execution time of each stage in a dynamic pipeline is calculated. The actual execution time of each stage in the dynamic pipeline is measured. Whether the actual time of completion of the data processing job will exceed a threshold is determined. If it is determined that the actual time of completion of the data processing job will exceed the threshold, then additional instances of the stages are created.
Type: Application
Filed: October 25, 2019
Publication date: April 29, 2021
Inventors: Yannick Saillet, Namit Kabra, Ritesh Kumar Gupta
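The decision logic outlined in this abstract might look something like the sketch below. It is only an illustration: the `Stage` class, the `rescale` function, the projection of completion time as the sum of measured stage times, and the assumption that adding instances speeds a stage up linearly are all hypothetical and not taken from the application.

```python
import math
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    optimal_seconds: float   # calculated optimal execution time
    actual_seconds: float    # measured execution time
    instances: int = 1


def rescale(stages, threshold_seconds):
    """Return a {stage name: instance count} plan for the pipeline."""
    # Assumed projection: the job finishes after the sum of measured stage times.
    projected = sum(stage.actual_seconds for stage in stages)
    if projected <= threshold_seconds:
        return {stage.name: stage.instances for stage in stages}
    plan = {}
    for stage in stages:
        if stage.actual_seconds > stage.optimal_seconds:
            # Assumed linear speed-up: scale instances by the slowdown factor.
            factor = math.ceil(stage.actual_seconds / stage.optimal_seconds)
            plan[stage.name] = stage.instances * factor
        else:
            plan[stage.name] = stage.instances
    return plan


# Hypothetical pipeline: the transform stage runs 3.5x slower than optimal.
pipeline = [Stage("extract", 10, 10), Stage("transform", 10, 35)]
plan = rescale(pipeline, threshold_seconds=30)
```

Here the projected 45 seconds exceeds the 30-second threshold, so only the lagging `transform` stage receives extra instances.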
-
Publication number: 20210081435
Abstract: Data classification includes tracking classification of columns of data into data classes of a collection of classes available for classifying the columns, obtaining a target column of data, of a target dataset, to be classified into a data class of the collection of candidate classes, and classifying the target column of data into a data class of the collection of classes based on historical data classification characteristics provided by the tracking. The classifying includes selecting a group of candidate data classes of the collection of classes to compare to value(s) of the target column, the selecting excludes at least some candidate data classes of the collection from comparison to the value(s), and establishing a priority between the candidate data classes of the group of candidate classes in comparing the value(s) of the target column of data to the selected group of candidate classes.
Type: Application
Filed: September 12, 2019
Publication date: March 18, 2021
Inventors: Namit KABRA, Krishna Kishore BONAGIRI, Yannick SAILLET, Mike W. GRASSELT
-
Publication number: 20210026872
Abstract: A method provides for classifying data fields of a dataset. A classifier configured for determining confidence values for a plurality of data classes for the data fields may be applied. Using the confidence values, data class candidates may be identified. Data fields may be determined for which a plurality of data class candidates is identifiable. Using previous user-selected data class assignments, a probability may be determined for the data class candidates that the respective data class candidate is a data class to which the respective data field is to be assigned. The data fields may be classified using the probabilities to select for the data fields a data class from the data class candidates. The dataset may be provided with metadata identifying for the data fields the data classes to which the respective data fields are assigned.
Type: Application
Filed: May 18, 2020
Publication date: January 28, 2021
Inventors: Yannick Saillet, Namit Kabra, Mike W. Grasselt, Krishna Kishore Bonagiri
-
Publication number: 20200401565
Abstract: In an approach for automatically ranking and routing data quality remediation tasks, a processor analyzes a data set ingested by a repository to produce a set of data quality problems. A processor computes a score for each data quality problem of the set of data quality problems. A processor identifies a route to send each data quality problem of the set of data quality problems. A processor exports each data quality problem according to the score and the route.
Type: Application
Filed: June 20, 2019
Publication date: December 24, 2020
Inventors: Yannick Saillet, Namit Kabra, Manish Anand Bhide
-
Publication number: 20200379803
Abstract: According to a computer-implemented method, an available amount of each of multiple computing resources is determined by machine logic over a period of time at a computing device. The machine logic also determines an expected usage of each computing resource to execute each workflow in a queue. The machine logic also determines a time of execution of each workflow in the queue based on the available amount of each of the multiple computing resources over time and the expected usage of each computing resource to execute each workflow in the queue.
Type: Application
Filed: May 29, 2019
Publication date: December 3, 2020
Inventors: Yannick Saillet, Namit Kabra
-
Publication number: 20200350032
Abstract: A method and system for improving data and memory reorganization and storage technology is provided. The method includes configuring data capture and analysis settings of a database system resulting in configured data capture settings. A data and associated memory analysis request is received and specified test code is selected. A specified portion of data and associated memory is selected and the specified analysis code is executed resulting in execution of said specified type of analysis with respect to the specified portion of said data and associated memory. The specified portion of said data and associated memory is modified and stored.
Type: Application
Filed: May 3, 2019
Publication date: November 5, 2020
Inventors: Yannick Saillet, Namit Kabra, Likhitha Maddirala, Ritesh Kumar Gupta
-
Patent number: 10789225
Abstract: A method to identify potentially duplicative records in a data set is provided. A computer may collect a data profile for the data set that provides descriptive information with regard to attributes of the data set. Based, at least in part, on the data profile, weights are determined for the attributes. As values of a data record are compared to values of the same respective attributes in other records, the overall likelihood of a match or duplicate, as indicated by the degree of similarity between values, is modified based on the determined weights associated with the respective attributes.
Type: Grant
Filed: November 8, 2017
Date of Patent: September 29, 2020
Assignee: International Business Machines Corporation
Inventors: Namit Kabra, Yannick Saillet
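A rough sketch of profile-driven attribute weighting, as described in this abstract, is shown below. It is not the patented method: using the fraction of distinct values as the only profile statistic, exact equality as the similarity measure, and the function names `attribute_weights` and `match_score` are all assumptions made for illustration.

```python
def attribute_weights(records):
    """Weight each attribute by its share of distinct values (assumed profile)."""
    attributes = records[0].keys()
    total = len(records)
    raw = {a: len({r[a] for r in records}) / total for a in attributes}
    norm = sum(raw.values())
    # Normalize so that the weights sum to 1.
    return {a: w / norm for a, w in raw.items()}


def match_score(left, right, weights):
    """Sum the weights of the attributes on which the two records agree."""
    return sum(w for a, w in weights.items() if left[a] == right[a])


# Hypothetical data set: `country` is nearly constant, `name` is more selective.
records = [
    {"name": "Ann", "country": "US"},
    {"name": "Bob", "country": "US"},
    {"name": "Cy", "country": "US"},
    {"name": "Ann", "country": "US"},
]
weights = attribute_weights(records)
```

In this sample, agreeing on `country` contributes little to the score because almost every record shares that value, while agreeing on `name` contributes most of it.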
-
Publication number: 20200272598
Abstract: A computer system, computer program product, and a computer-implemented method for supplementing a data governance framework with one or more new data governance technical rules is disclosed. The method comprises providing a plurality of expressions and a first mapping. The expressions assign natural language patterns to technical language patterns. The first mapping maps first terms to data sources. A rule generator receives a new natural language (NL) rule comprising one or more natural-language patterns and one or more first terms. The rule generator resolves the new NL rule into one or more new technical rules interpretable by a respective rule engine and stores the one or more technical rules in a rule repository.
Type: Application
Filed: May 11, 2020
Publication date: August 27, 2020
Inventors: Mike W. Grasselt, Yannick Saillet, Marvin Schaefer
-
Patent number: 10747716
Abstract: The invention relates to computer-implemented method for supplementing a data governance framework with one or more new data governance technical rules. The method comprises providing a plurality of expressions and a first mapping. The expressions assign natural language patterns to technical language patterns. The first mapping maps first terms to data sources. A rule generator receives a new natural language (NL) rule comprising one or more natural-language patterns and one or more first terms. The rule generator resolves the new NL rule into one or more new technical rules interpretable by a respective rule engine and stores the one or more technical rules in a rule repository.
Type: Grant
Filed: December 11, 2017
Date of Patent: August 18, 2020
Assignee: International Business Machines Corporation
Inventors: Mike W. Grasselt, Yannick Saillet, Marvin Schaefer
-
Patent number: 10740488
Abstract: A computer implemented method for data anonymization comprises: receiving a request for data that needs anonymization. The request comprises at least one field descriptor of data to be retrieved and a usage scenario of a user for the requested data. Then, based on the usage scenario, an anonymization algorithm to be applied to the data that is referred to by the field descriptor is determined. Subsequently, the determined anonymization algorithm is applied to the data that is referred to by the field descriptor. A testing is performed, as to whether the degree of anonymization fulfills a requirement that is related to the usage scenario. In the case, the requirement is fulfilled, access to the anonymized data is provided.
Type: Grant
Filed: November 17, 2017
Date of Patent: August 11, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Albert Maier, Martin Oberhofer, Yannick Saillet