Patents by Inventor Yannick Saillet

Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180349184
    Abstract: The invention provides for a method for processing a plurality of data sets (105; 106; 108; 110-113; DB1; DB2) in a data repository (104) for storing at least unstructured data, the method comprising: —providing (302) a set of agents (150-168), each agent being operable to trigger the processing of one or more of the data sets, the execution of each of said agents being automatically triggered in case one or more conditions assigned to said agent are met, at least one of the conditions relating to the existence, structure, content and/or annotations of the data set whose processing can be triggered by said agent; —executing (304) a first one of the agents; —updating (306) the annotations (115) of the first data set by the first agent; and —executing (308) a second one of the agents, said execution being triggered by the updated annotations of the first data set meeting the conditions of the second agent, thereby triggering a further updating of the annotations of the first data set.
    Type: Application
    Filed: August 14, 2018
    Publication date: December 6, 2018
    Inventors: Albert Maier, Yannick Saillet, Harald C. Smith, Daniel C. Wolfson
  • Patent number: 10055430
    Abstract: A computer-implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining, using at least the calculated first similarity score, whether the source dataset is organized as the first reference table in accordance with the reference storage model.
    Type: Grant
    Filed: October 14, 2015
    Date of Patent: August 21, 2018
    Assignee: International Business Machines Corporation
    Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
  • Patent number: 9996558
    Abstract: Embodiments relate to accessing a set of data tables in a source database. A set of table categories is provided for tables in the source database and a set of metrics is provided. For each table of the set of the data tables: the set of metrics is evaluated, the evaluated set of metrics is analyzed, and the table is categorized into one of the set of table categories using the result of the analysis. Information indicative of the table category of each table of the set of tables is output, and in response, a request to select data tables of the set of data tables is received according to a part of the table categories for data processing. A subset of data tables of the set of data tables is selected using the table categories for performing the data processing on the subset of data tables.
    Type: Grant
    Filed: September 3, 2014
    Date of Patent: June 12, 2018
    Assignee: International Business Machines Corporation
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
  • Publication number: 20180137148
    Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry out or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.
    Type: Application
    Filed: November 11, 2016
    Publication date: May 17, 2018
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20180137189
    Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.
    Type: Application
    Filed: November 11, 2016
    Publication date: May 17, 2018
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20180137151
    Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry out or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.
    Type: Application
    Filed: December 5, 2017
    Publication date: May 17, 2018
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20180137193
    Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.
    Type: Application
    Filed: December 7, 2017
    Publication date: May 17, 2018
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20180121535
    Abstract: The present disclosure relates to a method for centrally processing data records using a record linkage algorithm. The method comprises providing a centralized master repository for storing data records in a predefined data structure having a set of attributes. At least one clustering metric is provided. Clusters of records may be determined using a clustering function that is based on the at least one clustering metric. For each particular cluster, a set of configuration data for the record linkage algorithm may be defined based on a value of the clustering metric within that particular cluster. The individual data records may be assigned to one or more clusters of the clusters using the clustering metric values and the record linkage algorithm may be applied to a set of two or more individual data records assigned to at least one common cluster using the set of configuration data for the common cluster.
    Type: Application
    Filed: January 2, 2018
    Publication date: May 3, 2018
    Inventors: Martin Oberhofer, Yannick Saillet, Scott Schumacher, Jens P. Seifert
  • Publication number: 20180113928
    Abstract: The present disclosure relates to a method for centrally processing data records using a record linkage algorithm. The method comprises providing a centralized master repository for storing data records in a predefined data structure having a set of attributes. At least one clustering metric is provided. Clusters of records may be determined using a clustering function that is based on the at least one clustering metric. For each particular cluster, a set of configuration data for the record linkage algorithm may be defined based on a value of the clustering metric within that particular cluster. The individual data records may be assigned to one or more clusters of the clusters using the clustering metric values and the record linkage algorithm may be applied to a set of two or more individual data records assigned to at least one common cluster using the set of configuration data for the common cluster.
    Type: Application
    Filed: October 21, 2016
    Publication date: April 26, 2018
    Inventors: Martin Oberhofer, Yannick Saillet, Scott Schumacher, Jens P. Seifert
  • Publication number: 20180101538
    Abstract: The invention relates to a computer-implemented method for supplementing a data governance framework with one or more new data governance technical rules. The method comprises providing a plurality of expressions and a first mapping. The expressions assign natural language patterns to technical language patterns. The first mapping maps first terms to data sources. A rule generator receives a new natural language (NL) rule comprising one or more natural-language patterns and one or more first terms. The rule generator resolves the new NL rule into one or more new technical rules interpretable by a respective rule engine and stores the one or more technical rules in a rule repository.
    Type: Application
    Filed: December 11, 2017
    Publication date: April 12, 2018
    Inventors: Mike W. Grasselt, Yannick Saillet, Marvin Schaefer
  • Publication number: 20180096038
    Abstract: Embodiments of the present invention disclose generating data profiling jobs for source data in a data processing system, the source data being described by at least one source functional data model. A target functional data model is provided, for describing target data that can be generated from the source data. One or more source functional data models are identified that correspond to the target functional data model. At least one functional source-to-target model mapping is associated to at least one source-target pair based on the target functional data model and identified source functional data models. A physical source-to-target model mapping for at least one source-target pair based on the logical source-to-target model mapping is calculated. For all physical source attributes, the needed data profiling jobs are generated based on the target attribute for analyzing the physical source attributes.
    Type: Application
    Filed: December 6, 2017
    Publication date: April 5, 2018
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens P. Seifert
  • Publication number: 20180089233
    Abstract: A mechanism is provided for deduplicating a set of records of data. Each record of the set of records has a set of attributes. The mechanism identifies a subset of records of the set of records each having one or more invalid attribute values. For each invalid attribute value of a given attribute of the subset of the set of records the mechanism determines one or more associated valid candidates of attribute values of the given attribute using the set of records. For each record of the subset of records of the set of records the mechanism replaces the one or more invalid attribute values by one or more combinations of the determined valid candidates of attribute values, resulting in a modified set of records. The mechanism selects a subset of records of the modified set of records that satisfy a consistency condition on the attribute values of each record.
    Type: Application
    Filed: September 26, 2016
    Publication date: March 29, 2018
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20180089235
    Abstract: A mechanism is provided for deduplicating a set of records of data. Each record of the set of records has a set of attributes. The mechanism identifies a subset of records of the set of records each having one or more invalid attribute values. For each invalid attribute value of a given attribute of the subset of the set of records the mechanism determines one or more associated valid candidates of attribute values of the given attribute using the set of records. For each record of the subset of records of the set of records the mechanism replaces the one or more invalid attribute values by one or more combinations of the determined valid candidates of attribute values, resulting in a modified set of records. The mechanism selects a subset of records of the modified set of records that satisfy a consistency condition on the attribute values of each record.
    Type: Application
    Filed: November 28, 2017
    Publication date: March 29, 2018
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20180067973
    Abstract: A method to identify potentially duplicative records in a data set is provided. A computer may collect a data profile for the data set that provides descriptive information with regard to attributes of the data set. Based, at least in part, on the data profile, weights are determined for the attributes. As values of a data record are compared to values of the same respective attributes in other records, the overall likelihood of a match or duplicate, as indicated by the degree of similarity between values, is modified based on the determined weights associated with the respective attributes.
    Type: Application
    Filed: November 8, 2017
    Publication date: March 8, 2018
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20180039680
    Abstract: Embodiments of the present invention disclose generating data profiling jobs for source data in a data processing system, the source data being described by at least one source functional data model. A target functional data model is provided, for describing target data that can be generated from the source data. One or more source functional data models are identified that correspond to the target functional data model. At least one functional source-to-target model mapping is associated to at least one source-target pair based on the target functional data model and identified source functional data models. A physical source-to-target model mapping for at least one source-target pair based on the logical source-to-target model mapping is calculated. For all physical source attributes, the needed data profiling jobs are generated based on the target attribute for analyzing the physical source attributes.
    Type: Application
    Filed: August 4, 2016
    Publication date: February 8, 2018
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens P. Seifert
  • Patent number: 9870419
    Abstract: In an embodiment of the invention, a method for data profiling incorporating an enterprise service bus (ESB) coupling the target and source systems following an extraction, transformation, and loading (ETL) process for a target system and a source system is provided. The method includes receiving baseline data profiling results obtained during ETL from a source application to a target application, caching the updates, determining current data profiling results within the ESB for cached updates, and triggering an action if a threshold disparity is detected upon the current data profiling results and the baseline data profiling results.
    Type: Grant
    Filed: February 28, 2012
    Date of Patent: January 16, 2018
    Assignee: International Business Machines Corporation
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
  • Patent number: 9870418
    Abstract: In an embodiment of the invention, a method for data profiling incorporating an enterprise service bus (ESB) coupling the target and source systems following an extraction, transformation, and loading (ETL) process for a target system and a source system is provided. The method includes receiving baseline data profiling results obtained during ETL from a source application to a target application, caching the updates, determining current data profiling results within the ESB for cached updates, and triggering an action if a threshold disparity is detected upon the current data profiling results and the baseline data profiling results.
    Type: Grant
    Filed: December 31, 2010
    Date of Patent: January 16, 2018
    Assignee: International Business Machines Corporation
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
  • Publication number: 20170351717
    Abstract: A computer system with the capability to identify potentially duplicative records in a data set is provided. A computer may collect a data profile for the data set that provides descriptive information with regard to attributes of the data set. Based, at least in part, on the data profile, weights are determined for the attributes. As values of a data record are compared to values of the same respective attributes in other records, the overall likelihood of a match or duplicate, as indicated by the degree of similarity between values, is modified based on the determined weights associated with the respective attributes.
    Type: Application
    Filed: June 2, 2016
    Publication date: December 7, 2017
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20170329788
    Abstract: The invention relates to a computer-implemented method for supplementing a data governance framework with one or more new data governance technical rules. The method comprises providing a plurality of expressions and a first mapping. The expressions assign natural language patterns to technical language patterns. The first mapping maps first terms to data sources. A rule generator receives a new natural language (NL) rule comprising one or more natural-language patterns and one or more first terms. The rule generator resolves the new NL rule into one or more new technical rules interpretable by a respective rule engine and stores the one or more technical rules in a rule repository.
    Type: Application
    Filed: May 10, 2016
    Publication date: November 16, 2017
    Inventors: Mike W. Grasselt, Yannick Saillet, Marvin Schaefer
  • Patent number: 9779266
    Abstract: The invention provides for a data processing system comprising an application server comprising at least one processor. Execution of the instructions causes the processor to: receive an analysis request, the analysis request comprising multiple data analysis commands for generating an analysis report descriptive of a structured data file; divide the commands into private analysis commands and public analysis commands; send the private analysis commands to a trusted distributed file system; send a portion of the public analysis commands to a public distributed file system; send a remainder of the public analysis commands to the trusted distributed file system; and generate the analysis report using public analysis results from the public distributed file system and trusted analysis results from the trusted distributed file system.
    Type: Grant
    Filed: February 26, 2015
    Date of Patent: October 3, 2017
    Assignee: International Business Machines Corporation
    Inventors: Sebastian Nelke, Martin A. Oberhofer, Yannick Saillet, Jens Seifert
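
Illustrative note: the abstracts of publications 20180137189 and 20180137193 above describe converting attribute values to sets of bigrams, representing the presence of bigrams as bits, and grouping values with bitwise operations. The Python sketch below is one possible reading of that general idea and is not taken from the applications themselves. The function names, the lowercase normalization, the ratio of common bigrams to all distinct bigrams, the 0.6 threshold, and the union-find grouping are all assumptions made for illustration.

```python
from itertools import combinations


def bigrams(value: str) -> set:
    """Return the set of two-character substrings of a lowercased value (assumed normalization)."""
    s = value.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def build_bitmap(values):
    """Encode each value as an integer whose set bits mark the bigrams it contains."""
    bit_index = {}  # bigram -> bit position, assigned in order of first appearance
    masks = []
    for value in values:
        mask = 0
        for bg in bigrams(value):
            bit = bit_index.setdefault(bg, len(bit_index))
            mask |= 1 << bit
        masks.append(mask)
    return masks


def common_bigrams(mask_a: int, mask_b: int) -> int:
    """Count the bigrams shared by two values using a bitwise AND."""
    return bin(mask_a & mask_b).count("1")


def group_potential_duplicates(values, min_ratio=0.6):
    """Group values whose shared-bigram ratio meets a hypothetical threshold."""
    masks = build_bitmap(values)
    parent = list(range(len(values)))  # union-find forest over value indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(values)), 2):
        shared = common_bigrams(masks[i], masks[j])
        total = bin(masks[i] | masks[j]).count("1")
        if total and shared / total >= min_ratio:
            parent[find(i)] = find(j)

    groups = {}
    for i, value in enumerate(values):
        groups.setdefault(find(i), []).append(value)
    # Only groups with more than one member are potential duplicates.
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["Jonathan Smith", "Jonathon Smith", "Maria Garcia"]
    print(group_potential_duplicates(names))
```

Storing each value as a single integer keeps the pairwise comparison to one AND and one OR per pair. A real system would likely need a blocking step to avoid comparing every pair, which this sketch does not attempt.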