Patents by Inventor Yannick Saillet

Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10719536
    Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.
    Type: Grant
    Filed: December 7, 2017
    Date of Patent: July 21, 2020
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Yannick Saillet
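The bigram-bitmap idea in the abstract above can be illustrated with a short Python sketch. This is not the patented implementation. The function names, the Dice-style score, the 0.6 threshold, and the union-find grouping are all assumptions chosen for illustration; the abstract only states that the score reflects the number of common bigrams and that grouping uses bitwise operations.

```python
from itertools import combinations


def bigrams(value):
    """Return the set of adjacent character pairs in a normalized value."""
    text = value.lower().strip()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_bitmaps(values):
    """Encode each value as an int whose set bits mark the bigrams it contains."""
    bit_index = {}  # bigram -> bit position, assigned on first sight
    bitmaps = []
    for value in values:
        mask = 0
        for gram in bigrams(value):
            position = bit_index.setdefault(gram, len(bit_index))
            mask |= 1 << position
        bitmaps.append(mask)
    return bitmaps


def similarity(a, b):
    """Assumed Dice-style score: shared bigrams (bitwise AND) over total bigrams."""
    total = bin(a).count("1") + bin(b).count("1")
    return 2 * bin(a & b).count("1") / total if total else 0.0


def group_candidates(values, threshold=0.6):
    """Group values whose pairwise score reaches the (hypothetical) threshold."""
    bitmaps = build_bitmaps(values)
    parent = list(range(len(values)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(values)), 2):
        if similarity(bitmaps[i], bitmaps[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, value in enumerate(values):
        groups.setdefault(find(i), []).append(value)
    # Only groups with at least two members are potential duplicates.
    return [members for members in groups.values() if len(members) > 1]
```

Calling `group_candidates(["Jonathan Smith", "Jonathon Smith", "Acme Corp"])` places the two name variants in one group, since they share most of their bigrams, and leaves the third value ungrouped.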
  • Patent number: 10719627
    Abstract: A computer-implemented method for data anonymization comprises receiving a request for data that needs anonymization. The request comprises at least one field descriptor of data to be retrieved and a usage scenario of a user for the requested data. Then, based on the usage scenario, an anonymization algorithm to be applied to the data that is referred to by the field descriptor is determined. Subsequently, the determined anonymization algorithm is applied to the data that is referred to by the field descriptor. A test is performed as to whether the degree of anonymization fulfills a requirement that is related to the usage scenario. If the requirement is fulfilled, access to the anonymized data is provided.
    Type: Grant
    Filed: April 23, 2019
    Date of Patent: July 21, 2020
    Assignee: International Business Machines Corporation
    Inventors: Albert Maier, Martin Oberhofer, Yannick Saillet
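The request flow in the abstract above (choose an algorithm from the usage scenario, apply it, test the result, then grant access) might be sketched as follows. This is an illustration only. The `POLICY` table, the scenario name, the masking and generalization functions, and the use of k-anonymity as the "degree of anonymization" are all assumptions; the abstract does not name a specific measure or algorithm.

```python
from collections import Counter


def mask(value):
    """Replace every character with an asterisk."""
    return "*" * len(str(value))


def generalize_age(value):
    """Replace an exact age with its ten-year band, e.g. 34 -> '30-39'."""
    low = (int(value) // 10) * 10
    return f"{low}-{low + 9}"


# Hypothetical policy: usage scenario -> per-field algorithm and required k.
POLICY = {
    "marketing_analytics": {
        "algorithms": {"name": mask, "age": generalize_age},
        "min_k": 2,
    },
}


def k_anonymity(rows, fields):
    """Size of the smallest group of rows sharing the same field values."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return min(counts.values()) if counts else 0


def handle_request(rows, fields, scenario):
    """Return anonymized rows if the scenario's requirement is met, else None."""
    policy = POLICY[scenario]
    anonymized = [
        {f: policy["algorithms"][f](row[f]) for f in fields} for row in rows
    ]
    if k_anonymity(anonymized, fields) >= policy["min_k"]:
        return anonymized
    return None  # requirement not fulfilled: no access is provided
```

In this sketch a request whose anonymized rows still contain a unique combination of values is refused rather than returned.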
  • Publication number: 20200225941
    Abstract: The present disclosure relates to a method for creating run-time executables for data analysis functions. The method comprises, in response to receiving a data analysis request from a user, selecting from a repository of data analysis functions a set of data analysis functions for execution in a hosting environment or on premises of the user. Usage conditions of the set of data analysis functions by the user may be determined. An additional code for applying the determined usage conditions may be created. The selected data analysis functions and the additional code may be compiled, resulting in an executable code. The executable code may be certified. The certified executable code may be deployed or provided for download to a run-time environment for certified executable codes.
    Type: Application
    Filed: January 15, 2019
    Publication date: July 16, 2020
    Inventors: Martin Oberhofer, Mike W. Grasselt, Yannick Saillet, Jens P. Seifert
  • Publication number: 20200225942
    Abstract: The present disclosure relates to a method for creating run-time executables for data analysis functions. The method comprises, in response to receiving a data analysis request from a user, selecting from a repository of data analysis functions a set of data analysis functions for execution in a hosting environment or on premises of the user. Usage conditions of the set of data analysis functions by the user may be determined. An additional code for applying the determined usage conditions may be created. The selected data analysis functions and the additional code may be compiled, resulting in an executable code. The executable code may be certified. The certified executable code may be deployed or provided for download to a run-time environment for certified executable codes.
    Type: Application
    Filed: July 2, 2019
    Publication date: July 16, 2020
    Inventors: Martin Oberhofer, Mike W. Grasselt, Yannick Saillet, Jens P. Seifert
  • Publication number: 20200183954
    Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.
    Type: Application
    Filed: February 14, 2020
    Publication date: June 11, 2020
    Inventors: Namit Kabra, Yannick Saillet
  • Patent number: 10671627
    Abstract: Embodiments relate to processing a data set stored in a computer system. In one aspect, a method of processing a data set stored in a computer system includes providing one or more parameters for quantifying data quality of the data set. A processor generates, for each parameter of the one or more parameters, a reference pattern indicating a dysfunctional behavior of the values of the parameter. The data set is processed to obtain values of the one or more parameters. A parameter of the one or more parameters is identified whose obtained values match a corresponding reference pattern of the generated reference patterns. The identified parameter is assigned a resource weight value indicating the amount of processing resources required to fix the dysfunctional behavior of the identified parameter.
    Type: Grant
    Filed: November 29, 2018
    Date of Patent: June 2, 2020
    Assignee: International Business Machines Corporation
    Inventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
  • Publication number: 20200151155
    Abstract: A computer-implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining, using at least the calculated first similarity score, whether the source dataset is organized as the first reference table in accordance with the reference storage model.
    Type: Application
    Filed: January 10, 2020
    Publication date: May 14, 2020
    Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
  • Publication number: 20200142870
    Abstract: A computer-implemented method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.
    Type: Application
    Filed: January 6, 2020
    Publication date: May 7, 2020
    Inventors: Albert Maier, Yannick Saillet, Damir Spisic
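The scan described in the abstract above can be sketched in Python as a single pass that feeds a random sample and a bounded buffer. This is a sketch under stated assumptions, not the claimed method. The rarity-based storage score, the `score_field` parameter, and the eviction rule (drop the lowest-scoring entry, which may be the current record) are hypothetical; the abstract only says that a score is computed from attribute values and that part of the buffer is freed when it is full.

```python
import heapq
import random
from collections import Counter


def subsample(records, sample_rate, buffer_size, score_field, seed=0):
    """Merge a Bernoulli random sample with the best-scored buffered records."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    seen = Counter()      # how often each attribute value has appeared so far
    random_sample = []    # the "first set of records"
    buffer = []           # min-heap of (score, sequence number, record)

    for seq, record in enumerate(records):
        value = record[score_field]
        seen[value] += 1
        if rng.random() < sample_rate:
            random_sample.append(record)
            continue
        # Assumed storage score: values seen less often score higher.
        score = 1.0 / seen[value]
        entry = (score, seq, record)
        if len(buffer) < buffer_size:
            heapq.heappush(buffer, entry)
        else:
            # Buffer full: free space by dropping the lowest-scoring entry.
            heapq.heappushpop(buffer, entry)

    return random_sample + [record for _, _, record in buffer]
```

The sequence number in each heap entry breaks ties between equal scores, so the records themselves are never compared.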
  • Patent number: 10635486
    Abstract: The invention provides for a method for processing a plurality of data sets (105; 106; 108; 110-113; DB1; DB2) in a data repository (104) for storing at least unstructured data, the method comprising: providing (302) a set of agents (150-168), each agent being operable to trigger the processing of one or more of the data sets, the execution of each of said agents being automatically triggered in case one or more conditions assigned to said agent are met, at least one of the conditions relating to the existence, structure, content and/or annotations of the data set whose processing can be triggered by said agent; executing (304) a first one of the agents; updating (306) the annotations (115) of the first data set by the first agent; and executing (308) a second one of the agents, said execution being triggered by the updated annotations of the first data set meeting the conditions of the second agent, thereby triggering a further updating of the annotations of the first data set.
    Type: Grant
    Filed: August 14, 2018
    Date of Patent: April 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Albert Maier, Yannick Saillet, Harald C. Smith, Daniel C. Wolfson
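The annotation-driven triggering in the abstract above can be pictured with a small Python sketch. It is illustrative only. The `Agent` class, the `run_agents` loop, the `max_steps` guard, and the example "profiler" and "classifier" agents are hypothetical names, and a plain dictionary stands in for a data set's annotations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    name: str
    condition: Callable[[dict], bool]  # checks the data set's annotations
    action: Callable[[dict], None]     # processes the data set, updating annotations


def run_agents(agents, annotations, max_steps=100):
    """Run agents one at a time until no agent's condition is met."""
    executed = []
    for _ in range(max_steps):  # guard against agents that re-trigger forever
        agent = next((a for a in agents if a.condition(annotations)), None)
        if agent is None:
            break
        agent.action(annotations)
        executed.append(agent.name)
    return executed


# Hypothetical agents: the classifier only fires once the profiler has
# annotated the data set.
agents = [
    Agent(
        name="classifier",
        condition=lambda ann: ann.get("profiled") and "domain" not in ann,
        action=lambda ann: ann.update(domain="customer"),
    ),
    Agent(
        name="profiler",
        condition=lambda ann: "profiled" not in ann,
        action=lambda ann: ann.update(profiled=True, format="csv"),
    ),
]

annotations = {}
order = run_agents(agents, annotations)
```

Although the classifier is listed first, it runs second, because its condition is met only after the profiler has updated the annotations.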
  • Patent number: 10635693
    Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.
    Type: Grant
    Filed: November 11, 2016
    Date of Patent: April 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Yannick Saillet
  • Patent number: 10621492
    Abstract: The present disclosure relates to a method for centrally processing data records using a record linkage algorithm. The method comprises providing a centralized master repository for storing data records in a predefined data structure having a set of attributes. At least one clustering metric is provided. Clusters of records may be determined using a clustering function that is based on the at least one clustering metric. For each particular cluster, a set of configuration data for the record linkage algorithm may be defined based on a value of the clustering metric within that particular cluster. The individual data records may be assigned to one or more clusters of the clusters using the clustering metric values and the record linkage algorithm may be applied to a set of two or more individual data records assigned to at least one common cluster using the set of configuration data for the common cluster.
    Type: Grant
    Filed: October 21, 2016
    Date of Patent: April 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Martin Oberhofer, Yannick Saillet, Scott Schumacher, Jens P. Seifert
  • Patent number: 10621493
    Abstract: The present disclosure relates to a method for centrally processing data records using a record linkage algorithm. The method comprises providing a centralized master repository for storing data records in a predefined data structure having a set of attributes. At least one clustering metric is provided. Clusters of records may be determined using a clustering function that is based on the at least one clustering metric. For each particular cluster, a set of configuration data for the record linkage algorithm may be defined based on a value of the clustering metric within that particular cluster. The individual data records may be assigned to one or more clusters of the clusters using the clustering metric values and the record linkage algorithm may be applied to a set of two or more individual data records assigned to at least one common cluster using the set of configuration data for the common cluster.
    Type: Grant
    Filed: January 2, 2018
    Date of Patent: April 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Martin Oberhofer, Yannick Saillet, Scott Schumacher, Jens P. Seifert
  • Patent number: 10592481
    Abstract: A computer-implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining, using at least the calculated first similarity score, whether the source dataset is organized as the first reference table in accordance with the reference storage model.
    Type: Grant
    Filed: April 6, 2017
    Date of Patent: March 17, 2020
    Assignee: International Business Machines Corporation
    Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
  • Patent number: 10585865
    Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry out or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.
    Type: Grant
    Filed: December 5, 2017
    Date of Patent: March 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Yannick Saillet
  • Patent number: 10585864
    Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry out or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.
    Type: Grant
    Filed: November 11, 2016
    Date of Patent: March 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Yannick Saillet
  • Patent number: 10540336
    Abstract: A mechanism is provided for deduplicating a set of records of data. The mechanism identifies a subset of records each having one or more invalid attribute values. For each invalid attribute value of a given attribute the mechanism determines one or more associated valid candidates of attribute values of the given attribute using the set of records. For each record of the subset of records the mechanism replaces the one or more invalid attribute values by one or more combinations of the determined valid candidates of attribute values, resulting in a modified set of records. The mechanism selects a subset of records of the modified set of records that satisfy a consistency condition on the attribute values of each record. The mechanism deduplicates the selected subset of records of the modified set of records responsive to determining the subset of records comprises more than one record.
    Type: Grant
    Filed: September 26, 2016
    Date of Patent: January 21, 2020
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Yannick Saillet
  • Patent number: 10534763
    Abstract: A computer-implemented method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.
    Type: Grant
    Filed: May 10, 2019
    Date of Patent: January 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Albert Maier, Yannick Saillet, Damir Spisic
  • Patent number: 10534762
    Abstract: A computer-implemented method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be freed up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.
    Type: Grant
    Filed: May 10, 2019
    Date of Patent: January 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Albert Maier, Yannick Saillet, Damir Spisic
  • Patent number: 10528534
    Abstract: A mechanism is provided for deduplicating a set of records of data. The mechanism identifies a subset of records each having one or more invalid attribute values. For each invalid attribute value of a given attribute the mechanism determines one or more associated valid candidates of attribute values of the given attribute using the set of records. For each record of the subset of records the mechanism replaces the one or more invalid attribute values by one or more combinations of the determined valid candidates of attribute values, resulting in a modified set of records. The mechanism selects a subset of records of the modified set of records that satisfy a consistency condition on the attribute values of each record. The mechanism deduplicates the selected subset of records of the modified set of records responsive to determining the subset of records comprises more than one record.
    Type: Grant
    Filed: November 28, 2017
    Date of Patent: January 7, 2020
    Assignee: International Business Machines Corporation
    Inventors: Namit Kabra, Yannick Saillet
  • Publication number: 20190377715
    Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry out or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.
    Type: Application
    Filed: August 22, 2019
    Publication date: December 12, 2019
    Inventors: Namit Kabra, Yannick Saillet