Patents by Inventor Yannick Saillet
Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10719536Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.Type: GrantFiled: December 7, 2017Date of Patent: July 21, 2020Assignee: International Business Machines CorporationInventors: Namit Kabra, Yannick Saillet
-
Patent number: 10719627Abstract: A computer implemented method for data anonymization comprises: receiving a request for data that needs anonymization. The request comprises at least one field descriptor of data to be retrieved and a usage scenario of a user for the requested data. Then, based on the usage scenario, an anonymization algorithm to be applied to the data that is referred to by the field descriptor is determined. Subsequently, the determined anonymization algorithm is applied to the data that is referred to by the field descriptor. A testing is performed, as to whether the degree of anonymization fulfills a requirement that is related to the usage scenario. In the case, the requirement is fulfilled, access to the anonymized data is provided.Type: GrantFiled: April 23, 2019Date of Patent: July 21, 2020Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Albert Maier, Martin Oberhofer, Yannick Saillet
-
Publication number: 20200225941Abstract: The present disclosure relates to a method for creating run-time executables for data analysis functions. The method comprises in response to receiving a data analysis request from a user, selecting from a repository a repository of data analysis functions a set of data analysis functions for execution in a hosting environment or on premises of the user. Usage conditions of the set of data analysis functions by the user may be determined. An additional code for applying the determined usage conditions may be created. The selected data analysis functions and the additional code may be compiled, resulting in an executable code. The executable code may be certified. The certified executable code may be deployed or provided for download to a run-time environment for certified executable codes.Type: ApplicationFiled: January 15, 2019Publication date: July 16, 2020Inventors: Martin Oberhofer, Mike W. Grasselt, Yannick Saillet, Jens P. Seifert
-
Publication number: 20200225942Abstract: The present disclosure relates to a method for creating run-time executables for data analysis functions. The method comprises in response to receiving a data analysis request from a user, selecting from a repository a repository of data analysis functions a set of data analysis functions for execution in a hosting environment or on premises of the user. Usage conditions of the set of data analysis functions by the user may be determined. An additional code for applying the determined usage conditions may be created. The selected data analysis functions and the additional code may be compiled, resulting in an executable code. The executable code may be certified. The certified executable code may be deployed or provided for download to a run-time environment for certified executable codes.Type: ApplicationFiled: July 2, 2019Publication date: July 16, 2020Inventors: Martin Oberhofer, Mike W. Grasselt, Yannick Saillet, Jens P. Seifert
-
Publication number: 20200183954Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.Type: ApplicationFiled: February 14, 2020Publication date: June 11, 2020Inventors: Namit Kabra, Yannick Saillet
-
Patent number: 10671627Abstract: Embodiments relate to processing a data set stored in a computer system. In one aspect, a method of processing a data set stored in a computer system includes providing one or more parameters for quantifying data quality of the data set. A processor generates, for each parameter of the one or more parameters, a reference pattern indicating a dysfunctional behavior of the values of the parameter. The data set is processed to obtain values of the one or more parameters. A parameter of the one or more parameters is identified whose obtained values match a corresponding reference pattern of the generated reference patterns. The identified parameter is assigned a resource weight value indicating the amount of processing resources required to fix the dysfunctional behavior of the identified parameter.Type: GrantFiled: November 29, 2018Date of Patent: June 2, 2020Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Sebastian Nelke, Martin Oberhofer, Yannick Saillet, Jens Seifert
-
Publication number: 20200151155Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining using at least the calculated first similarity score whether the source dataset is organized as the first reference table in accordance to the reference storage model.Type: ApplicationFiled: January 10, 2020Publication date: May 14, 2020Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
-
Publication number: 20200142870Abstract: A computer-implemented method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be free up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.Type: ApplicationFiled: January 6, 2020Publication date: May 7, 2020Inventors: Albert Maier, Yannick Saillet, Damir Spisic
-
Patent number: 10635486Abstract: The invention provides for a method for processing a plurality of data sets (105; 106; 108; 110-113; DB1; DB2) in a data repository (104) for storing at least unstructured data, the method comprising: —providing (302) a set of agents (150-168), each agent being operable to trigger the processing of one or more of the data sets, the execution of each of said agents being automatically triggered in case one or more conditions assigned to said agent are met, at least one of the conditions relating to the existence, structure, content and/or annotations of the data set whose processing can be triggered by said agent; —executing (304) a first one of the agents; —updating (306) the annotations (115) of the first data set by the first agent; and —executing (308) a second one of the agents, said execution being triggered by the updated annotations of the first data set meeting the conditions of the second agent, thereby triggering a further updating of the annotations of the first data set.Type: GrantFiled: August 14, 2018Date of Patent: April 28, 2020Assignee: International Business Machines CorporationInventors: Albert Maier, Yannick Saillet, Harald C. Smith, Daniel C. Wolfson
-
Patent number: 10635693Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.Type: GrantFiled: November 11, 2016Date of Patent: April 28, 2020Assignee: International Business Machines CorporationInventors: Namit Kabra, Yannick Saillet
-
Patent number: 10621492Abstract: The present disclosure relates to a method for centrally processing data records using a record linkage algorithm. The method comprises providing a centralized master repository for storing data records in a predefined data structure having a set of attributes. At least one clustering metric is provided. Clusters of records may be determined using a clustering function that is based on the at least one clustering metric. For each particular cluster, a set of configuration data for the record linkage algorithm may be defined based on a value of the clustering metric within that particular cluster. The individual data records may be assigned to one or more clusters of the clusters using the clustering metric values and the record linkage algorithm may be applied to a set of two or more individual data records assigned to at least one common cluster using the set of configuration data for the common cluster.Type: GrantFiled: October 21, 2016Date of Patent: April 14, 2020Assignee: International Business Machines CorporationInventors: Martin Oberhofer, Yannick Saillet, Scott Schumacher, Jens P. Seifert
-
Patent number: 10621493Abstract: The present disclosure relates to a method for centrally processing data records using a record linkage algorithm. The method comprises providing a centralized master repository for storing data records in a predefined data structure having a set of attributes. At least one clustering metric is provided. Clusters of records may be determined using a clustering function that is based on the at least one clustering metric. For each particular cluster, a set of configuration data for the record linkage algorithm may be defined based on a value of the clustering metric within that particular cluster. The individual data records may be assigned to one or more clusters of the clusters using the clustering metric values and the record linkage algorithm may be applied to a set of two or more individual data records assigned to at least one common cluster using the set of configuration data for the common cluster.Type: GrantFiled: January 2, 2018Date of Patent: April 14, 2020Assignee: International Business Machines CorporationInventors: Martin Oberhofer, Yannick Saillet, Scott Schumacher, Jens P. Seifert
-
Patent number: 10592481Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining using at least the calculated first similarity score whether the source dataset is organized as the first reference table in accordance to the reference storage model.Type: GrantFiled: April 6, 2017Date of Patent: March 17, 2020Assignee: International Business Machines CorporationInventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
-
Patent number: 10585865Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.Type: GrantFiled: December 5, 2017Date of Patent: March 10, 2020Assignee: International Business Machines CorporationInventors: Namit Kabra, Yannick Saillet
-
Patent number: 10585864Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.Type: GrantFiled: November 11, 2016Date of Patent: March 10, 2020Assignee: International Business Machines CorporationInventors: Namit Kabra, Yannick Saillet
-
Patent number: 10540336Abstract: A mechanism is provided for deduplicating a set of records of data. The mechanism identifies a subset of records each having one or more invalid attribute values. For each invalid attribute value of a given attribute the mechanism determines one or more associated valid candidates of attribute values of the given attribute using the set of records. For each record of the subset of records the mechanism replaces the one or more invalid attribute values by one or more combinations of the determined valid candidates of attribute values, resulting in a modified set of records. The mechanism selects a subset of records of the modified set of records that satisfy a consistency condition on the attribute values of each record. The mechanism deduplicates the selected subset of records of the modified set of records responsive to determining the subset of records comprises more than one record.Type: GrantFiled: September 26, 2016Date of Patent: January 21, 2020Assignee: International Business Machines CorporationInventors: Namit Kabra, Yannick Saillet
-
Patent number: 10534763Abstract: A computer-implemented method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be free up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.Type: GrantFiled: May 10, 2019Date of Patent: January 14, 2020Assignee: International Business Machines CorporationInventors: Albert Maier, Yannick Saillet, Damir Spisic
-
Patent number: 10534762Abstract: A computer-implemented method, computer program product and system for data sampling in a storage system. The storage system includes a dataset comprising records and a buffer. The dataset is scanned record-by-record to determine whether the current record belongs to a random sample. If so, then the current record may be added to a first set of records. Otherwise, at least one storage score may be calculated or determined for the current record using attribute values of the current record. Next, it may be determined whether the buffer includes available size for storing the current record. In case the buffer comprises the available size, the current record may be stored in the buffer. Otherwise, at least part of the buffer may be free up. A subsample of the dataset may be provided as a result of merging the first set of records and at least part of the buffered records.Type: GrantFiled: May 10, 2019Date of Patent: January 14, 2020Assignee: International Business Machines CorporationInventors: Albert Maier, Yannick Saillet, Damir Spisic
-
Patent number: 10528534Abstract: A mechanism is provided for deduplicating a set of records of data. The mechanism identifies a subset of records each having one or more invalid attribute values. For each invalid attribute value of a given attribute the mechanism determines one or more associated valid candidates of attribute values of the given attribute using the set of records. For each record of the subset of records the mechanism replaces the one or more invalid attribute values by one or more combinations of the determined valid candidates of attribute values, resulting in a modified set of records. The mechanism selects a subset of records of the modified set of records that satisfy a consistency condition on the attribute values of each record. The mechanism deduplicates the selected subset of records of the modified set of records responsive to determining the subset of records comprises more than one record.Type: GrantFiled: November 28, 2017Date of Patent: January 7, 2020Assignee: International Business Machines CorporationInventors: Namit Kabra, Yannick Saillet
-
Publication number: 20190377715Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.Type: ApplicationFiled: August 22, 2019Publication date: December 12, 2019Inventors: Namit Kabra, Yannick Saillet