Patents by Inventor Yannick Saillet

Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Dynamically modifying the parallelism of a task in a pipeline

Patent number: 11461135

Abstract: In an approach to dynamically identifying and modifying the parallelism of a particular task in a pipeline, the optimal execution time of each stage in a dynamic pipeline is calculated. The actual execution time of each stage in the dynamic pipeline is measured. Whether the actual time of completion of the data processing job will exceed a threshold is determined. If it is determined that the actual time of completion of the data processing job will exceed the threshold, then additional instances of the stages are created.
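The scaling decision in this abstract can be illustrated with a toy sketch. It is not the patented implementation. It assumes a streaming pipeline whose throughput is set by its slowest stage, a roughly linear speedup when a stage gets more instances, and a known deadline as the threshold. The per-stage "optimal execution time" from the abstract is not modeled; only measured times are used. The names `Stage`, `projected_finish`, `rebalance` and `max_instances` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    secs_per_record: float  # measured (actual) time per record for one instance
    instances: int = 1

    @property
    def effective_secs(self) -> float:
        # Assumption: parallel instances split the load roughly evenly.
        return self.secs_per_record / self.instances


def projected_finish(stages, records_left, elapsed):
    # A streaming pipeline advances at the pace of its slowest stage.
    return elapsed + records_left * max(s.effective_secs for s in stages)


def rebalance(stages, records_left, elapsed, deadline, max_instances=8):
    """Add instances to the bottleneck stage until the projection meets the deadline."""
    while projected_finish(stages, records_left, elapsed) > deadline:
        slowest = max(stages, key=lambda s: s.effective_secs)
        if slowest.instances >= max_instances:
            break  # cannot scale further; the job will overrun the threshold
        slowest.instances += 1
    return stages
```

With stages taking 0.01, 0.05 and 0.02 seconds per record, 1,000 records left and a 30-second deadline, the projection starts at about 50 seconds, so the sketch gives the middle stage a second instance and the projection drops to about 25 seconds.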

Type: Grant

Filed: October 25, 2019

Date of Patent: October 4, 2022

Assignee: International Business Machines Corporation

Inventors: Yannick Saillet, Namit Kabra, Ritesh Kumar Gupta
STANDARDIZATION IN THE CONTEXT OF DATA INTEGRATION

Publication number: 20220277017

Abstract: Techniques are described relating to automatic data standardization in a managed services domain of a cloud computing environment. An associated computer-implemented method includes receiving a dataset during a data onboarding procedure and classifying datapoints within the dataset. The method further includes applying a machine learning data standardization model to each classified datapoint within the dataset and deriving a proposed set of data standardization rules for the dataset based upon any standardization modification determined consequent to model application. Optionally, the method includes presenting the proposed set of data standardization rules for client review and, responsive to acceptance of the proposed set of data standardization rules, applying the proposed set of data standardization rules to the dataset. The method further includes, responsive to acceptance of the proposed set of data standardization rules, updating the machine learning data standardization model accordingly.

Type: Application

Filed: February 24, 2021

Publication date: September 1, 2022

Inventors: Namit Kabra, Krishna Kishore Bonagiri, Mike W. Grasselt, Yannick Saillet
Cognitive recommendations for data preparation

Patent number: 11429878

Abstract: A method, computer system, and computer program product for providing recommendations about processing datasets. A set of machine learning models are provided for use in respectively determining data processing actions performable on a dataset based on a respective set of features of the dataset. A current dataset is received. A set of features of the current dataset are determined. One or more data processing actions are generated to be executed on the current dataset, which are determined by at least two machine learning models of the provided set, based on the determined set of features of the current dataset. One or more of the data processing actions are performed on the current dataset.

Type: Grant

Filed: September 22, 2017

Date of Patent: August 30, 2022

Assignee: International Business Machines Corporation

Inventors: Yannick Saillet, Martin A. Oberhofer, Jens P. Seifert
Data standardization rules generation

Patent number: 11397855

Abstract: A method for generating data standardization rules includes receiving a training data set containing tokenized and tagged data values. A set of machine mining models is built using different learning algorithms for identifying tags and tag patterns using the training set. For each data value in a further data set: a tokenization is applied on the data value, resulting in a set of tokens. For each token of the set of tokens one or more tag candidates are determined using a lookup dictionary of tags and tokens and/or at least part of the set of machine mining models, resulting for each token of the set of tokens in a list of possible tags. Unique combinations of the sets of tags of the further data set having highest aggregated confidence values are provided for use as standardization rules.
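The tagging step can be sketched in a simplified form. The lookup dictionary `LOOKUP` and its confidences are invented for illustration. The digit check stands in for the trained mining models. A value's confidence is taken as the product of its token confidences, and patterns are aggregated across the data set by summing. These are assumptions, not the method claimed in the patent.

```python
import itertools
import re
from collections import defaultdict

# Hypothetical lookup dictionary: token -> {tag: confidence}.
LOOKUP = {
    "st": {"STREET_TYPE": 0.9, "SAINT": 0.4},
    "main": {"STREET_NAME": 0.8},
    "ave": {"STREET_TYPE": 0.95},
}


def tag_candidates(token):
    if token in LOOKUP:
        return LOOKUP[token]
    if token.isdigit():
        return {"HOUSE_NUMBER": 0.7}  # stand-in for a trained model's prediction
    return {"UNKNOWN": 0.1}


def best_pattern(value):
    """Tokenize a value and pick the tag combination with the highest confidence."""
    tokens = re.findall(r"\w+", value.lower())
    candidates = [list(tag_candidates(t).items()) for t in tokens]
    best, best_score = None, -1.0
    for combo in itertools.product(*candidates):
        score = 1.0
        for _, conf in combo:
            score *= conf
        if score > best_score:
            best, best_score = tuple(tag for tag, _ in combo), score
    return best, best_score


def propose_rules(values, top_n=3):
    """Aggregate confidences per unique tag pattern; the top patterns become rule proposals."""
    totals = defaultdict(float)
    for v in values:
        pattern, score = best_pattern(v)
        totals[pattern] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

For "12 Main St" the sketch prefers `STREET_TYPE` over `SAINT` for "st" because that combination has the higher product of confidences.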

Type: Grant

Filed: December 12, 2017

Date of Patent: July 26, 2022

Assignee: International Business Machines Corporation

Inventors: Yannick Saillet, Martin Oberhofer, Namit Kabra
Mobile device based VR content control

Patent number: 11393171

Abstract: Aspects of the present disclosure relate to controlling virtual reality (VR) content displayed on a VR head mounted display (HMD). Communication can be established between a computer system, a VR HMD, and a mobile device. A user input configured to control VR content displayed on a display of the VR HMD can be received on the mobile device. The VR content displayed on the VR HMD can then be controlled based on the user input received on the mobile device.

Type: Grant

Filed: July 21, 2020

Date of Patent: July 19, 2022

Assignee: International Business Machines Corporation

Inventors: Namit Kabra, Smitkumar Narotambhai Marvaniya, Yannick Saillet, Kunjavihari Madhav Kashalikar
Data classification

Patent number: 11366843

Abstract: The invention relates to a computer-implemented method for classifying a set of data values. For each of the data values of the set of data values, a set of one or more terms associated with the respective data value is determined using one or more first knowledge bases. A set of common terms is determined. The set of common terms comprises terms present in more than one of the sets of terms. For each of the common terms, a number of hits for a lookup query against one or more second knowledge bases is determined. One or more common terms of the set of common terms with the smallest number of hits are determined and a result is returned. The result comprises the one or more common terms with the smallest number of hits as one or more candidate classes for classifying the set of data values.
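The classification logic can be sketched with both knowledge bases modeled as in-memory lookups. This is a simplification under stated assumptions: `first_kb` maps a value to its terms, and `hit_count` stands in for the lookup query against the second knowledge base. Reading fewer hits as "more specific term" is an interpretation, and all names are hypothetical.

```python
from collections import Counter


def candidate_classes(values, first_kb, hit_count):
    """first_kb: value -> set of terms; hit_count: term -> number of lookup hits."""
    term_sets = [first_kb.get(v, set()) for v in values]
    # A term is "common" if it is associated with more than one data value.
    occurrences = Counter(t for terms in term_sets for t in terms)
    common = [t for t, n in occurrences.items() if n > 1]
    if not common:
        return []
    # Keep the common term(s) with the fewest hits in the second knowledge base.
    hits = {t: hit_count(t) for t in common}
    fewest = min(hits.values())
    return sorted(t for t, h in hits.items() if h == fewest)
```

With "Paris", "Berlin" and "Rome" all linked to "city", "capital" and "place", and "capital" having the fewest hits, the sketch returns `["capital"]` as the candidate class.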

Type: Grant

Filed: April 23, 2019

Date of Patent: June 21, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Albert Maier, Martin Oberhofer, Yannick Saillet
Classifying an unmanaged dataset

Patent number: 11354282

Abstract: A computer-implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining, using at least the calculated first similarity score, whether the source dataset is organized as the first reference table in accordance with the reference storage model.
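The similarity score against a joined reference table can be sketched as follows. Several assumptions are made here. The reference storage model is a plain dict of table schemas. Relationships are `(referencing, referenced)` foreign-key pairs, and the join goes only one hop. The score is the share of source columns found in the joined attribute set. The threshold is arbitrary, and the helper names are hypothetical.

```python
def joined_attributes(table, schema, references):
    """Attributes of `table` plus those of the tables it references (one-hop join)."""
    attrs = set(schema[table])
    for referencing, referenced in references:
        if referencing == table:
            attrs |= set(schema[referenced])
    return attrs


def similarity(source_cols, table, schema, references):
    # Share of source columns that also appear in the joined reference view.
    source = {c.lower() for c in source_cols}
    target = {c.lower() for c in joined_attributes(table, schema, references)}
    return len(source & target) / len(source) if source else 0.0


def classify(source_cols, schema, references, threshold=0.6):
    """Return the best-matching reference table (or None) and all scores."""
    scores = {t: similarity(source_cols, t, schema, references) for t in schema}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores
```

Without the join, an unmanaged file with columns "Name", "Street" and "City" would match neither `customer` nor `address` completely. Joining `customer` with the `address` table it references covers all three columns, so the file is classified as `customer`.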

Type: Grant

Filed: January 10, 2020

Date of Patent: June 7, 2022

Assignee: International Business Machines Corporation

Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
Efficiently finding potential duplicate values in data

Patent number: 11334603

Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.
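The bigram bitmap idea can be sketched in Python, using one integer per value as its bitmap. The pairwise score here is a Jaccard ratio (common bigrams over all bigrams in either value), computed with bitwise AND and OR. That ratio, the 0.5 threshold, and the union-find grouping are assumptions, not details taken from the patent.

```python
from itertools import combinations


def bigrams(value):
    v = value.lower()
    return {v[i:i + 2] for i in range(len(v) - 1)}


def duplicate_groups(values, min_similarity=0.5):
    sets = [bigrams(v) for v in values]
    vocab = sorted(set().union(*sets))
    index = {bg: i for i, bg in enumerate(vocab)}
    # One int per value acts as a bitmap: bit i is set if bigram i is present.
    masks = [sum(1 << index[bg] for bg in s) for s in sets]

    parent = list(range(len(values)))  # union-find to merge similar pairs into groups

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(values)), 2):
        common = bin(masks[i] & masks[j]).count("1")  # shared bigrams
        total = bin(masks[i] | masks[j]).count("1")   # bigrams in either value
        if total and common / total >= min_similarity:
            parent[find(i)] = find(j)

    groups = {}
    for i, v in enumerate(values):
        groups.setdefault(find(i), []).append(v)
    return [g for g in groups.values() if len(g) > 1]
```

"Jonathan" and "Jonathon" share 5 of 8 distinct bigrams, so they end up in the same group. The pairwise loop remains quadratic; the bitmaps only make each comparison cheap.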

Type: Grant

Filed: February 14, 2020

Date of Patent: May 17, 2022

Assignee: International Business Machines Corporation

Inventors: Namit Kabra, Yannick Saillet
MEASURING DATA QUALITY OF DATA IN A GRAPH DATABASE

Publication number: 20220138512

Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a first graph comprising first nodes representing first entities and first edges representing relationships between first entities, the first nodes being associated with first entity attributes descriptive of the first entities represented by the first nodes, the first edges being associated with first edge attributes descriptive of the relationships represented by the first edges; determining a first subgraph for a certain node of the first nodes of the first graph, the first subgraph including the certain node and at least one neighboring node of the certain node; and determining a data quality issue regarding the certain node based, at least in part, on applying one or more applicable rules of a set of data quality rules to first entity attribute values and first edge attribute values of the first subgraph.
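A minimal sketch of rule checking on a node's neighborhood subgraph follows. It assumes an in-memory graph, takes the subgraph to be the node plus its direct neighbors, and represents rules as plain functions. The `salary_below_manager` rule is a hypothetical example of a rule that needs both node and edge attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: list = field(default_factory=list)  # (src, dst, attribute dict)

    def subgraph(self, node_id):
        """The node, its direct neighbors, and the edges touching it."""
        edges = [e for e in self.edges if node_id in (e[0], e[1])]
        ids = {node_id} | {e[0] for e in edges} | {e[1] for e in edges}
        return {i: self.nodes[i] for i in ids}, edges


def check_node(graph, node_id, rules):
    """Apply every rule to the node's subgraph and collect reported issues."""
    nodes, edges = graph.subgraph(node_id)
    issues = []
    for rule in rules:
        msg = rule(node_id, nodes, edges)
        if msg:
            issues.append(msg)
    return issues


# Hypothetical rule: an employee should not earn more than their manager.
def salary_below_manager(node_id, nodes, edges):
    for src, dst, attrs in edges:
        if src == node_id and attrs.get("type") == "reports_to":
            if nodes[src].get("salary", 0) > nodes[dst].get("salary", 0):
                return f"{node_id} earns more than manager {dst}"
    return None
```

This rule cannot be evaluated on a single node's attributes. It needs the neighboring node and the connecting edge, which is why the check operates on the subgraph.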

Type: Application

Filed: October 29, 2020

Publication date: May 5, 2022

Inventors: Yannick Saillet, Claudio Andrea Fanconi, Martin Oberhofer, Hemanth Kumar Babu, Basem Elasioty, Mike W. Grasselt, Robert Kern, Thuany Karoline Stuart
MASKING SENSITIVE INFORMATION IN A DOCUMENT

Publication number: 20220123935

Abstract: The exemplary embodiments disclose a method, a computer program product, and a computer system for protecting sensitive information. The exemplary embodiments may include using an inverted text index for evaluating one or more statistical measures of an index token of the inverted text index, using the one or more statistical measures for selecting a set of candidate tokens, extracting metadata from the inverted text index, associating the set of candidate tokens with respective token metadata, tokenizing at least one document resulting in one or more document tokens, comparing the one or more document tokens with the set of candidate tokens, selecting a set of document tokens to be masked, selecting at least part of the set of document tokens that comprises sensitive information according to the associated token metadata, masking the at least part of the set of document tokens, and providing one or more masked documents.
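The candidate-selection and masking steps can be sketched as follows. The inverted index is modeled as token to set of document ids. The statistical measure is document frequency: rare tokens are treated as potentially identifying. `token_meta` stands in for metadata kept alongside the index. The threshold and all names are assumptions, not the method in the application.

```python
import re


def select_candidates(index, max_doc_freq=2):
    """Tokens that appear in few documents are more likely to be identifying."""
    return {tok for tok, postings in index.items() if len(postings) <= max_doc_freq}


def mask_document(text, index, token_meta, sensitive_classes, max_doc_freq=2):
    candidates = select_candidates(index, max_doc_freq)

    def repl(match):
        tok = match.group(0)
        key = tok.lower()
        # Mask only candidates whose metadata marks them as sensitive.
        if key in candidates and token_meta.get(key) in sensitive_classes:
            return "*" * len(tok)
        return tok

    return re.sub(r"\w+", repl, text)
```

In the test data, "invoice" is rare enough to be a candidate, but its metadata class is not sensitive, so only the person name is masked.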

Type: Application

Filed: October 19, 2020

Publication date: April 21, 2022

Inventors: Michael Baessler, Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer
PROTECTING SENSITIVE DATA IN DOCUMENTS

Publication number: 20220100899

Abstract: In an approach, a processor receives a request of a document. A processor identifies a set of datasets comprising a sensitive dataset, the set of datasets being interrelated in accordance with a relational model. A processor extracts attribute values of the document. A processor determines that a set of one or more attribute values of the extracted attribute values is in the set of datasets, the set of attribute values being values of a set of attributes. A processor determines that one or more entities of the sensitive dataset can be identified based on relations of the relational model between the set of attributes, where at least part of attribute values of the one or more entities comprises sensitive information. A processor, responsive to determining that the one or more entities can be identified, masks at least part of the set of one or more attribute values in the document.

Type: Application

Filed: September 25, 2020

Publication date: March 31, 2022

Inventors: Yannick Saillet, Albert Maier, Mike W. Grasselt, Michael Baessler, Lars Bremer
METHOD FOR CLASSIFYING AN UNMANAGED DATASET

Publication number: 20220075762

Abstract: A computer-implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining, using at least the calculated first similarity score, whether the source dataset is organized as the first reference table in accordance with the reference storage model.

Type: Application

Filed: November 16, 2021

Publication date: March 10, 2022

Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
Computing the need for standardization of a set of values

Patent number: 11243924

Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication of whether or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.
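The decision flow in this abstract can be sketched as follows. An explicit metadata flag, if present, fixes the score; otherwise a score is computed and compared to a threshold. The `standardize` metadata key, the `normalize` rule, the scoring formula (the share of distinct spellings that the rule would merge), and the 0.2 threshold are all hypothetical choices.

```python
import re


def normalize(value):
    """Hypothetical standardization rule: trim, collapse whitespace, upper-case."""
    return re.sub(r"\s+", " ", value.strip()).upper()


def standardization_score(values, metadata):
    flag = metadata.get("standardize")  # explicit indication in attribute metadata
    if flag is True:
        return 1.0
    if flag is False:
        return 0.0
    if not values:
        return 0.0
    # Fraction of distinct spellings that collapse when the rule is applied.
    before = len(set(values))
    after = len({normalize(v) for v in values})
    return 1 - after / before


def needs_standardization(values, metadata, threshold=0.2):
    return standardization_score(values, metadata) >= threshold
```

For "New York", "new york", " NEW  YORK" and "Boston", four distinct spellings collapse to two, which gives a score of 0.5.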

Type: Grant

Filed: August 22, 2019

Date of Patent: February 8, 2022

Assignee: International Business Machines Corporation

Inventors: Namit Kabra, Yannick Saillet
Computing the need for standardization of a set of values

Patent number: 11243923

Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication of whether or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.

Type: Grant

Filed: August 22, 2019

Date of Patent: February 8, 2022

Assignee: International Business Machines Corporation

Inventors: Namit Kabra, Yannick Saillet
RESOURCE AVAILABILITY-BASED WORKFLOW EXECUTION TIMING DETERMINATION

Publication number: 20220035667

Abstract: According to a computer-implemented method, an available amount of each of multiple computing resources is determined by machine logic over a period of time at a computing device. The machine logic also determines an expected usage of each computing resource to execute each workflow in a queue. The machine logic also determines a time of execution of each workflow in the queue based on the available amount of each of the multiple computing resources over time and the expected usage of each computing resource to execute each workflow in the queue.
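A simple greedy scheduler can illustrate the timing decision. It assumes the resource forecast is given as free capacity per discrete time slot and that each queued workflow has a fixed per-slot usage and duration. Each workflow, in queue order, gets the earliest window where every resource has enough room. The greedy strategy and the data layout are assumptions, not the claimed method.

```python
def schedule(available, queue):
    """available: resource -> list of free capacity per time slot.
    queue: list of (name, usage dict, duration in slots), in priority order."""
    free = {r: list(slots) for r, slots in available.items()}
    horizon = min(len(s) for s in free.values())
    plan = {}
    for name, usage, duration in queue:
        for start in range(horizon - duration + 1):
            window = range(start, start + duration)
            # The workflow fits only if every resource has room in every slot.
            if all(free[r][t] >= amount for r, amount in usage.items() for t in window):
                for r, amount in usage.items():
                    for t in window:
                        free[r][t] -= amount  # reserve capacity
                plan[name] = start
                break
        else:
            plan[name] = None  # does not fit within the forecast horizon
    return plan
```

In the test data, the "report" workflow cannot start until slot 2 because "etl" holds most of the CPU in slots 0 and 1. The CPU-heavy "train" workflow waits for the larger capacity forecast in slot 3.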

Type: Application

Filed: October 19, 2021

Publication date: February 3, 2022

Inventors: Yannick Saillet, Namit Kabra
MOBILE DEVICE BASED VR CONTROL

Publication number: 20220028168

Abstract: Aspects of the present disclosure relate to controlling virtual reality (VR) content displayed on a VR head mounted display (HMD). Communication can be established between a computer system, a VR HMD, and a mobile device. A user input configured to control VR content displayed on a display of the VR HMD can be received on the mobile device. The VR content displayed on the VR HMD can then be controlled based on the user input received on the mobile device.

Type: Application

Filed: July 21, 2020

Publication date: January 27, 2022

Inventors: Namit Kabra, Smitkumar Narotambhai Marvaniya, Yannick Saillet, Kunjavihari Madhav Kashalikar
Data quality evaluation

Patent number: 11200215

Abstract: Methods and systems for data quality evaluation are disclosed. A method includes: receiving, by a computing device, at least one data set and a list of rule expressions to bind; building, by the computing device, candidate binding combinations between columns of the at least one data set and variables of each rule expression in the list of rule expressions; building, by the computing device, a new bound rule expression candidate based on the candidate binding combinations; generating, by the computing device, a new bound rule expression based on the new bound rule expression candidate and a data transformation applied to at least one of the columns of the at least one data set; and storing, by the computing device, the new bound rule expression.
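The binding step can be sketched by enumerating column-to-variable assignments. An assignment is kept when each column's type matches the variable's expected type, or can be converted to it by a known transformation. The `{var}` placeholder syntax, the type names, and the `TRANSFORMS` table with its `to_number` function are hypothetical.

```python
import itertools
import re

# Hypothetical conversions that make a column fit a variable's expected type.
TRANSFORMS = {("string", "number"): "to_number({col})"}


def bind_rule(expression, var_types, columns):
    """expression: e.g. '{age} >= 18'; var_types: variable -> expected type;
    columns: column name -> actual type. Returns candidate bound expressions."""
    variables = list(dict.fromkeys(re.findall(r"\{(\w+)\}", expression)))
    bound = []
    for combo in itertools.permutations(columns, len(variables)):
        mapping = {}
        for var, col in zip(variables, combo):
            have, want = columns[col], var_types[var]
            if have == want:
                mapping[var] = col
            elif (have, want) in TRANSFORMS:
                mapping[var] = TRANSFORMS[(have, want)].format(col=col)
            else:
                break  # incompatible column for this variable
        else:
            bound.append(expression.format(**mapping))
    return bound
```

A text column such as `age_txt` is still bound to the numeric `age` variable, but only through a transformation. This corresponds to the abstract's "data transformation applied to at least one of the columns".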

Type: Grant

Filed: January 30, 2020

Date of Patent: December 14, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kunjavihari Madhav Kashalikar, Yannick Saillet, Ketki Ramesh Purandare
SORTING DATA ELEMENTS OF A GIVEN SET OF DATA ELEMENTS

Publication number: 20210357183

Abstract: A computer implemented method is used for sorting data elements of a given set. The method includes performing an evaluation of a first type of usage of each data element. The method includes determining a set of data element candidates dependent on the evaluation of the first type of usage. The method includes performing an evaluation of a second type of usage of each data element of the set of data element candidates. The method includes sorting the data elements of the set of data element candidates dependent on the evaluation of the second type of usage of each data element of the set of data element candidates. The method includes providing the sorted data elements of the set of data element candidates, and in response, receiving a request for a data processing based on the provided sorted data elements of the set of data element candidates.
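The two-stage evaluation reduces to a filter followed by a sort. In this sketch, the first usage type is assumed to be a query count used to select candidates, and the second is assumed to be a count of distinct users used for ranking. Both metrics, the `min_first` cutoff, and the function name are hypothetical.

```python
def sort_elements(elements, first_usage, second_usage, min_first=1):
    """Keep elements with enough first-type usage, then rank them by second-type usage."""
    candidates = [e for e in elements if first_usage.get(e, 0) >= min_first]
    return sorted(candidates, key=lambda e: second_usage.get(e, 0), reverse=True)
```

In this sketch, an element that is queried often but by few users ranks below one used by many people, as long as both pass the first filter.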

Type: Application

Filed: May 18, 2020

Publication date: November 18, 2021

Inventors: Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer, Michael Baessler
DATA QUALITY ASSESSMENT FOR DATA ANALYTICS

Publication number: 20210357699

Abstract: The invention relates to an approach for data quality assessment for data analytics, the approach comprising providing a data set, the data set comprising multiple data fields, predicting by a first trained machine learning model at least one usage type of the data set using characteristics of the data fields as input, for each usage type of the at least one usage type, determining a usage-specific data quality score of each of the predicted usage types, and using the data set based on the at least one usage type and associated data quality score.

Type: Application

Filed: May 14, 2020

Publication date: November 18, 2021

Inventors: Yannick Saillet, Mike W. Grasselt, Namit Kabra, Krishna Kishore Bonagiri
Resource availability-based workflow execution timing determination

Patent number: 11175951

Abstract: According to a computer-implemented method, an available amount of each of multiple computing resources is determined by machine logic over a period of time at a computing device. The machine logic also determines an expected usage of each computing resource to execute each workflow in a queue. The machine logic also determines a time of execution of each workflow in the queue based on the available amount of each of the multiple computing resources over time and the expected usage of each computing resource to execute each workflow in the queue.

Type: Grant

Filed: May 29, 2019

Date of Patent: November 16, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yannick Saillet, Namit Kabra
