Patents by Inventor Yannick Saillet
Yannick Saillet has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11461135Abstract: In an approach to dynamically identifying and modifying the parallelism of a particular task in a pipeline, the optimal execution time of each stage in a dynamic pipeline is calculated. The actual execution time of each stage in the dynamic pipeline is measured. Whether the actual time of completion of the data processing job will exceed a threshold is determined. If it is determined that the actual time of completion of the data processing job will exceed the threshold, then additional instances of the stages are created.Type: GrantFiled: October 25, 2019Date of Patent: October 4, 2022Assignee: International Business Machines CorporationInventors: Yannick Saillet, Namit Kabra, Ritesh Kumar Gupta
-
Publication number: 20220277017Abstract: Techniques are described relating to automatic data standardization in a managed services domain of a cloud computing environment. An associated computer-implemented method includes receiving a dataset during a data onboarding procedure and classifying datapoints within the dataset. The method further includes applying a machine learning data standardization model to each classified datapoint within the dataset and deriving a proposed set of data standardization rules for the dataset based upon any standardization modification determined consequent to model application. Optionally, the method includes presenting the proposed set of data standardization rules for client review and, responsive to acceptance of the proposed set of data standardization rules, applying the proposed set of data standardization rules to the dataset. The method further includes, responsive to acceptance of the proposed set of data standardization rules, updating the machine learning data standardization model accordingly.Type: ApplicationFiled: February 24, 2021Publication date: September 1, 2022Inventors: Namit Kabra, Krishna Kishore Bonagiri, Mike W. Grasselt, Yannick Saillet
-
Patent number: 11429878Abstract: A method, computer system, and computer program product for providing recommendations about processing datasets. A set of machine learning models are provided for use in respectively determining data processing action performable on a dataset based on a respective set of features of the dataset. A current dataset is received. A set of features of the current dataset are determined. One or more data processing actions are generated to be executed on the current dataset, which are determined by at least two machine learning models of the provided set, based on the determined set of features of the current dataset. One or more of the data processing actions are performed on the current dataset.Type: GrantFiled: September 22, 2017Date of Patent: August 30, 2022Assignee: International Business Machines CorporationInventors: Yannick Saillet, Martin A. Oberhofer, Jens P. Seifert
-
Patent number: 11397855Abstract: A method for generating data standardization rules includes receiving a training data set containing tokenized and tagged data values. A set of machine mining models is built using different learning algorithms for identifying tags and tag patterns using the training set. For each data value in a further data set: a tokenization is applied on the data value, resulting in a set of tokens. For each token of the set of tokens one or more tag candidates are determined using a lookup dictionary of tags and tokens and/or at least part of the set of machine mining models, resulting for each token of the set of tokens in a list of possible tags. Unique combinations of the sets of tags of the further data set having highest aggregated confidence values are provided for use as standardization rules.Type: GrantFiled: December 12, 2017Date of Patent: July 26, 2022Assignee: International Business Machines CorporationInventors: Yannick Saillet, Martin Oberhofer, Namit Kabra
-
Patent number: 11393171Abstract: Aspects of the present disclosure relate to controlling virtual reality (VR) content displayed on a VR head mounted display (HMD). Communication can be established between a computer system, a VR HMD, and a mobile device. A user input configured to control VR content displayed on a display of the VR HMD can be received on the mobile device. The VR content displayed on the VR HMD can then be controlled based on the user input received on the mobile device.Type: GrantFiled: July 21, 2020Date of Patent: July 19, 2022Assignee: International Business Machines CorporationInventors: Namit Kabra, Smitkumar Narotambhai Marvaniya, Yannick Saillet, Kunjavihari Madhav Kashalikar
-
Patent number: 11366843Abstract: The invention relates to a computer-implemented method for classifying a set of data values. For each of the data values of the set of data values, a set of one or more terms associated with the respective data value is determined using one or more first knowledge bases. A set of common terms is determined. The set of common terms comprises terms present in more than one of the sets of terms. For each of the common terms, a number of hits for a lookup query against one or more second knowledge data bases is determined. One or more common terms of the set of common terms with the smallest number of hits are determined and a result is returned. The result comprises the one or more common terms with the smallest number of hits as one or more candidate classes for classifying the set of data values.Type: GrantFiled: April 23, 2019Date of Patent: June 21, 2022Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Albert Maier, Martin Oberhofer, Yannick Saillet
-
Patent number: 11354282Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining using at least the calculated first similarity score whether the source dataset is organized as the first reference table in accordance to the reference storage model.Type: GrantFiled: January 10, 2020Date of Patent: June 7, 2022Assignee: International Business Machinos CorporationInventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
-
Patent number: 11334603Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.Type: GrantFiled: February 14, 2020Date of Patent: May 17, 2022Assignee: International Business Machines CorporationInventors: Namit Kabra, Yannick Saillet
-
Publication number: 20220138512Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a first graph comprising first nodes representing first entities and first edges representing relationships between first entities, the first nodes being associated with first entity attributes descriptive of the first entities represented by the first nodes, the first edges being associated with first edge attributes descriptive of the relationships represented by the first edges; determining a first subgraph for a certain node of the first nodes of the first graph, the first subgraph including the certain node and at least one neighboring node of the certain node; and determining a data quality issue regarding the certain node based, at least in part, on applying one or more applicable rules of a set of data quality rules to first entity attribute values and first edge attribute values of the first subgraph.Type: ApplicationFiled: October 29, 2020Publication date: May 5, 2022Inventors: Yannick Saillet, Claudio Andrea Fanconi, Martin Oberhofer, Hemanth Kumar Babu, Basem Elasioty, Mike W. Grasselt, Robert Kern, Thuany Karoline Stuart
-
Publication number: 20220123935Abstract: The exemplary embodiments disclose a method, a computer program product, and a computer system for protecting sensitive information. The exemplary embodiments may include using an inverted text index for evaluating one or more statistical measures of an index token of the inverted text index, using the one or more statistical measures for selecting a set of candidate tokens, extracting metadata from the inverted text index, associating the set of candidate tokens with respective token metadata, tokenizing at least one document resulting in one or more document tokens, comparing the one or more document tokens with the set of candidate tokens, selecting a set of document tokens to be masked, selecting at least part of the set of document tokens that comprises sensitive information according to the associated token metadata, masking the at least part of the set of document tokens, and providing one or more masked documents.Type: ApplicationFiled: October 19, 2020Publication date: April 21, 2022Inventors: Michael Baessler, Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer
-
Publication number: 20220100899Abstract: In an approach, a processor receives a request of a document. A processor identifies a set of datasets comprising a sensitive dataset, the set of datasets being interrelated in accordance with a relational model. A processor extracts attribute values of the document. A processor determines that a set of one or more attribute values of the extracted attribute values is in the set of datasets, the set of attribute values being values of a set of attributes. A processor determines that one or more entities of the sensitive dataset can be identified based on relations of the relational model between the set of attributes, where at least part of attribute values of the one or more entities comprises sensitive information. A processor, responsive to determining that the one or more entities can be identified, masks at least part of the set of one or more attribute values in the document.Type: ApplicationFiled: September 25, 2020Publication date: March 31, 2022Inventors: Yannick Saillet, Albert Maier, Mike W. Grasselt, Michael Baessler, Lars Bremer
-
Publication number: 20220075762Abstract: A computer implemented method for classifying at least one source dataset of a computer system. The method may include providing a plurality of associated reference tables organized and associated in accordance with a reference storage model in the computer system. The method may also include calculating, by a data classifier application of the computer system, a first similarity score between the source dataset and a first reference table of the reference tables based on common attributes in the source dataset and a join of the first reference table with at least one further reference table of the reference tables having a relationship with the first reference table. The method may further include classifying, by the data classifier application, the source dataset by determining using at least the calculated first similarity score whether the source dataset is organized as the first reference table in accordance to the reference storage model.Type: ApplicationFiled: November 16, 2021Publication date: March 10, 2022Inventors: Martin Oberhofer, Adapala S. Reddy, Yannick Saillet, Jens Seifert
-
Patent number: 11243924Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.Type: GrantFiled: August 22, 2019Date of Patent: February 8, 2022Assignee: International Business Machines CorporationInventors: Namit Kabra, Yannick Saillet
-
Patent number: 11243923Abstract: A method, system and computer program product for determining a data standardization score for an attribute of a dataset. A data standardization score is calculated, which reflects whether data quality of attribute values would increase if a standardization rule is applied to the attribute values. Based on attribute metadata, it may be determined whether an indication to carry or not to carry out standardization is available for at least part of the attribute values of the dataset. In response to finding the indication, a respective value may be set for the data standardization score. In response to not finding the indication, a data standardization score algorithm may be run on the at least part of the attribute values of the dataset. The data standardization score value may be compared to a predefined criterion to determine whether data standardization is to be applied on the attribute.Type: GrantFiled: August 22, 2019Date of Patent: February 8, 2022Assignee: International Business Machines CorporationInventors: Namit Kabra, Yannick Saillet
-
Publication number: 20220035667Abstract: According to a computer-implemented method, an available amount of each of multiple computing resources is determined by machine logic over a period of time at a computing device. The machine logic also determines an expected usage of each computing resource to execute each workflow in a queue. The machine logic also determines a time of execution of each workflow in the queue based on the available amount of each of the multiple computing resources over time and the expected usage of each computing resource to execute each workflow in the queue.Type: ApplicationFiled: October 19, 2021Publication date: February 3, 2022Inventors: Yannick Saillet, Namit Kabra
-
Publication number: 20220028168Abstract: Aspects of the present disclosure relate to controlling virtual reality (VR) content displayed on a VR head mounted display (HMD). Communication can be established between a computer system, a VR HMD, and a mobile device. A user input configured to control VR content displayed on a display of the VR HMD can be received on the mobile device. The VR content displayed on the VR HMD can then be controlled based on the user input received on the mobile device.Type: ApplicationFiled: July 21, 2020Publication date: January 27, 2022Inventors: Namit Kabra, Smitkumar Narotambhai Marvaniya, Yannick Saillet, Kunjavihari Madhav Kashalikar
-
Patent number: 11200215Abstract: Methods and systems for data quality evaluation are disclosed. A method includes: receiving, by a computing device, at least one data set and a list of rule expressions to bind; building, by the computing device, candidate binding combinations between columns of the at least one data set and variables of each rule expression in the list of rule expressions; building, by the computing device, a new bound rule expression candidate based on the candidate binding combinations; generating, by the computing device, a new bound rule expression based on the new bound rule expression candidate and a data transformation applied to at least one of the columns of the at least one data set; and storing, by the computing device, the new bound rule expression.Type: GrantFiled: January 30, 2020Date of Patent: December 14, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Kunjavihari Madhav Kashalikar, Yannick Saillet, Ketki Ramesh Purandare
-
Publication number: 20210357183Abstract: A computer implemented method is used for sorting data elements of a given set. The method includes performing an evaluation of a first type of usage of each data element. The method includes determining a set of data element candidates dependent on the evaluation of the first type of usage. The method includes performing an evaluation of a second type of usage of each data element of the set of data element candidates. The method includes sorting the data elements of the set of data element candidates dependent on the evaluation of the second type of usage of each data element of the set of data element candidates. The method includes providing the sorted data elements of the set of data element candidates, and in response, receiving a request for a data processing based on the provided sorted data elements of the set of data element candidates.Type: ApplicationFiled: May 18, 2020Publication date: November 18, 2021Inventors: Albert Maier, Mike W. Grasselt, Yannick Saillet, Lars Bremer, Michael Baessler
-
Publication number: 20210357699Abstract: The invention relates to an approach for data quality assessment for data analytics, the approach comprising providing a data set, the data set comprising multiple data fields, predicting by a first trained machine learning model at least one usage type of the data set using characteristics of the data fields as input, for each usage type of the at least one usage type, determining a usage specific data quality score of each of the predicted usage types, and using of the data set based on the at least one usage type and associated data quality score.Type: ApplicationFiled: May 14, 2020Publication date: November 18, 2021Inventors: Yannick Saillet, Mike W. Grasselt, Namit Kabra, Krishna Kishore Bonagiri
-
Patent number: 11175951Abstract: According to a computer-implemented method, an available amount of each of multiple computing resources is determined by machine logic over a period of time at a computing device. The machine logic also determines an expected usage of each computing resource to execute each workflow in a queue. The machine logic also determines a time of execution of each workflow in the queue based on the available amount of each of the multiple computing resources over time and the expected usage of each computing resource to execute each workflow in the queue.Type: GrantFiled: May 29, 2019Date of Patent: November 16, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Yannick Saillet, Namit Kabra