Patents by Inventor Horst Cornelius Samulowitz

Horst Cornelius Samulowitz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DATA AUGMENTATION USING SEMANTIC TRANSFORMS

Publication number: 20240144084

Abstract: A method of data augmentation includes receiving, by a processor, a set of data including a plurality of variables, mapping each variable to one or more target concepts associated with a name of each variable, and acquiring a set of semantic transforms, each semantic transform including a function applied to one or more concepts mapped to a respective variable. The method also includes comparing the one or more target concepts to the one or more concepts of each semantic transform, selecting at least one semantic transform based on the comparing, generating an expression for each selected semantic transform, each expression configured to apply a function of a selected semantic transform to at least one of the plurality of variables, and augmenting the set of data for use in an application by adding each expression to the set of data.

Type: Application

Filed: November 2, 2022

Publication date: May 2, 2024

Inventors: Horst Cornelius Samulowitz, Udayan Khurana, Kavitha Srinivas, Takaaki Tateishi, Ibrahim Abdelaziz, Julian Timothy Dolby
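The select-and-augment loop in this abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical concept lexicon keyed on the underscore-separated tokens of variable names, plus two example transforms. The lexicon, the `SemanticTransform` type, and the transforms are illustrative assumptions, not the claimed method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

@dataclass
class SemanticTransform:
    name: str
    concepts: Set[str]                 # concepts this transform applies to
    func: Callable[[float], float]     # function applied to each matching value

# Hypothetical lexicon mapping variable-name tokens to target concepts.
CONCEPT_LEXICON: Dict[str, Set[str]] = {
    "celsius": {"temperature"},
    "temp": {"temperature"},
    "height": {"length"},
}

TRANSFORMS = [
    SemanticTransform("to_fahrenheit", {"temperature"}, lambda c: c * 9 / 5 + 32),
    SemanticTransform("to_inches", {"length"}, lambda cm: cm / 2.54),
]

def map_to_concepts(variable: str) -> Set[str]:
    """Map a variable to the concepts associated with the tokens of its name."""
    concepts: Set[str] = set()
    for token in variable.lower().split("_"):
        concepts |= CONCEPT_LEXICON.get(token, set())
    return concepts

def augment(data: Dict[str, List[float]],
            transforms: List[SemanticTransform]) -> Dict[str, List[float]]:
    """Return a copy of `data` with one new column per matching (variable, transform) pair."""
    augmented = dict(data)
    for variable, values in data.items():
        targets = map_to_concepts(variable)
        for transform in transforms:
            if targets & transform.concepts:       # compare concepts, select the transform
                expression = f"{transform.name}({variable})"
                augmented[expression] = [transform.func(v) for v in values]
    return augmented

result = augment({"temp_celsius": [0.0, 100.0], "customer_id": [1.0, 2.0]}, TRANSFORMS)
print(sorted(result))
```

Because `customer_id` maps to no concept, no transform is selected for it. Only the temperature column gains a derived expression.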
AUTOMATED TUNING OF HYPERPARAMETERS BASED ON RANKINGS IN A FEDERATED LEARNING ENVIRONMENT

Publication number: 20240144026

Abstract: A computer-implemented method, according to one approach, includes issuing a hyperparameter optimization (HPO) query to a plurality of computing devices. HPO results are received from the plurality of computing devices, and the HPO results include a set of hyperparameter (HP)/rank value pairs. The method further includes computing, based on the set of HP/rank value pairs, a global set of HPs from the HPO results for federated learning (FL) training. An indication of the global set of HPs is output to the plurality of computing devices. A computer program product, according to another approach, includes a computer readable storage medium having program instructions embodied therewith. The program instructions are readable and/or executable by a computer to cause the computer to perform the foregoing method.

Type: Application

Filed: February 28, 2023

Publication date: May 2, 2024

Inventors: Yi Zhou, Parikshit Ram, Theodoros Salonidis, Nathalie Baracaldo Angel, Horst Cornelius Samulowitz, Heiko H. Ludwig
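One simple way to compute a global set of hyperparameters from per-device HP/rank pairs is mean-rank (Borda-style) aggregation. This Python sketch makes two assumptions: lower ranks are better, and only configurations ranked by every device are eligible. The abstract does not specify the aggregation rule, so this is an illustrative choice, not the claimed computation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A hyperparameter configuration as a hashable tuple of (name, value) pairs.
HPConfig = Tuple[Tuple[str, float], ...]

def aggregate_rankings(client_results: List[List[Tuple[HPConfig, int]]]) -> HPConfig:
    """Pick the configuration with the lowest mean rank across all devices."""
    rank_sums: Dict[HPConfig, int] = defaultdict(int)
    counts: Dict[HPConfig, int] = defaultdict(int)
    for results in client_results:
        for config, rank in results:
            rank_sums[config] += rank
            counts[config] += 1
    n_devices = len(client_results)
    # Only configurations ranked by every device have comparable mean ranks.
    eligible = [cfg for cfg in rank_sums if counts[cfg] == n_devices]
    if not eligible:
        raise ValueError("no configuration was ranked by every device")
    return min(eligible, key=lambda cfg: rank_sums[cfg] / counts[cfg])

cfg_a, cfg_b, cfg_c = (("lr", 0.1),), (("lr", 0.01),), (("lr", 0.001),)
device_results = [
    [(cfg_a, 1), (cfg_b, 2), (cfg_c, 3)],
    [(cfg_b, 1), (cfg_a, 2), (cfg_c, 3)],
    [(cfg_b, 1), (cfg_c, 2), (cfg_a, 3)],
]
global_hps = aggregate_rankings(device_results)  # cfg_b has the lowest rank sum (4)
```

Working with ranks rather than raw metric values makes the aggregation insensitive to differences in metric scale across devices.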
Automated time series forecasting pipeline generation

Patent number: 11966340

Abstract: To automate time series forecasting machine learning pipeline generation, a data allocation size of time series data may be determined based on one or more characteristics of a time series data set. The time series data may be allocated for use by candidate machine learning pipelines based on the data allocation size. Features for the time series data may be determined and cached by the candidate machine learning pipelines. Predictions of each of the candidate machine learning pipelines using at least the one or more features may be evaluated. A ranked list of machine learning pipelines may be automatically generated from the candidate machine learning pipelines for time series forecasting based upon evaluating predictions of each of the one or more candidate machine learning pipelines.

Type: Grant

Filed: March 15, 2022

Date of Patent: April 23, 2024

Assignee: International Business Machines Corporation

Inventors: Long Vu, Bei Chen, Xuan-Hong Dang, Peter Daniel Kirchner, Syed Yousaf Shah, Dhavalkumar C. Patel, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Gregory Bramble, Horst Cornelius Samulowitz, Saket K. Sathe, Wesley M. Gifford, Petros Zerfos
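The allocate, featurize, evaluate, and rank steps can be sketched in a few lines of Python. Several details are assumptions made for illustration, and the abstract does not disclose any of them:

- the allocation rule (cover several seasonal cycles, with at least 20 points),
- the three toy forecasters,
- mean absolute error (MAE) as the evaluation metric.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, float, float]  # (last value, mean, slope)

def allocation_size(series_len: int, season: int, min_cycles: int = 4) -> int:
    """Hypothetical rule: cover several seasonal cycles, at least 20 points, never more than available."""
    return min(series_len, max(season * min_cycles, 20))

@lru_cache(maxsize=None)
def compute_features(window: Tuple[float, ...]) -> Features:
    """Cached, so repeated evaluations on the same window reuse the features."""
    slope = (window[-1] - window[0]) / (len(window) - 1)
    return window[-1], sum(window) / len(window), slope

PIPELINES: Dict[str, Callable[[Features, int], List[float]]] = {
    "naive": lambda f, h: [f[0]] * h,
    "mean": lambda f, h: [f[1]] * h,
    "drift": lambda f, h: [f[0] + f[2] * (i + 1) for i in range(h)],
}

def rank_pipelines(series: Sequence[float], season: int, horizon: int) -> List[Tuple[str, float]]:
    size = allocation_size(len(series), season)
    window = tuple(series[-size:])                 # allocate the most recent points
    train, test = window[:-horizon], window[-horizon:]
    feats = compute_features(train)                # shared by every candidate pipeline
    scores = []
    for name, pipeline in PIPELINES.items():
        preds = pipeline(feats, horizon)
        mae = sum(abs(p - y) for p, y in zip(preds, test)) / horizon
        scores.append((name, mae))
    return sorted(scores, key=lambda s: s[1])      # ranked list, lowest MAE first

ranking = rank_pipelines([float(i) for i in range(30)], season=5, horizon=3)
```

On a purely linear series, the drift forecaster reproduces the held-out points exactly, so it ranks first.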
Automatic domain annotation of structured data

Patent number: 11954424

Abstract: A processor may receive structured data. The structured data may include one or more columns and associated column names. The processor may analyze the structured data. Analyzing the structured data may include gathering a requisite set of keywords from the associated column names across all columns and/or a sample of column cells. The processor may access a corpus of documents. Each of the documents in the corpus may be associated with a respective keyword. The processor may search the corpus of documents based on the requisite set of keywords. The processor may summarize one or more documents associated with the requisite set of keywords.

Type: Grant

Filed: May 2, 2022

Date of Patent: April 9, 2024

Assignee: International Business Machines Corporation

Inventors: Horst Cornelius Samulowitz, Kavitha Srinivas
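A toy version of this flow fits in a short Python sketch. It tokenizes column names and a few sample cells into keywords, looks those keywords up in a keyword-indexed corpus, and summarizes each hit. The corpus, the tokenization rule, and the first-sentence "summary" are stand-ins chosen for illustration. A real system would more likely use a proper document index and summarizer.

```python
import re
from typing import Dict, List, Sequence, Set

def tokens(text: str) -> Set[str]:
    """Lowercase word tokens, splitting on underscores and non-word characters."""
    return {t for t in re.split(r"[_\W]+", text.lower()) if t}

def gather_keywords(columns: Dict[str, Sequence[str]], sample_size: int = 3) -> Set[str]:
    """Keywords from every column name plus a small sample of each column's cells."""
    keywords: Set[str] = set()
    for name, cells in columns.items():
        keywords |= tokens(name)
        for cell in list(cells)[:sample_size]:
            keywords |= tokens(str(cell))
    return keywords

def summarize(document: str) -> str:
    """Stand-in summarizer: keep only the first sentence."""
    return document.split(". ")[0].rstrip(".") + "."

def annotate(columns: Dict[str, Sequence[str]], corpus: Dict[str, str]) -> List[str]:
    """Summaries of the corpus documents whose keyword appears in the data."""
    keywords = gather_keywords(columns)
    return [summarize(corpus[k]) for k in sorted(keywords) if k in corpus]

columns = {"patient_id": ["p1", "p2"], "diagnosis_code": ["E11", "I10"]}
corpus = {
    "diagnosis": "Diagnosis codes classify diseases. They are also used in billing.",
    "ticker": "A ticker symbol identifies a traded security.",
}
annotations = annotate(columns, corpus)
```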
Automated machine learning using nearest neighbor recommender systems

Patent number: 11941541

Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.

Type: Grant

Filed: August 10, 2020

Date of Patent: March 26, 2024

Assignee: International Business Machines Corporation

Inventors: Saket Sathe, Gregory Bramble, Horst Cornelius Samulowitz, Charu C. Aggarwal
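The nearest-neighbor recommendation loop can be sketched in Python as shown here. The sketch assumes that each training data set is described by a numeric meta-feature vector. It uses Euclidean distance (`math.dist`, Python 3.8+) as the similarity measure. The meta-features, the `evaluate` callback, and the toy accuracies are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

def top_pipelines(perf: Dict[str, Dict[str, float]],
                  meta: Dict[str, Sequence[float]],
                  test_meta: Sequence[float],
                  k_neighbors: int = 1, n_top: int = 2) -> List[str]:
    """Select the n_top pipelines that did best on the most similar training data sets."""
    neighbors = sorted(meta, key=lambda d: math.dist(meta[d], test_meta))[:k_neighbors]
    pipelines = perf[neighbors[0]]
    mean_acc = {p: sum(perf[d][p] for d in neighbors) / len(neighbors) for p in pipelines}
    return sorted(mean_acc, key=mean_acc.get, reverse=True)[:n_top]

def recommend(perf: Dict[str, Dict[str, float]], candidates: List[str],
              evaluate: Callable[[str], float], test_name: str) -> str:
    """Run each candidate on the test set, store the results as a new row, return the best."""
    results = {p: evaluate(p) for p in candidates}
    perf[test_name] = results  # new row in the performance matrix
    return max(results, key=results.get)

perf = {"d1": {"rf": 0.9, "lr": 0.7, "svm": 0.8},
        "d2": {"rf": 0.6, "lr": 0.85, "svm": 0.75}}
meta = {"d1": (0.9, 0.1), "d2": (0.1, 0.8)}
candidates = top_pipelines(perf, meta, test_meta=(0.85, 0.2))
test_accuracy = {"rf": 0.82, "svm": 0.88}  # hypothetical measured accuracies
best = recommend(perf, candidates, lambda p: test_accuracy[p], "test")
```

Storing the new row means that later queries can use the test data set as a neighbor, so the matrix grows with use.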
MINING CODE EXPRESSIONS FOR DATA ANALYSIS

Publication number: 20240069873

Abstract: Techniques for computer software code analysis are disclosed. One or more data flows are generated, based on analyzing software code using static analysis. A data object is identified in the software code using the one or more data flows, the data object relating to a structured dataset. A correspondence between a code expression in the software code and a characteristic of the structured dataset is identified, based on analyzing one or more reads from and one or more writes to the data object using the one or more data flows. The code expression for the structured dataset is analyzed, based on the correspondence, including at least one of: (i) generating a software code recommendation engine based on the code expression and the structured dataset, or (ii) generating one or more lambda expressions for application to the structured dataset, based on the code expression.

Type: Application

Filed: August 25, 2022

Publication date: February 29, 2024

Inventors: Julian Timothy Dolby, Horst Cornelius Samulowitz, Kavitha Srinivas
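Real static data-flow analysis is considerably more involved than this. Still, the core idea can be sketched with Python's `ast` module (Python 3.9+ for `ast.unparse`): tie a code expression to the dataset columns it reads, then lift it into a reusable lambda. This sketch only recognizes direct `df["column"]` subscripts on a single DataFrame variable with a hypothetical name. It is an illustration, not the patented analysis.

```python
import ast
import copy
from typing import Dict, Set, Tuple

def _is_column_ref(node: ast.AST, frame: str) -> bool:
    """True for expressions of the form frame["name"] with a string literal key."""
    return (isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name) and node.value.id == frame
            and isinstance(node.slice, ast.Constant) and isinstance(node.slice.value, str))

class _ToRowAccess(ast.NodeTransformer):
    """Rewrite frame["col"] as row["col"] so the expression works on a single record."""
    def __init__(self, frame: str):
        self.frame = frame

    def visit_Subscript(self, node: ast.Subscript) -> ast.AST:
        self.generic_visit(node)
        if _is_column_ref(node, self.frame):
            node.value = ast.Name(id="row", ctx=ast.Load())
        return node

def mine_expressions(source: str, frame: str = "df") -> Dict[str, Tuple[Set[str], str]]:
    """Map each written column to (columns read, lambda source) for frame["x"] = <expr>."""
    mined: Dict[str, Tuple[Set[str], str]] = {}
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and _is_column_ref(node.targets[0], frame)):
            written = node.targets[0].slice.value
            reads = {n.slice.value for n in ast.walk(node.value) if _is_column_ref(n, frame)}
            body = _ToRowAccess(frame).visit(copy.deepcopy(node.value))
            mined[written] = (reads, "lambda row: " + ast.unparse(body))
    return mined

code = '''
import pandas as pd
df = pd.read_csv("data.csv")
df["bmi"] = df["weight"] / df["height"] ** 2
'''
mined = mine_expressions(code)
```

The source is only parsed, never executed, so pandas does not need to be installed for the analysis itself.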
Automated unsupervised machine learning utilizing meta-learning

Patent number: 11868230

Abstract: Computer hardware and/or software that performs the following operations: (i) assessing a performance of a plurality of unsupervised machine learning pipelines against a plurality of data sets; (ii) associating the performance with meta-features corresponding to respective pipeline/data set combinations; (iii) training a supervised meta-learning model using the associated performance and meta-features as training data; and (iv) utilizing the trained model to identify one or more pipelines for processing an input data set.

Type: Grant

Filed: March 11, 2022

Date of Patent: January 9, 2024

Assignee: International Business Machines Corporation

Inventors: Saket K. Sathe, Long Vu, Peter Daniel Kirchner, Horst Cornelius Samulowitz
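The four steps can be illustrated with a deliberately small Python sketch. Each data set is summarized by a single scalar meta-feature. The supervised meta-learning model is one least-squares line per pipeline, predicting performance from that meta-feature. Both the meta-feature and the choice of per-pipeline linear regression are assumptions; the abstract leaves the meta-features and the model family open.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def train_meta_model(records: List[Tuple[float, str, float]]) -> Dict[str, Tuple[float, float]]:
    """records: (data set meta-feature, pipeline name, observed performance)."""
    grouped: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for meta_feature, pipeline, score in records:
        grouped[pipeline][0].append(meta_feature)
        grouped[pipeline][1].append(score)
    return {p: fit_line(xs, ys) for p, (xs, ys) in grouped.items()}

def recommend(model: Dict[str, Tuple[float, float]], meta_feature: float,
              top_k: int = 1) -> List[str]:
    """Rank pipelines by predicted performance on a new data set."""
    predicted = {p: a + b * meta_feature for p, (a, b) in model.items()}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_k]

# Hypothetical performance records, as step (i) might produce over several data sets.
records = [(0.0, "kmeans", 0.5), (1.0, "kmeans", 0.7),
           (0.0, "dbscan", 0.8), (1.0, "dbscan", 0.4)]
model = train_meta_model(records)
```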
Code generation for Auto-AI

Patent number: 11861469

Abstract: An embodiment of the invention may include a method, computer program product, and system for creating a data analysis tool. The method may include a computing device that generates an AI pipeline based on an input dataset, wherein the AI pipeline is generated using an Automated Machine Learning program. The method may include converting the AI pipeline to a non-native format of the Automated Machine Learning program. This may enable the AI pipeline to be used outside of the Automated Machine Learning program, thereby increasing the usefulness of the created program by not tying it to the Automated Machine Learning program. Additionally, this may increase the efficiency of running the AI pipeline by eliminating unnecessary computations performed by the Automated Machine Learning program.

Type: Grant

Filed: July 2, 2020

Date of Patent: January 2, 2024

Assignee: International Business Machines Corporation

Inventors: Peter Daniel Kirchner, Gregory Bramble, Horst Cornelius Samulowitz, Dakuo Wang, Arunima Chaudhary, Gregory Filla
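Converting a pipeline out of an AutoML tool's native representation can be as simple as emitting standalone source code. This Python sketch turns a hypothetical list of `(step, params)` pairs into a scikit-learn script built with `sklearn.pipeline.make_pipeline`. The step names and their mapping to scikit-learn classes are assumptions. The generator only builds a string, so it runs without scikit-learn installed.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from an AutoML tool's internal step names to scikit-learn classes.
STEP_TO_CLASS: Dict[str, Tuple[str, str]] = {
    "scale": ("sklearn.preprocessing", "StandardScaler"),
    "logreg": ("sklearn.linear_model", "LogisticRegression"),
    "rf": ("sklearn.ensemble", "RandomForestClassifier"),
}

def export_pipeline(steps: List[Tuple[str, Dict[str, object]]]) -> str:
    """Emit a standalone Python script that rebuilds the pipeline with scikit-learn."""
    imports = ["from sklearn.pipeline import make_pipeline"]
    calls = []
    for step, params in steps:
        module, cls = STEP_TO_CLASS[step]
        line = f"from {module} import {cls}"
        if line not in imports:
            imports.append(line)
        args = ", ".join(f"{k}={v!r}" for k, v in params.items())
        calls.append(f"    {cls}({args}),")
    return "\n".join(imports + ["", "pipeline = make_pipeline("] + calls + [")", ""])

script = export_pipeline([("scale", {}), ("logreg", {"C": 0.5})])
print(script)
```

The exported script contains only the selected steps, so it runs without the search machinery. This reflects the portability and efficiency argument in the abstract.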
Quality assessment of extracted features from high-dimensional machine learning datasets

Patent number: 11816127

Abstract: A quality determination method, system, and computer program product that includes performing a dimensionality reduction on a high-dimensional dataset to form a dimensional-reduced dataset and determining, using a machine learning tool executed on a computing device, a quality of the dimensional-reduced dataset via a review of an extracted feature extracted from the dimensional-reduced dataset.

Type: Grant

Filed: February 26, 2021

Date of Patent: November 14, 2023

Assignee: International Business Machines Corporation

Inventors: Petr Novotny, Aindrila Basak, Shaikh Shahriar Quader, Horst Cornelius Samulowitz, Chad Marston
AUTOMATIC DOMAIN ANNOTATION OF STRUCTURED DATA

Publication number: 20230351101

Abstract: A processor may receive structured data. The structured data may include one or more columns and associated column names. The processor may analyze the structured data. Analyzing the structured data may include gathering a requisite set of keywords from the associated column names across all columns and/or a sample of column cells. The processor may access a corpus of documents. Each of the documents in the corpus may be associated with a respective keyword. The processor may search the corpus of documents based on the requisite set of keywords. The processor may summarize one or more documents associated with the requisite set of keywords.

Type: Application

Filed: May 2, 2022

Publication date: November 2, 2023

Inventors: Horst Cornelius Samulowitz, Kavitha Srinivas
AUTOMATED UNSUPERVISED MACHINE LEARNING UTILIZING META-LEARNING

Publication number: 20230289277

Abstract: Computer hardware and/or software that performs the following operations: (i) assessing a performance of a plurality of unsupervised machine learning pipelines against a plurality of data sets; (ii) associating the performance with meta-features corresponding to respective pipeline/data set combinations; (iii) training a supervised meta-learning model using the associated performance and meta-features as training data; and (iv) utilizing the trained model to identify one or more pipelines for processing an input data set.

Type: Application

Filed: March 11, 2022

Publication date: September 14, 2023

Inventors: Saket K. Sathe, Long Vu, Peter Daniel Kirchner, Horst Cornelius Samulowitz
Methods for automatically configuring performance evaluation schemes for machine learning algorithms

Patent number: 11681931

Abstract: A system that provides a mathematical formulation for the new problem of model validation and model selection in the presence of test data feedback. The system comprises a memory that stores computer-executable components. A processor, operably coupled to the memory, executes the computer-executable components stored in the memory. A selection component selects a metric of performance evaluation accuracy, and a configuration component configures performance evaluation schemes for machine learning algorithms. A characterization component employs a supervised learning-based approach to characterize the relationship between the configuration of the performance evaluation scheme and the fidelity of performance estimates. An optimization component optimizes the accuracy of the machine learning algorithms as a function of the size of the training data set relative to the size of the validation data set, through selection of values associated with the configuration parameters.

Type: Grant

Filed: September 24, 2019

Date of Patent: June 20, 2023

Assignee: International Business Machines Corporation

Inventors: Bo Zhang, Gregory Bramble, Parikshit Ram, Horst Cornelius Samulowitz
PERFORMING AUTOMATED TUNING OF HYPERPARAMETERS IN A FEDERATED LEARNING ENVIRONMENT

Publication number: 20230186168

Abstract: A computer-implemented method according to one embodiment includes issuing a hyperparameter optimization (HPO) query to a plurality of computing devices; receiving HPO results from each of the plurality of computing devices; generating a unified performance metric surface utilizing the HPO results from each of the plurality of computing devices; and determining optimal global hyperparameters, utilizing the unified performance metric surface.

Type: Application

Filed: December 9, 2021

Publication date: June 15, 2023

Inventors: Yi Zhou, Parikshit Ram, Nathalie Baracaldo Angel, Theodoros Salonidis, Horst Cornelius Samulowitz, Martin Wistuba, Heiko H. Ludwig
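A unified performance metric surface can be illustrated by evaluating the same hyperparameter grid on every device. The per-device losses are then averaged, weighted by each device's data size. Both the shared grid and the size weighting are assumptions for this Python sketch. The abstract does not say how the surface is built, and a real system might instead interpolate between configurations that not all devices share.

```python
from typing import Dict, List

def unified_surface(device_losses: List[Dict[float, float]],
                    weights: List[float]) -> Dict[float, float]:
    """Weighted average of per-device loss surfaces over configurations all devices evaluated."""
    shared = set.intersection(*(set(losses) for losses in device_losses))
    total = sum(weights)
    return {hp: sum(w * losses[hp] for losses, w in zip(device_losses, weights)) / total
            for hp in shared}

def best_global_hp(device_losses: List[Dict[float, float]], weights: List[float]) -> float:
    surface = unified_surface(device_losses, weights)
    return min(surface, key=surface.get)   # configuration with the lowest unified loss

# Hypothetical validation losses per learning rate on two devices.
device_losses = [
    {0.1: 0.30, 0.01: 0.20, 0.001: 0.40},
    {0.1: 0.25, 0.01: 0.35, 0.001: 0.30},
]
weights = [100, 300]   # for example, the number of local training examples
best_lr = best_global_hp(device_losses, weights)
```

In this example, no single device's favorite configuration is guaranteed to win globally. The larger second device pulls the unified optimum toward 0.1.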
PERFORMING AUTOMATED SEMANTIC FEATURE DISCOVERY

Publication number: 20230177032

Abstract: A computer-implemented method according to one embodiment includes identifying a data set and meta information; and augmenting the data set with additional features in response to an automatic analysis of the data set in view of the meta information.

Type: Application

Filed: December 8, 2021

Publication date: June 8, 2023

Inventors: Daniel Karl I. Weidele, Lisa Amini, Udayan Khurana, Kavitha Srinivas, Horst Cornelius Samulowitz, Takaaki Tateishi, Carolina Maria Spina, Dakuo Wang, Abel Valente, Arunima Chaudhary, Toshihiro Takahashi
Question answering approach to semantic parsing of mathematical formulas

Patent number: 11663251

Abstract: A method, system, and computer program product are disclosed. The method includes extracting at least one identifier from a formula in a document and extracting text passages in the document that contain the identifier(s). The method also includes selecting an identifier and extracted text passages containing the identifier, as well as generating identifier-passage pairs for the selected text passages and the identifier. Further, the method includes submitting the identifier-passage pairs to a question answering (QA) model, which generates candidate answers from the selected text passages. A definition of the identifier is then selected from the candidate answers.

Type: Grant

Filed: September 8, 2021

Date of Patent: May 30, 2023

Assignee: International Business Machines Corporation

Inventors: William Karol Lynch, Kavitha Srinivas, Horst Cornelius Samulowitz, Fabio Lorenzi
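The extract, pair, and answer pipeline can be sketched in Python. A real implementation would submit each identifier-passage pair to a trained question answering model. Here a regular-expression stub stands in for that model: `toy_qa` (hypothetical) looks for phrases like "x is the ...". The sketch also assumes identifiers are single letters with an optional subscript.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

def extract_identifiers(formula: str) -> List[str]:
    """Hypothetical rule: single-letter identifiers, optionally subscripted as in x_i."""
    return sorted(set(re.findall(r"\b[A-Za-z](?:_[A-Za-z0-9]+)?\b", formula)))

def passages_with(identifier: str, document: str) -> List[str]:
    """Sentences of the document that contain the identifier as a whole word."""
    sentences = re.split(r"(?<=\.)\s+", document)
    return [s for s in sentences if re.search(rf"\b{re.escape(identifier)}\b", s)]

def toy_qa(identifier: str, passage: str) -> List[Tuple[str, float]]:
    """Stand-in for a QA model: returns (candidate answer, score) pairs."""
    m = re.search(rf"\b{re.escape(identifier)}\s+(?:is|denotes)\s+(?:the\s+)?([^,.;]+)", passage)
    return [(m.group(1).strip(), 1.0)] if m else []

QA = Callable[[str, str], List[Tuple[str, float]]]

def define_identifiers(formula: str, document: str, qa: QA = toy_qa) -> Dict[str, Optional[str]]:
    definitions: Dict[str, Optional[str]] = {}
    for ident in extract_identifiers(formula):
        pairs = [(ident, p) for p in passages_with(ident, document)]   # identifier-passage pairs
        candidates = [c for i, p in pairs for c in qa(i, p)]
        definitions[ident] = max(candidates, key=lambda c: c[1])[0] if candidates else None
    return definitions

document = "Here E denotes the energy, m is the mass, and c is the speed of light."
definitions = define_identifiers("E = m * c^2", document)
```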
Automated machine learning pipeline generation

Patent number: 11620582

Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.

Type: Grant

Filed: July 29, 2020

Date of Patent: April 4, 2023

Assignee: International Business Machines Corporation

Inventors: Bei Chen, Long Vu, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan
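The abstract describes sequentially allocating growing slices of training data among candidates, with weak candidates dropped between rounds. This can be sketched as a successive-halving-style schedule. The halving rule, the doubling slice size, and the toy one-step-ahead forecasters are assumptions for this Python illustration, and the patent's own allocation scheme may differ.

```python
from typing import Callable, Dict, List, Sequence

Forecaster = Callable[[Sequence[float]], float]   # history -> next-value forecast

def holdout_error(forecast: Forecaster, window: Sequence[float]) -> float:
    """Absolute one-step-ahead error on the last point of the window."""
    return abs(forecast(window[:-1]) - window[-1])

def sequential_allocation(series: Sequence[float], candidates: Dict[str, Forecaster],
                          initial: int = 8, growth: int = 2) -> str:
    """Evaluate candidates on growing recent slices, keeping the better half each round."""
    remaining: List[str] = list(candidates)
    size = initial
    while len(remaining) > 1 and size <= len(series):
        window = series[-size:]                    # most recent `size` points
        errors = {name: holdout_error(candidates[name], window) for name in remaining}
        remaining = sorted(remaining, key=errors.get)[: max(1, len(remaining) // 2)]
        size *= growth
    return remaining[0]

candidates: Dict[str, Forecaster] = {
    "naive": lambda h: h[-1],
    "mean": lambda h: sum(h) / len(h),
    "zero": lambda h: 0.0,
    "drift": lambda h: h[-1] + (h[-1] - h[0]) / (len(h) - 1),
}
winner = sequential_allocation([float(i) for i in range(40)], candidates)
```

Weak candidates only ever see the smallest slice, which is where the compute savings of sequential allocation come from.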
QUESTION ANSWERING APPROACH TO SEMANTIC PARSING OF MATHEMATICAL FORMULAS

Publication number: 20230076089

Abstract: A method, system, and computer program product are disclosed. The method includes extracting at least one identifier from a formula in a document and extracting text passages in the document that contain the identifier(s). The method also includes selecting an identifier and extracted text passages containing the identifier, as well as generating identifier-passage pairs for the selected text passages and the identifier. Further, the method includes submitting the identifier-passage pairs to a question answering (QA) model, which generates candidate answers from the selected text passages. A definition of the identifier is then selected from the candidate answers.

Type: Application

Filed: September 8, 2021

Publication date: March 9, 2023

Inventors: William Karol Lynch, Kavitha Srinivas, Horst Cornelius Samulowitz, Fabio Lorenzi
Knowledge aided feature engineering

Patent number: 11599826

Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.

Type: Grant

Filed: January 13, 2020

Date of Patent: March 7, 2023

Assignee: International Business Machines Corporation

Inventors: Udayan Khurana, Sainyam Galhotra, Oktie Hassanzadeh, Kavitha Srinivas, Horst Cornelius Samulowitz
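The augmentation step can be sketched in Python with two stand-ins. A hypothetical `RELATED` table takes the place of the semantic-relatedness judgment, and a simple key join takes the place of the dynamic augmentation. The table, the join key, and the toy records are assumptions, and training the ML model on the result is omitted.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]

# Hypothetical knowledge source: terms considered semantically related to a feature.
RELATED: Dict[str, Set[str]] = {
    "city": {"population", "region", "median_income"},
    "zip": {"median_income", "population_density"},
}

def related_features(important: List[str], candidate_columns: List[str]) -> List[str]:
    """Columns of the second data set related to any important feature of the first."""
    wanted: Set[str] = set()
    for feature in important:
        wanted |= RELATED.get(feature, set())
    return [c for c in candidate_columns if c in wanted]

def augment(first: List[Row], second: List[Row], key: str, important: List[str]) -> List[Row]:
    """Add the related columns from `second` to each row of `first`, matched on `key`."""
    new_columns = related_features(important, [c for c in second[0] if c != key])
    lookup = {row[key]: row for row in second}
    return [{**row, **{c: lookup.get(row[key], {}).get(c) for c in new_columns}}
            for row in first]

first = [{"city": "A", "sales": 10}, {"city": "B", "sales": 4}]
second = [{"city": "A", "population": 1000, "mayor": "X"},
          {"city": "B", "population": 200, "mayor": "Y"}]
augmented = augment(first, second, key="city", important=["city"])
```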
INTERACTIVE FEATURE ENGINEERING IN AUTOMATIC MACHINE LEARNING WITH DOMAIN KNOWLEDGE

Publication number: 20220366269

Abstract: A dataset including features and values associated with the features can be received. Each of the features in the dataset can be mapped to a corresponding node in a knowledge graph based on the concept represented by the corresponding node. The knowledge graph can be traversed to find a candidate node connected to at least one mapped node, the candidate node not being mapped to a feature in the dataset. A concept associated with the candidate node can be identified as a new feature. A machine learning model pipeline can use the features in the dataset and the new feature to select a subset of features for training a machine learning model.

Type: Application

Filed: May 11, 2021

Publication date: November 17, 2022

Inventors: Dakuo Wang, Udayan Khurana, Daniel Karl I. Weidele, Arunima Chaudhary, Carolina Maria Spina, Abel Valente, Chuang Gan, Horst Cornelius Samulowitz, Lisa Amini
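The abstract maps features to knowledge-graph nodes and then traverses outward for unmapped neighbors. This can be sketched as a bounded breadth-first search in Python. The adjacency-list graph, the exact-name matching used for the mapping, and the hop limit are assumptions. The abstract does not fix how nodes are matched or how far the traversal goes.

```python
from collections import deque
from typing import Dict, List, Set

def candidate_features(features: List[str], kg: Dict[str, List[str]],
                       max_hops: int = 1) -> List[str]:
    """Concepts reachable within max_hops of a mapped node that are not themselves mapped."""
    # Hypothetical mapping: a feature maps to a node when its lowercase name is a key of kg.
    mapped = {f.lower() for f in features if f.lower() in kg}
    seen: Set[str] = set(mapped)
    found: Set[str] = set()
    queue = deque((node, 0) for node in mapped)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in kg.get(node, []):
            if neighbor not in seen:           # skip mapped and already-visited nodes
                seen.add(neighbor)
                found.add(neighbor)
                queue.append((neighbor, depth + 1))
    return sorted(found)

kg = {
    "age": ["life_stage", "birth_year"],
    "income": ["tax_bracket"],
    "birth_year": ["generation"],
}
features = ["Age", "Income", "user_id"]
```

The returned concepts are candidate new features. Per the abstract, the downstream pipeline's feature selection decides which of them, together with the original features, are used for training.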
AUTOMATED TIME SERIES FORECASTING PIPELINE GENERATION

Publication number: 20220327058

Abstract: To automate time series forecasting machine learning pipeline generation, a data allocation size of time series data may be determined based on one or more characteristics of a time series data set. The time series data may be allocated for use by candidate machine learning pipelines based on the data allocation size. Features for the time series data may be determined and cached by the candidate machine learning pipelines. Predictions of each of the candidate machine learning pipelines using at least the one or more features may be evaluated. A ranked list of machine learning pipelines may be automatically generated from the candidate machine learning pipelines for time series forecasting based upon evaluating predictions of each of the one or more candidate machine learning pipelines.

Type: Application

Filed: March 15, 2022

Publication date: October 13, 2022

Applicant: International Business Machines Corporation

Inventors: Long Vu, Bei Chen, Xuan-Hong Dang, Peter Daniel Kirchner, Syed Yousaf Shah, Dhavalkumar C. Patel, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Gregory Bramble, Horst Cornelius Samulowitz, Saket K. Sathe, Wesley M. Gifford, Petros Zerfos
