Patents by Inventor Saket SATHE

Saket SATHE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Automated machine learning using nearest neighbor recommender systems

Patent number: 11941541

Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.

Type: Grant

Filed: August 10, 2020

Date of Patent: March 26, 2024

Assignee: International Business Machines Corporation

Inventors: Saket Sathe, Gregory Bramble, Horst Cornelius Samulowitz, Charu C. Aggarwal
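
The flow described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how similarity between data sets is measured, so cosine similarity over hypothetical meta-feature vectors is assumed here, and the names `candidate_pipelines`, `recommend`, `train_meta`, and `measured` are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def candidate_pipelines(perf, train_meta, test_meta, n_neighbors=2, top_k=2):
    """Pick top_k pipelines by similarity-weighted accuracy on the nearest data sets.

    perf:       {data set: {pipeline: accuracy}}, i.e. the performance matrix
    train_meta: {data set: meta-feature vector} (assumed input for similarity)
    """
    sims = {d: cosine(vec, test_meta) for d, vec in train_meta.items()}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:n_neighbors]
    scores = {}
    for d in neighbors:
        for pipeline, acc in perf[d].items():
            scores[pipeline] = scores.get(pipeline, 0.0) + sims[d] * acc
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def recommend(candidates, evaluate):
    """Run each candidate on the testing data set; return the best and all results."""
    results = {p: evaluate(p) for p in candidates}  # stored as a new result set
    return max(results, key=results.get), results

# Toy performance matrix: rows are training data sets, columns are pipelines.
perf = {
    "d1": {"rf": 0.90, "svm": 0.70, "knn": 0.60},
    "d2": {"rf": 0.85, "svm": 0.75, "knn": 0.65},
    "d3": {"rf": 0.50, "svm": 0.60, "knn": 0.95},
}
train_meta = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0]}
test_meta = [1.0, 0.05]

candidates = candidate_pipelines(perf, train_meta, test_meta)
# Stand-in for actually executing each pipeline on the testing data set.
measured = {"rf": 0.80, "svm": 0.88, "knn": 0.40}
best, results = recommend(candidates, measured.get)
```

In this toy run the neighbor step shortlists `rf` and `svm`, and the final recommendation depends on the accuracies measured on the testing data set rather than on the historical matrix alone.
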
Distributed resource-aware training of machine learning pipelines

Patent number: 11829799

Abstract: A method, a structure, and a computer system for predicting pipeline training requirements. The exemplary embodiments may include receiving one or more worker node features from one or more worker nodes, extracting one or more pipeline features from one or more pipelines to be trained, and extracting one or more dataset features from one or more datasets used to train the one or more pipelines. The exemplary embodiments may further include predicting an amount of one or more resources required for each of the one or more worker nodes to train the one or more pipelines using the one or more datasets based on one or more models that correlate the one or more worker node features, one or more pipeline features, and one or more dataset features with the one or more resources. Lastly, the exemplary embodiments may include identifying a worker node requiring a least amount of the one or more resources of the one or more worker nodes for training the one or more pipelines.

Type: Grant

Filed: October 13, 2020

Date of Patent: November 28, 2023

Assignee: International Business Machines Corporation

Inventors: Saket Sathe, Gregory Bramble, Long Vu, Theodoros Salonidis
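
As a rough illustration of the selection step in this abstract, the sketch below predicts a resource amount per worker node and picks the node with the smallest prediction. The abstract refers to models that correlate worker, pipeline, and dataset features with resources; the hand-written formula in `predict_cpu_seconds` is only an assumed stand-in for such a model, and all feature names are hypothetical.

```python
def predict_cpu_seconds(worker, pipeline, dataset):
    """Assumed stand-in for a learned model: work to do divided by worker speed."""
    work = dataset["rows"] * dataset["columns"] * pipeline["cost_per_cell"]
    return work / (worker["cores"] * worker["ghz"])

def least_demanding_worker(workers, pipeline, dataset):
    """Return the worker with the lowest predicted resource need, plus all predictions."""
    predictions = {
        name: predict_cpu_seconds(features, pipeline, dataset)
        for name, features in workers.items()
    }
    return min(predictions, key=predictions.get), predictions

workers = {"A": {"cores": 4, "ghz": 2.0}, "B": {"cores": 8, "ghz": 2.5}}
pipeline = {"cost_per_cell": 0.5}
dataset = {"rows": 1000, "columns": 20}

best_worker, predictions = least_demanding_worker(workers, pipeline, dataset)
```
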
METALEARNER FOR UNSUPERVISED AUTOMATED MACHINE LEARNING

Publication number: 20230177387

Abstract: A method, system, and computer program product for a metalearner for automated machine learning are provided. The method receives a labeled data set. A set of data subsets is generated from the labeled data set. A set of unsupervised machine learning pipelines is generated. A training set is generated from the set of data subsets and the set of unsupervised machine learning pipelines. The method trains a metalearner for unsupervised tasks based on the training set.

Type: Application

Filed: December 8, 2021

Publication date: June 8, 2023

Inventors: Saket Sathe, Long Vu, Peter Daniel Kirchner, Charu C. Aggarwal
Automated machine learning pipeline generation

Patent number: 11620582

Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.

Type: Grant

Filed: July 29, 2020

Date of Patent: April 4, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bei Chen, Long Vu, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan
Enhanced ensemble model diversity and learning

Patent number: 11593716

Abstract: Embodiments for implementing enhanced ensemble model diversity and learning by a processor. One or more data sets may be created by combining one or more clusters of data points of a minority class with selected data points of a majority class. One or more ensemble models may be created from the one or more data sets using a supervised machine learning operation. An occurrence of an event may be predicted using the one or more ensemble models.

Type: Grant

Filed: April 11, 2019

Date of Patent: February 28, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Saket Sathe, Deepak Turaga, Charu Aggarwal, Raju Pavuluri, Yuan-Chi Chang
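
The data set construction in this abstract can be sketched as follows, assuming the minority-class clusters have already been computed. The nearest-centroid classifier and the "any model fires" aggregation rule are assumptions chosen to keep the example dependency-free; the abstract only states that a supervised machine learning operation is used.

```python
import random

def build_datasets(minority_clusters, majority_points, majority_per_set, seed=0):
    """Combine each minority cluster with a random sample of majority points."""
    rng = random.Random(seed)
    datasets = []
    for cluster in minority_clusters:
        sampled = rng.sample(majority_points, majority_per_set)
        datasets.append([(x, 1) for x in cluster] + [(x, 0) for x in sampled])
    return datasets

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def train_nearest_centroid(rows):
    """Assumed base learner: label a point by its closer class centroid."""
    pos = centroid([x for x, y in rows if y == 1])
    neg = centroid([x for x, y in rows if y == 0])

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return 1 if d_pos < d_neg else 0

    return predict

def predict_event(models, x):
    """Assumed aggregation: flag the event if any ensemble member predicts it."""
    return 1 if any(m(x) for m in models) else 0

minority_clusters = [[(0.0, 0.0), (0.2, 0.1)], [(10.0, 10.0), (9.8, 10.1)]]
majority_points = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]

datasets = build_datasets(minority_clusters, majority_points, majority_per_set=3)
models = [train_nearest_centroid(rows) for rows in datasets]
```

Because each member sees a different minority cluster, the members specialize on different regions of the minority class, which is one plausible reading of the diversity the title refers to.
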
PIPELINE RANKING WITH MODEL-BASED DYNAMIC DATA ALLOCATION

Publication number: 20220343207

Abstract: In a method for ranking machine learning (ML) pipelines for a dataset, a processor receives first performance curves predicted by a meta learner model for a plurality of ML pipelines. A processor allocates a first subset of data points from the dataset to each of the plurality of ML pipelines. A processor receives first performance scores for each of the ML pipelines for the first subset of data points. A processor updates the meta learner model using the first performance scores. A processor receives second performance curves from the meta learner model updated with the first performance scores. A processor ranks the plurality of ML pipelines based on the second performance curves.

Type: Application

Filed: April 22, 2021

Publication date: October 27, 2022

Inventors: Long Vu, Saket Sathe, Bei Chen, Peter Daniel Kirchner
IMPLEMENTING PAY-AS-YOU-GO (PAYG) AUTOMATED MACHINE LEARNING AND AI

Publication number: 20220207444

Abstract: A system and method for assessing Pay-As-You-Go (PAYG) Automatic machine learned (AutoML) model pipeline charge to a user on the basis of performance improvement achieved by configuring a model pipeline with performance enhancements relative to a performance obtained by a base model pipeline. The method performs a ranking of pipelines (customized models) based on a user-specified metric (for example, prediction accuracy, run time, F1 score) or combination of metrics. The price for ranked pipelines is specified based on a “surrogate” model where the surrogate model is fit to the base model price and the maximum price for a model. The base model price relates to use of a current cloud resource utilization-based pricing model. The pricing per model pipeline increments on the basis of performance metric(s) in a linear fashion, e.g., using a linear pricing model, or in an exponential fashion, e.g., using a fixed percentage hike price model.

Type: Application

Filed: December 30, 2020

Publication date: June 30, 2022

Inventors: Gregory Bramble, Saket Sathe, Long Vu, Theodoros Salonidis, Horst Cornelius Samulowitz, Jean-François Puget
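
The two pricing schemes named in this abstract can be worked through with a small sketch. It assumes prices are anchored at the base model price for the lowest-ranked pipeline and at the maximum price for the highest-ranked one; the function names, the example metric values, and the prices are hypothetical.

```python
def linear_prices(base_price, max_price, n):
    """Equal additive steps from base_price (rank 0) to max_price (rank n - 1)."""
    if n == 1:
        return [base_price]
    step = (max_price - base_price) / (n - 1)
    return [base_price + step * i for i in range(n)]

def percentage_hike_prices(base_price, max_price, n):
    """Equal multiplicative steps: each rank costs a fixed percentage more."""
    if n == 1:
        return [base_price]
    rate = (max_price / base_price) ** (1 / (n - 1)) - 1
    return [base_price * (1 + rate) ** i for i in range(n)]

# Rank pipelines by a user-specified metric (here: prediction accuracy), ascending.
metrics = {"base": 0.80, "tuned": 0.85, "tuned+ensemble": 0.90}
ranked = sorted(metrics, key=metrics.get)

linear = dict(zip(ranked, linear_prices(10.0, 40.0, len(ranked))))
hiked = dict(zip(ranked, percentage_hike_prices(10.0, 40.0, len(ranked))))
```

With a base price of 10 and a maximum of 40 over three ranks, the linear scheme adds 15 per rank (10, 25, 40), while the fixed-percentage scheme doubles the price per rank (10, 20, 40), so the middle pipeline is cheaper under the percentage scheme.
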
DISTRIBUTED RESOURCE-AWARE TRAINING OF MACHINE LEARNING PIPELINES

Publication number: 20220114019

Abstract: A method, a structure, and a computer system for predicting pipeline training requirements. The exemplary embodiments may include receiving one or more worker node features from one or more worker nodes, extracting one or more pipeline features from one or more pipelines to be trained, and extracting one or more dataset features from one or more datasets used to train the one or more pipelines. The exemplary embodiments may further include predicting an amount of one or more resources required for each of the one or more worker nodes to train the one or more pipelines using the one or more datasets based on one or more models that correlate the one or more worker node features, one or more pipeline features, and one or more dataset features with the one or more resources. Lastly, the exemplary embodiments may include identifying a worker node requiring a least amount of the one or more resources of the one or more worker nodes for training the one or more pipelines.

Type: Application

Filed: October 13, 2020

Publication date: April 14, 2022

Inventors: Saket Sathe, Gregory Bramble, Long Vu, Theodoros Salonidis
Automated data and label creation for supervised machine learning regression testing

Patent number: 11295242

Abstract: Split an input dataset into training and test datasets; the former includes a plurality of data examples, each represented as a feature vector, and having an associated true label. Split the training dataset into a plurality of training data subsets; for each, train a corresponding machine learning model to obtain a plurality of such models, and apply same to the test dataset to obtain a plurality of predicted labels and prediction scores. For each of the plurality of examples, compute an agreement metric based on a corresponding one of the associated true labels; corresponding ones of the predicted labels; and corresponding ones of the prediction scores. Based on the computed metric, select, for at least some of the true label values, appropriate ones of the data examples to be added to a regression set. Add the appropriate ones of the data examples from the test dataset to the regression set.

Type: Grant

Filed: November 13, 2019

Date of Patent: April 5, 2022

Assignee: International Business Machines Corporation

Inventors: Yuan-Chi Chang, Deepak Srinivas Turaga, Long Vu, Venkata Nagaraju Pavuluri, Saket Sathe, Rodrigue Ngueyep Tzoumpe
Random feature transformation forests for automatic feature engineering

Patent number: 11275974

Abstract: Embodiments for automated feature engineering by one or more processors are described. One or more selected transformations may be applied to a set of features in a dataset to create a set of transform features using random feature transformation forest (RFTF) classifiers. A transform feature may be selected from the set of transform features having a highest discriminative power as compared to other features of the set of transform features. At each node in a decision tree, store the selected feature, a split value, and the one or more selected transformations for the transform feature.

Type: Grant

Filed: September 17, 2018

Date of Patent: March 15, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Saket Sathe, Deepak S. Turaga, Horst Cornelius Samulowitz, Charu C. Aggarwal
Efficient unsupervised anomaly detection on homomorphically encrypted data

Patent number: 11271958

Abstract: Aspects of the present disclosure describe techniques for detecting anomalous data in an encrypted data set. An example method generally includes receiving a data set of encrypted data points. A tree data structure having a number of levels is generated for the data set. Each level of the tree data structure generally corresponds to a feature of the encrypted plurality of features, and each node in the tree data structure at a given level represents a probability distribution of a likelihood that each data point is less than or greater than a split value determined for a given feature. An encrypted data point is received for analysis, and an anomaly score is calculated based on a probability identified for each of the plurality of encrypted features. Based on determining that the calculated anomaly score exceeds a threshold value, the encrypted data point is identified as potentially anomalous.

Type: Grant

Filed: September 20, 2019

Date of Patent: March 8, 2022

Assignee: International Business Machines Corporation

Inventors: Kanthi Sarpatwar, Venkata Sitaramagiridharganesh Ganapavarapu, Saket Sathe, Roman Vaculin
AUTOMATED MACHINE LEARNING USING NEAREST NEIGHBOR RECOMMENDER SYSTEMS

Publication number: 20220044078

Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.

Type: Application

Filed: August 10, 2020

Publication date: February 10, 2022

Inventors: Saket Sathe, Gregory Bramble, Horst Cornelius Samulowitz, Charu C. Aggarwal
AUTOMATED MACHINE LEARNING PIPELINE GENERATION

Publication number: 20220036246

Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.

Type: Application

Filed: July 29, 2020

Publication date: February 3, 2022

Inventors: Bei Chen, Long Vu, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan
AUTOMATED DATA AND LABEL CREATION FOR SUPERVISED MACHINE LEARNING REGRESSION TESTING

Publication number: 20210142222

Abstract: Split an input dataset into training and test datasets; the former includes a plurality of data examples, each represented as a feature vector, and having an associated true label. Split the training dataset into a plurality of training data subsets; for each, train a corresponding machine learning model to obtain a plurality of such models, and apply same to the test dataset to obtain a plurality of predicted labels and prediction scores. For each of the plurality of examples, compute an agreement metric based on a corresponding one of the associated true labels; corresponding ones of the predicted labels; and corresponding ones of the prediction scores. Based on the computed metric, select, for at least some of the true label values, appropriate ones of the data examples to be added to a regression set. Add the appropriate ones of the data examples from the test dataset to the regression set.

Type: Application

Filed: November 13, 2019

Publication date: May 13, 2021

Inventors: Yuan-Chi Chang, Deepak Srinivas Turaga, Long Vu, Venkata Nagaraju Pavuluri, Saket Sathe, Rodrigue Ngueyep Tzoumpe
EFFICIENT UNSUPERVISED ANOMALY DETECTION ON HOMOMORPHICALLY ENCRYPTED DATA

Publication number: 20210092137

Abstract: Aspects of the present disclosure describe techniques for detecting anomalous data in an encrypted data set. An example method generally includes receiving a data set of encrypted data points. A tree data structure having a number of levels is generated for the data set. Each level of the tree data structure generally corresponds to a feature of the encrypted plurality of features, and each node in the tree data structure at a given level represents a probability distribution of a likelihood that each data point is less than or greater than a split value determined for a given feature. An encrypted data point is received for analysis, and an anomaly score is calculated based on a probability identified for each of the plurality of encrypted features. Based on determining that the calculated anomaly score exceeds a threshold value, the encrypted data point is identified as potentially anomalous.

Type: Application

Filed: September 20, 2019

Publication date: March 25, 2021

Inventors: Kanthi Sarpatwar, Venkata Sitaramagiridharganesh Ganapavarapu, Saket Sathe, Roman Vaculin
Accurate temporal event predictive modeling

Patent number: 10956821

Abstract: Embodiments for accurate temporal event predictive modeling by a processor. An average reverse event delay may be determined from one or more event delays in a time-series window. A time-series event may be predicted by applying the average reverse event delay in conjunction with one or more weighted factors in a predictive model.

Type: Grant

Filed: November 29, 2016

Date of Patent: March 23, 2021

Assignee: International Business Machines Corporation

Inventors: Charu C. Aggarwal, Lingtao Cao, Tan Hung M. Ng, Saket Sathe, Deepak S. Turaga
ENHANCED ENSEMBLE MODEL DIVERSITY AND LEARNING

Publication number: 20200327456

Abstract: Embodiments for implementing enhanced ensemble model diversity and learning by a processor. One or more data sets may be created by combining one or more clusters of data points of a minority class with selected data points of a majority class. One or more ensemble models may be created from the one or more data sets using a supervised machine learning operation. An occurrence of an event may be predicted using the one or more ensemble models.

Type: Application

Filed: April 11, 2019

Publication date: October 15, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Saket Sathe, Deepak Turaga, Charu Aggarwal, Raju Pavuluri, Yuan-Chi Chang
RANDOM FEATURE TRANSFORMATION FORESTS FOR AUTOMATIC FEATURE ENGINEERING

Publication number: 20200090010

Abstract: Embodiments for automated feature engineering by one or more processors are described. One or more selected transformations may be applied to a set of features in a dataset to create a set of transform features using random feature transformation forest (RFTF) classifiers. A transform feature may be selected from the set of transform features having a highest discriminative power as compared to other features of the set of transform features. At each node in a decision tree, store the selected feature, a split value, and the one or more selected transformations for the transform feature.

Type: Application

Filed: September 17, 2018

Publication date: March 19, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Saket Sathe, Deepak S. Turaga, Horst Cornelius Samulowitz, Charu C. Aggarwal
ACCURATE TEMPORAL EVENT PREDICTIVE MODELING

Publication number: 20200027024

Abstract: Embodiments for accurate temporal event predictive modeling by a processor. An average reverse event delay may be determined from one or more event delays in a time-series window. A time-series event may be predicted by applying the average reverse event delay in conjunction with one or more weighted factors in a predictive model.

Type: Application

Filed: September 30, 2019

Publication date: January 23, 2020

Inventors: Charu C. Aggarwal, Lingtao Cao, Tan Hung M. Ng, Saket Sathe, Deepak S. Turaga
ACCURATE TEMPORAL EVENT PREDICTIVE MODELING

Publication number: 20200027023

Abstract: Embodiments for accurate temporal event predictive modeling by a processor. An average reverse event delay may be determined from one or more event delays in a time-series window. A time-series event may be predicted by applying the average reverse event delay in conjunction with one or more weighted factors in a predictive model.

Type: Application

Filed: September 30, 2019

Publication date: January 23, 2020

Inventors: Charu C. Aggarwal, Lingtao Cao, Tan Hung M. Ng, Saket Sathe, Deepak S. Turaga
