Patents by Inventor Saket SATHE
Saket SATHE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11941541Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.Type: GrantFiled: August 10, 2020Date of Patent: March 26, 2024Assignee: International Business Machines CorporationInventors: Saket Sathe, Gregory Bramble, Horst Cornelius Samulowitz, Charu C. Aggarwal
-
Patent number: 11829799Abstract: A method, a structure, and a computer system for predicting pipeline training requirements. The exemplary embodiments may include receiving one or more worker node features from one or more worker nodes, extracting one or more pipeline features from one or more pipelines to be trained, and extracting one or more dataset features from one or more datasets used to train the one or more pipelines. The exemplary embodiments may further include predicting an amount of one or more resources required for each of the one or more worker nodes to train the one or more pipelines using the one or more datasets based on one or more models that correlate the one or more worker node features, one or more pipeline features, and one or more dataset features with the one or more resources. Lastly, the exemplary embodiments may include identifying a worker node requiring a least amount of the one or more resources of the one or more worker nodes for training the one or more pipelines.Type: GrantFiled: October 13, 2020Date of Patent: November 28, 2023Assignee: International Business Machines CorporationInventors: Saket Sathe, Gregory Bramble, Long Vu, Theodoros Salonidis
-
Publication number: 20230177387Abstract: A method, system, and computer program product for a metalearner for automated machine learning are provided. The method receives a labeled data set. A set of data subsets is generated from the labeled data set. A set of unsupervised machine learning pipelines is generated. A training set is generated from the set of data subsets and the set of unsupervised machine learning pipelines. The method trains a metalearner for unsupervised tasks based on the training set.Type: ApplicationFiled: December 8, 2021Publication date: June 8, 2023Inventors: Saket Sathe, Long Vu, Peter Daniel Kirchner, Charu C. Aggarwal
-
Patent number: 11620582Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.Type: GrantFiled: July 29, 2020Date of Patent: April 4, 2023Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Bei Chen, Long Vu, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan
-
Patent number: 11593716Abstract: Embodiments for implementing enhanced ensemble model diversity and learning by a processor. One or more data sets may be created by combining one or more clusters of data points of a minority class with selected data points of a majority class. One or more ensemble models may be created from the one or more data sets using a supervised machine learning operation. An occurrence of an event may be predicted using the one or more ensemble models.Type: GrantFiled: April 11, 2019Date of Patent: February 28, 2023Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Saket Sathe, Deepak Turaga, Charu Aggarwal, Raju Pavuluri, Yuan-Chi Chang
-
Publication number: 20220343207Abstract: In a method for ranking machine learning (ML) pipelines for a dataset, a processor receives first performance curves predicted by a meta learner model for a plurality of ML pipelines. A processor allocates a first subset of data points from the dataset to each of the plurality of ML pipelines. A processor receives first performance scores for each of the ML pipelines for the first subset of data points. A processor updates the meta learner model using the first performance scores. A processor receives second performance curves from the meta learner model updated with the first performance scores. A processor ranks the plurality of ML pipelines based on the second performance curves.Type: ApplicationFiled: April 22, 2021Publication date: October 27, 2022Inventors: Long Vu, Saket Sathe, Bei Chen, Peter Daniel Kirchner
-
Publication number: 20220207444Abstract: A system and method for assessing Pay-As-You-Go (PAYG) Automatic machine learned (AutoML) model pipeline charge to a user on the basis of performance improvement achieved by configuring a model pipeline with performance enhancements relative to a performance obtained by a base model pipeline. The method performs a ranking of pipelines (customized models) based on a user-specified metric (for example, prediction accuracy, run time, F1 score) or combination of metrics. The price for ranked pipelines is specified based on a “surrogate” model where the surrogate model is fit to the base model price and the maximum price for a model. The base model price relates to use of a current cloud resource utilization-based pricing model. The pricing per model pipeline increments on the basis of performance metric(s) in a linear fashion, e.g., using a linear pricing model, or in an exponential fashion, e.g., using a fixed percentage hike price model.Type: ApplicationFiled: December 30, 2020Publication date: June 30, 2022Inventors: Gregory Bramble, Saket Sathe, Long Vu, Theodoros Salonidis, Horst Cornelius Samulowitz, Jean-François Puget
-
Publication number: 20220114019Abstract: A method, a structure, and a computer system for predicting pipeline training requirements. The exemplary embodiments may include receiving one or more worker node features from one or more worker nodes, extracting one or more pipeline features from one or more pipelines to be trained, and extracting one or more dataset features from one or more datasets used to train the one or more pipelines. The exemplary embodiments may further include predicting an amount of one or more resources required for each of the one or more worker nodes to train the one or more pipelines using the one or more datasets based on one or more models that correlate the one or more worker node features, one or more pipeline features, and one or more dataset features with the one or more resources. Lastly, the exemplary embodiments may include identifying a worker node requiring a least amount of the one or more resources of the one or more worker nodes for training the one or more pipelines.Type: ApplicationFiled: October 13, 2020Publication date: April 14, 2022Inventors: Saket Sathe, Gregory Bramble, Long VU, Theodoros Salonidis
-
Patent number: 11295242Abstract: Split an input dataset into training and test datasets; the former includes a plurality of data examples, each represented as a feature vector, and having an associated true label. Split the training dataset into a plurality of training data subsets; for each, train a corresponding machine learning model to obtain a plurality of such models, and apply same to the test dataset to obtain a plurality of predicted labels and prediction scores. For each of the plurality of examples, compute an agreement metric based on a corresponding one of the associated true labels; corresponding ones of the predicted labels; and corresponding ones of the prediction scores. Based on the computed metric, select, for at least some of the true label values, appropriate ones of the data examples to be added to a regression set. Add the appropriate ones of the data examples from the test dataset to the regression set.Type: GrantFiled: November 13, 2019Date of Patent: April 5, 2022Assignee: International Business Machines CorporationInventors: Yuan-Chi Chang, Deepak Srinivas Turaga, Long Vu, Venkata Nagaraju Pavuluri, Saket Sathe, Rodrigue Ngueyep Tzoumpe
-
Patent number: 11275974Abstract: Embodiments for automated feature engineering by one or more processors are described. One or more selected transformations may be applied to a set of features in a dataset to create a set of transform features using random feature transformation forest (RFTF) classifiers. A transform feature may be selected from the set of transform features having a highest discriminative power as compared to other features of the set of transform features. At each node in a decision tree, store the selected feature, a split value, and the one or more selected transformations for the transform feature.Type: GrantFiled: September 17, 2018Date of Patent: March 15, 2022Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Saket Sathe, Deepak S. Turaga, Horst Cornelius Samulowitz, Charu C. Aggarwal
-
Patent number: 11271958Abstract: Aspects of the present disclosure describe techniques for detecting anomalous data in an encrypted data set. An example method generally includes receiving a data set of encrypted data points. A tree data structure having a number of levels is generated for the data set. Each level of the tree data structure generally corresponds to a feature of the encrypted plurality of features, and each node in the tree data structure at a given level represents a probability distribution of a likelihood that each data point is less than or greater than a split value determined for a given feature. An encrypted data point is received for analysis, and anomaly score is calculated based on a probability identified for each of the plurality of encrypted features. Based on determining that the calculated anomaly score exceeds a threshold value, the encrypted data point is identified as potentially anomalous.Type: GrantFiled: September 20, 2019Date of Patent: March 8, 2022Assignee: International Business Machines CorporationInventors: Kanthi Sarpatwar, Venkata Sitaramagiridharganesh Ganapavarapu, Saket Sathe, Roman Vaculin
-
Publication number: 20220044078Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.Type: ApplicationFiled: August 10, 2020Publication date: February 10, 2022Inventors: Saket Sathe, Gregory Bramble, Horst Cornelius Samulowitz, Charu C. Aggarwal
-
Publication number: 20220036246Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.Type: ApplicationFiled: July 29, 2020Publication date: February 3, 2022Inventors: Bei Chen, Long VU, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan
-
Publication number: 20210142222Abstract: Split an input dataset into training and test datasets; the former includes a plurality of data examples, each represented as a feature vector, and having an associated true label. Split the training dataset into a plurality of training data subsets; for each, train a corresponding machine learning model to obtain a plurality of such models, and apply same to the test dataset to obtain a plurality of predicted labels and prediction scores. For each of the plurality of examples, compute an agreement metric based on a corresponding one of the associated true labels; corresponding ones of the predicted labels; and corresponding ones of the prediction scores. Based on the computed metric, select, for at least some of the true label values, appropriate ones of the data examples to be added to a regression set. Add the appropriate ones of the data examples from the test dataset to the regression set.Type: ApplicationFiled: November 13, 2019Publication date: May 13, 2021Inventors: Yuan-Chi Chang, Deepak Srinivas Turaga, Long Vu, Venkata Nagaraju Pavuluri, Saket Sathe, Rodrigue Ngueyep Tzoumpe
-
Publication number: 20210092137Abstract: Aspects of the present disclosure describe techniques for detecting anomalous data in an encrypted data set. An example method generally includes receiving a data set of encrypted data points. A tree data structure having a number of levels is generated for the data set. Each level of the tree data structure generally corresponds to a feature of the encrypted plurality of features, and each node in the tree data structure at a given level represents a probability distribution of a likelihood that each data point is less than or greater than a split value determined for a given feature. An encrypted data point is received for analysis, and anomaly score is calculated based on a probability identified for each of the plurality of encrypted features. Based on determining that the calculated anomaly score exceeds a threshold value, the encrypted data point is identified as potentially anomalous.Type: ApplicationFiled: September 20, 2019Publication date: March 25, 2021Inventors: Kanthi Sarpatwar, Venkata Sitaramagiridharganesh Ganapavarapu, Saket Sathe, Roman Vaculin
-
Patent number: 10956821Abstract: Embodiments for accurate temporal event predictive modeling by a processor. An average reverse event delay may be determined from one or more event delays in a time-series window. A time-series event may be predicted by applying the average reverse event delay in conjunction with one or more weighted factors in a predictive model.Type: GrantFiled: November 29, 2016Date of Patent: March 23, 2021Assignee: International Business Machines CorporationInventors: Charu C. Aggarwal, Lingtao Cao, Tan Hung M. Ng, Saket Sathe, Deepak S. Turaga
-
Publication number: 20200327456Abstract: Embodiments for implementing enhanced ensemble model diversity and learning by a processor. One or more data sets may be created by combining one or more clusters of data points of a minority class with selected data points of a majority class. One or more ensemble models may be created from the one or more data sets using a supervised machine learning operation. An occurrence of an event may be predicted using the one or more ensemble models.Type: ApplicationFiled: April 11, 2019Publication date: October 15, 2020Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Saket SATHE, Deepak TURAGA, Charu AGGARWAL, Raju PAVULURI, Yuan-Chi CHANG
-
Publication number: 20200090010Abstract: Embodiments for automated feature engineering by one or more processors are described. One or more selected transformations may be applied to a set of features in a dataset to create a set of transform features using random feature transformation forest (RFTF) classifiers. A transform feature may be selected from the set of transform features having a highest discriminative power as compared to other features of the set of transform features. At each node in a decision tree, store the selected feature, a split value, and the one or more selected transformations for the transform feature.Type: ApplicationFiled: September 17, 2018Publication date: March 19, 2020Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Saket SATHE, Deepak S. TURAGA, Horst Cornelius SAMULOWITZ, Charu C. AGGARWAL
-
Publication number: 20200027024Abstract: Embodiments for accurate temporal event predictive modeling by a processor. An average reverse event delay may be determined from one or more event delays in a time-series window. A time-series event may be predicted by applying the average reverse event delay in conjunction with one or more weighted factors in a predictive model.Type: ApplicationFiled: September 30, 2019Publication date: January 23, 2020Inventors: Charu C. Aggarwal, Lingtao Cao, Tan Hung M. Ng, Saket Sathe, Deepak S. Turaga
-
Publication number: 20200027023Abstract: Embodiments for accurate temporal event predictive modeling by a processor. An average reverse event delay may be determined from one or more event delays in a time-series window. A time-series event may be predicted by applying the average reverse event delay in conjunction with one or more weighted factors in a predictive model.Type: ApplicationFiled: September 30, 2019Publication date: January 23, 2020Inventors: Charu C. Aggarwal, Lingtao Cao, Tan Hung M. Ng, Saket Sathe, Deepak S. Turaga