Patents by Inventor Saket SATHE

Saket SATHE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11941541
    Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: March 26, 2024
    Assignee: International Business Machines Corporation
    Inventors: Saket Sathe, Gregory Bramble, Horst Cornelius Samulowitz, Charu C. Aggarwal
  • Patent number: 11829799
    Abstract: A method, a structure, and a computer system for predicting pipeline training requirements. The exemplary embodiments may include receiving one or more worker node features from one or more worker nodes, extracting one or more pipeline features from one or more pipelines to be trained, and extracting one or more dataset features from one or more datasets used to train the one or more pipelines. The exemplary embodiments may further include predicting an amount of one or more resources required for each of the one or more worker nodes to train the one or more pipelines using the one or more datasets based on one or more models that correlate the one or more worker node features, one or more pipeline features, and one or more dataset features with the one or more resources. Lastly, the exemplary embodiments may include identifying a worker node requiring a least amount of the one or more resources of the one or more worker nodes for training the one or more pipelines.
    Type: Grant
    Filed: October 13, 2020
    Date of Patent: November 28, 2023
    Assignee: International Business Machines Corporation
    Inventors: Saket Sathe, Gregory Bramble, Long Vu, Theodoros Salonidis
  • Publication number: 20230177387
    Abstract: A method, system, and computer program product for a metalearner for automated machine learning are provided. The method receives a labeled data set. A set of data subsets is generated from the labeled data set. A set of unsupervised machine learning pipelines is generated. A training set is generated from the set of data subsets and the set of unsupervised machine learning pipelines. The method trains a metalearner for unsupervised tasks based on the training set.
    Type: Application
    Filed: December 8, 2021
    Publication date: June 8, 2023
    Inventors: Saket Sathe, Long Vu, Peter Daniel Kirchner, Charu C. Aggarwal
  • Patent number: 11620582
    Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: April 4, 2023
    Assignee: International Business Machines Corporation
    Inventors: Bei Chen, Long Vu, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan
  • Patent number: 11593716
    Abstract: Embodiments for implementing enhanced ensemble model diversity and learning by a processor. One or more data sets may be created by combining one or more clusters of data points of a minority class with selected data points of a majority class. One or more ensemble models may be created from the one or more data sets using a supervised machine learning operation. An occurrence of an event may be predicted using the one or more ensemble models.
    Type: Grant
    Filed: April 11, 2019
    Date of Patent: February 28, 2023
    Assignee: International Business Machines Corporation
    Inventors: Saket Sathe, Deepak Turaga, Charu Aggarwal, Raju Pavuluri, Yuan-Chi Chang
  • Publication number: 20220343207
    Abstract: In a method for ranking machine learning (ML) pipelines for a dataset, a processor receives first performance curves predicted by a meta learner model for a plurality of ML pipelines. A processor allocates a first subset of data points from the dataset to each of the plurality of ML pipelines. A processor receives first performance scores for each of the ML pipelines for the first subset of data points. A processor updates the meta learner model using the first performance scores. A processor receives second performance curves from the meta learner model updated with the first performance scores. A processor ranks the plurality of ML pipelines based on the second performance curves.
    Type: Application
    Filed: April 22, 2021
    Publication date: October 27, 2022
    Inventors: Long Vu, Saket Sathe, Bei Chen, Peter Daniel Kirchner
  • Publication number: 20220207444
    Abstract: A system and method for assessing Pay-As-You-Go (PAYG) Automatic machine learned (AutoML) model pipeline charge to a user on the basis of performance improvement achieved by configuring a model pipeline with performance enhancements relative to a performance obtained by a base model pipeline. The method performs a ranking of pipelines (customized models) based on a user-specified metric (for example, prediction accuracy, run time, F1 score) or combination of metrics. The price for ranked pipelines is specified based on a “surrogate” model where the surrogate model is fit to the base model price and the maximum price for a model. The base model price relates to use of a current cloud resource utilization-based pricing model. The pricing per model pipeline increments on the basis of performance metric(s) in a linear fashion, e.g., using a linear pricing model, or in an exponential fashion, e.g., using a fixed percentage hike price model.
    Type: Application
    Filed: December 30, 2020
    Publication date: June 30, 2022
    Inventors: Gregory Bramble, Saket Sathe, Long Vu, Theodoros Salonidis, Horst Cornelius Samulowitz, Jean-François Puget
  • Publication number: 20220114019
    Abstract: A method, a structure, and a computer system for predicting pipeline training requirements. The exemplary embodiments may include receiving one or more worker node features from one or more worker nodes, extracting one or more pipeline features from one or more pipelines to be trained, and extracting one or more dataset features from one or more datasets used to train the one or more pipelines. The exemplary embodiments may further include predicting an amount of one or more resources required for each of the one or more worker nodes to train the one or more pipelines using the one or more datasets based on one or more models that correlate the one or more worker node features, one or more pipeline features, and one or more dataset features with the one or more resources. Lastly, the exemplary embodiments may include identifying a worker node requiring a least amount of the one or more resources of the one or more worker nodes for training the one or more pipelines.
    Type: Application
    Filed: October 13, 2020
    Publication date: April 14, 2022
    Inventors: Saket Sathe, Gregory Bramble, Long Vu, Theodoros Salonidis
  • Patent number: 11295242
    Abstract: Split an input dataset into training and test datasets; the former includes a plurality of data examples, each represented as a feature vector, and having an associated true label. Split the training dataset into a plurality of training data subsets; for each, train a corresponding machine learning model to obtain a plurality of such models, and apply same to the test dataset to obtain a plurality of predicted labels and prediction scores. For each of the plurality of examples, compute an agreement metric based on a corresponding one of the associated true labels; corresponding ones of the predicted labels; and corresponding ones of the prediction scores. Based on the computed metric, select, for at least some of the true label values, appropriate ones of the data examples to be added to a regression set. Add the appropriate ones of the data examples from the test dataset to the regression set.
    Type: Grant
    Filed: November 13, 2019
    Date of Patent: April 5, 2022
    Assignee: International Business Machines Corporation
    Inventors: Yuan-Chi Chang, Deepak Srinivas Turaga, Long Vu, Venkata Nagaraju Pavuluri, Saket Sathe, Rodrigue Ngueyep Tzoumpe
  • Patent number: 11275974
    Abstract: Embodiments for automated feature engineering by one or more processors are described. One or more selected transformations may be applied to a set of features in a dataset to create a set of transform features using random feature transformation forest (RFTF) classifiers. A transform feature may be selected from the set of transform features having a highest discriminative power as compared to other features of the set of transform features. At each node in a decision tree, store the selected feature, a split value, and the one or more selected transformations for the transform feature.
    Type: Grant
    Filed: September 17, 2018
    Date of Patent: March 15, 2022
    Assignee: International Business Machines Corporation
    Inventors: Saket Sathe, Deepak S. Turaga, Horst Cornelius Samulowitz, Charu C. Aggarwal
  • Patent number: 11271958
    Abstract: Aspects of the present disclosure describe techniques for detecting anomalous data in an encrypted data set. An example method generally includes receiving a data set of encrypted data points. A tree data structure having a number of levels is generated for the data set. Each level of the tree data structure generally corresponds to a feature of the encrypted plurality of features, and each node in the tree data structure at a given level represents a probability distribution of a likelihood that each data point is less than or greater than a split value determined for a given feature. An encrypted data point is received for analysis, and an anomaly score is calculated based on a probability identified for each of the plurality of encrypted features. Based on determining that the calculated anomaly score exceeds a threshold value, the encrypted data point is identified as potentially anomalous.
    Type: Grant
    Filed: September 20, 2019
    Date of Patent: March 8, 2022
    Assignee: International Business Machines Corporation
    Inventors: Kanthi Sarpatwar, Venkata Sitaramagiridharganesh Ganapavarapu, Saket Sathe, Roman Vaculin
  • Publication number: 20220044078
    Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.
    Type: Application
    Filed: August 10, 2020
    Publication date: February 10, 2022
    Inventors: Saket Sathe, Gregory Bramble, Horst Cornelius Samulowitz, Charu C. Aggarwal
  • Publication number: 20220036246
    Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.
    Type: Application
    Filed: July 29, 2020
    Publication date: February 3, 2022
    Inventors: Bei Chen, Long Vu, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan
  • Publication number: 20210142222
    Abstract: Split an input dataset into training and test datasets; the former includes a plurality of data examples, each represented as a feature vector, and having an associated true label. Split the training dataset into a plurality of training data subsets; for each, train a corresponding machine learning model to obtain a plurality of such models, and apply same to the test dataset to obtain a plurality of predicted labels and prediction scores. For each of the plurality of examples, compute an agreement metric based on a corresponding one of the associated true labels; corresponding ones of the predicted labels; and corresponding ones of the prediction scores. Based on the computed metric, select, for at least some of the true label values, appropriate ones of the data examples to be added to a regression set. Add the appropriate ones of the data examples from the test dataset to the regression set.
    Type: Application
    Filed: November 13, 2019
    Publication date: May 13, 2021
    Inventors: Yuan-Chi Chang, Deepak Srinivas Turaga, Long Vu, Venkata Nagaraju Pavuluri, Saket Sathe, Rodrigue Ngueyep Tzoumpe
  • Publication number: 20210092137
    Abstract: Aspects of the present disclosure describe techniques for detecting anomalous data in an encrypted data set. An example method generally includes receiving a data set of encrypted data points. A tree data structure having a number of levels is generated for the data set. Each level of the tree data structure generally corresponds to a feature of the encrypted plurality of features, and each node in the tree data structure at a given level represents a probability distribution of a likelihood that each data point is less than or greater than a split value determined for a given feature. An encrypted data point is received for analysis, and an anomaly score is calculated based on a probability identified for each of the plurality of encrypted features. Based on determining that the calculated anomaly score exceeds a threshold value, the encrypted data point is identified as potentially anomalous.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Inventors: Kanthi Sarpatwar, Venkata Sitaramagiridharganesh Ganapavarapu, Saket Sathe, Roman Vaculin
  • Patent number: 10956821
    Abstract: Embodiments for accurate temporal event predictive modeling by a processor. An average reverse event delay may be determined from one or more event delays in a time-series window. A time-series event may be predicted by applying the average reverse event delay in conjunction with one or more weighted factors in a predictive model.
    Type: Grant
    Filed: November 29, 2016
    Date of Patent: March 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Lingtao Cao, Tan Hung M. Ng, Saket Sathe, Deepak S. Turaga
  • Publication number: 20200327456
    Abstract: Embodiments for implementing enhanced ensemble model diversity and learning by a processor. One or more data sets may be created by combining one or more clusters of data points of a minority class with selected data points of a majority class. One or more ensemble models may be created from the one or more data sets using a supervised machine learning operation. An occurrence of an event may be predicted using the one or more ensemble models.
    Type: Application
    Filed: April 11, 2019
    Publication date: October 15, 2020
    Applicant: International Business Machines Corporation
    Inventors: Saket Sathe, Deepak Turaga, Charu Aggarwal, Raju Pavuluri, Yuan-Chi Chang
  • Publication number: 20200090010
    Abstract: Embodiments for automated feature engineering by one or more processors are described. One or more selected transformations may be applied to a set of features in a dataset to create a set of transform features using random feature transformation forest (RFTF) classifiers. A transform feature may be selected from the set of transform features having a highest discriminative power as compared to other features of the set of transform features. At each node in a decision tree, store the selected feature, a split value, and the one or more selected transformations for the transform feature.
    Type: Application
    Filed: September 17, 2018
    Publication date: March 19, 2020
    Applicant: International Business Machines Corporation
    Inventors: Saket Sathe, Deepak S. Turaga, Horst Cornelius Samulowitz, Charu C. Aggarwal
  • Publication number: 20200027024
    Abstract: Embodiments for accurate temporal event predictive modeling by a processor. An average reverse event delay may be determined from one or more event delays in a time-series window. A time-series event may be predicted by applying the average reverse event delay in conjunction with one or more weighted factors in a predictive model.
    Type: Application
    Filed: September 30, 2019
    Publication date: January 23, 2020
    Inventors: Charu C. Aggarwal, Lingtao Cao, Tan Hung M. Ng, Saket Sathe, Deepak S. Turaga
  • Publication number: 20200027023
    Abstract: Embodiments for accurate temporal event predictive modeling by a processor. An average reverse event delay may be determined from one or more event delays in a time-series window. A time-series event may be predicted by applying the average reverse event delay in conjunction with one or more weighted factors in a predictive model.
    Type: Application
    Filed: September 30, 2019
    Publication date: January 23, 2020
    Inventors: Charu C. Aggarwal, Lingtao Cao, Tan Hung M. Ng, Saket Sathe, Deepak S. Turaga
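
Illustrative sketch (not taken from any patent text): the pipeline-recommendation abstract listed above (Patent number 11941541 and Publication number 20220044078) describes selecting top pipelines from a performance matrix by data set similarity, then recommending one based on its measured accuracy on the testing data set. The Python below is a minimal sketch of that flow under several assumptions: data sets are summarized by numeric feature vectors, similarity is cosine similarity, and pipelines are ranked by similarity-weighted accuracy. The names cosine, shortlist_pipelines, and recommend_pipeline, and all of the example values, are hypothetical; the abstract does not specify any of these choices.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shortlist_pipelines(performance, train_features, test_features, top_k):
    """Pick top_k candidate pipelines for a new data set.

    performance[d][p] is the accuracy of pipeline p on training data set d
    (the "performance matrix"). Each pipeline is scored by its accuracies
    weighted by how similar each training data set is to the testing data
    set. The weighting scheme is an assumption made for this sketch.
    """
    weights = {d: cosine(train_features[d], test_features) for d in performance}
    scores = {}
    for d, row in performance.items():
        for p, accuracy in row.items():
            scores[p] = scores.get(p, 0.0) + weights[d] * accuracy
    ranked = sorted(scores, key=lambda p: scores[p], reverse=True)
    return ranked[:top_k]


def recommend_pipeline(candidates, evaluate):
    """Run each candidate on the testing data set and keep the results.

    evaluate is a caller-supplied function returning the measured accuracy
    of one pipeline. Returns the best pipeline and all stored results.
    """
    results = {p: evaluate(p) for p in candidates}
    best = max(results, key=results.get)
    return best, results


# Hypothetical performance matrix: two training data sets, three pipelines.
performance = {
    "ds_a": {"p1": 0.90, "p2": 0.60, "p3": 0.70},
    "ds_b": {"p1": 0.50, "p2": 0.95, "p3": 0.60},
}
train_features = {"ds_a": [1.0, 0.0], "ds_b": [0.0, 1.0]}
test_features = [0.9, 0.1]  # the testing data set resembles ds_a

# Stand-in for actually executing a pipeline on the testing data set.
measured = {"p1": 0.81, "p3": 0.84}

candidates = shortlist_pipelines(performance, train_features, test_features, top_k=2)
best, results = recommend_pipeline(candidates, lambda p: measured[p])
print(candidates, best)
```

With these made-up numbers the shortlist is p1 and p3, and p3 is recommended because it scores higher when actually run on the testing data set, even though p1 ranked first in the similarity-weighted estimate. That gap between the estimated ranking and the measured accuracy is why the abstract evaluates the shortlisted pipelines rather than trusting the performance matrix alone.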