Patents by Inventor Tomas Karnagel

Tomas Karnagel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Automated provisioning for database performance

Patent number: 11782926

Abstract: Embodiments utilize trained query performance machine learning (QP-ML) models to predict an optimal compute node cluster size for a given in-memory workload. The QP-ML models include models that predict query task runtimes at various compute node cardinalities, and models that predict network communication time between nodes of the cluster. Embodiments also utilize an analytical model to predict overlap between predicted task runtimes and predicted network communication times. Based on this data, an optimal cluster size is selected for the workload. Embodiments further utilize trained data capacity machine learning (DC-ML) models to predict a minimum number of compute nodes needed to run a workload. The DC-ML models include models that predict the size of the workload dataset in a target data encoding, models that predict the amount of memory needed to run the queries in the workload, and models that predict the memory needed to accommodate changes to the dataset.

Type: Grant

Filed: January 12, 2022

Date of Patent: October 10, 2023

Assignee: Oracle International Corporation

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
Estimating number of distinct values in a data set using machine learning

Patent number: 11620547

Abstract: Techniques for estimating the number of distinct values in a data set using machine learning are provided. In one technique, a sample of a data set is retrieved where the sample is a strict subset of the data set. The sample is analyzed to identify feature values of multiple features of the sample. The feature values are inserted into a machine-learned model that computes a prediction regarding a number of distinct values in the data set. An estimated number of distinct values that is based on the prediction is stored in association with the data set.

Type: Grant

Filed: May 19, 2020

Date of Patent: April 4, 2023

Assignee: Oracle International Corporation

Inventors: Tomas Karnagel, Onur Kocberber, Farhan Tauheed, Nipun Agarwal
Automatic feature subset selection based on meta-learning

Patent number: 11615265

Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer ranks features of datasets of a training corpus. For each dataset and for each landmark percentage, a target ML model is configured to receive only a highest ranking landmark percentage of features, and a landmark accuracy achieved by training the ML model with the dataset is measured. Based on the landmark accuracies and meta-features values of the dataset, a respective training tuple is generated for each dataset. Based on all of the training tuples, a regressor is trained to predict an optimal amount of features for training the target ML model.

Type: Grant

Filed: August 21, 2019

Date of Patent: March 28, 2023

Assignee: Oracle International Corporation

Inventors: Tomas Karnagel, Sam Idicula, Hesam Fathi Moghadam, Nipun Agarwal
Automated configuration parameter tuning for database performance

Patent number: 11567937

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.

Type: Grant

Filed: May 12, 2021

Date of Patent: January 31, 2023

Assignee: Oracle International Corporation

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
Automatic feature subset selection using feature ranking and scalable automatic search

Patent number: 11544630

Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer calculates, for each feature of a training dataset, a relevance score based on: a relevance scoring function, and statistics of values, of the feature, that occur in the training dataset. A rank based on relevance scores of the features is calculated for each feature. A sequence of distinct subsets of the features, based on the ranks of the features, is generated. For each distinct subset of the sequence of distinct feature subsets, a fitness score is generated based on training a machine learning (ML) model that is configured for the distinct subset.

Type: Grant

Filed: May 20, 2019

Date of Patent: January 3, 2023

Assignee: Oracle International Corporation

Inventors: Tomas Karnagel, Sam Idicula, Nipun Agarwal
Chaining bloom filters to estimate the number of keys with low frequencies in a dataset

Patent number: 11520834

Abstract: Techniques are described for generating an approximate frequency histogram using a series of Bloom filters (BF). For example, to estimate the f1 and f2 cardinalities in a dataset, an ordered chain of three BFs is established (“BF1”, “BF2”, and “BF3”). An insertion operation is performed for each datum in the dataset, whereby the BFs are tested in order (starting at BF1) for the datum. If the datum is represented in a currently-tested BF, the subsequent BF in the chain is tested for the datum. If the datum is not represented in the currently-tested BF, the datum is added to the BF, a counter for the BF is incremented, and the insertion operation for the current datum ends. To estimate the cardinality of f1-values in the dataset, the BF2-counter is subtracted from the BF1-counter. Similarly, to estimate the cardinality of f2-values in the dataset, the BF3-counter is subtracted from the BF2-counter.

Type: Grant

Filed: July 28, 2021

Date of Patent: December 6, 2022

Assignee: Oracle International Corporation

Inventors: Tomas Karnagel, Suratna Budalakoti, Onur Kocberber, Nipun Agarwal, Alan Wood
AUTOMATED PROVISIONING FOR DATABASE PERFORMANCE

Publication number: 20220138199

Abstract: Embodiments utilize trained query performance machine learning (QP-ML) models to predict an optimal compute node cluster size for a given in-memory workload. The QP-ML models include models that predict query task runtimes at various compute node cardinalities, and models that predict network communication time between nodes of the cluster. Embodiments also utilize an analytical model to predict overlap between predicted task runtimes and predicted network communication times. Based on this data, an optimal cluster size is selected for the workload. Embodiments further utilize trained data capacity machine learning (DC-ML) models to predict a minimum number of compute nodes needed to run a workload. The DC-ML models include models that predict the size of the workload dataset in a target data encoding, models that predict the amount of memory needed to run the queries in the workload, and models that predict the memory needed to accommodate changes to the dataset.

Type: Application

Filed: January 12, 2022

Publication date: May 5, 2022

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
Automated provisioning for database performance

Patent number: 11256698

Abstract: Embodiments utilize trained query performance machine learning (QP-ML) models to predict an optimal compute node cluster size for a given in-memory workload. The QP-ML models include models that predict query task runtimes at various compute node cardinalities, and models that predict network communication time between nodes of the cluster. Embodiments also utilize an analytical model to predict overlap between predicted task runtimes and predicted network communication times. Based on this data, an optimal cluster size is selected for the workload. Embodiments further utilize trained data capacity machine learning (DC-ML) models to predict a minimum number of compute nodes needed to run a workload. The DC-ML models include models that predict the size of the workload dataset in a target data encoding, models that predict the amount of memory needed to run the queries in the workload, and models that predict the memory needed to accommodate changes to the dataset.

Type: Grant

Filed: April 11, 2019

Date of Patent: February 22, 2022

Assignee: Oracle International Corporation

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
ENABLING EFFICIENT MACHINE LEARNING MODEL INFERENCE USING ADAPTIVE SAMPLING FOR AUTONOMOUS DATABASE SERVICES

Publication number: 20210406717

Abstract: Herein are approaches for self-optimization of a database management system (DBMS) such as in real time. Adaptive just-in-time sampling techniques herein estimate database content statistics that a machine learning (ML) model may use to predict configuration settings that conserve computer resources such as execution time and storage space. In an embodiment, a computer repeatedly samples database content until a dynamic convergence criterion is satisfied. In each iteration of a series of sampling iterations, a subset of rows of a database table are sampled, and estimates of content statistics of the database table are adjusted based on the sampled subset of rows. Immediately or eventually after detecting dynamic convergence, a machine learning (ML) model predicts, based on the content statistic estimates, an optimal value for a configuration setting of the DBMS.

Type: Application

Filed: June 29, 2020

Publication date: December 30, 2021

Inventors: Farhan Tauheed, Onur Kocberber, Tomas Karnagel, Nipun Agarwal
ESTIMATING NUMBER OF DISTINCT VALUES IN A DATA SET USING MACHINE LEARNING

Publication number: 20210365805

Abstract: Techniques for estimating the number of distinct values in a data set using machine learning are provided. In one technique, a sample of a data set is retrieved where the sample is a strict subset of the data set. The sample is analyzed to identify feature values of multiple features of the sample. The feature values are inserted into a machine-learned model that computes a prediction regarding a number of distinct values in the data set. An estimated number of distinct values that is based on the prediction is stored in association with the data set.

Type: Application

Filed: May 19, 2020

Publication date: November 25, 2021

Inventors: Tomas Karnagel, Onur Kocberber, Farhan Tauheed, Nipun Agarwal
AUTOMATED CONFIGURATION PARAMETER TUNING FOR DATABASE PERFORMANCE

Publication number: 20210263934

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.

Type: Application

Filed: May 12, 2021

Publication date: August 26, 2021

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
Automated configuration parameter tuning for database performance

Patent number: 11061902

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.

Type: Grant

Filed: March 11, 2019

Date of Patent: July 13, 2021

Assignee: Oracle International Corporation

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
Automatic Feature Subset Selection based on Meta-Learning

Publication number: 20200327357

Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer ranks features of datasets of a training corpus. For each dataset and for each landmark percentage, a target ML model is configured to receive only a highest ranking landmark percentage of features, and a landmark accuracy achieved by training the ML model with the dataset is measured. Based on the landmark accuracies and meta-features values of the dataset, a respective training tuple is generated for each dataset. Based on all of the training tuples, a regressor is trained to predict an optimal amount of features for training the target ML model.

Type: Application

Filed: August 21, 2019

Publication date: October 15, 2020

Inventors: TOMAS KARNAGEL, SAM IDICULA, HESAM FATHI MOGHADAM, NIPUN AGARWAL
AUTOMATED PROVISIONING FOR DATABASE PERFORMANCE

Publication number: 20200125568

Abstract: Embodiments utilize trained query performance machine learning (QP-ML) models to predict an optimal compute node cluster size for a given in-memory workload. The QP-ML models include models that predict query task runtimes at various compute node cardinalities, and models that predict network communication time between nodes of the cluster. Embodiments also utilize an analytical model to predict overlap between predicted task runtimes and predicted network communication times. Based on this data, an optimal cluster size is selected for the workload. Embodiments further utilize trained data capacity machine learning (DC-ML) models to predict a minimum number of compute nodes needed to run a workload. The DC-ML models include models that predict the size of the workload dataset in a target data encoding, models that predict the amount of memory needed to run the queries in the workload, and models that predict the memory needed to accommodate changes to the dataset.

Type: Application

Filed: April 11, 2019

Publication date: April 23, 2020

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
AUTOMATED CONFIGURATION PARAMETER TUNING FOR DATABASE PERFORMANCE

Publication number: 20200125545

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.

Type: Application

Filed: March 11, 2019

Publication date: April 23, 2020

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
AUTOMATIC FEATURE SUBSET SELECTION USING FEATURE RANKING AND SCALABLE AUTOMATIC SEARCH

Publication number: 20200118036

Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer calculates, for each feature of a training dataset, a relevance score based on: a relevance scoring function, and statistics of values, of the feature, that occur in the training dataset. A rank based on relevance scores of the features is calculated for each feature. A sequence of distinct subsets of the features, based on the ranks of the features, is generated. For each distinct subset of the sequence of distinct feature subsets, a fitness score is generated based on training a machine learning (ML) model that is configured for the distinct subset.

Type: Application

Filed: May 20, 2019

Publication date: April 16, 2020

Inventors: Tomas Karnagel, Sam Idicula, Nipun Agarwal