Patents by Inventor Onur Kocberber

Onur Kocberber has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Automating data load operations for in-memory data warehouses

Patent number: 12248444

Abstract: Auto-parallel-load techniques are provided for automatically loading database objects from an on-disk database system into an in-memory database system. The auto-parallel-load techniques involve a pipeline that includes several components. In one implementation, each of the pipeline components is configured to receive, extract information from, and add information to, a “state object”. One or more of the pipeline components include logic that is based on the output of a corresponding machine learning model. The machine learning models used by the pipeline components may be trained from training sets from which outliers have been excluded, and may be used as the basis for generating linear models that are used during runtime, to produce estimates that affect the parameters of the auto-parallel-load operation.

Type: Grant

Filed: December 14, 2023

Date of Patent: March 11, 2025

Assignee: Oracle International Corporation

Inventors: Fotis Savva, Farhan Tauheed, Marc Jolles, Onur Kocberber, Seema Sundara, Nipun Agarwal
Workload-aware data placement advisor for OLAP database systems

Patent number: 12229135

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automatic data placement recommendations for partitioning data across multiple nodes in a database system. The system is configured to extract workload-specific features of a database workload running at a database system and dataset-specific features of a database running on the database system. The workload-specific features characterize utilization of the database workload. The dataset-specific features characterize how data is organized within the database. The system identifies a plurality of candidate keys for determining how to partition data stored in the database across nodes. Based at least in part on the workload-specific features, the dataset-specific features, and the plurality of candidate keys, a set of candidate key combinations for partitioning data is generated.

Type: Grant

Filed: March 21, 2022

Date of Patent: February 18, 2025

Assignee: Oracle International Corporation

Inventors: Urvashi Oswal, Jian Wen, Farhan Tauheed, Onur Kocberber, Seema Sundara, Nipun Agarwal
Enabling efficient machine learning model inference using adaptive sampling for autonomous database services

Patent number: 12014286

Abstract: Herein are approaches for self-optimization of a database management system (DBMS) such as in real time. Adaptive just-in-time sampling techniques herein estimate database content statistics that a machine learning (ML) model may use to predict configuration settings that conserve computer resources such as execution time and storage space. In an embodiment, a computer repeatedly samples database content until a dynamic convergence criterion is satisfied. In each iteration of a series of sampling iterations, a subset of rows of a database table is sampled, and estimates of content statistics of the database table are adjusted based on the sampled subset of rows. Immediately or eventually after detecting dynamic convergence, a machine learning (ML) model predicts, based on the content statistic estimates, an optimal value for a configuration setting of the DBMS.

Type: Grant

Filed: June 29, 2020

Date of Patent: June 18, 2024

Assignee: Oracle International Corporation

Inventors: Farhan Tauheed, Onur Kocberber, Tomas Karnagel, Nipun Agarwal
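
The sampling loop described in the adaptive sampling abstract above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method: the statistic (a column mean), the convergence test (relative change between consecutive estimates), and every name and default value below are hypothetical choices made for the example.

```python
import random


def adaptive_sample_mean(rows, batch_size=100, tolerance=0.01,
                         max_iterations=50, seed=0):
    """Sample rows in batches until the running mean stops changing.

    Assumption: "convergence" means the relative change between two
    consecutive estimates is at most `tolerance`. The abstract does not
    define the criterion, so this is only one plausible choice.
    Returns (estimate, iterations_used).
    """
    if not rows:
        raise ValueError("rows must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    total = 0.0
    count = 0
    previous = None
    estimate = None
    for iteration in range(1, max_iterations + 1):
        # Sample a subset of rows (with replacement, for simplicity).
        batch = rng.choices(rows, k=batch_size)
        total += sum(batch)
        count += len(batch)
        # Adjust the statistic estimate using the new sample.
        estimate = total / count
        if previous is not None:
            change = abs(estimate - previous)
            if change <= tolerance * max(abs(previous), 1e-12):
                return estimate, iteration
        previous = estimate
    return estimate, max_iterations
```

In a setting like the one the abstract describes, the converged estimate would then be passed as a feature to a model that predicts a configuration setting; that step is omitted here because the abstract does not specify the model.
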
Workload-aware data encoding

Patent number: 11907250

Abstract: Techniques are described for executing machine learning models trained for specific operators with feature values that are based on the actual execution of a workload set. The machine learning models generate an estimate of benefit gain/cost for executing operations on data portions in the alternative encoding format. Such data portions may be sorted based on the estimated benefit, in an embodiment. Using cost estimation machine learning models for memory space, the data portions with the most benefits that comply with the existing memory space constraints are recommended and/or are automatically encoded into the alternative encoding format.

Type: Grant

Filed: July 22, 2022

Date of Patent: February 20, 2024

Assignee: Oracle International Corporation

Inventors: Urvashi Oswal, Marc Jolles, Onur Kocberber, Seema Sundara, Nipun Agarwal
WORKLOAD-AWARE DATA ENCODING

Publication number: 20240028605

Abstract: Techniques are described for executing machine learning models trained for specific operators with feature values that are based on the actual execution of a workload set. The machine learning models generate an estimate of benefit gain/cost for executing operations on data portions in the alternative encoding format. Such data portions may be sorted based on the estimated benefit, in an embodiment. Using cost estimation machine learning models for memory space, the data portions with the most benefits that comply with the existing memory space constraints are recommended and/or are automatically encoded into the alternative encoding format.

Type: Application

Filed: July 22, 2022

Publication date: January 25, 2024

Inventors: URVASHI OSWAL, MARC JOLLES, ONUR KOCBERBER, SEEMA SUNDARA, NIPUN AGARWAL
Prediction of buffer pool size for transaction processing workloads

Patent number: 11868261

Abstract: Techniques are described herein for prediction of a buffer pool size (BPS). Before performing BPS prediction, gathered data are used to determine whether a target workload is in a steady state. Historical utilization data gathered while the workload is in a steady state are used to predict object-specific BPS components for database objects, accessed by the target workload, that are identified for BPS analysis based on shares of the total disk I/O requests, for the workload, that are attributed to the respective objects. Preference of analysis is given to objects that are associated with larger shares of disk I/O activity. An object-specific BPS component is determined based on a coverage function that returns a percentage of the database object size (on disk) that should be available in the buffer pool for that database object. The percentage is determined using either a heuristic-based or a machine learning-based approach.

Type: Grant

Filed: July 20, 2021

Date of Patent: January 9, 2024

Assignee: Oracle International Corporation

Inventors: Peyman Faizian, Mayur Bency, Onur Kocberber, Seema Sundara, Nipun Agarwal
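
The object selection and coverage steps in the buffer pool size abstract above can be sketched in Python. This is a rough illustration under stated assumptions, not the patented method: the abstract does not say how object-specific components are combined or when to stop selecting objects, so summing the components and stopping at a cumulative I/O share cutoff are hypothetical choices, as are all names and the dictionary layout.

```python
def predict_buffer_pool_size(objects, coverage, io_share_cutoff=0.9):
    """Estimate a buffer pool size from per-object components.

    objects: list of dicts with "size_bytes" (on-disk size) and
             "io_share" (fraction of the workload's disk I/O requests).
    coverage: function returning the fraction (0.0 to 1.0) of an object
              that should be resident in the buffer pool. It could be a
              heuristic or a trained model; here it is just a callable.
    io_share_cutoff: assumed stopping rule, not taken from the abstract.
    """
    # Objects with larger shares of disk I/O are analyzed first.
    ranked = sorted(objects, key=lambda o: o["io_share"], reverse=True)
    total_bytes = 0.0
    covered_share = 0.0
    for obj in ranked:
        if covered_share >= io_share_cutoff:
            break
        # Object-specific component: coverage fraction times on-disk size.
        total_bytes += coverage(obj) * obj["size_bytes"]
        covered_share += obj["io_share"]
    return total_bytes
```

For example, with a constant coverage function such as `lambda obj: 0.5`, each selected object contributes half of its on-disk size to the estimate.
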
WORKLOAD-AWARE DATA PLACEMENT ADVISOR FOR OLAP DATABASE SYSTEMS

Publication number: 20230297573

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automatic data placement recommendations for partitioning data across multiple nodes in a database system. The system is configured to extract workload-specific features of a database workload running at a database system and dataset-specific features of a database running on the database system. The workload-specific features characterize utilization of the database workload. The dataset-specific features characterize how data is organized within the database. The system identifies a plurality of candidate keys for determining how to partition data stored in the database across nodes. Based at least in part on the workload-specific features, the dataset-specific features, and the plurality of candidate keys, a set of candidate key combinations for partitioning data is generated.

Type: Application

Filed: March 21, 2022

Publication date: September 21, 2023

Inventors: Urvashi Oswal, Jian Wen, Farhan Tauheed, Onur Kocberber, Seema Sundara, Nipun Agarwal
Datacenter level utilization prediction without operating system involvement

Patent number: 11657256

Abstract: Embodiments use a hierarchy of machine learning models to predict datacenter behavior at multiple hardware levels of a datacenter without accessing operating system generated hardware utilization information. The accuracy of higher-level models in the hierarchy of models is increased by including, as input to the higher-level models, hardware utilization predictions from lower-level models. The hierarchy of models includes: server utilization models and workload/OS prediction models that produce predictions at a server device-level of a datacenter; and also top-of-rack switch models and backbone switch models that produce predictions at higher levels of the datacenter. These models receive, as input, hardware utilization information from non-OS sources. Based on datacenter-level network utilization predictions from the hierarchy of models, the datacenter automatically configures its hardware to avoid any predicted over-utilization of hardware in the datacenter.

Type: Grant

Filed: July 18, 2022

Date of Patent: May 23, 2023

Assignee: Oracle International Corporation

Inventors: Pravin Shinde, Felix Schmidt, Onur Kocberber
Estimating number of distinct values in a data set using machine learning

Patent number: 11620547

Abstract: Techniques for estimating the number of distinct values in a data set using machine learning are provided. In one technique, a sample of a data set is retrieved where the sample is a strict subset of the data set. The sample is analyzed to identify feature values of multiple features of the sample. The feature values are inserted into a machine-learned model that computes a prediction regarding a number of distinct values in the data set. An estimated number of distinct values that is based on the prediction is stored in association with the data set.

Type: Grant

Filed: May 19, 2020

Date of Patent: April 4, 2023

Assignee: Oracle International Corporation

Inventors: Tomas Karnagel, Onur Kocberber, Farhan Tauheed, Nipun Agarwal
Disk drive failure prediction with neural networks

Patent number: 11579951

Abstract: Techniques are described herein for predicting disk drive failure using a machine learning model. The framework involves receiving disk drive sensor attributes as training data, preprocessing the training data to select a set of enhanced feature sequences, and using the enhanced feature sequences to train a machine learning model to predict disk drive failures from disk drive sensor monitoring data. Prior to the training phase, the RNN LSTM model is tuned using a set of predefined hyper-parameters. The preprocessing, which is performed during the training and evaluation phase as well as later during the prediction phase, involves using predefined values for a set of parameters to generate the set of enhanced sequences from raw sensor readings. The enhanced feature sequences are generated to maintain a desired healthy/failed disk ratio, and only use samples leading up to a last-valid-time sample in order to honor a pre-specified heads-up-period alert requirement.

Type: Grant

Filed: September 27, 2018

Date of Patent: February 14, 2023

Assignee: Oracle International Corporation

Inventors: Onur Kocberber, Felix Schmidt, Arun Raghavan, Nipun Agarwal, Sam Idicula, Guang-Tong Zhou, Nitin Kunal
PREDICTION OF BUFFER POOL SIZE FOR TRANSACTION PROCESSING WORKLOADS

Publication number: 20230022884

Abstract: Techniques are described herein for prediction of a buffer pool size (BPS). Before performing BPS prediction, gathered data are used to determine whether a target workload is in a steady state. Historical utilization data gathered while the workload is in a steady state are used to predict object-specific BPS components for database objects, accessed by the target workload, that are identified for BPS analysis based on shares of the total disk I/O requests, for the workload, that are attributed to the respective objects. Preference of analysis is given to objects that are associated with larger shares of disk I/O activity. An object-specific BPS component is determined based on a coverage function that returns a percentage of the database object size (on disk) that should be available in the buffer pool for that database object. The percentage is determined using either a heuristic-based or a machine learning-based approach.

Type: Application

Filed: July 20, 2021

Publication date: January 26, 2023

Inventors: Peyman Faizian, Mayur Bency, Onur Kocberber, Seema Sundara, Nipun Agarwal
Method for generating rulesets using tree-based models for black-box machine learning explainability

Patent number: 11531915

Abstract: Herein are techniques to generate candidate rulesets for machine learning (ML) explainability (MLX) for black-box ML models. In an embodiment, an ML model generates classifications that each associates a distinct example with a label. A decision tree that, based on the classifications, contains tree nodes is received or generated. Each node contains label(s), a condition that identifies a feature of examples, and a split value for the feature. When a node has child nodes, the feature and the split value that are identified by the condition of the node are set to maximize information gain of the child nodes. Candidate rules are generated by traversing the tree. Each rule is built from a combination of nodes in a tree traversal path. Each rule contains a condition of at least one node and is assigned to a rule level. Candidate rules are subsequently optimized into an optimal ruleset for actual use.

Type: Grant

Filed: March 20, 2019

Date of Patent: December 20, 2022

Assignee: Oracle International Corporation

Inventors: Tayler Hetherington, Zahra Zohrevand, Onur Kocberber, Karoon Rashedi Nia, Sam Idicula, Nipun Agarwal
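
The tree traversal step in the ruleset abstract above can be sketched in Python. This is a simplified illustration under stated assumptions, not the patented method: the Node layout, the choice to emit one rule per path prefix, and the definition of a rule's level as its number of conditions are all hypothetical, and the later optimization into a final ruleset is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                       # majority label at this node
    feature: Optional[str] = None    # feature tested by the condition
    split: Optional[float] = None    # split value for the feature
    left: Optional["Node"] = None    # branch where feature <= split
    right: Optional["Node"] = None   # branch where feature > split


def candidate_rules(node, path=()):
    """Collect one candidate rule for every non-root node in the tree.

    Each rule joins the conditions on the path from the root to that
    node. Assumption: the rule level equals the number of conditions.
    """
    rules = []
    if path:
        rules.append({"conditions": list(path),
                      "label": node.label,
                      "level": len(path)})
    if node.feature is not None:
        if node.left is not None:
            cond = f"{node.feature} <= {node.split}"
            rules += candidate_rules(node.left, path + (cond,))
        if node.right is not None:
            cond = f"{node.feature} > {node.split}"
            rules += candidate_rules(node.right, path + (cond,))
    return rules
```

A tree with one split on a hypothetical feature `age` and a second split on `income` under its right branch would yield two level-1 rules and two level-2 rules.
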
Chaining bloom filters to estimate the number of keys with low frequencies in a dataset

Patent number: 11520834

Abstract: Techniques are described for generating an approximate frequency histogram using a series of Bloom filters (BF). For example, to estimate the f1 and f2 cardinalities in a dataset, an ordered chain of three BFs is established (“BF1”, “BF2”, and “BF3”). An insertion operation is performed for each datum in the dataset, whereby the BFs are tested in order (starting at BF1) for the datum. If the datum is represented in a currently-tested BF, the subsequent BF in the chain is tested for the datum. If the datum is not represented in the currently-tested BF, the datum is added to the BF, a counter for the BF is incremented, and the insertion operation for the current datum ends. To estimate the cardinality of f1-values in the dataset, the BF2-counter is subtracted from the BF1-counter. Similarly, to estimate the cardinality of f2-values in the dataset, the BF3-counter is subtracted from the BF2-counter.

Type: Grant

Filed: July 28, 2021

Date of Patent: December 6, 2022

Assignee: Oracle International Corporation

Inventors: Tomas Karnagel, Suratna Budalakoti, Onur Kocberber, Nipun Agarwal, Alan Wood
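
The insertion and subtraction steps in the chained Bloom filter abstract above can be sketched in Python. This is a minimal illustration, not the patented implementation: the BloomFilter class, its SHA-256-based hashing, the filter size, and the function name estimate_low_frequencies are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter; size and hash count are illustrative."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with a seed.
        for seed in range(self.num_hashes):
            data = f"{seed}:{item!r}".encode("utf-8")
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)


def estimate_low_frequencies(data, chain_length=3):
    """Return estimates [f1, f2, ...] from a chain of Bloom filters.

    With three filters this gives f1 (values seen exactly once) and
    f2 (values seen exactly twice), as in the abstract.
    """
    filters = [BloomFilter() for _ in range(chain_length)]
    counters = [0] * chain_length
    for datum in data:
        for i, bf in enumerate(filters):
            if datum in bf:
                continue  # already represented: test the next filter
            bf.add(datum)
            counters[i] += 1
            break  # insertion for this datum ends
    # f_k is the k-th counter minus the (k+1)-th counter.
    return [counters[i] - counters[i + 1] for i in range(chain_length - 1)]
```

Because Bloom filters can report false positives, the results are estimates; their accuracy depends on how the filters are sized relative to the number of distinct values.
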
DATACENTER LEVEL UTILIZATION PREDICTION WITHOUT OPERATING SYSTEM INVOLVEMENT

Publication number: 20220351023

Abstract: Embodiments use a hierarchy of machine learning models to predict datacenter behavior at multiple hardware levels of a datacenter without accessing operating system generated hardware utilization information. The accuracy of higher-level models in the hierarchy of models is increased by including, as input to the higher-level models, hardware utilization predictions from lower-level models. The hierarchy of models includes: server utilization models and workload/OS prediction models that produce predictions at a server device-level of a datacenter; and also top-of-rack switch models and backbone switch models that produce predictions at higher levels of the datacenter. These models receive, as input, hardware utilization information from non-OS sources. Based on datacenter-level network utilization predictions from the hierarchy of models, the datacenter automatically configures its hardware to avoid any predicted over-utilization of hardware in the datacenter.

Type: Application

Filed: July 18, 2022

Publication date: November 3, 2022

Inventors: Pravin Shinde, Felix Schmidt, Onur Kocberber
Datacenter level utilization prediction without operating system involvement

Patent number: 11443166

Abstract: Embodiments use a hierarchy of machine learning models to predict datacenter behavior at multiple hardware levels of a datacenter without accessing operating system generated hardware utilization information. The accuracy of higher-level models in the hierarchy of models is increased by including, as input to the higher-level models, hardware utilization predictions from lower-level models. The hierarchy of models includes: server utilization models and workload/OS prediction models that produce predictions at a server device-level of a datacenter; and also top-of-rack switch models and backbone switch models that produce predictions at higher levels of the datacenter. These models receive, as input, hardware utilization information from non-OS sources. Based on datacenter-level network utilization predictions from the hierarchy of models, the datacenter automatically configures its hardware to avoid any predicted over-utilization of hardware in the datacenter.

Type: Grant

Filed: October 29, 2018

Date of Patent: September 13, 2022

Assignee: Oracle International Corporation

Inventors: Pravin Shinde, Felix Schmidt, Onur Kocberber
Out of band server utilization estimation and server workload characterization for datacenter resource optimization and forecasting

Patent number: 11423327

Abstract: Techniques are described herein for estimating CPU, memory, and I/O utilization for a workload via out-of-band sensor readings using a machine learning model. The framework involves receiving sensor data associated with executing benchmark applications, obtaining ground truth utilization values for the benchmarks, preprocessing the training data to select a set of enhanced sequences, and using the enhanced sequences to train a random forest model to estimate CPU, memory, and I/O utilization given sensor monitoring data. Prior to the training phase, a machine learning model is trained using a set of predefined hyper-parameters. The trained models are used to generate estimations for CPU, memory, and I/O utilization values. The utilization values are used with workload context information to assess the deployment and generate one or more recommendations for machine types that will best serve the workload in terms of system utilization.

Type: Grant

Filed: October 10, 2018

Date of Patent: August 23, 2022

Assignee: Oracle International Corporation

Inventors: Onur Kocberber, Felix Schmidt, Craig Schelp, Andrew Brownsword, Nipun Agarwal
Efficient adjustment of spin-locking parameter values

Patent number: 11379456

Abstract: Systems and methods for adjusting parameters for a spin-lock implementation of concurrency control are described herein. In an embodiment, a system continuously retrieves, from a resource management system, one or more state values defining a state of the resource management system. Based on the one or more state values, the system determines that the resource management system has reached a steady state and, in response, adjusts a plurality of parameters for spin-locking performed by said resource management system to identify optimal values for the plurality of parameters. After adjusting the plurality of parameters, the system detects, based on one or more current state values, a workload change in the resource management system and, in response, readjusts the plurality of parameters for spin-locking performed by said resource management system to identify new optimal values for the parameters.

Type: Grant

Filed: October 1, 2020

Date of Patent: July 5, 2022

Assignee: Oracle International Corporation

Inventors: Onur Kocberber, Mayur Bency, Marc Jolles, Seema Sundara, Nipun Agarwal
EFFICIENT ADJUSTMENT OF SPIN-LOCKING PARAMETER VALUES

Publication number: 20220107933

Abstract: Systems and methods for adjusting parameters for a spin-lock implementation of concurrency control are described herein. In an embodiment, a system continuously retrieves, from a resource management system, one or more state values defining a state of the resource management system. Based on the one or more state values, the system determines that the resource management system has reached a steady state and, in response, adjusts a plurality of parameters for spin-locking performed by said resource management system to identify optimal values for the plurality of parameters. After adjusting the plurality of parameters, the system detects, based on one or more current state values, a workload change in the resource management system and, in response, readjusts the plurality of parameters for spin-locking performed by said resource management system to identify new optimal values for the parameters.

Type: Application

Filed: October 1, 2020

Publication date: April 7, 2022

Inventors: Onur Kocberber, Mayur Bency, Marc Jolles, Seema Sundara, Nipun Agarwal
ENABLING EFFICIENT MACHINE LEARNING MODEL INFERENCE USING ADAPTIVE SAMPLING FOR AUTONOMOUS DATABASE SERVICES

Publication number: 20210406717

Abstract: Herein are approaches for self-optimization of a database management system (DBMS) such as in real time. Adaptive just-in-time sampling techniques herein estimate database content statistics that a machine learning (ML) model may use to predict configuration settings that conserve computer resources such as execution time and storage space. In an embodiment, a computer repeatedly samples database content until a dynamic convergence criterion is satisfied. In each iteration of a series of sampling iterations, a subset of rows of a database table is sampled, and estimates of content statistics of the database table are adjusted based on the sampled subset of rows. Immediately or eventually after detecting dynamic convergence, a machine learning (ML) model predicts, based on the content statistic estimates, an optimal value for a configuration setting of the DBMS.

Type: Application

Filed: June 29, 2020

Publication date: December 30, 2021

Inventors: Farhan Tauheed, Onur Kocberber, Tomas Karnagel, Nipun Agarwal
ESTIMATING NUMBER OF DISTINCT VALUES IN A DATA SET USING MACHINE LEARNING

Publication number: 20210365805

Abstract: Techniques for estimating the number of distinct values in a data set using machine learning are provided. In one technique, a sample of a data set is retrieved where the sample is a strict subset of the data set. The sample is analyzed to identify feature values of multiple features of the sample. The feature values are inserted into a machine-learned model that computes a prediction regarding a number of distinct values in the data set. An estimated number of distinct values that is based on the prediction is stored in association with the data set.

Type: Application

Filed: May 19, 2020

Publication date: November 25, 2021

Inventors: Tomas Karnagel, Onur Kocberber, Farhan Tauheed, Nipun Agarwal
