Patents by Inventor Sam Idicula

Sam Idicula has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Efficient partitioning of relational data

Patent number: 11163800

Abstract: Techniques for non-power-of-two partitioning of a data set as well as generation and selection of partition schemes for the data set. In an embodiment, one or more iterations of a partition scheme is for a non-power-of-two number of partitions. Extended hash partitioning may be used to partition a data set into a non-power-of-two number of partitions by determining the partition identifier of each tuple of the data set using the extended hash partitioning algorithm. In an embodiment, multiple partition schemes are generated for multiple data sets, based on properties of the data sets and/or availability of computing resources for the partition operation or the subsequent operation to the partition operation. The generated partition schemes may use non-power-of-two partitioning for one or more iterations of a generated partition scheme. The most optimal partition scheme may be selected from the generated partition schemes based on optimization policies.

Type: Grant

Filed: August 15, 2019

Date of Patent: November 2, 2021

Assignee: Oracle International Corporation

Inventors: Negar Koochakzadeh, Nitin Kunal, Sam Idicula, Cagri Balkesen, Nipun Agarwal
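
As a rough illustration of the idea in the abstract above, the following Python sketch assigns tuples to a partition count that need not be a power of two. It is not the patent's extended hash partitioning algorithm, which the abstract does not spell out; it substitutes a generic multiply-shift range reduction. The names `stable_hash32`, `partition_id`, and `partition` are hypothetical.

```python
import hashlib


def stable_hash32(key: str) -> int:
    """Deterministic 32-bit hash of a key (hypothetical helper)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def partition_id(key: str, num_partitions: int) -> int:
    """Map a key to [0, num_partitions) for any positive partition count."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Assumption: multiply-shift range reduction scales the 32-bit hash
    # into the target range without requiring a power-of-two count.
    return (stable_hash32(key) * num_partitions) >> 32


def partition(tuples, key_of, num_partitions):
    """Distribute tuples into num_partitions lists by partition identifier."""
    parts = [[] for _ in range(num_partitions)]
    for t in tuples:
        parts[partition_id(key_of(t), num_partitions)].append(t)
    return parts
```

With a power-of-two count the same function still works, so a partition scheme could mix both kinds of iteration.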

ASYMMETRIC ALLOCATION OF SRAM AND DATA LAYOUT FOR EFFICIENT MATRIX-MATRIX MULTIPLICATION

Publication number: 20210312014

Abstract: Techniques are described herein for performing efficient matrix multiplication in architectures with scratchpad memories or associative caches using asymmetric allocation of space for the different matrices. The system receives a left matrix and a right matrix. In an embodiment, the system allocates, in a scratchpad memory, asymmetric memory space for tiles for each of the two matrices as well as a dot product matrix. The system proceeds with then performing dot product matrix multiplication involving the tiles of the left and the right matrices, storing resulting dot product values in corresponding allocated dot product matrix tiles. The system then proceeds to write the stored dot product values from the scratchpad memory into main memory.

Type: Application

Filed: June 16, 2021

Publication date: October 7, 2021

Inventors: Gaurav Chadha, Sam Idicula, Sandeep Agrawal, Nipun Agarwal

Assymetric allocation of SRAM and data layout for efficient matrix multiplication

Patent number: 11138291

Abstract: Techniques are described herein for performing efficient matrix multiplication in architectures with scratchpad memories or associative caches using asymmetric allocation of space for the different matrices. The system receives a left matrix and a right matrix. In an embodiment, the system allocates, in a scratchpad memory, asymmetric memory space for tiles for each of the two matrices as well as a dot product matrix. The system proceeds with then performing dot product matrix multiplication involving the tiles of the left and the right matrices, storing resulting dot product values in corresponding allocated dot product matrix tiles. The system then proceeds to write the stored dot product values from the scratchpad memory into main memory.

Type: Grant

Filed: September 26, 2017

Date of Patent: October 5, 2021

Assignee: Oracle International Corporation

Inventors: Gaurav Chadha, Sam Idicula, Sandeep Agrawal, Nipun Agarwal
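
The tiling described in the abstract above can be sketched in plain Python. This is only an illustration under stated assumptions: the tile sizes `tile_m`, `tile_k`, and `tile_n` are arbitrary, the local list `acc` stands in for the scratchpad's dot product tile, and the function name `tiled_matmul` is hypothetical. The asymmetry shows up in the left tile (`tile_m` by `tile_k`) and the right tile (`tile_k` by `tile_n`) having different shapes.

```python
def tiled_matmul(left, right, tile_m=4, tile_k=2, tile_n=8):
    """Multiply two matrices (lists of lists) tile by tile."""
    m, k = len(left), len(left[0])
    k2, n = len(right), len(right[0])
    if k != k2:
        raise ValueError("inner dimensions must match")
    out = [[0] * n for _ in range(m)]  # stands in for main memory
    for i0 in range(0, m, tile_m):
        rows = min(i0 + tile_m, m) - i0
        for j0 in range(0, n, tile_n):
            cols = min(j0 + tile_n, n) - j0
            # Output tile accumulated locally, like a scratchpad allocation.
            acc = [[0] * cols for _ in range(rows)]
            for p0 in range(0, k, tile_k):
                p1 = min(p0 + tile_k, k)
                for i in range(rows):
                    for j in range(cols):
                        s = 0
                        for p in range(p0, p1):
                            s += left[i0 + i][p] * right[p][j0 + j]
                        acc[i][j] += s
            # Write the finished tile back to the output matrix.
            for i, row in enumerate(acc):
                out[i0 + i][j0:j0 + cols] = row
    return out
```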

Massively parallel and in-memory execution of grouping and aggregation in a heterogeneous system

Patent number: 11126626

Abstract: A system and method for processing a group and aggregate query on a relation are disclosed. A database system determines whether assistance of a heterogeneous system (HS) of compute nodes is beneficial in performing the query. Assuming that the relation has been partitioned and loaded into the HS, the database system determines, in a compile phase, whether the HS has the functional capabilities to assist, and whether the cost and benefit favor performing the operation with the assistance of the HS. If the cost and benefit favor using the assistance of the HS, then the system enters the execution phase. The database system starts, in the execution phase, an optimal number of parallel processes to produce and consume the results from the compute nodes of the HS. After any needed transaction consistency checks, the results of the query are returned by the database system.

Type: Grant

Filed: February 11, 2019

Date of Patent: September 21, 2021

Assignee: Oracle International Corporation

Inventors: Sabina Petride, Sam Idicula, Nipun Agarwal

Scalable and efficient distributed auto-tuning of machine learning and deep learning models

Patent number: 11120368

Abstract: Herein are techniques for automatic tuning of hyperparameters of machine learning algorithms. System throughput is maximized by horizontally scaling and asynchronously dispatching the configuration, training, and testing of an algorithm. In an embodiment, a computer stores a best cost achieved by executing a target model based on best values of the target algorithm's hyperparameters. The best values and their cost are updated by epochs that asynchronously execute. Each epoch has asynchronous costing tasks that explore a distinct hyperparameter. Each costing task has a sample of exploratory values that differs from the best values along the distinct hyperparameter. The asynchronous costing tasks of a same epoch have different values for the distinct hyperparameter, which accomplishes an exploration. In an embodiment, an excessive update of best values or best cost creates a major epoch for exploration in a subspace that is more or less unrelated to other epochs, thereby avoiding local optima.

Type: Grant

Filed: September 21, 2018

Date of Patent: September 14, 2021

Assignee: Oracle International Corporation

Inventors: Venkatanathan Varadarajan, Sam Idicula, Sandeep Agrawal, Nipun Agarwal

AUTOMATED CONFIGURATION PARAMETER TUNING FOR DATABASE PERFORMANCE

Publication number: 20210263934

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.

Type: Application

Filed: May 12, 2021

Publication date: August 26, 2021

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency

Automated configuration parameter tuning for database performance

Patent number: 11061902

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.

Type: Grant

Filed: March 11, 2019

Date of Patent: July 13, 2021

Assignee: Oracle International Corporation

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency

Adaptive resolution histogram on complex datatypes

Patent number: 11048679

Abstract: Techniques herein map between key spaces to generate a balanced adaptive resolution histogram for dataset partitioning. In embodiments, a computer (C) creates a mapping that associates sparse keys (SKs) with distinct dense keys. C constructs a trie by processing each item of a dataset as follows. Based on the item, C obtains an SK. C navigates from a root NT (node of the trie) to a particular NT based on a sequence of dense digits (SDD). Each dense digit of the SDD is based on the mapping. Each NT identifies a dense prefix comprising dense digits. C assigns the item to a target node based on a threshold and count of items assigned to a subtree rooted at the particular node. C determines a range of SKs for each partition of the dataset, based on: an item count for a node or subtree, dense prefixes of NTs, and the mapping.

Type: Grant

Filed: October 31, 2017

Date of Patent: June 29, 2021

Assignee: Oracle International Corporation

Inventors: Anantha Kiran Kandukuri, Sam Idicula

Hybrid instrumentation framework for multicore low power processors

Patent number: 11030073

Abstract: Techniques are provided for redundant execution by a better processor for intensive dynamic profiling after initial execution by a constrained processor. In an embodiment, a system of computer(s) receives a request to profile particular runtime aspects of an original binary executable. Based on the particular runtime aspects and without accessing source logic, the system statically rewrites the original binary executable into a rewritten binary executable that invokes telemetry instrumentation that makes observations of the particular runtime aspects and emits traces of those observations. A first processing core having low power (capacity) performs a first execution of the rewritten binary executable to make first observations and emit first traces of the first observations. Afterwards, a second processing core performs a second (redundant) execution of the original binary executable based on the first traces.

Type: Grant

Filed: October 31, 2019

Date of Patent: June 8, 2021

Assignee: Oracle International Corporation

Inventors: Sam Idicula, Kirtikar Kashyap, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju

Dynamic operation scheduling for distributed data processing

Patent number: 10956417

Abstract: Techniques are provided for scheduling data operations for a given query based upon a query-cost model that analyzes the cost of scheduling data operations based upon their operation cost and the type of resources needed for the operation. In an embodiment, a database server receives a set of operations for a query. The database server determines a set of leaf operation nodes from the set of data operations, where the set of leaf operation nodes includes operation nodes that do not depend on the execution of other nodes within the set of data operations. The database server compares operation costs between the leaf operation nodes to determine which leaf operation node to insert into a scheduled order set. The database server inserts the leaf operation node into the scheduled order set. Then the database server iteratively determines new leaf operation nodes and performs cost analysis on remaining leaf operation nodes to generate a set of scheduled data operations.

Type: Grant

Filed: April 28, 2017

Date of Patent: March 23, 2021

Assignee: Oracle International Corporation

Inventors: Jarod Wen, Sam Idicula, Nitin Kunal, Thomas Chang, Gong Zhang, Nipun Agarwal, Farhan Tauheed
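
The iterative leaf selection in the abstract above might look like the following Python sketch. The abstract only says that leaf costs are compared, so choosing the cheapest leaf first is an assumption made here, as are the function name `schedule` and the dictionary-based inputs.

```python
def schedule(costs, deps):
    """Order operations so that each runs after its dependencies.

    costs: {operation: cost}
    deps:  {operation: set of operations it depends on}
    """
    remaining = {op: set(deps.get(op, ())) for op in costs}
    order = []
    while remaining:
        # Leaf operations are those with no unscheduled dependencies.
        leaves = [op for op, pending in remaining.items() if not pending]
        if not leaves:
            raise ValueError("dependency cycle or unknown dependency")
        # Assumption: pick the cheapest leaf; ties are broken by name.
        chosen = min(leaves, key=lambda op: (costs[op], op))
        order.append(chosen)
        del remaining[chosen]
        for pending in remaining.values():
            pending.discard(chosen)
    return order
```

A real cost model would also account for the type of resource each operation needs, which this sketch leaves out.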

High-performance data repartitioning for cloud-scale clusters

Patent number: 10862755

Abstract: Techniques herein partition data using data repartitioning that is store-and-forward, content-based, and phasic. In embodiments, computer(s) maps network elements (NEs) to grid points (GPs) in a multidimensional hyperrectangle. Each NE contains data items (DIs). For each particular dimension (PD) of the hyperrectangle the computers perform, for each particular NE (PNE), various activities including: determining a linear subset (LS) of NEs that are mapped to GPs in the hyperrectangle at a same position as the GP of the PNE along all dimensions of the hyperrectangle except the PD, and data repartitioning that includes, for each DI of the PNE, the following activities. The PNE determines a bit sequence based on the DI. The PNE selects, based on the PD, a bit subset of the bit sequence. The PNE selects, based on the bit subset, a receiving NE of the LS. The PNE sends the DI to the receiving NE.

Type: Grant

Filed: June 30, 2017

Date of Patent: December 8, 2020

Assignee: Oracle International Corporation

Inventors: Aarti Basant, Sam Idicula, Nipun Agarwal
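
The phase-per-dimension routing in the abstract above can be simulated in a few lines of Python. This is a single-process sketch, not a network implementation: grid coordinates stand in for network elements, every dimension size is assumed to be a power of two so that a slice of hash bits names a position directly, and `repartition` and `key_hash` are hypothetical names.

```python
from itertools import product


def repartition(grid_shape, node_items, key_hash):
    """Move items dimension by dimension to the node named by their hash bits.

    grid_shape: tuple of dimension sizes (each assumed to be a power of two)
    node_items: {coordinate tuple: list of items initially on that node}
    key_hash:   function mapping an item to a non-negative integer
    """
    if any(s < 1 or s & (s - 1) for s in grid_shape):
        raise ValueError("each dimension size must be a power of two")
    bits = [s.bit_length() - 1 for s in grid_shape]
    coords = list(product(*(range(s) for s in grid_shape)))
    nodes = {c: list(node_items.get(c, [])) for c in coords}
    shift = 0
    for dim, nbits in enumerate(bits):  # one phase per dimension
        received = {c: [] for c in coords}
        for coord, items in nodes.items():
            for item in items:
                # Bit subset of the hash that belongs to this dimension.
                digit = (key_hash(item) >> shift) & ((1 << nbits) - 1)
                # The receiver differs from the sender only along `dim`.
                target = coord[:dim] + (digit,) + coord[dim + 1:]
                received[target].append(item)
        nodes = received
        shift += nbits
    return nodes
```

In each phase a node only sends to nodes that share its position in every other dimension, which limits the number of peers each node talks to.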

Using Metamodeling for Fast and Accurate Hyperparameter Optimization of Machine Learning and Deep Learning Models

Publication number: 20200380378

Abstract: Herein are techniques that train regressor(s) to predict how effective would a machine learning model (MLM) be if trained with new hyperparameters and/or dataset. In an embodiment, for each training dataset, a computer derives, from the dataset, values for dataset metafeatures. The computer performs, for each hyperparameters configuration (HC) of a MLM, including landmark HCs: configuring the MLM based on the HC, training the MLM based on the dataset, and obtaining an empirical quality score that indicates how effective was said training the MLM when configured with the HC. A performance tuple is generated that contains: the HC, the values for the dataset metafeatures, the empirical quality score and, for each landmark configuration, the empirical quality score of the landmark configuration and/or the landmark configuration itself. Based on the performance tuples, a regressor is trained to predict an estimated quality score based on a given dataset and a given HC.

Type: Application

Filed: May 30, 2019

Publication date: December 3, 2020

Inventors: Ali Moharrer, Venkatanathan Varadarajan, Sam Idicula, Sandeep Agrawal, Nipun Agarwal

ADAPTIVE SAMPLING FOR IMBALANCE MITIGATION AND DATASET SIZE REDUCTION IN MACHINE LEARNING

Publication number: 20200342265

Abstract: According to an embodiment, a method includes generating a first dataset sample from a dataset, calculating a first validation score for the first dataset sample and a machine learning model, and determining whether a difference in validation score between the first validation score and a second validation score satisfies a first criteria. If the difference in validation score does not satisfy the first criteria, the method includes generating a second dataset sample from the dataset. If the difference in validation score does satisfy the first criteria, the method includes updating a convergence value and determining whether the updated convergence value satisfies a second criteria. If the updated convergence value satisfies the second criteria, the method includes returning the first dataset sample. If the updated convergence value does not satisfy the second criteria, the method includes generating the second dataset sample from the dataset.

Type: Application

Filed: December 17, 2019

Publication date: October 29, 2020

Inventors: Jingxiao Cai, Sandeep Agrawal, Sam Idicula, Venkatanathan Varadarajan, Anatoly Yakovlev, Nipun Agarwal

USING HYPERPARAMETER PREDICTORS TO IMPROVE ACCURACY OF AUTOMATIC MACHINE LEARNING MODEL SELECTION

Publication number: 20200334569

Abstract: Techniques are provided for selection of machine learning algorithms based on performance predictions by using hyperparameter predictors. In an embodiment, for each mini-machine learning model (MML model) of a plurality of MML models, a respective hyperparameter predictor set that predicts a respective set of hyperparameter settings for a first data set is trained. Each MML model represents a respective reference machine learning model (RML model) of a plurality of RML models. A first plurality of data set samples is generated from the first data set. A first plurality of first meta-feature sets is generated, each first meta-feature set describing a respective first data set sample of said first plurality. A respective target set of hyperparameter settings are generated for said each MML model using a hypertuning algorithm. The first plurality of first meta-feature sets and the respective target set of hyperparameter settings are used to train the respective hyperparameter predictor set.

Type: Application

Filed: April 18, 2019

Publication date: October 22, 2020

Inventors: Hesam Fathi Moghadam, Sandeep Agrawal, Venkatanathan Varadarajan, Anatoly Yakovlev, Sam Idicula, Nipun Agarwal

Distributed relational dictionaries

Patent number: 10810195

Abstract: Techniques related to distributed relational dictionaries are disclosed. In some embodiments, one or more non-transitory storage media store a sequence of instructions which, when executed by one or more computing devices, cause performance of a method. The method involves generating, by a query optimizer at a distributed database system (DDS), a query execution plan (QEP) for generating a code dictionary and a column of encoded database data. The QEP specifies a sequence of operations for generating the code dictionary. The code dictionary is a database table. The method further involves receiving, at the DDS, a column of unencoded database data from a data source that is external to the DDS. The DDS generates the code dictionary according to the QEP. Furthermore, based on joining the column of unencoded database data with the code dictionary, the DDS generates the column of encoded database data according to the QEP.

Type: Grant

Filed: January 3, 2018

Date of Patent: October 20, 2020

Assignee: Oracle International Corporation

Inventors: Anantha Kiran Kandukuri, Seema Sundara, Sam Idicula, Pit Fender, Nitin Kunal, Sabina Petride, Georgios Giannikis, Nipun Agarwal
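
To make the abstract above concrete, here is a minimal Python sketch of a code dictionary stored as a table of (code, value) rows and a column encoded by joining against it. The functions `build_code_dictionary` and `encode_column` are hypothetical, the sorted code assignment is an assumption, and the sketch ignores the distributed execution and query planning that the abstract describes.

```python
def build_code_dictionary(column):
    """Return the dictionary as a table of (code, value) rows."""
    distinct = sorted(set(column))  # assumption: codes follow sorted order
    return [(code, value) for code, value in enumerate(distinct)]


def encode_column(column, dictionary):
    """Encode a column by joining each value with the dictionary table."""
    # Build side of a hash join: value -> code.
    lookup = {value: code for code, value in dictionary}
    # Probe side: every unencoded value is replaced by its code.
    return [lookup[value] for value in column]
```

Because the dictionary is just a table, the encoding step can be expressed as an ordinary join and planned like any other query operation.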

PREDICTING MACHINE LEARNING OR DEEP LEARNING MODEL TRAINING TIME

Publication number: 20200327448

Abstract: Herein are techniques for exploring hyperparameters of a machine learning model (MLM) and to train a regressor to predict a time needed to train the MLM based on a hyperparameter configuration and a dataset. In an embodiment that is deployed in production inferencing mode, for each landmark configuration, each containing values for hyperparameters of a MLM, a computer configures the MLM based on the landmark configuration and measures time spent training the MLM on a dataset. An already trained regressor predicts time needed to train the MLM based on a proposed configuration of the MLM, dataset meta-feature values, and training durations and hyperparameter values of landmark configurations of the MLM. When instead in training mode, a regressor in training ingests a training corpus of MLM performance history to learn, by reinforcement, to predict a training time for the MLM for new datasets and/or new hyperparameter configurations.

Type: Application

Filed: April 15, 2019

Publication date: October 15, 2020

Inventors: Anatoly Yakovlev, Venkatanathan Varadarajan, Sandeep Agrawal, Hesam Fathi Moghadam, Sam Idicula, Nipun Agarwal

Automatic Feature Subset Selection based on Meta-Learning

Publication number: 20200327357

Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer ranks features of datasets of a training corpus. For each dataset and for each landmark percentage, a target ML model is configured to receive only a highest ranking landmark percentage of features, and a landmark accuracy achieved by training the ML model with the dataset is measured. Based on the landmark accuracies and meta-features values of the dataset, a respective training tuple is generated for each dataset. Based on all of the training tuples, a regressor is trained to predict an optimal amount of features for training the target ML model.

Type: Application

Filed: August 21, 2019

Publication date: October 15, 2020

Inventors: Tomas Karnagel, Sam Idicula, Hesam Fathi Moghadam, Nipun Agarwal

METHOD FOR GENERATING RULESETS USING TREE-BASED MODELS FOR BLACK-BOX MACHINE LEARNING EXPLAINABILITY

Publication number: 20200302318

Abstract: Herein are techniques to generate candidate rulesets for machine learning (ML) explainability (MLX) for black-box ML models. In an embodiment, an ML model generates classifications that each associates a distinct example with a label. A decision tree that, based on the classifications, contains tree nodes is received or generated. Each node contains label(s), a condition that identifies a feature of examples, and a split value for the feature. When a node has child nodes, the feature and the split value that are identified by the condition of the node are set to maximize information gain of the child nodes. Candidate rules are generated by traversing the tree. Each rule is built from a combination of nodes in a tree traversal path. Each rule contains a condition of at least one node and is assigned to a rule level. Candidate rules are subsequently optimized into an optimal ruleset for actual use.

Type: Application

Filed: March 20, 2019

Publication date: September 24, 2020

Inventors: Tayler Hetherington, Zahra Zohrevand, Onur Kocberber, Karoon Rashedi Nia, Sam Idicula, Nipun Agarwal

Partition aware evaluation of top-N queries

Patent number: 10706055

Abstract: Techniques are described for executing an analytical query with a top-N clause. In an embodiment, a stream of tuples are received by each of the processing units from a data source identified in the query. The processing unit uses a portion of a received tuple to identify the partition that the tuple is assigned to. For each partition, the processing unit maintains a top-N data store that stores an N number of received tuples that match the criteria of top N tuples according to the query. The received tuple is compared to the N number of tuples to determine whether to store the received tuple and discard an already stored tuple, or to discard the received tuple. After all the tuples have been similarly processed by the processing units, all the top-N data stores for each partition are merged, yielding the top N number of tuples for each partition to return as a result of the query.

Type: Grant

Filed: April 6, 2016

Date of Patent: July 7, 2020

Assignee: Oracle International Corporation

Inventors: Gong Zhang, Sam Idicula, Michael Duller, Nitin Kunal
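
The per-partition top-N stores and the final merge described in the abstract above can be sketched in Python with bounded min-heaps. The helper names `local_top_n` and `merge_top_n`, and the choice of a heap as the top-N data store, are assumptions made for illustration; "top" is taken to mean the largest scores.

```python
import heapq
from collections import defaultdict


def local_top_n(tuples, partition_of, score_of, n):
    """One processing unit: keep at most n best tuples per partition."""
    stores = defaultdict(list)  # partition -> min-heap of (score, seq, tuple)
    for seq, t in enumerate(tuples):
        heap = stores[partition_of(t)]
        entry = (score_of(t), seq, t)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # Store the new tuple and discard the weakest stored one.
            heapq.heapreplace(heap, entry)
        # Otherwise the received tuple is discarded.
    return stores


def merge_top_n(all_stores, n):
    """Merge the stores of every processing unit, partition by partition."""
    merged = defaultdict(list)
    for stores in all_stores:
        for part, heap in stores.items():
            merged[part].extend(heap)
    return {
        part: [t for _, _, t in heapq.nlargest(n, entries, key=lambda e: e[0])]
        for part, entries in merged.items()
    }
```

Each unit holds at most N tuples per partition at any time, so memory use stays bounded regardless of stream length.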

Consistent query execution for big data analytics in a hybrid database

Patent number: 10691722

Abstract: Techniques are described for efficient query processing and data change propagation to a secondary database system. The secondary database system may execute queries received at a primary database system. Database changes made at the primary system are copied to the secondary system. The primary system receives a query to be executed on either the primary system or the secondary system. The primary system determines whether to send the query to the secondary system based upon whether data objects stored within the secondary system have pending changes that need to be applied to the data objects. The pending changes are stored within in-memory journals within the primary system. The primary system scans for the pending changes to the data objects and sends the pending changes to the secondary system. The secondary system then receives and applies the pending changes to the data objects within the secondary system. Upon applying the pending changes, the secondary system executes the query.

Type: Grant

Filed: May 31, 2017

Date of Patent: June 23, 2020

Assignee: Oracle International Corporation

Inventors: Shenoda Guirguis, Kantikiran Pasupuleti, Sabina Petride, Sam Idicula
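
A toy Python model of the flow in the abstract above: the primary journals changes in memory, and before offloading a query it sends the pending changes for the queried object to the secondary, which applies them and then executes the query. The `Primary` and `Secondary` classes and their methods are hypothetical, and the sketch always offloads, whereas the abstract describes the primary deciding whether to do so.

```python
class Secondary:
    """Secondary system: applies propagated changes, then answers queries."""

    def __init__(self):
        self.tables = {}

    def apply(self, table, change):
        key, value = change
        self.tables.setdefault(table, {})[key] = value

    def execute(self, table, key):
        return self.tables.get(table, {}).get(key)


class Primary:
    """Primary system: records changes in an in-memory journal per table."""

    def __init__(self, secondary):
        self.secondary = secondary
        self.tables = {}
        self.journal = {}  # table -> list of pending (key, value) changes

    def write(self, table, key, value):
        self.tables.setdefault(table, {})[key] = value
        self.journal.setdefault(table, []).append((key, value))

    def query(self, table, key):
        # Send pending changes for the queried table before offloading,
        # so the secondary answers from up-to-date data.
        for change in self.journal.pop(table, []):
            self.secondary.apply(table, change)
        return self.secondary.execute(table, key)
```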
