Patents by Inventor Sam Idicula

Sam Idicula has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11163800
    Abstract: Techniques for non-power-of-two partitioning of a data set as well as generation and selection of partition schemes for the data set. In an embodiment, one or more iterations of a partition scheme is for a non-power-of-two number of partitions. Extended hash partitioning may be used to partition a data set into a non-power-of-two number of partitions by determining the partition identifier of each tuple of the data set using the extended hash partitioning algorithm. In an embodiment, multiple partition schemes are generated for multiple data sets, based on properties of the data sets and/or availability of computing resources for the partition operation or the subsequent operation to the partition operation. The generated partition schemes may use non-power-of-two partitioning for one or more iterations of a generated partition scheme. The most optimal partition scheme may be selected from the generated partition schemes based on optimization policies.
    Type: Grant
    Filed: August 15, 2019
    Date of Patent: November 2, 2021
    Assignee: Oracle International Corporation
    Inventors: Negar Koochakzadeh, Nitin Kunal, Sam Idicula, Cagri Balkesen, Nipun Agarwal
  • Publication number: 20210312014
    Abstract: Techniques are described herein for performing efficient matrix multiplication in architectures with scratchpad memories or associative caches using asymmetric allocation of space for the different matrices. The system receives a left matrix and a right matrix. In an embodiment, the system allocates, in a scratchpad memory, asymmetric memory space for tiles for each of the two matrices as well as a dot product matrix. The system proceeds with then performing dot product matrix multiplication involving the tiles of the left and the right matrices, storing resulting dot product values in corresponding allocated dot product matrix tiles. The system then proceeds to write the stored dot product values from the scratchpad memory into main memory.
    Type: Application
    Filed: June 16, 2021
    Publication date: October 7, 2021
    Inventors: Gaurav Chadha, Sam Idicula, Sandeep Agrawal, Nipun Agarwal
  • Patent number: 11138291
    Abstract: Techniques are described herein for performing efficient matrix multiplication in architectures with scratchpad memories or associative caches using asymmetric allocation of space for the different matrices. The system receives a left matrix and a right matrix. In an embodiment, the system allocates, in a scratchpad memory, asymmetric memory space for tiles for each of the two matrices as well as a dot product matrix. The system proceeds with then performing dot product matrix multiplication involving the tiles of the left and the right matrices, storing resulting dot product values in corresponding allocated dot product matrix tiles. The system then proceeds to write the stored dot product values from the scratchpad memory into main memory.
    Type: Grant
    Filed: September 26, 2017
    Date of Patent: October 5, 2021
    Assignee: Oracle International Corporation
    Inventors: Gaurav Chadha, Sam Idicula, Sandeep Agrawal, Nipun Agarwal
  • Patent number: 11126626
    Abstract: A system and method for processing a group and aggregate query on a relation are disclosed. A database system determines whether assistance of a heterogeneous system (HS) of compute nodes is beneficial in performing the query. Assuming that the relation has been partitioned and loaded into the HS, the database system determines, in a compile phase, whether the HS has the functional capabilities to assist, and whether the cost and benefit favor performing the operation with the assistance of the HS. If the cost and benefit favor using the assistance of the HS, then the system enters the execution phase. The database system starts, in the execution phase, an optimal number of parallel processes to produce and consume the results from the compute nodes of the HS. After any needed transaction consistency checks, the results of the query are returned by the database system.
    Type: Grant
    Filed: February 11, 2019
    Date of Patent: September 21, 2021
    Assignee: Oracle International Corporation
    Inventors: Sabina Petride, Sam Idicula, Nipun Agarwal
  • Patent number: 11120368
    Abstract: Herein are techniques for automatic tuning of hyperparameters of machine learning algorithms. System throughput is maximized by horizontally scaling and asynchronously dispatching the configuration, training, and testing of an algorithm. In an embodiment, a computer stores a best cost achieved by executing a target model based on best values of the target algorithm's hyperparameters. The best values and their cost are updated by epochs that asynchronously execute. Each epoch has asynchronous costing tasks that explore a distinct hyperparameter. Each costing task has a sample of exploratory values that differs from the best values along the distinct hyperparameter. The asynchronous costing tasks of a same epoch have different values for the distinct hyperparameter, which accomplishes an exploration. In an embodiment, an excessive update of best values or best cost creates a major epoch for exploration in a subspace that is more or less unrelated to other epochs, thereby avoiding local optima.
    Type: Grant
    Filed: September 21, 2018
    Date of Patent: September 14, 2021
    Assignee: Oracle International Corporation
    Inventors: Venkatanathan Varadarajan, Sam Idicula, Sandeep Agrawal, Nipun Agarwal
  • Publication number: 20210263934
    Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.
    Type: Application
    Filed: May 12, 2021
    Publication date: August 26, 2021
    Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
  • Patent number: 11061902
    Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.
    Type: Grant
    Filed: March 11, 2019
    Date of Patent: July 13, 2021
    Assignee: Oracle International Corporation
    Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
  • Patent number: 11048679
    Abstract: Techniques herein map between key spaces to generate a balanced adaptive resolution histogram for dataset partitioning. In embodiments, a computer (C) creates a mapping that associates sparse keys (SKs) with distinct dense keys. C constructs a trie by processing each item of a dataset as follows. Based on the item, C obtains an SK. C navigates from a root NT (node of the trie) to a particular NT based on a sequence of dense digits (SDD). Each dense digit of the SDD is based on the mapping. Each NT identifies a dense prefix comprising dense digits. C assigns the item to a target node based on a threshold and count of items assigned to a subtree rooted at the particular node. C determines a range of SKs for each partition of the dataset, based on: an item count for a node or subtree, dense prefixes of NTs, and the mapping.
    Type: Grant
    Filed: October 31, 2017
    Date of Patent: June 29, 2021
    Assignee: Oracle International Corporation
    Inventors: Anantha Kiran Kandukuri, Sam Idicula
  • Patent number: 11030073
    Abstract: Techniques are provided for redundant execution by a better processor for intensive dynamic profiling after initial execution by a constrained processor. In an embodiment, a system of computer(s) receives a request to profile particular runtime aspects of an original binary executable. Based on the particular runtime aspects and without accessing source logic, the system statically rewrites the original binary executable into a rewritten binary executable that invokes telemetry instrumentation that makes observations of the particular runtime aspects and emits traces of those observations. A first processing core having low power (capacity) performs a first execution of the rewritten binary executable to make first observations and emit first traces of the first observations. Afterwards, a second processing core performs a second (redundant) execution of the original binary executable based on the first traces.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: June 8, 2021
    Assignee: Oracle International Corporation
    Inventors: Sam Idicula, Kirtikar Kashyap, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju
  • Patent number: 10956417
    Abstract: Techniques are provided for scheduling data operations for a given query based upon a query-cost model that analyzes the cost of scheduling data operations based upon their operation cost and the type of resources needed for the operation. In an embodiment, a database server receives a set of operations for a query. The database server determines a set of leaf operation nodes from the set of data operations, where the set of leaf operation nodes includes operation nodes that do not depend on the execution of other nodes within the set of data operations. The database server compares operation costs between the leaf operation nodes to determine which leaf operation node to insert into a scheduled order set. The database server inserts the leaf operation node into the scheduled order set. Then the database server iteratively determines new leaf operation nodes and performs cost analysis on remaining leaf operation nodes to generate a set of scheduled data operations.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: March 23, 2021
    Assignee: Oracle International Corporation
    Inventors: Jarod Wen, Sam Idicula, Nitin Kunal, Thomas Chang, Gong Zhang, Nipun Agarwal, Farhan Tauheed
  • Patent number: 10862755
    Abstract: Techniques herein partition data using data repartitioning that is store-and-forward, content-based, and phasic. In embodiments, computer(s) maps network elements (NEs) to grid points (GPs) in a multidimensional hyperrectangle. Each NE contains data items (DIs). For each particular dimension (PD) of the hyperrectangle the computers perform, for each particular NE (PNE), various activities including: determining a linear subset (LS) of NEs that are mapped to GPs in the hyperrectangle at a same position as the GP of the PNE along all dimensions of the hyperrectangle except the PD, and data repartitioning that includes, for each DI of the PNE, the following activities. The PNE determines a bit sequence based on the DI. The PNE selects, based on the PD, a bit subset of the bit sequence. The PNE selects, based on the bit subset, a receiving NE of the LS. The PNE sends the DI to the receiving NE.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: December 8, 2020
    Assignee: Oracle International Corporation
    Inventors: Aarti Basant, Sam Idicula, Nipun Agarwal
  • Publication number: 20200380378
    Abstract: Herein are techniques that train regressor(s) to predict how effective would a machine learning model (MLM) be if rained with new hyperparameters and/or dataset. In an embodiment, for each training dataset, a computer derives, from the dataset, values for dataset metafeatures. The computer performs, for each hyperparameters configuration (HC) of a MLM, including landmark HCs: configuring the MLM based on the HC, training the MLM based on the dataset, and obtaining an empirical quality score that indicates how effective was said training the MLM when configured with the HC. A performance tuple is generated that contains: the HC, the values for the dataset metafeatures, the empirical quality score and, for each landmark configuration, the empirical quality score of the landmark configuration and/or the landmark configuration itself. Based on the performance tuples, a regressor is trained to predict an estimated quality score based on a given dataset and a given HC.
    Type: Application
    Filed: May 30, 2019
    Publication date: December 3, 2020
    Inventors: ALI MOHARRER, VENKATANATHAN VARADARAJAN, SAM IDICULA, SANDEEP AGRAWAL, NIPUN AGARWAL
  • Publication number: 20200342265
    Abstract: According to an embodiment, a method includes generating a first dataset sample from a dataset, calculating a first validation score for the first dataset sample and a machine learning model, and determining whether a difference in validation score between the first validation score and a second validation score satisfies a first criteria. If the difference in validation score does not satisfy the first criteria, the method includes generating a second dataset sample from the dataset. If the difference in validation score does satisfy the first criteria, the method includes updating a convergence value and determining whether the updated convergence value satisfies a second criteria. If the updated convergence value satisfies the second criteria, the method includes returning the first dataset sample. If the updated convergence value does not satisfy the second criteria, the method includes generating the second dataset sample from the dataset.
    Type: Application
    Filed: December 17, 2019
    Publication date: October 29, 2020
    Inventors: Jingxiao Cai, Sandeep Agrawal, Sam Idicula, Venkatanathan Varadarajan, Anatoly Yakovlev, Nipun Agarwal
  • Publication number: 20200334569
    Abstract: Techniques are provided for selection of machine learning algorithms based on performance predictions by using hyperparameter predictors. In an embodiment, for each mini-machine learning model (MML model) of a plurality of MML models, a respective hyperparameter predictor set that predicts a respective set of hyperparameter settings for a first data set is trained. Each MML model represents a respective reference machine learning model (RML model) of a plurality of RML models. A first plurality of data set samples is generated from the first data set. A first plurality of first meta-feature sets is generated, each first meta-feature set describing a respective first data set sample of said first plurality. A respective target set of hyperparameter settings are generated for said each MML model using a hypertuning algorithm. The first plurality of first meta-feature sets and the respective target set of hyperparameter settings are used to train the respective hyperparameter predictor set.
    Type: Application
    Filed: April 18, 2019
    Publication date: October 22, 2020
    Inventors: Hesam Fathi Moghadam, Sandeep Agrawal, Venkatanathan Varadarajan, Anatoly Yakovlev, Sam Idicula, Nipun Agarwal
  • Patent number: 10810195
    Abstract: Techniques related to distributed relational dictionaries are disclosed. In some embodiments, one or more non-transitory storage media store a sequence of instructions which, when executed by one or more computing devices, cause performance of a method. The method involves generating, by a query optimizer at a distributed database system (DDS), a query execution plan (QEP) for generating a code dictionary and a column of encoded database data. The QEP specifies a sequence of operations for generating the code dictionary. The code dictionary is a database table. The method further involves receiving, at the DDS, a column of unencoded database data from a data source that is external to the DDS. The DDS generates the code dictionary according to the QEP. Furthermore, based on joining the column of unencoded database data with the code dictionary, the DDS generates the column of encoded database data according to the QEP.
    Type: Grant
    Filed: January 3, 2018
    Date of Patent: October 20, 2020
    Assignee: Oracle International Corporation
    Inventors: Anantha Kiran Kandukuri, Seema Sundara, Sam Idicula, Pit Fender, Nitin Kunal, Sabina Petride, Georgios Giannikis, Nipun Agarwal
  • Publication number: 20200327448
    Abstract: Herein are techniques for exploring hyperparameters of a machine learning model (MLM) and to train a regressor to predict a time needed to train the MLM based on a hyperparameter configuration and a dataset. In an embodiment that is deployed in production inferencing mode, for each landmark configuration, each containing values for hyperparameters of a MLM, a computer configures the MLM based on the landmark configuration and measures time spent training the MLM on a dataset. An already trained regressor predicts time needed to train the MLM based on a proposed configuration of the MLM, dataset meta-feature values, and training durations and hyperparameter values of landmark configurations of the MLM. When instead in training mode, a regressor in training ingests a training corpus of MLM performance history to learn, by reinforcement, to predict a training time for the MLM for new datasets and/or new hyperparameter configurations.
    Type: Application
    Filed: April 15, 2019
    Publication date: October 15, 2020
    Inventors: ANATOLY YAKOVLEV, VENKATANATHAN VARADARAJAN, SANDEEP AGRAWAL, HESAM FATHI MOGHADAM, SAM IDICULA, NIPUN AGARWAL
  • Publication number: 20200327357
    Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer ranks features of datasets of a training corpus. For each dataset and for each landmark percentage, a target ML model is configured to receive only a highest ranking landmark percentage of features, and a landmark accuracy achieved by training the ML model with the dataset is measured. Based on the landmark accuracies and meta-features values of the dataset, a respective training tuple is generated for each dataset. Based on all of the training tuples, a regressor is trained to predict an optimal amount of features for training the target ML model.
    Type: Application
    Filed: August 21, 2019
    Publication date: October 15, 2020
    Inventors: TOMAS KARNAGEL, SAM IDICULA, HESAM FATHI MOGHADAM, NIPUN AGARWAL
  • Publication number: 20200302318
    Abstract: Herein are techniques to generate candidate rulesets for machine learning (ML) explainability (MLX) for black-box ML models. In an embodiment, an ML model generates classifications that each associates a distinct example with a label. A decision tree that, based on the classifications, contains tree nodes is received or generated. Each node contains label(s), a condition that identifies a feature of examples, and a split value for the feature. When a node has child nodes, the feature and the split value that are identified by the condition of the node are set to maximize information gain of the child nodes. Candidate rules are generated by traversing the tree. Each rule is built from a combination of nodes in a tree traversal path. Each rule contains a condition of at least one node and is assigned to a rule level. Candidate rules are subsequently optimized into an optimal ruleset for actual use.
    Type: Application
    Filed: March 20, 2019
    Publication date: September 24, 2020
    Inventors: TAYLER HETHERINGTON, ZAHRA ZOHREVAND, ONUR KOCBERBER, KAROON RASHEDI NIA, SAM IDICULA, NIPUN AGARWAL
  • Patent number: 10706055
    Abstract: Techniques are described for executing an analytical query with a top-N clause. In an embodiment, a stream of tuples are received by each of the processing units from a data source identified in the query. The processing unit uses a portion of a received tuple to identify the partition that the tuple is assigned to. For each partition, the processing unit maintains a top-N data store that stores an N number of received tuples that match the criteria of top N tuples according to the query. The received tuple is compared to the N number of tuples to determine whether to store the received tuple and discard an already stored tuple, or to discard the received tuple. After all the tuples have been similarly processed by the processing units, all the top-N data stores for each partition are merged, yielding the top N number of tuples for each partition to return as a result of the query.
    Type: Grant
    Filed: April 6, 2016
    Date of Patent: July 7, 2020
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Gong Zhang, Sam Idicula, Michael Duller, Nitin Kunal
  • Patent number: 10691722
    Abstract: Techniques are described for efficient query processing and data change propagation to a secondary database system. The secondary database system may execute queries received at a primary database system. Database changes made at the primary system are copied to the secondary system. The primary system receives a query to be executed on either the primary system or the secondary system. The primary system determines whether to send the query to the secondary system based upon whether data objects stored within the secondary system have pending changes that need to be applied to the data objects. The pending changes are stored within in-memory journals within the primary system. The primary system scans for the pending changes to the data objects and sends the pending changes to the secondary system. The secondary system then receives and applies the pending changes to the data objects within the secondary system. Upon applying the pending changes, the secondary system executes the query.
    Type: Grant
    Filed: May 31, 2017
    Date of Patent: June 23, 2020
    Assignee: Oracle International Corporation
    Inventors: Shenoda Guirguis, Kantikiran Pasupuleti, Sabina Petride, Sam Idicula