Patents by Inventor Sam Idicula

Sam Idicula has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Complete, correct and fast compile-time encoding inference on the basis of an underlying type system

Patent number: 10685021

Abstract: Techniques are described herein for introducing transcode operators into a generated operator tree during query processing. Setting up the transcode operators with correct encoding type at runtime is performed by inferring correct encoding type information during compile time. The inference of the correct encoding type information occurs in three phases during compile time: the first phase involves collecting, consolidating, and propagating the encoding-type information of input columns up the expression tree. The second phase involves pushing the encoding-type information down the tree for nodes in the expression tree that do not yet have any encoding-type assigned. The third phase involves determining which inputs to the current relational operator need to be pre-processed by a transcode operator.

Type: Grant

Filed: October 24, 2017

Date of Patent: June 16, 2020

Assignee: Oracle International Corporation

Inventors: Pit Fender, Sam Idicula, Nipun Agarwal, Benjamin Schlegel
Performing database operations using a vectorized approach or a non-vectorized approach

Patent number: 10671583

Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.

Type: Grant

Filed: August 24, 2017

Date of Patent: June 2, 2020

Assignee: Oracle International Corporation

Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
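
Illustrative note: the cost comparison described in the abstract above can be sketched roughly as follows. This is a minimal sketch, not the patented method. The `OpProfile` fields, the per-instruction weights (`instr_cost`, `move_cost`), and the cost formulas are all hypothetical, and the cache-size and hash-table-size factors named in the abstract are left out.

```python
from dataclasses import dataclass


@dataclass
class OpProfile:
    """Hypothetical description of one database operation (all fields are assumptions)."""
    num_elements: int               # data elements the operation must process
    element_bits: int               # width of one data element
    simd_register_bits: int         # width of one SIMD register
    vector_instrs_per_batch: int    # vectorized instructions per register-full of elements
    moves_per_batch: int            # SIMD <-> non-SIMD register moves per batch
    scalar_instrs_per_element: int  # instructions per element without vectorization


def lanes(p: OpProfile) -> int:
    """Number of data elements that fit into one SIMD register."""
    return p.simd_register_bits // p.element_bits


def vectorized_cost(p: OpProfile, instr_cost: float = 1.0, move_cost: float = 3.0) -> float:
    """First cost estimate: batches times (vector instructions + data movement)."""
    batches = -(-p.num_elements // lanes(p))  # ceiling division
    return batches * (p.vector_instrs_per_batch * instr_cost + p.moves_per_batch * move_cost)


def scalar_cost(p: OpProfile, instr_cost: float = 1.0) -> float:
    """Second cost estimate: one pass of scalar instructions per element."""
    return p.num_elements * p.scalar_instrs_per_element * instr_cost


def choose_approach(p: OpProfile) -> str:
    """Pick whichever approach has the lower estimated cost."""
    return "vectorized" if vectorized_cost(p) < scalar_cost(p) else "non-vectorized"
```

With these assumed weights, an operation with many lanes and few register moves favors the vectorized path, while wide elements and frequent SIMD-to-scalar moves can tip the estimate the other way.
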
AUTOMATED PROVISIONING FOR DATABASE PERFORMANCE

Publication number: 20200125568

Abstract: Embodiments utilize trained query performance machine learning (QP-ML) models to predict an optimal compute node cluster size for a given in-memory workload. The QP-ML models include models that predict query task runtimes at various compute node cardinalities, and models that predict network communication time between nodes of the cluster. Embodiments also utilize an analytical model to predict overlap between predicted task runtimes and predicted network communication times. Based on this data, an optimal cluster size is selected for the workload. Embodiments further utilize trained data capacity machine learning (DC-ML) models to predict a minimum number of compute nodes needed to run a workload. The DC-ML models include models that predict the size of the workload dataset in a target data encoding, models that predict the amount of memory needed to run the queries in the workload, and models that predict the memory needed to accommodate changes to the dataset.

Type: Application

Filed: April 11, 2019

Publication date: April 23, 2020

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
MINI-MACHINE LEARNING

Publication number: 20200125961

Abstract: Techniques are described for generating and applying mini-machine learning variants of machine learning algorithms to save computational resources in tuning and selection of machine learning algorithms. In an embodiment, at least one of the hyper-parameter values for a reference variant is modified to a new hyper-parameter value, thereby generating a new variant of machine learning algorithm from the reference variant of machine learning algorithm. A performance score is determined for the new variant of machine learning algorithm using a training dataset, the performance score representing the accuracy of the new machine learning model for the training dataset. By performing training of the new variant of machine learning algorithm with the training dataset, a cost metric of the new variant of machine learning algorithm is determined by measuring the computing resources used for the training.

Type: Application

Filed: October 19, 2018

Publication date: April 23, 2020

Inventors: SANDEEP AGRAWAL, VENKATANATHAN VARADARAJAN, SAM IDICULA, NIPUN AGARWAL
AUTOMATED CONFIGURATION PARAMETER TUNING FOR DATABASE PERFORMANCE

Publication number: 20200125545

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.

Type: Application

Filed: March 11, 2019

Publication date: April 23, 2020

Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
Scalable distributed computation framework for data-intensive computer vision workloads

Patent number: 10630957

Abstract: Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, two images. The first image comprises a first set of pixels and the second image comprises a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image. The second number of pixels is different from the first number of pixels.

Type: Grant

Filed: October 1, 2019

Date of Patent: April 21, 2020

Assignee: Oracle International Corporation

Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
AUTOMATIC FEATURE SUBSET SELECTION USING FEATURE RANKING AND SCALABLE AUTOMATIC SEARCH

Publication number: 20200118036

Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer calculates, for each feature of a training dataset, a relevance score based on: a relevance scoring function, and statistics of values, of the feature, that occur in the training dataset. A rank based on relevance scores of the features is calculated for each feature. A sequence of distinct subsets of the features, based on the ranks of the features, is generated. For each distinct subset of the sequence of distinct feature subsets, a fitness score is generated based on training a machine learning (ML) model that is configured for the distinct subset.

Type: Application

Filed: May 20, 2019

Publication date: April 16, 2020

Inventors: Tomas Karnagel, Sam Idicula, Nipun Agarwal
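
Illustrative note: the rank-then-search flow in the abstract above can be sketched as follows. This is a minimal sketch under stated assumptions, not the application's actual procedure. Absolute Pearson correlation stands in for the unspecified relevance scoring function, the sequence of subsets is simply the top-k prefixes of the ranking, and `fitness` is a hypothetical caller-supplied callback that would train and score a model on the given subset.

```python
import math
from typing import Callable, Dict, List, Sequence


def pearson_abs(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Assumed relevance score: absolute Pearson correlation with the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return abs(cov) / math.sqrt(vx * vy)


def rank_features(columns: Dict[str, Sequence[float]], target: Sequence[float]) -> List[str]:
    """Score every feature individually and return names from most to least relevant."""
    scores = {name: pearson_abs(col, target) for name, col in columns.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def candidate_subsets(ranked: List[str]) -> List[List[str]]:
    """Assumed subset sequence: the top-1, top-2, ..., top-n ranked features."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


def select_subset(
    columns: Dict[str, Sequence[float]],
    target: Sequence[float],
    fitness: Callable[[List[str]], float],
) -> List[str]:
    """Return the candidate subset with the highest fitness (smallest subset wins ties)."""
    ranked = rank_features(columns, target)
    return max(candidate_subsets(ranked), key=fitness)
```

In practice `fitness` would wrap model training and validation; because only n prefixes are evaluated rather than all 2^n subsets, the search stays linear in the number of features.
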
DISK DRIVE FAILURE PREDICTION WITH NEURAL NETWORKS

Publication number: 20200104200

Abstract: Techniques are described herein for predicting disk drive failure using a machine learning model. The framework involves receiving disk drive sensor attributes as training data, preprocessing the training data to select a set of enhanced feature sequences, and using the enhanced feature sequences to train a machine learning model to predict disk drive failures from disk drive sensor monitoring data. Prior to the training phase, the RNN LSTM model is tuned using a set of predefined hyper-parameters. The preprocessing, which is performed during the training and evaluation phase as well as later during the prediction phase, involves using predefined values for a set of parameters to generate the set of enhanced sequences from raw sensor readings. The enhanced feature sequences are generated to maintain a desired healthy/failed disk ratio, and only use samples leading up to a last-valid-time sample in order to honor a pre-specified heads-up-period alert requirement.

Type: Application

Filed: September 27, 2018

Publication date: April 2, 2020

Inventors: ONUR KOCBERBER, FELIX SCHMIDT, ARUN RAGHAVAN, NIPUN AGARWAL, SAM IDICULA, GUANG-TONG ZHOU, NITIN KUNAL
AUTOMATED WINDOW BASED FEATURE GENERATION FOR TIME-SERIES FORECASTING AND ANOMALY DETECTION

Publication number: 20200097810

Abstract: Techniques are described herein for automatically generating statistical features describing trends in time-series data that may then become inputs to machine learning models. The framework involves a set of algorithms for selecting a number and size of window-based statistical features to use as input features, and evaluating them during a series of training phases with a machine learning model using training, test and validation time-series data. The training and evaluation phases provide particular values for a number and a size of window-based statistical features that yield the best scores in terms of prediction accuracy. The particular values are then used with input time-series data to generate augmented time-series data to input to the trained machine learning model for obtaining predictions regarding the time series as well as identified anomalies in the input time-series data.

Type: Application

Filed: September 25, 2018

Publication date: March 26, 2020

Inventors: Tayler Hetherington, Sam Idicula, Nipun Agarwal
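
Illustrative note: the augmentation step in the abstract above, where window-based statistics are attached to each time step, might look like the following. This is a minimal sketch covering only feature generation for window sizes that are already chosen. The choice of mean and population standard deviation as the statistics, and the `mean_<w>` / `std_<w>` naming, are assumptions; the search over the number and size of windows is not shown.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def window_features(series: Sequence[float], window_sizes: Sequence[int]) -> List[Dict[str, float]]:
    """Augment each time step with trailing-window statistics.

    Rows start at the first index where the largest window is fully available,
    so every row has the same set of features.
    """
    largest = max(window_sizes)
    rows = []
    for t in range(largest - 1, len(series)):
        row = {"value": series[t]}
        for w in window_sizes:
            window = series[t - w + 1 : t + 1]  # the w most recent points, including t
            row[f"mean_{w}"] = mean(window)
            row[f"std_{w}"] = pstdev(window)
        rows.append(row)
    return rows
```

Each returned row could then be fed to a forecasting or anomaly-detection model in place of the raw value alone.
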
Efficient partitioning of relational data

Patent number: 10592531

Abstract: Techniques are described for non-power-of-two partitioning of a data set as well as generation and selection of partition schemes for the data set. In an embodiment, one or more iterations of a partition scheme are for a non-power-of-two number of partitions. Extended hash partitioning may be used to partition a data set into a non-power-of-two number of partitions by determining the partition identifier of each tuple of the data set using the extended hash partitioning algorithm. In an embodiment, multiple partition schemes are generated for multiple data sets, based on properties of the data sets and/or availability of computing resources for the partition operation or the operation that follows the partition operation. The generated partition schemes may use non-power-of-two partitioning for one or more iterations of a generated partition scheme. The optimal partition scheme may be selected from the generated partition schemes based on optimization policies.

Type: Grant

Filed: February 21, 2017

Date of Patent: March 17, 2020

Assignee: Oracle International Corporation

Inventors: Negar Koochakzadeh, Nitin Kunal, Sam Idicula, Cagri Balkesen, Nipun Agarwal
Multi-system query execution plan

Patent number: 10585887

Abstract: Techniques are described to evaluate an operation from an execution plan of a query to offload the operation to another database management system for less costly execution. In an embodiment, the execution plan is determined based on characteristics of the database management system that received the query for execution. One or more operations in the execution plan are then evaluated for offloading to another heterogeneous database management system. In a related embodiment, the offloading cost for each operation may also include communication cost between the database management systems. The operations that are estimated to be less costly to execute on the other database management system are then identified for offloading to the other database management system. In an alternative embodiment, the database management system generates permutations of execution plans for the same query, and similarly evaluates each permutation of the execution plans for offloading its one or more operations.

Type: Grant

Filed: March 30, 2015

Date of Patent: March 10, 2020

Assignee: ORACLE INTERNATIONAL CORPORATION

Inventors: Khai Tran, Rajkumar Sen, Sabina Petride, Sam Idicula
HYBRID INSTRUMENTATION FRAMEWORK FOR MULTICORE LOW POWER PROCESSORS

Publication number: 20200065215

Abstract: Techniques are provided for redundant execution by a better processor for intensive dynamic profiling after initial execution by a constrained processor. In an embodiment, a system of computer(s) receives a request to profile particular runtime aspects of an original binary executable. Based on the particular runtime aspects and without accessing source logic, the system statically rewrites the original binary executable into a rewritten binary executable that invokes telemetry instrumentation that makes observations of the particular runtime aspects and emits traces of those observations. A first processing core having low power (capacity) performs a first execution of the rewritten binary executable to make first observations and emit first traces of the first observations. Afterwards, a second processing core performs a second (redundant) execution of the original binary executable based on the first traces.

Type: Application

Filed: October 31, 2019

Publication date: February 27, 2020

Inventors: Sam Idicula, Kirtikar Kashyap, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju
SCALABLE DISTRIBUTED COMPUTATION FRAMEWORK FOR DATA-INTENSIVE COMPUTER VISION WORKLOADS

Publication number: 20200036954

Abstract: Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, two images. The first image comprises a first set of pixels and the second image comprises a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image. The second number of pixels is different from the first number of pixels.

Type: Application

Filed: October 1, 2019

Publication date: January 30, 2020

Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
Efficient parallel algorithm for integral image computation for many-core CPUs

Patent number: 10529049

Abstract: Techniques are provided herein for generating an integral image of an input image in parallel across the cores of a multi-core processor. The input image is split into a plurality of tiles, each of which is stored in a scratchpad memory associated with a distinct core. At each tile, a partial integral image of the tile is first computed over the tile, using a Single-Pass Algorithm. This is followed by aggregating partial sums belonging to subsets of tiles using a 2D Inclusive Parallel Prefix Algorithm. A summation is finally performed over the aggregated partial sums to generate the integral image over the entire input image.

Type: Grant

Filed: March 27, 2017

Date of Patent: January 7, 2020

Assignee: Oracle International Corporation

Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
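
Illustrative note: the three steps in the abstract above (per-tile partial integral images, aggregation of partial sums across tiles, and a final summation) can be sketched as follows. This is a sequential sketch under stated assumptions, not the patented implementation. Tiles are processed in a plain loop rather than on separate cores with scratchpad memory, and an ordinary sequential 2D prefix sum stands in for the parallel prefix algorithm. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]


def local_integral(tile: Image) -> Image:
    """Single pass over one tile: running row sum plus the entry directly above."""
    h, w = len(tile), len(tile[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += tile[y][x]
            out[y][x] = row_sum + (out[y - 1][x] if y > 0 else 0)
    return out


def tiled_integral(image: Image, tile_h: int, tile_w: int) -> Image:
    """Integral image built from independent per-tile results."""
    H, W = len(image), len(image[0])
    row_starts, col_starts = range(0, H, tile_h), range(0, W, tile_w)

    # Step 1: partial integral image of every tile (independent work per tile).
    local = {}
    for ty, y0 in enumerate(row_starts):
        for tx, x0 in enumerate(col_starts):
            tile = [row[x0 : x0 + tile_w] for row in image[y0 : y0 + tile_h]]
            local[ty, tx] = local_integral(tile)

    # Step 2: 2D prefix sum over tile totals; corner[a][b] is the sum of all
    # tiles with tile-row < a and tile-column < b.
    n_ty, n_tx = len(row_starts), len(col_starts)
    corner = [[0] * (n_tx + 1) for _ in range(n_ty + 1)]
    for ty in range(n_ty):
        for tx in range(n_tx):
            total = local[ty, tx][-1][-1]
            corner[ty + 1][tx + 1] = (
                total + corner[ty][tx + 1] + corner[ty + 1][tx] - corner[ty][tx]
            )

    # Step 3: final summation of the local value, the strip to the left,
    # the strip above, and the block of whole tiles above-left.
    out = [[0] * W for _ in range(H)]
    for ty, y0 in enumerate(row_starts):
        for tx, x0 in enumerate(col_starts):
            for ly, row in enumerate(local[ty, tx]):
                left = sum(local[ty, k][ly][-1] for k in range(tx))
                for lx, value in enumerate(row):
                    top = sum(local[k, tx][-1][lx] for k in range(ty))
                    out[y0 + ly][x0 + lx] = value + left + top + corner[ty][tx]
    return out
```

The result should match a direct integral image of the whole input for any tile size, including sizes that do not evenly divide the image.
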
Matrix multiplication at memory bandwidth

Patent number: 10521225

Abstract: Techniques related to matrix multiplication at memory bandwidth are disclosed. Computing device(s) perform multiplication of a first matrix with a second matrix to generate a third matrix. A first register stores contiguous element values of the first matrix. Furthermore, a second register stores a first set of contiguous element values of the second matrix, and a third register stores a second set of contiguous element values of the second matrix. The first set and the second set correspond to a first row and a second row, respectively, of the second matrix. The first row and the second row are contiguous rows. A single instruction is executed to cause at least a partial computation of contiguous element values of the third matrix. The single instruction causes multiplication of element values stored in the first register with element values stored in the second and third registers and grouped accumulation of the products.

Type: Grant

Filed: June 29, 2017

Date of Patent: December 31, 2019

Assignee: Oracle International Corporation

Inventors: Arun Raghavan, Sandeep R. Agrawal, Sam Idicula, Nipun Agarwal
HYBRID DECLARATIVE QUERY COMPILER AND OPTIMIZER FRAMEWORK

Publication number: 20190392068

Abstract: Techniques are described herein for building a framework for declarative query compilation using both rule-based and cost-based approaches for database management. The framework involves constructing and using: a set of rule-based properties tables that contain optimization parameters for both logical and physical optimization, a recursive algorithm to form candidate physical query plans that is based on the rule-based properties tables, and a cost model for estimating the cost of a generated physical query plan that is used with the rule-based properties tables to prune inferior query plans.

Type: Application

Filed: June 25, 2018

Publication date: December 26, 2019

Inventors: JIAN WEN, SAM IDICULA, NITIN KUNAL, FARHAN TAUHEED, SEEMA SUNDARA, NIPUN AGARWAL, INDU BHAGAT
Hybrid instrumentation framework for multicore low power processors

Patent number: 10503626

Abstract: Techniques are provided for redundant execution by a better processor for intensive dynamic profiling after initial execution by a constrained processor. In an embodiment, a system of computer(s) receives a request to profile particular runtime aspects of an original binary executable. Based on the particular runtime aspects and without accessing source logic, the system statically rewrites the original binary executable into a rewritten binary executable that invokes telemetry instrumentation that makes observations of the particular runtime aspects and emits traces of those observations. A first processing core having low power (capacity) performs a first execution of the rewritten binary executable to make first observations and emit first traces of the first observations. Afterwards, a second processing core performs a second (redundant) execution of the original binary executable based on the first traces.

Type: Grant

Filed: January 29, 2018

Date of Patent: December 10, 2019

Assignee: Oracle International Corporation

Inventors: Sam Idicula, Kirtikar Kashyap, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju
EFFICIENT PARTITIONING OF RELATIONAL DATA

Publication number: 20190370268

Abstract: Techniques are described for non-power-of-two partitioning of a data set as well as generation and selection of partition schemes for the data set. In an embodiment, one or more iterations of a partition scheme are for a non-power-of-two number of partitions. Extended hash partitioning may be used to partition a data set into a non-power-of-two number of partitions by determining the partition identifier of each tuple of the data set using the extended hash partitioning algorithm. In an embodiment, multiple partition schemes are generated for multiple data sets, based on properties of the data sets and/or availability of computing resources for the partition operation or the operation that follows the partition operation. The generated partition schemes may use non-power-of-two partitioning for one or more iterations of a generated partition scheme. The optimal partition scheme may be selected from the generated partition schemes based on optimization policies.

Type: Application

Filed: August 15, 2019

Publication date: December 5, 2019

Inventors: NEGAR KOOCHAKZADEH, NITIN KUNAL, SAM IDICULA, CAGRI BALKESEN, NIPUN AGARWAL
Scalable distributed computation framework for data-intensive computer vision workloads

Patent number: 10469822

Abstract: Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, two images. The first image comprises a first set of pixels and the second image comprises a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image. The second number of pixels is different from the first number of pixels.

Type: Grant

Filed: March 28, 2017

Date of Patent: November 5, 2019

Assignee: Oracle International Corporation

Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
Memory management for sparse matrix multiplication

Patent number: 10452744

Abstract: Techniques related to memory management for sparse matrix multiplication are disclosed. Computing device(s) may perform a method for multiplying a row of a first sparse matrix with a second sparse matrix to generate a product matrix row. A compressed representation of the second sparse matrix is stored in main memory. The compressed representation comprises a values array that stores non-zero value(s). Tile(s) corresponding to row(s) of the second sparse matrix are loaded into scratchpad memory. The tile(s) comprise set(s) of non-zero value(s) of the values array. A particular partition of an uncompressed representation of the product matrix row is generated in the scratchpad memory. The particular partition corresponds to a partition of the second sparse matrix comprising non-zero value(s) included in the tile(s). When a particular tile is determined to comprise non-zero value(s) that are required to generate the particular partition, the particular tile is loaded into the scratchpad memory.

Type: Grant

Filed: March 27, 2017

Date of Patent: October 22, 2019

Assignee: Oracle International Corporation

Inventors: Sandeep R. Agrawal, Sam Idicula, Nipun Agarwal
