Patents by Inventor Sam Idicula

Sam Idicula has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10685021
    Abstract: Techniques are described herein for introducing transcode operators into a generated operator tree during query processing. Setting up the transcode operators with correct encoding type at runtime is performed by inferring correct encoding type information during compile time. The inference of the correct encoding type information occurs in three phases during compile time: the first phase involves collecting, consolidating, and propagating the encoding-type information of input columns up the expression tree. The second phase involves pushing the encoding-type information down the tree for nodes in the expression tree that do not yet have any encoding-type assigned. The third phase involves determining which inputs to the current relational operator need to be pre-processed by a transcode operator.
    Type: Grant
    Filed: October 24, 2017
    Date of Patent: June 16, 2020
    Assignee: Oracle International Corporation
    Inventors: Pit Fender, Sam Idicula, Nipun Agarwal, Benjamin Schlegel
  • Patent number: 10671583
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors that may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: June 2, 2020
    Assignee: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
  • Publication number: 20200125568
    Abstract: Embodiments utilize trained query performance machine learning (QP-ML) models to predict an optimal compute node cluster size for a given in-memory workload. The QP-ML models include models that predict query task runtimes at various compute node cardinalities, and models that predict network communication time between nodes of the cluster. Embodiments also utilize an analytical model to predict overlap between predicted task runtimes and predicted network communication times. Based on this data, an optimal cluster size is selected for the workload. Embodiments further utilize trained data capacity machine learning (DC-ML) models to predict a minimum number of compute nodes needed to run a workload. The DC-ML models include models that predict the size of the workload dataset in a target data encoding, models that predict the amount of memory needed to run the queries in the workload, and models that predict the memory needed to accommodate changes to the dataset.
    Type: Application
    Filed: April 11, 2019
    Publication date: April 23, 2020
    Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
  • Publication number: 20200125961
    Abstract: Techniques are described for generating and applying mini-machine learning variants of machine learning algorithms to save computational resources in tuning and selection of machine learning algorithms. In an embodiment, at least one of the hyper-parameter values for a reference variant is modified to a new hyper-parameter value thereby generating a new variant of machine learning algorithm from the reference variant of machine learning algorithm. A performance score is determined for the new variant of machine learning algorithm using a training dataset, the performance score representing the accuracy of the new machine learning model for the training dataset. By performing training of the new variant of machine learning algorithm with the training data set, a cost metric of the new variant of machine learning algorithm is measured by measuring usage the used computing resources for the training.
    Type: Application
    Filed: October 19, 2018
    Publication date: April 23, 2020
    Inventors: SANDEEP AGRAWAL, VENKATANATHAN VARADARAJAN, SAM IDICULA, NIPUN AGARWAL
  • Publication number: 20200125545
    Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.
    Type: Application
    Filed: March 11, 2019
    Publication date: April 23, 2020
    Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
  • Patent number: 10630957
    Abstract: Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, two images. The first image comprises a first set of pixels and the second image comprising a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image. The second number of pixels is different from the first number of pixels.
    Type: Grant
    Filed: October 1, 2019
    Date of Patent: April 21, 2020
    Assignee: Oracle International Corporation
    Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
  • Publication number: 20200118036
    Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer calculates, for each feature of a training dataset, a relevance score based on: a relevance scoring function, and statistics of values, of the feature, that occur in the training dataset. A rank based on relevance scores of the features is calculated for each feature. A sequence of distinct subsets of the features, based on the ranks of the features, is generated. For each distinct subset of the sequence of distinct feature subsets, a fitness score is generated based on training a machine learning (ML) model that is configured for the distinct subset.
    Type: Application
    Filed: May 20, 2019
    Publication date: April 16, 2020
    Inventors: Tomas Karnagel, Sam Idicula, Nipun Agarwal
  • Publication number: 20200104200
    Abstract: Techniques are described herein for predicting disk drive failure using a machine learning model. The framework involves receiving disk drive sensor attributes as training data, preprocessing the training data to select a set of enhanced feature sequences, and using the enhanced feature sequences to train a machine learning model to predict disk drive failures from disk drive sensor monitoring data. Prior to the training phase, the RNN LSTM model is tuned using a set of predefined hyper-parameters. The preprocessing, which is performed during the training and evaluation phase as well as later during the prediction phase, involves using predefined values for a set of parameters to generate the set of enhanced sequences from raw sensor reading. The enhanced feature sequences are generated to maintain a desired healthy/failed disk ratio, and only use samples leading up to a last-valid-time sample in order to honor a pre-specified heads-up-period alert requirement.
    Type: Application
    Filed: September 27, 2018
    Publication date: April 2, 2020
    Inventors: ONUR KOCBERBER, FELIX SCHMIDT, ARUN RAGHAVAN, NIPUN AGARWAL, SAM IDICULA, GUANG-TONG ZHOU, NITIN KUNAL
  • Publication number: 20200097810
    Abstract: Techniques are described herein for automatically generating statistical features describing trends in time-series data that may then become inputs to machine learning models. The framework involves a set of algorithms for selecting a number and size of window based statistical features to use as input features, evaluating them during a series of training phases with a machine learning model using training, test and validation time series data. The training and evaluation phases provide particular values for a number and a size of window based statistical features that yield best scores in terms of prediction accuracy. The particular values are then used with input time series data to generate an augmented time-series data to input to the trained machine learning model for obtaining predictions regarding the time series as well as identified anomalies in the input time series data.
    Type: Application
    Filed: September 25, 2018
    Publication date: March 26, 2020
    Inventors: Tayler Hetherington, Sam Idicula, Nipun Agarwal
  • Patent number: 10592531
    Abstract: Techniques for non-power-of-two partitioning of a data set as well as generation and selection of partition schemes for the data set. In an embodiment, one or more iterations of a partition scheme is for a non-power-of-two number of partitions. Extended hash partitioning may be used to partition a data set into a non-power-of-two number of partitions by determining the partition identifier of each tuple of the data set using the extended hash partitioning algorithm. In an embodiment, multiple partition schemes are generated for multiple data sets, based on properties of the data sets and/or availability of computing resources for the partition operation or the subsequent operation to the partition operation. The generated partition schemes may use non-power-of-two partitioning for one or more iterations of a generated partition scheme. The most optimal partition scheme may be selected from the generated partition schemes based on optimization policies.
    Type: Grant
    Filed: February 21, 2017
    Date of Patent: March 17, 2020
    Assignee: Oracle International Corporation
    Inventors: Negar Koochakzadeh, Nitin Kunal, Sam Idicula, Cagri Balkesen, Nipun Agarwal
  • Patent number: 10585887
    Abstract: Techniques are described to evaluate an operation from an execution plan of a query to offload the operation to another database management system for less costly execution. In an embodiment, the execution plan is determined based on characteristics of the database management system that received the query for execution. One or more operations in the execution plan are then evaluated for offloading to another heterogeneous database management system. In a related embodiment, the offloading cost for each operation may also include communication cost between the database management systems. The operations that are estimated to be less costly to execute on the other database management system are then identified for offloading to the other database management system. In an alternative embodiment, the database management system generates permutations of execution plans for the same query, and similarly evaluates each permutation of the execution plans for offloading its one or more operations.
    Type: Grant
    Filed: March 30, 2015
    Date of Patent: March 10, 2020
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Khai Tran, Rajkumar Sen, Sabina Petride, Sam Idicula
  • Publication number: 20200065215
    Abstract: Techniques are provided for redundant execution by a better processor for intensive dynamic profiling after initial execution by a constrained processor. In an embodiment, a system of computer(s) receives a request to profile particular runtime aspects of an original binary executable. Based on the particular runtime aspects and without accessing source logic, the system statically rewrites the original binary executable into a rewritten binary executable that invokes telemetry instrumentation that makes observations of the particular runtime aspects and emits traces of those observations. A first processing core having low power (capacity) performs a first execution of the rewritten binary executable to make first observations and emit first traces of the first observations. Afterwards, a second processing core performs a second (redundant) execution of the original binary executable based on the first traces.
    Type: Application
    Filed: October 31, 2019
    Publication date: February 27, 2020
    Inventors: Sam Idicula, Kirtikar Kashyap, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju
  • Publication number: 20200036954
    Abstract: Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, two images. The first image comprises a first set of pixels and the second image comprising a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image. The second number of pixels is different from the first number of pixels.
    Type: Application
    Filed: October 1, 2019
    Publication date: January 30, 2020
    Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
  • Patent number: 10529049
    Abstract: Techniques are provided herein for generating an integral image of an input image in parallel across the cores of a multi-core processor. The input image is split into a plurality of tiles, each of which is stored in a scratchpad memory associated with a distinct core. At each tile, a partial integral image of the tile is first computed over the tile, using a Single-Pass Algorithm. This is followed by aggregating partial sums belonging to subsets of tiles using a 2D Inclusive Parallel Prefix Algorithm. A summation is finally performed over the aggregated partial sums to generate the integral image over the entire input image.
    Type: Grant
    Filed: March 27, 2017
    Date of Patent: January 7, 2020
    Assignee: Oracle International Corporation
    Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
  • Patent number: 10521225
    Abstract: Techniques related to matrix multiplication at memory bandwidth are disclosed. Computing device(s) perform multiplication of a first matrix with a second matrix to generate a third matrix. A first register stores contiguous element values of the first matrix. Furthermore, a second register stores a first set of contiguous element values of the second matrix, and a third register stores a second set of contiguous element values of the second matrix. The first set and the second set correspond to a first row and a second row, respectively, of the second matrix. The first row and the second row are contiguous rows. A single instruction is executed to cause at least a partial computation of contiguous element values of the third matrix. The single instruction causes multiplication of element values stored in the first register with element values stored in the second and third registers and grouped accumulation of the products.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: December 31, 2019
    Assignee: Oracle International Corporation
    Inventors: Arun Raghavan, Sandeep R. Agrawal, Sam Idicula, Nipun Agarwal
  • Publication number: 20190392068
    Abstract: Techniques are described herein for building a framework for declarative query compilation using both rule-based and cost-based approaches for database management. The framework involves constructing and using: a set of rule-based properties tables that contain optimization parameters for both logical and physical optimization, a recursive algorithm to form candidate physical query plans that is based on the rule based tables, and a cost model for estimating the cost of a generated physical query plan that is used with the rule based properties tables to prune inferior query plans.
    Type: Application
    Filed: June 25, 2018
    Publication date: December 26, 2019
    Inventors: JIAN WEN, SAM IDICULA, NITIN KUNAL, FARHAN TAUHEED, SEEMA SUNDARA, NIPUN AGARWAL, INDU BHAGAT
  • Patent number: 10503626
    Abstract: Techniques are provided for redundant execution by a better processor for intensive dynamic profiling after initial execution by a constrained processor. In an embodiment, a system of computer(s) receives a request to profile particular runtime aspects of an original binary executable. Based on the particular runtime aspects and without accessing source logic, the system statically rewrites the original binary executable into a rewritten binary executable that invokes telemetry instrumentation that makes observations of the particular runtime aspects and emits traces of those observations. A first processing core having low power (capacity) performs a first execution of the rewritten binary executable to make first observations and emit first traces of the first observations. Afterwards, a second processing core performs a second (redundant) execution of the original binary executable based on the first traces.
    Type: Grant
    Filed: January 29, 2018
    Date of Patent: December 10, 2019
    Assignee: Oracle International Corporation
    Inventors: Sam Idicula, Kirtikar Kashyap, Arun Raghavan, Evangelos Vlachos, Venkatraman Govindaraju
  • Publication number: 20190370268
    Abstract: Techniques for non-power-of-two partitioning of a data set as well as generation and selection of partition schemes for the data set. In an embodiment, one or more iterations of a partition scheme is for a non-power-of-two number of partitions. Extended hash partitioning may be used to partition a data set into a non-power-of-two number of partitions by determining the partition identifier of each tuple of the data set using the extended hash partitioning algorithm. In an embodiment, multiple partition schemes are generated for multiple data sets, based on properties of the data sets and/or availability of computing resources for the partition operation or the subsequent operation to the partition operation. The generated partition schemes may use non-power-of-two partitioning for one or more iterations of a generated partition scheme. The most optimal partition scheme may be selected from the generated partition schemes based on optimization policies.
    Type: Application
    Filed: August 15, 2019
    Publication date: December 5, 2019
    Inventors: NEGAR KOOCHAKZADEH, NITIN KUNAL, SAM IDICULA, CAGRI BALKESEN, NIPUN AGARWAL
  • Patent number: 10469822
    Abstract: Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, two images. The first image comprises a first set of pixels and the second image comprising a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image. The second number of pixels is different from the first number of pixels.
    Type: Grant
    Filed: March 28, 2017
    Date of Patent: November 5, 2019
    Assignee: Oracle International Corporation
    Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
  • Patent number: 10452744
    Abstract: Techniques related to memory management for sparse matrix multiplication are disclosed. Computing device(s) may perform a method for multiplying a row of a first sparse matrix with a second sparse matrix to generate a product matrix row. A compressed representation of the second sparse matrix is stored in main memory. The compressed representation comprises a values array that stores non-zero value(s). Tile(s) corresponding to row(s) of second sparse matrix are loaded into scratchpad memory. The tile(s) comprise set(s) of non-zero value(s) of the values array. A particular partition of an uncompressed representation of the product matrix row is generated in the scratchpad memory. The particular partition corresponds to a partition of the second sparse matrix comprising non-zero value(s) included in the tile(s). When a particular tile is determined to comprise non-zero value(s) that are required to generate the particular partition, the particular tile is loaded into the scratchpad memory.
    Type: Grant
    Filed: March 27, 2017
    Date of Patent: October 22, 2019
    Assignee: Oracle International Corporation
    Inventors: Sandeep R. Agrawal, Sam Idicula, Nipun Agarwal