Patents by Inventor Nipun Agarwal

Nipun Agarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10671583
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors that may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: June 2, 2020
    Assignee: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
  • Publication number: 20200125545
    Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters.
    Type: Application
    Filed: March 11, 2019
    Publication date: April 23, 2020
    Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
  • Publication number: 20200125568
    Abstract: Embodiments utilize trained query performance machine learning (QP-ML) models to predict an optimal compute node cluster size for a given in-memory workload. The QP-ML models include models that predict query task runtimes at various compute node cardinalities, and models that predict network communication time between nodes of the cluster. Embodiments also utilize an analytical model to predict overlap between predicted task runtimes and predicted network communication times. Based on this data, an optimal cluster size is selected for the workload. Embodiments further utilize trained data capacity machine learning (DC-ML) models to predict a minimum number of compute nodes needed to run a workload. The DC-ML models include models that predict the size of the workload dataset in a target data encoding, models that predict the amount of memory needed to run the queries in the workload, and models that predict the memory needed to accommodate changes to the dataset.
    Type: Application
    Filed: April 11, 2019
    Publication date: April 23, 2020
    Inventors: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
  • Publication number: 20200125961
    Abstract: Techniques are described for generating and applying mini-machine learning variants of machine learning algorithms to save computational resources in tuning and selection of machine learning algorithms. In an embodiment, at least one of the hyper-parameter values for a reference variant is modified to a new hyper-parameter value thereby generating a new variant of machine learning algorithm from the reference variant of machine learning algorithm. A performance score is determined for the new variant of machine learning algorithm using a training dataset, the performance score representing the accuracy of the new machine learning model for the training dataset. By performing training of the new variant of machine learning algorithm with the training data set, a cost metric of the new variant of machine learning algorithm is measured by measuring usage the used computing resources for the training.
    Type: Application
    Filed: October 19, 2018
    Publication date: April 23, 2020
    Inventors: SANDEEP AGRAWAL, VENKATANATHAN VARADARAJAN, SAM IDICULA, NIPUN AGARWAL
  • Patent number: 10630957
    Abstract: Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, two images. The first image comprises a first set of pixels and the second image comprising a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image. The second number of pixels is different from the first number of pixels.
    Type: Grant
    Filed: October 1, 2019
    Date of Patent: April 21, 2020
    Assignee: Oracle International Corporation
    Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
  • Publication number: 20200118039
    Abstract: Techniques are described herein for estimating CPU, memory, and I/O utilization for a workload via out-of-band sensor readings using a machine learning model. The framework involves receiving sensor data associated with executing benchmark applications, obtaining ground truth utilization values for the benchmarks, preprocessing the training data to select a set of enhanced sequences, and using the enhanced sequences to train a random forest model to estimate CPU, memory, and I/O utilization given sensor monitoring data. Prior to the training phase, a machine learning model is trained using a set of predefined hyper-parameters. The trained models are used to generate estimations for CPU, memory, and I/O utilizations values. The utilization values are used with workload context information to assess the deployment and generate one or more recommendations for machine types that will best serve the workload in terms of system utilization.
    Type: Application
    Filed: October 10, 2018
    Publication date: April 16, 2020
    Inventors: Onur Kocberber, Felix Schmidt, Craig Schelp, Andrew Brownsword, Nipun Agarwal
  • Publication number: 20200118036
    Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer calculates, for each feature of a training dataset, a relevance score based on: a relevance scoring function, and statistics of values, of the feature, that occur in the training dataset. A rank based on relevance scores of the features is calculated for each feature. A sequence of distinct subsets of the features, based on the ranks of the features, is generated. For each distinct subset of the sequence of distinct feature subsets, a fitness score is generated based on training a machine learning (ML) model that is configured for the distinct subset.
    Type: Application
    Filed: May 20, 2019
    Publication date: April 16, 2020
    Inventors: Tomas Karnagel, Sam Idicula, Nipun Agarwal
  • Publication number: 20200104200
    Abstract: Techniques are described herein for predicting disk drive failure using a machine learning model. The framework involves receiving disk drive sensor attributes as training data, preprocessing the training data to select a set of enhanced feature sequences, and using the enhanced feature sequences to train a machine learning model to predict disk drive failures from disk drive sensor monitoring data. Prior to the training phase, the RNN LSTM model is tuned using a set of predefined hyper-parameters. The preprocessing, which is performed during the training and evaluation phase as well as later during the prediction phase, involves using predefined values for a set of parameters to generate the set of enhanced sequences from raw sensor reading. The enhanced feature sequences are generated to maintain a desired healthy/failed disk ratio, and only use samples leading up to a last-valid-time sample in order to honor a pre-specified heads-up-period alert requirement.
    Type: Application
    Filed: September 27, 2018
    Publication date: April 2, 2020
    Inventors: ONUR KOCBERBER, FELIX SCHMIDT, ARUN RAGHAVAN, NIPUN AGARWAL, SAM IDICULA, GUANG-TONG ZHOU, NITIN KUNAL
  • Publication number: 20200097810
    Abstract: Techniques are described herein for automatically generating statistical features describing trends in time-series data that may then become inputs to machine learning models. The framework involves a set of algorithms for selecting a number and size of window based statistical features to use as input features, evaluating them during a series of training phases with a machine learning model using training, test and validation time series data. The training and evaluation phases provide particular values for a number and a size of window based statistical features that yield best scores in terms of prediction accuracy. The particular values are then used with input time series data to generate an augmented time-series data to input to the trained machine learning model for obtaining predictions regarding the time series as well as identified anomalies in the input time series data.
    Type: Application
    Filed: September 25, 2018
    Publication date: March 26, 2020
    Inventors: Tayler Hetherington, Sam Idicula, Nipun Agarwal
  • Patent number: 10599647
    Abstract: Techniques are described for generation of an efficient hash table for probing during join operations. A node stores a partition and generates a hash table that includes a hash bucket array and a link array, where the link array is index aligned to the partition. Each hash bucket element contains an offset that defines a location of a build key array element in the partition and a link array element in the link array. For a particular build key array element, the node determines a hash bucket element that corresponds to the build key array. If the hash bucket element contains an existing offset, the existing offset is copied to the link array element that corresponds to the offset of the particular build key array element and the offset for the particular build key array element is copied into the hash bucket element. When probing, the offset in a hash bucket element is used to locate a build key array element and other offsets stored in the link array for additional build key array elements.
    Type: Grant
    Filed: December 22, 2017
    Date of Patent: March 24, 2020
    Assignee: Oracle International Corporation
    Inventors: Cagri Balkesen, Nipun Agarwal
  • Publication number: 20200089529
    Abstract: Herein are techniques for analysis of data streams. In an embodiment, a computer associates each software actor with data streams. Each software actor has its own backlog queue of data to analyze. In response to receiving some stream content and based on the received stream content, data is distributed to some software actors. In response to determining that the data satisfies completeness criteria of a particular software actor, an indication of the data is appended onto the backlog queue of the particular software actor. The particular software actor is reset to an initial state by loading an execution snapshot of a previous initial execution of an embedded virtual machine. Based on the particular software actor, execution of the execution snapshot of the previous initial execution is resumed to dequeue and process the indication of the data from the backlog queue of the particular software actor to generate a result.
    Type: Application
    Filed: September 19, 2018
    Publication date: March 19, 2020
    Inventors: ANDREW BROWNSWORD, TAYLER HETHERINGTON, PAVAN CHANDRASHEKAR, AKHILESH SINGHANIA, STUART WRAY, PRAVIN SHINDE, FELIX SCHMIDT, CRAIG SCHELP, ONUR KOCBERBER, JUAN FERNANDEZ PEINADOR, ROD REDDEKOPP, MANEL FERNANDEZ GOMEZ, NIPUN AGARWAL
  • Patent number: 10592531
    Abstract: Techniques for non-power-of-two partitioning of a data set as well as generation and selection of partition schemes for the data set. In an embodiment, one or more iterations of a partition scheme is for a non-power-of-two number of partitions. Extended hash partitioning may be used to partition a data set into a non-power-of-two number of partitions by determining the partition identifier of each tuple of the data set using the extended hash partitioning algorithm. In an embodiment, multiple partition schemes are generated for multiple data sets, based on properties of the data sets and/or availability of computing resources for the partition operation or the subsequent operation to the partition operation. The generated partition schemes may use non-power-of-two partitioning for one or more iterations of a generated partition scheme. The most optimal partition scheme may be selected from the generated partition schemes based on optimization policies.
    Type: Grant
    Filed: February 21, 2017
    Date of Patent: March 17, 2020
    Assignee: Oracle International Corporation
    Inventors: Negar Koochakzadeh, Nitin Kunal, Sam Idicula, Cagri Balkesen, Nipun Agarwal
  • Publication number: 20200076841
    Abstract: Techniques are provided herein for contextual embedding of features of operational logs or network traffic for anomaly detection based on sequence prediction. In an embodiment, a computer has a predictive recurrent neural network (RNN) that detects an anomalous network flow. In an embodiment, an RNN contextually transcodes sparse feature vectors that represent log messages into dense feature vectors that may be predictive or used to generate predictive vectors. In an embodiment, graph embedding improves feature embedding of log traces. In an embodiment, a computer detects and feature-encodes independent traces from related log messages. These techniques may detect malicious activity by anomaly analysis of context-aware feature embeddings of network packet flows, log messages, and/or log traces.
    Type: Application
    Filed: September 5, 2018
    Publication date: March 5, 2020
    Inventors: HOSSEIN HAJIMIRSADEGHI, GUANG-TONG ZHOU, ANDREW BROWNSWORD, NIPUN AGARWAL, PAVAN CHANDRASHEKAR, KAROON RASHEDI NIA
  • Publication number: 20200036954
    Abstract: Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, two images. The first image comprises a first set of pixels and the second image comprising a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image. The second number of pixels is different from the first number of pixels.
    Type: Application
    Filed: October 1, 2019
    Publication date: January 30, 2020
    Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
  • Publication number: 20200034208
    Abstract: Embodiments monitor statistics from groups of devices and generate an alarm upon detecting a utilization imbalance that is beyond a threshold. Particular balance statistics are periodically sampled, over a timeframe, for a group of devices configured to have balanced utilization. The devices are ranked at every data collection timestamp based on the gathered device statistics. The numbers of times each device appears within each rank over the timeframe are tallied. The device/rank summations are collectively used as a probability distribution representing the probability of each device being ranked at each of the rankings in the future. Based on this probability distribution, an entropy value that represents a summary of the imbalance of the group of devices over the timeframe is derived. An imbalance alert is generated when one or more entropy values for a group of devices shows an imbalanced utilization of the devices going beyond an identified imbalance threshold.
    Type: Application
    Filed: July 24, 2018
    Publication date: January 30, 2020
    Inventors: Stuart Wray, Felix Schmidt, Craig Robert Schelp, Manel Fernandez Gomez, Nipun Agarwal
  • Patent number: 10546269
    Abstract: The present disclosure relates to methods, systems, and apparatuses for identifying related records in a database.
    Type: Grant
    Filed: October 13, 2015
    Date of Patent: January 28, 2020
    Assignee: GROUPON, INC.
    Inventors: Prashant Gaurav, Nipun Agarwal, Sushant Wason
  • Patent number: 10529049
    Abstract: Techniques are provided herein for generating an integral image of an input image in parallel across the cores of a multi-core processor. The input image is split into a plurality of tiles, each of which is stored in a scratchpad memory associated with a distinct core. At each tile, a partial integral image of the tile is first computed over the tile, using a Single-Pass Algorithm. This is followed by aggregating partial sums belonging to subsets of tiles using a 2D Inclusive Parallel Prefix Algorithm. A summation is finally performed over the aggregated partial sums to generate the integral image over the entire input image.
    Type: Grant
    Filed: March 27, 2017
    Date of Patent: January 7, 2020
    Assignee: Oracle International Corporation
    Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
  • Patent number: 10521225
    Abstract: Techniques related to matrix multiplication at memory bandwidth are disclosed. Computing device(s) perform multiplication of a first matrix with a second matrix to generate a third matrix. A first register stores contiguous element values of the first matrix. Furthermore, a second register stores a first set of contiguous element values of the second matrix, and a third register stores a second set of contiguous element values of the second matrix. The first set and the second set correspond to a first row and a second row, respectively, of the second matrix. The first row and the second row are contiguous rows. A single instruction is executed to cause at least a partial computation of contiguous element values of the third matrix. The single instruction causes multiplication of element values stored in the first register with element values stored in the second and third registers and grouped accumulation of the products.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: December 31, 2019
    Assignee: Oracle International Corporation
    Inventors: Arun Raghavan, Sandeep R. Agrawal, Sam Idicula, Nipun Agarwal
  • Publication number: 20190392068
    Abstract: Techniques are described herein for building a framework for declarative query compilation using both rule-based and cost-based approaches for database management. The framework involves constructing and using: a set of rule-based properties tables that contain optimization parameters for both logical and physical optimization, a recursive algorithm to form candidate physical query plans that is based on the rule based tables, and a cost model for estimating the cost of a generated physical query plan that is used with the rule based properties tables to prune inferior query plans.
    Type: Application
    Filed: June 25, 2018
    Publication date: December 26, 2019
    Inventors: JIAN WEN, SAM IDICULA, NITIN KUNAL, FARHAN TAUHEED, SEEMA SUNDARA, NIPUN AGARWAL, INDU BHAGAT
  • Publication number: 20190370268
    Abstract: Techniques for non-power-of-two partitioning of a data set as well as generation and selection of partition schemes for the data set. In an embodiment, one or more iterations of a partition scheme is for a non-power-of-two number of partitions. Extended hash partitioning may be used to partition a data set into a non-power-of-two number of partitions by determining the partition identifier of each tuple of the data set using the extended hash partitioning algorithm. In an embodiment, multiple partition schemes are generated for multiple data sets, based on properties of the data sets and/or availability of computing resources for the partition operation or the subsequent operation to the partition operation. The generated partition schemes may use non-power-of-two partitioning for one or more iterations of a generated partition scheme. The most optimal partition scheme may be selected from the generated partition schemes based on optimization policies.
    Type: Application
    Filed: August 15, 2019
    Publication date: December 5, 2019
    Inventors: NEGAR KOOCHAKZADEH, NITIN KUNAL, SAM IDICULA, CAGRI BALKESEN, NIPUN AGARWAL