Patents Assigned to Qubole, Inc.

Systems and methods for determining peak memory requirements in SQL processing engines with concurrent subtasks

Patent number: 11704316

Abstract: The present invention is generally directed to systems and methods of determining and provisioning peak memory requirements in Structured Query Language (SQL) processing engines. More specifically, methods may include determining or obtaining a query execution plan; gathering statistics associated with each database table; breaking the query execution plan into one or more subtasks; calculating an estimated memory usage for each subtask using the statistics; determining or obtaining a dependency graph of the one or more subtasks; based at least in part on the dependency graph, determining which subtasks can execute concurrently on a single worker node; and totaling the amount of estimated memory for each subtask that can execute concurrently on a single worker node and setting this amount of estimated memory as the estimated peak memory requirement for the specific database query.

Type: Grant

Filed: July 24, 2019

Date of Patent: July 18, 2023

Assignee: Qubole, Inc.

Inventors: Ankit Dixit, Shubham Tagra
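
The steps in the abstract above can be illustrated with a minimal sketch. It is not the patented method: it assumes that two subtasks can run concurrently exactly when neither depends on the other, directly or transitively, and it finds the heaviest such group by brute force, which is only practical for small plans. The names `estimate_peak_memory`, `ancestors`, `memory_mb`, and `deps` are hypothetical.

```python
from itertools import combinations


def ancestors(deps):
    """Map each subtask to everything it transitively depends on.

    Assumes the dependency graph is acyclic.
    """
    closure = {}

    def visit(task):
        if task not in closure:
            closure[task] = set()
            for parent in deps.get(task, ()):
                closure[task].add(parent)
                closure[task] |= visit(parent)
        return closure[task]

    for task in deps:
        visit(task)
    return closure


def estimate_peak_memory(memory_mb, deps):
    """Return the largest total memory of any group of mutually independent subtasks.

    Assumption for this sketch: subtasks are concurrent only if no
    dependency path connects them.
    """
    anc = ancestors(deps)
    tasks = list(memory_mb)

    def independent(a, b):
        return a not in anc.get(b, set()) and b not in anc.get(a, set())

    best = 0
    for size in range(1, len(tasks) + 1):
        for group in combinations(tasks, size):
            if all(independent(a, b) for a, b in combinations(group, 2)):
                best = max(best, sum(memory_mb[t] for t in group))
    return best


# Hypothetical per-subtask estimates (MB) and dependencies for one query.
memory_mb = {"scan_a": 400, "scan_b": 300, "join": 500, "aggregate": 100}
deps = {"join": ["scan_a", "scan_b"], "aggregate": ["join"]}
peak = estimate_peak_memory(memory_mb, deps)
```

In this toy plan the two scans are the only subtasks with no dependency path between them, so their combined estimate is the peak. A real engine may keep pipelined operators resident at the same time as their consumers, which this sketch does not model.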

Systems and methods for auto-scaling a big data system

Patent number: 11474874

Abstract: Systems and methods for automatically scaling a big data system. Methods include determining, at a first time, a first number of nodes for a cluster to process a request; assigning an amount of nodes equal to the first number of nodes to the cluster; determining a rate of progress of the request; determining, at a second time based on the rate of progress, a second number of nodes; and modifying the amount of nodes to equal the second number of nodes. Systems include a cluster manager, to add and/or remove any nodes; the big data system, to process requests that utilize the cluster and nodes; and an automatic scaling cluster manager including a big data interface for communicating with the big data system; a cluster manager interface for communicating with the cluster manager; and a cluster state machine.

Type: Grant

Filed: August 14, 2014

Date of Patent: October 18, 2022

Assignee: QUBOLE, INC.

Inventors: Joydeep Sen Sarma, Mayank Ahuja, Sivaramakrishnan Narayanan, Shrikanth Shankar
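
As a rough illustration of choosing a second node count from a rate of progress, the sketch below assumes a target completion deadline and assumes throughput scales linearly with node count. Neither assumption comes from the abstract, and `second_node_count` and all of its parameters are hypothetical names.

```python
import math


def second_node_count(current_nodes, work_done, work_total,
                      elapsed_s, deadline_s, max_nodes):
    """Pick a node count that should finish the remaining work by the deadline.

    Assumptions for this sketch: progress is measured in abstract work
    units and per-node throughput stays constant as nodes are added.
    """
    remaining = work_total - work_done
    time_left = deadline_s - elapsed_s
    if remaining <= 0:
        return 1  # nothing left to do, keep a minimal cluster
    if work_done <= 0 or time_left <= 0:
        return max_nodes  # no measurable progress or out of time
    # Observed per-node rate is work_done / (elapsed_s * current_nodes);
    # nodes needed = remaining / (per-node rate * time_left).
    needed = math.ceil(
        remaining * elapsed_s * current_nodes / (work_done * time_left)
    )
    return max(1, min(max_nodes, needed))


# Four nodes finished 100 of 1000 units in 60 s with a 300 s target.
nodes = second_node_count(4, 100, 1000, 60, 300, max_nodes=20)
```

The same function scales the cluster down when the observed rate is higher than the deadline requires.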

Pure-spot and dynamically rebalanced auto-scaling clusters

Patent number: 11436667

Abstract: The present invention is generally directed to systems and methods of providing automatic scaling pure-spot clusters. Such clusters may be dynamically rebalanced for further cost savings. Some methods in accordance with the present invention may include a method of utilizing a cluster in a big data cloud computing environment where instances may include reserved on-demand instances for a set price and on-demand spot instances that may be bid on by a user, the method including: creating one or more stable nodes, comprising spot instances with a bid price above a price for an equivalent on-demand instance; creating one or more volatile nodes, comprising spot instances with a bid price below a price for an equivalent on-demand instance; using one or more of the stable nodes as a master node; and using the volatile nodes as slave nodes.

Type: Grant

Filed: June 7, 2016

Date of Patent: September 6, 2022

Assignee: Qubole, Inc.

Inventors: Hariharan Iyer, Joydeep Sen Sarma, Mayank Ahuja
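
The stable/volatile split described in the abstract above can be sketched as a simple comparison of bid price against the equivalent on-demand price. The `SpotNode` type, `is_stable`, and `assign_roles` are hypothetical names, and the handling of a bid exactly equal to the on-demand price is an assumption, since the abstract does not address that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotNode:
    name: str
    bid_price: float        # hourly bid for the spot instance
    on_demand_price: float  # hourly price of the equivalent on-demand instance


def is_stable(node):
    """Stable: bid above the on-demand price.

    Assumption: a bid equal to the on-demand price counts as volatile.
    """
    return node.bid_price > node.on_demand_price


def assign_roles(nodes):
    """Use one stable node as the master and the volatile nodes as slaves."""
    stable = [n for n in nodes if is_stable(n)]
    volatile = [n for n in nodes if not is_stable(n)]
    if not stable:
        raise ValueError("at least one stable node is needed for the master")
    return {"master": stable[0].name, "slaves": [n.name for n in volatile]}


roles = assign_roles([
    SpotNode("n1", bid_price=0.50, on_demand_price=0.40),
    SpotNode("n2", bid_price=0.10, on_demand_price=0.40),
    SpotNode("n3", bid_price=0.15, on_demand_price=0.40),
])
```

Here `n1` outbids the on-demand price and becomes the master, while `n2` and `n3` are treated as volatile workers.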

System and methods for auto-tuning big data workloads on cloud platforms

Patent number: 11228489

Abstract: The invention is generally directed to systems and methods of automatically tuning big data workloads across various cloud platforms, the system being in communication with a cloud platform and a user, the cloud platform including data storage and a data engine. The system may include: a system information module in communication with the cloud platform; a static tuner in communication with the system information module; a cloud tuner in communication with the static tuner and the user; and an automation module in communication with the cloud tuner. Methods may include extracting information impacting or associated with the performance of the big data workload from the cloud platform; determining recommendations based at least in part on the information extracted; iterating through different hardware configurations to determine optimal hardware and data engine configuration; and applying the determined configuration to the data engine.

Type: Grant

Filed: August 14, 2018

Date of Patent: January 18, 2022

Assignee: QUBOLE, INC.

Inventors: Amogh Margoor, Rajat Venkatesh
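
The step of iterating through hardware configurations can be pictured as an exhaustive search over candidates scored by a cost function. The sketch below is illustrative only: the instance types, prices, speeds, and the toy cost model are invented for the example, and `pick_configuration` and `toy_cost` are hypothetical names rather than anything the patent describes.

```python
from itertools import product


def pick_configuration(candidates, estimate_cost):
    """Return the candidate with the lowest estimated cost."""
    best, best_cost = None, float("inf")
    for config in candidates:
        cost = estimate_cost(config)
        if cost < best_cost:
            best, best_cost = config, cost
    return best


# Invented instance catalogue: relative speed and hourly price per node.
INSTANCE_TYPES = {
    "small": {"speed": 1.0, "price": 0.10},
    "large": {"speed": 2.5, "price": 0.20},
}
WORK_UNITS = 40.0         # assumed size of the workload
SERIAL_OVERHEAD_H = 0.5   # assumed part of the job that does not parallelize


def toy_cost(config):
    """Toy model: dollars = runtime in hours * nodes * hourly price."""
    instance, nodes = config
    spec = INSTANCE_TYPES[instance]
    runtime_h = SERIAL_OVERHEAD_H + WORK_UNITS / (nodes * spec["speed"])
    return runtime_h * nodes * spec["price"]


candidates = list(product(INSTANCE_TYPES, [2, 4, 8]))
best_config = pick_configuration(candidates, toy_cost)
```

In practice the cost function would come from measured or modeled workload behavior rather than a closed-form expression like this one.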

System and method for scheduling and running interactive database queries with service level agreements in a multi-tenant processing system

Patent number: 11144360

Abstract: The invention is directed to systems and methods for scheduling interactive database queries from multiple tenants onto distributed query processing clusters with service level agreements (SLAs). SLAs may be provided through a combination of estimation of resources per query followed by scheduling of that query onto a cluster if enough resources are available or triggering proactive autoscaling to spawn new clusters if they are not. In some embodiments, systems may include a workflow manager; a resource estimator cluster; one or more execution clusters; and one or more metastores. A workflow manager may include an active node and a passive node configured to send a query to the resource estimator cluster and receive a resource estimate. A resource estimator cluster may be in communication with the workflow manager. One or more execution clusters may be scaled by the workflow manager as part of a schedule or autoscale based on workload.

Type: Grant

Filed: July 25, 2019

Date of Patent: October 12, 2021

Assignee: QUBOLE, INC.

Inventors: Vijay Mann, Ankit Dixit, Shubham Tagra, Raunaq Morarka, Rajat Venkatesh, Ting Yao
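
The schedule-or-autoscale decision in the abstract above can be sketched as follows. The first-fit placement policy, the single scalar resource figure, and the names `schedule_query` and `free_by_cluster` are assumptions made for illustration; the abstract does not specify how a cluster is chosen.

```python
def schedule_query(estimate, free_by_cluster):
    """Place a query on the first cluster with enough free resources.

    Returns ("run", cluster_name) and reserves the resources, or
    ("autoscale", None) when no existing cluster can fit the query.
    Assumption: resources are one scalar (for example, GB of memory).
    """
    for cluster, free in free_by_cluster.items():
        if free >= estimate:
            free_by_cluster[cluster] = free - estimate
            return ("run", cluster)
    return ("autoscale", None)


free_by_cluster = {"cluster-a": 8, "cluster-b": 32}
first = schedule_query(16, free_by_cluster)   # fits on cluster-b
second = schedule_query(24, free_by_cluster)  # nothing has 24 free now
```

The second call finds no cluster with enough headroom, which is the point at which a system like the one described would spawn a new cluster.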

Caching framework for big-data engines in the cloud

Patent number: 11080207

Abstract: The present invention is generally directed to a caching framework that provides a common abstraction across one or more big data engines, comprising a cache filesystem including a cache filesystem interface used by applications to access cloud storage through a cache subsystem, the cache filesystem interface in communication with a big data engine extension and a cache manager; the big data engine extension, providing cluster information to the cache filesystem and working with the cache filesystem interface to determine which nodes cache which part of a file; and a cache manager for maintaining metadata about the cache, the metadata comprising the status of blocks for each file. The invention may provide a common abstraction across big data engines that does not require changes to the setup of infrastructure or user workloads, allows sharing of cached data and caching of only the parts of files that are required, and can process columnar formats.

Type: Grant

Filed: June 7, 2017

Date of Patent: August 3, 2021

Assignee: Qubole, Inc.

Inventors: Joydeep Sen Sarma, Rajat Venkatesh, Shubham Tagra

High performance hadoop with new generation instances

Patent number: 10606478

Abstract: The present invention is generally directed to a distributed computing system comprising a plurality of computational clusters, each computational cluster comprising a plurality of compute optimized instances, each instance comprising local instance data storage and in communication with reserved disk storage, wherein processing hierarchy provides priority to local instance data storage before providing priority to reserved disk storage.

Type: Grant

Filed: October 22, 2015

Date of Patent: March 31, 2020

Assignee: Qubole, Inc.

Inventors: Mayank Ahuja, Joydeep Sen Sarma, Shrikanth Shankar
