Patents by Inventor Alexander Behm

Alexander Behm has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Dictionary filtering and evaluation in columnar databases

Patent number: 12242485

Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator and a request to return information about a value of interest in a columnar dataset stored on cloud storage. At least one column in the columnar dataset is based on a dictionary. The dictionary maps one or more values for a column to one or more respective identifiers. The method determines whether to perform dictionary filtering for the query by calculating a metric based on one or more factors. Responsive to the metric being below a threshold, which may be predetermined, the method performs the dictionary filtering.

Type: Grant

Filed: January 31, 2023

Date of Patent: March 4, 2025

Assignee: Databricks, Inc.

Inventors: Utkarsh Agarwal, Shoumik Palkar, Alexander Behm, Sriram Krishnamurthy
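As a rough illustration of the decision described in the abstract above, the following Python sketch computes a metric, compares it to a threshold, and filters on the dictionary only when the metric is below that threshold. The metric (dictionary entries per row), the default threshold of 0.5, and all function names are assumptions made for this example; the abstract does not specify them. The dictionary is stored here as identifier-to-value for lookup convenience.

```python
def should_filter_dictionary(dictionary, num_rows, threshold=0.5):
    """Hypothetical metric: dictionary entries per row (lower favors filtering)."""
    metric = len(dictionary) / max(num_rows, 1)
    return metric < threshold


def filter_column(dictionary, encoded_column, predicate, threshold=0.5):
    """Return the row indices whose decoded value satisfies the predicate."""
    if should_filter_dictionary(dictionary, len(encoded_column), threshold):
        # Evaluate the predicate once per dictionary entry, then match identifiers.
        matching_ids = {i for i, value in dictionary.items() if predicate(value)}
        return [row for row, i in enumerate(encoded_column) if i in matching_ids]
    # Otherwise decode and test every row individually.
    return [row for row, i in enumerate(encoded_column) if predicate(dictionary[i])]


dictionary = {0: "apple", 1: "banana", 2: "cherry"}  # identifier -> value
encoded_column = [0, 1, 1, 2, 0, 1, 2, 2]
rows = filter_column(dictionary, encoded_column, lambda v: v == "banana")
```

Both branches return the same rows; only the number of predicate evaluations differs (once per dictionary entry versus once per row).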

Evaluating expressions over dictionary data

Patent number: 12210528

Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator for a columnar dataset on cloud storage. At least one column in the dataset is based on a dictionary, and the dictionary maps one or more values for a column to one or more respective identifiers. The method evaluates the operator on one or more values of the dictionary to generate an updated dictionary comprising updated values. The method may decode the updated dictionary into an updated column comprising updated data values.

Type: Grant

Filed: January 31, 2023

Date of Patent: January 28, 2025

Assignee: Databricks, Inc.

Inventors: Utkarsh Agarwal, Shoumik Palkar, Alexander Behm, Sriram Krishnamurthy
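The idea in the abstract above, evaluating an operator on the dictionary values rather than on every row and then decoding, can be sketched in Python as follows. The function names and the choice of `str.upper` as the operator are illustrative assumptions, not details from the patent.

```python
def evaluate_on_dictionary(dictionary, operator):
    """Apply the operator once per distinct value, producing an updated dictionary."""
    return {identifier: operator(value) for identifier, value in dictionary.items()}


def decode(dictionary, encoded_column):
    """Expand identifiers back into a column of data values."""
    return [dictionary[identifier] for identifier in encoded_column]


dictionary = {0: "apple", 1: "banana"}  # identifier -> value
encoded_column = [0, 1, 1, 0, 1]

updated_dictionary = evaluate_on_dictionary(dictionary, str.upper)
updated_column = decode(updated_dictionary, encoded_column)
```

Here the operator runs twice (once per dictionary entry) instead of five times (once per row).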

Scan parsing

Patent number: 12189628

Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using the first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.

Type: Grant

Filed: January 31, 2023

Date of Patent: January 7, 2025

Assignee: Databricks, Inc.

Inventors: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy
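A minimal Python sketch of the fallback flow in the abstract above: start with a first engine, and on a particular error stop using it and continue with a second engine. The two engines, the `UnsupportedInputError` type, and the `prefer_first` flag (standing in for the predefined heuristics) are all hypothetical.

```python
import json


class UnsupportedInputError(Exception):
    """Hypothetical 'particular error' that triggers the switch of engines."""


def first_engine(line):
    # Hypothetical first engine: only handles flat JSON objects.
    record = json.loads(line)
    if any(isinstance(v, (dict, list)) for v in record.values()):
        raise UnsupportedInputError(line)
    return record


def second_engine(line):
    # Hypothetical second engine: handles any JSON object.
    return json.loads(line)


def parse_file(lines, prefer_first=True):
    """Parse all lines, recording which engine produced each record."""
    results = []
    use_first = prefer_first  # stand-in for the predefined heuristics
    for line in lines:
        if use_first:
            try:
                results.append(("first", first_engine(line)))
                continue
            except UnsupportedInputError:
                use_first = False  # stop using the first engine for this file
        results.append(("second", second_engine(line)))
    return results


parsed = parse_file(['{"a": 1}', '{"b": {"c": 2}}', '{"d": 3}'])
```

Records parsed before the error are kept, so the second engine continues from the failing record instead of restarting the file.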

Hash based rollup with passthrough

Patent number: 12153558

Abstract: A system includes a plurality of computing units. A first computing unit of the plurality of computing units comprises: a communication interface configured to receive an indication to roll up data in a data table; and a processor coupled to the communication interface and configured to: build a preaggregation hash table based at least in part on a set of columns and the data table by aggregating input rows of the data table; for each preaggregated hash table entry of the preaggregated hash table: provide the preaggregated hash table entry to a second computing unit of the plurality of computing units based at least in part on a distribution hash value; receive a set of received entries from computing units of the plurality of computing units; and build an aggregation hash table based at least in part on the set of received entries by aggregating the set of received entries.

Type: Grant

Filed: January 31, 2023

Date of Patent: November 26, 2024

Assignee: Databricks, Inc.

Inventors: Alexander Behm, Ankur Dave
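The two-phase aggregation in the abstract above can be sketched in a single Python process, with each list in `partitions` standing in for one computing unit's rows. The use of SUM as the aggregate, CRC32 as the distribution hash, and all names are assumptions for illustration; the passthrough behaviour named in the title is not modelled.

```python
import zlib
from collections import defaultdict


def preaggregate(rows, key_cols, value_col):
    """Build a local preaggregation table by summing input rows per key."""
    table = defaultdict(int)
    for row in rows:
        table[tuple(row[c] for c in key_cols)] += row[value_col]
    return table


def distribution_hash(key, num_units):
    """Deterministically pick the unit responsible for a key."""
    return zlib.crc32(repr(key).encode()) % num_units


def rollup(partitions, key_cols, value_col):
    num_units = len(partitions)
    inboxes = [[] for _ in range(num_units)]
    # Phase 1: each unit preaggregates and sends entries to the owning unit.
    for rows in partitions:
        for key, partial in preaggregate(rows, key_cols, value_col).items():
            inboxes[distribution_hash(key, num_units)].append((key, partial))
    # Phase 2: each unit aggregates the entries it received.
    results = []
    for received in inboxes:
        table = defaultdict(int)
        for key, partial in received:
            table[key] += partial
        results.append(dict(table))
    return results


partitions = [
    [{"region": "eu", "sales": 5}, {"region": "us", "sales": 2}, {"region": "eu", "sales": 1}],
    [{"region": "us", "sales": 7}, {"region": "eu", "sales": 4}],
]
per_unit = rollup(partitions, ["region"], "sales")
totals = {key: value for table in per_unit for key, value in table.items()}
```

Because every entry for a given key is routed to the same unit, each key ends up in exactly one final table.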

Adaptive approach to lazy materialization in database scans using pushed filters

Patent number: 12124450

Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. A data processing service receives a request to perform a query identifying a filter column and a non-filter column in a columnar database. The data processing service accesses a first task of contiguous rows in the filter column from a cloud-based object storage. The data processing service applies a filter defined by the query to the first task. The data processing service generates filter results for the first task that may include a percentage of the first task discarded and a run-time. The data processing service determines, based on the filter results for the first task, a likelihood value that indicates a likelihood of gaining a performance benefit by applying the lazy materialization technique to a second task of the query.

Type: Grant

Filed: January 27, 2023

Date of Patent: October 22, 2024

Assignee: Databricks, Inc.

Inventors: Shoumik Palkar, Alexander Behm, Mostafa Mokhtar, Sriram Krishnamurthy
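The adaptive decision in the abstract above can be sketched in Python: filter a first task, record the fraction of rows discarded and the run-time, and derive a likelihood value for the next task. The likelihood heuristic (using the discarded fraction directly) and the 0.5 cut-off are assumptions for this example, not the patented formula.

```python
import time


def run_filter_task(filter_values, predicate):
    """Apply the filter to one task of contiguous rows and collect statistics."""
    start = time.perf_counter()
    kept = [i for i, value in enumerate(filter_values) if predicate(value)]
    runtime = time.perf_counter() - start
    discarded = 1.0 - len(kept) / max(len(filter_values), 1)
    return kept, {"discarded_fraction": discarded, "runtime_s": runtime}


def lazy_materialization_likelihood(filter_results):
    # Hypothetical heuristic: the more rows the filter discards, the more
    # likely it pays off to defer reading the non-filter columns.
    return filter_results["discarded_fraction"]


first_task = [3, 8, 15, 1, 42, 7, 9, 2, 5, 6]  # values of the filter column
kept, stats = run_filter_task(first_task, lambda v: v > 10)
use_lazy_for_second_task = lazy_materialization_likelihood(stats) >= 0.5
```

A fuller version would presumably also weigh the measured run-time; it is recorded here but not used by the heuristic.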

Distinct value estimation for query planning

Patent number: 12105712

Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.

Type: Grant

Filed: April 24, 2023

Date of Patent: October 1, 2024

Assignee: Cloudera, Inc.

Inventors: Alexander Behm, Mostafa Mokhtar
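The four steps in the abstract above can be sketched in Python under simplifying assumptions: exact distinct counts on sample prefixes stand in for the probabilistic estimates, and the fitted function is a power law obtained by least squares in log-log space. Both choices, and all names, are illustrative rather than the patented method, and a power law can overestimate when the true number of distinct values is small.

```python
import math
import random


def sample_estimates(values, sample_sizes, seed=0):
    """Step 1: gather (sample size, distinct estimate) points from varying samples."""
    shuffled = values[:]
    random.Random(seed).shuffle(shuffled)
    # Exact distinct count on a prefix stands in for a probabilistic estimate.
    return [(n, len(set(shuffled[:n]))) for n in sample_sizes]


def fit_power_law(points):
    """Steps 2-3: fit d = a * n**b by least squares on (log n, log d)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(d) for _, d in points]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b


def estimate_distinct(sampled_values, total_count, sample_sizes):
    """Step 4: extrapolate the fitted function to the total number of values."""
    a, b = fit_power_law(sample_estimates(sampled_values, sample_sizes))
    return min(total_count, a * total_count ** b)


values = [i % 500 for i in range(10_000)]
estimate = estimate_distinct(
    values[:2_000], total_count=len(values), sample_sizes=[100, 200, 400, 800]
)
```

Only the first 2,000 values are ever inspected; the estimate for all 10,000 comes from extrapolation and is capped at the total row count.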

Scan parsing

Patent number: 12072880

Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using the first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.

Type: Grant

Filed: August 22, 2022

Date of Patent: August 27, 2024

Assignee: Databricks, Inc.

Inventors: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy

Evaluating expressions over dictionary data

Publication number: 20240256549

Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator for a columnar dataset on cloud storage. At least one column in the dataset is based on a dictionary, and the dictionary maps one or more values for a column to one or more respective identifiers. The method evaluates the operator on one or more values of the dictionary to generate an updated dictionary comprising updated values. The method may decode the updated dictionary into an updated column comprising updated data values.

Type: Application

Filed: January 31, 2023

Publication date: August 1, 2024

Inventors: Utkarsh Agarwal, Shoumik Palkar, Alexander Behm, Sriram Krishnamurthy

Dictionary filtering and evaluation in columnar databases

Publication number: 20240256550

Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator and a request to return information about a value of interest in a columnar dataset stored on cloud storage. At least one column in the columnar dataset is based on a dictionary. The dictionary maps one or more values for a column to one or more respective identifiers. The method determines whether to perform dictionary filtering for the query by calculating a metric based on one or more factors. Responsive to the metric being below a threshold, which may be predetermined, the method performs the dictionary filtering.

Type: Application

Filed: January 31, 2023

Publication date: August 1, 2024

Inventors: Utkarsh Agarwal, Shoumik Palkar, Alexander Behm, Sriram Krishnamurthy

Static approach to lazy materialization in database scans using pushed filters

Publication number: 20240256539

Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. The method includes receiving a request to perform a new query in a columnar database containing a plurality of columns. A step in the method includes accessing a set of data in a column of the plurality of columns based on the query. The method includes generating an input to a machine-learned model comprising characteristics of the set of data in the column. From the machine-learned model, the method includes generating a likelihood value indicative of whether a filter of a first portion of the set of data in the column has greater efficiency than a download followed by a filter of the set of data in the column. The method further includes comparing the likelihood value to a threshold value. Based on the comparison, the method includes filtering the first portion of the set of data before downloading the set of data if the likelihood value is equal to or above the threshold value.

Type: Application

Filed: January 27, 2023

Publication date: August 1, 2024

Inventors: Shoumik Palkar, Alexander Behm, Mostafa Mokhtar, Sriram Krishnamurthy

Adaptive approach to lazy materialization in database scans using pushed filters

Publication number: 20240256543

Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. A data processing service receives a request to perform a query identifying a filter column and a non-filter column in a columnar database. The data processing service accesses a first task of contiguous rows in the filter column from a cloud-based object storage. The data processing service applies a filter defined by the query to the first task. The data processing service generates filter results for the first task that may include a percentage of the first task discarded and a run-time. The data processing service determines, based on the filter results for the first task, a likelihood value that indicates a likelihood of gaining a performance benefit by applying the lazy materialization technique to a second task of the query.

Type: Application

Filed: January 27, 2023

Publication date: August 1, 2024

Inventors: Shoumik Palkar, Alexander Behm, Mostafa Mokhtar, Sriram Krishnamurthy

Scan parsing

Publication number: 20240061840

Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using the first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.

Type: Application

Filed: January 31, 2023

Publication date: February 22, 2024

Inventors: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy

Scan parsing

Publication number: 20240061839

Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using the first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.

Type: Application

Filed: August 22, 2022

Publication date: February 22, 2024

Inventors: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy

Integrated native vectorized engine for computation

Patent number: 11874832

Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.

Type: Grant

Filed: January 23, 2023

Date of Patent: January 16, 2024

Assignee: Databricks, Inc.

Inventors: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hövell tot Westerflier
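The planning step in the abstract above, classifying each node by whether the second engine can execute it, can be sketched in Python as follows. The set of supported operators, the node names, and the plan representation are hypothetical; the abstract does not describe them.

```python
# Hypothetical set of operators that the second engine is able to execute.
SUPPORTED_BY_SECOND_ENGINE = {"scan", "filter", "project"}


def engine_node_type(node):
    """Classify a node by whether the second engine can execute it."""
    return "second" if node in SUPPORTED_BY_SECOND_ENGINE else "first"


def generate_plan(nodes):
    """Pair every node of the query with the engine type chosen for it."""
    return [(node, engine_node_type(node)) for node in nodes]


plan = generate_plan(["scan", "filter", "python_udf", "aggregate"])
```

In this sketch, any node outside the supported set falls back to the first engine, so a query can mix both node types in one plan.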

Distinct value estimation for query planning

Publication number: 20230350894

Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.

Type: Application

Filed: April 24, 2023

Publication date: November 2, 2023

Inventors: Alexander Behm, Mostafa Mokhtar

Hash based rollup with passthrough

Patent number: 11675767

Abstract: A system includes a plurality of computing units. A first computing unit of the plurality of computing units comprises: a communication interface configured to receive an indication to roll up data in a data table; and a processor coupled to the communication interface and configured to: build a preaggregation hash table based at least in part on a set of columns and the data table by aggregating input rows of the data table; for each preaggregated hash table entry of the preaggregated hash table: provide the preaggregated hash table entry to a second computing unit of the plurality of computing units based at least in part on a distribution hash value; receive a set of received entries from computing units of the plurality of computing units; and build an aggregation hash table based at least in part on the set of received entries by aggregating the set of received entries.

Type: Grant

Filed: November 16, 2020

Date of Patent: June 13, 2023

Assignee: Databricks, Inc.

Inventors: Alexander Behm, Ankur Dave

Distinct value estimation for query planning

Patent number: 11663213

Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.

Type: Grant

Filed: November 25, 2020

Date of Patent: May 30, 2023

Assignee: Cloudera, Inc.

Inventors: Alexander Behm, Mostafa Mokhtar

Integrated native vectorized engine for computation

Patent number: 11586624

Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.

Type: Grant

Filed: April 22, 2021

Date of Patent: February 21, 2023

Assignee: Databricks, Inc.

Inventors: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hövell tot Westerflier

LIFO based spilling for grouping aggregation

Patent number: 11481398

Abstract: A system for spilling comprises an interface and a processor. The interface is configured to receive an indication to perform a GROUP BY operation, wherein the indication comprises an input table and a grouping column. The processor is configured to: for each input table entry of the input table, determine a key, wherein the key is based at least in part on the input table entry and the grouping column; add the key to a grouping hash table, wherein adding the key to the grouping hash table comprises last-in, first-out (LIFO) spilling when necessary; create an output table based at least in part on the grouping hash table; and provide the output table.

Type: Grant

Filed: December 9, 2020

Date of Patent: October 25, 2022

Assignee: Databricks, Inc.

Inventors: Alexander Behm, Ankur Dave, Ryan Deng, Shoumik Palkar
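One possible reading of the LIFO spilling in the abstract above is sketched below in Python: when the grouping hash table is full and a new key arrives, the most recently inserted entry is spilled first. The COUNT aggregate, the in-memory `spilled` list standing in for disk, the capacity parameter, and the final merge are all assumptions made for this example.

```python
def group_by_count(rows, grouping_column, max_groups_in_memory=2):
    """GROUP BY with COUNT(*), spilling the newest group when memory is full."""
    table = {}    # grouping hash table; Python dicts keep insertion order
    spilled = []  # stand-in for entries written to disk
    for row in rows:
        key = row[grouping_column]
        if key not in table and len(table) >= max_groups_in_memory:
            # dict.popitem() removes the most recently inserted entry (LIFO).
            spilled.append(table.popitem())
        table[key] = table.get(key, 0) + 1
    # Simplified final step: merge spilled partial counts into the output table.
    for key, count in spilled:
        table[key] = table.get(key, 0) + count
    return table


rows = [{"k": key} for key in ["a", "b", "c", "a", "b", "c"]]
output_table = group_by_count(rows, "k")
```

The same key may be spilled more than once, so the merge step has to add partial counts together. For simplicity, the merge ignores the memory limit.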

Integrated native vectorized engine for computation

Publication number: 20220100761

Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.

Type: Application

Filed: April 22, 2021

Publication date: March 31, 2022

Inventors: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hövell tot Westerflier
