Patents by Inventor Alexander Behm

Alexander Behm has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240061839
    Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed; determining, based at least in part on one or more predefined heuristics, to begin processing the first file using a first processing engine; indicating to process the first file using the first processing engine; determining whether a particular error in processing the first file using the first processing engine has been detected; in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine; and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.
    Type: Application
    Filed: August 22, 2022
    Publication date: February 22, 2024
    Inventors: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy
  • Publication number: 20240061840
    Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed; determining, based at least in part on one or more predefined heuristics, to begin processing the first file using a first processing engine; indicating to process the first file using the first processing engine; determining whether a particular error in processing the first file using the first processing engine has been detected; in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine; and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.
    Type: Application
    Filed: January 31, 2023
    Publication date: February 22, 2024
    Inventors: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy
  • Patent number: 11874832
    Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
    Type: Grant
    Filed: January 23, 2023
    Date of Patent: January 16, 2024
    Assignee: Databricks, Inc.
    Inventors: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hövell tot Westerflier
  • Publication number: 20230350894
    Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology, where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the fitted function to an estimated or known total number of values in the dataset.
    Type: Application
    Filed: April 24, 2023
    Publication date: November 2, 2023
    Inventors: Alexander Behm, Mostafa Mokhtar
  • Patent number: 11675767
    Abstract: A system includes a plurality of computing units. A first computing unit of the plurality of computing units comprises: a communication interface configured to receive an indication to roll up data in a data table; and a processor coupled to the communication interface and configured to: build a preaggregation hash table based at least in part on a set of columns and the data table by aggregating input rows of the data table; for each preaggregation hash table entry of the preaggregation hash table, provide the preaggregation hash table entry to a second computing unit of the plurality of computing units based at least in part on a distribution hash value; receive a set of received entries from computing units of the plurality of computing units; and build an aggregation hash table based at least in part on the set of received entries by aggregating the set of received entries.
    Type: Grant
    Filed: November 16, 2020
    Date of Patent: June 13, 2023
    Assignee: Databricks, Inc.
    Inventors: Alexander Behm, Ankur Dave
  • Patent number: 11663213
    Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology, where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the fitted function to an estimated or known total number of values in the dataset.
    Type: Grant
    Filed: November 25, 2020
    Date of Patent: May 30, 2023
    Assignee: Cloudera, Inc.
    Inventors: Alexander Behm, Mostafa Mokhtar
  • Patent number: 11586624
    Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
    Type: Grant
    Filed: April 22, 2021
    Date of Patent: February 21, 2023
    Assignee: Databricks, Inc.
    Inventors: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hövell tot Westerflier
  • Patent number: 11481398
    Abstract: A system for spilling comprises an interface and a processor. The interface is configured to receive an indication to perform a GROUP BY operation, wherein the indication comprises an input table and a grouping column. The processor is configured to: for each input table entry of the input table, determine a key, wherein the key is based at least in part on the input table entry and the grouping column; add the key to a grouping hash table, wherein adding the key to the grouping hash table comprises last-in, first-out (LIFO) spilling when necessary; create an output table based at least in part on the grouping hash table; and provide the output table.
    Type: Grant
    Filed: December 9, 2020
    Date of Patent: October 25, 2022
    Assignee: Databricks, Inc.
    Inventors: Alexander Behm, Ankur Dave, Ryan Deng, Shoumik Palkar
  • Publication number: 20220100761
    Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
    Type: Application
    Filed: April 22, 2021
    Publication date: March 31, 2022
    Inventors: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hövell tot Westerflier
  • Publication number: 20210149904
    Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology, where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the fitted function to an estimated or known total number of values in the dataset.
    Type: Application
    Filed: November 25, 2020
    Publication date: May 20, 2021
    Inventors: Alexander Behm, Mostafa Mokhtar
  • Patent number: 10853368
    Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology, where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the fitted function to an estimated or known total number of values in the dataset.
    Type: Grant
    Filed: April 2, 2018
    Date of Patent: December 1, 2020
    Assignee: Cloudera, Inc.
    Inventors: Alexander Behm, Mostafa Mokhtar
  • Publication number: 20190303479
    Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology, where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the fitted function to an estimated or known total number of values in the dataset.
    Type: Application
    Filed: April 2, 2018
    Publication date: October 3, 2019
    Inventors: Alexander Behm, Mostafa Mokhtar
  • Patent number: 7996369
    Abstract: A computer process, called VGRAM, improves the performance of approximate string search algorithms in computers by using a carefully chosen dictionary of variable-length grams based on their frequencies in the string collection. A dynamic programming algorithm is disclosed for computing a tight lower bound on the number of common grams shared by two similar strings, in order to improve query performance. A method is also disclosed for automatically computing a dictionary of high-quality grams for a workload of queries; these techniques improve query performance through a cost-based quantitative approach to choosing good grams for approximate string queries. Two approaches for answering approximate queries efficiently are described: one based on discarding gram lists and another based on combining correlated lists. Using these algorithms, an indexing structure can be reduced to a given amount of space while retaining efficient query processing.
    Type: Grant
    Filed: December 14, 2008
    Date of Patent: August 9, 2011
    Assignee: The Regents of the University of California
    Inventors: Chen Li, Bin Wang, Xaochun Yang, Alexander Behm, Shengyue Ji, Jiaheng Lu
  • Publication number: 20100125594
    Abstract: A computer process, called VGRAM, improves the performance of approximate string search algorithms in computers by using a carefully chosen dictionary of variable-length grams based on their frequencies in the string collection. A dynamic programming algorithm is disclosed for computing a tight lower bound on the number of common grams shared by two similar strings, in order to improve query performance. A method is also disclosed for automatically computing a dictionary of high-quality grams for a workload of queries; these techniques improve query performance through a cost-based quantitative approach to choosing good grams for approximate string queries. Two approaches for answering approximate queries efficiently are described: one based on discarding gram lists and another based on combining correlated lists. Using these algorithms, an indexing structure can be reduced to a given amount of space while retaining efficient query processing.
    Type: Application
    Filed: December 14, 2008
    Publication date: May 20, 2010
    Applicant: The Regents of the University of California
    Inventors: Chen Li, Bin Wang, Xaochun Yang, Alexander Behm, Shengyue Ji, Jiaheng Lu
  • Publication number: 20080046455
    Abstract: A method is disclosed for automatically configuring database statistics by: collecting information from a database system, the database information including data query feedback; consolidating and formatting the database information into a plurality of intervals; converting the plurality of intervals into a plurality of non-overlapping buckets; computing frequencies for the buckets by solving a constrained maximum entropy problem to create a proxy data distribution function; and using the proxy data distribution function to determine a set of statistics to maintain for the database information.
    Type: Application
    Filed: August 16, 2006
    Publication date: February 21, 2008
    Applicant: International Business Machines Corporation
    Inventors: Alexander Behm, Peter Jay Haas, Volker Gerhard Markl
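
The engine fallback described in publications 20240061839 and 20240061840 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation. The two engines (a naive comma splitter and Python's standard csv module) are hypothetical, and so is the quote-sniffing heuristic. The names fast_engine, robust_engine, pick_first_engine, SwitchEngineError and parse_file are also hypothetical. When the first engine stops, the second resumes at the failing line, and rows already parsed are kept.

```python
import csv


class SwitchEngineError(Exception):
    """Hypothetical 'particular error': the fast engine met input it cannot handle."""

    def __init__(self, position, partial_rows):
        super().__init__(f"unsupported input at line {position}")
        self.position = position          # index of the line that failed
        self.partial_rows = partial_rows  # rows parsed before the failure


def fast_engine(lines):
    """Hypothetical first engine: splits on commas and gives up on quoted fields."""
    rows = []
    for i, line in enumerate(lines):
        if '"' in line:
            raise SwitchEngineError(i, rows)
        rows.append(line.split(","))
    return rows


def robust_engine(lines, start=0):
    """Hypothetical second engine: the csv module, which understands quoting."""
    return list(csv.reader(lines[start:]))


def pick_first_engine(lines, sample_size=5):
    """Assumed heuristic: use the fast engine unless the first lines contain quotes."""
    sample = lines[:sample_size]
    return robust_engine if any('"' in line for line in sample) else fast_engine


def parse_file(lines):
    """Return the parsed rows and the names of the engines that were used."""
    engine = pick_first_engine(lines)
    try:
        return engine(lines), [engine.__name__]
    except SwitchEngineError as err:
        # Stop the first engine and continue with the second from the failing line.
        rest = robust_engine(lines, err.position)
        return err.partial_rows + rest, ["fast_engine", "robust_engine"]
```

For example, parse_file(["a,b", "c,d", "e,f", "g,h", "i,j", 'k,"l,m"']) starts with the fast engine, because the sampled lines contain no quotes, and hands over to the csv module at the sixth line. The abstract's final step, storing the results in memory, is reduced here to returning the rows.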
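
The node classification in patent 11874832 and its related filings (11586624, 20220100761) can be illustrated with a small walk over a plan tree. The sketch is an assumption, not Databricks' code. A node goes to the second engine if its operator appears in a hypothetical SECOND_ENGINE_OPS set, and to the first engine otherwise. The Transition step inserted at engine boundaries is also an illustrative addition that the abstract does not specify.

```python
from dataclasses import dataclass, field

# Hypothetical set of operators that the second engine is able to execute.
SECOND_ENGINE_OPS = {"Scan", "Filter", "Project", "HashAggregate"}


@dataclass
class Node:
    op: str
    children: list = field(default_factory=list)


def engine_for(node):
    """Classify a node by whether the second engine can execute it."""
    return "second" if node.op in SECOND_ENGINE_OPS else "first"


def generate_plan(node, parent_engine=None):
    """Return a nested (op, engine, children) plan.

    Where a node runs in a different engine than its parent, the node is
    wrapped in a hypothetical Transition step labelled "child->parent".
    """
    engine = engine_for(node)
    children = [generate_plan(child, engine) for child in node.children]
    planned = (node.op, engine, children)
    if parent_engine is not None and parent_engine != engine:
        return ("Transition", f"{engine}->{parent_engine}", [planned])
    return planned
```

For a query tree Sort over HashAggregate over Filter over Scan, the Sort node stays in the first engine. The three lower nodes are assigned to the second engine, and a single transition sits between them and the Sort.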
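
The four steps in the distinct-value abstract (patent 10853368 and its continuations) can be sketched as shown below. The abstract names neither the probabilistic estimator nor the family of fitted functions, so both are assumptions here. Each intermediate estimate comes from a K-minimum-values (KMV) estimator. The fitted curve is a power law d(n) = a * n**b, obtained by least squares in log-log space. The names kmv_estimate, fit_power_law and estimate_ndv are hypothetical.

```python
import hashlib
import heapq
import math
import random


def kmv_estimate(values, k=512):
    """K-minimum-values distinct-count estimate (assumed stand-in estimator).

    For brevity this keeps every hash; a real sketch keeps only the k smallest.
    """
    hashes = set()
    for v in values:
        digest = hashlib.blake2b(str(v).encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big") / 2**64)  # map to [0, 1)
    if len(hashes) < k:
        return float(len(hashes))  # fewer than k distinct hashes: the count is exact
    kth_smallest = heapq.nsmallest(k, hashes)[-1]
    return (k - 1) / kth_smallest


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(a) + b * log(x); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((x - mx) ** 2 for x in lx)
    b = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sxx
    return math.exp(my - b * mx), b


def estimate_ndv(data, sample_sizes=(2000, 4000, 8000, 16000), k=512, seed=0):
    rng = random.Random(seed)
    # Step 1: intermediate probabilistic estimates from samples of varying size.
    estimates = [kmv_estimate(rng.sample(data, n), k) for n in sample_sizes]
    # Steps 2-3: treat (sample size, estimate) as data points and fit a curve.
    a, b = fit_power_law(sample_sizes, estimates)
    # Step 4: extrapolate to the total number of values, capped at that total.
    total = len(data)
    return min(a * total ** b, float(total))
```

On a 100,000-row column with only 50 distinct values, every sample contains all of them. Each intermediate estimate is therefore exact, the fitted exponent is essentially zero, and the extrapolation returns 50. A power law can overshoot once the distinct count has already saturated in the samples, which is one reason the choice of fitted function matters.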
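
The two-phase aggregation in patent 11675767 can be simulated in a single process, with each computing unit represented by a list of input rows. This sketch assumes a SUM aggregate and uses CRC32 of the key as the distribution hash. Both choices, and the function names, are illustrative rather than taken from the patent.

```python
import zlib
from collections import defaultdict


def preaggregate(rows, group_cols, value_col):
    """Phase 1 on one computing unit: aggregate local rows by the grouping columns."""
    table = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        table[key] += row[value_col]
    return table


def distribution_hash(key, num_units):
    """Pick the destination unit (assumed: CRC32 of the key's repr)."""
    return zlib.crc32(repr(key).encode()) % num_units


def rollup(partitions, group_cols, value_col):
    """Return one final aggregation table per computing unit."""
    num_units = len(partitions)
    inboxes = [[] for _ in range(num_units)]
    # Each unit preaggregates, then ships each entry to the unit chosen by the hash.
    for rows in partitions:
        for key, partial in preaggregate(rows, group_cols, value_col).items():
            inboxes[distribution_hash(key, num_units)].append((key, partial))
    # Phase 2 on each unit: aggregate the partial entries it received.
    results = []
    for inbox in inboxes:
        final = defaultdict(int)
        for key, partial in inbox:
            final[key] += partial
        results.append(dict(final))
    return results
```

Preaggregating first means each unit sends at most one entry per group rather than one per input row. Because the distribution hash depends only on the key, every group ends up complete on exactly one unit.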
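
The abstract of patent 11481398 does not say what "last-in, first-out (LIFO) spilling" selects, so the sketch below makes an explicit assumption. The grouping hash table is split into hash partitions. When the number of in-memory groups exceeds a budget, the most recently created in-memory partition is spilled first, and spilled partitions are later re-aggregated in last-in, first-out order. The COUNT(*) aggregate, the partitioning scheme, the lists standing in for disk files, and all names are hypothetical.

```python
import zlib
from collections import defaultdict


def group_by_count(rows, group_col, max_groups=4, num_partitions=4):
    """COUNT(*) ... GROUP BY group_col with a bounded in-memory hash table."""
    memory = {}          # partition id -> {key: count} for partitions held in memory
    creation_order = []  # in-memory partition ids, most recently created last
    spilled = {}         # partition id -> list of (key, partial count) "on disk"
    spill_stack = []     # spilled partition ids, most recently spilled last

    for row in rows:
        key = row[group_col]
        pid = zlib.crc32(repr(key).encode()) % num_partitions
        if pid in spilled:
            spilled[pid].append((key, 1))  # partition already on disk: append the row
            continue
        if pid not in memory:
            memory[pid] = {}
            creation_order.append(pid)
        memory[pid][key] = memory[pid].get(key, 0) + 1
        # Over budget: spill the most recently created partition (LIFO).
        while creation_order and sum(len(t) for t in memory.values()) > max_groups:
            victim = creation_order.pop()
            spilled[victim] = list(memory.pop(victim).items())
            spill_stack.append(victim)

    output = {}
    for table in memory.values():
        output.update(table)
    # Re-aggregate spilled partitions, last spilled first (assumes each fits in memory).
    while spill_stack:
        merged = defaultdict(int)
        for key, partial in spilled.pop(spill_stack.pop()):
            merged[key] += partial
        output.update(merged)
    return output
```

Each key maps to exactly one partition, so its count lives either in memory or in that partition's spill list, never split across both. The result is the same whatever the memory budget. A production engine would repartition a spilled partition recursively if it were still too large to re-aggregate.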
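
The VGRAM filings (patent 7996369 and publication 20100125594) compute a lower bound on common grams for variable-length grams. As background, the sketch below shows the classic count filter for fixed-length q-grams, which such bounds generalize. It relies on the well-known bound that two strings within edit distance k share at least max(|s|, |t|) - q + 1 - k*q q-grams, because one edit operation can destroy at most q of them. This is the fixed-length baseline, not the patent's gram dictionary or its dynamic-programming bound, and the function names are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """Multiset of the overlapping length-q substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def count_filter_bound(len_s, len_t, q, k):
    """Minimum number of shared q-grams for strings within edit distance k."""
    return max(len_s, len_t) - q + 1 - k * q


def passes_count_filter(s, t, q, k):
    """False means t cannot be within edit distance k of s; True means 'verify it'."""
    common = sum((qgrams(s, q) & qgrams(t, q)).values())  # multiset intersection
    return common >= count_filter_bound(len(s), len(t), q, k)
```

For example, "database" and "databse" are one deletion apart and share 5 bigrams, which meets the bound of 8 - 2 + 1 - 2 = 5. So the pair survives the filter and goes on to exact verification. When the bound is zero or negative, as with short strings or a large k, the filter prunes nothing.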