Patents Assigned to Databricks, Inc.
  • Patent number: 12124450
    Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. A data processing service receives a request to perform a query identifying a filter column and a non-filter column in a columnar database. The data processing service accesses a first task of contiguous rows in the filter column from a cloud-based object storage. The data processing service applies a filter defined by the query to the first task. The data processing service generates filter results for the first task that may include a percentage of the first task discarded and a run-time. The data processing service determines, based on the filter results for the first task, a likelihood value that indicates a likelihood of gaining a performance benefit by applying the lazy materialization technique to a second task of the query.
    Type: Grant
    Filed: January 27, 2023
    Date of Patent: October 22, 2024
    Assignee: Databricks, Inc.
    Inventors: Shoumik Palkar, Alexander Behm, Mostafa Mokhtar, Sriram Krishnamurthy
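    Illustration: one way to picture the decision described in the abstract above is the Python sketch below. It filters the first task, records the fraction of rows discarded and the run-time, and turns the two into a score. The names `FilterResult`, `filter_first_task`, `lazy_benefit_likelihood`, and the `fetch_cost_s` parameter are hypothetical, and the scoring formula is an assumption; the abstract does not say how the likelihood value is computed.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class FilterResult:
    discarded_fraction: float  # share of the first task's rows removed by the filter
    run_time_s: float          # wall-clock time spent filtering the first task


def filter_first_task(
    filter_column: Sequence[int], predicate: Callable[[int], bool]
) -> Tuple[List[int], FilterResult]:
    """Apply the query's filter to one task of contiguous rows in the filter column."""
    start = time.perf_counter()
    kept = [i for i, value in enumerate(filter_column) if predicate(value)]
    elapsed = time.perf_counter() - start
    total = len(filter_column)
    discarded = 1.0 - len(kept) / total if total else 0.0
    return kept, FilterResult(discarded, elapsed)


def lazy_benefit_likelihood(result: FilterResult, fetch_cost_s: float) -> float:
    """Hypothetical score in [0, 1]: fetch time saved versus filtering overhead.

    Assumption: skipping discarded rows saves a proportional share of the
    cost of fetching the non-filter column for a task.
    """
    saved = result.discarded_fraction * fetch_cost_s
    total = saved + result.run_time_s
    return saved / total if total > 0 else 0.0
```

    A caller would compare the score against a cutoff of its own choosing before deciding whether the second task fetches the non-filter column only for surviving rows.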
  • Patent number: 12117983
    Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.
    Type: Grant
    Filed: November 17, 2023
    Date of Patent: October 15, 2024
    Assignee: Databricks, Inc.
    Inventors: Aaron Daniel Davidson, Clemens Mewald, Tomas Nykodym
  • Patent number: 12105690
    Abstract: A system for multipass sort includes a communication interface and a processor. The communication interface is configured to receive from a client device a request to sort a dataset that includes a plurality of rows. The processor is configured to perform a first sort pass on the dataset in part by: extracting prefixes associated with a first schema element associated with the dataset for the plurality of rows; and sorting the extracted prefixes utilizing an integer sort algorithm based on a sort order included in the request to sort the dataset, where sorting the extracted prefixes includes utilizing NULL values to resolve a tied range that includes at least two rows of the plurality of rows having a same extracted prefix.
    Type: Grant
    Filed: July 27, 2022
    Date of Patent: October 1, 2024
    Assignee: Databricks, Inc.
    Inventors: Timothy Armstrong, Arvind Sai Krishnan, Khayyam Guliyev
  • Patent number: 12099525
    Abstract: A data processing service performs a rebalancing process for rebalancing stateful tasks on a cluster computing system. In one instance, the method for rebalancing stateful tasks is performed such that the per-operator partitions are spread across available executors of a cluster of the cluster computing system with respect to one or more statistics of the tasks. In one instance, the method for rebalancing stateful tasks is also performed such that the total number of stateful tasks are balanced per executor as long as this rebalancing does not imbalance the per-operator placements. In this way, the processing of stateful tasks can be spread across multiple executors in a relatively uniform manner, even though there may be an upfront cost of breaking the local caching on an executor.
    Type: Grant
    Filed: July 7, 2023
    Date of Patent: September 24, 2024
    Assignee: Databricks, Inc.
    Inventors: Alexander Balikov, Tathagata Das, Karthikeyan Ramasamy
  • Patent number: 12079167
    Abstract: The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a bin threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.
    Type: Grant
    Filed: January 6, 2023
    Date of Patent: September 3, 2024
    Assignee: Databricks, Inc.
    Inventors: Rahul Shivu Mahadev, Burak Yavuz, Tathagata Das
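    Illustration: the binning loop in the abstract above could be sketched in Python roughly as follows. The names `Bin` and `batch_bins` are hypothetical. Using total bytes as the batch measure, never closing an empty bin, and flushing the last partial batch at the end of the input are all assumptions that the abstract does not state.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Bin:
    files: List[str] = field(default_factory=list)
    size: int = 0


def batch_bins(
    files: Iterable[Tuple[str, int]], bin_threshold: int, batch_threshold: int
) -> Iterator[List[Bin]]:
    """Group (name, size) pairs into bins, and closed bins into batches."""
    current = Bin()
    batch: List[Bin] = []
    batch_size = 0  # assumed batch measure: total bytes of the closed bins
    for name, size in files:
        if current.files and current.size + size > bin_threshold:
            # Adding the file would overflow the bin: close it, add it to the
            # batch, and let the file go to the next bin instead.
            batch.append(current)
            batch_size += current.size
            current = Bin()
            if batch_size > batch_threshold:
                yield batch  # provide the batch of bins for processing
                batch, batch_size = [], 0
        current.files.append(name)
        current.size += size
    # Assumption: flush whatever remains once the input is exhausted.
    if current.files:
        batch.append(current)
    if batch:
        yield batch
```

    Because `batch_bins` is a generator, a caller can start processing the first batch while later file names are still arriving.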
  • Patent number: 12072863
    Abstract: A data tree for managing data files of a data table and performing one or more transaction operations to the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and conditions associated with parent nodes of the node. A parent node may correspond to a file including a buffer that stores changes to data files reachable by this parent node, and also includes dedicated storage to pointers of the child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.
    Type: Grant
    Filed: July 5, 2023
    Date of Patent: August 27, 2024
    Assignee: Databricks, Inc.
    Inventors: Prakhar Jain, Frederick Ryan Johnson, Bart Samwel
  • Patent number: 12072843
    Abstract: The present application discloses a method, system, and computer system for managing data in a storage system. The method includes receiving a first transaction that modifies or deletes first data stored in a storage system, determining that the first data is subject to an intervening re-arrangement transaction, and in response to determining that the first data is subject to the intervening re-arrangement transaction, rolling back the re-arrangement transaction at least with respect to the first data and committing the first transaction.
    Type: Grant
    Filed: January 20, 2022
    Date of Patent: August 27, 2024
    Assignee: Databricks, Inc.
    Inventors: Prakhar Jain, Bart Samwel, Burak Yavuz
  • Patent number: 12072880
    Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using the first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.
    Type: Grant
    Filed: August 22, 2022
    Date of Patent: August 27, 2024
    Assignee: Databricks, Inc.
    Inventors: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy
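    Illustration: the engine hand-off in the abstract above can be pictured with the Python sketch below. The names `UnsupportedInput`, `fast_parse_line`, and `parse_file` are hypothetical, as is the choice of "a quoted field" as the particular error. The heuristics that select the first engine are not modeled, and the standard-library `csv` module merely stands in for a second engine.

```python
import csv
from typing import List, Tuple


class UnsupportedInput(Exception):
    """The particular error that makes the first engine give up."""


def fast_parse_line(line: str) -> List[str]:
    """First engine (hypothetical): handles only unquoted, comma-separated fields."""
    if '"' in line:
        raise UnsupportedInput(line)
    return line.split(",")


def parse_file(lines: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Parse with the first engine; on the particular error, continue with the second."""
    rows: List[List[str]] = []
    engines_used = ["fast"]
    for index, line in enumerate(lines):
        try:
            rows.append(fast_parse_line(line))
        except UnsupportedInput:
            # Stop the first engine and let the second engine continue from
            # the line where the error was detected.
            engines_used.append("csv")
            rows.extend(next(csv.reader([rest])) for rest in lines[index:])
            break
    return rows, engines_used
```

    Rows parsed before the error are kept, so the result combines output from both engines.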
  • Patent number: 12061586
    Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.
    Type: Grant
    Filed: May 6, 2022
    Date of Patent: August 13, 2024
    Assignee: Databricks, Inc.
    Inventors: Bart Samwel, Prakhar Jain
  • Patent number: 12056126
    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).
    Type: Grant
    Filed: August 25, 2022
    Date of Patent: August 6, 2024
    Assignee: Databricks, Inc.
    Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain
  • Patent number: 12045220
    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, persisting, in one or more deletion vector files, one or more deletion vectors for corresponding rows of the one or more target table files, and obtaining a resulting table based at least in part on the second job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and one or more deletion vectors associated with previously removed rows of the matching target table files and obtaining the second job resulting file(s).
    Type: Grant
    Filed: August 25, 2022
    Date of Patent: July 23, 2024
    Assignee: Databricks, Inc.
    Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Chirstos Stavrakakis
  • Patent number: 12033041
    Abstract: The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a dataset, the dataset comprising a plurality of keys and a plurality of key-value relationships, determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models, building the plurality of models, and optimizing at least one of the plurality of models.
    Type: Grant
    Filed: August 26, 2022
    Date of Patent: July 9, 2024
    Assignee: Databricks, Inc.
    Inventors: Benjamin Thomas Wilson, Corey Zumar
  • Patent number: 12032573
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Grant
    Filed: October 28, 2022
    Date of Patent: July 9, 2024
    Assignee: Databricks, Inc.
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Patent number: 12019682
    Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.
    Type: Grant
    Filed: December 27, 2022
    Date of Patent: June 25, 2024
    Assignee: Databricks, Inc.
    Inventors: Michael Paul Armbrust, Andreas Neumann, Mukul Murthy, Jonathan Mio
  • Patent number: 12008040
    Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries. The processor is coupled to the communication interface and is configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; insert a node in the DAG of nodes to generate an updated DAG to enforce an expectation; determine a dataflow graph based on the updated DAG; and provide the dataflow graph.
    Type: Grant
    Filed: June 29, 2021
    Date of Patent: June 11, 2024
    Assignee: Databricks, Inc.
    Inventors: Michael Paul Armbrust, Andreas Neumann, Mukul Murthy, Jonathan Mio
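    Illustration: the DAG update in the abstract above can be pictured with the Python sketch below, which uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The function `insert_expectation` and the node names are hypothetical. Placing the expectation node between a table and its downstream consumers is an assumption; the abstract only says that a node is inserted to enforce an expectation.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set


def insert_expectation(
    dag: Dict[str, Set[str]], target: str, expectation: str
) -> Dict[str, Set[str]]:
    """Return an updated DAG in which consumers of `target` read via `expectation`.

    `dag` maps each node to the set of upstream nodes it depends on.
    """
    updated = {
        node: {expectation if dep == target else dep for dep in deps}
        for node, deps in dag.items()
    }
    updated[expectation] = {target}  # the new node depends on the table it checks
    return updated


# Dependencies derived from a set of queries: report <- clean <- raw.
dag = {"raw": set(), "clean": {"raw"}, "report": {"clean"}}
updated = insert_expectation(dag, "clean", "expect_valid_id")
order = list(TopologicalSorter(updated).static_order())
```

    Here `order` lists the nodes in an order that runs the expectation check after `clean` is produced and before `report` consumes it.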
  • Patent number: 11960494
    Abstract: The system is configured to: 1) receive a client request; 2) determine executor(s) to generate a response to the client request; 3) provide each of the executor(s) with an indication; 4) receive for each indication a response including an output of either a cloud output or an in-line output to generate a group of in-line outputs and a group of cloud outputs; 5) determine whether the group of in-line outputs comprises all outputs; and 6) in response to the group of in-line outputs not comprising all the outputs for the client request: a) convert the group of in-line outputs to a converted group of cloud outputs; b) generate metadata for the converted group of cloud outputs and the group of cloud outputs; and c) provide a response to the client request including the metadata for the converted group of cloud outputs and the group of cloud outputs.
    Type: Grant
    Filed: June 16, 2022
    Date of Patent: April 16, 2024
    Assignee: Databricks, Inc.
    Inventors: Bogdan Ionut Ghit, Juliusz Sompolski, Shi Xin, Bart Samwel
  • Patent number: 11948084
    Abstract: A function creation method is disclosed. The method comprises defining one or more database function inputs, defining cluster processing information, defining a deep learning model, and defining one or more database function outputs. A database function is created based at least in part on the one or more database function inputs, the cluster set-up information, the deep learning model, and the one or more database function outputs. In some embodiments, the database function enables a non-technical user to utilize deep learning models.
    Type: Grant
    Filed: January 31, 2023
    Date of Patent: April 2, 2024
    Assignee: Databricks, Inc.
    Inventors: Sue Ann Hong, Shi Xin, Timothee Hunter, Ali Ghodsi
  • Patent number: 11874832
    Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
    Type: Grant
    Filed: January 23, 2023
    Date of Patent: January 16, 2024
    Assignee: Databricks, Inc.
    Inventors: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hovell tot Westerflier
  • Patent number: 11853277
    Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.
    Type: Grant
    Filed: January 31, 2023
    Date of Patent: December 26, 2023
    Assignee: Databricks, Inc.
    Inventors: Aaron Daniel Davidson, Tomas Nykodym, Clemens Mewald
  • Patent number: 11775499
    Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction log associated with a further position N+2.
    Type: Grant
    Filed: March 15, 2022
    Date of Patent: October 3, 2023
    Assignee: Databricks, Inc.
    Inventors: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
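    Illustration: the commit loop in the abstract above resembles optimistic concurrency control and could be sketched in Python as follows. The names `TransactionLog`, `ConflictError`, and `commit` are hypothetical, and the in-memory dictionary merely stands in for a log with put-if-absent writes. Raising an error when the read set overlaps the updated files is an assumption; the abstract describes only the no-overlap path.

```python
from typing import Dict, Set


class ConflictError(Exception):
    """A simultaneous transaction updated a file that this transaction read."""


class TransactionLog:
    def __init__(self) -> None:
        # position -> files updated by the transaction committed at that position
        self._entries: Dict[int, Set[str]] = {}

    def try_write(self, position: int, updated_files: Set[str]) -> bool:
        """Put-if-absent: succeed only if no entry exists at `position`."""
        if position in self._entries:
            return False
        self._entries[position] = set(updated_files)
        return True

    def updated_files(self, position: int) -> Set[str]:
        return self._entries[position]


def commit(
    log: TransactionLog, current_position: int, read_set: Set[str], updated_files: Set[str]
) -> int:
    """Try position N+1; if it is taken and nothing we read changed, try N+2, and so on."""
    position = current_position + 1
    while not log.try_write(position, updated_files):
        if read_set & log.updated_files(position):
            raise ConflictError(f"conflict with the transaction at position {position}")
        position += 1
    return position
```

    Two writers that both start from position N can therefore both commit, one at N+1 and the other at N+2, as long as the second did not read any file the first one updated.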