Patents Assigned to Databricks Inc.
  • Patent number: 11693837
    Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: July 4, 2023
    Assignee: Databricks, Inc.
    Inventors: Aaron Daniel Davidson, Tomas Nykodym, Clemens Mewald
  • Patent number: 11693723
    Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
    Type: Grant
    Filed: November 29, 2021
    Date of Patent: July 4, 2023
    Assignee: Databricks, Inc.
    Inventors: Alicja Luszczak, Srinath Shankar, Shi Xin
  • Patent number: 11675767
    Abstract: A system includes a plurality of computing units. A first computing unit of the plurality of computing units comprises: a communication interface configured to receive an indication to roll up data in a data table; and a processor coupled to the communication interface and configured to: build a preaggregation hash table based at least in part on a set of columns and the data table by aggregating input rows of the data table; for each preaggregated hash table entry of the preaggregated hash table: provide the preaggregated hash table entry to a second computing unit of the plurality of computing units based at least in part on a distribution hash value; receive a set of received entries from computing units of the plurality of computing units; and build an aggregation hash table based at least in part on the set of received entries by aggregating the set of received entries.
    Type: Grant
    Filed: November 16, 2020
    Date of Patent: June 13, 2023
    Assignee: Databricks, Inc.
    Inventors: Alexander Behm, Ankur Dave
  • Patent number: 11599783
    Abstract: A function creation method is disclosed. The method comprises defining one or more database function inputs, defining cluster processing information, defining a deep learning model, and defining one or more database function outputs. A database function is created based at least in part on the one or more database function inputs, the cluster set-up information, the deep learning model, and the one or more database function outputs. In some embodiments, the database function enables a non-technical user to utilize deep learning models.
    Type: Grant
    Filed: May 31, 2017
    Date of Patent: March 7, 2023
    Assignee: Databricks, Inc.
    Inventors: Sue Ann Hong, Shi Xin, Timothee Hunter, Ali Ghodsi
  • Patent number: 11586624
    Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
    Type: Grant
    Filed: April 22, 2021
    Date of Patent: February 21, 2023
    Assignee: Databricks, Inc.
    Inventors: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hövell tot Westerflier
  • Patent number: 11567998
    Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.
    Type: Grant
    Filed: June 29, 2021
    Date of Patent: January 31, 2023
    Assignee: Databricks, Inc.
    Inventors: Michael Paul Armbrust, Andreas Neumann, Mukul Murthy, Jonathan Mio
  • Patent number: 11567900
    Abstract: The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.
    Type: Grant
    Filed: July 23, 2021
    Date of Patent: January 31, 2023
    Assignee: Databricks, Inc.
    Inventors: Rahul Shivu Mahadev, Burak Yavuz, Tathagata Das
  • Patent number: 11514045
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Grant
    Filed: December 19, 2019
    Date of Patent: November 29, 2022
    Assignee: Databricks Inc.
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Patent number: 11481398
    Abstract: A system for spilling comprises an interface and a processor. The interface is configured to receive an indication to perform a GROUP BY operation, wherein the indication comprises an input table and a grouping column. The processor is configured to: for each input table entry of the input table, determine a key, wherein the key is based at least in part on the input table entry and the grouping column; add the key to a grouping hash table, wherein adding the key to the grouping hash table comprises last-in, first-out (LIFO) spilling when necessary; create an output table based at least in part on the grouping hash table; and provide the output table.
    Type: Grant
    Filed: December 9, 2020
    Date of Patent: October 25, 2022
    Assignee: Databricks Inc.
    Inventors: Alexander Behm, Ankur Dave, Ryan Deng, Shoumik Palkar
  • Patent number: 11468369
    Abstract: The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a data set, the dataset comprising a plurality of keys and a plurality of key-value relationships, determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models, building the plurality of models, and optimizing at least one of the plurality of models.
    Type: Grant
    Filed: January 28, 2022
    Date of Patent: October 11, 2022
    Assignee: Databricks Inc.
    Inventors: Benjamin Thomas Wilson, Corey Zumar
  • Patent number: 11379272
    Abstract: The allocation system comprises an interface and a processor. The interface is configured to receive an indication to deactivate idle cluster machines of a set of cluster machines. The processor is configured to determine a list of cluster machines storing one or more intermediate data files of a set of intermediate data files; determine a set of idle cluster machines of the set of cluster machines that are neither running one or more tasks of a set of tasks executing or pending on the set of cluster machines nor storing the one or more intermediate data files of the set of intermediate data files, where the set of intermediate data files is associated with the set of tasks executing or pending on the cluster machines; and deactivate each cluster machine of the set of idle cluster machines.
    Type: Grant
    Filed: September 14, 2020
    Date of Patent: July 5, 2022
    Assignee: Databricks Inc.
    Inventors: Srinath Shankar, Eric Keng-Hao Liang
  • Patent number: 11308071
    Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
    Type: Grant
    Filed: July 28, 2020
    Date of Patent: April 19, 2022
    Assignee: Databricks Inc.
    Inventors: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
  • Patent number: 11216324
    Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
    Type: Grant
    Filed: February 18, 2020
    Date of Patent: January 4, 2022
    Assignee: Databricks Inc.
    Inventors: Alicja Luszczak, Srinath Shankar, Shi Xin
  • Patent number: 11113043
    Abstract: A system for code development and execution includes a client interface and a client processor. The client interface is configured to receive user code for execution and receive an indication of a server that will perform the execution. The client processor is configured to parse the user code to identify one or more data items referred to during the execution. The client processor is also configured to provide the server with an inquiry for metadata regarding the one or more data items, receive the metadata regarding the one or more data items, determine a logical plan based at least in part on the metadata regarding the one or more data items; and provide the logical plan to the server for execution.
    Type: Grant
    Filed: April 30, 2020
    Date of Patent: September 7, 2021
    Assignee: Databricks Inc.
    Inventors: Srinath Shankar, Eric Keng-Hao Liang, Gregory George Owen
  • Patent number: 11068447
    Abstract: A system for directory level atomic commits includes an interface and a processor. The interface is configured to receive an indication to provide a set of files. The processor is configured to determine whether a file in a directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted and provide the file as one file of the set of files in the event that the file in the directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted.
    Type: Grant
    Filed: April 14, 2017
    Date of Patent: July 20, 2021
    Assignee: Databricks Inc.
    Inventors: Eric Keng-hao Liang, Srinath Shankar, Shi Xin
  • Patent number: 10810051
    Abstract: The allocation system comprises an interface and a processor. The interface is configured to receive an indication to deactivate idle cluster machines of a set of cluster machines. The processor is configured to determine a set of tasks executing or pending on the set of cluster machines; determine a set of idle cluster machines of the set of cluster machines that are neither running one or more tasks of the set of tasks nor storing one or more intermediate data files of a set of intermediate data files, where the set of intermediate data files is associated with a set of tasks executing or pending on the cluster machines; and deactivate each cluster machine of the set of idle cluster machines.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: October 20, 2020
    Assignee: Databricks Inc.
    Inventors: Srinath Shankar, Eric Keng-Hao Liang
  • Patent number: 10769130
    Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
    Type: Grant
    Filed: May 23, 2018
    Date of Patent: September 8, 2020
    Assignee: Databricks Inc.
    Inventors: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
  • Patent number: 10691433
    Abstract: A system for code development and execution includes a client interface and a client processor. The client interface is configured to receive user code for execution and receive an indication of a server that will perform the execution. The client processor is configured to parse the user code to identify one or more data items referred to during the execution. The client processor is also configured to provide the server with an inquiry for metadata regarding the one or more data items, receive the metadata regarding the one or more data items, determine a logical plan based at least in part on the metadata regarding the one or more data items; and provide the logical plan to the server for execution.
    Type: Grant
    Filed: August 31, 2018
    Date of Patent: June 23, 2020
    Assignee: Databricks Inc.
    Inventors: Srinath Shankar, Eric Keng-hao Liang, Gregory George Owen
  • Patent number: 10678536
    Abstract: A system for processing a notebook includes an input interface and a processor. The input interface is to receive a first notebook. The notebook comprises code for interactively querying and viewing data. The processor is to load the first notebook into a shell. The shell receives one or more parameters associated with the first notebook. The shell executes the first notebook using a cluster.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: June 9, 2020
    Assignee: Databricks Inc.
    Inventors: Timothee Hunter, Ali Ghodsi, Ion Stoica
  • Patent number: 10606675
    Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
    Type: Grant
    Filed: November 10, 2017
    Date of Patent: March 31, 2020
    Assignee: Databricks Inc.
    Inventors: Alicja Luszczak, Srinath Shankar, Shi Xin