Patents Assigned to Databricks Inc.
-
Patent number: 11693837
Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.
Type: Grant
Filed: May 19, 2021
Date of Patent: July 4, 2023
Assignee: Databricks, Inc.
Inventors: Aaron Daniel Davidson, Tomas Nykodym, Clemens Mewald
-
Patent number: 11693723
Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
Type: Grant
Filed: November 29, 2021
Date of Patent: July 4, 2023
Assignee: Databricks, Inc.
Inventors: Alicja Luszczak, Srinath Shankar, Shi Xin
-
Patent number: 11675767
Abstract: A system includes a plurality of computing units. A first computing unit of the plurality of computing units comprises: a communication interface configured to receive an indication to roll up data in a data table; and a processor coupled to the communication interface and configured to: build a preaggregation hash table based at least in part on a set of columns and the data table by aggregating input rows of the data table; for each preaggregated hash table entry of the preaggregated hash table: provide the preaggregated hash table entry to a second computing unit of the plurality of computing units based at least in part on a distribution hash value; receive a set of received entries from computing units of the plurality of computing units; and build an aggregation hash table based at least in part on the set of received entries by aggregating the set of received entries.
Type: Grant
Filed: November 16, 2020
Date of Patent: June 13, 2023
Assignee: Databricks, Inc.
Inventors: Alexander Behm, Ankur Dave
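The two-phase flow in this abstract (local preaggregation, hash-based redistribution, final aggregation) can be illustrated with a small single-process sketch. Everything here is an assumption made for illustration: the function names, the use of SUM as the aggregate, Python's built-in `hash` as the distribution hash, and in-memory lists standing in for computing units. It is not the patented implementation.

```python
from collections import defaultdict

def preaggregate(rows, key_columns, value_column):
    """Build a local preaggregation hash table: group key -> partial sum."""
    table = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        table[key] += row[value_column]
    return dict(table)

def distribute(preagg_table, num_units):
    """Route each preaggregated entry to the unit chosen by a distribution hash."""
    outboxes = [[] for _ in range(num_units)]
    for key, partial in preagg_table.items():
        outboxes[hash(key) % num_units].append((key, partial))
    return outboxes

def aggregate(received_entries):
    """Build the final aggregation hash table from the received partial sums."""
    table = defaultdict(int)
    for key, partial in received_entries:
        table[key] += partial
    return dict(table)

def rollup(partitions, key_columns, value_column):
    """Simulate one computing unit per partition (hypothetical driver)."""
    num_units = len(partitions)
    inboxes = [[] for _ in range(num_units)]
    for rows in partitions:
        local = preaggregate(rows, key_columns, value_column)
        for target, entries in enumerate(distribute(local, num_units)):
            inboxes[target].extend(entries)
    return [aggregate(inbox) for inbox in inboxes]
```

Because every entry with the same key hashes to the same unit, each group ends up in exactly one final table, and preaggregating first reduces how many entries have to be exchanged.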
-
Patent number: 11599783
Abstract: A function creation method is disclosed. The method comprises defining one or more database function inputs, defining cluster processing information, defining a deep learning model, and defining one or more database function outputs. A database function is created based at least in part on the one or more database function inputs, the cluster set-up information, the deep learning model, and the one or more database function outputs. In some embodiments, the database function enables a non-technical user to utilize deep learning models.
Type: Grant
Filed: May 31, 2017
Date of Patent: March 7, 2023
Assignee: Databricks, Inc.
Inventors: Sue Ann Hong, Shi Xin, Timothee Hunter, Ali Ghodsi
-
Patent number: 11586624
Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
Type: Grant
Filed: April 22, 2021
Date of Patent: February 21, 2023
Assignee: Databricks, Inc.
Inventors: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hövell tot Westerflier
-
Patent number: 11567998
Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.
Type: Grant
Filed: June 29, 2021
Date of Patent: January 31, 2023
Assignee: Databricks, Inc.
Inventors: Michael Paul Armbrust, Andreas Neumann, Mukul Murthy, Jonathan Mio
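As a rough illustration of the steps in this abstract, the sketch below derives dependencies between named queries, orders them as a DAG, and inlines the queries designated as views into the tables that read them. The name-matching regex, the `(…) AS name` inlining form, and all function names are assumptions for illustration only; the abstract does not specify how dependencies are detected or how in-line expressions are built.

```python
import re
from graphlib import TopologicalSorter

def dependencies(queries):
    """Map each query name to the other query names its SQL text mentions."""
    deps = {}
    for name, sql in queries.items():
        deps[name] = {
            other for other in queries
            if other != name and re.search(rf"\b{re.escape(other)}\b", sql)
        }
    return deps

def build_dataflow_graph(queries, views):
    """Return table definitions with view nodes expanded in-line."""
    deps = dependencies(queries)
    # static_order() yields every node after all of its dependencies.
    order = list(TopologicalSorter(deps).static_order())
    expanded = {}
    for name in order:
        sql = queries[name]
        for dep in deps[name]:
            if dep in views:
                inline = f"({expanded[dep]}) AS {dep}"
                # A function replacement avoids backslash escapes in the SQL.
                sql = re.sub(rf"\b{re.escape(dep)}\b", lambda m, s=inline: s, sql)
        expanded[name] = sql
    # Only non-view nodes remain as tables in the resulting graph.
    return {name: sql for name, sql in expanded.items() if name not in views}
```

`TopologicalSorter` raises `graphlib.CycleError` if the queries depend on each other circularly, which is one simple way to reject an input that is not a DAG.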
-
Patent number: 11567900
Abstract: The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.
Type: Grant
Filed: July 23, 2021
Date of Patent: January 31, 2023
Assignee: Databricks, Inc.
Inventors: Rahul Shivu Mahadev, Burak Yavuz, Tathagata Das
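The bin-and-batch logic described in this abstract resembles a streaming next-fit packer, sketched below. The sketch assumes that both thresholds are measured in total file size, that files arrive as `(name, size)` pairs, and that leftover bins are flushed at the end of the input; none of these details come from the abstract, and `pack_files` is a hypothetical name.

```python
def pack_files(files, bin_threshold, batch_threshold):
    """Yield batches of closed bins; each bin is a list of file names."""
    current_bin, current_size = [], 0
    batch, batch_size = [], 0
    for name, size in files:
        if current_bin and current_size + size > bin_threshold:
            # Adding the file would overflow: close the current bin,
            # add it to the batch, and start the next bin.
            batch.append(current_bin)
            batch_size += current_size
            current_bin, current_size = [], 0
            if batch_size > batch_threshold:
                # The batch measure exceeds its threshold: hand it off.
                yield batch
                batch, batch_size = [], 0
        current_bin.append(name)
        current_size += size
    # Assumed end-of-input behavior: flush whatever is left.
    if current_bin:
        batch.append(current_bin)
    if batch:
        yield batch
```

Written as a generator, the packer can hand completed batches to a processing stage before the full file listing has been read.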
-
Patent number: 11514045
Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
Type: Grant
Filed: December 19, 2019
Date of Patent: November 29, 2022
Assignee: Databricks Inc.
Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
-
Patent number: 11481398
Abstract: A system for spilling comprises an interface and a processor. The interface is configured to receive an indication to perform a GROUP BY operation, wherein the indication comprises an input table and a grouping column. The processor is configured to: for each input table entry of the input table, determine a key, wherein the key is based at least in part on the input table entry and the grouping column; add the key to a grouping hash table, wherein adding the key to the grouping hash table comprises last-in, first-out (LIFO) spilling when necessary; create an output table based at least in part on the grouping hash table; and provide the output table.
Type: Grant
Filed: December 9, 2020
Date of Patent: October 25, 2022
Assignee: Databricks Inc.
Inventors: Alexander Behm, Ankur Dave, Ryan Deng, Shoumik Palkar
-
Patent number: 11468369
Abstract: The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a dataset, the dataset comprising a plurality of keys and a plurality of key-value relationships, determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models, building the plurality of models, and optimizing at least one of the plurality of models.
Type: Grant
Filed: January 28, 2022
Date of Patent: October 11, 2022
Assignee: Databricks Inc.
Inventors: Benjamin Thomas Wilson, Corey Zumar
-
Patent number: 11379272
Abstract: The allocation system comprises an interface and a processor. The interface is configured to receive an indication to deactivate idle cluster machines of a set of cluster machines. The processor is configured to determine a list of cluster machines storing one or more intermediate data files of a set of intermediate data files; determine a set of idle cluster machines of the set of cluster machines that are neither running one or more tasks of a set of tasks executing or pending on the set of cluster machines nor storing the one or more intermediate data files of the set of intermediate data files, where the set of intermediate data files is associated with the set of tasks executing or pending on the cluster machines; and deactivate each cluster machine of the set of idle cluster machines.
Type: Grant
Filed: September 14, 2020
Date of Patent: July 5, 2022
Assignee: Databricks Inc.
Inventors: Srinath Shankar, Eric Keng-Hao Liang
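The idle-machine test in this abstract amounts to a set difference, as the following sketch suggests. The record shapes (dicts with `id`, `machine`, and `task_id` fields, with `machine` set to `None` for a pending task) and the function name are hypothetical choices for illustration, not part of the patent.

```python
def find_idle_machines(machines, active_tasks, intermediate_files):
    """Return machines that neither run an active task nor hold its files."""
    # Machines currently running a task (pending tasks have no machine yet).
    running = {t["machine"] for t in active_tasks if t["machine"] is not None}
    # Only intermediate files tied to executing or pending tasks pin a machine.
    active_ids = {t["id"] for t in active_tasks}
    storing = {
        f["machine"] for f in intermediate_files if f["task_id"] in active_ids
    }
    return set(machines) - running - storing
```

A machine that holds only files from finished tasks is treated as idle here, since nothing executing or pending still needs that data.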
-
Patent number: 11308071
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
Type: Grant
Filed: July 28, 2020
Date of Patent: April 19, 2022
Assignee: Databricks Inc.
Inventors: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
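The commit protocol in this abstract reads as optimistic concurrency control over an append-only log. A minimal in-memory sketch follows. The `TransactionLog` class, its method names, the `ConflictError` exception, and the choice to abort on overlap are all assumptions; the abstract only states what happens when there is no overlap.

```python
class ConflictError(Exception):
    """Raised when a concurrent commit touched files this transaction read."""

class TransactionLog:
    """Hypothetical in-memory log; position i is stored at entries[i - 1]."""

    def __init__(self):
        self.entries = []

    def latest_position(self):
        return len(self.entries)

    def try_write(self, position, updated_files):
        """Claim `position` if it is the next free slot, else report failure."""
        if position != len(self.entries) + 1:
            return False
        self.entries.append(frozenset(updated_files))
        return True

    def updated_files_at(self, position):
        return self.entries[position - 1]

def commit(log, start_position, read_set, updated_files):
    """Try N+1, then N+2, ... as long as competing commits do not overlap."""
    position = start_position + 1
    while not log.try_write(position, updated_files):
        winner_files = log.updated_files_at(position)
        if winner_files & set(read_set):
            # Assumed behavior: abort when the read set is stale.
            raise ConflictError(f"conflict at position {position}")
        position += 1
    return position
```

For example, a transaction that started at position 0 and read only `part-1` can still land at position 2 if a concurrent writer took position 1 while updating only `part-2`.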
-
Patent number: 11216324
Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
Type: Grant
Filed: February 18, 2020
Date of Patent: January 4, 2022
Assignee: Databricks Inc.
Inventors: Alicja Luszczak, Srinath Shankar, Shi Xin
-
Patent number: 11113043
Abstract: A system for code development and execution includes a client interface and a client processor. The client interface is configured to receive user code for execution and receive an indication of a server that will perform the execution. The client processor is configured to parse the user code to identify one or more data items referred to during the execution. The client processor is also configured to provide the server with an inquiry for metadata regarding the one or more data items, receive the metadata regarding the one or more data items, determine a logical plan based at least in part on the metadata regarding the one or more data items; and provide the logical plan to the server for execution.
Type: Grant
Filed: April 30, 2020
Date of Patent: September 7, 2021
Assignee: Databricks Inc.
Inventors: Srinath Shankar, Eric Keng-Hao Liang, Gregory George Owen
-
Patent number: 11068447
Abstract: A system for directory level atomic commits includes an interface and a processor. The interface is configured to receive an indication to provide a set of files. The processor is configured to determine whether a file in a directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted and provide the file as one file of the set of files in the event that the file in the directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted.
Type: Grant
Filed: April 14, 2017
Date of Patent: July 20, 2021
Assignee: Databricks Inc.
Inventors: Eric Keng-hao Liang, Srinath Shankar, Shi Xin
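The visibility rule in this abstract can be written as a small predicate over per-file state. The `FileEntry` fields and the function names below are hypothetical; how a real system records commit and deletion markers is not described in the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileEntry:
    name: str
    written_atomically: bool      # produced by a process using atomic commits
    committed: bool = False       # the atomic commit completed
    marked_deleted: bool = False  # designated as deleted

def is_visible(entry):
    """Apply the abstract's rule: committed, or non-atomic and not deleted."""
    if entry.written_atomically:
        return entry.committed
    return not entry.marked_deleted

def list_files(entries):
    """Return the names of the files a reader should be given."""
    return [e.name for e in entries if is_visible(e)]
```

Under this rule a reader never sees the partial output of an atomic writer that has not committed, while files from writers that do not use the protocol stay readable until they are marked deleted.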
-
Patent number: 10810051
Abstract: The allocation system comprises an interface and a processor. The interface is configured to receive an indication to deactivate idle cluster machines of a set of cluster machines. The processor is configured to determine a set of tasks executing or pending on the set of cluster machines; determine a set of idle cluster machines of the set of cluster machines that are neither running one or more tasks of the set of tasks nor storing one or more intermediate data files of a set of intermediate data files, where the set of intermediate data files is associated with a set of tasks executing or pending on the cluster machines; and deactivate each cluster machine of the set of idle cluster machines.
Type: Grant
Filed: November 13, 2018
Date of Patent: October 20, 2020
Assignee: Databricks Inc.
Inventors: Srinath Shankar, Eric Keng-Hao Liang
-
Patent number: 10769130
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
Type: Grant
Filed: May 23, 2018
Date of Patent: September 8, 2020
Assignee: Databricks Inc.
Inventors: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
-
Patent number: 10691433
Abstract: A system for code development and execution includes a client interface and a client processor. The client interface is configured to receive user code for execution and receive an indication of a server that will perform the execution. The client processor is configured to parse the user code to identify one or more data items referred to during the execution. The client processor is also configured to provide the server with an inquiry for metadata regarding the one or more data items, receive the metadata regarding the one or more data items, determine a logical plan based at least in part on the metadata regarding the one or more data items; and provide the logical plan to the server for execution.
Type: Grant
Filed: August 31, 2018
Date of Patent: June 23, 2020
Assignee: Databricks Inc.
Inventors: Srinath Shankar, Eric Keng-hao Liang, Gregory George Owen
-
Patent number: 10678536
Abstract: A system for processing a notebook includes an input interface and a processor. The input interface is to receive a first notebook. The notebook comprises code for interactively querying and viewing data. The processor is to load the first notebook into a shell. The shell receives one or more parameters associated with the first notebook. The shell executes the first notebook using a cluster.
Type: Grant
Filed: April 8, 2019
Date of Patent: June 9, 2020
Assignee: Databricks Inc.
Inventors: Timothee Hunter, Ali Ghodsi, Ion Stoica
-
Patent number: 10606675
Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
Type: Grant
Filed: November 10, 2017
Date of Patent: March 31, 2020
Assignee: Databricks Inc.
Inventors: Alicja Luszczak, Srinath Shankar, Shi Xin