Patents Assigned to Databricks Inc.
-
Patent number: 11514045
Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
Type: Grant
Filed: December 19, 2019
Date of Patent: November 29, 2022
Assignee: Databricks Inc.
Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
-
Patent number: 11481398
Abstract: A system for spilling comprises an interface and a processor. The interface is configured to receive an indication to perform a GROUP BY operation, wherein the indication comprises an input table and a grouping column. The processor is configured to: for each input table entry of the input table, determine a key, wherein the key is based at least in part on the input table entry and the grouping column; add the key to a grouping hash table, wherein adding the key to the grouping hash table comprises last-in, first-out (LIFO) spilling when necessary; create an output table based at least in part on the grouping hash table; and provide the output table.
Type: Grant
Filed: December 9, 2020
Date of Patent: October 25, 2022
Assignee: Databricks Inc.
Inventors: Alexander Behm, Ankur Dave, Ryan Deng, Shoumik Palkar
-
Patent number: 11468369
Abstract: The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a dataset, the dataset comprising a plurality of keys and a plurality of key-value relationships, determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models, building the plurality of models, and optimizing at least one of the plurality of models.
Type: Grant
Filed: January 28, 2022
Date of Patent: October 11, 2022
Assignee: Databricks Inc.
Inventors: Benjamin Thomas Wilson, Corey Zumar
-
Patent number: 11379272
Abstract: The allocation system comprises an interface and a processor. The interface is configured to receive an indication to deactivate idle cluster machines of a set of cluster machines. The processor is configured to determine a list of cluster machines storing one or more intermediate data files of a set of intermediate data files; determine a set of idle cluster machines of the set of cluster machines that are neither running one or more tasks of a set of tasks executing or pending on the set of cluster machines nor storing the one or more intermediate data files of the set of intermediate data files, where the set of intermediate data files is associated with the set of tasks executing or pending on the cluster machines; and deactivate each cluster machine of the set of idle cluster machines.
Type: Grant
Filed: September 14, 2020
Date of Patent: July 5, 2022
Assignee: Databricks Inc.
Inventors: Srinath Shankar, Eric Keng-Hao Liang
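The selection rule in this abstract can be read as simple set logic: a machine counts as idle only if it neither runs a task nor stores an intermediate file. The following is a minimal sketch under that reading, not the patented implementation; the function name, the dictionary shapes, and the machine identifiers are all hypothetical.

```python
def find_idle_machines(machines, task_assignments, intermediate_files):
    """Return machines that neither run a task nor hold an intermediate file.

    machines: iterable of machine ids (hypothetical identifiers)
    task_assignments: dict of task id -> machine id, for executing or pending tasks
    intermediate_files: dict of file name -> machine id that stores the file
    """
    busy = set(task_assignments.values())
    storing = set(intermediate_files.values())
    # A machine is idle only when it is in neither set.
    return sorted(m for m in set(machines) if m not in busy and m not in storing)


idle = find_idle_machines(
    machines=["m1", "m2", "m3", "m4"],
    task_assignments={"t1": "m1"},
    intermediate_files={"shuffle_0": "m2"},
)
print(idle)
```

Here `m2` runs no task but is kept alive because it still holds a file that a task may need; only the machines returned by the function would be candidates for deactivation.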
-
Patent number: 11308071
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
Type: Grant
Filed: July 28, 2020
Date of Patent: April 19, 2022
Assignee: Databricks Inc.
Inventors: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
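The commit procedure in this abstract resembles optimistic concurrency control over an append-only log. The sketch below illustrates that idea under stated assumptions and is not the patented implementation: `TransactionLog` is a hypothetical in-memory stand-in, the file names are made up, and the retry loop generalizes the abstract's single N+1 to N+2 step to any number of collisions.

```python
class TransactionLog:
    """Hypothetical in-memory stand-in for a table's transaction log."""

    def __init__(self):
        self.entries = {}  # position -> set of files updated by that commit

    def latest_position(self):
        return max(self.entries, default=0)

    def try_write(self, position, updated_files):
        """Write only if no other transaction already claimed this position."""
        if position in self.entries:
            return False
        self.entries[position] = set(updated_files)
        return True


def commit(log, start_position, read_set, updated_files):
    """Attempt position N+1; on collision, move on only if nothing we read changed.

    Returns the position that was written, or raises RuntimeError on a conflict.
    """
    position = start_position + 1
    while not log.try_write(position, updated_files):
        winner_files = log.entries[position]
        if winner_files & set(read_set):
            # The simultaneous transaction touched a file this one read.
            raise RuntimeError(f"conflict at position {position}")
        position += 1
    return position


log = TransactionLog()
n = log.latest_position()  # both writers below observe the same position N
first = commit(log, n, read_set={"a.parquet"}, updated_files={"a2.parquet"})
second = commit(log, n, read_set={"b.parquet"}, updated_files={"b2.parquet"})
print(first, second)
```

The second writer loses the race for position N+1, finds no overlap between its read set and the winner's updated files, and lands at N+2. A writer whose read set included `a2.parquet` would be rejected instead.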
-
Patent number: 11216324
Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
Type: Grant
Filed: February 18, 2020
Date of Patent: January 4, 2022
Assignee: Databricks Inc.
Inventors: Alicja Luszczak, Srinath Shankar, Shi Xin
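The abstract does not say what the watchdog criterion is. Purely as an illustration, the sketch below assumes one possible criterion (output rows per input row exceeding a threshold); the criterion, `TaskProgress`, and the `kill` callback are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class TaskProgress:
    task_id: str
    input_rows: int
    output_rows: int


def satisfies_watchdog(progress, max_output_ratio):
    """Assumed criterion: output rows per input row exceed a threshold."""
    if progress.input_rows == 0:
        return False
    return progress.output_rows / progress.input_rows > max_output_ratio


def enforce_watchdog(reports, max_output_ratio, kill):
    """Call kill(task_id) for every task that meets the criterion."""
    killed = []
    for progress in reports:
        if satisfies_watchdog(progress, max_output_ratio):
            kill(progress.task_id)
            killed.append(progress.task_id)
    return killed


reports = [TaskProgress("a", 100, 200), TaskProgress("b", 10, 50000)]
kill_log = []
killed = enforce_watchdog(reports, max_output_ratio=1000, kill=kill_log.append)
print(killed)
```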
-
Patent number: 11113043
Abstract: A system for code development and execution includes a client interface and a client processor. The client interface is configured to receive user code for execution and receive an indication of a server that will perform the execution. The client processor is configured to parse the user code to identify one or more data items referred to during the execution. The client processor is also configured to provide the server with an inquiry for metadata regarding the one or more data items, receive the metadata regarding the one or more data items, determine a logical plan based at least in part on the metadata regarding the one or more data items; and provide the logical plan to the server for execution.
Type: Grant
Filed: April 30, 2020
Date of Patent: September 7, 2021
Assignee: Databricks Inc.
Inventors: Srinath Shankar, Eric Keng-Hao Liang, Gregory George Owen
-
Patent number: 11068447
Abstract: A system for directory level atomic commits includes an interface and a processor. The interface is configured to receive an indication to provide a set of files. The processor is configured to determine whether a file in a directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted and provide the file as one file of the set of files in the event that the file in the directory has been either 1) atomically committed or 2) written by a non-atomic process and not designated as deleted.
Type: Grant
Filed: April 14, 2017
Date of Patent: July 20, 2021
Assignee: Databricks Inc.
Inventors: Eric Keng-hao Liang, Srinath Shankar, Shi Xin
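The visibility rule in this abstract reduces to a two-branch predicate per file. The sketch below follows the abstract's wording literally and is not the patented implementation; the `FileEntry` record and its fields are hypothetical, and how the real system tracks commit or deletion markers is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    name: str
    atomic: bool     # written as part of an atomic commit attempt
    committed: bool  # meaningful only when atomic is True
    deleted: bool    # designated as deleted


def is_visible(entry):
    """Visible if atomically committed, or non-atomic and not marked deleted."""
    if entry.atomic:
        return entry.committed
    return not entry.deleted


def list_files(directory):
    """Return the names of the files a reader should be given."""
    return [entry.name for entry in directory if is_visible(entry)]


directory = [
    FileEntry("part-0", atomic=True, committed=True, deleted=False),
    FileEntry("part-1", atomic=True, committed=False, deleted=False),
    FileEntry("legacy-0", atomic=False, committed=False, deleted=False),
    FileEntry("legacy-1", atomic=False, committed=False, deleted=True),
]
print(list_files(directory))
```

In this example `part-1` is hidden because its commit never completed, and `legacy-1` is hidden because it was designated as deleted.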
-
Patent number: 10810051
Abstract: The allocation system comprises an interface and a processor. The interface is configured to receive an indication to deactivate idle cluster machines of a set of cluster machines. The processor is configured to determine a set of tasks executing or pending on the set of cluster machines; determine a set of idle cluster machines of the set of cluster machines that are neither running one or more tasks of the set of tasks nor storing one or more intermediate data files of a set of intermediate data files, where the set of intermediate data files is associated with a set of tasks executing or pending on the cluster machines; and deactivate each cluster machine of the set of idle cluster machines.
Type: Grant
Filed: November 13, 2018
Date of Patent: October 20, 2020
Assignee: Databricks Inc.
Inventors: Srinath Shankar, Eric Keng-Hao Liang
-
Patent number: 10769130
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
Type: Grant
Filed: May 23, 2018
Date of Patent: September 8, 2020
Assignee: Databricks Inc.
Inventors: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
-
Patent number: 10691433
Abstract: A system for code development and execution includes a client interface and a client processor. The client interface is configured to receive user code for execution and receive an indication of a server that will perform the execution. The client processor is configured to parse the user code to identify one or more data items referred to during the execution. The client processor is also configured to provide the server with an inquiry for metadata regarding the one or more data items, receive the metadata regarding the one or more data items, determine a logical plan based at least in part on the metadata regarding the one or more data items; and provide the logical plan to the server for execution.
Type: Grant
Filed: August 31, 2018
Date of Patent: June 23, 2020
Assignee: Databricks Inc.
Inventors: Srinath Shankar, Eric Keng-hao Liang, Gregory George Owen
-
Patent number: 10678536
Abstract: A system for processing a notebook includes an input interface and a processor. The input interface is to receive a first notebook. The notebook comprises code for interactively querying and viewing data. The processor is to load the first notebook into a shell. The shell receives one or more parameters associated with the first notebook. The shell executes the first notebook using a cluster.
Type: Grant
Filed: April 8, 2019
Date of Patent: June 9, 2020
Assignee: Databricks Inc.
Inventors: Timothee Hunter, Ali Ghodsi, Ion Stoica
-
Patent number: 10606675
Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
Type: Grant
Filed: November 10, 2017
Date of Patent: March 31, 2020
Assignee: Databricks Inc.
Inventors: Alicja Luszczak, Srinath Shankar, Shi Xin
-
Patent number: 10558664
Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
Type: Grant
Filed: April 28, 2017
Date of Patent: February 11, 2020
Assignee: Databricks Inc.
Inventors: Michael Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
-
Patent number: 10474736
Abstract: A system for multiple views for a notebook includes an input interface and a processor. The input interface is to receive a notebook. The processor is to load the notebook into a shell, wherein the shell executes the notebook using a cluster, to receive an indication to view a dashboard associated with the notebook, and to provide dashboard display information. The dashboard includes a page layout display.
Type: Grant
Filed: December 22, 2015
Date of Patent: November 12, 2019
Assignee: Databricks Inc.
Inventors: Ion Stoica, Ali Ghodsi, Chaoyu Yang
-
Patent number: 10474501
Abstract: A system for cluster resource allocation includes an interface and a processor. The interface is configured to receive a process and input data. The processor is configured to determine an estimate for resources required for the process to process the input data; determine existing available resources in a cluster for running the process; determine whether the existing available resources are sufficient for running the process; in the event it is determined that the existing available resources are not sufficient for running the process, indicate to add new resources; determine an allocated share of resources in the cluster for running the process; and cause execution of the process using the share of resources.
Type: Grant
Filed: April 28, 2017
Date of Patent: November 12, 2019
Assignee: Databricks Inc.
Inventors: Ali Ghodsi, Srinath Shankar, Sameer Paranjpye, Shi Xin, Matei Zaharia
-
Patent number: 10361928
Abstract: A system for cluster management comprises a status monitor and an instance replacement manager. The status monitor is for monitoring status of an instance of a set of instances on a cluster provider. The instance replacement manager is for determining a replacement strategy for the instance in the event the instance does not respond. The replacement strategy for the instance is based at least in part on a management criteria for on-demand instances and spot instances on the cluster provider.
Type: Grant
Filed: August 21, 2017
Date of Patent: July 23, 2019
Assignee: Databricks Inc.
Inventors: Ali Ghodsi, Ion Stoica, Matei Zaharia
-
Patent number: 10296329
Abstract: A system for processing a notebook includes an input interface and a processor. The input interface is to receive a first notebook. The notebook comprises code for interactively querying and viewing data. The processor is to load the first notebook into a shell. The shell receives one or more parameters associated with the first notebook. The shell executes the first notebook using a cluster.
Type: Grant
Filed: November 3, 2017
Date of Patent: May 21, 2019
Assignee: Databricks Inc.
Inventors: Timothee Hunter, Ali Ghodsi, Ion Stoica
-
Patent number: 10095735
Abstract: A system for exploring data in a database comprises a query parser, a parameter manager, a query submitter, and a result formatter. The query parser is to receive a base query and determine an input parameter from the base query. The parameter manager is to provide a first request for a value for the input parameter; receive the value for the input parameter; and provide a second request for the value for the input parameter. The query submitter is to determine a first query using the base query and the value for the input parameter; and provide an indication to execute the first query. The result formatter is to receive a result associated with the indication to execute the first query.
Type: Grant
Filed: August 11, 2017
Date of Patent: October 9, 2018
Assignee: Databricks Inc.
Inventors: Ali Ghodsi, Ion Stoica, Matei Zaharia
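The parser and submitter roles in this abstract can be pictured as placeholder extraction followed by substitution. The sketch below is an illustration only: the `${name}` placeholder syntax, the function names, and the sample query are assumptions, since the abstract does not specify how parameters are marked in the base query.

```python
import re

# Assumed placeholder syntax: ${name}. The patent abstract does not specify one.
PARAM = re.compile(r"\$\{(\w+)\}")


def find_parameters(base_query):
    """Return placeholder names in order of first appearance."""
    seen = []
    for name in PARAM.findall(base_query):
        if name not in seen:
            seen.append(name)
    return seen


def bind(base_query, values):
    """Build a concrete query by substituting a value for each placeholder."""
    return PARAM.sub(lambda match: str(values[match.group(1)]), base_query)


base = "SELECT * FROM sales WHERE year = ${year} AND region = '${region}'"
params = find_parameters(base)
query = bind(base, {"year": 2017, "region": "EMEA"})
print(params)
print(query)
```

Plain text substitution like this performs no escaping, so a real system would need to validate or safely bind the values before executing the query.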
-
Patent number: 9990230
Abstract: A system for scheduling a notebook execution includes an interface and a processor. The interface is to receive an indication to schedule a notebook for execution, wherein the indication comprises a scheduled time and a cluster. The processor is to determine whether it is the scheduled time; and in the event that it is the scheduled time: determine whether the cluster is running; and in the event that the cluster is not running, set up the cluster and cause the notebook to execute using the cluster.
Type: Grant
Filed: February 24, 2016
Date of Patent: June 5, 2018
Assignee: Databricks Inc.
Inventors: Ion Stoica, Yandong Mao, Eric Liang
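The control flow in this abstract (check the time, check the cluster, start it if needed, then run) can be sketched in a few lines. This is a minimal illustration under assumptions, not the patented implementation: the `Cluster` class, `run_if_due`, and the `run_notebook` callback are hypothetical names.

```python
from datetime import datetime


class Cluster:
    """Hypothetical cluster handle that only tracks whether it is running."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True


def run_if_due(now, scheduled_time, cluster, run_notebook):
    """Run the notebook once the scheduled time is reached.

    Starts the cluster first if it is not already running.
    Returns True if the notebook was executed.
    """
    if now < scheduled_time:
        return False
    if not cluster.running:
        cluster.start()
    run_notebook(cluster)
    return True


cluster = Cluster()
runs = []
scheduled = datetime(2016, 2, 24, 9, 0)
early = run_if_due(datetime(2016, 2, 24, 8, 59), scheduled, cluster, runs.append)
on_time = run_if_due(datetime(2016, 2, 24, 9, 0), scheduled, cluster, runs.append)
print(early, on_time, cluster.running)
```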