Patents by Inventor Tathagata Das

Tathagata Das has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12360806
    Abstract: The present application discloses a method, system, and computer system for automatically scaling task-processing capacity. The method includes obtaining, at a data layer, a current measure of queued tasks and/or a task-processing capacity, obtaining, by one or more processors, a cost-prioritized criterion or a latency-prioritized criterion, determining a set of tasks to process using the task-processing capacity based at least in part on a set of input data to process, and automatically scaling the task-processing capacity based at least in part on the current measure of queued tasks and/or the task-processing capacity and either (i) the cost-prioritized criterion or (ii) the latency-prioritized criterion.
    Type: Grant
    Filed: April 25, 2022
    Date of Patent: July 15, 2025
    Assignee: Databricks, Inc.
    Inventors: Andreas Neumann, Kiavash Kianfar, Li Zhang, Tathagata Das
  • Patent number: 12346330
    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, and obtaining other resulting files based at least in part on a second set of unmatched rows among the target table and the source table that results from the first set of unmatched rows having been processed in the second job, and obtaining a resulting table based on (i) second job resulting file(s), and (ii) other resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a first matching action based on matched rows and a second matching action based on a subset of unmatched rows.
    Type: Grant
    Filed: August 25, 2022
    Date of Patent: July 1, 2025
    Assignee: Databricks, Inc.
    Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel
  • Publication number: 20250086177
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Application
    Filed: June 17, 2024
    Publication date: March 13, 2025
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Publication number: 20250061132
    Abstract: A data processing service performs a rebalancing process for rebalancing stateful tasks on a cluster computing system. In one instance, the method for rebalancing stateful tasks is performed such that the per-operator partitions are spread across available executors of a cluster of the cluster computing system with respect to one or more statistics of the tasks. In one instance, the method for rebalancing stateful tasks is also performed such that the total number of stateful tasks is balanced per executor as long as this rebalancing does not imbalance the per-operator placements. In this way, the processing of stateful tasks can be spread across multiple executors in a relatively uniform manner, even though there may be an upfront cost of breaking the local caching on an executor.
    Type: Application
    Filed: August 30, 2024
    Publication date: February 20, 2025
    Inventors: Alexander Balikov, Tathagata Das, Karthikeyan Ramasamy
  • Publication number: 20250013644
    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).
    Type: Application
    Filed: July 10, 2024
    Publication date: January 9, 2025
    Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain
  • Patent number: 12099525
    Abstract: A data processing service performs a rebalancing process for rebalancing stateful tasks on a cluster computing system. In one instance, the method for rebalancing stateful tasks is performed such that the per-operator partitions are spread across available executors of a cluster of the cluster computing system with respect to one or more statistics of the tasks. In one instance, the method for rebalancing stateful tasks is also performed such that the total number of stateful tasks is balanced per executor as long as this rebalancing does not imbalance the per-operator placements. In this way, the processing of stateful tasks can be spread across multiple executors in a relatively uniform manner, even though there may be an upfront cost of breaking the local caching on an executor.
    Type: Grant
    Filed: July 7, 2023
    Date of Patent: September 24, 2024
    Assignee: Databricks, Inc.
    Inventors: Alexander Balikov, Tathagata Das, Karthikeyan Ramasamy
  • Patent number: 12079167
    Abstract: The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a bin threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.
    Type: Grant
    Filed: January 6, 2023
    Date of Patent: September 3, 2024
    Assignee: Databricks, Inc.
    Inventors: Rahul Shivu Mahadev, Burak Yavuz, Tathagata Das
  • Patent number: 12056126
    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).
    Type: Grant
    Filed: August 25, 2022
    Date of Patent: August 6, 2024
    Assignee: Databricks, Inc.
    Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain
  • Patent number: 12045220
    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, persisting, in one or more deletion vector files, one or more deletion vectors for corresponding rows of the one or more target table files, and obtaining a resulting table based at least in part on the second job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and one or more deletion vectors associated with previously removed rows of the matching target table files and obtaining the second job resulting file(s).
    Type: Grant
    Filed: August 25, 2022
    Date of Patent: July 23, 2024
    Assignee: Databricks, Inc.
    Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Chirstos Stavrakakis
  • Patent number: 12032573
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Grant
    Filed: October 28, 2022
    Date of Patent: July 9, 2024
    Assignee: Databricks, Inc.
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Publication number: 20240202211
    Abstract: A data processing service performs a rebalancing process for rebalancing stateful tasks on a cluster computing system. In one instance, the method for rebalancing stateful tasks is performed such that the per-operator partitions are spread across available executors of a cluster of the cluster computing system with respect to one or more statistics of the tasks. In one instance, the method for rebalancing stateful tasks is also performed such that the total number of stateful tasks is balanced per executor as long as this rebalancing does not imbalance the per-operator placements. In this way, the processing of stateful tasks can be spread across multiple executors in a relatively uniform manner, even though there may be an upfront cost of breaking the local caching on an executor.
    Type: Application
    Filed: July 7, 2023
    Publication date: June 20, 2024
    Inventors: Alexander Balikov, Tathagata Das, Karthikeyan Ramasamy
  • Publication number: 20240070155
    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, and obtaining other resulting files based at least in part on a second set of unmatched rows among the target table and the source table that results from the first set of unmatched rows having been processed in the second job, and obtaining a resulting table based on (i) second job resulting file(s), and (ii) other resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a first matching action based on matched rows and a second matching action based on a subset of unmatched rows.
    Type: Application
    Filed: August 25, 2022
    Publication date: February 29, 2024
    Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel
  • Publication number: 20240070153
    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).
    Type: Application
    Filed: August 25, 2022
    Publication date: February 29, 2024
    Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel, Prakhar Jain
  • Publication number: 20240070138
    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, persisting, in one or more deletion vector files, one or more deletion vectors for corresponding rows of the one or more target table files, and obtaining a resulting table based at least in part on the second job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and one or more deletion vectors associated with previously removed rows of the matching target table files and obtaining the second job resulting file(s).
    Type: Application
    Filed: August 25, 2022
    Publication date: February 29, 2024
    Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Chirstos Stavrakakis
  • Publication number: 20240069863
    Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first, second, and third jobs, and obtaining a resulting table based at least in part on the second job resulting file(s) and third job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s). Performing the third job includes determining unmatched rows for target table files and storing the unmatched rows in third job resulting file(s).
    Type: Application
    Filed: August 25, 2022
    Publication date: February 29, 2024
    Inventors: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel
  • Publication number: 20230141556
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Application
    Filed: October 28, 2022
    Publication date: May 11, 2023
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Patent number: 11567900
    Abstract: The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a bin threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.
    Type: Grant
    Filed: July 23, 2021
    Date of Patent: January 31, 2023
    Assignee: Databricks, Inc.
    Inventors: Rahul Shivu Mahadev, Burak Yavuz, Tathagata Das
  • Patent number: 11514045
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Grant
    Filed: December 19, 2019
    Date of Patent: November 29, 2022
    Assignee: Databricks Inc.
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Publication number: 20200257689
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Application
    Filed: December 19, 2019
    Publication date: August 13, 2020
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Patent number: 10558664
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: February 11, 2020
    Assignee: Databricks Inc.
    Inventors: Michael Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
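
Illustrative sketch (not taken from any patent text): the abstract shared by patent numbers 12079167 and 11567900 describes assigning files to bins, closing a bin when the next file would push it past a bin threshold, and handing off a batch of closed bins once a measure of the batch exceeds a batch threshold. The Python below is one possible reading of those steps under stated assumptions. The function name batch_bins, the use of file size in bytes as the bin measure, the use of total bytes as the batch measure, and the final flush of leftover bins are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import Iterable, Iterator, List, Tuple

# (file name, size in bytes). Using byte size as the measure is an assumption.
FileInfo = Tuple[str, int]


def batch_bins(
    files: Iterable[FileInfo],
    bin_threshold: int,
    batch_threshold: int,
) -> Iterator[List[List[str]]]:
    """Yield batches of bins, where each bin is a list of file names."""
    current_bin: List[str] = []
    current_size = 0
    batch: List[List[str]] = []
    batch_size = 0

    for name, size in files:
        # Would adding this file push the current bin past the bin threshold?
        # Assumption: an empty bin always accepts the file, even an oversized one.
        if current_bin and current_size + size > bin_threshold:
            # Close the current bin and add it to the batch of bins.
            batch.append(current_bin)
            batch_size += current_size
            current_bin, current_size = [], 0
            # Hand off the batch once its measure exceeds the batch threshold.
            if batch_size > batch_threshold:
                yield batch
                batch, batch_size = [], 0
        # Associate the file with the current (possibly new) bin.
        current_bin.append(name)
        current_size += size

    # Assumption: leftover bins are flushed as a final batch at end of input.
    if current_bin:
        batch.append(current_bin)
    if batch:
        yield batch


if __name__ == "__main__":
    sample = [("a", 40), ("b", 50), ("c", 30), ("d", 80), ("e", 10)]
    for ready in batch_bins(sample, bin_threshold=100, batch_threshold=100):
        print(ready)
```

Writing the routine as a generator lets a caller begin processing each batch as soon as it is ready, without waiting for the full file listing to be scanned; whether the patented system works this way is not stated in the abstract.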