Patents by Inventor Matei Zaharia

Matei Zaharia has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12353445
    Abstract: The present application discloses a method, system, and computer system for managing a plurality of features and storing lineage information pertaining to the features. The method includes obtaining one or more datasets, determining a first feature, wherein the first feature is determined based at least in part on the one or more datasets, and storing the first feature in a feature store. The first feature is stored in association with a dataset indication of the one or more datasets from which the first feature is determined. The feature store comprises a plurality of features.
    Type: Grant
    Filed: October 29, 2021
    Date of Patent: July 8, 2025
    Assignee: Databricks, Inc.
    Inventors: Mani Parkhe, Clemens Mewald, Matei Zaharia, Avesh Singh
  • Patent number: 12353446
    Abstract: The present application discloses a method, system, and computer system for managing a plurality of features and storing lineage information pertaining to the features. The method includes obtaining one or more datasets, determining a first feature, wherein the first feature is determined based at least in part on the one or more datasets, and storing the first feature in a feature store. The first feature is stored in association with a dataset indication of the one or more datasets from which the first feature is determined. The feature store comprises a plurality of features.
    Type: Grant
    Filed: January 31, 2023
    Date of Patent: July 8, 2025
    Assignee: Databricks, Inc.
    Inventors: Mani Parkhe, Clemens Mewald, Matei Zaharia, Avesh Singh
  • Publication number: 20250200198
    Abstract: The present application discloses a method, system, and computer system for providing access to information stored on a system for data storage. The method includes receiving a data request from a user, determining data corresponding to the data request, determining whether the user has requisite permissions to access the data, and in response to determining that the user has requisite permissions to access the data: determining a manner by which to provide access to the data, wherein the data comprises a filtered subset of stored data, and generating a token based at least in part on the user and the manner by which access to the data is to be provided.
    Type: Application
    Filed: March 6, 2025
    Publication date: June 19, 2025
    Inventors: Matei Zaharia, David Lewis, Cheng Lian, Yuchen Huo, Ali Ghodsi
  • Publication number: 20250131118
    Abstract: The present application discloses a method, system, and computer system for providing access to data. The method includes receiving, by a data manager service from a data requesting service, a request using an identifier for a high-level data object to access a set of data associated with the high-level data object, determining, by the data manager service, low-level data object(s) corresponding to the set of data based on the identifier for the high-level data object, determining whether a user associated with the request has permission to access at least a subset of the low-level data object(s), and in response to determining that the user associated with the request has permission to access the at least the subset of the low-level data object(s), generating, by the data manager service, a uniform resource locator (URL) via which the at least the subset of the one or more low-level data objects is accessible by the user.
    Type: Application
    Filed: November 25, 2024
    Publication date: April 24, 2025
    Inventors: Matei Zaharia, Shixiong Zhu, Xiaotong Sun, Ramesh Chandra, Michael Paul Armbrust, Ali Ghodsi
  • Patent number: 12277237
    Abstract: The present application discloses a method, system, and computer system for providing access to information stored on a system for data storage. The method includes receiving a data request from a user, determining data corresponding to the data request, determining whether the user has requisite permissions to access the data, and in response to determining that the user has requisite permissions to access the data: determining a manner by which to provide access to the data, wherein the data comprises a filtered subset of stored data, and generating a token based at least in part on the user and the manner by which access to the data is to be provided.
    Type: Grant
    Filed: October 29, 2021
    Date of Patent: April 15, 2025
    Assignee: Databricks, Inc.
    Inventors: Matei Zaharia, David Lewis, Cheng Lian, Yuchen Huo, Ali Ghodsi
  • Publication number: 20250086177
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Application
    Filed: June 17, 2024
    Publication date: March 13, 2025
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Patent number: 12182292
    Abstract: The present application discloses a method, system, and computer system for providing access to data. The method includes receiving, by a data manager service from a data requesting service, a request using an identifier for a high-level data object to access a set of data associated with the high-level data object, determining, by the data manager service, low-level data object(s) corresponding to the set of data based on the identifier for the high-level data object, determining whether a user associated with the request has permission to access at least a subset of the low-level data object(s), and in response to determining that the user associated with the request has permission to access the at least the subset of the low-level data object(s), generating, by the data manager service, a uniform resource locator (URL) via which the at least the subset of the one or more low-level data objects is accessible by the user.
    Type: Grant
    Filed: January 31, 2023
    Date of Patent: December 31, 2024
    Assignee: Databricks, Inc.
    Inventors: Matei Zaharia, Shixiong Zhu, Xiaotong Sun, Ramesh Chandra, Michael Paul Armbrust, Ali Ghodsi
  • Publication number: 20240412095
    Abstract: A system performs training and execution of machine learning models that use on-demand features using feature functions. The system receives commands for registering metadata associated with a machine learning model. The machine learning model may process a set of features including on-demand features as well as other features such as batch features. The system executes the command by storing an association between the machine learning model and the feature functions associated with any on-demand features processed by the machine learning model. The feature functions are executed using an end point of a data asset service. The use of the data asset service for invoking the feature functions ensures that the same set of instructions is executed during model training and model inferencing, thereby avoiding model skew.
    Type: Application
    Filed: June 6, 2023
    Publication date: December 12, 2024
    Inventors: Matei Zaharia, Avesh Singh, Mani Parkhe, Maxim Lukiyanov, Xiangrui Meng, Aakrati Talati, Chenen Liang, Kasey Uhlenhuth
  • Patent number: 12147555
    Abstract: The present application discloses a method, system, and computer system for providing access to data. The method includes receiving, by a data manager service from a data requesting service, a request using an identifier for a high-level data object to access a set of data associated with the high-level data object, determining, by the data manager service, low-level data object(s) corresponding to the set of data based on the identifier for the high-level data object, determining whether a user associated with the request has permission to access at least a subset of the low-level data object(s), and in response to determining that the user associated with the request has permission to access the at least the subset of the low-level data object(s), generating, by the data manager service, a uniform resource locator (URL) via which the at least the subset of the one or more low-level data objects is accessible by the user.
    Type: Grant
    Filed: April 29, 2022
    Date of Patent: November 19, 2024
    Assignee: Databricks, Inc.
    Inventors: Matei Zaharia, Shixiong Zhu, Xiaotong Sun, Ramesh Chandra, Michael Paul Armbrust, Ali Ghodsi
  • Patent number: 12032573
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Grant
    Filed: October 28, 2022
    Date of Patent: July 9, 2024
    Assignee: Databricks, Inc.
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Publication number: 20230177072
    Abstract: The present application discloses a method, system, and computer system for managing a plurality of features and storing lineage information pertaining to the features. The method includes obtaining one or more datasets, determining a first feature, wherein the first feature is determined based at least in part on the one or more datasets, and storing the first feature in a feature store. The first feature is stored in association with a dataset indication of the one or more datasets from which the first feature is determined. The feature store comprises a plurality of features.
    Type: Application
    Filed: January 31, 2023
    Publication date: June 8, 2023
    Inventors: Mani Parkhe, Clemens Mewald, Matei Zaharia, Avesh Singh
  • Publication number: 20230141556
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Application
    Filed: October 28, 2022
    Publication date: May 11, 2023
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Patent number: 11514045
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Grant
    Filed: December 19, 2019
    Date of Patent: November 29, 2022
    Assignee: Databricks, Inc.
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Publication number: 20220374532
    Abstract: The present application discloses a method, system, and computer system for providing access to information stored on a system for data storage. The method includes receiving a data request from a user, determining data corresponding to the data request, determining whether the user has requisite permissions to access the data, and in response to determining that the user has requisite permissions to access the data: determining a manner by which to provide access to the data, wherein the data comprises a filtered subset of stored data, and generating a token based at least in part on the user and the manner by which access to the data is to be provided.
    Type: Application
    Filed: October 29, 2021
    Publication date: November 24, 2022
    Inventors: Matei Zaharia, David Lewis, Cheng Lian, Yuchen Huo, Ali Ghodsi
  • Publication number: 20220374457
    Abstract: The present application discloses a method, system, and computer system for managing a plurality of features and storing lineage information pertaining to the features. The method includes obtaining one or more datasets, determining a first feature, wherein the first feature is determined based at least in part on the one or more datasets, and storing the first feature in a feature store. The first feature is stored in association with a dataset indication of the one or more datasets from which the first feature is determined. The feature store comprises a plurality of features.
    Type: Application
    Filed: October 29, 2021
    Publication date: November 24, 2022
    Inventors: Mani Parkhe, Clemens Mewald, Matei Zaharia, Avesh Singh
  • Publication number: 20200257689
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Application
    Filed: December 19, 2019
    Publication date: August 13, 2020
    Inventors: Michael Paul Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Patent number: 10558664
    Abstract: A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: February 11, 2020
    Assignee: Databricks, Inc.
    Inventors: Michael Armbrust, Tathagata Das, Shi Xin, Matei Zaharia
  • Patent number: 10474501
    Abstract: A system for cluster resource allocation includes an interface and a processor. The interface is configured to receive a process and input data. The processor is configured to determine an estimate for resources required for the process to process the input data; determine existing available resources in a cluster for running the process; determine whether the existing available resources are sufficient for running the process; in the event it is determined that the existing available resources are not sufficient for running the process, indicate to add new resources; determine an allocated share of resources in the cluster for running the process; and cause execution of the process using the share of resources.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: November 12, 2019
    Assignee: Databricks, Inc.
    Inventors: Ali Ghodsi, Srinath Shankar, Sameer Paranjpye, Shi Xin, Matei Zaharia
  • Patent number: 10361928
    Abstract: A system for cluster management comprises a status monitor and an instance replacement manager. The status monitor is for monitoring status of an instance of a set of instances on a cluster provider. The instance replacement manager is for determining a replacement strategy for the instance in the event the instance does not respond. The replacement strategy for the instance is based at least in part on management criteria for on-demand instances and spot instances on the cluster provider.
    Type: Grant
    Filed: August 21, 2017
    Date of Patent: July 23, 2019
    Assignee: Databricks, Inc.
    Inventors: Ali Ghodsi, Ion Stoica, Matei Zaharia
  • Publication number: 20180314556
    Abstract: A system for cluster resource allocation includes an interface and a processor. The interface is configured to receive a process and input data. The processor is configured to determine an estimate for resources required for the process to process the input data; determine existing available resources in a cluster for running the process; determine whether the existing available resources are sufficient for running the process; in the event it is determined that the existing available resources are not sufficient for running the process, indicate to add new resources; determine an allocated share of resources in the cluster for running the process; and cause execution of the process using the share of resources.
    Type: Application
    Filed: April 28, 2017
    Publication date: November 1, 2018
    Inventors: Ali Ghodsi, Srinath Shankar, Sameer Paranjpye, Shi Xin, Matei Zaharia
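The feature-store entries above (patents 12353445 and 12353446, and the related applications) describe storing each computed feature together with an indication of the datasets it was derived from. The following is a toy, in-memory sketch of that lineage idea, not the claimed implementation; the `FeatureStore` class, its methods, and the list-of-dicts dataset format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

# Hypothetical dataset format: a list of dict rows.
Dataset = List[dict]


@dataclass
class FeatureRecord:
    name: str
    values: list
    source_datasets: FrozenSet[str]  # lineage: datasets this feature was derived from


@dataclass
class FeatureStore:
    features: Dict[str, FeatureRecord] = field(default_factory=dict)

    def register(self, name: str, datasets: Dict[str, Dataset],
                 compute: Callable[[Dict[str, Dataset]], list]) -> FeatureRecord:
        # Determine the feature from the input datasets...
        values = compute(datasets)
        # ...and store it together with an indication of which datasets were used.
        record = FeatureRecord(name, values, frozenset(datasets))
        self.features[name] = record
        return record

    def lineage(self, name: str) -> FrozenSet[str]:
        return self.features[name].source_datasets

    def features_derived_from(self, dataset: str) -> List[str]:
        # Reverse lookup: which stored features depend on a given dataset?
        return sorted(n for n, r in self.features.items() if dataset in r.source_datasets)


store = FeatureStore()
orders = [{"user": 1, "amount": 30.0}, {"user": 1, "amount": 10.0}]
store.register("total_spend", {"orders": orders},
               lambda ds: [sum(r["amount"] for r in ds["orders"])])
```

Keeping lineage as a set of dataset names makes the reverse query ("what breaks if `orders` changes?") a simple scan; a real store would index it.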
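The data-access entries above (patent 12277237 and applications 20220374532 and 20250200198) describe checking a user's permissions and then issuing a token that encodes both the user and the manner of access, where the data is a filtered subset of what is stored. A minimal sketch of that flow follows, assuming an HMAC-signed JSON token and hypothetical `PERMISSIONS` and `ROW_FILTERS` tables; it is an illustration of the general pattern, not the patented design.

```python
import base64
import hashlib
import hmac
import json
from typing import Dict, List, Optional, Set

SECRET = b"demo-signing-key"  # hypothetical server-side key; never hard-code in practice

# Hypothetical permission table and per-user row filters.
PERMISSIONS: Dict[str, Set[str]] = {"alice": {"sales"}}
ROW_FILTERS: Dict[str, Dict[str, str]] = {"alice": {"region": "EU"}}


def issue_token(user: str, table: str) -> Optional[str]:
    """Return a signed token describing how `user` may read `table`, or None if denied."""
    if table not in PERMISSIONS.get(user, set()):
        return None
    # The "manner of access": read-only, restricted to a filtered subset of rows.
    grant = {"user": user, "table": table, "mode": "read",
             "filter": ROW_FILTERS.get(user, {})}
    payload = json.dumps(grant, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig


def read_with_token(token: str, rows: List[dict]) -> List[dict]:
    encoded, sig = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(encoded)
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid token")
    grant = json.loads(payload)
    # Enforce the filter embedded in the token so only the permitted subset is returned.
    return [r for r in rows if all(r.get(k) == v for k, v in grant["filter"].items())]


rows = [{"region": "EU", "amount": 5}, {"region": "US", "amount": 7}]
token = issue_token("alice", "sales")
```

Because the filter travels inside the signed token, the reader of the data does not need to consult the permission tables again; it only verifies the signature.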
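The data-manager entries above (patents 12147555 and 12182292, and application 20250131118) describe resolving a high-level object identifier (such as a table name) to low-level storage objects, checking permissions on those objects, and returning a URL for the permitted subset. The sketch below illustrates that sequence with a hypothetical catalog, hypothetical grants, and a made-up HMAC "presigned" URL format on `storage.example.com`; it does not reproduce any real cloud provider's signing scheme.

```python
import hashlib
import hmac
import time
from typing import Dict, List, Optional, Set
from urllib.parse import urlencode

SIGNING_KEY = b"demo-key"  # hypothetical

# Hypothetical catalog: high-level table identifier -> low-level storage objects.
CATALOG: Dict[str, List[str]] = {
    "main.sales": ["s3://bucket/sales/part-0.parquet", "s3://bucket/sales/part-1.parquet"],
}
# Hypothetical grants: user -> low-level objects that user may read.
GRANTS: Dict[str, Set[str]] = {"alice": {"s3://bucket/sales/part-0.parquet"}}


def presign(obj: str, ttl_s: int = 900, now: Optional[float] = None) -> str:
    # Made-up URL scheme: an expiry timestamp plus an HMAC over (object, expiry).
    expires = int((now if now is not None else time.time()) + ttl_s)
    sig = hmac.new(SIGNING_KEY, f"{obj}|{expires}".encode(), hashlib.sha256).hexdigest()
    path = obj.replace("s3://", "https://storage.example.com/", 1)
    return f"{path}?{urlencode({'expires': expires, 'sig': sig})}"


def urls_for(user: str, table_id: str) -> List[str]:
    low_level = CATALOG.get(table_id, [])                                # resolve identifier
    allowed = [o for o in low_level if o in GRANTS.get(user, set())]    # permission check
    if not allowed:
        raise PermissionError(f"{user} may not read {table_id}")
    return [presign(o) for o in allowed]                                # URL per permitted object


urls = urls_for("alice", "main.sales")
```

Here the requesting service never sees the storage credentials; it receives only short-lived URLs for the objects the user may read.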
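The streaming-query entries above (patents 10558664, 11514045 and 12032573, and the related applications) describe a physical plan made of an ordered set of operators, each carrying an input mode and an output mode. The following sketch shows one way such modes could be used: checking that adjacent operators agree, then running micro-batches through a stateful aggregation. The mode names are borrowed from Spark Structured Streaming's append and complete output modes as an assumption; the operator classes and helpers are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Mode(Enum):
    APPEND = "append"      # only rows that are new since the last batch
    COMPLETE = "complete"  # the full result, re-emitted every batch


@dataclass
class Operator:
    name: str
    input_mode: Mode
    output_mode: Mode
    fn: Callable[[list], list]


def check_plan(plan: List[Operator]) -> List[Operator]:
    """Reject plans where an operator's output mode does not match its consumer's input mode."""
    for up, down in zip(plan, plan[1:]):
        if up.output_mode is not down.input_mode:
            raise ValueError(f"{up.name} emits {up.output_mode.value}, "
                             f"but {down.name} expects {down.input_mode.value}")
    return plan


def make_running_count() -> Callable[[list], list]:
    state: Dict[str, int] = {}  # counts persisted across micro-batches

    def count(rows: list) -> list:
        for r in rows:
            state[r] = state.get(r, 0) + 1
        return sorted(state.items())  # COMPLETE output: the whole table each batch

    return count


def run_batch(plan: List[Operator], rows: list) -> list:
    for op in plan:
        rows = op.fn(rows)
    return rows


plan = check_plan([
    Operator("drop_empty", Mode.APPEND, Mode.APPEND, lambda rows: [r for r in rows if r]),
    Operator("running_count", Mode.APPEND, Mode.COMPLETE, make_running_count()),
])
first = run_batch(plan, ["a", "b", "", "a"])
second = run_batch(plan, ["b"])
```

The aggregation consumes appended rows but emits a complete result, which is why its input and output modes differ; a downstream sink expecting append-only input would be rejected by `check_plan`.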
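Application 20240412095 above describes associating a model with the feature functions for its on-demand features, so that the same instructions run during training and inference. A minimal sketch of that idea follows; the in-process dictionaries are hypothetical stand-ins for the data asset service endpoint and model registry the abstract mentions, and the `distance_km` feature is a made-up example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for a shared function service and a model registry.
FEATURE_FUNCTIONS: Dict[str, Callable[[dict], float]] = {}
MODEL_FEATURES: Dict[str, List[str]] = {}


def feature_function(name: str):
    """Register a function once so training and serving both call the same code."""
    def register(fn: Callable[[dict], float]) -> Callable[[dict], float]:
        FEATURE_FUNCTIONS[name] = fn
        return fn
    return register


@feature_function("distance_km")
def distance_km(request: dict) -> float:
    # Toy on-demand feature: crude latitude-only distance, computed at request time.
    return abs(request["lat"] - request["store_lat"]) * 111.0


def log_model(model: str, on_demand_features: List[str]) -> None:
    # Store the model -> feature-function association with the model metadata.
    MODEL_FEATURES[model] = list(on_demand_features)


def assemble_row(model: str, request: dict, batch_features: dict) -> dict:
    # Used unchanged for both training and inference, so the logic cannot drift.
    row = dict(batch_features)
    for name in MODEL_FEATURES[model]:
        row[name] = FEATURE_FUNCTIONS[name](request)
    return row


log_model("eta_model", ["distance_km"])
request = {"lat": 1.0, "store_lat": 0.5}
training_row = assemble_row("eta_model", request, {"avg_prep_min": 12.0})
serving_row = assemble_row("eta_model", request, {"avg_prep_min": 12.0})
```

Skew arises when training and serving compute a feature with two separate code paths; routing both through one registered function removes that second path.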
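The cluster resource allocation entries above (patent 10474501 and application 20180314556) list a sequence of steps: estimate the resources a process needs for its input, compare with what the cluster has available, indicate that resources should be added if they fall short, then allocate a share and run. The sketch below follows that sequence under loudly simplified assumptions: a hypothetical linear `CORES_PER_GB` sizing rule, new capacity that appears immediately, and a hypothetical `max_share` cap per process.

```python
import math
from dataclasses import dataclass
from typing import Tuple

CORES_PER_GB = 2  # hypothetical sizing heuristic


@dataclass
class Cluster:
    total_cores: int
    busy_cores: int

    @property
    def available(self) -> int:
        return self.total_cores - self.busy_cores


def estimate_cores(input_gb: float) -> int:
    # Step 1: estimate resources required for this input.
    return max(1, math.ceil(input_gb * CORES_PER_GB))


def allocate(cluster: Cluster, input_gb: float, max_share: float = 0.5) -> Tuple[int, int]:
    """Return (cores_allocated, cores_added)."""
    needed = estimate_cores(input_gb)
    # Steps 2-3: compare with available resources; request more if insufficient.
    to_add = max(0, needed - cluster.available)
    if to_add:
        cluster.total_cores += to_add  # simulated: new capacity is available at once
    # Step 4: allocate a share, capped so one process cannot take the whole cluster.
    share = min(needed, math.floor(cluster.total_cores * max_share))
    cluster.busy_cores += share
    return share, to_add


busy = Cluster(total_cores=16, busy_cores=10)
idle = Cluster(total_cores=64, busy_cores=0)
```

For example, a 5 GB input on the `busy` cluster needs 10 cores but only 6 are free, so 4 are added and 10 are allocated.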
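Patent 10361928 above pairs a status monitor with an instance replacement manager that picks a replacement strategy, based on criteria for on-demand versus spot instances, when an instance stops responding. The following sketch illustrates one plausible policy under stated assumptions: heartbeat timestamps stand in for the status monitor, and the hypothetical `Criteria` (a spot-price ceiling and an "always keep the driver on-demand" rule) stands in for the management criteria.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Market(Enum):
    ON_DEMAND = "on-demand"
    SPOT = "spot"


@dataclass
class Instance:
    id: str
    market: Market
    is_driver: bool = False


@dataclass
class Criteria:  # hypothetical management criteria
    max_spot_price: float
    driver_on_demand: bool = True


def replacement_for(inst: Instance, responding: bool, spot_price: float,
                    criteria: Criteria) -> Optional[Market]:
    if responding:
        return None  # healthy: nothing to replace
    if inst.is_driver and criteria.driver_on_demand:
        return Market.ON_DEMAND  # keep the critical node off the spot market
    if spot_price <= criteria.max_spot_price:
        return Market.SPOT
    return Market.ON_DEMAND  # spot too expensive: fall back to on-demand


def plan_replacements(instances: List[Instance], heartbeats: Dict[str, float], now: float,
                      timeout_s: float, spot_price: float,
                      criteria: Criteria) -> Dict[str, Market]:
    plan: Dict[str, Market] = {}
    for inst in instances:
        # Status monitor: an instance with no recent heartbeat counts as unresponsive.
        alive = now - heartbeats.get(inst.id, float("-inf")) <= timeout_s
        choice = replacement_for(inst, alive, spot_price, criteria)
        if choice is not None:
            plan[inst.id] = choice
    return plan


fleet = [Instance("driver", Market.ON_DEMAND, is_driver=True),
         Instance("w1", Market.SPOT), Instance("w2", Market.SPOT)]
beats = {"driver": 95.0, "w1": 50.0}
crit = Criteria(max_spot_price=0.10)
```

With `now=100.0` and a 30-second timeout, the driver counts as alive while `w1` (stale heartbeat) and `w2` (no heartbeat) are replaced on the spot market as long as the price stays under the ceiling.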