Patents by Inventor Orestis KOSTAKIS

Orestis KOSTAKIS has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240119051
    Abstract: The subject technology receives a query directed to a set of source tables, each source table organized into a set of micro-partitions. The subject technology determines a set of metadata, the set of metadata comprising table metadata, query metadata, and historical data related to the query. The subject technology predicts, using a machine learning model, an indicator of an amount of computing resources for executing the query based at least in part on the set of metadata. The subject technology generates a query plan for executing the query based at least in part on the predicted indicator of the amount of computing resources. The subject technology executes the query based at least in part on the query plan.
    Type: Application
    Filed: December 19, 2023
    Publication date: April 11, 2024
    Inventors: Qiming Jiang, Orestis Kostakis
  • Patent number: 11947533
    Abstract: A method includes parsing, by at least one hardware processor, a query to determine query comments and query code associated with the query. A query execution plan is generated based on the query code. Query execution using the query code is performed at a first computing node associated with a query processing pipeline. A detection is made that the query comments are indicative of a software bug in the query code based on analysis of the query comments. The detection is performed at a second computing node associated with a query analysis pipeline. A notification of the software bug and a result of the query execution is output.
    Type: Grant
    Filed: May 16, 2023
    Date of Patent: April 2, 2024
    Assignee: Snowflake Inc.
    Inventor: Orestis Kostakis
  • Patent number: 11934927
    Abstract: Systems and methods for managing input and output error of a machine learning (ML) model in a database system are presented herein. A set of test queries is executed on a first version of a database system to generate first test data, wherein the first version of the system comprises a ML model to generate an output corresponding to a function of the database system. An error model is trained based on the first test data and second test data generated based on a previous version of the system. The error model determines an error associated with the ML model between the first and previous versions of the system. The first version of the system is deployed with the error model, which corrects an output or an input of the ML model until sufficient data has been produced by the error model to retrain the ML model.
    Type: Grant
    Filed: December 22, 2022
    Date of Patent: March 19, 2024
    Assignee: Snowflake Inc.
    Inventors: Orestis Kostakis, Qiming Jiang, Boxin Jiang
  • Publication number: 20240078235
    Abstract: A system for improving task scheduling on a cloud data platform is provided. A task to be executed using resources of a computing cluster is received. A task execution plan is generated and information about data to be used for the task is accessed. Resource requirements for executing the task are predicted by applying machine learning to the task execution plan and the information about the data. Assignment data is generated to execute the task on the resources by applying machine learning to information about a current state of the resources and predicted resource requirements.
    Type: Application
    Filed: July 31, 2023
    Publication date: March 7, 2024
    Inventors: Qiming Jiang, Orestis Kostakis, John Reumann
  • Publication number: 20240062098
    Abstract: The subject technology receives first party training data provided by an end-user of a baseline machine learning model. The subject technology determines a first set of common features based on the first party training data. The subject technology receives, from at least one data source, a set of datasets. The subject technology determines a second set of common features based on the set of datasets. The subject technology trains, using the first set of common features and the second set of common features, a second machine learning model, the second machine learning model incorporating additional training data from the external data supplier during training compared to the baseline machine learning model. The subject technology generates a boosted machine learning model based at least in part on the training, the boosted machine learning model comprising the trained second machine learning model.
    Type: Application
    Filed: August 23, 2022
    Publication date: February 22, 2024
    Inventors: Rachel Frances Blum, Nancy Dou, Matthew J. Glickman, Boxin Jiang, Orestis Kostakis, Justin Langseth, Michael Earle Rainey, Haoran Yu
  • Patent number: 11880364
    Abstract: The subject technology receives a query directed to a set of source tables, each source table organized into a set of micro-partitions. The subject technology determines a set of metadata, the set of metadata comprising table metadata, query metadata, and historical data related to the query. The subject technology predicts, using a machine learning model, an indicator of an amount of computing resources for executing the query based at least in part on the set of metadata. The subject technology generates a query plan for executing the query based at least in part on the predicted indicator of the amount of computing resources. The subject technology executes the query based at least in part on the query plan.
    Type: Grant
    Filed: January 25, 2021
    Date of Patent: January 23, 2024
    Assignee: Snowflake Inc.
    Inventors: Qiming Jiang, Orestis Kostakis
  • Publication number: 20230409968
    Abstract: A method includes installing, in a consumer database account, a shared-instance database that includes a shared instance of a provider-account database that resides in a provider database account. The shared-instance database includes a first schema that includes provider-account training data, provider-account scoring data, a training function, and a scoring function. The method also includes invoking the training function from the consumer database account, which results in creation in the consumer database account of a second schema that includes a machine-learning-model instance of a machine learning model, and which also results in training the machine-learning model instance with at least the provider-account training data. Additionally, the method includes generating consumer-account scoring data by inputting, into the trained machine-learning-model instance, consumer-account input data that is stored in the consumer database account.
    Type: Application
    Filed: January 31, 2023
    Publication date: December 21, 2023
    Inventors: Orestis Kostakis, Justin Langseth
  • Publication number: 20230401185
    Abstract: A set of affinity metrics may be determined for a set of listings, each listing of the set of listings comprising data to be shared through a data exchange, wherein the set of affinity metrics includes a set of characteristics allowing identification of a listing having one or more characteristics in the set of characteristics. For each pair of listings of the set of listings, an affinity score can be calculated, using the set of affinity metrics, and stored as part of the record in an affinity store. One or more listings of the set of listings using the affinity score between the first listing of the set of listings and the one or more listings of the set of listings can be presented.
    Type: Application
    Filed: February 22, 2023
    Publication date: December 14, 2023
    Inventors: Orestis Kostakis, Prasanna V. Krishnan, Subramanian Muralidhar, Shakhina Pulatova, Megan Marie Schoendorf
  • Publication number: 20230393816
    Abstract: The subject technology identifies a set of functions included in a set of files corresponding to a library. The subject technology, for each function in the set of functions, registers the function as a user defined function (UDF). The subject technology generates a name for the function based at least in part on a predetermined prefix, wherein the predetermined prefix comprises an alphanumeric string. The subject technology generates, using at least a particular set of input parameters utilized by the function and a particular type of parameter of each input parameter of the particular set of input parameters, a particular set of source code. The subject technology stores information corresponding to the function in a metadata database. The subject technology provides access to the function in a different application.
    Type: Application
    Filed: July 31, 2023
    Publication date: December 7, 2023
    Inventors: Jianzhun Du, Orestis Kostakis, Kristopher Wagner, Yijun Xie
  • Patent number: 11836138
    Abstract: A system for generating similarity data for different datasets in a cloud data platform. A first dataset of a plurality of datasets on the cloud data platform is identified, where the first dataset is associated with a first user of the cloud data platform. A semantic type for each feature of the first dataset is identified, and each semantic type for the first dataset is compared with existing data of the first user. Semantic types for each feature of each dataset are identified, and each semantic type for the first dataset is compared to each semantic type of each dataset. Overlap requests are generated to output overlap datasets between the first dataset and each of the plurality of datasets. A results dataset is generated by applying the overlap requests to a joined dataset comprising data from the first dataset and data from each of the plurality of datasets.
    Type: Grant
    Filed: January 31, 2023
    Date of Patent: December 5, 2023
    Assignee: Snowflake Inc.
    Inventors: Matthew J. Glickman, Orestis Kostakis, Justin Langseth
  • Publication number: 20230385286
    Abstract: A system for generating similarity data for different datasets in a cloud data platform. A first dataset of a plurality of datasets on the cloud data platform is identified, where the first dataset is associated with a first user of the cloud data platform. A semantic type for each feature of the first dataset is identified, and each semantic type for the first dataset is compared with existing data of the first user. Semantic types for each feature of each dataset are identified, and each semantic type for the first dataset is compared to each semantic type of each dataset. Overlap requests are generated to output overlap datasets between the first dataset and each of the plurality of datasets. A results dataset is generated by applying the overlap requests to a joined dataset comprising data from the first dataset and data from each of the plurality of datasets.
    Type: Application
    Filed: January 31, 2023
    Publication date: November 30, 2023
    Inventors: Matthew J. Glickman, Orestis Kostakis, Justin Langseth
  • Publication number: 20230385284
    Abstract: Systems, methods, and machine-readable storage devices provide for identifying a user dataset on a distributed database. The system includes generating a similarity score dataset that indicates a similarity between the user dataset and a plurality of datasets of other users of the distributed database. The system generates a plurality of overlap queries that are configured to output overlap datasets between the user dataset and one or more of the plurality of datasets. The system further generates a results dataset by applying one or more of the plurality of overlap queries to a joined dataset comprising data from the user dataset and one of the plurality of datasets of other users on the distributed database.
    Type: Application
    Filed: May 27, 2022
    Publication date: November 30, 2023
    Inventors: Matthew J. Glickman, Orestis Kostakis, Justin Langseth
  • Patent number: 11755576
    Abstract: A system for improving task scheduling on a cloud data platform is provided. A task is received, from a user of a cloud data platform, for execution on a dataset of a cloud data platform using a plurality of resources. A task graph is generated, and metadata related to the dataset is accessed for use in execution of the task. A predicted resource profile is generated by applying a first machine learning scheme to the task graph and the metadata of the dataset. Assignment data is generated to execute processes of the task on the plurality of resources. The assignment data is generated by applying a second machine learning scheme to current state data of a current computational state of the plurality of resources and the predicted resource profile generated by the first machine learning scheme.
    Type: Grant
    Filed: January 31, 2023
    Date of Patent: September 12, 2023
    Assignee: Snowflake Inc.
    Inventors: Qiming Jiang, Orestis Kostakis, John Reumann
  • Patent number: 11755291
    Abstract: The subject technology identifies a set of functions in a set of files corresponding to a library. The subject technology, for each function, registers the function as a user defined function (UDF) based on a set of input parameters utilized by the function and a type of parameter of each of the input parameters. The subject technology provides access to each registered function in a different application.
    Type: Grant
    Filed: January 31, 2023
    Date of Patent: September 12, 2023
    Assignee: Snowflake Inc.
    Inventors: Jianzhun Du, Orestis Kostakis, Kristopher Wagner, Yijun Xie
  • Publication number: 20230281196
    Abstract: A method includes parsing, by at least one hardware processor, a query to determine query comments and query code associated with the query. A query execution plan is generated based on the query code. Query execution using the query code is performed at a first computing node associated with a query processing pipeline. A detection is made that the query comments are indicative of a software bug in the query code based on analysis of the query comments. The detection is performed at a second computing node associated with a query analysis pipeline. A notification of the software bug and a result of the query execution is output.
    Type: Application
    Filed: May 16, 2023
    Publication date: September 7, 2023
    Inventor: Orestis Kostakis
  • Patent number: 11726996
    Abstract: Disclosed herein are embodiments of systems and methods for analyzing query comments for identifying potential software bugs. In an example, a data platform obtains query comments associated with a query. Based on determining that the query comments include a reference to a software bug of the data platform, the data platform generates a software-bug alert based on the query comments, and transmits the software-bug alert to an endpoint.
    Type: Grant
    Filed: March 9, 2022
    Date of Patent: August 15, 2023
    Assignee: Snowflake Inc.
    Inventor: Orestis Kostakis
  • Patent number: 11687506
    Abstract: Affinity-based listing recommendations are created and used in a public data exchange. Listings can be evaluated against one another for affinity or similarity such that users working with a particular dataset can be presented with other datasets that share an affinity. Affinity can be determined from both the dataset metadata as well as information from the dataset content. Calculation of affinity scores can be pre-computed and stored, in advance of use, or determined on-the-fly. Presentation of most-similar listings can be deterministic, can contain randomization, can employ time-decay, can be weighted, and can make use of a tiered-sum approach.
    Type: Grant
    Filed: July 25, 2022
    Date of Patent: June 27, 2023
    Assignee: Snowflake Inc.
    Inventors: Orestis Kostakis, Prasanna V. Krishnan, Subramanian Muralidhar, Shakhina Pulatova, Megan Marie Schoendorf
  • Patent number: 11651287
    Abstract: Embodiments of the present disclosure may provide a data sharing system implemented as a local application in a consumer database of a distributed database. The local application can include a training function and a scoring function to train a machine learning model on provider and consumer data, and generate output data by applying the trained machine learning model on input data. The input data can include data portions from a consumer database and a provider database that are joined to create a joined dataset for scoring.
    Type: Grant
    Filed: July 31, 2022
    Date of Patent: May 16, 2023
    Assignee: Snowflake Inc.
    Inventors: Orestis Kostakis, Justin Langseth
  • Publication number: 20230132117
    Abstract: Systems and methods for managing input and output error of a machine learning (ML) model in a database system are presented herein. A set of test queries is executed on a first version of a database system to generate first test data, wherein the first version of the system comprises a ML model to generate an output corresponding to a function of the database system. An error model is trained based on the first test data and second test data generated based on a previous version of the system. The error model determines an error associated with the ML model between the first and previous versions of the system. The first version of the system is deployed with the error model, which corrects an output or an input of the ML model until sufficient data has been produced by the error model to retrain the ML model.
    Type: Application
    Filed: December 22, 2022
    Publication date: April 27, 2023
    Inventors: Orestis Kostakis, Qiming Jiang, Boxin Jiang
  • Patent number: 11620110
    Abstract: The subject technology receives a set of files corresponding to a library, the library comprising a set of functions included in the set of files. The subject technology parses the set of files. The subject technology identifies a set of functions in the set of files based on the parsing. The subject technology, for each function, registers the function as a user defined function (UDF) based on a set of input parameters utilized by the function and a type of parameter of each of the input parameters. The subject technology provides access to each registered function in a different application.
    Type: Grant
    Filed: June 7, 2022
    Date of Patent: April 4, 2023
    Assignee: Snowflake Inc.
    Inventors: Jianzhun Du, Orestis Kostakis, Kristopher Wagner, Yijun Xie
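
The flow described in the abstract of patent 11620110 (parse a library's files, identify its functions, and register each one as a UDF from its input parameters and their types) can be illustrated with a minimal sketch. It assumes the library is Python source and uses the standard `ast` module; the names `discover_udfs` and `UdfSignature`, the `LIB_` prefix, and the in-memory registry are hypothetical and are not taken from the patent or from any Snowflake API.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class UdfSignature:
    name: str      # registered UDF name (hypothetical naming scheme)
    params: tuple  # (parameter name, annotated type or "ANY") pairs


def discover_udfs(source: str, prefix: str = "LIB_") -> dict:
    """Parse library source text and build a registry keyed by UDF name."""
    registry = {}
    tree = ast.parse(source)
    for node in tree.body:
        # Only top-level, public function definitions are treated as candidates.
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            params = tuple(
                (arg.arg, ast.unparse(arg.annotation) if arg.annotation else "ANY")
                for arg in node.args.args
            )
            # Assumption: the registered name is a fixed prefix plus the function name.
            udf_name = f"{prefix}{node.name}".upper()
            registry[udf_name] = UdfSignature(udf_name, params)
    return registry


# Stand-in for the contents of one library file.
LIBRARY_SOURCE = '''
def add(x: int, y: int) -> int:
    return x + y

def _helper(z):
    return z

def scale(values: list, factor: float = 1.0):
    return [v * factor for v in values]
'''

registry = discover_udfs(LIBRARY_SOURCE)
for signature in registry.values():
    print(signature.name, signature.params)
```

A real system would persist each signature in a metadata store and generate wrapper code per function; the dictionary here only stands in for that step.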