Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11934398
    Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing selection of a cached execution plan to use in processing a parametric query. For example, systems described herein involve training a plan selection model that makes use of machine learning to identify an execution plan from a set of pre-selected execution plans based on predicted cost of executing a query instance in accordance with the selected execution plan (e.g., relative to predicted costs of executing the query instance using other pre-selected execution plans). This application describes features related to lowering costs associated with selecting the execution plan in a way that will continue to be more accurate over time based on training and refining the plan selection model.
    Type: Grant
    Filed: June 28, 2021
    Date of Patent: March 19, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Anshuman Dutt, Kapil Eknath Vaidya, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Patent number: 11886457
    Abstract: A transform-by-pattern (TBP) system is configured to proactively suggest relevant TBP programs based on an inputted source dataset and target dataset without requiring users to type in examples. The TBP system has access to multiple TBP programs, each of which includes a combination of a source pattern, a target pattern, and a transformation program that is configured to transform data that fits into the target pattern into data that fits into the source pattern. When a source dataset and a target dataset are received from a user, the TBP system identifies a subset of the source dataset and a subset of the target dataset as related data. The TBP system then identifies one or more applicable TBP programs amongst the multiple TBP programs, and suggests or applies at least one of the one or more applicable TBP programs.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: January 30, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Surajit Chaudhuri, Zhongjun Jin
  • Publication number: 20240028607
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values includes example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.
    Type: Application
    Filed: September 28, 2023
    Publication date: January 25, 2024
    Inventors: Yeye HE, Kris K. GANJAM, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
  • Patent number: 11880344
    Abstract: Methods and systems for generating multi-operator data transformation pipelines. An example method includes accessing raw data for transformation; receiving a selection of a target table or target visualization, wherein the target table or target visualization is for data other than the raw data; extracting table properties and target constraints; and based on the extracted table properties and target constraints, synthesizing one or more multi-operator data transformation pipelines for transforming the raw data to a generated table or generated visualization.
    Type: Grant
    Filed: May 14, 2021
    Date of Patent: January 23, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Surajit Chaudhuri, Junwen Yang
  • Publication number: 20230394041
    Abstract: A system and method for executing SQL statements includes receiving an SQL statement for comparing two trendsets over a relation using a scoring function, each of the trendsets including one or more trends, each of the trends being designated by a constraint and a grouping-measure combination, wherein comparing the trendsets includes identifying trend pairs for comparison, each of the trend pairs including a trend from each of the trendsets having a common grouping-measure combination. The SQL statement is transformed into a basic plan of existing logical operators for performing the SQL statement. A set of sub-plans is determined based on the basic plan. Pairs of sub-plans are merged to generate a set of merged sub-plans. A cost for each of the merged sub-plans is determined. The merged sub-plan having the lowest cost is used to execute the SQL statement.
    Type: Application
    Filed: June 6, 2022
    Publication date: December 7, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Tarique Ashraf SIDDIQUI, Surajit CHAUDHURI, Vivek Ravindranath NARASAYYA
  • Patent number: 11836646
    Abstract: A model generator constructs a model for estimating selectivity of database operations by determining a number of training examples necessary for the model to achieve a target accuracy and by generating approximate selectivity labels for the training examples. The model generator may train the model on an initial number of training examples using cross-validation. The model generator may determine whether the model satisfies the target accuracy and iteratively and geometrically increase the number of training examples based on an optimized geometric step size (which may minimize model construction time) until the model achieves the target accuracy based on a defined confidence level. The model generator may generate labels using a subset of tuples from an intermediate query expression. The model generator may iteratively increase a size of the subset of tuples used until a relative error of the generated labels is below a target threshold.
    Type: Grant
    Filed: June 30, 2020
    Date of Patent: December 5, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Anshuman Dutt, Chi Wang, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Publication number: 20230385261
    Abstract: A method of training an index filter for an index tuning system includes receiving a plurality of different workloads and a plurality of different databases, each database including different tables and each workload including a plurality of queries; generating labeled training data by making optimizer calls to a query optimizer using query and index configuration pairs from the plurality of databases and the plurality of workloads; training an index filter model to identify signals in the labeled training data, the signals being indicative of a potential performance improvement associated with using an index configuration for a given query; training the index filter model to learn rules over the signals for identifying spurious indexes; and storing the index filter model in a memory.
    Type: Application
    Filed: August 29, 2022
    Publication date: November 30, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Tarique Ashraf SIDDIQUI, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI, Wentao WU
  • Publication number: 20230368068
    Abstract: The present disclosure relates to systems, methods, and computer-readable media for training and implementing pipeline error detection models to facilitate automated detection of data quality (DQ) issues within recurring data pipelines. For example, systems described herein involve training a pipeline error detection model by first constructing a plurality of DQ constraints for a recurring data pipeline based on ranges of values observed over a history of pipeline executions. The systems may further train the model to predict DQ issues by synthetically applying data variants to historical executions of the recurring data pipeline or to data pipelines having similar characteristics thereto. Once trained, the pipeline error detection model(s) can be applied to new executions of the data pipeline as they become available to quickly and efficiently predict whether a given execution includes a predicted DQ issue therein.
    Type: Application
    Filed: May 12, 2022
    Publication date: November 16, 2023
    Inventors: Yeye HE, Weiwei CUI, Song GE, Haidong ZHANG, Shi HAN, Dongmei ZHANG, Surajit CHAUDHURI
  • Publication number: 20230367771
    Abstract: The present disclosure relates to methods and systems for compressing workloads for use with index tuning. The methods and systems receive a workload with a plurality of queries. The methods and systems represent each query using query features and a utility. The methods and systems select a query for a query subset based on a benefit of the query determined using the query features and the utility. The methods and systems update the features and the utility of the remaining queries in the workload and select another query to add to the query subset based on an updated benefit determined using the updated features and utilities. The methods and systems select queries for the query subset equal to a received query subset size. The methods and systems use the query subset in index tuning to provide one or more index recommendations.
    Type: Application
    Filed: May 10, 2022
    Publication date: November 16, 2023
    Inventors: Tarique Ashraf SIDDIQUI, Saehan JO, Wentao WU, Chi WANG, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
  • Patent number: 11809223
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.
    Type: Grant
    Filed: November 8, 2021
    Date of Patent: November 7, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
  • Patent number: 11809442
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values includes example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.
    Type: Grant
    Filed: April 13, 2020
    Date of Patent: November 7, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Publication number: 20230315701
    Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.
    Type: Application
    Filed: June 7, 2023
    Publication date: October 5, 2023
    Inventors: Meiyalagan BALASUBRAMANIAN, Lengning LIU, Aditya KUPPA, Kirk Hartmann FREIHEIT, Kalen WONG, Paula Budig GREVE, Patrick Clinton LITTLE, Lucas PRITZ, Yue WANG, Vivek Ravindranath NARASAYYA, Katchaguy AREEKIJSEREE, Yeye HE, Surajit CHAUDHURI, Gaurav GHOSH
  • Publication number: 20230315702
    Abstract: The present disclosure relates to systems, methods, and computer-readable media for determining optimal index configurations for processing workloads in a database management system. For instance, an index configuration system can efficiently determine a subset of indexes for processing a workload utilizing one or more reinforcement learning models. For example, in various implementations, the index configuration system utilizes a Markov decision process and/or a Monte Carlo tree search model to determine an optimal subset of indexes for processing a workload in a manner that effectively utilizes computing device resources while also avoiding significant interference with customer workloads.
    Type: Application
    Filed: June 3, 2022
    Publication date: October 5, 2023
    Inventors: Wentao WU, Chi WANG, Tarique Ashraf SIDDIQUI, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
  • Patent number: 11745093
    Abstract: A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and more efficient. Usage of or communication regarding a data store are monitored and keywords are extracted from the usage or communication. The keywords are then written to or otherwise associated with metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher.
    Type: Grant
    Filed: November 23, 2021
    Date of Patent: September 5, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: John C. Platt, Surajit Chaudhuri, Lev Novik, Henricus Johannes Maria Meijer
  • Patent number: 11734274
    Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing and implementing operator trees based on a received query. For example, systems disclosed herein may generate an operator tree based on a received query. The systems described herein may systematically analyze the impact of bitvector filters in optimizing a join order of the operator tree to generate an optimized operator tree. The systems described herein may further implement the bitvector-aware operator tree by providing the optimized operator tree to an execution engine for further processing.
    Type: Grant
    Filed: June 30, 2020
    Date of Patent: August 22, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bailu Ding, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Patent number: 11714790
    Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: August 1, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Meiyalagan Balasubramanian, Lengning Liu, Aditya Kuppa, Kirk Hartmann Freiheit, Kalen Wong, Paula Budig Greve, Patrick Clinton Little, Lucas Pritz, Yue Wang, Vivek Ravindranath Narasayya, Katchaguy Areekijseree, Yeye He, Surajit Chaudhuri, Gaurav Ghosh
  • Publication number: 20230098926
    Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.
    Type: Application
    Filed: September 30, 2021
    Publication date: March 30, 2023
    Inventors: Meiyalagan BALASUBRAMANIAN, Lengning LIU, Aditya KUPPA, Kirk Hartmann FREIHEIT, Kalen WONG, Paula Budig GREVE, Patrick Clinton LITTLE, Lucas PRITZ, Yue WANG, Vivek Ravindranath NARASAYYA, Katchaguy AREEKIJSEREE, Yeye HE, Surajit CHAUDHURI
  • Publication number: 20220414099
    Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing selection of a cached execution plan to use in processing a parametric query. For example, systems described herein involve training a plan selection model that makes use of machine learning to identify an execution plan from a set of pre-selected execution plans based on predicted cost of executing a query instance in accordance with the selected execution plan (e.g., relative to predicted costs of executing the query instance using other pre-selected execution plans). This application describes features related to lowering costs associated with selecting the execution plan in a way that will continue to be more accurate overtime based on training and refining the plan selection model.
    Type: Application
    Filed: June 28, 2021
    Publication date: December 29, 2022
    Inventors: Anshuman DUTT, Kapil Eknath VAIDYA, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
  • Patent number: 11520800
    Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values are received. A repository of transformation tools is searched to identify a new transformation tool as relevant to a data transformation associated with the received set of example values. The repository includes annotations associated with the new transformation tool. The new transformation tool is used to generate a transformation program that produces transformed output values. Additional annotations are generated for the new transformation tool based on the transformed output values.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: December 6, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kris Ganjam, Yeye He, Vivek Ravindranath Narasayya, Surajit Chaudhuri
  • Publication number: 20220365910
    Abstract: Methods and systems for generating multi-operator data transformation pipelines. An example method includes accessing raw data for transformation; receiving a selection of a target table or target visualization, wherein the target table or target visualization is for data other than the raw data; extracting table properties and target constraints; and based on the extracted table properties and target constraints, synthesizing one or more multi-operator data transformation pipelines for transforming the raw data to a generated table or generated visualization.
    Type: Application
    Filed: May 14, 2021
    Publication date: November 17, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yeye HE, Surajit CHAUDHURI, Junwen YANG
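
Illustrative note: the data-unification abstracts listed above (patent 11714790 and its related applications) describe selecting a subset of a record's fields, generating a stable identifier (stableID) from their content, and inserting it into the record's primary key field. The Python sketch below shows one way such a step might look. It is not taken from the patents: the function names, the fixed `key_fields` list standing in for rule-based field selection, the whitespace and case normalization, the SHA-256 hash, and the `id` primary key name are all assumptions made for illustration.

```python
import hashlib


def stable_id(record, key_fields):
    """Derive a deterministic identifier from a subset of a record's fields.

    Assumption: key_fields is a hand-picked list; the abstracts describe
    selecting the subset by applying rules, which is not modeled here.
    """
    # Normalize each selected value so trivial formatting differences
    # (case, surrounding whitespace) do not change the identifier.
    parts = [str(record.get(field, "")).strip().lower() for field in key_fields]
    # Join with the ASCII unit separator so adjacent values cannot run together.
    payload = "\x1f".join(parts).encode("utf-8")
    # Assumption: SHA-256 is used only as a convenient stable hash.
    return hashlib.sha256(payload).hexdigest()


def assign_stable_id(record, key_fields, pk_field="id"):
    """Return a copy of the record with the stable ID in its primary key field."""
    out = dict(record)
    out[pk_field] = stable_id(record, key_fields)
    return out


# Hypothetical records: same person, different formatting and a different city.
rec_a = {"name": " Ada Lovelace ", "email": "ADA@example.com", "city": "London"}
rec_b = {"name": "ada lovelace", "email": "ada@example.com", "city": "Paris"}

keyed_a = assign_stable_id(rec_a, ["name", "email"])
keyed_b = assign_stable_id(rec_b, ["name", "email"])
print(keyed_a["id"] == keyed_b["id"])
```

Because only the selected fields feed the hash, the two hypothetical records receive the same identifier even though their `city` values differ, which is the property that makes the identifier stable across edits to non-key fields.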