Patents by Inventor Surajit Chaudhuri
Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240411740
Abstract: This document relates to relational databases and corresponding data tables. Non-conforming data tables can be automatically transformed into conforming relational data tables. One example can obtain conforming relational data tables and can generate training data without human labelling by identifying a transformational operator that will transform an individual conforming relational data table to a non-conforming data table and an inverse transformational operator that will transform the non-conforming data table back to the individual conforming relational data table. The example can train a model with the training data. The trained model can synthesize programs to transform other non-conforming data tables to conforming relational data tables.
Type: Application
Filed: June 7, 2023
Publication date: December 12, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yeye HE, Cong YAN, Yue WANG, Surajit CHAUDHURI, Peng LI
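The inverse-operator idea can be illustrated with a minimal Python sketch. It assumes tables are lists of rows and uses two hypothetical operator pairs: `transpose`, which is its own inverse, and `add_title_row`/`drop_first_row`. The abstract does not name the operators it actually uses. Each generated example pairs a non-conforming table with the name of the inverse operator that restores it, so no human labelling is needed.

```python
from typing import Callable, List, Tuple

Table = List[List[str]]  # row-major; the first row is the header


def transpose(table: Table) -> Table:
    """Swap rows and columns; transpose is its own inverse."""
    return [list(col) for col in zip(*table)]


def add_title_row(table: Table) -> Table:
    """Hypothetical 'messy' operator: prepend a report-title row."""
    width = len(table[0])
    return [["Report"] + [""] * (width - 1)] + [list(row) for row in table]


def drop_first_row(table: Table) -> Table:
    """Inverse of add_title_row."""
    return [list(row) for row in table[1:]]


# (forward operator, name of its inverse, inverse operator); hypothetical registry
OPERATOR_PAIRS: List[Tuple[Callable[[Table], Table], str, Callable[[Table], Table]]] = [
    (transpose, "transpose", transpose),
    (add_title_row, "drop_first_row", drop_first_row),
]


def make_training_examples(conforming: List[Table]) -> List[Tuple[Table, str]]:
    """Label each non-conforming table with the inverse operator that restores it."""
    examples = []
    for table in conforming:
        for forward, inverse_name, inverse in OPERATOR_PAIRS:
            messy = forward(table)
            # Keep the example only if the inverse really recovers the original.
            if inverse(messy) == table:
                examples.append((messy, inverse_name))
    return examples
```

Pairs like these could then serve as labeled inputs for a model that predicts which program to apply to a new non-conforming table.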
-
Publication number: 20240346427
Abstract: The present disclosure relates to methods and systems that automatically predict a business intelligence model for tables of data provided as input. The methods and systems automatically generate a graph representing the business intelligence model and provide the graph as output. The graph provides a visual representation of the business intelligence model, with nodes representing each input table and weighted edges representing joins between pairs of tables.
Type: Application
Filed: April 14, 2023
Publication date: October 17, 2024
Inventors: Yeye HE, Yiming Stefania LIN, Surajit CHAUDHURI
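One way to picture such a graph is the Python sketch below. It assumes each table is a dictionary of columns and, as a stand-in for the trained prediction the abstract describes, weights an edge by value containment between the best-matching column pair. The containment heuristic and the `threshold` parameter are illustrative assumptions, not the patented method.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Tables = Dict[str, Dict[str, Sequence[object]]]
Edge = Tuple[str, str, str, str, float]  # (table1, col1, table2, col2, weight)


def containment(a: Sequence[object], b: Sequence[object]) -> float:
    """Fraction of distinct values in a that also appear in b."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa) if sa else 0.0


def predict_join_graph(tables: Tables, threshold: float = 0.8) -> List[Edge]:
    """Nodes are table names; keep one weighted edge per table pair above threshold."""
    edges: List[Edge] = []
    for t1, t2 in combinations(sorted(tables), 2):
        best = None
        for c1, v1 in tables[t1].items():
            for c2, v2 in tables[t2].items():
                # Score both directions: either table may hold the foreign key.
                w = max(containment(v1, v2), containment(v2, v1))
                if best is None or w > best[4]:
                    best = (t1, c1, t2, c2, w)
        if best is not None and best[4] >= threshold:
            edges.append(best)
    return edges
```

For a `sales` table whose `cust_id` values all appear in `customers.id`, the sketch returns a single edge `("customers", "id", "sales", "cust_id", 1.0)`.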
-
Patent number: 12105713
Abstract: The present disclosure relates to methods and systems for compressing workloads for use with index tuning. The methods and systems receive a workload with a plurality of queries. The methods and systems represent each query using query features and a utility. The methods and systems select a query for a query subset based on a benefit of the query determined using the query features and the utility. The methods and systems update the features and the utility of the remaining queries in the workload and select another query to add to the query subset based on an updated benefit determined using the updated features and utilities. The methods and systems continue selecting queries until the query subset reaches a received query subset size. The methods and systems use the query subset in index tuning to provide one or more index recommendations.
Type: Grant
Filed: May 10, 2022
Date of Patent: October 1, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Tarique Ashraf Siddiqui, Saehan Jo, Wentao Wu, Chi Wang, Vivek Ravindranath Narasayya, Surajit Chaudhuri
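The select-then-update loop can be sketched as a greedy algorithm in Python. The assumptions here are not from the abstract: features are sets (e.g., indexable columns), the benefit is utility times the number of not-yet-covered features, and, for simplicity, utilities are held constant while only features are updated.

```python
from typing import Dict, List, Set, Tuple


def compress_workload(queries: Dict[str, Tuple[Set[str], float]], k: int) -> List[str]:
    """Greedily select up to k queries for index tuning.

    Each query maps to (features, utility). Benefit = utility * number of
    features not yet covered by previously selected queries.
    """
    # Copy feature sets so the caller's data is not mutated.
    remaining = {q: (set(f), u) for q, (f, u) in queries.items()}
    selected: List[str] = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda q: remaining[q][1] * len(remaining[q][0]))
        covered, _ = remaining.pop(best)
        selected.append(best)
        # Update step: features already covered add no further benefit.
        for feats, _ in remaining.values():
            feats -= covered
    return selected
```

With `q1` and `q2` touching the same columns, the sketch picks `q1` and then skips the redundant `q2` in favor of a cheaper query on a new column, which is the point of updating features after each pick.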
-
Patent number: 12066993
Abstract: The present disclosure relates to systems, methods, and computer-readable media for determining optimal index configurations for processing workloads in a database management system. For instance, an index configuration system can efficiently determine a subset of indexes for processing a workload utilizing one or more reinforcement learning models. For example, in various implementations, the index configuration system utilizes a Markov decision process and/or a Monte Carlo tree search model to determine an optimal subset of indexes for processing a workload in a manner that effectively utilizes computing device resources while also avoiding significant interference with customer workloads.
Type: Grant
Filed: June 3, 2022
Date of Patent: August 20, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Wentao Wu, Chi Wang, Tarique Ashraf Siddiqui, Vivek Ravindranath Narasayya, Surajit Chaudhuri
-
Publication number: 20240184798
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. A repository of transformation tools is searched to identify a new transformation tool as relevant to a data transformation associated with the received set of example values. The repository includes annotations associated with the new transformation tool. The new transformation tool is used to generate a transformation program that produces transformed output values. Additional annotations are generated for the new transformation tool based on the transformed output values.
Type: Application
Filed: December 5, 2022
Publication date: June 6, 2024
Inventors: Kris K. GANJAM, Yeye HE, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
-
Patent number: 11934398
Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing selection of a cached execution plan to use in processing a parametric query. For example, systems described herein involve training a plan selection model that makes use of machine learning to identify an execution plan from a set of pre-selected execution plans based on predicted cost of executing a query instance in accordance with the selected execution plan (e.g., relative to predicted costs of executing the query instance using other pre-selected execution plans). This application describes features related to lowering costs associated with selecting the execution plan in a way that continues to become more accurate over time as the plan selection model is trained and refined.
Type: Grant
Filed: June 28, 2021
Date of Patent: March 19, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Anshuman Dutt, Kapil Eknath Vaidya, Vivek Ravindranath Narasayya, Surajit Chaudhuri
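The final selection step amounts to taking the cached plan with the lowest predicted cost for the current parameter binding. In the Python sketch below, hand-written cost functions stand in for the trained plan selection model; the plan names and the selectivity parameter are hypothetical.

```python
from typing import Callable, Dict, Sequence

CostModel = Callable[[Sequence[float]], float]


def select_plan(cost_models: Dict[str, CostModel], params: Sequence[float]) -> str:
    """Pick the cached plan whose predicted cost is lowest for this query instance."""
    return min(cost_models, key=lambda plan: cost_models[plan](params))


# Hypothetical predictors keyed by cached plan; params[0] is the predicate selectivity.
PLANS: Dict[str, CostModel] = {
    "index_seek": lambda p: 5.0 + 100.0 * p[0],
    "table_scan": lambda p: 50.0,
}
```

With these predictors, `select_plan(PLANS, [0.1])` returns `"index_seek"` (15 < 50) and `select_plan(PLANS, [0.9])` returns `"table_scan"` (95 > 50).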
-
Patent number: 11886457
Abstract: A transform-by-pattern (TBP) system is configured to proactively suggest relevant TBP programs based on an inputted source dataset and target dataset, without requiring users to type in examples. The TBP system has access to multiple TBP programs, each of which includes a combination of a source pattern, a target pattern, and a transformation program that is configured to transform data that fits into the target pattern into data that fits into the source pattern. When a source dataset and a target dataset are received from a user, the TBP system identifies a subset of the source dataset and a subset of the target dataset as related data. The TBP system then identifies one or more applicable TBP programs amongst the multiple TBP programs, and suggests or applies at least one of the one or more applicable TBP programs.
Type: Grant
Filed: May 29, 2020
Date of Patent: January 30, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yeye He, Surajit Chaudhuri, Zhongjun Jin
-
Publication number: 20240028607
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.
Type: Application
Filed: September 28, 2023
Publication date: January 25, 2024
Inventors: Yeye HE, Kris K. GANJAM, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
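The step of identifying a transformation function relevant to the example values can be approximated by testing each candidate against every input/output pair. The repository and its three functions below are hypothetical placeholders, and exact-match consistency is an assumed relevance criterion; the abstract does not say how relevance is judged.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical repository of transformation functions.
REPOSITORY: Dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "first_token": lambda s: s.split()[0] if s.split() else "",
    "initials": lambda s: "".join(w[0] for w in s.split()).upper(),
}


def find_consistent_functions(examples: List[Tuple[str, str]]) -> List[str]:
    """Return names of functions that map every example input to its example output."""
    matches = []
    for name, fn in REPOSITORY.items():
        try:
            if all(fn(inp) == out for inp, out in examples):
                matches.append(name)
        except Exception:
            # A function that fails on an example input is treated as not relevant.
            continue
    return matches
```

Given the examples `("john smith", "JS")` and `("ada lovelace", "AL")`, only `initials` is consistent, so it would be the one suggested to the user.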
-
Patent number: 11880344
Abstract: Methods and systems for generating multi-operator data transformation pipelines. An example method includes accessing raw data for transformation; receiving a selection of a target table or target visualization, wherein the target table or target visualization is for data other than the raw data; extracting table properties and target constraints; and based on the extracted table properties and target constraints, synthesizing one or more multi-operator data transformation pipelines for transforming the raw data to a generated table or generated visualization.
Type: Grant
Filed: May 14, 2021
Date of Patent: January 23, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Yeye He, Surajit Chaudhuri, Junwen Yang
-
Publication number: 20230394041
Abstract: A system and method for executing SQL statements includes receiving an SQL statement for comparing two trendsets over a relation using a scoring function, each of the trendsets including one or more trends, each of the trends being designated by a constraint and a grouping-measure combination, wherein comparing the trendsets includes identifying trend pairs for comparison, each of the trend pairs including a trend from each of the trendsets having a common grouping-measure combination. The SQL statement is transformed into a basic plan of existing logical operators for performing the SQL statement. A set of sub-plans is determined based on the basic plan. Pairs of sub-plans are merged to generate a set of merged sub-plans. A cost for each of the merged sub-plans is determined. The merged sub-plan having the lowest cost is used to execute the SQL statement.
Type: Application
Filed: June 6, 2022
Publication date: December 7, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Tarique Ashraf SIDDIQUI, Surajit CHAUDHURI, Vivek Ravindranath NARASAYYA
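The trend-pairing semantics (not the plan-merging optimization) can be sketched in Python. The sketch assumes each trendset has already been computed under its own constraint and is keyed by its (grouping, measure) combination; the L1 distance is a hypothetical scoring function chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Key = Tuple[str, str]  # (grouping attribute, measure)
Score = Callable[[Sequence[float], Sequence[float]], float]


def compare_trendsets(left: Dict[Key, Sequence[float]],
                      right: Dict[Key, Sequence[float]],
                      score: Score) -> List[Tuple[Key, float]]:
    """Pair trends sharing a grouping-measure combination; highest score first."""
    pairs = [(key, score(left[key], right[key])) for key in left.keys() & right.keys()]
    return sorted(pairs, key=lambda kv: kv[1], reverse=True)


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical scoring function: sum of absolute pointwise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

A trend present in only one trendset (for example, a `("day", "sales")` trend with no counterpart) is simply not paired.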
-
Patent number: 11836646
Abstract: A model generator constructs a model for estimating selectivity of database operations by determining a number of training examples necessary for the model to achieve a target accuracy and by generating approximate selectivity labels for the training examples. The model generator may train the model on an initial number of training examples using cross-validation. The model generator may determine whether the model satisfies the target accuracy and iteratively and geometrically increase the number of training examples based on an optimized geometric step size (which may minimize model construction time) until the model achieves the target accuracy based on a defined confidence level. The model generator may generate labels using a subset of tuples from an intermediate query expression. The model generator may iteratively increase a size of the subset of tuples used until a relative error of the generated labels is below a target threshold.
Type: Grant
Filed: June 30, 2020
Date of Patent: December 5, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Anshuman Dutt, Chi Wang, Vivek Ravindranath Narasayya, Surajit Chaudhuri
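The geometric growth of the training set can be sketched as a simple loop. Here a fixed `step` replaces the optimized step size, and a caller-supplied `accuracy_at(n)` function stands in for training with cross-validation on `n` examples; both are assumptions for illustration.

```python
import math
from typing import Callable


def examples_needed(accuracy_at: Callable[[int], float], target: float,
                    n0: int = 100, step: float = 2.0, n_max: int = 1_000_000) -> int:
    """Grow the training-set size geometrically (n0, n0*step, ...) until the
    validated accuracy reaches target; return the first size that does."""
    n = n0
    while accuracy_at(n) < target:
        if n >= n_max:
            raise RuntimeError("target accuracy not reached within n_max examples")
        n = min(n_max, math.ceil(n * step))
    return n
```

If accuracy behaved like `1 - 10 / n`, a 0.99 target would be first met at 1,600 examples (100, 200, 400, 800 fall short). A larger step reaches the target in fewer rounds but may overshoot the size actually needed, which is the trade-off an optimized step size addresses.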
-
Publication number: 20230385261
Abstract: A method of training an index filter for an index tuning system includes receiving a plurality of different workloads and a plurality of different databases, each database including different tables and each workload including a plurality of queries; generating labeled training data by making optimizer calls to a query optimizer using query and index configuration pairs from the plurality of databases and the plurality of workloads; training an index filter model to identify signals in the labeled training data, the signals being indicative of a potential performance improvement associated with using an index configuration for a given query; training the index filter model to learn rules over the signals for identifying spurious indexes; and storing the index filter model in a memory.
Type: Application
Filed: August 29, 2022
Publication date: November 30, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Tarique Ashraf SIDDIQUI, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI, Wentao WU
-
Publication number: 20230367771
Abstract: The present disclosure relates to methods and systems for compressing workloads for use with index tuning. The methods and systems receive a workload with a plurality of queries. The methods and systems represent each query using query features and a utility. The methods and systems select a query for a query subset based on a benefit of the query determined using the query features and the utility. The methods and systems update the features and the utility of the remaining queries in the workload and select another query to add to the query subset based on an updated benefit determined using the updated features and utilities. The methods and systems continue selecting queries until the query subset reaches a received query subset size. The methods and systems use the query subset in index tuning to provide one or more index recommendations.
Type: Application
Filed: May 10, 2022
Publication date: November 16, 2023
Inventors: Tarique Ashraf SIDDIQUI, Saehan JO, Wentao WU, Chi WANG, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
-
Publication number: 20230368068
Abstract: The present disclosure relates to systems, methods, and computer-readable media for training and implementing pipeline error detection models to facilitate automated detection of data quality (DQ) issues within recurring data pipelines. For example, systems described herein involve training a pipeline error detection model by first constructing a plurality of DQ constraints for a recurring data pipeline based on ranges of values observed over a history of pipeline executions. The systems may further train the model to predict DQ issues by synthetically applying data variants to historical executions of the recurring data pipeline or to data pipelines having similar characteristics thereto. Once trained, the pipeline error detection model(s) can be applied to new executions of the data pipeline as they become available to quickly and efficiently predict whether a given execution includes a predicted DQ issue therein.
Type: Application
Filed: May 12, 2022
Publication date: November 16, 2023
Inventors: Yeye HE, Weiwei CUI, Song GE, Haidong ZHANG, Shi HAN, Dongmei ZHANG, Surajit CHAUDHURI
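The first step, constructing DQ constraints from ranges observed over past executions, can be sketched as per-metric min/max bounds. The metric names (`row_count`, `null_fraction`) and the optional `slack` widening are hypothetical; the trained model that follows this step in the abstract is not shown.

```python
from typing import Dict, List, Tuple

Run = Dict[str, float]  # metric name -> value for one pipeline execution


def learn_range_constraints(history: List[Run], slack: float = 0.0) -> Dict[str, Tuple[float, float]]:
    """Record the min/max of each metric across past executions, widened on
    each side by slack times the observed range."""
    constraints = {}
    for metric in history[0]:
        values = [run[metric] for run in history]
        lo, hi = min(values), max(values)
        pad = (hi - lo) * slack
        constraints[metric] = (lo - pad, hi + pad)
    return constraints


def violations(run: Run, constraints: Dict[str, Tuple[float, float]]) -> List[str]:
    """Metrics of a new execution that fall outside their learned range."""
    return [m for m, (lo, hi) in constraints.items() if not lo <= run[m] <= hi]
```

A new execution whose null fraction jumps from roughly 1 to 2 percent historically up to 30 percent would be flagged on `null_fraction` even though its row count looks normal.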
-
Patent number: 11809442
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.
Type: Grant
Filed: April 13, 2020
Date of Patent: November 7, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri
-
Patent number: 11809223
Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a plurality of remote sources is searched to identify candidate transformation tools relevant for performing data transformations. The candidate transformation tools are analyzed to identify tool examples corresponding with each of the candidate transformation tools. For each of the candidate transformation tools, the tool examples are stored in association with the corresponding candidate transformation tool. Based on a comparison of tool examples with example values, a transformation tool is identified as relevant to facilitate transforming example input values to the desired form in which to transform data.
Type: Grant
Filed: November 8, 2021
Date of Patent: November 7, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Xu Chu
-
Publication number: 20230315701
Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.
Type: Application
Filed: June 7, 2023
Publication date: October 5, 2023
Inventors: Meiyalagan BALASUBRAMANIAN, Lengning LIU, Aditya KUPPA, Kirk Hartmann FREIHEIT, Kalen WONG, Paula Budig GREVE, Patrick Clinton LITTLE, Lucas PRITZ, Yue WANG, Vivek Ravindranath NARASAYYA, Katchaguy AREEKIJSEREE, Yehe HE, Surajit CHAUDHURI, Gaurav Ghosh
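A stableID of this kind can be sketched as a hash over a rule-selected subset of fields. The rule format (an ordered list of field names, keeping those that are present and non-empty), the canonicalization, the choice of SHA-256, and the `id` primary-key field name are all assumptions for illustration.

```python
import hashlib
from typing import Dict, List


def select_fields(record: Dict[str, str], rules: List[str]) -> List[str]:
    """Hypothetical rule: keep each named field that is present and non-empty."""
    return [f for f in rules if record.get(f)]


def assign_stable_id(record: Dict[str, str], rules: List[str], key_field: str = "id") -> Dict[str, str]:
    """Hash the canonicalized subset of fields and store it in the primary key field."""
    fields = select_fields(record, rules)
    # Canonicalize: sorted field names, trimmed and lower-cased values.
    canonical = "|".join(f"{f}={record[f].strip().lower()}" for f in sorted(fields))
    stable_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**record, key_field: stable_id}
```

Because volatile fields such as a last-login timestamp are excluded from the subset, two copies of the same customer record produce the same identifier even if those fields differ.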
-
Publication number: 20230315702
Abstract: The present disclosure relates to systems, methods, and computer-readable media for determining optimal index configurations for processing workloads in a database management system. For instance, an index configuration system can efficiently determine a subset of indexes for processing a workload utilizing one or more reinforcement learning models. For example, in various implementations, the index configuration system utilizes a Markov decision process and/or a Monte Carlo tree search model to determine an optimal subset of indexes for processing a workload in a manner that effectively utilizes computing device resources while also avoiding significant interference with customer workloads.
Type: Application
Filed: June 3, 2022
Publication date: October 5, 2023
Inventors: Wentao WU, Chi WANG, Tarique Ashraf SIDDIQUI, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI
-
Patent number: 11745093
Abstract: A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and more efficient. Usage of, or communication regarding, a data store is monitored, and keywords are extracted from the usage or communication. The keywords are then written to, or otherwise associated with, metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher.
Type: Grant
Filed: November 23, 2021
Date of Patent: September 5, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: John C. Platt, Surajit Chaudhuri, Lev Novik, Henricus Johannes Maria Meijer
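The monitor, extract, attach, and search flow can be sketched with simple term counting. The stopword list, the `top_n` cutoff, and frequency-based extraction are hypothetical choices; the abstract does not specify how keywords are extracted.

```python
import re
from collections import Counter
from typing import Dict, List, Set

STOPWORDS = {"the", "a", "an", "of", "for", "and", "to", "in", "is", "on", "with", "by"}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(texts: List[str], top_n: int = 5) -> List[str]:
    """Most frequent non-stopword terms in observed usage/communication text."""
    tokens = [t for text in texts for t in tokenize(text) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(top_n)]


def attach_keywords(metadata: Dict[str, Set[str]], store: str, texts: List[str]) -> None:
    """Associate extracted keywords with the data store's metadata."""
    metadata.setdefault(store, set()).update(extract_keywords(texts))


def search(metadata: Dict[str, Set[str]], query: str) -> List[str]:
    """Data stores whose metadata keywords match any query term."""
    terms = set(tokenize(query))
    return sorted(s for s, kws in metadata.items() if terms & kws)
```

A store whose users repeatedly query "quarterly revenue by region" becomes discoverable by a search for "revenue" without its owner ever tagging it.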
-
Patent number: 11734274
Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing and implementing operator trees based on a received query. For example, systems disclosed herein may generate an operator tree based on a received query. The systems described herein may systematically analyze the impact of bitvector filters in optimizing a join order of the operator tree to generate an optimized operator tree. The systems described herein may further implement the bitvector-aware operator tree by providing the optimized operator tree to an execution engine for further processing.
Type: Grant
Filed: June 30, 2020
Date of Patent: August 22, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Bailu Ding, Vivek Ravindranath Narasayya, Surajit Chaudhuri
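The abstract's contribution is choosing a join order that accounts for bitvector filters. The Python sketch below shows only what such a filter does inside a single hash join: build-side keys set bits, and probe rows whose bit is clear are discarded before the hash-table lookup. The single-hash design, the 1,024-bit size, and joining on the first column are illustrative assumptions.

```python
from typing import Dict, Hashable, List, Tuple


class BitVectorFilter:
    """Fixed-size bit vector over hashed join keys. A clear bit proves the key
    is absent from the build side; a set bit may be a false positive."""

    def __init__(self, num_bits: int = 1024):
        self.num_bits = num_bits
        self.bits = 0  # a Python int used as an arbitrary-width bit array

    def _pos(self, key: Hashable) -> int:
        return hash(key) % self.num_bits

    def add(self, key: Hashable) -> None:
        self.bits |= 1 << self._pos(key)

    def may_contain(self, key: Hashable) -> bool:
        return bool((self.bits >> self._pos(key)) & 1)


def hash_join_with_filter(build: List[Tuple], probe: List[Tuple]) -> List[Tuple]:
    """Join on the first column, pruning probe rows with the bit vector first."""
    table: Dict[Hashable, List[Tuple]] = {}
    bv = BitVectorFilter()
    for row in build:
        table.setdefault(row[0], []).append(row)
        bv.add(row[0])
    survivors = [r for r in probe if bv.may_contain(r[0])]
    return [b + p[1:] for p in survivors for b in table.get(p[0], [])]
```

Because pruning probe rows early shrinks the intermediate results that later joins see, a filter like this can change which join order is cheapest, which is why an optimizer would want to account for it.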