Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Efficient transformation program generation

Patent number: 12386849

Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a transformation function is executed using an example input value to obtain an initial output value. Thereafter, a plurality of supplemental transformation tools is applied to the initial output value to generate a plurality of intermediary output values. Based on a comparison of each of the intermediary output values to an example output value, the supplemental transformation tool that generated an intermediary output value having a greatest extent of similarity to the example output values is identified. The identified supplemental transformation tool and the transformation function are used to generate a transformation program that transforms the example input values to the desired form in which to transform data.

Type: Grant

Filed: September 9, 2020

Date of Patent: August 12, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yeye He, Kris Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri
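
The selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the helper names (`best_supplement`, `swap_name`), the sample tools, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def best_supplement(transform, tools, example_in, example_out):
    """Pick the supplemental tool whose output is closest to the example output."""
    initial = transform(example_in)  # initial output value
    scored = []
    for name, tool in tools.items():
        intermediary = tool(initial)  # one intermediary output value per tool
        similarity = SequenceMatcher(None, intermediary, example_out).ratio()
        scored.append((similarity, name))
    _, best = max(scored)

    def program(value):
        # Transformation program: base function followed by the chosen tool.
        return tools[best](transform(value))

    return best, program

def swap_name(value):
    """Hypothetical base transformation: 'last, first' -> 'first last'."""
    last, first = value.split(", ")
    return f"{first} {last}"

# Hypothetical supplemental transformation tools.
tools = {"identity": lambda s: s, "upper": str.upper, "title": str.title}
best, program = best_supplement(swap_name, tools, "doe, jane", "Jane Doe")
print(best, "|", program("roe, richard"))
```

Here the base function only reorders the name, so the tool that restores capitalization scores highest against the example output and is composed into the resulting program.
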

Cache-efficient top-k aggregation over high cardinality large datasets

Patent number: 12380098

Abstract: A data processing system implements a cache-conscious aggregation framework for cache-efficient top-k aggregation over high cardinality large datasets. The framework leverages skew in the distribution of data in the datasets to minimize data movements within the local caches of the cores of the multicore processors of the data processing system. The framework performs representative sampling on the dataset and utilizes these samples to identify candidate groups in the dataset for the top-k results. The system performs exact aggregations for the candidate groups and performs hashing and pruning on the non-candidate groups in the dataset to identify top-k results included in the non-candidate groups without having to calculate the exact aggregations for the non-candidate groups.

Type: Grant

Filed: September 26, 2023

Date of Patent: August 5, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Tarique Ashraf Siddiqui, Vivek Ravindranath Narasayya, Marius Dumitru, Surajit Chaudhuri
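
The sample, aggregate, and prune flow in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented framework: it assumes a SUM aggregate over non-negative values, systematic sampling (every tenth row), and Python's built-in `hash` for bucketing, and it does not model cache behavior at all. The function name `top_k_sum` and its parameters are hypothetical.

```python
import heapq
from collections import defaultdict

def top_k_sum(rows, k, sample_every=10, buckets=64):
    """rows: list of (group, value) pairs with non-negative values."""
    # 1. Sample the data to guess which groups are likely top-k candidates.
    sampled = defaultdict(float)
    for group, value in rows[::sample_every]:
        sampled[group] += value
    candidates = set(heapq.nlargest(k, sampled, key=sampled.get))

    # 2. One pass: exact sums for candidates, hashed bucket sums for the rest.
    exact = defaultdict(float)
    bucket_sum = [0.0] * buckets
    for group, value in rows:
        if group in candidates:
            exact[group] += value
        else:
            bucket_sum[hash(group) % buckets] += value

    # 3. A bucket total is an upper bound for every group hashed into it, so
    #    buckets that cannot beat the weakest candidate are pruned.
    threshold = min(exact.values()) if len(exact) == k else float("-inf")

    # 4. Exact sums only for non-candidates in the surviving buckets.
    for group, value in rows:
        if group not in candidates and bucket_sum[hash(group) % buckets] > threshold:
            exact[group] += value
    return heapq.nlargest(k, exact.items(), key=lambda item: item[1])

# Skewed toy data: 1000 groups, three of which dominate.
rows = [(i % 1000, 1.0) for i in range(5000)]
rows += [(7, 1.0)] * 3000 + [(42, 1.0)] * 2000 + [(3, 1.0)] * 1000
result = top_k_sum(rows, k=3)
```

With skewed data like this, every non-candidate bucket falls below the threshold, so no exact aggregation is computed for the 997 light groups.
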

Method and system for automatically tagging data

Patent number: 12380082

Abstract: Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data file addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using lightweight offline processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary, and the list is updated by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging pattern from the list based on having the least number of matching columns in the data lake.

Type: Grant

Filed: June 23, 2022

Date of Patent: August 5, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yeye He, Jie Song, Yue Wang, Surajit Chaudhuri, Vishal Kumar Seshagirirao Anil, Yaron Y. Goland, Gaurav Malhotra, Blake Lassiter

USING WORKLOAD REDUCTION TO IMPROVE INDEX TUNING

Publication number: 20250245222

Abstract: This disclosure describes a workload reduction system that reduces the complexity of workloads sent to a database system. For instance, the workload reduction system pre-processes workloads sent to a database system by generating reduced workloads that include less complex queries that reference fewer tables or columns than the original workloads. In various implementations, the workload reduction system uses table reduction functions and query re-writing functions to generate the reduced workloads. As a result, the workload reduction system improves computational efficiency by rewriting complex queries from workloads into simpler ones that speed up index tuning and decrease individual what-if call times.

Type: Application

Filed: January 29, 2024

Publication date: July 31, 2025

Inventors: Matteo BRUCATO, Tarique Ashraf SIDDIQUI, Wentao WU, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI

Synthesizing transformations to relationalize data tables

Patent number: 12373415

Abstract: This document relates to relational databases and corresponding data tables. Non-conforming data tables can be automatically transformed into conforming relational data tables. One example can obtain conforming relational data tables and can generate training data without human labelling by identifying a transformational operator that will transform an individual conforming relational data table to a non-conforming data table and an inverse transformational operator that will transform the non-conforming data table back to the individual conforming relational data table. The example can train a model with the training data. The trained model can synthesize programs to transform other non-conforming data tables to conforming relational data tables.

Type: Grant

Filed: June 7, 2023

Date of Patent: July 29, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri, Peng Li

Facilitating data transformations

Patent number: 12298996

Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, the set including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.

Type: Grant

Filed: September 28, 2023

Date of Patent: May 13, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yeye He, Kris K. Ganjam, Vivek Ravindranath Narasayya, Surajit Chaudhuri

Data unification

Patent number: 12292866

Abstract: Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.

Type: Grant

Filed: June 7, 2023

Date of Patent: May 6, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Meiyalagan Balasubramanian, Lengning Liu, Aditya Kuppa, Kirk Hartmann Freiheit, Kalen Wong, Paula Budig Greve, Patrick Clinton Little, Lucas Pritz, Yue Wang, Vivek Ravindranath Narasayya, Katchaguy Areekijseree, Yeye He, Surajit Chaudhuri, Gaurav Ghosh
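
The steps in this abstract (select a subset of fields by rule, derive a stable identifier from their content, write it into the primary key field) can be sketched as below. The rule, the field names, the `primary_key` field, and the choice of a truncated SHA-256 digest are all assumptions for illustration; the abstract does not specify how the identifier is computed.

```python
import hashlib

def select_fields(record, rule):
    """Return the names of the fields that the rule selects, in a fixed order."""
    return sorted(name for name, value in record.items() if rule(name, value))

def stable_id(record, field_names):
    """Hash the selected fields' content into a deterministic identifier."""
    payload = "\x1f".join(f"{name}={record[name]}" for name in field_names)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

def unify(record, rule, key_field="primary_key"):
    """Copy the record and insert its stable identifier into the key field."""
    fields = select_fields(record, rule)
    unified = dict(record)
    unified[key_field] = stable_id(record, fields)
    return unified

def identifying_fields(name, value):
    """Hypothetical rule: keep only non-empty identifying fields."""
    return name in {"email", "name"} and value not in (None, "")

a = unify({"name": "Ada", "email": "ada@example.com", "visits": 3}, identifying_fields)
b = unify({"name": "Ada", "email": "ada@example.com", "visits": 7}, identifying_fields)
c = unify({"name": "Ada", "email": "ada@example.org", "visits": 3}, identifying_fields)
```

Because `visits` is outside the selected subset, records `a` and `b` receive the same identifier, while `c` differs in a selected field and receives a different one.
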

CACHE-EFFICIENT TOP-K AGGREGATION OVER HIGH CARDINALITY LARGE DATASETS

Publication number: 20250103591

Abstract: A data processing system implements a cache-conscious aggregation framework for cache-efficient top-k aggregation over high cardinality large datasets. The framework leverages skew in the distribution of data in the datasets to minimize data movements within the local caches of the cores of the multicore processors of the data processing system. The framework performs representative sampling on the dataset and utilizes these samples to identify candidate groups in the dataset for the top-k results. The system performs exact aggregations for the candidate groups and performs hashing and pruning on the non-candidate groups in the dataset to identify top-k results included in the non-candidate groups without having to calculate the exact aggregations for the non-candidate groups.

Type: Application

Filed: September 26, 2023

Publication date: March 27, 2025

Applicant: Microsoft Technology Licensing, LLC

Inventors: Tarique Ashraf SIDDIQUI, Vivek Ravindranath NARASAYYA, Marius DUMITRU, Surajit CHAUDHURI

Scalable index tuning with index filtering and index cost models

Patent number: 12248454

Abstract: A method of training an index filter for an index tuning system includes receiving a plurality of different workloads and a plurality of different databases, each database including different tables and each workload including a plurality of queries; generating labeled training data by making optimizer calls to a query optimizer using query and index configuration pairs from the plurality of databases and the plurality of workloads; training an index filter model to identify signals in the labeled training data, the signals being indicative of a potential performance improvement associated with using an index configuration for a given query; training the index filter model to learn rules over the signals for identifying spurious indexes; and storing the index filter model in a memory.

Type: Grant

Filed: August 29, 2022

Date of Patent: March 11, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Tarique Ashraf Siddiqui, Vivek Ravindranath Narasayya, Surajit Chaudhuri, Wentao Wu

Efficient configuration selection for automated machine learning

Patent number: 12223407

Abstract: In automated machine learning, an approximate best configuration can be selected among multiple candidate machine-learning configurations by progressively sampling training and test datasets for the iterative training and testing of the configurations while progressively pruning the set of candidate configurations based on associated estimated confidence intervals for their respective performance.

Type: Grant

Filed: August 23, 2018

Date of Patent: February 11, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chi Wang, Silu Huang, Surajit Chaudhuri, Bolin Ding
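
The progressive sampling and pruning loop in this abstract can be sketched as follows. This is a simplified illustration, not the patented procedure: the Hoeffding-style interval, the doubling schedule, and the `evaluate` callback (a stand-in for training a configuration on a sample of size `n` and testing it) are assumptions, as are all names.

```python
import math

def select_config(configs, evaluate, total, start=100, delta=0.05):
    """Grow the sample size while pruning configurations that cannot win."""
    alive = list(configs)
    n = min(start, total)
    while True:
        # Assumed confidence radius (Hoeffding bound for scores in [0, 1]).
        radius = math.sqrt(math.log(2 / delta) / (2 * n))
        score = {c: evaluate(c, n) for c in alive}
        best_lower = max(score.values()) - radius
        # Keep a configuration only if its upper bound reaches the best lower bound.
        alive = [c for c in alive if score[c] + radius >= best_lower]
        if len(alive) == 1 or n == total:
            return max(alive, key=lambda c: score[c])
        n = min(2 * n, total)

# Hypothetical stub: fixed accuracies instead of real training and testing.
accuracy = {"a": 0.90, "b": 0.60, "c": 0.88}
calls = []

def evaluate(config, n):
    calls.append((config, n))
    return accuracy[config]

winner = select_config(accuracy, evaluate, total=1600)
```

In this toy run the clearly weaker configuration is dropped after the smallest sample, so the larger and more expensive samples are spent only on the two close contenders.
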

Systems and methods for accelerating and optimizing groupwise comparison in relational databases

Patent number: 12222939

Abstract: A system and method for executing SQL statements includes receiving an SQL statement for comparing two trendsets over a relation using a scoring function, each of the trendsets including one or more trends, each of the trends being designated by a constraint and a grouping-measure combination, wherein comparing the trendsets includes identifying trend pairs for comparison, each of the trend pairs including a trend from each of the trendsets having a common grouping-measure combination. The SQL statement is transformed into a basic plan of existing logical operators for performing the SQL statement. A set of sub-plans is determined based on the basic plan. Pairs of sub-plans are merged to generate a set of merged sub-plans. A cost for each of the merged sub-plans is determined. The merged sub-plan having the lowest cost is used to execute the SQL statement.

Type: Grant

Filed: June 6, 2022

Date of Patent: February 11, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Tarique Ashraf Siddiqui, Surajit Chaudhuri, Vivek Ravindranath Narasayya

Synthesizing Transformations to Relationalize Data Tables

Publication number: 20240411740

Abstract: This document relates to relational databases and corresponding data tables. Non-conforming data tables can be automatically transformed into conforming relational data tables. One example can obtain conforming relational data tables and can generate training data without human labelling by identifying a transformational operator that will transform an individual conforming relational data table to a non-conforming data table and an inverse transformational operator that will transform the non-conforming data table back to the individual conforming relational data table. The example can train a model with the training data. The trained model can synthesize programs to transform other non-conforming data tables to conforming relational data tables.

Type: Application

Filed: June 7, 2023

Publication date: December 12, 2024

Applicant: Microsoft Technology Licensing, LLC

Inventors: Yeye HE, Cong YAN, Yue WANG, Surajit CHAUDHURI, Peng LI

AUTOMATICALLY BUILDING BUSINESS INTELLIGENCE MODELS

Publication number: 20240346427

Abstract: The present disclosure relates to methods and systems that automatically predict a business intelligence model for tables of data provided as input. The methods and systems automatically generate a graph representing the business intelligence model and provide the graph as output. The graph provides a visual representation of the business intelligence model with nodes of the graph representing each input table and edges of the graph representing weighted edges joining pairs of tables together.

Type: Application

Filed: April 14, 2023

Publication date: October 17, 2024

Inventors: Yeye HE, Yiming Stefania LIN, Surajit CHAUDHURI

Compressing workloads for scalable index tuning

Patent number: 12105713

Abstract: The present disclosure relates to methods and systems for compressing workloads for use with index tuning. The methods and systems receive a workload with a plurality of queries. The methods and systems represent each query using query features and a utility. The methods and systems select a query for a query subset based on a benefit of the query determined using the query features and the utility. The methods and systems update the features and the utility of the remaining queries in the workload and select another query to add to the query subset based on an updated benefit determined using the updated features and utilities. The methods and systems select queries until the query subset reaches a received query subset size. The methods and systems use the query subset in index tuning to provide one or more index recommendations.

Type: Grant

Filed: May 10, 2022

Date of Patent: October 1, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Tarique Ashraf Siddiqui, Saehan Jo, Wentao Wu, Chi Wang, Vivek Ravindranath Narasayya, Surajit Chaudhuri
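
The greedy select-and-update loop in this abstract can be sketched as below. The benefit function (the query's current utility) and the update rule (drop features already covered by a selected query and scale the utility by the share of features left) are assumptions made for illustration; the abstract does not define them. The query names and features are hypothetical.

```python
def compress_workload(queries, k):
    """queries: dict mapping query name -> (set of features, utility)."""
    remaining = {name: (set(feats), float(util))
                 for name, (feats, util) in queries.items()}
    subset = []
    while remaining and len(subset) < k:
        # Assumed benefit: the query's current utility (ties broken by name).
        best = max(sorted(remaining), key=lambda name: remaining[name][1])
        covered, _ = remaining.pop(best)
        subset.append(best)
        # Update the remaining queries: remove covered features and shrink
        # each utility by the share of that query's features still uncovered.
        for name, (feats, util) in list(remaining.items()):
            left = feats - covered
            share = len(left) / len(feats) if feats else 0.0
            remaining[name] = (left, util * share)
    return subset

workload = {
    "q1": ({"orders.date", "orders.customer"}, 10),
    "q2": ({"orders.date"}, 8),
    "q3": ({"items.sku"}, 5),
}
subset = compress_workload(workload, k=2)
```

Although `q2` starts with a higher utility than `q3`, its only feature is covered once `q1` is selected, so the second pick goes to `q3`.
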

Constraint-based index tuning in database management systems utilizing reinforcement learning

Patent number: 12066993

Abstract: The present disclosure relates to systems, methods, and computer-readable media for determining optimal index configurations for processing workloads in a database management system. For instance, an index configuration system can efficiently determine a subset of indexes for processing a workload utilizing one or more reinforcement learning models. For example, in various implementations, the index configuration system utilizes a Markov decision process and/or a Monte Carlo tree search model to determine an optimal subset of indexes for processing a workload in a manner that effectively utilizes computing device resources while also avoiding significant interference with customer workloads.

Type: Grant

Filed: June 3, 2022

Date of Patent: August 20, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Wentao Wu, Chi Wang, Tarique Ashraf Siddiqui, Vivek Ravindranath Narasayya, Surajit Chaudhuri

EXTENSIBLE DATA TRANSFORMATIONS

Publication number: 20240184798

Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received. A repository of transformation tools is searched to identify a new transformation tool as relevant to a data transformation associated with the received set of example values. The repository includes annotations associated with the new transformation tool. The new transformation tool is used to generate a transformation program that produces transformed output values. Additional annotations are generated for the new transformation tool based on the transformed output values.

Type: Application

Filed: December 5, 2022

Publication date: June 6, 2024

Inventors: Kris K. GANJAM, Yeye HE, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI

Using query logs to optimize execution of parametric queries

Patent number: 11934398

Abstract: The present disclosure relates to systems, methods, and computer-readable media for optimizing selection of a cached execution plan to use in processing a parametric query. For example, systems described herein involve training a plan selection model that makes use of machine learning to identify an execution plan from a set of pre-selected execution plans based on predicted cost of executing a query instance in accordance with the selected execution plan (e.g., relative to predicted costs of executing the query instance using other pre-selected execution plans). This application describes features related to lowering costs associated with selecting the execution plan in a way that will continue to be more accurate over time based on training and refining the plan selection model.

Type: Grant

Filed: June 28, 2021

Date of Patent: March 19, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Anshuman Dutt, Kapil Eknath Vaidya, Vivek Ravindranath Narasayya, Surajit Chaudhuri

Automatic transformation of data by patterns

Patent number: 11886457

Abstract: A transform-by-pattern (TBP) system is configured to proactively suggest relevant TBP programs based on an inputted source dataset and target dataset without requiring users to type in examples. The TBP system has access to multiple TBP programs, each of which includes a combination of a source pattern, a target pattern, and a transformation program that is configured to transform data that fits into the source pattern into data that fits into the target pattern. When a source dataset and a target dataset are received from a user, the TBP system identifies a subset of the source dataset and a subset of the target dataset as related data. The TBP system then identifies one or more applicable TBP programs amongst the multiple TBP programs, and suggests or applies at least one of the one or more applicable TBP programs.

Type: Grant

Filed: May 29, 2020

Date of Patent: January 30, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yeye He, Surajit Chaudhuri, Zhongjun Jin

FACILITATING DATA TRANSFORMATIONS

Publication number: 20240028607

Abstract: Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating data transformations, according to embodiments of the present invention. In one embodiment, a set of example values is received, the set including example input values that indicate data values to be transformed and example output values that indicate a desired form in which to transform data. Based on the set of example values, a data transformation function that is relevant to the set of example values is identified. The data transformation function is used to generate a transformation program to transform the example input values to the desired form in which to transform data. A suggestion of the transformation program can be provided to a user device, wherein selection of the transformation program suggestion results in a data transformation.

Type: Application

Filed: September 28, 2023

Publication date: January 25, 2024

Inventors: Yeye HE, Kris K. GANJAM, Vivek Ravindranath NARASAYYA, Surajit CHAUDHURI

Synthesizing multi-operator data transformation pipelines

Patent number: 11880344

Abstract: Methods and systems for generating multi-operator data transformation pipelines. An example method includes accessing raw data for transformation; receiving a selection of a target table or target visualization, wherein the target table or target visualization is for data other than the raw data; extracting table properties and target constraints; and based on the extracted table properties and target constraints, synthesizing one or more multi-operator data transformation pipelines for transforming the raw data to a generated table or generated visualization.

Type: Grant

Filed: May 14, 2021

Date of Patent: January 23, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yeye He, Surajit Chaudhuri, Junwen Yang