Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Relaxation-based approach to automatic physical database tuning

Publication number: 20060242102

Abstract: A system that facilitates automatic selection of a physical configuration of a database comprises an optimizer component that determines simulated physical structures and creates a hypothetical configuration based thereon. A reduction component progressively reduces size of the configuration until the hypothetical configuration is associated with a size below a threshold. For example, the simulated physical structures can be based at least in part upon a workload.

Type: Application

Filed: April 21, 2005

Publication date: October 26, 2006

Applicant: Microsoft Corporation

Inventors: Nicolas Bruno, Surajit Chaudhuri
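
As an illustration of the reduction step described in the abstract of publication 20060242102, here is a minimal sketch. It assumes a greedy rule that is not stated in the abstract: repeatedly drop the structure with the lowest benefit per unit of size until the configuration is below the threshold. The function name, the structure names, and the (size, benefit) figures are all hypothetical.

```python
def relax(configuration, size_limit):
    """Shrink a hypothetical configuration until its total size is below size_limit.

    configuration maps a structure name to (size, benefit); sizes must be positive.
    The greedy drop rule is an assumption made for this sketch only.
    """
    current = dict(configuration)
    while current and sum(size for size, _ in current.values()) >= size_limit:
        # Drop the structure that contributes the least benefit per unit of size.
        worst = min(current, key=lambda name: current[name][1] / current[name][0])
        del current[worst]
    return current


# Hypothetical simulated structures: name -> (size, estimated benefit).
hypothetical = {"idx_a": (100, 50.0), "idx_b": (200, 20.0), "view_c": (300, 90.0)}
print(sorted(relax(hypothetical, 450)))
```

A real tuning tool would presumably re-evaluate the workload cost with an optimizer after each reduction; the fixed benefit figures here only stand in for that.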

Optimization based method for estimating the results of aggregate queries

Patent number: 7120624

Abstract: A method for estimating the result of a query on a database having data records arranged in tables. The database has an expected workload that includes a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an estimated result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.

Type: Grant

Filed: May 21, 2001

Date of Patent: October 10, 2006

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Gautam Das

Optimizing multi-predicate selections on a relation using indexes

Patent number: 7120623

Abstract: Methods of optimizing access to a relation queried through a number of predicates. The methods identify one or more candidate predicates of the selection condition that can be used to factorize the selection condition. A gain from using one or more of the candidate predicates to factorize the selection condition is computed. One or more of the candidate predicates that result in a positive gain are factored from the selection condition to produce a rewritten selection condition. The candidate predicates can be predicates that appear exactly in the selection condition more than once and/or merged predicates that may be predicates in the selection condition that overlap.

Type: Grant

Filed: August 29, 2002

Date of Patent: October 10, 2006

Assignee: Microsoft Corporation

Inventors: Prasanna Ganesan, Surajit Chaudhuri

Flexible database generators

Publication number: 20060123009

Abstract: A flexible, easy to use, and scalable framework for database generation and mappings of synthetic distributions to the framework. The framework discloses a specification language, database primitives, aspects of a runtime system, and an extension to create table SQL statements, to generate databases with complex synthetic distributions and inter-table correlations. The framework facilitates generation of a data generator which can output the synthetic data distribution. The data distribution includes at least one of a complex intra-table correlation and a complex inter-table correlation. The framework further comprises an annotations component that facilitates annotation of a relational database statement (e.g., a CREATE TABLE statement) which specifies concisely how a table will be populated. The framework further comprises a language component (e.g., a Data Generation Language (DGL)) that specifies the data distribution.

Type: Application

Filed: December 7, 2004

Publication date: June 8, 2006

Applicant: Microsoft Corporation

Inventors: Nicolas Bruno, Surajit Chaudhuri

Sampling for queries

Publication number: 20060085463

Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.

Type: Application

Filed: December 7, 2005

Publication date: April 20, 2006

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar

Sampling for queries

Publication number: 20060085410

Abstract: Results of a database query are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. An aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.

Type: Application

Filed: December 7, 2005

Publication date: April 20, 2006

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
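
The inverse-probability weighting described in the abstract of publication 20060085410 can be sketched roughly as follows. This is an illustration only: it assumes each tuple is kept independently with its own inclusion probability, and the function names, values, and probabilities are hypothetical rather than taken from the application.

```python
import random


def weighted_sample(values, probs, rng):
    """Keep each value independently with its own inclusion probability."""
    return [(value, p) for value, p in zip(values, probs) if rng.random() < p]


def estimate_sum(sample):
    """Scale each sampled value by 1/p so rarely sampled tuples count for more."""
    return sum(value / p for value, p in sample)


rng = random.Random(0)
values = [10.0, 20.0, 30.0, 1000.0]
probs = [0.5, 0.5, 0.5, 1.0]  # hypothetical usage probabilities per tuple
sample = weighted_sample(values, probs, rng)
print(estimate_sum(sample))
```

Dividing by the inclusion probability keeps the estimate unbiased for a SUM aggregate: a tuple kept with probability 0.5 stands in for two tuples on average.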

Database tuning advisor

Publication number: 20060085484

Abstract: An automated physical database design tool may provide an integrated physical design recommendation for horizontal partitioning, indexes and indexed views, all three features being tuned together (in concert). Manageability requirements may be specified when optimizing for performance. User-specified configuration may enable the specification of a partial physical design without materialization of the physical design. The tuning process may be performed for a production server but may be conducted substantially on a test server. Secondary indexes may be suggested for XML columns. Tuning of a database may be invoked by any owner of a database. Usage of objects may be evaluated and a recommendation for dropping unused objects may be issued. Reports may be provided concerning the count and percentage of queries in the workload that reference a particular database, and/or the count and percentage of queries in the workload that reference a particular table or column.

Type: Application

Filed: October 15, 2004

Publication date: April 20, 2006

Applicant: Microsoft Corporation

Inventors: Alexander Raizman, Arunprasad Marathe, Djana Milton, Dmitry Sonkin, Lubor Kollar, Maciej Sarnowicz, Manoj Syamala, Raja Duddupudi, Sanjay Agrawal, Surajit Chaudhuri, Vivek Narasayya

Schema for physical database tuning

Publication number: 20060085378

Abstract: Internal communications within components of an automated physical database design tool may be conducted in a data description language such as XML. Inputs to and outputs from the automated physical database design tool may also be presented in the data description language (e.g., XML). The communications, inputs and outputs may comply with a schema for the data description language. The schema may be written in a schema language such as XSD. Inputs presented in the data description language may comprise tuning options. Outputs may comprise a proposed physical design for a database and reports.

Type: Application

Filed: October 15, 2004

Publication date: April 20, 2006

Applicant: Microsoft Corporation

Inventors: Alexander Raizman, Arunprasad Marathe, Djana Ophelia Milton, Dmitry Sonkin, Lubor Kollar, Maciej Sarnowicz, Manoj Syamala, Raja Duddupudi, Sanjay Agrawal, Surajit Chaudhuri, Vivek Narasayya

Database aggregation query result estimator

Publication number: 20060053103

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Application

Filed: October 7, 2005

Publication date: March 9, 2006

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar
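
The outlier-plus-sample estimate described in the abstract of publication 20060053103 can be sketched as below. The sketch assumes a SUM aggregate, a fixed number of outliers chosen in advance, and uniform sampling of the remaining values; the function names and data are hypothetical.

```python
import random
import statistics


def split_outliers(values, num_outliers):
    """Slide a window over the sorted values and keep the lowest-variance window.

    Values outside the chosen window are treated as outliers.
    """
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    best_start = min(
        range(num_outliers + 1),
        key=lambda start: statistics.pvariance(ordered[start:start + window]),
    )
    inliers = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return inliers, outliers


def estimate_sum(values, num_outliers, sample_size, rng):
    """Aggregate outliers exactly, then extrapolate a uniform sample of the rest."""
    inliers, outliers = split_outliers(values, num_outliers)
    sample = rng.sample(inliers, min(sample_size, len(inliers)))
    scale = len(inliers) / len(sample)
    return sum(outliers) + scale * sum(sample)


data = [1, 2, 3, 4, 1000]
print(split_outliers(data, 1))
print(estimate_sum(data, num_outliers=1, sample_size=2, rng=random.Random(0)))
```

Handling the extreme values separately is what keeps the sampled part low in variance, so a small sample of the remaining data can still extrapolate well.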

Robust detector of fuzzy duplicates

Publication number: 20060053129

Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.

Type: Application

Filed: August 30, 2004

Publication date: March 9, 2006

Applicant: Microsoft Corporation

Inventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti

Method of building multidimensional workload-aware histograms

Patent number: 7007039

Abstract: In a database system, a method of maintaining a self-tuning histogram having a plurality of existing rectangular shaped buckets arranged in a hierarchical manner and defined by at least two bucket boundaries, a bucket volume, and a bucket frequency. At least one new bucket is created in response to a query on the database. Each new bucket is contained within at least one existing bucket and the new bucket becomes a child bucket and the existing bucket containing it becomes a parent bucket. The boundaries of each new bucket correspond to a region of the database accessed by the query and the frequency of the new bucket is a number of data records returned by the query. Buckets may be merged based on a merge criterion such as similar bucket density when the total number of buckets exceeds the predetermined budget.

Type: Grant

Filed: June 14, 2001

Date of Patent: February 28, 2006

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Nicolas Bruno, Luis Gravano

Database aggregation query result estimator

Publication number: 20060036600

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Application

Filed: October 7, 2005

Publication date: February 16, 2006

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Rajeev Motwani, Mayur Datar

Dynamic physical database design

Publication number: 20060036989

Abstract: A monitoring component of a database server collects a subset of a query workload along with related statistics. A remote index tuning component uses the workload subset and related statistics to determine a physical design that minimizes the cost of executing queries in the workload subset while ensuring that queries omitted from the subset do not degrade in performance.

Type: Application

Filed: August 10, 2004

Publication date: February 16, 2006

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Arnd Konig, Vivek Narasayya

Automatic categorization of query results

Publication number: 20060036581

Abstract: At least one implementation of database management technology, described herein, utilizes categorization of query results when querying a relational database in order to reduce information overload. To reduce information overload even further, another implementation, described herein, utilizes both categorization and ranking of query results when searching a relational database.

Type: Application

Filed: August 13, 2004

Publication date: February 16, 2006

Applicant: Microsoft Corporation

Inventors: Kaushik Chakrabarti, Seung-won Hwang, Surajit Chaudhuri

Ranking database query results

Publication number: 20050289102

Abstract: A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which are usually not possible with text data.

Type: Application

Filed: June 29, 2004

Publication date: December 29, 2005

Applicant: Microsoft Corporation

Inventors: Gautam Das, Surajit Chaudhuri, Vagelis Hristidis, Gerhard Weikum

Method and apparatus for exploiting statistics on query expressions for optimization

Publication number: 20050267877

Abstract: A method for evaluating a user query on a relational database having records stored therein, a workload made up of a set of queries that have been executed on the database, and a query optimizer that generates a query execution plan for the user query. Each query plan includes a plurality of intermediate query plan components that verify a subset of records from the database meeting query criteria. The method accesses the query plan and a set of stored intermediate statistics for records verified by query components, such as histograms that summarize the cardinality of the records that verify the query component. The method forms a transformed query plan based on the selected intermediate statistics (possibly by rewriting the query plan) and estimates the cardinality of the transformed query plan to arrive at a more accurate cardinality estimate for the query.

Type: Application

Filed: July 7, 2005

Publication date: December 1, 2005

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Nicolas Bruno

Detecting duplicate records in databases

Publication number: 20050262044

Abstract: The invention concerns a detection of duplicate tuples in a database. Previous domain independent detection of duplicated tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high quality, scalable duplicate detection process.

Type: Application

Filed: July 14, 2005

Publication date: November 24, 2005

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna

Detecting duplicate records in database

Patent number: 6961721

Abstract: The invention concerns a detection of duplicate tuples in a database. Previous domain independent detection of duplicated tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high quality, scalable duplicate detection process.

Type: Grant

Filed: June 28, 2002

Date of Patent: November 1, 2005

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna

Query selectivity estimation with confidence interval

Publication number: 20050228779

Abstract: Selectivity estimates are produced that meet a desired confidence threshold. To determine the confidence level of a given selectivity estimate for a query expression, the query expression is evaluated on a sample of tuples. A probability density function is derived based on the number of tuples in the sample that satisfy the query expression. The cumulative distribution for the probability density function is solved for the given threshold to determine a selectivity estimate at the given confidence value.

Type: Application

Filed: April 6, 2004

Publication date: October 13, 2005

Inventors: Surajit Chaudhuri, Brian Babcock
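
One way to read the abstract of publication 20050228779 is sketched below. The abstract does not say which density is used, so this sketch assumes a uniform prior, which makes the density a Beta(hits + 1, n - hits + 1) distribution. The function names are hypothetical, and the cumulative distribution is solved by bisection.

```python
from math import comb


def posterior_cdf(x, hits, n):
    """CDF at x of Beta(hits + 1, n - hits + 1).

    For integer parameters this equals P(Binomial(n + 1, x) >= hits + 1).
    """
    m = n + 1
    return sum(
        comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(hits + 1, m + 1)
    )


def selectivity_at_confidence(hits, n, confidence, tol=1e-9):
    """Find by bisection the selectivity s with posterior_cdf(s) == confidence."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if posterior_cdf(mid, hits, n) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# 10 of 100 sampled tuples satisfy the predicate (hypothetical numbers).
print(selectivity_at_confidence(10, 100, 0.50))
print(selectivity_at_confidence(10, 100, 0.95))
```

A higher confidence threshold yields a larger, more conservative selectivity estimate from the same sample.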

Primitives for workload summarization

Publication number: 20050223026

Abstract: A database object summarization tool is provided that selects a subset of database objects subject to filtering constraints such as a partial order or optimization of some attribute. A dominance primitive filters out tuples that are dominated according to a partial order constraint by another tuple. A representation primitive selects a representative subset of tuples such that an optimization criterion is met.

Type: Application

Filed: March 31, 2004

Publication date: October 6, 2005

Inventors: Surajit Chaudhuri, Vivek Narasayya, Prasanna Ganesan
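
The dominance primitive in the abstract of publication 20050223026 can be illustrated with a small sketch. The abstract does not define the partial order, so this sketch assumes one: a tuple dominates another when it is at least as large in every attribute and strictly larger in at least one. The function names and sample tuples are hypothetical.

```python
def dominates(a, b):
    """Assumed partial order: a >= b in every attribute and a > b in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_filter(tuples):
    """Keep only the tuples that no other tuple dominates."""
    return [t for t in tuples if not any(dominates(other, t) for other in tuples)]


# Hypothetical two-attribute tuples.
workload = [(1, 1), (2, 2), (3, 1), (1, 3)]
print(dominance_filter(workload))
```

The pairwise comparison is quadratic in the number of tuples, which is acceptable for a sketch but would need a more careful algorithm for a large workload.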
