Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Techniques for estimating progress of database queries

Patent number: 7454407

Abstract: Techniques for estimating the progress of database queries are described herein. In a first implementation, a respective lower-bound parameter is associated with each node in an operator tree that represents a given database query, and the progress of the database query at a given point is estimated based upon the lower-bound parameters. In a second implementation, the progress of the query is estimated by associating respective lower-bound and upper-bound parameters with each node in the operator tree. The progress of the query at the given point is then estimated based on the lower-bound and upper-bound parameters.

Type: Grant

Filed: June 10, 2005

Date of Patent: November 18, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Ravishankar Ramamurthy, Kaushik Shriraghav
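As a rough illustration of the second implementation described in the abstract for patent 7454407, the sketch below attaches a lower and an upper bound to each plan node and turns them into a progress interval. The `Node` fields, the `progress_bounds` function, and the ratio-of-tuples formula are assumptions made for this example; the abstract does not specify how the bounds are combined.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One operator in a query plan (hypothetical structure, not from the patent)."""
    name: str
    done: int   # tuples produced so far
    lower: int  # lower bound on the total tuples this node will produce
    upper: int  # upper bound on the total tuples this node will produce


def progress_bounds(nodes):
    """Return (pessimistic, optimistic) progress fractions in [0, 1].

    Assumption: progress is measured as tuples produced so far divided by
    the total tuples the plan will produce, and that total is only known
    to lie between the summed lower and upper bounds.
    """
    done = sum(n.done for n in nodes)
    # A node can never finish with fewer tuples than it has already produced.
    total_low = sum(max(n.lower, n.done) for n in nodes)
    total_high = sum(max(n.upper, n.lower, n.done) for n in nodes)
    if total_high == 0:
        return 1.0, 1.0  # nothing to do, so the query is trivially complete
    pessimistic = done / total_high
    optimistic = done / total_low if total_low else 1.0
    return pessimistic, min(optimistic, 1.0)


plan = [
    Node("scan_orders", done=500, lower=1000, upper=1000),
    Node("filter", done=100, lower=100, upper=1000),
    Node("join", done=50, lower=50, upper=2000),
]
low, high = progress_bounds(plan)
print(f"progress is between {low:.1%} and {high:.1%}")
```

Using only `total_low` would correspond loosely to the first implementation, which relies on lower bounds alone.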

CONTINUOUS PHYSICAL DESIGN TUNING

Publication number: 20080183764

Abstract: Online physical design tuning constantly monitors database indexes and can effectively react to changes in a workload by modifying the physical design as needed. Algorithms can be utilized that take into account various criteria including storage constraints, update statements, and the cost of temporarily creating physical structures.

Type: Application

Filed: January 31, 2007

Publication date: July 31, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Nicolas Bruno, Surajit Chaudhuri

LIGHTWEIGHT PHYSICAL DESIGN ALERTER

Publication number: 20080183644

Abstract: A lightweight physical design alerter can analyze a workload and determine whether a comprehensive tuning session would result in a configuration improvement over the current configuration. The alerter provides a low-overhead procedure that can run during normal operation of a database management system and produce a notification if a current configuration is less than optimal. The alerter can report lower and upper bounds on the improvements that could be obtained if a comprehensive tuning tool is launched. A lower bound can be justified by generating feasible configurations. The disclosed embodiments can be extended to query updates, materialized views, and other physical design features (e.g., partitioning).

Type: Application

Filed: January 31, 2007

Publication date: July 31, 2008

Applicant: Microsoft Corporation

Inventors: Nicolas Bruno, Surajit Chaudhuri

Primitive operator for similarity joins in data cleaning

Patent number: 7406479

Abstract: A set similarity join system and method are provided. The system can be employed to facilitate data cleaning based on similarities through the identification of “close” tuples (e.g., records and/or rows). “Closeness” can be evaluated using a similarity function(s) chosen to suit the domain and/or application. Thus, the system facilitates generic domain-independent data cleansing. The system can be employed with a foundational primitive, the set similarity join (SSJoin) operator, which can be used as a building block to implement a broad variety of notions of similarity (e.g., edit similarity, Jaccard similarity, generalized edit similarity, hamming distance, soundex, etc.) as well as similarity based on co-occurrences. The SSJoin operator can exploit the observation that set overlap can be used effectively to support a variety of similarity functions. The SSJoin operator compares values based on “sets” associated with (or explicitly constructed for) each one of them.

Type: Grant

Filed: February 10, 2006

Date of Patent: July 29, 2008

Assignee: Microsoft Corporation

Inventors: Kaushik Shriraghav, Surajit Chaudhuri, Venkatesh Ganti
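To make the set-overlap idea in the abstract for patent 7406479 concrete, here is a minimal sketch that builds a set of q-grams for each string and keeps the pairs whose sets overlap enough. The function names, the choice of padded 3-grams, and the nested-loop evaluation are all assumptions for illustration; the abstract does not describe how the SSJoin operator is actually executed.

```python
def qgrams(s, q=3):
    """Set of q-character substrings of s, lowercased and padded with '#'."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def ssjoin(left, right, min_overlap):
    """Yield (l, r, overlap) for pairs whose q-gram sets share >= min_overlap items.

    Naive nested loop for clarity; a real operator would avoid comparing all pairs.
    """
    right_sets = [(r, qgrams(r)) for r in right]
    for l in left:
        lset = qgrams(l)
        for r, rset in right_sets:
            overlap = len(lset & rset)
            if overlap >= min_overlap:
                yield l, r, overlap


def jaccard(a, b):
    """Jaccard similarity, computed from the same sets the overlap join uses."""
    sa, sb = qgrams(a), qgrams(b)
    return len(sa & sb) / len(sa | sb)


pairs = list(ssjoin(["Microsoft Corp"],
                    ["Microsoft Corporation", "Oracle Corp"],
                    min_overlap=10))
for l, r, overlap in pairs:
    print(l, "~", r, "overlap:", overlap, "jaccard:", round(jaccard(l, r), 2))
```

The overlap filter acts as the shared building block: a specific similarity function such as Jaccard can then be checked only on the pairs that survive it.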

INCREMENTAL REPAIR OF QUERY PLANS

Publication number: 20080177694

Abstract: Database systems use a plan cache to avoid the overheads (e.g., time, money) of query recompilation. Query plans can become invalidated by updates to the statistics on data or changes to the physical database design. Once a plan is invalidated, it can be repaired utilizing one or more of the disclosed embodiments. Incremental repair of query plans includes reusing parts of the current plan rather than discarding the plan entirely when it is invalidated. Repair to an existing query plan is attempted before resorting to full recompilation.

Type: Application

Filed: January 19, 2007

Publication date: July 24, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Surajit Chaudhuri, Ravishankar Ramamurthy

Ranking database query results using probabilistic models from information retrieval

Patent number: 7383262

Abstract: A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which are usually not possible with text data.

Type: Grant

Filed: June 29, 2004

Date of Patent: June 3, 2008

Assignee: Microsoft Corporation

Inventors: Gautam Das, Surajit Chaudhuri, Vagelis Hristidis, Gerhard Weikum

Method and apparatus for exploiting statistics on query expressions for optimization

Patent number: 7363289

Abstract: A method for evaluating a user query on a relational database having records stored therein, a workload made up of a set of queries that have been executed on the database, and a query optimizer that generates a query execution plan for the user query. Each query plan includes a plurality of intermediate query plan components that verify a subset of records from the database meeting query criteria. The method accesses the query plan and a set of stored intermediate statistics for records verified by query components, such as histograms that summarize the cardinality of the records that verify the query component. The method forms a transformed query plan based on the selected intermediate statistics (possibly by rewriting the query plan) and estimates the cardinality of the transformed query plan to arrive at a more accurate cardinality estimate for the query.

Type: Grant

Filed: July 7, 2005

Date of Patent: April 22, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Nicolas Bruno

Database aggregation query result estimator

Patent number: 7363301

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Grant

Filed: October 7, 2005

Date of Patent: April 22, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
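The steps in the abstract for patent 7363301 (find the lowest-variance window, aggregate the outliers exactly, sample and extrapolate the rest) can be sketched as follows for a SUM query. The function names, the fixed outlier budget `num_outliers`, and the use of uniform random sampling are assumptions for this example rather than details taken from the patent.

```python
import random
import statistics


def split_outliers(values, num_outliers):
    """Split values into (inliers, outliers).

    The values are sorted, and the contiguous window of
    len(values) - num_outliers items with the lowest variance is kept;
    everything outside that window is treated as an outlier.
    """
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    best_start = min(
        range(num_outliers + 1),
        key=lambda s: statistics.pvariance(ordered[s:s + window]),
    )
    inliers = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return inliers, outliers


def estimate_sum(values, num_outliers, sample_size, seed=0):
    """Estimate SUM(values): exact sum of outliers plus a scaled sample sum."""
    inliers, outliers = split_outliers(values, num_outliers)
    rng = random.Random(seed)
    sample = rng.sample(inliers, min(sample_size, len(inliers)))
    if not sample:
        return float(sum(outliers))
    scale = len(inliers) / len(sample)  # extrapolate the sample to all inliers
    return sum(outliers) + scale * sum(sample)


data = [10] * 98 + [5000, 9000]
estimate = estimate_sum(data, num_outliers=2, sample_size=10)
print("estimate:", estimate, "exact:", sum(data))
```

Removing the two large values before sampling is what keeps the estimate stable here: a plain 10-row sample of `data` would swing widely depending on whether it happened to include them.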

Efficient evaluation of queries with mining predicates

Patent number: 7346601

Abstract: A method for evaluating a user query on a database having a mining model that classifies records contained in the database into classes when the query comprises at least one mining predicate that refers to a class of database records. An upper envelope is derived for the class referred to by the mining predicate corresponding to a query that returns a set of database records that includes all of the database records belonging to the class. The upper envelope is included in the user query for query evaluation. The method may be practiced during a preprocessing phase by evaluating the mining model to extract a set of classes of the database records and deriving an upper envelope for each class. These upper envelopes are stored for access during user query evaluation.

Type: Grant

Filed: June 3, 2002

Date of Patent: March 18, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Sunita Sarawagi

Method and apparatus for generating statistics on query expressions for optimization

Patent number: 7330848

Abstract: A method and apparatus for creating a statistical representation of a query result that can be performed without executing the underlying query. For a binary-join query, a scan is performed on one of the join tables. A multiplicity value that estimates the number of tuples in the other join table that have a matching join attribute to the scanned tuple is calculated. A number of copies (as determined by the multiplicity value) are placed in a stream of tuples that is sampled to compile the statistical representation of the query result. For acyclic-join generating queries including selections, the above procedure is recursively extended. If multiple statistical representations are sought, scans can be shared. Scan sharing can be optimized using shortest common supersequence techniques.

Type: Grant

Filed: May 23, 2003

Date of Patent: February 12, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Nicolas Bruno
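The binary-join case in the abstract for patent 7330848 can be sketched as a scan that emits each tuple as many times as its multiplicity and samples the resulting stream, without ever materializing the join. In this sketch the multiplicities are exact counts from a `Counter` and the stream is sampled with a reservoir; both are assumptions, since the abstract speaks of an estimated multiplicity and does not name a sampling method. The table and column names are hypothetical.

```python
import random
from collections import Counter


def sample_join_attribute(scan_table, other_table, key, sample_size, seed=0):
    """Reservoir-sample join-key values from the virtual join result.

    Returns (sample, stream_length). The stream length equals the join
    cardinality when the multiplicities are exact, as they are here.
    """
    # Assumption: exact match counts stand in for the estimated multiplicity.
    multiplicity = Counter(row[key] for row in other_table)
    rng = random.Random(seed)
    reservoir, seen = [], 0
    for row in scan_table:
        # Emit one copy of the scanned tuple per matching tuple in the other table.
        for _ in range(multiplicity[row[key]]):
            seen += 1
            if len(reservoir) < sample_size:
                reservoir.append(row[key])
            else:
                j = rng.randrange(seen)
                if j < sample_size:
                    reservoir[j] = row[key]
    return reservoir, seen


orders = [{"cust": 1}, {"cust": 2}, {"cust": 3}]
payments = [{"cust": 1}, {"cust": 1}, {"cust": 1}, {"cust": 2}]
sample, join_size = sample_join_attribute(orders, payments, "cust", sample_size=10)
print("sampled keys:", sample, "stream length:", join_size)
```

A histogram built over `sample` would then serve as the statistical representation of the join result.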

Optimization based method for estimating the results of aggregate queries

Patent number: 7328221

Abstract: A method for estimating the result of a query on a database having data records arranged in tables. The database has an expected workload that includes a set of queries that can be executed on the database. An expected workload is derived including a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.

Type: Grant

Filed: September 8, 2004

Date of Patent: February 5, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Gautam Das

LOCALIZED MARKETING

Publication number: 20080005104

Abstract: A localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. Further, a system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results. Another aspect of the disclosure pertains to web searches and more particularly toward influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user.

Type: Application

Filed: June 28, 2006

Publication date: January 3, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Gary W. Flake, William H. Gates, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Trenholme J. Griffin, Oliver Hurst-Hiller, Kenneth A. Moss

VISUAL AND MULTI-DIMENSIONAL SEARCH

Publication number: 20080005091

Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.

Type: Application

Filed: June 28, 2006

Publication date: January 3, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller

VISUAL AND MULTI-DIMENSIONAL SEARCH

Publication number: 20080005105

Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.

Type: Application

Filed: June 28, 2006

Publication date: January 3, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller

SEARCH GUIDED BY LOCATION AND CONTEXT

Publication number: 20080005071

Abstract: The subject disclosure pertains to web searches and more particularly toward influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user. Still further yet, a localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. A system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results.

Type: Application

Filed: June 28, 2006

Publication date: January 3, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Gary W. Flake, William H. Gates, Trenholme J. Griffin, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Oliver Hurst-Hiller, Kenneth A. Moss

SEARCH OVER DESIGNATED CONTENT

Publication number: 20080005074

Abstract: The subject disclosure pertains to web searches and more particularly toward influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user. Still further yet, a localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. A system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results.

Type: Application

Filed: June 28, 2006

Publication date: January 3, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Gary W. Flake, William H. Gates, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Trenholme J. Griffin, Oliver Hurst-Hiller, Kenneth A. Moss

Cardinality estimation of joins

Patent number: 7299226

Abstract: A method of estimating cardinality of a join of tables using multi-column density values and additionally using coarser density values of a subset of the multi-column density attributes. In one embodiment, the subset of attributes for the coarser densities is a prefix of the set of multi-column density attributes. A number of tuples from each table that participate in the join may be estimated using densities of the subsets. The cardinality of the join can be estimated using the multi-column density for each table and the estimated number of tuples that participate in the join from each table.

Type: Grant

Filed: June 19, 2003

Date of Patent: November 20, 2007

Assignee: Microsoft Corporation

Inventors: Nicolas Bruno, Murali Krishna, Ming-Chuan Wu, Surajit Chaudhuri

Constructing database object workload summaries

Patent number: 7299220

Abstract: A database object summarization tool is provided that selects a subset of database objects subject to filtering constraints such as a partial order or optimization of some attribute. A dominance primitive filters out tuples that are dominated according to a partial order constraint by another tuple. A representation primitive selects a representative subset of tuples such that an optimization criterion is met.

Type: Grant

Filed: March 31, 2004

Date of Patent: November 20, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Prasanna Ganesan
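The dominance primitive in the abstract for patent 7299220 can be illustrated with a small filter that drops every tuple dominated by another one. The particular partial order used here (numeric tuples, larger is better in every position) and the function names are assumptions for the example; the abstract leaves the partial order to the caller.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Assumption: tuples are numeric and larger is better in every position.
    """
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated(tuples):
    """Keep only the tuples that no other tuple dominates."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]


# Hypothetical workload entries scored on two attributes, e.g. (cost, frequency).
queries = [(5, 1), (3, 3), (2, 2), (1, 5), (5, 0)]
kept = non_dominated(queries)
print(kept)
```

Here `(2, 2)` is dropped because `(3, 3)` dominates it, and `(5, 0)` is dropped because `(5, 1)` dominates it. A representation primitive would then pick a subset of the survivors according to some optimization criterion.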

Efficient fuzzy match for evaluating data records

Patent number: 7296011

Abstract: To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.

Type: Grant

Filed: June 20, 2003

Date of Patent: November 13, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani

Database aggregation query result estimator

Patent number: 7293037

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Grant

Filed: October 7, 2005

Date of Patent: November 6, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
