Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

INCREMENTAL REPAIR OF QUERY PLANS

Publication number: 20080177694

Abstract: Database systems use a plan cache to avoid the overheads (e.g., time, money) of query recompilation. Query plans can become invalidated by updates to the statistics on data or changes to the physical database design. Once a plan is invalidated, it can be repaired utilizing one or more of the disclosed embodiments. Incremental repair of query plans includes reusing parts of the current plan rather than discarding the plan entirely when it is invalidated. Repair to an existing query plan is attempted before resorting to full recompilation.

Type: Application

Filed: January 19, 2007

Publication date: July 24, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Surajit Chaudhuri, Ravishankar Ramamurthy
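The abstract gives no algorithm, so the following is only a minimal sketch of the general idea of repairing a plan instead of recompiling it. It assumes a hypothetical plan tree in which each operator records the statistics and indexes it depended on; `PlanNode`, `repair`, and the `recompile` callback are invented for illustration and are not the patented method. Subtrees untouched by an invalidation are reused, and only operators that depended on an invalidated object are rebuilt.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class PlanNode:
    """Hypothetical plan operator; `deps` names the statistics/indexes it relied on."""
    op: str
    deps: Set[str] = field(default_factory=set)
    children: List["PlanNode"] = field(default_factory=list)


def repair(node: PlanNode, invalidated: Set[str],
           recompile: Callable[[PlanNode], PlanNode]) -> PlanNode:
    """Rebuild only the operators whose dependencies were invalidated."""
    if node.deps & invalidated:
        # The choice made at this operator may no longer hold: rebuild this subtree.
        return recompile(node)
    # This operator is still valid: keep it and repair its inputs recursively.
    return PlanNode(node.op, node.deps,
                    [repair(child, invalidated, recompile) for child in node.children])


# Example: dropping index A.ix1 invalidates only the seek on table A.
plan = PlanNode("HashJoin", {"stats(A)", "stats(B)"}, [
    PlanNode("IndexSeek(A.ix1)", {"A.ix1"}),
    PlanNode("TableScan(B)", {"stats(B)"}),
])
repaired = repair(plan, {"A.ix1"},
                  lambda n: PlanNode("TableScan(A)", {"stats(A)"}))
print([c.op for c in repaired.children])  # ['TableScan(A)', 'TableScan(B)']
```

The sketch does not re-cost the surviving parent operators (here the hash join) after one of their inputs changes; a real repair strategy would have to decide when that is still safe.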
Ranking database query results using probabilistic models from information retrieval

Patent number: 7383262

Abstract: A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which are usually not possible with text data.

Type: Grant

Filed: June 29, 2004

Date of Patent: June 3, 2008

Assignee: Microsoft Corporation

Inventors: Gautam Das, Surajit Chaudhuri, Vagelis Hristidis, Gerhard Weikum
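As a rough illustration of the probabilistic-IR idea, the sketch below scores each candidate row by the values it has in attributes the query left unspecified, favoring values that past queries (the workload) request more often than they occur in the data. The function name, the add-one smoothing, and the per-attribute product are assumptions made for this example; the sketch ignores the pair-wise associations the abstract mentions and is not the patented ranking function.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def pir_score(row: Row, specified: Set[str],
              workload: List[Row], data: List[Row]) -> float:
    """Product over unspecified attributes of (workload freq + 1) / (data freq + 1)."""
    score = 1.0
    for attr, value in row.items():
        if attr in specified:
            continue  # every candidate already satisfies the query's own conditions
        in_workload = sum(1 for q in workload if q.get(attr) == value)
        in_data = sum(1 for r in data if r.get(attr) == value)
        score *= (in_workload + 1) / (in_data + 1)
    return score


data = [
    {"city": "Seattle", "view": "waterfront"},
    {"city": "Seattle", "view": "none"},
    {"city": "Kirkland", "view": "none"},
]
# Past queries (hypothetical): users often ask for waterfront views.
workload = [{"view": "waterfront"}, {"view": "waterfront"}, {"city": "Seattle"}]

candidates = [r for r in data if r["city"] == "Seattle"]  # query: city = 'Seattle'
ranked = sorted(candidates, key=lambda r: pir_score(r, {"city"}, workload, data),
                reverse=True)
print([r["view"] for r in ranked])  # ['waterfront', 'none']
```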
Method and apparatus for exploiting statistics on query expressions for optimization

Patent number: 7363289

Abstract: A method for evaluating a user query on a relational database having records stored therein, a workload made up of a set of queries that have been executed on the database, and a query optimizer that generates a query execution plan for the user query. Each query plan includes a plurality of intermediate query plan components that verify a subset of records from the database meeting query criteria. The method accesses the query plan and a set of stored intermediate statistics for records verified by query components, such as histograms that summarize the cardinality of the records that verify the query component. The method forms a transformed query plan based on the selected intermediate statistics (possibly by rewriting the query plan) and estimates the cardinality of the transformed query plan to arrive at a more accurate cardinality estimate for the query.

Type: Grant

Filed: July 7, 2005

Date of Patent: April 22, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Nicolas Bruno
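A minimal sketch of why statistics on intermediate results help, assuming simple histograms stored as `(low, high) -> count` dictionaries (all names are hypothetical): the same range filter applied above a join gets a different selectivity depending on whether the estimate uses the base-table histogram, which implicitly assumes the filter and the join are independent, or a histogram built on the join result itself.

```python
from typing import Dict, Optional, Tuple

Histogram = Dict[Tuple[float, float], int]  # (low, high) bucket -> row count


def selectivity(hist: Histogram, low: float, high: float) -> float:
    """Fraction of rows in [low, high), assuming values spread uniformly in each bucket."""
    total = sum(hist.values())
    hit = 0.0
    for (b_lo, b_hi), count in hist.items():
        overlap = max(0.0, min(high, b_hi) - max(low, b_lo))
        hit += count * overlap / (b_hi - b_lo)
    return hit / total if total else 0.0


def estimate_filter_over_join(join_rows: float, low: float, high: float,
                              base_hist: Histogram,
                              join_hist: Optional[Histogram] = None) -> float:
    """Prefer a statistic built on the join result when one is available."""
    hist = join_hist if join_hist is not None else base_hist
    return join_rows * selectivity(hist, low, high)


base_hist = {(0, 50): 50, (50, 100): 50}  # price on the base table
join_hist = {(0, 50): 90, (50, 100): 10}  # price over the join result
print(estimate_filter_over_join(1000, 0, 50, base_hist))             # 500.0
print(estimate_filter_over_join(1000, 0, 50, base_hist, join_hist))  # 900.0
```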
Database aggregation query result estimator

Patent number: 7363301

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Grant

Filed: October 7, 2005

Date of Patent: April 22, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
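The outlier-plus-sampling scheme can be sketched for SUM as follows. It assumes the number of outliers `k` is given and that a uniform random sample is drawn from the remaining values; the abstract fixes neither choice, and the function names are hypothetical. The window of `len(values) - k` consecutive sorted values with the lowest variance is kept as the inliers, the values outside it are summed exactly, and the inlier sum is extrapolated from the sample.

```python
import random
from typing import List, Tuple


def split_outliers(values: List[float], k: int) -> Tuple[List[float], List[float]]:
    """Return (inliers, outliers): the lowest-variance window of len(values) - k
    consecutive sorted values, and the k values that fall outside it."""
    xs = sorted(values)
    w = len(xs) - k
    best_start, best_var = 0, float("inf")
    for start in range(k + 1):
        window = xs[start:start + w]
        mean = sum(window) / w
        var = sum((x - mean) ** 2 for x in window) / w
        if var < best_var:
            best_start, best_var = start, var
    return xs[best_start:best_start + w], xs[:best_start] + xs[best_start + w:]


def estimate_sum(values: List[float], k: int, sample_size: int,
                 rng: random.Random) -> float:
    """Exact sum of the outliers plus a scaled-up sample sum of the inliers."""
    if not 0 <= k < len(values) or sample_size < 1:
        raise ValueError("need 0 <= k < len(values) and sample_size >= 1")
    inliers, outliers = split_outliers(values, k)
    sample = rng.sample(inliers, min(sample_size, len(inliers)))
    inlier_estimate = sum(sample) * len(inliers) / len(sample)  # extrapolate
    return sum(outliers) + inlier_estimate


values = [10.0] * 20 + [1000.0, 2000.0]
print(estimate_sum(values, k=2, sample_size=5, rng=random.Random(0)))  # 3200.0
```

Because the two extreme values are aggregated exactly, the sample only has to represent low-variance data, which is what keeps the estimate stable.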
Efficient evaluation of queries with mining predicates

Patent number: 7346601

Abstract: A method for evaluating a user query on a database having a mining model that classifies records contained in the database into classes, where the query comprises at least one mining predicate that refers to a class of database records. An upper envelope is derived for the class referred to by the mining predicate; the envelope corresponds to a query that returns a set of database records including all of the database records belonging to the class. The upper envelope is included in the user query for query evaluation. The method may be practiced during a preprocessing phase by evaluating the mining model to extract a set of classes of the database records and deriving an upper envelope for each class. These upper envelopes are stored for access during user query evaluation.

Type: Grant

Filed: June 3, 2002

Date of Patent: March 18, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Sunita Sarawagi
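For a decision-tree model the upper envelope is easy to derive, which makes it a convenient illustration; restricting the example to decision trees is an assumption, since the abstract covers mining models in general. Each leaf is a conjunction of range conditions, so the disjunction of the leaves labeled with a class contains every record the tree assigns to that class and can be added to the query as an ordinary predicate. The `Box` representation and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Dict[str, Tuple[float, float]]  # attribute -> half-open range [low, high)


def upper_envelope(leaves: List[Tuple[Box, str]], cls: str) -> List[Box]:
    """Leaf regions labeled `cls`; together they contain every record of `cls`."""
    return [box for box, label in leaves if label == cls]


def to_sql(envelope: List[Box]) -> str:
    """Render the envelope as a WHERE-clause predicate the optimizer can use."""
    if not envelope:
        return "1 = 0"  # no record can belong to the class
    disjuncts = []
    for box in envelope:
        terms = [f"{a} >= {lo} AND {a} < {hi}" for a, (lo, hi) in sorted(box.items())]
        disjuncts.append("(" + " AND ".join(terms) + ")" if terms else "1 = 1")
    return " OR ".join(disjuncts)


leaves = [
    ({"age": (0, 30)}, "low_risk"),
    ({"age": (30, 120), "income": (0, 50000)}, "high_risk"),
    ({"age": (30, 120), "income": (50000, 10000000)}, "low_risk"),
]
print(to_sql(upper_envelope(leaves, "high_risk")))
# (age >= 30 AND age < 120 AND income >= 0 AND income < 50000)
```

For a decision tree this envelope is exact; for other model types an envelope may be a looser superset of the class, which still preserves correctness because the mining predicate is evaluated on the remaining records.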
Method and apparatus for generating statistics on query expressions for optimization

Patent number: 7330848

Abstract: A method and apparatus for creating a statistical representation of a query result without executing the underlying query. For a binary-join query, a scan is performed on one of the join tables. For each scanned tuple, a multiplicity value is calculated that estimates the number of tuples in the other join table that have a matching join attribute. A number of copies of the scanned tuple, as determined by the multiplicity value, are placed in a stream of tuples that is sampled to compile the statistical representation of the query result. For acyclic-join queries that include selections, the procedure is extended recursively. If multiple statistical representations are sought, scans can be shared, and scan sharing can be optimized using shortest common supersequence techniques.

Type: Grant

Filed: May 23, 2003

Date of Patent: February 12, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Nicolas Bruno
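The scan-and-replicate procedure for a binary join can be sketched as follows. This is a minimal illustration under stated assumptions: the multiplicity of each scanned tuple is computed exactly with a `Counter` over the other table's join column (the abstract only says it is estimated), the virtual stream is sampled with standard reservoir sampling, and `sample_join` is a hypothetical name. A histogram built from the returned sample then approximates a statistic on the join result.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def sample_join(r_rows: Sequence[Tuple], r_key: int, s_keys: Sequence[Hashable],
                k: int, rng: random.Random) -> List[Tuple]:
    """Uniform sample (size <= k) of R JOIN S restricted to R's columns, built from
    one scan of R without materializing the join."""
    multiplicity = Counter(s_keys)  # matches in S per join value (exact here)
    reservoir: List[Tuple] = []
    seen = 0
    for row in r_rows:
        # Emit the scanned tuple once per matching S tuple ...
        for _ in range(multiplicity[row[r_key]]):
            seen += 1
            # ... and feed that virtual stream through reservoir sampling.
            if len(reservoir) < k:
                reservoir.append(row)
            else:
                j = rng.randrange(seen)
                if j < k:
                    reservoir[j] = row
    return reservoir


r_rows = [(1, "a"), (2, "b"), (3, "c")]
s_keys = [1, 1, 1, 2]
print(Counter(sample_join(r_rows, 0, s_keys, k=100, rng=random.Random(0))))
# Counter({(1, 'a'): 3, (2, 'b'): 1})
```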
Optimization based method for estimating the results of aggregate queries

Patent number: 7328221

Abstract: A method for estimating the result of a query on a database having data records arranged in tables. The database has a given workload that includes a set of queries that can be executed on the database. An expected workload is derived that also includes a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in a manner that minimizes the estimation error when queries in the expected workload are executed on the sample. The query is then executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.

Type: Grant

Filed: September 8, 2004

Date of Patent: February 5, 2008

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Gautam Das
SEARCH GUIDED BY LOCATION AND CONTEXT

Publication number: 20080005071

Abstract: The subject disclosure pertains to web searches and, more particularly, to influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user. In addition, a localized marketing system is disclosed that provides discount offers to users who match merchant criteria, including proximity. A system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer users to achieve desired results.

Type: Application

Filed: June 28, 2006

Publication date: January 3, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Gary W. Flake, William H. Gates, Trenholme J. Griffin, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Oliver Hurst-Hiller, Kenneth A. Moss
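One simple way to let user location influence results can be sketched as follows, under assumptions that are not in the abstract: each result carries a relevance score and a latitude/longitude, results beyond a radius are filtered out, and the rest receive a linear proximity bonus. The function names and the scoring rule are hypothetical.

```python
import math
from typing import List, Tuple

Result = Tuple[str, float, Tuple[float, float]]  # (title, relevance, (lat, lon))


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rerank_by_location(results: List[Result], user: Tuple[float, float],
                       max_km: float, weight: float = 1.0) -> List[str]:
    """Filter out results farther than max_km, then rank by relevance + proximity."""
    scored = []
    for title, relevance, loc in results:
        d = haversine_km(user, loc)
        if d <= max_km:
            scored.append((relevance + weight * (1 - d / max_km), title))
    return [title for _, title in sorted(scored, reverse=True)]


seattle = (47.61, -122.33)
results = [
    ("Portland cafe", 0.9, (45.52, -122.68)),
    ("Seattle cafe", 0.5, (47.62, -122.35)),
    ("Tokyo cafe", 1.0, (35.68, 139.69)),
]
print(rerank_by_location(results, seattle, max_km=300))
# ['Seattle cafe', 'Portland cafe']
```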
SEARCH OVER DESIGNATED CONTENT

Publication number: 20080005074

Abstract: The subject disclosure pertains to web searches and, more particularly, to influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user. In addition, a localized marketing system is disclosed that provides discount offers to users who match merchant criteria, including proximity. A system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer users to achieve desired results.

Type: Application

Filed: June 28, 2006

Publication date: January 3, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Gary W. Flake, William H. Gates, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Trenholme J. Griffin, Oliver Hurst-Hiller, Kenneth A. Moss
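Restricting a search to user-designated content can be illustrated as a post-filter on result URLs, assuming the user designates sites by domain name. That representation and the function name are hypothetical, and a production engine would more likely apply the restriction inside the index lookup rather than afterward. Subdomains of a designated site are accepted; look-alike hosts are not.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def restrict_to_designated(urls: Iterable[str], designated: Iterable[str]) -> List[str]:
    """Keep URLs whose host is a designated domain or one of its subdomains."""
    domains = {d.lower().strip(".") for d in designated}
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""  # .hostname is already lower-cased
        if any(host == d or host.endswith("." + d) for d in domains):
            kept.append(url)
    return kept


urls = [
    "https://docs.python.org/3/",
    "https://example.com/page",
    "https://python.org.evil.example/x",
    "http://WWW.Python.org/",
]
print(restrict_to_designated(urls, ["python.org"]))
# ['https://docs.python.org/3/', 'http://WWW.Python.org/']
```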
VISUAL AND MULTI-DIMENSIONAL SEARCH

Publication number: 20080005105

Abstract: A system is disclosed that can analyze a multi-dimensional input and then establish a search query based on features extracted from the input. In a particular example, an image can be used as an input to a search mechanism: pattern recognition and image analysis can be applied to the image, and a search query can then be established that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items, thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern recognition, and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.

Type: Application

Filed: June 28, 2006

Publication date: January 3, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
VISUAL AND MULTI-DIMENSIONAL SEARCH

Publication number: 20080005091

Abstract: A system is disclosed that can analyze a multi-dimensional input and then establish a search query based on features extracted from the input. In a particular example, an image can be used as an input to a search mechanism: pattern recognition and image analysis can be applied to the image, and a search query can then be established that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items, thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern recognition, and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.

Type: Application

Filed: June 28, 2006

Publication date: January 3, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
LOCALIZED MARKETING

Publication number: 20080005104

Abstract: A localized marketing system is disclosed that provides discount offers to users who match merchant criteria, including proximity. Further, a system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer users to achieve desired results. Another aspect of the disclosure pertains to web searches and, more particularly, to influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user.

Type: Application

Filed: June 28, 2006

Publication date: January 3, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Gary W. Flake, William H. Gates, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Trenholme J. Griffin, Oliver Hurst-Hiller, Kenneth A. Moss
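The two mechanisms in this abstract can be sketched separately, under loudly stated assumptions: an offer matches when the user is inside a merchant-chosen radius (positions on a flat local grid in kilometres, for simplicity), and probing is reduced to picking the discount level with the best observed redemption rate from past trials. The `Offer` record, the grid coordinates, and the redemption-rate objective are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Offer:
    merchant: str
    discount_pct: int
    location: Tuple[float, float]  # (x, y) on a local grid, in km (assumption)
    max_km: float                  # merchant criterion: only nearby users


def matching_offers(user: Tuple[float, float], offers: List[Offer]) -> List[Offer]:
    """Offers whose merchant-specified radius covers the user's position."""
    return [o for o in offers if math.dist(user, o.location) <= o.max_km]


def best_discount(trials: Dict[int, Tuple[int, int]]) -> int:
    """trials maps discount % -> (offers shown, offers redeemed) from probing a
    population of users; return the discount with the highest redemption rate."""
    return max(trials, key=lambda d: trials[d][1] / trials[d][0] if trials[d][0] else 0.0)


offers = [Offer("bakery", 10, (1.0, 0.0), 2.0), Offer("bookshop", 15, (5.0, 0.0), 3.0)]
print([o.merchant for o in matching_offers((0.0, 0.0), offers)])  # ['bakery']
print(best_discount({5: (100, 4), 10: (100, 9), 20: (100, 10)}))  # 20
```

A real probing system would weigh each discount's cost against the extra redemptions and keep exploring over time; the raw redemption rate is only the simplest possible objective.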
Cardinality estimation of joins

Patent number: 7299226

Abstract: A method of estimating cardinality of a join of tables using multi-column density values and additionally using coarser density values of a subset of the multi-column density attributes. In one embodiment, the subset of attributes for the coarser densities is a prefix of the set of multi-column density attributes. A number of tuples from each table that participate in the join may be estimated using densities of the subsets. The cardinality of the join can be estimated using the multi-column density for each table and the estimated number of tuples that participate in the join from each table.

Type: Grant

Filed: June 19, 2003

Date of Patent: November 20, 2007

Assignee: Microsoft Corporation

Inventors: Nicolas Bruno, Murali Krishna, Ming-Chuan Wu, Surajit Chaudhuri
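The abstract gives no formulas, so the following is only one plausible reading under the standard uniformity and containment assumptions, with hypothetical names. With density defined as 1/(number of distinct values), the textbook estimate for an equi-join on a multi-column key is |R|·|S|·min(d_R, d_S). The sketch additionally uses the coarser densities of a key prefix to estimate what fraction of each table can participate in the join at all, and applies the multi-column estimate only to those tuples.

```python
def join_cardinality(n_r: int, n_s: int,
                     ndv_r_key: int, ndv_s_key: int,
                     ndv_r_prefix: int, ndv_s_prefix: int) -> float:
    """Estimate |R JOIN S| on a multi-column key whose leading column(s) form a prefix.

    ndv_* are distinct-value counts (density = 1 / ndv) for the full key and for
    the prefix in each table.  Containment: all prefix values of the table with
    fewer of them find a partner; the other table only matches that many.
    """
    p_r = min(1.0, ndv_s_prefix / ndv_r_prefix)  # share of R whose prefix finds a match
    p_s = min(1.0, ndv_r_prefix / ndv_s_prefix)  # share of S whose prefix finds a match
    part_r, part_s = n_r * p_r, n_s * p_s        # participating tuples
    key_r, key_s = ndv_r_key * p_r, ndv_s_key * p_s  # their distinct full-key values
    # Multi-column density estimate applied to the participating tuples only.
    return part_r * part_s / max(key_r, key_s)


# Equal prefix densities: reduces to |R||S| / max(ndv) = 100 * 200 / 50.
print(join_cardinality(100, 200, 20, 50, 5, 5))   # 400.0
# Only half of R's prefix values can occur in S, so fewer R tuples participate.
print(join_cardinality(100, 200, 20, 50, 10, 5))  # 200.0
```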
Constructing database object workload summaries

Patent number: 7299220

Abstract: A database object summarization tool is provided that selects a subset of database objects subject to filtering constraints such as a partial order or the optimization of some attribute. A dominance primitive filters out tuples that are dominated by another tuple according to a partial-order constraint. A representation primitive selects a representative subset of tuples such that an optimization criterion is met.

Type: Grant

Filed: March 31, 2004

Date of Patent: November 20, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Prasanna Ganesan
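The two primitives can be sketched over tuples of numeric attributes. The sketch assumes "larger is better" on every attribute (for example, a query's estimated cost and its execution count) and, for the representation primitive, the simplest possible criterion of keeping the top k rows by a score. Both the attribute choice and the top-k criterion are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: at least as large on every attribute, strictly larger on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_filter(rows: List[Row]) -> List[Row]:
    """Keep rows that no other row dominates (the skyline under this partial order)."""
    return [r for r in rows if not any(dominates(o, r) for o in rows)]


def representatives(rows: List[Row], score: Callable[[Row], float], k: int) -> List[Row]:
    """Pick the k rows that maximize `score` (a stand-in optimization criterion)."""
    return sorted(rows, key=score, reverse=True)[:k]


# (estimated cost, executions) for each query in a hypothetical workload
queries = [(100, 5), (80, 9), (50, 2), (60, 8)]
print(dominance_filter(queries))                             # [(100, 5), (80, 9)]
print(representatives(queries, lambda q: q[0] * q[1], k=1))  # [(80, 9)]
```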
Efficient fuzzy match for evaluating data records

Patent number: 7296011

Abstract: To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate, or fuzzy, match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.

Type: Grant

Filed: June 20, 2003

Date of Patent: November 13, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
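The abstract does not spell out the similarity function, so this sketch substitutes a deliberately simple one: Jaccard similarity between the sets of token q-grams of the input tuple and of each reference tuple, with a linear scan over the reference relation. Both the similarity measure and the scan are assumptions for illustration; an efficient implementation would use an index rather than comparing against every reference tuple.

```python
from typing import List, Optional, Sequence, Set, Tuple


def qgrams(token: str, q: int = 3) -> Set[str]:
    """q-grams of a token, padded with q-1 markers on each side."""
    padded = "#" * (q - 1) + token.lower() + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def tuple_qgrams(fields: Sequence[str], q: int = 3) -> Set[str]:
    """Union of the q-grams of every whitespace-separated token in every field."""
    return {g for field in fields for tok in field.split() for g in qgrams(tok, q)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def fuzzy_match(row: Sequence[str], reference: List[Tuple[str, ...]],
                threshold: float) -> Optional[Tuple[str, ...]]:
    """Best reference tuple by q-gram Jaccard similarity, or None below threshold."""
    grams = tuple_qgrams(row)
    best = max(reference, key=lambda ref: jaccard(grams, tuple_qgrams(ref)))
    return best if jaccard(grams, tuple_qgrams(best)) >= threshold else None


reference = [
    ("Boeing Company", "Seattle", "WA"),
    ("Bon Corporation", "Seattle", "WA"),
    ("Companions", "Seattle", "WA"),
]
print(fuzzy_match(("Beoing Company", "Seattle", "WA"), reference, 0.6))
# ('Boeing Company', 'Seattle', 'WA')
print(fuzzy_match(("Microsoft", "Redmond", "WA"), reference, 0.6))  # None
```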
Compressing database workloads

Patent number: 7293036

Abstract: Relational database applications such as index selection, histogram tuning, approximate query processing, and statistics selection have recognized the importance of leveraging workloads. Often these applications are presented with large workloads, i.e., a set of SQL DML statements, as input. A key factor affecting the scalability of such applications is the size of the workload. The invention concerns workload compression, which helps improve the scalability of such applications. The exemplary embodiment is broadly applicable to a variety of workload-driven applications, while allowing for incorporation of application-specific knowledge. The process is described in detail in the context of two workload-driven applications: index selection and approximate query processing.

Type: Grant

Filed: December 8, 2004

Date of Patent: November 6, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Ashish Kumar Gupta, Vivek Narasayya
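The abstract does not describe how statements are selected, so as a deliberately simple stand-in the sketch below groups statements that differ only in their constants and keeps one weighted representative per group. A workload-driven tool such as index selection then sees far fewer, weighted statements. The literal-stripping regular expressions are a rough assumption, not a SQL parser, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def signature(sql: str) -> str:
    """Template key: literals replaced by '?', whitespace collapsed, lower-cased."""
    s = re.sub(r"'(?:[^']|'')*'", "?", sql)   # string literals
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)  # numeric literals
    return re.sub(r"\s+", " ", s).strip().lower()


def compress(workload: List[str]) -> List[Tuple[str, int]]:
    """One representative per template, weighted by how many statements it replaces."""
    groups: Dict[str, List[str]] = {}
    for stmt in workload:
        groups.setdefault(signature(stmt), []).append(stmt)
    return [(stmts[0], len(stmts)) for stmts in groups.values()]


workload = [
    "SELECT * FROM orders WHERE id = 1",
    "select * from orders where id = 42",
    "SELECT name FROM customers WHERE city = 'Paris'",
    "SELECT name FROM customers WHERE city = 'Oslo'",
    "SELECT * FROM orders WHERE id = 7",
]
for stmt, weight in compress(workload):
    print(weight, stmt)
# 3 SELECT * FROM orders WHERE id = 1
# 2 SELECT name FROM customers WHERE city = 'Paris'
```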
Database aggregation query result estimator

Patent number: 7293037

Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

Type: Grant

Filed: October 7, 2005

Date of Patent: November 6, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
Duplicate data elimination system

Patent number: 7287019

Abstract: A process for finding similar data records in a set of data records. A database table or tables provide a number of data records from which one or more canonical data records are identified. Tokens are identified within the data records and classified according to attribute field. A similarity score is assigned to data records in relation to other data records based on a similarity between tokens of the data records. Data records whose similarity scores with respect to each other are greater than a threshold form one or more groups of data records. The records or tuples form nodes of a graph wherein edges between nodes represent the similarity score between records of a group. Within each group, a canonical record is identified based on the similarity of data records to each other within the group.

Type: Grant

Filed: June 4, 2003

Date of Patent: October 23, 2007

Assignee: Microsoft Corporation

Inventors: Rahul Kapoor, Venkatesh Ganti, Surajit Chaudhuri
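A compact sketch of the grouping step, with every choice below an assumption rather than something the abstract fixes: tokens are whitespace-separated words tagged with their attribute field, the similarity score is the Jaccard overlap of those tagged tokens, groups are connected components of the graph whose edges exceed the threshold, and the canonical record is the one with the highest total similarity to the rest of its group.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def tokens(record: Record) -> Set[Tuple[str, str]]:
    """Tokens classified by attribute field, e.g. ('city', 'seattle')."""
    return {(f, tok) for f, value in record.items() for tok in value.lower().split()}


def jaccard(a: Set, b: Set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def dedupe(records: List[Record], threshold: float) -> List[Tuple[Record, List[Record]]]:
    """Return (canonical record, group members) for each group of similar records."""
    toks = [tokens(r) for r in records]
    sim: Dict[Tuple[int, int], float] = {}
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(records))}
    for i, j in combinations(range(len(records)), 2):
        s = jaccard(toks[i], toks[j])
        if s > threshold:  # edge in the similarity graph
            adj[i].add(j)
            adj[j].add(i)
            sim[(i, j)] = sim[(j, i)] = s
    seen: Set[int] = set()
    groups = []
    for start in range(len(records)):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first search collects one connected component
            i = stack.pop()
            members.append(i)
            for j in adj[i] - seen:
                seen.add(j)
                stack.append(j)
        members.sort()
        canonical = max(members, key=lambda i: sum(sim.get((i, j), 0.0) for j in members))
        groups.append((records[canonical], [records[i] for i in members]))
    return groups


records = [
    {"name": "Acme Corp", "city": "Seattle"},
    {"name": "Acme Corporation", "city": "Seattle"},
    {"name": "Acme Corp", "city": "Seattle WA"},
    {"name": "Globex", "city": "Springfield"},
]
for canonical, members in dedupe(records, 0.3):
    print(canonical["name"], len(members))
# Acme Corp 3
# Globex 1
```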
Sampling for queries

Patent number: 7287020

Abstract: This disclosure describes leveraging workload information associated with executed database queries for estimating the result of a current database query. The workload information is analyzed to determine the usage of tuples in a database during query execution, such as how often a tuple is accessed and the number of different queries that accessed the tuple. Each tuple is assigned a weight value based on the analyzed workload information. The tuples sampled for estimating a result for the current query are selected based on each tuple's weight value. The workload information may also be leveraged to generate an outlier index that identifies outlier tuples associated with the executed queries or that identifies outlier tuples associated with particular queries that are executed more frequently than other queries. The result for the current query can also be estimated using the sampled values along with the outlier tuples from the outlier index.

Type: Grant

Filed: January 12, 2001

Date of Patent: October 23, 2007

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
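Workload-weighted sampling can be sketched as follows. The abstract does not name an estimator, so this sketch assumes a standard one: tuples are drawn with replacement with probability proportional to their workload weight, and the Hansen-Hurwitz estimator averages value / probability over the draws, which is unbiased as long as every non-outlier tuple has a positive weight. Tuples listed as outliers are excluded from sampling and added exactly. The function names and the smoothing parameter are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def access_weights(workload: Sequence[Iterable[Hashable]], tuple_ids: Iterable[Hashable],
                   smoothing: float = 1.0) -> Dict[Hashable, float]:
    """Weight = number of workload queries that accessed the tuple + smoothing.
    A positive smoothing keeps every tuple's sampling probability above zero."""
    hits = Counter(t for query in workload for t in set(query))
    return {t: hits[t] + smoothing for t in tuple_ids}


def weighted_sum_estimate(values: Dict[Hashable, float], weights: Dict[Hashable, float],
                          m: int, rng: random.Random,
                          outliers: Iterable[Hashable] = ()) -> float:
    """Draw m tuples with replacement, P(t) proportional to weight, and average
    value / P (Hansen-Hurwitz); outlier tuples are summed exactly instead."""
    outliers = set(outliers)
    ids = [t for t in values if t not in outliers]
    total_w = sum(weights[t] for t in ids)
    draws = rng.choices(ids, weights=[weights[t] for t in ids], k=m)
    sampled = sum(values[t] * total_w / weights[t] for t in draws) / m
    return sampled + sum(values[t] for t in outliers)


workload = [["a", "b", "c"], ["b", "c"], ["c"]]
w = access_weights(workload, ["a", "b", "c", "d"], smoothing=0.0)
print(w)  # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 0.0}
values = {"a": 2, "b": 4, "c": 6, "d": 1000}
print(weighted_sum_estimate(values, w, m=10, rng=random.Random(0), outliers=["d"]))  # 1012.0
```

In this toy example the values happen to be proportional to the weights, so every draw yields the same value / P and the estimate is exact; in general it is only unbiased. Smoothing is set to 0 here only because the single unaccessed tuple is handled as an outlier.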
Database physical design refinement using a merge-reduce approach

Publication number: 20070239744

Abstract: Various embodiments are disclosed relating to database configuration refinement. In an example embodiment, a method is provided that may include determining a size limitation for a database configuration, determining a workload of the database configuration, and making a determination that a size of the database configuration is greater than a size limit. The method may also include applying either a merge process or a reduction process to decrease the size of the database configuration. The merge process may merge a first index/view with a second index/view to produce a merged index/view, for example. The reduction process may delete a first portion of a first view to produce a reduced view.

Type: Application

Filed: March 28, 2006

Publication date: October 11, 2007

Applicant: Microsoft Corporation

Inventors: Nicolas Bruno, Surajit Chaudhuri
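The merge half of the refinement can be sketched for indexes as follows. Several assumptions are made that the abstract does not state: an index is an ordered tuple of key columns, merging keeps the first index's columns followed by the second index's remaining columns, size is rows times total column width, and the loop greedily merges the pair that saves the most space. The sketch ignores the workload cost that the abstract says is considered and omits the view-reduction step; all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Index = Tuple[str, ...]  # ordered key columns of an index


def merge(i1: Index, i2: Index) -> Index:
    """i1's columns, then i2's remaining ones: keeps i1's seek prefix, covers i2."""
    return i1 + tuple(c for c in i2 if c not in i1)


def size(index: Index, width: Dict[str, int], rows: int) -> int:
    return rows * sum(width[c] for c in index)


def refine(config: List[Index], width: Dict[str, int], rows: int, limit: int) -> List[Index]:
    """Greedily merge the pair of indexes that saves the most space until the
    configuration fits within `limit` or no merge saves anything."""
    config = list(config)
    while sum(size(i, width, rows) for i in config) > limit and len(config) > 1:
        best = None
        for a, b in combinations(range(len(config)), 2):
            merged = merge(config[a], config[b])
            saving = (size(config[a], width, rows) + size(config[b], width, rows)
                      - size(merged, width, rows))
            if best is None or saving > best[0]:
                best = (saving, a, b, merged)
        saving, a, b, merged = best
        if saving <= 0:
            break  # no profitable merge left; a reduction step would be needed
        config = [ix for k, ix in enumerate(config) if k not in (a, b)] + [merged]
    return config


width = {"a": 4, "b": 4, "c": 8, "d": 4}
config = [("a", "b"), ("a", "c"), ("d",)]
print(refine(config, width, rows=1000, limit=20000))  # [('d',), ('a', 'b', 'c')]
```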
