Patents by Inventor Surajit Chaudhuri
Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20080177694
Abstract: Database systems use a plan cache to avoid the overheads (e.g., time, money) of query recompilation. Query plans can become invalidated by updates to the statistics on data or changes to the physical database design. Once a plan is invalidated, it can be repaired utilizing one or more of the disclosed embodiments. Incremental repair of query plans includes reusing parts of the current plan rather than discarding the plan entirely when it is invalidated. Repair to an existing query plan is attempted before resorting to full recompilation.
Type: Application
Filed: January 19, 2007
Publication date: July 24, 2008
Applicant: MICROSOFT CORPORATION
Inventors: Surajit Chaudhuri, Ravishankar Ramamurthy
-
Patent number: 7383262
Abstract: A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which are usually not possible with text data.
Type: Grant
Filed: June 29, 2004
Date of Patent: June 3, 2008
Assignee: Microsoft Corporation
Inventors: Gautam Das, Surajit Chaudhuri, Vagelis Hristidis, Gerhard Weikum
-
Patent number: 7363289
Abstract: A method for evaluating a user query on a relational database having records stored therein, a workload made up of a set of queries that have been executed on the database, and a query optimizer that generates a query execution plan for the user query. Each query plan includes a plurality of intermediate query plan components that verify a subset of records from the database meeting query criteria. The method accesses the query plan and a set of stored intermediate statistics for records verified by query components, such as histograms that summarize the cardinality of the records that verify the query component. The method forms a transformed query plan based on the selected intermediate statistics (possibly by rewriting the query plan) and estimates the cardinality of the transformed query plan to arrive at a more accurate cardinality estimate for the query.
Type: Grant
Filed: July 7, 2005
Date of Patent: April 22, 2008
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Nicolas Bruno
-
Patent number: 7363301
Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
Type: Grant
Filed: October 7, 2005
Date of Patent: April 22, 2008
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
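The idea in this abstract can be illustrated with a short sketch. The following is a minimal illustration written for this listing, not the patented implementation: the function names (`split_outliers`, `estimate_sum`), the fixed outlier count, and the use of uniform sampling for the remaining data are all assumptions. It picks the contiguous window of sorted values with the lowest variance, treats everything outside that window as outliers, sums the outliers exactly, and scales up a sample of the rest.

```python
import random
from statistics import pvariance


def split_outliers(values, num_outliers):
    """Split values into (outliers, rest).

    `rest` is the contiguous window of the sorted values that has the
    lowest variance; every value outside that window is an outlier.
    The outlier count is a caller-supplied assumption.
    """
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    if window <= 0:
        raise ValueError("num_outliers must be smaller than the data size")
    # Slide the window over the sorted values and keep the start position
    # whose window has the smallest variance.
    best_start = min(
        range(num_outliers + 1),
        key=lambda s: pvariance(ordered[s:s + window]),
    )
    rest = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return outliers, rest


def estimate_sum(values, num_outliers, sample_size, rng=None):
    """Estimate SUM(values): exact sum of outliers plus a scaled sample sum."""
    rng = rng or random.Random(0)
    outliers, rest = split_outliers(values, num_outliers)
    sample = rng.sample(rest, min(sample_size, len(rest)))
    # Extrapolate the sample sum to the size of the non-outlier data.
    scaled = sum(sample) * len(rest) / len(sample) if sample else 0.0
    return sum(outliers) + scaled
```

Because the few extreme values are aggregated exactly, the sample only has to represent the low-variance remainder, which is what keeps the extrapolated estimate stable.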
-
Patent number: 7346601
Abstract: A method for evaluating a user query on a database having a mining model that classifies records contained in the database into classes when the query comprises at least one mining predicate that refers to a class of database records. An upper envelope is derived for the class referred to by the mining predicate corresponding to a query that returns a set of database records that includes all of the database records belonging to the class. The upper envelope is included in the user query for query evaluation. The method may be practiced during a preprocessing phase by evaluating the mining model to extract a set of classes of the database records and deriving an upper envelope for each class. These upper envelopes are stored for access during user query evaluation.
Type: Grant
Filed: June 3, 2002
Date of Patent: March 18, 2008
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Vivek Narasayya, Sunita Sarawagi
-
Patent number: 7330848
Abstract: A method and apparatus for creating a statistical representation of a query result that can be performed without executing the underlying query. For a binary-join query, a scan is performed on one of the join tables. A multiplicity value that estimates the number of tuples in the other join table that has a matching join attribute to the scanned tuple is calculated. A number of copies (as determined by the multiplicity value) are placed in a stream of tuples that is sampled to compile the statistical representation of the query result. For acyclic-join generating queries including selections, the above procedure is recursively extended. If multiple statistical representations are sought, scans can be shared. Scan sharing can be optimized using shortest common supersequence techniques.
Type: Grant
Filed: May 23, 2003
Date of Patent: February 12, 2008
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Nicolas Bruno
-
Patent number: 7328221
Abstract: A method for estimating the result of a query on a database having data records arranged in tables. The database has an expected workload that includes a set of queries that can be executed on the database. An expected workload is derived including a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.
Type: Grant
Filed: September 8, 2004
Date of Patent: February 5, 2008
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Vivek Narasayya, Gautam Das
-
Publication number: 20080005071
Abstract: The subject disclosure pertains to web searches and more particularly toward influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user. Still further yet, a localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. A system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results.
Type: Application
Filed: June 28, 2006
Publication date: January 3, 2008
Applicant: MICROSOFT CORPORATION
Inventors: Gary W. Flake, William H. Gates, Trenholme J. Griffin, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Oliver Hurst-Hiller, Kenneth A. Moss
-
Publication number: 20080005074
Abstract: The subject disclosure pertains to web searches and more particularly toward influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user. Still further yet, a localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. A system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results.
Type: Application
Filed: June 28, 2006
Publication date: January 3, 2008
Applicant: MICROSOFT CORPORATION
Inventors: Gary W. Flake, William H. Gates, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Trenholme J. Griffin, Oliver Hurst-Hiller, Kenneth A. Moss
-
Publication number: 20080005105
Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
Type: Application
Filed: June 28, 2006
Publication date: January 3, 2008
Applicant: MICROSOFT CORPORATION
Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
-
Publication number: 20080005091
Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
Type: Application
Filed: June 28, 2006
Publication date: January 3, 2008
Applicant: MICROSOFT CORPORATION
Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
-
Publication number: 20080005104
Abstract: A localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. Further, a system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results. Another aspect of the disclosure pertains to web searches and more particularly toward influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user.
Type: Application
Filed: June 28, 2006
Publication date: January 3, 2008
Applicant: MICROSOFT CORPORATION
Inventors: Gary W. Flake, William H. Gates, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Trenholme J. Griffin, Oliver Hurst-Hiller, Kenneth A. Moss
-
Patent number: 7299226
Abstract: A method of estimating cardinality of a join of tables using multi-column density values and additionally using coarser density values of a subset of the multi-column density attributes. In one embodiment, the subset of attributes for the coarser densities is a prefix of the set of multi-column density attributes. A number of tuples from each table that participate in the join may be estimated using densities of the subsets. The cardinality of the join can be estimated using the multi-column density for each table and the estimated number of tuples that participate in the join from each table.
Type: Grant
Filed: June 19, 2003
Date of Patent: November 20, 2007
Assignee: Microsoft Corporation
Inventors: Nicolas Bruno, Murali Krishna, Ming-Chuan Wu, Surajit Chaudhuri
-
Patent number: 7299220
Abstract: A database object summarization tool is provided that selects a subset of database objects subject to filtering constraints such as a partial order or optimization of some attribute. A dominance primitive filters out tuples that are dominated according to a partial order constraint by another tuple. A representation primitive selects a representative subset of tuples such that an optimization criterion is met.
Type: Grant
Filed: March 31, 2004
Date of Patent: November 20, 2007
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Vivek Narasayya, Prasanna Ganesan
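A dominance filter of the kind this abstract describes can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented tool: tuples are taken to be numeric, larger values are assumed to be better in every attribute, and the names `dominates` and `dominance_filter` are hypothetical.

```python
def dominates(a, b):
    """True if tuple a is at least as good as b everywhere and strictly
    better somewhere (assumption: larger values are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(
        x > y for x, y in zip(a, b)
    )


def dominance_filter(tuples):
    """Keep only the tuples that no other tuple dominates."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]
```

For example, with the input `[(1, 5), (3, 3), (2, 2), (5, 1)]` the tuple `(2, 2)` is dropped because `(3, 3)` dominates it, while the other three are mutually incomparable and survive. The pairwise check is quadratic in the number of tuples, which is acceptable for a sketch but not for large tables.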
-
Patent number: 7296011
Abstract: To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.
Type: Grant
Filed: June 20, 2003
Date of Patent: November 13, 2007
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
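To make the notion of q-grams concrete, here is a small sketch of fuzzy matching against a reference list. It deliberately swaps in plain Jaccard similarity over q-gram sets, which is not the similarity function disclosed in the patent; the padding character, the default `q=3`, and the function names are all assumptions made for illustration.

```python
def qgrams(text, q=3):
    """Return the set of length-q substrings of text, lower-cased and
    padded with '#' so that leading and trailing characters count."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a, b, q=3):
    """Jaccard similarity of the q-gram sets of two strings (0.0 to 1.0)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


def fuzzy_match(record, reference, q=3):
    """Return the reference string most similar to the incoming record."""
    return max(reference, key=lambda r: jaccard(record, r, q))
```

A misspelled input such as `"Microsft Corp"` still shares most of its 3-grams with `"Microsoft Corp"`, so it is matched to that entry rather than to other company names in the reference list. A production system would also need an index over q-grams to avoid scanning the whole reference table for every input.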
-
Patent number: 7293036
Abstract: Relational database applications such as index selection, histogram tuning, approximate query processing, and statistics selection have recognized the importance of leveraging workloads. Often these applications are presented with large workloads, i.e., a set of SQL DML statements, as input. A key factor affecting the scalability of such applications is the size of the workload. The invention concerns workload compression which helps improve the scalability of such applications. The exemplary embodiment is broadly applicable to a variety of workload-driven applications, while allowing for incorporation of application specific knowledge. The process is described in detail in the context of two workload-driven applications: index selection and approximate query processing.
Type: Grant
Filed: December 8, 2004
Date of Patent: November 6, 2007
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Ashish Kumar Gupta, Vivek Narasayya
-
Patent number: 7293037
Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
Type: Grant
Filed: October 7, 2005
Date of Patent: November 6, 2007
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
-
Patent number: 7287019
Abstract: A process for finding similar data records from a set of data records. A database table or tables provide a number of data records from which one or more canonical data records are identified. Tokens are identified within the data records and classified according to attribute field. A similarity score is assigned to data records in relation to other data records based on a similarity between tokens of the data records. Data records whose similarity score with respect to each other is greater than a threshold form one or more groups of data records. The records or tuples form nodes of a graph wherein edges between nodes represent a similarity score between records of a group. Within each group a canonical record is identified based on the similarity of data records to each other within the group.
Type: Grant
Filed: June 4, 2003
Date of Patent: October 23, 2007
Assignee: Microsoft Corporation
Inventors: Rahul Kapoor, Venkatesh Ganti, Surajit Chaudhuri
-
Patent number: 7287020
Abstract: This disclosure describes leveraging workload information associated with executed database queries for estimating the result of a current database query. The workload information is analyzed to determine the usage of tuples in a database during query execution, such as how often a tuple is accessed and the number of different queries that accessed the tuple. A tuple is assigned a weight value that is based on the analyzed workload information. The particular tuples sampled for estimating a result for the current query are based on each tuple's weight value. The workload information may also be leveraged to generate an outlier index that identifies outlier tuples associated with the executed queries or that identifies outlier tuples associated with particular queries that are executed more frequently than other queries. The result for the current query can also be estimated using the sampled values along with the outlier tuples from the outlier index.
Type: Grant
Filed: January 12, 2001
Date of Patent: October 23, 2007
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
-
Publication number: 20070239744
Abstract: Various embodiments are disclosed relating to database configuration refinement. In an example embodiment, a method is provided that may include determining a size limitation for a database configuration, determining a workload of the database configuration, and making a determination that a size of the database configuration is greater than a size limit. The method may also include applying either a merge process or a reduction process to decrease the size of the database configuration. The merge process may merge a first index/view with a second index/view to produce a merged index/view, for example. The reduction process may delete a first portion of a first view to produce a reduced view.
Type: Application
Filed: March 28, 2006
Publication date: October 11, 2007
Applicant: Microsoft Corporation
Inventors: Nicolas Bruno, Surajit Chaudhuri