Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7454407
    Abstract: Techniques for estimating the progress of database queries are described herein. In a first implementation, a respective lower-bound parameter is associated with each node in an operator tree that represents a given database query, and the progress of the database query at a given point is estimated based upon the lower-bound parameters. In a second implementation, the progress of the query is estimated by associating respective lower-bound and upper-bound parameters with each node in the operator tree. The progress of the query at the given point is then estimated based on the lower-bound and upper-bound parameters.
    Type: Grant
    Filed: June 10, 2005
    Date of Patent: November 18, 2008
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Ravishankar Ramamurthy, Kaushik Shriraghav
  • Publication number: 20080183764
    Abstract: Online physical design tuning constantly monitors database indexes and can effectively react to changes in a workload by modifying the physical design as needed. Algorithms can be utilized that take into account various criteria including storage constraints, update statements, and the cost of temporarily creating physical structures.
    Type: Application
    Filed: January 31, 2007
    Publication date: July 31, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Nicolas Bruno, Surajit Chaudhuri
  • Publication number: 20080183644
    Abstract: A lightweight physical design alerter can analyze a workload and determine whether a comprehensive tuning session would result in a configuration improvement over the current configuration. The alerter provides a low-overhead procedure that can run during normal operation of a database management system and produce a notification if a current configuration is less than optimal. The alerter can report lower and upper bounds on the improvements that could be obtained if a comprehensive tuning tool is launched. A lower bound can be justified by generating feasible configurations. The disclosed embodiments can be extended to query updates, materialized views, and other physical design features (e.g., partitioning).
    Type: Application
    Filed: January 31, 2007
    Publication date: July 31, 2008
    Applicant: Microsoft Corporation
    Inventors: Nicolas Bruno, Surajit Chaudhuri
  • Patent number: 7406479
    Abstract: A set similarity join system and method are provided. The system can be employed to facilitate data cleaning based on similarities through the identification of “close” tuples (e.g., records and/or rows). “Closeness” can be evaluated using a similarity function(s) chosen to suit the domain and/or application. Thus, the system facilitates generic domain-independent data cleansing. The system can be employed with a foundational primitive, the set similarity join (SSJoin) operator, which can be used as a building block to implement a broad variety of notions of similarity (e.g., edit similarity, Jaccard similarity, generalized edit similarity, hamming distance, soundex, etc.) as well as similarity based on co-occurrences. The SSJoin operator can exploit the observation that set overlap can be used effectively to support a variety of similarity functions. The SSJoin operator compares values based on “sets” associated with (or explicitly constructed for) each one of them.
    Type: Grant
    Filed: February 10, 2006
    Date of Patent: July 29, 2008
    Assignee: Microsoft Corporation
    Inventors: Kaushik Shriraghav, Surajit Chaudhuri, Venkatesh Ganti
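
The idea of comparing values through the sets associated with them can be illustrated with a minimal sketch. It assumes word tokens as the sets and Jaccard similarity as the similarity function, and it compares all pairs naively. The function names (`tokens`, `jaccard`, `similarity_self_join`) and the sample data are hypothetical and are not taken from the patent, which describes an SSJoin operator rather than this brute-force loop.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple


def tokens(value: str) -> FrozenSet[str]:
    """Build the set associated with a value: here, its lowercase words."""
    return frozenset(value.lower().split())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity: size of the overlap divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_self_join(
    values: List[str], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return pairs of values whose token sets reach the Jaccard threshold.

    Naive all-pairs comparison, for illustration only.
    """
    sets = [(v, tokens(v)) for v in values]
    result = []
    for (v1, s1), (v2, s2) in combinations(sets, 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            result.append((v1, v2, score))
    return result


names = ["Microsoft Corp", "microsoft corp Redmond", "Oracle Inc"]
close_pairs = similarity_self_join(names, threshold=0.5)
```

Here only the first two names share enough tokens to count as "close"; a different tokenization (for example character q-grams) would change which pairs qualify.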
  • Publication number: 20080177694
    Abstract: Database systems use a plan cache to avoid the overheads (e.g., time, money) of query recompilation. Query plans can become invalidated by updates to the statistics on data or changes to the physical database design. Once a plan is invalidated, it can be repaired utilizing one or more of the disclosed embodiments. Incremental repair of query plans includes reusing parts of the current plan rather than discarding the plan entirely when it is invalidated. Repair to an existing query plan is attempted before resorting to full recompilation.
    Type: Application
    Filed: January 19, 2007
    Publication date: July 24, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Surajit Chaudhuri, Ravishankar Ramamurthy
  • Patent number: 7383262
    Abstract: A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which are usually not possible with text data.
    Type: Grant
    Filed: June 29, 2004
    Date of Patent: June 3, 2008
    Assignee: Microsoft Corporation
    Inventors: Gautam Das, Surajit Chaudhuri, Vagelis Hristidis, Gerhard Weikum
  • Patent number: 7363289
    Abstract: A method for evaluating a user query on a relational database having records stored therein, a workload made up of a set of queries that have been executed on the database, and a query optimizer that generates a query execution plan for the user query. Each query plan includes a plurality of intermediate query plan components that verify a subset of records from the database meeting query criteria. The method accesses the query plan and a set of stored intermediate statistics for records verified by query components, such as histograms that summarize the cardinality of the records that verify the query component. The method forms a transformed query plan based on the selected intermediate statistics (possibly by rewriting the query plan) and estimates the cardinality of the transformed query plan to arrive at a more accurate cardinality estimate for the query.
    Type: Grant
    Filed: July 7, 2005
    Date of Patent: April 22, 2008
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Nicolas Bruno
  • Patent number: 7363301
    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
    Type: Grant
    Filed: October 7, 2005
    Date of Patent: April 22, 2008
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
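
The steps in this abstract can be sketched for a single SUM query. This is one possible reading under stated assumptions: the values fit in memory, the number of outliers is fixed in advance, and uniform random sampling is used for the remaining data. The names `split_outliers` and `estimate_sum` are hypothetical, and the outlier index mentioned in the abstract is omitted.

```python
import random
from typing import List, Tuple


def split_outliers(
    values: List[float], num_outliers: int
) -> Tuple[List[float], List[float]]:
    """Slide a window of len(values) - num_outliers items over the sorted
    values; the lowest-variance window is kept as the remaining data and
    everything outside it is treated as outliers."""
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    best_start, best_var = 0, float("inf")
    for start in range(num_outliers + 1):
        chunk = ordered[start:start + window]
        mean = sum(chunk) / window
        var = sum((x - mean) ** 2 for x in chunk) / window
        if var < best_var:
            best_start, best_var = start, var
    inliers = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return inliers, outliers


def estimate_sum(
    values: List[float], num_outliers: int, sample_size: int, rng: random.Random
) -> float:
    """Aggregate the outliers exactly, then add the extrapolated sum of a
    uniform sample drawn from the remaining data."""
    inliers, outliers = split_outliers(values, num_outliers)
    sample = rng.sample(inliers, sample_size)
    scale = len(inliers) / sample_size  # extrapolation factor
    return sum(outliers) + scale * sum(sample)


data = [10.0] * 98 + [5000.0, 9000.0]
estimate = estimate_sum(data, num_outliers=2, sample_size=20, rng=random.Random(0))
```

Because the two large values are aggregated exactly instead of being left to chance in the sample, the estimate for this data matches the true sum regardless of which rows are sampled.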
  • Patent number: 7346601
    Abstract: A method for evaluating a user query on a database having a mining model that classifies records contained in the database into classes when the query comprises at least one mining predicate that refers to a class of database records. An upper envelope is derived for the class referred to by the mining predicate corresponding to a query that returns a set of database records that includes all of the database records belonging to the class. The upper envelope is included in the user query for query evaluation. The method may be practiced during a preprocessing phase by evaluating the mining model to extract a set of classes of the database records and deriving an upper envelope for each class. These upper envelopes are stored for access during user query evaluation.
    Type: Grant
    Filed: June 3, 2002
    Date of Patent: March 18, 2008
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Sunita Sarawagi
  • Patent number: 7330848
    Abstract: A method and apparatus for creating a statistical representation of a query result that can be performed without executing the underlying query. For a binary-join query, a scan is performed on one of the join tables. A multiplicity value that estimates the number of tuples in the other join table that has a matching join attribute to the scanned tuple is calculated. A number of copies (as determined by the multiplicity value) are placed in a stream of tuples that is sampled to compile the statistical representation of the query result. For acyclic-join generating queries including selections, the above procedure is recursively extended. If multiple statistical representations are sought, scans can be shared. Scan sharing can be optimized using shortest common supersequence techniques.
    Type: Grant
    Filed: May 23, 2003
    Date of Patent: February 12, 2008
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Nicolas Bruno
  • Patent number: 7328221
    Abstract: A method for estimating the result of a query on a database having data records arranged in tables. The database has an expected workload that includes a set of queries that can be executed on the database. An expected workload is derived including a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.
    Type: Grant
    Filed: September 8, 2004
    Date of Patent: February 5, 2008
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Gautam Das
  • Publication number: 20080005104
    Abstract: A localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. Further, a system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results. Another aspect of the disclosure pertains to web searches and more particularly toward influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user.
    Type: Application
    Filed: June 28, 2006
    Publication date: January 3, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Gary W. Flake, William H. Gates, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Trenholme J. Griffin, Oliver Hurst-Hiller, Kenneth A. Moss
  • Publication number: 20080005091
    Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
    Type: Application
    Filed: June 28, 2006
    Publication date: January 3, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
  • Publication number: 20080005105
    Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
    Type: Application
    Filed: June 28, 2006
    Publication date: January 3, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
  • Publication number: 20080005071
    Abstract: The subject disclosure pertains to web searches and more particularly toward influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user. Still further yet, a localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. A system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results.
    Type: Application
    Filed: June 28, 2006
    Publication date: January 3, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Gary W. Flake, William H. Gates, Trenholme J. Griffin, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Oliver Hurst-Hiller, Kenneth A. Moss
  • Publication number: 20080005074
    Abstract: The subject disclosure pertains to web searches and more particularly toward influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context . . . ). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user. Still further yet, a localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. A system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results.
    Type: Application
    Filed: June 28, 2006
    Publication date: January 3, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Gary W. Flake, William H. Gates, Eric J. Horvitz, Joshua T. Goodman, Surajit Chaudhuri, Trenholme J. Griffin, Oliver Hurst-Hiller, Kenneth A. Moss
  • Patent number: 7299226
    Abstract: A method of estimating cardinality of a join of tables using multi-column density values and additionally using coarser density values of a subset of the multi-column density attributes. In one embodiment, the subset of attributes for the coarser densities is a prefix of the set of multi-column density attributes. A number of tuples from each table that participate in the join may be estimated using densities of the subsets. The cardinality of the join can be estimated using the multi-column density for each table and the estimated number of tuples that participate in the join from each table.
    Type: Grant
    Filed: June 19, 2003
    Date of Patent: November 20, 2007
    Assignee: Microsoft Corporation
    Inventors: Nicolas Bruno, Murali Krishna, Ming-Chuan Wu, Surajit Chaudhuri
  • Patent number: 7299220
    Abstract: A database object summarization tool is provided that selects a subset of database objects subject to filtering constraints such as a partial order or optimization of some attribute. A dominance primitive filters out tuples that are dominated according to a partial order constraint by another tuple. A representation primitive selects a representative subset of tuples such that an optimization criterion is met.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: November 20, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Prasanna Ganesan
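
The dominance primitive can be illustrated with a small sketch. It assumes one particular partial order (a tuple dominates another if it is no worse in every attribute and strictly better in at least one, with smaller values being better) and a quadratic-time filter. The names `dominates` and `dominance_filter` and the sample data are hypothetical, and the representation primitive is not shown.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every attribute and strictly
    better in at least one (smaller is better in this sketch)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def dominance_filter(tuples: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the tuples that no other tuple dominates."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]


# Hypothetical (price, distance) pairs.
hotels = [(100, 5), (80, 7), (120, 6), (80, 5), (150, 1)]
undominated = dominance_filter(hotels)
```

With this data, `(80, 5)` dominates the first three entries, while `(150, 1)` survives because nothing is closer.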
  • Patent number: 7296011
    Abstract: To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: November 13, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
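
A rough sketch of q-gram based fuzzy matching follows. It is a simplified stand-in and not the similarity function disclosed in the patent: it assumes character 3-grams, plain Jaccard overlap, and a linear scan over the reference values. The names `qgrams`, `qgram_similarity`, and `fuzzy_match` and the sample reference values are hypothetical.

```python
from typing import List, Set


def qgrams(text: str, q: int = 3) -> Set[str]:
    """Return the set of length-q substrings of the lowercased text."""
    lowered = text.lower()
    if len(lowered) < q:
        return {lowered}
    return {lowered[i:i + q] for i in range(len(lowered) - q + 1)}


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Jaccard overlap of the two q-gram sets (an assumed, simplified measure)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)


def fuzzy_match(input_value: str, reference: List[str], q: int = 3) -> str:
    """Return the reference value most similar to the input value."""
    return max(reference, key=lambda r: qgram_similarity(input_value, r, q))


reference_names = ["Boeing Company", "Bon Corporation", "Companions"]
best = fuzzy_match("Beoing Company", reference_names)
```

The misspelled input shares most of its 3-grams with "Boeing Company", so that reference value is chosen even though no exact match exists.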
  • Patent number: 7293037
    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
    Type: Grant
    Filed: October 7, 2005
    Date of Patent: November 6, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar