Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7293036
    Abstract: Relational database applications such as index selection, histogram tuning, approximate query processing, and statistics selection have recognized the importance of leveraging workloads. Often these applications are presented with large workloads, i.e., a set of SQL DML statements, as input. A key factor affecting the scalability of such applications is the size of the workload. The invention concerns workload compression which helps improve the scalability of such applications. The exemplary embodiment is broadly applicable to a variety of workload-driven applications, while allowing for incorporation of application specific knowledge. The process is described in detail in the context of two workload-driven applications: index selection and approximate query processing.
    Type: Grant
    Filed: December 8, 2004
    Date of Patent: November 6, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Ashish Kumar Gupta, Vivek Narasayya
  • Patent number: 7287019
    Abstract: A process for finding similar data records in a set of data records. A database table or tables provide a number of data records from which one or more canonical data records are identified. Tokens are identified within the data records and classified according to attribute field. A similarity score is assigned to data records in relation to other data records based on the similarity between tokens of the data records. Data records whose similarity score with respect to each other is greater than a threshold form one or more groups of data records. The records or tuples form nodes of a graph in which edges between nodes represent the similarity score between records of a group. Within each group, a canonical record is identified based on the similarity of the data records to each other within the group.
    Type: Grant
    Filed: June 4, 2003
    Date of Patent: October 23, 2007
    Assignee: Microsoft Corporation
    Inventors: Rahul Kapoor, Venkatesh Ganti, Surajit Chaudhuri
  • Patent number: 7287020
    Abstract: This disclosure describes leveraging workload information associated with executed database queries for estimating the result of a current database query. The workload information is analyzed to determine the usage of tuples in a database during query execution, such as how often a tuple is accessed and the number of different queries that accessed the tuple. Each tuple is assigned a weight value based on the analyzed workload information. The particular tuples sampled for estimating a result for the current query are chosen based on each tuple's weight value. The workload information may also be leveraged to generate an outlier index that identifies outlier tuples associated with the executed queries, or outlier tuples associated with particular queries that are executed more frequently than other queries. The result for the current query can also be estimated using the sampled values together with the outlier tuples from the outlier index.
    Type: Grant
    Filed: January 12, 2001
    Date of Patent: October 23, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Publication number: 20070239744
    Abstract: Various embodiments are disclosed relating to database configuration refinement. In an example embodiment, a method is provided that may include determining a size limitation for a database configuration, determining a workload of the database configuration, and making a determination that a size of the database configuration is greater than a size limit. The method may also include applying either a merge process or a reduction process to decrease the size of the database configuration. The merge process may merge a first index/view with a second index/view to produce a merged index/view, for example. The reduction process may delete a first portion of a first view to produce a reduced view.
    Type: Application
    Filed: March 28, 2006
    Publication date: October 11, 2007
    Applicant: Microsoft Corporation
    Inventors: Nicolas Bruno, Surajit Chaudhuri
  • Patent number: 7281013
    Abstract: A method for providing workload information in a structured workload information data structure format that is organized according to a workload schema to be conducive to a given end usage of the information. The structured workload information can be made accessible using standard database analytical server applications, to facilitate ad hoc querying of the structured workload information to summarize and analyze the database workload, or to facilitate exchange of workload information. A structured workload information (SWI) data structure is constructed according to an SWI schema to facilitate a desired end usage of the workload information. Query information is extracted from the workload and stored in the SWI data structure according to the schema, based on the desired end usage of the information, such as ad hoc querying or information exchange.
    Type: Grant
    Filed: June 3, 2002
    Date of Patent: October 9, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Omer Zaki
  • Patent number: 7281007
    Abstract: A method for estimating the result of a query on a database having data records arranged in tables. The database has an expected workload, derived to include a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes the estimation error when the data records are acted upon by a query in the expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.
    Type: Grant
    Filed: September 8, 2004
    Date of Patent: October 9, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Gautam Das
  • Publication number: 20070198439
    Abstract: The claimed subject matter relates to incorporating a skyline operator within a relational database engine, and more particularly to a database engine that utilizes novel techniques to determine the lowest cost of generating the skyline produced by the skyline operator. The database engine receives queries and associated preferences and, based on a cardinality estimate and a cost estimate, utilizes an appropriate skyline generating technique to produce a skyline representative of the received queries and their associated preferences.
    Type: Application
    Filed: February 17, 2006
    Publication date: August 23, 2007
    Applicant: Microsoft Corporation
    Inventors: Kaushik Shriraghav, Surajit Chaudhuri, Nilesh Dalvi
  • Publication number: 20070192342
    Abstract: A set similarity join system and method are provided. The system can be employed to facilitate data cleaning based on similarities through the identification of “close” tuples (e.g., records and/or rows). “Closeness” can be evaluated using one or more similarity functions chosen to suit the domain and/or application. Thus, the system facilitates generic, domain-independent data cleansing. The system can be employed with a foundational primitive, the set similarity join (SSJoin) operator, which can be used as a building block to implement a broad variety of notions of similarity (e.g., edit similarity, Jaccard similarity, generalized edit similarity, Hamming distance, soundex, etc.) as well as similarity based on co-occurrences. The SSJoin operator can exploit the observation that set overlap can be used effectively to support a variety of similarity functions. The SSJoin operator compares values based on “sets” associated with (or explicitly constructed for) each one of them.
    Type: Application
    Filed: February 10, 2006
    Publication date: August 16, 2007
    Applicant: Microsoft Corporation
    Inventors: Kaushik Shriraghav, Surajit Chaudhuri, Venkatesh Ganti
  • Publication number: 20070185851
    Abstract: A query generation using cardinality constraints process including choosing a first set of parameters for a query, calculating an additional set of parameters based on the first set of parameters, executing the query using the additional set of parameters, evaluating the cardinality error of the additional set of parameters, and refining the additional set of parameters to meet the desired cardinality constraint. Creating a query and selecting parameters for it so that it meets a desired cardinality constraint or set of cardinality constraints when executed against a database may be difficult. A query generation using cardinality constraints process may create a set of parameters for a query that satisfies a desired cardinality constraint or set of cardinality constraints. One application of such a process is the testing of database components and code.
    Type: Application
    Filed: January 27, 2006
    Publication date: August 9, 2007
    Applicant: Microsoft Corporation
    Inventors: Nicolas Bruno, Surajit Chaudhuri, Dilys Thomas
  • Patent number: 7251648
    Abstract: A method for automatically ranking database records by relevance to a given query. A similarity function is derived from data in the database and/or queries in a workload. The derived similarity function is applied to a given query and the records in the database to rank the records. The records are returned in ranked order.
    Type: Grant
    Filed: June 28, 2002
    Date of Patent: July 31, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Gautam Das, Aris Gionis
  • Patent number: 7249141
    Abstract: Layout in a database system is performed using workload information. Execution information for a workload is obtained. Cumulative access and co-access information for database objects is then assembled. A cost model is developed for quantitatively capturing the value of different layouts, and a search is performed for a recommended database layout. In one embodiment, the search initially attempts to provide a layout that minimizes co-location of objects on storage objects, and then attempts to improve that layout via a greedy search.
    Type: Grant
    Filed: April 30, 2003
    Date of Patent: July 24, 2007
    Assignee: Microsoft Corporation
    Inventors: Sanjay Agrawal, Surajit Chaudhuri, Abhinandan Das, Vivek Narasayya
  • Patent number: 7249120
    Abstract: By transforming a query into a product of conditional selectivity expressions, an existing set of statistics on query expressions can be used more effectively to estimate cardinality values. Conditional selectivity values are progressively separated according to the rules of conditional probability to yield a set of non-separable decompositions that can be matched with the stored statistics on query expressions. The stored statistics are used to estimate the selectivity of the query, and the estimated selectivity can be multiplied by the size of the Cartesian product of the referenced tables to yield a cardinality value.
    Type: Grant
    Filed: June 27, 2003
    Date of Patent: July 24, 2007
    Assignee: Microsoft Corporation
    Inventors: Nicolas Bruno, Surajit Chaudhuri
  • Patent number: 7240044
    Abstract: Database system query optimizers use several techniques such as histograms and sampling to estimate the result sizes of operators and sub-plans (operator trees) and the number of distinct values in their outputs. Instead of estimates, the invention uses the exact actual values of the result sizes and the number of distinct values in the outputs of sub-plans encountered by the optimizer. This is achieved by optimizing the query in phases. In each phase, newly encountered sub-plans are recorded for which result size and/or distinct value estimates are required. These sub-plans are executed at the end of the phase to determine their actual result sizes and the actual number of distinct values in their outputs. In subsequent phases, the optimizer uses these actual values when it encounters the same sub-plan again.
    Type: Grant
    Filed: September 15, 2004
    Date of Patent: July 3, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Ashraf I. Aboulnaga
  • Patent number: 7228312
    Abstract: An XML transformation tool that constructs a relational database with associated physical structures that can be populated with shredded XML data. A mapping transformation enumerator examines queries in the workload and enumerates mapping transformations that use XSD-specific constraints and statistics on XML data; these transformations can be used to generate mappings from XSD to relational database schema that may lead to better performance in the presence of physical design structures. A design tuner searches the mappings generated from a default mapping using the enumerated transformations, together with the physical design structures associated with those mappings, and selects a preferred mapping and physical design structures. Cost estimates for performing the queries in the workload are determined for the relational database implementing the mapping and associated physical design structures.
    Type: Grant
    Filed: March 9, 2004
    Date of Patent: June 5, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Zhiyuan Chen, Kyuseok Shim, Yuqing Yu
  • Patent number: 7194451
    Abstract: A framework is provided within a database system for specifying database monitoring rules that will be evaluated as part of the execution code path of database events being monitored. The occurrence of a selected database event triggers a rule that evaluates some parameter of an object related to the event against a condition in the rule. If the condition is met, a specified action is taken that can alter the execution of the database event or database system performance. Lightweight aggregation tables are utilized to enable aggregation of object parameter values so that presently occurring events can be compared to a summary of the object parameter values from previously occurring database events. Signatures are assigned to queries based on the structure of the query plan so that information in the lightweight aggregation tables can be grouped according to query signature.
    Type: Grant
    Filed: February 26, 2004
    Date of Patent: March 20, 2007
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Arnd Christian Konig, Vivek Narasayya
  • Patent number: 7155459
    Abstract: A method is provided for tuning a database to recommend a set of physical design structures that optimize database performance for a given workload within a total time bound, which defines the maximum amount of time that can be spent tuning the database. A cumulative set of recommended structures is maintained and incrementally updated based on tuning that is performed in intervals over portions of the workload. The cumulative set of recommended structures is updated by tuning the database while examining a predetermined portion of the workload during a time slice that is a fraction of the total time bound. At the end of the time slice, a set of recommended structures has been enumerated based on the workload portions examined thus far. The set of recommended structures is updated until all queries in the workload have been examined or until the time bound is reached.
    Type: Grant
    Filed: June 28, 2002
    Date of Patent: December 26, 2006
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Sanjay Agrawal, Vivek Narasayya
  • Publication number: 20060282436
    Abstract: A system that facilitates estimating functional relationships associated with one or more columns in a database comprises a sampling component that receives a random sample of records within the database. An estimate generator component calculates an estimate of strength of functional relationships based at least in part upon the received samples. For example, the estimate generator component can calculate an estimate of strength of a column as a key column based at least in part upon the received samples.
    Type: Application
    Filed: May 6, 2005
    Publication date: December 14, 2006
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Kaushik Shriraghav
  • Publication number: 20060282404
    Abstract: Techniques for estimating the progress of database queries are described herein. In a first implementation, a respective lower-bound parameter is associated with each node in an operator tree that represents a given database query, and the progress of the database query at a given point is estimated based upon the lower-bound parameters. In a second implementation, the progress of the query is estimated by associating respective lower-bound and upper-bound parameters with each node in the operator tree. The progress of the query at the given point is then estimated based on the lower-bound and upper-bound parameters: the work done so far is divided by the sum, over all nodes in the tree, of the averages of each node's lower-bound and upper-bound parameters.
    Type: Application
    Filed: June 10, 2005
    Publication date: December 14, 2006
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Ravishankar Ramamurthy, Kaushik Shriraghav
  • Patent number: 7149735
    Abstract: A method of estimating the selectivity of a given string predicate in a database query. In the method, selectivities of substrings of various lengths are estimated. For example, the selectivity of substrings from length 1 (or some constant q) up to the length of the given string predicate may be estimated. The method then selects a candidate substring for each substring length based on the estimated selectivities of the substrings. The estimated selectivities of the candidate substrings are combined, and the combined estimate is returned as the estimated selectivity of the given string predicate.
    Type: Grant
    Filed: June 24, 2003
    Date of Patent: December 12, 2006
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Luis Gravano
  • Patent number: 7139778
    Abstract: In a relational database system, a set of physical design structures is enumerated that optimizes database performance over a given workload consisting of workload entries that include queries and updates that have been executed against the database. An individual benefit is calculated for each candidate structure relevant to a given workload entry and these individual benefits are summed over the entries in the workload examined thus far. A workload entry is selected from the workload and a set of candidate structures relevant to the workload entry is identified.
    Type: Grant
    Filed: June 28, 2002
    Date of Patent: November 21, 2006
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Mayur Datar
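The set similarity join abstract (publication 20070192342) rests on the observation that set overlap can express many similarity functions. As a minimal sketch, assuming whitespace tokenization and Jaccard similarity (both chosen here for illustration, not taken from the application), the join below computes Jaccard purely from the overlap size and the two set sizes. It is a naive nested-loop self-join over every pair; an operator inside a database engine would avoid comparing every pair.

```python
from __future__ import annotations

from itertools import combinations


def tokens(value: str) -> frozenset[str]:
    """Hypothetical set construction: lower-cased, whitespace-separated words."""
    return frozenset(value.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity written in terms of set overlap: |a & b| / (|a| + |b| - |a & b|)."""
    if not a and not b:
        return 1.0
    overlap = len(a & b)
    return overlap / (len(a) + len(b) - overlap)


def ssjoin(values: list[str], threshold: float) -> list[tuple[int, int, float]]:
    """Naive set similarity self-join: return (i, j, similarity) for every pair
    of values whose Jaccard similarity is at least `threshold`."""
    sets = [tokens(v) for v in values]
    result = []
    for i, j in combinations(range(len(values)), 2):
        sim = jaccard(sets[i], sets[j])
        if sim >= threshold:
            result.append((i, j, sim))
    return result


records = [
    "Microsoft Corp One Microsoft Way",
    "Microsoft Corporation One Microsoft Way",
    "Stanford University",
]
result = ssjoin(records, 0.5)
print(result)  # [(0, 1, 0.6)]
```

Swapping `jaccard` for another overlap-based measure changes the notion of closeness without changing the join loop, which is the flexibility the abstract attributes to the SSJoin primitive.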
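The skyline application (publication 20070198439) is about choosing, by cardinality and cost estimates, which technique generates a skyline. For readers unfamiliar with the operator itself, here is a minimal sketch of one simple technique, a block-nested-loops style pass with an unbounded in-memory window. It assumes smaller is better in every dimension; it does not reproduce the application's cost-based choice among techniques.

```python
from __future__ import annotations


def dominates(p: tuple[float, ...], q: tuple[float, ...]) -> bool:
    """p dominates q if p is no worse in every dimension and strictly better
    in at least one (here 'better' means smaller)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    """Keep only points that no other point dominates."""
    window: list[tuple[float, ...]] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a current skyline candidate
        # p survives: evict any candidates that p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


# (price, distance) pairs; a user prefers both to be small
hotels = [(50, 8.0), (80, 2.0), (60, 9.0), (40, 10.0), (90, 1.0), (70, 2.5), (85, 3.0)]
result = skyline(hotels)
print(result)  # [(50, 8.0), (80, 2.0), (40, 10.0), (90, 1.0), (70, 2.5)]
```

This pass costs roughly quadratic time in the worst case; alternatives such as presorting the input trade that cost differently, which is why an engine might pick among techniques using estimates.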
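The progress-estimation abstract (publication 20060282404) divides the work done so far by a sum of per-node averages. Reading those averages as the midpoint of each node's lower and upper bounds (an interpretation, not something the abstract spells out), a minimal sketch follows. The `OperatorNode` class, measuring work in tuples produced, and clamping the result to 1.0 are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OperatorNode:
    """Hypothetical per-operator counters; work is counted in tuples produced."""
    name: str
    done: int   # tuples produced so far
    lower: int  # lower bound on the total tuples this operator will produce
    upper: int  # upper bound on the total tuples this operator will produce


def estimate_progress(nodes: list[OperatorNode]) -> float:
    """Work done so far divided by the sum of per-node (lower + upper) / 2."""
    done = sum(n.done for n in nodes)
    expected_total = sum((n.lower + n.upper) / 2 for n in nodes)
    if expected_total == 0:
        return 1.0  # nothing to do: treat the query as complete
    return min(1.0, done / expected_total)


plan = [
    OperatorNode("TableScan(orders)", done=600, lower=1000, upper=1000),
    OperatorNode("Filter", done=150, lower=250, upper=750),
    OperatorNode("HashAggregate", done=0, lower=1, upper=100),
]
print(f"{estimate_progress(plan):.3f}")  # prints 0.484
```

As execution proceeds, a node's bounds can be tightened (for example, the lower bound can never drop below `done`), so the estimate converges as the bounds close in.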
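Patent 7249120 multiplies an estimated selectivity by the size of the Cartesian product of the referenced tables. The arithmetic below is a hypothetical worked example: the table sizes and both selectivity factors are invented. The factorization Sel(J ∧ F) = Sel(F | J) · Sel(J), for a join predicate J and a filter F, is the ordinary chain rule of conditional probability, not the patent's specific procedure for choosing decompositions. `Fraction` keeps the arithmetic exact.

```python
from __future__ import annotations

from fractions import Fraction
from math import prod


def estimate_cardinality(table_rows: dict[str, int], factors: list[Fraction]) -> Fraction:
    """Cardinality = (product of conditional selectivity factors)
    x (size of the Cartesian product of the referenced tables)."""
    cartesian_size = prod(table_rows.values())
    selectivity = prod(factors, start=Fraction(1))
    return selectivity * cartesian_size


tables = {"orders": 10_000, "customers": 500}  # hypothetical row counts
factors = [
    Fraction(1, 500),  # Sel(J): orders.customer_id = customers.id
    Fraction(1, 10),   # Sel(F | J): customers.region = 'EU', given the join
]
card = estimate_cardinality(tables, factors)
print(card)  # 1000
```

Conditioning F on J matters when the filter and the join are correlated; multiplying an unconditional Sel(F) instead would silently assume independence.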
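The duplicate-detection abstract (patent 7287019) groups records whose pairwise similarity exceeds a threshold, treats records as graph nodes joined by similarity edges, and picks a canonical record per group. A minimal sketch under stated assumptions: similarity is Jaccard over word tokens (hypothetical; the patent classifies tokens by attribute field), groups are the connected components of the threshold graph, and the canonical record is the member with the highest total similarity to the rest of its group. The function names are illustrative only.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Hypothetical similarity: Jaccard overlap of lower-cased word tokens."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 1.0


def group_and_canonicalize(records: list[str], threshold: float) -> list[tuple[str, list[str]]]:
    """Return (canonical record, group members) for each connected component."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sim: dict[tuple[int, int], float] = {}
    for i, j in combinations(range(len(records)), 2):
        s = jaccard(records[i], records[j])
        sim[(i, j)] = sim[(j, i)] = s
        if s > threshold:  # edge in the similarity graph
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)

    result = []
    for members in groups.values():
        # canonical record: highest total similarity to the other group members
        canon = max(members, key=lambda i: sum(sim[(i, j)] for j in members if j != i))
        result.append((records[canon], [records[m] for m in members]))
    return result


rows = [
    "Microsoft Corp Redmond WA",
    "Microsoft Corporation Redmond WA",
    "Microsoft Corp Redmond",
    "Stanford University Palo Alto CA",
]
result = group_and_canonicalize(rows, 0.5)
for canonical, members in result:
    print(canonical, "<-", members)
```

Connected components are transitive: the second and third rows land in the same group (similarity 0.4, below the threshold) because each is close to the first. A stricter grouping rule would split them.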
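Patent 7287020 weights each tuple by how the workload used it and samples tuples according to those weights. As a minimal sketch, assume a hypothetical weight of 1 plus the number of workload queries that accessed the tuple (the +1 keeps untouched tuples eligible), and draw a weighted sample without replacement using Efraimidis-Spirakis keys `u ** (1 / w)`. The weighting formula and the sampling scheme are illustrative choices, not taken from the patent, and the outlier index is not modeled.

```python
from __future__ import annotations

import heapq
import random


def workload_weights(num_tuples: int, workload: list[set[int]]) -> list[float]:
    """Hypothetical weighting: 1 + number of workload queries that touched the tuple.
    Each workload entry is the set of tuple ids one query accessed."""
    weights = [1.0] * num_tuples
    for accessed in workload:
        for t in accessed:
            weights[t] += 1.0
    return weights


def weighted_sample(weights: list[float], k: int, seed: int = 0) -> list[int]:
    """Weighted sampling without replacement: keep the k largest keys u ** (1 / w)."""
    rng = random.Random(seed)
    keyed = ((rng.random() ** (1.0 / w), i) for i, w in enumerate(weights))
    return [i for _, i in heapq.nlargest(k, keyed)]


weights = workload_weights(6, [{0, 1}, {0, 2}, {0}])
print(weights)  # [4.0, 2.0, 2.0, 1.0, 1.0, 1.0]
print(weighted_sample(weights, k=3, seed=42))
```

Heavily used tuples get keys closer to 1 and so tend to survive the top-k cut; a result estimated from such a sample would still need to be reweighted to stay unbiased.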