Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7636707
    Abstract: Selectivity estimates are produced that meet a desired confidence threshold. To determine the confidence level of a given selectivity estimate for a query expression, the query expression is evaluated on a sample of tuples. A probability density function is derived based on the number of tuples in the sample that satisfy the query expression. The cumulative distribution for the probability density function is solved for the given threshold to determine a selectivity estimate at the given confidence value.
    Type: Grant
    Filed: April 6, 2004
    Date of Patent: December 22, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Brian Frederick Babcock
  • Publication number: 20090254522
    Abstract: A database server may be configured to compute distinct page counts of pages accessed to execute operands of respective queries. The queries may be executed against a table comprised of the pages and having an index managed by the database server. The distinct page counts may be obtained by counting, as a part of the executing of the queries, distinct pages accessed during the execution of the queries.
    Type: Application
    Filed: April 4, 2008
    Publication date: October 8, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy
  • Publication number: 20090210418
    Abstract: A transformation-based record matching technique. The technique provides a flexible way to account for synonyms and more general forms of string equivalences when performing record matching by taking as explicit input user-defined transformation rules (such as, for example, the fact that “Robert” and “Bob” are synonymous). The input string and user-defined transformation rules are used to generate a larger set of strings which are used when performing record matching. Both the input string and data elements in a database can be transformed using the user-defined transformation rules in order to generate a larger set of potential record matches. These potential record matches can then be subjected to a threshold test in order to determine one or more best matches. Additionally, signature-based similarity functions are used to improve the computational efficiency of the technique.
    Type: Application
    Filed: February 15, 2008
    Publication date: August 20, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Arvind Arasu, Surajit Chaudhuri, Shriraghav Kaushik
  • Patent number: 7577638
    Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.
    Type: Grant
    Filed: December 7, 2005
    Date of Patent: August 18, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Patent number: 7567962
    Abstract: At least one implementation of database management technology, described herein, utilizes categorization of query results when querying a relational database in order to reduce information overload. To reduce information overload even further, another implementation, described herein, utilizes both categorization and ranking of query results when searching a relational database.
    Type: Grant
    Filed: August 13, 2004
    Date of Patent: July 28, 2009
    Assignee: Microsoft Corporation
    Inventors: Kaushik Chakrabarti, Seung-won Hwang, Surajit Chaudhuri
  • Patent number: 7567949
    Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
    Type: Grant
    Filed: September 10, 2002
    Date of Patent: July 28, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
  • Patent number: 7562067
    Abstract: A system that facilitates estimating functional relationships associated with one or more columns in a database comprises a sampling component that receives a random sample of records within the database. An estimate generator component calculates an estimate of strength of functional relationships based at least in part upon the received samples. For example, the estimate generator component can calculate an estimate of strength of a column as a key column based at least in part upon the received samples.
    Type: Grant
    Filed: May 6, 2005
    Date of Patent: July 14, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Kaushik Shriraghav
  • Publication number: 20090106746
    Abstract: Infrastructure for capturing and correlating application context and database context for tuning, profiling and debugging tasks. The infrastructure extends the DBMS and application profiling infrastructure making it easy for a developer to invoke and interact with a tool from inside the development environment. Three sources of information are employed when an application is executed: server tracing, data access layer tracing, and application tracing. The events obtained from each of these sources are written into a log file. An event log is generated on each machine that involves either an application process or the DBMS server process and the log file receives log traces from different processes on a machine to the same trace session. A post-processing step over the event log(s) correlates the application and database contexts. The output is a single view where both the application and database profile of each statement issued by the application are exposed.
    Type: Application
    Filed: October 19, 2007
    Publication date: April 23, 2009
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Manoj A. Symala
  • Publication number: 20090094086
    Abstract: Assignment algorithm for automatically making assignments between documents and document reviewers for a review process. If the automated assignments need adjusting, a coordinator can manually refine the assignment(s). The assignment algorithm facilitates the automated assignment process based on inputs related to a constraint and/or a preference. The constraints and preferences include, but are not limited to, a conflict of interest, a minimum number of reviews, a maximum number of submissions, a partial assignment, bidding preferences, and health metrics. Once the assignments have been made, histograms can be generated that present an overview of certain health metrics, further allowing refinement of the assignment process.
    Type: Application
    Filed: October 3, 2007
    Publication date: April 9, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Nicolas Bruno, Vivek R. Narasayya, Surajit Chaudhuri
  • Publication number: 20090094191
    Abstract: A proactive monitoring mechanism for correcting the choice of access methods (available query plans) for a given query, based on execution feedback from the same query. The mechanism exploits bypassing predicate short-circuiting inside the database server's predicate evaluation module to obtain expression cardinalities. The mechanism can also modify a plan to obtain expression cardinalities. These techniques are used judiciously by the query optimizer and/or a database administrator (DBA) so that the execution overheads are within acceptable limits.
    Type: Application
    Filed: October 8, 2007
    Publication date: April 9, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Ravishankar Ramamurthy
  • Patent number: 7516149
    Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.
    Type: Grant
    Filed: August 30, 2004
    Date of Patent: April 7, 2009
    Assignee: Microsoft Corporation
    Inventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti
  • Publication number: 20090083238
    Abstract: Stop-and-restart query execution that partially leverages the work already performed during the initial execution of the query to reduce the execution time during a restart. The technique selectively saves information from a previous execution of the query so that the overhead associated with restarting the query execution can be bounded. Despite saving only limited information, the disclosed technique substantially reduces the running time of the restarted query. The stop-and-restart query execution technique is constrained to save and reuse only a bounded number of records (intermediate records or output records), thereby releasing all other resources, rather than some of the resources. The technique chooses a subset of the records found during normal execution to save and then skips the corresponding records when performing a scan during restart to prevent the duplication of execution. A skip-scan operator is employed to facilitate the disclosed restart technique.
    Type: Application
    Filed: September 21, 2007
    Publication date: March 26, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Surajit Chaudhuri, Shriraghav Kaushik, Abhijit Pol, Ravishankar Ramamurthy
  • Publication number: 20090083214
    Abstract: Index structures and query processing framework that enforces a given threshold on the overhead of computing conjunctive keyword queries. This includes a keyword processing algorithm, logic to determine which indexes to materialize, and a probabilistic approach to reducing the overhead for determining which indexes to build. The index structures leverage the fact that the frequency distribution of natural-language text follows a power law. Given a document collection, a set of indexes is proposed for materialization so that the time for intersecting keywords does not exceed a given threshold Δ. When considering the associated space requirement, the additional indexes are limited. Materialization of such a set of indexes for reasonable values of Δ (e.g., the time required to scan 20% of the largest inverted index), at least for a collection of short documents is distributed by the power law.
    Type: Application
    Filed: September 21, 2007
    Publication date: March 26, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Arnd C. Konig, Surajit Chaudhuri, Kenneth Church, Liying Sui
  • Patent number: 7493316
    Abstract: A method of estimating results of a database query. The results are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. An aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.
    Type: Grant
    Filed: December 7, 2005
    Date of Patent: February 17, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Patent number: 7493337
    Abstract: A query progress indicator that provides an indication to a user of the progress of a query being executed on a database. The indication of the progress of the query allows the user to decide whether the query should be allowed to complete or should be aborted. One method that may be used to estimate the progress of a query that is being executed on a database defines a model of work performed during execution of a query. The total amount of work that will be performed during execution of the query is estimated according to the model. The amount of work performed at a given point during execution of the query is estimated according to the model. The progress of the query is estimated using the estimated amount of work at the given point in time and the estimated total amount of work. This estimated progress of query execution may be provided to the user.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: February 17, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy
  • Patent number: 7483918
    Abstract: A monitoring component of a database server collects a subset of a query workload along with related statistics. A remote index tuning component uses the workload subset and related statistics to determine a physical design that minimizes the cost of executing queries in the workload subset while ensuring that queries omitted from the subset do not degrade in performance.
    Type: Grant
    Filed: August 10, 2004
    Date of Patent: January 27, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Arnd Christian Konig, Vivek R. Narasayya
  • Patent number: 7472107
    Abstract: Integrating the partitioning of physical design structures with the physical design process can result in more efficient query execution. When candidate structures are evaluated for their relative benefit, one or more partitioning methods is associated with each structure so that the benefits of various partitioning methods are taken into consideration when the structures are selected for use by the database. A pool of partitioned candidate structures is formed by proposing and evaluating the benefit of candidate structures with associated partitioning on a per query basis. The selected partitioned candidates are then used to construct generalized structures with associated partitioning methods that are evaluated for their benefit over the workload. Those generalized structures are added to the pool of partitioned candidate structures. From this augmented pool of partitioned candidate structures, an optimal set of partitioned structures is enumerated for use by the database system.
    Type: Grant
    Filed: June 23, 2003
    Date of Patent: December 30, 2008
    Assignee: Microsoft Corporation
    Inventors: Sanjay Agrawal, Surajit Chaudhuri, Vivek Narasayya
  • Publication number: 20080306945
    Abstract: Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (or matching) and negative (non-matching) examples to search through this space and suggest an initial record matching query. The record matching task is modeled as that of designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs are executable efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations, in that any pair of matching records has a higher similarity value than non-matching record pairs on at least one similarity function.
    Type: Application
    Filed: June 5, 2007
    Publication date: December 11, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Surajit Chaudhuri, Bee-Chung Chen, Venkatesh Ganti, Shriraghav Kaushik
  • Publication number: 20080306908
    Abstract: Architecture for finding related entities for web search queries. An extraction component takes a document as input and outputs all the mentions (or occurrences) of named entities such as names of people, organizations, locations, and products in the document, as well as entity metadata. An indexing component takes a document identifier (docID) and the set of mentions of named entities, and stores and indexes the information for retrieval. A document-based search component takes a keyword query and returns the docIDs of the top documents matching the query. A retrieval component takes a docID as input, accesses the information stored by the indexing component and returns the set of mentions of named entities in the document. This information is then passed to an entity scoring and thresholding component that computes an aggregate score of each entity and selects the entities to return to the user.
    Type: Application
    Filed: June 5, 2007
    Publication date: December 11, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
  • Publication number: 20080288482
    Abstract: A deduplication algorithm that provides improved accuracy in data deduplication by using aggregate and/or groupwise constraints. Deduplication is accomplished by satisfying as many of these constraints as possible rather than imposing them inflexibly as hard constraints. Additionally, textual similarity between tuples is leveraged to restrict the search space. The algorithm begins with a coarse initial partition of data records and continues by raising the similarity threshold until the threshold splits a given partition. This sequence of splits defines a rich space of alternatives. Over this space, an algorithm finds a partition of the input that maximizes constraint satisfaction. In the context of groupwise aggregation constraints for deduplication, all SQL (structured query language) aggregates are allowed, including summation.
    Type: Application
    Filed: May 18, 2007
    Publication date: November 20, 2008
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Shriraghav Kaushik
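
The abstract of patent 7636707 above describes deriving a probability density from the number of sample tuples that satisfy a query expression, then solving its cumulative distribution for a confidence threshold. The following is a minimal sketch of that idea, not the patented method. It assumes a Beta(k+1, n-k+1) density (a uniform prior over the selectivity), which the abstract does not specify, and the function names `beta_cdf` and `selectivity_at_confidence` are hypothetical.

```python
from math import comb


def beta_cdf(x, k, n):
    """CDF at x of Beta(k+1, n-k+1), where k of n sampled tuples matched.

    Assumption: a uniform prior over the true selectivity. For integer
    parameters the Beta CDF equals a binomial tail sum, so no numerical
    integration is needed. Intended for small samples only.
    """
    m = n + 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(k + 1, m + 1))


def selectivity_at_confidence(k, n, threshold, tol=1e-9):
    """Solve beta_cdf(s) = threshold for s by bisection.

    The result s is the selectivity that the true value stays below
    with probability `threshold` under the assumed density.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, k, n) < threshold:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # 30 of 100 sampled tuples satisfied the predicate.
    for t in (0.5, 0.8, 0.95):
        print(t, round(selectivity_at_confidence(30, 100, t), 4))
```

A higher threshold yields a more conservative (larger) estimate from the same sample, which is the trade-off the abstract refers to as meeting a desired confidence level.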
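
The abstract of patent 7567949 above mentions sampling without replacement (WoR) over a stream of non-materialized records in a single pass. One well-known way to do unweighted single-pass WoR sampling is reservoir sampling, sketched below. This is a textbook technique swapped in for illustration; the abstract does not say which algorithm the patent uses, and `reservoir_sample` is a hypothetical name.

```python
import random


def reservoir_sample(stream, k, rng):
    """Return k records drawn uniformly without replacement from `stream`.

    The stream is read once and never materialized; only the k-slot
    reservoir is kept in memory. If the stream has fewer than k records,
    all of them are returned.
    """
    reservoir = []
    for i, record in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Record i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    rng = random.Random(42)
    # A generator stands in for tuples produced by a query pipeline.
    pipeline = (("row", n) for n in range(1000))
    print(reservoir_sample(pipeline, 5, rng))
```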
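
The abstract of patent 7493316 above describes computing an aggregate over sampled tuples while multiplying each value by the inverse of that tuple's sampling probability. A minimal sketch of that scaling for a SUM aggregate follows. It assumes each tuple is included by an independent coin flip with its own probability, which is one possible sampling scheme and not necessarily the one claimed; `estimate_sum` and the example data are hypothetical.

```python
import random


def estimate_sum(values, probabilities, rng):
    """Estimate sum(values) from a weighted sample.

    Each tuple i is sampled independently with probability p_i (assumed
    scheme). A sampled value is scaled by 1 / p_i, so the expected value
    of the estimate equals the true sum.
    """
    total = 0.0
    for value, p in zip(values, probabilities):
        if rng.random() < p:
            total += value / p
    return total


if __name__ == "__main__":
    rng = random.Random(7)
    values = [10, 20, 30, 1000]
    # Tuples assumed more relevant to the workload get a higher probability.
    probabilities = [0.2, 0.2, 0.2, 0.9]
    print("true sum:", sum(values))
    print("one estimate:", estimate_sum(values, probabilities, rng))
```

Giving large or frequently used tuples a higher sampling probability lowers the variance of the estimate, since a rarely sampled tuple contributes a heavily scaled value whenever it does appear.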