Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7567949
    Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
    Type: Grant
    Filed: September 10, 2002
    Date of Patent: July 28, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
  • Patent number: 7567962
    Abstract: At least one implementation of database management technology, described herein, utilizes categorization of query results when querying a relational database in order to reduce information overload. To reduce information overload even further, another implementation, described herein, utilizes both categorization and ranking of query results when searching a relational database.
    Type: Grant
    Filed: August 13, 2004
    Date of Patent: July 28, 2009
    Assignee: Microsoft Corporation
    Inventors: Kaushik Chakrabarti, Seung-won Hwang, Surajit Chaudhuri
  • Patent number: 7562067
    Abstract: A system that facilitates estimating functional relationships associated with one or more columns in a database comprises a sampling component that receives a random sample of records within the database. An estimate generator component calculates an estimate of strength of functional relationships based at least in part upon the received samples. For example, the estimate generator component can calculate an estimate of strength of a column as a key column based at least in part upon the received samples.
    Type: Grant
    Filed: May 6, 2005
    Date of Patent: July 14, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Kaushik Shriraghav
  • Publication number: 20090106746
    Abstract: Infrastructure for capturing and correlating application context and database context for tuning, profiling and debugging tasks. The infrastructure extends the DBMS and application profiling infrastructure making it easy for a developer to invoke and interact with a tool from inside the development environment. Three sources of information are employed when an application is executed: server tracing, data access layer tracing, and application tracing. The events obtained from each of these sources are written into a log file. An event log is generated on each machine that involves either an application process or the DBMS server process and the log file receives log traces from different processes on a machine to the same trace session. A post-processing step over the event log(s) correlates the application and database contexts. The output is a single view where both the application and database profile of each statement issued by the application are exposed.
    Type: Application
    Filed: October 19, 2007
    Publication date: April 23, 2009
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Manoj A. Symala
  • Publication number: 20090094191
    Abstract: A proactive monitoring mechanism for correcting the choice of access methods (available query plans) for a given query, based on execution feedback from the same query. The mechanism exploits bypassing predicate short-circuiting inside the database server's predicate evaluation module to obtain expression cardinalities. The mechanism can also modify a plan to obtain expression cardinalities. These techniques are used judiciously by the query optimizer and/or a database administrator (DBA) so that the execution overheads are within acceptable limits.
    Type: Application
    Filed: October 8, 2007
    Publication date: April 9, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Ravishankar Ramamurthy
  • Publication number: 20090094086
    Abstract: Assignment algorithm for automatically making assignments between documents and document reviewers for a review process. If the automated assignments need adjusting, a coordinator can manually refine the assignment(s). The assignment algorithm facilitates the automated assignment process based on inputs related to a constraint and/or a preference. The constraints and preferences include, but are not limited to, a conflict of interest, a minimum number of reviews, a maximum number of submissions, a partial assignment, bidding preferences, and health metrics. Once the assignments have been made, histograms can be generated that present an overview of certain health metrics, further allowing refinement of the assignment process.
    Type: Application
    Filed: October 3, 2007
    Publication date: April 9, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Nicolas Bruno, Vivek R. Narasayya, Surajit Chaudhuri
  • Patent number: 7516149
    Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.
    Type: Grant
    Filed: August 30, 2004
    Date of Patent: April 7, 2009
    Assignee: Microsoft Corporation
    Inventors: Rajeev Motwani, Surajit Chaudhuri, Venkatesh Ganti
  • Publication number: 20090083238
    Abstract: Stop-and-restart query execution that partially leverages the work already performed during the initial execution of the query to reduce the execution time during a restart. The technique selectively saves information from a previous execution of the query so that the overhead associated with restarting the query execution can be bounded. Despite saving only limited information, the disclosed technique substantially reduces the running time of the restarted query. The stop-and-restart query execution technique is constrained to save and reuse only a bounded number of records (intermediate records or output records) thereby releasing all other resources, rather than some of the resources. The technique chooses a subset of the records to save that were found during normal execution and then skipping the corresponding records when performing a scan during restart to prevent the duplication of execution. A skip-scan operator is employed to facilitate the disclosed restart technique.
    Type: Application
    Filed: September 21, 2007
    Publication date: March 26, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Surajit Chaudhuri, Shriraghav Kaushik, Abhijit Pol, Ravishankar Ramamurthy
  • Publication number: 20090083214
    Abstract: Index structures and query processing framework that enforces a given threshold on the overhead of computing conjunctive keyword queries. This includes a keyword processing algorithm, logic to determine which indexes to materialize, and a probabilistic approach to reducing the overhead for determining which indexes to build. The index structures leverage the fact that the frequency distribution of natural-language text follows a power law. Given a document collection, a set of indexes is proposed for materialization so that the time for intersecting keywords does not exceed a given threshold ?. When considering the associated space requirement, the additional indexes are limited. Materialization of such a set of indexes for reasonable values of ? (e.g., the time required to scan 20% of the largest inverted index), at least for a collection of short documents is distributed by the power law.
    Type: Application
    Filed: September 21, 2007
    Publication date: March 26, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Arnd C. Konig, Surajit Chaudhuri, Kenneth Church, Liying Sui
  • Patent number: 7493316
    Abstract: A method of estimating results of a database query, the results are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. An aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.
    Type: Grant
    Filed: December 7, 2005
    Date of Patent: February 17, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
  • Patent number: 7493337
    Abstract: A query progress indicator that provides an indication to a user of the progress of a query being executed on a database. The indication of the progress of the query allows the user to decide whether the query should be allowed to complete or should be aborted. One method that may be used to estimate the progress of a query that is being executed on a database defines a model of work performed during execution of a query. The total amount of work that will be performed during execution of the query is estimated according to the model. The amount of work performed at a given point during execution of the query is estimated according to the model. The progress of the query is estimated using the estimated amount of work at the given point in time and the estimated total amount of work. This estimated progress of query execution may be provided to the user.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: February 17, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy
  • Patent number: 7483918
    Abstract: A monitoring component of a database server collects a subset of a query workload along with related statistics. A remote index tuning component uses the workload subset and related statistics to determine a physical design that minimizes the cost of executing queries in the workload subset while ensuring that queries omitted from the subset do not degrade in performance.
    Type: Grant
    Filed: August 10, 2004
    Date of Patent: January 27, 2009
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Arnd Christian Konig, Vivek R. Narasayya
  • Patent number: 7472107
    Abstract: Integrating the partitioning of physical design structures with the physical design process can result in more efficient query execution. When candidate structures are evaluated for their relative benefit, one or more partitioning methods is associated with each structure so that the benefits of various partitioning methods are taken into consideration when the structures are selected for use by the database. A pool of partitioned candidate structures is formed by proposing and evaluating the benefit of candidate structures with associated partitioning on a per query basis. The selected partitioned candidates are then used to construct generalized structures with associated partitioning methods that are evaluated for their benefit over the workload. Those generalized structures are added to the pool of partitioned candidate structures. From this augmented pool of partitioned candidate structures, an optimal set of partitioned structures is enumerated for use by the database system.
    Type: Grant
    Filed: June 23, 2003
    Date of Patent: December 30, 2008
    Assignee: Microsoft Corporation
    Inventors: Sanjay Agrawal, Surajit Chaudhuri, Vivek Narasayya
  • Publication number: 20080306945
    Abstract: Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (or matching) and negative (non-matching) examples to search through this space and suggest an initial record matching query. The record matching task is modeled as that of designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs be executable efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations, in that, any pair of matching records have a higher similarity value than non-matching record pairs on at least one similarity function.
    Type: Application
    Filed: June 5, 2007
    Publication date: December 11, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Surajit Chaudhuri, Bee-Chung Chen, Venkatesh Ganti, Shriraghav Kaushik
  • Publication number: 20080306908
    Abstract: Architecture for finding related entities for web search queries. An extraction component takes a document as input and outputs all the mentions (or occurrences) of named entities such as names of people, organizations, locations, and products in the document, as well as entity metadata. An indexing component takes a document identifier (docID) and the set of mentions of named entities and, stores and indexes the information for retrieval. A document-based search component takes a keyword query and returns the docIDs of the top documents matching with the query. A retrieval component takes a docID as input, accesses the information stored by the indexing component and returns the set of mentions of named entities in the document. This information is then passed to an entity scoring and thresholding component that computes an aggregate score of each entity and selects the entities to return to the user.
    Type: Application
    Filed: June 5, 2007
    Publication date: December 11, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
  • Publication number: 20080288482
    Abstract: A deduplication algorithm that provides improved accuracy in data deduplication by using aggregate and/or groupwise constraints. Deduplication is accomplished using only as many of these constraints that are satisfied rather than be imposed inflexibly as hard constraints. Additionally, textual similarity between tuples is leveraged to restrict the search space. The algorithm begins with a coarse initial partition of data records and continues by raising the similarity threshold until the threshold splits a given partition. This sequence of splits defines a rich space of alternatives. Over this space, an algorithm finds a partition of the input that maximizes constraint satisfaction. In the context of groupwise aggregation constraints for deduplication all SQL (structured query language) aggregates are allowed, including summation.
    Type: Application
    Filed: May 18, 2007
    Publication date: November 20, 2008
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Shriraghav Kaushik
  • Patent number: 7454407
    Abstract: Techniques for estimating the progress of database queries are described herein. In a first implementation, a respective lower-bound parameter is associated with each node in an operator tree that representing a given database query, and the progress of the database query at a given point is estimated based upon the lower-bound parameters. In a second implementation, the progress of the query is estimated by associating respective lower-bound and upper-bound parameters with each node in the operator tree. The progress of the query at the given point is then estimated based on the lower-bound and upper-bound parameters.
    Type: Grant
    Filed: June 10, 2005
    Date of Patent: November 18, 2008
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Ravishankar Ramamurthy, Kaushik Shriraghav
  • Publication number: 20080183644
    Abstract: A lightweight physical design alerter can analyze a workload and determine whether a comprehensive tuning session would result in a configuration improvement over the current configuration. The alerter provides a low-overhead procedure that can run during normal operation of a database management system and produce a notification if a current configuration is less than optimal. The alerter can report lower and upper bounds on the improvements that could be obtained if a comprehensive tuning tool is launched. A lower bound can be justified by generating feasible configurations. The disclosed embodiments can be extended to query updates, materialized views, and other physical design features (e.g., partitioning).
    Type: Application
    Filed: January 31, 2007
    Publication date: July 31, 2008
    Applicant: Microsoft Corporation
    Inventors: Nicolas Bruno, Surajit Chaudhuri
  • Publication number: 20080183764
    Abstract: Online physical design tuning is constantly monitoring database indexes and can effectively react to changes in a workload by modifying the physical design as needed. Algorithms can be utilized that take into account various criteria including storage constraints, update statements, and the cost of temporarily creating physical structures.
    Type: Application
    Filed: January 31, 2007
    Publication date: July 31, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Nicolas Bruno, Surajit Chaudhuri
  • Patent number: 7406479
    Abstract: A set similarity join system and method are provided. The system can be employed to facilitate data cleaning based on similarities through the identification of “close” tuples (e.g., records and/or rows). “Closeness” can be is evaluated using a similarity function(s) chosen to suit the domain and/or application. Thus, the system facilitates generic domain-independent data cleansing. The system can be employed with a foundational primitive, the set similarity join (SSJoin) operator, which can be used as a building block to implement a broad variety of notions of similarity (e.g., edit similarity, Jaccard similarity, generalized edit similarity, hamming distance, soundex, etc.) as well as similarity based on co-occurrences. The SSJoin operator can exploit the observation that set overlap can be used effectively to support a variety of similarity functions. The SSJoin operator compares values based on “sets” associated with (or explicitly constructed for) each one of them.
    Type: Grant
    Filed: February 10, 2006
    Date of Patent: July 29, 2008
    Assignee: Microsoft Corporation
    Inventors: Kaushik Shriraghav, Surajit Chaudhuri, Venkatesh Ganti