Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ERROR TOLERANT AUTOCOMPLETION

Publication number: 20100325136

Abstract: Techniques for error-tolerant autocompletion are described. While displaying characters of an input string as they are inputted by a user, when a character is added to the input string by the user, matching strings may be selected from among a set of candidate strings by determining which of the candidate strings have a prefix whose characters match the characters of the input string within a given edit distance of the input string.

Type: Application

Filed: June 23, 2009

Publication date: December 23, 2010

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Shriraghav Kaushik
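
The matching rule in the abstract above (a candidate qualifies when some prefix of it lies within a given edit distance of the input string) can be illustrated with a minimal sketch. This is not the method claimed in the application: the function names and the `max_edits` parameter are hypothetical, and the sketch recomputes the distance from scratch for each input rather than updating it as each character is typed.

```python
def prefix_edit_distance(query: str, candidate: str) -> int:
    """Smallest edit distance between `query` and any prefix of `candidate`."""
    # prev[j] holds the edit distance between query[:i] and candidate[:j].
    prev = list(range(len(candidate) + 1))
    for i, qc in enumerate(query, start=1):
        curr = [i]
        for j, cc in enumerate(candidate, start=1):
            cost = 0 if qc == cc else 1
            curr.append(min(prev[j] + 1,          # delete a query character
                            curr[j - 1] + 1,      # insert a candidate character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    # Taking the minimum over all columns selects the best candidate prefix.
    return min(prev)


def autocomplete(query, candidates, max_edits=1):
    """Hypothetical helper: keep candidates with a prefix within `max_edits`."""
    return [c for c in candidates if prefix_edit_distance(query, c) <= max_edits]


print(autocomplete("micr", ["microsoft", "macro", "oracle"]))
```

With one allowed edit, "micr" keeps both "microsoft" (exact prefix) and "macro" (prefix "macr" differs by one substitution) and drops "oracle".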
INTERACTIVE PHYSICAL DESIGN TUNING

Publication number: 20100318543

Abstract: An architecture for providing interactive sessions for physical database design is described, allowing users to readily try different options, identify problems, and obtain physical designs in a flexible way. Embodiments based on a .NET assembly and modifications to a database management system (DBMS) are also described.

Type: Application

Filed: June 15, 2009

Publication date: December 16, 2010

Applicant: MICROSOFT CORPORATION

Inventors: Surajit Chaudhuri, Nicolas Bruno
IDENTIFYING SYNONYMS OF ENTITIES USING A DOCUMENT COLLECTION

Publication number: 20100313258

Abstract: Identifying synonyms of entities using a collection of documents is disclosed herein. In some aspects, a document from a collection of documents may be analyzed to identify hit sequences that include one or more tokens (e.g., words, numbers, etc.). The hit sequences may then be used to generate discriminating token sets (DTS's) that are subsets of both the hit sequences and the entity names. The DTS's are matched with corresponding entity names, and then used to create DTS phrases by selecting adjacent text in the document that is proximate to the DTS. The DTS phrases may be analyzed to determine whether the corresponding DTS is a synonym of the entity name. In various aspects, the tokens of an associated entity name that are present in the DTS phrases are used to generate a score for the DTS. When the score at least reaches a threshold, the DTS may be designated as a synonym. A list of synonyms may be generated for each entity name.

Type: Application

Filed: June 4, 2009

Publication date: December 9, 2010

Applicant: MICROSOFT CORPORATION

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
Keyword Searching On Database Views

Publication number: 20100299367

Abstract: A keyword search is executed on a view of a database based on a Boolean keyword query. The view includes multiple text columns, and the keyword search is executed on each of the multiple text columns in the view. The output results from the keyword search on each of the text columns include tuple identifiers of one or more relevant tuples and a relevancy score for ranking the results of the keyword query.

Type: Application

Filed: May 20, 2009

Publication date: November 25, 2010

Applicant: Microsoft Corporation

Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
IDENTIFYING SYNONYMS OF ENTITIES USING WEB SEARCH

Publication number: 20100293179

Abstract: Identifying synonyms of entities using web search results is disclosed herein. In some aspects, a candidate string of tokens of an entity name is selected as a search term. The search term is transmitted by a server to a search engine, which in turn, transmits search results back to the server after performing a search. The server analyzes the search results, generates a score based on the search results, and then determines a status (synonym or not a synonym) of the candidate string based on the score. In further aspects, additional candidate strings are designated as synonyms or not synonyms based on status of the searched candidate string by using relationships of a lattice formed from all possible candidate strings of the entity name.

Type: Application

Filed: May 14, 2009

Publication date: November 18, 2010

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
AUTOMATED FILTERED INDEX RECOMMENDATIONS

Publication number: 20100262593

Abstract: The described implementations relate to filtered index recommendations. In one case a filtered index recommendation (FIR) tool is configured to recommend a final set of filtered indexes to use with a workload. The final set is selected from a first set of candidate filtered indexes and a second set of merged filtered indexes.

Type: Application

Filed: April 8, 2009

Publication date: October 14, 2010

Applicant: Microsoft Corporation

Inventors: Nicolas Bruno, Surajit Chaudhuri, Vivek R. Narasayya, Manoj A. Syamala
FLEXIBLE QUERY HINTS IN A RELATIONAL DATABASE

Publication number: 20100250518

Abstract: A flexible query hints system and method for discovering and expressing query hints in a database management system. Embodiments of the flexible query hints system and method include a power hints (Phints) language that enables the specification of constraints to influence a query optimizer. Phints expressions are defined as tree patterns annotated with constraints. Embodiments of the flexible query hints system and method also include techniques to incorporate the power hints language expressions into an extended query optimizer. These techniques include computing a directed acyclic graph for a Phints expression, deriving candidate matches using the Phints expression and the graph, computing candidate matches, and extracting a revised execution plan having a lowest cost and satisfying constraints of the Phints expression. Embodiments of the flexible query hints system and method include a flexible query hint user interface that allows users to interactively adjust query hints.

Type: Application

Filed: March 28, 2009

Publication date: September 30, 2010

Applicant: Microsoft Corporation

Inventors: Nicolas Bruno, Ravishankar Ramamurthy, Surajit Chaudhuri
TECHNIQUES FOR EXACT CARDINALITY QUERY OPTIMIZATION

Publication number: 20100235347

Abstract: An exact cardinality query optimization system and method for optimizing a query having a plurality of expressions to obtain a cardinality-optimal query execution plan for the query. Embodiments of the system and method use various techniques to shorten the time necessary to obtain the cardinality-optimal query execution plan, which contains the query execution plan when all cardinalities are exact. Embodiments of the system and method include a covering queries technique that leverages query execution feedback to obtain an unordered subset of relevant expressions for the query, an early termination technique that bounds the cardinality to determine whether the processing can be terminated before each of the expressions is executed, and an expressions ordering technique that finds an ordering of expressions that yields the greatest reduction in time to obtain the cardinality-optimal query execution plan.

Type: Application

Filed: March 14, 2009

Publication date: September 16, 2010

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy
Incremental repair of query plans

Patent number: 7739269

Abstract: Database systems use a plan cache to avoid the overheads (e.g., time, money) of query recompilation. Query plans can become invalidated by updates to the statistics on data or changes to the physical database design. Once a plan is invalidated, it can be repaired utilizing one or more of the disclosed embodiments. Incremental repair of query plans includes reusing parts of the current plan rather than discarding the plan entirely when it is invalidated. Repair to an existing query plan is attempted before resorting to full recompilation.

Type: Grant

Filed: January 19, 2007

Date of Patent: June 15, 2010

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Ravishankar Ramamurthy
Visual and multi-dimensional search

Patent number: 7739221

Abstract: A system that can analyze a multi-dimensional input, thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image, thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items, thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.

Type: Grant

Filed: June 28, 2006

Date of Patent: June 15, 2010

Assignee: Microsoft Corporation

Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
Robust cardinality and cost estimation for skyline operator

Patent number: 7707207

Abstract: The claimed subject matter relates to incorporating a skyline operator within a relational database engine, and more particularly to a database engine that utilizes novel techniques to determine the lowest cost of generating the skyline produced by the skyline operator. The database engine receives queries and associated preferences and, based on a cardinality estimate and a cost estimate, an appropriate skyline generating technique is utilized to produce a skyline representative of the received queries and their associated preferences.

Type: Grant

Filed: February 17, 2006

Date of Patent: April 27, 2010

Assignee: Microsoft Corporation

Inventors: Kaushik Shriraghav, Surajit Chaudhuri, Nilesh N. Dalvi
Database physical design refinement using a merge-reduce approach

Patent number: 7685145

Abstract: Various embodiments are disclosed relating to database configuration refinement. In an example embodiment, a method is provided that may include determining a size limitation for a database configuration, determining a workload of the database configuration, and making a determination that a size of the database configuration is greater than a size limit. The method may also include applying either a merge process or a reduction process to decrease the size of the database configuration. The merge process may merge a first index/view with a second index/view to produce a merged index/view, for example. The reduction process may delete a first portion of a first view to produce a reduced view.

Type: Grant

Filed: March 28, 2006

Date of Patent: March 23, 2010

Assignee: Microsoft Corporation

Inventors: Nicolas Bruno, Surajit Chaudhuri
Detecting duplicate records in databases

Patent number: 7685090

Abstract: The invention concerns a detection of duplicate tuples in a database. Previous domain independent detection of duplicated tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key—foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high quality, scalable duplicate detection process.

Type: Grant

Filed: July 14, 2005

Date of Patent: March 23, 2010

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna
Constrained Physical Design Tuning

Publication number: 20100042963

Abstract: Described is a constraint language and related technology by which complex constraints may be used in selecting configurations for use in physical database design tuning. The complex constraint (or constraints) is processed, e.g., in a search framework, to determine and output at least one configuration that meets the constraint, e.g., a best configuration found before a stopping condition is met. The search framework processes a current configuration into candidate configurations, including by searching for candidate configurations from a current configuration based upon a complex constraint, iteratively evaluating a search space until a stopping condition is satisfied, using transformation rules to generate new candidate configurations, and selecting a best candidate configuration. Transformation rules and pruning rules are applied to efficiently perform the search. Constraints may be specified as assertions that need to be satisfied, or as soft assertions that come close to satisfying the constraint.

Type: Application

Filed: August 13, 2008

Publication date: February 18, 2010

Applicant: Microsoft Corporation

Inventors: Nicolas Bruno, Surajit Chaudhuri
QUERY-DRIVEN WEB PORTALS

Publication number: 20090327223

Abstract: The described implementations relate to query portals. One technique analyzes search results generated by a web search engine responsive to a user search query. The technique also dynamically generates a query portal that lists the search results as well as entities identified from the search results.

Type: Application

Filed: June 26, 2008

Publication date: December 31, 2009

Applicant: MICROSOFT CORPORATION

Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin, Sanjay Agrawal, Arnd Christian Konig
Scalable lookup-driven entity extraction from indexed document collections

Publication number: 20090319500

Abstract: A set of documents is filtered for entity extraction. A list of entity strings is received. A set of token sets that covers the entity strings in the list is determined. An inverted index generated on a first set of documents is queried using the set of token sets to determine a set of document identifiers for a subset of the documents in the first set. A second set of documents identified by the set of document identifiers is retrieved from the first set of documents. The second set of documents is filtered to include one or more documents of the second set that each includes a match with at least one entity string of the list of entity strings. Entity recognition may be performed on the filtered second set of documents.

Type: Application

Filed: June 24, 2008

Publication date: December 24, 2009

Applicant: Microsoft Corporation

Inventors: Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
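
The filtering pipeline in the abstract above (query an inverted index with token sets, fetch the candidate documents, then keep only those that really contain an entity string) might look roughly like the sketch below. All names are hypothetical, and one simplifying assumption is made: each entity string's full set of tokens serves as its covering token set, whereas the application describes determining a set of token sets that covers the whole list.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def candidate_doc_ids(index, token_sets):
    """Union, over token sets, of the documents containing every token in the set."""
    ids = set()
    for tokens in token_sets:
        postings = [index.get(t, set()) for t in tokens]
        if postings:
            ids |= set.intersection(*postings)
    return ids


def filter_documents(docs, entities):
    index = build_inverted_index(docs)
    # Assumption: one token set per entity, made of all of its tokens.
    token_sets = [frozenset(e.lower().split()) for e in entities]
    ids = candidate_doc_ids(index, token_sets)
    # Second pass: keep only candidates with an actual entity string match.
    return {d: docs[d] for d in sorted(ids)
            if any(e.lower() in docs[d].lower() for e in entities)}


docs = {
    1: "Microsoft SQL Server 2008 ships",
    2: "the server runs SQL queries",
    3: "nothing relevant here",
}
print(filter_documents(docs, ["SQL Server"]))
```

Document 2 passes the index lookup because it contains both tokens, but the second pass drops it because the tokens do not form the entity string. Only the surviving documents would go on to entity recognition.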
Query selectivity estimation with confidence interval

Patent number: 7636707

Abstract: Selectivity estimates are produced that meet a desired confidence threshold. To determine the confidence level of a given selectivity estimate for a query expression, the query expression is evaluated on a sample of tuples. A probability density function is derived based on the number of tuples in the sample that satisfy the query expression. The cumulative distribution for the probability density function is solved for the given threshold to determine a selectivity estimate at the given confidence value.

Type: Grant

Filed: April 6, 2004

Date of Patent: December 22, 2009

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Brian Frederick Babcock
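
As a rough illustration of the abstract above, the sketch below derives a density from the number of sample tuples that satisfy a predicate and then solves its cumulative distribution for a confidence threshold. The abstract does not name the density. The Beta distribution with a uniform prior used here is an assumption, as are the function names and the simple midpoint-rule integration.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def selectivity_at_confidence(matches, sample_size, confidence, steps=10000):
    """Smallest grid point s such that P(selectivity <= s) >= confidence.

    Assumption: a uniform prior, so the density is Beta(matches + 1, misses + 1).
    """
    a, b = matches + 1, sample_size - matches + 1
    cumulative = 0.0
    for i in range(steps):
        midpoint = (i + 0.5) / steps
        cumulative += beta_pdf(midpoint, a, b) / steps  # midpoint-rule integration
        if cumulative >= confidence:
            return (i + 1) / steps
    return 1.0


# 10 of 100 sampled tuples satisfy the predicate.
print(selectivity_at_confidence(10, 100, 0.50))
print(selectivity_at_confidence(10, 100, 0.95))
```

Raising the confidence value yields a larger, more conservative selectivity estimate from the same sample.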
DETECTING ESTIMATION ERRORS IN DISTINCT PAGE COUNTS

Publication number: 20090254522

Abstract: A database server may be configured to compute distinct page counts of pages accessed to execute operands of respective queries. The queries may be executed against a table comprised of the pages and having an index managed by the database server. The distinct page counts may be obtained by counting, as a part of the executing of the queries, distinct pages accessed during the execution of the queries.

Type: Application

Filed: April 4, 2008

Publication date: October 8, 2009

Applicant: MICROSOFT CORPORATION

Inventors: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy
TRANSFORMATION-BASED FRAMEWORK FOR RECORD MATCHING

Publication number: 20090210418

Abstract: A transformation-based record matching technique. The technique provides a flexible way to account for synonyms and more general forms of string equivalences when performing record matching by taking as explicit input user-defined transformation rules (such as, for example, the fact that “Robert” and “Bob” are synonymous). The input string and user-defined transformation rules are used to generate a larger set of strings which are used when performing record matching. Both the input string and data elements in a database can be transformed using the user-defined transformation rules in order to generate a larger set of potential record matches. These potential record matches can then be subjected to a threshold test in order to determine one or more best matches. Additionally, signature-based similarity functions are used to improve the computational efficiency of the technique.

Type: Application

Filed: February 15, 2008

Publication date: August 20, 2009

Applicant: MICROSOFT CORPORATION

Inventors: Arvind Arasu, Surajit Chaudhuri, Shriraghav Kaushik
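
The idea in the abstract above (expand both the input string and the stored records with user-defined transformation rules, then apply a threshold test) can be sketched as follows. The rule format, the function names, and the choice of token-level Jaccard similarity are all assumptions made for illustration. The signature-based similarity functions the abstract mentions for efficiency are not modelled.

```python
from itertools import product


def variants(record, rules):
    """All strings obtained by optionally rewriting each token with a rule."""
    options = [[tok] + rules.get(tok, []) for tok in record.lower().split()]
    return {" ".join(choice) for choice in product(*options)}


def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed similarity function)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def best_similarity(left, right, rules):
    """Highest similarity over every pair of transformed variants."""
    return max(jaccard(x, y)
               for x in variants(left, rules)
               for y in variants(right, rules))


def match(query, records, rules, threshold=0.8):
    """Threshold test: keep records whose best variant pair is similar enough."""
    return [r for r in records if best_similarity(query, r, rules) >= threshold]


rules = {"bob": ["robert"]}  # hypothetical user-defined transformation rule
print(match("Bob Smith", ["Robert Smith", "Alice Smith"], rules))
```

Without the rule, "Bob Smith" and "Robert Smith" share only one of three distinct tokens. With it, one variant of the query matches the record exactly and passes the threshold.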
Sampling for queries

Patent number: 7577638

Abstract: An outlier index for a database and a given workload is generated by identifying sub-relations of tuples in the database induced by selection and group by conditions in queries in the workload. A variance is then generated for values in each sub-relation. Sub-relations having higher variances are selected, and outliers from such sub-relations having higher variances are generated.

Type: Grant

Filed: December 7, 2005

Date of Patent: August 18, 2009

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Rajeev Motwani, Mayur D. Datar
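
A minimal sketch of the outlier-index construction in the abstract above: group tuples into sub-relations, compute a variance for each, pick the sub-relations with the highest variance, and extract outliers from them. The grouping by a single column, the function and parameter names, and the definition of an outlier as the value farthest from the group mean are assumptions. The abstract does not specify them.

```python
from collections import defaultdict
from statistics import mean, pvariance


def outlier_index(rows, group_key, value_key, top_groups=1, outliers_per_group=1):
    # Sub-relations induced by a group-by condition on `group_key`.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    # Variance of the aggregated column within each sub-relation.
    variances = {g: pvariance([r[value_key] for r in rs]) for g, rs in groups.items()}
    chosen = sorted(variances, key=variances.get, reverse=True)[:top_groups]

    index = {}
    for g in chosen:
        centre = mean(r[value_key] for r in groups[g])
        # Assumed outlier rule: rows whose value lies farthest from the mean.
        ranked = sorted(groups[g], key=lambda r: abs(r[value_key] - centre), reverse=True)
        index[g] = ranked[:outliers_per_group]
    return index


rows = [
    {"region": "east", "sales": 10}, {"region": "east", "sales": 11},
    {"region": "east", "sales": 12}, {"region": "west", "sales": 10},
    {"region": "west", "sales": 12}, {"region": "west", "sales": 500},
]
print(outlier_index(rows, "region", "sales"))
```

Here the "west" sub-relation has by far the larger variance, so its extreme row (sales of 500) is the one stored in the index.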
