Patents by Inventor Surajit Chaudhuri
Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8214402
Abstract: An architecture for providing interactive sessions for physical database design is described, allowing users to readily try different options, identify problems, and obtain physical designs in a flexible way. Embodiments based on a .NET assembly and modifications to a database management system (DBMS) are also described.
Type: Grant
Filed: June 15, 2009
Date of Patent: July 3, 2012
Assignee: Microsoft Corporation
Inventors: Nicolas Bruno, Surajit Chaudhuri
-
Patent number: 8204866
Abstract: A deduplication algorithm that provides improved accuracy in data deduplication by using aggregate and/or groupwise constraints. Deduplication is accomplished using only as many of these constraints as are satisfied, rather than imposing them inflexibly as hard constraints. Additionally, textual similarity between tuples is leveraged to restrict the search space. The algorithm begins with a coarse initial partition of data records and continues by raising the similarity threshold until the threshold splits a given partition. This sequence of splits defines a rich space of alternatives. Over this space, an algorithm finds a partition of the input that maximizes constraint satisfaction. In the context of groupwise aggregation constraints for deduplication, all SQL (structured query language) aggregates are allowed, including summation.
Type: Grant
Filed: May 18, 2007
Date of Patent: June 19, 2012
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Venkatesh Ganti, Shriraghav Kaushik, Anish Das Sarma
-
Patent number: 8195655
Abstract: Architecture for finding related entities for web search queries. An extraction component takes a document as input and outputs all the mentions (or occurrences) of named entities such as names of people, organizations, locations, and products in the document, as well as entity metadata. An indexing component takes a document identifier (docID) and the set of mentions of named entities, and stores and indexes the information for retrieval. A document-based search component takes a keyword query and returns the docIDs of the top documents matching the query. A retrieval component takes a docID as input, accesses the information stored by the indexing component, and returns the set of mentions of named entities in the document. This information is then passed to an entity scoring and thresholding component that computes an aggregate score of each entity and selects the entities to return to the user.
Type: Grant
Filed: June 5, 2007
Date of Patent: June 5, 2012
Assignee: Microsoft Corporation
Inventors: Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
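
The entity scoring and thresholding step described in the abstract of patent 8195655 might be sketched as follows. The abstract does not say how the aggregate score is computed, so summing the relevance of each document that mentions an entity is an assumption made here for illustration; the function name `related_entities` and the sample data are hypothetical.

```python
from collections import defaultdict


def related_entities(top_docs, mentions_by_doc, threshold):
    """Aggregate a score per entity over the top documents, then threshold.

    top_docs:        list of (doc_id, relevance) pairs from the document search.
    mentions_by_doc: dict mapping doc_id to the entity mentions stored for it.
    threshold:       minimum aggregate score an entity needs to be returned.
    """
    scores = defaultdict(float)
    for doc_id, relevance in top_docs:
        for entity in mentions_by_doc.get(doc_id, ()):
            # Assumed aggregate: sum of relevances of mentioning documents.
            scores[entity] += relevance
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(entity, score) for entity, score in ranked if score >= threshold]


# Hypothetical sample data.
docs = [("d1", 1.0), ("d2", 0.5), ("d3", 0.25)]
mentions = {"d1": ["Seattle", "Microsoft"], "d2": ["Microsoft"], "d3": ["Redmond"]}
print(related_entities(docs, mentions, threshold=0.5))
```

In this sample, "Redmond" is dropped because its only mention comes from a low-relevance document.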
-
Patent number: 8190595
Abstract: A flexible query hints system and method for discovering and expressing query hints in a database management system. Embodiments of the flexible query hints system and method include a power hints (Phints) language that enables the specification of constraints to influence a query optimizer. Phints expressions are defined as tree patterns annotated with constraints. Embodiments of the flexible query hints system and method also include techniques to incorporate the power hints language expressions into an extended query optimizer. These techniques include computing a directed acyclic graph for a Phints expression, deriving candidate matches using the Phints expression and the graph, computing candidate matches, and extracting a revised execution plan having a lowest cost and satisfying constraints of the Phints expression. Embodiments of the flexible query hints system and method include a flexible query hint user interface that allows users to interactively adjust query hints.
Type: Grant
Filed: March 28, 2009
Date of Patent: May 29, 2012
Assignee: Microsoft Corporation
Inventors: Nicolas Bruno, Ravishankar Ramamurthy, Surajit Chaudhuri
-
Patent number: 8185519
Abstract: An exact cardinality query optimization system and method for optimizing a query having a plurality of expressions to obtain a cardinality-optimal query execution plan for the query. Embodiments of the system and method use various techniques to shorten the time necessary to obtain the cardinality-optimal query execution plan, which contains the query execution plan when all cardinalities are exact. Embodiments of the system and method include a covering queries technique that leverages query execution feedback to obtain an unordered subset of relevant expressions for the query, an early termination technique that bounds the cardinality to determine whether the processing can be terminated before each of the expressions is executed, and an expressions ordering technique that finds an ordering of expressions that yields the greatest reduction in time to obtain the cardinality-optimal query execution plan.
Type: Grant
Filed: March 14, 2009
Date of Patent: May 22, 2012
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy
-
Patent number: 8150790
Abstract: A lightweight physical design alerter can analyze a workload and determine whether a comprehensive tuning session would result in a configuration improvement over the current configuration. The alerter provides a low-overhead procedure that can run during normal operation of a database management system and produce a notification if a current configuration is less than optimal. The alerter can report lower and upper bounds on the improvements that could be obtained if a comprehensive tuning tool is launched. A lower bound can be justified by generating feasible configurations. The disclosed embodiments can be extended to query updates, materialized views, and other physical design features (e.g., partitioning).
Type: Grant
Filed: January 31, 2007
Date of Patent: April 3, 2012
Assignee: Microsoft Corporation
Inventors: Nicolas Bruno, Surajit Chaudhuri
-
Patent number: 8140548
Abstract: Described is a constraint language and related technology by which complex constraints may be used in selecting configurations for use in physical database design tuning. The complex constraint (or constraints) is processed, e.g., in a search framework, to determine and output at least one configuration that meets the constraint, e.g., a best configuration found before a stopping condition is met. The search framework processes a current configuration into candidate configurations, including by searching for candidate configurations from a current configuration based upon a complex constraint, iteratively evaluating a search space until a stopping condition is satisfied, using transformation rules to generate new candidate configurations, and selecting a best candidate configuration. Transformation rules and pruning rules are applied to efficiently perform the search. Constraints may be specified as assertions that need to be satisfied, or as soft assertions that come close to satisfying the constraint.
Type: Grant
Filed: August 13, 2008
Date of Patent: March 20, 2012
Assignee: Microsoft Corporation
Inventors: Nicolas Bruno, Surajit Chaudhuri
-
Publication number: 20110320446
Abstract: This patent application relates to interval-based information retrieval (IR) search techniques for efficiently and correctly answering keyword search queries. In some embodiments, a range of information-containing blocks for a search query can be identified. Each of these blocks, and thus the range, can include document identifiers that identify individual corresponding documents that contain a term found in the search query. From the range, a subrange(s) having a smaller number of blocks than the range can be selected. This can be accomplished without decompressing the blocks by partitioning the range into intervals and evaluating the intervals. The smaller number of blocks in the subrange(s) can then be decompressed and processed to identify a doc ID(s) and thus document(s) that satisfies the query.
Type: Application
Filed: June 25, 2010
Publication date: December 29, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
-
Publication number: 20110313999
Abstract: A relational database server may concurrently execute many relational queries, but a complex relational query may cause performance delays in the fulfillment of other relational queries. Instead, the relational database server may generate a query plan for the relational query, and may endeavor to partition the relational query between a spool operator and a scan operator into two or more query slices, where each query slice may be executed within a query slice threshold. Many alternative candidate query plans may be considered, such as inserting spool and scan operators after various operators and parameterizing operators in order to partition the records of a relation into two or more ranges based on an attribute of the relation. A large search space of candidate query plans may be reviewed in order to select a query plan that respects the query slice threshold while efficiently executing the logic of the relational query.
Type: Application
Filed: June 17, 2010
Publication date: December 22, 2011
Applicant: Microsoft Corporation
Inventors: Nicolas Bruno, Ravishankar Ramamurthy, Surajit Chaudhuri, Vivek Ravindranath Narasayya
-
Publication number: 20110314000
Abstract: Technology is described for transformation rule profiling for a query optimizer. The method can include obtaining a database query configured to be optimized by the query optimizer of a database system. An optimized query plan for the database query can be found using a host set of transformation rules. One transformation rule can be removed and checked at a time. Each transformation rule can be checked to determine whether the transformation rule affects an optimal query plan output. A test query plan can be generated after each transformation rule has been removed. The query optimizer can determine whether the test query plan is different than the optimized query plan in the absence of the removed transformation rule. An equivalent set of transformation rules can be created that includes transformation rules where the test query plan generated from the equivalent set of transformation rules is equivalent to the optimized plan.
Type: Application
Filed: June 18, 2010
Publication date: December 22, 2011
Applicant: Microsoft Corporation
Inventors: Surajit Chaudhuri, Leo Giakoumakis, Vivek Narasayya, Ravi Ramamurthy
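
The remove-one-rule-at-a-time check described in the abstract of application 20110314000 can be illustrated with a minimal sketch. Everything here is hypothetical: `toy_optimize` stands in for a real query optimizer, the rule names are invented, and plans are compared as plain strings. The sketch covers only the leave-one-out test plus a final equivalence check, not the full method of the application.

```python
def rules_affecting_plan(optimize, query, rules):
    """Return the rules whose individual removal changes the chosen plan."""
    baseline = optimize(query, frozenset(rules))
    affecting = set()
    for rule in sorted(rules):
        # Re-optimize with exactly one rule removed and compare plans.
        test_plan = optimize(query, frozenset(rules - {rule}))
        if test_plan != baseline:
            affecting.add(rule)
    return affecting


def is_equivalent_rule_set(optimize, query, rules, subset):
    """Check that the reduced rule set still yields the baseline plan."""
    return optimize(query, frozenset(subset)) == optimize(query, frozenset(rules))


def toy_optimize(query, rules):
    """Hypothetical optimizer: each enabled rule unlocks one candidate plan."""
    candidates = {"scan": 100}
    if "use_index" in rules:
        candidates["index_seek"] = 10
    if "join_commute" in rules:
        candidates["commuted_scan"] = 100
    if "push_filter" in rules:
        candidates["filtered_scan"] = 40
    # Cheapest plan wins; ties are broken by plan name.
    return min(candidates.items(), key=lambda kv: (kv[1], kv[0]))[0]


all_rules = {"use_index", "join_commute", "push_filter"}
needed = rules_affecting_plan(toy_optimize, "SELECT ...", all_rules)
print(needed, is_equivalent_rule_set(toy_optimize, "SELECT ...", all_rules, needed))
```

Removing one rule at a time can miss rules that only matter in combination, which is why the sketch verifies the reduced set separately rather than assuming it is equivalent.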
-
Patent number: 8046339
Abstract: Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (or matching) and negative (non-matching) examples to search through this space and suggest an initial record matching query. The record matching task is modeled as that of designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs are executable efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations, in that any pair of matching records has a higher similarity value than non-matching record pairs on at least one similarity function.
Type: Grant
Filed: June 5, 2007
Date of Patent: October 25, 2011
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Bee Chung Chen, Venkatesh Ganti, Shriraghav Kaushik
-
Patent number: 8032546
Abstract: A transformation-based record matching technique. The technique provides a flexible way to account for synonyms and more general forms of string equivalences when performing record matching by taking as explicit input user-defined transformation rules (such as, for example, the fact that “Robert” and “Bob” are synonymous). The input string and user-defined transformation rules are used to generate a larger set of strings which are used when performing record matching. Both the input string and data elements in a database can be transformed using the user-defined transformation rules in order to generate a larger set of potential record matches. These potential record matches can then be subjected to a threshold test in order to determine one or more best matches. Additionally, signature-based similarity functions are used to improve the computational efficiency of the technique.
Type: Grant
Filed: February 15, 2008
Date of Patent: October 4, 2011
Assignee: Microsoft Corp.
Inventors: Arvind Arasu, Surajit Chaudhuri
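
As a rough illustration of the idea in the abstract of patent 8032546, the sketch below expands two strings with user-defined token rules and applies a threshold test to the best-scoring pair of variants. The choice of Jaccard similarity over token sets, the token-level rule format, and the function names are assumptions for illustration; the signature-based similarity functions the abstract mentions for efficiency are not modeled.

```python
def expand(text, rules):
    """Generate all variants of `text` reachable by token-level rules.

    rules: iterable of (source_token, replacement_token) pairs, lowercase.
    """
    start = tuple(text.lower().split())
    variants = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for i, token in enumerate(current):
            for source, replacement in rules:
                if token == source:
                    variant = current[:i] + (replacement,) + current[i + 1:]
                    if variant not in variants:
                        variants.add(variant)
                        frontier.append(variant)
    return variants


def jaccard(a, b):
    """Jaccard similarity of two token tuples, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def records_match(left, right, rules, threshold):
    """True if any pair of expanded variants meets the similarity threshold."""
    best = max(jaccard(a, b)
               for a in expand(left, rules)
               for b in expand(right, rules))
    return best >= threshold


# Hypothetical synonym rules, listed in both directions.
synonyms = [("bob", "robert"), ("robert", "bob")]
print(records_match("Bob Smith", "Robert Smith", synonyms, threshold=0.9))
```

Expanding every variant pair is quadratic in the number of variants, so this form is only practical for short strings and small rule sets.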
-
Publication number: 20110214080
Abstract: This patent application relates to taxonomy editing. One implementation involves a taxonomy editor configured to generate a visual representation of a taxonomy associated with a set of scientific papers. The taxonomy editor includes a properties module configured to identify properties relating to an individual node of the taxonomy and a statistics module configured to determine trends relating to the individual node. The taxonomy editor further includes a similarity module configured to evaluate keyword similarity relative to individual scientific papers associated with the individual node. The taxonomy editor also includes a suggestion module configured to utilize the properties, the trends and the keyword similarity to identify potential modifications to the taxonomy. The taxonomy editor is further configured to present at least some of the potential modifications, the properties, the trends, and the keyword similarity concurrently with the visual representation of the taxonomy.
Type: Application
Filed: February 26, 2010
Publication date: September 1, 2011
Applicant: Microsoft Corporation
Inventors: Sanjay Agrawal, Surajit Chaudhuri, Venkatesh Ganti, Yuri Siradeghyan
-
Publication number: 20110208748
Abstract: This patent application relates to foreign-key detection. One implementation obtains a set of data tables. This implementation automatically determines foreign-key relationships of columns from separate tables of the set.
Type: Application
Filed: February 21, 2010
Publication date: August 25, 2011
Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Zhimin Chen
-
Publication number: 20110167053
Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
Type: Application
Filed: March 15, 2011
Publication date: July 7, 2011
Applicant: Microsoft Corporation
Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
-
Patent number: 7958114
Abstract: A database server may be configured to compute distinct page counts of pages accessed to execute operands of respective queries. The queries may be executed against a table comprised of the pages and having an index managed by the database server. The distinct page counts may be obtained by counting, as a part of the executing of the queries, distinct pages accessed during the execution of the queries.
Type: Grant
Filed: April 4, 2008
Date of Patent: June 7, 2011
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy
-
Patent number: 7930301
Abstract: A search of an index database or another search method is conducted to identify preliminary results listing one or more selected computer objects having selected identifying information stored in an index database. In addition, one or more selected computer objects of the preliminary search results are correlated with one or more other computer objects that have associations with the selected computer objects of the preliminary search results. Integrated search results are then returned and include the preliminary search results and one or more other computer objects that have associations with the selected computer objects of the preliminary search results. The associations may be determined by an association system and represent relationships between computer files based upon user or other interactions between the objects. The associations between the objects may include similarities between them and their importance.
Type: Grant
Filed: March 31, 2003
Date of Patent: April 19, 2011
Assignee: Microsoft Corporation
Inventors: Cezary Marcjan, Ryszard Kott, Surajit Chaudhuri, Lili Cheng
-
Patent number: 7917514
Abstract: A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
Type: Grant
Filed: June 28, 2006
Date of Patent: March 29, 2011
Assignee: Microsoft Corporation
Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
-
Publication number: 20110038531
Abstract: Techniques are described to leverage a set of sample or example matched pairs of strings to learn string transformation rules, which may be used to match data records that are semantically equivalent. In one embodiment, matched pairs of input strings are accessed. For a set of matched pairs, a set of one or more string transformation rules is learned. A transformation rule may include two strings determined to be semantically equivalent. The transformation rules are used to determine whether a first and second string match each other.
Type: Application
Filed: August 14, 2009
Publication date: February 17, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Arvind Arasu, Surajit Chaudhuri, Shriraghav Kaushik
-
Patent number: 7882121
Abstract: A query generation using cardinality constraints process including choosing a first set of parameters for a query, calculating an additional set of parameters based on the first set of parameters, executing the query using the additional set of parameters, evaluating the cardinality error of the additional set of parameters, and refining the additional set of parameters to meet the desired cardinality constraint. Creating a query and selecting parameters for the query to meet a desired cardinality constraint or set of cardinality constraints when the query is executed against a database may be difficult. A query generation using cardinality constraints process may create a set of parameters for a query which satisfies a desired cardinality constraint or set of cardinality constraints. An application of such a query generation using cardinality constraints process may be database component and code testing.
Type: Grant
Filed: January 27, 2006
Date of Patent: February 1, 2011
Assignee: Microsoft Corporation
Inventors: Nicolas Bruno, Surajit Chaudhuri, Dilys Thomas
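
The execute, evaluate, refine loop described in the abstract of patent 7882121 can be sketched for the simplest case: one integer parameter in a predicate whose result size grows with the parameter. The abstract does not specify how parameters are refined, so the bisection used here is an assumption, as are the table `t`, the column `v`, and the function names. The sketch uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical in-memory table t(v) holding 1..1000."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 1001)])
    return conn


def cardinality(conn, param):
    """Execute the parameterized query and return its result size."""
    row = conn.execute("SELECT COUNT(*) FROM t WHERE v <= ?", (param,)).fetchone()
    return row[0]


def find_parameter(conn, target, lo, hi, tolerance=0, max_iters=64):
    """Bisect on the parameter until the cardinality error is within tolerance.

    Returns (parameter, error) for the best parameter seen, where
    error = observed cardinality - target.
    """
    best = None
    for _ in range(max_iters):
        if lo > hi:
            break
        mid = (lo + hi) // 2
        error = cardinality(conn, mid) - target
        if best is None or abs(error) < abs(best[1]):
            best = (mid, error)
        if abs(error) <= tolerance:
            break
        if error < 0:
            lo = mid + 1   # too few rows: raise the parameter
        else:
            hi = mid - 1   # too many rows: lower the parameter
    return best


conn = build_demo_db()
print(find_parameter(conn, target=250, lo=0, hi=1000))
```

Bisection only applies when cardinality is monotone in the parameter; queries with several interacting parameters would need a different refinement strategy.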