Patents by Inventor Surajit Chaudhuri

Surajit Chaudhuri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8332388
    Abstract: Technology is described for transformation rule profiling for a query optimizer. The method can include obtaining a database query configured to be optimized by the query optimizer of a database system. An optimized query plan for the database query can be found using a host set of transformation rules. One transformation rule can be removed and checked at a time. Each transformation rule can be checked to determine whether the transformation rule affects an optimal query plan output. A test query plan can be generated after each transformation rule has been removed. The query optimizer can determine whether the test query plan is different than the optimized query plan in the absence of the removed transformation rule. An equivalent set of transformation rules can be created that includes transformation rules where the test query plan generated from the equivalent set of transformation rules is equivalent to the optimized plan.
    Type: Grant
    Filed: June 18, 2010
    Date of Patent: December 11, 2012
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Leo Giakoumakis, Vivek Narasayya, Ravi Ramamurthy
  • Patent number: 8307343
    Abstract: Infrastructure for capturing and correlating application context and database context for tuning, profiling and debugging tasks. The application context can include events such as data access events, and the database context can include events such as database server events. The events can be obtained from server tracing, data access layer tracing, and/or application tracing and written into respective log files. A data access event can indicate that an application consumed a row from a result set returned from a DBMS query. A post-processing step can correlate the application and database contexts by tokenizing strings and computing intersections between the tokenized strings. A tool inside a development environment may also suggest a query hint for the database or a data access API for the application based on the correlated context.
    Type: Grant
    Filed: October 19, 2007
    Date of Patent: November 6, 2012
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Manoj A. Symala
  • Patent number: 8249336
    Abstract: Techniques are described to leverage a set of sample or example matched pairs of strings to learn string transformation rules, which may be used to match data records that are semantically equivalent. In one embodiment, matched pairs of input strings are accessed. For a set of matched pairs, a set of one or more string transformation rules is learned. A transformation rule may include two strings determined to be semantically equivalent. The transformation rules are used to determine whether a first and second string match each other.
    Type: Grant
    Filed: August 14, 2009
    Date of Patent: August 21, 2012
    Assignee: Microsoft Corporation
    Inventors: Arvind Arasu, Surajit Chaudhuri, Shriraghav Kaushik
  • Publication number: 20120173500
    Abstract: A location associated with a user of a computing device and a prefix portion of an input string may be received as one or more successive characters of the input string are provided by the user via the computing device. A list of suggested items may be obtained based on a function of respective recommendation indicators and proximities of the items to the location in response to receiving the prefix portion, and based on partially traversing a character string search structure having a plurality of non-terminal nodes augmented with bound indicators associated with spatial regions. The list of suggested items and descriptive information associated with each suggested item may be returned to the user, in response to receiving the prefix portion, for rendering an image illustrating indicators associated with the list in a manner relative to the location, as the user provides each successive character of the input string.
    Type: Application
    Filed: December 29, 2010
    Publication date: July 5, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri
  • Patent number: 8214402
    Abstract: An architecture for providing interactive sessions for physical database design is described, allowing users to readily try different options, identify problems, and obtain physical designs in a flexible way. Embodiments based on a .NET assembly and modifications to a database management system (DBMS) are also described.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: July 3, 2012
    Assignee: Microsoft Corporation
    Inventors: Nicolas Bruno, Surajit Chaudhuri
  • Patent number: 8204866
    Abstract: A deduplication algorithm that provides improved accuracy in data deduplication by using aggregate and/or groupwise constraints. Deduplication is accomplished by satisfying as many of these constraints as possible rather than imposing them inflexibly as hard constraints. Additionally, textual similarity between tuples is leveraged to restrict the search space. The algorithm begins with a coarse initial partition of data records and continues by raising the similarity threshold until the threshold splits a given partition. This sequence of splits defines a rich space of alternatives. Over this space, an algorithm finds a partition of the input that maximizes constraint satisfaction. In the context of groupwise aggregation constraints for deduplication, all SQL (structured query language) aggregates are allowed, including summation.
    Type: Grant
    Filed: May 18, 2007
    Date of Patent: June 19, 2012
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Shriraghav Kaushik, Anish Das Sarma
  • Patent number: 8195655
    Abstract: Architecture for finding related entities for web search queries. An extraction component takes a document as input and outputs all the mentions (or occurrences) of named entities such as names of people, organizations, locations, and products in the document, as well as entity metadata. An indexing component takes a document identifier (docID) and the set of mentions of named entities, and stores and indexes the information for retrieval. A document-based search component takes a keyword query and returns the docIDs of the top documents matching the query. A retrieval component takes a docID as input, accesses the information stored by the indexing component and returns the set of mentions of named entities in the document. This information is then passed to an entity scoring and thresholding component that computes an aggregate score for each entity and selects the entities to return to the user.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: June 5, 2012
    Assignee: Microsoft Corporation
    Inventors: Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
  • Patent number: 8190595
    Abstract: A flexible query hints system and method for discovering and expressing query hints in a database management system. Embodiments of the flexible query hints system and method include a power hints (Phints) language that enables the specification of constraints to influence a query optimizer. Phints expressions are defined as tree patterns annotated with constraints. Embodiments of the flexible query hints system and method also include techniques to incorporate the power hints language expressions into an extended query optimizer. These techniques include computing a directed acyclic graph for a Phints expression, deriving candidate matches using the Phints expression and the graph, computing candidate matches, and extracting a revised execution plan having the lowest cost and satisfying the constraints of the Phints expression. Embodiments of the flexible query hints system and method include a flexible query hint user interface that allows users to interactively adjust query hints.
    Type: Grant
    Filed: March 28, 2009
    Date of Patent: May 29, 2012
    Assignee: Microsoft Corporation
    Inventors: Nicolas Bruno, Ravishankar Ramamurthy, Surajit Chaudhuri
  • Patent number: 8185519
    Abstract: An exact cardinality query optimization system and method for optimizing a query having a plurality of expressions to obtain a cardinality-optimal query execution plan for the query. Embodiments of the system and method use various techniques to shorten the time necessary to obtain the cardinality-optimal query execution plan, which is the query execution plan obtained when all cardinalities are exact. Embodiments of the system and method include a covering queries technique that leverages query execution feedback to obtain an unordered subset of relevant expressions for the query, an early termination technique that bounds the cardinality to determine whether processing can be terminated before all of the expressions are executed, and an expression ordering technique that finds an ordering of expressions that yields the greatest reduction in time to obtain the cardinality-optimal query execution plan.
    Type: Grant
    Filed: March 14, 2009
    Date of Patent: May 22, 2012
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy
  • Patent number: 8150790
    Abstract: A lightweight physical design alerter can analyze a workload and determine whether a comprehensive tuning session would result in a configuration improvement over the current configuration. The alerter provides a low-overhead procedure that can run during normal operation of a database management system and produce a notification if a current configuration is less than optimal. The alerter can report lower and upper bounds on the improvements that could be obtained if a comprehensive tuning tool is launched. A lower bound can be justified by generating feasible configurations. The disclosed embodiments can be extended to query updates, materialized views, and other physical design features (e.g., partitioning).
    Type: Grant
    Filed: January 31, 2007
    Date of Patent: April 3, 2012
    Assignee: Microsoft Corporation
    Inventors: Nicolas Bruno, Surajit Chaudhuri
  • Patent number: 8140548
    Abstract: Described is a constraint language and related technology by which complex constraints may be used in selecting configurations for use in physical database design tuning. The complex constraint (or constraints) is processed, e.g., in a search framework, to determine and output at least one configuration that meets the constraint, e.g., a best configuration found before a stopping condition is met. The search framework processes a current configuration into candidate configurations, including by searching for candidate configurations from a current configuration based upon a complex constraint, iteratively evaluating a search space until a stopping condition is satisfied, using transformation rules to generate new candidate configurations, and selecting a best candidate configuration. Transformation rules and pruning rules are applied to efficiently perform the search. Constraints may be specified as assertions that need to be satisfied, or as soft assertions that come close to satisfying the constraint.
    Type: Grant
    Filed: August 13, 2008
    Date of Patent: March 20, 2012
    Assignee: Microsoft Corporation
    Inventors: Nicolas Bruno, Surajit Chaudhuri
  • Publication number: 20110320446
    Abstract: This patent application relates to interval-based information retrieval (IR) search techniques for efficiently and correctly answering keyword search queries. In some embodiments, a range of information-containing blocks for a search query can be identified. Each of these blocks, and thus the range, can include document identifiers that identify individual corresponding documents that contain a term found in the search query. From the range, a subrange(s) having a smaller number of blocks than the range can be selected. This can be accomplished without decompressing the blocks by partitioning the range into intervals and evaluating the intervals. The smaller number of blocks in the subrange(s) can then be decompressed and processed to identify one or more doc IDs, and thus documents, that satisfy the query.
    Type: Application
    Filed: June 25, 2010
    Publication date: December 29, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
  • Publication number: 20110313999
    Abstract: A relational database server may concurrently execute many relational queries, but a complex relational query may cause performance delays in the fulfillment of other relational queries. To avoid this, the relational database server may generate a query plan for the relational query, and may endeavor to partition the relational query between a spool operator and a scan operator into two or more query slices, where each query slice may be executed within a query slice threshold. Many alternative candidate query plans may be considered, such as inserting spool and scan operators after various operators and parameterizing operators in order to partition the records of a relation into two or more ranges based on an attribute of the relation. A large search space of candidate query plans may be reviewed in order to select a query plan that respects the query slice threshold while efficiently executing the logic of the relational query.
    Type: Application
    Filed: June 17, 2010
    Publication date: December 22, 2011
    Applicant: Microsoft Corporation
    Inventors: Nicolas Bruno, Ravishankar Ramamurthy, Surajit Chaudhuri, Vivek Ravindranath Narasayya
  • Publication number: 20110314000
    Abstract: Technology is described for transformation rule profiling for a query optimizer. The method can include obtaining a database query configured to be optimized by the query optimizer of a database system. An optimized query plan for the database query can be found using a host set of transformation rules. One transformation rule can be removed and checked at a time. Each transformation rule can be checked to determine whether the transformation rule affects an optimal query plan output. A test query plan can be generated after each transformation rule has been removed. The query optimizer can determine whether the test query plan is different than the optimized query plan in the absence of the removed transformation rule. An equivalent set of transformation rules can be created that includes transformation rules where the test query plan generated from the equivalent set of transformation rules is equivalent to the optimized plan.
    Type: Application
    Filed: June 18, 2010
    Publication date: December 22, 2011
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Leo Giakoumakis, Vivek Narasayya, Ravi Ramamurthy
  • Patent number: 8046339
    Abstract: Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (or matching) and negative (non-matching) examples to search through the space of possible record matching queries and suggest an initial record matching query. The record matching task is modeled as that of designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs can be executed efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations: any pair of matching records has a higher similarity value than non-matching record pairs on at least one similarity function.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: October 25, 2011
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Bee Chung Chen, Venkatesh Ganti, Shriraghav Kaushik
  • Patent number: 8032546
    Abstract: A transformation-based record matching technique. The technique provides a flexible way to account for synonyms and more general forms of string equivalences when performing record matching by taking as explicit input user-defined transformation rules (such as, for example, the fact that “Robert” and “Bob” are synonymous). The input string and user-defined transformation rules are used to generate a larger set of strings which are used when performing record matching. Both the input string and data elements in a database can be transformed using the user-defined transformation rules in order to generate a larger set of potential record matches. These potential record matches can then be subjected to a threshold test in order to determine one or more best matches. Additionally, signature-based similarity functions are used to improve the computational efficiency of the technique.
    Type: Grant
    Filed: February 15, 2008
    Date of Patent: October 4, 2011
    Assignee: Microsoft Corp.
    Inventors: Arvind Arasu, Surajit Chaudhuri
  • Publication number: 20110214080
    Abstract: This patent application relates to taxonomy editing. One implementation involves a taxonomy editor configured to generate a visual representation of a taxonomy associated with a set of scientific papers. The taxonomy editor includes a properties module configured to identify properties relating to an individual node of the taxonomy and a statistics module configured to determine trends relating to the individual node. The taxonomy editor further includes a similarity module configured to evaluate keyword similarity relative to individual scientific papers associated with the individual node. The taxonomy editor also includes a suggestion module configured to utilize the properties, the trends and the keyword similarity to identify potential modifications to the taxonomy. The taxonomy editor is further configured to present at least some of the potential modifications, the properties, the trends, and the keyword similarity concurrently with the visual representation of the taxonomy.
    Type: Application
    Filed: February 26, 2010
    Publication date: September 1, 2011
    Applicant: Microsoft Corporation
    Inventors: Sanjay Agrawal, Surajit Chaudhuri, Venkatesh Ganti, Yuri Siradeghyan
  • Publication number: 20110208748
    Abstract: This patent application relates to foreign-key detection. One implementation obtains a set of data tables. This implementation automatically determines foreign-key relationships of columns from separate tables of the set.
    Type: Application
    Filed: February 21, 2010
    Publication date: August 25, 2011
    Inventors: Surajit Chaudhuri, Vivek R. Narasayya, Zhimin Chen
  • Publication number: 20110167053
    Abstract: A system that can analyze a multi-dimensional input and thereafter establish a search query based upon features extracted from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image, after which a search query is established that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items, thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.
    Type: Application
    Filed: March 15, 2011
    Publication date: July 7, 2011
    Applicant: Microsoft Corporation
    Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller
  • Patent number: 7958114
    Abstract: A database server may be configured to compute distinct page counts of pages accessed to execute operands of respective queries. The queries may be executed against a table comprised of the pages and having an index managed by the database server. The distinct page counts may be obtained by counting, as a part of the executing of the queries, distinct pages accessed during the execution of the queries.
    Type: Grant
    Filed: April 4, 2008
    Date of Patent: June 7, 2011
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy
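The rule-profiling procedure summarized in the abstracts of patent 8332388 and publication 20110314000 (remove one transformation rule at a time, re-optimize, and compare the resulting plan with the optimized plan) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: `optimize`, `profile_rules`, `equivalent_rule_set`, and the four toy rules are hypothetical, and the toy "optimizer" simply applies enabled rules until no new plans appear and returns the cheapest one. The greedy pass that builds the equivalent set is one plausible reading of the abstract.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Plan = Tuple[str, int]  # (plan description, estimated cost)
Rule = Callable[[FrozenSet[Plan]], List[Plan]]

# Hypothetical toy rule set: each rule proposes alternative plans.
RULES: Dict[str, Rule] = {
    "index_seek": lambda plans: [("index_seek", 10)],
    "hash_join": lambda plans: [("hash_join", 50)],
    "merge_join": lambda plans: [("merge_join", 60)],
    # Only fires once an index_seek plan exists, so it depends on index_seek.
    "push_filter": lambda plans: [
        (name + "+filter", cost - 2) for name, cost in plans if name == "index_seek"
    ],
}


def optimize(rules: Dict[str, Rule], enabled: FrozenSet[str]) -> str:
    """Toy optimizer: apply enabled rules to a fixpoint, return the cheapest plan."""
    plans = {("table_scan", 100)}
    changed = True
    while changed:
        changed = False
        for name in enabled:
            for plan in rules[name](frozenset(plans)):
                if plan not in plans:
                    plans.add(plan)
                    changed = True
    return min(plans, key=lambda p: (p[1], p[0]))[0]


def profile_rules(rules: Dict[str, Rule]) -> Dict[str, bool]:
    """Remove one rule at a time; True means removing it changes the chosen plan."""
    full = frozenset(rules)
    baseline = optimize(rules, full)
    return {name: optimize(rules, full - {name}) != baseline for name in sorted(rules)}


def equivalent_rule_set(rules: Dict[str, Rule]) -> FrozenSet[str]:
    """Greedily drop rules whose removal leaves the optimized plan unchanged."""
    kept = set(rules)
    baseline = optimize(rules, frozenset(kept))
    for name in sorted(rules):
        if optimize(rules, frozenset(kept - {name})) == baseline:
            kept.discard(name)
    return frozenset(kept)


print(profile_rules(RULES))
# {'hash_join': False, 'index_seek': True, 'merge_join': False, 'push_filter': True}
print(sorted(equivalent_rule_set(RULES)))
# ['index_seek', 'push_filter']
```

In this toy setup, `hash_join` and `merge_join` never influence the chosen plan, so the equivalent set keeps only the two rules that together produce the cheapest plan.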
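Patent 8032546 describes expanding both the input string and the stored strings with user-defined transformation rules before applying a threshold test. A minimal sketch of that idea follows, assuming single-token rules and token-set Jaccard similarity; the `SYNONYMS` table, the helper names, and the 0.8 threshold are hypothetical, and the signature-based similarity functions the abstract mentions for efficiency are not shown. The number of variants grows multiplicatively with the number of rewritable tokens, which is one reason a practical system needs a more efficient scheme.

```python
from itertools import product
from typing import Dict, List, Set

# Hypothetical user-defined rules: token -> equivalent tokens (single-token only).
SYNONYMS: Dict[str, List[str]] = {
    "robert": ["bob"],
    "bob": ["robert"],
    "st": ["street"],
    "street": ["st"],
}


def variants(s: str, rules: Dict[str, List[str]]) -> Set[str]:
    """Every string produced by optionally rewriting each token via the rules."""
    options = [[tok] + rules.get(tok, []) for tok in s.lower().split()]
    return {" ".join(combo) for combo in product(*options)}


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def transformed_match(
    x: str, y: str, rules: Dict[str, List[str]], threshold: float = 0.8
) -> bool:
    """Match if any pair of transformed variants clears the similarity threshold."""
    best = max(jaccard(a, b) for a in variants(x, rules) for b in variants(y, rules))
    return best >= threshold


print(transformed_match("Robert Smith", "Bob Smith", SYNONYMS))  # True
print(transformed_match("Robert Smith", "Bob Smith", {}))        # False
```

Without the rules, "robert smith" and "bob smith" share only one of three distinct tokens (similarity 1/3), so the match fails; with them, both strings expand to a common variant.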
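The abstract of publication 20110208748 does not say how foreign keys are detected. A common baseline, named here plainly as an assumption and not as the application's method, is an inclusion-dependency test: a column is a candidate foreign key if its non-null values are a subset of another table's column whose values are unique and non-null. The sketch below uses hypothetical table data and helper names. On real data this test alone admits many false positives (for example, small integer columns), so a practical detector would need additional evidence to rank candidates.

```python
from typing import Dict, List, Tuple

Table = Dict[str, List[object]]  # column name -> column values


def candidate_foreign_keys(tables: Dict[str, Table]) -> List[Tuple[str, str, str, str]]:
    """Return (fk_table, fk_column, pk_table, pk_column) inclusion-dependency candidates."""
    results: List[Tuple[str, str, str, str]] = []
    for pk_table, pk_columns in tables.items():
        for pk_col, pk_values in pk_columns.items():
            # A referenced column must be unique and non-null to act as a key.
            if None in pk_values or len(set(pk_values)) != len(pk_values):
                continue
            keys = set(pk_values)
            for fk_table, fk_columns in tables.items():
                if fk_table == pk_table:
                    continue  # this sketch ignores self-references
                for fk_col, fk_values in fk_columns.items():
                    present = {v for v in fk_values if v is not None}
                    if present and present <= keys:
                        results.append((fk_table, fk_col, pk_table, pk_col))
    return sorted(results)


tables = {
    "customers": {"id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, None]},
}
print(candidate_foreign_keys(tables))
# [('orders', 'customer_id', 'customers', 'id')]
```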
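Patent 8195655 ends with an entity scoring and thresholding component that computes an aggregate score for each entity mentioned in the top documents returned for a query. The abstract does not give the scoring function, so the sketch below assumes a simple rank-weighted sum (each document containing an entity contributes 1/rank, counted once per document) followed by a score threshold and a result limit; `related_entities`, the weights, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def related_entities(
    ranked_doc_ids: List[str],
    mentions: Dict[str, List[str]],
    min_score: float = 0.5,
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Aggregate rank-weighted entity scores over the top documents, then threshold."""
    scores: Dict[str, float] = defaultdict(float)
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        # Count each entity at most once per document.
        for entity in set(mentions.get(doc_id, [])):
            scores[entity] += 1.0 / rank
    selected = [(entity, score) for entity, score in scores.items() if score >= min_score]
    # Highest score first; ties broken alphabetically for a stable result.
    selected.sort(key=lambda item: (-item[1], item[0]))
    return selected[:limit]


top_docs = ["d1", "d2", "d3"]
doc_mentions = {
    "d1": ["Microsoft", "Seattle"],
    "d2": ["Microsoft", "SQL Server"],
    "d3": ["Seattle", "Microsoft", "Microsoft"],
}
for entity, score in related_entities(top_docs, doc_mentions):
    print(f"{entity}: {score:.2f}")
```

Raising `min_score` to 0.6 would drop "SQL Server", which appears only in the second-ranked document and therefore scores 0.5.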