Patents by Inventor Venkatesh Ganti

Venkatesh Ganti has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20110320446
    Abstract: This patent application relates to interval-based information retrieval (IR) search techniques for efficiently and correctly answering keyword search queries. In some embodiments, a range of information-containing blocks for a search query can be identified. Each of these blocks, and thus the range, can include document identifiers that identify individual corresponding documents that contain a term found in the search query. From the range, one or more subranges having fewer blocks than the range can be selected. This can be accomplished without decompressing the blocks, by partitioning the range into intervals and evaluating the intervals. The smaller number of blocks in the subrange(s) can then be decompressed and processed to identify the doc ID(s), and thus the document(s), that satisfy the query.
    Type: Application
    Filed: June 25, 2010
    Publication date: December 29, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
  • Publication number: 20110314010
    Abstract: A query comprising a set of keywords may be applied to a data set having various attributes, but it may be difficult to determine the query predicates intended for each keyword (e.g., the attributes targeted by each keyword and the values of those attributes that satisfy the keyword). The meaning of a keyword of interest may be inferred from a set of query pairs, each comprising a background query (a set of keywords excluding the keyword of interest) and a foreground query (the same set of keywords plus the keyword of interest). Differences between the query results for the foreground query and the background query across many query pairs may identify a query predicate intended by the keyword, along with a confidence score. These results may be associated with the keyword in a keyword map, which is useful for translating queries into query predicates that yield relevant query results.
    Type: Application
    Filed: June 17, 2010
    Publication date: December 22, 2011
    Applicant: Microsoft Corporation
    Inventors: Venkatesh Ganti, Dong Xin, Yeye He
  • Publication number: 20110282856
    Abstract: Embodiments for identifying an entity synonym of an entity are described. A query log is stored in a database located on at least one computing device. A candidate generation module can select a candidate query in the query log that shares a click on a URL with the entity. A correlated tag module can generate a set of phrase-tag pairs for the entity and the candidate query and measure a mutual information value for each phrase-tag pair. A candidate filtering module can determine a click similarity value between the candidate query and the entity, based on a set of URLs selected in the search engine results, and a tag similarity value based on the mutual information values. A candidate query is selected as an entity synonym if the click similarity value and the tag similarity value each exceed their respective predetermined thresholds.
    Type: Application
    Filed: May 14, 2010
    Publication date: November 17, 2011
    Applicant: Microsoft Corporation
    Inventors: Venkatesh Ganti, Dong Xin
  • Patent number: 8046339
    Abstract: Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (matching) and negative (non-matching) examples to search through the space of possible record matching queries and suggest an initial record matching query. The record matching task is modeled as designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs can be executed efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations: any pair of matching records has a higher similarity value than non-matching record pairs on at least one similarity function.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: October 25, 2011
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Bee Chung Chen, Venkatesh Ganti, Shriraghav Kaushik
  • Patent number: 8037069
    Abstract: The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.
    Type: Grant
    Filed: June 3, 2008
    Date of Patent: October 11, 2011
    Assignee: Microsoft Corporation
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
  • Publication number: 20110214080
    Abstract: This patent application relates to taxonomy editing. One implementation involves a taxonomy editor configured to generate a visual representation of a taxonomy associated with a set of scientific papers. The taxonomy editor includes a properties module configured to identify properties relating to an individual node of the taxonomy and a statistics module configured to determine trends relating to the individual node. The taxonomy editor further includes a similarity module configured to evaluate keyword similarity relative to individual scientific papers associated with the individual node. The taxonomy editor also includes a suggestion module configured to utilize the properties, the trends and the keyword similarity to identify potential modifications to the taxonomy. The taxonomy editor is further configured to present at least some of the potential modifications, the properties, the trends, and the keyword similarity concurrently with the visual representation of the taxonomy.
    Type: Application
    Filed: February 26, 2010
    Publication date: September 1, 2011
    Applicant: Microsoft Corporation
    Inventors: Sanjay Agrawal, Surajit Chaudhuri, Venkatesh Ganti, Yuri Siradeghyan
  • Patent number: 7970808
    Abstract: Entities, such as people, places and things, are labeled based on information collected across a possibly large number of documents. One or more documents are scanned to recognize the entities, and features are extracted from the context in which those entities occur in the documents. Observed entity-feature pairs are stored either in an in-memory store or in an external store. A store manager optimizes use of the limited space in the in-memory store by determining which store to put an entity-feature pair in, and when to evict features from the in-memory store to make room for new pairs. Features that may be observed in an entity's context may take forms such as specific word sequences or membership in a particular list.
    Type: Grant
    Filed: May 5, 2008
    Date of Patent: June 28, 2011
    Assignee: Microsoft Corporation
    Inventors: Arnd Christian Konig, Venkatesh Ganti
  • Publication number: 20110125791
    Abstract: Techniques are described herein for classifying a search query with respect to query intent using search result tag ratios. A tag is a character or a combination of characters (e.g., one or more words) that indicates a property of a document, such as a topic of the document, a type of entity (i.e., subject matter) the document references, etc. A search result tag ratio is defined as the fraction (e.g., a proportion, a percentage, etc.) of the documents in a search result that include a respective tag. A search query may be classified based on back-off ratios, which are tag ratios of search queries that are related to the search query to be classified. Tag ratios may be pre-computed (i.e., calculated before the corresponding search queries are received from users).
    Type: Application
    Filed: November 25, 2009
    Publication date: May 26, 2011
    Applicant: Microsoft Corporation
    Inventors: Arnd Christian Konig, Venkatesh Ganti, Xiao Li
  • Patent number: 7865505
    Abstract: A machine-implemented system and method that efficiently performs exact similarity joins between collections of sets. The system obtains a collection of sets and a threshold value from an interface. Based at least in part on an identifiable similarity, such as the overlap or intersection between sets in the collection, an analysis component generates and outputs candidate pairs whose similarity equals or exceeds the threshold value.
    Type: Grant
    Filed: January 30, 2007
    Date of Patent: January 4, 2011
    Assignee: Microsoft Corporation
    Inventors: Arvind Arasu, Venkatesh Ganti, Shriraghav Kaushik
  • Publication number: 20100313258
    Abstract: Identifying synonyms of entities using a collection of documents is disclosed herein. In some aspects, a document from a collection of documents may be analyzed to identify hit sequences that include one or more tokens (e.g., words, numbers, etc.). The hit sequences may then be used to generate discriminating token sets (DTS's) that are subsets of both the hit sequences and the entity names. The DTS's are matched with corresponding entity names and then used to create DTS phrases by selecting adjacent text in the document that is proximate to the DTS. The DTS phrases may be analyzed to determine whether the corresponding DTS is a synonym of the entity name. In various aspects, the tokens of an associated entity name that are present in the DTS phrases are used to generate a score for the DTS. When the score at least reaches a threshold, the DTS may be designated as a synonym. A list of synonyms may be generated for each entity name.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
  • Publication number: 20100299367
    Abstract: A keyword search is executed on a view of a database based on a Boolean keyword query. The view includes multiple text columns, and the keyword search is executed on each of the multiple text columns in the view. The output results from the keyword search on each of the text columns include tuple identifiers of one or more relevant tuples and a relevancy score for ranking the results of the keyword query.
    Type: Application
    Filed: May 20, 2009
    Publication date: November 25, 2010
    Applicant: Microsoft Corporation
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
  • Publication number: 20100293179
    Abstract: Identifying synonyms of entities using web search results is disclosed herein. In some aspects, a candidate string of tokens of an entity name is selected as a search term. The search term is transmitted by a server to a search engine, which, in turn, transmits search results back to the server after performing a search. The server analyzes the search results, generates a score based on the search results, and then determines a status (synonym or not a synonym) of the candidate string based on the score. In further aspects, additional candidate strings are designated as synonyms or not synonyms based on the status of the searched candidate string, using relationships of a lattice formed from all possible candidate strings of the entity name.
    Type: Application
    Filed: May 14, 2009
    Publication date: November 18, 2010
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
  • Patent number: 7730060
    Abstract: The subject disclosure pertains to a class of object finder queries that return the best target objects that match a set of given keywords. Mechanisms are provided that facilitate identification of target objects related to search objects that match a set of query keywords. Scoring mechanisms/functions are also disclosed that compute relevance scores of target objects. Further, efficient early termination techniques are provided to compute the top K target objects based on a scoring function.
    Type: Grant
    Filed: June 9, 2006
    Date of Patent: June 1, 2010
    Assignee: Microsoft Corporation
    Inventors: Kaushik Chakrabarti, Venkatesh Ganti, Dong Xin
  • Patent number: 7720883
    Abstract: Architecture that provides a data profile computation technique which employs key profile computation and data pattern profile computation. Key profile computation in a data table covers both exact keys and approximate keys, and is based on key strengths. A key strength of 100% indicates an exact key, and any other percentage indicates an approximate key. The key strength is estimated based on the number of table rows that have duplicated attribute values. Only column sets whose key strength exceeds a threshold value are returned. Pattern profiling identifies a small set of regular expression patterns that best describe the patterns within a given set of attribute values. Pattern profiling includes three phases: a first phase for determining token regular expressions, a second phase for determining candidate regular expressions, and a third phase for identifying the candidate regular expressions that best match the attribute values.
    Type: Grant
    Filed: June 27, 2007
    Date of Patent: May 18, 2010
    Assignee: Microsoft Corporation
    Inventors: Zhimin Chen, Venkatesh Ganti, Gunjan Jha, Shriraghav Kaushik, Vivek Narasayya
  • Patent number: 7685090
    Abstract: The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches produce large numbers of false positives when they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a duplicate detection process is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.
    Type: Grant
    Filed: July 14, 2005
    Date of Patent: March 23, 2010
    Assignee: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna
  • Publication number: 20090327223
    Abstract: The described implementations relate to query portals. One technique analyzes search results generated by a web search engine responsive to a user search query. The technique also dynamically generates a query portal that lists the search results as well as entities identified from the search results.
    Type: Application
    Filed: June 26, 2008
    Publication date: December 31, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin, Sanjay Agrawal, Arnd Christian Konig
  • Publication number: 20090319500
    Abstract: A set of documents is filtered for entity extraction. A list of entity strings is received. A set of token sets that covers the entity strings in the list is determined. An inverted index generated on a first set of documents is queried using the set of token sets to determine a set of document identifiers for a subset of the documents in the first set. A second set of documents identified by the set of document identifiers is retrieved from the first set of documents. The second set of documents is filtered to include one or more documents of the second set that each includes a match with at least one entity string of the list of entity strings. Entity recognition may be performed on the filtered second set of documents.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 24, 2009
    Applicant: Microsoft Corporation
    Inventors: Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
  • Patent number: 7634464
    Abstract: The subject disclosure pertains to a powerful and flexible framework for record matching. The framework facilitates design of a record matching query or package composed of a set of well-defined primitive operators (e.g., relational, data cleaning, etc.), which can ultimately be executed to match records. To assist design of such packages, a learning technique based on examples is provided. More specifically, a set of matching and non-matching record pairs can be input and employed to facilitate automatic package generation. A generated package can subsequently be transformed manually and/or automatically into a semantically equivalent form optimized for execution.
    Type: Grant
    Filed: June 14, 2006
    Date of Patent: December 15, 2009
    Assignee: Microsoft Corporation
    Inventors: Bee-Chung Chen, Venkatesh Ganti, Shriraghav Kaushik
  • Publication number: 20090300014
    Abstract: The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.
    Type: Application
    Filed: June 3, 2008
    Publication date: December 3, 2009
    Applicant: Microsoft Corporation
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
  • Patent number: 7627567
    Abstract: A system for segmenting strings into component parts for use with a database management system. String records in a reference table are segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, middle, and ending token topology for that attribute. A null token accounts for an empty attribute component, and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process parses an input record into a sequence of tokens. The process then determines the most probable segmentation of the input record by comparing its tokens with the state models derived for the attributes from the reference table.
    Type: Grant
    Filed: April 14, 2004
    Date of Patent: December 1, 2009
    Assignee: Microsoft Corporation
    Inventors: Venkatesh Ganti, Vassilakis Theodore, Yevgeny Agichtein
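
The tag-ratio definition in publication 20110125791 can be illustrated with a short Python sketch. It assumes each document in a search result is represented as a set of tags. The back-off rule shown here (averaging a tag's ratio over related queries when no precomputed ratio exists for the query), the 0.5 default threshold, and the names `tag_ratios` and `classify` are hypothetical illustrations, not the method claimed in the application.

```python
from collections import Counter
from typing import Dict, List, Set


def tag_ratios(results: List[Set[str]]) -> Dict[str, float]:
    """Fraction of documents in a search result that carry each tag."""
    if not results:
        return {}
    # Each document is a set, so a tag is counted at most once per document.
    counts = Counter(tag for doc_tags in results for tag in doc_tags)
    return {tag: n / len(results) for tag, n in counts.items()}


def classify(
    query: str,
    tag: str,
    precomputed: Dict[str, Dict[str, float]],
    related: Dict[str, List[str]],
    threshold: float = 0.5,
) -> bool:
    """Decide whether `query` has the intent signaled by `tag` (hypothetical rule)."""
    if query in precomputed:
        ratio = precomputed[query].get(tag, 0.0)
    else:
        # Back-off: average the tag ratio over related queries that have precomputed ratios.
        backoff = [precomputed[q].get(tag, 0.0) for q in related.get(query, []) if q in precomputed]
        ratio = sum(backoff) / len(backoff) if backoff else 0.0
    return ratio >= threshold


results = [{"restaurant", "seattle"}, {"restaurant"}, {"hotel"}, {"restaurant", "review"}]
print(tag_ratios(results)["restaurant"])  # 0.75
```

Because the ratios depend only on the result set, they can be computed offline for frequent queries, which matches the pre-computation the abstract describes; the back-off path covers queries that were not pre-computed.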
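
For patent 7865505, a minimal exact overlap join can be sketched with an inverted index: every pair of sets that shares at least `threshold` elements is reported, so no qualifying pair is missed. This is a standard baseline technique chosen for illustration, not the patented algorithm; the function name and the assumption that `threshold` is a positive integer overlap count are ours.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Hashable, List, Tuple


def overlap_join(sets: List[FrozenSet[Hashable]], threshold: int) -> List[Tuple[int, int]]:
    """Return every index pair (i, j), i < j, whose intersection has >= threshold elements.

    Assumes threshold >= 1; pairs that share no element are never examined.
    """
    # Inverted index: element -> ids of previously processed sets containing it.
    index: DefaultDict[Hashable, List[int]] = defaultdict(list)
    pairs: List[Tuple[int, int]] = []
    for j, current in enumerate(sets):
        # Count, for each earlier set i, how many elements it shares with set j.
        overlap: DefaultDict[int, int] = defaultdict(int)
        for elem in current:
            for i in index[elem]:
                overlap[i] += 1
        pairs.extend((i, j) for i, count in sorted(overlap.items()) if count >= threshold)
        # Add set j to the index only now, so it is never paired with itself.
        for elem in current:
            index[elem].append(j)
    return pairs


sets = [frozenset("abc"), frozenset("bcd"), frozenset("xyz"), frozenset("abcd")]
print(overlap_join(sets, 2))  # [(0, 1), (0, 3), (1, 3)]
```

This baseline scans full posting lists and can approach quadratic cost on skewed data; practical exact-join algorithms add filtering (for example, prefix filtering) to prune pairs before counting, which this sketch omits.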
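
The key-strength idea in patent 7720883 can be sketched as follows, assuming key strength is the fraction of rows whose value combination on a column set is not duplicated in any other row, so that an exact key scores 1.0. The exact estimator, the enumeration of column sets up to `max_size` columns, and the function names are assumptions made for illustration; a practical profiler would also prune supersets of keys it has already found.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]


def key_strength(rows: Sequence[Row], cols: Tuple[str, ...]) -> float:
    """Fraction of rows whose values on `cols` occur in no other row (1.0 = exact key)."""
    if not rows:
        return 0.0
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    # A value combination seen exactly once corresponds to exactly one non-duplicated row.
    unique_rows = sum(1 for n in counts.values() if n == 1)
    return unique_rows / len(rows)


def approximate_keys(
    rows: Sequence[Row], columns: List[str], threshold: float, max_size: int = 2
) -> List[Tuple[Tuple[str, ...], float]]:
    """Return column sets (up to max_size columns) whose key strength exceeds threshold."""
    found = []
    for size in range(1, max_size + 1):
        for cols in combinations(columns, size):
            strength = key_strength(rows, cols)
            if strength > threshold:
                found.append((cols, strength))
    return found


rows = [
    {"id": 1, "email": "a@x", "zip": "98052"},
    {"id": 2, "email": "b@x", "zip": "98052"},
    {"id": 3, "email": "c@x", "zip": "98101"},
    {"id": 4, "email": "c@x", "zip": "98004"},
]
print(key_strength(rows, ("email",)))  # 0.5
```

Here `email` alone is a weak key (two of four rows share a value), while the pair `("email", "zip")` identifies every row and would be reported as an exact key.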
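
The final selection step described in publication 20110282856 can be sketched as a two-threshold filter. The sketch assumes click similarity is the Jaccard coefficient of the clicked-URL sets and takes the mutual-information-based tag similarity as a precomputed input; both choices, the example queries, and all function names are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_synonyms(
    entity_urls: Set[str],
    candidate_urls: Dict[str, Set[str]],
    tag_similarity: Dict[str, float],
    click_threshold: float,
    tag_threshold: float,
) -> List[str]:
    """Keep candidate queries whose click and tag similarities both exceed their thresholds."""
    selected = []
    for query, urls in candidate_urls.items():
        # Candidate generation: the query must share at least one clicked URL with the entity.
        if not urls & entity_urls:
            continue
        if (jaccard(urls, entity_urls) > click_threshold
                and tag_similarity.get(query, 0.0) > tag_threshold):
            selected.append(query)
    return selected


entity_urls = {"u1", "u2", "u3"}
candidates = {"office 2010": {"u1", "u2"}, "office": {"u1", "u7", "u8", "u9"}, "excel help": {"u5"}}
print(select_synonyms(entity_urls, candidates, {"office 2010": 0.8, "office": 0.9}, 0.5, 0.5))
# ['office 2010']
```

Requiring both signals guards against a broad query such as "office", which shares a click with the entity and may have high tag similarity but whose click overlap is too diffuse to pass the click threshold.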
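
The filter-then-verify pipeline in publication 20090319500 can be sketched in a few lines. As a simplifying assumption, each entity string is covered by its own full token set, so a document becomes a candidate only if the inverted index shows it contains every token of some entity; candidates are then verified by checking for the entity as a contiguous token sequence. The tokenizer and all function names are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (a deliberately simple, hypothetical tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: token -> ids of documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index.setdefault(tok, set()).add(doc_id)
    return index


def contains_sequence(tokens: List[str], seq: List[str]) -> bool:
    """True if `seq` occurs as a contiguous run inside `tokens`."""
    n = len(seq)
    return any(tokens[i:i + n] == seq for i in range(len(tokens) - n + 1))


def filter_for_entities(docs: Dict[int, str], entities: List[str]) -> List[int]:
    index = build_index(docs)
    entity_seqs = [tokenize(e) for e in entities]
    candidates: Set[int] = set()
    for seq in entity_seqs:
        if not seq:
            continue
        # Query the index with the entity's token set: intersect the posting lists.
        postings = [index.get(tok, set()) for tok in set(seq)]
        candidates |= set.intersection(*postings)
    # Verify: keep candidates that contain at least one entity as a contiguous match.
    return sorted(
        d for d in candidates
        if any(seq and contains_sequence(tokenize(docs[d]), seq) for seq in entity_seqs)
    )


docs = {1: "Microsoft Research is in Redmond.", 2: "Research in Redmond by Microsoft.", 3: "Nothing here."}
print(filter_for_entities(docs, ["Microsoft Research"]))  # [1]
```

Document 2 passes the index filter (it contains both tokens) but fails verification because the tokens are not adjacent, which illustrates why the abstract pairs the index lookup with a second filtering pass before entity recognition.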