Patents by Inventor Venkatesh Ganti
Venkatesh Ganti has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20110320446
Abstract: This patent application relates to interval-based information retrieval (IR) search techniques for efficiently and correctly answering keyword search queries. In some embodiments, a range of information-containing blocks for a search query can be identified. Each of these blocks, and thus the range, can include document identifiers that identify individual corresponding documents that contain a term found in the search query. From the range, a subrange(s) having a smaller number of blocks than the range can be selected. This can be accomplished without decompressing the blocks by partitioning the range into intervals and evaluating the intervals. The smaller number of blocks in the subrange(s) can then be decompressed and processed to identify a doc ID(s) and thus document(s) that satisfies the query.
Type: Application
Filed: June 25, 2010
Publication date: December 29, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
-
Publication number: 20110314010
Abstract: A query comprising a set of keywords may be applied to a data set having various attributes, but it may be difficult to determine the query predicates intended for each keyword (e.g., the attributes targeted by each keyword, and the values of those attributes satisfying the keyword.) The meaning of a keyword of interest may be inferred from a set of query pairs, comprising a background query (comprising a set of keywords excluding the keyword of interest) and a foreground query (comprising the same set of keywords but also including the keyword of interest.) Differences in the query results for the foreground query and the background query of many query pairs may identify a query predicate intended by the keyword and a confidence score. These results may be associated with the keyword in a keyword map, useful for translating queries into query predicates that may yield relevant query results.
Type: Application
Filed: June 17, 2010
Publication date: December 22, 2011
Applicant: Microsoft Corporation
Inventors: Venkatesh Ganti, Dong Xin, Yeye He
-
Publication number: 20110282856
Abstract: Embodiments for identifying an entity synonym of an entity are described. A query log is stored in a database located on at least one computing device. A candidate generation module can select a candidate query in the query log that shares a click on a URL with the entity. A correlated tag module can generate a set of phrase-tag pairs for the entity and the candidate query and measure a mutual information value for each phrase-tag pair. A candidate filtering module can determine a click similarity value between the candidate query and the entity based on a set of URLs selected in the search engine results and a tag similarity value based on the mutual information values. A candidate query is selected as an entity synonym if the click similarity value and the tag similarity value are greater than predetermined thresholds respectively.
Type: Application
Filed: May 14, 2010
Publication date: November 17, 2011
Applicant: Microsoft Corporation
Inventors: Venkatesh Ganti, Dong Xin
-
Patent number: 8046339
Abstract: Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (or matching) and negative (non-matching) examples to search through this space and suggest an initial record matching query. The record matching task is modeled as that of designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs are executable efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations, in that any pair of matching records has a higher similarity value than non-matching record pairs on at least one similarity function.
Type: Grant
Filed: June 5, 2007
Date of Patent: October 25, 2011
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Bee Chung Chen, Venkatesh Ganti, Shriraghav Kaushik
-
Patent number: 8037069
Abstract: The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.
Type: Grant
Filed: June 3, 2008
Date of Patent: October 11, 2011
Assignee: Microsoft Corporation
Inventors: Kaushik Chakrabarti, Surajt Chaudhuri, Venkatesh Ganti, Dong Xin
-
Publication number: 20110214080
Abstract: This patent application relates to taxonomy editing. One implementation involves a taxonomy editor configured to generate a visual representation of a taxonomy associated with a set of scientific papers. The taxonomy editor includes a properties module configured to identify properties relating to an individual node of the taxonomy and a statistics module configured to determine trends relating to the individual node. The taxonomy editor further includes a similarity module configured to evaluate keyword similarity relative to individual scientific papers associated with the individual node. The taxonomy editor also includes a suggestion module configured to utilize the properties, the trends and the keyword similarity to identify potential modifications to the taxonomy. The taxonomy editor is further configured to present at least some of the potential modifications, the properties, the trends, and the keyword similarity concurrently with the visual representation of the taxonomy.
Type: Application
Filed: February 26, 2010
Publication date: September 1, 2011
Applicant: Microsoft Corporation
Inventors: Sanjay Agrawal, Surajit Chaudhuri, Venkatesh Ganti, Yuri Siradeghyan
-
Patent number: 7970808
Abstract: Entities, such as people, places and things, are labeled based on information collected across a possibly large number of documents. One or more documents are scanned to recognize the entities, and features are extracted from the context in which those entities occur in the documents. Observed entity-feature pairs are stored either in an in-memory store or an external store. A store manager optimizes use of the limited amount of space for an in-memory store by determining which store to put an entity-feature pair in, and when to evict features from the in-memory store to make room for new pairs. Features that may be observed in an entity's context may take forms such as specific word sequences or membership in a particular list.
Type: Grant
Filed: May 5, 2008
Date of Patent: June 28, 2011
Assignee: Microsoft Corporation
Inventors: Arnd Christian Konig, Venkatesh Ganti
-
Publication number: 20110125791
Abstract: Techniques are described herein for classifying a search query with respect to query intent using search result tag ratios. A tag is a character or a combination of characters (e.g., one or more words) that indicates a property of a document, such as a topic of the document, a type of entity (i.e., subject matter) the document references, etc. A search result tag ratio is defined as a fraction (e.g., a proportion, a percentage, etc.) of the documents in a search result that includes a respective tag. A search query may be classified based on back-off ratios, which are tag ratios of search queries that are related to the search query to be classified. Tag ratios may be pre-computed (i.e., calculated before the corresponding search queries are received from users).
Type: Application
Filed: November 25, 2009
Publication date: May 26, 2011
Applicant: Microsoft Corporation
Inventors: Arnd Christian Konig, Venkatesh Ganti, Xiao Li
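The tag ratio defined in this abstract can be illustrated with a minimal Python sketch. It only shows the fraction-of-documents computation; the function name `tag_ratios`, the representation of a search result as a list of tag sets, and the sample tags are assumptions made for illustration and do not come from the application.

```python
from collections import Counter


def tag_ratios(result_docs):
    """Return, for each tag, the fraction of result documents carrying it.

    result_docs is a hypothetical representation of one search result:
    a list with one set of tags per returned document.
    """
    if not result_docs:
        return {}
    # Count each tag at most once per document.
    counts = Counter(tag for tags in result_docs for tag in set(tags))
    total = len(result_docs)
    return {tag: count / total for tag, count in counts.items()}


# Illustrative search result with four documents (made-up tags).
results = [
    {"product", "camera"},
    {"product"},
    {"news"},
    {"product", "review"},
]
ratios = tag_ratios(results)
```

A classifier could then compare such ratios against a cutoff; how the application actually combines them with back-off ratios is not specified in the abstract and is not modeled here.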
-
Patent number: 7865505
Abstract: A machine implemented system and method that efficiently facilitates and effectuates exact similarity joins between collections of sets. The system and method obtains a collection of sets and a threshold value from an interface, and based at least in part on an identifiable similarity, such as an overlap or intersection, between the collection of sets the analysis component generates and outputs a candidate pair that at least equals or exceeds the threshold value.
Type: Grant
Filed: January 30, 2007
Date of Patent: January 4, 2011
Assignee: Microsoft Corporation
Inventors: Arvind Arasu, Venkatesh Ganti, Kaushik Shriraghav
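To make the inputs and outputs of such a join concrete, here is a deliberately naive Python sketch that compares every pair of sets and keeps the pairs whose overlap meets the threshold. It is a brute-force baseline, not the efficient method the patent claims; the function name `overlap_join` and the sample data are assumptions.

```python
from itertools import combinations


def overlap_join(sets, threshold):
    """Return index pairs (i, j), i < j, whose sets share at least
    `threshold` elements. Quadratic in the number of sets."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(sets), 2):
        if len(a & b) >= threshold:
            pairs.append((i, j))
    return pairs


# Illustrative collection of token sets (made-up data).
collection = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}]
matches = overlap_join(collection, 2)
```

The result is exact in the sense that no qualifying pair is missed; the cost of checking all pairs is what an efficient join would aim to avoid.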
-
Publication number: 20100313258
Abstract: Identifying synonyms of entities using a collection of documents is disclosed herein. In some aspects, a document from a collection of documents may be analyzed to identify hit sequences that include one or more tokens (e.g., words, numbers, etc.). The hit sequences may then be used to generate discriminating token sets (DTS's) that are subsets of both the hit sequences and the entity names. The DTS's are matched with corresponding entity names, and then used to create DTS phrases by selecting adjacent text in the document that is proximate to the DTS. The DTS phrases may be analyzed to determine whether the corresponding DTS is a synonym of the entity name. In various aspects, the tokens of an associated entity name that are present in the DTS phrases are used to generate a score for the DTS. When the score at least reaches a threshold, the DTS may be designated as a synonym. A list of synonyms may be generated for each entity name.
Type: Application
Filed: June 4, 2009
Publication date: December 9, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
-
Publication number: 20100299367
Abstract: A keyword search is executed on a view of a database based on a Boolean keyword query. The view includes multiple text columns, and the keyword search is executed on each of the multiple text columns in the view. The output results from the keyword search on each of the text columns include tuple identifiers of one or more relevant tuples and a relevancy score for ranking the results of the keyword query.
Type: Application
Filed: May 20, 2009
Publication date: November 25, 2010
Applicant: Microsoft Corporation
Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
-
Publication number: 20100293179
Abstract: Identifying synonyms of entities using web search results is disclosed herein. In some aspects, a candidate string of tokens of an entity name is selected as a search term. The search term is transmitted by a server to a search engine, which in turn, transmits search results back to the server after performing a search. The server analyzes the search results, generates a score based on the search results, and then determines a status (synonym or not a synonym) of the candidate string based on the score. In further aspects, additional candidate strings are designated as synonyms or not synonyms based on status of the searched candidate string by using relationships of a lattice formed from all possible candidate strings of the entity name.
Type: Application
Filed: May 14, 2009
Publication date: November 18, 2010
Applicant: Microsoft Corporation
Inventors: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
-
Patent number: 7730060
Abstract: The subject disclosure pertains to a class of object finder queries that return the best target objects that match a set of given keywords. Mechanisms are provided that facilitate identification of target objects related to search objects that match a set of query keywords. Scoring mechanisms/functions are also disclosed that compute relevance scores of target objects. Further, efficient early termination techniques are provided to compute the top K target objects based on a scoring function.
Type: Grant
Filed: June 9, 2006
Date of Patent: June 1, 2010
Assignee: Microsoft Corporation
Inventors: Kaushik Chakrabarti, Venkatesh Ganti, Dong Xin
-
Patent number: 7720883
Abstract: Architecture that provides a data profile computation technique which employs key profile computation and data pattern profile computation. Key profile computation in a data table includes both exact keys as well as approximate keys, and is based on key strengths. A key strength of 100% is an exact key, and any other percentage is an approximate key. The key strength is estimated based on the number of table rows that have duplicated attribute values. Only column sets that exceed a threshold value are returned. Pattern profiling identifies a small set of regular expression patterns which best describe the patterns within a given set of attribute values. Pattern profiling includes three phases: a first phase for determining token regular expressions, a second phase for determining candidate regular expressions, and a third phase for identifying the best regular expressions of the candidates that match the attribute values.
Type: Grant
Filed: June 27, 2007
Date of Patent: May 18, 2010
Assignee: Microsoft Corporation
Inventors: Zhimin Chen, Venkatesh Ganti, Gunjan Jha, Shriraghav Kaushik, Vivek Narasayya
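The key-strength idea in this abstract can be sketched in Python as follows. The abstract does not give the formula, so this sketch assumes one plausible reading: strength is the fraction of rows whose values on the chosen columns are not duplicated elsewhere in the table. The function names, the `max_width` limit, and the sample table are all hypothetical, and the sketch computes the value exactly rather than estimating it.

```python
from collections import Counter
from itertools import combinations


def key_strength(rows, columns):
    """Fraction of rows whose projection on `columns` is unique.

    1.0 corresponds to an exact key; anything lower is an approximate key.
    This definition is an assumption, not the patent's formula.
    """
    if not rows:
        return 1.0
    projections = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(projections)
    # Rows whose projected value occurs more than once are "duplicated".
    duplicated = sum(n for n in counts.values() if n > 1)
    return 1.0 - duplicated / len(rows)


def candidate_keys(rows, all_columns, threshold, max_width=2):
    """Return (column set, strength) pairs whose strength exceeds threshold."""
    found = []
    for width in range(1, max_width + 1):
        for cols in combinations(all_columns, width):
            strength = key_strength(rows, cols)
            if strength > threshold:
                found.append((cols, strength))
    return found


# Illustrative table (made-up data).
rows = [
    {"id": 1, "city": "Oslo", "zip": "0150"},
    {"id": 2, "city": "Oslo", "zip": "0151"},
    {"id": 3, "city": "Rome", "zip": "0151"},
    {"id": 4, "city": "Rome", "zip": "0010"},
]
keys = candidate_keys(rows, ["id", "city", "zip"], threshold=0.9)
```

In this sample, `zip` alone is only an approximate key because two rows share a value, while `city` and `zip` together identify every row.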
-
Patent number: 7685090
Abstract: The invention concerns a detection of duplicate tuples in a database. Previous domain independent detection of duplicated tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high quality, scalable duplicate detection process.
Type: Grant
Filed: July 14, 2005
Date of Patent: March 23, 2010
Assignee: Microsoft Corporation
Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna
-
Publication number: 20090327223
Abstract: The described implementations relate to query portals. One technique analyzes search results generated by a web search engine responsive to a user search query. The technique also dynamically generates a query portal that lists the search results as well as entities identified from the search results.
Type: Application
Filed: June 26, 2008
Publication date: December 31, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin, Sanjay Agrawal, Arnd Christian Konig
-
Publication number: 20090319500
Abstract: A set of documents is filtered for entity extraction. A list of entity strings is received. A set of token sets that covers the entity strings in the list is determined. An inverted index generated on a first set of documents is queried using the set of token sets to determine a set of document identifiers for a subset of the documents in the first set. A second set of documents identified by the set of document identifiers is retrieved from the first set of documents. The second set of documents is filtered to include one or more documents of the second set that each includes a match with at least one entity string of the list of entity strings. Entity recognition may be performed on the filtered second set of documents.
Type: Application
Filed: June 24, 2008
Publication date: December 24, 2009
Applicant: Microsoft Corporation
Inventors: Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
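The two-stage flow in this abstract (query an inverted index with token sets, then verify the retrieved documents against the entity strings) can be sketched in Python. This is a simplified illustration under stated assumptions: each entity's own tokens are used as its token set, whereas the application determines a covering set of token sets in a way the abstract does not detail. Tokenization is plain whitespace splitting, and all names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def filter_documents(docs, index, entities):
    """Return the documents that contain at least one entity string."""
    normalized = [" ".join(e.lower().split()) for e in entities]
    normalized = [e for e in normalized if e]

    # Stage 1: candidate ids are documents containing every token of
    # some entity (assumption: one token set per entity).
    candidate_ids = set()
    for entity in normalized:
        postings = [index.get(token, set()) for token in entity.split()]
        candidate_ids |= set.intersection(*postings)

    # Stage 2: keep candidates where an entity occurs as a contiguous
    # token sequence, not merely as scattered tokens.
    matched = {}
    for doc_id in candidate_ids:
        padded = " " + " ".join(docs[doc_id].lower().split()) + " "
        if any(" " + entity + " " in padded for entity in normalized):
            matched[doc_id] = docs[doc_id]
    return matched


# Illustrative corpus and entity list (made-up data).
docs = {
    1: "Canon EOS 350D review",
    2: "EOS sale at Canon store",
    3: "Nikon D80 hands on",
}
entities = ["canon eos", "nikon d80"]
kept = filter_documents(docs, build_inverted_index(docs), entities)
```

Document 2 passes the index lookup because it contains both tokens of "canon eos", but it is dropped in the verification step since the tokens are not adjacent.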
-
Patent number: 7634464
Abstract: The subject disclosure pertains to a powerful and flexible framework for record matching. The framework facilitates design of a record matching query or package composed of a set of well-defined primitive operators (e.g., relational, data cleaning . . . ), which can ultimately be executed to match records. To assist design of such packages, a learning technique based on examples is provided. More specifically, a set of matching and non-matching record pairs can be input and employed to facilitate automatic package generation. A generated package can subsequently be transformed manually and/or automatically into a semantically equivalent form optimized for execution.
Type: Grant
Filed: June 14, 2006
Date of Patent: December 15, 2009
Assignee: Microsoft Corporation
Inventors: Bee-Chung Chen, Venkatesh Ganti, Kaushik Shriraghav
-
Publication number: 20090300014
Abstract: The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.
Type: Application
Filed: June 3, 2008
Publication date: December 3, 2009
Applicant: Microsoft Corporation
Inventors: Kaushik Chakrabarti, Surajt Chaudhuri, Venkatesh Ganti, Dong Xin
-
Patent number: 7627567
Abstract: A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, a middle and an ending token topology for that attribute. A null token takes into account an empty attribute component and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process breaks or parses an input record into a sequence of tokens. The process then determines a most probable segmentation of the input record by comparing the tokens of the input record with the state models derived for attributes from the reference table.
Type: Grant
Filed: April 14, 2004
Date of Patent: December 1, 2009
Assignee: Microsoft Corporation
Inventors: Venkatesh Ganti, Vassilakis Theodore, Yevgeny Agichtein