Patents by Inventor Venkatesh Ganti

Venkatesh Ganti has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Pushing Search Query Constraints Into Information Retrieval Processing

Publication number: 20110320446

Abstract: This patent application relates to interval-based information retrieval (IR) search techniques for efficiently and correctly answering keyword search queries. In some embodiments, a range of information-containing blocks for a search query can be identified. Each of these blocks, and thus the range, can include document identifiers (doc IDs) that identify individual documents containing a term found in the search query. From the range, one or more subranges having fewer blocks than the range can be selected. This can be accomplished without decompressing the blocks, by partitioning the range into intervals and evaluating the intervals. The smaller number of blocks in the subrange(s) can then be decompressed and processed to identify the doc ID(s), and thus the document(s), that satisfy the query.

Type: Application

Filed: June 25, 2010

Publication date: December 29, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
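The abstract above describes choosing which compressed posting blocks to decode by reasoning over doc-ID ranges first. A minimal Python sketch of that general idea follows, assuming each block stores its smallest and largest doc ID uncompressed; the block layout and function names are hypothetical, and the simple pairwise overlap filter shown here is not the interval-partitioning procedure claimed in the application.

```python
from typing import List, Tuple

# Hypothetical block layout: (min_doc_id, max_doc_id, doc_ids). The doc_ids
# list stands in for a compressed payload; min/max are uncompressed metadata.
Block = Tuple[int, int, List[int]]


def overlaps_any(lo: int, hi: int, blocks: List[Block]) -> bool:
    """True if the interval [lo, hi] intersects at least one block's interval."""
    return any(b_lo <= hi and lo <= b_hi for b_lo, b_hi, _ in blocks)


def prune_blocks(postings: List[List[Block]]) -> List[List[Block]]:
    """For each query term, keep only blocks whose doc-ID interval overlaps at
    least one block of every other term. Only block metadata is read here."""
    kept = []
    for i, blocks in enumerate(postings):
        others = [p for j, p in enumerate(postings) if j != i]
        kept.append([b for b in blocks
                     if all(overlaps_any(b[0], b[1], o) for o in others)])
    return kept


def conjunctive_query(postings: List[List[Block]]) -> List[int]:
    """Answer an AND query, 'decompressing' only blocks that survive pruning."""
    kept = prune_blocks(postings)
    if not kept:
        return []
    doc_sets = [{d for _, _, ids in blocks for d in ids} for blocks in kept]
    return sorted(set.intersection(*doc_sets))
```

With `term_a = [(1, 10, [1, 5, 10]), (20, 30, [20, 25, 30])]` and `term_b = [(5, 8, [5, 7]), (40, 50, [40, 50])]`, `conjunctive_query([term_a, term_b])` touches one block per term and returns `[5]`. The filter is safe because a block that overlaps no block of some other term cannot contain a doc ID present in every posting list.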
KEYWORD TO QUERY PREDICATE MAPS FOR QUERY TRANSLATION

Publication number: 20110314010

Abstract: A query comprising a set of keywords may be applied to a data set having various attributes, but it may be difficult to determine the query predicates intended for each keyword (e.g., the attributes targeted by each keyword, and the values of those attributes satisfying the keyword). The meaning of a keyword of interest may be inferred from a set of query pairs, each comprising a background query (a set of keywords excluding the keyword of interest) and a foreground query (the same set of keywords plus the keyword of interest). Differences between the foreground and background query results across many query pairs may identify a query predicate intended by the keyword, along with a confidence score. These results may be associated with the keyword in a keyword map, which is useful for translating queries into query predicates that may yield relevant query results.

Type: Application

Filed: June 17, 2010

Publication date: December 22, 2011

Applicant: Microsoft Corporation

Inventors: Venkatesh Ganti, Dong Xin, Yeye He
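To make the foreground/background comparison concrete, here is a small Python sketch over an in-memory table. It assumes a naive matcher (every keyword must appear as a lowercase token somewhere in a record), takes the single (attribute, value) pair whose share of results grows most as each pair's inferred predicate, and uses the fraction of agreeing pairs as the confidence score; these choices and all names are illustrative assumptions, not the application's scoring.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, str]
Predicate = Tuple[str, str]  # (attribute, value)


def search(records: List[Record], keywords: Set[str]) -> List[Record]:
    """Naive keyword matcher (assumption): every keyword must appear as a
    token in some attribute value of the record."""
    def tokens(r: Record) -> Set[str]:
        return {t for v in r.values() for t in v.lower().split()}
    return [r for r in records if keywords <= tokens(r)]


def predicate_shift(fg: List[Record], bg: List[Record]) -> Optional[Predicate]:
    """The (attribute, value) whose share of results grows most from bg to fg."""
    def shares(rs: List[Record]) -> Dict[Predicate, float]:
        c = Counter((a, v) for r in rs for a, v in r.items())
        return {p: n / len(rs) for p, n in c.items()} if rs else {}
    f, b = shares(fg), shares(bg)
    return max(f, key=lambda p: f[p] - b.get(p, 0.0)) if f else None


def map_keyword(records: List[Record], keyword: str,
                backgrounds: List[Set[str]]) -> Optional[Tuple[Predicate, float]]:
    """Vote across (background, foreground) query pairs; the confidence is the
    fraction of pairs that agree on the winning predicate."""
    votes: Counter = Counter()
    for bg_keywords in backgrounds:
        fg = search(records, bg_keywords | {keyword})
        bg = search(records, bg_keywords)
        pred = predicate_shift(fg, bg)
        if pred is not None:
            votes[pred] += 1
    if not votes:
        return None
    pred, n = votes.most_common(1)[0]
    return pred, n / len(backgrounds)
```

On a toy table of red and blue phones and laptops, `map_keyword(records, "red", [{"acme"}, {"zeta"}, {"phone"}])` maps `red` to the predicate `color = red` with confidence 1.0, because adding `red` to each background query concentrates the results on that attribute value.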
IDENTIFYING ENTITY SYNONYMS

Publication number: 20110282856

Abstract: Embodiments for identifying an entity synonym of an entity are described. A query log is stored in a database located on at least one computing device. A candidate generation module can select a candidate query in the query log that shares a click on a URL with the entity. A correlated tag module can generate a set of phrase-tag pairs for the entity and the candidate query and measure a mutual information value for each phrase-tag pair. A candidate filtering module can determine a click similarity value between the candidate query and the entity, based on a set of URLs selected in the search engine results, and a tag similarity value based on the mutual information values. A candidate query is selected as an entity synonym if the click similarity value and the tag similarity value each exceed their respective predetermined thresholds.

Type: Application

Filed: May 14, 2010

Publication date: November 17, 2011

Applicant: Microsoft Corporation

Inventors: Venkatesh Ganti, Dong Xin
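The candidate generation and filtering stages can be sketched in Python as follows. Jaccard similarity over clicked URLs and cosine similarity over mutual-information-weighted phrase-tag vectors are assumed stand-ins for the click and tag similarity measures, the MI values are taken as precomputed input, and the thresholds (`click_t`, `tag_t`) are hypothetical defaults.

```python
import math
from typing import Dict, List, Set


def candidates(click_log: Dict[str, Set[str]], entity: str) -> List[str]:
    """Queries that share at least one clicked URL with the entity."""
    urls = click_log.get(entity, set())
    return [q for q, u in click_log.items() if q != entity and u & urls]


def click_similarity(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0  # Jaccard (assumed)


def tag_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of MI-weighted phrase-tag vectors (assumed measure)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def entity_synonyms(click_log: Dict[str, Set[str]],
                    mi_vectors: Dict[str, Dict[str, float]],
                    entity: str, click_t: float = 0.3, tag_t: float = 0.5) -> List[str]:
    out = []
    for q in candidates(click_log, entity):
        cs = click_similarity(click_log[q], click_log[entity])
        ts = tag_similarity(mi_vectors.get(q, {}), mi_vectors.get(entity, {}))
        if cs > click_t and ts > tag_t:  # both values must exceed their thresholds
            out.append(q)
    return out
```

For an entity `microsoft office`, a query such as `ms office` that shares most clicked URLs and has a similar tag profile passes both filters, while `office depot`, which shares only one URL, fails the click-similarity threshold.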
Example-driven design of efficient record matching queries

Patent number: 8046339

Abstract: Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (matching) and negative (non-matching) examples to search through the space of possible record matching queries and suggest an initial record matching query. The record matching task is modeled as designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs can be executed efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations: any pair of matching records has a higher similarity value than non-matching record pairs on at least one similarity function.

Type: Grant

Filed: June 5, 2007

Date of Patent: October 25, 2011

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Bee Chung Chen, Venkatesh Ganti, Shriraghav Kaushik
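The monotonicity property suggests a simple query shape: declare a pair a match if it exceeds a learned threshold on at least one similarity function. The Python sketch below learns such thresholds from negative examples only; it is a deliberately reduced illustration, not the patent's search over operator trees, and the two similarity functions (word-level and character 3-gram Jaccard) are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
SimFn = Callable[[str, str], float]


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of whitespace-separated word sets."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 1.0


def qgram_jaccard(a: str, b: str, q: int = 3) -> float:
    """Jaccard similarity of character q-gram sets."""
    def grams(s: str) -> set:
        s = s.lower()
        return {s[i:i + q] for i in range(max(1, len(s) - q + 1))}
    x, y = grams(a), grams(b)
    return len(x & y) / len(x | y)


def learn_thresholds(sims: Dict[str, SimFn], negatives: List[Pair]) -> Dict[str, float]:
    """Per function, the highest score any non-matching example reaches.
    Requiring a strictly higher score rejects every negative example."""
    return {name: max((f(a, b) for a, b in negatives), default=0.0)
            for name, f in sims.items()}


def is_match(sims: Dict[str, SimFn], thresholds: Dict[str, float], a: str, b: str) -> bool:
    """Disjunction of monotone threshold predicates: match if the pair beats
    the learned threshold on at least one similarity function."""
    return any(f(a, b) > thresholds[name] for name, f in sims.items())
```

Because each threshold sits at the best score any negative example achieved, no negative pair can match; the positive examples then show how much recall that conservative choice leaves. A fuller design would also search over which functions and operators to compose.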
Membership checking of digital text

Patent number: 8037069

Abstract: The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.

Type: Grant

Filed: June 3, 2008

Date of Patent: October 11, 2011

Assignee: Microsoft Corporation

Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
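A minimal Python sketch of signature-based membership checking: each dictionary entry is indexed under its rarest token (a hypothetical choice of signature), document positions holding a signature token become candidate matches, and each candidate is verified by comparing the full token sequence. Tokenization is plain whitespace splitting with lowercasing, so punctuation handling is left out, and entries are assumed to be non-empty.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, ...]


def build_signature_index(entities: List[str]) -> Dict[str, List[Tuple[Entry, int]]]:
    """Map each entity's rarest token (its signature) to (entity tokens, offset)."""
    token_lists = [tuple(e.lower().split()) for e in entities]
    freq = Counter(t for toks in token_lists for t in set(toks))
    index: Dict[str, List[Tuple[Entry, int]]] = defaultdict(list)
    for toks in token_lists:
        offset = min(range(len(toks)), key=lambda k: freq[toks[k]])
        index[toks[offset]].append((toks, offset))
    return index


def find_members(document: str, index: Dict[str, List[Tuple[Entry, int]]]) -> Set[str]:
    doc = document.lower().split()
    found = set()
    for i, tok in enumerate(doc):
        for entity, off in index.get(tok, []):  # candidate match via signature
            start = i - off
            # Verification: the whole entity must appear at this position.
            if start >= 0 and tuple(doc[start:start + len(entity)]) == entity:
                found.add(" ".join(entity))
    return found
```

For the entries `New York`, `New Jersey`, and `York`, scanning `I moved from New York to new jersey` reports all three, while the candidate for `New York` at the second `new` fails verification.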
Taxonomy Editor

Publication number: 20110214080

Abstract: This patent application relates to taxonomy editing. One implementation involves a taxonomy editor configured to generate a visual representation of a taxonomy associated with a set of scientific papers. The taxonomy editor includes a properties module configured to identify properties relating to an individual node of the taxonomy and a statistics module configured to determine trends relating to the individual node. The taxonomy editor further includes a similarity module configured to evaluate keyword similarity relative to individual scientific papers associated with the individual node. The taxonomy editor also includes a suggestion module configured to utilize the properties, the trends and the keyword similarity to identify potential modifications to the taxonomy. The taxonomy editor is further configured to present at least some of the potential modifications, the properties, the trends, and the keyword similarity concurrently with the visual representation of the taxonomy.

Type: Application

Filed: February 26, 2010

Publication date: September 1, 2011

Applicant: Microsoft Corporation

Inventors: Sanjay Agrawal, Surajit Chaudhuri, Venkatesh Ganti, Yuri Siradeghyan
Leveraging cross-document context to label entity

Patent number: 7970808

Abstract: Entities, such as people, places and things, are labeled based on information collected across a possibly large number of documents. One or more documents are scanned to recognize the entities, and features are extracted from the context in which those entities occur in the documents. Observed entity-feature pairs are stored either in an in-memory store or in an external store. A store manager optimizes use of the limited space in the in-memory store by determining which store to put an entity-feature pair in, and when to evict features from the in-memory store to make room for new pairs. Features that may be observed in an entity's context may take forms such as specific word sequences or membership in a particular list.

Type: Grant

Filed: May 5, 2008

Date of Patent: June 28, 2011

Assignee: Microsoft Corporation

Inventors: Arnd Christian Konig, Venkatesh Ganti
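The store manager's job can be illustrated with a bounded two-tier counter in Python. The eviction policy here (spill the least frequently observed pair to the external store), the use of a plain `Counter` as a stand-in for external storage, and the class name are assumptions for illustration; the abstract does not specify the policy.

```python
from collections import Counter
from typing import Hashable, Tuple

Key = Tuple[Hashable, Hashable]  # (entity, feature)


class EntityFeatureStore:
    """Hypothetical two-tier store: a bounded in-memory Counter of
    (entity, feature) pairs that spills its least-frequent pair to an
    external store (a second Counter here, standing in for disk)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.memory: Counter = Counter()
        self.external: Counter = Counter()

    def observe(self, entity: Hashable, feature: Hashable) -> None:
        key: Key = (entity, feature)
        if key in self.memory or len(self.memory) < self.capacity:
            self.memory[key] += 1
            return
        # Evict the least-frequent in-memory pair to make room (assumed policy).
        victim, count = min(self.memory.items(), key=lambda kv: kv[1])
        del self.memory[victim]
        self.external[victim] += count
        self.memory[key] += 1

    def count(self, entity: Hashable, feature: Hashable) -> int:
        """Total observations across both tiers."""
        key: Key = (entity, feature)
        return self.memory[key] + self.external[key]
```

Frequently observed pairs stay in memory, while a rarely seen pair is pushed out when space runs short; its count survives in the external tier, so `count()` still reports the full total.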
QUERY CLASSIFICATION USING SEARCH RESULT TAG RATIOS

Publication number: 20110125791

Abstract: Techniques are described herein for classifying a search query with respect to query intent using search result tag ratios. A tag is a character or a combination of characters (e.g., one or more words) that indicates a property of a document, such as a topic of the document, a type of entity (i.e., subject matter) the document references, etc. A search result tag ratio is defined as the fraction (e.g., a proportion, a percentage, etc.) of the documents in a search result that include a respective tag. A search query may be classified based on back-off ratios, which are tag ratios of search queries that are related to the search query to be classified. Tag ratios may be pre-computed (i.e., calculated before the corresponding search queries are received from users).

Type: Application

Filed: November 25, 2009

Publication date: May 26, 2011

Applicant: Microsoft Corporation

Inventors: Arnd Christian Konig, Venkatesh Ganti, Xiao Li
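Tag ratios and back-off are straightforward to express in code. This Python sketch assumes each search result is represented by its set of tags, treats "related" queries as the query with one word removed (a hypothetical back-off rule), and averages their precomputed ratios when the query itself has none.

```python
from collections import Counter
from typing import Dict, List, Set


def tag_ratios(results: List[Set[str]]) -> Dict[str, float]:
    """Fraction of result documents carrying each tag."""
    if not results:
        return {}
    counts = Counter(tag for tags in results for tag in tags)
    return {tag: n / len(results) for tag, n in counts.items()}


def backoff_queries(query: str) -> List[str]:
    """Hypothetical notion of related queries: drop one word at a time."""
    words = query.split()
    return [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]


def classify(query: str, precomputed: Dict[str, Dict[str, float]],
             tag: str, threshold: float = 0.5) -> bool:
    """True if the query's (possibly backed-off) ratio for `tag` meets threshold."""
    if query in precomputed:
        ratio = precomputed[query].get(tag, 0.0)
    else:
        related = [precomputed[q] for q in backoff_queries(query) if q in precomputed]
        if not related:
            return False
        ratio = sum(r.get(tag, 0.0) for r in related) / len(related)
    return ratio >= threshold
```

If `cheap flights` has precomputed results where three of four documents carry the tag `travel`, an unseen query such as `cheap flights boston` backs off to that ratio (0.75) and is classified as travel intent at a 0.5 threshold.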
Efficient exact set similarity joins

Patent number: 7865505

Abstract: A machine-implemented system and method that efficiently performs exact similarity joins between collections of sets. The system and method obtain a collection of sets and a threshold value from an interface. Based at least in part on an identifiable similarity between sets in the collection, such as their overlap or intersection, an analysis component generates and outputs candidate pairs whose similarity equals or exceeds the threshold value.

Type: Grant

Filed: January 30, 2007

Date of Patent: January 4, 2011

Assignee: Microsoft Corporation

Inventors: Arvind Arasu, Venkatesh Ganti, Kaushik Shriraghav
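An exact overlap join can be sketched in Python using prefix filtering, a well-known technique swapped in here for illustration; it is not necessarily the signature scheme claimed in the patent. Sorting every set by a shared global element order guarantees that two sets with overlap at least `t` share an element within their first `|s| - t + 1` elements, so only those prefixes need indexing, and a final verification keeps the result exact.

```python
from collections import Counter, defaultdict
from typing import List, Set, Tuple


def overlap_join(sets: List[Set[str]], t: int) -> List[Tuple[int, int]]:
    """Return all pairs (j, i), j < i, with |sets[j] & sets[i]| >= t (t >= 1)."""
    freq = Counter(x for s in sets for x in s)
    rank = lambda x: (freq[x], x)        # global order: rare elements first
    index = defaultdict(list)            # prefix element -> ids of earlier sets
    pairs = []
    for i, s in enumerate(sets):
        if len(s) < t:
            continue                     # too small to reach overlap t
        prefix = sorted(s, key=rank)[: len(s) - t + 1]
        candidates = {j for x in prefix for j in index[x]}
        for j in sorted(candidates):
            if len(s & sets[j]) >= t:    # verification keeps results exact
                pairs.append((j, i))
        for x in prefix:
            index[x].append(i)
    return pairs
```

Ordering rare elements first keeps posting lists for prefix elements short, which is the usual reason for that choice; any fixed global order would still be correct.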
IDENTIFYING SYNONYMS OF ENTITIES USING A DOCUMENT COLLECTION

Publication number: 20100313258

Abstract: Identifying synonyms of entities using a collection of documents is disclosed herein. In some aspects, a document from a collection of documents may be analyzed to identify hit sequences that include one or more tokens (e.g., words, numbers, etc.). The hit sequences may then be used to generate discriminating token sets (DTSs) that are subsets of both the hit sequences and the entity names. The DTSs are matched with corresponding entity names and then used to create DTS phrases by selecting text in the document adjacent to the DTS. The DTS phrases may be analyzed to determine whether the corresponding DTS is a synonym of the entity name. In various aspects, the tokens of an associated entity name that are present in the DTS phrases are used to generate a score for the DTS. When the score reaches a threshold, the DTS may be designated as a synonym. A list of synonyms may be generated for each entity name.

Type: Application

Filed: June 4, 2009

Publication date: December 9, 2010

Applicant: MICROSOFT CORPORATION

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
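The DTS scoring step can be sketched in Python as below. The window size, the score (average fraction of the entity's remaining tokens that appear around each DTS occurrence), and the threshold are assumptions chosen to illustrate the idea, not the application's exact formula; the DTS is assumed to be non-empty.

```python
from typing import List


def dts_score(entity: str, dts: str, documents: List[str], window: int = 5) -> float:
    """Average, over occurrences of the DTS, of the fraction of the entity's
    remaining tokens found within `window` tokens of the occurrence."""
    entity_toks = entity.lower().split()
    d = dts.lower().split()
    rest = set(entity_toks) - set(d)
    scores = []
    for doc in documents:
        toks = doc.lower().split()
        for i in range(len(toks) - len(d) + 1):
            if toks[i:i + len(d)] == d:  # an occurrence of the DTS
                phrase = set(toks[max(0, i - window): i + len(d) + window])
                scores.append(len(rest & phrase) / len(rest) if rest else 1.0)
    return sum(scores) / len(scores) if scores else 0.0


def is_synonym(entity: str, dts: str, documents: List[str], threshold: float = 0.4) -> bool:
    return dts_score(entity, dts, documents) >= threshold
```

For the entity `Canon EOS 350D digital camera`, the DTS `350D` scores 0.5 over the documents `the canon eos 350d is great` and `my 350d digital camera broke`, since each occurrence is surrounded by two of the four remaining entity tokens.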
Keyword Searching On Database Views

Publication number: 20100299367

Abstract: A keyword search is executed on a view of a database based on a Boolean keyword query. The view includes multiple text columns, and the keyword search is executed on each of the multiple text columns in the view. The output results from the keyword search on each of the text columns include tuple identifiers of one or more relevant tuples and a relevancy score for ranking the results of the keyword query.

Type: Application

Filed: May 20, 2009

Publication date: November 25, 2010

Applicant: Microsoft Corporation

Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
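A Boolean AND keyword query over a view with several text columns can be sketched in Python with one inverted index per column. Representing the view as a dict from tuple ID to row, scoring by summed term frequency, and the function names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

View = Dict[int, Dict[str, str]]  # tuple ID -> {column: text}


def build_column_indexes(view: View, text_cols: List[str]):
    """One inverted index per text column: term -> Counter{tuple_id: tf}."""
    indexes = {c: defaultdict(Counter) for c in text_cols}
    for tid, row in view.items():
        for c in text_cols:
            for term in row.get(c, "").lower().split():
                indexes[c][term][tid] += 1
    return indexes


def search_view(indexes, terms: List[str]) -> List[Tuple[int, int]]:
    """Boolean AND over the view: a tuple qualifies if every term occurs in at
    least one of its text columns. Score = total term frequency (assumption)."""
    wanted = {t.lower() for t in terms}
    scores: Counter = Counter()
    matched = defaultdict(set)
    for col_index in indexes.values():  # search each text column
        for term in wanted:
            for tid, tf in col_index.get(term, {}).items():
                scores[tid] += tf
                matched[tid].add(term)
    hits = [(tid, s) for tid, s in scores.items() if matched[tid] == wanted]
    return sorted(hits, key=lambda h: (-h[1], h[0]))  # rank by relevancy score
```

For a view whose tuple 1 has title `red apple pie` and body `apple recipe`, searching for `red` and `apple` after indexing both columns returns `[(1, 3)]`: tuple 1 matches both terms across its columns with three total occurrences.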
IDENTIFYING SYNONYMS OF ENTITIES USING WEB SEARCH

Publication number: 20100293179

Abstract: Identifying synonyms of entities using web search results is disclosed herein. In some aspects, a candidate string of tokens of an entity name is selected as a search term. A server transmits the search term to a search engine, which, in turn, performs a search and transmits the search results back to the server. The server analyzes the search results, generates a score based on them, and then determines a status (synonym or not a synonym) of the candidate string based on the score. In further aspects, additional candidate strings are designated as synonyms or non-synonyms based on the status of the searched candidate string, using relationships in a lattice formed from all possible candidate strings of the entity name.

Type: Application

Filed: May 14, 2009

Publication date: November 18, 2010

Applicant: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
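The lattice shortcut can be sketched in Python, assuming monotonicity: if a subset of an entity's tokens is a synonym, every larger subset containing it is too, so those candidates need no web search. The `score_fn` argument is a hypothetical stand-in for issuing the search and scoring its results, and only positive statuses are propagated here.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple


def lattice_synonyms(entity: str, score_fn: Callable[[str], float],
                     threshold: float) -> Tuple[List[str], int]:
    """Visit token subsets bottom-up. A candidate with an immediate sub-candidate
    already marked as a synonym is marked too, without a search (monotonicity
    assumption); otherwise score_fn is called."""
    toks = entity.split()
    status: Dict[FrozenSet[int], bool] = {}
    searches = 0
    for k in range(1, len(toks) + 1):
        for combo in combinations(range(len(toks)), k):  # preserves token order
            cand = frozenset(combo)
            if any(status.get(cand - {i}) for i in cand):
                status[cand] = True
                continue
            searches += 1
            status[cand] = score_fn(" ".join(toks[i] for i in combo)) >= threshold
    synonyms = [" ".join(toks[i] for i in sorted(c)) for c, ok in status.items() if ok]
    return synonyms, searches
```

With a scorer that accepts only `350d`, the entity `canon eos 350d camera` yields eight synonym candidates (every token subset containing `350d`) after eight searches instead of fifteen.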
Efficient evaluation of object finder queries

Patent number: 7730060

Abstract: The subject disclosure pertains to a class of object finder queries that return the best target objects that match a set of given keywords. Mechanisms are provided that facilitate identification of target objects related to search objects that match a set of query keywords. Scoring mechanisms/functions are also disclosed that compute relevance scores of target objects. Further, efficient early termination techniques are provided to compute the top K target objects based on a scoring function.

Type: Grant

Filed: June 9, 2006

Date of Patent: June 1, 2010

Assignee: Microsoft Corporation

Inventors: Kaushik Chakrabarti, Venkatesh Ganti, Dong Xin
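Early termination is easiest to see under a simple scoring function. This Python sketch assumes a target object's score is the maximum relevance of any matching search object linked to it, and that matching search objects arrive in descending relevance order, as an index might stream them; under that assumption the first k distinct targets reached are the top k. The patent's scoring functions may differ, and the names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def top_k_targets(ranked_docs: List[Tuple[str, float]],
                  links: Dict[str, List[str]], k: int) -> Tuple[List[Tuple[str, float]], int]:
    """Return the top-k (target, score) pairs and how many search objects were
    scanned. ranked_docs must be sorted by descending relevance."""
    top: List[Tuple[str, float]] = []
    seen: Set[str] = set()
    scanned = 0
    for doc, score in ranked_docs:
        scanned += 1
        for target in links.get(doc, []):
            if target not in seen:
                seen.add(target)
                top.append((target, score))  # max-aggregation: first hit is its score
                if len(top) == k:
                    return top, scanned      # early termination
    return top, scanned
```

With four ranked documents where the first two link to `acme` and `zeta`, asking for the top two targets stops after scanning two documents.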
Key profile computation and data pattern profile computation

Patent number: 7720883

Abstract: Architecture that provides a data profile computation technique employing key profile computation and data pattern profile computation. Key profile computation in a data table covers both exact keys and approximate keys, and is based on key strengths. A key strength of 100% indicates an exact key, and any lower percentage indicates an approximate key. The key strength is estimated based on the number of table rows that have duplicated attribute values. Only column sets whose key strength exceeds a threshold value are returned. Pattern profiling identifies a small set of regular expression patterns that best describe the patterns within a given set of attribute values. Pattern profiling includes three phases: a first phase for determining token regular expressions, a second phase for determining candidate regular expressions, and a third phase for identifying the candidate regular expressions that best match the attribute values.

Type: Grant

Filed: June 27, 2007

Date of Patent: May 18, 2010

Assignee: Microsoft Corporation

Inventors: Zhimin Chen, Venkatesh Ganti, Gunjan Jha, Shriraghav Kaushik, Vivek Narasayya
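Both profiles can be sketched in Python. Key strength is assumed here to be the number of distinct value combinations divided by the number of rows, which is 1.0 exactly for an exact key, and the pattern step is a much simpler single pass that generalizes each value by collapsing runs of digits and ASCII letters, rather than the three-phase process described above. Function names and defaults are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations, groupby
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def key_strength(rows: List[Row], cols: Sequence[str]) -> float:
    """Assumed definition: distinct value combinations / number of rows."""
    values = [tuple(r[c] for c in cols) for r in rows]
    return len(set(values)) / len(values)


def key_profile(rows: List[Row], max_cols: int = 2,
                threshold: float = 0.9) -> List[Tuple[Tuple[str, ...], float]]:
    """Column sets (up to max_cols wide) whose key strength exceeds threshold."""
    columns = list(rows[0])
    out = []
    for k in range(1, max_cols + 1):
        for cols in combinations(columns, k):
            s = key_strength(rows, cols)
            if s > threshold:
                out.append((cols, s))
    return out


def char_class(ch: str) -> str:
    if ch.isdigit():
        return r"\d"
    if ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)


def value_pattern(value: str) -> str:
    """Generalize a value to a regex by collapsing runs of one character class."""
    parts = []
    for cls, run in groupby(value, key=char_class):
        n = len(list(run))
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)


def pattern_profile(values: List[str], top: int = 1) -> List[Tuple[str, int]]:
    """Most common generalized patterns with their counts."""
    return Counter(value_pattern(v) for v in values).most_common(top)
```

On a small table where `id` is unique and `zip` repeats, `id` is reported as an exact key, `zip` alone is not, and lowering the threshold to 0.7 surfaces a column with strength 0.75 as an approximate key. For ZIP-like values, the dominant pattern comes out as `\d{5}`.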
Detecting duplicate records in databases

Patent number: 7685090

Abstract: The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches produce large numbers of false positives when used to identify domain-specific abbreviations and conventions. In accordance with the invention, a duplicate detection process is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key/foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.

Type: Grant

Filed: July 14, 2005

Date of Patent: March 23, 2010

Assignee: Microsoft Corporation

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Rohit Ananthakrishna
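A Python sketch of hierarchy-aware duplicate detection: tuples are compared only within groups that share the same parent, and a pair is flagged if either its own text or its set of children is similar enough. The thresholds and the use of Jaccard similarity for both signals are assumptions; the point is that child co-occurrence can catch pairs such as `USA` and `United States` that text similarity misses.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def detect_duplicates(names: Dict[str, str],
                      parent: Dict[str, Optional[str]],
                      children: Dict[str, Set[str]],
                      text_t: float = 0.5, child_t: float = 0.5) -> List[Tuple[str, str]]:
    """names: tuple id -> text; parent: tuple id -> parent id (None at the top
    level); children: tuple id -> set of child values in the next table down."""
    groups = defaultdict(list)
    for tid in names:
        groups[parent.get(tid)].append(tid)  # compare only siblings
    dups = []
    for ids in groups.values():
        for a, b in combinations(sorted(ids), 2):
            text_sim = jaccard(set(names[a].lower().split()), set(names[b].lower().split()))
            child_sim = jaccard(children.get(a, set()), children.get(b, set()))
            if text_sim >= text_t or child_sim >= child_t:
                dups.append((a, b))
    return dups
```

Restricting comparisons to siblings keeps the number of pairs small at each level, and a pair flagged at a higher level (for example, two country records) can then be treated as a single parent when the next level down is processed.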
QUERY-DRIVEN WEB PORTALS

Publication number: 20090327223

Abstract: The described implementations relate to query portals. One technique analyzes search results generated by a web search engine responsive to a user search query. The technique also dynamically generates a query portal that lists the search results as well as entities identified from the search results.

Type: Application

Filed: June 26, 2008

Publication date: December 31, 2009

Applicant: MICROSOFT CORPORATION

Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin, Sanjay Agrawal, Arnd Christian Konig
Scalable lookup-driven entity extraction from indexed document collections

Publication number: 20090319500

Abstract: A set of documents is filtered for entity extraction. A list of entity strings is received. A set of token sets that covers the entity strings in the list is determined. An inverted index generated on a first set of documents is queried using the set of token sets to determine a set of document identifiers for a subset of the documents in the first set. A second set of documents identified by the set of document identifiers is retrieved from the first set of documents. The second set of documents is filtered to include one or more documents of the second set that each includes a match with at least one entity string of the list of entity strings. Entity recognition may be performed on the filtered second set of documents.

Type: Application

Filed: June 24, 2008

Publication date: December 24, 2009

Applicant: Microsoft Corporation

Inventors: Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
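The filtering pipeline can be sketched in Python. It assumes the covering token set holds one token per entity string, chosen as the token with the shortest posting list; because every document that contains an entity must also contain that token, the index lookup cannot miss a match, and the verification pass removes documents that contain the token without the full entity. Names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in text.lower().split():
            index[tok].add(doc_id)
    return index


def covering_tokens(entities: List[str], index: Dict[str, Set[int]]) -> Set[str]:
    """One token per entity string: the one with the shortest posting list."""
    return {min(e.lower().split(), key=lambda t: len(index.get(t, ())))
            for e in entities}


def extract(docs: Dict[int, str], entities: List[str]) -> Tuple[Dict[int, List[str]], int]:
    index = build_inverted_index(docs)
    # Query the inverted index with the covering tokens to get candidate doc IDs.
    candidates = set().union(*(index.get(t, set()) for t in covering_tokens(entities, index)))
    norm = {e: " " + " ".join(e.lower().split()) + " " for e in entities}
    matches: Dict[int, List[str]] = {}
    for doc_id in sorted(candidates):
        text = " " + " ".join(docs[doc_id].lower().split()) + " "
        hits = [e for e in entities if norm[e] in text]  # verification pass
        if hits:
            matches[doc_id] = hits
    return matches, len(candidates)
```

Over four short documents, extracting `New York City` and `Seattle` retrieves only the two documents containing `city` or `seattle` before verification, and both are confirmed as matches.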
Designing record matching queries utilizing examples

Patent number: 7634464

Abstract: The subject disclosure pertains to a powerful and flexible framework for record matching. The framework facilitates design of a record matching query or package composed of a set of well-defined primitive operators (e.g., relational, data cleaning, etc.), which can ultimately be executed to match records. To assist design of such packages, a learning technique based on examples is provided. More specifically, a set of matching and non-matching record pairs can be input and employed to facilitate automatic package generation. A generated package can subsequently be transformed manually and/or automatically into a semantically equivalent form optimized for execution.

Type: Grant

Filed: June 14, 2006

Date of Patent: December 15, 2009

Assignee: Microsoft Corporation

Inventors: Bee-Chung Chen, Venkatesh Ganti, Kaushik Shriraghav
MEMBERSHIP CHECKING OF DIGITAL TEXT

Publication number: 20090300014

Abstract: The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.

Type: Application

Filed: June 3, 2008

Publication date: December 3, 2009

Applicant: Microsoft Corporation

Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
Segmentation of strings into structured records

Patent number: 7627567

Abstract: A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, a middle and an ending token topology for that attribute. A null token accounts for an empty attribute component, and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process parses an input record into a sequence of tokens. The process then determines the most probable segmentation of the input record by comparing its tokens with the state models derived for attributes from the reference table.

Type: Grant

Filed: April 14, 2004

Date of Patent: December 1, 2009

Assignee: Microsoft Corporation

Inventors: Venkatesh Ganti, Vassilakis Theodore, Yevgeny Agichtein
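A greatly simplified version of the segmentation step can be written as a dynamic program in Python. It assumes a single unigram token model per attribute with add-one smoothing (the `vocab_size` constant is a hypothetical choice), a fixed attribute order, and empty spans standing in for null tokens; it omits the beginning/middle/ending state topology and the state copying that handles misordering.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Model = Tuple[Counter, int]  # token counts and total tokens for one attribute


def train(reference: List[Dict[str, str]], attrs: List[str]) -> Dict[str, Model]:
    models = {}
    for a in attrs:
        counts = Counter(t for row in reference for t in row[a].lower().split())
        models[a] = (counts, sum(counts.values()))
    return models


def token_logp(model: Model, tok: str, vocab_size: int = 1000) -> float:
    counts, total = model
    return math.log((counts[tok] + 1) / (total + vocab_size))  # add-one smoothing


def segment(text: str, attrs: List[str], models: Dict[str, Model]) -> Dict[str, str]:
    """Split text into contiguous (possibly empty) spans, one per attribute in
    order, maximizing the total log-probability."""
    toks = text.lower().split()
    n, m = len(toks), len(attrs)
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]  # best[i][j]: toks[:i] -> attrs[:j]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, m + 1):
        model = models[attrs[j - 1]]
        for i in range(n + 1):
            for k in range(i + 1):
                if best[k][j - 1] == neg:
                    continue
                score = best[k][j - 1] + sum(token_logp(model, t) for t in toks[k:i])
                if score > best[i][j]:
                    best[i][j], back[i][j] = score, k
    result, i = {}, n
    for j in range(m, 0, -1):  # follow back-pointers to recover the spans
        k = back[i][j]
        result[attrs[j - 1]] = " ".join(toks[k:i])
        i = k
    return result
```

Trained on three company records with name, city, and state columns, segmenting `Microsoft Corp Seattle WA` assigns `microsoft corp` to name, `seattle` to city, and `wa` to state.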
