Patents Assigned to POWERSET, INC.

Semi-Automatic Example-Based Induction of Semantic Translation Rules to Support Natural Language Search

Publication number: 20090138454

Abstract: Technologies are described herein for generating a semantic translation rule to support natural language search. In one method, a first expression and a second expression are received. A first representation is generated based on the first expression, and a second representation is generated based on the second expression. Aligned pairs of a first term in the first representation and a second term in the second representation are determined. For each aligned pair, the first term and the second term are replaced with a variable associated with the aligned pair. Word facts that occur in both the first representation and the second representation are removed from the first representation and the second representation. The remaining word facts in the first representation are replaced with a broader representation of the word facts. The translation rule including the first representation, an operator, and the second semantic representation is generated.

Type: Application

Filed: August 29, 2008

Publication date: May 28, 2009

Applicant: POWERSET, INC.

Inventors: Emmanuel Rayner, Richard Crouch, Hannah Copperman, Giovanni Lorenzo Thione, Martin Henk Van den Berg

Efficient Storage and Retrieval of Posting Lists

Publication number: 20090132521

Abstract: A role tree having nodes corresponding to semantic roles in a hierarchy is defined. A posting list is generated for each association of a term and a semantic role in the hierarchy. The posting lists are stored contiguously on a physical storage medium such that a subtree of the hierarchy of semantic roles can be loaded from the storage medium as a single contiguous block. The posting lists for a subtree of the hierarchy are retrieved by obtaining data identifying the beginning location on the physical storage medium of the posting lists for the term at the top of a desired subtree of the hierarchy and data identifying the length of the posting lists of the desired subtree of the hierarchy. A single contiguous block that includes the posting lists for the desired subtree of the hierarchy is then retrieved from the beginning location through the specified length.
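
The contiguous layout described in this abstract can be illustrated with a minimal sketch. It assumes the posting lists for one term are written in pre-order over the role tree, so every subtree occupies a single span that is addressable by a start offset and a length. The names `RoleNode`, `layout`, and `load_subtree`, the in-memory list standing in for the physical storage medium, and the sample roles are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RoleNode:
    """One semantic role in the (hypothetical) role hierarchy."""
    name: str
    children: List["RoleNode"] = field(default_factory=list)


def layout(root: RoleNode, postings: Dict[str, List[int]]):
    """Write posting lists in pre-order so each subtree is one contiguous span.

    Returns (storage, directory), where directory[role] = (start, length)
    covers the posting lists of that role and all of its descendants.
    """
    storage: List[int] = []          # stands in for the physical medium
    directory: Dict[str, Tuple[int, int]] = {}

    def visit(node: RoleNode) -> None:
        start = len(storage)
        storage.extend(postings.get(node.name, []))
        for child in node.children:
            visit(child)
        directory[node.name] = (start, len(storage) - start)

    visit(root)
    return storage, directory


def load_subtree(storage, directory, role):
    """Fetch every posting under `role` with a single contiguous read."""
    start, length = directory[role]
    return storage[start:start + length]


# Hypothetical hierarchy: role -> {agent -> {subject}, location}
tree = RoleNode("role", [RoleNode("agent", [RoleNode("subject")]),
                         RoleNode("location")])
term_postings = {"agent": [3, 7], "subject": [1, 3], "location": [9]}

storage, directory = layout(tree, term_postings)
agent_block = load_subtree(storage, directory, "agent")
```

Here a request for the `agent` subtree reads the postings for `agent` and `subject` in one slice, without touching `location`.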

Type: Application

Filed: August 29, 2008

Publication date: May 21, 2009

Applicant: POWERSET, INC.

Inventors: Chad Walters, Giovanni Lorenzo Thione, Barney Pell, Lukas Biewald, Brendan O'Connor

Efficiently Representing Word Sense Probabilities

Publication number: 20090094019

Abstract: Word sense probabilities are compressed for storage in a semantic index. Each word sense for a word is mapped to one of a number of “buckets” by assigning a bucket score to the word sense. A scoring function is utilized to assign the bucket scores that maximizes the entropy of the assigned bucket scores. Once the bucket scores have been assigned to the word senses, the bucket scores are stored in the semantic index. The bucket scores stored in the semantic index may be utilized to prune one or more of the word senses prior to construction of the semantic index. The bucket scores may also be utilized to prune and rank the word senses at the time a query is performed using the semantic index.
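
One way to picture an entropy-maximizing bucket assignment is equal-frequency (quantile) bucketing, since spreading the word senses evenly across buckets maximizes the entropy of the assigned scores. The sketch below makes that assumption. The abstract does not state which scoring function is used, and the function names and sample probabilities are hypothetical.

```python
import bisect
import math
from collections import Counter


def quantile_boundaries(probs, num_buckets):
    """Pick boundaries so each bucket receives roughly the same number of senses."""
    ordered = sorted(probs)
    n = len(ordered)
    return [ordered[(i * n) // num_buckets] for i in range(1, num_buckets)]


def bucket_score(prob, boundaries):
    """Map a probability to a small integer bucket score."""
    return bisect.bisect_right(boundaries, prob)


def entropy(scores):
    """Shannon entropy (in bits) of the distribution of bucket scores."""
    counts = Counter(scores)
    total = len(scores)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical word sense probabilities gathered while building an index.
sense_probs = [0.01, 0.02, 0.05, 0.10, 0.20, 0.40, 0.70, 0.90]
boundaries = quantile_boundaries(sense_probs, num_buckets=4)
scores = [bucket_score(p, boundaries) for p in sense_probs]
bits = entropy(scores)
```

With four buckets, each score fits in two bits, and an even spread of senses across the buckets uses those two bits as fully as possible.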

Type: Application

Filed: August 29, 2008

Publication date: April 9, 2009

Applicant: POWERSET, INC.

Inventors: Rion Snow, Giovanni Lorenzo Thione, Scott A. Waterman, Chad Walters, Timothy Converse

Natural Language Hypernym Weighting For Word Sense Disambiguation

Publication number: 20090089047

Abstract: Technologies are described herein for probabilistically assigning weights to word senses and hypernyms of a word. The weights can be used in natural language processing applications such as information indexing and querying. A word hypernym weight (WHW) score can be determined by summing word sense probabilities of word senses from which the hypernym is inherited. WHW scores can be used to prune away hypernyms prior to indexing, to rank query results, and for other functions related to information indexing and querying. A semantic search technique can use WHW scores to retrieve an entry related to a word from an index in response to matching an indexed hypernym of the word with a query term applied to the index. More refined and accurate query results may be provided based on reduced user inputs.
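
The WHW score described in this abstract is a sum of word sense probabilities, which the following minimal sketch illustrates. The sense names, probabilities, hypernym lists, and pruning threshold are invented for the example, and the helper names are hypothetical.

```python
from collections import defaultdict


def word_hypernym_weights(sense_probs, sense_hypernyms):
    """Sum the probability of every word sense from which a hypernym is inherited."""
    weights = defaultdict(float)
    for sense, prob in sense_probs.items():
        for hypernym in sense_hypernyms.get(sense, ()):
            weights[hypernym] += prob
    return dict(weights)


def prune(weights, threshold):
    """Drop hypernyms whose WHW score falls below the threshold (assumed policy)."""
    return {h: w for h, w in weights.items() if w >= threshold}


# Hypothetical senses of the word "bank" with their probabilities.
sense_probs = {"bank/institution": 0.75, "bank/riverside": 0.25}
sense_hypernyms = {
    "bank/institution": ("organization", "entity"),
    "bank/riverside": ("slope", "entity"),
}

whw = word_hypernym_weights(sense_probs, sense_hypernyms)
kept = prune(whw, threshold=0.5)
```

A hypernym shared by several senses, such as `entity` here, accumulates weight from each of them, while a hypernym reachable only through an unlikely sense is a candidate for pruning before indexing.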

Type: Application

Filed: August 29, 2008

Publication date: April 2, 2009

Applicant: POWERSET, INC.

Inventors: Barney Pell, Rion Snow, Scott A. Waterman

Calculating Valence Of Expressions Within Documents For Searching A Document Index

Publication number: 20090077069

Abstract: Tools and techniques are described that relate to calculating the valence of expressions within documents. These tools may provide methods that include receiving input documents for processing and extracting expressions from the documents for valence analysis, with scope relationships occurring between terms contained in the expressions. The methods may calculate valences of the expressions based on the scope relationships between terms in the expressions.
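
A scope-based valence calculation might look like the sketch below. It assumes that an expression is either a single word with a base valence or a shifter (such as a negation or an intensifier) that takes another expression in its scope. The abstract does not specify the actual calculation; the lexicon, the shifter values, and the tuple encoding are all hypothetical.

```python
# Hypothetical base valences for individual terms.
LEXICON = {"good": 1.0, "bad": -1.0, "great": 2.0}

# Hypothetical shifters: each multiplies the valence of whatever is in its scope.
SHIFTERS = {"not": -1.0, "very": 1.5}


def valence(expr):
    """Compute valence for a word (str) or a (shifter, scoped_expression) pair."""
    if isinstance(expr, str):
        return LEXICON.get(expr, 0.0)   # unknown words are treated as neutral
    shifter, scoped = expr
    return SHIFTERS[shifter] * valence(scoped)


# "not very good": "not" takes scope over "very good".
not_very_good = valence(("not", ("very", "good")))
```

Because the result depends on which term scopes over which, "not very good" receives a negative valence even though it contains a positive word.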

Type: Application

Filed: August 29, 2008

Publication date: March 19, 2009

Applicant: POWERSET, INC.

Inventors: Livia Polanyi, Martin Henk Van den Berg, Barney Pell

Coreference Resolution In An Ambiguity-Sensitive Natural Language Processing System

Publication number: 20090076799

Abstract: Technologies are described herein for coreference resolution in an ambiguity-sensitive natural language processing system. Techniques for integrating reference resolution functionality into a natural language processing system can process documents to be indexed within an information search and retrieval system. Ambiguity awareness features, as well as ambiguity resolution functionality, can operate in coordination with coreference resolution. Annotation of coreference entities, as well as ambiguous interpretations, can be supported by in-line markup within text content or by external entity maps. Information expressed within documents can be formally organized in terms of facts, or relationships between entities in the text. Expansion can support applying multiple aliases, or ambiguities, to an entity being indexed so that all of the possible references or interpretations for that entity are captured into the index.

Type: Application

Filed: August 29, 2008

Publication date: March 19, 2009

Applicant: POWERSET, INC.

Inventors: Richard Crouch, Martin Henk Van den Berg, Franco Salvetti, Giovanni Lorenzo Thione, David Ahn

Checkpointing Iterators During Search

Publication number: 20090070308

Abstract: Tools and techniques are described herein for checkpointing iterators during search. These tools may provide methods that include instantiating iterators in response to a search request. The iterators include fixed state information that remains constant over the life of the iterator, and further include dynamic state information that is updated over the life of the iterator. The iterators traverse through posting lists in connection with performing the search request. As the iterators traverse the posting lists, the iterators may update their dynamic state information. The iterators may then evaluate whether to create checkpoints, with the checkpoints including representations of the dynamic state information.
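
The split between fixed and dynamic iterator state can be sketched as follows. The example assumes a simple policy of checkpointing every few postings, whereas the abstract only says that the iterators evaluate whether to create checkpoints. The class name, the `checkpoint_every` parameter, and the `restore` method are hypothetical.

```python
class PostingIterator:
    """Iterates a posting list and periodically checkpoints its dynamic state."""

    def __init__(self, postings, checkpoint_every):
        # Fixed state: constant over the life of the iterator.
        self._postings = postings
        self._checkpoint_every = checkpoint_every
        # Dynamic state: updated as the iterator advances.
        self._position = 0
        # Saved snapshots of the dynamic state only.
        self.checkpoints = []

    def __iter__(self):
        return self

    def __next__(self):
        if self._position >= len(self._postings):
            raise StopIteration
        doc_id = self._postings[self._position]
        self._position += 1
        # Assumed policy: checkpoint after every N postings consumed.
        if self._position % self._checkpoint_every == 0:
            self.checkpoints.append({"position": self._position})
        return doc_id

    def restore(self, checkpoint):
        """Rewind to a checkpoint; the fixed state needs no restoring."""
        self._position = checkpoint["position"]


it = PostingIterator([2, 5, 8, 13, 21], checkpoint_every=2)
first_three = [next(it) for _ in range(3)]
it.restore(it.checkpoints[-1])
resumed = next(it)
```

Because a checkpoint holds only the dynamic state, it stays small, and the iterator can resume from the saved position without rescanning the posting list from the start.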

Type: Application

Filed: August 29, 2008

Publication date: March 12, 2009

Applicant: POWERSET, INC.

Inventors: Chad Walters, Lukas Biewald, Nitay Joffe, Andrew Alan James

Iterators for Applying Term Occurrence-Level Constraints in Natural Language Searching

Publication number: 20090070298

Abstract: Tools and techniques are described that relate to iterators for applying term occurrence-level constraints in natural language searching. These tools may receive a natural language input query, and define term occurrence-level constraints applicable to the input query. The methods may also identify facts requested in the input query, and may instantiate an iterator to traverse a fact index to identify candidate facts responsive to the input query. This iterator may traverse through at least a portion of the fact index. The methods may receive candidate facts from this iterator, with these candidate facts including terms, referred to as term-level occurrences. The methods may apply the term occurrence-level constraints to the term-level occurrences. The methods may select the candidate fact for inclusion in search results for the input query, based at least in part on applying the term occurrence-level constraint.

Type: Application

Filed: August 29, 2008

Publication date: March 12, 2009

Applicant: POWERSET, INC.

Inventors: Giovanni Lorenzo Thione, Barney Pell, Chad Walters, Richard Crouch

BROWSING KNOWLEDGE ON THE BASIS OF SEMANTIC RELATIONS

Publication number: 20090070322

Abstract: Computer-readable media and computer systems are provided for conducting semantic processes to facilitate navigation of search results that include sets of tuples representing facts associated with content of documents in response to queries for information. Content of documents is accessed and semantic structures are derived by distilling linguistic representations from the content. Groups of two or more related words, called tuples, are extracted from the documents or the semantic structures. Tuples can be stored at a tuple index. Representations of the relational tuples are displayed in addition to documents retrieved in response to a query.

Type: Application

Filed: August 29, 2008

Publication date: March 12, 2009

Applicant: Powerset, Inc.

Inventors: FRANCO SALVETTI, GIOVANNI LORENZO THIONE, RICHARD S. CROUCH, DAVID AHN, LUKAS A. BIEWALD, BRENDAN O'CONNOR, BARNEY D. PELL

FACT-BASED INDEXING FOR NATURAL LANGUAGE SEARCH

Publication number: 20090063550

Abstract: Computer-readable media and a computer system for implementing a natural language search using fact-based structures and for generating such fact-based structures are provided. A fact-based structure is generated using a semantic structure, which represents information, such as text, from a document, such as a web page. Typically, a natural language parser is used to create a semantic structure of the information, and the parser identifies terms, as well as the relationship between the terms. A fact-based structure of a semantic structure allows for a linear structure of these terms and their relationships to be created, while also maintaining identifiers of the terms to convey the dependency of one fact-based structure on another fact-based structure. Additionally, synonyms and hypernyms are identified while generating the fact-based structure to improve the accuracy of the overall search.

Type: Application

Filed: August 29, 2008

Publication date: March 5, 2009

Applicant: POWERSET, INC.

Inventors: MARTIN HENK VAN DEN BERG, DANIEL BOBROW, ROBERT D. CHESLOW, BARNEY D. PELL, GIOVANNI LORENZO THIONE, CHAD WATERS

INDEXING ROLE HIERARCHIES FOR WORDS IN A SEARCH INDEX

Publication number: 20090063473

Abstract: Methods, systems and computer readable media for finding documents in a data store that match a natural language query submitted by a user are provided. The documents and queries are matched by determining that words within the query have the same relationship to each other as the same words in the document. Documents are semantically analyzed and words in the document are indexed along with the role the word plays in a sentence. The initial semantic role may be generalized using a role hierarchy and stored in the index along with the original role. A similar analysis may be used with the search query to find words used in the same role in both the query and the document.
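
The idea of storing a word under both its original role and its generalized roles can be sketched as below. The role hierarchy, the role names, and the helper functions are hypothetical; the sketch assumes a simple parent map and an in-memory inverted index keyed by (word, role).

```python
# Hypothetical role hierarchy: each role maps to its more general parent.
ROLE_PARENT = {"subject": "agent", "agent": "role",
               "object": "theme", "theme": "role"}


def generalizations(role):
    """Return the role followed by each of its ancestors in the hierarchy."""
    chain = [role]
    while role in ROLE_PARENT:
        role = ROLE_PARENT[role]
        chain.append(role)
    return chain


def index_document(index, doc_id, word_roles):
    """Index each word under its original role and every generalized role."""
    for word, role in word_roles:
        for general_role in generalizations(role):
            index.setdefault((word, general_role), set()).add(doc_id)


def search(index, word_roles):
    """Return documents in which every query word plays the requested role."""
    results = None
    for word, role in word_roles:
        docs = index.get((word, role), set())
        results = set(docs) if results is None else results & docs
    return results or set()


index = {}
index_document(index, 1, [("dog", "subject"), ("man", "object")])  # "dog bites man"
index_document(index, 2, [("man", "subject"), ("dog", "object")])  # "man bites dog"

dog_as_agent = search(index, [("dog", "agent"), ("man", "theme")])
dog_any_role = search(index, [("dog", "role")])
```

A query that asks for `dog` in the more general `agent` role matches only the first document, while a query at the root of the hierarchy matches both.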

Type: Application

Filed: August 29, 2008

Publication date: March 5, 2009

Applicant: Powerset, Inc.

Inventors: Martin HENK VAN DEN BERG, Richard S. CROUCH, Giovanni L. THIONE, Chad P. WALTERS

IDENTIFICATION OF SEMANTIC RELATIONSHIPS WITHIN REPORTED SPEECH

Publication number: 20090063426

Abstract: Methods and computer-readable media for associating words or groups of words distilled from content, such as reported speech or an attitude report, of a document to form semantic relationships collectively used to generate a semantic representation of the content are provided. Semantic representations may include elements identified or parsed from a text portion of the content, the elements of which may be associated with other elements that share a semantic relationship, such as an agent, location, or topic relationship. Relationships may also be developed by associating one element that is in relation to, or is about, another element, thereby allowing for rapid and effective comparison of associations found in a semantic representation with associations derived from queries. The semantic relationships may be determined based on semantic information, such as potential meanings and grammatical functions of each element within the text portion of the content.

Type: Application

Filed: August 29, 2008

Publication date: March 5, 2009

Applicant: POWERSET, INC.

Inventors: RICHARD S. CROUCH, MARTIN HENK VAN DEN BERG, DAVID AHN, OLGA GUREVICH, BARNEY D. PELL, LIVIA POLANYI, SCOTT A. PREVOST, GIOVANNI LORENZO THIONE