Abstract: One embodiment of the present invention provides a system that automatically generates synonyms for words from documents. During operation, this system determines co-occurrence frequencies for pairs of words in the documents. The system also determines closeness scores for pairs of words in the documents, wherein a closeness score indicates whether a pair of words are located so close to each other that the words are likely to occur in the same sentence or phrase. Finally, the system determines whether pairs of words are synonyms based on the determined co-occurrence frequencies and the determined closeness scores. While making this determination, the system can additionally consider correlations between words in a title or an anchor of a document and words in the document as well as word-form scores for pairs of words in the documents.
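The two signals named in this abstract can be illustrated with a minimal sketch. The abstract does not say how the scores are computed or combined, so everything here is an assumption made for illustration: co-occurrence is counted per document, closeness is the fraction of shared documents in which the two words fall within a token window, and a pair is kept as a synonym candidate when it co-occurs often but is rarely close. The function names, the `window` size, and the thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_and_closeness(docs, window=5):
    """Return (co-occurrence counts, closeness scores) for word pairs.

    Assumed definitions (not taken from the patent):
    - co-occurrence: number of documents that contain both words
    - closeness: share of those documents in which the two words
      appear within `window` tokens of each other
    """
    cooc = Counter()
    close = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        positions = {}
        for i, tok in enumerate(tokens):
            positions.setdefault(tok, []).append(i)
        # Pairs are stored in alphabetical order so (a, b) == (b, a).
        for a, b in combinations(sorted(positions), 2):
            cooc[(a, b)] += 1
            if any(abs(i - j) <= window
                   for i in positions[a] for j in positions[b]):
                close[(a, b)] += 1
    closeness = {pair: close[pair] / n for pair, n in cooc.items()}
    return cooc, closeness


def synonym_candidates(docs, min_cooc=2, max_closeness=0.5, window=5):
    """Keep pairs that share documents often but seldom sit side by side."""
    cooc, closeness = cooccurrence_and_closeness(docs, window)
    return sorted(pair for pair, n in cooc.items()
                  if n >= min_cooc and closeness[pair] <= max_closeness)
```

A real system would also fold in the title/anchor correlations and word-form scores that the abstract mentions; those are left out of this sketch.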
Abstract: In various example embodiments, a system and method for providing a query linguistic service are described. An initial query term set is received. Phrase recognition is performed on the initial query term set to determine recognized phrases. One or more synonyms are then determined for each recognized phrase. Finally, results are determined that match the initial query term set and any synonyms selected from those determined.
Type:
Application
Filed:
March 5, 2010
Publication date:
September 9, 2010
Inventors:
Karin Maugé, Radoslav Valentinov Petranov, Jean-David Ruvini, Antoniya T. Statelova, Neelakantan Sundaresan
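The query flow in the abstract above (receive terms, recognize phrases, look up synonyms, match results) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the filed method: phrase recognition is done by greedy longest match against a hypothetical `known_phrases` set, synonyms come from a hypothetical `synonyms` dictionary, every synonym is treated as selected, and matching is plain whole-word containment.

```python
def recognize_phrases(terms, known_phrases, max_len=3):
    """Greedy longest-match grouping of query terms into phrases.

    A single term always counts as a phrase, so every term is covered.
    """
    phrases = []
    i = 0
    while i < len(terms):
        for n in range(min(max_len, len(terms) - i), 0, -1):
            candidate = " ".join(terms[i:i + n])
            if n == 1 or candidate in known_phrases:
                phrases.append(candidate)
                i += n
                break
    return phrases


def expand_query(terms, known_phrases, synonyms):
    """One set of acceptable alternatives per recognized phrase."""
    return [{phrase, *synonyms.get(phrase, ())}
            for phrase in recognize_phrases(terms, known_phrases)]


def search(items, terms, known_phrases, synonyms):
    """Return items containing every phrase or one of its synonyms."""
    groups = expand_query(terms, known_phrases, synonyms)
    results = []
    for item in items:
        padded = f" {item.lower()} "  # pad so only whole words match
        if all(any(f" {alt} " in padded for alt in group)
               for group in groups):
            results.append(item)
    return results
```

For example, with `known_phrases = {"ipod touch"}`, the terms `["ipod", "touch", "case"]` are grouped into the phrases `"ipod touch"` and `"case"` before any synonym lookup happens, which keeps `"touch"` from being expanded on its own.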
Abstract: A stemming framework for combining stemming algorithms in a multilingual environment to obtain better stemming behavior than any individual stemming algorithm, together with a new language-independent stemming algorithm based on shortest-path techniques. The stemmer treats stemming as a simple instance of the shortest-path problem, in which the cost of each path is computed from its word components and their number of characters. The goal of the stemmer is to find the shortest path that constructs the entire word. For maximum speed, the stemmer uses dynamic dictionaries, constructed as lexical analyzer state-transition tables, to recognize the allowable word parts for any given language. The stemming framework provides the logic needed to run multiple stemmers in parallel and merge their results to obtain the best behavior. Mapping dictionaries handle irregular plurals, tense, phrase mapping, and proper-name recognition.
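The shortest-path view of stemming described above can be sketched as a search over character positions, where each recognized word part is an edge. The abstract does not give the cost function or the dictionary format, so this sketch makes assumptions: the `BASE_COST` table, the rule that cost equals base cost divided by part length, and the plain Python sets standing in for the state-transition-table dictionaries are all hypothetical. Because edges only run left to right, a single dynamic-programming pass finds the shortest path.

```python
import math

# Hypothetical base costs per word-part type; unknown single characters
# are expensive so that dictionary parts are preferred.
BASE_COST = {"root": 1.0, "prefix": 2.0, "suffix": 2.0, "unknown": 10.0}


def segment(word, roots, prefixes, suffixes):
    """Cheapest split of `word` into (kind, piece) parts.

    Assumed cost of a part: BASE_COST[kind] / len(piece), so longer
    recognized parts are cheaper.
    """
    n = len(word)
    # best[i] holds (cost, parts) for the cheapest path covering word[:i]
    best = [(math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        cost_i, parts_i = best[i]
        for j in range(i + 1, n + 1):
            piece = word[i:j]
            kinds = [kind for kind, table in (("root", roots),
                                              ("prefix", prefixes),
                                              ("suffix", suffixes))
                     if piece in table]
            if not kinds and len(piece) == 1:
                kinds = ["unknown"]  # fallback keeps every position reachable
            for kind in kinds:
                cost = cost_i + BASE_COST[kind] / len(piece)
                if cost < best[j][0]:
                    best[j] = (cost, parts_i + [(kind, piece)])
    return best[n][1]


def stem(word, roots, prefixes, suffixes):
    """Return the longest root on the cheapest path, or the word itself."""
    found = [piece for kind, piece in segment(word, roots, prefixes, suffixes)
             if kind == "root"]
    return max(found, key=len) if found else word
```

This sketch covers a single stemmer only. The framework logic for running several stemmers in parallel and merging their results, and the mapping dictionaries for irregular forms, are not modeled here.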