Syntactic Pre-processing Steps, e.g., Stopword Elimination, Stemming, etc. (EPO) Patents (Class 707/E17.072)

Document-based synonym generation

Patent number: 8161041

Abstract: One embodiment of the present invention provides a system that automatically generates synonyms for words from documents. During operation, this system determines co-occurrence frequencies for pairs of words in the documents. The system also determines closeness scores for pairs of words in the documents, wherein a closeness score indicates whether a pair of words are located so close to each other that the words are likely to occur in the same sentence or phrase. Finally, the system determines whether pairs of words are synonyms based on the determined co-occurrence frequencies and the determined closeness scores. While making this determination, the system can additionally consider correlations between words in a title or an anchor of a document and words in the document as well as word-form scores for pairs of words in the documents.
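The two signals named in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, counts co-occurrence per document, and treats "closeness" as the fraction of co-occurring documents in which the two words fall within a fixed token window. The function name `pair_statistics`, the `window` parameter, and these definitions are illustrative assumptions, not the scoring described in the patent, and the rule for combining the scores into a synonym decision is deliberately left out.

```python
from collections import Counter
from itertools import combinations


def pair_statistics(documents, window=3):
    """Return (co_occurrence, closeness) for word pairs.

    co_occurrence[pair]: number of documents containing both words.
    closeness[pair]: fraction of those documents in which the two words
    appear within `window` tokens of each other at least once.
    Both definitions are assumptions made for this sketch.
    """
    co_occurrence = Counter()
    close_docs = Counter()
    for text in documents:
        tokens = text.lower().split()
        # Record every position at which each word occurs.
        positions = {}
        for index, token in enumerate(tokens):
            positions.setdefault(token, []).append(index)
        # Pairs are keyed in alphabetical order so (a, b) is unambiguous.
        for a, b in combinations(sorted(positions), 2):
            co_occurrence[(a, b)] += 1
            if any(abs(i - j) <= window
                   for i in positions[a] for j in positions[b]):
                close_docs[(a, b)] += 1
    closeness = {pair: close_docs[pair] / count
                 for pair, count in co_occurrence.items()}
    return co_occurrence, closeness


docs = [
    "cheap car for sale",
    "cheap automobile for sale today",
    "car and automobile repair",
]
co_occurrence, closeness = pair_statistics(docs, window=2)
```

A pair such as ("cheap", "sale") co-occurs in two documents but never within two tokens, so its closeness under these assumptions is zero; how such values would feed a synonym decision depends on details the abstract does not give.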

Type: Grant

Filed: February 10, 2011

Date of Patent: April 17, 2012

Assignee: Google Inc.

Inventors: Oleksandr Grushetskyy, Steven D. Baker

SYSTEM AND METHOD TO PROVIDE QUERY LINGUISTIC SERVICE

Publication number: 20100228762

Abstract: In various example embodiments, a system and method to provide query linguistic service is provided. An initial query term set is received. Phrase recognition is performed on the initial query term set to determine recognized phrases. Using the determined recognized phrases, one or more synonyms for each of the recognized phrases are determined. Results matching the initial query term set and any selected synonyms from the determined one or more synonyms are determined.
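The flow in this abstract (term set, phrase recognition, synonym lookup per phrase) might look roughly like the sketch below. It assumes a greedy longest-match recognizer over a hypothetical phrase dictionary and a hypothetical synonym table; `recognize_phrases`, `expand_query`, `known_phrases`, and `synonyms` are names invented for illustration, and the application itself may recognize phrases quite differently.

```python
def recognize_phrases(terms, known_phrases, max_len=3):
    """Greedy longest-match segmentation of query terms into phrases.

    Any term that starts no known multi-word phrase is kept on its own.
    """
    phrases = []
    i = 0
    while i < len(terms):
        # Try the longest candidate first, falling back to a single term.
        for length in range(min(max_len, len(terms) - i), 0, -1):
            candidate = " ".join(terms[i:i + length])
            if length == 1 or candidate in known_phrases:
                phrases.append(candidate)
                i += length
                break
    return phrases


def expand_query(terms, known_phrases, synonyms):
    """Map each recognized phrase to itself plus any listed synonyms."""
    return {phrase: [phrase] + synonyms.get(phrase, [])
            for phrase in recognize_phrases(terms, known_phrases)}


expanded = expand_query(
    ["cell", "phone", "case"],
    known_phrases={"cell phone"},            # hypothetical dictionary
    synonyms={"cell phone": ["mobile phone"]},  # hypothetical table
)
```

Recognizing "cell phone" as one unit before looking up synonyms avoids expanding "cell" and "phone" separately, which is presumably the point of running phrase recognition first.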

Type: Application

Filed: March 5, 2010

Publication date: September 9, 2010

Inventors: Karin Mauge', Radoslav Valentinov Petranov, Jean-David Ruvini, Antoniya T. Statelova, Neelakantan Sundaresan

LANGUAGE INDEPENDENT STEMMING

Publication number: 20080228748

Abstract: A stemming framework for combining stemming algorithms together in a multilingual environment to obtain improved stemming behavior over any individual stemming algorithm, together with a new language independent stemming algorithm based on shortest path techniques. The stemmer essentially treats the stemming problem as a simple instance of the shortest path problem where the cost for each path can be computed from its word component and its number of characters. The goal of the stemmer is to find the shortest path to construct the entire word. The stemmer uses dynamic dictionaries constructed as lexical analyzer state transition tables to recognize the various allowable word parts for any given language in order to obtain maximum speed. The stemming framework provides the necessary logic to combine multiple stemmers in parallel and to merge their results to obtain the best behavior. Mapping dictionaries handle irregular plurals, tense, phrase mapping and proper name recognition.
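Treating stemming as a shortest path problem can be sketched as follows. Character positions in the word act as graph nodes, each recognized word part is an edge with a cost, and the cheapest path from position 0 to the end gives the segmentation. The `parts` dictionary, the per-part costs, and the fallback cost for unrecognized characters are all hypothetical values chosen for illustration; the application's actual cost function and dictionary format are not given in the abstract.

```python
def segment_word(word, parts, unknown_char_cost=10):
    """Cheapest segmentation of `word` into known parts.

    `parts` maps fragment -> (kind, cost). Positions only move forward,
    so one left-to-right pass solves the shortest path problem.
    """
    n = len(word)
    best = [float("inf")] * (n + 1)   # best[i]: cheapest cost to cover word[:i]
    back = [None] * (n + 1)           # back[i]: (start, fragment, kind) of last edge
    best[0] = 0
    for start in range(n):
        for end in range(start + 1, n + 1):
            fragment = word[start:end]
            if fragment in parts:
                kind, cost = parts[fragment]
            elif end == start + 1:
                # Fallback edge: skip one unrecognized character at high cost.
                kind, cost = "unknown", unknown_char_cost
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = (start, fragment, kind)
    # Walk the back-pointers to recover the path.
    path = []
    position = n
    while position > 0:
        start, fragment, kind = back[position]
        path.append((fragment, kind))
        position = start
    return list(reversed(path)), best[n]


parts = {  # hypothetical word-part dictionary with hypothetical costs
    "un": ("prefix", 1),
    "happi": ("root", 2),
    "happy": ("root", 2),
    "ness": ("suffix", 1),
}
path, cost = segment_word("unhappiness", parts)
```

In this sketch the stem would be read off as the fragment tagged "root"; mapping an inflected root such as "happi" back to a dictionary form is a separate step that the abstract assigns to mapping dictionaries.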

Type: Application

Filed: March 16, 2007

Publication date: September 18, 2008

Inventor: John Fairweather
