Patents by Inventor Andi Wu

Andi Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and apparatus for expanding dictionaries during parsing

Patent number: 7158930

Abstract: A method is provided for parsing text in a corpus. The method includes hypothesizing a possible new entry for a dictionary based on a first segment of text. A successful parse is then formed for the first segment of text using the possible new entry. Based on the successful parse, the dictionary is changed to include the new entry. The new entry in the dictionary is then used to parse a second segment of text.

Type: Grant

Filed: August 15, 2002

Date of Patent: January 2, 2007

Assignee: Microsoft Corporation

Inventors: Joseph E. Pentheroudakis, Andi Wu
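
The flow described in the abstract of patent 7158930 (hypothesize an entry, parse with it, commit it, reuse it) can be illustrated with a toy sketch. This is not the patented implementation: the pattern-matching "parser", the tag set, the `CANDIDATE_TAGS` list, and every function name here are assumptions made for illustration.

```python
# Toy grammar (assumption): a segment "parses" if its tag sequence matches a pattern.
PATTERNS = {
    ("DET", "NOUN", "VERB"),
    ("DET", "NOUN", "VERB", "DET", "NOUN"),
}
CANDIDATE_TAGS = ("NOUN", "VERB")  # tags tried for an unknown word (assumption)


def parse(tokens, dictionary):
    """Return the tag sequence if the segment parses, else None."""
    tags = tuple(dictionary.get(token) for token in tokens)
    return tags if tags in PATTERNS else None


def parse_with_expansion(tokens, dictionary):
    """Parse a segment; on failure, hypothesize an entry for one unknown word."""
    result = parse(tokens, dictionary)
    if result is not None:
        return result
    unknown = [token for token in tokens if token not in dictionary]
    if len(unknown) != 1:
        return None  # this sketch only handles a single unknown word
    word = unknown[0]
    for tag in CANDIDATE_TAGS:
        trial = {**dictionary, word: tag}  # hypothesized entry, not yet committed
        result = parse(tokens, trial)
        if result is not None:
            dictionary[word] = tag  # commit only after a successful parse
            return result
    return None


dictionary = {"the": "DET", "dog": "NOUN", "barks": "VERB", "sees": "VERB"}
# First segment: "wug" is unknown, so an entry is hypothesized and then committed.
first = parse_with_expansion(["the", "wug", "barks"], dictionary)
# Second segment: parses directly because the dictionary now contains "wug".
second = parse_with_expansion(["the", "dog", "sees", "the", "wug"], dictionary)
```

The point of the sketch is the ordering: the dictionary is changed only after the hypothesized entry has produced a successful parse, and the second segment benefits from that change.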
Method and apparatus for expanding dictionaries during parsing

Publication number: 20040034525

Abstract: A method is provided for parsing text in a corpus. The method includes hypothesizing a possible new entry for a dictionary based on a first segment of text. A successful parse is then formed for the first segment of text using the possible new entry. Based on the successful parse, the dictionary is changed to include the new entry. The new entry in the dictionary is then used to parse a second segment of text.

Type: Application

Filed: August 15, 2002

Publication date: February 19, 2004

Inventors: Joseph E. Pentheroudakis, Andi Wu
Proper name identification in Chinese

Patent number: 6694055

Abstract: A word segmentation method to identify proper names in input text includes locating a sequence of single-characters in the input text not forming part of a multiple-character word. The method further includes comparing the sequence of single-characters to a lexical knowledge base to identify if a first portion of the sequence corresponds to stored identifiable portions of a proper name, and comparing the sequence of single-characters to the lexical knowledge base to identify if a second portion of the sequence proximate the first portion includes characters known to comprise a second portion of a proper name. Instructions can be provided on a computer readable medium to implement the method.

Type: Grant

Filed: July 15, 1998

Date of Patent: February 17, 2004

Assignee: Microsoft Corporation

Inventor: Andi Wu
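
The two comparisons named in the abstract of patent 6694055 can be sketched roughly as follows. This is only an illustration under stated assumptions: the input is taken to be already segmented into tokens, the "lexical knowledge base" is reduced to two small hypothetical character sets (`SURNAMES`, `GIVEN_CHARS`), and the limit of two given-name characters is an assumption of the sketch.

```python
SURNAMES = {"王", "李", "张"}          # hypothetical family-name characters
GIVEN_CHARS = {"小", "明", "伟", "芳"}  # hypothetical given-name characters


def find_names(tokens):
    """Return candidate names found inside runs of single-character tokens."""
    names = []
    i = 0
    while i < len(tokens):
        if len(tokens[i]) != 1:
            i += 1
            continue
        # Collect a maximal run of single characters (not part of a longer word).
        j = i
        while j < len(tokens) and len(tokens[j]) == 1:
            j += 1
        run = tokens[i:j]
        k = 0
        while k < len(run):
            if run[k] in SURNAMES:
                # First portion matched; check the adjacent second portion.
                end = k + 1
                while end < len(run) and end - k <= 2 and run[end] in GIVEN_CHARS:
                    end += 1
                if end > k + 1:
                    names.append("".join(run[k:end]))
                    k = end
                    continue
            k += 1
        i = j
    return names


tokens = ["昨天", "王", "小", "明", "去", "了", "北京"]
names = find_names(tokens)
```

In this example the run of single characters is 王 小 明 去 了, and the candidate name is formed from the surname character plus the two adjacent given-name characters.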
Parameterized word segmentation of unsegmented text

Patent number: 6678409

Abstract: The present invention segments a non-segmented input text. The input text is received and segmented based on parameter values associated with parameterized word formation rules. In one illustrative embodiment, the input text is processed into a form which includes parameter indications, but which preserves the word-internal structure of the input text. Thus, the parameter values can be changed without entirely re-processing the input text.

Type: Grant

Filed: January 14, 2000

Date of Patent: January 13, 2004

Assignee: Microsoft Corporation

Inventors: Andi Wu, Zixin Jiang
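
One way to picture the idea in the abstract of patent 6678409 is an analyzed form that keeps word-internal structure together with the name of the rule that built each unit, so that a change of parameter values only requires re-reading that form. The sketch below is an assumption-laden illustration, not the patented method: the nested-tuple representation, the rule names `noun_suffix` and `verb_aspect`, and the boolean parameters are all hypothetical.

```python
def flatten(node):
    """Join all leaves under a node into one string."""
    if isinstance(node, str):
        return node
    _rule, children = node
    return "".join(flatten(child) for child in children)


def segment(node, params):
    """Turn one analyzed node into words according to the parameter values."""
    if isinstance(node, str):
        return [node]
    rule, children = node
    if params.get(rule, True):
        # Parameter on: the unit built by this rule is emitted as one word.
        return [flatten(node)]
    # Parameter off: descend and segment the parts separately.
    words = []
    for child in children:
        words.extend(segment(child, params))
    return words


def segment_text(analyzed, params):
    return [word for node in analyzed for word in segment(node, params)]


# Hypothetical analyzed form: leaves are strings, inner nodes are (rule, children).
ANALYZED = ["我", ("noun_suffix", ["朋友", "们"]), ("verb_aspect", ["来", "了"])]

merged = segment_text(ANALYZED, {"noun_suffix": True, "verb_aspect": True})
split = segment_text(ANALYZED, {"noun_suffix": False, "verb_aspect": False})
```

Both segmentations are read off the same `ANALYZED` structure; only the parameter dictionary changes between the two calls.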
Word segmentation in Chinese text

Patent number: 6640006

Abstract: The present invention provides a facility for selecting from a sequence of natural language characters combinations of characters that may be words. The facility uses indications, for each of a plurality of characters, of (a) the characters that occur in the second position of words that begin with the character and (b) the positions in which the character occurs in words. For each of a plurality of contiguous combinations of characters occurring in the sequence, the facility determines whether the character occurring in the second position of the combination is indicated to occur in words that begin with the character occurring in the first position of the combination. If so, the facility determines whether every character of the combination is indicated to occur in words in a position in which it occurs in the combination. If so, the facility determines that the combination of characters may be a word.

Type: Grant

Filed: May 29, 1998

Date of Patent: October 28, 2003

Assignee: Microsoft Corporation

Inventors: Andi Wu, Stephen D. Richardson, Zixin Jiang
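
The two-step check in the abstract of patent 6640006 can be sketched as below. This is a simplified illustration, not the patented facility: the indications are built from a tiny hypothetical lexicon, positions are recorded as 0-based indices (the patent may classify positions differently), and `max_len` is an assumed cap on combination length.

```python
from collections import defaultdict


def build_indications(lexicon):
    """Build the two per-character indications from a word list."""
    second_chars = defaultdict(set)  # first character -> characters seen second
    positions = defaultdict(set)     # character -> 0-based positions seen in words
    for word in lexicon:
        if len(word) >= 2:
            second_chars[word[0]].add(word[1])
        for idx, ch in enumerate(word):
            positions[ch].add(idx)
    return second_chars, positions


def candidate_words(text, second_chars, positions, max_len=4):
    """Return contiguous combinations that pass both checks and may be words."""
    candidates = []
    for start in range(len(text)):
        for length in range(2, max_len + 1):
            combo = text[start:start + length]
            if len(combo) < length:
                break  # ran past the end of the text
            # Check (a): is the second character known to follow the first?
            if combo[1] not in second_chars.get(combo[0], ()):
                break  # longer combinations share the same first two characters
            # Check (b): has every character been seen in the position it occupies?
            if all(idx in positions.get(ch, ()) for idx, ch in enumerate(combo)):
                candidates.append(combo)
    return candidates


lexicon = ["北京", "大学", "北京大学", "学生"]
second_chars, positions = build_indications(lexicon)
candidates = candidate_words("北京大学生", second_chars, positions)
```

With this lexicon the sketch also accepts 北京大, which is not a lexicon entry. That is consistent with the abstract's wording: the checks only determine that a combination may be a word, so a later stage still has to confirm or reject each candidate.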
Word segmentation in Chinese text

Publication number: 20020102025

Abstract: The present invention provides a facility for selecting from a sequence of natural language characters combinations of characters that may be words. The facility uses indications, for each of a plurality of characters, of (a) the characters that occur in the second position of words that begin with the character and (b) the positions in which the character occurs in words. For each of a plurality of contiguous combinations of characters occurring in the sequence, the facility determines whether the character occurring in the second position of the combination is indicated to occur in words that begin with the character occurring in the first position of the combination. If so, the facility determines whether every character of the combination is indicated to occur in words in a position in which it occurs in the combination. If so, the facility determines that the combination of characters may be a word.

Type: Application

Filed: May 29, 1998

Publication date: August 1, 2002

Inventors: Andi Wu, Stephen D. Richardson, Zixin Jiang
Method and apparatus for identifying erroneous characters in text

Patent number: 6360197

Abstract: A method and apparatus are provided that identify confused characters in a text written in a language having a large number of distinct characters. To identify the confused characters, a set of characters from the text are segmented into individual characters. A confusable character for at least one of the segmented characters is then retrieved. Lexical information is identified for both the segmented characters and the retrieved confusable characters and is used to parse the segmented characters and the confusable characters. Based on the parse, a segmented character is identified that has been confused with a confusable character.

Type: Grant

Filed: October 19, 1999

Date of Patent: March 19, 2002

Assignee: Microsoft Corporation

Inventors: Andi Wu, George E. Heidorn
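
The idea in the abstract of patent 6360197 can be approximated with a small sketch. Note the substitution: a full parser is replaced here by a much simpler test, namely whether the text can be completely covered by dictionary words. The lexicon, the confusable-character table, and the two-character word limit are hypothetical and exist only for illustration.

```python
from functools import lru_cache

LEXICON = {"我", "自己", "已经", "知道"}      # hypothetical word list
CONFUSABLE = {"已": ("己",), "己": ("已",)}  # hypothetical confusable pairs
MAX_WORD_LEN = 2


def find_confused(text):
    """Return (position, written, intended) triples for the cover that needs
    the fewest substitutions, or None if no full cover exists."""
    # Each position offers the written character plus its confusable characters.
    options = [(ch,) + CONFUSABLE.get(ch, ()) for ch in text]

    def variants(start, end):
        # Every spelling of text[start:end], with the substitutions it uses.
        results = [("", ())]
        for pos in range(start, end):
            results = [
                (word + ch, subs + (() if ch == text[pos] else ((pos, text[pos], ch),)))
                for word, subs in results
                for ch in options[pos]
            ]
        return results

    @lru_cache(maxsize=None)
    def best(start):
        if start == len(text):
            return ()
        found = None
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            rest = best(end)
            if rest is None:
                continue
            for word, subs in variants(start, end):
                if word in LEXICON:
                    candidate = subs + rest
                    if found is None or len(candidate) < len(found):
                        found = candidate
        return found

    return best(0)


flagged = find_confused("我自已知道")  # 已 written where 己 was intended
clean = find_confused("我自己知道")
```

Here the written text can only be covered by dictionary words if 已 at position 2 is read as 己, so that character is reported as confused. The correctly written text is covered with no substitutions.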
Proper name identification in Chinese

Publication number: 20020003898

Abstract: A word segmentation method to identify proper names in input text includes locating a sequence of single-characters in the input text not forming part of a multiple-character word. The method further includes comparing the sequence of single-characters to a lexical knowledge base to identify if a first portion of the sequence corresponds to stored identifiable portions of a proper name, and comparing the sequence of single-characters to the lexical knowledge base to identify if a second portion of the sequence proximate the first portion includes characters known to comprise a second portion of a proper name. Instructions can be provided on a computer readable medium to implement the method.

Type: Application

Filed: July 15, 1998

Publication date: January 10, 2002

Inventor: Andi Wu