Patents by Inventor Andi Wu

Andi Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7158930
    Abstract: A method is provided for parsing text in a corpus. The method includes hypothesizing a possible new entry for a dictionary based on a first segment of text. A successful parse is then formed for the first segment of text using the possible new entry. Based on the successful parse, the dictionary is changed to include the new entry. The new entry in the dictionary is then used to parse a second segment of text.
    Type: Grant
    Filed: August 15, 2002
    Date of Patent: January 2, 2007
    Assignee: Microsoft Corporation
    Inventors: Joseph E. Pentheroudakis, Andi Wu
  • Publication number: 20040034525
    Abstract: A method is provided for parsing text in a corpus. The method includes hypothesizing a possible new entry for a dictionary based on a first segment of text. A successful parse is then formed for the first segment of text using the possible new entry. Based on the successful parse, the dictionary is changed to include the new entry. The new entry in the dictionary is then used to parse a second segment of text.
    Type: Application
    Filed: August 15, 2002
    Publication date: February 19, 2004
    Inventors: Joseph E. Pentheroudakis, Andi Wu
  • Patent number: 6694055
    Abstract: A word segmentation method to identify proper names in input text includes locating a sequence of single-characters in the input text not forming part of a multiple-character word. The method further includes comparing the sequence of single-characters to a lexical knowledge base to identify if a first portion of the sequence corresponds to stored identifiable portions of a proper name, and comparing the sequence of single-characters to the lexical knowledge base to identify if a second portion of the sequence proximate the first portion includes characters known to comprise a second portion of a proper name. Instructions can be provided on a computer readable medium to implement the method.
    Type: Grant
    Filed: July 15, 1998
    Date of Patent: February 17, 2004
    Assignee: Microsoft Corporation
    Inventor: Andi Wu
  • Patent number: 6678409
    Abstract: The present invention segments a non-segmented input text. The input text is received and segmented based on parameter values associated with parameterized word formation rules. In one illustrative embodiment, the input text is processed into a form which includes parameter indications, but which preserves the word-internal structure of the input text. Thus, the parameter values can be changed without entirely re-processing the input text.
    Type: Grant
    Filed: January 14, 2000
    Date of Patent: January 13, 2004
    Assignee: Microsoft Corporation
    Inventors: Andi Wu, Zixin Jiang
  • Patent number: 6640006
    Abstract: The present invention provides a facility for selecting from a sequence of natural language characters combinations of characters that may be words. The facility uses indications, for each of a plurality of characters, of (a) the characters that occur in the second position of words that begin with the character and (b) the positions in which the character occurs in words. For each of a plurality of contiguous combinations of characters occurring in the sequence, the facility determines whether the character occurring in the second position of the combination is indicated to occur in words that begin with the character occurring in the first position of the combination. If so, the facility determines whether every character of the combination is indicated to occur in words in a position in which it occurs in the combination. If so, the facility determines that the combination of characters may be a word.
    Type: Grant
    Filed: May 29, 1998
    Date of Patent: October 28, 2003
    Assignee: Microsoft Corporation
    Inventors: Andi Wu, Stephen D. Richardson, Zixin Jiang
  • Publication number: 20020102025
    Abstract: The present invention provides a facility for selecting from a sequence of natural language characters combinations of characters that may be words. The facility uses indications, for each of a plurality of characters, of (a) the characters that occur in the second position of words that begin with the character and (b) the positions in which the character occurs in words. For each of a plurality of contiguous combinations of characters occurring in the sequence, the facility determines whether the character occurring in the second position of the combination is indicated to occur in words that begin with the character occurring in the first position of the combination. If so, the facility determines whether every character of the combination is indicated to occur in words in a position in which it occurs in the combination. If so, the facility determines that the combination of characters may be a word.
    Type: Application
    Filed: May 29, 1998
    Publication date: August 1, 2002
    Inventors: Andi Wu, Stephen D. Richardson, Zixin Jiang
  • Patent number: 6360197
    Abstract: A method and apparatus are provided that identify confused characters in a text written in a language having a large number of distinct characters. To identify the confused characters, a set of characters from the text are segmented into individual characters. A confusable character for at least one of the segmented characters is then retrieved. Lexical information is identified for both the segmented characters and the retrieved confusable characters and is used to parse the segmented characters and the confusable characters. Based on the parse, a segmented character is identified that has been confused with a confusable character.
    Type: Grant
    Filed: October 19, 1999
    Date of Patent: March 19, 2002
    Assignee: Microsoft Corporation
    Inventors: Andi Wu, George E. Heidorn
  • Publication number: 20020003898
    Abstract: A word segmentation method to identify proper names in input text includes locating a sequence of single-characters in the input text not forming part of a multiple-character word. The method further includes comparing the sequence of single-characters to a lexical knowledge base to identify if a first portion of the sequence corresponds to stored identifiable portions of a proper name, and comparing the sequence of single-characters to the lexical knowledge base to identify if a second portion of the sequence proximate the first portion includes characters known to comprise a second portion of a proper name. Instructions can be provided on a computer readable medium to implement the method.
    Type: Application
    Filed: July 15, 1998
    Publication date: January 10, 2002
    Inventor: Andi Wu
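
Illustrative sketch (not taken from the patent text): the candidate-word test summarized in the abstract for patent 6640006 can be approximated in a few lines of Python. The function names, the toy lexicon, the use of zero-based indexes as "positions", and the max_len cutoff are all assumptions made for this example; the abstract does not specify how positions are encoded or how the indications are stored.

```python
from collections import defaultdict


def build_indications(lexicon):
    """Derive the two per-character indications from a word list (hypothetical helper)."""
    second_chars = defaultdict(set)  # (a) chars seen in 2nd position after a given 1st char
    positions = defaultdict(set)     # (b) zero-based positions at which each char occurs
    for word in lexicon:
        if len(word) >= 2:
            second_chars[word[0]].add(word[1])
        for i, ch in enumerate(word):
            positions[ch].add(i)
    return second_chars, positions


def candidate_words(text, second_chars, positions, max_len=4):
    """Return contiguous character combinations that may be words."""
    candidates = []
    for start in range(len(text)):
        for length in range(2, max_len + 1):
            combo = text[start:start + length]
            if len(combo) < length:
                break  # ran past the end of the text
            # Test 1: the second character must follow the first in some known word.
            if combo[1] not in second_chars.get(combo[0], ()):
                break  # longer combinations share the same first two characters
            # Test 2: every character must be known to occur at its position.
            if all(i in positions.get(ch, ()) for i, ch in enumerate(combo)):
                candidates.append(combo)
    return candidates


# Toy lexicon with Latin letters standing in for characters of an unsegmented script.
second_chars, positions = build_indications(["ab", "abc", "bcd"])
print(candidate_words("abcd", second_chars, positions))
# → ['ab', 'abc', 'bc', 'bcd']
```

Note that "bc" passes both tests even though it is not in the toy lexicon. This matches the abstract's wording that a combination "may be a word": the tests act as a cheap pre-filter, and a later step would still have to confirm each candidate.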