Patents by Inventor Joseph E. Pentheroudakis

Joseph E. Pentheroudakis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7269547
    Abstract: The present invention is a segmenter used in a natural language processing system. The segmenter segments a textual input string into tokens for further natural language processing. In accordance with one feature of the invention, the segmenter includes a tokenizer engine that proposes segmentations and submits them to a linguistic knowledge component for validation. In accordance with another feature of the invention, the segmentation system includes language-specific data that contains a precedence hierarchy for punctuation. If proposed tokens in the input string contain punctuation, they can illustratively be broken into subtokens based on the precedence hierarchy.
    Type: Grant
    Filed: July 15, 2005
    Date of Patent: September 11, 2007
    Assignee: Microsoft Corporation
    Inventors: Joseph E. Pentheroudakis, David G. Bradlee, Sonja S. Knoll
  • Patent number: 7158930
    Abstract: A method is provided for parsing text in a corpus. The method includes hypothesizing a possible new entry for a dictionary based on a first segment of text. A successful parse is then formed for the first segment of text using the possible new entry. Based on the successful parse, the dictionary is changed to include the new entry. The new entry in the dictionary is then used to parse a second segment of text.
    Type: Grant
    Filed: August 15, 2002
    Date of Patent: January 2, 2007
    Assignee: Microsoft Corporation
    Inventors: Joseph E. Pentheroudakis, Andi Wu
  • Patent number: 7092871
    Abstract: The present invention is a segmenter used in a natural language processing system. The segmenter segments a textual input string into tokens for further natural language processing. In accordance with one feature of the invention, the segmenter includes a tokenizer engine that proposes segmentations and submits them to a linguistic knowledge component for validation. In accordance with another feature of the invention, the segmentation system includes language-specific data that contains a precedence hierarchy for punctuation. If proposed tokens in the input string contain punctuation, they can illustratively be broken into subtokens based on the precedence hierarchy.
    Type: Grant
    Filed: March 30, 2001
    Date of Patent: August 15, 2006
    Assignee: Microsoft Corporation
    Inventors: Joseph E. Pentheroudakis, David G. Bradlee, Sonja S. Knoll
  • Publication number: 20040034525
    Abstract: A method is provided for parsing text in a corpus. The method includes hypothesizing a possible new entry for a dictionary based on a first segment of text. A successful parse is then formed for the first segment of text using the possible new entry. Based on the successful parse, the dictionary is changed to include the new entry. The new entry in the dictionary is then used to parse a second segment of text.
    Type: Application
    Filed: August 15, 2002
    Publication date: February 19, 2004
    Inventors: Joseph E. Pentheroudakis, Andi Wu
  • Publication number: 20030023425
    Abstract: The present invention is a segmenter used in a natural language processing system. The segmenter segments a textual input string into tokens for further natural language processing. In accordance with one feature of the invention, the segmenter includes a tokenizer engine that proposes segmentations and submits them to a linguistic knowledge component for validation. In accordance with another feature of the invention, the segmentation system includes language-specific data that contains a precedence hierarchy for punctuation. If proposed tokens in the input string contain punctuation, they can illustratively be broken into subtokens based on the precedence hierarchy.
    Type: Application
    Filed: March 30, 2001
    Publication date: January 30, 2003
    Inventors: Joseph E. Pentheroudakis, David G. Bradlee, Sonja S. Knoll
  • Patent number: 6192333
    Abstract: A computer readable medium has computer executable components that include a morphological analyzer capable of using a corpus of words to automatically form a dictionary containing words associated with respective lemmas and respective parts of speech. The computer executable components also include a dictionary analyzer capable of automatically improving such a dictionary.
    Type: Grant
    Filed: May 12, 1998
    Date of Patent: February 20, 2001
    Assignee: Microsoft Corporation
    Inventor: Joseph E. Pentheroudakis
  • Patent number: 5724594
    Abstract: A method and system for determining the derivational relatedness of a derived word and a base word. In a preferred embodiment, the system includes a machine-readable dictionary containing entries for head words and morphemes. Each entry contains definitional information and semantic relations. Each semantic relation specifies a relation between the head word and a word used in its definition. Semantic relations may contain nested semantic relations to specify relations between words in the definition. The system compares the semantic relations of the derived word to the semantic relations of a morpheme, which is putatively combined with the base word when forming the derived word. The system then generates a derivational score that indicates the confidence that the derived word derives from the base word.
    Type: Grant
    Filed: February 10, 1994
    Date of Patent: March 3, 1998
    Assignee: Microsoft Corporation
    Inventor: Joseph E. Pentheroudakis
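
The segmenter abstracts above (7269547, 7092871, 20030023425) describe a tokenizer that proposes tokens, asks a linguistic knowledge component to validate them, and breaks unvalidated tokens into subtokens according to a punctuation precedence hierarchy. The following is a minimal illustrative sketch of that general idea only, not the patented implementation. The names LEXICON, PRECEDENCE, is_valid, split_token, and segment, the sample lexicon entries, and the ordering of punctuation marks are all assumptions made for this example.

```python
# Hypothetical stand-in for the "linguistic knowledge component":
# a tiny set of tokens that should be kept whole despite punctuation.
LEXICON = {"e.g.", "well-known"}

# Hypothetical punctuation precedence hierarchy (assumption):
# marks listed earlier are split on before marks listed later.
PRECEDENCE = ["/", "-", ".", ","]


def is_valid(token):
    """Accept a proposed token if the lexicon knows it or it is purely alphanumeric."""
    return token.lower() in LEXICON or token.isalnum()


def split_token(token):
    """Recursively break an unvalidated token at its highest-precedence punctuation mark."""
    if not token:
        return []
    if is_valid(token):
        return [token]
    for mark in PRECEDENCE:
        if mark in token:
            left, _, right = token.partition(mark)
            return split_token(left) + [mark] + split_token(right)
    # No known punctuation to split on: keep the token as proposed.
    return [token]


def segment(text):
    """Propose whitespace-delimited tokens, then validate or subdivide each one."""
    tokens = []
    for chunk in text.split():
        tokens.extend(split_token(chunk))
    return tokens


if __name__ == "__main__":
    print(segment("well-known and/or e.g. fast,"))
    print(segment("read-only/write"))
```

In this sketch, "well-known" and "e.g." survive intact because the assumed lexicon validates them, while "read-only/write" is split at "/" before "-" because "/" is listed first in the assumed hierarchy.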