Patents by Inventor Kai-Fu Lee

Kai-Fu Lee has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Language input architecture for converting one text form to another text form with tolerance to spelling typographical and conversion errors

Patent number: 7424675

Abstract: A language input architecture converts input strings of phonetic text to an output string of language text. The language input architecture has a search engine, one or more typing models, a language model, and one or more lexicons for different languages. The typing model is configured to generate a list of probable typing candidates that may be substituted for the input string based on probabilities of how likely each of the candidate strings was incorrectly entered as the input string. The language model provides probable conversion strings for each of the typing candidates based on probabilities of how likely a probable conversion output string represents the candidate string. The search engine combines the probabilities of the typing and language models to find the most probable conversion string that represents a converted form of the input string.

Type: Grant

Filed: September 27, 2004

Date of Patent: September 9, 2008

Assignee: Microsoft Corporation

Inventors: Kai-Fu Lee, Zheng Chen, Jian Han
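The abstract says the search engine combines typing-model and language-model probabilities to find the most probable conversion string. The sketch below shows that combination under several assumptions. Toy lookup tables with made-up probabilities stand in for trained models, and `TYPING_MODEL`, `LANGUAGE_MODEL`, and `convert` are hypothetical names. Each (typing candidate, conversion string) pair is scored as the product of the two probabilities, computed in log space.

```python
import math

# Hypothetical typing model with made-up numbers: for an observed input, the
# strings the user may have meant to type, with their probabilities.
TYPING_MODEL = {
    "zhogn": {"zhong": 0.70, "zhang": 0.25, "zhogn": 0.05},
}

# Hypothetical language model with made-up numbers: for each typing candidate,
# the conversion strings it may represent, with their probabilities.
LANGUAGE_MODEL = {
    "zhong": {"中": 0.6, "种": 0.4},
    "zhang": {"张": 0.9, "章": 0.1},
}

def convert(input_string):
    """Return (best conversion, combined log-probability) for `input_string`.

    Inputs missing from the typing model are treated as typed correctly;
    if no conversion exists, the input is returned unchanged with score -inf.
    """
    candidates = TYPING_MODEL.get(input_string, {input_string: 1.0})
    best_output, best_score = input_string, -math.inf
    for candidate, p_typing in candidates.items():
        for output, p_lm in LANGUAGE_MODEL.get(candidate, {}).items():
            # Multiplying probabilities = adding log-probabilities (avoids underflow).
            score = math.log(p_typing) + math.log(p_lm)
            if score > best_score:
                best_output, best_score = output, score
    return best_output, best_score
```

With these tables, `convert("zhogn")` picks 中 (0.7 × 0.6 = 0.42) over 张 (0.25 × 0.9 = 0.225). A real search engine would also need to search over segmentations of longer input rather than perform a single table lookup.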
Language input user interface

Patent number: 7403888

Abstract: A language input architecture receives input text (e.g., phonetic text of a character-based language) entered by a user from an input device (e.g., keyboard, voice recognition). The input text is converted to an output text (e.g., written language text of a character-based language). The language input architecture has a user interface that displays the output text and unconverted input text in line with one another. As the input text is converted, it is replaced in the UI with the converted output text. In addition to this in-line input feature, the UI enables in-place editing or error correction without requiring the user to switch modes from an entry mode to an edit mode. To assist with this in-place editing, the UI presents pop-up windows containing the phonetic text from which the output text was converted as well as first and second candidate lists that contain small and large sets of alternative candidates that might be used to replace the current output text.

Type: Grant

Filed: June 28, 2000

Date of Patent: July 22, 2008

Assignee: Microsoft Corporation

Inventors: Jian Wang, Gao Zhang, Jian Han, Zheng Chen, Xianoning Ling, Kai-Fu Lee
Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors

Patent number: 7302640

Abstract: A language input architecture converts input strings of phonetic text to an output string of language text. The language input architecture has a search engine, typing models, a language model, and one or more lexicons for different languages. Each typing model is trained on real data, and learns probabilities of typing errors. The typing model is configured to generate a list of probable typing candidates that may be substituted for the input string based on probabilities of how likely each of the candidate strings was incorrectly entered as the input string. The language model provides probable conversion strings for each of the typing candidates based on probabilities of how likely a probable conversion output string represents the candidate string. The search engine combines the probabilities of the typing and language models to find the most probable conversion string that represents a converted form of the input string.

Type: Grant

Filed: October 21, 2004

Date of Patent: November 27, 2007

Assignee: Microsoft Corporation

Inventors: Kai-Fu Lee, Zheng Chen, Jian Han
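This abstract states that each typing model "is trained on real data, and learns probabilities of typing errors." One simple way to learn such probabilities is to count per-character confusions. The sketch below assumes the training data are already-aligned (intended, typed) pairs of equal length, so only character substitutions are modeled. The function names and the `floor` smoothing value are hypothetical.

```python
from collections import Counter, defaultdict

def train_typing_model(pairs):
    """Estimate P(typed_char | intended_char) from aligned (intended, typed) pairs.

    Pairs of unequal length are skipped: this simplified model only
    handles substitution errors, not insertions or deletions.
    """
    counts = defaultdict(Counter)
    for intended, typed in pairs:
        if len(intended) != len(typed):
            continue
        for i_ch, t_ch in zip(intended, typed):
            counts[i_ch][t_ch] += 1
    return {
        i_ch: {t_ch: n / sum(c.values()) for t_ch, n in c.items()}
        for i_ch, c in counts.items()
    }

def typing_probability(model, intended, typed, floor=1e-6):
    """P(typed | intended) as a product of per-character probabilities.

    Unseen confusions get a small `floor` value (a crude smoothing assumption).
    """
    if len(intended) != len(typed):
        return 0.0
    p = 1.0
    for i_ch, t_ch in zip(intended, typed):
        p *= model.get(i_ch, {}).get(t_ch, floor)
    return p
```

For example, training on `[("zhong", "zhong"), ("zhong", "zhonh"), ("zhang", "zhang")]` gives P("zhonh" | "zhong") = 1/3, because an intended final g was typed as h once in three occurrences. A fuller model would also cover insertions, deletions, and transpositions.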
System and method for joint optimization of language model performance and size

Patent number: 7275029

Abstract: A method for the joint optimization of language model performance and size is presented, comprising developing a language model from a tuning set of information, segmenting at least a subset of a received textual corpus, calculating a perplexity value for each segment, and refining the language model with one or more segments of the received corpus based, at least in part, on the calculated perplexity values of those segments.

Type: Grant

Filed: June 30, 2000

Date of Patent: September 25, 2007

Assignee: Microsoft Corporation

Inventors: Jianfeng Gao, Kai-Fu Lee, Mingjing Li, Hai-Feng Wang, Dong-Feng Cai, Lee-Feng Chien
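The abstract does not say exactly how the perplexity values steer refinement. The sketch below makes two assumptions: it uses an add-one-smoothed unigram model trained on the tuning set, and it adds only segments whose perplexity under that model falls below a threshold. The function names are hypothetical.

```python
import math
from collections import Counter

def perplexity(counts, segment):
    """Perplexity of a token segment under an add-one-smoothed unigram model."""
    if not segment:
        return float("inf")
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in segment)
    return math.exp(-log_prob / len(segment))

def refine(counts, segments, threshold):
    """Return (refined counts, kept segments), adding only segments whose
    perplexity under the original model is below `threshold`."""
    refined = Counter(counts)
    kept = []
    for seg in segments:
        if perplexity(counts, seg) < threshold:
            refined.update(seg)
            kept.append(seg)
    return refined, kept
```

Take the tuning text "the cat sat on the mat". Under that model, the segment `["the", "cat"]` has perplexity √24 ≈ 4.9 and `["zebra", "quantum"]` has perplexity 12, so a threshold of 10 keeps only the first segment. Filtering the corpus this way is one means of trading model size against fit to the tuning data.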
Method and apparatus for generating and managing a language model data structure

Patent number: 7216066

Abstract: A method is presented comprising assigning each of a plurality of segments comprising a received corpus to a node in a data structure denoting dependencies between nodes, and calculating a transitional probability between each of the nodes in the data structure.

Type: Grant

Filed: February 22, 2006

Date of Patent: May 8, 2007

Assignee: Microsoft Corporation

Inventors: Shuo Di, Kai-Fu Lee, Lee-Feng Chien, Zheng Chen, Jianfeng Gao
Language input architecture for converting one text form to another text form with modeless entry

Patent number: 7165019

Abstract: A language input architecture converts input strings of phonetic text (e.g., Chinese Pinyin) to an output string of language text (e.g., Chinese Hanzi) in a manner that minimizes typographical errors and conversion errors that occur during conversion from the phonetic text to the language text. The language input architecture has a search engine, one or more typing models, a language model, and one or more lexicons for different languages. Each typing model is trained on real data, and learns probabilities of typing errors. The typing model is configured to generate a list of probable typing candidates that may be substituted for the input string based on probabilities of how likely each of the candidate strings was incorrectly entered as the input string.

Type: Grant

Filed: June 28, 2000

Date of Patent: January 16, 2007

Assignee: Microsoft Corporation

Inventors: Kai-Fu Lee, Zheng Chen, Jian Han
Method and Apparatus for Generating and Managing a Language Model Data Structure

Publication number: 20060184341

Abstract: A method is presented comprising assigning each of a plurality of segments comprising a received corpus to a node in a data structure denoting dependencies between nodes, and calculating a transitional probability between each of the nodes in the data structure.

Type: Application

Filed: February 22, 2006

Publication date: August 17, 2006

Applicant: Microsoft Corporation

Inventors: Shuo Di, Kai-Fu Lee, Lee-Feng Chien, Zheng Chen, Jianfeng Gao
Method and apparatus for generating and managing a language model data structure

Patent number: 7020587

Abstract: The generation and management of a language model data structure include assigning each segment of a received corpus to a node in a data structure that denotes dependencies between the respective nodes. A transitional probability between each of the nodes in the data structure is calculated. A frequency of occurrence is calculated for each item of the respective segments, and those nodes of the data structure associated with items that do not meet a minimum frequency of occurrence threshold are removed. The data structure may be managed across a system memory of a computer system and an extended memory of the computer system.

Type: Grant

Filed: June 30, 2000

Date of Patent: March 28, 2006

Assignee: Microsoft Corporation

Inventors: Shuo Di, Kai-Fu Lee, Lee-Feng Chien, Zheng Chen, Jianfeng Gao
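The data structure in this abstract can be sketched under several simplifying assumptions. Each token is treated as a node (the abstract leaves the granularity of a "segment" open), and an edge from one node to the next records an observed transition. Pruning drops every transition that touches a below-threshold item. `build_model` and `min_count` are hypothetical names, and the split of the structure across system and extended memory is not shown.

```python
from collections import Counter, defaultdict

def build_model(segments, min_count):
    """Build {prev: {next: transitional probability}} from tokenized segments.

    Items occurring fewer than `min_count` times are pruned first, and any
    transition into or out of a pruned item is discarded.
    """
    freq = Counter(item for seg in segments for item in seg)
    keep = {item for item, n in freq.items() if n >= min_count}

    transitions = defaultdict(Counter)
    for seg in segments:
        for prev, nxt in zip(seg, seg[1:]):
            if prev in keep and nxt in keep:
                transitions[prev][nxt] += 1

    # Normalize outgoing counts into transitional probabilities per node.
    return {
        prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for prev, c in transitions.items()
    }
```

With segments `[["a", "b", "c"], ["a", "b", "x"], ["a", "c"]]` and `min_count=2`, the item x (seen once) is pruned, giving P(b | a) = 2/3 and P(c | a) = 1/3.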
System and iterative method for lexicon, segmentation and language model joint optimization

Patent number: 6904402

Abstract: A method for optimizing a language model is presented comprising developing an initial language model from a lexicon and segmentation derived from a received corpus using a maximum match technique, and iteratively refining the initial language model by dynamically updating the lexicon and re-segmenting the corpus according to statistical principles until a threshold of predictive capability is achieved.

Type: Grant

Filed: June 30, 2000

Date of Patent: June 7, 2005

Assignee: Microsoft Corporation

Inventors: Hai-Feng Wang, Chang-Ning Huang, Kai-Fu Lee, Shuo Di, Jianfeng Gao, Dong-Feng Cai, Lee-Feng Chien
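The initial segmentation in this method uses a maximum match technique. Below is a sketch of forward maximum matching, the greedy longest-word-first variant. The abstract does not say whether forward, backward, or bidirectional matching is used, and the function name and lexicon are hypothetical.

```python
def max_match(text, lexicon, max_len=None):
    """Forward maximum-match segmentation: at each position take the longest
    lexicon word; characters not covered by the lexicon become single segments."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    max_len = max(max_len, 1)  # guard against an empty or degenerate lexicon
    segments, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if length == 1 or word in lexicon:
                segments.append(word)
                i += length
                break
    return segments
```

With the lexicon {研究, 研究生, 生命, 命, 起源}, `max_match("研究生命起源", ...)` greedily returns 研究生 / 命 / 起源 rather than the intended 研究 / 生命 / 起源 ("study the origin of life"). Errors like this are why an initial greedy segmentation benefits from the statistical re-segmentation described in the abstract.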
Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors

Publication number: 20050086590

Abstract: A language input architecture converts input strings of phonetic text (e.g., Chinese Pinyin) to an output string of language text (e.g., Chinese Hanzi) in a manner that minimizes typographical errors and conversion errors that occur during conversion from the phonetic text to the language text. The language input architecture has a search engine, one or more typing models, a language model, and one or more lexicons for different languages. Each typing model is trained on real data, and learns probabilities of typing errors. The typing model is configured to generate a list of probable typing candidates that may be substituted for the input string based on probabilities of how likely each of the candidate strings was incorrectly entered as the input string. The probable typing candidates may be stored in a database. The language model provides probable conversion strings for each of the typing candidates based on probabilities of how likely a probable conversion output string represents the candidate string.

Type: Application

Filed: October 21, 2004

Publication date: April 21, 2005

Applicant: Microsoft Corporation

Inventors: Kai-Fu Lee, Zheng Chen, Jian Han
Language conversion and display

Publication number: 20050060138

Abstract: A language input architecture receives input text (e.g., phonetic text of a character-based language) entered by a user from an input device (e.g., keyboard, voice recognition). The input text is converted to an output text (e.g., written language text of a character-based language). The language input architecture has a user interface that displays the output text and unconverted input text in line with one another. As the input text is converted, it is replaced in the UI with the converted output text. In addition to this in-line input feature, the UI enables in-place editing or error correction without requiring the user to switch modes from an entry mode to an edit mode. To assist with this in-place editing, the UI presents pop-up windows containing the phonetic text from which the output text was converted as well as first and second candidate lists that contain small and large sets of alternative candidates that might be used to replace the current output text.

Type: Application

Filed: July 23, 2004

Publication date: March 17, 2005

Applicant: Microsoft Corporation

Inventors: Jian Wang, Gao Zhang, Jian Han, Zheng Chen, Xianoning Ling, Kai-Fu Lee
Language input architecture for converting one text form to another text form with tolerance to spelling typographical and conversion errors

Publication number: 20050044495

Abstract: A language input architecture converts input strings of phonetic text to an output string of language text. The language input architecture has a search engine, one or more typing models, a language model, and one or more lexicons for different languages. The typing model is configured to generate a list of probable typing candidates that may be substituted for the input string based on probabilities of how likely each of the candidate strings was incorrectly entered as the input string. The language model provides probable conversion strings for each of the typing candidates based on probabilities of how likely a probable conversion output string represents the candidate string. The search engine combines the probabilities of the typing and language models to find the most probable conversion string that represents a converted form of the input string.

Type: Application

Filed: September 27, 2004

Publication date: February 24, 2005

Applicant: Microsoft Corporation

Inventors: Kai-Fu Lee, Zheng Chen, Jian Han
Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors

Patent number: 6848080

Abstract: A language input architecture converts input strings of phonetic text to an output string of language text. The language input architecture has a search engine, one or more typing models, a language model, and one or more lexicons for different languages. The typing model is configured to generate a list of probable typing candidates that may be substituted for the input string based on probabilities of how likely each of the candidate strings was incorrectly entered as the input string. The language model provides probable conversion strings for each of the typing candidates based on probabilities of how likely a probable conversion output string represents the candidate string. The search engine combines the probabilities of the typing and language models to find the most probable conversion string that represents a converted form of the input string.

Type: Grant

Filed: June 28, 2000

Date of Patent: January 25, 2005

Assignee: Microsoft Corporation

Inventors: Kai-Fu Lee, Zheng Chen, Jian Han
Search engine with natural language-based robust parsing of user query and relevance feedback learning

Publication number: 20040243568

Abstract: A search engine architecture is designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches. The search engine architecture includes a natural language parser that parses a user query and extracts syntactic and semantic information. The parser is robust in the sense that it not only returns fully-parsed results (e.g., a parse tree), but is also capable of returning partially-parsed fragments in those cases where more accurate or descriptive information in the user query is unavailable. A question matcher is employed to match the fully-parsed output and the partially-parsed fragments to a set of frequently asked questions (FAQs) stored in a database. The question matcher then correlates the questions with a group of possible answers arranged in standard templates that represent possible solutions to the user query. The search engine architecture also has a keyword searcher to locate other possible answers by searching on any keywords returned from the parser.

Type: Application

Filed: March 22, 2004

Publication date: December 2, 2004

Inventors: Hai-Feng Wang, Kai-Fu Lee, Qiang Yang
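The robust parser is beyond a short example, but the matching stage that follows it can be sketched. The sketch assumes the parser's full or partial output has already been reduced to a list of terms, and it scores FAQ questions by simple word overlap, a stand-in because the abstract does not say how matching is scored. All names are hypothetical. The `search` function returns FAQ matches alongside additional keyword hits.

```python
import re

def _words(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def match_faqs(terms, faqs, min_overlap=2):
    """Return (question, answer) pairs sharing at least `min_overlap` words
    with the query terms, best match first."""
    query = {t.lower() for t in terms}
    scored = []
    for question, answer in faqs.items():
        overlap = len(query & _words(question))
        if overlap >= min_overlap:
            scored.append((overlap, question, answer))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(question, answer) for _, question, answer in scored]

def keyword_search(keywords, documents):
    """Return titles of documents that contain any of the keywords."""
    query = {k.lower() for k in keywords}
    return [title for title, text in documents.items() if query & _words(text)]

def search(terms, keywords, faqs, documents):
    """Combine FAQ matches with other possible answers found by keyword search."""
    return {"faq": match_faqs(terms, faqs), "keyword": keyword_search(keywords, documents)}
```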
System and iterative method for lexicon, segmentation and language model joint optimization

Publication number: 20040210434

Abstract: A method for optimizing a language model is presented comprising developing an initial language model from a lexicon and segmentation derived from a received corpus using a maximum match technique, and iteratively refining the initial language model by dynamically updating the lexicon and re-segmenting the corpus according to statistical principles until a threshold of predictive capability is achieved.

Type: Application

Filed: May 10, 2004

Publication date: October 21, 2004

Applicant: Microsoft Corporation

Inventors: Hai-Feng Wang, Chang-Ning Huang, Kai-Fu Lee, Shuo Di, Jianfeng Gao, Dong-Feng Cai, Lee-Feng Chien
Search engine with natural language-based robust parsing for user query and relevance feedback learning

Patent number: 6766320

Abstract: A search engine architecture is designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches. The search engine architecture includes a natural language parser that parses a user query and extracts syntactic and semantic information. The parser is robust in the sense that it not only returns fully-parsed results (e.g., a parse tree), but is also capable of returning partially-parsed fragments in those cases where more accurate or descriptive information in the user query is unavailable. A question matcher is employed to match the fully-parsed output and the partially-parsed fragments to a set of frequently asked questions (FAQs) stored in a database. The question matcher then correlates the questions with a group of possible answers arranged in standard templates that represent possible solutions to the user query. The search engine architecture also has a keyword searcher to locate other possible answers by searching on any keywords returned from the parser.

Type: Grant

Filed: August 24, 2000

Date of Patent: July 20, 2004

Assignee: Microsoft Corporation

Inventors: Hai-Feng Wang, Kai-Fu Lee, Qiang Yang
System and method for automatic subcharacter unit and lexicon generation for handwriting recognition

Patent number: 5757964

Abstract: A system for automatic subcharacter unit and lexicon generation for handwriting recognition comprises a processing unit, a handwriting input device, and a memory wherein a segmentation unit, a subcharacter generation unit, a lexicon unit, and a modeling unit reside. The segmentation unit generates feature vectors corresponding to sample characters. The subcharacter generation unit clusters feature vectors and assigns each feature vector associated with a given cluster an identical label. The lexicon unit constructs a lexical graph for each character in a character set. The modeling unit generates a Hidden Markov Model for each set of identically-labeled feature vectors. After a first set of lexical graphs and Hidden Markov Models have been created, the subcharacter generation unit determines for each feature vector which Hidden Markov Model produces a highest likelihood value.

Type: Grant

Filed: July 29, 1997

Date of Patent: May 26, 1998

Assignee: Apple Computer, Inc.

Inventors: Kai-Fu Lee, Yen-Lu Chow, Kamil Grajski
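The abstract describes a loop that alternates between labeling clusters of feature vectors and reassigning each vector to the model that gives it the highest likelihood. The sketch below shows that loop with one plain simplification: a unit-variance isotropic Gaussian replaces each Hidden Markov Model, which reduces the reassignment to nearest-mean (k-means-style) clustering. The lexical graphs are omitted, and `relabel` and `log_likelihood` are hypothetical names.

```python
import math

def log_likelihood(vector, mean, variance=1.0):
    """Log-density of an isotropic Gaussian (a stand-in for an HMM likelihood)."""
    d = len(vector)
    sq = sum((v - m) ** 2 for v, m in zip(vector, mean))
    return -0.5 * (d * math.log(2 * math.pi * variance) + sq / variance)

def relabel(vectors, labels, k, iterations=10):
    """Alternate between fitting one model per label and reassigning every
    vector to the label whose model gives it the highest likelihood."""
    dim = len(vectors[0])
    means = [[0.0] * dim for _ in range(k)]
    for _ in range(iterations):
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # a label with no vectors keeps its previous mean
                means[j] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = [max(range(k), key=lambda j: log_likelihood(v, means[j]))
                      for v in vectors]
        if new_labels == labels:  # converged: no vector changed label
            break
        labels = new_labels
    return labels
```

Starting from the poor labeling `[0, 1, 0, 1]` for the points (0, 0), (0, 1), (10, 10), (10, 11), the loop settles on `[0, 0, 1, 1]` after one reassignment.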
Rapid tree-based method for vector quantization

Patent number: 5734791

Abstract: The branching decision for each node in a vector quantization (VQ) binary tree is made by a simple comparison of a pre-selected element of the candidate vector with a stored threshold, resulting in a binary decision for reaching the next lower level. Each node has a preassigned element and threshold value. Conventional centroid distance training techniques (such as LBG and k-means) are used to establish code-book indices corresponding to a set of VQ centroids. The set of training vectors is used a second time to select a vector element and threshold value at each node that approximately splits the data evenly. After processing the training vectors through the binary tree using threshold decisions, a histogram is generated for each code-book index that represents the number of times a training vector belonging to a given index set appeared at each index. The final quantization is accomplished by processing a candidate vector through the tree and then selecting the nearest centroid belonging to that histogram.

Type: Grant

Filed: December 31, 1992

Date of Patent: March 31, 1998

Assignee: Apple Computer, Inc.

Inventors: Alejandro Acero, Kai-Fu Lee, Yen-Lu Chow
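The sketch below illustrates a tree-based quantizer under stated assumptions. The centroids are taken as already trained (for example with k-means). Each node splits on the element with the largest variance, at that element's median, which divides the training vectors roughly evenly as the abstract requires. The element-selection rule and all names are assumptions. Each leaf keeps a histogram of the code-book indices that reached it, and quantization compares the input only against those centroids.

```python
import statistics

def nearest(vector, centroids, indices):
    """Index (among `indices`) of the centroid closest to `vector`."""
    return min(indices, key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, centroids[i])))

def build_tree(vectors, labels, depth):
    """Split on one element per node at its median; leaves hold a histogram
    of the code-book indices of the training vectors that reached them."""
    if depth == 0 or len(vectors) < 2:
        hist = {}
        for lab in labels:
            hist[lab] = hist.get(lab, 0) + 1
        return {"hist": hist}
    dim = len(vectors[0])
    # Assumption: split on the element with the largest variance.
    element = max(range(dim), key=lambda e: statistics.pvariance([v[e] for v in vectors]))
    threshold = statistics.median([v[element] for v in vectors])
    left = [(v, l) for v, l in zip(vectors, labels) if v[element] <= threshold]
    right = [(v, l) for v, l in zip(vectors, labels) if v[element] > threshold]
    if not left or not right:  # no useful split: make this node a leaf
        return build_tree(vectors, labels, 0)
    return {
        "element": element,
        "threshold": threshold,
        "left": build_tree([v for v, _ in left], [l for _, l in left], depth - 1),
        "right": build_tree([v for v, _ in right], [l for _, l in right], depth - 1),
    }

def quantize(vector, tree, centroids):
    """Descend with one scalar comparison per node, then pick the nearest
    centroid among the indices in the leaf's histogram."""
    node = tree
    while "hist" not in node:
        node = node["left"] if vector[node["element"]] <= node["threshold"] else node["right"]
    return nearest(vector, centroids, node["hist"])
```

Each node costs a single scalar comparison, so reaching a leaf takes `depth` comparisons. Full distance computations are limited to the few centroids in that leaf's histogram.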
Continuous Mandarin Chinese speech recognition system having an integrated tone classifier

Patent number: 5602960

Abstract: A speech recognition system for continuous Mandarin Chinese speech comprises a microphone, an A/D converter, a syllable recognition system, an integrated tone classifier, and a confidence score augmentor. The syllable recognition system generates N-best theories with initial confidence scores. The integrated tone classifier has a pitch estimator to estimate the pitch of the input once and a long-term tone analyzer to segment the estimated pitch according to the syllables of each of the N-best theories. The long-term tone analyzer performs long-term tonal analysis on the segmented, estimated pitch and generates a long-term tonal confidence signal. The confidence score augmentor receives the initial confidence scores and the long-term tonal confidence signals, modifies each initial confidence score according to the corresponding long-term tonal confidence signal, re-ranks the N-best theories according to the augmented confidence scores, and outputs the N-best theories.

Type: Grant

Filed: September 30, 1994

Date of Patent: February 11, 1997

Assignee: Apple Computer, Inc.

Inventors: Hsiao-Wuen Hon, Yen-Lu Chow, Kai-Fu Lee
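The final step in this abstract augments each N-best theory's confidence with its tonal confidence and re-ranks the theories. The sketch below assumes both confidences are log-domain scores combined additively with a tunable weight. The abstract says only that each initial score is modified according to the tonal confidence, and `rerank` and `weight` are hypothetical names.

```python
def rerank(n_best, tone_scores, weight=1.0):
    """Add a weighted tonal confidence to each theory's initial score and
    return (theory, augmented score) pairs, best first."""
    augmented = [(theory, score + weight * tone)
                 for (theory, score), tone in zip(n_best, tone_scores)]
    return sorted(augmented, key=lambda item: item[1], reverse=True)
```

For example, if the syllable recognizer slightly prefers "ma1 ma1" (-10.0) over "ma3 ma1" (-10.5), tonal scores of -3.0 and -1.0 reverse the order (-13.0 versus -11.5).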
Handwriting signal processing front-end for handwriting recognizers

Patent number: 5577135

Abstract: A handwriting signal processing front-end method and apparatus for a handwriting training and recognition system which includes non-uniform segmentation and feature extraction in combination with multiple vector quantization. In a training phase, digitized handwriting samples are partitioned into segments of unequal length. Features are extracted from the segments and are grouped to form feature vectors for each segment. Groups of adjacent feature vectors are then combined to form input frames. Feature-specific vectors are formed by grouping features of the same type from each of the feature vectors within a frame. Multiple vector quantization is then performed on each feature-specific vector to statistically model the distributions of the vectors for each feature by identifying clusters of the vectors and determining the mean locations of the vectors in the clusters. Each mean location is represented by a codebook symbol and this information is stored in a codebook for each feature.

Type: Grant

Filed: March 1, 1994

Date of Patent: November 19, 1996

Assignee: Apple Computer, Inc.

Inventors: Kamil A. Grajski, Yen-Lu Chow, Kai-Fu Lee
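The frame and multiple-VQ steps can be sketched under several assumptions: each per-segment feature vector is a dict of named features, frames are overlapping windows of adjacent feature vectors, and the per-feature codebooks are already trained. The non-uniform segmentation and codebook training are not shown, and all names are hypothetical.

```python
def make_frames(feature_vectors, frame_size):
    """Combine each run of `frame_size` adjacent feature vectors into a frame
    (assumption: frames overlap, sliding by one vector)."""
    return [feature_vectors[i:i + frame_size]
            for i in range(len(feature_vectors) - frame_size + 1)]

def feature_specific_vectors(frame):
    """Group the values of each feature type across the vectors in a frame."""
    return {name: [fv[name] for fv in frame] for name in frame[0]}

def quantize_frame(frame, codebooks):
    """Map each feature-specific vector to the index of its nearest codeword
    in that feature's own codebook (one codebook per feature type)."""
    symbols = {}
    for name, vec in feature_specific_vectors(frame).items():
        book = codebooks[name]
        symbols[name] = min(range(len(book)),
                            key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, book[i])))
    return symbols
```

Keeping a separate codebook per feature type means each feature's distribution is clustered on its own. The output for a frame is one codebook symbol per feature rather than a single symbol for the whole frame.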
