Patents by Inventor Frank Kao

Frank Kao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

HANDWRITING SYMBOL RECOGNITION ACCURACY USING SPEECH INPUT

Publication number: 20090214117

Abstract: Described is a bimodal data input technology by which handwriting recognition results are combined with speech recognition results to improve overall recognition accuracy. Handwriting data and speech data corresponding to mathematical symbols are received and processed (including being recognized) into respective graphs. A fusion mechanism uses the speech graph to enhance the handwriting graph, e.g., to better distinguish between similar handwritten symbols that are often misrecognized. The graphs include nodes representing symbols, and arcs between the nodes representing probability scores. When arcs in the first and second graphs are determined to match one another, such as aligned in time and associated with corresponding symbols, the probability score in the second graph for that arc is used to adjust the matching probability score in the first graph. Normalization and smoothing may be performed to correspond the graphs to one another and to control the influence of one graph on the other.

Type: Application

Filed: February 26, 2008

Publication date: August 27, 2009

Applicant: MICROSOFT CORPORATION

Inventors: Lei Ma, Yu Shi, Frank Kao-ping Soong
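
The fusion step described in this abstract can be illustrated with a minimal sketch. It assumes that arcs carry a symbol label, a time interval, and a probability, that two arcs match when they share a symbol and overlap in time, and that scores are combined by log-linear interpolation. The `Arc` class, the `weight` and `floor` parameters, and the interpolation rule are hypothetical choices for illustration, not details taken from the application.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    symbol: str   # symbol label carried by the arc
    start: float  # hypothetical time stamps, in seconds
    end: float
    prob: float   # probability score on the arc


def overlaps(a: Arc, b: Arc) -> bool:
    """Treat two arcs as time-aligned if their intervals overlap."""
    return a.start < b.end and b.start < a.end


def fuse(handwriting, speech, weight=0.5, floor=1e-3):
    """Adjust handwriting arc scores using matching speech arc scores.

    `weight` controls how strongly the speech graph influences the result and
    `floor` is a smoothing value used when no speech arc matches. Both are
    assumptions of this sketch.
    """
    fused = []
    for h in handwriting:
        matches = [s.prob for s in speech
                   if s.symbol == h.symbol and overlaps(h, s)]
        s_prob = max(matches + [floor])
        # Log-linear interpolation of the two scores.
        score = (h.prob ** (1.0 - weight)) * (s_prob ** weight)
        fused.append(Arc(h.symbol, h.start, h.end, score))
    # Renormalize over the competing arcs that were passed in.
    total = sum(a.prob for a in fused)
    return [Arc(a.symbol, a.start, a.end, a.prob / total) for a in fused]


# Toy example: "z" and "2" are easily confused in handwriting.
handwriting = [Arc("z", 0.0, 1.0, 0.45), Arc("2", 0.0, 1.0, 0.55)]
speech = [Arc("z", 0.2, 0.8, 0.9)]
fused = fuse(handwriting, speech)
best = max(fused, key=lambda arc: arc.prob)
print(best.symbol)
```

In this toy case the handwriting graph alone slightly prefers "2", while the matching speech arc shifts the fused result toward "z".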

STATUS-AWARE PERSONAL INFORMATION MANAGEMENT

Publication number: 20090177601

Abstract: Described is a technology by which personal information that comes into a computer system is intelligently managed according to current state data including user presence and/or user attention data. Incoming information is processed against the state data to determine whether corresponding data is to be output, and if so, what output modality or modalities to use. For example, if a user is present and busy, a notification may be blocked or deferred to avoid disturbing the user. Cost analysis may be used to determine the cost of outputting the data. In addition to user state data, the importance of the information, other state data, the cost of converting data to another format for output (e.g., text-to-speech), and/or user preference data, may factor into the decision. The output data may be modified (e.g., audio made louder) based on a current output environment as determined via the state data.

Type: Application

Filed: January 8, 2008

Publication date: July 9, 2009

Applicant: MICROSOFT CORPORATION

Inventors: Chao Huang, Chunhui Zhang, Frank Kao-ping Soong, Zhengyou Zhang, Yuan Kong

CONSTRAINED LINE SEARCH OPTIMIZATION FOR DISCRIMINATIVE TRAINING OF HMMS

Publication number: 20090132444

Abstract: An exemplary method for optimizing a continuous density hidden Markov model (CDHMM) includes imposing a constraint for discriminative training, approximating an objective function as a smooth function of CDHMM parameters and performing a constrained line search on the smoothed function to optimize values of the CDHMM parameters. Various other methods, devices and systems are disclosed.

Type: Application

Filed: November 20, 2007

Publication date: May 21, 2009

Applicant: Microsoft Corporation

Inventors: Peng Liu, Hui Jiang, Frank Kao-Ping K. Soong

Unnatural prosody detection in speech synthesis

Publication number: 20090083036

Abstract: Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed.

Type: Application

Filed: September 20, 2007

Publication date: March 26, 2009

Applicant: Microsoft Corporation

Inventors: Yong Zhao, Frank Kao-ping Soong, Min Chu, Lijuan Wang

HMM-BASED BILINGUAL (MANDARIN-ENGLISH) TTS TECHNIQUES

Publication number: 20090055162

Abstract: An exemplary method for generating speech based on text in one or more languages includes providing a phone set for two or more languages, training multilingual HMMs where the HMMs include state level sharing across languages, receiving text in one or more of the languages of the multilingual HMMs and generating speech, for the received text, based at least in part on the multilingual HMMs. Other exemplary techniques include mapping between a decision tree for a first language and a decision tree for a second language, and optionally vice versa, and Kullback-Leibler divergence analysis for a multilingual text-to-speech system.

Type: Application

Filed: August 20, 2007

Publication date: February 26, 2009

Applicant: Microsoft Corporation

Inventors: Yao Qian, Frank Kao-Ping K. Soong

Combined input processing for a computing device

Patent number: 7496513

Abstract: Input is received from at least two different input sources. Information from these sources is combined to provide a result. In a particular example, input from one source corresponds to potential recognition candidates, and input from another source corresponds to other potential candidates. These candidates are combined to select a result.

Type: Grant

Filed: June 28, 2005

Date of Patent: February 24, 2009

Assignee: Microsoft Corporation

Inventors: Frank Kao-Ping Soong, Jian-Lai Zhou, Ye Tian

Hidden Markov Model Based Handwriting/Calligraphy Generation

Publication number: 20090041354

Abstract: An exemplary method for handwritten character generation includes receiving one or more characters and, for the one or more received characters, generating handwritten characters using Hidden Markov Models trained for generating handwritten characters. In such a method the trained Hidden Markov Models can be adapted using a technique such as a maximum a posteriori technique, a maximum likelihood linear regression technique or an Eigen-space technique.

Type: Application

Filed: August 10, 2007

Publication date: February 12, 2009

Applicant: Microsoft Corporation

Inventors: Peng Liu, Yi-Jian Wu, Lei Ma, Frank Kao-Ping K. Soong

Voice persona service for embedding text-to-speech features into software programs

Publication number: 20090006096

Abstract: Described is a voice persona service by which users convert text into speech waveforms, based on user-provided parameters and voice data from a service data store. The service may be remotely accessed, such as via the Internet. The user may provide text tagged with parameters, with the text sent to a text-to-speech engine along with base or custom voice data, and the resulting waveform morphed based on the tags. The user may also provide speech. Once created, a voice persona corresponding to the speech waveform may be persisted, exchanged, made public, shared and so forth. In one example, the voice persona service receives user input and parameters, and retrieves a base or custom voice that may be edited by the user via a morphing algorithm. The service outputs a waveform, such as a .wav file for embedding in a software program, and persists the voice persona corresponding to that waveform.

Type: Application

Filed: June 27, 2007

Publication date: January 1, 2009

Applicant: Microsoft Corporation

Inventors: Yusheng Li, Min Chu, Xin Zou, Frank Kao-ping Soong

SYMBOL GRAPH GENERATION IN HANDWRITTEN MATHEMATICAL EXPRESSION RECOGNITION

Publication number: 20080240570

Abstract: A forward pass through a sequence of strokes representing a handwritten equation is performed from the first stroke to the last stroke in the sequence. At each stroke, a path score is determined for a plurality of symbol-relation pairs that each represents a symbol and its spatial relation to a predecessor symbol. A symbol graph having nodes and links is constructed by backtracking through the strokes from the last stroke to the first stroke and assigning scores to the links based on the path scores for the symbol-relation pairs. The symbol graph is used to recognize a mathematical expression based in part on the scores for the links and the mathematical expression is stored.

Type: Application

Filed: March 29, 2007

Publication date: October 2, 2008

Applicant: Microsoft Corporation

Inventors: Yu Shi, Frank Kao-Ping Soong, Jian-lai Zhou, Dongmei Zhang

MINIMUM DIVERGENCE BASED DISCRIMINATIVE TRAINING FOR PATTERN RECOGNITION

Publication number: 20080243503

Abstract: A method of providing discriminative training of a speech recognition unit is discussed. The method includes receiving an acoustic indication of an utterance having a hypothesis space and comparing the hypothesis space against a reference. The method measures the Kullback-Leibler Divergence (KLD) between the reference and the hypothesis space to adjust the reference and stores the adjusted reference on a tangible storage medium.

Type: Application

Filed: March 30, 2007

Publication date: October 2, 2008

Applicant: Microsoft Corporation

Inventors: Frank Kao-Ping Soong, Peng Liu, Jian-lai Zhou, Dongmei Zhang

EVENT RECOGNITION

Publication number: 20080215318

Abstract: Recognition of events can be performed by accessing an audio signal having static and dynamic features. A value for the audio signal can be calculated by utilizing different weights for the static and dynamic features such that a frame of the audio signal can be associated with a particular event. A filter can also be used to aid in determining the event for the frame.

Type: Application

Filed: March 1, 2007

Publication date: September 4, 2008

Applicant: Microsoft Corporation

Inventors: Zhengyou Zhang, Yuan Kong, Chao Huang, Frank Kao-Ping K. Soong
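
The weighting and filtering described in this abstract might look roughly like the following sketch. It assumes each frame already has per-event scores (for example, log-likelihoods) computed separately from static and dynamic features, and it uses a sliding majority vote as the filter. The function names, the weights, and the choice of a majority filter are hypothetical, not taken from the application.

```python
from collections import Counter


def classify_frames(static_scores, dynamic_scores, w_static=0.7, w_dynamic=0.3):
    """Label each frame with the event that has the best weighted score.

    Each element of `static_scores` and `dynamic_scores` maps an event name
    to a score for one frame. The weights are assumed values.
    """
    labels = []
    for s, d in zip(static_scores, dynamic_scores):
        combined = {event: w_static * s[event] + w_dynamic * d[event]
                    for event in s}
        labels.append(max(combined, key=combined.get))
    return labels


def majority_filter(labels, radius=1):
    """Smooth frame labels with a sliding majority vote (assumed filter)."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius): i + radius + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


# Toy scores for a single frame: static features favor "speech",
# dynamic features favor "noise".
static = [{"speech": -1.0, "noise": -2.0}]
dynamic = [{"speech": -3.0, "noise": -1.0}]
static_heavy = classify_frames(static, dynamic, w_static=0.7, w_dynamic=0.3)
dynamic_heavy = classify_frames(static, dynamic, w_static=0.3, w_dynamic=0.7)

# A single stray "noise" frame is removed by the filter.
smoothed = majority_filter(["speech", "speech", "noise", "speech", "speech"])
```

Changing the relative weights changes which feature type dominates the decision for a frame, and the filter suppresses isolated label flips between neighboring frames.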

Name synthesis

Publication number: 20080208574

Abstract: An automated method of providing a pronunciation of a word to a remote device is disclosed. The method includes receiving an input indicative of the word to be pronounced. The method further includes searching a database having a plurality of records. Each of the records has an indication of a textual representation and an associated indication of an audible representation. At least one output is provided to the remote device of an audible representation of the word to be pronounced.

Type: Application

Filed: February 28, 2007

Publication date: August 28, 2008

Applicant: Microsoft Corporation

Inventors: Yining Chen, Yusheng Li, Min Chu, Frank Kao-Ping Soong

Unsupervised labeling of sentence level accent

Publication number: 20080201145

Abstract: Methods are disclosed for automatic accent labeling without manually labeled data. The methods are designed to exploit accent distribution between function and content words.

Type: Application

Filed: February 20, 2007

Publication date: August 21, 2008

Applicant: Microsoft Corporation

Inventors: YiNing Chen, Frank Kao-ping Soong, Min Chu

Line Spectrum pair density modeling for speech applications

Publication number: 20080195381

Abstract: Novel techniques for providing superior performance and sound quality in speech applications, such as speech synthesis, speech coding, and automatic speech recognition, are hereby disclosed. In one illustrative embodiment, a method includes modeling a speech signal with parameters comprising line spectrum pairs. Density parameters are provided based on the density of the line spectrum pairs. A speech application output, such as synthesized speech, is provided based at least in part on the line spectrum pair density parameters. The line spectrum pair density parameters use computing resources efficiently while providing improved performance and sound quality in the speech application output.

Type: Application

Filed: February 9, 2007

Publication date: August 14, 2008

Applicant: Microsoft Corporation

Inventors: Frank Kao-Ping Soong, Yao Qian

Segmentation posterior based boundary point determination

Publication number: 20080189109

Abstract: Boundary points for speech in an audio signal are determined based on posterior probabilities for the boundary points given a set of possible segmentations of the audio signal. The boundary point posterior probability is determined based on a set of level posterior probabilities that each provide the probability of a sequence of feature vectors given one of the segmentations in the set of possible segmentations.

Type: Application

Filed: February 5, 2007

Publication date: August 7, 2008

Applicant: Microsoft Corporation

Inventors: Yu Shi, Frank Kao-Ping Soong

Position-dependent phonetic models for reliable pronunciation identification

Publication number: 20080172224

Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.

Type: Application

Filed: January 11, 2007

Publication date: July 17, 2008

Applicant: Microsoft Corporation

Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong

Multi-space distribution for pattern recognition based on mixed continuous and discrete observations

Publication number: 20080120108

Abstract: Performing speech recognition on a tonal language is done using a plurality of tonal models. Each tonal model has a multi-space distribution and corresponds to a known syllable in a language. A first data stream indicative of an observation of an utterance is received. The observation has both a discrete and a continuous tonal feature. A second data stream indicative of spectral features of a syllable of an utterance is also received. The first data stream is compared against at least one of the plurality of tonal models and the second data stream is compared against a spectral model.

Type: Application

Filed: November 16, 2006

Publication date: May 22, 2008

Inventors: Frank Kao-Ping Soong, Yao Qian

Auto segmentation based partitioning and clustering approach to robust endpointing

Publication number: 20080059169

Abstract: Possible segmentations for an audio signal are scored based on distortions for feature vectors of the audio signal and the total number of segments in the segmentation. The scores are used to select a segmentation and the selected segmentation is used to identify a starting point and an ending point for a speech signal in the audio signal.

Type: Application

Filed: August 15, 2006

Publication date: March 6, 2008

Applicant: Microsoft Corporation

Inventors: Yu Shi, Frank Kao-ping Soong, Jian-lai Zhou
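
One way to picture the scoring in this abstract is a dynamic program that trades within-segment distortion against the number of segments. The sketch below assumes one scalar feature per frame (such as energy), squared-error distortion around each segment mean, a fixed per-segment penalty, and a simple mean-energy threshold for picking the speech endpoints. All of these choices, and the function names, are illustrative assumptions rather than the method claimed in the application.

```python
def segment(frames, penalty):
    """Return the segmentation minimizing distortion + penalty * segments."""
    n = len(frames)
    # Prefix sums give the squared-error distortion of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for x in frames:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def distortion(i, j):
        # Sum of squared deviations from the mean of frames[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: lowest cost for frames[:j]
    back = [0] * (n + 1)               # back[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + distortion(i, j) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i

    bounds, j = [], n
    while j > 0:
        bounds.append((back[j], j))
        j = back[j]
    return bounds[::-1]


def endpoints(frames, bounds, threshold):
    """Pick start and end frames from segments whose mean exceeds threshold."""
    speech = [(i, j) for i, j in bounds
              if sum(frames[i:j]) / (j - i) > threshold]
    if not speech:
        return None
    return speech[0][0], speech[-1][1]


# Toy per-frame energies: silence, a burst of speech, silence.
frames = [0.1, 0.1, 0.1, 5.0, 5.2, 4.9, 5.1, 0.1, 0.1]
bounds = segment(frames, penalty=1.0)
start_end = endpoints(frames, bounds, threshold=1.0)
print(bounds, start_end)
```

A larger `penalty` favors fewer, longer segments, while a smaller one lets the segmentation follow short fluctuations in the signal.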

Calculating cost measures between HMM acoustic models

Publication number: 20080059184

Abstract: Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMM) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing individual KLDs calculated through state-pair-by-state-pair comparisons.

Type: Application

Filed: August 22, 2006

Publication date: March 6, 2008

Applicant: Microsoft Corporation

Inventors: Frank Kao-Ping K. Soong, Jian-Lai Zhou, Peng Liu
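
The state-by-state summation in this abstract can be illustrated with a simplified sketch. It assumes each HMM state is a single diagonal-covariance Gaussian, so the closed-form KLD applies, and that both models already have the same number of states. It does not implement the unscented transform for Gaussian mixtures or the dynamic-programming state equalization that the abstract mentions; the function names are hypothetical.

```python
import math


def kld_diag_gaussian(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def kld_hmm(states_p, states_q):
    """Sum per-state KLDs over aligned state pairs.

    Each state is a (means, variances) tuple. Equal state counts are an
    assumption of this sketch.
    """
    if len(states_p) != len(states_q):
        raise ValueError("this sketch requires the same number of states")
    return sum(kld_diag_gaussian(mp, vp, mq, vq)
               for (mp, vp), (mq, vq) in zip(states_p, states_q))


# Two toy 2-state models that differ only in the mean of the first state.
hmm_a = [([0.0], [1.0]), ([2.0], [1.0])]
hmm_b = [([1.0], [1.0]), ([2.0], [1.0])]
print(kld_hmm(hmm_a, hmm_b))
```

For unit variances, the divergence of a state pair reduces to half the squared distance between the means, which makes the toy result easy to check by hand.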

Identifying language of origin for words using estimates of normalized appearance frequency

Publication number: 20080059151

Abstract: The language of origin of a word or named entity is predicted using estimates of frequency of occurrence of the word or named entity in different languages. In one embodiment, the normalized frequency of occurrence of the word or named entity in a variety of different languages is estimated and the values are used as features in a feature vector which is scored and used to identify language of origin.

Type: Application

Filed: September 1, 2006

Publication date: March 6, 2008

Applicant: Microsoft Corporation

Inventors: Yi Ning Chen, Min Chu, Jiali You, Frank Kao-Ping Soong
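
The feature construction in this abstract can be sketched as follows. The example assumes that raw occurrence counts of a word are available per language, that each count is divided by the size of that language's corpus, and that the resulting values are normalized to sum to one. Picking the largest feature stands in for the scoring step, which the abstract does not detail. The counts, corpus sizes, and function names are made-up illustrations.

```python
def normalized_frequencies(counts, corpus_sizes):
    """Build a per-language feature vector of normalized frequencies."""
    relative = {lang: counts.get(lang, 0) / corpus_sizes[lang]
                for lang in corpus_sizes}
    total = sum(relative.values())
    if total == 0:
        # The word was not seen in any language.
        return {lang: 0.0 for lang in relative}
    return {lang: value / total for lang, value in relative.items()}


def guess_origin(features):
    """Stand-in scorer: choose the language with the largest feature."""
    return max(features, key=features.get)


# Hypothetical occurrence counts and corpus sizes.
counts = {"en": 5, "fr": 40, "de": 2}
corpus_sizes = {"en": 1000, "fr": 1000, "de": 500}
features = normalized_frequencies(counts, corpus_sizes)
origin = guess_origin(features)
print(origin)
```

Dividing by corpus size first keeps a language with a large corpus from dominating simply because it has more text.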
