Patents by Inventor Frank Kao-Ping Soong

Frank Kao-Ping Soong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7496513
    Abstract: Input is received from at least two different input sources. Information from these sources is combined to provide a result. In a particular example, input from one source corresponds to potential recognition candidates, and input from another source corresponds to other potential candidates. These candidates are combined to select a result.
    Type: Grant
    Filed: June 28, 2005
    Date of Patent: February 24, 2009
    Assignee: Microsoft Corporation
    Inventors: Frank Kao-Ping Soong, Jian-Lai Zhou, Ye Tian
  • Publication number: 20090006096
    Abstract: Described is a voice persona service by which users convert text into speech waveforms, based on user-provided parameters and voice data from a service data store. The service may be remotely accessed, such as via the Internet. The user may provide text tagged with parameters, with the text sent to a text-to-speech engine along with base or custom voice data, and the resulting waveform morphed based on the tags. The user may also provide speech. Once created, a voice persona corresponding to the speech waveform may be persisted, exchanged, made public, shared and so forth. In one example, the voice persona service receives user input and parameters, and retrieves a base or custom voice that may be edited by the user via a morphing algorithm. The service outputs a waveform, such as a .wav file for embedding in a software program, and persists the voice persona corresponding to that waveform.
    Type: Application
    Filed: June 27, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Yusheng Li, Min Chu, Xin Zou, Frank Kao-ping Soong
  • Publication number: 20080240570
    Abstract: A forward pass through a sequence of strokes representing a handwritten equation is performed from the first stroke to the last stroke in the sequence. At each stroke, a path score is determined for a plurality of symbol-relation pairs that each represents a symbol and its spatial relation to a predecessor symbol. A symbol graph having nodes and links is constructed by backtracking through the strokes from the last stroke to the first stroke and assigning scores to the links based on the path scores for the symbol-relation pairs. The symbol graph is used to recognize a mathematical expression based in part on the scores for the links and the mathematical expression is stored.
    Type: Application
    Filed: March 29, 2007
    Publication date: October 2, 2008
    Applicant: Microsoft Corporation
    Inventors: Yu Shi, Frank Kao-Ping Soong, Jian-lai Zhou, Dongmei Zhang
  • Publication number: 20080243503
    Abstract: A method of providing discriminative training of a speech recognition unit is discussed. The method includes receiving an acoustic indication of an utterance having a hypothesis space and comparing the hypothesis space against a reference. The method measures the Kullback-Leibler Divergence (KLD) between the reference and the hypothesis space to adjust the reference and stores the adjusted reference on a tangible storage medium.
    Type: Application
    Filed: March 30, 2007
    Publication date: October 2, 2008
    Applicant: Microsoft Corporation
    Inventors: Frank Kao-Ping Soong, Peng Liu, Jian-lai Zhou, Dongmei Zhang
  • Publication number: 20080208574
    Abstract: An automated method of providing a pronunciation of a word to a remote device is disclosed. The method includes receiving an input indicative of the word to be pronounced. The method further includes searching a database having a plurality of records. Each of the records has an indication of a textual representation and an associated indication of an audible representation. At least one output is provided to the remote device of an audible representation of the word to be pronounced.
    Type: Application
    Filed: February 28, 2007
    Publication date: August 28, 2008
    Applicant: Microsoft Corporation
    Inventors: Yining Chen, Yusheng Li, Min Chu, Frank Kao-Ping Soong
  • Publication number: 20080201145
    Abstract: Methods are disclosed for automatic accent labeling without manually labeled data. The methods are designed to exploit accent distribution between function and content words.
    Type: Application
    Filed: February 20, 2007
    Publication date: August 21, 2008
    Applicant: Microsoft Corporation
    Inventors: YiNing Chen, Frank Kao-ping Soong, Min Chu
  • Publication number: 20080195381
    Abstract: Novel techniques for providing superior performance and sound quality in speech applications, such as speech synthesis, speech coding, and automatic speech recognition, are hereby disclosed. In one illustrative embodiment, a method includes modeling a speech signal with parameters comprising line spectrum pairs. Density parameters are provided based on the density of the line spectrum pairs. A speech application output, such as synthesized speech, is provided based at least in part on the line spectrum pair density parameters. The line spectrum pair density parameters use computing resources efficiently while providing improved performance and sound quality in the speech application output.
    Type: Application
    Filed: February 9, 2007
    Publication date: August 14, 2008
    Applicant: Microsoft Corporation
    Inventors: Frank Kao-Ping Soong, Yao Qian
  • Publication number: 20080189109
    Abstract: Boundary points for speech in an audio signal are determined based on posterior probabilities for the boundary points given a set of possible segmentations of the audio signal. The boundary point posterior probability is determined based on a set of level posterior probabilities that each provide the probability of a sequence of feature vectors given one of the segmentations in the set of possible segmentations.
    Type: Application
    Filed: February 5, 2007
    Publication date: August 7, 2008
    Applicant: Microsoft Corporation
    Inventors: Yu Shi, Frank Kao-Ping Soong
  • Publication number: 20080172224
    Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
    Type: Application
    Filed: January 11, 2007
    Publication date: July 17, 2008
    Applicant: Microsoft Corporation
    Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong
  • Publication number: 20080120108
    Abstract: Performing speech recognition on a tonal language is done using a plurality of tonal models. Each tonal model has a multi-space distribution and corresponds to a known syllable in a language. A first data stream indicative of an observation of an utterance is received. The observation has both a discrete and a continuous tonal feature. A second data stream indicative of spectral features of a syllable of an utterance is also received. The first data stream is compared against at least one of the plurality of tonal models and the second data stream is compared against a spectral model.
    Type: Application
    Filed: November 16, 2006
    Publication date: May 22, 2008
    Inventors: Frank Kao-Ping Soong, Yao Qian
  • Publication number: 20080059183
    Abstract: A multi-state pattern recognition model with non-uniform kernel allocation is formed by setting a number of states for a multi-state pattern recognition model and assigning different numbers of kernels to different states. The kernels are then trained using training data to form the multi-state pattern recognition model.
    Type: Application
    Filed: August 16, 2006
    Publication date: March 6, 2008
    Applicant: Microsoft Corporation
    Inventors: Peng Liu, Jian-lai Zhou, Frank Kao-ping Soong
  • Publication number: 20080059169
    Abstract: Possible segmentations for an audio signal are scored based on distortions for feature vectors of the audio signal and the total number of segments in the segmentation. The scores are used to select a segmentation and the selected segmentation is used to identify a starting point and an ending point for a speech signal in the audio signal.
    Type: Application
    Filed: August 15, 2006
    Publication date: March 6, 2008
    Applicant: Microsoft Corporation
    Inventors: Yu Shi, Frank Kao-ping Soong, Jian-lai Zhou
  • Publication number: 20080059151
    Abstract: The language of origin of a word or named entity is predicted using estimates of frequency of occurrence of the word or named entity in different languages. In one embodiment, the normalized frequency of occurrence of the word or named entity in a variety of different languages is estimated and the values are used as features in a feature vector which is scored and used to identify language of origin.
    Type: Application
    Filed: September 1, 2006
    Publication date: March 6, 2008
    Applicant: Microsoft Corporation
    Inventors: Yi Ning Chen, Min Chu, Jiali You, Frank Kao-Ping Soong
  • Publication number: 20070005355
    Abstract: A reliable algorithm for estimating the full covariance matrix of a pattern unit's state output distribution in a pattern recognition system is discussed. An intermediate hierarchical tree structure is built to relate models for product units. Full covariance matrices of the pattern unit's state output distribution are estimated based on all the related nodes in the tree.
    Type: Application
    Filed: July 1, 2005
    Publication date: January 4, 2007
    Applicant: Microsoft Corporation
    Inventors: Ye Tian, Frank Kao-Ping Soong, Jian-Lai Zhou
  • Patent number: 6701291
    Abstract: A method and apparatus for extracting speech features from a speech signal in which the linear frequency spectrum data, as generated, for example, by a conventional frequency transform, is first converted to logarithmic frequency spectrum data having frequency data distributed on a substantially logarithmic (rather than linear) frequency scale. Then, a plurality of digital auditory filters is applied to the resultant logarithmic frequency spectrum data, each of these filters having a substantially similar shape, but centered at different points on the logarithmic frequency scale. Because each of the filters has a similar shape, the feature extraction approach of the present invention advantageously can be easily modified or tuned by adjusting each of the filters in a coordinated manner, with the adjustment of only a handful of filter parameters.
    Type: Grant
    Filed: April 2, 2001
    Date of Patent: March 2, 2004
    Assignee: Lucent Technologies Inc.
    Inventors: Qi P. Li, Olivier Siohan, Frank Kao-Ping Soong
  • Publication number: 20020062211
    Abstract: A method and apparatus for extracting speech features from a speech signal in which the linear frequency spectrum data, as generated, for example, by a conventional frequency transform, is first converted to logarithmic frequency spectrum data having frequency data distributed on a substantially logarithmic (rather than linear) frequency scale. Then, a plurality of digital auditory filters is applied to the resultant logarithmic frequency spectrum data, each of these filters having a substantially similar shape, but centered at different points on the logarithmic frequency scale. Because each of the filters has a similar shape, the feature extraction approach of the present invention advantageously can be easily modified or tuned by adjusting each of the filters in a coordinated manner, with the adjustment of only a handful of filter parameters.
    Type: Application
    Filed: April 2, 2001
    Publication date: May 23, 2002
    Inventors: Qi P. Li, Olivier Siohan, Frank Kao-Ping Soong
  • Patent number: 6138095
    Abstract: Speech recognition in which the log probabilities of the null and alternative hypotheses are computed for an input speech sample by comparison with specific stored speech vocabularies/grammars and with general speech characteristics. The difference in probabilities is normalized by the magnitude of the null hypothesis to derive a likelihood factor which is compared with a rejection threshold that is utterance-length dependent. Advantageously, a high-order polynomial representation of the rejection threshold length dependency may be simplified by a series of piece-wise constants which are stored as rejection thresholds to be selected in accordance with the length of the input speech sample.
    Type: Grant
    Filed: September 3, 1998
    Date of Patent: October 24, 2000
    Assignee: Lucent Technologies Inc.
    Inventors: Sunil K. Gupta, Frank Kao-Ping Soong
  • Patent number: 5680506
    Abstract: The present invention provides a novel method of analyzing speech signals in order to reduce the computational power required to perform both speech compression and voice recognition operations. Digital speech signals are provided to a speech analyzer which generates a linear predictive coded (LPC) speech analysis signal that is compatible for use in both the voice recognition circuit and the speech compression circuit. The speech analysis signal is then provided to the compression circuit, which further processes the signal into a form used by an encoder and then the encoder encodes the processed signal. The same speech analysis signal is also provided to a voice recognition circuit, which further processes the signal into a form used by a recognizer and then the recognizer performs recognition on the processed signal.
    Type: Grant
    Filed: December 29, 1994
    Date of Patent: October 21, 1997
    Assignee: Lucent Technologies Inc.
    Inventors: Peter Kroon, Suhas A. Pai, Frank Kao-Ping Soong
  • Patent number: 5675704
    Abstract: A facility is provided for allowing a caller to place a telephone call by merely uttering a label identifying a desired called destination and to charge the telephone call to a particular billing account by merely uttering a label identifying that account. Alternatively, the caller may place the call by dialing or uttering the telephone number of the called destination or by entering a speed dial code associated with that telephone number. The facility includes a speaker verification system which employs cohort normalized scoring. Cohort normalized scoring provides a dynamic threshold for the verification process, making the process more robust to variation in training and verification utterances. Such variation may be caused by, e.g., changes in communication channel characteristics or speaker loudness level.
    Type: Grant
    Filed: April 26, 1996
    Date of Patent: October 7, 1997
    Assignee: Lucent Technologies Inc.
    Inventors: Biing-Hwang Juang, Chin-Hui Lee, Aaron Edward Rosenberg, Frank Kao-Ping Soong
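
Illustrative note on patent 5675704: the abstract says that cohort normalized scoring gives the verifier a dynamic threshold, but it does not give a formula. The sketch below shows one common way such a score can be formed and is not taken from the patent itself. It assumes each speaker model returns a log-likelihood for the utterance, and that the cohort statistic is a plain mean; the function names and the `margin` parameter are hypothetical.

```python
def cohort_normalized_score(claimed_loglik, cohort_logliks):
    """Claimed speaker's log-likelihood minus the mean cohort log-likelihood.

    Assumption: the cohort statistic is the arithmetic mean. Other choices
    (for example the maximum) are equally plausible.
    """
    if not cohort_logliks:
        raise ValueError("at least one cohort score is required")
    cohort_mean = sum(cohort_logliks) / len(cohort_logliks)
    return claimed_loglik - cohort_mean


def accept(claimed_loglik, cohort_logliks, margin=0.0):
    """Accept the identity claim when the normalized score exceeds margin."""
    return cohort_normalized_score(claimed_loglik, cohort_logliks) > margin


# Scores for one utterance under clean conditions.
clean = cohort_normalized_score(-10.0, [-12.0, -14.0])

# The same utterance when a channel change lowers every score by 5.
shifted = cohort_normalized_score(-15.0, [-17.0, -19.0])
```

Under these assumptions `clean` and `shifted` are equal, because an offset that affects every model's score alike cancels in the subtraction. That cancellation is the sense in which the effective threshold moves with channel or loudness conditions instead of staying fixed.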