Training Of HMM (EPO) Patents (Class 704/256.2)
  • Patent number: 9280968
    Abstract: A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.
    Type: Grant
    Filed: October 4, 2013
    Date of Patent: March 8, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Enrico Luigi Bocchieri, Dimitrios Dimitriadis
  • Patent number: 9276893
    Abstract: Techniques, systems, and articles of manufacture for determining the current logical state of a social media communication thread. A method includes computing an initial probability for applicability of each of multiple logical states for a first entry in a social media communication thread, wherein each logical state corresponds to a stage of interaction between customers of an enterprise and/or agents of the enterprise based on features derived from content of entries in the communication thread, network structure of entries, and identity of authors of entries, computing a transition probability between each subsequent consecutive entry in the communication thread, wherein the transition probability indicates the probability of moving from one logical state to another, and determining the current logical state of the communication thread based on the computed initial probability for the first entry and the computed transition probability between each subsequent entry in the communication thread.
    Type: Grant
    Filed: January 15, 2013
    Date of Patent: March 1, 2016
    Assignee: International Business Machines Corporation
    Inventors: Jitendra Ajmera, Ashish Verma, Katyaini H. Naga
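The abstract of patent 9276893 describes combining an initial state probability for the first entry with per-step transition probabilities to decide the thread's current logical state. One way to read that is as a max-product (Viterbi-style) search, sketched below. This is an illustration under stated assumptions, not the patented method: the state names, the probability values, and the function name `current_state` are all hypothetical, and the feature extraction that would produce the probabilities is omitted.

```python
import math

def current_state(initial, transitions):
    """Return (most likely current state, most likely state path).

    initial:     dict state -> probability for the first entry
    transitions: list with one dict per consecutive pair of entries,
                 each mapping prev_state -> {next_state: probability}
    """
    # Log-score of the best path ending in each state after the first entry.
    score = {s: math.log(p) for s, p in initial.items() if p > 0}
    backptrs = []
    for trans in transitions:
        new_score, ptr = {}, {}
        for prev, prev_score in score.items():
            for nxt, p in trans.get(prev, {}).items():
                if p <= 0:
                    continue
                cand = prev_score + math.log(p)
                if nxt not in new_score or cand > new_score[nxt]:
                    new_score[nxt] = cand
                    ptr[nxt] = prev
        score = new_score
        backptrs.append(ptr)
    # The current state is the end of the best-scoring path.
    last = max(score, key=score.get)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return last, path

# Hypothetical three-state example with the same transition table at each step.
initial = {"complaint": 0.7, "request_info": 0.2, "resolved": 0.1}
step = {
    "complaint": {"complaint": 0.3, "request_info": 0.6, "resolved": 0.1},
    "request_info": {"complaint": 0.1, "request_info": 0.2, "resolved": 0.7},
    "resolved": {"resolved": 1.0},
}
state, path = current_state(initial, [step, step])
print(state, path)
```

Working in log space avoids underflow on long threads; a per-step transition table is used because the abstract computes a transition probability between each pair of consecutive entries.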
  • Patent number: 9263030
    Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g. 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.
    Type: Grant
    Filed: January 23, 2013
    Date of Patent: February 16, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Shizhen Wang, Yifan Gong, Fileno Alleva
  • Patent number: 9218806
    Abstract: Features are disclosed for selecting and using multiple transforms associated with a particular remote device for use in automatic speech recognition (“ASR”). Each transform may be based on statistics that have been generated from processing utterances that share some characteristic (e.g., acoustic characteristics, time frame within which the utterances were processed, etc.). When an utterance is received from the remote device, a particular transform or set of transforms may be selected for use in speech processing based on data obtained from the remote device, speech processing of a portion of the utterance, speech processing of prior utterances, etc. The transform or transforms used in processing the utterances may then be updated based on the results of the speech processing.
    Type: Grant
    Filed: May 10, 2013
    Date of Patent: December 22, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Stan Weidner Salvador, Shengbin Yang, Hugh Evan Secker-Walker, Karthik Ramakrishnan
  • Patent number: 9043209
    Abstract: This device 301 stores a first content-specific language model representing a probability that a specific word appears in a word sequence representing a first content, and a second content-specific language model representing a probability that the specific word appears in a word sequence representing a second content. Based on a first probability parameter representing a probability that a content represented by a target word sequence included in a speech recognition hypothesis generated by a speech recognition process of recognizing a word sequence corresponding to a speech is the first content, a second probability parameter representing a probability that the content represented by the target word sequence is the second content, the first content-specific language model and the second content-specific language model, the device creates a language model representing a probability that the specific word appears in a word sequence corresponding to a part corresponding to the target word sequence of the speech.
    Type: Grant
    Filed: September 3, 2009
    Date of Patent: May 26, 2015
    Assignee: NEC CORPORATION
    Inventors: Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki
  • Patent number: 9037460
    Abstract: Dynamic features are utilized with CRFs to handle long-distance dependencies of output labels. The dynamic features present a probability distribution involved in explicit distance from/to a special output label that is pre-defined according to each application scenario. Besides the number of units in the segment (from the previous special output label to the current unit), the dynamic features may also include the sum of any basic features of units in the segment. Since the added dynamic features are involved in the distance from the previous specific label, the searching lattice associated with Viterbi searching is expanded to distinguish the nodes with various distances. The dynamic features may be used in a variety of different applications, such as Natural Language Processing, Text-To-Speech and Automatic Speech Recognition. For example, the dynamic features may be used to assist in prosodic break and pause prediction.
    Type: Grant
    Filed: March 28, 2012
    Date of Patent: May 19, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jian Luan, Linfang Wang, Hairong Xia, Sheng Zhao, Daniela Braga
  • Patent number: 9020820
    Abstract: A state detecting apparatus includes: a processor to execute acquiring utterance data related to uttered speech, computing a plurality of statistical quantities for feature parameters regarding features of the utterance data, creating, on the basis of the plurality of statistical quantities regarding the utterance data and another plurality of statistical quantities regarding reference utterance data based on other uttered speech, pseudo-utterance data having at least one statistical quantity equal to a statistical quantity in the other plurality of statistical quantities, computing a plurality of statistical quantities for synthetic utterance data synthesized on the basis of the pseudo-utterance data and the utterance data, and determining, on the basis of a comparison between statistical quantities of the synthetic utterance data and statistical quantities of the reference utterance data, whether the speaker who produced the uttered speech is in a first state or a second state; and a memory.
    Type: Grant
    Filed: April 13, 2012
    Date of Patent: April 28, 2015
    Assignee: Fujitsu Limited
    Inventors: Shoji Hayakawa, Naoshi Matsuo
  • Publication number: 20150100312
    Abstract: A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.
    Type: Application
    Filed: October 4, 2013
    Publication date: April 9, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Enrico Luigi BOCCHIERI, Dimitrios DIMITRIADIS
  • Patent number: 8964948
    Abstract: A method for setting a voice tag is provided, which comprises the following steps. First, the number of phone calls performed between a user and a contact person is counted. If the number of phone calls exceeds a predetermined number, or a voice dialing attempt by the user fails before the call to the contact person within a predetermined duration, the user is asked, after the phone call is complete, whether to set a voice tag corresponding to the contact person. If the user decides to set the voice tag, a voice training procedure is executed for setting the voice tag corresponding to the contact person.
    Type: Grant
    Filed: May 29, 2012
    Date of Patent: February 24, 2015
    Assignee: HTC Corporation
    Inventor: Fu-Chiang Chou
  • Patent number: 8959022
    Abstract: A method for determining a relatedness between a query video and a database video is provided. A processor extracts an audio stream from the query video to produce a query audio stream, extracts an audio stream from the database video to produce a database audio stream, produces a first-sized snippet from the query audio stream, and produces a first-sized snippet from the database audio stream. An estimation is made of a first most probable sequence of latent evidence probability vectors generating the first-sized audio snippet of the query audio stream. An estimation is made of a second most probable sequence of latent evidence probability vectors generating the first-sized audio snippet of the database audio stream. A similarity is measured between the first sequence and the second sequence producing a score of relatedness between the two snippets. Finally a relatedness is determined between the query video and a database video.
    Type: Grant
    Filed: November 19, 2012
    Date of Patent: February 17, 2015
    Assignee: Motorola Solutions, Inc.
    Inventors: Yang M. Cheng, Dusan Macho
  • Patent number: 8949130
    Abstract: In embodiments of the present invention, improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criterion.
    Type: Grant
    Filed: October 21, 2009
    Date of Patent: February 3, 2015
    Assignee: Vlingo Corporation
    Inventor: Michael S. Phillips
  • Patent number: 8914292
    Abstract: In embodiments of the present invention, improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criterion.
    Type: Grant
    Filed: October 21, 2009
    Date of Patent: December 16, 2014
    Assignee: Vlingo Corporation
    Inventor: Michael S. Phillips
  • Patent number: 8886533
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Grant
    Filed: October 25, 2011
    Date of Patent: November 11, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
  • Patent number: 8843370
    Abstract: Adjusting model parameters is described for a speech recognition system that combines recognition outputs from multiple speech recognition processes. Discriminative adjustments are made to model parameters of at least one acoustic model based on a joint discriminative criterion over multiple complementary acoustic models to lower recognition word error rate in the system.
    Type: Grant
    Filed: November 26, 2007
    Date of Patent: September 23, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Daniel Willett, Chuang He
  • Patent number: 8812322
    Abstract: Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.
    Type: Grant
    Filed: May 27, 2011
    Date of Patent: August 19, 2014
    Assignee: Adobe Systems Incorporated
    Inventors: Gautham J. Mysore, Paris Smaragdis
  • Patent number: 8756062
    Abstract: A method of generating proxy acoustic models for use in automatic speech recognition includes training acoustic models from speech received via microphone from male speakers of a first language, and adapting the acoustic models in response to language-independent speech data from female speakers of a second language, to generate proxy acoustic models for use during runtime of speech recognition of an utterance from a female speaker of the first language.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: June 17, 2014
    Assignee: General Motors LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
  • Patent number: 8744849
    Abstract: A microphone-array-based speech recognition system combines a noise cancelling technique for cancelling noise of input speech signals from an array of microphones, according to at least an inputted threshold. The system receives noise-cancelled speech signals outputted by a noise masking module through at least a speech model and at least a filler model, then computes a confidence measure score with the at least a speech model and the at least a filler model for each threshold and each noise-cancelled speech signal, and adjusts the threshold to continue the noise cancelling for achieving a maximum confidence measure score, thereby outputting a speech recognition result related to the maximum confidence measure score.
    Type: Grant
    Filed: October 12, 2011
    Date of Patent: June 3, 2014
    Assignee: Industrial Technology Research Institute
    Inventor: Hsien-Cheng Liao
  • Patent number: 8700403
    Abstract: A method of statistical modeling is provided which includes constructing a statistical model and incorporating Gaussian priors during feature selection and during parameter optimization for the construction of the statistical model.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 15, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Lin Zhao
  • Patent number: 8694316
    Abstract: An automatic speech recognition (ASR) system includes a speech-responsive application and a recognition engine. The ASR system generates user prompts to elicit certain spoken inputs, and the speech-responsive application performs operations when the spoken inputs are recognized. The recognition engine compares sounds within an input audio signal with phones within an acoustic model, to identify candidate matching phones. A recognition confidence score is calculated for each candidate matching phone, and the confidence scores are used to help identify one or more likely sequences of matching phones that appear to match a word within the grammar of the speech-responsive application. The per-phone confidence scores are evaluated against predefined confidence score criteria (for example, identifying scores below a ‘low confidence’ threshold) and the results of the evaluation are used to influence subsequent selection of user prompts.
    Type: Grant
    Filed: October 20, 2005
    Date of Patent: April 8, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: John Brian Pickering, Timothy David Poultney, Benjamin Terrick Staniford, Matthew Whitbourne
  • Patent number: 8635067
    Abstract: Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components Nc, less than N, to be used in a restructured acoustic model derived from the reference acoustic model, is identified. The desired number of components Nc is selected based on a computing environment in which the restructured acoustic model is to be deployed. The restructured acoustic model also has L states. For each given one of the L mixture models in the reference acoustic model, a merge sequence is built which records, for a given cost function, sequential mergers of pairs of the components associated with the given one of the mixture models. A portion of the Nc components is assigned to each of the L states in the restructured acoustic model.
    Type: Grant
    Filed: December 9, 2010
    Date of Patent: January 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel, John R. Hershey, Peder A. Olsen
  • Patent number: 8612225
    Abstract: A voice recognition device that recognizes a voice of an input voice signal, comprises a voice model storage unit that stores in advance a predetermined voice model having a plurality of detail levels, the plurality of detail levels being information indicating a feature property of a voice for the voice model; a detail level selection unit that selects a detail level, closest to a feature property of an input voice signal, from the detail levels of the voice model stored in the voice model storage unit; and a parameter setting unit that sets parameters for recognizing the voice of an input voice according to the detail level selected by the detail level selection unit.
    Type: Grant
    Filed: February 26, 2008
    Date of Patent: December 17, 2013
    Assignee: NEC Corporation
    Inventors: Takayuki Arakawa, Ken Hanazawa, Masanori Tsujikawa
  • Patent number: 8515758
    Abstract: Some implementations provide for speech recognition based on structured modeling, irrelevant variability normalization and unsupervised online adaptation of one or more speech recognition parameters. Some implementations may improve the ability of a runtime speech recognizer or decoder to adapt to new speakers and new environments.
    Type: Grant
    Filed: April 14, 2010
    Date of Patent: August 20, 2013
    Assignee: Microsoft Corporation
    Inventor: Qiang Huo
  • Patent number: 8510111
    Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and select a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the s
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: August 13, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
  • Patent number: 8484025
    Abstract: Disclosed embodiments relate to mapping an utterance to an action using a classifier. One illustrative computing device includes a user interface having an input component. The computing device further includes a processor and a computer-readable storage medium, having stored thereon program instructions that, upon execution by the processor, cause the computing device to perform a set of operations including: receiving an audio utterance via the input component; determining a text string based on the utterance; determining a string-feature vector based on the text string; selecting a target classifier from a set of classifiers, wherein the target classifier is selected based on a determination that a string-feature criteria of the target classifier corresponds to at least one string-feature of the string-feature vector; and initiating a target action that corresponds to the target classifier.
    Type: Grant
    Filed: October 4, 2012
    Date of Patent: July 9, 2013
    Assignee: Google Inc.
    Inventors: Pedro J. Moreno Mengibar, Martin Jansche, Fadi Biadsy
  • Patent number: 8484023
    Abstract: Techniques are disclosed for generating and using sparse representation features to improve speech recognition performance. In particular, principles of the invention provide sparse representation exemplar-based recognition techniques. For example, a method comprises the following steps. A test vector and a training data set associated with a speech recognition system are obtained. A subset of the training data set is selected. The test vector is mapped with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint. An acoustic model is trained on the new test feature set. The acoustic model trained on the new test feature set may be used to decode user speech input to the speech recognition system.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: July 9, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran, Tara N. Sainath
  • Patent number: 8386251
    Abstract: A speech recognition system is provided with iteratively refined multiple passes through the received data to enhance the accuracy of the results by introducing constraints and adaptation from initial passes into subsequent recognition operations. The multiple passes are performed on an initial utterance received from a user. The iteratively enhanced subsequent passes are also performed on following utterances received from the user, increasing overall system efficiency and accuracy.
    Type: Grant
    Filed: June 8, 2009
    Date of Patent: February 26, 2013
    Assignee: Microsoft Corporation
    Inventors: Nikko Strom, Julian Odell, Jon Hamaker
  • Patent number: 8374865
    Abstract: A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system.
    Type: Grant
    Filed: April 26, 2012
    Date of Patent: February 12, 2013
    Assignee: Google Inc.
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar, Kaisuke Nakajima, Daniel Martin Bikel
  • Patent number: 8311825
    Abstract: A system for calculating the look ahead probabilities at the nodes in a language model look ahead tree, wherein the words of the vocabulary of the language are located at the leaves of the tree, said apparatus comprising: means to assign a language model probability to each of the words of the vocabulary using a first low order language model; means to calculate the language look ahead probabilities for all nodes in said tree using said first language model; means to determine if the language model probability of one or more words of said vocabulary can be calculated using a higher order language model and updating said words with the higher order language model; and means to update the look ahead probability at only the nodes which are affected by the words where the language model has been updated.
    Type: Grant
    Filed: October 3, 2008
    Date of Patent: November 13, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Langzhou Chen
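The incremental update in the abstract of patent 8311825 can be illustrated with a small prefix tree. The sketch below assumes the look-ahead probability at a node is the maximum language model probability over the words at the leaves beneath it, which is a common definition but is not stated in the abstract. The `Node` class, the `</w>` end marker, the letter-level tree, and the probability values are hypothetical choices made for the example.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}
        self.prob = 0.0  # look-ahead probability (LM probability at a leaf)

def build_tree(low_order_lm):
    """Build a prefix tree with one leaf per word and fill look-ahead values."""
    root, leaves = Node(), {}
    for word, p in low_order_lm.items():
        node = root
        # The end marker guarantees every word ends at a leaf.
        for unit in list(word) + ["</w>"]:
            if unit not in node.children:
                node.children[unit] = Node(node)
            node = node.children[unit]
        node.prob = p
        leaves[word] = node
    _fill(root)
    return root, leaves

def _fill(node):
    # Assumed definition: a node's look-ahead is the max over its subtree.
    if node.children:
        node.prob = max(_fill(c) for c in node.children.values())
    return node.prob

def update_word(leaves, word, higher_order_prob):
    """Replace one word's probability and repair only the affected ancestors.

    Returns the number of nodes whose value changed or was set.
    """
    node = leaves[word]
    node.prob = higher_order_prob
    touched = 1
    node = node.parent
    while node is not None:
        best = max(c.prob for c in node.children.values())
        if best == node.prob:
            break  # ancestors above this node are unaffected
        node.prob = best
        touched += 1
        node = node.parent
    return touched

root, leaves = build_tree({"cat": 0.4, "car": 0.3, "dog": 0.3})
touched = update_word(leaves, "car", 0.6)  # e.g. a bigram score for "car"
print(touched, root.prob, root.children["d"].prob)
```

The early `break` is the point of the sketch: nodes outside the path from the updated leaf to the root, such as the whole "dog" branch here, are never visited.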
  • Patent number: 8301449
    Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
    Type: Grant
    Filed: October 16, 2006
    Date of Patent: October 30, 2012
    Assignee: Microsoft Corporation
    Inventors: Xiaodong He, Li Deng
  • Patent number: 8265930
    Abstract: The present invention relates to recording voice data using a voice communication device connected to a communication network and converting the voice data into a text file for delivery to a text communication device. In accordance with the present invention, the voice communication device may transfer the voice data in real-time or store the voice data on the device to be transmitted at a later time. Transcribing the voice data into a text file may be accomplished by automated computer software, either speaker-independent or speaker-dependent or by a human who transcribes the voice data into a text file. After transcribing the voice data into a text file, the text file may be delivered to a text communication device in a number of ways, such as email, file transfer protocol (FTP), or hypertext transfer protocol (HTTP).
    Type: Grant
    Filed: April 13, 2005
    Date of Patent: September 11, 2012
    Assignee: Sprint Communications Company L.P.
    Inventors: Bryce A. Jones, Raymond Edward Dickensheets
  • Patent number: 8234116
    Abstract: Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMM) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing individual KLDs calculated by state pair by state pair comparisons.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: July 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Frank Kao-Ping K. Soong, Jian-Lai Zhou
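The state-pair summation described in the abstract of patent 8234116 can be sketched for a simplified case. The code below is not the patented approach: it assumes each HMM state is a single diagonal-covariance Gaussian, so the closed-form KLD applies and no unscented transform is needed, and it assumes both HMMs already have the same number of states, so the dynamic programming alignment is skipped. The function names and the `(mean, variance)` state representation are hypothetical.

```python
import math

def kld_diag_gaussian(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def total_kld(hmm_p, hmm_q):
    """Sum state-pair KLDs; each HMM is a list of (mean, variance) pairs."""
    if len(hmm_p) != len(hmm_q):
        raise ValueError("this sketch assumes equal numbers of states")
    return sum(
        kld_diag_gaussian(mp, vp, mq, vq)
        for (mp, vp), (mq, vq) in zip(hmm_p, hmm_q)
    )

# Two hypothetical one-dimensional, two-state models.
hmm_a = [([0.0], [1.0]), ([2.0], [1.0])]
hmm_b = [([1.0], [1.0]), ([2.0], [1.0])]
print(total_kld(hmm_a, hmm_b))
```

For Gaussian mixture states the KLD has no closed form, which is why the abstract turns to an approximation; the closed-form function above would only serve as the single-component special case.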
  • Patent number: 8234112
    Abstract: Provided are an apparatus and method for generating a noise adaptive acoustic model including a noise adaptive discriminative adaptation method. The method includes: generating a baseline model parameter from large-capacity speech training data including various noise environments; and receiving the generated baseline model parameter and applying a discriminative adaptation method to the generated results to generate a migrated acoustic model parameter suitable for an actually applied environment.
    Type: Grant
    Filed: April 25, 2008
    Date of Patent: July 31, 2012
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Byung Ok Kang, Ho Young Jung, Yun Keun Lee
  • Patent number: 8229729
    Abstract: A system and method for training a statistical machine translation model and decoding or translating using the same is disclosed. A source word versus target word co-occurrence matrix is created to define word pairs. Dimensionality of the matrix may be reduced. Word pairs are mapped as vectors into continuous space where the word pairs are vectors of continuous real numbers and not discrete entities in the continuous space. A machine translation parametric model is trained using an acoustic model training method based on word pair vectors in the continuous space.
    Type: Grant
    Filed: March 25, 2008
    Date of Patent: July 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ruhi Sarikaya, Yonggang Deng, Brian Edward Doorenbos Kingsbury, Yuqing Gao
  • Patent number: 8229744
    Abstract: A method, system, and computer program for class detection and time mediated averaging of class dependent models. A technique is described to take advantage of gender information in training data and how to obtain female, male, and gender independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross gender decoding performance is avoided.
    Type: Grant
    Filed: August 26, 2003
    Date of Patent: July 24, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
  • Patent number: 8160878
    Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise-ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
    Type: Grant
    Filed: September 16, 2008
    Date of Patent: April 17, 2012
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
  • Patent number: 8086455
    Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessors and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationship (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definition and data being produced by more than one action. The user can add, change and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: December 27, 2011
    Assignee: Microsoft Corporation
    Inventors: Yifan Gong, Ye Tian
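    The dependency-driven ordering described in the abstract above can be sketched as a topological sort over declared inputs and outputs. This is a minimal illustration under stated assumptions, not the patented compiler: the declaration format (a dict mapping a step name to its input and output file sets) and the function name `order_build_steps` are hypothetical. It uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

def order_build_steps(steps):
    """Order build steps from their declared input/output files.

    steps: dict mapping step name -> (set of input files, set of output files).
    Returns a list of step names in a valid execution order.
    Raises ValueError if a file has more than one producer or if the
    declarations contain a cycle.
    """
    # Map each output file to the single step that produces it.
    producer = {}
    for name, (_inputs, outputs) in steps.items():
        for path in outputs:
            if path in producer:
                raise ValueError(
                    f"{path!r} is produced by both {producer[path]!r} and {name!r}"
                )
            producer[path] = name

    # A step depends on whichever steps produce its inputs; inputs with
    # no producer are treated as pre-existing source data.
    graph = {
        name: {producer[path] for path in inputs if path in producer}
        for name, (inputs, _outputs) in steps.items()
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"cyclic definition: {err.args[1]}") from err
```

    Because the order is derived from the declarations alone, adding or changing a step only requires editing the declaration and recomputing the order.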
  • Patent number: 8015008
    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect an expanded consonant phoneme inventory, and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: September 6, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal
  • Patent number: 8010341
    Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge into such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.
    Type: Grant
    Filed: September 13, 2007
    Date of Patent: August 30, 2011
    Assignee: Microsoft Corporation
    Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
  • Patent number: 8010358
    Abstract: Methods and apparatus for voice recognition are disclosed. A voice signal is obtained and two or more voice recognition analyses are performed on the voice signal. Each voice recognition analysis uses a filter bank defined by a different maximum frequency and a different minimum frequency, and each voice recognition analysis produces a recognition probability r_i of recognition of one or more speech units, whereby there are two or more recognition probabilities r_i. The maximum frequency and the minimum frequency may be adjusted every time speech is windowed and analyzed. A final recognition probability P_f is determined based on the two or more recognition probabilities r_i.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: August 30, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
  • Patent number: 7970614
    Abstract: The present invention provides a system and method for treating distortion propagated through a detection system. The system includes a compensation module that compensates for untreated distortions propagating through the detection compensation system, a user model pool that comprises a plurality of model sets, and a model selector that selects at least one model set from the plurality of model sets in the user model pool. The compensation is accomplished by continually producing scores distributed according to a prescribed distribution for the at least one model set and mitigating the adverse effects of the scores being distorted and lying off a pre-set operating point. The method for treating distortion propagated through a detection system includes receiving a signal from a remote device, and compensating the signal for untreated distortions.
    Type: Grant
    Filed: May 8, 2007
    Date of Patent: June 28, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Janice J. Kim, Jiri Navratil, Jason W. Pelecanos, Ganesh N. Ramaswamy
  • Publication number: 20110071835
    Abstract: Embodiments of a small footprint text-to-speech engine are disclosed. In operation, the small footprint text-to-speech engine generates a set of feature parameters for an input text. The set of feature parameters includes static feature parameters and delta feature parameters. The small footprint text-to-speech engine then derives a saw-tooth stochastic trajectory that represents the speech characteristics of the input text based on the static feature parameters and the delta parameters. Finally, the small footprint text-to-speech engine produces a smoothed trajectory from the saw-tooth stochastic trajectory, and generates synthesized speech based on the smoothed trajectory.
    Type: Application
    Filed: September 22, 2009
    Publication date: March 24, 2011
    Applicant: Microsoft Corporation
    Inventors: Yi-Ning Chen, Zhi-Jie Yan, Frank Kao-Ping Soong
  • Publication number: 20110015925
    Abstract: A speech recognition method, comprising: receiving a speech input in a first noise environment which comprises a sequence of observations; determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of observations, wherein said model has been trained to recognise speech in a second noise environment, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to an observation; adapting the model trained in the second environment to that of the first environment; the speech recognition method further comprising determining the likelihood of a sequence of observations occurring in a given language using a language model; combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said spee
    Type: Application
    Filed: March 26, 2010
    Publication date: January 20, 2011
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Mark John Francis Gales
  • Publication number: 20110010176
    Abstract: An HMM (Hidden Markov Model) learning device includes: a learning unit for learning a state transition probability as the function of actions that an agent can execute, with learning with HMM performed based on actions that the agent has executed, and time series information made up of an observation signal; and a storage unit for storing learning results by the learning unit as internal model data including a state-transition probability table and an observation probability table; with the learning unit calculating frequency variables used for estimation calculation of HMM state-transition and HMM observation probabilities; with the storage unit holding the frequency variables corresponding to each of state-transition probabilities and each of observation probabilities respectively, of the state-transition probability table; and with the learning unit using the frequency variables held by the storage unit to perform learning, and estimating the state-transition probability and the observation probability bas
    Type: Application
    Filed: July 2, 2010
    Publication date: January 13, 2011
    Inventors: Yukiko Yoshiike, Kenta Kawamoto, Kuniaki Noda, Kohtaro Sabe
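    The idea of holding frequency variables alongside the probability tables, as described in the abstract above, can be sketched as accumulated soft counts that are normalized on demand. This is only an illustrative reading of the abstract (which is truncated in this listing), not the patented procedure: the class name, the `prior` pseudo-count, and the assumption that the posteriors `xi` and `gamma` come from a forward-backward pass computed elsewhere are all hypothetical.

```python
import numpy as np

class ActionConditionedCounts:
    """Stores frequency variables for an action-conditioned HMM (sketch)."""

    def __init__(self, n_states, n_actions, n_symbols, prior=1e-3):
        # trans_freq[a, i, j]: accumulated count of i -> j under action a.
        self.trans_freq = np.full((n_actions, n_states, n_states), float(prior))
        # obs_freq[i, o]: accumulated count of symbol o observed in state i.
        self.obs_freq = np.full((n_states, n_symbols), float(prior))

    def accumulate(self, xi, gamma, actions, observations):
        """Add posteriors from one sequence to the stored counts.

        xi[t, i, j]: posterior of transition i -> j at step t.
        gamma[t, i]: posterior of being in state i at step t.
        actions[t]: action taken at step t (one per transition).
        observations[t]: symbol observed at step t.
        """
        for t, action in enumerate(actions):
            self.trans_freq[action] += xi[t]
        for t, symbol in enumerate(observations):
            self.obs_freq[:, symbol] += gamma[t]

    def transition_probs(self):
        # Normalize over the destination state for each (action, source) pair.
        return self.trans_freq / self.trans_freq.sum(axis=2, keepdims=True)

    def observation_probs(self):
        # Normalize over symbols for each state.
        return self.obs_freq / self.obs_freq.sum(axis=1, keepdims=True)
```

    Keeping the counts rather than only the normalized probabilities lets later sequences be folded in additively without revisiting earlier data.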
  • Patent number: 7856351
    Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.
    Type: Grant
    Filed: January 19, 2007
    Date of Patent: December 21, 2010
    Assignee: Microsoft Corporation
    Inventors: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero
  • Publication number: 20100318354
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Publication number: 20100312562
    Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text to speech system to determine start and end models and to modify the start and end models by setting small co-variances. The modification avoids disordered acoustic parameters caused by violation of parameter constraints, resulting in a stable line frequency spectrum for the generated speech.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
  • Patent number: 7818172
    Abstract: The method of recognizing speech in an acoustic signal comprises developing acoustic stochastic models of voice units in the form of a set of states of an acoustic signal and using the acoustic models for recognition by a comparison of the signal with predetermined acoustic models obtained via a prior learning process. While developing the acoustic models, the voice units are modeled by means of a first portion of the states independent of adjacent voice units and by means of a second portion of the states dependent on adjacent voice units. The second portion of states dependent on adjacent voice units shares common parameters with a plurality of units sharing the same phonemes.
    Type: Grant
    Filed: April 20, 2004
    Date of Patent: October 19, 2010
    Assignee: France Telecom
    Inventors: Ronaldo Messina, Denis Jouvet
  • Patent number: 7805301
    Abstract: A reliable full covariance matrix estimation algorithm for a pattern unit's state output distribution in a pattern recognition system is discussed. An intermediate hierarchical tree structure is built to relate models for product units. Full covariance matrices of a pattern unit's state output distribution are estimated based on all the related nodes in the tree.
    Type: Grant
    Filed: July 1, 2005
    Date of Patent: September 28, 2010
    Assignee: Microsoft Corporation
    Inventors: Ye Tian, Frank Kao-Ping Soong, Jian-Lai Zhou
  • Patent number: 7778831
    Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch and one or more acoustic model parameters are adjusted based on a categorization of the speaker. The parameter adjustment may be performed at any instance of time during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: August 17, 2010
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
  • Publication number: 20100204988
    Abstract: A speech recognition method includes receiving a speech input signal in a first noise environment which includes a sequence of observations, determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, adapting the model trained in a second noise environment to that of the first environment, wherein adapting the model trained in the second environment to that of the first environment includes using second order or higher order Taylor expansion coefficients derived for a group of probability distributions and the same expansion coefficient is used for the whole group.
    Type: Application
    Filed: April 20, 2010
    Publication date: August 12, 2010
    Inventors: Haitian Xu, Kean Kheong Chin