With Insufficient Amount Of Training Data, E.g., State Sharing, Tying, Deleted Interpolation (EPO) Patents (Class 704/256.3)
  • Patent number: 10699699
    Abstract: The embodiments of the present disclosure disclose a method for constructing a speech decoding network in digital speech recognition. The method comprises acquiring training data obtained by digital speech recording, the training data comprising a plurality of speech segments, and each speech segment comprising a plurality of digital speeches; performing acoustic feature extraction on the training data to obtain a feature sequence corresponding to each speech segment; performing progressive training starting from a mono-phoneme acoustic model to obtain an acoustic model; and acquiring a language model and constructing a speech decoding network from the language model and the acoustic model obtained by training.
    Type: Grant
    Filed: May 30, 2018
    Date of Patent: June 30, 2020
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Fuzhang Wu, Binghua Qian, Wei Li, Ke Li, Yongjian Wu, Feiyue Huang
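The abstract above turns each recorded digit segment into an acoustic feature sequence before progressive training. A minimal sketch of that feature-extraction step using MFCCs; librosa, the file paths, the 16 kHz sampling rate, and the 13-coefficient setting are illustrative assumptions, not details from the patent:

```python
import librosa

def extract_feature_sequences(segment_paths, n_mfcc=13):
    """Return one MFCC feature sequence (frames x coefficients) per speech segment."""
    sequences = []
    for path in segment_paths:
        audio, sr = librosa.load(path, sr=16000)              # one digit recording
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
        sequences.append(mfcc.T)                              # frames as rows
    return sequences
```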
  • Patent number: 9785891
    Abstract: Embodiments of a computer-implemented method for automatically analyzing a conversational sequence between multiple users are disclosed. The method includes receiving signals corresponding to a training dataset including multiple conversational sequences; extracting a feature from the training dataset based on predefined feature categories; formulating multiple tasks for being learned from the training dataset based on the extracted feature, each task related to a predefined label; and providing a model for each formulated task, the model including a set of parameters common to the tasks. The set includes an explicit parameter, which is explicitly shared with each of the formulated tasks. The method further includes optimizing a value of the explicit parameter to create an optimized model; creating a trained model for the formulated tasks using the optimized value of the explicit parameter; and assigning predefined labels for the formulated tasks to a live dataset based on the corresponding trained model.
    Type: Grant
    Filed: December 9, 2014
    Date of Patent: October 10, 2017
    Assignee: Conduent Business Services, LLC
    Inventors: Arvind Agarwal, Saurabh Kataria
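The entry above centers on a parameter that is explicitly shared across all formulated tasks and optimized jointly. A minimal multi-task sketch under assumed squared-loss regression tasks, where each task's weights are the shared part plus a task-specific part; the decomposition, learning rate, and data shapes are illustrative, not the patented formulation:

```python
import numpy as np

def fit_shared_parameter(task_data, dim, lr=0.01, steps=500):
    """task_data: {task: (X, y)}. Learn one explicitly shared weight vector
    plus a task-specific correction per task by joint gradient descent."""
    w_shared = np.zeros(dim)
    v = {t: np.zeros(dim) for t in task_data}                 # task-specific parts
    for _ in range(steps):
        grad_shared = np.zeros(dim)
        for t, (X, y) in task_data.items():
            err = X @ (w_shared + v[t]) - y                   # per-task residual
            grad_t = X.T @ err / len(y)
            v[t] -= lr * grad_t                               # update task part
            grad_shared += grad_t                             # accumulate for shared part
        w_shared -= lr * grad_shared / len(task_data)
    return w_shared, v
```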
  • Patent number: 9110887
    Abstract: According to one embodiment, a speech synthesis apparatus includes a language analyzer, statistical model storage, model selector, parameter generator, basis model storage, and filter processor. The language analyzer analyzes text data and outputs language information data that represents linguistic information of the text data. The statistical model storage stores statistical models prepared by statistically modeling acoustic information included in speech. The model selector selects a statistical model from the models based on the language information data. The parameter generator generates speech parameter sequences using the statistical model selected by the model selector. The basis model storage stores a basis model including basis vectors, each of which expresses speech information for each limited frequency range. The filter processor outputs synthetic speech by executing filter processing of the speech parameter sequences and the basis model.
    Type: Grant
    Filed: December 26, 2012
    Date of Patent: August 18, 2015
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Yamato Ohtani, Masatsune Tamura, Masahiro Morita
  • Patent number: 8838433
    Abstract: An architecture is discussed that provides the capability to subselect the most relevant data from an out-domain corpus to use either in isolation or in combination with in-domain data. The architecture is a domain adaptation for machine translation that selects the most relevant sentences from a larger general-domain corpus of parallel translated sentences. The methods for selecting the data include monolingual cross-entropy measure, monolingual cross-entropy difference, bilingual cross-entropy, and bilingual cross-entropy difference. A translation model is trained on both the in-domain data and an out-domain subset, and the models can be interpolated together to boost performance on in-domain translation tasks.
    Type: Grant
    Filed: February 8, 2011
    Date of Patent: September 16, 2014
    Assignee: Microsoft Corporation
    Inventors: Amittai Axelrod, Jianfeng Gao, Xiaodong He
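The cross-entropy difference criterion named in this abstract can be stated compactly: score each general-domain sentence by its per-word cross-entropy under an in-domain language model minus its cross-entropy under a general-domain model, and keep the lowest-scoring sentences. A minimal sketch; the lm_in/lm_out objects and their cross_entropy(sentence) method are hypothetical stand-ins for real language models:

```python
def select_by_cross_entropy_difference(general_sentences, lm_in, lm_out, top_n):
    """Keep the top_n sentences that look most in-domain relative to the
    general domain (lowest H_in(s) - H_out(s))."""
    scored = [(lm_in.cross_entropy(s) - lm_out.cross_entropy(s), s)
              for s in general_sentences]
    scored.sort(key=lambda pair: pair[0])          # most in-domain-like first
    return [s for _, s in scored[:top_n]]
```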
  • Patent number: 8700403
    Abstract: A method of statistical modeling is provided which includes constructing a statistical model and incorporating Gaussian priors during feature selection and during parameter optimization for the construction of the statistical model.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 15, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Lin Zhao
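Incorporating a Gaussian prior during parameter optimization amounts to adding the prior's negative log, an L2 penalty, to the training objective. A minimal sketch for an assumed logistic/maxent model; the loss choice and the prior variance sigma2 are illustrative, not taken from the patent:

```python
import numpy as np

def penalized_nll(w, X, y, sigma2=1.0):
    """Negative log-likelihood of a logistic model (labels y in {-1, +1})
    plus the negative log of a zero-mean Gaussian prior on the weights."""
    z = X @ w
    nll = np.sum(np.log1p(np.exp(-y * z)))         # data term
    prior = np.sum(w ** 2) / (2.0 * sigma2)        # Gaussian prior term
    return nll + prior
```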
  • Patent number: 8635067
    Abstract: Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components Nc, less than N, to be used in a restructured acoustic model derived from the reference acoustic model, is identified. The desired number of components Nc is selected based on a computing environment in which the restructured acoustic model is to be deployed. The restructured acoustic model also has L states. For each given one of the L mixture models in the reference acoustic model, a merge sequence is built which records, for a given cost function, sequential mergers of pairs of the components associated with the given one of the mixture models. A portion of the Nc components is assigned to each of the L states in the restructured acoustic model.
    Type: Grant
    Filed: December 9, 2010
    Date of Patent: January 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel, John R. Hershey, Peder A. Olsen
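The merge sequence described above is built from repeated pairwise mergers of Gaussian components. A minimal sketch of the basic operation, a moment-matched merge of two weighted Gaussians; the cost function that chooses which pair to merge is not shown and is left as an assumption:

```python
import numpy as np

def merge_gaussians(w1, mu1, cov1, w2, mu2, cov2):
    """Merge two weighted Gaussian components into one that preserves the
    pair's total weight, mean, and covariance (moment matching)."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    d1, d2 = mu1 - mu, mu2 - mu
    cov = (w1 * (cov1 + np.outer(d1, d1)) + w2 * (cov2 + np.outer(d2, d2))) / w
    return w, mu, cov
```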
  • Patent number: 8521529
    Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
    Type: Grant
    Filed: April 18, 2005
    Date of Patent: August 27, 2013
    Assignee: Creative Technology Ltd
    Inventors: Michael M. Goodwin, Jean Laroche
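The first step above, projecting features onto a discriminant subspace with an LDA transform, can be sketched directly with scikit-learn; the library, the provisional cluster labels used to fit the transform, and the feature layout are illustrative assumptions rather than the patent's procedure:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def project_to_discriminant_subspace(features, provisional_labels, n_components=None):
    """Fit an LDA transform that sharpens separation between the labeled
    clusters and return the projected features."""
    lda = LinearDiscriminantAnalysis(n_components=n_components)
    return lda.fit_transform(features, provisional_labels)
```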
  • Patent number: 8484023
    Abstract: Techniques are disclosed for generating and using sparse representation features to improve speech recognition performance. In particular, principles of the invention provide sparse representation exemplar-based recognition techniques. For example, a method comprises the following steps. A test vector and a training data set associated with a speech recognition system are obtained. A subset of the training data set is selected. The test vector is mapped with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint. An acoustic model is trained on the new test feature set. The acoustic model trained on the new test feature set may be used to decode user speech input to the speech recognition system.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: July 9, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran, Tara N. Sainath
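The mapping step in this abstract can be approximated with a lasso: the test vector is expressed as a sparse linear combination of training exemplars, and the reconstruction serves as the new feature pulled toward the training data. A minimal sketch; scikit-learn's Lasso and the alpha value are illustrative stand-ins for the patent's sparseness constraint:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_exemplar_feature(test_vector, training_matrix, alpha=0.1):
    """training_matrix: (dim, n_exemplars), one training exemplar per column.
    Solve for sparse weights and return the reconstructed (new) feature vector."""
    lasso = Lasso(alpha=alpha, fit_intercept=False)
    lasso.fit(training_matrix, test_vector)
    beta = lasso.coef_                             # sparse exemplar weights
    return training_matrix @ beta                  # feature moved toward the training set
```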
  • Patent number: 8401841
    Abstract: Methods of retrieving documents using a language model are disclosed. A method may include preparing a language model of a plurality of documents, receiving a query, processing the query using the language model, and using the processed query to retrieve documents responding to the query via the search engine. The methods may be implemented in software and/or hardware on computing devices, including personal computers, telephones, servers, and others.
    Type: Grant
    Filed: August 30, 2007
    Date of Patent: March 19, 2013
    Assignee: OrcaTec LLC
    Inventors: Herbert L. Roitblat, Brian Golbère
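Retrieval with a document language model is commonly done by scoring each document by the likelihood it assigns to the query terms. A minimal query-likelihood sketch with Dirichlet smoothing; the unigram model and the smoothing constant mu are illustrative choices, not details of the patent:

```python
import math
from collections import Counter

def query_likelihood_scores(query_terms, documents, mu=2000.0):
    """documents: list of token lists. Return one log-likelihood score per document."""
    collection = Counter()
    per_doc = []
    for doc in documents:
        counts = Counter(doc)
        per_doc.append(counts)
        collection.update(counts)
    total = sum(collection.values())
    scores = []
    for counts in per_doc:
        doc_len = sum(counts.values())
        score = 0.0
        for term in query_terms:
            p_coll = collection[term] / total if total else 0.0
            p = (counts[term] + mu * p_coll) / (doc_len + mu)   # smoothed term probability
            score += math.log(p) if p > 0 else float("-inf")
        scores.append(score)
    return scores
```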
  • Patent number: 8374865
    Abstract: A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system.
    Type: Grant
    Filed: April 26, 2012
    Date of Patent: February 12, 2013
    Assignee: Google Inc.
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar, Kaisuke Nakajima, Daniel Martin Bikel
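The sampling step above draws training text strings so that their classification distribution follows the benchmark distribution. A minimal sketch; the label dictionary layout and simple random sampling without replacement are illustrative assumptions:

```python
import random
from collections import defaultdict

def sample_to_distribution(classified_texts, target_distribution, corpus_size):
    """classified_texts: iterable of (text, label); target_distribution: label -> share.
    Return a training corpus whose label proportions follow the target distribution."""
    by_label = defaultdict(list)
    for text, label in classified_texts:
        by_label[label].append(text)
    corpus = []
    for label, share in target_distribution.items():
        k = min(int(round(share * corpus_size)), len(by_label[label]))
        corpus.extend(random.sample(by_label[label], k))
    return corpus
```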
  • Patent number: 8340965
    Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
    Type: Grant
    Filed: December 2, 2009
    Date of Patent: December 25, 2012
    Assignee: Microsoft Corporation
    Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
  • Patent number: 8244534
    Abstract: An exemplary method for generating speech based on text in one or more languages includes providing a phone set for two or more languages, training multilingual HMMs where the HMMs include state level sharing across languages, receiving text in one or more of the languages of the multilingual HMMs and generating speech, for the received text, based at least in part on the multilingual HMMs. Other exemplary techniques include mapping between a decision tree for a first language and a decision tree for a second language, and optionally vice versa, and Kullback-Leibler divergence analysis for a multilingual text-to-speech system.
    Type: Grant
    Filed: August 20, 2007
    Date of Patent: August 14, 2012
    Assignee: Microsoft Corporation
    Inventors: Yao Qian, Frank Kao-Ping K. Soong
  • Patent number: 8234116
    Abstract: Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMM) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing individual KLDs calculated by state pair by state pair comparisons.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: July 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Frank Kao-Ping K. Soong, Jian-Lai Zhou
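The unscented approximation in this abstract evaluates the log-density ratio of two Gaussian mixtures at sigma points drawn per component. A minimal sketch for full-covariance mixtures; scipy is assumed, and the dynamic-programming state alignment between HMMs is not shown:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_logpdf(x, weights, means, covs):
    return np.log(sum(w * multivariate_normal.pdf(x, m, c)
                      for w, m, c in zip(weights, means, covs)))

def unscented_kld(f, g):
    """Approximate D(f || g) for Gaussian mixtures f and g, each a tuple
    (weights, means, covs), using 2*d sigma points per component of f."""
    weights, means, covs = f
    d = len(means[0])
    kld = 0.0
    for w, mu, cov in zip(weights, means, covs):
        L = np.linalg.cholesky(d * np.asarray(cov))            # sqrt(d) * chol(cov)
        sigma_points = [mu + s * L[:, i] for i in range(d) for s in (1.0, -1.0)]
        kld += w * np.mean([mixture_logpdf(x, *f) - mixture_logpdf(x, *g)
                            for x in sigma_points])
    return kld
```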
  • Patent number: 8140333
    Abstract: A probability density function compensation method used for a continuous hidden Markov model and a speech recognition method and apparatus, the probability density function compensation method including extracting feature vectors from speech signals, and using the extracted feature vectors, training a model having a plurality of probability density functions to increase probabilities of recognizing the speech signals; obtaining a global variance by averaging variances of the plurality of the probability density functions after completing the training; obtaining a compensation factor using the global variance; and applying the global variance to each of the probability density functions and compensating each of the probability density functions for the global variance using the compensation factor.
    Type: Grant
    Filed: February 28, 2005
    Date of Patent: March 20, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Icksang Han, Sangbae Jeong, Eugene Jon
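The abstract above averages the trained variances into a global variance and compensates every probability density function with a factor derived from it. A minimal sketch for diagonal-covariance Gaussians; the blending form and the rho value are illustrative assumptions, not the patented compensation factor:

```python
import numpy as np

def compensate_variances(variances, rho=0.5):
    """variances: (n_pdfs, dim) diagonal variances. Blend each PDF's variance
    toward the global (average) variance."""
    global_variance = variances.mean(axis=0)
    return rho * variances + (1.0 - rho) * global_variance
```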
  • Patent number: 8126162
    Abstract: An audio signal interpolation apparatus is configured to perform interpolation processing on the basis of audio signals preceding and/or following a predetermined segment on a time axis so as to obtain an audio signal corresponding to the predetermined segment. The audio signal interpolation apparatus includes a waveform formation unit configured to form a waveform for the predetermined segment on the basis of time-domain samples of the preceding and/or the following audio signals, and a power control unit configured to control the power of that waveform using a non-linear model selected on the basis of the preceding audio signal when the power of the preceding audio signal is larger than that of the following audio signal, or on the basis of the following audio signal when the power of the preceding audio signal is smaller than that of the following audio signal.
    Type: Grant
    Filed: May 23, 2007
    Date of Patent: February 28, 2012
    Assignee: Sony Corporation
    Inventors: Chunmao Zhang, Toru Chinen
  • Patent number: 7925505
    Abstract: Architecture is disclosed herewith for minimizing an empirical error rate by discriminative adaptation of a statistical language model in a dictation and/or dialog application. The architecture allows assignment of an improved weighting value to each term or phrase to reduce empirical error. Empirical errors are minimized whether or not a user provides correction results, based on criteria for discriminatively adapting the user language model (LM)/context-free grammar (CFG) to the target. Moreover, algorithms are provided for the training and adaptation processes of LM/CFG parameters for criteria optimization.
    Type: Grant
    Filed: April 10, 2007
    Date of Patent: April 12, 2011
    Assignee: Microsoft Corporation
    Inventor: Jian Wu
  • Patent number: 7853449
    Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: December 14, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
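Conditioning a language model on the dialog state can be as simple as keeping one model per state and backing off to a general model. A minimal sketch; the state_lms dictionary, the prob(word, history) interface, and the interpolation weight are hypothetical stand-ins, not the patent's construction:

```python
import math

def state_conditioned_logprob(word, history, state, state_lms, general_lm, lam=0.7):
    """Interpolate a dialog-state-specific language model with a general one."""
    p_state = state_lms[state].prob(word, history) if state in state_lms else 0.0
    p_general = general_lm.prob(word, history)
    return math.log(lam * p_state + (1.0 - lam) * p_general)
```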
  • Patent number: 7587321
    Abstract: According to one aspect of the invention, a method is provided in which a set of multiple mixture monophone models is created and trained to generate a set of multiple mixture context dependent models. A set of single mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered to obtain a set of tied states based on a decision tree clustering process. Parameters of the context dependent models are estimated using a data dependent maximum a posteriori (MAP) adaptation method in which parameters of the tied states of the context dependent models are derived by adapting corresponding parameters of the context independent models using the training data associated with the respective tied states.
    Type: Grant
    Filed: May 8, 2001
    Date of Patent: September 8, 2009
    Assignee: Intel Corporation
    Inventors: Xiaoxing Liu, Baosheng Yuan, Yonghong Yan
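The data-dependent MAP adaptation mentioned above interpolates a tied state's prior (context-independent) parameters with the statistics of the training data assigned to that state. A minimal sketch for a Gaussian mean; the occupancy-weighted form and the prior weight tau follow the standard MAP update and are used here as an illustrative assumption:

```python
import numpy as np

def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """frames: (T, dim) observations; posteriors: (T,) state occupancies.
    Return the MAP-adapted mean for one tied state."""
    occupancy = posteriors.sum()
    weighted_sum = (posteriors[:, None] * frames).sum(axis=0)
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)
```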
  • Patent number: 7542901
    Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
    Type: Grant
    Filed: August 24, 2006
    Date of Patent: June 2, 2009
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
  • Patent number: 7437288
    Abstract: A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.
    Type: Grant
    Filed: March 11, 2002
    Date of Patent: October 14, 2008
    Assignee: NEC Corporation
    Inventor: Koichi Shinoda
  • Patent number: 7395205
    Abstract: In an Automatic Speech Recognition (ASR) system having at least two language models, a method is provided for combining language model scores generated by at least two language models. A list of most likely words is generated for a current word in a word sequence uttered by a speaker, and acoustic scores corresponding to the most likely words are also generated. Language model scores are computed for each of the most likely words in the list, for each of the at least two language models. A set of coefficients to be used to combine the language model scores of each of the most likely words in the list is respectively and dynamically determined, based on a context of the current word. The language model scores of each of the most likely words in the list are respectively combined to obtain a composite score for each of the most likely words in the list, using the set of coefficients determined therefor.
    Type: Grant
    Filed: February 13, 2001
    Date of Patent: July 1, 2008
    Assignee: International Business Machines Corporation
    Inventors: Martin Franz, Peder Andreas Olsen
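The combination step above weights each language model's score with context-dependent coefficients and adds the result to the acoustic score. A minimal sketch; the linear combination and the candidate tuple layout are illustrative assumptions:

```python
def rescore_candidates(candidates, coefficients):
    """candidates: list of (word, acoustic_score, [lm_score_1, lm_score_2, ...]);
    coefficients: context-dependent weights chosen for the current word.
    Rank candidates by acoustic score plus the combined language model score."""
    def composite(entry):
        word, acoustic, lm_scores = entry
        return acoustic + sum(c * s for c, s in zip(coefficients, lm_scores))
    return sorted(candidates, key=composite, reverse=True)
```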
  • Patent number: 7143035
    Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
    Type: Grant
    Filed: March 27, 2002
    Date of Patent: November 28, 2006
    Assignee: International Business Machines Corporation
    Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
  • Patent number: 7050975
    Abstract: A method of speech recognition is provided that identifies a production-related dynamics value by performing a linear interpolation between a production-related dynamics value at a previous time and a production-related target using a time-dependent interpolation weight. The hidden production-related dynamics value is used to compute a predicted value that is compared to an observed value of acoustics to determine the likelihood of the observed acoustics given a sequence of hidden phonological units. In some embodiments, the production-related dynamics value at the previous time is selected from a set of continuous values. In addition, the likelihood of the observed acoustics given a sequence of hidden phonological units is combined with a score associated with a discrete class of production-related dynamic values at the previous time to determine a score for a current phonological state.
    Type: Grant
    Filed: October 9, 2002
    Date of Patent: May 23, 2006
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Jian-Lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
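The hidden production-related dynamics described above follow a simple recursion: each value is a linear interpolation between the previous value and the production-related target, using a time-dependent weight. A minimal sketch of that trajectory; variable names and the scalar form are illustrative:

```python
def production_dynamics(initial_value, target, weights):
    """weights: sequence of time-dependent interpolation weights w_t.
    Return the trajectory x_t = w_t * x_{t-1} + (1 - w_t) * target."""
    trajectory, value = [], initial_value
    for w in weights:
        value = w * value + (1.0 - w) * target
        trajectory.append(value)
    return trajectory
```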