Training of HMM (EPO) Patents (Class 704/256.2)
-
Patent number: 12061979
Abstract: The present teaching relates to obtaining a model for identifying content matching a query. Training data are received which include queries, advertisements, and hyperlinks. A plurality of subwords are identified from each of the queries and a plurality of vectors for the plurality of subwords of each of the queries are obtained. Via a neural network, a vector for each of the queries is derived based on a plurality of vectors for the plurality of subwords of the query. A query/ads model is obtained via optimization with respect to an objective function, based on vectors associated with the plurality of subwords of each of the queries and vectors for the queries obtained from the neural network.
Type: Grant
Filed: February 9, 2018
Date of Patent: August 13, 2024
Assignee: YAHOO AD TECH LLC
Inventors: Erik Ordentlich, Milind Rao, Jun Shi, Andrew Feng
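The entry above describes deriving a per-query vector from subword vectors with a neural network. Below is a minimal, illustrative sketch of that idea; the character n-gram subwords, mean pooling, and single dense layer are assumptions made for the example, not details taken from the patent.

```python
import numpy as np

def subwords(query, n=3):
    """Illustrative character n-gram 'subwords' for a query string."""
    padded = f"#{query}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def query_vector(query, subword_table, W, dim=16):
    """Average the subword vectors, then pass them through one dense layer.

    subword_table: dict mapping subword -> np.ndarray of shape (dim,)
    W:             weight matrix of a single hidden layer, shape (dim, dim)
    """
    vecs = [subword_table.get(s, np.zeros(dim)) for s in subwords(query)]
    pooled = np.mean(vecs, axis=0)          # combine the subword vectors
    return np.tanh(W @ pooled)              # simple non-linear projection

# toy usage: random embeddings stand in for trained parameters
rng = np.random.default_rng(0)
table = {s: rng.normal(size=16) for s in subwords("cheap flights")}
W = rng.normal(size=(16, 16))
print(query_vector("cheap flights", table, W).shape)   # (16,)
```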
-
Patent number: 11887583
Abstract: Some devices may perform processing using machine learning models trained at a centralized system and distributed to the device. The centralized system may update the machine learning model and distribute the update to the device (or devices). To reduce the size of an update, the centralized system may train a model update object, which may be smaller in size than the model itself and thus more suitable for sending to the device(s). A device may receive the model update object and use it to update the on-device machine learning model; for example, by changing some parameters of the model. Parameters left unchanged during the update may retain their previous value. Thus, using the model update object to update the on-device model may result in a more accurate updated model when compared to sending an updated model compressed to a size similar to that of the model update object.
Type: Grant
Filed: June 9, 2021
Date of Patent: January 30, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Grant Strimel, Jonathan Jenner Macoskey, Ariya Rastrow
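A small sketch of the update mechanism this abstract describes: ship only the changed parameters as a "model update object" and leave the rest of the on-device model untouched. The dictionary layout and parameter names are illustrative, not from the patent.

```python
import numpy as np

def apply_model_update(model, update_object):
    """Apply a small 'model update object' to an on-device model.

    model:         dict of parameter name -> np.ndarray
    update_object: dict holding only the parameters that changed
                   (much smaller than the full model); parameters not
                   listed keep their previous values.
    """
    updated = dict(model)                      # unchanged params retained
    for name, new_value in update_object.items():
        updated[name] = np.asarray(new_value)  # overwrite only listed params
    return updated

# toy usage
model = {"layer1.w": np.zeros((4, 4)), "layer2.w": np.ones((4, 2))}
update = {"layer2.w": np.full((4, 2), 0.5)}    # only one tensor is shipped
model = apply_model_update(model, update)
```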
-
Patent number: 11783826
Abstract: A method, computer program product, and computing system for receiving one or more inputs indicative of at least one of: a relative location of a speaker and a microphone array, and a relative orientation of the speaker and the microphone array. One or more reference signals may be received. A speech processing system may be trained using the one or more inputs and the one or more reference signals.
Type: Grant
Filed: February 18, 2021
Date of Patent: October 10, 2023
Assignee: Nuance Communications, Inc.
Inventors: Patrick A. Naylor, Dushyant Sharma, Uwe Helmut Jost, William F. Ganong, III
-
Patent number: 11398221
Abstract: [Problem] An information processing apparatus, an information processing method, and a program are proposed that are capable of learning a meaning corresponding to a speech recognition result of a first speech adaptively to a determination result as to whether or not a second speech is a restatement of the first speech. [Solution] An information processing apparatus including: a learning unit configured to learn, based on a determination result as to whether or not a second speech collected at second timing after first timing is a restatement of a first speech collected at the first timing, a meaning corresponding to a speech recognition result of the first speech.
Type: Grant
Filed: November 30, 2018
Date of Patent: July 26, 2022
Assignee: SONY CORPORATION
Inventors: Shinichi Kawano, Hiro Iwase, Yuhei Taki
-
Patent number: 11302308
Abstract: A method for generating synthetic telephony narrowband data for training an automatic speech recognition model by receiving a broadband audio data file and then initiating a telephony call using a pre-configured telephone provider to play the broadband audio data file in the telephony call and to record and store audio data generated by transmission of the broadband audio data file in the telephony call, thereby generating the synthetic telephony narrowband data file from the broadband audio data file.
Type: Grant
Filed: July 9, 2019
Date of Patent: April 12, 2022
Assignee: International Business Machines Corporation
Inventors: Vamshi Krishna Thotempudi, Pierre-Hadrien Arnoux, Vibha S. Sinha
-
Patent number: 11295726
Abstract: A system and apparatus are provided for generating synthetic telephony narrowband data for training an automatic speech recognition model by receiving a broadband audio data file and then initiating a telephony call using a pre-configured telephone provider to play the broadband audio data file in the telephony call and to record and store audio data generated by transmission of the broadband audio data file in the telephony call, thereby generating the synthetic telephony narrowband data file from the broadband audio data file.
Type: Grant
Filed: April 8, 2019
Date of Patent: April 5, 2022
Assignee: International Business Machines Corporation
Inventors: Vamshi Krishna Thotempudi, Pierre-Hadrien Arnoux, Vibha S. Sinha
-
Patent number: 10839156
Abstract: Generally described, one or more aspects of the present application correspond to a machine learning address normalization system. A system of deep learning networks can normalize the tokens of a free-form address into an address component hierarchy. Feature vectors representing various characters and words of the address tokens can be input into a bi-directional long short term memory network (LSTM) to generate a hidden state representation of each token, which can be individually passed through a softmax layer to generate probabilistic values of the token being each of the components in the address hierarchy. Thereafter, a conditional random field (CRF) model can select a particular address component for each token by using learned parameters to optimize a path through the collective outputs of the softmax layer for the tokens. Thus, the free-form address can be normalized to determine the values it contains for different components of a specified address hierarchy.
Type: Grant
Filed: January 3, 2019
Date of Patent: November 17, 2020
Assignee: Amazon Technologies, Inc.
Inventors: Satyam Saxena, Sourav Kumar Agarwal, Alok Chandra
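The abstract above ends with a CRF choosing one address component per token over the softmax outputs. A compact sketch of that decoding step (Viterbi over per-token scores plus a transition matrix); the label set and the random scores standing in for BiLSTM outputs are purely illustrative.

```python
import numpy as np

LABELS = ["house_number", "street", "city", "state", "zip"]   # illustrative hierarchy

def crf_decode(emission_scores, transition_scores):
    """Viterbi decode: pick one address component per token.

    emission_scores:   (n_tokens, n_labels) per-token label scores
                       (in the patent these come from a BiLSTM + softmax)
    transition_scores: (n_labels, n_labels) learned CRF transition scores
    """
    n_tokens, n_labels = emission_scores.shape
    best = np.zeros((n_tokens, n_labels))
    back = np.zeros((n_tokens, n_labels), dtype=int)
    best[0] = emission_scores[0]
    for t in range(1, n_tokens):
        scores = best[t - 1][:, None] + transition_scores + emission_scores[t]
        back[t] = scores.argmax(axis=0)        # best previous label per current label
        best[t] = scores.max(axis=0)
    path = [int(best[-1].argmax())]
    for t in range(n_tokens - 1, 0, -1):       # backtrace the optimal path
        path.append(int(back[t][path[-1]]))
    return [LABELS[i] for i in reversed(path)]

# toy usage with random scores standing in for BiLSTM outputs
rng = np.random.default_rng(1)
tokens = ["221", "Baker", "Street", "London"]
print(crf_decode(rng.normal(size=(len(tokens), len(LABELS))),
                 rng.normal(size=(len(LABELS), len(LABELS)))))
```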
-
Patent number: 10649725
Abstract: Systems of the present disclosure adjust an interface mode of an application based on paralinguistic features of audio input. The audio input is via a microphone associated with a computing device. A predictive model uses paralinguistic features of the audio input and additional features received from sensors or a user profile to predict an interface mode that a user would currently prefer to use. The interface mode specifies how output is provided and how input is received. The interface mode may also specify which elements of a graphical user interface are displayed, where the elements are placed, and how the elements are sized.
Type: Grant
Filed: October 27, 2016
Date of Patent: May 12, 2020
Assignee: Intuit Inc.
Inventors: Benjamin Indyk, Igor A. Podgorny, Raymond Chan
-
Patent number: 9280968
Abstract: A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.
Type: Grant
Filed: October 4, 2013
Date of Patent: March 8, 2016
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Enrico Luigi Bocchieri, Dimitrios Dimitriadis
-
Patent number: 9276893
Abstract: Techniques, systems, and articles of manufacture for determining the current logical state of a social media communication thread. A method includes computing an initial probability for applicability of each of multiple logical states for a first entry in a social media communication thread, wherein each logical state corresponds to a stage of interaction between customers of an enterprise and/or agents of the enterprise based on features derived from content of entries in the communication thread, network structure of entries, and identity of authors of entries, computing a transition probability between each subsequent consecutive entry in the communication thread, wherein the transition probability indicates the probability of moving from one logical state to another, and determining the current logical state of the communication thread based on the computed initial probability for the first entry and the computed transition probability between each subsequent entry in the communication thread.
Type: Grant
Filed: January 15, 2013
Date of Patent: March 1, 2016
Assignee: International Business Machines Corporation
Inventors: Jitendra Ajmera, Ashish Verma, Katyaini H. Naga
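A toy sketch of the inference this abstract describes: combine the initial state probabilities of the first entry with per-step transition probabilities and read off the most likely current logical state. The three example states are placeholders, not the patent's labels.

```python
import numpy as np

def current_logical_state(initial_probs, transition_probs):
    """Most likely logical state of the latest entry in a thread.

    initial_probs:    (n_states,) probabilities for the first entry
    transition_probs: list of (n_states, n_states) matrices, one per
                      consecutive pair of entries in the thread
    """
    log_p = np.log(initial_probs)
    for T in transition_probs:                 # max-product over the chain
        log_p = np.max(log_p[:, None] + np.log(T), axis=0)
    return int(np.argmax(log_p))

# toy usage: states 0=question, 1=answer, 2=resolved (illustrative)
init = np.array([0.7, 0.2, 0.1])
T = np.array([[0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6],
              [0.1, 0.1, 0.8]])
print(current_logical_state(init, [T, T]))     # state index for the third entry
```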
-
Patent number: 9263030
Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g. 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.
Type: Grant
Filed: January 23, 2013
Date of Patent: February 16, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shizhen Wang, Yifan Gong, Fileno Alleva
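A simplified sketch of adaptive warping-factor estimation in the spirit of this abstract: repeatedly test neighbouring warping factors with a group-dependent step size until the estimate converges or a maximum number of adaptations is reached. The scoring function here is a stand-in for the real acoustic-model likelihood.

```python
def estimate_warping_factor(score_fn, frames, step=0.02, max_adapt=10, tol=1e-4):
    """Adaptively refine a VTLN warping factor as more speech arrives.

    score_fn(alpha, frames) -> model likelihood of the warped frames
    step: search step size (the patent varies this per speaker group)
    """
    alpha = 1.0
    for _ in range(max_adapt):
        candidates = [alpha - step, alpha, alpha + step]
        scores = [score_fn(a, frames) for a in candidates]
        best = candidates[scores.index(max(scores))]
        if abs(best - alpha) < tol:            # converged
            break
        alpha = best
    return alpha

# toy usage: a fake scorer whose likelihood peaks at alpha = 0.92
print(estimate_warping_factor(lambda a, f: -(a - 0.92) ** 2, frames=None))
```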
-
Patent number: 9218806
Abstract: Features are disclosed for selecting and using multiple transforms associated with a particular remote device for use in automatic speech recognition (“ASR”). Each transform may be based on statistics that have been generated from processing utterances that share some characteristic (e.g., acoustic characteristics, time frame within which the utterances were processed, etc.). When an utterance is received from the remote device, a particular transform or set of transforms may be selected for use in speech processing based on data obtained from the remote device, speech processing of a portion of the utterance, speech processing of prior utterances, etc. The transform or transforms used in processing the utterances may then be updated based on the results of the speech processing.
Type: Grant
Filed: May 10, 2013
Date of Patent: December 22, 2015
Assignee: Amazon Technologies, Inc.
Inventors: Stan Weidner Salvador, Shengbin Yang, Hugh Evan Secker-Walker, Karthik Ramakrishnan
-
Patent number: 9043209
Abstract: This device 301 stores a first content-specific language model representing a probability that a specific word appears in a word sequence representing a first content, and a second content-specific language model representing a probability that the specific word appears in a word sequence representing a second content. Based on a first probability parameter representing a probability that a content represented by a target word sequence included in a speech recognition hypothesis generated by a speech recognition process of recognizing a word sequence corresponding to a speech is a first content, a second probability parameter representing a probability that the content represented by the target word sequence is a second content, the first content-specific language model and the second content-specific language model, the device creates a language model representing a probability that the specific word appears in a word sequence corresponding to a part corresponding to the target word sequence of the speech.
Type: Grant
Filed: September 3, 2009
Date of Patent: May 26, 2015
Assignee: NEC CORPORATION
Inventors: Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki
-
Patent number: 9037460
Abstract: Dynamic features are utilized with CRFs to handle long-distance dependencies of output labels. The dynamic features represent a probability distribution that depends on the explicit distance from/to a special output label that is pre-defined for each application scenario. Besides the number of units in the segment (from the previous special output label to the current unit), the dynamic features may also include the sum of any basic features of units in the segment. Since the added dynamic features depend on the distance from the previous special label, the search lattice associated with Viterbi searching is expanded to distinguish nodes at different distances. The dynamic features may be used in a variety of different applications, such as Natural Language Processing, Text-To-Speech and Automatic Speech Recognition. For example, the dynamic features may be used to assist in prosodic break and pause prediction.
Type: Grant
Filed: March 28, 2012
Date of Patent: May 19, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jian Luan, Linfang Wang, Hairong Xia, Sheng Zhao, Daniela Braga
-
Patent number: 9020820
Abstract: A state detecting apparatus includes: a processor to execute acquiring utterance data related to uttered speech, computing a plurality of statistical quantities for feature parameters regarding features of the utterance data, creating, on the basis of the plurality of statistical quantities regarding the utterance data and another plurality of statistical quantities regarding reference utterance data based on other uttered speech, pseudo-utterance data having at least one statistical quantity equal to a statistical quantity in the other plurality of statistical quantities, computing a plurality of statistical quantities for synthetic utterance data synthesized on the basis of the pseudo-utterance data and the utterance data, and determining, on the basis of a comparison between statistical quantities of the synthetic utterance data and statistical quantities of the reference utterance data, whether the speaker who produced the uttered speech is in a first state or a second state; and a memory.
Type: Grant
Filed: April 13, 2012
Date of Patent: April 28, 2015
Assignee: Fujitsu Limited
Inventors: Shoji Hayakawa, Naoshi Matsuo
-
Publication number: 20150100312
Abstract: A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.
Type: Application
Filed: October 4, 2013
Publication date: April 9, 2015
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Enrico Luigi BOCCHIERI, Dimitrios DIMITRIADIS
-
Patent number: 8964948
Abstract: A method for setting a voice tag is provided, which comprises the following steps. First, a number of phone calls performed between a user and a contact person is counted. If the number of phone calls exceeds a predetermined count, or a voice dialing attempt by the user fails before calling the contact person within a predetermined duration, the user is asked whether or not to set a voice tag corresponding to the contact person after the phone call is complete. If the user decides to set the voice tag, a voice training procedure is executed for setting the voice tag corresponding to the contact person.
Type: Grant
Filed: May 29, 2012
Date of Patent: February 24, 2015
Assignee: HTC Corporation
Inventor: Fu-Chiang Chou
-
Patent number: 8959022
Abstract: A method for determining a relatedness between a query video and a database video is provided. A processor extracts an audio stream from the query video to produce a query audio stream, extracts an audio stream from the database video to produce a database audio stream, produces a first-sized snippet from the query audio stream, and produces a first-sized snippet from the database audio stream. An estimation is made of a first most probable sequence of latent evidence probability vectors generating the first-sized audio snippet of the query audio stream. An estimation is made of a second most probable sequence of latent evidence probability vectors generating the first-sized audio snippet of the database audio stream. A similarity is measured between the first sequence and the second sequence producing a score of relatedness between the two snippets. Finally a relatedness is determined between the query video and a database video.
Type: Grant
Filed: November 19, 2012
Date of Patent: February 17, 2015
Assignee: Motorola Solutions, Inc.
Inventors: Yang M. Cheng, Dusan Macho
-
Patent number: 8949130
Abstract: In embodiments of the present invention improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criteria.
Type: Grant
Filed: October 21, 2009
Date of Patent: February 3, 2015
Assignee: Vlingo Corporation
Inventor: Michael S. Phillips
-
Patent number: 8914292
Abstract: In embodiments of the present invention improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criteria.
Type: Grant
Filed: October 21, 2009
Date of Patent: December 16, 2014
Assignee: Vlingo Corporation
Inventor: Michael S. Phillips
-
Patent number: 8886533
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
Type: Grant
Filed: October 25, 2011
Date of Patent: November 11, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
-
Patent number: 8843370
Abstract: Adjusting model parameters is described for a speech recognition system that combines recognition outputs from multiple speech recognition processes. Discriminative adjustments are made to model parameters of at least one acoustic model based on a joint discriminative criterion over multiple complementary acoustic models to lower recognition word error rate in the system.
Type: Grant
Filed: November 26, 2007
Date of Patent: September 23, 2014
Assignee: Nuance Communications, Inc.
Inventors: Daniel Willett, Chuang He
-
Patent number: 8812322
Abstract: Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.
Type: Grant
Filed: May 27, 2011
Date of Patent: August 19, 2014
Assignee: Adobe Systems Incorporated
Inventors: Gautham J. Mysore, Paris Smaragdis
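A condensed sketch of semi-supervised separation with non-negative matrix factorization, in the spirit of this abstract: the speech basis comes from training data and stays fixed, while the noise basis is learned from the mixture itself ("on-the-fly"). The matrix sizes and the KL-divergence multiplicative updates are illustrative choices, not the patented algorithm.

```python
import numpy as np

def semi_supervised_separation(V, W_speech, n_noise=8, n_iter=100, eps=1e-9):
    """Separate speech from noise in a magnitude spectrogram V (freq x frames).

    W_speech: speech basis learned from training data (kept fixed); the noise
    basis is learned from the mixture itself, without noise training data.
    """
    n_freq, n_frames = V.shape
    rng = np.random.default_rng(0)
    W_noise = rng.random((n_freq, n_noise)) + eps
    H = rng.random((W_speech.shape[1] + n_noise, n_frames)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)           # update activations
        R = V / (W @ H + eps)
        Hn = H[W_speech.shape[1]:]
        W_noise *= (R @ Hn.T) / (Hn.sum(axis=1)[None, :] + eps)   # update noise basis only
    W = np.hstack([W_speech, W_noise])
    speech_part = W_speech @ H[:W_speech.shape[1]]
    return V * speech_part / (W @ H + eps)      # Wiener-style speech estimate
```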
-
Patent number: 8756062
Abstract: A method of generating proxy acoustic models for use in automatic speech recognition includes training acoustic models from speech received via microphone from male speakers of a first language, and adapting the acoustic models in response to language-independent speech data from female speakers of a second language, to generate proxy acoustic models for use during runtime of speech recognition of an utterance from a female speaker of the first language.
Type: Grant
Filed: December 10, 2010
Date of Patent: June 17, 2014
Assignee: General Motors LLC
Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
-
Patent number: 8744849
Abstract: A microphone-array-based speech recognition system combines a noise cancelling technique for cancelling noise of input speech signals from an array of microphones, according to at least an inputted threshold. The system receives noise-cancelled speech signals outputted by a noise masking module through at least a speech model and at least a filler model, then computes a confidence measure score with the at least a speech model and the at least a filler model for each threshold and each noise-cancelled speech signal, and adjusts the threshold to continue the noise cancelling for achieving a maximum confidence measure score, thereby outputting a speech recognition result related to the maximum confidence measure score.
Type: Grant
Filed: October 12, 2011
Date of Patent: June 3, 2014
Assignee: Industrial Technology Research Institute
Inventor: Hsien-Cheng Liao
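A toy version of the threshold adjustment loop this abstract describes: re-run noise cancelling for candidate thresholds and keep the one whose speech/filler confidence measure score is highest. The function names are placeholders, not the patent's modules.

```python
def best_noise_threshold(thresholds, denoise_fn, confidence_fn, noisy_signal):
    """Pick the noise-cancelling threshold that maximizes the confidence score.

    denoise_fn(signal, threshold) -> noise-cancelled signal
    confidence_fn(signal)         -> confidence score from speech vs. filler models
    """
    scored = []
    for t in thresholds:
        cleaned = denoise_fn(noisy_signal, t)
        scored.append((confidence_fn(cleaned), t, cleaned))
    score, threshold, cleaned = max(scored, key=lambda x: x[0])
    return threshold, cleaned, score

# toy usage with stand-in functions
t, cleaned, s = best_noise_threshold(
    thresholds=[0.1, 0.2, 0.3],
    denoise_fn=lambda sig, t: sig - t,
    confidence_fn=lambda sig: -abs(sig - 0.8),
    noisy_signal=1.0)
print(t, s)   # threshold 0.2 gives the signal closest to the target here
```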
-
Patent number: 8700403
Abstract: A method of statistical modeling is provided which includes constructing a statistical model and incorporating Gaussian priors during feature selection and during parameter optimization for the construction of the statistical model.
Type: Grant
Filed: November 3, 2005
Date of Patent: April 15, 2014
Assignee: Robert Bosch GmbH
Inventors: Fuliang Weng, Lin Zhao
-
Patent number: 8694316
Abstract: An automatic speech recognition (ASR) system includes a speech-responsive application and a recognition engine. The ASR system generates user prompts to elicit certain spoken inputs, and the speech-responsive application performs operations when the spoken inputs are recognized. The recognition engine compares sounds within an input audio signal with phones within an acoustic model, to identify candidate matching phones. A recognition confidence score is calculated for each candidate matching phone, and the confidence scores are used to help identify one or more likely sequences of matching phones that appear to match a word within the grammar of the speech-responsive application. The per-phone confidence scores are evaluated against predefined confidence score criteria (for example, identifying scores below a ‘low confidence’ threshold) and the results of the evaluation are used to influence subsequent selection of user prompts.
Type: Grant
Filed: October 20, 2005
Date of Patent: April 8, 2014
Assignee: Nuance Communications, Inc.
Inventors: John Brian Pickering, Timothy David Poultney, Benjamin Terrick Staniford, Matthew Whitbourne
-
Patent number: 8635067
Abstract: Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components Nc, less than N, to be used in a restructured acoustic model derived from the reference acoustic model, is identified. The desired number of components Nc is selected based on a computing environment in which the restructured acoustic model is to be deployed. The restructured acoustic model also has L states. For each given one of the L mixture models in the reference acoustic model, a merge sequence is built which records, for a given cost function, sequential mergers of pairs of the components associated with the given one of the mixture models. A portion of the Nc components is assigned to each of the L states in the restructured acoustic model.
Type: Grant
Filed: December 9, 2010
Date of Patent: January 21, 2014
Assignee: International Business Machines Corporation
Inventors: Pierre Dognin, Vaibhava Goel, John R. Hershey, Peder A. Olsen
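A minimal sketch of building a merge sequence for one mixture model: greedily pick the pair of Gaussian components whose merge is cheapest under a cost function, record it, and repeat. The moment-matched merge and entropy-style cost shown here are common choices that stand in for the patent's cost function.

```python
import numpy as np

def merge_two(w1, m1, v1, w2, m2, v2):
    """Moment-matched merge of two diagonal Gaussians (weight, mean, variance)."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
    return w, m, v

def merge_cost(w1, m1, v1, w2, m2, v2):
    """Entropy-increase style cost of merging two components (one common choice)."""
    w, m, v = merge_two(w1, m1, v1, w2, m2, v2)
    ent = lambda wt, var: 0.5 * wt * np.sum(np.log(var))
    return ent(w, v) - ent(w1, v1) - ent(w2, v2)

def build_merge_sequence(components):
    """Greedily record which pair to merge next, from N components down to 1.

    components: list of (weight, mean, variance) tuples; the recorded indices
    refer to positions in the shrinking working list.
    """
    comps = list(components)
    sequence = []
    while len(comps) > 1:
        pairs = [(merge_cost(*comps[i], *comps[j]), i, j)
                 for i in range(len(comps)) for j in range(i + 1, len(comps))]
        cost, i, j = min(pairs)
        sequence.append((i, j, cost))
        merged = merge_two(*comps[i], *comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return sequence

comps = [(0.5, np.array([0.0]), np.array([1.0])),
         (0.3, np.array([0.1]), np.array([1.2])),
         (0.2, np.array([3.0]), np.array([0.5]))]
print(build_merge_sequence(comps))   # the two nearby components merge first
```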
-
Patent number: 8612225
Abstract: A voice recognition device that recognizes a voice of an input voice signal, comprises a voice model storage unit that stores in advance a predetermined voice model having a plurality of detail levels, the plurality of detail levels being information indicating a feature property of a voice for the voice model; a detail level selection unit that selects a detail level, closest to a feature property of an input voice signal, from the detail levels of the voice model stored in the voice model storage unit; and a parameter setting unit that sets parameters for recognizing the voice of an input voice according to the detail level selected by the detail level selection unit.
Type: Grant
Filed: February 26, 2008
Date of Patent: December 17, 2013
Assignee: NEC Corporation
Inventors: Takayuki Arakawa, Ken Hanazawa, Masanori Tsujikawa
-
Patent number: 8515758
Abstract: Some implementations provide for speech recognition based on structured modeling, irrelevant variability normalization and unsupervised online adaptation of one or more speech recognition parameters. Some implementations may improve the ability of a runtime speech recognizer or decoder to adapt to new speakers and new environments.
Type: Grant
Filed: April 14, 2010
Date of Patent: August 20, 2013
Assignee: Microsoft Corporation
Inventor: Qiang Huo
-
Patent number: 8510111
Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and selecting a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
Type: Grant
Filed: February 8, 2008
Date of Patent: August 13, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
-
Patent number: 8484023
Abstract: Techniques are disclosed for generating and using sparse representation features to improve speech recognition performance. In particular, principles of the invention provide sparse representation exemplar-based recognition techniques. For example, a method comprises the following steps. A test vector and a training data set associated with a speech recognition system are obtained. A subset of the training data set is selected. The test vector is mapped with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint. An acoustic model is trained on the new test feature set. The acoustic model trained on the new test feature set may be used to decode user speech input to the speech recognition system.
Type: Grant
Filed: September 24, 2010
Date of Patent: July 9, 2013
Assignee: Nuance Communications, Inc.
Inventors: Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran, Tara N. Sainath
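A small sketch of the sparse-representation step this abstract describes: express a test vector as an l1-constrained linear combination of training exemplars (here via plain ISTA), so the reconstruction A @ x can serve as the new test feature. The solver and its parameters are illustrative choices, not the patent's method.

```python
import numpy as np

def sparse_code(test_vec, exemplars, lam=0.1, n_iter=200):
    """Sparse combination of training exemplars via ISTA (l1-penalized least squares)."""
    A = exemplars                        # columns are training exemplars
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ x - test_vec)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return x

# toy usage; the sparse 'new test feature' would be the reconstruction A @ x
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))
y = A[:, 3] + 0.01 * rng.normal(size=20)
x = sparse_code(y, A)
print(np.argmax(np.abs(x)))   # typically exemplar 3 dominates
```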
-
Patent number: 8484025
Abstract: Disclosed embodiments relate to mapping an utterance to an action using a classifier. One illustrative computing device includes a user interface having an input component. The computing device further includes a processor and a computer-readable storage medium, having stored thereon program instructions that, upon execution by the processor, cause the computing device to perform a set of operations including: receiving an audio utterance via the input component; determining a text string based on the utterance; determining a string-feature vector based on the text string; selecting a target classifier from a set of classifiers, wherein the target classifier is selected based on a determination that a string-feature criteria of the target classifier corresponds to at least one string-feature of the string-feature vector; and initiating a target action that corresponds to the target classifier.
Type: Grant
Filed: October 4, 2012
Date of Patent: July 9, 2013
Assignee: Google Inc.
Inventors: Pedro J. Moreno Mengibar, Martin Jansche, Fadi Biadsy
-
Patent number: 8386251
Abstract: A speech recognition system is provided with iteratively refined multiple passes through the received data to enhance the accuracy of the results by introducing constraints and adaptation from initial passes into subsequent recognition operations. The multiple passes are performed on an initial utterance received from a user. The iteratively enhanced subsequent passes are also performed on following utterances received from the user increasing an overall system efficiency and accuracy.
Type: Grant
Filed: June 8, 2009
Date of Patent: February 26, 2013
Assignee: Microsoft Corporation
Inventors: Nikko Strom, Julian Odell, Jon Hamaker
-
Patent number: 8374865
Abstract: A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system.
Type: Grant
Filed: April 26, 2012
Date of Patent: February 12, 2013
Assignee: Google Inc.
Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar, Kaisuke Nakajima, Daniel Martin Bikel
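A short sketch of the sampling step this abstract describes: classify the corpus, then draw training strings per class so that the resulting distribution follows the benchmark classification distribution. The helper names and the rounding rule are illustrative.

```python
import random
from collections import defaultdict

def sample_to_distribution(corpus, classify, target_dist, n_samples, seed=0):
    """Sample text strings so their class distribution follows target_dist.

    corpus:      list of text strings
    classify(s): returns a class label for string s
    target_dist: dict label -> desired fraction (fractions sum to 1)
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s in corpus:
        by_class[classify(s)].append(s)       # bucket strings by class
    training = []
    for label, frac in target_dist.items():
        k = min(int(round(frac * n_samples)), len(by_class[label]))
        training.extend(rng.sample(by_class[label], k))
    rng.shuffle(training)
    return training
```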
-
Patent number: 8311825
Abstract: A system for calculating the look ahead probabilities at the nodes in a language model look ahead tree, wherein the words of the vocabulary of the language are located at the leaves of the tree, said apparatus comprising: means to assign a language model probability to each of the words of the vocabulary using a first low order language model; means to calculate the language look ahead probabilities for all nodes in said tree using said first language model; means to determine if the language model probability of one or more words of said vocabulary can be calculated using a higher order language model and updating said words with the higher order language model; and means to update the look ahead probability at only the nodes which are affected by the words where the language model has been updated.
Type: Grant
Filed: October 3, 2008
Date of Patent: November 13, 2012
Assignee: Kabushiki Kaisha Toshiba
Inventor: Langzhou Chen
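A compact sketch of language-model look-ahead in a lexical prefix tree: each node's look-ahead probability is the best word probability beneath it, and when one word is upgraded to a higher-order LM probability only the nodes on its path are refreshed. The tree representation below is illustrative.

```python
class Node:
    def __init__(self, children=None, word_prob=None):
        self.children = children or []   # internal nodes of the prefix tree
        self.word_prob = word_prob       # set only at leaves (vocabulary words)
        self.lookahead = None

def compute_lookahead(node):
    """Look-ahead probability of a node = best word probability below it."""
    if not node.children:                        # leaf: a vocabulary word
        node.lookahead = node.word_prob
    else:
        node.lookahead = max(compute_lookahead(c) for c in node.children)
    return node.lookahead

def update_leaf(root, path, new_prob):
    """Upgrade one word to a higher-order LM probability and refresh only
    the ancestors on its path (call compute_lookahead(root) once first)."""
    node, ancestors = root, [root]
    for idx in path:                     # path = child indices down to the leaf
        node = node.children[idx]
        ancestors.append(node)
    node.word_prob = node.lookahead = new_prob
    for a in reversed(ancestors[:-1]):   # only affected nodes are recomputed
        a.lookahead = max(c.lookahead for c in a.children)
```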
-
Patent number: 8301449
Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
Type: Grant
Filed: October 16, 2006
Date of Patent: October 30, 2012
Assignee: Microsoft Corporation
Inventors: Xiaodong He, Li Deng
-
Patent number: 8265930
Abstract: The present invention relates to recording voice data using a voice communication device connected to a communication network and converting the voice data into a text file for delivery to a text communication device. In accordance with the present invention, the voice communication device may transfer the voice data in real-time or store the voice data on the device to be transmitted at a later time. Transcribing the voice data into a text file may be accomplished by automated computer software, either speaker-independent or speaker-dependent or by a human who transcribes the voice data into a text file. After transcribing the voice data into a text file, the text file may be delivered to a text communication device in a number of ways, such as email, file transfer protocol (FTP), or hypertext transfer protocol (HTTP).
Type: Grant
Filed: April 13, 2005
Date of Patent: September 11, 2012
Assignee: Sprint Communications Company L.P.
Inventors: Bryce A. Jones, Raymond Edward Dickensheets
-
Patent number: 8234116
Abstract: Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMM) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing individual KLDs calculated by state pair by state pair comparisons.
Type: Grant
Filed: August 22, 2006
Date of Patent: July 31, 2012
Assignee: Microsoft Corporation
Inventors: Peng Liu, Frank Kao-Ping K. Soong, Jian-Lai Zhou
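A rough sketch of the unscented-transform approximation mentioned in this abstract: draw sigma points from each Gaussian of mixture A and average the log-ratio of the two mixture densities (the dynamic-programming state alignment between HMMs is omitted here, and the sigma-point weighting is one standard choice).

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_logpdf(x, weights, means, covs):
    return np.log(sum(w * multivariate_normal.pdf(x, m, c)
                      for w, m, c in zip(weights, means, covs)))

def unscented_kld(gmm_a, gmm_b, kappa=1.0):
    """Approximate KL(A || B) between two Gaussian mixtures with sigma points.

    Each gmm is a (weights, means, covs) triple; 2d+1 sigma points are drawn
    per component of A and the log-ratio is averaged over them.
    """
    weights, means, covs = gmm_a
    d = len(means[0])
    kld = 0.0
    for w, m, c in zip(weights, means, covs):
        S = np.linalg.cholesky((d + kappa) * np.asarray(c))
        points = [m] + [m + S[:, i] for i in range(d)] + [m - S[:, i] for i in range(d)]
        pw = [kappa / (d + kappa)] + [0.5 / (d + kappa)] * (2 * d)
        kld += w * sum(p * (gmm_logpdf(x, *gmm_a) - gmm_logpdf(x, *gmm_b))
                       for p, x in zip(pw, points))
    return kld

# toy usage with two small 2-D mixtures
A = ([0.6, 0.4], [np.zeros(2), np.ones(2)], [np.eye(2), np.eye(2)])
B = ([1.0], [np.full(2, 0.5)], [np.eye(2)])
print(unscented_kld(A, B))
```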
-
Patent number: 8234112
Abstract: Provided are an apparatus and method for generating a noise adaptive acoustic model including a noise adaptive discriminative adaptation method. The method includes: generating a baseline model parameter from large-capacity speech training data including various noise environments; and receiving the generated baseline model parameter and applying a discriminative adaptation method to the generated results to generate a migrated acoustic model parameter suitable for an actually applied environment.
Type: Grant
Filed: April 25, 2008
Date of Patent: July 31, 2012
Assignee: Electronics and Telecommunications Research Institute
Inventors: Byung Ok Kang, Ho Young Jung, Yun Keun Lee
-
Patent number: 8229744
Abstract: A method, system, and computer program for class detection and time mediated averaging of class dependent models. A technique is described to take advantage of gender information in training data and how to obtain female, male, and gender independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross gender decoding performance is avoided.
Type: Grant
Filed: August 26, 2003
Date of Patent: July 24, 2012
Assignee: Nuance Communications, Inc.
Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
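A minimal sketch of averaging class-dependent models with a probability value: per-frame female and male GMM scores are blended with a gender posterior. The value of p_female would come from a separate detector; the numbers here are illustrative.

```python
import numpy as np

def gender_averaged_score(frame_scores_female, frame_scores_male, p_female):
    """Average class-dependent GMM scores with the class probability.

    frame_scores_*: per-frame log-likelihoods under the female / male models
    p_female:       probability (from a gender detector) that the speaker is female
    """
    lf = np.asarray(frame_scores_female)
    lm = np.asarray(frame_scores_male)
    # weighted average in the likelihood domain, frame by frame
    return np.log(p_female * np.exp(lf) + (1.0 - p_female) * np.exp(lm))

print(gender_averaged_score([-5.0, -6.0], [-4.0, -7.0], p_female=0.8))
```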
-
Patent number: 8229729
Abstract: A system and method for training a statistical machine translation model and decoding or translating using the same is disclosed. A source word versus target word co-occurrence matrix is created to define word pairs. Dimensionality of the matrix may be reduced. Word pairs are mapped as vectors into continuous space where the word pairs are vectors of continuous real numbers and not discrete entities in the continuous space. A machine translation parametric model is trained using an acoustic model training method based on word pair vectors in the continuous space.
Type: Grant
Filed: March 25, 2008
Date of Patent: July 24, 2012
Assignee: International Business Machines Corporation
Inventors: Ruhi Sarikaya, Yonggang Deng, Brian Edward Doorenbos Kingsbury, Yuqing Gao
-
Patent number: 8160878
Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise-ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
Type: Grant
Filed: September 16, 2008
Date of Patent: April 17, 2012
Assignee: Microsoft Corporation
Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
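A small illustration of the variable-parameter idea: store Gaussian means and variances at a few SNR knots and interpolate them with a cubic spline at the observed SNR. The knot values below are made up for the example, not taken from the patent.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# knots: a few SNR values (dB) at which the Gaussian parameters are stored;
# the spline interpolates the mean and variance for any observed SNR.
snr_knots = np.array([0.0, 5.0, 10.0, 20.0, 30.0])
mean_knots = np.array([1.8, 1.4, 1.1, 1.0, 1.0])     # illustrative values
var_knots = np.array([2.5, 1.6, 1.2, 1.0, 1.0])

mean_of_snr = CubicSpline(snr_knots, mean_knots)
var_of_snr = CubicSpline(snr_knots, var_knots)

def gaussian_loglik(x, snr_db):
    """Log-likelihood of one VPHMM Gaussian whose parameters depend on SNR."""
    mu, var = float(mean_of_snr(snr_db)), float(var_of_snr(snr_db))
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

print(gaussian_loglik(1.2, snr_db=7.5))
```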
-
Patent number: 8086455
Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessors and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationship (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definition and data being produced by more than one action. The user can add, change and/or modify a process by editing a declaration file, and rerunning the compiler, thereby a new process is automatically generated.
Type: Grant
Filed: January 9, 2008
Date of Patent: December 27, 2011
Assignee: Microsoft Corporation
Inventors: Yifan Gong, Ye Tian
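A toy version of the declarative build compiler this abstract describes: each processor declares its input and output files, the execution order is derived from those declarations, files produced by more than one step are rejected, and cycles are caught by the topological sorter. The use of Python's graphlib and these step names are assumptions of this sketch, not the patent's implementation.

```python
from graphlib import TopologicalSorter   # Python 3.9+

def compile_build(processors):
    """Order model-build steps from their declared input/output files.

    processors: dict name -> {"inputs": [...], "outputs": [...]}
    Raises ValueError if two steps produce the same file; graphlib
    raises CycleError if the declarations are cyclic.
    """
    produced_by = {}
    for name, p in processors.items():
        for out in p["outputs"]:
            if out in produced_by:
                raise ValueError(f"{out} produced by both {produced_by[out]} and {name}")
            produced_by[out] = name
    graph = {name: {produced_by[i] for i in p["inputs"] if i in produced_by}
             for name, p in processors.items()}
    return list(TopologicalSorter(graph).static_order())

steps = {
    "extract_features": {"inputs": ["wav.list"], "outputs": ["feats"]},
    "init_model":       {"inputs": ["feats"], "outputs": ["model.0"]},
    "train":            {"inputs": ["feats", "model.0"], "outputs": ["model.final"]},
}
print(compile_build(steps))   # extract_features, init_model, train
```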
-
Patent number: 8015008
Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition systems (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect an expanded consonant phoneme inventory, and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
Type: Grant
Filed: October 31, 2007
Date of Patent: September 6, 2011
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal
-
Patent number: 8010341
Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.
Type: Grant
Filed: September 13, 2007
Date of Patent: August 30, 2011
Assignee: Microsoft Corporation
Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
-
Patent number: 8010358
Abstract: Methods and apparatus for voice recognition are disclosed. A voice signal is obtained and two or more voice recognition analyses are performed on the voice signal. Each voice recognition analysis uses a filter bank defined by a different maximum frequency and a different minimum frequency and wherein each voice recognition analysis produces a recognition probability r_i of recognition of one or more speech units, whereby there are two or more recognition probabilities r_i. The maximum frequency and the minimum frequency may be adjusted every time speech is windowed and analyzed. A final recognition probability P_f is determined based on the two or more recognition probabilities r_i.
Type: Grant
Filed: February 21, 2006
Date of Patent: August 30, 2011
Assignee: Sony Computer Entertainment Inc.
Inventor: Ruxin Chen
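A small sketch of the final combination step: each analysis (run with its own filter-bank frequency range) yields a recognition probability r_i, and a final probability P_f is formed from them. The abstract does not spell out the combination rule, so a weighted average is used here purely as an illustration.

```python
import numpy as np

def final_recognition_probability(per_analysis_probs, weights=None):
    """Combine recognition probabilities r_i from several analyses
    (each run with a different filter-bank frequency range) into P_f."""
    r = np.asarray(per_analysis_probs, dtype=float)
    w = np.ones_like(r) / len(r) if weights is None else np.asarray(weights)
    return float(np.sum(w * r))       # one simple choice: a weighted average

# two analyses with different min/max filter-bank frequencies
print(final_recognition_probability([0.82, 0.74]))
```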
-
Patent number: 7970614
Abstract: The present invention provides a system and method for treating distortion propagated through a detection system. The system includes a compensation module that compensates for untreated distortions propagating through the detection compensation system, a user model pool that comprises a plurality of model sets, and a model selector that selects at least one model set from the plurality of model sets in the user model pool. The compensation is accomplished by continually producing scores distributed according to a prescribed distribution for the at least one model set and mitigating the adverse effects of the scores being distorted and lying off a pre-set operating point. The method for treating distortion propagated through a detection system includes receiving a signal from a remote device, and compensating the signal for untreated distortions.
Type: Grant
Filed: May 8, 2007
Date of Patent: June 28, 2011
Assignee: Nuance Communications, Inc.
Inventors: Janice J. Kim, Jiri Navratil, Jason W. Pelecanos, Ganesh N. Ramaswamy
-
Publication number: 20110071835
Abstract: Embodiments of a small footprint text-to-speech engine are disclosed. In operation, the small footprint text-to-speech engine generates a set of feature parameters for an input text. The set of feature parameters includes static feature parameters and delta feature parameters. The small footprint text-to-speech engine then derives a saw-tooth stochastic trajectory that represents the speech characteristics of the input text based on the static feature parameters and the delta parameters. Finally, the small footprint text-to-speech engine produces a smoothed trajectory from the saw-tooth stochastic trajectory, and generates synthesized speech based on the smoothed trajectory.
Type: Application
Filed: September 22, 2009
Publication date: March 24, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Yi-Ning Chen, Zhi-Jie Yan, Frank Kao-Ping Soong
-
Publication number: 20110015925
Abstract: A speech recognition method, comprising: receiving a speech input in a first noise environment which comprises a sequence of observations; determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of observations, wherein said model has been trained to recognise speech in a second noise environment, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to an observation; adapting the model trained in the second environment to that of the first environment; the speech recognition method further comprising determining the likelihood of a sequence of observations occurring in a given language using a language model; combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said speech input.
Type: Application
Filed: March 26, 2010
Publication date: January 20, 2011
Applicant: Kabushiki Kaisha Toshiba
Inventors: Haitian Xu, Mark John Francis Gales