Training of HMM (EPO) Patents (Class 704/256.2)
  • Patent number: 12061979
    Abstract: The present teaching relates to obtaining a model for identifying content matching a query. Training data are received which include queries, advertisements, and hyperlinks. A plurality of subwords are identified from each of the queries and a plurality of vectors for the plurality of subwords of each of the queries are obtained. Via a neural network, a vector for each of the queries is derived based on a plurality of vectors for the plurality of subwords of the query. A query/ads model is obtained via optimization with respect to an objective function, based on vectors associated with the plurality of subwords of each of the queries and vectors for the queries obtained from the neural network.
    Type: Grant
    Filed: February 9, 2018
    Date of Patent: August 13, 2024
    Assignee: YAHOO AD TECH LLC
    Inventors: Erik Ordentlich, Milind Rao, Jun Shi, Andrew Feng
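The matching pipeline this abstract outlines (query → subwords → subword vectors → query vector → query/ad scoring) can be sketched in a few lines. This is a minimal illustration, not the patented method: the character-trigram subwords, the hash-based toy embeddings, and plain averaging in place of the neural network are all assumptions made so the example is self-contained.

```python
import math

def subwords(text, n=3):
    """Character trigrams as a simple stand-in for the patent's subword units."""
    padded = "#" + text + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def embed(token, dim=8):
    """Deterministic toy embedding; a real system learns these vectors."""
    h = sum(ord(c) * (31 ** i) for i, c in enumerate(token))
    return [((h >> k) % 7) - 3 for k in range(dim)]

def query_vector(query):
    """Average the subword vectors -- a crude proxy for the neural network
    that derives a query vector from its subword vectors."""
    vecs = [embed(s) for s in subwords(query)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Rank candidate ad texts against a query by embedding both the same way.
ads = ["discount flight tickets", "running shoes sale"]
ranked = sorted(ads, key=lambda ad: -cosine(query_vector("cheap flights"),
                                            query_vector(ad)))
```

In the patented system the query vector and subword vectors are trained jointly against an objective over query/ad pairs; the fixed hash embeddings here only stand in for that learned representation.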
  • Patent number: 11887583
    Abstract: Some devices may perform processing using machine learning models trained at a centralized system and distributed to the device. The centralized system may update the machine learning model and distribute the update to the device (or devices). To reduce the size of an update, the centralized system may train a model update object, which may be smaller in size than the model itself and thus more suitable for sending to the device(s). A device may receive the model update object and use it to update the on-device machine learning model; for example, by changing some parameters of the model. Parameters left unchanged during the update may retain their previous value. Thus, using the model update object to update the on-device model may result in a more accurate updated model when compared to sending an updated model compressed to a size similar to that of the model update object.
    Type: Grant
    Filed: June 9, 2021
    Date of Patent: January 30, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Grant Strimel, Jonathan Jenner Macoskey, Ariya Rastrow
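The sparse-update idea described above, shipping only the changed parameters while everything else keeps its previous value, can be sketched directly. The dict-of-lists model layout and the `{index: new_value}` update format are hypothetical; an on-device model would use tensors, but the mechanics are the same.

```python
def apply_update(model, update):
    """Apply a sparse model-update object: only the listed parameters
    change; every parameter left out keeps its previous value."""
    updated = {name: list(values) for name, values in model.items()}
    for name, changes in update.items():    # changes maps index -> new value
        for idx, value in changes.items():
            updated[name][idx] = value
    return updated

model = {"layer1": [0.1, 0.2, 0.3], "layer2": [0.5, 0.6]}
update_obj = {"layer1": {2: 0.35}}          # far smaller than a full model
new_model = apply_update(model, update_obj)
```

The update object touches one value out of five, which is why, per the abstract, it can be much smaller than a compressed copy of the whole model at comparable accuracy.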
  • Patent number: 11783826
    Abstract: A method, computer program product, and computing system for receiving one or more inputs indicative of at least one of: a relative location of a speaker and a microphone array, and a relative orientation of the speaker and the microphone array. One or more reference signals may be received. A speech processing system may be trained using the one or more inputs and the one or more reference signals.
    Type: Grant
    Filed: February 18, 2021
    Date of Patent: October 10, 2023
    Assignee: Nuance Communications, Inc.
    Inventors: Patrick A. Naylor, Dushyant Sharma, Uwe Helmut Jost, William F. Ganong, III
  • Patent number: 11398221
    Abstract: [Problem] Proposed are an information processing apparatus, an information processing method, and a program capable of learning a meaning corresponding to a speech recognition result of a first speech, adaptively to a determination result as to whether a second speech is a restatement of the first speech. [Solution] An information processing apparatus including a learning unit configured to learn a meaning corresponding to a speech recognition result of a first speech collected at a first timing, based on a determination result as to whether a second speech, collected at a second timing after the first timing, is a restatement of the first speech.
    Type: Grant
    Filed: November 30, 2018
    Date of Patent: July 26, 2022
    Assignee: SONY CORPORATION
    Inventors: Shinichi Kawano, Hiro Iwase, Yuhei Taki
  • Patent number: 11302308
    Abstract: A method for generating synthetic telephony narrowband data for training an automatic speech recognition model by receiving a broadband audio data file and then initiating a telephony call using a pre-configured telephone provider to play the broadband audio data file in the telephony call and to record and store audio data generated by transmission of the broadband audio data file in the telephony call, thereby generating the synthetic telephony narrowband data file from the broadband audio data file.
    Type: Grant
    Filed: July 9, 2019
    Date of Patent: April 12, 2022
    Assignee: International Business Machines Corporation
    Inventors: Vamshi Krishna Thotempudi, Pierre-Hadrien Arnoux, Vibha S. Sinha
  • Patent number: 11295726
    Abstract: A system and apparatus are provided for generating synthetic telephony narrowband data for training an automatic speech recognition model by receiving a broadband audio data file and then initiating a telephony call using a pre-configured telephone provider to play the broadband audio data file in the telephony call and to record and store audio data generated by transmission of the broadband audio data file in the telephony call, thereby generating the synthetic telephony narrowband data file from the broadband audio data file.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: April 5, 2022
    Assignee: International Business Machines Corporation
    Inventors: Vamshi Krishna Thotempudi, Pierre-Hadrien Arnoux, Vibha S. Sinha
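The two IBM patents above generate narrowband training data by playing broadband audio through a real telephony call. As a purely software approximation of that channel (an assumption, not what the patents claim), one can band-limit and decimate a 16 kHz signal down to 8 kHz:

```python
def to_narrowband(samples, factor=2, taps=5):
    """Crude software stand-in for a telephony channel: a moving-average
    low-pass filter, then decimation (e.g. 16 kHz -> 8 kHz for factor=2)."""
    half = taps // 2
    smoothed = []
    for i in range(len(samples)):
        window = samples[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed[::factor]

# A constant signal passes through unchanged except for the halved rate.
narrow = to_narrowband([1.0] * 16)
```

A real telephony channel also adds codec distortion and noise, which is precisely what the patented call-based approach captures and this sketch does not.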
  • Patent number: 10839156
    Abstract: Generally described, one or more aspects of the present application correspond to a machine learning address normalization system. A system of deep learning networks can normalize the tokens of a free-form address into an address component hierarchy. Feature vectors representing various characters and words of the address tokens can be input into a bi-directional long short term memory network (LSTM) to generate a hidden state representation of each token, which can be individually passed through a softmax layer to generate probabilistic values of the token being each of the components in the address hierarchy. Thereafter, a conditional random field (CRF) model can select a particular address component for each token by using learned parameters to optimize a path through the collective outputs of the softmax layer for the tokens. Thus, the free-form address can be normalized to determine the values it contains for different components of a specified address hierarchy.
    Type: Grant
    Filed: January 3, 2019
    Date of Patent: November 17, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Satyam Saxena, Sourav Kumar Agarwal, Alok Chandra
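The normalization stack described above (per-token softmax scores from a BiLSTM, then a CRF selecting the best component path) can be illustrated with a tiny Viterbi decoder. The emission scores and transition table below are made-up stand-ins for the learned BiLSTM outputs and CRF parameters:

```python
import math

def viterbi(emissions, transitions, labels):
    """Best label path given per-token emission probabilities (the softmax
    outputs) and pairwise transition probabilities (the CRF parameters)."""
    best = {lab: math.log(emissions[0][lab]) for lab in labels}
    backptrs = []
    for emit in emissions[1:]:
        step = {}
        backptrs.append({})
        for lab in labels:
            prev, score = max(
                ((p, best[p] + math.log(transitions[(p, lab)])) for p in labels),
                key=lambda pair: pair[1])
            step[lab] = score + math.log(emit[lab])
            backptrs[-1][lab] = prev
        best = step
    path = [max(best, key=best.get)]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return list(reversed(path))

labels = ["house_no", "street", "city"]
trans = {(a, b): 0.1 for a in labels for b in labels}
trans[("house_no", "street")] = 0.8     # plausible address-component order
trans[("street", "city")] = 0.8
emis = [  # made-up softmax scores for tokens "221b", "Baker", "London"
    {"house_no": 0.8, "street": 0.1, "city": 0.1},
    {"house_no": 0.2, "street": 0.6, "city": 0.2},
    {"house_no": 0.1, "street": 0.2, "city": 0.7},
]
best_path = viterbi(emis, trans, labels)
```

The CRF's contribution is visible in the transition table: it rewards label sequences that follow the address hierarchy, rather than labeling each token independently.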
  • Patent number: 10649725
    Abstract: Systems of the present disclosure adjust an interface mode of an application based on paralinguistic features of audio input. The audio input is via a microphone associated with a computing device. A predictive model uses paralinguistic features of the audio input and additional features received from sensors or a user profile to predict an interface mode that a user would currently prefer to use. The interface mode specifies how output is provided and how input is received. The interface mode may also specify which elements of a graphical user interface are displayed, where the elements are placed, and how the elements are sized.
    Type: Grant
    Filed: October 27, 2016
    Date of Patent: May 12, 2020
    Assignee: Intuit Inc.
    Inventors: Benjamin Indyk, Igor A. Podgorny, Raymond Chan
  • Patent number: 9280968
    Abstract: A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.
    Type: Grant
    Filed: October 4, 2013
    Date of Patent: March 8, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Enrico Luigi Bocchieri, Dimitrios Dimitriadis
  • Patent number: 9276893
    Abstract: Techniques, systems, and articles of manufacture for determining the current logical state of a social media communication thread. A method includes computing an initial probability for applicability of each of multiple logical states for a first entry in a social media communication thread, wherein each logical state corresponds to a stage of interaction between customers of an enterprise and/or agents of the enterprise based on features derived from content of entries in the communication thread, network structure of entries, and identity of authors of entries, computing a transition probability between each subsequent consecutive entry in the communication thread, wherein the transition probability indicates the probability of moving from one logical state to another, and determining the current logical state of the communication thread based on the computed initial probability for the first entry and the computed transition probability between each subsequent entry in the communication thread.
    Type: Grant
    Filed: January 15, 2013
    Date of Patent: March 1, 2016
    Assignee: International Business Machines Corporation
    Inventors: Jitendra Ajmera, Ashish Verma, Katyaini H. Naga
  • Patent number: 9263030
    Abstract: A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g. 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN.
    Type: Grant
    Filed: January 23, 2013
    Date of Patent: February 16, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Shizhen Wang, Yifan Gong, Fileno Alleva
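The adaptive warping-factor estimation this abstract describes (estimate on a small window, refine as more speech arrives, stop on convergence or a round limit) can be illustrated with a coarse-to-fine search. The quadratic `score` function is a stand-in for the real likelihood of VTLN-warped features, and the single step-size schedule ignores the per-group step sizes the patent mentions:

```python
def warp_axis(freqs, alpha):
    """Linear VTLN warp of the frequency axis by warping factor alpha."""
    return [f * alpha for f in freqs]

def estimate_warp(score, coarse_alphas, step, max_rounds=5, tol=1e-3):
    """Pick the best coarse warping factor, then adaptively halve the step
    around it until the estimate converges or the round limit is reached."""
    best = max(coarse_alphas, key=score)
    for _ in range(max_rounds):
        step /= 2
        new_best = max([best - step, best, best + step], key=score)
        if abs(new_best - best) < tol:
            return new_best
        best = new_best
    return best

# A quadratic stand-in for the likelihood of warped features, peaking at 1.08.
est = estimate_warp(lambda a: -(a - 1.08) ** 2,
                    [0.8 + 0.1 * i for i in range(5)], step=0.1)
```

In a real system `score` would re-score the accumulated speech under the acoustic model for each candidate warp, which is why the patent caps the number of adaptation rounds.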
  • Patent number: 9218806
    Abstract: Features are disclosed for selecting and using multiple transforms associated with a particular remote device for use in automatic speech recognition (“ASR”). Each transform may be based on statistics that have been generated from processing utterances that share some characteristic (e.g., acoustic characteristics, time frame within which the utterances where processed, etc.). When an utterance is received from the remote device, a particular transform or set of transforms may be selected for use in speech processing based on data obtained from the remote device, speech processing of a portion of the utterance, speech processing of prior utterances, etc. The transform or transforms used in processing the utterances may then be updated based on the results of the speech processing.
    Type: Grant
    Filed: May 10, 2013
    Date of Patent: December 22, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Stan Weidner Salvador, Shengbin Yang, Hugh Evan Secker-Walker, Karthik Ramakrishnan
  • Patent number: 9043209
    Abstract: This device stores a first content-specific language model representing a probability that a specific word appears in a word sequence representing a first content, and a second content-specific language model representing a probability that the specific word appears in a word sequence representing a second content. Based on a first probability parameter representing a probability that a content represented by a target word sequence, included in a speech recognition hypothesis generated by a speech recognition process of recognizing a word sequence corresponding to a speech, is a first content, a second probability parameter representing a probability that the content represented by the target word sequence is a second content, and the first and second content-specific language models, the device creates a language model representing a probability that the specific word appears in a word sequence corresponding to the part of the speech corresponding to the target word sequence.
    Type: Grant
    Filed: September 3, 2009
    Date of Patent: May 26, 2015
    Assignee: NEC CORPORATION
    Inventors: Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki
  • Patent number: 9037460
    Abstract: Dynamic features are utilized with CRFs to handle long-distance dependencies of output labels. The dynamic features present a probability distribution involved in explicit distance from/to a special output label that is pre-defined according to each application scenario. Besides the number of units in the segment (from the previous special output label to the current unit), the dynamic features may also include the sum of any basic features of units in the segment. Since the added dynamic features are involved in the distance from the previous specific label, the searching lattice associated with Viterbi searching is expanded to distinguish the nodes with various distances. The dynamic features may be used in a variety of different applications, such as Natural Language Processing, Text-To-Speech and Automatic Speech Recognition. For example, the dynamic features may be used to assist in prosodic break and pause prediction.
    Type: Grant
    Filed: March 28, 2012
    Date of Patent: May 19, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jian Luan, Linfang Wang, Hairong Xia, Sheng Zhao, Daniela Braga
  • Patent number: 9020820
    Abstract: A state detecting apparatus includes: a processor to execute acquiring utterance data related to uttered speech, computing a plurality of statistical quantities for feature parameters regarding features of the utterance data, creating, on the basis of the plurality of statistical quantities regarding the utterance data and another plurality of statistical quantities regarding reference utterance data based on other uttered speech, pseudo-utterance data having at least one statistical quantity equal to a statistical quantity in the other plurality of statistical quantities, computing a plurality of statistical quantities for synthetic utterance data synthesized on the basis of the pseudo-utterance data and the utterance data, and determining, on the basis of a comparison between statistical quantities of the synthetic utterance data and statistical quantities of the reference utterance data, whether the speaker who produced the uttered speech is in a first state or a second state; and a memory.
    Type: Grant
    Filed: April 13, 2012
    Date of Patent: April 28, 2015
    Assignee: Fujitsu Limited
    Inventors: Shoji Hayakawa, Naoshi Matsuo
  • Publication number: 20150100312
    Abstract: A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.
    Type: Application
    Filed: October 4, 2013
    Publication date: April 9, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Enrico Luigi BOCCHIERI, Dimitrios DIMITRIADIS
  • Patent number: 8964948
    Abstract: A method for setting a voice tag is provided, comprising the following steps. First, the number of phone calls between a user and a contact person is counted. If, within a predetermined duration, the number of phone calls exceeds a predetermined count or a voice dialing attempt by the user fails before calling the contact person, the user is asked after the phone call completes whether to set a voice tag corresponding to the contact person. If the user decides to set the voice tag, a voice training procedure is executed to set the voice tag corresponding to the contact person.
    Type: Grant
    Filed: May 29, 2012
    Date of Patent: February 24, 2015
    Assignee: HTC Corporation
    Inventor: Fu-Chiang Chou
  • Patent number: 8959022
    Abstract: A method for determining a relatedness between a query video and a database video is provided. A processor extracts an audio stream from the query video to produce a query audio stream, extracts an audio stream from the database video to produce a database audio stream, produces a first-sized snippet from the query audio stream, and produces a first-sized snippet from the database audio stream. An estimation is made of a first most probable sequence of latent evidence probability vectors generating the first-sized audio snippet of the query audio stream. An estimation is made of a second most probable sequence of latent evidence probability vectors generating the first-sized audio snippet of the database audio stream. A similarity is measured between the first sequence and the second sequence producing a score of relatedness between the two snippets. Finally a relatedness is determined between the query video and a database video.
    Type: Grant
    Filed: November 19, 2012
    Date of Patent: February 17, 2015
    Assignee: Motorola Solutions, Inc.
    Inventors: Yang M. Cheng, Dusan Macho
  • Patent number: 8949130
    Abstract: In embodiments of the present invention, improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a capture facility resident on the mobile communication facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criterion.
    Type: Grant
    Filed: October 21, 2009
    Date of Patent: February 3, 2015
    Assignee: Vlingo Corporation
    Inventor: Michael S. Phillips
  • Patent number: 8914292
    Abstract: In embodiments of the present invention, improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a capture facility resident on the mobile communication facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criterion.
    Type: Grant
    Filed: October 21, 2009
    Date of Patent: December 16, 2014
    Assignee: Vlingo Corporation
    Inventor: Michael S. Phillips
  • Patent number: 8886533
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Grant
    Filed: October 25, 2011
    Date of Patent: November 11, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
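The temporal-pooling-plus-ensemble structure this abstract describes can be sketched with toy pooling operations and classifiers. Using `sum` as a classifier is purely illustrative; in the patent each segmental classification unit is a trained model dedicated to its pooling interface unit:

```python
def mean_pool(frames):
    """Pool time-dependent frame features by averaging over time."""
    return [sum(col) / len(frames) for col in zip(*frames)]

def max_pool(frames):
    """Pool by taking the per-dimension maximum over time."""
    return [max(col) for col in zip(*frames)]

def ensemble_score(frames, poolers, classifiers):
    """Average scores over pooler/classifier (PIU-SCU-like) pairs;
    diversity comes from varying the pooling operation."""
    scores = [clf(pool(frames)) for pool, clf in zip(poolers, classifiers)]
    return sum(scores) / len(scores)

frames = [[1.0, 2.0], [3.0, 4.0]]          # two frames, two features each
score = ensemble_score(frames, [mean_pool, max_pool], [sum, sum])
```

Varying the pooling operation gives each ensemble member a different view of the same segment, which is the diversification mechanism the abstract relies on.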
  • Patent number: 8843370
    Abstract: Adjusting model parameters is described for a speech recognition system that combines recognition outputs from multiple speech recognition processes. Discriminative adjustments are made to model parameters of at least one acoustic model based on a joint discriminative criterion over multiple complementary acoustic models to lower recognition word error rate in the system.
    Type: Grant
    Filed: November 26, 2007
    Date of Patent: September 23, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Daniel Willett, Chuang He
  • Patent number: 8812322
    Abstract: Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.
    Type: Grant
    Filed: May 27, 2011
    Date of Patent: August 19, 2014
    Assignee: Adobe Systems Incorporated
    Inventors: Gautham J. Mysore, Paris Smaragdis
  • Patent number: 8756062
    Abstract: A method of generating proxy acoustic models for use in automatic speech recognition includes training acoustic models from speech received via microphone from male speakers of a first language, and adapting the acoustic models in response to language-independent speech data from female speakers of a second language, to generate proxy acoustic models for use during runtime of speech recognition of an utterance from a female speaker of the first language.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: June 17, 2014
    Assignee: General Motors LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
  • Patent number: 8744849
    Abstract: A microphone-array-based speech recognition system combines a noise cancelling technique for cancelling noise of input speech signals from an array of microphones, according to at least an inputted threshold. The system receives noise-cancelled speech signals outputted by a noise masking module through at least a speech model and at least a filler model, then computes a confidence measure score with the at least a speech model and the at least a filler model for each threshold and each noise-cancelled speech signal, and adjusts the threshold to continue the noise cancelling for achieving a maximum confidence measure score, thereby outputting a speech recognition result related to the maximum confidence measure score.
    Type: Grant
    Filed: October 12, 2011
    Date of Patent: June 3, 2014
    Assignee: Industrial Technology Research Institute
    Inventor: Hsien-Cheng Liao
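The threshold-adjustment loop this abstract describes (denoise at a threshold, score confidence, keep the threshold that maximizes it) can be sketched with toy stand-ins. The magnitude-gating denoiser and the counting-based confidence score below are assumptions; the patent uses microphone-array noise masking and speech/filler acoustic models:

```python
def zero_below(signal, threshold):
    """Toy noise canceller: zero out samples below the magnitude threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in signal]

def confidence(frames):
    """Toy confidence measure: reward kept speech-like samples (>= 0.5),
    penalize surviving low-level noise."""
    speech = sum(1 for s in frames if s >= 0.5)
    residual = sum(1 for s in frames if 0.0 < s < 0.5)
    return speech - residual

def best_threshold(signal, thresholds, denoise, score):
    """Adjust the noise-cancelling threshold to maximize the confidence score."""
    best_score, best_t = max((score(denoise(signal, t)), t) for t in thresholds)
    return best_t, best_score

sig = [0.05, 0.9, 0.04, 0.8]               # small noise, large speech samples
t, conf = best_threshold(sig, [0.0, 0.1, 0.95], zero_below, confidence)
```

Too low a threshold leaves noise (score 0 here), too high a threshold deletes the speech (also 0); the middle setting keeps both speech samples and wins.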
  • Patent number: 8700403
    Abstract: A method of statistical modeling is provided which includes constructing a statistical model and incorporating Gaussian priors during feature selection and during parameter optimization for the construction of the statistical model.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 15, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Lin Zhao
  • Patent number: 8694316
    Abstract: An automatic speech recognition (ASR) system includes a speech-responsive application and a recognition engine. The ASR system generates user prompts to elicit certain spoken inputs, and the speech-responsive application performs operations when the spoken inputs are recognized. The recognition engine compares sounds within an input audio signal with phones within an acoustic model, to identify candidate matching phones. A recognition confidence score is calculated for each candidate matching phone, and the confidence scores are used to help identify one or more likely sequences of matching phones that appear to match a word within the grammar of the speech-responsive application. The per-phone confidence scores are evaluated against predefined confidence score criteria (for example, identifying scores below a ‘low confidence’ threshold) and the results of the evaluation are used to influence subsequent selection of user prompts.
    Type: Grant
    Filed: October 20, 2005
    Date of Patent: April 8, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: John Brian Pickering, Timothy David Poultney, Benjamin Terrick Staniford, Matthew Whitbourne
  • Patent number: 8635067
    Abstract: Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components Nc, less than N, to be used in a restructured acoustic model derived from the reference acoustic model, is identified. The desired number of components Nc is selected based on a computing environment in which the restructured acoustic model is to be deployed. The restructured acoustic model also has L states. For each given one of the L mixture models in the reference acoustic model, a merge sequence is built which records, for a given cost function, sequential mergers of pairs of the components associated with the given one of the mixture models. A portion of the Nc components is assigned to each of the L states in the restructured acoustic model.
    Type: Grant
    Filed: December 9, 2010
    Date of Patent: January 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel, John R. Hershey, Peder A. Olsen
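The merge-sequence construction above can be illustrated with 1-D Gaussian mixture components: repeatedly merge the cheapest pair until the desired component count Nc remains, recording each merge. The weight-times-squared-mean-distance cost below is a simple stand-in for the patent's cost function:

```python
def merge(g1, g2):
    """Moment-preserving merge of two weighted 1-D Gaussians (w, mean, var)."""
    w1, m1, v1 = g1
    w2, m2, v2 = g2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
    return (w, m, v)

def build_merge_sequence(components, target):
    """Greedily merge the cheapest pair until `target` components remain,
    recording each merge -- a toy version of the abstract's merge sequence."""
    comps = list(components)
    sequence = []
    while len(comps) > target:
        pairs = [(a, b) for a in range(len(comps))
                 for b in range(a + 1, len(comps))]
        i, j = min(pairs, key=lambda p: comps[p[0]][0] * comps[p[1]][0]
                                        * (comps[p[0]][1] - comps[p[1]][1]) ** 2)
        sequence.append((i, j))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps, sequence

comps, seq = build_merge_sequence(
    [(0.4, 0.0, 1.0), (0.4, 0.1, 1.0), (0.2, 5.0, 1.0)], target=2)
```

The two overlapping components near zero merge first, while the distant component survives, shrinking the model with little loss, which is the point of deriving Nc from the target computing environment.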
  • Patent number: 8612225
    Abstract: A voice recognition device that recognizes a voice of an input voice signal, comprises a voice model storage unit that stores in advance a predetermined voice model having a plurality of detail levels, the plurality of detail levels being information indicating a feature property of a voice for the voice model; a detail level selection unit that selects a detail level, closest to a feature property of an input voice signal, from the detail levels of the voice model stored in the voice model storage unit; and a parameter setting unit that sets parameters for recognizing the voice of an input voice according to the detail level selected by the detail level selection unit.
    Type: Grant
    Filed: February 26, 2008
    Date of Patent: December 17, 2013
    Assignee: NEC Corporation
    Inventors: Takayuki Arakawa, Ken Hanazawa, Masanori Tsujikawa
  • Patent number: 8515758
    Abstract: Some implementations provide for speech recognition based on structured modeling, irrelevant variability normalization and unsupervised online adaptation of one or more speech recognition parameters. Some implementations may improve the ability of a runtime speech recognizer or decoder to adapt to new speakers and new environments.
    Type: Grant
    Filed: April 14, 2010
    Date of Patent: August 20, 2013
    Assignee: Microsoft Corporation
    Inventor: Qiang Huo
  • Patent number: 8510111
    Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and select a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: August 13, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
  • Patent number: 8484023
    Abstract: Techniques are disclosed for generating and using sparse representation features to improve speech recognition performance. In particular, principles of the invention provide sparse representation exemplar-based recognition techniques. For example, a method comprises the following steps. A test vector and a training data set associated with a speech recognition system are obtained. A subset of the training data set is selected. The test vector is mapped with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint. An acoustic model is trained on the new test feature set. The acoustic model trained on the new test feature set may be used to decode user speech input to the speech recognition system.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: July 9, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran, Tara N. Sainath
  • Patent number: 8484025
    Abstract: Disclosed embodiments relate to mapping an utterance to an action using a classifier. One illustrative computing device includes a user interface having an input component. The computing device further includes a processor and a computer-readable storage medium, having stored thereon program instructions that, upon execution by the processor, cause the computing device to perform a set of operations including: receiving an audio utterance via the input component; determining a text string based on the utterance; determining a string-feature vector based on the text string; selecting a target classifier from a set of classifiers, wherein the target classifier is selected based on a determination that a string-feature criteria of the target classifier corresponds to at least one string-feature of the string-feature vector; and initiating a target action that corresponds to the target classifier.
    Type: Grant
    Filed: October 4, 2012
    Date of Patent: July 9, 2013
    Assignee: Google Inc.
    Inventors: Pedro J. Moreno Mengibar, Martin Jansche, Fadi Biadsy
  • Patent number: 8386251
    Abstract: A speech recognition system is provided with iteratively refined multiple passes through the received data to enhance the accuracy of the results by introducing constraints and adaptation from initial passes into subsequent recognition operations. The multiple passes are performed on an initial utterance received from a user. The iteratively enhanced subsequent passes are also performed on following utterances received from the user, increasing overall system efficiency and accuracy.
    Type: Grant
    Filed: June 8, 2009
    Date of Patent: February 26, 2013
    Assignee: Microsoft Corporation
    Inventors: Nikko Strom, Julian Odell, Jon Hamaker
  • Patent number: 8374865
    Abstract: A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system.
    Type: Grant
    Filed: April 26, 2012
    Date of Patent: February 12, 2013
    Assignee: Google Inc.
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar, Kaisuke Nakajima, Daniel Martin Bikel
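    The sampling step described above — drawing training text strings so their classification distribution tracks the benchmark distribution — can be sketched as per-class quota sampling; the class labels and quota rule are illustrative assumptions:

    ```python
    import random
    from collections import defaultdict

    def sample_matching_distribution(corpus, target_dist, n_train, seed=0):
        """Sample roughly n_train strings from corpus so that class proportions
        follow target_dist. corpus: list of (text, label) pairs;
        target_dist: {label: probability}."""
        rng = random.Random(seed)
        by_label = defaultdict(list)
        for text, label in corpus:
            by_label[label].append(text)
        training = []
        for label, p in target_dist.items():
            pool = by_label[label]
            k = min(round(p * n_train), len(pool))  # cap at available strings
            training.extend((t, label) for t in rng.sample(pool, k))
        rng.shuffle(training)
        return training

    # Toy corpus: 80 "nav" strings, 20 "query" strings, but a benchmark
    # distribution that is 50/50 — sampling rebalances the training corpus.
    corpus = ([(f"utt{i}", "nav") for i in range(80)]
              + [(f"utt{i}", "query") for i in range(80, 100)])
    benchmark_dist = {"nav": 0.5, "query": 0.5}
    train = sample_matching_distribution(corpus, benchmark_dist, n_train=40)
    ```

    The resulting corpus would then be used to train the ASR system's language model.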
  • Patent number: 8311825
    Abstract: A system for calculating the look ahead probabilities at the nodes in a language model look ahead tree, wherein the words of the vocabulary of the language are located at the leaves of the tree, said apparatus comprising: means to assign a language model probability to each of the words of the vocabulary using a first low order language model; means to calculate the language look ahead probabilities for all nodes in said tree using said first language model; means to determine if the language model probability of one or more words of said vocabulary can be calculated using a higher order language model and updating said words with the higher order language model; and means to update the look ahead probability at only the nodes which are affected by the words where the language model has been updated.
    Type: Grant
    Filed: October 3, 2008
    Date of Patent: November 13, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Langzhou Chen
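    A minimal sketch of the look-ahead idea above, using the common definition of a node's look-ahead probability as the maximum LM probability over vocabulary words sharing that prefix, and refreshing only the nodes on an updated word's path; the dictionary-of-prefixes tree and update rule are assumptions, not the patent's data structures:

    ```python
    def lookahead_probs(vocab_probs):
        """Compute look-ahead probabilities for every prefix node of the
        vocabulary tree. vocab_probs: {word: LM probability}."""
        la = {}
        for word, p in vocab_probs.items():
            for i in range(len(word) + 1):
                prefix = word[:i]
                la[prefix] = max(la.get(prefix, 0.0), p)
        return la

    def update_word(la, vocab_probs, word, new_p):
        """Replace one word's probability (e.g. once a higher-order model
        becomes applicable) and recompute look-ahead only at the affected
        nodes, i.e. the prefixes of that word."""
        vocab_probs[word] = new_p
        for i in range(len(word) + 1):
            prefix = word[:i]
            la[prefix] = max(p for w, p in vocab_probs.items()
                             if w.startswith(prefix))
        return la

    probs = {"cat": 0.2, "car": 0.5, "dog": 0.3}
    la = lookahead_probs(probs)   # e.g. la["ca"] is max over "cat", "car"
    ```

    Initializing with a low-order model and then calling `update_word` per word mirrors the patent's two-stage scheme, avoiding a full recomputation of the tree.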
  • Patent number: 8301449
    Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
    Type: Grant
    Filed: October 16, 2006
    Date of Patent: October 30, 2012
    Assignee: Microsoft Corporation
    Inventors: Xiaodong He, Li Deng
  • Patent number: 8265930
    Abstract: The present invention relates to recording voice data using a voice communication device connected to a communication network and converting the voice data into a text file for delivery to a text communication device. In accordance with the present invention, the voice communication device may transfer the voice data in real-time or store the voice data on the device to be transmitted at a later time. Transcribing the voice data into a text file may be accomplished by automated computer software, either speaker-independent or speaker-dependent or by a human who transcribes the voice data into a text file. After transcribing the voice data into a text file, the text file may be delivered to a text communication device in a number of ways, such as email, file transfer protocol (FTP), or hypertext transfer protocol (HTTP).
    Type: Grant
    Filed: April 13, 2005
    Date of Patent: September 11, 2012
    Assignee: Sprint Communications Company L.P.
    Inventors: Bryce A. Jones, Raymond Edward Dickensheets
  • Patent number: 8234116
    Abstract: Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMM) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing individual KLDs calculated by state pair by state pair comparisons.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: July 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Frank Kao-Ping K. Soong, Jian-Lai Zhou
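    The unscented-transform approximation above replaces the intractable expectation under a Gaussian mixture with an average over each component's sigma points. A sketch for diagonal-covariance mixtures follows; the sigma-point scheme (2d points per component at μ ± √(d·σ²) along each axis) is one standard choice and an assumption here:

    ```python
    import numpy as np

    def gmm_logpdf(x, weights, means, variances):
        """Log density of a diagonal-covariance GMM at point x."""
        log_comp = [
            np.log(w)
            - 0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)
        ]
        return np.logaddexp.reduce(log_comp)

    def kld_unscented(f, g):
        """Approximate D(f || g) between two diagonal GMMs: the expectation of
        log f - log g under each component of f is replaced by an average over
        its 2d sigma points (d = feature dimension)."""
        weights, means, variances = f
        d = len(means[0])
        total = 0.0
        for w, m, v in zip(weights, means, variances):
            offsets = np.sqrt(d * v)  # sigma-point offsets along each axis
            for axis in range(d):
                for sign in (+1.0, -1.0):
                    x = m.copy()
                    x[axis] += sign * offsets[axis]
                    total += w / (2 * d) * (gmm_logpdf(x, *f) - gmm_logpdf(x, *g))
        return total
    ```

    For single Gaussians the log-ratio is quadratic, so this approximation recovers the closed-form KLD exactly; summing such per-state KLDs over the dynamic-programming state alignment gives the total HMM divergence the abstract describes.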
  • Patent number: 8234112
    Abstract: Provided are an apparatus and method for generating a noise adaptive acoustic model including a noise adaptive discriminative adaptation method. The method includes: generating a baseline model parameter from large-capacity speech training data including various noise environments; and receiving the generated baseline model parameter and applying a discriminative adaptation method to the generated results to generate a migrated acoustic model parameter suitable for an actually applied environment.
    Type: Grant
    Filed: April 25, 2008
    Date of Patent: July 31, 2012
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Byung Ok Kang, Ho Young Jung, Yun Keun Lee
  • Patent number: 8229744
    Abstract: A method, system, and computer program for class detection and time mediated averaging of class dependent models. A technique is described to take advantage of gender information in training data and how to obtain female, male, and gender independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross gender decoding performance is avoided.
    Type: Grant
    Filed: August 26, 2003
    Date of Patent: July 24, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
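    The probability-mediated averaging above can be sketched as mixing the female and male GMM likelihoods with a gender posterior instead of committing to one model; the 1-D GMM representation and the blending rule are illustrative assumptions:

    ```python
    import math

    def gmm_loglik(x, gmm):
        """Log-likelihood of scalar x under a 1-D GMM given as
        (weight, mean, variance) triples, accumulated via log-sum-exp."""
        acc = None
        for w, m, v in gmm:
            l = math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            if acc is None:
                acc = l
            else:
                acc = max(acc, l) + math.log1p(math.exp(-abs(acc - l)))
        return acc

    def blended_loglik(x, female_gmm, male_gmm, p_female):
        """Average the two class-dependent models with the gender probability
        p_female, so a wrong hard gender decision cannot collapse the score."""
        a = math.log(p_female) + gmm_loglik(x, female_gmm)
        b = math.log(1.0 - p_female) + gmm_loglik(x, male_gmm)
        hi, lo = max(a, b), min(a, b)
        return hi + math.log1p(math.exp(lo - hi))
    ```

    With p_female near 0 or 1 this reduces to the matched single-gender model, while intermediate values give the soft average that avoids cross-gender degradation.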
  • Patent number: 8229729
    Abstract: A system and method for training a statistical machine translation model and decoding or translating using the same is disclosed. A source word versus target word co-occurrence matrix is created to define word pairs. Dimensionality of the matrix may be reduced. Word pairs are mapped as vectors into continuous space where the word pairs are vectors of continuous real numbers and not discrete entities in the continuous space. A machine translation parametric model is trained using an acoustic model training method based on word pair vectors in the continuous space.
    Type: Grant
    Filed: March 25, 2008
    Date of Patent: July 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ruhi Sarikaya, Yonggang Deng, Brian Edward Doorenbos Kingsbury, Yuqing Gao
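    The co-occurrence-matrix and dimensionality-reduction steps above can be sketched with a truncated SVD; the toy vocabulary, counts, and the √S scaling of the factors are illustrative assumptions, not the patent's actual data or factorization:

    ```python
    import numpy as np

    # Toy source-vs-target co-occurrence counts (rows: source words,
    # columns: target words), standing in for counts from aligned text.
    source_vocab = ["chien", "chat", "maison"]
    target_vocab = ["dog", "cat", "house"]
    cooc = np.array([[9.0, 1.0, 0.0],
                     [1.0, 8.0, 0.0],
                     [0.0, 0.0, 7.0]])

    # Truncated SVD reduces dimensionality; each word pair (s, t) is then a
    # vector of continuous real numbers rather than a discrete entry.
    k = 2
    U, S, Vt = np.linalg.svd(cooc, full_matrices=False)
    src_emb = U[:, :k] * np.sqrt(S[:k])      # source-word vectors
    tgt_emb = Vt[:k, :].T * np.sqrt(S[:k])   # target-word vectors

    def pair_vector(s, t):
        i, j = source_vocab.index(s), target_vocab.index(t)
        return np.concatenate([src_emb[i], tgt_emb[j]])

    v = pair_vector("chien", "dog")  # continuous-space word-pair vector
    ```

    Vectors of this form are what the abstract's acoustic-model-style training procedure would consume.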
  • Patent number: 8160878
    Abstract: A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise-ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data.
    Type: Grant
    Filed: September 16, 2008
    Date of Patent: April 17, 2012
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Yifan Gong, Alejandro Acero
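    The piecewise dependence of a Gaussian parameter on the conditioning variable can be sketched as interpolation between trained knots; the knot positions and values below are illustrative, and linear interpolation stands in for the cubic spline the abstract mentions to keep the sketch dependency-free:

    ```python
    import numpy as np

    # Knot positions (instantaneous SNR in dB) and the mean value at each
    # knot; in the patent such knot values would be trained discriminatively
    # and shared across Gaussians.
    snr_knots = np.array([0.0, 10.0, 20.0, 30.0])
    mean_knots = np.array([-1.2, -0.4, 0.1, 0.3])

    def mean_at_snr(snr):
        """Evaluate the SNR-dependent mean as a piecewise function of SNR,
        clamped to the endpoint values outside the knot range."""
        return float(np.interp(snr, snr_knots, mean_knots))
    ```

    At decoding time each Gaussian's μ (and analogously σ) would be evaluated at the frame's estimated SNR before computing likelihoods.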
  • Patent number: 8086455
    Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessors and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationship (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definition and data being produced by more than one action. The user can add, change and/or modify a process by editing a declaration file and rerunning the compiler, thereby automatically generating a new process.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: December 27, 2011
    Assignee: Microsoft Corporation
    Inventors: Yifan Gong, Ye Tian
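    The compiler behavior described above — ordering steps from declared inputs/outputs and rejecting cycles or duplicate producers — amounts to a topological sort over data dependencies. A sketch with hypothetical processor names follows:

    ```python
    from graphlib import CycleError, TopologicalSorter

    # Declarative processors: name -> (input files, output files).
    processors = {
        "collect_stats": (["corpus.txt"], ["stats.bin"]),
        "init_model":    (["stats.bin"], ["model.0"]),
        "reestimate":    (["model.0", "corpus.txt"], ["model.final"]),
    }

    def build_order(processors):
        """Order processors by data dependencies: a step depends on every step
        producing one of its inputs. Flags cycles and duplicate producers."""
        producer = {}
        for name, (_, outputs) in processors.items():
            for out in outputs:
                if out in producer:
                    raise ValueError(
                        f"{out} produced by both {producer[out]} and {name}")
                producer[out] = name
        graph = {
            name: {producer[i] for i in inputs if i in producer}
            for name, (inputs, _) in processors.items()
        }
        try:
            return list(TopologicalSorter(graph).static_order())
        except CycleError as e:
            raise ValueError(f"ill-defined (cyclic) process: {e.args[1]}")

    order = build_order(processors)
    ```

    Editing the declaration (adding or changing a processor) and rerunning `build_order` regenerates the sequence, matching the abstract's edit-and-recompile workflow.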
  • Patent number: 8015008
    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition systems (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect an expanded consonant phoneme inventory, and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: September 6, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal
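    The inventory-expansion step above can be sketched as relabeling each consonant by its position relative to its syllable's vowel; the toy vowel set, the `_pre`/`_post` suffix convention, and the boundary encoding are illustrative assumptions:

    ```python
    VOWELS = {"aa", "ae", "ah", "iy", "uw", "eh"}  # toy vowel subset

    def expand_pronunciation(phones, syllable_boundaries):
        """Relabel each consonant as pre-vocalic (before its syllable's vowel)
        or post-vocalic (after it). syllable_boundaries: indices where a new
        syllable starts."""
        boundaries = sorted(syllable_boundaries) + [len(phones)]
        out, start = [], 0
        for end in boundaries:
            syll = phones[start:end]
            vowel_pos = next(
                (i for i, p in enumerate(syll) if p in VOWELS), None)
            for i, p in enumerate(syll):
                if p in VOWELS or vowel_pos is None:
                    out.append(p)
                else:
                    out.append(p + ("_pre" if i < vowel_pos else "_post"))
            start = end
        return out

    # "seven" ~ s eh v ah n, syllabified as [s eh] [v ah n]:
    expanded = expand_pronunciation(["s", "eh", "v", "ah", "n"], [2])
    # → ['s_pre', 'eh', 'v_pre', 'ah', 'n_post']
    ```

    Applying this to every lexicon entry yields the reformulated lexicon over the expanded consonant inventory that the models are then trained on.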
  • Patent number: 8010341
    Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.
    Type: Grant
    Filed: September 13, 2007
    Date of Patent: August 30, 2011
    Assignee: Microsoft Corporation
    Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
  • Patent number: 8010358
    Abstract: Methods and apparatus for voice recognition are disclosed. A voice signal is obtained and two or more voice recognition analyses are performed on the voice signal. Each voice recognition analysis uses a filter bank defined by a different maximum frequency and a different minimum frequency and wherein each voice recognition analysis produces a recognition probability r_i of recognition of one or more speech units, whereby there are two or more recognition probabilities r_i. The maximum frequency and the minimum frequency may be adjusted every time speech is windowed and analyzed. A final recognition probability P_f is determined based on the two or more recognition probabilities r_i.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: August 30, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
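    Combining the per-analysis probabilities r_i into a final P_f can be sketched as a weighted average over the analyses; the weighting scheme and toy unit probabilities are illustrative assumptions, one simple choice the abstract's framework would admit:

    ```python
    def combine_recognitions(results, weights=None):
        """Combine per-analysis recognition probabilities into final P_f values.
        results: one {speech unit: r_i} dict per filter-bank analysis."""
        if weights is None:
            weights = [1.0 / len(results)] * len(results)
        units = set().union(*results)
        return {
            u: sum(w * r.get(u, 0.0) for w, r in zip(weights, results))
            for u in units
        }

    r1 = {"cat": 0.7, "bat": 0.3}  # analysis with one filter-bank range
    r2 = {"cat": 0.5, "bat": 0.5}  # analysis with a different range
    final = combine_recognitions([r1, r2])
    best = max(final, key=final.get)  # the recognized unit, here "cat"
    ```

    Per-analysis reliability estimates (e.g. from the SNR in each band) could replace the uniform weights.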
  • Patent number: 7970614
    Abstract: The present invention provides a system and method for treating distortion propagated through a detection system. The system includes a compensation module that compensates for untreated distortions propagating through the detection compensation system, a user model pool that comprises a plurality of model sets, and a model selector that selects at least one model set from plurality of model sets in the user model pool. The compensation is accomplished by continually producing scores distributed according to a prescribed distribution for the at least one model set and mitigating the adverse effects of the scores being distorted and lying off a pre-set operating point. The method for treating distortion propagated through a detection system includes receiving a signal from a remote device, and compensating the signal for untreated distortions.
    Type: Grant
    Filed: May 8, 2007
    Date of Patent: June 28, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Janice J. Kim, Jiri Navratil, Jason W. Pelecanos, Ganesh N. Ramaswamy
  • Publication number: 20110071835
    Abstract: Embodiments of small footprint text-to-speech engine are disclosed. In operation, the small footprint text-to-speech engine generates a set of feature parameters for an input text. The set of feature parameters includes static feature parameters and delta feature parameters. The small footprint text-to-speech engine then derives a saw-tooth stochastic trajectory that represents the speech characteristics of the input text based on the static feature parameters and the delta feature parameters. Finally, the small footprint text-to-speech engine produces a smoothed trajectory from the saw-tooth stochastic trajectory, and generates synthesized speech based on the smoothed trajectory.
    Type: Application
    Filed: September 22, 2009
    Publication date: March 24, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Yi-Ning Chen, Zhi-Jie Yan, Frank Kao-Ping Soong
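    The smoothing step above can be sketched as a least-squares problem: find the trajectory that fits the static means while keeping its deltas close to the delta means, which irons out the saw-tooth shape. The delta definition (central difference) and the toy numbers are illustrative assumptions:

    ```python
    import numpy as np

    def smooth_trajectory(static_mean, delta_mean, static_var, delta_var):
        """Solve for the trajectory c minimizing the variance-weighted error
        against both the static means and the delta means, where
        delta_t ≈ (c[t+1] - c[t-1]) / 2."""
        T = len(static_mean)
        D = np.zeros((T, T))
        for t in range(T):
            D[t, max(t - 1, 0)] -= 0.5
            D[t, min(t + 1, T - 1)] += 0.5
        A = np.vstack([np.eye(T) / np.sqrt(static_var),
                       D / np.sqrt(delta_var)])
        b = np.concatenate([static_mean / np.sqrt(static_var),
                            delta_mean / np.sqrt(delta_var)])
        c, *_ = np.linalg.lstsq(A, b, rcond=None)
        return c

    # A jittery (saw-tooth) static trajectory with near-zero target deltas;
    # a small delta variance weights the smoothness constraint heavily.
    static = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
    smooth = smooth_trajectory(static, np.zeros(5),
                               static_var=1.0, delta_var=0.01)
    ```

    The relative variances control the trade-off: tight delta variances pull the result toward a smooth curve, tight static variances toward the raw saw-tooth.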
  • Publication number: 20110015925
    Abstract: A speech recognition method, comprising: receiving a speech input in a first noise environment which comprises a sequence of observations; determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of observations, wherein said model has been trained to recognise speech in a second noise environment, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to an observation; adapting the model trained in the second environment to that of the first environment; the speech recognition method further comprising determining the likelihood of a sequence of observations occurring in a given language using a language model; combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said speech input.
    Type: Application
    Filed: March 26, 2010
    Publication date: January 20, 2011
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Mark John Francis Gales