Neural Network Patents (Class 704/232)
  • Patent number: 9031844
    Abstract: A method includes an act of causing a processor to access a deep-structured model retained in a computer-readable medium, wherein the deep-structured model includes a plurality of layers with respective weights assigned to the plurality of layers, transition probabilities between states, and language model scores. The method further includes the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using an optimization criterion that is based on a sequence rather than a set of unrelated frames.
    Type: Grant
    Filed: September 21, 2010
    Date of Patent: May 12, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Li Deng, Abdel-rahman Samir Abdel-rahman Mohamed
  • Publication number: 20150127337
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
    Type: Application
    Filed: April 22, 2014
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventors: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A.U. Bacchiani
  • Publication number: 20150127336
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for inputting speech data that corresponds to a particular utterance to a neural network; determining an evaluation vector based on output at a hidden layer of the neural network; comparing the evaluation vector with a reference vector that corresponds to a past utterance of a particular speaker; and based on comparing the evaluation vector and the reference vector, determining whether the particular utterance was likely spoken by the particular speaker.
    Type: Application
    Filed: March 28, 2014
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventors: Xin Lei, Erik McDermott, Ehsan Variani, Ignacio L. Moreno
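The comparison step in this abstract (an "evaluation vector" from a hidden layer matched against a stored reference vector, the basis of d-vector speaker verification) can be sketched as follows. The patent only requires "comparing"; the cosine metric, the 0.8 threshold, and all vector values below are illustrative assumptions, not taken from the filing.

```python
import numpy as np

def cosine_score(evaluation_vec, reference_vec):
    """Similarity between a hidden-layer evaluation vector for a new
    utterance and the stored reference vector of an enrolled speaker."""
    return float(np.dot(evaluation_vec, reference_vec) /
                 (np.linalg.norm(evaluation_vec) * np.linalg.norm(reference_vec)))

def same_speaker(evaluation_vec, reference_vec, threshold=0.8):
    # A cosine threshold is one common comparison rule; 0.8 is arbitrary.
    return cosine_score(evaluation_vec, reference_vec) >= threshold

# Toy activations standing in for hidden-layer outputs.
reference = np.array([0.9, 0.1, 0.4])
matching = np.array([0.85, 0.15, 0.5])
imposter = np.array([-0.2, 0.9, -0.3])
```

A near-duplicate of the reference scores close to 1.0 and passes the threshold, while a dissimilar vector scores negatively and is rejected.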
  • Patent number: 9026065
    Abstract: Methods and apparatus for voice and data interlacing in a system having a shared antenna. In one embodiment, a voice and data communication system has a shared antenna for transmitting and receiving information in time slots, wherein the antenna can only be used for transmit or receive at a given time. The system determines timing requirements for data transmission and reception and interrupts data transmission for transmission of speech in selected intervals while meeting the data transmission timing and throughput requirements. The speech can be manipulated to fit with the selected intervals, to preserve the intelligibility of the manipulated speech.
    Type: Grant
    Filed: March 21, 2012
    Date of Patent: May 5, 2015
    Assignee: Raytheon Company
    Inventors: David R. Peterson, Timothy S. Loos, David F. Ring, James F. Keating
  • Patent number: 9009038
    Abstract: A method for analyzing a digital audio signal associated with a baby cry, comprising the steps of: (a) processing the digital audio signal using a spectral analysis to generate a spectral data; (b) processing the digital audio signal using a time-frequency analysis to generate a time-frequency characteristic; (c) categorizing the baby cry into one of a basic type and a special type based on the spectral data; (d) if the baby cry is of the basic type, determining a basic need based on the time-frequency characteristic and a predetermined lookup table; and (e) if the baby cry is of the special type, determining a special need by inputting the time-frequency characteristic into a pre-trained artificial neural network.
    Type: Grant
    Filed: May 22, 2013
    Date of Patent: April 14, 2015
    Assignee: National Taiwan Normal University
    Inventors: Jon-Chao Hong, Chao-Hsin Wu, Mei-Yung Chen
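The two-branch routing in steps (c)-(e) of this abstract can be sketched directly: spectral data decides basic vs. special type, then a lookup table or a pre-trained network supplies the need. The `peak_ratio` test, the 0.5 threshold, and the toy lookup/network below are invented placeholders for whatever the actual system computes.

```python
def classify_need(spectral_data, time_freq_feature, basic_lookup,
                  neural_net, basic_threshold=0.5):
    """Route a cry: basic type -> predetermined lookup table,
    special type -> pre-trained artificial neural network."""
    if spectral_data["peak_ratio"] >= basic_threshold:   # basic type
        return basic_lookup[time_freq_feature]
    return neural_net(time_freq_feature)                 # special type

basic_lookup = {"rising": "hunger", "falling": "sleepiness"}
toy_net = lambda feature: "discomfort"   # stands in for the trained ANN

basic_need = classify_need({"peak_ratio": 0.7}, "rising", basic_lookup, toy_net)
special_need = classify_need({"peak_ratio": 0.2}, "rising", basic_lookup, toy_net)
```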
  • Publication number: 20150100312
    Abstract: A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.
    Type: Application
    Filed: October 4, 2013
    Publication date: April 9, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Enrico Luigi BOCCHIERI, Dimitrios DIMITRIADIS
  • Publication number: 20150095027
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for key phrase detection. One of the methods includes receiving a plurality of audio frame vectors that each model an audio waveform during a different period of time, generating an output feature vector for each of the audio frame vectors, wherein each output feature vector includes a set of scores that characterize an acoustic match between the corresponding audio frame vector and a set of expected event vectors, each of the expected event vectors corresponding to one of the scores and defining acoustic properties of at least a portion of a keyword, and providing each of the output feature vectors to a posterior handling module.
    Type: Application
    Filed: September 30, 2013
    Publication date: April 2, 2015
    Applicant: Google Inc.
    Inventors: Maria Carolina Parada San Martin, Alexander H. Gruenstein, Guoguo Chen
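The per-frame scoring described here (each output feature vector holds one acoustic-match score per expected event vector, then goes to a posterior handling module) can be sketched as below. The abstract does not fix the match metric or the handling rule; negative squared distance and a softmax-then-max rule are assumptions for illustration.

```python
import numpy as np

def output_feature_vectors(frame_vectors, expected_event_vectors):
    """For each audio frame vector, emit one score per expected event
    vector (higher = closer acoustic match)."""
    scores = []
    for frame in frame_vectors:
        diffs = expected_event_vectors - frame
        scores.append(-np.sum(diffs ** 2, axis=1))
    return np.stack(scores)          # shape: (frames, events)

def posterior_handling(score_matrix):
    """Toy posterior handling module: per-frame softmax, then the best
    posterior over frames for each keyword event."""
    e = np.exp(score_matrix - score_matrix.max(axis=1, keepdims=True))
    posteriors = e / e.sum(axis=1, keepdims=True)
    return posteriors.max(axis=0)

frames = np.array([[1.0, 0.0], [0.0, 1.0]])
events = np.array([[1.0, 0.0], [0.0, 1.0]])   # expected event vectors
scores = output_feature_vectors(frames, events)
keyword_scores = posterior_handling(scores)
```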
  • Publication number: 20150095026
    Abstract: In an automatic speech recognition (ASR) processing system, ASR processing may be configured to process speech based on multiple channels of audio received from a beamformer. The ASR processing system may include a microphone array and the beamformer to output multiple channels of audio such that each channel isolates audio in a particular direction. The multichannel audio signals may include spoken utterances/speech from one or more speakers as well as undesired audio, such as noise from a household appliance. The ASR device may simultaneously perform speech recognition on the multi-channel audio to provide more accurate speech recognition results.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Applicant: Amazon Technologies, Inc.
    Inventors: Michael Maximilian Emanuel Bisani, Nikko Strom, Bjorn Hoffmeister, Ryan Paul Thomas
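The beamformer feeding this ASR pipeline isolates audio per direction; the simplest version of that idea is delay-and-sum, sketched below. Real front ends use fractional delays and adaptive weights, so this only shows the geometric principle of one steering direction.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Minimal delay-and-sum beamformer: undo each microphone's integer
    sample delay for one steering direction and average, reinforcing
    audio arriving from that direction."""
    out = np.zeros_like(channels[0], dtype=float)
    for signal, delay in zip(channels, delays):
        out += np.roll(signal, -delay)
    return out / len(channels)

impulse = np.array([0.0, 1.0, 0.0, 0.0])
mic0 = impulse                  # sound reaches mic 0 first
mic1 = np.roll(impulse, 1)      # and mic 1 one sample later
steered = delay_and_sum([mic0, mic1], delays=[0, 1])
```

After compensation the two copies align, so the steered output preserves the impulse at full amplitude; a source from another direction would not add coherently.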
  • Publication number: 20150066496
    Abstract: Technologies pertaining to slot filling are described herein. A deep neural network, a recurrent neural network, and/or a spatio-temporally deep neural network are configured to assign labels to words in a word sequence set forth in natural language. At least one label is a semantic label that is assigned to at least one word in the word sequence.
    Type: Application
    Filed: September 2, 2013
    Publication date: March 5, 2015
    Applicant: Microsoft Corporation
    Inventors: Anoop Deoras, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, Gregoire Mesnil
  • Patent number: 8972253
    Abstract: A method is disclosed herein that includes an act of causing a processor to receive a sample, wherein the sample is one of spoken utterance, an online handwriting sample, or a moving image sample. The method also comprises the act of causing the processor to decode the sample based at least in part upon an output of a combination of a deep structure and a context-dependent Hidden Markov Model (HMM), wherein the deep structure is configured to output a posterior probability of a context-dependent unit. The deep structure is a Deep Belief Network consisting of many layers of nonlinear units with connecting weights between layers trained by a pretraining step followed by a fine-tuning step.
    Type: Grant
    Filed: September 15, 2010
    Date of Patent: March 3, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Li Deng, Dong Yu, George Edward Dahl
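The key coupling in this hybrid (a deep network outputting posteriors of context-dependent units, combined with an HMM) is the standard scaled-likelihood conversion, sketched below with toy numbers.

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, log_priors):
    """Hybrid DNN/HMM conversion: the network gives state posteriors
    p(s|x), but HMM decoding wants likelihoods, so use
    p(x|s) proportional to p(s|x) / p(s); the shared p(x) term cancels
    during decoding."""
    return log_posteriors - log_priors

posteriors = np.array([0.7, 0.2, 0.1])   # network output for one frame
priors = np.array([0.5, 0.3, 0.2])       # context-dependent state priors
loglikes = scaled_log_likelihoods(np.log(posteriors), np.log(priors))
```

Dividing by the prior keeps frequent states from dominating purely because they are frequent; here state 0 still wins because its posterior exceeds its prior by the largest ratio.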
  • Patent number: 8972254
    Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: March 3, 2015
    Assignee: Utah State University
    Inventors: Jacob Gunther, Todd Moon
  • Publication number: 20150039302
    Abstract: An apparatus comprising: an analyser configured to analyse at least one input to determine one or more expressions within the at least one input; and a controller configured to control at least one audio signal associated with the at least one input dependent on the determination of the one or more expressions.
    Type: Application
    Filed: March 14, 2012
    Publication date: February 5, 2015
    Applicant: Nokia Corporation
    Inventors: Roope Olavi Jarvinen, Kari Juhani Järvinen, Juha Henrik Arrasvuori, Miikka Vilermo
  • Publication number: 20150039301
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using neural networks. A feature vector that models audio characteristics of a portion of an utterance is received. Data indicative of latent variables of multivariate factor analysis is received. The feature vector and the data indicative of the latent variables is provided as input to a neural network. A candidate transcription for the utterance is determined based on at least an output of the neural network.
    Type: Application
    Filed: July 31, 2013
    Publication date: February 5, 2015
    Applicant: Google Inc.
    Inventors: Andrew W. Senior, Ignacio L. Moreno
  • Publication number: 20150019214
    Abstract: A method and a device for training a DNN model includes: at a device including one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
    Type: Application
    Filed: December 16, 2013
    Publication date: January 15, 2015
    Applicant: Tencent Technology (Shenzhen) Company Limited
    Inventors: Eryu WANG, Li LU, Xiang ZHANG, Haibo LIU, Feng RAO, Lou LI, Shuai YUE, Bo CHEN
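One iteration of the training loop in this abstract (disjoint subsets, per-unit SGD, then merging sub-models) can be sketched with a linear model in place of the DNN. The averaging merge, the learning rate, and the toy data are assumptions; the patent does not commit to a specific merge formula.

```python
import numpy as np

def sgd_on_subset(w_init, X, y, lr=0.1, epochs=50):
    """One 'training processing unit': plain SGD on its disjoint data
    subset (a noiseless linear model with squared loss)."""
    w = w_init.copy()
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w -= lr * 2.0 * (w @ xi - yi) * xi
    return w

def training_iteration(w_init, subsets):
    """Run SGD on every subset (conceptually in parallel), then merge
    the sub-models by parameter averaging into the intermediate model."""
    sub_models = [sgd_on_subset(w_init, X, y) for X, y in subsets]
    return np.mean(sub_models, axis=0)

true_w = np.array([2.0, -1.0])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0],
              [1.0, 2.0], [-1.0, 1.0], [0.5, 0.0], [0.0, 0.5]])
y = X @ true_w
subsets = [(X[:4], y[:4]), (X[4:], y[4:])]
w_merged = training_iteration(np.zeros(2), subsets)
```

In a full run, `w_merged` would seed the next iteration until a preset convergence condition is met.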
  • Publication number: 20140372112
    Abstract: A Deep Neural Network (DNN) model used in an Automatic Speech Recognition (ASR) system is restructured. A restructured DNN model may include fewer parameters compared to the original DNN model. The restructured DNN model may include a monophone state output layer in addition to the senone output layer of the original DNN model. Singular value decomposition (SVD) can be applied to one or more weight matrices of the DNN model to reduce the size of the DNN Model. The output layer of the DNN model may be restructured to include monophone states in addition to the senones (tied triphone states) which are included in the original DNN model. When the monophone states are included in the restructured DNN model, the posteriors of monophone states are used to select a small part of senones to be evaluated.
    Type: Application
    Filed: June 18, 2013
    Publication date: December 18, 2014
    Inventors: Jian Xue, Emilian Stoimenov, Jinyu Li, Yifan Gong
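The SVD-based size reduction described here replaces one weight matrix with two low-rank factors; a minimal sketch of that restructuring follows. Matrix sizes and the rank are illustrative.

```python
import numpy as np

def svd_restructure(W, k):
    """Approximate an m-by-n weight matrix with two rank-k factors,
    W ~= A @ B, shrinking m*n parameters to k*(m + n). In the
    restructured DNN one linear layer becomes two thinner layers."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:k])
    A = U[:, :k] * root               # m-by-k factor
    B = root[:, None] * Vt[:k, :]     # k-by-n factor
    return A, B

rng = np.random.default_rng(1)
# A weight matrix that is exactly rank 3 is recovered exactly at k=3;
# in practice k is chosen below the true rank to trade accuracy for size.
W = rng.normal(size=(8, 3)) @ rng.normal(size=(3, 10))
A, B = svd_restructure(W, 3)
```

Here the original 8x10 layer holds 80 parameters while the two factors hold 24 + 30 = 54, and the savings grow quickly at DNN-scale dimensions.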
  • Publication number: 20140288928
    Abstract: A system and method for applying a convolutional neural network (CNN) to speech recognition. The CNN may provide input to a hidden Markov model and has at least one pair of a convolution layer and a pooling layer. The CNN operates along the frequency axis. The CNN has units that operate upon one or more local frequency bands of an acoustic signal. The CNN mitigates acoustic variation.
    Type: Application
    Filed: March 25, 2013
    Publication date: September 25, 2014
    Inventors: Gerald Bradley Penn, Hui Jiang, Ossama Abdelhamid Mohamed Abdelhamid, Abdel-rahman Samir Abdel-rahman Mohamed
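The convolution-pooling pair operating along the frequency axis, as this abstract describes, can be sketched in miniature. Processing each time frame independently, the single kernel, and the pool width are simplifying assumptions; the point is that each unit sees only a local frequency band and pooling absorbs small spectral shifts (one source of acoustic variation).

```python
import numpy as np

def conv_pool_frequency(spectrogram, kernel, pool=2):
    """One convolution + max-pooling pair applied along the frequency
    axis only; time frames are processed independently here."""
    frames, bands = spectrogram.shape
    k = len(kernel)
    conv = np.empty((frames, bands - k + 1))
    for f in range(bands - k + 1):
        conv[:, f] = spectrogram[:, f:f + k] @ kernel   # local freq band
    conv = np.maximum(conv, 0.0)                        # ReLU
    n = (bands - k + 1) // pool
    return conv[:, :n * pool].reshape(frames, n, pool).max(axis=2)

spec = np.arange(12.0).reshape(2, 6)    # 2 frames, 6 frequency bands
pooled = conv_pool_frequency(spec, np.array([1.0, 1.0]))
```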
  • Publication number: 20140278390
    Abstract: Systems and methods for processing a query include determining a plurality of sets of match candidates for a query using a processor, each of the plurality of sets of match candidates being independently determined from a plurality of diverse word lattice generation components of different type. The plurality of sets of match candidates is merged by generating a first score for each match candidate to provide a merged set of match candidates. A second score is computed for each match candidate of the merged set based upon features of that match candidate. The first score and the second score are combined to provide a final set of match candidates as matches to the query.
    Type: Application
    Filed: March 12, 2013
    Publication date: September 18, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian E. D. Kingsbury, Hong-Kwang Jeff Kuo, Lidia Luminita Mangu, Hagen Soltau
  • Patent number: 8838446
    Abstract: Provided is a method and apparatus for transforming a speech feature vector. The method includes extracting a feature vector required for speech recognition from a speech signal and transforming the extracted feature vector using an auto-associative neural network (AANN).
    Type: Grant
    Filed: August 31, 2007
    Date of Patent: September 16, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: So-young Jeong, Kwang-cheol Oh, Jae-hoon Jeong, Jeong-su Kim
  • Publication number: 20140257804
    Abstract: Technologies pertaining to training a deep neural network (DNN) for use in a recognition system are described herein. The DNN is trained using heterogeneous data, the heterogeneous data including narrowband signals and wideband signals. The DNN, subsequent to being trained, receives an input signal that can be either a wideband signal or narrowband signal. The DNN estimates the class posterior probability of the input signal regardless of whether the input signal is the wideband signal or the narrowband signal.
    Type: Application
    Filed: March 7, 2013
    Publication date: September 11, 2014
    Applicant: Microsoft Corporation
    Inventors: Jinyu Li, Dong Yu, Yifan Gong
  • Publication number: 20140257803
    Abstract: Various technologies described herein pertain to conservatively adapting a deep neural network (DNN) in a recognition system for a particular user or context. A DNN is employed to output a probability distribution over models of context-dependent units responsive to receipt of captured user input. The DNN is adapted for a particular user based upon the captured user input, wherein the adaption is undertaken conservatively such that a deviation between outputs of the adapted DNN and the unadapted DNN is constrained.
    Type: Application
    Filed: March 6, 2013
    Publication date: September 11, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Kaisheng Yao, Hang Su, Gang Li, Frank Seide
  • Publication number: 20140257805
    Abstract: Described herein are various technologies pertaining to a multilingual deep neural network (MDNN). The MDNN includes a plurality of hidden layers, wherein values for weight parameters of the plurality of hidden layers are learned during a training phase based upon training data in terms of acoustic raw features for multiple languages. The MDNN further includes softmax layers that are trained for each target language separately, making use of the hidden layer values trained jointly with multiple source languages. The MDNN is adaptable, such that a new softmax layer may be added on top of the existing hidden layers, where the new softmax layer corresponds to a new target language.
    Type: Application
    Filed: March 11, 2013
    Publication date: September 11, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Jui-Ting Huang, Jinyu Li, Dong Yu, Li Deng, Yifan Gong
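The architecture in this abstract (hidden layers shared across languages, one softmax head per target language, new heads attachable on top of the existing stack) can be sketched as below. The single tanh layer, the random initialisation, and all dimensions are illustrative stand-ins for the jointly trained multilingual stack.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class MultilingualDNN:
    """Shared hidden representation with one softmax head per target
    language; adding a language only adds a head."""
    def __init__(self, in_dim, hid_dim, rng):
        self.W_h = rng.normal(scale=0.1, size=(hid_dim, in_dim))
        self.heads = {}
        self.hid_dim = hid_dim
        self.rng = rng

    def add_language(self, name, n_states):
        # New softmax layer on top of the existing (frozen) hidden stack.
        self.heads[name] = self.rng.normal(scale=0.1,
                                           size=(n_states, self.hid_dim))

    def posteriors(self, x, language):
        h = np.tanh(self.W_h @ x)            # shared across languages
        return softmax(self.heads[language] @ h)

rng = np.random.default_rng(2)
net = MultilingualDNN(in_dim=4, hid_dim=5, rng=rng)
net.add_language("en", 3)
net.add_language("fr", 4)   # a new target language reuses the hidden layer
p = net.posteriors(np.ones(4), "fr")
```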
  • Publication number: 20140244248
    Abstract: Techniques for conversion of non-back-off language models for use in speech decoders. For example, a method comprises the following step. A non-back-off language model is converted to a back-off language model. The converted back-off language model is pruned. The converted back-off language model is usable for decoding speech.
    Type: Application
    Filed: February 22, 2013
    Publication date: August 28, 2014
    Applicant: International Business Machines Corporation
    Inventors: Ebru Arisoy, Bhuvana Ramabhadran, Abhinav Sethy, Stanley Chen
  • Publication number: 20140214417
    Abstract: A method and device for voiceprint recognition, include: establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data; obtaining a plurality of high-level voiceprint features by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, and the tuning producing a second-level DNN model specifying the plurality of high-level voiceprint features; based on the second-level DNN model, registering a respective high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and performing speaker verification for the user based on the respective high-level voiceprint feature sequence registered for the user.
    Type: Application
    Filed: December 12, 2013
    Publication date: July 31, 2014
    Applicant: Tencent Technology (Shenzhen) Company Limited
    Inventors: Eryu WANG, Li LU, Xiang ZHANG, Haibo LIU, Lou LI, Feng RAO, Duling LU, Shuai YUE, Bo CHEN
  • Patent number: 8793127
    Abstract: In addition to conveying primary information, human speech also conveys information concerning the speaker's gender, age, socioeconomic status, accent, language spoken, emotional state, or other personal characteristics, which is referred to as secondary information. Disclosed herein are both the means of automatic discovery and use of such secondary information to direct other aspects of the behavior of a controlled system. One embodiment of the invention comprises an improved method to determine, with high reliability, the gender of an adult speaker. A further embodiment of the invention comprises the use of this information to display a gender-appropriate advertisement to the user of an information retrieval system that uses a cell phone as the input and output device.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: July 29, 2014
    Assignee: Promptu Systems Corporation
    Inventors: Harry Printz, Vikas Gulati
  • Patent number: 8775183
    Abstract: Textual transcription of speech is generated and formatted according to user-specified transformation and behavior requirements for a speech recognition system having input grammars and transformations. An apparatus may include a speech recognition platform configured to receive a user-specified transformation requirement, recognize speech in speech data into recognized speech according to a set of recognition grammars; and apply transformations to the recognized speech according to the user-specified transformation requirement. The apparatus may further be configured to receive a user-specified behavior requirement and transform the recognized speech according to the behavior requirement. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: July 8, 2014
    Assignee: Microsoft Corporation
    Inventors: Jonathan E. Hamaker, Keith C. Herold
  • Patent number: 8762142
    Abstract: Provided are a multi-stage speech recognition apparatus and method. The multi-stage speech recognition apparatus includes a first speech recognition unit performing initial speech recognition on a feature vector, which is extracted from an input speech signal, and generating a plurality of candidate words; and a second speech recognition unit rescoring the candidate words, which are provided by the first speech recognition unit, using a temporal posterior feature vector extracted from the speech signal.
    Type: Grant
    Filed: August 15, 2007
    Date of Patent: June 24, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: So-young Jeong, Kwang-cheol Oh, Jae-hoon Jeong, Jeong-su Kim
  • Publication number: 20140163977
    Abstract: Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that it may be used to update the models and data as it becomes available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
    Type: Application
    Filed: December 12, 2012
    Publication date: June 12, 2014
    Applicant: AMAZON TECHNOLOGIES, INC.
  • Publication number: 20140149112
    Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
    Type: Application
    Filed: May 23, 2013
    Publication date: May 29, 2014
    Applicant: Sony Computer Entertainment Inc.
    Inventor: Ozlem KALINLI-AKBACAK
  • Patent number: 8719019
    Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involving use of a feature set that includes features obtained using a filterbank having filters that space linearly at higher frequencies and logarithmically at lower frequencies, respectively, features that model the speaker's vocal tract transfer function, and features that indicate a vibration rate of vocal folds of the speaker of the sample data.
    Type: Grant
    Filed: April 25, 2011
    Date of Patent: May 6, 2014
    Assignee: Microsoft Corporation
    Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver
  • Patent number: 8682669
    Abstract: A system and a method to generate statistical utterance classifiers optimized for the individual states of a spoken dialog system is disclosed. The system and method make use of large databases of transcribed and annotated utterances from calls collected in a dialog system in production and log data reporting the association between the state of the system at the moment when the utterances were recorded and the utterance. From the system state, being a vector of multiple system variables, subsets of these variables, certain variable ranges, quantized variable values, etc. can be extracted to produce a multitude of distinct utterance subsets matching every possible system state. For each of these subset and variable combinations, statistical classifiers can be trained, tuned, and tested, and the classifiers can be stored together with the performance results and the state subset and variable combination.
    Type: Grant
    Filed: August 21, 2009
    Date of Patent: March 25, 2014
    Assignee: Synchronoss Technologies, Inc.
    Inventors: David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
  • Patent number: 8655664
    Abstract: According to an embodiment, a text presentation apparatus presenting text for a speaker to read aloud for voice recording includes: a text storing unit for storing first text; a presenting unit for presenting the first text; a determination unit for determining whether or not the first text needs to be replaced, on the basis of a speaker's input for the first text presented; a preliminary text storing unit for storing preliminary text; a select unit configured to select, if it is determined that the first text needs to be replaced, second text to replace the first text from among the preliminary text, the selecting being performed on the basis of attribute information describing an attribute of the first text and on the basis of at least one of attribute information describing pronunciation of the first text and attribute information describing a stress type of the first text; and a control unit configured to control the presenting unit so that the presenting unit presents the second text.
    Type: Grant
    Filed: August 11, 2011
    Date of Patent: February 18, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kentaro Tachibana, Gou Hirabayashi, Takehiko Kagoshima
  • Patent number: 8655659
    Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: February 18, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventors: Qingfang Wang, Shouchun He
  • Patent number: 8639508
    Abstract: A method of automatic speech recognition includes receiving an utterance from a user via a microphone that converts the utterance into a speech signal, pre-processing the speech signal using a processor to extract acoustic data from the received speech signal, and identifying at least one user-specific characteristic in response to the extracted acoustic data. The method also includes determining a user-specific confidence threshold responsive to the at least one user-specific characteristic, and using the user-specific confidence threshold to recognize the utterance received from the user and/or to assess confusability of the utterance with stored vocabulary.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: January 28, 2014
    Assignee: General Motors LLC
    Inventors: Xufang Zhao, Gaurav Talwar
  • Publication number: 20130317815
    Abstract: A method for analyzing a digital audio signal associated with a baby cry, comprising the steps of: (a) processing the digital audio signal using a spectral analysis to generate a spectral data; (b) processing the digital audio signal using a time-frequency analysis to generate a time-frequency characteristic; (c) categorizing the baby cry into one of a basic type and a special type based on the spectral data; (d) if the baby cry is of the basic type, determining a basic need based on the time-frequency characteristic and a predetermined lookup table; and (e) if the baby cry is of the special type, determining a special need by inputting the time-frequency characteristic into a pre-trained artificial neural network.
    Type: Application
    Filed: May 22, 2013
    Publication date: November 28, 2013
    Applicant: NATIONAL TAIWAN NORMAL UNIVERSITY
    Inventors: Jon-Chao Hong, Chao-Hsin Wu, Mei-Yung Chen
  • Patent number: 8560324
    Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: October 15, 2013
    Assignee: LG Electronics Inc.
    Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
  • Patent number: 8560311
    Abstract: A speech recognition system includes a natural language processing component and an automated speech recognition component distinct from each other such that uncertainty in speech recognition is isolated from uncertainty in natural language understanding, wherein the natural language processing component and an automated speech recognition component communicate corresponding weighted meta-information representative of the uncertainty.
    Type: Grant
    Filed: September 23, 2010
    Date of Patent: October 15, 2013
    Inventors: Robert W. Williams, John E. Keane
  • Patent number: 8554555
    Abstract: The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is then trained using the common phoneme label.
    Type: Grant
    Filed: February 17, 2010
    Date of Patent: October 8, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Rainer Gruhn, Daniel Vasquez, Guillermo Aradilla
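The subsequence assignment and common-label step described above can be sketched as follows; the majority-vote rule is an assumption, since the abstract leaves the exact labeling rule open:

```python
from collections import Counter

def assign_subsequences(frames, labels, n_networks, sub_len):
    """Split a frame sequence into one subsequence per network and pick a
    single common phoneme label for the whole sequence by majority vote
    over the covered frames' labels (voting rule is an assumption)."""
    assert len(frames) >= n_networks * sub_len, "sequence too short"
    subsequences = [frames[i * sub_len:(i + 1) * sub_len]
                    for i in range(n_networks)]
    common_label = Counter(labels[:n_networks * sub_len]).most_common(1)[0][0]
    return subsequences, common_label
```

Each network would then be trained on its own subsequence against the shared label.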
  • Patent number: 8527273
    Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
    Type: Grant
    Filed: July 30, 2012
    Date of Patent: September 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mehryar Mohri, Michael Dennis Riley
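The core of the abstract above is an N-best search over a weighted automaton. A minimal best-first sketch is shown below; the patented method additionally determinizes the automaton on the fly and uses per-state potentials to prune, both of which this sketch omits:

```python
import heapq

def n_best_paths(arcs, start, final, n):
    """Return up to n cheapest accepting (string, cost) pairs of a weighted
    automaton given as arcs: {state: [(label, weight, next_state), ...]}."""
    heap = [(0.0, start, "")]
    results = []
    while heap and len(results) < n:
        cost, state, string = heapq.heappop(heap)
        if state in final:
            results.append((string, cost))
        # keep expanding: a final state may still have outgoing arcs
        for label, weight, nxt in arcs.get(state, []):
            heapq.heappush(heap, (cost + weight, nxt, string + label))
    return results
```

On a nondeterministic automaton this enumerates paths, not distinct strings; collapsing equal-string paths is exactly what the on-the-fly determinization in the patent provides.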
  • Patent number: 8504361
    Abstract: A method and system for labeling a selected word of a sentence using a deep neural network includes, in one exemplary embodiment, determining an index term corresponding to each feature of the word, transforming the index term or terms of the word into a vector, and predicting a label for the word using the vector. The method and system, in another exemplary embodiment, includes determining, for each word in the sentence, an index term corresponding to each feature of the word, transforming the index term or terms of each word in the sentence into a vector, applying a convolution operation to the vector of the selected word and at least one of the vectors of the other words in the sentence, to transform the vectors into a matrix of vectors, each of the vectors in the matrix including a plurality of row values, constructing a single vector from the vectors in the matrix, and predicting a label for the selected word using the single vector.
    Type: Grant
    Filed: February 9, 2009
    Date of Patent: August 6, 2013
    Assignee: NEC Laboratories America, Inc.
    Inventors: Ronan Collobert, Jason Weston
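The second embodiment above (lookup-table vectors, convolution over neighboring words, then a single vector for prediction) can be sketched with toy dimensions; the vocabulary, embedding size, and random weights are illustrative, not the patent's:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {"the": 0, "cat": 1, "sat": 2}          # index term per word (toy vocabulary)
EMBED = rng.standard_normal((len(VOCAB), 4))    # lookup table: index -> vector

def label_scores(sentence, conv_w, out_w):
    """Transform each word's index into a vector, convolve over adjacent
    word pairs to get a matrix of vectors, max-pool the matrix into a
    single vector, and score labels from that vector."""
    vecs = np.stack([EMBED[VOCAB[w]] for w in sentence])          # word vectors
    windows = [vecs[i:i + 2].ravel() for i in range(len(vecs) - 1)]
    conv = np.stack(windows) @ conv_w            # convolution output: matrix of vectors
    pooled = conv.max(axis=0)                    # single vector from the matrix
    return pooled @ out_w                        # one score per candidate label
```

With a window of two words and four embedding dimensions, `conv_w` is 8 x hidden and `out_w` is hidden x labels.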
  • Patent number: 8478589
    Abstract: A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase.
    Type: Grant
    Filed: January 5, 2005
    Date of Patent: July 2, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Lee Begeja, Giuseppe Di Fabbrizio, David Crawford Gibbon, Dilek Z. Hakkani-Tur, Zhu Liu, Bernard S. Renger, Behzad Shahraray, Gokhan Tur
  • Publication number: 20130166291
    Abstract: Mental state of a person is classified in an automated manner by analysing natural speech of the person. A glottal waveform is extracted from a natural speech signal. Pre-determined parameters defining at least one diagnostic class of a class model are retrieved, the parameters determined from selected training glottal waveform features. The selected glottal waveform features are extracted from the signal. Current mental state of the person is classified by comparing extracted glottal waveform features with the parameters and class model. Feature extraction from a glottal waveform or other natural speech signal may involve determining spectral amplitudes of the signal, setting spectral amplitudes below a pre-defined threshold to zero and, for each of a plurality of sub bands, determining an area under the thresholded spectral amplitudes, and deriving signal feature parameters from the determined areas in accordance with a diagnostic class model.
    Type: Application
    Filed: August 23, 2010
    Publication date: June 27, 2013
    Applicant: RMIT UNIVERSITY
    Inventors: Margaret Lech, Nicholas Brian Allen, Ian Shaw Burnett, Ling He
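The feature-extraction step described above (threshold the spectral amplitudes, then take the area per sub-band) can be sketched directly; the band count and threshold are placeholders, not the tuned values of the class model:

```python
import numpy as np

def subband_area_features(signal, n_bands=4, threshold=0.1):
    """Zero spectral amplitudes below a threshold, then return the area
    under the remaining amplitudes in each of n_bands sub-bands."""
    amps = np.abs(np.fft.rfft(signal))
    amps = amps / (amps.max() + 1e-12)   # normalize so the threshold is relative
    amps[amps < threshold] = 0.0         # set sub-threshold amplitudes to zero
    bands = np.array_split(amps, n_bands)
    return np.array([band.sum() for band in bands])  # area per sub-band
```

For a pure low-frequency tone, only the lowest sub-band survives thresholding.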
  • Patent number: 8463606
    Abstract: A computerized system for advising one communicant in electronic communication between two or more communicants has apparatus monitoring and recording interaction between the communicants, software executing from a machine-readable medium and providing analytics, the software functions including rendering speech into text, and analyzing the rendered text for topics, performing communicant verification, and detecting changes in communicant emotion. Advice is offered to the one communicant during the interaction, based on results of the analytics.
    Type: Grant
    Filed: July 13, 2009
    Date of Patent: June 11, 2013
    Assignee: Genesys Telecommunications Laboratories, Inc.
    Inventors: Mark Scott, Jim Barnett
  • Publication number: 20130138436
    Abstract: Discriminative pretraining technique embodiments are presented that pretrain the hidden layers of a Deep Neural Network (DNN). In general, a one-hidden-layer neural network is trained first using labels discriminatively with error back-propagation (BP). Then, after discarding an output layer in the previous one-hidden-layer neural network, another randomly initialized hidden layer is added on top of the previously trained hidden layer along with a new output layer that represents the targets for classification or recognition. The resulting multiple-hidden-layer DNN is then discriminatively trained using the same strategy, and so on until the desired number of hidden layers is reached. This produces a pretrained DNN. The discriminative pretraining technique embodiments have the advantage of bringing the DNN layer weights close to a good local optimum, while still leaving them in a range with a high gradient so that they can be fine-tuned effectively.
    Type: Application
    Filed: November 26, 2011
    Publication date: May 30, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Li Deng, Frank Torsten Bernd Seide, Gang Li
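The grow-discard-stack loop of the discriminative pretraining above has a simple shape. In this sketch `train_bp` is a stand-in for error back-propagation (it returns the weights unchanged), so only the layer-growing logic is shown:

```python
import numpy as np

rng = np.random.default_rng(1)

def train_bp(weights, x, y):
    """Placeholder for discriminative back-propagation training; a real
    implementation would update every weight matrix from (x, y)."""
    return weights

def discriminative_pretrain(layer_sizes, n_out, x, y):
    """Grow a DNN one hidden layer at a time: train with a temporary output
    layer, discard that output layer, add a new random hidden layer on top,
    and repeat until the desired number of hidden layers is reached."""
    hidden = []
    for i, size in enumerate(layer_sizes):
        fan_in = x.shape[1] if i == 0 else layer_sizes[i - 1]
        hidden.append(rng.standard_normal((fan_in, size)) * 0.1)  # new hidden layer
        out = rng.standard_normal((size, n_out)) * 0.1            # temporary output layer
        hidden = train_bp(hidden + [out], x, y)[:-1]              # train, then discard output
    return hidden + [out]   # the last output layer is kept for fine-tuning
```

The result is a pretrained stack whose weight shapes chain correctly from input to output.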
  • Patent number: 8442820
    Abstract: The present invention provides a combined lip reading and voice recognition multimodal interface system, which can issue a navigation operation instruction only by voice and lip movements, thus allowing a driver to look ahead during a navigation operation and reducing vehicle accidents related to navigation operations during driving.
    Type: Grant
    Filed: December 1, 2009
    Date of Patent: May 14, 2013
    Assignees: Hyundai Motor Company, Kia Motors Corporation
    Inventors: Dae Hee Kim, Dai-Jin Kim, Jin Lee, Jong-Ju Shin, Jin-Seok Lee
  • Patent number: 8442821
    Abstract: A method and system for multi-frame prediction in a hybrid neural network/hidden Markov model automatic speech recognition (ASR) system is disclosed. An audio input signal may be transformed into a time sequence of feature vectors, each corresponding to respective temporal frame of a sequence of periodic temporal frames of the audio input signal. The time sequence of feature vectors may be concurrently input to a neural network, which may process them concurrently. In particular, the neural network may concurrently determine for the time sequence of feature vectors a set of emission probabilities for a plurality of hidden Markov models of the ASR system, where the set of emission probabilities are associated with the temporal frames. The set of emission probabilities may then be concurrently applied to the hidden Markov models for determining speech content of the audio input signal.
    Type: Grant
    Filed: July 27, 2012
    Date of Patent: May 14, 2013
    Assignee: Google Inc.
    Inventor: Vincent Vanhoucke
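The key idea above, one network pass emitting probabilities for a whole window of frames concurrently, can be sketched with a single linear layer standing in for the neural network; dimensions are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_frame_emissions(feature_seq, w):
    """Map a time sequence of feature vectors, in one concurrent pass, to
    HMM emission probabilities for every frame in the window.
    feature_seq: (n_frames, n_dims); w: (n_frames*n_dims, n_frames*n_states)."""
    stacked = feature_seq.reshape(1, -1)          # concatenate the frame window
    logits = stacked @ w                          # single forward pass
    n_frames = feature_seq.shape[0]
    return softmax(logits.reshape(n_frames, -1))  # per-frame state posteriors
```

Each row is a valid probability distribution over HMM states for one temporal frame.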
  • Patent number: 8428946
    Abstract: An architecture and framework for speech/noise classification of an audio signal using multiple features with multiple input channels (e.g., microphones) are provided. The architecture may be implemented with noise suppression in a multi-channel environment where noise suppression is based on an estimation of the noise spectrum. The noise spectrum is estimated using a model that classifies each time/frame and frequency component of a signal as speech or noise by applying a speech/noise probability function. The speech/noise probability function estimates a speech/noise probability for each frequency and time bin. A speech/noise classification estimate is obtained by fusing (e.g., combining) data across different input channels using a layered network model.
    Type: Grant
    Filed: July 6, 2012
    Date of Patent: April 23, 2013
    Assignee: Google Inc.
    Inventor: Marco Paniconi
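The fusion step above, combining per-channel speech/noise probabilities into one estimate per time-frequency bin, can be sketched as a weighted average; the patent's layered network model is more elaborate than this:

```python
def fuse_speech_probability(channel_probs, weights=None):
    """Fuse speech probabilities from multiple input channels (microphones)
    for one time/frequency bin into a single speech/noise estimate."""
    if weights is None:  # default: treat all channels equally
        weights = [1.0 / len(channel_probs)] * len(channel_probs)
    return sum(p * w for p, w in zip(channel_probs, weights))
```

The fused probability would then drive the noise-spectrum estimate used for suppression.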
  • Patent number: 8417185
    Abstract: A wireless device for use with speech recognition applications comprises a frame generator for generating successive frames from digitized original audio signals, the frames representing portions of the digitized audio signals. An autocorrelation circuit generates a set of coefficients for each frame, the coefficient set being reflective of spectral characteristics of the audio signal portion represented by the frame. In one embodiment, the autocorrelation coefficients may be used to predict the original audio signal; the prediction is subtracted from the original audio signal to generate residual signals. A Bluetooth transceiver is configured for transmitting the set of coefficients and/or residual signals as data to another device, which utilizes the coefficients for speech applications.
    Type: Grant
    Filed: December 16, 2005
    Date of Patent: April 9, 2013
    Assignee: Vocollect, Inc.
    Inventors: Keith Braho, Roger Graham Byford, Thomas S. Kerr, Amro El-Jaroudi
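The two quantities the abstract above transmits, autocorrelation coefficients and a prediction residual, can be sketched as follows; the one-tap predictor is a simplification (a real codec would use a higher-order linear predictor):

```python
def autocorrelation(frame, n_coeffs):
    """Short-time autocorrelation coefficients of one frame; these summarize
    the frame's spectral characteristics compactly."""
    return [sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
            for lag in range(n_coeffs)]

def prediction_residual(frame, a):
    """Residual after subtracting a one-tap linear prediction a*x[n-1]
    from each sample of the frame."""
    return [frame[0]] + [frame[n] - a * frame[n - 1]
                         for n in range(1, len(frame))]
```

The receiver can reconstruct the frame from the coefficients and residual, so only these compact streams need to cross the Bluetooth link.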
  • Patent number: 8392184
    Abstract: The invention relates to speech signal processing that detects a speech signal from more than one microphone and obtains microphone signals that are processed by a beamformer to obtain a beamformed signal, which is then post-filtered with a filter that employs adaptable filter weights to obtain an enhanced beamformed signal; the post-filter adapts its filter weights using previously learned filter weights.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: March 5, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Markus Buck, Klaus Scheufele
  • Patent number: 8380331
    Abstract: Methods and apparatus for relative pitch tracking of multiple arbitrary sounds. A probabilistic method for pitch tracking may be implemented as or in a pitch tracking module. A constant-Q transform of an input signal may be decomposed to estimate one or more kernel distributions and one or more impulse distributions. Each kernel distribution represents a spectrum of a particular source, and each impulse distribution represents a relative pitch track for a particular source. The decomposition of the constant-Q transform may be performed according to shift-invariant probabilistic latent component analysis, and may include applying an expectation maximization algorithm to estimate the kernel distributions and the impulse distributions. When decomposing, a prior, e.g. a sliding-Gaussian Dirichlet prior or an entropic prior, and/or a temporal continuity constraint may be imposed on each impulse distribution.
    Type: Grant
    Filed: October 30, 2008
    Date of Patent: February 19, 2013
    Assignee: Adobe Systems Incorporated
    Inventors: Paris Smaragdis, Gautham J. Mysore
  • Patent number: 8374864
    Abstract: In one embodiment, a method includes receiving at a communication device an audio communication and a transcribed text created from the audio communication, and generating a mapping of the transcribed text to the audio communication independent of transcribing the audio. The mapping identifies locations of portions of the text in the audio communication. An apparatus for mapping the text to the audio is also disclosed.
    Type: Grant
    Filed: March 17, 2010
    Date of Patent: February 12, 2013
    Assignee: Cisco Technology, Inc.
    Inventor: Jim Kerr