Neural Network Patents (Class 704/232)
  • Patent number: 9031844
    Abstract: A method includes an act of causing a processor to access a deep-structured model retained in a computer-readable medium, wherein the deep-structured model includes a plurality of layers with respective weights assigned to the plurality of layers, transition probabilities between states, and language model scores. The method further includes the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using an optimization criterion that is based on a sequence rather than a set of unrelated frames.
    Type: Grant
    Filed: September 21, 2010
    Date of Patent: May 12, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Li Deng, Abdel-rahman Samir Abdel-rahman Mohamed
  • Publication number: 20150127337
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
    Type: Application
    Filed: April 22, 2014
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventors: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A.U. Bacchiani
  • Publication number: 20150127336
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for inputting speech data that corresponds to a particular utterance to a neural network; determining an evaluation vector based on output at a hidden layer of the neural network; comparing the evaluation vector with a reference vector that corresponds to a past utterance of a particular speaker; and based on comparing the evaluation vector and the reference vector, determining whether the particular utterance was likely spoken by the particular speaker.
    Type: Application
    Filed: March 28, 2014
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventors: Xin Lei, Erik McDermott, Ehsan Variani, Ignacio L. Moreno
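The comparison step in this abstract (an "evaluation vector" from a hidden layer matched against a stored reference vector, the basis of d-vector speaker verification) can be sketched as follows. The patent only requires "comparing"; the cosine metric, the 0.8 threshold, and all vector values below are illustrative assumptions, not taken from the filing.

```python
import numpy as np

def cosine_score(evaluation_vec, reference_vec):
    """Similarity between a hidden-layer evaluation vector for a new
    utterance and the stored reference vector of an enrolled speaker."""
    return float(np.dot(evaluation_vec, reference_vec) /
                 (np.linalg.norm(evaluation_vec) * np.linalg.norm(reference_vec)))

def same_speaker(evaluation_vec, reference_vec, threshold=0.8):
    # A cosine threshold is one common comparison rule; 0.8 is arbitrary.
    return cosine_score(evaluation_vec, reference_vec) >= threshold

# Toy activations standing in for hidden-layer outputs.
reference = np.array([0.9, 0.1, 0.4])
matching = np.array([0.85, 0.15, 0.5])
imposter = np.array([-0.2, 0.9, -0.3])
```

A near-duplicate of the reference scores close to 1.0 and passes the threshold, while a dissimilar vector scores negatively and is rejected.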
  • Patent number: 9026065
    Abstract: Methods and apparatus for voice and data interlacing in a system having a shared antenna. In one embodiment, a voice and data communication system has a shared antenna for transmitting and receiving information in time slots, wherein the antenna can only be used for transmit or receive at a given time. The system determines timing requirements for data transmission and reception and interrupts data transmission for transmission of speech in selected intervals while meeting the data transmission timing and throughput requirements. The speech can be manipulated to fit with the selected intervals, to preserve the intelligibility of the manipulated speech.
    Type: Grant
    Filed: March 21, 2012
    Date of Patent: May 5, 2015
    Assignee: Raytheon Company
    Inventors: David R. Peterson, Timothy S. Loos, David F. Ring, James F. Keating
  • Patent number: 9009038
    Abstract: A method for analyzing a digital audio signal associated with a baby cry, comprising the steps of: (a) processing the digital audio signal using a spectral analysis to generate a spectral data; (b) processing the digital audio signal using a time-frequency analysis to generate a time-frequency characteristic; (c) categorizing the baby cry into one of a basic type and a special type based on the spectral data; (d) if the baby cry is of the basic type, determining a basic need based on the time-frequency characteristic and a predetermined lookup table; and (e) if the baby cry is of the special type, determining a special need by inputting the time-frequency characteristic into a pre-trained artificial neural network.
    Type: Grant
    Filed: May 22, 2013
    Date of Patent: April 14, 2015
    Assignee: National Taiwan Normal University
    Inventors: Jon-Chao Hong, Chao-Hsin Wu, Mei-Yung Chen
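The two-branch routing in steps (c)-(e) of this abstract can be sketched directly: spectral data decides basic vs. special type, then a lookup table or a pre-trained network supplies the need. The `peak_ratio` test, the 0.5 threshold, and the toy lookup/network below are invented placeholders for whatever the actual system computes.

```python
def classify_need(spectral_data, time_freq_feature, basic_lookup,
                  neural_net, basic_threshold=0.5):
    """Route a cry: basic type -> predetermined lookup table,
    special type -> pre-trained artificial neural network."""
    if spectral_data["peak_ratio"] >= basic_threshold:   # basic type
        return basic_lookup[time_freq_feature]
    return neural_net(time_freq_feature)                 # special type

basic_lookup = {"rising": "hunger", "falling": "sleepiness"}
toy_net = lambda feature: "discomfort"   # stands in for the trained ANN

basic_need = classify_need({"peak_ratio": 0.7}, "rising", basic_lookup, toy_net)
special_need = classify_need({"peak_ratio": 0.2}, "rising", basic_lookup, toy_net)
```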
  • Publication number: 20150100312
    Abstract: A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.
    Type: Application
    Filed: October 4, 2013
    Publication date: April 9, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Enrico Luigi BOCCHIERI, Dimitrios DIMITRIADIS
  • Publication number: 20150095027
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for key phrase detection. One of the methods includes receiving a plurality of audio frame vectors that each model an audio waveform during a different period of time, generating an output feature vector for each of the audio frame vectors, wherein each output feature vector includes a set of scores that characterize an acoustic match between the corresponding audio frame vector and a set of expected event vectors, each of the expected event vectors corresponding to one of the scores and defining acoustic properties of at least a portion of a keyword, and providing each of the output feature vectors to a posterior handling module.
    Type: Application
    Filed: September 30, 2013
    Publication date: April 2, 2015
    Applicant: Google Inc.
    Inventors: Maria Carolina Parada San Martin, Alexander H. Gruenstein, Guoguo Chen
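The per-frame scoring described here (each output feature vector holds one acoustic-match score per expected event vector, then goes to a posterior handling module) can be sketched as below. The abstract does not fix the match metric or the handling rule; negative squared distance and a softmax-then-max rule are assumptions for illustration.

```python
import numpy as np

def output_feature_vectors(frame_vectors, expected_event_vectors):
    """For each audio frame vector, emit one score per expected event
    vector (higher = closer acoustic match)."""
    scores = []
    for frame in frame_vectors:
        diffs = expected_event_vectors - frame
        scores.append(-np.sum(diffs ** 2, axis=1))
    return np.stack(scores)          # shape: (frames, events)

def posterior_handling(score_matrix):
    """Toy posterior handling module: per-frame softmax, then the best
    posterior over frames for each keyword event."""
    e = np.exp(score_matrix - score_matrix.max(axis=1, keepdims=True))
    posteriors = e / e.sum(axis=1, keepdims=True)
    return posteriors.max(axis=0)

frames = np.array([[1.0, 0.0], [0.0, 1.0]])
events = np.array([[1.0, 0.0], [0.0, 1.0]])   # expected event vectors
scores = output_feature_vectors(frames, events)
keyword_scores = posterior_handling(scores)
```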
  • Publication number: 20150095026
    Abstract: In an automatic speech recognition (ASR) processing system, ASR processing may be configured to process speech based on multiple channels of audio received from a beamformer. The ASR processing system may include a microphone array and the beamformer to output multiple channels of audio such that each channel isolates audio in a particular direction. The multichannel audio signals may include spoken utterances/speech from one or more speakers as well as undesired audio, such as noise from a household appliance. The ASR device may simultaneously perform speech recognition on the multi-channel audio to provide more accurate speech recognition results.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Applicant: Amazon Technologies, Inc.
    Inventors: Michael Maximilian Emanuel Bisani, Nikko Strom, Bjorn Hoffmeister, Ryan Paul Thomas
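The beamformer feeding this ASR pipeline isolates audio per direction; the simplest version of that idea is delay-and-sum, sketched below. Real front ends use fractional delays and adaptive weights, so this only shows the geometric principle of one steering direction.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Minimal delay-and-sum beamformer: undo each microphone's integer
    sample delay for one steering direction and average, reinforcing
    audio arriving from that direction."""
    out = np.zeros_like(channels[0], dtype=float)
    for signal, delay in zip(channels, delays):
        out += np.roll(signal, -delay)
    return out / len(channels)

impulse = np.array([0.0, 1.0, 0.0, 0.0])
mic0 = impulse                  # sound reaches mic 0 first
mic1 = np.roll(impulse, 1)      # and mic 1 one sample later
steered = delay_and_sum([mic0, mic1], delays=[0, 1])
```

After compensation the two copies align, so the steered output preserves the impulse at full amplitude; a source from another direction would not add coherently.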
  • Publication number: 20150066496
    Abstract: Technologies pertaining to slot filling are described herein. A deep neural network, a recurrent neural network, and/or a spatio-temporally deep neural network are configured to assign labels to words in a word sequence set forth in natural language. At least one label is a semantic label that is assigned to at least one word in the word sequence.
    Type: Application
    Filed: September 2, 2013
    Publication date: March 5, 2015
    Applicant: Microsoft Corporation
    Inventors: Anoop Deoras, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, Gregoire Mesnil
  • Patent number: 8972253
    Abstract: A method is disclosed herein that includes an act of causing a processor to receive a sample, wherein the sample is one of spoken utterance, an online handwriting sample, or a moving image sample. The method also comprises the act of causing the processor to decode the sample based at least in part upon an output of a combination of a deep structure and a context-dependent Hidden Markov Model (HMM), wherein the deep structure is configured to output a posterior probability of a context-dependent unit. The deep structure is a Deep Belief Network consisting of many layers of nonlinear units with connecting weights between layers trained by a pretraining step followed by a fine-tuning step.
    Type: Grant
    Filed: September 15, 2010
    Date of Patent: March 3, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Li Deng, Dong Yu, George Edward Dahl
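The key coupling in this hybrid (a deep network outputting posteriors of context-dependent units, combined with an HMM) is the standard scaled-likelihood conversion, sketched below with toy numbers.

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, log_priors):
    """Hybrid DNN/HMM conversion: the network gives state posteriors
    p(s|x), but HMM decoding wants likelihoods, so use
    p(x|s) proportional to p(s|x) / p(s); the shared p(x) term cancels
    during decoding."""
    return log_posteriors - log_priors

posteriors = np.array([0.7, 0.2, 0.1])   # network output for one frame
priors = np.array([0.5, 0.3, 0.2])       # context-dependent state priors
loglikes = scaled_log_likelihoods(np.log(posteriors), np.log(priors))
```

Dividing by the prior keeps frequent states from dominating purely because they are frequent; here state 0 still wins because its posterior exceeds its prior by the largest ratio.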
  • Patent number: 8972254
    Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: March 3, 2015
    Assignee: Utah State University
    Inventors: Jacob Gunther, Todd Moon
  • Publication number: 20150039302
    Abstract: An apparatus comprising: an analyser configured to analyse at least one input to determine one or more expressions within the at least one input; and a controller configured to control at least one audio signal associated with the at least one input dependent on the determination of the one or more expressions.
    Type: Application
    Filed: March 14, 2012
    Publication date: February 5, 2015
    Applicant: Nokia Corporation
    Inventors: Roope Olavi Jarvinen, Kari Juhani Järvinen, Juha Henrik Arrasvuori, Miikka Vilermo
  • Publication number: 20150039301
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using neural networks. A feature vector that models audio characteristics of a portion of an utterance is received. Data indicative of latent variables of multivariate factor analysis is received. The feature vector and the data indicative of the latent variables is provided as input to a neural network. A candidate transcription for the utterance is determined based on at least an output of the neural network.
    Type: Application
    Filed: July 31, 2013
    Publication date: February 5, 2015
    Applicant: Google Inc.
    Inventors: Andrew W. Senior, Ignacio L. Moreno
  • Publication number: 20150019214
    Abstract: A method and a device for training a DNN model includes: at a device including one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
    Type: Application
    Filed: December 16, 2013
    Publication date: January 15, 2015
    Applicant: Tencent Technology (Shenzhen) Company Limited
    Inventors: Eryu WANG, Li LU, Xiang ZHANG, Haibo LIU, Feng RAO, Lou LI, Shuai YUE, Bo CHEN
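One iteration of the training loop in this abstract (disjoint subsets, per-unit SGD, then merging sub-models) can be sketched with a linear model in place of the DNN. The averaging merge, the learning rate, and the toy data are assumptions; the patent does not commit to a specific merge formula.

```python
import numpy as np

def sgd_on_subset(w_init, X, y, lr=0.1, epochs=50):
    """One 'training processing unit': plain SGD on its disjoint data
    subset (a noiseless linear model with squared loss)."""
    w = w_init.copy()
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w -= lr * 2.0 * (w @ xi - yi) * xi
    return w

def training_iteration(w_init, subsets):
    """Run SGD on every subset (conceptually in parallel), then merge
    the sub-models by parameter averaging into the intermediate model."""
    sub_models = [sgd_on_subset(w_init, X, y) for X, y in subsets]
    return np.mean(sub_models, axis=0)

true_w = np.array([2.0, -1.0])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0],
              [1.0, 2.0], [-1.0, 1.0], [0.5, 0.0], [0.0, 0.5]])
y = X @ true_w
subsets = [(X[:4], y[:4]), (X[4:], y[4:])]
w_merged = training_iteration(np.zeros(2), subsets)
```

In a full run, `w_merged` would seed the next iteration until a preset convergence condition is met.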
  • Publication number: 20140372112
    Abstract: A Deep Neural Network (DNN) model used in an Automatic Speech Recognition (ASR) system is restructured. A restructured DNN model may include fewer parameters compared to the original DNN model. The restructured DNN model may include a monophone state output layer in addition to the senone output layer of the original DNN model. Singular value decomposition (SVD) can be applied to one or more weight matrices of the DNN model to reduce the size of the DNN Model. The output layer of the DNN model may be restructured to include monophone states in addition to the senones (tied triphone states) which are included in the original DNN model. When the monophone states are included in the restructured DNN model, the posteriors of monophone states are used to select a small part of senones to be evaluated.
    Type: Application
    Filed: June 18, 2013
    Publication date: December 18, 2014
    Inventors: Jian Xue, Emilian Stoimenov, Jinyu Li, Yifan Gong
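The SVD-based size reduction described here replaces one weight matrix with two low-rank factors; a minimal sketch of that restructuring follows. Matrix sizes and the rank are illustrative.

```python
import numpy as np

def svd_restructure(W, k):
    """Approximate an m-by-n weight matrix with two rank-k factors,
    W ~= A @ B, shrinking m*n parameters to k*(m + n). In the
    restructured DNN one linear layer becomes two thinner layers."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:k])
    A = U[:, :k] * root               # m-by-k factor
    B = root[:, None] * Vt[:k, :]     # k-by-n factor
    return A, B

rng = np.random.default_rng(1)
# A weight matrix that is exactly rank 3 is recovered exactly at k=3;
# in practice k is chosen below the true rank to trade accuracy for size.
W = rng.normal(size=(8, 3)) @ rng.normal(size=(3, 10))
A, B = svd_restructure(W, 3)
```

Here the original 8x10 layer holds 80 parameters while the two factors hold 24 + 30 = 54, and the savings grow quickly at DNN-scale dimensions.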
  • Publication number: 20140288928
    Abstract: A system and method for applying a convolutional neural network (CNN) to speech recognition. The CNN may provide input to a hidden Markov model and has at least one pair of a convolution layer and a pooling layer. The CNN operates along the frequency axis. The CNN has units that operate upon one or more local frequency bands of an acoustic signal. The CNN mitigates acoustic variation.
    Type: Application
    Filed: March 25, 2013
    Publication date: September 25, 2014
    Inventors: Gerald Bradley Penn, Hui Jiang, Ossama Abdelhamid Mohamed Abdelhamid, Abdel-rahman Samir Abdel-rahman Mohamed
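The convolution-pooling pair operating along the frequency axis, as this abstract describes, can be sketched in miniature. Processing each time frame independently, the single kernel, and the pool width are simplifying assumptions; the point is that each unit sees only a local frequency band and pooling absorbs small spectral shifts (one source of acoustic variation).

```python
import numpy as np

def conv_pool_frequency(spectrogram, kernel, pool=2):
    """One convolution + max-pooling pair applied along the frequency
    axis only; time frames are processed independently here."""
    frames, bands = spectrogram.shape
    k = len(kernel)
    conv = np.empty((frames, bands - k + 1))
    for f in range(bands - k + 1):
        conv[:, f] = spectrogram[:, f:f + k] @ kernel   # local freq band
    conv = np.maximum(conv, 0.0)                        # ReLU
    n = (bands - k + 1) // pool
    return conv[:, :n * pool].reshape(frames, n, pool).max(axis=2)

spec = np.arange(12.0).reshape(2, 6)    # 2 frames, 6 frequency bands
pooled = conv_pool_frequency(spec, np.array([1.0, 1.0]))
```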
  • Publication number: 20140278390
    Abstract: Systems and methods for processing a query include determining a plurality of sets of match candidates for a query using a processor, each of the plurality of sets of match candidates being independently determined from a plurality of diverse word lattice generation components of different type. The plurality of sets of match candidates is merged by generating a first score for each match candidate to provide a merged set of match candidates. A second score is computed for each match candidate of the merged set based upon features of that match candidate. The first score and the second score are combined to provide a final set of match candidates as matches to the query.
    Type: Application
    Filed: March 12, 2013
    Publication date: September 18, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian E. D. Kingsbury, Hong-Kwang Jeff Kuo, Lidia Luminita Mangu, Hagen Soltau
  • Patent number: 8838446
    Abstract: Provided is a method and apparatus for transforming a speech feature vector. The method includes extracting a feature vector required for speech recognition from a speech signal and transforming the extracted feature vector using an auto-associative neural network (AANN).
    Type: Grant
    Filed: August 31, 2007
    Date of Patent: September 16, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: So-young Jeong, Kwang-cheol Oh, Jae-hoon Jeong, Jeong-su Kim
  • Publication number: 20140257804
    Abstract: Technologies pertaining to training a deep neural network (DNN) for use in a recognition system are described herein. The DNN is trained using heterogeneous data, the heterogeneous data including narrowband signals and wideband signals. The DNN, subsequent to being trained, receives an input signal that can be either a wideband signal or narrowband signal. The DNN estimates the class posterior probability of the input signal regardless of whether the input signal is the wideband signal or the narrowband signal.
    Type: Application
    Filed: March 7, 2013
    Publication date: September 11, 2014
    Applicant: Microsoft Corporation
    Inventors: Jinyu Li, Dong Yu, Yifan Gong
  • Publication number: 20140257803
    Abstract: Various technologies described herein pertain to conservatively adapting a deep neural network (DNN) in a recognition system for a particular user or context. A DNN is employed to output a probability distribution over models of context-dependent units responsive to receipt of captured user input. The DNN is adapted for a particular user based upon the captured user input, wherein the adaption is undertaken conservatively such that a deviation between outputs of the adapted DNN and the unadapted DNN is constrained.
    Type: Application
    Filed: March 6, 2013
    Publication date: September 11, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Kaisheng Yao, Hang Su, Gang Li, Frank Seide
  • Publication number: 20140257805
    Abstract: Described herein are various technologies pertaining to a multilingual deep neural network (MDNN). The MDNN includes a plurality of hidden layers, wherein values for weight parameters of the plurality of hidden layers are learned during a training phase based upon training data in terms of acoustic raw features for multiple languages. The MDNN further includes softmax layers that are trained for each target language separately, making use of the hidden layer values trained jointly with multiple source languages. The MDNN is adaptable, such that a new softmax layer may be added on top of the existing hidden layers, where the new softmax layer corresponds to a new target language.
    Type: Application
    Filed: March 11, 2013
    Publication date: September 11, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Jui-Ting Huang, Jinyu Li, Dong Yu, Li Deng, Yifan Gong
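The architecture in this abstract (hidden layers shared across languages, one softmax head per target language, new heads attachable on top of the existing stack) can be sketched as below. The single tanh layer, the random initialisation, and all dimensions are illustrative stand-ins for the jointly trained multilingual stack.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class MultilingualDNN:
    """Shared hidden representation with one softmax head per target
    language; adding a language only adds a head."""
    def __init__(self, in_dim, hid_dim, rng):
        self.W_h = rng.normal(scale=0.1, size=(hid_dim, in_dim))
        self.heads = {}
        self.hid_dim = hid_dim
        self.rng = rng

    def add_language(self, name, n_states):
        # New softmax layer on top of the existing (frozen) hidden stack.
        self.heads[name] = self.rng.normal(scale=0.1,
                                           size=(n_states, self.hid_dim))

    def posteriors(self, x, language):
        h = np.tanh(self.W_h @ x)            # shared across languages
        return softmax(self.heads[language] @ h)

rng = np.random.default_rng(2)
net = MultilingualDNN(in_dim=4, hid_dim=5, rng=rng)
net.add_language("en", 3)
net.add_language("fr", 4)   # a new target language reuses the hidden layer
p = net.posteriors(np.ones(4), "fr")
```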
  • Publication number: 20140244248
    Abstract: Techniques for conversion of non-back-off language models for use in speech decoders. For example, a method comprises the following step. A non-back-off language model is converted to a back-off language model. The converted back-off language model is pruned. The converted back-off language model is usable for decoding speech.
    Type: Application
    Filed: February 22, 2013
    Publication date: August 28, 2014
    Applicant: International Business Machines Corporation
    Inventors: Ebru Arisoy, Bhuvana Ramabhadran, Abhinav Sethy, Stanley Chen
  • Publication number: 20140214417
    Abstract: A method and device for voiceprint recognition, include: establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data; obtaining a plurality of high-level voiceprint features by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, and the tuning producing a second-level DNN model specifying the plurality of high-level voiceprint features; based on the second-level DNN model, registering a respective high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and performing speaker verification for the user based on the respective high-level voiceprint feature sequence registered for the user.
    Type: Application
    Filed: December 12, 2013
    Publication date: July 31, 2014
    Applicant: Tencent Technology (Shenzhen) Company Limited
    Inventors: Eryu WANG, Li LU, Xiang ZHANG, Haibo LIU, Lou LI, Feng RAO, Duling LU, Shuai YUE, Bo CHEN
  • Patent number: 8793127
    Abstract: In addition to conveying primary information, human speech also conveys information concerning the speaker's gender, age, socioeconomic status, accent, language spoken, emotional state, or other personal characteristics, which is referred to as secondary information. Disclosed herein are both the means of automatic discovery and use of such secondary information to direct other aspects of the behavior of a controlled system. One embodiment of the invention comprises an improved method to determine, with high reliability, the gender of an adult speaker. A further embodiment of the invention comprises the use of this information to display a gender-appropriate advertisement to the user of an information retrieval system that uses a cell phone as the input and output device.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: July 29, 2014
    Assignee: Promptu Systems Corporation
    Inventors: Harry Printz, Vikas Gulati
  • Patent number: 8775183
    Abstract: Textual transcription of speech is generated and formatted according to user-specified transformation and behavior requirements for a speech recognition system having input grammars and transformations. An apparatus may include a speech recognition platform configured to receive a user-specified transformation requirement, recognize speech in speech data into recognized speech according to a set of recognition grammars; and apply transformations to the recognized speech according to the user-specified transformation requirement. The apparatus may further be configured to receive a user-specified behavior requirement and transform the recognized speech according to the behavior requirement. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: July 8, 2014
    Assignee: Microsoft Corporation
    Inventors: Jonathan E. Hamaker, Keith C. Herold
  • Patent number: 8762142
    Abstract: Provided are a multi-stage speech recognition apparatus and method. The multi-stage speech recognition apparatus includes a first speech recognition unit performing initial speech recognition on a feature vector, which is extracted from an input speech signal, and generating a plurality of candidate words; and a second speech recognition unit rescoring the candidate words, which are provided by the first speech recognition unit, using a temporal posterior feature vector extracted from the speech signal.
    Type: Grant
    Filed: August 15, 2007
    Date of Patent: June 24, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: So-young Jeong, Kwang-cheol Oh, Jae-hoon Jeong, Jeong-su Kim
  • Publication number: 20140163977
    Abstract: Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that it may be used to update the models and data as it becomes available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
    Type: Application
    Filed: December 12, 2012
    Publication date: June 12, 2014
    Applicant: AMAZON TECHNOLOGIES, INC.
  • Publication number: 20140149112
    Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
    Type: Application
    Filed: May 23, 2013
    Publication date: May 29, 2014
    Applicant: Sony Computer Entertainment Inc.
    Inventor: Ozlem KALINLI-AKBACAK
  • Patent number: 8719019
    Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involving use of a feature set that includes features obtained using a filterbank having filters that space linearly at higher frequencies and logarithmically at lower frequencies, respectively, features that model the speaker's vocal tract transfer function, and features that indicate a vibration rate of vocal folds of the speaker of the sample data.
    Type: Grant
    Filed: April 25, 2011
    Date of Patent: May 6, 2014
    Assignee: Microsoft Corporation
    Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver
  • Patent number: 8682669
    Abstract: A system and a method to generate statistical utterance classifiers optimized for the individual states of a spoken dialog system is disclosed. The system and method make use of large databases of transcribed and annotated utterances from calls collected in a dialog system in production and log data reporting the association between the state of the system at the moment when the utterances were recorded and the utterance. From the system state, being a vector of multiple system variables, subsets of these variables, certain variable ranges, quantized variable values, etc. can be extracted to produce a multitude of distinct utterance subsets matching every possible system state. For each of these subset and variable combinations, statistical classifiers can be trained, tuned, and tested, and the classifiers can be stored together with the performance results and the state subset and variable combination.
    Type: Grant
    Filed: August 21, 2009
    Date of Patent: March 25, 2014
    Assignee: Synchronoss Technologies, Inc.
    Inventors: David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
  • Patent number: 8655664
    Abstract: According to an embodiment, a text presentation apparatus presenting text for a speaker to read aloud for voice recording includes: a text storing unit for storing first text; a presenting unit for presenting the first text; a determination unit for determining whether or not the first text needs to be replaced, on the basis of a speaker's input for the first text presented; a preliminary text storing unit for storing preliminary text; a select unit configured to select, if it is determined that the first text needs to be replaced, second text to replace the first text from among the preliminary text, the selecting being performed on the basis of attribute information describing an attribute of the first text and on the basis of at least one of attribute information describing pronunciation of the first text and attribute information describing a stress type of the first text; and a control unit configured to control the presenting unit so that the presenting unit presents the second text.
    Type: Grant
    Filed: August 11, 2011
    Date of Patent: February 18, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kentaro Tachibana, Gou Hirabayashi, Takehiko Kagoshima
  • Patent number: 8655659
    Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: February 18, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventors: Qingfang Wang, Shouchun He
  • Patent number: 8639508
    Abstract: A method of automatic speech recognition includes receiving an utterance from a user via a microphone that converts the utterance into a speech signal, pre-processing the speech signal using a processor to extract acoustic data from the received speech signal, and identifying at least one user-specific characteristic in response to the extracted acoustic data. The method also includes determining a user-specific confidence threshold responsive to the at least one user-specific characteristic, and using the user-specific confidence threshold to recognize the utterance received from the user and/or to assess confusability of the utterance with stored vocabulary.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: January 28, 2014
    Assignee: General Motors LLC
    Inventors: Xufang Zhao, Gaurav Talwar
  • Publication number: 20130317815
    Abstract: A method for analyzing a digital audio signal associated with a baby cry, comprising the steps of: (a) processing the digital audio signal using a spectral analysis to generate a spectral data; (b) processing the digital audio signal using a time-frequency analysis to generate a time-frequency characteristic; (c) categorizing the baby cry into one of a basic type and a special type based on the spectral data; (d) if the baby cry is of the basic type, determining a basic need based on the time-frequency characteristic and a predetermined lookup table; and (e) if the baby cry is of the special type, determining a special need by inputting the time-frequency characteristic into a pre-trained artificial neural network.
    Type: Application
    Filed: May 22, 2013
    Publication date: November 28, 2013
    Applicant: NATIONAL TAIWAN NORMAL UNIVERSITY
    Inventors: Jon-Chao Hong, Chao-Hsin Wu, Mei-Yung Chen
  • Patent number: 8560324
    Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: October 15, 2013
    Assignee: LG Electronics Inc.
    Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
  • Patent number: 8560311
    Abstract: A speech recognition system includes a natural language processing component and an automated speech recognition component distinct from each other such that uncertainty in speech recognition is isolated from uncertainty in natural language understanding, wherein the natural language processing component and an automated speech recognition component communicate corresponding weighted meta-information representative of the uncertainty.
    Type: Grant
    Filed: September 23, 2010
    Date of Patent: October 15, 2013
    Inventors: Robert W. Williams, John E. Keane
  • Patent number: 8554555
    Abstract: The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is then trained using the common phoneme label.
    Type: Grant
    Filed: February 17, 2010
    Date of Patent: October 8, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Rainer Gruhn, Daniel Vasquez, Guillermo Aradilla
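The subsequence assignment and common-label step described above can be sketched as follows; the majority-vote rule is an assumption, since the abstract leaves the exact labeling rule open:

```python
from collections import Counter

def assign_subsequences(frames, labels, n_networks, sub_len):
    """Split a frame sequence into one subsequence per network and pick a
    single common phoneme label for the whole sequence by majority vote
    over the covered frames' labels (voting rule is an assumption)."""
    assert len(frames) >= n_networks * sub_len, "sequence too short"
    subsequences = [frames[i * sub_len:(i + 1) * sub_len]
                    for i in range(n_networks)]
    common_label = Counter(labels[:n_networks * sub_len]).most_common(1)[0][0]
    return subsequences, common_label
```

Each network would then be trained on its own subsequence against the shared label.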
  • Patent number: 8527273
    Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
    Type: Grant
    Filed: July 30, 2012
    Date of Patent: September 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mehryar Mohri, Michael Dennis Riley
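The core of the abstract above is an N-best search over a weighted automaton. A minimal best-first sketch is shown below; the patented method additionally determinizes the automaton on the fly and uses per-state potentials to prune, both of which this sketch omits:

```python
import heapq

def n_best_paths(arcs, start, final, n):
    """Return up to n cheapest accepting (string, cost) pairs of a weighted
    automaton given as arcs: {state: [(label, weight, next_state), ...]}."""
    heap = [(0.0, start, "")]
    results = []
    while heap and len(results) < n:
        cost, state, string = heapq.heappop(heap)
        if state in final:
            results.append((string, cost))
        # keep expanding: a final state may still have outgoing arcs
        for label, weight, nxt in arcs.get(state, []):
            heapq.heappush(heap, (cost + weight, nxt, string + label))
    return results
```

On a nondeterministic automaton this enumerates paths, not distinct strings; collapsing equal-string paths is exactly what the on-the-fly determinization in the patent provides.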
  • Patent number: 8504361
    Abstract: A method and system for labeling a selected word of a sentence using a deep neural network includes, in one exemplary embodiment, determining an index term corresponding to each feature of the word, transforming the index term or terms of the word into a vector, and predicting a label for the word using the vector. The method and system, in another exemplary embodiment, includes determining, for each word in the sentence, an index term corresponding to each feature of the word, transforming the index term or terms of each word in the sentence into a vector, applying a convolution operation to the vector of the selected word and at least one of the vectors of the other words in the sentence, to transform the vectors into a matrix of vectors, each of the vectors in the matrix including a plurality of row values, constructing a single vector from the vectors in the matrix, and predicting a label for the selected word using the single vector.
    Type: Grant
    Filed: February 9, 2009
    Date of Patent: August 6, 2013
    Assignee: NEC Laboratories America, Inc.
    Inventors: Ronan Collobert, Jason Weston
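The second embodiment above (lookup-table vectors, convolution over neighboring words, then a single vector for prediction) can be sketched with toy dimensions; the vocabulary, embedding size, and random weights are illustrative, not the patent's:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {"the": 0, "cat": 1, "sat": 2}          # index term per word (toy vocabulary)
EMBED = rng.standard_normal((len(VOCAB), 4))    # lookup table: index -> vector

def label_scores(sentence, conv_w, out_w):
    """Transform each word's index into a vector, convolve over adjacent
    word pairs to get a matrix of vectors, max-pool the matrix into a
    single vector, and score labels from that vector."""
    vecs = np.stack([EMBED[VOCAB[w]] for w in sentence])          # word vectors
    windows = [vecs[i:i + 2].ravel() for i in range(len(vecs) - 1)]
    conv = np.stack(windows) @ conv_w            # convolution output: matrix of vectors
    pooled = conv.max(axis=0)                    # single vector from the matrix
    return pooled @ out_w                        # one score per candidate label
```

With a window of two words and four embedding dimensions, `conv_w` is 8 x hidden and `out_w` is hidden x labels.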
  • Patent number: 8478589
    Abstract: A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase.
    Type: Grant
    Filed: January 5, 2005
    Date of Patent: July 2, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Lee Begeja, Giuseppe Di Fabbrizio, David Crawford Gibbon, Dilek Z. Hakkani-Tur, Zhu Liu, Bernard S. Renger, Behzad Shahraray, Gokhan Tur
  • Publication number: 20130166291
    Abstract: Mental state of a person is classified in an automated manner by analysing natural speech of the person. A glottal waveform is extracted from a natural speech signal. Pre-determined parameters defining at least one diagnostic class of a class model are retrieved, the parameters determined from selected training glottal waveform features. The selected glottal waveform features are extracted from the signal. Current mental state of the person is classified by comparing extracted glottal waveform features with the parameters and class model. Feature extraction from a glottal waveform or other natural speech signal may involve determining spectral amplitudes of the signal, setting spectral amplitudes below a pre-defined threshold to zero and, for each of a plurality of sub bands, determining an area under the thresholded spectral amplitudes, and deriving signal feature parameters from the determined areas in accordance with a diagnostic class model.
    Type: Application
    Filed: August 23, 2010
    Publication date: June 27, 2013
    Applicant: RMIT UNIVERSITY
    Inventors: Margaret Lech, Nicholas Brian Allen, Ian Shaw Burnett, Ling He
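The feature-extraction step described above (threshold the spectral amplitudes, then take the area per sub-band) can be sketched directly; the band count and threshold are placeholders, not the tuned values of the class model:

```python
import numpy as np

def subband_area_features(signal, n_bands=4, threshold=0.1):
    """Zero spectral amplitudes below a threshold, then return the area
    under the remaining amplitudes in each of n_bands sub-bands."""
    amps = np.abs(np.fft.rfft(signal))
    amps = amps / (amps.max() + 1e-12)   # normalize so the threshold is relative
    amps[amps < threshold] = 0.0         # set sub-threshold amplitudes to zero
    bands = np.array_split(amps, n_bands)
    return np.array([band.sum() for band in bands])  # area per sub-band
```

For a pure low-frequency tone, only the lowest sub-band survives thresholding.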
  • Patent number: 8463606
    Abstract: A computerized system for advising one communicant in electronic communication between two or more communicants has apparatus monitoring and recording interaction between the communicants, software executing from a machine-readable medium and providing analytics, the software functions including rendering speech into text, and analyzing the rendered text for topics, performing communicant verification, and detecting changes in communicant emotion. Advice is offered to the one communicant during the interaction, based on results of the analytics.
    Type: Grant
    Filed: July 13, 2009
    Date of Patent: June 11, 2013
    Assignee: Genesys Telecommunications Laboratories, Inc.
    Inventors: Mark Scott, Jim Barnett
  • Publication number: 20130138436
    Abstract: Discriminative pretraining technique embodiments are presented that pretrain the hidden layers of a Deep Neural Network (DNN). In general, a one-hidden-layer neural network is trained first using labels discriminatively with error back-propagation (BP). Then, after discarding an output layer in the previous one-hidden-layer neural network, another randomly initialized hidden layer is added on top of the previously trained hidden layer along with a new output layer that represents the targets for classification or recognition. The resulting multiple-hidden-layer DNN is then discriminatively trained using the same strategy, and so on until the desired number of hidden layers is reached. This produces a pretrained DNN. The discriminative pretraining technique embodiments have the advantage of bringing the DNN layer weights close to a good local optimum, while still leaving them in a range with a high gradient so that they can be fine-tuned effectively.
    Type: Application
    Filed: November 26, 2011
    Publication date: May 30, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Dong Yu, Li Deng, Frank Torsten Bernd Seide, Gang Li
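The grow-discard-stack loop of the discriminative pretraining above has a simple shape. In this sketch `train_bp` is a stand-in for error back-propagation (it returns the weights unchanged), so only the layer-growing logic is shown:

```python
import numpy as np

rng = np.random.default_rng(1)

def train_bp(weights, x, y):
    """Placeholder for discriminative back-propagation training; a real
    implementation would update every weight matrix from (x, y)."""
    return weights

def discriminative_pretrain(layer_sizes, n_out, x, y):
    """Grow a DNN one hidden layer at a time: train with a temporary output
    layer, discard that output layer, add a new random hidden layer on top,
    and repeat until the desired number of hidden layers is reached."""
    hidden = []
    for i, size in enumerate(layer_sizes):
        fan_in = x.shape[1] if i == 0 else layer_sizes[i - 1]
        hidden.append(rng.standard_normal((fan_in, size)) * 0.1)  # new hidden layer
        out = rng.standard_normal((size, n_out)) * 0.1            # temporary output layer
        hidden = train_bp(hidden + [out], x, y)[:-1]              # train, then discard output
    return hidden + [out]   # the last output layer is kept for fine-tuning
```

The result is a pretrained stack whose weight shapes chain correctly from input to output.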
  • Patent number: 8442820
    Abstract: The present invention provides a combined lip reading and voice recognition multimodal interface system, which can issue a navigation operation instruction only by voice and lip movements, thus allowing a driver to look ahead during a navigation operation and reducing vehicle accidents related to navigation operations during driving.
    Type: Grant
    Filed: December 1, 2009
    Date of Patent: May 14, 2013
    Assignees: Hyundai Motor Company, Kia Motors Corporation
    Inventors: Dae Hee Kim, Dai-Jin Kim, Jin Lee, Jong-Ju Shin, Jin-Seok Lee
  • Patent number: 8442821
    Abstract: A method and system for multi-frame prediction in a hybrid neural network/hidden Markov model automatic speech recognition (ASR) system is disclosed. An audio input signal may be transformed into a time sequence of feature vectors, each corresponding to respective temporal frame of a sequence of periodic temporal frames of the audio input signal. The time sequence of feature vectors may be concurrently input to a neural network, which may process them concurrently. In particular, the neural network may concurrently determine for the time sequence of feature vectors a set of emission probabilities for a plurality of hidden Markov models of the ASR system, where the set of emission probabilities are associated with the temporal frames. The set of emission probabilities may then be concurrently applied to the hidden Markov models for determining speech content of the audio input signal.
    Type: Grant
    Filed: July 27, 2012
    Date of Patent: May 14, 2013
    Assignee: Google Inc.
    Inventor: Vincent Vanhoucke
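The key idea above, one network pass emitting probabilities for a whole window of frames concurrently, can be sketched with a single linear layer standing in for the neural network; dimensions are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_frame_emissions(feature_seq, w):
    """Map a time sequence of feature vectors, in one concurrent pass, to
    HMM emission probabilities for every frame in the window.
    feature_seq: (n_frames, n_dims); w: (n_frames*n_dims, n_frames*n_states)."""
    stacked = feature_seq.reshape(1, -1)          # concatenate the frame window
    logits = stacked @ w                          # single forward pass
    n_frames = feature_seq.shape[0]
    return softmax(logits.reshape(n_frames, -1))  # per-frame state posteriors
```

Each row is a valid probability distribution over HMM states for one temporal frame.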
  • Patent number: 8428946
    Abstract: An architecture and framework for speech/noise classification of an audio signal using multiple features with multiple input channels (e.g., microphones) are provided. The architecture may be implemented with noise suppression in a multi-channel environment where noise suppression is based on an estimation of the noise spectrum. The noise spectrum is estimated using a model that classifies each time/frame and frequency component of a signal as speech or noise by applying a speech/noise probability function. The speech/noise probability function estimates a speech/noise probability for each frequency and time bin. A speech/noise classification estimate is obtained by fusing (e.g., combining) data across different input channels using a layered network model.
    Type: Grant
    Filed: July 6, 2012
    Date of Patent: April 23, 2013
    Assignee: Google Inc.
    Inventor: Marco Paniconi
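The fusion step above, combining per-channel speech/noise probabilities into one estimate per time-frequency bin, can be sketched as a weighted average; the patent's layered network model is more elaborate than this:

```python
def fuse_speech_probability(channel_probs, weights=None):
    """Fuse speech probabilities from multiple input channels (microphones)
    for one time/frequency bin into a single speech/noise estimate."""
    if weights is None:  # default: treat all channels equally
        weights = [1.0 / len(channel_probs)] * len(channel_probs)
    return sum(p * w for p, w in zip(channel_probs, weights))
```

The fused probability would then drive the noise-spectrum estimate used for suppression.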
  • Patent number: 8417185
    Abstract: A wireless device for use with speech recognition applications comprises a frame generator for generating successive frames from digitized original audio signals, the frames representing portions of the digitized audio signals. An autocorrelation circuit generates a set of coefficients for each frame, the coefficient set being reflective of spectral characteristics of the audio signal portion represented by the frame. In one embodiment, the autocorrelation coefficients may be used to predict the original audio signal; the prediction is subtracted from the original audio signal to generate residual signals. A Bluetooth transceiver is configured for transmitting the set of coefficients and/or residual signals as data to another device, which utilizes the coefficients for speech applications.
    Type: Grant
    Filed: December 16, 2005
    Date of Patent: April 9, 2013
    Assignee: Vocollect, Inc.
    Inventors: Keith Braho, Roger Graham Byford, Thomas S. Kerr, Amro El-Jaroudi
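The two quantities the abstract above transmits, autocorrelation coefficients and a prediction residual, can be sketched as follows; the one-tap predictor is a simplification (a real codec would use a higher-order linear predictor):

```python
def autocorrelation(frame, n_coeffs):
    """Short-time autocorrelation coefficients of one frame; these summarize
    the frame's spectral characteristics compactly."""
    return [sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
            for lag in range(n_coeffs)]

def prediction_residual(frame, a):
    """Residual after subtracting a one-tap linear prediction a*x[n-1]
    from each sample of the frame."""
    return [frame[0]] + [frame[n] - a * frame[n - 1]
                         for n in range(1, len(frame))]
```

The receiver can reconstruct the frame from the coefficients and residual, so only these compact streams need to cross the Bluetooth link.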
  • Patent number: 8392184
    Abstract: The invention relates to speech signal processing that detects a speech signal from more than one microphone and obtains microphone signals that are processed by a beamformer to obtain a beamformed signal, which is then post-filtered with a filter that employs adaptable filter weights to obtain an enhanced beamformed signal; the post-filter adapts its filter weights using previously learned filter weights.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: March 5, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Markus Buck, Klaus Scheufele
  • Patent number: 8380331
    Abstract: Methods and apparatus for relative pitch tracking of multiple arbitrary sounds. A probabilistic method for pitch tracking may be implemented as or in a pitch tracking module. A constant-Q transform of an input signal may be decomposed to estimate one or more kernel distributions and one or more impulse distributions. Each kernel distribution represents a spectrum of a particular source, and each impulse distribution represents a relative pitch track for a particular source. The decomposition of the constant-Q transform may be performed according to shift-invariant probabilistic latent component analysis, and may include applying an expectation maximization algorithm to estimate the kernel distributions and the impulse distributions. When decomposing, a prior, e.g. a sliding-Gaussian Dirichlet prior or an entropic prior, and/or a temporal continuity constraint may be imposed on each impulse distribution.
    Type: Grant
    Filed: October 30, 2008
    Date of Patent: February 19, 2013
    Assignee: Adobe Systems Incorporated
    Inventors: Paris Smaragdis, Gautham J. Mysore
  • Patent number: 8374864
    Abstract: In one embodiment, a method includes receiving at a communication device an audio communication and a transcribed text created from the audio communication, and generating a mapping of the transcribed text to the audio communication independent of transcribing the audio. The mapping identifies locations of portions of the text in the audio communication. An apparatus for mapping the text to the audio is also disclosed.
    Type: Grant
    Filed: March 17, 2010
    Date of Patent: February 12, 2013
    Assignee: Cisco Technology, Inc.
    Inventor: Jim Kerr