Neural Network Patents (Class 704/232)
  • Patent number: 9761228
    Abstract: There are provided a recognition result candidate comparator 205 that compares a plurality of server-side voice recognition result candidates received by a receiver 204, to detect texts having a difference, and a recognition result integrator 206 that integrates a client-side voice recognition result candidate and a server-side voice recognition result candidate on the basis of the client-side voice recognition result candidate, the server-side voice recognition result candidate, and a detection result provided by the recognition result candidate comparator 205, to decide a voice recognition result.
    Type: Grant
    Filed: November 20, 2013
    Date of Patent: September 12, 2017
    Assignee: Mitsubishi Electric Corporation
    Inventors: Isamu Ogawa, Toshiyuki Hanazawa, Tomohiro Narita
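    A minimal sketch (Python; names invented, not from the patent) of the comparator's difference detection over server-side candidates; the integrator would then weigh client-side and server-side candidates at the flagged positions:

      def differing_positions(candidates):
          # candidates: list of server-side recognition result texts.
          # Flag token positions where the candidates disagree.
          tokens = [c.split() for c in candidates]
          n = min(len(t) for t in tokens)
          return [i for i in range(n) if len({t[i] for t in tokens}) > 1]

      server = ["turn on the light", "turn on the night"]
      print(differing_positions(server))   # [3]: resolve using both sides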
  • Patent number: 9761221
    Abstract: According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values. The method comprises, using at least one computer, performing acts of: providing a first vector as input to a first network layer comprising one or more network units of the neural network; transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit; sorting the plurality of values to produce a sorted plurality of values; and providing the sorted plurality of values as input to a second network layer of the neural network.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: September 12, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Steven John Rennie, Vaibhava Goel
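    The sorting step is the distinctive part of this abstract. A minimal sketch in Python/NumPy, assuming a single affine unit (all names and shapes invented):

      import numpy as np

      def sorting_layer(x, W, b):
          # Transform the input vector with the unit's weights, then sort
          # the resulting values before handing them to the second layer.
          values = W @ x + b
          return np.sort(values)

      rng = np.random.default_rng(0)
      x = rng.normal(size=16)                      # first vector (input)
      W, b = rng.normal(size=(8, 16)), np.zeros(8)
      hidden = sorting_layer(x, W, b)              # input to the second layer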
  • Patent number: 9754584
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes: receiving, by a device, for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal; processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector; and generating, for the respective variable length enrollment audio signal, a template fixed length representation, for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase, by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.
    Type: Grant
    Filed: November 8, 2016
    Date of Patent: September 5, 2017
    Assignee: Google Inc.
    Inventors: Maria Carolina Parada San Martin, Tara N. Sainath, Guoguo Chen
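    One plausible reading of the template step, sketched in Python/NumPy: combine at most k enrollment LSTM output vectors (here the last k, averaged) into a fixed-length template. The combination rule and the cosine test are assumptions, not the patent's method:

      import numpy as np

      def fixed_length_template(lstm_outputs, k):
          # lstm_outputs: (T, D) array, one output vector per enrollment
          # feature vector; combine at most k of them into one template.
          return lstm_outputs[-k:].mean(axis=0)

      def same_phrase(template, test_outputs, k, threshold=0.7):
          rep = test_outputs[-k:].mean(axis=0)
          cos = rep @ template / (np.linalg.norm(rep) * np.linalg.norm(template))
          return cos >= threshold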
  • Patent number: 9721562
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
    Type: Grant
    Filed: December 3, 2014
    Date of Patent: August 1, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior
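    A minimal sketch (Python/NumPy; invented names) of the feedback loop described above, where each step's input is modified by the previous step's output. A zero vector stands in for the missing previous output at step 0, which is an assumption, not the patent's handling of the initial step:

      import numpy as np

      OUT_DIM = 64   # assumed size of the model's per-step output

      def run_acoustic_model(features, model_step):
          # model_step maps [previous output ; current features] -> output.
          prev = np.zeros(OUT_DIM)
          outputs = []
          for x in features:
              prev = model_step(np.concatenate([prev, x]))  # modified input
              outputs.append(prev)
          return outputs   # the phoneme representation is derived from these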
  • Patent number: 9715660
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes training a deep neural network with a first training set by adjusting values for each of a plurality of weights included in the neural network, and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising providing the deep neural network with a second training set and adjusting the values for a first subset of the plurality of weights, wherein the second training set includes data representing the key features of the one or more keywords or key phrases.
    Type: Grant
    Filed: March 31, 2014
    Date of Patent: July 25, 2017
    Assignee: Google Inc.
    Inventors: Maria Carolina Parada San Martin, Guoguo Chen, Georg Heigold
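    A sketch of the two-stage idea in Python/NumPy: stage one updates every weight; stage two (keyword training) updates only a chosen subset, expressed here as a per-weight mask. The mask choice and optimizer are invented:

      import numpy as np

      def sgd_step(weights, grads, lr, mask):
          # mask is 1 where a weight may change, 0 where it stays frozen.
          return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]

      # Stage 1: mask of all ones (train the whole network on set one).
      # Stage 2: e.g. only the top layer adapts to the keyword features:
      #   mask = [np.zeros_like(w) for w in weights]; mask[-1][:] = 1.0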
  • Patent number: 9711143
    Abstract: A system and method for an integrated, multi-modal, multi-device natural language voice services environment may be provided. In particular, the environment may include a plurality of voice-enabled devices each having intent determination capabilities for processing multi-modal natural language inputs in addition to knowledge of the intent determination capabilities of other devices in the environment. Further, the environment may be arranged in a centralized manner, a distributed peer-to-peer manner, or various combinations thereof. As such, the various devices may cooperate to determine intent of multi-modal natural language inputs, and commands, queries, or other requests may be routed to one or more of the devices best suited to take action in response thereto.
    Type: Grant
    Filed: April 4, 2016
    Date of Patent: July 18, 2017
    Assignee: VoiceBox Technologies Corporation
    Inventors: Robert A. Kennewick, Chris Weider
  • Patent number: 9703769
    Abstract: A clausifier and method of extracting clauses for spoken language understanding are disclosed. The method relates to generating a set of clauses from speech utterance text and comprises inserting at least one boundary tag in speech utterance text related to sentence boundaries, inserting at least one edit tag indicating a portion of the speech utterance text to remove, and inserting at least one conjunction tag within the speech utterance text. The result is a set of clauses that may be identified within the speech utterance text according to the inserted boundary, edit, and conjunction tags. The disclosed clausifier comprises a sentence boundary classifier, an edit detector classifier, and a conjunction detector classifier. The clausifier may comprise a single classifier or a plurality of classifiers to perform the steps of identifying sentence boundaries, editing text, and identifying conjunctions within the text.
    Type: Grant
    Filed: October 7, 2015
    Date of Patent: July 11, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Srinivas Bangalore, Narendra K. Gupta, Mazin Gilbert
  • Patent number: 9697826
    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined based on the output of the deep neural network.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: July 4, 2017
    Assignee: Google Inc.
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
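    A sketch of the front end in Python/NumPy: each learned time-domain filter is convolved with each channel, and the per-channel outputs are combined (summed) per filter before entering the jointly trained DNN. Equal-length channels are assumed and filter learning is omitted:

      import numpy as np

      def conv_frontend(channels, filters):
          # channels: (C, N) waveforms; filters: (F, L) learned filters.
          combined = []
          for f in filters:
              outs = [np.convolve(ch, f, mode="valid") for ch in channels]
              combined.append(np.sum(outs, axis=0))  # combine over channels
          return np.stack(combined)                  # (F, N - L + 1) to the DNN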
  • Patent number: 9699550
    Abstract: A method includes generating a command at a first microphone and sending the command from the first microphone to a second microphone. The command is sent to the second microphone via a bus that is coupled to the first microphone and to the second microphone.
    Type: Grant
    Filed: November 12, 2014
    Date of Patent: July 4, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Joseph Robert Fitzgerald, Bengt Stefan Gustavsson, Louis Dominic Oliveira
  • Patent number: 9685155
    Abstract: A method distinguishes components of a signal by: processing the signal to estimate a set of analysis features, wherein each analysis feature defines an element of the signal and has feature values that represent parts of the signal; processing the signal to estimate input features of the signal; and processing the input features using a deep neural network to assign an associative descriptor to each element of the signal, wherein a degree of similarity between the associative descriptors of different elements is related to the degree to which the parts of the signal represented by the elements belong to a single component of the signal. The similarities between associative descriptors are processed to estimate correspondences between the elements of the signal and the components in the signal. Then, the signal is processed using the correspondences to distinguish component parts of the signal.
    Type: Grant
    Filed: May 5, 2016
    Date of Patent: June 20, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: John Hershey, Jonathan Le Roux, Shinji Watanabe, Zhuo Chen
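    A sketch of the last step only: once the network has assigned an associative descriptor to each element, elements with similar descriptors can be grouped into components by clustering. K-means is one assumed choice here; the patent's correspondence estimation may differ:

      import numpy as np
      from sklearn.cluster import KMeans

      def components_from_descriptors(descriptors, n_components):
          # descriptors: (elements, D), one per signal element; elements
          # with similar descriptors belong to the same component.
          unit = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
          return KMeans(n_clusters=n_components, n_init=10).fit_predict(unit)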
  • Patent number: 9678953
    Abstract: Computer-based systems and methods are disclosed for translation of a multi-media presentation (e.g., a lecture) along with the accompanying presentation materials. Translation and delivery of text-based presentation materials to a listener is annotated and aligned with audio, so that the listener can follow both the audio and the presentation material. The content of the presentation materials can be used to improve presentation translation, as it extracts supportive material from the presentation materials as they relate to the speech.
    Type: Grant
    Filed: January 5, 2015
    Date of Patent: June 13, 2017
    Assignee: Facebook, Inc.
    Inventor: Alexander Waibel
  • Patent number: 9674606
    Abstract: There is provided a signal processing device including a feature amount extraction unit configured to extract, from a frequency-domain signal obtained by frequency conversion on a voice signal, a feature amount of the frequency-domain signal, and a determination unit configured to determine, based on the extracted feature amount, presence or absence of noise in the voice signal within a predetermined section. The feature amount is composed of a plurality of elements. The plurality of elements contain an element defined based on a correlation value between a feature amount waveform which is a waveform according to the frequency-domain signal in the voice signal within the predetermined section and a feature amount waveform within another section sequential in time to the predetermined section.
    Type: Grant
    Filed: October 18, 2013
    Date of Patent: June 6, 2017
    Assignee: Sony Corporation
    Inventors: Keiichi Osako, Mototsugu Abe
  • Patent number: 9653066
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for estimating reliability of alternate speech recognition hypotheses. A system configured to practice the method receives an N-best list of speech recognition hypotheses and features describing the N-best list, determines a first probability of correctness for each hypothesis in the N-best list based on the received features, determines a second probability that the N-best list does not contain a correct hypothesis, and uses the first probability and the second probability in a spoken dialog. The features can describe properties of at least one of a lattice, a word confusion network, and a garbage model. In one aspect, the N-best lists are not reordered according to reranking scores. The determination of the first probability of correctness can include a first stage of training a probabilistic model and a second stage of distributing mass over items in a tail of the N-best list.
    Type: Grant
    Filed: October 23, 2009
    Date of Patent: May 16, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Jason Williams, Suhrid Balakrishnan
  • Patent number: 9632589
    Abstract: A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
    Type: Grant
    Filed: July 1, 2015
    Date of Patent: April 25, 2017
    Assignee: International Business Machines Corporation
    Inventors: Jonathan H. Connell, II, Etienne Marcheret
  • Patent number: 9627532
    Abstract: Methods and apparatus for training a multi-layer artificial neural network for use in speech recognition. The method comprises: determining, for a first speech pattern of a plurality of speech patterns, using a first processing pipeline, network activations for a plurality of nodes of the artificial neural network in response to providing the first speech pattern as input to the artificial neural network; determining, based at least in part on the network activations and a selection criterion, whether the artificial neural network should be trained on the first speech pattern; and updating, using a second processing pipeline, network weights between nodes of the artificial neural network based, at least in part, on the network activations when it is determined that the artificial neural network should be trained on the first speech pattern.
    Type: Grant
    Filed: June 18, 2014
    Date of Patent: April 18, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Roberto Gemello, Franco Mana, Dario Albesano
  • Patent number: 9626001
    Abstract: A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
    Type: Grant
    Filed: November 13, 2014
    Date of Patent: April 18, 2017
    Assignee: International Business Machines Corporation
    Inventors: Jonathan H. Connell, II, Etienne Marcheret
  • Patent number: 9626621
    Abstract: A method for training a deep neural network (DNN), comprises receiving and formatting speech data for the training, performing Hessian-free sequence training (HFST) on a first subset of a plurality of subsets of the speech data, and iteratively performing the HFST on successive subsets of the plurality of subsets of the speech data, wherein iteratively performing the HFST comprises reusing information from at least one previous iteration.
    Type: Grant
    Filed: July 7, 2015
    Date of Patent: April 18, 2017
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel
  • Patent number: 9620108
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
    Type: Grant
    Filed: December 2, 2014
    Date of Patent: April 11, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior
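    A sketch of one LSTM-with-projection step in Python/NumPy: the cell output h is projected down to a smaller r, and it is r that both recurs into the gates and feeds the per-step output scores. These are the standard LSTMP equations with an invented parameter packing, not code from the patent:

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def lstmp_step(x, r_prev, c_prev, p):
          # p["W"]: (4*cell, x_dim + proj); p["W_proj"]: (proj, cell)
          z = p["W"] @ np.concatenate([x, r_prev]) + p["b"]
          i, f, o, g = np.split(z, 4)
          c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
          h = sigmoid(o) * np.tanh(c)
          r = p["W_proj"] @ h          # recurrent projected output
          return r, c                  # scores_t = softmax(W_out @ r)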
  • Patent number: 9620145
    Abstract: The technology described herein can be embodied in a method that includes receiving an audio signal encoding a portion of an utterance, and providing, to a first neural network, data corresponding to the audio signal. The method also includes generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network. The first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
    Type: Grant
    Filed: May 20, 2014
    Date of Patent: April 11, 2017
    Assignee: Google Inc.
    Inventors: Michiel A. U. Bacchiani, David Rybach
  • Patent number: 9613619
    Abstract: A method for predicting a speech recognition quality of a phrase comprising at least one word includes: receiving, on a computer system including a processor and memory storing instructions, the phrase; computing, on the computer system, a set of features comprising one or more features corresponding to the phrase; providing the phrase to a prediction model on the computer system and receiving a predicted recognition quality value based on the set of features; and returning the predicted recognition quality value.
    Type: Grant
    Filed: October 30, 2013
    Date of Patent: April 4, 2017
    Assignee: GENESYS TELECOMMUNICATIONS LABORATORIES, INC.
    Inventors: Amir Lev-Tov, Avraham Faizakof, Yochai Konig
  • Patent number: 9607616
    Abstract: A spoken language understanding (SLU) system receives a sequence of words corresponding to one or more spoken utterances of a user, which is passed through a spoken language understanding module to produce a sequence of intentions. The sequence of words are passed through a first subnetwork of a multi-scale recurrent neural network (MSRNN), and the sequence of intentions are passed through a second subnetwork of the multi-scale recurrent neural network (MSRNN). Then, the outputs of the first subnetwork and the second subnetwork are combined to predict a goal of the user.
    Type: Grant
    Filed: August 17, 2015
    Date of Patent: March 28, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Shinji Watanabe, Yi Luan, Bret Harsham
  • Patent number: 9601109
    Abstract: A method for training a deep neural network, comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
    Type: Grant
    Filed: September 29, 2014
    Date of Patent: March 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath
  • Patent number: 9575936
    Abstract: Machine learning-based methods improve the knowledge extraction process in a specific domain or business environment, and then provide the extracted knowledge in a word cloud user interface display capable of summarizing and conveying a vast amount of information to a user very quickly. Based on the self-training mechanism developed by the inventors, the ontology programming automatically trains itself to understand the domain or environment of the communication data by processing and analyzing a defined corpus of communication data. The developed ontology can be applied to process a dataset of communication information to create a word cloud that can provide a quick view into the content of the dataset, including information about the language used by participants in the communications, such as identifying for a user key phrases and terms, the frequency of those phrases, the originator of the terms or phrases, and the confidence levels of such identifications.
    Type: Grant
    Filed: July 16, 2015
    Date of Patent: February 21, 2017
    Assignee: VERINT SYSTEMS LTD.
    Inventors: Roni Romano, Galia Zacay, Rahm Fehr
  • Patent number: 9542935
    Abstract: An embodiment of the present invention provides a method for realizing a voice recognition function, including: setting a correspondence between an attitude parameter of a mobile terminal body and a voice recognition mode (S10); and, if a gravity sensor in the mobile terminal detects that a change of the attitude parameter of the mobile terminal body satisfies a condition for switching the voice recognition mode, switching the voice recognition mode and performing voice recognition under the switched voice recognition mode (S20). By adaptively switching the voice recognition mode of the mobile terminal, the voice recognition function can free the hands of the user to the greatest extent and save power consumption. An apparatus for realizing the voice recognition function corresponding to the method is also disclosed.
    Type: Grant
    Filed: June 17, 2013
    Date of Patent: January 10, 2017
    Assignee: ZTE Corporation
    Inventor: Junxuan Lin
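    A sketch of the switching rule; the thresholds, mode names, and the pitch reading are all invented for illustration:

      def select_mode(pitch_deg, current_mode):
          # Attitude near horizontal -> hands-free mode; near vertical
          # (held to the ear) -> handset mode.
          new_mode = "handsfree" if pitch_deg < 30.0 else "handset"
          if new_mode != current_mode:
              print("switching voice recognition mode ->", new_mode)
          return new_mode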
  • Patent number: 9519632
    Abstract: Annotating web content, in one aspect, may include detecting a request to navigate to a web site for content on a web browser. A component such as a web browser plugin, extension or the like transmits a uniform resource locator (URL) associated with the web site to a computer-implemented service that stores annotations to the content separate from the web site that is providing the content, and receives from the computer-implemented service one or more annotations to the content. The web browser plugin or the like renders the one or more annotations within the content from the web site. The content rendered with the annotations may be displayed within a display window of the web browser.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: December 13, 2016
    Assignee: International Business Machines Corporation
    Inventors: Eric J. Barkie, Benjamin L. Fletcher, Andrew P. Wyskida
  • Patent number: 9520127
    Abstract: A framework for merging automatic speech recognition (ASR) systems having a shared deep neural network (DNN) feature transformation is provided. A received utterance may be evaluated to generate a DNN-derived feature from the top hidden layer of the DNN. The top hidden layer output may then be utilized to generate a network including a bottleneck layer and an output layer. Weights representing a feature dimension reduction may then be extracted between the top hidden layer and the bottleneck layer. Scores may then be generated and combined to merge the ASR systems that share the DNN feature transformation.
    Type: Grant
    Filed: April 29, 2014
    Date of Patent: December 13, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jinyu Li, Jian Xue, Yifan Gong
  • Patent number: 9508347
    Abstract: A method and a device for training a DNN model includes: at a device including one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
    Type: Grant
    Filed: December 16, 2013
    Date of Patent: November 29, 2016
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Eryu Wang, Li Lu, Xiang Zhang, Haibo Liu, Feng Rao, Lou Li, Shuai Yue, Bo Chen
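    A sketch of one training iteration in Python/NumPy: every worker starts from the same initial model, runs SGD on its own disjoint subset, and the sub-models are merged, here by simple averaging (the patent's merge rule may weight the sub-models differently):

      import numpy as np

      def train_iteration(model, subsets, sgd_update):
          # sgd_update(model_copy, subset) -> updated sub-model (flat array)
          subs = [sgd_update(model.copy(), s) for s in subsets]
          return np.mean(subs, axis=0)   # intermediate DNN model

      # Loop: feed the merged model back in as the next initial model
      # until the preset convergence condition holds.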
  • Patent number: 9484019
    Abstract: Disclosed herein is a method for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.
    Type: Grant
    Filed: October 11, 2012
    Date of Patent: November 1, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Mazin Gilbert, Alistair D. Conkie, Andrej Ljolje
  • Patent number: 9477752
    Abstract: A method for developing an ontology for processing communication data, wherein the ontology is a structural representation of language elements and the relationship between those language elements within the domain, includes providing a training set of communication data and processing the training set of communication data to identify terms within the training set of communication data, wherein a term is a word or short phrase. The method further includes utilizing the terms to identify relations within the training set of communication data, wherein a relation is a pair of terms that appear in proximity to one another. Finally, the terms in the relations are stored in a database.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: October 25, 2016
    Assignee: VERINT SYSTEMS INC.
    Inventor: Roni Romano
  • Patent number: 9460704
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting acoustic samples for speech synthesis. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
    Type: Grant
    Filed: September 6, 2013
    Date of Patent: October 4, 2016
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
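    A sketch of the selection step in Python/NumPy, assuming Euclidean distance (the abstract only says "a distance") and an invented record layout:

      import numpy as np

      def select_sample(target, stored):
          # stored: list of dicts with an acoustic-feature vector each;
          # pick the sample nearest the network's target features.
          d = [np.linalg.norm(target - s["features"]) for s in stored]
          return stored[int(np.argmin(d))]    # sample used for synthesis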
  • Patent number: 9443516
    Abstract: A method for far-field speech recognition can include determining a location for a plurality of sound recognition devices, communicatively coupling each of the plurality of sound recognition devices, adjusting a sound reception for the plurality of sound recognition devices to receive a voice command from a particular direction, and sending instructions to a device based on the voice command.
    Type: Grant
    Filed: January 9, 2014
    Date of Patent: September 13, 2016
    Assignee: Honeywell International Inc.
    Inventors: SrinivasaRao Katuri, Amit Kulkarni
  • Patent number: 9437195
    Abstract: A system includes a user speech profile stored on a computer readable storage device, the speech profile containing a plurality of phonemes with user identifying characteristics for the phonemes, and a speech processor coupled to access the speech profile to generate a phrase containing user distinguishing phonemes based on a difference between the user identifying characteristics for such phonemes and average user identifying characteristics, such that the phrase has discriminability from other users. The speech processor may also or alternatively select the phrase as a function of ambient noise.
    Type: Grant
    Filed: September 18, 2013
    Date of Patent: September 6, 2016
    Assignee: Lenovo (Singapore) Pte. Ltd.
    Inventors: John Weldon Nicholson, Steven Richard Perrin
  • Patent number: 9418656
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multi-stage hotword detection are disclosed. In one aspect, a method includes the actions of receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance. The actions further include determining a likelihood that the initial portion of the utterance includes a hotword. The actions further include determining that the likelihood that the initial portion of the utterance includes the hotword satisfies a threshold. The actions further include, in response to determining that the likelihood satisfies the threshold, transmitting a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance.
    Type: Grant
    Filed: March 13, 2015
    Date of Patent: August 16, 2016
    Assignee: Google Inc.
    Inventors: Jakob Nicolaus Foerster, Alexander H. Gruenstein, Diego Melendo Casado
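    A sketch of the second stage's control flow, with the first stage modeled as a Python generator of audio chunks and score_fn standing in for the hotword likelihood model (both assumptions, not the patent's interfaces):

      def second_stage(first_stage_chunks, score_fn, threshold):
          audio = b""
          for chunk in first_stage_chunks:
              audio += chunk
              if score_fn(audio) >= threshold:    # hotword likely present
                  first_stage_chunks.close()      # "cease providing" request
                  return True
          return False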
  • Patent number: 9390712
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Grant
    Filed: March 24, 2014
    Date of Patent: July 12, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Patent number: 9384731
    Abstract: Embodiments are disclosed that relate to identifying phonetically similar speech grammar terms during computer program development. For example, one disclosed embodiment provides a method including providing a speech grammar development tool configured to receive input of a text representation of each of a plurality of proposed speech grammar terms, convert each text representation to a phonetic representation of the speech grammar term, compare the phonetic representation of the speech grammar term to the phonetic representations of other speech grammar terms using a weighted similarity matrix, and provide an output regarding risk of confusion between two proposed speech grammar terms based upon a comparison of the phonetic representations of the two proposed speech grammar terms. The method further includes receiving data regarding incorrect speech grammar term identification, and modifying one or more weights in the weighted similarity matrix based upon the data.
    Type: Grant
    Filed: November 6, 2013
    Date of Patent: July 5, 2016
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Michael Tjalve, Pavan Karnam, Dennis Mooney
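    A sketch of the phonetic comparison as a weighted edit distance, with the weighted similarity matrix supplying substitution costs. This is one standard way to compare phonetic representations, assumed here rather than taken from the patent:

      def weighted_distance(phones_a, phones_b, sub_cost):
          # sub_cost[(p, q)]: weighted similarity-matrix entry, i.e. the
          # cost of confusing phoneme p with q; 1.0 assumed otherwise.
          m, n = len(phones_a), len(phones_b)
          d = [[0.0] * (n + 1) for _ in range(m + 1)]
          for i in range(1, m + 1): d[i][0] = float(i)
          for j in range(1, n + 1): d[0][j] = float(j)
          for i in range(1, m + 1):
              for j in range(1, n + 1):
                  c = sub_cost.get((phones_a[i-1], phones_b[j-1]), 1.0)
                  if phones_a[i-1] == phones_b[j-1]: c = 0.0
                  d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + c)
          return d[m][n]   # low distance -> risk of confusion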
  • Patent number: 9324321
    Abstract: The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition is provided. An utterance which includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices which are smaller than the original matrix. A square matrix may then be added to the new matrices. Speaker-specific parameters may then be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to all of a number of original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.
    Type: Grant
    Filed: March 7, 2014
    Date of Patent: April 26, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jian Xue, Jinyu Li, Dong Yu, Michael L. Seltzer, Yifan Gong
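    A sketch of the decomposition in Python/NumPy using an SVD, one common way to split a matrix into smaller factors (the patent does not mandate SVD): the square matrix S carries the speaker-specific parameters, and only S is updated during adaptation:

      import numpy as np

      def decompose(W, k):
          U, s, Vt = np.linalg.svd(W, full_matrices=False)
          A = U[:, :k] * s[:k]     # m x k
          B = Vt[:k, :]            # k x n
          S = np.eye(k)            # k x k speaker-specific square matrix
          return A, S, B           # W is approximated by A @ S @ B

      # Per-speaker adaptation touches only S: k*k values instead of m*n.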
  • Patent number: 9293134
    Abstract: A speech system may be configured to operate in conjunction with a stationary base device and a handheld remote device to receive voice commands from a user. Voice commands may be directed either to the base device or to the handheld device. When performing automatic speech recognition (ASR), natural language understanding (NLU), dialog management, text-to-speech (TTS) conversion, and other speech-related tasks, the system may utilize various models, including ASR models, NLU models, dialog models, and TTS models. Different models may be used depending on whether the user has chosen to speak into the base device or the handheld audio device. The different models may be designed to accommodate the different characteristics of audio and speech that are present in audio provided by the two different components and the different characteristics of the environmental situation of the user.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: March 22, 2016
    Assignee: Amazon Technologies, Inc.
    Inventors: Shirin Saleem, Shamitha Somashekar, Aimee Therese Piercy, Kurt Wesley Piersol, Marcello Typrin
  • Patent number: 9263036
    Abstract: Deep recurrent neural networks applied to speech recognition. The deep recurrent neural networks (RNNs) are preferably implemented by stacked long short-term memory bidirectional RNNs. The RNNs are trained using end-to-end training with suitable regularization.
    Type: Grant
    Filed: November 26, 2013
    Date of Patent: February 16, 2016
    Assignee: Google Inc.
    Inventor: Alexander B. Graves
  • Patent number: 9240181
    Abstract: An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment corresponding to a speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the transcription, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the transcription are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: January 19, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Aparna Khare, Neha Agrawal, Sachin S. Kajarekar, Matthias Paulik
  • Patent number: 9218335
    Abstract: Methods and systems for automated language detection for domain names are disclosed. In some embodiments, a method for detecting a language of an Internationalized Domain Name (IDN) comprises receiving, by an I/O interface, a string of characters for the IDN; receiving training data, including a plurality of multi-gram analyses for a set of languages; analyzing, by a processor, the string of characters based on the training data, wherein the analyzing includes extracting a set of multi-grams from the string of characters and comparing the extracted set of multi-grams with the training data; and detecting the language of the IDN based on results of the analyzing. In some embodiments, the method further comprises comparing the detected language of the IDN with a user selected language and using the IDN to generate a domain name, if the comparing indicates that the detected language of the IDN is consistent with the user selected language.
    Type: Grant
    Filed: October 10, 2012
    Date of Patent: December 22, 2015
    Assignee: VERISIGN, INC.
    Inventors: Ronald Andrew Hoskinson, Lambert Arians, Marc Anderson, Mahendra Jain
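    A sketch of the analysis step, with bigrams standing in for the multi-grams and a count-based score standing in for the comparison (both assumed simplifications):

      from collections import Counter

      def multigrams(s, n=2):
          return [s[i:i + n] for i in range(len(s) - n + 1)]

      def detect_language(idn_label, profiles):
          # profiles: language -> Counter of multi-grams from training data.
          grams = multigrams(idn_label)
          score = {lang: sum(c[g] for g in grams) for lang, c in profiles.items()}
          return max(score, key=score.get)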
  • Patent number: 9202470
    Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: December 1, 2015
    Assignee: BROADCOM CORPORATION
    Inventor: Nambirajan Seshadri
  • Patent number: 9202464
    Abstract: Methods and apparatus related to training speech recognition devices are presented. A computing device receives training samples for training a neural network to learn an acoustic speech model. A curriculum function for speech modeling can be determined. For each training sample of the training samples, a corresponding curriculum function value for the training sample can be determined using the curriculum function. The training samples can be ordered based on the corresponding curriculum function values. In some embodiments, the neural network can be trained utilizing the ordered training samples. The trained neural network can receive an input of a second plurality of samples corresponding to human speech, where the second plurality of samples differs from the training samples. In response to receiving the second plurality of samples, the trained neural network can generate a plurality of phones corresponding to the captured human speech.
    Type: Grant
    Filed: April 9, 2013
    Date of Patent: December 1, 2015
    Assignee: Google Inc.
    Inventors: Andrew William Senior, Marc'Aurelio Ranzato
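    The core of the curriculum idea is small enough to sketch: compute one curriculum function value per training sample and present the samples in that order. Ascending order ("easiest" first) is an assumption; the ordering criterion is the patent's curriculum function:

      def order_training_samples(samples, curriculum_fn):
          # One curriculum function value per sample; train in sorted order.
          return sorted(samples, key=curriculum_fn)

      # e.g. curriculum_fn could measure sample difficulty, so the acoustic
      # model sees easier speech samples before harder ones.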
  • Patent number: 9190053
    Abstract: A system and method for applying a convolutional neural network (CNN) to speech recognition. The CNN may provide input to a hidden Markov model and has at least one pair of a convolution layer and a pooling layer. The CNN operates along the frequency axis. The CNN has units that operate upon one or more local frequency bands of an acoustic signal. The CNN mitigates acoustic variation.
    Type: Grant
    Filed: March 25, 2013
    Date of Patent: November 17, 2015
    Assignee: THE GOVERNING COUNCIL OF THE UNIVERSITY OF TORONTO
    Inventors: Gerald Bradley Penn, Hui Jiang, Ossama Abdelhamid Mohamed Abdelhamid, Abdel-rahman Samir Abdel-rahman Mohamed
  • Patent number: 9190061
    Abstract: A data processing apparatus for detecting a probability of speech based on video data is disclosed. The data processing apparatus may include at least one processor, and a non-transitory computer-readable storage medium including instructions executable by the at least one processor, where execution of the instructions by the at least one processor causes the data processing apparatus to execute a visual speech detector. The visual speech detector may be configured to receive a coordinate-based signal. The coordinate-based signal may represent movement or lack of movement of at least one facial landmark of a person in a video signal. The visual speech detector may be configured to compute a probability of speech of the person based on the coordinate-based signal.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: November 17, 2015
    Assignee: Google Inc.
    Inventor: Mikhal Shemer
  • Patent number: 9190072
    Abstract: A system and method for noise reduction applied to a speech recognition front-end. An output of a front-end is optimized by giving, as a weight to the output for each band, a confidence index representing the remarkableness of the harmonic structure of observation speech. In a first method, when clean speech is estimated by executing MMSE estimation on a model that gives a probability distribution of noise-removed speech generated from observation speech, the posterior probability of the MMSE estimation is weighted using the confidence index as a weight. In a second method, linear interpolation is executed, for each band, between an observed value of observation speech and an estimated value of clean speech, with the confidence index serving as a weight. The first method and the second method can be combined.
    Type: Grant
    Filed: March 6, 2013
    Date of Patent: November 17, 2015
    Assignee: International Business Machines Corporation
    Inventor: Osamu Ichikawa
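    A sketch of the second method in Python/NumPy: per-band linear interpolation between the observed and estimated values, with the harmonic-structure confidence w in [0, 1] as the weight. Which endpoint w favors is a modeling choice; here w near 1 keeps the observation:

      import numpy as np

      def interpolate_bands(observed, clean_estimate, confidence):
          w = np.clip(confidence, 0.0, 1.0)     # one weight per band
          return w * observed + (1.0 - w) * clean_estimate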
  • Patent number: 9153231
    Abstract: Neural networks may be used in certain automatic speech recognition systems. To improve performance of these neural networks, they may be updated/retrained during run time by training the neural network based on the output of a speech recognition system or based on the output of the neural networks themselves. The outputs may include weighted outputs, lattices, weighted N-best lists, or the like. The neural networks may be acoustic model neural networks or language model neural networks. The neural networks may be retrained after each pass through the network, after each utterance, or in varying time scales.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: October 6, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Stan Weidner Salvador, Frederick Victor Weber
  • Patent number: 9135237
    Abstract: A system and method are described for generating semantically similar sentences for a statistical language model. A semantic class generator determines for each word in an input utterance a set of corresponding semantically similar words. A sentence generator computes a set of candidate sentences each containing at most one member from each set of semantically similar words. A sentence verifier grammatically tests each candidate sentence to determine a set of grammatically correct sentences semantically similar to the input utterance. Also note that the generated semantically similar sentences are not restricted to be selected from an existing sentence database.
    Type: Grant
    Filed: July 13, 2011
    Date of Patent: September 15, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Om D. Deshmukh, Sachindra Joshi, Shajith I. Mohamed, Ashish Verma
  • Patent number: 9092425
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for predicting probabilities of words for a language model. An exemplary system configured to practice the method receives a sequence of words and external data associated with the sequence of words and maps the sequence of words to an X-dimensional vector, corresponding to a vocabulary size. Then the system processes each X-dimensional vector, based on the external data, to generate respective Y-dimensional vectors, wherein each Y-dimensional vector represents a dense continuous space, and outputs at least one next word predicted to follow the sequence of words based on the respective Y-dimensional vectors. The X-dimensional vector, which is a binary sparse representation, can be higher dimensional than the Y-dimensional vector, which is a dense continuous space.
    Type: Grant
    Filed: December 8, 2010
    Date of Patent: July 28, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Piotr Wojciech Mirowski, Srinivas Bangalore, Suhrid Balakrishnan, Sumit Chopra
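    A sketch of the sparse-to-dense mapping: with a 1-of-X (one-hot) input, multiplying by a Y-by-X projection matrix reduces to picking one column, giving the Y-dimensional dense vector. Names and shapes are invented:

      import numpy as np

      def to_dense(word_index, E):
          # E: (Y, X) with Y << X; equivalent to E @ one_hot(word_index).
          return E[:, word_index]

      # A next-word model combines the dense vectors of the history (and
      # the external data) and scores all X vocabulary words.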
  • Patent number: 9047562
    Abstract: A plurality of pruning measures (PM) are calculated from a feature amount (CV) of input test data (TD); a plurality of isopycnic surfaces (EC) are plotted and set in a threshold space (SS); a threshold curved surface (SC), on which a decrease in at least one of the plurality of pruning measures (PM) causes an increase in at least one other, is generated using a portion of one isopycnic surface (EC) as a part; a hypothesis curved surface (HC) of subject data (CD) is generated in the threshold space (SS), with its intersection with the threshold curved surface (SC) set as a pruning threshold (PS); and a plurality of hypotheses of the subject data (CD) are pruned. This provides a data processing device in which at least one of the recognition speed and the recognition accuracy is higher than in the related art.
    Type: Grant
    Filed: December 2, 2010
    Date of Patent: June 2, 2015
    Assignee: NEC CORPORATION
    Inventors: Koji Okabe, Ken Hanazawa, Seiya Osada, Takayuki Arakawa, Daisuke Tanaka
  • Publication number: 20150149165
    Abstract: A method includes providing a deep neural network acoustic model, receiving audio data including one or more utterances of a speaker, extracting a plurality of speech recognition features from the one or more utterances of the speaker, creating a speaker identity vector for the speaker based on the extracted speech recognition features, and adapting the deep neural network acoustic model for automatic speech recognition using the extracted speech recognition features and the speaker identity vector.
    Type: Application
    Filed: September 29, 2014
    Publication date: May 28, 2015
    Inventor: George A. Saon