Neural Network Patents (Class 704/232)
  • Patent number: 9761228
    Abstract: There are provided a recognition result candidate comparator 205 that compares a plurality of server-side voice recognition result candidates received by a receiver 204, to detect texts having a difference, and a recognition result integrator 206 that integrates a client-side voice recognition result candidate and a server-side voice recognition result candidate on the basis of the client-side voice recognition result candidate, the server-side voice recognition result candidate, and a detection result provided by the recognition result candidate comparator 205, to decide a voice recognition result.
    Type: Grant
    Filed: November 20, 2013
    Date of Patent: September 12, 2017
    Assignee: Mitsubishi Electric Corporation
    Inventors: Isamu Ogawa, Toshiyuki Hanazawa, Tomohiro Narita
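    A minimal sketch (Python; names invented, not from the patent) of the comparator's difference detection over server-side candidates; the integrator would then weigh client-side and server-side candidates at the flagged positions:

      def differing_positions(candidates):
          # candidates: list of server-side recognition result texts.
          # Flag token positions where the candidates disagree.
          tokens = [c.split() for c in candidates]
          n = min(len(t) for t in tokens)
          return [i for i in range(n) if len({t[i] for t in tokens}) > 1]

      server = ["turn on the light", "turn on the night"]
      print(differing_positions(server))   # [3]: resolve using both sides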
  • Patent number: 9761221
    Abstract: According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values. The method comprises, using at least one computer, performing acts of: providing a first vector as input to a first network layer comprising one or more network units of the neural network; transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit; sorting the plurality of values to produce a sorted plurality of values; and providing the sorted plurality of values as input to a second network layer of the neural network.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: September 12, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Steven John Rennie, Vaibhava Goel
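    The sorting step is the distinctive part of this abstract. A minimal sketch in Python/NumPy, assuming a single affine unit (all names and shapes invented):

      import numpy as np

      def sorting_layer(x, W, b):
          # Transform the input vector with the unit's weights, then sort
          # the resulting values before handing them to the second layer.
          values = W @ x + b
          return np.sort(values)

      rng = np.random.default_rng(0)
      x = rng.normal(size=16)                      # first vector (input)
      W, b = rng.normal(size=(8, 16)), np.zeros(8)
      hidden = sorting_layer(x, W, b)              # input to the second layer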
  • Patent number: 9754584
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes: receiving, by a device, for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal; processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector; and generating, for the respective variable length enrollment audio signal, a template fixed length representation, for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase, by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.
    Type: Grant
    Filed: November 8, 2016
    Date of Patent: September 5, 2017
    Assignee: Google Inc.
    Inventors: Maria Carolina Parada San Martin, Tara N. Sainath, Guoguo Chen
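    One plausible reading of the template step, sketched in Python/NumPy: combine at most k enrollment LSTM output vectors (here the last k, averaged) into a fixed-length template. The combination rule and the cosine test are assumptions, not the patent's method:

      import numpy as np

      def fixed_length_template(lstm_outputs, k):
          # lstm_outputs: (T, D) array, one output vector per enrollment
          # feature vector; combine at most k of them into one template.
          return lstm_outputs[-k:].mean(axis=0)

      def same_phrase(template, test_outputs, k, threshold=0.7):
          rep = test_outputs[-k:].mean(axis=0)
          cos = rep @ template / (np.linalg.norm(rep) * np.linalg.norm(template))
          return cos >= threshold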
  • Patent number: 9721562
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
    Type: Grant
    Filed: December 3, 2014
    Date of Patent: August 1, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior
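    A minimal sketch (Python/NumPy; invented names) of the feedback loop described above, where each step's input is modified by the previous step's output. A zero vector stands in for the missing previous output at step 0, which is an assumption, not the patent's handling of the initial step:

      import numpy as np

      OUT_DIM = 64   # assumed size of the model's per-step output

      def run_acoustic_model(features, model_step):
          # model_step maps [previous output ; current features] -> output.
          prev = np.zeros(OUT_DIM)
          outputs = []
          for x in features:
              prev = model_step(np.concatenate([prev, x]))  # modified input
              outputs.append(prev)
          return outputs   # the phoneme representation is derived from these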
  • Patent number: 9715660
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes training a deep neural network with a first training set by adjusting values for each of a plurality of weights included in the neural network, and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising providing the deep neural network with a second training set and adjusting the values for a first subset of the plurality of weights, wherein the second training set includes data representing the key features of the one or more keywords or key phrases.
    Type: Grant
    Filed: March 31, 2014
    Date of Patent: July 25, 2017
    Assignee: Google Inc.
    Inventors: Maria Carolina Parada San Martin, Guoguo Chen, Georg Heigold
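    A sketch of the two-stage idea in Python/NumPy: stage one updates every weight; stage two (keyword training) updates only a chosen subset, expressed here as a per-weight mask. The mask choice and optimizer are invented:

      import numpy as np

      def sgd_step(weights, grads, lr, mask):
          # mask is 1 where a weight may change, 0 where it stays frozen.
          return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]

      # Stage 1: mask of all ones (train the whole network on set one).
      # Stage 2: e.g. only the top layer adapts to the keyword features:
      #   mask = [np.zeros_like(w) for w in weights]; mask[-1][:] = 1.0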
  • Patent number: 9711143
    Abstract: A system and method for an integrated, multi-modal, multi-device natural language voice services environment may be provided. In particular, the environment may include a plurality of voice-enabled devices each having intent determination capabilities for processing multi-modal natural language inputs in addition to knowledge of the intent determination capabilities of other devices in the environment. Further, the environment may be arranged in a centralized manner, a distributed peer-to-peer manner, or various combinations thereof. As such, the various devices may cooperate to determine intent of multi-modal natural language inputs, and commands, queries, or other requests may be routed to one or more of the devices best suited to take action in response thereto.
    Type: Grant
    Filed: April 4, 2016
    Date of Patent: July 18, 2017
    Assignee: VoiceBox Technologies Corporation
    Inventors: Robert A. Kennewick, Chris Weider
  • Patent number: 9703769
    Abstract: A clausifier and method of extracting clauses for spoken language understanding are disclosed. The method relates to generating a set of clauses from speech utterance text and comprises inserting at least one boundary tag in speech utterance text related to sentence boundaries, inserting at least one edit tag indicating a portion of the speech utterance text to remove, and inserting at least one conjunction tag within the speech utterance text. The result is a set of clauses that may be identified within the speech utterance text according to the inserted boundary, edit, and conjunction tags. The disclosed clausifier comprises a sentence boundary classifier, an edit detector classifier, and a conjunction detector classifier. The clausifier may comprise a single classifier or a plurality of classifiers to perform the steps of identifying sentence boundaries, editing text, and identifying conjunctions within the text.
    Type: Grant
    Filed: October 7, 2015
    Date of Patent: July 11, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Srinivas Bangalore, Narendra K. Gupta, Mazin Gilbert
  • Patent number: 9697826
    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined based on the output of the deep neural network.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: July 4, 2017
    Assignee: Google Inc.
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
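    A sketch of the front end in Python/NumPy: each learned time-domain filter is convolved with each channel, and the per-channel outputs are combined (summed) per filter before entering the jointly trained DNN. Equal-length channels are assumed and filter learning is omitted:

      import numpy as np

      def conv_frontend(channels, filters):
          # channels: (C, N) waveforms; filters: (F, L) learned filters.
          combined = []
          for f in filters:
              outs = [np.convolve(ch, f, mode="valid") for ch in channels]
              combined.append(np.sum(outs, axis=0))  # combine over channels
          return np.stack(combined)                  # (F, N - L + 1) to the DNN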
  • Patent number: 9699550
    Abstract: A method includes generating a command at a first microphone and sending the command from the first microphone to a second microphone. The command is sent to the second microphone via a bus that is coupled to the first microphone and to the second microphone.
    Type: Grant
    Filed: November 12, 2014
    Date of Patent: July 4, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Joseph Robert Fitzgerald, Bengt Stefan Gustavsson, Louis Dominic Oliveira
  • Patent number: 9685155
    Abstract: A method distinguishes components of a signal by: processing the signal to estimate a set of analysis features, wherein each analysis feature defines an element of the signal and has feature values that represent parts of the signal; processing the signal to estimate input features of the signal; and processing the input features using a deep neural network to assign an associative descriptor to each element of the signal, wherein a degree of similarity between the associative descriptors of different elements is related to the degree to which the parts of the signal represented by the elements belong to a single component of the signal. The similarities between associative descriptors are processed to estimate correspondences between the elements of the signal and the components in the signal. Then, the signal is processed using the correspondences to distinguish component parts of the signal.
    Type: Grant
    Filed: May 5, 2016
    Date of Patent: June 20, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: John Hershey, Jonathan Le Roux, Shinji Watanabe, Zhuo Chen
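    A sketch of the last step only: once the network has assigned an associative descriptor to each element, elements with similar descriptors can be grouped into components by clustering. K-means is one assumed choice here; the patent's correspondence estimation may differ:

      import numpy as np
      from sklearn.cluster import KMeans

      def components_from_descriptors(descriptors, n_components):
          # descriptors: (elements, D), one per signal element; elements
          # with similar descriptors belong to the same component.
          unit = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
          return KMeans(n_clusters=n_components, n_init=10).fit_predict(unit)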
  • Patent number: 9678953
    Abstract: Computer-based systems and methods are disclosed for translation of a multi-media presentation (e.g., a lecture) along with the accompanying presentation materials. Translation and delivery of text-based presentation materials to a listener is annotated and aligned with audio, so that the listener can follow both the audio and the presentation material. The content of the presentation materials can be used to improve presentation translation, as it extracts supportive material from the presentation materials as they relate to the speech.
    Type: Grant
    Filed: January 5, 2015
    Date of Patent: June 13, 2017
    Assignee: Facebook, Inc.
    Inventor: Alexander Waibel
  • Patent number: 9674606
    Abstract: There is provided a signal processing device including a feature amount extraction unit configured to extract, from a frequency-domain signal obtained by frequency conversion on a voice signal, a feature amount of the frequency-domain signal, and a determination unit configured to determine, based on the extracted feature amount, presence or absence of noise in the voice signal within a predetermined section. The feature amount is composed of a plurality of elements. The plurality of elements contain an element defined based on a correlation value between a feature amount waveform which is a waveform according to the frequency-domain signal in the voice signal within the predetermined section and a feature amount waveform within another section sequential in time to the predetermined section.
    Type: Grant
    Filed: October 18, 2013
    Date of Patent: June 6, 2017
    Assignee: Sony Corporation
    Inventors: Keiichi Osako, Mototsugu Abe
  • Patent number: 9653066
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for estimating reliability of alternate speech recognition hypotheses. A system configured to practice the method receives an N-best list of speech recognition hypotheses and features describing the N-best list, determines a first probability of correctness for each hypothesis in the N-best list based on the received features, determines a second probability that the N-best list does not contain a correct hypothesis, and uses the first probability and the second probability in a spoken dialog. The features can describe properties of at least one of a lattice, a word confusion network, and a garbage model. In one aspect, the N-best lists are not reordered according to reranking scores. The determination of the first probability of correctness can include a first stage of training a probabilistic model and a second stage of distributing mass over items in a tail of the N-best list.
    Type: Grant
    Filed: October 23, 2009
    Date of Patent: May 16, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Jason Williams, Suhrid Balakrishnan
  • Patent number: 9632589
    Abstract: A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
    Type: Grant
    Filed: July 1, 2015
    Date of Patent: April 25, 2017
    Assignee: International Business Machines Corporation
    Inventors: Jonathan H. Connell, II, Etienne Marcheret
  • Patent number: 9627532
    Abstract: Methods and apparatus for training a multi-layer artificial neural network for use in speech recognition. The method comprises: determining, for a first speech pattern of a plurality of speech patterns, using a first processing pipeline, network activations for a plurality of nodes of the artificial neural network in response to providing the first speech pattern as input to the artificial neural network; determining, based at least in part on the network activations and a selection criterion, whether the artificial neural network should be trained on the first speech pattern; and updating, using a second processing pipeline, network weights between nodes of the artificial neural network based, at least in part, on the network activations when it is determined that the artificial neural network should be trained on the first speech pattern.
    Type: Grant
    Filed: June 18, 2014
    Date of Patent: April 18, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Roberto Gemello, Franco Mana, Dario Albesano
  • Patent number: 9626001
    Abstract: A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
    Type: Grant
    Filed: November 13, 2014
    Date of Patent: April 18, 2017
    Assignee: International Business Machines Corporation
    Inventors: Jonathan H. Connell, II, Etienne Marcheret
  • Patent number: 9626621
    Abstract: A method for training a deep neural network (DNN), comprises receiving and formatting speech data for the training, performing Hessian-free sequence training (HFST) on a first subset of a plurality of subsets of the speech data, and iteratively performing the HFST on successive subsets of the plurality of subsets of the speech data, wherein iteratively performing the HFST comprises reusing information from at least one previous iteration.
    Type: Grant
    Filed: July 7, 2015
    Date of Patent: April 18, 2017
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel
  • Patent number: 9620108
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
    Type: Grant
    Filed: December 2, 2014
    Date of Patent: April 11, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior
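    A sketch of one LSTM-with-projection step in Python/NumPy: the cell output h is projected down to a smaller r, and it is r that both recurs into the gates and feeds the per-step output scores. These are the standard LSTMP equations with an invented parameter packing, not code from the patent:

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def lstmp_step(x, r_prev, c_prev, p):
          # p["W"]: (4*cell, x_dim + proj); p["W_proj"]: (proj, cell)
          z = p["W"] @ np.concatenate([x, r_prev]) + p["b"]
          i, f, o, g = np.split(z, 4)
          c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
          h = sigmoid(o) * np.tanh(c)
          r = p["W_proj"] @ h          # recurrent projected output
          return r, c                  # scores_t = softmax(W_out @ r)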
  • Patent number: 9620145
    Abstract: The technology described herein can be embodied in a method that includes receiving an audio signal encoding a portion of an utterance, and providing, to a first neural network, data corresponding to the audio signal. The method also includes generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network. The first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
    Type: Grant
    Filed: May 20, 2014
    Date of Patent: April 11, 2017
    Assignee: Google Inc.
    Inventors: Michiel A. U. Bacchiani, David Rybach
  • Patent number: 9613619
    Abstract: A method for predicting a speech recognition quality of a phrase comprising at least one word includes: receiving, on a computer system including a processor and memory storing instructions, the phrase; computing, on the computer system, a set of features comprising one or more features corresponding to the phrase; providing the phrase to a prediction model on the computer system and receiving a predicted recognition quality value based on the set of features; and returning the predicted recognition quality value.
    Type: Grant
    Filed: October 30, 2013
    Date of Patent: April 4, 2017
    Assignee: GENESYS TELECOMMUNICATIONS LABORATORIES, INC.
    Inventors: Amir Lev-Tov, Avraham Faizakof, Yochai Konig
  • Patent number: 9607616
    Abstract: A spoken language understanding (SLU) system receives a sequence of words corresponding to one or more spoken utterances of a user, which is passed through a spoken language understanding module to produce a sequence of intentions. The sequence of words are passed through a first subnetwork of a multi-scale recurrent neural network (MSRNN), and the sequence of intentions are passed through a second subnetwork of the multi-scale recurrent neural network (MSRNN). Then, the outputs of the first subnetwork and the second subnetwork are combined to predict a goal of the user.
    Type: Grant
    Filed: August 17, 2015
    Date of Patent: March 28, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Shinji Watanabe, Yi Luan, Bret Harsham
  • Patent number: 9601109
    Abstract: A method for training a deep neural network, comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
    Type: Grant
    Filed: September 29, 2014
    Date of Patent: March 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath
  • Patent number: 9575936
    Abstract: Machine learning-based methods improve the knowledge extraction process in a specific domain or business environment, and then provide the extracted knowledge in a word cloud user interface display capable of summarizing and conveying a vast amount of information to a user very quickly. Based on the self-training mechanism developed by the inventors, the ontology programming automatically trains itself to understand the domain or environment of the communication data by processing and analyzing a defined corpus of communication data. The developed ontology can be applied to process a dataset of communication information to create a word cloud that can provide a quick view into the content of the dataset, including information about the language used by participants in the communications, such as identifying for a user key phrases and terms, the frequency of those phrases, the originator of the terms or phrases, and the confidence levels of such identifications.
    Type: Grant
    Filed: July 16, 2015
    Date of Patent: February 21, 2017
    Assignee: VERINT SYSTEMS LTD.
    Inventors: Roni Romano, Galia Zacay, Rahm Fehr
  • Patent number: 9542935
    Abstract: An embodiment of the present invention provides a method for realizing a voice recognition function, including: setting a correspondence between an attitude parameter of a mobile terminal body and a voice recognition mode (S10); and, if a gravity sensor in the mobile terminal detects that a change of the attitude parameter of the mobile terminal body satisfies a condition for switching the voice recognition mode, switching the voice recognition mode and performing voice recognition under the switched voice recognition mode (S20). By adaptively switching the voice recognition mode of the mobile terminal, the voice recognition function can free the hands of the user to the greatest extent and save power consumption. An apparatus for realizing the voice recognition function corresponding to the method is also disclosed.
    Type: Grant
    Filed: June 17, 2013
    Date of Patent: January 10, 2017
    Assignee: ZTE Corporation
    Inventor: Junxuan Lin
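    A sketch of the switching rule; the thresholds, mode names, and the pitch reading are all invented for illustration:

      def select_mode(pitch_deg, current_mode):
          # Attitude near horizontal -> hands-free mode; near vertical
          # (held to the ear) -> handset mode.
          new_mode = "handsfree" if pitch_deg < 30.0 else "handset"
          if new_mode != current_mode:
              print("switching voice recognition mode ->", new_mode)
          return new_mode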
  • Patent number: 9519632
    Abstract: Annotating web content, in one aspect, may include detecting a request to navigate to a web site for content on a web browser. A component such as a web browser plugin, extension or the like transmits a uniform resource locator (URL) associated with the web site to a computer-implemented service that stores annotations to the content separate from the web site that is providing the content, and receives from the computer-implemented service one or more annotations to the content. The web browser plugin or the like renders the one or more annotations within the content from the web site. The content rendered with the annotations may be displayed within a display window of the web browser.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: December 13, 2016
    Assignee: International Business Machines Corporation
    Inventors: Eric J. Barkie, Benjamin L. Fletcher, Andrew P. Wyskida
  • Patent number: 9520127
    Abstract: A framework for merging automatic speech recognition (ASR) systems having a shared deep neural network (DNN) feature transformation is provided. A received utterance may be evaluated to generate a DNN-derived feature from the top hidden layer of the DNN. The top hidden layer output may then be utilized to generate a network including a bottleneck layer and an output layer. Weights representing a feature dimension reduction may then be extracted between the top hidden layer and the bottleneck layer. Scores may then be generated and combined to merge the ASR systems that share the DNN feature transformation.
    Type: Grant
    Filed: April 29, 2014
    Date of Patent: December 13, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jinyu Li, Jian Xue, Yifan Gong
  • Patent number: 9508347
    Abstract: A method and a device for training a DNN model includes: at a device including one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
    Type: Grant
    Filed: December 16, 2013
    Date of Patent: November 29, 2016
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Eryu Wang, Li Lu, Xiang Zhang, Haibo Liu, Feng Rao, Lou Li, Shuai Yue, Bo Chen
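    A sketch of one training iteration in Python/NumPy: every worker starts from the same initial model, runs SGD on its own disjoint subset, and the sub-models are merged, here by simple averaging (the patent's merge rule may weight the sub-models differently):

      import numpy as np

      def train_iteration(model, subsets, sgd_update):
          # sgd_update(model_copy, subset) -> updated sub-model (flat array)
          subs = [sgd_update(model.copy(), s) for s in subsets]
          return np.mean(subs, axis=0)   # intermediate DNN model

      # Loop: feed the merged model back in as the next initial model
      # until the preset convergence condition holds.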
  • Patent number: 9484019
    Abstract: Disclosed herein is a method for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.
    Type: Grant
    Filed: October 11, 2012
    Date of Patent: November 1, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Mazin Gilbert, Alistair D. Conkie, Andrej Ljolje
  • Patent number: 9477752
    Abstract: A method for developing an ontology for processing communication data, wherein the ontology is a structural representation of language elements and the relationship between those language elements within the domain, includes providing a training set of communication data and processing the training set of communication data to identify terms within the training set of communication data, wherein a term is a word or short phrase. The method further includes utilizing the terms to identify relations within the training set of communication data, wherein a relation is a pair of terms that appear in proximity to one another. Finally, the terms in the relations are stored in a database.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: October 25, 2016
    Assignee: VERINT SYSTEMS INC.
    Inventor: Roni Romano
  • Patent number: 9460704
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting acoustic samples for speech synthesis. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
    Type: Grant
    Filed: September 6, 2013
    Date of Patent: October 4, 2016
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
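    A sketch of the selection step in Python/NumPy, assuming Euclidean distance (the abstract only says "a distance") and an invented record layout:

      import numpy as np

      def select_sample(target, stored):
          # stored: list of dicts with an acoustic-feature vector each;
          # pick the sample nearest the network's target features.
          d = [np.linalg.norm(target - s["features"]) for s in stored]
          return stored[int(np.argmin(d))]    # sample used for synthesis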
  • Patent number: 9443516
    Abstract: A method for far-field speech recognition can include determining a location for a plurality of sound recognition devices, communicatively coupling each of the plurality of sound recognition devices, adjusting a sound reception for the plurality of sound recognition devices to receive a voice command from a particular direction, and sending instructions to a device based on the voice command.
    Type: Grant
    Filed: January 9, 2014
    Date of Patent: September 13, 2016
    Assignee: Honeywell International Inc.
    Inventors: SrinivasaRao Katuri, Amit Kulkarni
  • Patent number: 9437195
    Abstract: A system includes a user speech profile stored on a computer readable storage device, the speech profile containing a plurality of phonemes with user identifying characteristics for the phonemes, and a speech processor coupled to access the speech profile to generate a phrase containing user distinguishing phonemes based on a difference between the user identifying characteristics for such phonemes and average user identifying characteristics, such that the phrase has discriminability from other users. The speech processor may also or alternatively select the phrase as a function of ambient noise.
    Type: Grant
    Filed: September 18, 2013
    Date of Patent: September 6, 2016
    Assignee: Lenovo (Singapore) Pte. Ltd.
    Inventors: John Weldon Nicholson, Steven Richard Perrin
  • Patent number: 9418656
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multi-stage hotword detection are disclosed. In one aspect, a method includes the actions of receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance. The actions further include determining a likelihood that the initial portion of the utterance includes a hotword. The actions further include determining that the likelihood that the initial portion of the utterance includes the hotword satisfies a threshold. The actions further include, in response to determining that the likelihood satisfies the threshold, transmitting a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance.
    Type: Grant
    Filed: March 13, 2015
    Date of Patent: August 16, 2016
    Assignee: Google Inc.
    Inventors: Jakob Nicolaus Foerster, Alexander H. Gruenstein, Diego Melendo Casado
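    A sketch of the second stage's control flow, with the first stage modeled as a Python generator of audio chunks and score_fn standing in for the hotword likelihood model (both assumptions, not the patent's interfaces):

      def second_stage(first_stage_chunks, score_fn, threshold):
          audio = b""
          for chunk in first_stage_chunks:
              audio += chunk
              if score_fn(audio) >= threshold:    # hotword likely present
                  first_stage_chunks.close()      # "cease providing" request
                  return True
          return False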
  • Patent number: 9390712
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Grant
    Filed: March 24, 2014
    Date of Patent: July 12, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Patent number: 9384731
    Abstract: Embodiments are disclosed that relate to identifying phonetically similar speech grammar terms during computer program development. For example, one disclosed embodiment provides a method including providing a speech grammar development tool configured to receive input of a text representation of each of a plurality of proposed speech grammar terms, convert each text representation to a phonetic representation of the speech grammar term, compare the phonetic representation of the speech grammar term to the phonetic representations of other speech grammar terms using a weighted similarity matrix, and provide an output regarding risk of confusion between two proposed speech grammar terms based upon a comparison of the phonetic representations of the two proposed speech grammar terms. The method further includes receiving data regarding incorrect speech grammar term identification, and modifying one or more weights in the weighted similarity matrix based upon the data.
    Type: Grant
    Filed: November 6, 2013
    Date of Patent: July 5, 2016
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Michael Tjalve, Pavan Karnam, Dennis Mooney
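    A sketch of the phonetic comparison as a weighted edit distance, with the weighted similarity matrix supplying substitution costs. This is one standard way to compare phonetic representations, assumed here rather than taken from the patent:

      def weighted_distance(phones_a, phones_b, sub_cost):
          # sub_cost[(p, q)]: weighted similarity-matrix entry, i.e. the
          # cost of confusing phoneme p with q; 1.0 assumed otherwise.
          m, n = len(phones_a), len(phones_b)
          d = [[0.0] * (n + 1) for _ in range(m + 1)]
          for i in range(1, m + 1): d[i][0] = float(i)
          for j in range(1, n + 1): d[0][j] = float(j)
          for i in range(1, m + 1):
              for j in range(1, n + 1):
                  c = sub_cost.get((phones_a[i-1], phones_b[j-1]), 1.0)
                  if phones_a[i-1] == phones_b[j-1]: c = 0.0
                  d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + c)
          return d[m][n]   # low distance -> risk of confusion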
  • Patent number: 9324321
    Abstract: The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition is provided. An utterance which includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices which are smaller than the original matrix. A square matrix may then be added to the new matrices. Speaker-specific parameters may then be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to all of a number of original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.
    Type: Grant
    Filed: March 7, 2014
    Date of Patent: April 26, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jian Xue, Jinyu Li, Dong Yu, Michael L. Seltzer, Yifan Gong
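    A sketch of the decomposition in Python/NumPy using an SVD, one common way to split a matrix into smaller factors (the patent does not mandate SVD): the square matrix S carries the speaker-specific parameters, and only S is updated during adaptation:

      import numpy as np

      def decompose(W, k):
          U, s, Vt = np.linalg.svd(W, full_matrices=False)
          A = U[:, :k] * s[:k]     # m x k
          B = Vt[:k, :]            # k x n
          S = np.eye(k)            # k x k speaker-specific square matrix
          return A, S, B           # W is approximated by A @ S @ B

      # Per-speaker adaptation touches only S: k*k values instead of m*n.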
  • Patent number: 9293134
    Abstract: A speech system may be configured to operate in conjunction with a stationary base device and a handheld remote device to receive voice commands from a user. Voice commands may be directed either to the base device or to the handheld device. When performing automatic speech recognition (ASR), natural language understanding (NLU), dialog management, text-to-speech (TTS) conversion, and other speech-related tasks, the system may utilize various models, including ASR models, NLU models, dialog models, and TTS models. Different models may be used depending on whether the user has chosen to speak into the base device or the handheld audio device. The different models may be designed to accommodate the different characteristics of audio and speech that are present in audio provided by the two different components and the different characteristics of the environmental situation of the user.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: March 22, 2016
    Assignee: Amazon Technologies, Inc.
    Inventors: Shirin Saleem, Shamitha Somashekar, Aimee Therese Piercy, Kurt Wesley Piersol, Marcello Typrin
  • Patent number: 9263036
    Abstract: Deep recurrent neural networks applied to speech recognition. The deep recurrent neural networks (RNNs) are preferably implemented by stacked long short-term memory bidirectional RNNs. The RNNs are trained using end-to-end training with suitable regularization.
    Type: Grant
    Filed: November 26, 2013
    Date of Patent: February 16, 2016
    Assignee: Google Inc.
    Inventor: Alexander B. Graves
  • Patent number: 9240181
    Abstract: An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment corresponding to a speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the transcription, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the transcription are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: January 19, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Aparna Khare, Neha Agrawal, Sachin S. Kajarekar, Matthias Paulik
  • Patent number: 9218335
    Abstract: Methods and systems for automated language detection for domain names are disclosed. In some embodiments, a method for detecting a language of an Internationalized Domain Name (IDN) comprises receiving, by an I/O interface, a string of characters for the IDN; receiving training data, including a plurality of multi-gram analyses for a set of languages; analyzing, by a processor, the string of characters based on the training data, wherein the analyzing includes extracting a set of multi-grams from the string of characters and comparing the extracted set of multi-grams with the training data; and detecting the language of the IDN based on results of the analyzing. In some embodiments, the method further comprises comparing the detected language of the IDN with a user selected language and using the IDN to generate a domain name, if the comparing indicates that the detected language of the IDN is consistent with the user selected language.
    Type: Grant
    Filed: October 10, 2012
    Date of Patent: December 22, 2015
    Assignee: VERISIGN, INC.
    Inventors: Ronald Andrew Hoskinson, Lambert Arians, Marc Anderson, Mahendra Jain
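    A sketch of the analysis step, with bigrams standing in for the multi-grams and a count-based score standing in for the comparison (both assumed simplifications):

      from collections import Counter

      def multigrams(s, n=2):
          return [s[i:i + n] for i in range(len(s) - n + 1)]

      def detect_language(idn_label, profiles):
          # profiles: language -> Counter of multi-grams from training data.
          grams = multigrams(idn_label)
          score = {lang: sum(c[g] for g in grams) for lang, c in profiles.items()}
          return max(score, key=score.get)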
  • Patent number: 9202470
    Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: December 1, 2015
    Assignee: BROADCOM CORPORATION
    Inventor: Nambirajan Seshadri
  • Patent number: 9202464
    Abstract: Methods and apparatus related to training speech recognition devices are presented. A computing device receives training samples for training a neural network to learn an acoustic speech model. A curriculum function for speech modeling can be determined. For each training sample of the training samples, a corresponding curriculum function value for the training sample can be determined using the curriculum function. The training samples can be ordered based on the corresponding curriculum function values. In some embodiments, the neural network can be trained utilizing the ordered training samples. The trained neural network can receive an input of a second plurality of samples corresponding to human speech, where the second plurality of samples differs from the training samples. In response to receiving the second plurality of samples, the trained neural network can generate a plurality of phones corresponding to the captured human speech.
    Type: Grant
    Filed: April 9, 2013
    Date of Patent: December 1, 2015
    Assignee: Google Inc.
    Inventors: Andrew William Senior, Marc'Aurelio Ranzato
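    The core of the curriculum idea is small enough to sketch: compute one curriculum function value per training sample and present the samples in that order. Ascending order ("easiest" first) is an assumption; the ordering criterion is the patent's curriculum function:

      def order_training_samples(samples, curriculum_fn):
          # One curriculum function value per sample; train in sorted order.
          return sorted(samples, key=curriculum_fn)

      # e.g. curriculum_fn could measure sample difficulty, so the acoustic
      # model sees easier speech samples before harder ones.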
  • Patent number: 9190053
    Abstract: A system and method for applying a convolutional neural network (CNN) to speech recognition. The CNN may provide input to a hidden Markov model and has at least one pair of a convolution layer and a pooling layer. The CNN operates along the frequency axis. The CNN has units that operate upon one or more local frequency bands of an acoustic signal. The CNN mitigates acoustic variation.
    Type: Grant
    Filed: March 25, 2013
    Date of Patent: November 17, 2015
    Assignee: THE GOVERNING COUNCIL OF THE UNIVERSITY OF TORONTO
    Inventors: Gerald Bradley Penn, Hui Jiang, Ossama Abdelhamid Mohamed Abdelhamid, Abdel-rahman Samir Abdel-rahman Mohamed
  • Patent number: 9190061
    Abstract: A data processing apparatus for detecting a probability of speech based on video data is disclosed. The data processing apparatus may include at least one processor, and a non-transitory computer-readable storage medium including instructions executable by the at least one processor, where execution of the instructions by the at least one processor causes the data processing apparatus to execute a visual speech detector. The visual speech detector may be configured to receive a coordinate-based signal. The coordinate-based signal may represent movement or lack of movement of at least one facial landmark of a person in a video signal. The visual speech detector may be configured to compute a probability of speech of the person based on the coordinate-based signal.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: November 17, 2015
    Assignee: Google Inc.
    Inventor: Mikhal Shemer
  • Patent number: 9190072
    Abstract: A system and method for noise reduction applied to a speech recognition front-end. An output of a front-end is optimized by giving, as a weight to the output for each band, a confidence index representing the remarkableness of the harmonic structure of observation speech. In a first method, when clean speech is estimated by executing MMSE estimation on a model that gives a probability distribution of noise-removed speech generated from observation speech, the posterior probability of the MMSE estimation is weighted using the confidence index as a weight. In a second method, linear interpolation is executed, for each band, between an observed value of observation speech and an estimated value of clean speech, with the confidence index serving as a weight. The first method and the second method can be combined.
    Type: Grant
    Filed: March 6, 2013
    Date of Patent: November 17, 2015
    Assignee: International Business Machines Corporation
    Inventor: Osamu Ichikawa
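    A sketch of the second method in Python/NumPy: per-band linear interpolation between the observed and estimated values, with the harmonic-structure confidence w in [0, 1] as the weight. Which endpoint w favors is a modeling choice; here w near 1 keeps the observation:

      import numpy as np

      def interpolate_bands(observed, clean_estimate, confidence):
          w = np.clip(confidence, 0.0, 1.0)     # one weight per band
          return w * observed + (1.0 - w) * clean_estimate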
  • Patent number: 9153231
    Abstract: Neural networks may be used in certain automatic speech recognition systems. To improve performance of these neural networks, they may be updated/retrained during run time by training the neural network based on the output of a speech recognition system or based on the output of the neural networks themselves. The outputs may include weighted outputs, lattices, weighted N-best lists, or the like. The neural networks may be acoustic model neural networks or language model neural networks. The neural networks may be retrained after each pass through the network, after each utterance, or in varying time scales.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: October 6, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Stan Weidner Salvador, Frederick Victor Weber
  • Patent number: 9135237
    Abstract: A system and method are described for generating semantically similar sentences for a statistical language model. A semantic class generator determines for each word in an input utterance a set of corresponding semantically similar words. A sentence generator computes a set of candidate sentences each containing at most one member from each set of semantically similar words. A sentence verifier grammatically tests each candidate sentence to determine a set of grammatically correct sentences semantically similar to the input utterance. Also note that the generated semantically similar sentences are not restricted to be selected from an existing sentence database.
    Type: Grant
    Filed: July 13, 2011
    Date of Patent: September 15, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Om D. Deshmukh, Sachindra Joshi, Shajith I. Mohamed, Ashish Verma
  • Patent number: 9092425
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for predicting probabilities of words for a language model. An exemplary system configured to practice the method receives a sequence of words and external data associated with the sequence of words and maps the sequence of words to an X-dimensional vector, corresponding to a vocabulary size. Then the system processes each X-dimensional vector, based on the external data, to generate respective Y-dimensional vectors, wherein each Y-dimensional vector represents a dense continuous space, and outputs at least one next word predicted to follow the sequence of words based on the respective Y-dimensional vectors. The X-dimensional vector, which is a binary sparse representation, can be higher dimensional than the Y-dimensional vector, which is a dense continuous space.
    Type: Grant
    Filed: December 8, 2010
    Date of Patent: July 28, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Piotr Wojciech Mirowski, Srinivas Bangalore, Suhrid Balakrishnan, Sumit Chopra
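    A sketch of the sparse-to-dense mapping: with a 1-of-X (one-hot) input, multiplying by a Y-by-X projection matrix reduces to picking one column, giving the Y-dimensional dense vector. Names and shapes are invented:

      import numpy as np

      def to_dense(word_index, E):
          # E: (Y, X) with Y << X; equivalent to E @ one_hot(word_index).
          return E[:, word_index]

      # A next-word model combines the dense vectors of the history (and
      # the external data) and scores all X vocabulary words.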
  • Patent number: 9047562
    Abstract: A plurality of pruning measures (PM) are calculated from a feature amount (CV) of input test data (TD); a plurality of isopycnic surfaces (EC) are plotted and set in a threshold space (SS); a threshold curved surface (SC), on which a decrease in at least one of the plurality of pruning measures (PM) causes an increase in at least one other, is generated using a portion of one isopycnic surface (EC) as a part; a hypothesis curved surface (HC) of subject data (CD) is generated in the threshold space (SS), with its intersection with the threshold curved surface (SC) set as a pruning threshold (PS); and a plurality of hypotheses of the subject data (CD) are pruned. This provides a data processing device in which at least one of the recognition speed and the recognition accuracy is higher than in the related art.
    Type: Grant
    Filed: December 2, 2010
    Date of Patent: June 2, 2015
    Assignee: NEC CORPORATION
    Inventors: Koji Okabe, Ken Hanazawa, Seiya Osada, Takayuki Arakawa, Daisuke Tanaka
  • Publication number: 20150149165
    Abstract: A method includes providing a deep neural network acoustic model, receiving audio data including one or more utterances of a speaker, extracting a plurality of speech recognition features from the one or more utterances of the speaker, creating a speaker identity vector for the speaker based on the extracted speech recognition features, and adapting the deep neural network acoustic model for automatic speech recognition using the extracted speech recognition features and the speaker identity vector.
    Type: Application
    Filed: September 29, 2014
    Publication date: May 28, 2015
    Inventor: George A. Saon