Neural Network Patents (Class 704/232)
-
Patent number: 9761228
Abstract: There are provided a recognition result candidate comparator 205 that compares a plurality of server-side voice recognition result candidates received by a receiver 204, to detect texts having a difference, and a recognition result integrator 206 that integrates a client-side voice recognition result candidate and a server-side voice recognition result candidate on the basis of the client-side voice recognition result candidate, the server-side voice recognition result candidate, and a detection result provided by the recognition result candidate comparator 205, to decide a voice recognition result.
Type: Grant
Filed: November 20, 2013
Date of Patent: September 12, 2017
Assignee: Mitsubishi Electric Corporation
Inventors: Isamu Ogawa, Toshiyuki Hanazawa, Tomohiro Narita
-
Patent number: 9761221
Abstract: According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values, the method comprising, using at least one computer, performing acts of: providing a first vector as input to a first network layer comprising one or more network units of the neural network; transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit; sorting the plurality of values to produce a sorted plurality of values; and providing the sorted plurality of values as input to a second network layer of the neural network.
Type: Grant
Filed: August 20, 2015
Date of Patent: September 12, 2017
Assignee: Nuance Communications, Inc.
Inventors: Steven John Rennie, Vaibhava Goel
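A minimal sketch of the sorting layer this abstract describes: a network unit transforms its input with learned weights, and the resulting values are sorted before being passed on. All shapes, names, and the affine transform are illustrative assumptions, not the patent's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def sort_layer(x, W, b):
    """Affine transform by the unit's weights, then sort the outputs."""
    values = W @ x + b       # the "plurality of values" from the weights
    return np.sort(values)   # sorted values fed to the second layer

x = rng.normal(size=4)       # the first vector provided as input
W = rng.normal(size=(5, 4))  # hypothetical first-unit weight values
b = np.zeros(5)

out = sort_layer(x, W, b)
# The sorted output is non-decreasing regardless of the input ordering.
assert np.all(np.diff(out) >= 0)
```

The sorting step is unusual in that it makes the layer's output order-invariant with respect to which unit produced which value, while preserving the multiset of activations.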
-
Patent number: 9754584
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short-term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short-term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.
Type: Grant
Filed: November 8, 2016
Date of Patent: September 5, 2017
Assignee: Google Inc.
Inventors: Maria Carolina Parada San Martin, Tara N. Sainath, Guoguo Chen
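The key idea above is turning a variable-length signal into a fixed-length template by combining at most k per-frame LSTM output vectors. The sketch below uses a simple average of the last k outputs as the combination rule; that rule, and the shapes, are illustrative guesses, not the patent's method.

```python
import numpy as np

def template_from_outputs(lstm_outputs, k):
    """Combine at most k LSTM output vectors into one fixed-length template."""
    selected = lstm_outputs[-k:]   # at most k of the output vectors
    return selected.mean(axis=0)   # fixed-length representation

# Two enrollment signals of different lengths yield same-size templates.
short = np.ones((3, 8))    # 3 frames of 8-dim LSTM outputs (k > frames is fine)
long_ = np.ones((50, 8))   # 50 frames

t1 = template_from_outputs(short, k=5)
t2 = template_from_outputs(long_, k=5)
assert t1.shape == t2.shape == (8,)
```

A fixed-length template lets enrollment and test utterances be compared with a single distance computation, regardless of how long either recording was.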
-
Patent number: 9721562
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
Type: Grant
Filed: December 3, 2014
Date of Patent: August 1, 2017
Assignee: Google Inc.
Inventors: Hasim Sak, Andrew W. Senior
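The feedback loop described above can be sketched with a stand-in "model" (a fixed linear map plus tanh) in place of a trained acoustic modeling network: at each step after the first, the previous output is concatenated with the current features to form the modified input. Dimensions and the concatenation rule are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
feat_dim, out_dim, steps = 6, 4, 5

W = rng.normal(size=(out_dim, feat_dim + out_dim)) * 0.1  # toy "network"
features = rng.normal(size=(steps, feat_dim))             # acoustic sequence

outputs = []
prev = np.zeros(out_dim)   # no prior output at the initial time step
for t in range(steps):
    modified_input = np.concatenate([features[t], prev])
    prev = np.tanh(W @ modified_input)   # output for this time step
    outputs.append(prev)

# One output per time step; phonemes would be derived from these outputs.
assert len(outputs) == steps
```

Feeding the previous output back in gives the network a recurrent memory of its own decisions without requiring an explicitly recurrent architecture.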
-
Patent number: 9715660
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes training a deep neural network with a first training set by adjusting values for each of a plurality of weights included in the neural network, and training the deep neural network to determine a probability that data received by the deep neural network has features similar to key features of one or more keywords or key phrases, the training comprising providing the deep neural network with a second training set and adjusting the values for a first subset of the plurality of weights, wherein the second training set includes data representing the key features of the one or more keywords or key phrases.
Type: Grant
Filed: March 31, 2014
Date of Patent: July 25, 2017
Assignee: Google Inc.
Inventors: Maria Carolina Parada San Martin, Guoguo Chen, Georg Heigold
-
Patent number: 9711143
Abstract: A system and method for an integrated, multi-modal, multi-device natural language voice services environment may be provided. In particular, the environment may include a plurality of voice-enabled devices each having intent determination capabilities for processing multi-modal natural language inputs in addition to knowledge of the intent determination capabilities of other devices in the environment. Further, the environment may be arranged in a centralized manner, a distributed peer-to-peer manner, or various combinations thereof. As such, the various devices may cooperate to determine intent of multi-modal natural language inputs, and commands, queries, or other requests may be routed to one or more of the devices best suited to take action in response thereto.
Type: Grant
Filed: April 4, 2016
Date of Patent: July 18, 2017
Assignee: VoiceBox Technologies Corporation
Inventors: Robert A. Kennewick, Chris Weider
-
Patent number: 9703769
Abstract: A clausifier and method of extracting clauses for spoken language understanding are disclosed. The method relates to generating a set of clauses from speech utterance text and comprises inserting at least one boundary tag in speech utterance text related to sentence boundaries, inserting at least one edit tag indicating a portion of the speech utterance text to remove, and inserting at least one conjunction tag within the speech utterance text. The result is a set of clauses that may be identified within the speech utterance text according to the inserted at least one boundary tag, at least one edit tag and at least one conjunction tag. The disclosed clausifier comprises a sentence boundary classifier, an edit detector classifier, and a conjunction detector classifier. The clausifier may comprise a single classifier or a plurality of classifiers to perform the steps of identifying sentence boundaries, editing text, and identifying conjunctions within the text.
Type: Grant
Filed: October 7, 2015
Date of Patent: July 11, 2017
Assignee: Nuance Communications, Inc.
Inventors: Srinivas Bangalore, Narendra K. Gupta, Mazin Gilbert
-
Patent number: 9697826
Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio waveform data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription that is determined for the utterance.
Type: Grant
Filed: July 8, 2016
Date of Patent: July 4, 2017
Assignee: Google Inc.
Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
-
Patent number: 9699550
Abstract: A method includes generating a command at a first microphone and sending the command from the first microphone to a second microphone. The command is sent to the second microphone via a bus that is coupled to the first microphone and to the second microphone.
Type: Grant
Filed: November 12, 2014
Date of Patent: July 4, 2017
Assignee: QUALCOMM Incorporated
Inventors: Joseph Robert Fitzgerald, Bengt Stefan Gustavsson, Louis Dominic Oliveira
-
Patent number: 9685155
Abstract: A method distinguishes components of a signal by processing the signal to estimate a set of analysis features, wherein each analysis feature defines an element of the signal and has feature values that represent parts of the signal, processing the signal to estimate input features of the signal, and processing the input features using a deep neural network to assign an associative descriptor to each element of the signal, wherein a degree of similarity between the associative descriptors of different elements is related to a degree to which the parts of the signal represented by the elements belong to a single component of the signal. The similarities between associative descriptors are processed to estimate correspondences between the elements of the signal and the components in the signal. Then, the signal is processed using the correspondences to distinguish component parts of the signal.
Type: Grant
Filed: May 5, 2016
Date of Patent: June 20, 2017
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: John Hershey, Jonathan Le Roux, Shinji Watanabe, Zhuo Chen
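A toy sketch of the associative-descriptor idea above: each signal element (e.g. a time-frequency bin) gets an embedding vector, and elements with similar embeddings are grouped into the same component, here via a few hand-rolled 2-means iterations. The embeddings are synthetic stand-ins, not outputs of a trained deep network.

```python
import numpy as np

rng = np.random.default_rng(4)
# 20 elements of component A near (1, 0), 20 of component B near (0, 1).
emb = np.vstack([rng.normal([1.0, 0.0], 0.05, size=(20, 2)),
                 rng.normal([0.0, 1.0], 0.05, size=(20, 2))])

centers = emb[[0, 39]]   # one seed descriptor from each component
for _ in range(5):       # a few k-means iterations on the descriptors
    labels = np.argmin(((emb[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([emb[labels == c].mean(0) for c in (0, 1)])

# Similar descriptors end up in the same estimated component.
assert set(labels[:20]) == {0} and set(labels[20:]) == {1}
```

Clustering in descriptor space rather than signal space is what lets the same network separate mixtures of arbitrary, even previously unseen, sources.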
-
Patent number: 9678953
Abstract: Computer-based systems and methods are disclosed for translation of a multi-media presentation (e.g., a lecture) along with the accompanying presentation materials. Translation and delivery of text-based presentation materials to a listener is annotated and aligned with audio, so that the listener can follow both the audio and the presentation material. The content of the presentation materials can be used to improve presentation translation, as it extracts supportive material from the presentation materials as they relate to the speech.
Type: Grant
Filed: January 5, 2015
Date of Patent: June 13, 2017
Assignee: Facebook, Inc.
Inventor: Alexander Waibel
-
Patent number: 9674606
Abstract: There is provided a signal processing device including a feature amount extraction unit configured to extract, from a frequency-domain signal obtained by frequency conversion on a voice signal, a feature amount of the frequency-domain signal, and a determination unit configured to determine, based on the extracted feature amount, presence or absence of noise in the voice signal within a predetermined section. The feature amount is composed of a plurality of elements. The plurality of elements contain an element defined based on a correlation value between a feature amount waveform which is a waveform according to the frequency-domain signal in the voice signal within the predetermined section and a feature amount waveform within another section sequential in time to the predetermined section.
Type: Grant
Filed: October 18, 2013
Date of Patent: June 6, 2017
Assignee: Sony Corporation
Inventors: Keiichi Osako, Mototsugu Abe
-
Patent number: 9653066
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for estimating reliability of alternate speech recognition hypotheses. A system configured to practice the method receives an N-best list of speech recognition hypotheses and features describing the N-best list, determines a first probability of correctness for each hypothesis in the N-best list based on the received features, determines a second probability that the N-best list does not contain a correct hypothesis, and uses the first probability and the second probability in a spoken dialog. The features can describe properties of at least one of a lattice, a word confusion network, and a garbage model. In one aspect, the N-best lists are not reordered according to reranking scores. The determination of the first probability of correctness can include a first stage of training a probabilistic model and a second stage of distributing mass over items in a tail of the N-best list.
Type: Grant
Filed: October 23, 2009
Date of Patent: May 16, 2017
Assignee: Nuance Communications, Inc.
Inventors: Jason Williams, Suhrid Balakrishnan
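The probabilities described above must form a proper distribution: per-hypothesis correctness probabilities plus the probability that no hypothesis is correct should sum to 1. The sketch below illustrates only that bookkeeping; the softmax scoring and the fixed `p_none` value are invented for illustration, not the patent's trained model.

```python
import numpy as np

def nbest_probabilities(scores, p_none=0.1):
    """Scale softmax scores so hypotheses plus 'none correct' sum to 1."""
    e = np.exp(scores - np.max(scores))          # numerically stable softmax
    p_correct = (1.0 - p_none) * e / e.sum()     # first probabilities
    return p_correct, p_none                     # p_none: second probability

scores = np.array([2.0, 1.0, 0.2])               # features -> hypothesis scores
p_correct, p_none = nbest_probabilities(scores)
assert abs(p_correct.sum() + p_none - 1.0) < 1e-9
```

Note the abstract's point that the N-best list is not reordered: the dialog manager consumes these probabilities as-is rather than reranking hypotheses by them.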
-
Patent number: 9632589
Abstract: A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
Type: Grant
Filed: July 1, 2015
Date of Patent: April 25, 2017
Assignee: International Business Machines Corporation
Inventors: Jonathan H. Connell, II, Etienne Marcheret
-
Patent number: 9627532
Abstract: Methods and apparatus for training a multi-layer artificial neural network for use in speech recognition. The method comprises determining for a first speech pattern of the plurality of speech patterns, using a first processing pipeline, network activations for a plurality of nodes of the artificial neural network in response to providing the first speech pattern as input to the artificial neural network, determining based, at least in part, on the network activations and a selection criterion, whether the artificial neural network should be trained on the first speech pattern, and updating, using a second processing pipeline, network weights between nodes of the artificial neural network based, at least in part, on the network activations when it is determined that the artificial neural network should be trained on the first speech pattern.
Type: Grant
Filed: June 18, 2014
Date of Patent: April 18, 2017
Assignee: Nuance Communications, Inc.
Inventors: Roberto Gemello, Franco Mana, Dario Albesano
-
Patent number: 9626001
Abstract: A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
Type: Grant
Filed: November 13, 2014
Date of Patent: April 18, 2017
Assignee: International Business Machines Corporation
Inventors: Jonathan H. Connell, II, Etienne Marcheret
-
Patent number: 9626621
Abstract: A method for training a deep neural network (DNN) comprises receiving and formatting speech data for the training, performing Hessian-free sequence training (HFST) on a first subset of a plurality of subsets of the speech data, and iteratively performing the HFST on successive subsets of the plurality of subsets of the speech data, wherein iteratively performing the HFST comprises reusing information from at least one previous iteration.
Type: Grant
Filed: July 7, 2015
Date of Patent: April 18, 2017
Assignee: International Business Machines Corporation
Inventors: Pierre Dognin, Vaibhava Goel
-
Patent number: 9620108
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
Type: Grant
Filed: December 2, 2014
Date of Patent: April 11, 2017
Assignee: Google Inc.
Inventors: Hasim Sak, Andrew W. Senior
-
Patent number: 9620145
Abstract: The technology described herein can be embodied in a method that includes receiving an audio signal encoding a portion of an utterance, and providing, to a first neural network, data corresponding to the audio signal. The method also includes generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network. The first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
Type: Grant
Filed: May 20, 2014
Date of Patent: April 11, 2017
Assignee: Google Inc.
Inventors: Michiel A. U. Bacchiani, David Rybach
-
Patent number: 9613619
Abstract: A method for predicting a speech recognition quality of a phrase comprising at least one word includes: receiving, on a computer system including a processor and memory storing instructions, the phrase; computing, on the computer system, a set of features comprising one or more features corresponding to the phrase; providing the phrase to a prediction model on the computer system and receiving a predicted recognition quality value based on the set of features; and returning the predicted recognition quality value.
Type: Grant
Filed: October 30, 2013
Date of Patent: April 4, 2017
Assignee: GENESYS TELECOMMUNICATIONS LABORATORIES, INC.
Inventors: Amir Lev-Tov, Avraham Faizakof, Yochai Konig
-
Patent number: 9607616
Abstract: A spoken language understanding (SLU) system receives a sequence of words corresponding to one or more spoken utterances of a user, which is passed through a spoken language understanding module to produce a sequence of intentions. The sequence of words is passed through a first subnetwork of a multi-scale recurrent neural network (MSRNN), and the sequence of intentions is passed through a second subnetwork of the multi-scale recurrent neural network (MSRNN). Then, the outputs of the first subnetwork and the second subnetwork are combined to predict a goal of the user.
Type: Grant
Filed: August 17, 2015
Date of Patent: March 28, 2017
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Shinji Watanabe, Yi Luan, Bret Harsham
-
Patent number: 9601109
Abstract: A method for training a deep neural network comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
Type: Grant
Filed: September 29, 2014
Date of Patent: March 21, 2017
Assignee: International Business Machines Corporation
Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath
-
Patent number: 9575936
Abstract: Machine learning-based methods improve the knowledge extraction process in a specific domain or business environment, and then provide that extracted knowledge in a word cloud user interface display capable of summarizing and conveying a vast amount of information to a user very quickly. Based on the self-training mechanism developed by the inventors, the ontology programming automatically trains itself to understand the domain or environment of the communication data by processing and analyzing a defined corpus of communication data. The developed ontology can be applied to process a dataset of communication information to create a word cloud that provides a quick view into the content of the dataset, including information about the language used by participants in the communications, such as identifying for a user key phrases and terms, the frequency of those phrases, the originator of the terms or phrases, and the confidence levels of such identifications.
Type: Grant
Filed: July 16, 2015
Date of Patent: February 21, 2017
Assignee: VERINT SYSTEMS LTD.
Inventors: Roni Romano, Galia Zacay, Rahm Fehr
-
Patent number: 9542935
Abstract: An embodiment of the present invention provides a method for realizing a voice recognition function, including: setting a correspondence between the attitude parameter of a mobile terminal body and a voice recognition mode (S10); and, if a gravity sensor in the mobile terminal detects that a change in the attitude parameter of the mobile terminal body satisfies a condition for switching the voice recognition mode, switching the voice recognition mode and performing voice recognition under the switched voice recognition mode (S20). By adaptively switching the voice recognition mode of the mobile terminal, the voice recognition function can free the hands of the user to the greatest extent and save power consumption. An apparatus for realizing the voice recognition function corresponding to the method is also disclosed.
Type: Grant
Filed: June 17, 2013
Date of Patent: January 10, 2017
Assignee: ZTE Corporation
Inventor: Junxuan Lin
-
Patent number: 9519632
Abstract: Annotating web content, in one aspect, may include detecting a request to navigate to a web site for content on a web browser. A component such as a web browser plugin, extension or the like transmits a uniform resource locator (URL) associated with the web site to a computer-implemented service that stores annotations to the content separate from the web site that is providing the content, and receives from the computer-implemented service one or more annotations to the content. The web browser plugin or the like renders the one or more annotations within the content from the web site. The content rendered with the annotations may be displayed within a display window of the web browser.
Type: Grant
Filed: December 22, 2015
Date of Patent: December 13, 2016
Assignee: International Business Machines Corporation
Inventors: Eric J. Barkie, Benjamin L. Fletcher, Andrew P. Wyskida
-
Patent number: 9520127
Abstract: Providing a framework for merging automatic speech recognition (ASR) systems having a shared deep neural network (DNN) feature transformation is provided. A received utterance may be evaluated to generate a DNN-derived feature from the top hidden layer of a DNN. The top hidden layer output may then be utilized to generate a network including a bottleneck layer and an output layer. Weights representing a feature dimension reduction may then be extracted between the top hidden layer and the bottleneck layer. Scores may then be generated and combined to merge the ASR systems which share the DNN feature transformation.
Type: Grant
Filed: April 29, 2014
Date of Patent: December 13, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jinyu Li, Jian Xue, Yifan Gong
-
Patent number: 9508347
Abstract: A method and a device for training a DNN model include: at a device including one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
Type: Grant
Filed: December 16, 2013
Date of Patent: November 29, 2016
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Eryu Wang, Li Lu, Xiang Zhang, Haibo Liu, Feng Rao, Lou Li, Shuai Yue, Bo Chen
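The split-train-merge loop above can be sketched on a toy problem: disjoint data subsets each produce a sub-model (here, a linear model's weight vector fit by a few SGD passes), and sub-models are merged by parameter averaging. The linear model, learning rate, and averaging rule are illustrative stand-ins for the patent's DNN training.

```python
import numpy as np

rng = np.random.default_rng(2)
true_w = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(300, 3))
y = X @ true_w                      # noiseless toy training corpus

def sgd_submodel(w0, Xs, ys, lr=0.05, epochs=20):
    """Update the initial model with SGD on one disjoint data subset."""
    w = w0.copy()
    for _ in range(epochs):
        for xi, yi in zip(Xs, ys):
            w -= lr * (xi @ w - yi) * xi   # SGD step on squared error
    return w

w_init = np.zeros(3)                              # initial model
subsets = np.array_split(np.arange(300), 4)       # disjoint data subsets
sub_models = [sgd_submodel(w_init, X[i], y[i]) for i in subsets]
merged = np.mean(sub_models, axis=0)              # intermediate model

assert merged.shape == (3,)
```

In the full scheme, `merged` would become `w_init` for the next iteration until a convergence condition is met; averaging works here because each sub-model starts from the same initial parameters.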
-
Patent number: 9484019
Abstract: Disclosed herein is a method for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.
Type: Grant
Filed: October 11, 2012
Date of Patent: November 1, 2016
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Mazin Gilbert, Alistair D. Conkie, Andrej Ljolje
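The normalization constraint above, weights summing to 1 at the unit-of-speech level, can be shown in a few lines. The words, pronunciation variants, and raw weights below are made-up examples, not data from the patent.

```python
# Raw (unnormalized) pronunciation weights per word; invented examples.
raw = {
    "tomato": {"t ah m ey t ow": 3.0, "t ah m aa t ow": 1.0},
    "either": {"iy dh er": 2.0, "ay dh er": 2.0},
}

# Normalize so the variants of each unit of speech sum to 1.
normalized = {
    word: {pron: w / sum(prons.values()) for pron, w in prons.items()}
    for word, prons in raw.items()
}

for prons in normalized.values():
    assert abs(sum(prons.values()) - 1.0) < 1e-9
```

Keeping each unit's weights a proper distribution is what lets the discriminative adaptation step trade probability mass between competing pronunciations without changing the unit's total likelihood contribution.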
-
Patent number: 9477752
Abstract: A method for developing an ontology for processing communication data, wherein the ontology is a structural representation of language elements and the relationships between those language elements within the domain, includes providing a training set of communication data and processing the training set of communication data to identify terms within the training set of communication data, wherein a term is a word or short phrase. The method further includes utilizing the terms to identify relations within the training set of communication data, wherein a relation is a pair of terms that appear in proximity to one another. Finally, the terms in the relations are stored in a database.
Type: Grant
Filed: September 30, 2014
Date of Patent: October 25, 2016
Assignee: VERINT SYSTEMS INC.
Inventor: Roni Romano
-
Patent number: 9460704
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing a representation based on structured data in resources. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
Type: Grant
Filed: September 6, 2013
Date of Patent: October 4, 2016
Assignee: Google Inc.
Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
-
Patent number: 9443516
Abstract: A method for far-field speech recognition can include determining a location for a plurality of sound recognition devices, communicatively coupling each of the plurality of sound recognition devices, adjusting a sound reception for the plurality of sound recognition devices to receive a voice command from a particular direction, and sending instructions to a device based on the voice command.
Type: Grant
Filed: January 9, 2014
Date of Patent: September 13, 2016
Assignee: Honeywell International Inc.
Inventors: SrinivasaRao Katuri, Amit Kulkarni
-
Patent number: 9437195
Abstract: A system includes a user speech profile stored on a computer readable storage device, the speech profile containing a plurality of phonemes with user identifying characteristics for the phonemes, and a speech processor coupled to access the speech profile to generate a phrase containing user distinguishing phonemes based on a difference between the user identifying characteristics for such phonemes and average user identifying characteristics, such that the phrase has discriminability from other users. The speech processor may also or alternatively select the phrase as a function of ambient noise.
Type: Grant
Filed: September 18, 2013
Date of Patent: September 6, 2016
Assignee: Lenovo (Singapore) Pte. Ltd.
Inventors: John Weldon Nicholson, Steven Richard Perrin
-
Patent number: 9418656
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multi-stage hotword detection are disclosed. In one aspect, a method includes the actions of receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance. The actions further include determining a likelihood that the initial portion of the utterance includes a hotword. The actions further include determining that the likelihood that the initial portion of the utterance includes the hotword satisfies a threshold. The actions further include, in response to determining that the likelihood satisfies the threshold, transmitting a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance.
Type: Grant
Filed: March 13, 2015
Date of Patent: August 16, 2016
Assignee: Google Inc.
Inventors: Jakob Nicolaus Foerster, Alexander H. Gruenstein, Diego Melendo Casado
-
Patent number: 9390712
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Grant
Filed: March 24, 2014
Date of Patent: July 12, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Patent number: 9384731
Abstract: Embodiments are disclosed that relate to identifying phonetically similar speech grammar terms during computer program development. For example, one disclosed embodiment provides a method including providing a speech grammar development tool configured to receive input of a text representation of each of a plurality of proposed speech grammar terms, convert each text representation to a phonetic representation of the speech grammar term, compare the phonetic representation of the speech grammar term to the phonetic representations of other speech grammar terms using a weighted similarity matrix, and provide an output regarding risk of confusion between two proposed speech grammar terms based upon a comparison of the phonetic representations of the two proposed speech grammar terms. The method further includes receiving data regarding incorrect speech grammar term identification, and modifying one or more weights in the weighted similarity matrix based upon the data.
Type: Grant
Filed: November 6, 2013
Date of Patent: July 5, 2016
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Michael Tjalve, Pavan Karnam, Dennis Mooney
-
Patent number: 9324321Abstract: The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition is provided. An utterance which includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices which are smaller than the original matrix. A square matrix may then be added to the new matrices. Speaker-specific parameters may then be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to all of a number of original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.Type: GrantFiled: March 7, 2014Date of Patent: April 26, 2016Assignee: Microsoft Technology Licensing, LLCInventors: Jian Xue, Jinyu Li, Dong Yu, Michael L. Seltzer, Yifan Gong
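The decomposition described above can be sketched with a truncated SVD: the original weight matrix is replaced by two thin factors plus a small square matrix (initialized to identity) that holds the speaker-specific parameters. Dimensions and rank here are assumptions for illustration, not values from the patent:

```python
import numpy as np

# Illustrative sketch: factor a DNN weight matrix with a truncated SVD
# into two smaller matrices, then insert a small square matrix
# (initialized to identity) holding the speaker-specific parameters.
# Only this k-by-k matrix would be updated during adaptation.

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))   # original layer weight matrix
k = 32                                # retained rank (assumption)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]                  # 512 x k
B = Vt[:k, :]                         # k x 512
S = np.eye(k)                         # speaker-specific square matrix

def adapted_layer(x, S):
    # With S = I this reproduces the rank-k approximation: x @ (A S B)
    return x @ A @ S @ B

# Speaker adaptation touches k*k values instead of 512*512
print(k * k, "adapted parameters vs", 512 * 512, "original")
```

Updating only the k-by-k matrix per speaker is what makes the adapted model far smaller to store than a full per-speaker copy of the network.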
-
Patent number: 9293134Abstract: A speech system may be configured to operate in conjunction with a stationary base device and a handheld remote device to receive voice commands from a user. Voice commands may be directed either to the base device or to the handheld device. When performing automatic speech recognition (ASR), natural language understanding (NLU), dialog management, text-to-speech (TTS) conversion, and other speech-related tasks, the system may utilize various models, including ASR models, NLU models, dialog models, and TTS models. Different models may be used depending on whether the user has chosen to speak into the base device or the handheld audio device. The different models may be designed to accommodate the different characteristics of audio and speech that are present in audio provided by the two different components and the different characteristics of the environmental situation of the user.Type: GrantFiled: September 30, 2014Date of Patent: March 22, 2016Assignee: Amazon Technologies, Inc.Inventors: Shirin Saleem, Shamitha Somashekar, Aimee Therese Piercy, Kurt Wesley Piersol, Marcello Typrin
-
Patent number: 9263036Abstract: Deep recurrent neural networks applied to speech recognition. The deep recurrent neural networks (RNNs) are preferably implemented by stacked long short-term memory bidirectional RNNs. The RNNs are trained using end-to-end training with suitable regularization.Type: GrantFiled: November 26, 2013Date of Patent: February 16, 2016Assignee: Google Inc.Inventor: Alexander B. Graves
-
Patent number: 9240181Abstract: An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment corresponding to the speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the transcription, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the transcription are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected.Type: GrantFiled: August 20, 2013Date of Patent: January 19, 2016Assignee: Cisco Technology, Inc.Inventors: Aparna Khare, Neha Agrawal, Sachin S. Kajarekar, Matthias Paulik
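The association step can be sketched as follows. The confidence threshold and the simple "low confidence implies likely name" rule are stand-ins for the patent's NER rules, and the data is invented:

```python
# Illustrative sketch: associate low-confidence ASR word regions with
# the speaker name from the surrounding SSR time segment. The threshold
# and the "likely name" rule below are assumptions for illustration.

ssr_segments = [  # (start, end, speaker name) from speaker segmentation
    (0.0, 5.0, "Alice"),
    (5.0, 10.0, "Bob"),
]

asr_regions = [  # (start, end, word, confidence)
    (1.2, 1.6, "hello", 0.95),
    (5.1, 5.5, "alyss", 0.40),   # low confidence: possibly a name
]

def speaker_at(t):
    for start, end, name in ssr_segments:
        if start <= t < end:
            return name
    return None

def likely_name_regions(regions, threshold=0.6):
    """Flag low-confidence regions and attach the enclosing speaker."""
    out = []
    for start, end, word, conf in regions:
        if conf < threshold:            # stand-in for the NER rules
            out.append((word, speaker_at(start)))
    return out

print(likely_name_regions(asr_regions))  # [('alyss', 'Bob')]
```

The claim also allows attaching the previous or next segment's speaker, which the sketch omits for brevity.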
-
Patent number: 9218335Abstract: Methods and systems for automated language detection for domain names are disclosed. In some embodiments, a method for detecting a language of an Internationalized Domain Name (IDN) comprises receiving, by an I/O interface, a string of characters for the IDN; receiving training data, including a plurality of multi-gram analyses for a set of languages; analyzing, by a processor, the string of characters based on the training data, wherein the analyzing includes extracting a set of multi-grams from the string of characters and comparing the extracted set of multi-grams with the training data; and detecting the language of the IDN based on results of the analyzing. In some embodiments, the method further comprises comparing the detected language of the IDN with a user selected language and using the IDN to generate a domain name, if the comparing indicates that the detected language of the IDN is consistent with the user selected language.Type: GrantFiled: October 10, 2012Date of Patent: December 22, 2015Assignee: VERISIGN, INC.Inventors: Ronald Andrew Hoskinson, Lambert Arians, Marc Anderson, Mahendra Jain
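The multi-gram comparison can be sketched by scoring a string against per-language character n-gram profiles. The tiny profiles below are invented for the example; in the patent they would come from training data:

```python
from collections import Counter

# Illustrative sketch: score a domain-name label against per-language
# character n-gram profiles. These toy profiles are made up for the
# example; real training data would supply them.

profiles = {
    "english": Counter({"th": 5, "he": 4, "in": 3, "er": 3}),
    "german":  Counter({"ch": 5, "ei": 4, "sch": 3, "en": 3}),
}

def ngrams(s, n):
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def detect_language(label):
    grams = ngrams(label, 2) + ngrams(label, 3)
    scores = {
        lang: sum(profile[g] for g in grams)
        for lang, profile in profiles.items()
    }
    return max(scores, key=scores.get)

print(detect_language("weather"))   # english: matches 'th', 'he', 'er'
print(detect_language("eichen"))    # german: matches 'ei', 'ch', 'en'
```

The final claimed step would then compare the detected language with the user's selected language before registering the domain name.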
-
Patent number: 9202470Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.Type: GrantFiled: January 31, 2013Date of Patent: December 1, 2015Assignee: BROADCOM CORPORATIONInventor: Nambirajan Seshadri
-
Patent number: 9202464Abstract: Methods and apparatus related to training speech recognition devices are presented. A computing device receives training samples for training a neural network to learn an acoustic speech model. A curriculum function for speech modeling can be determined. For each training sample of the training samples, a corresponding curriculum function value for the training sample can be determined using the curriculum function. The training samples can be ordered based on the corresponding curriculum function values. In some embodiments, the neural network can be trained utilizing the ordered training samples. The trained neural network can receive an input of a second plurality of samples corresponding to human speech, where the second plurality of samples differs from the training samples. In response to receiving the second plurality of samples, the trained neural network can generate a plurality of phones corresponding to the captured human speech.Type: GrantFiled: April 9, 2013Date of Patent: December 1, 2015Assignee: Google Inc.Inventors: Andrew William Senior, Marc'Aurelio Ranzato
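The ordering step can be sketched by computing a curriculum value per sample and sorting. The difficulty heuristic below (shorter, cleaner utterances are easier) is an assumption for illustration, not the patent's curriculum function:

```python
# Illustrative sketch: order training samples by a curriculum function
# so the network sees "easier" samples first. The difficulty measure
# here is a stand-in: shorter utterances with higher SNR score easier.

samples = [
    {"id": "a", "duration": 4.0, "snr_db": 5.0},
    {"id": "b", "duration": 1.5, "snr_db": 20.0},
    {"id": "c", "duration": 2.0, "snr_db": 12.0},
]

def curriculum_value(sample):
    # Lower value = easier; the weighting is an assumption.
    return sample["duration"] - 0.1 * sample["snr_db"]

ordered = sorted(samples, key=curriculum_value)
print([s["id"] for s in ordered])  # ['b', 'c', 'a']
```

Training would then feed `ordered` to the network in sequence, easiest samples first.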
-
Patent number: 9190053Abstract: A system and method for applying a convolutional neural network (CNN) to speech recognition. The CNN may provide input to a hidden Markov model and has at least one pair of a convolution layer and a pooling layer. The CNN operates along the frequency axis. The CNN has units that operate upon one or more local frequency bands of an acoustic signal. The CNN mitigates acoustic variation.Type: GrantFiled: March 25, 2013Date of Patent: November 17, 2015Assignee: THE GOVERNING COUNCIL OF THE UNIVERSITY OF TORONTOInventors: Gerald Bradley Penn, Hui Jiang, Ossama Abdelhamid Mohamed Abdelhamid, Abdel-rahman Samir Abdel-rahman Mohamed
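One convolution-plus-pooling pair along the frequency axis can be sketched as below. The feature dimensions, filter width, and random filter values are assumptions for illustration:

```python
import numpy as np

# Illustrative sketch: one convolution + max-pooling pair applied along
# the frequency axis of one frame of filterbank features, so each unit
# sees only a local band of frequencies. Filter values are random here;
# in practice they would be learned.

rng = np.random.default_rng(0)
frame = rng.standard_normal(40)      # 40 mel-frequency bins, one frame
filt = rng.standard_normal(8)        # filter spanning 8 adjacent bins

def conv_freq(x, w):
    """Valid 1-D convolution along the frequency axis."""
    n = len(x) - len(w) + 1
    return np.array([x[i:i + len(w)] @ w for i in range(n)])

def max_pool(x, size):
    """Non-overlapping max pooling, tolerating small frequency shifts."""
    trimmed = x[: len(x) // size * size]
    return trimmed.reshape(-1, size).max(axis=1)

feature_map = conv_freq(frame, filt)   # 33 local-band activations
pooled = max_pool(feature_map, 3)      # 11 pooled activations
print(feature_map.shape, pooled.shape)
```

Pooling along frequency is what gives the network some invariance to the spectral shifts that cause acoustic variation across speakers.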
-
Patent number: 9190061Abstract: A data processing apparatus for detecting a probability of speech based on video data is disclosed. The data processing apparatus may include at least one processor, and a non-transitory computer-readable storage medium including instructions executable by the at least one processor, where execution of the instructions by the at least one processor causes the data processing apparatus to execute a visual speech detector. The visual speech detector may be configured to receive a coordinate-based signal. The coordinate-based signal may represent movement or lack of movement of at least one facial landmark of a person in a video signal. The visual speech detector may be configured to compute a probability of speech of the person based on the coordinate-based signal.Type: GrantFiled: March 15, 2013Date of Patent: November 17, 2015Assignee: Google Inc.Inventor: Mikhal Shemer
-
Patent number: 9190072Abstract: A system and method for noise reduction applied to a speech recognition front-end. An output of a front-end is optimized by giving, as a weight to the output for each band, a confidence index representing the remarkableness of the harmonic structure of observation speech. In a first method, when clean speech is estimated by executing MMSE estimation on a model that gives a probability distribution of noise-removed speech generated from observation speech, the posterior probability of the MMSE estimation is weighted using the confidence index as a weight. In a second method, linear interpolation is executed, for each band, between an observed value of observation speech and an estimated value of clean speech, with the confidence index serving as a weight. The first method and the second method can be combined.Type: GrantFiled: March 6, 2013Date of Patent: November 17, 2015Assignee: International Business Machines CorporationInventor: Osamu Ichikawa
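The second method (per-band linear interpolation) can be sketched directly. The band values are invented, and which quantity the confidence index weights is an assumption here:

```python
import numpy as np

# Illustrative sketch of the second method: per-band linear
# interpolation between the observed spectrum and the clean-speech
# estimate, with a per-band confidence index in [0, 1] as the weight.
# The direction of the weighting is an assumption for illustration.

observed = np.array([1.0, 2.0, 3.0, 4.0])     # observed per-band values
clean_est = np.array([0.5, 1.0, 2.5, 3.0])    # model's clean estimate
confidence = np.array([1.0, 0.0, 0.5, 0.25])  # harmonic-structure index

# High confidence in a band's harmonic structure -> trust the
# observation; low confidence -> fall back to the clean estimate.
output = confidence * observed + (1.0 - confidence) * clean_est
print(output)  # [1.   1.   2.75 3.25]
```

The first method in the abstract instead uses the same index to weight the posterior probability inside the MMSE estimation, and the two can be combined.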
-
Patent number: 9153231Abstract: Neural networks may be used in certain automatic speech recognition systems. To improve performance of these neural networks, they may be updated/retrained during run time by training the neural network based on the output of a speech recognition system or based on the output of the neural networks themselves. The outputs may include weighted outputs, lattices, weighted N-best lists, or the like. The neural networks may be acoustic model neural networks or language model neural networks. The neural networks may be retrained after each pass through the network, after each utterance, or in varying time scales.Type: GrantFiled: March 15, 2013Date of Patent: October 6, 2015Assignee: Amazon Technologies, Inc.Inventors: Stan Weidner Salvador, Frederick Victor Weber
-
Patent number: 9135237Abstract: A system and method are described for generating semantically similar sentences for a statistical language model. A semantic class generator determines for each word in an input utterance a set of corresponding semantically similar words. A sentence generator computes a set of candidate sentences each containing at most one member from each set of semantically similar words. A sentence verifier grammatically tests each candidate sentence to determine a set of grammatically correct sentences semantically similar to the input utterance. Also note that the generated semantically similar sentences are not restricted to be selected from an existing sentence database.Type: GrantFiled: July 13, 2011Date of Patent: September 15, 2015Assignee: Nuance Communications, Inc.Inventors: Om D. Deshmukh, Sachindra Joshi, Shajith I. Mohamed, Ashish Verma
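The candidate-generation step (at most one member from each set of semantically similar words) is a Cartesian product, sketched below. The word sets and the trivial grammar check are invented stand-ins:

```python
from itertools import product

# Illustrative sketch: build candidate sentences by taking one member
# from each word's set of semantically similar words, then filter with
# a grammar check. The similarity sets below are made up, and
# is_grammatical is a stand-in for a real verifier (e.g. a parser).

similar = {
    "book": ["book", "reserve"],
    "a":    ["a"],
    "flight": ["flight", "ticket"],
}

utterance = ["book", "a", "flight"]

def candidate_sentences(words):
    sets = [similar.get(w, [w]) for w in words]
    return [" ".join(choice) for choice in product(*sets)]

def is_grammatical(sentence):
    return True   # stand-in for the sentence verifier

candidates = [s for s in candidate_sentences(utterance) if is_grammatical(s)]
print(candidates)
# ['book a flight', 'book a ticket', 'reserve a flight', 'reserve a ticket']
```

The surviving sentences would then augment the training data for the statistical language model.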
-
Patent number: 9092425Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for predicting probabilities of words for a language model. An exemplary system configured to practice the method receives a sequence of words and external data associated with the sequence of words and maps the sequence of words to an X-dimensional vector, corresponding to a vocabulary size. Then the system processes each X-dimensional vector, based on the external data, to generate respective Y-dimensional vectors, wherein each Y-dimensional vector represents a dense continuous space, and outputs at least one next word predicted to follow the sequence of words based on the respective Y-dimensional vectors. The X-dimensional vector, which is a binary sparse representation, can be higher dimensional than the Y-dimensional vector, which is a dense continuous space.Type: GrantFiled: December 8, 2010Date of Patent: July 28, 2015Assignee: AT&T Intellectual Property I, L.P.Inventors: Piotr Wojciech Mirowski, Srinivas Bangalore, Suhrid Balakrishnan, Sumit Chopra
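The X-to-Y mapping described above is an embedding: a sparse one-hot vector of vocabulary size projected into a small dense space. A minimal sketch, with toy dimensions and a random (rather than learned) projection:

```python
import numpy as np

# Illustrative sketch: map a one-hot (sparse, vocabulary-sized) word
# vector into a low-dimensional dense continuous space with a
# projection matrix, the kind of X -> Y mapping the abstract describes.

vocab = ["the", "cat", "sat"]
X = len(vocab)                        # sparse dimension = vocab size
Y = 2                                 # dense dimension (assumption)

rng = np.random.default_rng(0)
projection = rng.standard_normal((X, Y))  # would be learned in practice

def embed(word):
    one_hot = np.zeros(X)
    one_hot[vocab.index(word)] = 1.0
    return one_hot @ projection       # selects that word's dense row

vec = embed("cat")
print(vec.shape)                           # (2,)
print(np.allclose(vec, projection[1]))     # True: a row lookup
```

Because the one-hot input selects a single row, the projection is effectively a lookup table of dense word representations, over which next-word prediction operates.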
-
Patent number: 9047562Abstract: A plurality of pruning measures (PM) are calculated from a feature amount (CV) of test data (TD) which is input, a plurality of isopycnic surfaces (EC) are plotted and set on a threshold space (SS), a threshold curved surface (SC) in which a decrease in at least one of a plurality of pruning measures (PM) causes an increase in at least one thereof is generated using a portion of one isopycnic surface (EC) as a part, a hypothesis curved surface (HC) of subject data (CD) is generated on the threshold space (SS) to set a position intersecting the threshold curved surface (SC) to a pruning threshold (PS), and a plurality of hypotheses of the subject data (CD) are pruned. Thereby, there is provided a data processing device of which at least one of the recognition speed and the recognition accuracy is higher than in the related art.Type: GrantFiled: December 2, 2010Date of Patent: June 2, 2015Assignee: NEC CORPORATIONInventors: Koji Okabe, Ken Hanazawa, Seiya Osada, Takayuki Arakawa, Daisuke Tanaka
-
Publication number: 20150149165Abstract: A method includes providing a deep neural network acoustic model, receiving audio data including one or more utterances of a speaker, extracting a plurality of speech recognition features from the one or more utterances of the speaker, creating a speaker identity vector for the speaker based on the extracted speech recognition features, and adapting the deep neural network acoustic model for automatic speech recognition using the extracted speech recognition features and the speaker identity vector.Type: ApplicationFiled: September 29, 2014Publication date: May 28, 2015Inventor: George A. Saon
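Adapting the acoustic model with a speaker identity vector is commonly done by appending the same identity vector to every frame's features; a minimal sketch of that input construction, with assumed dimensions:

```python
import numpy as np

# Illustrative sketch: append a fixed speaker identity vector to every
# frame's acoustic features, giving the DNN a speaker-aware input.
# Dimensions are assumptions for illustration.

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 40))  # 200 frames x 40 features
ivector = rng.standard_normal(100)       # one identity vector / speaker

def speaker_aware_input(frames, ivector):
    tiled = np.tile(ivector, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)

net_input = speaker_aware_input(frames, ivector)
print(net_input.shape)  # (200, 140)
```

The network's input layer then sees 140 values per frame, and the constant identity portion lets it normalize away speaker-dependent variation.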