Neural Network Patents (Class 704/232)
-
Patent number: 10269342
Abstract: A speech recognition system used in a workflow receives and analyzes speech input to recognize and accept a user's response to a task. Under certain conditions, a user's response might be expected. In these situations, the expected response may modify the behavior of the speech recognition system to improve recognition accuracy. For example, if the hypothesis of a user's response matches the expected response then there is a high probability that the user's response was recognized correctly. An expected response may include expected words and wildcard words. Wildcard words represent any recognized word in a user's response. By including wildcard words in the expected response, the speech recognition system may make modifications based on a wide range of user responses.
Type: Grant
Filed: October 29, 2014
Date of Patent: April 23, 2019
Assignee: HAND HELD PRODUCTS, INC.
Inventors: Keith Braho, Jason M Makay
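The wildcard mechanism described in this abstract can be illustrated with a small sketch. All names here are illustrative assumptions, not the patented implementation: an expected response is a token list in which a wildcard slot matches any single recognized word.

```python
# Sketch of expected-response matching with wildcard words.
# Function and token names are illustrative, not from the patent.
WILDCARD = "*"

def matches_expected(hypothesis, expected):
    """Return True when every hypothesis word equals the expected word,
    or the expected slot is a wildcard (matches any recognized word)."""
    if len(hypothesis) != len(expected):
        return False
    return all(e == WILDCARD or h == e
               for h, e in zip(hypothesis, expected))

# A match signals high confidence that the response was recognized correctly.
print(matches_expected(["pick", "three", "items"], ["pick", "*", "items"]))  # True
print(matches_expected(["pick", "three"], ["pick", "*", "items"]))           # False
```

A recognizer could use a positive match to relax its rejection threshold, since agreement with the expected response implies the utterance was likely decoded correctly.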
-
Patent number: 10268671
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating parse trees for input text segments. One of the methods includes obtaining an input text segment comprising a plurality of inputs arranged according to an input order; processing the inputs in the input text segment using an encoder long short term memory (LSTM) neural network to generate a respective encoder hidden state for each input in the input text segment; and processing the respective encoder hidden states for the inputs in the input text segment using an attention-based decoder LSTM neural network to generate a linearized representation of a parse tree for the input text segment.
Type: Grant
Filed: December 30, 2016
Date of Patent: April 23, 2019
Assignee: Google LLC
Inventors: Lukasz Mieczyslaw Kaiser, Oriol Vinyals
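A "linearized representation of a parse tree" is a flat token sequence a decoder can emit one symbol at a time. This sketch shows one common depth-first bracketing scheme; the tree encoding and token format are illustrative assumptions, not the patent's exact scheme.

```python
# Sketch: linearizing a parse tree into a bracket-token sequence, the kind
# of target an attention-based decoder LSTM would generate token by token.
# Tree format (label, children) and the token scheme are assumptions.
def linearize(tree):
    """Depth-first linearization of (label, children) into open/close tokens."""
    label, children = tree
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)
    return tokens

tree = ("S", [("NP", []), ("VP", [("V", []), ("NP", [])])])
print(" ".join(linearize(tree)))
# (S (NP )NP (VP (V )V (NP )NP )VP )S
```

Because the mapping is invertible, a model that predicts this token sequence implicitly predicts the full tree, which is what lets a sequence-to-sequence architecture do parsing.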
-
Patent number: 10262644
Abstract: In an application that manipulates audio (or audiovisual) content, automated music creation technologies may be employed to generate new musical content, using digital signal processing software hosted on handheld and/or server (or cloud-based) compute platforms to intelligently process and combine a set of audio content captured and submitted by users of modern mobile phones or other handheld compute platforms. The user-submitted recordings may contain speech, singing, musical instruments, or a wide variety of other sound sources, and the recordings may optionally be preprocessed by the handheld devices prior to submission.
Type: Grant
Filed: December 31, 2014
Date of Patent: April 16, 2019
Assignee: Smule, Inc.
Inventors: Randal Leistikow, Mark Godfrey, Ian S. Simon, Jeannie Yang, Michael W. Allen
-
Patent number: 10254760
Abstract: Systems and methods are provided for automatically self-correcting, or correcting in real time, one or more neural networks after detecting a triggering event or a breach of boundary conditions. Such a triggering event may indicate incorrect output signal or data being generated by the one or more neural networks. In particular, machine controllers of the invention limit the operations of neural networks to be within boundary conditions. Autonomous machines of the invention can be self-corrected after a breach of a boundary condition is detected. Autonomous land vehicles of the invention are capable of determining the timing of automatic transition to manual control from automated driving mode. The controller of the invention filters and saves input-output data sets that fall within boundary conditions for later training of neural networks. The controllers of the invention include security architectures to prevent damage from virus attacks or system malfunctions.
Type: Grant
Filed: June 4, 2018
Date of Patent: April 9, 2019
Assignee: Apex Artificial Intelligence Industries, Inc.
Inventor: Kenneth A. Abeloe
-
Patent number: 10242665
Abstract: Systems and methods are provided for automatically self-correcting, or correcting in real time, one or more neural networks after detecting a triggering event or a breach of boundary conditions. Such a triggering event may indicate incorrect output signal or data being generated by the one or more neural networks. In particular, machine controllers of the invention limit the operations of neural networks to be within boundary conditions. Autonomous machines of the invention can be self-corrected after a breach of a boundary condition is detected. Autonomous land vehicles of the invention are capable of determining the timing of automatic transition to manual control from automated driving mode. The controller of the invention filters and saves input-output data sets that fall within boundary conditions for later training of neural networks. The controllers of the invention include security architectures to prevent damage from virus attacks or system malfunctions.
Type: Grant
Filed: June 4, 2018
Date of Patent: March 26, 2019
Assignee: Apex Artificial Intelligence Industries, Inc.
Inventor: Kenneth A. Abeloe
-
Patent number: 10235992
Abstract: The technology of the present application provides a speech recognition system with at least two different speech recognition engines, or a single speech recognition engine with at least two different modes of operation. The first speech recognition engine is used to match audio to text, where the text may be words or phrases. The matched audio and text is used by a training module to train a user profile for a natural language speech recognition engine, which is at least one of the two different speech recognition engines or modes. An evaluation module evaluates when the user profile is sufficiently trained to convert the speech recognition engine from the first speech recognition engine or mode to the natural language speech recognition engine or mode.
Type: Grant
Filed: July 27, 2017
Date of Patent: March 19, 2019
Assignee: nVoq Incorporated
Inventors: Charles Corfield, Brian Marquette
-
Patent number: 10235994
Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g. native, non-native), speech channels (e.g. mobile, Bluetooth, desktop, etc.), speech application scenarios (e.g. voice search, short message dictation, etc.), and speaker variation (e.g. individual speakers or clustered speakers). The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
Type: Grant
Filed: June 30, 2016
Date of Patent: March 19, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
-
Patent number: 10229672
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models and using the trained acoustic models. A connectionist temporal classification (CTC) acoustic model is accessed, the CTC acoustic model having been trained using a context-dependent state inventory generated from approximate phonetic alignments determined by another CTC acoustic model trained without fixed alignment targets. Audio data for a portion of an utterance is received. Input data corresponding to the received audio data is provided to the accessed CTC acoustic model. Data indicating a transcription for the utterance is generated based on output that the accessed CTC acoustic model produced in response to the input data. The data indicating the transcription is provided as output of an automated speech recognition service.
Type: Grant
Filed: January 3, 2017
Date of Patent: March 12, 2019
Assignee: Google LLC
Inventors: Kanury Kanishka Rao, Andrew W. Senior, Hasim Sak
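CTC models like the one in this abstract emit one label (or a blank) per frame, and a deterministic collapsing rule recovers the label sequence. A minimal sketch of that standard rule (the blank symbol and function name are illustrative):

```python
# Sketch of the standard CTC output-collapsing rule: merge consecutive
# repeated frame labels, then drop the blank symbol.
BLANK = "_"

def ctc_collapse(frame_labels):
    """Collapse per-frame CTC labels into the underlying label sequence."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out

print(ctc_collapse(list("__hh_ee_ll_ll_oo__")))  # ['h', 'e', 'l', 'l', 'o']
```

Note that a blank between two identical labels (as between the two "l" runs above) is what allows CTC to emit genuine doubled letters.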
-
Patent number: 10217052
Abstract: The disclosure is directed to evaluating feature vectors using decision trees. Typically, the number of feature vectors and the number of decision trees are very high, which prevents loading them into a processor cache. The feature vectors are evaluated by processing them repeatedly across disjoint subsets of trees. After loading the feature vectors into the cache, they are evaluated across a first subset of trees, then across a second subset of trees, and so on. If the values based on the first and second subsets satisfy a specified criterion, further evaluation of the feature vectors across the remaining decision trees is terminated, thereby minimizing the number of trees evaluated and therefore the consumption of computing resources.
Type: Grant
Filed: April 29, 2015
Date of Patent: February 26, 2019
Assignee: Facebook, Inc.
Inventors: Oleksandr Kuvshynov, Aleksandar Ilic
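The staged evaluation with early termination described above can be sketched as follows. The trees are stand-in scoring callables and the stopping criterion is a simple threshold; both are illustrative assumptions, not the patented criterion.

```python
# Sketch of staged ensemble evaluation: score a feature vector against
# disjoint subsets of trees and stop once a criterion is satisfied.
# "Trees" here are plain scoring callables; names are illustrative.
def staged_score(feature, tree_subsets, threshold):
    """Accumulate scores subset by subset; terminate early when the
    running total already exceeds `threshold`."""
    total, evaluated = 0.0, 0
    for subset in tree_subsets:
        for tree in subset:
            total += tree(feature)
            evaluated += 1
        if total > threshold:   # criterion satisfied: skip remaining subsets
            break
    return total, evaluated

# Two subsets of two "trees" each; the first subset already suffices.
subsets = [[lambda x: x, lambda x: x], [lambda x: x, lambda x: x]]
print(staged_score(3.0, subsets, threshold=5.0))  # (6.0, 2)
```

The second returned value shows how many trees were actually evaluated, which is the quantity the early stopping is designed to minimize.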
-
Patent number: 10170107
Abstract: An approach to extending the recognizable labels of a label recognizer makes use of an encoding of linguistic inputs and label attributes into comparable vectors. The encodings may be determined with artificial neural networks (ANNs) that are jointly trained, and a comparison between the encoding of a sentence input and the encoding of an intent attribute vector may use a fixed function, which does not have to be trained. The encoding of label attributes can generalize, permitting the addition of a new label via corresponding attributes, thereby avoiding the need to immediately retrain a label recognizer with example inputs.
Type: Grant
Filed: December 29, 2016
Date of Patent: January 1, 2019
Assignee: Amazon Technologies, Inc.
Inventors: Markus Dreyer, Pavankumar Reddy Muddireddy, Anjishnu Kumar
-
Patent number: 10163454
Abstract: A method is provided for training a Deep Neural Network (DNN) for acoustic modeling in speech recognition. The method includes reading central frames and side frames as input frames from a memory. The side frames are preceding side frames preceding the central frames and/or succeeding side frames succeeding the central frames. The method further includes executing pre-training for only the central frames, or for both the central frames and the side frames, and fine-tuning for the central frames and the side frames, so as to emphasize connections between acoustic features in the central frames and units of the bottom layer in the hidden layers of the DNN.
Type: Grant
Filed: October 26, 2017
Date of Patent: December 25, 2018
Assignee: International Business Machines Corporation
Inventor: Gakuto Kurata
-
Patent number: 10152968
Abstract: Systems and methods for speech-based monitoring and/or control of automation devices are described. A speech-based method for monitoring and/or control of automation devices may include steps of determining a type of automation device to which first speech relates based, at least in part, on a location associated with the first speech; selecting a topic-specific speech recognition model adapted to recognize speech related to the determined type of automation device; using the topic-specific speech recognition model to recognize second speech provided at the location, wherein recognizing the second speech comprises identifying a query or command relating to the type of automation device and represented by the second speech; and issuing the query or command represented by the second speech to an automation device of the determined type.
Type: Grant
Filed: June 27, 2016
Date of Patent: December 11, 2018
Assignee: Iconics, Inc.
Inventors: Russell L. Agrusa, Vojtech Kresl, Christopher N. Elsbree, Marco Tagliaferri, Lukas Volf
-
Patent number: 10147442
Abstract: A neural network acoustic model is trained to be robust and produce accurate output when used to process speech signals having acoustic interference. The neural network acoustic model can be trained using a source-separation process by which, in addition to producing the main acoustic model output for a given input, the neural network generates predictions of the separate speech and interference portions of the input. The parameters of the neural network can be adjusted to jointly optimize all three outputs (e.g., the main acoustic model output, the speech signal prediction, and the interference signal prediction), rather than only optimizing the main acoustic model output. Once trained, output layers for the speech and interference signal predictions can be removed from the neural network or otherwise disabled.
Type: Grant
Filed: September 29, 2015
Date of Patent: December 4, 2018
Assignee: Amazon Technologies, Inc.
Inventors: Sankaran Panchapagesan, Shiva Kumar Sundaram, Arindam Mandal
-
Patent number: 10140981
Abstract: Features are disclosed for performing speech recognition on utterances using dynamic weights with speech recognition models. An automatic speech recognition system may use a general speech recognition model, such as a large finite state transducer-based language model, to generate speech recognition results for various utterances. The general speech recognition model may include sub-models or other portions that are customized for particular tasks, such as speech recognition on utterances regarding particular topics. Individual weights within the general speech recognition model can be dynamically replaced based on the context in which an utterance is made or received, thereby providing a further degree of customization without requiring additional speech recognition models to be generated, maintained, or loaded.
Type: Grant
Filed: June 10, 2014
Date of Patent: November 27, 2018
Assignee: Amazon Technologies, Inc.
Inventors: Denis Sergeyevich Filimonov, Ariya Rastrow
-
Patent number: 10127904
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the time steps, processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output for the time step using a phoneme output layer to generate a phoneme representation for the acoustic feature representation for the time step; and processing the recurrent output for the time step using a grapheme output layer to generate a grapheme representation for the acoustic feature representation for the time step; and extracting, from the phoneme and grapheme representations for the acoustic feature representations at each time step, a respective pronunciation for each of one or more words.
Type: Grant
Filed: July 29, 2015
Date of Patent: November 13, 2018
Assignee: Google LLC
Inventors: Kanury Kanishka Rao, Francoise Beaufays, Hasim Sak, Ouais Alsharif
-
Patent number: 10127495
Abstract: Systems and methods for reducing the size of deep neural networks are disclosed. In an embodiment, a server computer stores a plurality of training datasets, each of which comprise a plurality of training input matrices and a plurality of corresponding outputs. The server computer initiates training of a deep neural network using the plurality of training input matrices, a weight matrix, and the plurality of corresponding outputs. While the training of the deep neural network is being performed, the server computer identifies one or more weight values of the weight matrix for removal. The server computer removes the one or more weight values from the weight matrix to generate a reduced weight matrix. The server computer then stores the reduced weight matrix with the deep neural network.
Type: Grant
Filed: April 14, 2017
Date of Patent: November 13, 2018
Inventors: Rohan Bopardikar, Sunil Bopardikar
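One common way to "identify weight values for removal" is magnitude pruning: zero out the entries with the smallest absolute values. This sketch uses that criterion as an illustrative assumption; the patent does not necessarily specify it.

```python
# Sketch of magnitude-based pruning: zero the fraction of weight-matrix
# entries with the smallest absolute values to form a reduced matrix.
# The magnitude criterion is an illustrative assumption.
def prune_weights(matrix, fraction):
    """Zero the `fraction` of entries with the smallest |value|."""
    flat = sorted(abs(v) for row in matrix for v in row)
    k = int(len(flat) * fraction)
    cutoff = flat[k - 1] if k > 0 else -1.0
    return [[0.0 if abs(v) <= cutoff else v for v in row]
            for row in matrix]

w = [[0.1, -2.0], [0.05, 1.5]]
print(prune_weights(w, 0.5))  # [[0.0, -2.0], [0.0, 1.5]]
```

Pruning while training continues (as the abstract describes) lets the remaining weights adapt to compensate for the removed ones, which typically preserves accuracy better than pruning a finished model.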
-
Patent number: 10114809
Abstract: A method for phonetically annotating text is performed at a computing device. The method includes: identifying a first polyphonic word segment in a text input, the first polyphonic word segment having at least a first pronunciation and a second pronunciation; determining at least a first probability for the first pronunciation and a second probability for the second pronunciation; determining a predetermined threshold difference based on: a comparison of the first and second probabilities with a preset threshold probability value, respectively, and a magnitude of a difference between the first and second probabilities; comparing the difference between the first probability and the second probability with the predetermined threshold difference; and selecting the first pronunciation as a current pronunciation for the first polyphonic word segment in accordance with a determination that the difference between the first probability and the second probability exceeds the predetermined threshold difference.
Type: Grant
Filed: June 23, 2016
Date of Patent: October 30, 2018
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Xiaoping Wu, Qiang Dai
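The selection rule above can be sketched as a margin test whose required margin depends on how the two probabilities compare to a preset value. The specific thresholds and the rule for choosing between them are illustrative assumptions, not the patent's exact logic.

```python
# Sketch of margin-based pronunciation selection for a polyphonic segment.
# Thresholds and the preset comparison are illustrative assumptions.
def select_pronunciation(p1, p2, preset=0.5, margin_hi=0.3, margin_lo=0.1):
    """Return 1 (first pronunciation), 2 (second), or 0 (undecided)."""
    # Demand a larger margin when neither probability clears the preset,
    # since low absolute probabilities make the comparison less reliable.
    required = margin_lo if max(p1, p2) >= preset else margin_hi
    diff = p1 - p2
    if diff > required:
        return 1
    if -diff > required:
        return 2
    return 0

print(select_pronunciation(0.7, 0.2))   # 1: clear winner
print(select_pronunciation(0.35, 0.3))  # 0: margin too small at low confidence
```

An undecided result (0) would leave room for a fallback, such as a dictionary default pronunciation.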
-
Patent number: 10109272
Abstract: According to one embodiment, an apparatus for training a neural network acoustic model includes a calculating unit, a clustering unit, a sharing unit, and a training unit. The calculating unit calculates, based on training data including a training speech and a labeled phoneme state, scores of phoneme states different from the labeled phoneme state. The clustering unit clusters a phoneme state whose score is larger than a predetermined threshold and the labeled phoneme state. The sharing unit shares the probability of the labeled phoneme state among the clustered phoneme states. The training unit trains the neural network acoustic model based on the training speech and the clustered phoneme states.
Type: Grant
Filed: September 12, 2016
Date of Patent: October 23, 2018
Assignee: Kabushiki Kaisha Toshiba
Inventors: Huifeng Zhu, Yan Deng, Pei Ding, Kun Yong, Jie Hao
-
Patent number: 10090001
Abstract: A method of speech enhancement using a neural-network-based combined signal starts with training the neural network offline, which includes: (i) exciting at least one accelerometer and at least one microphone using a training accelerometer signal and a training acoustic signal, respectively. The training accelerometer signal and the training acoustic signal are correlated during clean speech segments. Training the neural network offline further includes (ii) selecting speech included in the training accelerometer signal and in the training acoustic signal, and (iii) spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal. The neural network that is trained offline is then used to generate a speech reference signal based on an accelerometer signal from the at least one accelerometer and an acoustic signal received from the at least one microphone. Other embodiments are described.
Type: Grant
Filed: August 1, 2016
Date of Patent: October 2, 2018
Assignee: Apple Inc.
Inventors: Lalin S. Theverapperuma, Vasu Iyengar, Sarmad Aziz Malik, Raghavendra Prabhu
-
Patent number: 10089977
Abstract: Exemplary embodiments of the present invention provide a method of system combination in an audio analytics application, including providing a plurality of language identification systems in which each of the language identification systems includes a plurality of probabilities. Each probability is associated with the system's ability to detect a particular language. The method of system combination in the audio analytics application includes receiving data at the language identification systems. The received data is different from data used to train the language identification systems. A confidence measure is determined for each of the language identification systems; the confidence measure identifies which language its system predicts for the received data, and the language identification systems are combined according to the confidence measures.
Type: Grant
Filed: July 7, 2015
Date of Patent: October 2, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Sriram Ganapathy, Mohamed K. Omar, Robert Ward
-
Patent number: 10062374
Abstract: According to some aspects, a method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data is provided. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters, and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.
Type: Grant
Filed: July 18, 2014
Date of Patent: August 28, 2018
Assignee: Nuance Communications, Inc.
Inventors: Xiaoqiang Xiao, Chengyuan Ma, Venkatesh Nagesha
-
Patent number: 10056075
Abstract: A method for training a deep neural network comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
Type: Grant
Filed: December 9, 2016
Date of Patent: August 21, 2018
Assignee: International Business Machines Corporation
Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath
-
Patent number: 10049668
Abstract: Systems and processes for converting speech-to-text are provided. In one example process, speech input can be received. A sequence of states and arcs of a weighted finite state transducer (WFST) can be traversed. A negating finite state transducer (FST) can be traversed. A virtual FST can be composed using a neural network language model and based on the sequence of states and arcs of the WFST. The one or more virtual states of the virtual FST can be traversed to determine a probability of a candidate word given one or more history candidate words. Text corresponding to the speech input can be determined based on the probability of the candidate word given the one or more history candidate words. An output can be provided based on the text corresponding to the speech input.
Type: Grant
Filed: May 16, 2016
Date of Patent: August 14, 2018
Assignee: Apple Inc.
Inventors: Rongqing Huang, Ilya Oparin
-
Patent number: 10043512
Abstract: A system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.
Type: Grant
Filed: November 11, 2016
Date of Patent: August 7, 2018
Assignee: Google LLC
Inventors: Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Samuel Bengio, Ilya Sutskever
-
Patent number: 10013973
Abstract: A method for generating a test-speaker-specific adaptive system for recognising sounds in speech spoken by a test speaker; the method employing: (i) training data comprising speech items spoken by the test speaker; and (ii) an input network component and a speaker adaptive output network, the input network component and speaker adaptive output network having been trained using training data from training speakers; the method comprising: (a) using the training data to train a test-speaker-specific adaptive model component of an adaptive model comprising the input network component and the test-speaker-specific adaptive model component, and (b) providing the test-speaker-specific adaptive system comprising the input network component, the trained test-speaker-specific adaptive model component, and the speaker-adaptive output network.
Type: Grant
Filed: January 17, 2017
Date of Patent: July 3, 2018
Assignee: Kabushiki Kaisha Toshiba
Inventor: Rama Doddipatla
-
Patent number: 10009635
Abstract: Methods and apparatus to identify media content using temporal signal characteristics are disclosed. An example method includes generating a reference signature based on a reference signal corresponding to known media, generating sums based on peaks in a media signal corresponding to media, identifying signal peaks based on the generated sums, generating a second signature based on normalized curve features, wherein the normalized curve features respectively correspond to the identified signal peaks at corresponding temporal locations of the signal peaks, and determining whether the media signal corresponds to the reference signal based on a comparison of the reference signature and the second signature.
Type: Grant
Filed: February 13, 2017
Date of Patent: June 26, 2018
Assignee: The Nielsen Company (US), LLC
Inventor: Morris Lee
-
Patent number: 10001829
Abstract: An electronic device includes an appended module coupled to a core having a standby state, the core comprising a first power supply circuit, a first clock, and a circuit that recognizes multiple vocal commands timed by the first clock. The appended module includes a second power supply circuit independent of the first power supply circuit, a second clock independent of the first clock and having a frequency lower than that of the first clock, a digital unit timed by the second clock including a sound capture circuit that delivers a processed sound signal, and a processing unit configured, in the presence of a parameter of the processed sound signal greater than a threshold, to analyze the content of the processed sound signal and to deliver, when the content of the sound signal comprises a reference pattern, an activating signal to the core that can take it out of its standby state.
Type: Grant
Filed: September 12, 2015
Date of Patent: June 19, 2018
Assignee: STMICROELECTRONICS (ROUSSET) SAS
Inventors: Jonathan Cottinet, Jean Claude Bini
-
Patent number: 9997161
Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
Type: Grant
Filed: September 11, 2015
Date of Patent: June 12, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yifan Gong, Chaojun Liu, Kshitiz Kumar
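One way to normalize scores so an application's existing operating threshold stays valid is a piecewise-linear remapping that pins the new classifier's threshold onto the old one. The mapping below is an illustrative assumption, not the patented normalization.

```python
# Sketch of threshold-preserving score normalization: map the updated
# classifier's scores so its operating threshold lands exactly on the
# application's original threshold. Piecewise-linear form is an assumption.
def normalize_score(score, new_threshold, app_threshold):
    """Linearly map [0, new_threshold] -> [0, app_threshold]
    and [new_threshold, 1] -> [app_threshold, 1]."""
    if score <= new_threshold:
        return score * app_threshold / new_threshold
    return app_threshold + (score - new_threshold) * \
        (1.0 - app_threshold) / (1.0 - new_threshold)

# The updated model's threshold 0.6 maps onto the app's original 0.8,
# so existing accept/reject decisions remain consistent.
print(round(normalize_score(0.6, 0.6, 0.8), 3))  # 0.8
print(round(normalize_score(0.9, 0.6, 0.8), 3))  # 0.95
```

Because the mapping is monotonic, it preserves the ranking of utterances while realigning the accept/reject boundary.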
-
Patent number: 9946706
Abstract: Methods and systems which utilize, in one embodiment, automatic language identification, including automatic language identification for dynamic text processing. In at least certain embodiments, automatic language identification can be applied to spellchecking in real time as the user types.
Type: Grant
Filed: June 10, 2013
Date of Patent: April 17, 2018
Assignee: Apple Inc.
Inventors: Douglas R. Davidson, Ali Ozer
-
Patent number: 9916538
Abstract: Specification covers new algorithms, methods, and systems for artificial intelligence, soft computing, and deep learning/recognition, e.g., image recognition (e.g., for action, gesture, emotion, expression, biometrics, fingerprint, facial, OCR (text), background, relationship, position, pattern, and object), Big Data analytics, machine learning, training schemes, crowd-sourcing (experts), feature space, clustering, classification, SVM, similarity measures, modified Boltzmann Machines, optimization, search engine, ranking, question-answering system, soft (fuzzy or unsharp) boundaries/impreciseness/ambiguities/fuzziness in language, Natural Language Processing (NLP), Computing-with-Words (CWW), parsing, machine translation, sound and speech recognition, video search and analysis (e.g.
Type: Grant
Filed: March 18, 2014
Date of Patent: March 13, 2018
Assignee: Z ADVANCED COMPUTING, INC.
Inventors: Lotfi A. Zadeh, Saied Tadayon, Bijan Tadayon
-
Patent number: 9916306
Abstract: Systems and methods for statistical linguistic analysis. According to some embodiments, methods may include evaluating a source text using one or more types of statistical linguistic analysis to determine a translatability of the source text and providing the translatability of the source text to a client node.
Type: Grant
Filed: October 19, 2012
Date of Patent: March 13, 2018
Assignee: SDL INC.
Inventors: Laurens van den Oever, Jason Matthew Dent
-
Patent number: 9892115
Abstract: An improved lecture support system integrates multi-media presentation materials with spoken content so that the listener can follow both the speech and the supporting materials that accompany the presentation to provide additional understanding. Computer-based systems and methods are disclosed for translation of a spoken presentation (e.g., a lecture, a video) along with the accompanying presentation materials. The content of the presentation materials can be used to improve presentation translation, as it extracts supportive material from the presentation materials as they relate to the speech.
Type: Grant
Filed: January 5, 2015
Date of Patent: February 13, 2018
Assignee: Facebook, Inc.
Inventor: Alexander Waibel
-
Patent number: 9881615
Abstract: A speech recognition apparatus and method. The speech recognition apparatus includes a first recognizer configured to generate a first recognition result of an audio signal, in a first linguistic recognition unit, by using an acoustic model, a second recognizer configured to generate a second recognition result of the audio signal, in a second linguistic recognition unit, by using a language model, and a combiner configured to combine the first recognition result and the second recognition result to generate a final recognition result in the second linguistic recognition unit and to reflect the final recognition result in the language model. The first linguistic recognition unit may be a same linguistic unit type as the second linguistic recognition unit. The first recognizer and the second recognizer are configured in a same neural network and simultaneously/collectively trained in the neural network using audio training data provided to the first recognizer.
Type: Grant
Filed: July 8, 2016
Date of Patent: January 30, 2018
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hee Youl Choi, Seokjin Hong
-
Patent number: 9858524
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating descriptions of input images. One of the methods includes obtaining an input image; processing the input image using a first neural network to generate an alternative representation for the input image; and processing the alternative representation for the input image using a second neural network to generate a sequence of a plurality of words in a target natural language that describes the input image.
Type: Grant
Filed: November 13, 2015
Date of Patent: January 2, 2018
Assignee: Google Inc.
Inventors: Samy Bengio, Oriol Vinyals, Alexander Toshkov Toshev, Dumitru Erhan
-
Patent number: 9842610Abstract: A method is provided for training a Deep Neural Network (DNN) for acoustic modeling in speech recognition. The method includes reading central frames and side frames as input frames from a memory. The side frames are preceding side frames preceding the central frames and/or succeeding side frames succeeding the central frames. The method further includes executing pre-training for only the central frames or both the central frames and the side frames and fine-tuning for the central frames and the side frames so as to emphasize connections between acoustic features in the central frames and units of the bottom layer in hidden layer of the DNN.Type: GrantFiled: June 26, 2015Date of Patent: December 12, 2017Assignee: International Business Machines CorporationInventor: Gakuto Kurata
-
Patent number: 9842585Abstract: Described herein are various technologies pertaining to a multilingual deep neural network (MDNN). The MDNN includes a plurality of hidden layers, wherein values for weight parameters of the plurality of hidden layers are learned during a training phase based upon training data in terms of acoustic raw features for multiple languages. The MDNN further includes softmax layers that are trained for each target language separately, making use of the hidden layer values trained jointly with multiple source languages. The MDNN is adaptable, such that a new softmax layer may be added on top of the existing hidden layers, where the new softmax layer corresponds to a new target language.Type: GrantFiled: March 11, 2013Date of Patent: December 12, 2017Assignee: Microsoft Technology Licensing, LLCInventors: Jui-Ting Huang, Jinyu Li, Dong Yu, Li Deng, Yifan Gong
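The shared-hidden-layers/per-language-softmax architecture described in this abstract can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the dimensions, random initialization, and the ReLU hidden activation are my assumptions, and no training procedure is shown.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class MultilingualDNN:
    """Hidden layer shared across languages, plus one softmax head
    per target language. A new language only adds a new head on top
    of the existing shared layers, as the abstract describes."""

    def __init__(self, in_dim, hid_dim):
        random.seed(0)
        self.W_shared = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)]
                         for _ in range(hid_dim)]
        self.heads = {}  # language name -> softmax weight matrix

    def add_language(self, lang, n_classes):
        hid_dim = len(self.W_shared)
        self.heads[lang] = [[random.uniform(-0.1, 0.1) for _ in range(hid_dim)]
                            for _ in range(n_classes)]

    def forward(self, x, lang):
        h = [max(0.0, v) for v in matvec(self.W_shared, x)]  # shared ReLU layer
        return softmax(matvec(self.heads[lang], h))          # language-specific head

net = MultilingualDNN(in_dim=3, hid_dim=4)
net.add_language("en", n_classes=5)
net.add_language("fr", n_classes=6)   # new head reuses the shared hidden layers
probs = net.forward([0.5, -0.2, 0.1], "fr")
print(len(probs), round(sum(probs), 6))  # 6 1.0
```

The point of the sketch is the adaptation step: `add_language` attaches a fresh output layer without touching `W_shared`, which is what makes the jointly trained hidden layers reusable for a new target language.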
-
Patent number: 9842106Abstract: A method and system process utterances that are acquired either from an automatic speech recognition (ASR) system or from text. The utterances have associated identities of each party, such as role A utterances and role B utterances. The information corresponding to the utterances, such as word sequence and identity, is converted to features. Each feature is received in an input layer of a neural network (NN). A dimensionality of each feature is reduced, in a projection layer of the NN, to produce a reduced dimensional feature. The reduced dimensional feature is processed to provide probabilities of labels for the utterances.Type: GrantFiled: December 4, 2015Date of Patent: December 12, 2017Assignee: Mitsubishi Electric Research Laboratories, Inc.Inventors: Chiori Hori, Takaaki Hori, Shinji Watanabe, John Hershey
-
Patent number: 9824684Abstract: A sequence recognition system comprises a prediction component configured to receive a set of observed features from a signal to be recognized and to output a prediction output indicative of a predicted recognition based on the set of observed features. The sequence recognition system also comprises a classification component configured to receive the prediction output and to output a label indicative of recognition of the signal based on the prediction output.Type: GrantFiled: December 22, 2014Date of Patent: November 21, 2017Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Dong Yu, Yu Zhang, Michael L. Seltzer, James G. Droppo
-
Patent number: 9824692Abstract: The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.Type: GrantFiled: September 12, 2016Date of Patent: November 21, 2017Assignee: PINDROP SECURITY, INC.Inventors: Elie Khoury, Matthew Garland
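The loss described in this abstract, cosine similarity between samples with positive and negative margins against a cohort of negatives, can be sketched as a simple hinge-style function. The margin values and the exact hinge form are my assumptions; the patent only specifies a cosine measure with positive and negative margins.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def triplet_cosine_loss(anchor, positive, negatives,
                        pos_margin=0.9, neg_margin=0.3):
    """Penalize the anchor embedding for being less than pos_margin
    similar to the positive sample, and for being more than neg_margin
    similar to any sample in the negative cohort."""
    loss = max(0.0, pos_margin - cosine(anchor, positive))
    for neg in negatives:
        loss += max(0.0, cosine(anchor, neg) - neg_margin)
    return loss

# A well-separated triplet incurs no loss:
print(triplet_cosine_loss([1.0, 0.0], [1.0, 0.1], [[0.0, 1.0]]))  # 0.0
```

In the batch scheme the abstract describes, `negatives` would be the cohort set accumulated for the batch, and the loss would be backpropagated through the three feed-forward networks after each batch.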
-
Patent number: 9818431Abstract: The technology described herein uses a multiple-output layer RNN to process an acoustic signal comprising speech from multiple speakers to trace an individual speaker's speech. The multiple-output layer RNN has multiple output layers, each of which is meant to trace one speaker (or noise) and represent the mask for that speaker (or noise). The output layer for each speaker (or noise) can have the same dimensions and can be normalized for each output unit across all output layers. The rest of the layers in the multiple-output layer RNN are shared across all the output layers. The result from the previous frame is used as input to the output layer or to one of the hidden layers of the RNN to calculate results for the current frame. This pass back of results allows the model to carry information from previous frames to future frames to trace the same speaker.Type: GrantFiled: December 21, 2015Date of Patent: November 14, 2017Assignee: Microsoft Technology Licensing, LLCInventor: Dong Yu
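The per-unit normalization across output layers that this abstract describes can be shown in isolation. A minimal sketch, assuming non-negative mask scores and a simple sum-to-one normalization (the patent does not specify the normalization function):

```python
def normalize_masks(layer_outputs):
    """layer_outputs: one score vector per output layer (one layer per
    speaker or noise source), all of the same length. Each output unit
    is normalized across all output layers, so the masks for a given
    time-frequency unit sum to 1."""
    n_layers = len(layer_outputs)
    n_units = len(layer_outputs[0])
    masks = [[0.0] * n_units for _ in range(n_layers)]
    for u in range(n_units):
        total = sum(layer[u] for layer in layer_outputs)
        for i, layer in enumerate(layer_outputs):
            # Fall back to a uniform split if all layers output zero.
            masks[i][u] = layer[u] / total if total else 1.0 / n_layers
    return masks

# Two output layers (two speakers), two units each:
masks = normalize_masks([[2.0, 1.0], [2.0, 3.0]])
print(masks)  # [[0.5, 0.25], [0.5, 0.75]]
```

Each column of the result is a soft assignment of one signal unit across the traced sources, which is what lets the shared RNN layers serve every output layer at once.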
-
Patent number: 9818409Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.Type: GrantFiled: October 7, 2015Date of Patent: November 14, 2017Assignee: Google Inc.Inventors: Andrew W. Senior, Hasim Sak, Izhak Shafran
-
Patent number: 9818410Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.Type: GrantFiled: December 29, 2015Date of Patent: November 14, 2017Assignee: Google Inc.Inventors: Hasim Sak, Andrew W. Senior
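The frame stacking and subsampling steps this abstract names can be sketched directly. The stack width and subsampling stride below are illustrative assumptions; the patent leaves both unspecified.

```python
def stack_and_subsample(frames, stack=3, step=3):
    """Stack `stack` consecutive frames of acoustic data into each
    modified frame, then subsample by keeping every `step`-th modified
    frame before it is fed to the acoustic modeling network."""
    modified = []
    for i in range(len(frames) - stack + 1):
        window = []
        for frame in frames[i:i + stack]:
            window.extend(frame)
        modified.append(window)
    return modified[::step]

frames = [[1], [2], [3], [4], [5], [6], [7]]
print(stack_and_subsample(frames))  # [[1, 2, 3], [4, 5, 6]]
```

Subsampling after stacking shortens the sequence the RNN layers must process while each surviving input still covers several original frames, which is the usual motivation for this preprocessing in CTC acoustic models.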
-
Patent number: 9805720Abstract: A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.Type: GrantFiled: January 13, 2017Date of Patent: October 31, 2017Assignee: International Business Machines CorporationInventors: Jonathan H. Connell, II, Etienne Marcheret
-
Patent number: 9792900Abstract: Various implementations disclosed herein include an expert-assisted phoneme recognition neural network system configured to recognize phonemes within continuous large vocabulary speech sequences without using language specific models (“left-context”), look-ahead (“right-context”) information, or multi-pass sequence processing, and while operating within the resource constraints of low-power and real-time devices. To these ends, in various implementations, an expert-assisted phoneme recognition neural network system as described herein utilizes a-priori phonetic knowledge. Phonetics is concerned with the configuration of the human vocal tract while speaking and its acoustic consequences on vocalizations. While similar sounding phonemes are difficult to detect and are frequently misidentified by previously known neural networks, phonetic knowledge gives insight into which aspects of sound acoustics contain the strongest contrast between similar sounding phonemes.Type: GrantFiled: July 6, 2016Date of Patent: October 17, 2017Assignee: MALASPINA LABS (BARBADOS), INC.Inventors: Saeed Mosayyebpour Kaskari, Aanchan Kumar Mohan, Michael David Fry, Dean Wolfgang Neumann
-
Patent number: 9785629Abstract: Methods and systems for automated language detection for domain names are disclosed. In some embodiments, a method for detecting a language of an Internationalized Domain Name (IDN) comprises receiving, by an I/O interface, a string of characters for the IDN; receiving training data, including a plurality of multi-gram analyses for a set of languages; analyzing, by a processor, the string of characters based on the training data, wherein the analyzing includes extracting a set of multi-grams from the string of characters and comparing the extracted set of multi-grams with the training data; detecting the language of the IDN based on results of the analyzing. In some embodiments, the method further comprises comparing the detected language of the IDN with a user selected language and using the IDN to generate a domain name, if the comparing indicates that the detected language of the IDN is consistent with the user selected language.Type: GrantFiled: December 15, 2015Date of Patent: October 10, 2017Assignee: VERISIGN, INC.Inventors: Ronald Andrew Hoskinson, Lambert Arians, Marc Anderson, Mahendra Jain
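The multi-gram extraction and comparison steps in this abstract can be sketched in a few lines. The n-gram sizes and the Jaccard-overlap scoring are my assumptions; the patent only says that multi-grams extracted from the string are compared with per-language training data.

```python
def extract_multigrams(s, n_min=2, n_max=3):
    """Extract character n-grams (2- and 3-grams here) from a
    domain-name label."""
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(s) - n + 1):
            grams.add(s[i:i + n])
    return grams

def detect_language(label, training_data):
    """training_data: language -> set of multigrams built from that
    language's corpus. Scores each language by Jaccard overlap with the
    label's multigrams and returns the best match."""
    grams = extract_multigrams(label)

    def score(lang_grams):
        union = len(grams | lang_grams)
        return len(grams & lang_grams) / union if union else 0.0

    return max(training_data, key=lambda lang: score(training_data[lang]))

# Toy training data built from single words (a real system would use corpora):
data = {"en": extract_multigrams("example"), "de": extract_multigrams("beispiel")}
print(detect_language("exam", data))  # en
```

The final comparison against the user-selected language is then a simple equality check before the IDN is used to generate a domain name.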
-
Patent number: 9786270Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.Type: GrantFiled: July 8, 2016Date of Patent: October 10, 2017Assignee: Google Inc.Inventors: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
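Training the second network on the first network's output distributions, as this abstract describes, is a form of knowledge distillation: the student minimizes cross-entropy against the teacher's soft outputs instead of hard labels. A minimal sketch of that loss (the exact objective in the patent is not specified beyond using the teacher's distributions as targets):

```python
import math

def soft_target_loss(student_probs, teacher_probs, eps=1e-12):
    """Cross-entropy of the student's output distribution against the
    teacher's output distribution used as the training target."""
    return -sum(t * math.log(s + eps)
                for t, s in zip(teacher_probs, student_probs))

teacher = [0.5, 0.5]
print(soft_target_loss(teacher, teacher))  # equals the teacher's entropy, ln 2
```

When the student's distribution matches the teacher's, the loss bottoms out at the teacher's entropy, so minimizing it drives the second network toward reproducing the CTC-trained first network's behavior.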
-
Patent number: 9761228Abstract: Provided are a recognition result candidate comparator (205) that compares a plurality of server-side voice recognition result candidates received by a receiver (204) to detect texts having differences, and a recognition result integrator (206) that integrates a client-side voice recognition result candidate and a server-side voice recognition result candidate, on the basis of the client-side candidate, the server-side candidate, and the detection result provided by the recognition result candidate comparator (205), to decide a voice recognition result.Type: GrantFiled: November 20, 2013Date of Patent: September 12, 2017Assignee: Mitsubishi Electric CorporationInventors: Isamu Ogawa, Toshiyuki Hanazawa, Tomohiro Narita
-
Patent number: 9761221Abstract: According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values, the method comprising using at least one computer, performing acts of providing a first vector as input to a first network layer comprising one or more network units of the neural network, transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit, sorting the plurality of values to produce a sorted plurality of values, and providing the sorted plurality of values as input to a second network layer of the neural network.Type: GrantFiled: August 20, 2015Date of Patent: September 12, 2017Assignee: Nuance Communications, Inc.Inventors: Steven John Rennie, Vaibhava Goel
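The sorting operation that distinguishes this network layer can be shown with a plain affine transform. This is a toy sketch: the weight/bias shapes and descending sort order are my assumptions.

```python
def sorted_layer(x, weights, biases):
    """Transform the input vector by each unit's weights (plus bias) to
    produce a plurality of values, then sort those values before they
    are provided as input to the next network layer."""
    values = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return sorted(values, reverse=True)

out = sorted_layer([3, 1], [[1, 0], [0, 1], [1, 1]], [0, 0, 0])
print(out)  # [4, 3, 1]
```

Because the sort discards which unit produced which value, the next layer sees an order-statistics representation of the activations rather than a unit-aligned one, which is the layer's distinctive property.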
-
Patent number: 9754584Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.Type: GrantFiled: November 8, 2016Date of Patent: September 5, 2017Assignee: Google Inc.Inventors: Maria Carolina Parada San Martin, Tara N. Sainath, Guoguo Chen
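The template-building step in this abstract, combining at most k of the enrollment LSTM output vectors into one fixed-length representation, can be sketched as follows. Averaging the last k vectors is my assumption; the patent specifies only that at most a quantity k of the output vectors are combined.

```python
def make_template(output_vectors, k):
    """Combine at most the last k LSTM output vectors (one per
    enrollment feature vector) into a single fixed-length template by
    element-wise averaging."""
    chosen = output_vectors[-k:]  # at most k vectors, regardless of utterance length
    dim = len(chosen[0])
    return [sum(v[d] for v in chosen) / len(chosen) for d in range(dim)]

# Three output vectors from a variable-length enrollment signal, k = 2:
template = make_template([[1, 2], [3, 4], [5, 6]], k=2)
print(template)  # [4.0, 5.0]
```

Fixing the template length this way is what lets enrollment audio signals of different durations be compared against later utterances with a single distance computation.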
-
Patent number: 9721562Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.Type: GrantFiled: December 3, 2014Date of Patent: August 1, 2017Assignee: Google Inc.Inventors: Hasim Sak, Andrew W. Senior
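The feedback loop this abstract describes, where each step's modified input is built from the previous step's network output plus the current acoustic representation, can be sketched with any callable standing in for the acoustic modeling neural network. Concatenation as the combination rule is my assumption.

```python
def run_feedback_model(frames, model):
    """Process the initial time step from the acoustic representation
    alone; at every subsequent step, form a modified input from the
    previous step's output and the current acoustic representation,
    then run the model on that modified input."""
    outputs = [model(frames[0])]            # initial time step
    for frame in frames[1:]:
        modified = outputs[-1] + frame      # previous output ++ current frame
        outputs.append(model(modified))
    return outputs

# Toy "model" that just sums its input vector:
result = run_feedback_model([[1], [2], [3]], lambda v: [sum(v)])
print(result)  # [[1], [3], [6]]
```

The per-step outputs would then be decoded into the phoneme representation of the utterance; the toy model here only shows how information from earlier steps flows forward through the modified inputs.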