Neural Network Patents (Class 704/232)
  • Patent number: 10269342
    Abstract: A speech recognition system used in a workflow receives and analyzes speech input to recognize and accept a user's response to a task. Under certain conditions, a user's response might be expected. In these situations, the expected response may modify the behavior of the speech recognition system to improve recognition accuracy. For example, if the hypothesis of a user's response matches the expected response, then there is a high probability that the user's response was recognized correctly. An expected response may include expected words and wildcard words. Wildcard words represent any recognized word in a user's response. By including wildcard words in the expected response, the speech recognition system may make modifications based on a wide range of user responses.
    Type: Grant
    Filed: October 29, 2014
    Date of Patent: April 23, 2019
    Assignee: HAND HELD PRODUCTS, INC.
    Inventors: Keith Braho, Jason M Makay
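The expected-response matching described in this entry lends itself to a short illustration. A minimal Python sketch, assuming a token-level wildcard and a threshold-relaxation rule; the WILDCARD token, matches_expected helper, and boost value are hypothetical, not the patented implementation:

```python
# Sketch: match a recognizer hypothesis against an expected response
# that may contain wildcard words (here the token "*"), and relax the
# acceptance threshold when the hypothesis matches.

WILDCARD = "*"  # hypothetical token standing in for any recognized word

def matches_expected(hypothesis, expected):
    """True if every expected word matches, treating WILDCARD as any word."""
    if len(hypothesis) != len(expected):
        return False
    return all(e == WILDCARD or h == e for h, e in zip(hypothesis, expected))

def accept(hypothesis, score, expected, threshold=0.7, boost=0.2):
    # A match against the expected response implies a high probability of
    # correct recognition, so the acceptance threshold is relaxed.
    if matches_expected(hypothesis, expected):
        threshold -= boost
    return score >= threshold

# "pick <quantity> items", where the quantity can be any recognized word.
print(accept(["pick", "5", "items"], 0.6, ["pick", WILDCARD, "items"]))  # True
print(accept(["stop"], 0.6, ["pick", WILDCARD, "items"]))                # False
```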
  • Patent number: 10268671
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating parse trees for input text segments. One of the methods includes obtaining an input text segment comprising a plurality of inputs arranged according to an input order; processing the inputs in the input text segment using an encoder long short term memory (LSTM) neural network to generate a respective encoder hidden state for each input in the input text segment; and processing the respective encoder hidden states for the inputs in the input text segment using an attention-based decoder LSTM neural network to generate a linearized representation of a parse tree for the input text segment.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: April 23, 2019
    Assignee: Google LLC
    Inventors: Lukasz Mieczyslaw Kaiser, Oriol Vinyals
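The decoder in this entry targets a "linearized representation of a parse tree." A small sketch of one plausible linearization, depth-first bracketing over (label, children) tuples; the tuple encoding and the "XX" word placeholder are assumptions, not necessarily the patent's exact scheme:

```python
# Sketch: depth-first linearization of a parse tree into the token
# sequence an attention-based decoder could be trained to emit.

def linearize(tree):
    """(label, children) tuples -> tokens like "(S", "(NP", ")NP", ")S".
    Leaf words become a placeholder; the decoder predicts structure
    only, while the words come from the input sequence."""
    label, children = tree
    if not children:                  # leaf: a word position
        return ["XX"]
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)
    return tokens

# "John has a dog" -> (S (NP John) (VP has (NP a dog)))
tree = ("S", [("NP", [("John", [])]),
              ("VP", [("has", []),
                      ("NP", [("a", []), ("dog", [])])])])
print(" ".join(linearize(tree)))
# (S (NP XX )NP (VP XX (NP XX XX )NP )VP )S
```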
  • Patent number: 10262644
    Abstract: An application that manipulates audio (or audiovisual) content, automated music creation technologies may be employed to generate new musical content using digital signal processing software hosted on handheld and/or server (or cloud-based) compute platforms to intelligently process and combine a set of audio content captured and submitted by users of modern mobile phones or other handheld compute platforms. The user-submitted recordings may contain speech, singing, musical instruments, or a wide variety of other sound sources, and the recordings may optionally be preprocessed by the handheld devices prior to submission.
    Type: Grant
    Filed: December 31, 2014
    Date of Patent: April 16, 2019
    Assignee: Smule, Inc.
    Inventors: Randal Leistikow, Mark Godfrey, Ian S. Simon, Jeannie Yang, Michael W. Allen
  • Patent number: 10254760
    Abstract: Systems and methods are provided for automatically self-correcting, or correcting in real time, one or more neural networks after detecting a triggering event or a breach of boundary conditions. Such a triggering event may indicate an incorrect output signal or data being generated by the one or more neural networks. In particular, machine controllers of the invention limit the operations of neural networks to be within boundary conditions. Autonomous machines of the invention can be self-corrected after a breach of a boundary condition is detected. Autonomous land vehicles of the invention are capable of determining the timing of automatic transition to manual control from automated driving mode. The controller of the invention filters and saves input-output data sets that fall within boundary conditions for later training of neural networks. The controllers of the invention include security architectures to prevent damage from virus attacks or system malfunctions.
    Type: Grant
    Filed: June 4, 2018
    Date of Patent: April 9, 2019
    Assignee: Apex Artificial Intelligence Industries, Inc.
    Inventor: Kenneth A. Abeloe
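A minimal sketch of the boundary-condition guard described in this abstract (and in the closely related patent below), assuming per-signal (low, high) bounds; the BoundedController class, its manual-mode fallback, and the retraining buffer are illustrative only:

```python
# Sketch: limit a neural network's commands to boundary conditions,
# transition to manual control on a breach, and retain in-bounds
# input/output pairs for later retraining.

class BoundedController:
    def __init__(self, net, bounds):
        self.net = net                # callable: observation -> command dict
        self.bounds = bounds          # signal name -> (low, high)
        self.training_buffer = []     # in-bounds (observation, command) pairs
        self.manual_mode = False

    def in_bounds(self, command):
        return all(lo <= command[k] <= hi
                   for k, (lo, hi) in self.bounds.items())

    def step(self, observation):
        command = self.net(observation)
        if not self.in_bounds(command):     # triggering event detected
            self.manual_mode = True         # hand control back to the human
            return None
        self.training_buffer.append((observation, command))
        return command

ctrl = BoundedController(net=lambda obs: {"steering": obs * 2.0},
                         bounds={"steering": (-1.0, 1.0)})
print(ctrl.step(0.3))    # {'steering': 0.6} -- accepted and buffered
print(ctrl.step(0.9))    # None -- breach detected
print(ctrl.manual_mode)  # True
```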
  • Patent number: 10242665
    Abstract: Systems and methods are provided for automatically self-correcting, or correcting in real time, one or more neural networks after detecting a triggering event or a breach of boundary conditions. Such a triggering event may indicate an incorrect output signal or data being generated by the one or more neural networks. In particular, machine controllers of the invention limit the operations of neural networks to be within boundary conditions. Autonomous machines of the invention can be self-corrected after a breach of a boundary condition is detected. Autonomous land vehicles of the invention are capable of determining the timing of automatic transition to manual control from automated driving mode. The controller of the invention filters and saves input-output data sets that fall within boundary conditions for later training of neural networks. The controllers of the invention include security architectures to prevent damage from virus attacks or system malfunctions.
    Type: Grant
    Filed: June 4, 2018
    Date of Patent: March 26, 2019
    Assignee: Apex Artificial Intelligence Industries, Inc.
    Inventor: Kenneth A. Abeloe
  • Patent number: 10235992
    Abstract: The technology of the present application provides a speech recognition system with at least two different speech recognition engines, or a single speech recognition engine with at least two different modes of operation. The first speech recognition engine is used to match audio to text, where the text may be words or phrases. The matched audio and text are used by a training module to train a user profile for a natural language speech recognition engine, which is at least one of the two different speech recognition engines or modes. An evaluation module evaluates when the user profile is sufficiently trained to convert the speech recognition engine from the first speech recognition engine or mode to the natural language speech recognition engine or mode.
    Type: Grant
    Filed: July 27, 2017
    Date of Patent: March 19, 2019
    Assignee: nVoq Incorporated
    Inventors: Charles Corfield, Brian Marquette
  • Patent number: 10235994
    Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origin (e.g., native, non-native), speech channel (e.g., mobile, Bluetooth, desktop), speech application scenario (e.g., voice search, short message dictation), and speaker variation (e.g., individual speakers or clustered speakers). The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: March 19, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
  • Patent number: 10229672
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models and using the trained acoustic models. A connectionist temporal classification (CTC) acoustic model is accessed, the CTC acoustic model having been trained using a context-dependent state inventory generated from approximate phonetic alignments determined by another CTC acoustic model trained without fixed alignment targets. Audio data for a portion of an utterance is received. Input data corresponding to the received audio data is provided to the accessed CTC acoustic model. Data indicating a transcription for the utterance is generated based on output that the accessed CTC acoustic model produced in response to the input data. The data indicating the transcription is provided as output of an automated speech recognition service.
    Type: Grant
    Filed: January 3, 2017
    Date of Patent: March 12, 2019
    Assignee: Google LLC
    Inventors: Kanury Kanishka Rao, Andrew W. Senior, Hasim Sak
  • Patent number: 10217052
    Abstract: The disclosure is directed to evaluating feature vectors using decision trees. Typically, the number of feature vectors and the number of decision trees are very high, which prevents loading them into a processor cache. The feature vectors are evaluated by processing them across disjoint subsets of trees repeatedly. After loading the feature vectors into the cache, they are evaluated across a first subset of trees, then across a second subset of trees, and so on. If the values based on the first and second subsets satisfy a specified criterion, further evaluation of the feature vectors across the remaining decision trees is terminated, thereby minimizing the number of trees evaluated and, therefore, the consumption of computing resources.
    Type: Grant
    Filed: April 29, 2015
    Date of Patent: February 26, 2019
    Assignee: Facebook, Inc.
    Inventors: Oleksandr Kuvshynov, Aleksandar Ilic
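A sketch of the evaluation order described in this entry: keep the feature vectors cache-resident, score them against one disjoint subset of trees at a time, and stop once partial scores satisfy a criterion. The per-tree scoring stub and the early-exit rule are assumptions:

```python
# Sketch: evaluate feature vectors across disjoint subsets of decision
# trees, terminating early when partial scores already separate items.

def evaluate(feature_vectors, trees, subset_size=2, margin=0.95):
    scores = [0.0] * len(feature_vectors)
    trees_used = 0
    for start in range(0, len(trees), subset_size):
        subset = trees[start:start + subset_size]
        # All feature vectors stay cache-resident while this subset runs.
        for i, fv in enumerate(feature_vectors):
            scores[i] += sum(tree(fv) for tree in subset)
        trees_used += len(subset)
        # Hypothetical criterion: runner-up is far below the leader.
        ranked = sorted(scores, reverse=True)
        if len(ranked) > 1 and ranked[1] <= margin * ranked[0]:
            break                      # skip the remaining trees
    return scores, trees_used

# Stub "trees": each scores a feature vector with a fixed weight.
trees = [lambda fv, w=w: w * fv[0] for w in (0.5, 0.4, 0.3, 0.2)]
scores, used = evaluate([[1.0], [0.2]], trees)
print(scores, "- evaluated", used, "of", len(trees), "trees")
```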
  • Patent number: 10170107
    Abstract: An approach to extending the recognizable labels of a label recognizer makes use of an encoding of linguistic inputs and label attributes into comparable vectors. The encodings may be determined with artificial neural networks (ANNs) that are jointly trained, and a comparison between the encoding of a sentence input and the encoding of an intent attribute vector may use a fixed function, which does not have to be trained. The encoding of label attributes can generalize, permitting a new label to be added via its corresponding attributes and thereby avoiding the need to immediately retrain the label recognizer with example inputs.
    Type: Grant
    Filed: December 29, 2016
    Date of Patent: January 1, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Markus Dreyer, Pavankumar Reddy Muddireddy, Anjishnu Kumar
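A minimal sketch of the comparison step described in this entry, using a fixed (untrained) cosine similarity between a sentence encoding and label-attribute encodings; the toy vectors stand in for the jointly trained ANN encoders, and the label names are invented:

```python
# Sketch: zero-shot label recognition by comparing encodings with a
# fixed function; adding a label requires only its attribute encoding.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(sentence_vec, label_vecs):
    return max(label_vecs, key=lambda name: cosine(sentence_vec, label_vecs[name]))

# Hypothetical 4-dim encodings produced by the trained encoders.
labels = {
    "PlayMusic":  np.array([0.9, 0.1, 0.0, 0.2]),
    "GetWeather": np.array([0.0, 0.8, 0.3, 0.1]),
}
sentence = np.array([0.1, 0.9, 0.2, 0.0])
print(classify(sentence, labels))  # GetWeather

# A new label needs no retraining -- just supply its attribute encoding.
labels["BookFlight"] = np.array([0.1, 0.2, 0.9, 0.4])
print(classify(np.array([0.0, 0.1, 0.8, 0.5]), labels))  # BookFlight
```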
  • Patent number: 10163454
    Abstract: A method is provided for training a Deep Neural Network (DNN) for acoustic modeling in speech recognition. The method includes reading central frames and side frames as input frames from a memory. The side frames are preceding side frames preceding the central frames and/or succeeding side frames succeeding the central frames. The method further includes executing pre-training for only the central frames, or for both the central frames and the side frames, and fine-tuning for the central frames and the side frames, so as to emphasize connections between acoustic features in the central frames and units of the bottom hidden layer of the DNN.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: December 25, 2018
    Assignee: International Business Machines Corporation
    Inventor: Gakuto Kurata
  • Patent number: 10152968
    Abstract: Systems and methods for speech-based monitoring and/or control of automation devices are described. A speech-based method for monitoring and/or control of automation devices may include steps of determining a type of automation device to which first speech relates based, at least in part, on a location associated with the first speech; selecting a topic-specific speech recognition model adapted to recognize speech related to the determined type of automation device; using the topic-specific speech recognition model to recognize second speech provided at the location, wherein recognizing the second speech comprises identifying a query or command relating to the type of automation device and represented by the second speech; and issuing the query or command represented by the second speech to an automation device of the determined type.
    Type: Grant
    Filed: June 27, 2016
    Date of Patent: December 11, 2018
    Assignee: Iconics, Inc.
    Inventors: Russell L. Agrusa, Vojtech Kresl, Christopher N. Elsbree, Marco Tagliaferri, Lukas Volf
  • Patent number: 10147442
    Abstract: A neural network acoustic model is trained to be robust and produce accurate output when used to process speech signals having acoustic interference. The neural network acoustic model can be trained using a source-separation process by which, in addition to producing the main acoustic model output for a given input, the neural network generates predictions of the separate speech and interference portions of the input. The parameters of the neural network can be adjusted to jointly optimize all three outputs (e.g., the main acoustic model output, the speech signal prediction, and the interference signal prediction), rather than only optimizing the main acoustic model output. Once trained, output layers for the speech and interference signal predictions can be removed from the neural network or otherwise disabled.
    Type: Grant
    Filed: September 29, 2015
    Date of Patent: December 4, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Sankaran Panchapagesan, Shiva Kumar Sundaram, Arindam Mandal
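A minimal PyTorch sketch of the joint three-output training described in this entry; the layer sizes, loss weights, and random targets are illustrative, not the patented configuration:

```python
# Sketch: an acoustic model with auxiliary speech- and interference-
# prediction heads, jointly optimized; the aux heads are dropped after
# training and only the main output is used at inference.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RobustAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, n_senones=500):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden), nn.ReLU())
        self.main_head = nn.Linear(hidden, n_senones)    # acoustic model output
        self.speech_head = nn.Linear(hidden, feat_dim)   # speech prediction
        self.interf_head = nn.Linear(hidden, feat_dim)   # interference prediction

    def forward(self, x):
        h = self.shared(x)
        return self.main_head(h), self.speech_head(h), self.interf_head(h)

model = RobustAcousticModel()
x = torch.randn(8, 40)                              # noisy input features
senone_target = torch.randint(0, 500, (8,))
speech_target, interf_target = torch.randn(8, 40), torch.randn(8, 40)

main, speech, interf = model(x)
loss = (F.cross_entropy(main, senone_target)        # main objective
        + 0.5 * F.mse_loss(speech, speech_target)   # separation objectives
        + 0.5 * F.mse_loss(interf, interf_target))
loss.backward()                                     # joint optimization
```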
  • Patent number: 10140981
    Abstract: Features are disclosed for performing speech recognition on utterances using dynamic weights with speech recognition models. An automatic speech recognition system may use a general speech recognition model, such as a large finite state transducer-based language model, to generate speech recognition results for various utterances. The general speech recognition model may include sub-models or other portions that are customized for particular tasks, such as speech recognition on utterances regarding particular topics. Individual weights within the general speech recognition model can be dynamically replaced based on the context in which an utterance is made or received, thereby providing a further degree of customization without requiring additional speech recognition models to be generated, maintained, or loaded.
    Type: Grant
    Filed: June 10, 2014
    Date of Patent: November 27, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Denis Sergeyevich Filimonov, Ariya Rastrow
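A toy sketch of dynamic weight replacement in an FST-style language model, as described in this entry; the arc table, contexts, and override values are invented for illustration:

```python
# Sketch: one general model whose individual arc weights are swapped
# per utterance context, instead of loading a separate model.

base_arcs = {                        # (state, word) -> (next_state, cost)
    (0, "play"):  (1, 1.0),
    (1, "music"): (2, 1.0),
    (1, "movie"): (2, 3.0),          # generally less likely
}
context_overrides = {
    "video_app": {(1, "movie"): 0.5},   # in a video app, "movie" is cheap
}

def path_cost(words, context=None):
    arcs = dict(base_arcs)
    for key, cost in context_overrides.get(context, {}).items():
        next_state, _ = arcs[key]
        arcs[key] = (next_state, cost)   # dynamic weight replacement
    state, total = 0, 0.0
    for word in words:
        state, cost = arcs[(state, word)]
        total += cost
    return total                         # lower cost = more likely

print(path_cost(["play", "movie"]))               # 4.0
print(path_cost(["play", "movie"], "video_app"))  # 1.5
```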
  • Patent number: 10127904
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output, processing the recurrent output for the time step using a phoneme output layer to generate a phoneme representation for the acoustic feature representation for the time step, and processing the recurrent output for the time step using a grapheme output layer to generate a grapheme representation for the acoustic feature representation for the time step; and extracting, from the phoneme and grapheme representations for the acoustic feature representations at each time step, a respective pronunciation for each of one or more words.
    Type: Grant
    Filed: July 29, 2015
    Date of Patent: November 13, 2018
    Assignee: Google LLC
    Inventors: Kanury Kanishka Rao, Francoise Beaufays, Hasim Sak, Ouais Alsharif
  • Patent number: 10127495
    Abstract: Systems and methods for reducing the size of deep neural networks are disclosed. In an embodiment, a server computer stores a plurality of training datasets, each of which comprise a plurality of training input matrices and a plurality of corresponding outputs. The server computer initiates training of a deep neural network using the plurality of training input matrices, a weight matrix, and the plurality of corresponding outputs. While the training of the deep neural network is being performed, the server computer identifies one or more weight values of the weight matrix for removal. The server computer removes the one or more weight values from the weight matrix to generate a reduced weight matrix. The server computer then stores the reduced weight matrix with the deep neural network.
    Type: Grant
    Filed: April 14, 2017
    Date of Patent: November 13, 2018
    Inventors: Rohan Bopardikar, Sunil Bopardikar
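A minimal sketch of in-training weight removal as described in this entry, here using a magnitude criterion to pick weights; the abstract does not say how weight values are identified for removal, so the selection rule and fraction are assumptions:

```python
# Sketch: zero out the smallest-magnitude entries of a weight matrix
# to produce the reduced weight matrix stored with the network.

import numpy as np

def prune_weights(W, fraction=0.3):
    """Return a copy of W with the smallest-magnitude `fraction` zeroed."""
    k = int(W.size * fraction)
    if k == 0:
        return W.copy()
    threshold = np.partition(np.abs(W), k, axis=None)[k]
    W_reduced = W.copy()
    W_reduced[np.abs(W_reduced) < threshold] = 0.0
    return W_reduced

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_reduced = prune_weights(W, fraction=0.5)
print("nonzero before:", np.count_nonzero(W))          # 16
print("nonzero after: ", np.count_nonzero(W_reduced))  # 8
```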
  • Patent number: 10114809
    Abstract: A method for phonetically annotating text is performed at a computing device. The method includes: identifying a first polyphonic word segment in a text input, the first polyphonic word segment having at least a first pronunciation and a second pronunciation; determining at least a first probability for the first pronunciation and a second probability for the second pronunciation; determining a predetermined threshold difference based on: a comparison of the first and second probabilities with a preset threshold probability value, respectively, and a magnitude of a difference between the first and second probabilities; comparing the difference between the first probability and the second probability with the predetermined threshold difference; and selecting the first pronunciation as a current pronunciation for the first polyphonic word segment in accordance with a determination that the difference between the first probability and the second probability exceeds the predetermined threshold difference.
    Type: Grant
    Filed: June 23, 2016
    Date of Patent: October 30, 2018
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Xiaoping Wu, Qiang Dai
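A sketch of the selection rule described in this entry; how the predetermined threshold difference is derived from the preset probability value and the gap's magnitude is a guess at the rule's shape, not the patented formula:

```python
# Sketch: select a polyphone's pronunciation only when the probability
# gap between the top two candidates exceeds a derived threshold.

def choose_pronunciation(p1, p2, preset=0.5):
    # Hypothetical derivation: demand a larger gap when either
    # pronunciation falls below the preset probability value.
    both_confident = p1 >= preset and p2 >= preset
    threshold = 0.05 if both_confident else 0.15
    if abs(p1 - p2) > threshold:
        return "first" if p1 > p2 else "second"
    return None  # ambiguous: defer to other disambiguation

print(choose_pronunciation(0.72, 0.28))  # 'first'  (gap 0.44 > 0.15)
print(choose_pronunciation(0.52, 0.48))  # None     (gap 0.04 <= 0.15)
```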
  • Patent number: 10109272
    Abstract: According to one embodiment, an apparatus for training a neural network acoustic model includes a calculating unit, a clustering unit, a sharing unit, and a training unit. The calculating unit calculates, based on training data including a training speech and a labeled phoneme state, scores of phoneme states different from the labeled phoneme state. The clustering unit clusters a phoneme state whose score is larger than a predetermined threshold together with the labeled phoneme state. The sharing unit shares the probability of the labeled phoneme state among the clustered phoneme states. The training unit trains the neural network acoustic model based on the training speech and the clustered phoneme states.
    Type: Grant
    Filed: September 12, 2016
    Date of Patent: October 23, 2018
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Huifeng Zhu, Yan Deng, Pei Ding, Kun Yong, Jie Hao
  • Patent number: 10090001
    Abstract: A method of speech enhancement using a neural network-based combined signal starts with training the neural network offline, which includes: (i) exciting at least one accelerometer and at least one microphone using a training accelerometer signal and a training acoustic signal, respectively. The training accelerometer signal and the training acoustic signal are correlated during clean speech segments. Training the neural network offline further includes (ii) selecting speech included in the training accelerometer signal and in the training acoustic signal, and (iii) spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal. The neural network that is trained offline is then used to generate a speech reference signal based on an accelerometer signal from the at least one accelerometer and an acoustic signal received from the at least one microphone. Other embodiments are described.
    Type: Grant
    Filed: August 1, 2016
    Date of Patent: October 2, 2018
    Assignee: Apple Inc.
    Inventors: Lalin S. Theverapperuma, Vasu Iyengar, Sarmad Aziz Malik, Raghavendra Prabhu
  • Patent number: 10089977
    Abstract: Exemplary embodiments of the present invention provide a method of system combination in an audio analytics application that includes providing a plurality of language identification systems, each of which includes a plurality of probabilities. Each probability is associated with the system's ability to detect a particular language. The method includes receiving data at the language identification systems; the received data is different from the data used to train the language identification systems. A confidence measure is determined for each of the language identification systems, identifying which language that system predicts for the received data, and the language identification systems are combined according to the confidence measures.
    Type: Grant
    Filed: July 7, 2015
    Date of Patent: October 2, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sriram Ganapathy, Mohamed K. Omar, Robert Ward
  • Patent number: 10062374
    Abstract: According to some aspects, a method is provided for training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters, and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.
    Type: Grant
    Filed: July 18, 2014
    Date of Patent: August 28, 2018
    Assignee: Nuance Communications, Inc.
    Inventors: Xiaoqiang Xiao, Chengyuan Ma, Venkatesh Nagesha
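A minimal PyTorch sketch of the setup described in this entry: the acoustic model retains its first parameter values (frozen) while only the coupled transformation component is trained on the second training data. The architectures and the coupling point are illustrative:

```python
# Sketch: train an input transformation against a frozen acoustic model.

import torch
import torch.nn as nn
import torch.nn.functional as F

acoustic_model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(),
                               nn.Linear(128, 500))  # stands in for the
for p in acoustic_model.parameters():                # trained model
    p.requires_grad = False                          # first values retained

transform = nn.Linear(40, 40)                        # second parameters
optimizer = torch.optim.SGD(transform.parameters(), lr=0.01)

x = torch.randn(8, 40)                               # second training data
target = torch.randint(0, 500, (8,))

logits = acoustic_model(transform(x))     # transform coupled to the input
loss = F.cross_entropy(logits, target)
loss.backward()        # gradients flow through the frozen model but only
optimizer.step()       # the transformation component's values change
```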
  • Patent number: 10056075
    Abstract: A method for training a deep neural network, comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
    Type: Grant
    Filed: December 9, 2016
    Date of Patent: August 21, 2018
    Assignee: International Business Machines Corporation
    Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath
  • Patent number: 10049668
    Abstract: Systems and processes for converting speech-to-text are provided. In one example process, speech input can be received. A sequence of states and arcs of a weighted finite state transducer (WFST) can be traversed. A negating finite state transducer (FST) can be traversed. A virtual FST can be composed using a neural network language model and based on the sequence of states and arcs of the WFST. The one or more virtual states of the virtual FST can be traversed to determine a probability of a candidate word given one or more history candidate words. Text corresponding to the speech input can be determined based on the probability of the candidate word given the one or more history candidate words. An output can be provided based on the text corresponding to the speech input.
    Type: Grant
    Filed: May 16, 2016
    Date of Patent: August 14, 2018
    Assignee: Apple Inc.
    Inventors: Rongqing Huang, Ilya Oparin
  • Patent number: 10043512
    Abstract: A recurrent neural network (RNN) system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.
    Type: Grant
    Filed: November 11, 2016
    Date of Patent: August 7, 2018
    Assignee: Google LLC
    Inventors: Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Samuel Bengio, Ilya Sutskever
  • Patent number: 10013973
    Abstract: A method for generating a test-speaker-specific adaptive system for recognising sounds in speech spoken by a test speaker; the method employing: (i) training data comprising speech items spoken by the test speaker; and (ii) an input network component and a speaker adaptive output network, the input network component and speaker adaptive output network having been trained using training data from training speakers; the method comprising: (a) using the training data to train a test-speaker-specific adaptive model component of an adaptive model comprising the input network component, and the test-speaker-specific adaptive model component, and (b) providing the test-speaker-specific adaptive system comprising the input network component, the trained test-speaker-specific adaptive model component, and the speaker-adaptive output network.
    Type: Grant
    Filed: January 17, 2017
    Date of Patent: July 3, 2018
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Rama Doddipatla
  • Patent number: 10009635
    Abstract: Methods and apparatus to identify media content using temporal signal characteristics are disclosed. An example method includes generating a reference signature based on a reference signal corresponding to known media, generating sums based on peaks in a media signal corresponding to media, identifying signal peaks based on the generated sums, generating a second signature based on normalized curve features, wherein the normalized curve features respectively correspond to the identified signal peaks at the corresponding temporal locations of those peaks, and determining whether the media signal corresponds to the reference signal based on a comparison of the reference signature and the second signature.
    Type: Grant
    Filed: February 13, 2017
    Date of Patent: June 26, 2018
    Assignee: The Nielsen Company (US), LLC
    Inventor: Morris Lee
  • Patent number: 10001829
    Abstract: An electronic device includes an appended module coupled to a core that has a standby state and comprises a first power supply circuit, a first clock, and a circuit, timed by the first clock, that recognizes multiple vocal commands. The appended module includes a second power supply circuit independent of the first power supply circuit; a second clock independent of the first clock and having a frequency lower than that of the first clock; a digital unit, timed by the second clock, including a sound capture circuit that delivers a processed sound signal; and a processing unit configured, when a parameter of the processed sound signal exceeds a threshold, to analyze the content of the processed sound signal and, when that content comprises a reference pattern, to deliver an activating signal that can take the core out of its standby state.
    Type: Grant
    Filed: September 12, 2015
    Date of Patent: June 19, 2018
    Assignee: STMICROELECTRONICS (ROUSSET) SAS
    Inventors: Jonathan Cottinet, Jean Claude Bini
  • Patent number: 9997161
    Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: June 12, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yifan Gong, Chaojun Liu, Kshitiz Kumar
  • Patent number: 9946706
    Abstract: Methods and systems which utilize, in one embodiment, automatic language identification, including automatic language identification for dynamic text processing. In at least certain embodiments, automatic language identification can be applied to spellchecking in real time as the user types.
    Type: Grant
    Filed: June 10, 2013
    Date of Patent: April 17, 2018
    Assignee: Apple Inc.
    Inventors: Douglas R. Davidson, Ali Ozer
  • Patent number: 9916538
    Abstract: The specification covers new algorithms, methods, and systems for artificial intelligence, soft computing, and deep learning/recognition, e.g., image recognition (e.g., for action, gesture, emotion, expression, biometrics, fingerprint, facial, OCR (text), background, relationship, position, pattern, and object), Big Data analytics, machine learning, training schemes, crowd-sourcing (experts), feature space, clustering, classification, SVM, similarity measures, modified Boltzmann Machines, optimization, search engine, ranking, question-answering system, soft (fuzzy or unsharp) boundaries/impreciseness/ambiguities/fuzziness in language, Natural Language Processing (NLP), Computing-with-Words (CWW), parsing, machine translation, sound and speech recognition, video search and analysis (e.g.
    Type: Grant
    Filed: March 18, 2014
    Date of Patent: March 13, 2018
    Assignee: Z ADVANCED COMPUTING, INC.
    Inventors: Lotfi A. Zadeh, Saied Tadayon, Bijan Tadayon
  • Patent number: 9916306
    Abstract: Systems and methods for statistical linguistic analysis. According to some embodiments, methods may include evaluating a source text using one or more types of statistical linguistic analysis to determine a translatability of the source text and providing the translatability of the source text to a client node.
    Type: Grant
    Filed: October 19, 2012
    Date of Patent: March 13, 2018
    Assignee: SDL INC.
    Inventors: Laurens van den Oever, Jason Matthew Dent
  • Patent number: 9892115
    Abstract: An improved lecture support system integrates multi-media presentation materials with spoken content so that the listener can follow both the speech and the supporting materials that accompany the presentation for additional understanding. Computer-based systems and methods are disclosed for translation of a spoken presentation (e.g., a lecture, a video) along with the accompanying presentation materials. The content of the presentation materials can be used to improve presentation translation, as the system extracts supportive material from the presentation materials as they relate to the speech.
    Type: Grant
    Filed: January 5, 2015
    Date of Patent: February 13, 2018
    Assignee: Facebook, Inc.
    Inventor: Alexander Waibel
  • Patent number: 9881615
    Abstract: A speech recognition apparatus and method. The speech recognition apparatus includes a first recognizer configured to generate a first recognition result of an audio signal, in a first linguistic recognition unit, by using an acoustic model; a second recognizer configured to generate a second recognition result of the audio signal, in a second linguistic recognition unit, by using a language model; and a combiner configured to combine the first recognition result and the second recognition result to generate a final recognition result in the second linguistic recognition unit and to reflect the final recognition result in the language model. The first linguistic recognition unit may be the same linguistic unit type as the second linguistic recognition unit. The first recognizer and the second recognizer are configured in the same neural network and simultaneously/collectively trained in the neural network using audio training data provided to the first recognizer.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: January 30, 2018
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hee Youl Choi, Seokjin Hong
  • Patent number: 9858524
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating descriptions of input images. One of the methods includes obtaining an input image; processing the input image using a first neural network to generate an alternative representation for the input image; and processing the alternative representation for the input image using a second neural network to generate a sequence of a plurality of words in a target natural language that describes the input image.
    Type: Grant
    Filed: November 13, 2015
    Date of Patent: January 2, 2018
    Assignee: Google Inc.
    Inventors: Samy Bengio, Oriol Vinyals, Alexander Toshkov Toshev, Dumitru Erhan
  • Patent number: 9842610
    Abstract: A method is provided for training a Deep Neural Network (DNN) for acoustic modeling in speech recognition. The method includes reading central frames and side frames as input frames from a memory. The side frames are preceding side frames preceding the central frames and/or succeeding side frames succeeding the central frames. The method further includes executing pre-training for only the central frames, or for both the central frames and the side frames, and fine-tuning for the central frames and the side frames, so as to emphasize connections between acoustic features in the central frames and units of the bottom hidden layer of the DNN.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: December 12, 2017
    Assignee: International Business Machines Corporation
    Inventor: Gakuto Kurata
  • Patent number: 9842585
    Abstract: Described herein are various technologies pertaining to a multilingual deep neural network (MDNN). The MDNN includes a plurality of hidden layers, wherein values for weight parameters of the plurality of hidden layers are learned during a training phase based upon training data in the form of raw acoustic features for multiple languages. The MDNN further includes softmax layers that are trained for each target language separately, making use of the hidden layer values trained jointly with the multiple source languages. The MDNN is adaptable, such that a new softmax layer may be added on top of the existing hidden layers, where the new softmax layer corresponds to a new target language.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: December 12, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jui-Ting Huang, Jinyu Li, Dong Yu, Li Deng, Yifan Gong
  • Patent number: 9842106
    Abstract: A method and system process utterances that are acquired either from an automatic speech recognition (ASR) system or from text. The utterances have associated identities of each party, such as role A utterances and role B utterances. The information corresponding to the utterances, such as word sequence and identity, is converted to features. Each feature is received in an input layer of a neural network (NN). A dimensionality of each feature is reduced, in a projection layer of the NN, to produce a reduced dimensional feature. The reduced dimensional feature is processed to provide probabilities of labels for the utterances.
    Type: Grant
    Filed: December 4, 2015
    Date of Patent: December 12, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Takaaki Hori, Shinji Watanabe, John Hershey
  • Patent number: 9824684
    Abstract: A sequence recognition system comprises a prediction component configured to receive a set of observed features from a signal to be recognized and to output a prediction output indicative of a predicted recognition based on the set of observed features. The sequence recognition system also comprises a classification component configured to receive the prediction output and to output a label indicative of recognition of the signal based on the prediction output.
    Type: Grant
    Filed: December 22, 2014
    Date of Patent: November 21, 2017
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Dong Yu, Yu Zhang, Michael L. Seltzer, James G. Droppo
  • Patent number: 9824692
    Abstract: The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.
    Type: Grant
    Filed: September 12, 2016
    Date of Patent: November 21, 2017
    Assignee: PINDROP SECURITY, INC.
    Inventors: Elie Khoury, Matthew Garland
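A minimal PyTorch sketch of a triplet loss over cosine similarities with positive and negative margins, as the abstract describes; the embedding network, margin values, and cohort construction are assumptions:

```python
# Sketch: push same-speaker cosine similarity above a positive margin
# and cohort (negative-sample) similarity below a negative margin.

import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 32))

def triplet_cosine_loss(anchor, positive, negatives,
                        pos_margin=0.8, neg_margin=0.3):
    a, p = embed(anchor), embed(positive)
    n = embed(negatives)                          # cohort of negatives
    sim_pos = F.cosine_similarity(a, p)           # same speaker: push high
    sim_neg = F.cosine_similarity(a.expand_as(n), n)  # cohort: push low
    return (F.relu(pos_margin - sim_pos).mean()
            + F.relu(sim_neg - neg_margin).mean())

anchor = torch.randn(1, 40)     # utterance features for one speaker
positive = torch.randn(1, 40)   # another utterance from the same speaker
cohort = torch.randn(16, 40)    # negative training samples
loss = triplet_cosine_loss(anchor, positive, cohort)
loss.backward()
```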
  • Patent number: 9818431
    Abstract: The technology described herein uses a multiple-output layer RNN to process an acoustic signal comprising speech from multiple speakers to trace an individual speaker's speech. The multiple-output layer RNN has multiple output layers, each of which is meant to trace one speaker (or noise) and represent the mask for that speaker (or noise). The output layer for each speaker (or noise) can have the same dimensions and can be normalized for each output unit across all output layers. The rest of the layers in the multiple-output layer RNN are shared across all the output layers. The result from the previous frame is used as input to the output layer or to one of the hidden layers of the RNN to calculate results for the current frame. This pass back of results allows the model to carry information from previous frames to future frames to trace the same speaker.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: November 14, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Dong Yu
  • Patent number: 9818409
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.
    Type: Grant
    Filed: October 7, 2015
    Date of Patent: November 14, 2017
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Hasim Sak, Izhak Shafran
  • Patent number: 9818410
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
    Type: Grant
    Filed: December 29, 2015
    Date of Patent: November 14, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior
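A small sketch of the frame stacking and subsampling step described in this entry; the stack width and stride are illustrative, and the RNN/CTC model that consumes the result is omitted:

```python
# Sketch: concatenate neighboring acoustic frames, then keep every
# Nth stacked frame so the recurrent layers process fewer time steps.

import numpy as np

def stack_and_subsample(frames, stack=3, stride=3):
    """frames: (T, D) -> (about (T - stack + 1) / stride, stack * D)."""
    T, D = frames.shape
    stacked = np.stack([frames[t:t + stack].reshape(-1)
                        for t in range(T - stack + 1)])
    return stacked[::stride]       # subsample the modified frames

frames = np.random.randn(100, 40)  # 100 frames of 40-dim features
out = stack_and_subsample(frames)
print(out.shape)                   # (33, 120)
```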
  • Patent number: 9805720
    Abstract: A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
    Type: Grant
    Filed: January 13, 2017
    Date of Patent: October 31, 2017
    Assignee: International Business Machines Corporation
    Inventors: Jonathan H. Connell, II, Etienne Marcheret
  • Patent number: 9792900
    Abstract: Various implementations disclosed herein include an expert-assisted phoneme recognition neural network system configured to recognize phonemes within continuous large-vocabulary speech sequences without using language-specific models (“left-context”), look-ahead (“right-context”) information, or multi-pass sequence processing, and while operating within the resource constraints of low-power and real-time devices. To these ends, in various implementations, an expert-assisted phoneme recognition neural network system as described herein utilizes a priori phonetic knowledge. Phonetics is concerned with the configuration of the human vocal tract while speaking and the acoustic consequences for vocalizations. While similar sounding phonemes are difficult to detect and are frequently misidentified by previously known neural networks, phonetic knowledge gives insight into which aspects of sound acoustics contain the strongest contrast between similar sounding phonemes.
    Type: Grant
    Filed: July 6, 2016
    Date of Patent: October 17, 2017
    Assignee: MALASPINA LABS (BARBADOS), INC.
    Inventors: Saeed Mosayyebpour Kaskari, Aanchan Kumar Mohan, Michael David Fry, Dean Wolfgang Neumann
  • Patent number: 9785629
    Abstract: Methods and systems for automated language detection for domain names are disclosed. In some embodiments, a method for detecting a language of an Internationalized Domain Name (IDN) comprises receiving, by an I/O interface, a string of characters for the IDN; receiving training data, including a plurality of multi-gram analyses for a set of languages; analyzing, by a processor, the string of characters based on the training data, wherein the analyzing includes extracting a set of multi-grams from the string of characters and comparing the extracted set of multi-grams with the training data; detecting the language of the IDN based on results of the analyzing. In some embodiments, the method further comprises comparing the detected language of the IDN with a user selected language and using the IDN to generate a domain name, if the comparing indicates that the detected language of the IDN is consistent with the user selected language.
    Type: Grant
    Filed: December 15, 2015
    Date of Patent: October 10, 2017
    Assignee: VERISIGN, INC.
    Inventors: Ronald Andrew Hoskinson, Lambert Arians, Marc Anderson, Mahendra Jain
  • Patent number: 9786270
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: October 10, 2017
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
  • Patent number: 9761228
    Abstract: Provided are a recognition result candidate comparator 205 that compares a plurality of server-side voice recognition result candidates received by a receiver 204 to detect texts having a difference, and a recognition result integrator 206 that integrates a client-side voice recognition result candidate and a server-side voice recognition result candidate, on the basis of the client-side voice recognition result candidate, the server-side voice recognition result candidate, and the detection result provided by the recognition result candidate comparator 205, to decide a voice recognition result.
    Type: Grant
    Filed: November 20, 2013
    Date of Patent: September 12, 2017
    Assignee: Mitsubishi Electric Corporation
    Inventors: Isamu Ogawa, Toshiyuki Hanazawa, Tomohiro Narita
  • Patent number: 9761221
    Abstract: According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values. The method comprises, using at least one computer, performing acts of: providing a first vector as input to a first network layer comprising one or more network units of the neural network; transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit; sorting the plurality of values to produce a sorted plurality of values; and providing the sorted plurality of values as input to a second network layer of the neural network.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: September 12, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Steven John Rennie, Vaibhava Goel
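A minimal PyTorch sketch of the sorting step described in this entry: a unit's weighted transformation produces a plurality of values that are sorted before being passed to the next layer. The sizes and the surrounding layers are illustrative:

```python
# Sketch: a network layer whose unit sorts its transformed values.

import torch
import torch.nn as nn

class SortUnit(nn.Module):
    def __init__(self, in_dim=16, out_dim=16):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)  # the unit's weight values

    def forward(self, x):
        values = self.linear(x)                   # transform the input vector
        sorted_vals, _ = torch.sort(values, dim=-1)
        return sorted_vals                        # sorted plurality of values

net = nn.Sequential(SortUnit(16, 16), nn.Linear(16, 2))  # second network layer
scores = net(torch.randn(4, 16))   # e.g., score speech recognition results
print(scores.shape)                # torch.Size([4, 2])
```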
  • Patent number: 9754584
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.
    Type: Grant
    Filed: November 8, 2016
    Date of Patent: September 5, 2017
    Assignee: Google Inc.
    Inventors: Maria Carolina Parada San Martin, Tara N. Sainath, Guoguo Chen
  • Patent number: 9721562
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
    Type: Grant
    Filed: December 3, 2014
    Date of Patent: August 1, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior