Neural Network Patents (Class 704/232)
  • Patent number: 10269342
    Abstract: A speech recognition system used in a workflow receives and analyzes speech input to recognize and accept a user's response to a task. Under certain conditions, a user's response might be expected. In these situations, the expected response may modify the behavior of the speech recognition system to improve recognition accuracy. For example, if the hypothesis of a user's response matches the expected response, then there is a high probability that the user's response was recognized correctly. An expected response may include expected words and wildcard words. Wildcard words represent any recognized word in a user's response. By including wildcard words in the expected response, the speech recognition system may make modifications based on a wide range of user responses.
    Type: Grant
    Filed: October 29, 2014
    Date of Patent: April 23, 2019
    Assignee: HAND HELD PRODUCTS, INC.
    Inventors: Keith Braho, Jason M Makay
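The expected-response matching described in this entry lends itself to a short illustration. A minimal Python sketch, assuming a token-level wildcard and a threshold-relaxation rule; the WILDCARD token, matches_expected helper, and boost value are hypothetical, not the patented implementation:

```python
# Sketch: match a recognizer hypothesis against an expected response
# that may contain wildcard words (here the token "*"), and relax the
# acceptance threshold when the hypothesis matches.

WILDCARD = "*"  # hypothetical token standing in for any recognized word

def matches_expected(hypothesis, expected):
    """True if every expected word matches, treating WILDCARD as any word."""
    if len(hypothesis) != len(expected):
        return False
    return all(e == WILDCARD or h == e for h, e in zip(hypothesis, expected))

def accept(hypothesis, score, expected, threshold=0.7, boost=0.2):
    # A match against the expected response implies a high probability of
    # correct recognition, so the acceptance threshold is relaxed.
    if matches_expected(hypothesis, expected):
        threshold -= boost
    return score >= threshold

# "pick <quantity> items", where the quantity can be any recognized word.
print(accept(["pick", "5", "items"], 0.6, ["pick", WILDCARD, "items"]))  # True
print(accept(["stop"], 0.6, ["pick", WILDCARD, "items"]))                # False
```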
  • Patent number: 10268671
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating parse trees for input text segments. One of the methods includes obtaining an input text segment comprising a plurality of inputs arranged according to an input order; processing the inputs in the input text segment using an encoder long short term memory (LSTM) neural network to generate a respective encoder hidden state for each input in the input text segment; and processing the respective encoder hidden states for the inputs in the input text segment using an attention-based decoder LSTM neural network to generate a linearized representation of a parse tree for the input text segment.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: April 23, 2019
    Assignee: Google LLC
    Inventors: Lukasz Mieczyslaw Kaiser, Oriol Vinyals
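The decoder in this entry targets a "linearized representation of a parse tree." A small sketch of one plausible linearization, depth-first bracketing over (label, children) tuples; the tuple encoding and the "XX" word placeholder are assumptions, not necessarily the patent's exact scheme:

```python
# Sketch: depth-first linearization of a parse tree into the token
# sequence an attention-based decoder could be trained to emit.

def linearize(tree):
    """(label, children) tuples -> tokens like "(S", "(NP", ")NP", ")S".
    Leaf words become a placeholder; the decoder predicts structure
    only, while the words come from the input sequence."""
    label, children = tree
    if not children:                  # leaf: a word position
        return ["XX"]
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)
    return tokens

# "John has a dog" -> (S (NP John) (VP has (NP a dog)))
tree = ("S", [("NP", [("John", [])]),
              ("VP", [("has", []),
                      ("NP", [("a", []), ("dog", [])])])])
print(" ".join(linearize(tree)))
# (S (NP XX )NP (VP XX (NP XX XX )NP )VP )S
```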
  • Patent number: 10262644
    Abstract: An application that manipulates audio (or audiovisual) content, automated music creation technologies may be employed to generate new musical content using digital signal processing software hosted on handheld and/or server (or cloud-based) compute platforms to intelligently process and combine a set of audio content captured and submitted by users of modern mobile phones or other handheld compute platforms. The user-submitted recordings may contain speech, singing, musical instruments, or a wide variety of other sound sources, and the recordings may optionally be preprocessed by the handheld devices prior to submission.
    Type: Grant
    Filed: December 31, 2014
    Date of Patent: April 16, 2019
    Assignee: Smule, Inc.
    Inventors: Randal Leistikow, Mark Godfrey, Ian S. Simon, Jeannie Yang, Michael W. Allen
  • Patent number: 10254760
    Abstract: Systems and methods are provided for automatically self-correcting, or correcting in real time, one or more neural networks after detecting a triggering event or a breach of boundary conditions. Such a triggering event may indicate an incorrect output signal or data being generated by the one or more neural networks. In particular, machine controllers of the invention limit the operations of neural networks to be within boundary conditions. Autonomous machines of the invention can be self-corrected after a breach of a boundary condition is detected. Autonomous land vehicles of the invention are capable of determining the timing of automatic transition to manual control from automated driving mode. The controller of the invention filters and saves input-output data sets that fall within boundary conditions for later training of neural networks. The controllers of the invention include security architectures to prevent damage from virus attacks or system malfunctions.
    Type: Grant
    Filed: June 4, 2018
    Date of Patent: April 9, 2019
    Assignee: Apex Artificial Intelligence Industries, Inc.
    Inventor: Kenneth A. Abeloe
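A minimal sketch of the boundary-condition guard described in this abstract (and in the closely related patent below), assuming per-signal (low, high) bounds; the BoundedController class, its manual-mode fallback, and the retraining buffer are illustrative only:

```python
# Sketch: limit a neural network's commands to boundary conditions,
# transition to manual control on a breach, and retain in-bounds
# input/output pairs for later retraining.

class BoundedController:
    def __init__(self, net, bounds):
        self.net = net                # callable: observation -> command dict
        self.bounds = bounds          # signal name -> (low, high)
        self.training_buffer = []     # in-bounds (observation, command) pairs
        self.manual_mode = False

    def in_bounds(self, command):
        return all(lo <= command[k] <= hi
                   for k, (lo, hi) in self.bounds.items())

    def step(self, observation):
        command = self.net(observation)
        if not self.in_bounds(command):     # triggering event detected
            self.manual_mode = True         # hand control back to the human
            return None
        self.training_buffer.append((observation, command))
        return command

ctrl = BoundedController(net=lambda obs: {"steering": obs * 2.0},
                         bounds={"steering": (-1.0, 1.0)})
print(ctrl.step(0.3))    # {'steering': 0.6} -- accepted and buffered
print(ctrl.step(0.9))    # None -- breach detected
print(ctrl.manual_mode)  # True
```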
  • Patent number: 10242665
    Abstract: Systems and methods are provided for automatically self-correcting, or correcting in real time, one or more neural networks after detecting a triggering event or a breach of boundary conditions. Such a triggering event may indicate an incorrect output signal or data being generated by the one or more neural networks. In particular, machine controllers of the invention limit the operations of neural networks to be within boundary conditions. Autonomous machines of the invention can be self-corrected after a breach of a boundary condition is detected. Autonomous land vehicles of the invention are capable of determining the timing of automatic transition to manual control from automated driving mode. The controller of the invention filters and saves input-output data sets that fall within boundary conditions for later training of neural networks. The controllers of the invention include security architectures to prevent damage from virus attacks or system malfunctions.
    Type: Grant
    Filed: June 4, 2018
    Date of Patent: March 26, 2019
    Assignee: Apex Artificial Intelligence Industries, Inc.
    Inventor: Kenneth A. Abeloe
  • Patent number: 10235992
    Abstract: The technology of the present application provides a speech recognition system with at least two different speech recognition engines, or a single speech recognition engine with at least two different modes of operation. The first speech recognition engine is used to match audio to text, where the text may be words or phrases. The matched audio and text are used by a training module to train a user profile for a natural language speech recognition engine, which is at least one of the two different speech recognition engines or modes. An evaluation module evaluates when the user profile is sufficiently trained to convert the speech recognition engine from the first speech recognition engine or mode to the natural language speech recognition engine or mode.
    Type: Grant
    Filed: July 27, 2017
    Date of Patent: March 19, 2019
    Assignee: nVoq Incorporated
    Inventors: Charles Corfield, Brian Marquette
  • Patent number: 10235994
    Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origin (e.g., native, non-native), speech channel (e.g., mobile, Bluetooth, desktop), speech application scenario (e.g., voice search, short message dictation), and speaker variation (e.g., individual speakers or clustered speakers). The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: March 19, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
  • Patent number: 10229672
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models and using the trained acoustic models. A connectionist temporal classification (CTC) acoustic model is accessed, the CTC acoustic model having been trained using a context-dependent state inventory generated from approximate phonetic alignments determined by another CTC acoustic model trained without fixed alignment targets. Audio data for a portion of an utterance is received. Input data corresponding to the received audio data is provided to the accessed CTC acoustic model. Data indicating a transcription for the utterance is generated based on output that the accessed CTC acoustic model produced in response to the input data. The data indicating the transcription is provided as output of an automated speech recognition service.
    Type: Grant
    Filed: January 3, 2017
    Date of Patent: March 12, 2019
    Assignee: Google LLC
    Inventors: Kanury Kanishka Rao, Andrew W. Senior, Hasim Sak
  • Patent number: 10217052
    Abstract: The disclosure is directed to evaluating feature vectors using decision trees. Typically, the number of feature vectors and the number of decision trees are very high, which prevents loading them into a processor cache. The feature vectors are evaluated by processing them across disjoint subsets of trees repeatedly. After loading the feature vectors into the cache, they are evaluated across a first subset of trees, then across a second subset of trees, and so on. If the values based on the first and second subsets satisfy a specified criterion, further evaluation of the feature vectors across the remaining decision trees is terminated, thereby minimizing the number of trees evaluated and, therefore, the consumption of computing resources.
    Type: Grant
    Filed: April 29, 2015
    Date of Patent: February 26, 2019
    Assignee: Facebook, Inc.
    Inventors: Oleksandr Kuvshynov, Aleksandar Ilic
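A sketch of the evaluation order described in this entry: keep the feature vectors cache-resident, score them against one disjoint subset of trees at a time, and stop once partial scores satisfy a criterion. The per-tree scoring stub and the early-exit rule are assumptions:

```python
# Sketch: evaluate feature vectors across disjoint subsets of decision
# trees, terminating early when partial scores already separate items.

def evaluate(feature_vectors, trees, subset_size=2, margin=0.95):
    scores = [0.0] * len(feature_vectors)
    trees_used = 0
    for start in range(0, len(trees), subset_size):
        subset = trees[start:start + subset_size]
        # All feature vectors stay cache-resident while this subset runs.
        for i, fv in enumerate(feature_vectors):
            scores[i] += sum(tree(fv) for tree in subset)
        trees_used += len(subset)
        # Hypothetical criterion: runner-up is far below the leader.
        ranked = sorted(scores, reverse=True)
        if len(ranked) > 1 and ranked[1] <= margin * ranked[0]:
            break                      # skip the remaining trees
    return scores, trees_used

# Stub "trees": each scores a feature vector with a fixed weight.
trees = [lambda fv, w=w: w * fv[0] for w in (0.5, 0.4, 0.3, 0.2)]
scores, used = evaluate([[1.0], [0.2]], trees)
print(scores, "- evaluated", used, "of", len(trees), "trees")
```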
  • Patent number: 10170107
    Abstract: An approach to extending the recognizable labels of a label recognizer makes use of an encoding of linguistic inputs and label attributes into comparable vectors. The encodings may be determined with artificial neural networks (ANNs) that are jointly trained, and a comparison between the encoding of a sentence input and the encoding of an intent attribute vector may use a fixed function, which does not have to be trained. The encoding of label attributes can generalize, permitting a new label to be added via its corresponding attributes and thereby avoiding the need to immediately retrain the label recognizer with example inputs.
    Type: Grant
    Filed: December 29, 2016
    Date of Patent: January 1, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Markus Dreyer, Pavankumar Reddy Muddireddy, Anjishnu Kumar
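A minimal sketch of the comparison step described in this entry, using a fixed (untrained) cosine similarity between a sentence encoding and label-attribute encodings; the toy vectors stand in for the jointly trained ANN encoders, and the label names are invented:

```python
# Sketch: zero-shot label recognition by comparing encodings with a
# fixed function; adding a label requires only its attribute encoding.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(sentence_vec, label_vecs):
    return max(label_vecs, key=lambda name: cosine(sentence_vec, label_vecs[name]))

# Hypothetical 4-dim encodings produced by the trained encoders.
labels = {
    "PlayMusic":  np.array([0.9, 0.1, 0.0, 0.2]),
    "GetWeather": np.array([0.0, 0.8, 0.3, 0.1]),
}
sentence = np.array([0.1, 0.9, 0.2, 0.0])
print(classify(sentence, labels))  # GetWeather

# A new label needs no retraining -- just supply its attribute encoding.
labels["BookFlight"] = np.array([0.1, 0.2, 0.9, 0.4])
print(classify(np.array([0.0, 0.1, 0.8, 0.5]), labels))  # BookFlight
```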
  • Patent number: 10163454
    Abstract: A method is provided for training a Deep Neural Network (DNN) for acoustic modeling in speech recognition. The method includes reading central frames and side frames as input frames from a memory. The side frames are preceding side frames preceding the central frames and/or succeeding side frames succeeding the central frames. The method further includes executing pre-training for only the central frames, or for both the central frames and the side frames, and fine-tuning for the central frames and the side frames, so as to emphasize connections between acoustic features in the central frames and units of the bottom hidden layer of the DNN.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: December 25, 2018
    Assignee: International Business Machines Corporation
    Inventor: Gakuto Kurata
  • Patent number: 10152968
    Abstract: Systems and methods for speech-based monitoring and/or control of automation devices are described. A speech-based method for monitoring and/or control of automation devices may include steps of determining a type of automation device to which first speech relates based, at least in part, on a location associated with the first speech; selecting a topic-specific speech recognition model adapted to recognize speech related to the determined type of automation device; using the topic-specific speech recognition model to recognize second speech provided at the location, wherein recognizing the second speech comprises identifying a query or command relating to the type of automation device and represented by the second speech; and issuing the query or command represented by the second speech to an automation device of the determined type.
    Type: Grant
    Filed: June 27, 2016
    Date of Patent: December 11, 2018
    Assignee: Iconics, Inc.
    Inventors: Russell L. Agrusa, Vojtech Kresl, Christopher N. Elsbree, Marco Tagliaferri, Lukas Volf
  • Patent number: 10147442
    Abstract: A neural network acoustic model is trained to be robust and produce accurate output when used to process speech signals having acoustic interference. The neural network acoustic model can be trained using a source-separation process by which, in addition to producing the main acoustic model output for a given input, the neural network generates predictions of the separate speech and interference portions of the input. The parameters of the neural network can be adjusted to jointly optimize all three outputs (e.g., the main acoustic model output, the speech signal prediction, and the interference signal prediction), rather than only optimizing the main acoustic model output. Once trained, output layers for the speech and interference signal predictions can be removed from the neural network or otherwise disabled.
    Type: Grant
    Filed: September 29, 2015
    Date of Patent: December 4, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Sankaran Panchapagesan, Shiva Kumar Sundaram, Arindam Mandal
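A minimal PyTorch sketch of the joint three-output training described in this entry; the layer sizes, loss weights, and random targets are illustrative, not the patented configuration:

```python
# Sketch: an acoustic model with auxiliary speech- and interference-
# prediction heads, jointly optimized; the aux heads are dropped after
# training and only the main output is used at inference.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RobustAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, n_senones=500):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden), nn.ReLU())
        self.main_head = nn.Linear(hidden, n_senones)    # acoustic model output
        self.speech_head = nn.Linear(hidden, feat_dim)   # speech prediction
        self.interf_head = nn.Linear(hidden, feat_dim)   # interference prediction

    def forward(self, x):
        h = self.shared(x)
        return self.main_head(h), self.speech_head(h), self.interf_head(h)

model = RobustAcousticModel()
x = torch.randn(8, 40)                              # noisy input features
senone_target = torch.randint(0, 500, (8,))
speech_target, interf_target = torch.randn(8, 40), torch.randn(8, 40)

main, speech, interf = model(x)
loss = (F.cross_entropy(main, senone_target)        # main objective
        + 0.5 * F.mse_loss(speech, speech_target)   # separation objectives
        + 0.5 * F.mse_loss(interf, interf_target))
loss.backward()                                     # joint optimization
```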
  • Patent number: 10140981
    Abstract: Features are disclosed for performing speech recognition on utterances using dynamic weights with speech recognition models. An automatic speech recognition system may use a general speech recognition model, such as a large finite state transducer-based language model, to generate speech recognition results for various utterances. The general speech recognition model may include sub-models or other portions that are customized for particular tasks, such as speech recognition on utterances regarding particular topics. Individual weights within the general speech recognition model can be dynamically replaced based on the context in which an utterance is made or received, thereby providing a further degree of customization without requiring additional speech recognition models to be generated, maintained, or loaded.
    Type: Grant
    Filed: June 10, 2014
    Date of Patent: November 27, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Denis Sergeyevich Filimonov, Ariya Rastrow
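A toy sketch of dynamic weight replacement in an FST-style language model, as described in this entry; the arc table, contexts, and override values are invented for illustration:

```python
# Sketch: one general model whose individual arc weights are swapped
# per utterance context, instead of loading a separate model.

base_arcs = {                        # (state, word) -> (next_state, cost)
    (0, "play"):  (1, 1.0),
    (1, "music"): (2, 1.0),
    (1, "movie"): (2, 3.0),          # generally less likely
}
context_overrides = {
    "video_app": {(1, "movie"): 0.5},   # in a video app, "movie" is cheap
}

def path_cost(words, context=None):
    arcs = dict(base_arcs)
    for key, cost in context_overrides.get(context, {}).items():
        next_state, _ = arcs[key]
        arcs[key] = (next_state, cost)   # dynamic weight replacement
    state, total = 0, 0.0
    for word in words:
        state, cost = arcs[(state, word)]
        total += cost
    return total                         # lower cost = more likely

print(path_cost(["play", "movie"]))               # 4.0
print(path_cost(["play", "movie"], "video_app"))  # 1.5
```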
  • Patent number: 10127904
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output, processing the recurrent output for the time step using a phoneme output layer to generate a phoneme representation for the acoustic feature representation for the time step, and processing the recurrent output for the time step using a grapheme output layer to generate a grapheme representation for the acoustic feature representation for the time step; and extracting, from the phoneme and grapheme representations for the acoustic feature representations at each time step, a respective pronunciation for each of one or more words.
    Type: Grant
    Filed: July 29, 2015
    Date of Patent: November 13, 2018
    Assignee: Google LLC
    Inventors: Kanury Kanishka Rao, Francoise Beaufays, Hasim Sak, Ouais Alsharif
  • Patent number: 10127495
    Abstract: Systems and methods for reducing the size of deep neural networks are disclosed. In an embodiment, a server computer stores a plurality of training datasets, each of which comprise a plurality of training input matrices and a plurality of corresponding outputs. The server computer initiates training of a deep neural network using the plurality of training input matrices, a weight matrix, and the plurality of corresponding outputs. While the training of the deep neural network is being performed, the server computer identifies one or more weight values of the weight matrix for removal. The server computer removes the one or more weight values from the weight matrix to generate a reduced weight matrix. The server computer then stores the reduced weight matrix with the deep neural network.
    Type: Grant
    Filed: April 14, 2017
    Date of Patent: November 13, 2018
    Inventors: Rohan Bopardikar, Sunil Bopardikar
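A minimal sketch of in-training weight removal as described in this entry, here using a magnitude criterion to pick weights; the abstract does not say how weight values are identified for removal, so the selection rule and fraction are assumptions:

```python
# Sketch: zero out the smallest-magnitude entries of a weight matrix
# to produce the reduced weight matrix stored with the network.

import numpy as np

def prune_weights(W, fraction=0.3):
    """Return a copy of W with the smallest-magnitude `fraction` zeroed."""
    k = int(W.size * fraction)
    if k == 0:
        return W.copy()
    threshold = np.partition(np.abs(W), k, axis=None)[k]
    W_reduced = W.copy()
    W_reduced[np.abs(W_reduced) < threshold] = 0.0
    return W_reduced

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_reduced = prune_weights(W, fraction=0.5)
print("nonzero before:", np.count_nonzero(W))          # 16
print("nonzero after: ", np.count_nonzero(W_reduced))  # 8
```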
  • Patent number: 10114809
    Abstract: A method for phonetically annotating text is performed at a computing device. The method includes: identifying a first polyphonic word segment in a text input, the first polyphonic word segment having at least a first pronunciation and a second pronunciation; determining at least a first probability for the first pronunciation and a second probability for the second pronunciation; determining a predetermined threshold difference based on: a comparison of the first and second probabilities with a preset threshold probability value, respectively, and a magnitude of a difference between the first and second probabilities; comparing the difference between the first probability and the second probability with the predetermined threshold difference; and selecting the first pronunciation as a current pronunciation for the first polyphonic word segment in accordance with a determination that the difference between the first probability and the second probability exceeds the predetermined threshold difference.
    Type: Grant
    Filed: June 23, 2016
    Date of Patent: October 30, 2018
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Xiaoping Wu, Qiang Dai
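A sketch of the selection rule described in this entry; how the predetermined threshold difference is derived from the preset probability value and the gap's magnitude is a guess at the rule's shape, not the patented formula:

```python
# Sketch: select a polyphone's pronunciation only when the probability
# gap between the top two candidates exceeds a derived threshold.

def choose_pronunciation(p1, p2, preset=0.5):
    # Hypothetical derivation: demand a larger gap when either
    # pronunciation falls below the preset probability value.
    both_confident = p1 >= preset and p2 >= preset
    threshold = 0.05 if both_confident else 0.15
    if abs(p1 - p2) > threshold:
        return "first" if p1 > p2 else "second"
    return None  # ambiguous: defer to other disambiguation

print(choose_pronunciation(0.72, 0.28))  # 'first'  (gap 0.44 > 0.15)
print(choose_pronunciation(0.52, 0.48))  # None     (gap 0.04 <= 0.15)
```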
  • Patent number: 10109272
    Abstract: According to one embodiment, an apparatus for training a neural network acoustic model includes a calculating unit, a clustering unit, a sharing unit, and a training unit. The calculating unit calculates, based on training data including a training speech and a labeled phoneme state, scores of phoneme states different from the labeled phoneme state. The clustering unit clusters a phoneme state whose score is larger than a predetermined threshold together with the labeled phoneme state. The sharing unit shares the probability of the labeled phoneme state among the clustered phoneme states. The training unit trains the neural network acoustic model based on the training speech and the clustered phoneme states.
    Type: Grant
    Filed: September 12, 2016
    Date of Patent: October 23, 2018
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Huifeng Zhu, Yan Deng, Pei Ding, Kun Yong, Jie Hao
  • Patent number: 10090001
    Abstract: A method of speech enhancement using a neural network-based combined signal starts with training the neural network offline, which includes: (i) exciting at least one accelerometer and at least one microphone using a training accelerometer signal and a training acoustic signal, respectively. The training accelerometer signal and the training acoustic signal are correlated during clean speech segments. Training the neural network offline further includes (ii) selecting speech included in the training accelerometer signal and in the training acoustic signal, and (iii) spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal. The neural network that is trained offline is then used to generate a speech reference signal based on an accelerometer signal from the at least one accelerometer and an acoustic signal received from the at least one microphone. Other embodiments are described.
    Type: Grant
    Filed: August 1, 2016
    Date of Patent: October 2, 2018
    Assignee: Apple Inc.
    Inventors: Lalin S. Theverapperuma, Vasu Iyengar, Sarmad Aziz Malik, Raghavendra Prabhu
  • Patent number: 10089977
    Abstract: Exemplary embodiments of the present invention provide a method of system combination in an audio analytics application that includes providing a plurality of language identification systems, each of which includes a plurality of probabilities. Each probability is associated with the system's ability to detect a particular language. The method includes receiving data at the language identification systems; the received data is different from the data used to train the language identification systems. A confidence measure is determined for each of the language identification systems, identifying which language that system predicts for the received data, and the language identification systems are combined according to the confidence measures.
    Type: Grant
    Filed: July 7, 2015
    Date of Patent: October 2, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sriram Ganapathy, Mohamed K. Omar, Robert Ward
  • Patent number: 10062374
    Abstract: According to some aspects, a method is provided for training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters, and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.
    Type: Grant
    Filed: July 18, 2014
    Date of Patent: August 28, 2018
    Assignee: Nuance Communications, Inc.
    Inventors: Xiaoqiang Xiao, Chengyuan Ma, Venkatesh Nagesha
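A minimal PyTorch sketch of the setup described in this entry: the acoustic model retains its first parameter values (frozen) while only the coupled transformation component is trained on the second training data. The architectures and the coupling point are illustrative:

```python
# Sketch: train an input transformation against a frozen acoustic model.

import torch
import torch.nn as nn
import torch.nn.functional as F

acoustic_model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(),
                               nn.Linear(128, 500))  # stands in for the
for p in acoustic_model.parameters():                # trained model
    p.requires_grad = False                          # first values retained

transform = nn.Linear(40, 40)                        # second parameters
optimizer = torch.optim.SGD(transform.parameters(), lr=0.01)

x = torch.randn(8, 40)                               # second training data
target = torch.randint(0, 500, (8,))

logits = acoustic_model(transform(x))     # transform coupled to the input
loss = F.cross_entropy(logits, target)
loss.backward()        # gradients flow through the frozen model but only
optimizer.step()       # the transformation component's values change
```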
  • Patent number: 10056075
    Abstract: A method for training a deep neural network, comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
    Type: Grant
    Filed: December 9, 2016
    Date of Patent: August 21, 2018
    Assignee: International Business Machines Corporation
    Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath
  • Patent number: 10049668
    Abstract: Systems and processes for converting speech-to-text are provided. In one example process, speech input can be received. A sequence of states and arcs of a weighted finite state transducer (WFST) can be traversed. A negating finite state transducer (FST) can be traversed. A virtual FST can be composed using a neural network language model and based on the sequence of states and arcs of the WFST. The one or more virtual states of the virtual FST can be traversed to determine a probability of a candidate word given one or more history candidate words. Text corresponding to the speech input can be determined based on the probability of the candidate word given the one or more history candidate words. An output can be provided based on the text corresponding to the speech input.
    Type: Grant
    Filed: May 16, 2016
    Date of Patent: August 14, 2018
    Assignee: Apple Inc.
    Inventors: Rongqing Huang, Ilya Oparin
  • Patent number: 10043512
    Abstract: A recurrent neural network (RNN) system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.
    Type: Grant
    Filed: November 11, 2016
    Date of Patent: August 7, 2018
    Assignee: Google LLC
    Inventors: Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Samuel Bengio, Ilya Sutskever
  • Patent number: 10013973
    Abstract: A method for generating a test-speaker-specific adaptive system for recognising sounds in speech spoken by a test speaker; the method employing: (i) training data comprising speech items spoken by the test speaker; and (ii) an input network component and a speaker adaptive output network, the input network component and speaker adaptive output network having been trained using training data from training speakers; the method comprising: (a) using the training data to train a test-speaker-specific adaptive model component of an adaptive model comprising the input network component, and the test-speaker-specific adaptive model component, and (b) providing the test-speaker-specific adaptive system comprising the input network component, the trained test-speaker-specific adaptive model component, and the speaker-adaptive output network.
    Type: Grant
    Filed: January 17, 2017
    Date of Patent: July 3, 2018
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Rama Doddipatla
  • Patent number: 10009635
    Abstract: Methods and apparatus to identify media content using temporal signal characteristics are disclosed. An example method includes generating a reference signature based on a reference signal corresponding to known media, generating sums based on peaks in a media signal corresponding to media, identifying signal peaks based on the generated sums, generating a second signature based on normalized curve features, wherein the normalized curve features respectively correspond to the identified signal peaks at the corresponding temporal locations of those peaks, and determining whether the media signal corresponds to the reference signal based on a comparison of the reference signature and the second signature.
    Type: Grant
    Filed: February 13, 2017
    Date of Patent: June 26, 2018
    Assignee: The Nielsen Company (US), LLC
    Inventor: Morris Lee
  • Patent number: 10001829
    Abstract: An electronic device includes an appended module coupled to a core that has a standby state and comprises a first power supply circuit, a first clock, and a circuit, timed by the first clock, that recognizes multiple vocal commands. The appended module includes a second power supply circuit independent of the first power supply circuit; a second clock independent of the first clock and having a frequency lower than that of the first clock; a digital unit, timed by the second clock, including a sound capture circuit that delivers a processed sound signal; and a processing unit configured, when a parameter of the processed sound signal exceeds a threshold, to analyze the content of the processed sound signal and, when that content comprises a reference pattern, to deliver an activating signal that can take the core out of its standby state.
    Type: Grant
    Filed: September 12, 2015
    Date of Patent: June 19, 2018
    Assignee: STMICROELECTRONICS (ROUSSET) SAS
    Inventors: Jonathan Cottinet, Jean Claude Bini
  • Patent number: 9997161
    Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: June 12, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yifan Gong, Chaojun Liu, Kshitiz Kumar
  • Patent number: 9946706
    Abstract: Methods and systems which utilize, in one embodiment, automatic language identification, including automatic language identification for dynamic text processing. In at least certain embodiments, automatic language identification can be applied to spellchecking in real time as the user types.
    Type: Grant
    Filed: June 10, 2013
    Date of Patent: April 17, 2018
    Assignee: Apple Inc.
    Inventors: Douglas R. Davidson, Ali Ozer
  • Patent number: 9916538
    Abstract: The specification covers new algorithms, methods, and systems for artificial intelligence, soft computing, and deep learning/recognition, e.g., image recognition (e.g., for action, gesture, emotion, expression, biometrics, fingerprint, facial, OCR (text), background, relationship, position, pattern, and object), Big Data analytics, machine learning, training schemes, crowd-sourcing (experts), feature space, clustering, classification, SVM, similarity measures, modified Boltzmann Machines, optimization, search engine, ranking, question-answering system, soft (fuzzy or unsharp) boundaries/impreciseness/ambiguities/fuzziness in language, Natural Language Processing (NLP), Computing-with-Words (CWW), parsing, machine translation, sound and speech recognition, video search and analysis (e.g.
    Type: Grant
    Filed: March 18, 2014
    Date of Patent: March 13, 2018
    Assignee: Z ADVANCED COMPUTING, INC.
    Inventors: Lotfi A. Zadeh, Saied Tadayon, Bijan Tadayon
  • Patent number: 9916306
    Abstract: Systems and methods for statistical linguistic analysis. According to some embodiments, methods may include evaluating a source text using one or more types of statistical linguistic analysis to determine a translatability of the source text and providing the translatability of the source text to a client node.
    Type: Grant
    Filed: October 19, 2012
    Date of Patent: March 13, 2018
    Assignee: SDL INC.
    Inventors: Laurens van den Oever, Jason Matthew Dent
  • Patent number: 9892115
    Abstract: An improved lecture support system integrates multi-media presentation materials with spoken content so that the listener can follow both the speech and the supporting materials that accompany the presentation for additional understanding. Computer-based systems and methods are disclosed for translation of a spoken presentation (e.g., a lecture, a video) along with the accompanying presentation materials. The content of the presentation materials can be used to improve presentation translation, as the system extracts supportive material from the presentation materials as they relate to the speech.
    Type: Grant
    Filed: January 5, 2015
    Date of Patent: February 13, 2018
    Assignee: Facebook, Inc.
    Inventor: Alexander Waibel
  • Patent number: 9881615
    Abstract: A speech recognition apparatus and method. The speech recognition apparatus includes a first recognizer configured to generate a first recognition result of an audio signal, in a first linguistic recognition unit, by using an acoustic model; a second recognizer configured to generate a second recognition result of the audio signal, in a second linguistic recognition unit, by using a language model; and a combiner configured to combine the first recognition result and the second recognition result to generate a final recognition result in the second linguistic recognition unit and to reflect the final recognition result in the language model. The first linguistic recognition unit may be the same linguistic unit type as the second linguistic recognition unit. The first recognizer and the second recognizer are configured in the same neural network and simultaneously/collectively trained in the neural network using audio training data provided to the first recognizer.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: January 30, 2018
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hee Youl Choi, Seokjin Hong
  • Patent number: 9858524
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating descriptions of input images. One of the methods includes obtaining an input image; processing the input image using a first neural network to generate an alternative representation for the input image; and processing the alternative representation for the input image using a second neural network to generate a sequence of a plurality of words in a target natural language that describes the input image.
    Type: Grant
    Filed: November 13, 2015
    Date of Patent: January 2, 2018
    Assignee: Google Inc.
    Inventors: Samy Bengio, Oriol Vinyals, Alexander Toshkov Toshev, Dumitru Erhan
  • Patent number: 9842610
    Abstract: A method is provided for training a Deep Neural Network (DNN) for acoustic modeling in speech recognition. The method includes reading central frames and side frames as input frames from a memory. The side frames are preceding side frames preceding the central frames and/or succeeding side frames succeeding the central frames. The method further includes executing pre-training for only the central frames, or for both the central frames and the side frames, and fine-tuning for the central frames and the side frames, so as to emphasize connections between acoustic features in the central frames and units of the bottom hidden layer of the DNN.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: December 12, 2017
    Assignee: International Business Machines Corporation
    Inventor: Gakuto Kurata
  • Patent number: 9842585
    Abstract: Described herein are various technologies pertaining to a multilingual deep neural network (MDNN). The MDNN includes a plurality of hidden layers, wherein values for weight parameters of the plurality of hidden layers are learned during a training phase based upon training data in the form of raw acoustic features for multiple languages. The MDNN further includes softmax layers that are trained for each target language separately, making use of the hidden layer values trained jointly with the multiple source languages. The MDNN is adaptable, such that a new softmax layer may be added on top of the existing hidden layers, where the new softmax layer corresponds to a new target language.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: December 12, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jui-Ting Huang, Jinyu Li, Dong Yu, Li Deng, Yifan Gong
  • Patent number: 9842106
    Abstract: A method and system process utterances that are acquired either from an automatic speech recognition (ASR) system or from text. The utterances have associated identities of each party, such as role A utterances and role B utterances. The information corresponding to the utterances, such as word sequence and identity, is converted to features. Each feature is received in an input layer of a neural network (NN). A dimensionality of each feature is reduced, in a projection layer of the NN, to produce a reduced dimensional feature. The reduced dimensional feature is processed to provide probabilities of labels for the utterances.
    Type: Grant
    Filed: December 4, 2015
    Date of Patent: December 12, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Takaaki Hori, Shinji Watanabe, John Hershey
  • Patent number: 9824684
    Abstract: A sequence recognition system comprises a prediction component configured to receive a set of observed features from a signal to be recognized and to output a prediction output indicative of a predicted recognition based on the set of observed features. The sequence recognition system also comprises a classification component configured to receive the prediction output and to output a label indicative of recognition of the signal based on the prediction output.
    Type: Grant
    Filed: December 22, 2014
    Date of Patent: November 21, 2017
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Dong Yu, Yu Zhang, Michael L. Seltzer, James G. Droppo
  • Patent number: 9824692
    Abstract: The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.
    Type: Grant
    Filed: September 12, 2016
    Date of Patent: November 21, 2017
    Assignee: PINDROP SECURITY, INC.
    Inventors: Elie Khoury, Matthew Garland
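A minimal PyTorch sketch of a triplet loss over cosine similarities with positive and negative margins, as the abstract describes; the embedding network, margin values, and cohort construction are assumptions:

```python
# Sketch: push same-speaker cosine similarity above a positive margin
# and cohort (negative-sample) similarity below a negative margin.

import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 32))

def triplet_cosine_loss(anchor, positive, negatives,
                        pos_margin=0.8, neg_margin=0.3):
    a, p = embed(anchor), embed(positive)
    n = embed(negatives)                          # cohort of negatives
    sim_pos = F.cosine_similarity(a, p)           # same speaker: push high
    sim_neg = F.cosine_similarity(a.expand_as(n), n)  # cohort: push low
    return (F.relu(pos_margin - sim_pos).mean()
            + F.relu(sim_neg - neg_margin).mean())

anchor = torch.randn(1, 40)     # utterance features for one speaker
positive = torch.randn(1, 40)   # another utterance from the same speaker
cohort = torch.randn(16, 40)    # negative training samples
loss = triplet_cosine_loss(anchor, positive, cohort)
loss.backward()
```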
  • Patent number: 9818431
    Abstract: The technology described herein uses a multiple-output layer RNN to process an acoustic signal comprising speech from multiple speakers to trace an individual speaker's speech. The multiple-output layer RNN has multiple output layers, each of which is meant to trace one speaker (or noise) and represent the mask for that speaker (or noise). The output layer for each speaker (or noise) can have the same dimensions and can be normalized for each output unit across all output layers. The rest of the layers in the multiple-output layer RNN are shared across all the output layers. The result from the previous frame is used as input to the output layer or to one of the hidden layers of the RNN to calculate results for the current frame. This pass back of results allows the model to carry information from previous frames to future frames to trace the same speaker.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: November 14, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Dong Yu
  • Patent number: 9818409
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.
    Type: Grant
    Filed: October 7, 2015
    Date of Patent: November 14, 2017
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Hasim Sak, Izhak Shafran
  • Patent number: 9818410
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
    Type: Grant
    Filed: December 29, 2015
    Date of Patent: November 14, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior
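A small sketch of the frame stacking and subsampling step described in this entry; the stack width and stride are illustrative, and the RNN/CTC model that consumes the result is omitted:

```python
# Sketch: concatenate neighboring acoustic frames, then keep every
# Nth stacked frame so the recurrent layers process fewer time steps.

import numpy as np

def stack_and_subsample(frames, stack=3, stride=3):
    """frames: (T, D) -> (about (T - stack + 1) / stride, stack * D)."""
    T, D = frames.shape
    stacked = np.stack([frames[t:t + stack].reshape(-1)
                        for t in range(T - stack + 1)])
    return stacked[::stride]       # subsample the modified frames

frames = np.random.randn(100, 40)  # 100 frames of 40-dim features
out = stack_and_subsample(frames)
print(out.shape)                   # (33, 120)
```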
  • Patent number: 9805720
    Abstract: A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
    Type: Grant
    Filed: January 13, 2017
    Date of Patent: October 31, 2017
    Assignee: International Business Machines Corporation
    Inventors: Jonathan H. Connell, II, Etienne Marcheret
  • Patent number: 9792900
    Abstract: Various implementations disclosed herein include an expert-assisted phoneme recognition neural network system configured to recognize phonemes within continuous large-vocabulary speech sequences without using language-specific models (“left-context”), look-ahead (“right-context”) information, or multi-pass sequence processing, and while operating within the resource constraints of low-power and real-time devices. To these ends, in various implementations, an expert-assisted phoneme recognition neural network system as described herein utilizes a priori phonetic knowledge. Phonetics is concerned with the configuration of the human vocal tract while speaking and the acoustic consequences for vocalizations. While similar sounding phonemes are difficult to detect and are frequently misidentified by previously known neural networks, phonetic knowledge gives insight into which aspects of sound acoustics contain the strongest contrast between similar sounding phonemes.
    Type: Grant
    Filed: July 6, 2016
    Date of Patent: October 17, 2017
    Assignee: MALASPINA LABS (BARBADOS), INC.
    Inventors: Saeed Mosayyebpour Kaskari, Aanchan Kumar Mohan, Michael David Fry, Dean Wolfgang Neumann
  • Patent number: 9785629
    Abstract: Methods and systems for automated language detection for domain names are disclosed. In some embodiments, a method for detecting a language of an Internationalized Domain Name (IDN) comprises receiving, by an I/O interface, a string of characters for the IDN; receiving training data, including a plurality of multi-gram analyses for a set of languages; analyzing, by a processor, the string of characters based on the training data, wherein the analyzing includes extracting a set of multi-grams from the string of characters and comparing the extracted set of multi-grams with the training data; detecting the language of the IDN based on results of the analyzing. In some embodiments, the method further comprises comparing the detected language of the IDN with a user selected language and using the IDN to generate a domain name, if the comparing indicates that the detected language of the IDN is consistent with the user selected language.
    Type: Grant
    Filed: December 15, 2015
    Date of Patent: October 10, 2017
    Assignee: VERISIGN, INC.
    Inventors: Ronald Andrew Hoskinson, Lambert Arians, Marc Anderson, Mahendra Jain
  • Patent number: 9786270
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: October 10, 2017
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
  • Patent number: 9761228
    Abstract: Provided are a recognition result candidate comparator 205 that compares a plurality of server-side voice recognition result candidates received by a receiver 204 to detect texts having a difference, and a recognition result integrator 206 that integrates a client-side voice recognition result candidate and a server-side voice recognition result candidate, on the basis of the client-side voice recognition result candidate, the server-side voice recognition result candidate, and the detection result provided by the recognition result candidate comparator 205, to decide a voice recognition result.
    Type: Grant
    Filed: November 20, 2013
    Date of Patent: September 12, 2017
    Assignee: Mitsubishi Electric Corporation
    Inventors: Isamu Ogawa, Toshiyuki Hanazawa, Tomohiro Narita
  • Patent number: 9761221
    Abstract: According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values. The method comprises, using at least one computer, performing acts of: providing a first vector as input to a first network layer comprising one or more network units of the neural network; transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit; sorting the plurality of values to produce a sorted plurality of values; and providing the sorted plurality of values as input to a second network layer of the neural network.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: September 12, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Steven John Rennie, Vaibhava Goel
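A minimal PyTorch sketch of the sorting step described in this entry: a unit's weighted transformation produces a plurality of values that are sorted before being passed to the next layer. The sizes and the surrounding layers are illustrative:

```python
# Sketch: a network layer whose unit sorts its transformed values.

import torch
import torch.nn as nn

class SortUnit(nn.Module):
    def __init__(self, in_dim=16, out_dim=16):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)  # the unit's weight values

    def forward(self, x):
        values = self.linear(x)                   # transform the input vector
        sorted_vals, _ = torch.sort(values, dim=-1)
        return sorted_vals                        # sorted plurality of values

net = nn.Sequential(SortUnit(16, 16), nn.Linear(16, 2))  # second network layer
scores = net(torch.randn(4, 16))   # e.g., score speech recognition results
print(scores.shape)                # torch.Size([4, 2])
```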
  • Patent number: 9754584
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing keywords using a long short term memory neural network. One of the methods includes receiving, by a device for each of multiple variable length enrollment audio signals, a respective plurality of enrollment feature vectors that represent features of the respective variable length enrollment audio signal, processing each of the plurality of enrollment feature vectors using a long short term memory (LSTM) neural network to generate a respective enrollment LSTM output vector for each enrollment feature vector, and generating, for the respective variable length enrollment audio signal, a template fixed length representation for use in determining whether another audio signal encodes another spoken utterance of the enrollment phrase by combining at most a quantity k of the enrollment LSTM output vectors for the enrollment audio signal.
    Type: Grant
    Filed: November 8, 2016
    Date of Patent: September 5, 2017
    Assignee: Google Inc.
    Inventors: Maria Carolina Parada San Martin, Tara N. Sainath, Guoguo Chen
  • Patent number: 9721562
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
    Type: Grant
    Filed: December 3, 2014
    Date of Patent: August 1, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior