Neural Network Patents (Class 704/232)
  • Patent number: 10713808
    Abstract: Disclosed is a stereo matching method and apparatus based on stereo vision, the method including acquiring a left image and a right image, identifying image data by applying a window to each of the acquired left image and right image, storing the image data in a line buffer, extracting a disparity from the image data stored in the line buffer, and generating a depth map based on the extracted disparity.
    Type: Grant
    Filed: October 25, 2017
    Date of Patent: July 14, 2020
    Assignees: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, KYUNGPOOK NATIONAL UNIVERSITY INDUSTRY-ACADEMIC COOPERATION FOUNDATION
    Inventors: Kwang Yong Kim, Byungin Moon, Mi-ryong Park, Kyeong-ryeol Bae
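The final step of this abstract, turning the extracted disparity into a depth map, follows the standard stereo relation Z = f·B/d. A minimal sketch, with hypothetical camera parameters (focal length and baseline are not given in the patent):

```python
import math

# Illustrative disparity-to-depth conversion. focal_px and baseline_m are
# hypothetical camera parameters, not values taken from the patent.
def disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.12):
    """Depth Z = f * B / d for each per-pixel disparity d (in pixels)."""
    return [[focal_px * baseline_m / d if d > 0 else math.inf
             for d in row] for row in disparity]

# Zero disparity (no match / point at infinity) maps to infinite depth.
depth = disparity_to_depth([[7.0, 14.0], [28.0, 0.0]])
```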
  • Patent number: 10714078
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: July 14, 2020
    Assignee: Google LLC
    Inventors: Samuel Bengio, Mirkó Visontai, Christopher Walter George Thornton, Michiel A. U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
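The "complex linear projection" step can be read as a complex-valued linear layer applied to the frequency-domain frame, followed by a log-magnitude nonlinearity that yields real-valued features for the acoustic model. A toy sketch under that reading (the weights here are hypothetical, not trained values):

```python
import math

# Sketch of complex linear projection (CLP): project a complex FFT frame
# through complex weights, then take log-magnitude as a real feature.
def complex_linear_projection(frame, weights):
    feats = []
    for w_row in weights:
        z = sum(w * x for w, x in zip(w_row, frame))  # complex dot product
        feats.append(math.log(abs(z) + 1e-9))         # real-valued feature
    return feats

frame = [1 + 1j, 2 + 0j]                      # complex FFT bins of one frame
weights = [[1 + 0j, 0 + 0j], [0 + 0j, 1 + 0j]]  # hypothetical filters
feats = complex_linear_projection(frame, weights)
```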
  • Patent number: 10706857
    Abstract: An apparatus including a multi time-frequency resolution convolution neural network module; a two dimensional convolution neural network layers module; and a discriminative fully-connected classifier layers module; wherein the multi time-frequency resolution convolution neural network module receives a raw speech signal from a human speaker and processes the raw speech signal to provide a first processed output in the form of multiple multi time-frequency resolution spectrographic feature maps; wherein the two dimensional convolution neural network layers module processes the first processed output to provide a second processed output; and wherein the discriminative fully-connected classifier layers module processes the second processed output to provide a third processed output, wherein the third processed output provides an indication of an identity of a human speaker or provides an indication of verification of the identity of a human speaker.
    Type: Grant
    Filed: April 20, 2020
    Date of Patent: July 7, 2020
    Assignee: KAIZEN SECURE VOIZ, INC.
    Inventors: Viswanathan Ramasubramanian, Sunderrajan Kumar
  • Patent number: 10699700
    Abstract: Provided are a speech recognition training processing method and an apparatus including the same. The speech recognition training processing method includes acquiring multi-talker mixed speech sequence data corresponding to a plurality of speakers, encoding the multi-talker mixed speech sequence data into embedded sequence data, generating speaker-specific context vectors at each frame based on the embedded sequence data, generating senone posteriors for each of the speakers based on the speaker-specific context vectors, and updating an acoustic model by performing permutation invariant training (PIT) model training based on the senone posteriors.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: June 30, 2020
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Yanmin Qian, Dong Yu
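The core of permutation invariant training is that the assignment of model outputs to reference speakers is unknown, so the loss is evaluated under every permutation and the best one is used. A minimal sketch, with squared error standing in for the senone cross-entropy the abstract describes:

```python
from itertools import permutations

# Sketch of the PIT criterion: score per-speaker outputs against every
# assignment of reference labels and keep the minimum-error assignment.
def pit_loss(outputs, targets):
    def err(o, t):
        return sum((a - b) ** 2 for a, b in zip(o, t))
    best = float("inf")
    for perm in permutations(range(len(targets))):
        total = sum(err(out, targets[j]) for out, j in zip(outputs, perm))
        best = min(best, total)
    return best
```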
  • Patent number: 10679614
    Abstract: Techniques are described herein for enabling an automated assistant to adjust its behavior depending on a detected vocabulary level or other vocal characteristics of an input utterance provided to an automated assistant. The estimated vocabulary level or other vocal characteristics may be used to influence various aspects of a data processing pipeline employed by the automated assistant. In some implementations, one or more tolerance thresholds associated with, for example, grammatical tolerances or vocabulary tolerances, may be adjusted based on the estimated vocabulary level or vocal characteristics of the input utterance.
    Type: Grant
    Filed: April 24, 2019
    Date of Patent: June 9, 2020
    Assignee: GOOGLE LLC
    Inventors: Pedro Gonnet Anders, Victor Carbune, Daniel Keysers, Thomas Deselaers, Sandro Feuz
  • Patent number: 10649060
    Abstract: Techniques are described herein that are capable of performing sound source localization (SSL) confidence estimation using machine learning. An SSL operation is performed with regard to a sound to determine an SSL direction estimate and an SSL-based confidence associated with the SSL direction estimate based at least in part on a multi-channel representation of the sound. The SSL direction estimate indicates an estimated direction from which the sound is received. The SSL-based confidence indicates an estimated probability that the sound is received from the estimated direction. The multi-channel representation includes representations of the sound that are detected by respective sensors (e.g., microphones). Additional characteristic(s) of the sound are automatically determined.
    Type: Grant
    Filed: July 24, 2017
    Date of Patent: May 12, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Kevin Juho Venalainen
  • Patent number: 10650805
    Abstract: A system and method for speech recognition is provided. Embodiments may include receiving an audio signal at a first deep neural network (“DNN”) associated with a computing device. Embodiments may further include receiving the audio signal at a second DNN associated with the computing device, wherein the second DNN includes fewer parameters than the first DNN. Embodiments may also include determining whether to select an output from the first DNN or the second DNN and providing the selected output to a decoder, with an overall objective of speeding up ASR.
    Type: Grant
    Filed: September 11, 2014
    Date of Patent: May 12, 2020
    Assignee: Nuance Communications, Inc.
    Inventors: Joel Pinto, Daniel Willett, Christian Plahl
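The abstract leaves the selection criterion open; one plausible realization is to keep the small DNN's output when it is confident and fall back to the large DNN otherwise. A sketch under that assumption (the margin heuristic is illustrative, not from the patent):

```python
# Prefer the smaller, faster DNN when its top score is separated enough
# from the runner-up; otherwise fall back to the larger DNN's output.
def select_output(fast_scores, full_scores, margin=0.3):
    ranked = sorted(fast_scores, reverse=True)
    if ranked[0] - ranked[1] >= margin:
        return fast_scores   # confident: keep the cheap result
    return full_scores       # ambiguous: pay for the big model
```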
  • Patent number: 10650102
    Abstract: The present disclosure discloses a method and apparatus for generating a parallel text in the same language. The method comprises: acquiring a source segmented word sequence and a pre-trained word vector table; determining a source word vector sequence corresponding to the source segmented word sequence, according to the word vector table; importing the source word vector sequence into a first pre-trained recurrent neural network model, to generate an intermediate vector of a preset dimension for characterizing semantics of the source segmented word sequence; importing the intermediate vector into a second pre-trained recurrent neural network model, to generate a target word vector sequence corresponding to the intermediate vector; and determining a target segmented word sequence corresponding to the target word vector sequence according to the word vector table, and determining the target segmented word sequence as a parallel text in the same language corresponding to the source segmented word sequence.
    Type: Grant
    Filed: February 20, 2018
    Date of Patent: May 12, 2020
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Pengkai Li, Jingzhou He, Zhihong Fu, Xianwei Xin
  • Patent number: 10652657
    Abstract: Provided is a system including: a sink device related information acquisition unit configured to acquire sink device related information, which is information on sink device capable of receiving a signal transmitted from a transmission unit of a main apparatus in response to an operation of switching the transmission unit of the main apparatus on, the sink device related information being acquired before the operation of switching the transmission unit on; a first display control unit configured to display, when information on sink device is contained in the sink device related information, the information on sink device on a display unit based on the sink device related information; and a second display control unit configured to cause, when information on sink device is not contained in the sink device related information, the main apparatus to try to detect sink device, and to display information on sink device on the display unit based on a result of the detection.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: May 12, 2020
    Assignee: Yamaha Corporation
    Inventors: Akihiko Suyama, Masahiro Ishida, Tomoyoshi Akutagawa
  • Patent number: 10635984
    Abstract: A system and method that identify patterns in sets of signals produced during operation of a complex system and combine the identified patterns with records of past conditions to generate operational feedback to one or more machines of the complex system while it operates.
    Type: Grant
    Filed: July 23, 2018
    Date of Patent: April 28, 2020
    Assignee: FALKONRY INC.
    Inventors: Gregory Olsen, Nikunj Mehta, Lenin Kumar Subramanian, Dan Kearns
  • Patent number: 10629185
    Abstract: [Object] An object is to provide a statistical acoustic model adaptation method capable of efficient adaptation of an acoustic model using DNN with training data under a specific condition and achieving higher accuracy. [Solution] A method of speaker adaptation of an acoustic model using DNN includes the steps of: storing speech data 90 to 98 of different speakers separately in a first storage device; preparing speaker-by-speaker hidden layer modules 112 to 120; performing preliminary learning of all layers 42, 44, 110, 48, 50, 52 and 54 of a DNN 80 by switching and selecting the speech data 90 to 98 while dynamically replacing a specific layer 110 with hidden layer modules 112 to 120 corresponding to the selected speech data; replacing the specific layer 110 of the DNN that has completed the preliminary learning with an initial hidden layer; and training the DNN with speech data of a specific speaker while fixing parameters of layers other than the initial hidden layer.
    Type: Grant
    Filed: November 6, 2014
    Date of Patent: April 21, 2020
    Assignee: National Institute of Information and Communications Technology
    Inventors: Shigeki Matsuda, Xugang Lu
  • Patent number: 10621972
    Abstract: The present disclosure provides a method and a device for extracting an acoustic feature based on a convolution neural network and a terminal device. The method includes: arranging speech to be recognized into a speech spectrogram with a predetermined dimension number; and recognizing the speech spectrogram with the predetermined dimension number by the convolution neural network to obtain the acoustic feature of the speech to be recognized.
    Type: Grant
    Filed: March 7, 2018
    Date of Patent: April 14, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Xiangang Li
  • Patent number: 10614813
    Abstract: Caller identity verification can be improved by employing a multi-step verification that leverages speech features that are obtained from multiple interactions with a caller. An enrollment is performed in which customer speech features and customer information are collected. When a caller calls into the call center, an attempt is made to verify the caller's identity by requesting the caller to speak a predefined phrase, extracting speech features from the spoken phrase, and comparing the extracted features to the enrollment features. If the purported identity of the caller can be matched with one of the customers based on the comparison, the identity of the caller is verified. If the match cannot be made with a high enough degree of confidence, the customer is asked to speak any phrase that is not predefined. Features are extracted from the caller's speech, combined with features previously extracted from the predefined speech, and compared to the enrollment features.
    Type: Grant
    Filed: November 3, 2017
    Date of Patent: April 7, 2020
    Assignee: Intellisist, Inc.
    Inventors: Gilad Odinak, Yishay Carmiel
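The two-step flow above can be sketched with a simple cosine comparison: try the predefined-phrase features first, and only if that match is not confident enough, combine them with free-speech features and retry. The averaging step and threshold are illustrative choices, not values from the patent:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Step 1: compare predefined-phrase features against enrollment.
# Step 2 (low confidence): combine with free-speech features and retry.
def verify(enrolled, phrase_feats, free_feats, threshold=0.8):
    if cosine(enrolled, phrase_feats) >= threshold:
        return True
    combined = [(p + f) / 2 for p, f in zip(phrase_feats, free_feats)]
    return cosine(enrolled, combined) >= threshold
```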
  • Patent number: 10607603
    Abstract: Provided is a speech recognition apparatus. The apparatus includes a preprocessor configured to extract select frames from all frames of a first speech of a user, and a score calculator configured to calculate an acoustic score of a second speech, made up of the extracted select frames, by using a Deep Neural Network (DNN)-based acoustic model, and to calculate an acoustic score of frames, of the first speech, other than the select frames based on the calculated acoustic score of the second speech.
    Type: Grant
    Filed: August 9, 2018
    Date of Patent: March 31, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: In Chul Song
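One simple way to realize "calculate an acoustic score of frames other than the select frames based on the calculated acoustic score" is to run the expensive DNN only on a strided subset of frames and reuse the most recent score for the skipped frames. A sketch under that assumption:

```python
# Score only every `stride`-th frame with the (expensive) acoustic model;
# skipped frames reuse the most recently computed score.
def fill_scores(num_frames, stride, score_fn):
    scores, last = [], None
    for t in range(num_frames):
        if t % stride == 0:
            last = score_fn(t)   # costly DNN evaluation
        scores.append(last)      # skipped frames copy the last score
    return scores
```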
  • Patent number: 10593336
    Abstract: A machine learning multi-dimensional acoustic feature vector authentication system, according to an example of the present disclosure, builds and trains multiple multi-dimensional acoustic feature vector machine learning classifiers to determine a probability of spoofing of a voice. The system may extract an acoustic feature from a voice sample of a user. The system may convert the acoustic feature into multi-dimensional acoustic feature vectors and apply the multi-dimensional acoustic feature vectors to the multi-dimensional acoustic feature vector machine learning classifiers to detect spoofing and determine whether to authenticate a user.
    Type: Grant
    Filed: July 26, 2018
    Date of Patent: March 17, 2020
    Assignees: ACCENTURE GLOBAL SOLUTIONS LIMITED, MISRAM LLC
    Inventors: Constantine T. Boyadjiev, Rajarathnam Chandramouli, Koduvayur Subbalakshmi, Zongru Shao
  • Patent number: 10592001
    Abstract: Systems and methods for using neuromuscular information to improve speech recognition. The system includes a plurality of neuromuscular sensors, arranged on one or more wearable devices, wherein the plurality of neuromuscular sensors is configured to continuously record a plurality of neuromuscular signals from a user, at least one storage device configured to store one or more trained statistical models, and at least one computer processor programmed to provide as an input to the one or more trained statistical models, the plurality of neuromuscular signals or signals derived from the plurality of neuromuscular signals, determine based, at least in part, on an output of the one or more trained statistical models, at least one instruction for modifying an operation of a speech recognizer, and provide the at least one instruction to the speech recognizer.
    Type: Grant
    Filed: May 8, 2018
    Date of Patent: March 17, 2020
    Assignee: Facebook Technologies, LLC
    Inventors: Adam Berenzweig, Patrick Kaifosh, Alan Huan Du, Jeffrey Scott Seely
  • Patent number: 10586151
    Abstract: Some embodiments of the invention provide a novel method for training a multi-layer node network that mitigates against overfitting the adjustable parameters of the network for a particular problem. During training, the method of some embodiments adjusts the modifiable parameters of the network by iteratively identifying different interior-node, influence-attenuating masks that effectively specify different sampled networks of the multi-layer node network. An interior-node, influence-attenuating mask specifies attenuation parameters that are applied (1) to the outputs of the interior nodes of the network in some embodiments, (2) to the inputs of the interior nodes of the network in other embodiments, or (3) to the outputs and inputs of the interior nodes in still other embodiments. In each mask, the attenuation parameters can be any one of several values (e.g., three or more values) within a range of values (e.g., between 0 and 1).
    Type: Grant
    Filed: July 31, 2016
    Date of Patent: March 10, 2020
    Assignee: Perceive Corporation
    Inventor: Steven L. Teig
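The distinguishing detail here is that mask entries can take several values within a range rather than dropout's binary 0/1. A minimal sketch of sampling and applying such an influence-attenuating mask (the value set and uniform choice are illustrative):

```python
import random

# Mask entries take one of several values in [0, 1], unlike standard
# dropout's binary 0/1 mask. Value set and distribution are illustrative.
def sample_mask(n, values=(0.0, 0.5, 1.0), rng=None):
    rng = rng or random.Random(0)
    return [rng.choice(values) for _ in range(n)]

def apply_mask(activations, mask):
    """Attenuate interior-node outputs elementwise by the mask."""
    return [a * m for a, m in zip(activations, mask)]
```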
  • Patent number: 10573295
    Abstract: The disclosed technology teaches a deep end-to-end speech recognition model, including using multi-objective learning criteria to train a deep end-to-end speech recognition model on training data comprising speech samples temporally labeled with ground truth transcriptions.
    Type: Grant
    Filed: January 23, 2018
    Date of Patent: February 25, 2020
    Assignee: salesforce.com, inc.
    Inventors: Yingbo Zhou, Caiming Xiong
  • Patent number: 10565306
    Abstract: The technology disclosed presents a novel spatial attention model that uses current hidden state information of a decoder long short-term memory (LSTM) to guide attention and to extract spatial image features for use in image captioning. The technology disclosed also presents a novel adaptive attention model for image captioning that mixes visual information from a convolutional neural network (CNN) and linguistic information from an LSTM. At each timestep, the adaptive attention model automatically decides how heavily to rely on the image, as opposed to the linguistic model, to emit the next caption word. The technology disclosed further adds a new auxiliary sentinel gate to an LSTM architecture and produces a sentinel LSTM (Sn-LSTM). The sentinel gate produces a visual sentinel at each timestep, which is an additional representation, derived from the LSTM's memory, of long and short term visual and linguistic information.
    Type: Grant
    Filed: November 18, 2017
    Date of Patent: February 18, 2020
    Assignee: salesforce.com, inc.
    Inventors: Jiasen Lu, Caiming Xiong, Richard Socher
  • Patent number: 10546230
    Abstract: Methods and a system are provided for generating labeled data. A method includes encoding, by a processor-based encoder, a first labeled data into an encoded representation of the first labeled data. The method further includes modifying the encoded representation into a modified representation by adding a perturbation to the encoded representation. The method additionally includes decoding, by a processor-based decoder, the modified representation into a second labeled data.
    Type: Grant
    Filed: August 12, 2016
    Date of Patent: January 28, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Gakuto Kurata
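The encode-perturb-decode pipeline can be sketched end to end; here toy linear functions stand in for the trained encoder and decoder networks, and the uniform noise and its scale are illustrative assumptions:

```python
import random

# Encode a labeled sample, perturb the code, decode to get a new sample.
def augment(sample, encode, decode, noise_scale=0.1, rng=None):
    rng = rng or random.Random(42)
    code = encode(sample)
    perturbed = [c + rng.uniform(-noise_scale, noise_scale) for c in code]
    return decode(perturbed)

# Toy invertible encoder/decoder pair standing in for trained networks.
encode = lambda x: [2.0 * v for v in x]
decode = lambda z: [v / 2.0 for v in z]
new_sample = augment([1.0, 2.0], encode, decode)
```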
  • Patent number: 10535338
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
    Type: Grant
    Filed: November 2, 2018
    Date of Patent: January 14, 2020
    Assignee: Google LLC
    Inventors: Hasim Sak, Andrew W. Senior
  • Patent number: 10529319
    Abstract: A user adaptive speech recognition method and apparatus are provided. A speech recognition method includes extracting an identity vector representing an individual characteristic of a user from speech data, implementing a sub-neural network by inputting a sub-input vector including at least the identity vector to the sub-neural network, determining a scaling factor based on a result of the implementing of the sub-neural network, implementing a main neural network, configured to perform a recognition operation, by applying the determined scaling factor to the main neural network and inputting the speech data to the main neural network to which the determined scaling factor is applied, and indicating a recognition result of the implementation of the main neural network.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: January 7, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Inchul Song, Sang Hyun Yoo
  • Patent number: 10521927
    Abstract: Machine learning is used to train a network to predict the location of an internal body marker from surface data. A depth image or other image of the surface of the patient is used to determine the locations of anatomical landmarks. The training may use a loss function that includes a term to limit failure to predict a landmark and/or off-centering of the landmark. The landmarks may then be used to configure medical scanning and/or for diagnosis.
    Type: Grant
    Filed: August 7, 2018
    Date of Patent: December 31, 2019
    Assignee: Siemens Healthcare GmbH
    Inventors: Brian Teixeira, Vivek Kumar Singh, Birgi Tamersoy, Terrence Chen, Kai Ma, Andreas Krauss, Andreas Wimmer
  • Patent number: 10515627
    Abstract: A method and apparatus of building an acoustic feature extracting model, and an acoustic feature extracting method and apparatus. The method of building an acoustic feature extracting model comprises: considering first acoustic features extracted respectively from speech data corresponding to user identifiers as training data; using the training data to train a deep neural network to obtain an acoustic feature extracting model; wherein a target of training the deep neural network is to maximize similarity between the same user's second acoustic features and minimize similarity between different users' second acoustic features. The acoustic feature extracting model according to the present disclosure can self-learn optimal acoustic features that achieve the training target. As compared with a conventional acoustic feature extracting manner with a preset feature type and transformation manner, the acoustic feature extracting manner of the present disclosure achieves better flexibility and higher accuracy.
    Type: Grant
    Filed: May 15, 2018
    Date of Patent: December 24, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li
  • Patent number: 10515626
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Grant
    Filed: December 20, 2017
    Date of Patent: December 24, 2019
    Assignee: Google LLC
    Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
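The filter-and-combine step the abstract describes amounts to per-channel filtering followed by summation into one enhanced channel. A sketch with fixed FIR taps (in the patent, the filter parameters are predicted per utterance by a neural network):

```python
# Apply a per-channel FIR filter and sum into the single combined channel.
def filter_and_sum(ch1, ch2, taps1, taps2):
    def fir(x, taps):
        return [sum(taps[k] * x[t - k]
                    for k in range(len(taps)) if t - k >= 0)
                for t in range(len(x))]
    y1, y2 = fir(ch1, taps1), fir(ch2, taps2)
    return [a + b for a, b in zip(y1, y2)]
```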
  • Patent number: 10490182
    Abstract: A data processing technique uses an Artificial Neural Network (ANN) with Rectifier Linear Units (ReLU) to yield improved accuracy in a runtime task, for example, in processing audio-based data acquired by a speech-enabled device. The technique includes a first aspect that relates to initialization of the ANN weights to initially yield a high fraction of positive outputs from the ReLU. These weights are then modified using an iterative procedure in which the weights are incrementally updated. A second aspect relates to controlling the size of the incremental updates (a “learning rate”) during the iterations of training according to a variance of the weights at each layer.
    Type: Grant
    Filed: December 29, 2016
    Date of Patent: November 26, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Ayyavu Madhavaraj, Sri Venkata Surya Siva Rama Krishna Garimella
  • Patent number: 10481863
    Abstract: Aspects of the present disclosure relate to systems and methods for a voice-centric virtual or soft keyboard (or keypad). Unlike other keyboards, embodiments of the present disclosure prioritize the voice keyboard, meanwhile providing users with quick and uniform navigation to other keyboards (e.g., alphabet, punctuation, symbols, emojis, etc.). In addition, in embodiments, common actions, such as delete and return, are also easily accessible. In embodiments, the keyboard is also configurable to allow a user to organize buttons according to their desired use and layout. Embodiments of such a keyboard provide a voice-centric, seamless, and powerful interface experience for users.
    Type: Grant
    Filed: June 13, 2017
    Date of Patent: November 19, 2019
    Assignee: Baidu USA LLC
    Inventors: Zhuxiaona Wei, Thuan Nguyen, Iat Chan, Kenny M. Liou, Helin Wang, Houchang Lu
  • Patent number: 10474951
    Abstract: Methods and systems for training a neural network include sampling multiple local sub-networks from a global neural network. The local sub-networks include a subset of neurons from each layer of the global neural network. The plurality of local sub-networks are trained at respective local processing devices to produce trained local parameters. The trained local parameters from each local sub-network are averaged to produce trained global parameters.
    Type: Grant
    Filed: September 21, 2016
    Date of Patent: November 12, 2019
    Assignee: NEC Corporation
    Inventors: Renqiang Min, Huahua Wang, Asim Kadav
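The two mechanical pieces of this scheme, sampling a sub-network as a per-layer subset of neurons and averaging locally trained parameters into global ones, can be sketched directly (layer sizes and the keep fraction are illustrative):

```python
import random

# Sample a local sub-network: a random subset of neuron indices per layer.
def sample_subnetwork(layer_sizes, keep_fraction, rng):
    return [sorted(rng.sample(range(n), max(1, int(n * keep_fraction))))
            for n in layer_sizes]

# Average trained parameter vectors from the local sub-networks to
# produce the trained global parameters.
def average_parameters(param_sets):
    n = len(param_sets)
    return [sum(ps[i] for ps in param_sets) / n
            for i in range(len(param_sets[0]))]
```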
  • Patent number: 10460043
    Abstract: An apparatus and a method for constructing a multilingual acoustic model, and a computer readable recording medium are provided. The method for constructing a multilingual acoustic model includes dividing an input feature into a common language portion and a distinctive language portion, acquiring a tandem feature by training the divided common language portion and distinctive language portion using a neural network to estimate and remove correlation between phonemes, dividing parameters of an initial acoustic model constructed using the tandem feature into common language parameters and distinctive language parameters, adapting the common language parameters using data of a training language, adapting the distinctive language parameters using data of a target language, and constructing an acoustic model for the target language using the adapted common language parameters and the adapted distinctive language parameters.
    Type: Grant
    Filed: November 22, 2013
    Date of Patent: October 29, 2019
    Assignees: SAMSUNG ELECTRONICS CO., LTD., IDIAP RESEARCH INSTITUTE
    Inventors: Nam-Hoon Kim, Petr Motlicek, Philip Neil Garner, David Imseng, Jae-won Lee, Jeong-Mi Cho
  • Patent number: 10462584
    Abstract: A method for operating a hearing apparatus that has a microphone for converting ambient sound into a microphone signal, involves a number of features being derived from the microphone signal. Three classifiers, which are implemented independently of one another for analyzing a respective assigned acoustic dimension, are each supplied with a specifically assigned selection from these features. The respective classifier is used to generate a respective piece of information about a manifestation of the acoustic dimension assigned to the classifier. At least one of the at least three pieces of information about the respective manifestation of the assigned acoustic dimension is then taken as a basis for altering a signal processing algorithm that is executed for the purpose of processing the microphone signal to produce an output signal.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: October 29, 2019
    Assignee: Sivantos Pte. Ltd.
    Inventors: Marc Aubreville, Marko Lugger
  • Patent number: 10452355
    Abstract: According to an embodiment, an automaton deforming device includes a transforming unit and a deforming unit. The transforming unit generates second values by transforming first values, which either represent weights assigned to transitions in a weighted finite state automaton or represent values that are transformed into weights assigned to transitions in a weighted finite state automaton, in such a way that the number of elements of the set of first values is reduced and the order of the first values is preserved. The deforming unit deforms a weighted finite state automaton in which weights according to the second values are assigned to transitions.
    Type: Grant
    Filed: August 14, 2015
    Date of Patent: October 22, 2019
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Manabu Nagao, Takashi Masuko
  • Patent number: 10446138
    Abstract: A system and method for assessing transcription costs based on an audio file are provided. The method includes accessing at least one audio file for transcription assessment; analyzing the at least one audio file to determine at least one transcription characteristic based on the at least one audio file; and calculating, based on the at least one determined transcription characteristic, an initial bid value for transcription of the audio file.
    Type: Grant
    Filed: October 24, 2017
    Date of Patent: October 15, 2019
    Assignee: Verbit Software Ltd.
    Inventors: Eric Shellef, Kobi Ben Tzvi, Tom Livne
  • Patent number: 10446170
    Abstract: This disclosure relates to solutions for eliminating undesired audio artifacts, such as background noises, on an audio channel. A process for implementing the technology can include receiving a set of audio segments, analyzing the segments using a first ML model to identify a first probability of unwanted background noises in the segments, and if the first probability exceeds a threshold, analyzing the segments using a second ML model to determine a second probability that the one or more background features exist in the segments. In some aspects, the process can include attenuating audio artifacts in the segments, if the second probability exceeds a second threshold. In some implementations, dynamic time stretching and shrinking can be applied to the noise attenuation. Systems and machine-readable media are also provided.
    Type: Grant
    Filed: June 19, 2018
    Date of Patent: October 15, 2019
    Assignee: CISCO TECHNOLOGY, INC.
    Inventors: Eric Chen, Asbjørn Therkelsen, Espen Moberg, Wei-Lien Hsu
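The two-stage cascade above, a cheap screening model followed by a costlier confirmation model, can be sketched as nested threshold checks; the thresholds and attenuation gain below are illustrative, not values from the patent:

```python
# Two-stage ML cascade: the first model screens every segment; only
# flagged segments are re-checked by the second model, and confirmed
# noise is attenuated while clean audio passes through unchanged.
def attenuate_noise(segments, first_model, second_model,
                    t1=0.5, t2=0.8, gain=0.2):
    out = []
    for seg in segments:
        if first_model(seg) > t1 and second_model(seg) > t2:
            out.append([s * gain for s in seg])  # confirmed noise
        else:
            out.append(list(seg))                # leave untouched
    return out
```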
  • Patent number: 10431206
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a hierarchical recurrent neural network (HRNN) having a plurality of parameters on a plurality of training acoustic sequences to generate phoneme representations of received acoustic sequences. One method includes, for each of the received training acoustic sequences: processing the received acoustic sequence in accordance with current values of the parameters of the HRNN to generate a predicted grapheme representation of the received acoustic sequence; processing an intermediate output generated by an intermediate layer of the HRNN during the processing of the received acoustic sequence to generate one or more predicted phoneme representations of the received acoustic sequence; and adjusting the current values of the parameters of the HRNN based at least on (i) the predicted grapheme representation and (ii) the one or more predicted phoneme representations.
    Type: Grant
    Filed: August 22, 2016
    Date of Patent: October 1, 2019
    Assignee: Google LLC
    Inventors: Hasim Sak, Kanury Kanishka Rao
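The training objective described above combines a top-layer grapheme loss with auxiliary phoneme losses computed from an intermediate layer. A minimal sketch of such a multi-task combination, with an illustrative weighting (the patent does not specify one), could be:

```python
def multitask_loss(grapheme_loss, phoneme_losses, phoneme_weight=0.3):
    """Combine the top-layer grapheme loss with auxiliary phoneme losses
    from an intermediate layer; phoneme_weight is illustrative."""
    return grapheme_loss + phoneme_weight * sum(phoneme_losses)
```

Both terms contribute to the same parameter update, so the intermediate layer is pushed toward phoneme-discriminative representations while the top layer learns graphemes.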
  • Patent number: 10403291
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate language independent-speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.
    Type: Grant
    Filed: June 1, 2018
    Date of Patent: September 3, 2019
    Assignee: Google LLC
    Inventors: Ignacio Lopez Moreno, Li Wan, Quan Wang
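The abstract describes deriving the network input from both the audio data and a language identifier, then comparing the resulting speaker representation against an enrolled one. A hedged sketch of that input construction and comparison, with an invented language inventory, toy dimensions, and an arbitrary decision threshold:

```python
import numpy as np

LANGUAGES = ["en", "es", "fr"]   # illustrative language inventory

def build_input(audio_features, language):
    """Concatenate audio-derived features with a one-hot language identifier."""
    one_hot = np.zeros(len(LANGUAGES))
    one_hot[LANGUAGES.index(language)] = 1.0
    return np.concatenate([audio_features, one_hot])

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(speaker_representation, enrolled_representation, threshold=0.7):
    """Accept the utterance as the user's when the representations match."""
    return cosine_similarity(speaker_representation,
                             enrolled_representation) >= threshold
```

Conditioning on the language identifier lets a single network serve speakers across languages and dialects without per-language models.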
  • Patent number: 10403268
    Abstract: A system, article, and method include techniques of automatic speech recognition using posterior confidence scores.
    Type: Grant
    Filed: September 8, 2016
    Date of Patent: September 3, 2019
    Assignee: Intel IP Corporation
    Inventors: David J. Trawick, Joachim Hofer, Josef G. Bauer, Georg Stemmer, Da-Ming Chiang
  • Patent number: 10389741
    Abstract: In one embodiment, a device in a network identifies a new interaction between two or more nodes in the network. The device forms a feature vector using contextual information associated with the new interaction between the two or more nodes. The device causes generation of an anomaly detection model for new node interactions using the feature vector. The device uses the anomaly detection model to determine whether a particular node interaction in the network is anomalous.
    Type: Grant
    Filed: May 24, 2016
    Date of Patent: August 20, 2019
    Assignee: Cisco Technology, Inc.
    Inventors: Pierre-André Savalle, Laurent Sartran, Jean-Philippe Vasseur, Grégory Mermoud
  • Patent number: 10380995
    Abstract: Embodiments of the present disclosure provide a method and a device for extracting speech features based on artificial intelligence. The method includes performing a spectrum analysis on a speech to be recognized to obtain a spectrogram of the speech to be recognized; and extracting features of the spectrogram by using a gated convolution neural network to obtain the speech features of the speech to be recognized. As the spectrogram can describe the speech to be recognized in the form of an image, and the gated convolution neural network is an effective method for processing images, the speech features extracted with this method may accurately describe characteristics of the speech.
    Type: Grant
    Filed: December 26, 2017
    Date of Patent: August 13, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Xiangang Li
  • Patent number: 10380166
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to determine tags for unknown media using multiple media features. To tag unknown media, the disclosed techniques extract audio features from an audio portion of the unknown media and image features from image portions of the unknown media; weight the audio features with respect to the image features, or the image features with respect to the audio features, based at least partially on the recognition technology used to extract each feature; and search a database of pre-tagged media using the weighted features to generate a list of suggested tags for the unknown media.
    Type: Grant
    Filed: October 19, 2015
    Date of Patent: August 13, 2019
    Assignee: The Nielsen Company (US), LLC
    Inventor: Morris Lee
  • Patent number: 10381009
    Abstract: The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: August 13, 2019
    Assignee: Pindrop Security, Inc.
    Inventors: Elie Khoury, Matthew Garland
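The loss function sketched in the abstract uses cosine similarity with positive and negative margins over a cohort of negative samples. One plausible reading, with illustrative margin values (the patent does not fix them here), is a hinge loss on cosine similarity:

```python
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_loss(anchor, positive, negatives,
                        pos_margin=0.9, neg_margin=0.3):
    """Hinge loss on cosine similarity: pull the positive above pos_margin,
    push every cohort negative below neg_margin. Margins are illustrative."""
    loss = max(0.0, pos_margin - cos_sim(anchor, positive))
    for neg in negatives:
        loss += max(0.0, cos_sim(anchor, neg) - neg_margin)
    return loss
```

The loss is zero, and thus produces no gradient, once the positive pair is similar enough and every cohort negative is dissimilar enough, which is what makes the margins act as a stopping condition for each triplet.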
  • Patent number: 10366687
    Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
    Type: Grant
    Filed: December 10, 2015
    Date of Patent: July 30, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Puming Zhan, Xinwei Li
  • Patent number: 10360901
    Abstract: Techniques for learning front-end speech recognition parameters as part of training a neural network classifier include obtaining an input speech signal, and applying front-end speech recognition parameters to extract features from the input speech signal. The extracted features may be fed through a neural network to obtain an output classification for the input speech signal, and an error measure may be computed for the output classification through comparison of the output classification with a known target classification. Back propagation may be applied to adjust one or more of the front-end parameters as one or more layers of the neural network, based on the error measure.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: July 23, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Tara N. Sainath, Brian E. D. Kingsbury, Abdel-rahman Mohamed, Bhuvana Ramabhadran
  • Patent number: 10354652
    Abstract: Systems and processes for converting speech-to-text are provided. In one example process, speech input can be received. A sequence of states and arcs of a weighted finite state transducer (WFST) can be traversed. A negating finite state transducer (FST) can be traversed. A virtual FST can be composed using a neural network language model and based on the sequence of states and arcs of the WFST. The one or more virtual states of the virtual FST can be traversed to determine a probability of a candidate word given one or more history candidate words. Text corresponding to the speech input can be determined based on the probability of the candidate word given the one or more history candidate words. An output can be provided based on the text corresponding to the speech input.
    Type: Grant
    Filed: July 13, 2018
    Date of Patent: July 16, 2019
    Assignee: Apple Inc.
    Inventors: Rongqing Huang, Ilya Oparin
  • Patent number: 10347241
    Abstract: Systems and methods can be implemented to conduct speaker-invariant training for speech recognition in a variety of applications. An adversarial multi-task learning scheme for speaker-invariant training can be implemented, aiming at actively curtailing the inter-talker feature variability, while maximizing its senone discriminability to enhance the performance of a deep neural network (DNN) based automatic speech recognition system. In speaker-invariant training, a DNN acoustic model and a speaker classifier network can be jointly optimized to minimize the senone (triphone state) classification loss, and simultaneously mini-maximize the speaker classification loss. A speaker invariant and senone-discriminative intermediate feature is learned through this adversarial multi-task learning, which can be applied to an automatic speech recognition system. Additional systems and methods are disclosed.
    Type: Grant
    Filed: March 23, 2018
    Date of Patent: July 9, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhong Meng, Vadim Aleksandrovich Mazalov, Yifan Gong, Yong Zhao, Zhuo Chen, Jinyu Li
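The min-max objective above (minimize the senone loss while "mini-maximizing" the speaker loss) is commonly realized by reversing the speaker-classifier gradient before it reaches the shared feature extractor. A minimal sketch of that combined gradient, assuming an illustrative reversal weight:

```python
import numpy as np

def shared_feature_gradient(grad_senone, grad_speaker, reversal_weight=0.5):
    """Gradient reaching the shared feature extractor: descend the senone
    loss while ascending (negated term) the speaker-classification loss,
    so the learned features become speaker-invariant."""
    return grad_senone - reversal_weight * grad_speaker
```

The speaker classifier itself is still trained normally; only the gradient flowing back into the shared layers is negated, which is what removes speaker information from the intermediate feature.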
  • Patent number: 10346351
    Abstract: An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective OBWG. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output. Each PU group operates as a recurrent neural network LSTM cell.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: July 9, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks, Kyle T. O'Brien
  • Patent number: 10331796
    Abstract: An improved lecture support system integrates multi-media presentation materials with spoken content so that the listener can follow with both the speech and the supporting materials that accompany the presentation to provide additional understanding. Computer-based systems and methods are disclosed for translation of a spoken presentation (e.g., a lecture, a video) along with the accompanying presentation materials. The content of the presentation materials can be used to improve presentation translation, as it extracts supportive material from the presentation materials as they relate to the speech.
    Type: Grant
    Filed: February 12, 2018
    Date of Patent: June 25, 2019
    Assignee: Facebook, Inc.
    Inventor: Alexander Waibel
  • Patent number: 10324467
    Abstract: Systems and methods for automatically self-correcting or correcting in real-time one or more neural networks after detecting a triggering event, or breaching boundary conditions, are provided. Such a triggering event may indicate incorrect output signal or data being generated by the one or more neural networks. In particular, machine controllers of the invention limit the operations of neural networks to be within boundary conditions. Autonomous machines of the invention can be self-corrected after a breach of a boundary condition is detected. Autonomous land vehicles of the invention are capable of determining the timing of automatic transition from automated driving mode to manual control. The controller of the invention filters and saves input-output data sets that fall within boundary conditions for later training of neural networks. The controllers of the invention include security architectures to prevent damages from virus attacks or system malfunctions.
    Type: Grant
    Filed: May 29, 2018
    Date of Patent: June 18, 2019
    Assignee: Apex Artificial Intelligence Industries, Inc.
    Inventor: Kenneth A. Abeloe
  • Patent number: 10319368
    Abstract: A meaning generation method, in a meaning generation apparatus, includes acquiring meaning training data including text data of a sentence that can be an utterance sentence and meaning information indicating a meaning of the sentence and associated with the text data of the sentence, acquiring restatement training data including the text data of the sentence and text data of a restatement sentence of the sentence, and learning associations among the utterance sentence, the meaning information, and the restatement sentence. The learning includes learning a degree of importance of a word included in the utterance sentence, and the learning is performed by applying the meaning training data and the restatement training data to a common model, and storing a result of the learning as learning result information.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: June 11, 2019
    Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
    Inventors: Takashi Ushio, Katsuyoshi Yamagami
  • Patent number: 10311872
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classification using neural networks. One method includes receiving audio data corresponding to an utterance; obtaining a transcription of the utterance; generating a representation of the audio data; generating a representation of the transcription of the utterance; and providing (i) the representation of the audio data and (ii) the representation of the transcription of the utterance to a classifier that, based on a given representation of the audio data and a given representation of the transcription of the utterance, is trained to output an indication of whether the utterance associated with the given representation is likely directed to an automated assistant or is likely not directed to an automated assistant.
    Type: Grant
    Filed: July 25, 2017
    Date of Patent: June 4, 2019
    Assignee: Google LLC
    Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
  • Patent number: 10304477
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps. One of the methods includes, for each of the time steps: providing a current sequence of audio data as input to a convolutional subnetwork, wherein the current sequence comprises the respective audio sample at each time step that precedes the time step in the output sequence, and wherein the convolutional subnetwork is configured to process the current sequence of audio data to generate an alternative representation for the time step; and providing the alternative representation for the time step as input to an output layer, wherein the output layer is configured to: process the alternative representation to generate an output that defines a score distribution over a plurality of possible audio samples for the time step.
    Type: Grant
    Filed: July 9, 2018
    Date of Patent: May 28, 2019
    Assignee: DeepMind Technologies Limited
    Inventors: Aaron Gerard Antonius van den Oord, Sander Etienne Lea Dieleman, Nal Emmerich Kalchbrenner, Karen Simonyan, Oriol Vinyals
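The generation loop described above, where each time step's output distribution conditions on all previously generated samples, can be sketched independently of the convolutional subnetwork itself. The model callable below is a hypothetical stand-in for the subnetwork-plus-output-layer, and greedy argmax selection is used for determinism (the patent's output layer defines a score distribution from which samples could also be drawn):

```python
import numpy as np

def generate(model, seed, num_steps):
    """Autoregressive loop: at each time step the model scores every
    possible next audio sample given the samples produced so far; the
    highest-scoring sample (greedy choice) is appended to the sequence."""
    seq = list(seed)
    for _ in range(num_steps):
        scores = model(seq)               # score distribution over samples
        seq.append(int(np.argmax(scores)))
    return seq
```

Each appended sample becomes part of the conditioning context for the next step, which is what makes the generation strictly sequential.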