Neural Network Patents (Class 704/232)
  • Patent number: 10713808
    Abstract: Disclosed is a stereo matching method and apparatus based on stereo vision, the method including acquiring a left image and a right image, identifying image data by applying a window to each of the acquired left image and right image, storing the image data in a line buffer, extracting a disparity from the image data stored in the line buffer, and generating a depth map based on the extracted disparity.
    Type: Grant
    Filed: October 25, 2017
    Date of Patent: July 14, 2020
    Assignees: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, KYUNGPOOK NATIONAL UNIVERSITY INDUSTRY-ACADEMIC COOPERATION FOUNDATION
    Inventors: Kwang Yong Kim, Byungin Moon, Mi-ryong Park, Kyeong-ryeol Bae
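The final step of this abstract, turning the extracted disparity into a depth map, follows the standard stereo relation Z = f·B/d. A minimal sketch, with hypothetical camera parameters (focal length and baseline are not given in the patent):

```python
import math

# Illustrative disparity-to-depth conversion. focal_px and baseline_m are
# hypothetical camera parameters, not values taken from the patent.
def disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.12):
    """Depth Z = f * B / d for each per-pixel disparity d (in pixels)."""
    return [[focal_px * baseline_m / d if d > 0 else math.inf
             for d in row] for row in disparity]

# Zero disparity (no match / point at infinity) maps to infinite depth.
depth = disparity_to_depth([[7.0, 14.0], [28.0, 0.0]])
```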
  • Patent number: 10714078
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: July 14, 2020
    Assignee: Google LLC
    Inventors: Samuel Bengio, Mirkó Visontai, Christopher Walter George Thornton, Michiel A. U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
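The "complex linear projection" step can be read as a complex-valued linear layer applied to the frequency-domain frame, followed by a log-magnitude nonlinearity that yields real-valued features for the acoustic model. A toy sketch under that reading (the weights here are hypothetical, not trained values):

```python
import math

# Sketch of complex linear projection (CLP): project a complex FFT frame
# through complex weights, then take log-magnitude as a real feature.
def complex_linear_projection(frame, weights):
    feats = []
    for w_row in weights:
        z = sum(w * x for w, x in zip(w_row, frame))  # complex dot product
        feats.append(math.log(abs(z) + 1e-9))         # real-valued feature
    return feats

frame = [1 + 1j, 2 + 0j]                      # complex FFT bins of one frame
weights = [[1 + 0j, 0 + 0j], [0 + 0j, 1 + 0j]]  # hypothetical filters
feats = complex_linear_projection(frame, weights)
```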
  • Patent number: 10706857
    Abstract: An apparatus including a multi time-frequency resolution convolution neural network module; a two dimensional convolution neural network layers module; and a discriminative fully-connected classifier layers module; wherein the multi time-frequency resolution convolution neural network module receives a raw speech signal from a human speaker and processes the raw speech signal to provide a first processed output in the form of multiple multi time-frequency resolution spectrographic feature maps; wherein the two dimensional convolution neural network layers module processes the first processed output to provide a second processed output; and wherein the discriminative fully-connected classifier layers module processes the second processed output to provide a third processed output, wherein the third processed output provides an indication of an identity of a human speaker or provides an indication of verification of the identity of a human speaker.
    Type: Grant
    Filed: April 20, 2020
    Date of Patent: July 7, 2020
    Assignee: KAIZEN SECURE VOIZ, INC.
    Inventors: Viswanathan Ramasubramanian, Sunderrajan Kumar
  • Patent number: 10699700
    Abstract: Provided are a speech recognition training processing method and an apparatus including the same. The speech recognition training processing method includes acquiring multi-talker mixed speech sequence data corresponding to a plurality of speakers, encoding the multi-talker mixed speech sequence data into embedded sequence data, generating speaker-specific context vectors at each frame based on the embedded sequence data, generating senone posteriors for each of the speakers based on the speaker-specific context vectors, and updating an acoustic model by performing permutation invariant training (PIT) model training based on the senone posteriors.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: June 30, 2020
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Yanmin Qian, Dong Yu
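The core of permutation invariant training is that the assignment of model outputs to reference speakers is unknown, so the loss is evaluated under every permutation and the best one is used. A minimal sketch, with squared error standing in for the senone cross-entropy the abstract describes:

```python
from itertools import permutations

# Sketch of the PIT criterion: score per-speaker outputs against every
# assignment of reference labels and keep the minimum-error assignment.
def pit_loss(outputs, targets):
    def err(o, t):
        return sum((a - b) ** 2 for a, b in zip(o, t))
    best = float("inf")
    for perm in permutations(range(len(targets))):
        total = sum(err(out, targets[j]) for out, j in zip(outputs, perm))
        best = min(best, total)
    return best
```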
  • Patent number: 10679614
    Abstract: Techniques are described herein for enabling an automated assistant to adjust its behavior depending on a detected vocabulary level or other vocal characteristics of an input utterance provided to an automated assistant. The estimated vocabulary level or other vocal characteristics may be used to influence various aspects of a data processing pipeline employed by the automated assistant. In some implementations, one or more tolerance thresholds associated with, for example, grammatical tolerances or vocabulary tolerances, may be adjusted based on the estimated vocabulary level or vocal characteristics of the input utterance.
    Type: Grant
    Filed: April 24, 2019
    Date of Patent: June 9, 2020
    Assignee: GOOGLE LLC
    Inventors: Pedro Gonnet Anders, Victor Carbune, Daniel Keysers, Thomas Deselaers, Sandro Feuz
  • Patent number: 10649060
    Abstract: Techniques are described herein that are capable of performing sound source localization (SSL) confidence estimation using machine learning. An SSL operation is performed with regard to a sound to determine an SSL direction estimate and an SSL-based confidence associated with the SSL direction estimate based at least in part on a multi-channel representation of the sound. The SSL direction estimate indicates an estimated direction from which the sound is received. The SSL-based confidence indicates an estimated probability that the sound is received from the estimated direction. The multi-channel representation includes representations of the sound that are detected by respective sensors (e.g., microphones). Additional characteristic(s) of the sound are automatically determined.
    Type: Grant
    Filed: July 24, 2017
    Date of Patent: May 12, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Kevin Juho Venalainen
  • Patent number: 10650805
    Abstract: A system and method for speech recognition is provided. Embodiments may include receiving an audio signal at a first deep neural network (“DNN”) associated with a computing device. Embodiments may further include receiving the audio signal at a second DNN associated with the computing device, wherein the second DNN includes fewer parameters than the first DNN. Embodiments may also include determining whether to select an output from the first DNN or the second DNN and providing the selected output to a decoder, with an overall objective of speeding up ASR.
    Type: Grant
    Filed: September 11, 2014
    Date of Patent: May 12, 2020
    Assignee: Nuance Communications, Inc.
    Inventors: Joel Pinto, Daniel Willett, Christian Plahl
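The abstract leaves the selection criterion open; one plausible realization is to keep the small DNN's output when it is confident and fall back to the large DNN otherwise. A sketch under that assumption (the margin heuristic is illustrative, not from the patent):

```python
# Prefer the smaller, faster DNN when its top score is separated enough
# from the runner-up; otherwise fall back to the larger DNN's output.
def select_output(fast_scores, full_scores, margin=0.3):
    ranked = sorted(fast_scores, reverse=True)
    if ranked[0] - ranked[1] >= margin:
        return fast_scores   # confident: keep the cheap result
    return full_scores       # ambiguous: pay for the big model
```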
  • Patent number: 10650102
    Abstract: The present disclosure discloses a method and apparatus for generating a parallel text in the same language. The method comprises: acquiring a source segmented word sequence and a pre-trained word vector table; determining a source word vector sequence corresponding to the source segmented word sequence, according to the word vector table; importing the source word vector sequence into a first pre-trained recurrent neural network model, to generate an intermediate vector of a preset dimension for characterizing semantics of the source segmented word sequence; importing the intermediate vector into a second pre-trained recurrent neural network model, to generate a target word vector sequence corresponding to the intermediate vector; and determining a target segmented word sequence corresponding to the target word vector sequence according to the word vector table, and determining the target segmented word sequence as a parallel text in the same language corresponding to the source segmented word sequence.
    Type: Grant
    Filed: February 20, 2018
    Date of Patent: May 12, 2020
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Pengkai Li, Jingzhou He, Zhihong Fu, Xianwei Xin
  • Patent number: 10652657
    Abstract: Provided is a system including: a sink device related information acquisition unit configured to acquire sink device related information, which is information on sink device capable of receiving a signal transmitted from a transmission unit of a main apparatus in response to an operation of switching the transmission unit of the main apparatus on, the sink device related information being acquired before the operation of switching the transmission unit on; a first display control unit configured to display, when information on sink device is contained in the sink device related information, the information on sink device on a display unit based on the sink device related information; and a second display control unit configured to cause, when information on sink device is not contained in the sink device related information, the main apparatus to try to detect sink device, and to display information on sink device on the display unit based on a result of the detection.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: May 12, 2020
    Assignee: Yamaha Corporation
    Inventors: Akihiko Suyama, Masahiro Ishida, Tomoyoshi Akutagawa
  • Patent number: 10635984
    Abstract: A system and method that identify patterns in sets of signals produced during operation of a complex system and combine the identified patterns with records of past conditions to generate operational feedback to one or more machines of the complex system while it operates.
    Type: Grant
    Filed: July 23, 2018
    Date of Patent: April 28, 2020
    Assignee: FALKONRY INC.
    Inventors: Gregory Olsen, Nikunj Mehta, Lenin Kumar Subramanian, Dan Kearns
  • Patent number: 10629185
    Abstract: [Object] An object is to provide a statistical acoustic model adaptation method capable of efficient adaptation of an acoustic model using DNN with training data under a specific condition and achieving higher accuracy. [Solution] A method of speaker adaptation of an acoustic model using DNN includes the steps of: storing speech data 90 to 98 of different speakers separately in a first storage device; preparing speaker-by-speaker hidden layer modules 112 to 120; performing preliminary learning of all layers 42, 44, 110, 48, 50, 52 and 54 of a DNN 80 by switching and selecting the speech data 90 to 98 while dynamically replacing a specific layer 110 with hidden layer modules 112 to 120 corresponding to the selected speech data; replacing the specific layer 110 of the DNN that has completed the preliminary learning with an initial hidden layer; and training the DNN with speech data of a specific speaker while fixing parameters of layers other than the initial hidden layer.
    Type: Grant
    Filed: November 6, 2014
    Date of Patent: April 21, 2020
    Assignee: National Institute of Information and Communications Technology
    Inventors: Shigeki Matsuda, Xugang Lu
  • Patent number: 10621972
    Abstract: The present disclosure provides a method and a device for extracting an acoustic feature based on a convolution neural network and a terminal device. The method includes: arranging speech to be recognized into a speech spectrogram with a predetermined dimension number; and recognizing the speech spectrogram with the predetermined dimension number by the convolution neural network to obtain the acoustic feature of the speech to be recognized.
    Type: Grant
    Filed: March 7, 2018
    Date of Patent: April 14, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Xiangang Li
  • Patent number: 10614813
    Abstract: Caller identity verification can be improved by employing a multi-step verification that leverages speech features that are obtained from multiple interactions with a caller. An enrollment is performed in which customer speech features and customer information are collected. When a caller calls into the call center, an attempt is made to verify the caller's identity by requesting the caller to speak a predefined phrase, extracting speech features from the spoken phrase, and comparing the extracted features to the enrollment features. If the purported identity of the caller can be matched with one of the customers based on the comparison, the identity of the caller is verified. If the match cannot be made with a high enough degree of confidence, the customer is asked to speak any phrase that is not predefined. Features are extracted from the caller's speech, combined with features previously extracted from the predefined speech, and compared to the enrollment features.
    Type: Grant
    Filed: November 3, 2017
    Date of Patent: April 7, 2020
    Assignee: Intellisist, Inc.
    Inventors: Gilad Odinak, Yishay Carmiel
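The two-step flow above can be sketched with a simple cosine comparison: try the predefined-phrase features first, and only if that match is not confident enough, combine them with free-speech features and retry. The averaging step and threshold are illustrative choices, not values from the patent:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Step 1: compare predefined-phrase features against enrollment.
# Step 2 (low confidence): combine with free-speech features and retry.
def verify(enrolled, phrase_feats, free_feats, threshold=0.8):
    if cosine(enrolled, phrase_feats) >= threshold:
        return True
    combined = [(p + f) / 2 for p, f in zip(phrase_feats, free_feats)]
    return cosine(enrolled, combined) >= threshold
```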
  • Patent number: 10607603
    Abstract: Provided is a speech recognition apparatus. The apparatus includes a preprocessor configured to extract select frames from all frames of a first speech of a user, and a score calculator configured to calculate an acoustic score of a second speech, made up of the extracted select frames, by using a Deep Neural Network (DNN)-based acoustic model, and to calculate an acoustic score of frames, of the first speech, other than the select frames based on the calculated acoustic score of the second speech.
    Type: Grant
    Filed: August 9, 2018
    Date of Patent: March 31, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: In Chul Song
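One simple way to realize "calculate an acoustic score of frames other than the select frames based on the calculated acoustic score" is to run the expensive DNN only on a strided subset of frames and reuse the most recent score for the skipped frames. A sketch under that assumption:

```python
# Score only every `stride`-th frame with the (expensive) acoustic model;
# skipped frames reuse the most recently computed score.
def fill_scores(num_frames, stride, score_fn):
    scores, last = [], None
    for t in range(num_frames):
        if t % stride == 0:
            last = score_fn(t)   # costly DNN evaluation
        scores.append(last)      # skipped frames copy the last score
    return scores
```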
  • Patent number: 10593336
    Abstract: A machine learning multi-dimensional acoustic feature vector authentication system, according to an example of the present disclosure, builds and trains multiple multi-dimensional acoustic feature vector machine learning classifiers to determine a probability of spoofing of a voice. The system may extract an acoustic feature from a voice sample of a user. The system may convert the acoustic feature into multi-dimensional acoustic feature vectors and apply the multi-dimensional acoustic feature vectors to the multi-dimensional acoustic feature vector machine learning classifiers to detect spoofing and determine whether to authenticate a user.
    Type: Grant
    Filed: July 26, 2018
    Date of Patent: March 17, 2020
    Assignees: ACCENTURE GLOBAL SOLUTIONS LIMITED, MISRAM LLC
    Inventors: Constantine T. Boyadjiev, Rajarathnam Chandramouli, Koduvayur Subbalakshmi, Zongru Shao
  • Patent number: 10592001
    Abstract: Systems and methods for using neuromuscular information to improve speech recognition. The system includes a plurality of neuromuscular sensors, arranged on one or more wearable devices, wherein the plurality of neuromuscular sensors is configured to continuously record a plurality of neuromuscular signals from a user, at least one storage device configured to store one or more trained statistical models, and at least one computer processor programmed to provide as an input to the one or more trained statistical models, the plurality of neuromuscular signals or signals derived from the plurality of neuromuscular signals, determine based, at least in part, on an output of the one or more trained statistical models, at least one instruction for modifying an operation of a speech recognizer, and provide the at least one instruction to the speech recognizer.
    Type: Grant
    Filed: May 8, 2018
    Date of Patent: March 17, 2020
    Assignee: Facebook Technologies, LLC
    Inventors: Adam Berenzweig, Patrick Kaifosh, Alan Huan Du, Jeffrey Scott Seely
  • Patent number: 10586151
    Abstract: Some embodiments of the invention provide a novel method for training a multi-layer node network that mitigates against overfitting the adjustable parameters of the network for a particular problem. During training, the method of some embodiments adjusts the modifiable parameters of the network by iteratively identifying different interior-node, influence-attenuating masks that effectively specify different sampled networks of the multi-layer node network. An interior-node, influence-attenuating mask specifies attenuation parameters that are applied (1) to the outputs of the interior nodes of the network in some embodiments, (2) to the inputs of the interior nodes of the network in other embodiments, or (3) to the outputs and inputs of the interior nodes in still other embodiments. In each mask, the attenuation parameters can be any one of several values (e.g., three or more values) within a range of values (e.g., between 0 and 1).
    Type: Grant
    Filed: July 31, 2016
    Date of Patent: March 10, 2020
    Assignee: Perceive Corporation
    Inventor: Steven L. Teig
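The distinguishing detail here is that mask entries can take several values within a range rather than dropout's binary 0/1. A minimal sketch of sampling and applying such an influence-attenuating mask (the value set and uniform choice are illustrative):

```python
import random

# Mask entries take one of several values in [0, 1], unlike standard
# dropout's binary 0/1 mask. Value set and distribution are illustrative.
def sample_mask(n, values=(0.0, 0.5, 1.0), rng=None):
    rng = rng or random.Random(0)
    return [rng.choice(values) for _ in range(n)]

def apply_mask(activations, mask):
    """Attenuate interior-node outputs elementwise by the mask."""
    return [a * m for a, m in zip(activations, mask)]
```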
  • Patent number: 10573295
    Abstract: The disclosed technology teaches a deep end-to-end speech recognition model, including using multi-objective learning criteria to train a deep end-to-end speech recognition model on training data comprising speech samples temporally labeled with ground truth transcriptions.
    Type: Grant
    Filed: January 23, 2018
    Date of Patent: February 25, 2020
    Assignee: salesforce.com, inc.
    Inventors: Yingbo Zhou, Caiming Xiong
  • Patent number: 10565306
    Abstract: The technology disclosed presents a novel spatial attention model that uses current hidden state information of a decoder long short-term memory (LSTM) to guide attention and to extract spatial image features for use in image captioning. The technology disclosed also presents a novel adaptive attention model for image captioning that mixes visual information from a convolutional neural network (CNN) and linguistic information from an LSTM. At each timestep, the adaptive attention model automatically decides how heavily to rely on the image, as opposed to the linguistic model, to emit the next caption word. The technology disclosed further adds a new auxiliary sentinel gate to an LSTM architecture and produces a sentinel LSTM (Sn-LSTM). The sentinel gate produces a visual sentinel at each timestep, which is an additional representation, derived from the LSTM's memory, of long and short term visual and linguistic information.
    Type: Grant
    Filed: November 18, 2017
    Date of Patent: February 18, 2020
    Assignee: salesforce.com, inc.
    Inventors: Jiasen Lu, Caiming Xiong, Richard Socher
  • Patent number: 10546230
    Abstract: Methods and a system are provided for generating labeled data. A method includes encoding, by a processor-based encoder, a first labeled data into an encoded representation of the first labeled data. The method further includes modifying the encoded representation into a modified representation by adding a perturbation to the encoded representation. The method additionally includes decoding, by a processor-based decoder, the modified representation into a second labeled data.
    Type: Grant
    Filed: August 12, 2016
    Date of Patent: January 28, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Gakuto Kurata
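The encode-perturb-decode pipeline can be sketched end to end; here toy linear functions stand in for the trained encoder and decoder networks, and the uniform noise and its scale are illustrative assumptions:

```python
import random

# Encode a labeled sample, perturb the code, decode to get a new sample.
def augment(sample, encode, decode, noise_scale=0.1, rng=None):
    rng = rng or random.Random(42)
    code = encode(sample)
    perturbed = [c + rng.uniform(-noise_scale, noise_scale) for c in code]
    return decode(perturbed)

# Toy invertible encoder/decoder pair standing in for trained networks.
encode = lambda x: [2.0 * v for v in x]
decode = lambda z: [v / 2.0 for v in z]
new_sample = augment([1.0, 2.0], encode, decode)
```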
  • Patent number: 10535338
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representation of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
    Type: Grant
    Filed: November 2, 2018
    Date of Patent: January 14, 2020
    Assignee: Google LLC
    Inventors: Hasim Sak, Andrew W. Senior
  • Patent number: 10529319
    Abstract: A user adaptive speech recognition method and apparatus are provided. A speech recognition method includes extracting an identity vector representing an individual characteristic of a user from speech data, implementing a sub-neural network by inputting a sub-input vector including at least the identity vector to the sub-neural network, determining a scaling factor based on a result of the implementing of the sub-neural network, implementing a main neural network, configured to perform a recognition operation, by applying the determined scaling factor to the main neural network and inputting the speech data to the main neural network to which the determined scaling factor is applied, and indicating a recognition result of the implementation of the main neural network.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: January 7, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Inchul Song, Sang Hyun Yoo
  • Patent number: 10521927
    Abstract: Machine learning is used to train a network to predict the location of an internal body marker from surface data. A depth image or other image of the surface of the patient is used to determine the locations of anatomical landmarks. The training may use a loss function that includes a term to limit failure to predict a landmark and/or off-centering of the landmark. The landmarks may then be used to configure medical scanning and/or for diagnosis.
    Type: Grant
    Filed: August 7, 2018
    Date of Patent: December 31, 2019
    Assignee: Siemens Healthcare GmbH
    Inventors: Brian Teixeira, Vivek Kumar Singh, Birgi Tamersoy, Terrence Chen, Kai Ma, Andreas Krauss, Andreas Wimmer
  • Patent number: 10515627
    Abstract: A method and apparatus of building an acoustic feature extracting model, and an acoustic feature extracting method and apparatus. The method of building an acoustic feature extracting model comprises: considering first acoustic features extracted respectively from speech data corresponding to user identifiers as training data; using the training data to train a deep neural network to obtain an acoustic feature extracting model; wherein a target of training the deep neural network is to maximize similarity between the same user's second acoustic features and minimize similarity between different users' second acoustic features. The acoustic feature extracting model according to the present disclosure can self-learn optimal acoustic features that achieve the training target. As compared with a conventional acoustic feature extracting manner with a preset feature type and transformation manner, the acoustic feature extracting manner of the present disclosure achieves better flexibility and higher accuracy.
    Type: Grant
    Filed: May 15, 2018
    Date of Patent: December 24, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li
  • Patent number: 10515626
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Grant
    Filed: December 20, 2017
    Date of Patent: December 24, 2019
    Assignee: Google LLC
    Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
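The filter-and-combine step the abstract describes amounts to per-channel filtering followed by summation into one enhanced channel. A sketch with fixed FIR taps (in the patent, the filter parameters are predicted per utterance by a neural network):

```python
# Apply a per-channel FIR filter and sum into the single combined channel.
def filter_and_sum(ch1, ch2, taps1, taps2):
    def fir(x, taps):
        return [sum(taps[k] * x[t - k]
                    for k in range(len(taps)) if t - k >= 0)
                for t in range(len(x))]
    y1, y2 = fir(ch1, taps1), fir(ch2, taps2)
    return [a + b for a, b in zip(y1, y2)]
```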
  • Patent number: 10490182
    Abstract: A data processing technique uses an Artificial Neural Network (ANN) with Rectifier Linear Units (ReLU) to yield improved accuracy in a runtime task, for example, in processing audio-based data acquired by a speech-enabled device. The technique includes a first aspect that relates to initialization of the ANN weights to initially yield a high fraction of positive outputs from the ReLU. These weights are then modified using an iterative procedure in which the weights are incrementally updated. A second aspect relates to controlling the size of the incremental updates (a “learning rate”) during the iterations of training according to a variance of the weights at each layer.
    Type: Grant
    Filed: December 29, 2016
    Date of Patent: November 26, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Ayyavu Madhavaraj, Sri Venkata Surya Siva Rama Krishna Garimella
  • Patent number: 10481863
    Abstract: Aspects of the present disclosure relate to systems and methods for a voice-centric virtual or soft keyboard (or keypad). Unlike other keyboards, embodiments of the present disclosure prioritize the voice keyboard, meanwhile providing users with quick and uniform navigation to other keyboards (e.g., alphabet, punctuation, symbols, emojis, etc.). In addition, in embodiments, common actions, such as delete and return, are also easily accessible. In embodiments, the keyboard is also configurable to allow a user to organize buttons according to their desired use and layout. Embodiments of such a keyboard provide a voice-centric, seamless, and powerful interface experience for users.
    Type: Grant
    Filed: June 13, 2017
    Date of Patent: November 19, 2019
    Assignee: Baidu USA LLC
    Inventors: Zhuxiaona Wei, Thuan Nguyen, Iat Chan, Kenny M. Liou, Helin Wang, Houchang Lu
  • Patent number: 10474951
    Abstract: Methods and systems for training a neural network include sampling multiple local sub-networks from a global neural network. The local sub-networks include a subset of neurons from each layer of the global neural network. The plurality of local sub-networks are trained at respective local processing devices to produce trained local parameters. The trained local parameters from each local sub-network are averaged to produce trained global parameters.
    Type: Grant
    Filed: September 21, 2016
    Date of Patent: November 12, 2019
    Assignee: NEC Corporation
    Inventors: Renqiang Min, Huahua Wang, Asim Kadav
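The two mechanical pieces of this scheme, sampling a sub-network as a per-layer subset of neurons and averaging locally trained parameters into global ones, can be sketched directly (layer sizes and the keep fraction are illustrative):

```python
import random

# Sample a local sub-network: a random subset of neuron indices per layer.
def sample_subnetwork(layer_sizes, keep_fraction, rng):
    return [sorted(rng.sample(range(n), max(1, int(n * keep_fraction))))
            for n in layer_sizes]

# Average trained parameter vectors from the local sub-networks to
# produce the trained global parameters.
def average_parameters(param_sets):
    n = len(param_sets)
    return [sum(ps[i] for ps in param_sets) / n
            for i in range(len(param_sets[0]))]
```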
  • Patent number: 10460043
    Abstract: An apparatus and a method for constructing a multilingual acoustic model, and a computer readable recording medium are provided. The method for constructing a multilingual acoustic model includes dividing an input feature into a common language portion and a distinctive language portion, acquiring a tandem feature by training the divided common language portion and distinctive language portion using a neural network to estimate and remove correlation between phonemes, dividing parameters of an initial acoustic model constructed using the tandem feature into common language parameters and distinctive language parameters, adapting the common language parameters using data of a training language, adapting the distinctive language parameters using data of a target language, and constructing an acoustic model for the target language using the adapted common language parameters and the adapted distinctive language parameters.
    Type: Grant
    Filed: November 22, 2013
    Date of Patent: October 29, 2019
    Assignees: SAMSUNG ELECTRONICS CO., LTD., IDIAP RESEARCH INSTITUTE
    Inventors: Nam-Hoon Kim, Petr Motlicek, Philip Neil Garner, David Imseng, Jae-won Lee, Jeong-Mi Cho
  • Patent number: 10462584
    Abstract: A method for operating a hearing apparatus that has a microphone for converting ambient sound into a microphone signal, involves a number of features being derived from the microphone signal. Three classifiers, which are implemented independently of one another for analyzing a respective assigned acoustic dimension, are each supplied with a specifically assigned selection from these features. The respective classifier is used to generate a respective piece of information about a manifestation of the acoustic dimension assigned to the classifier. At least one of the at least three pieces of information about the respective manifestation of the assigned acoustic dimension is then taken as a basis for altering a signal processing algorithm that is executed for the purpose of processing the microphone signal to produce an output signal.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: October 29, 2019
    Assignee: Sivantos Pte. Ltd.
    Inventors: Marc Aubreville, Marko Lugger
  • Patent number: 10452355
    Abstract: According to an embodiment, an automaton deforming device includes a transforming unit and a deforming unit. The transforming unit generates second values by transforming first values, which either represent weights assigned to transitions in a weighted finite state automaton or represent values that are transformed into weights assigned to transitions in a weighted finite state automaton, in such a way that the number of elements of the set of first values is reduced and the order of the first values is preserved. The deforming unit deforms a weighted finite state automaton in which weights according to the second values are assigned to transitions.
    Type: Grant
    Filed: August 14, 2015
    Date of Patent: October 22, 2019
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Manabu Nagao, Takashi Masuko
  • Patent number: 10446138
    Abstract: A system and method for assessing transcription costs based on an audio file are provided. The method includes accessing at least one audio file for transcription assessment; analyzing the at least one audio file to determine at least one transcription characteristic based on the at least one audio file; and calculating, based on the at least one determined transcription characteristic, an initial bid value for transcription of the audio file.
    Type: Grant
    Filed: October 24, 2017
    Date of Patent: October 15, 2019
    Assignee: Verbit Software Ltd.
    Inventors: Eric Shellef, Kobi Ben Tzvi, Tom Livne
  • Patent number: 10446170
    Abstract: This disclosure relates to solutions for eliminating undesired audio artifacts, such as background noises, on an audio channel. A process for implementing the technology can include receiving a set of audio segments, analyzing the segments using a first ML model to identify a first probability of unwanted background noises in the segments, and if the first probability exceeds a threshold, analyzing the segments using a second ML model to determine a second probability that the one or more background features exist in the segments. In some aspects, the process can include attenuating audio artifacts in the segments, if the second probability exceeds a second threshold. In some implementations, dynamic time stretching and shrinking can be applied to the noise attenuation. Systems and machine-readable media are also provided.
    Type: Grant
    Filed: June 19, 2018
    Date of Patent: October 15, 2019
    Assignee: CISCO TECHNOLOGY, INC.
    Inventors: Eric Chen, Asbjørn Therkelsen, Espen Moberg, Wei-Lien Hsu
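The two-stage cascade above, a cheap screening model followed by a costlier confirmation model, can be sketched as nested threshold checks; the thresholds and attenuation gain below are illustrative, not values from the patent:

```python
# Two-stage ML cascade: the first model screens every segment; only
# flagged segments are re-checked by the second model, and confirmed
# noise is attenuated while clean audio passes through unchanged.
def attenuate_noise(segments, first_model, second_model,
                    t1=0.5, t2=0.8, gain=0.2):
    out = []
    for seg in segments:
        if first_model(seg) > t1 and second_model(seg) > t2:
            out.append([s * gain for s in seg])  # confirmed noise
        else:
            out.append(list(seg))                # leave untouched
    return out
```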
  • Patent number: 10431206
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a hierarchical recurrent neural network (HRNN) having a plurality of parameters on a plurality of training acoustic sequences to generate phoneme representations of received acoustic sequences. One method includes, for each of the received training acoustic sequences: processing the received acoustic sequence in accordance with current values of the parameters of the HRNN to generate a predicted grapheme representation of the received acoustic sequence; processing an intermediate output generated by an intermediate layer of the HRNN during the processing of the received acoustic sequence to generate one or more predicted phoneme representations of the received acoustic sequence; and adjusting the current values of the parameters of the HRNN based at least on (i) the predicted grapheme representation and (ii) the one or more predicted phoneme representations.
    Type: Grant
    Filed: August 22, 2016
    Date of Patent: October 1, 2019
    Assignee: Google LLC
    Inventors: Hasim Sak, Kanury Kanishka Rao
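The training objective described above combines a top-layer grapheme loss with auxiliary phoneme losses computed from an intermediate layer. A minimal sketch of such a multi-task combination, with an illustrative weighting (the patent does not specify one), could be:

```python
def multitask_loss(grapheme_loss, phoneme_losses, phoneme_weight=0.3):
    """Combine the top-layer grapheme loss with auxiliary phoneme losses
    from an intermediate layer; phoneme_weight is illustrative."""
    return grapheme_loss + phoneme_weight * sum(phoneme_losses)
```

Both terms contribute to the same parameter update, so the intermediate layer is pushed toward phoneme-discriminative representations while the top layer learns graphemes.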
  • Patent number: 10403291
    Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate language independent-speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.
    Type: Grant
    Filed: June 1, 2018
    Date of Patent: September 3, 2019
    Assignee: Google LLC
    Inventors: Ignacio Lopez Moreno, Li Wan, Quan Wang
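The abstract describes deriving the network input from both the audio data and a language identifier, then comparing the resulting speaker representation against an enrolled one. A hedged sketch of that input construction and comparison, with an invented language inventory, toy dimensions, and an arbitrary decision threshold:

```python
import numpy as np

LANGUAGES = ["en", "es", "fr"]   # illustrative language inventory

def build_input(audio_features, language):
    """Concatenate audio-derived features with a one-hot language identifier."""
    one_hot = np.zeros(len(LANGUAGES))
    one_hot[LANGUAGES.index(language)] = 1.0
    return np.concatenate([audio_features, one_hot])

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(speaker_representation, enrolled_representation, threshold=0.7):
    """Accept the utterance as the user's when the representations match."""
    return cosine_similarity(speaker_representation,
                             enrolled_representation) >= threshold
```

Conditioning on the language identifier lets a single network serve speakers across languages and dialects without per-language models.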
  • Patent number: 10403268
    Abstract: A system, article, and method include techniques of automatic speech recognition using posterior confidence scores.
    Type: Grant
    Filed: September 8, 2016
    Date of Patent: September 3, 2019
    Assignee: Intel IP Corporation
    Inventors: David J. Trawick, Joachim Hofer, Josef G. Bauer, Georg Stemmer, Da-Ming Chiang
  • Patent number: 10389741
    Abstract: In one embodiment, a device in a network identifies a new interaction between two or more nodes in the network. The device forms a feature vector using contextual information associated with the new interaction between the two or more nodes. The device causes generation of an anomaly detection model for new node interactions using the feature vector. The device uses the anomaly detection model to determine whether a particular node interaction in the network is anomalous.
    Type: Grant
    Filed: May 24, 2016
    Date of Patent: August 20, 2019
    Assignee: Cisco Technology, Inc.
    Inventors: Pierre-André Savalle, Laurent Sartran, Jean-Philippe Vasseur, Grégory Mermoud
  • Patent number: 10380995
    Abstract: Embodiments of the present disclosure provide a method and a device for extracting speech features based on artificial intelligence. The method includes performing a spectrum analysis on a speech to be recognized to obtain a spectrogram of the speech to be recognized; and extracting features of the spectrogram by using a gated convolution neural network to obtain the speech features of the speech to be recognized. As the spectrogram can describe the speech to be recognized in the form of an image, and the gated convolution neural network is an effective method for processing images, the speech features extracted with this method may accurately describe characteristics of the speech.
    Type: Grant
    Filed: December 26, 2017
    Date of Patent: August 13, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Xiangang Li
  • Patent number: 10380166
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to determine tags for unknown media using multiple media features. To tag unknown media, the disclosed techniques extract audio features from an audio portion of the unknown media and image features from image portions of the unknown media; weight the audio features with respect to the image features, or the image features with respect to the audio features, based at least partially on the recognition technology used to extract each feature; and search a database of pre-tagged media using the weighted features to generate a list of suggested tags for the unknown media.
    Type: Grant
    Filed: October 19, 2015
    Date of Patent: August 13, 2019
    Assignee: The Nielsen Company (US), LLC
    Inventor: Morris Lee
  • Patent number: 10381009
    Abstract: The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: August 13, 2019
    Assignee: Pindrop Security, Inc.
    Inventors: Elie Khoury, Matthew Garland
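The loss function sketched in the abstract uses cosine similarity with positive and negative margins over a cohort of negative samples. One plausible reading, with illustrative margin values (the patent does not fix them here), is a hinge loss on cosine similarity:

```python
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_loss(anchor, positive, negatives,
                        pos_margin=0.9, neg_margin=0.3):
    """Hinge loss on cosine similarity: pull the positive above pos_margin,
    push every cohort negative below neg_margin. Margins are illustrative."""
    loss = max(0.0, pos_margin - cos_sim(anchor, positive))
    for neg in negatives:
        loss += max(0.0, cos_sim(anchor, neg) - neg_margin)
    return loss
```

The loss is zero, and thus produces no gradient, once the positive pair is similar enough and every cohort negative is dissimilar enough, which is what makes the margins act as a stopping condition for each triplet.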
  • Patent number: 10366687
    Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
    Type: Grant
    Filed: December 10, 2015
    Date of Patent: July 30, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Puming Zhan, Xinwei Li
  • Patent number: 10360901
    Abstract: Techniques for learning front-end speech recognition parameters as part of training a neural network classifier include obtaining an input speech signal, and applying front-end speech recognition parameters to extract features from the input speech signal. The extracted features may be fed through a neural network to obtain an output classification for the input speech signal, and an error measure may be computed for the output classification through comparison of the output classification with a known target classification. Back propagation may be applied to adjust one or more of the front-end parameters as one or more layers of the neural network, based on the error measure.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: July 23, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Tara N. Sainath, Brian E. D. Kingsbury, Abdel-rahman Mohamed, Bhuvana Ramabhadran
  • Patent number: 10354652
    Abstract: Systems and processes for converting speech-to-text are provided. In one example process, speech input can be received. A sequence of states and arcs of a weighted finite state transducer (WFST) can be traversed. A negating finite state transducer (FST) can be traversed. A virtual FST can be composed using a neural network language model and based on the sequence of states and arcs of the WFST. The one or more virtual states of the virtual FST can be traversed to determine a probability of a candidate word given one or more history candidate words. Text corresponding to the speech input can be determined based on the probability of the candidate word given the one or more history candidate words. An output can be provided based on the text corresponding to the speech input.
    Type: Grant
    Filed: July 13, 2018
    Date of Patent: July 16, 2019
    Assignee: Apple Inc.
    Inventors: Rongqing Huang, Ilya Oparin
  • Patent number: 10347241
    Abstract: Systems and methods can be implemented to conduct speaker-invariant training for speech recognition in a variety of applications. An adversarial multi-task learning scheme for speaker-invariant training can be implemented, aiming at actively curtailing the inter-talker feature variability, while maximizing its senone discriminability to enhance the performance of a deep neural network (DNN) based automatic speech recognition system. In speaker-invariant training, a DNN acoustic model and a speaker classifier network can be jointly optimized to minimize the senone (triphone state) classification loss, and simultaneously mini-maximize the speaker classification loss. A speaker invariant and senone-discriminative intermediate feature is learned through this adversarial multi-task learning, which can be applied to an automatic speech recognition system. Additional systems and methods are disclosed.
    Type: Grant
    Filed: March 23, 2018
    Date of Patent: July 9, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhong Meng, Vadim Aleksandrovich Mazalov, Yifan Gong, Yong Zhao, Zhuo Chen, Jinyu Li
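The min-max objective above (minimize the senone loss while "mini-maximizing" the speaker loss) is commonly realized by reversing the speaker-classifier gradient before it reaches the shared feature extractor. A minimal sketch of that combined gradient, assuming an illustrative reversal weight:

```python
import numpy as np

def shared_feature_gradient(grad_senone, grad_speaker, reversal_weight=0.5):
    """Gradient reaching the shared feature extractor: descend the senone
    loss while ascending (negated term) the speaker-classification loss,
    so the learned features become speaker-invariant."""
    return grad_senone - reversal_weight * grad_speaker
```

The speaker classifier itself is still trained normally; only the gradient flowing back into the shared layers is negated, which is what removes speaker information from the intermediate feature.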
  • Patent number: 10346351
    Abstract: An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective OBWG. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output. Each PU group operates as a recurrent neural network LSTM cell.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: July 9, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks, Kyle T. O'Brien
  • Patent number: 10331796
    Abstract: An improved lecture support system integrates multi-media presentation materials with spoken content so that the listener can follow with both the speech and the supporting materials that accompany the presentation to provide additional understanding. Computer-based systems and methods are disclosed for translation of a spoken presentation (e.g., a lecture, a video) along with the accompanying presentation materials. The content of the presentation materials can be used to improve presentation translation, as it extracts supportive material from the presentation materials as they relate to the speech.
    Type: Grant
    Filed: February 12, 2018
    Date of Patent: June 25, 2019
    Assignee: Facebook, Inc.
    Inventor: Alexander Waibel
  • Patent number: 10324467
    Abstract: Systems and methods for automatically self-correcting or correcting in real-time one or more neural networks after detecting a triggering event, or breaching boundary conditions, are provided. Such a triggering event may indicate incorrect output signal or data being generated by the one or more neural networks. In particular, machine controllers of the invention limit the operations of neural networks to be within boundary conditions. Autonomous machines of the invention can be self-corrected after a breach of a boundary condition is detected. Autonomous land vehicles of the invention are capable of determining the timing of automatic transition from automated driving mode to manual control. The controller of the invention filters and saves input-output data sets that fall within boundary conditions for later training of neural networks. The controllers of the invention include security architectures to prevent damages from virus attacks or system malfunctions.
    Type: Grant
    Filed: May 29, 2018
    Date of Patent: June 18, 2019
    Assignee: Apex Artificial Intelligence Industries, Inc.
    Inventor: Kenneth A. Abeloe
  • Patent number: 10319368
    Abstract: A meaning generation method, in a meaning generation apparatus, includes acquiring meaning training data including text data of a sentence that can be an utterance sentence and meaning information indicating a meaning of the sentence and associated with the text data of the sentence, acquiring restatement training data including the text data of the sentence and text data of a restatement sentence of the sentence, and learning associations among the utterance sentence, the meaning information, and the restatement sentence. The learning includes learning a degree of importance of a word included in the utterance sentence, and the learning is performed by applying the meaning training data and the restatement training data to a common model, and storing a result of the learning as learning result information.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: June 11, 2019
    Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
    Inventors: Takashi Ushio, Katsuyoshi Yamagami
  • Patent number: 10311872
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classification using neural networks. One method includes receiving audio data corresponding to an utterance; obtaining a transcription of the utterance; generating a representation of the audio data; generating a representation of the transcription of the utterance; and providing (i) the representation of the audio data and (ii) the representation of the transcription of the utterance to a classifier that, based on a given representation of the audio data and a given representation of the transcription of the utterance, is trained to output an indication of whether the utterance associated with the given representation is likely directed to an automated assistant or is likely not directed to an automated assistant.
    Type: Grant
    Filed: July 25, 2017
    Date of Patent: June 4, 2019
    Assignee: Google LLC
    Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
  • Patent number: 10304477
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps. One of the methods includes, for each of the time steps: providing a current sequence of audio data as input to a convolutional subnetwork, wherein the current sequence comprises the respective audio sample at each time step that precedes the time step in the output sequence, and wherein the convolutional subnetwork is configured to process the current sequence of audio data to generate an alternative representation for the time step; and providing the alternative representation for the time step as input to an output layer, wherein the output layer is configured to: process the alternative representation to generate an output that defines a score distribution over a plurality of possible audio samples for the time step.
    Type: Grant
    Filed: July 9, 2018
    Date of Patent: May 28, 2019
    Assignee: DeepMind Technologies Limited
    Inventors: Aaron Gerard Antonius van den Oord, Sander Etienne Lea Dieleman, Nal Emmerich Kalchbrenner, Karen Simonyan, Oriol Vinyals
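The generation loop described above, where each time step's output distribution conditions on all previously generated samples, can be sketched independently of the convolutional subnetwork itself. The model callable below is a hypothetical stand-in for the subnetwork-plus-output-layer, and greedy argmax selection is used for determinism (the patent's output layer defines a score distribution from which samples could also be drawn):

```python
import numpy as np

def generate(model, seed, num_steps):
    """Autoregressive loop: at each time step the model scores every
    possible next audio sample given the samples produced so far; the
    highest-scoring sample (greedy choice) is appended to the sequence."""
    seq = list(seed)
    for _ in range(num_steps):
        scores = model(seq)               # score distribution over samples
        seq.append(int(np.argmax(scores)))
    return seq
```

Each appended sample becomes part of the conditioning context for the next step, which is what makes the generation strictly sequential.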