Using Artificial Neural Networks (epo) Patents (Class 704/E15.017)

Multi-encoder end-to-end automatic speech recognition (ASR) for joint modeling of multiple input devices

Patent number: 11978433

Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.

Type: Grant

Filed: June 22, 2021

Date of Patent: May 7, 2024

Assignee: Microsoft Technology Licensing, LLC.

Inventors: Felix Weninger, Marco Gaudesi, Ralf Leibold, Puming Zhan
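
The selection step in the abstract above can be illustrated with a minimal sketch. The patent describes a learned encoder selection layer; here a hand-written score (mean log filter-bank energy) stands in for it, and the names `segment_score` and `select_encoder` are hypothetical, not taken from the patent.

```python
import math

def segment_score(fbank_frames):
    # Assumed stand-in for a learned selection score:
    # mean log energy over the filter-bank frames of one speech segment.
    total = sum(math.log(sum(frame) + 1e-8) for frame in fbank_frames)
    return total / len(fbank_frames)

def select_encoder(close_fbank=None, far_fbank=None):
    """Pick 'close' or 'far' for a segment and return soft weights as well."""
    if close_fbank is None and far_fbank is None:
        raise ValueError("at least one input signal is required")
    # Only one device delivered a signal: use its encoder.
    if far_fbank is None:
        return "close", {"close": 1.0, "far": 0.0}
    if close_fbank is None:
        return "far", {"close": 0.0, "far": 1.0}
    # Both signals present: softmax over the two scores.
    s_close, s_far = segment_score(close_fbank), segment_score(far_fbank)
    peak = max(s_close, s_far)
    e_close, e_far = math.exp(s_close - peak), math.exp(s_far - peak)
    norm = e_close + e_far
    weights = {"close": e_close / norm, "far": e_far / norm}
    choice = "close" if weights["close"] >= weights["far"] else "far"
    return choice, weights
```

In a real system the score would come from trained parameters and the chosen encoder's output would feed the encoder-decoder model; this sketch only shows the per-segment decision.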

Style-based architecture for generative neural networks

Patent number: 11682199

Abstract: A style-based generative network architecture enables scale-specific control of synthesized output data, such as images. During training, the style-based generative neural network (generator neural network) includes a mapping network and a synthesis network. During prediction, the mapping network may be omitted, replicated, or evaluated several times. The synthesis network may be used to generate highly varied, high-quality output data with a wide variety of attributes. For example, when used to generate images of people's faces, the attributes that may vary are age, ethnicity, camera viewpoint, pose, face shape, eyeglasses, colors (eyes, hair, etc.), hair style, lighting, background, etc. Depending on the task, generated output data may include images, audio, video, three-dimensional (3D) objects, text, etc.

Type: Grant

Filed: August 23, 2022

Date of Patent: June 20, 2023

Assignee: NVIDIA Corporation

Inventors: Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine

Method for measuring humidity and electronic device using same

Patent number: 11682123

Abstract: A method for measuring humidity at long range using simplified equipment includes creating a formula according to a relationship between multiple sets of known optical flow feature vectors and a known humidity. First and second images are obtained, wherein the first image and the second image are captured within the same range of capture. A plurality of feature points in the first image is obtained, and an optical flow feature vector is calculated for each feature point according to the apparent change in its position in the second image. The degree of current humidity is then obtained from the optical flow feature vectors and the formula.

Type: Grant

Filed: August 30, 2022

Date of Patent: June 20, 2023

Assignee: Nanning FuLian FuGui Precision Industrial Co., Ltd.

Inventor: Ju-Lan Lu
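
As a rough illustration of the abstract above, the sketch below fits a formula from known flow magnitudes and humidities, then applies it to a new image pair. The linear form of the formula, the use of the mean flow magnitude as the feature, and all function names are assumptions; the abstract does not specify them.

```python
import math

def flow_vectors(points_first, points_second):
    # Displacement of each feature point between the first and second image.
    return [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(points_first, points_second)]

def mean_magnitude(vectors):
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)

def fit_formula(magnitudes, humidities):
    # Ordinary least squares for humidity = slope * magnitude + intercept
    # (an assumed functional form).
    n = len(magnitudes)
    mean_x = sum(magnitudes) / n
    mean_y = sum(humidities) / n
    sxx = sum((x - mean_x) ** 2 for x in magnitudes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(magnitudes, humidities))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_humidity(points_first, points_second, formula):
    slope, intercept = formula
    magnitude = mean_magnitude(flow_vectors(points_first, points_second))
    return slope * magnitude + intercept
```

In practice the feature points and their positions in the second image would come from an optical flow tracker; here they are passed in as plain coordinate lists.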

Speech recognition using neural networks

Patent number: 11620991

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using neural networks. A feature vector that models audio characteristics of a portion of an utterance is received. Data indicative of latent variables of multivariate factor analysis is received. The feature vector and the data indicative of the latent variables are provided as input to a neural network. A candidate transcription for the utterance is determined based on at least an output of the neural network.

Type: Grant

Filed: January 21, 2021

Date of Patent: April 4, 2023

Assignee: Google LLC

Inventors: Andrew W. Senior, Ignacio L. Moreno
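
The input construction described in the abstract above can be sketched as follows. The single softmax layer is only a placeholder for the neural network, and the vector sizes, weights, and function names are made up for illustration.

```python
import math

def build_input(frame_features, latent_variables):
    # Append the factor-analysis latent variables to the acoustic features.
    return list(frame_features) + list(latent_variables)

def forward(x, weights, biases):
    # One softmax layer standing in for the full network (an assumption).
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative values only.
frame = [0.2, -0.1, 0.4]      # acoustic feature vector for one frame
latent = [1.0, 0.5]           # latent variables for the utterance
x = build_input(frame, latent)
weights = [[0.1] * 5, [-0.1] * 5]
biases = [0.0, 0.0]
posteriors = forward(x, weights, biases)
```

A decoder would then use such per-frame outputs to score candidate transcriptions; that stage is not shown.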

Method for measuring humidity and electronic device using same

Patent number: 11488314

Abstract: A method for measuring humidity at long range using simplified equipment includes creating a formula according to a relationship between multiple sets of known optical flow feature vectors and a known humidity. First and second images are obtained, wherein the first image and the second image are captured within the same range of capture. A plurality of feature points in the first image is obtained, and an optical flow feature vector is calculated for each feature point according to the apparent change in its position in the second image. The degree of current humidity is then obtained from the optical flow feature vectors and the formula.

Type: Grant

Filed: February 26, 2021

Date of Patent: November 1, 2022

Assignee: Nanning FuLian FuGui Precision Industrial Co., Ltd.

Inventor: Ju-Lan Lu

Projecting images to a generative model based on gradient-free latent vector determination

Patent number: 11468294

Abstract: A target image is projected into a latent space of a generative model by determining a latent vector by applying a gradient-free technique and a class vector by applying a gradient-based technique. An image is generated from the latent and class vectors, and a loss function is used to determine a loss between the target image and the generated image. This determining of the latent vector and the class vector, generating an image, and using the loss function is repeated until a loss condition is satisfied. In response to the loss condition being satisfied, the latent and class vectors that resulted in the loss condition being satisfied are identified as the final latent and class vectors, respectively. The final latent and class vectors are provided to the generative model and multiple weights of the generative model are adjusted to fine-tune the generative model.

Type: Grant

Filed: February 21, 2020

Date of Patent: October 11, 2022

Assignee: Adobe Inc.

Inventors: Richard Zhang, Sylvain Philippe Paris, Junyan Zhu, Aaron Phillip Hertzmann, Jacob Minyoung Huh
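
The alternating loop in the abstract above can be sketched on a toy problem. The generator below is a made-up linear function, random hill climbing is one possible gradient-free technique (the abstract does not name a specific one), and the final fine-tuning of generator weights is omitted.

```python
import random

def generate(latent, class_vec):
    # Hypothetical toy generator; a real one would be a trained network.
    return [2.0 * z + c for z, c in zip(latent, class_vec)]

def loss(target, image):
    return sum((t - p) ** 2 for t, p in zip(target, image))

def project(target, steps=500, tol=1e-6, lr=0.1, sigma=0.1, seed=0):
    rng = random.Random(seed)
    latent = [0.0] * len(target)
    class_vec = [0.0] * len(target)
    current = loss(target, generate(latent, class_vec))
    for _ in range(steps):
        if current <= tol:          # loss condition satisfied
            break
        # Gradient-free latent update: keep a random perturbation
        # only if it lowers the loss.
        candidate = [z + rng.gauss(0.0, sigma) for z in latent]
        cand_loss = loss(target, generate(candidate, class_vec))
        if cand_loss < current:
            latent, current = candidate, cand_loss
        # Gradient-based class update: d loss / d c_i = 2 * (generated_i - target_i).
        image = generate(latent, class_vec)
        grad = [2.0 * (p - t) for p, t in zip(image, target)]
        class_vec = [c - lr * g for c, g in zip(class_vec, grad)]
        current = loss(target, generate(latent, class_vec))
    return latent, class_vec, current
```

Calling `project([1.0, -2.0, 0.5])` returns the final latent vector, class vector, and loss for this toy target.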

End-to-end speaker recognition using deep neural network

Patent number: 11468901

Abstract: The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.

Type: Grant

Filed: August 8, 2019

Date of Patent: October 11, 2022

Assignee: PINDROP SECURITY, INC.

Inventors: Elie Khoury, Matthew Garland
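
A loss of the kind described in the abstract above might look like the following sketch. The hinge form, the margin values, and the choice to penalize only the most similar cohort sample are assumptions; the abstract states only that cosine similarity and positive and negative margins are used.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def triplet_loss(anchor, positive, cohort_negatives,
                 pos_margin=0.8, neg_margin=0.2):
    # Penalize a same-speaker pair whose similarity falls below pos_margin.
    pos_term = max(0.0, pos_margin - cosine(anchor, positive))
    # Penalize the most similar cohort (different-speaker) sample
    # if its similarity rises above neg_margin.
    hardest = max(cosine(anchor, neg) for neg in cohort_negatives)
    neg_term = max(0.0, hardest - neg_margin)
    return pos_term + neg_term
```

The three input vectors correspond to the outputs of the three feed-forward networks for the anchor, positive, and negative samples.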

Two-dimensional sound localization with transformation layer

Patent number: 11425496

Abstract: Methods and systems for localizing a sound source include determining a spatial transformation between a position of a reference microphone array and a position of a displaced microphone array. A sound is measured at the reference microphone array and at the displaced microphone array. A source of the sound is localized using a neural network that includes respective paths for the reference microphone array and the displaced microphone array. The neural network further includes a transformation layer that represents the spatial transformation.

Type: Grant

Filed: May 1, 2020

Date of Patent: August 23, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Guillaume Jean Victor Marie Le Moing, Phongtharin Vinayavekhin, Jayakorn Vongkulbhisal, Don Joven Ravoy Agravante, Tadanobu Inoue, Asim Munawar
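
To make the idea of a transformation layer concrete, the sketch below encodes a 2D rotation and translation as a fixed layer that maps coordinates from the displaced array's frame into the reference array's frame. Applying it to position estimates and averaging the two paths is a simplification chosen for illustration; the abstract does not say where in the network the layer sits or how the paths are combined.

```python
import math

def make_transformation_layer(angle_rad, shift_x, shift_y):
    # Fixed (non-learned) layer representing the spatial transformation
    # between the displaced array and the reference array.
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)

    def layer(point):
        x, y = point
        return (cos_a * x - sin_a * y + shift_x,
                sin_a * x + cos_a * y + shift_y)

    return layer

def localize(reference_estimate, displaced_estimate, layer):
    # Bring the displaced path's estimate into the reference frame,
    # then fuse the two paths by averaging (an assumed fusion rule).
    mapped = layer(displaced_estimate)
    return tuple((r + m) / 2.0 for r, m in zip(reference_estimate, mapped))
```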

DISCRIMINATIVE PRETRAINING OF DEEP NEURAL NETWORKS

Publication number: 20130138436

Abstract: Discriminative pretraining technique embodiments are presented that pretrain the hidden layers of a Deep Neural Network (DNN). In general, a one-hidden-layer neural network is trained first using labels discriminatively with error back-propagation (BP). Then, after discarding an output layer in the previous one-hidden-layer neural network, another randomly initialized hidden layer is added on top of the previously trained hidden layer along with a new output layer that represents the targets for classification or recognition. The resulting multiple-hidden-layer DNN is then discriminatively trained using the same strategy, and so on until the desired number of hidden layers is reached. This produces a pretrained DNN. The discriminative pretraining technique embodiments have the advantage of bringing the DNN layer weights close to a good local optimum, while still leaving them in a range with a high gradient so that they can be fine-tuned effectively.

Type: Application

Filed: November 26, 2011

Publication date: May 30, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Dong Yu, Li Deng, Frank Torsten Bernd Seide, Gang Li
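
The layer-growing schedule in the abstract above can be outlined as a short loop. This is a structural sketch only: the three callables are hypothetical hooks that a real implementation would supply, and no actual training happens here.

```python
def discriminative_pretrain(num_hidden, new_hidden_layer,
                            new_output_layer, train_bp):
    """Grow a network one hidden layer at a time.

    new_hidden_layer: returns a randomly initialized hidden layer.
    new_output_layer: returns a fresh output layer for the targets.
    train_bp: trains the given hidden layers plus output layer with
              labels and error back-propagation.
    """
    if num_hidden < 1:
        raise ValueError("need at least one hidden layer")
    hidden = []
    output = None
    for _ in range(num_hidden):
        # The previous output layer is discarded; a new hidden layer is
        # stacked on the already trained ones, with a new output layer.
        hidden.append(new_hidden_layer())
        output = new_output_layer()
        # All layers present so far are trained discriminatively.
        train_bp(hidden, output)
    return hidden, output
```

The returned layers form the pretrained network, which would then be fine-tuned as a whole.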

Method for Automated Training of a Plurality of Artificial Neural Networks

Publication number: 20100217589

Abstract: The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is trained using the common phoneme label.

Type: Application

Filed: February 17, 2010

Publication date: August 26, 2010

Applicant: NUANCE COMMUNICATIONS, INC.

Inventors: Rainer Gruhn, Daniel Vasquez, Guillermo Aradilla
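
The data preparation described in the abstract above could be sketched as follows. Splitting the sequence into consecutive, non-overlapping chunks and taking a majority vote for the common label are assumptions; the abstract allows other ways of choosing subsequences and of deriving the label.

```python
from collections import Counter

def split_subsequences(frames, num_networks, sub_len):
    # Give each network its own consecutive chunk of sub_len frames.
    if len(frames) < num_networks * sub_len:
        raise ValueError("sequence too short for the requested split")
    return [frames[k * sub_len:(k + 1) * sub_len]
            for k in range(num_networks)]

def common_label(subsequences):
    # Majority vote over the phoneme labels of all assigned frames.
    labels = [label for sub in subsequences for _, label in sub]
    return Counter(labels).most_common(1)[0][0]

def training_examples(frames, num_networks, sub_len):
    """Return one (subsequence, common_label) pair per network."""
    subs = split_subsequences(frames, num_networks, sub_len)
    label = common_label(subs)
    return [(sub, label) for sub in subs]
```

Each frame is represented here as a `(features, phoneme_label)` tuple; network `k` would then be trained on the `k`-th pair.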

SPEECH INTERFACES

Publication number: 20100057452

Abstract: The described implementations relate to speech interfaces and in some instances to speech pattern recognition techniques that enable speech interfaces. One system includes a feature pipeline configured to produce speech feature vectors from input speech. This system also includes a classifier pipeline configured to classify individual speech feature vectors utilizing multi-level classification.

Type: Application

Filed: August 28, 2008

Publication date: March 4, 2010

Applicant: Microsoft Corporation

Inventors: Kunal Mukerjee, Brendan Meeder

System and method for learning a network of categories using prediction

Publication number: 20090106022

Abstract: An improved system and method are provided for efficiently learning a network of categories using prediction. A learning engine may receive a stream of characters and incrementally segment the stream of characters, beginning with individual characters, into larger and larger categories. To do so, a prediction engine may be provided for predicting a target category from the stream of characters using one or more context categories. Upon predicting the target category, the edges of the network of categories may be updated. A category composer may also be provided for composing a new category from existing categories in the network of categories, and the newly composed category may then be added to the network of categories. Advantageously, iterative episodes of prediction and learning of categories for large scale applications may result in hundreds of thousands of categories connected by millions of prediction edges.

Type: Application

Filed: October 18, 2007

Publication date: April 23, 2009

Applicant: Yahoo! Inc.

Inventor: Omid Madani