Using Artificial Neural Networks (EPO) Patents (Class 704/E15.017)
-
Patent number: 11978433
Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of a short-time Fourier transform (STFT), Mel-frequency cepstral coefficients (MFCC) and a filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to pick the one that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
Type: Grant
Filed: June 22, 2021
Date of Patent: May 7, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Felix Weninger, Marco Gaudesi, Ralf Leibold, Puming Zhan
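The dynamic selection between the two encoders can be sketched as follows. This is a toy illustration that scores each capture by its mean frame log-energy, a crude stand-in for the learned selection layer over STFT/MFCC/filter-bank features described in the abstract; the function and encoder names are hypothetical.

```python
import math

def mean_log_energy(signal, frame_len=4):
    # Per-frame log energy: a crude stand-in for the STFT/MFCC/filter-bank
    # features that the patent's selection layer actually consumes.
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    energies = [math.log(sum(s * s for s in f) + 1e-9) for f in frames]
    return sum(energies) / len(energies)

def select_encoder(close_talk, far_talk):
    # Hypothetical dynamic selection rule: route the speech segment to the
    # encoder whose capture looks cleaner. A real system uses a trained
    # selection layer here rather than a fixed heuristic.
    if mean_log_energy(close_talk) >= mean_log_energy(far_talk):
        return "close_talk_encoder"
    return "far_talk_encoder"

# Strong close-talk capture vs. a faint far-talk capture of the same segment.
print(select_encoder([0.9, -0.8, 0.7, -0.9] * 4, [0.1, -0.1, 0.1, -0.1] * 4))
```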
-
Patent number: 11682199
Abstract: A style-based generative network architecture enables scale-specific control of synthesized output data, such as images. During training, the style-based generative neural network (generator neural network) includes a mapping network and a synthesis network. During prediction, the mapping network may be omitted, replicated, or evaluated several times. The synthesis network may be used to generate highly varied, high-quality output data with a wide variety of attributes. For example, when used to generate images of people's faces, the attributes that may vary are age, ethnicity, camera viewpoint, pose, face shape, eyeglasses, colors (eyes, hair, etc.), hair style, lighting, background, etc. Depending on the task, generated output data may include images, audio, video, three-dimensional (3D) objects, text, etc.
Type: Grant
Filed: August 23, 2022
Date of Patent: June 20, 2023
Assignee: NVIDIA Corporation
Inventors: Tero Tapani Karras, Timo Oskari Aila, Samuli Matias Laine
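The mapping/synthesis split can be illustrated with trivially small stand-in networks. The per-scale "modulation" below is just a scaled addition, not real style modulation, and all sizes and values are invented for illustration; the point is only that feeding different style vectors at different scales gives scale-specific control.

```python
def mapping_network(z, W):
    # Mapping network: transforms the input latent z into an intermediate
    # latent w (here a single ReLU linear layer).
    return [max(0.0, sum(wij * zj for wij, zj in zip(row, z))) for row in W]

def synthesis_network(styles):
    # Synthesis network: one style vector per layer/scale. Supplying
    # different styles at different scales is what enables the
    # scale-specific control described in the abstract.
    out = [0.0, 0.0]
    for layer, w in enumerate(styles):
        out = [o + (layer + 1) * wi for o, wi in zip(out, w)]
    return out

W = [[0.5, -0.3], [0.2, 0.8]]
w = mapping_network([1.0, 1.0], W)           # evaluate the mapping once...
same_style = synthesis_network([w, w, w])    # ...and reuse it at every scale

coarse, fine = [1.0, 0.0], [0.0, 1.0]
mixed = synthesis_network([coarse, coarse, fine])  # mix styles across scales
print(mixed)
```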
-
Patent number: 11682123
Abstract: A method for measuring humidity at long range using simplified equipment includes creating a formula according to a relationship between multiple sets of known optical flow feature vectors and known humidity values. First and second images are obtained, wherein the first image and the second image are captured within the same range of capture. A plurality of feature points in the first image is obtained, and an optical flow feature vector for each feature point is calculated according to apparent changes in the position of that feature point in the second image. The current degree of humidity is thus obtained according to the optical flow feature vectors and the formula.
Type: Grant
Filed: August 30, 2022
Date of Patent: June 20, 2023
Assignee: Nanning FuLian FuGui Precision Industrial Co., Ltd.
Inventor: Ju-Lan Lu
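The calibrate-then-predict idea reads as ordinary regression from a flow statistic to humidity. A minimal sketch, assuming a linear formula and mean flow-vector magnitude as the feature (both are assumptions for illustration, not details taken from the patent):

```python
def fit_linear(xs, ys):
    # Ordinary least squares for y = a*x + b: the "formula" relating the
    # optical-flow statistic to known humidity readings.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def flow_magnitude(vectors):
    # Mean length of the optical-flow feature vectors computed from the
    # apparent displacement of feature points between the two images.
    return sum((dx * dx + dy * dy) ** 0.5 for dx, dy in vectors) / len(vectors)

# Hypothetical calibration data: (mean flow magnitude, relative humidity %).
calib = [(0.5, 30.0), (1.0, 50.0), (1.5, 70.0)]
a, b = fit_linear([m for m, _ in calib], [h for _, h in calib])

current_flow = [(0.6, 0.8), (1.2, 0.9)]   # vectors from two feature points
print(round(a * flow_magnitude(current_flow) + b, 1))
```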
-
Patent number: 11620991
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using neural networks. A feature vector that models audio characteristics of a portion of an utterance is received. Data indicative of latent variables of multivariate factor analysis is received. The feature vector and the data indicative of the latent variables are provided as input to a neural network. A candidate transcription for the utterance is determined based on at least an output of the neural network.
Type: Grant
Filed: January 21, 2021
Date of Patent: April 4, 2023
Assignee: Google LLC
Inventors: Andrew W. Senior, Ignacio L. Moreno
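The core input arrangement, an acoustic feature vector concatenated with factor-analysis latent variables and fed jointly to a network, can be sketched minimally. The layer sizes and all numeric values below are made up for illustration:

```python
def forward(x, layers):
    # Minimal feed-forward pass; each layer is (weights, biases) with ReLU.
    for W, b in layers:
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x

acoustic = [0.2, -0.1, 0.4]   # feature vector for one portion of the utterance
latent = [0.05, -0.3]         # latent variables from multivariate factor analysis
nn_input = acoustic + latent  # both are provided together as network input

layers = [([[0.1] * 5, [-0.1] * 5], [0.0, 0.5])]  # one tiny illustrative layer
print(len(nn_input), forward(nn_input, layers))
```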
-
Patent number: 11488314
Abstract: A method for measuring humidity at long range using simplified equipment includes creating a formula according to a relationship between multiple sets of known optical flow feature vectors and known humidity values. First and second images are obtained, wherein the first image and the second image are captured within the same range of capture. A plurality of feature points in the first image is obtained, and an optical flow feature vector for each feature point is calculated according to apparent changes in the position of that feature point in the second image. The current degree of humidity is thus obtained according to the optical flow feature vectors and the formula.
Type: Grant
Filed: February 26, 2021
Date of Patent: November 1, 2022
Assignee: Nanning FuLian FuGui Precision Industrial Co., Ltd.
Inventor: Ju-Lan Lu
-
Patent number: 11468294
Abstract: A target image is projected into a latent space of a generative model by determining a latent vector by applying a gradient-free technique and a class vector by applying a gradient-based technique. An image is generated from the latent and class vectors, and a loss function is used to determine a loss between the target image and the generated image. This determining of the latent vector and the class vector, generating an image, and using the loss function is repeated until a loss condition is satisfied. In response to the loss condition being satisfied, the latent and class vectors that resulted in the loss condition being satisfied are identified as the final latent and class vectors, respectively. The final latent and class vectors are provided to the generative model and multiple weights of the generative model are adjusted to fine-tune the generative model.
Type: Grant
Filed: February 21, 2020
Date of Patent: October 11, 2022
Assignee: Adobe Inc.
Inventors: Richard Zhang, Sylvain Philippe Paris, Junyan Zhu, Aaron Phillip Hertzmann, Jacob Minyoung Huh
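The alternating optimization can be sketched on a toy problem. The "generator" below is a plain vector sum and the gradient-free step is naive random search; both are deliberate simplifications of what the abstract covers, chosen so the loop is easy to inspect.

```python
import random

random.seed(1)

def generate(z, c):
    # Stand-in generative model: a real generator maps (latent, class)
    # vectors to an image; a sum keeps the loop easy to inspect.
    return [zi + ci for zi, ci in zip(z, c)]

def loss(image, target):
    # Squared-error loss between the generated and target "images".
    return sum((a - b) ** 2 for a, b in zip(image, target))

target = [1.0, -2.0]
z, c = [0.0, 0.0], [0.0, 0.0]
lr = 0.1

while loss(generate(z, c), target) > 1e-4:       # the loss condition
    # Gradient-free update of the latent vector: accept a random
    # perturbation only if it lowers the loss.
    candidate = [zi + random.gauss(0.0, 0.1) for zi in z]
    if loss(generate(candidate, c), target) < loss(generate(z, c), target):
        z = candidate
    # Gradient-based update of the class vector: for squared error the
    # gradient w.r.t. c is 2 * (generated - target).
    image = generate(z, c)
    c = [ci - lr * 2.0 * (gi - ti) for ci, gi, ti in zip(c, image, target)]

final_z, final_c = z, c   # would now be fed back to fine-tune the generator
print(round(loss(generate(final_z, final_c), target), 6))
```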
-
Patent number: 11468901
Abstract: The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.
Type: Grant
Filed: August 8, 2019
Date of Patent: October 11, 2022
Assignee: PINDROP SECURITY, INC.
Inventors: Elie Khoury, Matthew Garland
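A minimal sketch of a triplet loss built on cosine similarity with positive and negative margins. The margin values and the single-triplet form are illustrative only; per the abstract, the actual training operates on batches against a cohort set of negatives.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def triplet_cosine_loss(anchor, positive, negative,
                        pos_margin=0.9, neg_margin=0.3):
    # Hinge terms push same-speaker similarity above pos_margin and
    # different-speaker similarity below neg_margin.
    return max(0.0, pos_margin - cosine(anchor, positive)) + \
           max(0.0, cosine(anchor, negative) - neg_margin)

a = [1.0, 0.0]     # anchor voiceprint embedding
p = [0.5, 0.5]     # same speaker, imperfectly aligned
n = [0.6, 0.8]     # different speaker, too similar to the anchor
print(round(triplet_cosine_loss(a, p, n), 3))
```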
-
Patent number: 11425496
Abstract: Methods and systems for localizing a sound source include determining a spatial transformation between a position of a reference microphone array and a position of a displaced microphone array. A sound is measured at the reference microphone array and at the displaced microphone array. A source of the sound is localized using a neural network that includes respective paths for the reference microphone array and the displaced microphone array. The neural network further includes a transformation layer that represents the spatial transformation.
Type: Grant
Filed: May 1, 2020
Date of Patent: August 23, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Guillaume Jean Victor Marie Le Moing, Phongtharin Vinayavekhin, Jayakorn Vongkulbhisal, Don Joven Ravoy Agravante, Tadanobu Inoue, Asim Munawar
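The transformation layer's role, encoding the known transform between the arrays so the two paths can be fused in a common frame, can be sketched with a fixed 2-D rigid transform and simple averaging standing in for the learned network paths:

```python
import math

def transform_layer(xy, angle, shift):
    # Encodes the known spatial transformation between the arrays: rotate
    # the displaced array's features into the reference frame, then
    # translate by the array displacement.
    c, s = math.cos(angle), math.sin(angle)
    x, y = xy
    return [c * x - s * y + shift[0], s * x + c * y + shift[1]]

def localize(ref_path, displaced_path, angle, shift):
    # Two per-array paths, fused after the displaced path passes through
    # the transformation layer; fusion here is a plain average rather
    # than learned layers.
    aligned = transform_layer(displaced_path, angle, shift)
    return [(r + a) / 2 for r, a in zip(ref_path, aligned)]

# Displaced array rotated 90 degrees and shifted by (1, 0) w.r.t. reference.
ref_features = [2.0, 0.0]         # source position seen by the reference array
displaced_features = [0.0, -1.0]  # the same source seen by the displaced array
estimate = localize(ref_features, displaced_features, math.pi / 2, (1.0, 0.0))
print([round(v, 6) for v in estimate])
```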
-
Publication number: 20130138436
Abstract: Discriminative pretraining technique embodiments are presented that pretrain the hidden layers of a Deep Neural Network (DNN). In general, a one-hidden-layer neural network is trained first using labels discriminatively with error back-propagation (BP). Then, after discarding an output layer in the previous one-hidden-layer neural network, another randomly initialized hidden layer is added on top of the previously trained hidden layer along with a new output layer that represents the targets for classification or recognition. The resulting multiple-hidden-layer DNN is then discriminatively trained using the same strategy, and so on until the desired number of hidden layers is reached. This produces a pretrained DNN. The discriminative pretraining technique embodiments have the advantage of bringing the DNN layer weights close to a good local optimum, while still leaving them in a range with a high gradient so that they can be fine-tuned effectively.
Type: Application
Filed: November 26, 2011
Publication date: May 30, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Dong Yu, Li Deng, Frank Torsten Bernd Seide, Gang Li
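The growing schedule can be sketched end to end on toy data: train a one-hidden-layer network with BP, discard its output layer, stack a fresh hidden layer plus a new output layer, and retrain. The tiny sigmoid network, AND-labelled samples, and hyperparameters below are all stand-ins for the real DNN/senone setting.

```python
import math, random

random.seed(0)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, layers):
    acts = [x]
    for W, b in layers:
        x = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
        acts.append(x)
    return acts

def backprop_step(x, y, layers, lr=0.5):
    # One step of error back-propagation (squared error, sigmoid units).
    acts = forward(x, layers)
    delta = [(a - t) * a * (1 - a) for a, t in zip(acts[-1], y)]
    for idx in range(len(layers) - 1, -1, -1):
        W, b = layers[idx]
        prev = acts[idx]
        new_delta = [sum(W[j][i] * delta[j] for j in range(len(W))) *
                     prev[i] * (1 - prev[i]) for i in range(len(prev))]
        for j in range(len(W)):
            for i in range(len(prev)):
                W[j][i] -= lr * delta[j] * prev[i]
            b[j] -= lr * delta[j]
        delta = new_delta

def random_layer(n_out, n_in):
    return ([[random.uniform(-1.0, 1.0) for _ in range(n_in)]
             for _ in range(n_out)], [0.0] * n_out)

# Toy labelled data (logical AND) standing in for labelled speech frames.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [0.0]),
        ([1.0, 0.0], [0.0]), ([1.0, 1.0], [1.0])]

hidden = []
for stage in range(3):                    # grow the DNN one hidden layer at a time
    hidden.append(random_layer(4, 2 if stage == 0 else 4))
    output = random_layer(1, 4)           # new, randomly initialized output layer
    layers = hidden + [output]
    for _ in range(500):                  # discriminative training with BP
        for x, y in data:
            backprop_step(x, y, layers)
    # `output` is discarded at the next stage; `hidden` keeps the pretrained stack.

predictions = [forward(x, layers)[-1][0] for x, _ in data]
print(len(hidden), [round(p, 2) for p in predictions])
```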
-
Publication number: 20100217589
Abstract: The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is then trained using the common phoneme label.
Type: Application
Filed: February 17, 2010
Publication date: August 26, 2010
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Rainer Gruhn, Daniel Vasquez, Guillermo Aradilla
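The frame-assignment and common-label steps can be sketched directly. Majority voting is one concrete choice for the label rule, which the abstract leaves open; the window scheme is likewise illustrative.

```python
from collections import Counter

def assign_subsequences(frames, n_networks, sub_len):
    # Give each artificial neural network a different subsequence of the
    # provided frame sequence (here: consecutive windows of sub_len frames).
    return [frames[i * sub_len:(i + 1) * sub_len] for i in range(n_networks)]

def common_phoneme_label(labels):
    # One common label for the whole sequence, determined here by majority
    # vote over the per-frame phoneme labels.
    return Counter(labels).most_common(1)[0][0]

frames = list(range(6))                   # six frames of a speech signal
labels = ["a", "a", "a", "a", "e", "a"]   # per-frame phoneme labels

subseqs = assign_subsequences(frames, n_networks=3, sub_len=2)
label = common_phoneme_label(labels)      # each network trains on this label
print(subseqs, label)
```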
-
Publication number: 20100057452
Abstract: The described implementations relate to speech interfaces and in some instances to speech pattern recognition techniques that enable speech interfaces. One system includes a feature pipeline configured to produce speech feature vectors from input speech. This system also includes a classifier pipeline configured to classify individual speech feature vectors utilizing multi-level classification.
Type: Application
Filed: August 28, 2008
Publication date: March 4, 2010
Applicant: Microsoft Corporation
Inventors: Kunal Mukerjee, Brendan Meeder
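The two-pipeline arrangement can be sketched with stub functions. The thresholds and class names are invented; the point is only the multi-level gating structure, where a coarse level decides whether a finer level runs at all.

```python
def coarse_classifier(features):
    # First level: a cheap decision between broad classes.
    return "speech" if sum(features) / len(features) > 0.2 else "non_speech"

def fine_classifier(features):
    # Second level: only runs on vectors the first level kept.
    return "vowel" if max(features) > 0.8 else "consonant"

def classify(feature_vector):
    # Multi-level classification: the coarse level gates the fine level,
    # so most feature vectors are handled by the cheap classifier alone.
    level1 = coarse_classifier(feature_vector)
    if level1 == "non_speech":
        return level1
    return fine_classifier(feature_vector)

print(classify([0.9, 0.4, 0.3]))   # speech frame with a strong peak
print(classify([0.0, 0.1, 0.0]))   # near-silent frame
```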
-
Publication number: 20090106022
Abstract: An improved system and method is provided for efficiently learning a network of categories using prediction. A learning engine may receive a stream of characters and incrementally segment the stream of characters, beginning with individual characters, into larger and larger categories. To do so, a prediction engine may be provided for predicting a target category from the stream of characters using one or more context categories. Upon predicting the target category, the edges of the network of categories may be updated. A category composer may also be provided for composing a new category from existing categories in the network of categories, and a new category composed may then be added to the network of categories. Advantageously, iterative episodes of prediction and learning of categories for large scale applications may result in hundreds of thousands of categories connected by millions of prediction edges.
Type: Application
Filed: October 18, 2007
Publication date: April 23, 2009
Applicant: Yahoo! Inc.
Inventor: Omid Madani
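A much-simplified sketch of composing larger categories from smaller ones, starting from individual characters. The merge-by-count rule below is closer to byte-pair merging than to the publication's prediction-driven learner, and the threshold is arbitrary; it only illustrates how repeated composition grows a vocabulary of categories from a character stream.

```python
from collections import Counter

def compose_categories(stream, min_count=3):
    # Start from individual characters and repeatedly compose the most
    # frequent adjacent pair of categories into a new, larger category.
    categories = list(stream)
    composed = []                          # new categories added over time
    while True:
        pairs = Counter(zip(categories, categories[1:]))
        if not pairs:
            return categories, composed
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            return categories, composed
        merged = pair[0] + pair[1]
        composed.append(merged)
        out, i = [], 0
        while i < len(categories):
            if i + 1 < len(categories) and (categories[i], categories[i + 1]) == pair:
                out.append(merged)         # replace the pair by the new category
                i += 2
            else:
                out.append(categories[i])
                i += 1
        categories = out

cats, new_categories = compose_categories("the cat the dog the cow")
print(new_categories)
```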