Patents by Inventor Georg Heigold

Georg Heigold has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Neural networks for speaker verification

Patent number: 11961525

Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.

Type: Grant

Filed: August 3, 2021

Date of Patent: April 16, 2024

Assignee: Google LLC

Inventors: Georg Heigold, Samuel Bengio, Ignacio Lopez Moreno
ASYNCHRONOUS OPTIMIZATION FOR SEQUENCE TRAINING OF NEURAL NETWORKS

Publication number: 20240087559

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Type: Application

Filed: November 10, 2023

Publication date: March 14, 2024

Applicant: Google LLC

Inventors: Georg Heigold, Erik Mcdermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani
PROCESSING IMAGES USING SELF-ATTENTION BASED NEURAL NETWORKS

Publication number: 20240062426

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.

Type: Application

Filed: November 1, 2023

Publication date: February 22, 2024

Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
Neural Networks For Speaker Verification

Publication number: 20240038245

Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.

Type: Application

Filed: October 11, 2023

Publication date: February 1, 2024

Applicant: Google LLC

Inventors: Georg Heigold, Samuel Bengio, Ignacio Lopez Moreno
Asynchronous optimization for sequence training of neural networks

Patent number: 11854534

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Type: Grant

Filed: December 20, 2022

Date of Patent: December 26, 2023

Assignee: Google LLC

Inventors: Georg Heigold, Erik Mcdermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani
Systems And Methods For Improved Video Understanding

Publication number: 20230017072

Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.

Type: Application

Filed: July 8, 2021

Publication date: January 19, 2023

Inventors: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Luise Schmid
Asynchronous optimization for sequence training of neural networks

Patent number: 11557277

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Type: Grant

Filed: December 15, 2021

Date of Patent: January 17, 2023

Assignee: Google LLC

Inventors: Georg Heigold, Erik McDermott, Vincent O. VanHoucke, Andrew W. Senior, Michiel A. U. Bacchiani
Conditional Object-Centric Learning with Slot Attention for Video and Other Sequential Data

Publication number: 20220383628

Abstract: A method includes obtaining first feature vectors and second feature vectors representing contents of a first and second image frame, respectively, of an input video. The method may also include generating, based on the first feature vectors, first slot vectors, where each slot vector represents attributes of a corresponding entity as represented in the first image frame, and generating, based on the first slot vectors, predicted slot vectors including a corresponding predicted slot vector that represents a transition of the attributes of the corresponding entity from the first to the second image frame. The method may additionally include generating, based on the predicted slot vectors and the second feature vectors, second slot vectors including a corresponding slot vector that represents the attributes of the corresponding entity as represented in the second image frame, and determining an output based on the predicted slot vectors or the second slot vectors.

Type: Application

Filed: April 21, 2022

Publication date: December 1, 2022

Inventors: Thomas Kipf, Gamaleldin Elsayed, Aravindh Mahendran, Austin Charles Stone, Sara Sabour Rouh Aghdam, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff
ASYNCHRONOUS OPTIMIZATION FOR SEQUENCE TRAINING OF NEURAL NETWORKS

Publication number: 20220108686

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Type: Application

Filed: December 15, 2021

Publication date: April 7, 2022

Applicant: Google LLC

Inventors: Georg Heigold, Erik McDermott, Vincent O. VanHoucke, Andrew W. Senior, Michiel A.U. Bacchiani
PROCESSING IMAGES USING SELF-ATTENTION BASED NEURAL NETWORKS

Publication number: 20220108478

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.

Type: Application

Filed: October 1, 2021

Publication date: April 7, 2022

Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
Asynchronous optimization for sequence training of neural networks

Patent number: 11227582

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Type: Grant

Filed: January 6, 2021

Date of Patent: January 18, 2022

Assignee: Google LLC

Inventors: Georg Heigold, Erik Mcdermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani
Object-Centric Learning with Slot Attention

Publication number: 20210383199

Abstract: A method involves receiving a perceptual representation including a plurality of feature vectors, and initializing a plurality of slot vectors represented by a neural network memory unit. Each respective slot vector is configured to represent a corresponding entity in the perceptual representation. The method also involves determining an attention matrix based on a product of the plurality of feature vectors transformed by a key function and the plurality of slot vectors transformed by a query function. Each respective value of a plurality of values along each respective dimension of the attention matrix is normalized with respect to the plurality of values. The method additionally involves determining an update matrix based on the plurality of feature vectors transformed by a value function and the attention matrix, and updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit.

Type: Application

Filed: July 13, 2020

Publication date: December 9, 2021

Inventors: Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner, Aravindh Mahendran, Francesco Locatello, Thomas Kipf, Georg Heigold, Alexey Dosovitskiy
Neural Networks For Speaker Verification

Publication number: 20210366491

Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.

Type: Application

Filed: August 3, 2021

Publication date: November 25, 2021

Applicant: Google LLC

Inventors: Georg Heigold, Samuel Bengio, Ignacio Lopez Moreno
Neural networks for speaker verification

Patent number: 11107478

Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.

Type: Grant

Filed: January 24, 2020

Date of Patent: August 31, 2021

Assignee: Google LLC

Inventors: Georg Heigold, Samuel Bengio, Ignacio Lopez Moreno
ASYNCHRONOUS OPTIMIZATION FOR SEQUENCE TRAINING OF NEURAL NETWORKS

Publication number: 20210125601

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Type: Application

Filed: January 6, 2021

Publication date: April 29, 2021

Applicant: Google LLC

Inventors: Georg Heigold, Erik Mcdermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A.U. Bacchiani
Asynchronous optimization for sequence training of neural networks

Patent number: 10916238

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Type: Grant

Filed: April 30, 2020

Date of Patent: February 9, 2021

Assignee: Google LLC

Inventors: Georg Heigold, Erik Mcdermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani
ASYNCHRONOUS OPTIMIZATION FOR SEQUENCE TRAINING OF NEURAL NETWORKS

Publication number: 20200258500

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Type: Application

Filed: April 30, 2020

Publication date: August 13, 2020

Applicant: Google LLC

Inventors: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A.U. Bacchiani
Asynchronous optimization for sequence training of neural networks

Patent number: 10672384

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Type: Grant

Filed: September 17, 2019

Date of Patent: June 2, 2020

Assignee: Google LLC

Inventors: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani
Neural Networks For Speaker Verification

Publication number: 20200160869

Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.

Type: Application

Filed: January 24, 2020

Publication date: May 21, 2020

Applicant: Google LLC

Inventors: Georg Heigold, Samuel Bengio, Ignacio Lopez Moreno
ASYNCHRONOUS OPTIMIZATION FOR SEQUENCE TRAINING OF NEURAL NETWORKS

Publication number: 20200118549

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.

Type: Application

Filed: September 17, 2019

Publication date: April 16, 2020

Inventors: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A.U. Bacchiani

1 2 next