Patents by Inventor Ioannis Alexandros Assael

Ioannis Alexandros Assael has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

TRAINING VIDEO DATA GENERATION NEURAL NETWORKS USING VIDEO FRAME EMBEDDINGS

Publication number: 20230306258

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a video data generation neural network having a plurality of video generation network parameters. In one aspect, a method includes generating one or more sequences of training video frames using the video data generation neural network in accordance with current values of the video data generation network parameters; obtaining one or more sequences of target video frames; and training the video data generation neural network using training signals derived from a similarity between respective embeddings of the training and target video frames. The embeddings are generated by a video data embedding neural network.
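Illustrative note: the abstract does not say how the similarity between embeddings is measured. The sketch below assumes cosine similarity over per-frame embedding vectors. The function names are hypothetical, and the embeddings are plain lists of floats standing in for the output of the video data embedding neural network.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_similarity_loss(generated_embeddings, target_embeddings):
    """Mean (1 - cosine similarity) over paired frame embeddings.

    Assumption: a lower loss means the generated frames are closer to the
    target frames in embedding space. The patent may use a different measure.
    """
    sims = [
        cosine_similarity(g, t)
        for g, t in zip(generated_embeddings, target_embeddings)
    ]
    return 1.0 - sum(sims) / len(sims)
```

In a real training loop this scalar would be the training signal used to update the video generation network parameters, while the embedding network only supplies the vectors.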

Type: Application

Filed: September 8, 2021

Publication date: September 28, 2023

Inventors: Ioannis Alexandros Assael, Brendan Shillingford

RESOLVING TIME-DELAYS USING GENERATIVE MODELS

Publication number: 20220382507

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating audio output samples predicted to be communicated by a user. One example system includes a first user device having a first user. The first user device initiates a communication session between the first user and a second user of a second user device. The first user device obtains a neural network model of the second user. The neural network model is trained to generate, conditioned on audio input samples received up to a current time step, an audio output sample predicted to be communicated by the second user at a next time step. The user device repeatedly provides received audio input samples as input to the neural network model and plays audio output samples generated by the neural network model in place of received audio input samples communicated by the second user.
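Illustrative note: the playback loop described in the abstract can be sketched as follows. The predictor here is a hypothetical stand-in (linear extrapolation from the last two samples), not the trained neural network model of the second user, and all names are assumptions made for the example.

```python
def predict_next_sample(history):
    """Hypothetical stand-in for the trained model.

    Extrapolates linearly from the last two received samples. The patent's
    model is a neural network conditioned on all samples received so far.
    """
    if not history:
        return 0.0
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]


def play_with_prediction(received_samples):
    """For each received sample, play the model's prediction of the next one."""
    history = []
    played = []
    for sample in received_samples:
        history.append(sample)  # condition on everything received so far
        played.append(predict_next_sample(history))  # played in place of the received sample
    return played
```

The point of the loop is that the listener hears a prediction of the next time step rather than the delayed sample itself.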

Type: Application

Filed: August 12, 2022

Publication date: December 1, 2022

Inventors: Jakob Nicolaus Foerster, Ioannis Alexandros Assael

Resolving time-delays using generative models

Patent number: 11416207

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating audio output samples predicted to be communicated by a user. One example system includes a first user device having a first user. The first user device initiates a communication session between the first user and a second user of a second user device. The first user device obtains a neural network model of the second user. The neural network model is trained to generate, conditioned on audio input samples received up to a current time step, an audio output sample predicted to be communicated by the second user at a next time step. The user device repeatedly provides received audio input samples as input to the neural network model and plays audio output samples generated by the neural network model in place of received audio input samples communicated by the second user.

Type: Grant

Filed: May 31, 2019

Date of Patent: August 16, 2022

Assignee: DeepMind Technologies Limited

Inventors: Jakob Nicolaus Foerster, Ioannis Alexandros Assael

BANDWIDTH EXTENSION OF INCOMING DATA USING NEURAL NETWORKS

Publication number: 20220223162

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for bandwidth extension. One of the methods includes obtaining a low-resolution version of an input, the low-resolution version of the input comprising a first number of samples at a first sample rate over a first time period; and generating, from the low-resolution version of the input, a high-resolution version of the input comprising a second, larger number of samples at a second, higher sample rate over the first time period. Generating the high-resolution version includes generating a representation of the low-resolution version of the input; processing the representation of the low-resolution version of the input through a conditioning neural network to generate a conditioning input; and processing the conditioning input using a generative neural network to generate the high-resolution version of the input.
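Illustrative note: the relationship between the two sample counts follows from the shared time period. The sketch below computes it, and uses plain linear interpolation as a stand-in for the conditioning and generative neural networks. That is a deliberate simplification and not the patented method, and the function names and the integer upsampling factor are assumptions.

```python
def high_res_sample_count(num_low_res_samples, low_rate_hz, high_rate_hz):
    """Number of samples covering the same time period at the higher rate."""
    duration_s = num_low_res_samples / low_rate_hz
    return round(duration_s * high_rate_hz)


def upsample_linear(low_res, factor):
    """Stand-in for the neural networks: linear interpolation by an integer factor.

    Produces len(low_res) * factor samples over the same time period.
    """
    n = len(low_res)
    out = []
    for i in range(n * factor):
        pos = i / factor              # position on the low-resolution grid
        left = int(pos)
        right = min(left + 1, n - 1)  # clamp at the final sample
        frac = pos - left
        out.append(low_res[left] * (1 - frac) + low_res[right] * frac)
    return out
```

For example, one second of audio at 8,000 Hz corresponds to 24,000 samples at 24,000 Hz. Interpolation only fills in values between existing samples, whereas a generative network would be expected to predict the missing high-frequency content.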

Type: Application

Filed: April 30, 2020

Publication date: July 14, 2022

Inventors: Ioannis Alexandros Assael, Thomas Chadwick Walters, Archit Gupta, Brendan Shillingford

Visual speech recognition by phoneme prediction

Patent number: 11386900

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.

Type: Grant

Filed: May 20, 2019

Date of Patent: July 12, 2022

Assignee: DeepMind Technologies Limited

Inventors: Brendan Shillingford, Ioannis Alexandros Assael, Joao Ferdinando Gomes de Freitas

Sample-efficient adaptive text-to-speech

Patent number: 11355097

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers.

Type: Grant

Filed: October 1, 2020

Date of Patent: June 7, 2022

Assignee: DeepMind Technologies Limited

Inventors: Yutian Chen, Scott Ellison Reed, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Heiga Zen, Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas

Cross-modal sequence distillation

Patent number: 11250838

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a video speech recognition model having a plurality of model parameters on a set of unlabeled video-audio data and using a trained speech recognition model. During the training, the values of the parameters of the trained audio speech recognition model are generally fixed and only the values of the parameters of the video speech recognition model are adjusted. Once trained, the video speech recognition model can be used to recognize speech from video when corresponding audio is not available.
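Illustrative note: the abstract does not name the training objective. A common choice for this kind of teacher-student setup, assumed here, is the KL divergence between the frozen audio model's output distribution and the video model's output distribution at each sequence position. The function names and the list-of-probabilities representation are hypothetical.

```python
import math


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for one output position."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(teacher_probs, student_probs)
    )


def distillation_loss(teacher_seq, student_seq):
    """Mean per-position KL divergence over an output sequence.

    Assumption: teacher_seq comes from the trained audio model, whose
    parameters stay fixed; only the video model producing student_seq
    would be updated to reduce this loss.
    """
    total = sum(kl_divergence(t, s) for t, s in zip(teacher_seq, student_seq))
    return total / len(teacher_seq)
```

Because the teacher's outputs serve as targets, no transcripts are needed, which matches the abstract's use of unlabeled video-audio data.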

Type: Grant

Filed: November 18, 2019

Date of Patent: February 15, 2022

Assignee: DeepMind Technologies Limited

Inventors: Brendan Shillingford, Ioannis Alexandros Assael, Joao Ferdinando Gomes de Freitas

VISUAL SPEECH RECOGNITION BY PHONEME PREDICTION

Publication number: 20210110831

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.

Type: Application

Filed: May 20, 2019

Publication date: April 15, 2021

Inventors: Brendan Shillingford, Ioannis Alexandros Assael, Joao Ferdinando Gomes de Freitas

SAMPLE-EFFICIENT ADAPTIVE TEXT-TO-SPEECH

Publication number: 20210020160

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers.

Type: Application

Filed: October 1, 2020

Publication date: January 21, 2021

Inventors: Yutian Chen, Scott Ellison Reed, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Heiga Zen, Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas

Sample-efficient adaptive text-to-speech

Patent number: 10810993

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers.

Type: Grant

Filed: October 28, 2019

Date of Patent: October 20, 2020

Assignee: DeepMind Technologies Limited

Inventors: Yutian Chen, Scott Ellison Reed, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Heiga Zen, Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas

CROSS-MODAL SEQUENCE DISTILLATION

Publication number: 20200160843

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a video speech recognition model having a plurality of model parameters on a set of unlabeled video-audio data and using a trained speech recognition model. During the training, the values of the parameters of the trained audio speech recognition model are generally fixed and only the values of the parameters of the video speech recognition model are adjusted. Once trained, the video speech recognition model can be used to recognize speech from video when corresponding audio is not available.

Type: Application

Filed: November 18, 2019

Publication date: May 21, 2020

Inventors: Brendan Shillingford, Ioannis Alexandros Assael, Joao Ferdinando Gomes de Freitas

SAMPLE-EFFICIENT ADAPTIVE TEXT-TO-SPEECH

Publication number: 20200135172

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers.

Type: Application

Filed: October 28, 2019

Publication date: April 30, 2020

Inventors: Yutian Chen, Scott Ellison Reed, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Heiga Zen, Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas

RESOLVING TIME-DELAYS USING GENERATIVE MODELS

Publication number: 20190369946

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating audio output samples predicted to be communicated by a user. One example system includes a first user device having a first user. The first user device initiates a communication session between the first user and a second user of a second user device. The first user device obtains a neural network model of the second user. The neural network model is trained to generate, conditioned on audio input samples received up to a current time step, an audio output sample predicted to be communicated by the second user at a next time step. The user device repeatedly provides received audio input samples as input to the neural network model and plays audio output samples generated by the neural network model in place of received audio input samples communicated by the second user.

Type: Application

Filed: May 31, 2019

Publication date: December 5, 2019

Inventors: Jakob Nicolaus Foerster, Ioannis Alexandros Assael