Patents by Inventor Ioannis Alexandros Assael
Ioannis Alexandros Assael has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230306258
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a video data generation neural network having a plurality of video generation network parameters. In one aspect, a method includes generating one or more sequences of training video frames using the video data generation neural network in accordance with current values of the video data generation network parameters; obtaining one or more sequences of target video frames; and training the video data generation neural network using training signals derived from a similarity between respective embeddings of the training and target video frames. The embeddings are generated by a video data embedding neural network.
Type: Application
Filed: September 8, 2021
Publication date: September 28, 2023
Inventors: Ioannis Alexandros Assael, Brendan Shillingford
-
Publication number: 20220382507
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating audio output samples predicted to be communicated by a user. One example system includes a first user device having a first user. The first user device initiates a communication session between the first user and a second user of a second user device. The first user device obtains a neural network model of the second user. The neural network model is trained to generate, conditioned on audio input samples received up to a current time step, an audio output sample predicted to be communicated by the second user at a next time step. The user device repeatedly provides received audio input samples as input to the neural network model and plays audio output samples generated by the neural network model in place of received audio input samples communicated by the second user.
Type: Application
Filed: August 12, 2022
Publication date: December 1, 2022
Inventors: Jakob Nicolaus Foerster, Ioannis Alexandros Assael
-
Patent number: 11416207
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating audio output samples predicted to be communicated by a user. One example system includes a first user device having a first user. The first user device initiates a communication session between the first user and a second user of a second user device. The first user device obtains a neural network model of the second user. The neural network model is trained to generate, conditioned on audio input samples received up to a current time step, an audio output sample predicted to be communicated by the second user at a next time step. The user device repeatedly provides received audio input samples as input to the neural network model and plays audio output samples generated by the neural network model in place of received audio input samples communicated by the second user.
Type: Grant
Filed: May 31, 2019
Date of Patent: August 16, 2022
Assignee: DeepMind Technologies Limited
Inventors: Jakob Nicolaus Foerster, Ioannis Alexandros Assael
-
Publication number: 20220223162
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for bandwidth extension. One of the methods includes obtaining a low-resolution version of an input, the low-resolution version of the input comprising a first number of samples at a first sample rate over a first time period; and generating, from the low-resolution version of the input, a high-resolution version of the input comprising a second, larger number of samples at a second, higher sample rate over the first time period. Generating the high-resolution version includes generating a representation of the low-resolution version of the input; processing the representation of the low-resolution version of the input through a conditioning neural network to generate a conditioning input; and processing the conditioning input using a generative neural network to generate the high-resolution version of the input.
Type: Application
Filed: April 30, 2020
Publication date: July 14, 2022
Inventors: Ioannis Alexandros Assael, Thomas Chadwick Walters, Archit Gupta, Brendan Shillingford
-
Patent number: 11386900
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.
Type: Grant
Filed: May 20, 2019
Date of Patent: July 12, 2022
Assignee: DeepMind Technologies Limited
Inventors: Brendan Shillingford, Ioannis Alexandros Assael, Joao Ferdinando Gomes de Freitas
-
Patent number: 11355097
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers.
Type: Grant
Filed: October 1, 2020
Date of Patent: June 7, 2022
Assignee: DeepMind Technologies Limited
Inventors: Yutian Chen, Scott Ellison Reed, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Heiga Zen, Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas
-
Patent number: 11250838
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a video speech recognition model having a plurality of model parameters on a set of unlabeled video-audio data and using a trained speech recognition model. During the training, the values of the parameters of the trained audio speech recognition model are generally fixed and only the values of the video speech recognition model are adjusted. Once trained, the video speech recognition model can be used to recognize speech from video when corresponding audio is not available.
Type: Grant
Filed: November 18, 2019
Date of Patent: February 15, 2022
Assignee: DeepMind Technologies Limited
Inventors: Brendan Shillingford, Ioannis Alexandros Assael, Joao Ferdinando Gomes de Freitas
-
Publication number: 20210110831
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.
Type: Application
Filed: May 20, 2019
Publication date: April 15, 2021
Inventors: Brendan Shillingford, Ioannis Alexandros Assael, Joao Ferdinando Gomes de Freitas
-
Publication number: 20210020160
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers.
Type: Application
Filed: October 1, 2020
Publication date: January 21, 2021
Inventors: Yutian Chen, Scott Ellison Reed, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Heiga Zen, Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas
-
Patent number: 10810993
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers.
Type: Grant
Filed: October 28, 2019
Date of Patent: October 20, 2020
Assignee: DeepMind Technologies Limited
Inventors: Yutian Chen, Scott Ellison Reed, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Heiga Zen, Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas
-
Publication number: 20200160843
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a video speech recognition model having a plurality of model parameters on a set of unlabeled video-audio data and using a trained speech recognition model. During the training, the values of the parameters of the trained audio speech recognition model are generally fixed and only the values of the video speech recognition model are adjusted. Once trained, the video speech recognition model can be used to recognize speech from video when corresponding audio is not available.
Type: Application
Filed: November 18, 2019
Publication date: May 21, 2020
Inventors: Brendan Shillingford, Ioannis Alexandros Assael, Joao Ferdinando Gomes de Freitas
-
Publication number: 20200135172
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers.
Type: Application
Filed: October 28, 2019
Publication date: April 30, 2020
Inventors: Yutian Chen, Scott Ellison Reed, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Heiga Zen, Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas
-
Publication number: 20190369946
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating audio output samples predicted to be communicated by a user. One example system includes a first user device having a first user. The first user device initiates a communication session between the first user and a second user of a second user device. The first user device obtains a neural network model of the second user. The neural network model is trained to generate, conditioned on audio input samples received up to a current time step, an audio output sample predicted to be communicated by the second user at a next time step. The user device repeatedly provides received audio input samples as input to the neural network model and plays audio output samples generated by the neural network model in place of received audio input samples communicated by the second user.
Type: Application
Filed: May 31, 2019
Publication date: December 5, 2019
Inventors: Jakob Nicolaus Foerster, Ioannis Alexandros Assael