Patents by Inventor Joao Carreira

Joao Carreira has filed patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11967150
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
    Type: Grant
    Filed: February 13, 2023
    Date of Patent: April 23, 2024
    Assignee: DeepMind Technologies Limited
    Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
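The abstract above describes a pipelined design: at each time step, layer block i consumes the output that block i-1 produced at the previous time step, so all blocks in the stack can run in parallel across depth. A minimal numpy sketch of that data flow (my own illustration, not the patented implementation; the "blocks" are just tanh-linear stand-ins for neural network layers):

```python
import numpy as np

rng = np.random.default_rng(0)
num_blocks, dim, num_steps = 3, 4, 6

# Each block is a fixed random linear map plus tanh, standing in for
# "one or more neural network layers".
weights = [rng.standard_normal((dim, dim)) * 0.5 for _ in range(num_blocks)]

def block(i, x):
    return np.tanh(x @ weights[i])

frames = [rng.standard_normal(dim) for _ in range(num_steps)]

# state[i] holds the output block i produced at the PREVIOUS time step.
state = [np.zeros(dim) for _ in range(num_blocks)]
outputs = []
for t in range(num_steps):
    # All blocks read only previous-step state, so this inner loop could
    # run fully in parallel across blocks.
    new_state = []
    for i in range(num_blocks):
        inp = frames[t] if i == 0 else state[i - 1]
        new_state.append(block(i, inp))
    state = new_state
    outputs.append(state[-1])

print(len(outputs))
```

One visible consequence of the pipelining is fill latency: the frame seen at t=0 only reaches the last block's output at t = num_blocks - 1, so the first outputs are all zeros here.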
  • Publication number: 20240104355
    Abstract: This specification describes a method for using a neural network to generate a network output that characterizes an entity. The method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output characterizing the entity. The neural network includes: (i) one or more cross-attention blocks, (ii) one or more self-attention blocks, and (iii) an output block. Each cross-attention block updates each latent embedding using attention over some or all of the data element embeddings. Each self-attention block updates each latent embedding using attention over the set of latent embeddings. The output block processes one or more latent embeddings to generate the network output that characterizes the entity.
    Type: Application
    Filed: February 3, 2022
    Publication date: March 28, 2024
    Inventors: Andrew Coulter Jaegle, Joao Carreira
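The three block types named in the abstract can be sketched in a few lines of numpy. This is an illustrative toy, not the patented implementation: real blocks would add learned projections, residual connections, and MLPs, and the shapes here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # embedding width
data = rng.standard_normal((100, d))    # data-element embeddings (e.g. pixels)
latents = rng.standard_normal((4, d))   # much smaller set of latent embeddings

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys_values):
    # Plain dot-product attention over keys_values.
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores) @ keys_values

# (i) cross-attention block: each latent attends over all data elements
latents = attend(latents, data)
# (ii) self-attention block: each latent attends over the set of latents
latents = attend(latents, latents)
# (iii) output block: pool the latents and apply a linear readout
w_out = rng.standard_normal((d, 3))
output = latents.mean(axis=0) @ w_out   # e.g. 3-way classification logits
print(output.shape)
```

Because cross-attention queries come from the small latent set, its cost scales with (number of data elements) × (number of latents) rather than quadratically in the input size, which is the point of this arrangement.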
  • Publication number: 20240029436
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in a key video frame of the video clip; and, for each candidate agent bounding box, processing the feature representation through an action transformer neural network.
    Type: Application
    Filed: October 2, 2023
    Publication date: January 25, 2024
    Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
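A hedged sketch of the per-box step described above: a query derived from each candidate box attends over the whole clip's feature map, yielding one action representation per box. The names, shapes, and single-head attention here are my own simplifications, not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
# Flattened spatio-temporal feature map of the video clip.
clip_features = rng.standard_normal((50, d))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_box(box_query):
    # The box query attends over every clip feature location.
    scores = clip_features @ box_query / np.sqrt(d)
    return softmax(scores) @ clip_features

# One query vector per candidate agent bounding box (illustrative).
boxes = rng.standard_normal((3, d))
per_box = np.stack([attend_box(q) for q in boxes])
print(per_box.shape)  # one action feature per candidate box
```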
  • Patent number: 11776269
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in a key video frame of the video clip; and, for each candidate agent bounding box, processing the feature representation through an action transformer neural network.
    Type: Grant
    Filed: November 20, 2019
    Date of Patent: October 3, 2023
    Assignee: DeepMind Technologies Limited
    Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
  • Publication number: 20230244907
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a sequence of data elements that includes a respective data element at each position in a sequence of positions. In one aspect, a method includes: for each position after a first position in the sequence of positions: obtaining a current sequence of data element embeddings that includes a respective data element embedding of each data element at a position that precedes the current position, obtaining a sequence of latent embeddings, and processing: (i) the current sequence of data element embeddings, and (ii) the sequence of latent embeddings, using a neural network to generate the data element at the current position. The neural network includes a sequence of neural network blocks including: (i) a cross-attention block, (ii) one or more self-attention blocks, and (iii) an output block.
    Type: Application
    Filed: January 30, 2023
    Publication date: August 3, 2023
    Inventors: Curtis Glenn-Macway Hawthorne, Andrew Coulter Jaegle, Catalina-Codruta Cangea, Sebastian Borgeaud Dit Avocat, Charlie Thomas Curtis Nash, Mateusz Malinowski, Sander Etienne Lea Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Stuart Simon, Hannah Rachel Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, Joao Carreira, Jesse Engel
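The generation loop in this abstract can be sketched greedily in numpy: at each position, the latents cross-attend over the embeddings of all preceding elements, self-attend, and the output block emits the next element. Everything below (vocabulary size, greedy decoding, mean-pooled readout) is an illustrative assumption, not the patented method.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, num_latents = 8, 5, 3
embed = rng.standard_normal((vocab, d))          # data-element embedding table
latent_init = rng.standard_normal((num_latents, d))
w_out = rng.standard_normal((d, vocab))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, kv):
    return softmax(q @ kv.T / np.sqrt(d)) @ kv

seq = [0]  # start element
for _ in range(6):
    elems = embed[np.array(seq)]       # embeddings of preceding elements
    lat = attend(latent_init, elems)   # (i) cross-attention block
    lat = attend(lat, lat)             # (ii) self-attention block
    logits = lat.mean(axis=0) @ w_out  # (iii) output block
    seq.append(int(logits.argmax()))   # greedy choice of the next element
print(seq)
```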
  • Publication number: 20230186625
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
    Type: Application
    Filed: February 13, 2023
    Publication date: June 15, 2023
    Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
  • Publication number: 20230145129
    Abstract: This specification describes a method for using a neural network to generate a network output that characterizes an entity. The method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output characterizing the entity. The neural network includes: (i) one or more cross-attention blocks, (ii) one or more self-attention blocks, and (iii) an output block. Each cross-attention block updates each latent embedding using attention over some or all of the data element embeddings. Each self-attention block updates each latent embedding using attention over the set of latent embeddings. The output block processes one or more latent embeddings to generate the network output that characterizes the entity.
    Type: Application
    Filed: January 11, 2023
    Publication date: May 11, 2023
    Inventors: Andrew Coulter Jaegle, Joao Carreira
  • Patent number: 11580736
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
    Type: Grant
    Filed: January 7, 2019
    Date of Patent: February 14, 2023
    Assignee: DeepMind Technologies Limited
    Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
  • Publication number: 20220398437
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for executing depth-parallel training of a neural network. One of the methods includes receiving an input sequence; and at each processing time step in a sequence of processing time steps: processing an input item using a first layer block in a stack of layer blocks to generate a first block output; for each subsequent layer block, processing a block output generated by the preceding layer block at the preceding processing time step to generate a current block output; computing i) a current error in an output item generated by the final layer block and ii) a current gradient of the current error; generating a parameter update for the final layer block; for each particular layer block that is not the final layer block, computing a current gradient for the particular layer block and generating a parameter update.
    Type: Application
    Filed: November 13, 2020
    Publication date: December 15, 2022
    Inventors: Mateusz Malinowski, Viorica Patraucean, Grzegorz Michal Swirszcz, Joao Carreira
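A heavily simplified numpy illustration of the depth-parallel training loop: forward passes are pipelined (the second block consumes the first block's output from the previous step), and both blocks are updated at every step from the current output error, even though the first block's stored input is one step stale. Linear blocks, a fixed input/target pair, and plain SGD are my own simplifications, not the patented procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W1 = rng.standard_normal((d, d)) * 0.1   # first layer block
W2 = rng.standard_normal((d, d)) * 0.1   # final layer block
lr = 0.05

x0 = rng.standard_normal(d)   # fixed input item, repeated each step
target = np.ones(d)

h_prev = np.zeros(d)          # block 1's output from the previous step
errors = []
for _ in range(500):
    h = x0 @ W1               # block 1: forward on the current input
    y = h_prev @ W2           # block 2: forward on block 1's PREVIOUS output
    err = y - target
    errors.append(float(err @ err))
    # current gradient and parameter update for the final block
    gW2 = np.outer(h_prev, err)
    # block 1's update uses the error propagated back through W2,
    # paired with the (stale) input that produced h_prev
    gW1 = np.outer(x0, err @ W2.T)
    W2 -= lr * gW2
    W1 -= lr * gW1
    h_prev = h

print(round(errors[0], 3), round(errors[-1], 3))
```

Despite the one-step-stale pairing, the loss still decreases on this toy problem, which is the property the pipelined scheme relies on.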
  • Publication number: 20220392206
    Abstract: A system that is configured to receive a sequence of task inputs and to perform a machine learning task is described. The system includes a reinforcement learning (RL) neural network and a task neural network. The RL neural network is configured to: generate, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, and provide the respective decision of each task input to the task neural network.
    Type: Application
    Filed: November 13, 2020
    Publication date: December 8, 2022
    Inventors: Viorica Patraucean, Bilal Piot, Joao Carreira, Volodymyr Mnih, Simon Osindero
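The control loop this abstract describes, decoupled from any learning, is simple: a decision network looks at each task input and emits ENCODE or SKIP, and the task network spends encoder compute only on the encoded inputs. Below is a toy sketch with a hand-rolled logistic "policy" in place of the RL neural network; all names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
policy_w = rng.standard_normal(d)   # stand-in for the RL network

def decide(x):
    # True -> encode this task input, False -> skip it
    return 1.0 / (1.0 + np.exp(-(x @ policy_w))) > 0.5

def encode(x):
    return np.tanh(x)   # stand-in for the task network's encoder

state = np.zeros(d)
encoded_count = 0
inputs = [rng.standard_normal(d) for _ in range(20)]
for x in inputs:
    if decide(x):
        # only encoded inputs update the task network's state
        state = 0.9 * state + 0.1 * encode(x)
        encoded_count += 1
    # skipped inputs cost no encoder compute

print(encoded_count, "of", len(inputs), "inputs encoded")
```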
  • Patent number: 11361546
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
    Type: Grant
    Filed: August 27, 2020
    Date of Patent: June 14, 2022
    Assignee: DeepMind Technologies Limited
    Inventors: Joao Carreira, Andrew Zisserman
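The two-stream design in this abstract can be sketched end to end: one 3D spatio-temporal network processes RGB frames, a second processes optical-flow frames, and their outputs are combined (here by averaging logits). Each "network" below is a single naive 3D convolution plus pooling, purely to keep the sketch self-contained; it is not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 8, 16, 16
rgb = rng.standard_normal((T, H, W))    # single-channel frames for brevity
flow = rng.standard_normal((T, H, W))   # e.g. one flow channel per frame pair

def conv3d_valid(x, k):
    # Naive "valid" 3D convolution over (time, height, width).
    t, h, w = k.shape
    out = np.zeros((x.shape[0]-t+1, x.shape[1]-h+1, x.shape[2]-w+1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for l in range(out.shape[2]):
                out[i, j, l] = np.sum(x[i:i+t, j:j+h, l:l+w] * k)
    return out

def stream(x, kernel, readout):
    feat = np.maximum(conv3d_valid(x, kernel), 0)   # 3D conv + ReLU
    pooled = feat.mean()                            # global space-time pooling
    return pooled * readout                         # per-class logits

k_rgb, k_flow = rng.standard_normal((3, 3, 3)), rng.standard_normal((3, 3, 3))
r_rgb, r_flow = rng.standard_normal(5), rng.standard_normal(5)

# Combine the two convolutional network outputs into the system output.
logits = 0.5 * (stream(rgb, k_rgb, r_rgb) + stream(flow, k_flow, r_flow))
print(logits.shape)   # logits over 5 hypothetical action classes
```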
  • Publication number: 20220019807
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in a key video frame of the video clip; and, for each candidate agent bounding box, processing the feature representation through an action transformer neural network.
    Type: Application
    Filed: November 20, 2019
    Publication date: January 20, 2022
    Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
  • Publication number: 20220012898
    Abstract: A computer-implemented neural network system for decomposing input video data. A video data input receives a sequence of video image frames. The sequence is encoded, using a 3D spatio-temporal encoder neural network, into a set of latent variables representing a compressed version of the sequence. A 3D spatio-temporal decoder neural network processes the set of latent variables to generate two or more sets of decomposed video data; these may be stored, communicated, and/or made available to a user interface. Input video including undesired features such as reflections, shadows, and occlusions may thus be decomposed into two or more video sequences, one in which the undesired features are suppressed, and another containing the undesired features.
    Type: Application
    Filed: November 20, 2019
    Publication date: January 13, 2022
    Inventors: Joao Carreira, Jean-Baptiste Alayrac, Andrew Zisserman
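The data flow in this abstract, stripped to its essentials, is an encoder that compresses a frame sequence into latent variables and a decoder with two heads producing the decomposed sequences (for example, a reflection-suppressed layer and the reflections themselves). The sketch below uses random linear maps as stand-ins for the 3D spatio-temporal networks; it only illustrates the shapes and flow, not the patented system.

```python
import numpy as np

rng = np.random.default_rng(0)
T, P, Z = 6, 32, 8          # frames, pixels per frame, latent size
video = rng.standard_normal((T, P))

enc = rng.standard_normal((T * P, Z)) * 0.1       # 3D encoder stand-in
dec_clean = rng.standard_normal((Z, T * P)) * 0.1      # decoder head 1
dec_residual = rng.standard_normal((Z, T * P)) * 0.1   # decoder head 2

# Encode the whole sequence into a compressed set of latent variables.
latent = video.reshape(-1) @ enc
# Decode two decomposed video sequences from the same latents.
clean = (latent @ dec_clean).reshape(T, P)        # e.g. undesired features suppressed
residual = (latent @ dec_residual).reshape(T, P)  # e.g. reflections / shadows
print(latent.shape, clean.shape, residual.shape)
```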
  • Publication number: 20210027064
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
    Type: Application
    Filed: January 7, 2019
    Publication date: January 28, 2021
    Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
  • Publication number: 20200394412
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
    Type: Application
    Filed: August 27, 2020
    Publication date: December 17, 2020
    Inventors: Joao Carreira, Andrew Zisserman
  • Patent number: 10789479
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
    Type: Grant
    Filed: November 12, 2019
    Date of Patent: September 29, 2020
    Assignee: DeepMind Technologies Limited
    Inventors: Joao Carreira, Andrew Zisserman
  • Publication number: 20200125852
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
    Type: Application
    Filed: November 12, 2019
    Publication date: April 23, 2020
    Inventors: Joao Carreira, Andrew Zisserman
  • Publication number: 20150104102
    Abstract: Feature extraction, coding, and pooling are important components of many contemporary object recognition paradigms. This method explores pooling techniques that encode the second-order statistics of local descriptors inside a region. To that end, it introduces multiplicative second-order analogues of average and max pooling which, together with appropriate non-linearities, lead to exceptional performance on free-form region recognition, without any type of feature coding. Instead of coding, it was found that enriching local descriptors with additional image information leads to large performance gains, especially in conjunction with the proposed pooling methodology. Second-order pooling over free-form regions thus produces results superior to those of the winning systems in the Pascal VOC 2011 semantic segmentation challenge, with models that are 20,000 times faster.
    Type: Application
    Filed: October 11, 2013
    Publication date: April 16, 2015
    Applicant: Universidade de Coimbra
    Inventors: Joao Carreira, Rui Caseiro, Jorge Batista, Cristian Sminchisescu
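The "multiplicative second-order analogues" of average and max pooling can be written down directly: instead of pooling the descriptors themselves, pool their outer products, which captures second-order statistics of the region. A short numpy sketch (my own illustration of the idea; the descriptor sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Local descriptors extracted inside one free-form region.
descriptors = rng.standard_normal((50, 8))

# Second-order AVERAGE pooling: mean of x x^T over the region.
avg2 = np.einsum('nd,ne->de', descriptors, descriptors) / len(descriptors)

# Second-order MAX pooling: element-wise max over the outer products.
max2 = np.einsum('nd,ne->nde', descriptors, descriptors).max(axis=0)

print(avg2.shape, max2.shape)  # both are 8x8 symmetric matrices
```

The pooled matrices are symmetric, so in practice only the upper triangle need be kept as the region's feature; the non-linearities mentioned in the abstract would then be applied to this pooled matrix.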