Patents by Inventor Joao CARREIRA
Joao CARREIRA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12175737
Abstract: A system that is configured to receive a sequence of task inputs and to perform a machine learning task is described. The system includes a reinforcement learning (RL) neural network and a task neural network. The RL neural network is configured to: generate, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, and provide the respective decision of each task input to the task neural network.
Type: Grant
Filed: November 13, 2020
Date of Patent: December 24, 2024
Assignee: DeepMind Technologies Limited
Inventors: Viorica Patraucean, Bilal Piot, Joao Carreira, Volodymyr Mnih, Simon Osindero
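As a rough illustration of the encode-or-skip interface in this abstract, the Python sketch below swaps both networks for toy stand-ins. A logistic "policy" with parameters `w` and `b` plays the role of the RL network, and the "task network" simply holds the last input it encoded. The reward (task error plus a per-encode cost) and every function name here are hypothetical assumptions for illustration, not the patented method.

```python
import math
import random
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode_probability(x: float, last_encoded: float, w: float, b: float) -> float:
    """Toy policy (hypothetical): the more x differs from the last encoded
    input, the more likely it is to be encoded. w and b stand in for the
    learned parameters of the RL network."""
    return _sigmoid(w * abs(x - last_encoded) + b)


def run_episode(
    inputs: Sequence[float],
    w: float,
    b: float,
    rng: random.Random,
    cost_per_encode: float = 0.1,
) -> Tuple[List[bool], List[float], float]:
    decisions: List[bool] = []
    outputs: List[float] = []
    last_encoded = inputs[0]
    for t, x in enumerate(inputs):
        # Sample an encode/skip decision; the first input is always encoded.
        encode = t == 0 or rng.random() < encode_probability(x, last_encoded, w, b)
        decisions.append(encode)
        if encode:
            last_encoded = x  # toy "task network": its state is the last encoded input
        outputs.append(last_encoded)  # task output at this step
    # Hypothetical reward: trade task error against compute spent on encoded inputs.
    error = sum((o - x) ** 2 for o, x in zip(outputs, inputs)) / len(inputs)
    reward = -error - cost_per_encode * sum(decisions)
    return decisions, outputs, reward
```

Under these assumptions, a policy-gradient method such as REINFORCE could adjust `w` and `b` to maximize the returned reward; that training loop is not shown.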
-
Publication number: 20240303897
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for animating images using point trajectories.
Type: Application
Filed: March 8, 2024
Publication date: September 12, 2024
Inventors: Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman
-
Patent number: 12067732
Abstract: A computer-implemented neural network system for decomposing input video data. A video data input receives a sequence of video image frames. The sequence is encoded, using a 3D spatio-temporal encoder neural network, into a set of latent variables representing a compressed version of the sequence. A 3D spatio-temporal decoder neural network processes the set of latent variables to generate two or more sets of decomposed video data; these may be stored, communicated, and/or made available to a user interface. Input video including undesired features such as reflections, shadows, and occlusions may thus be decomposed into two or more video sequences, one in which the undesired features are suppressed, and another containing the undesired features.
Type: Grant
Filed: November 20, 2019
Date of Patent: August 20, 2024
Assignee: DeepMind Technologies Limited
Inventors: Joao Carreira, Jean-Baptiste Alayrac, Andrew Zisserman
-
Publication number: 20240232580
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a network output using a neural network. In one aspect, a method comprises: obtaining: (i) a network input to a neural network, and (ii) a set of query embeddings; processing the network input using the neural network to generate a network output that comprises a respective dimension corresponding to each query embedding in the set of query embeddings, comprising: processing the network input using an encoder block of the neural network to generate a representation of the network input as a set of latent embeddings; and processing: (i) the set of latent embeddings, and (ii) the set of query embeddings, using a cross-attention block that generates each dimension of the network output by cross-attention of a corresponding query embedding over the set of latent embeddings.
Type: Application
Filed: May 27, 2022
Publication date: July 11, 2024
Inventors: Andrew Coulter Jaegle, Jean-Baptiste Alayrac, Sebastian Borgeaud Dit Avocat, Catalin-Dumitru Ionescu, Carl Doersch, Fengning Ding, Oriol Vinyals, Olivier Jean Hénaff, Skanda Kumar Koppula, Daniel Zoran, Andrew Brock, Evan Gerard Shelhamer, Andrew Zisserman, Joao Carreira
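The query-based decoding step described above can be sketched in Python as follows. This is a minimal, assumption-laden illustration: one attention head, no learned projections, and latents taken as already produced by the encoder block. The names `cross_attend` and `decode` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, latents: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention of one query over all latents.
    Learned query/key/value projections are omitted for brevity."""
    scale = 1.0 / math.sqrt(len(query))
    weights = _softmax([scale * sum(q * z for q, z in zip(query, lat)) for lat in latents])
    return [sum(w * lat[i] for w, lat in zip(weights, latents)) for i in range(len(latents[0]))]


def decode(latents: List[Vector], queries: List[Vector]) -> List[Vector]:
    # One output entry per query embedding, independent of the input size.
    return [cross_attend(q, latents) for q in queries]
```

Because each output entry comes from exactly one query, the shape of the output in this sketch is set by the number of query embeddings rather than by the size of the network input.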
-
Patent number: 11967150
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
Type: Grant
Filed: February 13, 2023
Date of Patent: April 23, 2024
Assignee: DeepMind Technologies Limited
Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
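The schedule in this abstract, where each layer block consumes an input generated at the previous time step, can be pictured as a pipeline. In this hypothetical Python sketch, scalars stand in for frames and feature tensors, and plain functions stand in for layer blocks. The inner loop runs sequentially here, but no block depends on another block's output from the same step, which is what would allow the blocks to run in parallel.

```python
from typing import Callable, List, Optional, Sequence

Block = Callable[[float], float]


def pipelined_forward(frames: Sequence[float], blocks: Sequence[Block]) -> List[Optional[float]]:
    """Run every layer block once per time step. Block 0 reads the current
    frame; block k reads what block k-1 produced at the previous step."""
    prev: List[Optional[float]] = [None] * len(blocks)
    final_outputs: List[Optional[float]] = []
    for frame in frames:
        current: List[Optional[float]] = []
        for k, block in enumerate(blocks):
            inp = frame if k == 0 else prev[k - 1]
            # A block is inactive (None) until data has propagated to it.
            current.append(None if inp is None else block(inp))
        prev = current
        final_outputs.append(current[-1])
    return final_outputs
```

For example, `pipelined_forward([1.0, 2.0, 3.0], [lambda x: x + 1, lambda x: x * 10])` returns `[None, 20.0, 30.0]`: with two blocks the first frame's result appears one step late, and in general the final output at step t corresponds to frame t - (number of blocks - 1).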
-
Publication number: 20240104355
Abstract: This specification describes a method for using a neural network to generate a network output that characterizes an entity. The method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output characterizing the entity. The neural network includes: (i) one or more cross-attention blocks, (ii) one or more self-attention blocks, and (iii) an output block. Each cross-attention block updates each latent embedding using attention over some or all of the data element embeddings. Each self-attention block updates each latent embedding using attention over the set of latent embeddings. The output block processes one or more latent embeddings to generate the network output that characterizes the entity.
Type: Application
Filed: February 3, 2022
Publication date: March 28, 2024
Inventors: Andrew Coulter Jaegle, Joao Carreira
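A minimal Python sketch of the block structure in this abstract follows. It assumes single-head attention without learned projections, residual updates, data element embeddings with the same dimension as the latents, and mean pooling as the output block; these choices and the function names are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, items: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention without learned projections."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * v for q, v in zip(query, item)) for item in items]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * item[i] for e, item in zip(exps, items)) / total for i in range(len(items[0]))]


def cross_attention_block(latents: List[Vector], data: List[Vector]) -> List[Vector]:
    # Each latent is updated (residually) by attending over the data element embeddings.
    return [[z + u for z, u in zip(lat, attend(lat, data))] for lat in latents]


def self_attention_block(latents: List[Vector]) -> List[Vector]:
    # Each latent is updated (residually) by attending over the set of latents.
    return [[z + u for z, u in zip(lat, attend(lat, latents))] for lat in latents]


def output_block(latents: List[Vector]) -> Vector:
    # Mean-pool the latents into one vector characterizing the entity.
    return [sum(lat[i] for lat in latents) / len(latents) for i in range(len(latents[0]))]


def forward(data: List[Vector], latents: List[Vector], num_self_attention: int = 2) -> Vector:
    latents = cross_attention_block(latents, data)
    for _ in range(num_self_attention):
        latents = self_attention_block(latents)
    return output_block(latents)
```

In this sketch the cross-attention step compares every latent with every data element, so its cost grows with the number of latents times the number of data elements rather than with the square of the input size; the self-attention steps touch only the latents.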
-
Publication number: 20240029436
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in the key video frame; and, for each candidate agent bounding box, processing the feature representation through an action transformer neural network.
Type: Application
Filed: October 2, 2023
Publication date: January 25, 2024
Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
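The per-box classification loop can be sketched as follows. This hypothetical Python example assumes the feature representation is a grid of feature vectors for the key frame. It pools the features inside each candidate box into a query, lets that query attend once over all features as scene context, and scores action classes with a linear layer. It is a simplified stand-in for an action transformer, not the patented architecture.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Feature = Tuple[Tuple[int, int], Vector]  # ((row, col), feature vector) on the key frame
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


def roi_pool(features: Sequence[Feature], box: Box) -> Vector:
    # Average the feature vectors whose grid cell lies inside the box.
    r0, c0, r1, c1 = box
    inside = [v for (r, c), v in features if r0 <= r < r1 and c0 <= c < c1]
    if not inside:
        raise ValueError("box contains no feature cells")
    return [sum(v[i] for v in inside) / len(inside) for i in range(len(inside[0]))]


def attend(query: Vector, values: List[Vector]) -> Vector:
    # Single-head scaled dot-product attention without learned projections.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * v for q, v in zip(query, val)) for val in values]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * val[i] for e, val in zip(exps, values)) / total for i in range(len(values[0]))]


def classify_boxes(
    features: Sequence[Feature], boxes: Sequence[Box], class_weights: List[Vector]
) -> List[int]:
    context = [v for _, v in features]
    labels: List[int] = []
    for box in boxes:
        query = roi_pool(features, box)  # the candidate agent's own features
        # Residual update of the query with attended scene context.
        updated = [q + a for q, a in zip(query, attend(query, context))]
        logits = [sum(w * u for w, u in zip(row, updated)) for row in class_weights]
        labels.append(max(range(len(logits)), key=logits.__getitem__))
    return labels
```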
-
Patent number: 11776269
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in the key video frame; and, for each candidate agent bounding box, processing the feature representation through an action transformer neural network.
Type: Grant
Filed: November 20, 2019
Date of Patent: October 3, 2023
Assignee: Deep Mind Technologies Limited
Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
-
Publication number: 20230244907
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a sequence of data elements that includes a respective data element at each position in a sequence of positions. In one aspect, a method includes: for each position after a first position in the sequence of positions: obtaining a current sequence of data element embeddings that includes a respective data element embedding of each data element at a position that precedes the current position, obtaining a sequence of latent embeddings, and processing: (i) the current sequence of data element embeddings, and (ii) the sequence of latent embeddings, using a neural network to generate the data element at the current position. The neural network includes a sequence of neural network blocks including: (i) a cross-attention block, (ii) one or more self-attention blocks, and (iii) an output block.
Type: Application
Filed: January 30, 2023
Publication date: August 3, 2023
Inventors: Curtis Glenn-Macway Hawthorne, Andrew Coulter Jaegle, Catalina-Codruta Cangea, Sebastian Borgeaud Dit Avocat, Charlie Thomas Curtis Nash, Mateusz Malinowski, Sander Etienne Lea Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Stuart Simon, Hannah Rachel Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, Joao Carreira, Jesse Engel
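The generation loop in this abstract can be illustrated with a small greedy decoder. In this hypothetical Python sketch, tokens map to fixed embedding vectors, a fixed set of latents cross-attends over the embeddings of all preceding elements, self-attention blocks update the latents, and the output block scores each vocabulary item against the last latent. The embeddings, latents, and greedy choice are assumptions for illustration only.

```python
import math
from typing import Dict, List

Vector = List[float]


def attend(query: Vector, items: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention without learned projections."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * v for q, v in zip(query, item)) for item in items]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * item[i] for e, item in zip(exps, items)) / total for i in range(len(items[0]))]


def generate(
    prompt: List[int],
    num_new: int,
    embeddings: Dict[int, Vector],
    latents: List[Vector],
    num_self_attention: int = 1,
) -> List[int]:
    seq = list(prompt)
    vocab = sorted(embeddings)
    for _ in range(num_new):
        # Only elements at preceding positions are visible.
        context = [embeddings[tok] for tok in seq]
        # Cross-attention block: latents attend over the preceding elements.
        lat = [[a + b for a, b in zip(z, attend(z, context))] for z in latents]
        # Self-attention blocks: latents attend over each other.
        for _layer in range(num_self_attention):
            lat = [[a + b for a, b in zip(z, attend(z, lat))] for z in lat]
        # Output block: score each vocabulary item against the last latent, pick greedily.
        logits = [sum(a * b for a, b in zip(lat[-1], embeddings[tok])) for tok in vocab]
        seq.append(vocab[max(range(len(vocab)), key=logits.__getitem__)])
    return seq
```

Causality comes from the loop itself in this sketch: at each position the network only ever sees embeddings of elements that have already been generated.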
-
Publication number: 20230186625
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
Type: Application
Filed: February 13, 2023
Publication date: June 15, 2023
Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
-
Publication number: 20230145129
Abstract: This specification describes a method for using a neural network to generate a network output that characterizes an entity. The method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output characterizing the entity. The neural network includes: (i) one or more cross-attention blocks, (ii) one or more self-attention blocks, and (iii) an output block. Each cross-attention block updates each latent embedding using attention over some or all of the data element embeddings. Each self-attention block updates each latent embedding using attention over the set of latent embeddings. The output block processes one or more latent embeddings to generate the network output that characterizes the entity.
Type: Application
Filed: January 11, 2023
Publication date: May 11, 2023
Inventors: Andrew Coulter Jaegle, Joao Carreira
-
Patent number: 11580736
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
Type: Grant
Filed: January 7, 2019
Date of Patent: February 14, 2023
Assignee: DeepMind Technologies Limited
Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
-
Publication number: 20220398437
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for executing depth-parallel training of a neural network. One of the methods includes receiving an input sequence; and at each processing time step in a sequence of processing time steps: processing an input item using a first layer block in a stack of layer blocks to generate a first block output; for each subsequent layer block, processing a block output generated by the preceding layer block at the preceding processing time step to generate a current block output; computing i) a current error in an output item generated by the final layer block and ii) a current gradient of the current error; generating a parameter update for the final layer block; for each particular layer block that is not the final layer block, computing a current gradient for the particular layer block and generating a parameter update.
Type: Application
Filed: November 13, 2020
Publication date: December 15, 2022
Inventors: Mateusz Malinowski, Viorica Patraucean, Grzegorz Michal Swirszcz, Joao Carreira
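The Python sketch below illustrates one possible reading of this depth-parallel schedule: each block does its forward work on data its predecessor produced one step earlier, and gradient signals also move back one block per step. It assumes scalar linear blocks (`y = w * x`), a squared-error loss, and per-block buffers that keep each input until its delayed gradient arrives; all of these are illustrative assumptions. Because weights change while gradients are in flight, earlier blocks are updated from slightly stale signals, and the sketch makes no attempt to correct for this.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Tagged = Optional[Tuple[int, float]]  # (item index, value), or None if idle


def depth_parallel_train(
    xs: Sequence[float], ys: Sequence[float], weights: List[float], lr: float
) -> Tuple[List[float], List[float]]:
    n = len(weights)
    w = list(weights)
    prev_out: List[Tagged] = [None] * n   # block outputs from the previous step
    prev_grad: List[Tagged] = [None] * n  # dLoss/d(output of block k), sent last step
    seen: List[Dict[int, float]] = [{} for _ in range(n)]  # inputs awaiting gradients
    losses: List[float] = []
    for t, x in enumerate(xs):
        out: List[Tagged] = [None] * n
        grad: List[Tagged] = [None] * n
        updates = [0.0] * n
        # Forward: block 0 takes the new item; block k takes block k-1's output from step t-1.
        for k in range(n):
            inp = (t, x) if k == 0 else prev_out[k - 1]
            if inp is not None:
                item, value = inp
                seen[k][item] = value
                out[k] = (item, w[k] * value)
        # Final block: error on the item it just produced, and its own gradient.
        if out[-1] is not None:
            item, y_hat = out[-1]
            err = y_hat - ys[item]
            losses.append(0.5 * err * err)
            updates[-1] = err * seen[-1].pop(item)
            if n > 1:
                grad[n - 2] = (item, err * w[-1])
        # Earlier blocks: use the gradient signal block k+1 sent at step t-1.
        for k in range(n - 1):
            if prev_grad[k] is not None:
                item, g = prev_grad[k]
                updates[k] = g * seen[k].pop(item)
                if k > 0:
                    grad[k - 1] = (item, g * w[k])
        # Apply all parameter updates together at the end of the step.
        w = [wk - lr * uk for wk, uk in zip(w, updates)]
        prev_out, prev_grad = out, grad
    return w, losses
```

For instance, with two blocks initialized to 1.0, inputs of 1.0, targets of 2.0, and a learning rate of 0.1, three steps leave the weights at roughly 1.1 and 1.19.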
-
Publication number: 20220392206
Abstract: A system that is configured to receive a sequence of task inputs and to perform a machine learning task is described. The system includes a reinforcement learning (RL) neural network and a task neural network. The RL neural network is configured to: generate, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, and provide the respective decision of each task input to the task neural network.
Type: Application
Filed: November 13, 2020
Publication date: December 8, 2022
Inventors: Viorica Patraucean, Bilal Piot, Joao Carreira, Volodymyr Mnih, Simon Osindero
-
Patent number: 11361546
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
Type: Grant
Filed: August 27, 2020
Date of Patent: June 14, 2022
Assignee: DeepMind Technologies Limited
Inventors: Joao Carreira, Andrew Zisserman
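To make the two-stream idea concrete, here is a small pure-Python sketch: a naive "valid" 3D convolution over (time, height, width), global average pooling into one score per class, and late fusion that averages the image-stream and optical-flow-stream scores. Single-channel inputs, one kernel per class, and averaging as the combination rule are simplifying assumptions; the abstract says only that the two outputs are combined.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [time][height][width]


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    # Slide the kernel over time, height, and width without padding.
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def global_mean(volume: Volume) -> float:
    values = [v for plane in volume for row in plane for v in row]
    return sum(values) / len(values)


def stream_scores(volume: Volume, class_kernels: List[Volume]) -> List[float]:
    # One 3D kernel per class, pooled to a single score per class.
    return [global_mean(conv3d_valid(volume, k)) for k in class_kernels]


def two_stream_scores(
    rgb: Volume, flow: Volume, rgb_kernels: List[Volume], flow_kernels: List[Volume]
) -> List[float]:
    # Late fusion: average the per-class scores of the two streams.
    rgb_s = stream_scores(rgb, rgb_kernels)
    flow_s = stream_scores(flow, flow_kernels)
    return [(a + b) / 2 for a, b in zip(rgb_s, flow_s)]
```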
-
Publication number: 20220019807
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in the key video frame; and, for each candidate agent bounding box, processing the feature representation through an action transformer neural network.
Type: Application
Filed: November 20, 2019
Publication date: January 20, 2022
Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
-
Publication number: 20220012898
Abstract: A computer-implemented neural network system for decomposing input video data. A video data input receives a sequence of video image frames. The sequence is encoded, using a 3D spatio-temporal encoder neural network, into a set of latent variables representing a compressed version of the sequence. A 3D spatio-temporal decoder neural network processes the set of latent variables to generate two or more sets of decomposed video data; these may be stored, communicated, and/or made available to a user interface. Input video including undesired features such as reflections, shadows, and occlusions may thus be decomposed into two or more video sequences, one in which the undesired features are suppressed, and another containing the undesired features.
Type: Application
Filed: November 20, 2019
Publication date: January 13, 2022
Inventors: Joao Carreira, Jean-Baptiste Alayrac, Andrew Zisserman
-
Publication number: 20210027064
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
Type: Application
Filed: January 7, 2019
Publication date: January 28, 2021
Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
-
Publication number: 20200394412
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
Type: Application
Filed: August 27, 2020
Publication date: December 17, 2020
Inventors: Joao Carreira, Andrew Zisserman
-
Patent number: 10789479
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
Type: Grant
Filed: November 12, 2019
Date of Patent: September 29, 2020
Assignee: DeepMind Technologies Limited
Inventors: Joao Carreira, Andrew Zisserman