Patents by Inventor Joao CARREIRA

Joao CARREIRA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12175737
    Abstract: A system that is configured to receive a sequence of task inputs and to perform a machine learning task is described. The system includes a reinforcement learning (RL) neural network and a task neural network. The RL neural network is configured to: generate, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, and provide the respective decision of each task input to the task neural network.
    Type: Grant
    Filed: November 13, 2020
    Date of Patent: December 24, 2024
    Assignee: DeepMind Technologies Limited
    Inventors: Viorica Patraucean, Bilal Piot, Joao Carreira, Volodymyr Mnih, Simon Osindero
  • Publication number: 20240303897
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for animating images using point trajectories.
    Type: Application
    Filed: March 8, 2024
    Publication date: September 12, 2024
    Inventors: Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman
  • Patent number: 12067732
    Abstract: A computer-implemented neural network system for decomposing input video data. A video data input receives a sequence of video image frames. The sequence is encoded, using a 3D spatio-temporal encoder neural network, into a set of latent variables representing a compressed version of the sequence. A 3D spatio-temporal decoder neural network processes the set of latent variables to generate two or more sets of decomposed video data; these may be stored, communicated, and/or made available to a user interface. Input video including undesired features such as reflections, shadows, and occlusions may thus be decomposed into two or more video sequences, one in which the undesired features are suppressed, and another containing the undesired features.
    Type: Grant
    Filed: November 20, 2019
    Date of Patent: August 20, 2024
    Assignee: DeepMind Technologies Limited
    Inventors: Joao Carreira, Jean-Baptiste Alayrac, Andrew Zisserman
  • Publication number: 20240232580
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a network output using a neural network. In one aspect, a method comprises: obtaining: (i) a network input to a neural network, and (ii) a set of query embeddings; processing the network input using the neural network to generate a network output that comprises a respective dimension corresponding to each query embedding in the set of query embeddings, comprising: processing the network input using an encoder block of the neural network to generate a representation of the network input as a set of latent embeddings; and processing: (i) the set of latent embeddings, and (ii) the set of query embeddings, using a cross-attention block that generates each dimension of the network output by cross-attention of a corresponding query embedding over the set of latent embeddings.
    Type: Application
    Filed: May 27, 2022
    Publication date: July 11, 2024
    Inventors: Andrew Coulter Jaegle, Jean-Baptiste Alayrac, Sebastian Borgeaud Dit Avocat, Catalin-Dumitru Ionescu, Carl Doersch, Fengning Ding, Oriol Vinyals, Olivier Jean Hénaff, Skanda Kumar Koppula, Daniel Zoran, Andrew Brock, Evan Gerard Shelhamer, Andrew Zisserman, Joao Carreira
  • Patent number: 11967150
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
    Type: Grant
    Filed: February 13, 2023
    Date of Patent: April 23, 2024
    Assignee: DeepMind Technologies Limited
    Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
  • Publication number: 20240104355
    Abstract: This specification describes a method for using a neural network to generate a network output that characterizes an entity. The method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output characterizing the entity. The neural network includes: (i) one or more cross-attention blocks, (ii) one or more self-attention blocks, and (iii) an output block. Each cross-attention block updates each latent embedding using attention over some or all of the data element embeddings. Each self-attention block updates each latent embedding using attention over the set of latent embeddings. The output block processes one or more latent embeddings to generate the network output that characterizes the entity.
    Type: Application
    Filed: February 3, 2022
    Publication date: March 28, 2024
    Inventors: Andrew Coulter Jaegle, Joao Carreira
  • Publication number: 20240029436
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in a key video frame of the video clip; and for each candidate agent bounding box: processing the feature representation through an action transformer neural network.
    Type: Application
    Filed: October 2, 2023
    Publication date: January 25, 2024
    Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
  • Patent number: 11776269
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in a key video frame of the video clip; and for each candidate agent bounding box: processing the feature representation through an action transformer neural network.
    Type: Grant
    Filed: November 20, 2019
    Date of Patent: October 3, 2023
    Assignee: DeepMind Technologies Limited
    Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
  • Publication number: 20230244907
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a sequence of data elements that includes a respective data element at each position in a sequence of positions. In one aspect, a method includes: for each position after a first position in the sequence of positions: obtaining a current sequence of data element embeddings that includes a respective data element embedding of each data element at a position that precedes the current position, obtaining a sequence of latent embeddings, and processing: (i) the current sequence of data element embeddings, and (ii) the sequence of latent embeddings, using a neural network to generate the data element at the current position. The neural network includes a sequence of neural network blocks including: (i) a cross-attention block, (ii) one or more self-attention blocks, and (iii) an output block.
    Type: Application
    Filed: January 30, 2023
    Publication date: August 3, 2023
    Inventors: Curtis Glenn-Macway Hawthorne, Andrew Coulter Jaegle, Catalina-Codruta Cangea, Sebastian Borgeaud Dit Avocat, Charlie Thomas Curtis Nash, Mateusz Malinowski, Sander Etienne Lea Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Stuart Simon, Hannah Rachel Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, Joao Carreira, Jesse Engel
  • Publication number: 20230186625
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
    Type: Application
    Filed: February 13, 2023
    Publication date: June 15, 2023
    Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
  • Publication number: 20230145129
    Abstract: This specification describes a method for using a neural network to generate a network output that characterizes an entity. The method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output characterizing the entity. The neural network includes: (i) one or more cross-attention blocks, (ii) one or more self-attention blocks, and (iii) an output block. Each cross-attention block updates each latent embedding using attention over some or all of the data element embeddings. Each self-attention block updates each latent embedding using attention over the set of latent embeddings. The output block processes one or more latent embeddings to generate the network output that characterizes the entity.
    Type: Application
    Filed: January 11, 2023
    Publication date: May 11, 2023
    Inventors: Andrew Coulter Jaegle, Joao Carreira
  • Patent number: 11580736
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
    Type: Grant
    Filed: January 7, 2019
    Date of Patent: February 14, 2023
    Assignee: DeepMind Technologies Limited
    Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
  • Publication number: 20220398437
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for executing depth-parallel training of a neural network. One of the methods includes receiving an input sequence; and at each processing time step in a sequence of processing time steps: processing an input item using a first layer block in a stack of layer blocks to generate a first block output; for each subsequent layer block, processing a block output generated by the preceding layer block at the preceding processing time step to generate a current block output; computing i) a current error in an output item generated by the final layer block and ii) a current gradient of the current error; generating a parameter update for the final layer block; for each particular layer block that is not the final layer block, computing a current gradient for the particular layer block and generating a parameter update.
    Type: Application
    Filed: November 13, 2020
    Publication date: December 15, 2022
    Inventors: Mateusz Malinowski, Viorica Patraucean, Grzegorz Michal Swirszcz, Joao Carreira
  • Publication number: 20220392206
    Abstract: A system that is configured to receive a sequence of task inputs and to perform a machine learning task is described. The system includes a reinforcement learning (RL) neural network and a task neural network. The RL neural network is configured to: generate, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, and provide the respective decision of each task input to the task neural network.
    Type: Application
    Filed: November 13, 2020
    Publication date: December 8, 2022
    Inventors: Viorica Patraucean, Bilal Piot, Joao Carreira, Volodymyr Mnih, Simon Osindero
  • Patent number: 11361546
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
    Type: Grant
    Filed: August 27, 2020
    Date of Patent: June 14, 2022
    Assignee: DeepMind Technologies Limited
    Inventors: Joao Carreira, Andrew Zisserman
  • Publication number: 20220019807
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in a key video frame of the video clip; and for each candidate agent bounding box: processing the feature representation through an action transformer neural network.
    Type: Application
    Filed: November 20, 2019
    Publication date: January 20, 2022
    Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
  • Publication number: 20220012898
    Abstract: A computer-implemented neural network system for decomposing input video data. A video data input receives a sequence of video image frames. The sequence is encoded, using a 3D spatio-temporal encoder neural network, into a set of latent variables representing a compressed version of the sequence. A 3D spatio-temporal decoder neural network processes the set of latent variables to generate two or more sets of decomposed video data; these may be stored, communicated, and/or made available to a user interface. Input video including undesired features such as reflections, shadows, and occlusions may thus be decomposed into two or more video sequences, one in which the undesired features are suppressed, and another containing the undesired features.
    Type: Application
    Filed: November 20, 2019
    Publication date: January 13, 2022
    Inventors: Joao Carreira, Jean-Baptiste Alayrac, Andrew Zisserman
  • Publication number: 20210027064
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
    Type: Application
    Filed: January 7, 2019
    Publication date: January 28, 2021
    Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
  • Publication number: 20200394412
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
    Type: Application
    Filed: August 27, 2020
    Publication date: December 17, 2020
    Inventors: Joao Carreira, Andrew Zisserman
  • Patent number: 10789479
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
    Type: Grant
    Filed: November 12, 2019
    Date of Patent: September 29, 2020
    Assignee: DeepMind Technologies Limited
    Inventors: Joao Carreira, Andrew Zisserman
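Several of the applications listed above (20240104355, 20230145129 and 20240232580) describe a network in which a set of latent embeddings cross-attends over data element embeddings, self-attends within itself, and is then read out by query embeddings, one output dimension per query. The following is a minimal pure-Python sketch of that data flow only. It assumes single-head dot-product attention with no learned projections, normalization, or feed-forward layers, none of which the abstracts specify; the function names are hypothetical and this is not the claimed implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention; keys also serve as values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the key/value vectors.
        outputs.append([sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)])
    return outputs


def residual(xs: List[Vector], ys: List[Vector]) -> List[Vector]:
    # Element-wise sum, used as a residual connection around each block.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def perceiver_forward(data: List[Vector], latents: List[Vector],
                      queries: List[Vector], num_self_attn: int = 2) -> List[Vector]:
    # Cross-attention block: each latent attends over all data element embeddings.
    latents = residual(latents, attend(latents, data))
    # Self-attention blocks: each latent attends over the set of latents.
    for _ in range(num_self_attn):
        latents = residual(latents, attend(latents, latents))
    # Output block: each query embedding cross-attends over the latents,
    # yielding one output vector per query.
    return attend(queries, latents)


if __name__ == "__main__":
    data = [[float(i % 3), float(i % 5), 1.0, 0.0] for i in range(1000)]
    latents = [[0.1 * j, 0.0, 0.0, 0.1] for j in range(8)]
    queries = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    out = perceiver_forward(data, latents, queries)
    print(len(out), len(out[0]))  # 3 4
```

In this sketch only the first cross-attention touches the 1,000 inputs, so its cost grows with (number of latents) × (number of inputs), while the self-attention blocks cost (number of latents)² regardless of input size.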
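Patent 11580736 and its related filings (20210027064, 20230186625 and 11967150) describe layer blocks that, at each time step, process an input generated at a previous time step, so the blocks of a video network can work on different frames at once. A minimal sketch of that pipelined schedule, assuming scalar "frames", arbitrary callables as layer blocks, and every block active at every time step once it has input (a simplification; the abstract allows each component to be active for only a subset of time steps). `pipelined_forward` is a hypothetical name.

```python
from typing import Callable, List, Optional

Block = Callable[[float], float]


def pipelined_forward(frames: List[float], blocks: List[Block]) -> List[Optional[float]]:
    """Run a stack of layer blocks in a depth-parallel (pipelined) fashion.

    At each time step t, block 0 consumes frame t, and every later block k
    consumes the output that block k-1 produced at step t-1, so all blocks
    could run concurrently within a step. Outputs lag the input by
    len(blocks) - 1 steps; None marks steps before the pipeline is full.
    """
    # state[k] holds block k's output from the previous time step.
    state: List[Optional[float]] = [None] * len(blocks)
    outputs: List[Optional[float]] = []
    for frame in frames:
        new_state = list(state)
        for k, block in enumerate(blocks):
            # Block 0 reads the new frame; others read last step's outputs.
            inp = frame if k == 0 else state[k - 1]
            new_state[k] = block(inp) if inp is not None else None
        state = new_state
        outputs.append(state[-1])
    return outputs
```

With three blocks, the result for each frame appears two steps later: `pipelined_forward([1.0, 2.0, 3.0, 4.0], [lambda x: x + 1, lambda x: x * 10, lambda x: x - 3])` returns `[None, None, 17.0, 27.0]`, matching the sequential results (1 + 1) * 10 - 3 = 17 and (2 + 1) * 10 - 3 = 27. The trade-off is latency for parallelism: within a step, no block waits on another.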