Patents by Inventor Andrew Zisserman

Andrew Zisserman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Parallel video processing systems

Patent number: 11967150

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.

Type: Grant

Filed: February 13, 2023

Date of Patent: April 23, 2024

Assignee: DeepMind Technologies Limited

Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
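
The abstract of this entry describes layer blocks that each consume an input generated at a previous time step. The following is a minimal sketch of that data flow, not the patented implementation. It assumes that every block is active at every time step, that the first block reads the current frame, and that frames are plain numbers; `run_pipelined` and the toy blocks are hypothetical names used only for illustration.

```python
from typing import Callable, List, Optional

Block = Callable[[float], float]


def run_pipelined(frames: List[float], blocks: List[Block]) -> List[Optional[float]]:
    """Run blocks so that block i reads what block i-1 produced one step earlier.

    Because no block waits on a same-step result from its predecessor,
    all blocks could in principle run in parallel within a time step.
    """
    # prev_outputs[i] holds block i's output from the previous time step.
    prev_outputs: List[Optional[float]] = [None] * len(blocks)
    results: List[Optional[float]] = []
    for frame in frames:
        new_outputs: List[Optional[float]] = []
        for i, block in enumerate(blocks):
            # Assumption: block 0 reads the current frame directly.
            source = frame if i == 0 else prev_outputs[i - 1]
            # A block with no input yet (pipeline still filling) emits nothing.
            new_outputs.append(None if source is None else block(source))
        prev_outputs = new_outputs
        results.append(new_outputs[-1])
    return results


outputs = run_pipelined([1, 2, 3, 4], [lambda x: x + 1, lambda x: x * 2])
print(outputs)
```

With two blocks, the final output lags the input by one time step: the value emitted at step t is derived from the frame at step t-1.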
ACTION CLASSIFICATION IN VIDEO CLIPS USING ATTENTION-BASED NEURAL NETWORKS

Publication number: 20240029436

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in the key video frame; and for each candidate agent bounding box: processing the feature representation through an action transformer neural network.

Type: Application

Filed: October 2, 2023

Publication date: January 25, 2024

Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
Action classification in video clips using attention-based neural networks

Patent number: 11776269

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in the key video frame; and for each candidate agent bounding box: processing the feature representation through an action transformer neural network.

Type: Grant

Filed: November 20, 2019

Date of Patent: October 3, 2023

Assignee: DeepMind Technologies Limited

Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
CLASS AGNOSTIC REPETITION COUNTING IN VIDEO(S) UTILIZING A TEMPORAL SELF-SIMILARITY MATRIX

Publication number: 20230274548

Abstract: Techniques are disclosed that enable processing a video capturing a periodic activity using a repetition network to generate periodic output (e.g., a period length of the periodic activity captured in the video and/or a frame wise periodicity indication of the video capturing the periodic activity). Various implementations include a class agnostic repetition network which can be used to generate periodic output for a wide variety of periodic activities. Additional or alternative implementations include generating synthetic repetition videos which can be utilized to train the repetition network.

Type: Application

Filed: June 10, 2020

Publication date: August 31, 2023

Inventors: Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Andrew Zisserman, Pierre Sermanet
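
The title of this entry refers to a temporal self-similarity matrix. As a rough illustration only, the sketch below builds such a matrix from per-frame embeddings and reads a period off it with a simple heuristic. The use of negative squared Euclidean distance as the similarity, and the `naive_period` heuristic, are assumptions made for this example; the abstract describes a trained repetition network, which this code does not implement.

```python
from typing import List, Sequence


def self_similarity(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return an N x N matrix whose (i, j) entry is the negative squared
    Euclidean distance between the embeddings of frames i and j."""
    return [
        [-sum((a - b) ** 2 for a, b in zip(ei, ej)) for ej in embeddings]
        for ei in embeddings
    ]


def naive_period(tsm: List[List[float]]) -> int:
    """Hypothetical heuristic: pick the lag whose diagonal of the matrix
    has the highest mean similarity (lags from 1 up to half the length)."""
    n = len(tsm)
    best_lag, best_score = 1, float("-inf")
    for lag in range(1, n // 2 + 1):
        score = sum(tsm[i][i + lag] for i in range(n - lag)) / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy 1-D embeddings that repeat every three frames.
frames = [[0.0], [1.0], [2.0]] * 3
tsm = self_similarity(frames)
print(naive_period(tsm))
```

Because the matrix depends only on how frames relate to each other, not on what they depict, it is one way to keep the downstream prediction independent of the activity class.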
Spatial transformer modules

Patent number: 11734572

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using an image processing neural network system that includes a spatial transformer module. One of the methods includes receiving an input feature map derived from the one or more input images, and applying a spatial transformation to the input feature map to generate a transformed feature map, comprising: processing the input feature map to generate spatial transformation parameters for the spatial transformation, and sampling from the input feature map in accordance with the spatial transformation parameters to generate the transformed feature map.

Type: Grant

Filed: August 17, 2020

Date of Patent: August 22, 2023

Assignee: DeepMind Technologies Limited

Inventors: Maxwell Elliot Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
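
The abstract of this entry describes two steps: generating spatial transformation parameters, then sampling from the input feature map according to them. The sketch below illustrates only the sampling step, under several assumptions: the transformation is a 2x3 affine matrix `theta` supplied by the caller (standing in for the output of a parameter-generating network), coordinates are in pixels rather than normalized, sampling is bilinear, and locations outside the map read as zero. The function names are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def bilinear(fmap: Grid, x: float, y: float) -> float:
    """Bilinearly interpolate fmap at pixel coordinates (x, y).
    Neighbours that fall outside the map contribute zero."""
    h, w = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(x - xx)) * (1 - abs(y - yy))
                total += weight * fmap[yy][xx]
    return total


def affine_sample(fmap: Grid, theta: List[List[float]]) -> Grid:
    """For each output pixel (x, y), map it through theta to a source
    location in the input and sample the input there."""
    h, w = len(fmap), len(fmap[0])
    out: Grid = []
    for y in range(h):
        row = []
        for x in range(w):
            src_x = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            src_y = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append(bilinear(fmap, src_x, src_y))
        out.append(row)
    return out


feature_map = [[1.0, 2.0], [3.0, 4.0]]
half_pixel_shift = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]
print(affine_sample(feature_map, half_pixel_shift))
```

Interpolated sampling is a common choice for this step because the output varies smoothly with the transformation parameters, which matters if those parameters are to be learned by gradient descent.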
PARALLEL VIDEO PROCESSING SYSTEMS

Publication number: 20230186625

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.

Type: Application

Filed: February 13, 2023

Publication date: June 15, 2023

Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
Parallel video processing neural networks

Patent number: 11580736

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.

Type: Grant

Filed: January 7, 2019

Date of Patent: February 14, 2023

Assignee: DeepMind Technologies Limited

Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
Sampling latent variables to generate multiple segmentations of an image

Patent number: 11430123

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a plurality of possible segmentations of an image. In one aspect, a method comprises: receiving a request to generate a plurality of possible segmentations of an image; sampling a plurality of latent variables from a latent space, wherein each latent variable is sampled from the latent space in accordance with a respective probability distribution over the latent space that is determined based on the image; generating a plurality of possible segmentations of the image, comprising, for each latent variable, processing the image and the latent variable using a segmentation neural network having a plurality of segmentation neural network parameters to generate the possible segmentation of the image; and providing the plurality of possible segmentations of the image in response to the request.

Type: Grant

Filed: May 22, 2020

Date of Patent: August 30, 2022

Assignee: DeepMind Technologies Limited

Inventors: Simon Kohl, Bernardino Romera-Paredes, Danilo Jimenez Rezende, Seyed Mohammadali Eslami, Pushmeet Kohli, Andrew Zisserman, Olaf Ronneberger
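
The abstract of this entry describes sampling several latent variables from an image-dependent distribution and running the segmentation network once per sample. The sketch below shows only that control flow. Both networks are replaced by hypothetical stand-ins: `prior_params` returns the mean and a fixed standard deviation of a 1-D Gaussian computed from pixel intensities, and `segment` simply thresholds the image at the sampled value. Neither reflects the actual models.

```python
import random
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[int]]


def prior_params(image: Image) -> Tuple[float, float]:
    """Hypothetical stand-in for a network that maps an image to the
    (mean, std) of a 1-D Gaussian over the latent space."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels), 0.1


def segment(image: Image, z: float) -> Mask:
    """Hypothetical stand-in for the segmentation network: label a pixel
    as foreground when its intensity exceeds the latent value z."""
    return [[1 if p > z else 0 for p in row] for row in image]


def sample_segmentations(image: Image, n: int, seed: int = 0) -> List[Mask]:
    """Draw n latents from the image-conditioned distribution and produce
    one candidate segmentation per latent."""
    rng = random.Random(seed)
    mean, std = prior_params(image)
    return [segment(image, rng.gauss(mean, std)) for _ in range(n)]


image = [[0.1, 0.9], [0.4, 0.6]]
for mask in sample_segmentations(image, n=3):
    print(mask)
```

Different latent samples can yield different masks for the same image, which is how a single input gives rise to a plurality of possible segmentations.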
Action recognition in videos using 3D spatio-temporal convolutional neural networks

Patent number: 11361546

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.

Type: Grant

Filed: August 27, 2020

Date of Patent: June 14, 2022

Assignee: DeepMind Technologies Limited

Inventors: Joao Carreira, Andrew Zisserman
ACTION CLASSIFICATION IN VIDEO CLIPS USING ATTENTION-BASED NEURAL NETWORKS

Publication number: 20220019807

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in the key video frame; and for each candidate agent bounding box: processing the feature representation through an action transformer neural network.

Type: Application

Filed: November 20, 2019

Publication date: January 20, 2022

Inventors: Joao Carreira, Carl Doersch, Andrew Zisserman
NEURAL NETWORK SYSTEMS FOR DECOMPOSING VIDEO DATA INTO LAYERED REPRESENTATIONS

Publication number: 20220012898

Abstract: A computer-implemented neural network system for decomposing input video data. A video data input receives a sequence of video image frames. The sequence is encoded, using a 3D spatio-temporal encoder neural network, into a set of latent variables representing a compressed version of the sequence. A 3D spatio-temporal decoder neural network processes the set of latent variables to generate two or more sets of decomposed video data; these may be stored, communicated, and/or made available to a user interface. Input video including undesired features such as reflections, shadows, and occlusions may thus be decomposed into two or more video sequences, one in which the undesired features are suppressed, and another containing the undesired features.

Type: Application

Filed: November 20, 2019

Publication date: January 13, 2022

Inventors: Joao Carreira, Jean-Baptiste Alayrac, Andrew Zisserman
ALIGNING SEQUENCES BY GENERATING ENCODED REPRESENTATIONS OF DATA ITEMS

Publication number: 20220004883

Abstract: An encoder neural network is described which can encode a data item, such as a frame of a video, to form a respective encoded data item. Data items of a first data sequence are associated with respective data items of a second sequence, by determining which of the encoded data items of the second sequence is closest to the encoded data item produced from each data item of the first sequence. Thus, the two data sequences are aligned. The encoder neural network is trained automatically using a training set of data sequences, by an iterative process of successively increasing cycle consistency between pairs of the data sequences.

Type: Application

Filed: November 21, 2019

Publication date: January 6, 2022

Inventors: Yusuf Aytar, Debidatta Dwibedi, Andrew Zisserman, Jonathan Tompson, Pierre Sermanet
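
The abstract of this entry describes aligning two sequences by matching each encoded item of the first sequence to the closest encoded item of the second. The sketch below illustrates that matching on already-encoded items, plus a simple measurement of cycle consistency (whether matching back from the second sequence returns to the starting item). Squared Euclidean distance and all function names are assumptions for illustration; the encoder and its training procedure are not implemented here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(query: Vector, candidates: Sequence[Vector]) -> int:
    """Index of the candidate closest to query (squared Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(query, candidates[j])),
    )


def align(first: Sequence[Vector], second: Sequence[Vector]) -> List[int]:
    """For each encoded item of the first sequence, the index of its
    nearest encoded item in the second sequence."""
    return [nearest(item, second) for item in first]


def cycle_consistent_fraction(first: Sequence[Vector], second: Sequence[Vector]) -> float:
    """Fraction of items in `first` that map to `second` and back to themselves."""
    forward = align(first, second)
    hits = sum(1 for i, j in enumerate(forward) if nearest(second[j], first) == i)
    return hits / len(first)


# Toy 1-D encodings of two sequences showing the same progression.
seq_a = [[0.0], [1.0], [2.0]]
seq_b = [[0.1], [0.9], [2.2], [5.0]]
print(align(seq_a, seq_b))
print(cycle_consistent_fraction(seq_a, seq_b))
```

A measurement like `cycle_consistent_fraction` is only a diagnostic in this sketch; the abstract describes training the encoder so that such consistency increases.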
CROSS-TRANSFORMER NEURAL NETWORK SYSTEM FOR FEW-SHOT SIMILARITY DETERMINATION AND CLASSIFICATION

Publication number: 20210383226

Abstract: There is described a neural network system for determining a similarity measure between a query data item and a set of support data items. The neural network system is implemented by one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising receiving the query data item and obtaining a support set of one or more support data items comprising a support key embedding and a support value embedding for each respective support data item in the support set. The operations further comprise generating a query key embedding for the query data item using a key embedding neural network subsystem configured to process a data item to generate a key embedding.

Type: Application

Filed: June 4, 2021

Publication date: December 9, 2021

Inventors: Carl Doersch, Ankush Gupta, Andrew Zisserman
SPATIAL TRANSFORMER MODULES

Publication number: 20210034909

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using an image processing neural network system that includes a spatial transformer module. One of the methods includes receiving an input feature map derived from the one or more input images, and applying a spatial transformation to the input feature map to generate a transformed feature map, comprising: processing the input feature map to generate spatial transformation parameters for the spatial transformation, and sampling from the input feature map in accordance with the spatial transformation parameters to generate the transformed feature map.

Type: Application

Filed: August 17, 2020

Publication date: February 4, 2021

Inventors: Maxwell Elliot Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
PARALLEL VIDEO PROCESSING NEURAL NETWORKS

Publication number: 20210027064

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.

Type: Application

Filed: January 7, 2019

Publication date: January 28, 2021

Inventors: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
ACTION RECOGNITION IN VIDEOS USING 3D SPATIO-TEMPORAL CONVOLUTIONAL NEURAL NETWORKS

Publication number: 20200394412

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.

Type: Application

Filed: August 27, 2020

Publication date: December 17, 2020

Inventors: Joao Carreira, Andrew Zisserman
SAMPLING LATENT VARIABLES TO GENERATE MULTIPLE SEGMENTATIONS OF AN IMAGE

Publication number: 20200372654

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a plurality of possible segmentations of an image. In one aspect, a method comprises: receiving a request to generate a plurality of possible segmentations of an image; sampling a plurality of latent variables from a latent space, wherein each latent variable is sampled from the latent space in accordance with a respective probability distribution over the latent space that is determined based on the image; generating a plurality of possible segmentations of the image, comprising, for each latent variable, processing the image and the latent variable using a segmentation neural network having a plurality of segmentation neural network parameters to generate the possible segmentation of the image; and providing the plurality of possible segmentations of the image in response to the request.

Type: Application

Filed: May 22, 2020

Publication date: November 26, 2020

Inventors: Simon Kohl, Bernardino Romera-Paredes, Danilo Jimenez Rezende, Seyed Mohammadali Eslami, Pushmeet Kohli, Andrew Zisserman, Olaf Ronneberger
Action recognition in videos using 3D spatio-temporal convolutional neural networks

Patent number: 10789479

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.

Type: Grant

Filed: November 12, 2019

Date of Patent: September 29, 2020

Assignee: DeepMind Technologies Limited

Inventors: Joao Carreira, Andrew Zisserman
Spatial transformer modules

Patent number: 10748029

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using an image processing neural network system that includes a spatial transformer module. One of the methods includes receiving an input feature map derived from the one or more input images, and applying a spatial transformation to the input feature map to generate a transformed feature map, comprising: processing the input feature map to generate spatial transformation parameters for the spatial transformation, and sampling from the input feature map in accordance with the spatial transformation parameters to generate the transformed feature map.

Type: Grant

Filed: July 20, 2018

Date of Patent: August 18, 2020

Assignee: DeepMind Technologies Limited

Inventors: Maxwell Elliot Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
ACTION RECOGNITION IN VIDEOS USING 3D SPATIO-TEMPORAL CONVOLUTIONAL NEURAL NETWORKS

Publication number: 20200125852

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.

Type: Application

Filed: November 12, 2019

Publication date: April 23, 2020

Inventors: Joao Carreira, Andrew Zisserman
