Patents by Inventor Thomas Kipf

Thomas Kipf has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Object-centric learning with slot attention

Patent number: 12639556

Abstract: A method involves receiving a perceptual representation including a plurality of feature vectors, and initializing a plurality of slot vectors represented by a neural network memory unit. Each respective slot vector is configured to represent a corresponding entity in the perceptual representation. The method also involves determining an attention matrix based on a product of the plurality of feature vectors transformed by a key function and the plurality of slot vectors transformed by a query function. Each respective value of a plurality of values along each respective dimension of the attention matrix is normalized with respect to the plurality of values. The method additionally involves determining an update matrix based on the plurality of feature vectors transformed by a value function and the attention matrix, and updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit.

Type: Grant

Filed: July 13, 2020

Date of Patent: May 26, 2026

Assignee: Google LLC

Inventors: Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner, Aravindh Mahendran, Francesco Locatello, Thomas Kipf, Georg Heigold, Alexey Dosovitskiy

Geometry-Free Neural Scene Representations Through Novel-View Synthesis

Publication number: 20260141621

Abstract: Provided are machine learning models that generate geometry-free neural scene representations through efficient object-centric novel-view synthesis. In particular, one example aspect of the present disclosure provides a novel framework in which an encoder model (e.g., an encoder transformer network) processes one or more RGB images (with or without pose) to produce a fully latent scene representation that can be passed to a decoder model (e.g., a decoder transformer network). Given one or more target poses, the decoder model can synthesize images in a single forward pass. In some example implementations, because transformers are used rather than convolutional or MLP networks, the encoder can learn an attention model that extracts enough 3D information about a scene from a small set of images to render novel views with correct projections, parallax, occlusions, and even semantics, without explicit geometry.

Type: Application

Filed: January 15, 2026

Publication date: May 21, 2026

Inventors: Seyed Mohammad Mehdi Sajjadi, Klaus Greff, Daniel Christopher Duckworth, Mario Lucic, Simon Jacob van Steenkiste, Aravindh Mahendran, Filip Pavetic, Leonidas John Guibas, Thomas Kipf

Translation and Scaling Equivariant Slot Attention

Publication number: 20260141705

Abstract: A method includes receiving feature vectors and, for each respective feature vector, a corresponding absolute positional encoding. The method also includes determining latent representations of entities represented by the feature vectors, and determining, for each respective latent representation, a corresponding relative positional encoding based on the corresponding absolute positional encoding of each feature vector and a corresponding position vector associated with the respective latent representation. The method additionally includes determining an attention matrix based on the feature vectors, the entity-centric latent representations, and the corresponding relative positional encoding of each latent representation.
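The relative positional encoding in this abstract can be illustrated with a small sketch. It assumes 2D positions, and it assumes a per-entity scale vector (suggested by the title, not stated in the abstract); the function name `relative_encoding` and its parameters are hypothetical and are not taken from the application.

```python
from typing import List, Sequence


def relative_encoding(
    abs_positions: Sequence[Sequence[float]],
    slot_position: Sequence[float],
    slot_scale: Sequence[float] = (1.0, 1.0),
) -> List[List[float]]:
    """Express each absolute position in one latent representation's own frame.

    Assumed form: (absolute position - entity position) / entity scale.
    """
    return [
        [(p - c) / s for p, c, s in zip(pos, slot_position, slot_scale)]
        for pos in abs_positions
    ]


# Two feature positions and one entity position (all values are made up).
grid = [[0.0, 0.0], [0.5, 0.25]]
rel = relative_encoding(grid, slot_position=(0.25, 0.25), slot_scale=(0.5, 0.5))

# Shifting the features and the entity by the same offset leaves the
# relative encoding unchanged, which is the translation property at stake.
shifted_grid = [[x + 1.0, y + 2.0] for x, y in grid]
rel_shifted = relative_encoding(shifted_grid, (1.25, 2.25), (0.5, 0.5))
print(rel == rel_shifted)
```

Under these assumptions, an attention matrix computed from the relative encoding rather than the absolute one would not change when an entity and its features move together.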

Type: Application

Filed: November 15, 2022

Publication date: May 21, 2026

Inventors: Aravindh Mahendran, Ondrej Biza, Thomas Kipf, Simon Jacob van Steenkiste, Gamaleldin Elsayed, Seyed Mohammad Mehdi Sajjadi

Conditional object-centric learning with slot attention for video and other sequential data

Patent number: 12417623

Abstract: A method includes obtaining first feature vectors and second feature vectors representing contents of a first and second image frame, respectively, of an input video. The method may also include generating, based on the first feature vectors, first slot vectors, where each slot vector represents attributes of a corresponding entity as represented in the first image frame, and generating, based on the first slot vectors, predicted slot vectors including a corresponding predicted slot vector that represents a transition of the attributes of the corresponding entity from the first to the second image frame. The method may additionally include generating, based on the predicted slot vectors and the second feature vectors, second slot vectors including a corresponding slot vector that represents the attributes of the corresponding entity as represented in the second image frame, and determining an output based on the predicted slot vectors or the second slot vectors.
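The frame-to-frame flow in this abstract (slot vectors from the first frame, predicted slot vectors for the transition, then slot vectors for the second frame) can be sketched as a loop. This is an illustration under stated assumptions, not the patented method: `track_slots`, `mean_corrector`, and `identity_predictor` are hypothetical names, and the two toy functions stand in for learned models.

```python
from typing import Callable, List, Sequence, Tuple

Slots = List[List[float]]
Features = List[List[float]]


def track_slots(
    frames: Sequence[Features],
    initial_slots: Slots,
    corrector: Callable[[Slots, Features], Slots],
    predictor: Callable[[Slots], Slots],
) -> Tuple[List[Slots], List[Slots]]:
    """Alternate between correcting slots on a frame and predicting the next state."""
    predicted = initial_slots
    corrected_per_frame: List[Slots] = []
    predicted_per_frame: List[Slots] = []
    for features in frames:
        # Slot vectors for this frame, from the prior slots and the frame's features.
        corrected = corrector(predicted, features)
        corrected_per_frame.append(corrected)
        # Predicted slot vectors describing the transition to the next frame.
        predicted = predictor(corrected)
        predicted_per_frame.append(predicted)
    return corrected_per_frame, predicted_per_frame


def mean_corrector(slots: Slots, features: Features) -> Slots:
    """Toy stand-in: pull every slot halfway toward the mean feature vector."""
    dim = len(features[0])
    mean = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    return [[0.5 * (s + m) for s, m in zip(slot, mean)] for slot in slots]


def identity_predictor(slots: Slots) -> Slots:
    """Toy stand-in: assume entities do not change between frames."""
    return [list(slot) for slot in slots]


frames = [[[2.0]], [[4.0]]]  # two frames, one 1-D feature vector each
corrected, predicted = track_slots(frames, [[0.0]], mean_corrector, identity_predictor)
print(corrected)  # [[[1.0]], [[2.5]]]
```

An output could then be computed from either list, matching the abstract's "predicted slot vectors or the second slot vectors".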

Type: Grant

Filed: April 21, 2022

Date of Patent: September 16, 2025

Assignee: Google LLC

Inventors: Thomas Kipf, Gamaleldin Elsayed, Aravindh Mahendran, Austin Charles Stone, Sara Sabour Rouh Aghdam, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff

Latent Pose Queries for Machine-Learned Image View Synthesis

Publication number: 20240169662

Abstract: An example method includes obtaining, by a computing system, one or more source images of a scene; obtaining, by the computing system, a query associated with a target view of the scene, wherein at least a portion of the query is parameterized in a latent pose space; and generating, by the computing system and using a machine-learned image view synthesis model, an output image of the scene associated with the target view.

Type: Application

Filed: November 22, 2023

Publication date: May 23, 2024

Inventors: Seyed Mohammad Mehdi Sajjadi, Klaus Greff, Etienne François Régis Pot, Daniel Christopher Duckworth, Mario Lucic, Aravindh Mahendran, Thomas Kipf

Conditional Object-Centric Learning with Slot Attention for Video and Other Sequential Data

Publication number: 20220383628

Abstract: A method includes obtaining first feature vectors and second feature vectors representing contents of a first and second image frame, respectively, of an input video. The method may also include generating, based on the first feature vectors, first slot vectors, where each slot vector represents attributes of a corresponding entity as represented in the first image frame, and generating, based on the first slot vectors, predicted slot vectors including a corresponding predicted slot vector that represents a transition of the attributes of the corresponding entity from the first to the second image frame. The method may additionally include generating, based on the predicted slot vectors and the second feature vectors, second slot vectors including a corresponding slot vector that represents the attributes of the corresponding entity as represented in the second image frame, and determining an output based on the predicted slot vectors or the second slot vectors.

Type: Application

Filed: April 21, 2022

Publication date: December 1, 2022

Inventors: Thomas Kipf, Gamaleldin Elsayed, Aravindh Mahendran, Austin Charles Stone, Sara Sabour Rouh Aghdam, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff

Object-Centric Learning with Slot Attention

Publication number: 20210383199

Abstract: A method involves receiving a perceptual representation including a plurality of feature vectors, and initializing a plurality of slot vectors represented by a neural network memory unit. Each respective slot vector is configured to represent a corresponding entity in the perceptual representation. The method also involves determining an attention matrix based on a product of the plurality of feature vectors transformed by a key function and the plurality of slot vectors transformed by a query function. Each respective value of a plurality of values along each respective dimension of the attention matrix is normalized with respect to the plurality of values. The method additionally involves determining an update matrix based on the plurality of feature vectors transformed by a value function and the attention matrix, and updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit.

Type: Application

Filed: July 13, 2020

Publication date: December 9, 2021

Inventors: Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner, Aravindh Mahendran, Francesco Locatello, Thomas Kipf, Georg Heigold, Alexey Dosovitskiy
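
The update described in the Slot Attention abstracts above (attention matrix from keys and queries, update matrix from values, then a slot update) can be sketched in plain Python. This is a minimal illustration under several assumptions: the key, query, and value functions are plain matrix multiplications, normalization is a softmax along the slot axis followed by a weighted mean over features, and the neural network memory unit is replaced by a simple interpolation controlled by a hypothetical `mix` parameter. All function names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def slot_attention_step(
    features: Matrix, slots: Matrix,
    w_key: Matrix, w_query: Matrix, w_value: Matrix,
    mix: float = 0.5,
) -> Tuple[Matrix, Matrix]:
    """One illustrative update: attention matrix, update matrix, slot update."""
    keys = matmul(features, w_key)       # one row per feature vector
    queries = matmul(slots, w_query)     # one row per slot vector
    values = matmul(features, w_value)
    scale = 1.0 / math.sqrt(len(keys[0]))
    # Attention logits: product of transformed features and transformed slots.
    logits = [[scale * sum(k * q for k, q in zip(key, query)) for query in queries]
              for key in keys]
    # Assumed normalization: softmax along the slot axis, so slots compete
    # for each feature vector.
    attn = [softmax(row) for row in logits]
    # Update matrix: per-slot weighted mean of the value vectors.
    per_slot = [list(col) for col in zip(*attn)]
    weights = [[a / sum(row) for a in row] for row in per_slot]
    updates = matmul(weights, values)
    # Stand-in for the memory unit: blend old slots with the update.
    new_slots = [[(1.0 - mix) * s + mix * u for s, u in zip(srow, urow)]
                 for srow, urow in zip(slots, updates)]
    return new_slots, attn


identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
slots, attn = slot_attention_step(features, [[1.0, 0.0], [0.0, 1.0]],
                                  identity, identity, identity)
print(len(slots), len(attn))  # 2 4
```

In practice the step would be repeated for several iterations, with learned transformations and a recurrent unit in place of the interpolation.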
