Patents by Inventor Alexey Dosovitskiy
Alexey Dosovitskiy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240161459
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
Type: Application
Filed: January 25, 2024
Publication date: May 16, 2024
Inventors: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
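The detection pipeline the abstract describes can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the classification subnetwork is reduced to a dot product between object and query embeddings, and the localization subnetwork to a single linear map; all shapes and weights are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8
object_embeddings = rng.normal(size=(3, d))   # output of the image encoding subnetwork
query_embeddings = rng.normal(size=(4, d))    # one embedding per object category

# Classification subnetwork (sketched as similarity scoring): for each object
# embedding, a score distribution over the set of query embeddings.
logits = object_embeddings @ query_embeddings.T   # shape (3, 4)
scores = softmax(logits, axis=-1)

# Localization subnetwork (sketched as a linear map to box coordinates).
W_box = rng.normal(size=(d, 4))
boxes = object_embeddings @ W_box                 # shape (3, 4): one region per object
```

Because categories enter only through query embeddings, new categories can in principle be queried at inference time without retraining the image encoder.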
-
Patent number: 11983903
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
Type: Grant
Filed: November 1, 2023
Date of Patent: May 14, 2024
Assignee: Google LLC
Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
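The core preprocessing step in the abstract, turning an image into a sequence of patch-based input elements, can be sketched as follows. This is an illustrative reading, not the claimed method: the image size, patch size, and flattening scheme are assumptions, and the subsequent self-attention layers are omitted.

```python
import numpy as np

def image_to_patch_sequence(image, patch):
    """Split an H x W x C image into non-overlapping patches, each a different
    subset of the pixels, and flatten each patch into one input element."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    seq = []
    for r in range(H // patch):
        for c in range(W // patch):
            p = image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch, :]
            seq.append(p.reshape(-1))
    return np.stack(seq)  # shape (num_patches, patch * patch * C)

image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
sequence = image_to_patch_sequence(image, patch=4)
# An 8x8x3 image with 4x4 patches yields 4 input elements of 48 values each.
```

The resulting sequence is what a stack of self-attention layers would then consume, one input position per patch.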
-
Patent number: 11928854
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
Type: Grant
Filed: May 5, 2023
Date of Patent: March 12, 2024
Assignee: Google LLC
Inventors: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
-
Publication number: 20240062426
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
Type: Application
Filed: November 1, 2023
Publication date: February 22, 2024
Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
-
Publication number: 20230360365
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
Type: Application
Filed: May 5, 2023
Publication date: November 9, 2023
Inventors: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
-
Publication number: 20230306655
Abstract: Provided are systems and methods for synthesizing novel views of complex scenes (e.g., outdoor scenes). In some implementations, the systems and methods can include or use machine-learned models that are capable of learning from unstructured and/or unconstrained collections of imagery such as, for example, “in the wild” photographs. In particular, example implementations of the present disclosure can learn a volumetric scene density and radiance represented by a machine-learned model such as one or more multilayer perceptrons (MLPs).
Type: Application
Filed: June 1, 2023
Publication date: September 28, 2023
Inventors: Daniel Christopher Duckworth, Alexey Dosovitskiy, Ricardo Martin-Brualla, Jonathan Tilton Barron, Noha Radwan, Seyed Mohammad Mehdi Sajjadi
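The abstract's "volumetric scene density and radiance" is typically turned into a pixel by alpha-compositing samples along a camera ray. The sketch below shows only that standard compositing step, with made-up sample values; the MLP that would predict density and radiance per sample, and the handling of unconstrained photo collections, are omitted.

```python
import numpy as np

def volume_render(densities, colors, deltas):
    """Composite per-sample density and radiance along one ray.
    densities: (N,) non-negative volume densities at the samples
    colors:    (N, 3) radiance at the samples
    deltas:    (N,) distances between consecutive samples"""
    alphas = 1.0 - np.exp(-densities * deltas)             # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance
    weights = alphas * trans
    return (weights[:, None] * colors).sum(axis=0)         # rendered RGB

densities = np.array([0.0, 5.0, 50.0])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
deltas = np.array([0.5, 0.5, 0.5])
pixel = volume_render(densities, colors, deltas)
# The empty first sample contributes nothing; the dense middle sample dominates.
```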
-
Patent number: 11704844
Abstract: Provided are systems and methods for synthesizing novel views of complex scenes (e.g., outdoor scenes). In some implementations, the systems and methods can include or use machine-learned models that are capable of learning from unstructured and/or unconstrained collections of imagery such as, for example, “in the wild” photographs. In particular, example implementations of the present disclosure can learn a volumetric scene density and radiance represented by a machine-learned model such as one or more multilayer perceptrons (MLPs).
Type: Grant
Filed: April 18, 2022
Date of Patent: July 18, 2023
Assignee: GOOGLE LLC
Inventors: Daniel Christopher Duckworth, Alexey Dosovitskiy, Ricardo Martin Brualla, Jonathan Tilton Barron, Noha Waheed Ahmed Radwan, Seyed Mohammad Mehdi Sajjadi
-
Publication number: 20220383628
Abstract: A method includes obtaining first feature vectors and second feature vectors representing contents of a first and second image frame, respectively, of an input video. The method may also include generating, based on the first feature vectors, first slot vectors, where each slot vector represents attributes of a corresponding entity as represented in the first image frame, and generating, based on the first slot vectors, predicted slot vectors including a corresponding predicted slot vector that represents a transition of the attributes of the corresponding entity from the first to the second image frame. The method may additionally include generating, based on the predicted slot vectors and the second feature vectors, second slot vectors including a corresponding slot vector that represents the attributes of the corresponding entity as represented in the second image frame, and determining an output based on the predicted slot vectors or the second slot vectors.
Type: Application
Filed: April 21, 2022
Publication date: December 1, 2022
Inventors: Thomas Kipf, Gamaleldin Elsayed, Aravindh Mahendran, Austin Charles Stone, Sara Sabour Rouh Aghdam, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff
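The predict-then-update loop in the abstract can be caricatured as below. This is a heavily simplified sketch under stated assumptions: the transition is a single hypothetical linear map, and the correction against the second frame's features is a bare attention read rather than the full slot update the filing describes.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
W_pred = np.eye(d) + 0.01 * rng.normal(size=(d, d))  # hypothetical transition weights

def predict_slots(slots):
    # Predict each entity's attributes as they transition from frame t to t+1.
    return slots @ W_pred

def correct_slots(predicted, features):
    # Simplest possible correction: attend predicted slots over the new
    # frame's feature vectors and read out a weighted combination.
    attn = np.exp(predicted @ features.T)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ features

slots_t = rng.normal(size=(2, d))        # one slot vector per entity in frame t
features_t1 = rng.normal(size=(5, d))    # feature vectors of frame t+1
predicted = predict_slots(slots_t)       # predicted slot vectors
slots_t1 = correct_slots(predicted, features_t1)  # second slot vectors
```

An output (e.g., per-entity segmentation or tracking) would then be decoded from either the predicted or the corrected slot vectors, as the abstract notes.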
-
Publication number: 20220375211
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using mixer neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more mixer neural network layers.
Type: Application
Filed: May 5, 2022
Publication date: November 24, 2022
Inventors: Ilya Tolstikhin, Neil Matthew Tinmouth Houlsby, Alexander Kolesnikov, Lucas Klaus Beyer, Alexey Dosovitskiy, Mario Lucic, Xiaohua Zhai, Thomas Unterthiner, Daniel M. Keysers, Jakob D. Uszkoreit, Yin Ching Jessica Yung, Andreas Steiner
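A "mixer" layer, in contrast to the self-attention layers of the entries above, alternates an MLP applied across patches with an MLP applied across channels. The sketch below shows one such layer with hypothetical dimensions and random weights; layer normalization and the surrounding network are omitted for brevity.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU nonlinearity
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mixer_layer(x, w_tok1, w_tok2, w_ch1, w_ch2):
    """One mixer layer on x of shape (patches, channels): a two-layer MLP
    mixing along the patch axis, then one mixing along the channel axis,
    each with a residual connection."""
    y = x + w_tok2 @ gelu(w_tok1 @ x)      # token mixing: acts on the patch axis
    return y + gelu(y @ w_ch1) @ w_ch2     # channel mixing: acts on the channel axis

rng = np.random.default_rng(0)
P, C, H = 4, 8, 16                         # patches, channels, hidden width (assumed)
x = rng.normal(size=(P, C))
out = mixer_layer(
    x,
    rng.normal(size=(H, P)) * 0.1, rng.normal(size=(P, H)) * 0.1,
    rng.normal(size=(C, H)) * 0.1, rng.normal(size=(H, C)) * 0.1,
)
```

The same patch-sequence input used for self-attention networks feeds this layer; only the mixing mechanism differs.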
-
Publication number: 20220237834
Abstract: Provided are systems and methods for synthesizing novel views of complex scenes (e.g., outdoor scenes). In some implementations, the systems and methods can include or use machine-learned models that are capable of learning from unstructured and/or unconstrained collections of imagery such as, for example, “in the wild” photographs. In particular, example implementations of the present disclosure can learn a volumetric scene density and radiance represented by a machine-learned model such as one or more multilayer perceptrons (MLPs).
Type: Application
Filed: April 18, 2022
Publication date: July 28, 2022
Inventors: Daniel Christopher Duckworth, Alexey Dosovitskiy, Ricardo Martin Brualla, Jonathan Tilton Barron, Noha Waheed Ahmed Radwan, Seyed Mohammad Mehdi Sajjadi
-
Publication number: 20220172066
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to process images. One of the methods includes obtaining a training image; processing the training image using a first subnetwork to generate, for each of a plurality of first image patches of the training image, a relevance score; generating, using the relevance scores, one or more second image patches of the training image by performing one or more differentiable operations on the relevance scores; processing the one or more second image patches using a second subnetwork to generate a prediction about the training image; determining an error of the training network output; and generating a parameter update for the first subnetwork, comprising backpropagating gradients determined according to the error of the training network output through i) the second subnetwork, ii) the one or more differentiable operations, and iii) the first subnetwork.
Type: Application
Filed: November 30, 2021
Publication date: June 2, 2022
Inventors: Thomas Unterthiner, Alexey Dosovitskiy, Aravindh Mahendran, Dirk Weissenborn, Jakob D. Uszkoreit, Jean-Baptiste Cordonnier
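The key constraint in the abstract is that the patch-selection step must be differentiable so gradients can flow back into the first subnetwork's relevance scores. One common way to satisfy that (a sketch, not necessarily the patented operation) is to replace hard top-k selection with a softmax-weighted combination of patches:

```python
import numpy as np

def soft_select_patch(patches, relevance, temperature=1.0):
    """Differentiable selection sketch: form a 'second image patch' as a
    softmax-weighted combination of the first image patches, so the result
    is differentiable with respect to the relevance scores."""
    weights = np.exp(relevance / temperature)
    weights /= weights.sum()
    return (weights[:, None] * patches).sum(axis=0)

rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 48))   # 6 flattened first image patches (assumed size)
relevance = rng.normal(size=6)       # relevance scores from the first subnetwork
selected = soft_select_patch(patches, relevance)
```

With a hard argmax instead, the gradient with respect to `relevance` would be zero almost everywhere and the first subnetwork could not be trained by backpropagation.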
-
Patent number: 11308659
Abstract: Provided are systems and methods for synthesizing novel views of complex scenes (e.g., outdoor scenes). In some implementations, the systems and methods can include or use machine-learned models that are capable of learning from unstructured and/or unconstrained collections of imagery such as, for example, “in the wild” photographs. In particular, example implementations of the present disclosure can learn a volumetric scene density and radiance represented by a machine-learned model such as one or more multilayer perceptrons (MLPs).
Type: Grant
Filed: July 30, 2021
Date of Patent: April 19, 2022
Assignee: GOOGLE LLC
Inventors: Daniel Christopher Duckworth, Seyed Mohammad Mehdi Sajjadi, Jonathan Tilton Barron, Noha Radwan, Alexey Dosovitskiy, Ricardo Martin-Brualla
-
Publication number: 20220108478
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
Type: Application
Filed: October 1, 2021
Publication date: April 7, 2022
Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
-
Publication number: 20220036602
Abstract: Provided are systems and methods for synthesizing novel views of complex scenes (e.g., outdoor scenes). In some implementations, the systems and methods can include or use machine-learned models that are capable of learning from unstructured and/or unconstrained collections of imagery such as, for example, “in the wild” photographs. In particular, example implementations of the present disclosure can learn a volumetric scene density and radiance represented by a machine-learned model such as one or more multilayer perceptrons (MLPs).
Type: Application
Filed: July 30, 2021
Publication date: February 3, 2022
Inventors: Daniel Christopher Duckworth, Seyed Mohammad Mehdi Sajjadi, Jonathan Tilton Barron, Noha Waheed Ahmed Radwan, Alexey Dosovitskiy, Ricardo Martin-Brualla
-
Publication number: 20210383199
Abstract: A method involves receiving a perceptual representation including a plurality of feature vectors, and initializing a plurality of slot vectors represented by a neural network memory unit. Each respective slot vector is configured to represent a corresponding entity in the perceptual representation. The method also involves determining an attention matrix based on a product of the plurality of feature vectors transformed by a key function and the plurality of slot vectors transformed by a query function. Each respective value of a plurality of values along each respective dimension of the attention matrix is normalized with respect to the plurality of values. The method additionally involves determining an update matrix based on the plurality of feature vectors transformed by a value function and the attention matrix, and updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit.
Type: Application
Filed: July 13, 2020
Publication date: December 9, 2021
Inventors: Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner, Aravindh Mahendran, Francesco Locatello, Thomas Kipf, Georg Heigold, Alexey Dosovitskiy
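The attention and update matrices described in this abstract can be sketched directly. This is a simplified illustration under assumptions: the key, query, and value functions are plain linear maps with random weights, and the neural network memory unit that would consume the update matrix (e.g., a recurrent cell) is omitted, so the function simply returns the update.

```python
import numpy as np

def slot_attention_step(slots, inputs, Wq, Wk, Wv):
    """One attention step over slots.
    Attention logits come from keys (transformed inputs) times queries
    (transformed slots); normalizing over the slot axis makes slots compete
    for input features. The update matrix is a weighted mean of the values."""
    q = slots @ Wq                                    # query-transformed slots
    k = inputs @ Wk                                   # key-transformed features
    v = inputs @ Wv                                   # value-transformed features
    logits = k @ q.T / np.sqrt(q.shape[-1])           # (num_inputs, num_slots)
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)           # normalize over the slot axis
    attn_w = attn / attn.sum(axis=0, keepdims=True)   # weighted mean over inputs
    return attn_w.T @ v                               # update matrix, one row per slot

rng = np.random.default_rng(0)
d = 8
slots = rng.normal(size=(3, d))      # one slot vector per entity
features = rng.normal(size=(10, d))  # feature vectors of the perceptual representation
W = [rng.normal(size=(d, d)) for _ in range(3)]
updates = slot_attention_step(slots, features, *W)
```

In a full model this step is iterated, with the update matrix routed through the memory unit to produce the next slot vectors.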