Patents by Inventor Neil Matthew Tinmouth Houlsby
Neil Matthew Tinmouth Houlsby has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250148759
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
Type: Application
Filed: January 8, 2025
Publication date: May 8, 2025
Inventors: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
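The abstract above describes a classification subnetwork that scores each object embedding against a set of query embeddings. A minimal numpy sketch of that scoring step follows; the dot-product similarity, softmax normalization, and all shapes and names are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def classify_objects(object_embeddings, query_embeddings):
    """For each object embedding, produce a score distribution over queries.

    object_embeddings: (num_objects, dim), from the image encoding subnetwork.
    query_embeddings:  (num_queries, dim), one embedding per object category.
    """
    logits = object_embeddings @ query_embeddings.T  # (num_objects, num_queries)
    return softmax(logits, axis=-1)

rng = np.random.default_rng(0)
obj = rng.normal(size=(5, 16))   # 5 detected-object embeddings
qry = rng.normal(size=(3, 16))   # 3 category query embeddings
scores = classify_objects(obj, qry)
```

Each row of `scores` is a distribution over the queried categories for one detected object; the localization subnetwork (not sketched) would supply the corresponding image region.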
-
Publication number: 20250139432
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes a neural network configured to perform the machine learning task, the neural network including one or more merger neural network blocks that each generate a block output sequence that has fewer elements than the block input sequence that is processed by the merger neural network block.
Type: Application
Filed: February 6, 2023
Publication date: May 1, 2025
Inventors: Cédric Benjamin Renggli, Carlos Riquelme Ruiz, André Susano Pinto, Basil Mustafa, Joan Puigcerver i Perez, Neil Matthew Tinmouth Houlsby
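The key property of a merger block, per the abstract, is that its output sequence has fewer elements than its input sequence. One simple way to get that behavior is to pool groups of adjacent elements; this numpy sketch uses average pooling purely as an assumed stand-in for the actual merging mechanism:

```python
import numpy as np

def merger_block(block_input, merge_factor=2):
    """Merge groups of adjacent sequence elements so the output is shorter.

    block_input: (seq_len, dim); seq_len assumed divisible by merge_factor.
    Returns an array of shape (seq_len // merge_factor, dim).
    """
    seq_len, dim = block_input.shape
    grouped = block_input.reshape(seq_len // merge_factor, merge_factor, dim)
    return grouped.mean(axis=1)  # average-pool each group into one element

x = np.arange(8.0).reshape(4, 2)  # 4-element input sequence, dim 2
y = merger_block(x)               # 2-element output sequence
```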
-
Patent number: 12272442
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform a downstream computer vision task. One of the methods includes pre-training an initial neural network that shares layers with the neural network to perform an initial computer vision task and then training the neural network on the downstream computer vision task.
Type: Grant
Filed: December 14, 2021
Date of Patent: April 8, 2025
Assignee: Google LLC
Inventors: Xiaohua Zhai, Sylvain Gelly, Alexander Kolesnikov, Yin Ching Jessica Yung, Joan Puigcerver i Perez, Lucas Klaus Beyer, Neil Matthew Tinmouth Houlsby, Wen Yau Aaron Loh, Alan Prasana Karthikesalingam, Basil Mustafa, Jan Freyberg, Patricia Leigh MacWilliams, Vivek Natarajan
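The method above shares layers between an initial (pre-training) network and the downstream network: the backbone is reused while a fresh head is trained for the new task. A minimal numpy sketch of that structure; all weights, shapes, and the zero-initialized downstream head are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared layers; these weights stand in for the result of pre-training on
# the initial computer vision task.
backbone = rng.normal(size=(32, 8))

# Head used during pre-training (discarded afterwards) and a fresh head
# for the downstream task, trained while reusing the shared backbone.
initial_head = rng.normal(size=(8, 100))   # e.g. 100 pre-training classes
downstream_head = np.zeros((8, 10))        # e.g. 10 downstream classes

def features(x):
    return np.tanh(x @ backbone)  # shared representation for both tasks

x = rng.normal(size=(4, 32))
downstream_logits = features(x) @ downstream_head  # only the head is new
```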
-
Patent number: 12230011
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
Type: Grant
Filed: January 25, 2024
Date of Patent: February 18, 2025
Assignee: Google LLC
Inventors: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
-
Publication number: 20250005797
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
Type: Application
Filed: September 12, 2024
Publication date: January 2, 2025
Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
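The patch step the abstract describes (each patch a different subset of the image's pixels, flattened into one input element of a sequence) can be sketched in numpy as follows; patch size, image size, and function names are assumptions for illustration:

```python
import numpy as np

def image_to_patch_sequence(image, patch_size):
    """Split an image into non-overlapping patches and flatten each one,
    yielding the input sequence for a self-attention-based encoder."""
    h, w, c = image.shape
    p = patch_size
    patches = (image.reshape(h // p, p, w // p, p, c)
                    .transpose(0, 2, 1, 3, 4)   # group the pixels of each patch
                    .reshape(-1, p * p * c))    # one flattened element per patch
    return patches  # (num_patches, patch_dim)

img = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)  # toy 4x4 RGB image
seq = image_to_patch_sequence(img, patch_size=2)           # 4 patches of dim 12
```

Every pixel lands in exactly one patch, so the patches are disjoint subsets that together cover the image.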
-
Publication number: 20250005798
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
Type: Application
Filed: September 12, 2024
Publication date: January 2, 2025
Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
-
Publication number: 20240354593
Abstract: HET classifiers, which learn a multivariate Gaussian distribution over prediction logits, perform well on image classification problems with hundreds to thousands of classes. However, compared to standard classifiers (e.g., deterministic (DET) classifiers), they introduce extra parameters that scale linearly with the number of classes. This makes them infeasible to apply to larger-scale problems. In addition, HET classifiers introduce a temperature hyperparameter, which is ordinarily tuned. HET classifiers are disclosed, where the parameter count (when compared to a DET classifier) scales independently of the number of classes. In large-scale settings of the embodiments, the need to tune the temperature hyperparameter is removed, by directly learning it on the training data.
Type: Application
Filed: July 20, 2023
Publication date: October 24, 2024
Inventors: Rodolphe René Willy Jenatton, Mark Patrick Collier, Effrosyni Kokiopoulou, Basil Mustafa, Neil Matthew Tinmouth Houlsby, Jesse Berent
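One way to make the heteroscedastic (HET) parameter count independent of the class count, consistent with the abstract, is to parameterize the noise in a fixed low-dimensional space and let it reach the logits through the ordinary classifier weights, with the temperature learned rather than tuned. The numpy sketch below is an assumed illustration of that idea (Monte-Carlo sampling, the low-rank matrix `V`, and all sizes are assumptions, not the disclosed embodiment):

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes, feat_dim, noise_dim = 1000, 64, 16

W = rng.normal(size=(feat_dim, num_classes)) * 0.05  # standard classifier weights
V = rng.normal(size=(noise_dim, feat_dim)) * 0.05    # noise params: size independent of num_classes
log_temperature = 0.0                                # learned on training data, not hand-tuned

def het_logits(features, num_samples=4):
    """Monte-Carlo logits from a Gaussian over the feature space whose
    extra parameters (V, log_temperature) do not grow with num_classes."""
    eps = rng.normal(size=(num_samples, noise_dim))
    noisy_features = features[None, :] + eps @ V     # (num_samples, feat_dim)
    return (noisy_features @ W) / np.exp(log_temperature)

f = rng.normal(size=(feat_dim,))
logits = het_logits(f)  # (num_samples, num_classes)
```

The extra parameter count here is `noise_dim * feat_dim + 1`, which stays fixed as `num_classes` grows, whereas a per-class covariance would scale linearly with it.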
-
Patent number: 12125247
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
Type: Grant
Filed: October 1, 2021
Date of Patent: October 22, 2024
Assignee: Google LLC
Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
-
Publication number: 20240289926
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating predictions about images. One of the systems includes a neural network comprising a sequence of one or more network blocks that are each configured to perform operations comprising: obtaining a block input that represents an intermediate representation of an input image; determining a plurality of patches of the block input or of an updated representation of the block input, wherein each patch comprises a different subset of elements of the block input or of the updated representation of the block input; assigning each patch to one or more respective expert modules of a plurality of expert modules of the network block; for each patch of the plurality of patches, processing the patch using the corresponding expert modules to generate respective module outputs; and generating a block output by combining the module outputs.
Type: Application
Filed: May 27, 2022
Publication date: August 29, 2024
Inventors: Carlos Riquelme Ruiz, André Susano Pinto, Basil Mustafa, Daniel M. Keysers, Joan Puigcerver i Perez, Maxim Neumann, Neil Matthew Tinmouth Houlsby, Rodolphe Jenatton
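The block above assigns each patch to expert modules and combines the module outputs. A minimal numpy sketch of that routing step, assuming top-1 assignment via a learned router and tanh "experts" (both assumptions; the patent covers assigning a patch to one *or more* experts):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def expert_block(patches, router_w, expert_ws):
    """Route each patch to its top-scoring expert module; combine outputs.

    patches:   (num_patches, dim) subsets of the block input
    router_w:  (dim, num_experts) routing weights
    expert_ws: list of (dim, dim) weight matrices, one per expert module
    """
    gates = softmax(patches @ router_w)  # per-patch scores over experts
    chosen = gates.argmax(axis=-1)       # top-1 expert per patch
    out = np.empty_like(patches)
    for i, patch in enumerate(patches):
        e = chosen[i]
        out[i] = gates[i, e] * np.tanh(patch @ expert_ws[e])  # gated module output
    return out  # block output: module outputs combined back into a sequence

rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 8))
router_w = rng.normal(size=(8, 3))
experts = [rng.normal(size=(8, 8)) for _ in range(3)]
block_out = expert_block(patches, router_w, experts)
```

Because only the chosen expert runs per patch, compute grows with the number of patches rather than with the total number of experts.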
-
Publication number: 20240256835
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing an input through each of a plurality of layers of a neural network to generate an output using a plurality of hardware accelerators. The plurality of layers comprise a fully connected layer having a plurality of parameters arranged in a row dimension and a column dimension. One of the methods comprises: generating a plurality of parameter blocks by partitioning the plurality of parameters along the row dimension and the column dimension; determining a ratio of a number of parameters along the row dimension relative to a number of parameters along the column dimension; and determining whether to use row sharding or column sharding with the plurality of hardware accelerators to calculate an output for the fully connected layer and then calculating the output for the fully connected layer using either row sharding or column sharding.
Type: Application
Filed: January 26, 2024
Publication date: August 1, 2024
Inventors: Mostafa Dehghani, Josip Djolonga, Jonathan Heek, Basil Mustafa, Piotr Michal Padlewski, Justin Morgan Gilmer, Neil Matthew Tinmouth Houlsby
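Row sharding and column sharding of a fully connected layer differ in what each accelerator holds and how partial results are combined: row shards produce partial products that are summed, column shards produce output slices that are concatenated. This numpy sketch simulates both on one machine; the shard-the-longer-dimension heuristic is an assumption, not the patent's actual decision rule:

```python
import numpy as np

def choose_sharding(num_rows, num_cols):
    """Pick a sharding mode from the row/column parameter-count ratio.
    (Sharding the longer dimension is an assumed heuristic.)"""
    return "row" if num_rows / num_cols >= 1.0 else "column"

def sharded_matmul(x, W, num_shards, mode):
    """Compute x @ W with W partitioned across `num_shards` accelerators."""
    if mode == "row":
        # Split W's rows (and x's columns); partial products are summed,
        # which corresponds to an all-reduce across accelerators.
        xs = np.split(x, num_shards, axis=1)
        Ws = np.split(W, num_shards, axis=0)
        return sum(xi @ Wi for xi, Wi in zip(xs, Ws))
    else:
        # Split W's columns; per-shard output slices are concatenated,
        # which corresponds to an all-gather across accelerators.
        Ws = np.split(W, num_shards, axis=1)
        return np.concatenate([x @ Wi for Wi in Ws], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
W = rng.normal(size=(8, 4))
mode = choose_sharding(*W.shape)  # 8 rows vs 4 columns
y = sharded_matmul(x, W, num_shards=2, mode=mode)
```

Both modes compute the same layer output; the choice only changes how much data each accelerator stores and communicates.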
-
Publication number: 20240257511
Abstract: One example aspect of the present disclosure is directed to a neural network for machine vision. The neural network may include a stem block that includes a set of stem layers. The neural network may additionally include a visual transformer block. The set of stem layers may include a patch layer, a first normalization layer, an embedding layer, and a second normalization layer. The patch layer subdivides an input image into a set of image patches. The first normalization layer generates a set of normalized image patches by performing a first normalization process on each image patch of the set of image patches. The patch layer feeds forward to the first normalization layer. The embedding layer generates a set of vector embeddings. Each vector embedding of the set of embedding vectors is a projection of a corresponding normalized image patch from the set of normalized image patches onto a visual token. The first normalization layer feeds forward to the embedding layer.
Type: Application
Filed: January 22, 2024
Publication date: August 1, 2024
Inventors: Manoj Kumar Sivaraj, Neil Matthew Tinmouth Houlsby, Mostafa Dehghani
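The stem the abstract lays out is a fixed pipeline: patch layer, first normalization, embedding projection, second normalization. A numpy sketch of that pipeline, using layer normalization as an assumed choice for both normalization layers (the abstract does not specify which normalization process is used):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def stem(image, patch_size, embed_w):
    """Patch layer -> first normalization -> embedding -> second normalization."""
    h, w, c = image.shape
    p = patch_size
    patches = (image.reshape(h // p, p, w // p, p, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, p * p * c))   # patch layer
    normed = layer_norm(patches)               # first normalization layer
    tokens = normed @ embed_w                  # embedding: project patch -> visual token
    return layer_norm(tokens)                  # second normalization layer

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 4, 3))
embed_w = rng.normal(size=(2 * 2 * 3, 5))      # patch_dim -> token_dim projection
tokens = stem(img, patch_size=2, embed_w=embed_w)
```

The resulting `tokens` sequence is what would feed the visual transformer block.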
-
Publication number: 20240169629
Abstract: A first image and textual content associated with the first image is obtained. A second image that depicts the textual content associated with the first image is rendered. The first image and the second image are processed with a machine-learned encoding model to respectively obtain a first image embedding and a second image embedding for an image embedding space including a plurality of image embeddings. The machine-learned encoding model is trained based on a difference between the first image embedding and the second image embedding.
Type: Application
Filed: November 17, 2023
Publication date: May 23, 2024
Inventors: Michael Tobias Tschannen, Neil Matthew Tinmouth Houlsby, Basil Mustafa
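The training signal here comes from the difference between two embeddings produced by one shared encoder: one for the image, one for its text rendered as pixels. A numpy sketch of that shared-encoder setup, using cosine distance as an assumed stand-in for the disclosed difference-based objective (the encoder, shapes, and loss are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
encoder_w = rng.normal(size=(12, 4)) * 0.1  # one encoder serves both inputs

def encode(image):
    """Shared encoder: the photo and the rendered-text image both map into
    the same image embedding space (unit-normalized here)."""
    z = np.tanh(image.reshape(-1) @ encoder_w)
    return z / np.linalg.norm(z)

def embedding_loss(photo, text_image):
    """Cosine-distance stand-in for training on the difference between the
    first and second image embeddings."""
    return 1.0 - encode(photo) @ encode(text_image)

photo = rng.normal(size=(2, 2, 3))
text_image = rng.normal(size=(2, 2, 3))  # the associated text, rendered as pixels
loss = embedding_loss(photo, text_image)
```

Because the text is rendered into an image, no separate text tower or tokenizer is needed; one image encoder handles both modalities.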
-
Publication number: 20240161459
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
Type: Application
Filed: January 25, 2024
Publication date: May 16, 2024
Inventors: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
-
Patent number: 11983903
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
Type: Grant
Filed: November 1, 2023
Date of Patent: May 14, 2024
Assignee: Google LLC
Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
-
Patent number: 11928854
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
Type: Grant
Filed: May 5, 2023
Date of Patent: March 12, 2024
Assignee: Google LLC
Inventors: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
-
Publication number: 20240062426
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
Type: Application
Filed: November 1, 2023
Publication date: February 22, 2024
Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
-
Publication number: 20230360365
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
Type: Application
Filed: May 5, 2023
Publication date: November 9, 2023
Inventors: Matthias Johannes Lorenz Minderer, Alexey Alexeevich Gritsenko, Austin Charles Stone, Dirk Weissenborn, Alexey Dosovitskiy, Neil Matthew Tinmouth Houlsby
-
Publication number: 20230196211
Abstract: Generally, the present disclosure is directed to systems and methods that provide a simple, scalable, yet effective strategy to perform transfer learning with a mixture of experts (MoE). In particular, the transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, representations used for transfer are usually generic, and are not tailored to a particular distribution of downstream tasks. In contrast, example systems and methods of the present disclosure use expert representations for transfer with a simple, yet effective, strategy.
Type: Application
Filed: June 7, 2021
Publication date: June 22, 2023
Inventors: Carlos Riquelme Ruiz, André Susano Pinto, Joan Puigcerver, Basil Mustafa, Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Cedric Benjamin Renggli, Daniel Martin Keysers
-
Publication number: 20230107409
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes a neural network configured to perform the machine learning task, the neural network including one or more expert neural network blocks that each include multiple routers and multiple expert neural networks.
Type: Application
Filed: October 5, 2022
Publication date: April 6, 2023
Inventors: Rodolphe Jenatton, Carlos Riquelme Ruiz, Dustin Tran, James Urquhart Allingham, Florian Wenzel, Zelda Elaine Mariet, Basil Mustafa, Joan Puigcerver i Perez, Neil Matthew Tinmouth Houlsby, Ghassen Jerfel
-
Publication number: 20220383630
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training Vision Transformer (ViT) neural networks.
Type: Application
Filed: May 31, 2022
Publication date: December 1, 2022
Inventors: Lucas Klaus Beyer, Neil Matthew Tinmouth Houlsby, Alexander Kolesnikov, Xiaohua Zhai