Patents by Inventor Xiaohua Zhai

Xiaohua Zhai is named as an inventor on the following patent filings. The listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240169715
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network that is configured to process an input image to generate a network output for the input image. In one aspect, a method comprises, at each of a plurality of training steps: obtaining a plurality of training images for the training step; obtaining, for each of the plurality of training images, a respective target output; and selecting, from a plurality of image patch generation schemes, an image patch generation scheme for the training step, wherein, given an input image, each of the plurality of image patch generation schemes generates a different number of patches of the input image, and wherein each patch comprises a respective subset of the pixels of the input image.
    Type: Application
    Filed: November 22, 2023
    Publication date: May 23, 2024
    Inventors: Lucas Klaus Beyer, Pavel Izmailov, Simon Kornblith, Alexander Kolesnikov, Mathilde Caron, Xiaohua Zhai, Matthias Johannes Lorenz Minderer, Ibrahim Alabdulmohsin, Michael Tobias Tschannen, Filip Pavetic
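The abstract above (publication 20240169715) describes selecting, at each training step, one of several image patch generation schemes, each of which splits an input image into a different number of patches. Below is a minimal sketch of that selection step, assuming square images, non-overlapping square patches, and a hypothetical set of patch sizes; these specifics, and the numpy implementation, are illustrative assumptions rather than details from the patent.

```python
import numpy as np

# Hypothetical patch generation schemes: each one splits an image into a
# different number of non-overlapping square patches (assumed detail).
PATCH_SIZES = [8, 16, 32]  # pixels per patch side

def generate_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into rows of patch_size * patch_size * C pixels."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    return (image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(-1, patch_size * patch_size * c))

rng = np.random.default_rng(0)
for step in range(3):                                        # a few training steps
    batch = rng.uniform(size=(4, 64, 64, 3))                 # toy training images
    patch_size = PATCH_SIZES[rng.integers(len(PATCH_SIZES))] # scheme for this step
    patched = np.stack([generate_patches(img, patch_size) for img in batch])
    print(step, patch_size, patched.shape)                   # patch count varies per scheme
```

Each scheme here is parameterized only by a patch size, so smaller patches yield more input elements per image; any other way of producing differing patch counts would fit the abstract equally well.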
  • Patent number: 11983903
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
    Type: Grant
    Filed: November 1, 2023
    Date of Patent: May 14, 2024
    Assignee: Google LLC
    Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
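The self-attention patents in this family (the grant above and the related applications further down) describe splitting each image into patches, mapping the patches to an input sequence, and processing that sequence with self-attention layers. A minimal sketch of that pipeline follows, with an assumed linear patch embedding and a toy single-head attention layer; the dimensions, scaling, and any extra tokens are illustrative choices, not details from the claims.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """(H, W, C) image -> (num_patches, patch_size * patch_size * C) patch matrix."""
    h, w, c = image.shape
    return (image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(-1, patch_size * patch_size * c))

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over a (seq_len, dim) input sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
image = rng.uniform(size=(32, 32, 3))                    # toy input image
patches = patchify(image, patch_size=16)                 # 4 patches of 768 pixels each
embed = rng.normal(size=(patches.shape[1], 64)) * 0.02   # assumed linear patch embedding
sequence = patches @ embed                               # one input element per patch
wq, wk, wv = (rng.normal(size=(64, 64)) * 0.1 for _ in range(3))
output = self_attention(sequence, wq, wk, wv)            # output characterizing the image
print(output.shape)                                      # (4, 64)
```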
  • Publication number: 20240153256
    Abstract: A method may include obtaining a pretrained image encoder and a training sample comprising a training image and a training text string corresponding to the training image. The method may also include initializing a text encoder in an untrained state, determining, using the pretrained image encoder and based on the training image, a first latent representation of the training image, and determining, using the text encoder and based on the training text string, a second latent representation of the training text string. The method may further include determining a loss value based on the first latent representation and the second latent representation, updating, based on the loss value, one or more parameters of the text encoder while holding fixed parameters of the pretrained image encoder, and outputting the text encoder in a trained state.
    Type: Application
    Filed: October 31, 2022
    Publication date: May 9, 2024
    Inventors: Daniel Keysers, Xiaohua Zhai, Xiao Wang, Lucas Beyer, Basil Mustafa, Andreas Steiner, Alexander Kolesnikov
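Publication 20240153256 describes holding a pretrained image encoder fixed while training a text encoder from scratch, with a loss computed from the two latent representations. The sketch below illustrates one version of that training loop using stand-in linear encoders and a simple squared distance between matched pairs; the real encoders, the exact loss, and the optimizer are not specified here and are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pretrained" image encoder: a fixed linear map (assumed detail).
image_encoder = rng.normal(size=(3072, 64))            # frozen parameters
text_encoder = rng.normal(size=(300, 64)) * 0.01       # text encoder in an untrained state

image_feats = rng.uniform(size=(8, 3072))              # toy training images (flattened)
text_feats = rng.uniform(size=(8, 300))                # toy training text strings (embedded)

for step in range(5):
    z_img = image_feats @ image_encoder                # first latent representation (frozen)
    z_txt = text_feats @ text_encoder                  # second latent representation (trainable)

    # Loss value computed from the two latent representations; a simple squared
    # distance between matched image/text pairs stands in for the real loss.
    diff = z_txt - z_img
    loss = (diff ** 2).mean()

    # Update only the text encoder; the pretrained image encoder stays fixed.
    grad = 2 * text_feats.T @ diff / diff.size
    text_encoder -= 0.1 * grad
    print(step, float(loss))
```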
  • Publication number: 20240062426
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
    Type: Application
    Filed: November 1, 2023
    Publication date: February 22, 2024
    Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
  • Publication number: 20220383630
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training Vision Transformer (ViT) neural networks.
    Type: Application
    Filed: May 31, 2022
    Publication date: December 1, 2022
    Inventors: Lucas Klaus Beyer, Neil Matthew Tinmouth Houlsby, Alexander Kolesnikov, Xiaohua Zhai
  • Publication number: 20220375211
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using mixer neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more mixer neural network layers.
    Type: Application
    Filed: May 5, 2022
    Publication date: November 24, 2022
    Inventors: Ilya Tolstikhin, Neil Matthew Tinmouth Houlsby, Alexander Kolesnikov, Lucas Klaus Beyer, Alexey Dosovitskiy, Mario Lucic, Xiaohua Zhai, Thomas Unterthiner, Daniel M. Keysers, Jakob D. Uszkoreit, Yin Ching Jessica Yung, Andreas Steiner
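Publication 20220375211 replaces self-attention with "mixer" layers that operate on the patch sequence. A minimal sketch of one such layer is shown below, assuming the common token-mixing / channel-mixing MLP structure with skip connections; the hidden sizes, nonlinearity, and the absence of normalization are placeholders rather than claim language.

```python
import numpy as np

def mlp(x, w1, w2):
    """Two-layer MLP with a tanh-approximated GELU nonlinearity."""
    h = x @ w1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h ** 3)))
    return h @ w2

def mixer_layer(x, token_w1, token_w2, chan_w1, chan_w2):
    """One mixer layer over a (num_patches, channels) input sequence."""
    x = x + mlp(x.T, token_w1, token_w2).T   # token mixing: MLP across patches
    x = x + mlp(x, chan_w1, chan_w2)         # channel mixing: MLP across channels
    return x

rng = np.random.default_rng(0)
num_patches, channels, hidden = 16, 64, 128
x = rng.normal(size=(num_patches, channels))             # one input element per patch
token_w1 = rng.normal(size=(num_patches, hidden)) * 0.02
token_w2 = rng.normal(size=(hidden, num_patches)) * 0.02
chan_w1 = rng.normal(size=(channels, hidden)) * 0.02
chan_w2 = rng.normal(size=(hidden, channels)) * 0.02
print(mixer_layer(x, token_w1, token_w2, chan_w1, chan_w2).shape)   # (16, 64)
```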
  • Publication number: 20220189612
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform a downstream computer vision task. One of the methods includes pre-training an initial neural network that shares layers with the neural network to perform an initial computer vision task and then training the neural network on the downstream computer vision task.
    Type: Application
    Filed: December 14, 2021
    Publication date: June 16, 2022
    Inventors: Xiaohua Zhai, Sylvain Gelly, Alexander Kolesnikov, Yin Ching Jessica Yung, Joan Puigcerver i Perez, Lucas Klaus Beyer, Neil Matthew Tinmouth Houlsby, Wen Yau Aaron Loh, Alan Prasana Karthikesalingam, Basil Mustafa, Jan Freyberg, Patricia Leigh MacWilliams, Vivek Natarajan
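Publication 20220189612 describes pre-training an initial neural network that shares layers with the downstream network, then training that network on the downstream computer vision task. The sketch below walks through the two stages with a toy linear backbone as the shared layers and synthetic data; the tasks, architecture, and optimizer are placeholders, not the patented method.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(x, y, backbone, head, steps=300, lr=0.01):
    """Gradient descent on mean squared error for a two-layer linear network."""
    for _ in range(steps):
        err = x @ backbone @ head - y
        grad_backbone = x.T @ err @ head.T * 2 / err.size
        grad_head = backbone.T @ x.T @ err * 2 / err.size
        backbone -= lr * grad_backbone
        head -= lr * grad_head
    return backbone, head

# Shared layers: the backbone is reused by the initial and downstream networks.
backbone = rng.normal(size=(32, 16)) * 0.1

# Stage 1: pre-train the initial neural network on an initial vision task (toy data).
x_up, y_up = rng.uniform(size=(256, 32)), rng.uniform(size=(256, 10))
backbone, _ = train(x_up, y_up, backbone, np.zeros((16, 10)))

# Stage 2: train on the downstream task, reusing the pre-trained shared layers
# with a freshly initialized downstream head.
x_down, y_down = rng.uniform(size=(64, 32)), rng.uniform(size=(64, 3))
backbone, down_head = train(x_down, y_down, backbone, np.zeros((16, 3)))
print(((x_down @ backbone @ down_head - y_down) ** 2).mean())        # downstream error
```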
  • Publication number: 20220108478
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.
    Type: Application
    Filed: October 1, 2021
    Publication date: April 7, 2022
    Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner