Patents by Inventor Mostafa Dehghani

Mostafa Dehghani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

PROCESSING IMAGES USING SELF-ATTENTION BASED NEURAL NETWORKS

Publication number: 20250005797

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.

Type: Application

Filed: September 12, 2024

Publication date: January 2, 2025

Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
PROCESSING IMAGES USING SELF-ATTENTION BASED NEURAL NETWORKS

Publication number: 20250005798

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.

Type: Application

Filed: September 12, 2024

Publication date: January 2, 2025

Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
Systems and Methods for Improved Video Understanding

Publication number: 20240428587

Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.

Type: Application

Filed: September 6, 2024

Publication date: December 26, 2024

Inventors: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Luise Schmid
Systems and Methods for Improved Video Understanding

Publication number: 20240428586

Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of spatiotemporal representations from the video data, the plurality of spatiotemporal representations comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of spatiotemporal representations as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.

Type: Application

Filed: September 6, 2024

Publication date: December 26, 2024

Inventors: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Luise Schmid
SELF-ATTENTION BASED NEURAL NETWORKS FOR PROCESSING NETWORK INPUTS FROM MULTIPLE MODALITIES

Publication number: 20240403636

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for executing and training a multi-modal, multi-task self-attention neural network.

Type: Application

Filed: October 5, 2022

Publication date: December 5, 2024

Inventors: Valerii Likhosherstov, Mostafa Dehghani, Anurag Arnab, Krzysztof Marcin Choromanski, Mario Lucic, Yi Tay
Processing images using self-attention based neural networks

Patent number: 12125247

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.

Type: Grant

Filed: October 1, 2021

Date of Patent: October 22, 2024

Assignee: Google LLC

Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
ACTION LOCALIZATION IN VIDEOS USING LEARNED QUERIES

Publication number: 20240346824

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing action localization on an input video. In particular, a system maintains a set of query vectors and uses the input video and the set of query vectors to generate an action localization output for the input video. The action localization output includes, for each of one or more agents depicted in the video, data specifying, for each of one or more video frames in the video, a respective bounding box in the video frame that depicts the agent and a respective action from a set of actions that is being performed by the agent in the video frame.

Type: Application

Filed: April 12, 2024

Publication date: October 17, 2024

Inventors: Alexey Alexeevich Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lucic, Cordelia Luise Schmid, Anurag Arnab
Systems and methods for improved video understanding

Patent number: 12112538

Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.

Type: Grant

Filed: July 8, 2021

Date of Patent: October 8, 2024

Assignee: GOOGLE LLC

Inventors: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Luise Schmid
TRAINING ULTRA-LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS

Publication number: 20240256835

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing an input through each of a plurality of layers of a neural network to generate an output using a plurality of hardware accelerators. The plurality of layers comprise a fully connected layer having a plurality of parameters arranged in a row dimension and a column dimension. One of the methods comprises: generating a plurality of parameter blocks by partitioning the plurality of parameters along the row dimension and the column dimension; determining a ratio of a number of parameters along the row dimension relative to a number of parameters along the column dimension; and determining whether to use row sharding or column sharding with the plurality of hardware accelerators to calculate an output for the fully connected layer and then calculating the output for the fully connected layer using either row sharding or column sharding.

Type: Application

Filed: January 26, 2024

Publication date: August 1, 2024

Inventors: Mostafa Dehghani, Josip Djolonga, Jonathan Heek, Basil Mustafa, Piotr Michal Padlewski, Justin Morgan Gilmer, Neil Matthew Tinmouth Houlsby
Neural Network Architectures with Multiple Normalization Layers for Machine Vision

Publication number: 20240257511

Abstract: One example aspect of the present disclosure is directed to a neural network for machine vision. The neural network may include a stem block that includes a set of stem layers. The neural network may additionally include a visual transformer block. The set of stem layers may include a patch layer, a first normalization layer, an embedding layer, and a second normalization layer. The patch layer subdivides an input image into a set of image patches. The first normalization layer generates a set of normalized image patches by performing a first normalization process on each image patch of the set of image patches. The patch layer feeds forward to the first normalization layer. The embedding layer generates a set of vector embeddings. Each vector embedding of the set of embedding vectors is a projection of a corresponding normalized image patch from the set of normalized image patches onto a visual token. The first normalization layer feeds forward to the embedding layer.

Type: Application

Filed: January 22, 2024

Publication date: August 1, 2024

Inventors: Manoj Kumar Sivaraj, Neil Matthew Tinmouth Houlsby, Mostafa Dehghani
Pretraining Already-Pretrained Models for Diverse Downstream Tasks

Publication number: 20240256964

Abstract: An example method includes obtaining a pretrained machine-learned model that was initially pretrained using a pretraining dataset and further pretraining the model by generating, using a pretraining objective framework, a plurality of corrupted training examples from one or more training examples obtained from the pretraining dataset. A first set of one or more training examples can be corrupted according to a first set of configuration parameters of the pretraining objective framework. A second set can be corrupted according to a second set of configuration parameters of the pretraining objective framework. The example method includes inputting the plurality of corrupted training examples into model; obtaining from the model, a plurality of outputs respectively generated by model based on the plurality of corrupted training examples; and updating one or more parameters of model based on an evaluation of the plurality of outputs.

Type: Application

Filed: January 26, 2024

Publication date: August 1, 2024

Inventors: Yi Tay, Mostafa Dehghani
EFFICIENT DECODING OF OUTPUT SEQUENCES USING ADAPTIVE EARLY EXITING

Publication number: 20240169184

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.

Type: Application

Filed: January 29, 2024

Publication date: May 23, 2024

Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, JR.
Processing images using self-attention based neural networks

Patent number: 11983903

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.

Type: Grant

Filed: November 1, 2023

Date of Patent: May 14, 2024

Assignee: Google LLC

Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
UNIVERSAL TRANSFORMERS

Publication number: 20240143691

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a sequence-to-sequence model that is recurrent in depth while employing self-attention to combine information from different parts of sequences.

Type: Application

Filed: December 18, 2023

Publication date: May 2, 2024

Inventors: Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob D. Uszkoreit, Lukasz Mieczyslaw Kaiser
PROCESSING IMAGES USING SELF-ATTENTION BASED NEURAL NETWORKS

Publication number: 20240062426

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.

Type: Application

Filed: November 1, 2023

Publication date: February 22, 2024

Inventors: Neil Matthew Tinmouth Houlsby, Sylvain Gelly, Jakob D. Uszkoreit, Xiaohua Zhai, Georg Heigold, Lucas Klaus Beyer, Alexander Kolesnikov, Matthias Johannes Lorenz Minderer, Dirk Weissenborn, Mostafa Dehghani, Alexey Dosovitskiy, Thomas Unterthiner
Efficient decoding of output sequences using adaptive early exiting

Patent number: 11886976

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.

Type: Grant

Filed: July 14, 2023

Date of Patent: January 30, 2024

Assignee: Google LLC

Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
EFFICIENT DECODING OF OUTPUT SEQUENCES USING ADAPTIVE EARLY EXITING

Publication number: 20240020516

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.

Type: Application

Filed: July 14, 2023

Publication date: January 18, 2024

Inventors: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
Universal transformers

Patent number: 11860969

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a sequence to sequence model that is recurrent in depth while employing self-attention to combine information from different parts of sequences.

Type: Grant

Filed: August 10, 2020

Date of Patent: January 2, 2024

Assignee: Google LLC

Inventors: Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob D. Uszkoreit, Lukasz Mieczyslaw Kaiser
COMPUTER VISION NEURAL NETWORKS WITH LEARNED TOKENIZATION

Publication number: 20230409899

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing a network input using a computer vision neural network with learned tokenization.

Type: Application

Filed: June 21, 2022

Publication date: December 21, 2023

Inventors: Michael Sahngwon Ryoo, Anthony Jacob Piergiovanni, Anelia Angelova, Anurag Arnab, Mostafa Dehghani
Using Chains of Thought to Prompt Machine-Learned Models Pre-Trained on Diversified Objectives

Publication number: 20230244938

Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples.

Type: Application

Filed: January 27, 2023

Publication date: August 3, 2023

Inventors: Jason Weng Wei, Dengyong Zhou, Xuezhi Wang, Dale Eric Schuurmans, Quoc V. Le, Maarten Paul Bosma, Ed Huai-Hsin Chi, Olivier Jean Andrè Bousquet, Le Hou, Charles Aloysius Sutton, Nathanael Martin Schärli, Nathan Kemp Sekiguchi Scales, Augustus Quadrozzi Odena, Sharan Ajit Narang, Guy Gur-Ari Krakover, Aakanksha Chowdhery, David Martin Dohan, Aitor Lewkowycz, Henryk Michalewski, Jiageng Luan, David J. Bieber, Jacob Austin, Anders Johan Andreassen, Maxwell Isaac Nye, Yi Tay, Mostafa Dehghani

1 2 next