Patents by Inventor Anthony J. Piergiovanni

Anthony J. Piergiovanni has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Channel Fusion for Vision-Language Representation Learning

Publication number: 20240119713

Abstract: Provided is an approach that aligns multi-modal tokens using cross-attention without losing the advantages of global self-attention. In contrast to previous works that concatenate the unimodal tokens along the sequence dimension, example approaches described herein align per-modality tokens by chaining them along the channels. Specifically, the tokens from one modality can be used to query the other modality and the output can be concatenated with the query tokens on the channels. An analogous process can also be repeated (or performed in parallel) where the roles of the two modalities are switched. The resulting sets of compound tokens can be concatenated and fed into a self-attention encoder such as a transformer encoder that performs self-attention.
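The channel-fusion step described in the abstract above can be illustrated with a small pure-Python sketch. It is a hedged illustration, not the patented implementation. It assumes both modalities are already embedded at the same width d. It uses single-head dot-product attention with identity query/key/value projections, where a trained model would learn these. It also stands in for the transformer encoder with one unparameterized self-attention pass. All function names here (`attend`, `channel_fusion`, `self_attention_encoder`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are channels


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention.

    Assumption: identity Q/K/V projections, so the context tokens serve as
    both keys and values. A trained model would learn these projections.
    """
    d = len(queries[0])
    width = len(context[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in context])
        # Weighted average of the context tokens, channel by channel.
        out.append([sum(w * tok[j] for w, tok in zip(weights, context))
                    for j in range(width)])
    return out


def concat_channels(a: Matrix, b: Matrix) -> Matrix:
    # Token i of the result is token i of `a` followed by token i of `b`.
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def channel_fusion(image_tokens: Matrix, text_tokens: Matrix) -> Matrix:
    # Image tokens query the text; the output is appended on the channels.
    image_compound = concat_channels(image_tokens, attend(image_tokens, text_tokens))
    # The same step with the roles of the two modalities switched.
    text_compound = concat_channels(text_tokens, attend(text_tokens, image_tokens))
    # Both sets of compound tokens are concatenated along the sequence.
    return image_compound + text_compound


def self_attention_encoder(tokens: Matrix) -> Matrix:
    # Hypothetical stand-in for a transformer encoder: one global self-attention pass.
    return attend(tokens, tokens)


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 image tokens, d = 2
    text = [[0.5, 0.5], [1.0, -1.0]]  # 2 text tokens, d = 2
    fused = channel_fusion(image, text)
    encoded = self_attention_encoder(fused)
    print(len(encoded), len(encoded[0]))  # 5 tokens, 4 channels each
```

Each compound token has width 2d. The first d channels are the original query token, and the last d channels hold what that token gathered from the other modality. Concatenating the two sets along the sequence gives the encoder n_image + n_text tokens, so global self-attention still sees both modalities.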

Type: Application

Filed: September 27, 2023

Publication date: April 11, 2024

Inventors: Anthony J. Piergiovanni, Maxwell Mbabilla Aladago

Multi-Modal Machine Learning Models with Improved Computational Efficiency Via Adaptive Tokenization and Fusion

Publication number: 20230394306

Abstract: Provided is an efficient multi-modal processing model. The multi-modal processing model can process input data from multiple different domains to generate a prediction for a multi-modal processing task. A machine-learned multi-modal processing model can include an adaptive tokenization layer that is configured to adaptively tokenize features generated from the multi-modal inputs into sets of tokens. Specifically, the tokens may have a smaller data size relative to the features from the inputs, which reduces the total number of processing operations performed and thereby improves the efficiency of the model.
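The adaptive tokenization idea in the abstract above can be sketched as attention pooling. Each of K selector vectors scores all N input features, and a softmax over those scores forms one pooled token, so only K < N tokens are passed on. This is a hedged, plain-Python illustration, not the claimed method. The selector scheme, the random stand-in weights (a real model would learn them), the feature counts, fusion by simple sequence concatenation, and all function names are assumptions made for the example.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows are feature vectors or tokens


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_tokenize(features: Matrix, selectors: Matrix) -> Matrix:
    """Pool N feature vectors into K tokens, where K = len(selectors).

    Assumption: each selector is a (normally learned) vector that scores
    every feature; the softmax of those scores weights one pooled token.
    """
    width = len(features[0])
    tokens = []
    for sel in selectors:
        weights = softmax([sum(f * s for f, s in zip(feat, sel)) for feat in features])
        # Each token is a weighted average of all input features.
        tokens.append([sum(w * feat[j] for w, feat in zip(weights, features))
                       for j in range(width)])
    return tokens


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def attention_score_count(num_tokens: int) -> int:
    # Full self-attention compares every token with every other token.
    return num_tokens * num_tokens


if __name__ == "__main__":
    rng = random.Random(0)
    width = 8
    video_features = random_matrix(64, width, rng)  # hypothetical: 64 video features
    audio_features = random_matrix(32, width, rng)  # hypothetical: 32 audio features
    # Random selectors stand in for learned parameters; 8 tokens per modality.
    video_tokens = adaptive_tokenize(video_features, random_matrix(8, width, rng))
    audio_tokens = adaptive_tokenize(audio_features, random_matrix(8, width, rng))
    fused = video_tokens + audio_tokens  # fuse by concatenating along the sequence
    print(len(fused), attention_score_count(96), attention_score_count(len(fused)))
```

With 64 video and 32 audio features, full self-attention over all 96 inputs would compute 96 × 96 = 9,216 pairwise scores. After each modality is pooled to 8 tokens, the 16 fused tokens need only 256. Because attention cost grows quadratically with sequence length, even a modest reduction in token count yields a large saving in operations.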

Type: Application

Filed: June 2, 2023

Publication date: December 7, 2023

Inventors: Anthony J. Piergiovanni, Wei-Cheng Kuo, Anelia Angelova

Small and Fast Video Processing Networks via Neural Architecture Search

Publication number: 20220366257

Abstract: Generally, the present disclosure is directed to a neural architecture search process for finding small and fast video processing networks for understanding of video data. The neural architecture search process can automatically design networks that provide comparable video processing performance at a fraction of the computational and storage cost of larger existing models, thereby conserving computing resources such as memory and processor usage.

Type: Application

Filed: September 16, 2020

Publication date: November 17, 2022

Inventors: Anthony J. Piergiovanni, Anelia Angelova, Michael Sahngwon Ryoo