Patents by Inventor Tomas Pfister

Tomas Pfister has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Aggregating nested vision transformers

Patent number: 12327395

Abstract: A method includes receiving image data including a series of image patches of an image. The method includes generating, using a first set of transformers of a vision transformer (V-T) model, a first set of higher order feature representations based on the series of image patches and aggregating the first set of higher order feature representations into a second set of higher order feature representations that is smaller than the first set. The method includes generating, using a second set of transformers of the V-T model, a third set of higher order feature representations based on the second set of higher order feature representations and aggregating the third set of higher order feature representations into a fourth set of higher order feature representations that is smaller than the third set. The method includes generating, using the V-T model, an image classification of the image based on the fourth set.

Type: Grant

Filed: May 20, 2022

Date of Patent: June 10, 2025

Assignee: GOOGLE LLC

Inventors: Zizhao Zhang, Han Zhang, Long Zhao, Tomas Pfister
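
The two-stage "transform, then aggregate into a smaller set" structure described in the abstract above can be sketched roughly as follows. This is an illustrative toy, not the patented method: the helper names `transform`, `aggregate`, and `classify`, the use of mean pooling over neighbouring features, and the group size of 2 are all assumptions made for this example, and `transform` is only a placeholder for a real set of transformer layers.

```python
from typing import List

Vector = List[float]

def transform(features: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a set of transformers; keeps the feature count."""
    return [[2.0 * x for x in f] for f in features]

def aggregate(features: List[Vector], group: int = 2) -> List[Vector]:
    """Assumed aggregation: mean-pool each run of `group` neighbouring features."""
    pooled = []
    for i in range(0, len(features), group):
        chunk = features[i:i + group]
        dim = len(chunk[0])
        pooled.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return pooled

def classify(features: List[Vector]) -> int:
    """Toy classification head: index of the largest mean-pooled dimension."""
    dim = len(features[0])
    mean = [sum(v[d] for v in features) / len(features) for d in range(dim)]
    return max(range(dim), key=lambda d: mean[d])

# Eight toy "image patches", each a 2-dimensional vector.
patches = [[float(i), float(8 - i)] for i in range(8)]
first = transform(patches)    # first set of feature representations (8)
second = aggregate(first)     # second set, smaller than the first (4)
third = transform(second)     # third set, built from the second (4)
fourth = aggregate(third)     # fourth set, smaller than the third (2)
label = classify(fourth)      # classification based on the fourth set
```

Each aggregation step halves the number of feature representations here; the abstract only requires that each aggregated set be smaller than the set it was built from.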
Self-Supervised Learning for Temporal Counterfactual Estimation

Publication number: 20250111285

Abstract: A machine-learned model includes an encoder having a feature block configured to embed input data into a plurality of features in an embedding space. The input data includes multiple components such as covariate, treatment, and output components. The encoder includes one or more encoding layers, each including a temporal attention block and a feature-wise attention block. The temporal attention block is configured to obtain the embedded input data and apply temporal causal attention along a time dimension in parallel for each feature of the plurality of features to generate temporal embeddings. The feature-wise attention block is configured to obtain the temporal embeddings and generate component representations such as a covariate representation, a treatment representation, and an output representation.

Type: Application

Filed: September 30, 2024

Publication date: April 3, 2025

Inventors: Yan Liu, Chuizheng Meng, Yihe Dong, Sercan Omer Arik, Tomas Pfister
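
The idea of applying temporal causal attention along the time dimension, independently for each feature, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names `causal_attention` and `temporal_block`, the use of scalar dot-product scores with no learned projections, and the toy input. It shows only the causal-masking property (step t attends to steps up to and including t), not the model described in the application.

```python
import math
from typing import List

def causal_attention(series: List[float]) -> List[float]:
    """Scalar self-attention over time; step t attends only to steps <= t."""
    out = []
    for t in range(len(series)):
        # Scores against the current and earlier steps only (the causal mask).
        scores = [series[t] * series[s] for s in range(t + 1)]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(sc - peak) for sc in scores]
        total = sum(weights)
        out.append(sum(w * series[s] for s, w in enumerate(weights)) / total)
    return out

def temporal_block(features: List[List[float]]) -> List[List[float]]:
    """Apply causal attention to each feature's time series independently."""
    return [causal_attention(f) for f in features]

# Two toy features observed over three time steps.
embedded = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
temporal_embeddings = temporal_block(embedded)
```

Because of the mask, changing a later time step never alters the outputs for earlier steps, which is the property that makes the attention causal.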
Tool Documentation Enables Zero-Shot Tool-Usage With Large Language Models

Publication number: 20250036886

Abstract: Using a large language model to comply with a user request. The large language model receives tool documentation for each of one or more tools, and analyzes the tool documentation for each of the one or more tools to determine, for each tool, one or more tasks that the tool is operable to perform. Upon receiving a request from a user, the large language model generates a plan for complying with the request by using one or more of the tools, the plan including performance of one or more of the tasks.

Type: Application

Filed: July 9, 2024

Publication date: January 30, 2025

Inventors: Chen-Yu Lee, Alexander Ratner, Tomas Pfister, Chun-Liang Li, Yasuhisa Fujii, Ranjay Krishna, Cheng-Yu Hsieh, Si-An Chen
Multimodal Learning from Structured and Unstructured Data

Publication number: 20240386321

Abstract: Aspects of the disclosure are directed to a multimodal processing system for processing both structured and unstructured data. Real-world data is not always consistent in form or content. The multimodal processing system includes a model that can be trained to account for this characteristic of real-world data, by selectively masking data of different modalities during pretraining to learn outputs that are the same or comparable between the masked and unmasked inputs. The model is trained according to modality-specific masking objectives computed for each modality of data and joint modality similarity-based masking objectives for a joint representation of the data across all modalities. The system provides consistent and accurate input, even when input data may have substantial portions of data from different modalities missing. Cross-modal relationships in data are reinforced by the model as different portions of data are masked, contributing to an overall increase in model accuracy versus other approaches.

Type: Application

Filed: April 18, 2024

Publication date: November 21, 2024

Inventors: Sayna Ebrahimi, Yihe Dong, Tomas Pfister, Sercan Omer Arik
STRUCTURAL ENCODING AND ATTENTION PARADIGMS FOR SEQUENCE MODELING

Publication number: 20240354504

Abstract: Systems and methods for providing a structure-aware sequence model that can interpret a document's text without first inferring the proper reading order of the document. In some examples, the model may use a graph convolutional network to generate contextualized “supertoken” embeddings for each token, which are then fed to a transformer that employs a sparse attention paradigm in which attention weights for at least some supertokens are modified based on differences between predicted and actual values of the order and distance between the attender and attendee supertokens.

Type: Application

Filed: August 25, 2021

Publication date: October 24, 2024

Inventors: Chen-Yu Lee, Chun-Liang Li, Timothy Dozat, Vincent Perot, Guolong Su, Nan Hua, Joshua Ainslie, Renshen Wang, Yasuhisa Fujii, Tomas Pfister
Data valuation using reinforcement learning

Patent number: 12106223

Abstract: A method includes obtaining a batch of training samples. For each particular training sample in the batch of training samples, the method includes generating, using a data value estimator model and the particular training sample, a corresponding predicted value of the particular training sample when used to train a machine learning model. The method includes selecting, based on the corresponding predicted values, a subset of the batch of training samples. For each particular training sample in the subset of the batch of training samples, the method includes determining, using the machine learning model and the particular training sample, a corresponding prediction performance measurement. The method includes adjusting one or more estimator parameter values of the data value estimator model based on the corresponding prediction performance measurements.

Type: Grant

Filed: June 12, 2023

Date of Patent: October 1, 2024

Assignee: GOOGLE LLC

Inventors: Sercan Omer Arik, Jinsung Yoon, Tomas Pfister
Self-Improving LLMs through Consistency-Based Self-Generated Demonstrations

Publication number: 20240249080

Abstract: Aspects of the disclosure are directed to automatically selecting examples in a prompt for an LLM to demonstrate how to perform tasks. Aspects of the disclosure can select and build a set of examples from LLM zero-shot outputs via predetermined criteria that can combine consistency, diversity, and repetition. In the zero-shot setting for three different LLMs, using only LLM predictions, aspects of the disclosure can improve performance up to 15% compared to zero-shot baselines and can match or exceed few-shot baselines for a range of reasoning tasks.

Type: Application

Filed: March 30, 2023

Publication date: July 25, 2024

Inventors: Ruoxi Sun, Xingchen Wan, Hanjun Dai, Sercan Omer Arik, Tomas Pfister
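
A minimal sketch of selecting demonstrations from zero-shot outputs is shown below. It covers only a consistency criterion, scored as the fraction of sampled answers that agree with the majority answer; that scoring rule, the function name `select_demonstrations`, and the sample data are assumptions for illustration. The diversity and repetition criteria mentioned in the abstract are not modelled.

```python
from collections import Counter
from typing import Dict, List, Tuple

def select_demonstrations(
    samples: Dict[str, List[str]], k: int = 2
) -> List[Tuple[str, str, float]]:
    """Rank questions by answer agreement and keep the top k as demonstrations.

    `samples` maps each question to several zero-shot answers sampled from an
    LLM. Consistency is the share of samples matching the majority answer.
    """
    scored = []
    for question, answers in samples.items():
        majority, count = Counter(answers).most_common(1)[0]
        scored.append((question, majority, count / len(answers)))
    # Highest-consistency questions first; ties keep their input order.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]

# Hypothetical zero-shot outputs: four sampled answers per question.
zero_shot_outputs = {
    "2 + 2 = ?": ["4", "4", "4", "4"],
    "17 * 3 = ?": ["51", "51", "41", "51"],
    "sqrt(50) rounded = ?": ["7", "8", "6", "7"],
}
demos = select_demonstrations(zero_shot_outputs, k=2)
```

The selected question and majority-answer pairs could then be placed in the prompt as pseudo-demonstrations, with no labelled examples required.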
Generating Synthetic Heterogenous Time-Series Data

Publication number: 20240185043

Abstract: The present disclosure provides a generative modeling framework for generating highly realistic and privacy preserving synthetic records for heterogenous time-series data, such as electronic health record data, financial data, etc. The generative modeling framework is based on a two-stage model that includes sequential encoder-decoder networks and generative adversarial networks (GANs).

Type: Application

Filed: November 13, 2023

Publication date: June 6, 2024

Inventors: Jinsung Yoon, Michel Jonathan Mizrahi, Nahid Farhady Ghalaty, Thomas Dunn Henry Jarvinen, Ashwin Sura Ravi, Peter Robert Brune, Fanyu Kong, David Roger Anderson, George Lee, Farhana Bandukwala, Eliezer Yosef Kanal, Sercan Omer Arik, Tomas Pfister
Test-Time Adaptation for Visual Document Understanding

Publication number: 20230377359

Abstract: An aspect of the disclosed technology comprises a test-time adaptation (“TTA”) technique for visual document understanding (“VDU”) tasks that uses self-supervised learning on different modalities (e.g., text and layout) by applying masked visual language modeling (“MVLM”) along with pseudo-labeling. In accordance with an aspect of the disclosed technology, the TTA technique enables a document model to adapt to domain or distribution shifts that are detected.

Type: Application

Filed: May 18, 2023

Publication date: November 23, 2023

Applicant: Google LLC

Inventors: Sayna Ebrahimi, Sercan Omer Arik, Tomas Pfister
DATA VALUATION USING REINFORCEMENT LEARNING

Publication number: 20230325675

Abstract: A method includes obtaining a batch of training samples. For each particular training sample in the batch of training samples, the method includes generating, using a data value estimator model and the particular training sample, a corresponding predicted value of the particular training sample when used to train a machine learning model. The method includes selecting, based on the corresponding predicted values, a subset of the batch of training samples. For each particular training sample in the subset of the batch of training samples, the method includes determining, using the machine learning model and the particular training sample, a corresponding prediction performance measurement. The method includes adjusting one or more estimator parameter values of the data value estimator model based on the corresponding prediction performance measurements.

Type: Application

Filed: June 12, 2023

Publication date: October 12, 2023

Applicant: Google LLC

Inventors: Sercan Omer Arik, Jinsung Yoon, Tomas Pfister
Complementary Prompting For Rehearsal-Free Continual Learning

Publication number: 20230274143

Abstract: A method for rehearsal-free continual learning includes obtaining a set of training samples where each training sample in the set of training samples is associated with a respective task of a plurality of different tasks. The method includes obtaining a task-invariant prompt representative of learned knowledge common to each respective task of the plurality of different tasks. The method includes, for each respective task of the plurality of different tasks, obtaining a respective task-specific prompt representative of learned knowledge specific to the respective task. The method includes, during each of one or more training iterations, for each respective training sample in the set of training samples, selecting the respective task-specific prompt representative of the respective task of the respective training sample and training a model using the task-invariant prompt and the selected respective task-specific prompt.

Type: Application

Filed: February 24, 2023

Publication date: August 31, 2023

Applicant: Google LLC

Inventors: Zizhao Zhang, Zifeng Wang, Chen-Yu Lee, Ruoxi Sun, Sayna Ebrahimi, Xiaoqi Ren, Guolong Su, Vincent Perot, Tomas Pfister, Han Zhang
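
The prompt-selection step in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions: prompts and samples are represented as plain token lists, the function name `build_model_input` is hypothetical, and prepending both prompts to the sample is one assumed way of "using" them. No training is performed here.

```python
from typing import Dict, List

def build_model_input(
    sample_tokens: List[str],
    task_id: str,
    task_invariant_prompt: List[str],
    task_specific_prompts: Dict[str, List[str]],
) -> List[str]:
    """Pair the shared prompt with the prompt for this sample's task."""
    # Select the task-specific prompt that matches the sample's task.
    specific = task_specific_prompts[task_id]
    # Assumed combination: shared prompt, then task prompt, then the sample.
    return task_invariant_prompt + specific + sample_tokens

# Hypothetical prompts: one shared across tasks, one per task.
shared_prompt = ["<shared>"]
per_task_prompts = {"task_a": ["<task_a>"], "task_b": ["<task_b>"]}

model_input = build_model_input(["x1", "x2"], "task_b", shared_prompt, per_task_prompts)
```

In a training loop, this input would be built for every sample in every iteration, so each sample is always paired with the prompt for its own task plus the shared prompt.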
Self-Adapting Forecasting For Multi-Horizon Forecasting Machine Learning Models

Publication number: 20230110117

Abstract: Aspects of the disclosure provide for self-adapting forecasting (SAF) during the training and execution of machine learning models trained for multi-horizon forecasting on time-series data. The distribution of time-series data can shift over different periods of time. A deep neural network and other types of machine learning models are trained assuming that training data is independent and identically distributed (i.i.d.). With a computer system configured to execute SAF, the system can, at inference time, update a trained encoder to generate an encoded representation of time-series data capturing features characterizing the current distribution of the input time-series data. The updated encoded representation can be fed into a decoder trained to generate a multi-horizon forecast based on the updated encoded representation of the time-series data. At each instance of inference, the base weights of a trained model can be reused and updated to generate an updated encoded representation for that instance.

Type: Application

Filed: September 28, 2022

Publication date: April 13, 2023

Inventors: Sercan Omer Arik, Nathanael Christian Yoder, Tomas Pfister
Processing Multi-Horizon Forecasts For Time Series Data

Publication number: 20230018125

Abstract: Methods, systems, and apparatus, including computer storage media, for performing multi-horizon forecasting on time-series data. A method includes determining short-term temporal characteristics for respective forecasting horizons of one or more time-steps. The determining can include generating, using RNN encoders, encoder vectors based on static covariates and time-varying input data; and predicting, using one or more RNN decoders, a short-term pattern for a respective future time period. The method can also include capturing long-term temporal characteristics for the respective forecasting horizons based on the static covariates, the time-varying input data captured during the respective past time-periods, and the time-varying known future input data.

Type: Application

Filed: November 25, 2020

Publication date: January 19, 2023

Inventors: Si Jie Bryan Lim, Sercan Omer Arik, Nicolas Loeff, Tomas Pfister
Aggregating Nested Vision Transformers

Publication number: 20220375205

Abstract: A method includes receiving image data including a series of image patches of an image. The method includes generating, using a first set of transformers of a vision transformer (V-T) model, a first set of higher order feature representations based on the series of image patches and aggregating the first set of higher order feature representations into a second set of higher order feature representations that is smaller than the first set. The method includes generating, using a second set of transformers of the V-T model, a third set of higher order feature representations based on the second set of higher order feature representations and aggregating the third set of higher order feature representations into a fourth set of higher order feature representations that is smaller than the third set. The method includes generating, using the V-T model, an image classification of the image based on the fourth set.

Type: Application

Filed: May 20, 2022

Publication date: November 24, 2022

Applicant: Google LLC

Inventors: Zizhao Zhang, Han Zhang, Long Zhao, Tomas Pfister
Deep Neural Network Learning With Controllable Rules

Publication number: 20220245451

Abstract: The present disclosure provides a method to integrate prior knowledge (referred to as rules) into deep learning in a way that can be controllable at inference without retraining or tuning the model. Deep Neural Networks with Controllable Rule Representations (DNN-CRR) incorporate a rule encoder into the model architecture, which is coupled with a corresponding rule-based objective for enabling a shared representation to be used in decision making by learning both the original task and the rule. DNN-CRR is agnostic to data type and encoder architecture and can be applied to any kind of rule defined for inputs and/or outputs, including in real-world domains where incorporating rules is critical, such as prediction tasks in Physics, Retail, and Healthcare.

Type: Application

Filed: February 3, 2022

Publication date: August 4, 2022

Inventors: Sercan Omer Arik, Sungyong Seo, Minho Jin, Jinsung Yoon, Tomas Pfister
SEARCH RANKING OF WEB-BASED SOCIAL CONTENT AGGREGATIONS

Publication number: 20150254252

Abstract: In embodiments of the present invention, improved capabilities are described for a content aggregation ranking facility adapted to rank a plurality of web-based content aggregations based on a search term, where each web-based content aggregation is comprised of a plurality of visual web-linked content comprising an image that is linked to a uniform resource locator (URL), and where the ranking may be determined based, at least in part, via determining a correlation between the search term and a characteristic of the plurality of web-based content aggregations, and ranking the plurality of web-based content aggregations based on the strength of that correlation.

Type: Application

Filed: May 19, 2015

Publication date: September 10, 2015

Inventors: Jamil Khalil, Stephen Doyle, Joseph Wee, Christopher Byatte, Tomas Pfister
Automated recognition algorithm for detecting facial expressions

Patent number: 8848068

Abstract: This document discloses a solution for detecting human facial micro-expressions automatically by a video analysis system. Facial micro-expressions are involuntary expressions having a very short duration.

Type: Grant

Filed: May 8, 2012

Date of Patent: September 30, 2014

Assignee: Oulun Yliopisto

Inventors: Tomas Pfister, Matti Pietikäinen, Xiaobai Li, Guoying Zhao
Automated Recognition Algorithm For Detecting Facial Expressions

Publication number: 20130300900

Abstract: This document discloses a solution for detecting human facial micro-expressions automatically by a video analysis system. Facial micro-expressions are involuntary expressions having a very short duration.

Type: Application

Filed: May 8, 2012

Publication date: November 14, 2013

Inventors: Tomas Pfister, Matti Pietikäinen, Xiaobai Li, Guoying Zhao