Patents by Inventor Cordelia Luise Schmid
Cordelia Luise Schmid has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240149906
Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for predicting future trajectories for an agent in an environment. A system obtains scene context data characterizing the environment. The scene context data includes data that characterizes a trajectory of an agent in a vicinity of a vehicle in an environment up to a current time point. The system identifies a plurality of initial target locations in the environment. The system further generates, for each of a plurality of target locations that each corresponds to one of the initial target locations, a respective predicted likelihood score that represents a likelihood that the target location will be an intended final location for a future trajectory of the agent starting from the current time point.
Type: Application
Filed: July 28, 2021
Publication date: May 9, 2024
Inventors: Hang Zhao, Jiyang Gao, Chen Sun, Yi Shen, Yuning Chai, Cordelia Luise Schmid, Congcong Li, Benjamin Sapp, Dragomir Anguelov, Tian Lan, Yue Shen
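The core idea above — scoring a set of candidate target locations by how likely each is to be the agent's intended final position — can be sketched as a minimal NumPy toy. This is not the patented model; the distance-to-extrapolation heuristic and softmax normalization are illustrative assumptions standing in for a learned scoring network:

```python
import numpy as np

def score_target_locations(agent_trajectory, target_locations):
    """Assign each candidate target location a likelihood score of being
    the agent's intended final position.

    agent_trajectory: (T, 2) array of past (x, y) positions.
    target_locations: (N, 2) array of candidate final positions.
    Returns an (N,) array of scores that sum to 1.
    """
    # Extrapolate the agent's heading from its last two observed positions.
    velocity = agent_trajectory[-1] - agent_trajectory[-2]
    predicted_next = agent_trajectory[-1] + velocity
    # Candidates closer to the extrapolated point get larger logits.
    dists = np.linalg.norm(target_locations - predicted_next, axis=1)
    logits = -dists
    # Numerically stable softmax turns logits into likelihood scores.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

A learned model would replace the heuristic logits with the output of a network conditioned on the full scene context, but the score-per-candidate-target interface is the same.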
-
Publication number: 20240127794
Abstract: Systems and methods for performing captioning for image or video data are described herein. The method can include receiving unlabeled multimedia data, and outputting, from a machine learning model, one or more captions for the multimedia data. Training the machine learning model to create these outputs can include inputting a subset of video frames and a first utterance into the machine learning model, using the machine learning model to predict a predicted utterance based on the subset of video frames and the first utterance, and updating one or more parameters of the machine learning model based on a loss function that compares the predicted utterance with a second utterance.
Type: Application
Filed: September 30, 2022
Publication date: April 18, 2024
Inventors: Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Luise Schmid
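The training objective above compares a predicted utterance against a ground-truth second utterance via a loss function. A minimal sketch of such a loss, assuming a standard token-level cross-entropy over vocabulary logits (the abstract does not specify the exact loss):

```python
import numpy as np

def utterance_loss(predicted_logits, target_token_ids):
    """Token-level cross-entropy between predicted utterance logits and
    the ground-truth utterance's token ids.

    predicted_logits: (L, V) array of per-position vocabulary logits.
    target_token_ids: (L,) int array of ground-truth token ids.
    Returns the mean negative log-likelihood.
    """
    # Numerically stable softmax over the vocabulary axis.
    shifted = predicted_logits - predicted_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Negative log-likelihood of the correct token at each position.
    nll = -np.log(probs[np.arange(len(target_token_ids)), target_token_ids])
    return nll.mean()
```

A parameter update would then backpropagate this loss through the model; here only the loss itself is shown.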
-
Patent number: 11763466
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Grant
Filed: December 23, 2020
Date of Patent: September 19, 2023
Assignee: Google LLC
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
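The architecture described above — one shared encoder over an image pair, with two decoder heads reading the same encoded representation — can be sketched in a few lines. The linear layers and dimensions below are toy placeholders, not the patented network:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    # Toy stand-in for a learned layer: a small random weight matrix.
    return rng.standard_normal((in_dim, out_dim)) * 0.1

class SceneMotionModel:
    """Shared-encoder sketch: one encoder consumes a pair of flattened
    images; a structure head and a motion head read the same code."""
    def __init__(self, image_dim, code_dim, struct_dim, motion_dim):
        self.enc = linear(2 * image_dim, code_dim)
        self.struct_head = linear(code_dim, struct_dim)  # scene structure output
        self.motion_head = linear(code_dim, motion_dim)  # motion output

    def forward(self, image_a, image_b):
        # Encode both images jointly into a single representation.
        code = np.tanh(np.concatenate([image_a, image_b]) @ self.enc)
        # Both decoder heads read the same encoded representation.
        return code @ self.struct_head, code @ self.motion_head
```

The key design point is that structure (e.g. depth) and motion are decoded from one joint encoding of the frame pair rather than from separate encoders.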
-
Publication number: 20230177384
Abstract: Example embodiments according to aspects of the present disclosure provide an example computer-implemented method for multimodal data processing with improved cross-modal attention. The example method includes inputting a multimodal sequence to an example machine-learned model. The example model includes a first modal processing stream receiving a first modal portion of the multimodal sequence and a second modal processing stream receiving a second modal portion of the multimodal sequence. The example method includes fusing the first modal processing stream and the second modal processing stream across one or more fusion layers of the machine-learned model through a plurality of cross-modal context encodings. The example method includes outputting an inference based at least in part on the plurality of cross-modal context encodings.
Type: Application
Filed: December 8, 2021
Publication date: June 8, 2023
Inventors: Arsha Nagrani, Shan Yang, Anurag Arnab, Chen Sun, Cordelia Luise Schmid
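One plausible reading of fusing two modal streams "through a plurality of cross-modal context encodings" is a small set of shared context tokens that mediate all cross-modal exchange. The sketch below assumes that interpretation (it is not taken from the patent text) and uses single-head, unparameterized attention for brevity:

```python
import numpy as np

def attention(queries, keys, values):
    """Scaled dot-product attention with a row-wise stable softmax."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

def fuse_streams(video_tokens, audio_tokens, context):
    """One fusion layer: the two streams exchange information only via a
    small set of shared context encodings, never directly."""
    both = np.vstack([video_tokens, audio_tokens])
    # Context tokens gather information from both modalities.
    ctx = attention(context, both, both)
    # Each stream attends over itself plus the shared context.
    v_src = np.vstack([video_tokens, ctx])
    a_src = np.vstack([audio_tokens, ctx])
    v = attention(video_tokens, v_src, v_src)
    a = attention(audio_tokens, a_src, a_src)
    return v, a, ctx
```

Routing all cross-modal traffic through a few context tokens keeps the attention cost low compared with full pairwise attention between every video and audio token.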
-
Publication number: 20230017072
Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.
Type: Application
Filed: July 8, 2021
Publication date: January 19, 2023
Inventors: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Luise Schmid
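The token-extraction step above — turning raw frames into video tokens that carry spatiotemporal information — can be sketched as cutting the clip into non-overlapping spatiotemporal patches and flattening each one. The patch sizes and the plain `ravel` flattening are illustrative; a real model would project each patch through a learned linear embedding before the transformer encoder:

```python
import numpy as np

def extract_video_tokens(video, t_patch, h_patch, w_patch):
    """Cut a video of shape (T, H, W, C) into non-overlapping
    spatiotemporal patches and flatten each into one token vector.

    Returns an array of shape (num_tokens, t_patch * h_patch * w_patch * C).
    """
    T, H, W, C = video.shape
    tokens = []
    for t in range(0, T - t_patch + 1, t_patch):
        for y in range(0, H - h_patch + 1, h_patch):
            for x in range(0, W - w_patch + 1, w_patch):
                # Each patch spans time, height, and width jointly.
                patch = video[t:t + t_patch, y:y + h_patch, x:x + w_patch]
                tokens.append(patch.ravel())
    return np.stack(tokens)
```

Because each token spans multiple frames, temporal information is baked into the token sequence before it ever reaches the transformer encoder.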
-
Patent number: 11163989
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization in images and videos. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform image processing and video processing operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
Type: Grant
Filed: August 6, 2019
Date of Patent: November 2, 2021
Assignee: Google LLC
Inventors: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick
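The relational-feature step above pairs the person's feature with each context position's feature and passes the pairs through a context network. A minimal sketch, with a single shared linear-plus-ReLU layer and mean pooling standing in for the learned context network (both are illustrative assumptions):

```python
import numpy as np

def relational_features(person_feat, context_feats, weight):
    """Compute pooled relational features between a person and a set of
    context positions.

    person_feat: (D,) feature of the detected person.
    context_feats: iterable of (D,) features, one per context position.
    weight: (2 * D, R) shared relation weights (toy stand-in for a network).
    Returns an (R,) relational feature vector.
    """
    # Pair the person with each context position: concatenate features.
    pairs = np.stack([np.concatenate([person_feat, c]) for c in context_feats])
    # Shared relation function (linear + ReLU), then pool over positions.
    return np.maximum(pairs @ weight, 0).mean(axis=0)
```

The pooled relational vector would then be concatenated with the person's own feature for the final action classification described in the abstract.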
-
Publication number: 20210166009
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
Type: Application
Filed: August 6, 2019
Publication date: June 3, 2021
Inventors: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick
-
Publication number: 20210118153
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Application
Filed: December 23, 2020
Publication date: April 22, 2021
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
-
Patent number: 10878583
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Grant
Filed: December 1, 2017
Date of Patent: December 29, 2020
Assignee: Google LLC
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
-
Publication number: 20200349722
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Application
Filed: December 1, 2017
Publication date: November 5, 2020
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki