Patents by Inventor Cordelia Luise Schmid
Cordelia Luise Schmid has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240149906
Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for predicting future trajectories for an agent in an environment. A system obtains scene context data characterizing the environment. The scene context data includes data that characterizes a trajectory of an agent in a vicinity of a vehicle in an environment up to a current time point. The system identifies a plurality of initial target locations in the environment. The system further generates, for each of a plurality of target locations that each corresponds to one of the initial target locations, a respective predicted likelihood score that represents a likelihood that the target location will be an intended final location for a future trajectory of the agent starting from the current time point.
Type: Application
Filed: July 28, 2021
Publication date: May 9, 2024
Inventors: Hang Zhao, Jiyang Gao, Chen Sun, Yi Shen, Yuning Chai, Cordelia Luise Schmid, Congcong Li, Benjamin Sapp, Dragomir Anguelov, Tian Lan, Yue Shen
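The core idea above — scoring a set of candidate target locations by how likely each is to be the agent's intended final position — can be sketched as a minimal NumPy toy. This is not the patented model; the distance-to-extrapolation heuristic and softmax normalization are illustrative assumptions standing in for a learned scoring network:

```python
import numpy as np

def score_target_locations(agent_trajectory, target_locations):
    """Assign each candidate target location a likelihood score of being
    the agent's intended final position.

    agent_trajectory: (T, 2) array of past (x, y) positions.
    target_locations: (N, 2) array of candidate final positions.
    Returns an (N,) array of scores that sum to 1.
    """
    # Extrapolate the agent's heading from its last two observed positions.
    velocity = agent_trajectory[-1] - agent_trajectory[-2]
    predicted_next = agent_trajectory[-1] + velocity
    # Candidates closer to the extrapolated point get larger logits.
    dists = np.linalg.norm(target_locations - predicted_next, axis=1)
    logits = -dists
    # Numerically stable softmax turns logits into likelihood scores.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

A learned model would replace the heuristic logits with the output of a network conditioned on the full scene context, but the score-per-candidate-target interface is the same.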
-
Publication number: 20240127794
Abstract: Systems and methods for performing captioning for image or video data are described herein. The method can include receiving unlabeled multimedia data, and outputting, from a machine learning model, one or more captions for the multimedia data. Training the machine learning model to create these outputs can include inputting a subset of video frames and a first utterance into the machine learning model, using the machine learning model to predict a predicted utterance based on the subset of video frames and the first utterance, and updating one or more parameters of the machine learning model based on a loss function that compares the predicted utterance with a second utterance.
Type: Application
Filed: September 30, 2022
Publication date: April 18, 2024
Inventors: Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Luise Schmid
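The training objective above compares a predicted utterance against a ground-truth second utterance via a loss function. A minimal sketch of such a loss, assuming a standard token-level cross-entropy over vocabulary logits (the abstract does not specify the exact loss):

```python
import numpy as np

def utterance_loss(predicted_logits, target_token_ids):
    """Token-level cross-entropy between predicted utterance logits and
    the ground-truth utterance's token ids.

    predicted_logits: (L, V) array of per-position vocabulary logits.
    target_token_ids: (L,) int array of ground-truth token ids.
    Returns the mean negative log-likelihood.
    """
    # Numerically stable softmax over the vocabulary axis.
    shifted = predicted_logits - predicted_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Negative log-likelihood of the correct token at each position.
    nll = -np.log(probs[np.arange(len(target_token_ids)), target_token_ids])
    return nll.mean()
```

A parameter update would then backpropagate this loss through the model; here only the loss itself is shown.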
-
Patent number: 11763466
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Grant
Filed: December 23, 2020
Date of Patent: September 19, 2023
Assignee: Google LLC
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
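The architecture described above — one shared encoder over an image pair, with two decoder heads reading the same encoded representation — can be sketched in a few lines. The linear layers and dimensions below are toy placeholders, not the patented network:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    # Toy stand-in for a learned layer: a small random weight matrix.
    return rng.standard_normal((in_dim, out_dim)) * 0.1

class SceneMotionModel:
    """Shared-encoder sketch: one encoder consumes a pair of flattened
    images; a structure head and a motion head read the same code."""
    def __init__(self, image_dim, code_dim, struct_dim, motion_dim):
        self.enc = linear(2 * image_dim, code_dim)
        self.struct_head = linear(code_dim, struct_dim)  # scene structure output
        self.motion_head = linear(code_dim, motion_dim)  # motion output

    def forward(self, image_a, image_b):
        # Encode both images jointly into a single representation.
        code = np.tanh(np.concatenate([image_a, image_b]) @ self.enc)
        # Both decoder heads read the same encoded representation.
        return code @ self.struct_head, code @ self.motion_head
```

The key design point is that structure (e.g. depth) and motion are decoded from one joint encoding of the frame pair rather than from separate encoders.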
-
Publication number: 20230177384
Abstract: Example embodiments according to aspects of the present disclosure provide an example computer-implemented method for multimodal data processing with improved cross-modal attention. The example method includes inputting a multimodal sequence to an example machine-learned model. The example model includes a first modal processing stream receiving a first modal portion of the multimodal sequence and a second modal processing stream receiving a second modal portion of the multimodal sequence. The example method includes fusing the first modal processing stream and the second modal processing stream across one or more fusion layers of the machine-learned model through a plurality of cross-modal context encodings. The example method includes outputting an inference based at least in part on the plurality of cross-modal context encodings.
Type: Application
Filed: December 8, 2021
Publication date: June 8, 2023
Inventors: Arsha Nagrani, Shan Yang, Anurag Arnab, Chen Sun, Cordelia Luise Schmid
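One plausible reading of fusing two modal streams "through a plurality of cross-modal context encodings" is a small set of shared context tokens that mediate all cross-modal exchange. The sketch below assumes that interpretation (it is not taken from the patent text) and uses single-head, unparameterized attention for brevity:

```python
import numpy as np

def attention(queries, keys, values):
    """Scaled dot-product attention with a row-wise stable softmax."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

def fuse_streams(video_tokens, audio_tokens, context):
    """One fusion layer: the two streams exchange information only via a
    small set of shared context encodings, never directly."""
    both = np.vstack([video_tokens, audio_tokens])
    # Context tokens gather information from both modalities.
    ctx = attention(context, both, both)
    # Each stream attends over itself plus the shared context.
    v_src = np.vstack([video_tokens, ctx])
    a_src = np.vstack([audio_tokens, ctx])
    v = attention(video_tokens, v_src, v_src)
    a = attention(audio_tokens, a_src, a_src)
    return v, a, ctx
```

Routing all cross-modal traffic through a few context tokens keeps the attention cost low compared with full pairwise attention between every video and audio token.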
-
Publication number: 20230017072
Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.
Type: Application
Filed: July 8, 2021
Publication date: January 19, 2023
Inventors: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Luise Schmid
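The token-extraction step above — turning raw frames into video tokens that carry spatiotemporal information — can be sketched as cutting the clip into non-overlapping spatiotemporal patches and flattening each one. The patch sizes and the plain `ravel` flattening are illustrative; a real model would project each patch through a learned linear embedding before the transformer encoder:

```python
import numpy as np

def extract_video_tokens(video, t_patch, h_patch, w_patch):
    """Cut a video of shape (T, H, W, C) into non-overlapping
    spatiotemporal patches and flatten each into one token vector.

    Returns an array of shape (num_tokens, t_patch * h_patch * w_patch * C).
    """
    T, H, W, C = video.shape
    tokens = []
    for t in range(0, T - t_patch + 1, t_patch):
        for y in range(0, H - h_patch + 1, h_patch):
            for x in range(0, W - w_patch + 1, w_patch):
                # Each patch spans time, height, and width jointly.
                patch = video[t:t + t_patch, y:y + h_patch, x:x + w_patch]
                tokens.append(patch.ravel())
    return np.stack(tokens)
```

Because each token spans multiple frames, temporal information is baked into the token sequence before it ever reaches the transformer encoder.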
-
Patent number: 11163989
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization in images and videos. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform image processing and video processing operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
Type: Grant
Filed: August 6, 2019
Date of Patent: November 2, 2021
Assignee: Google LLC
Inventors: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick
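The relational-feature step above pairs the person's feature with each context position's feature and passes the pairs through a context network. A minimal sketch, with a single shared linear-plus-ReLU layer and mean pooling standing in for the learned context network (both are illustrative assumptions):

```python
import numpy as np

def relational_features(person_feat, context_feats, weight):
    """Compute pooled relational features between a person and a set of
    context positions.

    person_feat: (D,) feature of the detected person.
    context_feats: iterable of (D,) features, one per context position.
    weight: (2 * D, R) shared relation weights (toy stand-in for a network).
    Returns an (R,) relational feature vector.
    """
    # Pair the person with each context position: concatenate features.
    pairs = np.stack([np.concatenate([person_feat, c]) for c in context_feats])
    # Shared relation function (linear + ReLU), then pool over positions.
    return np.maximum(pairs @ weight, 0).mean(axis=0)
```

The pooled relational vector would then be concatenated with the person's own feature for the final action classification described in the abstract.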
-
Publication number: 20210166009
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
Type: Application
Filed: August 6, 2019
Publication date: June 3, 2021
Inventors: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick
-
Publication number: 20210118153
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Application
Filed: December 23, 2020
Publication date: April 22, 2021
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
-
Patent number: 10878583
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Grant
Filed: December 1, 2017
Date of Patent: December 29, 2020
Assignee: Google LLC
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
-
Publication number: 20200349722
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Application
Filed: December 1, 2017
Publication date: November 5, 2020
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki