Patents by Inventor Cordelia Luise Schmid

Cordelia Luise Schmid has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240149906
    Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for predicting future trajectories for an agent in an environment. A system obtains scene context data characterizing the environment. The scene context data includes data that characterizes a trajectory of an agent in a vicinity of a vehicle in an environment up to a current time point. The system identifies a plurality of initial target locations in the environment. The system further generates, for each of a plurality of target locations that each corresponds to one of the initial target locations, a respective predicted likelihood score that represents a likelihood that the target location will be an intended final location for a future trajectory of the agent starting from the current time point.
    Type: Application
    Filed: July 28, 2021
    Publication date: May 9, 2024
    Inventors: Hang Zhao, Jiyang Gao, Chen Sun, Yi Shen, Yuning Chai, Cordelia Luise Schmid, Congcong Li, Benjamin Sapp, Dragomir Anguelov, Tian Lan, Yue Shen
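    As a rough illustration of the scoring step this abstract describes, the sketch below assigns each candidate target location a likelihood score. All names are hypothetical and the distance-based scorer is a toy stand-in for the learned scoring network in the patent.

    ```python
    import math

    def score_target_locations(agent_xy, targets):
        """Toy likelihood scorer: softmax over negative Euclidean distance
        from the agent's current position. A stand-in for a learned model
        that scores each candidate final location of the future trajectory."""
        logits = [-math.dist(agent_xy, t) for t in targets]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        return [e / z for e in exps]

    # Nearer candidate targets receive higher likelihood scores.
    scores = score_target_locations((0.0, 0.0), [(1.0, 0.0), (5.0, 0.0), (10.0, 0.0)])
    print(scores)
    ```

    The softmax keeps the scores comparable across candidates (they sum to one), which is why likelihood-style scoring over a discrete target set is often framed this way.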
  • Publication number: 20240127794
    Abstract: Systems and methods for performing captioning of image or video data are described herein. The method can include receiving unlabeled multimedia data, and outputting, from a machine learning model, one or more captions for the multimedia data. Training the machine learning model to create these outputs can include inputting a subset of video frames and a first utterance into the machine learning model, using the machine learning model to generate a predicted utterance based on the subset of video frames and the first utterance, and updating one or more parameters of the machine learning model based on a loss function that compares the predicted utterance with a second utterance.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 18, 2024
    Inventors: Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Luise Schmid
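    The training step in this abstract can be sketched as: predict an utterance from frames plus a first utterance, then compute a loss against a second (target) utterance. Everything below is a toy stand-in (hypothetical names, a trivial "model", a 0/1 token-mismatch loss rather than the patent's actual loss function).

    ```python
    def toy_predict(frames, first_utterance):
        """Stand-in for the learned captioning model: here it simply echoes
        the first utterance instead of predicting from the video frames."""
        return list(first_utterance)

    def utterance_loss(predicted, target):
        """Toy 0/1 token-mismatch loss between two token sequences."""
        n = max(len(predicted), len(target), 1)
        mismatches = sum(p != t for p, t in zip(predicted, target))
        mismatches += abs(len(predicted) - len(target))
        return mismatches / n

    frames = ["frame0", "frame1"]                   # subset of video frames
    first = ["a", "person", "opens", "a", "door"]   # first utterance (input)
    second = ["the", "person", "walks", "through"]  # second utterance (target)

    predicted = toy_predict(frames, first)
    loss = utterance_loss(predicted, second)
    print(loss)
    ```

    In actual training, the loss would be differentiable (e.g. cross-entropy over a vocabulary) so its gradient can update the model parameters, as the abstract describes.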
  • Patent number: 11763466
    Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: September 19, 2023
    Assignee: Google LLC
    Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
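    The shared-encoder / two-decoder shape described in this abstract (one head for scene structure, one for motion) can be sketched with toy stand-ins. All function names are hypothetical, and the "networks" here are trivial arithmetic, not learned models.

    ```python
    def encode(frame_a, frame_b):
        """Stand-in encoder: summarizes each frame by its mean intensity,
        producing one shared representation for both decoder heads."""
        mean = lambda im: sum(sum(row) for row in im) / (len(im) * len(im[0]))
        return (mean(frame_a), mean(frame_b))

    def decode_structure(code):
        """Stand-in structure head: a scalar 'depth' from frame-A features."""
        return {"depth": code[0]}

    def decode_motion(code):
        """Stand-in motion head: brightness change between the two frames
        as a crude proxy for inter-frame motion."""
        return {"motion": code[1] - code[0]}

    frame_a = [[0.0, 0.0], [0.0, 0.0]]
    frame_b = [[1.0, 1.0], [1.0, 1.0]]

    code = encode(frame_a, frame_b)
    structure = decode_structure(code)
    motion = decode_motion(code)
    print(structure, motion)
    ```

    The design point the abstract encodes is that both outputs are decoded from the same shared representation, so structure and motion estimation can share learned features.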
  • Publication number: 20230177384
    Abstract: Example embodiments according to aspects of the present disclosure provide an example computer-implemented method for multimodal data processing with improved cross-modal attention. The example method includes inputting a multimodal sequence to an example machine-learned model. The example model includes a first modal processing stream receiving a first modal portion of the multimodal sequence and a second modal processing stream receiving a second modal portion of the multimodal sequence. The example model includes fusing the first modal processing stream and the second modal processing stream across one or more fusion layers of the machine-learned model through a plurality of cross-modal context encodings. The example method includes outputting an inference based at least in part on the plurality of cross-modal context encodings.
    Type: Application
    Filed: December 8, 2021
    Publication date: June 8, 2023
    Inventors: Arsha Nagrani, Shan Yang, Anurag Arnab, Chen Sun, Cordelia Luise Schmid
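    The cross-modal fusion this abstract describes (tokens from one modal stream attending to tokens from another) can be sketched as a single attention step. This is a minimal illustration with hypothetical names, not the patent's architecture; the random vectors stand in for per-stream token embeddings.

    ```python
    import numpy as np

    def cross_modal_attention(queries, keys_values):
        """Single-head scaled dot-product attention where one modality's
        tokens (queries) attend to another modality's tokens (keys_values)."""
        d = queries.shape[-1]
        scores = queries @ keys_values.T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ keys_values

    # Toy token embeddings for two modal streams (4 audio tokens, 6 video
    # tokens, 8-dimensional), standing in for the two processing streams.
    audio = np.random.default_rng(0).normal(size=(4, 8))
    video = np.random.default_rng(1).normal(size=(6, 8))

    fused = cross_modal_attention(audio, video)  # one audio token per row,
    print(fused.shape)                           # now carrying video context
    ```

    In a fusion-layer design like the one described, such cross-modal encodings would be computed in both directions and mixed back into each stream over one or more layers.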
  • Publication number: 20230017072
    Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.
    Type: Application
    Filed: July 8, 2021
    Publication date: January 19, 2023
    Inventors: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Luise Schmid
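    The token-extraction step this abstract describes (video tokens representing spatiotemporal information, fed to a transformer encoder) can be illustrated by cutting a video volume into non-overlapping spatio-temporal patches. A minimal sketch, assuming a grayscale `(T, H, W)` array and hypothetical function names:

    ```python
    import numpy as np

    def extract_tubelet_tokens(video, t=2, h=4, w=4):
        """Split a video volume (T, H, W) into non-overlapping spatio-temporal
        'tubelets' of shape (t, h, w) and flatten each into one token vector,
        giving the transformer encoder a sequence of video tokens."""
        T, H, W = video.shape
        tokens = []
        for ti in range(0, T - t + 1, t):
            for hi in range(0, H - h + 1, h):
                for wi in range(0, W - w + 1, w):
                    tokens.append(video[ti:ti + t, hi:hi + h, wi:wi + w].reshape(-1))
        return np.stack(tokens)

    video = np.zeros((4, 8, 8))           # 4 frames of 8x8 pixels
    tokens = extract_tubelet_tokens(video)
    print(tokens.shape)                   # 8 tokens, each of length 2*4*4
    ```

    Because each token spans several frames, it carries temporal as well as spatial information, which is what distinguishes video tokens from per-frame image patches.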
  • Patent number: 11163989
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization in images and videos. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform image processing and video processing operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
    Type: Grant
    Filed: August 6, 2019
    Date of Patent: November 2, 2021
    Assignee: Google LLC
    Inventors: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick
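    The pipeline in this abstract (pair the person's feature with features from context positions, obtain relational features, then classify the action) can be sketched with toy stand-ins. The elementwise-product "relation" and threshold "classifier" below are hypothetical placeholders for the context neural network and action classifier.

    ```python
    def relational_features(person_feat, context_feats):
        """Toy relation module: pair the person's feature vector with each
        context position's feature via an elementwise product (a stand-in
        for the learned context neural network)."""
        return [[p * c for p, c in zip(person_feat, cf)] for cf in context_feats]

    def classify_action(person_feat, rel_feats):
        """Stand-in classifier: thresholds the sum of person and relational
        features to pick one of two toy action labels."""
        total = sum(person_feat) + sum(sum(r) for r in rel_feats)
        return "moving" if total > 1.0 else "static"

    person = [0.5, 0.5]                     # feature of the detected person
    contexts = [[1.0, 1.0], [2.0, 0.0]]     # features of two context positions

    rel = relational_features(person, contexts)
    action = classify_action(person, rel)
    print(action)
    ```

    The point the abstract makes is that the action decision uses both the person's own features and their relationships to the surrounding context, rather than the person crop alone.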
  • Publication number: 20210166009
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
    Type: Application
    Filed: August 6, 2019
    Publication date: June 3, 2021
    Inventors: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick
  • Publication number: 20210118153
    Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
    Type: Application
    Filed: December 23, 2020
    Publication date: April 22, 2021
    Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
  • Patent number: 10878583
    Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
    Type: Grant
    Filed: December 1, 2017
    Date of Patent: December 29, 2020
    Assignee: Google LLC
    Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
  • Publication number: 20200349722
    Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
    Type: Application
    Filed: December 1, 2017
    Publication date: November 5, 2020
    Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki