Patents by Inventor Xitong Yang

Xitong Yang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11631239
    Abstract: Iterative prediction systems and methods for the task of action detection process an input sequence of video frames to generate an output of both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground truth.
    Type: Grant
    Filed: April 22, 2021
    Date of Patent: April 18, 2023
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Ming-Yu Liu, Jan Kautz, Fanyi Xiao, Xitong Yang
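    The iterative-prediction idea above — repeatedly regressing bounding-box offsets so that even boxes far from the ground truth converge over a few rounds — can be illustrated with a minimal sketch. This is not the patented implementation; the predictor here is a toy callable (a hypothetical stand-in for the learned model) and the function names are illustrative.

    ```python
    import numpy as np

    def refine_tube(initial_boxes, predict_offsets, num_iterations=3):
        """Iteratively refine an action tube (one [x1, y1, x2, y2] box
        per frame) by repeatedly adding predicted offsets.

        initial_boxes: (T, 4) array, one box per video frame.
        predict_offsets: callable mapping current boxes -> per-frame
            offsets; stands in for the learned iterative predictor.
        """
        boxes = np.asarray(initial_boxes, dtype=float)
        for _ in range(num_iterations):
            boxes = boxes + predict_offsets(boxes)
        return boxes

    # Toy "predictor": step halfway toward a fixed target tube, so each
    # iteration shrinks the remaining offset to the ground truth.
    target = np.array([[10.0, 10.0, 50.0, 50.0]] * 4)
    step_halfway = lambda boxes: 0.5 * (target - boxes)

    tube = refine_tube(np.zeros((4, 4)), step_halfway, num_iterations=3)
    ```

    Even starting from all-zero boxes with a large offset to the target, three iterations bring the tube most of the way there, which is the intuition behind refining predictions iteratively rather than in a single regression step.
    
    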
  • Patent number: 11594006
    Abstract: There are numerous features in video that can be detected using computer-based systems, such as objects and/or motion. The detection of these features, and in particular the detection of motion, has many useful applications, such as action recognition, activity detection, object tracking, etc. The present disclosure provides a neural network that learns motion from unlabeled video frames. In particular, the neural network uses the unlabeled video frames to perform self-supervised hierarchical motion learning. The present disclosure also describes how the learned motion can be used in video action recognition.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: February 28, 2023
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Xitong Yang, Sifei Liu, Jan Kautz
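    The abstract above describes learning a hierarchical motion representation from unlabeled frames. As a crude, hedged stand-in for that idea, the sketch below builds a multi-level motion map from raw frame differences at several spatial scales; the patented method learns these features with a neural network, whereas this only illustrates the notion of hierarchical motion cues derived without labels. Function names and the 2x downsampling scheme are assumptions for illustration.

    ```python
    import numpy as np

    def motion_pyramid(frame_t, frame_t1, levels=3):
        """Build a simple hierarchy of motion cues from two consecutive
        unlabeled frames: absolute frame differences at several spatial
        scales (assumes grayscale frames with even dimensions)."""
        a = np.asarray(frame_t, dtype=float)
        b = np.asarray(frame_t1, dtype=float)
        pyramid = []
        for _ in range(levels):
            pyramid.append(np.abs(b - a))
            # Downsample 2x by averaging 2x2 blocks for the next level.
            h, w = a.shape
            a = a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
            b = b.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        return pyramid

    # Two toy 8x8 frames: uniform brightness change between them.
    pyr = motion_pyramid(np.zeros((8, 8)), np.ones((8, 8)), levels=3)
    ```

    Each level captures motion at a coarser scale, which is the structural idea behind a hierarchical motion representation usable for downstream action recognition.
    
    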
  • Publication number: 20210241489
    Abstract: Iterative prediction systems and methods for the task of action detection process an input sequence of video frames to generate an output of both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground truth.
    Type: Application
    Filed: April 22, 2021
    Publication date: August 5, 2021
    Inventors: Xiaodong Yang, Ming-Yu Liu, Jan Kautz, Fanyi Xiao, Xitong Yang
  • Patent number: 11017556
    Abstract: Iterative prediction systems and methods for the task of action detection process an input sequence of video frames to generate an output of both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground truth.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: May 25, 2021
    Assignee: NVIDIA Corporation
    Inventors: Xiaodong Yang, Xitong Yang, Fanyi Xiao, Ming-Yu Liu, Jan Kautz
  • Patent number: 10943154
    Abstract: Multi-modal data representing driving events and corresponding actions related to the driving events can be obtained and used to train a neural network at least in part by using a triplet loss computed for the driving events as a regression loss to determine an embedding of driving event data. In some cases, using the trained neural network, a retrieval request for an input driving event and corresponding action can be processed by determining, from the neural network, one or more similar driving events or corresponding actions in the multi-modal data.
    Type: Grant
    Filed: January 22, 2019
    Date of Patent: March 9, 2021
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Ahmed Taha, Yi-Ting Chen, Teruhisa Misu, Larry Davis, Xitong Yang
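    The abstract above combines two standard pieces: a triplet margin loss over driving-event embeddings, and nearest-neighbor retrieval in the learned embedding space. The sketch below shows both in their textbook form on plain vectors; it is not the patented training procedure, and the embeddings here are hand-made rather than produced by a neural network.

    ```python
    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=1.0):
        """Standard triplet margin loss: pull the positive embedding
        toward the anchor, push the negative at least `margin` away
        (in squared Euclidean distance)."""
        d_pos = np.sum((anchor - positive) ** 2)
        d_neg = np.sum((anchor - negative) ** 2)
        return max(0.0, d_pos - d_neg + margin)

    def retrieve(query, event_embeddings):
        """Return the index of the stored driving-event embedding most
        similar to the query (nearest neighbor in embedding space)."""
        dists = [np.sum((query - e) ** 2) for e in event_embeddings]
        return int(np.argmin(dists))

    # Satisfied triplet: positive coincides with anchor, negative is far.
    zero_loss = triplet_loss(np.array([0.0, 0.0]),
                             np.array([0.0, 0.0]),
                             np.array([2.0, 0.0]))

    # Retrieval: the query sits closest to the first stored event.
    best = retrieve(np.array([0.1, 0.0]),
                    [np.array([0.0, 0.0]), np.array([5.0, 5.0])])
    ```

    Once the loss has shaped the embedding space, retrieval of similar driving events (or their associated actions) reduces to this kind of nearest-neighbor lookup.
    
    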
  • Publication number: 20210064931
    Abstract: There are numerous features in video that can be detected using computer-based systems, such as objects and/or motion. The detection of these features, and in particular the detection of motion, has many useful applications, such as action recognition, activity detection, object tracking, etc. The present disclosure provides a neural network that learns motion from unlabeled video frames. In particular, the neural network uses the unlabeled video frames to perform self-supervised hierarchical motion learning. The present disclosure also describes how the learned motion can be used in video action recognition.
    Type: Application
    Filed: August 20, 2020
    Publication date: March 4, 2021
    Inventors: Xiaodong Yang, Xitong Yang, Sifei Liu, Jan Kautz
  • Publication number: 20200234086
    Abstract: Multi-modal data representing driving events and corresponding actions related to the driving events can be obtained and used to train a neural network at least in part by using a triplet loss computed for the driving events as a regression loss to determine an embedding of driving event data. In some cases, using the trained neural network, a retrieval request for an input driving event and corresponding action can be processed by determining, from the neural network, one or more similar driving events or corresponding actions in the multi-modal data.
    Type: Application
    Filed: January 22, 2019
    Publication date: July 23, 2020
    Inventors: Ahmed Taha, Yi-Ting Chen, Teruhisa Misu, Larry Davis, Xitong Yang
  • Publication number: 20190102908
    Abstract: Iterative prediction systems and methods for the task of action detection process an input sequence of video frames to generate an output of both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground truth.
    Type: Application
    Filed: October 4, 2018
    Publication date: April 4, 2019
    Inventors: Xiaodong Yang, Xitong Yang, Fanyi Xiao, Ming-Yu Liu, Jan Kautz
  • Patent number: 9805255
    Abstract: A multimodal sensing system includes various devices that work together to automatically classify an action. A video camera captures a sequence of digital images. At least one other sensor device captures other sensed data (e.g., motion data). The system will extract video features from the digital images so that each extracted image feature is associated with a time period. It will extract other features from the other sensed data so that each extracted other feature is associated with a time period. The system will fuse a group of the extracted video features and a group of the extracted other features to create a fused feature representation for a time period. It will then analyze the fused feature representation to identify a class, access a data store of classes and actions to identify an action that is associated with the class, and save the identified action to a memory device.
    Type: Grant
    Filed: January 29, 2016
    Date of Patent: October 31, 2017
    Assignee: Conduent Business Services, LLC
    Inventors: Xitong Yang, Edgar A. Bernal, Sriganesh Madhvanath, Raja Bala, Palghat S. Ramesh, Qun Li, Jayant Kumar
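    The pipeline in the abstract above — fuse per-time-window video and other-sensor features, classify the fused representation, then look up the action associated with the class — can be sketched minimally. This is an illustration only: the fusion here is simple concatenation, the classifier is a nearest-prototype rule rather than the patented system's learned model, and all names (`CLASS_ACTIONS`, the feature vectors) are hypothetical.

    ```python
    import numpy as np

    # Hypothetical class -> action data store; entries are illustrative.
    CLASS_ACTIONS = {"walking": "log_activity", "falling": "raise_alert"}

    def fuse_features(video_feat, sensor_feat):
        """Fuse video and motion-sensor features for one time window by
        concatenation (one simple choice of fused representation)."""
        return np.concatenate([video_feat, sensor_feat])

    def classify_and_act(fused, prototypes, class_actions):
        """Nearest-prototype classification of the fused representation,
        followed by a lookup of the action associated with the class."""
        label = min(prototypes,
                    key=lambda c: np.sum((fused - prototypes[c]) ** 2))
        return label, class_actions[label]

    # One time window: toy video and accelerometer features, fused and
    # matched against per-class prototype vectors.
    prototypes = {"walking": np.array([1.0, 0.0, 1.0, 0.0]),
                  "falling": np.array([0.0, 1.0, 0.0, 1.0])}
    fused = fuse_features(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
    label, action = classify_and_act(fused, prototypes, CLASS_ACTIONS)
    ```

    Concatenation is the simplest fusion strategy; the key structural point is that classification operates on a single joint representation per time window rather than on each modality separately.
    
    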
  • Publication number: 20170220854
    Abstract: A multimodal sensing system includes various devices that work together to automatically classify an action. A video camera captures a sequence of digital images. At least one other sensor device captures other sensed data (e.g., motion data). The system will extract video features from the digital images so that each extracted image feature is associated with a time period. It will extract other features from the other sensed data so that each extracted other feature is associated with a time period. The system will fuse a group of the extracted video features and a group of the extracted other features to create a fused feature representation for a time period. It will then analyze the fused feature representation to identify a class, access a data store of classes and actions to identify an action that is associated with the class, and save the identified action to a memory device.
    Type: Application
    Filed: January 29, 2016
    Publication date: August 3, 2017
    Inventors: Xitong Yang, Edgar A. Bernal, Sriganesh Madhvanath, Raja Bala, Palghat S. Ramesh, Qun Li, Jayant Kumar