Patents by Inventor Nakul Agarwal

Nakul Agarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12386890
    Abstract: Systems and methods for augmenting large video language models (LVLMs) by incorporating a plausible action anticipation framework. The framework augments the LVLM by taking the plausibility of an action sequence into account. Two objective functions, a counterfactual-based plausible action sequence learning loss and a long-horizon action repetition loss, are derived and used to train the LVLM to generate plausible anticipated action sequences. The augmented LVLM is then able to produce sequences in which each action is temporally and spatially factual with respect to the others.
    Type: Grant
    Filed: April 8, 2024
    Date of Patent: August 12, 2025
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee
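
As a rough illustration of how the two auxiliary objectives named in this abstract could be combined with a standard sequence loss, the sketch below wires a counterfactual-margin term and a long-horizon repetition penalty into one training loss. Every function, weight, and tensor shape here is an assumption for illustration, not the patented formulation.

```python
import torch
import torch.nn.functional as F

def sequence_log_prob(logits: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    """Sum of log-probabilities of `actions` (T,) under `logits` (T, V)."""
    log_probs = F.log_softmax(logits, dim=-1)
    return log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1).sum()

def counterfactual_loss(logits, plausible, counterfactual, margin=1.0):
    # Hinge term: score the plausible sequence above the counterfactual one.
    gap = sequence_log_prob(logits, plausible) - sequence_log_prob(logits, counterfactual)
    return F.relu(margin - gap)

def repetition_loss(logits):
    # Penalize probability mass on repeating the previous step's top action.
    probs = F.softmax(logits, dim=-1)                    # (T, V)
    prev_top = probs[:-1].argmax(dim=-1)                 # top action at t-1
    return probs[1:].gather(-1, prev_top.unsqueeze(-1)).mean()

T, V = 8, 20                                             # horizon, action vocabulary
logits = torch.randn(T, V, requires_grad=True)           # model outputs (assumed)
plausible = torch.randint(V, (T,))                       # ground-truth future actions
cf = torch.randint(V, (T,))                              # counterfactual sequence

loss = (F.cross_entropy(logits, plausible)
        + counterfactual_loss(logits, plausible, cf)
        + 0.1 * repetition_loss(logits))                 # weights are assumptions
loss.backward()
```
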
  • Publication number: 20250199123
    Abstract: A sensor system includes a ranged sensor that generates time-series data indicating positions of objects in an environment, and at least one processor that receives the time-series data generated by the ranged sensor, encodes the time-series data into edge embeddings with an encoder, and computes edge features and edge logits of the objects in the environment, represented in a latent space, based on the edge embeddings. The at least one processor also disentangles the edge features in the latent space, and generates a representation of time-invariant latent characteristics of interactions between the edge features in the latent space.
    Type: Application
    Filed: March 25, 2024
    Publication date: June 19, 2025
    Inventors: Victoria Magdalena DAX, Jiachen LI, Enna SACHDEVA, Nakul AGARWAL, Mykel J. KOCHENDERFER
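
The sketch below illustrates, under assumed shapes and modules, the encode-then-classify portion of this pipeline: time-series object positions are encoded per object, paired into edge embeddings, and mapped to edge features and edge logits. The disentanglement step is omitted, and all names are illustrative.

```python
import torch
import torch.nn as nn

class EdgeEncoder(nn.Module):
    """Encode per-object position tracks, then score pairwise edges."""

    def __init__(self, d_latent: int = 32, n_edge_types: int = 3):
        super().__init__()
        self.temporal = nn.GRU(input_size=2, hidden_size=d_latent, batch_first=True)
        self.edge_mlp = nn.Sequential(nn.Linear(2 * d_latent, d_latent), nn.ReLU())
        self.edge_head = nn.Linear(d_latent, n_edge_types)

    def forward(self, positions: torch.Tensor):
        # positions: (N objects, T steps, 2) x/y tracks from the ranged sensor
        _, h = self.temporal(positions)                  # h: (1, N, d_latent)
        h = h.squeeze(0)                                 # per-object embedding
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        edge_features = self.edge_mlp(pairs)             # (N, N, d_latent)
        edge_logits = self.edge_head(edge_features)      # (N, N, n_edge_types)
        return edge_features, edge_logits

enc = EdgeEncoder()
feats, logits = enc(torch.randn(4, 10, 2))               # 4 objects, 10 time steps
```
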
  • Publication number: 20250166377
    Abstract: A method for forming versatile action models for video understanding may gather data from a video. The data may comprise textual video representations and other task-specific language inputs. The method may use pre-trained large language model (LLM) next-token prediction for action anticipation based on the data from the video.
    Type: Application
    Filed: March 22, 2024
    Publication date: May 22, 2025
    Applicants: Honda Motor Co., Ltd., Brown University
    Inventors: Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun
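
A minimal sketch of the core idea: render a textual video representation as a prompt and let a pre-trained causal LM continue it via next-token prediction. Off-the-shelf GPT-2 and this prompt format are stand-in assumptions; the application does not specify either.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Textual video representation plus a task-specific instruction (both assumed).
observed = ["crack egg", "whisk egg", "heat pan"]
prompt = "Observed actions: " + ", ".join(observed) + ". Next actions:"

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
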
  • Publication number: 20250014338
    Abstract: An electronic device and method for object-centric video representation for action prediction is provided. The electronic device extracts a first sequence of video segments from video content associated with a domain and detects a set of objects in the first sequence of video segments. The electronic device generates a set of embeddings based on the first sequence of video segments and the set of objects. The electronic device applies a PTE model on the set of embeddings. The electronic device predicts, based on the application, a set of object-action pairs associated with a second sequence of video segments of the video content. Each object-action pair includes an action to be executed using an object of the set of objects in a video segment of the second sequence of video segments. The second sequence of video segments succeeds the first sequence of video segments in a timeline of the video content.
    Type: Application
    Filed: December 14, 2023
    Publication date: January 9, 2025
    Applicants: Honda Motor Co., Ltd., Brown University
    Inventors: CE ZHANG, CHANGCHENG FU, SHIJIE WANG, NAKUL AGARWAL, KWONJOON LEE, CHIHO CHOI, CHEN SUN
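
The sketch below approximates the described flow under assumed dimensions: segment and object embeddings are jointly encoded by a generic transformer encoder (standing in for the abstract's PTE model), and two heads read out object-action pairs for future segments. Everything named here is illustrative.

```python
import torch
import torch.nn as nn

d, n_segments, n_objects = 64, 6, 3                      # assumed sizes
n_object_cls, n_action_cls = 10, 15

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2)
object_head = nn.Linear(d, n_object_cls)                 # which object is used
action_head = nn.Linear(d, n_action_cls)                 # which action is executed

segment_emb = torch.randn(1, n_segments, d)              # first sequence of segments
object_emb = torch.randn(1, n_objects, d)                # detected-object embeddings
tokens = torch.cat([segment_emb, object_emb], dim=1)     # joint set of embeddings

h = encoder(tokens)
future = h[:, :n_segments]                               # one slot per future segment
object_action_pairs = (object_head(future).argmax(-1),   # (1, n_segments) objects
                       action_head(future).argmax(-1))   # (1, n_segments) actions
```
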
  • Publication number: 20250014321
    Abstract: An electronic device and method for using neural language models for long-term action anticipation from videos is provided. The electronic device receives a video that includes one or more objects performing a physical task and generates, based on the video, a first set of tags that corresponds to a first sequence of actions associated with the physical task. The electronic device generates a first prompt for a neural language model based on the first set of tags and predicts, by application of the neural language model on the first prompt, a second set of tags that corresponds to a second sequence of actions associated with the physical task. The second sequence of actions succeeds the first sequence of actions. The electronic device controls a display device to display first prediction information based on the second set of tags.
    Type: Application
    Filed: December 14, 2023
    Publication date: January 9, 2025
    Applicants: Honda Motor Co., Ltd., Brown University
    Inventors: CE ZHANG, CHANGCHENG FU, SHIJIE WANG, QI ZHAO, CHEN SUN, NAKUL AGARWAL, KWONJOON LEE
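
A minimal sketch of the tag-to-prompt-to-tag round trip this abstract describes. The prompt template and parsing rules are assumptions, and the language model call is stubbed so the data flow is the focus.

```python
def build_prompt(observed_tags: list[str]) -> str:
    steps = "; ".join(f"{i + 1}. {t}" for i, t in enumerate(observed_tags))
    return (f"A person is performing a task. Steps so far: {steps}. "
            "List the next steps in the same format:")

def parse_tags(completion: str) -> list[str]:
    # "4. pour batter; 5. flip pancake" -> ["pour batter", "flip pancake"]
    parts = [p.strip() for p in completion.split(";") if p.strip()]
    return [p.split(".", 1)[1].strip() if "." in p else p for p in parts]

def language_model(prompt: str) -> str:
    # Stand-in for a real neural language model call.
    return "4. pour batter; 5. flip pancake"

first_tags = ["crack egg", "whisk egg", "heat pan"]       # recognized actions
second_tags = parse_tags(language_model(build_prompt(first_tags)))
print(second_tags)                                        # predicted future actions
```
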
  • Patent number: 12183090
    Abstract: According to one aspect, intersection scenario description may be implemented by receiving a video stream of a surrounding environment of an ego-vehicle, extracting tracklets and appearance features associated with dynamic objects from the surrounding environment, extracting motion features associated with dynamic objects from the surrounding environment based on the corresponding tracklets, passing the appearance features through an appearance neural network to generate an appearance model, passing the motion features through a motion neural network to generate a motion model, passing the appearance model and the motion model through a fusion network to generate a fusion output, passing the fusion output through a classifier to generate a classifier output, and passing the classifier output through a loss function to generate a multi-label classification output associated with the ego-vehicle, dynamic objects, and corresponding motion paths.
    Type: Grant
    Filed: June 30, 2022
    Date of Patent: December 31, 2024
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Nakul Agarwal, Yi-Ting Chen
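
The two-stream structure of this pipeline maps naturally to a small sketch: separate appearance and motion networks, a fusion network, a classifier, and a multi-label loss. Dimensions, layer choices, and the batch of per-track features below are all assumed.

```python
import torch
import torch.nn as nn

d_app, d_motion, d_fused, n_labels = 128, 64, 96, 12     # assumed sizes

appearance_net = nn.Sequential(nn.Linear(d_app, d_fused), nn.ReLU())
motion_net = nn.Sequential(nn.Linear(d_motion, d_fused), nn.ReLU())
fusion_net = nn.Sequential(nn.Linear(2 * d_fused, d_fused), nn.ReLU())
classifier = nn.Linear(d_fused, n_labels)
loss_fn = nn.BCEWithLogitsLoss()                         # multi-label objective

app_feats = torch.randn(8, d_app)                        # per-track appearance features
motion_feats = torch.randn(8, d_motion)                  # tracklet-derived motion features
targets = torch.randint(2, (8, n_labels)).float()        # multi-label scenario targets

fused = fusion_net(torch.cat([appearance_net(app_feats),
                              motion_net(motion_feats)], dim=-1))
loss = loss_fn(classifier(fused), targets)
loss.backward()
```
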
  • Patent number: 12169964
    Abstract: A system and method for providing weakly-supervised online action segmentation that include receiving image data associated with multi-view videos of a procedure, wherein the procedure involves a plurality of atomic actions. The system and method also include analyzing the image data using weakly-supervised action segmentation to identify each of the plurality of atomic actions by using an ordered sequence of action labels. The system and method additionally include training a neural network with data pertaining to the plurality of atomic actions based on the weakly-supervised action segmentation. The system and method further include executing online action segmentation to label atomic actions that are occurring in real time, based on the plurality of atomic actions on which the neural network was trained.
    Type: Grant
    Filed: February 1, 2022
    Date of Patent: December 17, 2024
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Reza Ghoddoosian, Isht Dwivedi, Nakul Agarwal, Chiho Choi, Behzad Dariush
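
Weak supervision here means only the ordered transcript of atomic actions is known per video. The sketch below uses a naive uniform split to turn a transcript into frame-level targets, a common weak-alignment baseline rather than the patented procedure, and trains a per-frame classifier that can then label frames online.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def uniform_frame_labels(transcript: list[int], n_frames: int) -> torch.Tensor:
    """Spread the ordered action labels uniformly over the frames."""
    bounds = torch.linspace(0, n_frames, len(transcript) + 1).long()
    labels = torch.empty(n_frames, dtype=torch.long)
    for i, action in enumerate(transcript):
        labels[bounds[i]:bounds[i + 1]] = action
    return labels

n_frames, d_feat, n_actions = 100, 32, 5                 # assumed sizes
frames = torch.randn(n_frames, d_feat)                   # per-frame visual features
transcript = [0, 3, 1]                                   # ordered weak supervision

model = nn.Linear(d_feat, n_actions)
loss = F.cross_entropy(model(frames), uniform_frame_labels(transcript, n_frames))
loss.backward()

online_labels = model(frames).argmax(-1)                 # frame-by-frame segmentation
```
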
  • Publication number: 20240404297
    Abstract: Systems and methods for training a neural network for generating a reasoning statement are provided. In one embodiment, a method includes receiving sensor data from a perspective of an ego agent. The method includes identifying a plurality of captured objects in the at least one roadway environment. The method includes receiving a set of ranking classifications for a captured object of the plurality of captured objects, where each ranking classification applies an importance attribute and includes an annotator reasoning statement, a natural language explanation for the applied attribute. The method includes generating a training dataset for the object type including the annotator reasoning statements of the set of ranking classifications that include the applied attribute from the plurality of importance attributes in the importance category. The method includes training the neural network to generate a generated reasoning statement based on the training dataset in response to a training agent detecting a detected object of the object type.
    Type: Application
    Filed: June 5, 2023
    Publication date: December 5, 2024
    Inventors: Enna SACHDEVA, Nakul AGARWAL, Sean F. ROELOFS, Jiachen LI, Behzad DARIUSH, Chiho CHOI
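
The dataset-construction step reads like the sketch below: ranking classifications carrying an importance attribute are grouped by object type, each paired with its annotator reasoning statement. The record fields are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class RankingClassification:
    object_type: str          # e.g., "pedestrian" (hypothetical field names)
    attribute: str            # applied importance attribute
    reasoning: str            # annotator's natural-language explanation

annotations = [
    RankingClassification("pedestrian", "high", "crossing in front of the ego path"),
    RankingClassification("pedestrian", "low", "walking away on the far sidewalk"),
    RankingClassification("vehicle", "high", "merging into the ego lane"),
]

# Group (attribute, reasoning statement) pairs by object type to form the
# training dataset for the reasoning-statement generator.
training_data = defaultdict(list)
for a in annotations:
    training_data[a.object_type].append((a.attribute, a.reasoning))

print(training_data["pedestrian"])
```
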
  • Publication number: 20240371166
    Abstract: According to one aspect, weakly-supervised action segmentation may include performing feature extraction to extract one or more features associated with a current frame of a video including a series of one or more actions, feeding one or more of the features to a recognition network to generate a predicted action score for the current frame of the video, feeding one or more of the features and the predicted action score to an action transition model to generate a potential subsequent action, feeding the potential subsequent action and the predicted action score to a hybrid segmentation model to generate a predicted sequence of actions from a first frame of the video to the current frame of the video, and segmenting or labeling one or more frames of the video based on the predicted sequence of actions from the first frame of the video to the current frame of the video.
    Type: Application
    Filed: April 27, 2023
    Publication date: November 7, 2024
    Inventors: Reza GHODDOOSIAN, Isht DWIVEDI, Nakul AGARWAL, Behzad DARIUSH
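
The per-frame loop this abstract outlines can be sketched as follows: a recognition network scores the current frame, a learned transition matrix (standing in for the action transition model) biases toward likely successors, and a greedy decoder (a simple stand-in for the hybrid segmentation model) emits the label sequence. All components are assumed.

```python
import torch
import torch.nn as nn

d_feat, n_actions = 32, 5                                # assumed sizes
recognizer = nn.Linear(d_feat, n_actions)                # frame -> action scores
transition = torch.randn(n_actions, n_actions)           # learned A->B scores (assumed)

frames = torch.randn(60, d_feat)                         # features up to the current frame
sequence: list[int] = []
prev = None
for feat in frames:
    scores = recognizer(feat)                            # recognition network
    if prev is not None:
        scores = scores + transition[prev]               # bias toward likely successors
    prev = int(scores.argmax())                          # greedy decode step
    sequence.append(prev)
# `sequence` now labels every frame from the first to the current one.
```
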
  • Patent number: 12094214
    Abstract: A system and method for providing an agent action anticipative transformer that include receiving image data associated with a video of a surrounding environment of an ego agent. The system and method additionally include analyzing the image data and extracting short range clips from the image data. The system and method also include analyzing the short range clips and extracting clip-level features associated with each of the short range clips. The system and method further include executing self-supervision using causal masking with respect to the extracted clip-level features to output action predictions and feature predictions to enable ego-centric action anticipation with respect to at least one target agent to autonomously control the ego agent.
    Type: Grant
    Filed: April 29, 2022
    Date of Patent: September 17, 2024
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Harshayu Girase, Nakul Agarwal, Chiho Choi
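
The causal-masking idea can be sketched with a standard transformer encoder over clip-level features: each clip attends only to the past, and two heads emit action predictions and next-clip feature predictions for self-supervision. Architecture details below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_clips, n_actions = 64, 8, 10                        # assumed sizes
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2)
action_head = nn.Linear(d, n_actions)                    # action predictions
feature_head = nn.Linear(d, d)                           # next-clip feature predictions

clips = torch.randn(1, n_clips, d)                       # clip-level features
causal = torch.triu(torch.full((n_clips, n_clips), float("-inf")), diagonal=1)

h = encoder(clips, mask=causal)                          # each clip sees only the past
action_preds = action_head(h)
feature_preds = feature_head(h)
self_sup_loss = F.mse_loss(feature_preds[:, :-1], clips[:, 1:])  # predict next features
```
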
  • Publication number: 20240161447
    Abstract: According to one aspect, spatial action localization in the future (SALF) may include feeding a frame from a time step of a video clip through an encoder to generate a latent feature, feeding the latent feature and one or more latent features from one or more previous time steps of the video clip through a future feature predictor to generate a cumulative information for the time step, feeding the cumulative information through a decoder to generate a predicted action area and a predicted action classification associated with the predicted action area, and implementing an action based on the predicted action area and the predicted action classification. The encoder may include a 2D convolutional neural network (CNN) and/or a 3D-CNN. The future feature predictor may be based on an ordinary differential equation (ODE) function.
    Type: Application
    Filed: April 14, 2023
    Publication date: May 16, 2024
    Inventors: Hyung-gun CHI, Kwonjoon LEE, Nakul AGARWAL, Yi XU, Chiho CHOI
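
The ODE-based future feature predictor can be illustrated with the generic neural-ODE pattern below: a small network defines the latent derivative, Euler steps roll the current latent forward, and a decoder emits a predicted action area and class. Shapes and integrator choice are assumed, not taken from the patent.

```python
import torch
import torch.nn as nn

d = 64                                                   # latent size (assumed)
ode_func = nn.Sequential(nn.Linear(d, d), nn.Tanh(), nn.Linear(d, d))
box_head = nn.Linear(d, 4)                               # predicted action area (box)
cls_head = nn.Linear(d, 10)                              # predicted action class

def predict_future(latent: torch.Tensor, steps: int = 5, dt: float = 0.1):
    # Euler integration of d(latent)/dt = ode_func(latent).
    for _ in range(steps):
        latent = latent + dt * ode_func(latent)
    return box_head(latent), cls_head(latent)

latent = torch.randn(1, d)                               # encoder output, current frame
pred_box, pred_cls = predict_future(latent)
```
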
  • Patent number: 11845464
    Abstract: Driver behavior risk assessment and pedestrian awareness may include receiving an input stream of images of an environment including one or more objects within the environment, estimating an intention of an ego vehicle based on the input stream of images and a temporal recurrent network (TRN), generating a scene representation based on the input stream of images and a graph neural network (GNN), generating a prediction of a situation based on the scene representation and the intention of the ego vehicle, and generating an influenced or non-influenced action determination based on the prediction of the situation and the scene representation.
    Type: Grant
    Filed: January 29, 2021
    Date of Patent: December 19, 2023
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Nakul Agarwal, Yi-Ting Chen
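
The two branches this abstract combines can be sketched as a recurrent intention estimator (standing in for the TRN) and one round of mean-aggregation message passing (standing in for the GNN), whose outputs feed a situation predictor. All modules and shapes below are illustrative.

```python
import torch
import torch.nn as nn

d = 32                                                   # feature size (assumed)
intent_rnn = nn.GRU(input_size=d, hidden_size=d, batch_first=True)
msg = nn.Linear(d, d)                                    # one message-passing round
situation_head = nn.Linear(2 * d, 4)                     # predicted situation classes

frame_feats = torch.randn(1, 10, d)                      # per-frame image features
node_feats = torch.randn(5, d)                           # one node per scene object

_, intention = intent_rnn(frame_feats)                   # (1, 1, d) ego intention
scene = torch.tanh(msg(node_feats)).mean(0, keepdim=True)  # pooled scene representation

situation = situation_head(torch.cat([intention.squeeze(0), scene], dim=-1))
```
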
  • Publication number: 20230351759
    Abstract: A system and method for providing an agent action anticipative transformer that include receiving image data associated with a video of a surrounding environment of an ego agent. The system and method additionally include analyzing the image data and extracting short range clips from the image data. The system and method also include analyzing the short range clips and extracting clip-level features associated with each of the short range clips. The system and method further include executing self-supervision using causal masking with respect to the extracted clip-level features to output action predictions and feature predictions to enable ego-centric action anticipation with respect to at least one target agent to autonomously control the ego agent.
    Type: Application
    Filed: April 29, 2022
    Publication date: November 2, 2023
    Inventors: Harshayu GIRASE, Nakul AGARWAL, Chiho CHOI
  • Publication number: 20230311942
    Abstract: Driver behavior risk assessment and pedestrian awareness may include receiving an input stream of images of an environment including one or more objects within the environment, estimating an intention of an ego vehicle based on the input stream of images and a temporal recurrent network (TRN), generating a scene representation based on the input stream of images and a graph neural network (GNN), generating a prediction of a situation based on the scene representation and the intention of the ego vehicle, and generating an influenced or non-influenced action determination based on the prediction of the situation and the scene representation.
    Type: Application
    Filed: June 8, 2023
    Publication date: October 5, 2023
    Inventors: Nakul AGARWAL, Yi-Ting CHEN
  • Patent number: 11741723
    Abstract: A system and method for performing intersection scenario retrieval that includes receiving a video stream of a surrounding environment of an ego vehicle. The system and method also include analyzing the video stream to trim the video stream into video clips of an intersection scene associated with the travel of the ego vehicle. The system and method additionally include annotating the ego vehicle, dynamic objects, and their motion paths that are included within the intersection scene with action units that describe an intersection scenario. The system and method further include retrieving at least one intersection scenario based on a query of an electronic dataset that stores a combination of action units to operably control a presentation of at least one intersection scenario video clip that includes the at least one intersection scenario.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: August 29, 2023
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Yi-Ting Chen, Nakul Agarwal, Behzad Dariush, Ahmed Taha
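
Retrieval by action units reduces, in sketch form, to a subset query over annotated clips: return every clip whose action-unit set contains all queried units. The unit strings and storage layout below are illustrative.

```python
# Each clip maps to its annotated action units (strings are illustrative).
dataset = {
    "clip_001.mp4": {"ego:left-turn", "pedestrian:crossing"},
    "clip_002.mp4": {"ego:straight", "vehicle:right-turn"},
    "clip_003.mp4": {"ego:left-turn", "vehicle:oncoming", "pedestrian:crossing"},
}

def retrieve(query: set[str]) -> list[str]:
    """Return clips whose annotations contain every queried action unit."""
    return [clip for clip, units in dataset.items() if query <= units]

print(retrieve({"ego:left-turn", "pedestrian:crossing"}))
# -> ['clip_001.mp4', 'clip_003.mp4']
```
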
  • Publication number: 20230154195
    Abstract: According to one aspect, intersection scenario description may be implemented by receiving a video stream of a surrounding environment of an ego-vehicle, extracting tracklets and appearance features associated with dynamic objects from the surrounding environment, extracting motion features associated with dynamic objects from the surrounding environment based on the corresponding tracklets, passing the appearance features through an appearance neural network to generate an appearance model, passing the motion features through a motion neural network to generate a motion model, passing the appearance model and the motion model through a fusion network to generate a fusion output, passing the fusion output through a classifier to generate a classifier output, and passing the classifier output through a loss function to generate a multi-label classification output associated with the ego-vehicle, dynamic objects, and corresponding motion paths.
    Type: Application
    Filed: June 30, 2022
    Publication date: May 18, 2023
    Inventors: Nakul AGARWAL, Yi-Ting CHEN
  • Publication number: 20230141037
    Abstract: A system and method for providing weakly-supervised online action segmentation that include receiving image data associated with multi-view videos of a procedure, wherein the procedure involves a plurality of atomic actions. The system and method also include analyzing the image data using weakly-supervised action segmentation to identify each of the plurality of atomic actions by using an ordered sequence of action labels. The system and method additionally include training a neural network with data pertaining to the plurality of atomic actions based on the weakly-supervised action segmentation. The system and method further include executing online action segmentation to label atomic actions that are occurring in real time, based on the plurality of atomic actions on which the neural network was trained.
    Type: Application
    Filed: February 1, 2022
    Publication date: May 11, 2023
    Inventors: Reza GHODDOOSIAN, Isht DWIVEDI, Nakul AGARWAL, Chiho CHOI, Behzad DARIUSH
  • Patent number: 11580743
    Abstract: A system and method for providing unsupervised domain adaptation for spatio-temporal action localization that includes receiving video data associated with a source domain and a target domain that are associated with a surrounding environment of a vehicle. The system and method also include analyzing the video data associated with the source domain and the target domain and determining a key frame of the source domain and a key frame of the target domain. The system and method additionally include completing an action localization model to model a temporal context of actions occurring within the key frame of the source domain and the key frame of the target domain and completing an action adaptation model to localize individuals and their actions and to classify the actions based on the video data. The system and method further include combining losses to complete spatio-temporal action localization of individuals and actions.
    Type: Grant
    Filed: March 25, 2022
    Date of Patent: February 14, 2023
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Yi-Ting Chen, Behzad Dariush, Nakul Agarwal, Ming-Hsuan Yang
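
The closing step, combining losses, can be illustrated with one common adaptation choice: a supervised action loss on source key frames plus a domain-confusion loss from a source-vs-target domain classifier. This is a generic pattern under assumed shapes, not necessarily the patented combination.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_actions = 64, 8                                     # assumed sizes
backbone = nn.Linear(d, d)                               # shared feature extractor
action_head = nn.Linear(d, n_actions)                    # action classification
domain_head = nn.Linear(d, 2)                            # source vs. target

src = torch.randn(16, d)                                 # source key-frame features
tgt = torch.randn(16, d)                                 # target key-frame features
src_labels = torch.randint(n_actions, (16,))             # source action labels

h_src, h_tgt = backbone(src), backbone(tgt)
action_loss = F.cross_entropy(action_head(h_src), src_labels)
domain_logits = domain_head(torch.cat([h_src, h_tgt]))
domain_labels = torch.cat([torch.zeros(16), torch.ones(16)]).long()
domain_loss = F.cross_entropy(domain_logits, domain_labels)

total = action_loss + 0.5 * domain_loss                  # combined losses (weight assumed)
total.backward()
```
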
  • Patent number: 11403850
    Abstract: A system and method for providing unsupervised domain adaptation for spatio-temporal action localization that includes receiving video data associated with a surrounding environment of a vehicle. The system and method also include completing an action localization model to model a temporal context of actions occurring within the surrounding environment of the vehicle based on the video data and completing an action adaptation model to localize individuals and their actions and to classify the actions based on the video data. The system and method further include combining losses from the action localization model and the action adaptation model to complete spatio-temporal action localization of individuals and actions that occur within the surrounding environment of the vehicle.
    Type: Grant
    Filed: February 28, 2020
    Date of Patent: August 2, 2022
    Assignee: Honda Motor Co., Ltd.
    Inventors: Yi-Ting Chen, Behzad Dariush, Nakul Agarwal, Ming-Hsuan Yang
  • Publication number: 20220215661
    Abstract: A system and method for providing unsupervised domain adaptation for spatio-temporal action localization that includes receiving video data associated with a source domain and a target domain that are associated with a surrounding environment of a vehicle. The system and method also include analyzing the video data associated with the source domain and the target domain and determining a key frame of the source domain and a key frame of the target domain. The system and method additionally include completing an action localization model to model a temporal context of actions occurring within the key frame of the source domain and the key frame of the target domain and completing an action adaptation model to localize individuals and their actions and to classify the actions based on the video data. The system and method further include combining losses to complete spatio-temporal action localization of individuals and actions.
    Type: Application
    Filed: March 25, 2022
    Publication date: July 7, 2022
    Inventors: Yi-Ting CHEN, Behzad DARIUSH, Nakul AGARWAL, Ming-Hsuan YANG