Patents by Inventor Juan Carlos Niebles Duque
Juan Carlos Niebles Duque has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240169704
Abstract: Systems and methods for training a neural network based three-dimensional (3D) encoder for 3D classification are provided. A training dataset including a plurality of samples is received, wherein a first sample includes an image, a text, and a point cloud. An image encoder of a pretrained vision and language model is used to generate image representations for the image of the first sample. A text encoder of the pretrained vision and language model is used to generate text representations for the text of the first sample. The neural network based 3D encoder is used to generate 3D representations for the point cloud of the first sample. A loss objective is computed based on the image representations, text representations, and 3D representations. Parameters of the neural network based 3D encoder are updated based on the computed loss objective via backpropagation.
Type: Application
Filed: March 13, 2023
Publication date: May 23, 2024
Inventors: Le Xue, Chen Xing, Juan Carlos Niebles Duque, Caiming Xiong, Ran Xu, Silvio Savarese
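The abstract above describes aligning point-cloud, image, and text representations under a shared loss. As a rough illustration only, the sketch below pairs the 3D embeddings with image and text embeddings via a symmetric InfoNCE-style contrastive objective; the function names, the specific loss form, and the temperature value are assumptions for illustration, not the patented method.

```python
import numpy as np

def info_nce(anchors, targets, temperature=0.07):
    """Contrastive loss between two batches of embeddings whose rows are
    paired positives: row i of `anchors` should match row i of `targets`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = a @ t.T / temperature                 # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # positives on the diagonal

def triplet_alignment_loss(pc_emb, img_emb, txt_emb):
    """Pull 3D (point cloud) embeddings toward both the image and the text
    embeddings produced by a frozen vision-and-language model."""
    return info_nce(pc_emb, img_emb) + info_nce(pc_emb, txt_emb)

rng = np.random.default_rng(0)
B, D = 8, 32
pc = rng.normal(size=(B, D))
img = pc + 0.01 * rng.normal(size=(B, D))   # nearly aligned image embeddings
txt = rng.normal(size=(B, D))
loss = triplet_alignment_loss(pc, img, txt)
```

In a real training loop this scalar would drive backpropagation through the 3D encoder only, with the image and text encoders kept frozen, as the abstract indicates.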
-
Publication number: 20240169746
Abstract: Embodiments described herein provide a system for three-dimensional (3D) object detection. The system includes an input interface configured to obtain 3D point data describing spatial information of a plurality of points, and a memory storing a neural network based 3D object detection model having an encoder and a decoder. The system also includes processors to perform operations including: encoding, by the encoder, a first set of coordinates into a first set of point features and a set of object features; sampling a second set of point features from the first set of point features; generating, by attention layers at the decoder, a set of attention weights by applying cross-attention over at least the set of object features and the second set of point features; and generating, by the decoder, a predicted bounding box among the plurality of points based at least in part on the set of attention weights.
Type: Application
Filed: January 30, 2023
Publication date: May 23, 2024
Inventors: Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Juan Carlos Niebles Duque, Caiming Xiong, Ran Xu
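The decoder step above applies cross-attention from object features to sampled point features to obtain the attention weights. A minimal single-head sketch, assuming dot-product attention and made-up tensor sizes (the actual model's heads, dimensions, and box head are not specified here):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(object_features, point_features):
    """Single-head cross-attention: each object feature (query) attends over
    the encoded point features (keys/values). Returns the attention weights
    and the attended features a box-prediction head could consume."""
    d = object_features.shape[-1]
    weights = softmax(object_features @ point_features.T / np.sqrt(d))  # (Q, P)
    return weights, weights @ point_features                            # (Q, D)

rng = np.random.default_rng(1)
queries = rng.normal(size=(4, 16))    # 4 object features
points = rng.normal(size=(64, 16))    # 64 sampled point features
w, attended = cross_attention(queries, points)
```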
-
Publication number: 20240161464
Abstract: Embodiments described herein provide systems and methods for training video models to perform a task from an input instructional video. A procedure knowledge graph (PKG) may be generated with nodes representing procedure steps and edges representing relationships between the steps. The PKG may be generated based on text and/or video training data which includes procedures (e.g., instructional videos). A video model may then be trained using the PKG to provide supervisory training signals for a number of tasks. Once the model is trained, it may be fine-tuned for a specific task, which benefits from the model having learned to embed procedural information when encoding videos.
Type: Application
Filed: January 25, 2023
Publication date: May 16, 2024
Inventors: Roberto Martin-Martin, Silvio Savarese, Honglu Zhou, Juan Carlos Niebles Duque
-
Publication number: 20240160917
Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A training dataset is generated using a plurality of 3D models of a 3D model dataset. To generate a first sample of the training dataset, an image generator with multi-view rendering is used to generate a plurality of image candidates of a first 3D model. A word is chosen from metadata associated with the first 3D model. A language model is used to generate one or more text descriptions using the selected word and a plurality of prompts. A point cloud is generated by randomly sampling points in the 3D model. The first sample is generated to include a first image randomly selected from the plurality of image candidates, the one or more text descriptions, and the point cloud. The 3D encoder is trained using the training dataset including the first sample.
Type: Application
Filed: March 13, 2023
Publication date: May 16, 2024
Inventors: Le Xue, Chen Xing, Juan Carlos Niebles Duque, Caiming Xiong, Ran Xu, Silvio Savarese
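The sample-assembly steps above (random rendered view, prompt-filled text from a metadata word, randomly sampled points) can be sketched as a small function. Everything concrete here, including the file names, prompt templates, and point count, is a hypothetical placeholder:

```python
import random

def make_sample(image_candidates, metadata_word, prompts, mesh_points,
                n_points=1024, seed=0):
    """Assemble one (image, texts, point cloud) training triplet:
    a randomly selected rendered view, text descriptions built from the
    metadata word and prompt templates, and a random subsample of points."""
    rng = random.Random(seed)
    image = rng.choice(image_candidates)
    texts = [p.format(metadata_word) for p in prompts]
    points = rng.sample(mesh_points, min(n_points, len(mesh_points)))
    return {"image": image, "texts": texts, "point_cloud": points}

sample = make_sample(
    image_candidates=["view_front.png", "view_top.png"],
    metadata_word="chair",
    prompts=["a photo of a {}", "a 3D render of a {}"],
    mesh_points=[(i, i, i) for i in range(5000)],
    n_points=8,
)
```

In the patented method the text descriptions come from a language model rather than simple template filling; the template is used here only to keep the sketch self-contained.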
-
Publication number: 20240104809
Abstract: Embodiments described herein provide systems and methods for multimodal layout generation for digital publications. The system may receive as inputs a background image, one or more foreground texts, and one or more foreground images. Feature representations of the background image may be generated. The foreground inputs may be input to a layout generator which has cross-attention to the background image feature representations in order to generate a layout comprising bounding box parameters for each input item. A composite layout may be generated based on the inputs and generated bounding boxes. The resulting composite layout may then be displayed on a user interface.
Type: Application
Filed: January 30, 2023
Publication date: March 28, 2024
Inventors: Ning Yu, Chia-Chih Chen, Zeyuan Chen, Caiming Xiong, Juan Carlos Niebles Duque, Ran Xu, Rui Meng
-
Publication number: 20240070868
Abstract: Embodiments described herein provide an open-vocabulary instance segmentation framework that adopts a pre-trained vision-language model to develop a pipeline for detecting novel categories of instances.
Type: Application
Filed: January 25, 2023
Publication date: February 29, 2024
Inventors: Ning Yu, Vibashan Vishnukumar Sharmini, Chen Xing, Juan Carlos Niebles Duque, Ran Xu
-
Publication number: 20230154139
Abstract: Embodiments described herein provide an intelligent method to select instances by utilizing unsupervised tracking for videos. Using this freely available form of supervision, a temporal constraint is adopted for selecting instances that ensures that different instances contain the same object while sampling the temporal augmentation from the video. In addition, using the information on the spatial extent of the tracked object, spatial constraints are applied to ensure that sampled instances overlap meaningfully with the tracked object. Taken together, these spatiotemporal constraints result in a better supervisory signal for contrastive learning from videos.
Type: Application
Filed: January 31, 2022
Publication date: May 18, 2023
Inventors: Brian Chen, Ramprasaath Ramasamy Selvaraju, Juan Carlos Niebles Duque, Nikhil Naik
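The temporal and spatial constraints described above can be illustrated with a simple validity check on a candidate pair of crops: both frames must fall inside the tracked object's lifespan, and each crop must overlap the track's box in its frame. The box format, IoU threshold, and function names below are illustrative assumptions, not the patented criteria:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def valid_pair(frame_a, frame_b, crop_a, crop_b, track, min_iou=0.3):
    """Accept two sampled crops only if both frames lie inside the tracked
    object's lifespan (temporal constraint) and each crop overlaps the
    track's box in its frame (spatial constraint)."""
    for frame, crop in ((frame_a, crop_a), (frame_b, crop_b)):
        if frame not in track:          # object not tracked in this frame
            return False
        if iou(crop, track[frame]) < min_iou:
            return False
    return True

# track: frame index -> tracked bounding box for one object
track = {0: (10, 10, 50, 50), 5: (12, 12, 52, 52)}
ok = valid_pair(0, 5, (8, 8, 48, 48), (20, 20, 60, 60), track)
```

Pairs passing this check are the "instances containing the same object" that feed the contrastive objective.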
-
Patent number: 11604936
Abstract: A method for scene perception using video captioning based on a spatio-temporal graph model is described. The method includes decomposing the spatio-temporal graph model of a scene in input video into a spatial graph and a temporal graph. The method also includes modeling a two branch framework having an object branch and a scene branch according to the spatial graph and the temporal graph to learn object interactions between the object branch and the scene branch. The method further includes transferring the learned object interactions from the object branch to the scene branch as privileged information. The method also includes captioning the scene by aligning language logits from the object branch and the scene branch according to the learned object interactions.
Type: Grant
Filed: March 23, 2020
Date of Patent: March 14, 2023
Assignees: TOYOTA RESEARCH INSTITUTE, INC., THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY
Inventors: Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien David Gaidon, Ehsan Adeli-Mosabbeb, Juan Carlos Niebles Duque
-
Publication number: 20210295093
Abstract: A method for scene perception using video captioning based on a spatio-temporal graph model is described. The method includes decomposing the spatio-temporal graph model of a scene in input video into a spatial graph and a temporal graph. The method also includes modeling a two branch framework having an object branch and a scene branch according to the spatial graph and the temporal graph to learn object interactions between the object branch and the scene branch. The method further includes transferring the learned object interactions from the object branch to the scene branch as privileged information. The method also includes captioning the scene by aligning language logits from the object branch and the scene branch according to the learned object interactions.
Type: Application
Filed: March 23, 2020
Publication date: September 23, 2021
Applicants: TOYOTA RESEARCH INSTITUTE, INC., THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY
Inventors: Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien David Gaidon, Ehsan Adeli-Mosabbeb, Juan Carlos Niebles Duque
-
Patent number: 11074438
Abstract: A method for predicting spatial positions of several key points on a human body in the near future in an egocentric setting is described. The method includes generating a frame-level supervision for human poses. The method also includes suppressing noise and filling missing joints of the human body using a pose completion module. The method further includes splitting the poses into a global stream and a local stream. Furthermore, the method includes combining the global stream and the local stream to forecast future human locomotion.
Type: Grant
Filed: October 1, 2019
Date of Patent: July 27, 2021
Assignees: TOYOTA RESEARCH INSTITUTE, INC., THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY
Inventors: Karttikeya Mangalam, Ehsan Adeli-Mosabbeb, Kuan-Hui Lee, Adrien Gaidon, Juan Carlos Niebles Duque
-
Publication number: 20210097266
Abstract: A method for predicting spatial positions of several key points on a human body in the near future in an egocentric setting is described. The method includes generating a frame-level supervision for human poses. The method also includes suppressing noise and filling missing joints of the human body using a pose completion module. The method further includes splitting the poses into a global stream and a local stream. Furthermore, the method includes combining the global stream and the local stream to forecast future human locomotion.
Type: Application
Filed: October 1, 2019
Publication date: April 1, 2021
Applicants: TOYOTA RESEARCH INSTITUTE, INC., THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY
Inventors: Karttikeya Mangalam, Ehsan Adeli-Mosabbeb, Kuan-Hui Lee, Adrien Gaidon, Juan Carlos Niebles Duque
-
Patent number: 10210462
Abstract: A demographics analysis system trains classifier models for predicting demographic attribute values of videos and users not already having known demographics. In one embodiment, the demographics analysis system trains classifier models for predicting demographics of videos using video features such as demographics of video uploaders, textual metadata, and/or audiovisual content of videos. In one embodiment, the demographics analysis system trains classifier models for predicting demographics of users (e.g., anonymous users) using user features based on prior video viewing periods of users. For example, viewing-period based user features can include individual viewing period statistics such as total videos viewed. Further, the viewing-period based features can include distributions of values over the viewing period, such as distributions in demographic attribute values of video uploaders, and/or distributions of viewings over hours of the day, days of the week, and the like.
Type: Grant
Filed: November 24, 2014
Date of Patent: February 19, 2019
Assignee: Google LLC
Inventors: Juan Carlos Niebles Duque, Hrishikesh Aradhye, Luciano Sbaiz, Jay Yagnik, Reto Strobl
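The viewing-period features described above (per-period counts plus distributions over hours of day and uploader demographics) can be sketched as a small feature extractor. The feature names and demographic buckets are illustrative placeholders, and the sketch assumes a non-empty viewing period:

```python
from collections import Counter

def viewing_period_features(view_timestamps_hours, uploader_demos):
    """Turn one user's viewing period into features: a total-view count,
    a normalized distribution over hour-of-day, and a distribution over
    the demographic buckets of the viewed videos' uploaders."""
    n = len(view_timestamps_hours)
    hour_hist = Counter(h % 24 for h in view_timestamps_hours)
    hour_dist = [hour_hist.get(h, 0) / n for h in range(24)]
    demo_hist = Counter(uploader_demos)
    demo_dist = {k: v / n for k, v in demo_hist.items()}
    return {"total_views": n,
            "hour_dist": hour_dist,
            "uploader_demo_dist": demo_dist}

feats = viewing_period_features(
    view_timestamps_hours=[9, 9, 21, 22],
    uploader_demos=["18-24", "18-24", "25-34", "18-24"],
)
```

Such feature vectors would then be fed to the classifier models the abstract describes; the classifier itself is not sketched here.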
-
Publication number: 20150081604
Abstract: A demographics analysis system trains classifier models for predicting demographic attribute values of videos and users not already having known demographics. In one embodiment, the demographics analysis system trains classifier models for predicting demographics of videos using video features such as demographics of video uploaders, textual metadata, and/or audiovisual content of videos. In one embodiment, the demographics analysis system trains classifier models for predicting demographics of users (e.g., anonymous users) using user features based on prior video viewing periods of users. For example, viewing-period based user features can include individual viewing period statistics such as total videos viewed. Further, the viewing-period based features can include distributions of values over the viewing period, such as distributions in demographic attribute values of video uploaders, and/or distributions of viewings over hours of the day, days of the week, and the like.
Type: Application
Filed: November 24, 2014
Publication date: March 19, 2015
Inventors: Juan Carlos Niebles Duque, Hrishikesh Aradhye, Luciano Sbaiz, Jay Yagnik, Reto Strobl
-
Patent number: 8924993
Abstract: A demographics analysis system trains classifier models for predicting demographic attribute values of videos and users not already having known demographics. In one embodiment, the demographics analysis system trains classifier models for predicting demographics of videos using video features such as demographics of video uploaders, textual metadata, and/or audiovisual content of videos. In one embodiment, the demographics analysis system trains classifier models for predicting demographics of users (e.g., anonymous users) using user features based on prior video viewing periods of users. For example, viewing-period based user features can include individual viewing period statistics such as total videos viewed. Further, the viewing-period based features can include distributions of values over the viewing period, such as distributions in demographic attribute values of video uploaders, and/or distributions of viewings over hours of the day, days of the week, and the like.
Type: Grant
Filed: November 10, 2011
Date of Patent: December 30, 2014
Assignee: Google Inc.
Inventors: Juan Carlos Niebles Duque, Hrishikesh Balkrishna Aradhye, Luciano Sbaiz, Jay N. Yagnik, Reto Strobl