Patents by Inventor Juan Carlos Niebles

Juan Carlos Niebles has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SYSTEMS AND METHODS FOR A UNIFIED TRAINING FRAMEWORK OF LARGE LANGUAGE MODELS

Publication number: 20250259054

Abstract: Embodiments described herein provide a unified LLM training pipeline that hands the diversity of various data structures and formats involving LLMs agent trajectories. These pipelines are specifically designed to transform incoming data into a standardized representation, ensuring compatibility across varied formats. Furthermore, the data collection undergoes a filtering process to ensure high-quality trajectories, adding an additional layer of refinement to the dataset. In this way, the training pipeline not only unifies trajectories across environments but also enhances the overall quality and reliability of the collected data for LLM training.

Type: Application

Filed: May 8, 2024

Publication date: August 14, 2025

Inventors: Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Juntao Tan, Shelby Heinecke, Yihao Feng, Huan Wang, Juan Carlos Niebles, Silvio Savarese, Caiming Xiong
Systems and methods for video models with procedure understanding

Patent number: 12367662

Abstract: Embodiments described herein provide systems and methods for training video models to perform a task from an input instructional video. A procedure knowledge graph (PKG) may be generated with nodes representing procedure steps, and edges representing relationships between the steps. The PKG may be generated based on text and/or video training data which includes procedures (e.g., instructional videos). Using the PKG, a video model may be trained using the PKG to provide supervisory training signals for a number of tasks. Once the model is trained, it may be fine-tuned for a specific task which benefits from the model being trained in a way that makes the model embed procedural information when encoding videos.

Type: Grant

Filed: January 25, 2023

Date of Patent: July 22, 2025

Assignee: Salesforce, Inc.

Inventors: Roberto Martin-Martin, Silvio Savarese, Honglu Zhou, Juan Carlos Niebles Duque
SYSTEMS AND METHODS FOR ARTIFICIAL INTELLIGENCE AGENTS

Publication number: 20250139411

Abstract: Embodiments described herein provide a large language model (LLM) based AI agent that adopts Monte-Carlo Tree Search (MCTS) to execute a task. The LLM is prompted with a task description and it responds with its first attempted list of actions. Based on the success or failure of the first attempt, the LLM is prompted with an updated prompt which includes feedback from the first attempt based on a determined reward. The prompt may include a relative “score” for each action taken at each step. A numeric score may be mapped to a set of pre-defined text labels, such as “high” or “low” value putting the score in a form more suited for an LLM prompt. In this way, the LLM is iteratively given prompts which are updated with the scores from each action taken at each previous iterations so that it traverses different paths on the tree in each iteration.

Type: Application

Filed: October 31, 2023

Publication date: May 1, 2025

Inventors: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles Duque, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
SYSTEMS AND METHODS FOR EDITING A LARGE LANGUAGE MODEL

Publication number: 20250124233

Abstract: Systems and methods for editing a large language model are provided. The large language model generates a sequence of tokens, a first probability of a pre-edit output based on the sequence of tokens, and a second probability of a target output based on the sequence of tokens. A loss function is provided based on the first probability and the second probability. A plurality of gradients of the large language model with respect to the loss function is computed. An edit location of the large language model is determined based on the plurality of gradients. The large language model is edited by editing weights at the edit location of the large language model, such that the updated large language model generates the target output for an input including the sequence of words.

Type: Application

Filed: January 31, 2024

Publication date: April 17, 2025

Inventors: Itai Izhak Feigenbaum, Devansh Arpit, Shelby Heinecke, Juan Carlos Niebles Duque, Weiran Yao, Huan Wang, Caiming Xiong, Silvio Savarese
Cooperative-contrastive learning systems and methods

Patent number: 12254691

Abstract: Systems and methods for multi-view cooperative contrastive self-supervised learning, may include receiving a plurality of video sequences, the video sequences comprising a plurality of image frames; applying selected images of a first and second video sequence of the plurality of video sequences to a plurality of different encoders to derive a plurality of embeddings for different views of the selected images of the first and second video sequences; determining distances of the derived plurality of embeddings for the selected images of the first and second video sequences; detecting inconsistencies in the determined distances; and predicting semantics of a future image based on the determined distances.

Type: Grant

Filed: December 3, 2020

Date of Patent: March 18, 2025

Assignees: TOYOTA RESEARCH INSTITUTE, INC., THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY

Inventors: Nishant Rai, Ehsan Adeli Mosabbeb, Kuan-Hui Lee, Adrien Gaidon, Juan Carlos Niebles
LARGE LANGUAGE MODELS FOR FLOW ARCHITECTURE DESIGN

Publication number: 20250086402

Abstract: Methods, systems, apparatuses, devices, and computer program products are described. A flow generation service may receive a natural language input that indicates instructions for automating a task according to a first process flow. Using a large language model (LLM), the flow generation service may decompose the natural language input into a set of elements (e.g., logical actions) and connectors, where the LLM may be trained on first metadata corresponding to a second process flow that is created manually by a user. In addition, using the LLM, the flow generation service may generate second metadata corresponding to each of the set of elements based on decomposing the natural language input. The flow generation service may sequence and merge the set of elements to generate the first process flow. In some examples, the flow generation service may send, for display to a user interface of a user device, the first process flow.

Type: Application

Filed: January 17, 2024

Publication date: March 13, 2025

Inventors: Ran Xu, Zeyuan Chen, Yihao Feng, Krithika Ramakrishnan, Congying Xia, Juan Carlos Niebles Duque, Vetter Serdikova, Huan Wang, Yuxi Zhang, Kexin Xie, Donglin Hu, Bo Wang, Ajaay Ravi, Matthew David Trepina, Sam Bailey, Abhishek Das, Yuliya Feldman, Pawan Agarwal
SYSTEMS AND METHODS FOR ORCHESTRATING LLM-AUGMENTED AUTONOMOUS AGENTS

Publication number: 20250053793

Abstract: Embodiments described herein provide a method of predicting an action by a plurality of language model augmented agents (LAAs). In at least one embodiment, a controller receives a task instruction to be performed using an environment. The controller receives an observation of a first state from the environment. The controller selects a LAA from the plurality of LAAs based on the task instruction and the observation. The controller obtains an output from the selected LAA generated using an input combining the task instruction, the observation, and an LAA-specific prompt template. The controller determines the action based on the output. The controller causes the action to be performed on the environment thereby causing the first state of the environment to change to a second state.

Type: Application

Filed: October 25, 2023

Publication date: February 13, 2025

Inventors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles Duque, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
SYSTEMS AND METHODS FOR LANGUAGE AGENT OPTIMIZATION

Publication number: 20250045567

Abstract: Embodiments described herein provide for optimizing a language model (LM) agent. In at least one embodiment, and LM agent comprises an “actor” LM and a “retrospective LM which provides reflections on attempts by the actor LM. The reflections are used to update subsequent prompts to the actor LM. Optimizing the LM agent comprises fine-tuning parameters of the retrospective LM while keeping parameters of the actor LM frozen. A gradient may be determined by a change in reward from the environment based on actions taken by the actor LM with and without a reflection of the retrospective LM. Using this gradient, parameters of the retrospective LM may be updated via backpropagation.

Type: Application

Filed: October 31, 2023

Publication date: February 6, 2025

Inventors: Weiran Yao, Shelby Heinecke, Juan Carlos Niebles Duque, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
SYSTEMS AND METHODS FOR MULTI-MODAL LANGUAGE MODELS

Publication number: 20240370718

Abstract: Embodiments described herein provide a method of generating a multi-modal task output to a text instruction relating to inputs of multiple different modalities (e.g., text, audio, video, 3D). The method comprises receiving, via a data interface, a first input of a first modality, a second input of a second modality and the text instruction relating to the first and the second inputs; encoding, by a first multimodal encoder adapted for the first modality, the first input of the first modality into a first encoded representation conditioned on the text instruction; encoding, by a second multimodal encoder adapted for the second modality, the second input of the second modality into a second encoded representation conditioned on the text instruction; and generating, by a neural network based language model, the multi-modal task output based on an input combining the first encoded representation, the second encoded representation, and the text instruction.

Type: Application

Filed: December 29, 2023

Publication date: November 7, 2024

Inventors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Silvio Savarese, Shafiq Rayhan Joty, Ran Xu, Caiming Xiong, Juan Carlos Niebles Duque
Systems and methods for contrastive pretraining with video tracking supervision

Patent number: 12106541

Abstract: Embodiments described herein provide an intelligent method to select instances, by utilizing unsupervised tracking for videos. Using this freely available form of supervision, a temporal constraint is adopted for selecting instances that ensures that different instances contain the same object while sampling the temporal augmentation from the video. In addition, using the information on the spatial extent of the tracked object, spatial constraints are applied to ensure that sampled instances overlap meaningfully with the tracked object. Taken together, these spatiotemporal constraints result in better supervisory signal for contrastive learning from videos.

Type: Grant

Filed: January 31, 2022

Date of Patent: October 1, 2024

Assignee: Salesforce, Inc.

Inventors: Brian Chen, Ramprasaath Ramasamy Selvaraju, Juan Carlos Niebles Duque, Nikhil Naik
SYSTEMS AND METHODS FOR MULTIMODAL PRETRAINING FOR THREE-DIMENSIONAL UNDERSTANDING MODELS

Publication number: 20240312128

Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A first plurality of samples of a training dataset are generated using a first 3D model. An image generator with multi-view rendering is used to generate a plurality of two-dimensional (2D) images having different viewpoints of the first 3D model. A first language model is used to generate a plurality of texts corresponding to the plurality of 2D images respectively. A first text for a first image is generated by using one or more text descriptions generated by the first language model. A point cloud is generated by randomly sampling points in the 3D model. The first plurality of samples are generated using the plurality of 2D images, the corresponding plurality of texts, and the point cloud. The neural network based 3D encoder is trained using the training dataset including the first plurality of samples.

Type: Application

Filed: October 24, 2023

Publication date: September 19, 2024

Inventors: Le Xue, Ning Yu, Shu Zhang, Junnan Li, Caiming Xiong, Silvio Savarese, Juan Carlos Niebles Duque, Ran Xu
Systems and Methods for Privacy-Preserving Optics

Publication number: 20240289990

Abstract: Systems and methods for privacy-preserving optics are described. An embodiments includes a method of preserving-privacy on captured images while performing a computer vision task that includes generating an optimal set of parameters to parameterize an encoding optical element to produce optical distortions such that acquired images by the camera are distorted, where the optimal set of parameters is learned via end-to-end learning that jointly optimizes from a camera optics model to a computational process that performs a computer vision task on the distorted images acquired by the camera, where the distorted images visually obscure a privacy attribute of people to protect their privacy but still preserve features to perform the computer vision task, acquiring several distorted images, and performing a computer vision task directly on the distorted images where the distortions generated by the camera are optimal and allow obtaining high performance on the computer vision task.

Type: Application

Filed: June 17, 2022

Publication date: August 29, 2024

Applicants: The Board of Trustees of the Leland Stanford Junior University, Universidad Industrial de Santander

Inventors: Juan Carlos Niebles, Carlos Hinojosa, Henry Arguello
SYSTEMS AND METHODS FOR ATTENTION MECHANISM IN THREE-DIMENSIONAL OBJECT DETECTION

Publication number: 20240169746

Abstract: Embodiments described herein provide a system for three-dimensional (3D) object detection. The system includes an input interface configured to obtain 3D point data describing spatial information of a plurality of points, and a memory storing a neural network based 3D object detection model having an encoder and a decoder. The system also includes processors to perform operations including: encoding, by the encoder, a first set of coordinates into a first set of point features and a set of object features; sampling a second set of point features from the first set of point features; generating, by attention layers at the decoder, a set of attention weights by applying cross-attention over at least the set of object features and the second set of point feature, and generate, by the decoder, a predicted bounding box among the plurality of points based on at least in part on the set of attention weights.

Type: Application

Filed: January 30, 2023

Publication date: May 23, 2024

Inventors: Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Juan Carlos Niebles Duque, Caiming Xiong, Ran Xu
SYSTEMS AND METHODS FOR VIDEO MODELS WITH PROCEDURE UNDERSTANDING

Publication number: 20240161464

Abstract: Embodiments described herein provide systems and methods for training video models to perform a task from an input instructional video. A procedure knowledge graph (PKG) may be generated with nodes representing procedure steps, and edges representing relationships between the steps. The PKG may be generated based on text and/or video training data which includes procedures (e.g., instructional videos). Using the PKG, a video model may be trained using the PKG to provide supervisory training signals for a number of tasks. Once the model is trained, it may be fine-tuned for a specific task which benefits from the model being trained in a way that makes the model embed procedural information when encoding videos.

Type: Application

Filed: January 25, 2023

Publication date: May 16, 2024

Inventors: Roberto Martin-Martin, Silvio Savarese, Honglu Zhou, Juan Carlos Niebles Duque
SYSTEMS AND METHODS FOR MULTIMODAL LAYOUT DESIGNS OF DIGITAL PUBLICATIONS

Publication number: 20240104809

Abstract: Embodiments described herein provide systems and methods for multimodal layout generations for digital publications. The system may receive as inputs, a background image, one or more foreground texts, and one or more foreground images. Feature representations of the background image may be generated. The foreground inputs may be input to a layout generator which has cross attention to the background image feature representations in order to generate a layout comprising of bounding box parameters for each input item. A composite layout may be generated based on the inputs and generated bounding boxes. The resulting composite layout may then be displayed on a user interface.

Type: Application

Filed: January 30, 2023

Publication date: March 28, 2024

Inventors: Ning Yu, Chia-Chih Chen, Zeyuan Chen, Caiming Xiong, Juan Carlos Niebles Duque, Ran Xu, Rui Meng
Systems and methods for estimation of Parkinson's Disease gait impairment severity from videos using MDS-UPDRS

Patent number: 11918370

Abstract: Many embodiments of the invention include systems and methods for evaluating motion from a video, the method includes identifying a target individual in a set of one or more frames in a video, analyzing the set of frames to determine a set of pose parameters, generating a 3D body mesh based on the pose parameters, identifying joint positions for the target individual in the set of frames based on the generated 3D body mesh, predicting a motion evaluation score based on the identified join positions, providing an output based on the motion evaluation score.

Type: Grant

Filed: May 19, 2021

Date of Patent: March 5, 2024

Assignee: The Board of Trustees of the Leland Stanford Junior University

Inventors: Ehsan Adeli-Mosabbeb, Mandy Lu, Kathleen Poston, Juan Carlos Niebles
SYSTEMS AND METHODS FOR OPEN VOCABULARY INSTANCE SEGMENTATION IN UNANNOTATED IMAGES

Publication number: 20240070868

Abstract: Embodiments described herein provide an open-vocabulary instance segmentation framework that adopts a pre-trained vision-language model to develop a pipeline in detecting novel categories of instances.

Type: Application

Filed: January 25, 2023

Publication date: February 29, 2024

Inventors: Ning Yu, Vibashan Vishnukumar Sharmini, Chen Xing, Juan Carlos Niebles Duque, Ran Xu
Systems and Methods for Recognizing Human Actions from Privacy-Preserving Optics

Publication number: 20240021018

Abstract: Systems and methods of capturing privacy protected images and performing machine vision tasks are described. An embodiment includes a system that includes an optical component and an image processing application configured to capture distorted video using the optical component, where the optical component includes a set of optimal camera lens parameters ?*o learned using machine learning, performing a machine vision task on the distorted video, where the machine vision task includes a set of optimal action recognition parameters ?*c learned using the machine learning, and generating a classification based on the machine vision task, where the machine learning is jointly trained to optimize the optical element and the machine vision task.

Type: Application

Filed: July 13, 2023

Publication date: January 18, 2024

Applicants: The Board of Trustees of the Leland Stanford Junior University, Universidad Industrial de Santander

Inventors: Juan Carlos Niebles, Carlos Hinojosa, Henry Arguello, Miguel Marquez, Ehsan Adeli-Mosabbeb, Fei-Fei Li
SYSTEMS AND METHODS FOR CONTRASTIVE PRETRAINING WITH VIDEO TRACKING SUPERVISION

Publication number: 20230154139

Abstract: Embodiments described herein provide an intelligent method to select instances, by utilizing unsupervised tracking for videos. Using this freely available form of supervision, a temporal constraint is adopted for selecting instances that ensures that different instances contain the same object while sampling the temporal augmentation from the video. In addition, using the information on the spatial extent of the tracked object, spatial constraints are applied to ensure that sampled instances overlap meaningfully with the tracked object. Taken together, these spatiotemporal constraints result in better supervisory signal for contrastive learning from videos.

Type: Application

Filed: January 31, 2022

Publication date: May 18, 2023

Inventors: Brian Chen, Ramprasaath Ramasamy Selvaraju, Juan Carlos Niebles Duque, Nikhil Naik
Spatio-temporal graph for video captioning with knowledge distillation

Patent number: 11604936

Abstract: A method for scene perception using video captioning based on a spatio-temporal graph model is described. The method includes decomposing the spatio-temporal graph model of a scene in input video into a spatial graph and a temporal graph. The method also includes modeling a two branch framework having an object branch and a scene branch according to the spatial graph and the temporal graph to learn object interactions between the object branch and the scene branch. The method further includes transferring the learned object interactions from the object branch to the scene branch as privileged information. The method also includes captioning the scene by aligning language logits from the object branch and the scene branch according to the learned object interactions.

Type: Grant

Filed: March 23, 2020

Date of Patent: March 14, 2023

Assignees: TOYOTA RESEARCH INSTITUTE, INC., THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY

Inventors: Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien David Gaidon, Ehsan Adeli-Mosabbeb, Juan Carlos Niebles Duque

1 2 next