Patents by Inventor Silvio Savarese

Silvio Savarese has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250111155
    Abstract: Embodiments described herein provide a method for mitigating toxic content in text generation by a neural network based framework. The method includes the following operations. A text input of a sequence of tokens is received via a communication interface. A first output probability for a next token is generated by a first neural network model that is trained to generate tokens belonging to a prioritized category of vocabulary, in response to the text input. A second output probability of the next token is generated by a second neural network model that is trained to generate tokens belonging to an indiscriminate vocabulary, in response to the text input. In response to the text input, the next token of a text output is generated based on a combined output probability, which is computed using a correction item reflective of the first output probability and the second output probability.
    Type: Application
    Filed: January 18, 2024
    Publication date: April 3, 2025
    Inventors: Tong Niu, Yingbo Zhou, Silvio Savarese, Semih Yavuz, Caiming Xiong
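The combined-probability step in the abstract above can be illustrated with a small sketch. The log-ratio form of the correction item and the weight `alpha` are assumptions for illustration, not the patent's exact formula.

```python
import numpy as np

def combine_token_probs(p_prioritized, p_indiscriminate, alpha=1.0):
    """Blend two next-token distributions using a correction term.

    The correction rewards tokens the prioritized-vocabulary model favors
    relative to the indiscriminate model (an assumed form of the patent's
    'correction item').
    """
    p1 = np.asarray(p_prioritized, dtype=float)
    p2 = np.asarray(p_indiscriminate, dtype=float)
    eps = 1e-12
    # Correction term: log-ratio of the two models' probabilities.
    correction = np.log(p1 + eps) - np.log(p2 + eps)
    logits = np.log(p1 + eps) + alpha * correction
    exp = np.exp(logits - logits.max())        # softmax back to a distribution
    return exp / exp.sum()

# Toy vocabulary of 4 tokens: the prioritized model downweights token 3.
p_safe = [0.4, 0.3, 0.25, 0.05]
p_base = [0.3, 0.3, 0.2, 0.2]
combined = combine_token_probs(p_safe, p_base)
```

With this blend, a token the prioritized model suppresses (token 3) ends up even less likely than under either model alone.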
  • Publication number: 20250068901
    Abstract: Embodiments described herein provide a diffusion-based framework that is trained on a dataset with limited text labels to generate a distribution of data samples in the dataset given a specific text description label. Specifically, unlabeled data is first used to train the diffusion model to generate a distribution of data samples. Text-labeled data samples are then used to finetune the diffusion model to generate a data distribution given a specific text description label, thus enhancing the controllability of training.
    Type: Application
    Filed: January 25, 2024
    Publication date: February 27, 2025
    Inventors: Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang, Caiming Xiong, Silvio Savarese
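The two-phase scheme in the abstract above (pretrain on unlabeled data, then finetune with text labels) can be sketched with a toy noise-prediction model. The linear "denoiser", the null-label convention for unlabeled data, and all dimensions are illustrative assumptions, not the patent's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(weights, x0, label_vec, lr=0.01):
    """One noise-prediction step for a toy linear denoiser standing in
    for a diffusion model: predict the added noise from (noisy sample,
    label embedding)."""
    noise = rng.normal(size=x0.shape)
    x_noisy = x0 + noise
    inp = np.concatenate([x_noisy, label_vec])
    pred = weights @ inp
    grad = np.outer(pred - noise, inp)   # gradient of 0.5 * ||pred - noise||^2
    return weights - lr * grad, float(np.mean((pred - noise) ** 2))

dim, label_dim = 4, 2
weights = np.zeros((dim, dim + label_dim))
null_label = np.zeros(label_dim)         # placeholder label for unlabeled data

# Phase 1: pretrain on plentiful unlabeled samples with the null label.
for _ in range(200):
    weights, loss = train_step(weights, rng.normal(size=dim), null_label)

# Phase 2: finetune on the few text-labeled samples to gain controllability.
for _ in range(50):
    weights, loss = train_step(weights, rng.normal(size=dim), np.array([1.0, 0.0]))
```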
  • Patent number: 12226913
    Abstract: Methods and systems to remotely operate robotic devices are provided. A number of embodiments allow users to remotely operate robotic devices using generalized consumer devices (e.g., cell phones). Additional embodiments provide for a platform to allow communication between consumer devices and the robotic devices. Further embodiments allow for training robotic devices to operate autonomously by training the robotic device with machine learning algorithms using data collected from scalable methods of controlling robotic devices.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: February 18, 2025
    Assignee: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Ajay U. Mandlekar, Yuke Zhu, Animesh Garg, Silvio Savarese, Fei-Fei Li
  • Publication number: 20250053793
    Abstract: Embodiments described herein provide a method of predicting an action by a plurality of language model augmented agents (LAAs). In at least one embodiment, a controller receives a task instruction to be performed using an environment. The controller receives an observation of a first state from the environment. The controller selects an LAA from the plurality of LAAs based on the task instruction and the observation. The controller obtains an output from the selected LAA generated using an input combining the task instruction, the observation, and an LAA-specific prompt template. The controller determines the action based on the output. The controller causes the action to be performed on the environment, thereby causing the first state of the environment to change to a second state.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 13, 2025
    Inventors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles Duque, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
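The controller loop described above can be sketched as follows. The scoring interface, the keyword-based selectors, and the stubbed LLM calls are illustrative assumptions; the filing does not specify how selection is implemented.

```python
def select_and_act(agents, task_instruction, observation, env_step):
    """Route a task to one of several language-model-augmented agents (LAAs).

    `agents` maps agent name -> (selector_score_fn, prompt_template, llm_fn);
    this interface is an assumption for illustration.
    """
    # Controller: score each agent on (task, observation), pick the best.
    best_name = max(agents, key=lambda n: agents[n][0](task_instruction, observation))
    _score_fn, template, llm_fn = agents[best_name]
    # The selected agent's input combines instruction, observation, and
    # its agent-specific prompt template.
    prompt = template.format(task=task_instruction, obs=observation)
    output = llm_fn(prompt)
    action = output.strip()          # determine the action from the output
    new_state = env_step(action)     # acting moves the environment to a new state
    return best_name, action, new_state

# Toy setup: a "web" agent and a "math" agent with keyword-based selectors.
agents = {
    "web": (lambda t, o: t.count("search"), "Task: {task}\nObs: {obs}\nAct:",
            lambda p: "CLICK(link_1)"),
    "math": (lambda t, o: t.count("sum"), "Solve: {task}\nState: {obs}\nAct:",
             lambda p: "ANSWER(42)"),
}
name, action, state = select_and_act(
    agents, "compute the sum of the cart", "cart: [20, 22]",
    env_step=lambda a: "done" if a.startswith("ANSWER") else "pending",
)
```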
  • Publication number: 20250045567
    Abstract: Embodiments described herein provide for optimizing a language model (LM) agent. In at least one embodiment, an LM agent comprises an "actor" LM and a "retrospective" LM, which provides reflections on attempts by the actor LM. The reflections are used to update subsequent prompts to the actor LM. Optimizing the LM agent comprises fine-tuning parameters of the retrospective LM while keeping parameters of the actor LM frozen. A gradient may be determined by a change in reward from the environment based on actions taken by the actor LM with and without a reflection of the retrospective LM. Using this gradient, parameters of the retrospective LM may be updated via backpropagation.
    Type: Application
    Filed: October 31, 2023
    Publication date: February 6, 2025
    Inventors: Weiran Yao, Shelby Heinecke, Juan Carlos Niebles Duque, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
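The reward-difference gradient described above resembles a REINFORCE-style estimator, sketched below under that assumption; the filing's exact gradient derivation may differ.

```python
import numpy as np

def retrospective_gradient(reward_with, reward_without, grad_log_prob):
    """Estimate a parameter gradient for the frozen-actor setup: the
    reward difference (with vs. without the reflection) scales the
    gradient of the retrospective LM's log-probability of emitting
    that reflection."""
    advantage = reward_with - reward_without
    return advantage * np.asarray(grad_log_prob, dtype=float)

# Toy parameters of the retrospective model; the actor stays frozen.
theta = np.array([0.5, -0.2])
grad_log_prob = np.array([0.1, 0.3])   # d/dtheta log p(reflection)
g = retrospective_gradient(reward_with=1.0, reward_without=0.25,
                           grad_log_prob=grad_log_prob)
theta = theta + 0.1 * g                # gradient ascent on expected reward
```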
  • Publication number: 20240370718
    Abstract: Embodiments described herein provide a method of generating a multi-modal task output to a text instruction relating to inputs of multiple different modalities (e.g., text, audio, video, 3D). The method comprises receiving, via a data interface, a first input of a first modality, a second input of a second modality and the text instruction relating to the first and the second inputs; encoding, by a first multimodal encoder adapted for the first modality, the first input of the first modality into a first encoded representation conditioned on the text instruction; encoding, by a second multimodal encoder adapted for the second modality, the second input of the second modality into a second encoded representation conditioned on the text instruction; and generating, by a neural network based language model, the multi-modal task output based on an input combining the first encoded representation, the second encoded representation, and the text instruction.
    Type: Application
    Filed: December 29, 2023
    Publication date: November 7, 2024
    Inventors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Silvio Savarese, Shafiq Rayhan Joty, Ran Xu, Caiming Xiong, Juan Carlos Niebles Duque
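The instruction-conditioned encoding and input combination described above can be sketched with toy encoders. The concatenate-then-project form and all dimensions are assumptions standing in for real modality-specific encoders.

```python
import numpy as np

def instruction_conditioned_encode(x, instr_vec, proj):
    """Encode one modality conditioned on the text instruction (a toy
    stand-in for a modality-specific multimodal encoder)."""
    return np.tanh(proj @ np.concatenate([x, instr_vec]))

rng = np.random.default_rng(1)
d_in, d_instr, d_out = 3, 2, 4
proj_a = rng.normal(size=(d_out, d_in + d_instr))  # encoder for modality A (e.g. audio)
proj_b = rng.normal(size=(d_out, d_in + d_instr))  # encoder for modality B (e.g. video)

instr = rng.normal(size=d_instr)                   # embedded text instruction
rep_a = instruction_conditioned_encode(rng.normal(size=d_in), instr, proj_a)
rep_b = instruction_conditioned_encode(rng.normal(size=d_in), instr, proj_b)

# The language model consumes both encoded representations plus the instruction.
lm_input = np.concatenate([rep_a, rep_b, instr])
```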
  • Publication number: 20240312128
    Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A first plurality of samples of a training dataset are generated using a first 3D model. An image generator with multi-view rendering is used to generate a plurality of two-dimensional (2D) images having different viewpoints of the first 3D model. A first language model is used to generate a plurality of texts corresponding to the plurality of 2D images respectively. A first text for a first image is generated by using one or more text descriptions generated by the first language model. A point cloud is generated by randomly sampling points in the 3D model. The first plurality of samples are generated using the plurality of 2D images, the corresponding plurality of texts, and the point cloud. The neural network based 3D encoder is trained using the training dataset including the first plurality of samples.
    Type: Application
    Filed: October 24, 2023
    Publication date: September 19, 2024
    Inventors: Le Xue, Ning Yu, Shu Zhang, Junnan Li, Caiming Xiong, Silvio Savarese, Juan Carlos Niebles Duque, Ran Xu
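The "randomly sampling points in the 3D model" step above can be realized by area-weighted sampling on a triangle mesh, one common approach; the filing does not pin down the sampling method, so this sketch is an assumption.

```python
import numpy as np

def sample_point_cloud(vertices, faces, n_points, rng):
    """Sample surface points uniformly from a triangle mesh."""
    v = np.asarray(vertices, dtype=float)
    tri = v[np.asarray(faces)]                       # (F, 3, 3)
    # Pick faces with probability proportional to their area.
    areas = 0.5 * np.linalg.norm(
        np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    idx = rng.choice(len(tri), size=n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates inside each chosen triangle.
    r1, r2 = rng.random(n_points), rng.random(n_points)
    flip = r1 + r2 > 1
    r1[flip], r2[flip] = 1 - r1[flip], 1 - r2[flip]
    t = tri[idx]
    return t[:, 0] + r1[:, None] * (t[:, 1] - t[:, 0]) + r2[:, None] * (t[:, 2] - t[:, 0])

# Unit square in the z = 0 plane, made of two triangles.
verts = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
faces = [[0, 1, 2], [0, 2, 3]]
cloud = sample_point_cloud(verts, faces, 1024, np.random.default_rng(0))
```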
  • Publication number: 20240169704
    Abstract: Systems and methods for training a neural network based three-dimensional (3D) encoder for 3D classification are provided. A training dataset including a plurality of samples is received, wherein a first sample includes an image, a text, and a point cloud. An image encoder of a pretrained vision and language model is used to generate image representations for the image of the first sample. A text encoder of the pretrained vision and language model is used to generate text representations for the text of the first sample. The neural network based 3D encoder is used to generate 3D representations for the point cloud of the first sample. A loss objective is computed based on the image representations, text representations, and 3D representations. Parameters of the neural network based 3D encoder are updated based on the computed loss objective via backpropagation.
    Type: Application
    Filed: March 13, 2023
    Publication date: May 23, 2024
    Inventors: Le Xue, Chen Xing, Juan Carlos Niebles Duque, Caiming Xiong, Ran Xu, Silvio Savarese
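The loss objective over the image, text, and 3D representations described above can be sketched as an InfoNCE-style contrastive loss; the filing's exact objective may differ, and the temperature value here is an assumption.

```python
import numpy as np

def contrastive_loss(reps_3d, reps_other, temperature=0.07):
    """InfoNCE-style loss aligning 3D representations with image (or
    text) representations from a frozen pretrained encoder; matching
    pairs sit on the diagonal of the similarity matrix."""
    a = reps_3d / np.linalg.norm(reps_3d, axis=1, keepdims=True)
    b = reps_other / np.linalg.norm(reps_other, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature
    # Cross-entropy with the matching pair as the target for each row.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
matched = rng.normal(size=(8, 16))
# Nearly aligned 3D reps give a low loss; random reps give a high loss.
loss_aligned = contrastive_loss(matched + 0.01 * rng.normal(size=(8, 16)), matched)
loss_random = contrastive_loss(rng.normal(size=(8, 16)), matched)
```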
  • Publication number: 20240161464
    Abstract: Embodiments described herein provide systems and methods for training video models to perform a task from an input instructional video. A procedure knowledge graph (PKG) may be generated with nodes representing procedure steps, and edges representing relationships between the steps. The PKG may be generated based on text and/or video training data which includes procedures (e.g., instructional videos). Using the PKG, a video model may be trained, with the PKG providing supervisory training signals for a number of tasks. Once the model is trained, it may be fine-tuned for a specific task, which benefits from the model having been trained to embed procedural information when encoding videos.
    Type: Application
    Filed: January 25, 2023
    Publication date: May 16, 2024
    Inventors: Roberto Martin-Martin, Silvio Savarese, Honglu Zhou, Juan Carlos Niebles Duque
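A minimal reading of the PKG construction above is a directed graph whose nodes are steps and whose edges record observed step orderings; the filing's graph likely carries richer relationships, so this sketch is a simplification.

```python
def build_pkg(procedures):
    """Build a tiny procedure knowledge graph: nodes are step names,
    directed edges record that one step followed another in some
    procedure, with counts as edge weights."""
    nodes, edges = set(), {}
    for steps in procedures:
        nodes.update(steps)
        for a, b in zip(steps, steps[1:]):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return nodes, edges

# Two instructional videos for the same recipe with slightly different orders.
videos = [
    ["crack eggs", "whisk", "heat pan", "pour", "fold"],
    ["heat pan", "crack eggs", "whisk", "pour", "fold"],
]
nodes, edges = build_pkg(videos)
```

Edge counts like these could then serve as weak supervisory signals, e.g. for predicting plausible next steps.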
  • Publication number: 20240160917
    Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A training dataset is generated using a plurality of 3D models of a 3D model dataset. To generate a first sample of the training dataset, an image generator with multi-view rendering is used to generate a plurality of image candidates of a first 3D model. A word is chosen from metadata associated with the first 3D model. A language model is used to generate one or more text descriptions using the chosen word and a plurality of prompts. A point cloud is generated by randomly sampling points in the 3D model. The first sample is generated to include a first image randomly selected from the plurality of image candidates, the one or more text descriptions, and the point cloud. The 3D encoder is trained using the training dataset including the first sample.
    Type: Application
    Filed: March 13, 2023
    Publication date: May 16, 2024
    Inventors: Le Xue, Chen Xing, Juan Carlos Niebles Duque, Caiming Xiong, Ran Xu, Silvio Savarese
  • Publication number: 20240118937
    Abstract: Embodiments herein relate to predicting, based on previous usage of a cloud-based computing resource by a user of one or more users of the cloud-based computing resource, future usage of the cloud-based computing resource. Based on the predicted future usage, embodiments relate to identifying that throttling of access to the cloud-based computing resource is to occur, and to notifying the user of the throttling. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: October 7, 2022
    Publication date: April 11, 2024
    Applicant: Salesforce, Inc.
    Inventors: Bo Zong, Huan Wang, Tian Lan, Ran Yao, Tony Wong, Daeki Cho, Caiming Xiong, Silvio Savarese, Yingbo Zhou
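The predict-then-throttle flow above can be sketched with a deliberately simple forecaster. The linear-trend predictor, the quota-comparison rule, and the notification text are all assumptions; the filing does not specify the prediction model.

```python
import numpy as np

def predict_next_usage(history):
    """Forecast next-period usage with a least-squares linear trend
    over past periods (a toy stand-in for the predictor)."""
    y = np.asarray(history, dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    return slope * len(y) + intercept

def throttle_check(history, quota):
    """Return (will_throttle, user_notification)."""
    forecast = predict_next_usage(history)
    if forecast > quota:
        return True, (f"Predicted usage {forecast:.0f} exceeds quota "
                      f"{quota}; access will be throttled.")
    return False, "No throttling expected."

# Usage growing 20 units per period will overshoot a quota of 150.
throttled, msg = throttle_check([100, 120, 140, 160], quota=150)
```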
  • Publication number: 20230226696
    Abstract: Methods and systems to remotely operate robotic devices are provided. A number of embodiments allow users to remotely operate robotic devices using generalized consumer devices (e.g., cell phones). Additional embodiments provide for a platform to allow communication between consumer devices and the robotic devices. Further embodiments allow for training robotic devices to operate autonomously by training the robotic device with machine learning algorithms using data collected from scalable methods of controlling robotic devices.
    Type: Application
    Filed: November 2, 2020
    Publication date: July 20, 2023
    Applicant: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Ajay U. Mandlekar, Yuke Zhu, Animesh Garg, Silvio Savarese, Fei-Fei Li
  • Patent number: 11301775
    Abstract: A data annotation apparatus for machine learning is provided, which includes a stimulus generation portion, a biometrics reading portion, and a data integration portion. The stimulus generation portion is configured to generate, and present to an agent, at least one stimulus based on a first data from a first machine learning dataset. The biometrics reading portion is configured to measure at least one response of the agent to the at least one stimulus, and to generate biometrics data based on the at least one response. The data integration portion is configured to integrate the biometrics data, data of the at least one stimulus, and data of the first machine learning dataset to thereby obtain a second machine learning dataset. The data annotation apparatus can result in improved data labeling and enhanced machine learning.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: April 12, 2022
    Assignee: CloudMinds Robotics Co., Ltd.
    Inventors: Qiang Li, Silvio Savarese, Charles Robert Jankowski, Jr., William Xiao-Qing Huang, Zhe Zhang, Xiaoli Fern
  • Patent number: 11004202
    Abstract: Systems and methods for obtaining 3D point-level segmentation of 3D point clouds in accordance with various embodiments of the invention are disclosed. One embodiment includes: at least one processor, and a memory containing a segmentation pipeline application. In addition, the segmentation pipeline application configures the at least one processor to: pre-process a 3D point cloud to group 3D points; provide the groups of 3D points to a 3D neural network to generate initial label predictions for the groups of 3D points; interpolate label predictions for individual 3D points based upon initial label predictions for at least two neighboring groups of 3D points including the group of 3D points to which a given individual 3D point belongs; refine the label predictions using a graph neural network; and output a segmented 3D point cloud.
    Type: Grant
    Filed: October 9, 2018
    Date of Patent: May 11, 2021
    Assignee: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, Silvio Savarese
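Two stages of the pipeline above, grouping 3D points and interpolating group-level predictions back to individual points, can be sketched as follows. The voxel grouping and inverse-distance interpolation are simplifications (the neural networks are replaced by given per-group probabilities), so treat this as an assumed illustration.

```python
import numpy as np

def voxel_groups(points, voxel_size):
    """Pre-process: group 3D points by the voxel they fall in."""
    keys = np.floor(np.asarray(points) / voxel_size).astype(int)
    groups = {}
    for i, k in enumerate(map(tuple, keys)):
        groups.setdefault(k, []).append(i)
    return groups

def interpolate_point_labels(points, groups, group_probs):
    """Per-point label scores as a distance-weighted blend of the class
    probabilities of nearby groups."""
    pts = np.asarray(points, dtype=float)
    centers = {k: pts[idx].mean(axis=0) for k, idx in groups.items()}
    n_classes = group_probs[next(iter(group_probs))].size
    out = np.zeros((len(pts), n_classes))
    for i, p in enumerate(pts):
        w_total = 0.0
        for k, c in centers.items():
            w = 1.0 / (np.linalg.norm(p - c) + 1e-6)   # inverse-distance weight
            out[i] += w * group_probs[k]
            w_total += w
        out[i] /= w_total
    return out

pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.1, 0.1], [1.1, 1.2, 1.1]])
groups = voxel_groups(pts, voxel_size=1.0)
# Pretend a 3D network produced per-group probabilities over 2 classes.
probs = {k: (np.array([0.9, 0.1]) if k == (0, 0, 0) else np.array([0.2, 0.8]))
         for k in groups}
point_probs = interpolate_point_labels(pts, groups, probs)
```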
  • Patent number: 10922353
    Abstract: A system and method for determining an object or product represented in an image is disclosed. The system receives a first image, determines a region of interest in the first image, determines a classification score for the region of interest using a convolutional neural network that assigns the region of interest the classification score corresponding to a class, and identifies a first product in the first image based on the classification score.
    Type: Grant
    Filed: February 1, 2019
    Date of Patent: February 16, 2021
    Assignee: Ricoh Company, Ltd.
    Inventors: Junghyun Kwon, Ramya Narasimha, Edward L. Schwartz, Max McFarland, Silvio Savarese, Kathrin Berkner
  • Patent number: 10846836
    Abstract: Disclosed is a system and method for generating intermediate views between two received images. To generate the intermediate views, a rectification network rectifies the two images and an encoder network encodes the two rectified images to generate convolutional neural network features. The convolutional neural network features are fed to a decoder network that decodes the features to generate a correspondence between the two rectified images and blending masks to predict the visibility of pixels of the rectified images in the intermediate view images. Using the correspondence between the two rectified images and blending masks, a view morphing network synthesizes intermediate view images depicting an object in the two images in a view between the two images.
    Type: Grant
    Filed: November 14, 2016
    Date of Patent: November 24, 2020
    Assignee: RICOH COMPANY, LTD.
    Inventors: Junghyun Kwon, Dinghuang Ji, Max McFarland, Silvio Savarese
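The final blending step described above can be sketched in isolation: given two already rectified and warped views plus predicted visibility masks, the intermediate view is a masked weighted average. The upstream rectification, encoder, and decoder networks are stubbed out, and the normalization scheme is an assumption.

```python
import numpy as np

def morph_views(view_a, view_b, mask_a, mask_b, alpha=0.5):
    """Synthesize an intermediate view from two warped images using
    per-pixel visibility blending masks."""
    wa = (1 - alpha) * np.asarray(mask_a, dtype=float)
    wb = alpha * np.asarray(mask_b, dtype=float)
    total = wa + wb
    total[total == 0] = 1.0   # avoid divide-by-zero where nothing is visible
    return (wa * view_a + wb * view_b) / total

# Two tiny 2x2 "images"; the right column is visible only in view B.
a = np.array([[1.0, 0.0], [1.0, 0.0]])
b = np.array([[3.0, 5.0], [3.0, 5.0]])
mask_a = np.array([[1.0, 0.0], [1.0, 0.0]])
mask_b = np.array([[1.0, 1.0], [1.0, 1.0]])
mid = morph_views(a, b, mask_a, mask_b)
```

Where both views are visible the result averages them; where only view B is visible, its pixels pass through unblended.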
  • Patent number: 10489893
    Abstract: The disclosure includes a system and method for performing image rectification using a single image and information identified from the single image. An image recognition application receives an input image, identifies a plurality of objects in the input image, estimates rectification parameters for the plurality of objects, identifies a plurality of candidate rectification parameters using a voting procedure on the rectification parameters for the plurality of objects, estimates final rectification parameters based on the plurality of candidate rectification parameters, computes a global transformation matrix using the final rectification parameters, and performs image rectification on the input image using the global transformation matrix.
    Type: Grant
    Filed: January 29, 2018
    Date of Patent: November 26, 2019
    Assignee: Ricoh Company, Ltd.
    Inventors: Jorge Moraleda, Ekta Prashnani, Michael J. Gormish, Kathrin Berkner, Silvio Savarese
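The voting procedure above can be sketched with a one-parameter toy: each detected object votes for its estimated rectification parameter, and the consensus comes from the most popular histogram bin. Reducing the parameters to a single tilt angle and the bin width are assumptions for illustration.

```python
import numpy as np

def vote_rectification_params(per_object_params, bin_width=5.0):
    """Pick consensus rectification parameters by histogram voting,
    then estimate the final value from the winning bin's votes."""
    params = np.asarray(per_object_params, dtype=float)
    bins = np.floor(params / bin_width).astype(int)
    values, counts = np.unique(bins, return_counts=True)
    winner = values[np.argmax(counts)]          # most popular bin
    return float(params[bins == winner].mean()) # average the agreeing votes

# Tilt estimates from several detected objects; 80 degrees is an outlier.
angle = vote_rectification_params([12.0, 13.5, 11.0, 80.0, 12.5])
```

The outlier object's vote lands in its own bin and is discarded, which is the point of voting before estimating final parameters.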
  • Patent number: 10424065
    Abstract: Systems and methods for performing three-dimensional semantic parsing of indoor spaces in accordance with embodiments of the invention are disclosed. In one embodiment, a method includes receiving input data representing a three-dimensional space, determining disjointed spaces within the received data by generating a density histogram on each of a plurality of axes, determining space dividers based on the generated density histogram, and dividing the point cloud data into segments based on the determined space dividers, and determining elements in the disjointed spaces by aligning the disjointed spaces within the point cloud data along similar axes to create aligned versions of the disjointed spaces, normalizing the aligned versions of the disjointed spaces, determining features in the disjointed spaces, generating at least one detection score, and filtering the at least one detection score to determine a final set of determined elements.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: September 24, 2019
    Assignee: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Iro Armeni, Ozan Sener, Amir R. Zamir, Martin Fischer, Silvio Savarese
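The density-histogram step above exploits the fact that walls produce dense slabs of points along an axis. A one-axis sketch, with illustrative bin width and peak threshold (the filing's actual thresholds are not given):

```python
import numpy as np

def find_space_dividers(xs, bin_width=0.5, min_peak=50):
    """Locate candidate space dividers as peaks in a 1-D point-density
    histogram along one axis."""
    xs = np.asarray(xs, dtype=float)
    edges = np.arange(xs.min(), xs.max() + bin_width, bin_width)
    counts, edges = np.histogram(xs, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[counts >= min_peak]          # dense bins mark dividers

rng = np.random.default_rng(0)
# Sparse floor points across a 10 m span plus a dense wall slab near x = 5.
floor = rng.uniform(0.0, 10.0, size=200)
wall = rng.normal(5.0, 0.05, size=300)
dividers = find_space_dividers(np.concatenate([floor, wall]))
```

Repeating this per axis and splitting the cloud at the detected peaks yields the disjointed spaces the method then processes individually.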
  • Publication number: 20190163698
    Abstract: A system and method for determining an object or product represented in an image is disclosed. The system receives a first image, determines a region of interest in the first image, determines a classification score for the region of interest using a convolutional neural network that assigns the region of interest the classification score corresponding to a class, and identifies a first product in the first image based on the classification score.
    Type: Application
    Filed: February 1, 2019
    Publication date: May 30, 2019
    Applicant: Ricoh Company, Ltd.
    Inventors: Junghyun Kwon, Ramya Narasimha, Edward L. Schwartz, Max McFarland, Silvio Savarese, Kathrin Berkner
  • Publication number: 20190108639
    Abstract: Systems and methods for obtaining 3D point-level segmentation of 3D point clouds in accordance with various embodiments of the invention are disclosed. One embodiment includes: at least one processor, and a memory containing a segmentation pipeline application. In addition, the segmentation pipeline application configures the at least one processor to: pre-process a 3D point cloud to group 3D points; provide the groups of 3D points to a 3D neural network to generate initial label predictions for the groups of 3D points; interpolate label predictions for individual 3D points based upon initial label predictions for at least two neighboring groups of 3D points including the group of 3D points to which a given individual 3D point belongs; refine the label predictions using a graph neural network; and output a segmented 3D point cloud.
    Type: Application
    Filed: October 9, 2018
    Publication date: April 11, 2019
    Applicant: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, Silvio Savarese