Patents by Inventor Silvio Savarese

Silvio Savarese has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250111155
    Abstract: Embodiments described herein provide a method for mitigating toxic content in text generation by a neural network based framework. The method includes the following operations. A text input of a sequence of tokens is received via a communication interface. A first output probability for a next token is generated by a first neural network model that is trained to generate tokens belonging to a prioritized category of vocabulary, in response to the text input. A second output probability of the next token is generated by a second neural network model that is trained to generate tokens belonging to an indiscriminate vocabulary, in response to the text input. In response to the text input, the next token of a text output is generated based on a combined output probability, which is computed using a correction item reflective of the first output probability and the second output probability.
    Type: Application
    Filed: January 18, 2024
    Publication date: April 3, 2025
    Inventors: Tong Niu, Yingbo Zhou, Silvio Savarese, Semih Yavuz, Caiming Xiong
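The combined-probability step in the abstract above can be illustrated with a small sketch. The log-ratio form of the correction item and the weight `alpha` are assumptions for illustration, not the patent's exact formula.

```python
import numpy as np

def combine_token_probs(p_prioritized, p_indiscriminate, alpha=1.0):
    """Blend two next-token distributions using a correction term.

    The correction rewards tokens the prioritized-vocabulary model favors
    relative to the indiscriminate model (an assumed form of the patent's
    'correction item').
    """
    p1 = np.asarray(p_prioritized, dtype=float)
    p2 = np.asarray(p_indiscriminate, dtype=float)
    eps = 1e-12
    # Correction term: log-ratio of the two models' probabilities.
    correction = np.log(p1 + eps) - np.log(p2 + eps)
    logits = np.log(p1 + eps) + alpha * correction
    exp = np.exp(logits - logits.max())        # softmax back to a distribution
    return exp / exp.sum()

# Toy vocabulary of 4 tokens: the prioritized model downweights token 3.
p_safe = [0.4, 0.3, 0.25, 0.05]
p_base = [0.3, 0.3, 0.2, 0.2]
combined = combine_token_probs(p_safe, p_base)
```

With this blend, a token the prioritized model suppresses (token 3) ends up even less likely than under either model alone.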
  • Publication number: 20250068901
    Abstract: Embodiments described herein provide a diffusion-based framework that is trained on a dataset with limited text labels to generate a distribution of data samples in the dataset given a specific text description label. Specifically, unlabeled data is first used to train the diffusion model to generate a distribution of data samples. Text-labeled data samples are then used to finetune the diffusion model to generate a data distribution given a specific text description label, thus enhancing the controllability of training.
    Type: Application
    Filed: January 25, 2024
    Publication date: February 27, 2025
    Inventors: Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang, Caiming Xiong, Silvio Savarese
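The two-phase scheme in the abstract above (pretrain on unlabeled data, then finetune with text labels) can be sketched with a toy noise-prediction model. The linear "denoiser", the null-label convention for unlabeled data, and all dimensions are illustrative assumptions, not the patent's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(weights, x0, label_vec, lr=0.01):
    """One noise-prediction step for a toy linear denoiser standing in
    for a diffusion model: predict the added noise from (noisy sample,
    label embedding)."""
    noise = rng.normal(size=x0.shape)
    x_noisy = x0 + noise
    inp = np.concatenate([x_noisy, label_vec])
    pred = weights @ inp
    grad = np.outer(pred - noise, inp)   # gradient of 0.5 * ||pred - noise||^2
    return weights - lr * grad, float(np.mean((pred - noise) ** 2))

dim, label_dim = 4, 2
weights = np.zeros((dim, dim + label_dim))
null_label = np.zeros(label_dim)         # placeholder label for unlabeled data

# Phase 1: pretrain on plentiful unlabeled samples with the null label.
for _ in range(200):
    weights, loss = train_step(weights, rng.normal(size=dim), null_label)

# Phase 2: finetune on the few text-labeled samples to gain controllability.
for _ in range(50):
    weights, loss = train_step(weights, rng.normal(size=dim), np.array([1.0, 0.0]))
```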
  • Patent number: 12226913
    Abstract: Methods and systems to remotely operate robotic devices are provided. A number of embodiments allow users to remotely operate robotic devices using generalized consumer devices (e.g., cell phones). Additional embodiments provide for a platform to allow communication between consumer devices and the robotic devices. Further embodiments allow for training robotic devices to operate autonomously by training the robotic device with machine learning algorithms using data collected from scalable methods of controlling robotic devices.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: February 18, 2025
    Assignee: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Ajay U. Mandlekar, Yuke Zhu, Animesh Garg, Silvio Savarese, Fei-Fei Li
  • Publication number: 20250053793
    Abstract: Embodiments described herein provide a method of predicting an action by a plurality of language model augmented agents (LAAs). In at least one embodiment, a controller receives a task instruction to be performed using an environment. The controller receives an observation of a first state from the environment. The controller selects an LAA from the plurality of LAAs based on the task instruction and the observation. The controller obtains an output from the selected LAA generated using an input combining the task instruction, the observation, and an LAA-specific prompt template. The controller determines the action based on the output. The controller causes the action to be performed on the environment, thereby causing the first state of the environment to change to a second state.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 13, 2025
    Inventors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles Duque, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
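The controller loop described above can be sketched as follows. The scoring interface, the keyword-based selectors, and the stubbed LLM calls are illustrative assumptions; the filing does not specify how selection is implemented.

```python
def select_and_act(agents, task_instruction, observation, env_step):
    """Route a task to one of several language-model-augmented agents (LAAs).

    `agents` maps agent name -> (selector_score_fn, prompt_template, llm_fn);
    this interface is an assumption for illustration.
    """
    # Controller: score each agent on (task, observation), pick the best.
    best_name = max(agents, key=lambda n: agents[n][0](task_instruction, observation))
    _score_fn, template, llm_fn = agents[best_name]
    # The selected agent's input combines instruction, observation, and
    # its agent-specific prompt template.
    prompt = template.format(task=task_instruction, obs=observation)
    output = llm_fn(prompt)
    action = output.strip()          # determine the action from the output
    new_state = env_step(action)     # acting moves the environment to a new state
    return best_name, action, new_state

# Toy setup: a "web" agent and a "math" agent with keyword-based selectors.
agents = {
    "web": (lambda t, o: t.count("search"), "Task: {task}\nObs: {obs}\nAct:",
            lambda p: "CLICK(link_1)"),
    "math": (lambda t, o: t.count("sum"), "Solve: {task}\nState: {obs}\nAct:",
             lambda p: "ANSWER(42)"),
}
name, action, state = select_and_act(
    agents, "compute the sum of the cart", "cart: [20, 22]",
    env_step=lambda a: "done" if a.startswith("ANSWER") else "pending",
)
```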
  • Publication number: 20250045567
    Abstract: Embodiments described herein provide for optimizing a language model (LM) agent. In at least one embodiment, an LM agent comprises an "actor" LM and a "retrospective" LM, which provides reflections on attempts by the actor LM. The reflections are used to update subsequent prompts to the actor LM. Optimizing the LM agent comprises fine-tuning parameters of the retrospective LM while keeping parameters of the actor LM frozen. A gradient may be determined by a change in reward from the environment based on actions taken by the actor LM with and without a reflection of the retrospective LM. Using this gradient, parameters of the retrospective LM may be updated via backpropagation.
    Type: Application
    Filed: October 31, 2023
    Publication date: February 6, 2025
    Inventors: Weiran Yao, Shelby Heinecke, Juan Carlos Niebles Duque, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
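The reward-difference gradient described above resembles a REINFORCE-style estimator, sketched below under that assumption; the filing's exact gradient derivation may differ.

```python
import numpy as np

def retrospective_gradient(reward_with, reward_without, grad_log_prob):
    """Estimate a parameter gradient for the frozen-actor setup: the
    reward difference (with vs. without the reflection) scales the
    gradient of the retrospective LM's log-probability of emitting
    that reflection."""
    advantage = reward_with - reward_without
    return advantage * np.asarray(grad_log_prob, dtype=float)

# Toy parameters of the retrospective model; the actor stays frozen.
theta = np.array([0.5, -0.2])
grad_log_prob = np.array([0.1, 0.3])   # d/dtheta log p(reflection)
g = retrospective_gradient(reward_with=1.0, reward_without=0.25,
                           grad_log_prob=grad_log_prob)
theta = theta + 0.1 * g                # gradient ascent on expected reward
```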
  • Publication number: 20240370718
    Abstract: Embodiments described herein provide a method of generating a multi-modal task output to a text instruction relating to inputs of multiple different modalities (e.g., text, audio, video, 3D). The method comprises receiving, via a data interface, a first input of a first modality, a second input of a second modality and the text instruction relating to the first and the second inputs; encoding, by a first multimodal encoder adapted for the first modality, the first input of the first modality into a first encoded representation conditioned on the text instruction; encoding, by a second multimodal encoder adapted for the second modality, the second input of the second modality into a second encoded representation conditioned on the text instruction; and generating, by a neural network based language model, the multi-modal task output based on an input combining the first encoded representation, the second encoded representation, and the text instruction.
    Type: Application
    Filed: December 29, 2023
    Publication date: November 7, 2024
    Inventors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Silvio Savarese, Shafiq Rayhan Joty, Ran Xu, Caiming Xiong, Juan Carlos Niebles Duque
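The instruction-conditioned encoding and input combination described above can be sketched with toy encoders. The concatenate-then-project form and all dimensions are assumptions standing in for real modality-specific encoders.

```python
import numpy as np

def instruction_conditioned_encode(x, instr_vec, proj):
    """Encode one modality conditioned on the text instruction (a toy
    stand-in for a modality-specific multimodal encoder)."""
    return np.tanh(proj @ np.concatenate([x, instr_vec]))

rng = np.random.default_rng(1)
d_in, d_instr, d_out = 3, 2, 4
proj_a = rng.normal(size=(d_out, d_in + d_instr))  # encoder for modality A (e.g. audio)
proj_b = rng.normal(size=(d_out, d_in + d_instr))  # encoder for modality B (e.g. video)

instr = rng.normal(size=d_instr)                   # embedded text instruction
rep_a = instruction_conditioned_encode(rng.normal(size=d_in), instr, proj_a)
rep_b = instruction_conditioned_encode(rng.normal(size=d_in), instr, proj_b)

# The language model consumes both encoded representations plus the instruction.
lm_input = np.concatenate([rep_a, rep_b, instr])
```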
  • Publication number: 20240312128
    Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A first plurality of samples of a training dataset are generated using a first 3D model. An image generator with multi-view rendering is used to generate a plurality of two-dimensional (2D) images having different viewpoints of the first 3D model. A first language model is used to generate a plurality of texts corresponding to the plurality of 2D images respectively. A first text for a first image is generated by using one or more text descriptions generated by the first language model. A point cloud is generated by randomly sampling points in the 3D model. The first plurality of samples are generated using the plurality of 2D images, the corresponding plurality of texts, and the point cloud. The neural network based 3D encoder is trained using the training dataset including the first plurality of samples.
    Type: Application
    Filed: October 24, 2023
    Publication date: September 19, 2024
    Inventors: Le Xue, Ning Yu, Shu Zhang, Junnan Li, Caiming Xiong, Silvio Savarese, Juan Carlos Niebles Duque, Ran Xu
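The "randomly sampling points in the 3D model" step above can be realized by area-weighted sampling on a triangle mesh, one common approach; the filing does not pin down the sampling method, so this sketch is an assumption.

```python
import numpy as np

def sample_point_cloud(vertices, faces, n_points, rng):
    """Sample surface points uniformly from a triangle mesh."""
    v = np.asarray(vertices, dtype=float)
    tri = v[np.asarray(faces)]                       # (F, 3, 3)
    # Pick faces with probability proportional to their area.
    areas = 0.5 * np.linalg.norm(
        np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    idx = rng.choice(len(tri), size=n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates inside each chosen triangle.
    r1, r2 = rng.random(n_points), rng.random(n_points)
    flip = r1 + r2 > 1
    r1[flip], r2[flip] = 1 - r1[flip], 1 - r2[flip]
    t = tri[idx]
    return t[:, 0] + r1[:, None] * (t[:, 1] - t[:, 0]) + r2[:, None] * (t[:, 2] - t[:, 0])

# Unit square in the z = 0 plane, made of two triangles.
verts = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
faces = [[0, 1, 2], [0, 2, 3]]
cloud = sample_point_cloud(verts, faces, 1024, np.random.default_rng(0))
```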
  • Publication number: 20240169704
    Abstract: Systems and methods for training a neural network based three-dimensional (3D) encoder for 3D classification are provided. A training dataset including a plurality of samples is received, wherein a first sample includes an image, a text, and a point cloud. An image encoder of a pretrained vision and language model is used to generate image representations for the image of the first sample. A text encoder of the pretrained vision and language model is used to generate text representations for the text of the first sample. The neural network based 3D encoder is used to generate 3D representations for the point cloud of the first sample. A loss objective is computed based on the image representations, text representations, and 3D representations. Parameters of the neural network based 3D encoder are updated based on the computed loss objective via backpropagation.
    Type: Application
    Filed: March 13, 2023
    Publication date: May 23, 2024
    Inventors: Le Xue, Chen Xing, Juan Carlos Niebles Duque, Caiming Xiong, Ran Xu, Silvio Savarese
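The loss objective over the image, text, and 3D representations described above can be sketched as an InfoNCE-style contrastive loss; the filing's exact objective may differ, and the temperature value here is an assumption.

```python
import numpy as np

def contrastive_loss(reps_3d, reps_other, temperature=0.07):
    """InfoNCE-style loss aligning 3D representations with image (or
    text) representations from a frozen pretrained encoder; matching
    pairs sit on the diagonal of the similarity matrix."""
    a = reps_3d / np.linalg.norm(reps_3d, axis=1, keepdims=True)
    b = reps_other / np.linalg.norm(reps_other, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature
    # Cross-entropy with the matching pair as the target for each row.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
matched = rng.normal(size=(8, 16))
# Nearly aligned 3D reps give a low loss; random reps give a high loss.
loss_aligned = contrastive_loss(matched + 0.01 * rng.normal(size=(8, 16)), matched)
loss_random = contrastive_loss(rng.normal(size=(8, 16)), matched)
```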
  • Publication number: 20240161464
    Abstract: Embodiments described herein provide systems and methods for training video models to perform a task from an input instructional video. A procedure knowledge graph (PKG) may be generated with nodes representing procedure steps, and edges representing relationships between the steps. The PKG may be generated based on text and/or video training data which includes procedures (e.g., instructional videos). Using the PKG, a video model may be trained, with the PKG providing supervisory training signals for a number of tasks. Once the model is trained, it may be fine-tuned for a specific task, which benefits from the model having been trained to embed procedural information when encoding videos.
    Type: Application
    Filed: January 25, 2023
    Publication date: May 16, 2024
    Inventors: Roberto Martin-Martin, Silvio Savarese, Honglu Zhou, Juan Carlos Niebles Duque
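A minimal reading of the PKG construction above is a directed graph whose nodes are steps and whose edges record observed step orderings; the filing's graph likely carries richer relationships, so this sketch is a simplification.

```python
def build_pkg(procedures):
    """Build a tiny procedure knowledge graph: nodes are step names,
    directed edges record that one step followed another in some
    procedure, with counts as edge weights."""
    nodes, edges = set(), {}
    for steps in procedures:
        nodes.update(steps)
        for a, b in zip(steps, steps[1:]):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return nodes, edges

# Two instructional videos for the same recipe with slightly different orders.
videos = [
    ["crack eggs", "whisk", "heat pan", "pour", "fold"],
    ["heat pan", "crack eggs", "whisk", "pour", "fold"],
]
nodes, edges = build_pkg(videos)
```

Edge counts like these could then serve as weak supervisory signals, e.g. for predicting plausible next steps.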
  • Publication number: 20240160917
    Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A training dataset is generated using a plurality of 3D models of a 3D model dataset. To generate a first sample of the training dataset, an image generator with multi-view rendering is used to generate a plurality of image candidates of a first 3D model. A word is chosen from metadata associated with the first 3D model. A language model is used to generate one or more text descriptions using the chosen word and a plurality of prompts. A point cloud is generated by randomly sampling points in the 3D model. The first sample is generated to include a first image randomly selected from the plurality of image candidates, the one or more text descriptions, and the point cloud. The 3D encoder is trained using the training dataset including the first sample.
    Type: Application
    Filed: March 13, 2023
    Publication date: May 16, 2024
    Inventors: Le Xue, Chen Xing, Juan Carlos Niebles Duque, Caiming Xiong, Ran Xu, Silvio Savarese
  • Publication number: 20240118937
    Abstract: Embodiments herein relate to predicting, based on previous usage of a cloud-based computing resource by a user of one or more users of the cloud-based computing resource, future usage of the cloud-based computing resource. Based on the predicted future usage, embodiments relate to identifying that throttling of access to the cloud-based computing resource is to occur, and to notifying the user of the throttling. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: October 7, 2022
    Publication date: April 11, 2024
    Applicant: Salesforce, Inc.
    Inventors: Bo Zong, Huan Wang, Tian Lan, Ran Yao, Tony Wong, Daeki Cho, Caiming Xiong, Silvio Savarese, Yingbo Zhou
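The predict-then-throttle flow above can be sketched with a deliberately simple forecaster. The linear-trend predictor, the quota-comparison rule, and the notification text are all assumptions; the filing does not specify the prediction model.

```python
import numpy as np

def predict_next_usage(history):
    """Forecast next-period usage with a least-squares linear trend
    over past periods (a toy stand-in for the predictor)."""
    y = np.asarray(history, dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    return slope * len(y) + intercept

def throttle_check(history, quota):
    """Return (will_throttle, user_notification)."""
    forecast = predict_next_usage(history)
    if forecast > quota:
        return True, (f"Predicted usage {forecast:.0f} exceeds quota "
                      f"{quota}; access will be throttled.")
    return False, "No throttling expected."

# Usage growing 20 units per period will overshoot a quota of 150.
throttled, msg = throttle_check([100, 120, 140, 160], quota=150)
```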
  • Publication number: 20230226696
    Abstract: Methods and systems to remotely operate robotic devices are provided. A number of embodiments allow users to remotely operate robotic devices using generalized consumer devices (e.g., cell phones). Additional embodiments provide for a platform to allow communication between consumer devices and the robotic devices. Further embodiments allow for training robotic devices to operate autonomously by training the robotic device with machine learning algorithms using data collected from scalable methods of controlling robotic devices.
    Type: Application
    Filed: November 2, 2020
    Publication date: July 20, 2023
    Applicant: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Ajay U. Mandlekar, Yuke Zhu, Animesh Garg, Silvio Savarese, Fei-Fei Li
  • Patent number: 11301775
    Abstract: A data annotation apparatus for machine learning is provided, which includes a stimulus generation portion, a biometrics reading portion, and a data integration portion. The stimulus generation portion is configured to generate, and present to an agent, at least one stimulus based on a first data from a first machine learning dataset. The biometrics reading portion is configured to measure at least one response of the agent to the at least one stimulus, and to generate biometrics data based on the at least one response. The data integration portion is configured to integrate the biometrics data, data of the at least one stimulus, and data of the first machine learning dataset to thereby obtain a second machine learning dataset. The data annotation apparatus can result in improved data labeling and enhanced machine learning.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: April 12, 2022
    Assignee: CloudMinds Robotics Co., Ltd.
    Inventors: Qiang Li, Silvio Savarese, Charles Robert Jankowski, Jr., William Xiao-Qing Huang, Zhe Zhang, Xiaoli Fern
  • Patent number: 11004202
    Abstract: Systems and methods for obtaining 3D point-level segmentation of 3D point clouds in accordance with various embodiments of the invention are disclosed. One embodiment includes: at least one processor, and a memory containing a segmentation pipeline application. In addition, the segmentation pipeline application configures the at least one processor to: pre-process a 3D point cloud to group 3D points; provide the groups of 3D points to a 3D neural network to generate initial label predictions for the groups of 3D points; interpolate label predictions for individual 3D points based upon initial label predictions for at least two neighboring groups of 3D points including the group of 3D points to which a given individual 3D point belongs; refine the label predictions using a graph neural network; and output a segmented 3D point cloud.
    Type: Grant
    Filed: October 9, 2018
    Date of Patent: May 11, 2021
    Assignee: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, Silvio Savarese
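Two stages of the pipeline above, grouping 3D points and interpolating group-level predictions back to individual points, can be sketched as follows. The voxel grouping and inverse-distance interpolation are simplifications (the neural networks are replaced by given per-group probabilities), so treat this as an assumed illustration.

```python
import numpy as np

def voxel_groups(points, voxel_size):
    """Pre-process: group 3D points by the voxel they fall in."""
    keys = np.floor(np.asarray(points) / voxel_size).astype(int)
    groups = {}
    for i, k in enumerate(map(tuple, keys)):
        groups.setdefault(k, []).append(i)
    return groups

def interpolate_point_labels(points, groups, group_probs):
    """Per-point label scores as a distance-weighted blend of the class
    probabilities of nearby groups."""
    pts = np.asarray(points, dtype=float)
    centers = {k: pts[idx].mean(axis=0) for k, idx in groups.items()}
    n_classes = group_probs[next(iter(group_probs))].size
    out = np.zeros((len(pts), n_classes))
    for i, p in enumerate(pts):
        w_total = 0.0
        for k, c in centers.items():
            w = 1.0 / (np.linalg.norm(p - c) + 1e-6)   # inverse-distance weight
            out[i] += w * group_probs[k]
            w_total += w
        out[i] /= w_total
    return out

pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.1, 0.1], [1.1, 1.2, 1.1]])
groups = voxel_groups(pts, voxel_size=1.0)
# Pretend a 3D network produced per-group probabilities over 2 classes.
probs = {k: (np.array([0.9, 0.1]) if k == (0, 0, 0) else np.array([0.2, 0.8]))
         for k in groups}
point_probs = interpolate_point_labels(pts, groups, probs)
```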
  • Patent number: 10922353
    Abstract: A system and method for determining an object or product represented in an image is disclosed. The system receives a first image, determines a region of interest in the first image, determines a classification score for the region of interest using a convolutional neural network that assigns the region of interest the classification score corresponding to a class, and identifies a first product in the first image based on the classification score.
    Type: Grant
    Filed: February 1, 2019
    Date of Patent: February 16, 2021
    Assignee: Ricoh Company, Ltd.
    Inventors: Junghyun Kwon, Ramya Narasimha, Edward L. Schwartz, Max McFarland, Silvio Savarese, Kathrin Berkner
  • Patent number: 10846836
    Abstract: Disclosed is a system and method for generating intermediate views between two received images. To generate the intermediate views, a rectification network rectifies the two images and an encoder network encodes the two rectified images to generate convolutional neural network features. The convolutional neural network features are fed to a decoder network that decodes the features to generate a correspondence between the two rectified images and blending masks to predict the visibility of pixels of the rectified images in the intermediate view images. Using the correspondence between the two rectified images and blending masks, a view morphing network synthesizes intermediate view images depicting an object in the two images in a view between the two images.
    Type: Grant
    Filed: November 14, 2016
    Date of Patent: November 24, 2020
    Assignee: RICOH COMPANY, LTD.
    Inventors: Junghyun Kwon, Dinghuang Ji, Max McFarland, Silvio Savarese
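The final blending step described above can be sketched in isolation: given two already rectified and warped views plus predicted visibility masks, the intermediate view is a masked weighted average. The upstream rectification, encoder, and decoder networks are stubbed out, and the normalization scheme is an assumption.

```python
import numpy as np

def morph_views(view_a, view_b, mask_a, mask_b, alpha=0.5):
    """Synthesize an intermediate view from two warped images using
    per-pixel visibility blending masks."""
    wa = (1 - alpha) * np.asarray(mask_a, dtype=float)
    wb = alpha * np.asarray(mask_b, dtype=float)
    total = wa + wb
    total[total == 0] = 1.0   # avoid divide-by-zero where nothing is visible
    return (wa * view_a + wb * view_b) / total

# Two tiny 2x2 "images"; the right column is visible only in view B.
a = np.array([[1.0, 0.0], [1.0, 0.0]])
b = np.array([[3.0, 5.0], [3.0, 5.0]])
mask_a = np.array([[1.0, 0.0], [1.0, 0.0]])
mask_b = np.array([[1.0, 1.0], [1.0, 1.0]])
mid = morph_views(a, b, mask_a, mask_b)
```

Where both views are visible the result averages them; where only view B is visible, its pixels pass through unblended.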
  • Patent number: 10489893
    Abstract: The disclosure includes a system and method for performing image rectification using a single image and information identified from the single image. An image recognition application receives an input image, identifies a plurality of objects in the input image, estimates rectification parameters for the plurality of objects, identifies a plurality of candidate rectification parameters using a voting procedure on the rectification parameters for the plurality of objects, estimates final rectification parameters based on the plurality of candidate rectification parameters, computes a global transformation matrix using the final rectification parameters, and performs image rectification on the input image using the global transformation matrix.
    Type: Grant
    Filed: January 29, 2018
    Date of Patent: November 26, 2019
    Assignee: Ricoh Company, Ltd.
    Inventors: Jorge Moraleda, Ekta Prashnani, Michael J. Gormish, Kathrin Berkner, Silvio Savarese
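The voting procedure above can be sketched with a one-parameter toy: each detected object votes for its estimated rectification parameter, and the consensus comes from the most popular histogram bin. Reducing the parameters to a single tilt angle and the bin width are assumptions for illustration.

```python
import numpy as np

def vote_rectification_params(per_object_params, bin_width=5.0):
    """Pick consensus rectification parameters by histogram voting,
    then estimate the final value from the winning bin's votes."""
    params = np.asarray(per_object_params, dtype=float)
    bins = np.floor(params / bin_width).astype(int)
    values, counts = np.unique(bins, return_counts=True)
    winner = values[np.argmax(counts)]          # most popular bin
    return float(params[bins == winner].mean()) # average the agreeing votes

# Tilt estimates from several detected objects; 80 degrees is an outlier.
angle = vote_rectification_params([12.0, 13.5, 11.0, 80.0, 12.5])
```

The outlier object's vote lands in its own bin and is discarded, which is the point of voting before estimating final parameters.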
  • Patent number: 10424065
    Abstract: Systems and methods for performing three-dimensional semantic parsing of indoor spaces in accordance with embodiments of the invention are disclosed. In one embodiment, a method includes receiving input data representing a three-dimensional space, determining disjointed spaces within the received data by generating a density histogram on each of a plurality of axes, determining space dividers based on the generated density histogram, and dividing the point cloud data into segments based on the determined space dividers, and determining elements in the disjointed spaces by aligning the disjointed spaces within the point cloud data along similar axes to create aligned versions of the disjointed spaces, normalizing the aligned versions of the disjointed spaces, determining features in the disjointed spaces, generating at least one detection score, and filtering the at least one detection score to determine a final set of determined elements.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: September 24, 2019
    Assignee: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Iro Armeni, Ozan Sener, Amir R. Zamir, Martin Fischer, Silvio Savarese
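The density-histogram step above exploits the fact that walls produce dense slabs of points along an axis. A one-axis sketch, with illustrative bin width and peak threshold (the filing's actual thresholds are not given):

```python
import numpy as np

def find_space_dividers(xs, bin_width=0.5, min_peak=50):
    """Locate candidate space dividers as peaks in a 1-D point-density
    histogram along one axis."""
    xs = np.asarray(xs, dtype=float)
    edges = np.arange(xs.min(), xs.max() + bin_width, bin_width)
    counts, edges = np.histogram(xs, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[counts >= min_peak]          # dense bins mark dividers

rng = np.random.default_rng(0)
# Sparse floor points across a 10 m span plus a dense wall slab near x = 5.
floor = rng.uniform(0.0, 10.0, size=200)
wall = rng.normal(5.0, 0.05, size=300)
dividers = find_space_dividers(np.concatenate([floor, wall]))
```

Repeating this per axis and splitting the cloud at the detected peaks yields the disjointed spaces the method then processes individually.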
  • Publication number: 20190163698
    Abstract: A system and method for determining an object or product represented in an image is disclosed. The system receives a first image, determines a region of interest in the first image, determines a classification score for the region of interest using a convolutional neural network that assigns the region of interest the classification score corresponding to a class, and identifies a first product in the first image based on the classification score.
    Type: Application
    Filed: February 1, 2019
    Publication date: May 30, 2019
    Applicant: Ricoh Company, Ltd.
    Inventors: Junghyun Kwon, Ramya Narasimha, Edward L. Schwartz, Max McFarland, Silvio Savarese, Kathrin Berkner
  • Publication number: 20190108639
    Abstract: Systems and methods for obtaining 3D point-level segmentation of 3D point clouds in accordance with various embodiments of the invention are disclosed. One embodiment includes: at least one processor, and a memory containing a segmentation pipeline application. In addition, the segmentation pipeline application configures the at least one processor to: pre-process a 3D point cloud to group 3D points; provide the groups of 3D points to a 3D neural network to generate initial label predictions for the groups of 3D points; interpolate label predictions for individual 3D points based upon initial label predictions for at least two neighboring groups of 3D points including the group of 3D points to which a given individual 3D point belongs; refine the label predictions using a graph neural network; and output a segmented 3D point cloud.
    Type: Application
    Filed: October 9, 2018
    Publication date: April 11, 2019
    Applicant: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, Silvio Savarese