Patents by Inventor Le Xue

Le Xue has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250139411
    Abstract: Embodiments described herein provide a large language model (LLM) based AI agent that adopts Monte-Carlo Tree Search (MCTS) to execute a task. The LLM is prompted with a task description, and it responds with its first attempted list of actions. Based on the success or failure of the first attempt, the LLM is prompted with an updated prompt which includes feedback from the first attempt based on a determined reward. The prompt may include a relative “score” for each action taken at each step. A numeric score may be mapped to a set of pre-defined text labels, such as “high” or “low” value, putting the score in a form better suited to an LLM prompt. In this way, the LLM is iteratively given prompts which are updated with the scores from each action taken at each previous iteration, so that it traverses a different path of the tree in each iteration.
    Type: Application
    Filed: October 31, 2023
    Publication date: May 1, 2025
    Inventors: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles Duque, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
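    The score-to-label mapping and prompt-update loop described in the abstract can be sketched as follows. This is a minimal illustration only; the function names, thresholds, and prompt wording are assumptions, not taken from the patent.

    ```python
    def score_to_label(score):
        """Map a numeric score in [0, 1] to a coarse text label for an LLM prompt."""
        if score >= 0.66:
            return "high"
        if score >= 0.33:
            return "medium"
        return "low"

    def build_feedback_prompt(task, actions_with_scores):
        """Rebuild the prompt with per-action value labels from the previous attempt."""
        lines = ["Task: " + task, "Feedback on your previous attempt:"]
        for step, (action, score) in enumerate(actions_with_scores, start=1):
            lines.append(f"  step {step}: '{action}' -> {score_to_label(score)} value")
        lines.append("Propose an improved list of actions.")
        return "\n".join(lines)
    ```

    Each iteration, the labels steer the LLM away from low-value branches, which is how the text form of the score feeds the tree search.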
  • Publication number: 20250053793
    Abstract: Embodiments described herein provide a method of predicting an action by a plurality of language model augmented agents (LAAs). In at least one embodiment, a controller receives a task instruction to be performed using an environment. The controller receives an observation of a first state from the environment. The controller selects a LAA from the plurality of LAAs based on the task instruction and the observation. The controller obtains an output from the selected LAA generated using an input combining the task instruction, the observation, and an LAA-specific prompt template. The controller determines the action based on the output. The controller causes the action to be performed on the environment thereby causing the first state of the environment to change to a second state.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 13, 2025
    Inventors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles Duque, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
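    The controller's per-step flow (select an agent, then combine task, observation, and an agent-specific template) can be sketched as below. The keyword-matching selection rule and all names here are illustrative stand-ins, not the patent's actual selection mechanism.

    ```python
    # Hypothetical agent-specific prompt templates keyed by agent name.
    TEMPLATES = {
        "search": "You issue search queries. Task: {task}\nObservation: {obs}\nAction:",
        "click": "You click page elements. Task: {task}\nObservation: {obs}\nAction:",
    }

    def select_laa(task, obs):
        """Pick the agent whose name appears in the task or observation; default otherwise."""
        for name in TEMPLATES:
            if name in task.lower() or name in obs.lower():
                return name
        return "search"

    def build_input(task, obs):
        """Combine the task instruction, the observation, and the selected LAA's template."""
        agent = select_laa(task, obs)
        return TEMPLATES[agent].format(task=task, obs=obs)
    ```

    The output of the selected LAA would then be parsed into an action and applied to the environment, producing the next observation.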
  • Publication number: 20250045567
    Abstract: Embodiments described herein provide for optimizing a language model (LM) agent. In at least one embodiment, an LM agent comprises an “actor” LM and a “retrospective” LM which provides reflections on attempts by the actor LM. The reflections are used to update subsequent prompts to the actor LM. Optimizing the LM agent comprises fine-tuning parameters of the retrospective LM while keeping parameters of the actor LM frozen. A gradient may be determined by a change in reward from the environment based on actions taken by the actor LM with and without a reflection from the retrospective LM. Using this gradient, parameters of the retrospective LM may be updated via backpropagation.
    Type: Application
    Filed: October 31, 2023
    Publication date: February 6, 2025
    Inventors: Weiran Yao, Shelby Heinecke, Juan Carlos Niebles Duque, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, Ran Xu, Lik Mui, Huan Wang, Caiming Xiong, Silvio Savarese
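    The optimization signal described above (reward with a reflection minus reward without it, used to weight the retrospective LM's update) can be shown with a small numeric sketch. These scalar functions are a simplification of what would be a batched, backpropagated loss; the names are illustrative.

    ```python
    def reflection_advantage(reward_with, reward_without):
        """Reward delta attributable to the retrospective LM's reflection."""
        return reward_with - reward_without

    def weighted_loss(log_prob_reflection, reward_with, reward_without):
        """REINFORCE-style surrogate: scale the reflection's negative log-probability
        by its advantage. Only the retrospective LM's parameters would receive this
        gradient; the actor LM stays frozen."""
        return -reflection_advantage(reward_with, reward_without) * log_prob_reflection
    ```

    A reflection that raises the environment reward gets its probability pushed up; one that does not contributes no gradient.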
  • Publication number: 20240370718
    Abstract: Embodiments described herein provide a method of generating a multi-modal task output to a text instruction relating to inputs of multiple different modalities (e.g., text, audio, video, 3D). The method comprises receiving, via a data interface, a first input of a first modality, a second input of a second modality and the text instruction relating to the first and the second inputs; encoding, by a first multimodal encoder adapted for the first modality, the first input of the first modality into a first encoded representation conditioned on the text instruction; encoding, by a second multimodal encoder adapted for the second modality, the second input of the second modality into a second encoded representation conditioned on the text instruction; and generating, by a neural network based language model, the multi-modal task output based on an input combining the first encoded representation, the second encoded representation, and the text instruction.
    Type: Application
    Filed: December 29, 2023
    Publication date: November 7, 2024
    Inventors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Silvio Savarese, Shafiq Rayhan Joty, Ran Xu, Caiming Xiong, Juan Carlos Niebles Duque
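    The data flow in this abstract (two modality-specific encoders, each conditioned on the text instruction, whose outputs are combined with the instruction for the language model) can be traced with deterministic stubs. The byte-based "encoder" below is purely a stand-in for a neural encoder, used to show the shape of the combination step.

    ```python
    def toy_encode(modality_input, instruction, dim=4):
        """Stand-in for an instruction-conditioned modality encoder:
        returns a small deterministic vector derived from both inputs."""
        data = (modality_input + "|" + instruction).encode()
        return [data[i % len(data)] / 255.0 for i in range(dim)]

    def combine(first_input, second_input, instruction):
        """Combine both encoded representations with the instruction for the LM."""
        rep1 = toy_encode(first_input, instruction)
        rep2 = toy_encode(second_input, instruction)
        return rep1 + rep2 + [instruction]
    ```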
  • Publication number: 20240312128
    Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A first plurality of samples of a training dataset are generated using a first 3D model. An image generator with multi-view rendering is used to generate a plurality of two-dimensional (2D) images having different viewpoints of the first 3D model. A first language model is used to generate a plurality of texts corresponding to the plurality of 2D images respectively. A first text for a first image is generated by using one or more text descriptions generated by the first language model. A point cloud is generated by randomly sampling points in the 3D model. The first plurality of samples are generated using the plurality of 2D images, the corresponding plurality of texts, and the point cloud. The neural network based 3D encoder is trained using the training dataset including the first plurality of samples.
    Type: Application
    Filed: October 24, 2023
    Publication date: September 19, 2024
    Inventors: Le Xue, Ning Yu, Shu Zhang, Junnan Li, Caiming Xiong, Silvio Savarese, Juan Carlos Niebles Duque, Ran Xu
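    The point-cloud step of the sample-generation pipeline can be sketched as random sampling from a 3D model's geometry. Sampling vertices with replacement, as below, is a simplification (a real pipeline would typically sample points on mesh surfaces); the function name is an assumption.

    ```python
    import random

    def sample_point_cloud(vertices, n_points, seed=0):
        """Randomly sample (with replacement) points from a 3D model's vertices."""
        rng = random.Random(seed)  # fixed seed for reproducible samples
        return [rng.choice(vertices) for _ in range(n_points)]
    ```

    Each sampled point cloud would be paired with the rendered 2D views and generated texts to form one training sample.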
  • Publication number: 20240169704
    Abstract: Systems and methods for training a neural network based three-dimensional (3D) encoder for 3D classification are provided. A training dataset including a plurality of samples is received, wherein a first sample includes an image, a text, and a point cloud. An image encoder of a pretrained vision and language model is used to generate image representations for the image of the first sample. A text encoder of the pretrained vision and language model is used to generate text representations for the text of the first sample. The neural network based 3D encoder is used to generate 3D representations for the point cloud of the first sample. A loss objective is computed based on the image representations, text representations, and 3D representations. Parameters of the neural network based 3D encoder are updated based on the computed loss objective via backpropagation.
    Type: Application
    Filed: March 13, 2023
    Publication date: May 23, 2024
    Inventors: Le XUE, Chen XING, Juan Carlos NIEBLES DUQUE, Caiming XIONG, Ran XU, Silvio SAVARESE
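    The loss objective over image, text, and 3D representations described in this abstract is of the cross-modal contrastive family; a minimal InfoNCE-style sketch in plain Python is shown below. The exact objective, temperature, and pairing in the patent may differ; this illustrates only the general alignment idea.

    ```python
    import math

    def cosine(u, v):
        """Cosine similarity between two vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    def info_nce(anchors, positives, temperature=0.1):
        """Contrastive loss: each anchor's positive shares its batch index;
        every other item in the batch serves as a negative."""
        total = 0.0
        for i, a in enumerate(anchors):
            logits = [cosine(a, p) / temperature for p in positives]
            total += math.log(sum(math.exp(s) for s in logits)) - logits[i]
        return total / len(anchors)
    ```

    Applied with 3D representations as anchors and image (or text) representations as positives, minimizing this loss pulls matched triples together in the shared embedding space.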
  • Publication number: 20240169746
    Abstract: Embodiments described herein provide a system for three-dimensional (3D) object detection. The system includes an input interface configured to obtain 3D point data describing spatial information of a plurality of points, and a memory storing a neural network based 3D object detection model having an encoder and a decoder. The system also includes processors to perform operations including: encoding, by the encoder, a first set of coordinates into a first set of point features and a set of object features; sampling a second set of point features from the first set of point features; generating, by attention layers at the decoder, a set of attention weights by applying cross-attention over at least the set of object features and the second set of point features; and generating, by the decoder, a predicted bounding box among the plurality of points based at least in part on the set of attention weights.
    Type: Application
    Filed: January 30, 2023
    Publication date: May 23, 2024
    Inventors: Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Juan Carlos Niebles Duque, Caiming Xiong, Ran Xu
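    The cross-attention step at the decoder (object features attending over sampled point features) reduces to a softmax over dot-product scores; a dependency-free sketch follows. Real implementations use learned query/key/value projections and scaling, which are omitted here for brevity.

    ```python
    import math

    def cross_attention_weights(queries, keys):
        """Softmax over dot-product scores: one weight row per object query,
        one column per point feature."""
        weights = []
        for q in queries:
            scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
            m = max(scores)  # subtract max for numerical stability
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            weights.append([e / z for e in exps])
        return weights
    ```

    Each row is a distribution over point features, telling the decoder which points inform each predicted bounding box.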
  • Publication number: 20240160917
    Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A training dataset is generated using a plurality of 3D models of a 3D model dataset. To generate a first sample of the training dataset, an image generator with multi-view rendering is used to generate a plurality of image candidates of a first 3D model. A word is selected from metadata associated with the first 3D model. A language model is used to generate one or more text descriptions using the selected word and a plurality of prompts. A point cloud is generated by randomly sampling points in the 3D model. The first sample is generated to include a first image randomly selected from the plurality of image candidates, the one or more text descriptions, and the point cloud. The 3D encoder is trained using the training dataset including the first sample.
    Type: Application
    Filed: March 13, 2023
    Publication date: May 16, 2024
    Inventors: Le XUE, Chen XING, Juan Carlos NIEBLES DUQUE, Caiming XIONG, Ran XU, Silvio SAVARESE
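    The text-generation step (expanding one metadata word into several descriptions via prompts) can be sketched with simple templates. The prompt strings below are illustrative examples, not the patent's actual prompts, and a real pipeline would pass them through a language model rather than string formatting.

    ```python
    # Hypothetical prompt templates; a language model would elaborate on these.
    PROMPTS = [
        "a photo of a {word}",
        "a 3D rendering of a {word}",
        "a point cloud of a {word}",
    ]

    def make_descriptions(word, prompts=PROMPTS):
        """Expand one metadata word into several text descriptions via templates."""
        return [p.format(word=word) for p in prompts]
    ```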
  • Patent number: 11699297
    Abstract: An online system extracts information from non-fixed form documents. The online system receives an image of a form document and obtains a set of phrases and locations of the set of phrases on the form image. For at least one field, the online system determines key scores for the set of phrases. The online system identifies a set of candidate values for the field from the set of identified phrases and identifies a set of neighbors for each candidate value from the set of identified phrases. The online system determines neighbor scores, where a neighbor score for a candidate value and a respective neighbor is determined based on the key score for the neighbor and a spatial relationship of the neighbor to the candidate value. The online system selects a candidate value and a respective neighbor based on the neighbor score as the value and key for the field.
    Type: Grant
    Filed: January 4, 2021
    Date of Patent: July 11, 2023
    Assignee: Salesforce, Inc.
    Inventors: Mingfei Gao, Zeyuan Chen, Le Xue, Ran Xu, Caiming Xiong
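    The neighbor-scoring idea in this abstract (combine a phrase's key score with its spatial relationship to a candidate value, then pick the best value-neighbor pair) can be sketched as below. The inverse-distance weighting is one plausible choice of spatial relationship, assumed here for illustration; the patent does not specify this exact formula.

    ```python
    import math

    def neighbor_score(key_score, key_pos, value_pos):
        """Combine a neighbor's key score with its spatial closeness to the value."""
        return key_score / (1.0 + math.dist(key_pos, value_pos))

    def select_value(candidates):
        """candidates: list of (value, neighbor, key_score, key_pos, value_pos).
        Return the (value, neighbor) pair with the highest neighbor score
        as the field's value and key."""
        best = max(candidates, key=lambda c: neighbor_score(c[2], c[3], c[4]))
        return best[0], best[1]
    ```

    A phrase that looks like a key and sits close to the candidate value wins, which is what lets the method handle forms with no fixed layout.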
  • Publication number: 20220215195
    Abstract: An online system extracts information from non-fixed form documents. The online system receives an image of a form document and obtains a set of phrases and locations of the set of phrases on the form image. For at least one field, the online system determines key scores for the set of phrases. The online system identifies a set of candidate values for the field from the set of identified phrases and identifies a set of neighbors for each candidate value from the set of identified phrases. The online system determines neighbor scores, where a neighbor score for a candidate value and a respective neighbor is determined based on the key score for the neighbor and a spatial relationship of the neighbor to the candidate value. The online system selects a candidate value and a respective neighbor based on the neighbor score as the value and key for the field.
    Type: Application
    Filed: January 4, 2021
    Publication date: July 7, 2022
    Inventors: Mingfei Gao, Zeyuan Chen, Le Xue, Ran Xu, Caiming Xiong