Patents by Inventor Honglak Lee

Honglak Lee has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11900517
    Abstract: A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
    Type: Grant
    Filed: December 20, 2022
    Date of Patent: February 13, 2024
    Assignee: Google LLC
    Inventors: Tianhao Zhang, Weilong Yang, Honglak Lee, Hung-Yu Tseng, Irfan Aziz Essa, Lu Jiang
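    Illustrative sketch: the abstract describes an image encoder, an instruction attention network that splits the text instruction into a spatial ("where") feature and a modification ("how") feature, and an image decoder over the fused feature. The minimal PyTorch sketch below follows that structure; all module names, layer sizes, and the gate-and-shift fusion rule are assumptions for illustration, not the patented implementation.

      import torch
      import torch.nn as nn

      class InstructionAttention(nn.Module):
          """Maps a tokenized instruction to a spatial feature (where to edit)
          and a modification feature (how to edit)."""
          def __init__(self, vocab_size=10000, dim=256):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, dim)
              self.encoder = nn.GRU(dim, dim, batch_first=True)
              self.where_head = nn.Linear(dim, dim)   # spatial feature
              self.how_head = nn.Linear(dim, dim)     # modification feature

          def forward(self, tokens):
              hidden, _ = self.encoder(self.embed(tokens))
              summary = hidden.mean(dim=1)            # pooled sentence summary
              return self.where_head(summary), self.how_head(summary)

      class TextEditModel(nn.Module):
          def __init__(self, dim=256):
              super().__init__()
              self.image_encoder = nn.Sequential(
                  nn.Conv2d(3, dim, 4, stride=2, padding=1), nn.ReLU(),
                  nn.Conv2d(dim, dim, 4, stride=2, padding=1), nn.ReLU())
              self.instruction_attention = InstructionAttention(dim=dim)
              self.image_decoder = nn.Sequential(
                  nn.ConvTranspose2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(),
                  nn.ConvTranspose2d(dim, 3, 4, stride=2, padding=1), nn.Tanh())

          def forward(self, image, tokens):
              feat = self.image_encoder(image)        # input image feature
              where, how = self.instruction_attention(tokens)
              # Fuse: gate the feature map spatially, then shift by the edit vector.
              gate = torch.sigmoid(where)[:, :, None, None]
              edited = feat * gate + how[:, :, None, None]   # edited image feature
              return self.image_decoder(edited)       # output image

      model = TextEditModel()
      out = model(torch.randn(1, 3, 64, 64), torch.randint(0, 10000, (1, 7)))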
  • Publication number: 20230177754
    Abstract: A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
    Type: Application
    Filed: December 20, 2022
    Publication date: June 8, 2023
    Inventors: Tianhao Zhang, Weilong Yang, Honglak Lee, Hung-Yu Tseng, Irfan Aziz Essa, Lu Jiang
  • Publication number: 20230081171
    Abstract: A computer-implemented method includes receiving, by a computing device, a particular textual description of a scene. The method also includes applying a neural network for text-to-image generation to generate an output image rendition of the scene, the neural network having been trained to cause two image renditions associated with a same textual description to attract each other and two image renditions associated with different textual descriptions to repel each other based on mutual information between a plurality of corresponding pairs, wherein the plurality of corresponding pairs comprise an image-to-image pair and a text-to-image pair. The method further includes predicting the output image rendition of the scene.
    Type: Application
    Filed: September 7, 2021
    Publication date: March 16, 2023
    Inventors: Han Zhang, Jing Yu Koh, Jason Michael Baldridge, Yinfei Yang, Honglak Lee
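    Illustrative sketch: the attract/repel training signal in the abstract is the characteristic behavior of a contrastive (InfoNCE-style) objective over matched pairs. The sketch below is a generic version of that signal over the two pair types the abstract names; the embeddings, batch size, and temperature are assumptions, and this is not claimed to be the patent's exact objective.

      import torch
      import torch.nn.functional as F

      def info_nce(a, b, temperature=0.1):
          """Matched rows of `a` and `b` attract; every mismatched row in the
          batch repels."""
          a = F.normalize(a, dim=-1)
          b = F.normalize(b, dim=-1)
          logits = a @ b.t() / temperature          # pairwise similarities
          targets = torch.arange(a.size(0))         # row i matches column i
          return F.cross_entropy(logits, targets)

      # Hypothetical embeddings: captions, real images for those captions, and
      # generated images for the same captions.
      text_emb, real_emb, fake_emb = (torch.randn(8, 128) for _ in range(3))
      loss = (info_nce(fake_emb, text_emb)          # text-to-image pair
              + info_nce(fake_emb, real_emb))       # image-to-image pair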
  • Publication number: 20230072293
    Abstract: A computing system for generating predicted images along a trajectory of unseen viewpoints. The system can obtain one or more spatial observations of an environment that may be captured from one or more previous camera poses. The system can generate a three-dimensional point cloud for the environment from the one or more spatial observations and the one or more previous camera poses. The system can project the three-dimensional point cloud into two-dimensional space to form one or more guidance spatial observations. The system can process the one or more guidance spatial observations with a machine-learned spatial observation prediction model to generate one or more predicted spatial observations. The system can process the one or more predicted spatial observations and image data with a machine-learned image prediction model to generate one or more predicted images from a target camera pose. The system can output the one or more predicted images.
    Type: Application
    Filed: August 23, 2021
    Publication date: March 9, 2023
    Inventors: Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Michael Baldridge, Peter James Anderson
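    Illustrative sketch: the geometric core of the abstract is back-projecting past observations into a 3D point cloud and re-projecting it from a new pose to form a 2D guidance observation. A minimal NumPy version of those two steps is below, assuming standard pinhole-camera conventions; the learned spatial-observation and image prediction models are omitted.

      import numpy as np

      def backproject(depth, K, cam_to_world):
          """Lift a depth image into a world-frame point cloud (N x 3)."""
          h, w = depth.shape
          u, v = np.meshgrid(np.arange(w), np.arange(h))
          rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)]).reshape(3, -1)
          pts_cam = rays * depth.reshape(1, -1)     # camera-frame points
          pts = cam_to_world[:3, :3] @ pts_cam + cam_to_world[:3, 3:]
          return pts.T

      def project(points, K, world_to_cam, h, w):
          """Splat a point cloud into a depth-like 2D guidance observation."""
          pts_cam = world_to_cam[:3, :3] @ points.T + world_to_cam[:3, 3:]
          z = pts_cam[2]
          uvz = K @ (pts_cam / np.maximum(z, 1e-6))
          u, v = uvz[0].astype(int), uvz[1].astype(int)
          ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
          guidance = np.zeros((h, w))
          guidance[v[ok], u[ok]] = z[ok]
          return guidance

      K = np.array([[64, 0, 32], [0, 64, 32], [0, 0, 1.0]])
      cloud = backproject(np.full((64, 64), 2.0), K, np.eye(4))
      guide = project(cloud, K, np.eye(4), 64, 64)  # input to the learned models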
  • Patent number: 11562518
    Abstract: A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: January 24, 2023
    Assignee: Google LLC
    Inventors: Tianhao Zhang, Weilong Yang, Honglak Lee, Hung-Yu Tseng, Irfan Aziz Essa, Lu Jiang
  • Patent number: 11554483
    Abstract: Deep machine learning methods and apparatus, some of which are related to determining a grasp outcome prediction for a candidate grasp pose of an end effector of a robot. Some implementations are directed to training and utilization of both a geometry network and a grasp outcome prediction network. The trained geometry network can be utilized to generate, based on two-dimensional or two-and-a-half-dimensional image(s), geometry output(s) that are: geometry-aware, and that represent (e.g., high-dimensionally) three-dimensional features captured by the image(s). In some implementations, the geometry output(s) include at least an encoding that is generated based on a trained encoding neural network trained to generate encodings that represent three-dimensional features (e.g., shape). The trained grasp outcome prediction network can be utilized to generate, based on applying the geometry output(s) and additional data as input(s) to the network, a grasp outcome prediction for a candidate grasp pose.
    Type: Grant
    Filed: November 10, 2020
    Date of Patent: January 17, 2023
    Assignee: Google LLC
    Inventors: James Davidson, Xinchen Yan, Yunfei Bai, Honglak Lee, Abhinav Gupta, Seyed Mohammad Khansari Zadeh, Arkanath Pathak, Jasmine Hsu
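    Illustrative sketch: the abstract names two networks, a geometry network that encodes a 2.5D (RGB-D) image into a shape-aware latent, and a grasp outcome prediction network that scores a candidate grasp pose against that latent. The PyTorch sketch below mirrors that split; the layer sizes and the xyz-plus-quaternion pose encoding are illustrative assumptions, not the patented architecture.

      import torch
      import torch.nn as nn

      class GeometryNetwork(nn.Module):
          def __init__(self, latent=128):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Conv2d(4, 32, 4, stride=2, padding=1), nn.ReLU(),  # RGB-D in
                  nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                  nn.Linear(64, latent))            # encoding of 3D shape features

          def forward(self, rgbd):
              return self.net(rgbd)

      class GraspOutcomeNetwork(nn.Module):
          def __init__(self, latent=128, pose_dim=7):   # xyz + quaternion pose
              super().__init__()
              self.net = nn.Sequential(
                  nn.Linear(latent + pose_dim, 128), nn.ReLU(),
                  nn.Linear(128, 1))

          def forward(self, geometry, pose):
              logit = self.net(torch.cat([geometry, pose], dim=-1))
              return torch.sigmoid(logit)           # grasp success probability

      geo = GeometryNetwork()
      head = GraspOutcomeNetwork()
      p = head(geo(torch.randn(1, 4, 64, 64)), torch.randn(1, 7))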
  • Publication number: 20220391687
    Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for generating and searching reinforcement learning algorithms. In some implementations, a computer-implemented system generates a sequence of candidate reinforcement learning algorithms. Each candidate reinforcement learning algorithm in the sequence is configured to receive an input environment state characterizing a state of an environment and to generate an output that specifies an action to be performed by an agent interacting with the environment. For each candidate reinforcement learning algorithm in the sequence, the system performs a performance evaluation for a set of a plurality of training environments. For each training environment, the system adjusts a set of environment-specific parameters of the candidate reinforcement learning algorithm by performing training of the candidate reinforcement learning algorithm to control a corresponding agent in the training environment.
    Type: Application
    Filed: June 3, 2021
    Publication date: December 8, 2022
    Inventors: John Dalton Co-Reyes, Yingjie Miao, Daiyi Peng, Sergey Vladimir Levine, Quoc V. Le, Honglak Lee, Aleksandra Faust
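    Illustrative sketch: the outer loop described in the abstract, proposing candidate RL algorithms, training each one's environment-specific parameters in every training environment, and keeping the best by aggregate return, can be written schematically as below. The `propose_candidate`, `train`, and `evaluate` callables are placeholders for the patent's learned components, not real APIs.

      def search(propose_candidate, train, evaluate, environments,
                 num_candidates=100):
          best, best_score = None, float("-inf")
          for _ in range(num_candidates):
              algo = propose_candidate()            # one candidate RL algorithm
              scores = []
              for env in environments:
                  params = train(algo, env)         # environment-specific params
                  scores.append(evaluate(algo, params, env))
              score = sum(scores) / len(scores)     # performance across envs
              if score > best_score:
                  best, best_score = algo, score
          return best

      # Toy invocation with dummy stand-ins for the learned components.
      best = search(lambda: "algo", lambda a, e: None,
                    lambda a, p, e: 0.0, environments=["env-0", "env-1"])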
  • Publication number: 20210383584
    Abstract: A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
    Type: Application
    Filed: June 7, 2021
    Publication date: December 9, 2021
    Inventors: Tianhao Zhang, Weilong Yang, Honglak Lee, Hung-Yu Tseng, Irfan Aziz Essa, Lu Jiang
  • Publication number: 20210201156
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sample-efficient reinforcement learning. One of the methods includes maintaining an ensemble of Q networks, an ensemble of transition models, and an ensemble of reward models; obtaining a transition; generating, using the ensemble of transition models, M trajectories; for each time step in each of the trajectories: generating, using the ensemble of reward models, N rewards for the time step, generating, using the ensemble of Q networks, L Q values for the time step, and determining, from the rewards, the Q values, and the training reward, L*N candidate target Q values for the trajectory and for the time step; for each of the time steps, combining the candidate target Q values; determining a final target Q value; and training at least one of the Q networks in the ensemble using the final target Q value.
    Type: Application
    Filed: May 20, 2019
    Publication date: July 1, 2021
    Inventors: Danijar Hafner, Jacob Buckman, Honglak Lee, Eugene Brevdo, George Jay Tucker
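    Illustrative sketch: a toy NumPy version of the target computation in the abstract (in the spirit of stochastic ensemble value expansion): at each model-generated time step, every pairing of a reward-model estimate with a Q-network estimate yields one of the L*N candidate targets, the candidates are combined within the step, and the per-step results are combined into a final training target. The discounting and the plain-mean combination rule are simplifying assumptions.

      import numpy as np

      def final_target_q(rewards, q_values, gamma=0.99):
          """rewards:  array [T, N] - N reward-model estimates per time step
             q_values: array [T, L] - L Q-network estimates per time step"""
          candidates, ret = [], 0.0
          for t in range(rewards.shape[0]):
              ret = ret + gamma**t * rewards[t]     # discounted return, shape [N]
              # L*N candidate targets for step t: every reward/Q pairing.
              candidates.append(ret[None, :]
                                + gamma**(t + 1) * q_values[t][:, None])
          per_step = [c.mean() for c in candidates] # combine within each step
          return float(np.mean(per_step))           # final target Q value

      target = final_target_q(np.random.rand(5, 3), np.random.rand(5, 4))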
  • Publication number: 20210187733
    Abstract: Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
    Type: Application
    Filed: May 17, 2019
    Publication date: June 24, 2021
    Inventors: Honglak Lee, Shixiang Gu, Sergey Levine
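    Illustrative sketch: the relabeling idea in the abstract (as in HIRO-style hierarchical RL) picks, from a set of candidate higher-level actions, the one under which the current lower-level policy best explains the stored lower-level actions, then trains the higher-level policy on the relabeled transition. The Gaussian-likelihood scoring and the candidate set below are illustrative assumptions.

      import numpy as np

      def relabel_goal(low_policy, states, actions, candidate_goals):
          """low_policy(state, goal) -> mean action of the *current* low-level
          policy. Returns the candidate maximizing log-likelihood of the stored
          actions (squared error equals Gaussian log-lik up to a constant)."""
          def score(goal):
              preds = np.array([low_policy(s, goal) for s in states])
              return -np.sum((preds - actions) ** 2)
          return max(candidate_goals, key=score)

      # Toy usage with a linear stand-in for the lower-level policy.
      policy = lambda s, g: 0.5 * (g - s)
      states = np.array([0.0, 0.2, 0.4])
      actions = np.array([0.45, 0.35, 0.25])
      goals = list(np.linspace(0.0, 1.0, 11))
      new_goal = relabel_goal(policy, states, actions, goals)   # ~0.9 here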
  • Publication number: 20210101286
    Abstract: Implementations relate to training a point cloud prediction model that can be utilized to process a single-view two-and-a-half-dimensional (2.5D) observation of an object, to generate a domain-invariant three-dimensional (3D) representation of the object. Implementations additionally or alternatively relate to utilizing the domain-invariant 3D representation to train a robotic manipulation policy model using, as at least part of the input to the robotic manipulation policy model during training, the domain-invariant 3D representations of simulated objects to be manipulated. Implementations additionally or alternatively relate to utilizing the trained robotic manipulation policy model in control of a robot based on output generated by processing generated domain-invariant 3D representations utilizing the robotic manipulation policy model.
    Type: Application
    Filed: February 28, 2020
    Publication date: April 8, 2021
    Inventors: Honglak Lee, Xinchen Yan, Soeren Pirk, Yunfei Bai, Seyed Mohammad Khansari Zadeh, Yuanzheng Gong, Jasmine Hsu
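    Illustrative sketch: the pipeline in the abstract maps a single-view 2.5D (depth) observation to a predicted point cloud, which the manipulation policy consumes instead of raw pixels, so the policy input is invariant to the simulation-vs-real domain. The PyTorch sketch below shows that wiring; the architectures and the number of predicted points are assumptions for the sake of a runnable example.

      import torch
      import torch.nn as nn

      class PointCloudPredictor(nn.Module):
          def __init__(self, num_points=256):
              super().__init__()
              self.num_points = num_points
              self.net = nn.Sequential(
                  nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                  nn.Linear(64, num_points * 3))

          def forward(self, depth):
              return self.net(depth).view(-1, self.num_points, 3)  # 3D cloud

      class ManipulationPolicy(nn.Module):
          def __init__(self, action_dim=7):
              super().__init__()
              self.point_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU())
              self.head = nn.Linear(64, action_dim)

          def forward(self, cloud):
              feats = self.point_net(cloud).max(dim=1).values  # order-invariant
              return self.head(feats)                          # robot action

      cloud = PointCloudPredictor()(torch.randn(1, 1, 64, 64))
      action = ManipulationPolicy()(cloud)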
  • Publication number: 20210053217
    Abstract: Deep machine learning methods and apparatus, some of which are related to determining a grasp outcome prediction for a candidate grasp pose of an end effector of a robot. Some implementations are directed to training and utilization of both a geometry network and a grasp outcome prediction network. The trained geometry network can be utilized to generate, based on two-dimensional or two-and-a-half-dimensional image(s), geometry output(s) that are: geometry-aware, and that represent (e.g., high-dimensionally) three-dimensional features captured by the image(s). In some implementations, the geometry output(s) include at least an encoding that is generated based on a trained encoding neural network trained to generate encodings that represent three-dimensional features (e.g., shape). The trained grasp outcome prediction network can be utilized to generate, based on applying the geometry output(s) and additional data as input(s) to the network, a grasp outcome prediction for a candidate grasp pose.
    Type: Application
    Filed: November 10, 2020
    Publication date: February 25, 2021
    Inventors: James Davidson, Xinchen Yan, Yunfei Bai, Honglak Lee, Abhinav Gupta, Seyed Mohammad Khansari Zadeh, Arkanath Pathak, Jasmine Hsu
  • Patent number: 10864631
    Abstract: Deep machine learning methods and apparatus, some of which are related to determining a grasp outcome prediction for a candidate grasp pose of an end effector of a robot. Some implementations are directed to training and utilization of both a geometry network and a grasp outcome prediction network. The trained geometry network can be utilized to generate, based on two-dimensional or two-and-a-half-dimensional image(s), geometry output(s) that are: geometry-aware, and that represent (e.g., high-dimensionally) three-dimensional features captured by the image(s). In some implementations, the geometry output(s) include at least an encoding that is generated based on a trained encoding neural network trained to generate encodings that represent three-dimensional features (e.g., shape). The trained grasp outcome prediction network can be utilized to generate, based on applying the geometry output(s) and additional data as input(s) to the network, a grasp outcome prediction for a candidate grasp pose.
    Type: Grant
    Filed: June 18, 2018
    Date of Patent: December 15, 2020
    Assignee: Google LLC
    Inventors: James Davidson, Xinchen Yan, Yunfei Bai, Honglak Lee, Abhinav Gupta, Seyed Mohammad Khansari Zadeh, Arkanath Pathak, Jasmine Hsu
  • Patent number: 10657962
    Abstract: An information processing system, a computer program product, and methods for modeling multi-party dialog interactions. A method includes learning, directly from data obtained from a multi-party conversational channel, to identify particular multi-party dialog threads as well as participants in one or more conversations. Each participant utterance is converted to a continuous vector representation that is updated in a model of the multi-party dialog with each utterance and according to each participant's role, selected from the set of sender, addressee, or observer. The method trains the model to choose a correct addressee and a correct response for each participant utterance, using a joint selection criterion. The method learns, directly from the data obtained from the multi-party conversational channel, which dialog turns belong to each particular multi-party dialog thread.
    Type: Grant
    Filed: May 2, 2018
    Date of Patent: May 19, 2020
    Assignees: International Business Machines Corporation, University of Michigan
    Inventors: Rui Zhang, Lazaros Polymenakos, Dragomir Radev, David Nahamoo, Honglak Lee
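    Illustrative sketch: the abstract describes keeping a vector per participant, updating each vector through a role-specific recurrent cell (sender, addressee, or observer) at every utterance, and jointly scoring addressee/response pairs. The PyTorch sketch below shows that bookkeeping; the dimensions, GRU cells, and bilinear scorers are hypothetical stand-ins, not the patented model.

      import torch
      import torch.nn as nn

      class DialogTracker(nn.Module):
          def __init__(self, dim=64):
              super().__init__()
              self.cells = nn.ModuleDict(
                  {role: nn.GRUCell(dim, dim)
                   for role in ("sender", "addressee", "observer")})
              self.addr_score = nn.Bilinear(dim, dim, 1)
              self.resp_score = nn.Bilinear(dim, dim, 1)

          def step(self, states, utterance, sender, addressee):
              """Update every participant's vector with its role-specific cell."""
              new_states = []
              for i in range(states.size(0)):
                  role = ("sender" if i == sender
                          else "addressee" if i == addressee else "observer")
                  new_states.append(self.cells[role](utterance, states[i:i + 1]))
              return torch.cat(new_states, dim=0)

          def joint_scores(self, sender_state, cand_addr, cand_resp):
              # Joint selection: score every (addressee, response) pair together.
              a = self.addr_score(sender_state.expand_as(cand_addr), cand_addr)
              r = self.resp_score(sender_state.expand_as(cand_resp), cand_resp)
              return a + r.t()          # [num_addressees, num_responses] grid

      tracker = DialogTracker()
      states = torch.zeros(3, 64)                      # three participants
      states = tracker.step(states, torch.randn(1, 64), sender=0, addressee=1)
      grid = tracker.joint_scores(states[0], states[1:], torch.randn(4, 64))
      addr, resp = divmod(int(grid.argmax()), grid.size(1))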
  • Publication number: 20200094405
    Abstract: Deep machine learning methods and apparatus, some of which are related to determining a grasp outcome prediction for a candidate grasp pose of an end effector of a robot. Some implementations are directed to training and utilization of both a geometry network and a grasp outcome prediction network. The trained geometry network can be utilized to generate, based on two-dimensional or two-and-a-half-dimensional image(s), geometry output(s) that are: geometry-aware, and that represent (e.g., high-dimensionally) three-dimensional features captured by the image(s). In some implementations, the geometry output(s) include at least an encoding that is generated based on a trained encoding neural network trained to generate encodings that represent three-dimensional features (e.g., shape). The trained grasp outcome prediction network can be utilized to generate, based on applying the geometry output(s) and additional data as input(s) to the network, a grasp outcome prediction for a candidate grasp pose.
    Type: Application
    Filed: June 18, 2018
    Publication date: March 26, 2020
    Inventors: James Davidson, Xinchen Yan, Yunfei Bai, Honglak Lee, Abhinav Gupta, Seyed Mohammad Khansari Zadeh, Arkanath Pathak, Jasmine Hsu
  • Publication number: 20190341036
    Abstract: An information processing system, a computer program product, and methods for modeling multi-party dialog interactions. A method includes learning, directly from data obtained from a multi-party conversational channel, to identify particular multi-party dialog threads as well as participants in one or more conversations. Each participant utterance is converted to a continuous vector representation that is updated in a model of the multi-party dialog with each utterance and according to each participant's role, selected from the set of sender, addressee, or observer. The method trains the model to choose a correct addressee and a correct response for each participant utterance, using a joint selection criterion. The method learns, directly from the data obtained from the multi-party conversational channel, which dialog turns belong to each particular multi-party dialog thread.
    Type: Application
    Filed: May 2, 2018
    Publication date: November 7, 2019
    Inventors: Rui Zhang, Lazaros Polymenakos, Dragomir Radev, David Nahamoo, Honglak Lee