Patents by Inventor Rahul Anant Jain

Rahul Anant Jain has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

LEVERAGING OFFLINE TRAINING DATA AND AGENT COMPETENCY MEASURES TO IMPROVE ONLINE LEARNING

Publication number: 20240232643

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a target action selection policy to control a target agent interacting with an environment. In one aspect, a method comprises: obtaining a set of offline training data, wherein the offline training data characterizes interaction of a baseline agent with an environment as the baseline agent performs actions selected in accordance with a baseline action selection policy; generating a set of online training data that characterizes interaction of the target agent with the environment as the target agent performs actions selected in accordance with the target action selection policy; and training the target action selection policy on both: (i) the offline training data, and (ii) the online training data, wherein the training of the target action selection policy on the offline training data is conditioned on a measure of competency of the baseline agent.

Type: Application

Filed: October 23, 2023

Publication date: July 11, 2024

Inventors: Zheng Wen, Benjamin Van Roy, Rahul Anant Jain, Botao Hao
LEVERAGING OFFLINE TRAINING DATA AND AGENT COMPETENCY MEASURES TO IMPROVE ONLINE LEARNING

Publication number: 20240135190

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a target action selection policy to control a target agent interacting with an environment. In one aspect, a method comprises: obtaining a set of offline training data, wherein the offline training data characterizes interaction of a baseline agent with an environment as the baseline agent performs actions selected in accordance with a baseline action selection policy; generating a set of online training data that characterizes interaction of the target agent with the environment as the target agent performs actions selected in accordance with the target action selection policy; and training the target action selection policy on both: (i) the offline training data, and (ii) the online training data, wherein the training of the target action selection policy on the offline training data is conditioned on a measure of competency of the baseline agent.

Type: Application

Filed: October 22, 2023

Publication date: April 25, 2024

Inventors: Zheng Wen, Benjamin Van Roy, Rahul Anant Jain, Botao Hao

LEVERAGING OFFLINE TRAINING DATA AND AGENT COMPETENCY MEASURES TO IMPROVE ONLINE LEARNING

LEVERAGING OFFLINE TRAINING DATA AND AGENT COMPETENCY MEASURES TO IMPROVE ONLINE LEARNING