Patents by Inventor Varun KOMPELLA

Varun KOMPELLA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

EVENT TABLES FOR EFFICIENT EXPERIENCE REPLAY

Publication number: 20230368041

Abstract: Experience replay (ER) is an important component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. Stratified Sampling from Event Tables (SSET) partitions an ER buffer into Event Tables, each capturing important subsequences of optimal behavior. A theoretical advantage is proven over the traditional monolithic buffer approach, and combining SSET with an existing prioritized sampling strategy can further improve learning speed and stability. Empirical results in challenging MiniGrid domains, benchmark RL environments, and a high-fidelity car racing simulator demonstrate the advantages and versatility of SSET over existing ER buffer sampling approaches.
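
As a rough illustration of the partitioning idea in this abstract, the sketch below keeps a default table plus one table per event and draws a fixed share of each batch from every table. It is a minimal sketch under stated assumptions, not the method claimed in the application: the class name `EventTableBuffer`, the predicate-based event conditions, the fixed `history_len` window, and the fixed sampling proportions are all hypothetical choices made for illustration.

```python
import random
from collections import deque


class EventTableBuffer:
    """Replay buffer split into a default table plus one table per event.

    Hypothetical sketch: event conditions are predicates over a transition,
    and each event table stores the last `history_len` transitions that led
    up to the event.
    """

    def __init__(self, event_conditions, history_len, capacity, seed=0):
        self.event_conditions = event_conditions
        self.tables = {name: deque(maxlen=capacity) for name in event_conditions}
        self.tables["default"] = deque(maxlen=capacity)
        self.recent = deque(maxlen=history_len)  # sliding window of the current episode
        self.rng = random.Random(seed)

    def add(self, transition):
        # Every transition goes to the default table.
        self.tables["default"].append(transition)
        self.recent.append(transition)
        # When an event fires, copy the preceding subsequence into its table.
        for name, fires in self.event_conditions.items():
            if fires(transition):
                self.tables[name].extend(self.recent)

    def end_episode(self):
        # Histories must not span episode boundaries.
        self.recent.clear()

    def sample(self, batch_size, proportions):
        """Draw a fixed share of the batch from each event table (assumed scheme)."""
        if not self.tables["default"]:
            raise ValueError("buffer is empty")
        batch = []
        for name, share in proportions.items():
            table = self.tables[name]
            if table:
                count = int(batch_size * share)
                batch.extend(self.rng.choice(table) for _ in range(count))
        # Fill whatever is left from the default table.
        while len(batch) < batch_size:
            batch.append(self.rng.choice(self.tables["default"]))
        return batch[:batch_size]
```

With a single event such as "reward received", calling `sample(8, {"goal": 0.5})` would draw half of the batch from transitions that preceded a reward and the rest uniformly from all stored transitions.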

Type: Application

Filed: April 6, 2023

Publication date: November 16, 2023

Inventors: Varun Kompella, Thomas Walsh, Samuel Barrett, Peter Wurman, Peter Stone

METHODS FOR TRAINING AN ARTIFICIAL INTELLIGENT AGENT WITH CURRICULUM AND SKILLS

Publication number: 20230237370

Abstract: A method for training an agent uses a mixture of scenarios designed to teach specific skills helpful in a larger domain, such as mixing general racing and very specific tactical racing scenarios. Aspects of the methods can include one or more of the following: (1) training the agent to be very good at time trials by having one or more cars spread out on the track; (2) running the agent in various racing scenarios with a variable number of opponents starting in different configurations around the track; (3) varying the opponents by using game-provided agents, agents trained according to aspects of the present invention, or agents controlled to follow specific driving lines; (4) setting up specific short scenarios with opponents in various racing situations with specific success criteria; and (5) having a dynamic curriculum based on how the agent performs on a variety of evaluation scenarios.
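
Item (5) of this abstract, a dynamic curriculum driven by evaluation results, can be pictured with a small sketch. The weighting rule here is an assumption made purely for illustration (scenarios with lower evaluation success get sampled more often, with a floor so that no scenario disappears); the application does not specify this rule, and the function names and scenario names are hypothetical.

```python
import random


def curriculum_weights(success_rates, floor=0.1):
    """Turn per-scenario success rates (0..1) into sampling weights.

    Assumed rule: weight grows with the failure rate, plus a small floor
    so that mastered scenarios are still revisited occasionally.
    """
    raw = {name: (1.0 - rate) + floor for name, rate in success_rates.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def sample_scenarios(weights, count, rng):
    """Pick the next `count` training scenarios according to the weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=count)


# Hypothetical evaluation results for three scenario types.
rates = {"time_trial": 0.9, "overtake": 0.2, "full_race": 0.5}
weights = curriculum_weights(rates)
next_batch = sample_scenarios(weights, 10, random.Random(0))
```

Re-running `curriculum_weights` after each evaluation round would shift training toward whichever scenarios the agent currently handles worst.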

Type: Application

Filed: February 8, 2022

Publication date: July 27, 2023

Inventors: Thomas J. Walsh, Varun Kompella, Samuel Barrett, Michael D. Thomure, Patrick MacAlpine, Peter Wurman

METHODS AND SYSTEMS TO ADAPT PID COEFFICIENTS THROUGH REINFORCEMENT LEARNING

Publication number: 20220365493

Abstract: Systems and methods are used to adapt the coefficients of a proportional-integral-derivative (PID) controller through reinforcement learning. The approach for adapting PID coefficients can include an outer loop of reinforcement learning, in which the PID coefficients are tuned to changes in the environment, and an inner loop of PID control for quickly reacting to changing errors. The outer loop can learn and adapt as the environment changes and can be configured to run only at a predetermined frequency, after a given number of steps. The outer loop can use summary statistics about the error terms and any other information sensed about the environment to calculate an observation. This observation can be used to evaluate the next action, for example, by feeding it into a neural network representing the policy. The resulting action consists of the coefficients of the PID controller and other tunable parameters, such as those of the filters.
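
The two-loop structure described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented system: the plant is a toy integrator, the summary statistics are an arbitrary choice, and `placeholder_policy` is a hand-written stand-in for the learned policy network (no learning happens here). All names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error, dt):
        """Inner loop: standard PID control law on the current error."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def summarize(errors):
    """Observation for the outer loop: summary statistics of recent errors."""
    abs_errors = [abs(e) for e in errors]
    return (mean(errors), mean(abs_errors), max(abs_errors))


def placeholder_policy(observation, gains):
    """Stand-in for the learned policy (assumption): nudge kp up while error is large."""
    _, mean_abs_error, _ = observation
    kp, ki, kd = gains
    scale = 1.1 if mean_abs_error > 0.1 else 1.0
    return (kp * scale, ki, kd)


def run(setpoint, steps, outer_period, dt=0.1):
    pid = PID(kp=0.5, ki=0.1, kd=0.0)
    value = 0.0
    window = []
    updates = 0
    for t in range(1, steps + 1):
        error = setpoint - value
        window.append(error)
        value += pid.step(error, dt) * dt  # toy plant: the control sets the rate of change
        if t % outer_period == 0:
            # Outer loop: runs only every `outer_period` steps and emits new gains.
            gains = (pid.kp, pid.ki, pid.kd)
            pid.kp, pid.ki, pid.kd = placeholder_policy(summarize(window), gains)
            window.clear()
            updates += 1
    return value, updates
```

In an actual reinforcement learning setup, `placeholder_policy` would be replaced by a trained network and the outer loop would also receive a reward signal; the sketch only shows how the slow gain-adjustment loop wraps the fast control loop.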

Type: Application

Filed: May 7, 2021

Publication date: November 17, 2022

Inventors: Samuel Barrett, James MacGlashan, Varun Kompella, Peter Wurman, Goker Erdogan, Fabrizio Santini

Method and system for continual learning in an intelligent artificial agent

Patent number: 11443229

Abstract: A method and system for teaching an artificial intelligent agent includes giving the agent several examples where it can learn to identify what is important about these example states. Once the agent has the ability to recognize a goal configuration, it can use that information to then learn how to achieve the goal states on its own. An agent may be provided with positive and negative examples to demonstrate a goal configuration. Once the agent has learned certain goal configurations, the agent can learn an option to achieve the goal configuration and a distance function that predicts at least one of a distance and a duration to the goal configuration under the learned option. This distance function prediction may be incorporated as a state feature of the agent.

Type: Grant

Filed: August 31, 2018

Date of Patent: September 13, 2022

Assignees: Sony Group Corporation, Sony Corporation of America

Inventors: Mark Bishop Ring, Satinder Baveja, Roberto Capobianco, Varun Kompella, Kaushik Subramanian, James MacGlashan

TASK PRIORITIZED EXPERIENCE REPLAY ALGORITHM FOR REINFORCEMENT LEARNING

Publication number: 20220101064

Abstract: A task prioritized experience replay (TaPER) algorithm enables simultaneous off-policy learning of multiple RL tasks. The algorithm can prioritize samples that were part of fixed-length episodes that led to the achievement of tasks. This enables the agent to quickly learn task policies by bootstrapping over its early successes. TaPER can also improve performance on all tasks simultaneously, which is a desirable characteristic for multi-task RL. Unlike conventional ER algorithms, which apply only to single-task RL settings, require rewards to be binary or abundant, or require tasks to be provided as a parameterized specification of goals, TaPER imposes no such restrictions and supports arbitrary reward and task specifications.
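
The prioritization idea in this abstract can be illustrated loosely as follows. This is a simplified sketch, not the TaPER algorithm itself: the two-level priority scheme, the class name `TaskPrioritizedBuffer`, and the `end_episode(achieved_tasks)` interface are assumptions chosen for illustration.

```python
import random


class TaskPrioritizedBuffer:
    """Replay buffer that up-weights transitions from episodes that achieved a task."""

    def __init__(self, base_priority=0.1, success_priority=1.0, seed=0):
        self.transitions = []
        self.priorities = []
        self.episode_start = 0  # index of the first transition of the current episode
        self.base_priority = base_priority
        self.success_priority = success_priority
        self.rng = random.Random(seed)

    def add(self, transition):
        self.transitions.append(transition)
        self.priorities.append(self.base_priority)

    def end_episode(self, achieved_tasks):
        """Raise the priority of the whole episode if it achieved any task."""
        if achieved_tasks:
            for i in range(self.episode_start, len(self.transitions)):
                self.priorities[i] = self.success_priority
        self.episode_start = len(self.transitions)

    def sample(self, batch_size):
        """Sample with probability proportional to priority."""
        return self.rng.choices(self.transitions, weights=self.priorities, k=batch_size)
```

Because the priority depends only on whether some task was achieved, and not on the form of the reward, a scheme like this places no requirement on rewards being binary or on tasks being expressed as goal parameters.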

Type: Application

Filed: September 29, 2020

Publication date: March 31, 2022

Inventors: Varun Kompella, James MacGlashan, Peter Wurman, Peter Stone

MULTI-FORECAST NETWORKS

Publication number: 20200218992

Abstract: A method and system for training and/or operating an artificial intelligent agent can use multi-input and/or multi-forecast networks. Multi-forecasts are computational constructs, typically, but not necessarily, neural networks, whose shared network weights can be used to compute multiple related forecasts. This allows for more efficient training, in terms of the amount of data and/or experience needed, and in some instances, for more efficient computation of those forecasts. There are several related and sometimes composable approaches to multi-forecast networks.
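
One way to picture shared weights serving several related forecasts is a network with a common hidden layer and one small output head per forecast. The sketch below is only an illustration under assumptions: the single tanh hidden layer, the random initialization, the forward-only code, and the forecast names are all hypothetical and are not taken from the application.

```python
import math
import random


class MultiForecastNet:
    """Tiny network: one shared hidden layer, one linear head per forecast."""

    def __init__(self, n_inputs, n_hidden, forecast_names, seed=0):
        rng = random.Random(seed)
        # Shared weights: used by every forecast.
        self.shared = [
            [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)
        ]
        # Per-forecast weights: one small head each.
        self.heads = {
            name: [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
            for name in forecast_names
        }

    def features(self, inputs):
        """Shared representation, computed once per input."""
        return [
            math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in self.shared
        ]

    def forecast(self, inputs):
        """Compute all forecasts from the same shared features."""
        hidden = self.features(inputs)
        return {
            name: sum(w * h for w, h in zip(head, hidden))
            for name, head in self.heads.items()
        }


net = MultiForecastNet(3, 4, ["1_step", "5_step", "10_step"])
forecasts = net.forecast([0.1, 0.2, 0.3])
```

The shared layer is evaluated once and reused by every head, which is the source of the computational saving the abstract mentions; any training signal for one forecast would also update the representation used by the others.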

Type: Application

Filed: January 2, 2020

Publication date: July 9, 2020

Inventors: Roberto Capobianco, Varun Kompella, Kaushik Subramanian, James MacGlashan, Peter Wurman, Satinder Baveja

METHOD AND SYSTEM FOR CONTINUAL LEARNING IN AN INTELLIGENT ARTIFICIAL AGENT

Publication number: 20200074349

Abstract: A method and system for teaching an artificial intelligent agent includes giving the agent several examples where it can learn to identify what is important about these example states. Once the agent has the ability to recognize a goal configuration, it can use that information to then learn how to achieve the goal states on its own. An agent may be provided with positive and negative examples to demonstrate a goal configuration. Once the agent has learned certain goal configurations, the agent can learn an option to achieve the goal configuration and a distance function that predicts at least one of a distance and a duration to the goal configuration under the learned option. This distance function prediction may be incorporated as a state feature of the agent.

Type: Application

Filed: August 31, 2018

Publication date: March 5, 2020

Inventors: Mark Bishop RING, Satinder BAVEJA, Roberto CAPOBIANCO, Varun KOMPELLA, Kaushik SUBRAMANIAN, James MACGLASHAN

METHOD AND SYSTEM FOR AN INTELLIGENT ARTIFICIAL AGENT

Publication number: 20190303776

Abstract: A method and system for teaching an artificial intelligent agent in which the agent can be placed in a state that it is intended to learn how to achieve. By being given several examples, the agent can learn to identify what is important about these example states. Once the agent has the ability to recognize a goal configuration, it can use that information to then learn how to achieve the goal states on its own. An agent may be provided with positive and negative examples to demonstrate a goal configuration. Once the agent has learned certain goal configurations, the agent can learn policies and skills that achieve the learned goal configuration. The agent may create a collection of these policies and skills from which to select based on a particular command or state.

Type: Application

Filed: April 3, 2018

Publication date: October 3, 2019

Applicant: COGITAI, INC.

Inventors: Mark Bishop RING, Satinder BAVEJA, Peter STONE, James MACGLASHAN, Samuel BARRETT, Roberto CAPOBIANCO, Varun KOMPELLA, Kaushik SUBRAMANIAN, Peter WURMAN