Patents by Inventor Johan Sebastian Flennerhag

Johan Sebastian Flennerhag has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

METHODS AND SYSTEMS FOR CONSTRAINED REINFORCEMENT LEARNING

Publication number: 20240265263

Abstract: A method is described for iteratively training a policy model, such as a neural network, of a computer-implemented action selection system to control an agent interacting with an environment to perform a task subject to one or more constraints. The task has a reward associated with performance of the task. Each constraint limits to a corresponding threshold the expected value of the total of a corresponding constraint function if the future actions of the agent are chosen according to the policy model, and each constraint is associated with a corresponding multiplier variable. In each iteration, a mixed reward function is generated based on values for the multiplier variables generated in the preceding iteration, and estimates of the rewards and the values of constraint reward functions if the actions are chosen based on the policy model generated in the preceding iteration.
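
The abstract describes a mixed reward built from the task reward and multiplier-weighted constraint terms. A minimal sketch of one common way to do this, a Lagrangian-style combination with projected ascent on the multipliers, might look as follows. It is an illustration only and is not taken from the application: the function names, the step size, and the treatment of each constraint as an upper bound on a cost-like total are all assumptions.

```python
def mixed_reward(reward, constraint_values, multipliers):
    """Task reward minus the multiplier-weighted constraint values.

    Assumption: each constraint value behaves like a cost, so a larger
    multiplier penalizes it more heavily.
    """
    penalty = sum(m * c for m, c in zip(multipliers, constraint_values))
    return reward - penalty


def update_multipliers(multipliers, estimated_totals, thresholds, step_size=0.1):
    """Projected ascent step on the multipliers (hypothetical update rule).

    A multiplier grows when the estimated constraint total exceeds its
    threshold, shrinks otherwise, and is clipped so it never goes negative.
    """
    return [
        max(0.0, m + step_size * (total - threshold))
        for m, total, threshold in zip(multipliers, estimated_totals, thresholds)
    ]
```

In an iterative scheme of this kind, the multipliers returned by `update_multipliers` in one iteration would feed `mixed_reward` in the next, and the policy model would be trained against that mixed reward.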

Type: Application

Filed: January 26, 2024

Publication date: August 8, 2024

Inventors: Theodore Harris Moskovitz, Brendan Timothy O'Donoghue, Tom Ben Zion Zahavy, Johan Sebastian Flennerhag, Vivek Veeriah Jeya Veeraiah, Satinder Singh Baveja

META-LEARNED EVOLUTIONARY STRATEGIES OPTIMIZER

Publication number: 20240127071

Abstract: There is provided a computer-implemented method for updating a search distribution of an evolutionary strategies optimizer using an optimizer neural network comprising one or more attention blocks. The method comprises receiving a plurality of candidate solutions, one or more parameters defining the search distribution that the plurality of candidate solutions are sampled from, and fitness score data indicating a fitness of each respective candidate solution of the plurality of candidate solutions. The method further comprises processing, by the one or more attention neural network blocks, the fitness score data using an attention mechanism to generate respective recombination weights corresponding to each respective candidate solution. The method further comprises updating the one or more parameters defining the search distribution based upon the recombination weights applied to the plurality of candidate solutions.
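
The update described in the abstract maps fitness scores to recombination weights and then applies those weights to the candidate solutions. The sketch below illustrates only that data flow. It replaces the learned attention blocks with a fixed softmax over standardized fitness scores, which is a much simpler stand-in and not the method of the application. All names are hypothetical, higher fitness is assumed to be better, and only the mean of the search distribution is updated.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recombination_weights(fitness, temperature=1.0):
    """Stand-in for the attention blocks: softmax of standardized fitness."""
    mean = sum(fitness) / len(fitness)
    variance = sum((f - mean) ** 2 for f in fitness) / len(fitness)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero when all scores are equal
    return softmax([(f - mean) / (std * temperature) for f in fitness])


def update_mean(candidates, weights):
    """New search-distribution mean as the weighted sum of the candidates."""
    dim = len(candidates[0])
    return [sum(w * c[d] for w, c in zip(weights, candidates)) for d in range(dim)]
```

With equal fitness scores the weights are uniform and the new mean is the plain average of the candidates. As the scores spread out, the mean moves toward the fitter candidates.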

Type: Application

Filed: September 27, 2023

Publication date: April 18, 2024

Inventors: Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Ben Zion Zahavy, Valentin Clement Dalibard, Christopher Yenchuan Lu, Satinder Singh Baveja, Johan Sebastian Flennerhag

NEURAL NETWORK REINFORCEMENT LEARNING WITH DIVERSE POLICIES

Publication number: 20240104389

Abstract: In one aspect there is provided a method for training a neural network system by reinforcement learning. The neural network system may be configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy aiming to satisfy an objective. The method may comprise obtaining a policy set comprising one or more policies for satisfying the objective and determining a new policy based on the one or more policies. The determining may include one or more optimization steps that aim to maximize a diversity of the new policy relative to the policy set under the condition that the new policy satisfies a minimum performance criterion based on an expected return that would be obtained by following the new policy.
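
The abstract frames the search for a new policy as maximizing diversity subject to a minimum performance criterion. As a rough illustration of that constrained objective, the sketch below picks from a finite list of candidate policies rather than running optimization steps on a neural network, so it shows the shape of the problem and not the method of the application. The dictionary keys, the use of a feature vector per policy, and the choice of minimum Euclidean distance as the diversity measure are all assumptions.

```python
import math


def diversity(features, policy_set_features):
    """Distance from a candidate to its nearest neighbor in the policy set."""
    return min(math.dist(features, other) for other in policy_set_features)


def select_diverse_policy(candidates, policy_set_features, min_return):
    """Most diverse candidate among those meeting the return threshold.

    Each candidate is a dict with hypothetical keys "name", "features",
    and "expected_return". Returns None when no candidate is feasible.
    """
    feasible = [c for c in candidates if c["expected_return"] >= min_return]
    if not feasible:
        return None
    return max(feasible, key=lambda c: diversity(c["features"], policy_set_features))
```

A candidate that lies far from the existing policies is still rejected if its expected return falls below `min_return`, which mirrors the condition stated in the abstract.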

Type: Application

Filed: February 4, 2022

Publication date: March 28, 2024

Inventors: Tom Ben Zion Zahavy, Brendan Timothy O'Donoghue, Andre da Motta Salles Barreto, Johan Sebastian Flennerhag, Volodymyr Mnih, Satinder Singh Baveja
