Patents by Inventor Brendan Timothy O'Donoghue

Brendan Timothy O'Donoghue has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240265263
    Abstract: A method is described for iteratively training a policy model, such as a neural network, of a computer-implemented action selection system to control an agent interacting with an environment to perform a task subject to one or more constraints. The task has a reward associated with performance of the task. Each constraint limits, to a corresponding threshold, the expected value of the total of a corresponding constraint function that would result if the future actions of the agent were chosen according to the policy model, and each constraint is associated with a corresponding multiplier variable. In each iteration, a mixed reward function is generated based on values for the multiplier variables generated in the preceding iteration, and on estimates of the rewards and the values of the constraint functions if the actions are chosen based on the policy model generated in the preceding iteration.
    Type: Application
    Filed: January 26, 2024
    Publication date: August 8, 2024
    Inventors: Theodore Harris Moskovitz, Brendan Timothy O'Donoghue, Tom Ben Zion Zahavy, Johan Sebastian Flennerhag, Vivek Veeriah Jeya Veeraiah, Satinder Singh Baveja
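The iterative scheme this abstract describes resembles a Lagrangian (dual ascent) approach to constrained reinforcement learning. The toy sketch below is illustrative only and is not taken from the patent: `mixed_reward` combines the task reward with multiplier-weighted constraint costs, and `update_multipliers` raises a multiplier when its constraint estimate exceeds its threshold. All names and the update rule are assumptions.

```python
def mixed_reward(task_reward, constraint_values, multipliers):
    """Combine the task reward with multiplier-weighted constraint costs."""
    penalty = sum(lam * c for lam, c in zip(multipliers, constraint_values))
    return task_reward - penalty

def update_multipliers(multipliers, constraint_estimates, thresholds, lr=0.1):
    """Dual ascent: increase a multiplier when its constraint estimate
    exceeds its threshold; multipliers stay non-negative."""
    return [max(0.0, lam + lr * (est - thr))
            for lam, est, thr in zip(multipliers, constraint_estimates, thresholds)]
```

In a full system, each iteration would re-estimate the rewards and constraint values under the current policy, rebuild the mixed reward, and retrain the policy model against it.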
  • Publication number: 20240249151
    Abstract: The actions of an agent in an environment are selected using a policy model neural network which implements a policy model defining, for any observed state of the environment characterized by an observation received by the policy model neural network, a state-action distribution over the set of possible actions the agent can perform. The policy model neural network is jointly trained with a cost model neural network which, upon receiving an observation characterizing the environment, outputs a reward vector. The reward vector comprises a corresponding reward value for every possible action. The training involves a sequence of iterations, in each of which (a) a cost model is derived based on the state-action distribution of a candidate policy model defined in one or more previous iterations, and subsequently (b) a candidate policy model is obtained based on reward vector(s) defined by the cost model obtained in the iteration.
    Type: Application
    Filed: May 27, 2022
    Publication date: July 25, 2024
    Inventors: Tom Ben Zion Zahavy, Brendan Timothy O'Donoghue, Guillaume Desjardins, Satinder Singh Baveja
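The alternation in this abstract, deriving a cost model from the current policy's state-action distribution and then obtaining the next candidate policy from the cost model's reward vectors, can be caricatured with a toy single-state example. This sketch is an assumption-laden illustration, not the patented method: the cost model here simply rewards under-used actions, and the candidate policy is a greedy best response.

```python
def derive_cost_model(action_probs):
    """Toy cost model: assign each action a reward inversely related to how
    often the current policy takes it, pushing the next policy to differ."""
    return [1.0 - p for p in action_probs]

def best_response_policy(reward_vector):
    """Greedy candidate policy: all probability on the best-reward action."""
    best = max(range(len(reward_vector)), key=lambda a: reward_vector[a])
    return [1.0 if a == best else 0.0 for a in range(len(reward_vector))]
```

A real implementation would realize both components as neural networks trained jointly over many iterations, as the abstract describes.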
  • Publication number: 20240104389
    Abstract: In one aspect there is provided a method for training a neural network system by reinforcement learning. The neural network system may be configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy aiming to satisfy an objective. The method may comprise obtaining a policy set comprising one or more policies for satisfying the objective and determining a new policy based on the one or more policies. The determining may include one or more optimization steps that aim to maximize a diversity of the new policy relative to the policy set under the condition that the new policy satisfies a minimum performance criterion based on an expected return that would be obtained by following the new policy.
    Type: Application
    Filed: February 4, 2022
    Publication date: March 28, 2024
    Inventors: Tom Ben Zion Zahavy, Brendan Timothy O'Donoghue, Andre da Motta Salles Barreto, Johan Sebastian Flennerhag, Volodymyr Mnih, Satinder Singh Baveja
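The core idea of this abstract, maximizing diversity relative to an existing policy set subject to a minimum-performance criterion, can be sketched as a constrained selection problem. The snippet below is a hypothetical toy: policies are action-probability vectors, diversity is total L1 distance to the policy set, and the performance criterion is a return threshold; none of these choices come from the patent.

```python
def select_diverse_policy(candidates, policy_set, expected_return, min_return):
    """Keep candidates whose expected return clears the bar, then pick the
    one with the largest total L1 distance to the existing policy set."""
    def l1(p, q):
        return sum(abs(a - b) for a, b in zip(p, q))
    feasible = [c for c in candidates if expected_return(c) >= min_return]
    return max(feasible, key=lambda c: sum(l1(c, p) for p in policy_set))
```

The patented method performs this trade-off via optimization steps on a parametric policy rather than by enumerating candidates.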
  • Publication number: 20240062060
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for solving mixed integer programs (MIPs) using neural networks. One of the methods includes obtaining data specifying parameters of a MIP; generating, from the parameters of the MIP, an input representation; processing the input representation using an encoder neural network to generate a respective embedding for each of the integer variables; generating a plurality of partial assignments by selecting a respective proper subset of the integer variables and, for each of the variables in the respective subset, generating, using at least the respective embedding for the variable, a respective additional constraint on the value of the variable; generating, for each of the partial assignments, a corresponding candidate final assignment that assigns a respective value to each of the plurality of variables; and selecting, as a final assignment for the MIP, one of the candidate final assignments.
    Type: Application
    Filed: December 20, 2021
    Publication date: February 22, 2024
    Inventors: Sergey Bartunov, Felix Axel Gimeno Gil, Ingrid Karin von Glehn, Pawel Lichocki, Ivan Lobov, Vinod Nair, Brendan Timothy O'Donoghue, Nicolas Sonnerat, Christian Tjandraatmadja, Pengming Wang
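The partial-assignment strategy in this abstract, fixing a proper subset of the integer variables, completing each partial assignment, and keeping the best candidate, can be sketched on a tiny binary program. This is a toy illustration under stated assumptions: fixed values stand in for the learned per-variable constraints, and a brute-force completion stands in for a MIP solver; no function here comes from the patent.

```python
from itertools import product

def complete_assignment(fixed, n_vars, objective, feasible):
    """Brute-force completion of the free binary variables, keeping the
    feasible completion with the best objective (a stand-in for a solver)."""
    free = [i for i in range(n_vars) if i not in fixed]
    best = None
    for bits in product([0, 1], repeat=len(free)):
        cand = dict(fixed)
        cand.update(zip(free, bits))
        x = [cand[i] for i in range(n_vars)]
        if feasible(x) and (best is None or objective(x) > objective(best)):
            best = x
    return best

def solve_by_diving(partials, n_vars, objective, feasible):
    """Complete each partial assignment, then select the best candidate."""
    candidates = [complete_assignment(p, n_vars, objective, feasible)
                  for p in partials]
    candidates = [c for c in candidates if c is not None]
    return max(candidates, key=objective)
```

In the patented system, the encoder network's embeddings decide which variables to constrain and how, and the completions come from an actual MIP solver rather than enumeration.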