Patents by Inventor Tom Ben Zion Zahavy

Tom Ben Zion Zahavy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240265263
    Abstract: A method is described for iteratively training a policy model, such as a neural network, of a computer-implemented action selection system to control an agent interacting with an environment to perform a task subject to one or more constraints. The task has a reward associated with performance of the task. Each constraint limits, to a corresponding threshold, the expected value of the total of a corresponding constraint function if the future actions of the agent are chosen according to the policy model, and each constraint is associated with a corresponding multiplier variable. In each iteration, a mixed reward function is generated based on values for the multiplier variables generated in the preceding iteration, and estimates of the rewards and the values of constraint reward functions if the actions are chosen based on the policy model generated in the preceding iteration.
    Type: Application
    Filed: January 26, 2024
    Publication date: August 8, 2024
    Inventors: Theodore Harris Moskovitz, Brendan Timothy O'Donoghue, Tom Ben Zion Zahavy, Johan Sebastian Flennerhag, Vivek Veeriah Jeya Veeraiah, Satinder Singh Baveja
  • Publication number: 20240249151
    Abstract: The actions of an agent in an environment are selected using a policy model neural network which implements a policy model defining, for any observed state of the environment characterized by an observation received by the policy model neural network, a state-action distribution over the set of possible actions the agent can perform. The policy model neural network is jointly trained with a cost model neural network which, upon receiving an observation characterizing the environment, outputs a reward vector. The reward vector comprises a corresponding reward value for every possible action. The training involves a sequence of iterations, in each of which (a) a cost model is derived based on the state-action distribution of a candidate policy model defined in one or more previous iterations, and subsequently (b) a candidate policy model is obtained based on reward vector(s) defined by the cost model obtained in the iteration.
    Type: Application
    Filed: May 27, 2022
    Publication date: July 25, 2024
    Inventors: Tom Ben Zion Zahavy, Brendan Timothy O'Donoghue, Guillaume Desjardins, Satinder Singh Baveja
  • Publication number: 20240127071
    Abstract: There is provided a computer-implemented method for updating a search distribution of an evolutionary strategies optimizer using an optimizer neural network comprising one or more attention blocks. The method comprises receiving a plurality of candidate solutions, one or more parameters defining the search distribution that the plurality of candidate solutions are sampled from, and fitness score data indicating a fitness of each respective candidate solution of the plurality of candidate solutions. The method further comprises processing, by the one or more attention neural network blocks, the fitness score data using an attention mechanism to generate respective recombination weights corresponding to each respective candidate solution. The method further comprises updating the one or more parameters defining the search distribution based upon the recombination weights applied to the plurality of candidate solutions.
    Type: Application
    Filed: September 27, 2023
    Publication date: April 18, 2024
    Inventors: Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Ben Zion Zahavy, Valentin Clement Dalibard, Christopher Yenchuan Lu, Satinder Singh Baveja, Johan Sebastian Flennerhag
  • Publication number: 20240104389
    Abstract: In one aspect there is provided a method for training a neural network system by reinforcement learning. The neural network system may be configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy aiming to satisfy an objective. The method may comprise obtaining a policy set comprising one or more policies for satisfying the objective and determining a new policy based on the one or more policies. The determining may include one or more optimization steps that aim to maximize a diversity of the new policy relative to the policy set under the condition that the new policy satisfies a minimum performance criterion based on an expected return that would be obtained by following the new policy.
    Type: Application
    Filed: February 4, 2022
    Publication date: March 28, 2024
    Inventors: Tom Ben Zion Zahavy, Brendan Timothy O'Donoghue, Andre da Motta Salles Barreto, Johan Sebastian Flennerhag, Volodymyr Mnih, Satinder Singh Baveja
  • Publication number: 20230144995
    Abstract: A reinforcement learning system, method, and computer program code for controlling an agent to perform a plurality of tasks while interacting with an environment. The system learns options, where an option comprises a sequence of primitive actions performed by the agent under control of an option policy neural network. In implementations the system discovers options which are useful for multiple different tasks by meta-learning rewards for training the option policy neural network whilst the agent is interacting with the environment.
    Type: Application
    Filed: June 7, 2021
    Publication date: May 11, 2023
    Inventors: Vivek Veeriah Jeya Veeraiah, Tom Ben Zion Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado Philip van Hasselt, David Silver, Satinder Singh Baveja
  • Patent number: 10282462
    Abstract: A multi-modal computer classification network system for use in classifying data records is described herein. The system includes a memory device, a first classification computer server, a second classification computer server, and a policy computer server. The memory device includes an item records database and a labeling database. The first classification computer server includes a first classifier program that is configured to select an item record from the item database and generate a first classification record including a first ranked list of class labels. The second classification computer server includes a second classifier program that is configured to generate a second classification record including a second ranked list of class labels. The policy computer server includes a policy network that is programmed to determine a predicted class label based on the first and second ranked lists of class labels.
    Type: Grant
    Filed: October 31, 2016
    Date of Patent: May 7, 2019
    Assignee: WALMART APOLLO, LLC
    Inventors: Alessandro Magnani, Tom Ben Zion Zahavy, Abhinandan Krishnan, Shie Mannor
  • Publication number: 20180121533
    Abstract: A multi-modal computer classification network system for use in classifying data records is described herein. The system includes a memory device, a first classification computer server, a second classification computer server, and a policy computer server. The memory device includes an item records database and a labeling database. The first classification computer server includes a first classifier program that is configured to select an item record from the item database and generate a first classification record including a first ranked list of class labels. The second classification computer server includes a second classifier program that is configured to generate a second classification record including a second ranked list of class labels. The policy computer server includes a policy network that is programmed to determine a predicted class label based on the first and second ranked lists of class labels.
    Type: Application
    Filed: October 31, 2016
    Publication date: May 3, 2018
    Inventors: Alessandro Magnani, Tom Ben Zion Zahavy, Abhinandan Krishnan, Shie Mannor
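
The abstract of publication 20240265263 above describes a mixed reward built from per-constraint multiplier variables. The sketch below illustrates that general idea with a textbook Lagrangian formulation; it is not the method claimed in the application. The function names `mixed_reward` and `update_multipliers`, the `step_size` parameter, and the projected-ascent update rule are all assumptions introduced here for illustration.

```python
from typing import List, Sequence


def mixed_reward(reward: float,
                 costs: Sequence[float],
                 multipliers: Sequence[float]) -> float:
    """Combine the task reward with constraint costs (illustrative only).

    Each constraint cost is weighted by its multiplier and subtracted
    from the task reward, as in a standard Lagrangian relaxation.
    """
    return reward - sum(m * c for m, c in zip(multipliers, costs))


def update_multipliers(multipliers: Sequence[float],
                       cost_estimates: Sequence[float],
                       thresholds: Sequence[float],
                       step_size: float = 0.1) -> List[float]:
    """One projected-ascent step on the multipliers (assumed update rule).

    A multiplier grows when its estimated constraint value exceeds the
    threshold, shrinks otherwise, and is clipped at zero.
    """
    return [
        max(0.0, m + step_size * (est - thr))
        for m, est, thr in zip(multipliers, cost_estimates, thresholds)
    ]
```

In an iterative scheme of this kind, the multipliers from one iteration would define the mixed reward used to improve the policy in the next, and fresh estimates of the constraint values would then drive the following multiplier update.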