Patents by Inventor Matthew Edmund TAYLOR

Matthew Edmund TAYLOR has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11720792
    Abstract: Disclosed are systems, methods, and devices for generating a visualization of a deep reinforcement learning (DRL) process. State data is received, reflective of states of an environment explored by a DRL agent, each state corresponding to a time step. For each given state, saliency metrics are calculated by processing the state data, each metric measuring the saliency of a feature at the time step corresponding to the given state. A graphical visualization is generated, having at least two dimensions, in which each feature of the environment is graphically represented along a first axis, each time step is represented along a second axis, and a plurality of graphical markers represent corresponding saliency metrics, each graphical marker having a size commensurate with the magnitude of the particular saliency metric represented and a location along the first and second axes corresponding to the feature and time step for that metric.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: August 8, 2023
    Assignee: ROYAL BANK OF CANADA
    Inventors: Matthew Edmund Taylor, Bilal Kartal, Pablo Francisco Hernandez Leal, Nathan Douglas, Dianna Yim, Frank Maurer
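
As a rough illustration of the visualization this abstract describes, here is a minimal matplotlib sketch that sizes each marker by saliency magnitude and places it at its (time step, feature) coordinate. The feature names and the random saliency values are illustrative placeholders, not data or code from the patent.

```python
# A hedged sketch of the two-axis saliency visualization: features on one
# axis, time steps on the other, marker size proportional to saliency.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
features = ["position", "velocity", "angle", "angular_velocity"]  # hypothetical
n_steps = 50
saliency = np.abs(rng.normal(size=(len(features), n_steps)))  # placeholder metrics

fig, ax = plt.subplots(figsize=(10, 3))
for f_idx, name in enumerate(features):
    for t in range(n_steps):
        # Size commensurate with saliency magnitude; location encodes
        # (time step, feature), matching the layout in the abstract.
        ax.scatter(t, f_idx, s=80 * saliency[f_idx, t], color="tab:blue", alpha=0.6)
ax.set_yticks(range(len(features)))
ax.set_yticklabels(features)
ax.set_xlabel("time step")
ax.set_ylabel("feature")
plt.tight_layout()
plt.show()
```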
  • Patent number: 11574148
    Abstract: A computer system and method for extending parallelized asynchronous reinforcement learning for training a neural network is described in various embodiments, through coordinated operation of a plurality of hardware processors or threads such that each functions as a worker agent configured to simultaneously interact with a target computing environment for local gradient computation based on a loss determination and to update global network parameters based at least on the local gradient computation, training the neural network through modifications of weighted interconnections between interconnected computing units as gradient computation is conducted across a plurality of iterations of the target computing environment. The loss determination includes at least a policy loss term (actor), a value loss term (critic), and an auxiliary control loss. Further variations are described in which the neural network is adapted to include terminal state prediction and action guidance.
    Type: Grant
    Filed: November 5, 2019
    Date of Patent: February 7, 2023
    Assignee: ROYAL BANK OF CANADA
    Inventors: Bilal Kartal, Pablo Francisco Hernandez Leal, Matthew Edmund Taylor
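
The combined loss this abstract describes can be sketched compactly. The following PyTorch snippet is a hedged illustration, not the patented formulation: it combines an actor term, a critic term, and, as one plausible auxiliary control loss, a terminal-state-prediction head; all tensors, shapes, and weightings are assumptions.

```python
import torch
import torch.nn.functional as F

def a3c_loss_with_auxiliary(log_probs, values, returns,
                            terminal_logits, terminal_labels, aux_weight=0.5):
    advantages = returns - values.detach()
    policy_loss = -(log_probs * advantages).mean()      # policy loss term (actor)
    value_loss = F.mse_loss(values, returns)            # value loss term (critic)
    aux_loss = F.binary_cross_entropy_with_logits(      # auxiliary control loss:
        terminal_logits, terminal_labels)               # predict terminal states
    return policy_loss + 0.5 * value_loss + aux_weight * aux_loss

# Dummy batch showing the call shape; real values would come from a worker's rollout.
n = 8
loss = a3c_loss_with_auxiliary(
    log_probs=torch.randn(n, requires_grad=True),
    values=torch.randn(n, requires_grad=True),
    returns=torch.randn(n),
    terminal_logits=torch.randn(n, requires_grad=True),
    terminal_labels=torch.randint(0, 2, (n,)).float(),
)
loss.backward()  # gradients would feed the worker's local gradient computation
```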
  • Patent number: 11308401
    Abstract: Systems, methods, and computer readable media directed to interactive reinforcement learning with dynamic reuse of prior knowledge are described in various embodiments. The interactive reinforcement learning is adapted for providing computer implemented systems for dynamic action selection based on confidence levels associated with demonstrator data or portions thereof.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: April 19, 2022
    Assignee: ROYAL BANK OF CANADA
    Inventors: Matthew Edmund Taylor, Zhaodong Wang
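
A minimal sketch of the confidence-gated action selection idea follows. The confidence function, threshold, and toy policies are hypothetical stand-ins, not taken from the patent.

```python
# Reuse a demonstrator's suggestion only when its confidence for the current
# state is high enough; otherwise fall back to the learner's own policy.
import random

def select_action(state, learner_policy, demonstrator, confidence, threshold=0.7):
    """Pick the demonstrator's action when confidence(state) >= threshold."""
    if confidence(state) >= threshold:
        return demonstrator(state)   # dynamic reuse of prior knowledge
    return learner_policy(state)     # learner acts on its own

# Toy usage with stand-in callables.
action = select_action(
    state=(0, 1),
    learner_policy=lambda s: random.choice(["left", "right"]),
    demonstrator=lambda s: "right",
    confidence=lambda s: 0.9,  # e.g. density of nearby demonstration states
)
print(action)  # "right", since confidence exceeds the threshold
```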
  • Patent number: 11295174
    Abstract: A computer system and method for extending parallelized asynchronous reinforcement learning to include agent modeling for training a neural network is described. Coordinated operation of a plurality of hardware processors or threads is utilized such that each functions as a worker process configured to simultaneously interact with a target computing environment for local gradient computation based on a loss determination mechanism and to update global network parameters. The loss determination mechanism includes at least a policy loss term (actor), a value loss term (critic), and a supervised cross-entropy loss. Further variations are described in which the neural network is adapted to include a latent space to track agent policy features.
    Type: Grant
    Filed: November 5, 2019
    Date of Patent: April 5, 2022
    Assignee: ROYAL BANK OF CANADA
    Inventors: Pablo Francisco Hernandez Leal, Bilal Kartal, Matthew Edmund Taylor
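
To make the agent-modeling variant concrete, here is a hedged PyTorch sketch of a network with a shared latent space feeding actor, critic, and an opponent-action head trained with supervised cross entropy. Layer sizes and the single-layer encoder are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCriticWithAgentModel(nn.Module):
    def __init__(self, obs_dim, n_actions, n_opponent_actions, latent_dim=32):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)  # shared latent space
        self.policy_head = nn.Linear(latent_dim, n_actions)             # actor
        self.value_head = nn.Linear(latent_dim, 1)                      # critic
        self.opponent_head = nn.Linear(latent_dim, n_opponent_actions)  # agent model

    def forward(self, obs):
        z = torch.relu(self.encoder(obs))
        return self.policy_head(z), self.value_head(z).squeeze(-1), self.opponent_head(z)

net = ActorCriticWithAgentModel(obs_dim=10, n_actions=4, n_opponent_actions=4)
obs = torch.randn(8, 10)
opponent_actions = torch.randint(0, 4, (8,))
_, _, opp_logits = net(obs)
# Supervised cross-entropy loss on observed opponent actions, added to the
# usual policy and value terms during training.
agent_model_loss = F.cross_entropy(opp_logits, opponent_actions)
```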
  • Publication number: 20210312282
    Abstract: Systems and methods are provided for facilitating explainability of decision-making by reinforcement learning agents. A reinforcement learning agent is instantiated which generates, via a function approximation representation, learned outputs governing its decision-making. Data records of a plurality of past inputs for the agent are stored, each of the past inputs including values of a plurality of state variables. Data records of a plurality of past learned outputs of the agent are also stored. A group definition data structure defining groups of the state variables is received. For a given past input and a given group, data reflective of a perturbed input is generated by altering a value of at least one state variable and is presented to the reinforcement learning agent to obtain a perturbed learned output; a distance metric is then generated reflective of the magnitude of the difference between the perturbed learned output and the past learned output.
    Type: Application
    Filed: April 1, 2021
    Publication date: October 7, 2021
    Inventors: Pablo Francisco HERNANDEZ LEAL, Ruitong HUANG, Bilal KARTAL, Changjian LI, Matthew Edmund TAYLOR, Alexander BRANDIMARTE, Pui Shing LAM
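
A minimal numpy sketch of the perturbation probe this abstract describes: perturb the variables in one group, re-query the agent, and measure how far its output moves. The toy linear agent, the Gaussian perturbation rule, and the Euclidean distance metric are assumptions for illustration.

```python
import numpy as np

def group_saliency(agent, past_input, groups, noise_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    baseline = agent(past_input)  # the stored past learned output
    scores = {}
    for name, idx in groups.items():
        perturbed = past_input.copy()
        perturbed[idx] += rng.normal(scale=noise_scale, size=len(idx))
        # Distance metric: magnitude of the difference between the perturbed
        # learned output and the past learned output.
        scores[name] = float(np.linalg.norm(agent(perturbed) - baseline))
    return scores

# Toy agent: a fixed linear map standing in for the learned function approximator.
W = np.arange(12, dtype=float).reshape(3, 4)
agent = lambda x: W @ x
print(group_saliency(agent, np.ones(4), {"pose": [0, 1], "velocity": [2, 3]}))
```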
  • Publication number: 20210073912
    Abstract: Disclosed are systems, methods, and devices for training a learning agent. A learning agent that maintains a reinforcement learning neural network is instantiated. State data reflective of a state of an environment explored by the learning agent is received. An uncertainty metric is calculated upon processing the state data, measuring the epistemic uncertainty of the learning agent. Upon determining that the uncertainty metric exceeds a pre-defined threshold: a request signal requesting an action suggestion from a demonstrator is sent; a suggestion signal reflective of the action suggestion is received; and an action signal to implement the action suggestion is sent.
    Type: Application
    Filed: September 3, 2020
    Publication date: March 11, 2021
    Inventors: Felipe Leno DA SILVA, Pablo Francisco HERNANDEZ LEAL, Bilal KARTAL, Matthew Edmund TAYLOR
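
One way to picture the uncertainty gate is the following sketch, which uses disagreement across an ensemble of Q-functions as a common proxy for epistemic uncertainty; the ensemble, the variance proxy, and the threshold value are illustrative assumptions, not the patented metric.

```python
import numpy as np

def act_or_ask(state, ensemble, demonstrator, threshold=0.5):
    q_values = np.stack([q(state) for q in ensemble])  # (n_members, n_actions)
    uncertainty = q_values.std(axis=0).mean()          # epistemic-uncertainty proxy
    if uncertainty > threshold:
        return demonstrator(state)                     # request an action suggestion
    return int(q_values.mean(axis=0).argmax())         # act greedily on own estimate

# Toy ensemble: each member returns fixed random "Q-values" over 3 actions.
rng = np.random.default_rng(1)
ensemble = [lambda s, w=rng.normal(size=(3,)): w for _ in range(5)]
print(act_or_ask(state=None, ensemble=ensemble, demonstrator=lambda s: 0))
```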
  • Publication number: 20210034974
    Abstract: Disclosed are systems, methods, and devices for generating a visualization of a deep reinforcement learning (DRL) process. State data is received, reflective of states of an environment explored by a DRL agent, each state corresponding to a time step. For each given state, saliency metrics are calculated by processing the state data, each metric measuring the saliency of a feature at the time step corresponding to the given state. A graphical visualization is generated, having at least two dimensions, in which each feature of the environment is graphically represented along a first axis, each time step is represented along a second axis, and a plurality of graphical markers represent corresponding saliency metrics, each graphical marker having a size commensurate with the magnitude of the particular saliency metric represented and a location along the first and second axes corresponding to the feature and time step for that metric.
    Type: Application
    Filed: July 31, 2020
    Publication date: February 4, 2021
    Inventors: Matthew Edmund TAYLOR, Bilal KARTAL, Pablo Francisco HERNANDEZ LEAL, Nathan DOUGLAS, Dianna YIM, Frank MAURER
  • Publication number: 20200279136
    Abstract: A system for a machine reinforcement learning architecture for an environment with a plurality of agents includes at least one memory and at least one processor configured to provide a multi-agent reinforcement learning architecture based on a mean field Q function that includes multiple types of agents, wherein each type of agent has a corresponding mean field.
    Type: Application
    Filed: February 28, 2020
    Publication date: September 3, 2020
    Inventors: Sriram Ganapathi Subramanian, Pascal Poupart, Matthew Edmund Taylor, Nidhi Hegde
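
A small numpy sketch of the per-type mean field idea: each agent type contributes its own mean action, which a type-aware Q-function would then consume. Shapes and the averaging rule are illustrative assumptions.

```python
import numpy as np

def mean_actions_by_type(neighbor_actions_onehot, neighbor_types, n_types):
    """One mean field (average one-hot action) per agent type."""
    dims = neighbor_actions_onehot.shape[1]
    means = np.zeros((n_types, dims))
    for t in range(n_types):
        members = neighbor_actions_onehot[neighbor_types == t]
        if len(members):
            means[t] = members.mean(axis=0)
    return means

# Toy neighborhood: 5 neighbors of 2 types, 3 possible actions.
actions = np.eye(3)[[0, 2, 1, 1, 0]]
types = np.array([0, 0, 1, 1, 1])
print(mean_actions_by_type(actions, types, n_types=2))
# Each row is the mean field a per-type Q-function would condition on.
```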
  • Publication number: 20200143208
    Abstract: A computer system and method for extending parallelized asynchronous reinforcement learning to include agent modeling for training a neural network is described. Coordinated operation of a plurality of hardware processors or threads is utilized such that each functions as a worker process configured to simultaneously interact with a target computing environment for local gradient computation based on a loss determination mechanism and to update global network parameters. The loss determination mechanism includes at least a policy loss term (actor), a value loss term (critic), and a supervised cross-entropy loss. Further variations are described in which the neural network is adapted to include a latent space to track agent policy features.
    Type: Application
    Filed: November 5, 2019
    Publication date: May 7, 2020
    Inventors: Pablo Francisco HERNANDEZ LEAL, Bilal KARTAL, Matthew Edmund TAYLOR
  • Publication number: 20200143206
    Abstract: A computer system and method for extending parallelized asynchronous reinforcement learning for training a neural network is described in various embodiments, through coordinated operation of a plurality of hardware processors or threads such that each functions as a worker agent configured to simultaneously interact with a target computing environment for local gradient computation based on a loss determination and to update global network parameters based at least on the local gradient computation, training the neural network through modifications of weighted interconnections between interconnected computing units as gradient computation is conducted across a plurality of iterations of the target computing environment. The loss determination includes at least a policy loss term (actor), a value loss term (critic), and an auxiliary control loss. Further variations are described in which the neural network is adapted to include terminal state prediction and action guidance.
    Type: Application
    Filed: November 5, 2019
    Publication date: May 7, 2020
    Inventors: Bilal KARTAL, Pablo Francisco HERNANDEZ LEAL, Matthew Edmund TAYLOR
  • Publication number: 20190236458
    Abstract: Systems, methods, and computer readable media directed to interactive reinforcement learning with dynamic reuse of prior knowledge are described in various embodiments. The interactive reinforcement learning is adapted for providing computer implemented systems for dynamic action selection based on confidence levels associated with demonstrator data or portions thereof.
    Type: Application
    Filed: January 31, 2019
    Publication date: August 1, 2019
    Inventors: Matthew Edmund TAYLOR, Zhaodong WANG
  • Publication number: 20190236455
    Abstract: Disclosed herein are a system and method for providing a machine learning architecture based on monitored demonstrations. The system may include: a non-transitory computer-readable memory storage; at least one processor configured for dynamically training a machine learning architecture for performing one or more sequential tasks, the at least one processor configured to provide: a data receiver for receiving one or more demonstrator data sets, each demonstrator data set including a data structure representing one or more state-action pairs; a neural network of the machine learning architecture, the neural network including a group of nodes in one or more layers; and a pre-training engine configured for processing the one or more demonstrator data sets to extract one or more features, the extracted features being used to pre-train the neural network based on the one or more state-action pairs observed in one or more interactions with the environment.
    Type: Application
    Filed: January 31, 2019
    Publication date: August 1, 2019
    Inventors: Matthew Edmund TAYLOR, Gabriel Victor DE LA CRUZ, JR., Yunshu DU
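
A hedged PyTorch sketch of the pre-training step this abstract describes: fit the policy network to demonstrator state-action pairs with a supervised loss before reinforcement learning begins. The network size, optimizer, and synthetic demonstration data are assumptions for illustration.

```python
import torch
import torch.nn as nn

states = torch.randn(256, 8)             # demonstrator observations
actions = torch.randint(0, 4, (256,))    # demonstrator action labels

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):  # pre-training passes over the demonstration set
    opt.zero_grad()
    loss = loss_fn(policy(states), actions)
    loss.backward()
    opt.step()
# The pre-trained weights would then initialize the RL learner's policy network.
```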