Patents by Inventor Peter Wurman

Peter Wurman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12354027
Abstract: A method and system for teaching an artificial intelligent agent in which the agent is placed in a state that a user would like it to learn how to achieve. By giving the agent several examples, it can learn to identify what is important about these example states. Once the agent has the ability to recognize a goal configuration, it can use that information to then learn how to achieve the goal states on its own. An agent may be provided with positive and negative examples to demonstrate a goal configuration. Once the agent has learned certain goal configurations, the agent can learn policies and skills that achieve the learned goal configuration. The agent may create a collection of these policies and skills from which to select based on a particular command or state.
    Type: Grant
    Filed: April 3, 2018
    Date of Patent: July 8, 2025
    Assignee: SONY GROUP CORPORATION
    Inventors: Mark Bishop Ring, Satinder Baveja, Peter Stone, James MacGlashan, Samuel Barrett, Roberto Capobianco, Varun Kompella, Kaushik Subramanian, Peter Wurman
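The goal-recognition idea above can be illustrated with a minimal sketch: the agent keeps only the state features that are constant across all positive examples and that at least one negative example violates. The feature representation and all names below are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch: learn which state features define a goal configuration
# from positive and negative example states. States are tuples of feature values.

def learn_goal_features(positives, negatives):
    """Return {feature_index: value} for features that are constant across
    all positive examples and not shared by every negative example."""
    candidate = {i: v for i, v in enumerate(positives[0])}
    for state in positives[1:]:
        # Keep only features whose value agrees with every positive example.
        candidate = {i: v for i, v in candidate.items() if state[i] == v}
    # Drop features that fail to separate positives from negatives.
    return {i: v for i, v in candidate.items()
            if any(neg[i] != v for neg in negatives)}

def is_goal(state, goal_features):
    """A state matches the goal configuration if all learned features match."""
    return all(state[i] == v for i, v in goal_features.items())
```

A learned policy could then use a predicate like `is_goal` as its success test when practicing how to reach the goal states on its own.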
  • Patent number: 12277194
Abstract: A task prioritized experience replay (TaPER) algorithm enables simultaneous learning of multiple RL tasks off-policy. The algorithm can prioritize samples that were part of fixed-length episodes that led to the achievement of tasks. This enables the agent to quickly learn task policies by bootstrapping over its early successes. Finally, TaPER can improve performance on all tasks simultaneously, which is a desirable characteristic for multi-task RL. Unlike conventional ER algorithms that are applied to single RL task learning settings or that require rewards to be binary or abundant, or are provided as a parameterized specification of goals, TaPER poses no such restrictions and supports arbitrary reward and task specifications.
    Type: Grant
    Filed: September 29, 2020
    Date of Patent: April 15, 2025
    Assignee: SONY GROUP CORPORATION
    Inventors: Varun Kompella, James MacGlashan, Peter Wurman, Peter Stone
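As a rough sketch of the prioritization idea (not the patented algorithm itself), a replay buffer can keep transitions from task-achieving episodes in a separate pool and sample from that pool preferentially; the class, parameters, and sampling rule below are invented for illustration.

```python
import random

class TaskPrioritizedBuffer:
    """Illustrative sketch of task-prioritized replay: transitions from
    episodes that achieved a task are sampled preferentially."""

    def __init__(self, priority=0.8, seed=0):
        self.success, self.other = [], []
        self.priority = priority          # probability of drawing from successes
        self.rng = random.Random(seed)

    def add_episode(self, transitions, achieved_task):
        # Tag a whole episode's transitions by whether the episode achieved a task.
        (self.success if achieved_task else self.other).extend(transitions)

    def sample(self, batch_size):
        batch = []
        for _ in range(batch_size):
            use_success = self.success and (
                not self.other or self.rng.random() < self.priority)
            pool = self.success if use_success else self.other
            batch.append(self.rng.choice(pool))
        return batch
```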
  • Patent number: 12217156
    Abstract: A real-time temporal convolution network (RT-TCN) algorithm reuses the output of prior convolution operations in all layers of the network to minimize the computational requirements and memory footprint of a TCN during real-time evaluation. Further, a TCN trained via the fixed-window view, where the TCN is trained using fixed time splices of the input time series, can be executed in real-time continually using RT-TCN.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: February 4, 2025
    Assignee: SONY GROUP CORPORATION
    Inventors: Piyush Khandelwal, James MacGlashan, Peter Wurman, Fabrizio Santini
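The core trick described above, reusing prior computation so each new time step costs work proportional to the kernel size rather than the whole window, can be sketched for a single streaming causal convolution layer. A real TCN stacks dilated layers and caches at every layer; this toy omits both, and all names are illustrative.

```python
from collections import deque

class StreamingCausalConv:
    """Sketch of the RT-TCN idea for one layer: cache the most recent inputs
    so each new time step computes only one new output instead of re-running
    the convolution over the whole input window."""

    def __init__(self, weights, bias=0.0):
        self.w = list(weights)                          # kernel, oldest tap first
        self.b = bias
        self.buf = deque([0.0] * len(weights), maxlen=len(weights))

    def step(self, x):
        self.buf.append(x)                              # O(1) cache update
        # One dot product per step, regardless of how long the stream is.
        return sum(w * v for w, v in zip(self.w, self.buf)) + self.b
```

Once the buffer is full, each output equals what a full fixed-window convolution would produce over the same time splice, which is what lets a fixed-window-trained TCN run continually in real time.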
  • Patent number: 12153385
Abstract: Systems and methods are used to adapt the coefficients of a proportional-integral-derivative (PID) controller through reinforcement learning. The approach for adapting PID coefficients can include an outer loop of reinforcement learning where the PID coefficients are tuned to changes in the environment and an inner loop of PID control for quickly reacting to changing errors. The outer loop can learn and adapt as the environment changes and be configured to only run at a predetermined frequency, after a given number of steps. The outer loop can use summary statistics about the error terms and any other information sensed about the environment to calculate an observation. This observation can be used to evaluate the next action, for example, by feeding it into a neural network representing the policy. The resulting action is the set of PID coefficients and the tunable parameters of components such as filters.
    Type: Grant
    Filed: May 7, 2021
    Date of Patent: November 26, 2024
    Assignees: SONY GROUP CORPORATION, SONY CORPORATION OF AMERICA
    Inventors: Samuel Barrett, James MacGlashan, Varun Kompella, Peter Wurman, Goker Erdogan, Fabrizio Santini
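A minimal sketch of the two-loop structure described above: an inner PID loop reacts to the error at every step, while an occasional outer-loop update adjusts the coefficients from summary statistics of recent errors. The simple proportional-gain nudge here is a stand-in for the learned neural-network policy, and all names are illustrative.

```python
class PID:
    """Inner loop: a standard PID controller reacting to the current error."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, error, dt=1.0):
        self.integral += error * dt
        deriv = (error - self.prev_err) / dt
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

def outer_loop_update(pid, recent_errors, step_size=0.05):
    """Stand-in for the learned outer-loop policy: nudge kp upward when the
    summary statistic (mean absolute error) over recent steps is large."""
    mean_abs = sum(abs(e) for e in recent_errors) / len(recent_errors)
    pid.kp += step_size * mean_abs
```

In the patented approach the outer loop's action would come from a trained policy network fed with such summary statistics, rather than from a fixed update rule.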
  • Publication number: 20240375010
    Abstract: A single policy can be trained to handle the user selection of parameters across a predetermined range for each component of an artificial intelligent agent within a domain. The agent can be trained across a number of weights within the desired range for each component. These weights determine how much of a reward portion for each component should be considered by the agent during training. Thus, an improved formulation can be realized for UVFA-like goals based on compositional reward functions parameterized by their components' weights. Additionally, a set of reward components has been determined for the domain of autonomous racing games that, when combined with the improved UVFA formulation, allows training a single racing agent that generalizes over continuous behaviors in multiple dimensions. This can be used by game designers to tune the skill and personality of a trained agent.
    Type: Application
    Filed: May 8, 2023
    Publication date: November 14, 2024
    Inventors: Florian Fuchs, Craig Sherstan, Takuma Seno, Yunshu Du, Patrick MacAlpine, Alisa Devlic, Kaushik Subramanian, Peter Wurman
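The compositional-reward formulation above can be sketched as a weighted sum of per-component rewards, where the weight vector is also part of the agent's observation so a single policy can condition on it. The component names below are invented for illustration.

```python
def composite_reward(components, weights):
    """Weighted sum of reward components. In a UVFA-style setup the same
    weight vector is appended to the agent's observation, so one policy
    generalizes across the whole range of component weightings."""
    assert components.keys() == weights.keys()
    return sum(weights[k] * components[k] for k in components)
```

A designer tuning the skill or personality of a trained racing agent would then, under this formulation, only change the weight vector rather than retrain a new agent.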
  • Patent number: 12083429
Abstract: Dynamic driving aids, such as driving lines, turn indicators, braking indicators and acceleration indicators, for example, can be provided for players participating in a racing game. Typically, driving lines are provided for each class of cars. However, even within a class of cars, each car differs enough that the ideal driving lines and braking points can vary. Therefore, with an agent trained via reinforcement learning, ideal lines and other driving aids can be established for every individual car. These guides can even be varied to account for variations in the weather or other track conditions.
    Type: Grant
    Filed: February 8, 2022
    Date of Patent: September 10, 2024
    Assignee: SONY GROUP CORPORATION
    Inventors: Peter Wurman, Kaushik Subramanian, Florian Fuchs, Takuma Seno, Kenta Kawamoto
  • Patent number: 12017148
Abstract: A user interface (UI), for analyzing model training runs, tracking and visualizing various aspects of machine learning experiments, can be used when training an artificial intelligent agent in, for example, a racing game environment. The UI can be web-based and can allow researchers to easily see the status of their experiments. The UI can include an experiment synchronized event viewer that synchronizes visualizations, videos, and timeline/metrics graphs in the experiment. This viewer allows researchers to see how experiments unfold in great detail. The UI can further include experiment event annotations that can generate event annotations. These annotations can be displayed via the synchronized event viewer. The UI can also present consolidated results across experiments, including videos. For example, the UI can provide a reusable dashboard that can capture and compare metrics across multiple experiments.
    Type: Grant
    Filed: May 31, 2022
    Date of Patent: June 25, 2024
    Assignee: SONY GROUP CORPORATION
    Inventors: Rory Douglas, Dion Whitehead, Leon Barrett, Piyush Khandelwal, Thomas Walsh, Samuel Barrett, Kaushik Subramanian, James MacGlashan, Leilani Gilpin, Peter Wurman
  • Publication number: 20230381660
Abstract: A user interface (UI), for analyzing model training runs, tracking and visualizing various aspects of machine learning experiments, can be used when training an artificial intelligent agent in, for example, a racing game environment. The UI can be web-based and can allow researchers to easily see the status of their experiments. The UI can include an experiment synchronized event viewer that synchronizes visualizations, videos, and timeline/metrics graphs in the experiment. This viewer allows researchers to see how experiments unfold in great detail. The UI can further include experiment event annotations that can generate event annotations. These annotations can be displayed via the synchronized event viewer. The UI can also present consolidated results across experiments, including videos. For example, the UI can provide a reusable dashboard that can capture and compare metrics across multiple experiments.
    Type: Application
    Filed: May 31, 2022
    Publication date: November 30, 2023
    Inventors: Rory Douglas, Dion Whitehead, Leon Barrett, Piyush Khandelwal, Thomas Walsh, Samuel Barrett, Kaushik Subramanian, James MacGlashan, Leilani Gilpin, Peter Wurman
  • Publication number: 20230368041
Abstract: Experience replay (ER) is an important component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. Stratified Sampling from Event Tables (SSET) partitions an ER buffer into Event Tables, each capturing important subsequences of optimal behavior. A theoretical advantage is proven over the traditional monolithic buffer approach, and the combination of SSET with an existing prioritized sampling strategy can further improve learning speed and stability. Empirical results in challenging MiniGrid domains, benchmark RL environments, and a high-fidelity car racing simulator demonstrate the advantages and versatility of SSET over existing ER buffer sampling approaches.
    Type: Application
    Filed: April 6, 2023
    Publication date: November 16, 2023
    Inventors: Varun Kompella, Thomas Walsh, Samuel Barrett, Peter Wurman, Peter Stone
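The Event Table idea can be sketched roughly as follows: transitions that trigger an event condition are copied into that event's table, and each training batch draws a fixed share from the tables and fills the remainder from the default buffer. The class, conditions, and sampling split below are invented for illustration, not the paper's exact scheme.

```python
import random

class EventTableBuffer:
    """Sketch of Stratified Sampling from Event Tables (SSET): an ER buffer
    partitioned so that event-triggering transitions get their own tables."""

    def __init__(self, event_conditions, seed=0):
        self.conditions = event_conditions           # name -> predicate(transition)
        self.tables = {name: [] for name in event_conditions}
        self.default = []
        self.rng = random.Random(seed)

    def add(self, transition):
        self.default.append(transition)
        for name, cond in self.conditions.items():
            if cond(transition):
                self.tables[name].append(transition)

    def sample(self, batch_size, table_share=0.5):
        # Stratified draw: a fixed share from each non-empty event table,
        # with the remainder sampled from the default buffer.
        nonempty = [t for t in self.tables.values() if t]
        per_table = int(batch_size * table_share / max(len(nonempty), 1))
        batch = [self.rng.choice(t) for t in nonempty for _ in range(per_table)]
        while len(batch) < batch_size:
            batch.append(self.rng.choice(self.default))
        return batch
```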
  • Patent number: 11763170
    Abstract: Systems and methods use deep, convolutional neural networks over exponentially long history windows to learn alphabets for context tree weighting (CTW) for prediction. Known issues of depth and breadth in conventional context tree weighting predictions are addressed by the systems and methods. To deal with depth, the history can be broken into time windows, permitting the ability to look exponentially far back while having less information the further one looks back. To deal with breadth, a deep neural network classifier can be used to learn to map arbitrary length histories to a small output symbol alphabet. The sequence of symbols produced by such a classifier over the history windows would then become the input sequence to CTW.
    Type: Grant
    Filed: February 5, 2018
    Date of Patent: September 19, 2023
    Assignees: Sony Group Corporation, Sony Corporation of America
    Inventors: Michael Bowling, Satinder Baveja, Peter Wurman
  • Patent number: 11745109
Abstract: An artificial intelligent agent can act as a player in a video game, such as a racing video game. The game can be completely external to the agent and can run in real time. In this way, the training system is much more like a real-world system. The consoles on which the game runs for training the agent are provided in a cloud computing environment. The agents and the trainers can run on other computing devices in the cloud, where the system can choose the trainers and agent compute based on proximity to console, for example. Users can choose the game they want to run and submit code which can be built and deployed to the cloud system. A resource management service can monitor game console resources between human users and research usage and identify experiments for suspension to ensure enough game consoles for human users.
    Type: Grant
    Filed: February 8, 2022
    Date of Patent: September 5, 2023
    Assignees: SONY GROUP CORPORATION, SONY CORPORATION OF AMERICA, SONY INTERACTIVE ENTERTAINMENT LLC
    Inventors: Peter Wurman, Leon Barrett, Piyush Khandelwal, Dion Whitehead, Rory Douglas, Houmehr Aghabozorgi, Justin V Beltran, Rabih Abdul Ahad, Bandaly Azzam
  • Publication number: 20230249074
Abstract: Dynamic driving aids, such as driving lines, turn indicators, braking indicators and acceleration indicators, for example, can be provided for players participating in a racing game. Typically, driving lines are provided for each class of cars. However, even within a class of cars, each car differs enough that the ideal driving lines and braking points can vary. Therefore, with an agent trained via reinforcement learning, ideal lines and other driving aids can be established for every individual car. These guides can even be varied to account for variations in the weather or other track conditions.
    Type: Application
    Filed: February 8, 2022
    Publication date: August 10, 2023
    Inventors: Peter Wurman, Kaushik Subramanian, Florian Fuchs, Takuma Seno, Kenta Kawamoto
  • Publication number: 20230249082
Abstract: An artificial intelligent agent can act as a player in a video game, such as a racing video game. The agent can race against, and often beat, the best players in the world. The game can be completely external to the agent and can run in real time. In this way, the training system is much more like a real-world system. The consoles on which the game runs for training the agent are provided in a cloud computing environment. The agents and the trainers can run on other computing devices in the cloud, where the system can choose the trainers and agent compute based on proximity to console, for example. Users can choose the game they want to run and submit code which can be built and deployed to the cloud system. Metrics and logs and artifacts from the game can be sent to cloud storage.
    Type: Application
    Filed: February 8, 2022
    Publication date: August 10, 2023
    Inventors: Peter Wurman, Leon Barrett, Piyush Khandelwal, Dion Whitehead, Rory Douglas, Houmehr Aghabozorgi, Justin V Beltran, Rabih Abdul Ahad, Bandaly Azzam
  • Publication number: 20230249083
Abstract: An artificial intelligent agent can act as a player in a video game, such as a racing video game. The game can be completely external to the agent and can run in real time. In this way, the training system is much more like a real-world system. The consoles on which the game runs for training the agent are provided in a cloud computing environment. The agents and the trainers can run on other computing devices in the cloud, where the system can choose the trainers and agent compute based on proximity to console, for example. Users can choose the game they want to run and submit code which can be built and deployed to the cloud system. A resource management service can monitor game console resources between human users and research usage and identify experiments for suspension to ensure enough game consoles for human users.
    Type: Application
    Filed: February 8, 2022
    Publication date: August 10, 2023
    Inventors: Peter Wurman, Leon Barrett, Piyush Khandelwal, Dion Whitehead, Rory Douglas, Houmehr Aghabozorgi, Justin V Beltran, Rabih Abdul Ahad, Bandaly Azzam
  • Publication number: 20230237370
    Abstract: A method for training an agent uses a mixture of scenarios designed to teach specific skills helpful in a larger domain, such as mixing general racing and very specific tactical racing scenarios. Aspects of the methods can include one or more of the following: (1) training the agent to be very good at time trials by having one or more cars spread out on the track; (2) running the agent in various racing scenarios with a variable number of opponents starting in different configurations around the track; (3) varying the opponents by using game-provided agents, agents trained according to aspects of the present invention, or agents controlled to follow specific driving lines; (4) setting up specific short scenarios with opponents in various racing situations with specific success criteria; and (5) having a dynamic curriculum based on how the agent performs on a variety of evaluation scenarios.
    Type: Application
    Filed: February 8, 2022
    Publication date: July 27, 2023
    Inventors: Thomas J. Walsh, Varun Kompella, Samuel Barrett, Michael D. Thomure, Patrick MacAlpine, Peter Wurman
  • Publication number: 20220365493
Abstract: Systems and methods are used to adapt the coefficients of a proportional-integral-derivative (PID) controller through reinforcement learning. The approach for adapting PID coefficients can include an outer loop of reinforcement learning where the PID coefficients are tuned to changes in the environment and an inner loop of PID control for quickly reacting to changing errors. The outer loop can learn and adapt as the environment changes and be configured to only run at a predetermined frequency, after a given number of steps. The outer loop can use summary statistics about the error terms and any other information sensed about the environment to calculate an observation. This observation can be used to evaluate the next action, for example, by feeding it into a neural network representing the policy. The resulting action is the set of PID coefficients and the tunable parameters of components such as filters.
    Type: Application
    Filed: May 7, 2021
    Publication date: November 17, 2022
    Inventors: Samuel Barrett, James MacGlashan, Varun Kompella, Peter Wurman, Goker Erdogan, Fabrizio Santini
  • Publication number: 20220101064
Abstract: A task prioritized experience replay (TaPER) algorithm enables simultaneous learning of multiple RL tasks off-policy. The algorithm can prioritize samples that were part of fixed-length episodes that led to the achievement of tasks. This enables the agent to quickly learn task policies by bootstrapping over its early successes. Finally, TaPER can improve performance on all tasks simultaneously, which is a desirable characteristic for multi-task RL. Unlike conventional ER algorithms that are applied to single RL task learning settings or that require rewards to be binary or abundant, or are provided as a parameterized specification of goals, TaPER poses no such restrictions and supports arbitrary reward and task specifications.
    Type: Application
    Filed: September 29, 2020
    Publication date: March 31, 2022
    Inventors: Varun Kompella, James MacGlashan, Peter Wurman, Peter Stone
  • Publication number: 20220067504
Abstract: Reinforcement learning methods can use actor-critic networks where (1) additional laboratory-only state information is used to train a policy that must act without this additional laboratory-only information in a production setting; and (2) complex resource-demanding policies are distilled into a less-demanding policy that can be more easily run at production with limited computational resources. The production actor network can be optimized using a frozen version of a large critic network, previously trained with a large actor network. Aspects of these methods can leverage actor-critic methods in which the critic network models the action value function, as opposed to the state value function.
    Type: Application
    Filed: August 26, 2020
    Publication date: March 3, 2022
    Inventors: Piyush Khandelwal, James MacGlashan, Peter Wurman
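The distillation half of this abstract can be sketched as supervised imitation: a small production policy is fit to reproduce the actions of a large laboratory-trained policy on sampled states. This toy uses a linear student, a hand-written teacher, and plain gradient descent, and it ignores the privileged-information and frozen-critic aspects; every name here is illustrative.

```python
def teacher_policy(state):
    """Stand-in for the large, resource-demanding laboratory policy."""
    return 2.0 * state + 1.0

def distill(states, lr=0.1, epochs=200):
    """Fit a tiny linear student (action = w*state + b) to imitate the
    teacher's actions by gradient descent on the squared imitation error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for s in states:
            err = (w * s + b) - teacher_policy(s)
            w -= lr * err * s        # gradient step on w
            b -= lr * err            # gradient step on b
    return w, b
```

The distilled student can then run in production with a fraction of the teacher's compute, which is the motivation stated in the abstract.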
  • Publication number: 20210312258
    Abstract: A real-time temporal convolution network (RT-TCN) algorithm reuses the output of prior convolution operations in all layers of the network to minimize the computational requirements and memory footprint of a TCN during real-time evaluation. Further, a TCN trained via the fixed-window view, where the TCN is trained using fixed time splices of the input time series, can be executed in real-time continually using RT-TCN.
    Type: Application
    Filed: August 20, 2020
    Publication date: October 7, 2021
    Inventors: Piyush Khandelwal, James MacGlashan, Peter Wurman, Fabrizio Santini
  • Publication number: 20200218992
    Abstract: A method and system for training and/or operating an artificial intelligent agent can use multi-input and/or multi-forecast networks. Multi-forecasts are computational constructs, typically, but not necessarily, neural networks, whose shared network weights can be used to compute multiple related forecasts. This allows for more efficient training, in terms of the amount of data and/or experience needed, and in some instances, for more efficient computation of those forecasts. There are several related and sometimes composable approaches to multi-forecast networks.
    Type: Application
    Filed: January 2, 2020
    Publication date: July 9, 2020
    Inventors: Roberto Capobianco, Varun Kompella, Kaushik Subramanian, James MacGlashan, Peter Wurman, Satinder Baveja