Patents by Inventor Harm Hendrik VAN SEIJEN

Harm Hendrik VAN SEIJEN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230199031
    Abstract: A secured exploration agent for reinforcement learning (RL) is provided. Securing an exploration agent involves training it to avoid dead-end states and dead-end trajectories: during training, the exploration agent “learns” to identify and avoid dead-end states of a Markov Decision Process (MDP). The secured exploration agent explores the environment safely and efficiently, significantly reducing the training time, cost, and safety concerns associated with conventional RL, and it is employed to guide the behavior of a corresponding exploitation agent. During training, the exploration agent's policy is iteratively updated to reflect the estimated probability that each state is a dead-end: the probability that the agent chooses an action leading to a given state is reduced in proportion to the estimated probability that that state is a dead-end. (A minimal Python sketch of this technique appears after this listing.)
    Type: Application
    Filed: February 17, 2023
    Publication date: June 22, 2023
    Inventors: Harm Hendrik VAN SEIJEN, Seyed Mehdi FATEMI BOOSHEHRI
  • Patent number: 11616813
    Abstract: A secured exploration agent for reinforcement learning (RL) is provided. Securing an exploration agent involves training it to avoid dead-end states and dead-end trajectories: during training, the exploration agent “learns” to identify and avoid dead-end states of a Markov Decision Process (MDP). The secured exploration agent explores the environment safely and efficiently, significantly reducing the training time, cost, and safety concerns associated with conventional RL, and it is employed to guide the behavior of a corresponding exploitation agent. During training, the exploration agent's policy is iteratively updated to reflect the estimated probability that each state is a dead-end: the probability that the agent chooses an action leading to a given state is reduced in proportion to the estimated probability that that state is a dead-end.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: March 28, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Harm Hendrik Van Seijen, Seyed Mehdi Fatemi Booshehri
  • Patent number: 10977551
    Abstract: Aspects provided herein are relevant to machine learning techniques, including decomposing single-agent reinforcement learning problems into simpler problems addressed by multiple agents. Actions proposed by the multiple agents are then aggregated using an aggregator, which selects an action to take with respect to an environment. Aspects provided herein are also relevant to a hybrid reward model. (A minimal Python sketch of this decomposition appears after this listing.)
    Type: Grant
    Filed: June 27, 2017
    Date of Patent: April 13, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Harm Hendrik Van Seijen, Seyed Mehdi Fatemi Booshehri, Romain Michel Henri Laroche, Joshua Samuel Romoff
  • Publication number: 20200076857
    Abstract: A secured exploration agent for reinforcement learning (RL) is provided. Securing an exploration agent involves training it to avoid dead-end states and dead-end trajectories: during training, the exploration agent “learns” to identify and avoid dead-end states of a Markov Decision Process (MDP). The secured exploration agent explores the environment safely and efficiently, significantly reducing the training time, cost, and safety concerns associated with conventional RL, and it is employed to guide the behavior of a corresponding exploitation agent. During training, the exploration agent's policy is iteratively updated to reflect the estimated probability that each state is a dead-end: the probability that the agent chooses an action leading to a given state is reduced in proportion to the estimated probability that that state is a dead-end.
    Type: Application
    Filed: August 28, 2019
    Publication date: March 5, 2020
    Inventors: Harm Hendrik VAN SEIJEN, Seyed Mehdi FATEMI BOOSHEHRI
  • Publication number: 20180165603
    Abstract: Aspects provided herein are relevant to machine learning techniques, including decomposing single-agent reinforcement learning problems into simpler problems addressed by multiple agents. Actions proposed by the multiple agents are then aggregated using an aggregator, which selects an action to take with respect to an environment. Aspects provided herein are also relevant to a hybrid reward model.
    Type: Application
    Filed: June 27, 2017
    Publication date: June 14, 2018
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Harm Hendrik VAN SEIJEN, Seyed Mehdi FATEMI BOOSHEHRI, Romain Michel Henri LAROCHE, Joshua Samuel ROMOFF
  • Publication number: 20180165602
    Abstract: Aspects provided herein are relevant to machine learning techniques, including decomposing single-agent reinforcement learning problems into simpler problems addressed by multiple agents. Actions proposed by the multiple agents are then aggregated using an aggregator, which selects an action to take with respect to an environment. Aspects provided herein are also relevant to a hybrid reward model.
    Type: Application
    Filed: June 27, 2017
    Publication date: June 14, 2018
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Harm Hendrik VAN SEIJEN, Seyed Mehdi FATEMI BOOSHEHRI, Romain Michel Henri LAROCHE, Joshua Samuel ROMOFF
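
The secured-exploration abstracts above (patent 11616813 and publications 20230199031 and 20200076857) describe a concrete mechanism: estimate, for each state-action pair, the probability of eventually reaching a dead-end, and shrink the exploration policy's probability of selecting risky actions accordingly. Below is a minimal tabular Python sketch of that idea. The toy MDP, the update rule, and every identifier (TRANSITIONS, q_dead, exploration_policy) are illustrative assumptions, not details taken from the patent claims.

    import random
    from collections import defaultdict

    # Toy MDP (an assumption for illustration): states 0-4, actions 0/1.
    # State 4 is a dead-end, state 3 is a safe terminal state.
    TRANSITIONS = {
        (0, 0): 1, (0, 1): 2,
        (1, 0): 3, (1, 1): 4,   # action 1 in state 1 enters the dead-end
        (2, 0): 3, (2, 1): 3,
    }
    DEAD_ENDS, TERMINALS = {4}, {3, 4}
    ACTIONS = [0, 1]

    # q_dead(s, a) in [-1, 0] estimates minus the probability that taking
    # action a in state s eventually reaches a dead-end (reward -1 on
    # entering a dead-end, 0 otherwise, undiscounted).
    q_dead = defaultdict(float)
    ALPHA = 0.1

    for _ in range(5000):
        s = 0
        while s not in TERMINALS:
            a = random.choice(ACTIONS)        # uniform behavior while learning
            s2 = TRANSITIONS[(s, a)]
            r = -1.0 if s2 in DEAD_ENDS else 0.0
            target = r + (0.0 if s2 in TERMINALS
                          else max(q_dead[(s2, b)] for b in ACTIONS))
            q_dead[(s, a)] += ALPHA * (target - q_dead[(s, a)])
            s = s2

    def exploration_policy(s):
        # Reduce each action's selection probability in line with its
        # estimated chance of leading to a dead-end: 1 + q_dead(s, a) is
        # roughly the probability that the action does NOT lead to one.
        weights = [max(0.0, 1.0 + q_dead[(s, a)]) for a in ACTIONS]
        total = sum(weights)
        if total == 0.0:                      # every action looks doomed:
            return [1.0 / len(ACTIONS)] * len(ACTIONS)  # fall back to uniform
        return [w / total for w in weights]

    print(exploration_policy(1))  # action 1 (into the dead-end) is suppressed

Running the script prints a distribution for state 1 close to [1.0, 0.0]: the action that enters the dead-end is almost never selected, mirroring the abstract's probability-reduction step.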
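
The remaining entries (patent 10977551 and publications 20180165603 and 20180165602) cover decomposing a single-agent RL problem into simpler per-reward-component problems, each addressed by its own agent, with an aggregator selecting the action actually taken; this is the hybrid reward model. The sketch below follows the same hedged convention: the chain environment, the two reward components, and names such as step and aggregate are assumptions made for illustration only.

    import random
    from collections import defaultdict

    # Toy chain environment (an assumption for illustration): states 0-4,
    # action 1 moves right, action 0 moves left. Two reward sources -- a
    # "fruit" at state 2 (component 0) and a "goal" at state 4 (component 1);
    # the hybrid reward is their sum, but each agent sees only its component.
    N_STATES, ACTIONS = 5, [0, 1]

    def step(s, a):
        s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        r_components = (1.0 if s2 == 2 else 0.0,
                        2.0 if s2 == 4 else 0.0)
        return s2, r_components, s2 == N_STATES - 1

    # One tabular Q-learner ("agent") per reward component.
    N_COMPONENTS, GAMMA, ALPHA = 2, 0.9, 0.1
    q = [defaultdict(float) for _ in range(N_COMPONENTS)]

    def aggregate(s):
        # Aggregator: sum the per-agent action values and act greedily on
        # the sum, i.e. select a single action for the whole system.
        return max(ACTIONS, key=lambda a: sum(qk[(s, a)] for qk in q))

    for _ in range(3000):
        s, done = 0, False
        while not done:
            a = random.choice(ACTIONS)        # off-policy uniform behavior
            s2, rs, done = step(s, a)
            for k in range(N_COMPONENTS):     # each agent learns from its
                best = max(q[k][(s2, b)] for b in ACTIONS)  # own component
                target = rs[k] + (0.0 if done else GAMMA * best)
                q[k][(s, a)] += ALPHA * (target - q[k][(s, a)])
            s = s2

    print(aggregate(0))  # the aggregated policy moves right toward the rewards

Each per-component learner solves an easier problem than the full task, and the aggregator recombines their value estimates; printing aggregate(0) yields action 1, i.e., the combined policy heads toward both reward sources.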