Patents by Inventor Seyed Mehdi Fatemi Booshehri

Seyed Mehdi Fatemi Booshehri has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SECURE EXPLORATION FOR REINFORCEMENT LEARNING

Publication number: 20230199031

Abstract: A secured exploration agent for reinforcement learning (RL) is provided. Securitizing an exploration agent includes training the exploration agent to avoid dead-end states and dead-end trajectories. During training, the exploration agent “learns” to identify and avoid dead-end states of a Markov Decision Process (MDP). The secured exploration agent is utilized to safely and efficiently explore the environment, while significantly reducing the training time, as well as the cost and safety concerns associated with conventional RL. The secured exploration agent is employed to guide the behavior of a corresponding exploitation agent. During training, a policy of the exploration agent is iteratively updated to reflect an estimated probability that a state is a dead-end state. The probability, via the exploration policy, that the exploration agent chooses an action that results in a transition to a dead-end state is reduced to reflect the estimated probability that the state is a dead-end state.
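
Illustrative sketch (not taken from the patent text): the last sentence of the abstract says the probability of choosing an action is reduced to reflect the estimated probability that it leads to a dead-end state. One simple way to express that idea is to weight each action by one minus its dead-end estimate and renormalize. The function name `secure_exploration_policy`, the proportional weighting, and the uniform fallback are all assumptions made for illustration; the claimed method may differ.

```python
from typing import List


def secure_exploration_policy(dead_end_probs: List[float]) -> List[float]:
    """Map per-action dead-end estimates to exploration probabilities.

    dead_end_probs[a] is a hypothetical estimate (in [0, 1]) that taking
    action a from the current state leads to a dead-end state.
    """
    if not dead_end_probs:
        raise ValueError("at least one action is required")
    if any(p < 0.0 or p > 1.0 for p in dead_end_probs):
        raise ValueError("dead-end estimates must lie in [0, 1]")

    # Assumption: weight each action by its estimated chance of being safe.
    weights = [1.0 - p for p in dead_end_probs]
    total = sum(weights)
    if total == 0.0:
        # Every action looks like a certain dead end; fall back to uniform.
        return [1.0 / len(dead_end_probs)] * len(dead_end_probs)
    return [w / total for w in weights]


# Three actions: safe, risky, and certain dead end.
probs = secure_exploration_policy([0.0, 0.5, 1.0])
```

With these inputs the action estimated as a certain dead end receives zero probability, and the risky action is chosen less often than the safe one.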

Type: Application

Filed: February 17, 2023

Publication date: June 22, 2023

Inventors: Harm Hendrik VAN SEIJEN, Seyed Mehdi FATEMI BOOSHEHRI

UNDETECTABLE SANDBOX FOR MALWARE

Publication number: 20230185902

Abstract: Embodiments seek to prevent detection of a sandbox environment by a potential malware application. To this end, execution of the application is monitored, and information about the execution is provided to a reinforcement learning machine learning model. The model generates a suggested modification to make to the executing application. The model is provided with information indicating whether the application executed successfully or not, and this information is used to train the model for additional modifications. By modifying the potential malware application during its execution, detection of a sandbox environment is prevented, and the features of the potential malware application are better understood.

Type: Application

Filed: January 30, 2023

Publication date: June 15, 2023

Inventors: Jugal PARIKH, Geoffrey Lyall McDonald, Mariusz Hieronim JAKUBOWSKI, Seyed Mehdi Fatemi Booshehri, Allan Gordon Lontoc Sepillo, Bradley Noah Faskowitz

Secure exploration for reinforcement learning

Patent number: 11616813

Abstract: A secured exploration agent for reinforcement learning (RL) is provided. Securitizing an exploration agent includes training the exploration agent to avoid dead-end states and dead-end trajectories. During training, the exploration agent “learns” to identify and avoid dead-end states of a Markov Decision Process (MDP). The secured exploration agent is utilized to safely and efficiently explore the environment, while significantly reducing the training time, as well as the cost and safety concerns associated with conventional RL. The secured exploration agent is employed to guide the behavior of a corresponding exploitation agent. During training, a policy of the exploration agent is iteratively updated to reflect an estimated probability that a state is a dead-end state. The probability, via the exploration policy, that the exploration agent chooses an action that results in a transition to a dead-end state is reduced to reflect the estimated probability that the state is a dead-end state.

Type: Grant

Filed: August 28, 2019

Date of Patent: March 28, 2023

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Harm Hendrik Van Seijen, Seyed Mehdi Fatemi Booshehri

Undetectable sandbox for malware

Patent number: 11568052

Abstract: Embodiments seek to prevent detection of a sandbox environment by a potential malware application. To this end, execution of the application is monitored, and information about the execution is provided to a reinforcement learning machine learning model. The model generates a suggested modification to make to the executing application. The model is provided with information indicating whether the application executed successfully or not, and this information is used to train the model for additional modifications. By modifying the potential malware application during its execution, detection of a sandbox environment is prevented, and the features of the potential malware application are better understood.

Type: Grant

Filed: May 31, 2020

Date of Patent: January 31, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jugal Parikh, Geoffrey Lyall McDonald, Mariusz H. Jakubowski, Seyed Mehdi Fatemi Booshehri, Allan Gordon Lontoc Sepillo, Bradley Noah Faskowitz

UNDETECTABLE SANDBOX FOR MALWARE

Publication number: 20210374241

Abstract: Embodiments seek to prevent detection of a sandbox environment by a potential malware application. To this end, execution of the application is monitored, and information about the execution is provided to a reinforcement learning machine learning model. The model generates a suggested modification to make to the executing application. The model is provided with information indicating whether the application executed successfully or not, and this information is used to train the model for additional modifications. By modifying the potential malware application during its execution, detection of a sandbox environment is prevented, and the features of the potential malware application are better understood.

Type: Application

Filed: May 31, 2020

Publication date: December 2, 2021

Inventors: Jugal Parikh, Geoffrey Lyall McDonald, Mariusz H. Jakubowski, Seyed Mehdi Fatemi Booshehri, Allan Gordon Lontoc Sepillo, Bradley Noah Faskowitz

Hybrid reward architecture for reinforcement learning

Patent number: 10977551

Abstract: Aspects provided herein are relevant to machine learning techniques, including decomposing single-agent reinforcement learning problems into simpler problems addressed by multiple agents. Actions proposed by the multiple agents are then aggregated using an aggregator, which selects an action to take with respect to an environment. Aspects provided herein are also relevant to a hybrid reward model.
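
Illustrative sketch (not taken from the patent text): the abstract describes several agents, each handling a simpler sub-problem, whose proposals an aggregator combines into a single action. A minimal version, assuming each agent reports one value estimate per action and the aggregator sums them and picks the highest total, could look like the following. The function name `aggregate_action` and the choice of summation are assumptions; other aggregation rules are possible.

```python
from typing import Sequence


def aggregate_action(agent_q_values: Sequence[Sequence[float]]) -> int:
    """Pick the action with the highest summed value across agents.

    agent_q_values[i][a] is agent i's (hypothetical) value estimate for
    action a under its own component of the reward.
    """
    if not agent_q_values:
        raise ValueError("at least one agent is required")
    n_actions = len(agent_q_values[0])
    if n_actions == 0 or any(len(q) != n_actions for q in agent_q_values):
        raise ValueError("every agent must score the same non-empty action set")

    # Assumption: the aggregator simply adds the agents' estimates.
    totals = [sum(q[a] for q in agent_q_values) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: totals[a])


# Two agents, three actions. Each agent alone prefers a different action.
chosen = aggregate_action([
    [1.0, 0.0, 0.2],  # agent 0 prefers action 0
    [0.0, 0.5, 0.9],  # agent 1 prefers action 2
])
```

Here the summed estimates are highest for action 2, so the aggregator selects it even though agent 0 alone would have chosen action 0.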

Type: Grant

Filed: June 27, 2017

Date of Patent: April 13, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Harm Hendrik Van Seijen, Seyed Mehdi Fatemi Booshehri, Romain Michel Henri Laroche, Joshua Samuel Romoff

SECURE EXPLORATION FOR REINFORCEMENT LEARNING

Publication number: 20200076857

Abstract: A secured exploration agent for reinforcement learning (RL) is provided. Securitizing an exploration agent includes training the exploration agent to avoid dead-end states and dead-end trajectories. During training, the exploration agent “learns” to identify and avoid dead-end states of a Markov Decision Process (MDP). The secured exploration agent is utilized to safely and efficiently explore the environment, while significantly reducing the training time, as well as the cost and safety concerns associated with conventional RL. The secured exploration agent is employed to guide the behavior of a corresponding exploitation agent. During training, a policy of the exploration agent is iteratively updated to reflect an estimated probability that a state is a dead-end state. The probability, via the exploration policy, that the exploration agent chooses an action that results in a transition to a dead-end state is reduced to reflect the estimated probability that the state is a dead-end state.

Type: Application

Filed: August 28, 2019

Publication date: March 5, 2020

Inventors: Harm Hendrik VAN SEIJEN, Seyed Mehdi FATEMI BOOSHEHRI

Two-stage training of a spoken dialogue system

Patent number: 10395646

Abstract: Described herein are systems and methods for two-stage training of a spoken dialog system. The first stage trains a policy network using external data to produce a semi-trained policy network. The external data includes one or more known fixed dialogs. The second stage trains the semi-trained policy network through interaction to produce a trained policy network. The interaction may be interaction with a user simulator.
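
Illustrative sketch (not taken from the patent text): the two stages in the abstract can be pictured with a toy tabular policy standing in for the policy network. Stage one counts the state-action pairs seen in fixed dialogues; stage two adjusts the same table using rewards returned by a user simulator. All names here (`pretrain`, `fine_tune`, `best_action`, the `simulator` callback, the `step` size) are hypothetical, and the exhaustive update loop is a simplification of interactive training.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# state -> action -> preference score (toy stand-in for a policy network)
Policy = Dict[str, Dict[str, float]]


def pretrain(dialogues: List[List[Tuple[str, str]]]) -> Policy:
    """Stage 1: build a semi-trained policy from fixed (state, action) logs."""
    policy: Policy = defaultdict(lambda: defaultdict(float))
    for dialogue in dialogues:
        for state, action in dialogue:
            policy[state][action] += 1.0
    return policy


def fine_tune(policy: Policy,
              simulator: Callable[[str, str], float],
              states: List[str],
              actions: List[str],
              rounds: int = 10,
              step: float = 0.5) -> Policy:
    """Stage 2: nudge preferences using rewards from a user simulator."""
    for _ in range(rounds):
        for state in states:
            for action in actions:
                policy[state][action] += step * simulator(state, action)
    return policy


def best_action(policy: Policy, state: str) -> str:
    """Return the currently preferred action for a state."""
    return max(policy[state], key=policy[state].get)


# Fixed dialogue data suggests "ask_name"; the simulator rewards "say_hello".
semi_trained = pretrain([[("greet", "ask_name")]])
before = best_action(semi_trained, "greet")
trained = fine_tune(
    semi_trained,
    simulator=lambda s, a: 1.0 if (s, a) == ("greet", "say_hello") else 0.0,
    states=["greet"],
    actions=["ask_name", "say_hello"],
)
after = best_action(trained, "greet")
```

In this toy run the semi-trained policy follows the fixed dialogue, and the interaction stage then shifts its preference toward the action the simulator rewards.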

Type: Grant

Filed: May 12, 2017

Date of Patent: August 27, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Seyed Mehdi Fatemi Booshehri, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman

SCALABILITY OF REINFORCEMENT LEARNING BY SEPARATION OF CONCERNS

Publication number: 20180165602

Abstract: Aspects provided herein are relevant to machine learning techniques, including decomposing single-agent reinforcement learning problems into simpler problems addressed by multiple agents. Actions proposed by the multiple agents are then aggregated using an aggregator, which selects an action to take with respect to an environment. Aspects provided herein are also relevant to a hybrid reward model.

Type: Application

Filed: June 27, 2017

Publication date: June 14, 2018

Applicant: Microsoft Technology Licensing, LLC

Inventors: Harm Hendrik VAN SEIJEN, Seyed Mehdi FATEMI BOOSHEHRI, Romain Michel Henri LAROCHE, Joshua Samuel ROMOFF

HYBRID REWARD ARCHITECTURE FOR REINFORCEMENT LEARNING

Publication number: 20180165603

Abstract: Aspects provided herein are relevant to machine learning techniques, including decomposing single-agent reinforcement learning problems into simpler problems addressed by multiple agents. Actions proposed by the multiple agents are then aggregated using an aggregator, which selects an action to take with respect to an environment. Aspects provided herein are also relevant to a hybrid reward model.

Type: Application

Filed: June 27, 2017

Publication date: June 14, 2018

Applicant: Microsoft Technology Licensing, LLC

Inventors: Harm Hendrik VAN SEIJEN, Seyed Mehdi FATEMI BOOSHEHRI, Romain Michel Henri LAROCHE, Joshua Samuel ROMOFF

TWO-STAGE TRAINING OF A SPOKEN DIALOGUE SYSTEM

Publication number: 20170330556

Abstract: Described herein are systems and methods for two-stage training of a spoken dialogue system. The first stage trains a policy network using external data to produce a semi-trained policy network. The external data includes one or more known fixed dialogues. The second stage trains the semi-trained policy network through interaction to produce a trained policy network. The interaction may be interaction with a user simulator.

Type: Application

Filed: May 12, 2017

Publication date: November 16, 2017

Applicant: Maluuba Inc.

Inventors: Seyed Mehdi Fatemi Booshehri, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman