Patents by Inventor Hidenao Iwane

Hidenao Iwane has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11645574
    Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function, which is represented in a quadratic form of inputs at times earlier than the present time and of outputs at the present time and those earlier times, the first coefficients being estimated based on the inputs at the earlier times, the outputs at the present time and the earlier times, and the costs or rewards that correspond to the inputs at the earlier times; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after the estimation of the first coefficients.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: May 9, 2023
    Assignees: FUJITSU LIMITED, OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
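A compact way to see the estimate-then-derive scheme of patent 11645574 above: fit the "first coefficients" of a quadratic value function by linear least squares, then read a linear control law (the "second coefficients") off the fitted matrix. The sketch below is a minimal illustration under invented dynamics and names (P_true, Z, K are all assumptions), not the patented procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical data: z_t stacks recent inputs and outputs. ---
dim, n = 4, 300
A = rng.normal(size=(dim, dim))
P_true = A @ A.T                            # unknown quadratic-form coefficients
Z = rng.normal(size=(n, dim))               # observed stacked input/output vectors
v = np.einsum('ni,ij,nj->n', Z, P_true, Z)  # observed costs (noise-free here)

# --- Step 1: estimate the "first coefficients" (entries of P) by least squares. ---
# z^T P z is linear in the entries of P, so each sample gives one linear equation.
idx = np.triu_indices(dim)
def features(z):
    outer = np.outer(z, z)
    f = outer[idx].copy()
    off = idx[0] != idx[1]
    f[off] *= 2.0                           # off-diagonal entries appear twice
    return f

Phi = np.array([features(z) for z in Z])
theta, *_ = np.linalg.lstsq(Phi, v, rcond=None)
P_hat = np.zeros((dim, dim))
P_hat[idx] = theta
P_hat = (P_hat + P_hat.T) - np.diag(np.diag(P_hat))

# --- Step 2: derive the "second coefficients" (a linear control law). ---
# Partition z = [u; y]: minimizing over the input block u for fixed y gives
# u* = -P_uu^{-1} P_uy y, i.e. the feedback gain K = -P_uu^{-1} P_uy.
k = 2                                       # number of input components in z
P_uu, P_uy = P_hat[:k, :k], P_hat[:k, k:]
K = -np.linalg.solve(P_uu, P_uy)
print("max coefficient error:", np.abs(P_hat - P_true).max())
print("feedback gain K:\n", K)
```

Because the quadratic form is linear in the unknown coefficients, ordinary least squares is enough for the estimation step; no iterative training is needed there.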
  • Patent number: 11619915
    Abstract: A computer-implemented reinforcement learning method includes determining a parameter of a reinforcement learner based on a target probability of satisfaction of a constraint condition related to a state of a control object and on a specific time within which a controller causes a state of the control object that does not satisfy the constraint condition to become one that does, the parameter causing the state of the control object to satisfy the constraint condition, with a specific probability, at a first timing following a second timing at which the state of the control object satisfies the constraint condition; and determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: April 4, 2023
    Assignee: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Junichi Shigezumi, Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami
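The switching logic of patent 11619915 above can be illustrated with a one-dimensional toy problem: an exploring learner acts while the constraint holds, and a fallback controller recovers the state when it does not. Everything below (the dynamics, the gain values, and the rule mapping the target probability to the exploration scale sigma) is an invented stand-in, not the patented parameterization.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D control object: x' = x + u + noise, constraint |x| <= 1.
def step(x, u):
    return x + u + 0.05 * rng.normal()

def satisfies(x):
    return abs(x) <= 1.0

def rl_action(x, sigma):
    # Exploring learner: noisy feedback whose noise scale sigma is the
    # "parameter" tuned from the target satisfaction probability.
    return -0.5 * x + sigma * rng.normal()

def fallback_action(x):
    # Safe controller that drives the state back inside the constraint set.
    return -0.8 * x

# Choose sigma so that, from a satisfying state, the next state satisfies
# the constraint with at least the target probability (illustrative rule).
target_prob = 0.95
sigma = 0.1 / np.sqrt(-2.0 * np.log(1.0 - target_prob))

x = 0.0
for t in range(20):
    # Use the learner while the constraint holds; otherwise recover.
    u = rl_action(x, sigma) if satisfies(x) else fallback_action(x)
    x = step(x, u)
    print(f"t={t:2d} x={x:+.3f} satisfied={satisfies(x)}")
```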
  • Patent number: 11573537
    Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: February 7, 2023
    Assignees: FUJITSU LIMITED, OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
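Patent 11573537 above estimates a gradient with respect to a feedback coefficient matrix by perturbing each of its components. The sketch below keeps that component-wise perturbation idea but replaces the TD-error machinery with plain rollout cost on an assumed linear-quadratic toy problem; A, B, Q, R, and the step sizes are illustrative.

```python
import numpy as np

# Linear dynamics x' = A x + B u with quadratic immediate cost, as in the
# abstract; the matrices and constants below are invented for illustration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, gamma = np.eye(2), np.eye(1) * 0.1, 0.95

def rollout_cost(F, x0, horizon=80):
    """Discounted quadratic cost under the linear policy u = F x."""
    x, total = x0.copy(), 0.0
    for t in range(horizon):
        u = F @ x
        total += gamma**t * (x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return total

# Perturb each component of the feedback matrix F and use the resulting
# change in estimated value as a finite-difference gradient (the patent
# drives this with TD errors; rollout cost is a simpler stand-in).
F = np.zeros((1, 2))
x0 = np.array([1.0, 0.0])
eps, lr = 1e-4, 1e-3
for it in range(200):
    base = rollout_cost(F, x0)
    grad = np.zeros_like(F)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            Fp = F.copy()
            Fp[i, j] += eps
            grad[i, j] = (rollout_cost(Fp, x0) - base) / eps
    F -= lr * grad
print("learned feedback matrix:", F)
print("final cost:", rollout_cost(F, x0))
```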
  • Patent number: 11543789
    Abstract: A reinforcement learning method executed by a computer includes calculating a degree of risk for a state of a controlled object at a current time point with respect to a constraint condition related to the state of the controlled object, the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point, the predicted value being obtained from model information defining a relationship between the state of the controlled object and a control input to the controlled object; and determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: January 3, 2023
    Assignee: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
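The risk-proportional narrowing in patent 11543789 above reduces to: predict the next state from model information, score how close it comes to the constraint boundary, and shrink the admissible input range accordingly. Below is a one-dimensional sketch with invented dynamics and an invented risk formula.

```python
import numpy as np

rng = np.random.default_rng(2)

# Model information (illustrative): x' = a*x + b*u, constraint x <= x_max.
a, b, x_max = 1.0, 0.5, 1.0
u_limit = 1.0            # widest allowed input range [-u_limit, u_limit]

def risk(x):
    # Degree of risk from the one-step model prediction with zero input:
    # the closer the predicted state is to the constraint boundary,
    # the higher the risk (clipped to [0, 1]).
    predicted = a * x
    return float(np.clip(1.0 - (x_max - predicted), 0.0, 1.0))

x = 0.0
for t in range(15):
    r = risk(x)
    width = u_limit * (1.0 - r)          # range narrows as risk grows
    u = rng.uniform(-width, width)       # explore only inside the safe range
    x = a * x + b * u
    print(f"t={t:2d} risk={r:.2f} range=±{width:.2f} x={x:+.3f}")
```

With these toy constants the next state can never cross x_max from below, since the allowed input shrinks to zero exactly as the predicted state reaches the boundary.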
  • Patent number: 11366433
    Abstract: A reinforcement learning device includes a processor that determines a first action on a control target by using a basic controller that defines an action on the control target depending on a state of the control target. The processor performs a first reinforcement learning within a first action range around the first action in order to acquire a first policy for determining an action on the control target depending on a state of the control target. The first action range is smaller than a limit action range for the control target. The processor determines a second action on the control target by using the first policy. The processor updates the first policy to a second policy by performing a second reinforcement learning within a second action range around the second action. The second action range is smaller than the limit action range.
    Type: Grant
    Filed: March 6, 2019
    Date of Patent: June 21, 2022
    Assignee: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Yoshihiro Okawa
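Patent 11366433 above searches a small action range centered on a baseline controller's action, then re-centers the range on what it learned. A toy version, with random search standing in for the reinforcement learning rounds:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative 1-D problem: the reward peaks at u = 1.5, while the
# limit action range for the control target is the much wider [-4, 4].
def reward(u):
    return -(u - 1.5) ** 2

def basic_controller(x):
    return 0.0                      # coarse prior policy

def learn_offset(center, half_width, n_trials=200):
    """Pick the best action inside a small range around `center`
    (a crude stand-in for one round of reinforcement learning)."""
    candidates = rng.uniform(center - half_width, center + half_width, n_trials)
    return candidates[np.argmax(reward(candidates))]

# Round 1: search a small range around the basic controller's action.
a1 = learn_offset(basic_controller(0.0), half_width=1.0)
# Round 2: re-center the small search range on the learned action.
a2 = learn_offset(a1, half_width=1.0)
print(f"round 1 action: {a1:+.3f}, round 2 action: {a2:+.3f}")
```

Keeping each search range far smaller than the limit range is the point: exploration stays near actions already known to be reasonable.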
  • Publication number: 20210109491
    Abstract: A policy improvement method for reinforcement learning using a state value function, the method including: calculating, when an immediate cost or immediate reward of a control target in the reinforcement learning is defined by a state and an input, an estimated parameter that estimates a parameter of the state value function for the state of the control target; contracting a state space of the control target using the calculated estimated parameter; generating a temporal difference (TD) error for the estimated state value function that estimates the state value function in the contracted state space of the control target, by perturbing each parameter that defines the policy; generating an estimated gradient that estimates the gradient of the state value function with respect to the parameters that define the policy, based on the generated TD error and the perturbation; and updating the parameters that define the policy using the generated estimated gradient.
    Type: Application
    Filed: September 29, 2020
    Publication date: April 15, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Junichi Shigezumi, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
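Publication 20210109491 above chains three steps: estimate value-function parameters, contract the state space using them, and run a perturbation-based parameter update in the contracted space. The sketch below mirrors that pipeline on a linear toy problem; the surrogate objective, the 0.1 contraction threshold, and all other constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative linear task: only the first two of four state components
# matter. V(x) ~ w . x ; small estimated weights mark removable dimensions.
n_dim = 4
w_true = np.array([2.0, -1.0, 0.0, 0.0])
X = rng.normal(size=(500, n_dim))
v = X @ w_true + 0.01 * rng.normal(size=500)

# Step 1: estimate state-value parameters by least squares.
w_hat, *_ = np.linalg.lstsq(X, v, rcond=None)

# Step 2: contract the state space by keeping informative components only.
keep = np.abs(w_hat) > 0.1
print("kept components:", np.flatnonzero(keep))

# Step 3: perturb each policy parameter in the contracted space and form a
# finite-difference (TD-error-like) gradient estimate for the update.
theta = np.zeros(keep.sum())
def objective(th):
    return -np.sum((th - w_hat[keep]) ** 2)   # toy surrogate to maximize

eps, lr = 1e-4, 0.1
for _ in range(100):
    grad = np.array([
        (objective(theta + eps * e) - objective(theta)) / eps
        for e in np.eye(theta.size)
    ])
    theta += lr * grad
print("policy parameters after updates:", theta)
```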
  • Publication number: 20210063974
    Abstract: A method for reinforcement learning performed by a computer is disclosed. The method includes: predicting the state of a target to be controlled in reinforcement learning at each time point at which the state of the target is measured, within the period from the time point at which the present action is determined to the time point at which the subsequent action is determined; calculating a degree of risk concerning the state of the target at each such time point with respect to a constraint condition, based on the result of the prediction; specifying a search range for the present action on the target in accordance with the calculated degree of risk and the degree of impact that the present action has on the state of the target at each time point; and determining the present action on the target based on the specified search range.
    Type: Application
    Filed: August 25, 2020
    Publication date: March 4, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
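Publication 20210063974 above sizes the search range from two quantities: the predicted risk at each intermediate measurement time and how strongly the present action impacts each of those time points. A one-dimensional sketch with invented dynamics:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative setting: an action u is held for k fine steps between
# decisions; states are measured (and must stay below x_max) at each step.
k, x_max, u_limit = 5, 1.0, 1.0

x = 0.0
for t in range(10):
    # Predict the state at each intermediate time point under zero input
    # (with x' = x + 0.2*u and u = 0, the prediction stays constant).
    drift = np.full(k, x)
    # Impact of the present action on each predicted time point: with
    # x_{i+1} = x_i + 0.2*u, the i-th point moves by 0.2*(i+1)*u.
    impact = 0.2 * np.arange(1, k + 1)
    # Risk-aware search range: even the most impacted, most at-risk time
    # point must remain inside the constraint for any u from the range.
    margin = (x_max - drift).min()
    width = min(u_limit, max(0.0, margin) / impact.max())
    u = rng.uniform(-width, width)
    for _ in range(k):                        # apply the held action
        x = x + 0.2 * u
    print(f"t={t:2d} margin={margin:.2f} range=±{width:.2f} x={x:+.3f}")
```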
  • Publication number: 20210049486
    Abstract: A policy improvement method for improving a policy of reinforcement learning based on a state value function is performed by a computer. The method causes the computer to execute a process including: calculating an input to a control target based on the policy and a predetermined exploration method for exploring inputs to the control target in the reinforcement learning; and updating a parameter of the policy based on the result of applying the calculated input to the control target, using the input to the control target and a generalized inverse matrix regarding a state of the control target.
    Type: Application
    Filed: August 11, 2020
    Publication date: February 18, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Tomotake Sasaki, Hidenao Iwane
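The distinctive piece of publication 20210049486 above is updating policy parameters with a generalized (Moore-Penrose) inverse of a matrix built from visited states. The sketch below shows that least-squares-style update on invented data; the "better" targets are a toy stand-in for the result of applying the explored inputs.

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative linear-policy improvement: u = theta @ x. Exploratory inputs
# are applied, and the parameter update solves a least-squares fit through
# the generalized inverse (pinv) of the collected state matrix.
n_x, n_u, n_samples = 3, 1, 100
theta = np.zeros((n_u, n_x))
target = np.array([[1.0, -0.5, 0.2]])   # unknown "good" gain to recover

X = rng.normal(size=(n_samples, n_x))                      # visited states
U = X @ theta.T + 0.3 * rng.normal(size=(n_samples, n_u))  # explored inputs
# Result of applying the inputs: here, a half-step from each explored input
# toward the (unknown) target action in that state.
better = U + 0.5 * ((X @ target.T) - U)

# Parameter update with the generalized inverse of the state matrix:
theta = (np.linalg.pinv(X) @ better).T
print("updated policy parameters:", theta)
```

Using the pseudoinverse makes the update well defined even when the state matrix is rank-deficient, which is the practical reason to prefer it over a plain matrix inverse.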
  • Publication number: 20200285204
    Abstract: A computer-implemented reinforcement learning method includes determining a parameter of a reinforcement learner based on a target probability of satisfaction of a constraint condition related to a state of a control object and on a specific time within which a controller causes a state of the control object that does not satisfy the constraint condition to become one that does, the parameter causing the state of the control object to satisfy the constraint condition, with a specific probability, at a first timing following a second timing at which the state of the control object satisfies the constraint condition; and determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
    Type: Application
    Filed: February 21, 2020
    Publication date: September 10, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Junichi Shigezumi, Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami
  • Publication number: 20200285208
    Abstract: A reinforcement learning method executed by a computer includes calculating a degree of risk for a state of a controlled object at a current time point with respect to a constraint condition related to the state of the controlled object, the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point, the predicted value being obtained from model information defining a relationship between the state of the controlled object and a control input to the controlled object; and determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
    Type: Application
    Filed: February 21, 2020
    Publication date: September 10, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
  • Publication number: 20200234123
    Abstract: A reinforcement learning method executed by a computer includes calculating, in reinforcement learning that repeatedly executes a learning step for a value function whose value is monotonic with respect to a state or an action of a control target, a contribution level of the state or the action used in the learning step, the contribution level being calculated for each learning step using a basis function that represents the value function; determining whether to update the value function, based on the value function after each learning step and the contribution level calculated in that step; and updating the value function when the determining determines that the value function is to be updated.
    Type: Application
    Filed: January 16, 2020
    Publication date: July 23, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Junichi Shigezumi, Hidenao Iwane, Hitoshi Yanami
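Publication 20200234123 above gates value-function updates on a per-step contribution level computed from the basis functions, while preserving monotonicity. A minimal sketch, assuming nondecreasing basis functions and nonnegative weights as the monotonicity certificate; the 0.5 thresholds and the sample data are invented:

```python
import numpy as np

# Illustrative monotone value model: V(s) = sum_i w_i * phi_i(s) with
# nondecreasing basis functions phi_i(s) = min(s / c_i, 1).
centers = np.array([0.25, 0.5, 0.75, 1.0])
def phi(s):
    return np.minimum(s / centers, 1.0)

w = np.zeros_like(centers)
samples = [(0.2, 0.1), (0.6, 0.5), (0.9, 0.9)]   # (state, observed value)

for s, v in samples:
    feats = phi(s)
    # Contribution level of this sample: how strongly it activates the
    # basis functions used to represent the value function.
    contribution = np.linalg.norm(feats)
    w_new = w + 0.5 * (v - feats @ w) * feats    # candidate TD-style update
    # Update only if the contribution is meaningful and the candidate
    # keeps the represented value function monotone (w_i >= 0 suffices).
    if contribution > 0.5 and np.all(w_new >= 0):
        w = w_new
    print(f"s={s} contribution={contribution:.2f} w={np.round(w, 3)}")
```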
  • Publication number: 20200233384
    Abstract: A reinforcement learning method is executed by a computer, for wind power generator control. The reinforcement learning method includes obtaining, as an action for one step in a reinforcement learning, a series of control inputs to a windmill including control inputs for plural steps ahead; obtaining, as a reward for one step in the reinforcement learning, a series of generated power amounts including generated power amounts for the plural steps ahead and indicating power generated by a wind power generator in response to rotations of the windmill; and implementing reinforcement learning for each step of determining a control input to be given to the windmill based on the series of control inputs and the series of generated power amounts.
    Type: Application
    Filed: January 3, 2020
    Publication date: July 23, 2020
    Applicant: FUJITSU LIMITED
    Inventor: Hidenao Iwane
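In publication 20200233384 above, one reinforcement-learning action is an entire series of control inputs for several steps ahead, and one reward is the corresponding series of generated power amounts. The sketch below uses a toy turbine model and greedy random search in place of the actual learner; the horizon H and the power formula are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# One RL "action" is a series of control inputs for the next H steps, and
# one "reward" is the series of generated power amounts over those steps.
H = 4
best_series, best_return = None, -np.inf

def generated_power(u_series, wind=8.0):
    # Toy turbine model: power peaks when the input tracks the wind speed.
    return -(u_series - wind) ** 2 + 60.0

for episode in range(300):
    u_series = rng.uniform(0.0, 12.0, size=H)     # multi-step action
    p_series = generated_power(u_series)          # multi-step reward
    ret = p_series.sum()
    if ret > best_return:                         # crude greedy "learning"
        best_series, best_return = u_series, ret

print("best input series:", np.round(best_series, 2))
print("its total generated power:", round(best_return, 1))
```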
  • Publication number: 20200193333
    Abstract: First reinforcement learning is performed, based on an action of a basic controller defining an action on a state of an environment, to obtain a first reinforcement learner by using a state-action value function expressed in a polynomial in an action range smaller than an action-range limit for the environment. Second reinforcement learning is performed, based on an action of a first controller including the first reinforcement learner, to obtain a second reinforcement learner by using a state-action value function expressed in a polynomial in an action range smaller than the action-range limit. Third reinforcement learning is performed, based on an action of a second controller including a merged reinforcement learner obtained by merging the first reinforcement learner and the second reinforcement learner, to obtain a third reinforcement learner by using a state-action value function expressed in a polynomial in an action range smaller than the action-range limit.
    Type: Application
    Filed: December 10, 2019
    Publication date: June 18, 2020
    Applicant: FUJITSU LIMITED
    Inventor: Hidenao Iwane
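Publication 20200193333 above runs three rounds of learning with polynomial state-action value functions, merging the first two learners before the third round. A sketch with quadratic polynomials; merging by averaging coefficients is an assumption made for illustration, as is the toy return model.

```python
import numpy as np

rng = np.random.default_rng(8)

# Q(x, a) modeled as a polynomial in the action: Q = c0 + c1*a + c2*a^2.
# Each "reinforcement learner" here is just a fitted coefficient vector.
def fit_q(center, half_width, n=200):
    a = rng.uniform(center - half_width, center + half_width, n)
    q = -(a - 2.0) ** 2 + rng.normal(scale=0.1, size=n)  # noisy returns
    return np.polyfit(a, q, deg=2)                       # [c2, c1, c0]

def greedy_action(coeffs):
    c2, c1, _ = coeffs
    return -c1 / (2 * c2)          # vertex of the fitted parabola

q1 = fit_q(center=0.0, half_width=1.0)                   # first learner
q2 = fit_q(center=greedy_action(q1), half_width=1.0)     # second learner
merged = (q1 + q2) / 2.0                                 # merged learner
q3 = fit_q(center=greedy_action(merged), half_width=1.0) # third learner
print("greedy actions per round:",
      [round(greedy_action(c), 2) for c in (q1, q2, q3)])
```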
  • Publication number: 20200184277
    Abstract: A reinforcement learning method is performed by a computer. The method includes: acquiring an input value related to a state and an action of a control target and a gain of the control target that corresponds to the input value; estimating coefficients of a state-action value function that is a polynomial in a variable representing the action of the control target, or that becomes such a polynomial when a value is substituted for a variable representing the state of the control target, based on the acquired input value and the gain; and obtaining an optimum action or an optimum value of the state-action value function with the estimated coefficients by using quantifier elimination.
    Type: Application
    Filed: December 4, 2019
    Publication date: June 11, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Tomotake Sasaki, Hitoshi Yanami
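Publication 20200184277 above couples a polynomial state-action value function with quantifier elimination. Full quantifier elimination needs a dedicated solver (e.g. QEPCAD); for the concave quadratic assumed below, eliminating the quantifier from "exists a: Q(a) >= v" by hand gives v <= c0 - c1^2/(4*c2), so the optimum value and action follow in closed form. The sampled data and coefficients are invented.

```python
import numpy as np

rng = np.random.default_rng(9)

# Estimate coefficients of a quadratic state-action value function
# Q(a) = c2*a^2 + c1*a + c0 from sampled (action, gain) pairs, then obtain
# the optimum action and value from the hand-eliminated formula.
actions = rng.uniform(-3.0, 3.0, size=100)
gains = -(actions - 1.0) ** 2 + 5.0 + rng.normal(scale=0.1, size=100)

c2, c1, c0 = np.polyfit(actions, gains, deg=2)   # estimated coefficients
assert c2 < 0, "Q must be concave for a unique maximizer"
a_opt = -c1 / (2 * c2)                           # optimum action
v_opt = c0 - c1**2 / (4 * c2)                    # optimum value (QE bound)
print(f"optimum action ~ {a_opt:.2f}, optimum value ~ {v_opt:.2f}")
```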
  • Publication number: 20200174432
    Abstract: A non-transitory computer-readable recording medium having stored therein a program for causing a computer to execute a process. The process includes obtaining a specific action related to a value function that is a polynomial expression in a variable representing an action, or that becomes such a polynomial expression when a value is substituted for a variable representing a state. The process includes specifying an action range by using quantifier elimination on a logical expression including a conditional expression stating that the difference between a value of the value function and the value of the value function that corresponds to the specific action is smaller than a threshold value. The process includes determining a next action from the specified range.
    Type: Application
    Filed: November 27, 2019
    Publication date: June 4, 2020
    Applicant: FUJITSU LIMITED
    Inventor: Hidenao Iwane
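Publication 20200174432 above uses quantifier elimination to turn "the value of this action is within a threshold of the best value" into an explicit action range. For a concave quadratic value function the eliminated condition is |a - a*| < sqrt(delta / -c2), which the sketch below applies directly; the coefficients and the threshold are invented.

```python
import numpy as np

rng = np.random.default_rng(10)

# Concave quadratic value function V(a) = c2*a^2 + c1*a + c0: the condition
# "V(a*) - V(a) < delta" eliminates by hand to |a - a*| < sqrt(delta / -c2),
# the worked-out counterpart of the quantifier elimination in the abstract.
c2, c1, c0 = -1.0, 2.0, 0.0
a_star = -c1 / (2 * c2)                 # action maximizing V
delta = 0.25                            # threshold on the value difference
half_width = np.sqrt(delta / -c2)       # quantifier-free action range

# Determine the next action from the specified near-optimal range.
a_next = rng.uniform(a_star - half_width, a_star + half_width)
print(f"range: [{a_star - half_width:.2f}, {a_star + half_width:.2f}], "
      f"next action: {a_next:.2f}")
```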
  • Publication number: 20190302708
    Abstract: A reinforcement learning device includes a processor that determines a first action on a control target by using a basic controller that defines an action on the control target depending on a state of the control target. The processor performs a first reinforcement learning within a first action range around the first action in order to acquire a first policy for determining an action on the control target depending on a state of the control target. The first action range is smaller than a limit action range for the control target. The processor determines a second action on the control target by using the first policy. The processor updates the first policy to a second policy by performing a second reinforcement learning within a second action range around the second action. The second action range is smaller than the limit action range.
    Type: Application
    Filed: March 6, 2019
    Publication date: October 3, 2019
    Applicant: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Yoshihiro Okawa
  • Publication number: 20190086876
    Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
    Type: Application
    Filed: September 13, 2018
    Publication date: March 21, 2019
    Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
  • Publication number: 20190087751
    Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function, which is represented in a quadratic form of inputs at times earlier than the present time and of outputs at the present time and those earlier times, the first coefficients being estimated based on the inputs at the earlier times, the outputs at the present time and the earlier times, and the costs or rewards that correspond to the inputs at the earlier times; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after the estimation of the first coefficients.
    Type: Application
    Filed: September 13, 2018
    Publication date: March 21, 2019
    Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
  • Publication number: 20170301053
    Abstract: A non-transitory computer-readable recording medium stores an operation planning program that causes a computer to execute a process including: for a set whose elements are users for which a plan of operation including ride sharing is to be generated, ordering the elements of the set using indices that indicate how likely subadditivity is to be fulfilled, determining whether a combination of the elements, taken in descending order of the ordering and containing no more than a predetermined number of elements, fulfills the subadditivity, and partitioning the set into subsets, for which the ride sharing is to be operated, by adding the combinations of elements fulfilling the subadditivity to the subsets; and generating the plan of operation by using the partitioned subsets.
    Type: Application
    Filed: April 12, 2017
    Publication date: October 19, 2017
    Applicants: FUJITSU LIMITED, NATIONAL UNIVERSITY CORPORATION KYUSHU UNIVERSITY
    Inventors: Hirokazu Anai, Kotaro Ohori, Hidenao Iwane, Naoyuki Kamiyama, Akifumi Kira
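The partitioning scheme of publication 20170301053 above orders users by an index of how likely subadditivity is to hold, then tests small combinations in that order. The sketch below places users on a line with a fixed per-vehicle cost; the cost model, the ordering index, and the simplified subadditivity check are invented stand-ins.

```python
from itertools import combinations

# Illustrative ride-sharing partition: users are points on a line, and the
# cost of serving a subset is the span it covers plus a per-vehicle cost.
users = {"u1": 0.0, "u2": 1.0, "u3": 1.2, "u4": 5.0}
FIXED = 2.0

def cost(group):
    xs = [users[u] for u in group]
    return FIXED + (max(xs) - min(xs))

def subadditive(group):
    # Sharing pays off if one vehicle is cheaper than any split into
    # singletons (a simplified check standing in for full subadditivity).
    return cost(group) <= sum(cost((u,)) for u in group)

# Order users by an index suggesting likely subadditivity: here, proximity
# to their nearest neighbour (closer users are more likely to share well).
def index(u):
    return min(abs(users[u] - users[v]) for v in users if v != u)

ordered = sorted(users, key=index)
remaining, subsets = set(users), []
for size in (3, 2):                        # combinations up to a set size
    for combo in combinations(ordered, size):
        if set(combo) <= remaining and subadditive(combo):
            subsets.append(combo)
            remaining -= set(combo)
for u in sorted(remaining):
    subsets.append((u,))                   # leftovers ride alone
print("ride-sharing subsets:", subsets)
```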
  • Patent number: 9614401
    Abstract: A control server according to an embodiment sorts a plurality of notebook PCs into groups so that the total remaining energy of the rechargeable batteries of the notebook PCs in each group is similar to that of the notebook PCs in every other group. The control server then performs a local search on each of the sorted groups individually and generates a control plan for the individual notebook PCs.
    Type: Grant
    Filed: February 28, 2014
    Date of Patent: April 4, 2017
    Assignees: FUJITSU LIMITED, THE UNIVERSITY OF TOKYO
    Inventors: Hitoshi Yanami, Hidenao Iwane, Tomotake Sasaki, Hirokazu Anai, Junji Kaneko, Shinji Hara, Suguru Fujita
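Patent 9614401 above balances groups of notebook PCs by total remaining battery energy and then plans each group with local search. A greedy-balancing sketch with an invented energy table and a deliberately trivial "local search":

```python
import random

random.seed(0)

# Illustrative grouping of notebook PCs so that each group's total
# remaining battery energy is as similar as possible, followed by a
# per-group local search (here: a simple charge-ordering rule).
energies = {f"pc{i}": e for i, e in
            enumerate([55, 80, 30, 95, 60, 20, 75, 40], start=1)}
n_groups = 2
groups = [[] for _ in range(n_groups)]
totals = [0.0] * n_groups

# Greedy balancing: assign each PC (largest energy first) to the group
# with the smallest running total of remaining energy.
for pc in sorted(energies, key=energies.get, reverse=True):
    g = totals.index(min(totals))
    groups[g].append(pc)
    totals[g] += energies[pc]

def local_search(group):
    # Stand-in for the per-group local search that builds a control plan:
    # schedule the lowest-energy PCs to charge first.
    return sorted(group, key=energies.get)

for g, group in enumerate(groups):
    print(f"group {g}: total={totals[g]} plan={local_search(group)}")
```

Splitting into similar-energy groups first keeps each local search small, which is the practical motivation for the two-stage design described in the abstract.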