Patents by Inventor Hidenao Iwane

Hidenao Iwane has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11645574
    Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function, which is represented in a quadratic form of inputs at times earlier than the present time and of outputs at the present time and those earlier times, the first coefficients being estimated based on the inputs at the earlier times, the outputs at the present time and the earlier times, and the costs or rewards that correspond to the inputs at the earlier times; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after the estimation of the first coefficients.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: May 9, 2023
    Assignees: FUJITSU LIMITED, OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
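A compact way to see the estimate-then-derive scheme of patent 11645574 above: fit the "first coefficients" of a quadratic value function by linear least squares, then read a linear control law (the "second coefficients") off the fitted matrix. The sketch below is a minimal illustration under invented dynamics and names (P_true, Z, K are all assumptions), not the patented procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical data: z_t stacks recent inputs and outputs. ---
dim, n = 4, 300
A = rng.normal(size=(dim, dim))
P_true = A @ A.T                            # unknown quadratic-form coefficients
Z = rng.normal(size=(n, dim))               # observed stacked input/output vectors
v = np.einsum('ni,ij,nj->n', Z, P_true, Z)  # observed costs (noise-free here)

# --- Step 1: estimate the "first coefficients" (entries of P) by least squares. ---
# z^T P z is linear in the entries of P, so each sample gives one linear equation.
idx = np.triu_indices(dim)
def features(z):
    outer = np.outer(z, z)
    f = outer[idx].copy()
    off = idx[0] != idx[1]
    f[off] *= 2.0                           # off-diagonal entries appear twice
    return f

Phi = np.array([features(z) for z in Z])
theta, *_ = np.linalg.lstsq(Phi, v, rcond=None)
P_hat = np.zeros((dim, dim))
P_hat[idx] = theta
P_hat = (P_hat + P_hat.T) - np.diag(np.diag(P_hat))

# --- Step 2: derive the "second coefficients" (a linear control law). ---
# Partition z = [u; y]: minimizing over the input block u for fixed y gives
# u* = -P_uu^{-1} P_uy y, i.e. the feedback gain K = -P_uu^{-1} P_uy.
k = 2                                       # number of input components in z
P_uu, P_uy = P_hat[:k, :k], P_hat[:k, k:]
K = -np.linalg.solve(P_uu, P_uy)
print("max coefficient error:", np.abs(P_hat - P_true).max())
print("feedback gain K:\n", K)
```

Because the quadratic form is linear in the unknown coefficients, ordinary least squares is enough for the estimation step; no iterative training is needed there.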
  • Patent number: 11619915
    Abstract: A computer-implemented reinforcement learning method includes determining a parameter of a reinforcement learner based on a target probability of satisfaction of a constraint condition related to a state of a control object and on a specific time within which a controller causes a state of the control object that does not satisfy the constraint condition to become one that does, the parameter causing the state of the control object to satisfy the constraint condition, with a specific probability, at a first timing following a second timing at which the state of the control object satisfies the constraint condition; and determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: April 4, 2023
    Assignee: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Junichi Shigezumi, Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami
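The switching logic of patent 11619915 above can be illustrated with a one-dimensional toy problem: an exploring learner acts while the constraint holds, and a fallback controller recovers the state when it does not. Everything below (the dynamics, the gain values, and the rule mapping the target probability to the exploration scale sigma) is an invented stand-in, not the patented parameterization.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D control object: x' = x + u + noise, constraint |x| <= 1.
def step(x, u):
    return x + u + 0.05 * rng.normal()

def satisfies(x):
    return abs(x) <= 1.0

def rl_action(x, sigma):
    # Exploring learner: noisy feedback whose noise scale sigma is the
    # "parameter" tuned from the target satisfaction probability.
    return -0.5 * x + sigma * rng.normal()

def fallback_action(x):
    # Safe controller that drives the state back inside the constraint set.
    return -0.8 * x

# Choose sigma so that, from a satisfying state, the next state satisfies
# the constraint with at least the target probability (illustrative rule).
target_prob = 0.95
sigma = 0.1 / np.sqrt(-2.0 * np.log(1.0 - target_prob))

x = 0.0
for t in range(20):
    # Use the learner while the constraint holds; otherwise recover.
    u = rl_action(x, sigma) if satisfies(x) else fallback_action(x)
    x = step(x, u)
    print(f"t={t:2d} x={x:+.3f} satisfied={satisfies(x)}")
```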
  • Patent number: 11573537
    Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: February 7, 2023
    Assignees: FUJITSU LIMITED, OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
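Patent 11573537 above estimates a gradient with respect to a feedback coefficient matrix by perturbing each of its components. The sketch below keeps that component-wise perturbation idea but replaces the TD-error machinery with plain rollout cost on an assumed linear-quadratic toy problem; A, B, Q, R, and the step sizes are illustrative.

```python
import numpy as np

# Linear dynamics x' = A x + B u with quadratic immediate cost, as in the
# abstract; the matrices and constants below are invented for illustration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, gamma = np.eye(2), np.eye(1) * 0.1, 0.95

def rollout_cost(F, x0, horizon=80):
    """Discounted quadratic cost under the linear policy u = F x."""
    x, total = x0.copy(), 0.0
    for t in range(horizon):
        u = F @ x
        total += gamma**t * (x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return total

# Perturb each component of the feedback matrix F and use the resulting
# change in estimated value as a finite-difference gradient (the patent
# drives this with TD errors; rollout cost is a simpler stand-in).
F = np.zeros((1, 2))
x0 = np.array([1.0, 0.0])
eps, lr = 1e-4, 1e-3
for it in range(200):
    base = rollout_cost(F, x0)
    grad = np.zeros_like(F)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            Fp = F.copy()
            Fp[i, j] += eps
            grad[i, j] = (rollout_cost(Fp, x0) - base) / eps
    F -= lr * grad
print("learned feedback matrix:", F)
print("final cost:", rollout_cost(F, x0))
```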
  • Patent number: 11543789
    Abstract: A reinforcement learning method executed by a computer includes calculating a degree of risk for a state of a controlled object at a current time point with respect to a constraint condition related to the state of the controlled object, the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point, the predicted value being obtained from model information defining a relationship between the state of the controlled object and a control input to the controlled object; and determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: January 3, 2023
    Assignee: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
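The risk-proportional narrowing in patent 11543789 above reduces to: predict the next state from model information, score how close it comes to the constraint boundary, and shrink the admissible input range accordingly. Below is a one-dimensional sketch with invented dynamics and an invented risk formula.

```python
import numpy as np

rng = np.random.default_rng(2)

# Model information (illustrative): x' = a*x + b*u, constraint x <= x_max.
a, b, x_max = 1.0, 0.5, 1.0
u_limit = 1.0            # widest allowed input range [-u_limit, u_limit]

def risk(x):
    # Degree of risk from the one-step model prediction with zero input:
    # the closer the predicted state is to the constraint boundary,
    # the higher the risk (clipped to [0, 1]).
    predicted = a * x
    return float(np.clip(1.0 - (x_max - predicted), 0.0, 1.0))

x = 0.0
for t in range(15):
    r = risk(x)
    width = u_limit * (1.0 - r)          # range narrows as risk grows
    u = rng.uniform(-width, width)       # explore only inside the safe range
    x = a * x + b * u
    print(f"t={t:2d} risk={r:.2f} range=±{width:.2f} x={x:+.3f}")
```

With these toy constants the next state can never cross x_max from below, since the allowed input shrinks to zero exactly as the predicted state reaches the boundary.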
  • Patent number: 11366433
    Abstract: A reinforcement learning device includes a processor that determines a first action on a control target by using a basic controller that defines an action on the control target depending on a state of the control target. The processor performs a first reinforcement learning within a first action range around the first action in order to acquire a first policy for determining an action on the control target depending on a state of the control target. The first action range is smaller than a limit action range for the control target. The processor determines a second action on the control target by using the first policy. The processor updates the first policy to a second policy by performing a second reinforcement learning within a second action range around the second action. The second action range is smaller than the limit action range.
    Type: Grant
    Filed: March 6, 2019
    Date of Patent: June 21, 2022
    Assignee: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Yoshihiro Okawa
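Patent 11366433 above searches a small action range centered on a baseline controller's action, then re-centers the range on what it learned. A toy version, with random search standing in for the reinforcement learning rounds:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative 1-D problem: the reward peaks at u = 1.5, while the
# limit action range for the control target is the much wider [-4, 4].
def reward(u):
    return -(u - 1.5) ** 2

def basic_controller(x):
    return 0.0                      # coarse prior policy

def learn_offset(center, half_width, n_trials=200):
    """Pick the best action inside a small range around `center`
    (a crude stand-in for one round of reinforcement learning)."""
    candidates = rng.uniform(center - half_width, center + half_width, n_trials)
    return candidates[np.argmax(reward(candidates))]

# Round 1: search a small range around the basic controller's action.
a1 = learn_offset(basic_controller(0.0), half_width=1.0)
# Round 2: re-center the small search range on the learned action.
a2 = learn_offset(a1, half_width=1.0)
print(f"round 1 action: {a1:+.3f}, round 2 action: {a2:+.3f}")
```

Keeping each search range far smaller than the limit range is the point: exploration stays near actions already known to be reasonable.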
  • Publication number: 20210109491
    Abstract: A policy improvement method for reinforcement learning using a state value function, the method including: calculating, when an immediate cost or immediate reward of a control target in the reinforcement learning is defined by a state and an input, an estimated parameter that estimates a parameter of the state value function for the state of the control target; contracting a state space of the control target using the calculated estimated parameter; generating a temporal difference (TD) error for the estimated state value function that estimates the state value function in the contracted state space of the control target, by perturbing each parameter that defines the policy; generating an estimated gradient that estimates the gradient of the state value function with respect to the parameters that define the policy, based on the generated TD error and the perturbation; and updating the parameters that define the policy using the generated estimated gradient.
    Type: Application
    Filed: September 29, 2020
    Publication date: April 15, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Junichi Shigezumi, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
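Publication 20210109491 above chains three steps: estimate value-function parameters, contract the state space using them, and run a perturbation-based parameter update in the contracted space. The sketch below mirrors that pipeline on a linear toy problem; the surrogate objective, the 0.1 contraction threshold, and all other constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative linear task: only the first two of four state components
# matter. V(x) ~ w . x ; small estimated weights mark removable dimensions.
n_dim = 4
w_true = np.array([2.0, -1.0, 0.0, 0.0])
X = rng.normal(size=(500, n_dim))
v = X @ w_true + 0.01 * rng.normal(size=500)

# Step 1: estimate state-value parameters by least squares.
w_hat, *_ = np.linalg.lstsq(X, v, rcond=None)

# Step 2: contract the state space by keeping informative components only.
keep = np.abs(w_hat) > 0.1
print("kept components:", np.flatnonzero(keep))

# Step 3: perturb each policy parameter in the contracted space and form a
# finite-difference (TD-error-like) gradient estimate for the update.
theta = np.zeros(keep.sum())
def objective(th):
    return -np.sum((th - w_hat[keep]) ** 2)   # toy surrogate to maximize

eps, lr = 1e-4, 0.1
for _ in range(100):
    grad = np.array([
        (objective(theta + eps * e) - objective(theta)) / eps
        for e in np.eye(theta.size)
    ])
    theta += lr * grad
print("policy parameters after updates:", theta)
```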
  • Publication number: 20210063974
    Abstract: A method for reinforcement learning performed by a computer is disclosed. The method includes: predicting the state of a target to be controlled in reinforcement learning at each time point at which the state of the target is measured, within the period from the time point at which the present action is determined to the time point at which the subsequent action is determined; calculating a degree of risk concerning the state of the target at each such time point with respect to a constraint condition, based on the result of the prediction; specifying a search range for the present action on the target in accordance with the calculated degree of risk and the degree of impact that the present action has on the state of the target at each time point; and determining the present action on the target based on the specified search range.
    Type: Application
    Filed: August 25, 2020
    Publication date: March 4, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
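Publication 20210063974 above sizes the search range from two quantities: the predicted risk at each intermediate measurement time and how strongly the present action impacts each of those time points. A one-dimensional sketch with invented dynamics:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative setting: an action u is held for k fine steps between
# decisions; states are measured (and must stay below x_max) at each step.
k, x_max, u_limit = 5, 1.0, 1.0

x = 0.0
for t in range(10):
    # Predict the state at each intermediate time point under zero input
    # (with x' = x + 0.2*u and u = 0, the prediction stays constant).
    drift = np.full(k, x)
    # Impact of the present action on each predicted time point: with
    # x_{i+1} = x_i + 0.2*u, the i-th point moves by 0.2*(i+1)*u.
    impact = 0.2 * np.arange(1, k + 1)
    # Risk-aware search range: even the most impacted, most at-risk time
    # point must remain inside the constraint for any u from the range.
    margin = (x_max - drift).min()
    width = min(u_limit, max(0.0, margin) / impact.max())
    u = rng.uniform(-width, width)
    for _ in range(k):                        # apply the held action
        x = x + 0.2 * u
    print(f"t={t:2d} margin={margin:.2f} range=±{width:.2f} x={x:+.3f}")
```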
  • Publication number: 20210049486
    Abstract: A policy improvement method for improving a policy of reinforcement learning based on a state value function is performed by a computer. The method causes the computer to execute a process including: calculating an input to a control target based on the policy and a predetermined exploration method for exploring inputs to the control target in the reinforcement learning; and updating a parameter of the policy based on the result of applying the calculated input to the control target, using the input to the control target and a generalized inverse matrix regarding a state of the control target.
    Type: Application
    Filed: August 11, 2020
    Publication date: February 18, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Tomotake Sasaki, Hidenao Iwane
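The distinctive piece of publication 20210049486 above is updating policy parameters with a generalized (Moore-Penrose) inverse of a matrix built from visited states. The sketch below shows that least-squares-style update on invented data; the "better" targets are a toy stand-in for the result of applying the explored inputs.

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative linear-policy improvement: u = theta @ x. Exploratory inputs
# are applied, and the parameter update solves a least-squares fit through
# the generalized inverse (pinv) of the collected state matrix.
n_x, n_u, n_samples = 3, 1, 100
theta = np.zeros((n_u, n_x))
target = np.array([[1.0, -0.5, 0.2]])   # unknown "good" gain to recover

X = rng.normal(size=(n_samples, n_x))                      # visited states
U = X @ theta.T + 0.3 * rng.normal(size=(n_samples, n_u))  # explored inputs
# Result of applying the inputs: here, a half-step from each explored input
# toward the (unknown) target action in that state.
better = U + 0.5 * ((X @ target.T) - U)

# Parameter update with the generalized inverse of the state matrix:
theta = (np.linalg.pinv(X) @ better).T
print("updated policy parameters:", theta)
```

Using the pseudoinverse makes the update well defined even when the state matrix is rank-deficient, which is the practical reason to prefer it over a plain matrix inverse.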
  • Publication number: 20200285204
    Abstract: A computer-implemented reinforcement learning method includes determining a parameter of a reinforcement learner based on a target probability of satisfaction of a constraint condition related to a state of a control object and on a specific time within which a controller causes a state of the control object that does not satisfy the constraint condition to become one that does, the parameter causing the state of the control object to satisfy the constraint condition, with a specific probability, at a first timing following a second timing at which the state of the control object satisfies the constraint condition; and determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
    Type: Application
    Filed: February 21, 2020
    Publication date: September 10, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Junichi Shigezumi, Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami
  • Publication number: 20200285208
    Abstract: A reinforcement learning method executed by a computer includes calculating a degree of risk for a state of a controlled object at a current time point with respect to a constraint condition related to the state of the controlled object, the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point, the predicted value being obtained from model information defining a relationship between the state of the controlled object and a control input to the controlled object; and determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
    Type: Application
    Filed: February 21, 2020
    Publication date: September 10, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
  • Publication number: 20200234123
    Abstract: A reinforcement learning method executed by a computer includes calculating, in reinforcement learning that repeatedly executes a learning step for a value function whose value is monotonic with respect to a state or an action of a control target, a contribution level of the state or the action used in the learning step, the contribution level being calculated for each learning step using a basis function that represents the value function; determining whether to update the value function, based on the value function after each learning step and the contribution level calculated in that step; and updating the value function when the determining determines that the value function is to be updated.
    Type: Application
    Filed: January 16, 2020
    Publication date: July 23, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Junichi Shigezumi, Hidenao Iwane, Hitoshi Yanami
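Publication 20200234123 above gates value-function updates on a per-step contribution level computed from the basis functions, while preserving monotonicity. A minimal sketch, assuming nondecreasing basis functions and nonnegative weights as the monotonicity certificate; the 0.5 thresholds and the sample data are invented:

```python
import numpy as np

# Illustrative monotone value model: V(s) = sum_i w_i * phi_i(s) with
# nondecreasing basis functions phi_i(s) = min(s / c_i, 1).
centers = np.array([0.25, 0.5, 0.75, 1.0])
def phi(s):
    return np.minimum(s / centers, 1.0)

w = np.zeros_like(centers)
samples = [(0.2, 0.1), (0.6, 0.5), (0.9, 0.9)]   # (state, observed value)

for s, v in samples:
    feats = phi(s)
    # Contribution level of this sample: how strongly it activates the
    # basis functions used to represent the value function.
    contribution = np.linalg.norm(feats)
    w_new = w + 0.5 * (v - feats @ w) * feats    # candidate TD-style update
    # Update only if the contribution is meaningful and the candidate
    # keeps the represented value function monotone (w_i >= 0 suffices).
    if contribution > 0.5 and np.all(w_new >= 0):
        w = w_new
    print(f"s={s} contribution={contribution:.2f} w={np.round(w, 3)}")
```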
  • Publication number: 20200233384
    Abstract: A reinforcement learning method is executed by a computer, for wind power generator control. The reinforcement learning method includes obtaining, as an action for one step in a reinforcement learning, a series of control inputs to a windmill including control inputs for plural steps ahead; obtaining, as a reward for one step in the reinforcement learning, a series of generated power amounts including generated power amounts for the plural steps ahead and indicating power generated by a wind power generator in response to rotations of the windmill; and implementing reinforcement learning for each step of determining a control input to be given to the windmill based on the series of control inputs and the series of generated power amounts.
    Type: Application
    Filed: January 3, 2020
    Publication date: July 23, 2020
    Applicant: FUJITSU LIMITED
    Inventor: Hidenao Iwane
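In publication 20200233384 above, one reinforcement-learning action is an entire series of control inputs for several steps ahead, and one reward is the corresponding series of generated power amounts. The sketch below uses a toy turbine model and greedy random search in place of the actual learner; the horizon H and the power formula are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# One RL "action" is a series of control inputs for the next H steps, and
# one "reward" is the series of generated power amounts over those steps.
H = 4
best_series, best_return = None, -np.inf

def generated_power(u_series, wind=8.0):
    # Toy turbine model: power peaks when the input tracks the wind speed.
    return -(u_series - wind) ** 2 + 60.0

for episode in range(300):
    u_series = rng.uniform(0.0, 12.0, size=H)     # multi-step action
    p_series = generated_power(u_series)          # multi-step reward
    ret = p_series.sum()
    if ret > best_return:                         # crude greedy "learning"
        best_series, best_return = u_series, ret

print("best input series:", np.round(best_series, 2))
print("its total generated power:", round(best_return, 1))
```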
  • Publication number: 20200193333
    Abstract: First reinforcement learning is performed, based on an action of a basic controller defining an action on a state of an environment, to obtain a first reinforcement learner by using a state-action value function expressed in a polynomial in an action range smaller than an action-range limit for the environment. Second reinforcement learning is performed, based on an action of a first controller including the first reinforcement learner, to obtain a second reinforcement learner by using a state-action value function expressed in a polynomial in an action range smaller than the action-range limit. Third reinforcement learning is performed, based on an action of a second controller including a merged reinforcement learner obtained by merging the first reinforcement learner and the second reinforcement learner, to obtain a third reinforcement learner by using a state-action value function expressed in a polynomial in an action range smaller than the action-range limit.
    Type: Application
    Filed: December 10, 2019
    Publication date: June 18, 2020
    Applicant: FUJITSU LIMITED
    Inventor: Hidenao Iwane
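Publication 20200193333 above runs three rounds of learning with polynomial state-action value functions, merging the first two learners before the third round. A sketch with quadratic polynomials; merging by averaging coefficients is an assumption made for illustration, as is the toy return model.

```python
import numpy as np

rng = np.random.default_rng(8)

# Q(x, a) modeled as a polynomial in the action: Q = c0 + c1*a + c2*a^2.
# Each "reinforcement learner" here is just a fitted coefficient vector.
def fit_q(center, half_width, n=200):
    a = rng.uniform(center - half_width, center + half_width, n)
    q = -(a - 2.0) ** 2 + rng.normal(scale=0.1, size=n)  # noisy returns
    return np.polyfit(a, q, deg=2)                       # [c2, c1, c0]

def greedy_action(coeffs):
    c2, c1, _ = coeffs
    return -c1 / (2 * c2)          # vertex of the fitted parabola

q1 = fit_q(center=0.0, half_width=1.0)                   # first learner
q2 = fit_q(center=greedy_action(q1), half_width=1.0)     # second learner
merged = (q1 + q2) / 2.0                                 # merged learner
q3 = fit_q(center=greedy_action(merged), half_width=1.0) # third learner
print("greedy actions per round:",
      [round(greedy_action(c), 2) for c in (q1, q2, q3)])
```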
  • Publication number: 20200184277
    Abstract: A reinforcement learning method is performed by a computer. The method includes: acquiring an input value related to a state and an action of a control target and a gain of the control target that corresponds to the input value; estimating coefficients of a state-action value function that is a polynomial in a variable representing the action of the control target, or that becomes such a polynomial when a value is substituted for a variable representing the state of the control target, based on the acquired input value and the gain; and obtaining an optimum action or an optimum value of the state-action value function with the estimated coefficients by using quantifier elimination.
    Type: Application
    Filed: December 4, 2019
    Publication date: June 11, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Tomotake Sasaki, Hitoshi Yanami
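Publication 20200184277 above couples a polynomial state-action value function with quantifier elimination. Full quantifier elimination needs a dedicated solver (e.g. QEPCAD); for the concave quadratic assumed below, eliminating the quantifier from "exists a: Q(a) >= v" by hand gives v <= c0 - c1^2/(4*c2), so the optimum value and action follow in closed form. The sampled data and coefficients are invented.

```python
import numpy as np

rng = np.random.default_rng(9)

# Estimate coefficients of a quadratic state-action value function
# Q(a) = c2*a^2 + c1*a + c0 from sampled (action, gain) pairs, then obtain
# the optimum action and value from the hand-eliminated formula.
actions = rng.uniform(-3.0, 3.0, size=100)
gains = -(actions - 1.0) ** 2 + 5.0 + rng.normal(scale=0.1, size=100)

c2, c1, c0 = np.polyfit(actions, gains, deg=2)   # estimated coefficients
assert c2 < 0, "Q must be concave for a unique maximizer"
a_opt = -c1 / (2 * c2)                           # optimum action
v_opt = c0 - c1**2 / (4 * c2)                    # optimum value (QE bound)
print(f"optimum action ~ {a_opt:.2f}, optimum value ~ {v_opt:.2f}")
```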
  • Publication number: 20200174432
    Abstract: A non-transitory computer-readable recording medium having stored therein a program for causing a computer to execute a process. The process includes obtaining a specific action related to a value function that is a polynomial expression in a variable representing an action, or that becomes such a polynomial expression when a value is substituted for a variable representing a state. The process includes specifying an action range by using quantifier elimination on a logical expression including a conditional expression stating that the difference between a value of the value function and the value of the value function that corresponds to the specific action is smaller than a threshold value. The process includes determining a next action from the specified range.
    Type: Application
    Filed: November 27, 2019
    Publication date: June 4, 2020
    Applicant: FUJITSU LIMITED
    Inventor: Hidenao Iwane
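Publication 20200174432 above uses quantifier elimination to turn "the value of this action is within a threshold of the best value" into an explicit action range. For a concave quadratic value function the eliminated condition is |a - a*| < sqrt(delta / -c2), which the sketch below applies directly; the coefficients and the threshold are invented.

```python
import numpy as np

rng = np.random.default_rng(10)

# Concave quadratic value function V(a) = c2*a^2 + c1*a + c0: the condition
# "V(a*) - V(a) < delta" eliminates by hand to |a - a*| < sqrt(delta / -c2),
# the worked-out counterpart of the quantifier elimination in the abstract.
c2, c1, c0 = -1.0, 2.0, 0.0
a_star = -c1 / (2 * c2)                 # action maximizing V
delta = 0.25                            # threshold on the value difference
half_width = np.sqrt(delta / -c2)       # quantifier-free action range

# Determine the next action from the specified near-optimal range.
a_next = rng.uniform(a_star - half_width, a_star + half_width)
print(f"range: [{a_star - half_width:.2f}, {a_star + half_width:.2f}], "
      f"next action: {a_next:.2f}")
```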
  • Publication number: 20190302708
    Abstract: A reinforcement learning device includes a processor that determines a first action on a control target by using a basic controller that defines an action on the control target depending on a state of the control target. The processor performs a first reinforcement learning within a first action range around the first action in order to acquire a first policy for determining an action on the control target depending on a state of the control target. The first action range is smaller than a limit action range for the control target. The processor determines a second action on the control target by using the first policy. The processor updates the first policy to a second policy by performing a second reinforcement learning within a second action range around the second action. The second action range is smaller than the limit action range.
    Type: Application
    Filed: March 6, 2019
    Publication date: October 3, 2019
    Applicant: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Yoshihiro Okawa
  • Publication number: 20190086876
    Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
    Type: Application
    Filed: September 13, 2018
    Publication date: March 21, 2019
    Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
  • Publication number: 20190087751
    Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function, which is represented in a quadratic form of inputs at times earlier than the present time and of outputs at the present time and those earlier times, the first coefficients being estimated based on the inputs at the earlier times, the outputs at the present time and the earlier times, and the costs or rewards that correspond to the inputs at the earlier times; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after the estimation of the first coefficients.
    Type: Application
    Filed: September 13, 2018
    Publication date: March 21, 2019
    Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
  • Publication number: 20170301053
    Abstract: A non-transitory computer-readable recording medium stores an operation planning program that causes a computer to execute a process including: for a set whose elements are users for which a plan of operation including ride sharing is to be generated, ordering the elements of the set using indices that indicate how likely subadditivity is to be fulfilled, determining whether a combination of the elements, taken in descending order of the ordering and containing no more than a predetermined number of elements, fulfills the subadditivity, and partitioning the set into subsets, for which the ride sharing is to be operated, by adding the combinations of elements fulfilling the subadditivity to the subsets; and generating the plan of operation by using the partitioned subsets.
    Type: Application
    Filed: April 12, 2017
    Publication date: October 19, 2017
    Applicants: FUJITSU LIMITED, NATIONAL UNIVERSITY CORPORATION KYUSHU UNIVERSITY
    Inventors: Hirokazu Anai, Kotaro Ohori, Hidenao Iwane, Naoyuki Kamiyama, Akifumi Kira
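The partitioning scheme of publication 20170301053 above orders users by an index of how likely subadditivity is to hold, then tests small combinations in that order. The sketch below places users on a line with a fixed per-vehicle cost; the cost model, the ordering index, and the simplified subadditivity check are invented stand-ins.

```python
from itertools import combinations

# Illustrative ride-sharing partition: users are points on a line, and the
# cost of serving a subset is the span it covers plus a per-vehicle cost.
users = {"u1": 0.0, "u2": 1.0, "u3": 1.2, "u4": 5.0}
FIXED = 2.0

def cost(group):
    xs = [users[u] for u in group]
    return FIXED + (max(xs) - min(xs))

def subadditive(group):
    # Sharing pays off if one vehicle is cheaper than any split into
    # singletons (a simplified check standing in for full subadditivity).
    return cost(group) <= sum(cost((u,)) for u in group)

# Order users by an index suggesting likely subadditivity: here, proximity
# to their nearest neighbour (closer users are more likely to share well).
def index(u):
    return min(abs(users[u] - users[v]) for v in users if v != u)

ordered = sorted(users, key=index)
remaining, subsets = set(users), []
for size in (3, 2):                        # combinations up to a set size
    for combo in combinations(ordered, size):
        if set(combo) <= remaining and subadditive(combo):
            subsets.append(combo)
            remaining -= set(combo)
for u in sorted(remaining):
    subsets.append((u,))                   # leftovers ride alone
print("ride-sharing subsets:", subsets)
```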
  • Patent number: 9614401
    Abstract: A control server according to an embodiment sorts a plurality of notebook PCs into groups so that the total remaining energy of the rechargeable batteries of the notebook PCs in each group is similar to that of the notebook PCs in every other group. The control server then performs a local search on each of the sorted groups individually and generates a control plan for the individual notebook PCs.
    Type: Grant
    Filed: February 28, 2014
    Date of Patent: April 4, 2017
    Assignees: FUJITSU LIMITED, THE UNIVERSITY OF TOKYO
    Inventors: Hitoshi Yanami, Hidenao Iwane, Tomotake Sasaki, Hirokazu Anai, Junji Kaneko, Shinji Hara, Suguru Fujita
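Patent 9614401 above balances groups of notebook PCs by total remaining battery energy and then plans each group with local search. A greedy-balancing sketch with an invented energy table and a deliberately trivial "local search":

```python
import random

random.seed(0)

# Illustrative grouping of notebook PCs so that each group's total
# remaining battery energy is as similar as possible, followed by a
# per-group local search (here: a simple charge-ordering rule).
energies = {f"pc{i}": e for i, e in
            enumerate([55, 80, 30, 95, 60, 20, 75, 40], start=1)}
n_groups = 2
groups = [[] for _ in range(n_groups)]
totals = [0.0] * n_groups

# Greedy balancing: assign each PC (largest energy first) to the group
# with the smallest running total of remaining energy.
for pc in sorted(energies, key=energies.get, reverse=True):
    g = totals.index(min(totals))
    groups[g].append(pc)
    totals[g] += energies[pc]

def local_search(group):
    # Stand-in for the per-group local search that builds a control plan:
    # schedule the lowest-energy PCs to charge first.
    return sorted(group, key=energies.get)

for g, group in enumerate(groups):
    print(f"group {g}: total={totals[g]} plan={local_search(group)}")
```

Splitting into similar-energy groups first keeps each local search small, which is the practical motivation for the two-stage design described in the abstract.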