Patents by Inventor Tomotake SASAKI

Tomotake Sasaki has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11645574
    Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function, which is represented in a quadratic form of inputs at times earlier than a present time and outputs at the present time and the earlier times, the first coefficients being estimated based on the inputs at the earlier times, the outputs at the present time and the earlier times, and costs or rewards that correspond to the inputs at the earlier times; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: May 9, 2023
    Assignees: FUJITSU LIMITED, OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
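
The two steps in this abstract map naturally onto a least-squares pattern: estimate the coefficients of a quadratic value function from logged inputs, outputs, and costs, then read a linear control law off those coefficients. A minimal sketch of that pattern, assuming a Bellman-residual regression with discount factor gamma and a block partition of the quadratic form into input and output parts; these choices and all names are illustrative, not taken from the patent:

```python
import numpy as np

def quad_features(z):
    # Monomials z_i * z_j (i <= j): the free coefficients of a quadratic form.
    i, j = np.triu_indices(len(z))
    return np.outer(z, z)[i, j]

def estimate_first_coefficients(inputs, outputs, costs, gamma=0.95):
    # Fit V(z_t) ~ theta . phi(z_t) by regressing phi(z_t) - gamma * phi(z_{t+1})
    # onto the observed one-step costs (Bellman-residual least squares).
    Z = [np.concatenate([u, y]) for u, y in zip(inputs, outputs)]
    Phi = np.array([quad_features(z) for z in Z])
    theta, *_ = np.linalg.lstsq(Phi[:-1] - gamma * Phi[1:],
                                np.asarray(costs)[:-1], rcond=None)
    return theta

def second_coefficients(theta, n_u):
    # Rebuild the symmetric matrix H of the quadratic form, then minimize the
    # form over the input block: the control law is u = -inv(H_uu) @ H_uy @ y.
    n_z = int((np.sqrt(8 * len(theta) + 1) - 1) / 2)
    H = np.zeros((n_z, n_z))
    H[np.triu_indices(n_z)] = theta
    H = (H + H.T) / 2.0
    return -np.linalg.solve(H[:n_u, :n_u], H[:n_u, n_u:])
```
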
  • Patent number: 11619915
    Abstract: A computer-implemented reinforcement learning method includes determining, based on a target probability of satisfaction of a constraint condition related to a state of a control object and a specific time within which a controller causes the state of the control object not satisfying the constraint condition to become the state of the control object satisfying the constraint condition, a parameter of a reinforcement learner that causes, with a specific probability, the state of the control object to satisfy the constraint condition at a first timing following a second timing at which the state of the control object satisfies the constraint condition; and determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: April 4, 2023
    Assignee: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Junichi Shigezumi, Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami
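
Read as pseudocode, the claim is a supervisor that hands control to a known-safe controller whenever the constraint is violated, plus a rule that sizes the learner's parameter from the target probability and the controller's recovery time. A minimal sketch; the independence-across-steps assumption and all names are mine, not the patent's:

```python
def learner_parameter(p_target, t_recover):
    # Per-step satisfaction probability needed so that the target probability
    # still holds over the t_recover steps the controller needs to restore the
    # constraint (assumes independence across steps -- a sketch, not the claim).
    return p_target ** (1.0 / t_recover)

def choose_input(state, satisfies, rl_policy, fallback_controller):
    # While the state satisfies the constraint, the reinforcement learner acts;
    # on violation, the controller takes over and drives the state back inside.
    return rl_policy(state) if satisfies(state) else fallback_controller(state)
```
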
  • Patent number: 11573537
    Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: February 7, 2023
    Assignees: FUJITSU LIMITED, OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
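
The gradient step described here can be sketched as a per-component finite-difference estimate: perturb each entry of the feedback coefficient matrix, observe the resulting TD error, and divide by the perturbation. A rough illustration; the central differences, step sizes, and the assumed td_error_of evaluator are my choices:

```python
import numpy as np

def estimate_gradient_matrix(F, td_error_of, eps=1e-2):
    # td_error_of(F) is assumed to run the policy x -> F @ x briefly and return
    # the TD error of the estimated state-value function under that gain.
    G = np.zeros_like(F)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            dF = np.zeros_like(F)
            dF[i, j] = eps
            G[i, j] = (td_error_of(F + dF) - td_error_of(F - dF)) / (2 * eps)
    return G

def update_gain(F, td_error_of, alpha=0.1):
    # Gradient step on the feedback coefficient matrix.
    return F - alpha * estimate_gradient_matrix(F, td_error_of)
```
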
  • Patent number: 11543789
    Abstract: A reinforcement learning method executed by a computer includes calculating a degree of risk for a state of a controlled object at a current time point with respect to a constraint condition related to the state of the controlled object, the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point, the predicted value being obtained from model information defining a relationship between the state of the controlled object and a control input to the controlled object; and determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: January 3, 2023
    Assignee: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
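
The risk computation and range narrowing might look like the following: roll a model forward, score how close the predicted trajectory comes to the constraint boundary, and shrink the admissible input range as that score grows. The linear model, horizon, and linear shrink rule are assumptions of this sketch, not the patent's formulation:

```python
import numpy as np

def degree_of_risk(x, u_nominal, A, B, x_limit, horizon=5):
    # Predict x_{k+1} = A x_k + B u over a short horizon and score how close
    # the predicted state comes to the constraint boundary (1.0 = at the limit).
    risk, xk = 0.0, np.asarray(x, dtype=float)
    for _ in range(horizon):
        xk = A @ xk + B @ u_nominal
        risk = max(risk, float(np.max(np.abs(xk) / x_limit)))
    return min(risk, 1.0)

def input_range(u_max, risk):
    # The admissible range for the control input narrows as the risk grows.
    return (1.0 - risk) * u_max
```
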
  • Patent number: 11385604
    Abstract: A policy improvement method of improving a policy of reinforcement learning by a state value function is executed by a computer and includes adding a plurality of perturbations to a plurality of components of a first parameter of the policy; estimating a gradient function of the state value function with respect to the first parameter, based on a result of an input determination performed for a control target in the reinforcement learning, the input determination being performed by using the policy that uses a second parameter obtained by adding the plurality of perturbations to the plurality of components; and updating the first parameter based on the estimated gradient function.
    Type: Grant
    Filed: March 5, 2020
    Date of Patent: July 12, 2022
    Assignee: FUJITSU LIMITED
    Inventor: Tomotake Sasaki
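
Perturbing every component of the policy parameter at once and estimating the gradient from the outcome is the pattern of simultaneous-perturbation stochastic approximation (SPSA). A minimal sketch under that reading; rollout_value and the step sizes are assumptions of mine:

```python
import numpy as np

def perturbation_policy_update(theta, rollout_value, alpha=0.01, eps=0.05, rng=None):
    # Add +/- eps perturbations to all components of the policy parameter,
    # evaluate the perturbed policies via rollout_value (assumed to return the
    # observed state value), and form an elementwise SPSA-style gradient estimate.
    rng = rng or np.random.default_rng()
    delta = eps * rng.choice([-1.0, 1.0], size=theta.shape)
    v_plus = rollout_value(theta + delta)
    v_minus = rollout_value(theta - delta)
    grad_est = (v_plus - v_minus) / (2.0 * delta)
    return theta + alpha * grad_est   # ascend the estimated state value
```
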
  • Patent number: 11137817
    Abstract: A non-transitory, computer-readable recording medium stores therein an arrangement search program that causes a computer, which searches for an arrangement of a plurality of virtual machines in a plurality of servers in a facility, to execute a process that includes setting an initial value of a parameter concerning the arrangement of the plurality of virtual machines in the plurality of servers, based on at least one of first performance information on power consumption of the plurality of servers, second performance information on power consumption of air conditioning equipment installed in the facility, third performance information on power consumption of power source equipment installed in the facility, and heat coupling information on heat coupling among the plurality of servers and between the plurality of servers and the air conditioning equipment; and updating the parameter by a sequential parameter estimation method so as to optimize power consumption of the overall facility.
    Type: Grant
    Filed: September 24, 2018
    Date of Patent: October 5, 2021
    Assignee: FUJITSU LIMITED
    Inventor: Tomotake Sasaki
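
As a rough illustration of the search loop, here is a greedy single-VM-move improvement over a facility power model; greedy local search stands in for the patent's sequential parameter estimation, and power_of is an assumed callable covering server, HVAC, and power-source consumption, not the patent's model:

```python
import numpy as np

def search_arrangement(power_of, n_vms, n_servers, iters=1000, seed=0):
    # Start from a random initial placement (in the patent the initial value
    # comes from server/HVAC/PSU performance and heat-coupling information),
    # then keep any single-VM move that lowers the modeled facility power.
    rng = np.random.default_rng(seed)
    placement = rng.integers(0, n_servers, size=n_vms)  # parameter: VM -> server
    best = power_of(placement)
    for _ in range(iters):
        cand = placement.copy()
        cand[rng.integers(n_vms)] = rng.integers(n_servers)  # move one VM
        p = power_of(cand)
        if p < best:
            placement, best = cand, p
    return placement, best
```
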
  • Publication number: 20210109491
    Abstract: A policy improvement method for reinforcement learning using a state value function, the method including: calculating, when an immediate cost or immediate reward of a control target in the reinforcement learning is defined by a state and an input, an estimated parameter that estimates a parameter of the state value function for the state of the control target; contracting a state space of the control target using the calculated estimated parameter; generating a TD error for the estimated state value function that estimates the state value function in the contracted state space of the control target by perturbing each parameter that defines the policy; generating an estimated gradient that estimates the gradient of the state value function with respect to the parameter that defines the policy, based on the generated TD error and the perturbation; and updating the parameter that defines the policy using the generated estimated gradient.
    Type: Application
    Filed: September 29, 2020
    Publication date: April 15, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Junichi Shigezumi, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
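
The contraction step, shrinking the state space with the estimated value-function parameter before running the perturbation-based gradient estimate, could be sketched as keeping only the eigendirections of the estimated quadratic parameter that carry non-negligible weight. The eigenvalue test and threshold are assumptions of this sketch:

```python
import numpy as np

def contract_state_space(P_hat, tol=1e-6):
    # P_hat: symmetric matrix of the estimated quadratic state-value function.
    # Keep only eigendirections with non-negligible eigenvalues; states are
    # then projected into the lower-dimensional (contracted) coordinates.
    w, V = np.linalg.eigh(P_hat)
    T = V[:, np.abs(w) > tol]
    return lambda x: T.T @ x   # map a state into the contracted space
```
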
  • Publication number: 20210063974
    Abstract: A method for reinforcement learning performed by a computer is disclosed. The method includes: predicting a state of a target to be controlled in reinforcement learning at each time point at which a state of the target is measured, the time point being included in a period from a time point at which a present action is determined to a time point at which a subsequent action is determined; calculating a degree of risk concerning the state of the target at each time point with respect to a constraint condition, based on a result of the prediction; specifying a search range concerning the present action for the target in accordance with the calculated degree of risk and a degree of impact of the present action on the state of the target at each time point; and determining the present action for the target based on the specified search range.
    Type: Application
    Filed: August 25, 2020
    Publication date: March 4, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
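
Relative to the granted patent above (11543789), this publication also weighs how strongly the present action can move the constrained state. A minimal sketch of a range rule that uses both quantities; the product form and the floor are my assumptions:

```python
import numpy as np

def search_range(u_max, risk, impact, floor=0.05):
    # Shrink the action search range when the predicted risk is high AND the
    # present action has a large impact on the constrained state; keep a small
    # floor so exploration never stops entirely.
    shrink = 1.0 - np.clip(risk * impact, 0.0, 1.0 - floor)
    return shrink * u_max
```
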
  • Publication number: 20210049486
    Abstract: A policy improvement method of improving a policy of reinforcement learning based on a state value function is performed by a computer. The method includes: calculating an input to a control target based on the policy and a predetermined exploration method of exploring for an input to the control target in the reinforcement learning; and updating a parameter of the policy based on a result of applying the calculated input to the control target, using the input to the control target and a generalized inverse matrix regarding a state of the control target.
    Type: Application
    Filed: August 11, 2020
    Publication date: February 18, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Tomotake Sasaki, Hidenao Iwane
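
The generalized inverse suggests a least-squares-style parameter update. As a minimal sketch, a linear policy u = F x can be refit in one shot from logged states and the inputs actually applied, using the Moore-Penrose pseudoinverse; the data layout is an assumption of mine:

```python
import numpy as np

def refit_policy_gain(X, U):
    # X: states as columns, shape (n_x, T); U: applied inputs, shape (n_u, T).
    # Least-squares fit of u = F x via the Moore-Penrose generalized inverse.
    return U @ np.linalg.pinv(X)
```
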
  • Publication number: 20200285204
    Abstract: A computer-implemented reinforcement learning method includes determining, based on a target probability of satisfaction of a constraint condition related to a state of a control object and a specific time within which a controller causes the state of the control object not satisfying the constraint condition to become the state of the control object satisfying the constraint condition, a parameter of a reinforcement learner that causes, with a specific probability, the state of the control object to satisfy the constraint condition at a first timing following a second timing at which the state of the control object satisfies the constraint condition; and determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
    Type: Application
    Filed: February 21, 2020
    Publication date: September 10, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Junichi Shigezumi, Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami
  • Publication number: 20200285205
    Abstract: A policy improvement method of improving a policy of reinforcement learning by a state value function is executed by a computer and includes adding a plurality of perturbations to a plurality of components of a first parameter of the policy; estimating a gradient function of the state value function with respect to the first parameter, based on a result of an input determination performed for a control target in the reinforcement learning, the input determination being performed by using the policy that uses a second parameter obtained by adding the plurality of perturbations to the plurality of components; and updating the first parameter based on the estimated gradient function.
    Type: Application
    Filed: March 5, 2020
    Publication date: September 10, 2020
    Applicant: FUJITSU LIMITED
    Inventor: Tomotake Sasaki
  • Publication number: 20200285208
    Abstract: A reinforcement learning method executed by a computer includes calculating a degree of risk for a state of a controlled object at a current time point with respect to a constraint condition related to the state of the controlled object, the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point, the predicted value being obtained from model information defining a relationship between the state of the controlled object and a control input to the controlled object; and determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
    Type: Application
    Filed: February 21, 2020
    Publication date: September 10, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
  • Publication number: 20200184277
    Abstract: A reinforcement learning method is performed by a computer. The method includes: acquiring an input value related to a state and an action of a control target, and a gain of the control target that corresponds to the input value; estimating coefficients of a state-action value function that is a polynomial in a variable representing the action of the control target, or that becomes such a polynomial when a value is substituted for a variable representing the state of the control target, based on the acquired input value and the gain; and obtaining an optimum action or an optimum value of the state-action value function with the estimated coefficients by using quantifier elimination.
    Type: Application
    Filed: December 4, 2019
    Publication date: June 11, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Tomotake Sasaki, Hitoshi Yanami
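
Quantifier elimination over a polynomial state-action value function is hard to reproduce briefly; as a stand-in, once the state is substituted, the optimal action of a univariate polynomial Q(a) can be found from its stationary points. A sketch using sympy; this calculus shortcut replaces, and is strictly weaker than, true quantifier elimination:

```python
import sympy as sp

a = sp.symbols('a', real=True)

def best_action(q_poly):
    # q_poly: Q as a sympy polynomial in the action a, state already substituted.
    # Solve dQ/da = 0 and pick the real critical point with the largest value
    # (assumes Q is bounded above, e.g. even degree with negative leading term).
    critical = [c for c in sp.solve(sp.diff(q_poly, a), a) if c.is_real]
    return max(critical, key=lambda c: q_poly.subs(a, c))

# e.g. best_action(-(a - 1)**2 + 3) returns 1
```
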
  • Patent number: 10310587
    Abstract: A power-supply control apparatus includes a processor that executes a process. The process includes calculating, for a first time period, a first predictive value of total power consumption by the power-supply control apparatus and one or more other power-supply control apparatuses to which power is supplied from a power supply; and determining whether to allow a storage battery to be charged in the first time period based on the first predictive value for the first time period and previous information that is related to the first predictive value and obtained in a second time period before the first time period.
    Type: Grant
    Filed: October 26, 2015
    Date of Patent: June 4, 2019
    Assignees: FUJITSU LIMITED, THE UNIVERSITY OF TOKYO
    Inventors: Tomotake Sasaki, Hitoshi Yanami, Junji Kaneko, Shinji Hara
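
The charging decision can be read as valley filling: allow the battery to charge in a time slot only when the predicted total draw is low relative to earlier predictions and within the shared supply capacity. A toy decision rule under that reading; the thresholds and comparison are assumptions, not the patent's rule:

```python
def allow_charging(pred_total, past_preds, capacity, charge_load):
    # Allow charging if the predicted total draw plus the charging load stays
    # under the shared supply capacity AND the current prediction is below the
    # average of earlier predictions (i.e., demand is in a valley).
    typical = sum(past_preds) / len(past_preds)
    return pred_total + charge_load <= capacity and pred_total <= typical
```
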
  • Publication number: 20190086876
    Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
    Type: Application
    Filed: September 13, 2018
    Publication date: March 21, 2019
    Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
  • Publication number: 20190087751
    Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function, which is represented in a quadratic form of inputs at times earlier than a present time and outputs at the present time and the earlier times, the first coefficients being estimated based on the inputs at the earlier times, the outputs at the present time and the earlier times, and costs or rewards that correspond to the inputs at the earlier times; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients.
    Type: Application
    Filed: September 13, 2018
    Publication date: March 21, 2019
    Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
  • Publication number: 20190025899
    Abstract: A non-transitory, computer-readable recording medium stores therein an arrangement search program that causes a computer, which searches for an arrangement of a plurality of virtual machines in a plurality of servers in a facility, to execute a process that includes setting an initial value of a parameter concerning the arrangement of the plurality of virtual machines in the plurality of servers, based on at least one of first performance information on power consumption of the plurality of servers, second performance information on power consumption of air conditioning equipment installed in the facility, third performance information on power consumption of power source equipment installed in the facility, and heat coupling information on heat coupling among the plurality of servers and between the plurality of servers and the air conditioning equipment; and updating the parameter by a sequential parameter estimation method so as to optimize power consumption of the overall facility.
    Type: Application
    Filed: September 24, 2018
    Publication date: January 24, 2019
    Applicant: FUJITSU LIMITED
    Inventor: Tomotake Sasaki
  • Patent number: 9893629
    Abstract: A control method for a switching power supply circuit causes a processor to execute a process that includes: calculating a differential value between an output voltage of the switching power supply circuit and a target voltage; multiplying the differential value by a first coefficient to calculate a correction value; correcting a first detection value of an output current of the switching power supply circuit, which is detected by a current transformer circuit, based on the correction value, to generate a second detection value; comparing the second detection value with a threshold current value to determine whether or not an overcurrent has occurred; and reducing, when it is determined that the overcurrent has occurred, the output voltage of the switching power supply circuit.
    Type: Grant
    Filed: September 23, 2016
    Date of Patent: February 13, 2018
    Assignee: FUJITSU LIMITED
    Inventors: Yu Yonezawa, Tomotake Sasaki, Hisato Hosoyama, Yoshiyasu Nakashima
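
The claim chains four arithmetic steps, which can be sketched directly; the sign with which the correction is applied to the current reading is an assumption of this sketch:

```python
def overcurrent_detected(v_out, v_target, i_ct, k1, i_threshold):
    # Correct the current-transformer reading by a term proportional to the
    # output-voltage error, then compare the corrected value to the threshold.
    correction = k1 * (v_out - v_target)   # first coefficient * differential value
    i_corrected = i_ct + correction        # second detection value
    return i_corrected > i_threshold       # True -> reduce the output voltage
```
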
  • Patent number: 9800145
    Abstract: A control device includes a processor that executes a process including generating a driving signal that drives a switching device, so that an output voltage of a converter circuit that performs a step-down conversion on input power by driving the switching device matches a target value, and modifying the target value so that, as an output current of the converter circuit becomes lower, the output voltage becomes closer to an upper limit value of the output voltage.
    Type: Grant
    Filed: May 12, 2015
    Date of Patent: October 24, 2017
    Assignee: FUJITSU LIMITED
    Inventors: Tomotake Sasaki, Yu Yonezawa, Junji Kaneko, Yoshiyasu Nakashima
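
One way to realize "the output voltage approaches the upper limit as the output current falls" is a droop-style linear interpolation between the nominal target and the upper limit; the linear shape is my assumption, not the patent's rule:

```python
def modified_target(i_out, i_rated, v_nominal, v_upper):
    # At full rated current, regulate to v_nominal; as the output current falls
    # toward zero, move the regulation target linearly up toward v_upper
    # (e.g. to compensate line drop at light load).
    frac = max(0.0, min(1.0, i_out / i_rated))
    return v_upper - frac * (v_upper - v_nominal)
```
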
  • Patent number: 9774249
    Abstract: A power supply apparatus includes: an inductor to which an input voltage is applied; a switching element that switches a current flowing to the inductor on and off so as to cause an induced voltage to be generated; an electrolytic capacitor that smooths the induced voltage and outputs the voltage to a load; and a control circuit that controls the switching element. The control circuit outputs a second control signal obtained by superimposing a degradation detection-purpose signal for detecting degradation of the electrolytic capacitor on a first control signal, detects an output voltage produced by the switching of the switching element under control of the second control signal, and estimates the degradation of the electrolytic capacitor by using the detected output voltage, a duty cycle of the first control signal, and a frequency component of the degradation detection-purpose signal contained in the detected output voltage.
    Type: Grant
    Filed: June 23, 2015
    Date of Patent: September 26, 2017
    Assignee: FUJITSU LIMITED
    Inventors: Yoshinobu Matsui, Hisato Hosoyama, Tomotake Sasaki
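
Extracting "the frequency component of the degradation detection-purpose signal contained in the output voltage" can be sketched as single-bin demodulation of the sampled output at the probe frequency; a rising amplitude there tracks growing capacitor ESR. The sampling details are assumptions of this sketch:

```python
import numpy as np

def probe_amplitude(v_samples, fs, f_probe):
    # Correlate the sampled output voltage with a complex tone at the probe
    # frequency (single-bin DFT). For v = A*cos(2*pi*f_probe*t) this returns
    # approximately A when the window spans whole periods of the probe tone.
    t = np.arange(len(v_samples)) / fs
    return 2.0 * abs(np.mean(v_samples * np.exp(-2j * np.pi * f_probe * t)))
```
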