Patents by Inventor Tomotake SASAKI

Tomotake Sasaki has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11645574
    Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function, which is represented in a quadratic form of inputs at times earlier than a present time and outputs at the present time and the earlier times, the first coefficients being estimated based on the inputs at the earlier times, the outputs at the present time and the earlier times, and costs or rewards that correspond to the inputs at the earlier times; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: May 9, 2023
    Assignees: FUJITSU LIMITED, OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
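
The two steps in this abstract map naturally onto a least-squares pattern: estimate the coefficients of a quadratic value function from logged inputs, outputs, and costs, then read a linear control law off those coefficients. A minimal sketch of that pattern, assuming a Bellman-residual regression with discount factor gamma and a block partition of the quadratic form into input and output parts; these choices and all names are illustrative, not taken from the patent:

```python
import numpy as np

def quad_features(z):
    # Monomials z_i * z_j (i <= j): the free coefficients of a quadratic form.
    i, j = np.triu_indices(len(z))
    return np.outer(z, z)[i, j]

def estimate_first_coefficients(inputs, outputs, costs, gamma=0.95):
    # Fit V(z_t) ~ theta . phi(z_t) by regressing phi(z_t) - gamma * phi(z_{t+1})
    # onto the observed one-step costs (Bellman-residual least squares).
    Z = [np.concatenate([u, y]) for u, y in zip(inputs, outputs)]
    Phi = np.array([quad_features(z) for z in Z])
    theta, *_ = np.linalg.lstsq(Phi[:-1] - gamma * Phi[1:],
                                np.asarray(costs)[:-1], rcond=None)
    return theta

def second_coefficients(theta, n_u):
    # Rebuild the symmetric matrix H of the quadratic form, then minimize the
    # form over the input block: the control law is u = -inv(H_uu) @ H_uy @ y.
    n_z = int((np.sqrt(8 * len(theta) + 1) - 1) / 2)
    H = np.zeros((n_z, n_z))
    H[np.triu_indices(n_z)] = theta
    H = (H + H.T) / 2.0
    return -np.linalg.solve(H[:n_u, :n_u], H[:n_u, n_u:])
```
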
  • Patent number: 11619915
    Abstract: A computer-implemented reinforcement learning method includes determining, based on a target probability of satisfaction of a constraint condition related to a state of a control object and a specific time within which a controller causes the state of the control object not satisfying the constraint condition to become the state of the control object satisfying the constraint condition, a parameter of a reinforcement learner that causes, with a specific probability, the state of the control object to satisfy the constraint condition at a first timing following a second timing at which the state of the control object satisfies the constraint condition; and determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: April 4, 2023
    Assignee: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Junichi Shigezumi, Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami
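
Read as pseudocode, the claim is a supervisor that hands control to a known-safe controller whenever the constraint is violated, plus a rule that sizes the learner's parameter from the target probability and the controller's recovery time. A minimal sketch; the independence-across-steps assumption and all names are mine, not the patent's:

```python
def learner_parameter(p_target, t_recover):
    # Per-step satisfaction probability needed so that the target probability
    # still holds over the t_recover steps the controller needs to restore the
    # constraint (assumes independence across steps -- a sketch, not the claim).
    return p_target ** (1.0 / t_recover)

def choose_input(state, satisfies, rl_policy, fallback_controller):
    # While the state satisfies the constraint, the reinforcement learner acts;
    # on violation, the controller takes over and drives the state back inside.
    return rl_policy(state) if satisfies(state) else fallback_controller(state)
```
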
  • Patent number: 11573537
    Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: February 7, 2023
    Assignees: FUJITSU LIMITED, OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
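
The gradient step described here can be sketched as a per-component finite-difference estimate: perturb each entry of the feedback coefficient matrix, observe the resulting TD error, and divide by the perturbation. A rough illustration; the central differences, step sizes, and the assumed td_error_of evaluator are my choices:

```python
import numpy as np

def estimate_gradient_matrix(F, td_error_of, eps=1e-2):
    # td_error_of(F) is assumed to run the policy x -> F @ x briefly and return
    # the TD error of the estimated state-value function under that gain.
    G = np.zeros_like(F)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            dF = np.zeros_like(F)
            dF[i, j] = eps
            G[i, j] = (td_error_of(F + dF) - td_error_of(F - dF)) / (2 * eps)
    return G

def update_gain(F, td_error_of, alpha=0.1):
    # Gradient step on the feedback coefficient matrix.
    return F - alpha * estimate_gradient_matrix(F, td_error_of)
```
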
  • Patent number: 11543789
    Abstract: A reinforcement learning method executed by a computer includes calculating a degree of risk for a state of a controlled object at a current time point with respect to a constraint condition related to the state of the controlled object, the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point, the predicted value being obtained from model information defining a relationship between the state of the controlled object and a control input to the controlled object; and determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: January 3, 2023
    Assignee: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
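
The risk computation and range narrowing might look like the following: roll a model forward, score how close the predicted trajectory comes to the constraint boundary, and shrink the admissible input range as that score grows. The linear model, horizon, and linear shrink rule are assumptions of this sketch, not the patent's formulation:

```python
import numpy as np

def degree_of_risk(x, u_nominal, A, B, x_limit, horizon=5):
    # Predict x_{k+1} = A x_k + B u over a short horizon and score how close
    # the predicted state comes to the constraint boundary (1.0 = at the limit).
    risk, xk = 0.0, np.asarray(x, dtype=float)
    for _ in range(horizon):
        xk = A @ xk + B @ u_nominal
        risk = max(risk, float(np.max(np.abs(xk) / x_limit)))
    return min(risk, 1.0)

def input_range(u_max, risk):
    # The admissible range for the control input narrows as the risk grows.
    return (1.0 - risk) * u_max
```
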
  • Patent number: 11385604
    Abstract: A policy improvement method of improving a policy of reinforcement learning by a state value function is executed by a computer and includes adding a plurality of perturbations to a plurality of components of a first parameter of the policy; estimating a gradient function of the state value function with respect to the first parameter, based on a result of an input determination performed for a control target in the reinforcement learning, the input determination being performed by using the policy that uses a second parameter obtained by adding the plurality of perturbations to the plurality of components; and updating the first parameter based on the estimated gradient function.
    Type: Grant
    Filed: March 5, 2020
    Date of Patent: July 12, 2022
    Assignee: FUJITSU LIMITED
    Inventor: Tomotake Sasaki
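
Perturbing every component of the policy parameter at once and estimating the gradient from the outcome is the pattern of simultaneous-perturbation stochastic approximation (SPSA). A minimal sketch under that reading; rollout_value and the step sizes are assumptions of mine:

```python
import numpy as np

def perturbation_policy_update(theta, rollout_value, alpha=0.01, eps=0.05, rng=None):
    # Add +/- eps perturbations to all components of the policy parameter,
    # evaluate the perturbed policies via rollout_value (assumed to return the
    # observed state value), and form an elementwise SPSA-style gradient estimate.
    rng = rng or np.random.default_rng()
    delta = eps * rng.choice([-1.0, 1.0], size=theta.shape)
    v_plus = rollout_value(theta + delta)
    v_minus = rollout_value(theta - delta)
    grad_est = (v_plus - v_minus) / (2.0 * delta)
    return theta + alpha * grad_est   # ascend the estimated state value
```
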
  • Patent number: 11137817
    Abstract: A non-transitory, computer-readable recording medium stores therein an arrangement search program that causes a computer, which searches for an arrangement of a plurality of virtual machines in a plurality of servers in a facility, to execute a process that includes setting an initial value of a parameter concerning the arrangement of the plurality of virtual machines in the plurality of servers, based on at least one of first performance information on power consumption of the plurality of servers, second performance information on power consumption of air conditioning equipment installed in the facility, third performance information on power consumption of power source equipment installed in the facility, and heat coupling information on heat coupling among the plurality of servers and between the plurality of servers and the air conditioning equipment; and updating the parameter by a sequential parameter estimation method so as to optimize power consumption of the overall facility.
    Type: Grant
    Filed: September 24, 2018
    Date of Patent: October 5, 2021
    Assignee: FUJITSU LIMITED
    Inventor: Tomotake Sasaki
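
As a rough illustration of the search loop, here is a greedy single-VM-move improvement over a facility power model; greedy local search stands in for the patent's sequential parameter estimation, and power_of is an assumed callable covering server, HVAC, and power-source consumption, not the patent's model:

```python
import numpy as np

def search_arrangement(power_of, n_vms, n_servers, iters=1000, seed=0):
    # Start from a random initial placement (in the patent the initial value
    # comes from server/HVAC/PSU performance and heat-coupling information),
    # then keep any single-VM move that lowers the modeled facility power.
    rng = np.random.default_rng(seed)
    placement = rng.integers(0, n_servers, size=n_vms)  # parameter: VM -> server
    best = power_of(placement)
    for _ in range(iters):
        cand = placement.copy()
        cand[rng.integers(n_vms)] = rng.integers(n_servers)  # move one VM
        p = power_of(cand)
        if p < best:
            placement, best = cand, p
    return placement, best
```
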
  • Publication number: 20210109491
    Abstract: A policy improvement method for reinforcement learning using a state value function, the method including: calculating, when an immediate cost or immediate reward of a control target in the reinforcement learning is defined by a state and an input, an estimated parameter that estimates a parameter of the state value function for the state of the control target; contracting a state space of the control target using the calculated estimated parameter; generating a TD error for the estimated state value function that estimates the state value function in the contracted state space of the control target by perturbing each parameter that defines the policy; generating an estimated gradient that estimates the gradient of the state value function with respect to the parameter that defines the policy, based on the generated TD error and the perturbation; and updating the parameter that defines the policy using the generated estimated gradient.
    Type: Application
    Filed: September 29, 2020
    Publication date: April 15, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Junichi Shigezumi, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
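
The contraction step, shrinking the state space with the estimated value-function parameter before running the perturbation-based gradient estimate, could be sketched as keeping only the eigendirections of the estimated quadratic parameter that carry non-negligible weight. The eigenvalue test and threshold are assumptions of this sketch:

```python
import numpy as np

def contract_state_space(P_hat, tol=1e-6):
    # P_hat: symmetric matrix of the estimated quadratic state-value function.
    # Keep only eigendirections with non-negligible eigenvalues; states are
    # then projected into the lower-dimensional (contracted) coordinates.
    w, V = np.linalg.eigh(P_hat)
    T = V[:, np.abs(w) > tol]
    return lambda x: T.T @ x   # map a state into the contracted space
```
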
  • Publication number: 20210063974
    Abstract: A method for reinforcement learning performed by a computer is disclosed. The method includes: predicting a state of a target to be controlled in reinforcement learning at each time point at which a state of the target is measured, the time point being included in a period from a time point at which a present action is determined to a time point at which a subsequent action is determined; calculating a degree of risk concerning the state of the target at each time point with respect to a constraint condition, based on a result of the prediction; specifying a search range concerning the present action for the target in accordance with the calculated degree of risk and a degree of impact of the present action on the state of the target at each time point; and determining the present action for the target based on the specified search range.
    Type: Application
    Filed: August 25, 2020
    Publication date: March 4, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
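
Relative to the granted patent above (11543789), this publication also weighs how strongly the present action can move the constrained state. A minimal sketch of a range rule that uses both quantities; the product form and the floor are my assumptions:

```python
import numpy as np

def search_range(u_max, risk, impact, floor=0.05):
    # Shrink the action search range when the predicted risk is high AND the
    # present action has a large impact on the constrained state; keep a small
    # floor so exploration never stops entirely.
    shrink = 1.0 - np.clip(risk * impact, 0.0, 1.0 - floor)
    return shrink * u_max
```
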
  • Publication number: 20210049486
    Abstract: A policy improvement method of improving a policy of reinforcement learning based on a state value function is performed by a computer. The method includes: calculating an input to a control target based on the policy and a predetermined exploration method of exploring for an input to the control target in the reinforcement learning; and updating a parameter of the policy based on a result of applying the calculated input to the control target, using the input to the control target and a generalized inverse matrix regarding a state of the control target.
    Type: Application
    Filed: August 11, 2020
    Publication date: February 18, 2021
    Applicant: FUJITSU LIMITED
    Inventors: Tomotake Sasaki, Hidenao Iwane
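
The generalized inverse suggests a least-squares-style parameter update. As a minimal sketch, a linear policy u = F x can be refit in one shot from logged states and the inputs actually applied, using the Moore-Penrose pseudoinverse; the data layout is an assumption of mine:

```python
import numpy as np

def refit_policy_gain(X, U):
    # X: states as columns, shape (n_x, T); U: applied inputs, shape (n_u, T).
    # Least-squares fit of u = F x via the Moore-Penrose generalized inverse.
    return U @ np.linalg.pinv(X)
```
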
  • Publication number: 20200285204
    Abstract: A computer-implemented reinforcement learning method includes determining, based on a target probability of satisfaction of a constraint condition related to a state of a control object and a specific time within which a controller causes the state of the control object not satisfying the constraint condition to become the state of the control object satisfying the constraint condition, a parameter of a reinforcement learner that causes, with a specific probability, the state of the control object to satisfy the constraint condition at a first timing following a second timing at which the state of the control object satisfies the constraint condition; and determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
    Type: Application
    Filed: February 21, 2020
    Publication date: September 10, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Junichi Shigezumi, Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami
  • Publication number: 20200285205
    Abstract: A policy improvement method of improving a policy of reinforcement learning by a state value function is executed by a computer and includes adding a plurality of perturbations to a plurality of components of a first parameter of the policy; estimating a gradient function of the state value function with respect to the first parameter, based on a result of an input determination performed for a control target in the reinforcement learning, the input determination being performed by using the policy that uses a second parameter obtained by adding the plurality of perturbations to the plurality of components; and updating the first parameter based on the estimated gradient function.
    Type: Application
    Filed: March 5, 2020
    Publication date: September 10, 2020
    Applicant: FUJITSU LIMITED
    Inventor: Tomotake Sasaki
  • Publication number: 20200285208
    Abstract: A reinforcement learning method executed by a computer includes calculating a degree of risk for a state of a controlled object at a current time point with respect to a constraint condition related to the state of the controlled object, the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point, the predicted value being obtained from model information defining a relationship between the state of the controlled object and a control input to the controlled object; and determining the control input to the controlled object at the current time point, from a range defined according to the calculated degree of risk so that the range becomes narrower as the calculated degree of risk increases.
    Type: Application
    Filed: February 21, 2020
    Publication date: September 10, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
  • Publication number: 20200184277
    Abstract: A reinforcement learning method is performed by a computer. The method includes: acquiring an input value related to a state and an action of a control target, and a gain of the control target that corresponds to the input value; estimating coefficients of a state-action value function that is a polynomial in a variable representing the action of the control target, or that becomes such a polynomial when a value is substituted for a variable representing the state of the control target, based on the acquired input value and the gain; and obtaining an optimum action or an optimum value of the state-action value function with the estimated coefficients by using quantifier elimination.
    Type: Application
    Filed: December 4, 2019
    Publication date: June 11, 2020
    Applicant: FUJITSU LIMITED
    Inventors: Hidenao Iwane, Tomotake Sasaki, Hitoshi Yanami
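
Quantifier elimination over a polynomial state-action value function is hard to reproduce briefly; as a stand-in, once the state is substituted, the optimal action of a univariate polynomial Q(a) can be found from its stationary points. A sketch using sympy; this calculus shortcut replaces, and is strictly weaker than, true quantifier elimination:

```python
import sympy as sp

a = sp.symbols('a', real=True)

def best_action(q_poly):
    # q_poly: Q as a sympy polynomial in the action a, state already substituted.
    # Solve dQ/da = 0 and pick the real critical point with the largest value
    # (assumes Q is bounded above, e.g. even degree with negative leading term).
    critical = [c for c in sp.solve(sp.diff(q_poly, a), a) if c.is_real]
    return max(critical, key=lambda c: q_poly.subs(a, c))

# e.g. best_action(-(a - 1)**2 + 3) returns 1
```
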
  • Patent number: 10310587
    Abstract: A power-supply control apparatus includes a processor that executes a process. The process includes calculating, for a first time period, a first predictive value of total power consumption by the power-supply control apparatus and one or more other power-supply control apparatuses to which power is supplied from a power supply; and determining whether to allow a storage battery to be charged in the first time period based on the first predictive value for the first time period and previous information that is related to the first predictive value and obtained in a second time period before the first time period.
    Type: Grant
    Filed: October 26, 2015
    Date of Patent: June 4, 2019
    Assignees: FUJITSU LIMITED, THE UNIVERSITY OF TOKYO
    Inventors: Tomotake Sasaki, Hitoshi Yanami, Junji Kaneko, Shinji Hara
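
The charging decision can be read as valley filling: allow the battery to charge in a time slot only when the predicted total draw is low relative to earlier predictions and within the shared supply capacity. A toy decision rule under that reading; the thresholds and comparison are assumptions, not the patent's rule:

```python
def allow_charging(pred_total, past_preds, capacity, charge_load):
    # Allow charging if the predicted total draw plus the charging load stays
    # under the shared supply capacity AND the current prediction is below the
    # average of earlier predictions (i.e., demand is in a valley).
    typical = sum(past_preds) / len(past_preds)
    return pred_total + charge_load <= capacity and pred_total <= typical
```
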
  • Publication number: 20190086876
    Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
    Type: Application
    Filed: September 13, 2018
    Publication date: March 21, 2019
    Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
  • Publication number: 20190087751
    Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function, which is represented in a quadratic form of inputs at times earlier than a present time and outputs at the present time and the earlier times, the first coefficients being estimated based on the inputs at the earlier times, the outputs at the present time and the earlier times, and costs or rewards that correspond to the inputs at the earlier times; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients.
    Type: Application
    Filed: September 13, 2018
    Publication date: March 21, 2019
    Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
    Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
  • Publication number: 20190025899
    Abstract: A non-transitory, computer-readable recording medium stores therein an arrangement search program that causes a computer, which searches for an arrangement of a plurality of virtual machines in a plurality of servers in a facility, to execute a process that includes setting an initial value of a parameter concerning the arrangement of the plurality of virtual machines in the plurality of servers, based on at least one of first performance information on power consumption of the plurality of servers, second performance information on power consumption of air conditioning equipment installed in the facility, third performance information on power consumption of power source equipment installed in the facility, and heat coupling information on heat coupling among the plurality of servers and between the plurality of servers and the air conditioning equipment; and updating the parameter by a sequential parameter estimation method so as to optimize power consumption of the overall facility.
    Type: Application
    Filed: September 24, 2018
    Publication date: January 24, 2019
    Applicant: FUJITSU LIMITED
    Inventor: Tomotake Sasaki
  • Patent number: 9893629
    Abstract: A control method for a switching power supply circuit causes a processor to execute a process that includes: calculating a differential value between an output voltage of the switching power supply circuit and a target voltage; multiplying the differential value by a first coefficient to calculate a correction value; correcting a first detection value of an output current of the switching power supply circuit, which is detected by a current transformer circuit, based on the correction value, to generate a second detection value; comparing the second detection value with a threshold current value to determine whether or not an overcurrent has occurred; and reducing, when it is determined that the overcurrent has occurred, the output voltage of the switching power supply circuit.
    Type: Grant
    Filed: September 23, 2016
    Date of Patent: February 13, 2018
    Assignee: FUJITSU LIMITED
    Inventors: Yu Yonezawa, Tomotake Sasaki, Hisato Hosoyama, Yoshiyasu Nakashima
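
The claim chains four arithmetic steps, which can be sketched directly; the sign with which the correction is applied to the current reading is an assumption of this sketch:

```python
def overcurrent_detected(v_out, v_target, i_ct, k1, i_threshold):
    # Correct the current-transformer reading by a term proportional to the
    # output-voltage error, then compare the corrected value to the threshold.
    correction = k1 * (v_out - v_target)   # first coefficient * differential value
    i_corrected = i_ct + correction        # second detection value
    return i_corrected > i_threshold       # True -> reduce the output voltage
```
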
  • Patent number: 9800145
    Abstract: A control device includes a processor that executes a process including generating a driving signal that drives a switching device, so that an output voltage of a converter circuit that performs a step-down conversion on input power by driving the switching device matches a target value, and modifying the target value so that, as an output current of the converter circuit becomes lower, the output voltage becomes closer to an upper limit value of the output voltage.
    Type: Grant
    Filed: May 12, 2015
    Date of Patent: October 24, 2017
    Assignee: FUJITSU LIMITED
    Inventors: Tomotake Sasaki, Yu Yonezawa, Junji Kaneko, Yoshiyasu Nakashima
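
One way to realize "the output voltage approaches the upper limit as the output current falls" is a droop-style linear interpolation between the nominal target and the upper limit; the linear shape is my assumption, not the patent's rule:

```python
def modified_target(i_out, i_rated, v_nominal, v_upper):
    # At full rated current, regulate to v_nominal; as the output current falls
    # toward zero, move the regulation target linearly up toward v_upper
    # (e.g. to compensate line drop at light load).
    frac = max(0.0, min(1.0, i_out / i_rated))
    return v_upper - frac * (v_upper - v_nominal)
```
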
  • Patent number: 9774249
    Abstract: A power supply apparatus includes: an inductor to which an input voltage is applied; a switching element that switches a current flowing to the inductor on and off so as to cause an induced voltage to be generated; an electrolytic capacitor that smooths the induced voltage and outputs the voltage to a load; and a control circuit that controls the switching element. The control circuit outputs a second control signal obtained by superimposing a degradation detection-purpose signal for detecting degradation of the electrolytic capacitor on a first control signal, detects an output voltage produced by the switching of the switching element under control of the second control signal, and estimates the degradation of the electrolytic capacitor by using the detected output voltage, a duty cycle of the first control signal, and a frequency component of the degradation detection-purpose signal contained in the detected output voltage.
    Type: Grant
    Filed: June 23, 2015
    Date of Patent: September 26, 2017
    Assignee: FUJITSU LIMITED
    Inventors: Yoshinobu Matsui, Hisato Hosoyama, Tomotake Sasaki
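
Extracting "the frequency component of the degradation detection-purpose signal contained in the output voltage" can be sketched as single-bin demodulation of the sampled output at the probe frequency; a rising amplitude there tracks growing capacitor ESR. The sampling details are assumptions of this sketch:

```python
import numpy as np

def probe_amplitude(v_samples, fs, f_probe):
    # Correlate the sampled output voltage with a complex tone at the probe
    # frequency (single-bin DFT). For v = A*cos(2*pi*f_probe*t) this returns
    # approximately A when the window spans whole periods of the probe tone.
    t = np.arange(len(v_samples)) / fs
    return 2.0 * abs(np.mean(v_samples * np.exp(-2j * np.pi * f_probe * t)))
```
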