Patents by Inventor Hitoshi Yanami
Hitoshi Yanami has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11645574
Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function represented in a quadratic form of inputs at times earlier than a present time and outputs at the present time and those earlier times, the first coefficients being estimated based on the inputs at the earlier times, the outputs at the present time and the earlier times, and costs or rewards that correspond to the inputs at the earlier times; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients.
Type: Grant
Filed: September 13, 2018
Date of Patent: May 9, 2023
Assignees: FUJITSU LIMITED (Kawasaki, Japan), OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
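The first step above, estimating the quadratic-form coefficients of the value function from past inputs, outputs, and their costs, reduces to a regression problem. A minimal sketch in a made-up scalar setting with one past input u and one current output y (the coefficients and data are illustrative, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical value function in quadratic form of the previous input u
# and the current output y: V(u, y) = c1*u^2 + c2*u*y + c3*y^2.
true_c = np.array([2.0, -1.0, 0.5])

# Observed past inputs, outputs, and the costs they incurred.
U = rng.normal(size=200)
Y = rng.normal(size=200)
Phi = np.column_stack([U**2, U * Y, Y**2])  # quadratic-form features
costs = Phi @ true_c                        # noiseless for illustration

# Estimate the first coefficients by least squares over the observed data.
c_hat, *_ = np.linalg.lstsq(Phi, costs, rcond=None)
print(np.round(c_hat, 3))
```

With noiseless synthetic data the fit recovers the generating coefficients exactly; in the patented setting the same regression would run over the full quadratic form of multiple lagged inputs and outputs.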
-
Publication number: 20230129842
Abstract: A non-transitory computer-readable recording medium stores a program for causing a computer to execute a process, the process including: generating a plurality of causal relationship candidates, each including a pair of a first answer candidate to each of at least some of first questions and a second answer candidate to one of second questions, based on questionnaire result data; and searching for a solution of a combinatorial optimization problem that minimizes or maximizes the value of an objective function whose value changes according to the causal relationship candidates to be combined, under a constraint condition that at least a predetermined ratio of the plurality of respondents have answers that are the same as the pair of the first answer candidate and the second answer candidate of one of the causal relationship candidates to be combined.
Type: Application
Filed: August 19, 2022
Publication date: April 27, 2023
Applicant: FUJITSU LIMITED
Inventors: Kazuhiro Matsumoto, Masatoshi Ogawa, Hitoshi Yanami, Noriyasu Aso, Hiromitsu Soneda, Katsumi Homma, Natsuki Ishikawa, Hayato Dan
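At toy scale the constrained search above can be brute-forced. A sketch under assumed data, where the objective simply counts the candidates used and the constraint requires that at least 80% of respondents match some combined candidate pair (all answers and numbers are invented for illustration):

```python
from itertools import combinations

# Hypothetical questionnaire results: each respondent's (Q1, Q2) answer pair.
respondents = [("yes", "high"), ("yes", "high"), ("no", "low"),
               ("yes", "low"), ("no", "low"), ("yes", "high")]
# Causal-relationship candidates: (first answer, second answer) pairs.
candidates = [("yes", "high"), ("no", "low"), ("yes", "low")]
min_ratio = 0.8  # predetermined ratio of respondents that must match

def coverage(combo):
    # Fraction of respondents whose answer pair equals some candidate in combo.
    hit = sum(any(r == c for c in combo) for r in respondents)
    return hit / len(respondents)

# Exhaustive search for the smallest combination (objective = size)
# that satisfies the coverage constraint.
best = None
for k in range(1, len(candidates) + 1):
    for combo in combinations(candidates, k):
        if coverage(combo) >= min_ratio and best is None:
            best = combo
    if best:
        break
print(best)
</n:remove>```

Real questionnaires would need the combinatorial solver the abstract describes rather than exhaustive enumeration; the sketch only shows the constraint and objective interacting.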
-
Patent number: 11619915
Abstract: A computer-implemented reinforcement learning method includes: determining a parameter of a reinforcement learner that causes, with a specific probability, the state of a control object to satisfy a constraint condition at a first timing following a second timing at which the state of the control object satisfies the constraint condition, the parameter being determined based on a target probability of satisfying the constraint condition related to the state of the control object and a specific time within which a controller brings a state of the control object that does not satisfy the constraint condition to one that does; and determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
Type: Grant
Filed: February 21, 2020
Date of Patent: April 4, 2023
Assignee: FUJITSU LIMITED
Inventors: Hidenao Iwane, Junichi Shigezumi, Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami
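The second determining step, switching between the reinforcement learner and the fallback controller according to constraint satisfaction, can be sketched for a one-dimensional state. The dynamics, the constraint bound, the controller gain, and the fixed exploratory input are all assumptions for illustration:

```python
LIMIT = 1.0  # constraint condition: |state| <= LIMIT

def reinforcement_learner(state):
    # Stand-in for the learned policy: a fixed exploratory input.
    return 0.4

def fallback_controller(state):
    # Input that contracts the state to 0.2 of its value, restoring
    # constraint satisfaction within one step (the "specific time").
    return -0.8 * state

def control_input(state):
    # Dispatch rule from the abstract: use the learner while the constraint
    # holds, hand control to the fallback controller otherwise.
    return reinforcement_learner(state) if abs(state) <= LIMIT else fallback_controller(state)

state = 2.0  # start in violation of the constraint
trajectory = [state]
for _ in range(10):
    state += control_input(state)  # integrator dynamics x' = x + u (assumed)
    trajectory.append(state)
print([round(s, 3) for s in trajectory])
```

Whenever exploration pushes the state past the bound, the next input comes from the fallback controller and the state is back inside the constraint one step later.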
-
Patent number: 11573537
Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including: calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating the gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
Type: Grant
Filed: September 13, 2018
Date of Patent: February 7, 2023
Assignees: FUJITSU LIMITED, OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
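For a scalar linear system with quadratic cost, the perturbation-based update of the feedback coefficient can be illustrated directly on the closed-form value coefficient. This sketch replaces the TD-error gradient estimate with an exact finite-difference gradient, so it mirrors only the update structure; the system parameters are invented:

```python
# Scalar linear system x' = a*x + b*u with policy u = -f*x and
# immediate cost q*x^2 + r*u^2 (the quadratic-form setting of the abstract).
a, b, q, r = 1.1, 1.0, 1.0, 1.0

def value_coeff(f):
    """Coefficient p of the state-value function V(x) = p*x^2 under the
    closed loop x' = (a - b*f)*x."""
    closed = a - b * f
    assert abs(closed) < 1, "feedback gain must stabilize the loop"
    return (q + r * f * f) / (1 - closed * closed)

def grad_estimate(f, eps=1e-4):
    # Perturbation-based gradient estimate (stand-in for the TD-error version).
    return (value_coeff(f + eps) - value_coeff(f - eps)) / (2 * eps)

f = 0.9  # initial stabilizing feedback coefficient
for _ in range(2000):
    f -= 0.01 * grad_estimate(f)  # gradient update of the feedback coefficient
print(round(f, 3))
```

The gain converges near the LQR-optimal feedback for this system (about 0.703); the patented method obtains the same kind of gradient from TD errors without access to the closed-form value coefficient.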
-
Patent number: 11543789
Abstract: A reinforcement learning method executed by a computer includes: calculating a degree of risk for a state of a controlled object at a current time point with respect to a constraint condition related to the state of the controlled object, the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point, the predicted value being obtained from model information defining a relationship between the state of the controlled object and a control input to the controlled object; and determining the control input to the controlled object at the current time point from a range defined according to the calculated degree of risk, so that the range becomes narrower as the calculated degree of risk increases.
Type: Grant
Filed: February 21, 2020
Date of Patent: January 3, 2023
Assignee: FUJITSU LIMITED
Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
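The range-narrowing rule can be sketched in one dimension, assuming a linear prediction model and a risk measure equal to how close the predicted state comes to its bound (all constants are illustrative):

```python
# Model information (assumed): x' = A*x + B*u.
A, B = 0.9, 1.0
X_MAX = 10.0   # constraint condition: state must stay at or below X_MAX
U_FULL = 2.0   # nominal half-width of the admissible input range

def degree_of_risk(x):
    # Predicted future state under zero input, scaled against the bound.
    predicted = A * x
    return max(0.0, min(1.0, predicted / X_MAX))

def input_range(x):
    # The range from which the control input is drawn narrows as risk grows.
    half = U_FULL * (1.0 - degree_of_risk(x))
    return (-half, half)

for x in (0.0, 5.0, 9.0):
    print(x, input_range(x))
```

A riskier current state yields a strictly tighter interval, so exploratory inputs chosen from it cannot push the predicted state far toward the constraint boundary.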
-
Publication number: 20210109491
Abstract: A policy improvement method for reinforcement learning using a state value function, the method including: calculating, when an immediate cost or immediate reward of a control target in the reinforcement learning is defined by a state and an input, an estimated parameter that estimates a parameter of the state value function for the state of the control target; contracting a state space of the control target using the calculated estimated parameter; generating a TD error for the estimated state value function that estimates the state value function in the contracted state space of the control target by perturbing each parameter that defines the policy; generating an estimated gradient that estimates the gradient of the state value function with respect to the parameter that defines the policy, based on the generated TD error and the perturbation; and updating the parameter that defines the policy using the generated estimated gradient.
Type: Application
Filed: September 29, 2020
Publication date: April 15, 2021
Applicant: FUJITSU LIMITED
Inventors: Junichi Shigezumi, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
-
Publication number: 20210063974
Abstract: A method for reinforcement learning performed by a computer is disclosed. The method includes: predicting a state of a target to be controlled in reinforcement learning at each time point at which a state of the target is measured, the time point being included in a period from the time point at which a present action is determined to the time point at which a subsequent action is determined; calculating a degree of risk concerning the state of the target at each time point with respect to a constraint condition, based on a result of the prediction; specifying a search range concerning the present action on the target in accordance with the calculated degree of risk and a degree of impact of the present action on the state of the target at each time point; and determining the present action on the target based on the specified search range.
Type: Application
Filed: August 25, 2020
Publication date: March 4, 2021
Applicant: FUJITSU LIMITED
Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
-
Publication number: 20200285204
Abstract: A computer-implemented reinforcement learning method includes: determining a parameter of a reinforcement learner that causes, with a specific probability, the state of a control object to satisfy a constraint condition at a first timing following a second timing at which the state of the control object satisfies the constraint condition, the parameter being determined based on a target probability of satisfying the constraint condition related to the state of the control object and a specific time within which a controller brings a state of the control object that does not satisfy the constraint condition to one that does; and determining a control input to the control object by either the reinforcement learner or the controller, based on whether the state of the control object satisfies the constraint condition at a specific timing.
Type: Application
Filed: February 21, 2020
Publication date: September 10, 2020
Applicant: FUJITSU LIMITED
Inventors: Hidenao Iwane, Junichi Shigezumi, Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami
-
Publication number: 20200285208
Abstract: A reinforcement learning method executed by a computer includes: calculating a degree of risk for a state of a controlled object at a current time point with respect to a constraint condition related to the state of the controlled object, the degree of risk being calculated based on a predicted value of the state of the controlled object at a future time point, the predicted value being obtained from model information defining a relationship between the state of the controlled object and a control input to the controlled object; and determining the control input to the controlled object at the current time point from a range defined according to the calculated degree of risk, so that the range becomes narrower as the calculated degree of risk increases.
Type: Application
Filed: February 21, 2020
Publication date: September 10, 2020
Applicant: FUJITSU LIMITED
Inventors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane, Hitoshi Yanami
-
Publication number: 20200234123
Abstract: A reinforcement learning method executed by a computer includes: calculating, in reinforcement learning that repeatedly executes a learning step for a value function having monotonicity as a characteristic of its value according to a state or an action of a control target, a contribution level of the state or the action of the control target used in the learning step, the contribution level being calculated for each learning step using a basis function used for representing the value function; determining whether to update the value function, based on the value function after each learning step and the contribution level calculated in each learning step; and updating the value function when the determining determines to update the value function.
Type: Application
Filed: January 16, 2020
Publication date: July 23, 2020
Applicant: FUJITSU LIMITED
Inventors: Junichi Shigezumi, Hidenao Iwane, Hitoshi Yanami
-
Publication number: 20200184277
Abstract: A reinforcement learning method is performed by a computer. The method includes: acquiring an input value related to a state and an action of a control target, and a gain of the control target that corresponds to the input value; estimating coefficients of a state-action value function that is a polynomial in a variable representing the action of the control target, or becomes such a polynomial when a value is substituted for a variable representing the state of the control target, based on the acquired input value and the gain; and obtaining an optimum action or an optimum value of the state-action value function with the estimated coefficients by using quantifier elimination.
Type: Application
Filed: December 4, 2019
Publication date: June 11, 2020
Applicant: FUJITSU LIMITED
Inventors: Hidenao Iwane, Tomotake Sasaki, Hitoshi Yanami
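Once a state value is substituted, the state-action value function is a univariate polynomial in the action, and the optimum the abstract obtains via quantifier elimination can, in this special case, be read off the real critical points. A sketch with an invented quartic Q (real quantifier elimination needs a computer algebra system and also covers multivariate and fully symbolic cases):

```python
import numpy as np

# Hypothetical state-action value function, polynomial in the action a after
# the state s is substituted: Q(a) = -a^4 + 2a^2 + s*a.
def q_coeffs(s):
    return np.array([-1.0, 0.0, 2.0, s, 0.0])  # highest degree first

def optimal_action(s):
    # Evaluate Q at the real critical points of its derivative and keep
    # the maximizer (a univariate shortcut for what QE derives symbolically).
    dq = np.polyder(q_coeffs(s))
    crit = np.roots(dq)
    crit = crit.real[np.abs(crit.imag) < 1e-9]  # keep real roots only
    return max(crit, key=lambda a: np.polyval(q_coeffs(s), a))

print(round(optimal_action(0.5), 2))
```

For s = 0.5 the maximizer sits near a ≈ 1.06, which beats every other critical point of this quartic; the quantifier-elimination route would return the same optimum as an exact algebraic expression.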
-
Patent number: 10310587
Abstract: A power-supply control apparatus includes a processor that executes a process. The process includes: calculating, for a first time period, a first predictive value of total power consumption by the power-supply control apparatus and one or more other power-supply control apparatuses to which power is supplied from a power supply; and determining whether to allow a storage battery to be charged in the first time period, based on the first predictive value for the first time period and previous information that is related to the first predictive value and was obtained in a second time period before the first time period.
Type: Grant
Filed: October 26, 2015
Date of Patent: June 4, 2019
Assignees: FUJITSU LIMITED, THE UNIVERSITY OF TOKYO
Inventors: Tomotake Sasaki, Hitoshi Yanami, Junji Kaneko, Shinji Hara
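The charging decision can be sketched as a threshold test, under the assumption that the "previous information" is a list of earlier predictive values and that the new prediction is compared against their mean and a supply capacity (all figures are invented):

```python
# Decide whether this apparatus may charge its storage battery next period.
def allow_charging(predicted_total_kw, previous_predictions_kw, capacity_kw=100.0):
    """Allow charging only if the forecast total consumption leaves headroom
    under the supply capacity and does not exceed the recent norm, with the
    previous predictive values serving as the reference."""
    reference = sum(previous_predictions_kw) / len(previous_predictions_kw)
    return predicted_total_kw < capacity_kw and predicted_total_kw <= reference

history = [80.0, 75.0, 85.0]          # predictive values from earlier periods
print(allow_charging(70.0, history))  # demand below the recent average: charge
print(allow_charging(95.0, history))  # demand above the recent average: defer
```

Deferring the battery charge to low-demand periods flattens the total draw seen by the shared power supply, which is the scheduling effect the patent targets.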
-
Publication number: 20190086876
Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including: calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating the gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
Type: Application
Filed: September 13, 2018
Publication date: March 21, 2019
Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
-
Publication number: 20190087751
Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function represented in a quadratic form of inputs at times earlier than a present time and outputs at the present time and those earlier times, the first coefficients being estimated based on the inputs at the earlier times, the outputs at the present time and the earlier times, and costs or rewards that correspond to the inputs at the earlier times; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients.
Type: Application
Filed: September 13, 2018
Publication date: March 21, 2019
Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
-
Patent number: 9614401
Abstract: A control server according to an embodiment sorts a plurality of notebook PCs into a plurality of groups so that the total remaining energy of the rechargeable batteries in each group is similar to that of the notebook PCs in every other group. The control server then performs local search individually on the sorted groups and generates a control plan for the individual notebook PCs.
Type: Grant
Filed: February 28, 2014
Date of Patent: April 4, 2017
Assignees: FUJITSU LIMITED, THE UNIVERSITY OF TOKYO
Inventors: Hitoshi Yanami, Hidenao Iwane, Tomotake Sasaki, Hirokazu Anai, Junji Kaneko, Shinji Hara, Suguru Fujita
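The balancing step can be approximated with a greedy assignment: taking the PCs in descending order of remaining energy and placing each in the group with the smallest running total keeps the group totals similar. The battery figures here are invented:

```python
# Hypothetical remaining battery energy (Wh) of each notebook PC.
remaining_wh = [50, 42, 38, 33, 29, 27, 21, 16]
n_groups = 2

groups = [[] for _ in range(n_groups)]
totals = [0] * n_groups
for wh in sorted(remaining_wh, reverse=True):
    i = totals.index(min(totals))  # least-loaded group so far
    groups[i].append(wh)
    totals[i] += wh

print(totals)
```

The patent then runs local search within each group to produce the per-PC control plans; the greedy split here only supplies the initial balanced grouping that makes those searches independent.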
-
Publication number: 20160291090
Abstract: A control scheme creation method according to an embodiment includes executing, on a computer, processing that calculates the amount of stored or released energy of each of a plurality of energy storage devices for each of a plurality of periods, based on estimation value information on the amount of energy consumption within a target area and on remaining amount information representing the amount of remaining energy of each of the plurality of energy storage devices. Furthermore, the control scheme creation method includes executing, on the computer, processing that determines storage timing or release timing for each energy storage device for each of the periods, based on the calculated amount of stored or released energy.
Type: Application
Filed: March 4, 2016
Publication date: October 6, 2016
Applicants: FUJITSU LIMITED, THE UNIVERSITY OF TOKYO
Inventors: Tomotake Sasaki, Hitoshi Yanami, Junji Kaneko, Shinji Hara
-
Publication number: 20160154453
Abstract: A power-supply control apparatus includes a processor that executes a process. The process includes: calculating, for a first time period, a first predictive value of total power consumption by the power-supply control apparatus and one or more other power-supply control apparatuses to which power is supplied from a power supply; and determining whether to allow a storage battery to be charged in the first time period, based on the first predictive value for the first time period and previous information that is related to the first predictive value and was obtained in a second time period before the first time period.
Type: Application
Filed: October 26, 2015
Publication date: June 2, 2016
Applicants: FUJITSU LIMITED, The University of Tokyo
Inventors: Tomotake Sasaki, Hitoshi Yanami, Junji Kaneko, Shinji Hara
-
Patent number: 8935131
Abstract: When model expressions of objective functions are generated at the vertexes of a quadrilateral on a plane concerning the P and N channels of transistors in SRAM, an initial number of simulation runs is allocated to each objective function at each designated vertex according to weight values set based on relationships presumed among the objective functions at that vertex. For each objective function at each designated vertex, a first simulation is executed the allocated number of times. A model expression is then generated from the first simulation result, and an evaluation indicator of the approximation accuracy of the model expression is calculated. Then, for each model expression, it is determined whether the model expression influences the yield, and based on its evaluation indicator and the presence or absence of that influence, it is determined whether additional simulation is required for the corresponding objective function.
Type: Grant
Filed: March 25, 2011
Date of Patent: January 13, 2015
Assignee: Fujitsu Limited
Inventors: Hidenao Iwane, Hirokazu Anai, Hitoshi Yanami
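The initial weight-proportional allocation of simulation runs can be sketched with largest-remainder rounding so that the counts stay integral and exhaust the budget (the budget and weights are invented):

```python
# Allocate an initial simulation budget across objective functions in
# proportion to presumed-importance weights, rounding by largest remainder.
def allocate(budget, weights):
    total = sum(weights)
    raw = [budget * w / total for w in weights]
    counts = [int(r) for r in raw]
    # Hand the leftover runs to the largest fractional remainders.
    leftover = budget - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

print(allocate(100, [3, 2, 1]))  # weights for three objective functions
```

Each objective function then runs its allotted first simulations, after which the accuracy indicator decides whether it earns additional runs.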
-
Patent number: 8843351
Abstract: This method includes: generating a constraint equation from data of an approximate expression of a cost function representing a relationship between a plurality of design parameters and a cost, data of a route in a cost space, and data of a search range in a design parameter space; obtaining a logical expression of a solution for the constraint equation from a quantifier elimination processing unit that carries out processing according to a quantifier elimination method; substituting the coordinates of each of a plurality of points within the search range in the design parameter space into the logical expression of the solution to determine, for each of the plurality of points, whether the logical expression is true or false; and displaying the design parameter space in which a display object including a first point determined to be true is disposed at that point.
Type: Grant
Filed: May 25, 2011
Date of Patent: September 23, 2014
Assignee: Fujitsu Limited
Inventors: Hitoshi Yanami, Hirokazu Anai, Hidenao Iwane
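The substitution step, testing the QE-derived logical expression at each grid point and keeping the true ones for display, can be sketched with an invented solution formula standing in for the QE output:

```python
# Hypothetical logical expression of the solution (stand-in for QE output):
# x^2 + y^2 <= 4 and y >= x, over two design parameters x and y.
def solution_holds(x, y):
    return x * x + y * y <= 4 and y >= x

# Substitute each grid point in the search range into the logical expression
# and collect the points at which it evaluates to true.
grid = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
display_points = [p for p in grid if solution_holds(*p)]
print(len(display_points))
```

The retained points are exactly those where a display object would be drawn in the rendered design parameter space.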
-
Publication number: 20140249793
Abstract: A control server according to an embodiment sorts a plurality of notebook PCs into a plurality of groups so that the total remaining amount of the rechargeable batteries in each group is similar to that of the notebook PCs in every other group. The control server then performs local search individually on the sorted groups and generates a control plan for the individual notebook PCs.
Type: Application
Filed: February 28, 2014
Publication date: September 4, 2014
Applicants: The University of Tokyo, FUJITSU LIMITED
Inventors: Hitoshi Yanami, Hidenao Iwane, Tomotake Sasaki, Hirokazu Anai, Junji Kaneko, Shinji Hara, Suguru Fujita