Patents by Inventor Kenji Doya
Kenji Doya has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11645574
Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function, represented in a quadratic form of inputs at times in the past relative to the present time and outputs at the present time and the times in the past, the first coefficients being estimated based on the inputs at the times in the past, the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients.
Type: Grant
Filed: September 13, 2018
Date of Patent: May 9, 2023
Assignees: FUJITSU LIMITED (Kawasaki, Japan), OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
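The claim has a two-step structure: first fit value-function coefficients by least squares from observed inputs, outputs, and costs; then derive a control law from the fitted value function. The sketch below is a minimal illustration of that idea under assumed simplifications, not the patented method: the TD-style least-squares objective and the names `quadratic_features` and `estimate_value_coefficients` are expository assumptions.

```python
import numpy as np

def quadratic_features(z):
    # All pairwise products z_i * z_j (upper triangle): the monomials of a
    # quadratic form in the feature vector z (stacked past inputs/outputs).
    i, j = np.triu_indices(len(z))
    return np.outer(z, z)[i, j]

def estimate_value_coefficients(Z, costs, gamma=0.95):
    # Least-squares fit of the "first coefficients" w so that the value
    # function V(z) = w . phi(z) satisfies V(z_t) ~ cost_t + gamma * V(z_{t+1}),
    # i.e. w . (phi_t - gamma * phi_{t+1}) ~ cost_t at every step.
    Phi = np.array([quadratic_features(z) for z in Z])
    A = Phi[:-1] - gamma * Phi[1:]
    w, *_ = np.linalg.lstsq(A, costs[:-1], rcond=None)
    return w

# Toy usage with random data standing in for an observed trajectory:
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 4))       # stacked past inputs and outputs per step
costs = rng.normal(size=200) ** 2   # immediate costs
w = estimate_value_coefficients(Z, costs)
```

The "second coefficients" of the claim would then be read off from the fitted quadratic form to define the control law; that step depends on the system structure and is omitted here.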
-
Patent number: 11573537
Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including: calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
Type: Grant
Filed: September 13, 2018
Date of Patent: February 7, 2023
Assignees: FUJITSU LIMITED, OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
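The perturbation idea in this abstract, nudging each component of the feedback coefficient matrix and reading the change in the TD error off as a gradient component, resembles a finite-difference scheme. A minimal sketch, assuming a caller-supplied one-step simulator `run_step` and value estimate `V`, neither of which comes from the patent:

```python
import numpy as np

def td_error(V, x, x_next, cost, gamma=0.95):
    # TD error for a cost formulation: delta = cost + gamma * V(x') - V(x).
    return cost + gamma * V(x_next) - V(x)

def estimate_gradient(F, run_step, V, eps=1e-3, gamma=0.95):
    # Perturb each component of the feedback coefficient matrix F
    # (policy u = F x), observe the change in the TD error, and collect
    # the finite differences as an estimated gradient function matrix.
    # run_step(F) -> (x, x_next, cost): one step under the policy u = F x.
    x, x_next, cost = run_step(F)
    base = td_error(V, x, x_next, cost, gamma)
    grad = np.zeros_like(F)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            Fp = F.copy()
            Fp[i, j] += eps
            xp, xp_next, cp = run_step(Fp)
            grad[i, j] = (td_error(V, xp, xp_next, cp, gamma) - base) / eps
    return grad

# Update step (cost formulation, so descend the estimated gradient):
#   F -= alpha * estimate_gradient(F, run_step, V)
```

The abstract's restriction to linear dynamics and quadratic immediate cost or reward is what makes a matrix-valued gradient of the state-value function with respect to the feedback matrix well defined.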
-
Patent number: 10896382
Abstract: A method of inverse reinforcement learning for estimating cost and value functions of behaviors of a subject includes: acquiring data representing changes in state variables that define the behaviors of the subject; applying a modified Bellman equation given by Eq. (1) to the acquired data:
q(x) + gV(y) − V(x) = −ln{pi(y|x)/p(y|x)}  (1)
where q(x) and V(x) denote a cost function and a value function, respectively, at state x, g represents a discount factor, and p(y|x) and pi(y|x) denote state transition probabilities before and after learning, respectively; estimating the density ratio pi(y|x)/p(y|x) in Eq. (1); estimating q(x) and V(x) in Eq. (1) using the least-squares method in accordance with the estimated density ratio pi(y|x)/p(y|x); and outputting the estimated q(x) and V(x).
Type: Grant
Filed: August 7, 2015
Date of Patent: January 19, 2021
Assignee: OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
Inventors: Eiji Uchibe, Kenji Doya
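The pipeline the abstract describes has two stages: estimate the density ratio pi(y|x)/p(y|x), then solve Eq. (1) for q(x) and V(x) by least squares. Below is a minimal sketch of one way to realize both stages, assuming linear function approximators and a logistic-regression density-ratio estimator; these choices, and the names `log_density_ratio` and `fit_q_and_V`, are not specified by the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def log_density_ratio(trans_before, trans_after):
    # Train a classifier to separate transition samples (x, y) drawn before
    # learning (label 0, density p) from samples drawn after learning
    # (label 1, density pi). With balanced classes, the optimal logit equals
    # ln(pi/p), which is (up to sign) the right-hand side of Eq. (1).
    X = np.vstack([trans_before, trans_after])
    y = np.r_[np.zeros(len(trans_before)), np.ones(len(trans_after))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.decision_function  # callable: (n, 2*dim) -> estimated ln(pi/p)

def fit_q_and_V(X, Y, target, gamma=0.95):
    # Least squares for Eq. (1) with linear models q(x) = wq.x, V(x) = wv.x:
    # stack [x, gamma*y - x] so that A @ [wq; wv] = q(x) + gamma*V(y) - V(x),
    # then regress on target = -estimated ln(pi/p).
    A = np.hstack([X, gamma * Y - X])
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    d = X.shape[1]
    return w[:d], w[d:]  # coefficients of q and of V
```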
-
Patent number: 10896383
Abstract: A method of inverse reinforcement learning for estimating reward and value functions of behaviors of a subject includes: acquiring data representing changes in state variables that define the behaviors of the subject; applying a modified Bellman equation given by Eq. (1) to the acquired data:
r(x) + γV(y) − V(x) = ln(π(y|x)/b(y|x))  (1)
= ln(π(x,y)/b(x,y)) − ln(π(x)/b(x))  (2)
where r(x) and V(x) denote a reward function and a value function, respectively, at state x, γ represents a discount factor, and b(y|x) and π(y|x) denote state transition probabilities before and after learning, respectively; estimating a logarithm of the density ratio π(x)/b(x) in Eq. (2); and estimating r(x) and V(x) in Eq. (2).
Type: Grant
Filed: February 6, 2017
Date of Patent: January 19, 2021
Assignee: OKINAWA INSTITUTE OF SCIENCE AND TECHNOLOGY SCHOOL CORPORATION
Inventors: Eiji Uchibe, Kenji Doya
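The step from Eq. (1) to Eq. (2) follows from factoring each conditional transition probability into a joint density over (x, y) and a marginal density over x:

```latex
\[
\ln\frac{\pi(y\mid x)}{b(y\mid x)}
  = \ln\frac{\pi(x,y)/\pi(x)}{b(x,y)/b(x)}
  = \ln\frac{\pi(x,y)}{b(x,y)} - \ln\frac{\pi(x)}{b(x)}.
\]
```

This is why estimating the two log density ratios on the right-hand side of Eq. (2) suffices to recover the left-hand side of the modified Bellman equation.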
-
Publication number: 20190087751
Abstract: A non-transitory, computer-readable recording medium stores therein a reinforcement learning program that uses a value function and causes a computer to execute a process comprising: estimating first coefficients of the value function, represented in a quadratic form of inputs at times in the past relative to the present time and outputs at the present time and the times in the past, the first coefficients being estimated based on the inputs at the times in the past, the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past; and determining second coefficients that define a control law, based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients.
Type: Application
Filed: September 13, 2018
Publication date: March 21, 2019
Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
-
Publication number: 20190086876
Abstract: A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including: calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
Type: Application
Filed: September 13, 2018
Publication date: March 21, 2019
Applicants: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
Inventors: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
-
Publication number: 20170213151
Abstract: A method of inverse reinforcement learning for estimating cost and value functions of behaviors of a subject includes: acquiring data representing changes in state variables that define the behaviors of the subject; applying a modified Bellman equation given by Eq. (1) to the acquired data:
q(x) + gV(y) − V(x) = −ln{pi(y|x)/p(y|x)}  (1)
where q(x) and V(x) denote a cost function and a value function, respectively, at state x, g represents a discount factor, and p(y|x) and pi(y|x) denote state transition probabilities before and after learning, respectively; estimating the density ratio pi(y|x)/p(y|x) in Eq. (1); estimating q(x) and V(x) in Eq. (1) using the least-squares method in accordance with the estimated density ratio pi(y|x)/p(y|x); and outputting the estimated q(x) and V(x).
Type: Application
Filed: August 7, 2015
Publication date: July 27, 2017
Applicant: Okinawa Institute of Science and Technology School Corporation
Inventors: Eiji Uchibe, Kenji Doya
-
Publication number: 20170147949
Abstract: A method of inverse reinforcement learning for estimating reward and value functions of behaviors of a subject includes: acquiring data representing changes in state variables that define the behaviors of the subject; applying a modified Bellman equation given by Eq. (1) to the acquired data:
r(x) + γV(y) − V(x) = ln(π(y|x)/b(y|x))  (1)
= ln(π(x,y)/b(x,y)) − ln(π(x)/b(x))  (2)
where r(x) and V(x) denote a reward function and a value function, respectively, at state x, γ represents a discount factor, and b(y|x) and π(y|x) denote state transition probabilities before and after learning, respectively; estimating a logarithm of the density ratio π(x)/b(x) in Eq. (2); and estimating r(x) and V(x) in Eq. (2).
Type: Application
Filed: February 6, 2017
Publication date: May 25, 2017
Applicant: Okinawa Institute of Science and Technology School Corporation
Inventors: Eiji Uchibe, Kenji Doya
-
Patent number: 7170495
Abstract: A key inputting device includes a vowel switch for inputting vowels and consonant switches for inputting consonants. The vowel switch is displaceable in five directions, each consonant switch is displaceable in three directions, and the displacement directions of each switch are allotted to the letters of the alphabet according to the movement of the articulatory organs when pronouncing each letter, that is, the movement or location of the jaw, throat, tongue, and lips. The vowel switch is operable by the thumb, and the consonant switches are operable by the index finger, the middle finger, the ring finger, and the little finger, respectively.
Type: Grant
Filed: March 5, 2003
Date of Patent: January 30, 2007
Assignee: Advanced Telecommunications Research Institute International
Inventor: Kenji Doya
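The allocation scheme (a five-direction thumb switch for vowels, three-direction finger switches for consonants) amounts to a lookup from a (finger, direction) displacement event to a letter. A toy sketch follows; the concrete direction-to-letter assignments are invented for illustration and are not the allotment specified in the patent.

```python
# Toy sketch of the allocation idea. The direction-to-letter assignments
# below are invented for illustration; the patent defines its own allotment
# based on articulatory movements.

VOWEL_SWITCH = {  # thumb switch, displaceable in five directions
    "up": "a", "down": "o", "left": "i", "right": "e", "press": "u",
}

CONSONANT_SWITCHES = {  # one three-direction switch per finger
    "index":  {"up": "k", "down": "g", "press": "s"},
    "middle": {"up": "t", "down": "d", "press": "n"},
    "ring":   {"up": "h", "down": "b", "press": "m"},
    "little": {"up": "r", "down": "w", "press": "y"},
}

def letter_for(finger: str, direction: str) -> str:
    """Resolve a (finger, direction) displacement event to a letter."""
    if finger == "thumb":
        return VOWEL_SWITCH[direction]
    return CONSONANT_SWITCHES[finger][direction]

print(letter_for("thumb", "up"), letter_for("index", "down"))  # a g
```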
-
Publication number: 20040178992
Abstract: A key inputting device includes a vowel switch for inputting vowels and consonant switches for inputting consonants. The vowel switch is displaceable in five directions, each consonant switch is displaceable in three directions, and the displacement directions of each switch are allotted to the letters of the alphabet according to the movement of the articulatory organs when pronouncing each letter, that is, the movement or location of the jaw, throat, tongue, and lips. The vowel switch is operable by the thumb, and the consonant switches are operable by the index finger, the middle finger, the ring finger, and the little finger, respectively.
Type: Application
Filed: January 5, 2004
Publication date: September 16, 2004
Inventor: Kenji Doya
-
Patent number: 6529887
Abstract: The invention provides a novel, highly adaptive agent learning machine comprising a plurality of learning modules, each having a reinforcement learning system, which acts on an environment and determines an action output that maximizes the resulting reward, and an environment-predicting system, which predicts changes in the environment. A responsibility signal is calculated for each module such that the smaller the prediction error of its environment-predicting system, the larger the signal, and each module's action output is weighted in proportion to its responsibility signal to produce the action applied to the environment. The machine thereby switches among and combines actions suited to various states or operational modes of an environment without any explicit teacher signal, and learns behavior flexibly without prior knowledge.
Type: Grant
Filed: May 18, 2000
Date of Patent: March 4, 2003
Assignees: Agency of Industrial Science and Technology, Advanced Telecommunications Research Institute International
Inventors: Kenji Doya, Mitsuo Kawato
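The responsibility-signal mechanism can be summarized in a few lines: each module's prediction error is turned into a weight (small error, large weight), and the modules' action outputs are blended with those weights. A minimal sketch, where the Gaussian-softmax form and the parameter `sigma` are assumptions for illustration rather than details taken from the patent:

```python
import numpy as np

def responsibility_signals(prediction_errors, sigma=1.0):
    # Softmax over negative squared prediction errors: the smaller a
    # module's prediction error, the larger its responsibility signal.
    s = np.exp(-np.square(prediction_errors) / (2.0 * sigma ** 2))
    return s / s.sum()

def combined_action(action_outputs, prediction_errors):
    # Weight each module's action output in proportion to its
    # responsibility signal and sum, giving the action applied
    # to the environment.
    lam = responsibility_signals(np.asarray(prediction_errors, dtype=float))
    return lam @ np.asarray(action_outputs, dtype=float)

# Three modules proposing scalar actions; the first predicts best:
print(combined_action([1.0, -0.5, 0.2], [0.1, 0.9, 0.4]))
```

With the errors above, the first module's small prediction error gives it the largest responsibility signal, so its proposed action dominates the blend.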