# Information processing apparatus, information processing method, and program

An information processing apparatus comprises: a lower time series data generation unit having a plurality of recurrent neural networks which learn predetermined time series data, and generate prediction time series data according to the learning result; an upper time series data generation unit having recurrent neural networks which learn error time series data that is time series data of errors raised at the time of the learning by the respective plural recurrent neural networks of the lower time series data generation unit, and generate prediction error time series data that is time series data of prediction errors according to the learning result; and a conversion unit that performs nonlinear conversion for the prediction errors generated by the upper time series data generation unit, wherein the lower time series data generation unit outputs the prediction time series data generated by the respective plural recurrent neural networks according to the prediction errors which have undergone the nonlinear conversion by the conversion unit.

**Description**

**CROSS REFERENCES TO RELATED APPLICATIONS**

The present invention contains subject matter related to Japanese Patent Application JP 2006-135714 filed in the Japanese Patent Office on May 15, 2006, the entire contents of which are incorporated herein by reference.

**BACKGROUND OF THE INVENTION**

1. Field of the Invention

The present invention relates to an information processing apparatus, an information processing method, and a program, and more particularly, to an information processing apparatus, an information processing method, and a program which can generate time series data more correctly.

2. Description of the Related Art

The present applicant has suggested an invention of generating time series data according to the result of learning using recurrent neural networks (for example, refer to Jpn. Pat. Appln. Laid-Open Publication No. 11-126198).

According to this suggestion, an information processing apparatus is configured by a network of lower hierarchy having RNNs **1**-**1** to **1**-*v*, and a network of upper hierarchy having RNNs **11**-**1** to **11**-*v*.

In the lower hierarchy network, outputs from the RNNs **1**-**1** to **1**-*v* are supplied to a composition circuit **3** through corresponding gates **2**-**1** to **2**-*v* to be composited.

Similarly, in the upper hierarchy network, outputs from the RNNs **11**-**1** to **11**-*v* are supplied to a composition circuit **13** through corresponding gates **12**-**1** to **12**-*v* to be composited. Then, based on the output from the composition circuit **13** of the upper hierarchy network, the on/off of the lower hierarchy gates **2**-**1** to **2**-*v* is controlled.

In this information processing apparatus, the RNNs **1**-**1** to **1**-*v* are made to generate time series data P**1** to Pv, respectively, and a predetermined gate of the lower hierarchy gates **2**-**1** to **2**-*v* is set on or off based on the output from the upper hierarchy composition circuit **13**. Thus, one of the time series data P**1** to Pv, which is output from a predetermined one of the RNNs **1**-**1** to **1**-*v*, can be selectively output from the composition circuit **3**.

Accordingly, for example, the time series data P**1** is generated for a predetermined time period, the time series data P**2** is generated for the next predetermined time period, and then the time series data P**1** is generated again for the next predetermined time period.

**SUMMARY OF THE INVENTION**

In the above-described suggestion, a winner-take-all operation is executed, in which only one of the gates **2**-**1** to **2**-*v* is set on. This raises no problem when the winner among the gates **2**-**1** to **2**-*v* is clearly discriminated. However, when, for example, the levels determining the winner are close to each other among plural gates, the winner among the gates **2**-**1** to **2**-*v* may change frequently, which makes it difficult to generate time series data correctly.

It is therefore desirable to overcome the above-mentioned drawbacks by providing an information processing apparatus, an information processing method, and a program which can generate time series data more correctly.

According to an embodiment of the present invention, there is provided an information processing apparatus, including: a lower time series data generation means having a plurality of recurrent neural networks which learn predetermined time series data, and generate prediction time series data according to the learning result; an upper time series data generation means having recurrent neural networks which learn error time series data that is time series data of errors raised at the time of the learning by the respective plural recurrent neural networks of the lower time series data generation means, and generate prediction error time series data that is time series data of prediction errors according to the learning result; and a conversion means for performing nonlinear conversion for the prediction errors generated by the upper time series data generation means; wherein the lower time series data generation means outputs the prediction time series data generated by the respective plural recurrent neural networks according to the prediction errors which have undergone the nonlinear conversion by the conversion means.

According to the information processing apparatus, the lower time series data generation means may further include a plurality of gate means for opening and closing the outputs of the prediction time series data at the subsequent stages of the respective plural recurrent neural networks, and the plural gate means open and close the outputs of the prediction time series data according to the prediction errors which have undergone the nonlinear conversion by the conversion means.

According to the information processing apparatus, the lower time series data generation means may further include a composition means for compositing and outputting the prediction time series data output from the plural gate means.

According to the information processing apparatus, the recurrent neural networks of the upper time series data generation means may be recurrent neural networks of the continuous time type.

According to the information processing apparatus, the conversion means may perform the nonlinear conversion for the prediction errors generated by the upper time series data generation means using the softmax function.

The information processing apparatus may further include a temporal filter means for performing the temporal filter processing for the errors output by the lower time series data generation means.

The information processing apparatus may further include a nonlinear filter means for nonlinearly converting the errors output by the lower time series data generation means.

According to the information processing apparatus, at the time of the learning, the lower time series data generation means may update the weight of learning of the respective plural recurrent neural networks according to errors raised at the time of the learning by the respective plural recurrent neural networks.

According to the information processing apparatus, at the time of the learning, of errors raised at the time of the learning by the respective plural recurrent neural networks, the lower time series data generation means may set a recurrent neural network that has raised a minimum error to the winner, and update the weight of learning of the respective plural recurrent neural networks according to the distance from the winner.

According to an embodiment of the present invention, there is also provided an information processing method, including the steps of: learning predetermined time series data, and generating prediction time series data according to the learning result; learning error time series data that is time series data of errors raised at the time of learning the predetermined time series data, and generating prediction error time series data that is time series data of prediction errors according to the learning result; performing nonlinear conversion for the generated prediction errors; and outputting the generated prediction time series data according to the prediction errors which have undergone the nonlinear conversion.

According to an embodiment of the present invention, there is also provided a program that makes a computer execute a processing, the processing including the steps of: learning predetermined time series data, and generating prediction time series data according to the learning result; learning error time series data that is time series data of errors raised at the time of learning the predetermined time series data, and generating prediction error time series data that is time series data of prediction errors according to the learning result; performing nonlinear conversion for the generated prediction errors; and outputting the generated prediction time series data according to the prediction errors which have undergone the nonlinear conversion.

According to one aspect of the present invention, the prediction time series data is generated according to the result of learning the predetermined time series data. Furthermore, the prediction error time series data that is time series data of prediction errors is generated according to the result of learning the error time series data that is time series data of errors raised at the time of learning the predetermined time series data. Moreover, the nonlinear conversion is performed for the generated prediction errors, and the generated prediction time series data is output according to the prediction errors which have undergone the nonlinear conversion.

According to one aspect of the present invention, it becomes possible to generate time series data more correctly.

The nature, principle and utility of the invention will become more apparent from the following detailed description when read in conjunction with the accompanying drawings in which like parts are designated by like reference numerals or characters.

**BRIEF DESCRIPTION OF THE DRAWINGS**

In the accompanying drawings:

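[Figure list not recoverable. The surviving caption fragments refer to: an item **53** shown in another figure; a function …_{1} that determines learning weight μ_{n} according to use frequency FREQ_{n}; a function …_{2} that performs nonlinear conversion according to the size of prediction error errorL_{n}; a function …_{3} to be used in the learning processing; and several views relating to the information processing apparatus **51**.]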

**DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS**

Preferred embodiments of the present invention will be explained hereinafter. The correspondence relationship between constituent features of the present invention and embodiments written in the specification and drawings is represented as follows. This description is intended to confirm that the embodiments supporting the present invention are written in the specification and drawings. Accordingly, even if there are embodiments which are written in the specification and drawings, and are not written here as embodiments corresponding to the constituent features of the present invention, this does not mean that the embodiments do not correspond to the constituent features. Conversely, even if embodiments are written here as those corresponding to the constituent features, this does not mean that the embodiments do not correspond to constituent features other than the constituent features.

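According to one aspect of the present invention, there is provided an information processing apparatus (for example, information processing apparatus **51**) including: a lower time series data generation unit (for example, lower time series prediction generation unit **61**) having a plurality of recurrent neural networks which learn predetermined time series data, and generate prediction time series data according to the learning result; an upper time series data generation unit (for example, upper time series prediction generation unit **62**) having recurrent neural networks which learn error time series data that is time series data of errors raised at the time of the learning by the respective plural recurrent neural networks of the lower time series data generation unit, and generate prediction error time series data that is time series data of prediction errors according to the learning result; and a conversion unit (for example, gate signal conversion unit **63**) that performs nonlinear conversion for the prediction errors generated by the upper time series data generation unit, wherein the lower time series data generation unit outputs the prediction time series data generated by the respective plural recurrent neural networks according to the prediction errors which have undergone the nonlinear conversion by the conversion unit.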

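According to the aspect of the information processing apparatus, the lower time series data generation unit further includes a plurality of gate units (for example, gates **72**-**1** to **72**-N) that open and close the outputs of the prediction time series data at the subsequent stages of the respective plural recurrent neural networks, and the plural gate units open and close the outputs of the prediction time series data according to the prediction errors which have undergone the nonlinear conversion by the conversion unit.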

According to the aspect of the information processing apparatus, the lower time series data generation unit further includes a composition unit (for example, composition circuit **73**) that composites and outputs the prediction time series data output from the plural gate units.

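According to the aspect, the information processing apparatus further includes a temporal filter unit (for example, temporal filter unit **201**) that performs temporal filter processing for the errors output by the lower time series data generation unit.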

According to the aspect, the information processing apparatus further includes a nonlinear filter unit (for example, nonlinear filter unit **202**) that nonlinearly converts the errors output by the lower time series data generation unit.

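According to one aspect of the present invention, there is also provided an information processing method and program, which include the steps of: learning predetermined time series data, and generating prediction time series data according to the learning result (for example, step S**1** to step S**7** and step S**55**); learning error time series data that is time series data of errors raised at the time of learning the predetermined time series data, and generating prediction error time series data that is time series data of prediction errors according to the learning result (for example, step S**31** to step S**37** and step S**53**); performing nonlinear conversion for the generated prediction errors (for example, step S**54**); and outputting the generated prediction time series data according to the prediction errors which have undergone the nonlinear conversion (for example, step S**57**).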

The preferred embodiments of the present invention will further be described below with reference to the accompanying drawings.

The drawings show a configuration example of an information processing apparatus **51** according to an embodiment of the present invention.

The information processing apparatus **51** is built into, for example, a robot. The robot incorporating the information processing apparatus **51** is provided with at least a sensor for sensing a subject which is to be visually recognized, and a motor which is driven so as to move the robot, neither of which is shown, and a sensor motor signal, which is a signal from the sensor and motor, is supplied to the information processing apparatus **51**.

The information processing apparatus **51** includes a lower time series prediction generation unit **61**, an upper time series prediction generation unit **62**, and a gate signal conversion unit **63**, and executes learning processing of learning time series data given as teacher data and generation processing of generating (reproducing) time series data with respect to input according to the learning result.

In this embodiment, an example of learning and generating action sequences, which are series of performances carried out by a humanoid robot, by the information processing apparatus **51** will be explained.

In the following example, the information processing apparatus **51** learns three action sequences A, B, and C.

In the action sequence A, the humanoid robot, starting from an initial state in which both arms are spread out from side to side, visually recognizes a cubic object placed on a table in front of it, repeats plural times the performance of seizing the object with both hands, holding it up to a predetermined height, and placing it on the table again, and then returns both arms to the position of the initial state (referred to as the home position, hereinafter).

In the action sequence B, the robot in the initial state visually recognizes a cubic object placed on a table in front of it, and repeats plural times the performance of touching the object with its right hand and returning its arms to the home position, and then touching the object with its left hand and returning its arms to the home position, that is, touching the object with one hand at a time, alternately.

In the action sequence C, the robot in the initial state visually recognizes a cubic object placed on a table in front of it, touches the object once with both hands, and then returns its arms to the home position.

The information processing apparatus **51** learns and generates a signal for the sensor (for example, visual sensor) and motor in executing the action sequences A to C.

The lower time series prediction generation unit **61** includes N pieces of recurrent neural networks (referred to as RNNs, hereinafter) **71**-**1** to **71**-N, gates **72**-**1** to **72**-N which are arranged at the subsequent stages of the RNNs **71**-**1** to **71**-N, a composition circuit **73**, an arithmetic circuit **74**, a memory **75**, and a control circuit **76**. In case the RNNs **71**-**1** to **71**-N do not have to be discriminated, the RNNs **71**-**1** to **71**-N are simply referred to as RNN **71**. Similarly, the gates **72**-**1** to **72**-N may be simply referred to as gate **72**.

To the lower time series prediction generation unit **61**, a sensor motor signal from the sensor and motor arranged in the humanoid robot is input. Hereinafter, a sensor motor signal which is input to the lower time series prediction generation unit **61** at the time point “t” is represented as sm (t).

The lower time series prediction generation unit **61** predicts a sensor motor signal sm (t+1) at the time point “t+1” with respect to the sensor motor signal sm (t) at the time point “t” input thereto, according to the learning result, and outputs thus predicted sensor motor signal sm (t+1).

Specifically, the RNN **71**-*n *(n=1, 2, . . . , N) generates the sensor motor signal sm (t+1) at the time point “t+1” with respect to the input sensor motor signal sm (t) at the time point “t”, according to the learning result, and outputs thus generated sensor motor signal sm (t+1) to the gate **72**-*n. *

On the other hand, an action sequence is considered to be configured by a gathering (sequence) of various plural action parts (motion primitives). For example, the action sequence A can be considered a gathering of the action parts of visually recognizing an object, bringing both hands close to the object (until seizing the object), holding up the object, putting down the held-up object, and returning both arms to the home position. Each of the RNNs **71**-**1** to **71**-N exclusively learns time series data of a sensor motor signal corresponding to a single action part.

Accordingly, since action parts learned by the RNNs **71**-**1** to **71**-N are different from each other, even if the same sensor motor signal sm (t) is input to the respective RNNs **71**-**1** to **71**-N, the sensor motor signal sm (t+1) output from the respective RNNs **71**-**1** to **71**-N is different. The sensor motor signal sm (t+1) output from the RNN **71**-*n *is represented as sensor motor signal sm_{n }(t+1).

To the gate **72**-*n*, which is arranged at the subsequent stage of the RNN **71**-*n*, in addition to the sensor motor signal sm_{n} (t+1) at the time point “t+1” from the RNN **71**-*n*, gate signals gate N={g_{1}, g_{2}, . . . , g_{N}}, which are control signals for controlling the opened/closed state of the gates **72**-**1** to **72**-N, are supplied from the gate signal conversion unit **63**. As will be explained later, the sum of the gate signals g_{n} configuring the gate signals gate N is 1 (Σg_{n}=1).

The gate **72**-*n *opens or closes the output of the sensor motor signal sm_{n }(t+1) from the RNN **71**-*n *according to the gate signal g_{n}. That is, the gate **72**-*n *outputs g_{n}×sm_{n }(t+1) to the composition circuit **73** at the time point “t+1”.

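The composition circuit **73** composites outputs from the respective gates **72**-**1** to **72**-N, and outputs thus composited signal as the sensor motor signal sm (t+1) at the time point “t+1”. That is, the composition circuit **73** outputs the sensor motor signal sm (t+1) which is represented by the following mathematical formula (1).

*sm*(*t*+1)=Σ*g*_{n}×*sm*_{n}(*t*+1) [Mathematical Formula 1]

The “Σ” in the mathematical formula (1) represents that the addition is performed with respect to the entire n=1 to N.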
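The gating and composition described above can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the function name `compose`, the list-based signal representation, and the example values are assumptions introduced here.

```python
def compose(gate_signals, predictions):
    """Return the gated sum over n of g_n * sm_n(t+1), per signal dimension.

    gate_signals: list of N gate values g_n (expected to sum to 1).
    predictions:  list of N predicted signals sm_n(t+1), each a list of floats.
    """
    if len(gate_signals) != len(predictions):
        raise ValueError("one gate signal is needed per RNN output")
    dims = len(predictions[0])
    return [
        sum(g * sm_n[d] for g, sm_n in zip(gate_signals, predictions))
        for d in range(dims)
    ]


# Hypothetical example: N = 3 RNNs, each predicting a 2-dimensional signal.
gate_signals = [0.5, 0.25, 0.25]
predictions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sm_next = compose(gate_signals, predictions)
```

Because the gate signals sum to 1, the composited signal is a weighted average of the individual RNN outputs, so a gate that is opened wider contributes proportionally more.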

When learning time series data of the sensor motor signal, the arithmetic circuit **74** calculates prediction errors errorL^{t+1 }N={errorL^{t+1}_{1}, errorL^{t+1}_{2}, . . . , errorL^{t+1}_{N}} between the sensor motor signals sm_{1 }(t+1) to sm_{N }(t+1) at the time point “t+1” which are output from the respective RNNs **71**-**1** to **71**-N with respect to the sensor motor signal sm (t) at the time point “t” and a teacher sensor motor signal sm* (t+1) at the time point “t+1” which is given to the lower time series prediction generation unit **61** as teacher data. As will be represented by a mathematical formula (16) to be described later, the prediction errors errorL^{t+1 }N are calculated as errors by taking not only the errors at the time point “t+1” but also the errors for the past L steps from the time point “t+1” into consideration.

The prediction error errorL^{t+1}_{n }of the RNN **71**-*n *at the time point “t+1” calculated by the arithmetic circuit **74** is supplied to the memory **75** to be stored therein.

Since the prediction errors errorL^{t+1 }N are repeatedly calculated in the time-series manner in the arithmetic circuit **74**, and thus calculated prediction errors errorL^{t+1 }N are stored in the memory **75**, time series data errorL N of the prediction errors for the teacher data is stored in the memory **75**. The time series data errorL N of the prediction errors is supplied to the upper time series prediction generation unit **62**. The arithmetic circuit **74** normalizes the time series data errorL N of the prediction errors for the teacher data to a value in the range from “0” to “1”, and outputs thus normalized value.

As described above, the memory **75** stores the time series data errorL N of the prediction errors for the teacher data. Furthermore, the memory **75** stores use frequencies FREQ_{1} to FREQ_{N} of the RNNs **71**-**1** to **71**-N. The use frequencies FREQ_{1} to FREQ_{N} of the RNNs **71**-**1** to **71**-N will be explained later.

The control circuit **76** controls the respective units of the lower time series prediction generation unit **61**, or the RNNs **71**-**1** to **71**-N, arithmetic circuit **74**, memory **75**, etc.

On the other hand, the upper time series prediction generation unit **62** is configured by a single continuous time RNN (referred to as CTRNN, hereinafter) **81**.

The CTRNN **81** of the upper time series prediction generation unit **62** estimates (predicts) how much prediction errors the RNNs **71**-**1** to **71**-N of the lower time series prediction generation unit **61** generate at the time of generation, and outputs thus obtained estimation prediction errors.

That is, the CTRNN **81** uses and learns the time series data errorL N of the prediction errors of the RNNs **71**-**1** to **71**-N as the teacher data, and generates and outputs estimation prediction errors errorPredH N={errorPredH_{1}, errorPredH_{2}, . . . , errorPredH_{N}} of the RNNs **71**-**1** to **71**-N based on the learning result. The estimation prediction errors errorPredH N at the time point “t” are set such that errorPredH^{t }N={errorPredH^{t}_{1}, errorPredH^{t}_{2}, . . . , errorPredH^{t}_{N}}.

Furthermore, a task ID is given to the CTRNN **81** as a task switch signal for switching which one of the estimation prediction errors errorPredH N of the action sequences A and B is output.

The gate signal conversion unit **63** converts the estimation prediction errors errorPredH^{t }N at the time point “t” to gate signals gate^{t }N={g^{t}_{1}, g^{t}_{2}, . . . , g^{t}_{N}} using the softmax function, and outputs thus converted signals to the gates **72**-**1** to **72**-N.

The gate signal g^{t}_{n }for the gate **72**-*n *at the time point “t” is represented by the following mathematical formula (2).

According to the mathematical formula (2), the nonlinear conversion is performed such that a prediction error of small value comes to be of large value, while a prediction error of large value comes to be of small value. As a result, a control under which the gate is opened larger in case the prediction error is of smaller value, while the gate is opened smaller in case the prediction error is of larger value is carried out at the gates **72**-**1** to **72**-N of the lower time series prediction generation unit **61**.
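The behavior described above, in which a smaller prediction error yields a larger gate signal and the gate signals sum to 1, can be sketched as follows. The exact form of the mathematical formula (2) is not reproduced in this text, so the sketch assumes that the softmax function is applied to the negated estimation prediction errors; the function name `errors_to_gates` and the example values are hypothetical.

```python
import math


def errors_to_gates(estimated_errors):
    """Map estimated prediction errors to gate signals that sum to 1.

    Assumption: softmax over the negated errors, so that a smaller error
    produces a larger gate signal.
    """
    negated = [-e for e in estimated_errors]
    shift = max(negated)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in negated]
    total = sum(exps)
    return [v / total for v in exps]


# Hypothetical estimation prediction errors for N = 3 RNNs.
gates = errors_to_gates([0.1, 0.5, 0.9])
```

Unlike a winner-take-all selection, this conversion changes the gate signals smoothly when two estimated errors are close to each other, so a small fluctuation does not switch the output abruptly from one RNN to another.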

In thus configured information processing apparatus **51**, the upper time series prediction generation unit **62** outputs the estimation prediction errors errorPredH N which are estimation values of prediction errors generated by the RNNs **71**-**1** to **71**-N of the lower time series prediction generation unit **61** at the time of generation, and the estimation prediction errors errorPredH N are converted to the gate signals gate N for controlling the opened/closed state of the gates **72**-**1** to **72**-N. Then, the sum of the output signals sm_{1 }(t+1) to sm_{N }(t+1) of the RNNs **71**-**1** to **71**-N output from the gates **72**-**1** to **72**-N which have their opened/closed state controlled, which is represented by above-described mathematical formula (1), is supplied to the sensor and motor arranged in the humanoid robot as the sensor motor signal sm (t+1) at the time point “t+1”.

Since the estimation prediction errors errorPredH N as the outputs of the upper time series prediction generation unit **62** are converted to the gate signals gate N in the gate signal conversion unit **63** arranged at the subsequent stage, it can be said that the upper time series prediction generation unit **62** predicts which gate among the gates **72**-**1** to **72**-N is opened (large) at the time point “t”.

The RNN **71**-*n* includes an input layer **101**, an intermediate layer (hidden layer) **102**, and an output layer **103**. The input layer **101** has nodes **111** of a predetermined number, the intermediate layer (hidden layer) **102** has nodes **112** of a predetermined number, and the output layer **103** has nodes **113** of a predetermined number.

To the nodes **111** of the input layer **101**, the sensor motor signal sm (t) at the time point “t” is input, together with data that was output from some of the nodes **113** of the output layer **103** at the time point “t−1”, one step before the time point “t”, and is fed back as a context c (t) indicative of the internal state of the RNN **71**-*n*.

The nodes **112** of the intermediate layer **102** perform the weighting addition processing of summing up data input from the nodes **111** of the input layer **101** and weighting coefficients between the nodes **112** and nodes **111** which have been obtained by the learning in advance, and output thus obtained summed up data to the nodes **113** of the output layer **103**.

The nodes **113** of the output layer **103** perform the weighting addition processing of summing up data input from the nodes **112** of the intermediate layer **102** and weighting coefficients between the nodes **113** and nodes **112** which have been obtained by the learning in advance. Some of the nodes **113** configuring the output layer **103** output thus obtained summed up data as the sensor motor signal sm_{n }(t+1) at the time point “t+1”. Furthermore, other nodes **113** configuring the output layer **103** feed back the summed up data to the nodes **111** of the input layer **101** as a context c (t+1) at the time point “t+1”.

As described above, by carrying out the weighting addition processing using weighting coefficients between nodes which have been obtained by the learning in advance, the RNN **71**-*n *predicts and outputs the sensor motor signal sm_{n }(t+1) at the time point “t+1” with respect to the input sensor motor signal sm (t) at the time point “t”.
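One prediction step of such an RNN with a context loop can be sketched as follows. This is an illustration under stated assumptions: the text specifies only weighted addition at the nodes, so the tanh activation at the intermediate layer, the weight layout (one row of weights per receiving node), the function names, and the example weights are all hypothetical.

```python
import math


def weighted_sum(weights, inputs):
    """Weighted addition: one row of weights per receiving node."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def rnn_step(sm_t, context_t, w_hidden, w_output, n_signal):
    """One prediction step: (sm(t), c(t)) -> (sm_n(t+1), c(t+1))."""
    inputs = list(sm_t) + list(context_t)           # input layer
    hidden = [math.tanh(v) for v in weighted_sum(w_hidden, inputs)]  # assumed activation
    outputs = weighted_sum(w_output, hidden)        # output layer
    # The first n_signal outputs are the predicted signal; the rest are context.
    return outputs[:n_signal], outputs[n_signal:]


# Hypothetical sizes: 2 signal nodes, 1 context node, 2 hidden nodes.
w_hidden = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
w_output = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context = [0.0]
for sm_t in ([0.1, 0.2], [0.2, 0.3]):
    sm_pred, context = rnn_step(sm_t, context, w_hidden, w_output, n_signal=2)
```

The loop feeds the context output of one step back as the context input of the next step, which is what lets the same input sm (t) produce different predictions depending on the internal state.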

In the learning of obtaining weighting coefficients between nodes, the BPTT (Back Propagation Through Time) method is employed. The BPTT method is a learning algorithm for RNNs having a context loop to which the BP (Back Propagation) method in the general hierarchical type neural networks is applied by spatially expanding the state of temporal signal propagation. This method is similarly employed in obtaining weighting coefficients in the CTRNN **81** to be described subsequently.

The CTRNN **81** is configured, for example, as a CTRNN **141** that includes an input layer **151**, an intermediate layer (hidden layer) **152**, an output layer **153**, and arithmetic sections **154**, **155**.

The input layer **151** has input nodes **160**-*i *(i=1, . . . , I), parameter nodes **161**-*r *(r=1, . . . , R), and context input nodes **162**-*k *(k=1, . . . , K), while the intermediate layer **152** has hidden nodes **163**-*j *(j=1, . . . , J). Furthermore, the output layer **153** has output nodes **164**-*i *(i=1, . . . , I) and context output nodes **165**-*k *(k=1, . . . , K).

In case the respective nodes of the input nodes **160**-*i*, parameter nodes **161**-*r*, context input nodes **162**-*k*, hidden nodes **163**-*j*, output nodes **164**-*i*, and context output nodes **165**-*k *do not have to be discriminated, these nodes are simply referred to as input node **160**, parameter node **161**, context input node **162**, hidden node **163**, output node **164**, and context output node **165**.

The CTRNN **141** learns to predict and output a state vector x^{u} (t+1) at the time point “t+1” with respect to a state vector x^{u} (t) at the time point “t” input thereto. The CTRNN **141** has a regression loop, called a context loop, indicative of the internal state of the network, and can learn the temporal development rule of the subject time series data by performing processing based on the internal state.

The state vector x^{u }(t) at the time point “t” supplied to the CTRNN **141** is input to the input node **160**. To the parameter node **161**, a parameter tsdata^{u }is input. The parameter tsdata^{u }is data that identifies the kind (pattern of time series data) of the state vector x^{u }(t) supplied to the CTRNN **141**, which is data that identifies the action sequence in the CTRNN **81**. Even if the parameter tsdata^{u }is a fixed value, since it can be considered that the same value is input continuously, data (vector) which is input to the parameter node **161** at the time point “t” is set to parameter tsdata^{u }(t).

To the input nodes **160**-*i*, data x^{u}_{i }(t) which are i-th components configuring the state vector x^{u }(t) at the time point “t” are input. Furthermore, to the parameter nodes **161**-*r*, data tsdata^{u}_{r }(t) which are r-th components configuring the parameter tsdata^{u }(t) at the time point “t” are input. Moreover, to the context input nodes **162**-*k*, data c^{u}_{k }(t) which are k-th components configuring an internal state vector c^{u }(t) of the CTRNN **141** at the time point “t” are input.

In case the data x^{u}_{i }(t), tsdata^{u}_{r }(t), and c^{u}_{k }(t) are input to the respective input nodes **160**-*i*, parameter nodes **161**-*r*, and context input nodes **162**-*k*, the data x_{i }(t), tsdata_{r }(t), and c_{k }(t) which are output from the input nodes **160**-*i*, parameter nodes **161**-*r*, and context input nodes **162**-*k *are represented by the following mathematical formulas (3), (4), and (5).

*x*_{i}(*t*)=*f*(*x*_{i}^{u}(*t*)) [Mathematical Formula 3]

*ts*data_{r}(*t*)=*f*(*ts*data_{r}^{u}(*t*)) [Mathematical Formula 4]

*c*_{k}(*t*)=*f*(*c*_{k}^{u}(*t*)) [Mathematical Formula 5]

The function “f” in the mathematical formulas (3) to (5) is a differentiable continuous function such as the sigmoid function, and the mathematical formulas (3) to (5) represent that the data x^{u}_{i }(t), tsdata^{u}_{r }(t), and c^{u}_{k }(t) which are input to the respective input nodes **160**-*i*, parameter nodes **161**-*r*, and context input nodes **162**-*k *are activated by the function “f”, and output as the data x_{i }(t), tsdata_{r }(t), and c_{k }(t) from the input nodes **160**-*i*, parameter nodes **161**-*r*, and context input nodes **162**-*k*. The superscript “u” of the data x^{u}_{i }(t), tsdata^{u}_{r }(t), and c^{u}_{k }(t) represents the internal state of nodes before being activated (which is similar with respect to other nodes).

Data h^{u}_{j }(t) which is input to the hidden nodes **163**-*j *can be represented by the following mathematical formula (6) using weighting coefficients w^{h}_{ij }representing the weight of coupling between the input nodes **160**-*i *and the hidden nodes **163**-*j*, weighting coefficients w^{h}_{jr }representing the weight of coupling between the parameter nodes **161**-*r *and the hidden nodes **163**-*j*, and weighting coefficients w^{h}_{jk }representing the weight of coupling between the context input nodes **162**-*k *and the hidden nodes **163**-*j*, while data h_{j}(t) which is output from the hidden nodes **163**-*j *can be represented by the following mathematical formula (7).

*h*_{j}^{u}(*t*)=Σ*w*_{ij}^{h}*x*_{i}(*t*)+Σ*w*_{jr}^{h}*ts*data_{r}(*t*)+Σ*w*_{jk}^{h}*c*_{k}(*t*) [Mathematical Formula 6]

*h*_{j}(*t*)=*f*(*h*_{j}^{u}(*t*)) [Mathematical Formula 7]

The “Σ” of the first term of the right-hand side in the mathematical formula (6) represents summation over i=1 to I, the “Σ” of the second term represents summation over r=1 to R, and the “Σ” of the third term represents summation over k=1 to K.

Similarly, data y^{u}_{i }(t) which is input to the output nodes **164**-*i*, data y_{i }(t) which is output from the output nodes **164**-*i*, data o^{u}_{k }(t) which is input to the context output nodes **165**-*k*, and data o_{k }(t) which is output from the context output nodes **165**-*k *can be represented by the following mathematical formulas.

*y*_{i}^{u}(*t*)=Σ*w*_{ij}^{y}*h*_{j}(*t*) [Mathematical Formula 8]

*y*_{i}(*t*)=*f*(*y*_{i}^{u}(*t*)) [Mathematical Formula 9]

*o*_{k}^{u}(*t*)=Σ*w*_{jk}^{o}*h*_{j}(*t*) [Mathematical Formula 10]

*o*_{k}(*t*)=*f*(*o*_{k}^{u}(*t*)) [Mathematical Formula 11]

The w^{y}_{ij }in the mathematical formula (8) are weighting coefficients representing the weight of coupling between the hidden nodes **163**-*j *and the output nodes **164**-*i*, and the “Σ” therein represents summation over j=1 to J. Furthermore, the w^{o}_{jk }in the mathematical formula (10) are weighting coefficients representing the weight of coupling between the hidden nodes **163**-*j *and the context output nodes **165**-*k*, and the “Σ” therein represents summation over j=1 to J.
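The forward computation of the mathematical formulas (3) to (11) can be illustrated by the following minimal sketch. It assumes that the function “f” is the sigmoid function, and the names `sigmoid`, `ctrnn_forward`, and the nested-list layout of the weighting coefficients are hypothetical choices made only for illustration.

```python
import math


def sigmoid(u):
    """Activation function f, assumed here to be the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-u))


def ctrnn_forward(x_u, ts_u, c_u, w_h_x, w_h_ts, w_h_c, w_y, w_o):
    """One forward pass through the input, hidden, and output layers.

    x_u, ts_u, c_u: internal states of the input, parameter, and context
        input nodes (lists of floats).
    w_h_x[j][i], w_h_ts[j][r], w_h_c[j][k]: couplings into hidden node j.
    w_y[i][j], w_o[k][j]: couplings from hidden node j to the output nodes
        and context output nodes.
    """
    x = [sigmoid(v) for v in x_u]    # formula (3)
    ts = [sigmoid(v) for v in ts_u]  # formula (4)
    c = [sigmoid(v) for v in c_u]    # formula (5)

    h = []
    for j in range(len(w_h_x)):
        # formula (6): weighted sums over i, r, and k
        h_u = (sum(w * v for w, v in zip(w_h_x[j], x))
               + sum(w * v for w, v in zip(w_h_ts[j], ts))
               + sum(w * v for w, v in zip(w_h_c[j], c)))
        h.append(sigmoid(h_u))       # formula (7)

    # formulas (8) and (9): output nodes
    y = [sigmoid(sum(w * v for w, v in zip(row, h))) for row in w_y]
    # formulas (10) and (11): context output nodes
    o = [sigmoid(sum(w * v for w, v in zip(row, h))) for row in w_o]
    return y, o
```

The sketch stops at the outputs y_{i }(t) and o_{k }(t); the subsequent update by the arithmetic sections **154** and **155** is not included.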

The arithmetic section **154** calculates the difference Δx^{u}_{i }(t+1) between the data x^{u}_{i }(t) at the time point “t” and the data x^{u}_{i }(t+1) at the time point “t+1” from the data y_{i }(t) which is output from the output nodes **164**-*i *using the following mathematical formula (12), and further calculates the data x^{u}_{i }(t+1) at the time point “t+1” using the following mathematical formula (13) to output thus calculated data.

In those mathematical formulas, the “α” and “τ” each represent an arbitrary coefficient.

Accordingly, when the data x^{u}_{i }(t) at the time point “t” is input to the CTRNN **141**, the data x^{u}_{i }(t+1) at the time point “t+1” is output from the arithmetic section **154** of the CTRNN **141**. The data x^{u}_{i }(t+1) at the time point “t+1” output from the arithmetic section **154** is also supplied (fed back) to the input nodes **160**-*i*.

The arithmetic section **155** calculates the difference Δc^{u}_{k }(t+1) between the data c^{u}_{k }(t) at the time point “t” and the data c^{u}_{k }(t+1) at the time point “t+1” from the data o_{k }(t) which is output from the context output nodes **165**-*k *using the following mathematical formula (14), and further calculates the data c^{u}_{k }(t+1) at the time point “t+1” using the following mathematical formula (15) to output thus calculated data.

The data c^{u}_{k }(t+1) at the time point “t+1” output from the arithmetic section **155** is fed back to the context input nodes **162**-*k*.

In the mathematical formula (15), the internal state vector c^{u }(t+1) of the network at the next time point “t+1” is obtained by weighting and adding (adding with a predetermined ratio) the data o_{k }(t) output from the context output nodes **165**-*k *with the coefficient “α” to the internal state vector c^{u }(t) indicative of the current internal state of the network. Accordingly, it can be said that the CTRNN **141** shown in

As described above, when the data x^{u }(t) and c^{u }(t) at the time point “t” are input, the CTRNN **141** sequentially carries out the processing of generating and outputting the data x^{u }(t+1) and c^{u }(t+1) at the time point “t+1”. Accordingly, in case the weighting coefficients w^{h}_{ij}, w^{h}_{jr}, w^{h}_{jk}, w^{y}_{ij}, and w^{o}_{jk }have been learned, time series data can be generated by giving the initial value x^{u }(t_{0})=X0 of the input data x^{u }(t) which is input to the input node **160**, the parameter tsdata^{u }which is input to the parameter node **161**, and the initial value c^{u }(t_{0})=C0 of the context input data c^{u }(t) which is input to the context input node **162**.

The CTRNN **141** shown in **81** shown in **160** of the CTRNN **141**, and the task ID is given to the parameter node **161**. Accordingly, the number of pieces I of the input node **160** shown in **71** of the lower time series prediction generation unit **61**. As the initial value c^{u }(t_{0})=C0 of the context input data c^{u }(t) input to the context input node **162**, for example, a predetermined random value is given.

Next, referring to a flowchart shown in **61** will be explained.

Firstly, in step S**1**, the control circuit **76** of the lower time series prediction generation unit **61** reads in input data at a predetermined time point supplied as teacher data. As described above, the input data is a sensor motor signal, and it is assumed that the sensor motor signal sm (t) at the time point “t” is read in. The sensor motor signal sm (t) thus read in is supplied by the control circuit **76** to each of the N pieces of RNNs **71**-**1** to **71**-N configuring the lower time series prediction generation unit **61**.

In step S**2**, RNN **71**-*n *(n=1, 2, . . . , N) of the lower time series prediction generation unit **61** calculates the sensor motor signal sm_{n }(t+1) at the time point “t+1” with respect to the sensor motor signal sm (t) at the time point “t”.

Furthermore, in step S**2**, the arithmetic circuit **74** calculates the prediction error errorL^{t+1}_{n }of the RNN **71**-*n*. Specifically, as the prediction error errorL^{t+1}_{n}, the arithmetic circuit **74** calculates prediction errors corresponding to sensor motor signals for the past L time steps from the time point “t+1”, which are represented by the following mathematical formula (16).

In the mathematical formula (16), the sm_{n,i}′ (T) represents a sensor motor signal which is output by the i′-th node **113** of the I′ pieces of nodes **113** (**103** of the RNN **71**-*n *which outputs a sensor motor signal sm (T) at the time point “T”, and the sm*_{n,i}′ (T) represents a sensor motor signal as teacher data corresponding thereto.

According to the mathematical formula (16), the sum of errors between the sensor motor signal sm_{n,i}′ (T) of the i′-th node **113** in the output layer **103** of the RNN **71**-*n *and the teacher data sm*_{n,i}′ (T) from the time point T=t+1−L to the time point T=t+1 is set to the prediction error errorL^{t+1}_{n }of the RNN **71**-*n *at the time point “t+1”. In case past sensor motor signals do not exist for the full L time steps, the prediction error errorL^{t+1}_{n }can be obtained using only the data for the existing time steps.
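The windowed accumulation described above can be sketched as follows. The mathematical formula (16) itself is not reproduced in this text, so the per-node error measure used here (half the squared difference) is an assumption, and the function name `windowed_prediction_error` and its argument layout are hypothetical.

```python
def windowed_prediction_error(pred, teacher, t_plus_1, window_len):
    """Sum the per-node errors of one RNN over T = t+1-L .. t+1.

    pred[T][i] and teacher[T][i] hold the predicted and teacher sensor
    motor signals of output node i at time point T.
    ASSUMPTION: the per-node error is 0.5 * (prediction - teacher) ** 2.
    When fewer than L past time steps exist, only the existing time steps
    are used, as stated in the text.
    """
    start = max(0, t_plus_1 - window_len)
    total = 0.0
    for T in range(start, t_plus_1 + 1):
        for p, s in zip(pred[T], teacher[T]):
            total += 0.5 * (p - s) ** 2
    return total
```

Clipping the start of the window at the first available time point corresponds to using only the data for existing time steps.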

In step S**3**, the arithmetic circuit **74** supplies the prediction error errorL^{t+1}_{n }of the RNN **71**-*n *at the time point “t+1” to the memory **75**. Accordingly, the memory **75** is supplied with the N pieces of prediction errors errorL^{t+1}_{1 }to errorL^{t+1}_{N }of the RNNs **71**-**1** to **71**-N, and the memory **75** stores the prediction errors errorL^{t+1 }N={errorL^{t+1}_{1}, errorL^{t+1}_{2}, . . . , errorL^{t+1}_{N}}. Furthermore, in case the judgment in step S**7** to be described later is No, the processing of the step S**3** is repeated for a predetermined number of time steps, so that the time series data errorL N of the prediction errors for the teacher data is stored in the memory **75**.

In step S**4**, the control circuit **76** calculates learning weight υ_{n }of the RNN **71**-*n *according to the prediction error errorL^{t+1}_{n}. Specifically, the control circuit **76** calculates the learning weight υ_{n }using the following mathematical formula (17) employing the softmax function.
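As an illustration of how a softmax function can turn prediction errors into learning weights, a minimal sketch is given below. The exact form of the mathematical formula (17) is not reproduced in this text; the sketch assumes that the softmax is taken over the negated prediction errors, so that an RNN with a smaller prediction error receives a larger learning weight. The function name `learning_weights` is hypothetical.

```python
import math


def learning_weights(errors):
    """Softmax over negated prediction errors.

    ASSUMPTION: weight_n is proportional to exp(-error_n); the actual
    formula (17) may include additional scaling terms.
    """
    smallest = min(errors)  # shift for numerical stability
    exps = [math.exp(-(e - smallest)) for e in errors]
    total = sum(exps)
    return [v / total for v in exps]
```

Under this assumption the learning weights are positive and sum to one, so the RNNs compete with one another for each teacher data.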

In step S**5**, the control circuit **76** updates weighting coefficient w_{ab, n }of the RNN **71**-*n *by employing the BPTT (Back Propagation Through Time) method. The weighting coefficient w_{ab, n }represents a weighting coefficient between the nodes **111** of the input layer **101** and the nodes **112** of the intermediate layer **102** of the RNN **71**-*n*, or represents a weighting coefficient between the nodes **112** of the intermediate layer **102** and the nodes **113** of the output layer **103** of the RNN **71**-*n*.

In updating the weighting coefficient w_{ab, n }of the RNN **71**-*n*, the weighting coefficient w_{ab, n }of the RNN **71**-*n *is calculated according to the learning weight υ_{n }calculated in step S**4**. Specifically, by employing the following mathematical formulas (18) and (19), the (s+1)-th weighting coefficient w_{ab, n }(s+1) can be obtained from the s-th weighting coefficient w_{ab, n }(s) in the repeated calculation employing the BPTT method.

Δ*w*_{ab,n}(*s+*1)=η_{1}γ_{n}*δw*_{ab,n}+α_{1}*Δw*_{ab,n}(*s*) [Mathematical Formula 18]

*w*_{ab,n}(*s+*1)=*w*_{ab,n}(*s*)+Δ*w*_{ab,n}(*s+*1) [Mathematical Formula 19]

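In the mathematical formula (18), the η_{1 }represents a learning coefficient, the α_{1 }represents an inertia coefficient, and the γ_{n }is the learning weight of the RNN **71**-*n *calculated in step S**4** (written as υ_{n }in the text). In the mathematical formula (18), in case s=1, Δw_{ab, n }(s) is set to “0”.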
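The update of the mathematical formulas (18) and (19) for a single weighting coefficient can be sketched as follows. The function name `update_weight`, the argument names, and the default values of the learning coefficient and inertia coefficient are hypothetical and chosen only for illustration.

```python
def update_weight(w, delta_w_prev, error_amount, learning_weight,
                  eta1=0.1, alpha1=0.9):
    """Apply formulas (18) and (19) to one weighting coefficient.

    w:               current weighting coefficient w(s)
    delta_w_prev:    previous update amount (0 when s = 1)
    error_amount:    error amount of the coefficient obtained by BPTT
    learning_weight: learning weight of the RNN calculated in step S4
    eta1, alpha1:    learning and inertia coefficients (illustrative values)
    """
    delta_w = eta1 * learning_weight * error_amount + alpha1 * delta_w_prev  # (18)
    return w + delta_w, delta_w                                              # (19)
```

Because the learning weight multiplies only the first term, an RNN with a small learning weight still carries over its previous update amount through the inertia term, but receives little new correction.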

In step S**6**, the control circuit **76** supplies the use frequencies FREQ_{1 }to FREQ_{N }of the RNNs **71**-**1** to **71**-N to the memory **75**. The memory **75** stores thus supplied use frequencies FREQ_{1 }to FREQ_{N }of the RNNs **71**-**1** to **71**-N. In above-described step S**5**, in case the learning weight υ_{n }is larger, the weighting coefficient w_{ab, n }of the RNN **71**-*n *is updated, and it is considered that the RNN **71**-*n *is utilized. Accordingly, for example, the control circuit **76** counts up the use frequency FREQ_{n }of the RNN **71**-*n *whose learning weight υ_{n }is equal to or larger than a predetermined value. These use frequencies FREQ_{1 }to FREQ_{N }are used in additional learning to be described later with reference to

In step S**7**, the control circuit **76** of the lower time series prediction generation unit **61** judges whether or not supplying input data is ended.

In step S**7**, in case it is determined that supplying input data is not ended, that is, in case input data at the next time point following the input data supplied in step S**1** is supplied, the processing returns to step S**1**, and the subsequent processing is repeated.

On the other hand, in step S**7**, in case it is determined that supplying input data is ended, the learning processing is ended.

Next, learning the time series data of the prediction errors by the CTRNN **81** of the upper time series prediction generation unit **62** will be explained.

In case of making a humanoid robot having built therein the information processing apparatus **51** learn a plurality of action sequences, it is necessary that the weighting coefficients w^{h}_{ij}, w^{h}_{jr}, and w^{h}_{jk }between the respective nodes of the input layer **151** and intermediate layer **152** and the weighting coefficients w^{y}_{ij }and w^{o}_{jk }between the respective nodes of the intermediate layer **152** and output layer **153**, which are obtained as the result of the learning, are values capable of corresponding to all the action sequences.

Accordingly, in the learning processing, learning time series data corresponding to the plural action sequences is executed simultaneously. That is, in the learning processing, the CTRNNs **141** are prepared for the respective action sequences to be learned, and the weighting coefficients w^{h}_{ij}, w^{h}_{jr}, w^{h}_{jk}, w^{y}_{ij}, and w^{o}_{jk }are obtained for the respective action sequences. Then, by repeatedly executing the processing of setting the average values of the weighting coefficients to single weighting coefficients w^{h}_{ij}, w^{h}_{jr}, w^{h}_{jk}, w^{y}_{ij}, and w^{o}_{jk}, weighting coefficients w^{h}_{ij}, w^{h}_{jr}, w^{h}_{jk}, w^{y}_{ij}, and w^{o}_{jk }of the CTRNN **81** to be utilized in the generation processing can be obtained.

**62** in learning time series data of Q pieces of prediction errors corresponding to Q pieces of action sequences. In this embodiment, since action sequences to be learned are A, B, and C, the number of the action sequences is three, that is, Q=3.

Firstly, in step S**31**, the upper time series prediction generation unit **62** reads in the time series data errorL N of the Q pieces of prediction errors as teacher data from the memory **75** of the lower time series prediction generation unit **61**. Then, the upper time series prediction generation unit **62** supplies thus read in Q pieces of time series data errorL N to the Q pieces of CTRNNs **141**, respectively.

In step S**32**, the upper time series prediction generation unit **62** reads in task IDs for identifying the respective Q pieces of action sequences. In this embodiment, task IDs for identifying the three action sequences A, B, and C are read in. Then, the upper time series prediction generation unit **62** supplies a task ID for identifying the action sequence A to one of the CTRNNs **141** to which teacher data for the action sequence A is supplied, supplies a task ID for identifying the action sequence B to one of the CTRNNs **141** to which teacher data for the action sequence B is supplied, and supplies a task ID for identifying the action sequence C to one of the CTRNNs **141** to which teacher data for the action sequence C is supplied.

In step S**33**, the upper time series prediction generation unit **62** assigns “1” to the variable “s” representing the number of times of learning.

In step S**34**, in the CTRNNs **141** corresponding to the Q pieces of time series data, the upper time series prediction generation unit **62** calculates error amounts δw^{h}_{ij}, δw^{h}_{jr}, and δw^{h}_{jk }of the weighting coefficients w^{h}_{ij }(s), w^{h}_{jr }(s), and w^{h}_{jk }(s) between the respective nodes of the input layer **151** and intermediate layer **152**, and error amounts δw^{y}_{ij }and δw^{o}_{jk }of the weighting coefficients w^{y}_{ij }(s) and w^{o}_{jk }(s) between the respective nodes of the intermediate layer **152** and output layer **153** by employing the BPTT method. In the CTRNNs **141** to which the q-th (q=1, . . . , Q) time series data is input, the error amounts δw^{h}_{ij}, δw^{h}_{jr}, δw^{h}_{jk}, δw^{y}_{ij}, and δw^{o}_{jk }which are obtained by employing the BPTT method are represented as error amounts δw^{h}_{ij, q}, δw^{h}_{jr, q}, δw^{h}_{jk, q}, δw^{y}_{ij, q}, and δw^{o}_{jk, q}.

In the calculation employing the BPTT method in step S**34**, when inversely propagating error amount δc^{u}_{k }(t+1) of the data c^{u}_{k }(t+1) of the context input nodes **162**-*k *at the time point “t+1” to error amount δo_{k }(t) of the data o_{k }(t) of the context output nodes **165**-*k *at the time point “t”, the upper time series prediction generation unit **62** adjusts the time constant of the context data by carrying out the division processing with an arbitrary positive coefficient “m”.

That is, the upper time series prediction generation unit **62** obtains the error amount δo_{k }(t) of the data o_{k }(t) of the context output nodes **165**-*k *at the time point “t” employing the following mathematical formula (20) using the error amount δc^{u}_{k }(t+1) of the data c^{u}_{k }(t+1) of the context input nodes **162**-*k *at the time point “t+1”,

Employing the mathematical formula (20) in the BPTT method, the degree of influence one time step ahead of the context data representing the internal state of the CTRNN **141** can be adjusted.

In step S**35**, the upper time series prediction generation unit **62** averages and updates the respective weighting coefficients w^{h}_{ij}, w^{h}_{jr}, and w^{h}_{jk }between the respective nodes of the input layer **151** and intermediate layer **152** and the respective weighting coefficients w^{y}_{ij }and w^{o}_{jk }between the respective nodes of the intermediate layer **152** and output layer **153** using Q pieces of time series data.

That is, employing the following mathematical formulas (21) to (30), the upper time series prediction generation unit **62** obtains weighting coefficients w^{h}_{ij }(s+1), w^{h}_{jr }(s+1), and w^{h}_{jk }(s+1) between the respective nodes of the input layer **151** and intermediate layer **152** and weighting coefficients w^{y}_{ij }(s+1) and w^{o}_{jk }(s+1) between the respective nodes of the intermediate layer **152** and output layer **153**.

In the mathematical formulas, the η_{2 }represents a learning coefficient, and α_{2 }represents an inertia coefficient. In the mathematical formulas (21), (23), (25), (27), and (29), in case s=1, Δw^{h}_{ij }(s), Δw^{h}_{jr }(s), Δw^{h}_{jk }(s), Δw^{y}_{ij }(s), and Δw^{o}_{jk }(s) are set to “0”.
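The averaged update of step S**35** can be illustrated by the following sketch for a single weighting coefficient. The mathematical formulas (21) to (30) are not reproduced in this text; the sketch assumes that they follow the same pattern as the mathematical formulas (18) and (19), with the error amount replaced by the average of the error amounts over the Q pieces of time series data. The function name `averaged_update` and the default coefficient values are hypothetical.

```python
def averaged_update(w, delta_w_prev, error_amounts, eta2=0.1, alpha2=0.9):
    """Update one weighting coefficient from Q per-sequence error amounts.

    error_amounts: list of Q error amounts, one per action sequence,
        each obtained by BPTT on the corresponding time series data.
    ASSUMPTION: the update uses the plain average of the Q error amounts,
    scaled by the learning coefficient eta2, plus an inertia term alpha2.
    """
    mean_error = sum(error_amounts) / len(error_amounts)
    delta_w = eta2 * mean_error + alpha2 * delta_w_prev
    return w + delta_w, delta_w
```

In this form a single set of weighting coefficients is shared by all the action sequences, which is consistent with the requirement that the learned coefficients be capable of corresponding to all of them.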

In the step S**36**, the upper time series prediction generation unit **62** judges whether or not the variable “s” is equal to or smaller than a predetermined number of times of learning. The predetermined number of times of learning set up here is a number of times of learning after which the learning error is recognized to be sufficiently small.

In step S**36**, in case it is determined that the variable “s” is equal to or smaller than the predetermined number of times of learning, that is, the learning has not been carried out by the number of times under which it is recognized that the learning error is sufficiently small, in step S**37**, the upper time series prediction generation unit **62** increments the variable “s” by “1”, and the processing returns to step S**34**. Accordingly, the processing from step S**34** to step S**36** is repeated. On the other hand, in step S**36**, in case it is determined that the variable “s” is larger than the predetermined number of times of learning, the learning processing is ended.

In step S**36**, other than determining the ending of the learning processing depending on the number of times of learning, the ending of the learning processing may be determined depending on whether or not the learning error is within a predetermined reference value.

As described above, in the learning processing of the upper time series prediction generation unit **62**, by obtaining the weighting coefficients w^{h}_{ij}, w^{h}_{jr}, w^{h}_{jk}, w^{y}_{ij}, and w^{o}_{jk }for the respective action sequences, and repeatedly executing the processing of obtaining the average values thereof, the weighting coefficients w^{h}_{ij}, w^{h}_{jr}, w^{h}_{jk}, w^{y}_{ij}, and w^{o}_{jk }of the CTRNN **81** to be utilized in the generation processing can be obtained.

In above-described learning processing, the processing of obtaining the average values of the weighting coefficients w^{h}_{ij}, w^{h}_{jr}, w^{h}_{jk}, w^{y}_{ij}, and w^{o}_{jk }for the respective action sequences is executed every time. On the other hand, this processing may be carried out every predetermined number of times. For example, in case the predetermined number of times of learning that ends the learning processing is 10000 times, the processing of obtaining the average values of the weighting coefficients w^{h}_{ij}, w^{h}_{jr}, w^{h}_{jk}, w^{y}_{ij}, and w^{o}_{jk }for the respective action sequences may be executed every 10 times of learning.

Next, referring to a flowchart shown in **51** shown in **71**-**1** to **71**-N and CTRNN **81** in which the weighting coefficients obtained in the learning processing explained with reference to

Firstly, in step S**51**, the CTRNN **81** of the upper time series prediction generation unit **62** reads in the initial value of input data. The initial value of input data is the initial value to be supplied to the input node **160** and context input node **162** to which a predetermined random value is supplied.

In step S**52**, the CTRNN **81** of the upper time series prediction generation unit **62** reads in a task ID for identifying an action sequence. Thus read in task ID is supplied to the parameter node **161**.

In step S**53**, the CTRNN **81** of the upper time series prediction generation unit **62** executes the processing of generating the estimation prediction errors errorPredH N of the RNNs **71**-**1** to **71**-N at a predetermined time point. The details of the generation processing will be explained later with reference to **81** may generate the estimation prediction errors errorPredH^{t+1 }N at the time point “t+1”, and outputs thus generated estimation prediction errors to the gate signal conversion unit **63**.

In step S**54**, the gate signal conversion unit **63** converts thus supplied estimation prediction errors errorPredH^{t+1 }N to the gate signals gate^{t+1 }N by employing the mathematical formula (2), and outputs thus converted gate signals to the gates **72**-**1** to **72**-N.

In step S**55**, the sensor motor signal sm (t) at the time point “t” is input to the RNN **71**-*n *of the lower time series prediction generation unit **61**, and, with respect to thus input sensor motor signal sm (t) at the time point “t”, the RNN **71**-*n *generates the sensor motor signal sm_{n }(t+1) at the time point “t+1”, and outputs thus generated sensor motor signal to the gate **72**-*n*.

In step S**56**, the gate **72**-*n *outputs the sensor motor signal sm_{n }(t+1) corresponding to the gate signal g^{t+1}_{n }of the gate signals gate^{t+1 }N supplied from the gate signal conversion unit **63**. That is, in the gate **72**-*n*, the gate is opened large in case the gate signal g^{t+1}_{n }is large, while the gate is opened small in case the gate signal g^{t+1}_{n }is small. To the composition circuit **73**, the sensor motor signal sm_{n }(t+1) according to the opened state of the gate of the gate **72**-*n *is supplied.

In step S**57**, the composition circuit **73** composites outputs from the respective gates **72**-**1** to **72**-N employing the mathematical formula (1), and outputs thus composited signal as the sensor motor signal sm (t+1) at the time point “t+1”.
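Steps S**54** to S**57** can be illustrated by the following sketch. The composition is written as a sum of the RNN outputs weighted by the gate signals. The conversion from estimation prediction errors to gate signals is only assumed here to be a softmax over the negated errors, since the mathematical formula (2) is not reproduced in this passage. The function names `gate_signals` and `compose` are hypothetical.

```python
import math


def gate_signals(estimated_errors):
    """Convert estimation prediction errors to gate signals.

    ASSUMPTION: a softmax over negated errors, so that an RNN expected to
    have a small prediction error has its gate opened large.
    """
    smallest = min(estimated_errors)  # shift for numerical stability
    exps = [math.exp(-(e - smallest)) for e in estimated_errors]
    total = sum(exps)
    return [v / total for v in exps]


def compose(gates, outputs):
    """Sum the gated outputs of the N RNNs.

    outputs[n][i] is element i of the sensor motor signal predicted by
    RNN n for the time point t+1; gates[n] is its gate signal.
    """
    dim = len(outputs[0])
    return [sum(g * out[i] for g, out in zip(gates, outputs))
            for i in range(dim)]
```

For example, `compose(gate_signals(errors), outputs)` yields a signal dominated by the RNNs whose estimated prediction errors are smallest.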

In step S**58**, the information processing apparatus **51** judges whether or not generating time series data will be ended. In step S**58**, in case it is determined that generating time series data will not be ended, the processing returns to the step S**53**, and the subsequent processing is repeated. As a result, in the upper time series prediction generation unit **62**, estimation prediction errors errorPredH^{t+2 }N at the time point “t+2” following after the time point “t+1” processed in previous step S**53** are generated, while in the lower time series prediction generation unit **61**, a sensor motor signal sm (t+2) with respect to the sensor motor signal sm (t+1) at the time point “t+1” is generated.

On the other hand, in step S**58**, in case it is determined that generating time series data will be ended upon reaching a predetermined number of time steps, the generation processing is ended.

Next, referring to a flowchart shown in **53** of ^{t+1 }N at the time point “t+1” will be explained.

Firstly, in step S**71**, the input nodes **160**-*i *calculate the data x_{i }(t) using the mathematical formula (3), the parameter nodes **161**-*r *calculate the data tsdata_{r }(t) using the mathematical formula (4), and the context input nodes **162**-*k *calculate the data c_{k }(t) using the mathematical formula (5), and the respective nodes output the calculated data.

In step S**72**, the hidden nodes **163**-*j *obtain the data h^{u}_{j }(t) by calculating the mathematical formula (6), and calculate and output the data h_{j }(t) using the mathematical formula (7).

In step S**73**, the output nodes **164**-*i *obtain the data y^{u}_{i }(t) by calculating the mathematical formula (8), and calculate and output the data y_{i }(t) using the mathematical formula (9).

In step S**74**, the context output nodes **165**-*k *obtain the data o^{u}_{k }(t) by calculating the mathematical formula (10), and calculate and output the data o_{k }(t) using the mathematical formula (11).

In step S**75**, the arithmetic section **154** obtains the difference Δx^{u}_{i }(t+1) using the mathematical formula (12), and calculates the data x^{u}_{i }(t+1) at the time point “t+1” using the mathematical formula (13), and outputs thus calculated data to the gate signal conversion unit **63**.

In step S**76**, the arithmetic section **155** obtains the difference Δc^{u}_{k }(t+1) using the mathematical formula (14), and calculates the data c^{u}_{k }(t+1) at the time point “t+1” using the mathematical formula (15). Furthermore, the arithmetic section **155** feeds back the data c^{u}_{k }(t+1) at the time point “t+1” which is obtained after the calculation using the mathematical formula (15) to the context input nodes **162**-*k*.

In step S**77**, the arithmetic section **154** feeds back the data x^{u}_{i }(t+1) at the time point “t+1” which is obtained after the calculation using the mathematical formula (13) to the input nodes **160**-*i*. Then, the processing returns to step S**53** in **54**.

As described above, in the generation processing, the upper time series prediction generation unit **62** outputs the estimation prediction errors errorPredH N which are estimation values of prediction errors generated by the RNNs **71**-**1** to **71**-N of the lower time series prediction generation unit **61** at the time of generation, and the estimation prediction errors errorPredH N are converted to the gate signals gate N for controlling the opened/closed state of the gates **72**-**1** to **72**-N. Then, the sum of the output signals sm_{1 }(t+1) to sm_{N }(t+1) of the RNNs **71**-**1** to **71**-N output from the gates **72**-**1** to **72**-N which have their opened/closed state controlled, which is represented by above-described mathematical formula (1), is supplied to the sensor and motor arranged in the humanoid robot as the sensor motor signal sm (t+1) at the time point “t+1”, and the action sequence specified by the task ID is executed.

Next, additional learning that makes the information processing apparatus **51** additionally learn action sequences other than the action sequences A, B, and C that have been learned up to then will be explained. Hereinafter, an action sequence D is additionally learned, under which the robot in the home position seizes an object with both hands, holds it up to a predetermined height, places it on a front table that is one stage higher than the table on which the object is originally placed, and returns to the home position.

As described above, in the RNNs **71**-**1** to **71**-N of the lower time series prediction generation unit **61**, action parts which are different from each other are learned. Furthermore, in general, the number N of the RNNs **71** is prepared so as to be sufficiently large as compared with the number of the action parts. Accordingly, among the RNNs **71**-**1** to **71**-N, there exist RNNs **71** (hereinafter referred to as unused RNNs **71** as appropriate) in which action parts are not learned.

In case of making the information processing apparatus **51** learn the new action sequence D in addition to the action sequences A, B, and C that have been learned up to then, the efficiency is improved by leaving intact the RNNs **71** in which action parts have been learned, and making unused RNNs **71** learn new action parts included in the additional action sequence D. In this case, even if the additional action sequence D is learned, the RNNs **71** which have performed the learning up to then are not broken down (their weighting coefficients are not updated), and, in case action parts which have been learned up to then are included in the new action sequence D, the action parts can be utilized in common.

Accordingly, in additionally learning the action sequence D, to RNNs **71** in which action parts have been learned, the lower time series prediction generation unit **61** gives a resistance that makes it hard to change weighting coefficients.

The RNNs **71** in which action parts have been learned are RNN **71**-*n *whose use frequency FREQ_{N }stored in the memory **75** in step S**6** in

Accordingly, in the control circuit **76** of the lower time series prediction generation unit **61**, as shown in **71**-*n *whose use frequency FREQ_{N }is small, while it is difficult to update the weighting coefficient as for the RNN **71**-*n *whose use frequency FREQ_{n }is large. That is, the learning weight μ_{n }is determined depending on a function h_{1 }having the negative correlation in the use frequency FREQ_{n}. _{1}, which curve is large in inclination in case the use frequency FREQ_{n }is small, while small in inclination in case the use frequency FREQ_{n }is large. In _{1 }is represented as a nonlinear curve. On the other hand, a linear straight line may be employed so long as the function has the negative correlation.

Next, referring to a flowchart shown in **51** will be explained.

Firstly, in step S**101**, the control circuit **76** of the lower time series prediction generation unit **61** reads in the use frequencies FREQ_{1 }to FREQ_{N }of the RNNs **71**-**1** to **71**-N stored in the memory **75**.

In step S**102**, the control circuit **76** of the lower time series prediction generation unit **61** determines the learning weight μ_{n }according to the use frequency FREQ_{n }of the RNN **71**-*n *using the function h_{1}. The determined learning weight μ_{n }is supplied to the RNN **71**-*n*.

In step S**103**, the information processing apparatus **51** executes the learning processing of the lower time series prediction generation unit **61** shown in **1** to S**7**. In step S**5** shown in **103**, instead of the mathematical formula (18), the following mathematical formula (31) including the learning weight μ_{n }is employed.

Δ*w*_{ab,n}(*s+*1)=η_{1}μ_{n}γ_{n}*δw*_{ab,n}+α_{1}*Δw*_{ab,n}(*s*) [Mathematical Formula 31]
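The effect of the learning weight μ_{n }in the mathematical formula (31) can be illustrated by the following sketch. The text only requires the function h_{1 }to have a negative correlation with the use frequency, so the exponential decay used here, the `scale` parameter, and the names `h1` and `additional_update` are assumptions made for illustration.

```python
import math


def h1(use_frequency, scale=100.0):
    """Hypothetical function with negative correlation in the use frequency.

    ASSUMPTION: exponential decay, which is steep for a small use
    frequency and flat for a large one; a straight line would also do.
    """
    return math.exp(-use_frequency / scale)


def additional_update(w, delta_w_prev, error_amount, learning_weight,
                      use_frequency, eta1=0.1, alpha1=0.9):
    """Apply formula (31), then add the update amount as in formula (19)."""
    mu_n = h1(use_frequency)
    delta_w = (eta1 * mu_n * learning_weight * error_amount
               + alpha1 * delta_w_prev)          # formula (31)
    return w + delta_w, delta_w
```

With this choice an unused RNN (use frequency 0) is updated exactly as in the mathematical formula (18), while a frequently used RNN receives an update close to zero.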

After the processing of step S**103**, the time series data errorL N of the prediction errors of the action sequence D is stored in the memory **75**.

In step S**104**, the information processing apparatus **51** reads in the time series data errorL N of the prediction errors of the action sequence D added to the action sequences A, B, and C from the memory **75**, and, with respect to time series data of the four pieces of prediction errors, executes the learning processing of the upper time series prediction generation unit **62** shown in **31** to S**37**. Then, the additional learning processing is ended.

As described above, in the additional learning processing of the information processing apparatus **51**, with respect to the RNN **71**-*n *whose use frequency FREQ_{n }is large in the learning up to then, the learning weight μ_{n }making it difficult to change the weighting coefficient is given so as to learn the weighting coefficient. Accordingly, it becomes possible to learn the added action sequence D effectively while changing, as little as possible, the weighting coefficients of the RNNs **71** which have been learned up to then.

Next, another configuration of the information processing apparatus employing the present invention will be explained.

**51**. In **51** shown in

The configuration of the information processing apparatus **51** shown in **51** shown in **201** and a nonlinear filter unit **202** are newly provided.

To the temporal filter unit **201**, the time series data errorL N of the prediction errors output from the lower time series prediction generation unit **61** is input. The temporal filter unit **201** and nonlinear filter unit **202** each perform predetermined filter processing for the time series data input thereto, and output the time series data after the processing to the subsequent stage. The nonlinear filter unit **202** supplies the time series data after the processing to the upper time series prediction generation unit **62** as time series data errorL′ N of the prediction errors.

The upper time series prediction generation unit **62** learns the time series data of the prediction errors. It needs to learn only the rough variation of the prediction errors of the RNNs **71**-**1** to **71**-N over a reasonably long span of time steps; the minute variation in a short time period is not relevant.

The temporal filter unit **201** performs the temporal filter processing for the time series data errorL N of the prediction errors output from the lower time series prediction generation unit **61**. That is, the temporal filter unit **201** performs lowpass filter processing for the time series data errorL N and supplies the processed time series data to the nonlinear filter unit **202**. For example, a moving average over a predetermined number of time steps may be used as the lowpass filter processing. Accordingly, the time series data of the prediction errors of the RNNs **71**-**1** to **71**-N in which the minute variation in a short time period is suppressed can be supplied to the upper time series prediction generation unit **62**.
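The moving average mentioned above may be sketched as follows. This is only an illustrative sketch: the function name `moving_average`, the parameter `window`, and the choice of a causal window (averaging only the current and past time steps) are assumptions made for this example and are not specified in this description.

```python
import numpy as np

def moving_average(errors, window):
    """Causal moving average along the time axis (assumed form of the lowpass filter).

    errors: array-like of shape (T, N), prediction errors of N RNNs over T time steps.
    window: number of time steps averaged (hypothetical parameter).
    """
    errors = np.asarray(errors, dtype=float)
    smoothed = np.empty_like(errors)
    for t in range(errors.shape[0]):
        start = max(0, t - window + 1)  # shorter window at the start of the series
        smoothed[t] = errors[start:t + 1].mean(axis=0)
    return smoothed

# An error that flips every time step is flattened by a two-step window.
raw = [[0.0], [1.0], [0.0], [1.0]]
print(moving_average(raw, 2)[:, 0])
```

With a larger `window`, more of the minute variation is suppressed, at the cost of a slower response to genuine changes in the prediction errors.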

The upper time series prediction generation unit **62** can also be made to learn the rough variation of the prediction errors of the RNNs **71**-**1** to **71**-N over a reasonably long span of time steps by making the sampling interval at which the CTRNN **81** of the upper time series prediction generation unit **62** samples time series data longer than the sampling interval of the RNN **71** of the lower time series prediction generation unit **61**. For example, the upper time series prediction generation unit **62** can learn the rough variation of the prediction errors of the RNNs **71**-**1** to **71**-N by learning time series data obtained by thinning the time series data of the RNN **71** of the lower time series prediction generation unit **61** at every predetermined time interval. Furthermore, the time sampling can be adjusted by adjusting the coefficient “τ” of the mathematical formulas (13) and (15). In this case, the larger the coefficient “τ” is, the rougher the variation of the prediction errors of the RNNs **71**-**1** to **71**-N that is learned.
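The thinning described above may be illustrated by the following sketch. The function name `thin` and the parameter `interval` are hypothetical; the description only states that data is thinned at every predetermined time interval.

```python
def thin(series, interval):
    """Keep one sample out of every `interval` time steps (assumed thinning scheme)."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return list(series[::interval])

# Ten time steps of lower-level data thinned to every third step.
print(thin(list(range(10)), 3))
```

In practice, thinning would normally follow the lowpass filter processing, since dropping samples from unfiltered data can let short-period variation leak into the thinned series.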

The nonlinear filter unit **202** converts the input prediction error errorL_{n} using a function h_{2} that is represented by a nonlinear curve whose inclination is large in the range where the input prediction error errorL_{n} is small and becomes smaller as the input prediction error errorL_{n} becomes large. The nonlinear filter unit **202** supplies the prediction error errorL′ N obtained after the conversion processing to the upper time series prediction generation unit **62**.

In the generation processing by the information processing apparatus **51**, as described above, the smaller the estimation prediction error errorPredH_{n} of the RNN **71**-*n*, which is obtained by learning the prediction errors errorL N, is, the larger the gate is opened. Conversely, the sensor motor signal sm_{n} (t+1) which is output from the RNN **71**-*n* whose estimation prediction error errorPredH_{n} is large is scarcely used.

Accordingly, the smaller the estimation prediction error errorPredH_{n }of the RNN **71**-*n *is, the higher the contribution ratio to the sensor motor signal sm (t+1) output from the lower time series prediction generation unit **61** becomes, and it can be said that the RNN **71**-*n *is important.

Consider two cases: one in which the prediction error errorL_{1} of the RNN **71**-**1** and the prediction error errorL_{n} of the RNN **71**-*n* are antagonistic to each other (that is, nearly equal) at a small value between “0” and “1” (for example, 0.3), and one in which they are antagonistic to each other at a large value between “0” and “1” (for example, 0.9). In the former case, at the time of generation, the contribution ratio of the sensor motor signal sm_{1} (t+1) or the sensor motor signal sm_{n} (t+1) output from the RNN **71**-**1** or the RNN **71**-*n* to the sensor motor signal sm (t+1) output from the lower time series prediction generation unit **61** is high, so which of the sensor motor signals of the RNN **71**-**1** and the RNN **71**-*n* is superior is important.

In the latter case, on the other hand, it can be considered that there exists an RNN **71** other than the RNN **71**-**1** and the RNN **71**-*n* that has a smaller prediction error. At the time of generation, the ratio at which the sensor motor signal sm_{1} (t+1) or the sensor motor signal sm_{n} (t+1) output from the RNN **71**-**1** or the RNN **71**-*n* is included in the sensor motor signal sm (t+1) output from the lower time series prediction generation unit **61** is therefore small, so which of the sensor motor signals of the RNN **71**-**1** and the RNN **71**-*n* is superior is not important.

Using the function h_{2}, the nonlinear filter unit **202** enlarges the superiority difference between the RNNs **71** whose prediction errors errorL are small, which are important in generating the sensor motor signal sm (t+1), while it lessens the superiority difference between the RNNs **71** whose prediction errors errorL are large, which are not important in generating the sensor motor signal sm (t+1). Accordingly, the upper time series prediction generation unit **62** can effectively learn the prediction errors errorL output from the RNNs **71** that are important in the learning.
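The effect of such a function can be checked numerically with the following sketch. The concrete form of h_{2} is not given in this description; the saturating exponential used here, the function name `h2`, and the steepness parameter `k` are assumptions chosen only because they give a curve that is steep for small inputs and flat for large ones.

```python
import math

def h2(error, k=3.0):
    """Assumed concave curve on [0, 1]: steep near 0, nearly flat near 1."""
    return (1.0 - math.exp(-k * error)) / (1.0 - math.exp(-k))

# Two pairs of errors that differ by 0.05, one pair small and one pair large.
gap_small = h2(0.35) - h2(0.30)
gap_large = h2(0.95) - h2(0.90)
print(gap_small, gap_large)
```

Under this assumed curve, the 0.05 difference between the two small errors is enlarged, while the same difference between the two large errors is reduced to a fraction of its original size.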

The processing by the temporal filter unit **201** and the nonlinear filter unit **202** is performed when the upper time series prediction generation unit **62** reads in the time series data errorL N of the Q pieces of prediction errors as teacher data from the memory **75** of the lower time series prediction generation unit **61** in step S**31** of the flowchart of the learning processing: the time series data is read in after being processed by the temporal filter unit **201** and the nonlinear filter unit **202**.

The temporal filter unit **201** and the nonlinear filter unit **202** need not both be provided; either one of them alone may be provided.

In the information processing apparatus **51** described above, the model of the Mixture of RNN Expert, which integrates the outputs of the plural RNNs using the gate mechanism to determine the final output, is employed as the lower time series prediction generation unit **61** having the plural RNNs **71**-**1** to **71**-N. However, configurations other than the Mixture of RNN Expert may be employed.

As a configuration other than the Mixture of RNN Expert, for example, the RNN-SOM may be employed, in which the self-organization map used in the category learning for vector patterns (referred to as SOM, hereinafter) is introduced, RNNs are used for the respective nodes of the SOM, appropriate RNNs are selected for external inputs in the self-organization manner, and the parameter learning of the RNNs is carried out. Concerning the SOM, details are written in T. Kohonen, “self-organization map”, Springer-Verlag Tokyo.

In the model of the Mixture of RNN Expert shown in

On the other hand, in the RNN-SOM, all the RNNs calculate learning errors (prediction errors) with respect to new learning samples (or time series data), and the RNN whose learning error is smallest is determined as the winner. After the winner of the RNNs is determined, the concept of distance space with RNNs other than the self RNN is introduced with respect to the respective RNNs, in which, irrespective of the learning errors of the respective RNNs, the RNN which is close to the winner of the RNNs learns the learning samples according to the degree of neighborhood with the winner.

**61**.

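This learning processing is similar to the learning processing of the lower time series prediction generation unit **61** described above, except that the processing of step S**124** is performed in place of the processing of step S**4**.

That is, the processing of steps S**121** to S**123** and steps S**125** to S**127** is the same as the processing of steps S**1** to S**3** and steps S**5** to S**7**, respectively.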

In step S**124**, the lower time series prediction generation unit **61** sets the RNN **71** whose prediction error errorL^{t+1} is minimum as the winner and, based on a neighborhood function h_{3}, allocates a learning weight to each RNN **71**-*n* according to its distance (DISTANCE_{n}) from the winner.

With the neighborhood function h_{3}, a larger learning weight υ_{n} is allocated to an RNN **71**-*n* whose distance (DISTANCE_{n}) from the winner is smaller.
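The winner selection and the allocation of learning weights in step S**124** may be sketched as follows. The shape of the neighborhood function h_{3} and the definition of the distance are not given in this description; this sketch assumes that the RNNs are arranged on a one-dimensional map, that the distance is the difference of their indices, and that h_{3} is a Gaussian. The function name `neighborhood_weights` and the parameter `sigma` are hypothetical.

```python
import math

def neighborhood_weights(errors, sigma=1.0):
    """Pick the RNN with the minimum prediction error as the winner and
    give each RNN a learning weight that decays with its distance from it.

    errors: list of prediction errors, one per RNN (index = position on a 1-D map).
    sigma:  assumed width of the Gaussian neighborhood.
    """
    winner = min(range(len(errors)), key=lambda n: errors[n])
    weights = [
        math.exp(-((n - winner) ** 2) / (2.0 * sigma ** 2))
        for n in range(len(errors))
    ]
    return winner, weights

winner, weights = neighborhood_weights([0.5, 0.2, 0.1, 0.4])
print(winner, weights)
```

Note that the weights depend only on the distance from the winner and not on the learning errors of the individual RNNs, which matches the description of the RNN-SOM above.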

Next, the result of an experiment in which the information processing apparatus **51** is made to learn and generate action sequences to be carried out by a humanoid robot will be described.

In this experiment, the example of the information processing apparatus **51** shown in **61**, is shown. The number of pieces N of the RNN **71** of the lower time series prediction generation unit **61** is set to 16 (N=16).

**51** generates the action sequence A after learning the action sequences A, B, and C.

**165** of the CTRNN **141** as the CTRNN **81** of the upper time series prediction generation unit **62** at the time of generation.

**81** of the upper time series prediction generation unit **62**.

**63**.

**73** of the lower time series prediction generation unit **61**, while **73** of the lower time series prediction generation unit **61**, respectively. In

The abscissa axis of the **165**, motor signals, and sensor signals, which are values from “0” to “1”. The ordinate axis of the **71** number (1 to 16) of the lower time series prediction generation unit **61**.

In _{n }or gate signal g^{t}_{n }for the RNN **71**-*n *corresponds to the grey level. In _{n }is small (that is, close to “0”), the grey level is blackly (thickly) represented, while in ^{t}_{n }is large (that is, close to “1”), the grey level is blackly (thickly) represented.

**51** generates the action sequence B after learning the action sequences A, B, and C, while

**51** generates the action sequence D after learning the action sequences A, B, and C, and then additionally learning the action sequence D.

In generating time series data corresponding to the action sequence A, in the anterior half of the sequence, the RNN **71**-**14** is effective since the gate **72**-**14** is opened, and then in the posterior half of the sequence, the RNN **71**-**4** is effective since the gate **72**-**4** is opened.

On the other hand, in the conversion from the estimation prediction errors to the gate signals, the winner-take-all principle, in which the RNN **71** having the smallest of the estimation prediction errors errorPredH_{1} to errorPredH_{16} is set as the sole winner, is not employed; instead, the softmax function of above-described mathematical formula (2) is employed. As a result, the effective RNN **71** is not discretely switched from the RNN **71**-**14** to the RNN **71**-**4** at a predetermined time point (time step), but the switching from the RNN **71**-**14** to the RNN **71**-**4** is performed slowly as time lapses.

Accordingly, even in the case in which plural values among errorPredH_{1} to errorPredH_{16} are antagonistic to each other, the winner is not alternated frequently, and the antagonistic state is output as it is, which makes it possible to correctly generate the learned time series data.
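The difference between softmax gating and a sole winner can be illustrated with the following sketch. Mathematical formula (2) is not reproduced in this part of the description, so the exact form used here (a softmax over the negated estimation prediction errors), the function names `softmax_gates` and `compose`, and the parameter `temperature` are assumptions for illustration only.

```python
import numpy as np

def softmax_gates(pred_errors, temperature=1.0):
    """Assumed gate computation: smaller estimated error gives a larger gate."""
    z = -np.asarray(pred_errors, dtype=float) / temperature
    z = z - z.max()  # subtract the maximum for numerical stability
    w = np.exp(z)
    return w / w.sum()

def compose(outputs, gates):
    """Gate-weighted sum of the RNN outputs.

    outputs: array-like of shape (N, D), one output vector per RNN.
    gates:   array-like of shape (N,), gate openings that sum to 1.
    """
    return np.asarray(gates, dtype=float) @ np.asarray(outputs, dtype=float)

# Two RNNs with nearly equal errors and one with a large error.
gates = softmax_gates([0.30, 0.31, 0.90])
print(gates)
print(compose([[1.0], [3.0]], [0.5, 0.5]))
```

With nearly equal errors, the two gates stay nearly equal, so a small fluctuation in the estimated errors changes the composed output only slightly, whereas a sole-winner rule would switch the whole output from one RNN to the other.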

In generating the action sequence B, the RNN **71**-**14**, RNN **71**-**2**, RNN **71**-**13**, RNN **71**-**1**, and RNN **71**-**11** are effective, in this order.

In generating the action sequence C, the RNN **71**-**2**, RNN **71**-**12**, and RNN **71**-**3** are effective, in this order.

In generating the action sequence D, the RNN **71**-**5**, RNN **71**-**15**, RNN **71**-**3**, and RNN **71**-**16** are effective, in this order.

In switching the gates **72** for the action sequences B to D, a result similar to that in the case of the action sequence A is obtained.

That is, in case the RNN **71** made effective by the gate signals gate N is switched, over a predetermined time period, from the RNN **71**-*n* whose estimation prediction error errorPredH_{n} is smallest at a predetermined time point to the RNN **71**-*n*′ (n≠n′) whose estimation prediction error errorPredH_{n′} is second smallest, the gate signal g_{n} gradually gets smaller, while the gate signal g_{n′} gradually gets larger. That is, in the gate **72**-*n*, the output of the sensor motor signal sm_{n} (t+1) is gradually closed, while in the gate **72**-*n*′, the output of the sensor motor signal sm_{n′} (t+1) is gradually opened.

Furthermore, in the generation result of the action sequence D learned by the additional learning, the RNN **71**-**5**, RNN **71**-**15**, and RNN **71**-**16**, which are not effective in the action sequences A to C, are effective. It can thus be seen that new RNNs **71** learn the action parts which are not in the action sequences A to C learned up to then.

The above-described series of processing can be executed by hardware or by software. In case the series of processing is executed by software, programs configuring the software are installed, from a program recording medium, into a computer built into dedicated hardware or into a general-purpose personal computer that can execute various functions when various programs are installed thereinto.

A CPU (Central Processing Unit) **301** executes various processing in accordance with programs stored in a ROM (Read Only Memory) **302** or a storage unit **308**. In a RAM (Random Access Memory) **303**, programs to be executed by the CPU **301** and various data are stored as appropriate. The CPU **301**, ROM **302**, and RAM **303** are mutually connected through a bus **304**.

To the CPU **301**, an input-output interface **305** is connected through the bus **304**. To the input-output interface **305**, an input unit **306** composed of a keyboard, a mouse, and a microphone, and an output unit **307** composed of a display such as a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display) and loudspeakers are connected. The CPU **301** executes various processing according to commands input from the input unit **306**. Then, the CPU **301** outputs the processing result to the output unit **307**.

The storage unit **308** connected to the input-output interface **305** may be configured by a hard disk, and stores programs to be executed by the CPU **301** or various data. A communication unit **309** communicates with external devices through a network such as the Internet or a local area network, or communicates with external devices connected thereto directly.

A drive **310** connected to the input-output interface **305** drives a removable media **321** such as a magnetic disk, an optical disc, a magneto optical disc, or a semiconductor memory when the removable media **321** is loaded therein, and obtains programs or data recorded therein. Then, thus obtained programs or data are transferred to the storage unit **308**, as circumstances demand, to be stored therein. Programs or data may be obtained through the communication unit **309** to be stored in the storage unit **308**.

A program recording medium that stores programs to be installed in a computer and executed by the computer is configured by the removable media **321**, which is a package media such as a magnetic disk (including a flexible disk), an optical disc (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto optical disc, or a semiconductor memory, or by the ROM **302** in which programs are stored temporarily or permanently, or by a hard disk configuring the storage unit **308**. Programs are stored in the program recording medium, as circumstances demand, through the communication unit **309** being an interface such as a router or a modem, utilizing a wired or wireless communication medium such as a local area network, the Internet, or digital satellite broadcasting.

In above-described example, the switching of the action sequences A to C at the time of generation is carried out by changing the task ID of the CTRNN **81**. On the other hand, the switching of the action sequences A to C at the time of generation may be carried out by changing the initial value to be given to the context input node **162** without making the CTRNN **81** input the task ID.

In these embodiments, the steps written in the flowcharts include not only processing that is performed in time series in the written order but also processing that is performed in parallel or individually and not necessarily in time series.

The embodiments according to the present invention are not restricted to above-described embodiments, and various modifications are possible without departing from the scope and spirit of the present invention.

It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.

## Claims

1. An information processing apparatus, including:

- lower time series data generation means having a plurality of recurrent neural networks which learn predetermined time series data, and generate prediction time series data according to the learning result;

- upper time series data generation means having recurrent neural networks which learn error time series data that is time series data of errors raised at the time of the learning by the respective plural recurrent neural networks of the lower time series data generation means, and generate prediction error time series data that is time series data of prediction errors according to the learning result; and

- conversion means for performing nonlinear conversion for the prediction errors generated by the upper time series data generation means,

- wherein the lower time series data generation means outputs the prediction time series data generated by the respective plural recurrent neural networks according to the prediction errors which have undergone the nonlinear conversion by the conversion means.

2. The information processing apparatus according to claim 1, wherein

- the lower time series data generation means further comprises a plurality of gate means for opening and closing the outputs of the prediction time series data at the subsequent stages of the respective plural recurrent neural networks, and the plural gate means open and close the outputs of the prediction time series data according to the prediction errors which have undergone the nonlinear conversion by the conversion means.

3. The information processing apparatus according to claim 2, wherein the lower time series data generation means further comprises composition means for compositing and outputting the prediction time series data output from the plural gate means.

4. The information processing apparatus according to claim 1, wherein the recurrent neural networks of the upper time series data generation means are recurrent neural networks of the continuous time type.

5. The information processing apparatus according to claim 1, wherein the conversion means performs the nonlinear conversion for the prediction errors generated by the upper time series data generation means using the softmax function.

6. The information processing apparatus according to claim 1, further comprising

- temporal filter means for performing the temporal filter processing for the errors output by the lower time series data generation means.

7. The information processing apparatus according to claim 1, further comprising

- nonlinear filter means for nonlinearly converting the errors output by the lower time series data generation means.

8. The information processing apparatus according to claim 1, wherein, at the time of the learning, the lower time series data generation means updates the weight of learning of the respective plural recurrent neural networks according to errors raised at the time of the learning by the respective plural recurrent neural networks.

9. The information processing apparatus according to claim 1, wherein, at the time of the learning, of errors raised at the time of the learning by the respective plural recurrent neural networks, the lower time series data generation means sets a recurrent neural network that has raised a minimum error to the winner, and updates the weight of learning of the respective plural recurrent neural networks according to the distance from the winner.

10. An information processing method, comprising the steps of:

- learning predetermined time series data, and generating prediction time series data according to the learning result;

- learning error time series data that is time series data of errors raised at the time of learning the predetermined time series data, and generating prediction error time series data that is time series data of prediction errors according to the learning result;

- performing nonlinear conversion for the generated prediction errors; and

- outputting the generated prediction time series data according to the prediction errors which have undergone the nonlinear conversion.

11. A program that makes a computer execute a processing, the processing comprising the steps of:

- learning predetermined time series data, and generating prediction time series data according to the learning result;

- learning error time series data that is time series data of errors raised at the time of learning the predetermined time series data, and generating prediction error time series data that is time series data of prediction errors according to the learning

**Patent History**

**Publication number**: 20070265841

**Type**: Application

**Filed**: May 14, 2007

**Publication Date**: Nov 15, 2007

**Patent Grant number**: 7877338

**Inventors**: Jun Tani (Saitama), Ryunosuke Nishimoto (Saitama), Masato Ito (Tokyo)

**Application Number**: 11/803,237

**Classifications**

**Current U.S. Class**: Time (704/211)

**International Classification**: G10L 19/14 (20060101); G10L 21/00 (20060101);