COMPUTER-READABLE RECORDING MEDIUM STORING DATA ESTIMATION PROGRAM AND DATA ESTIMATION METHOD
A process includes extracting a first set of data to be used for construction of a first model that outputs an estimated value of first data at a first time that follows second times with respect to an input of a second set of data that includes second data that has been measured at the second times, from third sets of data that include third data that had been measured at third times prior to the second times, based on the second set, determining whether a second model that has been previously constructed is identical to the first model, based on the first set and one of the second set and the third sets used for the construction of the second model, and when it is determined that the second model is identical to the first model, acquiring the estimated value output from the second model by inputting the second set to the second model.
This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2020-213130, filed on Dec. 23, 2020, the entire contents of which are incorporated herein by reference.
FIELD
The embodiments discussed herein are related to a technique of estimating data.
BACKGROUND
There are several known techniques regarding estimation of data.
For example, in a centralized simulation system in which a plurality of simulators cooperates, a technique of ensuring the execution efficiency and the function of each simulator and synchronizing the respective simulators is known. In this technique, for example, the system has a cooperation part having a common data area to which a plurality of independent simulators that individually simulate a plurality of elements constituting a simulation object is connected so as to be accessible. The cooperation part includes a time management part that manages the simulation time point with another simulator when requested from one simulator. The time management part manages the simulation time point between related simulators only when requested by each simulator.
Furthermore, for example, a technique relating to modeling of multivariate time-series data overlaid with event data has been known. The technique, for example, first selects one or more historical time-series data arrays that are similar to a recent time-series data array and filters the similar historical time-series data arrays based on the event data. Then, a localized temporal prediction model is learned using the filtered historical time-series data arrays. In this technique, both or one of the construction and learning of the localized temporal prediction model is performed at or near a time when prediction is needed.
Furthermore, for example, a technique is known in which a past operation case similar to a specified operation condition is searched for in regard to a manufacturing process in which a physical phenomenon is complicated, and a future state is predicted from the search result. In this technique, for example, a time-series database of operation states of the manufacturing process is created, and variable values of the process are quantized and sequentially stored in a search table along with time point data. Then, the time point of the prediction starting point and a process variable value assigned as the starting point are quantized, and the search table is searched using the quantized values as search keys. In this technique, the time point of a process variable value having a quantized value similar to the search keys is specified in accordance with a similarity criterion, the process variable value at the specified time point is taken from the time-series database, and a process variable value at the future time point to be predicted is designated.
In addition, for example, a process administration support technique capable of supporting the administration in a steady state or a non-steady state and an abnormal state by effectively utilizing the past history is known. For example, this technique is configured to work out a control variable value of a control object that brings the control object into a target state, according to a plurality of input variable values that change with time, and uses a hierarchically structured neural circuit model made up of an input layer, at least one intermediate layer, and an output layer. In this technique, the neural circuit model is caused to perform learning with a representative pattern of a plurality of input variable values at different points in time in past process administration history information as an input signal, and also with a control variable value relevant to the representative pattern as a teacher signal. Then, the desired control variable value is worked out by inputting an unlearned pattern to the learned neural circuit model as an input variable value.
Japanese Laid-open Patent Publication No. 2006-350549, Japanese National Publication of International Patent Application No. 2019-503540, Japanese Laid-open Patent Publication No. 2008-146322, and Japanese Laid-open Patent Publication No. 10-091208 are disclosed as related art.
SUMMARY
According to an aspect of the embodiments, a non-transitory computer-readable recording medium stores a data estimation program that causes a computer to execute a process, the process including: extracting at least one first set of measurement data to be used for construction of a first model that outputs an estimated value of first measurement data at a first measurement time that follows second measurement times with respect to an input of a second set of measurement data that includes second measurement data that has been measured at the second measurement times, from third sets of measurement data that include third measurement data that had been measured at third measurement times prior to the second measurement times, based on the second set; determining whether a second model that has been previously constructed is identical to the first model, based on the first set and one of the second set and the third sets used for the construction of the second model; when it is determined that the second model is not identical to the first model, constructing the first model by using the first set, and acquiring the estimated value output from the first model by inputting the second set to the first model; and when it is determined that the second model is identical to the first model, acquiring the estimated value output from the second model by inputting the second set to the second model.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
When a future value of time-series data such as measurement values is to be estimated and the object is so complicated that constructing a model from the laws of physics is difficult, a model is constructed from a collection of past time-series data sets, and the future value is estimated using the constructed model. Furthermore, a multi-input model may be constructed using a collection of past time-series data sets for a plurality of types of measurement data measured for an object, and a future value may be estimated using this model.
A local model may be constructed using some data sets from the collection of past time-series data sets, as a model that estimates a future data value from the most recent time-series data, which include the most recently obtained data. In such local model construction, the accuracy of the constructed model may usually be expected to improve as the number of time-series data sets used for the construction grows, but the amount of computation needed for the construction also increases. When model construction is performed every time data estimation is newly performed and takes a long time because of the large amount of computation, the object of future value estimation is limited to a system with a relatively small time delay.
Hereinafter, embodiments of a technique of reducing the amount of computation imposed for estimating a future value will be described in detail with reference to the drawings.
The sensors 11A, 11B, 11C, . . . measure various physical data about an object system for which a future data value is to be estimated and output obtained measurement data. Note that, in the following explanation, the sensors 11A, 11B, 11C, . . . are collectively described as “sensors 11” unless description is given by particularly distinguishing between the sensors.
The computer 100 accepts the various pieces of measurement data output from the sensors 11 to work out an estimated value of future data based on these pieces of measurement data and outputs the obtained estimated value to the display device 12. In
The display device 12 is an output device that displays the estimated value output from the computer 100.
The processor 101 may be, for example, a single processor, a multiprocessor, or a multicore processor. The processor 101 uses the memory 102 to execute, for example, a data estimation processing program that describes a procedure of data estimation processing described later.
The memory 102 is, for example, a semiconductor memory and may include a RAM area and a ROM area. The storage device 103 is, for example, a hard disk, a semiconductor memory such as a flash memory, or an external storage device and provides functions as various databases (hereinafter, denoted as “DBs”) described later. Note that RAM is an abbreviation for random access memory. In addition, ROM is an abbreviation for read only memory.
The reading device 104 accesses a removable storage medium 105 in accordance with an instruction from the processor 101. For example, the removable storage medium 105 is achieved by a semiconductor device (such as a USB memory), a medium to which information is input and from which information is output by magnetic action (such as a magnetic disk), a medium to which information is input and from which information is output by optical action (such as a CD-ROM or DVD), or the like. Note that USB is an abbreviation for universal serial bus. CD is an abbreviation for compact disc. DVD is an abbreviation for digital versatile disc.
The communication interface 106 transmits and receives data via a communication network (not illustrated) in accordance with an instruction from the processor 101, for example.
The input/output interface 107 acquires various sorts of measurement data from the sensors 11 in the data estimation system 10 in
This data estimation program executed by the processor 101 of the computer 100 is provided, for example, in the following form.

 (1) Installed on the storage device 103 in advance.
 (2) Provided by the removable storage medium 105.
 (3) Provided to the communication interface 106 from a server such as a program server via a communication network.
Note that the hardware configuration of the computer 100 is exemplary, and the embodiment is not limited to this configuration. For example, a part or all of the functions of the functional units described above may be implemented as hardware including FPGA, SoC, and the like. Note that FPGA is an abbreviation for field programmable gate array. SoC is an abbreviation for system-on-a-chip.
Hereinafter, the data estimation processing described in the data estimation program executed by the processor 101 will be described.
Initially, some examples of the future data value estimation approach, which is performed by constructing a local model and is used in the data estimation processing, will be described.
First, a first example of the data estimation approach will be described with reference to
The function of each component represented in
A measurement data acquisition unit 201 acquires measurement data (second measurement data) d_{t} measured by the sensor 11 at the most recent date and time (a second measurement time) t and stores the acquired measurement data d_{t} in a measurement value DB 200 together with information on the date and time t.
In
In addition, the measurement data acquisition unit 201 repeats storing the measurement data in the measurement value DB 200 every time the sensors 11 perform measurement at predetermined cycles and new measurement data is obtained. In
Note that, in each example of the data estimation approach described hereafter, it is equally assumed that an estimated value y^{+}_{t+1 }of measurement data to be measured by the sensor 11 that measured y_{t }at measurement date and time t+1 following the date and time t is estimated.
The explanation of
The data set creation unit 212 creates the most recent measurement data set (a second set) D_{t} and the collection of past measurement data sets (third sets), using the measurement data stored in the measurement value DB 200.
Among these data sets, the most recent measurement data set D_{t} is a measurement data set in which measurement data at each instance of measurement date and time j from j=(t−nd) to j=t (the measurement date and time of the most recent measurement data) is arranged in chronological order. Note that nd represents the delay time and is assumed to be, for example, a time corresponding to about 10 cycles of the measurement cycle of the measurement data.
Furthermore, the collection of past measurement data sets is a collection of measurement data sets D_{k}, each of which includes, as its most recent measurement data, measurement data (third measurement data) measured at a measurement date and time (third measurement time) k earlier than the most recent measurement date and time t. The number of measurement data sets D_{k} that can be included in this collection is assumed to be, for example, about 10,000.
Note that, in the following description, the measurement date and time of the most recent measurement data (the latest one) among pieces of measurement data included in the measurement data set will be referred to as the measurement date and time of the measurement data set.
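The creation of the most recent measurement data set and the collection of past measurement data sets can be illustrated with a minimal Python sketch. The function names make_data_set and make_past_collection, the representation of the measurement value DB as a list indexed from 0, and the flattening of each window into one list are assumptions made for illustration and do not appear in the embodiment.

```python
from typing import Dict, List, Sequence


def make_data_set(series: Sequence[Sequence[float]], end: int, nd: int) -> List[float]:
    """Arrange the measurements at times end-nd .. end in chronological order.

    `series[j]` holds the values of all sensors at time j (0-based index,
    an assumption of this sketch).
    """
    if end - nd < 0 or end >= len(series):
        raise ValueError("window falls outside the recorded series")
    return [value for j in range(end - nd, end + 1) for value in series[j]]


def make_past_collection(
    series: Sequence[Sequence[float]], t: int, nd: int
) -> Dict[int, List[float]]:
    """Build the past data sets D_k, keyed by their measurement time k < t."""
    return {k: make_data_set(series, k, nd) for k in range(nd, t)}


# Usage: one sensor, five measurement times, delay time nd = 2.
series = [[0.0], [1.0], [2.0], [3.0], [4.0]]
d_t = make_data_set(series, end=4, nd=2)
past = make_past_collection(series, t=4, nd=2)
```

Each past data set is keyed by the measurement time of its latest element, which matches the naming convention stated above for the measurement date and time of a measurement data set.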
Once the data set creation unit 212 creates various data sets, an error value calculation unit 213 subsequently calculates an error value. The error value calculation unit 213 calculates the error value of each of the measurement data sets D_{k }at each instance of measurement date and time k from k=1+nd to t−1, which are each an element of the collection of past measurement data sets, with respect to the most recent measurement data set D_{t}. In the present embodiment, the error value of this measurement data set D_{k }with respect to the most recent measurement data set D_{t }is calculated as follows.
First, for each pair of pieces of measurement data relevant between the measurement data set D_{k}, which is an element of the collection of past measurement data sets, and the most recent measurement data set D_{t}, a value obtained by squaring a difference between the pair of pieces of measurement data is calculated. Then, the sum of the above values calculated for each of the pair of pieces of measurement data is worked out, the square root of this sum is calculated, and the obtained value is assigned as the error value of the measurement data set D_{k }with respect to the most recent measurement data set D_{t}. For example, this calculation approach calculates, as the error value, the Euclidean distance between a pair of vectors that are configured with a piece of measurement data (normalized data) included in each of the measurement data set D_{k }and the most recent measurement data set D_{t }as elements of the respective vectors.
In the present embodiment, the error value calculated in this manner is used as an index representing the similarity between the measurement data set D_{k}, which is an element of the collection of past measurement data sets, and the most recent measurement data set D_{t}.
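The error value calculation described above can be sketched as follows. The function name error_value is hypothetical, and the sketch assumes that both data sets have already been normalized and flattened into sequences of equal length.

```python
import math
from typing import Sequence


def error_value(d_k: Sequence[float], d_t: Sequence[float]) -> float:
    """Euclidean distance between a past data set D_k and the most recent D_t."""
    if len(d_k) != len(d_t):
        raise ValueError("data sets must have the same length")
    # Square each pairwise difference, sum the squares, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d_k, d_t)))


# Usage: a smaller error value means a higher similarity.
print(error_value([0.0, 0.0], [3.0, 4.0]))
```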
Following the calculation of the error value, a data set ranking unit 214 ranks each of the measurement data sets D_{k}, which are elements of the collection of past measurement data sets. The data set ranking unit 214 ranks each of the measurement data sets D_{k }in ascending order of the error values calculated by the error value calculation unit 213. For example, this ranking corresponds to ranking each of the measurement data sets D_{k }in an order from the highest in similarity with respect to the most recent measurement data set D_{t}. In
Following the above ranking, a similar data set extraction unit 215 extracts a similar data set. The similar data set extraction unit 215 extracts top n_{s }measurement data sets D_{k }in ascending order of error values, which are top n_{s }measurement data sets D_{k }in an order from the highest in similarity, as similar data sets, from the collection of past measurement data sets in which each element has been ranked. Note that the value of n_{s }is a predetermined value and is assumed as, for example, a value on the order of several hundreds.
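The ranking and extraction steps can be sketched in Python as below. The names rank_by_error and extract_similar are hypothetical, and breaking ties between equal error values by measurement time is an assumption of this sketch; the embodiment does not state a tie-breaking rule.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_by_error(
    past: Dict[int, Sequence[float]], d_t: Sequence[float]
) -> List[Tuple[int, float]]:
    """Return (measurement time k, error value) pairs in ascending order of error."""
    scored = [
        (k, math.sqrt(sum((a - b) ** 2 for a, b in zip(d_k, d_t))))
        for k, d_k in past.items()
    ]
    # Assumed tie-break: the earlier measurement time comes first.
    return sorted(scored, key=lambda item: (item[1], item[0]))


def extract_similar(ranked: List[Tuple[int, float]], ns: int) -> List[Tuple[int, float]]:
    """Keep the top ns entries, which are the data sets highest in similarity."""
    return ranked[:ns]


# Usage with three past data sets and ns = 2.
past = {10: [0.0, 0.0], 11: [1.0, 1.0], 12: [5.0, 5.0]}
similar = extract_similar(rank_by_error(past, [1.0, 2.0]), ns=2)
```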
In
Note that the similar data set extraction unit 215 further extracts, from the measurement value DB 200, pieces of measurement data measured by the sensors 11 that are the objects of the measurement data estimation, in which the measurement date and time of the measurement data closely follows the measurement date and time of each similar data set.
In
Following the extraction of the similar data sets, a model coefficient calculation unit 216 calculates a model coefficient. The model coefficient calculation unit 216 calculates the model coefficient to construct a model that outputs the estimated value y^{+}_{t+1 }of measurement data at the measurement date and time t+1 following the most recent measurement date and time, with respect to the input of the most recent measurement data set D_{t}. In the present embodiment, in order to calculate the model coefficient, linear multiple regression analysis is performed with the measurement data included in each of the similar data sets D^{(1)}, D^{(2)}, . . . , and D^{(ns) }as explanatory variables and with the estimated values y^{(1)}, y^{(2)}, . . . , and y^{(ns) }of the measurement data as objective variables (explained variables). By this analysis, the values of the partial regression coefficient and the intercept (constant term) in the multiple regression equation are individually calculated as the model coefficients. In
A model coefficient acquisition unit 217 acquires the model coefficients M_{t }and b_{t }calculated by the model coefficient calculation unit 216 as described above.
An estimated value calculation unit 218 substitutes the most recent measurement data set D_{t }into the following multiple regression equation configured using the model coefficients M_{t }and b_{t }acquired by the model coefficient acquisition unit 217, and calculates the estimated value y^{+}_{t+1 }of the measurement data at the measurement date and time t+1 following the most recent measurement date and time.
y^{+}_{t+1}=M_{t}×D_{t}+b_{t}
This multiple regression equation is an example of a model that outputs the estimated value y^{+}_{t+1 }of measurement data at a measurement time following the measurement time of the most recent measurement data set D_{t}, with respect to the input of the most recent measurement data set D_{t}.
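The calculation of the model coefficients and of the estimated value can be sketched with ordinary least squares, as below. This is an illustration under stated assumptions rather than the computation used in the embodiment: the names solve_linear, fit_model, and estimate are hypothetical, the normal equations are solved by Gaussian elimination using only the standard library, and the targets are taken to be the values measured immediately after each similar data set.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; the similar data sets lack variety")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_model(
    similar_sets: Sequence[Sequence[float]], targets: Sequence[float]
) -> Tuple[List[float], float]:
    """Return (M_t, b_t): partial regression coefficients and intercept."""
    rows = [list(d) + [1.0] for d in similar_sets]  # trailing 1.0 carries the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    coef = solve_linear(xtx, xty)
    return coef[:-1], coef[-1]


def estimate(m_t: Sequence[float], b_t: float, d_t: Sequence[float]) -> float:
    """Evaluate y+_{t+1} = M_t x D_t + b_t for the most recent data set."""
    return sum(m * d for m, d in zip(m_t, d_t)) + b_t


# Usage: targets generated from y = 2*x1 + 3*x2 + 1.
sets = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
m_t, b_t = fit_model(sets, [1.0, 3.0, 4.0, 6.0, 8.0])
y_next = estimate(m_t, b_t, [3.0, 2.0])
```

Forming and solving the normal equations is the part whose cost grows with the number and length of the similar data sets, so it is the step that dominates the computation of each estimation.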
An estimated value output unit 219 outputs the estimated value y^{+}_{t+1} calculated by the estimated value calculation unit 218 and displays the output estimated value y^{+}_{t+1} on the display device 12. Furthermore, in parallel with this output of the estimated value y^{+}_{t+1}, a model coefficient discard unit 220 discards the model coefficients M_{t} and b_{t} acquired by the model coefficient acquisition unit 217.
Thereafter, every time the estimation start query reception unit 211 confirms the reception of the estimation start query, each of the above-described components functions, and the procedure of acquiring the estimated value y^{+}_{t+1} is repeated.
In the first example of the data estimation approach, the data is estimated as described above.
Next, a second example of the data estimation approach will be described.
In the first example described above, a model is constructed anew every time data estimation is performed. Therefore, as the amount of measurement data used for constructing the model grows, the amount of computation needed for constructing the model grows as well.
In contrast to this, in the second example of the data estimation approach described hereafter, the model coefficients obtained by constructing the model are saved together with the similar data sets used for constructing the model. In the procedure for new data estimation after that, when a new most recent measurement data set is acquired, it is determined whether a model to be used for data estimation based on the new most recent measurement data set is identical to the previously constructed model. This determination is made based on similar data sets for the new most recent measurement data set and similar data sets saved together with the model coefficients. In this determination, when it is determined that the model to be used is identical to the previously constructed model, the data estimation is performed by diverting the model using the saved model coefficients without constructing a new model. In this manner, by enabling the data estimation without constructing a new model, the amount of computation imposed for the data estimation may be reduced.
The second example of the data estimation approach will be described in more detail with reference to
Among the respective components represented in
The function of each component represented in
First, the measurement data acquisition unit 201, the estimation start query reception unit 211, the data set creation unit 212, the error value calculation unit 213, the data set ranking unit 214, and the similar data set extraction unit 215 in
In the second example represented in
Furthermore, in the past model DB 300, the similar data sets D^{(1)}, D^{(2)}, . . . , and D^{(ns) }used for constructing this model (calculating M_{t }and b_{t}) are stored in association with the model coefficients M_{t }and b_{t}. Note that, in this storage example, information on the measurement date and time of the measurement data set is appended as information that individually specifies which of the respective elements of the collection of past measurement data sets corresponds to which one of the similar data sets D^{(1)}, D^{(2)}, . . . , and D^{(ns)}. The item of “time” in each of the similar data sets D^{(1)}, D^{(2)}, . . . , and D^{(ns) }represented in
Note that, as represented in
The past similar model lookup unit 301 looks up similar data sets identical to the n_{s }ranked similar data sets extracted by the similar data set extraction unit 215, in the past model DB 300. When it is verified, as a result of this lookup, that the identical similar data sets do not exist in the past model DB 300, the model coefficients are calculated by the model coefficient calculation unit 216 in a manner similar to the first example described above. Then, the calculated model coefficients M_{t }and b_{t }are acquired by the model coefficient acquisition unit 217.
On the other hand, when the past similar model lookup unit 301 finds the identical similar data sets in the past model DB 300 by the above-mentioned lookup, the model coefficients are not calculated by the model coefficient calculation unit 216. In this case, the model coefficient acquisition unit 217 acquires the model coefficients M_{t} and b_{t} associated with the identical similar data sets from the past model DB 300. For example, since the model coefficients calculated using the identical similar data sets have the same values, the previously constructed model is diverted instead of constructing a new model when the identical similar data sets are found in the past model DB 300. By configuring in this manner, the amount of computation for constructing a new model is reduced.
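The lookup and diversion of a previously constructed model can be sketched as a small in-memory store. The class name PastModelStore and the method get_or_build are hypothetical. The sketch identifies a selection of similar data sets by the set of their measurement times and ignores their rank order, which is an assumption based on the fact that a least-squares fit does not depend on the order of its input rows.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Model = Tuple[List[float], float]  # (M_t, b_t)


class PastModelStore:
    """In-memory stand-in for the past model DB 300 (illustrative only)."""

    def __init__(self) -> None:
        self._models: Dict[FrozenSet[int], Model] = {}

    def get_or_build(
        self, similar_times: Iterable[int], build: Callable[[], Model]
    ) -> Model:
        # Key: measurement times of the extracted similar data sets.
        key = frozenset(similar_times)
        if key not in self._models:
            # No identical similar data sets are stored: construct and save.
            self._models[key] = build()
        # Otherwise the saved coefficients are diverted without a new fit.
        return self._models[key]


# Usage: the second call finds the same similar data sets and skips the build.
store = PastModelStore()
first = store.get_or_build([3, 1, 2], lambda: ([0.5, 0.25], 1.0))
second = store.get_or_build([1, 2, 3], lambda: ([9.0, 9.0], 9.0))
```

In the usage above, the second builder is never invoked because the key already exists, so second holds the coefficients saved by the first call.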
Both of the estimated value calculation unit 218 and the estimated value output unit 219 provide functions similar to the functions of these units in
First, when the model coefficients M_{t }and b_{t }acquired by the model coefficient acquisition unit 217 are model coefficients most recently calculated by the model coefficient calculation unit 216, the past model saving and deleting unit 302 newly registers these model coefficients M_{t }and b_{t }in the past model DB 300. Note that, when the model coefficients M_{t }and b_{t }are registered, information on the date and time when the model was constructed (the date and time when the model coefficients M_{t }and b_{t }were calculated) is also registered together in the past model DB 300. Furthermore, the past model saving and deleting unit 302 also registers the similar data sets D^{(1)}, D^{(2)}, . . . , and D^{(ns) }used by the model coefficient calculation unit 216 to calculate the model coefficients M_{t }and b_{t }in the past model DB 300 in association with the model coefficients M_{t }and b_{t}. Note that, when the similar data sets D^{(1)}, D^{(2)}, . . . , and D^{(ns) }are registered, information on the measurement date and time of the measurement data sets that are the similar data sets is also registered together in the past model DB 300.
When the model coefficients M_{t }and b_{t }are newly registered in the past model DB 300, the past model saving and deleting unit 302 assigns information on the number of citations for the model coefficients M_{t }and b_{t }as “0” times as the initial value and further registers the information in the past model DB 300.
On the other hand, when the model coefficient acquisition unit 217 has acquired the model coefficients M_{t }and b_{t }from the past model DB 300, the past model saving and deleting unit 302 increments the information on the number of citations associated with the model coefficients M_{t }and b_{t }in the past model DB 300.
The past model saving and deleting unit 302 also deletes the past model in addition to saving the past model described above.
The past model saving and deleting unit 302 first acquires information on the calculation date and time of the model coefficients M_{t} and b_{t} included in each record registered in the past model DB 300. A record whose calculation date and time is old, that is, a predetermined saving period (for example, one year) or more before the present, is deleted from the past model DB 300. Since a model with an old construction date and time is considered highly likely not to properly represent the current state of the object system for which the model was constructed, this deletion is intended to exclude the model coefficients M_{t} and b_{t} of such a model from the objects of diversion.
Furthermore, for a model whose construction date and time is not old enough to be uniformly deleted, but whose frequency of being diverted is low, the past model saving and deleting unit 302 also deletes a record regarding the model. In more detail, for example, first, for each record whose calculation date and time of the model coefficients M_{t }and b_{t }is a predetermined grace period (for example, one month) before or earlier, the past model saving and deleting unit 302 acquires information on the date and time and information on the number of citations. Then, the number of citations is divided by an elapsed time from the date and time to the present point in time to calculate the diversion frequency of the model coefficients. Here, the past model saving and deleting unit 302 deletes a record whose calculated diversion frequency does not reach a predetermined threshold value, from the past model DB 300. This deletion is intended to prioritize giving a margin to the capacity of the past model DB 300 rather than holding the model coefficients M_{t }and b_{t }whose diversion frequency is low.
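The two deletion rules can be sketched as follows. The names ModelRecord, keep_record, and prune are hypothetical. The sketch approximates one year as 365 days and one month as 30 days, and it expresses the diversion frequency in citations per day with a threshold of 0.1; the unit and the threshold value are assumptions, since the embodiment only refers to a predetermined threshold value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ModelRecord:
    built_at: datetime   # date and time when M_t and b_t were calculated
    citations: int = 0   # number of times the coefficients were diverted


def keep_record(
    rec: ModelRecord,
    now: datetime,
    saving_period: timedelta = timedelta(days=365),
    grace_period: timedelta = timedelta(days=30),
    min_per_day: float = 0.1,  # assumed threshold, in citations per day
) -> bool:
    age = now - rec.built_at
    if age >= saving_period:
        return False  # too old: deleted regardless of usage
    if age >= grace_period:
        frequency = rec.citations / (age.total_seconds() / 86400.0)
        if frequency < min_per_day:
            return False  # past the grace period and rarely diverted
    return True


def prune(records: List[ModelRecord], now: datetime) -> List[ModelRecord]:
    """Return only the records that survive both deletion rules."""
    return [r for r in records if keep_record(r, now)]
```

Records younger than the grace period are always kept in this sketch, so a newly registered model with zero citations is not deleted before it has had a chance to be diverted.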
Thereafter, every time the estimation start query reception unit 211 confirms the reception of the estimation start query, each of the above-described components functions, and the procedure of acquiring the estimated value y^{+}_{t+1} is repeated.
In the second example of the data estimation approach, the data is estimated as described above.
Next, a third example of the data estimation approach will be described.
In the third example described hereafter, n_{s }similar data sets ranked in an order from the highest in similarity with respect to the most recent measurement data set D_{t }and information on the similarity of each of the similar data sets are held in association with the most recent measurement data set D_{t}. In the processing for new data estimation after that, when the most recent measurement data set D_{t }that is identical to the newly acquired measurement data set is held, similar data sets held in association with the identical most recent measurement data set D_{t }and the information on the similarity are acquired. Then, the acquired similar data sets and information on the similarity are exploited for calculating the similarity and extracting the similar data sets for the newly acquired most recent measurement data set D_{t}. The third example aims at reducing the amount of computation imposed for calculating the similarity and extracting the similar data sets in this manner.
The outline of the third example of the data estimation approach will be further described with reference to
Among the respective components represented in
The function of each component represented in
First, the measurement data acquisition unit 201 and the estimation start query reception unit 211 in
In the third example represented in
When the data set creation unit 401 creates the most recent measurement data set D_{t}, a past identical data set search unit 402 searches for a past identical measurement data set. In more detail, for example, the past identical data set search unit 402 performs a lookup in data stored in an error value and rank DB 400. Note that, in the following description, this error value and rank DB 400 will be simply referred to as “rank DB 400”.
In the storage example in
Note that, in the following description, this measurement date and time of the measurement data set that is a similar data set will be simply referred to as “measurement date and time of the similar data set”.
The past identical data set search unit 402 searches the rank DB 400 for a most recent measurement data set in which all items other than the information on the measurement date and time are the same as those of the most recent measurement data set D_{t }created by the data set creation unit 401.
When the past identical data set search unit 402 verifies, as a result of this search, that the identical most recent measurement data set does not exist in the rank DB 400, the data set creation unit 401 creates the data set again. However, in this case, the data set creation unit 401 creates a collection of past measurement data sets.
Once the collection of past measurement data sets is created, subsequently, the calculation of the error values by an error value calculation unit 404, ranking of the measurement data sets by a data set ranking unit 405, and the extraction of similar data sets by a similar data set extraction unit 406 are performed successively. The functions provided by these respective elements in this case are the same functions individually provided by the error value calculation unit 213, the data set ranking unit 214, and the similar data set extraction unit 215 in the first example illustrated in
On the other hand, when the identical most recent measurement data set is found in the rank DB 400 by the above-described search by the past identical data set search unit 402, a difference acquisition unit 403 acquires a difference time point.
First, the difference acquisition unit 403 acquires the rank and error value list associated with the most recent measurement data set found in the rank DB 400. Next, the difference acquisition unit 403 works out the period from the measurement date and time of the found most recent measurement data set to the measurement date and time of the most recent measurement data set D_{t} created by the data set creation unit 401. The difference acquisition unit 403 acquires a time point within this period as the “difference time point”.
When the difference time point is acquired by the difference acquisition unit 403, the error value calculation unit 404, the data set ranking unit 405, and the similar data set extraction unit 406 individually work as follows.
First, the error value calculation unit 404 calculates the error values. In this case, however, the error value calculation unit 404 targets only those measurement data sets in the collection of past measurement data sets whose measurement date and time coincide with the above-mentioned difference time point, and calculates the error value of each with respect to the most recent measurement data set D_{t} created by the data set creation unit 401. For this purpose, before the error values are calculated, the data set creation unit 401 creates, from the collection of past measurement data sets, only the measurement data sets for which the error value is to be calculated, that is, only those whose measurement date and time coincide with the difference time point.
Note that the error value calculation itself performed by the error value calculation unit 404 for the measurement data set created by the data set creation unit 401 in this manner is the same as the error value calculation performed by the error value calculation unit 213 in the first example illustrated in
Next, ranking is performed by the data set ranking unit 405. In this case, however, the data set ranking unit 405 compares the magnitudes of the error values for each similar data set indicated in the rank and error value list acquired by the difference acquisition unit 403 and of the error values calculated by the error value calculation unit 404 for each measurement data set. Based on the result of this comparison, the data set ranking unit 405 ranks, in ascending order of error values, both the measurement data sets that are similar data sets indicated in the rank and error value list and the measurement data sets that are objects of the error value calculation by the error value calculation unit 404.
Next, the similar data set extraction unit 406 extracts the similar data sets. In this case, however, the similar data set extraction unit 406 extracts the top n_{s} measurement data sets D_{k} from the measurement data sets ranked by the data set ranking unit 405 as described above, in descending order of similarity, that is, in ascending order of error values. The n_{s} measurement data sets D_{k} extracted in this manner are treated as the similar data sets for the most recent measurement data set D_{t} created by the data set creation unit 401.
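The incremental ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `merge_and_extract`, the representation of the rank and error value list as `(error_value, measurement_time)` pairs, and the sample values are all hypothetical.

```python
import heapq

def merge_and_extract(cached, fresh, n_s):
    """Return the n_s entries with the smallest error values.

    cached: (error_value, measurement_time) pairs taken from the stored
            rank and error value list (hypothetical representation).
    fresh:  pairs calculated only for the measurement data sets at the
            difference time points.
    """
    # A smaller error value means a higher similarity, so keep the n_s smallest.
    return heapq.nsmallest(n_s, list(cached) + list(fresh))

# Hypothetical values: three cached entries and two newly calculated ones.
cached = [(0.12, "t-40"), (0.30, "t-17"), (0.45, "t-90")]
fresh = [(0.08, "t-2"), (0.50, "t-1")]
top = merge_and_extract(cached, fresh, n_s=3)
```

Because the stored list was calculated against a data set whose items other than the measurement date and time are the same as those of D_{t}, its error values can be compared directly with the newly calculated ones.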
Note that the similar data set extraction unit 406 is similar to the similar data set extraction unit 215 in the first example illustrated in
The working of the past similar model lookup unit 301, the past model saving and deleting unit 302, the model coefficient calculation unit 216, the model coefficient acquisition unit 217, the estimated value calculation unit 218, and the estimated value output unit 219 is similar to the working of these units in the second example illustrated in
Note that, in the third example in
When the identical most recent measurement data set D_{t} is not found by the above-described search by the past identical data set search unit 402, the error value and rank DB saving and updating unit 407 creates the rank and error value list for the extracted similar data sets. Then, the error value and rank DB saving and updating unit 407 stores the created rank and error value list and the most recent measurement data set D_{t} created by the data set creation unit 401 in the rank DB 400 in association with each other.
On the other hand, when the identical most recent measurement data set D_{t} is found in the rank DB 400 by the above-described search, the error value and rank DB saving and updating unit 407 updates the rank and error value list associated with the found most recent measurement data set. By this update, the rank and error value list is updated to a rank and error value list for the similar data sets extracted by the similar data set extraction unit 406. Additionally, the error value and rank DB saving and updating unit 407 updates the information “time” on the measurement date and time of the found most recent measurement data set to information on the measurement date and time of the most recent measurement data set D_{t} created by the data set creation unit 401.
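The saving and updating behavior above can be illustrated with a small sketch. It assumes, purely for illustration, that the rank DB is an in-memory dictionary and that a measurement data set is a dictionary with a `"time"` entry plus sensor items; the names `content_key`, `save_or_update`, `s1`, and `s2` are hypothetical.

```python
def content_key(data_set):
    """Build a key from every item except the measurement date and time."""
    return tuple(sorted((k, v) for k, v in data_set.items() if k != "time"))

def save_or_update(rank_db, data_set, rank_list):
    """Store a new entry, or overwrite the list and time of an existing one.

    Returns True when an identical data set was already registered.
    """
    key = content_key(data_set)
    found = key in rank_db
    # In both cases the entry ends up holding the newest time and list.
    rank_db[key] = {"time": data_set["time"], "rank_list": rank_list}
    return found

rank_db = {}
first = {"time": 100, "s1": 0.5, "s2": 0.7}
later = {"time": 160, "s1": 0.5, "s2": 0.7}   # same items, later time
was_found_first = save_or_update(rank_db, first, [(0.1, 40)])
was_found_later = save_or_update(rank_db, later, [(0.05, 120), (0.1, 40)])
```

Keying on the items other than the time is what lets a later, otherwise identical data set hit the same entry.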
In the third example of the data estimation approach, the data is estimated as described above. In this third example, when a most recent measurement data set identical to the most recent measurement data set D_{t} is stored in the rank DB 400, only a part of the collection of past measurement data sets is created and error values are calculated only for that part, so the amount of computation needed for this processing is reduced.
Next, the processing procedure of the data estimation processing performed by the processor 101 in
The processor 101 performs this data estimation processing by executing the data estimation program. When the processor 101 performs this data estimation processing, the storage device 103 provides the functions of the measurement value DB 200, the past model DB 300, and the rank DB 400.
The data estimation processing is processing including measurement data storage processing and estimation processing. The processor 101 executes this measurement data storage processing and the estimation processing in parallel.
First, the measurement data storage processing will be described.
In
Next, in S502, processing of determining whether the measurement data d_{t} has been acquired by the processing in S501 described above is performed. When it is determined, in this determination processing, that the measurement data d_{t} has not been acquired (when the determination result is NO), the processing in S501 is performed again, and thereafter, this processing is repeated until it is determined in S502 that the measurement data d_{t} has been acquired.
When it is determined, in the determination processing in S502, that the measurement data d_{t }has been acquired (when the determination result is YES), the processing in S503 is performed. In S503, processing of storing the measurement data d_{t }acquired by the processing in S501 in the measurement value DB 200 together with information on the date and time t is performed. Note that, as described above, the measurement data d_{t }is normalized as needed and stored.
When the processing in S503 is completed, the processing returns to S501, and thereafter, every time the sensors 11 perform measurement at predetermined cycles and new measurement data is obtained, storing in the measurement value DB 200 is repeated.
The processing up to the above is the measurement data storage processing.
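The loop formed by S501 to S503 can be sketched as below. This is only an illustration: the callable `read_sensor`, the dictionary standing in for the measurement value DB 200, and the bounded `max_polls` argument are assumptions introduced for the example, and normalization is omitted.

```python
def store_measurements(read_sensor, measurement_db, max_polls):
    """Poll a sensor source and store each acquired reading with its time.

    read_sensor: hypothetical callable returning (t, d_t), or None when no
                 new measurement is available yet.
    """
    for _ in range(max_polls):      # bounded here; the described loop repeats indefinitely
        result = read_sensor()      # S501: try to acquire d_t
        if result is None:          # S502: determination result is NO, so poll again
            continue
        t, d_t = result
        measurement_db[t] = d_t     # S503: store d_t together with the date and time t

# Hypothetical source that yields two readings with idle polls in between.
readings = iter([None, (1, 20.5), None, (2, 21.0)])
db = {}
store_measurements(lambda: next(readings, None), db, max_polls=6)
```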
Next, the estimation processing will be described. Each diagram in
In
Next, in S602, processing of determining whether the communication interface 106 has received the estimation start query is performed. When it is determined, in this determination processing, that the estimation start query has not been received (when the determination result is NO), the processing in S601 is performed again, and thereafter, this processing is repeated until it is determined that the estimation start query has been received.
By performing the above processing in S601 and S602, the processor 101 provides the function of the estimation start query reception unit 211 in the third example of the functional configuration of the data estimation device described above.
When it is determined, in the determination processing in S602, that the estimation start query has been received (when the determination result is YES), the processing in S603 is performed. In S603, processing of creating the most recent measurement data set D_{t }is performed using the measurement data stored in the measurement value DB 200. Note that the approach of creating the most recent measurement data set D_{t }in this processing is similar to the approach of the data set creation unit 212 described above.
By performing this processing in S603, the processor 101 provides the function of creating the most recent measurement data set D_{t }by the data set creation unit 401 in the third example of the functional configuration of the data estimation device described above.
When this processing in S603 is completed, the processing proceeds to S611 in
In
By performing the above processing in S611 and S612, the processor 101 provides the function of the past identical data set search unit 402 in the third example of the functional configuration of the data estimation device described above.
When it is determined, in the determination processing in S612, that the identical most recent measurement data set has been found, processing of acquiring the rank and error value list associated with the most recent measurement data set found in the rank DB 400 is performed as processing in S613.
In following S614, processing of acquiring, as the above-described difference time point, a time point within a period from the measurement date and time of the found most recent measurement data set to the measurement date and time of the most recent measurement data set D_{t} created by the processing in S603 is performed.
By performing the above processing in S613 and S614, the processor 101 provides the function of the difference acquisition unit 403 in the third example of the functional configuration of the data estimation device described above.
In S615 following S614, processing of creating only a measurement data set whose measurement date and time coincide with the difference time point from the collection of past measurement data sets is performed. The approach of creating the measurement data set in this processing is also similar to the approach of the data set creation unit 212 described above.
In following S616, processing of targeting each of the measurement data sets in the collection of past measurement data sets whose measurement date and time coincide with the above-described difference time point and calculating its error value with respect to the most recent measurement data set D_{t} created by the processing in S603 is performed. The approach of calculating the error value in this processing is similar to the approach of the error value calculation unit 213 described above.
In following S617, processing of ranking the measurement data sets is performed. In this processing, first, processing of comparing the magnitudes of the error values for each similar data set indicated in the rank and error value list acquired by the processing in S613 and of the error values calculated by the processing in S616 for each measurement data set is performed. Next, processing of ranking, in ascending order of error values, both the measurement data sets that are similar data sets indicated in the rank and error value list and the measurement data sets that are objects of the error value calculation by the processing in S616 is performed based on the result of this comparison.
In following S618, processing of extracting the top n_{s} measurement data sets as similar data sets from among the respective measurement data sets ranked by the processing in S617, in descending order of similarity, that is, in ascending order of error values, is performed. Note that, in this processing, processing of extracting pieces of measurement data measured by the sensors 11 that are the objects of the measurement data estimation, in which the measurement date and time of the measurement data closely follows the measurement date and time of each similar data set, is also performed.
In following S619, processing of updating the rank and error value list associated with the most recent measurement data set found in the rank DB 400 is performed. In this processing, processing of updating the rank and error value list to a rank and error value list for the similar data sets extracted by the processing in S618 is performed. Additionally, processing of updating the information “time” on the measurement date and time of the found most recent measurement data set to information on the measurement date and time of the most recent measurement data set D_{t }created by the processing in S603 is also performed.
When this processing in S619 is completed, the processing proceeds to S631 in
On the other hand, when it is determined, in the determination processing in S612 described above, that the identical most recent measurement data set has not been found, processing of creating all the elements of the collection of past measurement data sets is performed as processing in S621. The approach of creating the measurement data set in this processing is also similar to the approach of the data set creation unit 212 described above.
In following S622, processing of calculating the error value of each measurement data set that is an element of the collection of the past measurement data sets created by the processing in S621, with respect to the most recent measurement data set D_{t }created by the processing in S603 is performed. The approach of calculating the error value in this processing is similar to the approach of the error value calculation unit 213 described above.
In following S623, processing of ranking the respective measurement data sets that are elements of the collection of past measurement data sets created by the processing in S621, in ascending order of error values calculated by the processing in S622 is performed.
In following S624, processing of extracting top n_{s }measurement data sets as similar data sets from among the respective measurement data sets ranked by the processing in S623, in an order from the highest in similarity, which is an ascending order of error values, is performed. Note that, in this processing, processing of extracting pieces of measurement data measured by the sensors 11 that are the objects of the measurement data estimation, in which the measurement date and time of the measurement data closely follows the measurement date and time of each similar data set, is also performed.
In following S625, processing of creating a rank and error value list for the extracted similar data sets and storing the created rank and error value list and the most recent measurement data set D_{t }created by the processing in S603 in the rank DB 400 in association with each other is performed.
When this processing in S625 is completed, the processing proceeds to S631 in
By performing the processing in S615 and S621 among the above respective pieces of processing, the processor 101 provides the function of creating each element of the collection of past measurement data sets by the data set creation unit 401 in the third example of the functional configuration of the data estimation device described above. Furthermore, by performing the processing in S616 and S622 among the above respective pieces of processing, the processor 101 provides the function of the error value calculation unit 404 in the third example of the functional configuration of the data estimation device described above. Moreover, by performing the processing in S617 and S623 among the above respective pieces of processing, the processor 101 provides the function of the data set ranking unit 405 in the third example of the functional configuration of the data estimation device described above. In addition, by performing the processing in S618 and S624 among the above respective pieces of processing, the processor 101 provides the function of the similar data set extraction unit 406 in the third example of the functional configuration of the data estimation device described above. Moreover, by performing the processing in S619 and S625 among the above respective pieces of processing, the processor 101 provides the function of the error value and rank DB saving and updating unit 407 in the third example of the functional configuration of the data estimation device described above.
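For the path in which no identical data set is found (S621 to S624), the error value calculation, ranking, and extraction can be sketched as follows. The sketch uses the Euclidean distance as the error value, as the embodiment does; the function names, the dictionary layout mapping a measurement time to a vector, and the sample numbers are hypothetical.

```python
import math

def error_value(candidate, target):
    """Euclidean distance between two equal-length measurement vectors."""
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(candidate, target)))

def extract_similar(past_sets, target, n_s):
    """Rank past data sets by error value and return the top n_s.

    past_sets: mapping of measurement time -> vector (hypothetical layout).
    Returns (error_value, measurement_time) pairs, smallest error first.
    """
    ranked = sorted((error_value(vec, target), t) for t, vec in past_sets.items())
    return ranked[:n_s]

past_sets = {10: [1.0, 2.0], 20: [4.0, 6.0], 30: [1.0, 3.0]}
similar = extract_similar(past_sets, target=[1.0, 2.0], n_s=2)
```

The returned pairs have the same shape as a rank and error value list, so in this sketch they could be stored as-is in S625.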
In
The above processing in S631 and S632 is processing of determining whether the model to be used for data estimation based on the most recent measurement data set D_{t }created by the processing in S603 is identical to the previously constructed model. This determination is made based on the similar data sets extracted based on the most recent measurement data set D_{t }created by the processing in S603 and the similar data sets used to construct the already constructed model. The case where the result of the determination processing in S632 is YES is the case where the model to be used for data estimation based on the most recent measurement data set D_{t }is determined to be identical to the previously constructed model. On the other hand, the case where the result of the determination processing in S632 is NO is the case where the model to be used for data estimation based on the most recent measurement data set D_{t }is determined to be not identical to the previously constructed model.
By performing the above processing in S631 and S632, the processor 101 provides the function of the past similar model lookup unit 301 in the third example of the functional configuration of the data estimation device described above.
When it is determined, in the determination processing in S632, that the identical similar data sets have been found, processing of reproducing the already constructed model using the model coefficients associated with the found similar data sets, and acquiring a data estimated value using the reproduced model, is performed. First, processing of acquiring the model coefficients associated with the identical similar data sets from the past model DB 300 is performed as processing in S633. Then, in following S634, processing of incrementing the information on the number of citations stored in the past model DB 300 in association with the model coefficients M_{t} and b_{t} is performed. This number of citations represents the number of times the model coefficients M_{t} and b_{t} have been used, that is, the number of times the already constructed model has been reproduced from these model coefficients.
When this processing in S634 is completed, the processing proceeds to S638.
On the other hand, when it is determined, in the determination processing in S632, that the identical similar data sets have not been found, first, processing for constructing a model using the measurement data sets extracted based on the most recent measurement data set D_{t }is performed. Then, after this processing, processing for acquiring the data estimated value output from the constructed model is performed by inputting the most recent measurement data set D_{t }to the model.
In this case, first, processing of calculating the model coefficients M_{t} and b_{t} using the similar data sets extracted by the processing in S618 or S624 is performed as the processing in S635. The approach of calculating the model coefficients M_{t} and b_{t} in this processing is similar to the approach of the model coefficient calculation unit 216 described above. By performing the above processing in S635, the processor 101 provides the function of the model coefficient calculation unit 216 in the third example of the functional configuration of the data estimation device described above.
In following S636, processing of acquiring the model coefficients M_{t }and b_{t }calculated by the above processing is performed. Then, in following S637, processing of newly registering the acquired model coefficients M_{t }and b_{t }in the past model DB 300 is performed. In this processing, processing of registering the information on the date and time when the model was constructed (the date and time when the model coefficients M_{t }and b_{t }were calculated), and the similar data sets used for calculating the model coefficients M_{t }and b_{t }and the information on the measurement date and time of the similar data sets, in the past model DB 300 in association with the model coefficients M_{t }and b_{t }is also performed. Moreover, in this processing, processing of registering the information on the number of citations for the model coefficients M_{t }and b_{t }in the past model DB 300 with “0” times as the initial value is also performed.
By performing the processing in S633 and S636 among the above respective pieces of processing, the processor 101 provides the function of the model coefficient acquisition unit 217 in the third example of the functional configuration of the data estimation device described above. Furthermore, by performing the processing in S634 and S637 among the above respective pieces of processing, the processor 101 provides the function of saving the past model among the functions of the past model saving and deleting unit 302 in the third example of the functional configuration of the data estimation device described above.
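The reuse-or-construct flow of S631 to S637 can be sketched as a simple cache. The sketch assumes that similar data sets are identified by their measurement times and compared as an unordered set, which the text does not state; the dictionary standing in for the past model DB 300 and the names `coefficients_for`, `fit`, and `fake_fit` are likewise hypothetical.

```python
def coefficients_for(past_model_db, similar_times, fit):
    """Reuse stored coefficients when available, otherwise fit and register.

    fit: hypothetical callable returning (M_t, b_t) for the similar data sets.
    """
    key = frozenset(similar_times)          # assumption: order does not matter
    record = past_model_db.get(key)         # S631/S632: look for identical sets
    if record is not None:
        record["citations"] += 1            # S634: count one more reuse
        return record["coefficients"]       # S633: stored M_t and b_t
    coefficients = fit(similar_times)       # S635/S636: calculate and acquire
    past_model_db[key] = {"coefficients": coefficients, "citations": 0}  # S637
    return coefficients

calls = []
def fake_fit(times):
    """Stand-in for the regression step; records how often it is invoked."""
    calls.append(tuple(times))
    return ([0.5, 0.5], 1.0)

model_db = {}
a = coefficients_for(model_db, [10, 30], fake_fit)
b = coefficients_for(model_db, [30, 10], fake_fit)   # same sets, so reused
```

The second call returns the stored coefficients without invoking the fit step, which is where the computational saving comes from.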
In the processing in S638 following the processing in each of S634 and S637, processing of calculating the estimated value y^{+}_{t+1 }of the measurement data using the model coefficients M_{t }and b_{t }acquired by the processing in S633 or S636 and the most recent measurement data set D_{t }created by the processing in S603 is performed. The approach of calculating the estimated value y^{+}_{t+1 }of the measurement data in this processing is similar to the approach of the estimated value calculation unit 218 described above. By performing this processing in S638, the processor 101 provides the function of the estimated value calculation unit 218 in the third example of the functional configuration of the data estimation device described above.
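As a hedged illustration of the calculation in S638, the sketch below assumes a single estimated quantity, with M_{t} held as a list of partial regression coefficients and b_{t} as a scalar intercept of a linear regression equation. The function name `estimate` and the numbers are hypothetical and are not taken from the embodiment.

```python
def estimate(M_t, b_t, D_t):
    """Evaluate a linear regression equation: sum of M_t[i] * D_t[i], plus b_t."""
    return sum(m * d for m, d in zip(M_t, D_t)) + b_t

# Hypothetical coefficients and most recent measurement data set.
y_next = estimate(M_t=[0.5, 2.0], b_t=1.0, D_t=[4.0, 3.0])
```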
In following S639, processing of outputting the calculated estimated value y^{+}_{t+1 }of the measurement data and displaying the output estimated value y^{+}_{t+1 }on the display device 12 is performed. By performing this processing in S639, the processor 101 provides the function of the estimated value output unit 219 in the third example of the functional configuration of the data estimation device described above.
When this processing in S639 is completed, the processing proceeds to S641 in
In
In following S642, processing of deleting, from the past model DB 300, each old record, that is, each record whose calculation date and time represented by the information acquired by the processing in S641 is at least a predetermined saving period (for example, one year) in the past, is performed.
In following S643, processing of acquiring the information on the date and time and the information on the number of citations for each record whose calculation date and time represented by the information acquired by the processing in S641 is a predetermined grace period (for example, one month) before or earlier is performed.
In following S644, processing of calculating, for each record, the diversion frequency of the model coefficients stored in the record is performed, by dividing the above-mentioned number of citations by the elapsed time from the above-mentioned date and time to the present point in time, using the information acquired by the processing in S643.
In following S645, processing of deleting a record whose diversion frequency calculated by the processing in S644 does not reach a predetermined threshold value from the past model DB 300 is performed.
By performing the processing from S641 to S645 among the above respective pieces of processing, the processor 101 provides the function of deleting the past model among the functions of the past model saving and deleting unit 302 in the third example of the functional configuration of the data estimation device described above.
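The two deletion rules of S642 to S645 can be sketched as follows. The record layout, the function name `prune_past_models`, and the use of a single numeric time unit (for example, days) are assumptions made for the example.

```python
def prune_past_models(records, now, saving_period, grace_period, threshold):
    """Return the records that survive both deletion rules.

    records: dicts with "constructed_at" and "citations" (hypothetical layout);
    all times and periods share one unit, for example days.
    """
    kept = []
    for record in records:
        age = now - record["constructed_at"]
        if age >= saving_period:                   # S642: older than the saving period
            continue
        if age >= grace_period:                    # S643: past the grace period
            frequency = record["citations"] / age  # S644: diversion frequency
            if frequency < threshold:              # S645: below the threshold
                continue
        kept.append(record)
    return kept

records = [
    {"id": "a", "constructed_at": 0, "citations": 500},   # older than the saving period
    {"id": "b", "constructed_at": 300, "citations": 2},   # past grace, rarely reused
    {"id": "c", "constructed_at": 300, "citations": 50},  # past grace, often reused
    {"id": "d", "constructed_at": 390, "citations": 0},   # still within the grace period
]
kept = prune_past_models(records, now=400, saving_period=365,
                         grace_period=30, threshold=0.1)
```

In this sketch the grace period must be positive, so the division in the frequency calculation never sees a zero elapsed time.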
When this processing in S645 is completed, the processing returns to S601 in
The processing up to the above is the estimation processing.
While the disclosed embodiments and the advantages thereof have been described above in detail, those skilled in the art will be able to make a variety of modifications, additions, and omissions without departing from the scope of the embodiments as explicitly set forth in the claims.
For example, in the above-described embodiment, the error value, which represents the degree of similarity of a given measurement data set to the most recent measurement data set, is calculated as the Euclidean distance between a pair of vectors whose elements are the pieces of measurement data included in the respective measurement data sets. Instead, another value representing the degree of similarity, such as the cosine similarity between the pair of vectors, may be used.
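A minimal sketch of the cosine similarity alternative is given below. The function names and data layout are hypothetical. Note that the ranking direction is reversed relative to an error value: a larger cosine similarity means a more similar data set.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; undefined for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_by_cosine(past_sets, target, n_s):
    """Return the n_s (similarity, time) pairs with the largest similarity."""
    scored = sorted(((cosine_similarity(vec, target), t)
                     for t, vec in past_sets.items()), reverse=True)
    return scored[:n_s]

past_sets = {10: [1.0, 0.0], 20: [0.0, 1.0], 30: [1.0, 1.0]}
top = rank_by_cosine(past_sets, target=[2.0, 0.0], n_s=2)
```

Cosine similarity ignores vector magnitude, so in this example the vector at time 10 is treated as a perfect match for the target even though their Euclidean distance is not zero.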
Furthermore, in the above-described embodiment, the linear multiple regression analysis is performed in order to construct the model, but the model may be constructed using another analysis approach.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Claims
1. A non-transitory computer-readable recording medium storing a data estimation program that causes a computer to execute a process, the process comprising:
 extracting at least one first set of measurement data to be used for construction of a first model that outputs an estimated value of first measurement data at a first measurement time that follows second measurement times with respect to an input of a second set of measurement data that includes second measurement data that has been measured at the second measurement times, from third sets of measurement data that include third measurement data that had been measured at third measurement time prior to the second measurement times, based on the second set;
 determining whether a second model that has been previously constructed is identical to the first model, based on the first set and one of the second set and the third sets used for the construction of the second model;
 when it is determined that the second model is not identical to the first model, constructing the first model by using the first set, and acquiring the estimated value output from the first model by inputting the second set to the first model; and
 when it is determined that the second model is identical to the first model, acquiring the estimated value output from the second model by inputting the second set to the second model.
2. The non-transitory computer-readable recording medium storing the data estimation program according to claim 1,
 wherein the process extracts a predetermined number of the first sets in an order from highest in similarity with respect to the second set, from the third sets.
3. The non-transitory computer-readable recording medium storing the data estimation program according to claim 2,
 wherein the similarity is calculated with respect to the second set for each of the third sets;
 wherein the process registers a list that associates a second measurement time of the second measurement times, which is included in each of the predetermined number of the first sets, with the similarity for each of the predetermined number of the first sets; the second set; and the second measurement time, in a database in association with each other; and
 wherein, when the first model to which the second measurement data is constructed, the process determines whether an identical set of measurement data the second set is registered in the database,
 wherein, when it is determined that the identical set is registered in the database,
 the process calculates the similarity for the second data set in which the second measurement time falls within a period between the measurement time associated with the identical set in the third sets in the database and the second measurement time set, and
 the process extracts the predetermined number of the similarity in an order from highest from among the calculated similarity and the similarity included in the list associated with the identical set in the database, and constructs the first model by using the predetermined number of the first set relevant to the extracted similarity.
4. The non-transitory computer-readable recording medium storing the data estimation program according to claim 3,
 wherein the process calculates the similarity, for each pair of pieces of measurement data relevant between a calculation object one of the third sets and the second set, a value obtained by squaring a difference between the pair of pieces of the measurement data, and calculates a square root of a sum of the calculated values as the similarity between the calculation object one of the third sets and the second set.
5. The non-transitory computer-readable recording medium storing the data estimation program according to claim 2,
 wherein the process further saves information on the first model that has been constructed as information on the already constructed second model in association with the predetermined number of the first sets used when constructing the first model, and
 wherein when the predetermined number of the first sets that have been extracted are saved, the process determines that the second model for which the information is saved in association with the predetermined number of the first sets that have been extracted is identical.
6. The non-transitory computer-readable recording medium storing the data estimation program according to claim 5,
 wherein the first model is constructed by performing linear multiple regression analysis on the predetermined number of the first sets, and
 wherein the information on the first model is information on a partial regression coefficient and an intercept in a regression equation obtained by performing the linear multiple regression analysis.
7. The non-transitory computer-readable recording medium storing the data estimation program according to claim 5,
 wherein when it is determined that the already constructed second model is identical to the first model, the process reproduces the second model using the saved information in the already constructed second model, and acquires the estimated value output from the second model by inputting the second set to the reproduced second model.
8. The non-transitory computer-readable recording medium storing the data estimation program according to claim 5,
 wherein the process further
 saves information of date and time of the construction of the second model, and
 after a lapse of a predetermined saving period from the date and time of the construction, deletes the information on the second model and the predetermined number of the first sets associated with the information, which have been saved.
9. The non-transitory computer-readable recording medium storing the data estimation program according to claim 7,
 wherein the process further
 saves information on date and time of the construction of the second model and information on a number of times the second model has been reproduced,
 calculates a diversion frequency of the information on the second model using the date and time of the construction of the second model and the number of times the second model has been reproduced, and
 after a lapse of a predetermined grace period from the date and time of the construction, deletes the information of the second model of which the diversion frequency does not reach a predetermined threshold value and the predetermined number of the first sets associated with the information, which have been saved.
10. A data estimation method that causes a computer to execute a process, the process comprising:
 extracting at least one first set of measurement data to be used for construction of a first model that outputs an estimated value of first measurement data at a first measurement time that follows second measurement times with respect to an input of a second set of measurement data that includes second measurement data that has been measured at the second measurement times, from third sets of measurement data that include third measurement data that had been measured at third measurement times prior to the second measurement times, based on the second set;
 determining whether a second model that has been previously constructed is identical to the first model, based on the first set and one of the second set and the third sets used for the construction of the second model;
 when it is determined that the second model is not identical to the first model, constructing the first model by using the first set, and acquiring the estimated value output from the first model by inputting the second set to the first model; and
 when it is determined that the second model is identical to the first model, acquiring the estimated value output from the second model by inputting the second set to the second model.
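The claimed process can be illustrated with a minimal Python sketch. It is not the applicant's implementation. All function names, the dictionary layout of the saved model information, the use of the extracted first sets themselves as the lookup key, and the definition of the diversion frequency as reuses per second of age are assumptions made for illustration only. The sketch also assumes that the extracted first sets yield a non-singular regression problem.

```python
import math

def similarity(a, b):
    """Square root of the summed squared differences between paired values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def extract_first_sets(history, query, k):
    """Pick the k past (inputs, next_value) records most similar to the query."""
    return sorted(history, key=lambda rec: similarity(rec[0], query))[:k]

def fit_linear(records):
    """Linear multiple regression via normal equations.

    Returns (partial regression coefficients, intercept).
    Assumes the normal-equation matrix is non-singular.
    """
    rows = [list(x) + [1.0] for x, _ in records]
    ys = [y for _, y in records]
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    beta = [m[i][n] / m[i][i] for i in range(n)]
    return beta[:-1], beta[-1]

def estimate(history, query, k, cache, now):
    """Return (estimated value, reused flag) for the query set."""
    first_sets = extract_first_sets(history, query, k)
    # Hypothetical key: the extracted first sets, independent of their order.
    key = frozenset((tuple(x), y) for x, y in first_sets)
    entry = cache.get(key)
    reused = entry is not None
    if reused:
        # Saved sets match: reproduce the model from saved information.
        entry["reuse_count"] += 1
    else:
        # No identical model saved: construct one and save its information.
        coefs, intercept = fit_linear(first_sets)
        entry = {"coefs": coefs, "intercept": intercept,
                 "built_at": now, "reuse_count": 0}
        cache[key] = entry
    value = entry["intercept"] + sum(
        c * x for c, x in zip(entry["coefs"], query))
    return value, reused

def evict(cache, now, grace_seconds, min_frequency):
    """Delete models past the grace period whose reuse frequency is too low."""
    for key, e in list(cache.items()):
        age = now - e["built_at"]
        # Assumed definition: diversion frequency = reuses per second of age.
        if age >= grace_seconds and e["reuse_count"] / age < min_frequency:
            del cache[key]
```

In this sketch, a second call to `estimate` with a query that selects the same first sets skips the regression and only evaluates the saved coefficients and intercept, which is the saving the claims are directed at.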
Type: Application
Filed: Oct 13, 2021
Publication Date: Jun 23, 2022
Applicant: FUJITSU LIMITED (Kawasaki-shi)
Inventors: Hiroshi Endo (Fuji), Hiroyoshi Kodama (Isehara), Takahide Yoshikawa (Kawasaki)
Application Number: 17/500,500