METHOD FOR AUTOMATED DETERMINATION OF A MODEL COMPRESSION TECHNIQUE FOR COMPRESSION OF AN ARTIFICIAL INTELLIGENCE-BASED MODEL

The present disclosure relates to a computer-implemented method for automated determination of a model compression technique for compression of an artificial intelligence-based model, a corresponding computer program product, and a corresponding apparatus of an industrial automation environment. The method includes automated provisioning of a set of model compression techniques using an expert rule, determining metrics for the model compression techniques of the set of model compression techniques based on weighted constraints, and selecting an optimized model compression technique based on the determined metrics.

Description

The present patent document is a § 371 nationalization of PCT Application Serial No. PCT/EP2021/069459, filed Jul. 13, 2021, designating the United States, which is hereby incorporated by reference, and this patent document also claims the benefit of European Patent Application No. 20188083.8, filed Jul. 28, 2020.

TECHNICAL FIELD

The present disclosure relates to a computer-implemented method for automated determination of a model compression technique for a compression of an artificial intelligence-based model, a corresponding computer program product, and a corresponding apparatus of an industrial automation environment.

BACKGROUND

One of the key concepts for industrial scenarios of the next generation is the Industrial Internet of Things combined with a new generation of analytical methods based on artificial intelligence (AI) and methods related thereto. For these concepts, industrial equipment installed in factories, manufacturing plants, processing plants, or production sites is equipped with suitable sensors to collect a variety of different types of data. Collected data are transmitted via either wired or wireless connections for further analysis. The analysis of the data is performed using either classical approaches or AI methods. Based on the data analysis, a plant operator or a service provider may optimize processes or installations in order to, for example, decrease production costs and reduce energy consumption.

For the analysis of data, specific models are known, and these models are deployed in an operating environment. In order to reduce processing effort and the associated energy consumption, models may be compressed.

In conventional approaches, certain devices in an industrial environment are selectively monitored, data are collected in a nonsystematic manner, and ad hoc analysis of these data is performed. In other approaches, devices are monitored using predefined thresholds, and human expertise is employed to regularly check the performance of the analysis.

Recently, data analytics processes like the Cross-Industry Standard Process for Data Mining (CRISP-DM) have been proposed that describe how data collection, data preparation, model building, and deployment on devices connected to the model building stage might be carried out. However, such processes do not consider model compression.

The Chinese patent application CN109978144A describes a model compression method and system that includes the acts of determining a compression ratio, a first compression processing, and a second compression processing.

The Chinese patent application CN110163367A proposes a model compression method using a compression algorithm component and an algorithm hyperparameter value. A candidate compression result is obtained, and the hyperparameter value is adjusted.

The international publication WO2016180457A1 shows estimating a set of physical parameters by iteratively inverting an equation to minimize an error between simulated data and measured data and to provide an estimated set of physical parameters. It thereby discloses applying a compression operator to the model vector representing the set of physical parameters to reduce the number of free variables.

The patent application CN109002889A discloses an adaptive iterative convolutional neural network model compression method that includes preprocessing training data, training convolutional neural networks with the training data, selecting an optimal model as the model to be compressed, compressing the model with the adaptive iterative convolutional neural network model compression method, assessing the compressed model, and selecting the optimal model as the completed compressed model.

The patent application CN108053034A shows a model parameter processing method, a device, electronic equipment, and storage media. The described method includes obtaining the parameter set to be compressed that corresponds to the pending model, the parameter set to be compressed including multiple model parameters, and determining a compression strategy according to the model parameters in the parameter set to be compressed.

The patent application CN109961147A describes a specific model compression technique based on Q-learning.

SUMMARY AND DESCRIPTION

It is one object of this disclosure to provide a method and corresponding computer program product and apparatus to improve the usage of artificial intelligence-based models in environments with limited processing capacity.

The scope of the present disclosure is defined solely by the appended claims and is not affected to any degree by the statements within this summary. The present embodiments may obviate one or more of the drawbacks or limitations in the related art.

The disclosure relates to a computer-implemented method for automated determination of a model compression technique for compression of an artificial intelligence-based model. The method includes automated provisioning of a set of model compression techniques using an expert rule, determining metrics for the model compression techniques of the set of model compression techniques based on weighted constraints, and selecting an optimized model compression technique based on the determined metrics.

Artificial intelligence-based models, in short AI models, such as machine learning or deep learning-based models, or models based on neural networks, might be used. Moreover, tree-based models with decision trees as a basis might be used, according to the application the AI model is to be used for.

As a set of model compression techniques, for example, the following techniques or combinations thereof might be used: parameter pruning & sharing; quantization & binarization; designed structural matrix; low-rank factorization & sparsity; transfer/compact convolutional filters; and/or knowledge distillation.

Those exemplary techniques are applicable to a large range of data analytics models. There might be other techniques applicable.

The expert rule assigns to an AI-based model a specific set of model compression techniques. This rule-based selection procedure, for example, relies on expert knowledge that is reflected in a taxonomy. With this taxonomy, also conditions of data analytics models or of available data may be considered for provisioning of the set of model compression techniques.
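By way of illustration only, such an expert rule may be represented as a simple lookup from a model category to candidate techniques, refined by conditions on the available data. The following Python sketch assumes freely chosen model categories, technique names, and a single training-data condition; it is not the taxonomy actually used by the method.

```python
# Illustrative sketch of an expert rule as a lookup table. The model categories,
# technique names, and the training-data condition below are assumptions made
# for illustration only.
EXPERT_RULE = {
    "convolutional_neural_network": [
        "parameter_pruning_and_sharing",
        "quantization_and_binarization",
        "low_rank_factorization_and_sparsity",
    ],
    "neural_network_with_softmax_output": [
        "knowledge_distillation",
        "quantization_and_binarization",
    ],
    "tree_based_model": [
        "parameter_pruning_and_sharing",
    ],
}

def provide_candidate_techniques(model_type, original_training_data_available):
    """Return candidate compression techniques for a model type, dropping
    techniques that need the original training data when it is not available."""
    candidates = list(EXPERT_RULE.get(model_type, []))
    if not original_training_data_available:
        candidates = [t for t in candidates if t != "knowledge_distillation"]
    return candidates
```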

The metrics are determined for the model compression techniques, which are promising candidates due to the expert rule-based selection. In particular, respective metrics are determined for each of the model compression techniques. The metrics characterize compressed models, which have been compressed with the model compression techniques in terms of satisfying one or more constraints. The metrics reflect a value of the compressed model and therefore of the model compression technique used.

The metric is determined based on weighted constraints. The value characterizing the compressed model in terms of satisfying one or more constraints may be determined by giving specific constraints a higher priority. Those high-priority constraints influence the value more than constraints rated with a lower priority.

The metric may be the result of a test within a test phase. In a test phase, the metrics for all different compression techniques are determined by generating a compressed model for each of the model compression techniques and the results are compared to choose the best model compression technique.

The metric may be defined with respect to various constraints. For example, a metric is defined by considering two, three, or more specific constraints. The value that reflects the model compression techniques is then a value which provides information about the quality of a technique in terms of the constraints. Metrics might be tested by applying different constraints or different combinations of constraints of a group of constraints.

The metric varies depending on the respective weights that are assigned to a respective constraint. Those weights may be chosen by a user or automatically depending on a type of analysis the AI-based model is used for.

The metrics are customized in terms of which constraints are considered and which weights are assigned to the respective constraints. The customization depends, for example, on the type of analysis or an industrial environment or hardware restrictions of the industrial system the AI-based model is used in or devices the AI-based model is deployed on.

Metrics may be two- or three-dimensional quantities or vectors and might have different values in the different dimensions. The different dimensions may be defined by different functions of constraints or functions of different constraints and/or different weights of the different constraints.
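As a non-limiting illustration, a two-dimensional metric of this kind may be computed as two weighted sums over the same constraint values, one per dimension; the constraint names and weights in the following sketch are assumptions.

```python
# Sketch of a two-dimensional metric: each dimension is a weighted sum of the
# constraint values. Constraint names and weights are illustrative assumptions.
def metric(constraint_values, weights_d1, weights_d2):
    d1 = sum(weights_d1[name] * value for name, value in constraint_values.items())
    d2 = sum(weights_d2[name] * value for name, value in constraint_values.items())
    return d1, d2

# Example with hardware acceleration Me1, inference time Me2, and accuracy Me3.
values = {"Me1": 0.7, "Me2": 0.5, "Me3": 0.9}
cm = metric(values,
            {"Me1": 0.4, "Me2": 0.4, "Me3": 0.2},   # first dimension
            {"Me1": 0.1, "Me2": 0.1, "Me3": 0.8})   # second dimension
```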

The selection may be performed by evaluating the highest metric or the metric having the highest value, e.g., by choosing the compression technique which results in the highest values in most of the metric dimensions, or by choosing the metric best fulfilling the most important constraint or having the highest value in the most important constraint.

In an advantageous way, the proposed method does not rely on a consecutive order of training the model, compressing the model, and deploying the model on an intended system, and then having to restart the process after model monitoring has found model errors.

In contrast, candidate model compression techniques are tested before deployment in a systematic and automated manner to efficiently use system resources.

With the proposed method, starting from an AI task, which may be solved with an AI-based model, and having industrial and hardware requirements at hand, the best suited AI model compression technique that resolves the AI task with respect to constraints specified by industrial and hardware requirements is found.

The workflow for analyzing and selecting a model compression technique is automated so that, in comparison with existing manual selection procedures, the time effort to select a model compression method is reduced.

The selection method enables the usage of customized metrics for the selection process, which results in large flexibility and customizability with respect to the different devices on which the compressed AI model, compressed with the selected technique, is intended to be used, and also with respect to different industrial environments and use cases.

Finding the optimal model compression technique with the proposed method reduces the energy consumption of an AI-based model being deployed. Selecting an optimal technique to compress the AI model and using the compressed model enables a deployment with optimal computational effort. In particular, fewer model parameters cause less computational effort, which leads to lower energy consumption.

According to an embodiment, the constraints reflect hardware or software constraints of an executing system for execution of a compressed model of the artificial intelligence-based model compressed with the model compression technique. With the constraints reflecting hardware or software requirements, the metrics determined in dependence on the constraints also describe these hardware and software constraints.

For example, the constraints are one or more of a speed compression ratio, a memory compression ratio, a hardware memory allocation, a hardware acceleration, a required inference time, a dimensionality reduction requirement, an accuracy requirement, a docker container characteristic, a software license availability, a software license version, and a training data necessity.

According to an embodiment, the expert rule relates an artificial intelligence-based model to the model compression techniques of the set of model compression techniques based on a condition of the artificial intelligence-based model or of data needed for training or executing the artificial intelligence-based model. For example, the rule assigns techniques based on conditions such as characteristics of the AI model, e.g., the availability of a Softmax output layer, of a physical model, of a circ model, or of a physical decomposition, or characteristics of the training data, e.g., the availability of the original training data. The expert rule may be provided on a test environment for the process of selecting the compression technique, or on the system on which the compressed model is to be deployed, in particular during the test phase. This act may be performed when a certain set of compression techniques is to be chosen in order to run the method.

The expert rule may address one or more AI based models and may include a set of compression techniques per AI-based model.

According to an embodiment, the metrics are functions in dependence of respective values representing respective constraints, wherein the respective values are weighted with respective weighting factors. The values representing respective constraints may be numerical, continuous, ordinal, or discrete.

According to an embodiment, the functions describe linear, exponential, polynomial, fitted, or fuzzy relations. This enables a flexible mapping of real interdependencies of different constraints. For example, a first function mirrors a linear interdependency between accuracy and inference time for one user, that is, for a first deployment on a first system, and a second function mirrors a non-linear interdependency for a second operation type.

According to an embodiment, the functions vary depending on the constraints. Depending on how many constraints are considered, the functions are built correspondingly.

According to an embodiment, the metrics are relative to a reference metric of the artificial intelligence-based model. The reference metric might be influenced by the most important constraints. For example, the reference metric might be the accuracy of a compressed AI-based model. Metrics may be chosen having this reference as a cornerstone.

According to an embodiment, the constraints for building the metrics depend on hardware and software framework conditions of the system or device the artificial intelligence-based model is used in. For example, the constraints of interest depend on whether there are restrictions like memory or software license restrictions. Moreover, the constraints considered for the metric might be dependent on a desired complexity of the function underlying the metric, which also influences the complexity of an optional following optimization method.

According to an embodiment, the respective weighting factor for the respective constraint of the constraints depends on an analysis type the artificial intelligence-based model is used in. For example, weights for the different constraints or their respective values are given by a user or by the operation type, e.g., depending on whether a postmortem analysis or an online analysis is to be performed with the AI model.

According to an embodiment, the selecting of an optimized model compression technique based on the determined metrics further includes optimizing the metrics for each of the model compression techniques over the constraints, in particular over respective values representing the respective constraints, and moreover in particular over parameters of the respective model compression techniques influencing the respective value representing the respective constraint.

The optimization may be carried out over different compression techniques, in particular over all compression techniques of the set of model compression techniques, and the associated metric spaces. For example, an optimization space is built in which the metric value of every model compression technique tested in the selection procedure is maximized.

Optimization is performed, for example, over the constraints' values. The optimization may be performed with a function having the constraint values as variables.

According to an embodiment, for optimizing the metrics for each of the model compression techniques over the constraints, one or more constraints are fixed. For the optimization, some of the constraints may be hard constraints and may therefore be fixed. In particular, constraints concerning the availability of hardware or software resources are fixed and may not be varied to optimize the metric.

According to an embodiment, for optimizing the metrics for each of the model compression techniques over the constraints, an optimization method is used, in particular but not limited to a gradient descent method, a genetic algorithm-based method or a machine learning classification method.
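As a non-limiting illustration, the following sketch optimizes one technique's metric over a single compression parameter (a pruning ratio) while a hard memory constraint stays fixed. The metric function, the memory model, and the budget are assumptions, and a simple grid search stands in for the gradient descent, genetic algorithm-based, or machine learning classification methods mentioned above.

```python
# Sketch: optimize a technique's metric over a pruning ratio under a fixed
# (hard) memory constraint. The metric function, memory model, and budget are
# illustrative assumptions; a gradient descent or genetic algorithm could
# replace the grid search without changing the structure.
def optimize_technique(metric_fn, memory_fn, memory_budget_mb):
    best_ratio, best_score = None, float("-inf")
    for step in range(17):
        ratio = 0.1 + step * 0.05                 # candidate pruning ratios 0.1 .. 0.9
        if memory_fn(ratio) > memory_budget_mb:   # hard constraint: fixed, never relaxed
            continue
        score = metric_fn(ratio)                  # weighted-constraint metric
        if score > best_score:
            best_ratio, best_score = ratio, score
    return best_ratio, best_score

# Toy usage with stand-ins for the metric and the memory footprint.
best = optimize_technique(
    metric_fn=lambda r: 0.8 * (1.0 - 0.3 * r) + 0.2 * r,   # accuracy vs. speed trade-off
    memory_fn=lambda r: 100.0 * (1.0 - r),                  # footprint shrinks with pruning
    memory_budget_mb=60.0,
)
```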

The disclosure moreover relates to a computer-implemented method for generation of a compressed artificial intelligence-based model by using a model compression technique determined as described above. The selected model compression technique is applied to the AI-based model and results in a compressed artificial intelligence-based model. Using the selection method described above reduces the effort in finding a suited compression method or technique. Moreover, using the optimization method described above enables finding an optimized compression technique. Applying the compressed AI-based model, which has been compressed with the selected and in particular optimized model compression technique, enables execution of an AI task with optimized computational effort and/or optimized energy consumption.

The disclosure moreover relates to a computer program product including instructions that, when executed by a computer, cause the computer to carry out the method as disclosed herein. The computer might be a processor and might be connectable to a human machine interface. The computer program product may be embodied as a function, as a routine, as a program code or as an executable object, in particular stored on a storage device.

The disclosure moreover relates to an apparatus of an automation environment, in particular an edge device of an industrial automation environment, with a logic component configured to execute a method for automated determination of a model compression technique for compression of an artificial intelligence-based model. The method includes an automated provision of a set of model compression techniques using an expert rule, a determination of metrics for the model compression techniques of the set of model compression techniques based on weighted constraints, and a selection of an optimized model compression technique based on the determined metrics.

The apparatus might advantageously be part of an industrial automation environment. The industrial automation environment in particular has limited processing capacity. With the provided apparatus, the flexibility and efficiency of deploying offline trained AI models on different edge devices, e.g., Technology Module Neural Processing Unit, Industrial Edge, etc. is improved.

The logic unit might be integrated into a control unit.

Further possible implementations or alternative solutions of the disclosure also encompass combinations (not explicitly mentioned herein) of features described above or below with regard to the embodiments. The person skilled in the art may also add individual or isolated aspects and features to the most basic form of the disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following, different aspects of the present disclosure are described in more detail with reference to the accompanying drawings.

FIG. 1 depicts a schematic representation of a diagram illustrating the method for automated determination of a model compression technique according to a first embodiment;

FIG. 2 depicts a schematic representation of a diagram illustrating the method for optimizing metrics according to a second embodiment;

FIG. 3 depicts a schematic representation of a block diagram illustrating the method for automated determination of a model compression technique showing input and output according to a third embodiment.

DETAILED DESCRIPTION

The first embodiment refers to the testing phase of the determination method. According to the first embodiment, a deep learning-based model is used to solve an AI task of proposing a work schedule for autonomous guided vehicles (AGVs), e.g., for providing an order in which AGVs receive goods to be transferred in an automation plant. The AI task is performed by an edge device, the edge device being connectable to a cloud environment. For example, the edge device receives AI-based models from the cloud environment. The method according to the first embodiment is run on the target device on which the compressed model may be deployed. In other embodiments, the method for automated determination of a model compression technique is run in an emulated runtime environment, for example, an office PC with a specially created environment that emulates the target device.

Three constraints may be considered for this exemplary scenario, namely, the hardware acceleration Me1, the inference time Me2, and the accuracy Me3 of the AI-based result. In other scenarios, a variety of constraints Me1, Me2, Me3, . . . , Men might be considered. Each of the constraints Me1, Me2, Me3 is weighted with respective weighting factors a to f. The number of constraints might be much higher in real scenarios. For example, the number or type of constraints might be chosen in accordance with the constraints that are considered for the calculation of a metric of the uncompressed model.

FIG. 1 shows a diagram with two axes as two dimensions, a first dimension d1, and a second dimension d2. The metrics CM1, CM2, CM3 are metrics determined for three different compression methods, which are promising candidates for compression of the deep learning model.

The first dimension d1 indicates a first value of the metric CM1, and this value is determined by using a function of the constraints and weighting factors for each constraint. For example, the first-dimension value is calculated by CM1_d1 = a*Me1 + b*Me2 + c*Me3. The value of the metric CM1 in the second dimension d2, for example, is calculated by CM1_d2 = d*Me1 + e*Me2 + f*Me3. The weights are assigned by a user or by the underlying operation type, e.g., whether a postmortem operation or an online operation is intended. In this example, a user 1 may give the values a, b, c and a user 2 may give the values d, e, f, wherein the online use of user 1 leads to higher weights for the constraints of hardware acceleration and inference time, whereas user 2 weights the accuracy Me3 higher. With this, each metric CM1, CM2, CM3 is calculated based on the requirements of the AI project.

Weighting factors a to f might be tuned depending on what is in focus. For example, the metric values are calculated by CM1/2 = 0.1*Me1 + 0.1*Me2 + 0.8*Me3. Here, the stress is on keeping the accuracy of a model as high as possible. The CPU utilization and memory are treated equally.

In the example illustrated in FIG. 1, the compression technique leading to the compressed model with the metric CM2 would be chosen because, for both users, its values are higher than those of the other two compression techniques. The decision between CM1 and CM3 is not as easy, so these two are put in a common cluster, and a following optimization act might give more insight.
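By way of illustration, the selection of FIG. 1 may be reproduced with assumed numbers: the constraint values per candidate technique and the two users' weights in the following sketch are purely illustrative, and the technique whose metric dominates in both dimensions (CM2 in this example) is selected.

```python
# Worked sketch of the FIG. 1 selection. The constraint values per candidate
# compression technique and the per-user weights are illustrative assumptions.
constraint_values = {            # (Me1 hardware acceleration, Me2 inference time, Me3 accuracy)
    "CM1": (0.6, 0.5, 0.7),
    "CM2": (0.8, 0.8, 0.9),
    "CM3": (0.7, 0.4, 0.6),
}
weights_user1 = (0.4, 0.4, 0.2)  # online use: stress on Me1 and Me2
weights_user2 = (0.1, 0.1, 0.8)  # postmortem use: stress on Me3

def dimensions(values, w1, w2):
    d1 = sum(a * v for a, v in zip(w1, values))
    d2 = sum(a * v for a, v in zip(w2, values))
    return d1, d2

metrics = {name: dimensions(v, weights_user1, weights_user2)
           for name, v in constraint_values.items()}
# CM2 has the highest value in both dimensions and is therefore selected.
best = max(metrics, key=lambda name: metrics[name])
```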

Also, in cases where there is a complex relation between the metrics, e.g., if, as accuracy increases, the inference time increases twofold for one compression technique but non-linearly for other compression techniques, or where there are constraints on licenses, the optimum value is chosen by performing an optimization act.

An optimization may be performed for every compression technique that has been determined with the expert rule. Advantageously, a database or any other type of data storage is populated in order to generalize it in the future, with the intention of running a machine learning algorithm on it.

According to the second embodiment, an optimization is performed over the constraints inference time Me2 and accuracy Me3. Hard constraints concerning the availability of a software license, for example, whether a license is available or not and, if yes, which software license type is available (e.g., MIT, Google, or Apache license types), are also considered for the optimization and result in a restricted constraint space for the other constraints.

FIG. 2 illustrates a graph indicating an optimized result for the constraint values, meaning to what extent the constraints may be considered, e.g., how fast a compression may be executed while still achieving an appropriate accuracy. The space excluded from the optimization due to the hard constraints is illustrated in FIG. 2 by the hatched area. Combinations of constraint values lying on the curve RT may be chosen for determining an optimal compression technique and deploying an optimally compressed model.

A curve of the kind illustrated in FIG. 2 is determined for every compression technique. The different curves may be compared with each other to choose the best curve. Optimizing each such curve delivers the best metric and therefore the best compression technique. Mathematically speaking, from a set of functions CMi = gi(f1(Me1), f2(Me2), . . . ), the optimal CMi is chosen.
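The following sketch illustrates this cross-technique choice under assumptions: each technique contributes a function gi over (already restricted) constraint values, its optimum is searched over the feasible points left by the hard constraints, and the technique with the best optimum is selected. The example functions and the feasible grid are hypothetical.

```python
# Sketch of choosing the optimal CMi = gi(...) across techniques. The functions
# and the feasible grid of constraint values are hypothetical examples.
def select_best_technique(techniques, feasible_points):
    best_name, best_value = None, float("-inf")
    for name, g in techniques.items():
        optimum = max(g(me2, me3) for me2, me3 in feasible_points)  # per-technique curve optimum
        if optimum > best_value:
            best_name, best_value = name, optimum
    return best_name, best_value

# Toy usage: two hypothetical techniques with different inference-time/accuracy trade-offs.
grid = [(me2, me3) for me2 in (0.2, 0.5, 0.8) for me3 in (0.6, 0.8, 0.95)]
best = select_best_technique(
    {"pruning": lambda me2, me3: 0.3 * (1 - me2) + 0.7 * me3,
     "quantization": lambda me2, me3: 0.5 * (1 - me2) + 0.5 * me3},
    grid,
)
```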

According to the third embodiment, the following input data I is provided: a type of analysis, a strategy, an AI model, a dataset, a model compression technique expert selection rule, and constraints. The following output data O is generated: a compressed model and the optimal compression technique.

FIG. 3 illustrates the input data I and the output data O as well as the following acts; a sketch of the overall workflow is given after act S10′ below:

In act S1, the dataset is preprocessed with a strategy. Well-known methods for preprocessing data for usage in AI algorithms might be used.

In act S2, the dataset is split into test and training datasets.

In act S3, a machine learning model is trained on the training data from act S2.

In act S4, the machine learning model is tested on test data from act S2 with respect to constraints.

In act S5, a set of model compression techniques is chosen with an expert rule, for example, as in the first embodiment.

In act S6, when the type of analysis is postmortem, for every model compression technique, acts S7-S10 are performed.

In act S7, the model is compressed with the model compression technique.

In act S8, the compressed model is tested with respect to the constraints, and metrics determined from the constraints, for example as in the first embodiment, are obtained.

In act S9, the model compression technique metrics from act S8 and the compressed model are saved.

In act S10, a model compression technique is optimized with respect to constraints with a higher weight on accuracy and confidence metrics.

In act S6′, when the type of analysis is online, for every model compression technique, acts S7′-S10′ are performed.

In act S7′, the model is compressed with the model compression technique.

In act S8′, the compressed model is tested with respect to the constraints, and metrics determined from the constraints are obtained.

In act S9′, the model compression technique metrics from act S8′ and the compressed model are saved.

In act S10′, a model compression technique is optimized with respect to constraints with a higher weight on hardware acceleration and inference time.
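By way of illustration only, the acts S1 to S10 and S1 to S10′ may be combined into a single workflow as sketched below. All helper functions (preprocess, split, train, test, optimize, the compress method of a technique) and the two weight profiles are hypothetical placeholders, not an implementation of the disclosed method.

```python
# Structural sketch of the FIG. 3 workflow. Every helper referenced here
# (preprocess, split, train, test, optimize, technique.compress) is a
# hypothetical placeholder standing in for the act it names.
HIGH_ACCURACY_WEIGHTS = {"Me1": 0.1, "Me2": 0.1, "Me3": 0.8}  # postmortem analysis (assumed)
HIGH_SPEED_WEIGHTS = {"Me1": 0.4, "Me2": 0.4, "Me3": 0.2}     # online analysis (assumed)

def determine_compression_technique(dataset, strategy, model, expert_rule,
                                    constraints, analysis_type):
    data = preprocess(dataset, strategy)                             # act S1
    train_data, test_data = split(data)                              # act S2
    model = train(model, train_data)                                 # act S3
    reference_metrics = test(model, test_data, constraints)          # act S4: uncompressed reference
    candidates = expert_rule(model)                                  # act S5

    # acts S6 / S6': the weight profile depends on the type of analysis
    weights = (HIGH_ACCURACY_WEIGHTS if analysis_type == "postmortem"
               else HIGH_SPEED_WEIGHTS)

    results = {}
    for technique in candidates:
        compressed = technique.compress(model)                       # acts S7 / S7'
        metrics = test(compressed, test_data, constraints, weights)  # acts S8 / S8'
        results[technique] = (metrics, compressed)                   # acts S9 / S9'
        optimize(technique, constraints, weights)                    # acts S10 / S10'

    best = max(results, key=lambda t: results[t][0])
    return best, results[best][1]         # optimal technique and compressed model (output O)
```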

It is to be understood that the elements and features recited in the appended claims may be combined in different ways to produce new claims that likewise fall within the scope of the present disclosure. Thus, whereas the dependent claims appended below depend on only a single independent or dependent claim, it is to be understood that these dependent claims may, alternatively, be made to depend in the alternative from any preceding or following claim, whether independent or dependent, and that such new combinations are to be understood as forming a part of the present specification.

While the present disclosure has been described above by reference to various embodiments, it may be understood that many changes and modifications may be made to the described embodiments. It is therefore intended that the foregoing description be regarded as illustrative rather than limiting, and that it be understood that all equivalents and/or combinations of embodiments are intended to be included in this description.

Claims

1. A computer-implemented method for automated determination of a model compression technique for a compression of an artificial intelligence-based model, the method comprising:

automatically providing a set of model compression techniques using an expert rule;
determining metrics for model compression techniques of the set of model compression techniques based on weighted constraints; and
selecting an optimized model compression technique based on the determined metrics.

2. The method of claim 1, wherein the weighted constraints reflect hardware or software constraints of an executing system for execution of a compressed model of the artificial intelligence-based model compressed with the model compression technique.

3. The method of claim 1, wherein the expert rule relates an artificial intelligence-based model to the model compression techniques of the set of model compression techniques based on a condition of the artificial intelligence-based model or data needed for training or executing the artificial intelligence-based model.

4. The method of claim 1, wherein the metrics are functions in dependence of respective values representing respective constraints, and

wherein the respective values are weighted with respective weighting factors.

5. The method of claim 4, wherein the functions describe linear, exponential, polynomial, fitted, or fuzzy relations.

6. The method of claim 4, wherein the functions vary depending on the weighted constraints.

7. The method of claim 1, wherein the metrics are relative to a reference metric of the artificial intelligence-based model.

8. The method of claim 1, wherein the weighted constraints for building the metrics depend on hardware and software framework conditions of a system or a device the artificial intelligence-based model is used in.

9. The method of claim 1, wherein a respective weighting factor for a respective weighted constraint of the weighted constraints depends on an analysis type the artificial intelligence-based model is used in.

10. The method of claim 1, wherein the selecting of the optimized model compression technique further comprises optimizing the metrics for each model compression technique of the model compression techniques over the weighted constraints.

11. The method of claim 10, wherein, in the optimizing of the metrics for each model compression technique of the model compression techniques over the weighted constraints, at least one weighted constraint is fixed.

12. The method of claim 10, wherein, in the optimizing of the metrics for each model compression technique of the model compression techniques over the constraints, an optimization method is used.

13. The method of claim 1, further comprising:

generating a compressed artificial intelligence-based model using the optimized model compression technique.

14. A computer program product comprising instructions which, when executed by a computer, cause the computer to:

automatically provide a set of model compression techniques using an expert rule;
determine metrics for model compression techniques of the set of model compression techniques based on weighted constraints; and
select an optimized model compression technique based on the determined metrics.

15. An apparatus of an automation environment, the apparatus comprising:

a logic component configured to execute an automated determination of a model compression technique for compression of an artificial intelligence-based model, the automated determination comprising: an automated provision of a set of model compression techniques using an expert rule; a determination of metrics for model compression techniques of the set of model compression techniques based on weighted constraints; and a selection of an optimized model compression technique based on the determined metrics.

16. The apparatus of claim 15, wherein the apparatus is an edge device of an industrial automation environment.

17. The method of claim 10, wherein the optimizing of the metrics for each model compression technique of the model compression techniques over the weighted constraints is over respective values representing the respective constraints or over parameters of the respective model compression techniques influencing the respective value representing the respective constraint.

18. The method of claim 12, wherein the optimization method comprises a gradient descent method, a genetic algorithm based method, or a machine learning classification method.

Patent History
Publication number: 20230297837
Type: Application
Filed: Jul 13, 2021
Publication Date: Sep 21, 2023
Inventors: Christoph Paulitsch (Karlsruhe), Vladimir Lavrik (Dreieich), Yang Qiao Meng (Beijing)
Application Number: 18/017,163
Classifications
International Classification: G06N 3/084 (20060101); G06N 3/0495 (20060101);