USING OUTPUT EQUALIZATION IN TRAINING AN ARTIFICIAL INTELLIGENCE MODEL IN A SEMICONDUCTOR SOLUTION
A system for training an artificial intelligence (AI) model for an AI chip may include an AI training unit to train weights of an AI model in floating point, and one or more quantization units for updating the weights of the AI model while accounting for the hardware constraints in the AI chip. The system may also include a customization unit for performing one or more linear transformations on the updated weights. The system may also perform output equalization for one or more convolution layers of the AI model to equalize the inputs and/or outputs of each layer of the AI model to within the range allowed in the physical AI chip. The system may further update the weights by performing shift-based quantization that mimics the characteristics of a hardware chip. The updated weights may be stored in fixed point and uploadable to an AI chip implementing an AI task.
This application claims the filing benefit of U.S. Provisional Application No. 62/821,437, filed Mar. 20, 2019 and U.S. Provisional Application No. 62/830,269, filed Apr. 5, 2019. These applications are incorporated by reference herein in their entirety and for all purposes.
FIELD
This patent document relates generally to systems and methods for providing artificial intelligence solutions. Examples of training a convolution neural network model for uploading to an artificial intelligence semiconductor are provided.
BACKGROUND
Artificial intelligence (AI) semiconductor solutions include using embedded hardware in an AI integrated circuit (IC) to perform AI tasks. Hardware-based solutions, as well as software solutions, still encounter the challenges of obtaining an optimal AI model, such as a convolutional neural network (CNN) for the hardware. For example, if the weights of a CNN model are trained outside the chip, they are usually stored in floating point. When the weights of a CNN model in floating point are loaded into an AI chip, they usually lose data bits from quantization, for example, from 16- or 32-bits to 1- to 8-bits. The loss of data bits in an AI chip compromises the performance of the AI chip due to lost information and data precision. Further, existing training methods are often performed in a high performance computing environment, such as on a desktop, without accounting for the hardware constraints in a physical AI chip. This often causes performance degradation when a trained AI model is loaded into an AI chip.
The present solution will be described with reference to the following figures, in which like numerals represent like items throughout the figures.
As used in this document, the singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Unless defined otherwise, technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. As used in this document, the term “comprising” means “including, but not limited to.”
An example of “artificial intelligence logic circuit” and “AI logic circuit” includes a logic circuit that is configured to execute certain AI functions such as a neural network in AI or machine learning tasks. An AI logic circuit can be a processor. An AI logic circuit can also be a logic circuit that is controlled by an external processor and executes certain AI functions.
Examples of “integrated circuit,” “semiconductor chip,” “chip,” and “semiconductor device” include integrated circuits (ICs) that contain electronic circuits on semiconductor materials, such as silicon, for performing certain functions. For example, an integrated circuit can be a microprocessor, a memory, a programmable array logic (PAL) device, an application-specific integrated circuit (ASIC), or others. An AI integrated circuit may include an integrated circuit that contains an AI logic circuit.
Examples of “AI chip” include a hardware- or software-based device that is capable of performing functions of an AI logic circuit. An AI chip may be a physical IC. For example, a physical AI chip may include a CNN, which may contain weights and/or parameters. The AI chip may also be a virtual chip, i.e., software-based. For example, a virtual AI chip may include one or more processor simulators to implement functions of a desired AI logic circuit.
Examples of “AI model” include data containing one or more parameters that, when loaded inside an AI chip, are used for executing the AI chip. For example, an AI model for a given CNN may include the weights, biases, and other parameters for one or more convolutional layers of the CNN. Here, the weights and parameters of an AI model are interchangeable.
In a non-limiting example, in a CNN model, a computation in a given layer in the CNN may be expressed by Y=W*X+b, where X is input data, Y is output data, W is a kernel, and b is a bias; all variables are relative to the given layer. Both the input data and the output data may have a number of channels. Operation “*” is a convolution. Kernel W may include weights. For example, a kernel may include 9 cells in a 3×3 mask, where each cell may have a binary value, such as “1” and “−1.” In such case, a kernel may be expressed by multiple binary values in the 3×3 mask multiplied by a scalar. In other examples, for some or all kernels, each cell may be a signed 2-bit or 8-bit integer. Alternatively, and/or additionally, a kernel may contain data with non-binary values, such as 7-value. Other bit lengths or values may also be possible. The scalar may include a value having a bit width, such as 12-bit or 16-bit. Other bit lengths may also be possible. The bias b may contain a value having multiple bits, such as 18 bits. Other bit lengths or values may also be possible. In a non-limiting example, the output Y may be further discretized into a signed 5-bit or 10-bit integer. Other bit lengths or values may also be possible.
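As an illustration of the layer computation Y=W*X+b with a kernel expressed as a binary 3×3 mask multiplied by a scalar, the following is a minimal single-channel sketch. The function name `conv2d_valid`, the sliding-window (cross-correlation) form without padding, and the sample values are assumptions made for this example only and are not taken from the disclosure.

```python
def conv2d_valid(x, mask, scalar, bias):
    """Single-channel layer Y = W * X + b, where the kernel W is a
    binary k x k mask multiplied by one scalar (no padding, stride 1)."""
    k = len(mask)
    rows, cols = len(x) - k + 1, len(x[0]) - k + 1
    y = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Each mask cell is +1 or -1, so this is an add or subtract.
                    acc += mask[i][j] * x[r + i][c + j]
            y[r][c] = scalar * acc + bias
    return y


# Illustrative values only.
mask = [[1, -1, 1], [-1, 1, -1], [1, -1, 1]]
x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
y = conv2d_valid(x, mask, scalar=0.5, bias=1.0)
```

Because every mask cell is ±1, the inner loop reduces to additions and subtractions, with a single multiplication by the scalar per output value.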
In some examples, the AI chip in the AI system 114 may include an embedded cellular neural network that has memory containing the multiple parameters in the CNN. In some scenarios, the memory in an AI chip may be a one-time-programmable (OTP) memory that allows a user to load a CNN model into the physical AI chip once. Alternatively, the AI chip may have a random access memory (RAM), magneto-resistive random access memory (MRAM), or other types of memory that allow a user to update and load a CNN model into the physical AI chip multiple times. In a non-limiting example, the AI chip may include convolutional, Pooling, and ReLU layers in a CNN model. In such case, the AI chip may perform all computations in an AI task. In other examples, the AI chip may include a subset of the convolutional, Pooling, and ReLU layers in a CNN model. In such case, the AI chip may perform certain computations in an AI task, leaving the remaining computations in the AI task to be performed in a CPU/GPU or other host processors outside the AI chip.
In some examples, the training network 101 may be configured to include a forward propagation neural network, in which information may flow from the input layer to one or more hidden layers of the network to the output layer. An AI training system may also be configured to include a backward propagation network to update the weights of the AI model based on the output of the AI chip. In some examples, an AI training system may include a combination of forward and backward propagation networks.
In some examples, training data 102 may be provided for use in training the AI model 112. For example, training data 102 may be used for training an AI model that is suitable for face recognition tasks, and the training data may contain any suitable dataset collected for performing face recognition tasks. In another example, the training data may be used for training an AI model suitable for scene recognition in video and images, and thus the training data may contain any suitable scene dataset collected for performing scene recognition tasks. In some scenarios, training data may reside in a memory in a host device. In one or more other scenarios, training data may reside in a central data repository and be available for access by the training network 101 via the communication network 103. In some examples, an AI model may be trained by using one or more devices to implement one or more training units 104-110 as shown in
In some examples, the training network 101 may include a floating-point model training unit 104, which may be configured to train an AI model, e.g., a CNN model, using one or more sets of training data 102. For example, the floating-point model training unit may be implemented on a desktop computer (CPU, and/or GPU) in floating point, in which one or more weights of the CNN model are in floating point. Any known or later developed methods may be used to train a CNN model. The training system 100 may further include one or more units to convert the floating-point model to a hardware-supported model, as further illustrated in
In some examples, the training system 100 may include a convolution quantization unit 106 and/or activation quantization unit 108, each of which may be configured to update the weights of a CNN model to adapt to an AI chip. For example, the convolution quantization unit 106 may convert the trained weights in floating-point to weights in fixed-point so that the weights can be supported by the AI chip. The activation quantization unit 108 may further update the weights of the CNN so that the CNN output values based on the updated weights are also supported by the AI chip. Alternatively, and/or additionally, the order of the convolution quantization unit 106 and the activation quantization unit 108 may not matter. For example, the activation quantization unit 108 may access the training weights (e.g., from the floating-point model training unit 104) in floating-point and generate updated weights in fixed-point. Conversely, the convolution quantization unit 106 may access the updated weights in fixed-point from the activation quantization unit 108 and further update the weights to those that can be supported by the AI chip. For example, the updated weights from the convolution quantization unit 106 and/or the activation quantization unit 108 may be in fixed-point and have the bit-width equal to that supported by the AI chip, such as 1-bit, 2-bit, 5-bit, 8-bit, etc. The output values that are generated by the AI model based on the updated weights from the convolution quantization unit 106 and/or the activation quantization unit 108 may also result in fixed-point values and have the bit-width equal to that supported by activation layers in the AI chip, such as 5-bit or 10-bit.
In some examples, the training network 101 may include a chip customization unit 110 which may be configured to further update the weights of the AI model to maximize the resources supported by the hardware AI chip. For example, the chip customization unit 110 may be configured to perform batch normalization merge, image mean merge, scalar mean merge, and/or a combination thereof, which are described in the present disclosure. The chip customization unit 110 may further train the weights in a manner that mimics the characteristics of the hardware in the AI chip. For example, the training may include shift-based quantization which may mimic the features of the hardware. In some examples, the one or more units in the training network 101 may be serially coupled in that the output of one unit is fed to the input of another unit. For example, the one or more units may be coupled in the order of 104, 108, 106 and 110, where 104 receives the training data and produces a floating-point AI model, where each of the units 106, 108 and 110 further converts or updates the weights of the AI model and unit 110 produces the final AI model 112 uploadable to an AI chip for executing an AI task. Alternatively, the one or more units may be coupled in the order of 104, 106, 108 and 110. In other scenarios, fewer than all of 104, 106, 108 and 110 may be serially coupled. For example, boxes 104, 106 and 110, or boxes 104, 108 and 110 may be respectively serially coupled to train and update the weights of the AI model and generate the final AI model 112.
With further reference to
In some examples, the process 200 may further include quantizing the trained weights at 204, determining output of the AI model based on the quantized weights at 206, determining a change of weights at 208 and updating the weights at 210. In some examples, the quantized weights may correspond to the limit of the hardware, such as the physical AI chip. In a non-limiting example, the quantized weights may be of 1-bit (binary value), 2-bit, 3-bit, 5-bit, or other suitable bits, such as 8-bit. Determining the output of the AI model at 206 may include inferring the AI model using the training data 209 and the quantized trained weights.
With further reference to
In some examples, quantizing the weights at 204 may include a dynamic fixed point conversion. For example, the quantized weights may be determined by:
nbit is the bit-size of the weights in the physical AI chip. For example, nbit may be 8-bit, 12-bit etc. Other values may be possible.
In some examples, quantizing the weights at 204 may include determining the quantized weights based on the interval in which the values of the weights fall, where the interval is defined depending on the value of nbit. In a non-limiting example, when nbit=1, the weights of a CNN model may be quantized into two quantization levels. In other words, the weight values may be divided into two intervals. For example, the first interval is [0, ∞), and the second interval is (−∞, 0). When Wk≥0, WQ=(Wk)Q=(Wmean)shift-quantized, where Wk represents the weights for a kernel in a convolution layer of the CNN model, Wmean=mean(abs(Wk)), and a shift-quantization of a weight w may be determined by
where |W|max is the maximum value of absolute values of the weights. Similarly, when Wk<0, WQ=−(Wmean)shift-quantized. The mean and maximum values are relative to a convolution layer in the CNN model.
In a non-limiting example, when nbit=2, the intervals may be defined by (−∞, −Wmean/4), [−Wmean/4, Wmean/4] and (Wmean/4, ∞). Thus, the weights may be quantized into:
WQ=0, when |Wk|≤Wmean/4;
WQ=(Wmean)shift-quantized, when Wk>Wmean/4;
WQ=−(Wmean)shift-quantized, when Wk<−Wmean/4.
It is appreciated that other variations may also be possible. For example, Wmax may be used instead of Wmean. Denominators other than the values of 4 may also be used.
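The interval rules for nbit=1 and nbit=2 can be sketched as follows. This is a rough illustration under stated assumptions: `shift_quantize` is a hypothetical stand-in that rounds a positive value to the nearest power of two and is not the shift-quantization formula of this disclosure, and the function name `quantize_kernel` and the sample kernel are likewise invented for the example.

```python
import math


def shift_quantize(v):
    """Hypothetical stand-in: round a positive value to the nearest power
    of two (in the log domain). Not the formula used in the disclosure."""
    return 2.0 ** round(math.log2(v))


def quantize_kernel(weights, nbit):
    """Interval-based quantization of one kernel for nbit = 1 or nbit = 2."""
    w_mean = sum(abs(w) for w in weights) / len(weights)  # Wmean = mean(abs(Wk))
    level = shift_quantize(w_mean)
    out = []
    for w in weights:
        if nbit == 1:
            # Two intervals: [0, inf) and (-inf, 0).
            out.append(level if w >= 0 else -level)
        elif nbit == 2:
            # Three intervals split at -Wmean/4 and +Wmean/4.
            if abs(w) <= w_mean / 4:
                out.append(0.0)
            else:
                out.append(level if w > 0 else -level)
        else:
            raise ValueError("this sketch covers nbit = 1 or 2 only")
    return out


# Illustrative 3x3 kernel, flattened.
kernel = [0.9, -0.05, 0.3, -0.6, 0.02, 0.4, -0.7, 0.1, -0.2]
q1 = quantize_kernel(kernel, 1)
q2 = quantize_kernel(kernel, 2)
```

In this sketch every non-zero quantized weight in a kernel shares one magnitude, so a kernel reduces to a sign (or ternary) mask and a single level.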
In another non-limiting example, when nbit=3, the intervals may be defined, as shown in
WQ=0, when |Wk|≤W′mean/2;
WQ=(W′mean)shift-quantized, when W′mean/2<Wk<3W′mean/2;
WQ=(2W′mean)shift-quantized, when 3W′mean/2<Wk<3W′mean;
WQ=(4W′mean)shift-quantized, when Wk>3W′mean;
WQ=−(W′mean)shift-quantized, when −3W′mean/2<Wk<−W′mean/2;
WQ=−(2W′mean)shift-quantized, when −3W′mean<Wk<−3W′mean/2;
WQ=−(4W′mean)shift-quantized, when Wk<−3W′mean;
It is appreciated that other variations may also be possible. For example, Wmax may be used instead of Wmean. Denominators other than 2 or 4 may also be used.
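A compact way to read the nbit=3 case is as a mapping from each weight to one of seven signed multipliers of W′mean. The sketch below assumes the positive and negative intervals mirror each other about zero and assigns boundary values to the lower-magnitude level; the function name `three_bit_level` and the reference value `w_ref` (standing in for W′mean) are hypothetical. It returns only the multiplier, leaving the shift-quantization step itself out.

```python
def three_bit_level(w, w_ref):
    """Return the signed multiplier m in {0, +-1, +-2, +-4} for weight w,
    so the quantized weight would be the shift-quantized value of m * w_ref.
    w_ref stands in for W'mean; boundary handling is an assumption."""
    a = abs(w)
    if a <= w_ref / 2:
        m = 0
    elif a <= 3 * w_ref / 2:
        m = 1
    elif a <= 3 * w_ref:
        m = 2
    else:
        m = 4
    return m if w >= 0 else -m


# Illustrative weights and reference value.
levels = [three_bit_level(w, w_ref=0.2) for w in [0.05, 0.25, -0.5, 0.7, -0.9]]
```

The multipliers 1, 2 and 4 are powers of two, which is consistent with levels that can be produced by shifting.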
Alternatively, and/or additionally, quantizing the weights at 204 may also include compressed-fixed point conversion, where a weight value may be separated into a scalar and a mask, where W=scalar×mask. Here, a mask may include a k×k kernel and each value in the mask may have a bit-width, such as 1-bit, 2-bit, 3-bit, 5-bit, 8-bit or other bit sizes. In some examples, a quantized weight may be represented by a product of a mask and an associated scalar. The mask may be selected to maximize the bit size of the kernel, where the scalar may be a maximum common denominator among all of the weights. In a non-limiting example, when nbit=5 or above, scalar=min(abs(wk)) for all weights in kth kernel, and
The process 200 may further include determining a change of weights at 208 based on the output of the CNN model. In some examples, the output of the CNN model may be the output of the activation layer of the CNN. The process 200 may further update the weights of the CNN model at 210 based on the change of weights. In some examples, the process 200 may be implemented in a forward propagation and backward propagation framework. For example, the process 200 may perform operation 206 in a layer by layer fashion in a forward propagation, in which the inference of the AI model is propagated from the first convolution layer to the last convolution layer in a CNN (or a subset of the convolution layers in the CNN). The output inferred from the first layer will be fed to the second layer, the output inferred from the second layer will be fed to the third layer, so on and so forth until the output of the last layer is inferred.
In some examples, the operations 208 and 210 may be performed in a layer by layer fashion in a backward propagation, in which a change of weights is determined for each layer in a CNN from the last layer to the first layer (or a subset of the convolution layers in the CNN), and the weights in each layer are updated based on the change of weights. In some examples, a loss function may be determined based on the output of the CNN model (e.g., the output of the last convolution layer of the CNN), and the changes of weights may be determined based on the loss function. This is further explained below.
In some examples, the process 200 may repeat updating the weights of the CNN model in one or more iterations. In some examples, blocks 206, 208, 210 may be implemented using a gradient descent method, in which a suitable loss function may be used. In a non-limiting example, a loss function may be defined as:
where yi is the prediction of the network, e.g., the output of the CNN based on the ith training instance. In a non-limiting example, if the CNN output includes two image labels (e.g., dog or cat), then yi may have the value of 0 or 1. N is the number of training instances in the training data set. The probability p(yi) of a training instance being yi may be determined from the training. In other words, the loss function H( ) may be defined based on a sum of loss values over a plurality of training instances in the training data set, wherein the loss value of each of the plurality of training instances is a difference between an output of the CNN model for the training instance and a ground truth of the training instance.
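One loss that fits the quantities named here (labels of 0 or 1, a probability per training instance, and an average over N instances) is the binary cross-entropy. The sketch below uses it as an assumed example; the function name `binary_cross_entropy`, the clipping constant `eps`, and the sample values are not from the disclosure, and the disclosure's own loss may differ.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Average binary cross-entropy over N training instances.
    labels: 0 or 1 per instance; probs: predicted probability of label 1."""
    n = len(labels)
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n


# Illustrative labels (e.g., 1 = dog, 0 = cat) and predicted probabilities.
loss = binary_cross_entropy([1, 0, 1, 1], [0.9, 0.2, 0.8, 0.6])
```

The loss approaches zero as the predicted probabilities approach the ground-truth labels, which is the behavior a gradient descent step tries to drive.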
In a non-limiting example, the training data 209 may include a plurality of training input images. The ground truth data may include information about one or more objects in the image, or about whether the image contains a class of objects, such as a cat, a dog, a human face, or a given person's face. Inferring the AI model may include generating a recognition result indicating which class to which the input image belongs. In the training process, such as 200, the loss function may be determined based on the image labels in the ground truth and the recognition result generated from the AI chip based on the training input image.
In some examples, the gradient descent may be used to determine a change of weight
ΔW=ƒ(WQt)
by minimizing the loss function H( ), where WQt stands for the quantized weights at time t. The process may update the weight from a previous iteration based on the change of weight, e.g., Wt+1=Wt+ΔW, where Wt and Wt+1 stand for the weights in a preceding iteration and the weights in the current iteration, respectively. In some examples, the weights (or updated weights) in each iteration, such as Wt and Wt+1, may be stored in floating point. The quantized weights WQt at each iteration t may be stored in fixed point. In some examples, the gradient descent may include known methods, such as a stochastic gradient descent method.
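The interplay between floating-point weights and quantized weights can be sketched as a small training loop. This is a toy illustration under several assumptions: a linear model with squared error in place of a CNN, a 1-bit quantizer (`quantize`), a fixed learning rate, and a gradient evaluated at the quantized weights and applied directly to the floating-point weights. All names and values are hypothetical.

```python
def quantize(weights):
    """1-bit stand-in for Q(W): sign(w) times the mean absolute weight."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    return [mean_abs if w >= 0 else -mean_abs for w in weights]


def train(weights, data, lr=0.05, steps=200):
    """Keep floating-point weights W_t, run the forward pass with the
    quantized weights WQ_t = Q(W_t), and apply W_{t+1} = W_t + dW,
    where dW = -lr * (gradient of the loss evaluated at WQ_t)."""
    w = list(weights)
    for _ in range(steps):
        wq = quantize(w)
        grad = [0.0] * len(w)
        for x, target in data:
            err = sum(q * xi for q, xi in zip(wq, x)) - target
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w, quantize(w)


# Toy data generated by the weights [0.5, -0.5].
data = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.5), ([1.0, 1.0], 0.0)]
w_float, w_fixed = train([0.1, 0.1], data)
```

Keeping `w` in floating point lets small updates accumulate across iterations even when a single step is too small to change the quantized value.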
With further reference to
In each iteration, the process 200 may determine whether a stopping criteria has been met at 214. If the stopping criteria has been met, the process may store the updated weights of the CNN model at the current iteration at 216 for use by another unit (e.g., a unit in 101 in
In some examples, the process 200 may be implemented entirely on a desktop using a CPU or a GPU. Alternatively, certain operations in the process 200 may be implemented in a physical AI chip, where the trained weights or updated weights are uploaded inside the AI chip.
Similar to
With further reference to
In some examples, the process 600 may include accessing the input of a first convolution layer at 602 and determining the output of the first convolution layer at 604. For example, the first convolution layer may be any of the convolution layers in a CNN model that corresponds to a convolution layer, e.g. 502, 504, 506 . . . in an AI chip. The output of the convolution may be stored in floating point. Accessing the input of the first convolution layer at 602 may include accessing the input data, if the first convolution layer is the first layer after the input in the CNN, or accessing the output of the preceding layer, if the first convolution layer is an intermediate layer. Determining the output of the first convolution layer at 604 may include executing a CNN model to produce an output at the first convolution layer. In a training process, determining the output of the convolution layer may be performed outside of a chip, e.g., in a CPU/GPU environment. Alternatively, determining the output of the convolution layer may be performed in an AI chip.
With further reference to
Returning to
Here, a value of [0, α] may be represented by a maximum number of bits in the activation layer, e.g., 5-bit, 10-bit, or other values. If a weight is in the range of [0, α], then the quantization becomes a linear transformation. If a weight has a value of less than zero or a value of greater than α, then the quantization clips the weight at zero or α, respectively. Here, the quantization of the activation layer limits the value of the output to the same limit in the hardware. In a non-limiting example, if the bit-width of an activation layer in an AI chip is 5 bits, then [0, α] may be represented by 5 bits. Accordingly, the quantized value will be represented by 5 bits.
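The clipping and quantization behavior described here can be sketched as below. The uniform grid with 2^nbit − 1 steps over [0, α], the rounding rule, and the function name `quantize_activation` are assumptions for illustration rather than the disclosure's exact formula.

```python
def quantize_activation(y, alpha, nbit):
    """Clip y to [0, alpha], then map it to an integer code in
    [0, 2**nbit - 1] and back to the nearest representable value."""
    levels = (1 << nbit) - 1                # e.g. 31 for a 5-bit activation
    clipped = min(max(y, 0.0), alpha)       # values outside [0, alpha] are clipped
    code = round(clipped * levels / alpha)  # integer code stored in nbit bits
    return code, code * alpha / levels


# Illustrative outputs for a 5-bit activation layer with alpha = 4.0.
results = [quantize_activation(v, alpha=4.0, nbit=5) for v in [-1.0, 1.0, 4.0, 9.0]]
codes = [code for code, _ in results]
```

Negative inputs map to code 0 and inputs above α map to the top code, so every output fits in the activation bit-width.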
With further reference to
Blocks 610 and 612 may perform in a similar fashion as blocks 604 and 606. Further, the process 600 may repeat blocks 608-612 for one or more additional layers at 614. In some examples, the process 600 may quantize the output for all convolution layers in a CNN in a layer-by-layer fashion. In some examples, the process 600 may quantize the output of some convolution layers in a CNN model. For example, the process 600 may quantize the output of the last few convolution layers in the CNN.
Returning to
where yi is the prediction of the network, e.g., the output of the CNN based on the ith training instance. In a non-limiting example, if the CNN output includes two image labels (e.g., dog or cat), then yi may have the value of 0 or 1. N is the number of training instances in the training data set. The probability p(yi) of a training instance being yi may be determined from the training. In other words, the loss function H( ) may be defined based on a sum of loss values over a plurality of training instances in the training data set, wherein the loss value of each of the plurality of training instances is a difference between an output of the CNN model for the training instance and a ground truth of the training instance.
In some examples, the gradient descent may be used to determine a change of weights
ΔW=ƒ(WQt)
by minimizing the loss function H( ), where WQt stands for the quantized weights at time t. In other words, WQt=Q(Wt). The process may update the weight from a previous iteration based on the change of weight, e.g., Wt+1=Wt+ΔW, where Wt and Wt+1 stand for the weights in a preceding iteration and the weights in the current iteration, respectively. In some examples, the weights (or updated weights) in each iteration, such as Wt and Wt+1, may be stored in floating point. The quantized weights WQt at each iteration t may be stored in fixed point. In some examples, the gradient descent may include known methods, such as a stochastic gradient descent method.
With further reference to
In some examples, performing batch normalization merge at 704 may include updating the weights and biases of the CNN model by merging the batch normalization into the convolution layers such that the input values of a convolution layer Y=W*X+b are effectively normalized to Y′=W′*X+b′, where W′ and b′ are updated weights and biases. In some examples, a batch normalization may be expressed as:
where the mean and std are the average and standard deviations of the input values (or output values of previous layers) for each batch of images X. Here, γ and β may be learned from the training process. Accordingly, the weights and biases may be updated based on:
In some examples, the weights and biases may be updated per convolution layer. The updating of weights and biases may be performed independently between layers. A batch refers to a data batch, such as a plurality of images. Average values and standard deviations may be determined over the plurality of images in each batch. The values for γ and β are learned during the gradient descent training, independently from the weights and biases of the AI model. A batch normalization may normalize the inputs of each layer to the same range of values. This may help speed up the training process (to converge faster). For example, batch normalization may prevent early saturation of non-linear activation functions. The batch normalization merge at 704 essentially merges the batch normalization parameters into the convolution layer of an AI model. This reduces memory usage on the chip, and increases inference speed when running the AI model on the chip.
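A batch normalization merge of this kind can be sketched with the commonly used folding rule, shown here for a single dot-product "layer". The sketch assumes the normalization γ·(y − mean)/std + β is applied to the layer output y = W·X + b, which gives W′ = γ·W/std and b′ = γ·(b − mean)/std + β. The function names and numeric values are hypothetical, and the disclosure's own update equations may be written differently.

```python
def layer(w, b, x):
    """A single output of a layer, written as a dot product plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fold_batch_norm(w, b, gamma, beta, mean, std):
    """Merge batch normalization into the layer's weights and bias:
    W' = gamma * W / std,  b' = gamma * (b - mean) / std + beta."""
    scale = gamma / std
    return [wi * scale for wi in w], (b - mean) * scale + beta


# Illustrative parameters.
w, b = [0.5, -1.0, 2.0], 0.3
gamma, beta, mean, std = 1.5, 0.1, 0.4, 2.0
x = [1.0, 2.0, 3.0]

unfused = gamma * (layer(w, b, x) - mean) / std + beta  # layer followed by BN
w_f, b_f = fold_batch_norm(w, b, gamma, beta, mean, std)
fused = layer(w_f, b_f, x)                              # merged layer only
```

After the merge, γ, β, mean and std no longer need to be stored or applied at inference time, which matches the stated memory and speed benefit.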
With further reference to
W′=W
b′=b−W*mean
In some examples, the updating of weights and biases in the image mean merge may be performed for the first convolution layer, which is connected to an image layer at the input. As shown, the image mean merge makes sure the input pixel values of the training images are within the pixel value range, e.g., [0, 255]. Further, the image mean is used during the training to adjust the input image pixel range to be balanced around the value of zero to facilitate training convergence.
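The update W′=W, b′=b−W*mean can be checked numerically: applying the original layer to mean-subtracted pixels gives the same result as applying the merged layer to raw pixels. The sketch below uses a dot-product layer and a single constant mean for all pixels; the function names and values are illustrative assumptions.

```python
def layer(w, b, x):
    """A single output of the first layer, written as a dot product plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def merge_image_mean(w, b, mean):
    """Image mean merge: W' = W, b' = b - W * mean (constant mean per pixel)."""
    return list(w), b - sum(wi * mean for wi in w)


# Illustrative weights, bias, image mean and raw pixel values in [0, 255].
w, b, mean = [0.25, -0.5, 1.0], 2.0, 128.0
pixels = [0.0, 64.0, 255.0]

reference = layer(w, b, [p - mean for p in pixels])  # subtract mean, then layer
w_m, b_m = merge_image_mean(w, b, mean)
merged = layer(w_m, b_m, pixels)                     # raw pixels, merged bias
```

With the mean folded into the bias, the chip can consume raw pixel values directly while the model behaves as it did during zero-centered training.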
With further reference to
In some examples, the updating of weights and biases in the image scale merge may be performed for the first convolution layer, which is connected to an image layer at the input. As shown, the image scale merge gives the effect of adjusting the input image to take values to take full advantage of the size of the input image channel in the AI chip. For example, if the pixel values of the image are above the maximum value allowed in the AI chip, the image scale merge gives the effect of scaling down the image values, or normalizing the image values to within the maximum allowed range of the input image in the AI chip. Conversely, if the pixel values of the image are in a small range, the image scale merge gives the effect of scaling up the image values, or normalizing the values to take full advantage of the maximum allowed range of the input image in the AI chip.
With further reference to
In some examples, the maximum output value of the ith layer αi may be statistically determined from the multiple images in a batch. For example, αi and αi−1 may each represent the statistical maximum output value of the ith layer and its preceding layer, the (i−1)th layer, respectively. Then, the weights and biases of the AI model may be updated as:
where nbit is the maximum bit-size of the output value of each layer. In the above example, the quantized value will be in the range of [0, α] represented in nbit, such as 5-bit. Then the quantization grid is α/(2^nbit−1)=α/31. After output equalization, the quantized value will be in the range of [0, 31], with an effective quantization grid of 31/31 (=1.0). In other words, the output equalization causes the quantization grid to be 1, which is feasible in the AI chip hardware.
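One way to obtain a quantization grid of 1 is to rescale each layer so that its input range [0, αi−1] and output range [0, αi] both map to [0, 2^nbit − 1]. Working that through for Y=W*X+b gives W′ = W·αi−1/αi and b′ = b·(2^nbit − 1)/αi. The sketch below verifies this derivation on a dot-product layer; it is an assumed reconstruction from the stated goal, not a copy of the disclosure's update equations, and the names and values are hypothetical.

```python
def layer(w, b, x):
    """A single output of a layer, written as a dot product plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def equalize(w, b, alpha_prev, alpha_cur, nbit):
    """Rescale weights and bias so inputs and outputs both live on [0, 2**nbit - 1]."""
    levels = (1 << nbit) - 1
    w_eq = [wi * alpha_prev / alpha_cur for wi in w]
    b_eq = b * levels / alpha_cur
    return w_eq, b_eq


# Illustrative layer: inputs in [0, 8], outputs in [0, 4], 5-bit activations.
nbit, alpha_prev, alpha_cur = 5, 8.0, 4.0
levels = (1 << nbit) - 1
w, b, x = [0.5, 0.25], 0.5, [2.0, 4.0]

y = layer(w, b, x)                                # original output
x_eq = [xi * levels / alpha_prev for xi in x]     # inputs rescaled to [0, 31]
w_eq, b_eq = equalize(w, b, alpha_prev, alpha_cur, nbit)
y_eq = layer(w_eq, b_eq, x_eq)                    # equalized output on [0, 31]
```

The equalized output equals the original output multiplied by (2^nbit − 1)/αi, so rounding it to an integer is the same as quantizing the original output on a grid of αi/(2^nbit − 1).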
The various linear transformation operations in
With further reference to
The process 800 may further perform shift-based quantization on the accessed weights at 804. Shift-based quantization may mimic the characteristics of a hardware chip because shift registers are commonly available inside a chip. In some examples, the weights and biases are updated based on a shift value. The shift value may be an integer. For example,
where WQ and bQ are the quantized weights and biases, and nbit represents the maximum allowed value in the physical AI chip. In some examples, the weights and biases are updated for one or more convolution layers in a CNN model.
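Shift-based quantization with an integer shift value can be sketched as follows. The particular rule used here, which picks the largest shift such that the scaled weights fit in a signed nbit integer and then rounds, is an assumption for illustration; the disclosure's own formula is not reproduced, and the function name `shift_quantize` is hypothetical.

```python
import math


def shift_quantize(values, nbit):
    """Quantize values to integers times 2**-shift, where the integers fit
    in a signed nbit range. The choice of shift is an assumed rule."""
    max_abs = max(abs(v) for v in values)
    limit = (1 << (nbit - 1)) - 1                   # e.g. 127 for nbit = 8
    shift = math.floor(math.log2(limit / max_abs))  # integer shift value
    quantized = [round(v * 2 ** shift) / 2 ** shift for v in values]
    return shift, quantized


# Illustrative weights quantized for an 8-bit target.
shift, wq = shift_quantize([0.8, -0.33, 0.05, -1.2], nbit=8)
```

Because the scale factor is a power of two, multiplying or dividing by it corresponds to a bit shift, which is the property that lets the training mimic shift registers in hardware.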
With further reference to
Determining the change of weights at 810 and updating the weights at 812 may include a similar training process as in
where yi is the prediction of the network, e.g., the output of the CNN based on the ith training instance. In a non-limiting example, if the CNN output includes two image labels (e.g., dog or cat), then yi may have the value of 0 or 1. N is the number of training instances in the training data set. The probability p(yi) of a training instance being yi may be determined from the training. In other words, the loss function H( ) may be defined based on a sum of loss values over a plurality of training instances in the training data set, wherein the loss value of each of the plurality of training instances is a difference between an output of the CNN model for the training instance and a ground truth of the training instance.
In some examples, the gradient descent may be used to determine a change of weights
ΔW=ƒ(WQt)
by minimizing the loss function H( ), where WQt stands for the quantized weights at time t. In other words, WQt=Q(Wt). The process may update the weight from a previous iteration based on the change of weight, e.g., Wt+1=Wt+ΔW, where Wt and Wt+1 stand for the weights in a preceding iteration and the weights in the current iteration, respectively. In some examples, the weights (or updated weights) in each iteration, such as Wt and Wt+1, may be stored in floating point. The quantized weights WQt at each iteration t may be stored in fixed point. In some examples, the gradient descent may include known methods, such as stochastic gradient descent method.
The stopping criteria may be defined in a similar fashion as in
In some examples, the process 800 may be implemented entirely on a desktop using a CPU or a GPU. Alternatively, certain operations in the process 800 may be implemented in a physical AI chip, where the trained weights or updated weights are uploaded inside the AI chip. Once the stopping criteria is met at 814, the process 800 may store the updated weights at 816.
Returning to
In an example application, an AI chip may be installed in a camera and store the trained weights and/or other parameters of the CNN model, such as those trained/quantized/updated weights generated in any of the units in the training network 101 (in
It is appreciated that the disclosures of various embodiments in
An optional display interface 930 may permit information from the bus 900 to be displayed on a display device 935 in visual, graphic, or alphanumeric format. An audio interface and audio output (such as a speaker) also may be provided. Communication with external devices may occur using various communication ports 940 such as a transmitter and/or receiver, antenna, an RFID tag and/or short-range, or near-field communication circuitry. A communication port 940 may be attached to a communications network, such as the Internet, a local area network, or a cellular telephone data network.
The hardware may also include a user interface sensor 945 that allows for receipt of data from input devices 950 such as a keyboard, a mouse, a joystick, a touchscreen, a remote control, a pointing device, a video input device, and/or an audio input device, such as a microphone. Digital image frames may also be received from an imaging capturing device 955 such as a video or camera that can either be built-in or external to the system. Other environmental sensors 960, such as a GPS system and/or a temperature sensor, may be installed on the system and communicatively accessible by the processor 905, either directly or via the communication ports 940. The communication ports 940 may also communicate with the AI chip to upload or retrieve data to/from the chip. For example, a trained AI model with updated quantized weights obtained from the training system 100 (
Optionally, the hardware may not need to include a memory, but instead programming instructions are run on one or more virtual machines or one or more containers on a cloud. For example, the various methods illustrated above may be implemented by a server on a cloud that includes multiple virtual machines, each virtual machine having an operating system, a virtual disk, virtual network and applications, and the programming instructions for implementing various functions in the robotic system may be stored on one or more of those virtual machines on the cloud.
Various embodiments described above may be implemented and adapted to various applications. For example, the AI chip having a CNN architecture may be residing in an electronic mobile device. The electronic mobile device may use the built-in AI chip to produce recognition results and generate performance values. In some scenarios, training the CNN model can be performed in the mobile device itself, where the mobile device retrieves training data from a dataset and uses the built-in AI chip to perform the training. In other scenarios, the processing device may be a server device in the communication network (e.g., 102 in
The various systems and methods disclosed in this patent document provide advantages over the prior art, whether implemented standalone or combined. For example, using the systems and methods described in
It will be readily understood that the components of the present solution as generally described herein and illustrated in the appended figures could be arranged and designed in a wide variety of different configurations. Thus, the detailed description of various implementations, as represented herein and in the figures, is not intended to limit the scope of the present disclosure, but is merely representative of various implementations. While the various aspects of the present solution are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The present solution may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the present solution is, therefore, indicated by the appended claims rather than by this detailed description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Reference throughout this specification to features, advantages, or similar language does not imply that all of the features and advantages that may be realized with the present solution should be or are in any single embodiment thereof. Rather, language referring to the features and advantages is understood to mean that a specific feature, advantage, or characteristic described in connection with an embodiment is included in at least one embodiment of the present solution. Thus, discussions of the features and advantages, and similar language, throughout the specification may, but do not necessarily, refer to the same embodiment.
Furthermore, the described features, advantages, and characteristics of the present solution may be combined in any suitable manner in one or more embodiments. One ordinarily skilled in the relevant art will recognize, in light of the description herein, that the present solution can be practiced without one or more of the specific features or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments of the present solution.
Other advantages can be apparent to those skilled in the art from the foregoing specification. Accordingly, it will be recognized by those skilled in the art that changes, modifications, or combinations may be made to the above-described embodiments without departing from the broad inventive concepts of the invention. It should therefore be understood that the present solution is not limited to the particular embodiments described herein, but is intended to include all changes, modifications, and all combinations of various embodiments that are within the scope and spirit of the invention as defined in the claims.
Claims
1. A system comprising:
- a processor; and
- a non-transitory computer readable medium containing programming instructions that, when executed, will cause the processor to: access weights of a convolution neural network (CNN) model; perform one or more linear transformations on the weights; perform an output equalization operation to update the weights of the CNN model to cause output of one or more convolution layers of the CNN model to be equalized; and upload the updated weights of the CNN model to an artificial intelligence (AI) chip capable of executing an AI task.
2. The system of claim 1, wherein the programming instructions for performing one or more linear transformations comprise programming instructions configured to perform one or more operations of:
- performing a batch normalization merge;
- performing an image mean merge; or
- performing an image scale merge.
3. The system of claim 1, wherein the programming instructions for performing the output equalization operation comprise programming instructions configured to:
- based on a training image set, use the weights of the CNN model to determine a first maximum output value of a first convolution layer of the CNN model and determine a second maximum output value of a second convolution layer of the CNN model; and
- update the weights of the second layer of the CNN model based on the first and second maximum output values.
4. The system of claim 3, wherein each convolution layer of the CNN model comprises a bias value, and the programming instructions for performing the gain edition operation also comprise programming instructions configured to:
- update the bias value of the second convolution layer based on the second maximum value and a bit-size of the second convolution layer corresponding to a convolution layer of the AI chip.
5. The system of claim 1 further comprising additional programming instructions configured to, before uploading the updated weights to the AI chip, perform a fine-tuning operation comprising:
- accessing the updated weights of the CNN model from the output equalization operation;
- in one or more iterations until a stopping criteria is met, performing operations comprising: performing shift-based quantization to quantize the updated weights of the CNN model; determining output of the AI model based on the quantized weights of the CNN model and a training data set; determining a change of weights based on the output of the CNN model; and updating the weights of the CNN model based on the change of weights.
6. The system of claim 5, wherein the programming instructions for performing the shift-based quantization comprise programming instructions configured to:
- determine a shift value for a convolution layer of the CNN model corresponding to a convolution layer of the AI chip, based on a bit-size of the corresponding convolution layer of the AI chip and a maximum value of the weights of the CNN model; and
- quantize the updated weights of the convolution layer of the CNN model based on the determined shift value.
7. The system of claim 5, wherein the programming instructions for determining the change of weights of the CNN model comprise using a gradient descent method, wherein a loss function in the gradient descent method is based on a sum of loss values over a plurality of training instances in the training data set, wherein the loss value of each of the plurality of training instances is a difference between the quantized output of the CNN model for the training instance and a ground truth of the training instance.
8. The system of claim 1 further comprising additional programming instructions configured to cause the AI chip to:
- perform the AI task to generate output of the AI task; and
- present the output of the AI task on an output device;
- wherein the updated weights of the CNN model are uploaded into an embedded cellular neural network architecture in the AI chip.
9. A method comprising, at a processing device:
- accessing weights of a convolution neural network (CNN) model;
- performing one or more linear transformations on the weights;
- performing an output equalization operation to update the weights of the CNN model to cause output of one or more convolution layers of the CNN model to be equalized; and
- uploading the updated weights of the CNN model to an artificial intelligence (AI) system comprising an AI chip capable of executing an AI task; and
- at the AI system: causing the AI chip to perform the AI task to generate output of the AI task; and presenting the output of the AI task on an output device;
- wherein the updated weights of the CNN model are uploaded into an embedded cellular neural network architecture in the AI chip.
10. The method of claim 9, wherein performing the linear transformations comprises performing one or more operations comprising:
- performing a batch normalization merge;
- performing an image mean merge; and
- performing an image scale merge.
11. The method of claim 9, wherein performing the output equalization operation comprises:
- based on a training image set, using the weights of the CNN model to determine a first maximum output value of a first convolution layer of the CNN model and determine a second maximum output value of a second convolution layer of the CNN model; and
- updating the weights of the second convolution layer of the CNN model based on the first and second maximum output values.
12. The method of claim 11, wherein each convolution layer of the CNN model comprises a bias value, and performing the gain edition operation also comprises:
- updating the bias value of the second convolution layer based on the second maximum value and a bit-size of the second convolution layer corresponding to a convolution layer of the AI chip.
13. The method of claim 9 further comprising, before uploading the updated weights to the AI chip:
- accessing the updated weights of the CNN model from the output equalization operation; and
- in one or more iterations until a stopping criteria is met, performing operations comprising: performing shift-based quantization to quantize the updated weights of the CNN model; determining output of the AI model based on the quantized weights of the CNN model and a training data set; determining a change of weights based on the output of the CNN model; and updating the weights of the CNN model based on the change of weights.
14. The method of claim 13, wherein performing the shift-based quantization comprises:
- determining a shift value for a convolution layer of the CNN model corresponding to a convolution layer of the AI chip, based on a bit-size of the corresponding convolution layer of the AI chip and a maximum value of the weights of the CNN model; and
- quantizing weights of the convolution layer of the CNN model based on the determined shift value.
15. The method of claim 13, wherein determining the change of weights comprises using a gradient descent method, wherein a loss function in the gradient descent method is based on a sum of loss values over a plurality of training instances in the training data set, wherein the loss value of each of the plurality of training instances is a difference between the quantized output of the CNN model for the training instance and a ground truth of the training instance.
16. A method comprising, at a processing device:
- accessing weights of a convolution neural network (CNN) model;
- performing one or more linear transformations on the weights, the linear transformations comprising performing one or more operations comprising: performing a batch normalization merge; performing an image mean merge; and performing an image scale merge;
- performing an output equalization operation to update the weights of the CNN model by updating the weights of a first convolution layer of the CNN model based on statistical output values of the first convolution layer and a second convolution layer preceding the first convolution layer based on a training image set; and
- uploading the updated weights of the CNN model to an artificial intelligence (AI) system comprising an AI chip capable of executing an AI task.
17. The method of claim 16 further comprising, at the AI system:
- causing the AI chip to perform the AI task to generate output of the AI task; and
- presenting the output of the AI task on an output device;
- wherein the updated weights of the CNN model are uploaded into an embedded cellular neural network architecture in the AI chip.
18. The method of claim 16, wherein performing the output equalization operation comprises:
- based on the training image set, using the weights of the CNN model to determine a first maximum output value of the first convolution layer and determine a second maximum output value of the second convolution layer; and
- updating the weights of the first convolution layer of the CNN model based on the first and second maximum output values.
19. The method of claim 18, wherein each convolution layer of the CNN model comprises a bias value, and performing the output equalization operation also comprises:
- updating the bias value of the second convolution layer based on the second maximum value and a bit-size of the second convolution layer corresponding to a convolution layer of the AI chip.
20. The method of claim 16 further comprising, before uploading the updated weights to the AI chip:
- accessing the updated weights of the CNN model from the output equalization operation; and
- in one or more iterations until a stopping criteria is met, performing operations comprising: performing shift-based quantization on the weights of the CNN model; determining output of the AI model based on the updated weights of the CNN model and a training data set; determining a change of weights based on the output of the CNN model; and updating the weights of the CNN model based on the change of weights.
Type: Application
Filed: Sep 27, 2019
Publication Date: Sep 24, 2020
Applicant: Gyrfalcon Technology Inc. (Milpitas, CA)
Inventors: Yongxiong Ren (San Jose, CA), Yi Fan (Fremont, CA), Yequn Zhang (San Jose, CA), Tianran Chen (San Jose, CA), Yinbo Shi (Santa Clara, CA), Xiaochun Li (San Ramon, CA), Lin Yang (Milpitas, CA)
Application Number: 16/586,432