NEURAL NETWORK DEVICES AND METHODS OF OPERATING THE SAME
A neural network device may generate an input feature list based on an input feature map, where the input feature list includes an input feature index and an input feature value, generating an output feature index based on the input feature index corresponding to an input feature included in the input feature list and a weight index corresponding to a weight included in a weight list, and generating an output feature value corresponding to the output feature index based on the input feature value corresponding to the input feature and a weight value corresponding to the weight.
This application is a continuation of U.S. application Ser. No. 15/864,379, dated Jan. 8, 2018, which claims the benefit, under 35 U.S.C. § 119, of Korean Patent Application No. 10-2017-0027778, filed on Mar. 3, 2017, in the Korean Intellectual Property Office, the disclosure of each of which is incorporated herein in its entirety by reference.
BACKGROUND
The inventive concepts relate to semiconductor devices, and more particularly, to neural network devices configured to perform operations based on one or more indexes and one or more methods of operating the same.
A neural network refers to a computational architecture which is a model of a biological brain. As neural network technology has recently been developed, there has been a lot of research into analyzing input data and extracting valid information using neural network devices in various types of electronic systems.
Neural network devices may perform a relatively large quantity of operations (“neural network operations”) with regard to complex input data. Efficient processing of neural network operations is desired for a neural network device to analyze high-definition input and extract information in real time.
SUMMARY
The inventive concepts provide a neural network device for increasing an operating speed and reducing power consumption and a method of operating the same.
According to some example embodiments, a method of operating a neural network device may include generating an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value, the input feature index and the input feature value corresponding to an input feature; generating an output feature index based on a first operation on the input feature index and a weight index of a weight list; and generating an output feature value corresponding to the output feature index based on a second operation on the input feature value and a weight value corresponding to the weight index.
According to some other example embodiments, a method of operating a neural network device may include generating an input feature list, the input feature list including an input feature index and an input feature value corresponding to an input feature having a non-zero value, the input feature index indicating a location of the input feature on an input feature map; generating an output feature index based on an index operation on the input feature index; and generating an output feature value corresponding to the output feature index based on a data operation on the input feature value.
According to some example embodiments, a neural network device may include a first memory storing a program of instructions; and a processor. The processor may be configured to execute the program of instructions to perform an index operation based on an input feature index, the input feature index indicating a location of an input feature on an input feature map, generate an output feature index based on an index operation result of the index operation, perform a data operation based on an input feature value of the input feature, and generate an output feature value corresponding to the output feature index based on a data operation result of the data operation.
According to some example embodiments, a method may include generating, using an index remapper of a processor, an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value, the input feature index and the input feature value corresponding to an input feature; and causing an index remapper to perform a first operation to generate an output feature index. The first operation may include adding the input feature index and a weight index of a weight list, dividing an added-up value resulting from the adding by an integer, and selecting a quotient of the dividing as an output feature index based on a determination that no remainder is present upon completion of the dividing.
Example embodiments of the inventive concepts will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
An electronic system 100 may analyze input data in real time based on a neural network, extract valid information, and determine a situation or control the elements of an electronic device mounted on the electronic system 100 based on the extracted information. The electronic system 100 may be used in a drone, a robotic device such as an advanced driver assistance system (ADAS), a smart television (TV), a smart phone, a medical device, a mobile device, an image display device, a measuring device, and an internet of things (IoT) device. The electronic system 100 may be mounted on any one of other various electronic devices.
Referring to
The CPU 110 controls overall operations of the electronic system 100. The CPU 110 may include a single core processor or a multi-core processor. The CPU 110 may process or execute programs and/or data stored in the memory 140. For example, the CPU 110 may control the function of the neural network device 130 by executing programs (“one or more programs of instructions”) stored in the memory 140 to implement some or all of the operations described herein.
The RAM 120 may temporarily store programs, data, or instructions. Programs and/or data stored in the memory 140 may be temporarily stored in the RAM 120 according to the control of the CPU 110 or booting code. The RAM 120 may be implemented as dynamic RAM (DRAM) or static RAM (SRAM).
The neural network device 130 may perform a neural network operation based on input data and may generate an information signal based on a result of the operation (“the neural network operation”). Neural networks may include convolutional neural networks (CNN), recurrent neural networks (RNN), deep belief networks, and restricted Boltzmann machines but are not limited thereto.
The information signal may include one among various kinds of recognition signals such as a voice recognition signal, a thing recognition signal, an image recognition signal, and a biometric recognition signal. The neural network device 130 may receive frame data included in a video stream as input data and may generate a recognition signal with respect to a thing, which is included in an image represented by the frame data, from the frame data. However, the inventive concepts are not limited thereto. The neural network device 130 may receive various kinds (“types”) of input data according to the type or function of an electronic device on which the electronic system 100 is mounted and may generate a recognition signal according to the input data. An example of a neural network architecture will be briefly described with reference to
Each of the first through third layers 11, 12, and 13 may receive input data or a feature map generated in a previous layer as an input feature map and may generate an output feature map or a recognition signal REC by performing an operation on the input feature map. At this time, the feature map is data which represents various features of input data. Feature maps FM1, FM2, and FM3 may have a form of a two-dimensional matrix or a form of a three-dimensional matrix. These feature maps FM1, FM2, and FM3 having a multi-dimensional matrix form may be referred to as feature tensors. The feature maps FM1, FM2, and FM3 have a width (or a column) W, a height (or a row) H, and a depth D, which may respectively correspond to the x-axis, the y-axis, and the z-axis in a coordinate system. The depth D may be referred to as the number of channels.
A location on the xy-plane of a feature map may be referred to as a spatial location. A location on the z-axis of the feature map may be referred to as a channel. A size on the xy-plane of the feature map may be referred to as a spatial size.
The first layer 11 may perform a convolution of the first feature map FM1 and a weight map WM to generate the second feature map FM2. The weight map WM may filter the first feature map FM1 and may be referred to as a filter or a kernel. The depth, i.e., the number of channels of the weight map WM, may be the same as the depth, i.e., the number of channels of the first feature map FM1. The convolution may be performed on the same channels in both the weight map WM and the first feature map FM1. The weight map WM shifts by traversing the first feature map FM1 as a sliding window. The amount of shift may be referred to as a “stride length” or a “stride”. During a shift, each weight included in the weight map WM may be multiplied by and added to all feature values in an area where the weight map WM overlaps the first feature map FM1. One channel of the second feature map FM2 may be generated by performing a convolution of the first feature map FM1 and the weight map WM. Although only one weight map WM is shown in
The second layer 12 may perform pooling to generate the third feature map FM3. The pooling may be referred to as sampling or downsampling. A two-dimensional pooling window PW may be shifted on the second feature map FM2 and a maximum value among feature values (or an average of the feature values) in an area where the pooling window PW overlaps the second feature map FM2 may be selected, so that the third feature map FM3 may be generated from the second feature map FM2. The number of channels of the third feature map FM3 may be the same as the number of channels of the second feature map FM2.
In some example embodiments, the pooling window PW may be shifted on the second feature map FM2 by a unit of the size of the pooling window PW. The amount of shift, i.e., the stride of the pooling window PW, may be the same as the length of the pooling window PW. Accordingly, the spatial size of the third feature map FM3 may be smaller than that of the second feature map FM2. However, the inventive concepts are not limited thereto. The spatial size of the third feature map FM3 may be the same as or larger than that of the second feature map FM2. The spatial size of the third feature map FM3 may be determined according to the size of the pooling window PW, a stride length, and whether zero-padding is performed or not.
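The pooling described above can be illustrated with a minimal Python sketch. It assumes a single channel, a square pooling window whose stride equals its length, max pooling rather than averaging, and no zero-padding; the function name `max_pool` and the sample values are hypothetical and are not taken from the disclosure.

```python
def max_pool(feature_map, window):
    """Max-pool one channel with a square window whose stride equals its length."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - window + 1, window):
        pooled_row = []
        for c in range(0, cols - window + 1, window):
            # Select the maximum feature value in the area the window overlaps.
            pooled_row.append(max(
                feature_map[r + dr][c + dc]
                for dr in range(window)
                for dc in range(window)
            ))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 channel of a second feature map.
fm2 = [[1, 3, 2, 0],
       [4, 0, 1, 5],
       [0, 2, 7, 1],
       [6, 0, 0, 3]]
fm3 = max_pool(fm2, 2)  # 2x2 result: [[4, 5], [6, 7]]
```

Under these assumptions, the spatial size is halved in each direction while the number of channels is unchanged, which matches the case in which the third feature map is smaller than the second.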
The third layer 13 may combine features of the third feature map FM3 and categorize a class CL of the input data. The third layer 13 may also generate the recognition signal REC corresponding to the class CL. The input data may correspond to frame data included in a video stream. At this time, the third layer 13 may extract a class corresponding to a thing included in an image represented by the frame data based on the third feature map FM3 provided from the second layer 12, recognize the thing, and generate the recognition signal REC corresponding to the thing.
In a neural network, low-level layers, e.g., convolution layers, may extract low-level features (e.g., an edge or gradient of a face image) from input data or an input feature map and high-level layers, e.g., fully-connected layers, may extract or detect high-level features, i.e., classes (e.g., eyes and a nose of the face image) from the input feature map.
Referring to
The neural network device 130 may perform an operation corresponding to at least one of a plurality of layers of a neural network described above with reference to
As shown in
An index-based neural network operation may include an index operation. The index operation is an operation performed on each input feature index in an input feature list and an index of a different parameter. The index operation may be referred to as index remapping. When the index operation is performed, a data operation, i.e., an operation on an input feature value, may be simplified or skipped.
As shown in
Meanwhile, a weight map used in a convolution operation may be converted into a weight list and provided to the neural network device 130. The weight list may include an index and data which correspond to each weight having a non-zero value. To avoid confusion about terms, an index and data in an input feature list will be referred to as an input feature index and an input feature value and an index and data in a weight list will be referred to as a weight index and a weight value.
The neural network device 130 may perform a convolution operation on input features and weights, which have non-zero values, based on indices in an input feature list and indices in a weight list.
A zero value in a neural network operation does not influence the result of the operation. Accordingly, the neural network device 130 may generate an input feature list based on input features having non-zero values and perform an operation based on indices in the input feature list, so that the neural network device 130 may perform an operation on input features only having non-zero values. As a result, an operation on input features having the zero value may be skipped.
However, the inventive concepts are not limited thereto. An input feature list may also include an index and data which correspond to an input feature having the zero value. The neural network device 130 may generate the input feature list based on input features having either the zero value or a non-zero value and may perform an operation based on indices.
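As a minimal sketch of the list generation described above, the following Python code converts a two-dimensional feature matrix into a list of (index, value) pairs. The (row, column) tuple layout, the function name `to_feature_list`, the `keep_zeros` flag, and the sample values are assumptions made for illustration only.

```python
def to_feature_list(feature_map, keep_zeros=False):
    """Return [((row, col), value), ...] for one channel of a feature map.

    By default only non-zero features are listed; keep_zeros=True also
    lists features having the zero value.
    """
    return [
        ((r, c), value)
        for r, row in enumerate(feature_map)
        for c, value in enumerate(row)
        if keep_zeros or value != 0
    ]


# Hypothetical sparse 3x3 input feature map.
ifm = [[0, 5, 0],
       [0, 0, 0],
       [2, 0, 0]]
ifl = to_feature_list(ifm)  # [((0, 1), 5), ((2, 0), 2)]
```

With a sparse map, the list holds two entries instead of nine, so any later per-feature operation runs on two features only.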
Referring back to
The memory 140 may be DRAM but is not limited thereto. The memory 140 may include at least one among volatile memory and nonvolatile memory. The nonvolatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, phase-change RAM (PRAM), magnetic RAM (MRAM), resistive RAM (RRAM), and ferroelectric RAM (FeRAM). The volatile memory may include DRAM, SRAM, and synchronous DRAM (SDRAM). Alternatively, the memory 140 may include at least one among a hard disk drive (HDD), a solid state drive (SSD), compact flash (CF), secure digital (SD), micro-SD, mini-SD, extreme digital (xD), and a memory stick.
The sensor module 150 may collect surrounding information of an electronic device mounted on the electronic system 100. The sensor module 150 may sense or receive a signal (e.g., a video signal, an audio signal, a magnetic signal, a bio-signal, or a touch signal) from outside the electronic device and may convert the sensed or received signal into data. For this operation, the sensor module 150 may include at least one of various sensing devices such as a microphone, an image pickup device, an image sensor, a light detection and ranging (LIDAR) sensor, an ultrasonic sensor, an infrared sensor, a bio-sensor, and a touch sensor.
The sensor module 150 may provide the data to the neural network device 130 as input data. For example, the sensor module 150 may include an image sensor. At this time, the sensor module 150 may shoot an external circumstance of an electronic device, generate a video stream, and sequentially provide consecutive data frames of the video stream to the neural network device 130 as input data. However, the inventive concepts are not limited thereto. The sensor module 150 may provide various types of data to the neural network device 130.
The communication module 160 may include various types of wired or wireless interfaces which communicate with external devices. For example, the communication module 160 may include a communication interface which enables access to a local area network (LAN), a wireless LAN (WLAN) like wireless fidelity (Wi-Fi), a wireless personal area network (WPAN) like Bluetooth, a wireless universal serial bus (USB), ZigBee, near field communication (NFC), radio-frequency identification (RFID), power line communication (PLC), or a mobile cellular network like third generation (3G), fourth generation (4G), or long term evolution (LTE).
The communication module 160 may receive a weight map or a weight list from an external server. The external server may perform training based on massive learning data and may provide a weight map or a weight list, which includes trained weights, to the electronic system 100. The received weight map or weight list may be stored in the memory 140.
The communication module 160 may generate and/or communicate an information signal based on a result of an operation (e.g., an output feature map, generated during an operation in a form of an output feature list or an output feature matrix).
As described above, according to some example embodiments of the inventive concepts, the neural network device 130 may efficiently perform a neural network operation by performing the neural network operation based on an index. In particular, the neural network device 130 may generate an input feature list corresponding to an input feature having a non-zero value in a sparse neural network in which non-zero values are sparse in a feature map or a weight map and perform an operation on the input feature having the non-zero value based on the input feature list, thereby reducing the amount of operations. As the amount of operations is reduced, the efficiency of the neural network device 130 is increased and power consumption of the neural network device 130 and the electronic system 100 is decreased. Various embodiments of an index-based neural network operation method will be described in detail below.
Referring to
The neural network device 130 may perform an index operation based on the input feature index in the input feature list and generate an output feature index based on the index operation result in operation S120. The index operation result of the index operation may be an output feature index.
The neural network device 130 may perform a data operation based on the input feature value in the input feature list and may generate an output feature value corresponding to the output feature index based on the data operation result in operation S130. At this time, when the output feature index generated in operation S120 is not mapped in the output feature map, the neural network device 130 may skip the data operation. The data operation result of the data operation may be an output feature value corresponding to the output feature index.
The neural network device 130 may generate an output feature list based on the output feature index and the output feature value in operation S140. The neural network device 130 performs operations S120 and S130 on all input features in the input feature list to generate the output feature list. Restated, the neural network device 130 may generate, at operation S110, an input feature list that includes a plurality of input feature indices and a plurality of input feature values, the plurality of input feature indices corresponding to separate input features of a plurality of input features, the plurality of input feature values corresponding to separate input features of the plurality of input features, and the neural network device 130 may further perform, based on separate, respective input features, separate sets of operations S120 and S130 to generate a plurality of output feature indices based on the separate, respective input feature indices of the input feature list and to generate a plurality of output feature values based on the separate, respective input feature values, respectively. As part of performing separate sets of operations S120 and S130 based on separate, respective input features, the neural network device 130 may filter a limited selection of output indices, of the plurality of output indices, based on a determination that the limited selection of output indices do not influence an output result during the operation, such that the plurality of output indices is filtered to include a remainder selection of output indices that do influence an output result during the operation. The neural network device 130 may store the output feature list in a memory. The memory may be located inside the neural network device 130 or may be a memory, e.g., the memory 140 shown in
In some example embodiments, if the output feature list is for the final layer of a neural network, the neural network device 130 may generate an information signal based on the output feature list.
The neural network device 130 may reduce the amount of operations by performing an operation on each input feature index and each input feature value and filtering output indices (e.g., a limited selection of output indices of the plurality of output indices) which do not influence an output result during the operation. In addition, the neural network device 130 may easily process various operations of a neural network based on an index operation. As a result, the functioning of an electronic system 100 that includes the neural network device 130 may be improved based on performing the aforementioned one or more operations.
Referring to
Thereafter, the neural network device 130 may perform an index-based convolution operation based on the input feature list and a weight list which has been stored in advance.
The neural network device 130 may generate an output feature index based on an input feature index and a weight index in operation S220. The neural network device 130 may generate the output feature index by performing an operation (“first operation”) on the input feature index and the weight index.
The neural network device 130 may generate the output feature index by performing an operation on the input feature index corresponding to the input feature having a non-zero value and a weight index corresponding to a weight having a non-zero value.
In detail, the neural network device 130 may generate the output feature index by adding the input feature index and the weight index. The neural network device 130 may add a first index of the input feature index and a first index of the weight index and add a second index of the input feature index and a second index of the weight index.
The neural network device 130 may generate an output feature value corresponding to the output feature index based on the input feature value and a weight value in operation S230. The neural network device 130 may generate the output feature value by performing a data operation (“second operation”) based on the input feature value and the weight value. The neural network device 130 may multiply the input feature value by the weight value and may generate the output feature value based on a multiplication value resulting from the multiplication. The neural network device 130 may generate the output feature value by adding a plurality of multiplication values corresponding to the output feature index. The input feature value and the weight value may be non-zero.
The neural network device 130 may perform an index-based convolution operation by performing the index operation based on the input feature index and the weight index in the weight list in operation S220 and performing the data operation based on the input feature value and the weight value in operation S230. In some example embodiments, if the output feature is for the final layer of a neural network, the neural network device 130 may generate an information signal based on the output feature value.
In some example embodiments, the index-based convolution operation method may also include an operation in which the neural network device 130 generates the weight list from a weight matrix. For example, the neural network device 130 may receive the weight matrix from outside, e.g., outside the neural network device 130 or an external server of an electronic device equipped with the neural network device 130, and may generate the weight list from the weight matrix. The weight list may include a weight index and a weight value which correspond to each of weights included in the weight matrix. The neural network device 130 may generate the weight list corresponding to at least one weight having a non-zero value in the weight matrix. The neural network device 130 may store the weight list and may use the weight index and the weight value in operations S220 and S230. However, the inventive concepts are not limited thereto. The neural network device 130 may receive the weight list from an outside, e.g., outside the neural network device 130 or an external server of an electronic device equipped with the neural network device 130, and may store the weight list and then use the weight list.
In detail,
Referring to
As described above, when a convolution operation is performed, an input feature having a zero value and/or a weight having a zero value do not influence the operation result. Although a lot of snapshots may be generated during the traversal convolution operation, only six snapshots shown in
The neural network device 130 may generate an initial weight list IWL with respect to non-zero weights, e.g., the weights W0,1 and W2,2, of the weight matrix WMX. A weight index of the initial weight list IWL indicates a spatial location, e.g., an address, of each of the weights W0,1 and W2,2. Such a weight index may be referred to as an “initial weight index.”
Thereafter, the initial weight index may be adjusted to correspond to a particular operation. The adjusting may include the neural network device 130 generating a mirrored weight list MWL by mirroring a weight index (the “initial weight index”) in the initial weight list IWL based on a weight bias index, e.g., (RA, CA)=(1, 1), indicating the center of the weight matrix WMX.
The neural network device 130 may bias mirrored weight indices by subtracting the weight bias index, i.e., (RA, CA)=(1, 1), from a weight index (“mirrored weight index”) of the mirrored weight list MWL. As a result, (1, 0) and (−1, −1) may be generated as weight indices of the respective weights W0,1 and W2,2 and the weight list WL used for the convolution operation may be generated.
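The mirroring and biasing steps above can be traced with a minimal Python sketch. It assumes a 3x3 weight matrix, which is consistent with the center (RA, CA)=(1, 1), and it interprets mirroring as a point reflection about that center; the function name `build_weight_lists` and the weight values 0.5 and 4.0 are hypothetical.

```python
def build_weight_lists(weight_matrix):
    """Return (initial, mirrored, final) weight lists for an odd-sized matrix."""
    center_r = len(weight_matrix) // 2
    center_c = len(weight_matrix[0]) // 2
    # Initial weight list: spatial location and value of each non-zero weight.
    initial = [
        ((r, c), w)
        for r, row in enumerate(weight_matrix)
        for c, w in enumerate(row)
        if w != 0
    ]
    # Mirror each index about the center of the weight matrix (assumed reflection).
    mirrored = [((2 * center_r - r, 2 * center_c - c), w) for (r, c), w in initial]
    # Bias the mirrored index by subtracting the weight bias index (the center).
    final = [((r - center_r, c - center_c), w) for (r, c), w in mirrored]
    return initial, mirrored, final


# Non-zero weights at W0,1 and W2,2; the values themselves are hypothetical.
wmx = [[0, 0.5, 0],
       [0, 0,   0],
       [0, 0,   4.0]]
iwl, mwl, wl = build_weight_lists(wmx)
# iwl indices: (0, 1), (2, 2); mwl indices: (2, 1), (0, 0); wl indices: (1, 0), (-1, -1)
```

Under these assumptions, the final indices (1, 0) and (−1, −1) agree with the weight indices stated for the weights W0,1 and W2,2.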
For example, each of input feature indices (1, 1), (1, 4), and (4, 3) of the respective input features f1,1, f1,4, and f4,3 may be added to the weight index (1, 0) of the weight W0,1, so that output feature indices (2, 1), (2, 4), and (5, 3) may be generated. At this time, the first index RA of each input feature index may be added to the first index RA of the weight index and the second index CA of each input feature index may be added to the second index CA of the weight index.
An input feature value of each of the input features f1,1, f1,4, and f4,3 is multiplied by a weight value of the weight W0,1, so that a first output feature list OFL1 may be generated with respect to the weight W0,1. In addition, each of the input feature indices (1, 1), (1, 4), and (4, 3) of the respective input features f1,1, f1,4, and f4,3 may be added to the weight index (−1, −1) of the weight W2,2 and the input feature value of each of the input features f1,1, f1,4, and f4,3 is multiplied by a weight value of the weight W2,2, so that a second output feature list OFL2 may be generated with respect to the weight W2,2.
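The generation of the two output feature lists can be sketched in Python as follows. The input feature indices and weight indices are those given above; the input feature values, the weight values, and the function name `output_list_for_weight` are hypothetical. Output indices that fall outside the output feature map are not filtered in this sketch.

```python
def output_list_for_weight(input_feature_list, weight):
    """Apply one weight to every input feature.

    Index operation: add the row indices and add the column indices.
    Data operation: multiply the input feature value by the weight value.
    """
    (wr, wc), wv = weight
    return [((fr + wr, fc + wc), fv * wv) for (fr, fc), fv in input_feature_list]


# Indices of f1,1, f1,4, and f4,3 from the description; values are hypothetical.
ifl = [((1, 1), 2.0), ((1, 4), 3.0), ((4, 3), 5.0)]
w01 = ((1, 0), 0.5)     # weight W0,1 with a hypothetical value
w22 = ((-1, -1), 4.0)   # weight W2,2 with a hypothetical value

ofl1 = output_list_for_weight(ifl, w01)  # indices (2, 1), (2, 4), (5, 3)
ofl2 = output_list_for_weight(ifl, w22)  # indices (0, 0), (0, 3), (3, 2)
```

Only six multiplications are performed here, one per pair of a non-zero input feature and a non-zero weight.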
Since there is no overlapping output feature index between the first output feature list OFL1 and the second output feature list OFL2, output features in the first output feature list OFL1 and output features in the second output feature list OFL2 may be mapped on a matrix without additional operation. It can be seen that the output feature matrix OFMX shown in
The traversal convolution operation essentially involves redundancy due to traversal. Accordingly, it is not easy to skip an operation on an input feature and a weight which have the zero value, i.e., a meaningless operation which does not influence an output feature. However, when the index-based convolution operation according to some example embodiments of the inventive concepts is used as shown in
Referring to
When an index-based convolution operation is performed based on the input feature list IFL shown in
According to the current embodiments of the inventive concepts, when the index-based convolution operation is used, the neural network device 130 may generate an output feature index using an index operation and an output feature value using a data operation. However, when output feature indices overlap, i.e., when a plurality of data operation results (multiplication values) exist for one output feature index, the neural network device 130 may add the plurality of multiplication values to generate the output feature value corresponding to the output feature index.
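The accumulation of multiplication values that share one output feature index can be sketched in Python as follows. The function name `merge_output_lists`, the use of a dictionary as the accumulator, and the sample lists are assumptions made for illustration; the disclosure does not specify a particular data structure.

```python
from collections import defaultdict


def merge_output_lists(*output_lists):
    """Merge per-weight output lists, adding values that share an output index."""
    accumulated = defaultdict(float)
    for output_list in output_lists:
        for index, value in output_list:
            accumulated[index] += value  # overlapping indices are summed
    return sorted(accumulated.items())


# Hypothetical per-weight lists in which the index (2, 1) overlaps.
ofl_a = [((2, 1), 1.0), ((2, 4), 1.5)]
ofl_b = [((2, 1), 4.0), ((0, 0), 8.0)]
merged = merge_output_lists(ofl_a, ofl_b)
# [((0, 0), 8.0), ((2, 1), 5.0), ((2, 4), 1.5)]
```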
As described above with reference to
Referring to
The neural network device 130 may add a bias index to each index of the input feature list in operation S320. Consequently, the neural network device 130 may perform zero-padding. This will be described in detail with reference to
Zero-padding in a neural network is adding zeros to the input feature map IFM in all outward directions, i.e., row and column directions. When zero-padding is applied to the input feature map IFM, an input feature map with zero-padding, i.e., a zero-padded input feature map IFM_Z may be generated. When one zero is added to every outward direction of the input feature map IFM, as shown in
When zero-padding is applied to the input feature map IFM in matrix form during a traversal convolution operation, an output feature map having the same size as the input feature map IFM may be generated. A neural network device performing the traversal convolution operation needs to include a control logic, which adds zeros to the input feature map IFM, to support the zero-padding.
An operation on an input feature having the zero value may be skipped in an index-based neural network operation. When using zero-padding, the neural network device 130 may generate the input feature map IFMa, i.e., the initial input feature list, including input features having a non-zero value and may generate the padded input feature map IFM_Za, i.e., a padded input feature list, excluding zeros generated by applying index-based zero-padding to the input feature map IFMa. Restated, the neural network device 130 may generate an initial input feature list IFMa that includes an initial input feature index corresponding to a location of the input feature and an input feature value corresponding to the input feature.
The neural network device 130 performing the index-based neural network operation may generate the padded input feature map IFM_Za by remapping indices in the input feature list, i.e., the input feature map IFMa in list form, based on a bias index (z, z), also referred to herein as a “feature bias index.” For example, the neural network device 130 may add the bias index (z, z) to the indices of input features of the input feature map IFMa to remap the indices. At this time, the bias index (z, z) may be determined according to a zero-value length.
For example, when one zero is added to the input feature map IFM in all outward directions of the input feature map IFM, as shown in
As described above, the neural network device 130 performing an index-based neural network operation may remap the indices of the input feature map IFMa in list form based on the bias index (z, z) set according to a zero-value length, thereby easily generating the padded input feature map IFM_Za excluding zeros without using a separate control logic for zero-padding.
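The index remapping described above may be sketched as follows. This is a minimal illustration only: the list layout (pairs of a (row, column) index and a value), the function name pad_feature_list, and the choice of z as the number of zeros added on each side are assumptions made for the example and are not taken from the disclosure.

```python
def pad_feature_list(feature_list, z):
    """Add the bias index (z, z) to every input feature index.

    feature_list holds only non-zero features as ((row, col), value) pairs,
    so no zero entries are ever materialized by the padding.
    """
    return [((row + z, col + z), value) for (row, col), value in feature_list]


# Hypothetical input feature list with two non-zero features.
ifm_list = [((0, 0), 5), ((1, 2), 3)]
padded_list = pad_feature_list(ifm_list, 1)
print(padded_list)
```

Because only indices are shifted, the padded list has the same length as the initial list, which is consistent with the statement that no separate control logic for zero-padding is needed.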
Referring to
The neural network device 130 may determine whether there is a remainder of the division in operation S430. When there is a remainder, the neural network device 130 may skip an operation on an input feature value and a weight value in operation S440. When there is a remainder of the division, the added-up index is not mapped onto an output feature map, and therefore, the result of a data operation on the index does not influence the output feature map. Accordingly, the neural network device 130 may skip the operation on the input feature value and the weight value.
Otherwise, when there is no remainder of the division (e.g., upon completion of the dividing), the neural network device 130 may select a quotient as an output feature index in operation S450 and may perform operations (e.g., multiplication and addition) on the input feature value and the weight value in operation S460. An operation value resulting from the operation may be provided as an output feature value for the output feature index.
For example, when there is no remainder after dividing a result of adding an input feature index of a first input feature and a weight index of a first weight by a stride length, a quotient may be selected as an output feature index and a result of performing an operation on an input feature value corresponding to the first input feature and a weight value corresponding to the first weight may be provided as an output value for the output feature index. When there is a remainder after dividing a result of adding an input feature index of a second input feature and a weight index of a second weight by the stride length, the result of the operation on the input feature index of the second input feature and the weight index of the second weight is not selected as an output feature index. Accordingly, an operation on an input feature value corresponding to the second input feature and a weight value corresponding to the second weight may be omitted.
As described above, a stride may be easily used in an index-based convolution operation through an operation on indices and the amount of operations may be decreased.
As described above, when an index-based convolution operation is used according to some example embodiments of the inventive concepts, the neural network device 130 may add an input feature index and a weight index, may divide the added-up index by a stride length, and may select a quotient as an output feature index when there is no remainder after the division.
For example, since the stride length is 1 in
When there is no remainder after dividing an added-up index by a stride length of 3 in the example shown in
The neural network device 130 may generate an output feature value by performing an operation on an input feature value and a weight value which correspond to an output feature index. The neural network device 130 may not perform an operation on an input feature value and a weight value which do not correspond to an output feature index.
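The stride handling described above (add the indices, divide by the stride length, keep the quotient only when there is no remainder) may be sketched as follows. The data layout, the function name strided_index_conv, and the per-coordinate application of the remainder check to two-dimensional indices are assumptions made for illustration.

```python
from collections import defaultdict


def strided_index_conv(feature_list, weight_list, stride):
    """Index-based multiply-accumulate with a stride check on the indices.

    Both lists hold ((row, col), value) pairs for non-zero entries only.
    """
    output = defaultdict(int)
    for (f_row, f_col), f_val in feature_list:
        for (w_row, w_col), w_val in weight_list:
            sum_row, sum_col = f_row + w_row, f_col + w_col
            # A remainder means the added-up index is not mapped onto the
            # output feature map, so the data operation is skipped.
            if sum_row % stride or sum_col % stride:
                continue
            out_index = (sum_row // stride, sum_col // stride)
            output[out_index] += f_val * w_val
    return dict(output)


features = [((0, 0), 2), ((1, 1), 3)]
weights = [((0, 0), 1), ((1, 1), 4)]
print(strided_index_conv(features, weights, 2))
```

In this sketch, a stride length of 1 never produces a remainder, so every index pair contributes to the output, whereas a larger stride length reduces the number of multiplications actually performed.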
Referring to
The neural network device 130 may perform a pooling operation on the input features having the same remapped index in operation S520. In other words, the pooling operation may be performed on the input features included in the pooling window. Max pooling or average pooling may be performed on the input features.
The neural network device 130 may provide a pooling operation value resulting from the pooling operation as an output feature value corresponding to the output feature index in operation S530. The index-based pooling method will be described in detail with reference to
As described above with reference to
According to some example embodiments, the neural network device 130 may perform pooling based on an index. The neural network device 130 may divide an input feature index by a particular (or, alternatively, predetermined) sampling length (“sub-sampling size”) and may select the quotient of the division as a remapped index with respect to an input (an “output feature index corresponding to an input feature”). Accordingly, as shown in an index-remapped input feature map (b), indices may be remapped with respect to input features and a plurality of input features may have the same remapped index according to a sampling unit. The remapped index may be an output feature index, i.e., a spatial location at which an output feature value will be stored in an output feature matrix. Before input feature values are stored at a location according to the corresponding output feature index, an operation may be performed on the input feature values according to the kind of pooling.
For example, when max pooling is applied to an input feature matrix, a maximum value among input feature values included in a 2×2 sampling unit, i.e., input feature values corresponding to one output feature index, may be provided as an output feature value corresponding to the output feature index.
In another example, when average pooling is applied to an input feature matrix, input feature values corresponding to one output feature index may be added, an added-up value resulting from the addition may be divided by the number of the input feature values, and the division result may be provided as an output feature value corresponding to the output feature index. However, the inventive concepts are not limited to these examples and various kinds of pooling may be used.
When a result of performing a pooling operation on input features corresponding to each output feature index is provided as an output feature value, the output feature map (c) may be generated.
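The index-based pooling described above may be sketched as follows. The function name index_pool and the list layout are hypothetical. Note that, in this sketch, average pooling divides by the number of listed input feature values, following the wording above, so features omitted from a sparse list do not enter the average.

```python
from collections import defaultdict


def index_pool(feature_list, sampling, mode="max"):
    """Pool features whose indices share the same remapped (quotient) index."""
    groups = defaultdict(list)
    for (row, col), value in feature_list:
        # The quotient of the division is the output feature index.
        groups[(row // sampling, col // sampling)].append(value)
    if mode == "max":
        return {index: max(values) for index, values in groups.items()}
    if mode == "average":
        return {index: sum(values) / len(values) for index, values in groups.items()}
    raise ValueError(f"unsupported pooling mode: {mode}")


features = [((0, 0), 1), ((0, 1), 4), ((1, 0), 2), ((2, 3), 6), ((3, 3), 2)]
print(index_pool(features, 2, "max"))
print(index_pool(features, 2, "average"))
```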
The various embodiments of an index-based neural network operation method have been described above with reference to
Referring to
The neural network device 200 may include a controller 220, a neural network processor 210, and a system memory 230. The neural network device 200 may also include a direct memory access (DMA) controller to store data in an external memory. The neural network processor 210, the controller 220, and the system memory 230 of the neural network device 200 may communicate with one another through a system bus. The neural network device 200 may be implemented as a semiconductor chip, e.g., a system-on-chip (SoC), but is not limited thereto. The neural network device 200 may be implemented by a plurality of semiconductor chips. In the present embodiment, the controller 220 and the neural network processor 210 are shown as separate components but are not limited thereto, and the controller 220 may be included in the neural network processor 210.
The controller 220 may be implemented as a CPU or a microprocessor. The controller 220 may control all operations of the neural network device 200. In some example embodiments, the controller 220 may execute a program of instructions stored in the system memory 230 to control the neural network device 200. The controller 220 may control the operations of the neural network processor 210 and the system memory 230. For example, the controller 220 may set and manage parameters to allow the neural network processor 210 to normally execute layers of a neural network.
The controller 220 may generate a weight list from a weight matrix and provide the weight list to the neural network processor 210. However, the inventive concepts are not limited thereto. A separate preprocessing circuit generating the weight list from the weight matrix may be included in the neural network device 200 or the neural network processor 210.
The neural network processor 210 may include a plurality of processing circuits 211. The processing circuits 211 may be configured to simultaneously operate in parallel. Furthermore, the processing circuits 211 may operate independently from one another. Each of the processing circuits 211 may be implemented as a core circuit executing instructions. The processing circuits 211 may perform the index-based operations described above with reference to
The neural network processor 210 may be implemented by hardware circuits. For example, the neural network processor 210 may be implemented as an integrated circuit. The neural network processor 210 may include at least one among a CPU, a multi-core CPU, an array processor, a vector processor, a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), an application specific integrated circuit (ASIC), a programmable logic circuitry, a video processing unit (VPU), and a graphics processing unit (GPU). However, the inventive concepts are not limited thereto.
The neural network processor 210 may also include an internal memory 212. The internal memory 212 may be a cache memory of the neural network processor 210. The internal memory 212 may be SRAM but is not limited thereto. The internal memory 212 may be implemented as a buffer or a cache memory of the neural network processor 210 or one of other kinds of memory of the neural network processor 210. The internal memory 212 may store data generated according to an operation performed by the processing circuits 211, e.g., output feature indices, output feature values, or various kinds of data generated during the operation.
The system memory 230 may be implemented as RAM, e.g., DRAM or SRAM. The system memory 230 may be connected to the neural network processor 210 through a memory controller. The system memory 230 may store various kinds of programs and data. The system memory 230 may store weight maps provided from an external device, e.g., a server or an external memory.
The system memory 230 may buffer weight maps corresponding to a next layer which will be executed by the neural network processor 210. When an operation is performed using a weight map in the processing circuits 211, the weight map may be output from an external memory (e.g., the memory 140 in
The system memory 230 may also temporarily store an output feature map output from the neural network processor 210.
Referring to
The list maker 213 may generate an input feature list from input features. The list maker 213 may identify inputs having a non-zero value and generate an input feature list of the inputs having a non-zero value.
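The behavior of the list maker may be sketched as follows, assuming an uncompressed input feature matrix given as nested Python lists. The function name make_feature_list and the ((row, column), value) layout are assumptions made for illustration.

```python
def make_feature_list(feature_map):
    """Return ((row, col), value) pairs for the non-zero entries of a matrix."""
    return [
        ((row, col), value)
        for row, row_values in enumerate(feature_map)
        for col, value in enumerate(row_values)
        if value != 0
    ]


ifm = [
    [0, 5, 0],
    [0, 0, 3],
]
print(make_feature_list(ifm))
```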
When a received input feature is a compressed input feature matrix, the list maker 213 may decompress the input feature matrix and generate an input feature list based on the decompressed input feature matrix. When a received input feature includes a compressed input feature list, the list maker 213 may generate an input feature list by performing decompression.
The selector 215 may selectively provide an input feature list output from the list maker 213 or an input feature list received from the internal memory 212 to the processing circuit 211. For example, the selector 215 may provide the input feature list from the list maker 213 to the processing circuit 211 in a first operating mode. The first operating mode may be a linear operation mode. For example, the first operating mode may be a convolution mode. The selector 215 may provide the input feature list from the internal memory 212 to the processing circuit 211 in a second operating mode. The second operating mode may be a pooling mode or a nonlinear operation mode using an activation function. For example, in the second operating mode, a pooling operation may be performed or an activation function may be applied to output feature values generated in the first operating mode.
The index remapper 21 may perform an index operation and generate an output feature index. The index remapper 21 may perform the index operation described above with reference to
The index remapper 21 may receive an input feature list from the selector 215 and a weight list from the dedicated memory 24. The index remapper 21 may add an input feature index and a weight index to generate an added-up index. The index remapper 21 may divide the added-up index by a particular (or, alternatively, predetermined) integer, e.g., a stride length or a sampling unit used in the pooling operation.
The index remapper 21 may filter indices which have been generated to allow a data operation to be performed on meaningful indices among the generated indices. For example, the index remapper 21 may classify the generated indices into output feature indices and the other indices so that a data operation is performed on the output feature indices included in an output feature list in the first data operation circuit 22 and/or the second data operation circuit 23. The index remapper 21 may control the first data operation circuit 22 and/or the second data operation circuit 23 not to perform an operation on the other indices.
The index remapper 21 may request that data stored in the dedicated memory 24 be read. For example, the index remapper 21 may request that the dedicated memory 24 read a weight list. Restated, the index remapper 21 may transmit, to the dedicated memory 24, a read request signal associated with a request to read parameters corresponding to a first input feature value among the plurality of parameters in a second operating mode. Alternatively, the index remapper 21 may request that the dedicated memory 24 output parameters corresponding to an input feature value, e.g., an output feature value in the output feature list.
The dedicated memory 24 may store various kinds of data used during an operation performed by the processing circuit 211. For example, the dedicated memory 24 may store a weight list. The dedicated memory 24 may also store a lookup table including parameters corresponding to input feature values. The dedicated memory 24 may provide the weight list to the index remapper 21 and the first data operation circuit 22 in response to a request of the index remapper 21. The dedicated memory 24 may also provide the parameters to the first data operation circuit 22 and the second data operation circuit 23 in response to a request of the index remapper 21.
The first data operation circuit 22 and the second data operation circuit 23 may perform a data operation. The first data operation circuit 22 and the second data operation circuit 23 may form a data operation circuit. The first data operation circuit 22 and the second data operation circuit 23 may perform the data operation described above with reference to
The first data operation circuit 22 may perform a multiplication operation. The first data operation circuit 22 may include a multiplier. When the processing circuit 211 performs a convolution operation, the first data operation circuit 22 may multiply an input feature value in an input feature list by a weight value in a weight list. The multiplication result may be provided to the second data operation circuit 23. The first data operation circuit 22 may be implemented by an array of multipliers.
The second data operation circuit 23 may perform an addition operation and also perform a division operation. Furthermore, the second data operation circuit 23 may perform other various kinds of operations. The second data operation circuit 23 may be implemented as an accumulator or an arithmetic operation circuit. The second data operation circuit 23 may be implemented as an array of operational circuits. For example, the second data operation circuit 23 may be implemented as an array of accumulators.
The internal memory 212 may store data output from the processing circuit 211. For example, the internal memory 212 may store an output feature index and a corresponding output feature value, which are received from the second data operation circuit 23. In other words, the internal memory 212 may store an output feature list. In addition, the internal memory 212 may store intermediate results output from the processing circuit 211 during an operation. The intermediate results may be provided to the second data operation circuit 23 to be used in an operation of the second data operation circuit 23.
Data stored in the internal memory 212 may be provided to the processing circuit 211 through the selector 215. In other words, output data resulting from a current operation of the processing circuit 211 may be used in a next operation. For example, an output feature list generated by a convolution operation of the processing circuit 211 may be provided to the processing circuit 211 as an input feature list, and the processing circuit 211 may perform a pooling operation on the input feature list.
Meanwhile, the output feature list may be output from the second data operation circuit 23 to the outside, e.g., the memory 140 of the electronic system 100, or may be stored in the internal memory 212 and then output. The output feature list may be output through the compressor 214. The compressor 214 may compress the output feature list and output a compressed output feature list.
The operation of a processor according to an operating mode will be described with reference to
Referring to
The index remapper 21 and the first data operation circuit 22 may respectively receive a weight index and a weight value corresponding to the weight index from a weight list stored in the dedicated memory 24. The index remapper 21 may receive the weight index and the first data operation circuit 22 may receive the weight value.
The index remapper 21 may perform an index operation based on an input feature index and the weight index and the first data operation circuit 22 may perform a data operation on an input feature value and the weight value. The index remapper 21 may add the input feature index and the weight index and may also divide the added-up value to generate an output feature index.
The index remapper 21 may also determine whether the output feature index is meaningful. When it is determined that the output feature index is not meaningful, the index remapper 21 may control the first data operation circuit 22 not to perform an operation on the input feature value and the weight value which correspond to the output feature index. Accordingly, the first data operation circuit 22 may perform an operation only on an input feature value and a weight value that correspond to a meaningful output feature index.
The second data operation circuit 23 may add operation results corresponding to the same output feature index among operation results output from the first data operation circuit 22. Consequently, the first data operation circuit 22 and the second data operation circuit 23 may perform a multiplication operation and an addition operation which are included in a convolution operation.
The second data operation circuit 23 may store an output feature list generated through the convolution operation in the internal memory 212 or may output the output feature list through the compressor 214.
Referring to
The index remapper 21 may receive an input feature value, i.e., an output feature value in the output feature list, from the internal memory 212. The dedicated memory 24, which may be referred to herein as a “third memory,” may store a lookup table including parameters corresponding to input feature values. Restated, the lookup table may include a plurality of parameters corresponding to each feature value of a plurality of feature values. A sign function, a sigmoid function, or an exponential function may be used in a neural network. These activation functions have nonlinearity. The lookup table may include parameters for allowing an activation function with nonlinearity to be calculated as a piecewise linear function. An output “f” of an activation function of an input feature value “v” may be expressed as a result of applying a piecewise linear function to the input feature value “v”, as defined in Equation 1:
f=c(v)·v+b(v) (1)
where c(v) is a coefficient corresponding to the input feature value “v” and b(v) is a bias value corresponding to the input feature value “v”. The lookup table may include parameters corresponding to different input feature values.
The index remapper 21 may request parameters corresponding to the input feature value “v” from the dedicated memory 24. Such a request may include transmitting, to the dedicated memory 24, a read request signal associated with a request to read parameters corresponding to an input feature value among the plurality of parameters. The received parameters may include a first parameter and a second parameter received from the dedicated memory 24, where the first parameter and the second parameter correspond to the input feature value. Accordingly, the parameters, i.e., c(v) and b(v), corresponding to the input feature value “v” may be output from the lookup table stored in the dedicated memory 24. Restated, the output feature value may be generated based on the input feature value, the first parameter, and the second parameter.
The parameter c(v) may be provided to the first data operation circuit 22 and the parameter b(v) may be provided to the second data operation circuit 23. The first data operation circuit 22 may perform a multiplication operation based on the input feature value “v” and the parameter c(v) and the second data operation circuit 23 may perform an addition operation based on the operation result received from the first data operation circuit 22 and the parameter b(v). As a result, the output “f” of the activation function of the input feature value “v” may be generated. Output feature values of the activation function of a plurality of input feature values may be output to outside the neural network processor. The output feature values of the activation function may be compressed by the compressor 214 before being output to the outside.
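Equation 1 and the lookup-table flow described above may be sketched as follows. The breakpoints and the (c, b) pairs below form a hypothetical three-segment table chosen only for illustration; the disclosure does not specify table contents, segment boundaries, or how the table is indexed.

```python
from bisect import bisect_right

# Hypothetical lookup table: segment boundaries and one (c, b) pair per segment.
BREAKPOINTS = [-2.5, 2.5]
PARAMETERS = [(0.0, 0.0), (0.2, 0.5), (0.0, 1.0)]


def read_parameters(v):
    """Return (c(v), b(v)) for the segment that contains v."""
    return PARAMETERS[bisect_right(BREAKPOINTS, v)]


def activate(v):
    """Evaluate f = c(v) * v + b(v)."""
    c, b = read_parameters(v)
    product = c * v      # multiplication (role of the first data operation circuit)
    return product + b   # addition (role of the second data operation circuit)


print([activate(v) for v in (-3.0, 0.0, 3.0)])
```

A finer table with more segments would approximate a nonlinear activation function more closely at the cost of more stored parameters.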
Referring to
The kernels KN0 through KN4 may be filters different from one another to obtain different characteristics from the input feature map IFM. The number of channels CH included in each of the kernels KN0 through KN4 is the same as the number of channels of the input feature map IFM.
When the convolution operation is performed, each of the kernels KN0 through KN4 may be shifted on the x-y plane of the input feature map IFM. Accordingly, the convolution operation may be performed on the input feature map IFM and the kernels KN0 through KN4 channel by channel. For example, a channel CHk of the kernels KN0 through KN4 may be applied to the channel CHk of the input feature map IFM in the convolution operation. When the convolution operation is performed by applying one of the kernels KN0 through KN4 to the input feature map IFM, the convolution operation can be performed independently from channel to channel. Output feature values, which have the same spatial location, e.g., the same location on the x-y plane and correspond to different channels among output features resulting from the convolution operation, may be added. Accordingly, a result of performing the convolution operation by applying one of the kernels KN0 through KN4 to the input feature map IFM may correspond to one channel of the output feature map OFM.
When the convolution operation is performed based on the plurality of the kernels KN0 through KN4, a plurality of channels may be generated. As shown in
Convolution operations respectively using the kernels KN0 through KN4 may be performed simultaneously in parallel. The convolution operations may be performed in different processing circuits in parallel. However, this parallel operation may vary with the hardware structure of a neural network.
As described above with reference to
As described above, to perform convolution operations in parallel in different processing circuits with respect to the respective channels of the input feature map IFM, the index-based neural network may divide each kernel by channels and regroup the same channels of kernels into one channel group.
Referring to
When a convolution operation is performed, a channel group corresponding to each channel of the input feature map IFM may be used among the channel groups CH0 through CHn-1. For example, a convolution operation may be performed on a second channel of the input feature map IFM and the second channel group CH1. Each of the channel groups CH0 through CHn-1 includes the channels of the kernels KN0 through KN4, and therefore, the result of a convolution operation based on one of the channel groups CH0 through CHn-1 may influence all of the first through fifth channels of the output feature map OFM. The output feature map OFM may be completed by adding, among the convolution operation results for the “n” channel groups, the results that have been generated from the same kernel and correspond to the same spatial location on the output feature map OFM.
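The regrouping of kernels into channel groups may be sketched as a transposition of a kernel-by-channel arrangement. In the sketch below, each kernel is represented as a list of per-channel entries, and the strings stand in for per-channel weight lists; the function name regroup_by_channel and this representation are assumptions made for illustration.

```python
def regroup_by_channel(kernels):
    """Turn kernels[k][ch] into channel_groups[ch][k]."""
    channel_count = len(kernels[0])
    return [[kernel[ch] for kernel in kernels] for ch in range(channel_count)]


# Three hypothetical kernels with two channels each.
kernels = [
    ["KN0/CH0", "KN0/CH1"],
    ["KN1/CH0", "KN1/CH1"],
    ["KN2/CH0", "KN2/CH1"],
]
channel_groups = regroup_by_channel(kernels)
print(channel_groups)
```

Each resulting group contains one channel from every kernel, so an operation on a single group may contribute to every channel of the output feature map.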
Referring to
Referring to
The neural network processor 210a may generate an input feature list for each channel of the input feature map IFM. The selector 215a may provide the input feature list of input features included in each channel to one of the processing circuits 211a_0 through 211a_k. For example, the selector 215a may provide an input feature list of input features included in a first channel to the first processing circuit 211a_0 and may provide an input feature list of input features included in a k-th channel to the k-th processing circuit 211a_k.
The processing circuits 211a_0 through 211a_k may respectively correspond to the channels of the input feature map IFM. In other words, each of the processing circuits 211a_0 through 211a_k may correspond to a core, i.e., one of the channel groups shown in
For example, the first processing circuit 211a_0 may include a plurality of index remappers 21a, a plurality of first data operation circuits 22a, a plurality of second data operation circuits 23a, and a dedicated memory 24a.
Each of the index remappers 21a may include an arithmetic operation circuit. The first data operation circuits 22a may be an array of multipliers. The second data operation circuits 23a may be an array of adders. However, the inventive concepts are not limited thereto. Each of the second data operation circuits 23a may also include an arithmetic operation circuit.
The dedicated memory 24a may store the weight list WL or a lookup table LUT. When the neural network processor 210a performs a convolution operation, the dedicated memory 24a may output a weight index corresponding to a weight from the weight list WL to the index remappers 21a and may output a weight value corresponding to the weight to the first data operation circuits 22a. The weight list WL may include a weight index, a weight value, and a kernel index which correspond to each weight. The kernel index is information about a kernel including the weight.
When the neural network processor 210a performs a nonlinear operation, the dedicated memory 24a may provide parameters corresponding to an input feature to the first data operation circuits 22a and the second data operation circuits 23a to support a piecewise linear function.
The operation of the first processing circuit 211a_0 is similar to that of the processing circuit 211 described with reference to
The other processing circuits 211a_1 through 211a_k may include substantially the same elements as the first processing circuit 211a_0 and may perform substantially the same operation as the first processing circuit 211a_0.
Meanwhile, some of operation values output from the respective processing circuits 211a_0 through 211a_k may correspond to the same location on an output feature map. Accordingly, the global accumulator 216 may add operation values which have been output from different processing circuits but correspond to the same location on the output feature map.
At this time, due to the characteristics of a sparse neural network, locations to which operation values output from the processing circuits 211a_0 through 211a_k are mapped on the output feature map may be randomly distributed and locations to which operation values simultaneously output from the processing circuits 211a_0 through 211a_k are mapped may be the same as one another on the output feature map. When the global accumulator 216 accumulates in real time operation values output from the processing circuits 211a_0 through 211a_k, the load of the global accumulator 216 may be excessively increased.
For this reason, the second data operation circuits 23a included in each of the processing circuits 211a_0 through 211a_k may add up operation values output from the first data operation circuits 22a according to spatial locations on the output feature map and channels to generate an added-up value for each spatial location and channel. The processing circuits 211a_0 through 211a_k may be synchronized to output added-up values. Each of the second data operation circuits 23a may include an SRAM bank to add up operation values output from the first data operation circuits 22a according to the spatial locations on the output feature map and the channels.
The added-up values output from the processing circuits 211a_0 through 211a_k may be output as vector data according to a corresponding location on the output feature map. The global accumulator 216 may accumulate the vector data.
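The two-level accumulation described above may be sketched as follows: each processing circuit first adds up its own operation values per spatial location and channel, and the global accumulator then adds the per-circuit sums. The function names, the (location, channel, value) tuple layout, and the use of dictionaries in place of SRAM banks and vector data are assumptions made for illustration.

```python
from collections import defaultdict


def local_accumulate(operation_values):
    """Add up one circuit's operation values per (location, channel)."""
    sums = defaultdict(float)
    for location, channel, value in operation_values:
        sums[(location, channel)] += value
    return dict(sums)


def global_accumulate(per_circuit_sums):
    """Add the added-up values received from all processing circuits."""
    totals = defaultdict(float)
    for sums in per_circuit_sums:
        for key, value in sums.items():
            totals[key] += value
    return dict(totals)


# Hypothetical operation values from two processing circuits.
circuit_0 = [((0, 0), 0, 2.0), ((0, 0), 0, 3.0), ((1, 1), 1, 1.0)]
circuit_1 = [((0, 0), 0, 4.0), ((1, 1), 1, 5.0)]
ofm = global_accumulate([local_accumulate(circuit_0), local_accumulate(circuit_1)])
print(ofm)
```

In this sketch, the global step receives one added-up value per location and channel from each circuit rather than every individual product, which illustrates how local accumulation may reduce the load of the global accumulator.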
Since a dense neural network has few input features or weights having the zero value, an operation may be performed efficiently by simplifying the operation procedure rather than by skipping operations on the zero value in the operation procedure.
Referring to
As described above with reference to
Input features corresponding to an input feature index indicating one spatial location may be expressed by an input feature vector. Weights corresponding to a weight index indicating one spatial location may be expressed by a weight vector. Accordingly, an input feature list may include an input feature index and an input feature vector corresponding to the input feature index and a weight list may include a weight index and a weight vector corresponding to the weight index. For example, each of the kernels KN0 through KN4 shown in
An input feature index and a weight index are added to generate an output feature index. A dot product of a feature vector and a weight vector may be output as an operation value corresponding to the output feature index. A plurality of operation values may exist with respect to one output feature index. The operation values may be added to generate an output feature value corresponding to the output feature index.
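The vector form of the index-based operation may be sketched as follows: indices are added to obtain an output feature index, and the dot product of the input feature vector and the weight vector is accumulated at that index. The function name dense_index_conv and the list layout are hypothetical, and vectors are assumed to have one element per channel.

```python
from collections import defaultdict


def dense_index_conv(feature_list, weight_list):
    """Accumulate dot products of feature and weight vectors per output index."""
    output = defaultdict(int)
    for (f_row, f_col), f_vec in feature_list:
        for (w_row, w_col), w_vec in weight_list:
            out_index = (f_row + w_row, f_col + w_col)
            # Dot product over the channel dimension.
            output[out_index] += sum(f * w for f, w in zip(f_vec, w_vec))
    return dict(output)


features = [((0, 0), [1, 2]), ((0, 1), [3, 4])]
weights = [((0, 0), [1, 1]), ((0, 1), [2, 0])]
print(dense_index_conv(features, weights))
```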
Referring to
The processing circuits 211b_0 through 211b_k may respectively correspond to different kernels. The structure of the processing circuits 211b_0 through 211b_k is similar to that of the processing circuit 211 shown in
The address remapper 21b may include an arithmetic operation circuit. The first data operation circuits 22b may be an array of multipliers. The second data operation circuits 23b may be an array of adders. The address remapper 21b may perform an operation on an externally received input feature index and a weight index provided from the dedicated memory 24b, the first data operation circuits 22b may multiply an input feature value by a weight value, and the second data operation circuits 23b may add multiplication values resulting from multiplications. Consequently, a dot product may be performed on an input feature vector corresponding to the input feature index and a weight vector corresponding to the weight index.
While the inventive concepts have been particularly shown and described with reference to embodiments thereof, it will be understood that various changes in form and details may be made therein without departing from the spirit and scope of the following claims.
Claims
1. A neural network device comprising:
- a first memory storing a program of instructions; and
- a processor configured to execute the program of instructions to perform an index operation based on an input feature index, the input feature index indicating a location of an input feature on an input feature map, generate an output feature index based on an index operation result of the index operation, perform a data operation based on an input feature value of the input feature, and generate an output feature value corresponding to the output feature index based on a data operation result of the data operation.
2. A method comprising:
- generating, using a list maker of a processor, an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value, the input feature index and the input feature value corresponding to an input feature; and
- causing an index remapper of the processor to perform a first operation to generate an output feature index, the first operation including adding the input feature index and a weight index of a weight list, dividing an added-up value resulting from the adding by an integer, and selecting a quotient of the dividing as the output feature index based on a determination that no remainder is present upon completion of the dividing.
Type: Application
Filed: Apr 4, 2022
Publication Date: Aug 18, 2022
Applicant: Samsung Electronics Co., Ltd. (Suwon-si)
Inventor: Jun-seok PARK (Hwaseong-si)
Application Number: 17/712,247