High throughput AES architecture
An advanced encryption standard (AES) architecture includes a maximum parallel encryption module which implements one round of the AES algorithm in one clock cycle, and a maximum parallel key scheduling module which generates sub-keys in one clock cycle in parallel with the encryption module, thereby permitting feedback modes of operation to be used without adversely affecting AES throughput. A controller controls the operation of the encryption and key scheduling modules such that one round is completed per clock cycle. The controller is preferably part of a hierarchical distributed control scheme comprising communicating finite state machines (FSMs). The architecture also preferably includes asynchronous input and output buffers.
1. Field of the Invention
This invention relates to the field of encryption systems, and particularly to advanced encryption standard (AES) architectures.
2. Description of the Related Art
The advanced encryption standard (AES) is a new encryption standard which implements the Rijndael algorithm. The Rijndael algorithm accepts data blocks and key sizes of 128, 192, or 256 bits; the AES implementation is a symmetric block cipher with 128 bit data blocks and a key size that can be chosen from 128, 192, or 256 bits.
Several possible implementation modes of the AES standard are shown in
Ideally, an implementation of the AES standard will have a high data rate. Several AES designs have been proposed to achieve a high data rate based on pipelined architectures. These work well when the AES algorithm is used in electronic code book (ECB) mode, which has no feedback. However, the AES standard is most often used in the feedback modes of operation; in these modes, the output of the AES algorithm is fed back to the input. Unfortunately, this arrangement is incompatible with pipeline structures, due to the long latency of each pipeline path: a new block cannot enter the pipeline until the output of the previous block is available.
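The serial dependency introduced by feedback can be illustrated with a minimal sketch of Cipher Block Chaining (CBC), one of the feedback modes. The block cipher used here, `toy_encrypt_block`, is a hypothetical stand-in and is not AES; it serves only to show that in ECB every block is independent, whereas in CBC block i cannot be started until ciphertext i-1 exists.

```python
BLOCK_BYTES = 16  # 128-bit blocks, as in the AES standard


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for a block cipher (NOT AES, not secure)."""
    return bytes(((b ^ k) + 1) & 0xFF for b, k in zip(block, key))


def ecb_encrypt(blocks, key, enc=toy_encrypt_block):
    # Every block is independent, so a pipeline could keep many in flight.
    return [enc(p, key) for p in blocks]


def cbc_encrypt(blocks, key, iv, enc=toy_encrypt_block):
    # Each input is XORed with the previous ciphertext before encryption,
    # so the blocks form a strictly serial chain.
    out = []
    prev = iv
    for p in blocks:
        mixed = bytes(a ^ b for a, b in zip(p, prev))
        prev = enc(mixed, key)
        out.append(prev)
    return out
```

With two identical plaintext blocks, `ecb_encrypt` returns two identical ciphertext blocks, while `cbc_encrypt` returns two different ones, because the second depends on the first.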
SUMMARY OF THE INVENTION
An AES architecture is presented which overcomes the problems noted above. High throughput is achieved, even when the AES algorithm is employed with one of the feedback modes of operation.
The present invention is a low latency, non-pipelined AES architecture. Hardware is provided for one encryption round, which is re-used as needed to complete the encryption process. This permits feedback modes to be used without adversely affecting AES throughput.
The present architecture requires a maximum parallel encryption module, which is arranged to implement one round of the AES algorithm in one clock cycle. It also requires a maximum parallel key scheduling module, arranged to generate sub keys in one clock cycle in parallel with the encryption module. The encryption and key scheduling modules are preferably made from combinatorial logic blocks, replicated as necessary to achieve one round per clock cycle.
A controller controls the operation of the encryption and key scheduling modules such that one round of the AES algorithm is completed per clock cycle. The controller is preferably part of a hierarchical distributed control scheme comprising communicating finite state machines (FSMs).
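The one-round-per-clock control described above can be sketched as a small state machine. This is an illustrative model only: the state names, the `tick` method and the strobe names (`do_round`, `next_subkey`, `last_round`) are hypothetical, and the sketch ignores the initial key addition and any load or unload cycles. The round count uses the Rijndael definition, max(Nb, Nk) + 6, with Nb and Nk measured in 32-bit words.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ROUND = auto()
    DONE = auto()


class MainController:
    """Hypothetical main FSM: each call to tick() models one clock cycle."""

    def __init__(self, block_bits=128, key_bits=128):
        # Rijndael round count: max(Nb, Nk) + 6 (Nb, Nk in 32-bit words).
        self.total_rounds = max(block_bits, key_bits) // 32 + 6
        self.state = State.IDLE
        self.round_index = 0

    def start(self):
        self.state = State.ROUND
        self.round_index = 0

    def tick(self):
        """Advance one clock cycle and return strobes for the local FSMs."""
        if self.state is not State.ROUND:
            return {"do_round": False, "next_subkey": False, "last_round": False}
        self.round_index += 1
        last = self.round_index == self.total_rounds
        if last:
            self.state = State.DONE
        # One encryption round and one sub-key are requested in the same cycle.
        return {"do_round": True, "next_subkey": True, "last_round": last}


def cycles_to_encrypt(block_bits, key_bits):
    """Count the clock cycles the sketch spends in the ROUND state."""
    ctrl = MainController(block_bits, key_bits)
    ctrl.start()
    cycles = 0
    while ctrl.state is State.ROUND:
        ctrl.tick()
        cycles += 1
    return cycles
```

Under these assumptions, a 128 bit block with a 128 bit key occupies the round hardware for 10 clock cycles.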
Further features and advantages of the invention will be apparent to those skilled in the art from the following detailed description, taken together with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
An AES architecture in accordance with the present invention is shown in
The key scheduling module 12 is also made maximum parallel, such that the sub-keys required by encryption module 10 are generated in one clock cycle, in parallel with the encryption module.
Encryption module 10 and key scheduling module 12 are controlled via controller 14. The controller is adapted to operate encryption module 10 and key scheduling module 12 to perform one round of the AES algorithm in one clock cycle. Controller 14 is preferably part of a hierarchical distributed control scheme comprising communicating finite state machines (FSMs), such as an input FSM 15 and an output FSM 16 which control the operation of an input buffer 17 and an output buffer 18, respectively. The controller preferably also communicates with the outside world via input commands and output status bits. The control scheme preferably also includes FSMs 19 and 20, which control the operation of encryption module 10 and key scheduling module 12, respectively, and may be internal or external to their respective modules. Controller 14 preferably also includes an FSM 22; the controller's implementation is discussed in more detail in relation to
The key is provided to key scheduling module 12 either via the input port and encryption module, or (as shown in
When arranged as described above, the present AES architecture provides low latency and high throughput, even when used with feedback modes of operation.
The architecture also preferably includes asynchronous input and output buffers, which implement a full handshake. Asynchronous input buffer 17 loads X-bit data bytes to be encrypted (P), places them in parallel in an N-bit internal register 24, and presents the N bits to the input of encryption module 10 simultaneously. Similarly, asynchronous output buffer 18 receives the N-bit output from encryption module 10 and outputs encoded X-bit data bytes (C) to an output bus. This arrangement decouples the external I/O operations, i.e., the loading and unloading of data, from the internal operation of the encryption core (modules 10 and 12). This allows the input and output busses to be any width compared to the internal input and output registers. Thus, the encryption core can be used in an environment in which the number of pins is limited (e.g., an 8-bit bus or a serial link), as well as with high speed parallel busses (e.g., 64, 128 or 256 bits). Another benefit afforded by the preferred asynchronous input and output buffers is that they enable a slow input and/or output to still be combined with fast internal operation, with the handshaking stretched over a large number of clock cycles to accommodate the slow interface.
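The width conversion performed by the buffers can be sketched as follows. This models only the data packing, not the handshake or the clock domains; the function names are hypothetical, and placing the first word received in the most significant position is an assumption, since the text does not specify an ordering.

```python
def pack_words(words, word_bits, register_bits):
    """Assemble X-bit words into N-bit register values.

    Assumption: the first word received occupies the most significant bits.
    Words left over after the last complete register are not emitted.
    """
    if register_bits % word_bits:
        raise ValueError("register width must be a multiple of the word width")
    per_register = register_bits // word_bits
    mask = (1 << word_bits) - 1
    registers = []
    for i in range(0, len(words) - per_register + 1, per_register):
        value = 0
        for w in words[i:i + per_register]:
            value = (value << word_bits) | (w & mask)
        registers.append(value)
    return registers


def unpack_register(value, word_bits, register_bits):
    """Split one N-bit register value back into X-bit words (inverse of packing)."""
    per_register = register_bits // word_bits
    mask = (1 << word_bits) - 1
    return [
        (value >> (word_bits * (per_register - 1 - i))) & mask
        for i in range(per_register)
    ]
```

For example, sixteen 8-bit words fill one 128-bit register, while the same register is filled by two 64-bit words; the encryption core sees the same N-bit value in both cases.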
One possible implementation for encryption module 10 is shown in
For substitution sub-module 30, the incoming data bits are preferably divided into 8-bit bytes, each of which is used to address an S-box lookup table. Each S-box contains 256 8-bit entries. To provide maximum parallelism and to finish one round of encryption in one clock cycle, the same S-box is replicated 32 times for an expected data block length of 256 bits. The S-box is replicated 16 or 24 times for expected data block lengths of 128 or 192 bits, respectively.
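A software sketch of the byte substitution follows. Rather than hard-coding the 256 entries, it derives the table from the definition in the AES standard (multiplicative inverse in GF(2^8) followed by an affine transform). The function names are illustrative, and the list comprehension in `sub_bytes` stands in for the replicated S-boxes, which in hardware all operate in the same clock cycle.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """Build the 256-entry S-box: field inverse, then the affine transform."""
    sbox = []
    for x in range(256):
        inv = gf_inv(x)
        sbox.append(
            inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
        )
    return sbox


SBOX = build_sbox()


def sub_bytes(state: bytes) -> bytes:
    """Apply the S-box to every byte; one lookup per replicated S-box."""
    return bytes(SBOX[b] for b in state)
```

A 128, 192 or 256 bit state corresponds to 16, 24 or 32 bytes passed to `sub_bytes`, matching the replication counts given above.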
For shift row sub-module 32, the 256 bits of incoming data (assuming a maximum expected data block length of 256 bits) are preferably divided into four 64 bit chunks, each of which is called a “row” and contains eight bytes. Byte-wise cyclic shifts are performed on each row, with the amount of shift determined by the block length through a lookup table, as defined in the AES standard.
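The row shifts can be sketched in software as below. The sketch assumes the usual Rijndael state layout (four rows, Nb columns, bytes stored column by column) and the shift offsets from the Rijndael specification; the table and function names are illustrative. In hardware this step is fixed wiring selected by the block length.

```python
# Left-shift amount for rows 0..3, indexed by Nb (block length in 32-bit words).
SHIFT_OFFSETS = {
    4: (0, 1, 2, 3),  # 128-bit blocks
    6: (0, 1, 2, 3),  # 192-bit blocks
    8: (0, 1, 3, 4),  # 256-bit blocks
}


def shift_rows(state: bytes, nb: int) -> bytes:
    """Cyclically shift each row of a 4 x nb state to the left.

    Assumption: the state is stored column-major, so the byte at
    index 4 * col + row sits in the given row and column.
    """
    if len(state) != 4 * nb:
        raise ValueError("state length must be 4 * nb bytes")
    offsets = SHIFT_OFFSETS[nb]
    out = bytearray(4 * nb)
    for col in range(nb):
        for row in range(4):
            src_col = (col + offsets[row]) % nb
            out[4 * col + row] = state[4 * src_col + row]
    return bytes(out)
```

For a 256 bit block (`nb = 8`), each row holds eight bytes, which corresponds to the 64 bit rows described above.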
For mix column sub-module 34, matrix multiplication is performed on the shifted bytes in accordance with the mix column definition specified in the AES standard, using combinatorial logic; four, six, or eight blocks are used for data block lengths of 128, 192, or 256 bits, respectively.
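A minimal software sketch of the mix column transformation follows. It multiplies each 4-byte column by the fixed matrix defined in the AES standard, using multiplication by 2 and 3 in GF(2^8). The function names are illustrative; the loop in `mix_columns` stands in for the four, six, or eight column blocks that operate in parallel in hardware.

```python
def xtime(a):
    """Multiply by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a & 0xFF


def mix_single_column(col):
    """Multiply one 4-byte column by the AES MixColumns matrix."""
    a0, a1, a2, a3 = col

    def mul3(x):
        return xtime(x) ^ x

    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_columns(state: bytes) -> bytes:
    """Apply the column mix to every 4-byte column of a column-major state."""
    out = bytearray()
    for i in range(0, len(state), 4):
        out.extend(mix_single_column(state[i:i + 4]))
    return bytes(out)
```

Because each output byte is an XOR of shifted input bytes, the whole transformation reduces to combinatorial logic with no lookup table.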
Finally, key addition sub-module 36 exclusive-OR's the mix column output with the sub-keys received from key scheduling module 12, as prescribed by the AES standard, to generate the encrypted output. Sub-module 36 uses 128, 192 or 256 exclusive-OR gates to produce an output of 128, 192 or 256 bits, respectively.
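The key addition is a bitwise exclusive-OR, sketched below with an illustrative function name. Each bit of the result corresponds to one of the exclusive-OR gates mentioned above.

```python
def add_round_key(state: bytes, sub_key: bytes) -> bytes:
    """XOR the state with a sub-key of the same length."""
    if len(state) != len(sub_key):
        raise ValueError("state and sub-key must have the same length")
    return bytes(s ^ k for s, k in zip(state, sub_key))
```

Applying the same sub-key twice restores the original state, which is why the identical operation serves both encryption and decryption.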
Maximum parallel key scheduling module 12 has a data path wide enough to accommodate the maximum expected key length. Sub-keys are generated on the fly, in one clock cycle and in parallel with the encryption module. Key scheduling module 12 is arranged to accommodate the different key and block lengths allowed by the Rijndael algorithm or the AES standard, as necessary. The Rijndael algorithm allows block lengths and key lengths of 128, 192 and 256 bits, while the AES standard limits the block length to 128 bits. For the former case, the key scheduling module 12 is arranged to accommodate the nine different key length and block length combinations, and operates as defined in the Rijndael algorithm. For the latter case, only three combinations must be accommodated, with operation of the key scheduling module defined in the AES standard.
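The combinations mentioned above can be enumerated with a short sketch. The function names are illustrative; the round count uses the Rijndael definition max(Nb, Nk) + 6, where Nb and Nk are the block and key lengths in 32-bit words.

```python
from itertools import product

SIZES = (128, 192, 256)


def num_rounds(block_bits, key_bits):
    """Rijndael round count: max(Nb, Nk) + 6, with Nb and Nk in 32-bit words."""
    return max(block_bits, key_bits) // 32 + 6


def supported_combinations(rijndael=True):
    """List (block_bits, key_bits, rounds) for Rijndael or for the AES standard."""
    block_sizes = SIZES if rijndael else (128,)
    return [(n, k, num_rounds(n, k)) for n, k in product(block_sizes, SIZES)]
```

Calling `supported_combinations(rijndael=False)` yields the three AES cases, with 10, 12 and 14 rounds for 128, 192 and 256 bit keys respectively.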
The present architecture can support a chosen combination of key-length k and data block length N, which may require differing numbers of key schedule iterations and round transformations. As noted above, one round transformation per clock cycle is required. Consequently, the speed of the key-scheduling process must be adapted as k and N change. Depending on the parameter values, it may be necessary to complete 0, 1 or 2 key scheduling iterations per clock cycle to keep up with 1 round transformation per clock cycle. For example, with 256 bit data blocks and 128 bit sub-keys (N=256, k=128), 2 key schedule iterations are needed for each data block. Non-integral rates can also occur: for example, if N=128 and k=192, 1.5 key schedule iterations are required per data block.
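The way a non-integral rate turns into a whole number of iterations in each clock cycle can be sketched as follows. The sketch rests on a simplifying assumption that is not taken from the text: each key schedule iteration yields k bits of key material and each round consumes N bits, so the average rate is N/k iterations per round. The function names are hypothetical, and the rates this model yields need not match the figures quoted above.

```python
from fractions import Fraction
from math import ceil


def iterations_per_round(block_bits, key_bits):
    """Average key schedule iterations per round under the assumed model."""
    return Fraction(block_bits, key_bits)


def iteration_pattern(block_bits, key_bits, cycles):
    """Whole iterations to run in each clock cycle.

    The running total after cycle i is ceil((i + 1) * rate), so enough
    key material is always available before the round that needs it.
    """
    rate = iterations_per_round(block_bits, key_bits)
    return [ceil((i + 1) * rate) - ceil(i * rate) for i in range(cycles)]
```

Under this model, `iteration_pattern(256, 128, 4)` gives two iterations in every cycle, while `iteration_pattern(128, 256, 4)` alternates between one and zero; in all nine combinations the per-cycle count stays within 0, 1 or 2.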
One key scheduling architecture capable of accommodating these combinations is shown in
A simplified key scheduling architecture may be used when only three key and block length combinations must be accommodated; such an architecture is shown in
As noted above, controller 14 is preferably part of a hierarchical distributed control scheme comprising communicating finite state machines (FSMs); this avoids having the controller logic in the critical path, which might slow down the system. Such a control scheme is shown in
Note that the implementations of the control scheme, key scheduling module, and encryption module shown above are merely exemplary. Other designs could be used to implement these functions in accordance with the definitions given in the AES standard, as long as the encryption and key scheduling modules are made maximum parallel, and the architecture can implement one round of the AES algorithm in one clock cycle.
As noted above, the present AES architecture can be used with one of the feedback modes of operation. This is illustrated in
While particular embodiments of the invention have been shown and described, numerous variations and alternate embodiments will occur to those skilled in the art. Accordingly, it is intended that the invention be limited only in terms of the appended claims.
Claims
1.-24. (canceled)
25. An advanced encryption standard (AES) architecture which provides high throughput and low latency, comprising:
- a parallel encryption circuit that receives a plurality of data bytes to be encrypted and implements one round of the AES algorithm in one clock cycle;
- a parallel key scheduling circuit that generates sub-keys in one clock cycle in parallel with said parallel encryption module, said sub-keys provided to said parallel encryption module; and
- a controller that controls the operation of said parallel encryption and key scheduling modules such that said AES architecture performs one round of the AES algorithm in one clock cycle;
- wherein the parallel key scheduling circuit generates sub-keys and schedules operations in parallel with the maximum parallel encryption circuit, thereby permitting feedback used by the AES algorithm to increase parallelization of AES encryption.
26. The AES architecture of claim 25, further comprising:
- an asynchronous input buffer that receives data bytes to be encrypted, buffers a plurality of said data bytes in parallel, and provides parallel data bytes to said parallel encryption circuit; and
- an asynchronous output buffer that receives an output of said parallel encryption circuit and outputs encrypted data bytes to an output bus.
27. The AES architecture of claim 26, wherein said parallel encryption circuit comprises:
- a substitution sub-circuit comprising substitution blocks which are replicated as needed to receive all of said parallel data bytes from said asynchronous input buffer simultaneously;
- a shift row sub-circuit which receives the outputs of said substitution sub-circuit;
- a mix column sub-circuit which receives the outputs of said shift row sub-circuit; and
- a key addition sub-circuit that receives and combines the outputs of said mix column sub-circuit and said sub-keys from said parallel key scheduling circuit, and provides the results at an output, said output being the output of said parallel encryption circuit.
28. The AES architecture of claim 27, wherein said parallel encryption and key scheduling circuits are implemented exclusively with combinatorial logic.
29. The AES architecture of claim 26, wherein said controller is implemented with a hierarchical distributed control scheme comprising communicating finite state machines (FSMs), comprising:
- a main FSM; and
- local FSMs which are controlled by said main FSM, said local FSMs comprising: a parallel encryption circuit FSM which controls said parallel encryption circuit; a key scheduling circuit FSM which controls said key scheduling circuit; an input buffer FSM which controls said asynchronous input buffer; and an output buffer FSM which controls said asynchronous output buffer.
30. The AES architecture of claim 25, wherein said controller is implemented with a hierarchical distributed control scheme comprising communicating finite state machines (FSMs).
31. The AES architecture of claim 25, wherein said AES architecture implements a Rijndael algorithm with a data-block length of 128, 192 or 256 bits and a key-length of 128, 192 or 256 bits.
32. The AES architecture of claim 25, wherein said AES architecture implements the AES standard with a data-block length of 128 bits and a key-length of 128, 192 or 256 bits.
33. The AES architecture of claim 25, wherein said AES architecture implements an electronic code book (ECB) mode of operation.
34. The AES architecture of claim 25, wherein said AES architecture implements a feedback mode of operation.
35. The AES architecture of claim 25, comprising:
- a parallel encryption circuit that receives the output of said asynchronous input buffer and implements one round of the AES algorithm in one clock cycle;
- wherein said controller is a hierarchical distributed control scheme comprising communicating finite state machines (FSMs).
36. The AES architecture of claim 35, wherein said parallel encryption circuit comprises:
- a substitution sub-circuit comprising substitution blocks which are replicated as needed to receive all of said parallel data bytes from said asynchronous input buffer simultaneously;
- a shift row sub-circuit which receives the outputs of said substitution sub-circuit;
- a mix column sub-circuit which receives the outputs of said shift row sub-circuit; and
- a key addition sub-circuit that receives and combines the outputs of said mix column sub-circuit and said sub-keys from said parallel key scheduling circuit, and provides the results at an output, said output being the output of said parallel encryption circuit;
- each of said parallel encryption module sub-circuits implemented exclusively with combinatorial logic.
37. The AES architecture of claim 35, wherein said communicating FSMs comprise:
- a main FSM; and
- local FSMs which are controlled by said main FSM, said local FSMs comprising: a parallel encryption circuit FSM which controls said parallel encryption circuit; a key scheduling circuit FSM which controls said key scheduling circuit; an input buffer FSM which controls said asynchronous input buffer; and an output buffer FSM which controls said asynchronous output buffer.
38. The AES architecture of claim 35, wherein said AES architecture implements a Rijndael algorithm with a data-block length of 128, 192 or 256 bits and a key-length of 128, 192 or 256 bits.
39. The AES architecture of claim 35, wherein said AES architecture implements the AES standard with a data-block length of 128 bits and a key-length of 128, 192 or 256 bits.
40. The AES architecture of claim 35, wherein said AES architecture implements the electronic code book (ECB) mode of operation.
41. The AES architecture of claim 35, wherein said AES architecture implements a feedback mode of operation.
42. The AES architecture of claim 41, wherein said AES architecture implements a Cipher Block Chaining (CBC) feedback mode of operation.
43. The AES architecture of claim 41, wherein said AES architecture implements a Cipher Feedback (CFB) feedback mode of operation.
44. The AES architecture of claim 41, wherein said AES architecture implements an Output Feedback (OFB) feedback mode of operation.
Type: Application
Filed: Apr 10, 2007
Publication Date: Feb 7, 2008
Inventor: Ingrid Verbauwhede (Encino, CA)
Application Number: 11/786,191
International Classification: H04L 9/28 (20060101); H04L 9/00 (20060101);