MULTIPLICATION CIRCUITRY, SYSTEM, CHIP-CONTAINING PRODUCT, AND COMPUTER-READABLE MEDIUM
Multiplication circuitry comprises adder sub-arrays which each add partial products derived from first/second operands. Sub-array result values generated by the adder sub-arrays are added in a result assembly addition to generate at least one multiplication result value representing a result of signed multiplication of the first operand and the second operand. Sign extension emulation is performed for a sign-extension-emulated sub-array result value added in the result assembly addition, by applying a default zero extension to the sign-extension-emulated sub-array result value regardless of its sign and emulating an effect of sign extending the sign-extension-emulated sub-array result value using another of the assembled values. Another example of multiplication circuitry applies default zero extension to a third signed operand being added to a product of first/second signed operands, and emulates sign extension of the third signed operand by adjusting one of the partial products derived from the first/second signed operands.
The present technique relates to the field of data processing.
Technical Background
A processor may have logic circuitry for implementing various arithmetic or logical operations. One arithmetic operation to be supported by a processor may be a multiplication operation. While the arithmetic operation for the multiplication operation is well-defined, there is a design choice to be made in how to implement hardware circuit logic for performing that operation within a processor. Design decisions made by the circuit designer may have an impact on processing performance and/or energy efficiency.
SUMMARY
At least some examples of the present technique provide multiplication circuitry comprising:
- a plurality of adder sub-arrays, each adder sub-array to add a respective set of partial products to generate one or more sub-array result values representing a result of signed multiplication of a respective pair of portions of bits selected from a first operand and a second operand, the plurality of adder sub-arrays comprising separate instances of hardware circuitry, the plurality of adder sub-arrays having at least two separate enable control signals for independently controlling whether at least two subsets of the adder sub-arrays are enabled or disabled; and
- a result assembly adder array to perform a result assembly addition to add a plurality of assembled values including the sub-array result values generated by the plurality of adder sub-arrays, to generate at least one multiplication result value representing a result of signed multiplication of the first operand and the second operand;
- wherein for a sign-extension-emulated sub-array result value being added in the result assembly addition, the result assembly adder array is configured to perform sign extension emulation by:
- applying a default zero extension to the sign-extension-emulated sub-array result value regardless of a sign of the sign-extension-emulated sub-array result value, and
- performing the result assembly addition with at least one other of the plurality of assembled values having a value that, when added in the result assembly addition, emulates an effect of sign extending the sign-extension-emulated sub-array result value up to a bit position corresponding to the most significant bit of the at least one multiplication result value.
At least some examples of the present technique provide multiplication circuitry comprising:
- partial product selection circuitry to select a plurality of partial products based on a first signed operand and a second signed operand; and
- an adder array to add the plurality of partial products and a third signed operand; in which:
- the adder array is configured to apply a default zero extension to the third signed operand regardless of a sign of the third signed operand, and the partial product selection circuitry is configured to adjust one of the partial products added by the adder array to emulate an effect of sign extending the third signed operand.
At least some examples provide a system comprising: the multiplication circuitry according to either of the examples described above, implemented in at least one packaged chip; at least one system component; and a board, wherein the at least one packaged chip and the at least one system component are assembled on the board.
At least some examples provide a chip-containing product comprising the system described above assembled on a further board with at least one other product component.
At least some examples of the present technique provide a non-transitory computer-readable medium to store computer-readable code for fabrication of the multiplication circuitry of either of the examples described above.
Further aspects, features and advantages of the present technique will be apparent from the following description of examples, which is to be read in conjunction with the accompanying drawings.
In some examples, multiplication circuitry comprises two or more adder sub-arrays each to add a respective set of partial products to generate one or more sub-array result values representing a result of signed multiplication of a respective pair of portions of bits selected from a first operand and a second operand. Each adder sub-array comprises a separate instance of hardware circuitry. The adder sub-arrays have at least two separate enable control signals for independently controlling whether at least two subsets of adder sub-arrays are enabled or disabled. This approach can be useful to support multiplication operations on portions of the first operand and second operand corresponding to different data types, for example.
A result assembly adder array is provided to perform a result assembly addition to add a plurality of assembled values including the sub-array result values generated by the plurality of adder sub-arrays, to generate at least one multiplication result value representing a result of signed multiplication of the first operand and the second operand. Hence, as well as being operable independently (with unused adder arrays being able to be placed in a power saving state using the independent enable/disable control), the adder sub-arrays can also be used in a cooperative mode so that the respective sub-array result values are added together by the result assembly adder array, to produce at least one multiplication result value which represents a result of signed multiplication of the first operand and the second operand. The at least one multiplication result value may provide a wider multiplication result (with a greater number of bits) than the individual results produced by each adder array. In some cases, the result assembly adder array may generate a plurality of multiplication result values in a redundant form, e.g. generating a carry term and a save term according to a carry-save addition.
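As an illustrative sketch only, the cooperative mode can be modelled in Python for the assumed case of a 16-bit by 16-bit signed multiplication assembled from four 8-bit by 8-bit sub-array results. The function names, the 8-bit portion size and the 32-bit result width are assumptions chosen for illustration, and the model uses ordinary integer arithmetic rather than carry-save adder cells.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def assemble_signed_16x16(a, b):
    """Model a 16x16 signed multiply as an addition of four 8x8 sub-array results."""
    # Low portions carry no sign bit, so they are treated as unsigned.
    a_lo, b_lo = a & 0xFF, b & 0xFF
    # High portions contain the operand sign bits, so they are treated as signed.
    a_hi, b_hi = to_signed(a >> 8, 8), to_signed(b >> 8, 8)

    # One sub-array result per pair of portions (hypothetical 8x8 sub-arrays).
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Result assembly addition: each result is weighted by its bit position.
    total = lo_lo + ((lo_hi + hi_lo) << 8) + (hi_hi << 16)
    return to_signed(total, 32)
```

In this model the two cross terms (lo_hi and hi_lo) are signed values whose most significant bit lies below bit 31 of the assembled result, so in hardware they are the values which would otherwise need sign extending.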
Multiplication circuitry of this type may be useful, for example, to support at least two data element size configurations for multiplication of one or more respective pairs of data elements selected from the first operand and the second operand, each data element size configuration corresponding to a different combination of data element sizes for the data elements selected from the first operand and the second operand. For example, the data element size configurations can correspond to different SIMD element sizes for a SIMD (single instruction, multiple data) multiplication operation. For example, the first and second operands could be vector operands and each data element could be a respective vector element. In another example, the first and second operands could be matrix operands and each data element could be a respective matrix element. For example, each individual adder sub-array may be sized appropriately for a specific data element size configuration, and selection of which adder sub-arrays are enabled/disabled may depend on the current element size configuration in use. Use of at least two subsets of adder sub-arrays with independent enable/disable control can provide greater energy efficiency for implementing multiplications with different data element size configurations, compared to an implementation which provides a single large adder array (sized according to the maximum data element size) which relies on injecting 0s for some of the partial product bits when performing multiplications with a data element size smaller than the maximum supported size.
However, with this approach, for a signed multiplication operation, there may be greater complexity in implementing sign extensions of the sub-array result values in the result assembly addition. Such sign extension can have a negative impact on both circuit area, due to requiring additional adder cells to add sign extension bits, and processing performance, due to increased fanout (the extent to which logic gates of the adder array depend on results of earlier logic gates). For circuit designs with increased fanout, it may become challenging to meet targets on circuit timings, which can limit the maximum clock frequency that can be used and hence reduce performance.
In examples discussed below, for a sign-extension-emulated sub-array result value being added in the result assembly addition, the result assembly adder array is configured to perform sign extension emulation by: applying a default zero extension to the sign-extension-emulated sub-array result value regardless of a sign of the sign-extension-emulated sub-array result value, and performing the result assembly addition with at least one other of the plurality of assembled values having a value that, when added in the result assembly addition, emulates an effect of sign extending the sign-extension-emulated sub-array result value up to a bit position corresponding to the most significant bit of the at least one multiplication result value.
Hence, as a zero extension is applied by default to the sign-extension-emulated sub-array result value, this means there is no need to provide specific adder logic gates to add sign extension bits at positions more significant than the most significant bits of the sign-extension-emulated sub-array result value, saving circuit area. An effect of sign extending the sign-extension-emulated sub-array result value is emulated by setting at least one other assembled value being added in the result assembly addition so that an equivalent result is generated to the one that would have been generated if the sign-extension-emulated sub-array result value had been sign extended in the traditional manner by checking the value of a sign bit and injecting bits of equivalent value to the sign bit at all more significant bit lanes of the result assembly addition. The one or more other assembled values which are used to emulate the sign extension may have fewer bits that depend on earlier intermediate results of the multiplication operation, thus reducing fanout. With reduced fanout, the critical path delay through the slowest processing path through the multiplication circuitry can be reduced, and processing performance can be improved. Hence, performing sign extension emulation for the result assembly addition can reduce the circuit area of the multiplication circuitry and improve processing performance.
In some examples, the at least one other of the plurality of assembled values (whose value is set to emulate an effect of sign extension of a given sign-extension-emulated sub-array result value) comprises a static constant having a value selected independent of values of the first operand and the second operand. Use of a static constant helps to reduce fanout and hence improve performance, because the static constant is independent of the values of the first operand and the second operand and so can be hardwired in circuit logic or obtained in parallel with calculation of the sub-array result values, to reduce the critical path delay through the multiplication circuitry.
In some cases, the static constant may depend on the type of multiply operation being performed, so the result assembly adder array may select which static constant to use depending on a parameter defining the multiply operation. Typically, information on the operation type may become known at a much earlier timing than the operands, so the constant can still be regarded as relatively static in comparison to intermediate results which depend on the operands themselves.
While the examples above are described in relation to a single sign-extension-emulated sub-array result value, the sign extension emulation may be applied to more than one of the sub-array result values, so in some examples there may be two or more such sign-extension-emulated sub-array result values.
In some examples, the static constant may be shared between a plurality of sign-extension-emulated sub-array result values, the static constant having a value which when added in the result assembly addition provides emulation of sign extension of each of those plurality of sign-extension-emulated sub-array result values. By combining the respective constants which would emulate sign extensions for multiple sign-extension-emulated sub-array result values into a single shared constant, this reduces the number of assembled values to be added in the result assembly addition. Hence, the depth of the result assembly adder array can be reduced and this helps to improve performance and reduce circuit area.
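The idea of a shared constant can be sketched in Python as follows. This is a minimal model under stated assumptions, not a description of a specific circuit: each sign-extension-emulated value is described by a hypothetical tuple (raw bits, width n, left shift), its sign bit is inverted, and the per-value constants (subtraction of 1 at each sign bit position) are accumulated into one constant which depends only on the widths and shifts, not on the operand data.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def assemble_reference(values, width):
    """Reference: sign extend every value in the traditional way, then add."""
    mask = (1 << width) - 1
    return sum(to_signed(raw, n) << shift for raw, n, shift in values) & mask


def assemble_with_shared_constant(values, width):
    """Zero extend every value and add one shared, data-independent constant."""
    mask = (1 << width) - 1
    total = 0
    shared_constant = 0
    for raw, n, shift in values:
        # Invert the sign bit; the value is otherwise zero extended by default.
        flipped = (raw & ((1 << n) - 1)) ^ (1 << (n - 1))
        total += flipped << shift
        # Accumulate "-1 at the sign bit position" into the shared constant.
        shared_constant -= 1 << (n - 1 + shift)
    return (total + (shared_constant & mask)) & mask
```

Because shared_constant is built only from n and shift, a hardware equivalent could be fixed at design time, so a single extra assembled value serves all of the sign-extension-emulated values.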
In some examples, the at least one other of the plurality of assembled values also comprises a correction value injected relative to the sign-extension-emulated sub-array result value which, in combination with the static constant, emulates sign extending the sign-extension-emulated sub-array result value, the correction value comprising fewer bits than the static constant. In some cases, the correction value may comprise a single bit correction. The correction value can require fewer additional bits to be injected into the addition for sign extension compared to traditional sign extension methods, reducing circuit area and fanout.
The sign extension emulation may be applied to a number of different types of sign extension which may occur when performing the result assembly addition.
In some examples, the result assembly adder array is configured to perform a first type of sign extension emulation for a given sign-extension-emulated sub-array result value whose most significant bit is of lower significance than a most significant bit of the at least one multiplication result value, and which is generated by one of the adder sub-arrays based on a pair of portions of bits selected from the first operand and the second operand which includes a sign bit of at least one of the first operand or the second operand. Sub-arrays which generate a sub-array result value with a most significant bit equal to the most significant bit of the at least one multiplication result value, and sub-arrays which operate on a pair of portions of bits selected from the first and second operand which do not include a sign bit of at least one of the first operand or the second operand, do not have this first type of sign extension performed. Especially for the adder sub-arrays requiring the first type of sign extension which generate the sub-array result values of lowest significance, padding the upper end of those sub-array result values with sign extension bits can require a large number of additional bits, which greatly increases the circuit area cost and fanout. This can be avoided by performing sign extension emulation for this type of sign extension.
For the first type of sign extension emulation, the result assembly adder array may include in the plurality of assembled values added in the result assembly addition, at least one assembled value providing a same result as applying:
- a correction value of +1 at a bit position corresponding to a most significant bit of the given sign-extension-emulated sub-array result value; and
- a constant having a value which represents subtraction of 1 at a bit position corresponding to the most significant bit of the given sign-extension-emulated sub-array result value.
Adding +1 at the most significant bit clears the sign bit of the given sign-extension-emulated sub-array result value to 0, and therefore avoids the need for traditional sign extension bits (set to mirror the top bit of the value being sign-extended), but this would leave the overall addition result at a value which is too large, which can be corrected by also subtracting 1 at the bit position corresponding to the most significant bit. The +1 and −1 corrections can be applied separately using separate assembled values added in the result assembly addition, or could be combined into a single constant applied as a single assembled value in the result assembly addition. In the case of the +1 correction, this could also be applied within the adder sub-array which generates the given sign-extension-emulated sub-array result value, so that the assembled value which provides the same result as the +1 correction can be the given sign-extension-emulated sub-array result value itself (which may already have the +1 correction included at the point it is added by the result assembly adder array). Either way, unlike traditional sign extension bits, these corrections used to emulate the sign extension are static and do not depend on the actual sign bit of the sign-extension-emulated sub-array result value, so can be hardwired or computed in parallel with the sub-array result value. Compared to deriving sign extension bits once the sign bit becomes known, this helps to reduce the length and inter-connectedness of dependent chains of logic gates (fanout) and hence improve performance.
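The arithmetic behind the first type of sign extension emulation can be checked with a short Python sketch. The sketch assumes that the +1 correction is applied within the n-bit sub-array result value, so that any carry out of the most significant bit is discarded (which is equivalent to inverting the sign bit). The function names and the widths used are illustrative assumptions.

```python
def sign_extend_traditional(x, n, width):
    """Copy the sign bit (bit n-1) of x into every bit from n up to width-1."""
    x &= (1 << n) - 1
    sign = x >> (n - 1)
    extension = (((1 << (width - n)) - 1) << n) if sign else 0
    return x | extension


def sign_extend_emulated(x, n, width):
    """Zero extend x and rely on a +1 correction and a static constant instead."""
    mask = (1 << width) - 1
    x &= (1 << n) - 1
    # +1 at the most significant bit, with the carry out of bit n-1 discarded.
    corrected = (x + (1 << (n - 1))) & ((1 << n) - 1)
    # Static constant: subtraction of 1 at the most significant bit position.
    constant = (-(1 << (n - 1))) & mask
    return (corrected + constant) & mask
```

Neither the correction nor the constant is selected based on the sign bit of x, which is the property that allows both to be hardwired or prepared in parallel.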
In some implementations of the multiplication circuitry, each adder sub-array generates, as the one or more sub-array result values, a sum term and a carry term which when added together would give the result of the signed multiplication of the respective pairs of portions, and the result assembly adder array includes, as separate assembled values in the plurality of assembled values being added in the result assembly addition, the sum term and the carry term for a given adder sub-array. By adding the sum and carry terms as separate assembled values in the result assembly addition, this avoids any need for a carry propagate adder to add the sum and carry terms together to produce a non-redundant representation of the output of a given adder sub-array before the result assembly addition is performed. This helps improve performance because a carry propagate adder can be very slow in comparison to carry save additions which produce sum and carry terms.
However, for such an implementation which includes the sum term and carry term as separate assembled values to be added in the result assembly addition, there is a second cause of sign extension. This arises because, if the sum term and the carry term had been added by a carry propagate adder prior to the result assembly addition, this carry propagate addition would generate a carry out bit which if set to 1 would indicate that the product generated by a given adder sub-array was negative. While this carry is not actually generated as the carry propagate addition is not being performed, it should be accounted for and if the carry would be 1 then it would require sign extension to ensure that the sign of the product generated by the given adder sub-array is preserved during the result assembly addition. If handled using the traditional method of sign extension, this would again require a relatively large number of additional adder cells, as well as increasing fanout due to injecting a large number of sign extension bits which depend on the carry out bit from the sum/carry addition, which will then cause all subsequent levels of an adder reduction tree to also depend on that carry out bit.
In contrast, in some examples discussed below, the result assembly adder array may perform a second type of sign extension emulation to emulate a sign extension of a carry out which would be caused by addition of the sum term and the carry term from the given adder sub-array (if such an addition was actually performed—as noted above, it does not need to be performed). By using the sign extension emulation technique described above for the second type of sign extension, this again can save circuit area and improve performance by reducing the number of sign extension bits that need to be applied which are dependent on the specific values output from the given adder sub-array.
For the second type of sign extension emulation applied to the sum term and the carry term from the given adder sub-array, the result assembly adder array may include in the plurality of assembled values added in the result assembly addition:
- a correction value at a bit position one place higher than a most significant bit of the sum term, the correction value having opposite bit value to the carry out caused by addition of the sum term and the carry term; and
- a static constant having a value which represents subtraction of 1 at a bit position one place higher than the most significant bit of the carry term.
The combination of such a correction value and constant helps emulate the combined effect of injecting the carry out bit itself and emulating the carry out bit's sign extension.
For the second type of sign extension emulation, the result assembly adder array may select whether the correction value is 0 or 1 based on carry out bits obtained by the given adder sub-array when generating the sum term and the carry term. This can help improve performance because the correction value can be obtained by the given adder sub-array rather than needing a subsequent carry propagate adder to actually add the sum and carry terms.
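The equivalence relied on by the second type of sign extension emulation can be illustrated with the following Python sketch. It only checks the arithmetic identity between a carry out bit replicated at every more significant bit position and the combination of the correction value and static constant. For simplicity the carry out is computed here by actually adding the two n-bit terms, whereas the text above notes that the hardware can obtain it from carry out bits within the adder sub-array. Names and widths are illustrative assumptions.

```python
def carry_out_extension_traditional(sum_term, carry_term, n, width):
    """Replicate the carry out of sum_term + carry_term at bits n..width-1."""
    carry_out = ((sum_term + carry_term) >> n) & 1
    return (((1 << (width - n)) - 1) << n) if carry_out else 0


def carry_out_extension_emulated(sum_term, carry_term, n, width):
    """Use one correction bit and a static constant instead of a run of bits."""
    mask = (1 << width) - 1
    carry_out = ((sum_term + carry_term) >> n) & 1
    # Correction bit at position n with the opposite value to the carry out.
    correction = (1 - carry_out) << n
    # Static constant: subtraction of 1 at bit position n.
    constant = (-(1 << n)) & mask
    return (correction + constant) & mask
```

Only the single correction bit depends on the sub-array outputs, so far fewer bits of the result assembly addition depend on the carry out.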
Another type of sign extension may arise in implementations where the multiplication circuitry supports a negated signed multiplication operation in which the at least one multiplication result value represents −1 times a result of signed multiplication of the first operand and the second operand. In this case, if the negated signed multiplication operation is performed, sub-array result values whose most significant bit is of lower significance than a most significant bit of the at least one multiplication result value may require sign extending even if they are generated based on portions of the first operand and second operand which do not include either operand's most significant bit (unlike for standard non-negated multiplications where such sub-array result values would not need sign extension because result values which do not depend on the sign bits of the operands are treated as positively weighted in a signed value using two's complement notation). The sign extension emulation technique described above can also be used to eliminate costly sign extension bits for such a third type of sign extension.
Hence, for the negated signed multiplication operation, the result assembly adder array may perform a third type of sign extension emulation for a given sign-extension-emulated sub-array result value whose most significant bit is of lower significance than a most significant bit of the at least one multiplication result value, and which is generated by one of the adder sub-arrays based on a given pair of portions of bits selected from the first operand and the second operand where neither of the given pair of portions of bits selected from the first operand and the second operand includes a sign bit. As by definition the third type of sign extension emulation affects the adder sub-arrays which generate results of relatively low significance compared to the most significant bit of the overall multiplication result, this third type of sign extension would normally require a relatively large number of sign extension bits to be applied to sign extend the sub-array result value up to the most significant bits of the at least one multiplication result value. This would be relatively costly and this cost can be avoided by performing the third type of sign extension emulation.
For the third type of sign extension emulation, the result assembly adder array may include in the plurality of assembled values added in the result assembly addition:
- a static constant having a value which represents adding 1s at all bit positions more significant than a most significant bit of the given sign-extension-emulated sub-array result value; and
- a correction value at a bit position one place higher than a most significant bit of the given sign-extension-emulated sub-array result value, the correction value being 1 if one or both of the given pair of portions of bits selected from the first operand and the second operand is zero, and being 0 if both of the given pair of portions of bits selected from the first operand and the second operand are non-zero.
As whether the given pair of portions of bits of the first operand or the second operand are zero can be either known in advance, or computed relatively quickly compared to the main processing path through the multiplier circuitry, the correction value can be determined in advance or in parallel with calculation of the given sign-extension-emulated sub-array result value, so can be taken off the critical timing path. Hence, both the static constant and the correction value for the third type of sign extension emulation need not depend on the output of the sub-array generating the given sign-extension-emulated sub-array result value. Therefore, use of the third type of sign extension emulation can improve performance.
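A minimal Python sketch of the third type of sign extension emulation follows, assuming 8-bit unsigned portions, a 16-bit sub-array result value and a 32-bit assembled result (all illustrative assumptions, as are the function names). The sub-array result value is modelled as the low 16 bits of the negated product of the two portions.

```python
def negated_product_reference(a_lo, b_lo, width=32):
    """Reference: the negated product, sign extended to the full width."""
    return (-(a_lo * b_lo)) & ((1 << width) - 1)


def negated_product_emulated(a_lo, b_lo, n=16, width=32):
    """Zero extend the n-bit result and add a static constant and a correction."""
    mask = (1 << width) - 1
    # Sub-array result value, zero extended by default.
    result = (-(a_lo * b_lo)) & ((1 << n) - 1)
    # Static constant: 1s at all bit positions above the result's top bit.
    constant = mask ^ ((1 << n) - 1)
    # Correction at bit n: 1 only when either portion (hence the product) is 0.
    correction = (1 << n) if (a_lo == 0 or b_lo == 0) else 0
    return (result + constant + correction) & mask
```

The correction depends only on a zero check of the input portions, not on the sub-array output, which is why it can be prepared off the critical path in this model.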
It is not essential for all of the first, second and third types of sign extension emulation to be implemented. A given circuit implementation of the multiplication circuitry may implement any one or more of these types of sign extension emulation. Where more than one type of sign extension emulation is implemented, the constants for providing each type of sign extension emulation can be combined into a single shared constant whose value reflects the combined effects of each of the individual types of sign extension emulation.
Another type of operation which may involve costly sign extension may be a signed multiply-add operation for which first and second signed operands are multiplied and the result of the multiplication is added to a third signed operand. Typically, a multiplier for performing a stand-alone multiplication operation may handle sign extensions of the partial products generated from the first and second signed operands relatively efficiently, but when a further addition of the third signed operand is required, that third signed operand may be included as an additional value to be added in the partial product adder array, and typically its sign extension is handled in the traditional manner with bits equal in value to the most significant bit of the third signed operand being injected at every bit position more significant than the most significant bit of the third signed operand, up to the most significant bit of the overall multiplication result. This sign extension can be relatively costly for circuit area and performance.
In contrast, in some examples of multiplication circuitry described below, where an adder array adds a number of partial products (selected by partial product selection circuitry based on the first signed operand and the second signed operand) to a third signed operand, the adder array may apply a default zero extension to the third signed operand regardless of a sign of the third signed operand, and the partial product selection circuitry may adjust one of the partial products added by the adder array to emulate an effect of sign extending the third signed operand. Similar to the examples above, this conserves circuit area and reduces fanout, thereby improving performance for signed multiply-add operations.
In some examples, each partial product added by the adder array has a sign extension header to emulate sign extension based on a sign of the corresponding partial product. The partial product selection circuitry may adjust the sign extension header associated with a least significant partial product based on the sign of the third signed operand, to emulate the effect of sign extending the third signed operand. In particular, the partial product selection circuitry may set the sign extension header associated with the least significant partial product to have a value which is 1 lower when the third signed operand is negative than when the third signed operand is positive. This eliminates any need for a series of sign extension bits to be applied at the upper end of the third signed operand, hence reducing circuit area and improving performance as mentioned above.
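The effect of the header adjustment can be sketched arithmetically in Python. This is a simplified model under stated assumptions: the partial products are represented by their overall sum a*b, the third operand is n bits wide, and the sign extension header of the least significant partial product is assumed to have its least significant bit aligned with bit position n, so that making the header "1 lower" subtracts 1 at that position. The actual header encoding is not reproduced here.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply_add_emulated(a, b, c, n=16, width=32):
    """Compute a*b + c with c zero extended and one partial product adjusted."""
    mask = (1 << width) - 1
    # Third operand: zero extended regardless of its sign.
    c_zero_extended = c & ((1 << n) - 1)
    # Hypothetical header adjustment: 1 lower at bit n when c is negative.
    header_adjustment = -(1 << n) if c < 0 else 0
    # a * b stands in for the sum of the unadjusted partial products.
    total = a * b + header_adjustment + c_zero_extended
    return to_signed(total & mask, width)
```

The only dependence on the sign of the third operand is the single adjustment to one partial product, rather than a run of sign extension bits above the third operand.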
An apparatus may comprise processing circuitry to perform data processing in response to instructions; and the processing circuitry may comprise any of the examples of multiplication circuitry described above. For example, the processing circuitry could be a CPU (Central Processing Unit), GPU (Graphics Processing Unit) or other processing unit within a data processing system (e.g. a Neural Processing Unit provided for performing neural network processing or other machine learning operations).
The execute stage 16 includes a number of processing units, for executing different classes of processing operation. For example, the execution units may include an integer or fixed-point arithmetic/logic unit (ALU) 20 for performing arithmetic or logical operations on scalar operands or vector operands read from the register file 14; a floating point unit 22 for performing operations on floating-point values; a branch unit 24 for evaluating the outcome of branch operations and adjusting the program counter which represents the current point of execution accordingly; and a load/store unit 28 for performing load/store operations to access data in a memory system 8, 30, 32, 34. In this example the memory system includes a level one data cache 30, the level one instruction cache 8, a shared level two cache 32 and main system memory 34. It will be appreciated that this is just one example of a possible memory hierarchy and other arrangements of caches can be provided. The specific types of processing unit 20 to 28 shown in the execute stage 16 are just one example, and other implementations may have a different set of processing units or could include multiple instances of the same type of processing unit so that multiple micro-operations of the same type can be handled in parallel. It will be appreciated that
One example of an operation which may be supported by the processing circuitry 4 (e.g. within the ALU 20 or the floating point unit 22) is a multiplication operation. In some systems, a dedicated execution unit called a multiply-accumulate (MAC) unit may be provided to handle multiplications and multiply-accumulate operations (where two operands are multiplied and the result is added to an accumulator value). A multiply-accumulate (also known as multiply-add) operation may be frequently used in digital signal processing algorithms for example, so any techniques for improving energy efficiency and reducing pressure to meet circuit timings can be extremely helpful.
While some examples below discuss a multiplication operation for conciseness, this is intended to encompass multiply-add or multiply-accumulate operations, so even if a subsequent adder for adding the multiplication result to a third operand is not shown, such a subsequent adder could still be provided. It is also possible to provide standalone multiplication operations which produce a multiplication result without also adding the multiplication result to a third operand.
The multiplication circuitry 40, 50 described below uses a technique known as Booth multiplication, which is based on the principle that, when multiplying a first value (a multiplicand M) by a second value (a multiplier R) to obtain a multiplication result M*R, within the multiplicand M a string of consecutive binary 1s can effectively be replaced with a +1 at the bit position one place higher than the upper end of the string and a −1 at the bit position corresponding to the lower end of the string, which can help to reduce many of the partial products to zero and so make processor logic implementation more straightforward. This is analogous to 999 in decimal being equivalent to 1000−1. Hence, if considering a multiplication of 999*R, the “schoolbook” long multiplication approach would carry out a series of additions of partial products 900*R+90*R+9*R. With the Booth approach this could be reduced to 1000*R−1*R. Respective overlapping groups of bits (referred to as “Booth digits” below) of the multiplicand M can be analysed to look for patterns representing the start/end of runs of successive 1s, and this can be used to deduce each multiple of R to be selected as a respective partial product to be added to form the product result. Although Booth multiplication is described here, the sign extension emulation techniques described below could also be applied to the adders which add partial products in a multiplier which does not use Booth multiplication, e.g. one where the partial products are defined according to the traditional “schoolbook” multiplication approach shown for comparison below.
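The principle of replacing a run of 1s with a +1 above the run and a −1 at the bottom of the run can be illustrated with a simple bit-at-a-time (radix-2) recoding in Python. This is a sketch for illustration only: the function names are assumptions, the multiplicand is treated as unsigned, and the circuitry described here assesses several bits at a time rather than one.

```python
def booth_radix2_digits(m, bits):
    """Recode an unsigned `bits`-bit multiplicand into digits in {-1, 0, +1}.

    Digit i is +1 where a run of 1s ended just below bit i, -1 where a run
    of 1s starts at bit i, and 0 elsewhere.
    """
    digits = []
    previous = 0  # fixed 0 below the least significant bit
    for i in range(bits + 1):  # one extra position for the top of a run
        current = (m >> i) & 1
        digits.append(previous - current)
        previous = current
    return digits


def booth_radix2_multiply(m, r, bits):
    """Add or subtract shifted copies of r according to the recoded digits."""
    return sum(d * (r << i) for i, d in enumerate(booth_radix2_digits(m, bits)))
```

For example, m = 0b0111100 has a run of 1s at bits 2 to 5, and the recoding leaves only a −1 at bit 2 and a +1 at bit 6, so only two non-zero partial products remain instead of four.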
Booth multiplication involves three stages:
1. Booth Encoding the Multiplicand M.
The multiplicand M is logically partitioned into a series of overlapping Booth digits each corresponding to a subset of bits of the multiplicand M. For each Booth digit, the Booth encoder analyses the pattern of bits in that Booth digit and outputs, as a Booth encoding of that Booth digit, a partial product selection indicator which indicates which of a number of different multiples of the multiplier R should be selected as a corresponding partial product to be included in the set of partial products added to produce the multiplication result M*R. Different “radix” versions of the Booth encoding scheme can be provided, where the radix indicates how many bits of the multiplicand M are considered in each Booth digit. Neighbouring Booth digits overlap by 1 bit. The least significant Booth digit is padded with a fixed bit of 0b0 at the lower end. The most significant Booth digit is padded with at least one bit above the most significant bit of M. The padding bits correspond to a sign-extension of the multiplicand.
For example, for a Radix-4 Booth multiplication, each Booth digit comprises 3 bits, and neighbouring Booth digits overlap by 1 bit. For example, for an 8-bit multiplicand M having bits M[7:0], the Booth digits may comprise:
- M[1:0]:0b0 (the lower two bits of M concatenated with a fixed value of 0 at the lower end).
- M[3:1]
- M[5:3]
- M[7:5]
- 3 bits comprising a sign extension of M[7] (e.g. for unsigned values this would be 0b00: M[7] and for signed values represented in two's complement representation all three bits are set equal to M[7]).
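The partitioning listed above could be modelled as follows. This is a minimal sketch: the function name `radix4_booth_digits` and the representation of each digit as a 3-character string are assumptions made for illustration and do not come from the described hardware.

```python
def radix4_booth_digits(m: int, n_bits: int, signed: bool) -> list[str]:
    """Return the overlapping 3-bit Booth digits of m, least significant first."""
    m &= (1 << n_bits) - 1
    top = (m >> (n_bits - 1)) & 1
    ext = top if signed else 0  # padding above the most significant bit

    def bit(i: int) -> int:
        if i < 0:
            return 0      # fixed 0 padding below bit 0
        if i >= n_bits:
            return ext    # sign extension (or zero extension if unsigned)
        return (m >> i) & 1

    digits = []
    for low in range(-1, n_bits, 2):  # neighbouring digits overlap by 1 bit
        digits.append(f"{bit(low + 2)}{bit(low + 1)}{bit(low)}")
    return digits


print(radix4_booth_digits(0b00111000, 8, signed=True))
```

For an 8-bit multiplicand this yields five digits, matching the five entries in the list above.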
The Booth encoder implements hardware circuit logic that, based on the pattern of bits in a given Booth digit, determines which multiple of the multiplier R should be selected for a corresponding partial product. The rules for which multiple to select for a given pattern of bits in a Booth digit are based on whether a run of successive 1s starts or ends within that Booth digit. For example, if all bits are 0 or all bits are 1 within the Booth digit, the multiple to select is 0*R because there is no run of 1s starting or ending within the Booth digit. For Booth digits involving a mix of 0s and 1s, the multiple to select depends on the position where any transition from 0 to 1 or 1 to 0 occurs, so that the multiple implements the combined effect of (i) adding a +R multiple at a bit position one higher than the top bit of any run of successive 1s occurring within the Booth digit and (ii) adding a −R multiple at a bit position corresponding to the bottom bit of any run of successive 1s occurring within the Booth digit. However, as the Booth digits are assessed multiple bits at a time, for Radix-4 or higher-radix Booth encoding, higher multiples of R such as ±2*R are considered to account for the fact that the +R or −R multiple could be injected at different bit positions within the Booth digit. Some worked examples are discussed below for Radix-4 and Radix-8, but it will be appreciated that other radix values could be used. For each Booth digit of the multiplicand M, the Booth encoder outputs a partial product selection indicator indicating which of a set of candidate multiples of the multiplier R should be selected for a corresponding partial product.
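For Radix-4, the selection rule described above reduces to a well-known arithmetic form: a 3-bit digit b2 b1 b0 selects the multiple (−2·b2 + b1 + b0) of R. The sketch below tabulates that standard rule. The names `radix4_multiple` and `RADIX4_RULES` are illustrative assumptions, and a hardware encoder would implement the same mapping as combinational logic rather than arithmetic.

```python
def radix4_multiple(digit: str) -> int:
    """Map a 3-bit Booth digit (msb first) to a multiple in {-2, -1, 0, +1, +2}."""
    b2, b1, b0 = (int(ch) for ch in digit)
    return -2 * b2 + b1 + b0


# Full rule table, keyed by bit pattern.
RADIX4_RULES = {f"{d:03b}": radix4_multiple(f"{d:03b}") for d in range(8)}

for pattern, multiple in RADIX4_RULES.items():
    print(pattern, f"{multiple:+d}*R" if multiple else "0")
```

The all-zeros and all-ones digits both select 0, consistent with there being no run of 1s starting or ending within the digit.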
2. Selection of Partial Products
The required multiples of the multiplier R are prepared (this can be done in parallel with the Booth encoding). For example, for Radix-4 operations, the multiples of R that could be selected for a given Booth digit are: +2*R, +R, 0, −R, −2*R. For Radix-8 the multiples extend from +4*R to −4*R. Hence, partial product selection circuitry may form the multiple values. Forming the multiple values may include negation of R for forming the negative multiples, left shifting of R to form power-of-2 multiples such as ±2*R and ±4*R, and, if the radix is such that a non-power of 2 multiple such as ±3*R is required, addition of other multiples (e.g. adding ±2*R and ±R to form the ±3*R multiple).
From among the candidate multiples, for a given Booth digit partial product selection circuitry selects one of the multiples, based on the partial product selection indicator provided by the Booth encoder for that given Booth digit.
3. Addition of Partial Products
The partial products selected for each Booth digit are added together (with an appropriate alignment between the partial products for adjacent Booth digits to account for the relative magnitude of the partial product based on the position where the Booth digit was found within the multiplicand M).
To illustrate Booth multiplication, consider a multiplication M*R where M and R are both 8-bit values and the decimal values corresponding to M and R are M=56 and R=47:
- M=00111000
- R=00101111
With a traditional “schoolbook” long multiplication method, this can be converted into a series of partial products as follows (where PPi is +R if the corresponding bit i of M is 1 and is 0 if the corresponding bit i of M is 0):
- Add partial products (shifted by 1 each time):
9 partial products for 8*8 bit multiplication
With the schoolbook approach, the run of successive 1s in M causes three partial products PP3, PP4, PP5 to include +R multiples. With the Booth approach, the same result could have been achieved by adding +R for PP6 and adding −R for PP3, but with Radix-2 (which would correspond to Booth digits each comprising 2 bits), this would not enable the number of partial products to be reduced. Hence, most practical Booth implementations use Radix-4 or higher.
For Radix-4, the Booth digits are selected based on the bits of M as explained earlier, and the encoding rules are as follows:
Using the same example of M=56 and R=47, the multiple values available for selection for each partial product are:
- +2R=01011110
- +R=00101111
- 0=00000000
- −R=11010001
- −2R=10100010
and the multiplicand is:
- M=00111000
Hence, the Booth digits selected from the multiplicand M, and the corresponding partial products selected for each Booth digit according to the encoding rules shown above are as follows:
Adding partial products (shifted by 2 each time to account for relative alignment of each Booth digit) gives:
Hence, the same numeric result can now be achieved while adding only 5 partial products, rather than 9.
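The Radix-4 example can be reproduced with a short behavioural sketch. It is offered under stated assumptions: the function name `radix4_booth_multiply` is hypothetical, the standard rule (−2·b2 + b1 + b0) is used for each digit, and Python integers stand in for the fixed-width two's complement rows a hardware adder would use.

```python
def radix4_booth_multiply(m: int, r: int, n_bits: int) -> tuple[list[int], int]:
    """Signed n_bits x n_bits multiply via Radix-4 Booth recoding of m.

    Returns the multiple selected for each Booth digit (least significant
    first) and the product obtained by adding the aligned partial products.
    """
    pattern = m & ((1 << n_bits) - 1)
    sign = (pattern >> (n_bits - 1)) & 1

    def bit(i: int) -> int:
        if i < 0:
            return 0
        if i >= n_bits:
            return sign
        return (pattern >> i) & 1

    multiples = []
    for low in range(-1, n_bits, 2):
        multiples.append(-2 * bit(low + 2) + bit(low + 1) + bit(low))

    # Each successive partial product is shifted left by a further 2 bits.
    product = sum((k * r) << (2 * j) for j, k in enumerate(multiples))
    return multiples, product


multiples, product = radix4_booth_multiply(56, 47, 8)
print(multiples, product)
```

For M=56 only two of the five partial products are non-zero (−2R and +R), and their aligned sum equals 56*47.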
Similarly, for Radix-8, the Booth encoding rules are as follows:
Applying this to the same M=56 and R=47 example gives:
- M=00111000
- +4R=10111100
- +3R=10001101
- +2R=01011110
- +R=00101111
- 0=00000000
- −R=11010001
- −2R=10100010
- −3R=(1) 01110011
- −4R=(1) 01000100
- BD0=M2:M−1=000(0)->PP0=0
- BD1=M5:M2=1110->PP1=−R=11010001
- BD2=M8:M5=(0)001 (bit 8 sign extended from bit 7)->PP2=+R=00101111
- Add partial products (shift by 3 each time):
Now the result can be achieved in 3 partial products.
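A corresponding sketch for Radix-8 is given below. The function name `radix8_booth_multiples` is an assumption, and the digit rule used, (−4·b3 + 2·b2 + b1 + b0), is the standard arithmetic form of Radix-8 Booth recoding rather than something quoted from the rule table.

```python
def radix8_booth_multiples(m: int, n_bits: int) -> list[int]:
    """Multiples in -4..+4 selected by each 4-bit Booth digit of signed m."""
    pattern = m & ((1 << n_bits) - 1)
    sign = (pattern >> (n_bits - 1)) & 1

    def bit(i: int) -> int:
        if i < 0:
            return 0
        if i >= n_bits:
            return sign
        return (pattern >> i) & 1

    multiples = []
    for low in range(-1, n_bits, 3):  # digits M[2:-1], M[5:2], M[8:5], ...
        multiples.append(
            -4 * bit(low + 3) + 2 * bit(low + 2) + bit(low + 1) + bit(low)
        )
    return multiples


R = 47
multiples = radix8_booth_multiples(56, 8)
# Each successive partial product is shifted left by a further 3 bits.
product = sum((k * R) << (3 * j) for j, k in enumerate(multiples))
print(multiples, product)
```

For M=56 the three digits select 0, −R and +R, so the sum is −R·8 + R·64 = 56·R.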
Hence, an approach using a higher radix can treat the multiplicand M as including fewer Booth digits and hence require fewer partial products to be added, but this is at the expense of increased complexity in having more options for the multiple selection (which will increase the circuit complexity of generating the multiple values and the multiplexers which select the multiple, as well as the complexity of the Booth encoder).
In the example of
The Booth digit encoding and partial product selection are performed according to the Booth multiplication technique discussed above. The Booth partial product generator 42 could operate according to any radix (e.g. radix 4 or radix 8) and may be implemented according to any known Booth encoding/partial product selection technique. Nominally, for N-bit operands, there are N/2+1 Booth digits and hence N/2+1 partial products. However, for a signed multiplication of 32-bit operands with Radix-4 Booth encoding, the 17th Booth digit is not needed as it may always correspond to either 000 or 111 and so correspond to a multiple of 0*opb, so does not require explicit addition (the (N/2+1)th Booth digit may be used for unsigned multiplications which do not require the sign extension techniques described here).
The adder array 44 is a carry-save adder tree which performs several stages of 3:2 carry-save additions to reduce the partial products to a sum term and a carry term. The carry propagate adder 46 adds the sum and carry terms to produce a single result value in a non-redundant representation.
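The reduction performed by a carry-save adder tree followed by a carry propagate adder can be sketched behaviourally as below. The function names and the simple grouping of rows three at a time are assumptions for illustration; an actual tree would be laid out column by column and need not group rows in this order.

```python
def csa_3to2(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """One 3:2 carry-save stage: each column is handled independently."""
    mask = (1 << width) - 1
    sum_term = (a ^ b ^ c) & mask
    carry_term = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_term, carry_term


def reduce_to_sum_and_carry(rows: list[int], width: int) -> tuple[int, int]:
    """Apply 3:2 stages until only a sum term and a carry term remain."""
    rows = [r & ((1 << width) - 1) for r in rows]
    while len(rows) > 2:
        groups = len(rows) // 3
        reduced = []
        for g in range(groups):
            reduced.extend(csa_3to2(*rows[3 * g:3 * g + 3], width))
        reduced.extend(rows[3 * groups:])  # leftover rows pass to the next stage
        rows = reduced
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


def carry_propagate_add(sum_term: int, carry_term: int, width: int) -> int:
    """Final addition producing a non-redundant width-bit result."""
    return (sum_term + carry_term) & ((1 << width) - 1)


# Radix-4 partial products for 56*47, as 16-bit two's complement rows.
rows = [0, (-94 << 2) & 0xFFFF, 0, 47 << 6, 0]
s, c = reduce_to_sum_and_carry(rows, 16)
print(carry_propagate_add(s, c, 16))
```

Carries only ripple in the final `carry_propagate_add` step, which is why the intermediate stages can be fast.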
For a signed multiplication of the first operand opa and the second operand opb, the addition performed by the carry-save adder tree takes account of the sign of each partial product, which may depend not only on whether the Booth digit generated by encoding the first operand opa causes a positive or negative multiple to be selected as the partial product, but also on the sign of the second operand opb whose multiples are selected for the partial products.
On the face of it, this would therefore require some sign extension bits to be added at all bit positions more significant than the most significant bit of each partial product, for example:
- PP0 aaaaaaaA . . .
- PP1 bbbbbB . . .
- PP2 cccC . . .
- PP3 dD . . .
- . . .
where A, B, C, D are the most significant bits of partial products PP0, PP1, PP2, PP3 respectively (corresponding to a sign bit), and a, b, c, d are sign extension bits of the same value as A, B, C and D respectively. While this simplified example does not require many sign extension bits, for larger operand sizes such as that shown in the example of FIG. 3 (with 32-bit operands opa and opb), performing full sign extension can be relatively costly because the sign extension bits impact the area, power and timing of the multiplication circuitry.
The combined effect of all the sign extensions can instead be implemented by providing each partial product with a sign extension header 48 which is injected at a few additional bit positions more significant than the upper bit C of each partial product. The sign extension headers 48 for the partial products are selected by the partial product generator 42 in parallel with selection of the significant bits A, B, C, D etc. for the partial products. As shown in
If a signed multiply-add operation (MAC or MLA) is performed to compute C+A×B, this can be achieved by simply passing operand C (opc) to the adder array 44 as another partial product, as shown in
As shown in
Hence, as shown in
The top four rows of the table show the case when opc is positive and so the sign extension header 48-0 for PP0 is computed according to ˜S, S, S, C as described above. The lower four rows show the case when opc is negative and so the sign extension header is 1 lower than when opc is positive to reflect subtraction of 1. With this change, it can be seen that the only new output value for PP0's sign extension header 48-0 is “0101” (5 in decimal), so this change would be relatively low cost to implement in hardware.
Hence, by adjusting the sign extension header 48-0 associated with the least significant partial product PP0 based on the sign of the third signed operand opc, this can emulate the effect of sign extending the third signed operand, making it unnecessary to apply sign extension to the third signed operand opc. Instead, a default zero extension can be applied to opc regardless of the actual sign of opc. This avoids the added cost of adding the sign extension bits X shown in
At step 102, a default zero extension is applied to the third signed operand opc regardless of its sign.
At step 104, the adder array 44 adds the partial products selected at step 100 (including the least significant partial product with its adjusted sign extension header) and the third signed operand (with the default zero extension applied). The adder array 44 may be a 3:2 carry save adder tree which produces its result in a redundant form comprising a sum term and a carry term. A carry propagate adder 46 may then add the sum and carry terms to produce a non-redundant result in two's complement form, representing C+A*B, where C is the third signed operand, A is the first signed operand and B is the second signed operand.
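The arithmetic behind this emulation can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, and the subtraction of 1 is applied directly at the bit position just above the most significant bit of opc, whereas in the described circuitry the adjustment is folded into the sign extension header of the least significant partial product.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Reference: conventional sign extension of a from_bits pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def emulated_extension(opc: int, opc_bits: int, result_bits: int) -> int:
    """Zero-extend opc regardless of sign, then subtract 1 above its msb if negative."""
    pattern = opc & ((1 << opc_bits) - 1)   # default zero extension
    sign = pattern >> (opc_bits - 1)
    adjustment = -(sign << opc_bits)        # assumed position of the -1 adjustment
    return (pattern + adjustment) & ((1 << result_bits) - 1)


for opc in (-2**31, -1, 0, 1, 2**31 - 1):
    assert emulated_extension(opc, 32, 64) == sign_extend(opc, 32, 64)
print("zero extension plus adjustment matches sign extension")
```

Only a single bit position depends on the sign of opc in the emulated form, instead of every bit above its most significant bit.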
As shown in
For one of the data element size configurations, the first and second operands may be treated as single data elements to be multiplied together, but for other data element size configurations each of the first and second operands may be logically divided into multiple independent data elements and the product result to be generated may be a vector or matrix comprising a number of result data elements each corresponding to the product of a corresponding pair of data elements of the first and second operands. For a given multiply operation, the data element size configuration information may depend on an immediate operand or register operand of an instruction executed by the processing circuitry 4, and/or on element size mode information stored within a system register of the processing circuitry 4. The data element size configuration information may vary from one multiplication operation to another.
The Booth encoding circuitry 52 Booth encodes the first operand src_a, to generate a set of partial product selection indicators 62 which each correspond to a Booth encoding of a respective Booth digit of the first operand. The Booth encoding is generated based on the bit patterns of the corresponding Booth digit, according to the encoding rules shown for radix-4 or radix-8 above (or alternatively, if higher radix is used for the Booth encoding, according to similar rules for that higher radix).
The partial product selection circuitry 54 selects, based on the second operand src_b and the partial product selection indicators 62, the sets of partial products to be added by each of the adder sub-arrays 56. For example, a first set of partial products “pps 0” is selected for adder array 56-0, a second set of partial products “pps 1” is selected for adder array 56-1, and so on. The partial product selection may also depend on the data element size configuration information (e.g. different portions of the second operand src_b may be used to select the partial products, depending on whether the data element size configuration information indicates use of a cooperative mode where the adder sub-arrays operate cooperatively to compute the largest data type or a non-cooperative mode where one or more adder sub-arrays work independently to compute SIMD product results for smaller data types). For each partial product, the partial product is a selected multiple of a corresponding portion of the second operand src_b, where that multiple can range from +2*R to −2*R for radix-4 and from +4*R to −4*R for radix-8 (where R is the value corresponding to the selected portion of bits of the second operand src_b that is relevant for a given adder sub-array 56). Different adder sub-arrays 56 may have their partial products selected based on different portions of the second operand src_b. As well as selecting the partial products, the partial product selection circuitry 54 may also include circuitry for generating the multiple values available for selection as the partial products—e.g. including shifting circuitry, negation circuitry, and/or adding circuitry to generate the required +2*R to −2*R or +4*R to −4*R multiples for each portion of the second operand src_b. The circuitry for generating the multiple values based on src_b may operate in parallel with the Booth encoding circuitry 52 generating the partial product selection indicators 62 based on the first operand src_a.
Each adder sub-array 56 receives its set of partial products, and when enabled based on a corresponding enable control signal 64 provided by the enable control circuitry 60, adds its partial products to generate at least one corresponding sub-array result value 66 which represents a result of multiplication of a respective pair of portions of bits selected from the first operand src_a and the second operand src_b. The one or more sub-array result values 66 for a given adder sub-array 56 represent the numeric result M*R of the product of M (a number represented by a selected portion of bits of the first operand src_a) and R (a number represented by a selected portion of bits of the second operand src_b). For each adder sub-array 56, the portions selected from the first and second operands to represent M and R may be different. To speed up addition of partial products, each adder array 56 may be implemented as a carry-save-adder tree which performs a series of carry-save additions (not carry-propagate additions), which reduces processing time by allowing parallel processing of additions in different bit lanes because there is no dependence of the addition in one bit lane on carries generated in lower bit lanes. Hence, in some examples, the sub-array-result values 66 for a given adder may be represented in a carry-save representation using a sum term and a carry term. To generate a binary result for M*R in a two's complement representation, this may require a further addition of the sum term and the carry term using a carry-propagate adder (not shown in
The respective adder sub-arrays 56 are sized to handle different data element configurations within the first and second operands. For example, one subset of adder sub-arrays 56 may implement the additions of partial products for respective pairwise multiplications of pairs of 8-bit data elements within the first and second operands. A second subset of adder sub-arrays 56 may implement the partial product additions for respective pairwise multiplications of pairs of 16-bit data elements within the first and second operands. A third subset of adder sub-arrays 56 may implement the partial product additions for respective pairwise multiplications of pairs of 32-bit data elements within the first and second operands. It will be appreciated that this is just one example of different data element configurations that can be implemented. However, it can be useful to provide separate distinct adder sub-arrays sized appropriately for each data element size configuration, rather than implementing all the data element sizes using a single larger adder array, as this can be more energy efficient because it allows the adder sub-arrays 56 corresponding to data element size configurations not required for a given multiplication operation to be disabled to save power.
In this example, each adder sub-array 56 has its own independent enable control signal 64 which is set independently by the enable control circuitry 60 to independently control whether each adder sub-array is currently enabled or disabled. For example, each enable control signal 64 may be a clock signal used to clock components of the adder sub-array 56, so the enable control circuitry 60 may disable a given adder sub-array by clamping the corresponding clock signal to a fixed value. By preventing the clock signal from toggling, the adder sub-array can be disabled and dynamic power is saved. Other implementations may use a different form of enable control, such as power gating where the enable control signal 64 controls enabling/disabling of the adder sub-array 56 by turning on/off a power gate which controls whether the adder sub-array 56 is coupled to or isolated from a power supply node.
Other examples may control the independent enable/disable of the adder sub-array at a coarser granularity, on a subset by subset basis. For example, subsets of adder sub-arrays 56 each corresponding to a given data size could each be provided with an independent enable control signal 64, but adder arrays within the same subset could be enabled/disabled collectively based on the same enable control signal. However, in practice, having independent enable/disable control for each adder sub-array 56 as shown in the example of
The respective adder sub-arrays 56 can also be used cooperatively to implement a larger multiplication, such as the multiplication of wider portions of bits of src_a and src_b which comprise all magnitude-indicating bits of the first and second operands. When the data element size configuration information indicates that the cooperative mode is to be used, the respective sub-array result values 66 produced by at least a subset of adder arrays (e.g. all of the adder arrays) are added together by the result assembly adder array 58, to produce a product result indicating the numeric value corresponding to the product of wider portions of src_a and src_b than the portions considered by any individual adder array (again, the product may initially be produced by result assembly adder array 58 in a redundant form using separate sum/carry terms, so there may be a further carry propagate adder not shown in
Hence, a larger adder array for a multiplier is constructed from a number of smaller (sub) arrays 56. Some of the smaller subarrays are sized (in terms of power and area) for multiplying smaller data types in a non-cooperative mode. The subarrays 56 can also be used cooperatively to construct larger logical arrays (e.g. for the largest data type).
When the data element size configuration to be used is 8-bit, 16-bit or 32-bit, the corresponding subset of adder arrays is enabled and each adder array within that subset receives partial products selected depending on a corresponding pair of 8/16/32-bit data elements within the first and second operands src_a, src_b. The results of each adder array can be assembled into a vector result (e.g. by result assembly circuitry 58 which in
It is possible to operate the adder sub-arrays so that the subsets of adder arrays for more than one of the data element size configurations are enabled in parallel, to produce (based on the same source operands src_a, src_b) a first vector result corresponding to one size configuration (e.g. 32-bit) and a second vector result corresponding to a second size configuration (e.g. 16-bit). This may require the result assembly circuitry to be duplicated to allow for output of multiple independent results in the same cycle.
In the 64-bit cooperative configuration, all the adder sub-arrays are enabled, and the respective product representing values 66 generated by the adder arrays are further added by the result assembly adder tree 58 to produce a 64-bit multiplication result. Optionally, for a multiply-accumulate operation, a further adder 72 may add the multiplication result (or a vector of multiplication results for each data element lane) to corresponding elements of a third operand (still in carry save form to speed up the further adder 72 compared to carry propagate additions).
Optionally, the sum and carry terms output by the result assembly circuitry 58 or further adder 72 can be added by a carry propagate adder to produce a result in 2's complement representation, but this is not essential as often the multiply operation may be one of a series of multiply-accumulate operations and so it may be more efficient to retain the result in carry-save form to allow the further adder 72 to perform a faster addition to a previous accumulation result also in carry-save form (with the carry-propagate operation for converting to 2's complement being deferred until after the final accumulation is performed).
If the subarray multiplier is used to implement a signed multiplication operation, this introduces an additional complexity in the result assembly addition performed by the compression tree 58, because the sub-array carry/save results 66 may require sign extension to ensure that the relative sign of each adder sub-array's output is preserved when assembled into the full width product by the result assembly adder array 58. This can introduce a significant amount of additional circuit logic and increase circuit fan out making it challenging to meet circuit timings.
At step 206, the result assembly adder array 58 performs the result assembly addition to add the plurality of assembled values, which include the sub-array result values generated by the adder sub-arrays 56 at step 200 (with default zero extensions applied), and may also include one or more additional assembled values such as the constant(s) or single bit corrections mentioned above. The result assembly adder array 58 generates, as a result of the result assembly addition, at least one multiplication result value representing a result of a signed multiplication of the first operand and the second operand.
This approach is particularly useful for a subarray multiplier which supports two or more data size configurations, so has a relatively large number of individual adder sub-arrays 56 as shown in
However, to simplify the explanation, the sign extension emulation techniques are first described for a simpler example shown in
Hence, for this simplified example only two data size configurations are supported. The first is a 32*32-bit multiplication, where adder sub-array FW_0 is used to multiply the lower 32 bits of src_a by the lower 32 bits of src_b and the two Extrabits adder sub-arrays 56 are disabled for power saving. The second is a 64*64-bit multiplication, where the three sub-arrays FW_0, Extrabits_0, Extrabits_1 each calculate their respective sub-array results (FW_0 operating in the same way as for the 32*32-bit multiplication, Extrabits_0 multiplying the upper 32 bits of src_a by the lower 32 bits of src_b, and Extrabits_1 multiplying all 64 bits of src_a by the upper 32 bits of src_b).
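The way the three sub-array results combine into the 64*64-bit product can be checked with a small behavioural model. This sketch makes assumptions that are not stated in the text: in the cooperative mode the lower 32-bit halves are treated as unsigned magnitudes, the upper halves and the full src_a carry the sign, and both the Extrabits_0 and Extrabits_1 results are aligned 32 bits above the FW_0 result. The function names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def to_signed(pattern: int, bits: int) -> int:
    """Interpret a bits-wide pattern as a two's complement value."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def cooperative_multiply_64(src_a: int, src_b: int) -> int:
    """Assemble a signed 64*64-bit product from three sub-array results."""
    a_pat, b_pat = src_a & MASK64, src_b & MASK64
    a_lo, b_lo = a_pat & MASK32, b_pat & MASK32       # unsigned lower halves
    a_hi = to_signed(a_pat >> 32, 32)                 # signed upper halves
    b_hi = to_signed(b_pat >> 32, 32)
    a_full = to_signed(a_pat, 64)

    fw_0 = a_lo * b_lo              # lower 32 bits of src_a x lower 32 bits of src_b
    extrabits_0 = a_hi * b_lo       # upper 32 bits of src_a x lower 32 bits of src_b
    extrabits_1 = a_full * b_hi     # all 64 bits of src_a x upper 32 bits of src_b

    return fw_0 + (extrabits_0 << 32) + (extrabits_1 << 32)


print(cooperative_multiply_64(-5, 7))
```

In this model the assembly is an exact integer sum; the fixed-width result assembly addition is where the sign extension cost discussed below arises.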
As shown in
However, as shown in
A first type of sign extension “EB_0_sign_extension[127:96]” is shown for the Extrabits_0 sum term. The first type of sign extension would apply for a subarray which produces signed outputs, and for which in the result assembly addition the most significant bit for the result of that sub-array is aligned to a bit of the overall multiplication result other than the most significant bit. A sub-array will produce signed outputs if its inputs include either source's most significant bit. Hence, in the simplified example of
A second type of sign extension “FW_0_booth_cout[127:64]” and “EB_0_booth_cout[127:96]” is shown for the FW_0 and Extrabits_0 carry terms, and arises because the sum and carry terms from these sub-arrays were not added together by a carry propagate adder before starting the result assembly addition. If a carry propagate addition had been performed on the sum and carry terms before the result assembly addition started, that carry propagate addition could have caused a carry out which if set to 1 would represent a negative signed result, and so require a sign extension when combined with other results in the result assembly addition. Hence, the sign extension “FW_0_booth_cout[127:64]” would have each bit set equivalent to the carry out which would be generated in an addition of FW_0_sum and FW_0_carry, while the sign extension bits for “EB_0_booth_cout[127:96]” would have each bit set equivalent to the carry out which would be generated in an addition of Extrabits_0_sum and Extrabits_0_carry.
Hence, the result assembly addition requires a significant number of additional sign extension bits to be injected into the assembly tree, which not only requires additional circuit logic compared to an unsigned addition, but also increases the amount of circuit fanout since the sign extension bits depend on bits derived from the sum/carry values from the respective adder arrays and once introduced at one row of the addition tree cause subsequent rows of adder cells to depend on the output of the previous row. This increased circuit fanout causes longer critical path lengths, putting greater pressure on meeting circuit timings, which may tend to limit the maximum clock frequency that can be supported. Hence, sign extensions can be costly in terms of performance.
- If the value to be sign extended is negative, its msb (most significant bit, shown bold underlined) is 1 and the sign extension would be S . . . SSS1xxxxxx where the bits denoted by S are the sign extension bits and are also equal to 1.
- For a negative signed value subject to sign extension, adding 1 at the most significant bit of the value being sign extended causes the sign-extended value to change from 1 . . . 1111xxx . . . xxx to (1)0 . . . 0000xxx . . . xxx (where the 0 bit shown bold underlined is the bit at the corresponding msb position in the value being sign extended, and the bracketed 1 represents a carry out of 1 which can be ignored as it will be cancelled out by the −1 correction described below). This adjusted value does not require any explicit adder cells to add the 0s above the most significant bit position, but is too high compared to the value that should have been added (it represents 10 . . . 0000xxx . . . xxx instead of 01 . . . 1111xxx . . . xxx).
- Hence, to obtain the correct result, it is also necessary to subtract 1 at the msb of the value being sign extended.
- On the other hand, if the original value to be sign extended had been positive, its msb would be 0 and the sign-extension would be S . . . SSS0xxxxxx where the bits marked S are the sign extension bits and are also equal to 0.
- If +1 was added at the msb of the positive value being sign extended, this would not cause any 1 to propagate beyond the msb itself (0+1=1), so the sign extension bits S would stay as 0: the value after +1 being added at msb position would become S . . . SSS1xxxxxx. Again, the value is too high: it represents 0 . . . 0001xxx . . . xxx instead of 0 . . . 0000xxx . . . xxx. Subtracting 1 at the msb of the value being sign extended again restores the correct value.
Therefore, as shown in FIG. 15, regardless of whether the original value was positive or negative, the sign extension can be emulated by:
- adding a correction value of +1 at the msb of the value being sign extended (see the correction in column 95 at the msb of Extrabits_0_sum); and
- subtracting 1 at the msb of the value being sign extended.
The −1 correction can be represented by a single static constant 232. For the example of FIG. 16, this constant becomes 64′hFFFFFFFF80000000 which represents in two's complement form the effect of subtracting 1 at column 95 (column 95 corresponding to the msb of Extrabits_0_sum). The +1 correction can be applied within the adder sub-array itself (e.g. by the Extrabits_0 adder sub-array), at the time of generating the adder sub-array result which is being sign extended, so that it is not necessary for the +1 correction to be considered by the result assembly adder array. An alternative would be to combine the +1 correction into the constant representing the −1 correction.
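The combination of the +1 correction and the static −1 constant can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes a 64-bit sub-array result aligned 32 bits above the bottom of a 128-bit assembled result (so that its msb falls in column 95), with the 64-bit constant covering columns 127 to 64.

```python
def sign_extended_row(value: int, value_bits: int, lsb: int, total_bits: int) -> int:
    """Reference: conventional sign extension of value, aligned at column lsb."""
    signed = value - (1 << value_bits) if value >> (value_bits - 1) else value
    return (signed << lsb) & ((1 << total_bits) - 1)


def emulated_row(value: int, value_bits: int, lsb: int, total_bits: int) -> tuple[int, int]:
    """Zero-extended row with +1 at its msb, plus a static -1 constant."""
    total_mask = (1 << total_bits) - 1
    msb_weight = 1 << (value_bits - 1)
    # +1 at the msb; the carry out of the value's own width is ignored.
    corrected = (value + msb_weight) & ((1 << value_bits) - 1)
    # Static constant: -1 at the msb column, independent of the operands.
    constant = (-(msb_weight << lsb)) & total_mask
    return ((corrected << lsb) + constant) & total_mask, constant


for v in (0, 1, 2**63 - 1, 2**63, 2**64 - 1, 0x8000000000000001):
    row, constant = emulated_row(v, 64, 32, 128)
    assert row == sign_extended_row(v, 64, 32, 128)
print(f"{constant >> 64:016X}")
```

The constant is the same for every input value, which is what allows it to be injected as a fixed additional row.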
Hence, the +1 correction in combination with the additional assembled value 232 (constant) emulates the effect of sign-extending the Extrabits_0_sum sub-array result, and means a default zero extension can be applied to that sub-array result to avoid needing to include in the result assembly addition tree the EB_0_sign_extension sign extension bits (shown for comparison in
Similar to the first type of sign extension, the second type of sign extension is based on applying a one-bit correction and an adjustment of −1, but the second type of sign extension differs from the first type of sign extension in that:
- the one-bit correction is a value !c which has the opposite value to the carry out bit c which would have been generated from the addition of the sum and carry terms produced by the given adder sub-array. As noted below, this can be calculated by the given adder sub-array based on additional bits carried through the adder reduction tree in sub-array 56;
- both the !c correction and the −1 adjustment are applied at the bit position ‘msb+1’, which is one place higher than the position corresponding to the most significant bit of the sum term for the given adder sub-array.
This works because:
- if the carry out bit c had been generated by a carry propagate addition of the sum/carry terms, that carry out bit would have been located at the position ‘msb+1’.
- if the carry out bit c was 1, its sign extension would result in there being 1s at position ‘msb+1’ and all more significant bits up to the top bit of the overall result of the result assembly addition—this is equivalent to subtracting 1 at position ‘msb+1’. Hence, for the case where a carry is 1, subtracting 1 at position msb+1 would be enough to give the correct outcome. To avoid problems of increased fanout, it is desirable for this −1 adjustment to be implemented using addition of a static constant which is independent of the values of the operands.
- if the carry out bit c was 0, its sign extension would cause there to be 0s at position ‘msb+1’ and all more significant bits up to the top bit of the overall result of the result assembly addition—i.e. no sign extension would be necessary in this case. However, using the static constant to apply the −1 subtraction at position ‘msb+1’ to deal with the case where the carry out bit c was 1 means that when c=0 then the result is too low. This is corrected by applying a 1-bit correction of +1 at position ‘msb+1’.
- Hence, a static injection of −1 at position ‘msb+1’, combined with +1 only in the case when the carry out c=0, gives the correct result. When c=1, there is no need for the +1 correction and so injecting 0 gives the correct result. Therefore, the 1-bit correction has the opposite value from the carry out bit c, i.e. the correction should be !c (NOT c).
- Therefore, the 1-bit correction of !c combined with a static subtraction of 1 in the same column gives the correct outcome in both cases where c=0 and where c=1. The 1-bit correction means there is only one column which depends on the carry status, not all the columns more significant than the msb position as in the traditional sign extension approach shown in FIG. 15, so this helps to reduce fanout.
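The reasoning in the list above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the circuit itself: the sum and carry terms are taken to be n-bit unsigned values, `c` is the carry out of their n-bit addition, and the function name and parameters are hypothetical. In the sketch `c` is computed by a full addition purely so that the identity can be checked; the approach described in the text derives it without a carry propagate addition.

```python
def assemble_with_carry_emulation(sum_term, carry_term, n, width):
    """Add n-bit sum/carry terms in a wide addition, together with the
    1-bit !c correction and the static -1, both at position msb+1 (= n)."""
    mask_w = (1 << width) - 1
    # Carry out that a carry propagate addition of the two terms would give.
    # (Computed directly here only for checking purposes.)
    c = (sum_term + carry_term) >> n
    not_c = c ^ 1
    # Static constant: two's complement encoding of "subtract 1 at msb+1".
    static_minus_one = (-(1 << n)) & mask_w
    total = sum_term + carry_term + (not_c << n) + static_minus_one
    return total & mask_w
```

Because !c − 1 equals −c, the two injected values together remove the carry out from the wide total, so the function returns the n-bit sum of the two terms, zero-extended, whether c is 0 or 1.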
For example, as shown in FIG. 16, the FW_0 and Extrabits_0 sub-arrays produce sum and carry terms which have their most significant bit aligned to bits 63 and 95 respectively within the result assembly result, so the 1-bit corrections !c and subtractions of 1 are applied in column 64 for FW_0 and column 96 for Extrabits_0. The two instances of −1 for dealing with the Booth carry out extension elimination can be combined to provide a constant of 64′hFFFFFFFEFFFFFFFF, and this constant can be combined with the constant 64′hFFFFFFFF80000000 used to eliminate the first type of sign extension as discussed for FIG. 15, to give a combined compensation constant 64′hFFFFFFFE7FFFFFFF which implements all three of the −1 contributions used to eliminate the true sign extension for Extrabits_0 and the carry out sign extension for FW_0 and Extrabits_0.
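The arithmetic behind the combined compensation constant can be reproduced with a few lines of Python. The sketch assumes that bit 0 of each 64-bit constant aligns with column 64 of the result assembly addition, which is inferred from the statement that 64′hFFFFFFFF80000000 subtracts 1 at column 95; the names `BASE_COLUMN` and `minus_one_at` are hypothetical.

```python
MASK64 = (1 << 64) - 1
BASE_COLUMN = 64  # assumption: bit 0 of the constant aligns with column 64


def minus_one_at(column):
    """64-bit two's complement encoding of subtracting 1 at `column`."""
    return (-(1 << (column - BASE_COLUMN))) & MASK64


# First type: true sign extension of Extrabits_0 (msb at column 95).
first_type = minus_one_at(95)
# Second type: carry out cancellation at columns 64 and 96.
second_type = (minus_one_at(64) + minus_one_at(96)) & MASK64
# All three -1 contributions folded into one static constant.
combined = (first_type + second_type) & MASK64

print(f"{first_type:016X} {second_type:016X} {combined:016X}")
# prints "FFFFFFFF80000000 FFFFFFFEFFFFFFFF FFFFFFFE7FFFFFFF"
```

Since the three contributions are independent of the operands, they can be summed once at design time, which is why a single constant suffices in the result assembly addition.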
The !c correction at position msb+1 relative to a given adder sub-array's output represents whether addition of the sum and carry terms from that given adder sub-array would generate a carry out. It would not be desirable to actually add the sum and carry terms to find the !c correction, because this would be counter to the purpose of consuming the sum and carry terms directly in the result assembly addition, which is to eliminate the delay of such a carry propagate addition. Therefore, instead the value of !c can be estimated by carrying additional bits through the carry-save adder tree used to add the partial products within the given adder sub-array 56. Normally, for generation of N-bit sum/carry values, at each level of the carry save adder reduction tree, the output of that level would be truncated at bit N−1 to give an N-bit value [N−1:0] which is passed to a subsequent level of the reduction tree. However, to support the second type of sign extension emulation, instead the value at each level of the reduction tree is extended to bit N, and the carry out bit can then be calculated from an OR of the final sum and carry's MSB+1 bits.
For example, the pseudocode below shows a simplified example for an 8-bit*8-bit multiplication using a subarray, to produce what would normally be a 16-bit result, but which is provided with 17 bits [16:0] so that the !c term can be estimated without needing a full carry propagate addition of the sum/carry terms pps_o, ppc_o. Note that each level of the 3:2 reduction tree, including the final level producing the sum and carry terms pps_o, ppc_o to be consumed in the result assembly addition, extends to bit [16]. The final line of the pseudocode calculates the !c term as the inverse of the OR of bit [16] of the sum and carry terms.
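The following Python sketch is a hedged illustration of the general idea of carrying one extra bit through a 3:2 carry-save reduction tree; it is not the listing referred to above and does not model Booth encoding. All function names are hypothetical. Rather than the OR described in the text, the sketch checks a more general parity relation that holds for any set of partial products: the carry out of the N-bit addition equals the XOR of bit N of the sum term, bit N of the carry term and bit N of the true total. Where bit N of the total is known to be 0 and the two extra bits are never both set, this parity reduces to an OR of the two extra bits.

```python
def compress_3_to_2(a, b, c, width):
    """One 3:2 carry-save step on `width`-bit values (no carry propagation)."""
    mask = (1 << width) - 1
    sum_bits = (a ^ b ^ c) & mask
    carry_bits = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_bits, carry_bits


def reduce_partial_products(pps, width):
    """Reduce partial products to a sum term and a carry term."""
    mask = (1 << width) - 1
    terms = [p & mask for p in pps]
    while len(terms) > 2:
        nxt = []
        for i in range(0, len(terms) - 2, 3):
            nxt.extend(compress_3_to_2(terms[i], terms[i + 1], terms[i + 2], width))
        # Terms left over after grouping in threes pass to the next level.
        nxt.extend(terms[len(terms) - len(terms) % 3:])
        terms = nxt
    terms += [0] * (2 - len(terms))
    return terms[0], terms[1]


def extra_bit_recovers_carry(pps, n):
    """Check that bit n of the terms reveals the carry out of the n-bit add."""
    s, cy = reduce_partial_products(pps, n + 1)  # keep bit n as well
    mask_n = (1 << n) - 1
    # Reference carry out, found the slow way with a carry propagate add.
    c_ref = ((s & mask_n) + (cy & mask_n)) >> n
    # Carry out recovered from bit n of the terms and of the true total.
    t_n = (sum(pps) >> n) & 1
    c_fast = ((s >> n) & 1) ^ ((cy >> n) & 1) ^ t_n
    return c_fast == c_ref
```

For instance, `extra_bit_recovers_carry([3, -5, 7, -2], 4)` evaluates the relation for four small signed partial products reduced with 5-bit terms.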
This emulates the combined effect of: a true sign extension of the sign bit (at column 95) for the Extrabits_0 result, and the Booth carry out cancellation sign extension at columns 64 and 96 for the sum/carry terms of FW_0 and Extrabits_0. Extrabits_1 does not need any sign extension to be emulated, because it already extends up to the most significant bit 127 of the multiplication result. It can be seen that, compared to
However, with the negated signed multiplication, sign extension also becomes relevant for those sub-arrays 56 which act on portions of the first and second operands src_a, src_b which do not include any sign bit (e.g. see FW_0 in the example of
-
- a 1-bit correction “0?” aligned to the msb+1 bit position (one place higher than the most significant bit of the sum output of the given adder sub-array which requires the third type of sign extension). “0?”=1 if either of the portions of the first and second signed operands processed by the given adder sub-array is equal to zero, and “0?”=0 if both of those portions are non-zero.
- a constant which implements adding 1s at all bit positions more significant than the most significant bit of the sum output of the given adder sub-array.
This reflects that, in all cases other than when either (or both) of the portions of the operands being multiplied by the given adder sub-array is zero, in a signed negated multiplication (−1*src_a*src_b) the product of lower sub-portions of src_a, src_b represented by the sum/carry outputs of the given adder sub-array 56 will have a non-zero value. To ensure negative weighting in the result assembly addition, a sign extension should be applied to inject 1s at all positions more significant than the most significant bit of the sum output of that given adder sub-array. Therefore, the addition of a series of 1s at those more significant bit positions gives the correct sign extension result in most cases. However, in the case where the portions of either src_a or src_b being processed by the given adder sub-array are 0, it would be incorrect to sign extend with a string of 1s, as in that case the output of the sub-array represents 0 (as anything multiplied by 0 gives a result of 0). When zero is negated, the negated value is still 0, which should be represented in two's complement by all 0s. However, as the static constant adds 1s at all bits above msb every time, in the case where one of the inputs is 0, this can be corrected for by also adding a 1-bit correction at position msb+1, which cancels out the 1s and gives the correct result.
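The third type of sign extension emulation can be checked with the following Python sketch. It is a simplified model under stated assumptions: the sub-array is taken to multiply two unsigned m-bit lower portions and to output the negated product truncated to n = 2m bits, and the names `a_lo`, `b_lo`, `m` and `width` are hypothetical.

```python
def negated_subproduct_emulated(a_lo, b_lo, m, width):
    """Emulate sign extension of a negated n-bit sub-product (n = 2*m) using
    a static constant of 1s above the msb plus the 1-bit "0?" correction."""
    n = 2 * m
    mask_n = (1 << n) - 1
    mask_w = (1 << width) - 1
    # n-bit sub-array output for -(a_lo * b_lo), zero-extended by default.
    raw = (-(a_lo * b_lo)) & mask_n
    # Static constant: 1s at every bit position above the msb.
    ones_above_msb = (-(1 << n)) & mask_w
    # "0?" correction at msb+1: set when either portion is zero.
    zero_flag = 1 if (a_lo == 0 or b_lo == 0) else 0
    return (raw + ones_above_msb + (zero_flag << n)) & mask_w


def negated_subproduct_reference(a_lo, b_lo, width):
    """Negated product computed directly at full width, for comparison."""
    return (-(a_lo * b_lo)) & ((1 << width) - 1)
```

When either portion is zero, the 1-bit correction added at msb+1 ripples through the run of 1s supplied by the constant and clears it, leaving the all-zero result that a negated zero requires.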
As the constant “FFFF . . . ” is static, it does not need to depend on the inputs, so can be combined with other constants used for other sign extensions as shown in
-
- the first type of sign extension (true sign extension based on sign bit) for Extrabits_0;
- the second type of sign extension (sign extension of carry out bit which would have been generated if sum/carry terms had been added before being consumed in the result assembly addition) for FW_0 and Extrabits_0;
- the third type of sign extension (sign extension of negated output) for FW_0.
Combining these constants into one reduces the depth of the adder tree used for the result assembly adder array 58.
Also, as shown in
-
- a correction of !c at column 96, calculated based on the additional bits carried through the Extrabits_0 adder reduction tree as noted above, for use in the second type of sign extension emulation for Extrabits_0; and
- at column 64, a value !c|0? which represents the OR of the corrections used for the second and third type of sign extension emulation:
- !c indicating whether addition of the sum/carry terms output by FW_0 would have generated a carry out; and
- 0? indicating whether the portion of src_a, the portion of src_b, or both, being processed by the adder sub-array is 0.
In the figures, the reference numerals denote the following signals:
-
- 401: FW_0_sum
- 402: FW_0_carry
- 403: FW_0_booth_cout
- 404: FW_1_sign_extension
- 405: FW_1_carry
- 406: FW_1_sum
- 407: Extra_bits_sign_extension
- 408: FW_1_booth_cout
- 409: Extra_bits_sum
- 410: Extra_bits_booth_cout
- 411: Extra_bits_carry
- 412: HW_0_booth_cout
- 413: HW_0_carry
- 414: HW_0_sum
- 415: HW_2_booth_cout
- 416: HW_2_carry
- 417: HW_2_sum
- 418: HW_1_booth_cout
- 419: HW_1_carry
- 420: HW_1_sum
- 421: HW_3_sign_extension
- 422: HW_3_carry
- 423: HW_3_sum
- 424: HW_3_booth_cout
- 425: Byte_0_booth_cout
- 426: Byte_0_sign_extension
- 427: Byte_0_carry
- 428: Byte_0_sum
- 429: Byte_2_sign_extension
- 430: Byte_6_sign_extension
- 431: Byte_2_booth_cout
- 432: Byte_2_sum
- 433: Byte_4_sign_extension
- 434: Byte_4_sum
- 435: Byte_2_carry
- 436: Byte_6_sum
- 437: Byte_4_booth_cout
- 438: Byte_4_carry
- 439: Byte_6_carry
- 440: Byte_6_booth_cout
- 441: Byte_1_sign_extension
- 442: Byte_1_carry
- 443: Byte_1_sum
- 444: Byte_3_sign_extension
- 445: Byte_3_sum
- 446: Byte_3_booth_cout
- 447: Byte_1_booth_cout
- 448: Byte_3_carry
- 449: Byte_5_sum
- 450: Byte_5_sign_extension
- 451: Byte_5_booth_cout
- 452: Byte_7_sum
- 453: Byte_7_carry
- 454: Byte_5_carry
- 546: 64′h FEFF_7E7E_FF7F_7F80
- 641: 64′h FDFE_FBFD_FDFE_FDFF
- 642: 64′h FCFE_7A7C_FD7E_7D7F
- 701: FW_0_umul_in_sign_mul_op_‘FFFF’_tail
- 714: HW_0_umul_in_sign_mul_op_‘FFFF’_tail
- 717: HW_2_umul_in_sign_mul_op_‘FFFF’_tail
- 722: HW_1_umul_in_sign_mul_op_‘FFFF’_tail
- 915: 64′h FCFE_797C_FC7E_7C7E
As can be seen from comparing
Hence, as shown in
This greatly reduces the complexity of the result assembly adder tree and reduces circuit fanout to limit the size of the critical timing path.
As shown in
Putting this all together,
Hence, from comparing
It will be appreciated that not all examples need to use the sign extension emulation techniques for eliminating all three of the types of sign extension discussed here. For example, implementations which do not support negated multiply operations do not need to apply the third sign extension emulation. Similarly, implementations which add the carry and sum terms from a given adder sub-array before injecting the total into the result assembly addition do not need to apply the second sign extension emulation.
It will also be appreciated that the specific constants which are added in the result assembly addition to emulate the sign extensions will depend on the specific manner in which the subarray multiplier splits a larger multiplication into a number of smaller multiplications. The constants shown in
Concepts described herein may be embodied in computer-readable code for fabrication of an apparatus that embodies the described concepts. For example, the computer-readable code can be used at one or more stages of a semiconductor design and fabrication process, including an electronic design automation (EDA) stage, to fabricate an integrated circuit comprising the apparatus embodying the concepts. The above computer-readable code may additionally or alternatively enable the definition, modelling, simulation, verification and/or testing of an apparatus embodying the concepts described herein.
For example, the computer-readable code for fabrication of an apparatus embodying the concepts described herein can be embodied in code defining a hardware description language (HDL) representation of the concepts. For example, the code may define a register-transfer-level (RTL) abstraction of one or more logic circuits for defining an apparatus embodying the concepts. The code may define a HDL representation of the one or more logic circuits embodying the apparatus in Verilog, SystemVerilog, Chisel, or VHDL (Very High-Speed Integrated Circuit Hardware Description Language) as well as intermediate representations such as FIRRTL. Computer-readable code may provide definitions embodying the concept using system-level modelling languages such as SystemC and SystemVerilog or other behavioural representations of the concepts that can be interpreted by a computer to enable simulation, functional and/or formal verification, and testing of the concepts.
Additionally or alternatively, the computer-readable code may define a low-level description of integrated circuit components that embody concepts described herein, such as one or more netlists or integrated circuit layout definitions, including representations such as GDSII. The one or more netlists or other computer-readable representation of integrated circuit components may be generated by applying one or more logic synthesis processes to an RTL representation to generate definitions for use in fabrication of an apparatus embodying the invention. Alternatively or additionally, the one or more logic synthesis processes can generate from the computer-readable code a bitstream to be loaded into a field programmable gate array (FPGA) to configure the FPGA to embody the described concepts. The FPGA may be deployed for the purposes of verification and test of the concepts prior to fabrication in an integrated circuit or the FPGA may be deployed in a product directly.
The computer-readable code may comprise a mix of code representations for fabrication of an apparatus, for example including a mix of one or more of an RTL representation, a netlist representation, or another computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus embodying the invention. Alternatively or additionally, the concept may be defined in a combination of a computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus and computer-readable code defining instructions which are to be executed by the defined apparatus once fabricated.
Such computer-readable code can be disposed in any known transitory computer-readable medium (such as wired or wireless transmission of code over a network) or non-transitory computer-readable medium such as semiconductor, magnetic disk, or optical disc. An integrated circuit fabricated using the computer-readable code may comprise components such as one or more of a central processing unit, graphics processing unit, neural processing unit, digital signal processor or other components that individually or collectively embody the concept.
Concepts described herein may be embodied in a system comprising at least one packaged chip. The multiplication circuitry described earlier is implemented in the at least one packaged chip (either being implemented in one specific chip of the system, or distributed over more than one packaged chip). The at least one packaged chip is assembled on a board with at least one system component. A chip-containing product may comprise the system assembled on a further board with at least one other product component. The system or the chip-containing product may be assembled into a housing or onto a structural support (such as a frame or blade).
As shown in
In some examples, a collection of chiplets (i.e. small modular chips with particular functionality) may itself be referred to as a chip. A chiplet may be packaged individually in a semiconductor package and/or together with other chiplets into a multi-chiplet semiconductor package (e.g. using an interposer, or by using three-dimensional integration to provide a multi-layer chiplet product comprising two or more vertically stacked integrated circuit layers).
The one or more packaged chips 3200 are assembled on a board 3202 together with at least one system component 3204 to provide a system 3206. For example, the board may comprise a printed circuit board. The board substrate may be made of any of a variety of materials, e.g. plastic, glass, ceramic, or a flexible substrate material such as paper, plastic or textile material. The at least one system component 3204 comprises one or more external components which are not part of the one or more packaged chip(s) 3200. For example, the at least one system component 3204 could include any one or more of the following: another packaged chip (e.g. provided by a different manufacturer or produced on a different process node), an interface module, a resistor, a capacitor, an inductor, a transformer, a diode, a transistor and/or a sensor.
A chip-containing product 3216 is manufactured comprising the system 3206 (including the board 3202, the one or more chips 3200 and the at least one system component 3204) and one or more product components 3212. The product components 3212 comprise one or more further components which are not part of the system 3206. As a non-exhaustive list of examples, the one or more product components 3212 could include a user input/output device such as a keypad, touch screen, microphone, loudspeaker, display screen, haptic device, etc.; a wireless communication transmitter/receiver; a sensor; an actuator for actuating mechanical motion; a thermal control device; a further packaged chip; an interface module; a resistor; a capacitor; an inductor; a transformer; a diode; and/or a transistor. The system 3206 and one or more product components 3212 may be assembled on to a further board 3214.
The board 3202 or the further board 3214 may be provided on or within a device housing or other structural support (e.g. a frame or blade) to provide a product which can be handled by a user and/or is intended for operational use by a person or company.
The system 3206 or the chip-containing product 3216 may be at least one of: an end-user product, a machine, a medical device, a computing or telecommunications infrastructure product, or an automation control system. As a non-exhaustive list of examples, the chip-containing product could be any of the following: a telecommunications device, a mobile phone, a tablet, a laptop, a computer, a server (e.g. a rack server or blade server), an infrastructure device, networking equipment, a vehicle or other automotive product, industrial machinery, consumer device, smart card, credit card, smart glasses, avionics device, robotics device, camera, television, smart television, DVD player, set top box, wearable device, domestic appliance, smart meter, medical device, heating/lighting control device, sensor, and/or a control system for controlling public infrastructure equipment such as a smart motorway or traffic lights.
Some examples are set out in the following clauses:
1. Multiplication circuitry comprising:
-
- a plurality of adder sub-arrays, each adder array to add a respective set of partial products to generate one or more sub-array result values representing a result of signed multiplication of a respective pair of portions of bits selected from a first operand and a second operand, the plurality of adder sub-arrays comprising separate instances of hardware circuitry, the plurality of adder sub-arrays having at least two separate enable control signals for independently controlling whether at least two subsets of the adder sub-arrays are enabled or disabled; and
- a result assembly adder array to perform a result assembly addition to add a plurality of assembled values including the sub-array result values generated by the plurality of adder sub-arrays, to generate at least one multiplication result value representing a result of signed multiplication of the first operand and the second operand;
- wherein for a sign-extension-emulated sub-array result value being added in the result assembly addition, the result assembly adder array is configured to perform sign extension emulation by:
- applying a default zero extension to the sign-extension-emulated sub-array result value regardless of a sign of the sign-extension-emulated sub-array result value, and
- performing the result assembly addition with at least one other of the plurality of assembled values having a value that, when added in the result assembly addition, emulates an effect of sign extending the sign-extension-emulated sub-array result value up to a bit position corresponding to the most significant bit of the at least one multiplication result value.
2. The multiplication circuitry according to clause 1, in which the at least one other of the plurality of assembled values comprises a static constant having a value selected independent of values of the first operand and the second operand.
3. The multiplication circuitry according to clause 2, in which the static constant is shared between a plurality of sign-extension-emulated sub-array result values, the static constant having a value which when added in the result assembly addition provides emulation of sign extension of each of those plurality of sign-extension-emulated sub-array result values.
4. The multiplication circuitry according to any of clauses 2 and 3, in which the at least one other of the plurality of assembled values also comprises a correction value injected relative to the sign-extension-emulated sub-array result value which, in combination with the static constant, emulates sign extending the sign-extension-emulated sub-array result value, the correction value comprising fewer bits than the static constant.
5. The multiplication circuitry according to any of clauses 1 to 4, in which the result assembly adder array is configured to perform a first type of sign extension emulation for a given sign-extension-emulated sub-array result value whose most significant bit is of lower significance than a most significant bit of the at least one multiplication result value, and which is generated by one of the adder sub-arrays based on a pair of portions of bits selected from the first operand and the second operand which includes a sign bit of at least one of the first operand or the second operand.
6. The multiplication circuitry according to clause 5, in which, for the first type of sign extension emulation, the result assembly adder array is configured to include in the plurality of assembled values added in the result assembly addition at least one assembled value providing a same result as applying:
-
- a correction value of +1 at a bit position corresponding to a most significant bit of the given sign-extension-emulated sub-array result value; and
- a constant having a value which represents subtraction of 1 at a bit position corresponding to the most significant bit of the given sign-extension-emulated sub-array result value.
7. The multiplication circuitry according to any of clauses 1 to 6, in which:
-
- each adder sub-array is configured to generate, as said one or more sub-array result values, a sum term and a carry term which when added together would give the result of the signed multiplication of the respective pairs of portions; and
- the result assembly adder array is configured to include, as separate assembled values in the plurality of assembled values being added in the result assembly addition, the sum term and the carry term for a given adder sub-array.
8. The multiplication circuitry according to clause 7, in which:
-
- the result assembly adder array is configured to perform a second type of sign extension emulation to emulate a sign extension of a carry out caused by addition of the sum term and the carry term from the given adder sub-array.
9. The multiplication circuitry according to clause 8, in which for the second type of sign extension emulation applied to the sum term and the carry term from the given adder sub-array, the result assembly adder array is configured to include in the plurality of assembled values added in the result assembly addition:
-
- a correction value at a bit position one place higher than a most significant bit of the sum term, the correction value having opposite bit value to the carry out caused by addition of the sum term and the carry term; and
- a static constant having a value which represents subtraction of 1 at a bit position one place higher than the most significant bit of the carry term.
10. The multiplication circuitry according to clause 9, in which the result assembly adder array is configured to select whether the correction value is 0 or 1 based on carry out bits obtained by the given adder sub-array when generating the sum term and the carry term.
11. The multiplication circuitry according to any of clauses 1 to 10, in which:
-
- the multiplication circuitry is configured to support a negated signed multiplication operation in which the at least one multiplication result value represents −1 times a result of signed multiplication of the first operand and the second operand; and
- for the negated signed multiplication operation, the result assembly adder array is configured to perform a third type of sign extension emulation for a given sign-extension-emulated sub-array result value whose most significant bit is of lower significance than a most significant bit of the at least one multiplication result value, and which is generated by one of the adder sub-arrays based on a given pair of portions of bits selected from the first operand and the second operand where neither of the given pair of portions of bits selected from the first operand and the second operand includes a sign bit.
12. The multiplication circuitry according to clause 11, in which for the third type of sign extension emulation, the result assembly adder array is configured to include in the plurality of assembled values added in the result assembly addition:
-
- a static constant having a value which represents adding 1s at all bit positions more significant than a most significant bit of the given sign-extension-emulated sub-array result value; and
- a correction value at a bit position one place higher than a most significant bit of the given sign-extension-emulated sub-array result value, the correction value being 1 if one or both of the given pair of portions of bits selected from the first operand and the second operand is zero, and being 0 if both of the given pair of portions of bits selected from the first operand and the second operand are non-zero.
13. A system comprising:
-
- the multiplication circuitry of any of clauses 1 to 12, implemented in at least one packaged chip;
- at least one system component; and
- a board,
- wherein the at least one packaged chip and the at least one system component are assembled on the board.
14. A chip-containing product comprising the system of clause 13 assembled on a further board with at least one other product component.
15. A non-transitory computer-readable medium to store computer-readable code for fabrication of multiplication circuitry according to any of clauses 1 to 12.
16. Multiplication circuitry comprising:
-
- partial product selection circuitry to select a plurality of partial products based on a first signed operand and a second signed operand; and
- an adder array to add the plurality of partial products and a third signed operand; in which:
- the adder array is configured to apply a default zero extension to the third signed operand regardless of a sign of the third signed operand, and the partial product selection circuitry is configured to adjust one of the partial products added by the adder array to emulate an effect of sign extending the third signed operand.
17. The multiplication circuitry according to clause 16, in which each partial product added by the adder array has a sign extension header to emulate sign extension based on a sign of the corresponding partial product; and
-
- the partial product selection circuitry is configured to adjust the sign extension header associated with a least significant partial product based on the sign of the third signed operand, to emulate the effect of sign extending the third signed operand.
18. The multiplication circuitry according to clause 17, in which the partial product selection circuitry is configured to set the sign extension header associated with the least significant partial product to have a value which is 1 lower when the third signed operand is negative than when the third signed operand is positive.
19. A system comprising:
-
- the multiplication circuitry of any of clauses 16 to 18, implemented in at least one packaged chip;
- at least one system component; and
- a board,
- wherein the at least one packaged chip and the at least one system component are assembled on the board.
20. A chip-containing product comprising the system of clause 19 assembled on a further board with at least one other product component.
21. Computer-readable code for fabrication of multiplication circuitry according to any of clauses 16 to 18.
22. A computer-readable medium to store the computer-readable code of clause 21.
23. A non-transitory computer-readable medium to store the computer-readable code of clause 21.
In the present application, the words “configured to . . . ” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a “configuration” means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
In the present application, lists of features preceded with the phrase “at least one of” mean that any one or more of those features can be provided either individually or in combination. For example, “at least one of: A, B and C” encompasses any of the following options: A alone (without B or C), B alone (without A or C), C alone (without A or B), A and B in combination (without C), A and C in combination (without B), B and C in combination (without A), or A, B and C in combination.
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope of the invention as defined by the appended claims.
Claims
1. Multiplication circuitry comprising:
- a plurality of adder sub-arrays, each adder array to add a respective set of partial products to generate one or more sub-array result values representing a result of signed multiplication of a respective pair of portions of bits selected from a first operand and a second operand, the plurality of adder sub-arrays comprising separate instances of hardware circuitry, the plurality of adder sub-arrays having at least two separate enable control signals for independently controlling whether at least two subsets of the adder sub-arrays are enabled or disabled; and
- a result assembly adder array to perform a result assembly addition to add a plurality of assembled values including the sub-array result values generated by the plurality of adder sub-arrays, to generate at least one multiplication result value representing a result of signed multiplication of the first operand and the second operand;
- wherein for a sign-extension-emulated sub-array result value being added in the result assembly addition, the result assembly adder array is configured to perform sign extension emulation by: applying a default zero extension to the sign-extension-emulated sub-array result value regardless of a sign of the sign-extension-emulated sub-array result value, and performing the result assembly addition with at least one other of the plurality of assembled values having a value that, when added in the result assembly addition, emulates an effect of sign extending the sign-extension-emulated sub-array result value up to a bit position corresponding to the most significant bit of the at least one multiplication result value.
2. The multiplication circuitry according to claim 1, in which the at least one other of the plurality of assembled values comprises a static constant having a value selected independent of values of the first operand and the second operand.
3. The multiplication circuitry according to claim 2, in which the static constant is shared between a plurality of sign-extension-emulated sub-array result values, the static constant having a value which when added in the result assembly addition provides emulation of sign extension of each of those plurality of sign-extension-emulated sub-array result values.
4. The multiplication circuitry according to claim 2, in which the at least one other of the plurality of assembled values also comprises a correction value injected relative to the sign-extension-emulated sub-array result value which, in combination with the static constant, emulates sign extending the sign-extension-emulated sub-array result value, the correction value comprising fewer bits than the static constant.
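By way of illustration only, the following Python sketch models a 16x16-bit signed multiplication assembled from four 8x8-bit sub-array results, using default zero extension, a per-value correction, and one shared static constant. The split into 8-bit portions, the 32-bit result width, and all names (`sub_array`, `SHARED_CONSTANT`, `assemble`) are assumptions made for the example. Modelling the correction as an inversion of the sub-array result's most significant bit is one common way to realise it and is not asserted to be the arrangement used in the claimed circuitry.

```python
RESULT_BITS = 32
RESULT_MASK = (1 << RESULT_BITS) - 1
SUB_BITS = 16                      # width of each modelled sub-array result
SUB_MASK = (1 << SUB_BITS) - 1

# The two cross terms (high x low, low x high) sit at bit offset 8, so the sign
# bit of each lands at bit 23. Each needs "-1 at bit 23"; the two constants are
# merged into one operand-independent value shared by both cross terms.
SHARED_CONSTANT = (-2 * (1 << 23)) & RESULT_MASK   # 0xFF000000


def sub_array(x: int, y: int) -> int:
    """Model one adder sub-array: 16-bit two's-complement product of two portions."""
    return (x * y) & SUB_MASK


def assemble(a: int, b: int) -> int:
    """Signed 16x16 multiply built from zero-extended 8x8 sub-array results."""
    a_hi, a_lo = a >> 8, a & 0xFF   # signed high portion, unsigned low portion
    b_hi, b_lo = b >> 8, b & 0xFF

    lo_lo = sub_array(a_lo, b_lo)   # unsigned product: zero extension is exact
    hi_hi = sub_array(a_hi, b_hi)   # MSB already coincides with the result MSB
    # Sign-extension-emulated values: invert the sign bit (the correction),
    # then rely on default zero extension plus the shared constant.
    hi_lo = sub_array(a_hi, b_lo) ^ 0x8000
    lo_hi = sub_array(a_lo, b_hi) ^ 0x8000

    total = lo_lo + (hi_lo << 8) + (lo_hi << 8) + (hi_hi << 16) + SHARED_CONSTANT
    return total & RESULT_MASK
```

In this model the correction touches a single bit of each cross term, whereas the shared constant spans the upper eight bits of the result, and no assembled value is ever explicitly sign extended.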
5. The multiplication circuitry according to claim 1, in which the result assembly adder array is configured to perform a first type of sign extension emulation for a given sign-extension-emulated sub-array result value whose most significant bit is of lower significance than a most significant bit of the at least one multiplication result value, and which is generated by one of the adder sub-arrays based on a pair of portions of bits selected from the first operand and the second operand which includes a sign bit of at least one of the first operand or the second operand.
6. The multiplication circuitry according to claim 5, in which, for the first type of sign extension emulation, the result assembly adder array is configured to include in the plurality of assembled values added in the result assembly addition at least one assembled value providing a same result as applying:
- a correction value of +1 at a bit position corresponding to a most significant bit of the given sign-extension-emulated sub-array result value; and
- a constant having a value which represents subtraction of 1 at a bit position corresponding to the most significant bit of the given sign-extension-emulated sub-array result value.
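The arithmetic behind the pair of values recited in claim 6 can be checked with a short Python sketch. It assumes that the +1 correction is injected within the sub-array result's own width, so that any carry out of the most significant bit is discarded; under that assumption the correction is equivalent to inverting the most significant bit. The function names and the widths used are hypothetical.

```python
def sign_extend(value: int, bits: int, width: int) -> int:
    """Reference: explicit sign extension of a `bits`-wide value to `width` bits."""
    sign = (value >> (bits - 1)) & 1
    return (value - (sign << bits)) & ((1 << width) - 1)


def emulate_first_type(value: int, bits: int, width: int) -> int:
    """Zero-extend, apply +1 at the MSB, then add a constant for -1 at the MSB."""
    msb = 1 << (bits - 1)
    wide_mask = (1 << width) - 1
    # Assumption: the +1 is applied inside the sub-array's width, so a carry
    # out of the MSB is dropped; this is the same as inverting the MSB.
    corrected = (value + msb) & ((1 << bits) - 1)
    constant = (-msb) & wide_mask          # independent of the operands
    return (corrected + constant) & wide_mask
```

For every 8-bit input extended to 16 bits, `emulate_first_type` returns the same value as `sign_extend`, although the input is only ever zero extended.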
7. The multiplication circuitry according to claim 1, in which:
- each adder sub-array is configured to generate, as said one or more sub-array result values, a sum term and a carry term which when added together would give the result of the signed multiplication of the respective pair of portions; and
- the result assembly adder array is configured to include, as separate assembled values in the plurality of assembled values being added in the result assembly addition, the sum term and the carry term for a given adder sub-array.
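As a minimal sketch of a sub-array that outputs a sum term and a carry term instead of a single resolved product, the Python model below reduces simple shift-and-add partial products with 3:2 carry-save compressors. The partial product scheme, the 8-bit portion width, and the function names are assumptions for illustration; practical designs may use Booth encoding and a different reduction tree.

```python
def carry_save_add(x: int, y: int, z: int, mask: int) -> tuple[int, int]:
    """3:2 compressor: three addends in, a sum term and a carry term out."""
    sum_term = (x ^ y ^ z) & mask
    carry_term = (((x & y) | (x & z) | (y & z)) << 1) & mask
    return sum_term, carry_term


def sub_array_sum_carry(x: int, y: int, bits: int = 8) -> tuple[int, int]:
    """Reduce the partial products of signed x * y to a (sum, carry) pair."""
    width = 2 * bits
    mask = (1 << width) - 1
    y_bits = y & ((1 << bits) - 1)
    terms = []
    for i in range(bits):
        if (y_bits >> i) & 1:
            # The top multiplier bit of a two's-complement value has negative weight.
            weight = -(1 << i) if i == bits - 1 else (1 << i)
            terms.append((x * weight) & mask)
    # Keep compressing three terms into two until only two remain.
    while len(terms) > 2:
        s, c = carry_save_add(terms[0], terms[1], terms[2], mask)
        terms = terms[3:] + [s, c]
    terms += [0] * (2 - len(terms))
    return terms[0], terms[1]
```

The two returned terms are never added inside the sub-array model; only their sum, taken modulo the sub-array width, equals the signed product, which is why both can be passed on as separate assembled values.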
8. The multiplication circuitry according to claim 7, in which:
- the result assembly adder array is configured to perform a second type of sign extension emulation to emulate a sign extension of a carry out caused by addition of the sum term and the carry term from the given adder sub-array.
9. The multiplication circuitry according to claim 8, in which for the second type of sign extension emulation applied to the sum term and the carry term from the given adder sub-array, the result assembly adder array is configured to include in the plurality of assembled values added in the result assembly addition:
- a correction value at a bit position one place higher than a most significant bit of the sum term, the correction value having opposite bit value to the carry out caused by addition of the sum term and the carry term; and
- a static constant having a value which represents subtraction of 1 at a bit position one place higher than the most significant bit of the carry term.
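Read literally, the two values recited in claim 9 cancel any carry out produced when the zero-extended sum term and carry term are added, so that only the sum modulo the sub-array width reaches the higher bit positions. The Python sketch below checks that arithmetic. It assumes the sum term and carry term have the same width, and it derives the carry out with a full addition purely for modelling purposes; how the carry out is obtained in hardware is not addressed here. The function name is hypothetical.

```python
def emulate_second_type(sum_term: int, carry_term: int, bits: int, width: int) -> int:
    """Add zero-extended sum/carry terms plus the correction and static constant."""
    mask = (1 << width) - 1
    # Modelling shortcut (assumption): obtain the carry out by a full addition.
    carry_out = ((sum_term + carry_term) >> bits) & 1
    correction = (carry_out ^ 1) << bits    # opposite of the carry out, one place above the MSB
    constant = (-(1 << bits)) & mask        # static: -1 at that same bit position
    return (sum_term + carry_term + correction + constant) & mask
```

For 8-bit terms assembled into a 16-bit result, the function returns `(sum_term + carry_term) & 0xFF` for every input pair, that is, the carry out no longer disturbs bit 8 and above.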
10. The multiplication circuitry according to claim 1, in which:
- the multiplication circuitry is configured to support a negated signed multiplication operation in which the at least one multiplication result value represents −1 times a result of signed multiplication of the first operand and the second operand; and
- for the negated signed multiplication operation, the result assembly adder array is configured to perform a third type of sign extension emulation for a given sign-extension-emulated sub-array result value whose most significant bit is of lower significance than a most significant bit of the at least one multiplication result value, and which is generated by one of the adder sub-arrays based on a given pair of portions of bits selected from the first operand and the second operand where neither of the given pair of portions of bits selected from the first operand and the second operand includes a sign bit.
11. The multiplication circuitry according to claim 10, in which for the third type of sign extension emulation, the result assembly adder array is configured to include in the plurality of assembled values added in the result assembly addition:
- a static constant having a value which represents adding 1s at all bit positions more significant than a most significant bit of the given sign-extension-emulated sub-array result value; and
- a correction value at a bit position one place higher than a most significant bit of the given sign-extension-emulated sub-array result value, the correction value being 1 if one or both of the given pair of portions of bits selected from the first operand and the second operand is zero, and being 0 if both of the given pair of portions of bits selected from the first operand and the second operand are non-zero.
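The third type of emulation can be illustrated with a small Python model of a negated product of two unsigned portions. The negated product is negative, and so needs 1s in every higher bit, unless the product is zero; the static constant supplies those 1s unconditionally and the correction removes them when either portion is zero. The portion width, result width, and function name are assumptions for the example.

```python
def negated_unsigned_product(a: int, b: int, bits: int, width: int) -> int:
    """Emulate sign extension of -(a * b) for unsigned `bits`-wide portions."""
    sub_bits = 2 * bits
    mask = (1 << width) - 1
    # Sub-array result: -(a * b) kept only to the sub-array's own width.
    sub_result = (-(a * b)) & ((1 << sub_bits) - 1)
    # Static constant: 1s at every bit position above the sub-array MSB.
    constant = (-(1 << sub_bits)) & mask
    # Correction: +1 one place above the sub-array MSB when the product is zero,
    # which ripples through the 1s of the constant and clears them.
    correction = (1 << sub_bits) if (a == 0 or b == 0) else 0
    return (sub_result + constant + correction) & mask
```

With 4-bit portions and a 16-bit result, `negated_unsigned_product(3, 5, 4, 16)` gives `0xFFF1`, the 16-bit two's-complement encoding of -15, and any call with a zero portion gives 0.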
12. A system comprising:
- the multiplication circuitry of claim 1, implemented in at least one packaged chip;
- at least one system component; and
- a board,
- wherein the at least one packaged chip and the at least one system component are assembled on the board.
13. A chip-containing product comprising the system of claim 12 assembled on a further board with at least one other product component.
14. A non-transitory computer-readable medium to store computer-readable code for fabrication of multiplication circuitry according to claim 1.
15. Multiplication circuitry comprising:
- partial product selection circuitry to select a plurality of partial products based on a first signed operand and a second signed operand; and
- an adder array to add the plurality of partial products and a third signed operand; in which:
- the adder array is configured to apply a default zero extension to the third signed operand regardless of a sign of the third signed operand, and the partial product selection circuitry is configured to adjust one of the partial products added by the adder array to emulate an effect of sign extending the third signed operand.
16. The multiplication circuitry according to claim 15, in which each partial product added by the adder array has a sign extension header to emulate sign extension based on a sign of the corresponding partial product; and
- the partial product selection circuitry is configured to adjust the sign extension header associated with a least significant partial product based on the sign of the third signed operand, to emulate the effect of sign extending the third signed operand.
17. The multiplication circuitry according to claim 16, in which the partial product selection circuitry is configured to set the sign extension header associated with the least significant partial product to have a value which is 1 lower when the third signed operand is negative than when the third signed operand is positive.
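The effect described in claims 15 to 17 can be sketched arithmetically in Python. The model below does not reproduce individual partial products or their sign extension headers. It assumes instead that the header of the least significant partial product has its unit position aligned with the bit just above the most significant bit of the third operand, so that making the header 1 lower subtracts 1 at that bit position. That alignment, the widths, and the function name are assumptions for illustration.

```python
def multiply_add_emulated(a: int, b: int, c: int, c_bits: int, width: int) -> int:
    """Compute a * b + c with c zero extended and its sign folded into a header."""
    mask = (1 << width) - 1
    c_zero_ext = c & ((1 << c_bits) - 1)        # default zero extension of c
    c_negative = (c_zero_ext >> (c_bits - 1)) & 1
    # Assumed header adjustment: 1 lower at bit position `c_bits` when c is negative.
    header_adjust = (-(c_negative << c_bits)) & mask
    product = (a * b) & mask                    # stands in for the summed partial products
    return (product + c_zero_ext + header_adjust) & mask
```

For example, `multiply_add_emulated(3, 7, -1, 8, 16)` returns 20, the same result as adding an explicitly sign-extended third operand of -1 to the product 21.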
18. A system comprising:
- the multiplication circuitry of claim 15, implemented in at least one packaged chip;
- at least one system component; and
- a board,
- wherein the at least one packaged chip and the at least one system component are assembled on the board.
19. A chip-containing product comprising the system of claim 18 assembled on a further board with at least one other product component.
20. A non-transitory computer-readable medium to store computer-readable code for fabrication of multiplication circuitry according to claim 15.
Type: Application
Filed: Jul 21, 2023
Publication Date: Jan 23, 2025
Inventors: Nicholas Andrew PFISTER (Austin, TX), Vignesh Devidas KUDVA (Cambridge)
Application Number: 18/356,618