Abstract: A method and apparatus for performing modular multiplication is disclosed. An apparatus in accordance with one embodiment of the present invention includes a modular multiplier including a plurality of independent computation channels, where the plurality of independent computation channels includes a first computation channel and a second computation channel, and a coupling device interposed between the first computation channel and the second computation channel to receive a control signal and to couple the first computation channel to the second computation channel in response to a receipt of the control signal.
Abstract: A multiply unit uses four multipliers independently to perform four parallel multiplications of single-width operands, or uses the four multipliers cooperatively with an adder to perform a multiplication of double-width operands. In alternative embodiments, the adder operates in the same clock cycle as the multipliers or in a following clock cycle. Operand selection logic selects pairs of either single-width multiplicands or single-width partial multiplicands, depending on whether single-width or double-width multiplies are performed.
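The cooperative double-width mode can be illustrated with a minimal sketch. It assumes unsigned operands and a 16-bit single width; the function names `mul_single` and `mul_double` are hypothetical and do not come from the patent.

```python
def mul_single(a: int, b: int, width: int) -> int:
    """Model of one single-width multiplier: width-bit inputs, 2*width-bit product."""
    mask = (1 << width) - 1
    assert 0 <= a <= mask and 0 <= b <= mask
    return a * b


def mul_double(a: int, b: int, width: int = 16) -> int:
    """Multiply two unsigned (2*width)-bit operands using four single-width multiplies."""
    mask = (1 << width) - 1
    a_lo, a_hi = a & mask, a >> width
    b_lo, b_hi = b & mask, b >> width
    # Four partial products, one per single-width multiplier.
    ll = mul_single(a_lo, b_lo, width)
    lh = mul_single(a_lo, b_hi, width)
    hl = mul_single(a_hi, b_lo, width)
    hh = mul_single(a_hi, b_hi, width)
    # The adder aligns each partial product by its weight and sums them.
    return ll + ((lh + hl) << width) + (hh << (2 * width))
```

In the independent mode, the same four `mul_single` calls would simply receive four unrelated operand pairs and the adder would be bypassed.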
Abstract: The electrical circuitry for a multiplier system includes a counter for determining proximity to sampling operation, and a switch to select between symmetrical noise invariant operation and a low-power mode of operation. A noise invariant circuit disables row skip operation in a multi-row multiplier, to enable analog sampling. Disabling of the row skip operation is accomplished at a time which is several digital cycles preceding the time of analog sampling. Power saving multiplier row skippage resumes after analog sampling is completed.
Abstract: A Booth encoding circuit includes a plurality of cells (202a-202d), in which at least one of the cells (202c) includes a plurality of inputs. The cell also includes a first plurality of transistors (203) receiving at least one input and forming a NAND logic stage. The cell further includes a second plurality of transistors (211) receiving at least one input and forming an OR logic stage. The cell also includes a first output inverter (222) connected to at least one of the second plurality of transistors (211), and a first switching (224) connected to at least one of the first plurality of transistors (203). The cell further includes a second switching (226) connected to the first output inverter (222), and a second output inverter (228) connected to the first switching (224) and the second switching (226).
Abstract: Efficient computation of complex multiplication results and very efficient fast Fourier transforms (FFTs) are provided. A parallel array VLIW digital signal processor is employed along with specialized complex multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs are used allowing the complex multiplication pipeline hardware to be efficiently used. In addition, efficient techniques for supporting combined multiply accumulate operations are described.
Type:
Grant
Filed:
June 22, 1999
Date of Patent:
January 4, 2005
Assignee:
PTS Corporation
Inventors:
Nikos P. Pitsianis, Gerald G. Pechanek, Ricardo E. Rodriguez
Abstract: A method and apparatuses for performing binary multiplication on signed and unsigned operands of various lengths are discussed herein. The concept may be split into two parts. The first is the multiplication hardware itself: a compact, less-than-full-sized multiplier that employs Booth or another type of recoding method on the multiplier to reduce the number of partial products per scan, implemented so that a multiplication operation with large operands may be broken into subgroups of operations that fit into this mid-sized multiplier, whose results, here called modular products, may be knitted back together to form a correct final product. The second part of the concept is the supporting hardware used to separate the operands into subgroups and input the data and control signals to the multiplier, and the algorithms and apparatuses used to align and combine the modular products properly to obtain the final product.
Type:
Application
Filed:
May 12, 2003
Publication date:
November 18, 2004
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Fadi Y. Busaba, Steven R. Carlough, David S. Hutton, Christopher A. Krygowski, John G. Rell, Sheryll H. Veneracion
Abstract: A digital circuit including a Booth encoder having inputs for receiving a plurality of adjacent bits of a first binary input number, and an encoder control input for allowing selection between multiplication of first and second binary input numbers and multiplication of the pairs of binary numbers smaller than the first or second input number, the encoder being configured to encode the bits of the first binary input number dependent on the encoder control input to generate Booth encoded outputs for use in selection of a partial product, the Booth encoder being for use with a selector having inputs for receiving a plurality of adjacent bits of the second binary input number, and for receiving the Booth encoded outputs from the encoder, the selector being configured to select a partial product bit according to the Booth encoded outputs and the bits of the second binary input number.
Abstract: Integer multiply operations using data stored in an integer register file are performed using multi-media primitive instructions that operate on smaller operands. The present invention performs a multiply operation on a 32-bit or 64-bit value by performing multiply operations on a series of smaller operands to form partial products, and adding the partial products together. Data manipulation instructions are used to reposition 16-bit segments of the 32-bit operands into positions that allow the multi-media parallel multiply instructions to compute partial products, and the partial products are then added together to form the result. In every embodiment, the present invention achieves better latencies than the prior art method of performing integer multiply operations provided by the IA-64 architecture.
Type:
Grant
Filed:
July 31, 2001
Date of Patent:
November 2, 2004
Assignee:
Hewlett-Packard Development Company, L.P.
Abstract: A method of operating a multiplication circuit to perform multiply-accumulate operations on multi-word operands is characterized by an operations sequencer that is programmed to direct the transfer of operand segments between RAM and internal data registers in a specified sequence. The sequence processes groups of two adjacent result word-weights (columns), with the multiply cycles within a group proceeding in a zigzag fashion by alternating columns with steadily increasing or decreasing operand segment weights. In multiplier embodiments having additional internal cache registers, these store frequently used operand segments so they aren't reloaded from memory multiple times. In this case, the sequence within a group need not proceed in a strict zigzag fashion, but can jump to a multiply operation involving at least one operand segment stored in a cache.
Abstract: The present invention provides for saving power in a floating point unit. Bypass logic is coupled to the input of the aligner and the multiplier. An aligner bypass is coupled to the output of the aligner and an output of the bypass logic. A multiplier bypass is coupled to the output of the multiplier and an output of the bypass logic. The aligner bypass and the multiplier bypass transmit the output of the aligner and multiplier, or the bypass logic, as a function of an aligner bypass signal and a multiplier bypass signal, respectively. An adder is coupled to the output of the aligner bypass and the multiplier bypass. Clock disable logic is used to selectively enable and disable at least portions of the aligner, multiplier and bypass logic. This is done based on the operation and on the value of the operands.
Type:
Application
Filed:
March 19, 2003
Publication date:
September 23, 2004
Applicant:
International Business Machines Corporation
Inventors:
Sang Hoo Dhong, Silvia Melitta Mueller, Hwa-Joon Oh, Kevin Duc Tran
Abstract: An iterative multiplier circuit (10) comprises modules (15 to 18) that subdivide the respective input signals (Zn, Jn) into a first part (msb(Zn), msb(Jn)) that is the largest power of 2 lower than or equal to the input signal and a second part (Zn − msb(Zn), Jn − msb(Jn)) corresponding to the difference between the input signal and the aforesaid first part. A shift module (19) generates a respective output signal through shift operations that implement the multiplication operation for numbers that are powers of 2. The circuit operates according to a general iterative scheme in which at each step three components of the output signal (X,Y) are computed, corresponding to the product of two numbers that are powers of 2 and to two products in which at least one of the factors is a power of 2. The number of steps in the iteration scheme is controllable, so the accuracy with which the output value (X,Y) is calculated can be varied.
Type:
Application
Filed:
February 13, 2004
Publication date:
September 23, 2004
Inventors:
Donato Ettorre, Bruno Melis, Alfredo Ruscitto
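The iterative scheme in the abstract above can be sketched for plain non-negative integers. This is an illustrative reading, not the patented circuit: the names `msb` and `iterative_multiply` are hypothetical, and the handling of the (X,Y) output pair is not modelled.

```python
def msb(v: int) -> int:
    """Largest power of 2 less than or equal to v (0 when v is 0)."""
    return 1 << (v.bit_length() - 1) if v > 0 else 0


def iterative_multiply(z: int, j: int, steps: int) -> int:
    """Approximate z * j for non-negative ints using only shifts and adds."""
    total = 0
    for _ in range(steps):
        if z == 0 or j == 0:
            break  # nothing left to refine
        pz, pj = msb(z), msb(j)
        sz, sj = pz.bit_length() - 1, pj.bit_length() - 1  # shift amounts
        rz, rj = z - pz, j - pj  # second parts (differences)
        # Three components per step; each is a shift because one factor is 2**k:
        # pz*pj, pz*rj and rz*pj.
        total += (pz << sj) + (rj << sz) + (rz << sj)
        # The leftover term rz*rj is refined by the next step.
        z, j = rz, rj
    return total
```

For 13 × 11, one step yields 128, two steps yield 142, and three steps yield the exact product 143, which shows how the step count trades accuracy for work.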
Abstract: In a method for multiplication of floating-point real numbers, encoded in a binary way in sign, exponent and mantissa, the multiplication of the mantissa envisages a step of calculation of partial products, which are constituted by a set of addenda corresponding to the mantissa. In order to reduce the size and power consumption of the circuits designed for calculation, there is adopted a method of binary encoding which envisages setting the first bit of the mantissa to a value 1, in order to obtain a mantissa having a value comprised between 0.5 and 1. Also proposed are methods for rounding of the product and circuits for the implementation of the multiplication method. Also illustrated are circuits for conversion from and to encoding of floating-point real numbers according to the IEEE754 standard. Preferential application is in portable and/or wireless electronic devices, such as mobile telephones and PDAs, with low power-consumption requirements.
Abstract: A multiplier circuit is disclosed including a Wallace tree block and a carry propagation adder. The Wallace tree block includes a sum calculation block adding partial products for each digit and a carry calculation block adding carries obtained in the addition by the sum calculation block. In the case of multiplication over an extension field (finite field GF(2^n)) of two, a result of calculation by the sum calculation block is outputted. The carry propagation adder adds the result of calculation by the sum calculation block and a result of calculation by the carry calculation block. In the case of multiplication for integers (finite field GF(p)), a result of calculation by the carry propagation adder is outputted.
Type:
Application
Filed:
January 21, 2004
Publication date:
September 9, 2004
Applicant:
International Business Machines Corporation
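The dual-mode idea in the Wallace-tree abstract above, where carry-free column sums give the binary-field result and adding the carries back gives the integer result, can be sketched as follows. The function name `multiply` and its flag are hypothetical, the model is behavioural rather than a tree of adders, and reduction by an irreducible polynomial (needed for a complete GF(2^n) multiplication) is deliberately left out.

```python
def multiply(a: int, b: int, binary_field: bool) -> int:
    """Return the carry-less (GF(2) polynomial) or the integer product of a and b."""
    sum_part = 0    # column sums without carries: XOR of all partial products
    carry_part = 0  # carries, accumulated separately from the sums
    for i in range(b.bit_length()):
        if (b >> i) & 1:
            pp = a << i  # partial product for bit i of the multiplier
            # x + y == (x ^ y) + 2 * (x & y): keep the carry term to one side.
            carry_part += (sum_part & pp) << 1
            sum_part ^= pp
    if binary_field:
        return sum_part  # output of the sum calculation only
    return sum_part + carry_part  # carry-propagate addition of both results
```

For example, `multiply(7, 3, True)` returns 9, matching (x² + x + 1)(x + 1) = x³ + 1 over GF(2), while `multiply(7, 3, False)` returns 21.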
Abstract: A unified, extra regular, complexity-effective, high-performance multiplier construction method. The method is applicable to a whole spectrum of n×n-b pipelined or non-pipelined multipliers for 10 ≤ n ≤ 81, with no more than two levels of tripling process for each construction. The method includes a library containing 3-b to 9-b borrow parallel small multipliers, used for compact, low-power implementation. The multipliers are developed based on the novel counter circuitry, called borrow parallel counter, which utilizes 4-b 1-hot encoded signals and borrow bits, i.e., bits weighted 2. Exemplified by a 54×54-b (bit) multiplier, the method allows large multipliers to be generated from smaller multipliers, tripling the size in each expansion (6×6-b to 18×18-b to 54×54-b). This significantly reduces the complexity of state-of-the-art designs and achieves full self-testability without sacrificing high performance.
Type:
Application
Filed:
December 5, 2003
Publication date:
September 2, 2004
Applicant:
THE RESEARCH FOUNDATION OF STATE UNIVERSITY OF NEW YORK
Abstract: In an arithmetic device which performs a multiplication of a multiplicand A and a multiplier B expressed by bit patterns using a secondary Booth algorithm, an encoder selects a partial product indicating −A when the value of i specifying three consecutive bits of B is 0, and selects a partial product indicating 0 when the value of i is not 0. An addition circuit generates a two's complement of A from the partial product indicating −A, and outputs it as a multiplication result.
Abstract: A multiply execution unit that is operable to generate the integer product and the XOR product of a multiplicand and a multiplier. The multiply execution unit includes a summing circuit for summing a plurality of partial products. The partial products may be Booth encoded. The summing circuit can generate an integer sum of the plurality of partial products and can generate an XOR sum of the plurality of partial products. The summing circuit includes a first plurality of full adders. The first plurality of full adders each has three inputs, a carry output, and a sum output. The sum outputs of the first plurality of full adders are independent of the value of any carry output in the summing circuit. The summing circuit also includes a second plurality of full adders. The second plurality of full adders each has three inputs, a carry output, and a sum output.
Type:
Application
Filed:
January 30, 2003
Publication date:
August 5, 2004
Inventors:
Leonard D. Rarick, Sheueling Chang Shantz, Shreyas Sundaram
Abstract: An apparatus and method for compressing a reduction array into an accumulated carry-save sum. The reduction array includes a partial product matrix, a carry-save sum, and a constant value row. A compressor array generates a previous accumulated carry-save sum. A three-input/two-output carry-save adder pre-reduces the constant value row and the previously accumulated carry-save sum into a two-row intermediate carry-save sum that is added to the partial product matrix to form a current accumulated carry-save sum.
Type:
Grant
Filed:
December 11, 2000
Date of Patent:
July 13, 2004
Assignee:
International Business Machines Corporation
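A behavioural sketch of the pre-reduction step in the carry-save abstract above is given below. It assumes non-negative, unbounded Python integers in place of fixed-width rows, and the names `carry_save_add` and `accumulate` are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Three-input/two-output carry-save adder: returns (sum_word, carry_word)."""
    sum_word = x ^ y ^ z                              # per-bit sum, no propagation
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # per-bit majority, weight 2
    return sum_word, carry_word


def accumulate(rows: list[int], prev_sum: int, prev_carry: int,
               constant_row: int) -> tuple[int, int]:
    """Fold the previous carry-save sum, a constant row and new rows into one pair."""
    # Pre-reduce three rows (previous sum, previous carry, constant) to two.
    s, c = carry_save_add(prev_sum, prev_carry, constant_row)
    # Compress each partial-product row into the running carry-save pair.
    for row in rows:
        s, c = carry_save_add(s, c, row)
    return s, c
```

The pair returned by `accumulate` always satisfies `s + c == prev_sum + prev_carry + constant_row + sum(rows)`, so a single carry-propagate addition at the very end recovers the ordinary sum.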
Abstract: A method and device are provided that allow computation of multiple modulus conversion (MMC) outputs using little or no division operations. Instead of division operations, multiplication and logical shift operations are used to produce pseudo-quotients and pseudo-remainders, which may be corrected in a final step to produce correct MMC outputs. This allows for more efficient implementation, since division is typically less efficient than multiplication and logical shift. The method and device operate on MMC inputs that may be partitioned into sub-quotients of varying numbers of digits in any numbering system. The multiplication and logical shift operations are performed on each of the sub-quotients according to a procedure derived from long-division techniques.
Abstract: The present invention generally relates to an apparatus and method for efficiently summing the partial product bits produced by a multiplier. Briefly described, in architecture, the apparatus includes a first array of odd/even summation circuitry, a second array of odd/even summation circuitry, and a linear array of adders. The apparatus is configured to add a row of partial product bits produced by a multiplier in multiplying a first operand with a second operand. The first array of odd/even summation circuitry produces a first summation of a portion of the partial product bits. The second array of odd/even circuitry produces a second summation of the other partial product bits. The linear array of adders then adds the first summation and the second summation to produce a carry save representation of a product bit (i.e., a bit of the product produced by multiplying the first operand by the second operand).
Type:
Grant
Filed:
February 15, 2000
Date of Patent:
May 25, 2004
Assignee:
Hewlett-Packard Development Company, L.P.
Inventors:
Glenn T Colon-Bonet, Stephen L Bass, Thomas J. Sullivan
Abstract: Multi-precision multiplication methods include storing a first operand and a second operand as a first array and a second array of n words. A first weighted sum is determined from multiple subproducts of corresponding words of the first operand and the second operand. The methods may further include iteratively determining a next weighted sum from a previous weighted sum and a recursively calculated intermediate product. The disclosed methods can be used in a variety of different applications (e.g., cryptography) and can be implemented in a number of software or hardware environments.
Type:
Application
Filed:
August 6, 2003
Publication date:
May 20, 2004
Applicants:
The State of Oregon Acting by and through the State Board of Higher Education on Behalf of Oregon State University
Abstract: The present invention relates to a multiplication coefficient complementary apparatus for complementing a multiplication coefficient while reducing unnecessary operations performed during complementing the multiplication coefficient. The multiplication coefficient complementary apparatus comprises a plurality of multiplication units (11) each for multiplying an input signal by a multiplication coefficient; a plurality of complementary units (12) each for complementing the multiplication coefficient by means of a time constant process; and a control unit (13) for changing states of connecting the multiplication units (11) with the complementary units (12).
Abstract: A system and method are disclosed which provide a multiplier comprising a linear summation array that is implemented in a manner that enables both signed and unsigned multiplication to be performed. A preferred embodiment utilizes a modified Baugh-Wooley algorithm to enable an optimum even-and-odd linear summation array for performing both signed and unsigned high speed multiplication. That is, a preferred embodiment enables a linear summation array that is smaller in size and simpler in design than the multiplier arrays typically implemented for signed multiplication in the prior art. The modified Baugh-Wooley algorithm of a preferred embodiment translates a signed operand to an unsigned operand to greatly simplify the sign extension for multiplication, and to enable a relatively small multiplier array that does not include sign extension columns to be utilized for performing signed multiplication.
Type:
Grant
Filed:
February 21, 2000
Date of Patent:
March 16, 2004
Assignee:
Hewlett-Packard Development Company, L.P.
Abstract: A decibel level adjustment device that calculates an output signal that is a d decibel multiple of an input signal comprises a plurality of shift circuits, a shift amount control circuit, and adders. The shift circuits shift an input signal by exactly a designated number of bits in a designated direction. The shift amount control circuit receives the value of d as a decibel control value, and in accordance with this decibel control value, generates and outputs control signals that indicate the number of bits to shift and the shift direction of each shift circuit. The adder adds the outputs of the shift circuits.
Abstract: A system of and method for extended Booth encoding of two binary numbers, K and L. A stage of the encoder receives K[2n+1], K[2n], L[2n+1], and C[n−1], N−1 ≥ n ≥ 0, with N being the length of L, and it being assumed L[2n]=0, and forms C[n], S[n], M1[n], and M2[n] according to the following equations: C[n] = K[2n+1] | L[2n+1], S[n] = K[2n+1] ^ L[2n+1], M1[n] = K[2n] ^ C[n−1], M2[n] = (S[n] & /K[2n] & /C[n−1]) | (/S[n] & K[2n] & C[n−1]), where | refers to the logical OR function, ^ to the exclusive OR function, & to the logical AND function, and / to the logical inversion function.
Abstract: A fast, parallel modular multiplier is presented which is scalable according to available hardware resources. A linear increase in throughput with respect to consumed resources is achieved. Multiple independent data streams may be processed simultaneously, and optimal clock rates are attained by virtue of limited fan-out of all signal paths and nearest neighbor interconnections. Integrated circuit implementation is benefited by the potential for signal sharing among input and output busses and a common control interface for all independent data streams.
Abstract: A fast, scalable, systolic modular multiplier is presented. Linear throughput scalability with respect to consumed hardware resources is achieved through simultaneous parallel processing of multiple independent data streams. Optimal clock rates are attained by virtue of systolic properties of limited fan-out of all signal paths and nearest neighbor interconnections. Signal sharing among input and output busses and a common control interface for all independent data streams is made possible, thus benefiting integrated circuit implementations.
Abstract: A multiplication apparatus and system may include a multiplicand buffer to hold a digit of a multiplicand, a multiplier buffer to hold a digit of a multiplier, and a result buffer to hold a carry-free multiplied and accumulated result of the multiplicand and a plurality of reverse ordered digits included in the multiplier. An article, including a machine-accessible medium, may contain data capable of causing a machine to implement a multiplication method, including selecting a multiplicand plurality of digits, reversing the order of a selected multiplier plurality of digits to provide a reversed plurality of digits, and multiplying and accumulating the multiplicand plurality of digits and the reversed plurality of digits to provide a multiplication result.
Abstract: An arithmetic unit that performs high speed multiplication and addition operations is provided. The arithmetic unit is applicable to an instruction set not having a multiplication-addition instruction. The arithmetic circuit included in a data processing device is configured to have: a multiplication device (EMUL1) to which data A and B are inputted and which outputs partial signals, sum signal (113) and carry signal (114), for computing A*B; a first addition device (EADD1) which adds the sum signal and the carry signal to compute the final result of A*B; and a second addition device (EADD2) which receives data E, the sum signal, and the carry signal and is capable of computing the result of adding E to A*B. The arithmetic circuit selects among three types of operations, multiplication (A*B), addition (D+E), and multiplication-addition (A*B+E) by selection circuits 104 and 105.
Abstract: Provided is a system and method for a modem including one or more processing paths. Also included is a number of interconnected modules sequentially arrayed along the one or more paths. Each module is configured to (i) process signals passed along the paths in accordance with the sequence and (ii) implement predetermined functions to perform the processing. Further, each of the modules has a particular degree of functional programmability and the degrees of functional programmability monotonically vary in accordance with the sequence.
Type:
Application
Filed:
January 24, 2003
Publication date:
December 4, 2003
Inventors:
Gregory H. Efland, Haixiang Liang, Yuanjie Chen
Abstract: An error compensation bias circuit and method for a canonic signed digit (CSD) fixed-width multiplier that receives a W-bit input and produces a W-bit product. Truncated bits of the multiplier are divided into two groups (a major group and a minor group) depending upon their effects on quantization error. An error compensation bias is expressed in terms of the truncated bits in the major group. The effects of the remaining truncated bits in the minor group are taken into account by a probabilistic estimation. The error compensation bias circuit typically requires only a few logic gates to implement.
Type:
Application
Filed:
April 23, 2003
Publication date:
November 27, 2003
Applicant:
Broadcom Corporation
Inventors:
Keshab K. Parhi, Jin-Gyun Chung, Sang-Min Kim
Abstract: An apparatus and method for SIMD modular multiplication are described. In one embodiment, the method includes selection of modular multiplication method available from an operating environment. Once the multiplication method is selected, a data access pattern for processing of data is selected. Finally, the selected modular multiplication method is executed in order to process data according to the selected data access pattern. In a further embodiment, a single instruction multiple data (SIMD) modular multiplication instruction is provided in order to enable simultaneous modular multiplication of multiplicand and multiplier operands, which may be vertically or horizontally accessed from memory, as indicated by a selected data access pattern. Alternatively, modular multiplication is implemented utilizing a SIMD byte shuffle operation, which enables modular multiplication of a constant multiplicand value to varying data multiplier values.
Type:
Application
Filed:
May 2, 2002
Publication date:
November 13, 2003
Inventors:
William W. Macy, Hong Jiang, Eric Debes, Igor V. Kozintsev
Abstract: To provide a modular multiplication method and a calculating device that do not rely on the Montgomery technique, wherein the number of multiply-add calculations is reduced to shorten the calculation time, there is no limitation on the input value, and it is possible to execute a remainder calculation exceeding the maximum calculable bit length of the multiply-add unit that is used. Assuming that N = 2^n − M and X = α×2^n + β, the relation X mod N = (α×M + β) mod N is derived and utilized.
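The relation at the end of this abstract follows from 2^n ≡ M (mod N) when N = 2^n − M. A minimal sketch of the resulting reduction is shown below; the function name `reduce_mod` and the restriction 0 < M ≤ 2^(n−1) are assumptions made here so that a single final subtraction suffices, and they are not taken from the patent.

```python
def reduce_mod(x: int, n_bits: int, m: int) -> int:
    """Compute x mod N for N = 2**n_bits - m by repeated multiply-add folding."""
    assert x >= 0 and 0 < m <= (1 << (n_bits - 1))
    modulus = (1 << n_bits) - m
    mask = (1 << n_bits) - 1
    while x >> n_bits:                          # alpha is non-zero
        alpha, beta = x >> n_bits, x & mask     # x = alpha * 2**n_bits + beta
        x = alpha * m + beta                    # congruent to x mod N, and smaller
    # Now x < 2**n_bits <= 2 * modulus, so one conditional subtraction is enough.
    if x >= modulus:
        x -= modulus
    return x
```

Each folding step replaces a division by one multiply-add, and the loop terminates because `alpha * m + beta` is strictly smaller than `x` whenever `alpha` is non-zero.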
Abstract: A machine or method used for reducing the implementation cost of digital filters that use multiplication operations. For each new input, a small look-up table of products is computed and stored. Weighting of the inputs when computing digital filter outputs can be accomplished using look-up table access, shifting, and addition. The invention can be used for constant filters or for adaptive filters. With constant filter coefficients, a small look-up table which exploits the properties of the various coefficient representations as a group is possible. With adaptive filters, a larger table may be needed, but can be used to reduce the multiplication cost of both filter output computation and filter adaptation. The invention is particularly useful in technologies where general multiplication is costly, such as field programmable gate arrays, application specific integrated circuits, and software running on general-purpose microprocessors.
Abstract: The present invention provides a computer-implemented method for multiplying two large multiplicands. The method includes generating a plurality of partial products by multiplying each digit of the first multiplicand with each digit of the second multiplicand. The resulting partial products have a least significant digit and a most significant digit. The method further includes adding each of the most significant digits to a first array and adding each of the least significant digits to a second array. The method then includes adding the first array to the second array, wherein the result is the product of the two original multiplicands.
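One way to read the two-array method is sketched below for base-10 digits. The alignment of the most-significant-digit array one place above the least-significant-digit array is an assumption made here, and the names `to_digits` and `multiply_two_arrays` are hypothetical.

```python
def to_digits(v: int, base: int) -> list[int]:
    """Little-endian digits of a non-negative integer."""
    digits = []
    while v:
        v, d = divmod(v, base)
        digits.append(d)
    return digits or [0]


def multiply_two_arrays(a: int, b: int, base: int = 10) -> int:
    """Multiply via digit-by-digit partial products split into two arrays."""
    da, db = to_digits(a, base), to_digits(b, base)
    size = len(da) + len(db)
    high = [0] * size  # accumulates most significant digits of partial products
    low = [0] * size   # accumulates least significant digits of partial products
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            msd, lsd = divmod(x * y, base)
            low[i + j] += lsd
            high[i + j + 1] += msd  # one place above the matching lsd
    # Add the two arrays column by column, propagating carries in a single pass.
    result, carry = 0, 0
    for k in range(size):
        carry, digit = divmod(high[k] + low[k] + carry, base)
        result += digit * base ** k
    return result + carry * base ** size
```

Deferring carry propagation to the final pass keeps the inner loop free of data-dependent carries, which is presumably the attraction of splitting the digits into two arrays.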
Abstract: An arithmetic device with low power consumption includes master latches, a dynamic range detection unit, slave latches, an operation unit, and a word-length restoration unit. In the arithmetic device, the master latches latch a plurality of (such as two) input data. The dynamic range detection unit detects the effective dynamic range of these input data. The slave latches latch the values of the effective dynamic-range bits of these input data. The operation unit performs predetermined operations on the bits of the effective dynamic range to obtain an operation result. Since the operation unit only performs operations on the bits of the effective dynamic range, the circuits corresponding to the other bits do not switch and consume no switching power, thereby lowering the overall power consumption. Furthermore, the word-length restoration unit extends the operation result to its original output length in accordance with the sign of the operation result, to obtain the correct operation result.
Type:
Grant
Filed:
December 1, 1999
Date of Patent:
September 30, 2003
Assignee:
Industrial Technology Research Institute
Inventors:
Oscal T. -C. Chen, I-Ping Hsu, Ruey-Liang Ma
Abstract: One embodiment of the present invention provides a system that facilitates performing a mask-driven multiplication operation between arithmetic intervals within a computer system. The system first receives interval operands, including a first interval and a second interval, to be multiplied together to produce a resulting interval. Next, the system uses the operand values to create a mask. The system uses this mask to perform a multi-way branch to the code for the interval operands. In one embodiment of the present invention, creating the mask additionally involves determining whether the first interval and/or the second interval is empty, and modifying the mask so the multi-way branch directs the execution flow of the program to appropriate code for this case. In one embodiment of the present invention, if the first interval is empty or if the second interval is empty, the multi-way branch directs the execution flow of the program to code that sets the resulting interval to be empty.
Abstract: A machine used for multiplication which exploits the facts that cos(2π/3) is equal to minus one half and that the sum of cos(−2π/5) and cos(−4π/5) is also equal to minus one half. Low-cost multiplication by cos(2π/3) can be implemented with simple negation and shifting operations in three-point discrete Fourier transforms (DFTs) and in related three-point transforms. Low-cost multiplication of a multiplier input by both cos(−2π/5) and cos(−4π/5) can be implemented with a first multiplication of the multiplier input by one of the numbers to produce a first product, simple negation and shifting operations to obtain an intermediate result which is minus one half times the multiplier input, and subtraction of the first product from the intermediate result to obtain the second product.
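The two identities in the abstract above can be checked numerically. The following sketch uses floating point for readability; in fixed-point hardware the halving would be a one-bit shift. The function names are illustrative, and because cosine is even, cos(−2π/5) and cos(2π/5) are the same constant.

```python
import math

COS_2PI_5 = math.cos(2 * math.pi / 5)


def times_cos_2pi_3(x: float) -> float:
    """x * cos(2*pi/3) using only a negation and a halving."""
    return -(x * 0.5)


def times_both_five_point_cosines(x: float):
    """Return (x*cos(2*pi/5), x*cos(4*pi/5)) with a single general multiplication."""
    first = x * COS_2PI_5        # the only general multiplication
    minus_half = -(x * 0.5)      # negate and shift
    second = minus_half - first  # follows from cos(2*pi/5) + cos(4*pi/5) = -1/2
    return first, second


if __name__ == "__main__":
    x = 3.0
    p1, p2 = times_both_five_point_cosines(x)
    print(math.isclose(p2, x * math.cos(4 * math.pi / 5), abs_tol=1e-12))
    print(math.isclose(times_cos_2pi_3(x), x * math.cos(2 * math.pi / 3), abs_tol=1e-12))
```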
Abstract: In hardware multipliers, the generation of partial products is a necessary step in the process known to the art for efficient production of a final product. A way to increase the speed of hardware multipliers is through the use of the Booth algorithm. The alternate Booth partial product generation for hardware multipliers of the present invention is directed to a method and apparatus for eliminating the encoding of the bits of the multiplier prior to entering the partial product generating cell, which may result in less hardware and increased speed.
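For context, the sketch below shows conventional radix-4 Booth recoding, that is, the separate encoding step that the abstract above says the invention eliminates. It is not the patented method, and the abstract does not state which Booth radix is used; radix 4, the 8-bit default width, and the function names are assumptions chosen for illustration.

```python
def booth_radix4_digits(multiplier: int, width: int = 8):
    """Recode a signed `width`-bit multiplier into radix-4 digits in {-2, ..., 2}."""
    assert width % 2 == 0, "this sketch assumes an even bit width"
    y = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of bit 0
    for i in range(0, width, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # digit from overlapping bit triple
        prev = b1
    return digits


def booth_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Sum one partial product per radix-4 digit (half as many as bit-by-bit)."""
    digits = booth_radix4_digits(multiplier, width)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))
```

Each digit selects 0, ±1 or ±2 times the multiplicand, so an 8-bit multiplier yields four partial products instead of eight.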
Abstract: An apparatus and method for compressing a reduction array into an accumulated carry-save sum. The reduction array includes a partial product matrix, a carry-save sum, and a constant value row. A compressor array generates a previously accumulated carry-save sum. A three-input/two-output carry-save adder pre-reduces the constant value row and the previously accumulated carry-save sum into a two-row intermediate carry-save sum that is added to the partial product matrix to form a current accumulated carry-save sum.
Type:
Application
Filed:
December 11, 2000
Publication date:
August 21, 2003
Applicant:
International Business Machines Corporation
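The pre-reduction step in the abstract above relies on a three-input/two-output carry-save adder. A minimal sketch of that building block follows, together with one possible way to fold the result into a list of partial-product rows. It assumes non-negative integers of unbounded width, and the simple sequential compression loop is an illustrative stand-in for the compressor array, whose actual structure the abstract does not specify.

```python
def csa_3to2(x: int, y: int, z: int):
    """Bitwise full adders: three rows in, a sum row and a carry row out."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1
    return sum_row, carry_row


def accumulate(partial_products, prev_sum: int, prev_carry: int, constant: int):
    """Return a two-row carry-save sum of all inputs."""
    # Pre-reduce the constant row and the previous carry-save pair to two rows.
    rows = list(partial_products) + list(csa_3to2(constant, prev_sum, prev_carry))
    # Compress three rows into two until only a sum row and a carry row remain.
    while len(rows) > 2:
        x, y, z = rows.pop(), rows.pop(), rows.pop()
        rows.extend(csa_3to2(x, y, z))
    return rows[0], rows[1]


if __name__ == "__main__":
    s, c = accumulate([0b1011, 0b10110, 0b1011000], prev_sum=9, prev_carry=6, constant=1)
    print(s + c == 0b1011 + 0b10110 + 0b1011000 + 9 + 6 + 1)
```

No carry propagates across bit positions until the final `s + c`, which is the point of keeping the accumulated value in carry-save form.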
Abstract: A machine or method used in signal processing transforms that involve computing one or more sums each of one or more products. Multiplication for one product is implemented using one machine or method, and multiplication for a second product is implemented using a machine or method that is not capable of computing the first product. Alternatively, the numbers used in computing one product have a pair of finite-precision numeric formats that is not the same as the pair of finite-precision numeric formats of numbers used in computing a second product. Which machine or method is used depends on the particular representation of one or both numbers being multiplied, and on common properties of groups of allowed input number values and representations.
Abstract: A multiplier circuit for use in a data processor. The multiplier circuit contains a partial products generating circuit that receives a multiplicand value and a multiplier value and generates a group of partial products. The multiplier circuit also contains a split array for adding the partial products. A first summation array has a first group of adders that sum the even partial products to produce an even summation value. A second summation array has a second group of adders that sum the odd partial products to produce an odd summation value. The even and odd summation values are then summed to produce the output of the multiplier.
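The split-array idea in the abstract above can be sketched in software as follows. The example assumes unsigned operands and simple shift-and-mask partial products; the 8-bit width and function names are illustrative. In hardware the two summations would run in parallel, which the sequential code below does not capture.

```python
def partial_products(multiplicand: int, multiplier: int, width: int = 8):
    """One shifted copy of the multiplicand per set bit of the multiplier."""
    return [(multiplicand << i) if (multiplier >> i) & 1 else 0
            for i in range(width)]


def split_array_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    pps = partial_products(multiplicand, multiplier, width)
    even_sum = sum(pps[0::2])   # first summation array: partial products 0, 2, 4, ...
    odd_sum = sum(pps[1::2])    # second summation array: partial products 1, 3, 5, ...
    return even_sum + odd_sum   # final addition of the two summation values
```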
Abstract: The invention relates to a fixed point multiplying apparatus and method using an encoded multiplicand. The multiplicand is encoded into an independent binary system instead of a conventional binary system and each bit value of the encoded multiplicand is used as a control signal about an inputted multiplier in order to effectively execute a fixed point multiplication used in a transform algorithm such as the DCT in use for a multimedia codec. The multiplication is executed at a high speed with a simple structure and a small gate number.
Abstract: The present invention is directed to an apparatus and method for efficiently calculating an intermediate value between a first end value and a second end value such that the area and time required to implement this operation is minimized. The present invention is also used to efficiently multiply a value by a fraction. A fraction is involved in calculating an intermediate value and also for multiplying by a fraction. When the denominator of the fraction is odd, the binary representation of the blending function, which is used to calculate an intermediate value, exhibits special characteristics. The special characteristics allow the present invention to, among others, avoid the use of multipliers, which require a large number of gates to implement. This invention exploits this and other special characteristics in order to efficiently implement in hardware the blending function and to efficiently multiply a value by a fraction.
Abstract: The present invention provides a system and method for improving the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128b by 128b multiplier regardless of the operand size, since the number of elements in the matrix and vector operands increases as operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy with modest resources.
Type:
Application
Filed:
September 4, 2002
Publication date:
June 12, 2003
Inventors:
Craig Hansen, Bruce Bateman, John Moussouris
Abstract: Integer multiply operations using data stored in an integer register file are performed using multi-media primitive instructions that operate on smaller operands. The present invention performs a multiply operation on a 32-bit or 64-bit value by performing multiply operations on a series of smaller operands to form partial products, and adding the partial products together. Data manipulation instructions are used to reposition 16-bit segments of the 32-bit operands into positions that allow the multi-media parallel multiply instructions to compute partial products, and the partial products are then added together to form the result. In every embodiment, the present invention achieves better latencies than the prior art method of performing integer multiply operations provided by the IA-64 architecture.
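The decomposition described in the abstract above, splitting 32-bit operands into 16-bit segments, forming partial products, and adding them, can be sketched as below. The example assumes unsigned 32-bit inputs and a full 64-bit result; it models only the arithmetic, not the IA-64 multimedia instructions or the data-manipulation steps, and the names are illustrative.

```python
MASK16 = 0xFFFF


def mul32_via_16(a: int, b: int) -> int:
    """Unsigned 32x32 -> 64-bit multiply built from four 16x16 multiplies."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    # Four 16-bit by 16-bit partial products.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi
    # Each partial product is weighted by the positions of its segments.
    return lo_lo + ((lo_hi + hi_lo) << 16) + (hi_hi << 32)
```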
Abstract: A machine or method used in signal processing transforms involving computation of one or more sums each of one or more products. A first multiplier computes a first product and a first set of intermediate terms. A second multiplier computes a second product using one or more of the terms computed by the first multiplier. Because they share computations, the two multipliers can have lower implementation cost than if they function separately. The invention is particularly useful in signal processing transforms that have fixed weights, such as discrete Fourier transforms, discrete cosine transforms, and pulse-shaping filters. These transforms are multiply-intensive and are used repeatedly in many applications. Implementations of shared multiplication techniques can have reduced chip space, computation time, and power consumption relative to implementations that do not share computation.
Abstract: A multiplication block for a reconfigurable chip includes multiple multiplication units and a group of selectable adder units operably interconnectable with the multiplication units. The adder units can be selectively connected for different configurations. The multiplication block is preferably controlled by an instruction which can put the multiplication block into different configurations.