Carry-save Adders (i.e., CSAs) Patents (Class 708/629)
  • Patent number: 11789701
    Abstract: A multiplier circuit is provided to multiply a first operand and a second operand. The multiplier circuit includes a carry-save adder network comprising a plurality of carry-save adders to perform partial product additions to reduce a plurality of partial products to a redundant result value that represents a product of the first operand and the second operand. A number of the carry-save adders that is used to generate the redundant result value is controllable and is dependent on a width of at least one of the first operand and the second operand.
    Type: Grant
    Filed: August 5, 2020
    Date of Patent: October 17, 2023
    Assignee: Arm Limited
    Inventors: Tai Li, Jack William Derek Andrew, Michael Alexander Kennedy
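The reduction of partial products to a redundant (sum, carry) pair, as described in the abstract above, can be illustrated with a minimal Python sketch. This is not the patented circuit: the function names, the three-rows-at-a-time grouping, and the `width` parameter are assumptions made for illustration. The sketch only shows that a network of 3:2 carry-save adders leaves two rows whose ordinary sum is the product, and that the number of adders used grows with the operand width.

```python
def carry_save_add(a, b, c):
    """3:2 carry-save adder on non-negative ints: a + b + c == s + carry."""
    s = a ^ b ^ c                                # per-bit sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, shifted one place up
    return s, carry


def reduce_rows(rows):
    """Reduce any number of rows to two with CSAs; return (row0, row1, csa_count)."""
    rows = list(rows)
    used = 0
    while len(rows) > 2:
        nxt = []
        # Feed complete groups of three rows into one CSA each.
        for i in range(0, len(rows) - 2, 3):
            s, c = carry_save_add(rows[i], rows[i + 1], rows[i + 2])
            nxt += [s, c]
            used += 1
        # Rows that did not fill a group pass through to the next level.
        nxt += rows[len(rows) - len(rows) % 3:]
        rows = nxt
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1], used


def multiply_redundant(x, y, width):
    """Form one partial product per multiplier bit, then reduce to (sum, carry)."""
    partial_products = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    return reduce_rows(partial_products)


s, c, used = multiply_redundant(13, 11, width=4)
assert s + c == 13 * 11   # a final carry-propagate add recovers the product
```

In this sketch a 4-bit multiplier needs two CSAs and an 8-bit multiplier needs six, which is one simple way a CSA count can depend on operand width.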
  • Patent number: 10042605
    Abstract: A processor includes a carry save array multiplier. The carry save array multiplier includes an array of cascaded partial product generators. The array of cascaded partial product generators is configured to generate an output value as a product of two operands presented at inputs of the multiplier. The array of cascaded partial product generators is also configured to generate an output value as a sum of two operands presented at inputs of the multiplier.
    Type: Grant
    Filed: April 19, 2016
    Date of Patent: August 7, 2018
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Christian Wiencke, Armin Stingl
  • Patent number: 9632751
    Abstract: According to one embodiment, an arithmetic circuit includes an arithmetic unit, a rounding preprocessor, a register, and a rounding postprocessor. The arithmetic unit performs an arithmetic operation including addition and multiplication to generate a first value of (n+m) bits. The rounding preprocessor performs an OR operation on lower (m-k) bits of the first value to generate a second value of 1 bit. The register stores a third value of (n+k+1) bits obtained by concatenating upper (n+k) bits of the first value and the second value. The rounding postprocessor calculates a carry bit value of 1 bit from a most significant bit of the third value and lower (k+1) bits of the third value, and adds the carry bit value to upper n bits of the third value.
    Type: Grant
    Filed: December 24, 2013
    Date of Patent: April 25, 2017
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventor: Koichiro Ban
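The preprocessing step in the abstract above (OR-ing the low bits into a single bit and concatenating it with the upper bits) can be sketched in Python as follows. The function name `compress_with_sticky` and the integer representation are assumptions for illustration. The postprocessor's carry rule is not spelled out in the abstract, so it is deliberately left out rather than guessed.

```python
def compress_with_sticky(first_value, n, m, k):
    """Build the (n+k+1)-bit third value from an (n+m)-bit first value.

    The lower (m-k) bits are OR-ed into one bit (the second value), which is
    appended below the upper (n+k) bits of the first value.
    """
    assert 0 <= k <= m
    assert 0 <= first_value < (1 << (n + m))
    low_width = m - k
    low_bits = first_value & ((1 << low_width) - 1)
    second_value = 1 if low_bits else 0      # OR of the lower (m-k) bits
    upper = first_value >> low_width         # upper (n+k) bits
    return (upper << 1) | second_value       # concatenation, (n+k+1) bits


third = compress_with_sticky(0b10110101, n=4, m=4, k=1)
assert third == 0b101101
```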
  • Patent number: 8996601
    Abstract: The disclosed embodiments relate to apparatus for accurately, efficiently and quickly executing a multiplication instruction. The disclosed embodiments can provide a multiplier module having an optimized layout that can help speed up computation of a result during a multiply operation so that cycle delay can be reduced and so that power consumption can be reduced.
    Type: Grant
    Filed: June 21, 2012
    Date of Patent: March 31, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Scott A. Hilker, George Q. Phan
  • Patent number: 8918446
    Abstract: Methods and apparatus relating to reducing power consumption in multi-precision floating point multipliers are described. In an embodiment, certain portions of a multiplier are disabled in response to two or more multiplication operations with the same data size and data type occurring back-to-back. Other embodiments are also claimed and described.
    Type: Grant
    Filed: December 14, 2010
    Date of Patent: December 23, 2014
    Assignee: Intel Corporation
    Inventors: Brent R. Boswell, Thierry Pons, Tom Aviram
  • Patent number: 8868634
    Abstract: A method and apparatus are described for performing multiplication in a processor to generate a product. In one embodiment, a 64-bit multiplier and a 64-bit multiplicand may be multiplied together over four cycles by merging different partial product (PP) subsets, generated by a Booth encoder and a PP generator, with feedback sum and carry results. The logic inputs of a plurality of multiplexers may be selected on a cyclical basis to efficiently compress (i.e., merge) each PP subset with feedback sum and carry results. A pair of preliminary sum results stored during one cycle may be outputted during a subsequent cycle and processed by a logic gate (e.g., an XOR gate) to generate a feedback sum result that is merged with a feedback carry result and a PP subset. Final sum and carry results may be added to generate the product of the multiplier and the multiplicand.
    Type: Grant
    Filed: December 2, 2011
    Date of Patent: October 21, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Srikanth Arekapudi, Sudherssen Kalaiselvan
  • Patent number: 8838664
    Abstract: The disclosed embodiments relate to methods and apparatus for accurately, efficiently and quickly executing a fused multiply-and-accumulate instruction with respect to floating-point operands that have packed-single-precision format. The disclosed embodiments can speed up computation of a high-part of a result during a fused multiply-and-accumulate operation so that cycle delay can be reduced and so that power consumption can be reduced.
    Type: Grant
    Filed: June 29, 2011
    Date of Patent: September 16, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: David Oliver, Debjit Dassarma, Hanbing Liu, Scott Hilker
  • Publication number: 20140164457
    Abstract: An extensible iterative multiplier design is provided. Embodiments provide cascaded 8-bit multipliers for simplifying the performance of multi-byte multiplications. Booth encoding is performed in the lowest order multiplier, with the result of the Booth encoding then provided to higher order multipliers. Additionally, multiply-add operations can be performed by initializing a partial product sum register. Configurable connections between the multipliers facilitate a variety of possible multiplication options, including the possibility of varying the width of the operands.
    Type: Application
    Filed: December 7, 2013
    Publication date: June 12, 2014
    Applicant: Wave Semiconductor, Inc.
    Inventors: Samit Chaudhuri, Radoslav Danilak
  • Patent number: 8667040
    Abstract: An apparatus having operand registers, an opcode detector, a carryless preformat unit, a compressor, a left shifter, and exclusive-OR logic. The operand registers receive operands for a carryless multiplication. The opcode detector receives a carryless multiplication instruction, and asserts a carryless signal. The carryless preformat unit partitions a first operand into a plurality of parts that are such that a Booth encoder is precluded from selection of second partial products of a second operand, where the second partial products reflect implicit carry operations. The compressor sums first partial products of the second operand via carry save adders arranged in a Wallace tree configuration, where generation of carry bits is disabled. The left shifter shifts one or more outputs of the compressor. The exclusive-OR logic executes an exclusive-OR function to yield a carryless multiplication result.
    Type: Grant
    Filed: December 3, 2010
    Date of Patent: March 4, 2014
    Assignee: VIA Technologies, Inc.
    Inventor: Timothy A. Elliott
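The idea of disabling carry generation in carry-save adders to obtain a carryless product can be sketched in Python. This is a behavioural illustration only: the names `csa` and `multiply`, the `carries_enabled` flag, and the serial accumulation are assumptions, and Booth encoding, the preformat unit, and the left shifter from the abstract are not modelled.

```python
def csa(a, b, c, carries_enabled=True):
    """3:2 carry-save adder; with carries disabled only the XOR sum remains."""
    s = a ^ b ^ c
    carry = (((a & b) | (a & c) | (b & c)) << 1) if carries_enabled else 0
    return s, carry


def multiply(a, b, carryless=False):
    """Accumulate shifted partial products of non-negative ints through a CSA."""
    s, c = 0, 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            s, c = csa(s, c, a << shift, carries_enabled=not carryless)
        shift += 1
    # Carryless mode combines with XOR; integer mode needs a real addition.
    return (s ^ c) if carryless else (s + c)


assert multiply(7, 7) == 49
assert multiply(7, 7, carryless=True) == 0b10101
```

The same adder structure serves both modes; only the carry path and the final combining step differ.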
  • Patent number: 8661072
    Abstract: A shared parallel adder tree for executing multiple different population count operations on a single datum includes a number of carry-save adders (CSAs) and/or half adders (HAs), arranged in rows, where certain CSAs and HAs are dedicated to a single population count operation, while other CSAs and HAs are shared among two or more population count operations. The datum is applied to the first row in the tree. Partial sums of the number of ones at various locations within the tree are routed to certain CSAs and/or HAs “down” the tree to propagate the particular population count operations. Carry-propagate adders generate at least a portion of the final sum of the number of ones in certain population count operations. An “AND” operation on a particular number of the bits in the datum provides the high order bit of the resulting sum of the particular population count operation.
    Type: Grant
    Filed: August 19, 2008
    Date of Patent: February 25, 2014
    Assignee: International Business Machines Corporation
    Inventors: Bartholomew Blaner, Todd R. Iglehart, Robert K. Montoye
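Counting ones with carry-save adders and half adders can be illustrated with a small Python sketch. It is a serial, column-by-column model, not the shared parallel tree of the abstract above; the function name `popcount_csa` and the column data structure are assumptions for illustration.

```python
def popcount_csa(datum, width):
    """Count the ones in the low `width` bits using full and half adders only."""
    # columns[w] holds single bits of weight 2**w that still have to be added.
    columns = [[(datum >> i) & 1 for i in range(width)]]
    w = 0
    while w < len(columns):
        col = columns[w]
        while len(col) > 1:
            if len(col) >= 3:
                # Full adder (3:2 CSA cell): three bits in, sum and carry out.
                a, b, c = col.pop(), col.pop(), col.pop()
                s = a ^ b ^ c
                carry = (a & b) | (a & c) | (b & c)
            else:
                # Half adder: two bits in, sum and carry out.
                a, b = col.pop(), col.pop()
                s = a ^ b
                carry = a & b
            col.append(s)
            if w + 1 == len(columns):
                columns.append([])
            columns[w + 1].append(carry)   # carry has twice the weight
        w += 1
    # Each column now holds at most one bit: the binary digits of the count.
    return sum(bit << weight for weight, col in enumerate(columns) for bit in col)


assert popcount_csa(0b10110110, 8) == 5
```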
  • Patent number: 8645448
    Abstract: An apparatus having a carryless preformat unit, a Booth encoder, a compressor, a left shifter, and exclusive-OR logic. The carryless preformat unit receives a multiplier operand and partitions the multiplier operand into parts. The Booth encoder receives the parts and directs selection of first partial products of a multiplicand that do not reflect implicit carry operations. The compressor sums the first partial products via a configuration of carry save adders that generate sum bits and carry bits, where generation of the carry bits is disabled during execution of the carryless multiplication. The left shifter shifts bits of one or more outputs of the compressor. The exclusive-OR logic is coupled to the compressor and the left shifter, and is configured to execute an exclusive-OR function on the outputs to yield a carryless multiplication result.
    Type: Grant
    Filed: December 3, 2010
    Date of Patent: February 4, 2014
    Assignee: VIA Technologies, Inc.
    Inventor: Timothy A. Elliott
  • Patent number: 8606842
    Abstract: Provided are N-digit addition and subtraction units and N-digit addition and subtraction modules in which borrowing and carrying are not propagated in modules having basic digits. In the units and modules, an output pattern of results of addition and subtraction is predicted based on a relation between an augend and an addend and a relation between a minuend and a subtrahend, respectively, thereby preventing borrowing and carrying from being propagated in modules having basic digits.
    Type: Grant
    Filed: August 21, 2008
    Date of Patent: December 10, 2013
    Assignee: Tokyo Denki University
    Inventors: Hiroshi Kasahara, Tsugio Nakamura, Jin Sato
  • Patent number: 8601048
    Abstract: Herein described is a method and system of implementing integrated circuit logic modules that provide maximum efficiency and minimum energy dissipation. In a representative embodiment, a method of implementing one or more digital signal processing functions comprises determining one or more parameters associated with generating an optimal logic module. The one or more parameters may comprise the circuit area of the logic module and the processing time through a critical path of the logic module. In a representative embodiment, the system comprises a logic module that utilizes four full adders arranged in a tree configuration. In a representative embodiment, the logic module comprises a carry-save accumulator that provides maximum efficiency and minimal energy dissipation.
    Type: Grant
    Filed: January 5, 2005
    Date of Patent: December 3, 2013
    Assignee: Broadcom Corporation
    Inventor: Christian Lutkemeyer
  • Publication number: 20130031154
    Abstract: A self-timed multiplier unit includes a multiplier and a clock generator. The multiplier has a first set of semiconductor circuits in a critical path. The clock generator has a second set of semiconductor circuits configured to control a clock period of said clock generator selected to set a clock period longer than the propagation delay through the critical path of the multiplier. The clock generator may include a delay circuit having a delay to set the clock period longer than the propagation delay through the critical path of said multiplier. The clock generator uses circuits with an identical logical design, including the same standard cells, the same logic design or the same floor plan. Close matching of these circuits causes the multiplier and the clock generator to experience the same PVT speed variations.
    Type: Application
    Filed: July 27, 2012
    Publication date: January 31, 2013
    Applicant: TEXAS INSTRUMENTS DEUTSCHLAND GMBH
    Inventors: Christian Wiencke, Horst Diewald
  • Patent number: 8364741
    Abstract: A multiplier includes an operation unit that adds or subtracts a first group selected from a current input data, and a second group selected from a next input data corresponding to the first group to generate an operation result, a Booth's encoder that encodes the operation result according to Booth's algorithm, and generates code data, a partial product generation unit that calculates a partial product from the code data as a first partial product, and calculates, in a case where the first group and the second group are specific combination, a second partial product, and an adder that cumulatively adds an output from the partial product generation unit. The specific combination is a combination in which the highest-order bit of each of the first group and the second group is the same value, and the third least significant bit obtained after the subtraction operation is 1.
    Type: Grant
    Filed: February 25, 2009
    Date of Patent: January 29, 2013
    Assignee: Renesas Electronics Corporation
    Inventor: Yoichi Katayama
  • Patent number: 8352533
    Abstract: There is provided a semiconductor integrated circuit including: a plurality of first logic blocks which are reconfigurable, the plurality of first logic blocks inputting data of a first bit width and performing computation; a first network connecting the plurality of first logic blocks in a dynamically reconfigurable manner; a plurality of second logic blocks inputting data of a second bit width different from the first bit width and performing computation; a second network connected to outputs of the plurality of second logic blocks; and a third network connecting a carry bit output of a computing unit included in the first logic block to an input of a computing unit included in the second logic block in a dynamically reconfigurable manner.
    Type: Grant
    Filed: December 11, 2008
    Date of Patent: January 8, 2013
    Assignee: Fujitsu Semiconductor Limited
    Inventor: Hiroshi Furukawa
  • Patent number: 8275822
    Abstract: Multiplication engines and multiplication methods are provided. A multiplication engine for a digital processor includes a first multiplier to generate unequally weighted partial products from input operands in a first multiplier mode; a second multiplier to generate equally weighted partial products from input operands in a second multiplier mode; a multiplexer to select the unequally weighted partial products in the first multiplier mode and to select the equally weighted partial products in the second multiplier mode; and a carry save adder array configured to combine the selected partial products in the first multiplier mode and in the second multiplier mode.
    Type: Grant
    Filed: January 10, 2008
    Date of Patent: September 25, 2012
    Assignee: Analog Devices, Inc.
    Inventors: Andreas D. Olofsson, Baruch Yanovitch
  • Patent number: 8244790
    Abstract: A multiplier circuit is disclosed including a Wallace tree block and a carry propagation adder. The Wallace tree block includes a sum calculation block adding partial products for each digit and a carry calculation block adding carries obtained in the addition by the sum calculation block. In the case of multiplication over an extension field (finite field GF(2^n)) of two, a result of calculation by the sum calculation block is outputted. The carry propagation adder adds the result of calculation by the sum calculation block and a result of calculation by the carry calculation block. In the case of multiplication for integers (finite field GF(p)), a result of calculation by the carry propagation adder is outputted.
    Type: Grant
    Filed: January 21, 2004
    Date of Patent: August 14, 2012
    Assignee: International Business Machines Corporation
    Inventors: Akashi Satoh, Kohji Takano
  • Patent number: 8099450
    Abstract: Combining circuitry for combining a plurality of multi-bit partial product terms in a multiplier circuit includes a plurality of compression columns, each column receiving a plurality of partial product term bits. At least one compression column includes: a first circuit arranged to receive a first set of the plurality of partial product term bits for the at least one compression column, the first circuit further arranged to combine the first set of term bits to produce a first combined term bit set; and a second circuit arranged to receive a second set of the plurality of term bits for the at least one compression column and all of the first combined term bit set.
    Type: Grant
    Filed: July 20, 2006
    Date of Patent: January 17, 2012
    Assignee: STMicroelectronics (Research & Development) Ltd.
    Inventor: Tariq Kurd
  • Patent number: 8019805
    Abstract: A floating point multiplier circuit includes partial product generation logic configured to generate a plurality of partial products from multiplicand and multiplier values. The plurality of partial products corresponds to a first and second portion of the multiplier value during respective first and second partial product execution phases. The multiplier also includes a plurality of carry save adders configured to accumulate the plurality of partial products generated during the first and second partial product execution phases into a redundant product during respective first and second carry save adder execution phases. The multiplier further includes a first carry propagate adder coupled to the plurality of carry save adders and configured to reduce a first and second portion of the redundant product to a multiplicative product during respective first and second carry propagate adder phases. The first carry propagate adder phase begins after the second carry save adder execution phase completes.
    Type: Grant
    Filed: December 9, 2003
    Date of Patent: September 13, 2011
    Assignee: GLOBALFOUNDRIES Inc.
    Inventor: Debjit Das Sarma
  • Publication number: 20100235414
    Abstract: A Montgomery multiplication device calculates a Montgomery product of an operand X and an operand Y with respect to a modulus M and includes a plurality of processing elements. In a first clock cycle, two intermediate partial sums are created by obtaining an input of length w-1 from a preceding processing element as w-1 least significant bits. The most significant bit is configured as either zero or one. Then, two partial sums are calculated using a word of the operand Y, a word of the modulus M, a bit of the operand X, and the two intermediate partial sums. In a second clock cycle, a selection bit is obtained from a subsequent processing element and one of the two partial sums is selected based on the value of the selection bit. Then, the selected partial sum is used for calculation of a word of the Montgomery product.
    Type: Application
    Filed: March 1, 2010
    Publication date: September 16, 2010
    Inventors: Miaoqing Huang, Krzysztof Gaj
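For orientation, the Montgomery product that such a device computes can be written as a short bit-serial Python sketch. This is the textbook radix-2 formulation, not the word-based processing-element scheme of the abstract above: there is no word splitting, no pair of speculative partial sums, and no selection bit. The function name and the parameter `n` (the bit length used for the Montgomery radix 2**n) are assumptions for illustration.

```python
def montgomery_product(x, y, m, n):
    """Return x * y * 2**(-n) mod m, for odd m with 0 <= x, y < m < 2**n."""
    assert m % 2 == 1 and 0 <= x < m and 0 <= y < m and m < (1 << n)
    s = 0
    for i in range(n):
        s += ((x >> i) & 1) * y   # add Y when bit i of X is set
        if s & 1:
            s += m                # make the partial sum even without changing it mod m
        s >>= 1                   # exact division by two
    return s - m if s >= m else s  # single conditional subtraction


x, y, m, n = 45, 76, 97, 7
p = montgomery_product(x, y, m, n)
assert (p << n) % m == (x * y) % m
```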
  • Patent number: 7797366
    Abstract: Techniques for the design and use of a digital signal processor, including processing transmissions in a communications (e.g., code division multiple access) system. Power-efficient sign extension for Booth multiplication processes involves applying a sign bit in a Booth multiplication tree. The sign bit allows the Booth multiplication process to perform a sign extension step. This further involves one-extending a predetermined partial product row of the Booth multiplication tree using a sign bit for preserving the correct sign of the predetermined partial product row. The process and system resolve the signal value of the sign bit by generating a sign-extension bit in the Booth multiplication tree. The sign-extension bit is positioned in a carry-out column to extend the product of the Booth multiplication process.
    Type: Grant
    Filed: February 15, 2006
    Date of Patent: September 14, 2010
    Assignee: QUALCOMM Incorporated
    Inventors: Shankar Krithivasan, Christopher Edward Koob, William C. Anderson
  • Publication number: 20100057824
    Abstract: A computer system for computing a binary operation involving a first term multiplied by a second term resulting in a product, where the product is conditionally added to a third term in a central processing unit. The central processing unit includes a carry save adder configured to add a plurality of partial products obtained from the product of the first term and the second term to obtain a first partial result and a second partial result, a multiplexer configured to output one selected from the group consisting of the second term, the third term, and zero, and an alignment shifter configured to shift an output of the multiplexer to align the output of the multiplexer with the first partial result and the second partial result to obtain a shifted term. The shifted term, the first partial result and the second partial result are added together to obtain a result of the binary operation.
    Type: Application
    Filed: September 3, 2008
    Publication date: March 4, 2010
    Applicant: SUN MICROSYSTEMS, INC.
    Inventor: Leonard D. Rarick
  • Publication number: 20080243976
    Abstract: The present invention relates to a multiply apparatus and a method for multiplying a first operand consisting of na bits and a second operand consisting of nx bits. In one embodiment the multiply apparatus comprises a carry-save adder (CSA) unit with nx rows each comprising na AND gates for calculating a single bit product of two single bit input values and adder cells for adding results of a preceding row to a following row and a last output row for outputting a carry vector and a sum vector, and logic circuitry for selectively inverting the single bit products at the most significant position of the nx-1 first rows and at the na-1 least significant positions of the output row in response to a first configuration signal before inputting the selectively inverted single bit products to respective adder cells for switching the CSA unit selectively between processing of signed two's complement operands and unsigned operands in response to the first configuration signal.
    Type: Application
    Filed: March 28, 2008
    Publication date: October 2, 2008
    Applicant: TEXAS INSTRUMENTS DEUTSCHLAND GMBH
    Inventor: Christian Wiencke
  • Patent number: 7424506
    Abstract: A method is presented comprising analyzing two or more input terms on a per-bit basis within each level of bit-significance, maximally segmenting each of the levels of bit-significance into one or more one-, two-, and/or three-bit groups, and designing a hyperpipelined hybrid Wallace tree adder utilizing one or more full-adders, half-adders, and associated registers based, at least in part, on the maximal segmentation of the input terms.
    Type: Grant
    Filed: March 31, 2001
    Date of Patent: September 9, 2008
    Assignee: Durham Logistics LLC
    Inventor: John T. Orchard
  • Patent number: 7392277
    Abstract: A cascaded differential domino four-to-two reducer. In an embodiment, the four-to-two reducer is constructed of a first three-to-two reducer and a second three-to-two reducer directly connected to the first three-to-two reducer. In a further embodiment, the first and second three-to-two reducer both include a symmetric carry generate gate.
    Type: Grant
    Filed: June 29, 2001
    Date of Patent: June 24, 2008
    Assignee: Intel Corporation
    Inventor: Thomas D. Fletcher
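At the logic level, a four-to-two reducer built from two directly connected three-to-two reducers can be sketched in Python. The differential domino circuit style is not modelled, and the carry-in and carry-out pins (`cin`, `cout`) follow the common textbook 4:2 compressor arrangement, which is an assumption here rather than something the abstract states.

```python
from itertools import product


def reducer_3_2(a, b, c):
    """Full adder on single bits: a + b + c == s + 2 * carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def reducer_4_2(a, b, c, d, cin):
    """Two cascaded 3:2 reducers; a+b+c+d+cin == s + 2*(carry + cout)."""
    s1, cout = reducer_3_2(a, b, c)      # first reducer; cout does not depend on cin
    s, carry = reducer_3_2(s1, d, cin)   # second reducer takes s1 directly
    return s, carry, cout


# Exhaustive check over all 32 input combinations.
for a, b, c, d, cin in product((0, 1), repeat=5):
    s, carry, cout = reducer_4_2(a, b, c, d, cin)
    assert a + b + c + d + cin == s + 2 * (carry + cout)
```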
  • Patent number: 7373368
    Abstract: A multiply execution unit that can generate the integer product of a multiplicand and a multiplier and is also operable to generate the XOR product of the multiplicand and the multiplier. The multiply execution unit includes a summing circuit for summing a plurality of partial products. The summing circuit includes a plurality of rows. The summing circuit can generate an integer sum of the plurality of partial products and can generate an XOR sum of the plurality of partial products. The summing circuit includes a plurality of compressors in the first row of the summing circuit. The plurality of compressors each has more than three inputs that receive data, a carry output, and a sum output.
    Type: Grant
    Filed: July 15, 2004
    Date of Patent: May 13, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Leonard D. Rarick, Shu-Chin Tai
  • Patent number: 7334200
    Abstract: A low-error fixed-width multiplier receives a W-bit input and produces a W-bit product. In an embodiment, a multiplier (Y) is encoded using modified Booth coding. The encoded multiplier (Y) and a multiplicand (X) are processed together to generate partial products. The partial products are accumulated to generate a product (P). To compensate for the quantization error, Booth encoder outputs are used for the generation of error compensation bias. The truncated bits are divided into two groups, a major least significant bit group and a minor least significant bit group, depending upon their effects on the quantization error. Different error compensation methods are applied to each group.
    Type: Grant
    Filed: February 22, 2005
    Date of Patent: February 19, 2008
    Assignee: Broadcom Corporation
    Inventors: Keshab K. Parhi, Jin-Gyun Chung, Kwang-Cheol Lee, Kyung-Ju Cho
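The modified Booth coding and partial-product accumulation mentioned in the abstract above can be illustrated with a radix-4 recoding sketch in Python. It covers only the exact full-width product; the truncation to W bits and the error-compensation bias that the patent is actually about are not modelled. The function names and the even-`width` restriction are assumptions for illustration.

```python
def booth_radix4_digits(y, width):
    """Recode a width-bit two's-complement multiplier into digits in {-2,...,2}."""
    assert width % 2 == 0
    y &= (1 << width) - 1
    bits = y << 1                        # append the implicit bit y[-1] = 0
    digits = []
    for i in range(0, width, 2):
        triple = (bits >> i) & 0b111     # bits y[i+1], y[i], y[i-1]
        # digit = y[i-1] + y[i] - 2 * y[i+1]
        digits.append((triple & 1) + ((triple >> 1) & 1) - 2 * (triple >> 2))
    return digits


def booth_multiply(x, y, width):
    """Sum the Booth partial products d * x * 4**j for a width-bit signed y."""
    return sum((d * x) << (2 * j)
               for j, d in enumerate(booth_radix4_digits(y, width)))


assert booth_radix4_digits(7, 4) == [-1, 2]   # 7 == -1 + 2 * 4
assert booth_multiply(13, -3, 4) == -39
```

Radix-4 recoding halves the number of partial products compared with one partial product per multiplier bit, which is why it is commonly placed in front of a carry-save reduction.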
  • Patent number: 7313585
    Abstract: A multiplier circuit is disclosed for multiplying a multiplicand by a multiplier. The multiplier circuit includes a partial product generator and a partial product adder. The partial product generator includes a first input to receive a multiplicand; a second input to receive a multiplier; partial product generation means for producing a plurality of partial products based on the multiplicand and the multiplier; and an output coupled to the partial product generation means to provide the plurality of partial products. The partial product adder includes an input coupled to the output of the partial product generator; a plurality of adders to add the plurality of partial products to produce a final product, the plurality of adders comprising a plurality of compressors having substantially the same width; and an output coupled to the plurality of adders to provide the final product.
    Type: Grant
    Filed: August 30, 2003
    Date of Patent: December 25, 2007
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Paul W. Winterrowd
  • Patent number: 7212959
    Abstract: A method and apparatus for accumulating arbitrary length strings of input values, such as floating point values, in a layered tree structure such that the order of adds at each layer is maintained. The accumulating utilizes a shared adder, and includes means for directing initial inputs and intermediate result values.
    Type: Grant
    Filed: August 8, 2001
    Date of Patent: May 1, 2007
    Inventors: Stephen Clark Purcell, Scott Kimura, Mark L. Wood Patrick
  • Patent number: 7124162
    Abstract: A Wallace tree structure such as that used in a digital signal processor (DSP) is arranged to sum vectors. The structure has a number of adder stages, each of which may have half adders with two input nodes, and full adders with three input nodes. The structure is designed with reference to the vectors to be summed. The number of full- and half-adders in each stage and the arrangement of vector inputs depends upon their characteristics. An algorithm calculates the possible tree structures and input arrangements, and selects an optimum design having a small final stage ripple adder after the last stage of the Wallace tree structure, the design being based upon the characteristics of the vector inputs. This leads to reduced propagation delay and a reduced amount of semiconductor material for implementation of the DSP.
    Type: Grant
    Filed: October 29, 2002
    Date of Patent: October 17, 2006
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Alain Combes, Franz Steininger
  • Patent number: 7111033
    Abstract: A carry save adder circuit for reducing the number of inputs to a lower number of outputs, the carry save adder circuit including four carry save adders, the four carry save adders being arranged in two layers with the first and second carry save adders being arranged in a first of said layers and the third and fourth carry save adders being arranged in a second of the layers, said third and fourth carry save adders being arranged to provide the outputs, the third and fourth carry save adders each receiving at least one output from each of the first and second carry save adders and the first and second carry save adders being arranged to receive at least some of the inputs.
    Type: Grant
    Filed: July 30, 2001
    Date of Patent: September 19, 2006
    Assignee: STMicroelectronics S.A.
    Inventor: Sebastien Ferroussat
  • Patent number: 7111166
    Abstract: An extension of the serial/parallel Montgomery modular multiplication method with simultaneous reduction as previously implemented by the applicants, adapted innovatively to perform both in the prime number and in the GF(2^q) polynomial based number field, in such a way as to simplify the flow of operands, by performing a multiple anticipatory function to enhance the previous modular multiplication procedures.
    Type: Grant
    Filed: May 14, 2001
    Date of Patent: September 19, 2006
    Assignee: Fortress U&T Div. M-Systems Flash Disk Pioneers Ltd.
    Inventors: Itai Dror, Carmi David Gressel, Michael Mostovoy, Alexey Molchanov
  • Patent number: 7043520
    Abstract: A partial carry-save format is employed for a finite impulse response filter output representation, thereby reducing a number of flip-flops and hence power. By replacing the least significant bit processing section on the output side of the finite impulse response filter with a combined carry-save adder and carry-propagate adder followed by a register rather than two flip-flops, the present invention reduces the load on the clock and achieves reduced propagation delay. To further improve the performance of the finite impulse response filter, a simpler carry-save adder is employed in the least significant bit section, which is possible due to the use of a single register at an input to each of the carry-save adders rather than two flip-flops, one for a carry output and one for a sum output from the adder.
    Type: Grant
    Filed: November 29, 2003
    Date of Patent: May 9, 2006
    Assignee: Agere Systems Inc.
    Inventors: Patrik Larsson, Christopher John Nicol
  • Patent number: 6989843
    Abstract: A sample-to-pixel calculation unit in a graphics system may comprise an adder tree. The adder tree includes a plurality of adder cells coupled in a tree configuration. Input values are presented to a first layer of adder cells. Each input value may have two associated control signals: a data valid signal and a winner-take-all signal. The final output of the adder tree equals (a) a sum of those input values whose data valid signals are asserted provided that none of the winner-take-all signals are asserted, or (b) a selected one of the input values if one of the winner-take-all bits is asserted. The selected input value is the one whose winner-take-all bit is set. The adder tree may be used to perform sums of weighted sample attributes and/or sums of coefficients values as part of pixel value computations.
    Type: Grant
    Filed: June 28, 2001
    Date of Patent: January 24, 2006
    Assignee: Sun Microsystems, Inc.
    Inventors: N. David Naegle, Scott R. Nelson
  • Patent number: 6978426
    Abstract: A low-error fixed-width multiplier receives a W-bit input and produces a W-bit product. In an embodiment, a multiplier (Y) is encoded using modified Booth coding. The encoded multiplier (Y) and a multiplicand (X) are processed together to generate partial products. The partial products are accumulated to generate a product (P). To compensate for the quantization error, Booth encoder outputs are used for the generation of error compensation bias. The truncated bits are divided into two groups, a major least significant bit group and a minor least significant bit group, depending upon their effects on the quantization error. Different error compensation methods are applied to each group.
    Type: Grant
    Filed: August 30, 2002
    Date of Patent: December 20, 2005
    Assignee: Broadcom Corporation
    Inventors: Keshab K. Parhi, Jin-Gyun Chung, Kwang-Cheol Lee, Kyung-Ju Cho
  • Patent number: 6973471
    Abstract: A multiplier (42) forms a product from two signed operands without performing a sign extension of the multiplicand (A). A modified Booth's recoding of the multiplier operand (B) is begun immediately without being delayed by a sign extension operation. While recoding and partial product generation is occurring, a determination is made in parallel whether or not a sign extension adjustment term must be created. When needed, a value equal to (-B)(2^N), where N is equal to a bit width of the multiplicand (A), is formed in parallel with the recoding and partial product generation. The sign extension adjustment term is coupled to a plurality of carry save adders (49, 51, 53) that compress a plurality of partial products to a sum term and a carry term. A final add stage combines the sum term and carry term to provide a product with correct sign extension.
    Type: Grant
    Filed: February 22, 2002
    Date of Patent: December 6, 2005
    Assignee: Freescale Semiconductor, Inc.
    Inventor: Trinh Huy Nguyen
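    Illustrative sketch (not taken from the patent): the abstract above relies on modified Booth recoding of the multiplier operand. The Python below shows only the textbook radix-4 recoding and the resulting partial products; it does not model the patent's sign extension adjustment term. The function names and the even `width` requirement are assumptions made for the example.

```python
def booth_radix4_digits(b: int, width: int) -> list:
    """Recode a width-bit two's-complement multiplier into radix-4 digits in {-2..2}.

    Digit k is -2*b[2k+1] + b[2k] + b[2k-1], with b[-1] taken as 0.
    """
    assert width % 2 == 0, "this sketch assumes an even bit width"
    bits = b & ((1 << width) - 1)  # two's-complement bit pattern of b
    digits = []
    prev = 0  # the bit just below the current pair
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, width: int) -> int:
    """Sum the Booth partial products digit * a * 4**k."""
    return sum(d * a * 4 ** k for k, d in enumerate(booth_radix4_digits(b, width)))
```

    Each digit selects one of 0, ±A, or ±2A as a partial product, so a `width`-bit multiplier yields `width / 2` partial products for the carry-save adders to compress.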
  • Patent number: 6763367
    Abstract: An apparatus and method for compressing a reduction array into an accumulated carry-save sum. The reduction array includes a partial product matrix, a carry-save sum, and a constant value row. A compressor array generates a previous accumulated carry-save sum. A three-input/two-output carry-save adder pre-reduces the constant value row and the previously accumulated carry-save sum into a two-row intermediate carry-save sum that is added to the partial product matrix to form a current accumulated carry-save sum.
    Type: Grant
    Filed: December 11, 2000
    Date of Patent: July 13, 2004
    Assignee: International Business Machines Corporation
    Inventors: Ohsang Kwon, Kevin J. Nowka
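    Illustrative sketch (not taken from the patent): a three-input/two-output carry-save adder of the kind named in the abstract above can be modeled on non-negative Python integers. The names `csa` and `accumulate` are hypothetical, and the row-by-row loop stands in for the compressor array, whose actual structure the abstract does not specify.

```python
def csa(x: int, y: int, z: int):
    """3:2 carry-save adder: returns (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z                            # bitwise sum without carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1   # majority bits, shifted to the next column
    return s, c


def accumulate(rows, prev_sum=0, prev_carry=0, constant=0):
    """Fold partial-product rows into an accumulated carry-save pair.

    The constant row and the previous carry-save pair are pre-reduced to two
    rows first, then each partial-product row is compressed in turn.
    """
    s, c = csa(constant, prev_sum, prev_carry)
    for row in rows:
        s, c = csa(s, c, row)
    return s, c
```

    The result stays in redundant (sum, carry) form; a single carry-propagate addition `s + c` is needed only when the final value is read out.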
  • Patent number: 6732135
    Abstract: In a digital processor performing division, quotient accumulation apparatus is formed of a set of muxes and a single carry save adder. Partial quotients are accumulated in carry-save form with proper sign extension. Delay of partial quotient bit fragments from one iteration to a following iteration enables the apparatus to limit use to one carry save adder. By enlarging minimal logic, the quotient accumulation apparatus operates at a rate fast enough to support the rate of fast dividers.
    Type: Grant
    Filed: January 31, 2000
    Date of Patent: May 4, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Sridhar Samudrala, John D. Clouser, William R. Grundmann
  • Patent number: 6721774
    Abstract: A digital multiplier 110 for multiplying a plurality of multiplicand signals X0-X23 representing a multiplicand and a plurality of multiplier signals Y0-Y23 representing a multiplier. In it, a plurality of intermediate results signals, such as partial product signals, are generated from the multiplicand signals and the multiplier signals. A plurality of adder circuits 40 are also provided for adding the intermediate results signals to generate a plurality of final result signals representing the result of multiplying the multiplicand and the multiplier, wherein at least some of the adder circuits receive first signals representing intermediate addition results from at least two prior adder stages and also receive second signals representing intermediate results generated as the result of only a single addition.
    Type: Grant
    Filed: May 7, 1998
    Date of Patent: April 13, 2004
    Assignee: Texas Instruments Incorporated
    Inventors: Wai Lee, Toshiyuki Sakuta
  • Patent number: 6692534
    Abstract: The present invention provides an apparatus for Booth decoding which stores the most significant bit of the lower half of the number used as the key for Booth decoding. By using this stored bit to determine the rightmost Booth group corresponding to the upper half of the key, Booth decoding may be accomplished more quickly using an apparatus that is simpler and smaller than prior art assemblies.
    Type: Grant
    Filed: September 8, 1999
    Date of Patent: February 17, 2004
    Assignee: Sun Microsystems, Inc.
    Inventors: Yong Wang, Allan Tzeng
  • Patent number: 6615229
    Abstract: The present invention relates to a new low-power, high performance multiplier circuit design, and more specifically to a partitioned multiplier implemented using a modified, symmetrical Wallace tree structure that enables the power to parts of the multiplier to be selectively turned on and off. A multiplier implemented using complementary pass-transistor logic (CPL) 3:2 carry save adders (CSAs) includes a left array with a first multiple of CPL CSAs, a right array with a second multiple of CPL CSAs, and a merge block coupled to the left array and the right array, such that the left and right arrays are not coupled to each other. The left and right arrays are configured to independently receive power, such that each array can be turned on and off without affecting the other array. The merge block includes a third multiple of CPL CSAs and the merge block can be configured to output a result value of a multiplication operation.
    Type: Grant
    Filed: June 29, 2000
    Date of Patent: September 2, 2003
    Assignee: Intel Corporation
    Inventors: Narsing Vijayrao, Chi Keung Lee, Kumar Sudarshan
  • Patent number: 6611857
    Abstract: A multiplier (12) is disclosed that includes an encoder (36), a hierarchy of compressors (40, 42, 44, 50, 52, 60 and 70), a bit detector (130) and a switch (134). The encoder (36) is operable to receive a first and second encoder input. The compressors (40, 42, 44, 50, 52, 60 and 70) are coupled to the encoder (36). The compressors (40, 42, 44, 50, 52, 60 and 70) are operable to receive a first number of inputs and to generate a second number of outputs, with the second number being less than the first number. The bit detector (130) is operable to monitor the first encoder input to determine whether the first encoder input is in a reduced precision range (28). The bit detector (130) is also operable to deactivate a subset of the compressors (40 and 50) when the bit detector (130) determines that the first encoder input is in the reduced precision range (28). The switch (134) is coupled to a specified one of the compressors (42).
    Type: Grant
    Filed: November 15, 2000
    Date of Patent: August 26, 2003
    Assignee: Texas Instruments Incorporated
    Inventors: Carl E. Lemonds, Alan Gatherer
  • Publication number: 20030093454
    Abstract: A Wallace tree structure such as that used in a DSP is arranged to sum vectors. The structure has a number of adder stages (365, 370, 375), each of which may have half adders (300) with two input nodes, and full adders (310) with three input nodes. The structure is designed with reference to the vectors to be summed. The number of full- and half-adders in each stage and the arrangement of vector inputs depends upon their characteristics. An algorithm calculates the possible tree structures and input arrangements, and selects an optimum design having a small final stage ripple adder (380), the design being based upon the characteristics of the vector inputs. This leads to reduced propagation delay and a reduced amount of semiconductor material for implementation of the DSP.
    Type: Application
    Filed: October 29, 2002
    Publication date: May 15, 2003
    Inventors: Alain Combes, Franz Steininger
  • Patent number: 6535902
    Abstract: A multiplier circuit has an encoder and a partial product bit generating circuit. The encoder receives a multiplier bit signal and is used to output a plurality of encode signals. The partial product bit generating circuit receives the encode signals along with a multiplicand bit signal from each digit place and is used to generate a partial product bit for each digit place. The partial product bit generating circuit has a first selection circuit which is used to select a logically true signal from among the encode signals in accordance with a value of the multiplicand bit signal. Therefore, the circuit can be reduced in size by reducing the number of necessary elements without sacrificing its high speed capability.
    Type: Grant
    Filed: April 2, 2001
    Date of Patent: March 18, 2003
    Assignee: Fujitsu Limited
    Inventor: Gensuke Goto
  • Patent number: 6535901
    Abstract: A method and apparatus for generating a fast multiply accumulation circuit includes processing that begins by determining number of current partial products for a multiplication of a first multiplicand and a second multiplicand. The processing then continues by determining size of the current partial products. The processing then continues by identifying one of a plurality of reduction patterns based on the size of the current partial products. The processing then continues by determining number of, and configuration of, full adders and half adders required for a reduction function of the current partial products based on the one of the plurality of reduction patterns and the size of the current partial products, wherein the multiply-accumulator performs the reduction function.
    Type: Grant
    Filed: April 26, 2000
    Date of Patent: March 18, 2003
    Assignee: Sigmatel, Inc.
    Inventor: Robert T Grisamore
  • Patent number: 6484193
    Abstract: A fully pipelined parallel multiplier with a fast clock cycle. The pipelined parallel multiplier contains three units: a bit-product matrix unit, a reduction unit, and an addition unit. The bit-product matrix is configured to receive two binary numbers, a multiplier and a multiplicand. A bit-product matrix is formed based on these two numbers. The bit-product matrix unit forms a first pipeline stage. The bit-product matrix is latched to the reduction unit using d-type latch circuits. The reduction unit includes a plurality of reduction stages, with each reduction stage acting as a pipeline stage. The reduction unit reduces the matrix down to a two-row matrix. Intermediate results are latched from one stage to the next using d-type latch circuits. The reduction unit also contains a plurality of half-adder and full-adder circuits. The final two-row matrix formed by the reduction unit is then latched to an addition unit.
    Type: Grant
    Filed: July 30, 1999
    Date of Patent: November 19, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gwangwoo Johnny Choe, James R. MacDonald
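    Illustrative sketch (not taken from the patent): the three units named in the abstract above (bit-product matrix, staged reduction to two rows, final addition) can be modeled for unsigned operands as follows. All function names are hypothetical. As a simplification, each stage uses only full adders and passes leftover bits through, whereas the abstract also mentions half adders; the latching between pipeline stages is not modeled.

```python
def bit_product_matrix(a: int, b: int, n: int):
    """Columns of AND-ed bit products for two unsigned n-bit operands."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols


def reduction_stage(cols):
    """One reduction stage: each group of three bits in a column feeds a full adder."""
    out = [[] for _ in range(len(cols) + 1)]
    for k, col in enumerate(cols):
        full = len(col) // 3 * 3
        for i in range(0, full, 3):
            x, y, z = col[i:i + 3]
            out[k].append(x ^ y ^ z)                        # sum stays in column k
            out[k + 1].append((x & y) | (x & z) | (y & z))  # carry moves to column k+1
        out[k].extend(col[full:])  # leftover bits pass through to the next stage
    return out


def multiply(a: int, b: int, n: int):
    """Return (product, number of reduction stages used)."""
    cols = bit_product_matrix(a, b, n)
    stages = 0
    while max(len(c) for c in cols) > 2:
        cols = reduction_stage(cols)
        stages += 1
    # The final two-row matrix goes to the addition unit.
    row0 = sum(c[0] << k for k, c in enumerate(cols) if len(c) > 0)
    row1 = sum(c[1] << k for k, c in enumerate(cols) if len(c) > 1)
    return row0 + row1, stages
```

    In a pipelined design each call to `reduction_stage` would correspond to one pipeline stage, so the returned stage count gives a rough sense of the pipeline depth under this simplified reduction scheme.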
  • Patent number: 6446104
    Abstract: A double-precision multiplier for use in the floating point pipeline of a processor has an array multiplier and a carry-save partial-product accumulator. Double precision multiplication is accomplished by generating a plurality of partial products and summing these in the carry-save partial-product accumulator. The partial-product accumulator has a carry-save adder, a sum register, a carry-out counter and an extender. The carry-out counter receives carry outputs of the carry-save adder and array multiplier, and the extender is coupled to extend the sum register dependent upon the contents of the carry-out counter. The extension occurs during addition of the most significant partial product to the sum of less significant partial products.
    Type: Grant
    Filed: September 15, 1999
    Date of Patent: September 3, 2002
    Assignee: Sun Microsystems, Inc.
    Inventors: Tzungren Allan Tzeng, Choon Ping Chng
  • Patent number: 6442582
    Abstract: A multiplier carry bit compression apparatus and method for a multiplier using Wallace tree addition structures uses a plurality of early and late carry bit compression operations for each level of the Wallace tree addition structure. For each level in a Wallace tree addition structure, each early carry bit compression operation compresses early compression bits prior to each corresponding late carry bit compression operation that compresses late carry bits.
    Type: Grant
    Filed: June 17, 1999
    Date of Patent: August 27, 2002
    Assignee: ATI International SRL
    Inventor: Stephen C. Hale
  • Publication number: 20020116433
    Abstract: A multiply-accumulate module (100) includes a multiply-accumulate core (120), which includes a plurality of Booth encoder cells (104a). The multiply-accumulate core (120) also includes a plurality of Booth decoder cells (110a) connected to at least one of the Booth encoder cells (104a) and a plurality of Wallace tree cells (112a) connected to at least one of the Booth decoder cells (110a). Moreover, at least one first Wallace tree cell (112a1) or at least one first Booth decoder cell (110a1), or any combination thereof, includes a first plurality of transistors, and at least one second Wallace tree cell (112a2) or at least one second Booth decoder cell (110a2), or any combination thereof, includes a second plurality of transistors. In addition, at least one critical path of the multiply-accumulate module (100) includes the at least one first cell and a width of at least one of the first plurality of transistors is greater than a width of at least one of the second plurality of transistors.
    Type: Application
    Filed: September 27, 2001
    Publication date: August 22, 2002
    Inventors: Kaoru Awaka, Hiroshi Takahashi, Shigetoshi Muramatsu, Akihiro Takegama