Multiplication Followed By Addition (i.e., X*y+z) Patents (Class 708/523)
-
Publication number: 20140122554
Abstract: In an embodiment, a fused multiply-add (FMA) circuit is configured to receive a plurality of input data values to perform an FMA instruction on the input data values. The circuit includes a multiplier unit and an adder unit coupled to an output of the multiplier unit, and a control logic to receive the input data values and to reduce switching activity and thus reduce power consumption of one or more components of the circuit based on a value of one or more of the input data values. Other embodiments are described and claimed.
Type: Application
Filed: October 31, 2012
Publication date: May 1, 2014
Inventors: Brian J. Hickmann, Dennis R. Bradford, Thomas D. Fletcher
-
Patent number: 8706791
Abstract: Embodiments of the invention are directed to a system and method that enable relatively low power dissipation by scheduling operations of a chain of two or more multiply-accumulator units, delivering an output result of a first multiply accumulator of the chain as an input to a second, subsequent multiply accumulator of the chain.
Type: Grant
Filed: July 30, 2009
Date of Patent: April 22, 2014
Assignee: Ceva D.S.P. Ltd.
Inventor: Jeffrey Allan (Alon) Jacob (Yaakov)
-
Publication number: 20140101220
Abstract: A composite finite field multiplier is disclosed. The multiplier includes a controller, an input port, an output port, a GF((2^n)^2) multiplier, a GF(2^n) standard basis multiplier, and a GF(2^n) look-up table multiplier; the controller is connected respectively to the input port, the output port, the GF((2^n)^2) multiplier, the GF(2^n) standard basis multiplier and the GF(2^n) look-up table multiplier; the GF((2^n)^2) multiplier is connected respectively to the GF(2^n) standard basis multiplier and the GF(2^n) look-up table multiplier. By using the GF((2^n)^2) multiplier, the GF(2^n) standard basis multiplier and the GF(2^n) look-up table multiplier, the multiplication of three operands is realized. Compared with the existing multiplier, the multiplier of the present invention has significant advantages in the speed of multiplying three operands over GF((2^n)^m).
Type: Application
Filed: May 25, 2012
Publication date: April 10, 2014
Inventors: Shaohua Tang, Haibo Yi
-
Publication number: 20140095570
Abstract: An apparatus and method for calculating an internal state for artificial emotions are disclosed, of which the method comprises multiplying an input value obtained from a sensor with a first personality set in accordance with at least one low rank element contained in at least one high rank element of a NEO PI-R (Revised NEO Personality Inventory); calculating a personality factor value in a Five-Factor Model of the personality by adding the results of the multiplication; and calculating the internal state by multiplying the personality factor value with a second personality.
Type: Application
Filed: September 28, 2012
Publication date: April 3, 2014
Applicant: KOREA INSTITUTE OF INDUSTRIAL TECHNOLOGY
Inventor: KOREA INSTITUTE OF INDUSTRIAL TECHNOLOGY
-
Publication number: 20140095568
Abstract: A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result.
Type: Application
Filed: December 3, 2013
Publication date: April 3, 2014
Inventors: MAARTEN J. BOERSMA, KLAUS M. KROENER, CHRISTOPHE J. LAYER, SILVIA M. MUELLER
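The Booth-encode, multiply, then add flow in this abstract can be sketched in software as follows. This is a minimal integer-only model under stated assumptions: it uses standard radix-4 Booth recoding, it omits the carry corrector and all floating-point alignment, and the function names and the default 16-bit width are hypothetical, not taken from the application.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a signed two's-complement multiplier into radix-4 Booth
    digits in {-2, -1, 0, 1, 2}, least significant digit first."""
    if bits % 2:
        bits += 1  # radix-4 recoding consumes two bits per digit
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    padded = pattern << 1                     # implicit 0 below the LSB
    digits = []
    for i in range(0, bits, 2):
        triplet = (padded >> i) & 0b111
        low = triplet & 1           # multiplier bit i-1
        mid = (triplet >> 1) & 1    # multiplier bit i
        high = (triplet >> 2) & 1   # multiplier bit i+1
        digits.append(low + mid - 2 * high)
    return digits


def fused_multiply_add(a, b, c, bits=16):
    """Compute a*b + c by summing Booth partial products with the addend.
    `a` must fit in a signed `bits`-wide word (an assumed width)."""
    partials = [(digit * b) << (2 * i)
                for i, digit in enumerate(booth_radix4_digits(a, bits))]
    return sum(partials) + c  # addend joins the partial-product sum
```

Each Booth digit selects 0, ±b, or ±2b, so a 16-bit multiplier yields eight partial products instead of sixteen; that halving is the usual reason hardware multipliers use this recoding.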
-
Publication number: 20140082036
Abstract: The disclosed embodiments disclose techniques for using a split division circuit that includes a first divider that is optimized for a first range of divisor values and a second divider that is optimized for a second range of divisor values; the first range is distinct from the second range. During operation, the circuit receives a divisor for the division operation. The circuit: determines whether the divisor is in the first range or the second range to determine whether the first divider or the second divider should perform the division operation; performs the division operation in the selected host divider; and then outputs the result that was generated by the selected host divider.
Type: Application
Filed: March 15, 2013
Publication date: March 20, 2014
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventors: Josephus C. Ebergen, Navaneeth P. Jamadagni, Ivan E. Sutherland
-
Publication number: 20140067889
Abstract: A datapath circuit may include a digital multiply and accumulate circuit (MAC) and a digital hardware calculator for parallel computation. The digital hardware calculator and the MAC may be coupled to an input memory element for receipt of input operands. The MAC may include a digital multiplier structure with partial product generators coupled to an adder to multiply a first and second input operands and generate a multiplication result. The digital hardware calculator may include a first look-up table coupled between a calculator input and a calculator output register. The first look-up table may include table entry values mapped to corresponding math function results in accordance with a first predetermined mathematical function. The digital hardware calculator may be configured to calculate, based on the first look-up table, a computationally hard mathematical function such as a logarithm function, an exponential function, a division function and a square root function.
Type: Application
Filed: August 27, 2013
Publication date: March 6, 2014
Applicant: ANALOG DEVICES A/S
Inventor: Mikael M. MORTENSEN
-
Patent number: 8667042
Abstract: A vector functional unit implemented on a semiconductor chip to perform vector operations of dimension N is described. The vector functional unit includes N functional units. Each of the N functional units has logic circuitry to perform: a first integer multiply add instruction that presents highest ordered bits but not lowest ordered bits of a first integer multiply add calculation, and, a second integer multiply add instruction that presents lowest ordered bits but not highest ordered bits of a second integer multiply add calculation.
Type: Grant
Filed: September 24, 2010
Date of Patent: March 4, 2014
Assignee: Intel Corporation
Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver
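As a rough illustration of the high/low split this abstract describes, the sketch below computes a full-width unsigned a*b + c and returns either the upper or the lower half. The 32-bit lane width and the function names are assumptions made for the example; the abstract specifies neither.

```python
WIDTH = 32               # assumed lane width, not stated in the abstract
MASK = (1 << WIDTH) - 1


def multiply_add_full(a, b, c):
    """Unsigned a*b + c at full precision (always fits in 2*WIDTH bits)."""
    return (a & MASK) * (b & MASK) + (c & MASK)


def multiply_add_low(a, b, c):
    """Present only the lowest ordered WIDTH bits of a*b + c."""
    return multiply_add_full(a, b, c) & MASK


def multiply_add_high(a, b, c):
    """Present only the highest ordered WIDTH bits of a*b + c."""
    return multiply_add_full(a, b, c) >> WIDTH
```

Issuing both variants on the same operands recovers the complete double-width result, which is how wide integer arithmetic is typically assembled from fixed-width lanes.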
-
Patent number: 8667046
Abstract: A Generalized Programmable Counter Array (GPCA) is a reconfigurable multi-operand adder, which can be reprogrammed to sum a plurality of operands of arbitrary size. The GPCA is configured to compress the input words down to two operands using parallel counters. Resulting operands are then summed using a standard Ripple Carry Adder to produce the final result. The GPCA consists of a linear arrangement of identical compressor slices (CSlice).
Type: Grant
Filed: February 20, 2009
Date of Patent: March 4, 2014
Assignee: Ecole Polytechnique Federale de Lausanne/Service des Relations Industrielles
Inventors: Philip Brisk, Alessandro Cevrero, Frank K. Gurkaynak, Paolo Ienne Lopez, Hadi Parandeh-Afshar
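The compress-to-two-operands idea can be sketched with the simplest parallel counter, a 3:2 carry-save compressor. This is a stand-in chosen for brevity, not the GPCA's programmable counter structure; the function names are hypothetical and the operands are assumed to be non-negative integers.

```python
def compress_3_to_2(x, y, z):
    """Carry-save step: reduce three operands to a sum word and a carry
    word whose total equals x + y + z, with no carry propagation."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def multi_operand_sum(operands):
    """Compress any number of operands down to two, then add them once."""
    pending = list(operands)
    while len(pending) > 2:
        x, y, z = pending.pop(), pending.pop(), pending.pop()
        pending.extend(compress_3_to_2(x, y, z))
    # Python's built-in addition stands in for the final ripple-carry adder.
    return sum(pending)
```

Only the last addition propagates carries across the word, which is why compressor trees are preferred over a chain of ordinary adders when many operands must be summed.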
-
Patent number: 8649508
Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and is then used to determine Elliptic Curve scalar multiplication over a finite elliptic curve.
Type: Grant
Filed: September 29, 2008
Date of Patent: February 11, 2014
Assignee: Tata Consultancy Services Ltd.
Inventor: Natarajan Vijayarangan
-
Patent number: 8645450
Abstract: Multiplier-accumulator circuitry includes circuitry for forming a plurality of partial products of multiplier and multiplicand inputs, carry-save adder circuitry for adding together the partial products and another input to produce intermediate sum and carry outputs, final adder circuitry for adding together the intermediate sum and carry outputs to produce a final output, and feedback circuitry for applying the final output (typically after some delay, e.g., due to registration of the final output) to the carry-save adder circuitry as said another input. The above circuitry may be implemented in so-called “hard IP” (intellectual property) of a field-programmable gate array (“FPGA”) integrated circuit device. If desired, any overflow from the accumulation performed by the above circuitry may be accumulated in “soft” accumulator-overflow circuitry that is implemented in the general-purpose programmable logic of the FPGA.
Type: Grant
Filed: March 2, 2007
Date of Patent: February 4, 2014
Assignee: Altera Corporation
Inventors: Kok Heng Choe, Tony K Ngai, Henry Y. Lui
-
Publication number: 20130346462
Abstract: An arithmetic logic stage in a graphics pipeline includes a number of arithmetic logic units (ALUs). The ALUs each include, for example, a multiplier and an adder. The ALUs are interconnected by circuitry that, for example, routes the output from the multiplier in one ALU to both the adder in that ALU and an adder in another ALU.
Type: Application
Filed: August 27, 2013
Publication date: December 26, 2013
Inventors: Tyson BERGLAND, Michael J.M. TOKSVIG, Justin Michael MAHAN
-
Publication number: 20130346461
Abstract: An apparatus for calculating a result of a scalar multiplication of a reference number with a reference point on an elliptic curve comprises a point selector and a processor. The point selector is configured to select randomly or pseudo-randomly an auxiliary point on the elliptic curve. The processor is configured to calculate the result of the scalar multiplication with a double-and-always-add process using the auxiliary point.
Type: Application
Filed: August 22, 2013
Publication date: December 26, 2013
Applicant: Infineon Technologies AG
Inventor: Wieland Fischer
-
Publication number: 20130332501
Abstract: A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result.
Type: Application
Filed: June 11, 2012
Publication date: December 12, 2013
Applicant: IBM Corporation
Inventors: Maarten J. Boersma, Klaus Michael Kroener, Christophe J. Layer, Silvia M. Mueller
-
Patent number: 8606840
Abstract: A fused multiply add (FMA) unit includes an alignment counter configured to calculate an alignment shift count, an aligner configured to align an addend input based on the alignment shift count and output an aligned addend, a multiplier configured to multiply a first multiplicand input and a second multiplicand input and output a product, an adder configured to add the aligned addend and the product and output a sum without determining the sign of the sum or complementing the sum, a normalizer configured to receive the sum directly from the adder and normalize the sum irrespective of the sign of the sum and output a normalized sum, and a rounder configured to round and complement-adjust the normalized sum and output a final mantissa.
Type: Grant
Filed: March 17, 2010
Date of Patent: December 10, 2013
Assignee: Oracle International Corporation
Inventor: Sadar Ahmed
-
Patent number: 8595280
Abstract: A data processing apparatus and method for performing multiply-accumulate operations is provided. The data processing apparatus includes data processing circuitry responsive to control signals to perform data processing operations on at least one input data element.
Type: Grant
Filed: October 29, 2010
Date of Patent: November 26, 2013
Assignee: ARM Limited
Inventors: Dominic Hugo Symes, Mladen Wilder, Guy Larri
-
Patent number: 8589465
Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.
Type: Grant
Filed: May 8, 2013
Date of Patent: November 19, 2013
Assignee: Altera Corporation
Inventors: Suleyman Sirri Demirsoy, Hyun Yi
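A purely functional sketch of the chained two-multiplier blocks described here is given below. It models only the arithmetic (two products plus a chained-in value per block) and deliberately ignores the systolic delay registers, so it says nothing about timing. The names `tap`, `dsp_block`, and `fir_output` are hypothetical.

```python
def tap(samples, index):
    """Return samples[index], treating out-of-range history as zero."""
    return samples[index] if 0 <= index < len(samples) else 0


def dsp_block(x0, c0, x1, c1, chain_in=0):
    """One block: two multipliers feeding an adder with a chain input."""
    return x0 * c0 + x1 * c1 + chain_in


def fir_output(samples, coeffs, n):
    """y[n] = sum over k of coeffs[k] * samples[n - k], two taps per block."""
    padded = list(coeffs) + [0] * (len(coeffs) % 2)  # even number of taps
    chain = 0
    for k in range(0, len(padded), 2):
        # The adder output of each block is chained into the next block.
        chain = dsp_block(tap(samples, n - k), padded[k],
                          tap(samples, n - k - 1), padded[k + 1], chain)
    return chain
```

For example, with samples `[1, 2, 3, 4]` and coefficients `[1, 0, -1]`, `fir_output(samples, coeffs, 2)` evaluates 1*3 + 0*2 + (-1)*1 across two chained blocks.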
-
Publication number: 20130262549
Abstract: An arithmetic circuit includes a circuit to output n-th multiples of a multiplicand, a circuit to output an XOR operation result that is a result of performing an exclusive logical sum operation between the multiplicand and a result of shifting the multiplicand to left by one bit, a circuit to output a first selection signal in response to a first portion of a multiplier, a circuit to output a second selection signal in response to a second portion of the multiplier, a circuit to select, in response to the first selection signal, one of the n-th multiples of the multiplicand and the XOR operation result, a circuit to select, in response to the second selection signal, one of the n-th multiples of the multiplicand and the XOR operation result, and a circuit to output a result of adding up the first partial product and the second partial product.
Type: Application
Filed: January 8, 2013
Publication date: October 3, 2013
Inventor: Kenichi KITAMURA
-
Patent number: 8543634
Abstract: A specialized processing block such as a DSP block may be enhanced by including direct connections that allow the block output to be directly connected to either the multiplier inputs or the adder inputs of another such block. A programmable integrated circuit device may include a plurality of such specialized processing blocks. The specialized processing block includes a multiplier having two multiplicand inputs and a product output, an adder having as one adder input the product output of the multiplier, and having a second adder input and an adder output, a direct-connect output of the adder output to a first other one of the specialized processing blocks, and a direct-connect input from a second other one of the specialized processing blocks. The direct-connect input connects a direct-connect output of that second other one of the specialized processing blocks to a first one of the multiplicand inputs.
Type: Grant
Filed: March 30, 2012
Date of Patent: September 24, 2013
Assignee: Altera Corporation
Inventors: Lei Xu, Volker Mauer, Steven Perry
-
Patent number: 8533250
Abstract: Circuits for a multiplier with a built-in accumulator and a method of performing multiplication with accumulation are disclosed. An embodiment of the disclosed circuits includes a logic circuit coupled to receive two inputs. The logic circuit is capable of generating a plurality of value bits from the inputs received. In one embodiment, the logic circuit includes a Booth recoder circuit that generates a plurality of partial products. A block of adders is coupled to the logic circuit to receive and sum up the value bits. An adder adds the summation result from the block of adders to a previous accumulated value to generate intermediate sum and carry values. An accumulator, coupled to the adder, receives and stores the intermediate values.
Type: Grant
Filed: June 17, 2009
Date of Patent: September 10, 2013
Assignee: Altera Corporation
Inventors: Kok Yoong Foo, Yan Jiong Boo, Geok Sun Chong, Boon Jin Ang, Kar Keng Chua
-
Publication number: 20130226982
Abstract: An apparatus and a method for generating a partial product for a polynomial operation are provided. The apparatus includes first encoders, each of the first encoders configured to selectively generate and output one of mutually exclusive values based on two inputs. The apparatus further includes a second encoder configured to generate and output two candidate partial products based on an output from a first one of the first encoders that is provided at a reference bit position of the inputs, an output from a second one of the first encoders that is provided at an upper bit position of the inputs, and a multiplicand. The apparatus further includes a multiplexer configured to select one of the candidate partial products output from the second encoder.
Type: Application
Filed: August 17, 2012
Publication date: August 29, 2013
Applicant: Samsung Electronics Co., Ltd.
Inventor: Hyeong-Seok YU
-
Patent number: 8521800
Abstract: An arithmetic logic stage in a graphics pipeline includes a number of arithmetic logic units (ALUs). The ALUs each include, for example, a multiplier and an adder. The ALUs are interconnected by circuitry that, for example, routes the output from the multiplier in one ALU to both the adder in that ALU and an adder in another ALU.
Type: Grant
Filed: August 15, 2007
Date of Patent: August 27, 2013
Assignee: Nvidia Corporation
Inventors: Tyson J. Bergland, Michael J. M. Toksvig, Justin M. Mahan
-
Publication number: 20130198254
Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
Type: Application
Filed: March 13, 2013
Publication date: August 1, 2013
Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
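One plausible reading of a packed multiply-add is sketched below: corresponding elements of two packed operands are multiplied and adjacent products are summed, so the result has half as many (wider) elements. The pairwise grouping is an assumption for illustration, since the abstract does not fix the element grouping or widths, and the function name is hypothetical.

```python
def packed_multiply_add(first, second):
    """Multiply corresponding elements, then add each adjacent pair of
    products to form one element of the result."""
    if len(first) != len(second) or len(first) % 2:
        raise ValueError("operands need equal, even element counts")
    return [first[i] * second[i] + first[i + 1] * second[i + 1]
            for i in range(0, len(first), 2)]


# Each result element holds one two-term multiply-add.
result = packed_multiply_add([1, 2, 3, 4], [5, 6, 7, 8])
print(result)  # [17, 53]
```

A dot product of longer vectors then reduces to one such instruction per group of elements followed by a short final sum.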
-
Patent number: 8495124
Abstract: A decimal multiplication mechanism for fixed and floating point computation in a computer having a coefficient mechanism without resulting leading zero detection (LZD) and process which assumes that the final product will be M+N digits in length and performs all calculations based on this assumption. Least significant digits that would be truncated are no longer stored, but retained as sticky information which is used to finalize the result product. Once the computation of the product is complete, a final check based on the examination of key bits observed during partial product accumulation is used to determine if the final product is truly M+N digits in length, or M+N-1 digits. If the latter is true, then corrective final product shifting is employed to obtain the proper result. This eliminates the need for dedicated leading zero detection hardware used to determine the number of significant digits in the final product.
Type: Grant
Filed: June 23, 2010
Date of Patent: July 23, 2013
Assignee: International Business Machines Corporation
Inventors: Steven R. Carlough, Adam B. Collura, Michael Kroener, Silvia Melitta Mueller
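The digit-count property this abstract relies on is easy to check in software: the product of an M-digit and an N-digit positive integer (neither with leading zeros) has either M+N or M+N-1 digits. The sketch below lays the product out in an assumed M+N-digit field and reports whether a one-digit corrective shift would be needed. The function name is hypothetical, and the check inspects the finished product rather than key bits observed during accumulation as the patent describes.

```python
def product_in_assumed_width(x, y):
    """Place x*y in an (M+N)-digit field and flag the M+N-1 digit case.
    x and y must be positive integers."""
    m, n = len(str(x)), len(str(y))
    field = str(x * y).zfill(m + n)   # assume the product fills M+N digits
    needs_shift = field[0] == "0"     # true when it has only M+N-1 digits
    return field, needs_shift


print(product_in_assumed_width(99, 99))  # ('9801', False)
print(product_in_assumed_width(10, 10))  # ('0100', True)
```

Because at most one leading zero can appear, a single conditional shift replaces a general leading-zero count.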
-
Patent number: 8495122
Abstract: Described is a programmable logic device (PLD) with columns of DSP slices that can be cascaded to create DSP circuits of varying size and complexity. Each slice includes a mode port that receives mode control signals for dynamically altering the function and connectivity of related slices. Such alterations can occur with or without reconfiguring the PLD.
Type: Grant
Filed: December 21, 2004
Date of Patent: July 23, 2013
Assignee: Xilinx, Inc.
Inventors: James M. Simkins, Steven P. Young, Jennifer Wong, Bernard J. New, Alvin Y. Ching
-
Patent number: 8489665
Abstract: A dividing unit sets an actual packet length transferred from a packet receiving section to a variable U, and then sets 2? to a variable V. If a positive number determining section determines that a subtraction result of subtracting a remainder N0 from a quotient M0, both found by dividing U by V, is a positive number, the dividing unit overwrites the subtraction result to U. The dividing unit repeats such operations of dividing the subtraction result by V, until the positive number determining section determines that the subtraction result of subtracting the remainder from the quotient, both found by dividing U by V, is a non-positive number. When the subtraction result becomes a non-positive number and the quotient and the remainder match, a packet length determining section determines that received data has a normal size, and notifies it to a discard determining section.
Type: Grant
Filed: January 28, 2009
Date of Patent: July 16, 2013
Assignee: Fujitsu Limited
Inventors: Fuyuta Sato, Hideo Okawa
-
Patent number: 8463834
Abstract: A floating point multiplier includes a data path in which a plurality of partial products are calculated and then reduced to a first partial product and a second partial product. Shift amount determining circuitry 100 analyzes the exponents of the input operands A and B as well as counting the leading zeros in the fractional portions of these operands to determine an amount of left shift or right shift to be applied by shifting circuitry 200, 202 within the multiplier data path. This shift amount is applied so as to align the partial products so that when they are added they will produce the result C without requiring this to be further shifted. Furthermore, shifting the partial products to the correct alignment in this way in advance of adding these partial products permits injection rounding combined with the adding of the partial products to be employed for cases including subnormal values.
Type: Grant
Filed: November 3, 2009
Date of Patent: June 11, 2013
Assignee: ARM Limited
Inventor: David Raymond Lutz
-
Patent number: 8463832
Abstract: Various implementations of a digital signal processing (DSP) block architecture of a programmable logic device (PLD) and related methods are provided. In one example, a PLD includes a dedicated DSP block. The DSP block includes a first multiplier adapted to multiply a first plurality of input signals to provide a first plurality of product signals. The DSP block also includes a second multiplier adapted to multiply a second plurality of input signals to provide a second plurality of product signals. The DSP block further includes an arithmetic logic unit (ALU) adapted to operate on the first product signals and the second product signals received at first and second operand inputs, respectively, of the ALU to provide a plurality of output signals.
Type: Grant
Filed: June 25, 2008
Date of Patent: June 11, 2013
Assignee: Lattice Semiconductor Corporation
Inventors: Asher Hazanchuk, Ian Ing, Satwant Singh
-
Patent number: 8458243
Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.
Type: Grant
Filed: March 3, 2010
Date of Patent: June 4, 2013
Assignee: Altera Corporation
Inventors: Suleyman Sirri Demirsoy, Hyun Yi
-
Patent number: 8447800
Abstract: In a denormal support mode, the normalization circuit of a floating-point adder is used to normalize or denormalize the output of a floating-point multiplier. Each floating-point multiply instruction is speculatively converted to a multiply-add instruction, with the addend forced to zero. This preserves the value of the product, while normalizing or denormalizing the product using the floating-point adder's normalization circuit. When the operands to the multiply operation are available, they are inspected. If the operands will not generate an unnormal intermediate product or a denormal final product, the add operation is suppressed, such as by operand-forwarding. Additionally, each non-fused floating-point multiply-add instruction is replaced with a multiply-add instruction having a zero addend, and a floating-point add instruction having the addend of the original multiply-add instruction is inserted into the instruction stream.
Type: Grant
Filed: February 14, 2011
Date of Patent: May 21, 2013
Assignee: QUALCOMM Incorporated
Inventors: Kenneth Alan Dockser, Pathik Sunil Lall
-
Patent number: 8438208
Abstract: A processor including instruction support for implementing large-operand multiplication may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit comprising a hardware multiplier datapath circuit, where the hardware multiplier datapath circuit is configured to multiply operands having a maximum number of bits M.
Type: Grant
Filed: June 19, 2009
Date of Patent: May 7, 2013
Assignee: Oracle America, Inc.
Inventors: Christopher H. Olson, Jeffrey S. Brooks, Robert T. Golla, Paul J. Jordan
-
Patent number: 8417760
Abstract: For calculating a result of a modular multiplication with long operands, at least the multiplicand is divided into at least three shorter portions. Using the three shorter portions of the multiplicand, the multiplier and the modulus, a modular multiplication is performed within a cryptographic calculation, wherein the portions of the multiplicand, the multiplier and the modulus are parameters of the cryptographic calculation. The calculation is performed sequentially using the portions of the multiplicand and using an intermediate result obtained in a previous calculation, until all portions of the multiplicand are processed, to obtain the final result of the modular multiplication. The calculation of an intermediate result is performed using a multiplication addition operation, in which MMD operations and updating operations are performed sequentially, and short auxiliary registers and short result registers are used.
Type: Grant
Filed: October 30, 2006
Date of Patent: April 9, 2013
Assignee: Infineon Technologies AG
Inventor: Wieland Fischer
-
Patent number: 8396911
Abstract: In a determination as to similarity on parts of a piece of data, high-speed processing is performed without the need for a database. Division signal lines (L1 to Lk) that transmit signals corresponding to division data are used.
Type: Grant
Filed: September 25, 2008
Date of Patent: March 12, 2013
Assignee: Toshiba Information Systems (Japan) Corporation
Inventor: Akiyoshi Oguro
-
Patent number: 8346838
Abstract: A method and apparatus for including in a processor instructions for performing integer transforms including multiply-add operations and horizontal-add operations on packed data. In one embodiment, a processor is coupled to a memory that stores a first packed byte data and a second packed byte data. The processor performs operations on said first packed byte data and said second packed byte data to generate a third packed data in response to receiving a multiply-add instruction. A plurality of the 16-bit data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed byte data. The processor adds together at least a first and a second 16-bit data element of the third packed data in response to receiving a horizontal-add instruction to generate a 16-bit result as one of a plurality of data elements of a fourth packed data.
Type: Grant
Filed: September 15, 2009
Date of Patent: January 1, 2013
Assignee: Intel Corporation
Inventors: Eric Debes, William W. Macy, Jonathan J. Tyler
-
Patent number: 8332452
Abstract: The present invention is generally related to the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve performing a plurality of dot product operations to generate operands for a new vector. The dot product operations may require the issue of a plurality of permute instructions to arrange the vector operands in desired locations of a target register. Embodiments of the invention provide a dot product instruction wherein a mask field may be used to specify a particular location of a target register in which to transfer data, thereby avoiding the need for permute instructions for arranging data, reducing dependencies between instructions, and the usage of temporary registers.
Type: Grant
Filed: October 31, 2006
Date of Patent: December 11, 2012
Assignee: International Business Machines Corporation
Inventors: Eric Oliver Mejdrich, Adam James Muff
-
Patent number: 8316071
Abstract: Sum and carry signals are formed representing a product of a first and a second operand. A bias signal is formed having a value determined by a sign of a product of the first and the second operand. An output signal is provided based on an addition of the sum signal, the carry signal, a sign-extended addend, and the bias signal. A portion of the output signal, a saturated minimum value, or a saturated maximum value, is selected as a final result based on the sign of the product and a sign of the output signal.
Type: Grant
Filed: May 27, 2009
Date of Patent: November 20, 2012
Assignee: Advanced Micro Devices, Inc.
Inventors: Kevin A. Hurd, Scott A. Hilker
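The final selection step in this abstract (result, saturated minimum, or saturated maximum) can be modeled at a high level as below. This sketch compares the exact sum against the representable range, which is simpler than the patent's sign-and-bias mechanism; the function name and the default 32-bit signed width are assumptions.

```python
def saturating_multiply_add(a, b, addend, bits=32):
    """Return a*b + addend clamped to the signed `bits`-wide range."""
    minimum = -(1 << (bits - 1))
    maximum = (1 << (bits - 1)) - 1
    total = a * b + addend  # exact in Python's unbounded integers
    if total > maximum:
        return maximum      # saturated maximum value
    if total < minimum:
        return minimum      # saturated minimum value
    return total
```

Saturation keeps an overflowing result pinned at the nearest representable value instead of wrapping around, which matters in signal-processing code where a wrapped sample would flip sign.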
-
Patent number: 8301681
Abstract: A specialized processing block for a programmable logic device includes circuitry for performing multiplications and sums thereof, as well as circuitry for performing floating point operations. The floating point circuitry preferably includes rounding and normalization circuitry. To perform mantissa multiplications, the floating point circuitry preferably relies on the aforementioned multipliers of the specialized processing block.
Type: Grant
Filed: June 5, 2006
Date of Patent: October 30, 2012
Assignee: Altera Corporation
Inventors: Kwan Yee Martin Lee, Martin Langhammer, Triet M. Nguyen, Yi-Wen Lin
-
Patent number: 8280940
Abstract: A data processing apparatus including a register bank, a shadow register and an arithmetic operation unit. The register bank includes a number of registers respectively for storing a number of operands wherein the registers are n-bit registers, and n is a natural number. The shadow register stores a first backup operand for making a backup of a first operand, which is stored in a first one of the registers in response to a first control signal. The arithmetic operation unit performs at least an arithmetic operation on the operands to obtain operational data, and stores the operational data in the first register in response to an arithmetic operation command.
Type: Grant
Filed: October 22, 2007
Date of Patent: October 2, 2012
Assignee: Himax Technologies Limited
Inventors: Chun-Yu Chen, Shu-Ming Liu
-
Patent number: 8266199
Abstract: A specialized processing block for a programmable logic device incorporates a fundamental processing unit that performs a sum of two multiplications, adding the partial products of both multiplications without computing the individual multiplications. Such fundamental processing units consume less area than conventional separate multipliers and adders. The specialized processing block further has input and output stages, as well as a loopback function, to allow the block to be configured for various digital signal processing operations.
Type: Grant
Filed: June 5, 2006
Date of Patent: September 11, 2012
Assignee: Altera Corporation
Inventors: Martin Langhammer, Kwan Yee Martin Lee, Orang Azgomi, Keone Streicher, Yi-Wen Lin
-
Patent number: 8266198Abstract: A specialized processing block for a programmable logic device includes circuitry for performing multiplications and sums thereof, as well as circuitry for rounding the result. The rounding circuitry can selectably perform round-to-nearest and round-to-nearest-even operations. In addition, the bit position at which rounding occurs is preferably selectable. The specialized processing block preferably also includes saturation circuitry to prevent overflows and underflows, and the bit position at which saturation occurs also preferably is selectable. The selectability of both the rounding and saturation positions provides control of the output data word width. The rounding and saturation circuitry may be selectably located in different positions based on timing needs. Similarly, rounding may be speeded up using a look-ahead mode in which both rounded and unrounded results are computed in parallel, with the rounding logic selecting between those results.Type: GrantFiled: June 5, 2006Date of Patent: September 11, 2012Assignee: Altera CorporationInventors: Kwan Yee Martin Lee, Martin Langhammer, Yi-Wen Lin, Triet M. Nguyen
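
The selectable rounding and saturation described in this abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: the function names `round_at` and `saturate` are hypothetical, "round-to-nearest" is assumed here to mean that ties round toward positive infinity, and saturation is assumed to clamp to a signed two's-complement range of a chosen width.

```python
def round_at(value: int, pos: int, to_even: bool) -> int:
    """Drop the low `pos` bits of `value`, rounding to the nearest integer.

    Assumption: with to_even=False a tie rounds toward +infinity;
    with to_even=True a tie rounds to the even neighbour.
    """
    if pos == 0:
        return value
    half = 1 << (pos - 1)
    truncated = value >> pos              # floor of value / 2**pos
    remainder = value & ((1 << pos) - 1)  # discarded bits, always >= 0
    if remainder > half:
        truncated += 1
    elif remainder == half and (not to_even or truncated & 1):
        truncated += 1
    return truncated


def saturate(value: int, width: int) -> int:
    """Clamp `value` to the signed range representable in `width` bits."""
    hi = (1 << (width - 1)) - 1
    lo = -(1 << (width - 1))
    return max(lo, min(hi, value))
```

For example, `round_at(0b1010, 2, False)` rounds 2.5 up to 3, while `round_at(0b1010, 2, True)` rounds it to the even value 2. Choosing `pos` and `width` together fixes the output word width, which is the effect the abstract attributes to the selectable positions.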
-
Patent number: 8239439Abstract: Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises transferring more than two operands to a vector unit, each operand being transferred to a respective one of a plurality of processing lanes of the vector unit. The operands may be transferred from the vector unit to a dot product unit wherein an arithmetic operation using the more than two operands may be performed.Type: GrantFiled: December 13, 2007Date of Patent: August 7, 2012Assignee: International Business Machines CorporationInventors: Adam J. Muff, Matthew R. Tubbs
-
Patent number: 8239441Abstract: Modifying a leading zero estimation during an unfused multiply add operation of (A*B)+C. A plurality of terms x and y may be received, and each may be based on truncated terms s and t (e.g., in performing the unfused multiply add operation) and the shifted C term. A first leading zero estimation may be calculated based on the terms x and y. It may be determined if near total catastrophic cancellation has occurred. A carry in from a right most number of bits of the terms s and t and the most significant truncated bits of s and t may be used to generate a second leading zero estimation based on the first leading zero estimation if the near total catastrophic cancellation has occurred.Type: GrantFiled: May 15, 2008Date of Patent: August 7, 2012Assignee: Oracle America, Inc.Inventor: Leonard D. Rarick
-
Patent number: 8239440Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.Type: GrantFiled: March 28, 2008Date of Patent: August 7, 2012Assignee: Oracle America, Inc.Inventors: Jeffrey S. Brooks, Christopher H. Olson
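
The numerical difference between the two modes can be shown with a short sketch. This emulates only the arithmetic result, not the pipeline described in the abstract: the function names are hypothetical, and the fused result is obtained by computing the exact value with `fractions.Fraction` and rounding once, which is an illustration choice rather than the patented method.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Round the product to double precision, then round the sum."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b+c exactly, then round once to double precision."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product is 1 - 2**-54, which is not
# representable as a double and rounds to 1.0 in the unfused case.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

unfused = unfused_multiply_add(a, b, c)  # product rounds to 1.0 first
fused = fused_multiply_add(a, b, c)      # keeps the -2**-54 term
```

With these inputs the unfused result is `0.0` and the fused result is `-2**-54`, which is why a pipeline that supports both behaviours has to select between them, for instance by opcode or mode bit as the abstract suggests.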
-
Publication number: 20120173600Abstract: Provided are an apparatus and method for performing a complex number operation using a Single Instruction Multiple Data (SIMD) architecture. A SIMD operation apparatus may perform, in parallel, a real part operation and an imaginary part operation of a plurality of complex numbers. The real part operation and the imaginary part operation may be performed sequentially, or in parallel.Type: ApplicationFiled: August 16, 2011Publication date: July 5, 2012Inventors: Young Hwan Park, Ho YANG
-
Publication number: 20120150933Abstract: Various systems, apparatuses, processes, and programs may be used to calculate a multiply-sum of two carry-less multiplications of two input operands. In particular implementations, a system, apparatus, process, and program may include the ability to use input data busses for the input operands and an output data bus for an overall calculation result, each bus including a width of 2n bits, where n is an integer greater than one. The system, apparatus, process, and program may also calculate the carry-less multiplications of the two input operands for a lower level of a hierarchical structure and calculate the at least one multiply-sum and at least one intermediate multiply-sum for a higher level of the structure based on the carry-less multiplications of the lower level. A certain number of multiply-sums may be output as an overall calculation result, depending on the mode of operation, using the full width of said output data bus.Type: ApplicationFiled: July 15, 2011Publication date: June 14, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Maarten J. Boersma, Markus Kaltenbach, Jens Leenstra, Tim Niggemeier, Philipp Oehler, Philipp Panitz
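
A carry-less multiplication is a polynomial multiplication over GF(2), so partial products are combined with XOR instead of addition, and the "sum" of two such products is likewise an XOR. The sketch below shows only that arithmetic. The names `clmul` and `clmul_sum` are hypothetical, and treating the inputs as four separate integers is an assumption; the hierarchical structure and bus widths from the abstract are not modelled.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # add the shifted partial product without carries
        a <<= 1
        b >>= 1
    return result


def clmul_sum(a: int, b: int, c: int, d: int) -> int:
    """Multiply-sum of two carry-less products: clmul(a, b) XOR clmul(c, d)."""
    return clmul(a, b) ^ clmul(c, d)
```

For instance, `clmul(0b101, 0b11)` is `0b1111`, because the partial products `0b101` and `0b1010` do not overlap, whereas `clmul(0b11, 0b11)` is `0b101`, because the overlapping middle bits cancel.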
-
Patent number: 8200725Abstract: An arithmetic processing system processes a sensing signal and a first approximate offset signal to obtain a second approximate offset signal. The system includes a first arithmetic processor and a second arithmetic processor. The first arithmetic processor receives and processes the sensing signal and the first approximate offset signal to output a first arithmetic signal. The second arithmetic processor processes the first arithmetic signal to output a second arithmetic signal, and the second arithmetic signal is added with a predetermined offset signal to obtain the second approximate offset signal, and the second approximate offset signal is closer to a real offset signal of the sensing signal than the first approximate offset signal. A method of arithmetic processing is also disclosed.Type: GrantFiled: November 6, 2007Date of Patent: June 12, 2012Assignee: Asia Optical Co., Inc.Inventors: Kun-Chi Liao, Yu-Ting Lee
-
Patent number: 8190669Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh=x-xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.Type: GrantFiled: October 20, 2004Date of Patent: May 29, 2012Assignee: NVIDIA CorporationInventors: Stuart F. Oberman, Ming Y. Siu
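
Both operation classes in this abstract reduce to a few multiplies and adds, which the following sketch makes concrete. Everything beyond the two formulas is an assumption made for illustration: the function names, the uniform segmentation of the input range, the use of Taylor coefficients for F2, F1 and F0 (a hardware table might instead hold minimax coefficients), and the choice of 1/x on [1, 2) with 64 segments as the example function.

```python
def plane_interpolate(a, b, c, x, y):
    """Planar attribute interpolation: A*x + B*y + C."""
    return a * x + b * y + c


def build_table(f, df, d2f, lo, hi, segments):
    """Per-segment coefficients (F2, F1, F0) taken at each base point xb.

    Assumption: Taylor coefficients F2 = f''(xb)/2, F1 = f'(xb), F0 = f(xb).
    """
    step = (hi - lo) / segments
    table = []
    for i in range(segments):
        xb = lo + i * step
        table.append((d2f(xb) / 2.0, df(xb), f(xb)))
    return table


def approximate(table, lo, hi, x):
    """Evaluate F2(xb)*xh^2 + F1(xb)*xh + F0(xb) with xh = x - xb."""
    segments = len(table)
    step = (hi - lo) / segments
    i = min(int((x - lo) / step), segments - 1)  # segment index selects xb
    xb = lo + i * step
    xh = x - xb
    f2, f1, f0 = table[i]
    return f2 * xh * xh + f1 * xh + f0


# Example table for the reciprocal function on [1, 2).
RECIP = build_table(lambda x: 1.0 / x,
                    lambda x: -1.0 / x ** 2,
                    lambda x: 2.0 / x ** 3,
                    1.0, 2.0, 64)
```

Calling `approximate(RECIP, 1.0, 2.0, 1.5)` returns the table entry for xb = 1.5 evaluated at xh = 0, that is, 1/1.5. Under these assumptions the truncation error stays below about 4e-6 across the interval, since xh is always smaller than 1/64.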
-
Publication number: 20120078992Abstract: A vector functional unit implemented on a semiconductor chip to perform vector operations of dimension N is described. The vector functional unit includes N functional units. Each of the N functional units have logic circuitry to perform: a first integer multiply add instruction that presents highest ordered bits but not lowest ordered bits of a first integer multiply add calculation, and, a second integer multiply add instruction that presents lowest ordered bits but not highest ordered bits of a second integer multiply add calculation.Type: ApplicationFiled: September 24, 2010Publication date: March 29, 2012Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver
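
One way to read the pair of instructions in this abstract is that each lane forms a double-width result a*b+c and one instruction returns its upper half while the other returns its lower half. The sketch below follows that reading only; the lane width, the function names and the exact point at which the addend is applied are assumptions that the abstract does not specify.

```python
def madd_hi(a: int, b: int, c: int, width: int) -> int:
    """Upper `width` bits of a*b + c for unsigned `width`-bit inputs."""
    return (a * b + c) >> width


def madd_lo(a: int, b: int, c: int, width: int) -> int:
    """Lower `width` bits of a*b + c for unsigned `width`-bit inputs."""
    return (a * b + c) & ((1 << width) - 1)


def vector_madd(op, xs, ys, zs, width):
    """Apply one of the two lane operations across N lanes."""
    return [op(a, b, c, width) for a, b, c in zip(xs, ys, zs)]
```

With 8-bit lanes, `madd_hi(200, 100, 50, 8)` gives 78 and `madd_lo(200, 100, 50, 8)` gives 82, and 78 * 256 + 82 reconstructs the full value 20050.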
-
Patent number: 8112466Abstract: An efficient implementation of DSP functions in a field programmable gate array (FPGA) using one or more computational blocks, each block having a multiplier, an accumulator, and multiplexers. The structure implements most common DSP equations in a fast and highly compact manner. A novel method for cascading these blocks with the help of dedicated DSP lines is provided, which leads to a very simple and proficient implementation of n-stage MAC operations.Type: GrantFiled: September 28, 2005Date of Patent: February 7, 2012Assignee: Sicronic Remote KG, LLCInventors: Deboleena Minz, Kailash Digari
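
An n-stage MAC built from cascaded blocks can be modelled as a chain in which each block multiplies its own pair of inputs and adds the value arriving from the previous block. This is a behavioural sketch only; `mac_block` and `cascaded_mac` are hypothetical names, and the multiplexers and dedicated DSP lines mentioned in the abstract are reduced to a plain function argument.

```python
def mac_block(x, coeff, cascade_in=0):
    """One block: multiply, then add the value cascaded from the previous block."""
    return cascade_in + x * coeff


def cascaded_mac(xs, coeffs):
    """Chain one block per (x, coeff) pair, passing the running sum along."""
    acc = 0
    for x, coeff in zip(xs, coeffs):
        acc = mac_block(x, coeff, acc)
    return acc
```

For example, `cascaded_mac([1, 2, 3], [4, 5, 6])` passes 4, then 14, then 32 down the chain, which is the three-stage sum of products 1*4 + 2*5 + 3*6.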
-
Publication number: 20120011187Abstract: A circuit for performing a floating-point fused-multiply-add (FMA) calculation of a×b±c. The circuit includes (i) a partial product generation module having (a) a multiples generator unit configured to generate multiples of a multiplicand having an m-digit binary coded decimal (BCD) format, (b) a recoding unit configured to generate n+1 signed digit (SD) sets from a sum vector and a carry vector of a multiplier, and (c) a multiples selection unit configured to generate partial product vectors from the multiples of the multiplicand based on the n+1 SD sets and the sign of the FMA calculation, and (ii) a carry save adder (CSA) tree configured to add the partial product vectors and an addend to generate a result sum vector and a result carry vector in an (m+n)-digit BCD format.Type: ApplicationFiled: July 6, 2011Publication date: January 12, 2012Applicant: SILMINDS, LLC, EGYPTInventors: Amira Mohamed, Ramy Raafat, Hossam Ali Hassan Fahmy, Tarek Eldeeb, Yasmeen Farouk, Rodina Samy, Mostafa Elkhouly