Multiplication Followed By Addition (i.e., X*y+z) Patents (Class 708/523)
  • Publication number: 20140122554
    Abstract: In an embodiment, a fused multiply-add (FMA) circuit is configured to receive a plurality of input data values to perform an FMA instruction on the input data values. The circuit includes a multiplier unit and an adder unit coupled to an output of the multiplier unit, and a control logic to receive the input data values and to reduce switching activity and thus reduce power consumption of one or more components of the circuit based on a value of one or more of the input data values. Other embodiments are described and claimed.
    Type: Application
    Filed: October 31, 2012
    Publication date: May 1, 2014
    Inventors: Brian J. Hickmann, Dennis R. Bradford, Thomas D. Fletcher
  • Patent number: 8706791
    Abstract: Embodiments of the invention are directed to a system and method that enable relatively low power dissipation by scheduling operations of a chain of two or more multiply-accumulator units, delivering the output result of a first multiply accumulator of the chain as an input to a second, subsequent multiply accumulator of the chain.
    Type: Grant
    Filed: July 30, 2009
    Date of Patent: April 22, 2014
    Assignee: Ceva D.S.P. Ltd.
    Inventor: Jeffrey Allan (Alon) Jacob (Yaakov)
  • Publication number: 20140101220
    Abstract: A composite finite field multiplier is disclosed. The multiplier includes a controller, an input port, an output port, a GF((2^n)^2) multiplier, a GF(2^n) standard basis multiplier, and a GF(2^n) look-up table multiplier; the controller is connected respectively to the input port, the output port, the GF((2^n)^2) multiplier, the GF(2^n) standard basis multiplier and the GF(2^n) look-up table multiplier; the GF((2^n)^2) multiplier is connected respectively to the GF(2^n) standard basis multiplier and the GF(2^n) look-up table multiplier. By using the GF((2^n)^2) multiplier, the GF(2^n) standard basis multiplier and the GF(2^n) look-up table multiplier, the multiplication of three operands is realized. Compared with the existing multiplier, the multiplier of the present invention has significant advantages in the speed of multiplying three operands over GF((2^n)^m).
    Type: Application
    Filed: May 25, 2012
    Publication date: April 10, 2014
    Inventors: Shaohua Tang, Haibo Yi
  • Publication number: 20140095570
    Abstract: An apparatus and method for calculating an internal state for artificial emotions are disclosed, of which the method comprises multiplying an input value obtained from a sensor with a first personality set in accordance with at least one low rank element contained in at least one high rank element of a NEO PI-R (Revised NEO Personality Inventory); calculating a personality factor value in a Five-Factor Model of the personality by adding the results of the multiplication; and calculating the internal state by multiplying the personality factor value with a second personality.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Applicant: KOREA INSTITUTE OF INDUSTRIAL TECHNOLOGY
    Inventor: KOREA INSTITUTE OF INDUSTRIAL TECHNOLOGY
  • Publication number: 20140095568
    Abstract: A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result.
    Type: Application
    Filed: December 3, 2013
    Publication date: April 3, 2014
    Inventors: MAARTEN J. BOERSMA, KLAUS M. KROENER, CHRISTOPHE J. LAYER, SILVIA M. MUELLER
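    As a rough illustration of the Booth-encoded multiply-add described in this abstract, the following Python sketch recodes the first operand into radix-4 Booth digits, forms one partial product per digit, and adds the third operand. The function names and the 16-bit default width are assumptions made for the example. The sketch sums the partial products with ordinary integer addition and does not model the redundant sum and carry vectors or the carry correction factor.

```python
def booth_radix4_digits(x: int, bits: int) -> list[int]:
    """Recode a signed integer into radix-4 Booth digits, each in -2..2."""
    if bits % 2:
        bits += 1  # radix-4 recoding consumes the operand two bits at a time
    pattern = x & ((1 << bits) - 1)  # two's-complement bit pattern of x
    digits = []
    prev = 0  # implicit zero bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply_add(a: int, b: int, c: int, bits: int = 16) -> int:
    """Compute a*b + c, where a must fit in a signed `bits`-wide word."""
    if not -(1 << (bits - 1)) <= a < (1 << (bits - 1)):
        raise ValueError("first operand does not fit in the given width")
    # One partial product per Booth digit, weighted by 4**i.
    partial_products = [
        (digit * b) * 4**i
        for i, digit in enumerate(booth_radix4_digits(a, bits))
    ]
    # Hardware would reduce these in carry-save form; here they are just summed.
    return sum(partial_products) + c


print(booth_radix4_digits(7, 4))     # [-1, 2], since -1 + 2*4 = 7
print(booth_multiply_add(7, -5, 3))  # -32
```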
  • Publication number: 20140082036
    Abstract: The disclosed embodiments disclose techniques for using a split division circuit that includes a first divider that is optimized for a first range of divisor values and a second divider that is optimized for a second range of divisor values; the first range is distinct from the second range. During operation, the circuit receives a divisor for the division operation. The circuit: determines whether the divisor is in the first range or the second range to determine whether the first divider or the second divider should perform the division operation; performs the division operation in the selected host divider; and then outputs the result that was generated by the selected host divider.
    Type: Application
    Filed: March 15, 2013
    Publication date: March 20, 2014
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Josephus C. Ebergen, Navaneeth P. Jamadagni, Ivan E. Sutherland
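    The selection step in this abstract can be sketched in a few lines of Python. Everything named here is hypothetical: the threshold value, the two divider functions, and the choice of repeated subtraction for the large-divisor path are assumptions for illustration, not details taken from the application.

```python
SMALL_DIVISOR_LIMIT = 16  # hypothetical boundary between the two divisor ranges


def divide_small_range(dividend: int, divisor: int) -> tuple[int, int]:
    """Stand-in for a divider tuned to small divisors."""
    return divmod(dividend, divisor)


def divide_large_range(dividend: int, divisor: int) -> tuple[int, int]:
    """Stand-in for a divider tuned to large divisors.

    Repeated subtraction needs few iterations when the divisor is large
    relative to the dividend.
    """
    quotient = 0
    while dividend >= divisor:
        dividend -= divisor
        quotient += 1
    return quotient, dividend


def split_divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Route the operation to one of two dividers based on the divisor's range."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("sketch handles non-negative dividends and positive divisors")
    if divisor < SMALL_DIVISOR_LIMIT:
        return divide_small_range(dividend, divisor)
    return divide_large_range(dividend, divisor)


print(split_divide(100, 7))   # (14, 2)
print(split_divide(100, 30))  # (3, 10)
```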
  • Publication number: 20140067889
    Abstract: A datapath circuit may include a digital multiply and accumulate circuit (MAC) and a digital hardware calculator for parallel computation. The digital hardware calculator and the MAC may be coupled to an input memory element for receipt of input operands. The MAC may include a digital multiplier structure with partial product generators coupled to an adder to multiply a first and second input operands and generate a multiplication result. The digital hardware calculator may include a first look-up table coupled between a calculator input and a calculator output register. The first look-up table may include table entry values mapped to corresponding math function results in accordance with a first predetermined mathematical function. The digital hardware calculator may be configured to calculate, based on the first look-up table, a computationally hard mathematical function such as a logarithm function, an exponential function, a division function and a square root function.
    Type: Application
    Filed: August 27, 2013
    Publication date: March 6, 2014
    Applicant: ANALOG DEVICES A/S
    Inventor: Mikael M. MORTENSEN
  • Patent number: 8667042
    Abstract: A vector functional unit implemented on a semiconductor chip to perform vector operations of dimension N is described. The vector functional unit includes N functional units. Each of the N functional units has logic circuitry to perform: a first integer multiply add instruction that presents highest ordered bits but not lowest ordered bits of a first integer multiply add calculation, and, a second integer multiply add instruction that presents lowest ordered bits but not highest ordered bits of a second integer multiply add calculation.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: March 4, 2014
    Assignee: Intel Corporation
    Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver
  • Patent number: 8667046
    Abstract: A Generalized Programmable Counter Array (GPCA) is a reconfigurable multi-operand adder, which can be reprogrammed to sum a plurality of operands of arbitrary size. The GPCA is configured to compress the input words down to two operands using parallel counters. Resulting operands are then summed using a standard Ripple Carry Adder to produce the final result. The GPCA consists of a linear arrangement of identical compressor slices (CSlice).
    Type: Grant
    Filed: February 20, 2009
    Date of Patent: March 4, 2014
    Assignee: Ecole Polytechnique Federale de Lausanne/Service des Relations Industrielles
    Inventors: Philip Brisk, Alessandro Cevrero, Frank K. Gurkaynak, Paolo Ienne Lopez, Hadi Parandeh-Afshar
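    The compress-then-add structure in this abstract can be illustrated with the simplest parallel counter, a 3:2 compressor (a row of full adders). The Python sketch below is an assumption-laden simplification: it uses only 3:2 compression on non-negative integers, the function names are invented for the example, and the final carry-propagate addition is written as ordinary integer addition rather than as an explicit ripple carry adder.

```python
def compress_3_to_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three operands to a sum word and a carry word with a + b + c preserved."""
    sum_word = a ^ b ^ c                           # per-bit sum without carries
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # per-bit carries, shifted up
    return sum_word, carry_word


def multi_operand_sum(operands: list[int]) -> int:
    """Sum non-negative integers by compressing down to two operands first."""
    pending = list(operands)
    while len(pending) > 2:
        a, b, c = pending.pop(), pending.pop(), pending.pop()
        pending.extend(compress_3_to_2(a, b, c))
    # Final carry-propagate addition of the (at most) two remaining operands.
    return sum(pending)


print(multi_operand_sum([3, 5, 7, 11, 13]))  # 39
```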
  • Patent number: 8649508
    Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: February 11, 2014
    Assignee: Tata Consultancy Services Ltd.
    Inventor: Natarajan Vijayarangan
  • Patent number: 8645450
    Abstract: Multiplier-accumulator circuitry includes circuitry for forming a plurality of partial products of multiplier and multiplicand inputs, carry-save adder circuitry for adding together the partial products and another input to produce intermediate sum and carry outputs, final adder circuitry for adding together the intermediate sum and carry outputs to produce a final output, and feedback circuitry for applying the final output (typically after some delay, e.g., due to registration of the final output) to the carry-save adder circuitry as said another input. The above circuitry may be implemented in so-called “hard IP” (intellectual property) of a field-programmable gate array (“FPGA”) integrated circuit device. If desired, any overflow from the accumulation performed by the above circuitry may be accumulated in “soft” accumulator-overflow circuitry that is implemented in the general-purpose programmable logic of the FPGA.
    Type: Grant
    Filed: March 2, 2007
    Date of Patent: February 4, 2014
    Assignee: Altera Corporation
    Inventors: Kok Heng Choe, Tony K Ngai, Henry Y. Lui
  • Publication number: 20130346462
    Abstract: An arithmetic logic stage in a graphics pipeline includes a number of arithmetic logic units (ALUs). The ALUs each include, for example, a multiplier and an adder. The ALUs are interconnected by circuitry that, for example, routes the output from the multiplier in one ALU to both the adder in that ALU and an adder in another ALU.
    Type: Application
    Filed: August 27, 2013
    Publication date: December 26, 2013
    Inventors: Tyson BERGLAND, Michael J.M. TOKSVIG, Justin Michael MAHAN
  • Publication number: 20130346461
    Abstract: An apparatus for calculating a result of a scalar multiplication of a reference number with a reference point on an elliptic curve comprises a point selector and a processor. The point selector is configured to select randomly or pseudo-randomly an auxiliary point on the elliptic curve. The processor is configured to calculate the result of the scalar multiplication with a double-and-always-add process using the auxiliary point.
    Type: Application
    Filed: August 22, 2013
    Publication date: December 26, 2013
    Applicant: Infineon Technologies AG
    Inventor: Wieland Fischer
  • Publication number: 20130332501
    Abstract: A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result.
    Type: Application
    Filed: June 11, 2012
    Publication date: December 12, 2013
    Applicant: IBM Corporation
    Inventors: Maarten J. Boersma, Klaus Michael Kroener, Christophe J. Layer, Silvia M. Mueller
  • Patent number: 8606840
    Abstract: A fused multiply add (FMA) unit includes an alignment counter configured to calculate an alignment shift count, an aligner configured to align an addend input based on the alignment shift count and output an aligned addend, a multiplier configured to multiply a first multiplicand input and a second multiplicand input and output a product, an adder configured to add the aligned addend and the product and output a sum without determining the sign of the sum or complementing the sum, a normalizer configured to receive the sum directly from the adder and normalize the sum irrespective of the sign of the sum and output a normalized sum, and a rounder configured to round and complement-adjust the normalized sum and output a final mantissa.
    Type: Grant
    Filed: March 17, 2010
    Date of Patent: December 10, 2013
    Assignee: Oracle International Corporation
    Inventor: Sadar Ahmed
  • Patent number: 8595280
    Abstract: A data processing apparatus and method for performing multiply-accumulate operations is provided. The data processing apparatus includes data processing circuitry responsive to control signals to perform data processing operations on at least one input data element.
    Type: Grant
    Filed: October 29, 2010
    Date of Patent: November 26, 2013
    Assignee: ARM Limited
    Inventors: Dominic Hugo Symes, Mladen Wilder, Guy Larri
  • Patent number: 8589465
    Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.
    Type: Grant
    Filed: May 8, 2013
    Date of Patent: November 19, 2013
    Assignee: Altera Corporation
    Inventors: Suleyman Sirri Demirsoy, Hyun Yi
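    A FIR filter built from chained multiply-add stages computes y[n] as the sum of taps[k] * x[n-k]. The Python sketch below shows only that arithmetic, with each loop iteration standing in for one stage that adds its product to the value chained in from the previous stage. The function name is an assumption, and the systolic delay registers described in the abstract are not modeled.

```python
def fir_chain(samples: list[float], taps: list[float]) -> list[float]:
    """Compute FIR outputs as a chain of multiply-add stages, one per tap."""
    outputs = []
    for n in range(len(samples)):
        chained = 0  # value chained in to the first stage
        for k, tap in enumerate(taps):
            x = samples[n - k] if n - k >= 0 else 0  # zero before the first sample
            chained = tap * x + chained  # multiply-add, then chain out to the next stage
        outputs.append(chained)
    return outputs


print(fir_chain([1, 2, 3], [1, 1]))  # [1, 3, 5]
```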
  • Publication number: 20130262549
    Abstract: An arithmetic circuit includes a circuit to output n-th multiples of a multiplicand, a circuit to output an XOR operation result that is a result of performing an exclusive logical sum operation between the multiplicand and a result of shifting the multiplicand to left by one bit, a circuit to output a first selection signal in response to a first portion of a multiplier, a circuit to output a second selection signal in response to a second portion of the multiplier, a circuit to select, in response to the first selection signal, one of the n-th multiples of the multiplicand and the XOR operation result, a circuit to select, in response to the second selection signal, one of the n-th multiples of the multiplicand and the XOR operation result, and a circuit to output a result of adding up the first partial product and the second partial product.
    Type: Application
    Filed: January 8, 2013
    Publication date: October 3, 2013
    Inventor: Kenichi KITAMURA
  • Patent number: 8543634
    Abstract: A specialized processing block such as a DSP block may be enhanced by including direct connections that allow the block output to be directly connected to either the multiplier inputs or the adder inputs of another such block. A programmable integrated circuit device may include a plurality of such specialized processing blocks. The specialized processing block includes a multiplier having two multiplicand inputs and a product output, an adder having as one adder input the product output of the multiplier, and having a second adder input and an adder output, a direct-connect output of the adder output to a first other one of the specialized processing blocks, and a direct-connect input from a second other one of the specialized processing blocks. The direct-connect input connects a direct-connect output of that second other one of the specialized processing blocks to a first one of the multiplicand inputs.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: September 24, 2013
    Assignee: Altera Corporation
    Inventors: Lei Xu, Volker Mauer, Steven Perry
  • Patent number: 8533250
    Abstract: Circuits for a multiplier with a built-in accumulator and a method of performing multiplication with accumulation are disclosed. An embodiment of the disclosed circuits includes a logic circuit coupled to receive two inputs. The logic circuit is capable of generating a plurality of value bits from the inputs received. In one embodiment, the logic circuit includes a Booth recoder circuit that generates a plurality of partial products. A block of adders is coupled to logic circuit to receive and sum up the value bits. An adder adds the summation result from the block of adders to a previous accumulated value to generate intermediate sum and carry values. An accumulator, coupled to the adder, receives and stores the intermediate values.
    Type: Grant
    Filed: June 17, 2009
    Date of Patent: September 10, 2013
    Assignee: Altera Corporation
    Inventors: Kok Yoong Foo, Yan Jiong Boo, Geok Sun Chong, Boon Jin Ang, Kar Keng Chua
  • Publication number: 20130226982
    Abstract: An apparatus and a method for generating a partial product for a polynomial operation are provided. The apparatus includes first encoders, each of the first encoders configured to selectively generate and output one of mutually exclusive values based on two inputs. The apparatus further includes a second encoder configured to generate and output two candidate partial products based on an output from a first one of the first encoders that is provided at a reference bit position of the inputs, an output from a second one of the first encoders that is provided at an upper bit position of the inputs, and a multiplicand. The apparatus further includes a multiplexer configured to select one of the candidate partial products output from the second encoder.
    Type: Application
    Filed: August 17, 2012
    Publication date: August 29, 2013
    Applicant: Samsung Electronics Co., Ltd.
    Inventor: Hyeong-Seok YU
  • Patent number: 8521800
    Abstract: An arithmetic logic stage in a graphics pipeline includes a number of arithmetic logic units (ALUs). The ALUs each include, for example, a multiplier and an adder. The ALUs are interconnected by circuitry that, for example, routes the output from the multiplier in one ALU to both the adder in that ALU and an adder in another ALU.
    Type: Grant
    Filed: August 15, 2007
    Date of Patent: August 27, 2013
    Assignee: Nvidia Corporation
    Inventors: Tyson J. Bergland, Michael J. M. Toksvig, Justin M. Mahan
  • Publication number: 20130198254
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Application
    Filed: March 13, 2013
    Publication date: August 1, 2013
    Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
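    One way to picture a multiply-add on packed data is the Python sketch below, in which corresponding elements are multiplied and adjacent products are added, so the result has half as many elements as each input. The adjacent-pair layout and the function name are assumptions for illustration; the abstract does not spell out which elements are combined.

```python
def packed_multiply_add(first: list[int], second: list[int]) -> list[int]:
    """Multiply corresponding elements, then add each adjacent pair of products."""
    if len(first) != len(second) or len(first) % 2:
        raise ValueError("inputs must have the same even length")
    products = [x * y for x, y in zip(first, second)]
    return [products[i] + products[i + 1] for i in range(0, len(products), 2)]


# 1*5 + 2*6 = 17 and 3*7 + 4*8 = 53
print(packed_multiply_add([1, 2, 3, 4], [5, 6, 7, 8]))  # [17, 53]
```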
  • Patent number: 8495124
    Abstract: A decimal multiplication mechanism for fixed and floating point computation in a computer having a coefficient mechanism without resulting leading zero detection (LZD) and process which assumes that the final product will be M+N digits in length and performs all calculations based on this assumption. Least significant digits that would be truncated are no longer stored, but retained as sticky information which is used to finalize the result product. Once the computation of the product is complete, a final check based on the examination of key bits observed during partial product accumulation is used to determine if the final product is truly M+N digits in length, or M+N-1 digits. If the latter is true, then corrective final product shifting is employed to obtain the proper result. This eliminates the need for dedicated leading zero detection hardware used to determine the number of significant digits in the final product.
    Type: Grant
    Filed: June 23, 2010
    Date of Patent: July 23, 2013
    Assignee: International Business Machines Corporation
    Inventors: Steven R. Carlough, Adam B. Collura, Michael Kroener, Silvia Melitta Mueller
  • Patent number: 8495122
    Abstract: Described is a programmable logic device (PLD) with columns of DSP slices that can be cascaded to create DSP circuits of varying size and complexity. Each slice includes a mode port that receives mode control signals for dynamically altering the function and connectivity of related slices. Such alterations can occur with or without reconfiguring the PLD.
    Type: Grant
    Filed: December 21, 2004
    Date of Patent: July 23, 2013
    Assignee: Xilinx, Inc.
    Inventors: James M. Simkins, Steven P. Young, Jennifer Wong, Bernard J. New, Alvin Y. Ching
  • Patent number: 8489665
    Abstract: A dividing unit sets an actual packet length transferred from a packet receiving section to a variable U, and then sets 2? to a variable V. If a positive number determining section determines that a subtraction result of subtracting a remainder N0 from a quotient M0, both found by dividing U by V, is a positive number, the dividing unit overwrites the subtraction result to U. The dividing unit repeats such operations of dividing the subtraction result by V, until the positive number determining section determines that the subtraction result of subtracting the remainder from the quotient, both found by dividing U by V, is a non-positive number. When the subtraction result becomes a non-positive number and the quotient and the remainder match, a packet length determining section determines that received data has a normal size, and notifies it to a discard determining section.
    Type: Grant
    Filed: January 28, 2009
    Date of Patent: July 16, 2013
    Assignee: Fujitsu Limited
    Inventors: Fuyuta Sato, Hideo Okawa
  • Patent number: 8463834
    Abstract: A floating point multiplier includes a data path in which a plurality of partial products are calculated and then reduced to a first partial product and a second partial product. Shift amount determining circuitry 100 analyzes the exponents of the input operands A and B as well as counting the leading zeros in the fractional portions of these operands to determine an amount of left shift or right shift to be applied by shifting circuitry 200, 202 within the multiplier data path. This shift amount is applied so as to align the partial products so that when they are added they will produce the result C without requiring this to be further shifted. Furthermore, shifting the partial products to the correct alignment in this way in advance of adding these partial products permits injection rounding combined with the adding of the partial products to be employed for cases including subnormal values.
    Type: Grant
    Filed: November 3, 2009
    Date of Patent: June 11, 2013
    Assignee: ARM Limited
    Inventor: David Raymond Lutz
  • Patent number: 8463832
    Abstract: Various implementations of a digital signal processing (DSP) block architecture of a programmable logic device (PLD) and related methods are provided. In one example, a PLD includes a dedicated DSP block. The DSP block includes a first multiplier adapted to multiply a first plurality of input signals to provide a first plurality of product signals. The DSP block also includes a second multiplier adapted to multiply a second plurality of input signals to provide a second plurality of product signals. The DSP block further includes an arithmetic logic unit (ALU) adapted to operate on the first product signals and the second product signals received at first and second operand inputs, respectively, of the ALU to provide a plurality of output signals.
    Type: Grant
    Filed: June 25, 2008
    Date of Patent: June 11, 2013
    Assignee: Lattice Semiconductor Corporation
    Inventors: Asher Hazanchuk, Ian Ing, Satwant Singh
  • Patent number: 8458243
    Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.
    Type: Grant
    Filed: March 3, 2010
    Date of Patent: June 4, 2013
    Assignee: Altera Corporation
    Inventors: Suleyman Sirri Demirsoy, Hyun Yi
  • Patent number: 8447800
    Abstract: In a denormal support mode, the normalization circuit of a floating-point adder is used to normalize or denormalize the output of a floating-point multiplier. Each floating-point multiply instruction is speculatively converted to a multiply-add instruction, with the addend forced to zero. This preserves the value of the product, while normalizing or denormalizing the product using the floating-point adder's normalization circuit. When the operands to the multiply operation are available, they are inspected. If the operands will not generate an unnormal intermediate product or a denormal final product, the add operation is suppressed, such as by operand-forwarding. Additionally, each non-fused floating-point multiply-add instruction is replaced with a multiply-add instruction having a zero addend, and a floating-point add instruction having the addend of the original multiply-add instruction is inserted into the instruction stream.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: May 21, 2013
    Assignee: QUALCOMM Incorporated
    Inventors: Kenneth Alan Dockser, Pathik Sunil Lall
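    The abstract distinguishes fused from non-fused multiply-add: a fused operation rounds once, after the addition, while a non-fused one rounds the product and then rounds the sum. The Python sketch below makes the difference visible by emulating the single-rounding case with exact rational arithmetic. The function names and the chosen operands are assumptions for the example; this shows the numerical distinction only, not the instruction conversion the patent describes.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: exact a*b + c, rounded once to a float."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def non_fused_multiply_add(a: float, b: float, c: float) -> float:
    """Round the product to a float first, then round the sum."""
    product = a * b
    return product + c


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which a double cannot hold, so the
# non-fused path rounds it to 1.0 and loses the small term entirely.
print(non_fused_multiply_add(a, b, c))  # 0.0
print(fused_multiply_add(a, b, c) == -(2.0**-60))  # True

# With a zero addend the single rounding gives the same value as a plain multiply.
print(fused_multiply_add(a, b, 0.0) == a * b)  # True
```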
  • Patent number: 8438208
    Abstract: A processor including instruction support for implementing large-operand multiplication may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit comprising a hardware multiplier datapath circuit, where the hardware multiplier datapath circuit is configured to multiply operands having a maximum number of bits M.
    Type: Grant
    Filed: June 19, 2009
    Date of Patent: May 7, 2013
    Assignee: Oracle America, Inc.
    Inventors: Christopher H. Olson, Jeffrey S. Brooks, Robert T. Golla, Paul J. Jordan
  • Patent number: 8417760
    Abstract: For calculating a result of a modular multiplication with long operands, at least the multiplicand is divided into at least three shorter portions. Using the three shorter portions of the multiplicand, the multiplier and the modulus, a modular multiplication is performed within a cryptographic calculation, wherein the portions of the multiplicand, the multiplier and the modulus are parameters of the cryptographic calculation. The calculation is performed sequentially using the portions of the multiplicand and using an intermediate result obtained in a previous calculation, until all portions of the multiplicand are processed, to obtain the final result of the modular multiplication. The calculation of an intermediate result is performed using a multiplication addition operation, in which MMD operations and updating operations are performed sequentially, and short auxiliary registers and short result registers are used.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: April 9, 2013
    Assignee: Infineon Technologies AG
    Inventor: Wieland Fischer
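    The sequential use of multiplicand portions with a running intermediate result can be sketched as a Horner-style loop. In the Python example below, the function name, the portion width parameter, and the most-significant-first processing order are assumptions; the sketch reduces with Python's `%` operator and does not model the MMD operations, updating operations, or short registers mentioned in the abstract.

```python
def modmul_by_portions(multiplicand: int, multiplier: int, modulus: int,
                       portion_bits: int, portions: int = 3) -> int:
    """Compute (multiplicand * multiplier) % modulus one multiplicand portion at a time."""
    if multiplicand < 0 or multiplicand >> (portion_bits * portions):
        raise ValueError("multiplicand does not fit in the given portions")
    mask = (1 << portion_bits) - 1
    result = 0  # intermediate result carried from one step to the next
    for i in reversed(range(portions)):  # most significant portion first
        portion = (multiplicand >> (i * portion_bits)) & mask
        # Shift the previous intermediate result up by one portion, add the
        # new portion's contribution, and reduce modulo the modulus.
        result = ((result << portion_bits) + portion * multiplier) % modulus
    return result


print(modmul_by_portions(0xABCDEF, 0x123456, 1000003, portion_bits=8))
print((0xABCDEF * 0x123456) % 1000003)  # same value
```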
  • Patent number: 8396911
    Abstract: In a determination as to similarity on parts of a piece of data, high-speed processing is performed without the need for a database. Division signal lines (L1 to Lk) that transmit signals corresponding to division data are used.
    Type: Grant
    Filed: September 25, 2008
    Date of Patent: March 12, 2013
    Assignee: Toshiba Information Systems (Japan) Corporation
    Inventor: Akiyoshi Oguro
  • Patent number: 8346838
    Abstract: A method and apparatus for including in a processor instructions for performing integer transforms including multiply-add operations and horizontal-add operations on packed data. In one embodiment, a processor is coupled to a memory that stores a first packed byte data and a second packed byte data. The processor performs operations on said first packed byte data and said second packed byte data to generate a third packed data in response to receiving a multiply-add instruction. A plurality of the 16-bit data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed byte data. The processor adds together at least a first and a second 16-bit data element of the third packed data in response to receiving a horizontal-add instruction to generate a 16-bit result as one of a plurality of data elements of a fourth packed data.
    Type: Grant
    Filed: September 15, 2009
    Date of Patent: January 1, 2013
    Assignee: Intel Corporation
    Inventors: Eric Debes, William W. Macy, Jonathan J. Tyler
  • Patent number: 8332452
    Abstract: The present invention is generally related to the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve performing a plurality of dot product operations to generate operands for a new vector. The dot product operations may require the issue of a plurality of permute instructions to arrange the vector operands in desired locations of a target register. Embodiments of the invention provide a dot product instruction wherein a mask field may be used to specify a particular location of a target register in which to transfer data, thereby avoiding the need for permute instructions for arranging data, reducing dependencies between instructions, and the usage of temporary registers.
    Type: Grant
    Filed: October 31, 2006
    Date of Patent: December 11, 2012
    Assignee: International Business Machines Corporation
    Inventors: Eric Oliver Mejdrich, Adam James Muff
  • Patent number: 8316071
    Abstract: Sum and carry signals are formed representing a product of a first and a second operand. A bias signal is formed having a value determined by a sign of a product of the first and the second operand. An output signal is provided based on an addition of the sum signal, the carry signal, a sign-extended addend, and the bias signal. A portion of the output signal, a saturated minimum value, or a saturated maximum value, is selected as a final result based on the sign of the product and a sign of the output signal.
    Type: Grant
    Filed: May 27, 2009
    Date of Patent: November 20, 2012
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Kevin A. Hurd, Scott A. Hilker
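    The visible behavior of a saturating multiply-add, returning either the true result or a saturated minimum or maximum, can be sketched in Python as follows. The function name and the default word width are assumptions, and the sketch clamps the exact result directly; it does not model the sum and carry signals, the bias signal, or the sign-based selection logic in the abstract.

```python
def saturating_multiply_add(a: int, b: int, addend: int, bits: int = 32) -> int:
    """Return a*b + addend clamped to the signed `bits`-wide range."""
    minimum = -(1 << (bits - 1))
    maximum = (1 << (bits - 1)) - 1
    exact = a * b + addend  # Python integers do not overflow
    return max(minimum, min(maximum, exact))


print(saturating_multiply_add(3, 4, 5, bits=8))     # 17
print(saturating_multiply_add(100, 2, 0, bits=8))   # 127 (saturated maximum)
print(saturating_multiply_add(-100, 2, 0, bits=8))  # -128 (saturated minimum)
```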
  • Patent number: 8301681
    Abstract: A specialized processing block for a programmable logic device includes circuitry for performing multiplications and sums thereof, as well as circuitry for performing floating point operations. The floating point circuitry preferably includes rounding and normalization circuitry. To perform mantissa multiplications, the floating point circuitry preferably relies on the aforementioned multipliers of the specialized processing block.
    Type: Grant
    Filed: June 5, 2006
    Date of Patent: October 30, 2012
    Assignee: Altera Corporation
    Inventors: Kwan Yee Martin Lee, Martin Langhammer, Triet M. Nguyen, Yi-Wen Lin
  • Patent number: 8280940
    Abstract: A data processing apparatus including a register bank, a shadow register and an arithmetic operation unit. The register bank includes a number of registers respectively for storing a number of operands, wherein the registers are n-bit registers and n is a natural number. The shadow register stores a first backup operand for making a backup of a first operand, which is stored in a first one of the registers, in response to a first control signal. The arithmetic operation unit performs at least an arithmetic operation on the operands to obtain operational data, and stores the operational data in the first register in response to an arithmetic operation command.
    Type: Grant
    Filed: October 22, 2007
    Date of Patent: October 2, 2012
    Assignee: Himax Technologies Limited
    Inventors: Chun-Yu Chen, Shu-Ming Liu
  • Patent number: 8266199
    Abstract: A specialized processing block for a programmable logic device incorporates a fundamental processing unit that performs a sum of two multiplications, adding the partial products of both multiplications without computing the individual multiplications. Such fundamental processing units consume less area than conventional separate multipliers and adders. The specialized processing block further has input and output stages, as well as a loopback function, to allow the block to be configured for various digital signal processing operations.
    Type: Grant
    Filed: June 5, 2006
    Date of Patent: September 11, 2012
    Assignee: Altera Corporation
    Inventors: Martin Langhammer, Kwan Yee Martin Lee, Orang Azgomi, Keone Streicher, Yi-Wen Lin
  • Patent number: 8266198
    Abstract: A specialized processing block for a programmable logic device includes circuitry for performing multiplications and sums thereof, as well as circuitry for rounding the result. The rounding circuitry can selectably perform round-to-nearest and round-to-nearest-even operations. In addition, the bit position at which rounding occurs is preferably selectable. The specialized processing block preferably also includes saturation circuitry to prevent overflows and underflows, and the bit position at which saturation occurs also preferably is selectable. The selectability of both the rounding and saturation positions provides control of the output data word width. The rounding and saturation circuitry may be selectably located in different positions based on timing needs. Similarly, rounding may be speeded up using a look-ahead mode in which both rounded and unrounded results are computed in parallel, with the rounding logic selecting between those results.
    Type: Grant
    Filed: June 5, 2006
    Date of Patent: September 11, 2012
    Assignee: Altera Corporation
    Inventors: Kwan Yee Martin Lee, Martin Langhammer, Yi-Wen Lin, Triet M. Nguyen
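The two rounding modes named in the abstract above differ only in how ties are handled. The sketch below drops the low `pos` bits of an integer and rounds at that position. It assumes "round-to-nearest" means ties round up, which the abstract does not state; the function name `round_at` is also an assumption.

```python
def round_at(value: int, pos: int, to_even: bool = False) -> int:
    """Drop the low `pos` bits of `value`, rounding to the nearest result.

    Ties round up by default (an assumed reading of "round-to-nearest"),
    or to the even result when `to_even` is True.
    """
    if pos <= 0:
        return value
    half = 1 << (pos - 1)
    quotient = value >> pos                 # floor division by 2**pos
    remainder = value & ((1 << pos) - 1)    # discarded low bits, always >= 0
    if remainder > half or (remainder == half and (not to_even or quotient & 1)):
        quotient += 1
    return quotient
```

For example, 10 is 2.5 in units of 4, so `round_at(10, 2)` gives 3 while `round_at(10, 2, to_even=True)` gives 2. The look-ahead mode mentioned in the abstract would compute both `quotient` and `quotient + 1` in parallel and select between them.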
  • Patent number: 8239439
    Abstract: Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises transferring more than two operands to a vector unit, each operand being transferred to a respective one of a plurality of processing lanes of the vector unit. The operands may be transferred from the vector unit to a dot product unit wherein an arithmetic operation using the more than two operands may be performed.
    Type: Grant
    Filed: December 13, 2007
    Date of Patent: August 7, 2012
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Matthew R. Tubbs
  • Patent number: 8239441
    Abstract: Modifying a leading zero estimation during an unfused multiply add operation of (A*B)+C. A plurality of terms x and y may be received, and each may be based on truncated terms s and t (e.g., in performing the unfused multiply add operation) and the shifted C term. A first leading zero estimation may be calculated based on the terms x and y. It may be determined if near total catastrophic cancellation has occurred. A carry in from a right most number of bits of the terms s and t and the most significant truncated bits of s and t may be used to generate a second leading zero estimation based on the first leading zero estimation if the near total catastrophic cancellation has occurred.
    Type: Grant
    Filed: May 15, 2008
    Date of Patent: August 7, 2012
    Assignee: Oracle America, Inc.
    Inventor: Leonard D. Rarick
  • Patent number: 8239440
    Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: August 7, 2012
    Assignee: Oracle America, Inc.
    Inventors: Jeffrey S. Brooks, Christopher H. Olson
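The difference between the fused and unfused operations discussed in the two abstracts above is the number of roundings: an unfused multiply-add rounds the product and then the sum, while a fused multiply-add rounds only once. The sketch below emulates both for finite double-precision inputs, using exact rational arithmetic to stand in for the unrounded intermediate. The function names are assumptions, and this is a numerical illustration rather than a model of the pipeline.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Round the product to double precision, then round the sum again."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly with rationals, then round once to double.

    Finite inputs only: Fraction cannot represent infinities or NaN.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)
unfused = unfused_multiply_add(a, a, c)  # low product bits are rounded away first
fused = fused_multiply_add(a, a, c)      # low product bits survive the cancellation
```

Here the exact product is 1 + 2^-26 + 2^-54. The unfused path rounds away the 2^-54 term before the subtraction and returns 0.0, whereas the fused path returns 2^-54. Cancellation of this kind is the situation in which the leading zero estimate matters.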
  • Publication number: 20120173600
    Abstract: Provided are an apparatus and method for performing a complex number operation using a Single Instruction Multiple Data (SIMD) architecture. A SIMD operation apparatus may perform, in parallel, a real part operation and an imaginary part operation of a plurality of complex numbers. The real part operation and the imaginary part operation may be performed sequentially, or in parallel.
    Type: Application
    Filed: August 16, 2011
    Publication date: July 5, 2012
    Inventors: Young Hwan Park, Ho YANG
  • Publication number: 20120150933
    Abstract: Various systems, apparatuses, processes, and programs may be used to calculate a multiply-sum of two carry-less multiplications of two input operands. In particular implementations, a system, apparatus, process, and program may include the ability to use input data busses for the input operands and an output data bus for an overall calculation result, each bus including a width of 2n bits, where n is an integer greater than one. The system, apparatus, process, and program may also calculate the carry-less multiplications of the two input operands for a lower level of a hierarchical structure and calculate at least one multiply-sum and at least one intermediate multiply-sum for a higher level of the structure based on the carry-less multiplications of the lower level. A certain number of multiply-sums may be output as an overall calculation result, depending on the mode of operation, using the full width of said output data bus.
    Type: Application
    Filed: July 15, 2011
    Publication date: June 14, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Maarten J. Boersma, Markus Kaltenbach, Jens Leenstra, Tim Niggemeier, Philipp Oehler, Philipp Panitz
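For readers unfamiliar with the term, a carry-less multiplication treats its operands as polynomials over GF(2), so partial products are combined with XOR instead of addition, and a multiply-sum combines two such products with a further XOR. The sketch below shows only that arithmetic; it does not model the hierarchical structure or bus widths in the abstract, and the names `clmul` and `clmul_sum` are assumptions.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) product of two non-negative integers."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift  # XOR replaces addition, so no carries propagate
        b >>= 1
        shift += 1
    return result


def clmul_sum(a0: int, b0: int, a1: int, b1: int) -> int:
    """Multiply-sum of two carry-less products: (a0 * b0) XOR (a1 * b1)."""
    return clmul(a0, b0) ^ clmul(a1, b1)
```

For instance, `clmul(3, 3)` is 5, because (x + 1)^2 = x^2 + 1 over GF(2), whereas the ordinary integer product would be 9.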
  • Patent number: 8200725
    Abstract: An arithmetic processing system processes a sensing signal and a first approximate offset signal to obtain a second approximate offset signal. The system includes a first arithmetic processor and a second arithmetic processor. The first arithmetic processor receives and processes the sensing signal and the first approximate offset signal to output a first arithmetic signal. The second arithmetic processor processes the first arithmetic signal to output a second arithmetic signal, and the second arithmetic signal is added with a predetermined offset signal to obtain the second approximate offset signal, and the second approximate offset signal is closer to a real offset signal of the sensing signal than the first approximate offset signal. A method of arithmetic processing is also disclosed.
    Type: Grant
    Filed: November 6, 2007
    Date of Patent: June 12, 2012
    Assignee: Asia Optical Co., Inc.
    Inventors: Kun-Chi Liao, Yu-Ting Lee
  • Patent number: 8190669
    Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh=x-xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.
    Type: Grant
    Filed: October 20, 2004
    Date of Patent: May 29, 2012
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Ming Y. Siu
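The two expressions in the abstract above have the same multiply-then-add shape, which is what allows the hardware to be shared. The following sketch evaluates both in software. The function names are assumptions, and the quadratic is evaluated in Horner form, which is algebraically equal to F2*xh^2 + F1*xh + F0 but is not claimed to match the circuit's arrangement.

```python
import math


def planar_interpolate(a, b, c, x, y):
    """Evaluate the plane equation A*x + B*y + C at coordinates (x, y)."""
    return a * x + b * y + c


def quadratic_approx(coeffs, xb, x):
    """Evaluate F2*xh**2 + F1*xh + F0 with xh = x - xb.

    `coeffs` is the tuple (F2, F1, F0) that would be looked up for the
    baseline value xb.
    """
    f2, f1, f0 = coeffs
    xh = x - xb
    return (f2 * xh + f1) * xh + f0


# Assumed example coefficients: the second-order Taylor terms of sqrt at xb = 1.
sqrt_coeffs_at_1 = (-0.125, 0.5, 1.0)
estimate = quadratic_approx(sqrt_coeffs_at_1, 1.0, 1.1)
error = abs(estimate - math.sqrt(1.1))
```

The Taylor coefficients are used here only to make the example checkable; the abstract does not say how the F2, F1, and F0 values are derived.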
  • Publication number: 20120078992
    Abstract: A vector functional unit implemented on a semiconductor chip to perform vector operations of dimension N is described. The vector functional unit includes N functional units. Each of the N functional units have logic circuitry to perform: a first integer multiply add instruction that presents highest ordered bits but not lowest ordered bits of a first integer multiply add calculation, and, a second integer multiply add instruction that presents lowest ordered bits but not highest ordered bits of a second integer multiply add calculation.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 29, 2012
    Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver
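The pair of instructions in the abstract above can be pictured as two views of one double-width result. The sketch below assumes unsigned operands and a configurable lane width, neither of which the abstract specifies; the names `imad_lo` and `imad_hi` are hypothetical.

```python
def imad_lo(a: int, b: int, c: int, bits: int = 64) -> int:
    """Lowest-ordered `bits` bits of the unsigned multiply-add a*b + c."""
    mask = (1 << bits) - 1
    return (a * b + c) & mask


def imad_hi(a: int, b: int, c: int, bits: int = 64) -> int:
    """Next `bits` bits above the low part of the unsigned multiply-add a*b + c."""
    mask = (1 << bits) - 1
    return ((a * b + c) >> bits) & mask
```

With 8-bit lanes, 200*200 + 100 = 40100 = 0x9CA4, so `imad_lo(200, 200, 100, bits=8)` yields 0xA4 and `imad_hi(200, 200, 100, bits=8)` yields 0x9C. Issuing both instructions recovers the full-width result.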
  • Patent number: 8112466
    Abstract: An efficient implementation of DSP functions in a field programmable gate array (FPGA) using one or more computational blocks, each block having a multiplier, an accumulator, and multiplexers. The structure implements most common DSP equations in a fast and highly compact manner. A novel method for cascading these blocks with the help of dedicated DSP lines is provided, which leads to a very simple and proficient implementation of n-stage MAC operations.
    Type: Grant
    Filed: September 28, 2005
    Date of Patent: February 7, 2012
    Assignee: Sicronic Remote KG, LLC
    Inventors: Deboleena Minz, Kailash Digari
  • Publication number: 20120011187
    Abstract: A circuit for performing a floating-point fused-multiply-add (FMA) calculation of a×b±c. The circuit includes (i) a partial product generation module having (a) a multiples generator unit configured to generate multiples of a multiplicand having an m-digit binary coded decimal (BCD) format, (b) a recoding unit configured to generate n+1 signed digit (SD) sets from a sum vector and a carry vector of a multiplier, and (c) a multiples selection unit configured to generate partial product vectors from the multiples of the multiplicand based on the n+1 SD sets and the sign of the FMA calculation, and (ii) a carry save adder (CSA) tree configured to add the partial product vectors and an addend to generate a result sum vector and a result carry vector in an m+n digit BCD format.
    Type: Application
    Filed: July 6, 2011
    Publication date: January 12, 2012
    Applicant: SILMINDS, LLC, EGYPT
    Inventors: Amira Mohamed, Ramy Raafat, Hossam Ali Hassan Fahmy, Tarek Eldeeb, Yasmeen Farouk, Rodina Samy, Mostafa Elkhouly