Multiplication Followed By Addition (i.e., X*y+z) Patents (Class 708/523)
  • Patent number: 8090758
    Abstract: A multiplier-accumulator includes a pre-adder, a multiplier, an accumulator, multiplexing logic, and control logic. The pre-adder is configured to sum a first input and a second input to produce a pre-sum output. The multiplier is configured to multiply a third input and the pre-sum output to produce a product output. The accumulator is configured to sum a pair of accumulator inputs to produce a sum output. The multiplexer is configured to select the pair of accumulator inputs from a plurality of multiplexer inputs, where the plurality of multiplexer inputs includes the product output and the sum output. The control logic is configured to control operation of the pre-adder, the accumulator, and the multiplexer logic. In an example, each of the first input, the second input, the third input, and the sum output is coupled to programmable interconnect of a programmable logic device.
    Type: Grant
    Filed: December 14, 2006
    Date of Patent: January 3, 2012
    Assignee: Xilinx, Inc.
    Inventors: Schuyler E. Shimanek, William E. Allaire, Steven J. Zack
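
The datapath in the abstract above (pre-adder, multiplier, accumulator, and a multiplexer choosing the accumulator inputs) can be illustrated with a small behavioral model. This is only a sketch: the class name `PreAdderMac`, the `accumulate` flag standing in for the multiplexer selection, and the use of unbounded Python integers are assumptions made for illustration, not details from the patent.

```python
class PreAdderMac:
    """Toy behavioral model of a multiplier-accumulator with a pre-adder."""

    def __init__(self):
        self.acc = 0  # the "sum output" fed back to the accumulator

    def step(self, a, b, c, accumulate=True):
        pre_sum = a + b            # pre-adder: first input + second input
        product = pre_sum * c      # multiplier: third input * pre-sum
        # Hypothetical multiplexer choice: add the product to the previous
        # sum output, or restart the accumulation from the product alone.
        self.acc = self.acc + product if accumulate else product
        return self.acc


mac = PreAdderMac()
mac.step(1, 2, 3)   # (1 + 2) * 3
mac.step(4, 5, 2)   # previous sum + (4 + 5) * 2
```

After the two steps, `mac.acc` holds (1 + 2) * 3 + (4 + 5) * 2.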
  • Publication number: 20110320512
    Abstract: A decimal multiplication mechanism for fixed and floating point computation in a computer having a coefficient mechanism without resulting leading zero detection (LZD) and process which assumes that the final product will be M+N digits in length and performs all calculations based on this assumption. Least significant digits that would be truncated are no longer stored, but retained as sticky information which is used to finalize the result product. Once the computation of the product is complete, a final check based on the examination of key bits observed during partial product accumulation is used to determine if the final product is truly M+N digits in length, or M+N−1 digits. If the latter is true, then corrective final product shifting is employed to obtain the proper result. This eliminates the need for dedicated leading zero detection hardware used to determine the number of significant digits in the final product.
    Type: Application
    Filed: June 23, 2010
    Publication date: December 29, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Steven R. Carlough, Adam B. Collura, Michael Kroener, Silvia Melitta Mueller
  • Patent number: 8078660
    Abstract: A bridge fused multiply-adder is disclosed. The fused multiply-adder is for the single instruction execution of (A×B)+C. The bridge fused multiply-add unit adds this functionality to existing floating-point co-processor units by including a fused multiply-add hardware “bridge” between an existing floating-point adder and a floating-point multiplier unit. This fused multiply-add functionality is added to existing two-operand architecture designs without degrading the performance or parallel pipe execution of floating-point adder and floating-point multiplier instructions.
    Type: Grant
    Filed: April 9, 2008
    Date of Patent: December 13, 2011
    Assignee: The Board of Regents, University of Texas System
    Inventors: Eric Quinnell, Earl E. Swartzlander, Jr., Carl Lemonds
  • Patent number: 8078661
    Abstract: A multiple-word multiplication-accumulation circuit suitable for use with a single-port memory. The circuit is composed of a multiplication-accumulation (MAC) operator and surrounding registers. The MAC operator has multiplicand and multiplier input ports with different bit widths to calculate a sum of products of multiple-word data read out of a memory. The registers serve as buffer storage of multiple-word data to be supplied to individual input ports of the MAC operator. The amount of data supplied to the MAC operator in each clock cycle is adjusted such that total amount of data consumed and produced by the MAC operator in one clock cycle will be equal to or smaller than the maximum amount of data that the memory can transfer in one clock cycle. This feature enables the use of a bandwidth-limited single-port memory, without causing adverse effect on the efficiency of MAC operator usage.
    Type: Grant
    Filed: July 26, 2004
    Date of Patent: December 13, 2011
    Assignee: Fujitsu Semiconductor Limited
    Inventors: Kenji Mukaida, Masahiko Takenaka, Naoya Torii, Shoichi Masui
  • Patent number: 8065356
    Abstract: A programmable element for data processing comprises a crosspoint switch (318), a mathematical operation module (320), and a plurality of data hold modules (604,606). Each of the data hold modules (604,606) receives data from the crosspoint switch (318) and communicates the data to an input of the mathematical operation module (320) such that data arrives at the inputs of the mathematical operation module (320) substantially simultaneously. A first data hold module (604) communicates a first data valid signal to a second data hold module (606) upon receipt of first valid data, and the second data hold module communicates a second data valid signal to the first data hold module upon receipt of second valid data.
    Type: Grant
    Filed: December 20, 2006
    Date of Patent: November 22, 2011
    Assignee: L3 Communications Integrated Systems, L.P.
    Inventors: Jerry William Yancey, Yea Zong Kuo
  • Publication number: 20110276614
    Abstract: A data processing apparatus and method are provided for performing a reciprocal operation on an input value d to produce a result value X. The reciprocal operation involves iterative execution of a refinement step to converge on the result value, the refinement step performing the computation: Xi=Xi-1*M, where Xi is an estimate of the result value for the i-th iteration of the refinement step, and M is a value determined by a portion of the refinement step. The data processing apparatus comprises a register data store having a plurality of registers operable to store data, and processing logic operable to execute instructions to perform data processing operations on data held in the register data store.
    Type: Application
    Filed: July 19, 2011
    Publication date: November 10, 2011
    Applicant: ARM Limited
    Inventors: David Raymond Lutz, Christopher Neal Hinds
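
The refinement step Xi=Xi-1*M in the abstract above can be sketched as follows. The abstract does not say how M is formed, so this sketch assumes the classic Newton-Raphson choice M = 2 - d*Xi-1; the function name `reciprocal`, the initial estimate, and the iteration count are likewise illustrative assumptions.

```python
def reciprocal(d, x0, steps=5):
    """Iteratively refine an estimate x of 1/d using x = x * m."""
    x = x0
    for _ in range(steps):
        m = 2.0 - d * x   # assumed form of M (Newton-Raphson), a multiply-add
        x = x * m         # refinement step: X_i = X_(i-1) * M
    return x


approx = reciprocal(3.0, 0.3)   # converges toward 1/3
```

Under this assumption the error term 1 - d*x is squared on every iteration, so a rough starting estimate reaches double precision after a handful of steps.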
  • Patent number: 8051121
    Abstract: According to some embodiments, a dual multiply-accumulate operation optimized for even and odd multisample calculations is disclosed.
    Type: Grant
    Filed: March 4, 2008
    Date of Patent: November 1, 2011
    Assignee: Marvell International Ltd.
    Inventors: Bradley C. Aldrich, Nigel C. Paver, William T. Maghielse
  • Publication number: 20110264719
    Abstract: The present invention relates to power and hardware efficient digital multipliers configured to multiply an N-bit multiplicand with an M-bit multiplier. The digital multipliers comprise efficient partial product generation through sharing of at least one partial product result.
    Type: Application
    Filed: September 23, 2009
    Publication date: October 27, 2011
    Applicant: AUDIOASICS A/S
    Inventor: Mikael Mortensen
  • Patent number: 8046399
    Abstract: A computer processor including a single fused-unfused floating point multiply-add (FMA) module computes the result of the operation A*B+C for floating point numbers for fused multiply-add rounding operations and unfused multiply-add rounding operations. In one embodiment, a fused multiply-add rounding implementation is augmented with additional hardware which calculates an unfused multiply-add rounding result without adding additional pipeline stages. In one embodiment, a computation by the fused-unfused floating point multiply-add (FMA) module is initiated using a single opcode which determines whether a fused multiply-add rounding result or unfused multiply-add rounding result is generated.
    Type: Grant
    Filed: January 25, 2008
    Date of Patent: October 25, 2011
    Assignee: Oracle America, Inc.
    Inventors: Murali K. Inaganti, Leonard D. Rarick
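
The difference between fused and unfused multiply-add rounding mentioned in the abstract above can be shown in software. This is a sketch, not the patented hardware: it assumes IEEE 754 binary64 floats with round-to-nearest-even, and it emulates the single rounding of a fused operation with exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction


def unfused_mul_add(a, b, c):
    """Round after the multiply, then round again after the add."""
    return a * b + c


def fused_mul_add(a, b, c):
    """Compute a*b+c exactly, then round once to the nearest float."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0
unfused = unfused_mul_add(a, b, c)   # product rounds to 1.0 first
fused = fused_mul_add(a, b, c)       # keeps the tiny exact remainder
```

Here the exact product is 1 - 2^-54, which rounds to 1.0 before the addition in the unfused case, so the two results differ.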
  • Patent number: 8041759
    Abstract: A specialized processing block for a programmable logic device incorporates a fundamental processing unit that performs a sum of two multiplications, adding the partial products of both multiplications without computing the individual multiplications. Such fundamental processing units consume less area than conventional separate multipliers and adders. The specialized processing block further has input and output stages, as well as a loopback function, to allow the block to be configured for various digital signal processing operations, including finite impulse response (FIR) filters and infinite impulse response (IIR) filters. By using the programmable connections, and in some cases the programmable resources of the programmable logic device, and by running portions of the specialized processing block at higher clock speeds than the remainder of the programmable logic device, more complex FIR and IIR filters can be implemented.
    Type: Grant
    Filed: June 5, 2006
    Date of Patent: October 18, 2011
    Assignee: Altera Corporation
    Inventors: Martin Langhammer, Kwan Yee Martin Lee, Orang Azgomi, Keone Streicher, Robert L. Pelt
  • Patent number: 8036165
    Abstract: The quality of signals during SDMA is raised. In an uplink, a signal processing unit receives signals respectively from a plurality of terminal apparatuses which have been multiple-accessed by division of time. It derives receiving channel characteristics corresponding to the plurality of terminal apparatuses, respectively, for each time slot. In a downlink, the signal processing unit derives transmitting channel characteristics from the receiving channel characteristics derived and, based on the transmitting channel characteristics derived, it transmits signals respectively to the plurality of terminal apparatuses to which SDMA has been performed.
    Type: Grant
    Filed: May 16, 2005
    Date of Patent: October 11, 2011
    Assignee: Kyocera Corporation
    Inventors: Takeo Miyata, Katsutoshi Kawai
  • Patent number: 8015229
    Abstract: An apparatus for performing multiply-accumulate operations in a microprocessor comprising operand input registers for receiving data to be operated on, an adder and a multiplier for performing operations on the data, a result output port for presenting results to the microprocessor, a multiplexer for storing results, an accumulator cache for storing an accumulator value internal to the apparatus, and control circuitry for controlling the operation of the apparatus.
    Type: Grant
    Filed: June 1, 2005
    Date of Patent: September 6, 2011
    Assignee: Atmel Corporation
    Inventors: Øyvind Strøm, Erik Knutsen Renno
  • Patent number: 8005210
    Abstract: Modulus scaling applied to reduction techniques decreases the time to perform modular arithmetic operations by avoiding shifting and multiplication operations. Modulus scaling may be applied to both integer and binary fields, and the scaling multiplier factor is chosen based on a selected reduction technique for the modular arithmetic operation.
    Type: Grant
    Filed: June 30, 2007
    Date of Patent: August 23, 2011
    Assignee: Intel Corporation
    Inventors: Erdinc Ozturk, Vinodh Gopal, Gilbert Wolrich, Wajdi K. Feghali
  • Patent number: 8001360
    Abstract: A system and software for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying a data selection operand and a first and a second register providing a plurality of data elements, the data selection operand comprising a plurality of fields each selecting one of the plurality of data elements, the execution unit operable to provide the data element selected by each field of the data selection operand to a predetermined position in a catenated result.
    Type: Grant
    Filed: January 16, 2004
    Date of Patent: August 16, 2011
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris
  • Publication number: 20110185000
    Abstract: A low-error reduced-width multiplier is provided by the present invention. The multiplier can dynamically compensate the truncation error. The compensation value is derived by the dependencies among the multiplier partial products, and thus, can be analyzed according to the multiplication type and the multiplier input statistics.
    Type: Application
    Filed: February 28, 2011
    Publication date: July 28, 2011
    Applicant: National Chiao Tung University
    Inventors: Yen-Chin Liao, Hsie-Chia Chang
  • Patent number: 7986779
    Abstract: Time to perform scalar point multiplication used for ECC is reduced by minimizing the number of shifting operations. These operations are minimized by applying modulus scaling by performing selective comparisons of points at intermediate computations based on primality of the order of an ECC group.
    Type: Grant
    Filed: June 30, 2007
    Date of Patent: July 26, 2011
    Assignee: Intel Corporation
    Inventors: Erdinc Ozturk, Vinodh Gopal, Gilbert Wolrich, Wajdi K. Feghali
  • Patent number: 7987222
    Abstract: A method for performing multiplication on a field programmable gate array includes generating a product by multiplying a first plurality of bits from a first number and a first plurality of bits from a second number. A stored value designated as a product of a second plurality of bits from the first number and a second plurality of bits from the second number is retrieved. The product is scaled with respect to a position of the first plurality of bits from the first number and a position of the first plurality of bits from the second number. The stored value is scaled with respect to a position of the second plurality of bits from the first number and a position of the second plurality of bits from the second number. The scaled product and the scaled stored value are summed.
    Type: Grant
    Filed: April 22, 2004
    Date of Patent: July 26, 2011
    Assignee: Altera Corporation
    Inventors: Asher Hazanchuk, Benjamin Esposito
  • Patent number: 7987344
    Abstract: A programmable processor and method for improving the performance of processors by incorporating an execution unit configurable to execute a plurality of instruction streams from the plurality of threads, wherein each instruction stream includes a group instruction that operates on a plurality of data elements in partitioned fields of at least one of the registers to produce a catenated result.
    Type: Grant
    Filed: January 16, 2004
    Date of Patent: July 26, 2011
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris
  • Patent number: 7982496
    Abstract: A bus-based logic block for an integrated circuit includes a provision for placing an arbitrary constant onto a data bus in the logic block. An exemplary logic block has multi-bit first and second inputs and a multi-bit output. The logic block includes a multi-bit multiplexer circuit, a multi-bit programmable logic circuit, and a constant generator circuit. The multiplexer circuit has a multi-bit first input coupled to a multi-bit first input of the logic block, a multi-bit second input, and a multi-bit output. The programmable logic circuit has a multi-bit first input coupled to the output of the multiplexer circuit, and a multi-bit output. The constant generator circuit has a multi-bit output coupled to the second input of the multiplexer circuit. Each bit of the logic block may be commonly controlled with all other bits of the logic block.
    Type: Grant
    Filed: April 2, 2009
    Date of Patent: July 19, 2011
    Assignee: Xilinx, Inc.
    Inventor: Steven P. Young
  • Patent number: 7978846
    Abstract: The computation time to perform scalar point multiplication in an Elliptic Curve Group is reduced by modifying the Barrett Reduction technique. Computations are performed using an N-bit scaled modulus based on a modulus m having k bits to provide a scaled result, with N being greater than k. The N-bit scaled result is reduced to a k-bit result using a pre-computed N-bit scaled reduction parameter in an optimal manner avoiding shifting/aligning operations for any arbitrary values of k, N.
    Type: Grant
    Filed: June 30, 2007
    Date of Patent: July 12, 2011
    Assignee: Intel Corporation
    Inventors: Erdinc Ozturk, Vinodh Gopal, Gilbert Wolrich, Wajdi K. Feghali
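
For context, the unmodified Barrett Reduction that the abstract above starts from can be sketched as below. This is the textbook technique, not the patent's scaled-modulus variant; the function names and the restriction to inputs smaller than m squared are assumptions of this sketch.

```python
def barrett_setup(m):
    """Precompute the reduction parameter mu = floor(4^k / m), k = bits in m."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m
    return k, mu


def barrett_reduce(x, m, k, mu):
    """Return x mod m for 0 <= x < m*m using multiplies and shifts only."""
    q = (x * mu) >> (2 * k)   # estimate of floor(x / m), never too large
    r = x - q * m             # multiply followed by a subtraction
    while r >= m:             # the estimate is low by at most a small amount
        r -= m
    return r


m = 1000003
k, mu = barrett_setup(m)
r = barrett_reduce(123456789012, m, k, mu)
```

The division by m is replaced by a multiplication with the precomputed `mu` and a shift, which is why alignment of the operands matters in hardware.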
  • Publication number: 20110161389
    Abstract: A plurality of specialized processing blocks in a programmable logic device, including multipliers and circuitry for adding results of those multipliers, can be configured as a larger multiplier by adding to the specialized processing blocks selectable circuitry for shifting multiplier results before adding. In one embodiment, this allows all but the final addition to take place in specialized processing blocks, with the final addition occurring in programmable logic. In another embodiment, additional compression and adding circuitry allows even the final addition to occur in the specialized processing blocks.
    Type: Application
    Filed: March 8, 2011
    Publication date: June 30, 2011
    Applicant: ALTERA CORPORATION
    Inventors: Martin Langhammer, Kumara Tharmalingam
  • Publication number: 20110153707
    Abstract: An apparatus and method are described for multiplying and adding matrices. For example, one embodiment of a method comprises decoding by a decoder in a processor device, a single instruction specifying an m-by-m matrix operation for a set of vectors, wherein each vector represents an m-by-m matrix of data elements and m is greater than one; issuing the single instruction for execution by an execution unit in the processor device; and responsive to the execution of the single instruction, generating a resultant vector, wherein the resultant vector represents an m-by-m matrix of data elements.
    Type: Application
    Filed: December 10, 2010
    Publication date: June 23, 2011
    Inventors: Boris Ginzburg, Simon Rubanovich, Benny Eitan
  • Patent number: 7966087
    Abstract: A method, system, and medium of modeling and/or for controlling a manufacturing process is disclosed. In particular, a method according to embodiments of the present invention includes calculating a set of predicted output values, and obtaining a prediction model based on a set of input parameters, the set of predicted output values, and empirical output values. Each input parameter causes a change in at least two outputs. The method also includes optimizing the prediction model by minimizing differences between the set of predicted output values and the empirical output values, and adjusting the set of input parameters to obtain a set of desired output values to control the manufacturing apparatus. Obtaining the prediction model includes transforming the set of input parameters into transformed input values using a transformation function of multiple coefficient values, and calculating the predicted output values using the transformed input values.
    Type: Grant
    Filed: July 31, 2007
    Date of Patent: June 21, 2011
    Assignee: Applied Materials, Inc.
    Inventors: Yuri Kokotov, Efim Entin, Jacques Seror, Yossi Fisher, Shalomo Sarel, Arulkumar P. Shanmugasundram, Alexander T. Schwarm, Young Jeen Paik
  • Patent number: 7958179
    Abstract: Provided are an arithmetic method and device of a reconfigurable processor. The arithmetic device includes: an Arithmetic Logic Unit (ALU) for performing an addition and subtraction operation and a logic operation of a binary signal; a multiplier for performing a multiplication operation of the binary signal; a shifter for changing an arrangement of the binary signal; a first operand selector and a second operand selector each for selecting one of values output from the ALU, the multiplier, and the shifter; and an adder for adding the values selected by the first operand selector and the second operand selector.
    Type: Grant
    Filed: October 30, 2007
    Date of Patent: June 7, 2011
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Chun Gi Lyuh, Soon Il Yeo, Tae Moon Roh, Jong Dae Kim
  • Patent number: 7917568
    Abstract: An x87 fused multiply-add (FMA) instruction in the instruction set of an x86 architecture microprocessor is disclosed. The FMA instruction implicitly specifies the two factor operands as the top two operands of the x87 FPU register stack and explicitly specifies the third addend operand as a third x87 FPU register stack register. The microprocessor multiplies the first two operands and adds the product to the third operand to generate a result. The result is stored into the third register and the first two operands are popped off the stack. In an alternate embodiment, the third operand is also implicitly specified as being stored in the register that is two registers below the top of stack register; the result is also stored therein. The instruction opcode value is in the x87 opcode range.
    Type: Grant
    Filed: July 23, 2007
    Date of Patent: March 29, 2011
    Assignee: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Timothy A. Elliott, Terry Parks
  • Patent number: 7912887
    Abstract: In a denormal support mode, the normalization circuit of a floating-point adder is used to normalize or denormalize the output of a floating-point multiplier. Each floating-point multiply instruction is speculatively converted to a multiply-add instruction, with the addend forced to zero. This preserves the value of the product, while normalizing or denormalizing the product using the floating-point adder's normalization circuit. When the operands to the multiply operation are available, they are inspected. If the operands will not generate an unnormal intermediate product or a denormal final product, the add operation is suppressed, such as by operand-forwarding. Additionally, each non-fused floating-point multiply-add instruction is replaced with a multiply-add instruction having a zero addend, and a floating-point add instruction having the addend of the original multiply-add instruction is inserted into the instruction stream.
    Type: Grant
    Filed: May 10, 2006
    Date of Patent: March 22, 2011
    Assignee: QUALCOMM Incorporated
    Inventors: Kenneth Alan Dockser, Pathik Sunil Lall
  • Publication number: 20110055303
    Abstract: One embodiment relates to a method for generating a periodic function in response to an argument in a digital signal processing system, where the periodic function can be represented as functions of two or more components of the argument. The method may include: obtaining a first operand from one of two or more lookup tables in response to a first component of the argument; obtaining a second operand from one of the lookup tables in response to a second component of the argument; conditionally mirroring the first and second operands in response to a quadrant of the argument; and calculating a value of the periodic function in response to the operands with a linear algebra unit without using conditional code execution.
    Type: Application
    Filed: March 15, 2010
    Publication date: March 3, 2011
    Applicant: AZURAY TECHNOLOGIES, INC.
    Inventor: Keith Slavin
  • Publication number: 20110055308
    Abstract: Systems and methods for multi-precision computation are disclosed. One embodiment of the present invention includes a plurality of multiply-add units (MADDs) configured to perform one or more single precision operations and an arrangement generator to generate one or more mantissa arrangements using a plurality of double precision numbers. Each MADD is configured to receive and load said mantissa arrangements from the arrangement generator. The MADDs compute a result of a multi-precision computation using the mantissa arrangements. In an embodiment, the MADDs are configured to simultaneously perform operations that include single precision operations, double-precision additions and double-precision multiply and additions.
    Type: Application
    Filed: June 10, 2010
    Publication date: March 3, 2011
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael J. Mantor, Jeffrey T. Brady, Daniel B. Clifton, Christopher Spencer
  • Patent number: 7890566
    Abstract: A functional unit in a digital system is provided with a rounding DOT product instruction, wherein a product of first pair of elements is combined with a product of second pair of elements, the combined product is rounded, and the final result is stored in a destination. Rounding is performed by adding a rounding value to form an intermediate result, and then shifting the intermediate result right. A combined result is rounded to a fixed length shorter than the combined product. The products are combined by either addition or subtraction. An overflow resulting from the combination or from rounding is not reported.
    Type: Grant
    Filed: October 31, 2000
    Date of Patent: February 15, 2011
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph R. Zbiciak
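
The rounding step described in the abstract above (add a rounding value, then shift right) can be sketched in a few lines. The function name `dot2_round`, the default shift of 16 bits, and the `subtract` flag are illustrative assumptions; because Python integers are unbounded, the unreported overflow behavior is not modeled.

```python
def dot2_round(a0, b0, a1, b1, shift=16, subtract=False):
    """Combine two products, then round by adding half an LSB and shifting."""
    p0 = a0 * b0
    p1 = a1 * b1
    combined = p0 - p1 if subtract else p0 + p1
    rounding_value = 1 << (shift - 1)          # half of the final LSB
    return (combined + rounding_value) >> shift  # arithmetic right shift


added = dot2_round(3, 4, 5, 6, shift=2)
subtracted = dot2_round(3, 4, 5, 6, shift=2, subtract=True)
```

With these inputs the combined products are 42 and -18, and the rounded results are the nearest multiples after discarding two low-order bits.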
  • Publication number: 20110029589
    Abstract: Embodiments of the invention are directed to a system and method that enable relatively low power dissipation by scheduling operations of a chain of two or more multiply-accumulator units, delivering an output result of a first multiply accumulator of the chain as an input to a second, subsequent multiply accumulator of the chain.
    Type: Application
    Filed: July 30, 2009
    Publication date: February 3, 2011
    Inventor: Jeffrey Allan (Alon) JACOB (YAAKOV)
  • Patent number: 7873815
    Abstract: DSP architectures having improved performance are described. In an exemplary architecture, a DSP includes two MAC units and two ALUs, where one of the ALUs replaces an adder for one of the two MAC units. This DSP may be configured to operate in a dual-MAC/single-ALU configuration, a single-MAC/dual-ALU configuration, or a dual-MAC/dual-ALU configuration. This flexibility allows the DSP to handle various types of signal processing operations and improves utilization of the available hardware. The DSP architectures further include pipeline registers that break up critical paths and allow operations at a higher clock speed for greater throughput.
    Type: Grant
    Filed: March 4, 2004
    Date of Patent: January 18, 2011
    Assignee: QUALCOMM Incorporated
    Inventors: Gilbert C. Sih, De D. Hsu, Way-Shing Lee, Xufeng Chen
  • Publication number: 20100306301
    Abstract: Sum and carry signals are formed representing a product of a first and a second operand. A bias signal is formed having a value determined by a sign of a product of the first and the second operand. An output signal is provided based on an addition of the sum signal, the carry signal, a sign-extended addend, and the bias signal. A portion of the output signal, a saturated minimum value, or a saturated maximum value, is selected as a final result based on the sign of the product and a sign of the output signal.
    Type: Application
    Filed: May 27, 2009
    Publication date: December 2, 2010
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Kevin A. Hurd, Scott A. Hilker
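
The final selection in the abstract above (a portion of the output, a saturated minimum, or a saturated maximum) amounts to a saturating multiply-add. The sketch below models only that arithmetic result, not the sum, carry, and bias signals; the function name and the 16-bit default width are assumptions.

```python
def saturating_mul_add(a, b, addend, bits=16):
    """Return a*b + addend clamped to the signed range of `bits` bits."""
    lo = -(1 << (bits - 1))        # saturated minimum value
    hi = (1 << (bits - 1)) - 1     # saturated maximum value
    total = a * b + addend
    return max(lo, min(hi, total))


in_range = saturating_mul_add(3, 4, 5)
too_big = saturating_mul_add(200, 200, 0)
too_small = saturating_mul_add(-200, 200, 0)
```

Results that fit are passed through unchanged, while out-of-range sums are pinned to the nearest representable limit instead of wrapping around.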
  • Patent number: 7822199
    Abstract: A method and device for performing a cryptographic operation by a device controlled by a security application executed outside thereof in which a cryptographic value (y) is produced by a calculation comprising at least one multiplication between first and second factors containing a security key (s) associated with the device and a challenge number (c) provided by the security application. The first multiplication factor comprises a determined number of bits (L) in a binary representation and the second factor is constrained in such a way that it comprises, in a binary representation, several bits at 1 with a sequence of at least L−1 bits at 0 between each pair of consecutive bits at 1 while the multiplication is carried out by assembling the binary versions of the first factor shifted according to positions of the bits at 1 of the second factor, respectively.
    Type: Grant
    Filed: February 24, 2005
    Date of Patent: October 26, 2010
    Assignee: France Telecom
    Inventors: Marc Girault, David Lefranc
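
The multiplication by assembly described in the abstract above works because the shifted copies of the first factor never overlap when the bits at 1 of the second factor are far enough apart. A minimal sketch, with a hypothetical function name and small example values chosen for illustration:

```python
def shifted_assembly_multiply(first, second):
    """Multiply by OR-ing copies of `first` shifted to each 1-bit of `second`.

    Only valid when consecutive 1-bits of `second` are separated by at least
    L-1 zero bits, where L is the bit length of `first`.
    """
    result = 0
    shift = 0
    while second >> shift:
        if (second >> shift) & 1:
            result |= first << shift   # no carries: the copies do not overlap
        shift += 1
    return result


first = 0b1011                              # L = 4 bits
second = (1 << 0) | (1 << 4) | (1 << 9)     # gaps of at least L-1 zeros
product = shifted_assembly_multiply(first, second)
```

No additions or carry propagation are needed, which is what makes the operation cheap for a constrained device.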
  • Patent number: 7818550
    Abstract: One embodiment of a processor includes a fetch stage, decoder stage, execution stage and completion stage. The execution stage includes a primary execution stage for handling low latency instructions and a secondary execution stage for handling higher latency instructions. A detector determines if an instruction is a high latency instruction or a low latency instruction. If the detector also finds that a particular low latency instruction is dependent on, and destructive of, a corresponding high latency instruction, then the secondary execution stage dynamically fuses the execution of the low latency instruction together with the execution of the high latency instruction. Otherwise, the primary execution stage handles the execution of the low latency instruction.
    Type: Grant
    Filed: July 23, 2007
    Date of Patent: October 19, 2010
    Assignee: International Business Machines Corporation
    Inventor: Michael Thomas Vaden
  • Publication number: 20100250640
    Abstract: A linear systolic array is added to the lower side of a trapezoid systolic array created by combining a triangular systolic array and a square systolic array. In order to make the connection among the cells fixed, the intermediate result output from each row of the trapezoid systolic array to a lower row is shifted in phase with respect to the intermediate result of the complex MFA algorithm, the phase shift is absorbed by the next row, and the phase shift in the intermediate result output from the last row of the trapezoid systolic array is corrected by the linear systolic array. Each cell is implemented by a CORDIC circuit that processes vector angle computation, vector rotation, division, and multiply-and-accumulate with a constant delay.
    Type: Application
    Filed: November 21, 2008
    Publication date: September 30, 2010
    Inventor: Katsutoshi Seki
  • Publication number: 20100235414
    Abstract: A Montgomery multiplication device calculates a Montgomery product of an operand X and an operand Y with respect to a modulus M and includes a plurality of processing elements. In a first clock cycle, two intermediate partial sums are created by obtaining an input of length w−1 from a preceding processing element as w−1 least significant bits. The most significant bit is configured as either zero or one. Then, two partial sums are calculated using a word of the operand Y, a word of the modulus M, a bit of the operand X, and the two intermediate partial sums. In a second clock cycle, a selection bit is obtained from a subsequent processing element and one of the two partial sums is selected based on the value of the selection bit. Then, the selected partial sum is used for calculation of a word of the Montgomery product.
    Type: Application
    Filed: March 1, 2010
    Publication date: September 16, 2010
    Inventors: Miaoqing Huang, Krzysztof Gaj
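
As background for the abstract above, the Montgomery product itself can be sketched with the textbook bit-serial algorithm, which consumes one bit of X per iteration. This is not the word-based, two-partial-sum pipeline the abstract describes; the function name and the requirement that X and Y be smaller than an odd modulus M of at most n bits are assumptions of the sketch.

```python
def montgomery_product(x, y, m, n):
    """Return x * y * 2^(-n) mod m for odd m, with x, y < m < 2^n."""
    s = 0
    for i in range(n):
        if (x >> i) & 1:
            s += y        # add Y when the current bit of X is 1
        if s & 1:
            s += m        # make the sum even so the shift is exact
        s >>= 1           # divide by 2
    return s - m if s >= m else s


m, n = 239, 8
x, y = 100, 200
p = montgomery_product(x, y, m, n)
```

Multiplying the result by 2^n modulo m recovers the ordinary modular product, which is a convenient way to check the routine.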
  • Patent number: 7797516
    Abstract: A set of low-cost microcontroller extensions facilitates Digital Signal Processing (DSP) applications by incorporating a Multiply-Accumulate (MAC) unit in a Central Processing Unit (CPU) of the microcontroller which is responsive to the extensions.
    Type: Grant
    Filed: March 16, 2007
    Date of Patent: September 14, 2010
    Assignee: ATMEL Corporation
    Inventors: Benjamin Francis Froemming, Emil Lambrache
  • Publication number: 20100211622
    Abstract: In a determination as to similarity on parts of a piece of data, high-speed processing is performed without the need for a database. Division signal lines (L1 to Lk) that transmit signals corresponding to division data are used.
    Type: Application
    Filed: September 25, 2008
    Publication date: August 19, 2010
    Inventor: Akiyoshi Oguro
  • Publication number: 20100183145
    Abstract: An arithmetic circuit capable of Montgomery multiplication using only a one-port RAM is disclosed. In a first read process, b[i] is read from a memory M2 of a sync one-port RAM for storing a[s-1:0] and b[s-1:0] and stored in a register R1. In a second read process, a[j] is read from the memory M2, t[j] from a memory M1 of a sync one-port RAM for storing t[s-1:0], b[i] from the register R1, and a value RC from a register R2, and input to a sum-of-products calculation circuit 10 for calculating t[j]+a[j]*b[i]+RC. In a write process, the calculation result data FH is written in the register R2, and the calculation result data FL in the memory M1 as t[j]. A first subloop process for repeating the second read process, the sum-of-products calculation process and the write process is executed after the first read process.
    Type: Application
    Filed: January 12, 2010
    Publication date: July 22, 2010
    Inventor: Shigeo OHYAMA
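    The first subloop can be sketched as a word-wise multiply-accumulate with carry, where the word b[i] held in register R1 stays fixed while j runs over the words of a and t. Plain lists stand in for the memories M1 and M2, and the word width `w`, the function names, and the little-endian word order are assumptions made for illustration.

```python
def mac_row(t, a, b_i, w=32):
    """One pass over j: t[j] + a[j] * b_i + RC, split into low word FL and high word FH."""
    mask = (1 << w) - 1
    rc = 0                       # register R2: carry word from the previous step
    out = []
    for t_j, a_j in zip(t, a):
        f = t_j + a_j * b_i + rc  # fits in 2w bits when all inputs fit in w bits
        out.append(f & mask)     # FL is written back as t[j]
        rc = f >> w              # FH becomes the next RC
    return out, rc


def words_to_int(words, w=32):
    """Helper for checking: interpret a little-endian word list as one integer."""
    return sum(v << (w * j) for j, v in enumerate(words))
```

    Taken together, the returned words and the final carry represent t + a*b[i] as a multi-word integer. The full Montgomery loop around this step is not shown.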
  • Publication number: 20100169404
    Abstract: A multiplier-accumulator (MAC) block can be programmed to operate in one or more modes. When the MAC block implements at least one multiply-and-accumulate operation, the accumulator value can be zeroed without introducing clock latency or initialized in one clock cycle. To zero the accumulator value, the most significant bits (MSBs) of data representing zero can be input to the MAC block and sent directly to the add-subtract-accumulate unit. Alternatively, dedicated configuration bits can be set to clear the contents of a pipeline register for input to the add-subtract-accumulate unit.
    Type: Application
    Filed: January 7, 2010
    Publication date: July 1, 2010
    Inventors: Leon Zheng, Martin Langhammer, Nitin Prasad, Greg Starr, Chiao Kai Hwang, Kumara Tharmalingam
  • Patent number: 7728624
    Abstract: An integrated circuit comprising at least one group having multiple arithmetic/logic units arranged in sub-groups. In the sub-groups at inputs of multiple arithmetic/logic units, in each case a single one of the first selection units is connected on the input side, wherein no other selection unit is connected directly on the input side of this selection unit. The first selection units are coupled to each other such that a horizontal and/or vertical logical interconnection of the arithmetic/logic units within a group, and/or a logical interconnection of arithmetic/logic units to an upstream group can be implemented. Second selection units are in each case connected on the output side of a column of arithmetic/logic units. The second selection units of a group are connected on the output side to one bus each, and a microprocessor is coupled to this bus.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: June 1, 2010
    Assignee: Micronas GmbH
    Inventor: Gert Umbach
  • Patent number: 7730118
    Abstract: An arithmetic unit for selectively implementing one of a multiply and multiply-accumulate instruction, including a multiplier, addition circuitry, a result register, and accumulator circuitry. The multiplier is arranged to receive first and second operands and is operable to generate multiplication terms. The addition circuitry receives multiplication terms from the multiplier and is operable to combine them to generate a multiplication result. The result register receives the multiplication result from the adder. The accumulator circuitry is connected to receive a value stored in the result register and an accumulate control signal, which determines whether the arithmetic unit implements a multiply or a multiply-accumulate instruction.
    Type: Grant
    Filed: April 7, 2006
    Date of Patent: June 1, 2010
    Assignee: STMicroelectronics (Research & Development) Limited
    Inventor: Tariq Kurd
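    The role of the accumulate control signal can be shown with a small behavioral model. It is a sketch only: the class name, the method name, and the use of unbounded Python integers are assumptions, and the internal split into multiplication terms and addition circuitry is not modeled.

```python
class MulAccUnit:
    """Behavioral model: one result register, one accumulate control input."""

    def __init__(self):
        self.result = 0  # stands in for the result register

    def execute(self, a, b, accumulate):
        product = a * b
        # The control signal decides whether the stored result is fed back.
        feedback = self.result if accumulate else 0
        self.result = product + feedback
        return self.result
```

    With `accumulate` false the unit behaves as a plain multiplier; with it true the same datapath performs a multiply-accumulate on the stored result.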
  • Patent number: 7725521
    Abstract: A method and apparatus for performing matrix transformations including multiply-add operations and byte shuffle operations on packed data in a processor. In one embodiment, two rows of content byte elements are shuffled to generate a first and second packed data respectively including elements of a first two columns and of a second two columns. A third packed data including sums of products is generated from the first packed data and elements from two rows of a matrix by a multiply-add instruction. A fourth packed data including sums of products is generated from the second packed data and elements from two more rows of the matrix by another multiply-add instruction. Corresponding sums of products of the third and fourth packed data are then summed to generate two rows of a product matrix. Elements of the product matrix may be generated in an order that further facilitates a second matrix multiplication.
    Type: Grant
    Filed: October 10, 2003
    Date of Patent: May 25, 2010
    Assignee: Intel Corporation
    Inventors: Yen-Kuang Chen, Eric Q. Li, William W. Macy, Jr., Minerva M. Yeung
  • Patent number: 7716268
    Abstract: A method and apparatus for providing a processor based nested form polynomial engine are disclosed. A concise instruction format is provided to significantly decrease memory required and allow for instruction pipelining without branch penalty using a nested form polynomial engine. The instruction causes a processor to set coefficient and data address pointers for evaluating a polynomial, to load a coefficient and data operand into a coefficient register and a data register, respectively, to multiply the contents of the coefficient register and data register to produce a product, to add a next coefficient operand to the product to produce a sum, to provide the sum to an accumulator and to repeat the loading, multiplying, adding and providing until evaluation of the polynomial is complete.
    Type: Grant
    Filed: March 4, 2005
    Date of Patent: May 11, 2010
    Assignee: Hitachi Global Storage Technologies Netherlands B.V.
    Inventors: Jeffrey J. Dobbek, Kirk Hwang
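    The nested form referred to here is commonly known as Horner's rule, and the multiply, add, accumulate cycle can be sketched in a few lines. The function name and the highest-degree-first coefficient order are assumptions made for illustration; the instruction format and address pointers of the abstract are not modeled.

```python
def horner(coeffs, x):
    """Evaluate a polynomial in nested form; coeffs are ordered highest degree first."""
    if not coeffs:
        return 0
    acc = coeffs[0]                 # load the first coefficient
    for next_coeff in coeffs[1:]:
        product = acc * x           # multiply the running value by the data operand
        acc = product + next_coeff  # add the next coefficient, keep the sum in the accumulator
    return acc
```

    For example, `horner([2, 3, 4], 5)` evaluates 2x^2 + 3x + 4 at x = 5 as (2*5 + 3)*5 + 4, using one multiply-add per coefficient after the first.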
  • Patent number: 7716269
    Abstract: A multiply accumulate unit (“MAC”) that performs operations on packed integer data. In one embodiment, the MAC receives two 32-bit data words which, depending on the specified mode of operation, each contain either four 8-bit operands, two 16-bit operands, or one 32-bit operand. Depending on the mode of operation, the MAC performs either sixteen 8×8 operations, four 16×16 operations, or one 32×32 operation. Results may be individually retrieved from registers and the corresponding accumulator cleared after the read cycle. In addition, the accumulators may be globally initialized. Two results from the 8×8 operations may be packed into a single 32-bit register. The MAC may also shift and saturate the products as required.
    Type: Grant
    Filed: June 16, 2005
    Date of Patent: May 11, 2010
    Assignee: Cradle Technologies
    Inventors: Moshe B. Simon, Erik P. Machnicki, David A. Harrison, Rakesh K. Singh
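    The operation counts in the abstract (sixteen, four, or one) match what results from multiplying every lane of one word by every lane of the other. The sketch below assumes that all-pairs arrangement, unsigned lanes, and unbounded accumulators; shifting, saturation, and result packing are not modeled, and all names are hypothetical.

```python
def unpack_lanes(word, lane_bits):
    """Split a 32-bit word into unsigned lanes, least significant lane first."""
    mask = (1 << lane_bits) - 1
    return [(word >> (lane_bits * k)) & mask for k in range(32 // lane_bits)]


def packed_mac(acc, word_a, word_b, lane_bits):
    """Accumulate every lane of word_a times every lane of word_b into acc.

    acc needs (32 // lane_bits) ** 2 entries: 16 for 8-bit lanes,
    4 for 16-bit lanes, and 1 for a single 32-bit lane.
    """
    lanes_a = unpack_lanes(word_a, lane_bits)
    lanes_b = unpack_lanes(word_b, lane_bits)
    idx = 0
    for x in lanes_a:
        for y in lanes_b:
            acc[idx] += x * y
            idx += 1
    return acc
```

    Calling `packed_mac` repeatedly with the same `acc` list accumulates across input words, and resetting the list to zeros plays the role of the global initialization mentioned in the abstract.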
  • Patent number: 7716266
    Abstract: A method and system for performing a binary mode and hexadecimal mode Multiply-Add floating point operation in a floating point arithmetic unit according to a formula A*C+B, wherein A, B and C operands each have a fraction and an exponent part expA, expB and expC and the exponent of the product A*C is calculated and compared to the exponent of the addend under inclusion of an exponent bias value dedicated to use unsigned biased exponents, wherein the comparison yields a shift amount used for aligning the addend with the product operand, wherein a shift amount calculation provides a common value CV for both binary and hexadecimal according to the formula (expA+expC-expB+CV).
    Type: Grant
    Filed: January 26, 2006
    Date of Patent: May 11, 2010
    Assignee: International Business Machines Corporation
    Inventors: Son Dao Trong, Juergen Haess, Klaus Michael Kroener, Eric M. Schwarz
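    The shift-amount formula can be checked for a single format with a short sketch. The abstract does not state how CV is chosen so that one value serves both binary and hexadecimal modes, so the helper `single_format_cv` below is purely an illustrative assumption: it folds the exponent bias and an optional fixed datapath offset into one constant.

```python
def alignment_shift(exp_a, exp_c, exp_b, cv):
    """Shift amount from biased exponents: expA + expC - expB + CV."""
    return exp_a + exp_c - exp_b + cv


def single_format_cv(bias, datapath_offset=0):
    """Hypothetical constant for one format.

    With biased exponents E = e + bias, the product exponent minus the addend
    exponent is (expA + expC - 2*bias) - (expB - bias) = expA + expC - expB - bias,
    so the bias can be absorbed into a single additive constant.
    """
    return datapath_offset - bias
```

    With a bias of 1023, operands whose true exponents are 2, -3, and 0 give a shift of -1, the same value obtained by subtracting the addend exponent from the product exponent directly.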
  • Patent number: 7698358
    Abstract: In a programmable logic device having a specialized functional block incorporating multipliers and adders, multiplication operations that do not fit neatly into the available multipliers are performed partially in the multipliers of the specialized functional block and partially in multipliers configured in programmable logic of the programmable logic device. Unused resources of the specialized functional block, including adders, may be used to add together the partial products produced inside and outside the specialized functional block. Some adders configured in programmable logic of the programmable logic device also may be used for that purpose.
    Type: Grant
    Filed: December 24, 2003
    Date of Patent: April 13, 2010
    Assignee: Altera Corporation
    Inventors: Martin Langhammer, Leon Zheng, Chiao Kai Hwang, Gregory Starr
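    The idea of building a wide product from narrower multiplications plus additions can be sketched as follows. The 18-bit split width is an assumption chosen for illustration, and the sketch does not say which partial products would be computed in the specialized block and which in programmable logic.

```python
def split_multiply(a, b, k=18):
    """Multiply two non-negative integers using four narrower partial products."""
    mask = (1 << k) - 1
    a_lo, a_hi = a & mask, a >> k
    b_lo, b_hi = b & mask, b >> k

    # Four partial products; any of them could be mapped to a different multiplier.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Adders combine the partial products at their respective bit offsets.
    return (hi_hi << (2 * k)) + ((lo_hi + hi_lo) << k) + lo_lo
```

    The result equals `a * b` for any non-negative inputs, since the split only regroups the terms of the product.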
  • Patent number: 7676535
    Abstract: An embodiment of the present invention is a technique to perform floating-point operations. A floating-point (FP) squarer squares a first argument to produce an intermediate argument. The first and intermediate arguments have first and intermediate mantissas and exponents. A FP multiply-add (MAD) unit performs a multiply-and-add operation on the intermediate argument, a second argument, and a third argument to produce a result having a result mantissa and a result exponent. The second and third arguments have second and third mantissas and exponents, respectively.
    Type: Grant
    Filed: September 28, 2005
    Date of Patent: March 9, 2010
    Assignee: Intel Corporation
    Inventors: David D. Donofrio, Xuye Li
  • Publication number: 20100023569
    Abstract: A method of computing arithmetic operations more efficiently than the conventional Arithmetic Logic Unit (ALU) is disclosed. By encoding both operands from Binary Coded Decimal (BCD) codes (0000 to 1001) into decimal digits (0 to 9) and inputting them into the GerTh's™ look-up tables, which are made of an array of AND gates, the invention finds the answer more efficiently. This method finds the result in fewer steps than a traditional ALU by reducing the repetitive calculation steps and logic gates required. The new method also reduces the unsolvable computerized binary floating-point multiplications and divisions to the solvable GerTh's computerized decimal digits' (0-9) elementary arithmetic operations.
    Type: Application
    Filed: July 22, 2008
    Publication date: January 28, 2010
    Applicant: DAW SHIEN SCIENTIFIC RESEARCH & DEVELOPMENT, INC.
    Inventors: James Shihfu Shiao, Albert Shihyung Shiao
  • Publication number: 20100011042
    Abstract: A method and apparatus for including in a processor instructions for performing integer transforms including multiply-add operations and horizontal-add operations on packed data. In one embodiment, a processor is coupled to a memory that stores a first packed byte data and a second packed byte data. The processor performs operations on said first packed byte data and said second packed byte data to generate a third packed data in response to receiving a multiply-add instruction. A plurality of the 16-bit data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed byte data. The processor adds together at least a first and a second 16-bit data element of the third packed data in response to receiving a horizontal-add instruction to generate a 16-bit result as one of a plurality of data elements of a fourth packed data.
    Type: Application
    Filed: September 15, 2009
    Publication date: January 14, 2010
    Inventors: Eric Debes, William W. Macy, Jonathan J. Tyler
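    The two steps described here, a multiply-add on packed bytes followed by a horizontal add of adjacent results, can be sketched with plain lists standing in for packed registers. The pairing of adjacent byte products is an assumption made for illustration, and 16-bit wraparound or saturation and operand signedness are not modeled.

```python
def multiply_add_bytes(a_bytes, b_bytes):
    """Multiply corresponding bytes and sum each adjacent pair of products."""
    if len(a_bytes) != len(b_bytes) or len(a_bytes) % 2:
        raise ValueError("inputs must have the same even length")
    return [
        a_bytes[i] * b_bytes[i] + a_bytes[i + 1] * b_bytes[i + 1]
        for i in range(0, len(a_bytes), 2)
    ]


def horizontal_add(elems):
    """Add adjacent elements of one packed value, halving the element count."""
    if len(elems) % 2:
        raise ValueError("input must have an even length")
    return [elems[i] + elems[i + 1] for i in range(0, len(elems), 2)]
```

    Chaining the two, as in `horizontal_add(multiply_add_bytes(a, b))`, produces four-term sums of products, which is the building block of a small integer transform.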