Multiplication Followed By Addition (i.e., X*y+z) Patents (Class 708/523)
  • Patent number: 7640285
    Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh=x-xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.
    Type: Grant
    Filed: October 20, 2004
    Date of Patent: December 29, 2009
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Ming Y. Siu
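    Illustrative sketch (not taken from the patent): the two expression forms named in the abstract above can be written out directly. The function names and the sample coefficients are hypothetical, and xh is assumed to denote the difference x - xb between the operand and its table baseline.

```python
def planar_interpolate(a, b, c, x, y):
    """Evaluate the plane equation A*x + B*y + C at coordinates (x, y)."""
    return a * x + b * y + c


def quadratic_approx(f2, f1, f0, x, xb):
    """Evaluate F2*xh^2 + F1*xh + F0, where xh = x - xb (assumed meaning).

    f2, f1, f0 stand in for the table values F2(xb), F1(xb), F0(xb).
    """
    xh = x - xb
    return f2 * xh * xh + f1 * xh + f0


# Hypothetical coefficients: the second-order Taylor expansion of 1/x around xb = 1.
print(planar_interpolate(2, 3, 1, 4, 5))
print(quadratic_approx(1.0, -1.0, 1.0, 1.5, 1.0))
```

    Both functions reduce to multiplications followed by additions, which is why one set of multiplier and adder circuits can plausibly serve both operation classes.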
  • Publication number: 20090313236
    Abstract: A documents database has a plurality of documents, including but not limited to text files, video clips and sound files. Each document is associated with at least one category of a plurality of categories in a categories database, and each category has at least one keyword. A search request having at least one search term is received from a user, and a categories database is searched for categories having a keyword corresponding to the user search term to identify first level categories. The other keywords from the identified first level categories are retrieved and the documents database is searched for documents having a user search term or a retrieved keyword. The identified documents are then ranked and presented to the user. Other search expansion techniques, and display techniques, are also discussed.
    Type: Application
    Filed: June 13, 2008
    Publication date: December 17, 2009
    Applicant: NEWS DISTRIBUTION NETWORK, INC.
    Inventors: Paul Matthew Hernacki, Gregory Alton Peters
  • Publication number: 20090292754
    Abstract: Methods and systems for detecting underflow in a floating-point operation are disclosed. In accordance with an example disclosed method a plurality of comparator circuits and a plurality of logic devices coupled to the plurality of comparator circuits are operated to determine whether performing a floating-point operation using a floating-point hardware unit will generate an underflow condition. The operating of the plurality of comparator circuits and the logic devices involves inputting a multiply-add operation result value to at least some of the plurality of comparator circuits. In addition, a plurality of logic outputs are outputted via the plurality of logic devices. The plurality of logic outputs are indicative of comparison operations performed by at least some of the comparator circuits based on the multiply-add operation result value. An underflow indicator is outputted based on the plurality of logic outputs.
    Type: Application
    Filed: July 31, 2009
    Publication date: November 26, 2009
    Inventor: Marius A. Cornea-Hasegan
  • Publication number: 20090292756
    Abstract: A processor to calculate a product-component having fewer digits than an entire product of a multiplication of a multiplicand and a multiplier. A memory holds at least one multiplicand-component having fewer digits than the multiplicand and at least one multiplier-component having fewer digits than the multiplier. A logic then calculates the product-component based on the multiplicand-components and the multiplier-components in the memory. Collectively, a plurality of the processors can calculate all of the product-components of the product.
    Type: Application
    Filed: May 23, 2008
    Publication date: November 26, 2009
    Inventors: Gibson D. Elliot, Jay Randall Stoner
  • Patent number: 7624138
    Abstract: A method and apparatus for including in a processor instructions for performing integer transforms including multiply-add operations and horizontal-add operations on packed data. In one embodiment, a processor is coupled to a memory that stores a first packed byte data and a second packed byte data. The processor performs operations on said first packed byte data and said second packed byte data to generate a third packed data in response to receiving a multiply-add instruction. A plurality of the 16-bit data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed byte data. The processor adds together at least a first and a second 16-bit data element of the third packed data in response to receiving a horizontal-add instruction to generate a 16-bit result as one of a plurality of data elements of a fourth packed data.
    Type: Grant
    Filed: December 30, 2003
    Date of Patent: November 24, 2009
    Assignee: Intel Corporation
    Inventors: Eric Debes, William W. Macy, Jonathan J. Tyler
  • Publication number: 20090287757
    Abstract: Modifying a leading zero estimation during an unfused multiply add operation of (A*B)+C. A plurality of terms x and y may be received, and each may be based on truncated terms s and t (e.g., in performing the unfused multiply add operation) and the shifted C term. A first leading zero estimation may be calculated based on the terms x and y. It may be determined if near total catastrophic cancellation has occurred. A carry in from a right most number of bits of the terms s and t and the most significant truncated bits of s and t may be used to generate a second leading zero estimation based on the first leading zero estimation if the near total catastrophic cancellation has occurred.
    Type: Application
    Filed: May 15, 2008
    Publication date: November 19, 2009
    Inventor: Leonard D. Rarick
  • Publication number: 20090248769
    Abstract: A multiply and accumulate engine may implement a digital filter. In some embodiments, the number of coefficients that are stored may be equal to only half of the number of filter taps that are implemented. This may be done by doing multiplications operand by operand within two data registers in a first direction and then shifting directions so that the first operand in a first register is multiplied by the last operand in another register. In some embodiments, the multiply and accumulate engine may be implemented as a two cycle engine wherein in the first stage, multiply and accumulate operations are implemented and then stored into a register. In a second stage and a second cycle, the results stored in the register are further accumulated.
    Type: Application
    Filed: March 26, 2008
    Publication date: October 1, 2009
    Inventor: Teck-Kuen Chua
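    Illustrative sketch (not taken from the application): a symmetric filter needs only half as many stored coefficients as taps, because each coefficient is applied once walking forward through the samples and once walking backward. The function name and the sample data are hypothetical, and an even number of taps is assumed.

```python
def symmetric_fir_output(window, half_coeffs):
    """Compute one output of a symmetric FIR filter from half its coefficients.

    window holds the samples under the filter (length = number of taps);
    half_coeffs holds the first half of the symmetric coefficient set.
    """
    taps = len(window)
    if taps != 2 * len(half_coeffs):
        raise ValueError("window must be twice as long as half_coeffs")

    acc = 0
    # First direction: coefficient i multiplies sample i.
    for i, coeff in enumerate(half_coeffs):
        acc += coeff * window[i]
    # Reversed direction: coefficient i multiplies sample taps-1-i,
    # so the first coefficient meets the last sample.
    for i, coeff in enumerate(half_coeffs):
        acc += coeff * window[taps - 1 - i]
    return acc


# Same result as the full tap set [10, 20, 20, 10] applied to the window.
print(symmetric_fir_output([1, 2, 3, 4], [10, 20]))
```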
  • Publication number: 20090248779
    Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.
    Type: Application
    Filed: March 28, 2008
    Publication date: October 1, 2009
    Inventors: Jeffrey S. Brooks, Christopher H. Olson
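    Illustrative sketch (not taken from the application): the numerical difference between an unfused and a fused multiply-add can be shown in software. The unfused form rounds the product before adding, while the fused form rounds only once. The function names are hypothetical, exact rational arithmetic stands in for the wide hardware datapath, and only finite inputs are handled.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """Compute (a*b)+c with two roundings: one after the multiply, one after the add."""
    product = a * b          # rounded to double precision here
    return product + c       # rounded again here


def fused_multiply_add(a, b, c):
    """Compute a*b+c with a single rounding of the exact result.

    Assumption: finite inputs only (Fraction cannot represent inf or NaN).
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)      # the only rounding step


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
# The exact product is 1 - 2**-60, which rounds to 1.0 in double precision.
print(unfused_multiply_add(a, b, -1.0))
print(fused_multiply_add(a, b, -1.0))
```

    With these inputs the unfused result loses the small term entirely, while the fused result keeps it, which is the kind of cancellation case that separates the two modes.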
  • Patent number: 7596472
    Abstract: The device determines the weighting coefficients to be applied to N digital source signals to form a composite signal. The first- to third-order moments of the composite signal must respectively present mean value, variance and skewness characteristics predefined by a user. The device introduces an additional variable, in the form of a weighting matrix W. The vector w being the vector of the weighting coefficients and $w^T$ the transpose of the vector w, the difference $W - ww^T$ is a positive semidefinite matrix. Moreover, the device performs linearization, around a vector $w_{ref}$ of reference weighting coefficients, of the skewness constraint on the third-order moments using a matrix $A = \begin{bmatrix} W & w \\ w^T & 1 \end{bmatrix}$ as further intermediate variable.
    Type: Grant
    Filed: December 19, 2006
    Date of Patent: September 29, 2009
    Assignee: Prax Value
    Inventor: Francois Oustry
  • Patent number: 7567997
    Abstract: In one embodiment an IC is disclosed which includes a plurality of cascaded digital signal processing slices, wherein each slice has a multiplier coupled to an adder via a multiplexer and each slice has a direct connection to an adjoining slice; and means for configuring the plurality of digital signal processing slices to perform one or more mathematical operations, via, for example, opmodes. This IC allows for the implementation of some basic math functions, such as add, subtract, multiply and divide. Many other applications may be implemented using the one or more DSP slices, for example, accumulate, multiply accumulate (MACC), a wide multiplexer, barrel shifter, counter, and folded, decimating, and interpolating FIRs to name a few.
    Type: Grant
    Filed: December 21, 2004
    Date of Patent: July 28, 2009
    Assignee: XILINX, Inc.
    Inventors: James M. Simkins, Steven P. Young, Jennifer Wong, Bernard J. New, Alvin Y. Ching
  • Publication number: 20090177447
    Abstract: A method for estimating software development effort comprises the steps of: generating a database containing a plurality of source softwares; calculating the Grey relational coefficients between the software to be developed and a source software in the database for each feature they exhibit; calculating the weights for each Grey relational coefficient; multiplying each Grey relational coefficient with the corresponding weight; calculating the Grey relational grade by summing up the products produced in the multiplying step; calculating the Grey relational grades for all remaining source softwares in the database; and comparing the Grey relational grades to estimate the effort for developing the software to be developed.
    Type: Application
    Filed: January 4, 2008
    Publication date: July 9, 2009
    Applicant: NATIONAL TSING HUA UNIVERSITY
    Inventors: Chao Jung Hsu, Chin Yu Huang
  • Publication number: 20090150471
    Abstract: Provided are a reconfigurable arithmetic unit and a processor having the same. The reconfigurable arithmetic unit can perform an addition operation or a multiplication operation according to an instruction by sharing an adder. The reconfigurable arithmetic unit includes a Booth encoder for encoding a multiplier, a partial product generator for generating a plurality of partial products using the encoded multiplier and a multiplicand, a Wallace tree circuit for compressing the partial products into a first partial product and a second partial product, a first multiplexer (MUX) for selecting and outputting one of the first partial product and a first addition input according to a selection signal, a second MUX for selecting and outputting one of the second partial product and a second addition input according to the selection signal, and a Carry Propagation Adder (CPA) for adding an output of the first MUX and an output of the second MUX to output an operation result.
    Type: Application
    Filed: June 10, 2008
    Publication date: June 11, 2009
    Inventors: Yil Suk YANG, Jung Hee SUK, Chun Gi LYUH, Tae Moon ROH, Jong Dae KIM
  • Patent number: 7516307
    Abstract: A method and apparatus are disclosed that compute multiple absolute differences from packed data and sum the multiple absolute differences together to produce a result. According to one embodiment, a processor includes a decode unit to decode a packed sum of absolute differences (PSAD) instruction having an opcode format to identify a set of packed data operands. The decode unit initiates a sequence of operations on the set of packed data operands in response to decoding the PSAD instruction. An execution unit performs a first operation of the sequence of operations initiated by the decode logic, and a bus provides the execution unit with the set of packed data operands as identified in accordance with the opcode format.
    Type: Grant
    Filed: November 6, 2001
    Date of Patent: April 7, 2009
    Assignee: Intel Corporation
    Inventors: Mohammad A. Abdallah, Vladimir Pentkovski
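    Illustrative sketch (not taken from the patent): the arithmetic behind a packed sum of absolute differences is an element-wise absolute difference followed by a single sum. The function name is hypothetical, and the packed operands are modeled as equal-length lists of unsigned byte values.

```python
def packed_sum_abs_diff(src1, src2):
    """Sum the absolute differences of corresponding packed elements."""
    if len(src1) != len(src2):
        raise ValueError("packed operands must have the same element count")
    return sum(abs(x - y) for x, y in zip(src1, src2))


# |1-4| + |5-5| + |9-0| + |200-100|
print(packed_sum_abs_diff([1, 5, 9, 200], [4, 5, 0, 100]))
```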
  • Publication number: 20090077154
    Abstract: Provided is a microprocessor including a complex-MAC unit that operates in response to a complex-MAC instruction. The complex-MAC unit receives first and second complex data (each having 2^m-bit length) from a first register having a register length of at least 2^(m+1) bits, and also receives third and fourth complex data (each having 2^m-bit length) from a second register having a register length of at least 2^(m+1) bits, to calculate a sum of real parts or imaginary parts of a complex product of the first and third complex data and a complex product of the second and fourth complex data. The complex-MAC unit adds the obtained sum of the real parts or imaginary parts to a stored value of the third register, and overwrites the third register with the cumulative total value. The third register has a register length of at least 2^(m+2) bits.
    Type: Application
    Filed: September 10, 2008
    Publication date: March 19, 2009
    Applicant: NEC ELECTRONICS CORPORATION
    Inventors: Hideki Matsuyama, Masayuki Daitou
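    Illustrative sketch (not taken from the application): the accumulation described in the abstract above can be modeled with integer (real, imaginary) pairs. The function names and the `part` parameter are hypothetical, and register widths and overflow behavior are not modeled.

```python
def complex_multiply(p, q):
    """Multiply two complex values given as (real, imag) integer pairs."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)


def complex_mac(accumulator, first, second, third, fourth, part="real"):
    """Add the real (or imaginary) parts of first*third and second*fourth to the accumulator."""
    if part not in ("real", "imag"):
        raise ValueError("part must be 'real' or 'imag'")
    index = 0 if part == "real" else 1
    prod1 = complex_multiply(first, third)
    prod2 = complex_multiply(second, fourth)
    return accumulator + prod1[index] + prod2[index]


# (1+2i)(3+4i) = -5+10i and (5+6i)(7+8i) = -13+82i
print(complex_mac(100, (1, 2), (5, 6), (3, 4), (7, 8), part="real"))
print(complex_mac(100, (1, 2), (5, 6), (3, 4), (7, 8), part="imag"))
```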
  • Publication number: 20090070399
    Abstract: An arithmetic processing system processes a sensing signal and a first approximate offset signal to obtain a second approximate offset signal. The system includes a first arithmetic processor and a second arithmetic processor. The first arithmetic processor receives and processes the sensing signal and the first approximate offset signal to output a first arithmetic signal. The second arithmetic processor processes the first arithmetic signal to output a second arithmetic signal, and the second arithmetic signal is added with a predetermined offset signal to obtain the second approximate offset signal, and the second approximate offset signal is closer to a real offset signal of the sensing signal than the first approximate offset signal. A method of arithmetic processing is also disclosed.
    Type: Application
    Filed: November 6, 2007
    Publication date: March 12, 2009
    Applicant: ASIA OPTICAL CO., INC.
    Inventors: Kun-Chi Liao, Yu-Ting Lee
  • Publication number: 20090030963
    Abstract: The conventional two's complement multiplier which is constituted by a Booth encoder, a partial product generation circuit, and an adder has a problem that the circuit scale would be increased because a bit extension is performed when the multiplier is adapted to an unsigned multiplication. A multiplication circuit of the present invention is provided with a first Booth encoder (1) for encoding lower-order several bits of a multiplier according to first rules of encoding using a Booth algorithm, and a second Booth encoder (5) for encoding most-significant several bits of the multiplier according to second rules of encoding using a Booth algorithm, which are different from the first rules of encoding, and thereby the most-significant several bits of the multiplier are encoded using the Booth algorithm which is different from that for the lower-order several bits.
    Type: Application
    Filed: February 8, 2007
    Publication date: January 29, 2009
    Inventor: Kouichi Nagano
  • Patent number: 7480690
    Abstract: Described are arithmetic circuits divided logically into a product generator and an adder. Multiplexing circuitry logically disposed between the product generator and the adder supports conventional functionality by providing partial products from the product generator to addend terminals of the adder. The multiplexing circuitry can also be controlled to direct a number of external addend inputs to the adder. The additional addend inputs can include inputs and outputs cascaded from other arithmetic circuits.
    Type: Grant
    Filed: December 21, 2004
    Date of Patent: January 20, 2009
    Assignee: XILINX, Inc.
    Inventors: James M. Simkins, Steven P. Young, Jennifer Wong, Bernard J. New, Alvin Y. Ching
  • Patent number: 7472155
    Abstract: Described is a programmable logic device (PLD) with columns of DSP slices that can be cascaded to create DSP circuits of varying size and complexity. Each DSP slice includes a plurality of operand input ports and a slice output port, all of which are programmably connected to general routing and logic resources. The operand ports receive operands for processing, and a slice output port conveys processed results. Each slice additionally includes a feedback port connected to the respective slice output port, to support accumulate functions in this embodiment, and a cascade input port connected to the output port of an upstream slice to support cascading.
    Type: Grant
    Filed: December 21, 2004
    Date of Patent: December 30, 2008
    Assignee: Xilinx, Inc.
    Inventors: James M. Simkins, Steven P. Young, Jennifer Wong, Bernard J. New, Alvin Y. Ching
  • Patent number: 7467175
    Abstract: Described is a programmable logic device (PLD) with columns of DSP slices that can be combined to create DSP circuits of varying size and complexity. DSP slices in accordance with some embodiments include programmable operand input registers that can be configured to introduce different amounts of delay, from zero to two clock cycles, for example, to support pipelining. In one such embodiment, each DSP slice includes a partial-product generator having a multiplier port, a multiplicand port, and a product port. The multiplier and multiplicand ports connect to the operand input port via respective first and second operand input registers, each of which is capable of introducing from zero to two clock cycles of delay. In another embodiment, the output of at least one operand input register can connect to the input of an operand input register of a downstream DSP slice so that operands can be transferred among one or more slices.
    Type: Grant
    Filed: December 21, 2004
    Date of Patent: December 16, 2008
    Assignee: XILINX, Inc.
    Inventors: James M. Simkins, Steven P. Young, Jennifer Wong, Bernard J. New, Alvin Y. Ching
  • Publication number: 20080256162
    Abstract: An x87 fused multiply-add (FMA) instruction in the instruction set of an x86 architecture microprocessor is disclosed. The FMA instruction implicitly specifies the two factor operands as the top two operands of the x87 FPU register stack and explicitly specifies the third addend operand as a third x87 FPU register stack register. The microprocessor multiplies the first two operands and adds the product to the third operand to generate a result. The result is stored into the third register and the first two operands are popped off the stack. In an alternate embodiment, the third operand is also implicitly specified as being stored in the register that is two registers below the top of stack register; the result is also stored therein. The instruction opcode value is in the x87 opcode range.
    Type: Application
    Filed: July 23, 2007
    Publication date: October 16, 2008
    Applicant: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Timothy A. Elliott, Terry Parks
  • Patent number: 7437401
    Abstract: A programmable logic device is provided that includes a MAC block having mode splitting capabilities. Different modes of operation may be implemented simultaneously whereby the multipliers and other DSP circuitry of the MAC block may be allocated among the different modes of operation. For example, one multiplier may be used to implement a multiply mode while another two multipliers may be used to implement a sum of two multipliers mode.
    Type: Grant
    Filed: February 20, 2004
    Date of Patent: October 14, 2008
    Assignee: Altera Corporation
    Inventors: Leon Zheng, Martin Langhammer, Steven Perry, Paul Metzgen, Gregory Starr, William Hwang, Kumara Tharmalingam
  • Patent number: 7428566
    Abstract: A multipurpose functional unit is configurable to support a number of operations including multiply-add and format conversion operations, as well as other integer and/or floating-point arithmetic operations, Boolean operations, and logical test operations.
    Type: Grant
    Filed: November 10, 2004
    Date of Patent: September 23, 2008
    Assignee: Nvidia Corporation
    Inventors: Ming Y. Siu, Stuart F. Oberman
  • Publication number: 20080195685
    Abstract: Multiplication engines and multiplication methods are provided. A multiplication engine for a digital processor includes a first multiplier to generate unequally weighted partial products from input operands in a first multiplier mode; a second multiplier to generate equally weighted partial products from input operands in a second multiplier mode; a multiplexer to select the unequally weighted partial products in the first multiplier mode and to select the equally weighted partial products in the second multiplier mode; and a carry save adder array configured to combine the selected partial products in the first multiplier mode and in the second multiplier mode.
    Type: Application
    Filed: January 10, 2008
    Publication date: August 14, 2008
    Applicant: Analog Devices, Inc.
    Inventors: Andreas D. Olofsson, Baruch Yanovitch
  • Publication number: 20080189347
    Abstract: According to some embodiments, a dual multiply-accumulate operation optimized for even and odd multisample calculations is disclosed.
    Type: Application
    Filed: March 4, 2008
    Publication date: August 7, 2008
    Inventors: Bradley C. Aldrich, Nigel C. Paver, William T. Maghielse
  • Publication number: 20080155382
    Abstract: Various methods and systems for implementing Reed Solomon multiplication sections from exclusive-OR (XOR) logic are disclosed. For example, a system includes a Reed Solomon multiplication section, which includes XOR-based logic. The XOR-based logic includes an input, an output, and one or more XOR gates. A symbol X is received at the input of the XOR-based logic. The one or more XOR gates are coupled to generate a product of a power of α and X at the output, wherein α is a root of a primitive polynomial of a Reed Solomon code. Such a Reed Solomon multiplication section, which can include one or more multipliers implemented using XOR-based logic, can be included in a Reed Solomon encoder or decoder.
    Type: Application
    Filed: March 11, 2008
    Publication date: June 26, 2008
    Inventors: Qiujie Dong, Andrew J. Thurston
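    Illustrative sketch (not taken from the application): multiplying a field symbol by the root of a primitive polynomial reduces to a shift and a conditional XOR, so every output bit is an XOR of input bits. The field GF(2^8) and the polynomial x^8+x^4+x^3+x^2+1 (0x11D) are assumptions chosen for illustration, as are the function names.

```python
PRIMITIVE_POLY = 0x11D  # assumed: x^8 + x^4 + x^3 + x^2 + 1 over GF(2)


def mul_by_root(x):
    """Multiply an 8-bit symbol by the polynomial's root using shift and XOR only."""
    x <<= 1
    if x & 0x100:            # degree-8 term appeared: reduce modulo the polynomial
        x ^= PRIMITIVE_POLY
    return x


def mul_by_root_power(x, k):
    """Multiply a symbol by the k-th power of the root by repeated single steps."""
    for _ in range(k):
        x = mul_by_root(x)
    return x


print(hex(mul_by_root(0x80)))            # the top bit wraps around through the polynomial
print(hex(mul_by_root_power(0x01, 8)))   # the root raised to the 8th power
```

    In hardware the loop would typically be flattened into a fixed XOR network per power, since each step is linear over GF(2).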
  • Publication number: 20080140753
    Abstract: An electronically implemented method includes multiplying a number A, and a number B, where A is composed of segments a_i and B is composed of segments b_j where i and j are integers greater than 1. The multiplying includes determining partial product values for at least some of a_i b_j and determining a sum of partial product values for a_i b_j and a_j b_i where a_i=b_j and b_j=a_i for respective values of i and j, by multiplying one of (1) a_i b_j and (2) a_j b_i by two. A sum is determined and stored in a memory storage element of the determined partial product values and the determined sum of partial product values for a_i b_j and a_j b_i.
    Type: Application
    Filed: December 8, 2006
    Publication date: June 12, 2008
    Inventors: Vinodh Gopal, Gilbert M. Wolrich, Wajdi Feghali, Robert P. Ottavi
  • Publication number: 20080140752
    Abstract: System and method for processing symbols in a communication system are disclosed and may include in a processor that receives symbols to be coded for transmission over a wireless medium, grouping elements of an input matrix across a second dimension of the input matrix to form groups of matrix elements while multiplying the input matrix and an input vector. The input vector may include the symbols to be coded for transmission over the wireless medium. The method may also include pre-computing possible permutations of partial results for each of the groups of matrix elements, and assigning the partial results from each of the groups of matrix elements to each of a corresponding index of a first dimension of the input matrix to form a matrix of assigned partial results.
    Type: Application
    Filed: January 21, 2008
    Publication date: June 12, 2008
    Inventor: Yung-hsiang Lee
  • Publication number: 20080114826
    Abstract: The present invention is generally related to the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve performing a plurality of dot product operations to generate operands for a new vector. The dot product operations may require the issue of a plurality of permute instructions to arrange the vector operands in desired locations of a target register. Embodiments of the invention provide a dot product instruction wherein a mask field may be used to specify a particular location of a target register in which to transfer data, thereby avoiding the need for permute instructions for arranging data, reducing dependencies between instructions, and the usage of temporary registers.
    Type: Application
    Filed: October 31, 2006
    Publication date: May 15, 2008
    Inventors: Eric Oliver Mejdrich, Adam James Muff
  • Patent number: 7353244
    Abstract: According to some embodiments, a dual multiply-accumulate operation optimized for even and odd multisample calculations is disclosed.
    Type: Grant
    Filed: April 16, 2004
    Date of Patent: April 1, 2008
    Assignee: Marvell International Ltd.
    Inventors: Bradley C. Aldrich, Nigel C. Paver, William T. Maghielse
  • Publication number: 20070185953
    Abstract: Included are embodiments of a Multiply-Accumulate Unit to process multiple format floating point operands. For short format operands, embodiments of the Multiply Accumulate Unit are configured to process data with twice the throughput as long and mixed format data. At least one embodiment can include a short exponent calculation component configured to receive short format data, a long exponent calculation component configured to receive long format data, and a mixed exponent calculation component configured to receive short exponent data, the mixed exponent calculation component further configured to receive long format data. Embodiments also include a mantissa datapath configured for implementation to accommodate processing of long, mixed, and short floating point operands.
    Type: Application
    Filed: February 6, 2007
    Publication date: August 9, 2007
    Inventors: Boris Prokopenko, Timour Paltashev, Derek Gladding
  • Patent number: 7231510
    Abstract: A mechanism for, and method of, processing multiply-accumulate instructions with out-of-order completion in a pipeline, for use in a processor having an at least four-wide instruction issue architecture, and a digital signal processor (DSP) incorporating the mechanism or the method. In one embodiment, the mechanism includes: (1) a multiply-accumulate unit (MAC) having an initial multiply stage and a subsequent accumulate stage and (2) out-of-order completion logic, associated with the MAC, that causes interim results produced by the multiply stage to be stored when the accumulate stage is unavailable and allows younger instructions to complete before the multiply-accumulate instructions.
    Type: Grant
    Filed: November 13, 2001
    Date of Patent: June 12, 2007
    Assignee: VeriSilicon Holdings (Cayman Islands) Co. Ltd.
    Inventors: Hung T. Nguyen, Shannon A. Wichman
  • Patent number: 7225323
    Abstract: A multipurpose functional unit is configurable to support a number of operations including multiply-add and comparison testing operations, as well as other integer and/or floating-point arithmetic operations, Boolean operations, and format conversion operations.
    Type: Grant
    Filed: November 10, 2004
    Date of Patent: May 29, 2007
    Assignee: NVIDIA Corporation
    Inventors: Ming Y. Siu, Stuart F. Oberman
  • Patent number: 7216217
    Abstract: A programmable processor that comprises a general purpose processor architecture, capable of operation independent of another host processor, having a virtual memory addressing unit, an instruction path and a data path; an external interface; a cache operable to retain data communicated between the external interface and the data path; at least one register file configurable to receive and store data from the data path and to communicate the stored data to the data path; and a multi-precision execution unit coupled to the data path. The multi-precision execution unit is configurable to dynamically partition data received from the data path to account for an elemental width of the data and is capable of performing group floating-point operations on multiple operands in partitioned fields of operand registers and returning catenated results. In other embodiments the multi-precision execution unit is additionally configurable to execute group integer and/or group data handling operations.
    Type: Grant
    Filed: August 25, 2003
    Date of Patent: May 8, 2007
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris
  • Patent number: 7206927
    Abstract: A method of executing an instruction stream in a pipelined execution unit of depth, p, comprises loading the instruction stream; detecting an iteration of an instruction in the loaded instruction stream; interleaving p streams of instances of the instruction in the pipeline; detecting an end of the iteration; and combining results obtained from the p streams after all programmed iterations have completed. A computational circuit comprises a register which can hold a value representing both an operand and result of an iterative operation; a multiplexer having a first input connected to receive the operand from the register, a second input connected to a source of an identity value for the iterative operation, and an output; and an operator circuit having an input connected to receive a value from the multiplexer output, and an output connected to return the result to the register.
    Type: Grant
    Filed: November 19, 2002
    Date of Patent: April 17, 2007
    Assignee: Analog Devices, Inc.
    Inventor: Abhijit Giri
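    Illustrative sketch (not taken from the patent): interleaving an iterated accumulation across p independent streams can be modeled in software with p partial accumulators that are combined once the iteration ends. The function name is hypothetical, addition is assumed as the iterative operation, and 0 serves as its identity value.

```python
def interleaved_sum(values, p):
    """Accumulate values in p interleaved streams, then combine the streams."""
    if p < 1:
        raise ValueError("pipeline depth p must be at least 1")

    identity = 0                      # identity value for addition
    partial = [identity] * p          # one running result per stream

    # Instance i of the iterated instruction goes to stream i mod p, so
    # consecutive instances never depend on each other's result.
    for i, value in enumerate(values):
        partial[i % p] += value

    # Combine the p stream results after all iterations have completed.
    result = identity
    for stream_total in partial:
        result += stream_total
    return result


print(interleaved_sum(range(1, 11), 3))
```

    For integer addition the combined result matches a plain sequential sum; for floating-point data the different association order can change the rounding.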
  • Patent number: 7181484
    Abstract: A multiply unit includes an extended precision accumulator. Microprocessor instructions are provided for manipulating portions of the extended precision accumulator including an instruction to move the contents of a portion of the extended accumulator to a general-purpose register (“MFLHXU”) and an instruction to move the contents of a general-purpose register to a portion of the extended accumulator (“MTLHX”).
    Type: Grant
    Filed: February 21, 2001
    Date of Patent: February 20, 2007
    Assignee: MIPS Technologies, Inc.
    Inventors: Morten Stribaek, Pascal Paillier
  • Patent number: 7142010
    Abstract: In a programmable logic device having dedicated multiplier circuitry, some of the scan chain registers normally used for testing the device are located adjacent input registers of the multipliers. Those scan chain registers are ANDed with the input registers, and can be loaded with templates of ones and zeroes. This allows, e.g., subset multiplication if the least significant bits are loaded with zeroes and the remaining bits are loaded with ones. The multipliers preferably are arranged in blocks with other components, such as adders, that allow them to be configured as finite impulse response (FIR) filters. In such configurations, the scan chain registers can be used to load filter coefficients, avoiding the use of scarce logic and routing resources of the device.
    Type: Grant
    Filed: December 19, 2003
    Date of Patent: November 28, 2006
    Assignee: Altera Corporation
    Inventors: Martin Langhammer, Chiao Kai Hwang, Gregory Starr
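    The template masking described in the abstract above can be modeled with plain integer operations. This is a rough sketch under stated assumptions: operands are unsigned, the 18-bit default width is an arbitrary illustrative choice, and the helper names masked_multiply and upper_bits_mask are hypothetical.

```python
def upper_bits_mask(width, zero_lsbs):
    """Template with the `zero_lsbs` least significant bits zero, the rest one."""
    return ((1 << width) - 1) & ~((1 << zero_lsbs) - 1)


def masked_multiply(a, b, mask_a, mask_b, width=18):
    """Multiply two unsigned operands after ANDing each with a template mask.

    `width` is an assumed operand width; it only bounds the operands here.
    """
    full = (1 << width) - 1
    a_eff = a & mask_a & full
    b_eff = b & mask_b & full
    return a_eff * b_eff


# Keep only the upper four bits of each 8-bit operand before multiplying.
mask = upper_bits_mask(8, 4)
result = masked_multiply(0xAB, 0xCD, mask, mask, width=8)
```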
  • Patent number: 7127482
    Abstract: An algorithm and hardware structure are described for numerical operations on signals that is reconfigurable to operate in a downsampling or non-downsampling mode. According to one embodiment, a plurality of adders and multipliers are reconfigurable via a switching fabric to operate as a plurality of MAAC (multiply-add-accumulator) kernels (described in detail below) when operating in a non-downsampling mode, and as a plurality of MAAC kernels and AMAAC (add-multiply-add-accumulator) kernels (described in detail below) when operating in a downsampling mode.
    Type: Grant
    Filed: November 19, 2001
    Date of Patent: October 24, 2006
    Assignee: Intel Corporation
    Inventors: Yan Hou, Hong Jiang, Kam Leung
  • Patent number: 7111166
    Abstract: An extension of the serial/parallel Montgomery modular multiplication method with simultaneous reduction as previously implemented by the applicants, adapted innovatively to perform both in the prime number and in the GF(2^q) polynomial based number field, in such a way as to simplify the flow of operands, by performing a multiple anticipatory function to enhance the previous modular multiplication procedures.
    Type: Grant
    Filed: May 14, 2001
    Date of Patent: September 19, 2006
    Assignee: Fortress U&T Div. M-Systems Flash Disk Pioneers Ltd.
    Inventors: Itai Dror, Carmi David Gressel, Michael Mostovoy, Alexey Molchanov
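    For readers unfamiliar with Montgomery multiplication, the textbook integer form is sketched below. It is not the serial/parallel variant claimed in the patent and does not cover the GF(2^q) polynomial case; the function names are hypothetical, R is assumed to be a power of two larger than an odd modulus n, and pow(n, -1, r) requires Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Return (R, n_prime) for an odd modulus n with R = 2**k > n."""
    if n % 2 == 0 or (1 << k) <= n:
        raise ValueError("need odd n and 2**k > n")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime == -1 (mod R)
    return r, n_prime


def montgomery_multiply(a_bar, b_bar, n, k, n_prime):
    """Return a_bar * b_bar * R^-1 mod n, with R = 2**k and inputs below n."""
    mask = (1 << k) - 1
    t = a_bar * b_bar
    m = ((t & mask) * n_prime) & mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k               # exact division by R via a shift
    return u - n if u >= n else u      # single conditional subtraction


def modmul(a, b, n, k):
    """Compute a*b mod n by converting to and from Montgomery form."""
    r, n_prime = montgomery_setup(n, k)
    a_bar = (a * r) % n
    b_bar = (b * r) % n
    c_bar = montgomery_multiply(a_bar, b_bar, n, k, n_prime)
    return montgomery_multiply(c_bar, 1, n, k, n_prime)  # leave Montgomery form
```

    The reduction step replaces division by n with a multiplication, a mask and a shift, which is why the method suits hardware multipliers.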
  • Patent number: 7107305
    Abstract: A tightly coupled dual 16-bit multiply-accumulate (MAC) unit for performing single-instruction/multiple-data (SIMD) operations may forward an intermediate result to another operation in a pipeline to resolve an accumulating dependency penalty. The MAC unit may also be used to perform 32-bit×32-bit operations.
    Type: Grant
    Filed: October 5, 2001
    Date of Patent: September 12, 2006
    Assignee: Intel Corporation
    Inventors: Deli Deng, Anthony Jebson, Yuyun Liao, Nigel C. Paver, Steve J. Strazdus
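    One common way to obtain a 32-bit×32-bit product from 16-bit multipliers is to sum four shifted 16×16 partial products. The sketch below shows that decomposition alongside a dual 16-bit accumulate step. It assumes unsigned operands and is not taken from the patent; the names mul32_from_16 and dual_mac16 are hypothetical.

```python
MASK16 = 0xFFFF


def dual_mac16(acc, a, b):
    """Accumulate two unsigned 16x16 products taken from packed 32-bit words."""
    lo = (a & MASK16) * (b & MASK16)
    hi = ((a >> 16) & MASK16) * ((b >> 16) & MASK16)
    return acc + lo + hi


def mul32_from_16(a, b):
    """Unsigned 32x32 -> 64-bit product built from four 16x16 partial products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    low = a_lo * b_lo
    cross = a_lo * b_hi + a_hi * b_lo  # both cross terms carry weight 2**16
    high = a_hi * b_hi
    return low + (cross << 16) + (high << 32)
```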
  • Patent number: 7080113
    Abstract: A virtually parallel multiplier-accumulator (VMAC) that can execute more than or less than one MAC operation in a single system clock cycle. The inventive VMAC advantageously employs a resource/time-sharing methodology with multiple sequential computational stages.
    Type: Grant
    Filed: July 17, 2003
    Date of Patent: July 18, 2006
    Assignee: Agere Systems Inc.
    Inventors: Hyun Lee, Shaun P. Whalen
  • Patent number: 7043517
    Abstract: A multiply accumulator performs a multiplication-and-addition operation for a first multiplier with N bits, a second multiplier with N bits, and an addend with M bits, wherein M is larger than 2N. The multiply accumulator includes a modified Booth encoder and a multiplication-and-addition unit. The modified Booth encoder performs a Booth encoding to either the first multiplier or its bit inversion by supplementing a multiplier sign bit behind a least significant bit of either the first multiplier or its bit inversion. The multiplication-and-addition unit includes a carry save adder tree and a sign extension adder and achieves a high speed of the multiplication-and-addition operation by simultaneously performing the multiplication and addition.
    Type: Grant
    Filed: March 7, 2003
    Date of Patent: May 9, 2006
    Assignee: Faraday Technology Corp.
    Inventor: Chi-jui Chung
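    As background for the abstract above, the standard radix-4 (modified) Booth recoding is sketched here. Note the assumption: this textbook version appends a zero behind the least significant bit, whereas the abstract describes supplementing a multiplier sign bit in that position, so the sketch illustrates the general technique rather than the patented encoder. Function names are hypothetical.

```python
def booth_radix4_digits(y, n_bits):
    """Recode an n_bits two's-complement multiplier into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first, such that
    sum(d * 4**i) equals the signed value of y.
    """
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    y &= (1 << n_bits) - 1
    # bits[0] is the zero appended behind the least significant bit.
    bits = [0] + [(y >> i) & 1 for i in range(n_bits)]
    digits = []
    for i in range(0, n_bits, 2):
        y_prev, y_cur, y_next = bits[i], bits[i + 1], bits[i + 2]
        digits.append(y_prev + y_cur - 2 * y_next)
    return digits


def booth_multiply_add(x, y, z, n_bits):
    """Compute x*y + z by summing Booth partial products together with z."""
    digits = booth_radix4_digits(y, n_bits)
    return z + sum(d * x * 4 ** i for i, d in enumerate(digits))
```

    Recoding halves the number of partial products, and treating the addend z as one more row of the same summation is what lets the multiplication and the addition proceed together.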
  • Patent number: 7043518
    Abstract: A multiply accumulate unit (“MAC”) that performs operations on packed integer data. In one embodiment, the MAC receives two 32-bit data words which, depending on the specified mode of operation, each contain either four 8-bit operands, two 16-bit operands, or one 32-bit operand. Depending on the mode of operation, the MAC performs either sixteen 8×8 operations, four 16×16 operations, or one 32×32 operation. Results may be individually retrieved from registers and the corresponding accumulator cleared after the read cycle. In addition, the accumulators may be globally initialized. Two results from the 8×8 operations may be packed into a single 32-bit register. The MAC may also shift and saturate the products as required.
    Type: Grant
    Filed: February 9, 2004
    Date of Patent: May 9, 2006
    Assignee: Cradle Technologies, Inc.
    Inventors: Moshe B. Simon, Erik P. Machnicki, David A. Harrison, Rakesh K. Singh
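    The operation counts in the abstract above (sixteen, four, or one) match forming every pairwise product between the lanes of the two input words. The sketch below models that reading. It is an assumption, as are the unsigned lanes and the names unpack, packed_products and packed_mac; shifting and saturation are omitted.

```python
def unpack(word, lane_bits):
    """Split a 32-bit word into unsigned lanes, least significant lane first."""
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, 32, lane_bits)]


def packed_products(a_word, b_word, lane_bits):
    """All pairwise lane products: 16 for 8-bit lanes, 4 for 16-bit, 1 for 32-bit."""
    a_lanes = unpack(a_word, lane_bits)
    b_lanes = unpack(b_word, lane_bits)
    return [x * y for x in a_lanes for y in b_lanes]


def packed_mac(accumulators, a_word, b_word, lane_bits):
    """Add each pairwise product to its own accumulator and return the new list."""
    products = packed_products(a_word, b_word, lane_bits)
    if len(accumulators) != len(products):
        raise ValueError("one accumulator per product is required")
    return [acc + p for acc, p in zip(accumulators, products)]
```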
  • Patent number: 7043519
    Abstract: In an SIMD sum of product arithmetic method of enabling a concurrent execution of 2n (where n is a natural number) parallel sum of product arithmetic (operations), the SIMD sum of product arithmetic is executed using 2^m (m=0, . . . , log2 n) accumulators as one set, and by replacing a (2p-1)th accumulator with an adjacent 2pth (p=1, . . . , n/2) accumulator, without changing a sequence of accumulator addresses, in the set, as accumulator addresses to be allocated to sum of product arithmetic circuits for the SIMD sum of product arithmetic.
    Type: Grant
    Filed: September 5, 2001
    Date of Patent: May 9, 2006
    Assignee: Fujitsu Limited
    Inventor: Masayuki Tsuji
  • Patent number: 7035890
    Abstract: An apparatus for multiplying and accumulating numeric quantities, including a multiplier for receiving the numeric quantities, with the multiplier having a sum output and a carry output. A first shift register has an input coupled to the sum output of the multiplier, and a second shift register has an input coupled to the carry output of the multiplier. An adder and third shift register are used to complete processing of the apparatus' arithmetic operations.
    Type: Grant
    Filed: March 1, 2001
    Date of Patent: April 25, 2006
    Assignee: 8x8, Inc.
    Inventors: Jan Fandrianto, Chi Shin Wang, Sehat Sutardja, Hedley K. J. Rainnie, Bryan R. Martin
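    A multiplier with separate sum and carry outputs typically keeps its running total in carry-save form and resolves it with one final addition. The sketch below shows that general technique with unsigned integers. It is not a model of the patent's shift-register arrangement, and the function names and the 16-bit default width are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three addends to a (sum, carry) pair.

    No carry propagates between bit positions; sum + carry == a + b + c.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def multiply_accumulate_cs(x, y, acc, width=16):
    """Unsigned x*y + acc, keeping the running total in carry-save form."""
    s, c = acc, 0
    for i in range(width):
        if (y >> i) & 1:
            # Fold in the shifted partial product without propagating carries.
            s, c = carry_save_add(s, c, x << i)
    return s + c  # one carry-propagate addition completes the result
```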
  • Patent number: 7027598
    Abstract: A pre-computation and dual-pass modular operation approach to implement encryption protocols efficiently in electronic integrated circuits is disclosed. An encrypted electronic message is received and another electronic message generated based on the encryption protocol. Two passes of Montgomery's method are used for a modular operation that is associated with the encryption protocol along with pre-computation of a constant based on a modulus. The modular operation may be a modular multiplication or a modular exponentiation. Modular arithmetic may be performed using the residue number system (RNS) and two RNS bases with conversions between the two RNS bases. A minimal number of register files are used for the computations along with an array of multiplier circuits and an array of modular reduction circuits. The approach described allows for high throughput for large encryption keys with a relatively small number of logical gates.
    Type: Grant
    Filed: September 19, 2001
    Date of Patent: April 11, 2006
    Assignee: Cisco Technology, Inc.
    Inventors: Mihailo M. Stojancic, Mahesh S. Maddury, Kenneth J. Tomei
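    The residue number system (RNS) mentioned in the abstract above represents an integer by its remainders modulo several pairwise coprime moduli, so multiplication splits into small independent channels. The following sketch shows only that basic property with Chinese remainder reconstruction; it does not implement the dual-pass Montgomery method or the base conversions of the patent, and the moduli and function names are illustrative assumptions. It requires Python 3.8 or later for math.prod and pow with a negative exponent.

```python
from math import prod


def to_rns(x, moduli):
    """Represent x by its residues modulo each modulus."""
    return [x % m for m in moduli]


def rns_mul(a, b, moduli):
    """Multiply channel by channel; no carries cross between channels."""
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]


def from_rns(residues, moduli):
    """Chinese remainder reconstruction; moduli must be pairwise coprime."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # pow(..., -1, m) is the inverse mod m
    return total % big_m


moduli = [7, 11, 13, 15]  # pairwise coprime, dynamic range 15015
product = from_rns(rns_mul(to_rns(97, moduli), to_rns(101, moduli), moduli), moduli)
```

    The result is exact only while the true product stays below the product of the moduli, which is one reason a second base is used when larger intermediate values arise.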
  • Patent number: 7027597
    Abstract: A pre-computation and dual-pass modular operation approach to implement encryption protocols efficiently in electronic integrated circuits is disclosed. An encrypted electronic message is received and another electronic message generated based on the encryption protocol. Two passes of Montgomery's method are used for a modular operation that is associated with the encryption protocol along with pre-computation of a constant based on a modulus. The modular operation may be a modular multiplication or a modular exponentiation. Modular arithmetic may be performed using the residue number system (RNS) and two RNS bases with conversions between the two RNS bases. A minimal number of register files are used for the computations along with an array of multiplier circuits and an array of modular reduction circuits. The approach described allows for high throughput for large encryption keys with a relatively small number of logical gates.
    Type: Grant
    Filed: September 18, 2001
    Date of Patent: April 11, 2006
    Assignee: Cisco Technology, Inc.
    Inventors: Mihailo M. Stojancic, Mahesh S. Maddury, Kenneth J. Tomei
  • Patent number: 7013321
    Abstract: According to the invention, a processing core that executes a parallel multiply accumulate operation is disclosed. Included in the processing core are first, second and third input operand registers; a number of functional blocks; and an output operand register. The first, second and third input operand registers respectively include a number of first input operands, a number of second input operands and a number of third input operands. Each of the number of functional blocks performs a multiply accumulate operation. The output operand register includes a number of output operands. Each of the number of output operands is related to one of the number of first input operands, one of the number of second input operands and one of the number of third input operands.
    Type: Grant
    Filed: November 21, 2001
    Date of Patent: March 14, 2006
    Assignee: Sun Microsystems, Inc.
    Inventor: Ashley Saulsbury
  • Patent number: 7010558
    Abstract: An apparatus and method for performing enhanced algorithmic processing, including reduced cycle-count fast Fourier transform (FFT) calculations. In one aspect, the invention comprises a user-configurable processor having an extension instruction adapted for reduced cycle-count algorithmic operations. In one exemplary embodiment, the processor is an extensible core, and the extension instruction comprises a 32-bit instruction word linked with existing circuitry in the processor core used for multiply-accumulate (mac) instructions. 16-bit, 24-bit, and dual 16-bit multiply options are available for the multiply/accumulate unit of the processor. The extension instruction is pipelined to the same number of stages as the mac instructions, thereby avoiding unnecessary stalls and increasing performance. A modified accumulator data path used in support of the foregoing instruction is also described.
    Type: Grant
    Filed: April 18, 2002
    Date of Patent: March 7, 2006
    Assignee: ARC International
    Inventor: Chris Morris
  • Patent number: 6988184
    Abstract: Methods of performing dyadic digital signal processing (DSP) instructions. In one embodiment of the invention, the method includes fetching a dyadic DSP instruction having a main operation and a sub operation; predecoding the dyadic DSP instruction to generate predecoded instruction signals; and decoding the predecoded instruction signals to generate select signals to selectively couple data from a first plurality of buses coupled to inputs of multiplexers of a first plurality of DSP functional blocks to execute the main operation of the dyadic DSP instruction in one processor cycle and to selectively couple data from a second plurality of buses coupled to inputs of multiplexers of a second plurality of DSP functional blocks to execute the sub operation of the dyadic DSP instruction in the one processor cycle.
    Type: Grant
    Filed: August 2, 2002
    Date of Patent: January 17, 2006
    Assignee: Intel Corporation
    Inventors: Kumar Ganapathy, Ruban Kanapathipillai
  • Patent number: 6976049
    Abstract: The present invention relates to a method and system for providing a single accumulatable packed multi-way addition instruction having the functionality of multiple instructions without causing any timing problems in the execute stage. Specifically, the accumulatable packed multi-way combination instruction may be associated with at least one destination and a plurality of operands and set a polarity of each of a plurality of source operands derived from the plurality of operands, if requested by the instruction. The instruction also may add selected pairs of the plurality of source operands in predetermined orders to obtain at least one result and, if requested by the instruction, accumulate the plurality of results to obtain at least one accumulated result; output at least one predetermined pair of the at least one result and the at least one accumulated result; and accumulate condition codes for each of the at least one result and the at least one accumulated result, if requested by the instruction.
    Type: Grant
    Filed: March 28, 2002
    Date of Patent: December 13, 2005
    Assignee: Intel Corporation
    Inventor: Gad Sheaffer