Multiplication Followed By Addition (i.e., X*y+z) Patents (Class 708/523)
-
Patent number: 7640285Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh2+F1(xb)*xh+F0(xb), where xh=x?xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.Type: GrantFiled: October 20, 2004Date of Patent: December 29, 2009Assignee: NVIDIA CorporationInventors: Stuart F. Oberman, Ming Y. Siu
-
Publication number: 20090313236Abstract: A documents database has a plurality of documents, including but not limited to text files, video clips and sound files. Each document is associated with at least one category of a plurality of categories in a categories database, and each category has at least one keyword. A search request having at least one search term is received from a user, and a categories database is searched for categories having a keyword corresponding to the user search term to identify first level categories. The other keywords from the identified first level categories are retrieved and the documents database is searched for documents having a user search term or a retrieved keyword. The identified documents are then ranked and presented to the user. Other search expansion techniques, and display techniques, are also discussed.Type: ApplicationFiled: June 13, 2008Publication date: December 17, 2009Applicant: NEWS DISTRIBUTION NETWORK, INC.Inventors: Paul Matthew Hernacki, Gregory Alton Peters
-
Publication number: 20090292754Abstract: Methods and systems for detecting underflow in a floating-point operation are disclosed. In accordance with an example disclosed method a plurality of comparator circuits and a plurality of logic devices coupled to the plurality of comparator circuits are operated to determine whether performing a floating-point operation using a floating-point hardware unit will generate an underflow condition. The operating of the plurality of comparator circuits and the logic devices involves inputting a multiply-add operation result value to at least some of the plurality of comparator circuits. In addition, a plurality of logic outputs are outputted via the plurality of logic devices. The plurality of logic outputs are indicative of comparison operations performed by at least some of the comparator circuits based on the multiply-add operation result value. An underflow indicator is outputted based on the plurality of logic outputs.Type: ApplicationFiled: July 31, 2009Publication date: November 26, 2009Inventor: Marius A. Cornea-Hasegan
-
Publication number: 20090292756Abstract: A processor to calculate a product-component having fewer digits than an entire product of a multiplication of a multiplicand and a multiplier. A memory holds at least one multiplicand-component having fewer digits than the multiplicand and at least one multiplier-component having fewer digits than the multiplier. A logic then calculates the product-component based on the multiplicand-components and the multiplier-components in the memory. Collectively, a plurality of the processors can calculate all of the product-components of the product.Type: ApplicationFiled: May 23, 2008Publication date: November 26, 2009Inventors: Gibson D. Elliot, Jay Randall Stoner
-
Patent number: 7624138Abstract: A method and apparatus for including in a processor instructions for performing integer transforms including multiply-add operations and horizontal-add operations on packed data. In one embodiment, a processor is coupled to a memory that stores a first packed byte data and a second packed byte data. The processor performs operations on said first packed byte data and said second packed byte data to generate a third packed data in response to receiving a multiply-add instruction. A plurality of the 16-bit data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed byte data. The processor adds together at least a first and a second 16-bit data element of the third packed data in response to receiving an horizontal-add instruction to generate a 16-bit result as one of a plurality of data elements of a fourth packed data.Type: GrantFiled: December 30, 2003Date of Patent: November 24, 2009Assignee: Intel CorporationInventors: Eric Debes, William W. Macy, Jonathan J. Tyler
-
Publication number: 20090287757Abstract: Modifying a leading zero estimation during an unfused multiply add operation of (A*B)+C. A plurality of terms x and y may be received, and each may be based on truncated terms s and t (e.g., in performing the unfused multiply add operation) and the shifted C term. A first leading zero estimation may be calculated based on the terms x and y. It may be determined if near total catastrophic cancellation has occurred. A carry in from a right most number of bits of the terms s and t and the most significant truncated bits of s and t may be used to generate a second leading zero estimation based on the first leading zero estimation if the near total catastrophic cancellation has occurred.Type: ApplicationFiled: May 15, 2008Publication date: November 19, 2009Inventor: Leonard D. Rarick
-
Publication number: 20090248769Abstract: A multiply and accumulate engine may implement a digital filter. In some embodiments, the number of coefficients that are stored may be equal to only half of the number of filter taps that are implemented. This may be done by doing multiplications operand by operand within two data registers in a first direction and then shifting directions so that the first operand in a first register is multiplied by the last operand in another register. In some embodiments, the multiply and accumulate engine may be implemented as a two cycle engine wherein in the first stage, multiply and accumulate operations are implemented and then stored into a register. In a second stage and a second cycle, the results stored in the register are further accumulated.Type: ApplicationFiled: March 26, 2008Publication date: October 1, 2009Inventor: Teck-Kuen Chua
-
Publication number: 20090248779Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.Type: ApplicationFiled: March 28, 2008Publication date: October 1, 2009Inventors: Jeffrey S. Brooks, Christopher H. Olson
-
Patent number: 7596472Abstract: The device determines the weighting coefficients to be applied to N digital source signals to form a composite signal. The first- to third-order moments of the composite signal must respectively present mean value, variance and skewness characteristics predefined by a user. The device introduces an additional variable, in the form of a weighting matrix W. The vector w being the vector of the weighting coefficients and wT the transpose of the vector w, the difference W?wwT is a positive semidefinite matrix. Moreover, the device performs linearization, around a vector wref of reference weighting coefficients, of the skewness constraint on the third-order moments using a matrix A = [ W w w T 1 ] as further intermediate variable.Type: GrantFiled: December 19, 2006Date of Patent: September 29, 2009Assignee: Prax ValueInventor: Francois Oustry
-
Patent number: 7567997Abstract: In one embodiment an IC is disclosed which includes a plurality of cascaded digital signal processing slices, wherein each slice has a multiplier coupled to an adder via a multiplexer and each slice has a direct connection to an adjoining slice; and means for configuring the plurality of digital signal processing slices to perform one or more mathematical operations, via, for example, opmodes. This IC allows for the implementation of some basic math functions, such as add, subtract, multiply and divide. Many other applications may be implemented using the one or more DSP slices, for example, accumulate, multiply accumulate (MACC), a wide multiplexer, barrel shifter, counter, and folded, decimating, and interpolating FIRs to name a few.Type: GrantFiled: December 21, 2004Date of Patent: July 28, 2009Assignee: XILINX, Inc.Inventors: James M. Simkins, Steven P. Young, Jennifer Wong, Bernard J. New, Alvin Y. Ching
-
Publication number: 20090177447Abstract: A method for estimating software development effort comprises the steps of: generating a database containing a plurality of source softwares; calculating the Grey relational coefficients between the software to be developed and a source software in the database for each feature they exhibit; calculating the weights for each Grey relational coefficient; multiplying each Grey relational coefficient with the corresponding weight; calculating the Grey relational grade by summing up the products produced in the multiplying step; calculating the Grey relational grades for all remaining source softwares in the database; and comparing the Grey relational grades to estimate the effort for developing the software to be developed.Type: ApplicationFiled: January 4, 2008Publication date: July 9, 2009Applicant: NATIONAL TSING HUA UNIVERSITYInventors: Chao Jung Hsu, Chin Yu Huang
-
Publication number: 20090150471Abstract: Provided are a reconfigurable arithmetic unit and a processor having the same. The reconfigurable arithmetic unit can perform an addition operation or a multiplication operation according to an instruction by sharing an adder. The reconfigurable arithmetic unit includes a booth encoder for encoding a multiplier, a partial product generator for generating a plurality of partial products using the encoded multiplier and a multiplicand, a Wallace tree circuit for compressing the partial products into a first partial product and a second partial product, a first Multiplexer (MUX) for selecting and outputting one of the first partial product and a first addition input according to a selection signal, a second MUX for selecting and outputting one of the second partial product and a second addition input according to the selection signal, and a Carry Propagation Adder (CPA) for adding an output of the first MUX and an output of the second MUX to output an operation result.Type: ApplicationFiled: June 10, 2008Publication date: June 11, 2009Inventors: Yil Suk YANG, Jung Hee SUK, Chun Gi LYUH, Tae Moon ROH, Jong Dae KIM
-
Patent number: 7516307Abstract: A method and apparatus is disclosed that computes multiple absolute differences from packed data and sums each one of the multiple absolute differences together to produce a result. According to one embodiment, a processor includes a decode unit to decode a packed sum of absolute differences (PSAD) instruction having an opcode format to identify a set of packed data operands. The decode unit initiates a sequence of operations on the set of packed data operands in response to decoding the PSAD instruction. An execution unit performs a first operation of the sequence of operations initiated by the decode logic, and a bus provides the execution unit with the set of packed data operands as identified in accordance with the opcode format.Type: GrantFiled: November 6, 2001Date of Patent: April 7, 2009Assignee: Intel CorporationInventors: Mohammad A. Abdallah, Vladimir Pentkovski
-
Publication number: 20090077154Abstract: Provided is a microprocessor including a complex-MAC unit that operates in response to a complex-MAC instruction. The complex-MAC unit receives first and second complex data (each having 2m-bit length) from a first register having a register length of at least 2m+1 bits, and also receives third and fourth complex data (each having 2m-bit length) from a second register having a register length of at least 2m+1 bits, to calculate a sum of real parts or imaginary parts of a complex product of the first and third complex data and a complex product of the second and fourth complex data. The complex-MAC unit adds the obtained sum of the real parts or imaginary parts to a stored value of the third register, and overwrites the third register with the cumulative total value. The third register has a register length of at least 2m+2 bits.Type: ApplicationFiled: September 10, 2008Publication date: March 19, 2009Applicant: NEC ELECTRONICS CORPORATIONInventors: Hideki Matsuyama, Masayuki Daitou
-
Publication number: 20090070399Abstract: An arithmetic processing system processes a sensing signal and a first approximate offset signal to obtain a second approximate offset signal. The system includes a first arithmetic processor and a second arithmetic processor. The first arithmetic processor receives and processes the sensing signal and the first approximate offset signal to output a first arithmetic signal. The second arithmetic processor processes the first arithmetic signal to output a second arithmetic signal, and the second arithmetic signal is added with a predetermined offset signal to obtain the second approximate offset signal, and the second approximate offset signal is closer to a real offset signal of the sensing signal than the first approximate offset signal. A method of arithmetic processing is also disclosed.Type: ApplicationFiled: November 6, 2007Publication date: March 12, 2009Applicant: ASIA OPTICAL CO., INC.Inventors: Kun-Chi Liao, Yu-Ting Lee
-
Publication number: 20090030963Abstract: The conventional two's complement multiplier which is constituted by a Booth encoder, a partial production generation circuit, and an adder has a problem that the circuit scale would be increased because a bit extension is performed when the multiplier is adapted to an unsigned multiplication. A multiplication circuit of the present invention is provided with a first Booth encoder (1) for encoding lower-order several bits of a multiplier according to first rules of encoding using a Booth algorithm, and a second Booth encoder (5) for encoding most-significant several bits of the multiplier according to second rules of encoding using a Booth algorithm, which are different from the first rules of encoding, and thereby the most-significant several bits of the multiplier are encoded using the Booth algorithm which is different from that for the lower-order several bits.Type: ApplicationFiled: February 8, 2007Publication date: January 29, 2009Inventor: Kouichi Nagano
-
Patent number: 7480690Abstract: Described are arithmetic circuits divided logically into a product generator and an adder. Multiplexing circuitry logically disposed between the product generator and the adder supports conventional functionality by providing partial products from the product generator to addend terminals of the adder. The multiplexing circuitry can also be controlled to direct a number of external added inputs to the adder. The additional addend inputs can include inputs and outputs cascaded from other arithmetic circuits.Type: GrantFiled: December 21, 2004Date of Patent: January 20, 2009Assignee: XILINX, Inc.Inventors: James M. Simkins, Steven P. Young, Jennifer Wong, Bernard J. New, Alvin Y. Ching
-
Patent number: 7472155Abstract: Described is a programmable logic device (PLD) with columns of DSP slices that can be cascaded to create DSP circuits of varying size and complexity. Each DSP slice includes a plurality of operand input ports and a slice output port, all of which are programmably connected to general routing and logic resources. The operand ports receive operands for processing, and a slice output port conveys processed results. Each slice additionally includes a feedback port connected to the respective slice output port, to support accumulate functions in this embodiment, and a cascade input port connected to the output port of an upstream slice to support cascading.Type: GrantFiled: December 21, 2004Date of Patent: December 30, 2008Assignee: Xilinx, Inc.Inventors: James M. Simkins, Steven P. Young, Jennifer Wong, Bernard J. New, Alvin Y. Ching
-
Patent number: 7467175Abstract: Described is a programmable logic device (PLD) with columns of DSP slices that can be combined to create DSP circuits of varying size and complexity. DSP slices in accordance with some embodiments includes programmable operand input registers that can be configured to introduce different amounts of delay, from zero to two clock cycles, for example, to support pipelining. In one such embodiment, each DSP slice includes a partial-product generator having a multiplier port, a multiplicand port, and a product port. The multiplier and multiplicand ports connect to the operand input port via respective first and second operand input registers, each of which is capable of introducing from zero to two clock cycles of delay. In another embodiment, the output of at least one operand input register can connect to the input of an operand input register of a downstream DSP slice so that operands can be transferred among one or more slices.Type: GrantFiled: December 21, 2004Date of Patent: December 16, 2008Assignee: XILINX, Inc.Inventors: James M. Simkins, Steven P. Young, Jennifer Wong, Bernard J. New, Alvin Y. Ching
-
Publication number: 20080256162Abstract: An x87 fused multiply-add (FMA) instruction in the instruction set of an x86 architecture microprocessor is disclosed. The FMA instruction implicitly specifies the two factor operands as the top two operands of the x87 FPU register stack and explicitly specifies the third addend operand as a third x87 FPU register stack register. The microprocessor multiplies the first two operands and adds the product to the third operand to generate a result. The result is stored into the third register and the first two operands are popped off the stack. In an alternate embodiment, the third operand is also implicitly specified as being stored in the register that is two registers below the top of stack register; the result is also stored therein. The instruction opcode value is in the x87 opcode range.Type: ApplicationFiled: July 23, 2007Publication date: October 16, 2008Applicant: VIA Technologies, Inc.Inventors: G. Glenn Henry, Timothy A. Elliott, Terry Parks
-
Patent number: 7437401Abstract: A programmable logic device is provided that includes a MAC block having mode splitting capabilities. Different modes of operation may be implemented simultaneously whereby the multipliers and other DSP circuitry of the MAC block may be allocated among the different modes of operation. For example, one multiplier may be used to implement a multiply mode while another two multipliers may be used to implement a sum of two multipliers mode.Type: GrantFiled: February 20, 2004Date of Patent: October 14, 2008Assignee: Altera CorporationInventors: Leon Zheng, Martin Langhammer, Steven Perry, Paul Metzgen, Gregory Starr, William Hwang, Kumara Tharmalingam
-
Patent number: 7428566Abstract: A multipurpose functional unit is configurable to support a number of operations including multiply-add and format conversion operations, as well as other integer and/or floating-point arithmetic operations, Boolean operations, and logical test operations.Type: GrantFiled: November 10, 2004Date of Patent: September 23, 2008Assignee: Nvidia CorporationInventors: Ming Y. Siu, Stuart F. Oberman
-
Publication number: 20080195685Abstract: Multiplication engines and multiplication methods are provided. A multiplication engine for a digital processor includes a first multiplier to generate unequally weighted partial products from input operands in a first multiplier mode; a second multiplier to generate equally weighted partial products from input operands in a second multiplier mode; a multiplexer to select the unequally weighted partial products in the first multiplier mode and to select the equally weighted partial products in the second multiplier mode; and a carry save adder array configured to combine the selected partial products in the first multiplier mode and in the second multiplier mode.Type: ApplicationFiled: January 10, 2008Publication date: August 14, 2008Applicant: Analog Devices, Inc.Inventors: Andreas D. Olofsson, Baruch Yanovitch
-
Publication number: 20080189347Abstract: According to some embodiments, a dual multiply-accumulate operation optimized for even and odd multisample calculations is disclosed.Type: ApplicationFiled: March 4, 2008Publication date: August 7, 2008Inventors: Bradley C. Aldrich, Nigel C. Paver, William T. Maghielse
-
Publication number: 20080155382Abstract: Various methods and systems for implementing Reed Solomon multiplication sections from exclusive-OR (XOR) logic are disclosed. For example, a system includes a Reed Solomon multiplication section, which includes XOR-based logic. The XOR-based logic includes an input, an output, and one or more XOR gates. A symbol X is received at the input of the XOR-based logic. The one or more XOR gates are coupled to generate a product of a power of ? and X at the output, wherein ? is a root of a primitive polynomial of a Reed Solomon code. Such a Reed Solomon multiplication section, which can include one or more multipliers implemented using XOR-based logic, can be included in a Reed Solomon encoder or decoder.Type: ApplicationFiled: March 11, 2008Publication date: June 26, 2008Inventors: Qiujie Dong, Andrew J. Thurston
-
Publication number: 20080140753Abstract: An electronically implemented method includes multiplying a number A, and a number B, where A is composed of segments ai and B is composed of segments bj where i and j are integers greater than 1. The multiplying includes determining partial product values for at least some of aibj and determining a sum of partial product values for aibj and ajbi where ai=bj and bj=ai for respective values of i and j, by multiplying one of (1) aibj and (2) ajbi by two. A sum is determined and stored in a memory storage element of the determined partial product values and the determined sum of partial product values for aibj and ajbi.Type: ApplicationFiled: December 8, 2006Publication date: June 12, 2008Inventors: Vinodh Gopal, Gilbert M. Wolrich, Wajdi Feghali, Robert P. Ottavi
-
Publication number: 20080140752Abstract: System and method for processing symbols in a communication system are disclosed and may include in a processor that receives symbols to be coded for transmission over a wireless medium, grouping elements of an input matrix across a second dimension of the input matrix to form groups of matrix elements while multiplying the input matrix and an input vector. The input vector may include the symbols to be coded for transmission over the wireless medium. The method may also include pre-computing possible permutations of partial results for each of the groups of matrix elements, and assigning the partial results from each of the groups of matrix elements to each of a corresponding index of a first dimension of the input matrix to form a matrix of assigned partial results.Type: ApplicationFiled: January 21, 2008Publication date: June 12, 2008Inventor: Yung-hsiang Lee
-
Publication number: 20080114826Abstract: The present invention is generally related to the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve performing a plurality of dot product operations to generate operands for generating operands for a new vector. The dot product operations may require the issue of a plurality of permute instructions to arrange the vector operands in desired locations of a target register. Embodiments of the invention provide a dot product instruction wherein a mask field may be used to specify a particular location of a target register in which to transfer data, thereby avoiding the need for permute instructions for arranging data, reducing dependencies between instructions, and the usage of temporary registers.Type: ApplicationFiled: October 31, 2006Publication date: May 15, 2008Inventors: Eric Oliver Mejdrich, Adam James Muff
-
Patent number: 7353244Abstract: According to some embodiments, a dual multiply-accumulate operation optimized for even and odd multisample calculations is disclosed.Type: GrantFiled: April 16, 2004Date of Patent: April 1, 2008Assignee: Marvell International Ltd.Inventors: Bradley C. Aldrich, Nigel C. Paver, William T. Maghielse
-
Publication number: 20070185953Abstract: Included are embodiments of a Multiply-Accumulate Unit to process multiple format floating point operands. For short format operands, embodiments of the Multiply Accumulate Unit are configured to process data with twice the throughput as long and mixed format data. At least one embodiment can include a short exponent calculation component configured to receive short format data, a long exponent calculation component configured to receive long format data, and a mixed exponent calculation component configured to receive short exponent data, the mixed exponent calculation component further configured to received long format data. Embodiments also include a mantissa datapath configured for implementation to accommodate processing of long, mixed, and short floating point operands.Type: ApplicationFiled: February 6, 2007Publication date: August 9, 2007Inventors: Boris Prokopenko, Timour Paltashev, Derek Gladding
-
Patent number: 7231510Abstract: A mechanism for, and method of, processing multiply-accumulate instructions with out-of-order completion in a pipeline, for use in a processor having an at least four-wide instruction issue architecture, and a digital signal processor (DSP) incorporating the mechanism or the method. In one embodiment, the mechanism including: (1) a multiply-accumulate unit (MAC) having an initial multiply stage and a subsequent accumulate stage and (2) out-of-order completion logic, associated with the MAC, that causes interim results produced by the multiply stage to be stored when the accumulate stage is unavailable and allows younger instructions to complete before the multiply-accumulate instructions.Type: GrantFiled: November 13, 2001Date of Patent: June 12, 2007Assignee: VeriSilicon Holdings (Cayman Islands) Co. Ltd.Inventors: Hung T. Nguyen, Shannon A. Wichman
-
Patent number: 7225323Abstract: A multipurpose functional unit is configurable to support a number of operations including multiply-add and comparison testing operations, as well as other integer and/or floating-point arithmetic operations, Boolean operations, and format conversion operations.Type: GrantFiled: November 10, 2004Date of Patent: May 29, 2007Assignee: NVIDIA CorporationInventors: Ming Y. Siu, Stuart F. Oberman
-
Patent number: 7216217Abstract: A programmable processor that comprises a general purpose processor architecture, capable of operation independent of another host processor, having a virtual memory addressing unit, an instruction path and a data path; an external interface; a cache operable to retain data communicated between the external interface and the data path; at least one register file configurable to receive and store data from the data path and to communicate the stored data to the data path; and a multi-precision execution unit coupled to the data path. The multi-precision execution unit is configurable to dynamically partition data received from the data path to account for an elemental width of the data and is capable of performing group floating-point operations on multiple operands in partitioned fields of operand registers and returning catenated results. In other embodiments the multi-precision execution unit is additionally configurable to execute group integer and/or group data handling operations.Type: GrantFiled: August 25, 2003Date of Patent: May 8, 2007Assignee: Microunity Systems Engineering, Inc.Inventors: Craig Hansen, John Moussouris
-
Patent number: 7206927Abstract: A method of executing an instruction stream in a pipelined execution unit of depth, p, comprises loading the instruction stream; detecting an iteration of an instruction in the loaded instruction stream; interleaving p steams of instances of the instruction in the pipeline; detecting an end of the iteration; and combining results obtained from the p streams after all programmed iterations have completed. A computational circuit comprises a register which can hold a value representing both an operand and result of an iterative operation; a multiplexer having a first input connected to receive the operand from the register, a second input connected to a source of an identify value for the iterative operation, and an output; and an operator circuit having an input connected to receive a value from the multiplexer output, and an output connected to return thee result to the register.Type: GrantFiled: November 19, 2002Date of Patent: April 17, 2007Assignee: Analog Devices, Inc.Inventor: Abhijit Giri
-
Patent number: 7181484Abstract: A multiply unit includes an extended precision accumulator. Microprocessor instructions are provided for manipulating portions of the extended precision accumulator including an instruction to move the contents of a portion of the extended accumulator to a general-purpose register (“MFLHXU”) and an instruction to move the contents of a general-purpose register to a portion of the extended accumulator (“MTLHX”).Type: GrantFiled: February 21, 2001Date of Patent: February 20, 2007Assignee: MIPS Technologies, Inc.Inventors: Morten Stribaek, Pascal Paillier
-
Patent number: 7142010Abstract: In a programmable logic device having dedicated multiplier circuitry, some of the scan chain registers normally used for testing the device are located adjacent input registers of the multipliers. Those scan chain registers are ANDed with the input registers, and can be loaded with templates of ones and zeroes. This allows, e.g., subset multiplication if the least significant bits are loaded with zeroes and the remaining bits are loaded with ones. The multipliers preferably are arranged in blocks with other components, such as adders, that allow them to be configured as finite impulse response (FIR) filters. In such configurations, the scan chain registers can be used to load filter coefficients, avoiding the use of scarce logic and routing resources of the device.Type: GrantFiled: December 19, 2003Date of Patent: November 28, 2006Assignee: Altera CorporationInventors: Martin Langhammer, Chiao Kai Hwang, Gregory Starr
-
Patent number: 7127482Abstract: An algorithm and hardware structure is described for numerical operations on signals that is reconfigurable to operate in a downsampling or non-downsampling mode. According to one embodiment, a plurality of adders and multipliers are reconfigurable via a switching fabric to operate as a plurality of MAAC ( multiply-add-accumulator) kernels (described in detail below), when operating in a non-downsampling mode and a plurality of MAAC kernels and AMAAC (add-multiply-add-accumulator) kernals (described in detail below), when operating in a downsampling mode.Type: GrantFiled: November 19, 2001Date of Patent: October 24, 2006Assignee: Intel CorporationInventors: Yan Hou, Hong Jiang, Kam Leung
-
Patent number: 7111166Abstract: An extension of the serial/parallel Montgomery modular multiplication method with simultaneous reduction as previously implemented by the applicants, adapted innovatively to perform both in the prime number and in the GF(2q) polynomial based number field, in such a way as to simplify the flow of operands, by performing a multiple anticipatory function to enhance the previous modular multiplication procedures.Type: GrantFiled: May 14, 2001Date of Patent: September 19, 2006Assignee: Fortress U&T Div. M-Systems Flash Disk Pioneers Ltd.Inventors: Itai Dror, Carmi David Gressel, Michael Mostovoy, Alexey Molchanov
-
Patent number: 7107305Abstract: A tightly coupled dual 16-bit multiply-accumulate (MAC) unit for performing single-instruction/multiple-data (SIMD) operations may forward an intermediate result to another operation in a pipeline to resolve an accumulating dependency penalty. The MAC unit may also be used to perform 32-bit×32-bit operations.Type: GrantFiled: October 5, 2001Date of Patent: September 12, 2006Assignee: Intel CorporationInventors: Deli Deng, Anthony Jebson, Yuyun Liao, Nigel C. Paver, Steve J. Strazdus
-
Patent number: 7080113Abstract: A virtually parallel multiplier-accumulator (VMAC) that can execute more than or less than one MAC operation in a single system clock cycle. The inventive VMAC advantageously employs a resource/time-sharing methodology with multiple sequential computational stages.Type: GrantFiled: July 17, 2003Date of Patent: July 18, 2006Assignee: Agere Systems Inc.Inventors: Hyun Lee, Shaun P. Whalen
-
Patent number: 7043517Abstract: A multiply accumulator performs a multiplication-and-addition operation for a first multiplier with N bits, a second multiplier with N bits, and an addend with M bits, wherein M is larger than 2N. The multiply accumulator includes a modified Booth encoder and a multiplication-and-addition unit. The modified Booth encoder performs a Booth encoding to either the first multiplier or its bit inversion by supplementing a multiplier sign bit behind a least significant bit of either the first multiplier or its bit inversion. The multiplication-and-addition unit includes a carry save adder tree and a sign extension adder and achieves a high speed of the multiplication-and-addition operation by simultaneously performing the multiplication and addition.Type: GrantFiled: March 7, 2003Date of Patent: May 9, 2006Assignee: Faraday Technology Corp.Inventor: Chi-jui Chung
-
Patent number: 7043518Abstract: A multiply accumulate unit (“MAC”) that performs operations on packed integer data. In one embodiment, the MAC receives 2 32-bit data words which, depending on the specified mode of operation, each contain either four 8-bit operands, two 16-bit operands, or one 32-bit operand. Depending on the mode of operation, the MAC performs either sixteen 8×8 operations, four 16×16 operations, or one 32×32 operation. Results may be individually retrieved from registers and the corresponding accumulator cleared after the read cycle. In addition, the accumulators may be globally initialized. Two results from the 8×8 operations may be packed into a single 32-bit register. The MAC may also shift and saturate the products as required.Type: GrantFiled: February 9, 2004Date of Patent: May 9, 2006Assignee: Cradle Technologies, Inc.Inventors: Moshe B. Simon, Erik P. Machnicki, David A. Harrison, Rakesh K. Singh
-
Patent number: 7043519Abstract: In an SIMD sum of product arithmetic method of enabling a concurrent execution of 2n (where n is a natural number) parallel sum of product arithmetic (operations), the SIMD sum of product arithmetic is executed using 2m (m=0, . . . , log2 n) accumulators as one set, and by replacing a 2p?1th accumulator with an adjacent 2pth (p=1, . . . , n/2) accumulator, without changing a sequence of accumulator addresses, in the set, as accumulator addresses to be allocated to sum of product arithmetic circuits for the SIMD sum of product arithmetic.Type: GrantFiled: September 5, 2001Date of Patent: May 9, 2006Assignee: Fujitsu LimitedInventor: Masayuki Tsuji
-
Patent number: 7035890Abstract: An apparatus for multiplying and accumulating numeric quantities, including a multiplier for receiving the numeric quantities, with the multiplier having a sum output and a carry output. A first shift register has an input coupled to the sum output of the multiplier, and a second shift register has an input coupled to the carry output of the multiplier. An adder and third shift register are used to complete processing of the apparatus' arithmetic operations.Type: GrantFiled: March 1, 2001Date of Patent: April 25, 2006Assignee: 8x8, IncInventors: Jan Fandrianto, Chi Shin Wang, Sehat Sutardja, Hedley K. J. Rainnie, Bryan R. Martin
-
Patent number: 7027598Abstract: A pre-computation and dual-pass modular operation approach to implement encryption protocols efficiently in electronic integrated circuits is disclosed. An encrypted electronic message is received and another electronic message generated based on the encryption protocol. Two passes of Montgomery's method are used for a modular operation that is associated with the encryption protocol along with pre-computation of a constant based on a modulus. The modular operation may be a modular multiplication or a modular exponentiation. Modular arithmetic may be performed using the residue number system (RNS) and two RNS bases with conversions between the two RNS bases. A minimal number of register files are used for the computations along with an array of multiplier circuits and an array of modular reduction circuits. The approach described allows for high throughput for large encryption keys with a relatively small number of logical gates.Type: GrantFiled: September 19, 2001Date of Patent: April 11, 2006Assignee: Cisco Technology, Inc.Inventors: Mihailo M. Stojancic, Mahesh S. Maddury, Kenneth J. Tomei
-
Patent number: 7027597Abstract: A pre-computation and dual-pass modular operation approach to implement encryption protocols efficiently in electronic integrated circuits is disclosed. An encrypted electronic message is received and another electronic message generated based on the encryption protocol. Two passes of Montgomery's method are used for a modular operation that is associated with the encryption protocol along with pre-computation of a constant based on a modulus. The modular operation may be a modular multiplication or a modular exponentiation. Modular arithmetic may be performed using the residue number system (RNS) and two RNS bases with conversions between the two RNS bases. A minimal number of register files are used for the computations along with an array of multiplier circuits and an array of modular reduction circuits. The approach described allows for high throughput for large encryption keys with a relatively small number of logical gates.Type: GrantFiled: September 18, 2001Date of Patent: April 11, 2006Assignee: Cisco Technologies, Inc.Inventors: Mihailo M. Stojancic, Mahesh S. Maddury, Kenneth J. Tomei
-
Patent number: 7013321Abstract: According to the invention, a processing core that executes a parallel multiply accumulate operation is disclosed. Included in the processing core are a first, second and third input operand registers; a number of functional blocks; and, an output operand register. The first, second and third input operand registers respectively include a number of first input operands, a number of second input operands and a number of third input operands. Each of the number of functional blocks performs a multiply accumulate operation. The output operand register includes a number of output operands. Each of the number of output operands is related to one of the number of first input operands, one of the number of second input operands and one of the number of third input operands.Type: GrantFiled: November 21, 2001Date of Patent: March 14, 2006Assignee: Sun Microsystems, Inc.Inventor: Ashley Saulsbury
-
Patent number: 7010558Abstract: An apparatus and method for performing enhanced algorithmic processing, including reduced cycle-count fast Fourier transform (FFT) calculations. In one aspect, the invention comprises a user-configurable processor having an extension instruction adapted for reduced cycle-count algorithmic operations. In one exemplary embodiment, the processor is an extensible core, and the extension instruction comprises a 32-bit instruction word linked with existing circuitry in the processor core used for multiply-accumulate (mac) instructions. 16-bit, 24-bit, and dual 16-bit multiply options are available for the multiply/accumulate unit of the processor. The extension instruction is pipelined to the same number of stages as the mac instructions, thereby avoiding unnecessary stalls and increasing performance. A modified accumulator data path used in support of the foregoing instruction is also described.Type: GrantFiled: April 18, 2002Date of Patent: March 7, 2006Assignee: ARC InternationalInventor: Chris Morris
-
Patent number: 6988184Abstract: Methods of performing dyadic digital signal processing (DSP) instructions. In one embodiment of the invention, the method includes fetching a dyadic DSP instruction having a main operation and a sub operation; predecoding the dyadic DSP instruction to generate predecoded instruction signals; and decoding the predecoded instruction signals to generate select signals to selectively couple data from a first plurality of buses coupled to inputs of multiplexers of a first plurality of DSP functional blocks to execute the main operation of the dyadic DSP instruction in one processor cycle and to selectively couple data from a second plurality of buses coupled to inputs of multiplexers of a second plurality of DSP functional blocks to execute the sub operation of the dyadic DSP instruction in the one processor cycle.Type: GrantFiled: August 2, 2002Date of Patent: January 17, 2006Assignee: Intel CorporationInventors: Kumar Ganapathy, Ruban Kanapathipillai
-
Patent number: 6976049Abstract: The present invention relates to a method and system for providing a single accumulatable packed multi-way addition instruction having the functionality of multiple instructions without causing any timing problems in the execute stage. Specifically, the accumulatable packed multi-way combination instruction may be associated with at least one destination and a plurality of operands and set a polarity of each of a plurality of source operands derived from the plurality of operands, if requested by the instruction. The instruction also may add selected pairs of the plurality of source operands in predetermined orders to obtain at least one result and, if requested by the instruction, accumulating the plurality of results to obtain at least one accumulated result; output at least one predetermined pair of the at least one result and the at least one accumulated result; and accumulate condition codes for each of the at least one result and the at least one accumulated result, if requested by the instruction.Type: GrantFiled: March 28, 2002Date of Patent: December 13, 2005Assignee: Intel CorporationInventor: Gad Sheaffer