Multiplication Followed By Addition Patents (Class 708/501)
  • Patent number: 9405535
    Abstract: A circuit arrangement provides support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: August 2, 2016
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
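
The packed sum of absolute differences named in the abstract above can be illustrated in software. This is a minimal sketch of the arithmetic only, not the patented circuit: the function name `packed_sad`, the four-lane layout, and the 8-bit lane width are assumptions chosen for the example.

```python
def packed_sad(a: int, b: int, lanes: int = 4, lane_bits: int = 8) -> int:
    """Sum |a_i - b_i| over unsigned lanes packed into two integers.

    The lane count and lane width are illustrative assumptions.
    """
    mask = (1 << lane_bits) - 1
    total = 0
    for i in range(lanes):
        shift = i * lane_bits
        x = (a >> shift) & mask  # extract lane i of the first operand
        y = (b >> shift) & mask  # extract lane i of the second operand
        total += abs(x - y)
    return total


# Lanes (high to low): 0x10/0x0F, 0x20/0x25, 0x30/0x31, 0x40/0xFF
sad = packed_sad(0x10203040, 0x0F2531FF)
print(sad)
```

Each lane is a small unsigned fixed point subtraction followed by an absolute value and an accumulate, which is the kind of operation the abstract describes mapping onto existing adders.
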
  • Patent number: 9400635
    Abstract: An integrated circuit is provided that performs floating-point operations involving at least two successive computational steps. Two floating-point numbers entering any additional computational step after the first computational step are aligned dynamically by shifting the mantissa of the floating-point number with the greater exponent to the left and the mantissa of the floating-point number with the smaller exponent to the right. The number of left shift bits is dependent on the magnitude of the difference between the two floating-point exponents and the number of leading zeroes in the mantissa with the greater exponent. The number of right shift bits is dependent on the magnitude of the difference between the two floating-point exponents and the number of left shift bits.
    Type: Grant
    Filed: January 14, 2013
    Date of Patent: July 26, 2016
    Assignee: Altera Corporation
    Inventor: Tomasz Sebastian Czajkowski
  • Patent number: 9317250
    Abstract: The present application provides a method and apparatus for supporting denormal numbers in a floating point multiply-add unit (FMAC). One embodiment of the FMAC is configurable to add a product of first and second operands to a third operand. This embodiment of the FMAC is configurable to determine a minimum exponent shift for a sum of the product and the third operand by subtracting a minimum normal exponent from a product exponent of the product. This embodiment of the FMAC is also configurable to cause bits representing the sum to be left shifted by the minimum exponent shift if a third exponent of the third operand is less than or equal to the product exponent and the minimum exponent shift is less than or equal to a predicted left shift for the sum.
    Type: Grant
    Filed: November 12, 2012
    Date of Patent: April 19, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Kelvin D. Goveas, Debjit Das Sarma, Scott A. Hilker, Hanbing Liu
  • Patent number: 9274752
    Abstract: In one embodiment, a processor includes at least one floating point unit. The at least one floating point unit may include an adder, leading change anticipator (LCA) logic, and a shifter. The adder may be to add a first operand X and a second operand Y to obtain an output operand having a bit length n. The LCA logic may be to: for each bit position i from n-1 to 1, obtain a set of propagation values and a set of bit values based on the first operand X and the second operand Y; and generate a LCA mask based on the set of propagation values and the set of bit values. The shifter may be to normalize the output operand based on the LCA mask. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: March 1, 2016
    Assignee: Intel Corporation
    Inventors: Simon Rubanovich, Thierry Pons, Amit Gradstein, Zeev Sperber
  • Patent number: 9264066
    Abstract: Techniques are disclosed relating to type conversion using a floating-point unit. In one embodiment, to convert a floating-point value to a normalized integer format, a floating-point unit is configured to perform an operation to generate a result having a significant portion and an exponent portion, where the operation includes multiplying the floating-point value by a constant. In one embodiment, the apparatus is further configured to add a value to the exponent portion of the result, and set a rounding mode to round to nearest. The constant may be a greatest value less than one that can be represented using the particular number of unsigned bits. The value added to the initial exponent may be equal to the number of unsigned bits of the normalized integer format. The apparatus may perform this conversion in response to a pack instruction.
    Type: Grant
    Filed: July 30, 2013
    Date of Patent: February 16, 2016
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
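
The conversion steps in the abstract above (multiply by a constant, add to the exponent, round to nearest) can be sketched as follows. This is an illustration under stated assumptions, not the patented hardware: the name `float_to_unorm` is hypothetical, the input is assumed to lie in [0, 1], and the software version rounds the product before the final rounding, which a single hardware operation would not necessarily do.

```python
import math


def float_to_unorm(x: float, bits: int = 8) -> int:
    """Convert x in [0, 1] to an unsigned normalized integer of `bits` bits."""
    # Greatest value below one representable in `bits` unsigned bits.
    constant = (2**bits - 1) / 2**bits
    product = x * constant
    # Adding `bits` to the exponent is a multiplication by 2**bits.
    scaled = math.ldexp(product, bits)
    # Python's round() rounds to nearest, ties to even.
    return round(scaled)


print(float_to_unorm(0.0), float_to_unorm(0.5), float_to_unorm(1.0))
```

The two steps together multiply the input by 2**bits - 1, so 1.0 maps to the all-ones integer.
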
  • Patent number: 9256397
    Abstract: A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result.
    Type: Grant
    Filed: December 3, 2013
    Date of Patent: February 9, 2016
    Assignee: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Klaus M. Kroener, Christophe J. Layer, Silvia M. Mueller
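
The abstract above begins with Booth encoding of one operand. The sketch below shows standard radix-4 Booth recoding and the partial products it yields; it is a textbook illustration, not the patented design, and it does not model the carry corrector or the redundant sum and carry vectors. The function names and the even-width restriction are assumptions.

```python
# Digit for each 3-bit window (y[i+1], y[i], y[i-1]): -2*y[i+1] + y[i] + y[i-1]
BOOTH_TABLE = [0, 1, 1, 2, -2, -1, -1, 0]


def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a signed `bits`-wide multiplier into radix-4 digits in {-2..2}."""
    assert bits % 2 == 0, "sketch assumes an even operand width"
    assert -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1))
    # Two's-complement bit pattern with an implicit 0 appended below the LSB.
    pattern = (multiplier & ((1 << bits) - 1)) << 1
    return [BOOTH_TABLE[(pattern >> i) & 0b111] for i in range(0, bits, 2)]


def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Sum the partial products selected by the Booth digits."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(d * multiplicand * 4**k for k, d in enumerate(digits))


print(booth_radix4_digits(7, 4), booth_multiply(-13, 7, 8))
```

Recoding halves the number of partial products compared with one product per multiplier bit, which is why Booth encoders commonly sit in front of a multiplier's reduction tree.
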
  • Patent number: 9141337
    Abstract: A processing device is provided that includes a first, second and third precision operation circuit. The processing device further includes a shared, bit-shifting circuit that is communicatively coupled to the first, second and third precision operation circuits. A method is also provided for multiplying a first and second binary number including adding a first exponent value associated with the first binary number to a second exponent value associated with the second binary number and multiplying a first mantissa value associated with the first binary number to a second mantissa value associated with the second binary number. The method includes performing the exponent adding and mantissa multiplying substantially in parallel. The method further includes performing at least one of adding or subtracting a third binary number to the product. Also provided is a computer readable storage device encoded with data for adapting a manufacturing facility to create an apparatus.
    Type: Grant
    Filed: September 6, 2011
    Date of Patent: September 22, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Scott Hilker
  • Patent number: 9104474
    Abstract: Embodiments of the present invention may provide methods and circuits for energy efficient floating point multiply and/or add operations. A variable precision floating point circuit may determine the certainty of the result of a multiply-add floating point calculation in parallel with the floating-point calculation. The variable precision floating point circuit may use the certainty of the inputs in combination with information from the computation, such as, binary digits that cancel, normalization shifts, and rounding, to perform a calculation of the certainty of the result. A floating point multiplication circuit may determine whether a lowest portion of a multiplication result could affect the final result and may induce a replay of the multiplication operation when it is determined that the result could affect the final result.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: August 11, 2015
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Ram K. Krishnamurthy, William C. Hasenplaugh, Randy L. Allmon, Jonathan Enoch
  • Patent number: 8990282
    Abstract: A fused multiply add floating point unit 1 includes multiplying circuitry 4 and adding circuitry 8. The multiply circuitry 4 multiplies operands B and C having N-bit significands to generate an unrounded product B*C. The unrounded product B*C has an M-bit significand, where M>N. The adding circuitry 8 receives an operand A that is input at a later processing cycle than a processing cycle at which the multiplying circuitry 4 receives operands B and C. The adding circuitry 8 commences processing of the operand A after the unrounded product B*C is generated by the multiplying circuitry 4. The adding circuitry 8 adds the operand A to the unrounded product B*C and outputs a rounded result A+B*C.
    Type: Grant
    Filed: September 21, 2009
    Date of Patent: March 24, 2015
    Assignee: ARM Limited
    Inventor: David Raymond Lutz
  • Patent number: 8990283
    Abstract: A computer processor including a single fused-unfused floating point multiply-add (FMA) module computes the result of the operation A*B+C for floating point numbers for fused multiply-add rounding operations and unfused multiply-add rounding operations. In one embodiment, a fused multiply-add rounding implementation is augmented with additional hardware which calculates an unfused multiply-add rounding result without adding additional pipeline stages. In one embodiment, a computation by the fused-unfused floating point multiply-add (FMA) module is initiated using a single opcode which determines whether a fused multiply-add rounding result or unfused multiply-add rounding result is generated.
    Type: Grant
    Filed: October 24, 2011
    Date of Patent: March 24, 2015
    Assignee: Oracle America, Inc.
    Inventors: Murali K. Inaganti, Leonard D. Rarick
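
The difference between fused and unfused multiply-add rounding mentioned in the abstract above can be shown numerically. This is a software sketch, not the patented module: the fused result is emulated with exact rational arithmetic and a single conversion to a double, the function names are hypothetical, and signed zeros, infinities, and NaNs are not handled.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Round the product to a double, then round the sum (two roundings)."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0
# The exact product is 1 - 2**-54, which rounds to 1.0 as a double.
print(unfused_multiply_add(a, b, c))  # the low-order term is lost
print(fused_multiply_add(a, b, c))    # the low-order term survives
```

The inputs are chosen so that the product needs more significand bits than a double holds, which makes the two rounding behaviors give different answers.
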
  • Patent number: 8977670
    Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.
    Type: Grant
    Filed: May 11, 2012
    Date of Patent: March 10, 2015
    Assignee: Oracle International Corporation
    Inventors: Jeffrey S. Brooks, Christopher H. Olson
  • Patent number: 8972471
    Abstract: An arithmetic module is provided, including a first adder, a first shifter coupled to the first adder, a multiplier coupled to the first shifter for receiving an external coefficient signal, a digit alignment unit coupled to the multiplier, a second adder coupled to the digit alignment unit, and a second shifter coupled to the second adder. The arithmetic module reduces the overall computation time effectively, as compared with a scalar processor, by employing a serial data connection design, and also significantly reduces power consumption of the digital signal processor by requiring fewer input and output ends than those of a multi-issue processor.
    Type: Grant
    Filed: September 12, 2012
    Date of Patent: March 3, 2015
    Assignee: National Chiao Tung University
    Inventors: Chih-Wei Liu, Kuo-Chiang Chang, Shih-Hao Ou, Yu-Wen Chen
  • Patent number: 8965945
    Abstract: An apparatus and method are provided for performing an addition operation on operands A and B in order to produce a result R, the operands A and B and the result R being floating point values each having a significand and an exponent. The apparatus comprises prediction circuitry for generating a shift indication based on a prediction of the number of leading zeros that would be present in an output produced by subjecting the operands A and B to an unlike signed addition. Further, result pre-normalization circuitry performs a shift operation on the significands of both operand A and operand B prior to addition of the significands, this serving to discard a number of most significant bits of the significands of both operands as determined by the shift indication in order to produce modified significands for operands A and B.
    Type: Grant
    Filed: February 17, 2011
    Date of Patent: February 24, 2015
    Assignee: ARM Limited
    Inventor: David Raymond Lutz
  • Patent number: 8930433
    Abstract: An embodiment of an apparatus performs a floating-point multiply-add process on a first multiplicand, a second multiplicand, and an addend. A leading 0 bit is added to a mantissa of the first multiplicand to form an expanded first mantissa, and a partial-product multiplication is performed on the expanded first mantissa and a mantissa of the second multiplicand to produce partial-product sum and partial-product carry mantissas. Leading bits of the partial-product sum and carry mantissas are changed to 0 bits if they are both 1 bits, and the partial-product sum and the partial-product carry are shifted right according to an exponent difference of a product of the first multiplicand and the second multiplicand. Otherwise both the partial-product sum and carry mantissas are arithmetically shifted right according to the exponent difference. The first and second multiplicands and the addend can be complex numbers.
    Type: Grant
    Filed: April 24, 2012
    Date of Patent: January 6, 2015
    Assignee: Futurewei Technologies, Inc.
    Inventors: Zhihong Li, Tong Sun, Zhikun Cheng
  • Patent number: 8930432
    Abstract: A floating point execution unit is capable of selectively repurposing one or more adders in an exponent path of the floating point execution unit to perform fixed point addition operations, thereby providing fixed point functionality in the floating point execution unit.
    Type: Grant
    Filed: August 4, 2011
    Date of Patent: January 6, 2015
    Assignee: International Business Machines Corporation
    Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
  • Patent number: 8924454
    Abstract: A first floating-point operation unit receives first and second variables and performs a first operation generating a first output. A first rounding unit receives and rounds the first output to generate a second output if a control bit is in a first state. A second floating-point operation unit receives a third variable and either the first output or the second output and performs a second operation on the third variable and either the first output or the second output, to generate a third output. The second floating-point operation unit receives and operates on the first output if the control bit is in the first state, or the second output if the control bit is in the second state. A second rounding unit receives and rounds the third output.
    Type: Grant
    Filed: January 25, 2012
    Date of Patent: December 30, 2014
    Assignee: Arm Finance Overseas Limited
    Inventor: David Yiu-Man Lau
  • Publication number: 20140379773
    Abstract: Systems and methods of performing fused multiply add (FMA) operations are provided. In one embodiment, the length of the adder used by the FMA operation is less than 3*N, where N is the number of bits in the mantissa term of a floating point number. A mask may be used to perform the addition portion of the FMA operation using the adder. A second mask may be used to denormalize the result of the addition portion of the FMA operation if an underflow occurs.
    Type: Application
    Filed: June 25, 2013
    Publication date: December 25, 2014
    Inventors: Simon Rubanovich, Thierry Pons, Amit Gradstein, Zeev Sperber
  • Patent number: 8914430
    Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: December 16, 2014
    Assignee: Intel Corporation
    Inventors: Amit Gradstein, Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan
  • Publication number: 20140351309
    Abstract: An FMA unit, for carrying out an arithmetic operation in a model computation unit of a control unit, is configured to process input of two factors and one summand in the form of floating point values, and provide a computation result of such processing as an output variable in the form of a floating point value. The FMA unit is designed to carry out a multiplication and a subsequent addition, the bit resolutions of the inputs for the factors being lower than the bit resolution of the input for the summand and the bit resolution of the output variable.
    Type: Application
    Filed: May 21, 2014
    Publication date: November 27, 2014
    Applicant: ROBERT BOSCH GMBH
    Inventors: Wolfgang FISCHER, Andre GUNTORO
  • Publication number: 20140351308
    Abstract: A system and method are provided for dynamically reducing power consumption of floating-point logic. A disable control signal that is based on a characteristic of a floating-point format input operand is received and a portion of a logic circuit is disabled based on the disable control signal. The logic circuit processes the floating-point format input operand to generate an output.
    Type: Application
    Filed: May 23, 2013
    Publication date: November 27, 2014
    Applicant: NVIDIA Corporation
    Inventors: David C. Tannenbaum, Srinivasan Iyer
  • Patent number: 8892619
    Abstract: A floating-point fused multiply-add (FMA) unit embodied in an integrated circuit includes a multiplier circuit cascaded with an adder circuit to produce a result A*C+B. To decrease latency, the FMA includes accumulation bypass circuits forwarding an unrounded result of the adder to inputs of the close path and the far path circuits of the adder, and forwarding an exponent result in carry save format to an input of the exponent difference circuit. Also included in the FMA is a multiply-add bypass circuit forwarding the unrounded result to the inputs of the multiplier circuit. The adder circuit includes an exponent difference circuit implemented in parallel with the multiplier circuit; a close path circuit implemented after the exponent difference circuit; and a far path circuit implemented after the exponent difference circuit.
    Type: Grant
    Filed: July 24, 2012
    Date of Patent: November 18, 2014
    Assignee: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Sameh Galal, Mark Horowitz
  • Patent number: 8868632
    Abstract: Methods and apparatus for predicting an underflow condition associated with a floating-point multiply-add operation are disclosed. An example apparatus obtains a first operand value and a second operand value. The example apparatus then determines if the second operand value subtracted from the first operand value is greater than a minimum value and determines if the first operand value is greater than a sum value associated with a minimum operand value. The example apparatus then asserts an output signal indicative of an absence of an underflow condition associated with a floating-point value based on conditions associated with determining whether the second operand value subtracted from the first operand value is greater than the minimum value and determining if the first operand value is greater than the sum value.
    Type: Grant
    Filed: September 15, 2005
    Date of Patent: October 21, 2014
    Assignee: Intel Corporation
    Inventor: Marius A. Cornea-Hasegan
  • Publication number: 20140244704
    Abstract: A method for operating a fused-multiply-add pipeline in a floating-point unit of a processor is disclosed. A multiplication is initially performed between a first operand and a second operand in a multiplier block to obtain a set of partial product results. The partial product results are sent to a carry-save adder block. A partial product reduction is performed on the partial product results to generate a carry-save result having a sum term and a carry term. The carry-save result is then formatted to generate a carry-out bit. The carry-save result is added to a third operand to generate a final result.
    Type: Application
    Filed: January 31, 2014
    Publication date: August 28, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: SON DAO TRONG, MICHAEL KLEIN, CHRISTOPHE LAYER, SILVIA M. MUELLER
  • Patent number: 8805915
    Abstract: A fused multiply-add (FMA) apparatus and method are provided. The FMA apparatus includes a partial product generator configured to generate a partial sum and a partial carry, a carry save adder configured to generate a partial sum having a first bit size and a partial carry having the first bit size by adding the partial sum and the partial carry to least significant bits (LSBs) of the mantissa of a third floating-point number, a carry select adder configured to generate a mantissa having a second bit size by adding the first bit-size partial sum and the first bit-size partial carry to most significant bits (MSBs) of the third floating-point number, and a selector configured to transmit the first bit-size partial sum and the first bit-size partial carry to the carry save adder or the carry select adder according to whether the mantissa of the third floating-point number is zero.
    Type: Grant
    Filed: June 6, 2011
    Date of Patent: August 12, 2014
    Assignees: Samsung Electronics Co., Ltd., Industry-Academic Cooperation Foundation, Yonsei University
    Inventors: Hyeong-Seok Yu, Dong-Kwan Suh, Suk-Jin Kim, San Kim, Yong-Surk Lee
  • Publication number: 20140195580
    Abstract: A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Application
    Filed: December 30, 2011
    Publication date: July 10, 2014
    Inventors: Cristina S. Anderson, Bret L. Toll, Robert Valentine, Simon Rubanovich, Amit Gradstein
  • Publication number: 20140188967
    Abstract: In one embodiment, a processor includes at least one floating point unit. The at least one floating point unit may include an adder, leading change anticipator (LCA) logic, and a shifter. The adder may be to add a first operand X and a second operand Y to obtain an output operand having a bit length n. The LCA logic may be to: for each bit position i from n-1 to 1, obtain a set of propagation values and a set of bit values based on the first operand X and the second operand Y; and generate a LCA mask based on the set of propagation values and the set of bit values. The shifter may be to normalize the output operand based on the LCA mask. Other embodiments are described and claimed.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Simon Rubanovich, Thierry Pons, Amit Gradstein, Zeev Sperber
  • Publication number: 20140188968
    Abstract: Embodiments of the present invention may provide methods and circuits for energy efficient floating point multiply and/or add operations. A variable precision floating point circuit may determine the certainty of the result of a multiply-add floating point calculation in parallel with the floating-point calculation. The variable precision floating point circuit may use the certainty of the inputs in combination with information from the computation, such as, binary digits that cancel, normalization shifts, and rounding, to perform a calculation of the certainty of the result. A floating point multiplication circuit may determine whether a lowest portion of a multiplication result could affect the final result and may induce a replay of the multiplication operation when it is determined that the result could affect the final result.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Himanshu KAUL, Mark A. ANDERS, Sanu K. MATHEW, Ram K. KRISHNAMURTHY, William C. HASENPLAUGH, Randy L. ALLMON, Jonathan ENOCH
  • Publication number: 20140188966
    Abstract: A floating-point fused multiply-add (FMA) unit embodied in an integrated circuit includes a multiplier circuit cascaded with an adder circuit to produce a result A*C+B. To decrease latency, the FMA includes accumulation bypass circuits forwarding an unrounded result of the adder to inputs of the close path and the far path circuits of the adder, and forwarding an exponent result in carry save format to an input of the exponent difference circuit. Also included in the FMA is a multiply-add bypass circuit forwarding the unrounded result to the inputs of the multiplier circuit. The adder circuit includes an exponent difference circuit implemented in parallel with the multiplier circuit; a close path circuit implemented after the exponent difference circuit; and a far path circuit implemented after the exponent difference circuit.
    Type: Application
    Filed: July 24, 2012
    Publication date: July 3, 2014
    Inventors: Sameh Galal, Mark Horowitz
  • Publication number: 20140164465
    Abstract: A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction.
    Type: Application
    Filed: March 11, 2013
    Publication date: June 12, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20140164464
    Abstract: A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction.
    Type: Application
    Filed: December 6, 2012
    Publication date: June 12, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 8732225
    Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.
    Type: Grant
    Filed: October 11, 2013
    Date of Patent: May 20, 2014
    Assignee: Altera Corporation
    Inventors: Suleyman Demirsoy, Hyun Yi
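
The block structure in the abstract above (two multipliers feeding an adder that also takes a chained-in value) can be modeled functionally. This is a behavioral sketch only: it computes the same FIR output as a chain of such blocks but does not model the systolic delay registers or their timing, and the names `dsp_block` and `fir_filter` are hypothetical.

```python
def dsp_block(chain_in, x0, h0, x1, h1):
    """One block: two multipliers plus an adder that includes the chained-in sum."""
    return chain_in + x0 * h0 + x1 * h1


def fir_filter(samples, taps):
    """Direct-form FIR output computed by chaining two-tap blocks."""
    taps = list(taps)
    if len(taps) % 2:
        taps.append(0)  # pad so that every block handles exactly two taps
    history = [0] * len(taps)
    outputs = []
    for x in samples:
        history = [x] + history[:-1]  # shift the newest sample into the delay line
        chain = 0
        for k in range(0, len(taps), 2):
            chain = dsp_block(chain, history[k], taps[k], history[k + 1], taps[k + 1])
        outputs.append(chain)
    return outputs


# An impulse input reproduces the tap values.
print(fir_filter([1, 0, 0, 0], [1, 2, 3]))
```

In hardware the chain value would pass between block instances through registers, so the additions are pipelined rather than evaluated in one loop iteration as they are here.
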
  • Publication number: 20140136587
    Abstract: The present application provides a method and apparatus for supporting denormal numbers in a floating point multiply-add unit (FMAC). One embodiment of the FMAC is configurable to add a product of first and second operands to a third operand. This embodiment of the FMAC is configurable to determine a minimum exponent shift for a sum of the product and the third operand by subtracting a minimum normal exponent from a product exponent of the product. This embodiment of the FMAC is also configurable to cause bits representing the sum to be left shifted by the minimum exponent shift if a third exponent of the third operand is less than or equal to the product exponent and the minimum exponent shift is less than or equal to a predicted left shift for the sum.
    Type: Application
    Filed: November 12, 2012
    Publication date: May 15, 2014
    Inventors: Kelvin D. Goveas, Debjit Das Sarma, Scott A. Hilker, Hanbing Liu
  • Patent number: 8711146
    Abstract: Methods and apparatuses for constructing a multi-level solver, comprising decomposing a graph into a plurality of pieces, wherein each of the pieces has a plurality of edges and a plurality of interface nodes, and wherein the interface nodes in the graph are fewer in number than the edges in the graph; producing a local preconditioner for each of the pieces; and aggregating the local preconditioners to form a global preconditioner.
    Type: Grant
    Filed: November 29, 2007
    Date of Patent: April 29, 2014
    Assignee: Carnegie Mellon University
    Inventors: Gary Lee Miller, Ioannis Koutis
  • Patent number: 8694572
    Abstract: A decimal floating-point Fused-Multiply-Add (FMA) unit that performs the operation of ±(A×B)±C on decimal floating-point operands. The decimal floating-point FMA unit executes the multiplication and addition operations compliant with the IEEE 754-2008 standard. Specifically, the decimal floating-point FMA includes a parallel multiplier and injects the addend after required alignment as an additional partial product in the reduction tree used in the parallel multiplier. The decimal floating-point FMA unit may be configured to perform addition-subtraction operations or multiplication operations as standalone operations.
    Type: Grant
    Filed: July 6, 2011
    Date of Patent: April 8, 2014
    Assignee: SilMinds, LLC, Egypt
    Inventors: Rodina Samy, Hossam Ali Hassan Fahmy, Tarek Eldeeb, Ramy Raafat, Yasmeen Farouk, Mostafa Elkhouly, Amira Mohamed
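
The decimal fused multiply-add operation described in the abstract above has a software counterpart in Python's `decimal` module, which provides `Context.fma`. The sketch below uses it to contrast one rounding with two; the three-digit precision and the operand values are assumptions chosen to make the difference visible, and nothing here reflects the patented hardware design.

```python
from decimal import Context, Decimal

# A deliberately small precision so the rounding of the product is visible.
ctx = Context(prec=3)

a, b, c = Decimal("1.23"), Decimal("4.56"), Decimal("-5.60")

# Unfused: the product 5.6088 is rounded to 5.61 before the addition.
unfused = ctx.add(ctx.multiply(a, b), c)

# Fused: the exact product is added to c and the result is rounded once.
fused = ctx.fma(a, b, c)

print(unfused, fused)
```

The fused form keeps the low-order digits of the product through the addition, which matters when the addend nearly cancels the product.
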
  • Publication number: 20140095568
    Abstract: A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result.
    Type: Application
    Filed: December 3, 2013
    Publication date: April 3, 2014
    Inventors: Maarten J. Boersma, Klaus M. Kroener, Christophe J. Layer, Silvia M. Mueller
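The data flow in this abstract (Booth encoding, partial products, reduction to redundant sum and carry vectors, final addition with the third operand) can be sketched on plain integers. The radix-4 recoding, the 16-bit width, and all function names are assumptions for illustration; the abstract does not state a radix, and the carry corrector it describes is not modeled here.

```python
def booth_radix4(x, nbits):
    """Recode an unsigned nbits-wide integer into radix-4 digits in -2..2."""
    digits, prev = [], 0
    for i in range(0, nbits + 1, 2):  # one extra step keeps unsigned values exact
        b0 = (x >> i) & 1
        b1 = (x >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits

def carry_save_add(x, y, z):
    """3:2 compressor: returns (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return s, carry

def multiply_add(a, b, c, nbits=16):
    """Compute a*b + c via Booth partial products and carry-save reduction."""
    partial_products = [(d * b) << (2 * i)
                        for i, d in enumerate(booth_radix4(a, nbits))]
    s, carry = 0, 0
    for pp in partial_products:
        s, carry = carry_save_add(s, carry, pp)
    # The addend joins the redundant vectors before the single final addition.
    s, carry = carry_save_add(s, carry, c)
    return s + carry

print(multiply_add(1234, 5678, 91))
```

Keeping the running result as a sum/carry pair defers carry propagation to the one final addition, which is the usual reason for a carry-save reduction tree.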
  • Publication number: 20140067894
    Abstract: Systems and methods for efficiently handling problematic corner cases in floating point operations without raising flags or exceptions. One or more floating point numbers that will generate a problematic corner case in floating point computations, such as division or square root computation, are detected. Fix-up operations are applied to modify the computation such that the problematic corner case is avoided. The modified computation is then performed, while error flags are suppressed during intermediate stages.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: Erich James Plondke, David J. Hoyle, Swaminathan Balasubramanian
  • Publication number: 20140067895
    Abstract: Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventor: Liang-Kai Wang
  • Patent number: 8667042
    Abstract: A vector functional unit implemented on a semiconductor chip to perform vector operations of dimension N is described. The vector functional unit includes N functional units. Each of the N functional units has logic circuitry to perform: a first integer multiply add instruction that presents highest ordered bits but not lowest ordered bits of a first integer multiply add calculation, and, a second integer multiply add instruction that presents lowest ordered bits but not highest ordered bits of a second integer multiply add calculation.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: March 4, 2014
    Assignee: Intel Corporation
    Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver
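One way to read the high/low split described in this abstract is as two views of the same wide multiply-add result. The sketch below is an interpretation under stated assumptions, not the instruction definitions from the patent: the 32-bit field width and the function names `madd_hi` and `madd_lo` are hypothetical.

```python
WIDTH = 32                 # assumption: the abstract does not state a field width
MASK = (1 << WIDTH) - 1

def madd_hi(a, b, c):
    """Bits above the low WIDTH bits of a*b + c (the low bits are dropped)."""
    return (a * b + c) >> WIDTH

def madd_lo(a, b, c):
    """Low WIDTH bits of a*b + c (the high bits are dropped)."""
    return (a * b + c) & MASK

a, b, c = 0xFFFFFFFF, 0xFFFFFFFE, 12345
hi, lo = madd_hi(a, b, c), madd_lo(a, b, c)

# The two halves together reconstruct the full-width result.
full = (hi << WIDTH) | lo
print(hex(hi), hex(lo), full == a * b + c)
```

Issuing the pair of operations lets software assemble a result wider than a single vector element, which is useful for multi-word integer arithmetic.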
  • Patent number: 8626813
    Abstract: A fused floating-point dot product unit. The fused dot product unit includes an improved alignment scheme that generates smaller significand pairs compared to the traditional alignment due to the reduced shift amount and sticky logic. Furthermore, the fused dot product unit implements early normalization and a fast rounding scheme. By normalizing the significands prior to the significand addition, the length of the adder can be reduced and the round logic can be performed in parallel. Additionally, the fused dot product unit implements a four-input leading zero anticipation unit thereby reducing the overhead of the reduction tree by encoding the four inputs at once. The fused floating-point dot product unit may also employ a dual-path (a far path and a close path) algorithm to improve performance. Pipelining may also be applied to the dual-path fused dot product unit to increase the throughput.
    Type: Grant
    Filed: August 12, 2013
    Date of Patent: January 7, 2014
    Assignee: Board of Regents, The University of Texas System
    Inventors: Earl E. Swartzlander, Jongwook Sohn
  • Publication number: 20140006467
    Abstract: Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Sridhar Samudrala, Grigorios Magklis, Marc Lupon, David R. Ditzel
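The difference between a double-rounded multiply-add (product rounded, then sum rounded) and a fused one (single rounding) can be shown in software. This is a sketch of the two semantics only, not of the hardware in the abstract; the function names and operand values are assumptions, and exact rational arithmetic stands in for a fused unit.

```python
from fractions import Fraction

def fused_multiply_add(a, b, c):
    """a*b + c computed exactly, then rounded once (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def double_rounded_multiply_add(a, b, c):
    """Ordinary float arithmetic: the product is rounded before the add."""
    return a * b + c

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 in binary64.
double_rounded = double_rounded_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)

print(double_rounded, fused)
```

Code compiled for separate multiply and add instructions expects the first behavior, which is why hardware that executes such pairs on a fused unit has to reproduce the intermediate rounding.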
  • Publication number: 20130332501
    Abstract: A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result.
    Type: Application
    Filed: June 11, 2012
    Publication date: December 12, 2013
    Applicant: IBM Corporation
    Inventors: Maarten J. Boersma, Klaus Michael Kroener, Christophe J. Layer, Silvia M. Mueller
  • Patent number: 8601047
    Abstract: A decimal floating-point (DFP) adder includes a decimal leading-zero anticipator (LZA). The DFP adder receives DFP operands. Each operand includes a significand, an exponent, a sign bit and a leading zero count for the significand. The DFP adder adds or subtracts the DFP operands to obtain a DFP result. The LZA determines the leading zero count associated with the significand of the DFP result. The LZA operates at least partially in parallel with circuitry (in the DFP adder) that computes the DFP result. The LZA does not wait for that circuitry to finish computation of the DFP result. Instead it “anticipates” the number of leading zeros that the result's significand will contain.
    Type: Grant
    Filed: June 13, 2013
    Date of Patent: December 3, 2013
    Assignee: Advanced Micro Devices
    Inventor: Liang-Kai Wang
  • Patent number: 8577948
    Abstract: In one embodiment, a processor includes a multiply-accumulate (MAC) unit having a first path to handle execution of an instruction if a difference between at least a portion of first and second operands and a third operand is less than a threshold value, and a second path to handle the instruction execution if the difference is greater than the threshold value. Based on the difference, at least part of the third operand is to be provided to a multiplier of the MAC unit or to a compressor of the second path. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 20, 2010
    Date of Patent: November 5, 2013
    Assignee: Intel Corporation
    Inventors: Suresh Srinivasan, Rajaraman Ramanarayanan, Sanu K. Mathew, Ram K. Krishnamurthy, Vasantha K. Erraguntla
  • Publication number: 20130282784
    Abstract: A device and methods are disclosed for communicating an unrounded result from one arithmetic calculation for use in a second, subsequent calculation. For example, an unrounded result of a first calculation can be forwarded to provide a multiplier, a multiplicand or an addend operand for the subsequent operation. The operand can be forwarded to the input of the same fused multiply addition module (FMAM) that supplied the result, or to another FMAM, without regard to the precision of the forwarded operand, the precision of the subsequent operation, or the native precision of the FMAM.
    Type: Application
    Filed: June 19, 2013
    Publication date: October 24, 2013
    Inventors: David S. Oliver, Debjit Das Sarma, Scott Hilker
  • Publication number: 20130282783
    Abstract: An embodiment of an apparatus performs a floating-point multiply-add process on a first multiplicand, a second multiplicand, and an addend. A leading 0 bit is added to a mantissa of the first multiplicand to form an expanded first mantissa, and a partial-product multiplication is performed on the expanded first mantissa and a mantissa of the second multiplicand to produce a partial-product sum mantissa and a partial-product carry mantissa. Leading bits of the partial-product sum and carry mantissas are changed to 0 bits if they are both 1 bits, and the partial-product sum and the partial-product carry are shifted right according to an exponent difference of a product of the first multiplicand and the second multiplicand. Otherwise both the partial-product sum and carry mantissas are arithmetically shifted right according to the exponent difference. The first and second multiplicands and the addend can be complex numbers.
    Type: Application
    Filed: April 24, 2012
    Publication date: October 24, 2013
    Applicant: FutureWei Technologies, Inc.
    Inventors: Zhihong Li, Tong Sun, Zhikun Cheng
  • Publication number: 20130262547
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Application
    Filed: May 30, 2013
    Publication date: October 3, 2013
    Inventors: Alexander Peleg, Milland Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
  • Patent number: 8549054
    Abstract: In an arithmetic processing apparatus, a dividing unit divides a second bit string into a low-order bit part having a bit width equal to a first bit width and a high-order bit part which is higher than the low-order bit part; a first arithmetic unit performs arithmetic operations for a carry to and a borrow from the high-order bit part; and a second arithmetic unit performs addition of absolute values of the low-order bit part and the first bit string. Finally, a selecting unit selects an output of the first arithmetic unit from among an arithmetic operation result with a carry, an arithmetic operation result with a borrow, and the high-order bit part itself, according to information about the high-order bit part, sign information of the first bit string and the second bit string, and an intermediate result of the addition of the absolute values by the second arithmetic unit.
    Type: Grant
    Filed: August 21, 2008
    Date of Patent: October 1, 2013
    Assignee: Fujitsu Limited
    Inventor: Ryuji Kan
  • Patent number: 8495121
    Abstract: A device and methods are disclosed for communicating an unrounded result from one arithmetic calculation for use in a second, subsequent calculation. For example, an unrounded result of a first calculation can be forwarded to provide a multiplier, a multiplicand or an addend operand for the subsequent operation. The operand can be forwarded to the input of the same fused multiply addition module (FMAM) that supplied the result, or to another FMAM, without regard to the precision of the forwarded operand, the precision of the subsequent operation, or the native precision of the FMAM.
    Type: Grant
    Filed: November 20, 2008
    Date of Patent: July 23, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventors: David S. Oliver, Debjit Das Sarma, Scott Hilker
  • Patent number: 8489663
    Abstract: A decimal floating-point (DFP) adder includes a decimal leading-zero anticipator (LZA). The DFP adder receives DFP operands. Each operand includes a significand, an exponent, a sign bit and a leading zero count for the significand. The DFP adder adds or subtracts the DFP operands to obtain a DFP result. The LZA determines the leading zero count associated with the significand of the DFP result. The LZA operates at least partially in parallel with circuitry (in the DFP adder) that computes the DFP result. The LZA does not wait for that circuitry to finish computation of the DFP result. Instead it “anticipates” the number of leading zeros that the result's significand will contain.
    Type: Grant
    Filed: June 5, 2009
    Date of Patent: July 16, 2013
    Assignee: Advanced Micro Devices
    Inventor: Liang-Kai Wang
  • Patent number: 8471593
    Abstract: A logic cell array having a number of logic cells and a segmented bus system for logic cell communication, the bus system including different segment lines having shorter and longer segments for connecting two points in order to be able to minimize the number of bus elements traversed between separate communication start and end points.
    Type: Grant
    Filed: November 4, 2011
    Date of Patent: June 25, 2013
    Inventors: Martin Vorbach, Frank May, Dirk Reichardt, Frank Lier, Gerd Ehlers, Armin Nückel, Volker Baumgarte, Prashant Rao, Jens Oertel