Multiplication Followed By Addition Patents (Class 708/501)
  • Patent number: 11893360
    Abstract: A process for performing vector dot products receives a row vector and a column vector as floating point numbers in a format of sign plus exponent bits plus mantissa bits. The process generates a single dot product value by separately processing the sign bits, exponent bits, and mantissa bits to form a sign bit, a normalized mantissa formed by multiplying pairs of multiplicand elements, and exponent information including MAX_EXP and EXP_DIFF. A second pipeline stage receives the multiplied pairs of normalized mantissas, optionally performs an exponent adjustment, pads, complements and shifts the normalized mantissas, and the results are added in a series of stages until a single addition result remains, which is normalized using MAX_EXP to form the floating point output result.
    Type: Grant
    Filed: February 21, 2021
    Date of Patent: February 6, 2024
    Assignee: Ceremorphic, Inc.
    Inventor: Dylan Finch
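The alignment idea in the abstract above (find the largest product exponent, shift every mantissa product by its distance from that exponent, add, then normalize once) can be sketched in software. This is only an illustration under assumptions, not the patented pipeline: the function name `dot_aligned` and the `frac_bits` parameter are hypothetical, and Python's signed integers stand in for the pad/complement hardware steps.

```python
import math

def dot_aligned(row, col, frac_bits=48):
    """Dot product that aligns every product to the largest product exponent."""
    terms = []
    for a, b in zip(row, col):
        ma, ea = math.frexp(a)  # a == ma * 2**ea with 0.5 <= |ma| < 1 (or ma == 0)
        mb, eb = math.frexp(b)
        terms.append((ma * mb, ea + eb))  # signed mantissa product, summed exponent

    exponents = [e for m, e in terms if m != 0.0]
    if not exponents:
        return 0.0
    max_exp = max(exponents)  # plays the role of MAX_EXP

    acc = 0
    for m, e in terms:
        if m == 0.0:
            continue
        exp_diff = max_exp - e               # plays the role of EXP_DIFF
        fixed = int(m * (1 << frac_bits))    # mantissa product as a fixed-point integer
        acc += fixed >> exp_diff             # align to MAX_EXP, then accumulate
    # Single normalization at the end, using MAX_EXP.
    return math.ldexp(acc, max_exp - frac_bits)

print(dot_aligned([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Bits shifted out during alignment are discarded here, so small terms lose precision relative to the largest one; a hardware design would choose its own guard-bit and rounding policy.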
  • Patent number: 11782711
    Abstract: Systems, apparatuses, and methods related to dynamic precision bit string accumulation are described. Dynamic bit string accumulation can be performed using an edge computing device. In an example method, dynamic precision bit string accumulation can include performing an iteration of a recursive operation using a first bit string and a second bit string and determining that a result of the iteration of the recursive operation contains a quantity of bits in a particular bit sub-set of the result that is greater than a threshold quantity of bits associated with the particular bit sub-set. The method can further include writing a result of the iteration of the recursive operation to a first register and writing at least a portion of the bits associated with the particular bit sub-set of the result to a second register.
    Type: Grant
    Filed: November 29, 2021
    Date of Patent: October 10, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Vijay S. Ramesh, Richard C. Murphy
  • Patent number: 11720328
    Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.
    Type: Grant
    Filed: September 23, 2020
    Date of Patent: August 8, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Bin He, Shubh Shah, Michael Mantor
  • Patent number: 11455766
    Abstract: A processor selectively adjusts the precision of data for different functional units. Specified functional units of the processor, such as a shader processing unit of a graphics processing unit (GPU), include a zeroing module to store, based on the states of corresponding precision flags, a data value of zero at a specified portion of an input and/or output data operand. The functional unit then processes the data including the zeroed portion. Because a portion of the data has been zeroed, the functional unit consumes less power during data processing. Furthermore, the precision flags are set such that the reduced precision of the data does not significantly impact a user experience.
    Type: Grant
    Filed: September 18, 2018
    Date of Patent: September 27, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Pramod V. Argade, Daniel Nikolai Peroni
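To make the zeroing idea above concrete, here is a small software sketch that clears the low-order mantissa bits of a single-precision value. It is an assumption-laden illustration only: the abstract describes a hardware zeroing module driven by precision flags, whereas the helper `zero_low_bits` and its parameter `n` are hypothetical names chosen for this example.

```python
import struct

def zero_low_bits(x, n):
    """Return x as a float32 with its n least significant bits forced to zero."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # raw float32 bit pattern
    mask = 0xFFFFFFFF & ~((1 << n) - 1)                  # keep the upper 32 - n bits
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y

# Clearing 16 bits keeps the sign, the exponent, and 7 mantissa bits.
print(zero_low_bits(3.14159, 16))
```

The zeroed value is still a valid float32, so downstream arithmetic is unchanged; the point of the technique is that the zero bits cause less switching activity in the data path.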
  • Patent number: 11429349
    Abstract: A floating-point Multiply-Add and Accumulate unit supports the BF16 format for Multiply-Accumulate operations and FP32 single-precision addition complying with the IEEE 754 Standard. The Multiply-Accumulate unit uses a higher radix and a longer internal 2's complement significand representation to facilitate precision as well as comparison and operation with negative numbers. The addition is performed using Carry-Save format to avoid long carry propagation and speed up the operation. Operations including overflow detection, zero detection and sign extension are adopted for 2's complement and Carry-Save format. Handling of overflow and sign extension allows for fast operation relatively independent of the size of the accumulator.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: August 30, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Vojin G. Oklobdzija, Matthew M. Kim
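The carry-save format mentioned in the abstract above keeps a running total as two words (a sum word and a carry word), so that each accumulation step needs no carry propagation. A minimal sketch on non-negative integers, with hypothetical function names and no claim to match the patented circuit:

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three addends to a (sum, carry) pair."""
    s = a ^ b ^ c                                # per-bit sum, carries ignored
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, moved up one place
    return s, carry

def accumulate(values):
    """Accumulate in carry-save form; propagate carries only once at the end."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s + c  # the single carry-propagate addition

print(accumulate([3, 5, 7, 11]))
```

The invariant is that `s + c` always equals the true running total, which is why the slow carry-propagate add can be deferred to the final step.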
  • Patent number: 11321096
    Abstract: Hardware units and methods for performing matrix multiplication via a multi-stage pipeline wherein the storage elements associated with one or more stages of the pipeline are clock gated based on the data elements and/or portions thereof that are known to have a zero value (or can be treated as having a zero value). In some cases, the storage elements may be clock gated on a per data element basis based on whether the data element has a zero value (or can be treated as having a zero value). In other cases, the storage elements may be clock gated on a partial element basis based on the bit width of the data elements. For example, if the bit width of the data elements is less than a maximum bit width for the data elements then a portion of the bits related to that data element can be treated as having a zero value and a portion of the storage elements associated with that data element may not be clocked. In yet other cases the storage elements may be clock gated on both a per element and a partial element basis.
    Type: Grant
    Filed: November 5, 2018
    Date of Patent: May 3, 2022
    Assignee: Imagination Technologies Limited
    Inventors: Christopher Martin, Azzurra Pulimeno
  • Patent number: 11269632
    Abstract: An instruction to convert data from a source data type to a target data type is obtained. The source data type is selected from one or more source data types supported by the instruction, and the target data type is selected from one or more target data types supported by the instruction. Based on a selected data type of the source data type or the target data type, a determination is made of a rounding mode for use by the instruction. The rounding mode is implicitly set based on the selected data type; it is assigned to the selected data type. A conversion of the data from the source data type to the target data type is performed. The conversion includes performing a rounding operation using the rounding mode implicitly set. The performing the conversion provides a result in the target data type, which is written to a select location.
    Type: Grant
    Filed: June 17, 2021
    Date of Patent: March 8, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Laith M. AlBarakat, Jonathan D. Bradbury, Timothy Slegel, Cedric Lichtenau, Joachim von Buttlar
  • Patent number: 11237909
    Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.
    Type: Grant
    Filed: August 21, 2020
    Date of Patent: February 1, 2022
    Assignee: International Business Machines Corporation
    Inventors: Robert F. Enenkel, Christopher Anand, Adele Olejarz, Lucas Dutton
  • Patent number: 11226791
    Abstract: In an arithmetic processing device, when either or both of a first operand and a second operand included in a multiply-add operation instruction are zero, an exponent setting circuit sets an exponent of the first operand to a first set value and sets an exponent of the second operand to a second set value. An exponent calculation circuit calculates an exponent obtained by a multiply-add operation, based on the exponents of the first and second operands outputted by the exponent setting circuit and an exponent of a third operand included in the multiply-add operation instruction. The sum of the first set value and the second set value is set so that a bit position of the third operand is located on a higher-order bit side than the most significant bit of the sum of the first operand and the second operand.
    Type: Grant
    Filed: October 2, 2019
    Date of Patent: January 18, 2022
    Assignee: FUJITSU LIMITED
    Inventors: Takio Ono, Hiroyuki Wada
  • Patent number: 11175892
    Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: November 16, 2021
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Dongdong Chen
  • Patent number: 11163531
    Abstract: A method and a MAC unit that may include an accumulation unit and a multiplier. The accumulation unit includes a first part, a second part and a third part. The first part may calculate a truncated sum. The second part may be configured to (a) receive, during each calculation cycle, a carry out of an add operation performed during the calculation cycle; (b) receive a sign bit of an intermediate product calculated during the calculation cycle; (c) calculate, by the counter logic, a counter logic value; and (d) convert, after a start of a last calculation cycle of the calculation cycles, an output value of the counter logic to an intermediate value having a two's complement format. The third part may be configured to calculate an output value of the MAC unit based on the intermediate value and a truncated sum calculated by the first part of the accumulation unit.
    Type: Grant
    Filed: June 21, 2019
    Date of Patent: November 2, 2021
    Assignee: DSP GROUP LTD.
    Inventors: Moshe Haiut, Assaf Ganor
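One way to read the three-part accumulator above is as a narrow adder plus a small counter that stands in for the upper bits of a wide accumulator. The sketch below models that reading in software. It is an assumption about how the carry and sign inputs could combine, not the circuit claimed in the patent, and the names `mac_truncated` and `width` are hypothetical.

```python
def mac_truncated(pairs, width=16):
    """Multiply-accumulate using a `width`-bit truncated sum plus a signed counter."""
    mask = (1 << width) - 1
    half = 1 << (width - 1)
    acc = 0      # first part: truncated sum, always in [0, 2**width)
    counter = 0  # second part: carries out minus sign bits of negative products
    for a, b in pairs:
        p = a * b
        if not -half <= p < half:
            raise ValueError("product does not fit in the truncated width")
        total = acc + (p & mask)     # add the product's two's complement bit pattern
        carry = total >> width       # carry out of the narrow adder (0 or 1)
        sign = 1 if p < 0 else 0     # sign bit of the intermediate product
        counter += carry - sign
        acc = total & mask
    # Third part: combine the counter (upper bits) with the truncated sum.
    return (counter << width) + acc

print(mac_truncated([(100, 200), (-150, 100), (120, -50), (90, 90)]))
```

The counter works because a negative product's `width`-bit pattern overstates it by exactly `2**width`, which the sign bit subtracts back, while each carry out adds `2**width` that the narrow sum dropped.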
  • Patent number: 11137983
    Abstract: Configurable specialized processing blocks, such as DSP blocks, are described that implement fixed and floating-point functionality in a single mixed architecture on a programmable device. The described architecture reduces the need to construct floating-point functions outside the configurable specialized processing block, thereby minimizing hardware cost and area. The disclosed architecture also introduces pipelining into the DSP block in order to ensure the floating-point multiplication and addition functions remain in synchronicity, thereby increasing the maximum frequency at which the DSP block can operate. Moreover, the disclosed architecture includes logic circuitry to support floating-point exception handling.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: October 5, 2021
    Assignee: Altera Corporation
    Inventors: Keone Streicher, Martin Langhammer, Yi-Wen Lin, Hyun Yi
  • Patent number: 11106598
    Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
    Type: Grant
    Filed: December 16, 2019
    Date of Patent: August 31, 2021
    Assignee: Shanghai Cambricon Information Technology Co., Ltd.
    Inventors: Yao Zhang, Bingrui Wang
  • Patent number: 11061672
    Abstract: A microprocessor is configured for unchained and chained modes of split execution of a fused compound arithmetic operation. In both modes of split execution, a first execution unit executes only a first part of the fused compound arithmetic operation and produces an intermediate result thereof, and a second instruction execution unit receives the intermediate result and executes a second part of the fused compound arithmetic operation to produce a final result. In the unchained mode, execution is accomplished by dispatching separate split-execution microinstructions to the first and second instruction execution units. In the chained mode, execution is accomplished by dispatching a single split-execution microinstruction to the first instruction execution unit and sending a chaining control signal or signal group to the second execution unit, causing it to execute its part of the fused arithmetic operation without needing an instruction.
    Type: Grant
    Filed: July 5, 2016
    Date of Patent: July 13, 2021
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Thomas Elmer, Nikhil A. Patil
  • Patent number: 11016731
    Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: May 25, 2021
    Assignee: Intel Corporation
    Inventors: Amit Gradstein, Simon Rubanovich, Zeev Sperber
  • Patent number: 11010131
    Abstract: An integrated circuit may include a floating-point adder. The adder may be implemented using a dual-path adder architecture having a near path and a far path. The near path may include a leading zero anticipator (LZA), a comparison circuit for comparing an exponent value to an LZA count, and associated circuitry for handling subnormal numbers. The far path may include a subtraction circuit for computing the difference between a received exponent value and a minimum exponent value, at least two shifters for shifting far greater and far lesser mantissa values in parallel, and associated circuitry for handling subnormal numbers. The adder may be dynamically configured to support a first mode that processes FP16 at inputs and outputs, a second mode that processes modified FP16′ inputs, and a third mode that processes FP16′ at inputs and outputs.
    Type: Grant
    Filed: September 14, 2017
    Date of Patent: May 18, 2021
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Pasca
  • Patent number: 10977039
    Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements.
    Type: Grant
    Filed: November 1, 2019
    Date of Patent: April 13, 2021
    Assignee: Intel Corporation
    Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Mark Charney, Robert Valentine, Jesus Corbal, Binwei Yang
  • Patent number: 10776207
    Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.
    Type: Grant
    Filed: September 6, 2018
    Date of Patent: September 15, 2020
    Assignee: International Business Machines Corporation
    Inventors: Robert F. Enenkel, Christopher Anand, Lucas Dutton, Adele Olejarz
  • Patent number: 10761807
    Abstract: This invention discloses a floating-point number operation circuit and a method thereof. The floating-point number operation circuit is configured to perform a fused multiplication and accumulation (fused mac) operation or a multiplication and accumulation (mac) operation on a first operand, a second operand, and a third operand, or perform a multiplication operation on the first operand and the second operand. The floating-point number operation circuit includes two rounding circuits, a multiplication circuit, a selection circuit, a control circuit, and an addition circuit. The control circuit controls the scheduling of various operations and the use of resources on each calculation path.
    Type: Grant
    Filed: October 23, 2018
    Date of Patent: September 1, 2020
    Assignee: REALTEK SEMICONDUCTOR CORPORATION
    Inventor: Chia-I Chen
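The difference between the fused and non-fused operations named in the abstract above is the number of rounding steps: a fused multiply-accumulate rounds once, while a separate multiply and add rounds twice. The sketch below shows the numerical effect in double precision with round-to-nearest. It uses exact rational arithmetic to emulate the single rounding; the function names are hypothetical, and nothing here models the Realtek circuit itself.

```python
from fractions import Fraction

def mac_two_roundings(a, b, c):
    """Multiply, round to double, then add and round again."""
    return a * b + c

def fused_mac(a, b, c):
    """Compute a*b + c exactly, then round to double once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
# The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
print(mac_two_roundings(a, b, c))  # the small term is lost in the first rounding
print(fused_mac(a, b, c))          # the small term survives: -2**-60
```

Cases like this, where the product nearly cancels against the addend, are where the two operations differ most visibly.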
  • Patent number: 10713012
    Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation.
    Type: Grant
    Filed: October 15, 2018
    Date of Patent: July 14, 2020
    Assignee: Intel Corporation
    Inventors: Aditya Varma, Michael Espig
  • Patent number: 10678510
    Abstract: The present embodiments relate to integrated circuits with floating-point arithmetic circuitry that handles normalized and denormalized floating-point numbers. The floating-point arithmetic circuitry may include a normalization circuit and a rounding circuit, and the floating-point arithmetic circuitry may generate a first result in the form of a normalized, unrounded floating-point number and a second result in the form of a normalized, rounded floating-point number. If desired, the floating-point arithmetic circuitry may be implemented in specialized processing blocks.
    Type: Grant
    Filed: September 25, 2017
    Date of Patent: June 9, 2020
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Patent number: 10671347
    Abstract: Embodiments are directed to a computer implemented method for executing machine instructions in a central processing unit. The method includes obtaining, by a processor system, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture. The method further includes executing the machine instruction, wherein the executing includes loading a multiplicand into a multiplicand register, and loading a multiplier into a multiplier register. The executing further generates an intermediate product having least significant bits by multiplying the multiplicand and the multiplier. The executing further includes generating a rounded product by performing a probability analysis on the least significant bits of the intermediate product, and initiating a rounding operation on the intermediate product to produce the rounded product based at least in part on the probability analysis.
    Type: Grant
    Filed: January 28, 2016
    Date of Patent: June 2, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Steven R. Carlough, Brian R. Prasky, Eric M. Schwarz
  • Patent number: 10649730
    Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add low part). The unit pre-aligns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are pre-aligned to each fit on one shifter of the two fixed length shifters.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: May 12, 2020
    Assignee: International Business Machines Corporation
    Inventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner
  • Patent number: 10481869
    Abstract: Techniques are disclosed relating to circuitry configured to perform floating-point operations such as fused multiply-addition (FMA) with multiple paths and power control. In some embodiments, an FMA unit includes a near path and multiple far paths and is configured to select a path based on a determined exponent difference. In some embodiments, the FMA unit is configured to operate portions of non-selected paths in a low power state.
    Type: Grant
    Filed: November 10, 2017
    Date of Patent: November 19, 2019
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Ting Yu, Yu Sun
  • Patent number: 10445066
    Abstract: Embodiments are directed to a computer implemented method for executing machine instructions in a central processing unit. The method includes obtaining, by a processor system, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture. The method further includes executing the machine instruction, wherein the executing includes loading a multiplicand into a multiplicand register, and loading a multiplier into a multiplier register. The executing further generates an intermediate product having least significant bits by multiplying the multiplicand and the multiplier. The executing further includes generating a rounded product by performing a probability analysis on the least significant bits of the intermediate product, and initiating a rounding operation on the intermediate product to produce the rounded product based at least in part on the probability analysis.
    Type: Grant
    Filed: February 14, 2017
    Date of Patent: October 15, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Steven R. Carlough, Brian R. Prasky, Eric M. Schwarz
  • Patent number: 10379811
    Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add low part). The unit pre-aligns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are pre-aligned to each fit on one shifter of the two fixed length shifters.
    Type: Grant
    Filed: November 16, 2017
    Date of Patent: August 13, 2019
    Assignee: International Business Machines Corporation
    Inventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner
  • Patent number: 10331407
    Abstract: A method for performing tiny detection in floating-point operations with a floating-point unit. The floating-point unit is configured to implement a fused-multiply-add operation on three wide operands. The floating-point unit comprises: a multiplier, a left shifter, a right shifter, a select circuit comprising a 3-to-2 compressor, an adder connected to the dataflow from the select circuit, a first feedback path connecting a carry output of the adder to the select circuit, and a second feedback path connecting an output of the adder to the left and right shifters for passing an intermediate wide result through the left and right shifters. The adder is configured to provide an unrounded result for tiny detection.
    Type: Grant
    Filed: November 11, 2017
    Date of Patent: June 25, 2019
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Kroener, Silvia M. Mueller, Andreas Wagner
  • Patent number: 10318290
    Abstract: A first floating-point operation unit receives first and second variables and performs a first operation generating a first output. A first rounding unit receives and rounds the first output to generate a second output if a control bit is in a first state. A second floating-point operation unit receives a third variable and either the first output or the second output and performs a second operation on the third variable and either the first output or the second output, to generate a third output. The second floating-point operation unit receives and operates on the first output if the control bit is in the first state, or the second output if the control bit is in the second state. A second rounding unit receives and rounds the third output.
    Type: Grant
    Filed: May 24, 2017
    Date of Patent: June 11, 2019
    Assignee: ARM Finance Overseas Limited
    Inventor: David Yiu-Man Lau
  • Patent number: 10310818
    Abstract: Floating point chained multiply accumulation is performed using a multiplier to multiply a first floating point operand by a second floating point operand to generate an unrounded multiplication result. An adder then adds a third floating point operand to the unrounded multiplication result to generate an unrounded accumulation result. Rounding circuitry then applies both the rounding associated with the unrounded multiplication result and rounding associated with the unrounded accumulation result to generate a rounded accumulation result.
    Type: Grant
    Filed: July 19, 2017
    Date of Patent: June 4, 2019
    Assignee: ARM Limited
    Inventor: Felix Segundo Missel Manzo
  • Patent number: 10248417
    Abstract: A method for calculating FP (Full Precision) and PP (Partial Precision) values, performed by an ID (Instruction Decode) unit, contains at least the following steps: decoding an instruction request from a compiler; executing a loop m times to generate m microinstructions for calculating first-type data, or n times to generate n microinstructions for calculating second-type data according to the instruction mode of the instruction request, thereby enabling ALGs (Arithmetic Logic Groups) to execute lanes of a thread. m is less than n and the precision of the first-type data is lower than the precision of the second-type data.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: April 2, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Huaisheng Zhang, Dacheng Liang, Boming Chen, Renyu Bian
  • Patent number: 10241756
    Abstract: A floating-point unit for performing tiny detection in floating-point operations. The floating-point unit is configured to implement a fused-multiply-add operation on three wide operands. The floating-point unit comprises: a multiplier, a left shifter, a right shifter, a select circuit comprising a 3-to-2 compressor, an adder connected to the dataflow from the select circuit, a first feedback path connecting a carry output of the adder to the select circuit, and a second feedback path connecting an output of the adder to the left and right shifters for passing an intermediate wide result through the left and right shifters. The adder is configured to provide an unrounded result for tiny detection.
    Type: Grant
    Filed: July 11, 2017
    Date of Patent: March 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Kroener, Silvia M. Mueller, Andreas Wagner
  • Patent number: 10235135
    Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add low part). The unit pre-aligns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are pre-aligned to each fit on one shifter of the two fixed length shifters.
    Type: Grant
    Filed: July 17, 2017
    Date of Patent: March 19, 2019
    Assignee: International Business Machines Corporation
    Inventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner
  • Patent number: 10140093
    Abstract: An apparatus and method are provided for estimating a shift amount when employing processing circuitry to perform a subtraction operation to subtract a second significand value of a second floating-point operand from a first significand value of a first floating-point operand in order to generate a difference value. Shift estimation circuitry then determines an estimated shift amount to be applied to the difference value. The shift estimation circuitry comprises significand analysis circuitry to generate, from analysis of the significand values of the two floating-point operands, a first bit string identifying a most significant bit position within the difference value that is predicted to have its bit set to a determined value. In parallel, shift limiting circuitry generates from an exponent value a second bit string identifying a shift limit bit position.
    Type: Grant
    Filed: March 30, 2017
    Date of Patent: November 27, 2018
    Assignee: ARM Limited
    Inventors: David Raymond Lutz, Ian Michael Caulfield
  • Patent number: 10101998
    Abstract: A Vector Checksum instruction. Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.
    Type: Grant
    Filed: May 25, 2017
    Date of Patent: October 16, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Eric M. Schwarz
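    The end-around-carry accumulation described in this abstract can be illustrated with a minimal Python sketch. The 32-bit element width, the choice of the top bit as the carry-out position and bit 0 as the re-entry position, and the function names are assumptions made for illustration; they are not taken from the patent.

```python
def end_around_carry_add(acc: int, value: int, width: int = 32) -> int:
    """Add two width-bit values; a carry out of the top bit wraps to bit 0."""
    mask = (1 << width) - 1
    total = (acc & mask) + (value & mask)
    # total >> width is the carry out (0 or 1); fold it back into the sum.
    return (total & mask) + (total >> width)


def vector_checksum(elements, width: int = 32) -> int:
    """Accumulate the elements one by one using end-around-carry addition."""
    acc = 0  # assumed starting value of the result element
    for element in elements:
        acc = end_around_carry_add(acc, element, width)
    return acc
```

    With these assumptions, `vector_checksum([0xFFFFFFFF, 0x00000001])` overflows the 32-bit sum once, and the wrapped carry yields 1.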
  • Patent number: 10095516
    Abstract: An apparatus is described having an instruction execution pipeline that has a vector functional unit to support a vector multiply add instruction. The vector multiply add instruction is to multiply respective K bit elements of two vectors and accumulate a portion of each of their respective products with another respective input operand in an X bit accumulator, where X is greater than K.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: October 9, 2018
    Assignee: INTEL CORPORATION
    Inventors: Shay Gueron, Vlad Krasnov, Robert Valentine, Zeev Sperber, Amit Gradstein, Simon Rubanovich
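    As a rough sketch of the element-wise behavior this abstract describes, the following Python function multiplies K-bit elements and adds either the low or the high K bits of each product into an X-bit accumulator element. The values K = 52 and X = 64, the `high` selector, and the function name are illustrative assumptions, not details from the patent.

```python
def vector_multiply_add(acc, a, b, k=52, x=64, high=False):
    """Per element: acc + (selected K-bit portion of a*b), kept to X bits."""
    k_mask = (1 << k) - 1
    x_mask = (1 << x) - 1
    result = []
    for acc_elem, a_elem, b_elem in zip(acc, a, b):
        # Only the low K bits of each source element take part in the product.
        product = (a_elem & k_mask) * (b_elem & k_mask)  # up to 2*K bits
        # Select the high or the low K-bit portion of the product.
        portion = (product >> k) if high else (product & k_mask)
        # Accumulate into the wider X-bit element, wrapping on overflow.
        result.append((acc_elem + portion) & x_mask)
    return result
```

    Because X is greater than K, several K-bit portions can be accumulated before the accumulator element can overflow, which is the headroom the abstract refers to.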
  • Patent number: 10001995
    Abstract: A processor includes a decode unit to decode a packed data alignment plus compute instruction. The instruction is to indicate a first set of one or more source packed data operands that is to include first data elements, a second set of one or more source packed data operands that is to include second data elements, and at least one data element offset. An execution unit, in response to the instruction, is to store a result packed data operand that is to include result data elements that each have a value of an operation performed with a pair of a data element of the first set of source packed data operands and a data element of the second set of source packed data operands. The execution unit is to apply the at least one data element offset to at least a corresponding one of the first and second sets of source packed data operands. The at least one data element offset is to counteract any lack of correspondence between the data elements of each pair in the first and second sets of source packed data operands.
    Type: Grant
    Filed: June 2, 2015
    Date of Patent: June 19, 2018
    Assignee: Intel Corporation
    Inventors: Edwin Jan Van Dalen, Alexander Augusteijn, Martinus C. Wezelenburg, Steven Roos
  • Patent number: 9959093
    Abstract: A binary fused multiply-add floating-point unit configured to operate on an addend, a multiplier, and a multiplicand. The unit is configured to receive as the addend an unrounded result of a prior operation executed in the unit via an early result feedback path; to perform an alignment shift of the unrounded addend on an unrounded exponent and an unrounded mantissa; as well as perform a rounding correction for the addend in parallel to the actual alignment shift, responsive to a rounding-up signal.
    Type: Grant
    Filed: June 29, 2016
    Date of Patent: May 1, 2018
    Assignee: International Business Machines Corporation
    Inventors: Michael Klein, Klaus M. Kroener, Cédric Lichtenau, Silvia Melitta Mueller
  • Patent number: 9952829
    Abstract: A binary fused multiply-add floating-point unit configured to operate on an addend, a multiplier, and a multiplicand. The unit is configured to receive as the addend an unrounded result of a prior operation executed in the unit via an early result feedback path; to perform an alignment shift of the unrounded addend on an unrounded exponent and an unrounded mantissa; as well as perform a rounding correction for the addend in parallel to the actual alignment shift, responsive to a rounding-up signal.
    Type: Grant
    Filed: February 1, 2016
    Date of Patent: April 24, 2018
    Assignee: International Business Machines Corporation
    Inventors: Michael Klein, Klaus M. Kroener, Cédric Lichtenau, Silvia Melitta Mueller
  • Patent number: 9928035
    Abstract: A multiply and accumulation (MAC) unit for multiplying a provided first and a provided second multiplicand and for adding a provided summand to the resulting product is described. The MAC includes at least one multiplication block which is configured to multiply a first input signal and a second input signal, wherein the first input signal is given in a carry-save adder format and the second input signal is given in a binary format, wherein the multiplication result is provided in a carry-save format, and a carry-save adder which is configured to add to the result of the multiplication the provided summand.
    Type: Grant
    Filed: May 18, 2016
    Date of Patent: March 27, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Marcel Kossel
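    The carry-save representation mentioned in this abstract keeps a value as a (sum, carry) pair so that no carry has to ripple until the very end. The Python sketch below shows that general principle on non-negative integers; it is not the claimed circuit, and the function names and the shift-and-add loop are assumptions chosen for illustration.

```python
def carry_save_add(a: int, b: int, c: int):
    """3:2 compressor: returns (sum, carry) with sum + carry == a + b + c."""
    partial_sum = a ^ b ^ c                      # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # carries, moved up one bit
    return partial_sum, carry


def mac_carry_save(x: int, y: int, summand: int) -> int:
    """Compute x * y + summand for non-negative ints, staying in carry-save form."""
    s, c = summand, 0  # the summand seeds the redundant (sum, carry) pair
    shift = 0
    while y >> shift:
        if (y >> shift) & 1:
            # Fold in one partial product without propagating carries.
            s, c = carry_save_add(s, c, x << shift)
        shift += 1
    # A single carry-propagating addition resolves the redundant form.
    return s + c
```

    For example, `carry_save_add(5, 6, 7)` returns `(4, 14)`, whose members add up to 18, and `mac_carry_save(13, 11, 7)` returns 150.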
  • Patent number: 9891886
    Abstract: A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C. An evaluation is made to detect whether values of A, B, and/or C meet a sufficient condition for performing a joint accumulation of C with partial products of A and B. If so, a joint accumulation of C is done with partial products of A and B and result of the joint accumulation is rounded. If not, then a primary accumulation is done of the partial products of A and B. This generates an unrounded non-redundant result of the primary accumulation. The unrounded result is then truncated to generate an unrounded non-redundant intermediate result vector that excludes one or more least significant bits of the unrounded non-redundant result. A secondary accumulation is then performed, adding or subtracting C to the unrounded non-redundant intermediate result vector. Finally, the result of the secondary accumulation is rounded.
    Type: Grant
    Filed: June 24, 2015
    Date of Patent: February 13, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
    Inventor: Thomas Elmer
  • Patent number: 9841948
    Abstract: Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages.
    Type: Grant
    Filed: August 12, 2015
    Date of Patent: December 12, 2017
    Assignee: QUALCOMM Incorporated
    Inventor: Liang-Kai Wang
  • Patent number: 9829956
    Abstract: An approach is provided for enabling power reduction in floating-point operations. In one example, a system receives floating-point numbers of a fused multiply-add instruction. The system determines the fused multiply-add instruction does not require compliance with a standard of precision for floating-point numbers. The system generates gating signals for an integrated circuit that is configured to perform operations of the fused multiply-add instruction. The system then sends the gating signals to the integrated circuit to turn off a plurality of logic gates included in the integrated circuit.
    Type: Grant
    Filed: November 21, 2012
    Date of Patent: November 28, 2017
    Assignee: NVIDIA Corporation
    Inventors: David Conrad Tannenbaum, Colin Sprinkle, Stuart F. Oberman, Ming Y. Siu, Srinivasan Iyer, Ian-Chi Yan Kwong
  • Patent number: 9778908
    Abstract: A microprocessor splits a fused multiply-accumulate operation of the form A*B+C into first and second multiply-accumulate sub-operations to be performed by a multiplier and an adder. The first sub-operation at least multiplies A and B, and conditionally also accumulates C to the partial products of A and B to generate an unrounded nonredundant sum. The unrounded nonredundant sum is stored in memory shared by the multiplier and adder for an indefinite time period, enabling the multiplier and adder to perform other operations unrelated to the multiply-accumulate operation. The second sub-operation conditionally accumulates C to the unrounded nonredundant sum if C is not already incorporated into the value, and then generates a final rounded result.
    Type: Grant
    Filed: June 24, 2015
    Date of Patent: October 3, 2017
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventor: Thomas Elmer
  • Patent number: 9720648
    Abstract: A method for hiding implicit bit corrections in a partial product adder array in a binary and hexadecimal floating-point multiplier such that no additional adder stages are needed for the implicit bit corrections. Two leading-one correction terms are generated for the fraction in the multiplier floating-point number and two leading-one correction terms are generated for the fraction in the multiplicand floating-point number. The floating-point numbers may be single-precision or double-precision. Each leading-one correction term for the single-precision case is appended to the left of an intermediate partial product sum in the adder array that is an input to an adder so as not to extend the bits in the input further to the left than the bits in another input to the adder. Each leading-one correction term for the double-precision case replaces an adder input that is unused when base-2 floating-point numbers are multiplied.
    Type: Grant
    Filed: December 22, 2014
    Date of Patent: August 1, 2017
    Assignee: International Business Machines Corporation
    Inventors: Silvia M. Mueller, Son Dao Trong
  • Patent number: 9645792
    Abstract: At least one processor may emulate a fused multiply-add operation for a first operand, a second operand, and a third operand. The at least one processor may determine an intermediate value based at least in part on multiplying the first operand with the second operand, determine at least one of an upper intermediate value or a lower intermediate value, wherein determining the upper intermediate value comprises rounding, towards zero, the intermediate value by a specified number of bits, and wherein determining the lower intermediate value comprises subtracting the upper intermediate value from the intermediate value, determine an upper value and a lower value based at least in part on adding or subtracting the third operand to one of the upper intermediate value or the lower intermediate value, and determine an emulated fused multiply-add result by adding the upper value and the lower value.
    Type: Grant
    Filed: August 18, 2014
    Date of Patent: May 9, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Pramod Vasant Argade, Andrew Evan Gruber, Chiente Ho, Stewart Griffin Hall, Lin Chen
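    The split-and-recombine flow in this abstract can be sketched in Python under some loud assumptions: Python's double-precision float stands in for the wide intermediate value, the operands are assumed to be exactly representable in single precision (so their product is exact in double), the upper part keeps 24 significant bits, and the third operand is added to the upper part only. Overflow, subnormals, and a final rounding to single precision are ignored. None of these choices or names come from the patent.

```python
import math


def split_toward_zero(p: float, keep_bits: int = 24):
    """Split p into (upper, lower): upper keeps keep_bits significant bits,
    truncated toward zero, and lower = p - upper is the exact remainder."""
    if p == 0.0 or not math.isfinite(p):
        return p, 0.0
    m, e = math.frexp(p)                 # p == m * 2**e, 0.5 <= |m| < 1
    scaled = math.ldexp(m, keep_bits)    # move keep_bits bits above the point
    upper = math.ldexp(float(math.trunc(scaled)), e - keep_bits)
    lower = p - upper                    # exact: only the dropped low bits remain
    return upper, lower


def emulated_fma(a: float, b: float, c: float, keep_bits: int = 24) -> float:
    """Sketch of a*b + c via an upper/lower split of the intermediate product."""
    intermediate = a * b                 # assumed exact for single-precision inputs
    upper, lower = split_toward_zero(intermediate, keep_bits)
    upper_value = upper + c              # third operand goes to the upper part
    lower_value = lower                  # lower part is carried through unchanged
    return upper_value + lower_value
```

    With `a = 1.0 + 2.0**-23`, the call `emulated_fma(a, a, -1.0)` returns `2.0**-22 + 2.0**-46`. The `2.0**-46` term survives because it travels in the lower part; it would be lost if the product were first rounded to 24 bits.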
  • Patent number: 9563400
    Abstract: A method for hiding implicit bit corrections in a partial product adder array in a binary and hexadecimal floating-point multiplier such that no additional adder stages are needed for the implicit bit corrections. Two leading-one correction terms are generated for the fraction in the multiplier floating-point number and two leading-one correction terms are generated for the fraction in the multiplicand floating-point number. The floating-point numbers may be single-precision or double-precision. Each leading-one correction term for the single-precision case is appended to the left of an intermediate partial product sum in the adder array that is an input to an adder so as not to extend the bits in the input further to the left than the bits in another input to the adder. Each leading-one correction term for the double-precision case replaces an adder input that is unused when base-2 floating-point numbers are multiplied.
    Type: Grant
    Filed: September 18, 2014
    Date of Patent: February 7, 2017
    Assignee: International Business Machines Corporation
    Inventors: Silvia M. Mueller, Son Dao Trong
  • Patent number: 9542154
    Abstract: Systems and methods of performing fused multiply add (FMA) operations are provided. In one embodiment, the length of the adder used by the FMA operation is less than 3*N, where N is the number of bits in the mantissa term of a floating point number. A mask may be used to perform the addition portion of the FMA operation using the adder. A second mask may be used to denormalize the result of the addition portion of the FMA operation if an underflow occurs.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: January 10, 2017
    Assignee: Intel Corporation
    Inventors: Simon Rubanovich, Thierry Pons, Amit Gradstein, Zeev Sperber
  • Patent number: 9519458
    Abstract: A fused-multiply-add system is disclosed. The fused-multiply-add system includes a multiplier to multiply first and second operands and to provide at least one product. The fused-multiply-add system also includes an alignment shifter for aligning a third operand with the at least one product to provide an aligned third operand. The fused-multiply-add system also includes an adder and a subtractor coupled to the multiplier and the alignment shifter for performing two asymmetrical additions in parallel paths. The fused-multiply-add system also includes at least one leading zero counter for counting a number of leading zero bits provided by at least one of the adder and the subtractor to provide at least one normalization shift amount. Finally, the fused-multiply-add system includes a multiplexer coupled to the adder and the subtractor for providing an appropriate output based upon a sign bit.
    Type: Grant
    Filed: April 8, 2014
    Date of Patent: December 13, 2016
    Assignee: Cadence Design Systems, Inc.
    Inventors: David H. C. Chen, William A. Huffman
  • Patent number: 9465575
    Abstract: A fused floating-point multiply-add element includes a multiplier that generates a product, and a shifter that shifts an addend within a narrow range. Interpreting logic analyzes the magnitude of the addend relative to the product and then causes logic arrays to position the shifted addend within the left, center, or right portions of a composite register depending on the magnitude of the addend relative to the product. The interpreting logic also forces other portions of the composite register to zero. When the addend is zero, the interpreting logic forces all portions of the composite register to zero. Final combining logic then adds the contents of the composite register to the product.
    Type: Grant
    Filed: August 5, 2013
    Date of Patent: October 11, 2016
    Assignee: NVIDIA Corporation
    Inventors: Srinivasan Iyer, David Conrad Tannenbaum, Stuart F. Oberman, Ming (Michael) Y. Siu
  • Patent number: 9449198
    Abstract: A system including a first circuit and a second circuit. The first circuit includes a multiplier array to receive a first operand and a second operand and generate a plurality of outputs, an adder array to receive the plurality of outputs and generate a partial product of the first operand and the second operand including partial sums and carry bits, which are stored in a plurality of register arrays. The second circuit generates the product of the first operand and the second operand by implementing a two-stage reduction of the partial product of the first operand and the second operand. A first stage includes rearranging the partial sums and carry bits as two multi-bit integers. A second stage includes generating a plurality of multi-bit integers based on the two multi-bit integers, and generating the product of the first operand and the second operand based on the plurality of multi-bit integers.
    Type: Grant
    Filed: April 6, 2015
    Date of Patent: September 20, 2016
    Assignee: Marvell International LTD.
    Inventors: Fei Sun, Chang Shu