Multiplication Followed By Addition Patents (Class 708/501)

Patent number: 11455766
Abstract: A processor selectively adjusts the precision of data for different functional units. Specified functional units of the processor, such as a shader processing unit of a graphics processing unit (GPU), include a zeroing module to store, based on the states of corresponding precision flags, a data value of zero at a specified portion of an input and/or output data operand. The functional unit then processes the data including the zeroed portion. Because a portion of the data has been zeroed, the functional unit consumes less power during data processing. Furthermore, the precision flags are set such that the reduced precision of the data does not significantly impact a user experience.
Type: Grant
Filed: September 18, 2018
Date of Patent: September 27, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Pramod V. Argade, Daniel Nikolai Peroni
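
The zeroing idea in this abstract can be illustrated in software. The following is a minimal sketch, assuming a 32-bit IEEE 754 operand whose low-order significand bits are the "specified portion"; the function name `zero_low_bits` and the choice of which bits to clear are illustrative assumptions, not details taken from the patent.

```python
import math
import struct


def zero_low_bits(value: float, nbits: int) -> float:
    """Return `value` as a float32 with its `nbits` lowest bits forced to zero.

    Hypothetical helper: the abstract does not say which portion of the
    operand is zeroed; clearing low significand bits is an assumption.
    """
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # Clear the low-order bits while keeping the word 32 bits wide.
    bits &= ~((1 << nbits) - 1) & 0xFFFFFFFF
    # Reinterpret the masked integer as a float32 again.
    (result,) = struct.unpack("<f", struct.pack("<I", bits))
    return result


if __name__ == "__main__":
    # Clearing 16 bits leaves 7 explicit significand bits.
    print(zero_low_bits(math.pi, 16))
```

Clearing 16 bits of a float32 leaves the same field layout as BF16, which is one reason this kind of masking is a common way to model reduced precision.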

Patent number: 11429349
Abstract: Floating-point Multiply-Add, Accumulate Unit, supporting BF16 format for Multiply-Accumulate operations, and FP32 Single-Precision Addition complying with the IEEE 754 Standard. The Multiply-Accumulate unit uses higher radix and longer internal 2's complement significand representation to facilitate precision as well as comparison and operation with negative numbers. The addition is performed using Carry-Save format to avoid long carry propagation and speed up the operation. Operations including overflow detection, zero detection and sign extension are adopted for 2's complement and Carry-Save format. Handling of Overflow and Sign Extension allows for fast operation relatively independent of the size of the accumulator.
Type: Grant
Filed: August 9, 2021
Date of Patent: August 30, 2022
Assignee: SambaNova Systems, Inc.
Inventors: Vojin G. Oklobdzija, Matthew M. Kim
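
To show why a carry-save format avoids long carry propagation, here is a minimal sketch of generic carry-save accumulation on non-negative integers. It illustrates the textbook 3:2 compressor only; the names `carry_save_add` and `accumulate` are hypothetical, and nothing here reflects the patented circuit's radix, width, or sign handling.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: reduce three addends to a (sum, carry) pair.

    Every bit position is computed independently, so no carry ripples
    across the word during this step.
    """
    partial_sum = a ^ b ^ c                      # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted left
    return partial_sum, carry


def accumulate(values: list[int]) -> int:
    """Accumulate non-negative integers in carry-save form.

    Only the final line performs a carry-propagating addition.
    """
    partial_sum, carry = 0, 0
    for value in values:
        partial_sum, carry = carry_save_add(partial_sum, carry, value)
    return partial_sum + carry


if __name__ == "__main__":
    print(accumulate([3, 5, 7, 11]))
```

The invariant is that `partial_sum + carry` always equals the running total, so the expensive carry-propagating add is deferred to a single step at the end.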

Patent number: 11321096
Abstract: Hardware units and methods for performing matrix multiplication via a multi-stage pipeline wherein the storage elements associated with one or more stages of the pipeline are clock gated based on the data elements and/or portions thereof that are known to have a zero value (or can be treated as having a zero value). In some cases, the storage elements may be clock gated on a per data element basis based on whether the data element has a zero value (or can be treated as having a zero value). In other cases, the storage elements may be clock gated on a partial element basis based on the bit width of the data elements. For example, if the bit width of the data elements is less than a maximum bit width for the data elements then a portion of the bits related to that data element can be treated as having a zero value and a portion of the storage elements associated with that data element may not be clocked. In yet other cases the storage elements may be clock gated on both a per element and a partial element basis.
Type: Grant
Filed: November 5, 2018
Date of Patent: May 3, 2022
Assignee: Imagination Technologies Limited
Inventors: Christopher Martin, Azzurra Pulimeno

Patent number: 11269632
Abstract: An instruction to convert data from a source data type to a target data type is obtained. The source data type is selected from one or more source data types supported by the instruction, and the target data type is selected from one or more target data types supported by the instruction. Based on a selected data type of the source data type or the target data type, a determination is made of a rounding mode for use by the instruction. The rounding mode is implicitly set based on the selected data type; it is assigned to the selected data type. A conversion of the data from the source data type to the target data type is performed. The conversion includes performing a rounding operation using the rounding mode implicitly set. The performing the conversion provides a result in the target data type, which is written to a select location.
Type: Grant
Filed: June 17, 2021
Date of Patent: March 8, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Laith M. AlBarakat, Jonathan D. Bradbury, Timothy Slegel, Cedric Lichtenau, Joachim von Buttlar

Patent number: 11237909
Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.
Type: Grant
Filed: August 21, 2020
Date of Patent: February 1, 2022
Assignee: International Business Machines Corporation
Inventors: Robert F. Enenkel, Christopher Anand, Adele Olejarz, Lucas Dutton
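
The reduce, approximate, restore pattern named in this abstract is a standard way to evaluate elementary functions. The sketch below applies it to exp(x) in software, assuming a reduction by multiples of ln 2 and a Taylor polynomial; the abstract does not say which functions or factors the hardware instructions compute, so `range_reduce`, `restoration_factor`, and `exp_sketch` are hypothetical names for illustration only.

```python
import math

LN2 = math.log(2.0)


def range_reduce(x: float) -> tuple[int, float]:
    """Split x into n*ln(2) + r with |r| <= ln(2)/2 (assumed reduction)."""
    n = round(x / LN2)
    return n, x - n * LN2


def restoration_factor(n: int) -> float:
    """Factor that undoes the reduction: exp(n*ln(2)) == 2**n."""
    return math.ldexp(1.0, n)


def exp_poly(r: float, terms: int = 14) -> float:
    """Taylor polynomial for exp(r), evaluated in Horner form."""
    acc = 1.0
    for k in range(terms, 0, -1):
        acc = 1.0 + acc * r / k
    return acc


def exp_sketch(x: float) -> float:
    """Final multiply: approximation on the reduced value times the factor."""
    n, r = range_reduce(x)
    return exp_poly(r) * restoration_factor(n)


if __name__ == "__main__":
    print(exp_sketch(1.0), math.exp(1.0))
```

Keeping the polynomial argument small is what lets a short polynomial reach near full double precision; the restoration step then costs a single multiplication.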

Patent number: 11226791
Abstract: An arithmetic processing device includes an exponent setting circuit that, when either or both of a first operand and a second operand included in a multiply-add operation instruction are zero, sets an exponent of the first operand to a first set value and sets an exponent of the second operand to a second set value. An exponent calculation circuit calculates an exponent obtained by a multiply-add operation, based on the exponents of the first and second operands outputted by the exponent setting circuit and an exponent of a third operand included in the multiply-add operation instruction. The sum of the first set value and the second set value is set so that a bit position of the third operand is located on a higher-order bit side than the most significant bit of the sum of the first operand and the second operand.
Type: Grant
Filed: October 2, 2019
Date of Patent: January 18, 2022
Assignee: FUJITSU LIMITED
Inventors: Takio Ono, Hiroyuki Wada

Patent number: 11175892
Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
Type: Grant
Filed: November 20, 2017
Date of Patent: November 16, 2021
Assignee: Intel Corporation
Inventors: Martin Langhammer, Dongdong Chen

Patent number: 11163531
Abstract: A method and a MAC unit that may include an accumulation unit and a multiplier. The accumulation unit includes a first part, a second part and a third part. The first part may calculate a truncated sum. The second part may be configured to (a) receive, during each calculation cycle, a carry out of an add operation performed during a calculation cycle, (b) receive a sign bit of an intermediate product calculated during the calculation cycle, (c) calculate, by the counter logic, a counter logic value, and (d) convert, after a start of a last calculation cycle of the calculation cycles, an output value of the counter logic to an intermediate value having a two's complement format. The third part may be configured to calculate an output value of the MAC unit based on the intermediate value and a truncated sum calculated by the first part of the accumulation unit.
Type: Grant
Filed: June 21, 2019
Date of Patent: November 2, 2021
Assignee: DSP GROUP LTD.
Inventors: Moshe Haiut, Assaf Ganor
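
One way to read the three-part accumulator is as a narrow adder plus a small counter that tracks overflow. The sketch below is an interpretation under stated assumptions: products are signed two's complement values of `width` bits, the counter adds each carry-out and subtracts each product sign bit, and the final result combines both parts. The function name and the exact counter update rule are assumptions, not details confirmed by the abstract.

```python
def split_accumulate(products: list[int], width: int = 8) -> int:
    """Sum signed `width`-bit products with a truncated adder and a counter.

    Assumed scheme: the truncated sum keeps only `width` bits; the counter
    records (carry-out - sign bit) for every addition and supplies the
    high-order part of the result at the end.
    """
    mask = (1 << width) - 1
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    truncated = 0   # first part: sum modulo 2**width
    counter = 0     # second part: net count of carries and sign bits
    for product in products:
        if not lo <= product <= hi:
            raise ValueError("product does not fit in the assumed width")
        unsigned = product & mask            # two's complement encoding
        sign = unsigned >> (width - 1)       # sign bit of the product
        total = truncated + unsigned
        counter += (total >> width) - sign   # carry-out minus sign bit
        truncated = total & mask
    # Third part: combine the counter (high bits) with the truncated sum.
    return (counter << width) + truncated


if __name__ == "__main__":
    print(split_accumulate([-3, 5, -128, 127, 127, -1]))
```

The scheme works because each negative product is encoded as its value plus 2**width, so subtracting the sign bit from the carry count cancels that offset exactly.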

Patent number: 11137983
Abstract: Configurable specialized processing blocks, such as DSP blocks, are described that implement fixed and floating-point functionality in a single mixed architecture on a programmable device. The described architecture reduces the need to construct floating-point functions outside the configurable specialized processing block, thereby minimizing hardware cost and area. The disclosed architecture also introduces pipelining into the DSP block in order to ensure the floating-point multiplication and addition functions remain in synchronicity, thereby increasing the maximum frequency at which the DSP block can operate. Moreover, the disclosed architecture includes logic circuitry to support floating-point exception handling.
Type: Grant
Filed: September 27, 2019
Date of Patent: October 5, 2021
Assignee: Altera Corporation
Inventors: Keone Streicher, Martin Langhammer, Yi-Wen Lin, Hyun Yi

Patent number: 11106598
Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
Type: Grant
Filed: December 16, 2019
Date of Patent: August 31, 2021
Assignee: Shanghai Cambricon Information Technology Co., Ltd.
Inventors: Yao Zhang, Bingrui Wang

Patent number: 11061672
Abstract: A microprocessor is configured for unchained and chained modes of split execution of a fused compound arithmetic operation. In both modes of split execution, a first execution unit executes only a first part of the fused compound arithmetic operation and produces an intermediate result thereof, and a second instruction execution unit receives the intermediate result and executes a second part of the fused compound arithmetic operation to produce a final result. In the unchained mode, execution is accomplished by dispatching separate split-execution microinstructions to the first and second instruction execution units. In the chained mode, execution is accomplished by dispatching a single split-execution microinstruction to the first instruction execution unit and sending a chaining control signal or signal group to the second execution unit, causing it to execute its part of the fused arithmetic operation without needing an instruction.
Type: Grant
Filed: July 5, 2016
Date of Patent: July 13, 2021
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
Inventors: Thomas Elmer, Nikhil A. Patil

Patent number: 11016731
Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a J-Bit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on J-bit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.
Type: Grant
Filed: March 29, 2019
Date of Patent: May 25, 2021
Assignee: Intel Corporation
Inventors: Amit Gradstein, Simon Rubanovich, Zeev Sperber

Patent number: 11010131
Abstract: An integrated circuit may include a floating-point adder. The adder may be implemented using a dual-path adder architecture having a near path and a far path. The near path may include a leading zero anticipator (LZA), a comparison circuit for comparing an exponent value to an LZA count, and associated circuitry for handling subnormal numbers. The far path may include a subtraction circuit for computing the difference between a received exponent value and a minimum exponent value, at least two shifters for shifting far greater and far lesser mantissa values in parallel, and associated circuitry for handling subnormal numbers. The adder may be dynamically configured to support a first mode that processes FP16 at inputs and outputs, a second mode that processes modified FP16? inputs, and a third mode that processes FP16? at inputs and outputs.
Type: Grant
Filed: September 14, 2017
Date of Patent: May 18, 2021
Assignee: Intel Corporation
Inventors: Martin Langhammer, Bogdan Pasca

Patent number: 10977039
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements.
Type: Grant
Filed: November 1, 2019
Date of Patent: April 13, 2021
Assignee: Intel Corporation
Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Mark Charney, Robert Valentine, Jesus Corbal, Binwei Yang

Patent number: 10776207
Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.
Type: Grant
Filed: September 6, 2018
Date of Patent: September 15, 2020
Assignee: International Business Machines Corporation
Inventors: Robert F. Enenkel, Christopher Anand, Lucas Dutton, Adele Olejarz

Patent number: 10761807
Abstract: This invention discloses a floating-point number operation circuit and a method thereof. The floating-point number operation circuit is configured to perform a fused multiplication and accumulation (fused mac) operation or a multiplication and accumulation (mac) operation on a first operand, a second operand, and a third operand, or perform a multiplication operation on the first operand and the second operand. The floating-point number operation circuit includes two rounding circuits, a multiplication circuit, a selection circuit, a control circuit, and an addition circuit. The control circuit controls the scheduling of various operations and the use of resources on each calculation path.
Type: Grant
Filed: October 23, 2018
Date of Patent: September 1, 2020
Assignee: REALTEK SEMICONDUCTOR CORPORATION
Inventor: Chia-I Chen

Patent number: 10713012
Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation.
Type: Grant
Filed: October 15, 2018
Date of Patent: July 14, 2020
Assignee: Intel Corporation
Inventors: Aditya Varma, Michael Espig

Patent number: 10678510
Abstract: The present embodiments relate to integrated circuits with floating-point arithmetic circuitry that handles normalized and denormalized floating-point numbers. The floating-point arithmetic circuitry may include a normalization circuit and a rounding circuit, and the floating-point arithmetic circuitry may generate a first result in the form of a normalized, unrounded floating-point number and a second result in the form of a normalized, rounded floating-point number. If desired, the floating-point arithmetic circuitry may be implemented in specialized processing blocks.
Type: Grant
Filed: September 25, 2017
Date of Patent: June 9, 2020
Assignee: Altera Corporation
Inventor: Martin Langhammer

Patent number: 10671347
Abstract: Embodiments are directed to a computer implemented method for executing machine instructions in a central processing unit. The method includes obtaining, by a processor system, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture. The method further includes executing the machine instruction, wherein the executing includes loading a multiplicand into a multiplicand register, and loading a multiplier into a multiplier register. The executing further generates an intermediate product having least significant bits by multiplying the multiplicand and the multiplier. The executing further includes generating a rounded product by performing a probability analysis on the least significant bits of the intermediate product, and initiating a rounding operation on the intermediate product to produce the rounded product based at least in part on the probability analysis.
Type: Grant
Filed: January 28, 2016
Date of Patent: June 2, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan D. Bradbury, Steven R. Carlough, Brian R. Prasky, Eric M. Schwarz

Patent number: 10649730
Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add low part). The unit prealigns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are prealigned to each fit on one shifter of the two fixed length shifters.
Type: Grant
Filed: June 26, 2019
Date of Patent: May 12, 2020
Assignee: International Business Machines Corporation
Inventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner

Patent number: 10481869
Abstract: Techniques are disclosed relating to circuitry configured to perform floating-point operations such as fused multiply-addition (FMA) with multiple paths and power control. In some embodiments, an FMA unit includes a near path and multiple far paths and is configured to select a path based on a determined exponent difference. In some embodiments, the FMA unit is configured to operate portions of non-selected paths in a low power state.
Type: Grant
Filed: November 10, 2017
Date of Patent: November 19, 2019
Assignee: Apple Inc.
Inventors: Liang-Kai Wang, Ting Yu, Yu Sun

Patent number: 10445066
Abstract: Embodiments are directed to a computer implemented method for executing machine instructions in a central processing unit. The method includes obtaining, by a processor system, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture. The method further includes executing the machine instruction, wherein the executing includes loading a multiplicand into a multiplicand register, and loading a multiplier into a multiplier register. The executing further generates an intermediate product having least significant bits by multiplying the multiplicand and the multiplier. The executing further includes generating a rounded product by performing a probability analysis on the least significant bits of the intermediate product, and initiating a rounding operation on the intermediate product to produce the rounded product based at least in part on the probability analysis.
Type: Grant
Filed: February 14, 2017
Date of Patent: October 15, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan D. Bradbury, Steven R. Carlough, Brian R. Prasky, Eric M. Schwarz

Patent number: 10379811
Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add low part). The unit prealigns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are prealigned to each fit on one shifter of the two fixed length shifters.
Type: Grant
Filed: November 16, 2017
Date of Patent: August 13, 2019
Assignee: International Business Machines Corporation
Inventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner

Patent number: 10331407
Abstract: A method for performing tiny detection in floating-point operations with a floating-point unit. The floating-point unit is configured to implement a fused-multiply-add operation on three wide operands. The floating-point unit comprises: a multiplier, a left shifter, a right shifter, a select circuit comprising a 3-to-2 compressor, an adder connected to the dataflow from the select circuit, a first feedback path connecting a carry output of the adder to the select circuit, and a second feedback path connecting an output of the adder to the left and right shifters for passing an intermediate wide result through the left and right shifters. The adder is configured to provide an unrounded result for tiny detection.
Type: Grant
Filed: November 11, 2017
Date of Patent: June 25, 2019
Assignee: International Business Machines Corporation
Inventors: Michael K. Kroener, Silvia M. Mueller, Andreas Wagner

Patent number: 10318290
Abstract: A first floating-point operation unit receives first and second variables and performs a first operation generating a first output. A first rounding unit receives and rounds the first output to generate a second output if a control bit is in a first state. A second floating-point operation unit receives a third variable and either the first output or the second output and performs a second operation on the third variable and either the first output or the second output, to generate a third output. The second floating-point operation unit receives and operates on the first output if the control bit is in the first state, or the second output if the control bit is in the second state. A second rounding unit receives and rounds the third output.
Type: Grant
Filed: May 24, 2017
Date of Patent: June 11, 2019
Assignee: ARM Finance Overseas Limited
Inventor: David Yiu-Man Lau
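
Whether the intermediate result is rounded before the second operation changes the final answer in some cases. The following minimal sketch models that choice for a multiply followed by an add, using exact rational arithmetic and treating conversion to a Python float as the rounding step. The function `mul_add` and its `round_intermediate` flag are hypothetical stand-ins for the control bit; they do not reproduce the circuit or its bit-state conventions.

```python
from fractions import Fraction


def mul_add(a: float, b: float, c: float, round_intermediate: bool) -> float:
    """Compute a*b + c with or without rounding the product first.

    Converting a Fraction to float rounds to the nearest double, which
    stands in for a hardware rounding unit in this sketch.
    """
    product = Fraction(a) * Fraction(b)          # exact product
    if round_intermediate:
        product = Fraction(float(product))       # first rounding (chained)
    return float(product + Fraction(c))          # final rounding


if __name__ == "__main__":
    a = 1.0 + 2.0 ** -30
    b = 1.0 - 2.0 ** -30
    c = -1.0
    # The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
    print(mul_add(a, b, c, round_intermediate=True))
    print(mul_add(a, b, c, round_intermediate=False))
```

With intermediate rounding the small term in the product is lost before the addition, while the single-rounding path preserves it; this is the usual numerical distinction between chained and fused multiply-add.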

Patent number: 10310818
Abstract: Floating point chained multiply accumulation is performed using a multiplier to multiply a first floating point operand by a second floating point operand to generate an unrounded multiplication result. An adder then adds a third floating point operand to the unrounded multiplication result to generate an unrounded accumulation result. Rounding circuitry then applies both the rounding associated with the unrounded multiplication result and rounding associated with the unrounded accumulation result to generate a rounded accumulation result.
Type: Grant
Filed: July 19, 2017
Date of Patent: June 4, 2019
Assignee: ARM Limited
Inventor: Felix Segundo Missel Manzo

Patent number: 10248417
Abstract: A method for calculating FP (Full Precision) and PP (Partial Precision) values, performed by an ID (Instruction Decode) unit, contains at least the following steps: decoding an instruction request from a compiler; executing a loop m times to generate m microinstructions for calculating first-type data, or n times to generate n microinstructions for calculating second-type data according to the instruction mode of the instruction request, thereby enabling ALGs (Arithmetic Logic Groups) to execute lanes of a thread. m is less than n and the precision of the first-type data is lower than the precision of the second-type data.
Type: Grant
Filed: August 24, 2017
Date of Patent: April 2, 2019
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
Inventors: Huaisheng Zhang, Dacheng Liang, Boming Chen, Renyu Bian

Patent number: 10241756
Abstract: A floating-point unit for performing tiny detection in floating-point operations. The floating-point unit is configured to implement a fused-multiply-add operation on three wide operands. The floating-point unit comprises: a multiplier, a left shifter, a right shifter, a select circuit comprising a 3-to-2 compressor, an adder connected to the dataflow from the select circuit, a first feedback path connecting a carry output of the adder to the select circuit, and a second feedback path connecting an output of the adder to the left and right shifters for passing an intermediate wide result through the left and right shifters. The adder is configured to provide an unrounded result for tiny detection.
Type: Grant
Filed: July 11, 2017
Date of Patent: March 26, 2019
Assignee: International Business Machines Corporation
Inventors: Michael K. Kroener, Silvia M. Mueller, Andreas Wagner

Patent number: 10235135
Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add low part). The unit prealigns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are prealigned to each fit on one shifter of the two fixed length shifters.
Type: Grant
Filed: July 17, 2017
Date of Patent: March 19, 2019
Assignee: International Business Machines Corporation
Inventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner

Patent number: 10140093
Abstract: An apparatus and method are provided for estimating a shift amount when employing processing circuitry to perform a subtraction operation to subtract a second significand value of a second floating-point operand from a first significand value of a first floating-point operand in order to generate a difference value. Shift estimation circuitry then determines an estimated shift amount to be applied to the difference value. The shift estimation circuitry comprises significand analysis circuitry to generate, from analysis of the significand values of the two floating-point operands, a first bit string identifying a most significant bit position within the difference value that is predicted to have its bit set to a determined value. In parallel, shift limiting circuitry generates from an exponent value a second bit string identifying a shift limit bit position.
Type: Grant
Filed: March 30, 2017
Date of Patent: November 27, 2018
Assignee: ARM Limited
Inventors: David Raymond Lutz, Ian Michael Caulfield

Patent number: 10101998
Abstract: A Vector Checksum instruction. Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.
Type: Grant
Filed: May 25, 2017
Date of Patent: October 16, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan D. Bradbury, Eric M. Schwarz
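
An end-around-carry addition can be sketched in a few lines. The example below assumes 32-bit elements, with the carry out of the most significant bit added back into the least significant bit; the element width, the carry positions, and the names `end_around_add` and `vector_checksum` are illustrative assumptions rather than the instruction's defined behavior.

```python
def end_around_add(total: int, element: int, width: int = 32) -> int:
    """Add `element` to `total` and wrap any carry-out back into bit 0."""
    mask = (1 << width) - 1
    raw = (total & mask) + (element & mask)
    # The carry-out is at most 1, so the wrapped result still fits in `width` bits.
    return (raw & mask) + (raw >> width)


def vector_checksum(elements: list[int], width: int = 32) -> int:
    """Fold the elements one by one using end-around-carry addition."""
    total = 0
    for element in elements:
        total = end_around_add(total, element, width)
    return total


if __name__ == "__main__":
    print(hex(vector_checksum([0xFFFFFFFF, 0x00000001])))
```

Because the carry is folded back in rather than discarded, the result does not depend on the order in which elements are added, which is the property that makes this style of sum useful for checksums.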

Patent number: 10095516
Abstract: An apparatus is described having an instruction execution pipeline that has a vector functional unit to support a vector multiply add instruction. The vector multiply add instruction is to multiply respective K bit elements of two vectors and accumulate a portion of each of their respective products with another respective input operand in an X bit accumulator, where X is greater than K.
Type: Grant
Filed: June 29, 2012
Date of Patent: October 9, 2018
Assignee: INTEL CORPORATION
Inventors: Shay Gueron, Vlad Krasnov, Robert Valentine, Zeev Sperber, Amit Gradstein, Simon Rubanovich
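
A per-lane model makes the K-bit and X-bit relationship concrete. This is a minimal sketch assuming K = 52 and X = 64, and assuming the "portion" of each product is either its low K bits or the bits above them; those values, the wrap-around on overflow, and the names `madd_lane` and `vector_madd` are assumptions chosen for illustration, not details stated in the abstract.

```python
def madd_lane(acc: int, a: int, b: int, high: bool = False,
              k: int = 52, x: int = 64) -> int:
    """Multiply two K-bit elements and add part of the product to an X-bit lane.

    Assumed behavior: `high=False` adds the low K bits of the 2K-bit
    product, `high=True` adds the remaining upper bits.
    """
    k_mask = (1 << k) - 1
    product = (a & k_mask) * (b & k_mask)          # up to 2*K bits wide
    portion = (product >> k) if high else (product & k_mask)
    return (acc + portion) & ((1 << x) - 1)        # wrap to the X-bit accumulator


def vector_madd(accs: list[int], va: list[int], vb: list[int],
                high: bool = False) -> list[int]:
    """Apply the lane operation element-wise across three equal-length vectors."""
    return [madd_lane(acc, a, b, high) for acc, a, b in zip(accs, va, vb)]


if __name__ == "__main__":
    print(vector_madd([10, 1], [3, 2 ** 51], [4, 2 ** 51]))
    print(vector_madd([10, 1], [3, 2 ** 51], [4, 2 ** 51], high=True))
```

Keeping X larger than K leaves headroom in each lane, so several partial products can be accumulated before carries between lanes have to be resolved.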

Patent number: 10001995
Abstract: A processor includes a decode unit to decode a packed data alignment plus compute instruction. The instruction is to indicate a first set of one or more source packed data operands that is to include first data elements, a second set of one or more source packed data operands that is to include second data elements, and at least one data element offset. An execution unit, in response to the instruction, is to store a result packed data operand that is to include result data elements that each have a value of an operation performed with a pair of a data element of the first set of source packed data operands and a data element of the second set of source packed data operands. The execution unit is to apply the at least one data element offset to at least a corresponding one of the first and second sets of source packed data operands. The at least one data element offset is to counteract any lack of correspondence between the data elements of each pair in the first and second sets of source packed data operands.
Type: Grant
Filed: June 2, 2015
Date of Patent: June 19, 2018
Assignee: Intel Corporation
Inventors: Edwin Jan Van Dalen, Alexander Augusteijn, Martinus C. Wezelenburg, Steven Roos

Patent number: 9959093
Abstract: A binary fused multiply-add floating-point unit configured to operate on an addend, a multiplier, and a multiplicand. The unit is configured to receive as the addend an unrounded result of a prior operation executed in the unit via an early result feedback path; to perform an alignment shift of the unrounded addend on an unrounded exponent and an unrounded mantissa; as well as perform a rounding correction for the addend in parallel to the actual alignment shift, responsive to a rounding-up signal.
Type: Grant
Filed: June 29, 2016
Date of Patent: May 1, 2018
Assignee: International Business Machines Corporation
Inventors: Michael Klein, Klaus M. Kroener, Cédric Lichtenau, Silvia Melitta Mueller

Patent number: 9952829
Abstract: A binary fused multiply-add floating-point unit configured to operate on an addend, a multiplier, and a multiplicand. The unit is configured to receive as the addend an unrounded result of a prior operation executed in the unit via an early result feedback path; to perform an alignment shift of the unrounded addend on an unrounded exponent and an unrounded mantissa; as well as perform a rounding correction for the addend in parallel to the actual alignment shift, responsive to a rounding-up signal.
Type: Grant
Filed: February 1, 2016
Date of Patent: April 24, 2018
Assignee: International Business Machines Corporation
Inventors: Michael Klein, Klaus M. Kroener, Cédric Lichtenau, Silvia Melitta Mueller

Multiplyandaccumulate unit in carrysave adder format and application in a feedback loop equalizer
Patent number: 9928035Abstract: A multiply and accumulation (MAC) unit for multiplying a provided first and a provided second multiplicand and for adding a provided summand to the resulting product is described. The MAC includes at least one multiplication block which is configured to multiply a first input signal and a second input signal, wherein the first input signal is given in a carrysave adder format and the second input signal is given in a binary format, wherein the multiplication result is provided in a carrysave format, and a carrysave adder which is configured to add to the result of the multiplication the provided summand.Type: GrantFiled: May 18, 2016Date of Patent: March 27, 2018Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Marcel Kossel 

Patent number: 9891886Abstract: A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C. An evaluation is made to detect whether values of A, B, and/or C meet a sufficient condition for performing a joint accumulation of C with partial products of A and B. If so, a joint accumulation of C is done with partial products of A and B and the result of the joint accumulation is rounded. If not, then a primary accumulation is done of the partial products of A and B. This generates an unrounded nonredundant result of the primary accumulation. The unrounded result is then truncated to generate an unrounded nonredundant intermediate result vector that excludes one or more least significant bits of the unrounded nonredundant result. A secondary accumulation is then performed, adding or subtracting C to the unrounded nonredundant intermediate result vector. Finally, the result of the secondary accumulation is rounded.Type: GrantFiled: June 24, 2015Date of Patent: February 13, 2018Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTDInventor: Thomas Elmer

Patent number: 9841948Abstract: Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages.Type: GrantFiled: August 12, 2015Date of Patent: December 12, 2017Assignee: QUALCOMM IncorporatedInventor: Liang-Kai Wang

Patent number: 9829956Abstract: An approach is provided for enabling power reduction in floating-point operations. In one example, a system receives floating-point numbers of a fused multiply-add instruction. The system determines the fused multiply-add instruction does not require compliance with a standard of precision for floating-point numbers. The system generates gating signals for an integrated circuit that is configured to perform operations of the fused multiply-add instruction. The system then sends the gating signals to the integrated circuit to turn off a plurality of logic gates included in the integrated circuit.Type: GrantFiled: November 21, 2012Date of Patent: November 28, 2017Assignee: NVIDIA CorporationInventors: David Conrad Tannenbaum, Colin Sprinkle, Stuart F. Oberman, Ming Y. Siu, Srinivasan Iyer, IanChi Yan Kwong

Patent number: 9778908Abstract: A microprocessor splits a fused multiply-accumulate operation of the form A*B+C into first and second multiply-accumulate suboperations to be performed by a multiplier and an adder. The first suboperation at least multiplies A and B, and conditionally also accumulates C to the partial products of A and B to generate an unrounded nonredundant sum. The unrounded nonredundant sum is stored in memory shared by the multiplier and adder for an indefinite time period, enabling the multiplier and adder to perform other operations unrelated to the multiply-accumulate operation. The second suboperation conditionally accumulates C to the unrounded nonredundant sum if C is not already incorporated into the value, and then generates a final rounded result.Type: GrantFiled: June 24, 2015Date of Patent: October 3, 2017Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.Inventor: Thomas Elmer

Patent number: 9720648Abstract: A method for hiding implicit bit corrections in a partial product adder array in a binary and hexadecimal floating-point multiplier such that no additional adder stages are needed for the implicit bit corrections. Two leading-one correction terms are generated for the fraction in the multiplier floating-point number and two leading-one correction terms are generated for the fraction in the multiplicand floating-point number. The floating-point numbers may be single-precision or double-precision. Each leading-one correction term for the single-precision case is appended to the left of an intermediate partial product sum in the adder array that is an input to an adder so as not to extend the bits in the input further to the left than the bits in another input to the adder. Each leading-one correction term for the double-precision case replaces an adder input that is unused when base-2 floating-point numbers are multiplied.Type: GrantFiled: December 22, 2014Date of Patent: August 1, 2017Assignee: International Business Machines CorporationInventors: Silvia M. Mueller, Son Dao Trong

Patent number: 9645792Abstract: At least one processor may emulate a fused multiply-add operation for a first operand, a second operand, and a third operand. The at least one processor may determine an intermediate value based at least in part on multiplying the first operand with the second operand, determine at least one of an upper intermediate value or a lower intermediate value, wherein determining the upper intermediate value comprises rounding, towards zero, the intermediate value by a specified number of bits, and wherein determining the lower intermediate value comprises subtracting the intermediate value by the upper intermediate value, determine an upper value and a lower value based at least in part on adding or subtracting the third operand to one of the upper intermediate value or the lower intermediate value, and determine an emulated fused multiply-add result by adding the upper value and the lower value.Type: GrantFiled: August 18, 2014Date of Patent: May 9, 2017Assignee: QUALCOMM IncorporatedInventors: Pramod Vasant Argade, Andrew Evan Gruber, Chiente Ho, Stewart Griffin Hall, Lin Chen
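
To make the upper/lower idea in the abstract above concrete, here is a minimal Python sketch of a software-emulated fused multiply-add. It is not the patented method: instead of rounding the product towards zero by a specified number of bits, it swaps in the well-known Dekker/Veltkamp error-free product and Knuth's two-sum, and all function names are hypothetical. It assumes IEEE 754 binary64 floats with round-to-nearest and ignores overflow and underflow. The result is more accurate than a plain `a * b + c`, but it is not guaranteed to be correctly rounded in every case.

```python
from fractions import Fraction

_SPLITTER = 2.0**27 + 1.0  # Veltkamp splitting constant for binary64


def split(x):
    """Split x into hi + lo, each fitting in about half the significand."""
    t = _SPLITTER * x
    hi = t - (t - x)
    return hi, x - hi


def two_product(a, b):
    """Return (upper, lower) with upper = fl(a*b) and upper + lower == a*b exactly."""
    upper = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    lower = ((a_hi * b_hi - upper) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return upper, lower


def two_sum(a, b):
    """Return (s, err) with s = fl(a+b) and s + err == a + b exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def emulated_fma(a, b, c):
    """Approximate a*b + c with the product's rounding error carried along."""
    upper, lower = two_product(a, b)   # product as an upper and a lower part
    s, t = two_sum(upper, c)           # add the third operand to the upper part
    return s + (t + lower)             # fold the lower parts back in


# A case where the unfused expression loses the entire result.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
naive = a * b + c
fused = emulated_fma(a, b, c)
exact = Fraction(a) * Fraction(b) + Fraction(c)
```

Here the exact product is 1 - 2^-60, which rounds to 1.0 in binary64, so the naive expression returns zero while the emulated version recovers the small residual.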

Patent number: 9563400Abstract: A method for hiding implicit bit corrections in a partial product adder array in a binary and hexadecimal floating-point multiplier such that no additional adder stages are needed for the implicit bit corrections. Two leading-one correction terms are generated for the fraction in the multiplier floating-point number and two leading-one correction terms are generated for the fraction in the multiplicand floating-point number. The floating-point numbers may be single-precision or double-precision. Each leading-one correction term for the single-precision case is appended to the left of an intermediate partial product sum in the adder array that is an input to an adder so as not to extend the bits in the input further to the left than the bits in another input to the adder. Each leading-one correction term for the double-precision case replaces an adder input that is unused when base-2 floating-point numbers are multiplied.Type: GrantFiled: September 18, 2014Date of Patent: February 7, 2017Assignee: International Business Machines CorporationInventors: Silvia M. Mueller, Son Dao Trong

Patent number: 9542154Abstract: Systems and methods of performing fused multiply-add (FMA) operations are provided. In one embodiment, the length of the adder used by the FMA operation is less than 3*N, where N is the number of bits in the mantissa term of a floating point number. A mask may be used to perform the addition portion of the FMA operation using the adder. A second mask may be used to denormalize the result of the addition portion of the FMA operation if an underflow occurs.Type: GrantFiled: June 25, 2013Date of Patent: January 10, 2017Assignee: Intel CorporationInventors: Simon Rubanovich, Thierry Pons, Amit Gradstein, Zeev Sperber

Patent number: 9519458Abstract: A fused-multiply-add system is disclosed. The fused-multiply-add system includes a multiplier to multiply first and second operands and to provide at least one product. The fused-multiply-add system also includes an alignment shifter for aligning a third operand with the at least one product to provide an aligned third operand. The fused-multiply-add system also includes an adder and a subtractor coupled to the multiplier and the alignment shifter for performing two asymmetrical additions in parallel paths. The fused-multiply-add system also includes at least one leading zero counter for counting a number of leading zero bits provided by at least one of the adder and the subtractor to provide at least one normalization shift amount. Finally, the fused-multiply-add system includes a multiplexer coupled to the adder and the subtractor for providing an appropriate output based upon a sign bit.Type: GrantFiled: April 8, 2014Date of Patent: December 13, 2016Assignee: Cadence Design Systems, Inc.Inventors: David H. C. Chen, William A. Huffman

Patent number: 9465575Abstract: A fused floating-point multiply-add element includes a multiplier that generates a product, and a shifter that shifts an addend within a narrow range. Interpreting logic analyzes the magnitude of the addend relative to the product and then causes logic arrays to position the shifted addend within the left, center, or right portions of a composite register depending on the magnitude of the addend relative to the product. The interpreting logic also forces other portions of the composite register to zero. When the addend is zero, the interpreting logic forces all portions of the composite register to zero. Final combining logic then adds the contents of the composite register to the product.Type: GrantFiled: August 5, 2013Date of Patent: October 11, 2016Assignee: NVIDIA CorporationInventors: Srinivasan Iyer, David Conrad Tannenbaum, Stuart F. Oberman, Ming (Michael) Y. Siu

Patent number: 9449198Abstract: A system including a first circuit and a second circuit. The first circuit includes a multiplier array to receive a first operand and a second operand and generate a plurality of outputs, an adder array to receive the plurality of outputs and generate a partial product of the first operand and the second operand including partial sums and carry bits, which are stored in a plurality of register arrays. The second circuit generates the product of the first operand and the second operand by implementing a two-stage reduction of the partial product of the first operand and the second operand. A first stage includes rearranging the partial sums and carry bits as two multi-bit integers. A second stage includes generating a plurality of multi-bit integers based on the two multi-bit integers, and generating the product of the first operand and the second operand based on the plurality of multi-bit integers.Type: GrantFiled: April 6, 2015Date of Patent: September 20, 2016Assignee: Marvell International LTD.Inventors: Fei Sun, Chang Shu

Patent number: 9405535Abstract: A circuit arrangement provides support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.Type: GrantFiled: November 29, 2012Date of Patent: August 2, 2016Assignee: International Business Machines CorporationInventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs

Patent number: 9400635Abstract: An integrated circuit is provided that performs floating-point operations involving at least two successive computational steps. Two floating-point numbers entering any additional computational step after the first computational step are aligned dynamically by shifting the mantissa of the floating-point number with the greater exponent to the left and the mantissa of the floating-point number with the smaller exponent to the right. The number of left shift bits is dependent on the magnitude of the difference between the two floating-point exponents and the number of leading zeroes in the mantissa with the greater exponent. The number of right shift bits is dependent on the magnitude of the difference between the two floating-point exponents and the number of left shift bits.Type: GrantFiled: January 14, 2013Date of Patent: July 26, 2016Assignee: Altera CorporationInventor: Tomasz Sebastian Czajkowski
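
The two-sided alignment described in the abstract above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented circuit: mantissas are non-negative integers held in a register of `width` bits, each value is `mantissa * 2**exponent`, and the specific rule used here (shift left by as much as the leading zeroes allow, then shift the other operand right by the remainder) is one plausible reading of "dependent on". The function name and parameters are hypothetical.

```python
def align(m_a, e_a, m_b, e_b, width=32):
    """Align two (mantissa, exponent) pairs to a common exponent.

    Returns (m_big, m_small, common_exponent), where m_big belongs to the
    operand that had the greater exponent.
    """
    # Make (m_a, e_a) the operand with the greater exponent.
    if e_a < e_b:
        m_a, e_a, m_b, e_b = m_b, e_b, m_a, e_a

    diff = e_a - e_b
    leading_zeros = width - m_a.bit_length()  # headroom in the register
    left = min(diff, leading_zeros)           # left shift loses no bits
    right = diff - left                       # remainder goes to the right shift

    m_big = m_a << left
    m_small = m_b >> right                    # low bits may be dropped here
    return m_big, m_small, e_a - left


# With 8-bit registers the whole difference fits in a left shift.
wide = align(0b1011, 5, 0b1100, 2, width=8)
# With 5-bit registers only one bit of headroom exists, so the rest shifts right.
narrow = align(0b1011, 5, 0b1100, 2, width=5)
```

Shifting left first preserves low-order bits of the smaller operand that a conventional right-shift-only alignment would discard.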

Patent number: 9317250Abstract: The present application provides a method and apparatus for supporting denormal numbers in a floating point multiply-add unit (FMAC). One embodiment of the FMAC is configurable to add a product of first and second operands to a third operand. This embodiment of the FMAC is configurable to determine a minimum exponent shift for a sum of the product and the third operand by subtracting a minimum normal exponent from a product exponent of the product. This embodiment of the FMAC is also configurable to cause bits representing the sum to be left shifted by the minimum exponent shift if a third exponent of the third operand is less than or equal to the product exponent and the minimum exponent shift is less than or equal to a predicted left shift for the sum.Type: GrantFiled: November 12, 2012Date of Patent: April 19, 2016Assignee: Advanced Micro Devices, Inc.Inventors: Kelvin D. Goveas, Debjit Das Sarma, Scott A. Hilker, Hanbing Liu
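
The shift selection described in the abstract above reduces to a short decision rule, sketched here in Python. The function and parameter names are hypothetical, the default minimum normal exponent of -1022 assumes IEEE 754 binary64, and falling back to the predicted left shift when the condition fails is an assumption for illustration; the abstract states only the case in which the minimum exponent shift is used.

```python
def choose_left_shift(product_exp, addend_exp, predicted_left_shift,
                      min_normal_exp=-1022):
    """Pick the normalization left shift for the sum product + addend.

    The minimum exponent shift is how far the sum can be shifted left
    before its exponent would drop below the smallest normal exponent.
    """
    min_exp_shift = product_exp - min_normal_exp
    if addend_exp <= product_exp and min_exp_shift <= predicted_left_shift:
        # Shifting further would produce an exponent below the normal range,
        # so cap the shift and leave the result denormal.
        return min_exp_shift
    # Assumed fallback: use the shift predicted for the sum.
    return predicted_left_shift


capped = choose_left_shift(product_exp=-1020, addend_exp=-1030,
                           predicted_left_shift=10)
uncapped = choose_left_shift(product_exp=-1000, addend_exp=-1030,
                             predicted_left_shift=10)
```

In the first call the product exponent sits two above the minimum, so the shift is capped at 2 even though 10 was predicted; in the second there is enough room and the predicted shift is used.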