Multiplication Followed By Addition Patents (Class 708/501)

Coarse floating point accumulator circuit, and MAC processing pipelines including same

Patent number: 12282748

Abstract: An integrated circuit including a multiplier-accumulator circuit pipeline including a plurality of MAC circuits. Each MAC circuit includes: (A) a multiplier circuit to multiply first input data and filter weight data to generate and output first product data having a floating point data format, and (B) a coarse floating point accumulator circuit including: (1) an alignment shift circuit to shift at least one field of the first product data and generate shifted first product data, and (2) fixed point addition circuitry, coupled to the alignment shift circuit, to add second input data and the shifted first product data using the fixed point addition circuitry. The plurality of MAC circuits of the multiplier-accumulator circuit execution pipeline, in operation, each perform a plurality of multiply operations and accumulate operations to process the first input data and generate processed data therefrom.

Type: Grant

Filed: May 6, 2021

Date of Patent: April 22, 2025

Assignee: Analog Devices, Inc.

Inventors: Frederick A. Ware, Cheng C. Wang
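
The align-then-add flow described in the abstract above can be illustrated with a small sketch. This is not the patented circuit: the names `FRAC_BITS`, `ACC_LSB_EXP`, `align_shift`, and `mac_pipeline` are hypothetical, and the widths are arbitrary assumptions. It only shows a floating point product being shifted onto a fixed-point grid so that accumulation becomes plain integer addition.

```python
import math

FRAC_BITS = 24      # assumed number of mantissa bits kept from each product
ACC_LSB_EXP = -40   # assumed power-of-two weight of the accumulator's least significant bit


def align_shift(product: float) -> int:
    """Shift the product's mantissa field onto the accumulator's fixed-point grid."""
    if product == 0.0:
        return 0
    frac, exp = math.frexp(product)            # product == frac * 2**exp, 0.5 <= |frac| < 1
    mant = int(frac * (1 << FRAC_BITS))        # signed integer mantissa, truncated
    shift = (exp - FRAC_BITS) - ACC_LSB_EXP    # mantissa LSB weight relative to accumulator LSB
    # A left shift is exact; a right shift drops bits below the accumulator LSB.
    return mant << shift if shift >= 0 else mant >> -shift


def mac_pipeline(inputs, weights) -> float:
    """Multiply input/weight pairs and accumulate them with integer (fixed point) addition."""
    acc = 0
    for x, w in zip(inputs, weights):
        acc += align_shift(x * w)              # fixed point add of the shifted product
    return math.ldexp(float(acc), ACC_LSB_EXP) # convert the fixed-point sum back to a float
```

For example, `mac_pipeline([1.5, 0.25], [2.0, -4.0])` accumulates 3.0 and -1.0 as integers and returns 2.0.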

Instructions and logic to perform floating point and integer operations for machine learning

Patent number: 12217053

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.

Type: Grant

Filed: December 4, 2023

Date of Patent: February 4, 2025

Assignee: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
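
As a rough software analogy for the mixed-precision operation in the abstract above (16-bit operands, 32-bit sum), the following sketch rounds operands to IEEE 754 half precision and keeps the running sum in single precision. The helper names `to_fp16`, `to_fp32`, and `matmul_accumulate` are hypothetical, and the code says nothing about how the claimed instruction is actually implemented.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def matmul_accumulate(a, b, c):
    """Compute c + a @ b with 16-bit operands and a 32-bit running sum."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[to_fp32(c[i][j]) for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # The product of two half-precision values fits exactly in a double.
                prod = to_fp16(a[i][k]) * to_fp16(b[k][j])
                out[i][j] = to_fp32(out[i][j] + prod)   # sum kept at 32-bit precision
    return out
```

Inputs that are not representable in half precision, such as 0.1, are rounded before the multiply, which is where the precision of the 16-bit operands shows up.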

Supporting floating point 16 (FP16) in dot product architecture

Patent number: 12216735

Abstract: A dot-product architecture and method are disclosed for calculating floating-point dot-products of two vectors. The architecture includes an array of multiplier units that each include an integer logic that multiplies integer values of corresponding elements of the two vectors; an exponent logic that adds exponent values of the corresponding elements of the two vectors to form unbiased exponent values, and a local shifter that forms a first shifted value by shifting a product-integer value by a number of bits in a predetermined direction based on a difference value between an unbiased exponent value corresponding to the product-integer value and a maximum unbiased exponent value for the array of multiplier units. An adder tree adds shifted values output from local shifters of the array of multiplier units to form an output, and an accumulator accumulates the output of the addition unit.

Type: Grant

Filed: January 20, 2021

Date of Patent: February 4, 2025

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Hamzah Ahmed Ali Abdelaziz, Ali Shafiee Ardestani, Joseph H. Hassoun
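
The steps in the abstract above (integer multiply, exponent add, shift by the distance to the maximum exponent, adder tree) can be sketched as follows. `MANT_BITS`, `split`, `adder_tree`, and `dot` are hypothetical names, the mantissa width is an assumption, and the shift direction is taken to be a right shift.

```python
import math

MANT_BITS = 11  # assumed integer mantissa width (FP16 carries 11 significant bits)


def split(x: float):
    """Return (integer mantissa, exponent) with x ~= mantissa * 2**exponent."""
    frac, exp = math.frexp(x)
    return int(frac * (1 << MANT_BITS)), exp - MANT_BITS


def adder_tree(values):
    """Sum the values pairwise, level by level, as an adder tree would."""
    values = list(values)
    while len(values) > 1:
        values = [sum(values[i:i + 2]) for i in range(0, len(values), 2)]
    return values[0] if values else 0


def dot(a, b) -> float:
    prods = []
    for x, y in zip(a, b):
        (mx, ex), (my, ey) = split(x), split(y)
        prods.append((mx * my, ex + ey))          # integer product, summed exponents
    max_exp = max((e for m, e in prods if m != 0), default=0)
    # Local shifter: align every product to the maximum exponent of the array.
    shifted = [0 if m == 0 else m >> (max_exp - e) for m, e in prods]
    return math.ldexp(float(adder_tree(shifted)), max_exp)
```

With inputs that fit in 11 bits, `dot([1.5, 2.0, -0.5], [2.0, 0.25, 4.0])` returns 1.5. Products far below the maximum exponent lose low-order bits in the shift, which is the precision cost of aligning everything to one exponent.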

Process for dual mode floating point multiplier-accumulator with high precision mode for near zero accumulation results

Patent number: 12197889

Abstract: A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates: an integer form fraction at a first bitwidth and a second bitwidth greater than the first bitwidth, a sign bit, and an exponent difference computed by subtracting an exponent sum from a maximum exponent sum of all exponent sums. The integer form fractions of the first bitwidths are provided to an adder tree using the first bitwidth, and if the sum has an excess percentage of leading 0s, then the second bitwidth is used by an adder tree using the second bitwidth to form a greater precision integer form fraction. The sign, integer form fraction, and maximum exponent are provided to a normalizer which generates a floating point result.

Type: Grant

Filed: June 21, 2021

Date of Patent: January 14, 2025

Assignee: Ceremorphic, Inc.

Inventor: Dylan Finch
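
A minimal sketch of the dual-bitwidth idea in the abstract above: sum narrow fractions first, and fall back to the wide fractions when the narrow sum shows heavy cancellation (many leading zeros). The widths `W1` and `W2`, the threshold `LZ_THRESHOLD`, and the function names are assumptions chosen for illustration, not values from the patent.

```python
import math

W1, W2 = 8, 16        # assumed narrow and wide fraction widths
LZ_THRESHOLD = 4      # assumed count of leading zeros that triggers the wide path


def fractions(value: float, max_exp: int):
    """Integer form fractions of value aligned to max_exp, at both bitwidths."""
    narrow = int(math.ldexp(value, W1 - max_exp))   # truncated to W1 bits
    wide = int(math.ldexp(value, W2 - max_exp))     # truncated to W2 bits
    return narrow, wide


def dual_mode_mac(pairs):
    """Return (result, used_wide) for a list of (input, coefficient) pairs."""
    products = [x * c for x, c in pairs]
    max_exp = max((math.frexp(p)[1] for p in products), default=0)
    fracs = [fractions(p, max_exp) for p in products]
    narrow_sum = sum(n for n, _ in fracs)
    leading_zeros = W1 - abs(narrow_sum).bit_length()
    if leading_zeros >= LZ_THRESHOLD:
        # Near-zero result: redo the sum with the higher precision fractions.
        wide_sum = sum(w for _, w in fracs)
        return math.ldexp(float(wide_sum), max_exp - W2), True
    return math.ldexp(float(narrow_sum), max_exp - W1), False
```

For the pairs `(1.0, 1.0)` and `(1.0, -0.998046875)`, the narrow sum would give 0.0078125, while the wide path recovers the exact 0.001953125.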

Power saving floating point multiplier-accumulator with precision-aware accumulation

Patent number: 12106069

Abstract: A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair of values comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which outputs a signed integer form fraction and a maximum exponent. A range estimator forms a possible range of values from the exponent differences and determines an adder precision. The integer form fractions are summed using the adder precision, a sign bit is extracted, and a floating point value is output. Each MAC processor provides its integer form fraction with a precision determined by the MAC processor's exponent difference.

Type: Grant

Filed: June 21, 2021

Date of Patent: October 1, 2024

Assignee: Ceremorphic, Inc.

Inventor: Dylan Finch

Power saving floating point Multiplier-Accumulator with a high precision accumulation detection mode

Patent number: 12079593

Abstract: A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair of values comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which outputs a signed integer form fraction with a first bitwidth and a second bitwidth, along with a maximum exponent. The first bitwidth signed integer form fractions are summed by an adder tree using the first bitwidth to form a first sum, and when an excess leading 0 condition is detected, a second adder tree operative on the second bitwidth integer form fractions forms a second sum. The first sum or second sum, along with the maximum exponent, is converted into a floating point result.

Type: Grant

Filed: June 21, 2021

Date of Patent: September 3, 2024

Assignee: Ceremorphic, Inc.

Inventor: Dylan Finch

Floating point dot product multiplier-accumulator

Patent number: 11983237

Abstract: A vector dot product multiplier receives a row vector and a column vector as floating point numbers in a format of sign plus exponent bits plus mantissa bits. The dot product multiplier generates a single dot product value by separately processing the sign bits, exponent bits, and mantissa bits in a few pipelined stages. A first pipeline stage generates a sign bit, a normalized mantissa formed by multiplying pairs of multiplicand elements, and exponent information. A second pipeline stage receives the multiplied pairs of normalized mantissas, performs an adjustment, performs a padding, complement, and shift, and sums the results in an adder stage. The resulting integer is normalized to generate a sign bit, exponent, and mantissa of the floating point result.

Type: Grant

Filed: February 21, 2021

Date of Patent: May 14, 2024

Assignee: Ceremorphic, Inc.

Inventor: Dylan Finch

Process for a floating point dot product multiplier-accumulator

Patent number: 11893360

Abstract: A process for performing vector dot products receives a row vector and a column vector as floating point numbers in a format of sign plus exponent bits plus mantissa bits. The process generates a single dot product value by separately processing the sign bits, exponent bits, and mantissa bits to form a sign bit, a normalized mantissa formed by multiplying pairs of multiplicand elements, and exponent information including MAX_EXP and EXP_DIFF. A second pipeline stage receives the multiplied pairs of normalized mantissas, optionally performs an exponent adjustment, pads, complements and shifts the normalized mantissas, and the results are added in a series of stages until a single addition result remains, which is normalized using MAX_EXP to form the floating point output result.

Type: Grant

Filed: February 21, 2021

Date of Patent: February 6, 2024

Assignee: Ceremorphic, Inc.

Inventor: Dylan Finch

Dynamic precision bit string accumulation

Patent number: 11782711

Abstract: Systems, apparatuses, and methods related to dynamic precision bit string accumulation are described. Dynamic bit string accumulation can be performed using an edge computing device. In an example method, dynamic precision bit string accumulation can include performing an iteration of a recursive operation using a first bit string and a second bit string and determining that a result of the iteration of the recursive operation contains a quantity of bits in a particular bit sub-set of the result that is greater than a threshold quantity of bits associated with the particular bit sub-set. The method can further include writing a result of the iteration of the recursive operation to a first register and writing at least a portion of the bits associated with the particular bit sub-set of the result to a second register.

Type: Grant

Filed: November 29, 2021

Date of Patent: October 10, 2023

Assignee: Micron Technology, Inc.

Inventors: Vijay S. Ramesh, Richard C. Murphy

Processing unit with small footprint arithmetic logic unit

Patent number: 11720328

Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.

Type: Grant

Filed: September 23, 2020

Date of Patent: August 8, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Bin He, Shubh Shah, Michael Mantor

Variable precision computing system

Patent number: 11455766

Abstract: A processor selectively adjusts the precision of data for different functional units. Specified functional units of the processor, such as a shader processing unit of a graphics processing unit (GPU), include a zeroing module to store, based on the states of corresponding precision flags, a data value of zero at a specified portion of an input and/or output data operand. The functional unit then processes the data including the zeroed portion. Because a portion of the data has been zeroed, the functional unit consumes less power during data processing. Furthermore, the precision flags are set such that the reduced precision of the data does not significantly impact a user experience.

Type: Grant

Filed: September 18, 2018

Date of Patent: September 27, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Pramod V. Argade, Daniel Nikolai Peroni
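
To make the zeroing step in the abstract above concrete, here is a small sketch that clears the low-order bits of a single-precision operand when a precision flag is set. The function name `zero_low_bits`, the flag, and the bit counts are hypothetical, and the choice of the low mantissa bits as the "specified portion" is an assumption.

```python
import struct


def zero_low_bits(x: float, bits: int, reduced_precision: bool = True) -> float:
    """Zero the `bits` least significant bits of x's 32-bit float encoding.

    `reduced_precision` stands in for a precision flag: when it is False the
    operand passes through unchanged (apart from rounding to single precision).
    """
    (raw,) = struct.unpack('<I', struct.pack('<f', x))   # reinterpret float32 as uint32
    if reduced_precision:
        raw &= 0xFFFFFFFF & ~((1 << bits) - 1)           # clear the low-order bits
    return struct.unpack('<f', struct.pack('<I', raw))[0]
```

For instance, `zero_low_bits(1.0 + 2**-23, 1)` returns exactly 1.0, because the only set mantissa bit is the one being zeroed.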

Floating point multiply-add, accumulate unit with carry-save accumulator

Patent number: 11429349

Abstract: Floating point Multiply-Add, Accumulate Unit, supporting BF16 format for Multiply-Accumulate operations, and FP32 Single-Precision Addition complying with the IEEE 754 Standard. The Multiply-Accumulate unit uses higher radix and longer internal 2's complement significand representation to facilitate precision as well as comparison and operation with negative numbers. The addition is performed using Carry-Save format to avoid long carry propagation and speed up the operation. Operations including overflow detection, zero detection and sign extension are adopted for 2's complement and Carry-Save format. Handling of Overflow and Sign Extension allows for fast operation relatively independent of the size of the accumulator.

Type: Grant

Filed: August 9, 2021

Date of Patent: August 30, 2022

Assignee: SambaNova Systems, Inc.

Inventors: Vojin G. Oklobdzija, Matthew M. Kim
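
Carry-save accumulation, as mentioned in the abstract above, keeps the running total as a separate sum word and carry word so that no carry has to ripple during each accumulate step. The sketch below shows the general technique on two's complement integers; the 32-bit `WIDTH` and the function names are assumptions, and it does not model the unit's floating point handling.

```python
WIDTH = 32                  # assumed accumulator width
MASK = (1 << WIDTH) - 1


def carry_save_add(a: int, b: int, c: int):
    """3:2 compressor: return (s, cy) with s + cy == a + b + c modulo 2**WIDTH."""
    s = a ^ b ^ c                                   # bitwise sum without carries
    cy = ((a & b) | (a & c) | (b & c)) << 1         # majority bits become carries
    return s & MASK, cy & MASK


def accumulate(values) -> int:
    """Accumulate signed integers in carry-save form, resolving carries only once."""
    s, cy = 0, 0
    for v in values:
        s, cy = carry_save_add(s, cy, v & MASK)     # v encoded in two's complement
    total = (s + cy) & MASK                         # single carry-propagate addition
    # Interpret the WIDTH-bit result as a signed two's complement number.
    return total - (1 << WIDTH) if total >> (WIDTH - 1) else total
```

`accumulate([5, -3, 100, -250, 7])` returns -141; only the final line of the loop-free tail performs a full-width carry-propagating add.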

Hardware unit for performing matrix multiplication with clock gating

Patent number: 11321096

Abstract: Hardware units and methods for performing matrix multiplication via a multi-stage pipeline wherein the storage elements associated with one or more stages of the pipeline are clock gated based on the data elements and/or portions thereof that are known to have a zero value (or can be treated as having a zero value). In some cases, the storage elements may be clock gated on a per data element basis based on whether the data element has a zero value (or can be treated as having a zero value). In other cases, the storage elements may be clock gated on a partial element basis based on the bit width of the data elements. For example, if the bit width of the data elements is less than a maximum bit width for the data elements then a portion of the bits related to that data element can be treated as having a zero value and a portion of the storage elements associated with that data element may not be clocked. In yet other cases the storage elements may be clock gated on both a per element and a partial element basis.

Type: Grant

Filed: November 5, 2018

Date of Patent: May 3, 2022

Assignee: Imagination Technologies Limited

Inventors: Christopher Martin, Azzurra Pulimeno

Data conversion to/from selected data type with implied rounding mode

Patent number: 11269632

Abstract: An instruction to convert data from a source data type to a target data type is obtained. The source data type is selected from one or more source data types supported by the instruction, and the target data type is selected from one or more target data types supported by the instruction. Based on a selected data type of the source data type or the target data type, a determination is made of a rounding mode for use by the instruction. The rounding mode is implicitly set based on the selected data type; it is assigned to the selected data type. A conversion of the data from the source data type to the target data type is performed. The conversion includes performing a rounding operation using the rounding mode implicitly set. The performing the conversion provides a result in the target data type, which is written to a select location.

Type: Grant

Filed: June 17, 2021

Date of Patent: March 8, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Laith M. AlBarakat, Jonathan D. Bradbury, Timothy Slegel, Cedric Lichtenau, Joachim von Buttlar

Load exploitation and improved pipelineability of hardware instructions

Patent number: 11237909

Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.

Type: Grant

Filed: August 21, 2020

Date of Patent: February 1, 2022

Assignee: International Business Machines Corporation

Inventors: Robert F. Enenkel, Christopher Anand, Adele Olejarz, Lucas Dutton
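
The reduce, approximate, restore pattern described in the abstract above is a standard way to evaluate elementary functions. The sketch below applies it to the natural logarithm as an illustration only: the choice of function, the polynomial, and the names `reduce_range`, `log_reduced`, and `log_via_reduction` are assumptions, not the instructions claimed in the patent. The final line has the multiply-add shape that an FMA instruction would compute in one step.

```python
import math

LN2 = math.log(2.0)
SQRT_HALF = math.sqrt(0.5)


def reduce_range(x: float):
    """Split x > 0 into (m, e) with x == m * 2**e and sqrt(0.5) <= m < sqrt(2)."""
    m, e = math.frexp(x)              # 0.5 <= m < 1
    if m < SQRT_HALF:
        m, e = m * 2.0, e - 1         # recentre the reduced value around 1
    return m, e


def log_reduced(m: float) -> float:
    """Approximate ln(m) on the reduced range via ln(m) = 2*atanh((m-1)/(m+1))."""
    t = (m - 1.0) / (m + 1.0)
    t2 = t * t
    # Truncated series 2*(t + t^3/3 + t^5/5 + t^7/7 + t^9/9), evaluated in Horner form.
    return 2.0 * t * (1.0 + t2 * (1 / 3 + t2 * (1 / 5 + t2 * (1 / 7 + t2 / 9))))


def log_via_reduction(x: float) -> float:
    m, e = reduce_range(x)            # e plays the role of the restoration factor
    return e * LN2 + log_reduced(m)   # multiply-add combining restoration and approximation
```

On the reduced range the truncated series is accurate to roughly nine decimal places, so `log_via_reduction(10.0)` agrees with `math.log(10.0)` to that level.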

Arithmetic processing device and method of controlling arithmetic processing device that enables suppression of size of device

Patent number: 11226791

Abstract: In an arithmetic processing device, when either or both of a first operand and a second operand included in a multiply-add operation instruction are zero, an exponent setting circuit sets an exponent of the first operand to a first set value, and sets an exponent of the second operand to a second set value. An exponent calculation circuit calculates an exponent obtained by a multiply-add operation, based on the exponents of the first and second operands outputted by the exponent setting circuit and an exponent of a third operand included in the multiply-add operation instruction. The sum of the first set value and the second set value is set so that a bit position of the third operand is located on a higher-order bit side than the most significant bit of the sum of the first operand and the second operand.

Type: Grant

Filed: October 2, 2019

Date of Patent: January 18, 2022

Assignee: FUJITSU LIMITED

Inventors: Takio Ono, Hiroyuki Wada

Integrated circuits with machine learning extensions

Patent number: 11175892

Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.

Type: Grant

Filed: November 20, 2017

Date of Patent: November 16, 2021

Assignee: Intel Corporation

Inventors: Martin Langhammer, Dongdong Chen

Multiply and accumulate (MAC) unit and a method of adding numbers

Patent number: 11163531

Abstract: A method and a MAC unit that may include an accumulation unit and a multiplier. The accumulation unit includes a first part, a second part and a third part. The first part may calculate a truncated sum. The second part may be configured to (a) receive, during each calculation cycle, a carry out of an add operation performed during a calculation cycle, (b) receive a sign bit of an intermediate product calculated during the calculation cycle, (c) calculate, by counter logic, a counter logic value, and (d) convert, after a start of a last calculation cycle of the calculation cycles, an output value of the counter logic to an intermediate value having a two's complement format. The third part may be configured to calculate an output value of the MAC unit based on the intermediate value and a truncated sum calculated by the first part of the accumulation unit.

Type: Grant

Filed: June 21, 2019

Date of Patent: November 2, 2021

Assignee: DSP GROUP LTD.

Inventors: Moshe Haiut, Assaf Ganor
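
One way to read the three-part accumulation in the abstract above is that a narrow adder keeps only the low bits of the running sum, while a counter tracks what would have happened in the upper bits. The sketch below follows that reading under stated assumptions: products fit in `W`-bit two's complement, the counter adds each carry out and subtracts each sign bit, and the names are hypothetical.

```python
W = 16                      # assumed width of the truncated sum and of each product
MASK = (1 << W) - 1


def mac_with_counter(xs, ws) -> int:
    """Accumulate products using a W-bit truncated sum plus a carry/sign counter."""
    truncated = 0           # first part: low W bits of the running sum
    counter = 0             # second part: net contribution to the upper bits
    for x, w in zip(xs, ws):
        p = x * w                                  # intermediate product (fits in W bits)
        total = truncated + (p & MASK)             # add the low W bits of the product
        carry_out = total >> W                     # carry out of the W-bit adder
        sign = 1 if p < 0 else 0                   # sign bit of the intermediate product
        counter += carry_out - sign                # sign extension contributes -1 upstairs
        truncated = total & MASK
    # Third part: combine the counter value (upper bits) with the truncated sum.
    return (counter << W) + truncated
```

`mac_with_counter([100, -200, 150, -5], [300, 100, 200, 50])` returns 39750, which exceeds the signed 16-bit range even though every addition was only 16 bits wide.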

Programmable device implementing fixed and floating point functionality in a mixed architecture

Patent number: 11137983

Abstract: Configurable specialized processing blocks, such as DSP blocks, are described that implement fixed and floating-point functionality in a single mixed architecture on a programmable device. The described architecture reduces the need to construct floating-point functions outside the configurable specialized processing block, thereby minimizing hardware cost and area. The disclosed architecture also introduces pipelining into the DSP block in order to ensure the floating-point multiplication and addition functions remain in synchronicity, thereby increasing the maximum frequency at which the DSP block can operate. Moreover, the disclosed architecture includes logic circuitry to support floating-point exception handling.

Type: Grant

Filed: September 27, 2019

Date of Patent: October 5, 2021

Assignee: Altera Corporation

Inventors: Keone Streicher, Martin Langhammer, Yi-Wen Lin, Hyun Yi

Computing device and method

Patent number: 11106598

Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Type: Grant

Filed: December 16, 2019

Date of Patent: August 31, 2021

Assignee: Shanghai Cambricon Information Technology Co., Ltd.

Inventors: Yao Zhang, Bingrui Wang

Chained split execution of fused compound arithmetic operations

Patent number: 11061672

Abstract: A microprocessor is configured for unchained and chained modes of split execution of a fused compound arithmetic operation. In both modes of split execution, a first execution unit executes only a first part of the fused compound arithmetic operation and produces an intermediate result thereof, and a second instruction execution unit receives the intermediate result and executes a second part of the fused compound arithmetic operation to produce a final result. In the unchained mode, execution is accomplished by dispatching separate split-execution microinstructions to the first and second instruction execution units. In the chained mode, execution is accomplished by dispatching a single split-execution microinstruction to the first instruction execution unit and sending a chaining control signal or signal group to the second execution unit, causing it to execute its part of the fused arithmetic operation without needing an instruction.

Type: Grant

Filed: July 5, 2016

Date of Patent: July 13, 2021

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventors: Thomas Elmer, Nikhil A. Patil

Using Fuzzy-Jbit location of floating-point multiply-accumulate results

Patent number: 11016731

Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.

Type: Grant

Filed: March 29, 2019

Date of Patent: May 25, 2021

Assignee: Intel Corporation

Inventors: Amit Gradstein, Simon Rubanovich, Zeev Sperber

Floating-point adder circuitry with subnormal support

Patent number: 11010131

Abstract: An integrated circuit may include a floating-point adder. The adder may be implemented using a dual-path adder architecture having a near path and a far path. The near path may include a leading zero anticipator (LZA), a comparison circuit for comparing an exponent value to an LZA count, and associated circuitry for handling subnormal numbers. The far path may include a subtraction circuit for computing the difference between a received exponent value and a minimum exponent value, at least two shifters for shifting far greater and far lesser mantissa values in parallel, and associated circuitry for handling subnormal numbers. The adder may be dynamically configured to support a first mode that processes FP16 at inputs and outputs, a second mode that processes modified FP16? inputs, and a third mode that processes FP16? at inputs and outputs.

Type: Grant

Filed: September 14, 2017

Date of Patent: May 18, 2021

Assignee: Intel Corporation

Inventors: Martin Langhammer, Bogdan Pasca

Apparatus and method for performing dual signed and unsigned multiplication of packed data elements

Patent number: 10977039

Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements.

Type: Grant

Filed: November 1, 2019

Date of Patent: April 13, 2021

Assignee: Intel Corporation

Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Mark Charney, Robert Valentine, Jesus Corbal, Binwei Yang

Load exploitation and improved pipelineability of hardware instructions

Patent number: 10776207

Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.

Type: Grant

Filed: September 6, 2018

Date of Patent: September 15, 2020

Assignee: International Business Machines Corporation

Inventors: Robert F. Enenkel, Christopher Anand, Lucas Dutton, Adele Olejarz

Floating-point number operation circuit and method

Patent number: 10761807

Abstract: This invention discloses a floating-point number operation circuit and a method thereof. The floating-point number operation circuit is configured to perform a fused multiplication and accumulation (fused mac) operation or a multiplication and accumulation (mac) operation on a first operand, a second operand, and a third operand, or perform a multiplication operation on the first operand and the second operand. The floating-point number operation circuit includes two rounding circuits, a multiplication circuit, a selection circuit, a control circuit, and an addition circuit. The control circuit controls the scheduling of various operations and the use of resources on each calculation path.

Type: Grant

Filed: October 23, 2018

Date of Patent: September 1, 2020

Assignee: REALTEK SEMICONDUCTOR CORPORATION

Inventor: Chia-I Chen

Method and apparatus for efficient binary and ternary support in fused multiply-add (FMA) circuits

Patent number: 10713012

Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation.

Type: Grant

Filed: October 15, 2018

Date of Patent: July 14, 2020

Assignee: Intel Corporation

Inventors: Aditya Varma, Michael Espig

Denormalization in multi-precision floating-point arithmetic circuitry

Patent number: 10678510

Abstract: The present embodiments relate to integrated circuits with floating-point arithmetic circuitry that handles normalized and denormalized floating-point numbers. The floating-point arithmetic circuitry may include a normalization circuit and a rounding circuit, and the floating-point arithmetic circuitry may generate a first result in the form of a normalized, unrounded floating-point number and a second result in the form of a normalized, rounded floating-point number. If desired, the floating-point arithmetic circuitry may be implemented in specialized processing blocks.

Type: Grant

Filed: September 25, 2017

Date of Patent: June 9, 2020

Assignee: Altera Corporation

Inventor: Martin Langhammer

Stochastic rounding floating-point multiply instruction using entropy from a register

Patent number: 10671347

Abstract: Embodiments are directed to a computer implemented method for executing machine instructions in a central processing unit. The method includes obtaining, by a processor system, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture. The method further includes executing the machine instruction, wherein the executing includes loading a multiplicand into a multiplicand register, and loading a multiplier into a multiplier register. The executing further generates an intermediate product having least significant bits by multiplying the multiplicand and the multiplier. The executing further includes generating a rounded product by performing a probability analysis on the least significant bits of the intermediate product, and initiating a rounding operation on the intermediate product to produce the rounded product based at least in part on the probability analysis.

Type: Grant

Filed: January 28, 2016

Date of Patent: June 2, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Steven R. Carlough, Brian R. Prasky, Eric M. Schwarz
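
Stochastic rounding, as named in the title above, rounds up with a probability proportional to the discarded least significant bits. The sketch below shows the general technique on non-negative integer operands; it is an assumption-laden illustration, not the claimed instruction. The `entropy` argument stands in for random bits read from a register, and `drop_bits` and the function name are hypothetical.

```python
def stochastic_round_product(multiplicand: int, multiplier: int,
                             drop_bits: int, entropy: int) -> int:
    """Multiply, drop `drop_bits` low bits, and round up with probability low / 2**drop_bits.

    `entropy` is assumed to be a uniformly random value in [0, 2**drop_bits).
    Operands are assumed to be non-negative integers (for example, significands).
    """
    product = multiplicand * multiplier            # intermediate product
    low = product & ((1 << drop_bits) - 1)         # least significant bits to be discarded
    kept = product >> drop_bits                    # truncated product
    # Exactly `low` of the 2**drop_bits possible entropy values cause a round-up.
    return kept + (1 if entropy < low else 0)
```

With `13 * 5 = 65` and four dropped bits, the discarded value is 1, so one entropy value in sixteen rounds the result up from 4 to 5. Averaged over many operations, the rounding error is then zero in expectation.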

Normalization of a product on a datapath

Patent number: 10649730

Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add low part). The unit pre-aligns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are pre-aligned to each fit on one shifter of the two fixed length shifters.

Type: Grant

Filed: June 26, 2019

Date of Patent: May 12, 2020

Assignee: International Business Machines Corporation

Inventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner

Multi-path fused multiply-add with power control

Patent number: 10481869

Abstract: Techniques are disclosed relating to circuitry configured to perform floating-point operations such as fused multiply-addition (FMA) with multiple paths and power control. In some embodiments, an FMA unit includes a near path and multiple far paths and is configured to select a path based on a determined exponent difference. In some embodiments, the FMA unit is configured to operate portions of non-selected paths in a low power state.

Type: Grant

Filed: November 10, 2017

Date of Patent: November 19, 2019

Assignee: Apple Inc.

Inventors: Liang-Kai Wang, Ting Yu, Yu Sun
Stochastic rounding floating-point multiply instruction using entropy from a register

Patent number: 10445066

Abstract: Embodiments are directed to a computer implemented method for executing machine instructions in a central processing unit. The method includes obtaining, by a processor system, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture. The method further includes executing the machine instruction, wherein the executing includes loading a multiplicand into a multiplicand register, and loading a multiplier into a multiplier register. The executing further generates an intermediate product having least significant bits by multiplying the multiplicand and the multiplier. The executing further includes generating a rounded product by performing a probability analysis on the least significant bits of the intermediate product, and initiating a rounding operation on the intermediate product to produce the rounded product based at least in part on the probability analysis.

Type: Grant

Filed: February 14, 2017

Date of Patent: October 15, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Steven R. Carlough, Brian R. Prasky, Eric M. Schwarz
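
The rounding step in the abstract above can be illustrated with a small integer model. This is only a sketch of one common way to do stochastic rounding, not the patented method: the function name, the `drop_bits` parameter, and the choice to pass the entropy in as an argument are assumptions made for illustration. The product is rounded up with a probability equal to the fraction held in its discarded least significant bits.

```python
def stochastic_round_product(multiplicand: int, multiplier: int,
                             drop_bits: int, entropy: int) -> int:
    """Multiply two unsigned integers and stochastically round away
    the low `drop_bits` bits of the intermediate product.

    `entropy` stands in for a random value read from a register
    (hypothetical interface); it must lie in [0, 2**drop_bits).
    """
    if not 0 <= entropy < (1 << drop_bits):
        raise ValueError("entropy must fit in drop_bits bits")
    intermediate = multiplicand * multiplier
    # Adding uniform entropy carries into the kept bits with probability
    # (discarded low bits) / 2**drop_bits, so the result is unbiased on average.
    return (intermediate + entropy) >> drop_bits


if __name__ == "__main__":
    # 3 * 5 = 15 = 0b1111; dropping 2 bits leaves 3 with a remainder of 3/4.
    results = [stochastic_round_product(3, 5, 2, e) for e in range(4)]
    print(results)
```

Averaged over all four entropy values, the results equal 15/4, the exact quotient, which is the property that distinguishes stochastic rounding from truncation.
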
Normalization of a product on a datapath

Patent number: 10379811

Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add low part). The unit pre-aligns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are pre-aligned to each fit on one shifter of the two fixed length shifters.

Type: Grant

Filed: November 16, 2017

Date of Patent: August 13, 2019

Assignee: International Business Machines Corporation

Inventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner
Tiny detection in a floating-point unit

Patent number: 10331407

Abstract: A method for performing tiny detection in floating-point operations with a floating-point unit. The floating-point unit is configured to implement a fused-multiply-add operation on three wide operands. The floating-point unit comprises: a multiplier, a left shifter, a right shifter, a select circuit comprising a 3-to-2 compressor, an adder connected to the dataflow from the select circuit, a first feedback path connecting a carry output of the adder to the select circuit, and a second feedback path connecting an output of the adder to the left and right shifters for passing an intermediate wide result through the left and right shifters. The adder is configured to provide an unrounded result for tiny detection.

Type: Grant

Filed: November 11, 2017

Date of Patent: June 25, 2019

Assignee: International Business Machines Corporation

Inventors: Michael K. Kroener, Silvia M. Mueller, Andreas Wagner
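
What "tiny detection" on an unrounded result means can be shown in software. The sketch below assumes IEEE 754 binary64 operands and detection before rounding; the function name is hypothetical, and exact rational arithmetic stands in for the wide adder output. It says nothing about the shifter and feedback-path hardware the abstract describes.

```python
import sys
from fractions import Fraction

# Smallest positive normal binary64 value (2**-1022).
MIN_NORMAL = Fraction(sys.float_info.min)


def is_tiny_before_rounding(a: float, b: float, c: float) -> bool:
    """Return True if the exact value of a*b + c is nonzero and smaller in
    magnitude than the smallest normal number.

    Operands must be finite; Fraction() raises on infinities and NaNs.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # unrounded result
    return exact != 0 and abs(exact) < MIN_NORMAL


if __name__ == "__main__":
    print(is_tiny_before_rounding(2.0**-600, 2.0**-600, 0.0))
    print(is_tiny_before_rounding(1.0, 1.0, 0.0))
```
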
Merged floating point operation using a modebit

Patent number: 10318290

Abstract: A first floating-point operation unit receives first and second variables and performs a first operation generating a first output. A first rounding unit receives and rounds the first output to generate a second output if a control bit is in a first state. A second floating-point operation unit receives a third variable and either the first output or the second output and performs a second operation on the third variable and either the first output or the second output, to generate a third output. The second floating-point operation unit receives and operates on the first output if the control bit is in the first state, or the second output if the control bit is in the second state. A second rounding unit receives and rounds the third output.

Type: Grant

Filed: May 24, 2017

Date of Patent: June 11, 2019

Assignee: ARM Finance Overseas Limited

Inventor: David Yiu-Man Lau
Floating point chained multiply accumulate

Patent number: 10310818

Abstract: Floating point chained multiply accumulation is performed using a multiplier to multiply a first floating point operand by a second floating point operand to generate an unrounded multiplication result. An adder then adds a third floating point operand to the unrounded multiplication result to generate an unrounded accumulation result. Rounding circuitry then applies both the rounding associated with the unrounded multiplication result and rounding associated with the unrounded accumulation result to generate a rounded accumulation result.

Type: Grant

Filed: July 19, 2017

Date of Patent: June 4, 2019

Assignee: ARM Limited

Inventor: Felix Segundo Missel Manzo
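
The abstract above says the rounding of both the multiplication and the accumulation is applied to the final result. The sketch below shows why that matters by contrasting a two-rounding multiply-accumulate with a single-rounding (fused) one. It is a software illustration in binary64, not a model of the circuit, and the function names are assumptions.

```python
from fractions import Fraction


def chained_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply then add, rounding after each step (two roundings)."""
    product = a * b        # rounded to binary64
    return product + c     # rounded again


def single_rounded_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once (fused behaviour)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)    # the single, correctly rounded conversion


if __name__ == "__main__":
    a = 1.0 + 2.0**-30
    b = 1.0 - 2.0**-30     # the exact product is 1 - 2**-60
    print(chained_multiply_add(a, b, -1.0))
    print(single_rounded_multiply_add(a, b, -1.0))
```

With these operands the chained form loses the low-order term when the product is rounded to 1.0, while the single-rounded form keeps it.
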
Methods and apparatuses for calculating FP (full precision) and PP (partial precision) values

Patent number: 10248417

Abstract: A method for calculating FP (Full Precision) and PP (Partial Precision) values, performed by an ID (Instruction Decode) unit, contains at least the following steps: decoding an instruction request from a compiler; executing a loop m times to generate m microinstructions for calculating first-type data, or n times to generate n microinstructions for calculating second-type data according to the instruction mode of the instruction request, thereby enabling ALGs (Arithmetic Logic Groups) to execute lanes of a thread. m is less than n and the precision of the first-type data is lower than the precision of the second-type data.

Type: Grant

Filed: August 24, 2017

Date of Patent: April 2, 2019

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventors: Huaisheng Zhang, Dacheng Liang, Boming Chen, Renyu Bian
Tiny detection in a floating-point unit

Patent number: 10241756

Abstract: A floating-point unit for performing tiny detection in floating-point operations. The floating-point unit is configured to implement a fused-multiply-add operation on three wide operands. The floating-point unit comprises: a multiplier, a left shifter, a right shifter, a select circuit comprising a 3-to-2 compressor, an adder connected to the dataflow from the select circuit, a first feedback path connecting a carry output of the adder to the select circuit, and a second feedback path connecting an output of the adder to the left and right shifters for passing an intermediate wide result through the left and right shifters. The adder is configured to provide an unrounded result for tiny detection.

Type: Grant

Filed: July 11, 2017

Date of Patent: March 26, 2019

Assignee: International Business Machines Corporation

Inventors: Michael K. Kroener, Silvia M. Mueller, Andreas Wagner
Normalization of a product on a datapath

Patent number: 10235135

Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add low part). The unit pre-aligns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are pre-aligned to each fit on one shifter of the two fixed length shifters.

Type: Grant

Filed: July 17, 2017

Date of Patent: March 19, 2019

Assignee: International Business Machines Corporation

Inventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner
Apparatus and method for estimating a shift amount when performing floating-point subtraction

Patent number: 10140093

Abstract: An apparatus and method are provided for estimating a shift amount when employing processing circuitry to perform a subtraction operation to subtract a second significand value of a second floating-point operand from a first significand value of a first floating-point operand in order to generate a difference value. Shift estimation circuitry then determines an estimated shift amount to be applied to the difference value. The shift estimation circuitry comprises significand analysis circuitry to generate, from analysis of the significand values of the two floating-point operands, a first bit string identifying a most significant bit position within the difference value that is predicted to have its bit set to a determined value. In parallel, shift limiting circuitry generates from an exponent value a second bit string identifying a shift limit bit position.

Type: Grant

Filed: March 30, 2017

Date of Patent: November 27, 2018

Assignee: ARM Limited

Inventors: David Raymond Lutz, Ian Michael Caulfield
Vector checksum instruction

Patent number: 10101998

Abstract: A Vector Checksum instruction. Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.

Type: Grant

Filed: May 25, 2017

Date of Patent: October 16, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Eric M. Schwarz
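
The element-by-element addition with end-around carry described in the abstract above can be sketched as follows. The 32-bit element width, the choice to fold the carry out of the top bit back into bit 0, and the function name are assumptions for illustration; the abstract itself leaves the "chosen" and "selected" positions open.

```python
from typing import Iterable


def checksum_end_around_carry(elements: Iterable[int], width: int = 32) -> int:
    """Add elements one by one, folding any carry out of the top bit
    back into the least significant bit after each addition."""
    mask = (1 << width) - 1
    total = 0
    for element in elements:
        total += element & mask
        # End-around carry: a carry out of bit `width - 1` re-enters at bit 0.
        total = (total & mask) + (total >> width)
    return total


if __name__ == "__main__":
    print(hex(checksum_end_around_carry([0xFFFFFFFF, 0x00000001])))
```

Because each element is masked to `width` bits, one fold per addition is enough: the folded sum always fits back into `width` bits.
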
Vector multiplication with accumulation in large register space

Patent number: 10095516

Abstract: An apparatus is described having an instruction execution pipeline that has a vector functional unit to support a vector multiply add instruction. The vector multiply add instruction to multiply respective K bit elements of two vectors and accumulate a portion of each of their respective products with another respective input operand in an X bit accumulator, where X is greater than K.

Type: Grant

Filed: June 29, 2012

Date of Patent: October 9, 2018

Assignee: INTEL CORPORATION

Inventors: Shay Gueron, Vlad Krasnov, Robert Valentine, Zeev Sperber, Amit Gradstein, Simon Rubanovich
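
A scalar model of the vector multiply add described in the abstract above follows. It assumes K = 52 and X = 64, and assumes the accumulated "portion" is either the low or the high K bits of each product. Those values, the `high` flag, and the function name are illustrative choices, not details taken from the abstract.

```python
from typing import List, Sequence


def vector_multiply_accumulate(a: Sequence[int], b: Sequence[int],
                               acc: Sequence[int], k_bits: int = 52,
                               x_bits: int = 64, high: bool = False) -> List[int]:
    """Multiply K-bit elements pairwise and add one K-bit portion of each
    product to the matching X-bit accumulator element (X > K)."""
    if not (len(a) == len(b) == len(acc)):
        raise ValueError("vectors must have the same length")
    k_mask = (1 << k_bits) - 1
    x_mask = (1 << x_bits) - 1
    result = []
    for x, y, z in zip(a, b, acc):
        product = (x & k_mask) * (y & k_mask)            # up to 2K bits wide
        portion = (product >> k_bits) if high else (product & k_mask)
        result.append((z + portion) & x_mask)            # wrap at X bits
    return result


if __name__ == "__main__":
    print(vector_multiply_accumulate([3, 4], [5, 6], [1, 1]))
```

Since X exceeds K, the accumulator has headroom, so several portions can be summed before a carry out of the X-bit element becomes possible.
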
Packed data alignment plus compute instructions, processors, methods, and systems

Patent number: 10001995

Abstract: A processor includes a decode unit to decode a packed data alignment plus compute instruction. The instruction is to indicate a first set of one or more source packed data operands that is to include first data elements, a second set of one or more source packed data operands that is to include second data elements, and at least one data element offset. An execution unit, in response to the instruction, is to store a result packed data operand that is to include result data elements that each have a value of an operation performed with a pair of a data element of the first set of source packed data operands and a data element of the second set of source packed data operands. The execution unit is to apply the at least one data element offset to at least a corresponding one of the first and second sets of source packed data operands. The at least one data element offset is to counteract any lack of correspondence between the data elements of each pair in the first and second sets of source packed data operands.

Type: Grant

Filed: June 2, 2015

Date of Patent: June 19, 2018

Assignee: Intel Corporation

Inventors: Edwin Jan Van Dalen, Alexander Augusteijn, Martinus C. Wezelenburg, Steven Roos
Binary fused multiply-add floating-point calculations

Patent number: 9959093

Abstract: A binary fused multiply-add floating-point unit configured to operate on an addend, a multiplier, and a multiplicand. The unit is configured to receive as the addend an unrounded result of a prior operation executed in the unit via an early result feedback path; to perform an alignment shift of the unrounded addend on an unrounded exponent and an unrounded mantissa; as well as perform a rounding correction for the addend in parallel to the actual alignment shift, responsive to a rounding-up signal.

Type: Grant

Filed: June 29, 2016

Date of Patent: May 1, 2018

Assignee: International Business Machines Corporation

Inventors: Michael Klein, Klaus M. Kroener, Cédric Lichtenau, Silvia Melitta Mueller
Binary fused multiply-add floating-point calculations

Patent number: 9952829

Abstract: A binary fused multiply-add floating-point unit configured to operate on an addend, a multiplier, and a multiplicand. The unit is configured to receive as the addend an unrounded result of a prior operation executed in the unit via an early result feedback path; to perform an alignment shift of the unrounded addend on an unrounded exponent and an unrounded mantissa; as well as perform a rounding correction for the addend in parallel to the actual alignment shift, responsive to a rounding-up signal.

Type: Grant

Filed: February 1, 2016

Date of Patent: April 24, 2018

Assignee: International Business Machines Corporation

Inventors: Michael Klein, Klaus M. Kroener, Cédric Lichtenau, Silvia Melitta Mueller
Multiply-and-accumulate unit in carry-save adder format and application in a feedback loop equalizer

Patent number: 9928035

Abstract: A multiply and accumulation (MAC) unit for multiplying a provided first and a provided second multiplicand and for adding a provided summand to the resulting product is described. The MAC includes at least one multiplication block which is configured to multiply a first input signal and a second input signal, wherein the first input signal is given in a carry-save adder format and the second input signal is given in a binary format, wherein the multiplication result is provided in a carry-save format, and a carry-save adder which is configured to add to the result of the multiplication the provided summand.

Type: Grant

Filed: May 18, 2016

Date of Patent: March 27, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Marcel Kossel
Split-path heuristic for performing a fused FMA operation

Patent number: 9891886

Abstract: A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C. An evaluation is made to detect whether values of A, B, and/or C meet a sufficient condition for performing a joint accumulation of C with partial products of A and B. If so, a joint accumulation of C is done with partial products of A and B and result of the joint accumulation is rounded. If not, then a primary accumulation is done of the partial products of A and B. This generates an unrounded non-redundant result of the primary accumulation. The unrounded result is then truncated to generate an unrounded non-redundant intermediate result vector that excludes one or more least significant bits of the unrounded non-redundant result. A secondary accumulation is then performed, adding or subtracting C to the unrounded non-redundant intermediate result vector. Finally, the result of the secondary accumulation is rounded.

Type: Grant

Filed: June 24, 2015

Date of Patent: February 13, 2018

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventor: Thomas Elmer
Microarchitecture for floating point fused multiply-add with exponent scaling

Patent number: 9841948

Abstract: Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages.

Type: Grant

Filed: August 12, 2015

Date of Patent: December 12, 2017

Assignee: QUALCOMM Incorporated

Inventor: Liang-Kai Wang
Approach to power reduction in floating-point operations

Patent number: 9829956

Abstract: An approach is provided for enabling power reduction in floating-point operations. In one example, a system receives floating-point numbers of a fused multiply-add instruction. The system determines the fused multiply-add instruction does not require compliance with a standard of precision for floating-point numbers. The system generates gating signals for an integrated circuit that is configured to perform operations of the fused multiply-add instruction. The system then sends the gating signals to the integrated circuit to turn off a plurality of logic gates included in the integrated circuit.

Type: Grant

Filed: November 21, 2012

Date of Patent: November 28, 2017

Assignee: NVIDIA Corporation

Inventors: David Conrad Tannenbaum, Colin Sprinkle, Stuart F. Oberman, Ming Y. Siu, Srinivasan Iyer, Ian-Chi Yan Kwong
Temporally split fused multiply-accumulate operation

Patent number: 9778908

Abstract: A microprocessor splits a fused multiply-accumulate operation of the form A*B+C into first and second multiply-accumulate sub-operations to be performed by a multiplier and an adder. The first sub-operation at least multiplies A and B, and conditionally also accumulates C to the partial products of A and B to generate an unrounded nonredundant sum. The unrounded nonredundant sum is stored in memory shared by the multiplier and adder for an indefinite time period, enabling the multiplier and adder to perform other operations unrelated to the multiply-accumulate operation. The second sub-operation conditionally accumulates C to the unrounded nonredundant sum if C is not already incorporated into the value, and then generates a final rounded result.

Type: Grant

Filed: June 24, 2015

Date of Patent: October 3, 2017

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventor: Thomas Elmer
