Multiplication Followed By Addition Patents (Class 708/501)
-
Patent number: 12217053Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.Type: GrantFiled: December 4, 2023Date of Patent: February 4, 2025Assignee: Intel CorporationInventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Patent number: 12216735Abstract: A dot-product architecture and method are disclosed for calculating floating-point dot-products of two vectors. The architecture includes an array of multiplier units that each include an integer logic that multiplies integer values of corresponding elements of the two vectors; an exponent logic that adds exponent values of the corresponding elements of the two vectors to form an unbiased exponent values, and a local shifter that forms a first shifted value by shifting a product-integer value by a number of bits in a predetermined direction based on a difference value between an unbiased exponent value corresponding to the product-integer value and a maximum unbiased exponent value for the array of multiplier units. An adder tree adds shifted values output from local shifters of the array of multiplier units to form an output, and an accumulator accumulates the output of the addition unit.Type: GrantFiled: January 20, 2021Date of Patent: February 4, 2025Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Hamzah Ahmed Ali Abdelaziz, Ali Shafiee Ardestani, Joseph H. Hassoun
-
Patent number: 12197889Abstract: A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates: an integer form fraction at a first bitwidth and a second bitwidth greater than the first bitwidth, a sign bit, and an exponent difference computed by subtracting an exponent sum from a maximum exponent sum of all exponent sums. The integer form fractions of the first bitwidths are provided to an adder tree using the first bitwidth, and if the sum has an excess percentage of leading 0s, then the second bitwidth is used by an adder tree using the second bitwidth to form a great precision integer form fraction. The sign, integer form fraction, and maximum exponent are provided to an normalizer which generates a floating point result.Type: GrantFiled: June 21, 2021Date of Patent: January 14, 2025Assignee: Ceremorphic, Inc.Inventor: Dylan Finch
-
Patent number: 12106069Abstract: A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair of values comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which outputs a signed integer form fraction and a maximum exponent. A range estimator forms a possible range of values from the exponent differences and determines an adder precision. The integer form fractions are summed using the adder precision, a sign bit is extracted, and a floating point value is output. Each MAC processor provides its integer form fraction with a precision determined by the MAC processor's exponent difference.Type: GrantFiled: June 21, 2021Date of Patent: October 1, 2024Assignee: Ceremorphic, Inc.Inventor: Dylan Finch
-
Power saving floating point Multiplier-Accumulator with a high precision accumulation detection mode
Patent number: 12079593Abstract: A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair of values comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which output a signed integer form fraction with a first bitwidth and a second bitwidth, along with a maximum exponent. The first bitwidth signed integer form fractions are summed by an adder tree using the first bitwidth to form a first sum, and when an excess leading 0 condition is detected, a second adder tree operative on the second bitwidth integer form fractions forms a second sum. The first sum or second sum, along with the maximum exponent, is converted into floating point result.Type: GrantFiled: June 21, 2021Date of Patent: September 3, 2024Assignee: Ceremorphic, Inc.Inventor: Dylan Finch -
Patent number: 11983237Abstract: A vector dot product multiplier receives a row vector and a column vector as floating point numbers in a format of sign plus exponent bits plus mantissa bits. The dot product multiplier generates a single dot product value by separately processing the sign bits, exponent bits, and mantissa bits in a few pipelined stages. A first pipeline stage generates a sign bit, a normalized mantissa formed by multiplying pairs multiplicand elements, and exponent information. A second pipeline stage receives the multiplied pairs of normalized mantissas, performs an adjustment, performs a padding, complement, and shift, and sums the results in an adder stage. The resulting integer is normalized to generate a sign bit, exponent, and mantissa of the floating point result.Type: GrantFiled: February 21, 2021Date of Patent: May 14, 2024Assignee: Ceremorphic, Inc.Inventor: Dylan Finch
-
Patent number: 11893360Abstract: A process for performing vector dot products receives a row vector and a column vector as floating point numbers in a format of sign plus exponent bits plus mantissa bits. The process generates a single dot product value by separately processing the sign bits, exponent bits, and mantissa bits to form a sign bit, a normalized mantissa formed by multiplying pairs multiplicand elements, and exponent information including MAX_EXP and EXP_DIFF. A second pipeline stage receives the multiplied pairs of normalized mantissas, optionally performs an exponent adjustment, pads, complements and shifts the normalized mantissas, and the results are added in a series of stages until a single addition result remains, which is normalized using MAX_EXP to form the floating point output result.Type: GrantFiled: February 21, 2021Date of Patent: February 6, 2024Assignee: Ceremorphic, Inc.Inventor: Dylan Finch
-
Patent number: 11782711Abstract: Systems, apparatuses, and methods related to dynamic precision bit string accumulation are described. Dynamic bit string accumulation can be performed using an edge computing device. In an example method, dynamic precision bit string accumulation can include performing an iteration of a recursive operation using a first bit string and a second bit string and determining that a result of the iteration of the recursive operation contains a quantity of bits in a particular bit sub-set of the result that is greater than a threshold quantity of bits associated with the particular bit sub-set. The method can further include writing a result of the iteration of the recursive operation to a first register and writing at least a portion of the bits associated with the particular bit sub-set of the result to a second register.Type: GrantFiled: November 29, 2021Date of Patent: October 10, 2023Assignee: Micron Technology, Inc.Inventors: Vijay S. Ramesh, Richard C. Murphy
-
Patent number: 11720328Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.Type: GrantFiled: September 23, 2020Date of Patent: August 8, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Bin He, Shubh Shah, Michael Mantor
-
Patent number: 11455766Abstract: A processor selectively adjusts the precision of data for different functional units. Specified functional units of the processor, such as shader processing unit of a graphics processing unit (GPU) include a zeroing module to store, based on the states of corresponding precision flags, a data value of zero at specified portion of an input and/or output data operand. The functional unit then processes the data including the zeroed portion. Because a portion of the data has been zeroed, the functional unit consumes less power during data processing. Furthermore, the precision flags are set such that the reduced precision of the data does not significantly impact a user experience.Type: GrantFiled: September 18, 2018Date of Patent: September 27, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Pramod V. Argade, Daniel Nikolai Peroni
-
Patent number: 11429349Abstract: Floating point Multiply-Add, Accumulate Unit, supporting BF16 format for Multiply-Accumulate operations, and FP32 Single-Precision Addition complying with the IEEE 754 Standard. The Multiply-Accumulate unit uses higher radix and longer internal 2's complement significand representation to facilitate precision as well as comparison and operation with negative numbers. The addition is performed using Carry-Save format to avoid long carry propagation and speed up the operation. Operations including overflow detection, zero detection and sign extension are adopted for 2s complement and Carry-Save format. Handling of Overflow and Sign Extension allows for fast operation relatively independent on the size of the accumulator.Type: GrantFiled: August 9, 2021Date of Patent: August 30, 2022Assignee: SambaNova Systems, Inc.Inventors: Vojin G. Oklobdzija, Matthew M. Kim
-
Patent number: 11321096Abstract: Hardware units and methods for performing matrix multiplication via a multi-stage pipeline wherein the storage elements associated with one or more stages of the pipeline are clock gated based on the data elements and/or portions thereof that known to have a zero value (or can be treated as having a zero value). In some cases, the storage elements may be clock gated on a per data element basis based on whether the data element has a zero value (or can be treated as having a zero value). In other cases, the storage elements may be clock gated on a partial element basis based on the bit width of the data elements. For example, if bit width of the data elements is less than a maximum bit width for the data elements then a portion of the bits related to that data element can be treated as having a zero value and a portion of the storage elements associated with that data element may not be clocked. In yet other cases the storage elements may be clock gated on both a per element and a partial element basis.Type: GrantFiled: November 5, 2018Date of Patent: May 3, 2022Assignee: Imagination Technologies LimitedInventors: Christopher Martin, Azzurra Pulimeno
-
Patent number: 11269632Abstract: An instruction to convert data from a source data type to a target data type is obtained. The source data type is selected from one or more source data types supported by the instruction, and the target data type is selected from one or more target data types supported by the instruction. Based on a selected data type of the source data type or the target data type, a determination is made of a rounding mode for use by the instruction. The rounding mode is implicitly set based on the selected data type; it is assigned to the selected data type. A conversion of the data from the source data type to the target data type is performed. The conversion includes performing a rounding operation using the rounding mode implicitly set. The performing the conversion provides a result in the target data type, which is written to a select location.Type: GrantFiled: June 17, 2021Date of Patent: March 8, 2022Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Laith M. AlBarakat, Jonathan D. Bradbury, Timothy Slegel, Cedric Lichtenau, Joachim von Buttlar
-
Patent number: 11237909Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.Type: GrantFiled: August 21, 2020Date of Patent: February 1, 2022Assignee: International Business Machines CorporationInventors: Robert F. Enenkel, Christopher Anand, Adele Olejarz, Lucas Dutton
-
Patent number: 11226791Abstract: An arithmetic processing device has, when any or both of a first operand and a second operand included in a multiply-add operation instruction is or are zero, an exponent setting circuit sets an exponent of the first operand to a first set value, and sets an exponent of the second operand to a second set value. An exponent calculation circuit calculates an exponent obtained by a multiply-add operation, based on the exponents of the first and second operands outputted by the exponent setting circuit and an exponent of a third operand included in the multiply-add operation instruction. The sum of the first set value and the second set value is set so that a bit position of the third operand is located on a higher-order bit side than the most significant bit of the sum of the first operand and the second operand.Type: GrantFiled: October 2, 2019Date of Patent: January 18, 2022Assignee: FUJITSU LIMITEDInventors: Takio Ono, Hiroyuki Wada
-
Patent number: 11175892Abstract: An integrated circuit with specialized processing blocks are provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.Type: GrantFiled: November 20, 2017Date of Patent: November 16, 2021Assignee: Intel CorporationInventors: Martin Langhammer, Dongdong Chen
-
Patent number: 11163531Abstract: A method and a MAC unit that may include accumulation unit and a multiplier. A accumulation unit that includes a first part, a second part and a third part. The first part may calculate a truncated sum. The second part may be configured to (a) receive, during each calculation cycle, a carry out of an add operation performed during a calculation cycle, (b) receive a sign bit of an intermediate product calculated during the calculation cycle; and (c) calculate, by the counter logic, a counter logic value, and (d) convert, after a start of a last calculation cycle of the calculation cycles, an output value of the counter logic to an intermediate value having a two's complement format. The third part may be configured to calculate an output value of the MAC unit based on the intermediate value and a truncated sum calculated by the first part of the accumulation unit.Type: GrantFiled: June 21, 2019Date of Patent: November 2, 2021Assignee: DSP GROUP LTD.Inventors: Moshe Haiut, Assaf Ganor
-
Patent number: 11137983Abstract: Configurable specialized processing blocks, such as DSP blocks, are described that implement fixed and floating-point functionality in a single mixed architecture on a programmable device. The described architecture reduces the need to construct floating-point functions outside the configurable specialized processing block, thereby minimizing hardware cost and area. The disclosed architecture also introduces pipelining into the DSP block in order to ensure the floating-point multiplication and addition functions remain in synchronicity, thereby increasing the maximum frequency at which the DSP block can operate. Moreover, the disclosed architecture includes logic circuitry to support floating-point exception handling.Type: GrantFiled: September 27, 2019Date of Patent: October 5, 2021Assignee: Altera CorporationInventors: Keone Streicher, Martin Langhammer, Yi-Wen Lin, Hyun Yi
-
Patent number: 11106598Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.Type: GrantFiled: December 16, 2019Date of Patent: August 31, 2021Assignee: Shanghai Cambricon Information Technology Co., Ltd.Inventors: Yao Zhang, Bingrui Wang
-
Patent number: 11061672Abstract: A microprocessor is configured for unchained and chained modes of split execution of a fused compound arithmetic operation. In both modes of split execution, a first execution unit executes only a first part of the fused compound arithmetic operation and produces an intermediate result thereof, and a second instruction execution unit receives the intermediate result and executes a second part of the fused compound arithmetic operation to produce a final result. In the unchained mode, execution is accomplished by dispatching separate split-execution microinstructions to the first and second instruction execution units. In the chained mode, execution is accomplished by dispatching a single split-execution microinstruction to the first instruction execution unit and sending a chaining control signal or signal group to the second execution unit, causing it to execute its part of the fused arithmetic operation without needing an instruction.Type: GrantFiled: July 5, 2016Date of Patent: July 13, 2021Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.Inventors: Thomas Elmer, Nikhil A. Patil
-
Patent number: 11016731Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.Type: GrantFiled: March 29, 2019Date of Patent: May 25, 2021Assignee: Intel CorporationInventors: Amit Gradstein, Simon Rubanovich, Zeev Sperber
-
Patent number: 11010131Abstract: An integrated circuit may include a floating-point adder. The adder may be implemented using a dual-path adder architecture having a near path and a far path. The near path may include a leading zero anticipator (LZA), a comparison circuit for comparing an exponent value to an LZA count, and associated circuitry for handling subnormal numbers. The far path may include a subtraction circuit for computing the difference between a received exponent value and a minimum exponent value, at least two shifters for shifting far greater and far lesser mantissa values in parallel, and associated circuitry for handling subnormal numbers. The adder may be dynamically configured to support a first mode that processes FP16 at inputs and outputs, a second mode that processes modified FP16? inputs, and a third mode that processes FP16? at inputs and outputs.Type: GrantFiled: September 14, 2017Date of Patent: May 18, 2021Assignee: Intel CorporationInventors: Martin Langhammer, Bogdan Pasca
-
Patent number: 10977039Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements.Type: GrantFiled: November 1, 2019Date of Patent: April 13, 2021Assignee: Intel CorporationInventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Mark Charney, Robert Valentine, Jesus Corbal, Binwei Yang
-
Patent number: 10776207Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.Type: GrantFiled: September 6, 2018Date of Patent: September 15, 2020Assignee: International Business Machines CorporationInventors: Robert F. Enenkel, Christopher Anand, Lucas Dutton, Adele Olejarz
-
Patent number: 10761807Abstract: This invention discloses a floating-point number operation circuit and a method thereof. The floating-point number operation circuit is configured to perform a fused multiplication and accumulation (fused mac) operation or a multiplication and accumulation (mac) operation on a first operand, a second operand, and a third operand, or perform a multiplication operation on the first operand and the second operand. The floating-point number operation circuit includes two rounding circuits, a multiplication circuit, a selection circuit, a control circuit, and an addition circuit. The control circuit controls the scheduling of various operations and the use of resources on each calculation path.Type: GrantFiled: October 23, 2018Date of Patent: September 1, 2020Assignee: REALTEK SEMICONDUCTOR CORPORATIONInventor: Chia-I Chen
-
Patent number: 10713012Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation.Type: GrantFiled: October 15, 2018Date of Patent: July 14, 2020Assignee: Intel CorporationInventors: Aditya Varma, Michael Espig
-
Patent number: 10678510Abstract: The present embodiments relate to integrated circuits with floating-point arithmetic circuitry that handles normalized and denormalized floating-point numbers. The floating-point arithmetic circuitry may include a normalization circuit and a rounding circuit, and the floating-point arithmetic circuitry may generate a first result in form of a normalized, unrounded floating-point number and a second result in form of a normalized, rounded floating-point number. If desired, the floating-point arithmetic circuitry may be implemented in specialized processing blocks.Type: GrantFiled: September 25, 2017Date of Patent: June 9, 2020Assignee: Altera CorporationInventor: Martin Langhammer
-
Patent number: 10671347Abstract: Embodiments are directed to a computer implemented method for executing machine instructions in a central processing unit. The method includes obtaining, by a processor system, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture. The method further includes executing the machine instruction, wherein the executing includes loading a multiplicand into a multiplicand register, and loading a multiplier into a multiplier register. The executing further generates an intermediate product having least significant bits by multiplying the multiplicand and the multiplier. The executing further includes generating a rounded product by performing a probability analysis on the least significant bits of the intermediate product, and initiating a rounding operation on the intermediate product to produce the rounded product based at least in part on the probability analysis.Type: GrantFiled: January 28, 2016Date of Patent: June 2, 2020Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jonathan D. Bradbury, Steven R. Carlough, Brian R. Prasky, Eric M. Schwarz
-
Patent number: 10649730Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add high part). The unit pre-aligns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are pre-aligned to each fit on one shifter of the two fixed length shifters.Type: GrantFiled: June 26, 2019Date of Patent: May 12, 2020Assignee: International Business Machines CorporationInventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner
-
Patent number: 10481869Abstract: Techniques are disclosed relating to circuitry configured to perform floating-point operations such as fused multiply-addition (FMA) with multiple paths and power control. In some embodiments, an FMA unit includes a near path and multiple far paths and is configured to select a path based on a determined exponent difference. In some embodiments, the FMA unit is configured to operate portions of non-selected paths in a low power state.Type: GrantFiled: November 10, 2017Date of Patent: November 19, 2019Assignee: Apple Inc.Inventors: Liang-Kai Wang, Ting Yu, Yu Sun
-
Patent number: 10445066Abstract: Embodiments are directed to a computer implemented method for executing machine instructions in a central processing unit. The method includes obtaining, by a processor system, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture. The method further includes executing the machine instruction, wherein the executing includes loading a multiplicand into a multiplicand register, and loading a multiplier into a multiplier register. The executing further generates an intermediate product having least significant bits by multiplying the multiplicand and the multiplier. The executing further includes generating a rounded product by performing a probability analysis on the least significant bits of the intermediate product, and initiating a rounding operation on the intermediate product to produce the rounded product based at least in part on the probability analysis.Type: GrantFiled: February 14, 2017Date of Patent: October 15, 2019Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jonathan D. Bradbury, Steven R. Carlough, Brian R. Prasky, Eric M. Schwarz
-
Patent number: 10379811Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add high part). The unit pre-aligns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are pre-aligned to each fit on one shifter of the two fixed length shifters.Type: GrantFiled: November 16, 2017Date of Patent: August 13, 2019Assignee: International Business Machines CorporationInventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner
-
Patent number: 10331407Abstract: A method for performing tiny detection in floating-point operations with a floating-point unit. The floating-point unit is configured to implement a fused-multiply-add operation on three wide operands. The floating-point unit comprise: a multiplier, a left shifter, a right shifter a select circuit comprising a 3-to-2 compressor, an adder connected to the dataflow from the select circuit, a first feedback path connecting a carry output) of the adder to the select circuit, and a second feedback path connecting an output of the adder to the left and right shifters for passing an intermediate wide result through the left and right shifters. The adder is configured to provide an unrounded result for tiny detection.Type: GrantFiled: November 11, 2017Date of Patent: June 25, 2019Assignee: International Business Machines CorporationInventors: Michael K. Kroener, Silvia M. Mueller, Andreas Wagner
-
Patent number: 10318290Abstract: A first floating-point operation unit receives first and second variables and performs a first operation generating a first output. A first rounding unit receives and rounds the first output to generate a second output if a control bit is in a first state. A second floating-point operation unit receives a third variable and either the first output or the second output and performs a second operation on the third variable and either the first output or the second output, to generate a third output. The second floating-point operation unit receives and operates on the first output if the control bit is in the first state, or the second output if the control bit is in the second state. A second rounding unit receives and rounds the third output.Type: GrantFiled: May 24, 2017Date of Patent: June 11, 2019Assignee: ARM Finance Overseas LimitedInventor: David Yiu-Man Lau
-
Patent number: 10310818Abstract: Floating point chained multiply accumulation is performed using a multiplier to multiply a first floating point operand by a second floating point operand to generate an unrounded multiplication result. An adder then adds a third floating point operand to the unrounded multiplication result to generate an unrounded accumulation result. Rounding circuitry then applies both the rounding associated with the unrounded multiplication result and rounding associated with the unrounded accumulation result to generate a rounded accumulation result.Type: GrantFiled: July 19, 2017Date of Patent: June 4, 2019Assignee: ARM LimitedInventor: Felix Segundo Missel Manzo
-
Patent number: 10248417Abstract: A method for calculating FP (Full Precision) and PP (Partial Precision) values, performed by an ID (Instruction Decode) unit, contains at least the following steps: decoding an instruction request from a compiler; executing a loop m times to generate m microinstructions for calculating first-type data, or n times to generate n microinstructions for calculating second-type data according to the instruction mode of the instruction request, thereby enabling ALGs (Arithmetic Logic Groups) to execute lanes of a thread. m is less than n and the precision of the first-type data is lower than the precision of the second-type data.Type: GrantFiled: August 24, 2017Date of Patent: April 2, 2019Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.Inventors: Huaisheng Zhang, Dacheng Liang, Boming Chen, Renyu Bian
-
Patent number: 10241756Abstract: A floating-point unit for performing tiny detection in floating-point operations. The floating-point unit is configured to implement a fused-multiply-add operation on three wide operands. The floating-point unit comprise: a multiplier, a left shifter, a right shifter a select circuit comprising a 3-to-2 compressor, an adder connected to the dataflow from the select circuit, a first feedback path connecting a carry output) of the adder to the select circuit, and a second feedback path connecting an output of the adder to the left and right shifters for passing an intermediate wide result through the left and right shifters. The adder is configured to provide an unrounded result for tiny detection.Type: GrantFiled: July 11, 2017Date of Patent: March 26, 2019Assignee: International Business Machines CorporationInventors: Michael K. Kroener, Silvia M. Mueller, Andreas Wagner
-
Patent number: 10235135Abstract: A unit operates on a sum term and a carry term separated into a high part and a low part of a product and performs a method that includes iteratively computing a carry save product and separating the carry save product into the high part and the low part: an intermediate product. The unit generates an intermediate wide result by performing a wide addition of the intermediate product to generate an unrounded sum for the high part (i.e., a fused-multiply-add high part) and the low part (i.e., a fused-multiply-add high part). The unit pre-aligns the intermediate wide result on two fixed length shifters such that the fused-multiply-add high part and the fused-multiply-add low part are pre-aligned to each fit on one shifter of the two fixed length shifters.Type: GrantFiled: July 17, 2017Date of Patent: March 19, 2019Assignee: International Business Machines CorporationInventors: Klaus M. Kroener, Cedric Lichtenau, Silvia M. Mueller, Andreas Wagner
-
Patent number: 10140093Abstract: An apparatus and method are provided for estimating a shift amount when employing processing circuitry to perform a subtraction operation to subtract a second significand value of a second floating-point operand from a first significand value of a first floating-point operand in order to generate a difference value. Shift estimation circuitry then determines an estimated shift amount to be applied to the difference value. The shift estimation circuitry comprises significand analysis circuitry to generate, from analysis of the significand values of the two floating-point operands, a first bit string identifying a most significant bit position within the difference value that is predicted to have its bit set to a determined value. In parallel, shift limiting circuitry generates from an exponent value a second bit string identifying a shift limit bit position.Type: GrantFiled: March 30, 2017Date of Patent: November 27, 2018Assignee: ARM LimitedInventors: David Raymond Lutz, Ian Michael Caulfield
-
Patent number: 10101998Abstract: A Vector Checksum instruction. Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.Type: GrantFiled: May 25, 2017Date of Patent: October 16, 2018Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jonathan D. Bradbury, Eric M. Schwarz
-
Patent number: 10095516Abstract: An apparatus is described having an instruction execution pipeline that has a vector functional unit to support a vector multiply add instruction. The vector multiply add instruction to multiply respective K bit elements of two vectors and accumulate a portion of each of their respective products with another respective input operand in an X bit accumulator, where X is greater than K.Type: GrantFiled: June 29, 2012Date of Patent: October 9, 2018Assignee: INTEL CORPORATIONInventors: Shay Gueron, Vlad Krasnov, Robert Valentine, Zeev Sperber, Amit Gradstein, Simon Rubanovich
-
Patent number: 10001995Abstract: A processor includes a decode unit to decode a packed data alignment plus compute instruction. The instruction is to indicate a first set of one or more source packed data operands that is to include first data elements, a second set of one or more source packed data operands that is to include second data elements, at least one data element offset. An execution unit, in response to the instruction, is to store a result packed data operand that is to include result data elements that each have a value of an operation performed with a pair of a data element of the first set of source packed data operands and a data element of the second set of source packed data operands. The execution unit is to apply the at least one data element offset to at least a corresponding one of the first and second sets of source packed data operands. The at least one data element offset is to counteract any lack of correspondence between the data elements of each pair in the first and second sets of source packed data operands.Type: GrantFiled: June 2, 2015Date of Patent: June 19, 2018Assignee: Intel CorporationInventors: Edwin Jan Van Dalen, Alexander Augusteijn, Martinus C. Wezelenburg, Steven Roos
-
Patent number: 9959093Abstract: A binary fused multiply-add floating-point unit configured to operate on an addend, a multiplier, and a multiplicand. The unit is configured to receive as the addend an unrounded result of a prior operation executed in the unit via an early result feedback path; to perform an alignment shift of the unrounded addend on an unrounded exponent and an unrounded mantissa; as well as perform a rounding correction for the addend in parallel to the actual alignment shift, responsive to a rounding-up signal.Type: GrantFiled: June 29, 2016Date of Patent: May 1, 2018Assignee: International Business Machines CorporationInventors: Michael Klein, Klaus M. Kroener, Cédric Lichtenau, Silvia Melitta Mueller
-
Patent number: 9952829Abstract: A binary fused multiply-add floating-point unit configured to operate on an addend, a multiplier, and a multiplicand. The unit is configured to receive as the addend an unrounded result of a prior operation executed in the unit via an early result feedback path; to perform an alignment shift of the unrounded addend on an unrounded exponent and an unrounded mantissa; as well as perform a rounding correction for the addend in parallel to the actual alignment shift, responsive to a rounding-up signal.Type: GrantFiled: February 1, 2016Date of Patent: April 24, 2018Assignee: International Business Machines CorporationInventors: Michael Klein, Klaus M. Kroener, Cédric Lichtenau, Silvia Melitta Mueller
-
Multiply-and-accumulate unit in carry-save adder format and application in a feedback loop equalizer
Patent number: 9928035Abstract: A multiply and accumulation (MAC) unit for multiplying a provided first and a provided second multiplicand and for adding a provided summand to the resulting product is described. The MAC includes at least one multiplication block which is configured to multiply a first input signal and a second input signal, wherein the first input signal is given in a carry-save adder format and the second input signal is given in a binary format, wherein the multiplication result is provided in a carry-save format, and a carry-save adder which is configured to add to the result of the multiplication the provided summand.Type: GrantFiled: May 18, 2016Date of Patent: March 27, 2018Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Marcel Kossel -
Patent number: 9891886Abstract: A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C. An evaluation is made to detect whether values of A, B, and/or C meet a sufficient condition for performing a joint accumulation of C with partial products of A and B. If so, a joint accumulation of C is done with partial products of A and B and result of the joint accumulation is rounded. If not, then a primary accumulation is done of the partial products of A and B. This generates an unrounded non-redundant result of the primary accumulation. The unrounded result is then truncated to generate an unrounded non-redundant intermediate result vector that excludes one or more least significant bits of the unrounded non-redundant result. A secondary accumulation is then performed, adding or subtracting C to the unrounded non-redundant intermediate result vector. Finally, the result of the secondary accumulation is rounded.Type: GrantFiled: June 24, 2015Date of Patent: February 13, 2018Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTDInventor: Thomas Elmer
-
Patent number: 9841948Abstract: Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages.Type: GrantFiled: August 12, 2015Date of Patent: December 12, 2017Assignee: QUALCOMM IncorporatedInventor: Liang-Kai Wang
-
Patent number: 9829956Abstract: An approach is provided for enabling power reduction in floating-point operations. In one example, a system receives floating-point numbers of a fused multiply-add instruction. The system determines the fused multiply-add instruction does not require compliance with a standard of precision for floating-point numbers. The system generates gating signals for an integrated circuit that is configured to perform operations of the fused multiply-add instruction. The system then sends the gating signals to the integrated circuit to turn off a plurality of logic gates included in the integrated circuit.Type: GrantFiled: November 21, 2012Date of Patent: November 28, 2017Assignee: NVIDIA CorporationInventors: David Conrad Tannenbaum, Colin Sprinkle, Stuart F. Oberman, Ming Y. Siu, Srinivasan Iyer, Ian-Chi Yan Kwong
-
Patent number: 9778908Abstract: A microprocessor splits a fused multiply-accumulate operation of the form A*B+C into first and second multiply-accumulate sub-operations to be performed by a multiplier and an adder. The first sub-operation at least multiplies A and B, and conditionally also accumulates C to the partial products of A and B to generate an unrounded nonredundant sum. The unrounded nonredundant sum is stored in memory shared by the multiplier and adder for an indefinite time period, enabling the multiplier and adder to perform other operations unrelated to the multiply-accumulate operation. The second sub-operation conditionally accumulates C to the unrounded nonredundant sum if C is not already incorporated into the value, and then generates a final rounded result.Type: GrantFiled: June 24, 2015Date of Patent: October 3, 2017Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.Inventor: Thomas Elmer
-
Patent number: 9720648Abstract: A method for hiding implicit bit corrections in a partial product adder array in a binary and hexadecimal floating-point multiplier such that no additional adder stages are needed for the implicit bit corrections. Two leading-one correction terms are generated for the fraction in the multiplier floating-point number and two leading-one correction terms are generated for the fraction in the multiplicand floating-point number. The floating-point numbers may be single-precision or double-precision. Each leading-one correction term for the single-precision case is appended to the left of an intermediate partial product sum in the adder array that is an input to an adder so as to not to extend the bits in the input further to the left than the bits in another input to the adder. Each leading-one correction term for the double-precision case replaces an adder input that is unused when base-2 floating-point numbers are multiplied.Type: GrantFiled: December 22, 2014Date of Patent: August 1, 2017Assignee: International Business Machines CorporationInventors: Silvia M. Mueller, Son Dao Trong