Round Off Or Truncation Patents (Class 708/497)
  • Patent number: 11768685
    Abstract: An integrated circuit, comprising an instruction pipeline that includes instruction fetch phase circuitry, instruction decode phase circuitry, and instruction execution circuitry. The instruction execution circuitry includes transformation circuitry for receiving an interleaved dual vector operand as an input and for outputting a first natural order vector including a first set of data values from the interleaved dual vector operand and a second natural order vector including a second set of data values from the interleaved dual vector operand.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: September 26, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Mujibur Rahman, Timothy David Anderson, Joseph Zbiciak
  • Patent number: 11449309
    Abstract: A hardware module comprising circuitry configured to: store a sequence of n bits in a register of the hardware module; generate a signed integer comprising a magnitude component and a sign bit by: if the most significant bit of the sequence of n bits is equal to one: set each of the n?1 of the most significant bits of the magnitude component to be equal to the corresponding bit of the n?1 least significant bits of the sequence of n bits; and set the sign bit to be zero; if the most significant bit of the sequence of n bits is equal to zero: set each of the n?1 of the most significant bits of the magnitude component to be equal to the inverse of the corresponding bit of the n?1 least significant bits of the sequence of n bits; and set the sign bit to be one.
    Type: Grant
    Filed: June 21, 2019
    Date of Patent: September 20, 2022
    Assignee: GRAPHCORE LIMITED
    Inventors: Stephen Felix, Mrudula Gore
  • Patent number: 11449574
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floating-point unit enabled to perform stochastic rounding, thus in some circumstances enabling reducing systematic bias in long dependency chains of floating-point computations. The long dependency chains of floating-point computations are performed, e.g., to train a neural network or to perform inference with respect to a trained neural network.
    Type: Grant
    Filed: April 13, 2018
    Date of Patent: September 20, 2022
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Michael Edwin James, Michael Morrison, Gary R. Lauterbach, Srikanth Arekapudi
  • Patent number: 11422803
    Abstract: A processing-in-memory (PIM) device includes a data storage region and a multiplication/accumulation (MAC) operator. The data storage region is configured to store first data and second data. The MAC operator is configured to perform a MAC arithmetic operation of the first data and the second data. The MAC operator includes a MAC circuit configured to perform the MAC arithmetic operation to output MAC result data and a data output unit configured to feedback bias data to the MAC circuit prior to the MAC arithmetic operation.
    Type: Grant
    Filed: January 8, 2021
    Date of Patent: August 23, 2022
    Assignee: SK hynix Inc.
    Inventor: Choung Ki Song
  • Patent number: 11392418
    Abstract: A computer system may initialize one or more workloads. The computer system may operate in a boost mode and a regular mode. The boost mode may include an adjustment of a pacing setting and an adjustment of group availability targets for executing the one or more workloads. The computer system may identify that the boost mode is enabled during a system start of the computer system. The computer system may identify that the pacing setting is operating in the regular mode. The computer system may dynamically increase the pacing setting. The increase of the pacing setting may enable an increased processor utilization of the computer system by the one or more workloads. The increased processor utilization may generate a concurrent processing of the one or more workloads. The computer system may determine an end of the boost mode and reset the pacing setting.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: July 19, 2022
    Assignee: International Business Machines Corporation
    Inventors: Juergen Holtz, Qais Noorshams
  • Patent number: 11372644
    Abstract: A processor system comprises a shared memory and a processing element. The processing element includes a matrix processor unit and is in communication with the shared memory. The processing element is configured to receive a processor instruction specifying a data matrix and a matrix manipulation operation. A manipulation matrix based on the processor instruction is identified. The data matrix and the manipulation matrix are loaded into the matrix processor unit and a matrix operation is performed to determine a result matrix. The result matrix is outputted to a destination location.
    Type: Grant
    Filed: December 9, 2019
    Date of Patent: June 28, 2022
    Assignee: Meta Platforms, Inc.
    Inventors: Thomas Mark Ulrich, Krishnakumar Narayanan Nair, Yuchen Hao
  • Patent number: 11334796
    Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
    Type: Grant
    Filed: August 3, 2020
    Date of Patent: May 17, 2022
    Assignee: Intel Corporation
    Inventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
  • Patent number: 10942889
    Abstract: Bit string accumulation in a memory array periphery is described. Control circuitry (e.g., a processing device) may be utilized to control performance of operations using bit strings within a memory device. Results of the operations may be accumulated in circuitry peripheral to a memory array of the memory device. For instance, a plurality of sense amplifiers may be coupled to a memory array and a processing device. A quantity of sense amplifiers among the plurality of sense amplifiers can be the same as a quantity of rows or columns of the array. The processing device may be configured to cause performance of a recursive operation using one or more bit strings that are formatted according to a Type III universal number format or a posit format. The processing device may further be configured to cause resultant bit strings representing results of iterations of the recursive operation to be accumulated in the plurality of sense amplifiers.
    Type: Grant
    Filed: June 4, 2019
    Date of Patent: March 9, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Vijay S. Ramesh
  • Patent number: 10942890
    Abstract: Systems, apparatuses, and methods related to bit string accumulation in memory array periphery are described. Control circuitry (e.g., a processing device) may be utilized to control performance of operations using bit strings within a memory device. Results of the operations may be accumulated in circuitry peripheral to a memory array of the memory device. For instance, a method for bit string accumulation in memory array periphery can include retrieving a bit string stored in a data structure of a memory array. The bit string can represent a result of performance of an arithmetic operation or a logical operation. The method can further include storing the bit string in a plurality of sense amplifiers located in a periphery of the memory array and using the bit string as an operand in performance of at least a portion of a recursive operation.
    Type: Grant
    Filed: June 4, 2019
    Date of Patent: March 9, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Vijay S. Ramesh
  • Patent number: 10936569
    Abstract: In a system for storing in memory a tensor that includes at least three modes, elements of the tensor are stored in a mode-based order for improving locality of references when the elements are accessed during an operation on the tensor. To facilitate efficient data reuse in a tensor transform that includes several iterations, on a tensor that includes at least three modes, a system performs a first iteration that includes a first operation on the tensor to obtain a first intermediate result, and the first intermediate result includes a first intermediate-tensor. The first intermediate result is stored in memory, and a second iteration is performed in which a second operation on the first intermediate result accessed from the memory is performed, so as to avoid a third operation, that would be required if the first intermediate result were not accessed from the memory.
    Type: Grant
    Filed: May 20, 2013
    Date of Patent: March 2, 2021
    Assignee: Reservoir Labs, Inc.
    Inventors: Muthu Manikandan Baskaran, Richard A. Lethin, Benoit J. Meister, Nicolas T. Vasilache
  • Patent number: 10936769
    Abstract: Systems and methods evaluate simulation models and measure floating point arithmetic errors in terms of Unit in Last Place (ULP). The simulation model may include model elements that perform numerical computations using Native Floating Point (NFP) arithmetic. The model elements may be arranged to implement a procedure. A data store may include local ULP errors predetermined for the model elements. The systems and methods may retrieve the local ULP errors for the model elements included in the model, and may apply a rules-based analysis to compute an overall ULP error of the simulation model. The systems and methods may present the overall ULP computed for the model. The systems and methods may also present intermediate ULP errors determined for portions of the simulation model. Changes may be made to the model to reduce the overall ULP error.
    Type: Grant
    Filed: May 10, 2019
    Date of Patent: March 2, 2021
    Assignee: The MathWorks, Inc.
    Inventors: Kiran K. Kintali, Shomit Dutta, E. Mehran Mestchian, Pieter J. Mosterman
  • Patent number: 10782932
    Abstract: A round-for-reround mode (preferably in a BID encoded Decimal format) of a floating point instruction prepares a result for later rounding to a variable number of digits by detecting that the least significant digit may be a 0, and if so changing it to 1 when the trailing digits are not all 0. A subsequent reround instruction is then able to round the result to any number of digits at least 2 fewer than the number of digits of the result. An optional embodiment saves a tag indicating the fact that the low order digit of the result is 0 or 5 if the trailing bits are non-zero in a tag field rather than modify the result. Another optional embodiment also saves a half-way-and-above indicator when the trailing digits represent a decimal with a most significant digit having a value of 5. An optional subsequent reround instruction is able to round the result to any number of digits fewer or equal to the number of digits of the result using the saved tags.
    Type: Grant
    Filed: August 25, 2019
    Date of Patent: September 22, 2020
    Assignee: International Business Machines Corporation
    Inventors: Michael F. Cowlishaw, Eric M. Schwarz, Ronald M. Smith, Sr., Phil C. Yeh
  • Patent number: 10606557
    Abstract: A data processing apparatus is provided. Intermediate value generation circuitry generates an intermediate value from a first floating point number and a second floating point number. The intermediate value includes a number of leading 0s indicative of a prediction of a number of leading 0s in a difference between absolute values of the first floating point number and the second floating point number. The prediction differs by at most one from the number of leading 0s in the difference between absolute values of the first floating point number and the second floating point number. Count circuitry counts the number of leading 0s in said intermediate value and mask generation circuitry produces one or more masks using the intermediate value. The mask generation circuitry produces the one or more masks at the same time or before the count circuitry counts the number of leading 0s in the intermediate value.
    Type: Grant
    Filed: December 6, 2016
    Date of Patent: March 31, 2020
    Assignee: ARM Limited
    Inventor: David Raymond Lutz
  • Patent number: 10592672
    Abstract: The disclosed embodiments provide a system that facilitates testing of an insecure computing environment. During operation, the system obtains a real data set comprising a set of data strings. Next, the system determines a set of frequency distributions associated with the set of data strings. The system then generates a test data set from the real data set, wherein the test data set comprises a set of random data strings that conforms to the set of frequency distributions. Finally, the system tests the insecure computing environment using the test data set.
    Type: Grant
    Filed: December 21, 2016
    Date of Patent: March 17, 2020
    Assignee: INTUIT INC.
    Inventor: Colin R. Dillard
  • Patent number: 10592252
    Abstract: Efficient instruction processing for sparse data includes extensions to a processor pipeline to identify zero-optimizable instructions that include at least one zero input operand, and bypass the execute stage of the processor pipeline, determining the result of the operation without executing the instruction. When possible, the extensions also bypass the writeback stage of the processor pipeline.
    Type: Grant
    Filed: December 31, 2015
    Date of Patent: March 17, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
  • Patent number: 10416962
    Abstract: Logic is provided for performing decimal and binary floating point arithmetic calculations on first and second operands. The method includes: receiving the first and second operands in packed format; unpacking the first and second operands; swapping the first operand to a fourth operand and the second operand to a third operand, if an exponent of the first operand is less than an exponent of the second operand, otherwise storing the first operand to the third operand and the second operand to the fourth operand; aligning the third operand and the fourth operands based on the exponent difference of the third and fourth operand and a number of leading zeroes of the third operand; performing an add/subtract operation on the aligned third and fourth operands with normalizing and rounding between the operands; and packing the result obtained from the add/subtract.
    Type: Grant
    Filed: October 2, 2015
    Date of Patent: September 17, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Steven R. Carlough, Juergen Haess, Michael Klein, Klaus M. Kroener, Petra Leber, Silvia M. Mueller, Kerstin Schelm
  • Patent number: 10318240
    Abstract: Setting or updating of floating point controls is managed. Floating point controls include controls used for floating point operations, such as rounding mode and/or other controls. Further, floating point controls include status associated with floating point operations, such as floating point exceptions and/or others. The management of the floating point controls includes efficiently updating the controls, while reducing costs associated therewith.
    Type: Grant
    Filed: November 14, 2017
    Date of Patent: June 11, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10310814
    Abstract: Setting or updating of floating point controls is managed. Floating point controls include controls used for floating point operations, such as rounding mode and/or other controls. Further, floating point controls include status associated with floating point operations, such as floating point exceptions and/or others. The management of the floating point controls includes efficiently updating the controls, while reducing costs associated therewith.
    Type: Grant
    Filed: June 23, 2017
    Date of Patent: June 4, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10209986
    Abstract: A method of an aspect includes receiving a floating point rounding instruction. The floating point rounding instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point that each of the one or more floating point data elements are to be rounded to, and indicates a destination storage location. A result is stored in the destination storage location in response to the floating point rounding instruction. The result includes one or more rounded result floating point data elements. Each of the one or more rounded result floating point data elements includes one of the floating point data elements of the source, in a corresponding position, which has been rounded to the indicated number of fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: February 19, 2019
    Assignee: Intel Corporation
    Inventors: Jesus Corbal San Adrian, Cristina S. Anderson, Robert Valentine, Bret Toll, Amit Gradstein, Simon Rubanovich, Benny Eitan
  • Patent number: 10162639
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: March 5, 2018
    Date of Patent: December 25, 2018
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Patent number: 10162638
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: March 5, 2018
    Date of Patent: December 25, 2018
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Patent number: 10162637
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: March 5, 2018
    Date of Patent: December 25, 2018
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Patent number: 10146503
    Abstract: Embodiments disclosed pertain to apparatuses, systems, and methods for floating point operations. Disclosed embodiments pertain to a circuit that is capable of processing both a normal and denormal inputs and outputting normal and denormal results, and where a rounding module is used advantageously to reduce operational latency of the circuit.
    Type: Grant
    Filed: October 13, 2016
    Date of Patent: December 4, 2018
    Assignee: Imagination Technologies Limited
    Inventor: Leonard Rarick
  • Patent number: 10140092
    Abstract: According to one general aspect, an apparatus may include a floating-point multiply-accumulate unit configured to generate a floating point result by either adding or subtracting three floating point operands: an addend, a product carry, and a product sum. The floating-point multiply-accumulate unit may include a close path adder. The close path adder may include an unincremented mantissa addition circuit configured to compute an unincremented mantissa result based upon the three floating point operands. The close path adder may also include an incremented mantissa addition circuit configured to, at least partially in parallel with the mantissa addition circuit, produce an incremented mantissa result. The close path adder may further include a selection circuit configured to produce a close path result by selecting between the unincremented mantissa result and the incremented mantissa result.
    Type: Grant
    Filed: February 10, 2017
    Date of Patent: November 27, 2018
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Ashraf Ahmed
  • Patent number: 9928031
    Abstract: Processing circuitry is provided to perform an overlap propagating operation on a first data value to generate a second data value, the first and second data values having a redundant representation representing a P-bit numeric value using an M-bit data value comprising a plurality of N-bit portions, where M>P>N. In the redundant representation, each N-bit portion other than a most significant N-bit portion includes a plurality of overlap bits having a same significance as a plurality of least significant bits of a following N-bit portion. Each N-bit portion of the second data value other than a least significant N-bit portion is generated by adding non-overlap bits of a corresponding N-bit portion of the first data value to the overlap bits of a preceding N-bit portion of the first data value. This provides a faster technique for reducing the chance of overflow during addition of the redundantly represented M-bit value.
    Type: Grant
    Filed: November 12, 2015
    Date of Patent: March 27, 2018
    Assignee: ARM LIMITED
    Inventors: Neil Burgess, David Raymond Lutz, Christopher Neal Hinds
  • Patent number: 9817661
    Abstract: A data processing system supports execution of program instructions having a rounding position input operand so as to generate control signals for controlling processing circuitry to process a floating point input operand with a significand value to generate an output result which depends upon a value from rounding the floating point input operand using a variable rounding point within the significand of the floating point input operand as specified by the rounding position input operand. In this way, processing operations having as inputs floating point operands and anchored number operands may be facilitated.
    Type: Grant
    Filed: October 7, 2015
    Date of Patent: November 14, 2017
    Assignee: ARM Limited
    Inventors: David Raymond Lutz, Christopher Neal Hinds, Neil Burgess
  • Patent number: 9798519
    Abstract: A microprocessor comprises an instruction pipeline, a shared memory, and first and second arithmetic processing units in the instruction pipeline, each capable of reading or receiving operands from and writing or providing results to the shared memory. The first arithmetic processing unit performs a first portion of a mathematical operation to produce an intermediate result vector that is not a complete, final result of the mathematical operation. The first arithmetic processing unit generates a plurality of non-architectural calculation control indicators that indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The second arithmetic processing unit performs a second portion of the mathematical operation, in accordance with the calculation control indicators, to produce a complete, final result of the mathematical operation.
    Type: Grant
    Filed: June 24, 2015
    Date of Patent: October 24, 2017
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventor: Thomas Elmer
  • Patent number: 9778909
    Abstract: Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier.
    Type: Grant
    Filed: October 24, 2016
    Date of Patent: October 3, 2017
    Assignee: Intel Corporation
    Inventors: Sridhar Samudrala, Grigorios Magklis, Marc Lupon, David R. Ditzel
  • Patent number: 9733899
    Abstract: Processing circuitry performs a plurality of lanes of processing on respective data elements of at least one operand vector to generate corresponding result data elements of a result vector. The processing circuitry identifies lane position information for each lane of processing, the lane position information for a given lane identifying a relative position of the corresponding result data element to be generated by the given lane within a corresponding result data value spanning one or more result data elements of the result vector. The processing circuitry is configured to perform each lane of processing in dependence on the lane position information identified for that lane. This enables generation of results which are wider or narrower than the vector size supported in hardware.
    Type: Grant
    Filed: November 12, 2015
    Date of Patent: August 15, 2017
    Assignee: ARM Limited
    Inventors: David Raymond Lutz, Neil Burgess, Christopher Neal Hinds
  • Patent number: 9678714
    Abstract: Method and computer system for implementing an operation on ?1 floating point input, in accordance with a rounding mode, e.g. using a Newton-Raphson technique. The floating point result comprises a p-bit mantissa. An unrounded proposed mantissa result is determined using the Newton-Raphson technique, wherein a p-bit rounded proposed mantissa result, t, corresponds to a rounding of the unrounded proposed mantissa result in accordance with the rounding mode, with k leading zeroes. If an increment to the (m?k)th bit of the unrounded result would affect the p-bit rounded result then the input(s) and bits of the unrounded result are used to determine a check parameter which is indicative of a relationship between an exact result and the unrounded result if the (m?k)th bit were incremented. The p-bit mantissa of the floating point result, is determined in dependence upon the check parameter, to be either t or t+1.
    Type: Grant
    Filed: July 11, 2014
    Date of Patent: June 13, 2017
    Assignee: Imagination Technologies Limited
    Inventors: Manouk Manoukian, Leonard Rarick
  • Patent number: 9552190
    Abstract: In accordance with some embodiments, a floating point number datapath circuitry, e.g., within an integrated circuit programmable logic device is provided. The datapath circuitry may be used for computing a rounded absolute value of a mantissa of a floating point number. The floating point datapath circuitry may have only a single adder stage for computing a rounded absolute value of a mantissa of the floating point number based on one or more bits of an unrounded mantissa of the floating point number. The unrounded and rounded mantissas may include a sign bit, a sticky bit, a round bit, and/or a least significant bit, and/or other bits. The unrounded mantissa may be in a format that includes negative numbers (e.g., 2's complement) and the rounded mantissa may be in a format that may include a portion of the floating point number represented as a positive number, (e.g., signed magnitude).
    Type: Grant
    Filed: April 20, 2016
    Date of Patent: January 24, 2017
    Assignee: ALTERA CORPORATION
    Inventors: Martin Langhammer, Bogdan Pasca
  • Patent number: 9411583
    Abstract: An apparatus is described having a semiconductor chip that has an instruction execution pipeline. The instruction execution pipeline has an execution unit with logic circuitry to perform the following for an instruction: accept input vector elements representing real and imaginary parts of a plurality of complex numbers; and, present the complex conjugates of the complex numbers.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: August 9, 2016
    Assignee: Intel Corporation
    Inventors: Suleyman Sair, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 9405728
    Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.
    Type: Grant
    Filed: September 5, 2013
    Date of Patent: August 2, 2016
    Assignee: Altera Corporation
    Inventor: Tomasz Czajkowski
  • Patent number: 9354875
    Abstract: An enhanced loop streaming detection mechanism is provided in a processor to reduce power consumption. The processor includes a decoder to decode instructions in a loop into micro-operations, and a loop streaming detector to detect the presence of the loop in the micro-operations. The processor also includes a loop characteristic tracker unit to identify hardware components downstream from the decoder that are not to be used by the micro-operations in the loop, and to disable the identified hardware components. The processor also includes execution circuitry to execute the micro-operations in the loop with the identified hardware components disabled.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: May 31, 2016
    Assignee: Intel Corporation
    Inventors: Matthew C. Merten, Justin M. Deinlein, Yury N. Ilin, Alexandre J. Farcy, Tong Li, Srikanth T. Srinivasan
  • Patent number: 9348557
    Abstract: In accordance with some embodiments, a floating point number datapath circuitry, e.g., within an integrated circuit programmable logic device is provided. The datapath circuitry may be used for computing a rounded absolute value of a mantissa of a floating point number. The floating point datapath circuitry may have only a single adder stage for computing a rounded absolute value of a mantissa of the floating point number based on one or more bits of an unrounded mantissa of the floating point number. The unrounded and rounded mantissas may include a sign bit, a sticky bit, a round bit, and/or a least significant bit, and/or other bits. The unrounded mantissa may be in a format that includes negative numbers (e.g., 2's complement) and the rounded mantissa may be in a format that may include a portion of the floating point number represented as a positive number, (e.g., signed magnitude).
    Type: Grant
    Filed: February 21, 2014
    Date of Patent: May 24, 2016
    Assignee: ALTERA CORPORATION
    Inventors: Martin Langhammer, Bogdan Pasca
  • Patent number: 9298421
    Abstract: The disclosed embodiments disclose techniques for performing quotient selection in an iterative carry-save division operation that divides a dividend, R, by a divisor, D, to produce an approximation of a quotient, Q=R/D. During a divide operation, a divider approximates Q by iteratively selecting an operation to perform for each iteration of the carry-save division operation and then performing the selected operation. The operation for each iteration is selected based on the current partial sum bits of a partial remainder in carry-save form (rs) and the current partial carry bits of a partial remainder in carry-save form (rc). More specifically, the operation is selected from a set of operations that includes: (1) a 2X* operation; (2) an S1 & 2X* operation; (3) an S2 & 2X* operation; (4) an A1 & 2X* operation; and (5) an A2 & 2X* operation.
    Type: Grant
    Filed: September 17, 2013
    Date of Patent: March 29, 2016
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Josephus C. Ebergen, Navaneeth P. Jamadagni, Ivan E. Sutherland
  • Patent number: 8930435
    Abstract: A method for computation, including defining a sequence of n bits that encodes an exponent d, such that no more than a specified number of successive bits in the sequence are the same, initializing first and second registers using a value of a base x that is to be exponentiated, whereby the first and second registers hold respective first and second values, which are successively updated during the computation, successively, for each bit in the sequence computing a product of the first and second values, depending on whether the bit is one or zero, selecting one of the first and second registers, and storing the product in the selected one of the registers, whereby the first and second registers hold respective first and second final values upon completion of the sequence, and returning xd based on the first and second final values. Related apparatus and methods are also described.
    Type: Grant
    Filed: September 21, 2010
    Date of Patent: January 6, 2015
    Assignee: Cisco Technology Inc.
    Inventors: Yaacov Belenky, Zeev Geyzel
  • Patent number: 8918445
    Abstract: An integrated multiplier circuit that operates on a variety of data formats including integer fixed point, signed or unsigned, real or complex, 8 bit, 16 bit or 32 bit as well as floating point data that may be single precision real, single precision complex or double precision. The circuit uses a single set of multiplier arrays to perform 16×16, 32×32 and 64×64 multiplies, 32×32 and 64×64 complex multiplies, 32×32 and 64×64 complex multiplies with one operand conjugated.
    Type: Grant
    Filed: September 21, 2011
    Date of Patent: December 23, 2014
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Mujibur Rahman
  • Patent number: 8903881
    Abstract: An arithmetic circuit for quantizing pre-quantized data includes a first input register to store first-format pre-quantized data that includes a mantissa and an exponent, a second input register to store a quantization target exponent, an exponent-correction-value indicating unit to indicate an exponent correction value, an exponent generating unit to generate a quantized exponent obtained by subtracting the exponent correction value from the quantization target exponent, a shift amount generating unit to generate a shift amount obtained by subtracting the exponent of the pre-quantized data and the exponent correction value from the quantization target exponent, a shift unit to generate a quantized mantissa obtained by shifting the mantissa of the pre-quantized data by the shift amount generated by the shift amount generating unit, and an output register to store quantized data that includes the quantized exponent generated by the exponent generating unit and the quantized mantissa generated by the shift unit
    Type: Grant
    Filed: April 3, 2012
    Date of Patent: December 2, 2014
    Assignee: Fujitsu Limited
    Inventors: Ryuji Kan, Hideyuki Unno, Kenichi Kitamura
  • Patent number: 8886695
    Abstract: A programmable integrated circuit device is programmed to normalize multiplication operations by examining the input or output values to determined the likelihood of overflow or underflow and then to adjust the input or output values accordingly. The examination of the inputs can include an examination of the number of adder stages feeding into the inputs, as well as a count of leading bits ahead of the first significant bit. Adjustment of an input can include shifting the mantissa by the leading bit count and adjusting the exponent accordingly, while adjustment of the output can include shifting the mantissa by the sum of the leading bit counts of the inputs and adjusting the exponent accordingly. Or the output can be examined to find its leading bit count and the output then can be adjusted by shifting the mantissa by the leading bit count and adjusting the exponent accordingly.
    Type: Grant
    Filed: July 10, 2012
    Date of Patent: November 11, 2014
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Patent number: 8832166
    Abstract: An optimized floating point multiplier rounding circuit that minimizes the increase of the critical timing path of the calculation. The values of the temporary mantissa required to make the rounding decision are calculated simultaneously by the circuit shown in the invention.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: September 9, 2014
    Assignee: Texas Instruments Incorporated
    Inventor: Timothy David Anderson
  • Patent number: 8775494
    Abstract: A computer-implemented method for executing a floating-point calculation where an exact value of an associated result cannot be expressed as a floating-point value is disclosed. The method involves: generating an estimate of the associated result and storing the estimate in memory; calculating an amount of error for the estimate; determining whether the amount of error is less than or equal to a threshold of error for the associated result; and if the amount of error is less than or equal to the threshold of error, then concluding that the estimate of the associated result is a correctly rounded result of the floating-point calculation; or if the amount of error is greater than the threshold of error, then testing whether the floating-point calculation constitutes an exception case.
    Type: Grant
    Filed: March 1, 2011
    Date of Patent: July 8, 2014
    Assignee: NVIDIA Corporation
    Inventor: Alexandru Fit-Florea
  • Publication number: 20140181169
    Abstract: A mechanism for performing single-path floating-point rounding in a floating point unit is disclosed. A system of the disclosure includes a memory and a processing device communicably coupled to the memory. In one embodiment, the processing device comprises a floating point unit (FPU) to generate a plurality of status flags for a rounded value of a finite nonzero number. The plurality of status flags are generated based on the finite nonzero number without calculating the rounded value of the finite nonzero number. The plurality of status flags comprises an overflow flag and an underflow flag. The FPU determines whether a rounded value should be calculated for the finite nonzero number based on the plurality of status flags and whether the overflow flag is asserted.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Inventors: WARREN E. FERGUSON, BRIAN J. HICKMANN, THOMAS D. FLETCHER
  • Patent number: 8751555
    Abstract: A method for performing a decimal floating-point division, including: receiving, by a decimal floating-point divider, a decimal floating-point dividend and a decimal floating-point divisor; obtaining, by the decimal floating-point divider, a preliminary quotient having a first precision level, where the preliminary quotient is calculated from the decimal floating-point dividend and the decimal-floating point divisor; receiving, by the decimal floating-point divider, a rounding mode; selecting a rounding action based on the preliminary quotient and the rounding mode; and obtaining a rounded quotient having a second precision level by rounding the preliminary quotient according to the rounding action, where the first precision level is at least one digit greater than the second precision level.
    Type: Grant
    Filed: July 6, 2011
    Date of Patent: June 10, 2014
    Assignee: SilMinds, LLC, Egypt
    Inventors: Amira Mohamed, Hossam Ali Hassan Fahmy, Ramy Raafat, Yasmeen Farouk, Mostafa Elkhouly, Rodina Samy, Tarek Eldeeb
  • Patent number: 8732226
    Abstract: Systems, methods, processors, media, and other embodiments associated with integer rounding a floating point number in one micro-operation (uop) are described. One system embodiment includes a memory to store an integer rounding floating point instruction and a processor to perform the integer rounding floating point instruction. The processor may include a floating point unit that includes circuits and/or logics that integer round the floating point number.
    Type: Grant
    Filed: June 6, 2006
    Date of Patent: May 20, 2014
    Assignee: Intel Corporation
    Inventors: Mohammad Abdallah, Chad D. Hancock, Kwok W. Lui
  • Publication number: 20140101216
    Abstract: A technique is provided for performing a mixed precision estimate. A processing circuit receives an input of a first precision having a wide precision value. The processing circuit computes an output in an output exponent range corresponding to a narrow precision value based on the input having the wide precision value.
    Type: Application
    Filed: December 11, 2013
    Publication date: April 10, 2014
    Applicant: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 8694572
    Abstract: A decimal floating-point Fused-Multiply-Add (FMA) unit that performs the operation of ±(A×B)±C on decimal floating-point operands. The decimal floating-point FMA unit executes the multiplication and addition operations compliant with the IEEE 754-2008 standard. Specifically, the decimal floating-point FMA includes a parallel multiplier and injects the addend after required alignment as an additional partial product in the reduction tree used in the parallel multiplier. The decimal floating-point FMA unit may be configured to perform addition-subtraction operations or multiplication operations as standalone operations.
    Type: Grant
    Filed: July 6, 2011
    Date of Patent: April 8, 2014
    Assignee: SilMinds, LLC, Egypt
    Inventors: Rodina Samy, Hossam Ali Hassan Fahmy, Tarek Eldeeb, Ramy Raafat, Yasmeen Farouk, Mostafa Elkhouly, Amira Mohamed
  • Patent number: 8671129
    Abstract: A processing unit, system, and method for performing a multiply operation in a multiply-add pipeline. To reduce the pipeline latency, the unrounded result of a multiply-add operation is bypassed to the inputs of the multiply-add pipeline for use in a subsequent operation. If it is determined that rounding is required for the prior operation, then the rounding will occur during the subsequent operation. During the subsequent operation, a Booth encoder not utilized by the multiply operation will output a rounding correction factor as a selection input to a Booth multiplexer not utilized by the multiply operation. When the Booth multiplexer receives the rounding correction factor, the Booth multiplexer will output a rounding correction value to a carry save adder (CSA) tree, and the CSA tree will generate the correct sum from the rounding correction value and the other partial products.
    Type: Grant
    Filed: March 8, 2011
    Date of Patent: March 11, 2014
    Assignee: Oracle International Corporation
    Inventors: Jeffrey S. Brooks, Christopher H. Olson
  • Publication number: 20130304785
    Abstract: A data processing apparatus includes processing circuitry for performing a convert-to-integer operation for converting a floating-point value to a rounded two's complement integer value. The convert-to-integer operation uses round-to-nearest, ties away from zero, rounding (RNA rounding). The operation is performed by generating an intermediate value based on the floating-point value, adding a rounding value to the intermediate value to generate a sum value, and outputting the integer-valued bits of the sum value as the rounded two's complement integer value. If the floating-point value is negative, then the intermediate value is generated by inverting the bits without adding a bit value of 1 to a least significant bit of the inverted value.
    Type: Application
    Filed: May 11, 2012
    Publication date: November 14, 2013
    Applicant: ARM LIMITED
    Inventors: David Raymond LUTZ, Neil BURGESS
  • Publication number: 20130304786
    Abstract: A device is provided for computing a function value of a function F. The device includes a memory, a truncator unit, a selector unit, and an evaluator unit. The memory contains a look-up table comprising a set of entries, each entry having associated with it a domain and an approximation function for approximating F on the associated domain. The truncator unit is arranged to truncate or round a first value X1 to generate a second value X2. The selector unit is arranged to select an entry of the lookup-table according to the second value X2, thus selecting the approximation function that is associated with the selected entry. The evaluator unit is arranged to determine the function value of the selected approximation function at the first value X1.
    Type: Application
    Filed: January 21, 2011
    Publication date: November 14, 2013
    Inventors: Ilia Moskovich, Roy Glasner, Dmitry Lachover