Round Off Or Truncation Patents (Class 708/497)
-
Patent number: 12020000
Abstract: Systems and methods include arithmetic circuitry that generates a floating-point mantissa and includes a propagation network that calculates the floating-point mantissa based on input bits. The systems and methods also include rounding circuitry that rounds the floating-point mantissa. The rounding circuitry includes a multiplexer at a rounding location for the floating-point mantissa that selectively inputs a first input bit of the input bits or a rounding bit. The rounding circuitry also includes an OR gate that ORs a second input bit of the input bits with the rounding bit. Moreover, the second input bit is a less significant bit than the first input bit.
Type: Grant
Filed: December 24, 2020
Date of Patent: June 25, 2024
Assignee: Intel Corporation
Inventors: Martin Langhammer, Alexander Heinecke
-
Patent number: 11768685
Abstract: An integrated circuit, comprising an instruction pipeline that includes instruction fetch phase circuitry, instruction decode phase circuitry, and instruction execution circuitry. The instruction execution circuitry includes transformation circuitry for receiving an interleaved dual vector operand as an input and for outputting a first natural order vector including a first set of data values from the interleaved dual vector operand and a second natural order vector including a second set of data values from the interleaved dual vector operand.
Type: Grant
Filed: May 5, 2022
Date of Patent: September 26, 2023
Assignee: Texas Instruments Incorporated
Inventors: Mujibur Rahman, Timothy David Anderson, Joseph Zbiciak
-
Patent number: 11449309
Abstract: A hardware module comprising circuitry configured to: store a sequence of n bits in a register of the hardware module; generate a signed integer comprising a magnitude component and a sign bit by: if the most significant bit of the sequence of n bits is equal to one: set each of the n-1 of the most significant bits of the magnitude component to be equal to the corresponding bit of the n-1 least significant bits of the sequence of n bits; and set the sign bit to be zero; if the most significant bit of the sequence of n bits is equal to zero: set each of the n-1 of the most significant bits of the magnitude component to be equal to the inverse of the corresponding bit of the n-1 least significant bits of the sequence of n bits; and set the sign bit to be one.
Type: Grant
Filed: June 21, 2019
Date of Patent: September 20, 2022
Assignee: GRAPHCORE LIMITED
Inventors: Stephen Felix, Mrudula Gore
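The two cases in this abstract can be sketched in software. The following is only an illustrative model, not the claimed circuitry: the function name `to_sign_magnitude` and its return shape are hypothetical, and it returns just the n-1 magnitude bits that the abstract describes (any further, less significant magnitude bits are not modelled).

```python
def to_sign_magnitude(bits, n):
    """Map an n-bit sequence to (sign_bit, top n-1 magnitude bits).

    Hypothetical helper following the two cases in the abstract:
    MSB == 1 -> sign 0, magnitude bits copy the n-1 low bits;
    MSB == 0 -> sign 1, magnitude bits are the inverse of the n-1 low bits.
    """
    if n < 2 or not 0 <= bits < (1 << n):
        raise ValueError("bits must be an n-bit value with n >= 2")
    low_mask = (1 << (n - 1)) - 1      # selects the n-1 least significant bits
    low = bits & low_mask
    msb = bits >> (n - 1)              # most significant bit of the sequence
    if msb == 1:
        return 0, low                  # positive: copy the low bits
    return 1, low ^ low_mask           # negative: invert the low bits


print(to_sign_magnitude(0b1011, 4))    # MSB is 1: sign 0, magnitude 0b011
print(to_sign_magnitude(0b0011, 4))    # MSB is 0: sign 1, magnitude 0b100
```

Under this mapping, the all-zeros and all-ones inputs produce the same magnitude with opposite signs, so the n-bit input range maps onto a range that is symmetric about zero.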
-
Patent number: 11449574
Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floating-point unit enabled to perform stochastic rounding, thus in some circumstances enabling reducing systematic bias in long dependency chains of floating-point computations. The long dependency chains of floating-point computations are performed, e.g., to train a neural network or to perform inference with respect to a trained neural network.
Type: Grant
Filed: April 13, 2018
Date of Patent: September 20, 2022
Assignee: Cerebras Systems Inc.
Inventors: Sean Lie, Michael Edwin James, Michael Morrison, Gary R. Lauterbach, Srikanth Arekapudi
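Stochastic rounding, as mentioned in this abstract, rounds up or down at random with a probability proportional to how close the value is to each neighbour, so the expected result equals the input. The sketch below is a generic software illustration on a uniform grid; the function name, the `step` parameter, and the use of Python's `random` module are assumptions and say nothing about how the patented floating-point unit is built.

```python
import math
import random


def stochastic_round(x, step=1.0, rng=random):
    """Round x to a multiple of `step`, up or down at random.

    The probability of rounding up equals the fractional distance from
    the lower grid point, so the result is unbiased on average.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower                       # in [0, 1)
    rounded = lower + (1 if rng.random() < frac else 0)
    return rounded * step


rng = random.Random(0)                          # seeded only for repeatability
samples = [stochastic_round(0.25, rng=rng) for _ in range(10000)]
print(sum(samples) / len(samples))              # close to 0.25 on average
```

Round-to-nearest would map every 0.25 to 0, which is the kind of systematic bias that accumulates over long chains of operations.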
-
Patent number: 11422803
Abstract: A processing-in-memory (PIM) device includes a data storage region and a multiplication/accumulation (MAC) operator. The data storage region is configured to store first data and second data. The MAC operator is configured to perform a MAC arithmetic operation of the first data and the second data. The MAC operator includes a MAC circuit configured to perform the MAC arithmetic operation to output MAC result data and a data output unit configured to feedback bias data to the MAC circuit prior to the MAC arithmetic operation.
Type: Grant
Filed: January 8, 2021
Date of Patent: August 23, 2022
Assignee: SK hynix Inc.
Inventor: Choung Ki Song
-
Patent number: 11392418
Abstract: A computer system may initialize one or more workloads. The computer system may operate in a boost mode and a regular mode. The boost mode may include an adjustment of a pacing setting and an adjustment of group availability targets for executing the one or more workloads. The computer system may identify that the boost mode is enabled during a system start of the computer system. The computer system may identify that the pacing setting is operating in the regular mode. The computer system may dynamically increase the pacing setting. The increase of the pacing setting may enable an increased processor utilization of the computer system by the one or more workloads. The increased processor utilization may generate a concurrent processing of the one or more workloads. The computer system may determine an end of the boost mode and reset the pacing setting.
Type: Grant
Filed: February 21, 2020
Date of Patent: July 19, 2022
Assignee: International Business Machines Corporation
Inventors: Juergen Holtz, Qais Noorshams
-
Patent number: 11372644
Abstract: A processor system comprises a shared memory and a processing element. The processing element includes a matrix processor unit and is in communication with the shared memory. The processing element is configured to receive a processor instruction specifying a data matrix and a matrix manipulation operation. A manipulation matrix based on the processor instruction is identified. The data matrix and the manipulation matrix are loaded into the matrix processor unit and a matrix operation is performed to determine a result matrix. The result matrix is outputted to a destination location.
Type: Grant
Filed: December 9, 2019
Date of Patent: June 28, 2022
Assignee: Meta Platforms, Inc.
Inventors: Thomas Mark Ulrich, Krishnakumar Narayanan Nair, Yuchen Hao
-
Patent number: 11334796
Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
Type: Grant
Filed: August 3, 2020
Date of Patent: May 17, 2022
Assignee: Intel Corporation
Inventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
-
Patent number: 10942890
Abstract: Systems, apparatuses, and methods related to bit string accumulation in memory array periphery are described. Control circuitry (e.g., a processing device) may be utilized to control performance of operations using bit strings within a memory device. Results of the operations may be accumulated in circuitry peripheral to a memory array of the memory device. For instance, a method for bit string accumulation in memory array periphery can include retrieving a bit string stored in a data structure of a memory array. The bit string can represent a result of performance of an arithmetic operation or a logical operation. The method can further include storing the bit string in a plurality of sense amplifiers located in a periphery of the memory array and using the bit string as an operand in performance of at least a portion of a recursive operation.
Type: Grant
Filed: June 4, 2019
Date of Patent: March 9, 2021
Assignee: Micron Technology, Inc.
Inventor: Vijay S. Ramesh
-
Patent number: 10942889
Abstract: Bit string accumulation in a memory array periphery is described. Control circuitry (e.g., a processing device) may be utilized to control performance of operations using bit strings within a memory device. Results of the operations may be accumulated in circuitry peripheral to a memory array of the memory device. For instance, a plurality of sense amplifiers may be coupled to a memory array and a processing device. A quantity of sense amplifiers among the plurality of sense amplifiers can be the same as a quantity of rows or columns of the array. The processing device may be configured to cause performance of a recursive operation using one or more bit strings that are formatted according to a Type III universal number format or a posit format. The processing device may further be configured to cause resultant bit strings representing results of iterations of the recursive operation to be accumulated in the plurality of sense amplifiers.
Type: Grant
Filed: June 4, 2019
Date of Patent: March 9, 2021
Assignee: Micron Technology, Inc.
Inventor: Vijay S. Ramesh
-
Patent number: 10936569
Abstract: In a system for storing in memory a tensor that includes at least three modes, elements of the tensor are stored in a mode-based order for improving locality of references when the elements are accessed during an operation on the tensor. To facilitate efficient data reuse in a tensor transform that includes several iterations, on a tensor that includes at least three modes, a system performs a first iteration that includes a first operation on the tensor to obtain a first intermediate result, and the first intermediate result includes a first intermediate-tensor. The first intermediate result is stored in memory, and a second iteration is performed in which a second operation on the first intermediate result accessed from the memory is performed, so as to avoid a third operation, that would be required if the first intermediate result were not accessed from the memory.
Type: Grant
Filed: May 20, 2013
Date of Patent: March 2, 2021
Assignee: Reservoir Labs, Inc.
Inventors: Muthu Manikandan Baskaran, Richard A. Lethin, Benoit J. Meister, Nicolas T. Vasilache
-
Patent number: 10936769
Abstract: Systems and methods evaluate simulation models and measure floating point arithmetic errors in terms of Unit in Last Place (ULP). The simulation model may include model elements that perform numerical computations using Native Floating Point (NFP) arithmetic. The model elements may be arranged to implement a procedure. A data store may include local ULP errors predetermined for the model elements. The systems and methods may retrieve the local ULP errors for the model elements included in the model, and may apply a rules-based analysis to compute an overall ULP error of the simulation model. The systems and methods may present the overall ULP computed for the model. The systems and methods may also present intermediate ULP errors determined for portions of the simulation model. Changes may be made to the model to reduce the overall ULP error.
Type: Grant
Filed: May 10, 2019
Date of Patent: March 2, 2021
Assignee: The MathWorks, Inc.
Inventors: Kiran K. Kintali, Shomit Dutta, E. Mehran Mestchian, Pieter J. Mosterman
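To make "error in terms of ULP" concrete, the sketch below measures how far a computed double lies from an exact rational reference, in units of the ULP at the reference value. It illustrates only the measurement; the rules-based combination of per-element errors described in the abstract is not modelled. The function name `ulp_error` is hypothetical, and `math.ulp` requires Python 3.9 or later.

```python
import math
from fractions import Fraction


def ulp_error(computed, exact):
    """Distance between `computed` (float) and `exact` (Fraction), in ULPs.

    The ULP is taken at the double nearest to the exact value; this choice
    of reference point is an assumption made for this sketch.
    """
    ulp = Fraction(math.ulp(float(exact)))
    return float(abs(Fraction(computed) - exact) / ulp)


exact_sum = Fraction(1, 10) + Fraction(2, 10)   # exactly 3/10
print(ulp_error(0.1 + 0.2, exact_sum))          # roughly 0.8 ULP
print(ulp_error(0.5, Fraction(1, 2)))           # 0.0, exactly representable
```

Converting the float to a `Fraction` keeps the subtraction exact, so the measurement itself introduces no additional rounding until the final conversion for display.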
-
Patent number: 10782932
Abstract: A round-for-reround mode (preferably in a BID encoded Decimal format) of a floating point instruction prepares a result for later rounding to a variable number of digits by detecting that the least significant digit may be a 0, and if so changing it to 1 when the trailing digits are not all 0. A subsequent reround instruction is then able to round the result to any number of digits at least 2 fewer than the number of digits of the result. An optional embodiment saves a tag indicating the fact that the low order digit of the result is 0 or 5 if the trailing bits are non-zero in a tag field rather than modify the result. Another optional embodiment also saves a half-way-and-above indicator when the trailing digits represent a decimal with a most significant digit having a value of 5. An optional subsequent reround instruction is able to round the result to any number of digits fewer or equal to the number of digits of the result using the saved tags.
Type: Grant
Filed: August 25, 2019
Date of Patent: September 22, 2020
Assignee: International Business Machines Corporation
Inventors: Michael F. Cowlishaw, Eric M. Schwarz, Ronald M. Smith, Sr., Phil C. Yeh
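The problem that a round-for-reround mode addresses is double rounding: rounding to an intermediate precision and then to a shorter one can give a different answer than rounding once. Python's `decimal` module has a `ROUND_05UP` mode that works in a similar spirit (truncate, then bump the last digit if it would be 0 or 5 and nonzero digits were discarded). The sketch below uses it purely as an illustration; whether it matches the instruction described in the abstract in every detail is not established here, and the helper name `round_to_digits` is hypothetical.

```python
from decimal import Decimal, Context, ROUND_05UP, ROUND_HALF_EVEN


def round_to_digits(value, digits, mode):
    """Round `value` to `digits` significant digits using rounding `mode`."""
    return Context(prec=digits, rounding=mode).plus(value)


x = Decimal("1.2500001")

# Reference: a single rounding to 2 digits.
direct = round_to_digits(x, 2, ROUND_HALF_EVEN)

# Naive double rounding: 4 digits, then 2 digits. The first step hides
# the trailing 1, so the second step sees an exact tie and goes to even.
naive = round_to_digits(round_to_digits(x, 4, ROUND_HALF_EVEN), 2, ROUND_HALF_EVEN)

# Preparing with ROUND_05UP keeps a trace of the discarded digits.
prepared = round_to_digits(x, 4, ROUND_05UP)
reround = round_to_digits(prepared, 2, ROUND_HALF_EVEN)

print(direct, naive, prepared, reround)
```

Here the prepared 4-digit value ends in 1 rather than 0, so the later rounding to 2 digits (two fewer than the intermediate result) agrees with the single direct rounding.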
-
Patent number: 10606557
Abstract: A data processing apparatus is provided. Intermediate value generation circuitry generates an intermediate value from a first floating point number and a second floating point number. The intermediate value includes a number of leading 0s indicative of a prediction of a number of leading 0s in a difference between absolute values of the first floating point number and the second floating point number. The prediction differs by at most one from the number of leading 0s in the difference between absolute values of the first floating point number and the second floating point number. Count circuitry counts the number of leading 0s in said intermediate value and mask generation circuitry produces one or more masks using the intermediate value. The mask generation circuitry produces the one or more masks at the same time or before the count circuitry counts the number of leading 0s in the intermediate value.
Type: Grant
Filed: December 6, 2016
Date of Patent: March 31, 2020
Assignee: ARM Limited
Inventor: David Raymond Lutz
-
Patent number: 10592672
Abstract: The disclosed embodiments provide a system that facilitates testing of an insecure computing environment. During operation, the system obtains a real data set comprising a set of data strings. Next, the system determines a set of frequency distributions associated with the set of data strings. The system then generates a test data set from the real data set, wherein the test data set comprises a set of random data strings that conforms to the set of frequency distributions. Finally, the system tests the insecure computing environment using the test data set.
Type: Grant
Filed: December 21, 2016
Date of Patent: March 17, 2020
Assignee: INTUIT INC.
Inventor: Colin R. Dillard
-
Patent number: 10592252
Abstract: Efficient instruction processing for sparse data includes extensions to a processor pipeline to identify zero-optimizable instructions that include at least one zero input operand, and bypass the execute stage of the processor pipeline, determining the result of the operation without executing the instruction. When possible, the extensions also bypass the writeback stage of the processor pipeline.
Type: Grant
Filed: December 31, 2015
Date of Patent: March 17, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
-
Patent number: 10416962
Abstract: Logic is provided for performing decimal and binary floating point arithmetic calculations on first and second operands. The method includes: receiving the first and second operands in packed format; unpacking the first and second operands; swapping the first operand to a fourth operand and the second operand to a third operand, if an exponent of the first operand is less than an exponent of the second operand, otherwise storing the first operand to the third operand and the second operand to the fourth operand; aligning the third operand and the fourth operands based on the exponent difference of the third and fourth operand and a number of leading zeroes of the third operand; performing an add/subtract operation on the aligned third and fourth operands with normalizing and rounding between the operands; and packing the result obtained from the add/subtract.
Type: Grant
Filed: October 2, 2015
Date of Patent: September 17, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Steven R. Carlough, Juergen Haess, Michael Klein, Klaus M. Kroener, Petra Leber, Silvia M. Mueller, Kerstin Schelm
-
Patent number: 10318240
Abstract: Setting or updating of floating point controls is managed. Floating point controls include controls used for floating point operations, such as rounding mode and/or other controls. Further, floating point controls include status associated with floating point operations, such as floating point exceptions and/or others. The management of the floating point controls includes efficiently updating the controls, while reducing costs associated therewith.
Type: Grant
Filed: November 14, 2017
Date of Patent: June 11, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 10310814
Abstract: Setting or updating of floating point controls is managed. Floating point controls include controls used for floating point operations, such as rounding mode and/or other controls. Further, floating point controls include status associated with floating point operations, such as floating point exceptions and/or others. The management of the floating point controls includes efficiently updating the controls, while reducing costs associated therewith.
Type: Grant
Filed: June 23, 2017
Date of Patent: June 4, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 10209986
Abstract: A method of an aspect includes receiving a floating point rounding instruction. The floating point rounding instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point that each of the one or more floating point data elements are to be rounded to, and indicates a destination storage location. A result is stored in the destination storage location in response to the floating point rounding instruction. The result includes one or more rounded result floating point data elements. Each of the one or more rounded result floating point data elements includes one of the floating point data elements of the source, in a corresponding position, which has been rounded to the indicated number of fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
Type: Grant
Filed: December 22, 2011
Date of Patent: February 19, 2019
Assignee: Intel Corporation
Inventors: Jesus Corbal San Adrian, Cristina S. Anderson, Robert Valentine, Bret Toll, Amit Gradstein, Simon Rubanovich, Benny Eitan
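Rounding a floating-point value to a given number of fraction bits after the radix point can be modelled as scale, round to integer, unscale. The sketch below is a scalar software analogue only: the function name is hypothetical, and it uses Python's built-in ties-to-even `round`, which is an assumption, since the abstract does not say which rounding modes the instruction supports.

```python
import math


def round_to_fraction_bits(x, fraction_bits):
    """Round x so that at most `fraction_bits` binary fraction bits remain."""
    if not math.isfinite(x):
        return x                                   # pass NaN and infinities through
    scaled = math.ldexp(x, fraction_bits)          # x * 2**fraction_bits
    return math.ldexp(round(scaled), -fraction_bits)


print(round_to_fraction_bits(3.14159, 2))   # multiples of 1/4  -> 3.25
print(round_to_fraction_bits(0.3, 4))       # multiples of 1/16 -> 0.3125
print(round_to_fraction_bits(2.5, 0))       # integers, tie to even -> 2.0
```

Applying the function element by element to a list would mimic the per-element behaviour the abstract describes for a vector source.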
-
Patent number: 10162637
Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
Type: Grant
Filed: March 5, 2018
Date of Patent: December 25, 2018
Assignee: Intel Corporation
Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
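The per-field leading zero count described here is easy to model in software. The sketch below treats a Python list as the source register and returns the counts as the destination; the function name and the default 32-bit field width are assumptions for illustration, and the permute-control generation mentioned in the abstract is not modelled.

```python
def leading_zero_counts(fields, width=32):
    """Count the most significant contiguous zero bits of each data field."""
    counts = []
    for value in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("each field must fit in `width` bits")
        # bit_length() is the position of the highest set bit (0 for value 0).
        counts.append(width - value.bit_length())
    return counts


print(leading_zero_counts([0, 1, 0x80000000, 0x00FF0000]))
```

A field of zero yields the full width, and a field with its top bit set yields zero.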
-
Patent number: 10162638
Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
Type: Grant
Filed: March 5, 2018
Date of Patent: December 25, 2018
Assignee: Intel Corporation
Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
-
Patent number: 10162639
Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
Type: Grant
Filed: March 5, 2018
Date of Patent: December 25, 2018
Assignee: Intel Corporation
Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
-
Patent number: 10146503
Abstract: Embodiments disclosed pertain to apparatuses, systems, and methods for floating point operations. Disclosed embodiments pertain to a circuit that is capable of processing both a normal and denormal inputs and outputting normal and denormal results, and where a rounding module is used advantageously to reduce operational latency of the circuit.
Type: Grant
Filed: October 13, 2016
Date of Patent: December 4, 2018
Assignee: Imagination Technologies Limited
Inventor: Leonard Rarick
-
Patent number: 10140092
Abstract: According to one general aspect, an apparatus may include a floating-point multiply-accumulate unit configured to generate a floating point result by either adding or subtracting three floating point operands: an addend, a product carry, and a product sum. The floating-point multiply-accumulate unit may include a close path adder. The close path adder may include an unincremented mantissa addition circuit configured to compute an unincremented mantissa result based upon the three floating point operands. The close path adder may also include an incremented mantissa addition circuit configured to, at least partially in parallel with the mantissa addition circuit, produce an incremented mantissa result. The close path adder may further include a selection circuit configured to produce a close path result by selecting between the unincremented mantissa result and the incremented mantissa result.
Type: Grant
Filed: February 10, 2017
Date of Patent: November 27, 2018
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Ashraf Ahmed
-
Patent number: 9928031
Abstract: Processing circuitry is provided to perform an overlap propagating operation on a first data value to generate a second data value, the first and second data values having a redundant representation representing a P-bit numeric value using an M-bit data value comprising a plurality of N-bit portions, where M>P>N. In the redundant representation, each N-bit portion other than a most significant N-bit portion includes a plurality of overlap bits having a same significance as a plurality of least significant bits of a following N-bit portion. Each N-bit portion of the second data value other than a least significant N-bit portion is generated by adding non-overlap bits of a corresponding N-bit portion of the first data value to the overlap bits of a preceding N-bit portion of the first data value. This provides a faster technique for reducing the chance of overflow during addition of the redundantly represented M-bit value.
Type: Grant
Filed: November 12, 2015
Date of Patent: March 27, 2018
Assignee: ARM LIMITED
Inventors: Neil Burgess, David Raymond Lutz, Christopher Neal Hinds
-
Patent number: 9817661
Abstract: A data processing system supports execution of program instructions having a rounding position input operand so as to generate control signals for controlling processing circuitry to process a floating point input operand with a significand value to generate an output result which depends upon a value from rounding the floating point input operand using a variable rounding point within the significand of the floating point input operand as specified by the rounding position input operand. In this way, processing operations having as inputs floating point operands and anchored number operands may be facilitated.
Type: Grant
Filed: October 7, 2015
Date of Patent: November 14, 2017
Assignee: ARM Limited
Inventors: David Raymond Lutz, Christopher Neal Hinds, Neil Burgess
-
Patent number: 9798519
Abstract: A microprocessor comprises an instruction pipeline, a shared memory, and first and second arithmetic processing units in the instruction pipeline, each capable of reading or receiving operands from and writing or providing results to the shared memory. The first arithmetic processing unit performs a first portion of a mathematical operation to produce an intermediate result vector that is not a complete, final result of the mathematical operation. The first arithmetic processing unit generates a plurality of non-architectural calculation control indicators that indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The second arithmetic processing unit performs a second portion of the mathematical operation, in accordance with the calculation control indicators, to produce a complete, final result of the mathematical operation.
Type: Grant
Filed: June 24, 2015
Date of Patent: October 24, 2017
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
Inventor: Thomas Elmer
-
Patent number: 9778909
Abstract: Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier.
Type: Grant
Filed: October 24, 2016
Date of Patent: October 3, 2017
Assignee: Intel Corporation
Inventors: Sridhar Samudrala, Grigorios Magklis, Marc Lupon, David R. Ditzel
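The distinction between a double rounded multiply-add (round the product, then round the sum) and a conventional fused multiply-add (round once at the end) is visible in plain arithmetic. The sketch below only demonstrates that numerical difference; it does not model the hardware in the abstract. The function names are hypothetical, and the single-rounding version is emulated with exact rational arithmetic from the standard library's `fractions` module.

```python
from fractions import Fraction


def double_rounded_multiply_add(a, b, c):
    """a * b rounded to double, then + c rounded again (two roundings)."""
    product = a * b          # first rounding
    return product + c       # second rounding


def single_rounded_multiply_add(a, b, c):
    """Exact a * b + c, rounded to double only once (fused behaviour)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)      # the only rounding


a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0
print(double_rounded_multiply_add(a, b, c))   # 0.0: the product rounded to 1.0
print(single_rounded_multiply_add(a, b, c))   # -2**-54: the exact answer survives
```

The exact product is 1 - 2**-54, which lies halfway between two doubles and rounds to 1.0, so the double rounded form loses the entire result. Code that was written for separate multiply and add instructions expects exactly that behaviour, which is why a combined operation that preserves both roundings can be useful.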
-
Patent number: 9733899
Abstract: Processing circuitry performs a plurality of lanes of processing on respective data elements of at least one operand vector to generate corresponding result data elements of a result vector. The processing circuitry identifies lane position information for each lane of processing, the lane position information for a given lane identifying a relative position of the corresponding result data element to be generated by the given lane within a corresponding result data value spanning one or more result data elements of the result vector. The processing circuitry is configured to perform each lane of processing in dependence on the lane position information identified for that lane. This enables generation of results which are wider or narrower than the vector size supported in hardware.
Type: Grant
Filed: November 12, 2015
Date of Patent: August 15, 2017
Assignee: ARM Limited
Inventors: David Raymond Lutz, Neil Burgess, Christopher Neal Hinds
-
Patent number: 9678714
Abstract: Method and computer system for implementing an operation on ≥1 floating point input, in accordance with a rounding mode, e.g. using a Newton-Raphson technique. The floating point result comprises a p-bit mantissa. An unrounded proposed mantissa result is determined using the Newton-Raphson technique, wherein a p-bit rounded proposed mantissa result, t, corresponds to a rounding of the unrounded proposed mantissa result in accordance with the rounding mode, with k leading zeroes. If an increment to the (m-k)th bit of the unrounded result would affect the p-bit rounded result then the input(s) and bits of the unrounded result are used to determine a check parameter which is indicative of a relationship between an exact result and the unrounded result if the (m-k)th bit were incremented. The p-bit mantissa of the floating point result, is determined in dependence upon the check parameter, to be either t or t+1.
Type: Grant
Filed: July 11, 2014
Date of Patent: June 13, 2017
Assignee: Imagination Technologies Limited
Inventors: Manouk Manoukian, Leonard Rarick
-
Patent number: 9552190
Abstract: In accordance with some embodiments, a floating point number datapath circuitry, e.g., within an integrated circuit programmable logic device is provided. The datapath circuitry may be used for computing a rounded absolute value of a mantissa of a floating point number. The floating point datapath circuitry may have only a single adder stage for computing a rounded absolute value of a mantissa of the floating point number based on one or more bits of an unrounded mantissa of the floating point number. The unrounded and rounded mantissas may include a sign bit, a sticky bit, a round bit, and/or a least significant bit, and/or other bits. The unrounded mantissa may be in a format that includes negative numbers (e.g., 2's complement) and the rounded mantissa may be in a format that may include a portion of the floating point number represented as a positive number, (e.g., signed magnitude).
Type: Grant
Filed: April 20, 2016
Date of Patent: January 24, 2017
Assignee: ALTERA CORPORATION
Inventors: Martin Langhammer, Bogdan Pasca
-
Patent number: 9411583
Abstract: An apparatus is described having a semiconductor chip that has an instruction execution pipeline. The instruction execution pipeline has an execution unit with logic circuitry to perform the following for an instruction: accept input vector elements representing real and imaginary parts of a plurality of complex numbers; and, present the complex conjugates of the complex numbers.
Type: Grant
Filed: December 22, 2011
Date of Patent: August 9, 2016
Assignee: Intel Corporation
Inventors: Suleyman Sair, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 9405728
Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.
Type: Grant
Filed: September 5, 2013
Date of Patent: August 2, 2016
Assignee: Altera Corporation
Inventor: Tomasz Czajkowski
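The pre-processing steps in this abstract (extend the mantissas, find the biggest exponent, shift the other mantissas right by the exponent difference, then add) can be sketched with integer arithmetic. This is a loose software model under stated assumptions: the function name `aligned_sum` is hypothetical, the extension width of 53 bits plus a few bits that grow with the operand count is chosen here for illustration, and bits shifted out are simply dropped, which need not match the circuit's rounding.

```python
import math

MANT_BITS = 53   # significand width of an IEEE 754 double


def aligned_sum(values):
    """Add several doubles by aligning extended mantissas to the biggest exponent."""
    if not values:
        return 0.0
    # Extension: at least double the mantissa width, plus bits that depend
    # on how many numbers are added (an assumption made for this sketch).
    extra_bits = MANT_BITS + len(values).bit_length()
    parts = []
    for v in values:
        m, e = math.frexp(v)                       # v == m * 2**e, 0.5 <= |m| < 1
        parts.append((int(math.ldexp(m, MANT_BITS)), e))
    e_max = max(e for _, e in parts)               # biggest exponent
    total = 0
    for mantissa, e in parts:
        extended = mantissa << extra_bits          # widen the mantissa
        total += extended >> (e_max - e)           # right shift by exponent difference
    return math.ldexp(total, e_max - MANT_BITS - extra_bits)


print(aligned_sum([1e16, 1.0, -1e16]))   # 1.0
print(1e16 + 1.0 - 1e16)                 # 0.0 with pairwise rounding
```

Because the mantissas are widened before alignment, the small operand is not lost when it is shifted against the large ones, whereas rounding after each pairwise addition discards it.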
-
Patent number: 9354875Abstract: An enhanced loop streaming detection mechanism is provided in a processor to reduce power consumption. The processor includes a decoder to decode instructions in a loop into micro-operations, and a loop streaming detector to detect the presence of the loop in the micro-operations. The processor also includes a loop characteristic tracker unit to identify hardware components downstream from the decoder that are not to be used by the micro-operations in the loop, and to disable the identified hardware components. The processor also includes execution circuitry to execute the micro-operations in the loop with the identified hardware components disabled.Type: GrantFiled: December 27, 2012Date of Patent: May 31, 2016Assignee: Intel CorporationInventors: Matthew C. Merten, Justin M. Deinlein, Yury N. Ilin, Alexandre J. Farcy, Tong Li, Srikanth T. Srinivasan
-
Patent number: 9348557Abstract: In accordance with some embodiments, a floating point number datapath circuitry, e.g., within an integrated circuit programmable logic device is provided. The datapath circuitry may be used for computing a rounded absolute value of a mantissa of a floating point number. The floating point datapath circuitry may have only a single adder stage for computing a rounded absolute value of a mantissa of the floating point number based on one or more bits of an unrounded mantissa of the floating point number. The unrounded and rounded mantissas may include a sign bit, a sticky bit, a round bit, and/or a least significant bit, and/or other bits. The unrounded mantissa may be in a format that includes negative numbers (e.g., 2's complement) and the rounded mantissa may be in a format that may include a portion of the floating point number represented as a positive number, (e.g., signed magnitude).Type: GrantFiled: February 21, 2014Date of Patent: May 24, 2016Assignee: ALTERA CORPORATIONInventors: Martin Langhammer, Bogdan Pasca
-
Patent number: 9298421Abstract: The disclosed embodiments disclose techniques for performing quotient selection in an iterative carry-save division operation that divides a dividend, R, by a divisor, D, to produce an approximation of a quotient, Q=R/D. During a divide operation, a divider approximates Q by iteratively selecting an operation to perform for each iteration of the carry-save division operation and then performing the selected operation. The operation for each iteration is selected based on the current partial sum bits of a partial remainder in carry-save form (rs) and the current partial carry bits of a partial remainder in carry-save form (rc). More specifically, the operation is selected from a set of operations that includes: (1) a 2X* operation; (2) an S1 & 2X* operation; (3) an S2 & 2X* operation; (4) an A1 & 2X* operation; and (5) an A2 & 2X* operation.Type: GrantFiled: September 17, 2013Date of Patent: March 29, 2016Assignee: ORACLE INTERNATIONAL CORPORATIONInventors: Josephus C. Ebergen, Navaneeth P. Jamadagni, Ivan E. Sutherland
-
Patent number: 8930435Abstract: A method for computation, including defining a sequence of n bits that encodes an exponent d, such that no more than a specified number of successive bits in the sequence are the same, initializing first and second registers using a value of a base x that is to be exponentiated, whereby the first and second registers hold respective first and second values, which are successively updated during the computation, successively, for each bit in the sequence computing a product of the first and second values, depending on whether the bit is one or zero, selecting one of the first and second registers, and storing the product in the selected one of the registers, whereby the first and second registers hold respective first and second final values upon completion of the sequence, and returning x^d based on the first and second final values. Related apparatus and methods are also described.Type: GrantFiled: September 21, 2010Date of Patent: January 6, 2015Assignee: Cisco Technology Inc.Inventors: Yaacov Belenky, Zeev Geyzel
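The register-update loop in this abstract can be sketched as follows. Several details are assumptions, because the abstract does not give them: both registers are initialized to x itself, a one bit selects the first register and a zero bit the second, and the optional `modulus` parameter is added for illustration. The abstract also does not say how d is encoded into the bit sequence or how the final values are combined, so the sketch only tracks which power of x each register holds.

```python
def two_register_power(x, bits, modulus=None):
    """Hypothetical sketch: multiply the two registers and store the product
    in the register chosen by each bit.

    Returns ((r0, e0), (r1, e1)) where r0 == x**e0 and r1 == x**e1
    (reduced mod `modulus` if one is given).
    """
    r0, r1 = x, x          # assumed initialization: both registers hold x
    e0, e1 = 1, 1          # exponents tracked only to show what was computed
    for bit in bits:
        product = r0 * r1
        if modulus is not None:
            product %= modulus
        if bit:            # assumed mapping: 1 selects the first register
            r0, e0 = product, e0 + e1
        else:              # 0 selects the second register
            r1, e1 = product, e0 + e1
    return (r0, e0), (r1, e1)


print(two_register_power(2, [1, 0]))
```

Every iteration performs exactly one multiplication regardless of the bit value, which is the kind of regularity such two-register schemes are usually chosen for.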
-
Patent number: 8918445Abstract: An integrated multiplier circuit that operates on a variety of data formats including integer fixed point, signed or unsigned, real or complex, 8 bit, 16 bit or 32 bit as well as floating point data that may be single precision real, single precision complex or double precision. The circuit uses a single set of multiplier arrays to perform 16×16, 32×32 and 64×64 multiplies, 32×32 and 64×64 complex multiplies, 32×32 and 64×64 complex multiplies with one operand conjugated.Type: GrantFiled: September 21, 2011Date of Patent: December 23, 2014Assignee: Texas Instruments IncorporatedInventors: Timothy David Anderson, Mujibur Rahman
-
Patent number: 8903881Abstract: An arithmetic circuit for quantizing pre-quantized data includes a first input register to store first-format pre-quantized data that includes a mantissa and an exponent, a second input register to store a quantization target exponent, an exponent-correction-value indicating unit to indicate an exponent correction value, an exponent generating unit to generate a quantized exponent obtained by subtracting the exponent correction value from the quantization target exponent, a shift amount generating unit to generate a shift amount obtained by subtracting the exponent of the pre-quantized data and the exponent correction value from the quantization target exponent, a shift unit to generate a quantized mantissa obtained by shifting the mantissa of the pre-quantized data by the shift amount generated by the shift amount generating unit, and an output register to store quantized data that includes the quantized exponent generated by the exponent generating unit and the quantized mantissa generated by the shift unit.Type: GrantFiled: April 3, 2012Date of Patent: December 2, 2014Assignee: Fujitsu LimitedInventors: Ryuji Kan, Hideyuki Unno, Kenichi Kitamura
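The two subtractions named in this abstract can be written out in a short Python sketch. The function name `quantize`, the default radix of 10, the restriction to non-negative mantissas, and the use of truncation when digits are shifted out are assumptions for illustration; the abstract does not specify them. A value is taken to be `mantissa * radix**exponent`.

```python
def quantize(mantissa, exponent, target_exponent, correction=0, radix=10):
    """Hypothetical sketch of the exponent and shift-amount arithmetic.

    Assumes a non-negative integer mantissa and truncation of shifted-out digits.
    """
    # Quantized exponent: target exponent minus the correction value.
    q_exponent = target_exponent - correction
    # Shift amount: target exponent minus the data exponent and the correction.
    shift = target_exponent - exponent - correction
    if shift >= 0:
        q_mantissa = mantissa // radix ** shift    # shift right, dropping digits
    else:
        q_mantissa = mantissa * radix ** (-shift)  # shift left, appending zeros
    return q_mantissa, q_exponent


print(quantize(12345, -3, -1))  # 12.345 quantized to one fractional digit
```

With these definitions the shift amount equals the quantized exponent minus the original exponent, so the quantized pair represents the same value up to the digits that were dropped.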
-
Patent number: 8886695Abstract: A programmable integrated circuit device is programmed to normalize multiplication operations by examining the input or output values to determine the likelihood of overflow or underflow and then to adjust the input or output values accordingly. The examination of the inputs can include an examination of the number of adder stages feeding into the inputs, as well as a count of leading bits ahead of the first significant bit. Adjustment of an input can include shifting the mantissa by the leading bit count and adjusting the exponent accordingly, while adjustment of the output can include shifting the mantissa by the sum of the leading bit counts of the inputs and adjusting the exponent accordingly. Or the output can be examined to find its leading bit count and the output then can be adjusted by shifting the mantissa by the leading bit count and adjusting the exponent accordingly.Type: GrantFiled: July 10, 2012Date of Patent: November 11, 2014Assignee: Altera CorporationInventor: Martin Langhammer
-
Patent number: 8832166Abstract: An optimized floating point multiplier rounding circuit that minimizes the increase of the critical timing path of the calculation. The values of the temporary mantissa required to make the rounding decision are calculated simultaneously by the circuit shown in the invention.Type: GrantFiled: September 28, 2011Date of Patent: September 9, 2014Assignee: Texas Instruments IncorporatedInventor: Timothy David Anderson
-
Patent number: 8775494Abstract: A computer-implemented method for executing a floating-point calculation where an exact value of an associated result cannot be expressed as a floating-point value is disclosed. The method involves: generating an estimate of the associated result and storing the estimate in memory; calculating an amount of error for the estimate; determining whether the amount of error is less than or equal to a threshold of error for the associated result; and if the amount of error is less than or equal to the threshold of error, then concluding that the estimate of the associated result is a correctly rounded result of the floating-point calculation; or if the amount of error is greater than the threshold of error, then testing whether the floating-point calculation constitutes an exception case.Type: GrantFiled: March 1, 2011Date of Patent: July 8, 2014Assignee: NVIDIA CorporationInventor: Alexandru Fit-Florea
-
Publication number: 20140181169Abstract: A mechanism for performing single-path floating-point rounding in a floating point unit is disclosed. A system of the disclosure includes a memory and a processing device communicably coupled to the memory. In one embodiment, the processing device comprises a floating point unit (FPU) to generate a plurality of status flags for a rounded value of a finite nonzero number. The plurality of status flags are generated based on the finite nonzero number without calculating the rounded value of the finite nonzero number. The plurality of status flags comprises an overflow flag and an underflow flag. The FPU determines whether a rounded value should be calculated for the finite nonzero number based on the plurality of status flags and whether the overflow flag is asserted.Type: ApplicationFiled: December 21, 2012Publication date: June 26, 2014Inventors: WARREN E. FERGUSON, BRIAN J. HICKMANN, THOMAS D. FLETCHER
-
Patent number: 8751555Abstract: A method for performing a decimal floating-point division, including: receiving, by a decimal floating-point divider, a decimal floating-point dividend and a decimal floating-point divisor; obtaining, by the decimal floating-point divider, a preliminary quotient having a first precision level, where the preliminary quotient is calculated from the decimal floating-point dividend and the decimal floating-point divisor; receiving, by the decimal floating-point divider, a rounding mode; selecting a rounding action based on the preliminary quotient and the rounding mode; and obtaining a rounded quotient having a second precision level by rounding the preliminary quotient according to the rounding action, where the first precision level is at least one digit greater than the second precision level.Type: GrantFiled: July 6, 2011Date of Patent: June 10, 2014Assignee: SilMinds, LLC, EgyptInventors: Amira Mohamed, Hossam Ali Hassan Fahmy, Tarek Eldeeb, Ramy Raafat, Yasmeen Farouk, Mostafa Elkhouly, Rodina Samy
-
Patent number: 8732226Abstract: Systems, methods, processors, media, and other embodiments associated with integer rounding a floating point number in one micro-operation (uop) are described. One system embodiment includes a memory to store an integer rounding floating point instruction and a processor to perform the integer rounding floating point instruction. The processor may include a floating point unit that includes circuits and/or logics that integer round the floating point number.Type: GrantFiled: June 6, 2006Date of Patent: May 20, 2014Assignee: Intel CorporationInventors: Mohammad Abdallah, Chad D. Hancock, Kwok W. Lui
-
Publication number: 20140101216Abstract: A technique is provided for performing a mixed precision estimate. A processing circuit receives an input of a first precision having a wide precision value. The processing circuit computes an output in an output exponent range corresponding to a narrow precision value based on the input having the wide precision value.Type: ApplicationFiled: December 11, 2013Publication date: April 10, 2014Applicant: International Business Machines CorporationInventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 8694572Abstract: A decimal floating-point Fused-Multiply-Add (FMA) unit that performs the operation of ±(A×B)±C on decimal floating-point operands. The decimal floating-point FMA unit executes the multiplication and addition operations compliant with the IEEE 754-2008 standard. Specifically, the decimal floating-point FMA includes a parallel multiplier and injects the addend after required alignment as an additional partial product in the reduction tree used in the parallel multiplier. The decimal floating-point FMA unit may be configured to perform addition-subtraction operations or multiplication operations as standalone operations.Type: GrantFiled: July 6, 2011Date of Patent: April 8, 2014Assignee: SilMinds, LLC, EgyptInventors: Rodina Samy, Hossam Ali Hassan Fahmy, Tarek Eldeeb, Ramy Raafat, Yasmeen Farouk, Mostafa Elkhouly, Amira Mohamed
-
Patent number: 8671129Abstract: A processing unit, system, and method for performing a multiply operation in a multiply-add pipeline. To reduce the pipeline latency, the unrounded result of a multiply-add operation is bypassed to the inputs of the multiply-add pipeline for use in a subsequent operation. If it is determined that rounding is required for the prior operation, then the rounding will occur during the subsequent operation. During the subsequent operation, a Booth encoder not utilized by the multiply operation will output a rounding correction factor as a selection input to a Booth multiplexer not utilized by the multiply operation. When the Booth multiplexer receives the rounding correction factor, the Booth multiplexer will output a rounding correction value to a carry save adder (CSA) tree, and the CSA tree will generate the correct sum from the rounding correction value and the other partial products.Type: GrantFiled: March 8, 2011Date of Patent: March 11, 2014Assignee: Oracle International CorporationInventors: Jeffrey S. Brooks, Christopher H. Olson
-
Publication number: 20130304785Abstract: A data processing apparatus includes processing circuitry for performing a convert-to-integer operation for converting a floating-point value to a rounded two's complement integer value. The convert-to-integer operation uses round-to-nearest, ties away from zero, rounding (RNA rounding). The operation is performed by generating an intermediate value based on the floating-point value, adding a rounding value to the intermediate value to generate a sum value, and outputting the integer-valued bits of the sum value as the rounded two's complement integer value. If the floating-point value is negative, then the intermediate value is generated by inverting the bits without adding a bit value of 1 to a least significant bit of the inverted value.Type: ApplicationFiled: May 11, 2012Publication date: November 14, 2013Applicant: ARM LIMITEDInventors: David Raymond LUTZ, Neil BURGESS
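The invert-then-add step in this abstract can be checked with a small Python sketch on fixed-point inputs. The function name `rna_to_int` and the sign, magnitude and `frac_bits` input form are assumptions for illustration; the sketch relies on Python integers behaving as unbounded two's complement values with an arithmetic right shift. The input value is taken to be `(-1)**sign * magnitude / 2**frac_bits`.

```python
def rna_to_int(sign, magnitude, frac_bits):
    """Round to nearest, ties away from zero, returning a signed integer.

    Hypothetical sketch; assumes frac_bits >= 1 and magnitude >= 0.
    """
    half = 1 << (frac_bits - 1)   # rounding value: one half in fixed point
    # For negative inputs, invert the bits WITHOUT adding 1 (no full negation).
    intermediate = ~magnitude if sign else magnitude
    # Add the rounding value and keep only the integer-valued bits.
    return (intermediate + half) >> frac_bits


print(rna_to_int(0, 5, 1))   # +2.5
print(rna_to_int(1, 5, 1))   # -2.5
```

Skipping the "+1" of a full two's complement negation is what makes ties on the negative side round away from zero: the inverted value is one unit in the last place below the true negation, so an exact tie lands just under the boundary and the floor performed by the shift moves it to the more negative integer.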