Floating Point Or Vector Patents (Class 712/222)
  • Publication number: 20110296146
    Abstract: A set of instructions for implementation in a floating-point unit or other computer processor hardware is disclosed herein. In one embodiment, an extended-range fused multiply-add operation, a first look-up operation, and a second look-up operation are each embodied in hardware instructions configured to be operably executed in a processor. These operations are accompanied by a table which provides a set of defined values in response to various function types, supporting the computation of elementary functions such as reciprocal; square, cube, and fourth roots and their reciprocals; exponential; and logarithmic functions. By allowing each of these functions to be computed with a hardware instruction, branching and predicated execution may be reduced or eliminated, while also permitting the use of distributed instructions across a number of execution units.
    Type: Application
    Filed: May 27, 2010
    Publication date: December 1, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher K. Anand, Robert F. Enenkel, Anuroop Sharma, Daniel M. Zabawa
  • Publication number: 20110296147
    Abstract: A method of testing a computer, the method including designating a register as an input-only register set to a value that does not cause an exception interruption when a specific type of instruction is executed; generating a test instruction array having a plurality of instructions for a test by assigning a register other than the input-only register as the output destination of the execution result of each of the plurality of instructions; executing the plurality of instructions included in the generated test instruction array; and evaluating the execution results by the computer.
    Type: Application
    Filed: April 13, 2011
    Publication date: December 1, 2011
    Applicant: FUJITSU LIMITED
    Inventors: Fumio Ichikawa, Tamoru Inoue
  • Publication number: 20110283092
    Abstract: The described embodiments comprise a processor that executes vector instructions. In the described embodiments, while executing program code, the processor receives a vector instruction that indicates an input vector that includes N elements, wherein receiving the vector instruction comprises optionally receiving a predicate vector that includes N elements. The processor then executes the vector instruction. When executing the vector instruction, the processor sets a value in a scalar register equal to a predetermined element of the input vector, selected based on the active elements in the predicate vector if the predicate vector is received, or otherwise based on an assumed predicate vector in which every element is active. In the described embodiments, the vector instruction can be a GetFirst, an AssignLast1P, or an AssignLast2P instruction.
    Type: Application
    Filed: July 22, 2011
    Publication date: November 17, 2011
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
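    Illustrative sketch (an assumption, not the filing's definition): a GetFirst-style extraction modeled in Python as picking the first active element. The function names, the all-active default, and the keep-the-prior-value behavior when nothing is active are hypothetical; the exact element that each of GetFirst, AssignLast1P and AssignLast2P selects is specified in the application itself.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def get_first(vec: Sequence[T], pred: Optional[Sequence[bool]] = None,
              prior: Optional[T] = None) -> Optional[T]:
    """Return the element of `vec` at the first active predicate position.

    Assumptions (not taken from the filing): an omitted predicate means every
    element is active, and when no element is active the scalar register
    simply keeps its prior value.
    """
    if pred is None:
        pred = [True] * len(vec)  # assumed all-active predicate vector
    if len(pred) != len(vec):
        raise ValueError("predicate and input vector must both have N elements")
    for value, active in zip(vec, pred):
        if active:
            return value
    return prior  # no active element: scalar register left unchanged


def get_last_active(vec: Sequence[T], pred: Optional[Sequence[bool]] = None,
                    prior: Optional[T] = None) -> Optional[T]:
    """Hypothetical 'last active element' variant, for contrast with get_first."""
    rev_pred: Optional[List[bool]] = None if pred is None else list(reversed(pred))
    return get_first(list(reversed(vec)), rev_pred, prior)
```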
  • Publication number: 20110276790
    Abstract: Techniques are disclosed relating to a processor including instruction support for performing a Montgomery multiplication. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit configured to receive instructions including a first instance of a Montgomery-multiply instruction defined within the ISA. The Montgomery-multiply instruction is executable by the processor to operate on at least operands A, B, and N residing in respective portions of a general-purpose register file of the processor, where at least one of operands A, B, and N spans at least two registers of the general-purpose register file. The instruction execution unit is configured to calculate P mod N in response to receiving the first instance of the Montgomery-multiply instruction, where P is the product of at least operand A, operand B, and R⁻¹.
    Type: Application
    Filed: May 7, 2010
    Publication date: November 10, 2011
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence Spracklen, Nils Gura
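    A minimal sketch of the arithmetic such an instruction computes, P mod N with P = A * B * R⁻¹, using the textbook Montgomery reduction (REDC). It assumes R = 2^k, where k is the bit length of an odd modulus N; the multi-register operand layout described in the abstract is not modeled, since Python integers are unbounded.

```python
def montgomery_multiply(a: int, b: int, n: int) -> int:
    """Return a * b * R^-1 mod n, where R = 2**k and k = n.bit_length()."""
    if n <= 1 or n % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus > 1")
    if not (0 <= a < n and 0 <= b < n):
        raise ValueError("operands must already be reduced modulo n")
    k = n.bit_length()
    r_mask = (1 << k) - 1
    # n_prime satisfies n * n_prime == -1 (mod R); needs Python 3.8+ for pow(x, -1, m)
    n_prime = (-pow(n, -1, 1 << k)) & r_mask
    t = a * b                          # full product, t < n * R
    m = ((t & r_mask) * n_prime) & r_mask
    u = (t + m * n) >> k               # exact shift: t + m*n is divisible by R
    return u - n if u >= n else u      # u < 2n, so one subtraction suffices
```

    Operands are normally kept in Montgomery form (x * R mod N), so a chain of multiplications needs no division by N; multiplying the final value by 1 converts it back.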
  • Patent number: 8046564
    Abstract: Techniques, systems and apparatus are described for providing a processing element (PE) structure forming a floating point unit (FPU)-processing element. Each processing element includes each of two multiplexers (MUXes) to receive data from one or more sources including another PE, and select one value from the received data. The processing element includes an arithmetic logic unit (ALU) in communication with the two multiplexers to receive the selected value from each multiplexer as two input values, and process the received two input values to generate results of the ALU.
    Type: Grant
    Filed: September 19, 2008
    Date of Patent: October 25, 2011
    Assignee: Core Logic, Inc.
    Inventors: Hoon Mo Yang, Man Hwee Jo, Il Hyun Park, Ki Young Choi
  • Publication number: 20110258418
    Abstract: A method includes, in a processor, loading or moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
    Type: Application
    Filed: April 15, 2011
    Publication date: October 20, 2011
    Inventor: Patrice Roussel
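    A bit-level sketch of the load-and-duplicate behavior, assuming the portions are equal-width fields and that the first portion is copied into every remaining field (the abstract names only one subsequent portion). The widths are hypothetical parameters; with a 64-bit portion in a 128-bit register, this resembles a move-and-duplicate of the low half.

```python
def load_and_duplicate(source: int, portion_bits: int, dest_bits: int) -> int:
    """Copy the low `portion_bits` of `source` into every portion of a
    `dest_bits`-wide destination register (hypothetical widths)."""
    if portion_bits <= 0 or dest_bits % portion_bits:
        raise ValueError("destination width must be a positive multiple of the portion width")
    portion = source & ((1 << portion_bits) - 1)  # first portion of the source
    result = 0
    for slot in range(dest_bits // portion_bits):
        result |= portion << (slot * portion_bits)  # first slot, then the duplicates
    return result
```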
  • Patent number: 8028153
    Abstract: A circuit arrangement and method support data dependent instruction decoding, whereby instructions are decoded, in part, using decode data that is stored in operand registers identified by such instructions. An instruction may include an opcode and at least one operand that identifies a register. During execution of the instruction, the instruction is first decoded using the opcode, and then decode data stored in the operand register is retrieved and used to further decode the instruction, e.g., to select from among a plurality of operations or instruction types associated with the same opcode.
    Type: Grant
    Filed: August 14, 2008
    Date of Patent: September 27, 2011
    Assignee: International Business Machines Corporation
    Inventors: Mark J Hickey, Adam J Muff, Matthew R Tubbs, Charles D Wait
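    A toy model of two-level decoding, in which the opcode selects a family of operations and the value held in an operand register selects the member. The opcode name `ALU2`, the table contents, and the register-index parameters are hypothetical.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical decode table: opcode "ALU2" covers four operations, and the
# value found in the designated operand register selects which one runs.
DECODE_TABLE: Dict[str, Dict[int, Callable[[int, int], int]]] = {
    "ALU2": {0: operator.add, 1: operator.sub, 2: operator.and_, 3: operator.or_},
}


def execute(opcode: str, regs: List[int], rd: int, ra: int, rb: int,
            rdecode: int) -> None:
    """Decode with the opcode first, then with data read from register rdecode."""
    family = DECODE_TABLE[opcode]      # first-level decode: opcode only
    try:
        op = family[regs[rdecode]]     # second-level decode: register contents
    except KeyError:
        raise ValueError(f"undefined decode data {regs[rdecode]} for {opcode}") from None
    regs[rd] = op(regs[ra], regs[rb])
```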
  • Publication number: 20110231636
    Abstract: Techniques are disclosed relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41.
    Type: Application
    Filed: March 16, 2010
    Publication date: September 22, 2011
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence A. Spracklen
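    The generator polynomial 0x11EDC6F41 is the CRC-32C (Castagnoli) polynomial. A bitwise software version follows, assuming the conventions commonly used with this polynomial (reflected input and output, an initial value and final XOR of 0xFFFFFFFF); the abstract itself does not state them, and a hardware unit would process many bits per step rather than one.

```python
GENERATOR = 0x11EDC6F41  # 33-bit generator from the abstract (x^32 term included)


def _reflect32(x: int) -> int:
    """Reverse the bit order of a 32-bit value."""
    return int(f"{x:032b}"[::-1], 2)


# Drop the implicit x^32 term and bit-reverse for LSB-first processing.
POLY_REFLECTED = _reflect32(GENERATOR & 0xFFFFFFFF)


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C; pass a previous result as `crc` to continue a checksum."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF
```

    The standard CRC-32C check value over the ASCII string "123456789" is 0xE3069283, which this sketch reproduces.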
  • Patent number: 8024678
    Abstract: An interface to a dynamically configurable arithmetic unit can include data alignment modules, where each data alignment module receives input variables associated with one or more arithmetic expressions. The interface can include multiplexers coupled to the data alignment modules, where a data alignment module has outputs coupled to a first multiplexer. The first multiplexer can have a selection line and an output coupled to an input port of the dynamically configurable arithmetic unit. The interface can include a second multiplexer having input instructions and the selection line, where each instruction is associated with one of the arithmetic expressions and has an operation to be performed by the dynamically configurable arithmetic unit. The second multiplexer is configurable to provide selected ones of the input instructions to the dynamically configurable arithmetic unit through an output of the second multiplexer responsive to the selection line.
    Type: Grant
    Filed: April 1, 2009
    Date of Patent: September 20, 2011
    Assignee: Xilinx, Inc.
    Inventors: Bradley L. Taylor, Arvind Sundararajan, Shay Ping Seng, L. James Hwang
  • Publication number: 20110208505
    Abstract: A processor may include a floating-point unit (FPU) and an arithmetic logic unit (ALU). Instructions to the processor may include greater or lesser amounts of floating-point operations and integer operations. In a circumstance where instructions include predominantly integer operations, power to the FPU may be reduced or turned completely off. In such a circumstance, occasional floating-point operations may be emulated and performed by the ALU. If the processor subsequently determines that incoming instructions include a greater proportion of floating-point operations, the FPU may be powered back on and used to perform the floating-point operations.
    Type: Application
    Filed: February 24, 2010
    Publication date: August 25, 2011
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: David E. Mayhew, Mark D. Hummel
  • Patent number: 8001360
    Abstract: A system and software for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying a data selection operand and a first and a second register providing a plurality of data elements, the data selection operand comprising a plurality of fields each selecting one of the plurality of data elements, the execution unit operable to provide the data element selected by each field of the data selection operand to a predetermined position in a catenated result.
    Type: Grant
    Filed: January 16, 2004
    Date of Patent: August 16, 2011
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris
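    A sketch of the selection semantics, assuming each field of the data selection operand is a plain index into the catenation of the two source registers' elements and that field i fills position i of the result. The function name and this index encoding are hypothetical.

```python
from typing import List, Sequence


def select_from_pair(reg_a: Sequence[int], reg_b: Sequence[int],
                     selector: Sequence[int]) -> List[int]:
    """Each selector field picks one element of the catenation reg_a ++ reg_b."""
    pool = list(reg_a) + list(reg_b)  # elements of both source registers
    result = []
    for field in selector:
        if not 0 <= field < len(pool):
            raise IndexError(f"selector field {field} out of range")
        result.append(pool[field])    # field i fills position i of the result
    return result
```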
  • Patent number: 7987344
    Abstract: A programmable processor and method for improving the performance of processors by incorporating an execution unit configurable to execute a plurality of instruction streams from the plurality of threads, wherein each instruction stream includes a group instruction that operates on a plurality of data elements in partitioned fields of at least one of the registers to produce a catenated result.
    Type: Grant
    Filed: January 16, 2004
    Date of Patent: July 26, 2011
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris
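    One common instance of a group instruction is a partitioned add, where the lanes of a register are added independently and carries never cross lane boundaries. The sketch uses the well-known masking technique (add the low bits of each lane, then fix each lane's top bit with XOR); it illustrates partitioned-field arithmetic in general and is not claimed to be the patented circuit. The multithreading aspect is not modeled.

```python
def group_add(a: int, b: int, lane_bits: int = 8, reg_bits: int = 32) -> int:
    """Add `a` and `b` lane by lane, modulo 2**lane_bits in each lane."""
    if reg_bits % lane_bits:
        raise ValueError("register width must be a multiple of the lane width")
    full = (1 << reg_bits) - 1
    top = 0  # mask with only the most significant bit of each lane set
    for lane in range(reg_bits // lane_bits):
        top |= 1 << (lane * lane_bits + lane_bits - 1)
    rest = full ^ top  # every lane bit except each lane's top bit
    # Adding the low bits cannot carry out of a lane; the top bits are then
    # fixed up with XOR (sum bit = a_top ^ b_top ^ carry_in).
    return (((a & rest) + (b & rest)) ^ ((a ^ b) & top)) & full
```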
  • Publication number: 20110173421
    Abstract: To add floating point numbers in a parallel computing system, a collective logic device receives the floating point numbers from computing nodes. The collective logic device converts the floating point numbers to integer numbers, adds the integer numbers to generate a summation of the integer numbers, and converts the summation to a floating point number. The collective logic device performs the receiving, the converting of the floating point numbers, the adding, the generating, and the converting of the summation in one pass. One pass means that the computing nodes send inputs to the collective logic device only once and receive outputs from it only once.
    Type: Application
    Filed: January 8, 2010
    Publication date: July 14, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger, Burkhard Steinmacher-Burow
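    A software sketch of the convert, add, convert-back idea, assuming every input is scaled by a common power of two (`frac_bits`, a hypothetical parameter) before conversion. Integer addition is exact and associative, so the result does not depend on the order in which node contributions arrive; bits below 2^-frac_bits are rounded off at conversion.

```python
import math
from typing import Iterable


def fixed_point_sum(values: Iterable[float], frac_bits: int) -> float:
    """Sum floats exactly in integer arithmetic at a fixed scale of 2**-frac_bits."""
    if frac_bits < 0:
        raise ValueError("this sketch only supports frac_bits >= 0")
    total = 0
    for v in values:
        if not math.isfinite(v):
            raise ValueError("only finite inputs are handled in this sketch")
        total += round(math.ldexp(v, frac_bits))  # float -> integer at common scale
    return total / (1 << frac_bits)  # int / int true division is correctly rounded
```

    For comparison, left-to-right floating-point addition gives (1e16 + 1.0) - 1e16 == 0.0, because 1.0 is lost to rounding in the first step; the integer path returns 1.0 for any ordering of those three inputs.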
  • Publication number: 20110161624
    Abstract: Mechanisms are provided for performing a floating point collect and operate for a summation across a vector for a dot product operation. A routing network placed before the single instruction multiple data (SIMD) unit allows the SIMD unit to perform a summation across a vector with a single stage of adders. The routing network routes the vector elements to the adders in a first cycle. The SIMD unit stores the results of the adders into a results vector register. The routing network routes the summation results from the results vector register to the adders in a second cycle. The SIMD unit then stores the results from the second cycle in the results vector register.
    Type: Application
    Filed: December 29, 2009
    Publication date: June 30, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian K. Flachs, Seiji Maeda, Steven Osman
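    A sketch of the two-cycle reduction for a dot product, assuming 4-wide SIMD vectors and a routing network that pairs adjacent lanes. Both cycles reuse the same single stage of adders (`adder_stage` is a hypothetical name).

```python
from typing import List, Sequence


def adder_stage(lanes: Sequence[float]) -> List[float]:
    """One trip through a single stage of adders: pairwise sums of routed lanes."""
    if len(lanes) % 2:
        raise ValueError("the routing network pairs lanes, so the count must be even")
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]


def dot_product_4(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 4-element vectors using two passes over one adder stage."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("this sketch assumes 4-wide SIMD vectors")
    products = [x * y for x, y in zip(a, b)]  # element-wise SIMD multiply
    results = adder_stage(products)  # cycle 1: [p0 + p1, p2 + p3] -> results register
    results = adder_stage(results)   # cycle 2: routed back through the same adders
    return results[0]
```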
  • Publication number: 20110153996
    Abstract: Parallel and vectored data structures may be used in a single instruction multiple data processor that applies the Gilbert-Johnson-Keerthi algorithm. As a result, the performance of multi-core processors doing graphics processing may be increased in some cases.
    Type: Application
    Filed: December 23, 2009
    Publication date: June 23, 2011
    Inventors: Aleksey A. Bader, Mikhail Smelyanskiy, Jatin Chhugani
  • Publication number: 20110138155
    Abstract: A vector computer executing vector operations via vector pipeline processing is configured to dynamically perform overtaking control on vector gather/scatter instructions. Minimum/maximum values among the vector elements of vector registers are determined from the result of the fixed-point calculation that serves as the address-dependency source instruction of a vector gather/scatter instruction; because fixed-point calculation has a shorter turnaround time than floating-point calculation, these minimum/maximum values can be determined in otherwise idle time. The access range of addresses for the vector gather/scatter instruction is specified from the minimum/maximum values, and overtaking control is performed on the vector gather/scatter instruction in light of that access range.
    Type: Application
    Filed: December 1, 2010
    Publication date: June 9, 2011
    Inventor: Eiichiro Kawaguchi
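    A sketch of an address-range test that could gate overtaking, assuming a gather/scatter touches `base + index * elem_size` for each index and that two accesses may be reordered when their inclusive byte ranges are disjoint. The function names and parameters are hypothetical; only the min/max-to-range step comes from the abstract.

```python
from typing import Sequence, Tuple

Range = Tuple[int, int]


def access_range(base: int, indices: Sequence[int], elem_size: int) -> Range:
    """Inclusive byte range touched by a gather/scatter, from the min/max index."""
    if not indices or elem_size <= 0:
        raise ValueError("need at least one index and a positive element size")
    lo = base + min(indices) * elem_size
    hi = base + max(indices) * elem_size + elem_size - 1
    return lo, hi


def may_overtake(later: Range, earlier: Range) -> bool:
    """A later access may pass an earlier one only if their ranges are disjoint."""
    return later[1] < earlier[0] or earlier[1] < later[0]
```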
  • Patent number: 7953958
    Abstract: A joint detection system is configured to perform joint detection of received signals and includes a joint detection accelerator and a host processor. The joint detection accelerator may include a memory unit to store input data values, intermediate results and output data values; one or more computation units to process the input data values and the intermediate results, and to provide output data values to the memory unit; a controller to control the memory and the one or more computation units to perform joint detection processing; and an external interface to receive the input data values from the host processor and to provide output data values to the host processor. The computation units may include a complex multiply accumulate unit, a simplified complex multiply accumulate unit and a normalized floating point divider. The memory unit may include an input memory, a matrix memory, a main memory and an output memory.
    Type: Grant
    Filed: June 12, 2007
    Date of Patent: May 31, 2011
    Assignee: MediaTek Inc.
    Inventors: John Zijun Shen, Paul D. Krivacek, Thomas J. Barber, Jr., Lidwine Martinot, Aiguo Yan, Marko Kocic
  • Patent number: 7949858
    Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and the fused MULTIPLY AND ADD and MULTIPLY AND SUBTRACT instructions in both RRF and RXF formats. Both binary and hexadecimal floating-point instructions are supported, for a total of 6 formats. The floating-point unit can perform a hexadecimal or binary multiply-add instruction every cycle with a latency of 5 cycles. It supports the two architectures with two internal formats, each with its own bias, which eliminates format conversion cycles and optimizes the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architectures, supporting a multiply-add/subtract per cycle.
    Type: Grant
    Filed: February 2, 2009
    Date of Patent: May 24, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric M. Schwarz, Ronald M. Smith, Sr.
  • Publication number: 20110119471
    Abstract: A method is presented that includes decomposing a first value into multiple parts. Decomposing includes shifting (310) a rounded integer portion of the first value to generate a second value; generating (320) a third value; extracting (330) a plurality of significand bits from the second value to generate a fourth value; extracting (340) a portion of bits from the fourth value to generate an integer component; and generating (350) a fifth value. The third value, the fifth value, and the integer component are then either stored (360, 380) in a memory or transmitted to an arithmetic logic unit (ALU).
    Type: Application
    Filed: July 13, 2001
    Publication date: May 19, 2011
    Inventors: Robert S Norin, Olga Dryzhakova, Alexander Isaev, Andrey Naralkin
  • Patent number: 7945766
    Abstract: A processor capable of executing conditional store instructions without being limited by the number of condition codes is provided. Condition data is stored in floating-point registers, and an operation unit executes a conditional floating-point store instruction that determines, based on the condition data, whether to store the store data in a cache.
    Type: Grant
    Filed: November 25, 2008
    Date of Patent: May 17, 2011
    Assignee: Fujitsu Limited
    Inventor: Toshio Yoshida
  • Patent number: 7937568
    Abstract: A method, system and processor for increasing the instruction throughput in a processor executing longer latency instructions within the instruction pipeline. Logic associated with specific stages of the execution pipeline, responsible for executing the particular type of instructions, determines when at least a threshold number of the particular-type instructions is scheduled to be executed. The logic then automatically changes the execution cycle frequency of the specific pipeline stages from a first cycle frequency to a second, pre-established higher cycle frequency, which enables more efficient execution and higher execution throughput of the particular-type instructions. Only the cycle frequency of the one or more functional stages is switched to the higher cycle frequency, independently of the cycle frequency of the other functional stages in the processor pipeline.
    Type: Grant
    Filed: July 11, 2007
    Date of Patent: May 3, 2011
    Assignee: International Business Machines Corporation
    Inventors: Anthony Correale, Jr., Kenichi Tsuchiya
  • Publication number: 20110093686
    Abstract: In a data processing apparatus 1 having registers 6, when a state saving trigger event occurs while a result value of a data processing operation is still to be written to a destination register then saving and restoring control circuitry 12 selects a state saving sequence defining a temporal order for saving register values to a backup data store 10. The sequence is selected to provide the destination register with a position within the sequence corresponding to a time after the result value has been written to the destination register. The register values are then saved to the backup data store 10 in the order of the selected state saving sequence. A similar technique can be used when a state restoring trigger event triggers loading of the data values from the backup data store 10 to the registers 6.
    Type: Application
    Filed: September 16, 2010
    Publication date: April 21, 2011
    Applicant: ARM LIMITED
    Inventors: Antony John Penton, Simon Axford
  • Patent number: 7921278
    Abstract: An “early exit” of an iterative refinement algorithm is implemented by effectively disabling read-after-write dependency stalls of newer instructions, as well as disabling the register write enable of these instructions, for the remainder of the algorithm. By doing so, the latency of the algorithm is reduced and performance is increased without the complexity and potentially poor performance of the compare and branch instructions that might otherwise be required.
    Type: Grant
    Filed: March 10, 2008
    Date of Patent: April 5, 2011
    Assignee: International Business Machines Corporation
    Inventors: Adam James Muff, Matthew Ray Tubbs
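    A software analogue of the early-exit idea, using Newton-Raphson refinement of a reciprocal as the iterative algorithm (an assumption; the abstract does not name one). The loop length is fixed, and once the estimate converges the later results are simply not written back. The Python `if` stands in for the disabled write enable; the hardware avoids the compare-and-branch that this model uses.

```python
from typing import Tuple


def reciprocal_early_exit(d: float, x0: float, max_iters: int = 8,
                          tol: float = 1e-12) -> Tuple[float, int]:
    """Newton-Raphson refinement of 1/d with an early-exit flag.

    The loop always runs max_iters times, like a fixed instruction sequence;
    once a step changes the estimate by no more than tol (relative), later
    steps still compute but their 'register writes' are suppressed.
    Returns the estimate and the number of writes that took effect.
    """
    x = x0
    converged = False
    writes = 0
    for _ in range(max_iters):
        x_next = x * (2.0 - d * x)  # one refinement step toward 1/d
        if not converged:           # stands in for the hardware write enable
            converged = abs(x_next - x) <= tol * abs(x)
            x = x_next
            writes += 1
    return x, writes
```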
  • Patent number: 7913067
    Abstract: A system and method for overlapping execution (OE) of instructions through non-uniform execution pipelines in an in-order processor are provided. The system includes a first execution unit to perform instruction execution in a first execution pipeline. The system also includes a second execution unit to perform instruction execution in a second execution pipeline, where the second execution pipeline includes a greater number of stages than the first execution pipeline. The system further includes an instruction dispatch unit (IDU), the IDU including OE registers and logic for dispatching an OE-capable instruction to the first execution unit such that the instruction completes execution prior to completing execution of a previously dispatched instruction to the second execution unit. The system additionally includes a latch to hold a result of the execution of the OE-capable instruction until after the second execution unit completes the execution of the previously dispatched instruction.
    Type: Grant
    Filed: February 20, 2008
    Date of Patent: March 22, 2011
    Assignee: International Business Machines Corporation
    Inventors: David S. Hutton, Khary J. Alexander, Fadi Y. Busaba, Bruce C. Giamei, John G. Rell, Jr., Eric M. Schwarz, Chung-Lung Kevin Shum
  • Patent number: 7913066
    Abstract: A programmable “early exit” of an iterative refinement algorithm is implemented by effectively disabling read-after-write dependency stalls of newer instructions, as well as disabling the register write enable of these instructions, for the remainder of the algorithm. In addition, programmable logic is provided to enable a custom early exit condition to be specified for the iterative refinement algorithm, so that the underlying hardware can be configured for optimal execution of particular iterative refinement algorithms. By doing so, the latency of the algorithm is reduced and performance is increased without the complexity and potentially poor performance of the compare and branch instructions that might otherwise be required.
    Type: Grant
    Filed: March 10, 2008
    Date of Patent: March 22, 2011
    Assignee: International Business Machines Corporation
    Inventors: Adam James Muff, Matthew Ray Tubbs
  • Publication number: 20110060892
    Abstract: A microprocessor having an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands includes first and second floating-point units. The first floating-point unit is configured to speculatively forward a non-ADF result generated by the first floating-point unit to the second floating-point unit. The non-ADF result is associated with a first instruction. The second floating-point unit is configured to use the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction. The second floating-point unit is further configured to convert the non-ADF result to an ADF result and to determine whether the non-ADF result creates an exception condition when converted to the ADF result. The microprocessor is configured to cancel the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.
    Type: Application
    Filed: June 22, 2010
    Publication date: March 10, 2011
    Applicant: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks
  • Patent number: 7904700
    Abstract: A software-accessible special purpose register is architected into a processing unit in order to implement persistent vector multiplexer control of a vector-based execution unit. A persistent swizzle instruction is defined in an instruction set for the vector-based execution unit and is used to cause state information to be stored in the special purpose register such that the operand vectors processed by subsequent vector instructions executed by the vector-based execution unit will be selectively shuffled using the persisted state information. As a result, when multiple vector instructions require a common custom word ordering for one or more operand vectors, a single persistent swizzle instruction may be used to select the desired custom word ordering for all of the vector instructions.
    Type: Grant
    Filed: March 10, 2008
    Date of Patent: March 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Robert Allen Shearer, Matthew Ray Tubbs
  • Patent number: 7904699
    Abstract: Persistent vector multiplexer control is used in a vector-based execution unit to control the shuffling of words in operand vectors processed by the execution unit. In addition, a persistent swizzle instruction is defined in an instruction set for the vector-based execution unit and is used to cause state information to be persisted such that the operand vectors processed by subsequent vector instructions executed by the vector-based execution unit will be selectively shuffled using the persisted state information. As a result, when multiple vector instructions require a common custom word ordering for one or more operand vectors, a single persistent swizzle instruction may be used to select the desired custom word ordering for all of the vector instructions.
    Type: Grant
    Filed: March 10, 2008
    Date of Patent: March 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Robert Allen Shearer, Matthew Ray Tubbs
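    A toy model of persistent swizzle control: one call records a word ordering in a special-purpose register, and every subsequent vector operation shuffles its operands with it until the ordering is cleared. The class and method names are hypothetical, and applying the ordering to both operands is an assumption (the filings describe selective shuffling of one or more operand vectors).

```python
from typing import List, Optional, Sequence


class VectorUnit:
    """Toy vector unit with a persistent swizzle register (hypothetical model)."""

    def __init__(self, width: int = 4) -> None:
        self.width = width
        self.swizzle_spr: Optional[List[int]] = None  # None means natural word order

    def persistent_swizzle(self, order: Sequence[int]) -> None:
        """Record a word ordering that later vector instructions will reuse."""
        if len(order) != self.width or any(not 0 <= i < self.width for i in order):
            raise ValueError("swizzle must name one source word per lane")
        self.swizzle_spr = list(order)

    def clear_swizzle(self) -> None:
        self.swizzle_spr = None

    def _shuffle(self, vec: Sequence[float]) -> List[float]:
        if self.swizzle_spr is None:
            return list(vec)
        return [vec[i] for i in self.swizzle_spr]

    def vadd(self, a: Sequence[float], b: Sequence[float]) -> List[float]:
        return [x + y for x, y in zip(self._shuffle(a), self._shuffle(b))]

    def vmul(self, a: Sequence[float], b: Sequence[float]) -> List[float]:
        return [x * y for x, y in zip(self._shuffle(a), self._shuffle(b))]
```

    The design point is that the ordering is paid for once: several vector instructions that need the same custom word order share a single swizzle setup.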
  • Patent number: 7900025
    Abstract: Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.
    Type: Grant
    Filed: October 14, 2008
    Date of Patent: March 1, 2011
    Assignee: International Business Machines Corporation
    Inventor: Michael K. Gschwind
  • Publication number: 20110047358
    Abstract: Mechanisms are provided for tracking exceptions in the execution of vectorized code. A speculative instruction is executed on a vector element of a vector. An exception condition is detected in association with the vector element based on a result of executing the speculative instruction on the vector element. A special exception value is stored in the vector element in a vector register corresponding to the vector, indicative of the exception condition, without invoking an exception handler for the exception condition. The special exception value is propagated with the vector element of the vector through a processor architecture of the processor, without invoking the exception handler for the exception condition. An exception corresponding to the exception condition indicated by the special exception value is generated only in response to a non-speculative instruction being executed that performs a non-speculative operation on the vector element.
    Type: Application
    Filed: August 19, 2009
    Publication date: February 24, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Alan Gara, Michael K. Gschwind
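    A sketch of deferred exception reporting for speculative vector code: a faulting lane stores a special token instead of trapping, arithmetic propagates the token, and only a non-speculative use raises. `ExceptionToken`, `speculative`, and `commit` are hypothetical names chosen for illustration.

```python
import operator
from typing import Callable, List, Sequence, Union


class ExceptionToken:
    """Hypothetical 'special exception value' held in a vector element."""

    def __init__(self, reason: str) -> None:
        self.reason = reason

    def __repr__(self) -> str:
        return f"ExceptionToken({self.reason!r})"


Element = Union[float, ExceptionToken]


def speculative(op: Callable[[float, float], float],
                a: Sequence[Element], b: Sequence[Element]) -> List[Element]:
    """Apply `op` lane by lane without ever invoking an exception handler."""
    out: List[Element] = []
    for x, y in zip(a, b):
        if isinstance(x, ExceptionToken):
            out.append(x)  # propagate an earlier exception marker
        elif isinstance(y, ExceptionToken):
            out.append(y)
        else:
            try:
                out.append(op(x, y))
            except ArithmeticError as exc:  # e.g. ZeroDivisionError
                out.append(ExceptionToken(type(exc).__name__))
    return out


def commit(vec: Sequence[Element]) -> List[float]:
    """Non-speculative use (e.g. a store): only now is the exception raised."""
    for lane, x in enumerate(vec):
        if isinstance(x, ExceptionToken):
            raise ArithmeticError(f"lane {lane}: {x.reason}")
    return [float(x) for x in vec]
```

    If the faulting lane is discarded before any non-speculative use (for example, because the scalar loop would have exited earlier), no exception is ever reported, which matches the behavior of the original scalar code.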
  • Publication number: 20110047359
    Abstract: Mechanisms are provided for inserting indicated instructions for tracking and indicating exceptions in the execution of vectorized code. A portion of first code is received for compilation. The portion of first code is analyzed to identify non-speculative instructions performing designated non-speculative operations in the first code that are candidates for replacement by replacement operation-and-indicate instructions that perform the designated non-speculative operations and further perform an indication operation for indicating any exception conditions corresponding to special exception values present in vector register inputs to the replacement operation-and-indicate instructions. The replacement is performed and second code is generated based on the replacement of the at least one non-speculative instruction.
    Type: Application
    Filed: August 19, 2009
    Publication date: February 24, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Alan Gara, Michael K. Gschwind
  • Patent number: 7895418
    Abstract: There is disclosed an operand queue for use in a floating point unit. The floating point unit comprises floating point processing units for executing floating point instructions that write operands to an external memory and for executing floating point instructions that read operands from the external memory. The floating point unit also comprises an operand queue for storing a plurality of operands associated with one or more operations being processed in the floating point unit. The operand queue stores a first operand being written to an external memory by a floating point write instruction executed by a first one of the plurality of floating point processing units and supplies the first operand to a floating point read instruction executed by a second one of the plurality of floating point processing units subsequent to the execution of the floating point write instruction.
    Type: Grant
    Filed: November 28, 2005
    Date of Patent: February 22, 2011
    Assignee: National Semiconductor Corporation
    Inventor: Daniel W. Green
  • Publication number: 20110035745
    Abstract: A RISC processor apparatus and method for supporting an X86 virtual machine.
    Type: Application
    Filed: December 17, 2008
    Publication date: February 10, 2011
    Applicant: Institute of Computing Technology of the Chinese Academy of Sciences
    Inventors: Guojie Li, Weiwu Hu, Xiaoyu Li, Menghao Su
  • Publication number: 20110029760
    Abstract: A microprocessor executes an instruction specifying a floating-point input operand having a predetermined size and that instructs the microprocessor to round the floating-point input operand to an integer value using a rounding mode and to return a floating-point result having the same predetermined size. An instruction translator translates the instruction into first and second microinstructions. An execution unit executes the first and second microinstructions. The first microinstruction receives as an input operand the instruction floating-point input operand and generates an intermediate result from the input operand. The second microinstruction receives as an input operand the intermediate result of the first microinstruction and generates the floating-point result of the instruction from the intermediate result. The intermediate result is the same predetermined size as the instruction floating-point input operand.
    Type: Application
    Filed: May 20, 2010
    Publication date: February 3, 2011
    Applicant: VIA TECHNOLOGIES, INC.
    Inventors: Tom Elmer, Terry Parks
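    A sketch of round-to-integral split into the two steps the abstract describes: the first produces the rounded integer value and the second converts it back to the operand's floating-point format. The mode names are hypothetical; Python floats are 64-bit doubles, and Python's unbounded `int` stands in for the same-size intermediate, so the overflow handling real microinstructions need is not modeled.

```python
import math
from typing import Callable, Dict

# Hypothetical mode names mapped to Python's float -> int rounding functions.
ROUNDING: Dict[str, Callable[[float], int]] = {
    "nearest_even": round,      # round() on a float rounds half to even
    "toward_zero": math.trunc,
    "down": math.floor,
    "up": math.ceil,
}


def round_to_integral(x: float, mode: str = "nearest_even") -> float:
    """Round x to an integer value and return it as a float of the same format."""
    if not math.isfinite(x):
        return x                          # NaN and infinities pass through
    intermediate = ROUNDING[mode](x)      # step 1: integer-valued intermediate
    result = float(intermediate)          # step 2: back to floating point
    return math.copysign(result, x)       # keep the sign, e.g. -0.4 -> -0.0
```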
  • Publication number: 20110004644
    Abstract: Apparatus and methods are provided to perform floating point operations that adapt to the precision formats of their input operands. The apparatus includes adaptive conversion logic and a tagged register file. The adaptive conversion logic receives the input operands, each of which has a corresponding precision, and records that precision for use in subsequent floating point operations. The tagged register file, coupled to the adaptive conversion logic, stores each input operand together with its corresponding precision and associates the two. Subsequent floating point operations are performed at a precision level according to the corresponding precision.
    Type: Application
    Filed: July 3, 2009
    Publication date: January 6, 2011
    Applicant: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Rodney E. Hooker, Terry Parks
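A minimal sketch of a tagged register file, assuming two tags ("single" and "double") and binary32 rounding done with the standard `struct` module. The class name and the policy for mixed operands are hypothetical. The assumed policy computes at single precision only when both source tags say single, and otherwise at double.

```python
import struct


def to_single(x: float) -> float:
    # Round a Python float (IEEE binary64) to binary32 and widen it back.
    return struct.unpack("<f", struct.pack("<f", x))[0]


class TaggedRegisterFile:
    """Hypothetical model: each register holds (value, precision tag)."""

    def __init__(self, size=8):
        self.regs = [(0.0, "double")] * size

    def load(self, idx, value, precision):
        # Adaptive conversion: record the operand's precision next to its value.
        if precision == "single":
            value = to_single(value)
        self.regs[idx] = (value, precision)

    def add(self, dst, a, b):
        (va, pa), (vb, pb) = self.regs[a], self.regs[b]
        # Assumed policy: single precision only if both tags are single.
        precision = "single" if pa == pb == "single" else "double"
        result = va + vb
        if precision == "single":
            result = to_single(result)
        self.regs[dst] = (result, precision)
        return self.regs[dst]
```

Adding two binary32 values in binary64 and then rounding to binary32 gives the correctly rounded single-precision sum. This holds because binary64 carries enough extra bits, so the widened add here does not introduce double-rounding error.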
  • Patent number: 7865693
    Abstract: Mechanisms for aligning enhanced precision vectors based on reduced precision data values are provided. At least one data value, having a first precision type, is received for storing in a vector register, which stores data as a vector having a plurality of vector elements. The data value is converted from the first precision type to a second precision type, different in precision from the first, thereby generating at least one modified data value. The at least one modified data value is stored in at least one vector element of the plurality of vector elements. The alignment of the at least one modified data value relative to a boundary of a vector element of the vector register is determined, and an alignment operation is performed to re-align the at least one modified data value based on that boundary.
    Type: Grant
    Filed: October 14, 2008
    Date of Patent: January 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Bruce M. Fleischer, Michael K. Gschwind
  • Publication number: 20100332805
    Abstract: An out-of-order renaming processor is provided with a register file within which aliasing between registers of different sizes may occur. Thus, a program instruction with a double-precision source register may alias with two single-precision registers that are used as destinations of one or more preceding program instructions. To track this data dependency, the double-precision register may be remapped into a micro-operation that specifies the two single-precision registers as its source registers. Scheduling circuitry can then use its existing hazard detection and management mechanisms to handle potential data hazards and dependencies. Not all program instructions with such data hazards between registers of different sizes are handled by this source register remapping; for those other program instructions, a slower mechanism for dealing with the data dependency hazard is provided.
    Type: Application
    Filed: June 24, 2009
    Publication date: December 30, 2010
    Applicant: ARM Limited
    Inventors: Conrado Blasco Allue, David James Williamson, James Nolan Hardage, Glen Andrew Harris, Robert Gregory McDonald
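A minimal sketch of the source-register remapping, assuming the AArch32 VFP aliasing in which Dn overlaps S(2n) and S(2n+1) for n < 16. Micro-operations are modeled as hypothetical `(op, dst, srcs)` tuples. The "existing hazard logic" is reduced to a plain name match between a consumer's sources and earlier destinations. Without remapping, that match misses the dependency of `d1` on writes to `s2` and `s3`.

```python
def remap_sources(uop):
    """Rewrite each D-register source as the two S registers it aliases.

    Assumes Dn = {S(2n), S(2n+1)} for n < 16 (AArch32 VFP layout).
    """
    op, dst, srcs = uop
    new_srcs = []
    for reg in srcs:
        if reg.startswith("d") and int(reg[1:]) < 16:
            n = int(reg[1:])
            new_srcs += [f"s{2 * n}", f"s{2 * n + 1}"]
        else:
            new_srcs.append(reg)
    return (op, dst, new_srcs)


def depends_on(consumer, producers):
    # Simplified hazard check: does any consumer source name a producer's destination?
    _, _, srcs = consumer
    return [p for p in producers if p[1] in srcs]
```

After remapping, the unchanged name-matching check finds both single-precision producers. This is the benefit the abstract claims for reusing existing scheduling logic.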
  • Publication number: 20100325397
    Abstract: A data processing apparatus is described which comprises processing circuitry responsive to data processing instructions to execute integer data processing operations and floating point data processing operations, a first set of integer registers useable by the processing circuitry in executing the integer data processing operations, and a second set of floating point registers useable by the processing circuitry in executing the floating point data processing operations.
    Type: Application
    Filed: May 3, 2010
    Publication date: December 23, 2010
    Inventor: Simon John Craske
  • Publication number: 20100325399
    Abstract: The described embodiments provide a processor that executes a vector instruction. The processor receives a vector instruction that takes as input at least one vector of values with N elements, and optionally receives a predicate vector with N elements. When executing the vector instruction, the processor checks one or more selected elements of the vector of values to determine whether they contain a predetermined value; if the predicate vector was received, the selected elements are limited to those whose corresponding predicate element is active. When the selected elements contain the predetermined value, the processor sets a corresponding status flag.
    Type: Application
    Filed: August 31, 2010
    Publication date: December 23, 2010
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
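A minimal sketch of the predicated check described above. It assumes that "selected elements" means every element when no predicate is given, and only the active elements otherwise. The function name `check_for_value` is hypothetical. The returned boolean stands in for the status flag.

```python
def check_for_value(values, target, predicate=None):
    """Return True (the status flag) if any selected element equals target.

    Hypothetical simplification: all elements are selected, or only the
    active ones when an N-element predicate vector is supplied.
    """
    if predicate is None:
        predicate = [True] * len(values)
    if len(predicate) != len(values):
        raise ValueError("predicate and vector must both have N elements")
    return any(active and v == target for v, active in zip(values, predicate))
```

With `values = [3, 0, 7, 0]`, searching for 0 sets the flag. Under the predicate `[True, False, True, False]` it does not, because both zeros sit in inactive lanes.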
  • Publication number: 20100325398
    Abstract: The described embodiments provide a processor for generating a result vector that contains results from a comparison operation. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the first input vector. For selected elements in the result vector, the processor compares the base value with values from relevant elements to the left of the corresponding element in the second input vector, and writes the result into that element of the result vector. In the described embodiments, the key element position and the relevant elements can be defined by the control vector and an optional predicate vector.
    Type: Application
    Filed: August 31, 2010
    Publication date: December 23, 2010
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20100318772
    Abstract: A system and method for increasing processor throughput by shortening a loop critical path. In one embodiment, a table comprises multiple stack entries, each comprising an x87 floating-point (FP) stack specifier. Combinatorial logic for operand translation of N FP instructions per clock cycle may require N instantiated copies of a combinatorial logic block, each of which may determine a new ordering of the stack entries. Control logic may receive the necessary information from the N FP instructions and determine their combined computational effect, or stack reordering, on entries within the table based on two or more instructions. The resulting control signals are conveyed to the N instantiated copies. The accumulated delay from the input of the first copy to the output of the Nth copy may then be less than or equal to (N-1)*time_delay, rather than the longer N*time_delay.
    Type: Application
    Filed: June 11, 2009
    Publication date: December 16, 2010
    Inventors: Ranganathan Sudhakar, Daryl Lieu, Debjit Das Sarma
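A minimal sketch of how per-instruction stack reorderings can be combined into one effect. It assumes that only FXCH-style swaps are modeled, each as a permutation of an 8-entry stack map. Pushes and pops, which move the top of stack, are omitted. The function names are hypothetical. Hardware would compute the combined permutation in dedicated control logic. The serial fold here only shows that composing the reorderings gives the same map as applying them one by one.

```python
def fxch(i, depth=8):
    """Stack-map permutation for FXCH ST(i): swap the entries for ST(0) and ST(i)."""
    perm = list(range(depth))
    perm[0], perm[i] = perm[i], perm[0]
    return perm


def apply_perm(perm, stack_map):
    # After the instruction, ST(k) names whatever the old ST(perm[k]) named.
    return [stack_map[j] for j in perm]


def combine(perms):
    """Fold several per-instruction permutations into one combined reordering."""
    total = list(range(len(perms[0])))
    for perm in perms:
        total = [total[j] for j in perm]
    return total
```

For the sequence FXCH ST(1), FXCH ST(3), FXCH ST(1), applying `combine(...)` once gives the same stack map as three sequential `apply_perm` calls.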
  • Patent number: 7849291
    Abstract: Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and to partition data stored in registers in the register file into multiple data elements. The execution unit is capable of executing a plurality of different group floating-point and group integer arithmetic operations, each of which arithmetically operates on multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file; the catenated result comprises a plurality of individual results. The execution unit is also capable of executing group data handling operations that rearrange data elements in different ways in response to data handling instructions.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: December 7, 2010
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris
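A minimal sketch of a group integer add that produces a catenated result. It assumes a 64-bit register partitioned into equal-width lanes, with wraparound arithmetic inside each lane. The function name and parameters are hypothetical. Each lane's sum is masked so that carries never cross lane boundaries. The individual results are then packed back into one register-width value.

```python
def group_add(a: int, b: int, lane_bits: int = 16, reg_bits: int = 64) -> int:
    """Lane-wise (group) add of two packed registers, wrapping within each lane.

    The per-lane sums are catenated back into one reg_bits-wide result.
    """
    mask = (1 << lane_bits) - 1
    result = 0
    for lane in range(reg_bits // lane_bits):
        shift = lane * lane_bits
        s = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (s & mask) << shift   # carries do not cross lane boundaries
    return result
```

Adding `0x0001_0002_0003_FFFF` and `0x0001_0001_0001_0001` in 16-bit lanes gives `0x0002_0003_0004_0000`. The lowest lane wraps to zero without disturbing its neighbor.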
  • Patent number: 7849294
    Abstract: Illustrative embodiments determine the data type of the operand being accessed and analyze the data value subrange of the input operand's data type. If the operand's data type does not match the format required by the instruction being processed, a determination is made as to whether the subrange of data values of the input operand's data type is supported natively. If that subrange is not supported natively, a format conversion is performed on the data, after which the instruction may operate on it. Otherwise, the instruction may operate on the data directly, and no format conversion is performed.
    Type: Grant
    Filed: January 31, 2008
    Date of Patent: December 7, 2010
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Brett Olsson
  • Publication number: 20100306510
    Abstract: Systems and methods are provided for single-cycle movement of data between a floating-point register file (FRF) and a general purpose or integer register file (IRF) of a microprocessor system. The system may include an integer execution unit operative to execute instructions with single-cycle latency, a floating-point execution unit, a working register file (WRF), an FRF, and an IRF. To achieve single-cycle movement, the integer execution unit may physically own the WRF, IRF, and FRF, and may monitor and control any dependencies between them. Because the integer execution unit has direct read access to both the IRF and the FRF, data may be moved between the two register files using a single-cycle operation of the integer execution unit, without storing the data to memory and loading it back.
    Type: Application
    Filed: June 2, 2009
    Publication date: December 2, 2010
    Applicant: Sun Microsystems, Inc.
    Inventors: Christopher Olson, Robert T. Golla, Jeffrey S. Brooks
  • Patent number: 7840788
    Abstract: A process that automatically inserts commands that test for and raise exceptions indicating floating point status exceptions into a sequence of instructions to be executed; reorders pipelined instructions by moving a floating point instruction from after a branch instruction to before it; and responds to exceptions in execution of the sequence by returning execution to a point at which correct state is known and then executing each instruction in the sequence singly to completion, so that exceptions in pipelined floating point instructions can be detected automatically and handled precisely.
    Type: Grant
    Filed: February 26, 2008
    Date of Patent: November 23, 2010
    Inventors: Guillermo J. Rozas, David Dunn, Robert F. Cmelik
  • Patent number: 7840622
    Abstract: A method for converting a hexadecimal floating point number (H) into a binary floating point number using a Floating Point Unit (FPU) with fused multiply-add, having an A-register and a B-register for the two multiplicand operands and a C-register for the addend operand, wherein a leading zero counting unit (LZC) is associated with the addend C-register, and wherein a control unit calculates the difference between the leading zero result provided by the LZC and the input exponent (E) and, based on the raw result exponent, determines a force signal (F) for special conditions such as ‘Exponent Overflow’, ‘Exponent Underflow’, and ‘Zero Result’.
    Type: Grant
    Filed: July 20, 2006
    Date of Patent: November 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: Guenter Gerwig, Klaus Michael Kroener
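The abstract describes an FMA-based hardware datapath. The sketch below uses a plain software conversion instead, to show the role of the leading-zero count. It assumes the IBM short hexadecimal format: a sign bit, a 7-bit excess-64 exponent that is a power of 16, and a 24-bit fraction 0.F1...F6. The target is binary32, with subnormals treated as underflow for simplicity. The function name and the condition strings are hypothetical labels modeled on the abstract's ‘Exponent Overflow’, ‘Exponent Underflow’, and ‘Zero Result’.

```python
import math


def hfp32_to_binary32(word: int):
    """Convert an IBM short hex float to a binary value using a leading-zero count.

    Returns (value, condition); condition is None, "zero result",
    "exponent overflow" or "exponent underflow" (binary32 normal range assumed).
    """
    sign = -1.0 if word >> 31 else 1.0
    hex_exp = ((word >> 24) & 0x7F) - 64       # excess-64 power of 16
    frac = word & 0xFFFFFF                      # six hex digits: 0.F1F2...F6
    if frac == 0:
        return math.copysign(0.0, sign), "zero result"
    lzc = 24 - frac.bit_length()                # leading zeros in the 24-bit fraction
    bin_exp = 4 * hex_exp - lzc - 1             # value = 1.m * 2**bin_exp
    if bin_exp > 127:
        return None, "exponent overflow"
    if bin_exp < -126:
        return None, "exponent underflow"       # subnormals not modeled
    significand = frac << lzc                   # leading 1 now at bit 23: 24 bits, exact in binary32
    return sign * math.ldexp(significand, bin_exp - 23), None
```

For example, `0x41100000` (exponent 16^1, fraction 1/16) converts to 1.0, and `0x42640000` converts to 100.0. Once the LZC has normalized the fraction, it fits binary32's 24-bit significand exactly. Only the exponent range can fail.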
  • Publication number: 20100281239
    Abstract: A system and method for efficient reliable execution on a simultaneous multithreading machine. A processor is placed in a reliable execution mode (REM) to detect possible errors during execution of a mission-critical software application. Only two threads may be configured to operate in this mode. Floating-point store and integer-transfer unary instructions may be converted to new binary instructions. Each new instruction has two source operands, each corresponding to a different thread and specified by the same logical register number as the single source operand of the original unary instruction. All other instructions are replicated, with the original instruction and its twin assigned to different threads. Simultaneous multithreaded (SMT) floating-point logic may be able to provide lockstep execution only when it communicates with instantiated independent integer clusters using the new instructions.
    Type: Application
    Filed: April 29, 2009
    Publication date: November 4, 2010
    Inventors: Ranganathan Sudhakar, Nhon T. Quach
  • Patent number: 7826935
    Abstract: A controller with a processing unit and a floating-point processor is disposed in a vehicle to control an actuator according to detected parameters sent from sensors. Using floating-point calculations, the floating-point processor generates from the detected parameters a floating-point control parameter indicating a first quantity of control; using fixed-point calculations, the processing unit generates from the same parameters a fixed-point control parameter indicating a second quantity of control. The processing unit converts the floating-point control parameter into a converted control parameter in fixed-point representation and determines from both the converted control parameter and the fixed-point control parameter whether a failure has occurred in the floating-point processor. When no failure has occurred, the processing unit generates a control signal from the floating-point control parameter. The actuator is operated according to the control signal so as to apply the first quantity of control to a controlled object.
    Type: Grant
    Filed: April 5, 2007
    Date of Patent: November 2, 2010
    Assignee: Denso Corporation
    Inventors: Hidetoshi Kobayashi, Eiji Takayama
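A minimal sketch of the cross-check, under several hypothetical assumptions. The control law is a simple gain multiplied by an error, and the fixed-point format has 16 fractional bits. Agreement is judged within a tolerance of a few least significant bits, since fixed-point truncation and rounding can differ by about one LSB. The function names and the tolerance are illustrative. The abstract does not say what the controller does after detecting a failure, so the sketch only returns a flag.

```python
FRAC_BITS = 16                 # hypothetical fixed-point format: 16 fractional bits
SCALE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    return round(x * SCALE)


def fixed_mul(a_q: int, b_q: int) -> int:
    return (a_q * b_q) >> FRAC_BITS   # arithmetic shift: truncates toward -infinity


def checked_control(error: float, gain: float, fpu_result: float, tolerance_lsb: int = 2):
    """Return (control value or None, fpu_failed).

    fpu_result is the control quantity reported by the FP processor (gain * error);
    the processing unit recomputes it in fixed point and compares the two.
    """
    fixed_result = fixed_mul(to_fixed(error), to_fixed(gain))
    if abs(to_fixed(fpu_result) - fixed_result) > tolerance_lsb:
        return None, True             # failure handling is outside this sketch
    return fpu_result, False
```

With `error = 0.75` and `gain = 1.5`, a correct FPU result of 1.125 passes the check. A corrupted result such as 1.2 differs by thousands of LSBs and is flagged.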
  • Publication number: 20100274993
    Abstract: Techniques and structures are described which allow the detection of certain dependency conditions, including evil twin conditions, during the execution of computer instructions. Information used to detect dependencies may be stored in a logical map table, which may include a content-addressable memory. The logical map table may maintain a logical register to physical register mapping, including entries dedicated to physical registers available as rename registers. In one embodiment, each entry in the logical map table includes a first value usable to indicate whether only a portion of the physical register is valid and whether the physical register includes the most recent update to the logical register being renamed. Use of this first value may allow precise detection of dependency conditions, including evil twin conditions, upon an instruction reading from at least two portions of a logical register having an entry in the logical map table whose first value is set.
    Type: Application
    Filed: April 22, 2009
    Publication date: October 28, 2010
    Inventors: Robert T. Golla, Jama I. Barreh, Jeffrey S. Brooks, Howard L. Levy
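A minimal sketch of the detection idea, assuming each logical double register `d` is built from two single-precision halves. Each map table entry records the newest physical register plus a marker of which half, if only one, is valid in it. That marker plays the role of the abstract's "first value". The class and method names are hypothetical, and the content-addressable lookup is reduced to a dictionary.

```python
class LogicalMapTable:
    """Hypothetical rename table for double registers made of two single halves."""

    def __init__(self):
        self.entries = {}       # logical double d -> (physical reg, valid_half or None)
        self.next_phys = 0

    def _allocate(self):
        self.next_phys += 1
        return self.next_phys

    def write_single(self, d, half):
        # A single-precision write renames d, but the new physical copy
        # holds only `half` (0 or 1); the other half lives elsewhere.
        self.entries[d] = (self._allocate(), half)

    def write_double(self, d):
        # A full-width write makes the newest copy complete again.
        self.entries[d] = (self._allocate(), None)

    def read_double(self, d):
        """True if reading both halves of d would hit an evil twin condition."""
        _, valid_half = self.entries.get(d, (None, None))
        return valid_half is not None
```

A double-precision read after a single-precision write to either half is flagged. After a later full-width write, the flag clears.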
  • Publication number: 20100274992
    Abstract: Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register whose two portions are single-precision portions, each specified as a destination by one of two other single-precision instructions. Executing these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of the instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted into the instruction stream so that the appropriate values are stored within one physical double-precision register, eliminating an actual or potential evil twin dependency.
    Type: Application
    Filed: April 22, 2009
    Publication date: October 28, 2010
    Inventors: Yuan C. Chou, Jared C. Smolens, Jeffrey S. Brooks