Abstract: Methods for latest producer tracking in a processor. In one embodiment, the method includes the steps of (1) writing a physical register identification value in a first register rename map location specified by a first instruction, (2) writing a first in-register status value in a second register rename map location specified by the first instruction, (3) writing a producer tracking status value at a producer tracking map location specified by the physical register identification value, and (4) modifying, upon graduation of the first instruction, the first in-register status value only if the producer tracking map location stores the producer tracking status value written in step (3). Other methods are also presented.
Abstract: In a logically partitioned host computer system comprising host processors (host CPUs), a facility and instruction for discovering topology of one or more guest processors (guest CPUs) of a guest configuration comprises a guest processor of the guest configuration fetching and executing a STORE SYSTEM INFORMATION instruction that obtains topology information of the computer configuration. The topology information comprising nesting information of processors of the configuration and the degree of dedication a host processor provides to a corresponding guest processor. The information is preferably stored in a single table in memory.
Type:
Grant
Filed:
January 11, 2008
Date of Patent:
June 8, 2010
Assignee:
International Business Machines Corporation
Inventors:
Mark S. Farrell, Charles W. Gainey, Jr., Jeffrey P. Kubala, Donald W. Schmidt
Abstract: A data processing system includes a plurality of functional units that selectively execute instructions. A register file includes a plurality of registers that store data corresponding to the instructions. A reorder buffer communicates with the register file and stores the data, includes at least one bypassable buffer location, and includes at least one non-bypassable buffer location.
Type:
Grant
Filed:
August 2, 2006
Date of Patent:
June 1, 2010
Assignee:
Marvell International Ltd.
Inventors:
Hong-Yi Chen, Richard Lee, Geoffrey K. Yung, Jensen Tjeng
Abstract: A processing element (1) forming part of a parallel processing array such as SIMD comprises an arithmetic logic unit (ALU) (3), a multiplexer (MUX) (5), an accumulator (ACCU) (7) and a flag register (FLAG) (9). The ALU is configured to operate on a common instruction received by all processing elements in the processing array. The processing element (1) further comprises a storage element (SE) (11), which supports the processing of local customized (i.e. data dependent) processing in the processing element (1), such as lookup table operations and the storing local coefficient data.
Type:
Grant
Filed:
August 3, 2004
Date of Patent:
May 25, 2010
Assignee:
NXP B.V.
Inventors:
Om Prakash Gangwal, Anteneh Alemu Abbo, Richard Petrus Kleihorst
Abstract: An operation apparatus includes a plurality of operation device units; a configuration memory storing setting information provided for each predetermined state of the plurality of operation device units; and a sequencer controlling the plurality of operation device units by outputting transition destination addresses designating relevant information from configuration information comprising the setting information provided for each state of the operation device units stored in the configuration memory, wherein the sequencer carries out operation based on task information previously loaded and a change-over condition signal output from the plurality of operation device units, and generates the transition destination address to output to the configuration memory.
Abstract: A computing system may support an endian toggle register (ETR) and the endianess of the endian toggle register may be designated using a set endian bit (SEB) or a clear endian bit (CEB) instruction. An endian conversion is performed on the data that is moved into and moved out of the ETR. However, if the destination memory is an endian toggle disabled memory, the contents of the ETR may be transferred to the endian toggle disabled memory without performing the endian conversion. A compiler supported on the computing system may comprise an endian storage class to perform endian conversion, transparently, using high-level languages.
Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.
Abstract: A reconfigurable processor which is capable of carrying out or continuing processing even after occurrence of an error in a data processing unit within the reconfigurable processor. The reconfigurable processor has a processing element matrix comprised of a plurality of processing elements. The reconfigurable processor reconfigures the processing element matrix according to an error of the processing element matrix.
Abstract: A configurable coprocessor interface between a central processing unit (CPU) and a coprocessor is provided. The coprocessor interface has an instruction transfer signal group for transferring different instruction types from the CPU to the coprocessor, sequentially or in parallel, a busy signal group, for allowing the coprocessor to signal the CPU that it cannot receive a transfer of one or more of the different instruction types, and an instruction order signal group for indicating to the coprocessor a relative execution order for multiple instructions that are transferred in parallel. In addition, the coprocessor interface includes separate data transfer signal groups for data being transferred from the CPU to the coprocessor, and for data being transferred from the coprocessor to the CPU, along with a data order signal group for indicating a relative order of data (if transferred out-of-order).
Type:
Grant
Filed:
February 14, 2007
Date of Patent:
April 13, 2010
Assignee:
MIPS Technologies, Inc.
Inventors:
Lawrence Henry Hudepohl, Darren Miller Jones, Radhika Thekkath, Franz Treue
Abstract: Method and apparatus for configuring a processor embedded in an integrated circuit for use as a logic element is described. In one example, a processing apparatus in an integrated circuit includes a point-to-point data streaming interface and arithmetic logic unit (ALU) circuitry. The ALU circuitry includes at least one input port in communication with the point-to-point data streaming interface. The processor may also include a register file and multiplexer logic. The multiplexer logic is configured to selectively couple the register file and the point-to-point streaming interface to the at least one input port of the ALU circuitry.
Abstract: A method includes providing a debug instruction and providing a debug control register field, where if the debug control register field has a first value, the debug instruction executes a debug operation and where if the debug control register field has a second value, the debug instruction is to be executed as a no-operation (NOP) instruction. A data processing system includes instruction fetch circuitry for receiving a debug instruction, a debug control register field, and debug execution control circuitry for controlling execution of the debug instruction in a first manner if the debug control register field has a first value and in a second manner if the debug control register field has a second value, where in the first manner a debug operation is performed and in the second manner no debug operation is performed.
Type:
Grant
Filed:
October 12, 2007
Date of Patent:
March 30, 2010
Assignee:
Freescale Semiconductor, Inc
Inventors:
William C. Moyer, Michael D. Snyder, Gary L. Whisenhunt
Abstract: Reference architecture instructions are translated into target architecture operations. In some embodiments, an execution unit of a processor executes a function determined from a collection of operations, the function specifying functionality based on instructions, the collection selected from operations translated from the instructions. In further embodiments, the function is specified as a fused operation. Sequences of operations are optimized by fusing collections of operations; fused operations specify a same observable function as respective collections, but advantageously enable more efficient processing. In some embodiments, a collection comprises multiple register operations. Sequences of operations, in a predicted execution order in some embodiments, form traces. In some embodiments, fusing operations requires setting only final architectural state, such as final flag state; intermediate architectural state is used implicitly in a fused operation.
Abstract: Instruction dispatch in a multithreaded microprocessor such as a graphics processor is not constrained by an order among the threads. Instructions for each thread are fetched, and a dispatch circuit determines which instructions in the buffer are ready to execute. The dispatch circuit may issue any ready instruction for execution, and an instruction from one thread may be issued prior to an instruction from another thread regardless of which instruction was fetched first. If multiple functional units are available, multiple instructions can be dispatched in parallel.
Type:
Grant
Filed:
October 10, 2006
Date of Patent:
March 9, 2010
Assignee:
NVIDIA Corporation
Inventors:
John Erik Lindholm, Brett Coon, Simon S. Moy
Abstract: A processor having a zero-overhead operand copy capability. The processor includes multiple execution units to execute instructions in parallel and multiple register files each associated with one or more of the execution units. The processor further includes circuitry to select either an instruction execution result from a first one of the execution units or content of a register within a first one of the register files associated with the first one of the execution units to be stored within a register within a second one of the register files.
Abstract: A system that executes a long transaction in a system with limited transactional hardware resources. During operation, the system executes the long transaction in a non transactional mode, which does not use transactional hardware resources. The system defers stores generated during the long transaction so that the stores are not committed to the architectural state of a processor until the transaction is successfully completed. If the long transaction successfully completes, the system commits the long transaction, which involves performing multiple hardware transactions to commit the deferred stores to the architectural state of the processor.
Abstract: Intermediate results are passed between constituent instructions of an expanded instruction using register renaming resources and control logic. A first constituent instruction generates intermediate results and is assigned a PRN in a constituent instruction rename table, and writes intermediate results to the identified physical register. A second constituent instruction performs a look up in the constituent instruction rename table and reads the intermediate results from the physical register. Constituent instruction rename logic tracks the constituent instructions through the pipeline, and delete the constituent instruction rename table entry and returns the PRN to a free list when the second constituent instruction has read the intermediate results.
Type:
Grant
Filed:
January 24, 2007
Date of Patent:
February 23, 2010
Assignee:
QUALCOMM Incorporated
Inventors:
Michael Scott McIlvaine, James Norris Dieffenderfer, Nathan Samuel Nunamaker, Thomas Andrew Sartorius, Rodney Wayne Smith
Abstract: In a performance analyzing apparatus, a setting unit sets an event of which the performance is desired to be monitored, a detecting unit detects an instruction address at the time of generation of an interrupt signal from a timer, and a calculating unit calculates a variation amount of a counted value by a hardware counter at a detected instruction address. The variation amount is accumulatively retained for each detected instruction address. A specifying unit specifies an instruction address that corresponds to the event, and a display unit displays a graph of the total variation amounts.
Abstract: A method and apparatus for steering instructions dynamically, at issue time, so as to maximize the efficiency of use of execution units being shared by multiple threads being processed by an SMT processor. Resource vectors are used at issue time to redirect instructions, from threads being processed simultaneously, to shared resources for which the multiple threads are competing. The existing resource vectors for instructions that are queued for issuance are analyzed and, where appropriate, dynamically recalculated and modified for maximum efficiency.
Type:
Grant
Filed:
January 14, 2008
Date of Patent:
January 19, 2010
Assignee:
International Business Machines Corporation
Inventors:
Hung Qui Le, Dung Quoc Nguyen, Brian William Thompto, Raymond Cheung Yeung
Abstract: In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledge indication is asserted that corresponds to an identified replay case of the subset.
Type:
Grant
Filed:
October 10, 2006
Date of Patent:
January 12, 2010
Assignee:
Apple Inc.
Inventors:
Po-Yung Chang, Wei-Han Lien, Jesse Pan, Ramesh Gunna, Tse-Yu Yeh, James B. Keller