Abstract: In one embodiment, a micro-processing system includes a hardware structure disposed on a processor core. The hardware structure includes a plurality of entries, each of which is associated with a portion of code and with a translation of that code that can be executed to achieve substantially equivalent functionality. The hardware structure includes a redirection array that, when referenced, enables execution to be redirected from a portion of code to its counterpart translation. The entries enabling such redirection are maintained within or evicted from the hardware structure based on usage information for the entries.
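A minimal Python sketch of the redirection idea, assuming the table maps a code address to a translation address and that "usage information" is approximated by least-recently-used ordering. The class and method names are hypothetical, and LRU is a stand-in policy rather than the one the patent specifies.

```python
from collections import OrderedDict

class RedirectionTable:
    """Hypothetical fixed-size table: code address -> translation address.
    Eviction uses LRU order as an assumed stand-in for usage information."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[int, int]" = OrderedDict()

    def insert(self, code_addr: int, translation_addr: int) -> None:
        if code_addr in self.entries:
            self.entries.move_to_end(code_addr)
        self.entries[code_addr] = translation_addr
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def redirect(self, code_addr: int) -> int:
        """Return the address at which execution should continue."""
        target = self.entries.get(code_addr)
        if target is None:
            return code_addr  # no translation: run the original code
        self.entries.move_to_end(code_addr)  # record this use
        return target
```

Referencing an entry refreshes it, so frequently redirected code keeps its translation while cold entries age out.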
Abstract: A hardware-based translation accelerator. The hardware includes a guest fetch logic component for accessing guest instructions; a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling guest instructions into a guest instruction block; and conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block. The hardware further includes a native cache, coupled to the conversion tables, for storing the corresponding native conversion block, and a conversion look aside buffer, coupled to the native cache, for storing a mapping of the guest instruction block to the corresponding native conversion block. Upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred; on a hit, the mapping indicates that the guest instruction has a corresponding converted native instruction in the native cache.
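The lookup path can be sketched in Python as follows. This assumes guest blocks are keyed by their start address and uses a toy opcode dictionary in place of the conversion tables. `CONVERSION_TABLE`, the opcode names, and the `clb` field are all hypothetical.

```python
from typing import Dict, List

# Hypothetical guest-to-native opcode table standing in for the conversion tables.
CONVERSION_TABLE: Dict[str, str] = {"g_add": "n_add", "g_load": "n_ld", "g_jmp": "n_br"}

class TranslationAccelerator:
    def __init__(self) -> None:
        self.native_cache: List[List[str]] = []  # stored native conversion blocks
        self.clb: Dict[int, int] = {}            # guest block address -> native cache index
        self.hits = 0
        self.misses = 0

    def fetch(self, guest_addr: int, guest_block: List[str]) -> List[str]:
        index = self.clb.get(guest_addr)
        if index is not None:  # look aside buffer hit: reuse the cached native block
            self.hits += 1
            return self.native_cache[index]
        self.misses += 1       # miss: convert the block and record the mapping
        native_block = [CONVERSION_TABLE[op] for op in guest_block]
        self.native_cache.append(native_block)
        self.clb[guest_addr] = len(self.native_cache) - 1
        return native_block
```

Only the first request for a block pays the conversion cost. Later requests are served through the mapping.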
Abstract: A multi-thread processor includes: a plurality of hardware threads, each of which generates an independent instruction flow; a first thread scheduler that outputs a thread selection signal designating the hardware thread, from among the plurality of hardware threads, to be executed in the next execution cycle, the signal being output continuously and uniformly either in a first period of each cycle of a first schedule pattern, in accordance with that pattern, or in a second period of each cycle of a second schedule pattern, in accordance with that pattern; a first selector that selects one of the plurality of hardware threads according to the thread selection signal and outputs an instruction generated by the selected hardware thread; and an execution pipeline that executes the instruction output from the first selector.
Abstract: In a multi-processor transaction execution environment, a transaction executes a hint instruction indicating proximity to completion of the transaction. Pending aborts of the transaction due to memory conflicts are suppressed based on the proximity of the transaction to completion.
Type:
Grant
Filed:
March 2, 2014
Date of Patent:
December 20, 2016
Assignee:
International Business Machines Corporation
Inventors:
Jonathan D. Bradbury, Dan F. Greiner, Michael Karl Gschwind, Maged M. Michael, Chung-Lung K. Shum
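A minimal Python sketch of the abort-suppression decision described above. It assumes the hint instruction records how many instructions remain (a hypothetical encoding) and that "close to completion" means at most `NEAR_COMPLETION` instructions. Both are assumptions, not details from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

NEAR_COMPLETION = 8  # assumed: "close to completion" means <= 8 instructions left

@dataclass
class Transaction:
    # Hypothetical: the hint instruction stores an estimate of instructions remaining.
    remaining_hint: Optional[int] = None
    aborted: bool = False

def on_memory_conflict(tx: Transaction) -> str:
    """Decide whether a pending abort caused by a memory conflict goes ahead."""
    if tx.remaining_hint is not None and tx.remaining_hint <= NEAR_COMPLETION:
        return "suppressed"  # let the nearly finished transaction commit
    tx.aborted = True
    return "aborted"
```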
Abstract: Systems and methods for managing context switches among threads in a processing system. A processor may perform a context switch between threads using separate context registers. A context switch allows a processor to switch from processing a thread that is waiting for data to one that is ready for additional processing. The processor includes control registers with entries that may indicate that an associated context is waiting for data from an external source.
Type:
Grant
Filed:
May 1, 2015
Date of Patent:
December 13, 2016
Assignee:
ARM Finance Overseas Limited
Inventors:
Robert Gelinas, W. Patrick Hays, Sol Katzman, William J. Dally
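The switching decision can be sketched in Python as follows. The control-register entries are modeled as one "waiting" flag per context, and the next runnable context is chosen in round-robin order. Round-robin and the name `next_context` are assumptions made for illustration.

```python
from typing import List, Optional

def next_context(current: int, waiting: List[bool]) -> Optional[int]:
    """Pick the next context to run, skipping contexts whose control-register
    entry marks them as waiting for external data (round-robin is assumed)."""
    n = len(waiting)
    for step in range(1, n + 1):  # the current context is considered last
        candidate = (current + step) % n
        if not waiting[candidate]:
            return candidate
    return None  # every context is stalled on external data
```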
Abstract: A domain-specific hardwired symbolic machine is disclosed that processes information through the flexible formation and hardwired mapping of symbols from one or more domains onto other such domains. It computes and communicates with improved security because it has no CPU, no Random Access Memory (RAM), no instruction registers, no Instruction Set Architecture (ISA), no operating system (OS) and no applications programming. The machine may learn, e.g. from its users, via hardwired analysis of domain faults with associated recovery. The machine may modify itself through interaction with its authorized, authenticated users, learning within application-specific and user-specific constraints hardwired into the original machine, which eliminates configuration management and computer programming.
Abstract: A data processing apparatus and method of data processing are provided. The data processing apparatus comprises execution circuitry configured to execute a sequence of program instructions. Checkpoint circuitry is configured to identify an instance of a predetermined type of instruction in the sequence of program instructions and to store checkpoint information associated with that instance. The checkpoint information identifies a state of the data processing apparatus prior to execution of that instance, and the predetermined type of instruction is one that has an expected long completion latency.
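A minimal Python sketch of checkpointing before long-latency instructions. It assumes a toy three-opcode machine in which `div` is the "predetermined type" and a divide-by-zero fault triggers a rollback. The opcodes, the `execute` function, and the fault model are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

LONG_LATENCY_OPS = {"div"}    # assumed "predetermined type" of instruction
Instr = Tuple[str, str, int]  # (opcode, destination register, immediate operand)

def execute(program: List[Instr], regs: Dict[str, int]) -> Tuple[Dict[str, int], Optional[int]]:
    """Run the program. If an instruction faults, restore the most recent checkpoint
    and return (restored registers, pc of the checkpointed instruction)."""
    checkpoint: Optional[Tuple[int, Dict[str, int]]] = None
    for pc, (op, dst, imm) in enumerate(program):
        if op in LONG_LATENCY_OPS:
            checkpoint = (pc, dict(regs))  # state *before* the long-latency instruction
        try:
            if op == "mov":
                regs[dst] = imm
            elif op == "add":
                regs[dst] += imm
            elif op == "div":
                regs[dst] //= imm  # raises ZeroDivisionError when imm == 0
        except ZeroDivisionError:
            if checkpoint is None:
                raise
            return dict(checkpoint[1]), checkpoint[0]
    return regs, None
```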
Abstract: An aspect includes implementing endian-mode-sensitive memory instructions for a vector processor. One such system includes a byte addressable memory and a processor. The processor includes a register that includes a plurality of byte elements 0 to S. The system is configured to perform a method that includes obtaining an instruction by the processor and determining that the instruction is a memory access instruction specifying the register and a memory address. In response to the determination that the instruction is a memory access instruction and independent of a current global endian mode setting that is selectable in the processor, the memory access instruction is executed by copying the byte data between the memory and the register so that byte element n of the register corresponds to memory address + n for n = 0 to S.
Type:
Grant
Filed:
February 28, 2014
Date of Patent:
November 29, 2016
Assignee:
International Business Machines Corporation
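The byte-element mapping can be sketched in Python, assuming a 16-byte register (S = 15, an assumption). The load and store never consult an endian mode. For contrast, a 4-byte element load via `int.from_bytes` gives a different value depending on byte order. The function names are hypothetical.

```python
from typing import List

S = 15  # assumed: a 16-byte vector register holds byte elements 0..S

def vector_load(memory: bytes, addr: int) -> List[int]:
    """Byte element n of the register receives memory[addr + n] for n = 0..S.
    No global endian-mode setting is consulted."""
    return [memory[addr + n] for n in range(S + 1)]

def vector_store(memory: bytearray, addr: int, reg: List[int]) -> None:
    """Inverse of vector_load: memory[addr + n] receives byte element n."""
    for n in range(S + 1):
        memory[addr + n] = reg[n]

def word_load(memory: bytes, addr: int, big_endian: bool) -> int:
    """For contrast: a 4-byte element load whose value depends on byte order."""
    return int.from_bytes(memory[addr:addr + 4], "big" if big_endian else "little")
```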
Abstract: A unified architecture for dynamic generation, execution, synchronization and parallelization of complex instruction formats includes a virtual register file, register cache and register file hierarchy. A self-generating and synchronizing dynamic and static threading architecture provides efficient context switching.
Abstract: A multi-thread processor including a plurality of hardware threads each of which generates an independent instruction flow, a thread scheduler that outputs a thread selection signal in accordance with a schedule, the thread selection signal designating a hardware thread to be executed in a next execution cycle among the plurality of hardware threads, and a first selector that selects one of the plurality of hardware threads according to the thread selection signal and outputs an instruction generated by the selected hardware thread. The thread scheduler specifies execution of at least one hardware thread pre-selected among the plurality of hardware threads in a predetermined first execution period, and specifies execution of a variably selected hardware thread in a second execution period other than the first execution period. A time ratio between the predetermined first execution period and the second execution period is set according to processing requests.
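A minimal Python sketch of one scheduling cycle with a fixed first period and a variable second period. It assumes the time ratio is given as the fraction `fixed_ratio` of the cycle and that the variable period round-robins over the remaining threads. Both policies and the function name are assumptions.

```python
from itertools import cycle
from typing import List

def build_schedule(fixed_threads: List[int], other_threads: List[int],
                   cycle_len: int, fixed_ratio: float) -> List[int]:
    """Return one cycle of thread IDs. The first period (fixed_ratio of the cycle)
    runs the pre-selected threads in turn; the second period round-robins over
    the other threads (an assumed variable-selection policy)."""
    fixed_slots = round(cycle_len * fixed_ratio)
    fixed_iter, other_iter = cycle(fixed_threads), cycle(other_threads)
    schedule = [next(fixed_iter) for _ in range(fixed_slots)]
    schedule += [next(other_iter) for _ in range(cycle_len - fixed_slots)]
    return schedule
```

Changing `fixed_ratio` in response to processing requests shifts slots between the guaranteed threads and the rest without changing the cycle length.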
Abstract: Multiple sets of character data having termination characters are compared using parallel processing and without causing unwarranted exceptions. Each set of character data to be compared is loaded within one or more vector registers. In particular, in one embodiment, for each set of character data to be compared, an instruction is used that loads data in a vector register to a specified boundary, and provides a way to determine the number of characters loaded. Further, an instruction is used to find the index of the first delimiter character, i.e., the first zero or null character, or the index of unequal characters. Using these instructions, a location of the end of one of the sets of data or a location of an unequal character is efficiently provided.
Type:
Grant
Filed:
December 4, 2014
Date of Patent:
October 25, 2016
Assignee:
International Business Machines Corporation
Inventors:
Jonathan D. Bradbury, Michael K. Gschwind, Timothy J. Slegel
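The comparison loop can be sketched in Python under stated assumptions. A 16-byte vector width and a 64-byte block boundary are assumed values, and the two helpers only imitate the behavior of the instructions described above. They are not the real instruction semantics.

```python
from typing import Tuple

VEC_BYTES = 16  # assumed vector register width
BLOCK = 64      # assumed boundary a load must not cross

def load_to_boundary(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Load up to VEC_BYTES bytes without crossing the next BLOCK boundary,
    and report how many bytes were actually loaded."""
    chunk = buf[pos:pos + min(VEC_BYTES, BLOCK - pos % BLOCK)]
    return chunk, len(chunk)

def find_null_or_mismatch(a: bytes, b: bytes) -> int:
    """Index of the first zero byte in `a` or first unequal byte; len(a) if none."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x == 0 or x != y:
            return i
    return len(a)

def compare_strings(s1: bytes, s2: bytes) -> int:
    """Compare two NUL-terminated byte strings one chunk at a time.
    Returns a negative, zero or positive value, like C's strcmp."""
    pos = 0
    while True:
        c1, n1 = load_to_boundary(s1, pos)
        c2, n2 = load_to_boundary(s2, pos)
        n = min(n1, n2)
        if n == 0:
            raise ValueError("string is missing its NUL terminator")
        i = find_null_or_mismatch(c1[:n], c2[:n])
        if i < n:
            return s1[pos + i] - s2[pos + i]
        pos += n
```

Because no load crosses a block boundary, the loop never touches bytes beyond the block that holds the terminator. This is the property that avoids unwarranted exceptions.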
Abstract: Systems, processors, and methods for determining when to enter loop buffer mode early for loops in an instruction stream. A processor waits until a branch history register has saturated before entering loop buffer mode for a loop if the processor has not yet determined that the loop has an unpredictable exit. However, if the loop has an unpredictable exit, then the loop is allowed to enter loop buffer mode early. While in loop buffer mode, the loop is dispatched from a loop buffer, and the front end of the processor is powered down until the loop terminates.
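The entry rule can be sketched as a small Python state machine. It assumes the branch history register saturates after `bhr_width` recorded iterations, and that some separate predictor reports an unpredictable exit through `flag_unpredictable_exit`. Both the class and that signal are hypothetical.

```python
class LoopBufferController:
    """Hypothetical per-loop controller. Assumes the BHR saturates after
    `bhr_width` iterations and another unit flags unpredictable exits."""

    def __init__(self, bhr_width: int) -> None:
        self.bhr_width = bhr_width
        self.iterations_seen = 0
        self.unpredictable_exit = False
        self.in_loop_buffer_mode = False

    def flag_unpredictable_exit(self) -> None:
        self.unpredictable_exit = True
        self._update()

    def record_iteration(self) -> None:
        self.iterations_seen = min(self.iterations_seen + 1, self.bhr_width)
        self._update()

    def _update(self) -> None:
        saturated = self.iterations_seen == self.bhr_width
        if saturated or self.unpredictable_exit:
            # Dispatch from the loop buffer; the front end can be powered down.
            self.in_loop_buffer_mode = True
```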
Abstract: A processor includes a plurality of execution units. At least one of the execution units is configured to determine, based on a field of a first instruction, a number of additional instructions to execute in conjunction with the first instruction and prior to execution of the first instruction.
Abstract: A transactional memory system dynamically predicts the resource requirements of hardware transactions. A processor of the transactional memory system predicts resource requirements of a first hardware transaction to be executed based on any one of a resource hint and a previous execution of a prior hardware transaction. The processor allocates resources for the first hardware transaction based on the predicted resource requirements. The processor executes the first hardware transaction. The processor saves resource usage information of the first hardware transaction for future prediction.
Type:
Grant
Filed:
February 27, 2014
Date of Patent:
October 18, 2016
Assignee:
International Business Machines Corporation
Inventors:
Fadi Y. Busaba, Dan F. Greiner, Michael K. Gschwind, Maged M. Michael, Valentina Salapura, Chung-Lung K. Shum
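A minimal Python sketch of the predict, allocate, and record loop. It assumes the resource is counted in cache lines, that transactions are identified by their start address, and that an explicit hint takes precedence over history. All three are assumptions, as is the default allocation.

```python
from typing import Dict, Optional

class ResourcePredictor:
    """Hypothetical predictor of how many cache lines a hardware transaction needs."""
    DEFAULT = 16  # assumed allocation when neither a hint nor history exists

    def __init__(self) -> None:
        self.history: Dict[int, int] = {}  # transaction start address -> lines used last time

    def predict(self, tx_addr: int, hint: Optional[int] = None) -> int:
        if hint is not None:
            return hint  # assumed precedence: an explicit resource hint wins
        return self.history.get(tx_addr, self.DEFAULT)

    def record(self, tx_addr: int, used: int) -> None:
        """Save this execution's usage for future predictions."""
        self.history[tx_addr] = used
```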
Abstract: An integrated circuit comprising a set of data processing units, including a first data processing unit and at least one second data processing unit operable at variable frequencies, is disclosed. The integrated circuit further includes an instruction scheduler adapted to evaluate data dependencies between individual instructions in a received plurality of instructions and to assign the instructions to the first data processing unit and the at least one second data processing unit for parallel execution in accordance with those data dependencies. The integrated circuit is operable in a first power mode and a second power mode. The second power mode is a reduced power mode compared to the first, and in the second power mode the integrated circuit is adapted to adjust the operating frequencies of the first data processing unit and the at least one second data processing unit as a function of the evaluated data dependencies.
Type:
Grant
Filed:
March 3, 2014
Date of Patent:
October 11, 2016
Assignee:
NXP B.V.
Inventors:
Hamed Fatemi, Jose Pineda de Gyvez, Juan Echeverri Escobar
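One possible reading, sketched in Python: the longest dependency chain runs on the first unit at full frequency, the remaining instructions go to a second unit, and that second unit is slowed just enough to finish within the chain's time. This frequency policy is an assumption for illustration, and it ignores dependencies among the off-chain instructions.

```python
from typing import List, Tuple

def longest_chain(deps: List[List[int]]) -> List[int]:
    """deps[i] lists earlier instructions that instruction i depends on.
    Returns the indices on one longest dependency chain."""
    depth = [0] * len(deps)
    parent = [-1] * len(deps)
    for i, ds in enumerate(deps):
        for d in ds:
            if depth[d] + 1 > depth[i]:
                depth[i], parent[i] = depth[d] + 1, d
    i = max(range(len(deps)), key=depth.__getitem__)
    chain = []
    while i != -1:
        chain.append(i)
        i = parent[i]
    return chain[::-1]

def scaled_frequencies(deps: List[List[int]], f_max: float) -> Tuple[float, float]:
    """Assumed policy: first unit runs the chain at f_max; the second unit runs
    the off-chain instructions at the lowest frequency that keeps pace."""
    chain = longest_chain(deps)
    off_chain = len(deps) - len(chain)
    f_second = min(f_max, f_max * off_chain / len(chain))
    return f_max, f_second
```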
Abstract: In one embodiment, the present invention includes a processor that has on-die storage, such as a static random access memory, to store an architectural state of one or more threads that are swapped out of architectural state storage of the processor on entry to a system management mode (SMM). In this way, communication of this state information to a system management memory can be avoided, reducing the latency associated with entry into SMM. Embodiments may also enable the processor to update a status of executing agents that are either in a long instruction flow or in a system management interrupt (SMI) blocked state, in order to provide an indication to agents inside the SMM. Other embodiments are described and claimed.
Type:
Grant
Filed:
October 8, 2013
Date of Patent:
October 11, 2016
Assignee:
Intel Corporation
Inventors:
Mahesh Natu, Thanunathan Rangarajan, Gautam Doshi, Shamanna M. Datta, Baskaran Ganesan, Mohan J. Kumar, Rajesh S. Parthasarathy, Frank Binns, Rajesh Nagaraja Murthy, Robert C. Swanson
Abstract: A deferral instruction associated with a transaction is executed in a transaction execution computing environment with transactional memory. Based on executing the deferral instruction, a processor sets a defer-state indicating that pending disruptive events such as interrupts or conflicting memory accesses are to be deferred. A pending disruptive event is deferred based on the set defer-state, and the transaction is completed based on the disruptive event being deferred. The progress of the transaction may be monitored during a deferral period. The length of such deferral period may be specified by the deferral instruction. Whether the deferral period has expired may be determined based on the monitored progress of the transaction. If the deferral period has expired, the transaction may be aborted and the disruptive event may be processed.
Type:
Grant
Filed:
February 27, 2014
Date of Patent:
October 11, 2016
Assignee:
International Business Machines Corporation
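A minimal Python model of the defer-state. It assumes the deferral instruction expresses the deferral period as a count of retired instructions, and that retired instructions measure progress. Both are assumptions, and the class and method names are hypothetical.

```python
from typing import List, Optional

class DeferringTransaction:
    """Hypothetical model: the deferral period is a count of retired instructions."""

    def __init__(self) -> None:
        self.deferral_period: Optional[int] = None  # None: defer-state not set
        self.progress = 0
        self.pending: List[str] = []
        self.status = "running"

    def deferral_instruction(self, period: int) -> None:
        self.deferral_period = period  # set the defer-state
        self.progress = 0

    def disruptive_event(self, event: str) -> None:
        self.pending.append(event)
        if self.deferral_period is None:
            self.status = "aborted"  # no defer-state: the event is handled now

    def retire(self, count: int = 1) -> None:
        if self.status != "running":
            return
        self.progress += count
        expired = self.deferral_period is not None and self.progress > self.deferral_period
        if self.pending and expired:
            self.status = "aborted"  # deferral period expired: abort the transaction

    def finish(self) -> List[str]:
        """Commit if still running, then hand back the events to be processed."""
        if self.status == "running":
            self.status = "committed"
        events, self.pending = self.pending, []
        return events
```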
Abstract: Systems and methods for throttling GPU execution performance to avoid surges in di/dt (the rate of change of current). A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store charge for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is limited to a throttling rate maintained by the scheduling unit, and that throttling rate is never lower than a minimum throttling issue rate. At the end of each scheduling period, the throttling rate is set equal to the moving average plus an offset value.
Type:
Grant
Filed:
April 2, 2012
Date of Patent:
August 30, 2016
Assignee:
NVIDIA Corporation
Inventors:
Peter Michael Nelson, Jack Hilaire Choquette, Olivier Giroux
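The throttling rule can be written as rate = max(minimum rate, moving average + offset), applied at the end of each period. The Python sketch below assumes a fixed-length window for the moving average and truncates the average to an integer. Both choices, and the class name, are assumptions.

```python
from collections import deque

class IssueThrottle:
    """Hypothetical per-period issue limiter with a moving-average throttle."""

    def __init__(self, window: int, offset: int, min_rate: int, initial_rate: int) -> None:
        self.history: deque = deque(maxlen=window)  # issue counts of recent periods
        self.offset = offset
        self.min_rate = min_rate
        self.rate = max(initial_rate, min_rate)

    def issue(self, ready: int) -> int:
        """Issue at most `rate` of the `ready` instructions in this period."""
        issued = min(ready, self.rate)
        self.history.append(issued)
        # End of period: throttle = moving average + offset, floored at min_rate.
        avg = sum(self.history) / len(self.history)
        self.rate = max(self.min_rate, int(avg) + self.offset)
        return issued
```

When demand jumps, the issue rate ramps up by roughly the offset per period instead of instantly, which limits the current step the decoupling capacitors must absorb.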
Abstract: In a multi-processor transaction execution environment, a transaction is executed a plurality of times. Based on the executions, a duration is predicted for executing the transaction. Based on the predicted duration, a threshold is determined. Pending aborts of the transaction due to memory conflicts are suppressed based on the transaction exceeding the determined threshold.
Type:
Grant
Filed:
February 27, 2014
Date of Patent:
August 30, 2016
Assignee:
International Business Machines Corporation
Inventors:
Jonathan D. Bradbury, Harold W. Cain, III, Michael Karl Gschwind, Maged M. Michael, Valentina Salapura, Chung-Lung K. Shum, Timothy J. Slegel
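A minimal Python sketch of the predict-then-suppress logic. It assumes the predicted duration is the mean of past durations and the threshold is a fixed fraction of that prediction. The mean, the 0.8 fraction, and the class name are all assumptions.

```python
from statistics import mean
from typing import List

class AbortSuppressor:
    """Hypothetical model: threshold = fraction * mean of past durations."""

    def __init__(self, fraction: float = 0.8) -> None:
        self.durations: List[float] = []
        self.fraction = fraction

    def record(self, duration: float) -> None:
        self.durations.append(duration)

    def threshold(self) -> float:
        if not self.durations:
            return float("inf")  # no prediction yet: never suppress
        return self.fraction * mean(self.durations)

    def suppress_abort(self, elapsed: float) -> bool:
        """Suppress a pending conflict abort once the transaction is past the threshold."""
        return elapsed > self.threshold()
```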
Abstract: A method, computer program product, and system are provided for scheduling a plurality of instructions in a computing system. For example, the method can generate a plurality of instruction lineages, in which the plurality of instruction lineages is assigned to one or more registers. Each of the plurality of instruction lineages has at least one node representative of an instruction from the plurality of instructions. The method can also determine a node order based on respective priority values associated with each of the nodes. Further, the method can include scheduling the plurality of instructions based on the node order and the one or more registers assigned to the plurality of instruction lineages.
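A simplified Python sketch of lineage-based scheduling. It assumes nodes are numbered in topological order, that a node's priority is the length of its longest path to a sink, and that lineages are formed greedily by following the highest-priority unassigned successor. Each lineage maps to one register. These are illustrative assumptions, not the patented method.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # node -> successors; nodes 0..n-1 in topological order

def priorities(succ: Graph, n: int) -> List[int]:
    """Assumed priority: length of the longest path from a node to a sink."""
    prio = [0] * n
    for v in reversed(range(n)):
        prio[v] = 1 + max((prio[s] for s in succ.get(v, [])), default=0)
    return prio

def form_lineages(succ: Graph, n: int, prio: List[int]) -> List[List[int]]:
    """Greedily extend each lineage with its highest-priority unassigned
    successor; lineage r is assigned register r."""
    owner = [-1] * n
    lineages: List[List[int]] = []
    for v in range(n):
        if owner[v] != -1:
            continue
        chain, rid = [v], len(lineages)
        owner[v] = rid
        while True:
            free = [s for s in succ.get(chain[-1], []) if owner[s] == -1]
            if not free:
                break
            nxt = max(free, key=lambda s: prio[s])
            owner[nxt] = rid
            chain.append(nxt)
        lineages.append(chain)
    return lineages

def node_order(succ: Graph, n: int, prio: List[int]) -> List[int]:
    """List scheduling: repeatedly issue the ready node with the highest priority."""
    indeg = [0] * n
    for v in range(n):
        for s in succ.get(v, []):
            indeg[s] += 1
    ready = [v for v in range(n) if indeg[v] == 0]
    order: List[int] = []
    while ready:
        v = max(ready, key=lambda x: (prio[x], -x))  # ties go to the lower index
        ready.remove(v)
        order.append(v)
        for s in succ.get(v, []):
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order
```

For a graph with edges 0→2, 1→2, 2→3 and an isolated node 4, this forms lineages [0, 2, 3], [1] and [4], so three registers are used. It schedules the nodes in the order 0, 1, 2, 3, 4.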