Abstract: Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. Open ended and/or closed ended sequential loops can be transformed to parallel loops. For example, a section of code containing an original sequential loop is analyzed to determine a fixed number of iterations for the original sequential loop. The original sequential loop is transformed into a parallel loop that can generate transactions in an amount up to the fixed number of iterations. As another example, an open ended sequential loop can be transformed into a parallel loop that generates a separate transaction containing a respective work item for each iteration of a speculation pipeline. The parallel loop is then executed using the transactional memory system, with at least some of the separate transactions being executed on different threads.
Type:
Application
Filed:
July 25, 2011
Publication date:
November 17, 2011
Applicant:
MICROSOFT CORPORATION
Inventors:
John Joseph Duffy, Jan Gray, Yosseff Levanoni
Abstract: A circuit arrangement and method support efficient indexing into large register files by utilizing register address sequence detection, wherein register addresses to be used by an instruction are produced by concatenating a portion of the address that is contained in the instruction with another portion that is speculatively produced by sequence detection logic. The portion of the correct full address that is not contained in the instruction is stored in a software accessible special purpose register. If the end of a particular sequence of addresses is detected by the sequence detection logic, the invention speculatively assumes that the next address in the sequence will be used. Since only a portion of the full addresses are stored in the instruction, they occupy less instruction space than the full address widths. An instruction may include at least one address portion that identifies a register address.
Type:
Application
Filed:
May 12, 2010
Publication date:
November 17, 2011
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Eric O. Mejdrich, Adam J. Muff, Robert A. Shearer, Matthew R. Tubbs
Abstract: A programming interface device for a programmable logic circuit comprises a series of parallel logic block chains each having first and second connection means, the first and second connection means being disposed at opposite ends of each chain. The programming interface device comprises first and second interfacing means for interfacing with the first and second connection means of each logic block chain, respectively and at least one programming circuit, each programming circuit arranged to configure a plurality of serially connected logic blocks. Finally, the programming interface comprises programmable connection means for connecting the connection means of each logic block chain to either the connection means of another logic block chain or directly to one of the at least one programming circuits, such that the parallel logic block chains can be configured in parallel, series or in any combination thereof.
Abstract: A data parallel pipeline may specify multiple parallel data objects that contain multiple elements and multiple parallel operations that operate on the parallel data objects. Based on the data parallel pipeline, a dataflow graph of deferred parallel data objects and deferred parallel operations corresponding to the data parallel pipeline may be generated and one or more graph transformations may be applied to the dataflow graph to generate a revised dataflow graph that includes one or more of the deferred parallel data objects and deferred, combined parallel data operations. The deferred, combined parallel operations may be executed to produce materialized parallel data objects corresponding to the deferred parallel data objects.
Type:
Application
Filed:
June 4, 2010
Publication date:
November 10, 2011
Applicant:
Google Inc.
Inventors:
Craig D. Chambers, Ashish Raniwala, Frances J. Perry, Stephen R. Adams, Robert R. Henry, Robert Bradshaw, Nathan Weizenbaum
Abstract: Execution units process commands from one or more command queues. Once a command is available on the queue, each unit participating in the execution of the command atomically decrements the command's work groups remaining counter by the work group reservation size and processes a corresponding number of work groups within a work group range. Once all work groups within a range are processed, an execution unit increments a work group processed counter. The unit that increments the work group processed counter to the value stored in a work groups to be executed counter signals completion of the command. Each execution unit that access a command also marks a work group seen counter. Once the work groups processed counter equals the work groups to be executed counter and the work group seen counter equals the number of execution units, the command may be removed or overwritten on the command queue.
Type:
Grant
Filed:
August 31, 2009
Date of Patent:
November 8, 2011
Assignee:
International Business Machines Corporation
Inventors:
Benjamin G. Alexander, Gregory H. Bellows, Joaquin Madruga, Brian D. Watt
Abstract: A disclosed information processing system includes a receiving node and a storing node, the receiving node includes an order information adding unit that adds first order information to operation instructions included in an operation instruction sequence, the first order information indicating an order among the operation instruction sequences and an operation instruction transmission unit that transmits the one or more operation instructions to the storing node, the storing node includes an operation instruction execution unit that executes the operation instructions. Further, upon a receipt of a second operation instruction having the first order information indicating that the second operation instruction is earlier than the one or more first operation instructions, which was already executed, in the first order relationship, the storing node re-executes the first operation instruction after the second operation instruction is executed.
Abstract: A circuit for tracking memory operations with trace-based execution is disclosed. Each trace includes a sequence of operations that includes zero or more of the memory operations. The memory operations being executed form a set of active memory operations that have a predefined program order among them. At least some of the active memory operations access the memory in an execution order that is different from the program order. Checkpoint entries are associated with each trace. Each entry refers to a checkpoint location. Executing one of the active memory operations updates a checkpoint location. During the operation of the circuit, none of the operations of a given trace has any effect on the execution unit's architectural state prior to committing that trace. Each trace becomes eligible for commitment after all operations in the trace complete executing. After the trace is committed, all of the checkpoint entries associated with the trace are invalidated.
Type:
Grant
Filed:
February 13, 2008
Date of Patent:
November 1, 2011
Assignee:
Oracle America, Inc.
Inventors:
John Gregory Favor, Paul G. Chan, Graham Ricketson Murphy, Joseph Byron Rowlands
Abstract: Upon execution of an interrupt return (IRET) instruction when a second interrupt is pending, rather than popping a stack, obtaining processor state information, and then pushing the state information back onto the stack prior to vectoring off to a second interrupt service routine, direct vectoring is employed such that the stack is not pushed or popped but rather the processor vectors directly from the IRET instruction in the first interrupt service routine to the second interrupt service routine. A novel stored interrupt enable (SIE) bit stores whether maskable interrupts were enabled at the time the first interrupt service routine was entered. Execution of IRET automatically checks the SIE. If the SIE indicates interrupts were enabled, then direct vectoring occurs. If the SIE indicates that interrupts were disabled, then the second interrupt remains pending, and an interrupt return operation is performed by popping the stack and restoring the prior processor state.
Abstract: A Programmable Arithmetic Logic Unit Cluster is claimed. The plurality of Programmable Logic Blocks (50) in the cluster are in a physically linear sequence; but, will process data in parallel when the data pathways permit. A physically linear and operationally parallel design is possible mainly due to an Internal Register Bus (30). A small subset of data from the Internal Register Bus (30) is modified by each Arithmetic Logic Unit (53). The greater chunk of information of the Internal Register Bus (30) passes, without changes made to the data, through a plurality of two-to-one-multiplexers (54) as data inputs, bypassing the Data Selection (52) and Arithmetic Logic Unit's (53) circuitry. Only the data specified as the Accumulator (41) and Carry History (31) are modified by the blocks (50). The Accumulator (41) and the Carry Output Line (44) are distributed back onto the Internal Register Bus (30) for subsequent blocks or for the Output Register File (79) by said two-to-one-multiplexers (54).
Abstract: A method and system for thread scheduling for optimal heat dissipation are provided. Temperature sensors measure temperature throughout various parts of a processor chip. The temperatures detected are reported to an operating system or the like for scheduling threads. In one aspect, the observed temperature values are recorded on registers. An operating system or the like reads the registers and schedules threads based on the temperature values.
Type:
Grant
Filed:
July 7, 2006
Date of Patent:
November 1, 2011
Assignee:
International Business Machines Corporation
Inventors:
Orran Y. Krieger, Bryan S. Rosenburg, Robert B. Tremaine, Robert W. Wisniewski
Abstract: A method for a dynamically configurable imaging process, including: inputting, using a graphical user interface for at least one first specially programmed computer, computer instructions for executing at least one image processing task; creating template type information using the computer instructions and a first processor for the first computer, the template type information in a database; retrieving, using a second processor for at least one second specially programmed computer, the template type information from the database; inserting, using the second processor, the template type information into a service flow for a first imaging process; and executing, using the second processor, the template type information such that the computer instructions for the at least one image processing task are implemented as part of the service flow for the first imaging process.
Abstract: A multi-threaded processor that is capable of responding to, and processing, multiple low-latency-tolerant events concurrently and while using relatively slow, low-power memories is disclosed. The illustrative embodiment comprises a multi-threaded processor, which itself comprises a context controller and a plurality of hardware contexts. Each hardware context is capable of storing the current state of one thread in a form that enables the processor to quickly switch to or from the execution of that thread. To enable the processor to be capable of responding to low-latency-tolerant events quickly, each thread—and, therefore, each hardware context is prioritized—depending on the latency tolerance of the thread responding to the event.
Abstract: A method includes determining a rate of resource occupancy of a constituent stage of an unbalanced instruction pipeline implemented in a processor through profiling an instruction code. The method also includes performing data processing at a maximum throughput at an optimum clock frequency based on the rate of resource occupancy.
Abstract: A system, for use with a compiler architecture framework, includes performing a statically speculative compilation process to extract and use speculative static information, encoding the speculative static information in an instruction set architecture of a processor, and executing a compiled computer program using the speculative static information, wherein executing supports static speculation driven mechanisms and controls.
Abstract: Executing a block of code is disclosed. Executing includes receiving an indication that the block of code is to be executed using a synchronization mechanism and speculatively executing the block of code on a virtual machine. The block of code may include application code. The block of code does not necessarily indicate that the block of code should be speculatively executed.
Abstract: In a processor including a plurality of register groups, while a task is being executed using one of the register groups, a context of a task to be executed next is restored into another one of the register groups. If the execution of the task currently being executed is suspended before the restoration starts, the task execution is continued by using one of the register groups in which a context of a task executed previously remains and executing the task.
Abstract: The implementation of a counter in a microcontroller adapted to the JavaCard language while respecting the atomicity of a modification of the value of this counter, wherein the counter is reset by the sending to the microcontroller of an instruction to verify a user code by submitting a correct code, and the value of the counter is decremented by the sending to the microcontroller of the instruction to verify the user code with an erroneous code value.
Abstract: An instruction processing pipeline 6 is provided. This has error detection and error recovery circuitry 20 associated with one or more of the pipeline stages. If an error is detected within a signal value within that pipeline stage, then it can be repaired. Part of the error recovery may be to flush upstream program instructions from the instruction pipeline 6. When multi-threading, only those instructions from a thread including an instruction which has been lost as a consequence of the error recovery need to be flushed from the instruction pipeline 6. Instruction can also be selected for flushing in dependence upon characteristics such as privileged level, number of dependent instructions etc.
Type:
Grant
Filed:
March 14, 2008
Date of Patent:
October 11, 2011
Assignee:
ARM Limited
Inventors:
Emre Ă–zer, Shidhartha Das, David Michael Bull
Abstract: Cell processor task management in a cell processor having a main memory, one or more power processor units (PPU) and one or more synergistic processing units (SPU), each SPU having a processor and a local memory is described. An SPU task manager (STM) running on one or more of the SPUs reads one or more task definitions stored in the main memory into the local memory of a selected SPU. Based on information contained in the task definitions the SPU loads code and/or data related to the task definitions from the main memory into the local memory associated with the selected SPU. The selected SPU then performs one or more tasks using the code and/or data.
Type:
Grant
Filed:
September 27, 2005
Date of Patent:
October 11, 2011
Assignee:
Sony Computer Entertainment Inc.
Inventors:
John P. Bates, Payton R. White, Richard B. Stenson, Howard Berkey, Atilla Vass, Mark Cerny, John Morgan
Abstract: Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.
Type:
Application
Filed:
June 20, 2011
Publication date:
October 6, 2011
Inventors:
Michael A. Julier, Jeffrey D. Gray, Srinivas Chennupaty, Sean P. Mirkes, Mark P. Seconi
Abstract: Embodiments of the present invention relate to a system and method for providing processing capacity on demand. According to the embodiments, a processor package has a plurality of processing elements. One or more of the processing elements may be made active in response to increased demand for processing capacity based on modifiable authorization information.
Abstract: Various computing center control and cooling apparatus and methods are disclosed. In one aspect, a method of controlling plural processors of a computing system is provided. The method includes monitoring activity levels of the plural processors over a time interval to determine plural activity level scores. The plural activity level scores are compared with predetermined processor activity level scores corresponding to preselected processor operating modes to determine a recommended operating mode for each of the plural processors. Each of the plural processors is instructed to operate in one of the recommended operating modes.
Type:
Application
Filed:
April 6, 2010
Publication date:
October 6, 2011
Inventors:
Gamal Refai-Ahmed, Stanley Ossias, Maxat Touzelbaev
Abstract: A method for performing image signal processing (ISP) with the aid of a graphics processing unit (GPU) includes: utilizing an ISP pipeline to perform pre-processing on source data of at least one portion of at least one source frame image to generate intermediate data of at least one portion of at least one intermediate frame image, where the ISP pipeline stores the intermediate data into a memory; and utilizing the GPU to retrieve the intermediate data from the memory and perform specific processing on the intermediate data to generate processed data, where the GPU stores the processed data into the external/on-chip memory. In particular, at least one of the intermediate data and the processed data complies with a specific color format. In addition, an associated apparatus is further provided.
Type:
Application
Filed:
March 30, 2010
Publication date:
October 6, 2011
Inventors:
You-Ming Tsao, Yu-Chung Chang, Yuan-Chung Lee
Abstract: A coprocessor interface unit for interfacing a coprocessor to an out-of-order execution pipeline, and applications thereof. In an embodiment, the coprocessor interface unit includes an in-order instruction queue, a coprocessor load data queue, and a coprocessor store data queue. Instructions are written into the in-order instruction queue by an instruction dispatch unit. Instructions exit the in-order instruction queue and enter the coprocessor. In the coprocessor, the instructions operate on data read from the coprocessor load data queue. Data is written back, for example, to memory or a register file by inserting the data into the out-of-order execution pipeline, either directly or via the coprocessor store data queue, which writes back the data.
Abstract: Source code to be processed is analyzed and configuration data in implementing in accordance with each of plural implementation systems is created and is stored in a local memory of a DRP incorporating system. When execution of target processing is started, the implementation system determination processing calculates estimated processing time when the configuration of each of the implementation systems is adopted and determines the optimum one of the implementation systems based on a combination of the estimated processing time and the circuit scale of the configuration.
Abstract: Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
Abstract: A hierarchical microprocessor. An embodiment of a hierarchical microprocessor includes a plurality of first-level instruction pipeline elements; a plurality of execution clusters, where each execution cluster is operatively coupled with each of the first-level instruction pipeline elements. Each execution cluster includes a plurality of second-level instruction pipeline elements, where each of the second-level instruction pipeline elements corresponds with a respective first-level instruction pipeline element, and one or more instruction execution units operatively coupled with each of the second-level instruction pipeline elements, where the microprocessor is configured to execute multiple execution threads using the plurality of first-level instruction pipeline elements and the plurality of execution clusters.
Abstract: Certain embodiments for reducing instruction storage space for a processor integrated in a network adapter chip may include generating MIPS instructions from corresponding new instructions. The new instructions may be in patch code instruction (PCI) format. The new instructions may be decoded and the MIPS instructions may be generated by a MIPS processor within a network adapter chip. Decoding the new instructions may also be referred to as interpreting the new instructions. The new instructions may comprise fewer bits than the generated MIPS instructions. The generated MIPS instructions may be executed by the MIPS processor within the network adapter chip.
Type:
Grant
Filed:
November 14, 2005
Date of Patent:
September 27, 2011
Assignee:
Broadcom Corporation
Inventors:
Kelly Yu, Takashi Tomita, Jonathan F. Lee
Abstract: A processor and a processor control method which efficiently perform an operation on data using a register, are provided. The register may include a data type field and a data field. The processor may generate the data type bits and store the generated data type bits in the data type field.
Abstract: An interface to a dynamically configurable arithmetic unit can include data alignment modules, where each data alignment module receives input variables being associated with one or more arithmetic expressions. The interface can include multiplexers coupled to the data alignment modules, wherein a data alignment module has outputs coupled to a first multiplexer. The first multiplexer can have a selection line and an output coupled to an input port of the dynamically configurable arithmetic unit. The interface can include a second multiplexer having input instructions and the selection line, where each instruction is associated with one of the arithmetic expressions and has an operation to be performed by the dynamically configurable arithmetic unit. The second multiplexer is configurable to provide selected ones of the input instructions to the dynamically configurable arithmetic unit through an output of the second multiplexer responsive to the selection line.
Type:
Grant
Filed:
April 1, 2009
Date of Patent:
September 20, 2011
Assignee:
Xilinx, Inc.
Inventors:
Bradley L. Taylor, Arvind Sundararajan, Shay Ping Seng, L. James Hwang
Abstract: A processor includes a circuit for tracking memory operations with trace-based execution. Each trace includes a sequence of operations that includes zero or more of the memory operations. The memory operations being executed form a set of active memory operations that have a predefined program order among them. At least some of the active memory operations access the memory in an execution order that is different from the program order. During the operation of the circuit, none of the operations of a given trace has any effect on the execution unit's architectural state prior to committing that trace. Each trace becomes eligible for commitment after all operations in the trace complete executing. The circuit also includes a sub-circuit that holds memory operation ordering information corresponding to the active memory operations. The sub-circuit detects violations of ordering constraints.
Type:
Grant
Filed:
February 13, 2008
Date of Patent:
September 20, 2011
Assignee:
Oracle America, Inc.
Inventors:
John Gregory Favor, Paul G. Chan, Graham Ricketson Murphy, Joseph Byron Rowlands
Abstract: A system and method for determine which threads to execute at a given time in a multi-threaded computer system. A thread prioritizer determines execution fairness between pairs of potentially executing threads. A switch enabler determines forward progress of each executing thread. The resulting indicators from the thread prioritizer and switch enabler may aid in the determination of whether or not to switch a particular potentially executing thread into execution resources.
Abstract: Reducing pipeline stall between a compute unit and address unit in a processor can be accomplished by computing results in a compute unit in response to instructions of an algorithm; storing in a local random access memory array in a compute unit predetermined sets of functions, related to the computed results for predetermined sets of instructions of the algorithm; and providing within the compute unit direct mapping of computed results to related function.
Type:
Grant
Filed:
October 26, 2005
Date of Patent:
September 20, 2011
Assignee:
Analog Devices, Inc.
Inventors:
James Wilson, Joshua A. Kablotsky, Yosef Stein, Colm J. Prendergast, Gregory M. Yukna, Christopher M. Mayer
Abstract: A system for communicating command parameters between a processor and a memory flow controller is provided. The system makes use of a channel interface as the primary mechanism for communicating between the processor and a memory flow controller. The channel interface provides channels for communicating with processor facilities, memory flow control facilities, machine state registers, and external processor interrupt facilities, for example. These channels may be designated as blocking or non-blocking. With blocking channels, when no data is available to be read from the corresponding registers, or there is no space available to write to the corresponding registers, the processor is placed in a low power “stall” state. The processor is automatically awakened, via communication across the blocking channel, when data becomes available or space is freed. Thus, the channels of the present invention permit the processor to stay in a low power state.
Type:
Grant
Filed:
April 21, 2008
Date of Patent:
September 20, 2011
Assignee:
International Business Machines Corporation
Inventors:
Michael N. Day, Charles R. Johns, Peichun P. Liu, Todd E. Swanson, Thuong Q. Truong
Abstract: A processor for supporting a MIMO operation and method of processing a MIMO instruction are provided. The MIMO operation supporting processor may include a scheduler and at least one functional unit. The scheduler may map multiple inputs of the MIMO instruction to a plurality of sequential input cycles, respectively, and may map multiple outputs of the MIMO instruction to a plurality of sequential output cycles, respectively. The output cycles may be followed by the input cycles and a predetermined number of cycles for a MIMO operation. A functional unit may read a register during sequential input cycles, may perform a MIMO operation during a predetermined number of execution cycles, and may write the result of the MIMO operation into a register during sequential output cycles.
Type:
Application
Filed:
December 9, 2010
Publication date:
September 15, 2011
Applicant:
SAMSUNG ELECTRONICS CO., LTD.
Inventors:
Tai-Song JIN, Dong-Hoon Yoo, Bernhard Egger, Won-Sub Kim
Abstract: An electronic locking and sealing device, including: a housing, a lock tongue, an operating unit, a main control unit, and a connecting part, the operating unit is disposed in the housing and fixedly connected to the lock tongue, and operates to push out or to take back the locking tongue from or in the housing, the main control unit is connected to the operating unit, and operates to receive an external control instruction whereby automatically controlling the operating unit, and the connecting part has a hole disposed at the center thereof and corresponding to the lock tongue.
Abstract: An embodiment of the present invention includes a circuit for tracking memory operations with trace-based execution. Each trace includes a sequence of operations that includes zero or more of the memory operations. The memory operations being executed form a set of active memory operations that have a predefined program order among them and corresponding ordering constraints. At least some of the active memory operations access the memory in an execution order that is different from the program order. Checkpoint entries are associated with each trace. Violations of the ordering constraints may be signaled too late to prevent an update of the cached data associated with the memory operations. A sub-circuit detects this condition and invalidates the checkpoint locations indicated by the checkpoint entries associated with the trace experiencing the violation and all younger traces.
Type:
Grant
Filed:
February 13, 2008
Date of Patent:
September 13, 2011
Assignee:
Oracle America, Inc.
Inventors:
John Gregory Favor, Paul G. Chan, Graham Ricketson Murphy, Joseph Byron Rowlands
Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
Type:
Grant
Filed:
April 25, 2005
Date of Patent:
September 13, 2011
Assignee:
Seiko-Epson Corporation
Inventors:
Cheryl Senter Brashears, Johannes Wang, Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
Abstract: A unit status reporting protocol may also be used for context switching, debugging, and removing deadlock conditions in a processing unit. A processing unit is in one of five states: empty, active, stalled, quiescent, and halted. The state that a processing unit is in is reported to a front end monitoring unit to enable the front end monitoring unit to determine when a context switch may be performed or when a deadlock condition exists. The front end monitoring unit can issue a halt command to perform a context switch or take action to remove a deadlock condition and allow processing to resume.
Type:
Grant
Filed:
August 13, 2007
Date of Patent:
September 13, 2011
Assignee:
NVIDIA Corporation
Inventors:
Michael C. Shebanow, Robert C. Keller, Richard A. Silkebakken
Abstract: A method, information processing system, and computer program product manage instruction execution based on machine state. At least one instruction is received. The at least one instruction is decoded. A current machine state is determined in response to the decoding. The at least one instruction is organized into a set of unit of operations based on the current machine state that has been determined. The set of unit of operations is executed.
Type:
Application
Filed:
March 5, 2010
Publication date:
September 8, 2011
Applicant:
International Business Machines Corporation
Inventors:
Fadi BUSABA, Bruce GIAMEI, David HUTTON, Eric SCHWARZ
Abstract: An idle-state detection circuit detects that the processor repeats every predetermined number of clocks an operation to load data satisfying a preset idle-state condition from a particular address, and determines that the processor is in an idle state if the number of iterations is greater than a preset specified number of loops.
Abstract: An apparatus includes a processor. The processor includes two memories. The first memory stores one set of instructions. The second memory stores another set of instructions that are longer than the set of instructions in the first memory. An instruction in the set of instructions in the first memory is used as a pointer to a corresponding instruction in the set of instructions in the second memory.
Abstract: A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.
Type:
Application
Filed:
May 25, 2010
Publication date:
September 1, 2011
Inventors:
John George Mathieson, Phil Carmack, Brian Smith
Abstract: Embodiments of the disclosure generally set forth techniques for handling communication between processor cores. Some example multi-core processors include a first set of processor cores in a first region of the multi-core processor configured to dynamically receive a first supply voltage and a first clock signal, a second set of processor cores in a second region of the multi-core processor configured to dynamically receive a second supply voltage and a second clock signal, and an interface block coupled to the first set of processor cores and the second set of processor cores, wherein the interface block is configured to facilitate communications between the first set of processor cores and the second set of processor cores.
Abstract: Provided is a cryptographic device performing encryption or decryption on input data, and more particularly, a cryptographic device having a session memory bus for communicating with a session memory. The cryptographic device includes: an external session memory for storing cryptographic information on each session; a cryptographic processor for encrypting or decrypting input data using the cryptographic information; an external session memory bus connected to the external session memory and the cryptographic processor; and a Central Processing Unit (CPU) for transferring and receiving data to and from the external session memory via the cryptographic processor. The separate session memory buses allow the cryptographic processor to access a session memory without being disturbed by another device, thereby improving the entire performance of the cryptographic device.
Type:
Grant
Filed:
May 13, 2008
Date of Patent:
August 30, 2011
Assignee:
Electronics and Telecommunications Research Institute
Inventors:
Sun Kang, Soo Hyeon Kim, Hyo Won Kim, Tae Joo Chang
Abstract: An embodiment of the present invention includes a circuit for tracking memory operations with trace-based execution. Each trace includes a sequence of operations that includes zero or more of the memory operations. The memory operations being executed form a set of active memory operations that have a predefined program order among them and corresponding ordering constraints. At least some of the active memory operations access the memory in an execution order that is different from the program order. Checkpoint entries are associated with each trace. When a memory operation attempts to update a cache line that may not be updated, the circuit attempts to upgrade the cache line. If this fails, a rollback request is generated that indicates the trace involved. The checkpoint locations associated with the indicated trace are overwritten along with those locations associated with all younger traces.
Type:
Grant
Filed:
February 13, 2008
Date of Patent:
August 30, 2011
Assignee:
Oracle America, Inc.
Inventors:
John Gregory Favor, Paul G. Chan, Graham Ricketson Murphy, Joseph Byron Rowlands
Abstract: A method, apparatus and program storage device for providing light weight system calls to improve user mode performance is disclosed. A range of system call code for the light weight system calls is provided in a system call table. The light weight system calls skip the code for saving and restore processor context.
Type:
Grant
Filed:
December 1, 2005
Date of Patent:
August 30, 2011
Assignee:
International Business Machines Corporation
Inventors:
Daniel Heffley, Wenjeng Ko, Cheng-Chung Song
Abstract: A profiler may analyze processes being run by a processor. The profiler may include logic to periodically sample a value of an instruction pointer that indicates an instruction in the first process that is currently being executed by the processor and logic to update profile data based on the sampled value. The profiler may additionally include logic to determine, in response to a context switch that includes the operating system switching the active process from the first process to another of the plurality of processes, whether the first process executes for greater than a first length of time; logic to stop operation of the profiler when the first process executes for greater than the first length of time; and logic to clear the profile data when the first process fails to execute for greater than the first length of time.
Abstract: A system and method for management of resource allocation of threads for efficient execution of instructions. Prior to dispatching decoded instructions of a first thread from the instruction fetch unit to a buffer within a scheduler, logic within the instruction fetch unit may determine the buffer is already full of dispatched instructions. However, the logic may also determine that a buffer for a second thread within the core or micro core is available. The second buffer may receive and issue decoded instructions for the first thread until the buffer is becomes unavailable. While the second buffer receives and issues instructions for the first thread, the throughput of the system for the first thread may increase due to a reduction in wait cycles.
Abstract: A multithreaded computer system of the present invention includes a plurality of processor elements (PEs) and a parallel processor controller which switches threads in each PE. The parallel processor controller includes a plurality of execution order registers which hold, for each processor element, an execution order of threads to be executed; a plurality of counters which count an execution time for a thread that is being executed by each processor element and generate a timeout signal when the counted time reaches a limit assigned to the thread; and a thread execution scheduler which switches the thread that is being executed to the thread to be executed by each processor element based on an execution order held in the execution order register and the timeout signal.