Instruction Issuing Patents (Class 712/214)
-
Patent number: 8176298
Abstract: An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.
Type: Grant; Filed: August 31, 2004; Date of Patent: May 8, 2012; Assignee: NetLogic Microsystems, Inc.; Inventor: David T. Hass
-
Patent number: 8171262
Abstract: A method and apparatus for overlaying hazard clearing with a jump instruction within a pipeline microprocessor is described. The apparatus includes hazard logic to detect when a jump instruction specifies that hazards are to be cleared as part of a jump operation. If hazards are to be cleared, the hazard logic disables branch prediction for the jump instruction, thereby causing the jump instruction to proceed down the pipeline until it is finally resolved, and flushing the pipeline behind the jump instruction. Disabling of branch prediction for the jump instruction effectively clears all execution and/or instruction hazards that preceded the jump instruction. Alternatively, hazard logic causes issue control logic to stall the jump instruction for n cycles until all hazards are cleared. State tracking logic may be provided to determine whether any instructions are executing in the pipeline that create hazards. If so, hazard logic performs normally.
Type: Grant; Filed: November 21, 2005; Date of Patent: May 1, 2012; Assignee: MIPS Technologies, Inc.; Inventors: Niels Gram Jeppesen, G. Michael Uhler
-
Patent number: 8171484
Abstract: A resource management apparatus includes a resource management part to manage an amount of resources used and an amount of virtual resources of each of a plurality of processing units; a selection and control part to select a processing unit having a smallest sum of the amount of resources used and the amount of virtual resources in response to an external process request, and to increase the amount of resources used by the selected processing unit and to decrease the amount of resources used by a processing unit corresponding to an external process release request in response to the process release request; a virtual resource control part to increase the amount of virtual resources of the processing unit corresponding to the process release request in response to the process release request; and a request sending part to send the external process request or process release request to the selected or corresponding processing unit.
Type: Grant; Filed: October 24, 2007; Date of Patent: May 1, 2012; Assignee: Fujitsu Limited; Inventors: Katsuhiko Yamatsu, Hidetada Tanaka, Kazuaki Sumi, Nobuyuki Shima
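The selection and release policy described in this abstract can be sketched in a few lines; the class and method names below are illustrative, not taken from the patent.

```python
class ResourceManager:
    """Minimal sketch of the described policy: a request goes to the processing
    unit with the smallest sum of used and virtual resources; a release
    decreases that unit's used amount and increases its virtual amount."""

    def __init__(self, num_units):
        self.used = [0] * num_units
        self.virtual = [0] * num_units

    def handle_request(self):
        # Select the unit with the smallest used + virtual total (ties -> lowest id).
        unit = min(range(len(self.used)),
                   key=lambda i: self.used[i] + self.virtual[i])
        self.used[unit] += 1
        return unit

    def handle_release(self, unit):
        # Release: decrease resources used, increase virtual resources.
        self.used[unit] -= 1
        self.virtual[unit] += 1
```

The virtual-resource counter makes a recently released unit look busier than its raw usage suggests, which spreads successive requests across units instead of piling them onto whichever unit just freed capacity.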
-
Patent number: 8171261
Abstract: A system and method for fencing memory accesses. Memory loads can be fenced, or all memory accesses can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.
Type: Grant; Filed: September 2, 2003; Date of Patent: May 1, 2012; Assignee: Intel Corporation; Inventors: Salvador Palanca, Stephen A. Fischer, Subramaniam Maiyuran, Shekoufeh Qawami
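The ordering guarantee the fence provides can be modeled with a toy retirement simulator. This is a sketch of the concept only, not the patented hardware: within a fence-delimited window accesses may retire out of program order (modeled here by reversing each window), but no access after a fence retires before an access preceding it.

```python
def retire_order(instructions):
    """Toy model: split the instruction stream into windows at each 'fence'
    token, let accesses inside a window retire out of order (reversed here),
    and retire windows strictly in sequence so the fence acts as a barrier."""
    windows, current = [], []
    for instr in instructions:
        if instr == "fence":
            windows.append(current)      # older accesses, retired first
            windows.append(["fence"])    # the fence dispatches after them
            current = []
        else:
            current.append(instr)
    windows.append(current)              # newer accesses, stalled until last
    retired = []
    for w in windows:
        retired.extend(w if w == ["fence"] else reversed(w))
    return retired
```

Running it on `["a", "b", "fence", "c", "d"]` shows reordering within each window but never across the fence.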
-
Publication number: 20120096243
Abstract: A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.
Type: Application; Filed: October 27, 2011; Publication date: April 19, 2012; Applicant: Aspen Acquisition Corporation; Inventors: Erdem Hokenek, Mayan Moudgill, Michael J. Schulte, C. John Glossner
-
Publication number: 20120089819
Abstract: The described embodiments include a processor that determines instructions that can be issued based on unresolved data dependencies. In an issue unit in the processor, the processor keeps a record of each instruction that is directly or indirectly dependent on a base instruction. Upon determining that the base instruction has been deferred, the processor monitors instructions that are being issued from an issue queue to an execution unit for execution. Upon determining that an instruction from the record has reached a head of the issue queue, the processor immediately issues the instruction from the issue queue.
Type: Application; Filed: October 6, 2010; Publication date: April 12, 2012; Applicant: Oracle International Corporation; Inventors: Shailender Chaudhry, Richard Thuy Van, Robert E. Cypher, Debasish Chandra
-
Patent number: 8151007
Abstract: A computer of an information processing apparatus repeatedly accepts an operation to designate at least one of a plurality of command elements making up a command, executes at least one of a first memory writing processing to write a first command element having a specific attribute out of the command elements corresponding to the accepted operation in a first memory and a second memory writing processing to write a second command element having a different attribute in a second memory, determines whether or not a command element array stored over the first memory and the second memory satisfies an execution allowable condition each time the writing processing executes, and processes information according to the command element array when the condition is determined to be satisfied.
Type: Grant; Filed: April 8, 2008; Date of Patent: April 3, 2012; Assignee: Nintendo Co., Ltd.; Inventor: Hiroshi Momose
-
Publication number: 20120072700
Abstract: A processor includes an instruction fetch unit, an issue queue coupled to the instruction fetch unit, an execution unit coupled to the issue queue, and a multi-level register file including a first level register file having lower access latency and a second level register file having higher access latency. Each of the first and second level register files includes a plurality of physical registers for holding operands that is concurrently shared by a plurality of threads. The processor further includes a mapper that, at dispatch of an instruction specifying a source logical register from the instruction fetch unit to the issue queue, initiates a swap of a first operand associated with the source logical register that is in the second level register file with a second operand held in the first level register file. The issue queue, following the swap, issues the instruction to the execution unit for execution.
Type: Application; Filed: September 17, 2010; Publication date: March 22, 2012; Applicant: International Business Machines Corporation; Inventors: Christopher M. Abernathy, Mary D. Brown, Hung Q. Le, Dung Q. Nguyen
-
Patent number: 8140830
Abstract: A circuit arrangement and method utilize a plurality of execution units having different power and performance characteristics and capabilities within a multithreaded processor core, and selectively route instructions having different performance requirements to different execution units based upon those performance requirements. As such, instructions that have high performance requirements, such as instructions associated with primary tasks or time sensitive tasks, can be routed to a higher performance execution unit to maximize performance when executing those instructions, while instructions that have low performance requirements, such as instructions associated with background tasks or non-time sensitive tasks, can be routed to a reduced power execution unit to reduce the power consumption (and associated heat generation) associated with executing those instructions.
Type: Grant; Filed: May 22, 2008; Date of Patent: March 20, 2012; Assignee: International Business Machines Corporation; Inventors: Stephen Joseph Schwinn, Matthew Ray Tubbs, Charles David Wait
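The routing decision above reduces to a simple predicate over each instruction's performance requirement. A minimal sketch, assuming hypothetical `time_sensitive`/`primary` fields that stand in for whatever requirement metadata the hardware actually tracks:

```python
def route(instruction, high_perf_queue, low_power_queue):
    """Illustrative dispatch rule: primary or time-sensitive work goes to the
    high-performance execution unit's queue; background or non-time-sensitive
    work goes to the reduced-power unit's queue."""
    if instruction.get("time_sensitive") or instruction.get("primary"):
        high_perf_queue.append(instruction)
    else:
        low_power_queue.append(instruction)
```

The point of the split is that power (and heat) is spent only where latency actually matters; background work tolerates the slower, cheaper unit.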
-
Patent number: 8140829
Abstract: A processor includes primary threads of execution that may simultaneously issue instructions, and one or more backup threads. When a primary thread stalls, the contents of its instruction buffer may be switched with the instruction buffer for a backup thread, thereby allowing the backup thread to begin execution. This design allows two primary threads to issue simultaneously, which allows for overlap of instruction pipeline latencies. This design further allows a fast switch to a backup thread when a primary thread stalls, thereby providing significantly improved throughput in executing instructions by the processor.
Type: Grant; Filed: November 20, 2003; Date of Patent: March 20, 2012; Assignee: International Business Machines Corporation; Inventors: Richard James Eickemeyer, David Arnold Luick
-
Patent number: 8139061
Abstract: A floating point execution unit calculates a one minus dot product value in a single pass. As such, the dependency that otherwise would be required to perform the calculations is eliminated, resulting in a substantially faster performance of such calculations. The floating point execution unit may be used, for example, to accelerate pixel shading algorithms such as Fresnel and electron microscope effects.
Type: Grant; Filed: August 1, 2008; Date of Patent: March 20, 2012; Assignee: International Business Machines Corporation; Inventors: Adam James Muff, Matthew Ray Tubbs
-
Publication number: 20120066472
Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
Type: Application; Filed: November 17, 2011; Publication date: March 15, 2012; Inventor: Jeffry E. Gonion
-
Patent number: 8131980
Abstract: A design structure for resolving the occurrence of livelock at the interface between the processor core and memory subsystem controller. Livelock is resolved by introducing a livelock detection mechanism (which includes livelock detection utility or logic) within the processor to detect a livelock condition and dynamically change the duration of the delay stage(s) in order to alter the “harmonic” fixed-cycle loop behavior. The livelock detection logic (LDL) counts the number of flushes a particular instruction takes or the number of times an instruction re-issues without completing. The LDL then compares that number to a preset threshold number. Based on the result of the comparison, the LDL triggers the implementation of one of two different livelock resolution processes.
Type: Grant; Filed: June 3, 2008; Date of Patent: March 6, 2012; Assignee: International Business Machines Corporation; Inventors: Ronald Hall, Michael L. Karm, Alvan W. Ng, Todd A. Venton
-
Patent number: 8127120
Abstract: A method for executing by a processing unit a program stored in a memory, includes: detecting a piece of information during the execution of the program by the processing unit, and if the information is detected, triggering the execution of a hidden subprogram by the processing unit. The method may be applied to securing an integrated circuit.
Type: Grant; Filed: April 23, 2008; Date of Patent: February 28, 2012; Assignee: STMicroelectronics SA; Inventor: Philippe Roquelaure
-
Patent number: 8127114
Abstract: A method of processing a plurality of instructions in multiple pipeline stages within a pipeline processor is disclosed. The method partially or wholly executes a stalled instruction in a pipeline stage that has a function other than instruction execution prior to the execution stage within the processor. Partially or wholly executing the instruction prior to the execution stage in the pipeline speeds up the execution of the instruction and allows the processor to more effectively utilize its resources, thus increasing the processor's efficiency.
Type: Grant; Filed: March 28, 2007; Date of Patent: February 28, 2012; Assignee: QUALCOMM Incorporated; Inventors: Kiran Ravi Seth, James Norris Dieffenderfer, Michael Scott McIlvaine, Nathan Samuel Nunamaker
-
Patent number: 8127115
Abstract: Disclosed are a method and a system for grouping processor instructions for execution by a processor, where the group of processor instructions includes at least two branch processor instructions. In one or more embodiments, an instruction buffer can decouple an instruction fetch operation from an instruction decode operation by storing fetched processor instructions in the instruction buffer until the fetched processor instructions are ready to be decoded. Group formation can involve removing processor instructions from the instruction buffer and routing the processor instructions to latches that convey the processor instructions to decoders. Processor instructions that are removed from the instruction buffer in a single clock cycle can be called a group of processor instructions. In one or more embodiments, the first instruction in the group must be the oldest instruction in the instruction buffer, and instructions must be removed from the instruction buffer ordered from oldest to youngest.
Type: Grant; Filed: April 3, 2009; Date of Patent: February 28, 2012; Assignee: International Business Machines Corporation; Inventors: Richard William Doing, Kevin Neal Magil, Balaram Sinharoy, Jeffrey R. Summers, James Albert Van Norstrand, Jr.
-
Patent number: 8112758
Abstract: Techniques are disclosed for allocation of resources in a distributed computing system. For example, a method for allocating a set of one or more components of an application to a set of one or more resource groups includes the following steps performed by a computer system. The set of one or more resource groups is ordered based on respective failure measures and resource capacities associated with the one or more resource groups. An importance value is assigned to each of the one or more components, wherein the importance value is associated with an effect of the component on an output of the application. The one or more components are assigned to the one or more resource groups based on the importance value of each component and the respective failure measures and resource capacities associated with the one or more resource groups, wherein components with higher importance values are assigned to resource groups with lower failure measures and higher resource capacities.
Type: Grant; Filed: January 8, 2008; Date of Patent: February 7, 2012; Assignee: International Business Machines Corporation; Inventors: Navendu Jain, Yoonho Park, Deepak S. Turaga, Chitra Venkatramani
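The ordering rule in this abstract is essentially a greedy matching: rank groups best-first (low failure, high capacity), rank components most-important-first, and pair them off. A minimal sketch, assuming hypothetical `failure`, `capacity`, and `importance` fields:

```python
def assign_components(components, groups):
    """Illustrative greedy allocation: the most important component lands on
    the resource group with the lowest failure measure and highest capacity.
    Field names are assumptions for the sketch, not from the patent."""
    # Best groups first: lowest failure measure, then highest capacity.
    ordered_groups = sorted(groups, key=lambda g: (g["failure"], -g["capacity"]))
    # Most important components first.
    ranked = sorted(components, key=lambda c: -c["importance"])
    return {c["name"]: g["name"] for c, g in zip(ranked, ordered_groups)}
```

A real allocator would also pack multiple components per group subject to capacity; the one-to-one pairing here just shows the ranking logic.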
-
Patent number: 8112616
Abstract: In wireless communications such as in the Bluetooth communication system, an execution unit sequentially receives software instructions for execution. Prior to completing each instruction, the execution unit issues an interrupt indicating the upcoming completion of the instruction execution and awaits receipt of the next instruction. A Link Manager issues limited instructions, and a Link Controller includes a hardware execution unit for executing the limited instructions. A processing unit in the Link Manager performs remaining functions under control of a software program.
Type: Grant; Filed: May 6, 2010; Date of Patent: February 7, 2012; Assignee: Broadcom Corporation; Inventor: Joakim Linde
-
Patent number: 8108610
Abstract: One embodiment of the invention sets forth a mechanism for efficiently processing atomic operations transmitted from multiple general processing clusters to an L2 cache. A tag look-up unit tracks the availability of each cache line in the L2 cache, reserves the necessary cache lines for the atomic operations and transmits the atomic operations to an ALU for processing. The tag look-up unit also increments a reference counter associated with a reserved cache line each time an atomic operation associated with that cache line is received. This feature allows multiple atomic operations associated with the same cache line to be pipelined to the ALU. A ROP unit that includes the ALU may request additional data necessary to process an atomic operation from the L2 cache. Result data is stored in the L2 cache and may also be returned to the general processing clusters.
Type: Grant; Filed: October 21, 2008; Date of Patent: January 31, 2012; Assignee: NVIDIA Corporation; Inventors: David B. Glasco, Peter B. Holmqvist, George R. Lynch, Patrick R. Marchand, Karan Mehra, James Roberts
-
Patent number: 8108654
Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to receive an issue group of instructions, reorder the issue group of instructions using instruction type priority, and execute the reordered issue group of instructions in the cascaded delayed execution pipeline unit. The method, among others, can be broadly summarized by the following steps: receiving an issue group of instructions, reordering the issue group of instructions using instruction type priority, and executing the reordered issue group of instructions in the cascaded delayed execution pipeline unit.
Type: Grant; Filed: February 19, 2008; Date of Patent: January 31, 2012; Assignee: International Business Machines Corporation; Inventors: Jeffrey P. Bradford, David A. Luick
-
Patent number: 8108655
Abstract: Issue logic identifies a simple fixed point instruction, included in a unified payload, which is ready to issue. The simple fixed point instruction is a type of instruction that is executable by both a fixed point execution unit and a load-store execution unit. In turn, the issue logic determines that the unified payload does not include a load-store instruction that is ready to issue. As a result, the issue logic issues the simple fixed point instruction to the load-store execution unit in response to determining that the simple fixed point instruction is ready to issue and determining that the unified payload does not include a load-store instruction that is ready to issue.
Type: Grant; Filed: March 24, 2009; Date of Patent: January 31, 2012; Assignee: International Business Machines Corporation; Inventors: Christopher Michael Abernathy, James Wilson Bishop, Mary Douglass Brown, William Elton Burky, Robert Allen Cordes, Hung Qui Le, Dung Quoc Nguyen, Todd Alan Venton
-
Publication number: 20120023314
Abstract: A method and mechanism for reducing latency of a multi-cycle scheduler within a processor. A processor comprises a front end pipeline that determines data dependencies between instructions prior to a scheduling pipe stage. For each data dependency, a distance value is determined based on a number of instructions a younger dependent instruction is located from a corresponding older (in program order) instruction. When the younger dependent instruction is allocated an entry in a multi-cycle scheduler, this distance value may be used to locate an entry storing the older instruction in the scheduler. When the older instruction is picked for issue, the younger dependent instruction is marked as pre-picked. In an immediately subsequent clock cycle, the younger dependent instruction may be picked for issue, thereby reducing the latency of the multi-cycle scheduler.
Type: Application; Filed: July 21, 2010; Publication date: January 26, 2012; Inventors: Matthew M. Crum, Michael D. Achenbach, Betty A. McDaniel, Benjamin T. Sander
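The effect of the distance value is that a dependent can issue in the cycle immediately after its producer, with the producer's scheduler entry found as `dependent_entry - distance`. A toy cycle model, with all names and the one-pick-per-cycle simplification being assumptions of the sketch:

```python
def issue_cycles(entries):
    """Toy model of pre-picked back-to-back issue. 'entries' maps an entry
    index to the distance back to its producer's entry, or None if the
    instruction is independent. Independent entries issue one per cycle in
    order; a dependent entry, pre-picked when its producer issues, issues no
    earlier than the producer's cycle + 1. Returns issue cycle per entry."""
    issue_cycle = {}
    cycle = 0
    for idx in sorted(entries):
        dist = entries[idx]
        if dist is None:
            issue_cycle[idx] = cycle
        else:
            # Locate the producer via the distance value; the dependent was
            # pre-picked, so it can issue the very next cycle.
            issue_cycle[idx] = max(cycle, issue_cycle[idx - dist] + 1)
        cycle = issue_cycle[idx] + 1
    return issue_cycle
```

The key observable is that a producer/consumer pair issues in adjacent cycles, which is exactly the multi-cycle-scheduler latency the mechanism removes.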
-
Patent number: 8095779
Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if a plurality of load instructions are in the issue group and, if so, schedule the plurality of load instructions in descending order of longest dependency chain depth to shortest dependency chain depth onto the shortest to longest available execution pipelines; and (3) execute the issue group of instructions in the cascaded delayed execution pipeline unit.
Type: Grant; Filed: February 19, 2008; Date of Patent: January 10, 2012; Assignee: International Business Machines Corporation; Inventor: David A. Luick
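The pairing rule in step (2) can be sketched as two sorts and a zip: loads ordered by descending dependency chain depth meet pipelines ordered by ascending delay, so the most dependency-critical load starts earliest. Field names here are illustrative:

```python
def schedule_loads(loads, pipeline_delays):
    """Illustrative sketch of the load-priority rule: pair loads sorted from
    longest to shortest dependency chain depth with pipelines sorted from
    shortest to longest delay. Returns {load name: pipeline index}."""
    by_depth = sorted(loads, key=lambda l: -l["chain_depth"])
    by_delay = sorted(range(len(pipeline_delays)),
                      key=lambda i: pipeline_delays[i])
    return {l["name"]: pipe for l, pipe in zip(by_depth, by_delay)}
```

Giving the deepest dependency chain the least-delayed pipeline minimizes how long the chain behind it waits for the load's result.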
-
Patent number: 8095778
Abstract: Sharing functional units within a multithreaded processor. In one embodiment, the multithreaded processor may include a multithreaded instruction source that may provide an instruction from each of a plurality of thread groups in a given cycle. A given thread group may include one or more instructions from one or more threads. The arbitration functionality may arbitrate between the plurality of thread groups for access to a functional unit such as a load store unit, for example, that may be shared between the thread groups.
Type: Grant; Filed: June 30, 2004; Date of Patent: January 10, 2012; Assignee: Open Computing Trust I & II; Inventor: Robert T. Golla
-
Publication number: 20110320771
Abstract: A circuit arrangement and method selectively bypass an instruction buffer for selected instructions so that bypassed instructions can be dispatched without having to first pass through the instruction buffer. Thus, for example, in the case that an instruction buffer is partially or completely flushed as a result of an instruction redirect (e.g., due to a branch mispredict), instructions can be forwarded to subsequent stages in an instruction unit and/or to one or more execution units without the latency associated with passing through the instruction buffer.
Type: Application; Filed: June 28, 2010; Publication date: December 29, 2011; Applicant: International Business Machines Corporation; Inventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Patent number: 8086825
Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
Type: Grant; Filed: December 31, 2007; Date of Patent: December 27, 2011; Assignee: Advanced Micro Devices, Inc.; Inventors: Gene Shen, Sean Lie, Marius Evers
-
Patent number: 8086826
Abstract: An information handling system includes a processor with an issue unit (IU) that may perform instruction dependency tracking for successive instruction issue operations. The IU maintains non-shifting issue queue (NSIQ) and shifting issue queue (SIQ) instructions along with relative instruction to instruction dependency information. A mapper maps queue position data for instructions that dispatch to issue queue locations within the IU. The IU may test an issuing producer instruction against consumer instructions in the IU for queue position (QPOS) and register tag (RTAG) matches. A matching consumer instruction may issue in a successive manner in the case of a queue position match or in a next processor cycle in the case of a register tag match.
Type: Grant; Filed: March 24, 2009; Date of Patent: December 27, 2011; Assignee: International Business Machines Corporation; Inventors: Mary Douglass Brown, William Elton Burky, Dung Quoc Nguyen, Balaram Sinharoy
-
Publication number: 20110314260
Abstract: A computer employs a set of General Purpose Registers (GPRs). Each GPR comprises a plurality of portions. Programs such as an Operating System and Applications operating in Large GPR mode access the full GPR; however, programs such as Applications operating in Small GPR mode only have access to a portion at a time. Instruction opcodes, in Small GPR mode, may determine which portion is accessed.
Type: Application; Filed: June 22, 2010; Publication date: December 22, 2011; Applicant: International Business Machines Corporation; Inventors: Dan F. Greiner, Marcel Mitran, Timothy J. Slegel
-
Patent number: 8074056
Abstract: In one implementation, a pipeline processor is provided having a base architecture that includes one or more decoders operable to decode program instructions and generate one or more decoded instructions, and one or more execution units operable to execute the one or more decoded instructions. Each execution unit includes one or more execution pipeline stages. The pipeline processor architecture further includes one or more additional co-processor pipelines. The one or more decoders of the base architecture are operable to recognize one or more instructions to be processed by a given co-processor pipeline and pass the one or more recognized instructions to the given co-processor pipeline for decoding and execution.
Type: Grant; Filed: March 1, 2005; Date of Patent: December 6, 2011; Assignee: Marvell International Ltd.; Inventors: Hong-Yi Chen, Jensen Tjeng
-
Publication number: 20110296142
Abstract: A processor including instruction support for large-operand instructions that use multiple register windows may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may also include an instruction execution unit that, during operation, receives instructions for execution from the instruction fetch unit and executes a large-operand instruction defined within the ISA, where execution of the large-operand instruction is dependent upon a plurality of registers arranged within a plurality of register windows. The processor may further include control circuitry (which may be included within the fetch unit, the execution unit, or elsewhere within the processor) that determines whether one or more of the register windows depended upon by the large-operand instruction are not present. In response to determining that one or more of these register windows are not present, the control circuitry causes them to be restored.
Type: Application; Filed: May 28, 2010; Publication date: December 1, 2011; Inventors: Christopher H. Olson, Paul J. Jordan, Jama I. Barreh
-
Publication number: 20110296143
Abstract: A pipeline processor which meets a latency restriction on an equal model is provided. The pipeline processor includes a pipeline processing unit to process an instruction at a plurality of stages and an equal model compensator to store the results of the processing of some or all of the instructions located in the pipeline processing unit and to write the results of the processing in a register file based on the latency of each instruction.
Type: Application; Filed: December 30, 2010; Publication date: December 1, 2011; Inventors: Heejun Shim, Yenjo Han, Jae-Young Kim, Yeon-Gon Cho, Jinseok Lee
-
Publication number: 20110276784
Abstract: In one embodiment, a current candidate thread is selected from each of multiple first groups of threads using a low granularity selection scheme, where each of the first groups includes multiple threads and the first groups are mutually exclusive. A second group of threads is formed comprising the current candidate thread selected from each of the first groups of threads. A current winning thread is selected from the second group of threads using a high granularity selection scheme. An instruction is fetched from a memory based on a fetch address for a next instruction of the current winning thread. The instruction is then dispatched to one of the execution units for execution, whereby execution stalls of the execution units are reduced by fetching instructions based on the low granularity and high granularity selection schemes.
Type: Application; Filed: May 10, 2010; Publication date: November 10, 2011; Applicant: Telefonaktiebolaget L M Ericsson (publ); Inventors: Evan Gewirtz, Robert Hathaway, Stephan Meier, Edward Ho
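The two-level selection can be sketched as a cheap per-group pick followed by a more careful pick over the much smaller candidate set. Both policies below (lowest thread id per group, then an explicit priority map) are stand-ins for whatever low- and high-granularity schemes the hardware actually uses:

```python
def select_winning_thread(first_groups, priority):
    """Illustrative two-level thread selection: a low-granularity scheme picks
    one candidate per mutually exclusive first group (here: lowest thread id),
    then a high-granularity scheme picks the winner among the candidates
    (here: an explicit priority map)."""
    # Low-granularity pass, one candidate per group.
    candidates = [min(group) for group in first_groups]
    # High-granularity pass over the small second group of candidates.
    return max(candidates, key=lambda t: priority[t])
```

Splitting the decision this way keeps the expensive comparison logic proportional to the number of groups rather than the total number of threads.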
-
Patent number: 8055883
Abstract: A data processing apparatus 1 has a plurality of registers 10 of the same type of register and a plurality of processing pipelines 40, 50, each processing pipeline 40, 50 being arranged to process instructions. At least one instruction includes a destination register specifier specifying which of said registers is a destination register for storing a processing result of the at least one instruction. Instruction issuing circuitry 26 is configured to issue the at least one instruction for processing by one of the plurality of processing pipelines. The instruction issuing circuitry 26 selects the one of the plurality of processing pipelines to which the candidate instruction is issued in dependence upon the value of the destination register specifier of the candidate instruction.
Type: Grant; Filed: July 1, 2009; Date of Patent: November 8, 2011; Assignee: ARM Limited; Inventor: David Raymond Lutz
-
Patent number: 8051275
Abstract: A processor 2 includes an execution cluster 10 having multiple execution units 14, 16, 18, 20. The execution units 14, 16, 18, 20 share result buses 22, 24. Issue circuitry 12 within the execution cluster 10 determines future availability of a result bus 22, 24 for an instruction to be issued (or recently issued) using a known cycle count for that instruction. The availability is tracked for each result bus using a mask register 32 storing a mask value within which each bit position indicates the availability or non-availability of that result bus at a particular processing cycle in the future. The mask value is left shifted each processing cycle.
Type: Grant; Filed: June 1, 2009; Date of Patent: November 1, 2011; Assignee: ARM Limited; Inventors: David James Williamson, Conrado Blasco Allué
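The availability mask can be sketched in a few lines. Note one liberty: the patent describes a left-shifted mask, while this sketch uses the mirrored bit convention (bit i = busy i cycles in the future), so the per-cycle update becomes a right shift; the mask width and names are also assumptions of the sketch.

```python
class ResultBusTracker:
    """Illustrative future-availability mask for one result bus: bit i set
    means the bus is claimed i cycles from now. Each processing cycle the
    future moves one step closer, modeled here by a right shift (the mirror
    image of the left shift described in the abstract)."""

    def __init__(self, width=16):
        self.mask = 0
        self.width = width

    def is_free(self, cycles_ahead):
        return not (self.mask >> cycles_ahead) & 1

    def reserve(self, cycles_ahead):
        # Claim the bus for the cycle this instruction's result will arrive,
        # known from the instruction's fixed cycle count.
        self.mask |= 1 << cycles_ahead

    def tick(self):
        # Advance one processing cycle: every reservation is one cycle nearer.
        self.mask = (self.mask >> 1) & ((1 << self.width) - 1)
```

Issue logic would call `is_free(cycle_count)` before issuing and `reserve(cycle_count)` on issue, guaranteeing no two instructions try to drive the same bus in the same cycle.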
-
Patent number: 8051273
Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S−1 sets, T sets, or a required number of sets, where T<S−1, are used as storage registers that store tags of the instruction cache.
Type: Grant; Filed: November 2, 2010; Date of Patent: November 1, 2011; Assignee: NEC Corporation; Inventor: Shorin Kyo
-
Publication number: 20110252220
Abstract: A method, information processing system, and computer program product crack and/or shorten computer executable instructions. At least one instruction is received and analyzed. An instruction type associated with the at least one instruction is identified. At least one of a base field, an index field, one or more operands, and a mask field of the instruction is analyzed. At least one of the following is then performed: the at least one instruction is organized into a set of units of operation; and the at least one instruction is shortened. The set of units of operation is then executed.
Type: Application
Filed: April 9, 2010
Publication date: October 13, 2011
Applicant: International Business Machines Corporation
Inventors: Fadi BUSABA, Brian CURRAN, Lee EISEN, Bruce GIAMEI, David HUTTON
-
Publication number: 20110246995
Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization of a second thread to predict the performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the shared cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.
Type: Application
Filed: April 5, 2010
Publication date: October 6, 2011
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventors: Alexandra Fedorova, David Vengerov, Kishore Kumar Pusukuri
-
Patent number: 8028284
Abstract: A system is provided having a group of processors which performs coordinated processing, wherein data is transferred to or from the group of processors. When data is transferred from an input queue, a ring buffer, to the group of processors, an identifier adding unit adds an identifier to the data as a tag, the identifier indicating the block of the input queue that contains this data. When data processed by any one of the processors included in the group is transferred to an output queue, a block selecting unit selects one of the blocks of the output queue for storing the data, namely the block corresponding to the tag added to this data.
Type: Grant
Filed: October 31, 2006
Date of Patent: September 27, 2011
Assignee: Sony Computer Entertainment Inc.
Inventor: Yasukichi Ohkawa
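The tag-and-route flow above can be sketched as follows. The class and method names, and the use of a block index as the tag, are illustrative assumptions; the abstract only specifies that outgoing data is tagged with its input-queue block and that results are stored in the output-queue block selected from that tag.

```python
from collections import deque


class TaggedQueue:
    """A queue divided into blocks, modeling the abstract's input and
    output queues. Names and representation are illustrative."""

    def __init__(self, num_blocks):
        self.blocks = [deque() for _ in range(num_blocks)]

    def push(self, data, block):
        self.blocks[block].append(data)

    def pop_with_tag(self, block):
        # Identifier-adding unit: tag outgoing data with the index of
        # the block that contained it.
        return self.blocks[block].popleft(), block

    def store_by_tag(self, data, tag):
        # Block-selecting unit: store the processed result in the
        # output-queue block corresponding to the tag.
        self.blocks[tag].append(data)
```

A processor in the group would call `pop_with_tag` on the input queue, carry the tag alongside the data while processing, and hand both to `store_by_tag` on the output queue.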
-
Publication number: 20110225398Abstract: An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.Type: ApplicationFiled: May 24, 2011Publication date: September 15, 2011Inventors: David T. Hass, Abbas Rashid
-
Publication number: 20110208950
Abstract: A method of instruction issue (3200) in a microprocessor (1100, 1400, or 1500) with execution pipestages (E1, E2, etc.) that executes a producer instruction Ip and issues a candidate instruction I0 (3245) having a source operand dependency on a destination operand of instruction Ip. The method includes issuing the candidate instruction I0 as a function (1720, 1950, 1958, 3235) of a pipestage EN(I0) of first need by the candidate instruction for the source operand, a pipestage EA(Ip) of first availability of the destination operand from the producer instruction, and the one execution pipestage E(Ip) currently associated with the producer instruction. A method of data forwarding (3300) in a microprocessor (1100, 1400, or 1500) having a pipeline (1640) with pipestages (E1, E2, etc.) is also disclosed.
Type: Application
Filed: March 21, 2011
Publication date: August 25, 2011
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventors: Thang Minh Tran, Raul A. Garibay, Jr., James Nolan Hardage
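One plausible reading of the issue function above, under the assumption that one pipestage is traversed per cycle, is that the candidate may issue when the producer's result will be forwardable by the time the candidate first needs it. The comparison itself is an assumption for illustration; the abstract names the three inputs EN(I0), EA(Ip), and E(Ip) but not the exact function.

```python
def can_issue(en_candidate, ea_producer, e_producer):
    """Sketch of the abstract's issue test.

    en_candidate: pipestage of first need of the source operand, EN(I0)
    ea_producer:  pipestage of first availability of the result, EA(Ip)
    e_producer:   pipestage the producer currently occupies, E(Ip)

    Assuming one pipestage per cycle, the producer's result becomes
    available in (EA - E) cycles, and the candidate, if issued now,
    first needs it EN stages after issue.
    """
    cycles_until_result_ready = ea_producer - e_producer
    return en_candidate >= cycles_until_result_ready
```

For example, a candidate needing its operand at stage E2 can issue behind a producer already at E2 whose result appears at E3, but not behind a producer at E1 whose result only appears at E4.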
-
Patent number: 8006072
Abstract: A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data.
Type: Grant
Filed: May 18, 2010
Date of Patent: August 23, 2011
Assignee: Micron Technology, Inc.
Inventors: Neal Andrew Cook, Alan T. Wootton, James Peterson
-
Patent number: 7996654
Abstract: The present invention provides a system and method for a group priority issue scheme for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine the dependency chain depth of all the instructions in the issue group; (3) schedule the instructions in order from the longest dependency chain depth to the shortest; and (4) execute the issue group of instructions in the cascaded delayed execution pipeline unit.
Type: Grant
Filed: February 19, 2008
Date of Patent: August 9, 2011
Assignee: International Business Machines Corporation
Inventor: David A. Luick
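The depth-first ordering in steps (2) and (3) can be sketched as below. Representing the issue group as a map from each instruction to the instructions whose results it needs is an illustrative assumption, not the patent's representation.

```python
def schedule_by_depth(deps):
    """Order instructions longest-dependency-chain-first.

    `deps` maps each instruction name to the names of instructions
    (in the same issue group) whose results it depends on. This
    representation is an assumption for illustration.
    """
    def depth(instr, seen=()):
        # Length of the longest dependency chain ending at `instr`.
        if instr in seen:  # guard against malformed cyclic input
            return 0
        return 1 + max(
            (depth(d, seen + (instr,)) for d in deps[instr]),
            default=0,
        )

    # Python's sort is stable, so ties keep their original order.
    return sorted(deps, key=depth, reverse=True)
```

For the group a, b depends on a, c depends on b, d independent, this yields c (depth 3) first and the depth-1 instructions last.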
-
Patent number: 7991979
Abstract: A system and method for issuing load-dependent instructions in an issue queue in a processing unit. A load miss queue is provided, comprising a physical address field, an issue queue position field, a valid identifier field, a source identifier field, and a data type field. A load instruction that misses a first level cache is dispatched, and both the physical address field and the data type field are set. A load-dependent instruction is identified. In response to identifying the load-dependent instruction, each of the issue queue position field, valid identifier field, and source identifier field is set. If the issue queue position field refers to a flushed instruction, the valid identifier field is cleared. The load instruction is recycled, and the value of the valid identifier field is determined. The load-dependent instruction is then selected for issue on a next processing cycle, independent of the age of the load-dependent instruction.
Type: Grant
Filed: September 23, 2008
Date of Patent: August 2, 2011
Assignee: International Business Machines Corporation
Inventors: Christopher M. Abernathy, Mary D. Brown, William E. Burky, Todd A. Venton
-
Patent number: 7984269
Abstract: A data processing apparatus and method are provided for executing complex instructions. The data processing apparatus executes instructions defining operations to be performed by the data processing apparatus, those instructions including at least one complex instruction defining a sequence of operations to be performed. The data processing apparatus comprises a plurality of execution pipelines, each execution pipeline having a plurality of pipeline stages and arranged to perform at least one associated operation. Issue circuitry interfaces with the plurality of execution pipelines and is used to schedule performance of the operations defined by the instructions. For the at least one complex instruction, the issue circuitry is arranged to schedule the first operation in the sequence and to issue control signals to the execution pipeline with which that first operation is associated, those control signals including an indication of each additional operation in the sequence.
Type: Grant
Filed: June 12, 2007
Date of Patent: July 19, 2011
Assignee: ARM Limited
Inventors: Luc Orion, Cédric Denis Robert Airaud, Boris Sira Alvarez-Heredia
-
Patent number: 7984270
Abstract: The present invention provides a system and method for prioritizing arithmetic instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one arithmetic instruction is in the issue group and, if so, schedule the at least one arithmetic instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolve the conflict by scheduling the at least one arithmetic instruction in a different execution pipeline; and (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
Type: Grant
Filed: February 19, 2008
Date of Patent: July 19, 2011
Assignee: International Business Machines Corporation
Inventor: David A. Luick
-
Patent number: 7984268
Abstract: An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.
Type: Grant
Filed: July 23, 2004
Date of Patent: July 19, 2011
Assignee: NetLogic Microsystems, Inc.
Inventors: David T. Hass, Abbas Rashid
-
Patent number: 7984272
Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for forwarding data in a processor is provided. The design structure includes a processor comprising at least one cascaded delayed execution pipeline unit having a first and a second pipeline, wherein the second pipeline is configured to execute instructions in a common issue group in a delayed manner relative to the first pipeline, and circuitry. The circuitry is configured to determine if a first instruction being executed in the first pipeline modifies data in a data register which is accessed by a second instruction being executed in the second pipeline and, if so, to forward the modified data from the first pipeline to the second pipeline.
Type: Grant
Filed: March 21, 2008
Date of Patent: July 19, 2011
Assignee: International Business Machines Corporation
Inventor: David Arnold Luick
-
Patent number: 7979677
Abstract: A method and device for adaptively allocating reservation station entries to an instruction set with variable operands in a microprocessor. The device includes logic for determining free reservation station queue positions in a reservation station. The device allocates an issue queue to an instruction and writes the instruction into the issue queue as an issue queue entry. The device reads an operand corresponding to the instruction from a general purpose register and writes the operand into the reservation station, using one of the free reservation station positions as a write address. The device writes each reservation station queue position corresponding to said instruction into said issue queue entry. When the instruction is ready for issue to an execution unit, the device reads out the instruction from the issue queue entry and the reservation station queue positions to the execution unit.
Type: Grant
Filed: August 3, 2007
Date of Patent: July 12, 2011
Assignee: International Business Machines Corporation
Inventor: Dung Q. Nguyen
-
Patent number: 7971035
Abstract: A data processing system having a memory for storing instructions and several central processing units for executing instructions, where each central processing unit includes an adaptive power supply which provides, among other data, temperature information. Circuitry is provided that receives the temperature information from the central processing units, selects a central processing unit which has the lowest temperature and is available to execute instructions, and dispatches instructions from the memory to the selected central processing unit.
Type: Grant
Filed: February 6, 2007
Date of Patent: June 28, 2011
Assignee: International Business Machines Corporation
Inventors: Deepak K. Singh, Francois Ibrahim Atallah
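The selection step above is a simple minimum over available cores. A minimal sketch, with the tuple representation of per-CPU state as an illustrative assumption:

```python
def pick_cpu(cpus):
    """Return the id of the available CPU reporting the lowest
    temperature, or None if no CPU is available.

    `cpus` is a list of (cpu_id, temp_c, available) tuples; this
    representation is an assumption for illustration, standing in
    for the temperature data the adaptive power supplies report.
    """
    candidates = [(temp, cpu_id) for cpu_id, temp, available in cpus
                  if available]
    return min(candidates)[1] if candidates else None
```

The dispatcher would call this each time it has instructions ready, routing work toward the coolest core.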
-
Publication number: 20110153989
Abstract: A vector compare-and-exchange operation is performed by: decoding, by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and, responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location and, responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.
Type: Application
Filed: December 22, 2009
Publication date: June 23, 2011
Inventors: Ravi Rajwar, Andrew T. Forsyth
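The element-wise semantics described above can be modeled in pure Python. This is a behavioral sketch only; real hardware operates on packed vector registers, and the function name is illustrative.

```python
def vector_compare_exchange(first, second, third):
    """Model of the vector compare-and-exchange the abstract describes.

    For each lane i, compare first[i] to second[i]; where they match,
    the element is replaced by third[i], otherwise first[i] is kept.
    Returns the new contents of the first storage location.
    """
    if not (len(first) == len(second) == len(third)):
        raise ValueError("all three vectors must have the same length")
    return [t if f == s else f
            for f, s, t in zip(first, second, third)]
```

For example, lanes 0 and 2 match below and are exchanged, while lane 1 is left untouched.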