Instruction Issuing Patents (Class 712/214)
  • Patent number: 8176298
    Abstract: An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.
    Type: Grant
    Filed: August 31, 2004
    Date of Patent: May 8, 2012
    Assignee: NetLogic Microsystems, Inc.
    Inventor: David T. Hass
  • Patent number: 8171262
    Abstract: A method and apparatus for overlaying hazard clearing with a jump instruction within a pipeline microprocessor is described. The apparatus includes hazard logic to detect when a jump instruction specifies that hazards are to be cleared as part of a jump operation. If hazards are to be cleared, the hazard logic disables branch prediction for the jump instruction, thereby causing the jump instruction to proceed down the pipeline until it is finally resolved, and flushing the pipeline behind the jump instruction. Disabling of branch prediction for the jump instruction effectively clears all execution and/or instruction hazards that preceded the jump instruction. Alternatively, hazard logic causes issue control logic to stall the jump instruction for n-cycles until all hazards are cleared. State tracking logic may be provided to determine whether any instructions are executing in the pipeline that create hazards. If so, hazard logic performs normally.
    Type: Grant
    Filed: November 21, 2005
    Date of Patent: May 1, 2012
    Assignee: MIPS Technologies, Inc.
    Inventors: Niels Gram Jeppesen, G. Michael Uhler
  • Patent number: 8171484
    Abstract: A resource management apparatus includes: a resource management part to manage an amount of resources used and an amount of virtual resources of each of a plurality of processing units; a selection and control part to select a processing unit having the smallest sum of the amount of resources used and the amount of virtual resources in response to an external process request, to increase the amount of resources used by the selected processing unit, and to decrease the amount of resources used by the processing unit corresponding to an external process release request in response to that release request; a virtual resource control part to increase the amount of virtual resources of the processing unit corresponding to the process release request in response to that request; and a request sending part to send the external process request or process release request to the selected or corresponding processing unit.
    Type: Grant
    Filed: October 24, 2007
    Date of Patent: May 1, 2012
    Assignee: Fujitsu Limited
    Inventors: Katsuhiko Yamatsu, Hidetada Tanaka, Kazuaki Sumi, Nobuyuki Shima
  • Patent number: 8171261
    Abstract: A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.
    Type: Grant
    Filed: September 2, 2003
    Date of Patent: May 1, 2012
    Assignee: Intel Corporation
    Inventors: Salvador Palanca, Stephen A. Fischer, Subramaniam Maiyuran, Shekoufeh Qawami
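    Illustrative sketch (not part of patent 8171261): a minimal Python model of the ordering behavior described in the abstract above: accesses younger than the fencing instruction stall, older accesses retire, and the fence dispatches from its buffer only after the older accesses have drained. The class and field names are invented for illustration.

      from collections import deque

      class MemoryOrderingUnit:
          # Toy model: a fencing instruction splits memory accesses into
          # "older" and "newer"; newer accesses stall until every older
          # access has retired, then the fence is dispatched.
          def __init__(self):
              self.older = deque()       # accesses that preceded the fence
              self.pending_fence = None  # fence waiting in its allocated buffer
              self.newer = deque()       # accesses stalled behind the fence

          def receive(self, op):
              if op == "FENCE":
                  self.pending_fence = op      # allocate the buffer to the fence
              elif self.pending_fence is None:
                  self.older.append(op)        # pre-fence access, free to proceed
              else:
                  self.newer.append(op)        # post-fence access, stalled

          def cycle(self):
              if self.older:
                  print("retire", self.older.popleft())
              elif self.pending_fence is not None:
                  print("dispatch", self.pending_fence)
                  self.pending_fence = None
                  self.older, self.newer = self.newer, deque()  # unblock newer accesses

      mou = MemoryOrderingUnit()
      for op in ["LD A", "ST B", "FENCE", "LD C"]:
          mou.receive(op)
      for _ in range(4):
          mou.cycle()   # retires LD A, retires ST B, dispatches FENCE, retires LD C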
  • Publication number: 20120096243
    Abstract: A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.
    Type: Application
    Filed: October 27, 2011
    Publication date: April 19, 2012
    Applicant: Aspen Acquisition Corporation
    Inventors: Erdem Hokenek, Mayan Moudgill, Michael J. Schulte, C. John Glossner
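    Illustrative sketch (not part of publication 20120096243): a few lines of Python showing the simplest possible instruction issuance sequence in which exactly one designated thread may issue per clock cycle and the designation rotates over the cycles; the real sequence need not be strict round-robin.

      def issue_schedule(num_threads, cycles):
          # One designated thread per clock cycle; the designation varies
          # over the cycles (here: plain rotation, purely as an example).
          return [cycle % num_threads for cycle in range(cycles)]

      # With 4 hardware thread units over 8 cycles:
      print(issue_schedule(4, 8))   # [0, 1, 2, 3, 0, 1, 2, 3]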
  • Publication number: 20120089819
    Abstract: The described embodiments include a processor that determines instructions that can be issued based on unresolved data dependencies. In an issue unit in the processor, the processor keeps a record of each instruction that is directly or indirectly dependent on a base instruction. Upon determining that the base instruction has been deferred, the processor monitors instructions that are being issued from an issue queue to an execution unit for execution. Upon determining that an instruction from the record has reached a head of the issue queue, the processor immediately issues the instruction from the issue queue.
    Type: Application
    Filed: October 6, 2010
    Publication date: April 12, 2012
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Shailender Chaudhry, Richard Thuy Van, Robert E. Cypher, Debasish Chandra
  • Patent number: 8151007
    Abstract: A computer of an information processing apparatus repeatedly accepts an operation designating at least one of a plurality of command elements making up a command, and executes at least one of a first memory writing process, which writes a first command element having a specific attribute among the command elements corresponding to the accepted operation into a first memory, and a second memory writing process, which writes a second command element having a different attribute into a second memory. After each writing process, the computer determines whether the command element array stored across the first and second memories satisfies an execution-allowable condition, and processes information according to the command element array when that condition is satisfied.
    Type: Grant
    Filed: April 8, 2008
    Date of Patent: April 3, 2012
    Assignee: Nintendo Co., Ltd.
    Inventor: Hiroshi Momose
  • Publication number: 20120072700
    Abstract: A processor includes an instruction fetch unit, an issue queue coupled to the instruction fetch unit, an execution unit coupled to the issue queue, and a multi-level register file including a first level register file having lower access latency and a second level register file having higher access latency. Each of the first and second level register files includes a plurality of physical registers for holding operands that is concurrently shared by a plurality of threads. The processor further includes a mapper that, at dispatch of an instruction specifying a source logical register from the instruction fetch unit to the issue queue, initiates a swap of a first operand associated with the source logical register that is in the second level register file with a second operand held in the first level register file. The issue queue, following the swap, issues the instruction to the execution unit for execution.
    Type: Application
    Filed: September 17, 2010
    Publication date: March 22, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher M. Abernathy, Mary D. Brown, Hung Q. Le, Dung Q. Nguyen
  • Patent number: 8140830
    Abstract: A circuit arrangement and method utilize a plurality of execution units having different power and performance characteristics and capabilities within a multithreaded processor core, and selectively route instructions having different performance requirements to different execution units based upon those performance requirements. As such, instructions that have high performance requirements, such as instructions associated with primary tasks or time sensitive tasks, can be routed to a higher performance execution unit to maximize performance when executing those instructions, while instructions that have low performance requirements, such as instructions associated with background tasks or non-time sensitive tasks, can be routed to a reduced power execution unit to reduce the power consumption (and associated heat generation) associated with executing those instructions.
    Type: Grant
    Filed: May 22, 2008
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Stephen Joseph Schwinn, Matthew Ray Tubbs, Charles David Wait
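    Illustrative sketch (not part of patent 8140830): a toy Python routing policy in the spirit of the abstract above, sending time-sensitive work to a higher-performance execution unit and background work to a reduced-power one. The "time_sensitive" tag stands in for whatever performance-requirement hint a real core would carry.

      def route(instruction, high_perf_unit, low_power_unit):
          # Time-sensitive instructions go to the faster (hungrier) unit,
          # background instructions to the reduced-power unit.
          unit = high_perf_unit if instruction["time_sensitive"] else low_power_unit
          unit.append(instruction["op"])

      high_perf, low_power = [], []
      for insn in [{"op": "fmadd", "time_sensitive": True},
                   {"op": "log_update", "time_sensitive": False}]:
          route(insn, high_perf, low_power)
      print(high_perf, low_power)   # ['fmadd'] ['log_update']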
  • Patent number: 8140829
    Abstract: A processor includes primary threads of execution that may simultaneously issue instructions, and one or more backup threads. When a primary thread stalls, the contents of its instruction buffer may be switched with the instruction buffer for a backup thread, thereby allowing the backup thread to begin execution. This design allows two primary threads to issue simultaneously, which allows for overlap of instruction pipeline latencies. This design further allows a fast switch to a backup thread when a primary thread stalls, thereby providing significantly improved throughput in executing instructions by the processor.
    Type: Grant
    Filed: November 20, 2003
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Richard James Eickemeyer, David Arnold Luick
  • Patent number: 8139061
    Abstract: A floating point execution unit calculates a one minus dot product value in a single pass. As such, the dependency that otherwise would be required to perform the calculations is eliminated, resulting in a substantially faster performance of such calculations. The floating point execution unit may be used, for example, to accelerate pixel shading algorithms such as Fresnel and electron microscope effects.
    Type: Grant
    Filed: August 1, 2008
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Adam James Muff, Matthew Ray Tubbs
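    Illustrative sketch (not part of patent 8139061): the fused quantity the abstract above computes in a single pass is 1 - (a · b); evaluating it as one operation removes the dependency between the dot product and the subtraction. A plain Python version for reference:

      def one_minus_dot(a, b):
          # 1 - (a . b), the quantity a Fresnel-style shading term needs.
          return 1.0 - sum(x * y for x, y in zip(a, b))

      normal, view = (0.0, 0.0, 1.0), (0.0, 0.6, 0.8)
      print(one_minus_dot(normal, view))   # 0.2 (up to floating-point rounding)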
  • Publication number: 20120066472
    Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
    Type: Application
    Filed: November 17, 2011
    Publication date: March 15, 2012
    Inventor: Jeffry E. Gonion
  • Patent number: 8131980
    Abstract: A design structure for resolving the occurrence of livelock at the interface between the processor core and memory subsystem controller. Livelock is resolved by introducing a livelock detection mechanism (which includes livelock detection utility or logic) within the processor to detect a livelock condition and dynamically change the duration of the delay stage(s) in order to alter the “harmonic” fixed-cycle loop behavior. The livelock detection logic (LDL) counts the number of flushes a particular instruction takes or the number of times an instruction re-issues without completing. The LDL then compares that number to a preset threshold number. Based on the result of the comparison, the LDL triggers the implementation of one of two different livelock resolution processes.
    Type: Grant
    Filed: June 3, 2008
    Date of Patent: March 6, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ronald Hall, Michael L. Karm, Alvan W. Ng, Todd A. Venton
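    Illustrative sketch (not part of patent 8131980): a toy Python version of the detection logic, counting how many times an instruction re-issues without completing and triggering a resolution step once a preset threshold is crossed. The threshold value and the printed "resolution" are placeholders.

      class LivelockDetector:
          def __init__(self, threshold=8):
              self.threshold = threshold
              self.counts = {}          # per-instruction re-issue counts

          def on_reissue(self, tag):
              self.counts[tag] = self.counts.get(tag, 0) + 1
              if self.counts[tag] > self.threshold:
                  self.resolve(tag)

          def on_complete(self, tag):
              self.counts.pop(tag, None)

          def resolve(self, tag):
              # A real design would pick one of two resolution processes,
              # e.g. stretching the delay stages; here we only report it.
              print(f"livelock suspected for {tag}: altering delay-stage timing")
              self.counts[tag] = 0

      det = LivelockDetector(threshold=3)
      for _ in range(5):
          det.on_reissue("load@0x40")   # resolution fires once the count exceeds 3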
  • Patent number: 8127120
    Abstract: A method for executing, by a processing unit, a program stored in a memory includes detecting a piece of information during the execution of the program by the processing unit and, if the information is detected, triggering the execution of a hidden subprogram by the processing unit. The method may be applied to securing an integrated circuit.
    Type: Grant
    Filed: April 23, 2008
    Date of Patent: February 28, 2012
    Assignee: STMicroelectronics SA
    Inventor: Philippe Roquelaure
  • Patent number: 8127114
    Abstract: A method of processing a plurality of instructions in multiple pipeline stages within a pipeline processor is disclosed. The method partially or wholly executes a stalled instruction in a pipeline stage that has a function other than instruction execution prior to the execution stage within the processor. Partially or wholly executing the instruction prior to the execution stage in the pipeline speeds up the execution of the instruction and allows the processor to more effectively utilize its resources, thus increasing the processor's efficiency.
    Type: Grant
    Filed: March 28, 2007
    Date of Patent: February 28, 2012
    Assignee: QUALCOMM Incorporated
    Inventors: Kiran Ravi Seth, James Norris Dieffenderfer, Michael Scott McIlvaine, Nathan Samuel Nunamaker
  • Patent number: 8127115
    Abstract: Disclosed are a method and a system for grouping processor instructions for execution by a processor, where the group of processor instructions includes at least two branch processor instructions. In one or more embodiments, an instruction buffer can decouple an instruction fetch operation from an instruction decode operation by storing fetched processor instructions in the instruction buffer until the fetched processor instructions are ready to be decoded. Group formation can involve removing processor instructions from the instruction buffer and routing the processor instructions to latches that convey the processor instructions to decoders. Processor instructions that are removed from the instruction buffer in a single clock cycle can be called a group of processor instructions. In one or more embodiments, the first instruction in the group must be the oldest instruction in the instruction buffer and instructions must be removed from the instruction buffer ordered from oldest to youngest.
    Type: Grant
    Filed: April 3, 2009
    Date of Patent: February 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Richard William Doing, Kevin Neal Magil, Balaram Sinharoy, Jeffrey R. Summers, James Albert Van Norstrand, Jr.
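    Illustrative sketch (not part of patent 8127115): a toy Python group-formation loop that pulls instructions oldest-first from the instruction buffer until the group is full, allowing up to two branches per group. The group-size and branch limits are assumptions for illustration.

      from collections import deque

      def form_group(instruction_buffer, max_group=4, max_branches=2):
          group, branches = [], 0
          while instruction_buffer and len(group) < max_group:
              insn = instruction_buffer[0]         # oldest instruction first
              if insn.startswith("br"):
                  if branches == max_branches:
                      break                        # next branch starts a new group
                  branches += 1
              group.append(instruction_buffer.popleft())
          return group

      buf = deque(["add", "br.eq", "ld", "br.ne", "sub"])
      print(form_group(buf))   # ['add', 'br.eq', 'ld', 'br.ne']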
  • Patent number: 8112758
    Abstract: Techniques are disclosed for allocation of resources in a distributed computing system. For example, a method for allocating a set of one or more components of an application to a set of one or more resource groups includes the following steps performed by a computer system. The set of one or more resource groups is ordered based on respective failure measures and resource capacities associated with the one or more resource groups. An importance value is assigned to each of the one or more components, wherein the importance value is associated with an effect of the component on an output of the application. The one or more components are assigned to the one or more resource groups based on the importance value of each component and the respective failure measures and resource capacities associated with the one or more resource groups, wherein components with higher importance values are assigned to resource groups with lower failure measures and higher resource capacities.
    Type: Grant
    Filed: January 8, 2008
    Date of Patent: February 7, 2012
    Assignee: International Business Machines Corporation
    Inventors: Navendu Jain, Yoonho Park, Deepak S. Turaga, Chitra Venkatramani
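    Illustrative sketch (not part of patent 8112758): a toy Python version of the allocation rule, ordering resource groups from lowest failure measure and highest capacity downward, ordering components by importance, and pairing them off. The field names are assumptions.

      def assign(components, groups):
          groups = sorted(groups, key=lambda g: (g["failure"], -g["capacity"]))
          components = sorted(components, key=lambda c: -c["importance"])
          # Most important component -> most reliable, highest-capacity group.
          return {c["name"]: g["name"] for c, g in zip(components, groups)}

      comps = [{"name": "ingest", "importance": 0.9},
               {"name": "report", "importance": 0.2}]
      grps = [{"name": "rackA", "failure": 0.01, "capacity": 64},
              {"name": "rackB", "failure": 0.10, "capacity": 16}]
      print(assign(comps, grps))   # {'ingest': 'rackA', 'report': 'rackB'}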
  • Patent number: 8112616
    Abstract: In wireless communications such as in the Bluetooth communication system, an execution unit sequentially receives software instructions for execution. Prior to completing each instruction, the execution unit issues an interrupt indicating the upcoming completion of the instruction execution and awaits receipt of the next instruction. A Link Manager issues limited instructions, and a Link Controller includes a hardware execution unit for executing the limited instructions. A processing unit in the Link Manager performs remaining functions under control of a software program.
    Type: Grant
    Filed: May 6, 2010
    Date of Patent: February 7, 2012
    Assignee: Broadcom Corporation
    Inventor: Joakim Linde
  • Patent number: 8108610
    Abstract: One embodiment of the invention sets forth a mechanism for efficiently processing atomic operations transmitted from multiple general processing clusters to an L2 cache. A tag look-up unit tracks the availability of each cache line in the L2 cache, reserves the necessary cache lines for the atomic operations and transmits the atomic operations to an ALU for processing. The tag look-up unit also increments a reference counter associated with a reserved cache line each time an atomic operation associated with that cache line is received. This feature allows multiple atomic operations associated with the same cache line to be pipelined to the ALU. A ROP unit that includes the ALU may request additional data necessary to process an atomic operation from the L2 cache. Result data is stored in the L2 cache and may also be returned to the general processing clusters.
    Type: Grant
    Filed: October 21, 2008
    Date of Patent: January 31, 2012
    Assignee: NVIDIA Corporation
    Inventors: David B. Glasco, Peter B. Holmqvist, George R. Lynch, Patrick R. Marchand, Karan Mehra, James Roberts
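    Illustrative sketch (not part of patent 8108610): a toy Python model of the tag look-up behavior, reserving a cache line for an atomic operation, bumping the line's reference count for each further atomic to the same line so they can be pipelined to the ALU, and releasing the line when the count drains to zero.

      class TagLookupUnit:
          def __init__(self):
              self.refcount = {}   # reserved cache line -> outstanding atomics

          def receive_atomic(self, line):
              self.refcount[line] = self.refcount.get(line, 0) + 1
              return f"send atomic on line {line:#x} to ALU"

          def atomic_done(self, line):
              self.refcount[line] -= 1
              if self.refcount[line] == 0:
                  del self.refcount[line]          # un-reserve the cache line

      tlu = TagLookupUnit()
      print(tlu.receive_atomic(0x80))
      print(tlu.receive_atomic(0x80))   # same line: pipelined, refcount is now 2
      tlu.atomic_done(0x80); tlu.atomic_done(0x80)
      print(tlu.refcount)               # {} -> line released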
  • Patent number: 8108654
    Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to receive an issue group of instructions, reorder the issue group of instructions using instruction type priority, and execute the reordered issue group of instructions in the cascaded delayed execution pipeline unit. The method, among others, can be broadly summarized by the following steps: receiving an issue group of instructions, reordering the issue group of instructions using instruction type priority, and executing the reordered issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: January 31, 2012
    Assignee: International Business Machines Corporation
    Inventors: Jeffrey P. Bradford, David A. Luick
  • Patent number: 8108655
    Abstract: Issue logic identifies a simple fixed point instruction, included in a unified payload, which is ready to issue. The simple fixed point instruction is a type of instruction that is executable by both a fixed point execution unit and a load-store execution unit. In turn, the issue logic determines that the unified payload does not include a load-store instruction that is ready to issue. As a result, the issue logic issues the simple fixed point instruction to the load-store execution unit in response to determining that the simple fixed point instruction is ready to issue and determining that the unified payload does not include a load-store instruction that is ready to issue.
    Type: Grant
    Filed: March 24, 2009
    Date of Patent: January 31, 2012
    Assignee: International Business Machines Corporation
    Inventors: Christopher Michael Abernathy, James Wilson Bishop, Mary Douglass Brown, William Elton Burky, Robert Allen Cordes, Hung Qui Le, Dung Quoc Nguyen, Todd Alan Venton
  • Publication number: 20120023314
    Abstract: A method and mechanism for reducing latency of a multi-cycle scheduler within a processor. A processor comprises a front end pipeline that determines data dependencies between instructions prior to a scheduling pipe stage. For each data dependency, a distance value is determined based on a number of instructions a younger dependent instruction is located from a corresponding older (in program order) instruction. When the younger dependent instruction is allocated an entry in a multi-cycle scheduler, this distance value may be used to locate an entry storing the older instruction in the scheduler. When the older instruction is picked for issue, the younger dependent instruction is marked as pre-picked. In an immediately subsequent clock cycle, the younger dependent instruction may be picked for issue, thereby reducing the latency of the multi-cycle scheduler.
    Type: Application
    Filed: July 21, 2010
    Publication date: January 26, 2012
    Inventors: Matthew M. Crum, Michael D. Achenbach, Betty A. McDaniel, Benjamin T. Sander
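    Illustrative sketch (not part of publication 20120023314): a toy Python scheduler showing the distance-value trick, in which a dependent entry records how many slots back its producer sits, the pick of the producer marks the dependent as pre-picked, and the dependent can then be picked on the very next cycle. Entry layout and field names are assumptions.

      class Scheduler:
          def __init__(self):
              self.entries = []   # entries in program order

          def allocate(self, op, distance=None):
              # distance = how many entries back the producer was allocated
              self.entries.append({"op": op, "distance": distance, "prepicked": False})

          def pick(self, index):
              print("issue", self.entries[index]["op"])
              for pos, e in enumerate(self.entries):
                  if e["distance"] is not None and pos - e["distance"] == index:
                      e["prepicked"] = True   # dependent may issue next cycle

      sched = Scheduler()
      sched.allocate("load r1, [r2]")
      sched.allocate("add r3, r1, r4", distance=1)   # producer sits 1 slot back
      sched.pick(0)
      print(sched.entries[1]["prepicked"])           # True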
  • Patent number: 8095779
    Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if a plurality of load instructions are in the issue group and, if so, schedule the plurality of load instructions in descending order from longest dependency chain depth to shortest dependency chain depth across the shortest to longest available execution pipelines; and (3) execute the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: January 10, 2012
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
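    Illustrative sketch (not part of patent 8095779): a toy Python version of the scheduling rule, giving the load that feeds the deepest dependency chain to the least-delayed pipeline and the load with the shallowest chain to the most-delayed one.

      def schedule_loads(loads, pipelines):
          by_depth = sorted(loads, key=lambda l: -l["chain_depth"])
          by_delay = sorted(pipelines, key=lambda p: p["delay"])
          # Deepest chain -> shortest-delay pipeline, and so on down the list.
          return {l["op"]: p["name"] for l, p in zip(by_depth, by_delay)}

      loads = [{"op": "ld r1", "chain_depth": 5}, {"op": "ld r2", "chain_depth": 1}]
      pipes = [{"name": "P3", "delay": 6}, {"name": "P0", "delay": 0}]
      print(schedule_loads(loads, pipes))   # {'ld r1': 'P0', 'ld r2': 'P3'}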
  • Patent number: 8095778
    Abstract: Sharing functional units within a multithreaded processor. In one embodiment, the multithreaded processor may include a multithreaded instruction source that may provide an instruction from each of a plurality of thread groups in a given cycle. A given thread group may include one or more instructions from one or more threads. The arbitration functionality may arbitrate between the plurality of thread groups for access to a functional unit such as a load store unit, for example, that may be shared between the thread groups.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: January 10, 2012
    Assignee: Open Computing Trust I & II
    Inventor: Robert T. Golla
  • Publication number: 20110320771
    Abstract: A circuit arrangement and method selectively bypass an instruction buffer for selected instructions so that bypassed instructions can be dispatched without having to first pass through the instruction buffer. Thus, for example, in the case that an instruction buffer is partially or completely flushed as a result of an instruction redirect (e.g., due to a branch mispredict), instructions can be forwarded to subsequent stages in an instruction unit and/or to one or more execution units without the latency associated with passing through the instruction buffer.
    Type: Application
    Filed: June 28, 2010
    Publication date: December 29, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 8086825
    Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: December 27, 2011
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gene Shen, Sean Lie, Marius Evers
  • Patent number: 8086826
    Abstract: An information handling system includes a processor with an issue unit (IU) that may perform instruction dependency tracking for successive instruction issue operations. The IU maintains non-shifting issue queue (NSIQ) and shifting issue queue (SIQ) instructions along with relative instruction to instruction dependency information. A mapper maps queue position data for instructions that dispatch to issue queue locations within the IU. The IU may test an issuing producer instruction against consumer instructions in the IU for queue position (QPOS) and register tag (RTAG) matches. A matching consumer instruction may issue in a successive manner in the case of a queue position match or in a next processor cycle in the case of a register tag match.
    Type: Grant
    Filed: March 24, 2009
    Date of Patent: December 27, 2011
    Assignee: International Business Machines Corporation
    Inventors: Mary Douglass Brown, William Elton Burky, Dung Quoc Nguyen, Balaram Sinharoy
  • Publication number: 20110314260
    Abstract: A computer employs a set of General Purpose Registers (GPRs). Each GPR comprises a plurality of portions. Programs such as an Operating System and Applications operating in Large GPR mode access the full GPR; however, programs such as Applications operating in Small GPR mode only have access to one portion at a time. Instruction Opcodes, in Small GPR mode, may determine which portion is accessed.
    Type: Application
    Filed: June 22, 2010
    Publication date: December 22, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Marcel Mitran, Timothy J. Slegel
  • Patent number: 8074056
    Abstract: In one implementation, a pipeline processor is provided having a base architecture that includes one or more decoders operable to decode program instructions and generate one or more decoded instructions, and one or more execution units operable to execute the one or more decoded instructions. Each execution unit includes one or more execution pipeline stages. The pipeline processor architecture further includes one or more additional co-processor pipelines. The one or more decoders of the base architecture are operable to recognize one or more instructions to be processed by a given co-processor pipeline and pass the one or more recognized instructions to the given co-processor pipeline for decoding and execution.
    Type: Grant
    Filed: March 1, 2005
    Date of Patent: December 6, 2011
    Assignee: Marvell International Ltd.
    Inventors: Hong-Yi Chen, Jensen Tjeng
  • Publication number: 20110296142
    Abstract: A processor including instruction support for large-operand instructions that use multiple register windows may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may also include an instruction execution unit that, during operation, receives instructions for execution from the instruction fetch unit and executes a large-operand instruction defined within the ISA, where execution of the large-operand instruction is dependent upon a plurality of registers arranged within a plurality of register windows. The processor may further include control circuitry (which may be included within the fetch unit, the execution unit, or elsewhere within the processor) that determines whether one or more of the register windows depended upon by the large-operand instruction are not present. In response to determining that one or more of these register windows are not present, the control circuitry causes them to be restored.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Inventors: Christopher H. Olson, Paul J. Jordan, Jama I. Barreh
  • Publication number: 20110296143
    Abstract: A pipeline processor which meets a latency restriction on an equal model is provided. The pipeline processor includes a pipeline processing unit to process an instruction at a plurality of stages and an equal model compensator to store the results of the processing of some or all of the instructions located in the pipeline processing unit and to write the results of the processing in a register file based on the latency of each instruction.
    Type: Application
    Filed: December 30, 2010
    Publication date: December 1, 2011
    Inventors: Heejun Shim, Yenjo Han, Jae-Young Kim, Yeon-Gon Cho, Jinseok Lee
  • Publication number: 20110276784
    Abstract: In one embodiment, a current candidate thread is selected from each of multiple first groups of threads using a low granularity selection scheme, where each of the first groups includes multiple threads and the first groups are mutually exclusive. A second group of threads is formed comprising the current candidate thread selected from each of the first groups of threads. A current winning thread is selected from the second group of threads using a high granularity selection scheme. An instruction is fetched from a memory based on a fetch address for a next instruction of the current winning thread. The instruction is then dispatched to one of the execution units for execution, whereby execution stalls of the execution units are reduced by fetching instructions based on the low granularity and high granularity selection schemes.
    Type: Application
    Filed: May 10, 2010
    Publication date: November 10, 2011
    Applicant: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)
    Inventors: Evan Gewirtz, Robert Hathaway, Stephan Meier, Edward Ho
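    Illustrative sketch (not part of publication 20110276784): a toy Python version of the two-level selection, with a cheap round-robin (low granularity) picking one candidate thread per first group and a more informed comparison (high granularity, here simply "fewest outstanding stalls") picking the current winning thread from the resulting second group. The stall metric is an assumption.

      def select_thread(first_groups, rr_state):
          candidates = []
          for gid, group in enumerate(first_groups):
              idx = rr_state[gid] % len(group)      # low-granularity round-robin
              rr_state[gid] += 1
              candidates.append(group[idx])         # forms the second group
          return min(candidates, key=lambda t: t["stalls"])   # high granularity

      groups = [[{"id": 0, "stalls": 3}, {"id": 1, "stalls": 0}],
                [{"id": 2, "stalls": 1}, {"id": 3, "stalls": 7}]]
      state = [0, 0]
      print(select_thread(groups, state)["id"])   # candidates 0 and 2 -> thread 2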
  • Patent number: 8055883
    Abstract: A data processing apparatus 1 has a plurality of registers 10 of the same register type and a plurality of processing pipelines 40, 50, each processing pipeline 40, 50 being arranged to process instructions. At least one instruction includes a destination register specifier specifying which of said registers is a destination register for storing a processing result of the at least one instruction. Instruction issuing circuitry 26 is configured to issue the at least one instruction for processing by one of the plurality of processing pipelines. The instruction issuing circuitry 26 selects the one of the plurality of processing pipelines to which the candidate instruction is issued in dependence upon the value of the destination register specifier of the candidate instruction.
    Type: Grant
    Filed: July 1, 2009
    Date of Patent: November 8, 2011
    Assignee: ARM Limited
    Inventor: David Raymond Lutz
  • Patent number: 8051275
    Abstract: A processor 2 includes an execution cluster 10 having multiple execution units 14, 16, 18, 20. The execution units 14, 16, 18, 20 share result buses 22, 24. Issue circuitry 12 within the execution cluster 10 determines future availability of a result bus 22, 24 for an instruction to be issued (or recently issued) using a known cycle count for that instruction. The availability is tracked for each result bus using a mask register 32 storing a mask value within which each bit position indicates the availability or non-availability of that result bus at a particular processing cycle in the future. The mask value is left shifted each processing cycle.
    Type: Grant
    Filed: June 1, 2009
    Date of Patent: November 1, 2011
    Assignee: ARM Limited
    Inventors: David James Williamson, Conrado Blasco Allué
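    Illustrative sketch (not part of patent 8051275): a toy Python version of the result-bus mask. Bit (WIDTH-1-k) of the mask means the bus is already claimed k cycles in the future, so left-shifting the mask once per cycle drops each bit just as its cycle arrives. The 8-bit width is an assumption.

      class ResultBusTracker:
          WIDTH = 8

          def __init__(self):
              self.mask = 0

          def _bit(self, cycles_ahead):
              return 1 << (self.WIDTH - 1 - cycles_ahead)

          def can_issue(self, result_latency):
              # Free only if no earlier instruction owns the bus that cycle.
              return not self.mask & self._bit(result_latency)

          def claim(self, result_latency):
              self.mask |= self._bit(result_latency)

          def tick(self):
              self.mask = (self.mask << 1) & ((1 << self.WIDTH) - 1)

      bus = ResultBusTracker()
      bus.claim(3)                 # a 3-cycle instruction claims the bus
      print(bus.can_issue(3))      # False: a second 3-cycle op would collide
      bus.tick()
      print(bus.can_issue(2))      # False: the same slot is now 2 cycles away
      print(bus.can_issue(3))      # True: 3 cycles out is free again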
  • Patent number: 8051273
    Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S−1 sets, T sets, or a required number of sets, where T<S−1, are used as storage registers that store tags of the instruction cache.
    Type: Grant
    Filed: November 2, 2010
    Date of Patent: November 1, 2011
    Assignee: NEC Corporation
    Inventor: Shorin Kyo
  • Publication number: 20110252220
    Abstract: A method, information processing system, and computer program product crack and/or shorten computer executable instructions. At least one instruction is received. The at least one instruction is analyzed. An instruction type associated with the at least one instruction is identified. At least one of a base field, an index field, one or more operands, and a mask field of the instruction is analyzed. At least one of the following is then performed: the at least one instruction is organized into a set of units of operation; and the at least one instruction is shortened. The set of units of operation is then executed.
    Type: Application
    Filed: April 9, 2010
    Publication date: October 13, 2011
    Applicant: International Business Machines Corporation
    Inventors: Fadi Busaba, Brian Curran, Lee Eisen, Bruce Giamei, David Hutton
  • Publication number: 20110246995
    Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization of a second thread to predict a performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.
    Type: Application
    Filed: April 5, 2010
    Publication date: October 6, 2011
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Alexandra Fedorova, David Vengerov, Kishore Kumar Pusukuri
  • Patent number: 8028284
    Abstract: A system is provided having a group of processors which performs coordinated processing, wherein data is transferred to the group of processors or from the group of processors. When data is transferred from an input queue, a ring buffer, to the group of processors, an identifier adding unit adds an identifier to the data as a tag, the identifier indicating a block that contains this data in the input queue. When data processed by any one of the processors included in the group of processors is transferred to an output queue, a block selecting unit selects one of blocks of the output queue as a block for storing the data, the one corresponding to the tag added to this data.
    Type: Grant
    Filed: October 31, 2006
    Date of Patent: September 27, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Yasukichi Ohkawa
  • Publication number: 20110225398
    Abstract: An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.
    Type: Application
    Filed: May 24, 2011
    Publication date: September 15, 2011
    Inventors: David T. Hass, Abbas Rashid
  • Publication number: 20110208950
    Abstract: A method of instruction issue (3200) in a microprocessor (1100, 1400, or 1500) with execution pipestages (E1, E2, etc.) and that executes a producer instruction Ip and issues a candidate instruction I0 (3245) having a source operand dependency on a destination operand of instruction Ip. The method includes issuing the candidate instruction I0 as a function (1720, 1950, 1958, 3235) of a pipestage EN(I0) of first need by the candidate instruction for the source operand, a pipestage EA(Ip) of first availability of the destination operand from the producer instruction, and the one execution pipestage E(Ip) currently associated with the producer instruction. A method of data forwarding (3300) in a microprocessor (1100, 1400, or 1500) having a pipeline (1640) having pipestages (E1, E2, etc.).
    Type: Application
    Filed: March 21, 2011
    Publication date: August 25, 2011
    Applicant: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Thang Minh Tran, Raul A. Garibay, JR., James Nolan Hardage
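    Illustrative sketch (not part of publication 20110208950): one plausible reading of the issue test, written as a few lines of Python. If the candidate I0 is issued now, it reaches its pipestage of first need EN(I0) in a known number of cycles, while the producer Ip, currently at pipestage E(Ip), reaches its pipestage of first availability EA(Ip) in EA(Ip) - E(Ip) cycles; issue (with forwarding) is allowed when the operand is ready in time. The exact inequality in the application may differ.

      def can_issue(EN_I0, EA_Ip, E_Ip):
          cycles_until_needed = EN_I0                      # stage of first need, counted from issue
          cycles_until_available = max(EA_Ip - E_Ip, 0)    # producer's remaining distance
          return cycles_until_available <= cycles_until_needed

      # Producer is in E1 and produces its result in E3; the candidate
      # first needs the operand in E2 vs. E1:
      print(can_issue(EN_I0=2, EA_Ip=3, E_Ip=1))   # True: ready just in time
      print(can_issue(EN_I0=1, EA_Ip=3, E_Ip=1))   # False: hold the candidate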
  • Patent number: 8006072
    Abstract: A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data.
    Type: Grant
    Filed: May 18, 2010
    Date of Patent: August 23, 2011
    Assignee: Micron Technology, Inc.
    Inventors: Neal Andrew Cook, Alan T. Wootton, James Peterson
  • Patent number: 7996654
    Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions, (2) determine the dependency chain depth of all the instructions in the issue group, (3) schedule the instructions in an order of the longest dependency chain depth to shortest dependency chain depth, and (4) execute the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: August 9, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7991979
    Abstract: A system and method for issuing load-dependent instructions in an issue queue in a processing unit. A load miss queue is provided. The load miss queue comprises a physical address field, an issue queue position field, a valid identifier field, a source identifier field, and a data type field. A load instruction that misses a first level cache is dispatched, and both the physical address field and the data type field are set. A load-dependent instruction is identified. In response to identifying the load-dependent instruction, each of the issue queue position field, valid identifier field, and source identifier field are set. If the issue queue position field refers to a flushed instruction, the valid identifier field is cleared. The load instruction is recycled, and a value of the valid identifier field is determined. The load-dependent instruction is then selected for issue on a next processing cycle independent of an age of the load-dependent instruction.
    Type: Grant
    Filed: September 23, 2008
    Date of Patent: August 2, 2011
    Assignee: International Business Machines Corporation
    Inventors: Christopher M. Abernathy, Mary D. Brown, William E. Burky, Todd A. Venton
  • Patent number: 7984269
    Abstract: A data processing apparatus and method are provided for executing complex instructions. The data processing apparatus executes instructions defining operations to be performed by the data processing apparatus, those instructions including at least one complex instruction defining a sequence of operations to be performed. The data processing apparatus comprises a plurality of execution pipelines, each execution pipeline having a plurality of pipeline stages and arranged to perform at least one associated operation. Issue circuitry interfaces with the plurality of execution pipelines and is used to schedule performance of the operations defined by the instructions. For the at least one complex instruction, the issue circuitry is arranged to schedule a first operation in the sequence, and to issue control signals to one of the execution pipelines with which that first operation is associated, those control signals including an indication of each additional operation in the sequence.
    Type: Grant
    Filed: June 12, 2007
    Date of Patent: July 19, 2011
    Assignee: ARM Limited
    Inventors: Luc Orion, Cédric Denis Robert Airaud, Boris Sira Alvarez-Heredia
  • Patent number: 7984270
    Abstract: The present invention provides a system and method for prioritizing arithmetic instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one arithmetic instruction is in the issue group and, if so, schedule the at least one arithmetic instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolve the issue conflict by scheduling the at least one arithmetic instruction in a different execution pipeline; and (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: July 19, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7984268
    Abstract: An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.
    Type: Grant
    Filed: July 23, 2004
    Date of Patent: July 19, 2011
    Assignee: NetLogic Microsystems, Inc.
    Inventors: David T. Hass, Abbas Rashid
  • Patent number: 7984272
    Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for forwarding data in a processor is provided. The design structure includes a processor. The processor includes at least one cascaded delayed execution pipeline unit having a first and second pipeline, wherein the second pipeline is configured to execute instructions in a common issue group in a delayed manner relative to the first pipeline, and circuitry. The circuitry is configured to determine if a first instruction being executed in the first pipeline modifies data in a data register which is accessed by a second instruction being executed in the second pipeline and, if so, forward the modified data from the first pipeline to the second pipeline.
    Type: Grant
    Filed: March 21, 2008
    Date of Patent: July 19, 2011
    Assignee: International Business Machines Corporation
    Inventor: David Arnold Luick
  • Patent number: 7979677
    Abstract: A method and device for adaptively allocating reservation station entries to an instruction set with variable operands in a microprocessor. The device includes logic for determining free reservation station queue positions in a reservation station. The device allocates an issue queue to an instruction and writes the instruction into the issue queue as an issue queue entry. The device reads an operand corresponding to the instruction from a general purpose register and writes the operand into a reservation station using one of the free reservation station positions as a write address. The device writes each reservation station queue position corresponding to said instruction into said issue queue entry. When the instruction is ready for issue to an execution unit, the device reads out the instruction from the issue queue entry and the reservation station queue positions to the execution unit.
    Type: Grant
    Filed: August 3, 2007
    Date of Patent: July 12, 2011
    Assignee: International Business Machines Corporation
    Inventor: Dung Q. Nguyen
  • Patent number: 7971035
    Abstract: A data processing system has a memory for storing instructions and several central processing units for executing instructions; each central processing unit includes an adaptive power supply which provides, among other data, temperature information. Circuitry is provided that receives the temperature information from the central processing units, selects a central processing unit which has the lowest temperature and which is available to execute instructions, and dispatches instructions from the memory to the selected central processing unit.
    Type: Grant
    Filed: February 6, 2007
    Date of Patent: June 28, 2011
    Assignee: International Business Machines Corporation
    Inventors: Deepak K. Singh, Francois Ibrahim Atallah
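    Illustrative sketch (not part of patent 7971035): a toy Python version of the dispatch rule, picking, among the central processing units that are available to execute instructions, the one whose adaptive power supply reports the lowest temperature.

      def pick_cpu(cpus):
          available = [c for c in cpus if c["available"]]
          return min(available, key=lambda c: c["temp_c"])["id"]

      cpus = [{"id": 0, "temp_c": 71.5, "available": True},
              {"id": 1, "temp_c": 64.0, "available": False},
              {"id": 2, "temp_c": 66.2, "available": True}]
      print(pick_cpu(cpus))   # 2: coolest of the available units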
  • Publication number: 20110153989
    Abstract: A vector compare-and-exchange operation is performed by: decoding, by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and, responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location and, responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.
    Type: Application
    Filed: December 22, 2009
    Publication date: June 23, 2011
    Inventors: Ravi Rajwar, Andrew T. Forsyth
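    Illustrative sketch (not part of publication 20110153989): a toy element-wise reading of the single instruction described above, in which each element of the first (destination) location that matches the corresponding element of the second (expected) location is replaced by the element from the third (replacement) location. Whether the real instruction exchanges per element or only on a full-vector match is not settled by the abstract; this sketch assumes per element.

      def vector_compare_and_exchange(dest, expected, replacement):
          # Element-wise compare-and-exchange over three equal-length vectors.
          return [r if d == e else d for d, e, r in zip(dest, expected, replacement)]

      print(vector_compare_and_exchange([1, 2, 3, 4], [1, 9, 3, 9], [10, 20, 30, 40]))
      # [10, 2, 30, 4]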