Instruction Issuing Patents (Class 712/214)
-
Patent number: 8176298
Abstract: An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.
Type: Grant; Filed: August 31, 2004; Date of Patent: May 8, 2012; Assignee: NetLogic Microsystems, Inc.; Inventor: David T. Hass
-
Patent number: 8171262
Abstract: A method and apparatus for overlaying hazard clearing with a jump instruction within a pipeline microprocessor is described. The apparatus includes hazard logic to detect when a jump instruction specifies that hazards are to be cleared as part of a jump operation. If hazards are to be cleared, the hazard logic disables branch prediction for the jump instruction, thereby causing the jump instruction to proceed down the pipeline until it is finally resolved, and flushing the pipeline behind the jump instruction. Disabling of branch prediction for the jump instruction effectively clears all execution and/or instruction hazards that preceded the jump instruction. Alternatively, hazard logic causes issue control logic to stall the jump instruction for n cycles until all hazards are cleared. State tracking logic may be provided to determine whether any instructions are executing in the pipeline that create hazards. If so, hazard logic performs normally.
Type: Grant; Filed: November 21, 2005; Date of Patent: May 1, 2012; Assignee: MIPS Technologies, Inc.; Inventors: Niels Gram Jeppesen, G. Michael Uhler
-
Patent number: 8171484
Abstract: A resource management apparatus includes a resource management part to manage an amount of resources used and an amount of virtual resources of each of a plurality of processing units; a selection and control part to select a processing unit having a smallest sum of the amount of resources used and the amount of virtual resources in response to an external process request, and to increase the amount of resources used by the selected processing unit and to decrease the amount of resources used by a processing unit corresponding to an external process release request in response to the process release request; a virtual resource control part to increase the amount of virtual resources of the processing unit corresponding to the process release request in response to the process release request; and a request sending part to send the external process request or process release request to the selected or corresponding processing unit.
Type: Grant; Filed: October 24, 2007; Date of Patent: May 1, 2012; Assignee: Fujitsu Limited; Inventors: Katsuhiko Yamatsu, Hidetada Tanaka, Kazuaki Sumi, Nobuyuki Shima
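The selection and release policy described in this abstract can be sketched in a few lines; the class and method names below are illustrative, not taken from the patent.

```python
class ResourceManager:
    """Minimal sketch of the described policy: a request goes to the processing
    unit with the smallest sum of used and virtual resources; a release
    decreases that unit's used amount and increases its virtual amount."""

    def __init__(self, num_units):
        self.used = [0] * num_units
        self.virtual = [0] * num_units

    def handle_request(self):
        # Select the unit with the smallest used + virtual total (ties -> lowest id).
        unit = min(range(len(self.used)),
                   key=lambda i: self.used[i] + self.virtual[i])
        self.used[unit] += 1
        return unit

    def handle_release(self, unit):
        # Release: decrease resources used, increase virtual resources.
        self.used[unit] -= 1
        self.virtual[unit] += 1
```

The virtual-resource counter makes a recently released unit look busier than its raw usage suggests, which spreads successive requests across units instead of piling them onto whichever unit just freed capacity.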
-
Patent number: 8171261
Abstract: A system and method for fencing memory accesses. Memory loads can be fenced, or all memory accesses can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.
Type: Grant; Filed: September 2, 2003; Date of Patent: May 1, 2012; Assignee: Intel Corporation; Inventors: Salvador Palanca, Stephen A. Fischer, Subramaniam Maiyuran, Shekoufeh Qawami
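The ordering guarantee the fence provides can be modeled with a toy retirement simulator. This is a sketch of the concept only, not the patented hardware: within a fence-delimited window accesses may retire out of program order (modeled here by reversing each window), but no access after a fence retires before an access preceding it.

```python
def retire_order(instructions):
    """Toy model: split the instruction stream into windows at each 'fence'
    token, let accesses inside a window retire out of order (reversed here),
    and retire windows strictly in sequence so the fence acts as a barrier."""
    windows, current = [], []
    for instr in instructions:
        if instr == "fence":
            windows.append(current)      # older accesses, retired first
            windows.append(["fence"])    # the fence dispatches after them
            current = []
        else:
            current.append(instr)
    windows.append(current)              # newer accesses, stalled until last
    retired = []
    for w in windows:
        retired.extend(w if w == ["fence"] else reversed(w))
    return retired
```

Running it on `["a", "b", "fence", "c", "d"]` shows reordering within each window but never across the fence.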
-
Publication number: 20120096243
Abstract: A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.
Type: Application; Filed: October 27, 2011; Publication date: April 19, 2012; Applicant: Aspen Acquisition Corporation; Inventors: Erdem Hokenek, Mayan Moudgill, Michael J. Schulte, C. John Glossner
-
Publication number: 20120089819
Abstract: The described embodiments include a processor that determines instructions that can be issued based on unresolved data dependencies. In an issue unit in the processor, the processor keeps a record of each instruction that is directly or indirectly dependent on a base instruction. Upon determining that the base instruction has been deferred, the processor monitors instructions that are being issued from an issue queue to an execution unit for execution. Upon determining that an instruction from the record has reached a head of the issue queue, the processor immediately issues the instruction from the issue queue.
Type: Application; Filed: October 6, 2010; Publication date: April 12, 2012; Applicant: Oracle International Corporation; Inventors: Shailender Chaudhry, Richard Thuy Van, Robert E. Cypher, Debasish Chandra
-
Patent number: 8151007
Abstract: A computer of an information processing apparatus repeatedly accepts an operation to designate at least one of a plurality of command elements making up a command, executes at least one of a first memory writing processing to write a first command element having a specific attribute out of the command elements corresponding to the accepted operation in a first memory and a second memory writing processing to write a second command element having a different attribute in a second memory, determines whether or not a command element array stored over the first memory and the second memory satisfies an execution allowable condition each time the writing processing executes, and processes information according to the command element array when the condition is determined to be satisfied.
Type: Grant; Filed: April 8, 2008; Date of Patent: April 3, 2012; Assignee: Nintendo Co., Ltd.; Inventor: Hiroshi Momose
-
Publication number: 20120072700
Abstract: A processor includes an instruction fetch unit, an issue queue coupled to the instruction fetch unit, an execution unit coupled to the issue queue, and a multi-level register file including a first level register file having lower access latency and a second level register file having higher access latency. Each of the first and second level register files includes a plurality of physical registers for holding operands that is concurrently shared by a plurality of threads. The processor further includes a mapper that, at dispatch of an instruction specifying a source logical register from the instruction fetch unit to the issue queue, initiates a swap of a first operand associated with the source logical register that is in the second level register file with a second operand held in the first level register file. The issue queue, following the swap, issues the instruction to the execution unit for execution.
Type: Application; Filed: September 17, 2010; Publication date: March 22, 2012; Applicant: International Business Machines Corporation; Inventors: Christopher M. Abernathy, Mary D. Brown, Hung Q. Le, Dung Q. Nguyen
-
Patent number: 8140830
Abstract: A circuit arrangement and method utilize a plurality of execution units having different power and performance characteristics and capabilities within a multithreaded processor core, and selectively route instructions having different performance requirements to different execution units based upon those performance requirements. As such, instructions that have high performance requirements, such as instructions associated with primary tasks or time sensitive tasks, can be routed to a higher performance execution unit to maximize performance when executing those instructions, while instructions that have low performance requirements, such as instructions associated with background tasks or non-time sensitive tasks, can be routed to a reduced power execution unit to reduce the power consumption (and associated heat generation) associated with executing those instructions.
Type: Grant; Filed: May 22, 2008; Date of Patent: March 20, 2012; Assignee: International Business Machines Corporation; Inventors: Stephen Joseph Schwinn, Matthew Ray Tubbs, Charles David Wait
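The routing decision above reduces to a simple predicate over each instruction's performance requirement. A minimal sketch, assuming hypothetical `time_sensitive`/`primary` fields that stand in for whatever requirement metadata the hardware actually tracks:

```python
def route(instruction, high_perf_queue, low_power_queue):
    """Illustrative dispatch rule: primary or time-sensitive work goes to the
    high-performance execution unit's queue; background or non-time-sensitive
    work goes to the reduced-power unit's queue."""
    if instruction.get("time_sensitive") or instruction.get("primary"):
        high_perf_queue.append(instruction)
    else:
        low_power_queue.append(instruction)
```

The point of the split is that power (and heat) is spent only where latency actually matters; background work tolerates the slower, cheaper unit.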
-
Patent number: 8140829
Abstract: A processor includes primary threads of execution that may simultaneously issue instructions, and one or more backup threads. When a primary thread stalls, the contents of its instruction buffer may be switched with the instruction buffer for a backup thread, thereby allowing the backup thread to begin execution. This design allows two primary threads to issue simultaneously, which allows for overlap of instruction pipeline latencies. This design further allows a fast switch to a backup thread when a primary thread stalls, thereby providing significantly improved throughput in executing instructions by the processor.
Type: Grant; Filed: November 20, 2003; Date of Patent: March 20, 2012; Assignee: International Business Machines Corporation; Inventors: Richard James Eickemeyer, David Arnold Luick
-
Patent number: 8139061
Abstract: A floating point execution unit calculates a one minus dot product value in a single pass. As such, the dependency that otherwise would be required to perform the calculations is eliminated, resulting in a substantially faster performance of such calculations. The floating point execution unit may be used, for example, to accelerate pixel shading algorithms such as Fresnel and electron microscope effects.
Type: Grant; Filed: August 1, 2008; Date of Patent: March 20, 2012; Assignee: International Business Machines Corporation; Inventors: Adam James Muff, Matthew Ray Tubbs
-
Publication number: 20120066472
Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
Type: Application; Filed: November 17, 2011; Publication date: March 15, 2012; Inventor: Jeffry E. Gonion
-
Patent number: 8131980
Abstract: A design structure for resolving the occurrence of livelock at the interface between the processor core and memory subsystem controller. Livelock is resolved by introducing a livelock detection mechanism (which includes livelock detection utility or logic) within the processor to detect a livelock condition and dynamically change the duration of the delay stage(s) in order to alter the “harmonic” fixed-cycle loop behavior. The livelock detection logic (LDL) counts the number of flushes a particular instruction takes or the number of times an instruction re-issues without completing. The LDL then compares that number to a preset threshold number. Based on the result of the comparison, the LDL triggers the implementation of one of two different livelock resolution processes.
Type: Grant; Filed: June 3, 2008; Date of Patent: March 6, 2012; Assignee: International Business Machines Corporation; Inventors: Ronald Hall, Michael L. Karm, Alvan W. Ng, Todd A. Venton
-
Patent number: 8127120
Abstract: A method for executing by a processing unit a program stored in a memory, includes: detecting a piece of information during the execution of the program by the processing unit, and if the information is detected, triggering the execution of a hidden subprogram by the processing unit. The method may be applied to securing an integrated circuit.
Type: Grant; Filed: April 23, 2008; Date of Patent: February 28, 2012; Assignee: STMicroelectronics SA; Inventor: Philippe Roquelaure
-
Patent number: 8127114
Abstract: A method of processing a plurality of instructions in multiple pipeline stages within a pipeline processor is disclosed. The method partially or wholly executes a stalled instruction in a pipeline stage that has a function other than instruction execution prior to the execution stage within the processor. Partially or wholly executing the instruction prior to the execution stage in the pipeline speeds up the execution of the instruction and allows the processor to more effectively utilize its resources, thus increasing the processor's efficiency.
Type: Grant; Filed: March 28, 2007; Date of Patent: February 28, 2012; Assignee: QUALCOMM Incorporated; Inventors: Kiran Ravi Seth, James Norris Dieffenderfer, Michael Scott McIlvaine, Nathan Samuel Nunamaker
-
Patent number: 8127115
Abstract: Disclosed are a method and a system for grouping processor instructions for execution by a processor, where the group of processor instructions includes at least two branch processor instructions. In one or more embodiments, an instruction buffer can decouple an instruction fetch operation from an instruction decode operation by storing fetched processor instructions in the instruction buffer until the fetched processor instructions are ready to be decoded. Group formation can involve removing processor instructions from the instruction buffer and routing the processor instructions to latches that convey the processor instructions to decoders. Processor instructions that are removed from the instruction buffer in a single clock cycle can be called a group of processor instructions. In one or more embodiments, the first instruction in the group must be the oldest instruction in the instruction buffer, and instructions must be removed from the instruction buffer ordered from oldest to youngest.
Type: Grant; Filed: April 3, 2009; Date of Patent: February 28, 2012; Assignee: International Business Machines Corporation; Inventors: Richard William Doing, Kevin Neal Magil, Balaram Sinharoy, Jeffrey R. Summers, James Albert Van Norstrand, Jr.
-
Patent number: 8112758
Abstract: Techniques are disclosed for allocation of resources in a distributed computing system. For example, a method for allocating a set of one or more components of an application to a set of one or more resource groups includes the following steps performed by a computer system. The set of one or more resource groups is ordered based on respective failure measures and resource capacities associated with the one or more resource groups. An importance value is assigned to each of the one or more components, wherein the importance value is associated with an effect of the component on an output of the application. The one or more components are assigned to the one or more resource groups based on the importance value of each component and the respective failure measures and resource capacities associated with the one or more resource groups, wherein components with higher importance values are assigned to resource groups with lower failure measures and higher resource capacities.
Type: Grant; Filed: January 8, 2008; Date of Patent: February 7, 2012; Assignee: International Business Machines Corporation; Inventors: Navendu Jain, Yoonho Park, Deepak S. Turaga, Chitra Venkatramani
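The ordering rule in this abstract is essentially a greedy matching: rank groups best-first (low failure, high capacity), rank components most-important-first, and pair them off. A minimal sketch, assuming hypothetical `failure`, `capacity`, and `importance` fields:

```python
def assign_components(components, groups):
    """Illustrative greedy allocation: the most important component lands on
    the resource group with the lowest failure measure and highest capacity.
    Field names are assumptions for the sketch, not from the patent."""
    # Best groups first: lowest failure measure, then highest capacity.
    ordered_groups = sorted(groups, key=lambda g: (g["failure"], -g["capacity"]))
    # Most important components first.
    ranked = sorted(components, key=lambda c: -c["importance"])
    return {c["name"]: g["name"] for c, g in zip(ranked, ordered_groups)}
```

A real allocator would also pack multiple components per group subject to capacity; the one-to-one pairing here just shows the ranking logic.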
-
Patent number: 8112616
Abstract: In wireless communications such as in the Bluetooth communication system, an execution unit sequentially receives software instructions for execution. Prior to completing each instruction, the execution unit issues an interrupt indicating the upcoming completion of the instruction execution and awaits receipt of the next instruction. A Link Manager issues limited instructions, and a Link Controller includes a hardware execution unit for executing the limited instructions. A processing unit in the Link Manager performs remaining functions under control of a software program.
Type: Grant; Filed: May 6, 2010; Date of Patent: February 7, 2012; Assignee: Broadcom Corporation; Inventor: Joakim Linde
-
Patent number: 8108610
Abstract: One embodiment of the invention sets forth a mechanism for efficiently processing atomic operations transmitted from multiple general processing clusters to an L2 cache. A tag look-up unit tracks the availability of each cache line in the L2 cache, reserves the necessary cache lines for the atomic operations and transmits the atomic operations to an ALU for processing. The tag look-up unit also increments a reference counter associated with a reserved cache line each time an atomic operation associated with that cache line is received. This feature allows multiple atomic operations associated with the same cache line to be pipelined to the ALU. A ROP unit that includes the ALU may request additional data necessary to process an atomic operation from the L2 cache. Result data is stored in the L2 cache and may also be returned to the general processing clusters.
Type: Grant; Filed: October 21, 2008; Date of Patent: January 31, 2012; Assignee: NVIDIA Corporation; Inventors: David B. Glasco, Peter B. Holmqvist, George R. Lynch, Patrick R. Marchand, Karan Mehra, James Roberts
-
Patent number: 8108654
Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to receive an issue group of instructions, reorder the issue group of instructions using instruction type priority, and execute the reordered issue group of instructions in the cascaded delayed execution pipeline unit. The method, among others, can be broadly summarized by the following steps: receiving an issue group of instructions, reordering the issue group of instructions using instruction type priority, and executing the reordered issue group of instructions in the cascaded delayed execution pipeline unit.
Type: Grant; Filed: February 19, 2008; Date of Patent: January 31, 2012; Assignee: International Business Machines Corporation; Inventors: Jeffrey P. Bradford, David A. Luick
-
Patent number: 8108655
Abstract: Issue logic identifies a simple fixed point instruction, included in a unified payload, which is ready to issue. The simple fixed point instruction is a type of instruction that is executable by both a fixed point execution unit and a load-store execution unit. In turn, the issue logic determines that the unified payload does not include a load-store instruction that is ready to issue. As a result, the issue logic issues the simple fixed point instruction to the load-store execution unit in response to determining that the simple fixed point instruction is ready to issue and determining that the unified payload does not include a load-store instruction that is ready to issue.
Type: Grant; Filed: March 24, 2009; Date of Patent: January 31, 2012; Assignee: International Business Machines Corporation; Inventors: Christopher Michael Abernathy, James Wilson Bishop, Mary Douglass Brown, William Elton Burky, Robert Allen Cordes, Hung Qui Le, Dung Quoc Nguyen, Todd Alan Venton
-
Publication number: 20120023314
Abstract: A method and mechanism for reducing latency of a multi-cycle scheduler within a processor. A processor comprises a front end pipeline that determines data dependencies between instructions prior to a scheduling pipe stage. For each data dependency, a distance value is determined based on a number of instructions a younger dependent instruction is located from a corresponding older (in program order) instruction. When the younger dependent instruction is allocated an entry in a multi-cycle scheduler, this distance value may be used to locate an entry storing the older instruction in the scheduler. When the older instruction is picked for issue, the younger dependent instruction is marked as pre-picked. In an immediately subsequent clock cycle, the younger dependent instruction may be picked for issue, thereby reducing the latency of the multi-cycle scheduler.
Type: Application; Filed: July 21, 2010; Publication date: January 26, 2012; Inventors: Matthew M. Crum, Michael D. Achenbach, Betty A. McDaniel, Benjamin T. Sander
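The effect of the distance value is that a dependent can issue in the cycle immediately after its producer, with the producer's scheduler entry found as `dependent_entry - distance`. A toy cycle model, with all names and the one-pick-per-cycle simplification being assumptions of the sketch:

```python
def issue_cycles(entries):
    """Toy model of pre-picked back-to-back issue. 'entries' maps an entry
    index to the distance back to its producer's entry, or None if the
    instruction is independent. Independent entries issue one per cycle in
    order; a dependent entry, pre-picked when its producer issues, issues no
    earlier than the producer's cycle + 1. Returns issue cycle per entry."""
    issue_cycle = {}
    cycle = 0
    for idx in sorted(entries):
        dist = entries[idx]
        if dist is None:
            issue_cycle[idx] = cycle
        else:
            # Locate the producer via the distance value; the dependent was
            # pre-picked, so it can issue the very next cycle.
            issue_cycle[idx] = max(cycle, issue_cycle[idx - dist] + 1)
        cycle = issue_cycle[idx] + 1
    return issue_cycle
```

The key observable is that a producer/consumer pair issues in adjacent cycles, which is exactly the multi-cycle-scheduler latency the mechanism removes.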
-
Patent number: 8095779
Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if a plurality of load instructions are in the issue group and, if so, schedule the plurality of load instructions in descending order of longest dependency chain depth to shortest dependency chain depth onto the shortest to longest available execution pipelines; and (3) execute the issue group of instructions in the cascaded delayed execution pipeline unit.
Type: Grant; Filed: February 19, 2008; Date of Patent: January 10, 2012; Assignee: International Business Machines Corporation; Inventor: David A. Luick
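The pairing rule in step (2) can be sketched as two sorts and a zip: loads ordered by descending dependency chain depth meet pipelines ordered by ascending delay, so the most dependency-critical load starts earliest. Field names here are illustrative:

```python
def schedule_loads(loads, pipeline_delays):
    """Illustrative sketch of the load-priority rule: pair loads sorted from
    longest to shortest dependency chain depth with pipelines sorted from
    shortest to longest delay. Returns {load name: pipeline index}."""
    by_depth = sorted(loads, key=lambda l: -l["chain_depth"])
    by_delay = sorted(range(len(pipeline_delays)),
                      key=lambda i: pipeline_delays[i])
    return {l["name"]: pipe for l, pipe in zip(by_depth, by_delay)}
```

Giving the deepest dependency chain the least-delayed pipeline minimizes how long the chain behind it waits for the load's result.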
-
Patent number: 8095778
Abstract: Sharing functional units within a multithreaded processor. In one embodiment, the multithreaded processor may include a multithreaded instruction source that may provide an instruction from each of a plurality of thread groups in a given cycle. A given thread group may include one or more instructions from one or more threads. The arbitration functionality may arbitrate between the plurality of thread groups for access to a functional unit such as a load store unit, for example, that may be shared between the thread groups.
Type: Grant; Filed: June 30, 2004; Date of Patent: January 10, 2012; Assignee: Open Computing Trust I & II; Inventor: Robert T. Golla
-
Publication number: 20110320771
Abstract: A circuit arrangement and method selectively bypass an instruction buffer for selected instructions so that bypassed instructions can be dispatched without having to first pass through the instruction buffer. Thus, for example, in the case that an instruction buffer is partially or completely flushed as a result of an instruction redirect (e.g., due to a branch mispredict), instructions can be forwarded to subsequent stages in an instruction unit and/or to one or more execution units without the latency associated with passing through the instruction buffer.
Type: Application; Filed: June 28, 2010; Publication date: December 29, 2011; Applicant: International Business Machines Corporation; Inventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Patent number: 8086825
Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
Type: Grant; Filed: December 31, 2007; Date of Patent: December 27, 2011; Assignee: Advanced Micro Devices, Inc.; Inventors: Gene Shen, Sean Lie, Marius Evers
-
Patent number: 8086826
Abstract: An information handling system includes a processor with an issue unit (IU) that may perform instruction dependency tracking for successive instruction issue operations. The IU maintains non-shifting issue queue (NSIQ) and shifting issue queue (SIQ) instructions along with relative instruction to instruction dependency information. A mapper maps queue position data for instructions that dispatch to issue queue locations within the IU. The IU may test an issuing producer instruction against consumer instructions in the IU for queue position (QPOS) and register tag (RTAG) matches. A matching consumer instruction may issue in a successive manner in the case of a queue position match or in a next processor cycle in the case of a register tag match.
Type: Grant; Filed: March 24, 2009; Date of Patent: December 27, 2011; Assignee: International Business Machines Corporation; Inventors: Mary Douglass Brown, William Elton Burky, Dung Quoc Nguyen, Balaram Sinharoy
-
Publication number: 20110314260
Abstract: A computer employs a set of General Purpose Registers (GPRs). Each GPR comprises a plurality of portions. Programs such as an Operating System and Applications operating in Large GPR mode access the full GPR; however, programs such as Applications operating in Small GPR mode only have access to a portion at a time. Instruction opcodes, in Small GPR mode, may determine which portion is accessed.
Type: Application; Filed: June 22, 2010; Publication date: December 22, 2011; Applicant: International Business Machines Corporation; Inventors: Dan F. Greiner, Marcel Mitran, Timothy J. Slegel
-
Patent number: 8074056
Abstract: In one implementation, a pipeline processor is provided having a base architecture that includes one or more decoders operable to decode program instructions and generate one or more decoded instructions, and one or more execution units operable to execute the one or more decoded instructions. Each execution unit includes one or more execution pipeline stages. The pipeline processor architecture further includes one or more additional co-processor pipelines. The one or more decoders of the base architecture are operable to recognize one or more instructions to be processed by a given co-processor pipeline and pass the one or more recognized instructions to the given co-processor pipeline for decoding and execution.
Type: Grant; Filed: March 1, 2005; Date of Patent: December 6, 2011; Assignee: Marvell International Ltd.; Inventors: Hong-Yi Chen, Jensen Tjeng
-
Publication number: 20110296142
Abstract: A processor including instruction support for large-operand instructions that use multiple register windows may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may also include an instruction execution unit that, during operation, receives instructions for execution from the instruction fetch unit and executes a large-operand instruction defined within the ISA, where execution of the large-operand instruction is dependent upon a plurality of registers arranged within a plurality of register windows. The processor may further include control circuitry (which may be included within the fetch unit, the execution unit, or elsewhere within the processor) that determines whether one or more of the register windows depended upon by the large-operand instruction are not present. In response to determining that one or more of these register windows are not present, the control circuitry causes them to be restored.
Type: Application; Filed: May 28, 2010; Publication date: December 1, 2011; Inventors: Christopher H. Olson, Paul J. Jordan, Jama I. Barreh
-
Publication number: 20110296143
Abstract: A pipeline processor which meets a latency restriction on an equal model is provided. The pipeline processor includes a pipeline processing unit to process an instruction at a plurality of stages and an equal model compensator to store the results of the processing of some or all of the instructions located in the pipeline processing unit and to write the results of the processing in a register file based on the latency of each instruction.
Type: Application; Filed: December 30, 2010; Publication date: December 1, 2011; Inventors: Heejun Shim, Yenjo Han, Jae-Young Kim, Yeon-Gon Cho, Jinseok Lee
-
Publication number: 20110276784
Abstract: In one embodiment, a current candidate thread is selected from each of multiple first groups of threads using a low granularity selection scheme, where each of the first groups includes multiple threads and the first groups are mutually exclusive. A second group of threads is formed comprising the current candidate thread selected from each of the first groups of threads. A current winning thread is selected from the second group of threads using a high granularity selection scheme. An instruction is fetched from a memory based on a fetch address for a next instruction of the current winning thread. The instruction is then dispatched to one of the execution units for execution, whereby execution stalls of the execution units are reduced by fetching instructions based on the low granularity and high granularity selection schemes.
Type: Application; Filed: May 10, 2010; Publication date: November 10, 2011; Applicant: Telefonaktiebolaget L M Ericsson (publ); Inventors: Evan Gewirtz, Robert Hathaway, Stephan Meier, Edward Ho
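The two-level selection can be sketched as a cheap per-group pick followed by a more careful pick over the much smaller candidate set. Both policies below (lowest thread id per group, then an explicit priority map) are stand-ins for whatever low- and high-granularity schemes the hardware actually uses:

```python
def select_winning_thread(first_groups, priority):
    """Illustrative two-level thread selection: a low-granularity scheme picks
    one candidate per mutually exclusive first group (here: lowest thread id),
    then a high-granularity scheme picks the winner among the candidates
    (here: an explicit priority map)."""
    # Low-granularity pass, one candidate per group.
    candidates = [min(group) for group in first_groups]
    # High-granularity pass over the small second group of candidates.
    return max(candidates, key=lambda t: priority[t])
```

Splitting the decision this way keeps the expensive comparison logic proportional to the number of groups rather than the total number of threads.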
-
Patent number: 8055883
Abstract: A data processing apparatus 1 has a plurality of registers 10 of the same type of register and a plurality of processing pipelines 40, 50, each processing pipeline 40, 50 being arranged to process instructions. At least one instruction includes a destination register specifier specifying which of said registers is a destination register for storing a processing result of the at least one instruction. Instruction issuing circuitry 26 is configured to issue the at least one instruction for processing by one of the plurality of processing pipelines. The instruction issuing circuitry 26 selects the one of the plurality of processing pipelines to which the candidate instruction is issued in dependence upon the value of the destination register specifier of the candidate instruction.
Type: Grant; Filed: July 1, 2009; Date of Patent: November 8, 2011; Assignee: ARM Limited; Inventor: David Raymond Lutz
-
Patent number: 8051275
Abstract: A processor 2 includes an execution cluster 10 having multiple execution units 14, 16, 18, 20. The execution units 14, 16, 18, 20 share result buses 22, 24. Issue circuitry 12 within the execution cluster 10 determines future availability of a result bus 22, 24 for an instruction to be issued (or recently issued) using a known cycle count for that instruction. The availability is tracked for each result bus using a mask register 32 storing a mask value within which each bit position indicates the availability or non-availability of that result bus at a particular processing cycle in the future. The mask value is left shifted each processing cycle.
Type: Grant; Filed: June 1, 2009; Date of Patent: November 1, 2011; Assignee: ARM Limited; Inventors: David James Williamson, Conrado Blasco Allué
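The availability mask can be sketched in a few lines. Note one liberty: the patent describes a left-shifted mask, while this sketch uses the mirrored bit convention (bit i = busy i cycles in the future), so the per-cycle update becomes a right shift; the mask width and names are also assumptions of the sketch.

```python
class ResultBusTracker:
    """Illustrative future-availability mask for one result bus: bit i set
    means the bus is claimed i cycles from now. Each processing cycle the
    future moves one step closer, modeled here by a right shift (the mirror
    image of the left shift described in the abstract)."""

    def __init__(self, width=16):
        self.mask = 0
        self.width = width

    def is_free(self, cycles_ahead):
        return not (self.mask >> cycles_ahead) & 1

    def reserve(self, cycles_ahead):
        # Claim the bus for the cycle this instruction's result will arrive,
        # known from the instruction's fixed cycle count.
        self.mask |= 1 << cycles_ahead

    def tick(self):
        # Advance one processing cycle: every reservation is one cycle nearer.
        self.mask = (self.mask >> 1) & ((1 << self.width) - 1)
```

Issue logic would call `is_free(cycle_count)` before issuing and `reserve(cycle_count)` on issue, guaranteeing no two instructions try to drive the same bus in the same cycle.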
-
Patent number: 8051273
Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S−1 sets, T sets, or a required number of sets, where T<S−1, are used as storage registers that store tags of the instruction cache.
Type: Grant; Filed: November 2, 2010; Date of Patent: November 1, 2011; Assignee: NEC Corporation; Inventor: Shorin Kyo
-
Publication number: 20110252220
Abstract: A method, information processing system, and computer program product crack and/or shorten computer executable instructions. At least one instruction is received and analyzed. An instruction type associated with the at least one instruction is identified. At least one of a base field, an index field, one or more operands, and a mask field of the instruction is analyzed. At least one of the following is then performed: the at least one instruction is organized into a set of units of operation; and the at least one instruction is shortened. The set of units of operation is then executed.
Type: Application
Filed: April 9, 2010
Publication date: October 13, 2011
Applicant: International Business Machines Corporation
Inventors: Fadi BUSABA, Brian CURRAN, Lee EISEN, Bruce GIAMEI, David HUTTON
-
Publication number: 20110246995
Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization of a second thread to predict the performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the shared cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.
Type: Application
Filed: April 5, 2010
Publication date: October 6, 2011
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventors: Alexandra Fedorova, David Vengerov, Kishore Kumar Pusukuri
-
Patent number: 8028284
Abstract: A system is provided having a group of processors which performs coordinated processing, wherein data is transferred to or from the group of processors. When data is transferred from an input queue, a ring buffer, to the group of processors, an identifier adding unit adds an identifier to the data as a tag, the identifier indicating the block of the input queue that contains this data. When data processed by any one of the processors included in the group is transferred to an output queue, a block selecting unit selects one of the blocks of the output queue for storing the data, namely the block corresponding to the tag added to this data.
Type: Grant
Filed: October 31, 2006
Date of Patent: September 27, 2011
Assignee: Sony Computer Entertainment Inc.
Inventor: Yasukichi Ohkawa
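The tag-and-route flow above can be sketched as follows. The class and method names, and the use of a block index as the tag, are illustrative assumptions; the abstract only specifies that outgoing data is tagged with its input-queue block and that results are stored in the output-queue block selected from that tag.

```python
from collections import deque


class TaggedQueue:
    """A queue divided into blocks, modeling the abstract's input and
    output queues. Names and representation are illustrative."""

    def __init__(self, num_blocks):
        self.blocks = [deque() for _ in range(num_blocks)]

    def push(self, data, block):
        self.blocks[block].append(data)

    def pop_with_tag(self, block):
        # Identifier-adding unit: tag outgoing data with the index of
        # the block that contained it.
        return self.blocks[block].popleft(), block

    def store_by_tag(self, data, tag):
        # Block-selecting unit: store the processed result in the
        # output-queue block corresponding to the tag.
        self.blocks[tag].append(data)
```

A processor in the group would call `pop_with_tag` on the input queue, carry the tag alongside the data while processing, and hand both to `store_by_tag` on the output queue.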
-
Publication number: 20110225398Abstract: An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.Type: ApplicationFiled: May 24, 2011Publication date: September 15, 2011Inventors: David T. Hass, Abbas Rashid
-
Publication number: 20110208950
Abstract: A method of instruction issue (3200) in a microprocessor (1100, 1400, or 1500) with execution pipestages (E1, E2, etc.) that executes a producer instruction Ip and issues a candidate instruction I0 (3245) having a source operand dependency on a destination operand of instruction Ip. The method includes issuing the candidate instruction I0 as a function (1720, 1950, 1958, 3235) of a pipestage EN(I0) of first need by the candidate instruction for the source operand, a pipestage EA(Ip) of first availability of the destination operand from the producer instruction, and the one execution pipestage E(Ip) currently associated with the producer instruction. A method of data forwarding (3300) in a microprocessor (1100, 1400, or 1500) having a pipeline (1640) with pipestages (E1, E2, etc.) is also disclosed.
Type: Application
Filed: March 21, 2011
Publication date: August 25, 2011
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventors: Thang Minh Tran, Raul A. Garibay, Jr., James Nolan Hardage
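One plausible reading of the issue function above, under the assumption that one pipestage is traversed per cycle, is that the candidate may issue when the producer's result will be forwardable by the time the candidate first needs it. The comparison itself is an assumption for illustration; the abstract names the three inputs EN(I0), EA(Ip), and E(Ip) but not the exact function.

```python
def can_issue(en_candidate, ea_producer, e_producer):
    """Sketch of the abstract's issue test.

    en_candidate: pipestage of first need of the source operand, EN(I0)
    ea_producer:  pipestage of first availability of the result, EA(Ip)
    e_producer:   pipestage the producer currently occupies, E(Ip)

    Assuming one pipestage per cycle, the producer's result becomes
    available in (EA - E) cycles, and the candidate, if issued now,
    first needs it EN stages after issue.
    """
    cycles_until_result_ready = ea_producer - e_producer
    return en_candidate >= cycles_until_result_ready
```

For example, a candidate needing its operand at stage E2 can issue behind a producer already at E2 whose result appears at E3, but not behind a producer at E1 whose result only appears at E4.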
-
Patent number: 8006072
Abstract: A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data.
Type: Grant
Filed: May 18, 2010
Date of Patent: August 23, 2011
Assignee: Micron Technology, Inc.
Inventors: Neal Andrew Cook, Alan T. Wootton, James Peterson
-
Patent number: 7996654
Abstract: The present invention provides a system and method for a group priority issue scheme for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine the dependency chain depth of all the instructions in the issue group; (3) schedule the instructions in order from the longest dependency chain depth to the shortest; and (4) execute the issue group of instructions in the cascaded delayed execution pipeline unit.
Type: Grant
Filed: February 19, 2008
Date of Patent: August 9, 2011
Assignee: International Business Machines Corporation
Inventor: David A. Luick
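The depth-first ordering in steps (2) and (3) can be sketched as below. Representing the issue group as a map from each instruction to the instructions whose results it needs is an illustrative assumption, not the patent's representation.

```python
def schedule_by_depth(deps):
    """Order instructions longest-dependency-chain-first.

    `deps` maps each instruction name to the names of instructions
    (in the same issue group) whose results it depends on. This
    representation is an assumption for illustration.
    """
    def depth(instr, seen=()):
        # Length of the longest dependency chain ending at `instr`.
        if instr in seen:  # guard against malformed cyclic input
            return 0
        return 1 + max(
            (depth(d, seen + (instr,)) for d in deps[instr]),
            default=0,
        )

    # Python's sort is stable, so ties keep their original order.
    return sorted(deps, key=depth, reverse=True)
```

For the group a, b depends on a, c depends on b, d independent, this yields c (depth 3) first and the depth-1 instructions last.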
-
Patent number: 7991979
Abstract: A system and method for issuing load-dependent instructions in an issue queue in a processing unit. A load miss queue is provided, comprising a physical address field, an issue queue position field, a valid identifier field, a source identifier field, and a data type field. A load instruction that misses a first level cache is dispatched, and both the physical address field and the data type field are set. A load-dependent instruction is identified. In response to identifying the load-dependent instruction, each of the issue queue position field, valid identifier field, and source identifier field is set. If the issue queue position field refers to a flushed instruction, the valid identifier field is cleared. The load instruction is recycled, and the value of the valid identifier field is determined. The load-dependent instruction is then selected for issue on a next processing cycle, independent of the age of the load-dependent instruction.
Type: Grant
Filed: September 23, 2008
Date of Patent: August 2, 2011
Assignee: International Business Machines Corporation
Inventors: Christopher M. Abernathy, Mary D. Brown, William E. Burky, Todd A. Venton
-
Patent number: 7984269
Abstract: A data processing apparatus and method are provided for executing complex instructions. The data processing apparatus executes instructions defining operations to be performed by the data processing apparatus, those instructions including at least one complex instruction defining a sequence of operations to be performed. The data processing apparatus comprises a plurality of execution pipelines, each execution pipeline having a plurality of pipeline stages and arranged to perform at least one associated operation. Issue circuitry interfaces with the plurality of execution pipelines and is used to schedule performance of the operations defined by the instructions. For the at least one complex instruction, the issue circuitry is arranged to schedule the first operation in the sequence and to issue control signals to the execution pipeline with which that first operation is associated, those control signals including an indication of each additional operation in the sequence.
Type: Grant
Filed: June 12, 2007
Date of Patent: July 19, 2011
Assignee: ARM Limited
Inventors: Luc Orion, Cédric Denis Robert Airaud, Boris Sira Alvarez-Heredia
-
Patent number: 7984270
Abstract: The present invention provides a system and method for prioritizing arithmetic instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one arithmetic instruction is in the issue group and, if so, schedule the at least one arithmetic instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolve the conflict by scheduling the at least one arithmetic instruction in a different execution pipeline; and (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
Type: Grant
Filed: February 19, 2008
Date of Patent: July 19, 2011
Assignee: International Business Machines Corporation
Inventor: David A. Luick
-
Patent number: 7984268
Abstract: An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.
Type: Grant
Filed: July 23, 2004
Date of Patent: July 19, 2011
Assignee: NetLogic Microsystems, Inc.
Inventors: David T. Hass, Abbas Rashid
-
Patent number: 7984272
Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for forwarding data in a processor is provided. The design structure includes a processor comprising at least one cascaded delayed execution pipeline unit having a first and a second pipeline, wherein the second pipeline is configured to execute instructions in a common issue group in a delayed manner relative to the first pipeline, and circuitry. The circuitry is configured to determine if a first instruction being executed in the first pipeline modifies data in a data register which is accessed by a second instruction being executed in the second pipeline and, if so, to forward the modified data from the first pipeline to the second pipeline.
Type: Grant
Filed: March 21, 2008
Date of Patent: July 19, 2011
Assignee: International Business Machines Corporation
Inventor: David Arnold Luick
-
Patent number: 7979677
Abstract: A method and device for adaptively allocating reservation station entries to an instruction set with variable operands in a microprocessor. The device includes logic for determining free reservation station queue positions in a reservation station. The device allocates an issue queue to an instruction and writes the instruction into the issue queue as an issue queue entry. The device reads an operand corresponding to the instruction from a general purpose register and writes the operand into the reservation station, using one of the free reservation station positions as a write address. The device writes each reservation station queue position corresponding to said instruction into said issue queue entry. When the instruction is ready for issue to an execution unit, the device reads out the instruction from the issue queue entry and the reservation station queue positions to the execution unit.
Type: Grant
Filed: August 3, 2007
Date of Patent: July 12, 2011
Assignee: International Business Machines Corporation
Inventor: Dung Q. Nguyen
-
Patent number: 7971035
Abstract: A data processing system having a memory for storing instructions and several central processing units for executing instructions, where each central processing unit includes an adaptive power supply which provides, among other data, temperature information. Circuitry is provided that receives the temperature information from the central processing units, selects a central processing unit which has the lowest temperature and is available to execute instructions, and dispatches instructions from the memory to the selected central processing unit.
Type: Grant
Filed: February 6, 2007
Date of Patent: June 28, 2011
Assignee: International Business Machines Corporation
Inventors: Deepak K. Singh, Francois Ibrahim Atallah
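The selection step above is a simple minimum over available cores. A minimal sketch, with the tuple representation of per-CPU state as an illustrative assumption:

```python
def pick_cpu(cpus):
    """Return the id of the available CPU reporting the lowest
    temperature, or None if no CPU is available.

    `cpus` is a list of (cpu_id, temp_c, available) tuples; this
    representation is an assumption for illustration, standing in
    for the temperature data the adaptive power supplies report.
    """
    candidates = [(temp, cpu_id) for cpu_id, temp, available in cpus
                  if available]
    return min(candidates)[1] if candidates else None
```

The dispatcher would call this each time it has instructions ready, routing work toward the coolest core.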
-
Publication number: 20110153989
Abstract: A vector compare-and-exchange operation is performed by: decoding, by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and, responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location and, responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.
Type: Application
Filed: December 22, 2009
Publication date: June 23, 2011
Inventors: Ravi Rajwar, Andrew T. Forsyth
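The element-wise semantics described above can be modeled in pure Python. This is a behavioral sketch only; real hardware operates on packed vector registers, and the function name is illustrative.

```python
def vector_compare_exchange(first, second, third):
    """Model of the vector compare-and-exchange the abstract describes.

    For each lane i, compare first[i] to second[i]; where they match,
    the element is replaced by third[i], otherwise first[i] is kept.
    Returns the new contents of the first storage location.
    """
    if not (len(first) == len(second) == len(third)):
        raise ValueError("all three vectors must have the same length")
    return [t if f == s else f
            for f, s, t in zip(first, second, third)]
```

For example, lanes 0 and 2 match below and are exchanged, while lane 1 is left untouched.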