Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)
  • Patent number: 8019973
    Abstract: An information processing apparatus and a method of controlling the same that employs a register window system and a Simultaneous Multithreading method for reducing circuit areas by sharing a data transfer bus between threads, said bus connecting a master register and a work register provided for each thread and for avoiding interference in instruction execution with other threads caused by a conflict between accesses to a register between threads. An information processing apparatus and a method of controlling the information processing apparatus employing a register window system for register reading, in which a master register and a work register are held for each thread and a bus for transferring data from the master to the work register is shared by threads in order to realize Simultaneous Multithreading.
    Type: Grant
    Filed: December 15, 2009
    Date of Patent: September 13, 2011
    Assignee: Fujitsu Limited
    Inventors: Takashi Suzuki, Toshio Yoshida
  • Patent number: 8006072
    Abstract: A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data.
    Type: Grant
    Filed: May 18, 2010
    Date of Patent: August 23, 2011
    Assignee: Micron Technology, Inc.
    Inventors: Neal Andrew Cook, Alan T. Wootton, James Peterson
  • Patent number: 8006077
    Abstract: A processing system features a first processing core to operate in a first node, a second processing core to operate in a second node, and random access memory (RAM) responsive to the first and second processing cores. The processing system also features control logic to perform operations such as (a) automatically updating a resident set size (RSS) counter to correspond to the RSS for the thread on the first node in response to allocation of a page frame for a thread in the first node, and (b) using the RSS counter to predict migration overhead when determining whether the thread should be migrated from the first processing core to the second processing core. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 29, 2007
    Date of Patent: August 23, 2011
    Assignee: Intel Corporation
    Inventors: Tong Li, Daniel Baumberger, Scott Hahn
  • Patent number: 7996654
    Abstract: The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions, (2) determine the dependency chain depth of all the instructions in the issue group, (3) schedule the instructions in an order of the longest dependency chain depth to shortest dependency chain depth, and (4) execute the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: August 9, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7984270
    Abstract: The present invention provides a system and method for prioritizing arithmetic instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one arithmetic instruction is in the issue group, if so scheduling the least one arithmetic instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one arithmetic instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: July 19, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7979783
    Abstract: An error detection device for a command decoder is described, the command decoder reading out an associated sequence of control signal words from a command memory based on an input word, wherein the sequence of control signal words has at least one control signal word, having: a controller designed to provide the input word at a first time and the input word at a second time for reading out the command memory, wherein the second time is delayed with respect to the first time, to effect a readout of the sequence of control signal words at a first time and a readout of the sequence of control signal words at a second time; and a comparator designed to receive and compare the associated sequences of control signal words read out at the first and second times, and to output a signal indicating an error if the associated sequences of control signal words read out at the first and second times are different.
    Type: Grant
    Filed: February 8, 2007
    Date of Patent: July 12, 2011
    Assignee: Infineon Technologies AG
    Inventors: Michael Goessel, Franz Klug, Steffen Marc Sonnekalb
  • Patent number: 7979677
    Abstract: A method and device for adaptively allocating reservation station entries to an instruction set with variable operands in a microprocessor. The device includes logic for determining free reservation station queue positions in a reservation station. The device allocates an issue queue to an instruction and writes the instruction into the issue queue as an issue queue entry. The device reads an operand corresponding to the instruction from a general purpose register and writes the operand into a reservation station using one of the free reservations station positions as a write address. The device writes each reservation station queue position corresponding to said instruction into said issue queue entry. When the instruction is ready for issue to an execution unit, the device reads out the instruction from the issue queue entry the reservation station queue positions to the execution unit.
    Type: Grant
    Filed: August 3, 2007
    Date of Patent: July 12, 2011
    Assignee: International Business Machines Corporation
    Inventor: Dung Q. Nguyen
  • Publication number: 20110161623
    Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.
    Type: Application
    Filed: December 30, 2009
    Publication date: June 30, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
  • Publication number: 20110138153
    Abstract: In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. The pick unit may pick from at least one of the buffers in a given cycle, a valid instruction based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication.
    Type: Application
    Filed: February 14, 2011
    Publication date: June 9, 2011
    Inventor: Robert T. Golla
  • Patent number: 7953959
    Abstract: A processor includes: an instruction buffer which holds a group of instructions that can be executed in parallel; an instruction decoding unit which decodes part or all of the group of instructions; and an instruction issuance control unit which detects whether or not a factor obstructing simultaneous execution of the group of instructions exists in the group of instructions and supplies the group of instructions to the instruction decoding unit by controlling the instruction buffer so that the instructions of the group of instructions are sequentially supplied when the factor exists and all the instructions of the group of instructions are simultaneously supplied when the factor does not exist.
    Type: Grant
    Filed: March 9, 2006
    Date of Patent: May 31, 2011
    Assignee: Panasonic Corporation
    Inventor: Tetsu Hosoki
  • Patent number: 7949856
    Abstract: According to embodiments of the invention, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: May 24, 2011
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Patent number: 7941648
    Abstract: A scalable reconfigurable register file (SRRF) containing multiple register files, read and write multiplexer complexes, and a control unit operating in response to instructions is described. Multiple address configurations of the register files are supported by each instruction and different configurations are operable simultaneously during a single instruction execution. For example, with separate files of the size 32×32 supported configurations of 128×32 bit s, 64×64 bit s and 32×128 bit s can be in operation each cycle. Single width, double width, quad width operands are optimally supported without increasing the register file size and without increasing the number of register file read or write ports.
    Type: Grant
    Filed: June 3, 2008
    Date of Patent: May 10, 2011
    Assignee: Altera Corporation
    Inventors: Gerald George Pechanek, Edward A. Wolff
  • Patent number: 7937563
    Abstract: A system and method for providing a digital real-time voltage droop detection and subsequent voltage droop reduction. A scheduler within a reservation station may store a weight value for each instruction corresponding to node capacitance switching activity for the instruction derived from pre-silicon power modeling analysis. For instructions picked with available source data, the corresponding weight values are summed together to produce a local current consumption value and this value is summed with any existing global current consumption values from corresponding schedulers of other processor cores yielding an activity event. The activity event is stored. Hashing functions within the scheduler are used to determine both a recent and an old activity average using the calculated activity event and stored older activity events.
    Type: Grant
    Filed: May 27, 2008
    Date of Patent: May 3, 2011
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Samuel D. Naffziger, Michael Gerard Butler
  • Patent number: 7934203
    Abstract: During program code conversion, such as in a dynamic binary translator, automatic code generation provides target code 21 executable by a target processor 13. Multiple instruction ports 610 disperse a group of instructions to functional units 620 of the processor 13. Disclosed is a mechanism of preparing an instruction group 606 using a plurality of pools 700 having a hierarchical structure 711-715. Each pool represents a different overlapping subset of the issue ports 610. Placing an instruction 600 into a particular pool 700 also reduces vacancies in any one or more subsidiary pools in the hierarchy. In a preferred embodiment, a counter value 702 is associated with each pool 700 to track vacancies. A valid instruction group 606 is formed by picking the placed instructions 600 from the pools 700. The instruction groups are generated accurately and automatically. Decoding errors and stalls are minimized or completely avoided.
    Type: Grant
    Filed: May 27, 2005
    Date of Patent: April 26, 2011
    Assignee: International Business Machines Corporation
    Inventors: William O. Lovett, David Haikney, Matthew Evans
  • Patent number: 7921277
    Abstract: According to embodiments of the invention, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: April 5, 2011
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Patent number: 7890734
    Abstract: In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. The pick unit may pick from at least one of the buffers in a given cycle, a valid instruction based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: February 15, 2011
    Assignee: Open Computing Trust I & II
    Inventor: Robert T. Golla
  • Patent number: 7890735
    Abstract: A multi-threaded microprocessor (1105) for processing instructions in threads. The microprocessor (1105) includes first and second decode pipelines (1730.0, 1730.1), first and second execute pipelines (1740, 1750), and coupling circuitry (1916) operable in a first mode to couple first and second threads from the first and second decode pipelines (1730.0, 1730.1) to the first and second execute pipelines (1740, 1750) respectively, and the coupling circuitry (1916) operable in a second mode to couple the first thread to both the first and second execute pipelines (1740, 1750). Various processes of manufacture, articles of manufacture, processes and methods of operation, circuits, devices, and systems are disclosed.
    Type: Grant
    Filed: August 23, 2006
    Date of Patent: February 15, 2011
    Assignee: Texas Instruments Incorporated
    Inventor: Thang Tran
  • Patent number: 7882335
    Abstract: The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one load instruction is in the issue group, if so scheduling the least one load instruction in a first pipeline based upon a priority list; and (3) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: February 1, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A Luick
  • Patent number: 7877579
    Abstract: The present invention provides a system and method for prioritizing compare instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one compare instruction is in the issue group, if so scheduling the least one compare instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one compare instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: January 25, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7873817
    Abstract: A reduced instruction set computer (RISC) processor includes a processing core, which is arranged to process a software thread. A hardware-implemented scheduler is arranged to receive respective contexts of a plurality of software threads, to determine a schedule for processing of the software threads by the processing core, and to serve the contexts to the processing core in accordance with the schedule.
    Type: Grant
    Filed: October 18, 2005
    Date of Patent: January 18, 2011
    Inventors: Eli Aloni, Gilad Ayalon, Oren David
  • Patent number: 7870368
    Abstract: The present invention provides a system and method for prioritizing branch instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one branch instruction is in the issue group, if so scheduling the least one branch instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one branch instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: January 11, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7865700
    Abstract: The present invention provides a system and method for prioritizing store instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one store instruction is in the issue group, if so scheduling the least one store instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one store instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: January 4, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7861061
    Abstract: A processor and a method for executing VLIW instructions by first fetching a VLIW instruction and then identifying from option bits encoded in a first one of the instructions within the fetched VLIW instruction packet which, if any, of the remaining instructions within the VLIW instruction are to be executed in the same execution cycle as the first instruction. Finally, executing the first instruction and any remaining instructions identified from the encoded option bits.
    Type: Grant
    Filed: May 23, 2003
    Date of Patent: December 28, 2010
    Assignee: STMicroelectronics (R&D) Ltd.
    Inventor: Zahid Hussain
  • Publication number: 20100312989
    Abstract: A processor 2 supporting register renaming has a rename table 20 in which the flag register has multiple tag values associated therewith. These tag values indicate which virtual register corresponds to a destination flag register of the oldest instruction which wrote a still up-to-date value of a subset of the flags.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Inventor: James Nolan Hardage
  • Patent number: 7827388
    Abstract: Each instruction thread in a SMT processor is associated with a software assigned base input processing priority. Unless some predefined event or circumstance occurs with an instruction being processed or to be processed, the base input processing priorities of the respective threads are used to determine the interleave frequency between the threads according to some instruction interleave rule. However, upon the occurrence of some predefined event or circumstance in the processor related to a particular instruction thread, the base input processing priority of one or more instruction threads is adjusted to produce one more adjusted priority values. The instruction interleave rule is then enforced according to the adjusted priority value or values together with any base input processing priority values that have not been subject to adjustment.
    Type: Grant
    Filed: March 7, 2008
    Date of Patent: November 2, 2010
    Assignee: International Business Machines Corporation
    Inventors: John Wesley Ward, III, Minh Michelle Quy Pham, Ronald Nick Kalla, Balaram Sinharoy
  • Patent number: 7822592
    Abstract: In an active system, an actor is able to effect action in a subject system. The actor and the subject system exist in an environment which can impact the subject system. Neither the actor nor the subject system has any control over the environment. The actor includes a model and a processor. The processor is guided by the model. The processor is arranged to effect action in the subject system. The subject system is known by the model. This allows the actor to be guided in its action on the subject system by the model of the subject system. Events can occur in the subject system either through the actions of the actor, as guided by the model, or through actions of other actors, or through a change in state of the subject system itself (e.g. the progression of a chemical reaction) or its environment (e.g. the passage of time). The actor keeps the model updated with its own actions.
    Type: Grant
    Filed: October 17, 2005
    Date of Patent: October 26, 2010
    Assignee: Manthatron-IP Limited
    Inventor: Peter Hawkins
  • Publication number: 20100257340
    Abstract: Disclosed are a method and a system for grouping processor instructions for execution by a processor, where the group of processor instructions includes at least two branch processor instructions. In one or more embodiments, an instruction buffer can decouple an instruction fetch operation from an instruction decode operation by storing fetched processor instructions in the instruction buffer until the fetched processor instructions are ready to be decoded. Group formation can involve removing processor instructions from the instruction buffer and routing the processor instruction to latches that convey the processor instructions to decoders. Processor instructions that are removed from instruction buffer in a single clock cycle can be called a group of processor instructions. In one or more embodiments, the first instruction in the group must be the oldest instruction in the instruction buffer and instructions must be removed from the instruction buffer ordered from oldest to youngest.
    Type: Application
    Filed: April 3, 2009
    Publication date: October 7, 2010
    Applicant: INTERNATIONAL BUISNESS MACHINES CORPORATION
    Inventors: Richard William Doing, Kevin Neal Magil, Balaram Sinharoy, Jeffrey R. Summers, James A. Van Norstrand, JR.
  • Patent number: 7810084
    Abstract: Computer-implemented methods, computer systems and computer program products are provided for parallel processing a plurality of data objects with a plurality of processors. As disclosed herein, the data objects to be assembled for further processing may be in bundles, the bundles obeying first predefined criteria, which is dynamically controlled by using a bundle specific master table. The methods and systems may generate pipelines of data objects by pre-selecting and grouping the data objects according to second predefined criteria by a first group of the plurality of processors, and create the bundles from each pipeline of the pre-selected data objects by a second group of the plurality of processors.
    Type: Grant
    Filed: June 1, 2006
    Date of Patent: October 5, 2010
    Assignee: SAP AG
    Inventor: Karsten S. Egetoft
  • Patent number: 7797517
    Abstract: Reference architecture instructions are translated into target architecture operations. Sequences of operations, in a predicted execution order in some embodiments, form traces. In some embodiments, a trace is based on a plurality of basic blocks. In some embodiments, a trace is committed or aborted as a single entity. Sequences of operations are optimized by fusing collections of operations; fused operations specify a same observable function as respective collections, but advantageously enable more efficient processing. In some embodiments, a collection comprises multiple register operations. Fusing a register operation with a branch operation in a trace forms a fused reg-op/branch operation. In some embodiments, branch instructions translate into assert operations. Fusing an assert operation with another operation forms a fused assert operation. In some embodiments, fused operations only set architectural state, such as high-order portions of registers, that is subsequently read before being written.
    Type: Grant
    Filed: November 17, 2006
    Date of Patent: September 14, 2010
    Assignee: Oracle America, Inc.
    Inventor: John Gregory Favor
  • Patent number: 7793080
    Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: September 7, 2010
    Inventors: Gene Shen, Sean Lie
  • Patent number: 7779236
    Abstract: The invention provides a method and system for operating a pipelined microprocessor more quickly, by detecting instructions that load from identical memory locations as were recently stored to, without having to actually compute the referenced external memory addresses. The microprocessor examines the symbolic structure of instructions as they are encountered, so as to be able to detect identical memory locations by examination of their symbolic structure. For example, in a preferred embodiment, instructions that store to and load from an identical offset from an identical register are determined to be referencing the identical memory location, without having to actually compute the complete physical target address.
    Type: Grant
    Filed: November 19, 1999
    Date of Patent: August 17, 2010
    Assignee: STMicroelectronics, Inc.
    Inventor: David L. Isaman
  • Patent number: 7779165
    Abstract: Producers and consumer processes may synchronize and transfer data using a shared data structure. After locating a potential transfer location that indicates an EMPTY status, a producer may store data to be transferred in the transfer location. A producer may use a compare-and-swap (CAS) operation to store the transfer data to the transfer location. A consumer may subsequently read the transfer data from the transfer location and store, such as by using a CAS operation, a DONE status indicator in the transfer location. The producer may notice the DONE indication and may then set the status location back to EMPTY to indicate that the location is available for future transfers, by the same or a different producer. The producer may also monitor the transfer location and time out if no consumer has picked up the transfer data.
    Type: Grant
    Filed: January 4, 2006
    Date of Patent: August 17, 2010
    Assignee: Oracle America, Inc.
    Inventors: Mark S. Moir, Daniel S. Nussbaum, Ori Shalev, Nir N. Shavit
  • Publication number: 20100199072
    Abstract: A register file comprising a plurality of register entries for storing data values for use in the execution of data processing instructions is provided, and comprises at least one write port and at least one read port, and circuitry responsive to a write request received at said at least one write port to update one of said plurality of register entries identified by an address specified by said write request with a data value specified by said write request. The register file also comprises further circuitry responsive to a received control signal to set at least a portion of a predetermined register entry to a predetermined value. In this way, certain register file updating instructions can be executed in parallel with other instructions without the need for additional full write-ports as would be required for typical dual-issue, thereby reducing area and routing complexity and cost compared with the use of an additional write-port due to the lower gate count required by the proposed further circuitry.
    Type: Application
    Filed: February 2, 2009
    Publication date: August 5, 2010
    Applicant: ARM LIMITED
    Inventor: Simon John Craske
  • Patent number: 7769980
    Abstract: In arithmetic/logic units (ALU) provided corresponding to entries, an MIMD instruction decoder generating a group of control signals in accordance with a Multiple Instruction-Multiple Data (MIMD) instruction and an MIMD register storing data designating the MIMD instruction are provided, and an inter-ALU communication circuit is provided. The amount and direction of movement of the inter-ALU communication circuit are set by data bits stored in a movement data register. It is possible to execute data movement and arithmetic/logic operation with the amount of movement and operation instruction set individually for each ALU unit. Therefore, in a Single Instruction-Multiple Data type processing device, Multiple Instruction-Multiple Data operation can be executed at high speed in a flexible manner.
    Type: Grant
    Filed: August 16, 2007
    Date of Patent: August 3, 2010
    Assignee: Renesas Technology Corp.
    Inventors: Toshinori Sueyoshi, Masahiro Iida, Mitsutaka Nakano, Fumiaki Senoue, Katsuya Mizumoto
  • Patent number: 7761691
    Abstract: A method for scheduling instructions for clustered digital signal processors comprising a plurality of clusters, each cluster including at least two functional units and a first register file having a first unit, a second unit and a single set of access ports shared by the functional units comprises steps of checking whether executing one instruction needs data to be read from the first unit and the second unit of the first register file, generating a copying instruction to transfer data from the first unit to the second unit of the first register file, checking whether there is a prior operation cycle available to perform the copying instruction, scheduling the copying instruction in the prior operation cycle, and scheduling the instruction after the copying instruction.
    Type: Grant
    Filed: October 27, 2005
    Date of Patent: July 20, 2010
    Assignee: National Tsing Hua University
    Inventors: Chung-Lin Tang, Yung-Chia Lin, Jenq-Kuen Lee
  • Publication number: 20100174884
    Abstract: A processor (101) in which a plurality of arithmetic elements executing instructions are embedded includes: fixed function arithmetic elements (121 to 123) each having a circuit configuration that is not dynamically reconfigurable; a reconfigurable arithmetic element (125) having a circuit configuration that is dynamically reconfigurable; and an arithmetic operation control unit (113) which allocates instructions to the fixed function arithmetic elements (121 to 123) and the reconfigurable arithmetic element (125) and issues the allocated instructions to the respective arithmetic elements.
    Type: Application
    Filed: November 9, 2006
    Publication date: July 8, 2010
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventors: Hiroyuki Morishita, Takao Yamamoto, Masaitsu Nakajima
  • Publication number: 20100161945
    Abstract: An information handling system includes a processor that may perform issue queue virtual load/store instruction operations. The issue queue maintains load and store instructions with a real/virtual dependency flag. The issue queue provides storage resources for real and virtual load/store instructions. Real load/store instructions execute in a load store unit LSU. Virtual load/store instructions are pending execution in the LSU. The LSU may keep track of each virtual load/store instruction within the issue queue by thread, type, and pointer data. Provided that all dependencies are clear for a pending virtual load/store instruction, the LSU marks the pending virtual load/store instruction as real. The pending virtual load/store instruction may then issue to the LSU as a real load/store instruction.
    Type: Application
    Filed: December 22, 2008
    Publication date: June 24, 2010
    Applicants: International Business Machines Corporation, IBM Corporation
    Inventors: William E. Burky, Kurt A. Feiste, Dung Quoc Nguyen, Balaram Sinharoy, Albert Thomas Williams
  • Patent number: 7739483
    Abstract: A method and apparatus for dual-target register allocation is described, intended to enable the efficient mapping/renaming of registers associated with instructions within a pipelined microprocessor architecture.
    Type: Grant
    Filed: September 28, 2001
    Date of Patent: June 15, 2010
    Assignee: Intel Corporation
    Inventors: Rajesh Patel, James Dundas, Adi Yoaz
  • Publication number: 20100138810
    Abstract: Paralleling processing system and method. When clusters are formed based on strongly connected components, a single cluster (fat cluster) having at least a predetermined number of blocks, or an expected processing time exceeding a predetermined threshold, is formed. The fat cluster is subjected to an unrolling process to make multiple copies of the processing of the fat cluster and to assign the copies to individual processors. Processing of the fat cluster is executed by the multiple processor devices in a pipelined manner. If a fat cluster to be iteratively executed cannot be executed in the pipelined manner because a processing result of an nth iteration of the fat cluster depends on a processing result of a preceding iteration of the fat cluster an input value needed for execution of the fat cluster is generated based on a certain prediction, and the fat cluster is speculatively executed.
    Type: Application
    Filed: December 2, 2009
    Publication date: June 3, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Hideaki Komatsu, Arquimedes Martinez Canedo, Takeo Yoshizawa
  • Patent number: 7730280
    Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.
    Type: Grant
    Filed: April 18, 2007
    Date of Patent: June 1, 2010
    Assignee: Vicore Technologies, Inc.
    Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo M. Stojancic
  • Publication number: 20100131741
    Abstract: A microcontroller capable of improving processing performance as a whole by executing different programs by a plurality of CPUs and capable of detecting abnormality for safety-required processing by evaluating results of the same processing executed by the plurality of CPUs. A plurality of processing systems including CPUs and memories are provided, data output from the CPUs in each of the processing systems is separately compressed and stored by compressors for each of the CPUs, respectively. The compressed storage data is mutually compared by a comparator, and abnormality of processing can be detected when the comparison result indicates a mismatch. Even when the timings by which the same processing results are obtained are different when the plurality of CPUs asynchronously execute the same processing, the processing results of both of them can be easily compared with each other since compression is carried out by the compressors.
    Type: Application
    Filed: November 2, 2009
    Publication date: May 27, 2010
    Inventors: Hiromichi YAMADA, Kotaro Shimamura, Kesami Hagiwara, Yoshikazu Kiyoshige, Yuichi Ishiguro
  • Patent number: 7721070
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Grant
    Filed: September 22, 2008
    Date of Patent: May 18, 2010
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 7707578
    Abstract: A thread scheduling mechanism is provided that flexibly enforces performance isolation of multiple threads to alleviate the effect of anti-cooperative execution behavior with respect to a shared resource, for example, hoarding a cache or pipeline, using the hardware capabilities of simultaneous multi-threaded (SMT) or multi-core processors. Given a plurality of threads running on at least two processors in at least one functional processor group, the occurrence of a rescheduling condition indicating anti-cooperative execution behavior is sensed, and, if present, at least one of the threads is rescheduled such that the first and second threads no longer execute in the same functional processor group at the same time.
    Type: Grant
    Filed: December 16, 2004
    Date of Patent: April 27, 2010
    Assignee: VMware, Inc.
    Inventors: John R. Zedlewski, Carl A. Waldspurger
  • Patent number: 7698707
    Abstract: Identifying compatible threads in a Simultaneous Multithreading (SMT) processor environment is provided by calculating a performance metric, such as cycles per instruction (CPI), that occurs when two threads are running on the SMT processor. The CPI that is achieved when both threads were executing on the SMT processor is determined. If the CPI that was achieved is better than the compatibility threshold, then information indicating the compatibility is recorded. When a thread is about to complete, the scheduler looks at the run queue from which the completing thread belongs to dispatch another thread. The scheduler identifies a thread that is (1) compatible with the thread that is still running on the SMT processor (i.e., the thread that is not about to complete), and (2) ready to execute. The CPI data is continually updated so that threads that are compatible with one another are continually identified.
    Type: Grant
    Filed: February 25, 2008
    Date of Patent: April 13, 2010
    Assignee: International Business Machines Corporation
    Inventors: Jos Manuel Accapadi, Andrew Dunshea, Dirk Michel, Mysore Sathyanarayana Srinivas
  • Patent number: 7697007
    Abstract: A controlling process may enable or disable the launching of a predicated process that has already been queued for launching, e.g. via a pushbuffer. The controlling process generates a report so that launching of the predicated process is enabled or disabled based on the report. The predicate may be global in application to enable or disable all subsequent launch commands. Alternatively, the predicate may be specific to one or more predicated processes. In an embodiment with a central processing unit (CPU) coupled to a graphics processing unit (GPU), the CPU may generate the controlling process that enables or disables the launch of the predicated process. Alternatively or additionally, the GPU may generate the controlling process that enables or disables the launch of the predicated process.
    Type: Grant
    Filed: July 12, 2006
    Date of Patent: April 13, 2010
    Assignee: NVIDIA Corporation
    Inventor: Jerome F. Duluk, Jr.
  • Patent number: 7698535
    Abstract: An asynchronous circuit is described for processing units of data having a program order associated therewith. The circuit includes an N-way-issue resource comprising N parallel pipelines. Each pipeline is operable to transmit a subset of the units of data in a first-in-first-out manner. The asynchronous circuit is operable to sequentially control transmission of the units of data in the pipelines such that the program order is maintained.
    Type: Grant
    Filed: September 16, 2003
    Date of Patent: April 13, 2010
    Assignee: Fulcrum Microsystems, Inc.
    Inventors: Andrew Lines, Robert Southworth, Uri Cummings
  • Patent number: 7685403
    Abstract: Instructions asserted in the instruction pipeline (3) of the microprocessor are accompanied by control information, comprising a group of bits, asserted within a control information pipeline (15) of the processor. The control information pipeline is synchronized to the instruction pipeline so that the control information for an instruction progresses in synchronism with the instruction. The control information may identify, directly or indirectly, the type of operation called for by the instruction and, if the operation is to be performed in parts, indicate the part to be performed. Means are included in to the processor, such as a number of functional execution units (7), to interpret that control information and take appropriate action.
    Type: Grant
    Filed: June 16, 2003
    Date of Patent: March 23, 2010
    Inventors: Brett Coon, Godfrey D'Souza, Paul Serris
  • Patent number: 7681018
    Abstract: A parallel hardware-based multithreaded processor is described. The processor includes a general purpose processor that coordinates system functions and a plurality of microengines that support multiple hardware threads or contexts. The processor also includes a memory control system that has a first memory controller that sorts memory references based on whether the memory references are directed to an even bank or an odd bank of memory and a second memory controller that optimizes memory references based upon whether the memory references are read references or write references. Instructions for switching and branching based on executing contexts are also disclosed.
    Type: Grant
    Filed: January 12, 2001
    Date of Patent: March 16, 2010
    Assignee: Intel Corporation
    Inventors: Gilbert Wolrich, Matthew J. Adiletta, William Wheeler
  • Publication number: 20100064121
    Abstract: A dual-issue instruction is decoded to determine a plurality of LSU dependencies needed by an LSU part of the dual-issue instruction and a plurality of non-LSU dependencies needed by a non-LSU part of the dual-issue instruction. During dispatch of the dual-issue instruction by the microprocessor, the dual dependency matrices are employed as follows: a Load-Store Unit (LSU) dependency matrix is written with the plurality of LSU dependencies and a non-LSU dependency matrix is written with the plurality of non-LSU dependencies; an LSU issue valid (LSU IV) indicator is set as valid to issue; an LSU portion of the dual-issue instruction is issued once the plurality of LSU dependencies of the dual issue instruction are satisfied; a non-LSU issue valid (non-LSU IV) indicator is set as valid to issue; and a non-LSU portion of the dual-issue instruction is issued once the plurality of non-LSU dependencies of the dual issue instruction are satisfied.
    Type: Application
    Filed: September 11, 2008
    Publication date: March 11, 2010
    Applicant: International Business Machines Corporation
    Inventors: Gregory W. Alexander, Brian D. Barrick, Lee E. Eisen, John W. Ward, III
  • Patent number: 7676656
    Abstract: A method and apparatus for minimizing unscheduled D-cache miss pipeline stalls is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group is a load instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is not delayed with respect to another pipeline in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: July 2, 2008
    Date of Patent: March 9, 2010
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick