Superscalar Patents (Class 712/23)
  • Patent number: 7562206
    Abstract: Microarchitecture policies and structures to predict execution clusters and facilitate inter-cluster communication are disclosed. In disclosed embodiments, sequentially ordered instructions are decoded into micro-operations. Execution of one set of micro-operations is predicted to involve execution resources to perform memory access operations and inter-cluster communication, but not to perform branching operations. Execution of a second set of micro-operations is predicted to involve execution resources to perform branching operations but not to perform memory access operations. The micro-operations are partitioned for execution in accordance with these predictions, the first set of micro-operations to a first cluster of execution resources and the second set of micro-operations to a second cluster of execution resources. The first and second sets of micro-operations are executed out of sequential order and are retired to represent their sequential instruction ordering.
    Type: Grant
    Filed: December 30, 2005
    Date of Patent: July 14, 2009
    Assignee: Intel Corporation
    Inventors: Avinash Sodani, Alexandre J. Farcy, Stephan J. Jourdan, Per Hammarlund, Mark C. Davis
  • Publication number: 20090172359
    Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
    Type: Application
    Filed: December 31, 2007
    Publication date: July 2, 2009
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Gene Shen, Sean Lie
  • Patent number: 7555631
    Abstract: A register system for a data processor which operates in a plurality of modes. The register system provides multiple, identical banks of register sets, the data processor controlling access such that instructions and processes need not specify any given bank. An integer register set includes first (RA[23:0]) and second (RA[31:24]) subsets, and a shadow subset (RT[31:24]). While the data processor is in a first mode, instructions access the first and second subsets. While the data processor is in a second mode, instructions may access the first subset, but any attempts to access the second subset are re-routed to the shadow subset instead, transparently to the instructions, allowing system routines to seemingly use the second subset without having to save and restore data which user routines have written to the second subset. A re-typable register set provides integer width data and floating point width data in response to integer instructions and floating point instructions, respectively.
    Type: Grant
    Filed: January 31, 2002
    Date of Patent: June 30, 2009
    Inventors: Sanjiv Garg, Derek J. Lentz, Le Trong Nguyen, Sho Long Chen
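The register re-routing described in the abstract of patent 7555631 can be illustrated with a minimal sketch. The function name `resolve_register` and the boolean `second_mode` flag are hypothetical and not taken from the patent; only the RA[23:0], RA[31:24] and RT[31:24] subsets come from the abstract.

```python
# Hypothetical model of the shadowed integer register set in the abstract.
FIRST_SUBSET = range(0, 24)    # RA[23:0]
SECOND_SUBSET = range(24, 32)  # RA[31:24], shadowed by RT[31:24]


def resolve_register(index, second_mode):
    """Return the (bank, index) pair that an access to RA[index] reaches."""
    if index not in range(32):
        raise ValueError("register index out of range")
    if second_mode and index in SECOND_SUBSET:
        # In the second mode the access is re-routed to the shadow subset,
        # so the instruction itself never names RT.
        return ("RT", index)
    return ("RA", index)
```

In this sketch a system routine running in the second mode can write "RA[30]" without disturbing the user routine's value, because the write lands in RT[30].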
  • Patent number: 7555632
    Abstract: The high-performance, RISC core based microprocessor architecture includes an instruction fetch unit for fetching instruction sets from an instruction store and an execution unit that implements the concurrent execution of a plurality of instructions through a parallel array of functional units. The fetch unit generally maintains a predetermined number of instructions in an instruction buffer. The execution unit includes an instruction selection unit, coupled to the instruction buffer, for selecting instructions for execution, and a plurality of functional units for performing instruction specified functional operations. A unified instruction scheduler, within the instruction selection unit, initiates the processing of instructions through the functional units when instructions are determined to be available for execution and for which at least one of the functional units implementing a necessary computational function is available.
    Type: Grant
    Filed: December 27, 2005
    Date of Patent: June 30, 2009
    Assignee: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Publication number: 20090138674
    Abstract: An electronic system includes a pipeline having a first number of pipeline stages coupled in series, a pipeline control unit, and a logic engine, wherein each pipeline stage in the pipeline is for outputting data to a next pipeline stage at each cycle of a clock signal. The pipeline control unit is for changing the first number of pipeline stages in the pipeline to a second number of pipeline stages. The logic engine is for performing operations of the electronic system in a first mode by utilizing the pipeline having the first number of pipeline stages and for performing operations of the electronic system in a second mode by utilizing the pipeline having the second number of pipeline stages. A frequency control unit and a voltage control unit, coupled to the pipeline and the logic engine, respectively adjust the frequency and voltage of the electronic system accordingly.
    Type: Application
    Filed: November 22, 2007
    Publication date: May 28, 2009
    Inventors: Li-Hung Chang, Hong-Men Su
  • Patent number: 7533248
    Abstract: A multithreaded processor including a shared functional unit. In one embodiment, the multithreaded processor includes a functional unit coupled to a multithreaded instruction source that may request access to use the functional unit. The multithreaded processor may also include a processing unit that is coupled to request access to use the functional unit. The functional unit may be configured to execute one of an instruction provided by the multithreaded instruction source and an operation provided by the processing unit in a given cycle dependent upon which of the multithreaded instruction source and the processing unit has a higher priority.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: May 12, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Robert T. Golla, Gregory F. Grohoski
  • Patent number: 7506185
    Abstract: A microelectronic device according to the present invention is made up of two or more functional units, which are all disposed on a single chip, or die. The present invention works on the strategy that all of the functional units on the die are not, and do not need to be, operational at a given time in the execution of a computer program that is controlling the microelectronic device. The present invention on a very rapid basis (typically a half clock cycle), therefore, turns on and off the functional units of the microelectronic device in accordance with the requirements of the program being executed. This power down can be achieved by one of three techniques: turning off clock inputs to the functional units, interrupting the supply of power to the functional units, or deactivating input signals to the functional units.
    Type: Grant
    Filed: June 6, 2006
    Date of Patent: March 17, 2009
    Assignee: Seiko Epson Corporation
    Inventor: Chong Ming Lin
  • Patent number: 7490225
    Abstract: Synchronized register renaming between a master processor and a coprocessor that receives operations from the master enables efficient implementation of register renaming and operation execution in the processors. An ideal and an external register allocation map are implemented in the coprocessor. When registers are no longer allocated according to the ideal allocation map and the registers are currently allocated according to the external allocation map, the registers are deallocated in the external map and the number of freed registers is reported to the master. The master increments a free register credit count accordingly, and decrements the credit count by one for each operation issued to the coprocessor. An operation is not issued to the coprocessor unless at least a register is free according to the credit count. The master also throttles coprocessor operation issue based on a credit count corresponding to free scheduler entries available in the coprocessor.
    Type: Grant
    Filed: October 31, 2006
    Date of Patent: February 10, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: John Gregory Favor, Christopher P. Nelson
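The credit-based throttling in the abstract of patent 7490225 can be sketched as follows. The class and method names are hypothetical. The abstract only states that the register credit count drops by one per issued operation and rises by the reported number of freed registers; treating scheduler-entry credits the same way is an assumption made here for illustration.

```python
class CreditThrottle:
    """Hypothetical master-side credit counters for coprocessor issue."""

    def __init__(self, register_credits, scheduler_credits):
        self.register_credits = register_credits
        self.scheduler_credits = scheduler_credits

    def can_issue(self):
        # An operation is issued only if at least one register and
        # (by assumption) one scheduler entry are free.
        return self.register_credits >= 1 and self.scheduler_credits >= 1

    def issue(self):
        if not self.can_issue():
            return False  # master stalls this operation
        self.register_credits -= 1
        self.scheduler_credits -= 1  # assumption: one entry per operation
        return True

    def on_registers_freed(self, count):
        # The coprocessor reports how many registers it deallocated.
        self.register_credits += count

    def on_scheduler_entries_freed(self, count):
        self.scheduler_credits += count
```

The master never needs to see the coprocessor's allocation maps in this model; the two counters are its only view of the coprocessor's free resources.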
  • Publication number: 20090019257
    Abstract: A mechanism for superscalar decode of variable length instructions. A length decode unit may obtain a plurality of instruction bytes based on a scan window of a predetermined size. The instruction bytes may be associated with a plurality of variable length instructions, which are scheduled to be executed by a processing unit. The length decode unit may, for each instruction byte, estimate the start of a next variable length instruction following a current variable length instruction, and store a first pointer. A pre-pick unit may, for each instruction byte, use the first pointer to estimate the start of a subsequent variable length instruction following the next variable length instruction within the scan window, and store a second pointer. A pick unit may use a start pointer and related first and second pointers to determine the actual start of the variable length instructions within the scan window, and generate instruction pointers.
    Type: Application
    Filed: July 10, 2007
    Publication date: January 15, 2009
    Inventors: Gene W. Shen, Sean Lie
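The three-stage pointer scheme in the abstract of publication 20090019257 can be illustrated with a toy model. The function names and the `length_of` callback are hypothetical; real length decoding depends on the instruction set, so the usage note below assumes a made-up encoding in which the first byte of an instruction is its length.

```python
def length_decode(window, length_of):
    """First pointer per byte: where the next instruction would start
    if an instruction began at this byte."""
    first = []
    for i, byte in enumerate(window):
        length = length_of(byte)
        if length < 1:
            raise ValueError("instruction length must be at least 1")
        first.append(i + length)
    return first


def pre_pick(first):
    """Second pointer per byte: start of the instruction after the next one,
    or None when the next instruction starts outside the scan window."""
    n = len(first)
    return [first[p] if p < n else None for p in first]


def pick(start, first, second):
    """Follow the pointers from the known start to find the actual starts."""
    n = len(first)
    starts = []
    pos = start
    while pos is not None and pos < n:
        starts.append(pos)
        nxt = first[pos]
        if nxt < n:
            starts.append(nxt)
        pos = second[pos]  # hop two instructions at a time
    return starts
```

With the toy encoding, `pick(0, first, second)` on the window `[2, 9, 1, 3, 7, 7, 2, 5]` returns `[0, 2, 3, 6]`: the speculative pointers are computed for every byte, and only the chain reachable from the start pointer is kept.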
  • Publication number: 20080320274
    Abstract: An apparatus for queue allocation. An embodiment of the apparatus includes a dispatch order data structure, a bit vector, and a queue controller. The dispatch order data structure corresponds to a queue. The dispatch order data structure stores a plurality of dispatch indicators associated with a plurality of pairs of entries of the queue to indicate a write order of the entries in the queue. The bit vector stores a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure. The queue controller interfaces with the queue and the dispatch order data structure. The queue controller excludes at least some of the entries from a queue operation based on the mask values of the bit vector.
    Type: Application
    Filed: June 19, 2007
    Publication date: December 25, 2008
    Applicant: Raza Microelectronics, Inc.
    Inventors: Gaurav Singh, Srivatsan Srinivasan, Lintsung Wong
  • Publication number: 20080313424
    Abstract: There is provided a multi-bit storage cell for a register file. The storage cell includes a first set of storage elements for a vector slice. Each storage element respectively corresponds to a particular one of a plurality of thread sets for the vector slice. The storage cell includes a second set of storage elements for a scalar slice. Each storage element in the second set respectively corresponds to a particular one of at least one thread set for the scalar slice. The storage cell includes at least one selection circuit for selecting, for an instruction issued by a thread, a particular one of the storage elements from any of the first set and the second set based upon the instruction being a vector instruction or a scalar instruction and based upon a corresponding set from among the pluralities of thread sets to which the thread belongs.
    Type: Application
    Filed: June 13, 2007
    Publication date: December 18, 2008
    Inventor: Michael Gschwind
  • Publication number: 20080313425
    Abstract: A method, system, and computer program product are provided for enhancing the execution of independent loads in a processing unit. A processing unit detects if a long-latency miss associated with a load instruction has been encountered. Responsive to a long-latency miss, the processing unit enters a load lookahead mode. Responsive to entering the load lookahead mode, the processing unit dispatches each instruction from a first set of instructions from a first buffer with an associated vector. The processing unit determines if the first set of instructions from the first buffer have completed execution. Responsive to completed execution of the first set of instructions from the first buffer, the processing unit copies the set of vectors from a first vector array to a second vector array. Then the processing unit dispatches a second set of instructions from a second buffer with an associated vector from the second vector array.
    Type: Application
    Filed: June 15, 2007
    Publication date: December 18, 2008
    Inventors: Hung Q. Le, Dung Q. Nguyen
  • Patent number: 7467385
    Abstract: A multi-streaming processor has a plurality of streams for streaming one or more instruction threads, a set of functional resources for processing instructions from streams, and interrupt handler logic. The logic detects and maps interrupts and exceptions to one or more specific streams. In some embodiments one interrupt or exception may be mapped to two or more streams, and in others two or more interrupts or exceptions may be mapped to one stream. Mapping may be static and determined at processor design, programmable, with data stored and amendable, or conditional and dynamic, the interrupt logic executing an algorithm sensitive to variables to determine the mapping. Interrupts may be external interrupts generated by devices external to the processor, software (internal) interrupts generated by active streams, or conditional, based on variables. After interrupts are acknowledged, streams to which interrupts or exceptions are mapped are vectored to appropriate service routines.
    Type: Grant
    Filed: March 21, 2006
    Date of Patent: December 16, 2008
    Assignee: MIPS Technologies, Inc.
    Inventors: Adolfo M Nemirovsky, Mario D Nemirovsky, Narendra Sankar
  • Patent number: 7467286
    Abstract: A method and apparatus are provided for executing packed data instructions. According to one aspect of the invention, a processor includes registers, a register renaming unit coupled to the registers, a decoder coupled to the register renaming unit, and a partial-width execution unit coupled to the decoder. The register renaming unit provides an architectural register file to store packed data operands that include data elements. The decoder is to decode a first and second set of instructions that each specify one or more registers in the architectural register file. Each of the instructions in the first set specify operations to be performed on all of the data elements. In contrast, each of the instructions in the second set specify operations to be performed on only a subset of the data elements. The partial-width execution unit is to execute operations specified by either the first or second set of instructions.
    Type: Grant
    Filed: May 9, 2005
    Date of Patent: December 16, 2008
    Assignee: Intel Corporation
    Inventors: Mohammad Abdallah, James Coke, Vladimir Pentkovski, Patrice Roussel, Shreekant S. Thakkar
  • Patent number: 7464242
    Abstract: A method, an apparatus, and a computer program product are provided for detecting load/store dependency in a memory system by dynamically changing the address width for comparison. An incoming load/store operation must be compared to the operations in the pipeline and the queues to avoid address conflicts. Overall, the present invention introduces a cache hit or cache miss input into the load/store dependency logic. If the incoming load operation is a cache hit, then the quadword boundary address value is used for detection. If the incoming load operation is a cache miss, then the cacheline boundary address value is used for detection. This invention enhances the performance of LHS and LHR operations in a memory system.
    Type: Grant
    Filed: February 3, 2005
    Date of Patent: December 9, 2008
    Assignee: International Business Machines Corporation
    Inventors: Brian David Barrick, Dwain Alan Hicks, Takeki Osanai, David Scott Ray
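The hit/miss-dependent comparison width in the abstract of patent 7464242 can be sketched as below. The function name `conflicts` is hypothetical, and the granule sizes (16-byte quadword, 128-byte cache line) are assumptions; the abstract does not state them.

```python
QUADWORD_BYTES = 16    # assumption: quadword size is not given in the abstract
CACHELINE_BYTES = 128  # assumption: cache line size is not given either


def conflicts(load_addr, pending_addr, cache_hit):
    """Compare an incoming load against one pending operation's address,
    using a granule chosen by the cache hit/miss input."""
    granule = QUADWORD_BYTES if cache_hit else CACHELINE_BYTES
    return load_addr // granule == pending_addr // granule
```

Under these assumptions, addresses 0x1000 and 0x1020 do not conflict on a cache hit (different quadwords) but do conflict on a miss (same cache line), which is the narrowing of false dependencies the abstract describes.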
  • Patent number: 7460989
    Abstract: A method is provided, wherein a virtual internal master clock is used in connection with a RISC CPU. The RISC CPU comprises a number of concurrently operating function units, wherein each unit runs according to its own clocks, including multiple-stage totally unsynchronized clocks, in order to process a stream of instructions. The method includes the steps of generating a virtual model master clock having a clock cycle, and initializing each of the function units at the beginning of respectively corresponding processing cycles. The method further includes operating each function unit during a respectively corresponding processing cycle to carry out a task with respect to one of the instructions, in order to produce a result. Respective results are all evaluated in synchronization, by means of the master clock. This enables the instruction processing operation to be modeled using a sequential computer language, such as C or C++.
    Type: Grant
    Filed: October 14, 2004
    Date of Patent: December 2, 2008
    Assignee: International Business Machines Corporation
    Inventor: Oliver Keren Ban
  • Patent number: 7447876
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Grant
    Filed: April 18, 2005
    Date of Patent: November 4, 2008
    Assignee: Seiko Epson Corporation
    Inventors: Cheryl D. Senter, Johannes Wang
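The two conditions that block an out-of-order load in the abstract of patent 7447876 can be written as a small predicate. The function name and the use of `None` for a not-yet-calculated store address are hypothetical, and the exact-address comparison is a simplification; hardware would more likely compare address ranges.

```python
def can_load_out_of_order(load_addr, older_stores):
    """older_stores lists the addresses of older stores not yet performed.
    None stands for a store whose address has not been calculated yet."""
    for store_addr in older_stores:
        if store_addr is None:
            return False  # write pending
        if store_addr == load_addr:
            return False  # address collision
    return True
```

A single unresolved older store is enough to hold the load back in this model, because the store might still turn out to target the load's address.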
  • Publication number: 20080270749
    Abstract: A multi-threaded in-order superscalar processor 2 is described having a fetch stage 8 within which thread interleaving circuitry 36 interleaves instructions taken from different program threads to form an interleaved stream of instructions which is then decoded and subject to issue. Hint generation circuitry 62 within the fetch stage 8 adds hint data to the threads indicating that parallel issue of an associated instruction is permitted with one or more other instructions.
    Type: Application
    Filed: April 25, 2007
    Publication date: October 30, 2008
    Applicant: ARM Limited
    Inventors: Emre Ozer, Vladimir Vasekin, Stuart David Biles
  • Publication number: 20080263318
    Abstract: A processor has an interface portion and an interior environment. The interface portion comprises: at least one port arranged to receive a current time value; a first register associated with the port and arranged to store a trigger time value; and comparison logic configured to detect whether the current time value matches the trigger time value and, provided that said match is detected, to transfer data between the port and an external environment and alter a ready signal to indicate the transfer. The internal environment comprises: an execution unit for transferring data between the at least one port and the internal environment; and a thread scheduler for scheduling a plurality of threads for execution by the execution unit, each thread comprising a sequence of instructions. The scheduling includes scheduling one or more of said threads for execution in dependence on the ready signal.
    Type: Application
    Filed: April 17, 2007
    Publication date: October 23, 2008
    Inventors: Michael David May, Peter Hedinger, Alastair Dixon
  • Publication number: 20080244224
    Abstract: In one embodiment, the present invention includes an apparatus having an instruction selector to select an instruction, where the selector is to store a dependent indicator to indicate a direct dependent consumer instruction of a producer instruction, a decode logic coupled to the instruction selector to receive the dependent indicator when the producer instruction is selected and to generate a wakeup signal for the direct dependent consumer instruction, and wakeup logic to receive the wakeup signal and to indicate that the producer instruction has been selected. Other embodiments are described and claimed.
    Type: Application
    Filed: March 29, 2007
    Publication date: October 2, 2008
    Inventors: Peter Sassone, Jeff Rupley, Bryan Black
  • Publication number: 20080244223
    Abstract: According to one example embodiment of the inventive subject matter, the method and apparatus described herein is used to generate an optimized speculative version of a static piece of code. The portion of code is optimized in the sense that the number of instructions executed will be smaller. However, since the applied optimization is speculative, the optimized version can be incorrect and some mechanism to recover from that situation is required. Thus, the quality of the produced code will be measured by taking into account both the final length of the code as well as the frequency of misspeculation.
    Type: Application
    Filed: March 31, 2007
    Publication date: October 2, 2008
    Inventors: Carlos Garcia Quinones, Jesus Sanchez, Carlos Madriles, Pedro Marcuello, Antonio Gonzalez
  • Patent number: 7430651
    Abstract: A tag monitoring system for assigning tags to instructions. A source supplies instructions to be executed by a functional unit. A register file stores information required for the execution of each instruction. A queue having a plurality of slots containing tags which are used for tagging the instructions. The tags are arranged in the queue in an order specified by the program order of their corresponding instructions. A control unit monitors the completion of executed instructions and advances the tags in the queue upon completion of an executed instruction. The register file stores an instruction's information at a location in the register file defined by the tag assigned to that instruction. The register file also contains a plurality of read address enable ports and corresponding read output ports. Each of the slots from the queue is coupled to a corresponding one of the read address enable ports. Thus, the information for each instruction can be read out of the register file in program order.
    Type: Grant
    Filed: January 25, 2006
    Date of Patent: September 30, 2008
    Assignee: Seiko Epson Corporation
    Inventors: Kevin R. Iadonato, Trevor A. Deosaran, Sanjiv Garg
  • Publication number: 20080235491
    Abstract: A technique for reducing stack pointer adjustment operations when stack dependent operations, which correspond to stack dependent instructions, are encountered includes setting a stack pointer to an initial value for a stack. A number of bytes associated with the stack dependent operation is determined. A stack pointer delta is then modified based upon the number of bytes associated with the stack dependent operation. A current location in the stack is determined based on the stack pointer and the stack pointer delta.
    Type: Application
    Filed: March 22, 2007
    Publication date: September 25, 2008
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Christopher Svec, Faisal Syed, Michael E. Tuuk, Benjamin T. Sander, Gregory W. Smaus
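The stack pointer delta technique in the abstract of publication 20080235491 can be sketched as follows. The class name, the downward-growing stack, and the `sync` step that folds the delta back into the stack pointer are assumptions made for illustration; the abstract describes only the delta update and the current-location computation.

```python
class StackTracker:
    """Hypothetical model: track stack movement in a delta instead of
    adjusting the stack pointer on every stack dependent operation."""

    def __init__(self, stack_pointer):
        self.stack_pointer = stack_pointer  # initial value for the stack
        self.delta = 0

    def push(self, nbytes):
        self.delta -= nbytes  # assumes the stack grows downward

    def pop(self, nbytes):
        self.delta += nbytes

    def current(self):
        # Current location is derived from the pointer plus the delta.
        return self.stack_pointer + self.delta

    def sync(self):
        # Assumed step: apply the accumulated delta in one adjustment.
        self.stack_pointer += self.delta
        self.delta = 0
```

In this model a run of pushes and pops costs one stack pointer adjustment at `sync` rather than one per operation.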
  • Patent number: 7409670
    Abstract: Methods and apparatus are provided for implementing a programmable device including a processor core, a hardware accelerator, and secondary components such as memory. A portion of a program written in a high-level language is automatically selected for hardware acceleration. Dedicated ports are generated to allow the hardware accelerator to handle pointer referencing and dereferencing. A hardware accelerator is generated to perform pipelined processing of instructions. The number of stages implemented for pipelined processing is at least partially dependent on the latency associated with accessing secondary components.
    Type: Grant
    Filed: November 16, 2004
    Date of Patent: August 5, 2008
    Assignee: Altera Corporation
    Inventors: J. Orion Pritchard, Todd Wayne
  • Publication number: 20080172546
    Abstract: A digital signal processor is provided, comprising at least one cluster. The cluster may comprise at least two function units each conducting different instruction types, at least two private register files each associated with one function unit for data storage, a ping-pong register providing exclusively accessible data storage, and a public register file. The public register file comprises at least two read ports, each coupled to a function unit, providing read accessibility for the function units, and one write port to write data to the public register file.
    Type: Application
    Filed: February 26, 2007
    Publication date: July 17, 2008
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Chuan-Cheng Peng, Po-Han Huang
  • Patent number: 7401328
    Abstract: A data processing system includes a grouping tool coupled to a processor. The grouping tool groups the stream of instructions such that each group of instructions has a dimensionless signature annotated thereto. An instruction prefetch unit of the processor fetches the stream of grouped instructions from a memory in the processor and an instruction issue logic unit of the processor identifies boundaries between the groups of instructions by executing a signature detection algorithm. In one embodiment, the data processing system includes a pipelined superscalar processor core and is capable of concurrently executing multiple instructions in the same or different pipeline stages.
    Type: Grant
    Filed: December 18, 2003
    Date of Patent: July 15, 2008
    Assignee: LSI Corporation
    Inventor: John Lu
  • Patent number: 7398375
    Abstract: The present invention provides a dynamic scheduling scheme that uses reservation stations having at least one station that stores an at least two operand instruction. An allocator portion determines that the instruction, entering the pipeline, has one ready operand and one not-ready operand, and accordingly places it in a station having only one comparator. The one comparator then compares the not-ready operand with tags broadcasted on a result tag bus to determine when the not-ready operand becomes ready. Once ready, execution is requested to the corresponding functional unit.
    Type: Grant
    Filed: April 3, 2003
    Date of Patent: July 8, 2008
    Assignee: The Regents of the University of Michigan
    Inventors: Daniel J. Ernst, Todd M. Austin
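The allocation decision in the abstract of patent 7398375 can be illustrated with a minimal sketch. The function names and the `free_stations` dictionary are hypothetical. The abstract covers only the case of one ready and one not-ready operand going to a one-comparator station; the zero- and two-comparator stations and the fallback to a larger station are assumptions.

```python
def comparators_needed(operand_ready):
    """One tag comparator is needed per operand that is not yet ready."""
    return sum(1 for ready in operand_ready if not ready)


def allocate(operand_ready, free_stations):
    """free_stations maps comparator count to the number of free stations.
    Returns the comparator count of the chosen station, or None to stall."""
    need = comparators_needed(operand_ready)
    for count in sorted(free_stations):
        if count >= need and free_stations[count] > 0:
            free_stations[count] -= 1
            return count
    return None
```

Choosing the smallest sufficient station keeps the stations with more comparators available for instructions that are waiting on more operands.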
  • Patent number: 7392369
    Abstract: Embodiments include various methods, apparatuses, and systems in which a processor includes an out of order issue engine and an in-order execution pipeline. For some embodiments, the issue engine may be remote from the execution pipeline and execution resources may be many clock cycles away from the issue engine. The issue engine categorizes operations as at least one of either a speculative operation, which performs computations, or an architectural operation, which has potential to fault or cause an exception. Potentially excepting operations may be decomposed into two separate micro-operations: a speculative micro-operation, which is used to generate data results speculatively so that operations dependent on the results may be speculatively issued, and an architectural micro-operation, which signals the faulting condition for the excepting operation. A STORE operation becomes an architectural operation and all previous faulting conditions may be guaranteed to have evaluated before a STORE is issued.
    Type: Grant
    Filed: April 18, 2006
    Date of Patent: June 24, 2008
    Assignee: Intel Corporation
    Inventors: Jeffery J. Baxter, Gary N. Hammond, Nazar A. Zaidi
  • Patent number: 7376812
    Abstract: A processor can achieve high code density while allowing higher performance than existing architectures, particularly for Digital Signal Processing (DSP) applications. In accordance with one aspect, the processor supports three possible instruction sizes while maintaining the simplicity of programming and allowing efficient physical implementation. Most of the application code can be encoded using two sets of narrow size instructions to achieve high code density. Adding a third (and larger, i.e. VLIW) instruction size allows the architecture to encode multiple operations per instruction for the performance critical section of the code. Further, each operation of the VLIW format instruction can optionally be a SIMD operation that operates upon vector data. A scheme for the optimal utilization (highest achievable performance for the given amount of hardware) of multiply-accumulate (MAC) hardware is also provided.
    Type: Grant
    Filed: May 13, 2002
    Date of Patent: May 20, 2008
    Assignee: Tensilica, Inc.
    Inventors: Himanshu A. Sanghavi, Earl A. Killian, James Robert Kennedy, Darin S. Petkov, Peng Tu, William A. Huffman
  • Patent number: 7373481
    Abstract: A Distributed-Structure-based parallel module structure and parallel processing method. One object is to provide a novel sequence-net computer architecture. A parallel operating structure with N+1 independent flow-sequences is created, and the N+1 flow-sequences control independently the distributed token via the sequence-net instructions to realize the parallel operating of module. Wherein N flow-sequences is regular type, a new consistency flow-sequence Sc running independently is composed by consistency tokens. The distributed token connecting among multi-machines support the co-operation running among N+1 flow-sequences.
    Type: Grant
    Filed: July 18, 2001
    Date of Patent: May 13, 2008
    Inventor: Zhaochang Xu
  • Patent number: 7373485
    Abstract: A clustered superscalar processor for reducing the miss rate of a register cache and reducing the possibility of miss penalties. The processor checks before storing an instruction in an instruction window whether there is a data dependency relationship between the instruction that will be stored in the instruction window and a previous instruction stored in the instruction window. When there is a data dependency relationship, the execution result of the previous instruction of one cluster is communicated to a register cache of another cluster that executes the instruction having a data dependency relationship with the previous instruction.
    Type: Grant
    Filed: March 3, 2005
    Date of Patent: May 13, 2008
    Assignee: National University Corporation Nagoya University
    Inventors: Hideki Ando, Hajime Shimada, Atsushi Mochizuki
  • Publication number: 20080082788
    Abstract: A method and apparatus for improving the operation of an out-of-order computer processor by utilizing and managing instruction wakeup using pointers with an instruction queue payload random-access memory, a mapping table, and a multiple wake-up table. Instructions allocated to the instruction queue are identified by association with a physical destination register used to index in the mapping table to provide dependent instruction information for instruction wakeup for scalable instruction queue design, reduced power consumption, and fast branch mis-prediction recovery, without the use of content-addressable memory cells.
    Type: Application
    Filed: October 2, 2006
    Publication date: April 3, 2008
    Applicants: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA, UNIVERSITAT POLITECNICA DE CATALUNYA
    Inventors: ALEXANDER V. VEIDENBAUM, MARCO ANTONIO RAMIREZ SALINAS, ADRIAN CRISTAL KESTELMAN, MATEO VALERO CORTES
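
The pointer-based wakeup idea in the abstract above can be illustrated with a minimal sketch. It is not the patented design: the class name `PointerWakeup`, its methods, and the use of Python dictionaries in place of the mapping table and wake-up table are assumptions made for illustration only. The point it shows is that a completing instruction follows pointers from its physical destination register to its dependents, so no associative (CAM-style) search of the queue is needed.

```python
class PointerWakeup:
    """Toy model of pointer-based wakeup (hypothetical names throughout)."""

    def __init__(self):
        # Physical register -> queue indices of instructions waiting on it.
        self.dependents = {}
        # Queue index -> number of source operands still outstanding.
        self.pending = {}

    def allocate(self, queue_index, unready_sources):
        """Record an instruction and the physical registers it still waits on."""
        self.pending[queue_index] = len(unready_sources)
        for reg in unready_sources:
            self.dependents.setdefault(reg, []).append(queue_index)

    def complete(self, dest_reg):
        """A producer wrote dest_reg: return the queue indices now ready."""
        ready = []
        for queue_index in self.dependents.pop(dest_reg, []):
            self.pending[queue_index] -= 1
            if self.pending[queue_index] == 0:
                ready.append(queue_index)
        return ready


wakeup = PointerWakeup()
wakeup.allocate(0, ["p5"])         # entry 0 waits on p5 only
wakeup.allocate(1, ["p5", "p6"])   # entry 1 waits on p5 and p6
first = wakeup.complete("p5")      # wakes entry 0
second = wakeup.complete("p6")     # wakes entry 1
```

In this sketch an instruction allocated with no unready sources is never returned by `complete`; the caller is assumed to issue it immediately.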
  • Patent number: 7343473
    Abstract: A system and method for extracting complex, variable length computer instructions from a stream of complex instructions each subdivided into a variable number of instruction bytes, and aligning instruction bytes of individual ones of the complex instructions. The system receives a portion of the stream of complex instructions and extracts a first set of instruction bytes starting with the first instruction bytes, using an extract shifter. The set of instruction bytes is then passed to an align latch where the bytes are aligned and output to a next instruction detector. The next instruction detector determines the end of the first instruction based on said set of instruction bytes. An extract shifter is used to extract and provide the next set of instruction bytes to an align shifter which aligns and outputs the next instruction. The process is then repeated for the remaining instruction bytes in the stream of complex instructions.
    Type: Grant
    Filed: June 28, 2005
    Date of Patent: March 11, 2008
    Assignee: Transmeta Corporation
    Inventors: Brett Coon, Yoshiyuki Miyayama, Le Trong Nguyen, Johannes Wang
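
The extract, detect-end, repeat loop described in the abstract above can be sketched in software. This is only an illustration under stated assumptions: the function names are hypothetical, and the toy encoding (instruction length is the low two bits of the first byte, plus one) is invented here and is not the instruction format the patent addresses. The hardware shifters and latches are not modeled.

```python
def toy_length(first_byte):
    """Hypothetical encoding: the low two bits of the first byte give length - 1."""
    return (first_byte & 0x03) + 1


def split_instructions(stream, length_of=toy_length):
    """Split a byte stream into variable-length instructions, one at a time."""
    instructions = []
    position = 0
    while position < len(stream):
        # "Next instruction detector": find where the current instruction ends.
        length = length_of(stream[position])
        end = position + length
        if end > len(stream):
            raise ValueError("stream ends in the middle of an instruction")
        # "Extract and align": slice the bytes so the instruction starts at index 0.
        instructions.append(stream[position:end])
        position = end
    return instructions


decoded = split_instructions(bytes([0x00, 0x01, 0xAA, 0x02, 0xBB, 0xCC]))
```

Each iteration depends on the length found in the previous one, which is the serial dependence that makes variable-length extraction awkward in a wide fetch unit.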
  • Patent number: 7330963
    Abstract: Embodiments include various methods, apparatuses, and systems in which a processor includes an out of order issue engine and an in-order execution pipeline. For some embodiments, the issue engine may be remote from the execution pipeline and execution resources may be many clock cycles away from the issue engine. The issue engine categorizes operations as either a speculative operation, which performs computations, or an architectural operation, which has potential to fault or cause an exception. Potentially excepting operations may be decomposed into two separate micro-operations: a speculative micro-operation, which is used to generate data results speculatively so that operations dependent on the results may be speculatively issued, and an architectural micro-operation, which signals the faulting condition for the excepting operation. A STORE operation becomes an architectural operation and all previous faulting conditions may be guaranteed to have evaluated before a STORE is issued.
    Type: Grant
    Filed: April 18, 2006
    Date of Patent: February 12, 2008
    Assignee: Intel Corporation
    Inventors: Jeffery J. Baxter, Gary N. Hammond, Nazar A. Zaidi
  • Patent number: 7318145
    Abstract: A random slip generator is provided to lessen side channel leakage and thus thwart cryptanalysis attacks, such as timing attacks and power analysis attacks. Random slip generation may be configurable so that the average frequency of random slips generated by the system may be set. Additional techniques are provided to make nullified instructions consume power like any other executing instruction.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: January 8, 2008
    Assignee: MIPS Technologies, Inc.
    Inventors: Morten Stribaek, Jakob Schou Jensen, Jean-Francois Dhem
  • Publication number: 20080005533
    Abstract: A method for reducing the number of load instructions in the load reorder queue (LRQ) that are searched when a load instruction is executed by a processor, including dispatching the load instructions; inserting the load instructions in the LRQ in program order; clearing a load received data field; executing the load instructions; checking load reorder queue (LRQ) entries; re-executing the load instruction of the matching LRQ entry; continuing execution; getting the load data; setting the load received data field; comparing a load sequence number (LSQN) of each load instruction to the contents of a snoop_safe register; ANDing all the load received data bits if the LSQN is greater in magnitude than the snoop_safe register contents; setting the snoop_safe register to the LSQN of the load instruction; searching the LRQ entry; and setting a load_peril_snoop register to the LRQ index value where the first load instruction younger than the snoop_safe value was found.
    Type: Application
    Filed: June 30, 2006
    Publication date: January 3, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Erik R. Altman, Vijayalakshmi Srinivasan
  • Patent number: 7308559
    Abstract: A digital signal processor (DSP) includes dual SIMD units that are connected in cascade, and wherein results of a first SIMD stage of the cascade may be stored in a register file of a second SIMD stage in the cascade. Each SIMD stage contains its own resources for storing operands and intermediate results (e.g., its own register file), as well as for decoding the operations that may be executed in that stage. Within each stage, hardware resources are organized to operate in SIMD manner, so that independent SIMD operations can be executed simultaneously, one in each stage of the cascade. Intermediate operands and results flowing through the cascade are stored at the register files of the stages, and may be accessed from those register files. Data may also be brought from memory directly into the register files of the stages in the cascade.
    Type: Grant
    Filed: June 7, 2003
    Date of Patent: December 11, 2007
    Assignee: International Business Machines Corporation
    Inventors: Clair John Glossner, III, Erdem Hokenek, David Meltzer, Mayan Moudgill
  • Patent number: 7281119
    Abstract: A computer system supplies instructions simultaneously to a plurality of parallel execution pipelines in either superscalar mode or very long instruction word mode with checks for vertical and horizontal dependency between instructions, the horizontal dependency checks between instructions supplied in the same machine cycle being effective in superscalar mode but disabled in very long instruction word mode.
    Type: Grant
    Filed: May 2, 2000
    Date of Patent: October 9, 2007
    Assignee: STMicroelectronics S.A.
    Inventors: Andrew Cofler, Bruno Fel, Laurent Ducousso
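
As a rough illustration of the horizontal dependency check described in the abstract above, the sketch below looks for read-after-write dependencies between instructions issued in the same cycle and skips the check in VLIW mode. The function name, the `(dest, sources)` tuple representation, and the mode strings are assumptions for this example; vertical (cross-cycle) checks and the actual pipeline control signals are not modeled.

```python
def horizontal_hazards(bundle, mode):
    """Return (producer, consumer, register) triples within one issue bundle.

    bundle: list of (dest, sources) tuples in program order; dest may be None.
    mode:   "superscalar" enables the check, "vliw" disables it (hypothetical names).
    """
    if mode == "vliw":
        # In VLIW mode the compiler is assumed to have placed only
        # independent instructions in the same machine cycle.
        return []
    hazards = []
    for consumer, (_, sources) in enumerate(bundle):
        for producer in range(consumer):
            dest = bundle[producer][0]
            if dest is not None and dest in sources:
                hazards.append((producer, consumer, dest))
    return hazards


bundle = [("r1", ("r2", "r3")), ("r4", ("r1", "r5"))]
superscalar_result = horizontal_hazards(bundle, "superscalar")
vliw_result = horizontal_hazards(bundle, "vliw")
```

Here the second instruction reads `r1`, which the first instruction writes in the same cycle, so superscalar mode reports one hazard while VLIW mode reports none.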
  • Patent number: 7254693
    Abstract: A method, apparatus, and computer program product are disclosed for selectively prohibiting speculative conditional branch execution. A particular type of conditional branch instruction is selected. An indication is stored within each instruction that is the particular type of conditional branch instruction. A processor then fetches a first instruction from code that is to be executed. A determination is made regarding whether the first instruction includes the indication. In response to determining that the instruction includes the indication: speculative execution of the first instruction is prohibited, an actual location to which the first instruction will branch is resolved, and execution of the code is branched to the actual location. In response to determining that the instruction does not include the indication, the first instruction is speculatively executed.
    Type: Grant
    Filed: December 2, 2004
    Date of Patent: August 7, 2007
    Assignee: International Business Machines Corporation
    Inventors: Lee Evan Eisen, Francis Patrick O'Connell
  • Patent number: 7249243
    Abstract: Techniques for control word prediction and speculative execution. In one embodiment, an apparatus includes a control word predictor, execution resources, and a comparison module. The control word predictor of this embodiment predicts a predicted control word for execution of operations in response to a control word changing operation. The execution resources of this embodiment speculatively execute the plurality of operations utilizing the predicted control word, and the comparison module determines if the predicted control word matches an actual control word set by the control word changing operation or a plurality of other control words, and causes re-execution of said plurality of operations if said actual control word matches any of the plurality of other control words.
    Type: Grant
    Filed: August 6, 2003
    Date of Patent: July 24, 2007
    Assignee: Intel Corporation
    Inventors: Mohammad A. Abadallah, Mitchell Diamond, David B. Jackson, Kip A. Baumann, Ki W. Yoon, Rafi M. Saied, Robert L. Farrell
  • Patent number: 7225320
    Abstract: A multi-processor unit includes a first domain for processing data according to first configuration information and having multiple first domain processors each connected to communication apparatus and each performing a different function of the first processing. The first domain processors include a first domain control processor for controlling the first processing of the first domain. The multi-processor unit also includes a second domain for second processing of the first processed data depending on a second domain configuration and having multiple second domain processors each connected to the communication apparatus and each performing a different function of the second processing. The second domain processors include a second domain control processor for controlling the second processing of the second domain.
    Type: Grant
    Filed: December 28, 2000
    Date of Patent: May 29, 2007
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Geoffrey Francis Burns
  • Patent number: 7210024
    Abstract: Hazard detection is simplified by converting a conditional instruction, operative to perform an operation if a condition is satisfied, into an emissary instruction operative to evaluate the condition and an unconditional base instruction operative to perform the operation. The emissary instruction is executed, while the base instruction is halted. The emissary instruction evaluates the condition and reports the condition evaluation back to the base instruction. Based on the condition evaluation, the base instruction is either launched into the pipeline for execution, or it is discarded (or a NOP, or null instruction, substituted for it). In either case, the dependencies of following instructions may be resolved.
    Type: Grant
    Filed: February 10, 2005
    Date of Patent: April 24, 2007
    Assignee: Qualcomm Incorporated
    Inventors: Michael Scott McIlvaine, James Norris Dieffenderfer, Jeffrey Todd Bridges, Thomas Andrew Sartorius, Rodney Wayne Smith
  • Patent number: 7200737
    Abstract: A processor is provided that includes an execution unit for executing instructions and a replay system for replaying instructions which have not executed properly. The replay system is coupled to the execution unit and includes a checker for determining whether each instruction has executed properly and a replay queue coupled to the checker for temporarily storing one or more instructions for replay. The replay queue may be used to store a long latency instruction, such as a load in which data must be retrieved from an external memory device. The long latency instruction and possibly one or more dependent instructions are stored in the replay queue until the long latency instruction is ready to be executed (e.g., data for the load instruction has been retrieved from external memory). Once the long latency instruction is ready to be executed (e.g., the data is available), the long latency instruction may then be unloaded from the replay queue for re-execution.
    Type: Grant
    Filed: December 29, 1999
    Date of Patent: April 3, 2007
    Assignee: Intel Corporation
    Inventors: Amit A. Merchant, Darrell D. Boggs, David J. Sager
  • Patent number: 7171541
    Abstract: A register renaming system for a processor based on superscalar architecture that can process a larger number of instructions per cycle by providing a free list to hold unallocated physical-register numbers and a mapping table whose entries are provided in respective correspondence with the logical registers and each designed to hold a physical-register number, and by pipelining where dependency checks among instructions are to be done as a pre-process.
    Type: Grant
    Filed: September 6, 2000
    Date of Patent: January 30, 2007
    Inventor: Hajime Seki
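
The two structures named in the abstract above, a free list of unallocated physical-register numbers and a mapping table with one entry per logical register, can be sketched as follows. This is a generic textbook-style model rather than the patented system: the class and method names are hypothetical, the initial identity mapping is an assumption, and the pipelined dependency pre-check the abstract mentions is not modeled.

```python
from collections import deque


class RenameTable:
    """Toy register renamer with a free list and a mapping table."""

    def __init__(self, num_logical, num_physical):
        if num_physical < num_logical:
            raise ValueError("need at least one physical register per logical one")
        # Assumed starting state: logical register i maps to physical register i.
        self.mapping = list(range(num_logical))
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, dest, sources):
        """Rename one instruction; returns (new_dest, renamed_sources, old_dest)."""
        # Sources are looked up first so an instruction that reads and writes
        # the same logical register sees the previous value.
        renamed_sources = [self.mapping[s] for s in sources]
        if not self.free_list:
            raise RuntimeError("no free physical register; the front end would stall")
        new_dest = self.free_list.popleft()
        old_dest = self.mapping[dest]
        self.mapping[dest] = new_dest
        return new_dest, renamed_sources, old_dest

    def release(self, physical):
        """Return a physical register to the free list (e.g. at retirement)."""
        self.free_list.append(physical)


table = RenameTable(num_logical=4, num_physical=8)
first = table.rename(dest=1, sources=[1, 2])   # r1 = r1 op r2
second = table.rename(dest=2, sources=[1])     # r2 = f(r1), reads the renamed r1
```

The second instruction's source resolves to the physical register just allocated for `r1`, which is how renaming preserves the true dependency while removing false ones.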
  • Patent number: 7162610
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Grant
    Filed: September 12, 2003
    Date of Patent: January 9, 2007
    Assignee: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H Trang
  • Patent number: 7143401
    Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.
    Type: Grant
    Filed: February 20, 2001
    Date of Patent: November 28, 2006
    Assignee: Elbrus International
    Inventors: Boris A. Babaian, Yuli Kh. Sakhin, Vladimir Yu. Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
  • Patent number: 7136989
    Abstract: A parallel computation processor being capable of high-speed loop operation. When instruction decoders decode the VLOOP instruction, which triggers loop operation, an instruction buffer starts storing normal instructions. The instruction buffer dispatches a VLIW instruction composed of n pieces of normal instructions to execution units each time n pieces of instructions are stored therein. The execution units concurrently execute the instructions. After all instructions comprised in a loop have been stored in the buffer and once dispatched as VLIW instructions to be executed, the loop is executed repeatedly.
    Type: Grant
    Filed: September 26, 2002
    Date of Patent: November 14, 2006
    Assignee: NEC Corporation
    Inventor: Daiji Ishii
  • Patent number: 7111152
    Abstract: Instructions in a computer system are executed in a plurality of parallel execution pipelines, a horizontal dependency check is carried out between instructions supplied to the parallel pipelines and in response to detecting horizontal dependency a control signal of a first or second type is generated depending on whether the dependency can be resolved by activating a by-pass or whether a temporary stall is required in one of the pipelines.
    Type: Grant
    Filed: May 2, 2000
    Date of Patent: September 19, 2006
    Assignee: STMicroelectronics S.A.
    Inventors: Andrew Cofler, Bruno Fel, Laurent Ducousso
  • Patent number: 7096347
    Abstract: The instruction pipeline of a processor, which includes execution circuitry and instruction sequencing logic, receives a stream of instructions including a pipeline interlocking test instruction. The processor includes pipeline control logic that, responsive to receipt of the test instruction, interlocks the instruction pipeline as specified in the test instruction to prevent advancement of at least one first instruction in the instruction pipeline while permitting advancement of at least one second instruction in the instruction pipeline until occurrence of a release condition also specified by the test instruction. In response to the release condition, the pipeline control logic releases the interlock to enable advancement of said at least one instruction in the instruction pipeline.
    Type: Grant
    Filed: October 25, 2001
    Date of Patent: August 22, 2006
    Inventor: Charles R. Moore
  • Patent number: 7089404
    Abstract: Apparatus and a method for causing scheduler software to produce code which executes more rapidly by ignoring some of the normal constraints placed on its scheduling operations and simply scheduling certain instructions to run as fast as possible, raising an exception if the scheduling violates a scheduling constraint, and determining steps to be taken for correctly executing each set of instructions about which an exception is raised.
    Type: Grant
    Filed: June 14, 1999
    Date of Patent: August 8, 2006
    Assignee: Transmeta Corporation
    Inventors: Guillermo J. Rozas, Godfrey P. D'Souza, Charles R. Price, Paul S. Serris