Superscalar Patents (Class 712/23)

Multilevel scheme for dynamically and statically predicting instruction resource utilization to generate execution cluster partitions

Patent number: 7562206

Abstract: Microarchitecture policies and structures to predict execution clusters and facilitate inter-cluster communication are disclosed. In disclosed embodiments, sequentially ordered instructions are decoded into micro-operations. Execution of one set of micro-operations is predicted to involve execution resources to perform memory access operations and inter-cluster communication, but not to perform branching operations. Execution of a second set of micro-operations is predicted to involve execution resources to perform branching operations but not to perform memory access operations. The micro-operations are partitioned for execution in accordance with these predictions, the first set of micro-operations to a first cluster of execution resources and the second set of micro-operations to a second cluster of execution resources. The first and second sets of micro-operations are executed out of sequential order and are retired to represent their sequential instruction ordering.

Type: Grant

Filed: December 30, 2005

Date of Patent: July 14, 2009

Assignee: Intel Corporation

Inventors: Avinash Sodani, Alexandre J. Farcy, Stephan J. Jourdan, Per Hammarlund, Mark C. Davis
PROCESSING PIPELINE HAVING PARALLEL DISPATCH AND METHOD THEREOF

Publication number: 20090172359

Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.

Type: Application

Filed: December 31, 2007

Publication date: July 2, 2009

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Gene Shen, Sean Lie
RISC microprocessor architecture implementing multiple typed register sets

Patent number: 7555631

Abstract: A register system for a data processor which operates in a plurality of modes. The register system provides multiple, identical banks of register sets, the data processor controlling access such that instructions and processes need not specify any given bank. An integer register set includes first (RA[23:0]) and second (RA[31:24]) subsets, and a shadow subset (RT[31:24]). While the data processor is in a first mode, instructions access the first and second subsets. While the data processor is in a second mode, instructions may access the first subset, but any attempts to access the second subset are re-routed to the shadow subset instead, transparently to the instructions, allowing system routines to seemingly use the second subset without having to save and restore data which user routines have written to the second subset. A re-typable register set provides integer width data and floating point width data in response to integer instructions and floating point instructions, respectively.
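
As a rough illustration of the re-routing this abstract describes, the sketch below models the integer register set in Python. The class name, the `second_mode` flag, and the list-based storage are assumptions made for illustration; only the RA[23:0], RA[31:24], and RT[31:24] ranges come from the abstract.

```python
class IntegerRegisterSet:
    """Toy model of RA[31:0] plus the shadow subset RT[31:24]."""

    SHADOW_BASE = 24  # RA[31:24] is the subset that gets shadowed

    def __init__(self):
        self.ra = [0] * 32  # RA[23:0] and RA[31:24]
        self.rt = [0] * 8   # RT[31:24], used only in the second mode

    def _locate(self, index, second_mode):
        # In the second mode, accesses to RA[31:24] are silently redirected.
        if second_mode and index >= self.SHADOW_BASE:
            return self.rt, index - self.SHADOW_BASE
        return self.ra, index

    def read(self, index, second_mode=False):
        bank, slot = self._locate(index, second_mode)
        return bank[slot]

    def write(self, index, value, second_mode=False):
        bank, slot = self._locate(index, second_mode)
        bank[slot] = value


regs = IntegerRegisterSet()
regs.write(30, 111)                    # user routine writes RA[30]
regs.write(30, 222, second_mode=True)  # system routine lands in RT[30]
```

Because the second-mode write goes to the shadow subset, the user value in RA[30] survives without an explicit save and restore, which is the benefit the abstract claims.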

Type: Grant

Filed: January 31, 2002

Date of Patent: June 30, 2009

Inventors: Sanjiv Garg, Derek J. Lentz, Le Trong Nguyen, Sho Long Chen
High-performance superscalar-based computer system with out-of-order instruction execution and concurrent results distribution

Patent number: 7555632

Abstract: The high-performance, RISC core based microprocessor architecture includes an instruction fetch unit for fetching instruction sets from an instruction store and an execution unit that implements the concurrent execution of a plurality of instructions through a parallel array of functional units. The fetch unit generally maintains a predetermined number of instructions in an instruction buffer. The execution unit includes an instruction selection unit, coupled to the instruction buffer, for selecting instructions for execution, and a plurality of functional units for performing instruction specified functional operations. A unified instruction scheduler, within the instruction selection unit, initiates the processing of instructions through the functional units when instructions are determined to be available for execution and for which at least one of the functional units implementing a necessary computational function is available.

Type: Grant

Filed: December 27, 2005

Date of Patent: June 30, 2009

Assignee: Seiko Epson Corporation

Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
ELECTRONIC SYSTEM FOR CHANGING NUMBER OF PIPELINE STAGES OF A PIPELINE

Publication number: 20090138674

Abstract: An electronic system includes a pipeline having a first number of pipeline stages coupled in series, a pipeline control unit, and a logic engine, wherein each pipeline stage in the pipeline is for outputting data to a next pipeline stage at each cycle of a clock signal. The pipeline control unit is for changing the first number of pipeline stages in the pipeline to a second number of pipeline stages. The logic engine is for performing operations of the electronic system in a first mode by utilizing the pipeline having the first number of pipeline stages and for performing operations of the electronic system in a second mode by utilizing the pipeline having the second number of pipeline stages. A frequency control unit and a voltage control unit, coupled to the pipeline and the logic engine, respectively adjust the frequency and voltage of the electronic system accordingly.

Type: Application

Filed: November 22, 2007

Publication date: May 28, 2009

Inventors: Li-Hung Chang, Hong-Men Su
Multithreaded processor including a functional unit shared between multiple requestors and arbitration therefor

Patent number: 7533248

Abstract: A multithreaded processor including a shared functional unit. In one embodiment, the multithreaded processor includes a functional unit coupled to a multithreaded instruction source that may request access to use the functional unit. The multithreaded processor may also include a processing unit that is coupled to request access to use the functional unit. The functional unit may be configured to execute one of an instruction provided by the multithreaded instruction source and an operation provided by the processing unit in a given cycle dependent upon which of the multithreaded instruction source and the processing unit has a higher priority.

Type: Grant

Filed: June 30, 2004

Date of Patent: May 12, 2009

Assignee: Sun Microsystems, Inc.

Inventors: Robert T. Golla, Gregory F. Grohoski
Selective power-down for high performance CPU/system

Patent number: 7506185

Abstract: A microelectronic device according to the present invention is made up of two or more functional units, all of which are disposed on a single chip, or die. The present invention works on the strategy that not all of the functional units on the die are, or need to be, operational at a given time in the execution of a computer program that is controlling the microelectronic device. The present invention therefore turns the functional units of the microelectronic device on and off very rapidly (typically within half a clock cycle) in accordance with the requirements of the program being executed. This power-down can be achieved by one of three techniques: turning off clock inputs to the functional units, interrupting the supply of power to the functional units, or deactivating input signals to the functional units.

Type: Grant

Filed: June 6, 2006

Date of Patent: March 17, 2009

Assignee: Seiko Epson Corporation

Inventor: Chong Ming Lin
Synchronizing master processor by stalling when tracking of coprocessor rename register resource usage count for sent instructions reaches credited apportioned number

Patent number: 7490225

Abstract: Synchronized register renaming between a master processor and a coprocessor that receives operations from the master enables efficient implementation of register renaming and operation execution in the processors. An ideal and an external register allocation map are implemented in the coprocessor. When registers are no longer allocated according to the ideal allocation map and the registers are currently allocated according to the external allocation map, the registers are deallocated in the external map and the number of freed registers is reported to the master. The master increments a free register credit count accordingly, and decrements the credit count by one for each operation issued to the coprocessor. An operation is not issued to the coprocessor unless at least a register is free according to the credit count. The master also throttles coprocessor operation issue based on a credit count corresponding to free scheduler entries available in the coprocessor.
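
The credit scheme in this abstract can be sketched as a pair of counters on the master side. This is a minimal Python sketch, not the patented design: the class and method names are hypothetical, and it assumes each issued operation consumes exactly one rename register credit and one scheduler entry credit.

```python
class MasterCredits:
    """Master-side view of free coprocessor resources (illustrative names)."""

    def __init__(self, register_credits, scheduler_credits):
        self.register_credits = register_credits
        self.scheduler_credits = scheduler_credits

    def can_issue(self):
        # Issue only when at least one register and one scheduler entry are free.
        return self.register_credits >= 1 and self.scheduler_credits >= 1

    def issue(self):
        """Issue one operation; return False (stall) when credits run out."""
        if not self.can_issue():
            return False
        self.register_credits -= 1
        self.scheduler_credits -= 1
        return True

    def report_freed(self, registers=0, scheduler_entries=0):
        # The coprocessor reports how many resources it has deallocated.
        self.register_credits += registers
        self.scheduler_credits += scheduler_entries


master = MasterCredits(register_credits=2, scheduler_credits=4)
results = [master.issue() for _ in range(3)]  # the third attempt stalls
master.report_freed(registers=1)
resumed = master.issue()
```

The master never needs to see the coprocessor's allocation maps; it stalls purely on its local counts and resumes when freed registers are reported back.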

Type: Grant

Filed: October 31, 2006

Date of Patent: February 10, 2009

Assignee: Sun Microsystems, Inc.

Inventors: John Gregory Favor, Christopher P. Nelson
Method and Apparatus for Length Decoding and Identifying Boundaries of Variable Length Instructions

Publication number: 20090019257

Abstract: A mechanism for superscalar decode of variable length instructions. A length decode unit may obtain a plurality of instruction bytes based on a scan window of a predetermined size. The instruction bytes may be associated with a plurality of variable length instructions, which are scheduled to be executed by a processing unit. The length decode unit may, for each instruction byte, estimate the start of a next variable length instruction following a current variable length instruction, and store a first pointer. A pre-pick unit may, for each instruction byte, use the first pointer to estimate the start of a subsequent variable length instruction following the next variable length instruction within the scan window, and store a second pointer. A pick unit may use a start pointer and related first and second pointers to determine the actual start of the variable length instructions within the scan window, and generate instruction pointers.
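
The three-step pointer scheme above (length decode, pre-pick, pick) can be sketched in Python as follows. The function names and the toy encoding, in which the low two bits of an instruction's first byte give its length minus one, are assumptions for illustration; the abstract does not specify an encoding.

```python
def toy_length(first_byte):
    """Hypothetical encoding: low two bits of the first byte hold length - 1."""
    return (first_byte & 0x3) + 1


def length_decode(window, length_of):
    # first[i]: where the next instruction would start if one started at byte i.
    return [i + length_of(window[i]) for i in range(len(window))]


def pre_pick(first):
    # second[i]: start of the instruction after next, or None if outside the window.
    n = len(first)
    return [first[first[i]] if first[i] < n else None for i in range(n)]


def pick(start, first, second):
    """Follow the speculative pointers from the real start to find actual starts."""
    n = len(first)
    starts = []
    pos = start
    while pos is not None and pos < n:
        starts.append(pos)
        if first[pos] < n:
            starts.append(first[pos])
        pos = second[pos]  # skip two instructions per step
    return starts


window = bytes([0x01, 0xAA, 0x00, 0x02, 0xBB, 0xCC, 0x00, 0x03])
first = length_decode(window, toy_length)
second = pre_pick(first)
starts = pick(0, first, second)
```

The first two steps run for every byte position without knowing which positions are real instruction starts; only `pick` needs the true start pointer, and it advances two instructions per step.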

Type: Application

Filed: July 10, 2007

Publication date: January 15, 2009

Inventors: Gene W. Shen, Sean Lie
Age matrix for queue dispatch order

Publication number: 20080320274

Abstract: An apparatus for queue allocation. An embodiment of the apparatus includes a dispatch order data structure, a bit vector, and a queue controller. The dispatch order data structure corresponds to a queue. The dispatch order data structure stores a plurality of dispatch indicators associated with a plurality of pairs of entries of the queue to indicate a write order of the entries in the queue. The bit vector stores a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure. The queue controller interfaces with the queue and the dispatch order data structure. The queue controller excludes at least some of the entries from a queue operation based on the mask values of the bit vector.
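
One way to picture the dispatch order structure is as a square bit matrix with one indicator per pair of entries, plus a mask vector. The Python sketch below is an assumption-laden illustration: the class name, the `older[i][j]` convention, and the use of a validity vector combined with an optional caller-supplied mask are not taken from the application.

```python
class AgeMatrix:
    """Pairwise write-order indicators for a small queue (illustrative)."""

    def __init__(self, size):
        self.size = size
        # older[i][j] is True when entry i was written before entry j.
        self.older = [[False] * size for _ in range(size)]
        self.valid = [False] * size  # occupied entries

    def write(self, i):
        # A newly written entry is younger than every entry already present.
        for j in range(self.size):
            if j != i:
                self.older[i][j] = False
                self.older[j][i] = self.valid[j]
        self.valid[i] = True

    def remove(self, i):
        self.valid[i] = False

    def oldest(self, mask=None):
        """Return the oldest entry among those not excluded by the mask."""
        allowed = [self.valid[i] and (mask is None or mask[i])
                   for i in range(self.size)]
        for i in range(self.size):
            if allowed[i] and not any(allowed[j] and self.older[j][i]
                                      for j in range(self.size)):
                return i
        return None


queue = AgeMatrix(4)
for slot in (2, 0, 3):  # entries are written in this order
    queue.write(slot)
```

Calling `queue.oldest()` selects slot 2 here, while passing a mask that clears slot 2 excludes it from the operation and selects slot 0 instead.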

Type: Application

Filed: June 19, 2007

Publication date: December 25, 2008

Applicant: Raza Microelectronics, Inc.

Inventors: Gaurav Singh, Srivatsan Srinivasan, Lintsung Wong
METHOD AND APPARATUS FOR SPATIAL REGISTER PARTITIONING WITH A MULTI-BIT CELL REGISTER FILE

Publication number: 20080313424

Abstract: There is provided a multi-bit storage cell for a register file. The storage cell includes a first set of storage elements for a vector slice. Each storage element respectively corresponds to a particular one of a plurality of thread sets for the vector slice. The storage cell includes a second set of storage elements for a scalar slice. Each storage element in the second set respectively corresponds to a particular one of at least one thread set for the scalar slice. The storage cell includes at least one selection circuit for selecting, for an instruction issued by a thread, a particular one of the storage elements from any of the first set and the second set based upon the instruction being a vector instruction or a scalar instruction and based upon a corresponding set from among the pluralities of thread sets to which the thread belongs.

Type: Application

Filed: June 13, 2007

Publication date: December 18, 2008

Inventor: MICHAEL GSCHWIND
Enhanced Load Lookahead Prefetch in Single Threaded Mode for a Simultaneous Multithreaded Microprocessor

Publication number: 20080313425

Abstract: A method, system, and computer program product are provided for enhancing the execution of independent loads in a processing unit. A processing unit detects if a long-latency miss associated with a load instruction has been encountered. Responsive to a long-latency miss, the processing unit enters a load lookahead mode. Responsive to entering the load lookahead mode, the processing unit dispatches each instruction from a first set of instructions from a first buffer with an associated vector. The processing unit determines if the first set of instructions from the first buffer have completed execution. Responsive to completed execution of the first set of instructions from the first buffer, the processing unit copies the set of vectors from a first vector array to a second vector array. Then the processing unit dispatches a second set of instructions from a second buffer with an associated vector from the second vector array.

Type: Application

Filed: June 15, 2007

Publication date: December 18, 2008

Inventors: Hung Q. Le, Dung Q. Nguyen
Interrupt and exception handling for multi-streaming digital processors

Patent number: 7467385

Abstract: A multi-streaming processor has a plurality of streams for streaming one or more instruction threads, a set of functional resources for processing instructions from streams, and interrupt handler logic. The logic detects and maps interrupts and exceptions to one or more specific streams. In some embodiments one interrupt or exception may be mapped to two or more streams, and in others two or more interrupts or exceptions may be mapped to one stream. Mapping may be static and determined at processor design; programmable, with data stored and amendable; or conditional and dynamic, with the interrupt logic executing an algorithm sensitive to variables to determine the mapping. Interrupts may be external interrupts generated by devices external to the processor, software (internal) interrupts generated by active streams, or conditional, based on variables. After interrupts are acknowledged, streams to which interrupts or exceptions are mapped are vectored to appropriate service routines.

Type: Grant

Filed: March 21, 2006

Date of Patent: December 16, 2008

Assignee: MIPS Technologies, Inc.

Inventors: Adolfo M Nemirovsky, Mario D Nemirovsky, Narendra Sankar
Executing partial-width packed data instructions

Patent number: 7467286

Abstract: A method and apparatus are provided for executing packed data instructions. According to one aspect of the invention, a processor includes registers, a register renaming unit coupled to the registers, a decoder coupled to the register renaming unit, and a partial-width execution unit coupled to the decoder. The register renaming unit provides an architectural register file to store packed data operands that include data elements. The decoder is to decode a first and second set of instructions that each specify one or more registers in the architectural register file. Each of the instructions in the first set specifies operations to be performed on all of the data elements. In contrast, each of the instructions in the second set specifies operations to be performed on only a subset of the data elements. The partial-width execution unit is to execute operations specified by either the first or second set of instructions.
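
The difference between the two instruction sets can be shown with a small Python sketch. The function names are hypothetical, and passing the untouched elements of the first operand through unchanged is an assumption; the abstract only says that the second set operates on a subset of the data elements.

```python
def full_width_add(a, b):
    """First instruction set: operate on every data element."""
    return [x + y for x, y in zip(a, b)]


def partial_width_add(a, b, active=1):
    """Second instruction set: operate on the lowest `active` elements only.

    Elements beyond `active` are copied from `a` (an assumption).
    """
    return [x + y if i < active else x
            for i, (x, y) in enumerate(zip(a, b))]


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
full = full_width_add(a, b)
partial = partial_width_add(a, b)
```

Both functions share the same element-wise adder, which mirrors the idea that one execution unit serves instructions from either set.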

Type: Grant

Filed: May 9, 2005

Date of Patent: December 16, 2008

Assignee: Intel Corporation

Inventors: Mohammad Abdallah, James Coke, Vladimir Pentkovski, Patrice Roussel, Shreekant S. Thakkar
Method of load/store dependencies detection with dynamically changing address length

Patent number: 7464242

Abstract: A method, an apparatus, and a computer program product are provided for detecting load/store dependency in a memory system by dynamically changing the address width for comparison. An incoming load/store operation must be compared to the operations in the pipeline and the queues to avoid address conflicts. Overall, the present invention introduces a cache hit or cache miss input into the load/store dependency logic. If the incoming load operation is a cache hit, then the quadword boundary address value is used for detection. If the incoming load operation is a cache miss, then the cacheline boundary address value is used for detection. This invention enhances the performance of LHS and LHR operations in a memory system.
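
The hit/miss-dependent comparison can be sketched as a choice of address granule. In the Python sketch below, the 16-byte quadword and the 128-byte cache line are assumed sizes (the abstract gives no numbers), and `conflicts` is a hypothetical helper name.

```python
QUADWORD_BYTES = 16    # assumed quadword size
CACHELINE_BYTES = 128  # assumed cache line size


def conflicts(load_addr, cache_hit, pending_addrs):
    """Compare an incoming load against in-flight operations.

    A hit compares at quadword granularity; a miss compares at
    cache line granularity, which is coarser and more conservative.
    """
    granule = QUADWORD_BYTES if cache_hit else CACHELINE_BYTES
    block = load_addr // granule
    return any(addr // granule == block for addr in pending_addrs)


pending = [0x1040]
hit_result = conflicts(0x1000, True, pending)    # different quadwords
miss_result = conflicts(0x1000, False, pending)  # same cache line
```

With these sizes, the same pair of addresses is independent on a cache hit but treated as dependent on a miss, since a miss involves the whole line.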

Type: Grant

Filed: February 3, 2005

Date of Patent: December 9, 2008

Assignee: International Business Machines Corporation

Inventors: Brian David Barrick, Dwain Alan Hicks, Takeki Osanai, David Scott Ray
Method and apparatus for modeling multiple concurrently dispatched instruction streams in super scalar CPU with a sequential language

Patent number: 7460989

Abstract: A method is provided, wherein a virtual internal master clock is used in connection with a RISC CPU. The RISC CPU comprises a number of concurrently operating function units, wherein each unit runs according to its own clocks, including multiple-stage totally unsynchronized clocks, in order to process a stream of instructions. The method includes the steps of generating a virtual model master clock having a clock cycle, and initializing each of the function units at the beginning of respectively corresponding processing cycles. The method further includes operating each function unit during a respectively corresponding processing cycle to carry out a task with respect to one of the instructions, in order to produce a result. Respective results are all evaluated in synchronization, by means of the master clock. This enables the instruction processing operation to be modeled using a sequential computer language, such as C or C++.
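
A common way to model concurrently operating units in a sequential language is a two-phase loop: every unit first computes from the values latched in the previous cycle, then all results are committed together at the master clock. The Python sketch below illustrates that general idea only; the `simulate` function, the linear pipeline arrangement, and the use of `None` for an empty slot are assumptions, not details from the patent.

```python
def simulate(stages, inputs):
    """Run a linear chain of units under one virtual master clock.

    `stages` is a list of one-argument functions; `None` marks an empty slot,
    so `None` cannot be used as a data value in this sketch.
    """
    latches = [None] * len(stages)
    outputs = []
    # Extra empty cycles at the end let the last inputs drain out.
    feed = list(inputs) + [None] * len(stages)
    for value in feed:  # one iteration is one master clock cycle
        # Phase 1: each unit works only on values latched in the last cycle.
        sources = [value] + latches[:-1]
        results = [None if src is None else unit(src)
                   for unit, src in zip(stages, sources)]
        # Phase 2: all results are committed in synchronization.
        latches = results
        if latches[-1] is not None:
            outputs.append(latches[-1])
    return outputs


stages = [lambda x: x + 1, lambda x: x * 2]
outputs = simulate(stages, [1, 2, 3])
```

Because no unit sees a result produced in the same cycle, the order in which the sequential loop visits the units does not affect the outcome.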

Type: Grant

Filed: October 14, 2004

Date of Patent: December 2, 2008

Assignee: International Business Machines Corporation

Inventor: Oliver Keren Ban
System and method for handling load and/or store operations in a superscalar microprocessor

Patent number: 7447876

Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
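
The two conditions that block an out-of-order load, address collision and write pending, can be expressed as a simple check over older stores. This is a minimal Python sketch with hypothetical names; it compares exact addresses and ignores access sizes, which a real load/store unit would have to consider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OlderStore:
    # None means the store address has not been calculated yet (write pending).
    address: Optional[int]


def load_may_bypass(load_address, older_stores):
    """Return True when a load may be issued ahead of the older stores."""
    for store in older_stores:
        if store.address is None:          # write pending
            return False
        if store.address == load_address:  # address collision
            return False
    return True


independent = load_may_bypass(0x100, [OlderStore(0x200)])
collision = load_may_bypass(0x100, [OlderStore(0x100)])
pending = load_may_bypass(0x100, [OlderStore(0x200), OlderStore(None)])
```

Only the first case lets the load go early; the other two must wait for the older store.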

Type: Grant

Filed: April 18, 2005

Date of Patent: November 4, 2008

Assignee: Seiko Epson Corporation

Inventors: Cheryl D. Senter, Johannes Wang
Instruction issue control within a multi-threaded in-order superscalar processor

Publication number: 20080270749

Abstract: A multi-threaded in-order superscalar processor 2 is described having a fetch stage 8 within which thread interleaving circuitry 36 interleaves instructions taken from different program threads to form an interleaved stream of instructions which is then decoded and subject to issue. Hint generation circuitry 62 within the fetch stage 8 adds hint data to the threads indicating that parallel issue of an associated instruction is permitted with one or more other instructions.

Type: Application

Filed: April 25, 2007

Publication date: October 30, 2008

Applicant: ARM Limited

Inventors: Emre Ozer, Vladimir Vasekin, Stuart David Biles
Timed ports

Publication number: 20080263318

Abstract: A processor has an interface portion and an interior environment. The interface portion comprises: at least one port arranged to receive a current time value; a first register associated with the port and arranged to store a trigger time value; and comparison logic configured to detect whether the current time value matches the trigger time value and, provided that said match is detected, to transfer data between the port and an external environment and alter a ready signal to indicate the transfer. The internal environment comprises: an execution unit for transferring data between the at least one port and the internal environment; and a thread scheduler for scheduling a plurality of threads for execution by the execution unit, each thread comprising a sequence of instructions. The scheduling includes scheduling one or more of said threads for execution in dependence on the ready signal.

Type: Application

Filed: April 17, 2007

Publication date: October 23, 2008

Inventors: Michael David May, Peter Hedinger, Alastair Dixon
Scheduling a direct dependent instruction

Publication number: 20080244224

Abstract: In one embodiment, the present invention includes an apparatus having an instruction selector to select an instruction, where the selector is to store a dependent indicator to indicate a direct dependent consumer instruction of a producer instruction, a decode logic coupled to the instruction selector to receive the dependent indicator when the producer instruction is selected and to generate a wakeup signal for the direct dependent consumer instruction, and wakeup logic to receive the wakeup signal and to indicate that the producer instruction has been selected. Other embodiments are described and claimed.

Type: Application

Filed: March 29, 2007

Publication date: October 2, 2008

Inventors: Peter Sassone, Jeff Rupley, Bryan Black
BRANCH PRUNING IN ARCHITECTURES WITH SPECULATION SUPPORT

Publication number: 20080244223

Abstract: According to one example embodiment of the inventive subject matter, the method and apparatus described herein is used to generate an optimized speculative version of a static piece of code. The portion of code is optimized in the sense that the number of instructions executed will be smaller. However, since the applied optimization is speculative, the optimized version can be incorrect and some mechanism to recover from that situation is required. Thus, the quality of the produced code will be measured by taking into account both the final length of the code as well as the frequency of misspeculation.

Type: Application

Filed: March 31, 2007

Publication date: October 2, 2008

Inventors: Carlos Garcia Quinones, Jesus Sanchez, Carlos Madriles, Pedro Marcuello, Antonio Gonzalez
System and method for assigning tags to control instruction processing in a superscalar processor

Patent number: 7430651

Abstract: A tag monitoring system for assigning tags to instructions. A source supplies instructions to be executed by a functional unit. A register file stores information required for the execution of each instruction. A queue has a plurality of slots containing tags which are used for tagging the instructions. The tags are arranged in the queue in an order specified by the program order of their corresponding instructions. A control unit monitors the completion of executed instructions and advances the tags in the queue upon completion of an executed instruction. The register file stores an instruction's information at a location in the register file defined by the tag assigned to that instruction. The register file also contains a plurality of read address enable ports and corresponding read output ports. Each of the slots from the queue is coupled to a corresponding one of the read address enable ports. Thus, the information for each instruction can be read out of the register file in program order.

Type: Grant

Filed: January 25, 2006

Date of Patent: September 30, 2008

Assignee: Seiko Epson Corporation

Inventors: Kevin R. Iadonato, Trevor A. Deosaran, Sanjiv Garg
Techniques for Maintaining a Stack Pointer

Publication number: 20080235491

Abstract: A technique for reducing stack pointer adjustment operations when stack dependent operations, which correspond to stack dependent instructions, are encountered includes setting a stack pointer to an initial value for a stack. A number of bytes associated with the stack dependent operation is determined. A stack pointer delta is then modified based upon the number of bytes associated with the stack dependent operation. A current location in the stack is determined based on the stack pointer and the stack pointer delta.
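
The delta technique can be sketched as a base pointer that stays fixed while a running offset absorbs each push and pop. The Python sketch below uses hypothetical names and assumes a stack that grows toward lower addresses; the `sync` step that folds the delta back into the pointer is also an assumption about when a real adjustment would occur.

```python
class StackTracker:
    """Track the stack position as pointer + delta (illustrative)."""

    def __init__(self, initial_sp):
        self.sp = initial_sp  # stack pointer, set once to its initial value
        self.delta = 0        # accumulated adjustment not yet applied to sp

    def current(self):
        # Current stack location derived from the pointer and the delta.
        return self.sp + self.delta

    def push(self, nbytes):
        self.delta -= nbytes  # assumes a downward-growing stack
        return self.current() # address the pushed data occupies

    def pop(self, nbytes):
        address = self.current()
        self.delta += nbytes
        return address        # address the popped data occupied

    def sync(self):
        # Fold the delta into the pointer in a single adjustment.
        self.sp += self.delta
        self.delta = 0


stack = StackTracker(0x1000)
first = stack.push(8)
second = stack.push(4)
```

After the two pushes `stack.sp` is still `0x1000`; only the delta has changed, so no stack pointer adjustment operation was needed for either push.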

Type: Application

Filed: March 22, 2007

Publication date: September 25, 2008

Applicant: Advanced Micro Devices, Inc.

Inventors: Christopher Svec, Faisal Syed, Michael E. Tuuk, Benjamin T. Sander, Gregory W. Smaus
Scheduling logic on a programmable device implemented using a high-level language

Patent number: 7409670

Abstract: Methods and apparatus are provided for implementing a programmable device including a processor core, a hardware accelerator, and secondary components such as memory. A portion of a program written in a high-level language is automatically selected for hardware acceleration. Dedicated ports are generated to allow the hardware accelerator to handle pointer referencing and dereferencing. A hardware accelerator is generated to perform pipelined processing of instructions. The number of stages implemented for pipelined processing is at least partially dependent on the latency associated with accessing secondary components.

Type: Grant

Filed: November 16, 2004

Date of Patent: August 5, 2008

Assignee: Altera Corporation

Inventors: J. Orion Pritchard, Todd Wayne
DIGITAL SIGNAL PROCESSOR

Publication number: 20080172546

Abstract: A digital signal processor is provided, comprising at least one cluster. The cluster may comprise at least two function units each conducting different instruction types, at least two private register files each associated with one function unit for data storage, a ping-pong register providing exclusively accessible data storage, and a public register file. The public register file comprises at least two read ports, each coupled to a function unit, providing read accessibility for the function units, and one write port to write data to the public register file.

Type: Application

Filed: February 26, 2007

Publication date: July 17, 2008

Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE

Inventors: Chuan-Cheng Peng, Po-Han Huang
Software-implemented grouping techniques for use in a superscalar data processing system

Patent number: 7401328

Abstract: A data processing system includes a grouping tool coupled to a processor. The grouping tool groups the stream of instructions such that each group of instructions has a dimensionless signature annotated thereto. An instruction prefetch unit of the processor fetches the stream of grouped instructions from a memory in the processor and an instruction issue logic unit of the processor identifies boundaries between the groups of instructions by executing a signature detection algorithm. In one embodiment, the data processing system includes a pipelined superscalar processor core and is capable of concurrently executing multiple instructions in the same or different pipeline stages.

Type: Grant

Filed: December 18, 2003

Date of Patent: July 15, 2008

Assignee: LSI Corporation

Inventor: John Lu
Technique for reduced-tag dynamic scheduling and reduced-tag prediction

Patent number: 7398375

Abstract: The present invention provides a dynamic scheduling scheme that uses reservation stations, at least one of which stores an instruction with at least two operands. An allocator portion determines that the instruction, entering the pipeline, has one ready operand and one not-ready operand, and accordingly places it in a station having only one comparator. The one comparator then compares the not-ready operand with tags broadcast on a result tag bus to determine when the not-ready operand becomes ready. Once the operand is ready, execution is requested from the corresponding functional unit.
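
The allocation rule can be sketched as matching each instruction to a station with just enough tag comparators for its not-ready operands. In the Python sketch below the class and function names are hypothetical, and choosing the free station with the fewest sufficient comparators is an assumed policy.

```python
class Station:
    """Reservation station with a fixed number of tag comparators."""

    def __init__(self, comparators):
        self.comparators = comparators
        self.waiting = set()  # tags of operands that are not ready yet
        self.busy = False


def allocate(stations, not_ready_tags):
    """Place an instruction in the smallest free station that can watch its tags."""
    candidates = [s for s in stations
                  if not s.busy and s.comparators >= len(not_ready_tags)]
    if not candidates:
        return None  # no suitable station: the allocator must stall
    station = min(candidates, key=lambda s: s.comparators)
    station.busy = True
    station.waiting = set(not_ready_tags)
    return station


def broadcast(stations, tag):
    """Result tag bus: return the stations that just became ready."""
    ready = []
    for s in stations:
        if s.busy and tag in s.waiting:
            s.waiting.discard(tag)
            if not s.waiting:
                ready.append(s)  # would now request execution
    return ready


stations = [Station(2), Station(1), Station(0)]
one_pending = allocate(stations, ["p7"])  # one not-ready operand
```

The instruction with a single not-ready operand lands in the one-comparator station, leaving the two-comparator station free for an instruction that needs it.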

Type: Grant

Filed: April 3, 2003

Date of Patent: July 8, 2008

Assignee: The Regents of the University of Michigan

Inventors: Daniel J. Ernst, Todd M. Austin
Decomposing architectural operation into speculative and architectural micro-operations for speculative execution of others and for violation check

Patent number: 7392369

Abstract: Embodiments include various methods, apparatuses, and systems in which a processor includes an out-of-order issue engine and an in-order execution pipeline. For some embodiments, the issue engine may be remote from the execution pipeline and execution resources may be many clock cycles away from the issue engine. The issue engine categorizes operations as either speculative operations, which perform computations, or architectural operations, which have the potential to fault or cause an exception. Potentially excepting operations may be decomposed into two separate micro-operations: a speculative micro-operation, which is used to generate data results speculatively so that operations dependent on the results may be speculatively issued, and an architectural micro-operation, which signals the faulting condition for the excepting operation. A STORE operation becomes an architectural operation and all previous faulting conditions may be guaranteed to have evaluated before a STORE is issued.

Type: Grant

Filed: April 18, 2006

Date of Patent: June 24, 2008

Assignee: Intel Corporation

Inventors: Jeffery J. Baxter, Gary N. Hammond, Nazar A. Zaidi
Vector co-processor for configurable and extensible processor architecture

Patent number: 7376812

Abstract: A processor can achieve high code density while allowing higher performance than existing architectures, particularly for Digital Signal Processing (DSP) applications. In accordance with one aspect, the processor supports three possible instruction sizes while maintaining the simplicity of programming and allowing efficient physical implementation. Most of the application code can be encoded using two sets of narrow size instructions to achieve high code density. Adding a third (and larger, i.e. VLIW) instruction size allows the architecture to encode multiple operations per instruction for the performance critical section of the code. Further, each operation of the VLIW format instruction can optionally be a SIMD operation that operates upon vector data. A scheme for the optimal utilization (highest achievable performance for the given amount of hardware) of multiply-accumulate (MAC) hardware is also provided.

Type: Grant

Filed: May 13, 2002

Date of Patent: May 20, 2008

Assignee: Tensilica, Inc.

Inventors: Himanshu A. Sanghavi, Earl A. Killian, James Robert Kennedy, Darin S. Petkov, Peng Tu, William A. Huffman
Distributed-structure-based parallel module structure and parallel processing method

Patent number: 7373481

Abstract: A distributed-structure-based parallel module structure and parallel processing method. One object is to provide a novel sequence-net computer architecture. A parallel operating structure with N+1 independent flow-sequences is created, and the N+1 flow-sequences independently control the distributed tokens via sequence-net instructions to realize parallel operation of the modules. N of the flow-sequences are of the regular type, and a new consistency flow-sequence Sc, which runs independently, is composed of consistency tokens. The distributed tokens connecting multiple machines support cooperative operation among the N+1 flow-sequences.

Type: Grant

Filed: July 18, 2001

Date of Patent: May 13, 2008

Inventor: Zhaochang Xu
Clustered superscalar processor with communication control between clusters

Patent number: 7373485

Abstract: A clustered superscalar processor for reducing the miss rate of a register cache and reducing the possibility of miss penalties. The processor checks before storing an instruction in an instruction window whether there is a data dependency relationship between the instruction that will be stored in the instruction window and a previous instruction stored in the instruction window. When there is a data dependency relationship, the execution result of the previous instruction of one cluster is communicated to a register cache of another cluster that executes the instruction having a data dependency relationship with the previous instruction.

Type: Grant

Filed: March 3, 2005

Date of Patent: May 13, 2008

Assignee: National University Corporation Nagoya University

Inventors: Hideki Ando, Hajime Shimada, Atsushi Mochizuki
POINTER-BASED INSTRUCTION QUEUE DESIGN FOR OUT-OF-ORDER PROCESSORS

Publication number: 20080082788

Abstract: A method and apparatus for improving the operation of an out-of-order computer processor by managing instruction wakeup using pointers with an instruction queue payload random-access memory, a mapping table, and a multiple wake-up table. Instructions allocated to the instruction queue are identified by association with a physical destination register, which is used to index into the mapping table to provide dependent instruction information for instruction wakeup. This allows a scalable instruction queue design, reduced power consumption, and fast branch misprediction recovery, without the use of content-addressable memory cells.

Type: Application

Filed: October 2, 2006

Publication date: April 3, 2008

Applicants: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA, UNIVERSITAT POLITECNICA DE CATALUNYA

Inventors: ALEXANDER V. VEIDENBAUM, MARCO ANTONIO RAMIREZ SALINAS, ADRIAN CRISTAL KESTELMAN, MATEO VALERO CORTES
System and method for translating non-native instructions to native instructions for processing on a host processor

Patent number: 7343473

Abstract: A system and method for extracting complex, variable length computer instructions from a stream of complex instructions, each subdivided into a variable number of instruction bytes, and aligning instruction bytes of individual ones of the complex instructions. The system receives a portion of the stream of complex instructions and extracts a first set of instruction bytes starting with the first instruction bytes, using an extract shifter. The set of instruction bytes is then passed to an align latch where the bytes are aligned and output to a next instruction detector. The next instruction detector determines the end of the first instruction based on said set of instruction bytes. An extract shifter is used to extract and provide the next set of instruction bytes to an align shifter which aligns and outputs the next instruction. The process is then repeated for the remaining instruction bytes in the stream of complex instructions.

Type: Grant

Filed: June 28, 2005

Date of Patent: March 11, 2008

Assignee: Transmeta Corporation

Inventors: Brett Coon, Yoshiyuki Miyayama, Le Trong Nguyen, Johannes Wang
Resolving all previous potentially excepting architectural operations before issuing store architectural operation

Patent number: 7330963

Abstract: Embodiments include various methods, apparatuses, and systems in which a processor includes an out-of-order issue engine and an in-order execution pipeline. For some embodiments, the issue engine may be remote from the execution pipeline and execution resources may be many clock cycles away from the issue engine. The issue engine categorizes each operation as either a speculative operation, which performs computations, or an architectural operation, which has the potential to fault or cause an exception. Potentially excepting operations may be decomposed into two separate micro-operations: a speculative micro-operation, which is used to generate data results speculatively so that operations dependent on the results may be speculatively issued, and an architectural micro-operation, which signals the faulting condition for the excepting operation. A STORE operation becomes an architectural operation, and all previous faulting conditions may be guaranteed to have been evaluated before a STORE is issued.

Type: Grant

Filed: April 18, 2006

Date of Patent: February 12, 2008

Assignee: Intel Corporation

Inventors: Jeffery J. Baxter, Gary N. Hammond, Nazar A. Zaidi
Random slip generator

Patent number: 7318145

Abstract: A random slip generator is provided to lessen side channel leakage and thus thwart cryptanalysis attacks, such as timing attacks and power analysis attacks. Random slip generation may be configurable so that the average frequency of random slips generated by the system may be set. Additional techniques are provided to make nullified instructions consume power like any other executing instruction.

Type: Grant

Filed: May 9, 2002

Date of Patent: January 8, 2008

Assignee: MIPS Technologies, Inc.

Inventors: Morten Stribaek, Jakob Schou Jensen, Jean-Francois Dhem
A METHOD TO REDUCE THE NUMBER OF LOAD INSTRUCTIONS SEARCHED BY STORES AND SNOOPS IN AN OUT-OF-ORDER PROCESSOR

Publication number: 20080005533

Abstract: A method for reducing the number of load instructions in the load reorder queue (LRQ) that are searched when a load instruction is executed by a processor, including dispatching the load instructions; inserting the load instructions in the LRQ in program order; clearing a load received data field; executing the load instructions; checking load reorder queue (LRQ) entries; re-executing the load instruction of the matching LRQ entry; continuing execution; getting the load data; setting the load received data field; comparing a load sequence number (LSQN) of each load instruction to the contents of a snoop_safe register; ANDing all the load received data bits if the LSQN is greater in magnitude than the snoop_safe register contents; setting the snoop_safe register to the LSQN of the load instruction; searching the LRQ entry; and setting a load_peril_snoop register to the LRQ index value where the first load instruction younger than the snoop_safe was found.

Type: Application

Filed: June 30, 2006

Publication date: January 3, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Erik R. Altman, Vijayalakshmi Srinivasan
Digital signal processor with cascaded SIMD organization

Patent number: 7308559

Abstract: A digital signal processor (DSP) includes dual SIMD units that are connected in cascade, and wherein results of a first SIMD stage of the cascade may be stored in a register file of a second SIMD stage in the cascade. Each SIMD stage contains its own resources for storing operands and intermediate results (e.g., its own register file), as well as for decoding the operations that may be executed in that stage. Within each stage, hardware resources are organized to operate in SIMD manner, so that independent SIMD operations can be executed simultaneously, one in each stage of the cascade. Intermediate operands and results flowing through the cascade are stored at the register files of the stages, and may be accessed from those register files. Data may also be brought from memory directly into the register files of the stages in the cascade.

Type: Grant

Filed: June 7, 2003

Date of Patent: December 11, 2007

Assignee: International Business Machines Corporation

Inventors: Clair John Glossner, III, Erdem Hokenek, David Meltzer, Mayan Moudgill
Selective vertical and horizontal dependency resolution via split-bit propagation in a mixed-architecture system having superscalar and VLIW modes

Patent number: 7281119

Abstract: A computer system supplies instructions simultaneously to a plurality of parallel execution pipelines in either superscalar mode or very long instruction word mode with checks for vertical and horizontal dependency between instructions, the horizontal dependency checks between instructions supplied in the same machine cycle being effective in superscalar mode but disabled in very long instruction word mode.

Type: Grant

Filed: May 2, 2000

Date of Patent: October 9, 2007

Assignee: STMicroelectronics S.A.

Inventors: Andrew Cofler, Bruno Fel, Laurent Ducousso
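
Illustrative sketch (not taken from the patent): the abstract for patent 7281119 above describes a horizontal dependency check that applies between instructions supplied in the same machine cycle in superscalar mode and is disabled in VLIW mode. A minimal Python model of that idea follows. The function name, the mode strings, and the `(dest, srcs)` instruction encoding are assumptions made for illustration only.

```python
def horizontal_dependencies(bundle, mode):
    """Return (producer, consumer) index pairs within one issue group.

    bundle: list of (dest, srcs) tuples supplied in the same machine cycle.
    mode:   "superscalar" or "vliw" (hypothetical mode names).
    """
    if mode == "vliw":
        # Check disabled: in VLIW mode the instructions of one word are
        # assumed to have been made independent before execution.
        return []
    pairs = []
    for i, (dest, _) in enumerate(bundle):
        for j in range(i + 1, len(bundle)):
            # A later instruction in the same cycle reads what an
            # earlier one writes: a horizontal dependency.
            if dest is not None and dest in bundle[j][1]:
                pairs.append((i, j))
    return pairs


group = [("r1", ("r2", "r3")), ("r4", ("r1", "r5"))]
print(horizontal_dependencies(group, "superscalar"))
print(horizontal_dependencies(group, "vliw"))
```

In this sketch the second instruction reads `r1`, which the first one writes, so the pair is reported only in superscalar mode.
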
Selectively prohibiting speculative execution of conditional branch type based on instruction bit

Patent number: 7254693

Abstract: A method, apparatus, and computer program product are disclosed for selectively prohibiting speculative conditional branch execution. A particular type of conditional branch instruction is selected. An indication is stored within each instruction that is the particular type of conditional branch instruction. A processor then fetches a first instruction from code that is to be executed. A determination is made regarding whether the first instruction includes the indication. In response to determining that the instruction includes the indication: speculative execution of the first instruction is prohibited, an actual location to which the first instruction will branch is resolved, and execution of the code is branched to the actual location. In response to determining that the instruction does not include the indication, the first instruction is speculatively executed.

Type: Grant

Filed: December 2, 2004

Date of Patent: August 7, 2007

Assignee: International Business Machines Corporation

Inventors: Lee Evan Eisen, Francis Patrick O'Connell
Control word prediction and varying recovery upon comparing actual to set of stored words

Patent number: 7249243

Abstract: Techniques for control word prediction and speculative execution. In one embodiment, an apparatus includes a control word predictor, execution resources, and a comparison module. The control word predictor of this embodiment predicts a predicted control word for execution of a plurality of operations in response to a control word changing operation. The execution resources of this embodiment speculatively execute the plurality of operations utilizing the predicted control word. The comparison module determines whether the predicted control word matches an actual control word set by the control word changing operation or a plurality of other control words, and causes re-execution of said plurality of operations if said actual control word matches any of the plurality of other control words.

Type: Grant

Filed: August 6, 2003

Date of Patent: July 24, 2007

Assignee: Intel Corporation

Inventors: Mohammad A. Abadallah, Mitchell Diamond, David B. Jackson, Kip A. Baumann, Ki W. Yoon, Rafi M. Saied, Robert L. Farrell
Control architecture for a high-throughput multi-processor channel decoding system

Patent number: 7225320

Abstract: A multi-processor unit includes a first domain for processing data according to first configuration information and having multiple first domain processors each connected to communication apparatus and each performing a different function of the first processing. The first domain processors include a first domain control processor for controlling the first processing of the first domain. The multi-processor unit also includes a second domain for second processing of the first processed data depending on a second domain configuration and having multiple second domain processors each connected to the communication apparatus and each performing a different function of the second processing. The second domain processors include a second domain control processor for controlling the second processing of the second domain.

Type: Grant

Filed: December 28, 2000

Date of Patent: May 29, 2007

Assignee: Koninklijke Philips Electronics N.V.

Inventor: Geoffrey Francis Burns
Conditional instruction execution via emissary instruction for condition evaluation

Patent number: 7210024

Abstract: Hazard detection is simplified by converting a conditional instruction, operative to perform an operation if a condition is satisfied, into an emissary instruction operative to evaluate the condition and an unconditional base instruction operative to perform the operation. The emissary instruction is executed, while the base instruction is halted. The emissary instruction evaluates the condition and reports the condition evaluation back to the base instruction. Based on the condition evaluation, the base instruction is either launched into the pipeline for execution, or it is discarded (or a NOP, or null instruction, substituted for it). In either case, the dependencies of following instructions may be resolved.

Type: Grant

Filed: February 10, 2005

Date of Patent: April 24, 2007

Assignee: Qualcomm Incorporated

Inventors: Michael Scott McIlvaine, James Norris Dieffenderfer, Jeffrey Todd Bridges, Thomas Andrew Sartorius, Rodney Wayne Smith
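
Illustrative sketch (not taken from the patent): the abstract for patent 7210024 above describes splitting a conditional instruction into an emissary instruction that evaluates the condition and an unconditional base instruction that performs the operation. The Python model below shows that split in its simplest form. The dictionary layout, the function names, and the `flags`/`regs` arguments are hypothetical, and the pipeline itself is not modeled.

```python
import operator


def split_conditional(instr):
    """Split a conditional instruction into (emissary, base)."""
    emissary = {"kind": "emissary", "cond": instr["cond"]}
    base = {"kind": "base", "op": instr["op"],
            "dest": instr["dest"], "srcs": instr["srcs"]}
    return emissary, base


def run_conditional(instr, flags, regs):
    """Execute the emissary first; launch or discard the base accordingly."""
    emissary, base = split_conditional(instr)
    # The emissary evaluates the condition while the base is held back.
    condition_met = emissary["cond"](flags)
    if not condition_met:
        # Base instruction is discarded (equivalently, replaced by a NOP).
        return "nop"
    a, b = (regs[s] for s in base["srcs"])
    regs[base["dest"]] = base["op"](a, b)
    return "executed"


regs = {"r0": 0, "r1": 2, "r2": 3}
add_if_zero = {"cond": lambda f: f["Z"], "op": operator.add,
               "dest": "r0", "srcs": ("r1", "r2")}
print(run_conditional(add_if_zero, {"Z": True}, regs), regs["r0"])
```

Either outcome is known as soon as the emissary reports back, which is the point at which dependencies of following instructions can be resolved.
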
Processor with a replay system that includes a replay queue for improved throughput

Patent number: 7200737

Abstract: A processor is provided that includes an execution unit for executing instructions and a replay system for replaying instructions which have not executed properly. The replay system is coupled to the execution unit and includes a checker for determining whether each instruction has executed properly and a replay queue coupled to the checker for temporarily storing one or more instructions for replay. The replay queue may be used to store a long latency instruction, such as a load in which data must be retrieved from an external memory device. The long latency instruction and possibly one or more dependent instructions are stored in the replay queue until the long latency instruction is ready to be executed (e.g., data for the load instruction has been retrieved from external memory). Once the long latency instruction is ready to be executed (e.g., the data is available), the long latency instruction may then be unloaded from the replay queue for re-execution.

Type: Grant

Filed: December 29, 1999

Date of Patent: April 3, 2007

Assignee: Intel Corporation

Inventors: Amit A. Merchant, Darrell D. Boggs, David J. Sager
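
Illustrative sketch (not taken from the patent): the abstract for patent 7200737 above describes a checker that decides whether an instruction executed properly and a replay queue that holds long latency instructions until their data is available. The Python model below captures only that hold-and-release behavior. The class and method names and the `(name, address)` instruction encoding are assumptions, and dependent instructions are not modeled.

```python
from collections import deque


class ReplaySystem:
    """Toy model: loads whose data is missing wait in a replay queue."""

    def __init__(self):
        self.replay_queue = deque()
        self.available = set()  # addresses whose data has arrived

    def check(self, instr):
        """Checker: return True if the load executed properly."""
        _name, addr = instr
        if addr in self.available:
            return True
        # Data not yet retrieved: park the instruction for replay.
        self.replay_queue.append(instr)
        return False

    def data_arrived(self, addr):
        """Unload every queued instruction that was waiting on addr."""
        self.available.add(addr)
        ready = [i for i in self.replay_queue if i[1] == addr]
        self.replay_queue = deque(
            i for i in self.replay_queue if i[1] != addr)
        return ready


rs = ReplaySystem()
load = ("load r1", 0x40)
print(rs.check(load))         # first attempt misses
print(rs.data_arrived(0x40))  # released for re-execution
print(rs.check(load))         # replayed attempt succeeds
```
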
Register renaming system

Patent number: 7171541

Abstract: A register renaming system for a processor based on superscalar architecture that can process a larger number of instructions per cycle by providing a free list to hold unallocated physical-register numbers and a mapping table whose entries are provided in respective correspondence with the logical registers and each designed to hold a physical-register number, and by pipelining where dependency checks among instructions are to be done as a pre-process.

Type: Grant

Filed: September 6, 2000

Date of Patent: January 30, 2007

Inventor: Hajime Seki
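
Illustrative sketch (not taken from the patent): the abstract for patent 7171541 above names two structures, a free list of unallocated physical-register numbers and a mapping table with one entry per logical register. The Python model below shows how those two structures typically cooperate when one instruction is renamed. The class name, method names, and register counts are assumptions, and the pipelined dependency check mentioned in the abstract is not modeled.

```python
from collections import deque


class Renamer:
    """Toy register renamer with a free list and a mapping table."""

    def __init__(self, num_logical, num_physical):
        # Initially logical register i maps to physical register i.
        self.mapping = list(range(num_logical))
        # The remaining physical registers are unallocated.
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        """Rename one instruction; returns (new_dest, phys_srcs, old_dest)."""
        # Sources must read the mapping before the destination is remapped.
        phys_srcs = [self.mapping[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("no free physical register (would stall)")
        old_dest = self.mapping[dest]
        new_dest = self.free_list.popleft()
        self.mapping[dest] = new_dest
        return new_dest, phys_srcs, old_dest

    def release(self, phys):
        """Return a physical register to the free list."""
        self.free_list.append(phys)


r = Renamer(num_logical=4, num_physical=8)
print(r.rename(dest=1, srcs=[2, 3]))
print(r.rename(dest=2, srcs=[1, 1]))
```

The second call reads logical register 1 through its new physical register, which is how renaming preserves true dependencies while removing name conflicts.
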
High-performance, superscalar-based computer system with out-of-order instruction execution

Patent number: 7162610

Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.

Type: Grant

Filed: September 12, 2003

Date of Patent: January 9, 2007

Assignee: Seiko Epson Corporation

Inventors: Le Trong Nguyen, Derek J Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H Trang
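
Illustrative sketch (not taken from the patent): the abstract for patent 7162610 above describes holding each result in a temporary data register until all prior instructions have executed, so that instructions retire in order even though they execute out of order. The Python model below shows that retirement rule. The class name, method names, and entry layout are hypothetical.

```python
class InOrderRetire:
    """Toy model: out-of-order completion, in-order retirement."""

    def __init__(self, dests):
        # One temporary entry per instruction, kept in program order.
        self.entries = [{"dest": d, "value": None, "done": False}
                        for d in dests]
        self.head = 0            # oldest instruction not yet retired
        self.register_file = {}  # architectural state

    def complete(self, index, value):
        """A functional unit delivers a result, possibly out of order."""
        entry = self.entries[index]
        entry["value"] = value
        entry["done"] = True

    def retire(self):
        """Commit results while the oldest pending instruction is done."""
        retired = []
        while (self.head < len(self.entries)
               and self.entries[self.head]["done"]):
            entry = self.entries[self.head]
            self.register_file[entry["dest"]] = entry["value"]
            retired.append(self.head)
            self.head += 1
        return retired


buf = InOrderRetire(["r1", "r2", "r3"])
buf.complete(2, 30)   # youngest instruction finishes first
print(buf.retire())   # nothing retires: instruction 0 is still pending
buf.complete(0, 10)
buf.complete(1, 20)
print(buf.retire())   # all three retire together, in program order
```
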
Single-chip multiprocessor with cycle-precise program scheduling of parallel execution

Patent number: 7143401

Abstract: A single-chip multiprocessor system and a method of operating the system, based on static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and for access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing synchronization signal exchange, data exchange at the register file level, and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase performance. Synchronization of the parallel execution of the streams is ensured by special operations that set the execution sequence of streams and stream fragments prescribed by the program algorithm.

Type: Grant

Filed: February 20, 2001

Date of Patent: November 28, 2006

Assignee: Elbrus International

Inventors: Boris A. Babaian, Yuli Kh. Sakhin, Vladimir Yu. Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
Parallel computation processor, parallel computation control method and program thereof

Patent number: 7136989

Abstract: A parallel computation processor capable of high-speed loop operation. When instruction decoders decode the VLOOP instruction, which triggers loop operation, an instruction buffer starts storing normal instructions. The instruction buffer dispatches a VLIW instruction composed of n normal instructions to execution units each time n instructions are stored therein. The execution units concurrently execute the instructions. After all instructions comprised in a loop have been stored in the buffer and once dispatched as VLIW instructions to be executed, the loop is executed repeatedly.

Type: Grant

Filed: September 26, 2002

Date of Patent: November 14, 2006

Assignee: NEC Corporation

Inventor: Daiji Ishii
Computer system that operates in VLIW and superscalar modes and has selectable dependency control

Patent number: 7111152

Abstract: Instructions in a computer system are executed in a plurality of parallel execution pipelines, a horizontal dependency check is carried out between instructions supplied to the parallel pipelines and in response to detecting horizontal dependency a control signal of a first or second type is generated depending on whether the dependency can be resolved by activating a by-pass or whether a temporary stall is required in one of the pipelines.

Type: Grant

Filed: May 2, 2000

Date of Patent: September 19, 2006

Assignee: STMicroelectronics S.A.

Inventors: Andrew Cofler, Bruno Fel, Laurent Ducousso
Processor and method of testing a processor for hardware faults utilizing a pipeline interlocking test instruction

Patent number: 7096347

Abstract: The instruction pipeline of a processor, which includes execution circuitry and instruction sequencing logic, receives a stream of instructions including a pipeline interlocking test instruction. The processor includes pipeline control logic that, responsive to receipt of the test instruction, interlocks the instruction pipeline as specified in the test instruction to prevent advancement of at least one first instruction in the instruction pipeline while permitting advancement of at least one second instruction in the instruction pipeline until occurrence of a release condition also specified by the test instruction. In response to the release condition, the pipeline control logic releases the interlock to enable advancement of said at least one instruction in the instruction pipeline.

Type: Grant

Filed: October 25, 2001

Date of Patent: August 22, 2006

Inventor: Charles R. Moore
Method and apparatus for enhancing scheduling in an advanced microprocessor

Patent number: 7089404

Abstract: Apparatus and a method for causing scheduler software to produce code which executes more rapidly by ignoring some of the normal constraints placed on its scheduling operations and simply scheduling certain instructions to run as fast as possible, raising an exception if the scheduling violates a scheduling constraint, and determining steps to be taken for correctly executing each set of instructions about which an exception is raised.

Type: Grant

Filed: June 14, 1999

Date of Patent: August 8, 2006

Assignee: Transmeta Corporation

Inventors: Guillermo J. Rozas, Godfrey P. D'Souza, Charles R. Price, Paul S. Serris
