Superscalar Patents (Class 712/23)
  • Patent number: 6725355
    Abstract: A microprocessor having an internal memory for storing data to be processed, a data pointer register for storing an address on the internal memory, a decoder 36 for decoding an instruction, a general-purpose register module 11 including data registers r0 and r1 for storing data read from an address on the internal memory stored in the data pointer register in accordance with a request to read data stored in the internal memory, and an ALU 13 for performing processing using data stored in the general-purpose register module 11 based on the result of decoding by the decoder 36 and writing the result of processing in the general-purpose register module 11.
    Type: Grant
    Filed: August 11, 1998
    Date of Patent: April 20, 2004
    Assignee: Sony Corporation
    Inventor: Yoshihiko Imamura
  • Publication number: 20040073773
    Abstract: A novel vector processor architecture, and hardware and processing features associated therewith, provide both vector processing and superscalar processing features.
    Type: Application
    Filed: August 6, 2003
    Publication date: April 15, 2004
    Inventor: Victor Demjanenko
  • Patent number: 6721873
    Abstract: A method and apparatus for improving dispersal performance of instruction threads is described. In one embodiment, the dispersal logic determines whether the instructions supplied to it include any NOP instructions. When a NOP instruction is detected, the dispersal logic places the NOP into a no-op port for execution. All other instructions are distributed to the proper execution pipes in a normal manner. Because the NOP instructions do not use the execution resources of other instructions, all instruction threads can be executed in one cycle.
    Type: Grant
    Filed: December 29, 2000
    Date of Patent: April 13, 2004
    Assignee: Intel Corporation
    Inventors: Sailesh Kottapalli, Udo Walterscheidt, Andrew Sun, Thomas Yeh, Kinkee Sit
  • Patent number: 6718458
    Abstract: A method and apparatus for improving the performance of a superscalar, superpipelined processor by identifying and processing instructions for performing addressing operations is provided. The invention heuristically determines instructions likely to perform addressing operations and assigns those instructions to specialized pipes in a pipeline structure. The invention can assign such instructions to both an execute pipe and a load/store pipe to avoid the occurrence of “bubbles” in the event execution of the instruction requires the calculation capability of the execute pipe. The invention can also examine a sequence of instructions to identify an instruction for performing a calculation where the result of the calculation is used by a succeeding load or store instruction. In this case, the invention controls the pipeline to assure the result of the calculation is available for the succeeding load or store instruction even if both instructions are being processed concurrently.
    Type: Grant
    Filed: March 27, 2003
    Date of Patent: April 6, 2004
    Assignee: Broadcom Corporation
    Inventors: Dan Dobberpuhl, Robert Stepanian
  • Patent number: 6714961
    Abstract: The invention is directed toward a multiprocessing system having multiple processing units. For at least one of the processing units in the multiprocessing system, a first job signal is assigned to the processing unit for speculative execution of a corresponding first job, and a further job signal is assigned to the processing unit for speculative execution of a corresponding further job. The speculative execution of said further job is initiated when the processing unit has completed execution of the first job. If desirable, even more job signals may be assigned to the processing unit for speculative execution. In this way, multiple job signals are assigned to the processing units of the processing system, and the processing units are allowed to execute a plurality of jobs speculatively while waiting for commit priority.
    Type: Grant
    Filed: November 12, 1999
    Date of Patent: March 30, 2004
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Per Anders Holmberg, Terje Egeland, Nils Ola Linnermark, Karl Oscar Joachim Strömbergson, Magnus Carlsson
  • Publication number: 20040059894
    Abstract: The program to be executed is compiled by translating it into native instructions of the instruction-set architecture of the processor system, organizing the instructions deriving from the translation of the program into respective bundles in an order of successive bundles, each bundle grouping together instructions adapted to be executed in parallel by the processor system. The bundles of instructions are ordered into respective sub-bundles, said sub-bundles identifying a first set of instructions, which must be executed before the instructions belonging to the next bundle of said order, and a second set of instructions, which can be executed both before and in parallel with respect to the instructions belonging to said subsequent bundle of said order.
    Type: Application
    Filed: July 1, 2003
    Publication date: March 25, 2004
    Applicant: STMicroelectronics S.r.l.
    Inventors: Fabrizio Simone Rovati, Antonio Maria Borneo, Danilo Pietro Pau
  • Patent number: 6711670
    Abstract: A superscalar processing system that detects data hazards within instruction groups utilizes a memory, a plurality of pipelines, an instruction dispersal unit (IDU), and a control mechanism. The memory includes a plurality of entries that respectively correspond with a plurality of registers. The IDU receives an instruction group that includes a plurality of instructions and transmits the instructions of the instruction group to the plurality of pipelines. The control mechanism analyzes one of the instructions and identifies an entry in the memory that corresponds with a register associated with the one instruction. The control mechanism then analyzes the entry and transmits a warning signal in response to a determination that the entry indicates that another instruction within the instruction group is associated with the register.
    Type: Grant
    Filed: October 14, 1999
    Date of Patent: March 23, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Donald Charles Soltis, Jr., Ronny Lee Arnold
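    The hazard check described in this abstract can be illustrated with a minimal software sketch. It assumes a table with one entry per register and, as a simplification not stated in the abstract, flags only registers already written by an earlier instruction in the same group. The names `Instr` and `detect_hazards` are hypothetical and do not come from the patent.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    name: str
    dest: Optional[int]      # destination register number, or None
    srcs: Tuple[int, ...]    # source register numbers


def detect_hazards(group: List[Instr], num_regs: int = 8) -> List[str]:
    """Return the names of instructions that would raise a warning signal."""
    written = [False] * num_regs   # one table entry per register (assumed layout)
    warnings = []
    for ins in group:
        regs = list(ins.srcs)
        if ins.dest is not None:
            regs.append(ins.dest)
        # Warn if an earlier instruction in the group already wrote one of these registers.
        if any(written[r] for r in regs):
            warnings.append(ins.name)
        if ins.dest is not None:
            written[ins.dest] = True
    return warnings


group = [
    Instr("add", 1, (2, 3)),
    Instr("sub", 4, (1, 5)),   # reads r1, which "add" writes in the same group
    Instr("mul", 6, (2, 3)),
]
flagged = detect_hazards(group)
```

    In this sketch only "sub" is flagged, because it reads a register written earlier in the same group.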
  • Patent number: 6708269
    Abstract: In a multi-threaded system, such as in a multi-processor system, different types of fences are provided to force completion of programmatically earlier instructions in a program. The types of fences can be thread-specific, and different types of fences are used based on different kinds of conditions, instructions, operations, or memory types. When a fence is executed, senior stores, request buffers, bus queues, or any combination of these stages in an execution pipeline can be drained. Fetches at a front end of the pipeline can also be killed to ensure that the bus queue can be drained.
    Type: Grant
    Filed: December 30, 1999
    Date of Patent: March 16, 2004
    Assignee: Intel Corporation
    Inventors: Keshavan K. Tiruvallur, Douglas M. Carmean, Robert J. Greiner, Muntaquim Chowdhury, Madhavan Parthasarathy
  • Patent number: 6704856
    Abstract: A method of compacting an instruction queue in an out of order processor includes determining the number of invalid instructions below and including each row in the queue, by counting invalid bits or validity indicators associated with rows below and up to the current row. For each row, multiplexor select signals are generated from the flat vector counts for the N rows above and including the present row, and from the validity indicators associated with the N rows, where N is a predetermined value. A multiplexor associated with a particular row selects one of the N rows according to the select value, and moves or passes the instruction held in the selected row to the present row. A row's select value is determined by forming a diagonal from the N count vectors corresponding to the N rows above and including the present row, and logically ANDing each diagonal bit with the valid bit associated with the same row. Each row's count vector is determined in two stages.
    Type: Grant
    Filed: December 17, 1999
    Date of Patent: March 9, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: James A. Farrell, Timothy C. Fischer, Daniel L. Leibholz, Bruce A. Gieseke
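    The counting step in this abstract can be modeled in a few lines of software. The sketch below is a sequential approximation only: it ignores the N-row multiplexor limit and the two-stage count described in the abstract, and the function name `compact` is hypothetical.

```python
from typing import List, Optional, Tuple


def compact(queue: List[Optional[str]]) -> Tuple[List[int], List[Optional[str]]]:
    """Compact a queue toward index 0 (the bottom).

    An entry of None stands for an invalid row. Returns the per-row count of
    invalid rows at or below each row, plus the compacted queue.
    """
    invalid_below = []
    count = 0
    for entry in queue:
        if entry is None:
            count += 1
        invalid_below.append(count)

    compacted: List[Optional[str]] = [None] * len(queue)
    for row, entry in enumerate(queue):
        if entry is not None:
            # A valid row drops by the number of invalid rows beneath it.
            compacted[row - invalid_below[row]] = entry
    return invalid_below, compacted


counts, result = compact(["a", None, "b", None, None, "c"])
```

    Each valid instruction moves down by exactly the number of invalid rows beneath it, which is why the per-row counts are sufficient to derive the select signals.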
  • Patent number: 6694424
    Abstract: A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC.
    Type: Grant
    Filed: January 3, 2000
    Date of Patent: February 17, 2004
    Assignee: Advanced Micro Devices, Inc.
    Inventors: James B. Keller, Thomas S. Green, Wei-Han Lien, Ramsey W. Haddad
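    The train/untrain behavior described in this abstract can be sketched as a small table indexed by the load PC. This is a minimal model under stated assumptions: the table size, the modulo indexing, and the class and method names are all hypothetical, and real hardware would store only part of the store PC.

```python
from typing import List, Optional


class STLFPredictor:
    """Toy store-to-load-forward predictor: one table indexed by load PC."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        # Each entry holds the PC of a store that previously interfered, or None.
        self.table: List[Optional[int]] = [None] * size

    def _index(self, load_pc: int) -> int:
        return load_pc % self.size   # assumed indexing scheme

    def predict(self, load_pc: int) -> Optional[int]:
        """Return the store PC the load should wait on, if any."""
        return self.table[self._index(load_pc)]

    def train(self, load_pc: int, store_pc: int) -> None:
        """Record an observed interference between a store and a load."""
        self.table[self._index(load_pc)] = store_pc

    def untrain(self, load_pc: int) -> None:
        """Delete the entry when a predicted dependency did not occur."""
        self.table[self._index(load_pc)] = None


predictor = STLFPredictor()
before = predictor.predict(0x40)
predictor.train(0x40, 0x10)
during = predictor.predict(0x40)
predictor.untrain(0x40)
after = predictor.predict(0x40)
```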
  • Patent number: 6691221
    Abstract: A computing system has first and second instruction storing circuits, each instruction storing circuit storing N instructions for parallel output. An instruction dispatch circuit, coupled to the first instruction storing circuit, dispatches L instructions stored in the first instruction storing circuit, wherein L is less than or equal to N. An instruction loading circuit, coupled to the instruction dispatch circuit and to the first and second instruction storing circuits, loads L instructions from the second instruction storing circuit into the first instruction storing circuit after the L instructions are dispatched from the first instruction storing circuit and before further instructions are dispatched from the first instruction storing circuit. The instruction loading circuit loads the L instructions from the second instruction storing circuit into the positions previously occupied by the L instructions dispatched from the first instruction storing circuit.
    Type: Grant
    Filed: May 24, 2001
    Date of Patent: February 10, 2004
    Assignees: Mips Technologies, Inc., Kabushiki Kaisha Toshiba
    Inventors: Chandra Joshi, Paul Rodman, Peter Hsu, Monica R. Nofal
  • Publication number: 20040019766
    Abstract: Multiple instructions, specifying equivalent operations but designating different execution units, are stored beforehand on an instruction exchange table. First, a primary compiler compiles a source program into a set of machine-readable instructions. From the set of instructions, an instruction parallelizer generates a set of long instruction words. Specifically, an instruction identifier identifies one of the instructions in the set with one of the instructions stored on the instruction exchange table. Then, an instruction replacer replaces the instruction in question with another one of the instructions that is also stored on the instruction exchange table and that specifies an equivalent operation but designates a different execution unit as a target. In this manner, the number of parallelly executable instructions can be increased, while the number of no-operation instructions can be reduced, thus generating a parallelized instruction set at a higher level of parallelism.
    Type: Application
    Filed: July 18, 2003
    Publication date: January 29, 2004
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventor: Kenichi Kawaguchi
  • Patent number: 6675288
    Abstract: A technique for managing register assignments. The technique involves maintaining, in a register list memory circuit having entries that respectively correspond to physical registers, a list of register assignments that assign logical registers to the physical registers. The technique further involves maintaining, in a vector memory circuit having bits that respectively correspond to the physical registers, a valid vector that forms, in combination with the list of register assignments, a list of valid register assignments. Furthermore, the technique involves storing, for an instruction that is mapped by the data processor, a copy of the valid vector from the vector memory circuit to a silo memory circuit. Preferably, the processor using the technique has the ability to execute branches of instructions speculatively, and to recover if it is determined that the processor executed down an incorrect instruction branch.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: January 6, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: James Arthur Farrell, Sharon Marie Britton, Harry Ray Fair, III, Bruce Gieseke, Daniel Lawrence Leibholz, Derrick R. Meyer
  • Patent number: 6658655
    Abstract: A threaded interpreter (916) is suitable for executing a program comprising a series of program instructions stored in a memory (904). For the execution of a program instruction the threaded interpreter includes a preparatory unit (918) for executing a plurality of preparatory steps making the program instruction available in the threaded interpreter, and an execution unit (920) with one or more machine instructions emulating the program instruction. According to the invention, the threaded interpreter is designed such that, during the execution of the series of program instructions on an instruction-level parallel processor, machine instructions implementing a first one of the preparatory steps are executed in parallel with machine instructions implementing a second one of the preparatory steps for respective ones of the series of program instructions.
    Type: Grant
    Filed: December 6, 1999
    Date of Patent: December 2, 2003
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Jan Hoogerbrugge, Alexander Augusteijn
  • Patent number: 6658550
    Abstract: An asynchronous processor having pipelined instruction fetching and execution to implement concurrent execution of instructions by two or more execution units. A writeback unit is connected to execution units and memory units to control information updates and to handle precise exceptions. A pipelined completion mechanism can be implemented to improve the throughput.
    Type: Grant
    Filed: April 30, 2002
    Date of Patent: December 2, 2003
    Assignee: California Institute of Technology
    Inventors: Alain J. Martin, Andrew Lines, Rajit Manohar, Uri Cummings, Mika Nystroem
  • Patent number: 6651159
    Abstract: A floating point register stack for a processor combines a plurality of two general purpose registers to form a register stack for x86 instructions and leaves the remaining general purpose registers for native instructions of the processor. By mapping x86 sources into the stack of two general purpose registers and operating x86 instructions on the x86 stack, the register stack for the processor is able to support both the processor's native instruction set and the x86 instruction set without increasing the size of the register stack.
    Type: Grant
    Filed: November 29, 1999
    Date of Patent: November 18, 2003
    Assignee: ATI International SRL
    Inventors: Tiruvur R. Ramesh, Sanjay Mansingh, Korbin Van Dyke
  • Patent number: 6651161
    Abstract: A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC.
    Type: Grant
    Filed: January 3, 2000
    Date of Patent: November 18, 2003
    Assignee: Advanced Micro Devices, Inc.
    Inventors: James B. Keller, Thomas S. Green, Wei-Han Lien, Ramsey W. Haddad
  • Patent number: 6651164
    Abstract: A superscalar processing system that detects data hazards within instruction groups transmitted to the processing system utilizes a content-addressable memory, a plurality of pipelines, an instruction dispersal unit (IDU), and a control mechanism. The IDU receives an instruction group that includes a plurality of instructions and transmits the instructions of the instruction group to the plurality of pipelines. The control mechanism stores register identifiers of the instructions in the content-addressable memory and determines whether a register identifier of one of the instructions is stored in the content-addressable memory. When the register identifier of the one instruction is stored in the content-addressable memory, the control mechanism transmits a warning signal indicating that one of the instruction groups contained a data hazard.
    Type: Grant
    Filed: October 14, 1999
    Date of Patent: November 18, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Donald Charles Soltis, Jr., Ronny Lee Arnold
  • Publication number: 20030212878
    Abstract: A scaleable microprocessor architecture has an efficient and orthogonal instruction set of 20 basic instructions, and a scaleable program word size from 15 bits up, including but not limited to 16, 24, 32, and 64 bits. As many instructions are packed into a single program word as allowed by the size of a program word. An integral return stack is used for nested subroutine calls and returns. An integral data stack is also used to pass parameters among nested subroutines. The simplified instruction set and the dual stack architecture make it possible to execute all instructions in a single clock cycle from a single phase master clock. Additional instructions can be added to facilitate accessing arrays in memory, for multiplication and division of integers, for real time interrupts, and to support a UART I/O device. This scaleable microprocessor architecture greatly increases code density and processing speed while significantly decreasing silicon area and power consumption.
    Type: Application
    Filed: May 7, 2002
    Publication date: November 13, 2003
    Inventor: Chen-Hanson Ting
  • Patent number: 6647486
    Abstract: Routine processing for routine data, non-routine processing for routine data and general non-routine processing are to be processed efficiently. To this end, a main CPU has a CPU core having a parallel computational mechanism, a command cache and a data cache as ordinary cache units, and a scratch-pad memory SPR which is an internal high-speed memory capable of performing direct memory accessing (DMA) suited for routine processing. A floating decimal point vector processor (VPE) has an internal high-speed memory (VU-MEM) capable of DMA processing and is tightly connected to the main CPU to form a co-processor. The DMA controller (DMAC) controls DMA transfer between the main memory and the SPR, between the main memory and the VU-MEM, and between the VU-MEM and the SPR.
    Type: Grant
    Filed: May 22, 2002
    Date of Patent: November 11, 2003
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Akio Ohba
  • Patent number: 6647485
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Grant
    Filed: May 10, 2001
    Date of Patent: November 11, 2003
    Assignee: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 6643766
    Abstract: Speculative pre-fetching and pre-flushing of additional cache lines minimize cache miss latency and coherency check latency of an out of order instruction execution processor. A pre-fetch/pre-flush slot (DPRESLOT) is provided in a memory queue (MQUEUE) of the out-of-order execution processor. The DPRESLOT monitors the transactions between a system interface, e.g., the system bus, and an address reorder buffer slot (ARBSLOT) and/or between the system interface and a cache coherency check slot (CCCSLOT). When a cache miss is detected, the DPRESLOT causes one or more cache lines in addition to the data line, which caused the current cache miss, to be pre-fetched from the memory hierarchy into the cache memory (DCACHE) in anticipation that the additional data would be required in the near future.
    Type: Grant
    Filed: May 4, 2000
    Date of Patent: November 4, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Gregg B Lesartre, David Jerome Johnson
  • Patent number: 6633970
    Abstract: A mechanism is provided for allowing a processor to recover from a failure of a predicted path of instructions (e.g., from a mispredicted branch or other event). The mechanism includes a plurality of physical registers, each of which can store either architectural data or speculative data. The apparatus also includes a primary array to store a mapping from logical registers to physical registers, the primary array storing a speculative state of the processor. The apparatus also includes a buffer coupled to the primary array to store information identifying which physical registers store architectural data and which physical registers store speculative data. According to another embodiment, a history buffer is coupled to the secondary array and stores historical physical register to logical register mappings performed for each of a plurality of instructions that are part of a predicted path.
    Type: Grant
    Filed: December 28, 1999
    Date of Patent: October 14, 2003
    Assignee: Intel Corporation
    Inventors: David W. Clift, Darrell D. Boggs, David J. Sager
  • Publication number: 20030191923
    Abstract: A computing system is described in which individual instructions are executable in parallel by processing pipelines, and instructions to be executed in parallel by different pipelines are supplied to the pipelines simultaneously. The system includes storage for storing an arbitrary number of the instructions to be executed. The instructions to be executed are tagged with pipeline identification tags indicative of the pipeline to which they should be dispatched. The pipeline identification tags are supplied to a system which controls a crossbar switch, enabling the tags to be used to control the switch and supply the appropriate instructions simultaneously to the differing pipelines.
    Type: Application
    Filed: April 9, 1998
    Publication date: October 9, 2003
    Inventors: Howard G. Sachs, Siamak Arya
  • Patent number: 6631464
    Abstract: An instruction fetch control system prefetches a branch instruction in a pipeline system and fetches a branch target instruction of the branch instruction. The control system comprises a first branch judgement circuit for conducting a branch condition judgement in a stage prior to the branch judgement stage in which a second and original branch judgement of the branch instruction is conducted, and a circuit for starting a prefetch of instructions following said branch target instruction without waiting for the branch judgement stage where the first branch judgement circuit judges that the branch is successful.
    Type: Grant
    Filed: June 10, 1993
    Date of Patent: October 7, 2003
    Assignee: Fujitsu Limited
    Inventors: Tsuyoshi Mori, Seishi Okada
  • Patent number: 6631462
    Abstract: A method includes pushing a datum onto a stack by a first processor and popping the datum off the stack by a second processor.
    Type: Grant
    Filed: January 5, 2000
    Date of Patent: October 7, 2003
    Assignee: Intel Corporation
    Inventors: Gilbert Wolrich, Matthew J. Adiletta, William Wheeler, Daniel Cutter, Debra Bernstein
  • Patent number: 6629232
    Abstract: Interconnect-dominated large register files are reduced in chip area and delay time. A register file in a processor having a number of execution units is divided into multiple copies. Different groups of execution units can read from and write to their own copy of the file registers by a set of local read and write ports. All of the register-file copies are synchronized by writing data from the execution units to remote write ports in at least some registers in other copies of the register file. Each copy can be divided into local and global registers. While all copies of the global registers continue to be written by the remote write ports, the local registers can be written only by a local cluster of execution units. Alternatively or additionally, all of the execution units can write to their local register-file copy, but only some of the units can write the global registers in all copies of the register file.
    Type: Grant
    Filed: July 3, 2000
    Date of Patent: September 30, 2003
    Assignee: Intel Corporation
    Inventors: Ken Arora, Harshvardhan Sharangpani, Rajiv Gupta
  • Patent number: 6625726
    Abstract: A method and apparatus for fault handling in computer systems. In one embodiment, a first register is used to store an address which points to the top of a stack. The address stored in the first register may be updated during the execution of an instruction. A second register may be used to store an address previously stored in the first register. The contents of the second register may be kept unchanged until the retirement of the instruction that is currently executing. If a fault occurs during execution of the instruction, a microcode fault handler may perform routines that may clear the fault or those conditions which led to the fault. The microcode fault handler may also copy the contents of the second register back into the first register. Execution of the instruction may be restarted from the operation just prior to when the fault occurred. The program from which the instruction originated may then continue to run.
    Type: Grant
    Filed: June 2, 2000
    Date of Patent: September 23, 2003
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael T. Clark, Scott A. White
  • Publication number: 20030177340
    Abstract: A system and method of executing instructions within a counterflow pipeline processor. The counterflow pipeline processor includes an instruction pipeline, a data pipeline, a reorder buffer and a plurality of execution units. An instruction and one or more operands issue into the instruction pipeline and a determination is made at one of the execution units whether the instruction is ready for execution. If so, the operands are loaded into the execution unit and the instruction executes. The execution unit is monitored for a result and, when the result arrives, it is stored into the result pipeline. If the instruction reaches the end of the pipeline without executing, it wraps around and is sent down the instruction pipeline again.
    Type: Application
    Filed: March 18, 2003
    Publication date: September 18, 2003
    Applicant: Intel Corporation
    Inventors: Kenneth J. Janik, Shih-Lien L. Lu, Michael F. Miller
  • Patent number: 6622237
    Abstract: A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC.
    Type: Grant
    Filed: January 3, 2000
    Date of Patent: September 16, 2003
    Assignee: Advanced Micro Devices, Inc.
    Inventors: James B. Keller, Thomas S. Green, Wei-Han Lien, Ramsey W. Haddad, Keith R. Schakel
  • Patent number: 6622235
    Abstract: A scheduler issues memory operations without regard to whether or not resources are available to handle each possible execution outcome of that memory operation. The scheduler also retains the memory operation after issuance. If a condition occurs which prevents correct execution of the memory operation, the memory operation is retried. The scheduler subsequently reschedules and reissues the memory operation in response to the retry. Additionally, the scheduler may receive a retry type indicating the reason for retry. Certain retry types may indicate a delayed reissuance of the memory operation until the occurrence of a subsequent event. In response to such retry types, the scheduler monitors for the subsequent event and delays reissuance until the event is detected. The scheduler may include a physical address buffer to detect a load memory operation which incorrectly issued prior to an older store memory operation upon which it is dependent for the memory operation.
    Type: Grant
    Filed: January 3, 2000
    Date of Patent: September 16, 2003
    Assignee: Advanced Micro Devices, Inc.
    Inventors: James B. Keller, Ramsey W. Haddad, Stephan G. Meier
  • Patent number: 6618698
    Abstract: Clusters of processors are interconnected as an emulation engine such that processors share input and data stacks, and the setup and storing of results are done in parallel, but the output of one evaluation unit is connected to the input of the next evaluation unit. A set of ‘cascade’ connections provides access to the intermediate values. By tapping intermediate values from one processor, and feeding them to the next, a significant emulation speedup is achieved.
    Type: Grant
    Filed: August 12, 1999
    Date of Patent: September 9, 2003
    Assignee: Quickturn Design Systems, Inc.
    Inventors: William F. Beausoleil, Tak-kwong Ng, Helmut Roth, Peter Tannenbaum, N. James Tomassetti
  • Patent number: 6609189
    Abstract: The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path delays of many components in existing implementations grow quadratically with the issue width and the window size. This patent presents a novel way to reimplement these components and reduce their critical-path delay growth. It then describes an entire processor microarchitecture, called the Ultrascalar processor, that has better critical-path delay growth than existing superscalars. Most of our scalable designs are based on a single circuit, a cyclic segmented parallel prefix (cspp). We observe that processor components typically operate on a wrap-around sequence of instructions, computing some associative property of that sequence. For example, to assign an ALU to the oldest requesting instruction, each instruction in the instruction sequence must be told whether any preceding instructions are requesting an ALU.
    Type: Grant
    Filed: March 12, 1999
    Date of Patent: August 19, 2003
    Assignee: Yale University
    Inventors: Bradley C. Kuszmaul, Dana Sue Henry-Kuszmaul
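    Illustrative sketch (not taken from the patent): the ALU-assignment example in this abstract amounts to a prefix OR over a wrap-around window that starts at the oldest instruction. The Python below is a sequential functional model only; the patent describes a parallel circuit, and the function names here are hypothetical.

```python
def cyclic_segmented_prefix_or(requests, oldest):
    """For each slot, report whether any older instruction requests an ALU.

    requests: list of bools, one per slot of the wrap-around window.
    oldest:   index of the oldest instruction (where the segment starts).
    """
    n = len(requests)
    older_requesting = [False] * n
    seen = False  # OR of the requests of all instructions older than this one
    for step in range(n):
        slot = (oldest + step) % n  # walk from oldest to youngest, wrapping
        older_requesting[slot] = seen
        seen = seen or requests[slot]
    return older_requesting


def grant_single_alu(requests, oldest):
    """Grant one ALU to the oldest requesting instruction, if there is one."""
    older = cyclic_segmented_prefix_or(requests, oldest)
    return [req and not blocked for req, blocked in zip(requests, older)]
```

    With requests [True, False, True, False] and the oldest instruction in slot 2, only slot 2 is granted, because slot 0 sees an older requester once the sequence wraps around.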
  • Patent number: 6609247
    Abstract: A method and an apparatus for re-creating a trace of instructions from an emulated instruction set when running on hardware optimized for a different instruction set, such as IA-32 instructions running on an IA-64 machine, are disclosed. An execution trace buffer is created that maintains desired information about instructions as they are executed and retired. The invention may be configured such that certain desired information helpful to debugging the system may be written to the buffer as the instructions are retired. This information may include the addresses of sequential or branch instructions, or other relevant information that can be gathered continuously and non-intrusively as instructions are executed. The information may be read from the buffer and output in a machine-visible form at the user's convenience.
    Type: Grant
    Filed: February 18, 2000
    Date of Patent: August 19, 2003
    Assignee: Hewlett-Packard Development Company
    Inventors: Anuj Dua, Russell Clarence Brockmann, Susith Rohana Fernando, Kevin David Safford
  • Publication number: 20030145173
    Abstract: A method of parallel hardware-based multithreaded processing is described. The method includes assigning tasks for packet processing to programming engines and establishing pipelines between programming stages, which correspond to the programming engines. The method also includes establishing contexts for the assigned tasks on the programming engines and using a software controlled cache such as a CAM to transfer data between next neighbor registers residing in the programming engines.
    Type: Application
    Filed: January 25, 2002
    Publication date: July 31, 2003
    Inventors: Hugh M. Wilkinson, Mark B. Rosenbluth, Matthew J. Adiletta, Debra Bernstein, Gilbert Wolrich
  • Patent number: 6591360
    Abstract: A method and apparatus that generates a simplified, localized version (“a local stall”) of a global stall to improve the performance of a pipelined microprocessor. The local stall is generated when a data-dependency hazard is detected for a local consumer. Utilizing circuitry used in the pipelined microprocessor's data-forwarding circuitry, the local stall is generated with a relatively minor increase in circuitry. The local stall is generated much sooner than the global stall, arriving much sooner in a local pipeline. The local pipeline utilizes the local stall to override the global stall, when appropriate, and to ensure that correct data is read for a local consumer and to operate more efficiently than a standard pipeline without a local stall.
    Type: Grant
    Filed: January 18, 2000
    Date of Patent: July 8, 2003
    Assignee: Hewlett-Packard Development Company
    Inventors: Donald C. Soltis, Jr., Rohit Bhatia, Mark Gibson
  • Publication number: 20030126589
    Abstract: A method and apparatus for a reduction operation is described. A method may be utilized that includes receiving a first program unit in a parallel computing environment, the first program unit may include a reduction operation to be performed and translating the first program unit into a second program unit, the second program unit may associate the reduction operation with a set of one or more low-level instructions that may, in part, perform the reduction operation.
    Type: Application
    Filed: January 2, 2002
    Publication date: July 3, 2003
    Inventors: David K. Poulsen, Sanjiv M. Shah, Paul M. Petersen, Grant E. Haab, Jay P. Hoeflinger
  • Patent number: 6578135
    Abstract: A method and apparatus for improving the performance of a superscalar, superpipelined processor by identifying and processing instructions for performing addressing operations is provided. The invention heuristically determines instructions likely to perform addressing operations and assigns those instructions to specialized pipes in a pipeline structure. The invention can assign such instructions to both an execute pipe and a load/store pipe to avoid the occurrence of “bubbles” in the event execution of the instruction requires the calculation capability of the execute pipe. The invention can also examine a sequence of instructions to identify an instruction for performing a calculation where the result of the calculation is used by a succeeding load or store instruction. In this case, the invention controls the pipeline to assure the result of the calculation is available for the succeeding load or store instruction even if both instructions are being processed concurrently.
    Type: Grant
    Filed: January 11, 2000
    Date of Patent: June 10, 2003
    Assignee: Broadcom Corporation
    Inventors: Dan Dobberpuhl, Robert Stepanian
  • Patent number: 6574725
    Abstract: A processor architecture containing multiple closely coupled processors in a form of symmetric multiprocessing system is provided. The special coupling mechanism allows it to speculatively execute multiple threads in parallel very efficiently. Generally, the operating system is responsible for scheduling various threads of execution among the available processors in a multiprocessor system. One problem with parallel multithreading is that the overhead involved in scheduling the threads for execution by the operating system is such that shorter segments of code cannot efficiently take advantage of parallel multithreading. Consequently, potential performance gains from parallel multithreading are not attainable. Additional circuitry is included in a form of symmetrical multiprocessing system which enables the scheduling and speculative execution of multiple threads on multiple processors without the involvement and inherent overhead of the operating system.
    Type: Grant
    Filed: November 1, 1999
    Date of Patent: June 3, 2003
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Uwe Kranich, David S. Christie
  • Patent number: 6567839
    Abstract: A system and method for performing computer processing operations in a data processing system includes a multithreaded processor and thread switch logic. The multithreaded processor is capable of switching between two or more threads of instructions which can be independently executed. Each thread has a corresponding state in a thread state register depending on its execution status. The thread switch logic contains a thread switch control register to store the conditions upon which a thread switch can occur. Upon the occurrence of a thread switch event, the state and priority of all threads are dynamically interrogated to determine which thread should be the active thread executing on the processor. The thread switch logic has a time-out register which forces a thread switch when execution of the active thread in the multithreaded processor exceeds a programmable period of time.
    Type: Grant
    Filed: October 23, 1997
    Date of Patent: May 20, 2003
    Assignee: International Business Machines Corporation
    Inventors: John Michael Borkenhagen, Richard James Eickemeyer, William Thomas Flynn, Sheldon Bernard Levenstein, Andrew Henry Wottreng
  • Patent number: 6560692
    Abstract: The data processing circuit of this invention enables efficient description and execution of processes that act upon the stack pointer, using short instructions. It also enables efficient description of processes that save and restore the contents of registers, increasing the speed of processing of interrupts and subroutine calls and returns. A CPU that uses this data processing circuit comprises a dedicated stack pointer register SP and uses an instruction decoder to decode a group of dedicated stack pointer instructions that specify the SP as an implicit operand. This group of dedicated stack pointer instructions is implemented in hardware by using general-purpose registers, the PC, the SP, an address adder, an ALU, a PC incrementer, internal buses, internal signal lines, and external buses.
    Type: Grant
    Filed: May 20, 1997
    Date of Patent: May 6, 2003
    Assignee: Seiko Epson Corporation
    Inventors: Makoto Kudo, Satoshi Kubota, Yoshiyuki Miyayama, Hisao Sato
  • Patent number: 6553480
    Abstract: A group completion table (GCT) that manages the execution of instruction groups having more than one executable instruction is disclosed. The GCT includes a plurality of table entries, wherein each of the table entries is associated with a respective instruction group. Each table entry in the GCT includes a plurality of instruction completion identifiers, wherein each of the instruction completion identifiers corresponds to a specific instruction in the associated instruction group. The table entry also includes a trouble identifier that is utilized to flag the occurrence of any exception condition encountered in the execution of any instruction in the instruction group. In a related embodiment, the trouble identifier utilized in the table entry is a single bit.
    Type: Grant
    Filed: November 5, 1999
    Date of Patent: April 22, 2003
    Assignee: International Business Machines Corporation
    Inventors: Hoichi Cheong, Hung Qui Le
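    Illustrative sketch (not taken from the patent): a table entry with one completion flag per instruction plus a single trouble bit can be modeled as below. The class and method names are hypothetical, and the rule in can_complete (all instructions finished and no trouble flagged) is an assumption made for illustration; the abstract does not state the completion policy.

```python
class GroupCompletionTable:
    """Toy model of a table with one entry per instruction group."""

    def __init__(self):
        # group id -> {"done": [bool, ...], "trouble": bool}
        self.entries = {}

    def allocate(self, group_id, num_instructions):
        """Create an entry with one completion flag per instruction."""
        self.entries[group_id] = {
            "done": [False] * num_instructions,
            "trouble": False,  # single bit shared by the whole group
        }

    def finish(self, group_id, index, exception=False):
        """Mark one instruction finished; latch the trouble bit on exception."""
        entry = self.entries[group_id]
        entry["done"][index] = True
        entry["trouble"] = entry["trouble"] or exception

    def can_complete(self, group_id):
        """Assumed policy: complete only if all finished and no trouble."""
        entry = self.entries[group_id]
        return all(entry["done"]) and not entry["trouble"]
```

    Because the trouble bit is shared by the group, the sketch records that some instruction hit an exception but not which one; locating the faulting instruction would need a separate mechanism.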
  • Publication number: 20030070060
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Application
    Filed: October 30, 2002
    Publication date: April 10, 2003
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Publication number: 20030065905
    Abstract: A parallel computation processor being capable of high-speed loop operation. When instruction decoders decode the VLOOP instruction, which triggers loop operation, an instruction buffer starts storing normal instructions. The instruction buffer dispatches a VLIW instruction composed of n pieces of normal instructions to execution units each time n pieces of instructions are stored therein. The execution units concurrently execute the instructions. After all instructions comprised in a loop have been stored in the buffer and once dispatched as VLIW instructions to be executed, the loop is executed repeatedly.
    Type: Application
    Filed: September 26, 2002
    Publication date: April 3, 2003
    Applicant: NEC CORPORATION
    Inventor: Daiji Ishii
  • Patent number: 6542986
    Abstract: A superscalar processor may issue multiple instructions per clock cycle. Included in a superscalar processor may be a reorder buffer which stores information corresponding to concurrently dispatched instructions. Dependencies may exist among the instructions which are concurrently dispatched. To resolve this dependency, when a dependency is detected amongst a group of concurrently dispatched instructions, an indication of the dependency, along with an indication of the position of the dependency, is conveyed to the corresponding reservation station. When the reservation station receives the indication of the dependency, the operand tag associated with the dependency may be replaced with the correct tag. Advantageously, the circuitry needed to resolve the dependency may be moved out of the critical path of the processor, thus improving the performance of the processor by allowing it to operate at an increased frequency.
    Type: Grant
    Filed: November 9, 1999
    Date of Patent: April 1, 2003
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Scott A. White
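    Illustrative sketch (not taken from the patent): the two steps in this abstract, detecting a dependency inside a concurrently dispatched group and later replacing the stale operand tag, can be modeled as below. The function names, the tuple layout of an instruction, and the tag strings are hypothetical.

```python
def find_intra_group_dependencies(group):
    """Find source operands produced by an earlier instruction in the group.

    group: list of (dest_register, [source_registers]) in program order.
    Returns {(consumer_index, source_position): producer_index}, using the
    nearest earlier producer of each source register.
    """
    deps = {}
    for j, (_, sources) in enumerate(group):
        for s, reg in enumerate(sources):
            for i in range(j - 1, -1, -1):  # nearest earlier producer wins
                if group[i][0] == reg:
                    deps[(j, s)] = i
                    break
    return deps


def fix_operand_tags(operand_tags, deps, group_tags):
    """Replace tags looked up before dispatch with in-group producer tags.

    operand_tags: per-instruction lists of tags read before the group
                  dispatched (stale wherever an in-group dependency exists).
    group_tags:   tag assigned to each instruction's result in this group.
    """
    fixed = [list(tags) for tags in operand_tags]
    for (consumer, position), producer in deps.items():
        fixed[consumer][position] = group_tags[producer]
    return fixed
```

    Splitting detection from correction mirrors the abstract's point that the tag replacement can happen later, at the reservation station, rather than during the initial tag lookup.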
  • Patent number: 6539471
    Abstract: Method and apparatus for reducing or eliminating retirement logic in an out-of-order processor are disclosed. Instructions are processed using a processing unit capable of out-of-order processing and having architectural registers having an architectural state. Groups of instructions are prepared for processing by the processing unit, wherein within each group to be processed the instructions producing the final state of an architectural register are changed so that they write to an output copy of the architectural state, the instructions reading architectural registers are changed to read from an input copy of the architectural state, and the instructions within each group producing results to architectural registers that would be overwritten by another instruction in the group are changed to write their results to temporary registers.
    Type: Grant
    Filed: December 23, 1998
    Date of Patent: March 25, 2003
    Assignee: Intel Corporation
    Inventor: Gad S. Sheaffer
  • Patent number: 6526499
    Abstract: The present invention discloses a method and apparatus for implementing a senior load instruction type. An instruction requesting a memory reference is decoded. The decoded instruction is then dispatched to a memory ordering unit. The instruction is retired from a load buffer and is executed after retiring.
    Type: Grant
    Filed: January 10, 2001
    Date of Patent: February 25, 2003
    Assignee: Intel Corporation
    Inventors: Salvador Palanca, Shekoufeh Qawami, Niranjan L. Cooray, Angad Narang, Subramaniam Maiyuran
  • Patent number: 6522934
    Abstract: A process control system includes a controller that executes a control routine which performs a series of unit procedures within a process. The control routine is written or created to specify the class of unit to be used for each unit procedure, but not the actual unit itself. At the start of each unit procedure of the control routine, a dynamic unit selection routine selects a particular unit as the unit to be used during operation of that unit procedure. When called, the dynamic unit selection routine determines a set of possible units to be used, determines if each of the set of possible units is suitable for use during that unit procedure of the control routine based on a suitability criterion, prioritizes the units that meet the suitability criterion based on a priority criterion and selects the particular unit from the prioritized list of suitable units in order of priority.
    Type: Grant
    Filed: July 2, 1999
    Date of Patent: February 18, 2003
    Assignee: Fisher-Rosemount Systems, Inc.
    Inventors: William G. Irwin, David L. Deitz
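    Illustrative sketch (not taken from the patent): the selection steps in this abstract (gather candidates, filter by a suitability criterion, order by a priority criterion, pick from the ordered list) can be written as a short Python function. The function name, the callback signatures, and the convention that a lower priority value is preferred are assumptions.

```python
def select_unit(candidates, is_suitable, priority):
    """Pick the highest-priority suitable unit, or None if none qualifies.

    candidates:  iterable of units belonging to the required unit class.
    is_suitable: callable(unit) -> bool, the suitability criterion.
    priority:    callable(unit) -> sortable key; lower is preferred here.
    """
    suitable = [unit for unit in candidates if is_suitable(unit)]
    ranked = sorted(suitable, key=priority)  # stable: ties keep input order
    return ranked[0] if ranked else None


# Hypothetical usage: choose a reactor that is idle and large enough.
reactors = [
    {"name": "R1", "idle": False, "capacity": 500},
    {"name": "R2", "idle": True, "capacity": 800},
    {"name": "R3", "idle": True, "capacity": 600},
]
chosen = select_unit(
    reactors,
    is_suitable=lambda r: r["idle"] and r["capacity"] >= 550,
    priority=lambda r: r["capacity"],  # prefer the smallest adequate unit
)
```

    Passing the two criteria as callbacks keeps the control routine tied to a unit class rather than to a specific unit, which is the separation the abstract describes.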
  • Patent number: 6515759
    Abstract: A printer provides additional read/write memory for image processing by operating with stored programs in compressed form. When needed for execution, instructions of a compressed program are expanded by a decompression circuit on the fly in an instruction cache. In a preferred embodiment, the instruction cache includes dynamic random access memory (DRAM). Further, the processor for executing the expanded instructions, the decompression circuit, and the instruction cache are integrated together on the same chip. A printer having a processor for formatting incoming data in a page description language (PDL), for example, executes the instructions of the PDL interpreter program from the cache while the PDL program as a whole is stored in compressed format in off-chip ROM or received from an external computer (downloaded) into off-chip RAM.
    Type: Grant
    Filed: August 1, 2000
    Date of Patent: February 4, 2003
    Assignee: Hewlett-Packard Company
    Inventor: Kenneth K. Smith
  • Publication number: 20030005260
    Abstract: A register renaming system for out-of-order execution of a set of reduced instruction set computer instructions having addressable source and destination register fields, adapted for use in a computer having an instruction execution unit with a register file accessed by read address ports and for storing instruction operands. A data dependence check circuit is included for determining data dependencies between the instructions. A tag assignment circuit generates one or more tags to specify the location of operands, based on the data dependencies determined by the data dependence check circuit. A set of register file port multiplexers select the tags generated by the tag assignment circuit and pass the tags onto the read address ports of the register file for storing execution results.
    Type: Application
    Filed: March 1, 2002
    Publication date: January 2, 2003
    Inventors: Sanjiv Garg, Kevin Ray Iadonato, Le Trong Nguyen, Johannes Wang