Superscalar Patents (Class 712/23)

Arithmetic processing architecture having a portion of general-purpose registers directly coupled to a plurality of memory banks

Patent number: 6725355

Abstract: A microprocessor having an internal memory for storing data to be processed, a data pointer register for storing an address in the internal memory, a decoder 36 for decoding an instruction, a general-purpose register module 11 including data registers r0 and r1 for storing data read from the internal-memory address held in the data pointer register in response to a request to read data stored in the internal memory, and an ALU 13 for performing processing on data stored in the general-purpose register module 11 based on the result of decoding by the decoder 36 and writing the result of processing to the general-purpose register module 11.

Type: Grant

Filed: August 11, 1998

Date of Patent: April 20, 2004

Assignee: Sony Corporation

Inventor: Yoshihiko Imamura
Vector processor architecture and methods performed therein

Publication number: 20040073773

Abstract: A novel vector processor architecture, and hardware and processing features associated therewith, provide both vector processing and superscalar processing features.

Type: Application

Filed: August 6, 2003

Publication date: April 15, 2004

Inventor: Victor Demjanenko
Method and apparatus for improving dispersal performance in a processor through the use of no-op ports

Patent number: 6721873

Abstract: A method and apparatus for improving dispersal performance of instruction threads is described. In one embodiment, the dispersal logic determines whether the instructions supplied to it include any NOP instructions. When a NOP instruction is detected, the dispersal logic places the NOP into a no-op port for execution. All other instructions are distributed to the proper execution pipes in a normal manner. Because the NOP instructions do not use the execution resources of other instructions, all instruction threads can be executed in one cycle.
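
The dispersal rule above can be modeled in a few lines. This is a minimal sketch, not Intel's logic: it assumes each instruction is a hypothetical `(op, unit)` pair, that NOPs are typed by the pipe class they would otherwise occupy, and that `pipes` gives the number of pipes of each class. It counts how many dispersal cycles a bundle needs with and without a dedicated no-op port.

```python
from collections import Counter

def cycles_needed(bundle, pipes, use_noop_port=True):
    """Cycles needed to disperse `bundle`, a list of (op, unit) pairs.

    `pipes` maps a unit class (e.g. "mem", "alu") to how many pipes of that
    class exist. With a no-op port, NOPs go there and consume no pipe.
    """
    demand = Counter()
    for op, unit in bundle:
        if op == "nop" and use_noop_port:
            continue  # sent to the no-op port, not to an execution pipe
        demand[unit] += 1
    # Each unit class needs ceil(demand / pipes) cycles; the bundle needs the max.
    return max((-(-n // pipes[unit]) for unit, n in demand.items()), default=1)

bundle = [("ld", "mem"), ("nop", "mem"), ("add", "alu")]
pipes = {"mem": 1, "alu": 1}
print(cycles_needed(bundle, pipes))                       # 1
print(cycles_needed(bundle, pipes, use_noop_port=False))  # 2
```

Without the port, the typed NOP competes with the load for the single memory pipe and forces a second cycle.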

Type: Grant

Filed: December 29, 2000

Date of Patent: April 13, 2004

Assignee: Intel Corporation

Inventors: Sailesh Kottapalli, Udo Walterscheidt, Andrew Sun, Thomas Yeh, Kinkee Sit
Method and apparatus for performing addressing operations in a superscalar, superpipelined processor

Patent number: 6718458

Abstract: A method and apparatus for improving the performance of a superscalar, superpipelined processor by identifying and processing instructions for performing addressing operations is provided. The invention heuristically determines instructions likely to perform addressing operations and assigns those instructions to specialized pipes in a pipeline structure. The invention can assign such instructions to both an execute pipe and a load/store pipe to avoid the occurrence of “bubbles” in the event execution of the instruction requires the calculation capability of the execute pipe. The invention can also examine a sequence of instructions to identify an instruction for performing a calculation where the result of the calculation is used by a succeeding load or store instruction. In this case, the invention controls the pipeline to assure the result of the calculation is available for the succeeding load or store instruction even if both instructions are being processed concurrently.

Type: Grant

Filed: March 27, 2003

Date of Patent: April 6, 2004

Assignee: Broadcom Corporation

Inventors: Dan Dobberpuhl, Robert Stepanian
Multiple job signals per processing unit in a multiprocessing system

Patent number: 6714961

Abstract: The invention is directed toward a multiprocessing system having multiple processing units. For at least one of the processing units in the multiprocessing system, a first job signal is assigned to the processing unit for speculative execution of a corresponding first job, and a further job signal is assigned to the processing unit for speculative execution of a corresponding further job. The speculative execution of said further job is initiated when the processing unit has completed execution of the first job. If desirable, even more job signals may be assigned to the processing unit for speculative execution. In this way, multiple job signals are assigned to the processing units of the processing system, and the processing units are allowed to execute a plurality of jobs speculatively while waiting for commit priority.

Type: Grant

Filed: November 12, 1999

Date of Patent: March 30, 2004

Assignee: Telefonaktiebolaget LM Ericsson (publ)

Inventors: Per Anders Holmberg, Terje Egeland, Nils Ola Linnermark, Karl Oscar Joachim Strömbergson, Magnus Carlsson
Process for running programs on processors and corresponding processor system

Publication number: 20040059894

Abstract: The program to be executed is compiled by translating it into native instructions of the instruction-set architecture of the processor system, organizing the instructions deriving from the translation of the program into respective bundles in an order of successive bundles, each bundle grouping together instructions adapted to be executed in parallel by the processor system. The bundles of instructions are ordered into respective sub-bundles, said sub-bundles identifying a first set of instructions, which must be executed before the instructions belonging to the next bundle of said order, and a second set of instructions, which can be executed both before and in parallel with respect to the instructions belonging to said subsequent bundle of said order.

Type: Application

Filed: July 1, 2003

Publication date: March 25, 2004

Applicant: STMicroelectronics S.r.l.

Inventors: Fabrizio Simone Rovati, Antonio Maria Borneo, Danilo Pietro Pau
System and method for detecting data hazards within an instruction group of a compiled computer program

Patent number: 6711670

Abstract: A superscalar processing system that detects data hazards within instruction groups utilizes a memory, a plurality of pipelines, an instruction dispersal unit (IDU), and a control mechanism. The memory includes a plurality of entries that respectively correspond with a plurality of registers. The IDU receives an instruction group that includes a plurality of instructions and transmits the instructions of the instruction group to the plurality of pipelines. The control mechanism analyzes one of the instructions and identifies an entry in the memory that corresponds with a register associated with the one instruction. The control mechanism then analyzes the entry and transmits a warning signal in response to a determination that the entry indicates that another instruction within the instruction group is associated with the register.
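
The per-register memory check can be sketched as a table with one entry per register that is cleared for each new group. This is an illustrative model, not HP's circuit: the `(dest, srcs)` instruction format and the 128-register default are assumptions, and the "warning signal" is simply a returned list.

```python
def find_group_hazards(group, num_regs=128):
    """Flag registers an instruction uses that an earlier instruction in the
    same group already writes.

    `group` is a list of (dest, srcs) tuples; dest may be None.
    Returns a list of (instruction_index, register) warnings.
    """
    written = [False] * num_regs  # one entry per register, reset for each group
    warnings = []
    for i, (dest, srcs) in enumerate(group):
        used = set(srcs)
        if dest is not None:
            used.add(dest)
        for reg in sorted(used):
            if written[reg]:
                warnings.append((i, reg))  # the "warning signal"
        if dest is not None:
            written[dest] = True
    return warnings

group = [(1, [2, 3]), (4, [1, 5]), (6, [7])]
print(find_group_hazards(group))  # [(1, 1)]
```

The content-addressable-memory variant (patent 6651164, later in this list) performs the same check by searching stored register identifiers instead of indexing a per-register table.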

Type: Grant

Filed: October 14, 1999

Date of Patent: March 23, 2004

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Donald Charles Soltis, Jr., Ronny Lee Arnold
Method and apparatus for multi-mode fencing in a microprocessor system

Patent number: 6708269

Abstract: In a multi-threaded system, such as in a multi-processor system, different types of fences are provided to force completion of programmatically earlier instructions in a program. The types of fences can be thread-specific, and different types of fences are used based on different kinds of conditions, instructions, operations, or memory types. When a fence is executed, senior stores, request buffers, bus queues, or any combination of these stages in an execution pipeline can be drained. Fetches at a front end of the pipeline can also be killed to ensure that the bus queue can be drained.

Type: Grant

Filed: December 30, 1999

Date of Patent: March 16, 2004

Assignee: Intel Corporation

Inventors: Keshavan K. Tiruvallur, Douglas M. Carmean, Robert J. Greiner, Muntaquim Chowdhury, Madhavan Parthasarathy
Method for compacting an instruction queue

Patent number: 6704856

Abstract: A method of compacting an instruction queue in an out-of-order processor includes determining the number of invalid instructions below and including each row in the queue, by counting invalid bits or validity indicators associated with rows below and up to the current row. For each row, multiplexor select signals are generated from the flat vector counts for the N rows above and including the present row, and from the validity indicators associated with the N rows, where N is a predetermined value. A multiplexor associated with a particular row selects one of the N rows according to the select value, and moves or passes the instruction held in the selected row to the present row. A row's select value is determined by forming a diagonal from the N count vectors corresponding to the N rows above and including the present row, and logically ANDing each diagonal bit with the valid bit associated with the same row. Each row's count vector is determined in two stages.
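
The core arithmetic, that each valid entry moves down by the count of invalid rows at or below it, can be sketched with a prefix sum. This simplified model compacts the whole queue in one step; the patent bounds each move to N rows per cycle through the per-row multiplexors, which is omitted here. Row 0 is assumed to be the bottom (oldest) row and `None` marks an invalid row.

```python
from itertools import accumulate

def compact(queue):
    """queue[0] is the bottom row; None marks an invalid row."""
    invalid = [1 if entry is None else 0 for entry in queue]
    invalid_upto = list(accumulate(invalid))  # invalid rows at or below each row
    out = [None] * len(queue)
    for row, entry in enumerate(queue):
        if entry is not None:
            # A valid row is not itself counted, so this is the invalid count below it.
            out[row - invalid_upto[row]] = entry
    return out

print(compact(["a", None, "b", None, None, "c"]))
# ['a', 'b', 'c', None, None, None]
```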

Type: Grant

Filed: December 17, 1999

Date of Patent: March 9, 2004

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: James A. Farrell, Timothy C. Fischer, Daniel L. Leibholz, Bruce A. Gieseke
Store load forward predictor training

Patent number: 6694424

Abstract: A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC.
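
The train/untrain behavior described above can be sketched as a direct-mapped table indexed by load PC. The class name, table size and number of stored store-PC bits are hypothetical parameters chosen for illustration; the same abstract also appears for patents 6651161 and 6622237 later in this list.

```python
class StlfPredictor:
    """Toy store-to-load-forward predictor: a table indexed by load PC whose
    entries hold a portion of the PC of a store that interfered with the load."""

    def __init__(self, entries=256, store_pc_bits=10):
        self.entries = entries
        self.mask = (1 << store_pc_bits) - 1  # keep only part of the store PC
        self.table = [None] * entries

    def _index(self, load_pc):
        return load_pc % self.entries

    def predict(self, load_pc):
        """Partial store PC the load should wait for, or None if independent."""
        return self.table[self._index(load_pc)]

    def train(self, load_pc, store_pc):
        """Called when the load executed early and the store interfered."""
        self.table[self._index(load_pc)] = store_pc & self.mask

    def untrain(self, load_pc):
        """Called when a predicted dependency did not actually occur."""
        self.table[self._index(load_pc)] = None

p = StlfPredictor()
p.train(0x400, 0x1234)
print(hex(p.predict(0x400)))  # 0x234
p.untrain(0x400)
print(p.predict(0x400))       # None
```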

Type: Grant

Filed: January 3, 2000

Date of Patent: February 17, 2004

Assignee: Advanced Micro Devices, Inc.

Inventors: James B. Keller, Thomas S. Green, Wei-Han Lien, Ramsey W. Haddad
Loading previously dispatched slots in multiple instruction dispatch buffer before dispatching remaining slots for parallel execution

Patent number: 6691221

Abstract: A computing system has first and second instruction storing circuits, each instruction storing circuit storing N instructions for parallel output. An instruction dispatch circuit, coupled to the first instruction storing circuit dispatches L instructions stored in the first instruction storing circuit, wherein L is less than or equal to N. An instruction loading circuit, coupled to the instruction dispatch circuit and to the first and second instruction storing circuits, loads L instructions from the second instruction storing circuit into the first instruction storing circuit after the L instructions are dispatched from the first instruction storing circuit and before further instructions are dispatched from the first instruction storing circuit. The instruction loading circuit loads the L instructions from the second instruction storing circuit into the positions previously occupied by the L instructions dispatched from the first instruction storing circuit.
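
The refill step can be sketched with two N-slot lists. The abstract says the freed positions of the first buffer are reloaded from the second buffer; it does not say which slots of the second buffer are used, so this hypothetical model assumes slot i of the second buffer refills slot i of the first, which keeps program order when the first buffer is read circularly.

```python
def refill_dispatched(first, second, dispatched_slots):
    """first, second: N-slot instruction buffers (lists of equal length).
    dispatched_slots: indices of `first` just dispatched (L <= N of them).
    Each freed slot is reloaded from the same slot of `second` (assumption)."""
    if len(first) != len(second):
        raise ValueError("buffers must have the same number of slots")
    new_first = list(first)
    for slot in dispatched_slots:
        new_first[slot] = second[slot]
    return new_first

first = ["i0", "i1", "i2", "i3"]
second = ["i4", "i5", "i6", "i7"]
new_first = refill_dispatched(first, second, [0, 1])
print(new_first)  # ['i4', 'i5', 'i2', 'i3']
# Reading circularly from the oldest remaining slot preserves program order.
print([new_first[(2 + k) % 4] for k in range(4)])  # ['i2', 'i3', 'i4', 'i5']
```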

Type: Grant

Filed: May 24, 2001

Date of Patent: February 10, 2004

Assignees: Mips Technologies, Inc., Kabushiki Kaisha Toshiba

Inventors: Chandra Joshi, Paul Rodman, Peter Hsu, Monica R. Nofal
Program translator and processor

Publication number: 20040019766

Abstract: Multiple instructions that specify equivalent operations but designate different execution units are stored beforehand in an instruction exchange table. First, a primary compiler compiles a source program into a set of machine-readable instructions. From that set, an instruction parallelizer generates a set of long instruction words. Specifically, an instruction identifier matches one of the instructions in the set with one of the instructions stored in the instruction exchange table. An instruction replacer then replaces that instruction with another instruction from the table that specifies an equivalent operation but targets a different execution unit. In this manner, the number of instructions that can execute in parallel is increased and the number of no-operation instructions is reduced, producing an instruction set with a higher level of parallelism.

Type: Application

Filed: July 18, 2003

Publication date: January 29, 2004

Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.

Inventor: Kenichi Kawaguchi
Apparatus for mapping instructions using a set of valid and invalid logical to physical register assignments indicated by bits of a valid vector together with a logical register list

Patent number: 6675288

Abstract: A technique for managing register assignments. The technique involves maintaining, in a register list memory circuit having entries that respectively correspond to physical registers, a list of register assignments that assign logical registers to the physical registers. The technique further involves maintaining, in a vector memory circuit having bits that respectively correspond to the physical registers, a valid vector that forms, in combination with the list of register assignments, a list of valid register assignments. Furthermore, the technique involves storing, for an instruction that is mapped by the data processor, a copy of the valid vector from the vector memory circuit to a silo memory circuit. Preferably, the processor using the technique has the ability to execute branches of instructions speculatively, and to recover if it is determined that the processor executed down an incorrect instruction branch.
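
The register list, valid vector and silo can be sketched as follows. This is a simplified model, not the HP design: the class and method names are hypothetical, physical registers are allocated from a plain free list, and returning abandoned registers to that list after recovery is omitted.

```python
class RegisterMapper:
    """Register-list + valid-vector renaming with a silo of valid-vector copies."""

    def __init__(self, num_logical, num_physical):
        # Register list: entry p names the logical register mapped to physical p.
        self.reg_list = list(range(num_logical)) + [None] * (num_physical - num_logical)
        # Valid vector: bit p says whether reg_list[p] is a current assignment.
        self.valid = [p < num_logical for p in range(num_physical)]
        self.free = list(range(num_logical, num_physical))
        self.silo = {}  # instruction tag -> copy of the valid vector

    def lookup(self, logical):
        for p, (lr, ok) in enumerate(zip(self.reg_list, self.valid)):
            if ok and lr == logical:
                return p
        raise KeyError(logical)

    def map_dest(self, tag, logical):
        """Give `logical` a new physical register for instruction `tag`."""
        self.silo[tag] = list(self.valid)  # checkpoint before the change
        old = self.lookup(logical)
        new = self.free.pop(0)
        self.reg_list[new] = logical
        self.valid[old] = False
        self.valid[new] = True
        return new

    def recover(self, tag):
        """Restore the mapping in effect just before instruction `tag`."""
        # Returning abandoned physical registers to the free list is omitted.
        self.valid = list(self.silo[tag])

m = RegisterMapper(num_logical=4, num_physical=8)
m.map_dest(tag=1, logical=2)  # r2 -> p4
m.map_dest(tag=2, logical=2)  # r2 -> p5 (e.g. on a speculative path)
m.recover(tag=2)              # misprediction: undo tag 2 and younger
print(m.lookup(2))            # 4
```

Because only the valid vector is checkpointed, recovery costs one vector copy rather than a copy of the whole map.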

Type: Grant

Filed: May 9, 2002

Date of Patent: January 6, 2004

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: James Arthur Farrell, Sharon Marie Britton, Harry Ray Fair, III, Bruce Gieseke, Daniel Lawrence Leibholz, Derrick R. Meyer
Method of executing an interpreter program

Patent number: 6658655

Abstract: A threaded interpreter (916) is suitable for executing a program comprising a series of program instructions stored in a memory (904). For the execution of a program instruction the threaded interpreter includes a preparatory unit (918) for executing a plurality of preparatory steps making the program instruction available in the threaded interpreter, and an execution unit (920) with one or more machine instructions emulating the program instruction. According to the invention, the threaded interpreter is designed such that during the execution on an instruction-level parallel processor of the series of program instructions machine instructions implementing a first one of the preparatory steps are executed in parallel with machine instructions implementing a second one of the preparatory steps for respective ones of the series of program instructions.

Type: Grant

Filed: December 6, 1999

Date of Patent: December 2, 2003

Assignee: Koninklijke Philips Electronics N.V.

Inventors: Jan Hoogerbrugge, Alexander Augusteijn
Pipelined asynchronous processing

Patent number: 6658550

Abstract: An asynchronous processor having pipelined instruction fetching and execution to implement concurrent execution of instructions by two or more execution units. A writeback unit connected to the execution units and memory units controls information updates and handles precise exceptions. A pipelined completion mechanism can be implemented to improve throughput.

Type: Grant

Filed: April 30, 2002

Date of Patent: December 2, 2003

Assignee: California Institute of Technology

Inventors: Alain J. Martin, Andrew Lines, Rajit Manohar, Uri Cummings, Mika Nystroem
Floating point register stack management for CISC

Patent number: 6651159

Abstract: A floating point register stack for a processor combines a plurality of two general purpose registers to form a register stack for x86 instructions and leaves the remaining general purpose registers for native instructions of the processor. By mapping x86 sources into the stack of two general purpose registers and operating x86 instructions on the x86 stack, the register stack for the processor is able to support both the processor's native instruction set and the x86 instruction set without increasing the size of the register stack.

Type: Grant

Filed: November 29, 1999

Date of Patent: November 18, 2003

Assignee: ATI International SRL

Inventors: Tiruvur R. Ramesh, Sanjay Mansingh, Korbin Van Dyke
Store load forward predictor untraining

Patent number: 6651161

Abstract: A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC.

Type: Grant

Filed: January 3, 2000

Date of Patent: November 18, 2003

Assignee: Advanced Micro Devices, Inc.

Inventors: James B. Keller, Thomas S. Green, Wei-Han Lien, Ramsey W. Haddad
System and method for detecting an erroneous data hazard between instructions of an instruction group and resulting from a compiler grouping error

Patent number: 6651164

Abstract: A superscalar processing system that detects data hazards within instruction groups transmitted to the processing system utilizes a content-addressable memory, a plurality of pipelines, an instruction dispersal unit (IDU), and a control mechanism. The IDU receives an instruction group that includes a plurality of instructions and transmits the instructions of the instruction group to the plurality of pipelines. The control mechanism stores register identifiers of the instructions in the content-addressable memory and determines whether a register identifier of one of the instructions is stored in the content-addressable memory. When the register identifier of the one instruction is stored in the content-addressable memory, the control mechanism transmits a warning signal indicating that one of the instruction groups contained a data hazard.

Type: Grant

Filed: October 14, 1999

Date of Patent: November 18, 2003

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Donald Charles Soltis, Jr., Ronny Lee Arnold
Scaleable microprocessor architecture

Publication number: 20030212878

Abstract: A scaleable microprocessor architecture has an efficient and orthogonal instruction set of 20 basic instructions, and a scaleable program word size from 15 bits up, including but not limited to 16, 24, 32, and 64 bits. As many instructions are packed into a single program word as the size of the program word allows. An integral return stack is used for nested subroutine calls and returns. An integral data stack is also used to pass parameters among nested subroutines. The simplified instruction set and the dual-stack architecture make it possible to execute all instructions in a single clock cycle from a single-phase master clock. Additional instructions can be added to facilitate accessing arrays in memory, for multiplication and division of integers, for real-time interrupts, and to support a UART I/O device. This scaleable microprocessor architecture greatly increases code density and processing speed while significantly decreasing silicon area and power consumption.
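
The word-packing idea can be checked with simple arithmetic. The abstract does not give the opcode width; this sketch assumes 5 bits, the smallest field that can encode 20 opcodes, and assumes opcode 0 is a no-op used to pad unused slots. Both are illustrative assumptions, not the patent's encoding.

```python
OPCODE_BITS = 5  # assumption: 2**5 = 32 >= 20 basic instructions

def slots_per_word(word_bits):
    return word_bits // OPCODE_BITS

def pack(opcodes, word_bits):
    """Pack opcodes into program words, first opcode in the most significant
    slot. Unused trailing slots are filled with 0 (assumed no-op)."""
    n = slots_per_word(word_bits)
    words = []
    for start in range(0, len(opcodes), n):
        chunk = opcodes[start:start + n]
        word = 0
        for i in range(n):
            op = chunk[i] if i < len(chunk) else 0
            if not 0 <= op < 20:
                raise ValueError(f"opcode {op} outside the 20-instruction set")
            word = (word << OPCODE_BITS) | op
        words.append(word)
    return words

print([slots_per_word(b) for b in (15, 16, 24, 32, 64)])  # [3, 3, 4, 6, 12]
print(pack([1, 2, 3, 4], 15))  # [1091, 4096]
```

Under this assumption, widening the word from 16 to 64 bits raises the instructions fetched per word from 3 to 12, which is where the claimed code-density gain would come from.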

Type: Application

Filed: May 7, 2002

Publication date: November 13, 2003

Inventor: Chen-Hanson Ting
Information processing apparatus for entertainment system utilizing DMA-controlled high-speed transfer and processing of routine data

Patent number: 6647486

Abstract: Routine processing of routine data, non-routine processing of routine data, and general non-routine processing are all to be handled efficiently. To this end, a main CPU has a CPU core with a parallel computational mechanism, an instruction cache and a data cache as ordinary cache units, and a scratch-pad memory (SPR), an internal high-speed memory capable of direct memory access (DMA) and suited to routine processing. A floating-point vector processor (VPE), which has its own DMA-capable high-speed internal memory (VU-MEM), is tightly coupled to the main CPU as a co-processor. A DMA controller (DMAC) controls DMA transfers between main memory and the SPR, between main memory and the VU-MEM, and between the VU-MEM and the SPR.

Type: Grant

Filed: May 22, 2002

Date of Patent: November 11, 2003

Assignee: Sony Computer Entertainment Inc.

Inventor: Akio Ohba
High-performance, superscalar-based computer system with out-of-order instruction execution

Patent number: 6647485

Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed-length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling them for out-of-order execution by the functional units. The register file includes a set of temporary data registers that the instruction control unit uses to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in order.
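
In-order retirement from temporary registers can be sketched with an ordered map keyed by program sequence number. The class and method names are hypothetical; results may arrive in any order, but a result reaches the architectural register file only once every older instruction has completed.

```python
from collections import OrderedDict

_PENDING = object()  # marks a result that has not been produced yet

class TempRegisterFile:
    def __init__(self):
        self.temp = OrderedDict()  # seq -> [dest, value], in program order
        self.arch = {}             # architectural register file

    def issue(self, seq, dest):
        self.temp[seq] = [dest, _PENDING]

    def complete(self, seq, value):
        self.temp[seq][1] = value  # may happen out of program order

    def retire(self):
        """Retire from the oldest instruction until one has not finished."""
        retired = []
        while self.temp:
            seq, (dest, value) = next(iter(self.temp.items()))
            if value is _PENDING:
                break  # an older instruction is still executing
            self.arch[dest] = value
            self.temp.popitem(last=False)
            retired.append(seq)
        return retired

t = TempRegisterFile()
t.issue(0, "r1")
t.issue(1, "r2")
t.complete(1, 7)   # the younger instruction finishes first
print(t.retire())  # []
t.complete(0, 3)
print(t.retire())  # [0, 1]
print(t.arch)      # {'r1': 3, 'r2': 7}
```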

Type: Grant

Filed: May 10, 2001

Date of Patent: November 11, 2003

Assignee: Seiko Epson Corporation

Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
Speculative pre-fetching additional line on cache miss if no request pending in out-of-order processor

Patent number: 6643766

Abstract: Speculative pre-fetching and pre-flushing of additional cache lines minimize cache miss latency and coherency check latency of an out of order instruction execution processor. A pre-fetch/pre-flush slot (DPRESLOT) is provided in a memory queue (MQUEUE) of the out-of-order execution processor. The DPRESLOT monitors the transactions between a system interface, e.g., the system bus, and an address reorder buffer slot (ARBSLOT) and/or between the system interface and a cache coherency check slot (CCCSLOT). When a cache miss is detected, the DPRESLOT causes one or more cache lines in addition to the data line, which caused the current cache miss, to be pre-fetched from the memory hierarchy into the cache memory (DCACHE) in anticipation that the additional data would be required in the near future.
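
The prefetch policy in the title (fetch an extra line on a miss only if no other request is pending) can be sketched as a toy cache. The 64-byte line size, the choice of the next sequential line as the extra line, and the `pending_requests` argument are assumptions for illustration; pre-flushing and the coherency-check path are not modeled.

```python
LINE_BYTES = 64  # assumed cache line size

class PrefetchingCache:
    def __init__(self, extra_lines=1):
        self.lines = set()   # line addresses currently cached
        self.fetched = []    # memory requests issued, in order
        self.extra_lines = extra_lines

    def _fetch(self, line):
        self.fetched.append(line)
        self.lines.add(line)  # toy model: the fill completes immediately

    def access(self, addr, pending_requests=0):
        """Return True on a hit. On a miss, fetch the missing line and, if no
        other request is pending, speculatively fetch the following lines."""
        line = addr // LINE_BYTES * LINE_BYTES
        if line in self.lines:
            return True
        self._fetch(line)
        if pending_requests == 0:
            for k in range(1, self.extra_lines + 1):
                nxt = line + k * LINE_BYTES
                if nxt not in self.lines:
                    self._fetch(nxt)
        return False

c = PrefetchingCache()
print(c.access(0x100))  # False: miss, also prefetches 0x140
print(c.access(0x140))  # True: served by the prefetched line
print(c.fetched)        # [256, 320]
```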

Type: Grant

Filed: May 4, 2000

Date of Patent: November 4, 2003

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Gregg B Lesartre, David Jerome Johnson
Processor with registers storing committed/speculative data and a RAT state history recovery mechanism with retire pointer

Patent number: 6633970

Abstract: A mechanism is provided for allowing a processor to recover from a failure of a predicted path of instructions (e.g., from a mispredicted branch or other event). The mechanism includes a plurality of physical registers, each physical register can store either architectural data or speculative data. The apparatus also includes a primary array to store a mapping from logical registers to physical registers, the primary array storing a speculative state of the processor. The apparatus also includes a buffer coupled to the primary array to store information identifying which physical registers store architectural data and which physical registers store speculative data. According to another embodiment, a history buffer is coupled to the secondary array and stores historical physical register to logical register mappings performed for each of a plurality of instructions part of a predicted path.

Type: Grant

Filed: December 28, 1999

Date of Patent: October 14, 2003

Assignee: Intel Corporation

Inventors: David W. Clift, Darrell D. Boggs, David J. Sager
Instruction cache associative crossbar switch

Publication number: 20030191923

Abstract: A computing system is described in which individual instructions are executable in parallel by processing pipelines, and instructions to be executed in parallel by different pipelines are supplied to the pipelines simultaneously. The system includes storage for storing an arbitrary number of the instructions to be executed. The instructions to be executed are tagged with pipeline identification tags indicating the pipeline to which each should be dispatched. The pipeline identification tags are supplied to a system that controls a crossbar switch, so that the tags control the switch and supply the appropriate instructions simultaneously to the different pipelines.
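
Tag-driven routing through the crossbar can be sketched as follows: each instruction's pipeline tag acts as the select for one crossbar output. The `(tag, instruction)` pair format and the error on a tag collision are assumptions of this illustration.

```python
def crossbar_dispatch(tagged, num_pipelines):
    """tagged: list of (pipeline_tag, instruction) pairs read from storage.
    Returns one input per pipeline for this cycle (None = idle)."""
    outputs = [None] * num_pipelines
    for tag, instruction in tagged:
        if outputs[tag] is not None:
            raise ValueError(f"two instructions tagged for pipeline {tag}")
        outputs[tag] = instruction  # the tag selects the crossbar output
    return outputs

print(crossbar_dispatch([(2, "ld"), (0, "add"), (1, "fmul")], 4))
# ['add', 'fmul', 'ld', None]
```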

Type: Application

Filed: April 9, 1998

Publication date: October 9, 2003

Inventors: Howard G. Sachs, Siamak Arya
Instruction pipeline with a branch prefetch when the branch is certain

Patent number: 6631464

Abstract: An instruction fetch control system prefetches a branch instruction in a pipeline system and fetches a branch target instruction of the branch instruction. The control system comprises a first branch judgement circuit for conducting a branch condition judgement in a stage prior to the branch judgement stage in which a second and original branch judgement of the branch instruction is conducted, and a circuit for starting a prefetch of instructions following said branch target instruction without waiting for the branch judgement stage where the first branch judgement circuit judges that the branch is successful.

Type: Grant

Filed: June 10, 1993

Date of Patent: October 7, 2003

Assignee: Fujitsu Limited

Inventors: Tsuyoshi Mori, Seishi Okada
Memory shared between processing threads

Patent number: 6631462

Abstract: A method includes pushing a datum onto a stack by a first processor and popping the datum off the stack by a second processor.
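
As a software analogy only (the patent concerns processors sharing memory, not Python threads), the push-by-one, pop-by-another pattern can be sketched with a stack guarded by a condition variable. The class name is hypothetical.

```python
import threading

class SharedStack:
    """A stack one thread pushes onto and another pops from."""

    def __init__(self):
        self._items = []
        self._cond = threading.Condition()

    def push(self, datum):
        with self._cond:
            self._items.append(datum)
            self._cond.notify()  # wake a waiting consumer

    def pop(self):
        with self._cond:
            while not self._items:  # block until a datum is available
                self._cond.wait()
            return self._items.pop()

stack = SharedStack()
popped = []
consumer = threading.Thread(target=lambda: popped.extend(stack.pop() for _ in range(5)))
consumer.start()
for i in range(5):
    stack.push(i)
consumer.join()
print(sorted(popped))  # [0, 1, 2, 3, 4]
```

The pop order depends on thread timing because the structure is LIFO, so the example sorts the results before printing.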

Type: Grant

Filed: January 5, 2000

Date of Patent: October 7, 2003

Assignee: Intel Corporation

Inventors: Gilbert Wolrich, Matthew J. Adiletta, William Wheeler, Daniel Cutter, Debra Bernstein
Copied register files for data processors having many execution units

Patent number: 6629232

Abstract: Interconnect-dominated large register files are reduced in chip area and delay time. A register file in a processor having a number of execution units is divided into multiple copies. Different groups of execution units can read from and write to their own copy of the register file through a set of local read and write ports. All of the register-file copies are synchronized by writing data from the execution units to remote write ports in at least some registers in other copies of the register file. Each copy can be divided into local and global registers. While all copies of the global registers continue to be written by the remote write ports, the local registers can be written only by a local cluster of execution units. Alternatively or additionally, all of the execution units can write to their local register-file copy, but only some of the units can write the global registers in all copies of the register file.

Type: Grant

Filed: July 3, 2000

Date of Patent: September 30, 2003

Assignee: Intel Corporation

Inventors: Ken Arora, Harshvardhan Sharangpani, Rajiv Gupta
Method and apparatus for fault handling in computer systems

Patent number: 6625726

Abstract: A method and apparatus for fault handling in computer systems. In one embodiment, a first register is used to store an address which points to the top of a stack. The address stored in the first register may be updated during the execution of an instruction. A second register may be used to store the address previously held in the first register. The contents of the second register may be kept unchanged until the retirement of the instruction that is currently executing. If a fault occurs during execution of the instruction, a microcode fault handler may perform routines that clear the fault or the conditions that led to it. The microcode fault handler may also copy the contents of the second register back into the first register. Execution of the instruction may be restarted from the operation just prior to when the fault occurred. The program from which the instruction originated may then continue to run.
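
The two-register scheme can be sketched with a hypothetical multi-word push. The `Cpu` fields, the 4-byte downward-growing stack, and the `fault_at` hook are assumptions for illustration. As a simplification, the sketch restarts the whole instruction rather than resuming from the operation just before the fault.

```python
from dataclasses import dataclass, field

class StackFault(Exception):
    pass

@dataclass
class Cpu:
    sp: int              # first register: current top of stack
    saved_sp: int = 0    # second register: sp before the current instruction
    memory: dict = field(default_factory=dict)

def multi_push(cpu, values, fault_at=None):
    """Hypothetical instruction that pushes several words, updating sp as it
    goes. Returns True if it retired, False if it faulted and must restart."""
    cpu.saved_sp = cpu.sp            # held unchanged until retirement
    try:
        for i, v in enumerate(values):
            cpu.sp -= 4              # assumed: 4-byte words, downward-growing stack
            if i == fault_at:
                raise StackFault(hex(cpu.sp))
            cpu.memory[cpu.sp] = v
    except StackFault:
        cpu.sp = cpu.saved_sp        # fault handler copies second register back
        return False
    cpu.saved_sp = cpu.sp            # retirement commits the new stack pointer
    return True

cpu = Cpu(sp=0x1000)
print(multi_push(cpu, [1, 2, 3], fault_at=1), hex(cpu.sp))  # False 0x1000
print(multi_push(cpu, [1, 2, 3]), hex(cpu.sp))              # True 0xff4
```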

Type: Grant

Filed: June 2, 2000

Date of Patent: September 23, 2003

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael T. Clark, Scott A. White
Non-stalling circular counterflow pipeline processor with reorder buffer

Publication number: 20030177340

Abstract: A system and method of executing instructions within a counterflow pipeline processor. The counterflow pipeline processor includes an instruction pipeline, a data pipeline, a reorder buffer and a plurality of execution units. An instruction and one or more operands issue into the instruction pipeline, and a determination is made at one of the execution units whether the instruction is ready for execution. If so, the operands are loaded into the execution unit and the instruction executes. The execution unit is monitored for a result and, when the result arrives, it is stored into the result pipeline. If the instruction reaches the end of the pipeline without executing, it wraps around and is sent down the instruction pipeline again.

Type: Application

Filed: March 18, 2003

Publication date: September 18, 2003

Applicant: Intel Corporation

Inventors: Kenneth J. Janik, Shih-Lien L. Lu, Michael F. Miller
Store to load forward predictor training using delta tag

Patent number: 6622237

Abstract: A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC.

Type: Grant

Filed: January 3, 2000

Date of Patent: September 16, 2003

Assignee: Advanced Micro Devices, Inc.

Inventors: James B. Keller, Thomas S. Green, Wei-Han Lien, Ramsey W. Haddad, Keith R. Schakel
Scheduler which retries load/store hit situations

Patent number: 6622235

Abstract: A scheduler issues memory operations without regard to whether or not resources are available to handle each possible execution outcome of that memory operation. The scheduler also retains the memory operation after issuance. If a condition occurs which prevents correct execution of the memory operation, the memory operation is retried. The scheduler subsequently reschedules and reissues the memory operation in response to the retry. Additionally, the scheduler may receive a retry type indicating the reason for retry. Certain retry types may indicate a delayed reissuance of the memory operation until the occurrence of a subsequent event. In response to such retry types, the scheduler monitors for the subsequent event and delays reissuance until the event is detected. The scheduler may include a physical address buffer to detect a load memory operation which incorrectly issued prior to an older store memory operation on which it depends.

Type: Grant

Filed: January 3, 2000

Date of Patent: September 16, 2003

Assignee: Advanced Micro Devices, Inc.

Inventors: James B. Keller, Ramsey W. Haddad, Stephan G. Meier
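The retain-and-retry behavior in the scheduler abstract above can be illustrated with a toy model. This is a sketch under stated assumptions: the class and method names are hypothetical, a retry type is reduced to an optional event name, and the physical address buffer is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemOp:
    name: str
    issued: bool = False
    waiting_for: Optional[str] = None  # event that must occur before reissue


class RetryingScheduler:
    """Hypothetical model of a scheduler that keeps memory ops after issue."""

    def __init__(self) -> None:
        self.ops: List[MemOp] = []  # ops stay here after issue until they complete

    def add(self, op: MemOp) -> None:
        self.ops.append(op)

    def issue_ready(self) -> List[str]:
        """Issue every retained op that is neither issued nor waiting."""
        issued = []
        for op in self.ops:
            if not op.issued and op.waiting_for is None:
                op.issued = True
                issued.append(op.name)
        return issued

    def retry(self, name: str, wait_event: Optional[str] = None) -> None:
        """Return an issued op to the pool; wait_event delays reissue until it occurs."""
        op = self._find(name)
        op.issued = False
        op.waiting_for = wait_event

    def signal(self, event: str) -> None:
        """Release every op whose delayed reissue was waiting on this event."""
        for op in self.ops:
            if op.waiting_for == event:
                op.waiting_for = None

    def complete(self, name: str) -> None:
        self.ops.remove(self._find(name))

    def _find(self, name: str) -> MemOp:
        return next(op for op in self.ops if op.name == name)
```

For example, after `retry("ld1", "fill_buffer_free")` (a hypothetical event name), `ld1` is skipped by `issue_ready()` until `signal("fill_buffer_free")` is called, while an op retried with no event is reissued on the next call.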
Clustered processors in an emulation engine

Patent number: 6618698

Abstract: Clusters of processors are interconnected as an emulation engine such that processors share input and data stacks, and the setup and storing of results are done in parallel, but the output of one evaluation unit is connected to the input of the next evaluation unit. A set of ‘cascade’ connections provides access to the intermediate values. By tapping intermediate values from one processor, and feeding them to the next, a significant emulation speedup is achieved.

Type: Grant

Filed: August 12, 1999

Date of Patent: September 9, 2003

Assignee: Quickturn Design Systems, Inc.

Inventors: William F. Beausoleil, Tak-kwong Ng, Helmut Roth, Peter Tannenbaum, N. James Tomassetti
Cycle segmented prefix circuits

Patent number: 6609189

Abstract: The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path delays of many components in existing implementations grow quadratically with the issue width and the window size. This patent presents a novel way to reimplement these components and reduce their critical-path delay growth. It then describes an entire processor microarchitecture, called the Ultrascalar processor, that has better critical-path delay growth than existing superscalars. Most of our scalable designs are based on a single circuit, a cyclic segmented parallel prefix (cspp). We observe that processor components typically operate on a wrap-around sequence of instructions, computing some associative property of that sequence. For example, to assign an ALU to the oldest requesting instruction, each instruction in the instruction sequence must be told whether any preceding instructions are requesting an ALU.

Type: Grant

Filed: March 12, 1999

Date of Patent: August 19, 2003

Assignee: Yale University

Inventors: Bradley C. Kuszmaul, Dana Sue Henry-Kuszmaul
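The ALU-assignment example in the Ultrascalar abstract above can be made concrete with a serial reference for a cyclic segmented prefix. This is a sketch only: the function name is hypothetical, and a parallel prefix circuit would compute the same outputs with a tree of logarithmic depth rather than this O(n) loop. Here the oldest instruction's slot is the segment start, and the operator is logical OR over "requests an ALU".

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def cspp(values: List[T], seg_start: List[bool],
         op: Callable[[T, T], T], identity: T) -> List[T]:
    """Serial reference for a cyclic segmented exclusive prefix.

    out[i] combines, in order, the values from the nearest segment start at or
    before i (walking backwards around the ring) up to but not including i.
    A segment start itself receives the identity.
    """
    n = len(values)
    if n != len(seg_start) or not any(seg_start):
        raise ValueError("need one flag per value and at least one segment start")
    out = [identity] * n
    k = seg_start.index(True)      # begin the sweep at a segment boundary
    acc = identity
    for step in range(n):
        i = (k + step) % n
        if seg_start[i]:
            acc = identity         # the prefix restarts at each segment start
        out[i] = acc
        acc = op(acc, values[i])
    return out


# Hypothetical 8-slot instruction window whose oldest instruction sits in slot 5.
requests = [True, False, False, True, False, False, True, False]
oldest = 5
prior = cspp(requests, [i == oldest for i in range(len(requests))],
             lambda a, b: a or b, False)
# An instruction gets the ALU if it requests one and no older instruction does.
granted = [r and not p for r, p in zip(requests, prior)]
```

Only slot 6 is granted: it is the first requester after slot 5 in wrap-around order, even though slots 0 and 3 have lower indices.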
Method and apparatus for re-creating the trace of an emulated instruction set when executed on hardware native to a different instruction set field

Patent number: 6609247

Abstract: A method and an apparatus for re-creating a trace of instructions from an emulated instruction set when running on hardware optimized for a different instruction set, such as IA-32 instructions running on an IA-64 machine, are disclosed. An execution trace buffer is created that maintains desired information about instructions as they are executed and retired. The invention may be configured such that certain desired information helpful to debugging the system may be written to the buffer as the instructions are retired. This information may include the addresses of sequential or branch instructions, or other relevant information that can be gathered continuously and non-intrusively as instructions are executed. The information may be read from the buffer and output in a machine-visible form at the user's convenience.

Type: Grant

Filed: February 18, 2000

Date of Patent: August 19, 2003

Assignee: Hewlett-Packard Development Company

Inventors: Anuj Dua, Russell Clarence Brockmann, Susith Rohana Fernando, Kevin David Safford
Context pipelines

Publication number: 20030145173

Abstract: A method of parallel hardware-based multithreaded processing is described. The method includes assigning tasks for packet processing to programming engines and establishing pipelines between programming stages, which correspond to the programming engines. The method also includes establishing contexts for the assigned tasks on the programming engines and using a software controlled cache such as a CAM to transfer data between next neighbor registers residing in the programming engines.

Type: Application

Filed: January 25, 2002

Publication date: July 31, 2003

Inventors: Hugh M. Wilkinson, Mark B. Rosenbluth, Matthew J. Adiletta, Debra Bernstein, Gilbert Wolrich
Local stall/hazard detect in superscalar, pipelined microprocessor

Patent number: 6591360

Abstract: A method and apparatus that generates a simplified, localized version (“a local stall”) of a global stall to improve the performance of a pipelined microprocessor. The local stall is generated when a data-dependency hazard is detected for a local consumer. Because it reuses circuitry from the pipelined microprocessor's data-forwarding logic, the local stall requires only a relatively minor increase in circuitry. The local stall is generated, and arrives in a local pipeline, much sooner than the global stall. The local pipeline uses the local stall to override the global stall when appropriate, ensuring that correct data is read for a local consumer and operating more efficiently than a standard pipeline without a local stall.

Type: Grant

Filed: January 18, 2000

Date of Patent: July 8, 2003

Assignee: Hewlett-Packard Development Company

Inventors: Donald C. Soltis, Jr., Rohit Bhatia, Mark Gibson
Providing parallel computing reduction operations

Publication number: 20030126589

Abstract: A method and apparatus for a reduction operation is described. The method may include receiving a first program unit in a parallel computing environment, where the first program unit may include a reduction operation to be performed, and translating the first program unit into a second program unit. The second program unit may associate the reduction operation with a set of one or more low-level instructions that may, in part, perform the reduction operation.

Type: Application

Filed: January 2, 2002

Publication date: July 3, 2003

Inventors: David K. Poulsen, Sanjiv M. Shah, Paul M. Petersen, Grant E. Haab, Jay P. Hoeflinger
Method and apparatus for performing addressing operations in a superscalar superpipelined processor

Patent number: 6578135

Abstract: A method and apparatus for improving the performance of a superscalar, superpipelined processor by identifying and processing instructions for performing addressing operations is provided. The invention heuristically determines instructions likely to perform addressing operations and assigns those instructions to specialized pipes in a pipeline structure. The invention can assign such instructions to both an execute pipe and a load/store pipe to avoid the occurrence of “bubbles” in the event execution of the instruction requires the calculation capability of the execute pipe. The invention can also examine a sequence of instructions to identify an instruction for performing a calculation where the result of the calculation is used by a succeeding load or store instruction. In this case, the invention controls the pipeline to assure the result of the calculation is available for the succeeding load or store instruction even if both instructions are being processed concurrently.

Type: Grant

Filed: January 11, 2000

Date of Patent: June 10, 2003

Assignee: Broadcom Corporation

Inventors: Dan Dobberpuhl, Robert Stepanian
Method and mechanism for speculatively executing threads of instructions

Patent number: 6574725

Abstract: A processor architecture containing multiple closely coupled processors in the form of a symmetric multiprocessing system is provided. The special coupling mechanism allows it to speculatively execute multiple threads in parallel very efficiently. Generally, the operating system is responsible for scheduling various threads of execution among the available processors in a multiprocessor system. One problem with parallel multithreading is that the overhead the operating system incurs in scheduling threads for execution is so large that shorter segments of code cannot efficiently take advantage of parallel multithreading. Consequently, potential performance gains from parallel multithreading are not attainable. Additional circuitry included in the symmetric multiprocessing system enables the scheduling and speculative execution of multiple threads on multiple processors without the involvement and inherent overhead of the operating system.

Type: Grant

Filed: November 1, 1999

Date of Patent: June 3, 2003

Assignee: Advanced Micro Devices, Inc.

Inventors: Uwe Kranich, David S. Christie
Thread switch control in a multithreaded processor system

Patent number: 6567839

Abstract: A system and method for performing computer processing operations in a data processing system includes a multithreaded processor and thread switch logic. The multithreaded processor is capable of switching between two or more threads of instructions which can be independently executed. Each thread has a corresponding state in a thread state register depending on its execution status. The thread switch logic contains a thread switch control register to store the conditions upon which a thread switch can occur. Upon the occurrence of a thread switch event, the state and priority of all threads are dynamically interrogated to determine which thread should be the active thread executing in the processor. The thread switch logic has a time-out register which forces a thread switch when execution of the active thread in the multithreaded processor exceeds a programmable period of time.

Type: Grant

Filed: October 23, 1997

Date of Patent: May 20, 2003

Assignee: International Business Machines Corporation

Inventors: John Michael Borkenhagen, Richard James Eickemeyer, William Thomas Flynn, Sheldon Bernard Levenstein, Andrew Henry Wottreng
Data processing circuit, microcomputer, and electronic equipment

Patent number: 6560692

Abstract: The data processing circuit of this invention enables efficient description and execution of processes that act upon the stack pointer, using short instructions. It also enables efficient description of processes that save and restore the contents of registers, increasing the speed of processing of interrupts and subroutine calls and returns. A CPU that uses this data processing circuit comprises a dedicated stack pointer register SP and uses an instruction decoder to decode a group of dedicated stack pointer instructions that specify the SP as an implicit operand. This group of dedicated stack pointer instructions is implemented in hardware using general-purpose registers, the PC, the SP, an address adder, an ALU, a PC incrementer, internal buses, internal signal lines, and external buses.

Type: Grant

Filed: May 20, 1997

Date of Patent: May 6, 2003

Assignee: Seiko Epson Corporation

Inventors: Makoto Kudo, Satoshi Kubota, Yoshiyuki Miyayama, Hisao Sato
System and method for managing the execution of instruction groups having multiple executable instructions

Patent number: 6553480

Abstract: A group completion table (GCT) that manages the execution of instruction groups having more than one executable instruction is disclosed. The GCT includes a plurality of table entries, wherein each of the table entries is associated with a respective instruction group. Each table entry in the GCT includes a plurality of instruction completion identifiers, wherein each of the instruction completion identifiers corresponds to a specific instruction in the associated instruction group. The table entry also includes a trouble identifier that is utilized to flag the occurrence of any exception condition encountered in the execution of any instruction in the instruction group. In a related embodiment, the trouble identifier utilized in the table entry is a single bit.

Type: Grant

Filed: November 5, 1999

Date of Patent: April 22, 2003

Assignee: International Business Machines Corporation

Inventors: Hoichi Cheong, Hung Qui Le
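The table entry described in the GCT abstract above (per-instruction completion identifiers plus a single trouble bit) can be modeled as follows. This is a sketch under stated assumptions: the class and method names are hypothetical, groups complete strictly in program order, and what the processor does when the trouble bit is set is not modeled.

```python
from typing import Dict, Optional, Tuple


class GroupCompletionTable:
    """Hypothetical GCT model: one entry per instruction group, with a completion
    bit per instruction and a single trouble bit for any exception in the group."""

    def __init__(self) -> None:
        # dict keeps insertion order, which stands in for program order of groups
        self.entries: Dict[int, dict] = {}

    def allocate(self, group_id: int, size: int) -> None:
        self.entries[group_id] = {"done": [False] * size, "trouble": False}

    def finish(self, group_id: int, slot: int, exception: bool = False) -> None:
        entry = self.entries[group_id]
        entry["done"][slot] = True
        if exception:
            entry["trouble"] = True  # one bit covers the whole group

    def try_complete_oldest(self) -> Optional[Tuple[int, bool]]:
        """Complete the oldest group once all its instructions have finished.

        Returns (group_id, trouble), or None if the oldest group is still in flight."""
        if not self.entries:
            return None
        group_id = next(iter(self.entries))
        entry = self.entries[group_id]
        if not all(entry["done"]):
            return None
        del self.entries[group_id]
        return group_id, entry["trouble"]
```

Tracking one entry per group rather than per instruction is what keeps the table small; the single trouble bit only says that some instruction in the group raised an exception, not which one.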
High-performance, superscalar-based computer system with out-of-order instruction execution

Publication number: 20030070060

Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.

Type: Application

Filed: October 30, 2002

Publication date: April 10, 2003

Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
Parallel computation processor, parallel computation control method and program thereof

Publication number: 20030065905

Abstract: A parallel computation processor capable of high-speed loop operation. When the instruction decoders decode the VLOOP instruction, which triggers loop operation, an instruction buffer starts storing normal instructions. The instruction buffer dispatches a VLIW instruction composed of n normal instructions to the execution units each time n instructions are stored in it. The execution units execute these instructions concurrently. After all instructions in the loop have been stored in the buffer and dispatched once as VLIW instructions, the loop is executed repeatedly.

Type: Application

Filed: September 26, 2002

Publication date: April 3, 2003

Applicant: NEC CORPORATION

Inventor: Daiji Ishii
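The buffering scheme in the NEC abstract above can be traced with a small function. This is a minimal sketch, assuming the loop body arrives as a list, a partial final bundle is padded with no-ops (an assumption; the abstract does not say how a remainder is handled), and later iterations replay the buffered bundles without decoding again. The function name is hypothetical.

```python
from typing import List, Tuple


def run_vloop(loop_body: List[str], n: int, iterations: int) -> List[Tuple[str, ...]]:
    """Return the sequence of n-wide VLIW bundles dispatched for the loop."""
    bundles: List[Tuple[str, ...]] = []
    pending: List[str] = []
    # First pass: decoded instructions fill the buffer; every n form one bundle.
    for insn in loop_body:
        pending.append(insn)
        if len(pending) == n:
            bundles.append(tuple(pending))
            pending = []
    if pending:  # assumption: pad a partial final bundle with no-ops
        bundles.append(tuple(pending + ["nop"] * (n - len(pending))))
    # The first iteration dispatches each bundle once; the remaining
    # iterations are served from the buffer.
    return bundles * iterations
```

For example, a five-instruction body on two execution units yields the bundles `(a, b)`, `(c, d)`, `(e, nop)`, and two iterations dispatch that sequence twice.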
Resolving dependencies among concurrently dispatched instructions in a superscalar microprocessor

Patent number: 6542986

Abstract: A superscalar processor may issue multiple instructions per clock cycle. Included in a superscalar processor may be a reorder buffer which stores information corresponding to concurrently dispatched instructions. Dependencies may exist among the instructions which are concurrently dispatched. To resolve such a dependency, when one is detected among a group of concurrently dispatched instructions, an indication of the dependency, along with an indication of its position, is conveyed to the corresponding reservation station. When the reservation station receives the indication of the dependency, the operand tag associated with the dependency may be replaced with the correct tag. Advantageously, the circuitry needed to resolve the dependency may be moved out of the critical path of the processor, thus improving the performance of the processor by allowing it to operate at an increased frequency.

Type: Grant

Filed: November 9, 1999

Date of Patent: April 1, 2003

Assignee: Advanced Micro Devices, Inc.

Inventor: Scott A. White
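The tag-replacement step in the AMD abstract above can be illustrated in software. This is a sketch under stated assumptions: the function name is hypothetical, instruction p of a group is assumed to receive reorder-buffer tag `base_tag + p`, and both steps run serially here, whereas the point of the patent is that the fix-up is done later, in the reservation station, off the critical path.

```python
from typing import Dict, List, Optional, Tuple

Insn = Tuple[str, List[str]]  # (destination register, source registers)


def dispatch_group(group: List[Insn], rename: Dict[str, int],
                   base_tag: int) -> List[List[Optional[int]]]:
    """Return source-operand tags for a concurrently dispatched group.

    rename maps a register to the reorder-buffer tag of its latest in-flight
    writer; a missing entry (None) means the value is in the register file.
    """
    # Fast path: look up every source as if the group had no internal dependencies.
    tags = [[rename.get(s) for s in srcs] for _, srcs in group]

    # Detect intra-group dependencies: (consumer, operand) -> producer position.
    fixups: Dict[Tuple[int, int], int] = {}
    for k, (_, srcs) in enumerate(group):
        for n, s in enumerate(srcs):
            for p in range(k - 1, -1, -1):  # the nearest earlier writer wins
                if group[p][0] == s:
                    fixups[(k, n)] = p
                    break

    # Reservation-station step: replace each stale tag using the conveyed position.
    for (k, n), p in fixups.items():
        tags[k][n] = base_tag + p

    # Later groups must see this group's writers.
    for p, (dest, _) in enumerate(group):
        rename[dest] = base_tag + p
    return tags
```

For the group `r2 <- r1`, `r1 <- r2, r3`, `r4 <- r1, r2` with `r1` previously mapped to tag 7 and `base_tag` 10, the fast path gives stale tags, and the fix-up rewrites the second and third instructions' operands to tags 10 and 11.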
Method and apparatus for pre-processing instructions for a processor

Patent number: 6539471

Abstract: Method and apparatus for reducing or eliminating retirement logic in an out-of-order processor are disclosed. Instructions are processed using a processing unit capable of out-of-order processing and having architectural registers with an architectural state. Groups of instructions are prepared for processing by the processing unit, wherein within each group to be processed the instructions producing the final state of an architectural register are changed so that they write to an output copy of the architectural state, the instructions reading architectural registers are changed to read from an input copy of the architectural state, and the instructions within each group producing results to architectural registers that would be overwritten by another instruction in the group are changed to write their results to temporary registers.

Type: Grant

Filed: December 23, 1998

Date of Patent: March 25, 2003

Assignee: Intel Corporation

Inventor: Gad S. Sheaffer
Method and apparatus for load buffers

Patent number: 6526499

Abstract: The present invention discloses a method and apparatus for implementing a senior load instruction type. An instruction requesting a memory reference is decoded. The decoded instruction is then dispatched to a memory ordering unit. The instruction is retired from a load buffer and is executed after retiring.

Type: Grant

Filed: January 10, 2001

Date of Patent: February 25, 2003

Assignee: Intel Corporation

Inventors: Salvador Palanca, Shekoufeh Qawami, Niranjan L. Cooray, Angad Narang, Subramaniam Maiyuran
Dynamic unit selection in a process control system

Patent number: 6522934

Abstract: A process control system includes a controller that executes a control routine which performs a series of unit procedures within a process. The control routine is written or created to specify the class of unit to be used for each unit procedure, but not the actual unit itself. At the start of each unit procedure of the control routine, a dynamic unit selection routine selects a particular unit as the unit to be used during operation of that unit procedure. When called, the dynamic unit selection routine determines a set of possible units to be used, determines if each of the set of possible units is suitable for use during that unit procedure of the control routine based on a suitability criterion, prioritizes the units that meet the suitability criterion based on a priority criterion and selects the particular unit from the prioritized list of suitable units in order of priority.

Type: Grant

Filed: July 2, 1999

Date of Patent: February 18, 2003

Assignee: Fisher-Rosemount Systems, Inc.

Inventors: William G. Irwin, David L. Deitz
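The four steps in the Fisher-Rosemount abstract above (find possible units, filter by suitability, rank by priority, select in priority order) map directly onto a short function. This is a minimal sketch, assuming suitability and priority are supplied as caller-defined functions and that "selecting in order of priority" means trying to acquire each ranked unit in turn; the `Unit` fields, the `acquire` callback and the unit names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Unit:
    name: str
    unit_class: str
    available: bool
    capacity: float  # hypothetical attribute used by the example criteria


def select_unit(units: List[Unit], unit_class: str,
                is_suitable: Callable[[Unit], bool],
                priority: Callable[[Unit], float],
                acquire: Callable[[Unit], bool] = lambda u: True) -> Optional[Unit]:
    """Bind a unit procedure, written against a unit class, to a concrete unit."""
    possible = [u for u in units if u.unit_class == unit_class]
    suitable = [u for u in possible if is_suitable(u)]
    # Highest priority first; try each in order until one can be acquired.
    for u in sorted(suitable, key=priority, reverse=True):
        if acquire(u):
            return u
    return None


plant = [Unit("REACTOR_1", "reactor", True, 500.0),
         Unit("REACTOR_2", "reactor", True, 1000.0),
         Unit("REACTOR_3", "reactor", False, 2000.0),
         Unit("MIXER_1", "mixer", True, 300.0)]
fits = lambda u: u.available and u.capacity >= 400
```

With these criteria, `select_unit(plant, "reactor", fits, lambda u: u.capacity)` skips the unavailable `REACTOR_3` and returns `REACTOR_2`; if `REACTOR_2` cannot be acquired, the next unit in priority order, `REACTOR_1`, is chosen.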
Printer having processor with instruction cache and compressed program store

Patent number: 6515759

Abstract: A printer provides additional read/write memory for image processing by operating with stored programs in compressed form. When needed for execution, instructions of a compressed program are expanded by a decompression circuit on the fly in an instruction cache. In a preferred embodiment, the instruction cache includes dynamic random access memory (DRAM). Further, the processor for executing the expanded instructions, the decompression circuit, and the instruction cache are integrated together on the same chip. A printer having a processor for formatting incoming data in a page description language (PDL), for example, executes the instructions of the PDL interpreter program from the cache while the PDL program as a whole is stored in compressed format in off-chip ROM or received from an external computer (downloaded) into off-chip RAM.

Type: Grant

Filed: August 1, 2000

Date of Patent: February 4, 2003

Assignee: Hewlett-Packard Company

Inventor: Kenneth K. Smith
Superscalar RISC instruction scheduling

Publication number: 20030005260

Abstract: A register renaming system for out-of-order execution of a set of reduced instruction set computer instructions having addressable source and destination register fields, adapted for use in a computer having an instruction execution unit with a register file accessed by read address ports and for storing instruction operands. A data dependence check circuit is included for determining data dependencies between the instructions. A tag assignment circuit generates one or more tags to specify the location of operands, based on the data dependencies determined by the data dependence check circuit. A set of register file port multiplexers select the tags generated by the tag assignment circuit and pass the tags onto the read address ports of the register file for storing execution results.

Type: Application

Filed: March 1, 2002

Publication date: January 2, 2003

Inventors: Sanjiv Garg, Kevin Ray Iadonato, Le Trong Nguyen, Johannes Wang
