Superscalar Patents (Class 712/23)
-
Patent number: 6725355Abstract: A microprocessor having an internal memory for storing data to be process, a data pointer register for storing an address on the internal memory, a decoder 36 for decoding an instruction, a general-purpose register module 11 including data registers r0 and r1 for storing data read from an address on the internal memory stored in the data pointer register in accordance with a request to read data stored in the internal memory, and an ALU 13 for performing processing using data stored in the general-purpose register module 11 based on the result of decoding by the decoder 36 and writing the result of processing in the general-purpose register module 11.Type: GrantFiled: August 11, 1998Date of Patent: April 20, 2004Assignee: Sony CorporationInventor: Yoshihiko Imamura
-
Publication number: 20040073773Abstract: A novel vector processor architecture, and hardware and processing features associated therewith, provide both vector processing and superscalar processing features.Type: ApplicationFiled: August 6, 2003Publication date: April 15, 2004Inventor: Victor Demjanenko
-
Patent number: 6721873Abstract: A method and apparatus for improving dispersal performance of instruction threads is described. In one embodiment, the dispersal logic determines whether the instructions supplied to it include any NOP instructions. When a NOP instruction is detected, the dispersal logic places the NOP into a no-op port for execution. All other instructions are distributed to the proper execution pipes in a normal manner. Because the NOP instructions do not use the execution resources of other instructions, all instruction threads can be executed in one cycle.Type: GrantFiled: December 29, 2000Date of Patent: April 13, 2004Assignee: Intel CorporationInventors: Sailesh Kottapalli, Udo Walterscheidt, Andrew Sun, Thomas Yeh, Kinkee Sit
-
Method and apparatus for performing addressing operations in a superscalar, superpipelined processor
Patent number: 6718458Abstract: A method and apparatus for improving the performance of a superscalar, superpipelined processor by identifying and processing instructions for performing addressing operations is provided. The invention heuristically determines instructions likely to perform addressing operations and assigns those instructions to specialized pipes in a pipeline structure. The invention can assign such instructions to both an execute pipe and a load/store pipe to avoid the occurrence of “bubbles” in the event execution of the instruction requires the calculation capability of the execute pipe. The invention can also examine a sequence of instructions to identify an instruction for performing a calculation where the result of the calculation is used by a succeeding load or store instruction. In this case, the invention controls the pipeline to assure the result of the calculation is available for the succeeding load or store instruction even if both instructions are being processed concurrently.Type: GrantFiled: March 27, 2003Date of Patent: April 6, 2004Assignee: Broadcom CorporationInventors: Dan Dobberpuhl, Robert Stepanian -
Patent number: 6714961Abstract: The invention is directed toward a multiprocessing system having multiple processing units. For at least one of the processing units in the multiprocessing system, a first job signal is assigned to the processing unit for speculative execution of a corresponding first job, and a further job signal is assigned to the processing unit for speculative execution of a corresponding further job. The speculative execution of said further job is initiated when the processing unit has completed execution of the first job. If desirable, even more job signals may be assigned to the processing unit for speculative execution. In this way, multiple job signals are assigned to the processing units of the processing system, and the processing units are allowed to execute a plurality of jobs speculatively while waiting for commit priority.Type: GrantFiled: November 12, 1999Date of Patent: March 30, 2004Assignee: Telefonaktiebolaget LM Ericsson (publ)Inventors: Per Anders Holmberg, Terje Egeland, Nils Ola Linnermark, Karl Oscar Joachim Strömbergson, Magnus Carlsson
-
Publication number: 20040059894Abstract: The program to be executed is compiled by translating it into native instructions of the instruction-set architecture of the processor system, organizing the instructions deriving from the translation of the program into respective bundles in an order of successive bundles, each bundle grouping together instructions adapted to be executed in parallel by the processor system. The bundles of instructions are ordered into respective sub-bundles, said sub-bundles identifying a first set of instructions, which must be executed before the instructions belonging to the next bundle of said order, and a second set of instructions, which can be executed both before and in parallel with respect to the instructions belonging to said subsequent bundle of said order.Type: ApplicationFiled: July 1, 2003Publication date: March 25, 2004Applicant: STMicroelectronics S.r.I.Inventors: Fabrizio Simone Rovati, Antonio Maria Borneo, Danilo Pietro Pau
-
Patent number: 6711670Abstract: A superscalar processing system that detects data hazards within instruction groups utilizes a memory, a plurality of pipelines, an instruction dispersal unit (IDU), and a control mechanism. The memory includes a plurality of entries that respectively correspond with a plurality of registers. The IDU receives an instruction group that includes a plurality of instructions and transmits the instructions of the instruction group to the plurality of pipelines. The control mechanism analyzes one of the instructions and identifies an entry in the memory that corresponds with a register associated with the one instruction. The control mechanism then analyzes the entry and transmits a warning signal in response to a determination that the entry indicates that another instruction within the instruction group is associated with the register.Type: GrantFiled: October 14, 1999Date of Patent: March 23, 2004Assignee: Hewlett-Packard Development Company, L.P.Inventors: Donald Charles Soltis, Jr., Ronny Lee Arnold
-
Patent number: 6708269Abstract: In a multi-threaded system, such as in a multi-processor system, different types of fences are provided to force completion of programmatically earlier instructions in a program. The types of fences can be thread-specific, and different types of fences are used based on different kinds of conditions, instructions, operations, or memory types. When a fence is executed, senior stores, request buffers, bus queues, or any combination of these stages in an execution pipeline can be drained. Fetches at a front end of the pipeline can also be killed to ensure that the bus queue can be drained.Type: GrantFiled: December 30, 1999Date of Patent: March 16, 2004Assignee: Intel CorporationInventors: Keshavan K. Tiruvallur, Douglas M. Carmean, Robert J. Greiner, Muntaquim Chowdhury, Madhavan Parthasarathy
-
Patent number: 6704856Abstract: A method of compacting an instruction queue in an out of order processor includes determining the number of invalid instructions below and including each row in the queue, by counting invalid bits or validity indicators associated with rows below and up to the current row. For each row, multiplexor select signals are generated from the flat vector counts for the N rows above and including the present row, and from the validity indicators associated with the N rows, where N is a predetermined value. A multiplexor associated with a particular row selects one of the N rows according to the select value, and moves or passes the instruction held in the selected row to the present row. A row's select value is determined by forming a diagonal from the N count vectors corresponding to the N rows above and including the present row, and logically ANDing, each diagonal bit with the valid bit associated with the same row. Each row's count vector is determined in two stages.Type: GrantFiled: December 17, 1999Date of Patent: March 9, 2004Assignee: Hewlett-Packard Development Company, L.P.Inventors: James A. Farrell, Timothy C. Fischer, Daniel L. Leibholz, Bruce A. Gieseke
-
Patent number: 6694424Abstract: A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC.Type: GrantFiled: January 3, 2000Date of Patent: February 17, 2004Assignee: Advanced Micro Devices, Inc.Inventors: James B. Keller, Thomas S. Green, Wei-Han Lien, Ramsey W. Haddad
-
Patent number: 6691221Abstract: A computing system has first and second instruction storing circuits, each instruction storing circuit storing N instructions for parallel output. An instruction dispatch circuit, coupled to the first instruction storing circuit dispatches L instructions stored in the first instruction storing circuit, wherein L is less than or equal to N. An instruction loading circuit, coupled to the instruction dispatch circuit and to the first and second instruction storing circuits, loads L instructions from the second instruction storing circuit into the first instruction storing circuit after the L instructions are dispatched from the first instruction storing circuit and before further instructions are dispatched from the first instruction storing circuit. The instruction loading circuit loads the L instructions from the second instruction storing circuit into the positions previously occupied by the L instructions dispatched from the first instruction storing circuit.Type: GrantFiled: May 24, 2001Date of Patent: February 10, 2004Assignees: Mips Technologies, Inc., Kabushiki Kaisha ToshibaInventors: Chandra Joshi, Paul Rodman, Peter Hsu, Monica R. Nofal
-
Publication number: 20040019766Abstract: Multiple instructions, specifying equivalent operations but designating different execution units, are stored beforehand on an instruction exchange table. First, a primary compiler compiles a source program into a set of machine-readable instructions. From the set of instructions, an instruction parallelizer generates a set of long instruction words. Specifically, an instruction identifier identifies one of the instructions in the set with one of the instructions stored on the instruction exchange table. Then, an instruction replacer replaces the instruction in question with another one of the instructions that is also stored on the instruction exchange table, specifies an equivalent operation but designates a different execution unit as a target. In this manner, the number of parallelly executable instructions can be increased, while the number of no-operation instructions can be reduced, thus generating a parallelized instruction set at a higher level of parallelism.Type: ApplicationFiled: July 18, 2003Publication date: January 29, 2004Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.Inventor: Kenichi Kawaguchi
-
Patent number: 6675288Abstract: A technique for managing register assignments. The technique involves maintaining, in a register list memory circuit having entries that respectively correspond to physical registers, a list of register assignments that assign logical registers to the physical registers. The technique further involves maintaining, in a vector memory circuit having bits that respectively correspond to the physical registers, a valid vector that forms, in combination with the list of register assignments, a list of valid register assignments. Furthermore, the technique involves storing, for an instruction that is mapped by the data processor, a copy of the valid vector from the vector memory circuit to a silo memory circuit. Preferably, the processor using the technique has the ability to execute branches of instructions speculatively, and to recover if it is determined that the processor executed down an incorrect instruction branch.Type: GrantFiled: May 9, 2002Date of Patent: January 6, 2004Assignee: Hewlett-Packard Development Company L.P.Inventors: James Arthur Farrell, Sharon Marie Britton, Harry Ray Fair, III, Bruce Gieseke, Daniel Lawrence Leibholz, Derrick R. Meyer
-
Patent number: 6658655Abstract: A threaded interpreter (916) is suitable for executing a program comprising a series of program instructions stored in a memory (904). For the execution of a program instruction the threaded interpreter includes a preparatory unit (918) for executing a plurality of preparatory steps making the program instruction available in the threaded interpreter, and an execution unit (920) with one or more machine instructions emulating the program instruction. According to the invention, the threaded interpreter is designed such that during the execution on an instruction-level parallel processor of the series of program instructions machine instructions implementing a first one of the preparatory steps are executed in parallel with machine instructions implementing a second one of the preparatory steps for respective ones of the series of program instructions.Type: GrantFiled: December 6, 1999Date of Patent: December 2, 2003Assignee: Koninklijke Philips Electronics N.V.Inventors: Jan Hoogerbrugge, Alexander Augusteijn
-
Patent number: 6658550Abstract: An asynchronous processor having pipelined instruction fetching and execution to implement concurrent execution of instructions by two or more execution units. A writeback unit is connected to execution units and memory units to control information updates and to handle precise exception. A pipelined completion mechanism can be implemented to improve the throughput.Type: GrantFiled: April 30, 2002Date of Patent: December 2, 2003Assignee: California Institute of TechnologyInventors: Alain J. Martin, Andrew Lines, Rajit Manohar, Uri Cummings, Mika Nystroem
-
Patent number: 6651159Abstract: A floating point register stack for a processor combines a plurality of two general purpose registers to form a register stack for x86 instructions and leaves the remaining general purpose registers for native instructions of the processor. By mapping x86 sources into the stack of two general purpose registers and operating x86 instructions on the x86 stack, the register stack for the processor is able to support both the processor's native instruction set and the x86 instruction set without increasing the size of the register stack.Type: GrantFiled: November 29, 1999Date of Patent: November 18, 2003Assignee: ATI International SRLInventors: Tiruvur R. Ramesh, Sanjay Mansingh, Korbin Van Dyke
-
Patent number: 6651161Abstract: A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC.Type: GrantFiled: January 3, 2000Date of Patent: November 18, 2003Assignee: Advanced Micro Devices, Inc.Inventors: James B. Keller, Thomas S. Green, Wei-Han Lien, Ramsey W. Haddad
-
Patent number: 6651164Abstract: A superscalar processing system that detects data hazards within instruction groups transmitted to the processing system utilizes a content-addressable memory, a plurality of pipelines, an instruction dispersal unit (IDU), and a control mechanism. The IDU receives an instruction group that includes a plurality of instructions and transmits the instructions of the instruction group to the plurality of pipelines. The control mechanism stores register identifiers of the instructions in the content-addressable memory and determines whether a register identifier of one of the instructions is stored in the content-addressable memory. When the register identifier of the one instruction is stored in the content-addressable memory, the control mechanism transmits a warning signal indicating that one of the instruction groups contained a data hazard.Type: GrantFiled: October 14, 1999Date of Patent: November 18, 2003Assignee: Hewlett-Packard Development Company, L.P.Inventors: Donald Charles Soltis, Jr., Ronny Lee Arnold
-
Publication number: 20030212878Abstract: A scaleable microprocessor architecture has an efficient and orthogonal instruction set of 20 basic instructions, and a scaleable program word size from 15 bits up, including but not limited to 16, 24, 32, and 64 bits. As many instructions are packed into a single program word as allowed by the size of a program word. An integral return stack is used for nested subroutine calls and returns. An integral data stack is also used to pass parameters among nested subroutines. The simplified instruction set and the dual stack architecture make it possible to execute all instructions in a single clock cycle from a single phase master clock. Additional instructions can be added to facilitate accessing arrays in memory, for multiplication and division of integers, for real time interrupts, and to support an UART I/O device. This scaleable microprocessor architecture greatly increases code density and processing speed while decreasing significantly silicon area and power consumption.Type: ApplicationFiled: May 7, 2002Publication date: November 13, 2003Inventor: Chen-Hanson Ting
-
Patent number: 6647486Abstract: Routine processing for routine data, non-routine processing for routine data and general non-routine processing are to be processed efficiently. To this end, a main CPU has a CPU core having a parallel computational mechanism, a command cache and a data cache as ordinary cache units, and a scratch-pad memory SPR which is an internal high-speed memory capable of performing direct memory accessing (DMA) suited for routine processing. A floating decimal point vector processor (VPE) has an internal high-speed memory (VU-MEM) capable of DMA processing and is tightly connected to the main CPU to form a co-processor. The VPE has a high-speed internal memory (VU-MEM) capable of DMA processing. The DMA controller (DMAC) controls DMA transfer between the main memory and the SPR, between the main memory and the (VU-MEM) and between the (VU-MEM) and the SPR.Type: GrantFiled: May 22, 2002Date of Patent: November 11, 2003Assignee: Sony Computer Entertainment Inc.Inventor: Akio Ohba
-
Patent number: 6647485Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.Type: GrantFiled: May 10, 2001Date of Patent: November 11, 2003Assignee: Seiko Epson CorporationInventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
-
Patent number: 6643766Abstract: Speculative pre-fetching and pre-flushing of additional cache lines minimize cache miss latency and coherency check latency of an out of order instruction execution processor. A pre-fetch/pre-flush slot (DPRESLOT) is provided in a memory queue (MQUEUE) of the out-of-order execution processor. The DPRESLOT monitors the transactions between a system interface, e.g., the system bus, and an address reorder buffer slot (ARBSLOT) and/or between the system interface and a cache coherency check slot (CCCSLOT). When a cache miss is detected, the DPRESLOT causes one or more cache lines in addition to the data line, which caused the current cache miss, to be pre-fetched from the memory hierarchy into the cache memory (DCACHE) in anticipation that the additional data would be required in the near future.Type: GrantFiled: May 4, 2000Date of Patent: November 4, 2003Assignee: Hewlett-Packard Development Company, L.P.Inventors: Gregg B Lesartre, David Jerome Johnson
-
Patent number: 6633970Abstract: A mechanism is provided for allowing a processor to recover from a failure of a predicted path of instructions (e.g., from a mispredicted branch or other event). The mechanism includes a plurality of physical registers, each physical register can store either architectural data or speculative data. The apparatus also includes a primary array to store a mapping from logical registers to physical registers, the primary array storing a speculative state of the processor. The apparatus also includes a buffer coupled to the primary array to store information identifying which physical registers store architectural data and which physical registers store speculative data. According to another embodiment, a history buffer is coupled to the secondary array and stores historical physical register to logical register mappings performed for each of a plurality of instructions part of a predicted path.Type: GrantFiled: December 28, 1999Date of Patent: October 14, 2003Assignee: Intel CorporationInventors: David W. Clift, Darrell D. Boggs, David J. Sager
-
Publication number: 20030191923Abstract: A computing system as described in which individual instructions are executable in parallel by processing pipelines, and instructions to be executed in parallel by different pipelines are supplied to the pipelines simultaneously. The system includes storage for storing an arbitrary number of the instructions to be executed. The instructions to be executed are tagged with pipeline identification tags indicative of the pipeline to which they should be dispatched. The pipeline identification tags are supplied to a system which controls a crossbar switch, enabling the tags to be used to control the switch and supply the appropriate instructions simultaneously to the differing pipelines.Type: ApplicationFiled: April 9, 1998Publication date: October 9, 2003Inventors: HOWARD G. SACHS, Siamak Arya
-
Patent number: 6631464Abstract: An instruction fetch control system prefetches a branch instruction in a pipeline system and fetches a branch target instruction of the branch instruction. The control system comprises a first branch judgement circuit for conducting a branch condition judgement in a stage prior to the branch judgement stage in which a second and original branch judgement of the branch instruction is conducted, and a circuit for starting a prefetch of instructions following said branch target instruction without waiting for the branch judgement stage where the first branch judgement circuit judges that the branch is successful.Type: GrantFiled: June 10, 1993Date of Patent: October 7, 2003Assignee: Fujitsu LimitedInventors: Tsuyoshi Mori, Seishi Okada
-
Patent number: 6631462Abstract: A method includes pushing a datum onto a stack by a first processor and popping the datum off the stack by a second processor.Type: GrantFiled: January 5, 2000Date of Patent: October 7, 2003Assignee: Intel CorporationInventors: Gilbert Wolrich, Matthew J. Adiletta, William Wheeler, Daniel Cutter, Debra Bernstein
-
Patent number: 6629232Abstract: Interconnect-dominated large register files are reduced in chip area and delay time. A register file in a processor having a number of execution units is divided into multiple copies. Different groups of execution units can read from and write to their own copy of the file registers by a set of local read and write ports. All of the register-file copies are synchronized by writing data from the execution units to remote write ports in at least some registers in other copies of the register file. Each copy can be divided into local and global registers. While all copies of the global registers continue to be written by the remote write ports, the local registers can be written only by a local cluster of execution units. Alternatively or additionally, all of the execution units can write to their local register-file copy, but only some of the units can write the global registers in all copies of the register file.Type: GrantFiled: July 3, 2000Date of Patent: September 30, 2003Assignee: Intel CorporationInventors: Ken Arora, Harshvardhan Sharangpani, Rajiv Gupta
-
Patent number: 6625726Abstract: A method and apparatus for fault handling in computer systems. In one embodiment, a first register is used to store an address which points to the top of a stack. The address stored in the first register may be updated during the execution of an instruction. A second register may be used to store an address previously first register. The contents of the second register may be kept unchanged until the retirement of the instruction that is currently executing. If a fault occurs during execution of the instruction, a microcode fault handler may perform routines that may clear the fault or those conditions which led to the fault. The microcode fault handler may also copy the contents of the second register back into the first register. Execution of the instruction may be restarted from the operation just prior to when the fault occurred. The program from which the instruction originated may then continue to run.Type: GrantFiled: June 2, 2000Date of Patent: September 23, 2003Assignee: Advanced Micro Devices, Inc.Inventors: Michael T. Clark, Scott A. White
-
Publication number: 20030177340Abstract: A system and method of executing instructions within a counterflow pipeline processor. The counterflow pipeline processor includes an instruction pipeline, a data pipeline, a reorder buffer and a plurality of execution units. An instruction and one or more operands issue into the instruction pipeline and a determination is made at one of the execution units whether the instruction is ready for execution. If so, the operands are loaded into the execution unit and the instruction executes. The execution unit is monitored for a result and, when the result arrives, it is stored into the result pipeline. If the instruction reaches the end of the pipeline without executing it wraps around and is sent down the instruction pipeline again.Type: ApplicationFiled: March 18, 2003Publication date: September 18, 2003Applicant: Intel CorporationInventors: Kenneth J. Janik, Shih-Lien L. Lu, Michael F. Miller
-
Patent number: 6622237Abstract: A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC.Type: GrantFiled: January 3, 2000Date of Patent: September 16, 2003Assignee: Advanced Micro Devices, Inc.Inventors: James B. Keller, Thomas S. Green, Wei-Han Lien, Ramsey W. Haddad, Keith R. Schakel
-
Patent number: 6622235Abstract: A scheduler issues memory operations without regard to whether or not resources are available to handle each possible execution outcome of that memory operation. The scheduler also retains the memory operation after issuance. If a condition occurs which prevents correct execution of the memory operation, the memory operation is retried. The scheduler subsequently reschedules and reissues the memory operation in response to the retry. Additionally, the scheduler may receive a retry type indicating the reason for retry. Certain retry types may indicate a delayed reissuance of the memory operation until the occurrence of a subsequent event. In response to such retry types, the scheduler monitors for the subsequent event and delays reissuance until the event is detected. The scheduler may include a physical address buffer to detect a load memory operation which incorrectly issued prior to an older store memory operation upon which it is dependent for the memory operation.Type: GrantFiled: January 3, 2000Date of Patent: September 16, 2003Assignee: Advanced Micro Devices, Inc.Inventors: James B. Keller, Ramsey W. Haddad, Stephan G. Meier
-
Patent number: 6618698Abstract: Clusters of processors are interconnected as an emulation engine such that processors share input and data stacks, and the setup and storing of results are done in parallel, but the output of one evaluation unit is connected to the input of the next evaluation unit. A set of ‘cascade’ connections provides access to the intermediate values. By tapping intermediate values from one processor, and feeding them to the next, a significant emulation speedup is achieved.Type: GrantFiled: August 12, 1999Date of Patent: September 9, 2003Assignee: Quickturn Design Systems, Inc.Inventors: William F. Beausoleil, Tak-kwong Ng, Helmut Roth, Peter Tannenbaum, N. James Tomassetti
-
Patent number: 6609189Abstract: The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path delays of many components in existing implementations grow quadratically with the issue width and the window size. This patent presents a novel way to reimplement these components and reduce their critical-path delay growth. It then describes an entire processor microarchitecture, called the Ultrascalar processor, that has better critical-path delay growth than existing superscalars. Most of our scalable designs are based on a single circuit, a cyclic segmented parallel prefix (cspp). We observe that processor components typically operate on a wrap-around sequence of instructions, computing some associative property of that sequence. For example, to assign an ALU to the oldest requesting instruction, each instruction in the instruction sequence must be told whether any preceding instructions are requesting an ALU.Type: GrantFiled: March 12, 1999Date of Patent: August 19, 2003Assignee: Yale UniversityInventors: Bradley C. Kuszmaul, Dana Sue Henry-Kuszmaul
-
Patent number: 6609247Abstract: A method and an apparatus for re-creating a trace of instructions from an emulated instruction set when running on hardware optimized for a different instruction set, such as IA-32 instructions running on an IA-64 machine, are disclosed. An execution trace buffer is created that maintains desired information about instructions as they are executed and retired. The invention may be configured such that certain desired information helpful to debugging the system may be written to the buffer as the instructions are retired. This information may include the addresses of sequential or branch instructions, or other relevant information that can be gathered continuously and non-intrusively as instructions are executed. The information may be read from the buffer and output in a machine-visible form at the user's convenience.Type: GrantFiled: February 18, 2000Date of Patent: August 19, 2003Assignee: Hewlett-Packard Development CompanyInventors: Anuj Dua, Russell Clarence Brockmann, Susith Rohana Fernando, Kevin David Safford
-
Publication number: 20030145173Abstract: A method of parallel hardware-based multithreaded processing is described. The method includes assigning tasks for packet processing to programming engines and establishing pipelines between programming stages, which correspond to the programming engines. The method also includes establishing contexts for the assigned tasks on the programming engines and using a software controlled cache such as a CAM to transfer data between next neighbor registers residing in the programming engines.Type: ApplicationFiled: January 25, 2002Publication date: July 31, 2003Inventors: Hugh M. Wilkinson, Mark B. Rosenbluth, Matthew J. Adiletta, Debra Bernstein, Gilbert Wolrich
-
Patent number: 6591360Abstract: A method and apparatus that generates a simplified, localized version (“a local stall”) of a global stall to improve the performance of a pipelined microprocessor. The local stall is generated when a data-dependency hazard is detected for a local consumer. Utilizing circuitry used in the pipelined microprocessor's data-forwarding circuitry, the local stall is generated with a relatively minor increase in circuitry. The local stall is generated much sooner than the global stall, arriving much sooner in a local pipeline. The local pipeline utilizes the local stall to override the global stall, when appropriate, and to ensure that correct data is read for a local consumer and to operate more efficiently than a standard pipeline without a local stall.Type: GrantFiled: January 18, 2000Date of Patent: July 8, 2003Assignee: Hewlett-Packard Development CompanyInventors: Donald C. Soltis, Jr., Rohit Bhatia, Mark Gibson
-
Publication number: 20030126589Abstract: A method and apparatus for a reduction operation is described. A method may be utilized that includes receiving a first program unit in a parallel computing environment, the first program unit may include a reduction operation to be performed and translating the first program unit into a second program unit, the second program unit may associate the reduction operation with a set of one or more low-level instructions that may, in part, perform the reduction operation.Type: ApplicationFiled: January 2, 2002Publication date: July 3, 2003Inventors: David K. Poulsen, Sanjiv M. Shah, Paul M. Petersen, Grant E. Haab, Jay P. Hoeflinger
-
Patent number: 6578135Abstract: A method and apparatus for improving the performance of a superscalar, superpipelined processor by identifying and processing instructions for performing addressing operations is provided. The invention heuristically determines instructions likely to perform addressing operations and assigns those instructions to specialized pipes in a pipeline structure. The invention can assign such instructions to both an execute pipe and a load/store pipe to avoid the occurrence of “bubbles” in the event execution of the instruction requires the calculation capability of the execute pipe. The invention can also examine a sequence of instructions to identify an instruction for performing a calculation where the result of the calculation is used by a succeeding load or store instruction. In this case, the invention controls the pipeline to assure the result of the calculation is available for the succeeding load or store instruction even if both instructions are being processed concurrently.Type: GrantFiled: January 11, 2000Date of Patent: June 10, 2003Assignee: Broadcom CorporationInventors: Dan Dobberpuhl, Robert Stepanian
-
Patent number: 6574725Abstract: A processor architecture containing multiple closely coupled processors in a form of symmetric multiprocessing system is provided. The special coupling mechanism allows it to speculatively execute multiple threads in parallel very efficiently. Generally, the operating system is responsible for scheduling various threads of execution among the available processors in a multiprocessor system. One problem with parallel multithreading is that the overhead involved in scheduling the threads for execution by the operating system is such that shorter segments of code cannot efficiently take advantage of parallel multithreading. Consequently, potential performance gains from parallel multithreading are not attainable. Additional circuitry is included in a form of symmetrical multiprocessing system which enables the scheduling and speculative execution of multiple threads on multiple processors without the involvement and inherent overhead of the operating system.Type: GrantFiled: November 1, 1999Date of Patent: June 3, 2003Assignee: Advanced Micro Devices, Inc.Inventors: Uwe Kranich, David S. Christie
-
Patent number: 6567839Abstract: A system and method for performing computer processing operations in a data processing system includes a multithreaded processor and thread switch logic. The multithreaded processor is capable of switching between two or more threads of instructions which can be independently executed. Each thread has a corresponding state in a thread state register depending on its execution status. The thread switch logic contains a thread switch control register to store the conditions upon which a thread switch can occur. Upon the occurrence of a thread switch event, the state and priority of all threads are dynamically interrogated to determine which thread should be the active thread executing the processor. The thread switch logic has a time-out register which forces a thread switch when execution of the active thread in the multithreaded processor exceeds a programmable period of time.Type: GrantFiled: October 23, 1997Date of Patent: May 20, 2003Assignee: International Business Machines CorporationInventors: John Michael Borkenhagen, Richard James Eickemeyer, William Thomas Flynn, Sheldon Bernard Levenstein, Andrew Henry Wottreng
-
Patent number: 6560692Abstract: The data processing circuit of this invention enables efficient description and execution of processes that act upon the stack pointer, using short instructions. It also enables efficient description of processes that save and restore the contents of registers, increasing the speed of processing of interrupts and subroutine calls and returns. A CPU that uses this data processing circuit comprises a dedicated stack pointer register SP and uses an instruction decoder to decode a group of dedicated stack pointer instructions that specify the SP as an implicit operand. This group of dedicated stack pointer instructions are implemented in hardware by using general-purpose registers, the PC, the SP, an address adder, an ALU, a PC incrementer, internal buses, internal signal lines, and external buses.Type: GrantFiled: May 20, 1997Date of Patent: May 6, 2003Assignee: Seiko Epson CorporationInventors: Makoto Kudo, Satoshi Kubota, Yoshiyuki Miyayama, Hisao Sato
-
Patent number: 6553480Abstract: A group completion table (GCT) that manages the execution of instruction groups having more than one executable instruction is disclosed. The GCT includes a plurality of table entries, wherein each of the table entries is associated with a respective instruction group. Each table entry in the GCT includes a plurality of instruction completion identifiers, wherein each of the instruction completion identifiers corresponds to a specific instruction in the associated instruction group. The table entry also includes a trouble identifier that is utilized to flag the occurrence of any exception condition encountered in the execution of any instruction in the instruction group. In a related embodiment, the trouble identifier utilized in the table entry is a single bit.Type: GrantFiled: November 5, 1999Date of Patent: April 22, 2003Assignee: International Business Machines CorporationInventors: Hoichi Cheong, Hung Qui Le
-
Publication number: 20030070060Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.Type: ApplicationFiled: October 30, 2002Publication date: April 10, 2003Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
-
Publication number: 20030065905Abstract: A parallel computation processor being capable of high-speed loop operation. When instruction decoders decode the VLOOP instruction, which triggers loop operation, an instruction buffer starts storing normal instructions. The instruction buffer dispatches a VLIW instruction composed of n pieces of normal instructions to execution units each time n pieces of instructions are stored therein. The execution units concurrently execute the instructions. After all instructions comprised in a loop have been stored in the buffer and once dispatched as VLIW instructions to be executed, the loop is executed repeatedly.Type: ApplicationFiled: September 26, 2002Publication date: April 3, 2003Applicant: NEC CORPORATIONInventor: Daiji Ishii
-
Patent number: 6542986Abstract: A superscalar processor may issue multiple instructions per clock cycle. Included in a superscalar processor may be a reorder buffer which stores information corresponding to concurrently dispatched instructions. Dependencies may exist among the instructions which are concurrently dispatched. To resolve this dependency, when a dependency is detected amongst a group of concurrently dispatched instructions, an indication of the dependency, along with an indication of the position of the dependency, is conveyed to the corresponding reservation station. When the reservation station receives the indication of the dependency, the operand tag associated with the dependency may be replaced with the correct tag. Advantageously, the circuitry needed to resolve the dependency may be moved out of the critical path of the processor; thus, improving the performance of the processor by allowing it to operate at an increased frequency.Type: GrantFiled: November 9, 1999Date of Patent: April 1, 2003Assignee: Advanced Micro Devices, Inc.Inventor: Scott A. White
-
Patent number: 6539471Abstract: Method and apparatus for reducing or eliminating retirement logic in an out-of-order processor are disclosed. Instructions are processed using a processing unit capable of out-of-order processing and having architectural registers having an architectural state. Groups of instructions are prepared for processing by processing unit, wherein within each group to be processed the instructions producing the final state of an architectural register are changed so that they write to an output copy of the architectural state, the instructions reading architectural registers are changed to read from an input copy of the architectural state, and the instructions within each group producing results to architectural registers that would be overwritten by another instruction in the group are changed to write their results to temporary registers.Type: GrantFiled: December 23, 1998Date of Patent: March 25, 2003Assignee: Intel CorporationInventor: Gad S. Sheaffer
-
Patent number: 6526499Abstract: The present invention discloses a method and apparatus for implementing a senior load instruction type. An instruction requesting a memory reference is decoded. The decoded instruction is then dispatched to a memory ordering unit. The instruction is retired from a load buffer and is executed after retiring.Type: GrantFiled: January 10, 2001Date of Patent: February 25, 2003Assignee: Intel CorporationInventors: Salvador Palanca, Shekoufeh Qawami, Niranjan L. Cooray, Angad Narang, Subramaniam Maiyuran
-
Patent number: 6522934Abstract: A process control system includes a controller that executes a control routine which performs a series of unit procedures within a process. The control routine is written or created to specify the class of unit to be used for each unit procedure, but not the actual unit itself. At the start of each unit procedure of the control routine, a dynamic unit selection routine selects a particular unit as the unit to be used during operation of that unit procedure. When called, the dynamic unit selection routine determines a set of possible units to be used, determines if each of the set of possible units is suitable for use during that unit procedure of the control routine based on a suitability criterion, prioritizes the units that meet the suitability criterion based on a priority criterion and selects the particular unit from the prioritized list of suitable units in order of priority.Type: GrantFiled: July 2, 1999Date of Patent: February 18, 2003Assignee: Fisher-Rosemount Systems, Inc.Inventors: William G. Irwin, David L. Deitz
-
Patent number: 6515759Abstract: A printer provides additional read/write memory for image processing by operating with stored programs in compressed form. When needed for execution, instructions of a compressed program are expanded by a decompression circuit on the fly in an instruction cache. In a preferred embodiment, the instruction cache includes dynamic random access memory (DRAM). Further, the processor for executing the expanded instructions, the decompression circuit, and the instruction cache are integrated together on the same chip. A printer having a processor for formatting incoming data in a page description language (PDL), for example, executes the instructions of the PDL interpreter program from the cache while the PDL program as a whole is stored in compressed format in off-chip ROM or received from an external computer (downloaded) into off-chip RAM.Type: GrantFiled: August 1, 2000Date of Patent: February 4, 2003Assignee: Hewlett-Packard CompanyInventor: Kenneth K. Smith
-
Publication number: 20030005260Abstract: A register renaming system for out-of-order execution of a set of reduced instruction set computer instructions having addressable source and destination register fields, adapted for use in a computer having an instruction execution unit with a register file accessed by read address ports and for storing instruction operands. A data dependance check circuit is included for determining data dependencies between the instructions. A tag assignment circuit generates one or more tags to specify the location of operands, based on the data dependencies determined by the data dependance check circuit. A set of register file port multiplexers select the tags generated by the tag assignment circuit and pass the tags onto the read address ports of the register file for storing execution results.Type: ApplicationFiled: March 1, 2002Publication date: January 2, 2003Inventors: Sanjiv Garg, Kevin Ray Iadonato, Le Trong Nguyen, Johannes Wang