Of Multiple Instructions Simultaneously Patents (Class 712/206)
  • Publication number: 20080133887
    Abstract: When fetching an instruction from a plurality of memory banks, a first pipeline cycle corresponding to selection of a memory bank and a second pipeline cycle corresponding to instruction readout are generated to carry out a pipeline process. Only the selected memory bank can be precharged to allow reduction of power consumption. Since the first and second pipeline cycles are effected in parallel, the throughput of the instruction memory can be improved.
    Type: Application
    Filed: December 4, 2007
    Publication date: June 5, 2008
    Applicant: RENESAS TECHNOLOGY CORP.
    Inventors: Toyohiko Yoshida, Akira Yamada, Hisakazu Sato
  • Patent number: 7383403
    Abstract: In one embodiment, a processor comprises a plurality of instruction buffers, an instruction cache coupled to supply instructions to the plurality of instruction buffers, and a cache miss unit coupled to the instruction cache. Each of the plurality of instruction buffers is configured to store instructions fetched from a respective thread of a plurality of threads. The cache miss unit is configured to monitor cache misses in the instruction cache. Particularly, the cache miss unit is configured to detect which of the plurality of threads experience a cache miss to a cache line. Responsive to a return of the cache line for storage in the instruction cache, the cache miss unit is configured to concurrently cause at least one instruction from the cache line to be stored in each of the plurality of instruction buffers that corresponds to one of the plurality of threads which experienced the cache miss to the cache line.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: June 3, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Jama I. Barreh, Manish Shah, Robert T. Golla
  • Patent number: 7373536
    Abstract: Systems and methods for halting the execution of instructions in a microprocessor are disclosed. The halt instruction may have an operand which allows a programmer to specify which clock of a system is to be utilized in conjunction with the halt instruction. A specified number of clock cycles may then be counted using the clock identified in the instruction, after which dispatch of instructions may resume. These systems and methods may also allow the execution of instructions to be halted with respect to one or more of a multiplicity of threads within a microprocessor while allowing the continued execution of instructions associated with the remaining threads.
    Type: Grant
    Filed: August 4, 2004
    Date of Patent: May 13, 2008
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Hiroo Hayashi
  • Patent number: 7366884
    Abstract: A context switching system for a multi-thread execution pipeline loop having a pipeline latency and a method of operation thereof. In one embodiment, the context switching system includes a context switch requesting subsystem configured to: (1) detect a device request from a thread executing within the multi-thread execution pipeline loop for access to a device having a fulfillment latency exceeding the pipeline latency, and (2) generate a context switch request for the thread. The context switching system further includes a context controller subsystem configured to receive the context switch request and prevent the thread from executing until the device request is fulfilled.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: April 29, 2008
    Assignee: Agere Systems Inc.
    Inventors: Victor A. Bennett, Sean W. McGee
  • Patent number: 7366874
    Abstract: Apparatus and method for dispatching a very long instruction word (VLIW) instruction having a variable length are provided. The apparatus for dispatching a VLIW instruction includes a packet buffer for storing at least one or more VLIW instructions, and a decoding unit configured to constitute a VLIW instruction to be currently executed among the VLIW instructions stored in the packet buffer and decode predetermined bits of each sub-instruction contained in the VLIW instruction. The apparatus dispatches a corresponding sub-instruction to an FU which corresponds to each sub-instruction, based on the results of decoding performed in the decoding unit, position information on the sub-instructions that are placed on the packet buffer, and position information on the sub-instructions that are placed in the current VLIW instruction. Sub-instructions can be effectively dispatched to corresponding FUs using simple decoding logic even in a case where the length of the VLIW instruction is not fixed.
    Type: Grant
    Filed: December 3, 2002
    Date of Patent: April 29, 2008
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Nak-hee Seong, Kyoung-mook Lim, Seh-woong Jeong, Jae-hong Park, Hyung-jun Im, Gun-young Bae, Young-duck Kim
  • Patent number: 7363481
    Abstract: There is provided an information processing method characterized in that, in accordance with an instruction from a host CPU 411, either a CPU 103 or 104 loads a common code and an instruction code defined to be executed by itself from an external memory 110 into an internal memory 101, the other CPU loads an instruction code defined to be executed by itself from the external memory 110 into the internal memory 101, and then the respective CPUs load the respective instruction codes defined to be executed by themselves that are loaded in the internal memory 101, and execute the common code that is loaded in the internal memory 101 as required.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: April 22, 2008
    Assignee: Sony Corporation
    Inventor: Shigeo Sugimori
  • Patent number: 7363625
    Abstract: An SMT system is designed to allow software alteration of thread priority. In one case, the system signals a change in a thread priority based on the state of instruction execution and in particular when the instruction has completed execution. To alter the priority of a thread, the software uses a special form of a “no operation” (NOP) instruction (hereafter termed thread priority NOP). When the thread priority NOP is dispatched, its special NOP is decoded in the decode unit of the IDU into an operation that writes a special code into the completion table for the thread priority NOP. A “trouble” bit is also set in the completion table that indicates which instruction group contains the thread priority NOP. The trouble bit indicates that special processing is required after instruction completion. The thread priority instruction is processed after completion using the special code to change a thread's priority.
    Type: Grant
    Filed: April 24, 2003
    Date of Patent: April 22, 2008
    Assignee: International Business Machines Corporation
    Inventors: William E. Burky, Ronald N. Kalla, David A. Schroter, Balaram Sinharoy
  • Patent number: 7360218
    Abstract: A system and method for identifying compatible threads in a Simultaneous Multithreading (SMT) processor environment is provided by calculating a performance metric, such as cycles per instruction (CPI), that occurs when two threads are running on the SMT processor. The CPI that is achieved when both threads were executing on the SMT processor is determined. If the CPI that was achieved is better than the compatibility threshold, then information indicating the compatibility is recorded. When a thread is about to complete, the scheduler looks at the run queue to which the completing thread belongs to dispatch another thread. The scheduler identifies a thread that is (1) compatible with the thread that is still running on the SMT processor (i.e., the thread that is not about to complete), and (2) ready to execute. The CPI data is continually updated so that threads that are compatible with one another are continually identified.
    Type: Grant
    Filed: September 25, 2003
    Date of Patent: April 15, 2008
    Assignee: International Business Machines Corporation
    Inventors: Jos Manuel Accapadi, Andrew Dunshea, Dirk Michel, Mysore Sathyanarayana Srinivas
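The scheduling idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `CompatibilityTable`, the threshold value, and the fallback to the first ready thread when no compatible one exists are all assumptions made for illustration.

```python
from collections import defaultdict


class CompatibilityTable:
    """Tracks which thread pairs achieved a good CPI when co-scheduled (hypothetical helper)."""

    def __init__(self, cpi_threshold):
        # Assumed convention: a lower CPI is better, so a pair is
        # compatible when its measured CPI is below the threshold.
        self.cpi_threshold = cpi_threshold
        self.compatible = defaultdict(set)

    def record(self, thread_a, thread_b, cycles, instructions):
        """Update the table from one interval in which both threads ran together."""
        cpi = cycles / instructions
        if cpi < self.cpi_threshold:
            self.compatible[thread_a].add(thread_b)
            self.compatible[thread_b].add(thread_a)
        else:
            # Continually updated: a pair can lose its compatible status.
            self.compatible[thread_a].discard(thread_b)
            self.compatible[thread_b].discard(thread_a)
        return cpi

    def pick_next(self, still_running, run_queue, ready):
        """Return a ready thread compatible with the one still running.

        Falls back to the first ready thread (an assumption; the abstract
        does not say what happens when no compatible thread is ready).
        """
        for thread in run_queue:
            if thread in ready and thread in self.compatible[still_running]:
                return thread
        for thread in run_queue:
            if thread in ready:
                return thread
        return None


table = CompatibilityTable(cpi_threshold=2.0)
table.record("A", "B", cycles=150, instructions=100)  # CPI 1.5, compatible
table.record("A", "C", cycles=300, instructions=100)  # CPI 3.0, not compatible
print(table.pick_next("A", run_queue=["C", "B"], ready={"B", "C"}))
```

Here thread B is chosen over C even though C is earlier in the run queue, because only B has a recorded CPI with A that beats the threshold.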
  • Patent number: 7360062
    Abstract: The selection between instruction threads in a SMT processor for the purpose of interleaving instructions from the different instruction threads may be modified to accommodate certain processor events or conditions. During each processor clock cycle, an interleave rule enforcement component produces at least one base instruction thread selection signal that indicates a particular one of the instruction threads for passing an instruction from that particular thread into a stream of interleaved instructions. Thread selection modification is provided by an interleave modification component that generates a final thread selection signal based upon the base thread selection signal and a feedback signal derived from one or more conditions or events in the various processor elements.
    Type: Grant
    Filed: April 25, 2003
    Date of Patent: April 15, 2008
    Assignee: International Business Machines Corporation
    Inventors: Ronald Nick Kalla, Minh Michelle Quy Pham, Balaram Sinharoy, John Wesley Ward, III
  • Patent number: 7350030
    Abstract: The invention comprises an apparatus and method of prefetching from a memory device having interleaved channels. The chipset prefetcher comprises a stride detector to detect a stride in a stream, a prefetch injector to insert prefetches onto the memory device, a channel mapper to map the prefetches to each channel of the memory device, a scheduler to schedule the prefetches onto the memory device in a DRAM-state aware manner, a throttling heuristic to scale the number of prefetches, and a prefetch data buffer to store prefetch data. The method of prefetching comprises tracking the state of streams, detecting a stride on one of the streams, selecting the stream with the stride for prefetch injection, enqueueing prefetches from the selected stream, mapping the prefetches to each of the interleaved channels, injecting the prefetches from the selected stream into each of the interleaved channels, and scheduling the prefetches onto the memory device in a DRAM-state aware manner.
    Type: Grant
    Filed: June 29, 2005
    Date of Patent: March 25, 2008
    Assignee: Intel Corporation
    Inventors: Hemant G. Rotithor, Abhishek Singhal, Randy B. Osborne, Zohar Bogin, Raul N. Gutierrez, Buderya S. Acharya, Surya Kareenahalli
  • Publication number: 20080040724
    Abstract: A system, apparatus and method for instruction dispatch on a multi-thread processing device are described herein. The instruction dispatching method includes, in an instruction execution period having a plurality of execution cycles, successively fetching and issuing an instruction for each of a plurality of instruction execution threads according to an allocation of execution cycles of the instruction execution period among the plurality of instruction execution threads. Remaining execution cycles are subsequently used to successively fetch and issue another instruction for each of the plurality of instruction execution threads having at least one remaining allocated execution cycle of the instruction execution period. Other embodiments may be described and claimed.
    Type: Application
    Filed: August 2, 2007
    Publication date: February 14, 2008
    Inventors: Jack Kang, Yu-Chi Chuang
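The cycle-allocation scheme described in this abstract can be sketched as follows. This is a simplified reading, assuming one instruction is issued per execution cycle and that threads are visited in a fixed order; the function name `dispatch_order` and the thread labels are hypothetical.

```python
def dispatch_order(allocation):
    """Return the per-cycle thread order for one instruction execution period.

    `allocation` maps each thread to its number of allocated execution
    cycles. Each pass issues one instruction for every thread that still
    has an allocated cycle left; later passes use the remaining cycles.
    """
    remaining = dict(allocation)
    order = []
    while any(count > 0 for count in remaining.values()):
        for thread, count in remaining.items():
            if count > 0:
                order.append(thread)
                remaining[thread] = count - 1
    return order


# A six-cycle period split 3/1/2 among three threads.
print(dispatch_order({"T0": 3, "T1": 1, "T2": 2}))
# → ['T0', 'T1', 'T2', 'T0', 'T2', 'T0']
```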
  • Publication number: 20080005534
    Abstract: Methods and apparatus for partitioning a microprocessor pipeline to support pipelined branch prediction and instruction fetching of multiple execution threads. A thread selection stage selects a thread from a plurality of execution threads. In one embodiment, storage in a branch prediction output queue is pre-allocated to a portion of the thread in one branch prediction stage in order to prevent stalling of subsequent stages in the branch prediction pipeline. In another embodiment, an instruction fetch stage fetches instructions at a fetch address corresponding to a portion of the selected thread. Another instruction fetch stage stores the instruction data in an instruction fetch output queue if enough storage is available. Otherwise, instruction fetch stages corresponding to the selected thread are invalidated and refetched to avoid stalling preceding stages in the instruction fetch pipeline, which may be fetching instructions of another thread.
    Type: Application
    Filed: June 29, 2006
    Publication date: January 3, 2008
    Inventors: Stephan Jourdan, Robert Hinton
  • Publication number: 20070294513
    Abstract: One embodiment of the present invention provides a system that performs a fast-scanning operation to generate fetch bundles within an instruction fetch unit (IFU) of a processor. During operation, the system obtains a cache line containing instructions at the IFU. Next, the system performs a complete-scanning operation on the cache line to identify control transfer instructions (CTIs) in the cache line. At the same time, the system performs a fast-scanning operation to identify CTIs in a group of initial instructions in the cache line, wherein the initial instructions are executed before other instructions in the cache line. Next, the system obtains results from the fast-scanning operation before results of the complete-scanning operation are available. The system then uses results from the fast-scanning operation to form an initial fetch bundle containing initial instructions, and sends the initial fetch bundle to the instruction-issue unit.
    Type: Application
    Filed: June 15, 2006
    Publication date: December 20, 2007
    Inventors: Abid Ali, Andrew T. Ewoldt
  • Patent number: 7268787
    Abstract: A graphics processing system has a cache which is partitionable into two or more slots. Once partitioned, the slots are dynamically allocatable to one or more texture maps. First, the number of texture maps needed to render a given scene is determined. Then, available slots of the cache are allocated to the texture maps. Sometimes, more slots are allocated to the largest texture map. At other times, more slots are allocated to the texture map which is likely to be used most often. The slots can also be allocated equally to all of the texture maps needed.
    Type: Grant
    Filed: May 28, 2004
    Date of Patent: September 11, 2007
    Assignee: S3 Graphics Co., Ltd.
    Inventors: Zhou Hong, Chih-Hong Fu
  • Patent number: 7257807
    Abstract: The present invention is directed to a parallel processor language, a method for translating C++ programs into a parallel processor language, and a method for optimizing execution time of a parallel processor program. In an exemplary aspect of the present invention, a parallel processor program for defining a processor integrated circuit includes a plurality of processor commands with addresses. The plurality of processor commands may include a starting processor command, and each of the plurality of processor commands includes one or more subcommands. When the processor integrated circuit executes the parallel processor program, the processor integrated circuit executes the starting processor command first and then executes the rest of the plurality of processor commands based on an order of the addresses.
    Type: Grant
    Filed: September 22, 2003
    Date of Patent: August 14, 2007
    Assignee: LSI Corporation
    Inventors: Andrey A. Nikitin, Alexander E. Andreev
  • Patent number: 7254689
    Abstract: In an embodiment of the present invention, the computational efficiency of decoding of block-sorted compressed data is improved by ensuring that more than one set of operations corresponding to a plurality of paths through a mapping array T are being handled by a processor. This sequence of operations, including instructions from the plurality of sets of operations, ensures that there is another operation in the pipeline if a cache miss on any given lookup operation in the mapping array results in a slower main memory access. In this way, the processor utilization is improved. While the sets of operations in the sequence of operations are independent of one another, there will be an overlap of a plurality of the main memory access operations due to the long time required for main memory access.
    Type: Grant
    Filed: July 15, 2004
    Date of Patent: August 7, 2007
    Assignee: Google Inc.
    Inventors: Sean M. Dorward, Sean Quinlan, Michael Burrows
  • Patent number: 7237095
    Abstract: A method and mechanism for managing shifts in a shifting queue. A reservation station in a processing device includes a queue of shifting entries. On a given cycle, zero, one, or two instructions may be dispatched and stored in the queue. Depending upon the dispatch conditions and the state of the queue, existing entries within the queue may be shifted to make room for the newly dispatched instruction(s) at the top of the queue. Shift vectors are generated which identify entries of the queue which are to be shifted and by how much. A queue management approach is adopted in which three rules are generally followed: (i) Only shift entries that must shift due to dispatch pressure from above; (ii) If an entry must be shifted elsewhere, shift it as far down the array as the particular implementation allows; and (iii) Don't allow the previous conditions to force additional entries to shift that are not required to shift by dispatch pressure.
    Type: Grant
    Filed: August 4, 2005
    Date of Patent: June 26, 2007
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Daniel B. Hopper
  • Patent number: 7234025
    Abstract: A microprocessor that executes a repeat prefetch instruction (REP PREFETCH). The REP PREFETCH prefetches multiple cache lines, wherein the number of cache lines is specifiable in the instruction. The instruction is specified by the Pentium III PREFETCH opcode preceded by the REP string instruction prefix. The programmer specifies the count of cache lines to be prefetched in the ECX register, similarly to the repeat count of a REP string instruction. The effective address of the first cache line is specified similarly to the conventional PREFETCH instruction. The REP PREFETCH instruction stops if the address of the current prefetch cache line misses in the TLB, or if the current processor level changes. Additionally, a line is prefetched only if the number of free response buffers is above a programmable threshold. The prefetches are performed at a lower priority than other activities needing access to the cache or TLB.
    Type: Grant
    Filed: November 3, 2004
    Date of Patent: June 19, 2007
    Assignee: IP-First, LLC
    Inventor: Rodney E. Hooker
  • Patent number: 7219185
    Abstract: A processor having the capability to dispatch multiple parallel operations, including multiple load operations, accesses a cache which is divided into banks. Each bank supports a limited number of simultaneous read and write access operations. A bank prediction field is associated with each memory access operation. Memory access operations are selected for dispatch so that they are predicted to be non-conflicting. Preferably, the processor automatically maintains a bank predict value based on previous bank accesses, and a confirmation value indicating a degree of confidence in the bank prediction. The confirmation value is preferably an up-or-down counter which is incremented with each correct prediction and decremented with each incorrect prediction.
    Type: Grant
    Filed: April 22, 2004
    Date of Patent: May 15, 2007
    Assignee: International Business Machines Corporation
    Inventor: David Arnold Luick
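The bank prediction and confirmation counter in this abstract can be illustrated with a minimal Python sketch. The abstract only states that the counter is incremented on a correct prediction and decremented on an incorrect one; the saturation limits, the rule of replacing the predicted bank when confidence reaches zero, and the names `BankPredictor` and `select_for_dispatch` are assumptions for illustration.

```python
class BankPredictor:
    """Per-operation bank prediction with an up-or-down confirmation counter."""

    def __init__(self, max_confidence=3):
        self.predicted_bank = 0
        self.confidence = 0
        self.max_confidence = max_confidence  # assumed saturation limit

    def update(self, actual_bank):
        """Adjust the counter after the real bank of an access is known."""
        if actual_bank == self.predicted_bank:
            self.confidence = min(self.confidence + 1, self.max_confidence)
        else:
            self.confidence = max(self.confidence - 1, 0)
            # Assumed policy: retrain once confidence is exhausted.
            if self.confidence == 0:
                self.predicted_bank = actual_bank


def select_for_dispatch(ops, ports_per_bank=1):
    """Pick operations whose predicted banks do not exceed each bank's port limit.

    `ops` is a list of (name, BankPredictor) pairs in priority order.
    """
    used = {}
    chosen = []
    for name, predictor in ops:
        bank = predictor.predicted_bank
        if used.get(bank, 0) < ports_per_bank:
            used[bank] = used.get(bank, 0) + 1
            chosen.append(name)
    return chosen


p, q, r = BankPredictor(), BankPredictor(), BankPredictor()
for bank in (1, 1, 1, 0):
    p.update(bank)   # p settles on bank 1 and survives one misprediction
r.update(1)          # r also predicts bank 1; q keeps its default, bank 0
print(select_for_dispatch([("ld1", p), ("ld2", r), ("ld3", q)]))
# → ['ld1', 'ld3']
```

With one port per bank, `ld2` is held back because it is predicted to collide with `ld1` on bank 1.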
  • Patent number: 7194734
    Abstract: A threaded interpreter executes a program having a series of program instructions stored in a memory. For the execution of a program instruction the threaded interpreter includes a preparatory unit for executing a plurality of preparatory steps making the program instruction available in the threaded interpreter, and an execution unit with one or more machine instructions emulating the program instruction. The threaded interpreter is designed such that, during the execution of the series of program instructions on an instruction-level parallel processor, machine instructions implementing a first one of the preparatory steps are executed in parallel with machine instructions implementing a second one of the preparatory steps for respective ones of the series of program instructions.
    Type: Grant
    Filed: February 13, 2003
    Date of Patent: March 20, 2007
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Jan Hoogerbrugge, Alexander Augusteijn
  • Patent number: 7185178
    Abstract: In one embodiment, a processor comprises an instruction cache and a fetch generator circuit coupled thereto. The fetch generator circuit is configured to generate at least one fetch request to the instruction cache for at least one of a plurality of threads. The fetch generator circuit is also configured to monitor for a plurality of conditions for each thread, wherein each of the plurality of conditions is defined to inhibit the thread from being fetched. The fetch generator circuit is configured to speculatively generate a first fetch request for a first thread of the plurality of threads if each thread is inhibited from fetching and the first thread is inhibited from fetching only due to a first predetermined condition of the plurality of conditions. In one particular implementation, the first predetermined condition is a lack of room in a corresponding one of a plurality of instruction buffers.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: February 27, 2007
    Assignee: Sun Microsystems, Inc.
    Inventors: Jama I. Barreh, Robert T. Golla
  • Patent number: 7143268
    Abstract: A data processor includes execution clusters, an instruction cache, an instruction issue unit, and alignment and dispersal circuitry. Each execution cluster includes an instruction execution pipeline having a number of processing stages, and each execution pipeline is a number of lanes wide. The processing stages execute instruction bundles, where each instruction bundle has one or more syllables. Each lane is capable of receiving one of the syllables of an instruction bundle. The instruction cache includes a number of cache lines. The instruction issue unit receives fetched cache lines and issues complete instruction bundles toward the execution clusters. The alignment and dispersal circuitry receives the complete instruction bundles from the instruction issue unit and routes each received complete instruction bundle to a correct one of the execution clusters. The complete instruction bundles are routed as a function of at least one address bit associated with each complete instruction bundle.
    Type: Grant
    Filed: December 29, 2000
    Date of Patent: November 28, 2006
    Assignees: STMicroelectronics, Inc., Hewlett-Packard Development Co., L.P.
    Inventors: Paolo Faraboschi, Anthony X. Jarvis, Mark Owen Homewood, Geoffrey M. Brown, Gary L. Vondran
  • Patent number: 7139898
    Abstract: A pipelined multistreaming processor has an instruction source, a plurality of streams fetching instructions from the instruction source, a dispatch stage for selecting and dispatching instructions to a set of execution units, a set of instruction queues having one queue associated with each stream in the plurality of streams, and located in the pipeline between the instruction cache and the dispatch stage, and a select system for selecting streams in each cycle to fetch instructions from the instruction cache. The processor is characterized in that the select system selects one or more streams in each cycle for which to fetch instructions from the instruction cache, and in that the number of streams selected for which to fetch instructions in each cycle is fewer than the number of streams in the plurality of streams.
    Type: Grant
    Filed: November 3, 2000
    Date of Patent: November 21, 2006
    Assignee: Mips Technologies, Inc.
    Inventors: Mario Nemirovsky, Adolfo Nemirovsky, Narendra Sankar, Enrique Musoll
  • Patent number: 7137109
    Abstract: In one embodiment, the invention may comprise a computer-implemented system for managing access to a controlled space in a simulator environment, comprising: means for requiring initialization of a simulated hardware control object by a user code application operable to run on a simulated target platform in the simulator environment, wherein the simulated hardware control object is associated with at least a partition of the controlled space that is simulated by an architectural simulator in the simulator environment; and means for verifying if the simulated hardware control object associated with the partition has been initialized by the user code application when the user code application issues a transaction that attempts to access the partition.
    Type: Grant
    Filed: December 17, 2002
    Date of Patent: November 14, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Richard Shortz
  • Patent number: 7124318
    Abstract: A multiple parallel pipeline digital processing apparatus has the capability to substitute a second pipeline for a first in the event that a failure is detected in the first pipeline. Preferably, a redundant pipeline is shared by multiple primary pipelines. Preferably, the pipelines are located physically adjacent one another in an array. A pipeline failure causes data to be shifted one position within the array of pipelines, to by-pass the failing pipeline, so that each pipeline has only two sources of data, a primary and an alternate. Preferably, selection logic controlling the selection between a primary and alternate source of pipeline data is integrated with other pipeline operand selection logic.
    Type: Grant
    Filed: September 18, 2003
    Date of Patent: October 17, 2006
    Assignee: International Business Machines Corporation
    Inventor: David Arnold Luick
  • Patent number: 7124207
    Abstract: A method and system for batching commands and status information between a host computer and an adapter installed on the host computer. The method for command batching includes the host storing command pointers, each command pointer pointing to a command in an array, and providing an array pointer to the array. When a predetermined threshold of stored commands has been reached, the host can deliver a multitude of commands via the array pointer with a single bus access. A method for status batching includes transferring command statuses from the adapter to the host computer and providing a pointer to the transferred statuses. When a predetermined threshold of statuses has been reached, the adapter interrupts the host computer once to fetch the pointer and the host can then read the statuses without requiring any more bus interrupts.
    Type: Grant
    Filed: August 14, 2003
    Date of Patent: October 17, 2006
    Assignee: Adaptec, Inc.
    Inventors: Timothy Vincent Lee, Timothy Chin-Cheung Ng
  • Patent number: 7124282
    Abstract: Instructions for a processing unit are stored in a number of memory banks, successive instructions being stored in successive, different memory banks. Whenever execution of an instruction is started, the reading of one instruction which will be executed more than one instruction cycle later is also started. Consequently, a plurality of instructions are read in parallel from different memory banks. After the reading of an instruction, and before starting the execution of the instruction, the instruction passes through a pipeline in which the processing device detects whether the relevant instruction is a branch instruction. If this is so, the processing unit starts the reading in parallel of a number of instructions as from a branch target instruction. If it appears at a later stage that the branch is taken, said number of instructions is loaded into the pipeline in parallel.
    Type: Grant
    Filed: November 15, 2001
    Date of Patent: October 17, 2006
    Inventors: Frederik Zandveld, Marnix C. Vlot
  • Patent number: 7117343
    Abstract: A program-controlled unit has a plurality of instruction-execution units for simultaneously executing successive instructions of a program that is to be executed. The program-controlled unit allows the number of access operations to a program memory storing the program that is to be executed to be reduced. The program-controlled unit has an assignment device which operates such that only the instructions for those instruction-execution units which are actually required for the execution of the program are stored in the program memory in which the program to be executed by the program-controlled unit is stored. The program includes a sequence of instructions which can be executed simultaneously. The assignment device allocates instructions that can be executed simultaneously to desired instruction-execution units for simultaneous execution, independent of each instruction's position within the sequence.
    Type: Grant
    Filed: September 4, 2001
    Date of Patent: October 3, 2006
    Assignee: Infineon Technologies AG
    Inventors: Raimund Leitner, Christian Panis
  • Patent number: 7107433
    Abstract: A mechanism for resource allocation in a processor, a method of allocating resources in a processor and a digital signal processor incorporating the mechanism or the method. In one embodiment, the mechanism includes: (1) categorization logic, associated with an earlier pipeline stage, that generates instruction type information for instructions to be executed in the processor and (2) priority logic, associated with a later pipeline stage, that allocates functional units of the processor to execution of the instructions based on the instruction type information.
    Type: Grant
    Filed: October 26, 2001
    Date of Patent: September 12, 2006
    Assignee: LSI Logic Corporation
    Inventor: Hung T. Nguyen
  • Patent number: 7096466
    Abstract: Improved techniques for loading class files into virtual computing machines are disclosed. The techniques seek to provide a mechanism that will generally improve the efficiency of virtual machines by selectively loading information into a virtual machine. A new class attribute (“load-attribute”) is defined and implemented for class files. This can be, for example, implemented as a “load-attribute” table that lists the components that have been selected for loading into the virtual machine. In addition, the load-attribute may provide references to the selected components in the class file. Accordingly, various components of the class file can be marked for loading and selectively loaded.
    Type: Grant
    Filed: March 26, 2001
    Date of Patent: August 22, 2006
    Assignee: Sun Microsystems, Inc.
    Inventors: Stepan Sokolov, David Wallman
  • Patent number: 7062640
    Abstract: A filtering system for instruction segments determines whether a new instruction segment satisfies a predetermined filtering condition prior to storage. If the instruction segment fails the filtering condition, the new instruction segment is not stored. Various filtering conditions are available; but all filtering conditions test to determine whether it is more likely than not that a new instruction segment will be reused by the execution unit in the future.
    Type: Grant
    Filed: December 14, 2000
    Date of Patent: June 13, 2006
    Assignee: Intel Corporation
    Inventors: Stephan J. Jourdan, Alan Miller, Glenn Hinton
  • Patent number: 7039790
    Abstract: A data processing system with a microprocessor. The microprocessor has an instruction execution pipeline including fetch and decode stages and several functional execution units. Fetch packets contain a plurality of instruction words. Execute packets include a plurality of instruction words that can be executed in parallel by two or more execution units. An execution packet can span two or more fetch packets. A predetermined bit in each instruction marks whether the next instruction is executed in parallel with the current instruction. Instructions in an execute packet are dispatched to appropriate functional execution units based on instruction type. Upon a branch into an execute packet instructions at memory addresses before the branch location are not executed in parallel with instructions following the branch location.
    Type: Grant
    Filed: October 31, 2000
    Date of Patent: May 2, 2006
    Assignee: Texas Instruments Incorporated
    Inventors: Laurence R. Simar, Jr., Richard A. Brown
  • Patent number: 7039791
    Abstract: A computing system is described in which individual instructions are executable in parallel by processing pipelines, and instructions to be executed in parallel by different pipelines are supplied to the pipelines simultaneously. The system includes storage for storing an arbitrary number of the instructions to be executed. The instructions to be executed are tagged with pipeline identification tags indicative of the pipeline to which they should be dispatched. The pipeline identification tags are supplied to a system which controls a crossbar switch, enabling the tags to be used to control the switch and supply the appropriate instructions simultaneously to the differing pipelines.
    Type: Grant
    Filed: July 3, 2002
    Date of Patent: May 2, 2006
    Assignee: Intergraph Corporation
    Inventors: Howard G. Sachs, Siamak Arya
  • Patent number: 7028164
    Abstract: There is disclosed a data processor containing an instruction issue unit that efficiently transfers instruction bundles from a cache to an instruction pipeline. The data processor comprises 1) an instruction pipeline comprising N processing stages; and 2) an instruction issue unit for fetching into the instruction pipeline instructions fetched from the instruction cache, each of the fetched instructions comprising from one to S syllables.
    Type: Grant
    Filed: December 29, 2000
    Date of Patent: April 11, 2006
    Assignee: STMicroelectronics, Inc.
    Inventors: Anthony X. Jarvis, Mark Owen Homewood, Gary L. Vondran
  • Patent number: 7000097
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Grant
    Filed: July 1, 2002
    Date of Patent: February 14, 2006
    Assignee: Seiko Epson Corporation
    Inventors: Cheryl D. Senter, Johannes Wang
  • Patent number: 6981127
    Abstract: A method and apparatus for providing a plurality of aligned instructions from an instruction stream provided by a memory unit for execution within a pipelined microprocessor is described. The microprocessor comprises a prefetch buffer, whereby the prefetch buffer stores prefetched instructions and additional information about the validity and size of the prefetch buffer. The method and apparatus use the prefetch buffer to buffer a part of an instruction stream. The actually aligned instruction stream is issued from the prefetch buffer or directly by instructions fetched from the memory, or from a combination of prefetched instructions and actually fetched instructions.
    Type: Grant
    Filed: May 26, 1999
    Date of Patent: December 27, 2005
    Assignee: Infineon Technologies North America Corp.
    Inventors: Balraj Singh, Venkat Mattela
  • Patent number: 6976154
    Abstract: A packet processing engine includes multiple microcode instruction memories implemented in parallel. For each cycle of the pipeline, an instruction from each of the memories is retrieved based on a program counter. One of the instructions is selected by a priority encoder that operates on true/false signals generated based on the instructions. The selected instruction is executed to thereby perform the packet processing operations specified by the instruction.
    Type: Grant
    Filed: November 7, 2001
    Date of Patent: December 13, 2005
    Assignee: Juniper Networks, Inc.
    Inventors: Stefan Dyckerhoff, Tesfaye Teshager
  • Patent number: 6965987
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Grant
    Filed: November 17, 2003
    Date of Patent: November 15, 2005
    Assignee: Seiko Epson Corporation
    Inventors: Cheryl Senter Brashears, Johannes Wang, Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 6959375
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Grant
    Filed: October 29, 2002
    Date of Patent: October 25, 2005
    Assignee: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 6957320
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Grant
    Filed: July 9, 2002
    Date of Patent: October 18, 2005
    Assignee: Seiko Epson Corporation
    Inventors: Cheryl D. Senter, Johannes Wang
  • Patent number: 6957305
    Abstract: This invention provides a dual usage cache reload buffer (CRB) to hold both demand loads and prefetch loads. A new form of a data cache block touch (DCBT) instruction specifies which level of the cache hierarchy to prefetch data into. A first asynchronous form of a DCBT instruction is issued to prefetch a stream of data into an L2 cache. A second synchronous form of a DCBT instruction is used to prefetch data from the L2 cache to the CRB in the main CPU, which will bypass the L1 data cache and forward data directly to the register file. This CRB has a dual usage and is used to hold both normal cache reloads and the aforementioned prefetched cache lines.
    Type: Grant
    Filed: August 29, 2002
    Date of Patent: October 18, 2005
    Assignee: International Business Machines Corporation
    Inventors: David Scott Ray, David J. Shippy
  • Patent number: 6920544
    Abstract: A processor includes a memory unit in which instructions having their constituent bytes stored in ascending address order alternate with instructions having their constituent bytes stored in descending address order. A single address pointer is used to read one instruction by reading up, and another instruction by reading down. The amount of address information needed for program execution is thereby reduced, as one address pointer suffices for two instructions. The address pointer may be provided by a branch instruction that also indicates whether to read up or down. An up-counter and a down-counter may be provided as address counters, enabling the two instructions to be read and executed concurrently. Four address counters may be provided, enabling a branch instruction to designate the execution of from one to four consecutive instructions.
    Type: Grant
    Filed: March 25, 2003
    Date of Patent: July 19, 2005
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Mototsugu Watanabe
  • Patent number: 6918018
    Abstract: The 64-bit single cycle fetch method described here relates to a specific ‘megastar’ core processor employed in a range of new digital signal processor devices. The ‘megastar’ core incorporates 32-bit memory blocks arranged into separate entities or banks. Because the parent CPU has only three 16-bit buses, a maximum read in one clock cycle through the memory interface would normally be 48 bits. This invention describes an approach for a fetch method involving tapping into the memory bank data at an earlier stage prior to the memory interface. This allows the normal 48-bit fetch to be extended to 64 bits as required for full performance of the numerical processor accelerator and other speed critical operations and functions.
    Type: Grant
    Filed: September 27, 2002
    Date of Patent: July 12, 2005
    Assignee: Texas Instruments Incorporated
    Inventors: Roshan J. Samuel, Jason D. Kridner
  • Patent number: 6915412
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Grant
    Filed: October 30, 2002
    Date of Patent: July 5, 2005
    Assignee: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 6898692
    Abstract: A method of processing data relating to graphical primitives to be displayed on a display device, using a region-based SIMD multiprocessor architecture, in which the shading and blending operations are deferred until rasterization of the available graphical primitive data is completed.
    Type: Grant
    Filed: June 28, 2000
    Date of Patent: May 24, 2005
    Assignee: ClearSpeed Technology plc
    Inventors: Ken Cameron, Eamon O'Dea
  • Patent number: 6898696
    Abstract: A method and system for increasing the efficiency of execution in a processor. Instructions are dispatched in instruction groups, wherein if such an instruction group contains an interruptible instruction of a selected type, only one interruptible instruction of the selected type is included in the instruction group. A state of the processor is recorded, associated respectively with each of said dispatched instruction groups. The processor is restored to the recorded state associated with the instruction group containing the interruptible instruction of the selected type causing an interrupt, in response to the interrupt from one of the interruptible instructions of the selected type.
    Type: Grant
    Filed: June 14, 1999
    Date of Patent: May 24, 2005
    Assignee: International Business Machines Corporation
    Inventors: Hoichi Cheong, Hung Qui Le
  • Patent number: 6898694
    Abstract: The present invention provides a mechanism for supporting high bandwidth instruction fetching in a multi-threaded processor. A multi-threaded processor includes an instruction cache (I-cache) and a temporary instruction cache (TIC). In response to an instruction pointer (IP) of a first thread hitting in the I-cache, a first block of instructions for the thread is provided to an instruction buffer and a second block of instructions for the thread is provided to the TIC. On a subsequent clock interval, the second block of instructions is provided to the instruction buffer, and first and second blocks of instructions from a second thread are loaded into a second instruction buffer and the TIC, respectively.
    Type: Grant
    Filed: June 28, 2001
    Date of Patent: May 24, 2005
    Assignee: Intel Corporation
    Inventors: Sailesh Kottapalli, James S. Burns, Kenneth D. Shoemaker
  • Patent number: 6895473
    Abstract: A data control device capable of high-quality, high-efficiency control for speeding up data processing, thus permitting improvement of the throughput of a system. An attribute analyzing unit analyzes an attribute of data, and a main memory stores setting information of the data in a region corresponding to the attribute. A highway cache memory stores the data, and also receives and transmits the data on a highway. A processor performs an operation on the data in accordance with the setting information. A data cache memory is interposed between the processor and the main memory and stores the setting information.
    Type: Grant
    Filed: November 12, 2002
    Date of Patent: May 17, 2005
    Assignee: Fujitsu Limited
    Inventors: Masao Nakano, Takeshi Toyoyama, Yasuhiro Ooba
  • Patent number: 6892293
    Abstract: A computing system is described in which individual instructions are executable in parallel by processing pipelines, and instructions to be executed in parallel by different pipelines are supplied to the pipelines simultaneously. The system includes storage for storing an arbitrary number of the instructions to be executed. The instructions to be executed are tagged with pipeline identification tags indicative of the pipeline to which they should be dispatched. The pipeline identification tags are supplied to a system which controls a crossbar switch, enabling the tags to be used to control the switch and supply the appropriate instructions simultaneously to the differing pipelines.
    Type: Grant
    Filed: April 9, 1998
    Date of Patent: May 10, 2005
    Assignee: Intergraph Corporation
    Inventors: Howard G. Sachs, Siamak Arya
  • Patent number: 6862676
    Abstract: A superscalar processor having a content addressable memory structure that transmits a first and second output signal is presented. The superscalar processor performs out of order processing on an instruction set. From the first output signal, the dependencies between currently fetched instructions of the instruction set and previous in-flight instructions can be determined and used to generate a dependency matrix for all in-flight instructions. From the second output signal, the physical register addresses of the data required to execute an instruction, once the dependencies have been removed, may be determined.
    Type: Grant
    Filed: January 16, 2001
    Date of Patent: March 1, 2005
    Assignee: Sun Microsystems, Inc.
    Inventors: Micah C. Knapp, Poonacha P. Kongetira, Marc E. Lamere, Julie M. Staraitis
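
The dependency matrix mentioned in the abstract of patent 6862676 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the content addressable memory lookup is replaced by a plain dictionary from register number to most recent in-flight writer, only read-after-write dependencies are tracked (on the assumption that register renaming removes the others), and the names `Instr`, `dependency_matrix` and `ready` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical in-flight instruction: one optional destination, some sources."""
    dest: Optional[int]      # register written, or None
    srcs: Tuple[int, ...]    # registers read


def dependency_matrix(in_flight: List[Instr]) -> List[List[bool]]:
    """matrix[i][j] is True when instruction i reads a value produced by
    the older in-flight instruction j (read-after-write only)."""
    n = len(in_flight)
    matrix = [[False] * n for _ in range(n)]
    # Assumption: this dictionary stands in for the CAM lookup.
    last_writer: Dict[int, int] = {}
    for i, ins in enumerate(in_flight):
        for reg in ins.srcs:
            j = last_writer.get(reg)
            if j is not None:
                matrix[i][j] = True
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return matrix


def ready(matrix: List[List[bool]], done: Set[int]) -> List[int]:
    """Indices of unfinished instructions whose producers have all completed."""
    return [
        i
        for i, row in enumerate(matrix)
        if i not in done and all(j in done for j, dep in enumerate(row) if dep)
    ]


program = [
    Instr(dest=1, srcs=(2, 3)),   # 0: r1 = f(r2, r3)
    Instr(dest=4, srcs=(1,)),     # 1: r4 = f(r1)      depends on 0
    Instr(dest=5, srcs=(4, 1)),   # 2: r5 = f(r4, r1)  depends on 0 and 1
    Instr(dest=1, srcs=(6,)),     # 3: r1 = f(r6)      independent
]
matrix = dependency_matrix(program)
print(ready(matrix, set()))   # instructions that may issue immediately
print(ready(matrix, {0}))     # instructions that may issue once 0 completes
```

In this model an instruction becomes eligible for out-of-order issue as soon as every column set in its row belongs to a completed instruction, which is the sense in which the abstract speaks of dependencies being removed.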