Prefetching Patents (Class 712/207)
  • Patent number: 6553484
    Abstract: Each entry of an instruction out-of-order buffer 330 has a Y-operand referencing succeeding-entry information field 335 and a Z-operand referencing succeeding-entry information field 336. In the field 335, the entry number of a succeeding instruction that will reference the execution result of the instruction in the entry as the Y operand is stored. In the field 336, the entry number of a succeeding instruction that will reference the execution result of the instruction in the entry as the Z operand is stored. When the succeeding instruction is stored in the instruction out-of-order buffer 330, a reference relation setting circuit 320 sets the reference relation between the execution result of a preceding instruction and the operands of the succeeding instruction in the operand referencing succeeding-entry information fields of the preceding instruction. A ready flag setting circuit 340 reflects the fields 335 and 336 on ready flags 333 and 334 of the Y-operand and the Z-operand of the corresponding entry.
    Type: Grant
    Filed: November 30, 1999
    Date of Patent: April 22, 2003
    Assignee: NEC Corporation
    Inventor: Akihiro Sawamura
  • Publication number: 20030065906
    Abstract: A system and method of storing instructions is disclosed. First an instruction is received. Then if the instruction will be used in a next instruction cycle, the instruction is loaded in a next instruction cycle cache memory.
    Type: Application
    Filed: September 28, 2001
    Publication date: April 3, 2003
    Inventors: Ryan N. Rakvic, Hong Wang, John P. Shen
  • Publication number: 20030065907
    Abstract: A program-controlled unit stores return addresses not only in a system stack but also in a return stack. The instructions which have already been taken into the program-controlled unit, but are not currently required, are stored in a storage device for alternative instructions. At times when the program-controlled unit is not active elsewhere, instructions are taken into the program-controlled unit, whereby the instructions are to be carried out when an instruction is not carried out or is not carried out as expected.
    Type: Application
    Filed: August 29, 2002
    Publication date: April 3, 2003
    Inventors: Steffen Sonnekalb, Jurgen Birkhauser
  • Patent number: 6542982
    Abstract: In order to simplify the instruction prefetch architecture for use with the programs having few loops, and having instructions almost in linear and sequential addresses, the bus controller in accordance with the present invention for controlling the bus in an external memory includes a plurality of instruction buffers, flags each specific to each of instruction buffers, and a buffer controller circuit. The buffer controller circuit may allocate one of specific values that plural lower bits of an instruction address may take to each of the instruction buffers, and prefetch instructions to the instruction buffers each corresponding to a respective addresses designated to by the plural lower bits, from the address following a predetermined fetch address. The constitution for instruction prefetch as above may be implemented in a simpler manner than the controlling structure using address tags of a cache memory or the controlling structure using read-write pointer based on the counter in FIFO buffers.
    Type: Grant
    Filed: February 15, 2001
    Date of Patent: April 1, 2003
    Assignees: Hitachi, Ltd., Hitachi ULSI Systems Co., Ltd.
    Inventors: Yasuyuki Murakami, Shigezumi Matsui, Kunihiko Nishiyama, Atsushi Kiuchi, Yuichi Takitsune
  • Publication number: 20030056086
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Application
    Filed: October 29, 2002
    Publication date: March 20, 2003
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Publication number: 20030056087
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Application
    Filed: October 30, 2002
    Publication date: March 20, 2003
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 6535962
    Abstract: A data processing system includes a processor having a first level cache and a prefetch engine. Coupled to the processor are a second level cache and a third level cache and a system memory. Prefetching of cache lines is performed into each of the first, second, and third level caches by the prefetch engine. Prefetch requests from the prefetch engine to the second and third level caches is performed over a private prefetch request bus, which is separate from the bus system that transfers data from the various cache levels to the processor.
    Type: Grant
    Filed: November 8, 1999
    Date of Patent: March 18, 2003
    Assignee: International Business Machines Corporation
    Inventors: Michael John Mayfield, Francis Patrick O'Connell, David Scott Ray
  • Patent number: 6535961
    Abstract: A spatial footprint predictor includes a mechanism to measure spatial footprints of nominating cache-lines and hold the footprints. In some embodiments, the mechanism includes an active macro-block table (AMBT) to measure the spatial footprints and a spatial footprint table (SFT) to hold the spatial footprints. In other embodiments, the mechanism includes a macro-block table (MBT) in which macro-blocks may be active or inactive.
    Type: Grant
    Filed: November 21, 1997
    Date of Patent: March 18, 2003
    Assignee: Intel Corporation
    Inventors: Christopher B. Wilkerson, Sanjeev Kumar
  • Patent number: 6505292
    Abstract: A processor includes a first instruction cache, a second instruction cache, a return stack, and a fetch unit. The return stack is configured to store return addresses corresponding to call instructions. The return stack is configured to output a first return address from a top of the return stack and a second return address which is next to the top of the return stack. The fetch unit is coupled to the first instruction cache, the second instruction cache, and the return stack, and is configured to convey the first return address to the first instruction cache responsive to a return instruction. Additionally, the fetch unit is configured to convey the second return address to the second instruction cache responsive to the return instruction.
    Type: Grant
    Filed: January 29, 2002
    Date of Patent: January 7, 2003
    Assignee: Advanced Micro Devices, Inc.
    Inventor: David B. Witt
  • Patent number: 6505295
    Abstract: To provide a data processing apparatus equipped with a control means to reduce the power required for memory accessing, in spite of the unavailability of a repeat instruction, by reading instructions reiteratively from a small scale buffer in loop processing. Also to provide a data processing apparatus equipped with a means to opt to apply, or not to apply, control to read instructions, that have to be executed reiteratively in loop processing, out of a small scale buffer reiteratively. If, as a result of execution of an instruction to alter the content of a certain register prior to a series of instructions to be executed reiteratively, the register satisfies a specific condition, the series of instructions to be executed repeatedly are read out of a small scale buffer reiteratively.
    Type: Grant
    Filed: August 16, 1999
    Date of Patent: January 7, 2003
    Assignee: Hitachi, Ltd.
    Inventors: Mitsuru Hiraki, Atsushi Kiuchi, Kesami Hagiwara
  • Publication number: 20030005262
    Abstract: The present invention provides a mechanism for supporting high bandwidth instruction fetching in a multi-threaded processor. A multi-threaded processor includes an instruction cache (I-cache) and a temporary instruction cache (TIC). In response to an instruction pointer (IP) of a first thread hitting in the I-cache, a first block of instructions for the thread is provided to an instruction buffer and a second block of instructions for the thread are provided to the TIC. On a subsequent clock interval, the second block of instructions is provided to the instruction buffer, and first and second blocks of instructions from a second thread are loaded into a second instruction buffer and the TIC, respectively.
    Type: Application
    Filed: June 28, 2001
    Publication date: January 2, 2003
    Inventors: Sailesh Kottapalli, James S. Burns, Kenneth D. Shoemaker
  • Patent number: 6502188
    Abstract: A branch prediction unit includes a local branch prediction and a global branch prediction. A global branch prediction utilizes a global history shift register to record the behavior of conditional branches. In some cases, a conditional branch may behave in a static manner, either always being taken or not taken, while resident in an instruction cache. Such static behaving conditional branches do not need a global history for prediction and contend with other conditional branches for global branch history training. By utilizing a dynamic branch classification scheme, branches requiring global history prediction can be identified and static behaving conditional branches may be prevented from polluting the global history. All conditional branches are initially classified as local and do not participate in global history training. Only after two mispredictions are branches recognized as exhibiting dynamic behavior and classified as global.
    Type: Grant
    Filed: November 16, 1999
    Date of Patent: December 31, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gerald D. Zuraski, Jr., James S. Roberts, Raghuram S. Tupuri
  • Patent number: 6496902
    Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.
    Type: Grant
    Filed: December 31, 1998
    Date of Patent: December 17, 2002
    Assignee: Cray Inc.
    Inventors: Gregory J. Faanes, Eric P. Lundberg
  • Patent number: 6496921
    Abstract: A method of operating a processing unit of a computer system, by issuing an instruction having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster).
    Type: Grant
    Filed: June 30, 1999
    Date of Patent: December 17, 2002
    Assignee: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, Lakshminarayana Baba Arimilli, Leo James Clark, John Steven Dodson, Guy Lynn Guthrie, James Stephen Fields, Jr.
  • Patent number: 6490658
    Abstract: A memory cache method and apparatus with two memory execution pipelines, each having a translation lookaside buffer (TLB). Memory instructions are executed in the first pipeline (324) by searching a data cache (310) and a prefetch cache (320). A large data TLB (330) provides memory for storing address translations for the first pipeline (324) A second pipeline (328) executes memory instructions by accessing the prefetch cache (320). A second micro-TLB (340) is associated with the second pipeline (328). It is loaded in anticipation of data that will be referenced by the second pipeline (328). A history file (360) is also provided to retain information on previous instructions to aid in deciding when to prefetch data. Prefetch logic (370) determines when to prefetch data, and steering logic (380) routes certain instructions to the second pipeline (328) to increase system performance.
    Type: Grant
    Filed: June 23, 1997
    Date of Patent: December 3, 2002
    Assignee: Sun Microsystems, Inc.
    Inventors: Sultan Ahmed, Joseph Chamdani
  • Patent number: 6484252
    Abstract: A microcomputer includes an instruction decoder and a program counter. The instruction decoder decodes fetched instructions and outputs a control signal ordering execution of the fetched instruction. The control signal from the instruction decoder includes a component controlling fetch cycles which triggers a fetch cycle at the beginning of each instruction cycle to fetch the operand for the instruction currently being executed and midway through each instruction cycle to fetch the OP code for the next instruction. The program counter is responsive to the triggering of each fetch cycle to increment its counter value so as to keep the counter value consistent with the address being accessed in each fetch cycle.
    Type: Grant
    Filed: June 7, 1995
    Date of Patent: November 19, 2002
    Assignee: Sony Corporation
    Inventor: Nobuhisa Watanabe
  • Patent number: 6470435
    Abstract: An embodiment of the present invention includes a speculative rename table (SRT), a shadow array, and an update circuit. The SRT stores mapping of frequent and infrequent registers. The frequent registers are frequently modified by instructions dispatched from a processor core. The infrequent registers are infrequently modified by the instructions. The shadow array stores shadow registers. Each of the shadow registers contains a rename state of a corresponding frequent register after a branch instruction. The update circuit transfers contents of the shadow registers to the frequent registers based on a selection condition.
    Type: Grant
    Filed: December 28, 2000
    Date of Patent: October 22, 2002
    Assignee: Intel Corporation
    Inventors: Nicholas G. Samra, Jacob Doweck, Belliappa Kuttanna
  • Patent number: 6470427
    Abstract: A programmable agent and method for managing prefetch queues provide dynamically configurable handling of priorities in a prefetching subsystem for providing look-ahead memory loads in a computer system. When it's queues are at capacity an agent handling prefetches from memory either ignores new requests, forces the new requests to retry or cancels a pending request in order to perform the new request. The behavior can be adjusted under program control by programming a register, or the control may be coupled to a load pattern analyzer. In addition, the behavior with respect to new requests can be set to different types depending on a phase of a pending request.
    Type: Grant
    Filed: November 9, 1999
    Date of Patent: October 22, 2002
    Assignee: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, John Steven Dodson, James Stephen Fields, Jr., Guy Lynn Guthrie
  • Patent number: 6470444
    Abstract: A method of performing a store operation in a computer processor is disclosed. The method issues a store operation that is divided into a pre-fetch micro-operation that loads a needed cache line into a cache memory, and the subsequent store micro-operation stores a data value into the needed cache line that was pre-fetched into the cache memory.
    Type: Grant
    Filed: June 16, 1999
    Date of Patent: October 22, 2002
    Assignee: Intel Corporation
    Inventor: Gad S. Sheaffer
  • Patent number: 6467030
    Abstract: A method and apparatus for forwarding data in a hierarchial cache memory architecture is disclosed. A cache memory hierarchy includes multiple levels of cache memories, each level having a different size and speed. A command is initially issued from a processor to the cache memory hierarchy. If the command is a Demand Load command, data corresponding to the Demand Load command is immediately forwarded from a cache having the data to the processor. Otherwise, if the command is a Prefetch Load command, data corresponding to the Prefetch Load command is held in a cache reload buffer within a cache memory preceding the processor.
    Type: Grant
    Filed: November 9, 1999
    Date of Patent: October 15, 2002
    Assignee: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, Lakshminarayana Baba Arimilli, James Stephen Fields, Jr., Sanjeev Ghai, Praveen S. Reddy
  • Publication number: 20020144087
    Abstract: A kind of architecture of method for fetching microprocessor's instructions is provided to pre-read and pre-decode a next instruction. If the instruction pre-decoded is found a conditional branch instruction, an instruction reading-amount register is set for reading two instructions next to the current instruction in the program memory, or one is read instead if the next instruction is found an instruction other than the conditional branch one so as to waive reading of unnecessary program memory and thereby reduce power consumption.
    Type: Application
    Filed: December 18, 2001
    Publication date: October 3, 2002
    Inventors: Pao-Lung Chen, Chen-Yi Lee
  • Patent number: 6460115
    Abstract: A data processing system and method for prefetching data in a multi-level code subsystem. The data processing system includes a processor having a first level cache and a prefetch engine. Coupled to the processor are a second level cache, and a third level cache and a system memory. Prefetching of cache lines is concurrently performed into each of the first, second, and third level caches by the prefetch engine. Prefetch requests from the prefetch engine to the second and third level caches are performed over a private or dedicated prefetch request bus, which is separate from the bus system that transfers data from the various cache levels to the processor. A software instruction or hint may be used to accelerate the prefetch process by overriding the normal functionality of the hardware prefetch engine.
    Type: Grant
    Filed: November 8, 1999
    Date of Patent: October 1, 2002
    Assignee: International Business Machines Corporation
    Inventors: James Allan Kahle, Michael John Mayfield, Francis Patrick O'Connell, David Scott Ray, Edward John Silha, Joel Tendler
  • Patent number: 6460116
    Abstract: A microprocessor configured to rapidly decode variable-length instructions is disclosed. The microprocessor is configured with a predecoder and an instruction cache. The predecoder is configured to expand variable-length instructions to create fixed-length instructions by padding instruction fields within each variable-length instruction with constants until each field reaches a predetermined maximum width. The fixed-width instructions are then stored within the instruction cache and output for execution when a corresponding requested address is received. The instruction cache may store both variable- and fixed-width instructions, or just fixed-width instructions. An array of pointers may be used to access particular fixed-length instructions. The fixed-length instructions may be configured to all have the same fields and the same lengths, or they may be divided into groups, wherein instructions within each group have the same fields and the same lengths.
    Type: Grant
    Filed: September 21, 1998
    Date of Patent: October 1, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Rupaka Mahalingaiah
  • Patent number: 6457117
    Abstract: The processor is configured to predecode instruction bytes prior to their storage within an instruction cache. During the predecoding, relative branch instructions are detected. The displacement included within the relative branch instruction is added to the address corresponding to the relative branch instruction, thereby generating the target address. The processor replaces the displacement field of the relative branch instruction with an encoding of the target address, and stores the modified relative branch instruction in the instruction cache. The branch prediction mechanism may select the target address from the displacement field of the relative branch instruction instead of performing an addition to generate the target address. In one embodiment, relative branch instructions having eight bit and 32-bit displacement fields are included in the instruction set executed by the processor.
    Type: Grant
    Filed: November 7, 2000
    Date of Patent: September 24, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventor: David B. Witt
  • Patent number: 6446190
    Abstract: A double indirect method of accessing a block of data in a register file is used to allow efficient implementations without the use of specialized vector processing hardware. In addition, the automatic modification of the register addressing is not tied to a single vector instruction nor to repeat or loop instructions. Rather, the technique, termed register file indexing (RFI) allows full programmer flexibilty in control of the block data operational facility and provides the capability to mix non-RFI instructions with RFI instructions. The block-data operation facility is embedded in the iVLIW ManArray architecture allowing its generalized use across the instruction set architecture without specialized vector instructions or being limited in use only with repeat or loop instructions.
    Type: Grant
    Filed: March 12, 1999
    Date of Patent: September 3, 2002
    Assignee: Bops, Inc.
    Inventors: Edwin F. Barry, Gerald G. Pechanek, Patrick R. Marchand
  • Publication number: 20020120829
    Abstract: In order to simplify the instruction prefetch architecture for use with the programs having few loops, and having instructions almost in linear and sequential addresses, the bus controller in accordance with the present invention for controlling the bus in an external memory includes a plurality of instruction buffers, flags each specific to each of instruction buffers, and a buffer controller circuit. The buffer controller circuit may allocate one of specific values that plural lower bits of an instruction address may take to each of the instruction buffers, and prefetch instructions to the instruction buffers each corresponding to a respective addresses designated to by the plural lower bits, from the address following a predetermined fetch address. The constitution for instruction prefetch as above may be implemented in a simpler manner than the controlling structure using address tags of a cache memory or the controlling structure using read-write pointer based on the counter in FIFO buffers.
    Type: Application
    Filed: February 15, 2001
    Publication date: August 29, 2002
    Inventors: Yasuyuki Murakami, Shigezumi Matsui, Kunihiko Nishiyama, Atsushi Kiuchi, Yuichi Takitsune
  • Publication number: 20020116598
    Abstract: A computer system with a processing unit and a memory. The processing unit is arranged to fetch memory lines from the memory and execute instructions from the memory lines. Each memory line is fetched as a whole and is capable of holding more than one instruction. An instruction comprises information that signals explicitly how the processing unit, when processing the instruction from a current memory line, should control how a part of processing is affected by crossing of a boundary to a subsequent memory line. The processing unit responds to the information by controlling said part as signaled by the information.
    Type: Application
    Filed: January 29, 2002
    Publication date: August 22, 2002
    Inventor: Jeroen Anton Johan Leijten
  • Publication number: 20020116597
    Abstract: A memory accelerator module buffers program instructions and/or data for high speed access using a deterministic access protocol. The program memory is logically partitioned into ‘stripes’, or ‘cyclically sequential’ partitions, and the memory accelerator module includes a latch that is associated with each partition. When a particular partition is accessed, it is loaded into its corresponding latch, and the instructions in the next sequential partition are automatically pre-fetched into their corresponding latch. In this manner, the performance of a sequential-access process will have a known response, because the pre-fetched instructions from the next partition will be in the latch when the program sequences to these instructions. Previously accessed blocks remain in their corresponding latches until the pre-fetch process ‘cycles around’ and overwrites the contents of each sequentially-accessed latch.
    Type: Application
    Filed: February 20, 2001
    Publication date: August 22, 2002
    Inventors: Gregory K. Goodhue, Ata R. Khan, John H. Wharton, Robert Michael Kallal
  • Patent number: 6438680
    Abstract: When a decision circuit (217) incorporated in a control circuit (21) in an instruction decode unit (2) in a microprocessor (1) decides that an integer operation unit (4) can not execute a following sub instruction, the decision circuit (217) controls each of selectors (211, 214, and 215) and an exchange circuit (216) so that a memory access unit (3) that has already executed a preceding sub instruction can execute the following sub instruction.
    Type: Grant
    Filed: June 3, 1999
    Date of Patent: August 20, 2002
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventors: Akira Yamada, Isao Minematsu
  • Patent number: 6434693
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address-collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Grant
    Filed: November 12, 1999
    Date of Patent: August 13, 2002
    Assignee: Seiko Epson Corporation
    Inventors: Cheryl D. Senter, Johannes Wang
  • Patent number: 6430675
    Abstract: The inventive mechanism uses a cache table to map branch targets. When a fetch instruction is initiated, the inventive mechanism searches the IP-to-TM cache to determine whether the branch target instruction has been optimized and placed into the trace memory. If there is a match with the P-to-TM cache, then the code in the trace is executed. This cache is examined in parallel with Instruction Translation Lookup Buffer (ITLB). If not a match is found in the IP-to-TM cache, the original binary in the physical address provided by the ITLB will be executed.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: August 6, 2002
    Assignee: Hewlett-Packard Company
    Inventors: Wei C. Hsu, Manuel Benitez
  • Patent number: 6430676
    Abstract: A computer-based method and system for determining designations for conditional branch operations and settings for lookahead values for a portion of a computer program. The lookahead system of the present invention evaluates various combinations of designations for the conditional branch operations for the portion of the computer program. The lookahead system generates a metric to measure the amount of parallel processing that would result from each combination of designations assuming that the lookahead values are set to optimal values for that combination. This metric may take into consideration estimated or actual execution frequencies of the instructions. The lookahead system then designates the conditional branch operations and sets the lookahead values based on the metric generated for one of the combinations.
    Type: Grant
    Filed: December 23, 1998
    Date of Patent: August 6, 2002
    Assignee: Cray Inc.
    Inventor: Brian D. Koblenz
  • Patent number: 6427204
    Abstract: A system for time-ordered issuance of instruction fetch requests (IFR). More specifically, the system enables just-in-time delivery of instructions requested by an IFR. The system consists of a processor, an L1 instruction cache with corresponding L1 cache controller, and an instruction processor. The instruction processor manipulates an architected time dependency field of an IFR to create a Time of Dependency (ToD) field. The ToD field holds a time dependency value which is utilized to order the IFR in a Relative Time-Ordered Queue (RTOQ) of the L1 cache controller. The IFR is issued from RTOQ to the L1 instruction cache so that the requested instruction is fetched from the L1 instruction cache at the time specified by the ToD value. In an alternate embodiment the ToD is converted to a CoD and the instruction is fetched from a lower level cache at the CoD value.
    Type: Grant
    Filed: June 25, 1999
    Date of Patent: July 30, 2002
    Assignee: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, Lakshminarayanan Baba Arimilli, John Steven Dodson, Jerry Don Lewis
  • Publication number: 20020099926
    Abstract: In a first aspect of the present invention, a method for prefetching instructions in a superscalar processor is disclosed. The method comprises the steps of fetching a set of instructions along a predicted path and prefetching a predetermined number of instructions if a low confidence branch is fetched and storing the predetermined number of instructions in a prefetch buffer. In a second aspect of the present invention, a system for prefetching instructions in a superscalar processor is disclosed. The system comprises a cache for fetching a set of instructions along a predicted path, a prefetching mechanism coupled to the cache for prefetching a predetermined number of instructions if a low confidence branch is fetched and a prefetch buffer coupled to the prefetching mechanism for storing the predetermined number of instructions. Through the use of the method and system in accordance with the present invention, existing prefetching algorithms are improved with minimal additional hardware cost.
    Type: Application
    Filed: January 19, 2001
    Publication date: July 25, 2002
    Applicant: International Business Machines Corporation
    Inventor: Balaram Sinharoy
  • Publication number: 20020095563
    Abstract: One embodiment of the present invention provides a system that prefetches instructions by using an assist processor to perform prefetch operations in advance of a primary processor. The system operates by executing executable code on the primary processor, and simultaneously executing a reduced version of the executable code on the assist processor. This reduced version of the executable code executes more quickly than the executable code, and performs prefetch operations for the primary processor in advance of when the primary processor requires the instructions. The system also stores the prefetched instructions into a cache that is accessible by the primary processor so that the primary processor is able to access the prefetched instructions without having to retrieve the prefetched instructions from a main memory. In one embodiment of the present invention, prior to executing the executable code, the system compiles source code into executable code for the primary processor.
    Type: Application
    Filed: January 16, 2001
    Publication date: July 18, 2002
    Inventors: Shailender Chaudhry, Marc Tremblay
  • Publication number: 20020095566
    Abstract: A branch operation is processed using a branch predict instruction and an associated branch instruction. The branch predict instruction indicates a predicted direction, a target address, and an instruction address for the associated branch instruction. When the branch predict instruction is detected, the target address is stored at an entry indicated by the associated branch instruction address and a prefetch request is triggered to the target address. The branch predict instruction may also include hint information for managing the storage and use of the branch prediction information.
    Type: Application
    Filed: October 12, 1998
    Publication date: July 18, 2002
    Inventors: HARSHVARDHAN SHARANGPANI, TSE-YU YEH, MICHAEL PAUL CORWIN, MILLAND MITTAL, KENT FIELDEN, DALE MORRIS
  • Patent number: 6418516
    Abstract: A method of operating a multi-level memory hierarchy of a computer system and apparatus embodying the method, wherein instructions issue having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions and treats instructions in a different manner when they are loaded speculatively. These prefetch requests can be demand load requests, where the processing unit will need the operand data or instructions, or speculative load requests, where the processing unit may or may not need the operand data or instructions, but a branch prediction or stream association predicts that they might be needed. The load requests are sent to the lower level cache when the upper level cache does not contain the value required by the load.
    Type: Grant
    Filed: July 30, 1999
    Date of Patent: July 9, 2002
    Assignee: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, Leo James Clark, John Steven Dodson, Guy Lynn Guthrie, William John Starke
  • Patent number: 6415377
    Abstract: The data processor contains a memory and a data prefetch unit. The data prefetch unit contains a respective FIFO queue for storing prefetched data from each of a number of address streams respectively. The data prefetch unit uses programmable information to generate addresses from a plurality of address streams and prefetches data from addresses successively addressed by a present address for the data stream in response to progress of execution of a program by the processor. The processor has an instruction which causes the data prefetch unit to extract an oldest data from the FIFO queue for an address stream and which causes the data processor to use the oldest data in the manner of operand data of the instruction.
    Type: Grant
    Filed: June 3, 1999
    Date of Patent: July 2, 2002
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Pieter Van Der Wolf, Kornelis A. Vissers
  • Patent number: 6415378
    Abstract: A method and system for debugging the execution of an instruction within an instruction pipeline is provided. A processor in a data processing system contains instruction pipeline units. An instruction may be tagged, and in response to an instruction pipeline unit completing its processing of the tagged instruction, a stage completion signal is asserted. An execution monitor external to the pipelined processor monitors the stage completion signals during the execution of the tagged instruction. The execution monitor may be a logic analyzer that displays the stage completion signals in real-time on a display device of the execution monitor. An instruction to be tagged may be selected based upon an instruction selection rule, such as the address of the instruction.
    Type: Grant
    Filed: June 30, 1999
    Date of Patent: July 2, 2002
    Assignee: International Business Machines Corporation
    Inventors: Joel Roger Davidson, Judith K. Laurens, Alexander Erik Mericas, Kevin F. Reick, Joel M. Tendler
  • Patent number: 6415356
    Abstract: One embodiment of the present invention provides a system that prefetches from memory by using an assist processor that executes in advance of a primary processor. The system operates by executing executable code on the primary processor, and simultaneously executing a reduced version of the executable code on the assist processor. This reduced version runs more quickly than the executable code, and generates the same pattern of memory references as the executable code. This allows the assist processor to generate the same pattern of memory references that the primary processor generates in advance of when the primary processor generates the memory references. The system stores results of memory references generated by the assist processor in a store that is shared with the primary processor so that the primary processor can access the results of the memory references. In one embodiment of the present invention, this store is a cache memory.
    Type: Grant
    Filed: May 4, 2000
    Date of Patent: July 2, 2002
    Assignee: Sun Microsystems, Inc.
    Inventors: Shailender Chaudhry, Marc Tremblay
  • Patent number: 6412066
    Abstract: A microprocessor is configured to fetch a compressed instruction set which comprises a subset of a corresponding non-compressed instruction set. The compressed instruction set is a variable length instruction set including 16-bit and 32-bit instructions. The 32-bit instructions are coded using an extend opcode, which indicates that the instruction being fetched is an extended (e.g. 32 bit) instruction. The compressed instruction set further includes multiple sets of register mappings from the compressed register fields to the decompressed register fields. Certain select instructions are assigned two opcode encodings, one for each of two mappings of the corresponding register fields. The compressed register field is directly copied into a portion of the decompressed register field while the remaining portion of the decompressed register field is created using a small number of logic gates.
    Type: Grant
    Filed: April 5, 2001
    Date of Patent: June 25, 2002
    Assignee: LSI Logic Corporation
    Inventors: Frank Worrell, Hartvig Ekner
  • Patent number: 6412046
    Abstract: A method and apparatus automatically and easily verifies a cache line prefetch mechanism. The verification method includes a strict definition of which cache lines should be prefetched and which cache lines should not. The method also emphasizes unusual operating conditions. For example, by exercising boundary conditions, the method by stresses situations in which a microprocessor or chip is likely to produce errors. The method can verify prefetch without having to access or view any internal signals or buses inside the chip. The method can be adopted in any system-level verification methodology in simulation, emulation, or actual hardware. The method can be used in a system-level test set up along with a chip-level test set up without requiring knowledge of the internal state of the chip. In this case, checking is done at the chip boundary. The method is automated and performs strict checks on overprefetch, underprefetch, and the relative order in which fetch and prefetches must occur.
    Type: Grant
    Filed: May 1, 2000
    Date of Patent: June 25, 2002
    Assignee: Hewlett Packard Company
    Inventors: Debendra Das Sharma, Kevin Hauck, Daniel F. Li
  • Patent number: 6401193
    Abstract: Prefetching data to a low level memory of a computer system is accomplished utilizing an instruction location indicator related to an upcoming instruction to identify a next data prefetch indicator and then utilizing the next data prefetch indicator to locate the corresponding prefetch data within the memory of the computer system. The prefetch data is located so that the prefetch data can be transferred to a primary cache where the data can be quickly fetched by a processor when the upcoming instruction is executed. The next data prefetch indicator is generated by carrying out the addressing mode function that is embedded in an instruction only when the addressing mode of the instruction is a deterministic addressing mode such as a sequential.
    Type: Grant
    Filed: October 26, 1998
    Date of Patent: June 4, 2002
    Assignee: Infineon Technologies North America Corp.
    Inventors: Muhammad Afsar, Klaus Oberlaender
  • Patent number: 6401212
    Abstract: In a computer system (10) embodiment, there is included a memory (18) and circuitry (16a) for prefetching information from the memory in response to a prefetch request. The system further includes a system resource (14), wherein the system resource is burdened in response to a prefetch operation by the circuitry for prefetching information. The system resource is also further burdened in response to other circuitry (16b, 16n, 17) using the system resource. The system further includes circuitry (20, 22, 24) for determining a measure of the burden on the system resource. Lastly, the system includes circuitry (26) for prohibiting prefetching of the information by the circuitry for prefetching information responsive to a comparison of the measure of the burden with a threshold. Other circuits, systems, and methods are also disclosed and claimed.
    Type: Grant
    Filed: November 8, 2000
    Date of Patent: June 4, 2002
    Assignee: Texas Instruments Incorporated
    Inventors: James O. Bondi, Jonathan H. Shiell
  • Patent number: 6401192
    Abstract: A mechanism and method for software hint initiated prefetch is provided. The prefetch may be directed to a prefetch of data for loading into a data cache, instructions for entry into an instruction cache or for either, in an embodiment having a combined cache. In response to a software instruction in an instruction stream, a plurality of prefetch specification data values are loaded into a register having a plurality of entries corresponding thereto. Prefetch specification data values include the address of the first cache line to be prefetched, and the stride, or the incremental offset, of the address of subsequent lines to be prefetched. Prefetch requests are generated by a prefetch control state machine using the prefetch specification data values stored in the register. Prefetch requests are issued to a hierarchy of cache memory devices. If a cache hit occurs having the specified cache coherency, the prefetch is vitiated.
    Type: Grant
    Filed: October 5, 1998
    Date of Patent: June 4, 2002
    Assignee: International Business Machines Corporation
    Inventors: David Andrew Schroter, Michael Thomas Vaden
  • Patent number: 6397320
    Abstract: A method for ordering the time of issuing of a load instruction from a lower level (L2) cache controller to its L2 cache in a data processing system to enable delivery of a load data at a time it is required by its downstream dependency is disclosed. The method comprises the steps of (i) determining a cycle of dependency (CoD) of the load data, where the CoD corresponds to an exact synchronized timer (ST) time, measured in cycles, on which said data is required by said downstream dependency from the L2 cache, and (ii) issuing the load instruction to said L2 cache at said time to synchronize a providing of said data to a pipeline of a system resource with a request by its downstream dependency. In the preferred embodiment of the invention, a distance of dependency (DoD) value is first appended to the load instruction. The DoD value is then converted to a CoD value when a miss occurs at the internal (L1) cache.
    Type: Grant
    Filed: June 25, 1999
    Date of Patent: May 28, 2002
    Assignee: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, Lakshminarayanan Baba Arimilli, John Steven Dodson, Jerry Don Lewis
  • Publication number: 20020059511
    Abstract: A data processor having a debugging aid function capable of monitoring a plurality of kinds of internal buses from the outside and identifying each of the buses monitored is provided. A central processing unit (CPU), a debugging aid module, and other circuit modules are mounted on a semiconductor chip. The debugging aid module selects an information transmitting path in accordance with a trace condition from a plurality of information transmitting paths used for the operation of a central processing unit (CPU) or the like, holds trace information obtained according to the trace condition from the selected information transmitting path together with attribute information of the trace information in a buffer circuit, and outputs the information serially to the outside of the semiconductor chip. A plurality of kinds of internal buses can be monitored on the outside, and each of the buses monitored can be identified.
    Type: Application
    Filed: November 2, 2001
    Publication date: May 16, 2002
    Inventors: Ryo Sudo, Shigezumi Matsui, Yasunori Matsumoto
  • Patent number: 6389529
    Abstract: A system for time-ordered execution of load instructions. More specifically, the system enables just-in-time delivery of data requested by a load instruction. The system consists of a processor, an L1 data cache with corresponding L1 cache controller, and an instruction processor. The instruction processor manipulates a plurality of architected time dependency fields of a load instruction to create a plurality of dependency fields. The dependency fields holds a relative dependency value which is utilized to order the load instruction in a Relative Time-Ordered Queue (RTOQ) of the L1 cache controller. The load instruction is sent from RTOQ to the L1 data cache at a particular time so that the data requested is loaded from the L1 data cache at the time specified by one of the dependency fields. The dependency fields are prioritized so that the cycle corresponding to the highest priority field which is available is utilized.
    Type: Grant
    Filed: June 25, 1999
    Date of Patent: May 14, 2002
    Assignee: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, Lakshminarayanan Baba Arimilli, John Steven Dodson, Jerry Don Lewis
  • Patent number: 6381679
    Abstract: An information processing unit and method for controlling a cache according to a software prefetch instruction, are disclosed. Indication bits are provided for indicating a hierarchical level of a cache to which an operand data is to be transferred or a quantity of an operand data to be transferred, or both. The indication bits are provided in a software prefetch instruction such that at the time of a transfer of block data or line data, a required data is transferred to a cache based on the indication bits in the prefetch instruction. Thus, it is not necessary to change the timing for executing a software prefetch instruction depending on which one of the caches of the hierarchical levels is hit, and a compiler can generate an instruction sequence more easily.
    Type: Grant
    Filed: July 3, 2000
    Date of Patent: April 30, 2002
    Assignee: Hitachi, Ltd.
    Inventors: Kenji Matsubara, Toshihiko Kurihara, Hiromitsu Imori
  • Patent number: 6374344
    Abstract: A technique handles load instructions within a data processor that includes a cache circuit having a data cache and a tag memory indicating valid entries within the data cache. The technique involves writing data to the data cache during a series of four processor cycles in response to a first load instruction. Additionally, the technique involves updating the tag memory and preventing reading of the tag memory in response to the first load instruction during a first processor cycle in the series of processor cycles. Furthermore, the technique involves reading tag information from the tag memory during a processor cycle of the series of four processor cycles following the first processor cycle in response to a second load instruction.
    Type: Grant
    Filed: November 25, 1998
    Date of Patent: April 16, 2002
    Assignee: Compaq Information Technologies Group L.P. (CITG)
    Inventors: David Arthur James Webb, Jr., James B. Keller, Derrick R. Meyer